  • Number of slides: 22

Profit Mining: From Patterns to Action
Ke Wang, Senqiang Zhou, Jiawei Han
Simon Fraser University

Why Profit Mining?
  • A major obstacle in data mining applications is the gap between:
      – statistic-based pattern extraction, and
      – value-based decision making.
  • Profit mining:
      – value-based data mining.

An Example
  • Suppose we want to maximize profit.
  • Association rules [AIS93]
      {Perfume} -> Lipstick (more often)
      {Perfume} -> Diamond (more profit)
    do not suggest which items (and prices) to recommend to a customer who bought Perfume.
  • Similar problems arise with correlation, classification, etc.

The Problem
  • Given: several transactions of the form:
      – {<I, P, Q>, …, <I, P, Q> | <I, P, Q>}, where <I, P, Q> stands for Item, Promotion code, and Quantity.
      – | separates non-target items and target items.
      – { | }
  • Recommend target items to customers who buy non-target items, to maximize profit.

Not a Prediction Problem
  • An example:
      – 100 customers each bought 1 pack for $1/pack. Profit = 100 × (1 - 0.5) = $50.
      – 100 customers each bought 4 packs for $3.20 per 4-pack. Profit = 100 × (3.2 - 2) = $120.
  • Prediction repeats the history.
  • Profit mining gets smarter from the history, by recommending the “right items” at the “right prices”.
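The arithmetic on this slide can be checked with a short Python sketch. The unit cost of $0.50 per pack is not stated outright; it is inferred from the slide's own figures (1 - 0.5 for one pack, 3.2 - 2 for four). The function name `total_profit` is hypothetical.

```python
def total_profit(customers, bundle_price, packs_per_bundle, unit_cost):
    """Profit when every customer buys one bundle at the given bundle price."""
    return customers * (bundle_price - packs_per_bundle * unit_cost)


UNIT_COST = 0.5  # assumption: cost per pack, inferred from the slide's numbers

single_pack = total_profit(100, 1.0, 1, UNIT_COST)  # roughly $50
four_pack = total_profit(100, 3.2, 4, UNIT_COST)    # roughly $120
```

With the same 100 customers, the 4-pack promotion earns more than the single-pack price, which is the sense in which repeating the history is not necessarily the most profitable recommendation.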

Challenge I - notion of profit
  • A pure statistic approach favors
      – {Perfume} -> Lipstick
  • A pure profit approach favors
      – {Perfume} -> Diamond
  • Profit mining considers:
      – both statistical significance and profit significance.

Challenge II - customer intention
  • Mining On Availability (MOA):
      – Paying a higher price implies the willingness to pay a lower price.
  • {} -> can be extracted from transaction { | }
  • Recognizing this behavior brings new sales opportunities (at a lower price).
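A minimal sketch of the MOA idea, assuming that "lower price" is judged by unit price on the same item. The `Sale` type and the `moa_hit` function are hypothetical names introduced for illustration, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    item: str
    unit_price: float  # price per unit, either paid or offered


def moa_hit(recommended: Sale, purchased: Sale) -> bool:
    """Treat a recorded purchase as accepting a recommendation of the same
    item whenever the recommended unit price is no higher than the price
    the customer actually paid (assumed reading of MOA)."""
    return (recommended.item == purchased.item
            and recommended.unit_price <= purchased.unit_price)


paid = Sale("Sunchip", 1.00)
cheaper_offer = Sale("Sunchip", 0.80)
pricier_offer = Sale("Sunchip", 1.20)
```

Under this reading, `moa_hit(cheaper_offer, paid)` holds while `moa_hit(pricier_offer, paid)` does not, so one observed sale supports recommendations at several lower prices.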

Challenge III - search space
  • Thousands of items, and many more sales. Any combination can trigger a recommendation.
  • Searching over alternative concepts (food, meat, etc.) and prices makes it worse.

Step 1: generating rules
  • Association rules
      – {Diaper} -> Beer, supp=10%, conf=80%
  • Recommendation rules:
      – {g1, …, gk} -> , where gi is , or Item, or Concept.
      – {} ->
      – {Flaked. Chick.} ->
      – {Meat} ->
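For reference, support and confidence of an association rule can be computed as in the sketch below. The baskets are made-up toy data and `support_confidence` is a hypothetical helper, so the numbers differ from the 10% and 80% quoted on the slide.

```python
def support_confidence(transactions, body, head):
    """Support = share of transactions containing body and head;
    confidence = share of body-containing transactions that also contain head."""
    body = set(body)
    n_body = sum(1 for t in transactions if body <= set(t))
    n_both = sum(1 for t in transactions if body <= set(t) and head in t)
    support = n_both / len(transactions)
    confidence = n_both / n_body if n_body else 0.0
    return support, confidence


# Toy baskets, invented for illustration only.
baskets = [
    {"Diaper", "Beer"},
    {"Diaper", "Beer", "Milk"},
    {"Diaper", "Milk"},
    {"Milk"},
    {"Beer"},
]
supp, conf = support_confidence(baskets, {"Diaper"}, "Beer")
```

Neither measure involves price or cost, which is why such rules alone cannot say what to recommend for profit.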

Handle alternative concepts and prices

Step 2: building the model
  • We rank rules by the “average profit” made by the recommendation of a rule.
      – {} -> matches
          t1: {| } (a hit)
          t2: {|} (a miss)
      – If the cost of Sunchip is $0.7, the average profit is $0.15.
  • To recommend, we select the matching rule of the highest possible rank.
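The ranking and selection described here might be sketched as follows. The selling price of $1.00 is an assumption, chosen only because it reproduces the slide's $0.15 with one hit and one miss at a cost of $0.70. The rule list, its profit values, and the names `average_profit` and `recommend` are likewise hypothetical.

```python
def average_profit(hit_profits, n_matches):
    """Total profit over the hits, averaged over all matching transactions
    (a miss contributes zero profit)."""
    return sum(hit_profits) / n_matches if n_matches else 0.0


def recommend(rules, basket):
    """rules: list of (body, target, avg_profit) tuples.
    Return the matching rule with the highest average profit, or None."""
    matching = [r for r in rules if r[0] <= basket]
    return max(matching, key=lambda r: r[2], default=None)


PRICE, COST = 1.00, 0.70  # PRICE is assumed; COST is the slide's $0.7
avg = average_profit([PRICE - COST], n_matches=2)  # one hit, one miss

rules = [
    (frozenset(), "Sunchip", avg),            # empty body matches any basket
    (frozenset({"Meat"}), "Sunchip", 0.40),   # illustrative value only
]
best_for_meat = recommend(rules, {"Meat", "Bread"})
best_for_bread = recommend(rules, {"Bread"})
```

Because the empty-body rule matches every basket, some recommendation is always available; more specific rules win only when their average profit is higher.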

Step 3: Pruning the model
  • The model favors “high average profit” rules.
  • Such rules may bring a large profit.
  • Such rules may be random noise.
  • We cannot prune them simply based on statistical frequency.

Pruning the model
  • We prune rules to increase the estimated profit on the whole population.
  • We organize rules into a specificity tree: the parent is the highest-ranked general rule of a child.
  • We cut off the tree to maximize the estimated profit.
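One way to picture the cut is a bottom-up pass over the specificity tree, sketched below. The slide does not say how profit is estimated, so the estimates are taken as given inputs. The `Node` fields and the `prune` function are hypothetical, and this is a generic subtree-replacement scheme rather than the authors' exact procedure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    profit_as_leaf: float    # assumed estimate if this rule replaces its whole subtree
    profit_own: float = 0.0  # assumed estimate on what it covers when children are kept
    children: list = field(default_factory=list)


def prune(node):
    """Return the best estimated profit for the subtree rooted at node,
    cutting off the children wherever the general rule alone does at least as well."""
    if not node.children:
        return node.profit_as_leaf
    kept = node.profit_own + sum(prune(child) for child in node.children)
    if node.profit_as_leaf >= kept:
        node.children = []  # specific rules add nothing: cut them off
        return node.profit_as_leaf
    return kept


# Illustrative numbers only.
cut = Node("{}", 10.0, 4.0, [Node("{Meat}", 3.0), Node("{Food}", 2.0)])
keep = Node("{}", 10.0, 4.0, [Node("{Meat}", 5.0), Node("{Food}", 3.0)])
cut_profit, keep_profit = prune(cut), prune(keep)
```

In the first tree the children are removed because the general rule alone is estimated to earn more; in the second they are kept.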


Evaluation
  • Synthetic datasets: IBM synthetic data generator, modified to have price and cost.
  • 1000 items and 1000K transactions
  • For non-target item i:
      – cost(i) = c/i
      – price_j = (1 + j × 10%) × cost(i), j = 1, 2, 3, 4.
  • For target items:
      – Dataset I has 2 target items
      – Dataset II has 10 target items
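The cost and price schedule for non-target items can be written out directly. The slide gives no value for the constant c, so `C = 100.0` below is purely illustrative, as are the function names.

```python
def cost(i, c):
    """Cost of non-target item i (items numbered from 1): cost(i) = c / i."""
    return c / i


def prices(i, c):
    """The four candidate prices of item i: 10% to 40% above its cost."""
    return [(1 + j * 0.10) * cost(i, c) for j in range(1, 5)]


C = 100.0  # assumed constant; not specified on the slide
item_4_cost = cost(4, C)
item_4_prices = prices(4, C)
```

Every price sits above cost, so each non-target sale is profitable, and higher-numbered items are cheaper.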

Profit Gain on Dataset I

Hit Ratio on Dataset I

Hit Ratio on Dataset I (continued)

Profit Gain on Dataset II

Hit Ratio on Dataset II

Hit Ratio on Dataset II (continued)

Conclusion
  • Proposed a new direction of data mining: mining for profit.
  • Directly factor the business goal into data mining.
  • Related work: microeconomic view of data mining [KPR98].