- Number of slides: 22
Profit Mining: From Patterns to Action
Ke Wang, Senqiang Zhou, Jiawei Han
Simon Fraser University
Why Profit Mining?
- A major obstacle in data mining applications is the gap between:
  - statistic-based pattern extraction and
  - value-based decision making
- Profit mining:
  - value-based data mining
An Example
- Suppose we want to maximize profit. The association rules [AIS 93]
    {Perfume} -> Lipstick (more often)
    {Perfume} -> Diamond (more profit)
  do not suggest which items (and prices) to recommend to a customer who bought Perfume.
- Similar problems arise with correlation, classification, etc.
The Problem
- Given: several transactions of the form:
  - {<I, P, Q>, …, <I, P, Q> | <I, P, Q>}, where <I, P, Q> stands for Item, Promotion code, and Quantity.
  - | separates nontarget items and target items.
- {
Not a Prediction Problem
- An example:
  - 100 customers each bought 1 pack at $1/pack. Profit = 100 × (1 - 0.5) = $50.
  - 100 customers each bought 4 packs at $3.2 per 4-pack. Profit = 100 × (3.2 - 2) = $120.
- Prediction repeats the history.
- Profit mining gets smarter from the history by recommending “right items” and “right prices”.
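The arithmetic in this example can be checked with a short sketch. The function name `total_profit` and its parameters are illustrative only. The unit cost of $0.5 per pack is read off the slide's own figures (1 - 0.5 for one pack, 3.2 - 2 for four).

```python
def total_profit(customers: int, price: float, unit_cost: float, units: int) -> float:
    """Profit when each customer buys one package of `units` packs at `price`."""
    return customers * (price - unit_cost * units)


# Scenario 1: 100 customers, one pack each at $1.
single_pack = total_profit(customers=100, price=1.0, unit_cost=0.5, units=1)

# Scenario 2: 100 customers, one 4-pack each at $3.2.
four_pack = total_profit(customers=100, price=3.2, unit_cost=0.5, units=4)

print(round(single_pack, 2), round(four_pack, 2))
```

The same 100 customers yield more than twice the profit under the second offer, which is the gap between repeating history and recommending a better item and price.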
Challenge I - notion of profit
- Pure statistic approach favors
  - {Perfume} -> Lipstick
- Pure profit approach favors
  - {Perfume} -> Diamond.
- Profit mining considers:
  - both statistical significance and profit significance.
Challenge II - customer intention
- Mining On Availability (MOA):
  - Paying a higher price implies the willingness to pay a lower price.
- {
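One way to read the MOA assumption is as a test of whether a recommendation would have been accepted by a past customer. The sketch below is a simplification under stated assumptions: the `Sale` class and `moa_hit` function are hypothetical names, a single per-unit price stands in for the promotion code, and the prices are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    item: str
    price: float  # per-unit price (assumed stand-in for a promotion code)


def moa_hit(recommended: Sale, purchased: Sale) -> bool:
    """Under MOA, a customer who paid `purchased.price` is assumed willing
    to buy the same item at any price that is not higher."""
    return recommended.item == purchased.item and recommended.price <= purchased.price


paid = Sale("Lipstick", 10.0)
print(moa_hit(Sale("Lipstick", 8.0), paid))   # lower price, same item
print(moa_hit(Sale("Lipstick", 12.0), paid))  # higher price, same item
```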
Challenge III - search space
- Thousands of items, and many more sales. Any combination can trigger a recommendation.
- Searching over alternative concepts (food, meat, etc.) and prices makes it worse.
Step 1: generating rules
- Association rules
  - {Diaper -> Beer}, supp=10%, conf=80%
- Recommendation rules:
  - {g1, …, gk} -> , where gi is
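For reference, support and confidence of an association rule in the usual [AIS 93] sense can be computed as follows. This is a minimal sketch: the function name `support_confidence` and the toy baskets are made up for illustration and are not the data behind the 10% / 80% figures on the slide.

```python
from typing import FrozenSet, List, Tuple


def support_confidence(
    transactions: List[FrozenSet[str]], body: FrozenSet[str], head: str
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule body -> head."""
    n = len(transactions)
    # Transactions containing every item of the rule body.
    body_count = sum(1 for t in transactions if body <= t)
    # Transactions containing the body and the head.
    both_count = sum(1 for t in transactions if body <= t and head in t)
    support = both_count / n if n else 0.0
    confidence = both_count / body_count if body_count else 0.0
    return support, confidence


# Made-up toy data: 10 baskets.
baskets = (
    [frozenset({"Diaper", "Beer"})] * 4
    + [frozenset({"Diaper", "Milk"})]
    + [frozenset({"Bread"})] * 5
)
supp, conf = support_confidence(baskets, frozenset({"Diaper"}), "Beer")
print(supp, conf)
```

Neither measure involves price or cost, which is why such rules alone cannot say what to recommend for profit.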
Handling alternative concepts and prices
Step 2: building the model
- We rank rules by the “average profit” made by the recommendation of a rule.
  - {
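The slide's definition of "average profit" is cut off, so the sketch below rests on an assumption: the average profit of a rule is taken to be the total profit its recommendation earns on the transactions matching the rule body, divided by the number of those transactions. The `RuleStats` class, `rank_rules` function, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RuleStats:
    name: str
    matched: int         # transactions matching the rule body
    total_profit: float  # profit the recommendation earns on those transactions

    @property
    def average_profit(self) -> float:
        # Assumed definition; the original slide text is truncated.
        return self.total_profit / self.matched if self.matched else 0.0


def rank_rules(rules: List[RuleStats]) -> List[RuleStats]:
    """Order rules from highest to lowest average profit."""
    return sorted(rules, key=lambda r: r.average_profit, reverse=True)


rules = [
    RuleStats("{Perfume} -> Lipstick", matched=100, total_profit=150.0),
    RuleStats("{Perfume} -> Diamond", matched=100, total_profit=400.0),
]
ranked = rank_rules(rules)
print([r.name for r in ranked])
```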
Step 3: pruning the model
- The model favors “high average profit” rules.
- Such rules may bring a large profit.
- Such rules may also be random noise.
- We cannot prune them simply based on statistical frequency.
Pruning the model
- We prune rules to increase the estimated profit on the whole population.
- We organize rules into a specificity tree: the parent of a rule is its highest-ranked more general rule.
- We cut off parts of the tree to maximize the estimated profit.
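The tree cut-off can be pictured as a generic bottom-up pruning pass. The sketch below is not necessarily the authors' procedure: the slides do not say how the profit estimates are obtained, so they are taken as given inputs. The `Node` class, its fields, and the numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    rule: str
    profit_as_leaf: float    # estimated profit if this rule replaces its whole subtree
    own_profit: float = 0.0  # estimated profit on what this rule covers when children are kept
    children: List["Node"] = field(default_factory=list)


def prune(node: Node) -> float:
    """Prune the subtree in place and return its best estimated profit."""
    if not node.children:
        return node.profit_as_leaf
    # Estimated profit if the (already pruned) children are kept.
    kept = node.own_profit + sum(prune(child) for child in node.children)
    if node.profit_as_leaf >= kept:
        node.children = []  # cut off the subtree below this rule
        return node.profit_as_leaf
    return kept


specific = Node("C", profit_as_leaf=1.5)
b = Node("B", profit_as_leaf=3.0, own_profit=1.0, children=[specific])
a = Node("A", profit_as_leaf=5.0)
root = Node("default", profit_as_leaf=8.0, own_profit=2.0, children=[a, b])
best = prune(root)
print(best, [c.rule for c in root.children], b.children)
```

In this toy tree the specific rule C is cut because B alone is estimated to earn more, while the root keeps A and B because together they beat the root acting alone.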
Evaluation
- Synthetic datasets: IBM synthetic data generator, modified to have price and cost.
- 1000 items and 1000K transactions
- For non-target item i:
  - cost(i) = c/i
  - price_j = (1 + j*10%) cost(i), j = 1, 2, 3, 4.
- For target items:
  - Dataset I has 2 target items
  - Dataset II has 10 target items
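The cost and price scheme for non-target items can be written out directly. In this sketch the constant c is not given on the slide, so the value 100 below is an arbitrary assumption, and the function names are illustrative.

```python
from typing import List


def cost(i: int, c: float) -> float:
    """cost(i) = c / i for non-target item i (i >= 1)."""
    return c / i


def prices(i: int, c: float) -> List[float]:
    """price_j = (1 + j * 10%) * cost(i) for j = 1, 2, 3, 4."""
    return [(1 + j * 0.10) * cost(i, c) for j in range(1, 5)]


C = 100.0  # assumed value; the slide leaves c unspecified
item_cost = cost(4, C)
item_prices = prices(4, C)
print(item_cost, [round(p, 2) for p in item_prices])
```

Each item therefore has four price levels with markups of 10% to 40% over cost, and higher-numbered items are cheaper.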
Profit Gain on Dataset I
Hit Ratio on Dataset I
Hit Ratio on Dataset I
Profit Gain on Dataset II
Hit Ratio on Dataset II
Hit Ratio on Dataset II
Conclusion
- Proposed a new direction of data mining: mining for profit.
- Directly factors the business goal into data mining.
- Related work: microeconomic view of data mining [KPR 98]