Cross-Selling with Collaborative Filtering
Qiang Yang, HKUST
Thanks: Sonny Chee

Motivation
• Question: a user has already bought some products; what other products should we recommend to that user?
• Collaborative Filtering (CF) automates the "circle of advisors".

Collaborative Filtering
• ". . . people collaborate to help one another perform filtering by recording their reactions . . ." (Tapestry)
• Finds users whose taste is similar to yours and uses them to make recommendations.
• Complementary to IR/IF: IR/IF finds similar documents; CF finds similar users.

Example
• Which movie would Sammy watch next?
• Ratings are on a scale of 1 to 5.
• If we just use the average rating of the other users who voted on these movies, we get:
  • Matrix = 3; Titanic = 14/4 = 3.5
  • Recommend Titanic!
• But is this reasonable?

Types of Collaborative Filtering Algorithms
• Collaborative Filters
  • Statistical Collaborative Filters
  • Probabilistic Collaborative Filters [PHL00]
  • Bayesian Filters [BP99][BHK98]
  • Association Rules [Agrawal, Han]
• Open Problems
  • Sparsity, First Rater, Scalability

Statistical Collaborative Filters
• Users annotate items with numeric ratings.
• Users who rate items "similarly" become mutual advisors.
• A recommendation is computed by taking a weighted aggregate of advisor ratings.

Basic Idea
• Nearest Neighbor Algorithm
• Given a user a and an item i:
  • First, find the users most similar to a; let these be Y.
  • Second, find how these users (Y) rated i.
  • Then, calculate a predicted rating of a on i based on some average over all these users Y.
• How do we calculate the similarity and the average?

Statistical Filters
• GroupLens [Resnick et al 94, MIT]
  • Filters Usenet news postings
  • Similarity: Pearson correlation
  • Prediction: weighted deviation from the mean

Pearson Correlation
• Gives the weight between users a and u.
• Compute the similarity matrix between users:
  • Use the Pearson correlation, which ranges from -1 through 0 to 1.
  • Let the items be all items that both users rated.
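
The weight computation can be sketched as follows. This is a minimal illustration, not the GroupLens code: it assumes each user's ratings are a dict from item to rating, and that the means are taken over the co-rated items only (implementations differ on this point). The function name `pearson` and the sample ratings are hypothetical.

```python
from math import sqrt

def pearson(ratings_a, ratings_u):
    """Pearson correlation between two users over their co-rated items.

    ratings_a, ratings_u: dicts mapping item -> numeric rating.
    Returns 0.0 when the correlation is undefined (fewer than two
    co-rated items, or zero variance for either user).
    """
    common = sorted(set(ratings_a) & set(ratings_u))
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    cov = sum((ratings_a[i] - mean_a) * (ratings_u[i] - mean_u) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    if var_a == 0 or var_u == 0:
        return 0.0
    return cov / sqrt(var_a * var_u)

# Hypothetical ratings on a 1-5 scale (not the data from the slides).
alice = {"m1": 5, "m2": 3, "m3": 1}
bob = {"m1": 4, "m2": 3, "m3": 2}      # same ordering of preferences as alice
carol = {"m1": 1, "m2": 3, "m3": 5}    # opposite ordering
```

Here `pearson(alice, bob)` is 1 (perfect agreement in ordering) and `pearson(alice, carol)` is -1, matching the two ends of the range on the slide.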

Prediction Generation
• Predicts how much a user a likes an item i.
• Generate predictions using the weighted deviation from the mean (Equation 1).
• The normalizing factor is the sum of all weights.
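
A minimal sketch of the weighted-deviation prediction, written as p(a, i) = mean(a) + sum_u w(a, u) * (r(u, i) - mean(u)) / sum_u |w(a, u)|. Normalizing by the sum of absolute weights is an assumption made here so that negative correlations do not flip the sign of the normalizer; the slide only says "sum of all weights". The function name and the sample numbers are hypothetical.

```python
def predict(mean_a, advisors):
    """Predict user a's rating of item i as a weighted deviation from the mean.

    mean_a:   the mean rating of user a.
    advisors: list of (weight, rating_on_i, advisor_mean) tuples, one per
              advisor u who rated item i.
    """
    # Assumption: normalize by the sum of absolute weights.
    norm = sum(abs(w) for w, _, _ in advisors)
    if norm == 0:
        return mean_a  # no usable advisors: fall back to the user's own mean
    deviation = sum(w * (r - mean_u) for w, r, mean_u in advisors)
    return mean_a + deviation / norm

# Hypothetical advisors: one who agrees with a (w = 1.0) and rated the item
# one point above their own mean, and one who disagrees (w = -0.5) and rated
# it two points below their own mean.
p = predict(3.0, [(1.0, 5.0, 4.0), (-0.5, 1.0, 3.0)])
```

Both advisors push the prediction upward: the agreeing advisor liked the item, and the disagreeing advisor disliked it.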

Error Estimation
• Mean Absolute Error (MAE) for user a
• Standard deviation of the errors
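
A small sketch of both error measures for one user. It assumes the standard deviation is taken over the absolute errors and uses the population form; the slide does not say which variant was used. The sample predictions are hypothetical.

```python
from statistics import mean, pstdev

def error_summary(predicted, actual):
    """Return (MAE, standard deviation of the absolute errors) for one user."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two equally sized, non-empty rating lists")
    abs_errors = [abs(p - r) for p, r in zip(predicted, actual)]
    # Assumption: population standard deviation of the absolute errors.
    return mean(abs_errors), pstdev(abs_errors)

# Hypothetical predictions and true ratings for four items.
mae, sd = error_summary([4.0, 3.0, 2.0, 5.0], [5.0, 3.0, 4.0, 5.0])
```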

Example
Correlation between users:

          Sammy   Dylan   Mathew
Sammy     1       1       -0.87
Dylan     1       1       0.21
Mathew    -0.87   0.21    1

= 0.83

Statistical Collaborative Filters
• Ringo [Shardanand and Maes 95 (MIT)]
  • Recommends music albums
  • Each user buys certain music artists' CDs
• Base case: weighted average
• Predictions
  • Mean square difference
    • First compute the dissimilarity between pairs of users
    • Then find all users Y with dissimilarity less than L
    • Compute the weighted average of the ratings of these users
  • Pearson correlation (Equation 1)
  • Constrained Pearson correlation (Equation 1 with a weighted average of similar users (corr > L))
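
The mean square difference steps above can be sketched as follows. This is an illustration under assumptions, not Ringo's code: the weight (L - d) / L given to a user with dissimilarity d is an assumed choice, and all names and ratings are hypothetical.

```python
def mean_squared_difference(ratings_a, ratings_u):
    """Dissimilarity of two users: mean squared rating gap on co-rated items."""
    common = set(ratings_a) & set(ratings_u)
    if not common:
        return None
    return sum((ratings_a[i] - ratings_u[i]) ** 2 for i in common) / len(common)

def msd_predict(target, others, item, threshold):
    """Weighted average of ratings of `item` by users closer than `threshold`."""
    num = den = 0.0
    for ratings in others:
        if item not in ratings:
            continue
        d = mean_squared_difference(target, ratings)
        if d is None or d >= threshold:
            continue  # keep only users with dissimilarity less than L
        weight = (threshold - d) / threshold  # assumed weighting scheme
        num += weight * ratings[item]
        den += weight
    return num / den if den else None

# Hypothetical ratings: predict the target's rating of artist "a3" with L = 2.
target = {"a1": 5, "a2": 2}
others = [
    {"a1": 5, "a2": 2, "a3": 4},   # d = 0    -> weight 1.0
    {"a1": 4, "a2": 3, "a3": 2},   # d = 1    -> weight 0.5
    {"a1": 1, "a2": 5, "a3": 1},   # d = 12.5 -> excluded
]
prediction = msd_predict(target, others, "a3", threshold=2.0)
```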

Open Problems in CF
• "Sparsity Problem"
  • CFs have poor accuracy and coverage in comparison to population averages at low rating density [GSK+99].
• "First Rater Problem"
  • The first person to rate an item receives no benefit. CF depends upon altruism [AZ97].

Open Problems in CF
• "Scalability Problem"
  • CF is computationally expensive. The fastest published algorithms (nearest-neighbor) are n^2.
  • Is there any indexing method to speed this up?
  • Has received relatively little attention.
• References in CF:
  • http://www.cs.sfu.ca/CC/470/qyang/lectures/cf ref.htm

References
• P. Domingos and M. Richardson, Mining the Network Value of Customers, Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining (pp. 57-66), 2001. San Francisco, CA: ACM Press.

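Motivation
• Network value is ignored (direct marketing).
• Examples:
[Figure: a customer with low expected profit is marketed to; under the network effect, this affects customers with high expected profit. Labels: "Market to", "Marketed", "Affected (under the network effect)", "Low expected profit", "High expected profit".]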

Some Successful Cases
• Hotmail
  • Grew from 0 to 12 million users in 18 months
  • Each email included a promotional URL for the service
• ICQ
  • Expanded quickly
  • From its first appearance, users became addicted to it
  • Users depend on it to contact their friends

Introduction
• Incorporate the network value in maximizing the expected profit.
• Social networks: modeled by a Markov random field.
• Probability to buy = desirability of the item + influence from others.
• Goal = maximize the expected profit.

Focus
• Making practical use of network value in recommendation.
• Although the algorithm may be used in other applications, the focus is NOT on a generic algorithm.

Assumptions
• A customer's buying decision can be affected by other customers' ratings.
• Market to people who are inclined to see the film.
• A user will not continue to use the system if they do not find its recommendations useful (natural elimination assumption).

Modeling
• View the markets as social networks.
• Model the social network as a Markov random field.
• What is a Markov random field?
  • An experiment whose outcomes are functions of more than one continuous variable [e.g. P(x, y, z)].
  • The outcome at each node depends on its neighbors.

Variable definitions
• X = {X_1, ..., X_n}: a set of n potential customers; X_i = 1 (buy), X_i = 0 (not buy)
• X^k (known values), X^u (unknown values)
• N_i = {X_{i,1}, ..., X_{i,n}}: the neighbors of X_i
• Y = {Y_1, ..., Y_m}: a set of attributes describing the product
• M = {M_1, ..., M_n}: a set of marketing actions, one for each customer

Example (set of Y)
• Using EachMovie as the example:
  • X_i: whether person i saw the movie
  • Y: the movie genre
  • R_i: the rating of the movie by person i
• Here Y is set to the movie genre; different problems can set Y differently.

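Goal of modeling
• Find the marketing action (M) for each customer that achieves the best profit.
• Profit is measured as ELP (expected lift in profit):
  • ELP_i(X^k, Y, M) = r_1 P(X_i = 1 | X^k, Y, f_i^1(M)) - r_0 P(X_i = 1 | X^k, Y, f_i^0(M)) - c
  • r_1: revenue with the marketing action
  • r_0: revenue without the marketing action
  • c: cost of marketing to the customer
  • f_i^1(M), f_i^0(M): M with M_i set to 1 or 0, leaving the rest unchanged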
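
The ELP expression can be evaluated directly once the two purchase probabilities are known. The sketch below assumes they have already been estimated elsewhere; the function name, parameter names and numbers are hypothetical.

```python
def expected_lift_in_profit(p_buy_marketed, p_buy_unmarketed, r1, r0, c):
    """ELP_i = r1 * P(buy | marketed) - r0 * P(buy | not marketed) - c."""
    return r1 * p_buy_marketed - r0 * p_buy_unmarketed - c

# Hypothetical customer: a discount lowers revenue from 10 to 8 and costs 1
# to deliver, but raises the purchase probability from 0.3 to 0.6.
elp_i = expected_lift_in_profit(0.6, 0.3, r1=8.0, r0=10.0, c=1.0)
worth_marketing = elp_i > 0
```

With these numbers the lift is 4.8 - 3.0 - 1.0 = 0.8, so marketing to this customer pays off.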

Three different modeling algorithms
• Single pass
• Greedy search
• Hill-climbing search

Scenarios
• Customers {A, B, C, D}
• A: will buy the product if someone suggests it and there is a discount (M=1)
• C, D: will buy the product if someone suggests it or there is a discount (M=1)
• B: will never buy the product
[Figure: network of customers A, B, C, D. C and D are labeled "Discount / suggest", A is labeled "Discount + suggest"; the nodes with M=1 in the best solution are marked.]

Single pass
• For each i, set M_i = 1 if ELP(X^k, Y, f_i^1(M_0)) > 0, and set M_i = 0 otherwise.
• Adv: fast algorithm, one pass only.
• Disadv: marketing actions to later customers may affect earlier customers, and these effects are ignored.

Single Pass Example
• Customers: A, B, C, D
• M_0 = {0, 0, 0, 0}
• ELP(X^k, Y, f_0^1(M_0)) <= 0
• ELP(X^k, Y, f_1^1(M_0)) <= 0
• ELP(X^k, Y, f_2^1(M_0)) > 0
• ELP(X^k, Y, f_3^1(M_0)) > 0
• M = {0, 0, 1, 1} (single pass)
• Done
[Figure: network of A, B, C, D; C and D ("Discount / suggest") have M=1, A is "Discount + suggest".]

Greedy Algorithm
• Set M = M_0.
• Loop through the M_i's, setting each M_i to 1 if ELP(X^k, Y, f_i^1(M)) > ELP(X^k, Y, M).
• Continue until there are no changes in M.
• Adv: later changes to the M_i's can affect the earlier M_i's.
• Disadv: it takes much more computation time; several scans are needed.

Greedy Example
• Customers: A, B, C, D
• M_0 = {0, 0, 0, 0} (no pass)
• M = {0, 0, 1, 1} (first pass)
• M = {1, 0, 1, 1} (second pass)
• Done
[Figure: network of A, B, C, D; A ("Discount + suggest") and C, D ("Discount / suggest") have M=1.]

Hill-climbing search
• Set M = M_0.
• Set M_{i1} = 1, where i1 = argmax_i {ELP(X^k, Y, f_i^1(M))}.
• Repeat:
  • Let i = argmax_i {ELP(X^k, Y, f_i^1(f_{i1}^1(M)))}
  • Set M_i = 1
• Until there is no i for which setting M_i = 1 gives a larger ELP.
• Adv: gives the best M of the three methods, since each step selects the M_i with the largest ELP.
• Disadv: the most expensive algorithm.

Hill Climbing Example
• Customers: A, B, C, D
• M = {0, 0, 0, 0} (no pass)
• M = {0, 0, 1, 0} (first pass)
• M = {1, 0, 1, 0} (second pass)
• Done
[Figure: network of A, B, C, D; the best solution has M=1 for A ("Discount + suggest") and C ("Discount / suggest").]
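
The three searches can be compared on a toy version of the A, B, C, D scenario. Everything in the model below is an assumption made for illustration, since the slide figure is not recoverable: a star-shaped network centered on A, deterministic buying rules, a full-price revenue of 10, a discounted revenue of 8, and a cost of 1 per marketing action. The ELP here is the total profit of a plan relative to marketing to nobody, and `with_action(m, cust)` plays the role of f_i^1(M), that is, M with M_i set to 1.

```python
CUSTOMERS = ["A", "B", "C", "D"]
# Assumed star-shaped network centered on A.
NEIGHBORS = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
R0, R1, COST = 10.0, 8.0, 1.0  # assumed: full price, discounted price, action cost

def buyers(m):
    """Customers who end up buying under marketing plan m (dict name -> 0/1)."""
    bought = set()
    changed = True
    while changed:  # propagate suggestions until nothing changes
        changed = False
        for cust in CUSTOMERS:
            if cust in bought or cust == "B":  # B never buys
                continue
            suggested = any(n in bought for n in NEIGHBORS[cust])
            discount = m[cust] == 1
            # A needs a suggestion AND a discount; C and D need either one.
            wants = (suggested and discount) if cust == "A" else (suggested or discount)
            if wants:
                bought.add(cust)
                changed = True
    return bought

def elp(m):
    """Lift in profit of plan m over marketing to nobody (which earns 0 here)."""
    revenue = sum(R1 if m[cust] else R0 for cust in buyers(m))
    return revenue - COST * sum(m.values())

def with_action(m, cust):
    """Copy of m with customer cust marketed to."""
    return {**m, cust: 1}

def single_pass(m0):
    return {cust: 1 if elp(with_action(m0, cust)) > 0 else 0 for cust in CUSTOMERS}

def greedy(m0):
    m = dict(m0)
    changed = True
    while changed:  # rescan until a full pass makes no change
        changed = False
        for cust in CUSTOMERS:
            if m[cust] == 0 and elp(with_action(m, cust)) > elp(m):
                m[cust] = 1
                changed = True
    return m

def hill_climbing(m0):
    m = dict(m0)
    while True:
        candidates = [cust for cust in CUSTOMERS if m[cust] == 0]
        if not candidates:
            return m
        # Pick the single action with the largest ELP; stop if it does not help.
        best = max(candidates, key=lambda cust: elp(with_action(m, cust)))
        if elp(with_action(m, best)) <= elp(m):
            return m
        m[best] = 1

def as_list(m):
    return [m[cust] for cust in CUSTOMERS]

M0 = {cust: 0 for cust in CUSTOMERS}
```

Under these assumed numbers, `as_list(single_pass(M0))` is `[0, 0, 1, 1]`, `as_list(greedy(M0))` is `[1, 0, 1, 1]` and `as_list(hill_climbing(M0))` is `[1, 0, 1, 0]`, the same M vectors as in the three example slides. Hill climbing does best here because it notices that D will buy at full price on A's suggestion, so discounting D is wasted.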

Who Are the Neighbors?
• Mining Social Networks by Using Collaborative Filtering (CFinSC).
• Use the Pearson correlation coefficient to calculate similarity.
• The result of CFinSC can be used to construct the social network.
• ELP and M can then be found from the social network.

Who are the neighbors?
• Calculate the weight of every customer by the following equation:
[Equation shown on slide]

Neighbors' Ratings for Product
• Calculate the rating of the neighbor by the following equation:
[Equation shown on slide]
• If the neighbor did not rate the item, R_jk is set to the mean of R_j.

Estimating the Probabilities
• P(X_i): from the items rated by user i.
• P(Y_k | X_i): obtained by counting the number of occurrences of each value of Y_k with each value of X_i.
• P(M_i | X_i): select users at random, apply the marketing action to them, and record the effect. (If data is not available, use prior knowledge to judge.)
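
The counting estimate for P(Y_k | X_i) can be sketched as a relative frequency over observed (x, y) pairs. This is a bare illustration with no smoothing; the function name and the observations are hypothetical.

```python
from collections import Counter

def conditional_frequencies(pairs):
    """Estimate P(y | x) from a list of (x, y) observations by counting."""
    joint = Counter(pairs)                     # occurrences of each (x, y)
    marginal = Counter(x for x, _ in pairs)    # occurrences of each x
    return {(x, y): n / marginal[x] for (x, y), n in joint.items()}

# Hypothetical observations: x = 1 if the user saw the movie, y = its genre.
observations = [
    (1, "drama"), (1, "drama"), (1, "comedy"),
    (0, "drama"), (0, "comedy"), (0, "comedy"), (0, "comedy"),
]
probs = conditional_frequencies(observations)
```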

Preprocessing
• Zero mean
• Prune people whose ratings cover too few movies (10)
• Non-zero standard deviation in ratings
• Penalize the Pearson correlation coefficient if both users rate very few movies in common
• Remove all movies that were viewed by < 1% of the people
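
A possible sketch of the filtering steps above. The order of the steps, the reading of "(10)" as a minimum of 10 rated movies, and all names are assumptions; the penalty on the Pearson coefficient is left out because the slide does not give its form.

```python
from statistics import mean, pstdev

def preprocess(ratings, min_ratings=10, min_movie_share=0.01):
    """Filter and zero-mean a ratings table.

    ratings: dict user -> dict movie -> rating.
    Returns a new dict holding only the kept users and movies, with each
    user's ratings shifted to have zero mean.
    """
    n_users = len(ratings)
    # Count how many users rated each movie.
    viewers = {}
    for user_ratings in ratings.values():
        for movie in user_ratings:
            viewers[movie] = viewers.get(movie, 0) + 1
    popular = {m for m, n in viewers.items() if n / n_users >= min_movie_share}

    cleaned = {}
    for user, user_ratings in ratings.items():
        kept = {m: r for m, r in user_ratings.items() if m in popular}
        if not kept or len(kept) < min_ratings:
            continue  # ratings cover too few movies
        values = list(kept.values())
        if pstdev(values) == 0:
            continue  # a constant rater carries no preference signal
        mu = mean(values)
        cleaned[user] = {m: r - mu for m, r in kept.items()}
    return cleaned

# Hypothetical data with thresholds lowered so the effect is visible.
raw = {
    "u1": {"m1": 5, "m2": 3, "m3": 1},
    "u2": {"m1": 4, "m2": 4},
    "u3": {"m1": 2},
    "u4": {"m2": 1, "m1": 3, "m4": 5},
}
clean = preprocess(raw, min_ratings=2, min_movie_share=0.5)
```

In this toy run, movies m3 and m4 are dropped as rarely viewed, u2 is dropped for zero variance, and u3 for rating too few movies.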

Experiment Setup
• Data: EachMovie
• Trainset and Testset (temporal effect)
[Figure: timeline. Rating dates: 1/96, 9/96, 9/97, with Trainset (old) before 9/96 and Testset (new) after it. Release dates: 1/96, 9/96, 12/96.]

Experiment Setup (cont.)
• Target: compare the 3 methods of searching for an optimized marketing action against the baseline (direct marketing).

Experiment Results
[Quoted directly from the paper]

Experiment Results (cont.)
• The proposed algorithms are much better than direct marketing.
• Hill-climbing > greedy >> single-pass >> direct
• A higher α gives (slightly) better results.

Item Selection By "Hub-Authority" Profit Ranking
ACM KDD 2002
Ke Wang, Ming-Yen Thomas Su
Simon Fraser University

Ranking in an Inter-related World
• Web pages
• Social networks
• Cross-selling

Item Ranking with Cross-selling Effect
• What are the most profitable items?
[Figure: graph of items labeled with profits ($10, $8, $5, $1.5, $2, $3, $3, $0.5, $15) and links labeled with confidences (100%, 60%, 100%, 50%, 35%, 30%).]

The Hub/Authority Modeling
• Hubs i: "introductory" for sales of other items j (i->j).
• Authorities j: "necessary" for sales of other items i (i->j).
• Solution: model the mutual reinforcement of hub and authority weights through links.
• Challenges: incorporate the individual profits of items and the strength of links, and ensure that the hub/authority weights converge.

Selecting Most Profitable Items
• Size-constrained selection
  • Given a size s, find s items that produce the most profit as a whole.
  • Solution: select the s items at the top of the ranking.
• Cost-constrained selection
  • Given the cost of selecting each item, find a collection of items that produces the most profit as a whole.
  • Solution: the same as above for uniform cost.

Solution to cost-constrained selection
[Figure: estimated profit and selection cost plotted against the # of items selected, with the optimal cutoff marked.]

Web Page Ranking Algorithm: HITS (Hyperlink-Induced Topic Search)
• Mutually reinforcing relationship
  • Hub weight: h(i) = sum of a(j) over all pages j that i has a link to
  • Authority weight: a(i) = sum of h(j) over all pages j that have a link to i
• a and h converge if normalized before each iteration
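
The two update rules can be sketched in a few lines. This is a minimal illustration of the iteration, not a tuned implementation: it assumes the link graph is a dict from page to the pages it links to, runs a fixed number of rounds, and rescales to unit length after each update. The page names are hypothetical.

```python
from math import sqrt

def hits(links, iterations=50):
    """Return (hub, authority) weights for a link graph.

    links: dict page -> list of pages it links to.
    """
    pages = sorted(set(links) | {j for targets in links.values() for j in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # a(i) = sum of h(j) over pages j that link to i
        auth = {p: 0.0 for p in pages}
        for j, targets in links.items():
            for i in targets:
                auth[i] += hub[j]
        # h(i) = sum of a(j) over pages j that i links to
        hub = {p: sum(auth[j] for j in links.get(p, [])) for p in pages}
        # Rescale both vectors to unit length so the weights converge.
        for vec in (auth, hub):
            norm = sqrt(sum(v * v for v in vec.values())) or 1.0
            for p in vec:
                vec[p] /= norm
    return hub, auth

# Hypothetical graph: p1 and p2 both point to p3, and p3 points to p4.
links = {"p1": ["p3"], "p2": ["p3"], "p3": ["p4"]}
hub, auth = hits(links)
```

In this graph p3 ends up as the strongest authority, and p1 and p2 are equally strong hubs.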

The Cross-Selling Graph
• Find frequent items and 2-itemsets.
• Create a link i->j if conf(i->j) is above a specified value (i and j may be the same).
• "Quality" of link i->j: prof(i) * conf(i->j). Intuitively, it is the credit of j due to its influence on i.
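
Building the graph from transaction data might look like the sketch below. It assumes transactions are lists of item names, that conf(i->j) is the fraction of transactions containing i that also contain j, and that every frequent item gets a self-link with confidence 1. The function name, thresholds and data are hypothetical.

```python
from itertools import permutations

def cross_selling_links(transactions, profit, min_support, min_conf):
    """Return {(i, j): prof(i) * conf(i -> j)} for links passing both thresholds."""
    n = len(transactions)
    item_count, pair_count = {}, {}
    for t in transactions:
        items = set(t)
        for i in items:
            item_count[i] = item_count.get(i, 0) + 1
        for i, j in permutations(sorted(items), 2):
            pair_count[(i, j)] = pair_count.get((i, j), 0) + 1
    frequent = {i for i, c in item_count.items() if c / n >= min_support}
    # Self-links: conf(i -> i) = 1, so the quality is just prof(i).
    links = {(i, i): profit[i] for i in frequent}
    for (i, j), c in pair_count.items():
        conf = c / item_count[i]
        if c / n >= min_support and conf >= min_conf:
            links[(i, j)] = profit[i] * conf
    return links

# Hypothetical data: four transactions and per-item profits.
transactions = [["pen", "ink"], ["pen", "ink"], ["pen"], ["ink", "paper"]]
profit = {"pen": 2.0, "ink": 1.0, "paper": 0.5}
links = cross_selling_links(transactions, profit, min_support=0.5, min_conf=0.6)
```

Here "paper" is not frequent, so no link touches it, while pen->ink gets quality 2.0 * 2/3.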

Computing Weights in HAP
• For each iteration:
  • Authority weights: a(i) = sum_{j->i} prof(j) * conf(j->i) * h(j)
  • Hub weights: h(i) = sum_{i->j} prof(i) * conf(i->j) * a(j)
• Cross-selling matrix B
  • B[i, j] = prof(i) * conf(i, j) for link i->j
  • B[i, j] = 0 if there is no link i->j (i.e. (i, j) is not a frequent set)
• Compute the weights iteratively or use eigen-analysis
• Rank items using their authority weights

Example
• Given frequent items X, Y, and Z and the table:

  prof(X) = $5      conf(X->Y) = 0.2     conf(X->Z) = 0.8
  prof(Y) = $1      conf(Y->X) = 0.06    conf(Y->Z) = 0.5
  prof(Z) = $0.1    conf(Z->X) = 0.2     conf(Z->Y) = 0.375

• We get the cross-selling matrix B:

        X        Y        Z
  X   5.0000   1.0000   4.0000
  Y   0.0600   1.0000   0.5000
  Z   0.0200   0.0375   0.1000

  e.g. B[X, Y] = prof(X) * conf(X, Y) = 1.0000

Example (cont'd)
• prof(X) = $5, prof(Y) = $1, prof(Z) = $0.1
• a(X) = 0.767, a(Y) = 0.166, a(Z) = 0.620
• The HAP ranking is different from ranking by individual profit.
  • The cross-selling effect increases the profitability of Z.
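
The authority weights in this example can be checked with a short iteration. The sketch assumes that a is updated from h and h from a using the matrix entries B[i, j] = prof(i) * conf(i->j), that self-links have confidence 1 (as the diagonal of B suggests), and that both vectors are rescaled to unit length after every update. The function names are hypothetical.

```python
from math import sqrt

ITEMS = ["X", "Y", "Z"]
PROF = {"X": 5.0, "Y": 1.0, "Z": 0.1}
CONF = {  # conf(i -> j); self-links are assumed to have confidence 1
    ("X", "X"): 1.0, ("X", "Y"): 0.2, ("X", "Z"): 0.8,
    ("Y", "X"): 0.06, ("Y", "Y"): 1.0, ("Y", "Z"): 0.5,
    ("Z", "X"): 0.2, ("Z", "Y"): 0.375, ("Z", "Z"): 1.0,
}
# Cross-selling matrix: B[i, j] = prof(i) * conf(i -> j)
B = {(i, j): PROF[i] * CONF[(i, j)] for i in ITEMS for j in ITEMS}

def normalize(vec):
    """Rescale a weight vector to unit Euclidean length."""
    norm = sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()}

def hap_weights(iterations=100):
    a = {i: 1.0 for i in ITEMS}
    h = {i: 1.0 for i in ITEMS}
    for _ in range(iterations):
        # a(i) = sum over j of B[j, i] * h(j)
        a = normalize({i: sum(B[(j, i)] * h[j] for j in ITEMS) for i in ITEMS})
        # h(i) = sum over j of B[i, j] * a(j)
        h = normalize({i: sum(B[(i, j)] * a[j] for j in ITEMS) for i in ITEMS})
    return a, h

authority, hub = hap_weights()
ranking = sorted(ITEMS, key=authority.get, reverse=True)
```

The iteration settles at roughly a(X) = 0.767, a(Y) = 0.166, a(Z) = 0.620, so the ranking is X, Z, Y even though Z has the lowest individual profit.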

Empirical Study
• Conduct experiments on two datasets.
• Compare 3 selection methods: HAP, PROFSET [4, 5], and Naïve.
• HAP generates the highest estimated profit in most cases.

Empirical Study

                      Drug Store            Synthetic
  Transaction #       193,995               10,000
  Item #              26,128                1,000
  Avg. trans length   2.86                  10
  Total profit        $1,006,970            $317,579
  minsupp             0.1%      0.05%       0.5%      0.1%
  Freq. items         332       999         602       879
  Freq. pairs         39        115         15        11322

Experiment Results
*PROFSET [4]