
- Number of slides: 32
Recommender Systems
Ram Akella
February 23, 2011
Lecture 6 b, i 290 & 280 I
University of California at Berkeley, Silicon Valley Center/SC
Outline
- Definition
- Motivation
- Types of recommendation systems
  - Search-based recommendations
  - Category-based recommendations
  - Collaborative filtering
  - Clustering
  - Association Rules
  - Information filtering
  - Classifiers in Recommender Systems
Recommender Systems
- Recommender systems suggest items and ideas that are similar to what a user already likes, i.e., that match the user's particular tastes.
- They automate parts of a quite different model of information discovery, in which people look for other people with similar tastes and then ask them to suggest new things.
Motivation
- Many of the top e-commerce sites use recommender systems to help users access information.
- Users may discover books, music, or movies that were previously unknown to them.
- Recommender systems can also identify the opposite, e.g., movies or music that the user will almost certainly not enjoy.
Search-based recommendations
- The visitor simply types a search query, e.g., « data mining customer »
- The system retrieves all the items that match that query, e.g., 6 books
- The system recommends some of these books based on a general, non-personalized ranking (sales rank, popularity, etc.)
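The following is a minimal Python sketch of the search-then-rank idea, and it rests on a few assumptions. It uses a hypothetical in-memory catalog in which each item carries a sales rank (1 = best-selling). Keyword matching is deliberately naive: every query word must appear in the title. The `Item` fields, the `search_recommend` helper, and the sales ranks are illustrative only and do not come from any real system.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    sales_rank: int  # 1 = best-selling; hypothetical non-personalized score


def search_recommend(catalog: list[Item], query: str, k: int = 3) -> list[str]:
    """Retrieve items matching every query keyword, then rank them by sales rank."""
    keywords = query.lower().split()
    # Step 1: retrieve all items whose title contains every keyword of the query.
    hits = [it for it in catalog if all(w in it.title.lower() for w in keywords)]
    # Step 2: order the hits by a general, non-personalized criterion (sales rank).
    hits.sort(key=lambda it: it.sales_rank)
    return [it.title for it in hits[:k]]


# Hypothetical catalog: titles reused from the lecture, sales ranks invented.
catalog = [
    Item("Data Mining Your Website", 40),
    Item("Building Data Mining Applications for CRM", 12),
    Item("Introduction to Marketing", 3),
    Item("Mastering Data Mining", 25),
]
print(search_recommend(catalog, "data mining"))
```

Every visitor who types the same query gets the same list, which is exactly the "not personalized" limitation discussed on the next slide.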
Search-based recommendations
- Pros:
  - Simple to implement
- Cons:
  - Not very powerful
  - Which criteria should be used to rank recommendations?
  - Are these really « recommendations »?
  - The user only gets what they asked for
Category-based recommendations
- Each item belongs to one or more categories.
- The categories of interest are chosen explicitly or implicitly:
  - Explicit: the customer selects a category of interest (refining a search, opting in to category-based recommendations, etc.), e.g., « Subjects > Computers & Internet > Databases > Data Storage & Management > Data Mining »
  - Implicit: the system selects categories of interest on behalf of the customer, based on the item currently viewed, past purchases, etc.
- Selected items from those categories (bestsellers, new items) are then recommended
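The implicit variant can be sketched in Python under the following assumptions, all of them hypothetical: a mapping from each item to its set of categories, and a global bestseller ordering. The sketch infers the customer's categories of interest from the items they viewed or bought. It then recommends the best-selling items in those categories that the customer has not seen yet. The function name and the data are illustrative only.

```python
def category_recommend(item_categories, bestseller_order, viewed, k=2):
    """Recommend bestsellers from the categories implied by the customer's activity.

    item_categories: item -> set of categories (an item may belong to several).
    bestseller_order: items sorted from best- to worst-selling (non-personalized).
    viewed: items the customer viewed or bought (implicit choice of categories).
    """
    # Implicit choice: the union of the categories of the items the customer touched.
    interests = {c for item in viewed for c in item_categories[item]}
    seen = set(viewed)
    picks = []
    for item in bestseller_order:
        if item in seen:
            continue  # do not recommend what the customer already has
        if item_categories[item] & interests:
            picks.append(item)
        if len(picks) == k:
            break
    return picks


# Hypothetical catalog and sales ordering.
item_categories = {
    "Book 1": {"Data Mining"},
    "Book 2": {"Data Mining", "CRM"},
    "Book 3": {"Marketing"},
    "Book 4": {"CRM"},
    "Book 5": {"Data Mining"},
}
bestsellers = ["Book 3", "Book 5", "Book 4", "Book 2", "Book 1"]
print(category_recommend(item_categories, bestsellers, viewed=["Book 1"]))
```

How useful the output is depends entirely on how fine-grained the categories are, which is the trade-off noted on the next slide.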
Category-based recommendations
- Pros:
  - Still simple to implement
- Cons:
  - Again: not very powerful. Which criteria should be used to order recommendations? Are these really « recommendations »?
  - Effectiveness depends heavily on the kind of categories implemented
    - Too specific: not efficient
    - Not specific enough: no relevant recommendations
Collaborative filtering
- Collaborative filtering techniques « compare » customers based on their previous purchases, in order to make recommendations to « similar » customers
- It is also called « social » filtering
- It follows these steps:
  1. Find customers who are similar (« nearest neighbors ») in terms of tastes, preferences, and past behavior
  2. Aggregate the weighted preferences of these neighbors
  3. Make recommendations based on these aggregated, weighted preferences (most preferred, unbought items)
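The three steps can be sketched in Python as user-based collaborative filtering. This is a minimal sketch under the following assumptions:
- Preferences are a 0/1 purchase matrix.
- Similarity is cosine similarity, which is one common choice and not necessarily the measure used in the lecture.
- The purchase matrix is hypothetical. It was built to mirror the customer A to E example on the next slide.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length preference vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cf_recommend(prefs, target, k=2):
    """User-based collaborative filtering on a customer x item matrix.

    prefs: customer -> list of preferences (1 = bought, 0 = not bought).
    Returns the target's unbought item indices, highest aggregated score first.
    """
    mine = prefs[target]
    # Step 1: similarity of every other customer to the target customer.
    sims = {c: cosine(mine, v) for c, v in prefs.items() if c != target}
    # Keep the k nearest neighbors with non-zero similarity (dissimilar ones get weight 0).
    neighbors = sorted((c for c in sims if sims[c] > 0), key=sims.get, reverse=True)[:k]
    # Step 2: aggregate the neighbors' preferences, weighted by similarity.
    scores = {}
    for j, owned in enumerate(mine):
        if owned:
            continue  # Step 3 only recommends unbought items
        total = sum(sims[c] * prefs[c][j] for c in neighbors)
        if total > 0:
            scores[j] = total
    # Step 3: most preferred unbought items first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical purchases of Books 1..6 (column index 0 = Book 1).
prefs = {
    "A": [1, 0, 0, 1, 0, 0],
    "B": [0, 1, 1, 0, 1, 0],
    "C": [0, 1, 1, 0, 0, 0],
    "D": [0, 1, 0, 0, 0, 1],
    "E": [1, 0, 0, 1, 0, 0],
}
print([f"Book {j + 1}" for j in cf_recommend(prefs, "C")])
```

Item indices are 0-based, so index 4 is Book 5. In this toy matrix, B (who bought everything C bought) pushes Book 5 to the top. D, who is less similar, contributes Book 6 with a smaller weight.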
Collaborative filtering
- Example: the system needs to make recommendations to customer C
- Customer B is very close to C (B has bought all the books C has bought). Book 5 is highly recommended
- Customer D is somewhat close. Book 6 is recommended to a lesser extent
- Customers A and E are not similar at all: weight = 0
Collaborative filtering
- Pros:
  - Extremely powerful and efficient
  - Very relevant recommendations
  - The bigger the database and the more past behavior recorded, the better the recommendations
- Cons:
  - Difficult to implement; resource- and time-consuming
  - What about a new item that has never been purchased? It cannot be recommended
  - What about a new customer who has never bought anything? They cannot be compared to other customers, so no items can be recommended
Clustering
- Another way to make recommendations based on other customers' past purchases is to cluster customers into groups
- Each cluster is assigned « typical » preferences, based on the preferences of the customers who belong to it
- Customers within each cluster receive recommendations computed at the cluster level
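Cluster-level recommendation can be sketched in two steps:
1. Average the preference vectors of each cluster's members to get the cluster's « typical » preferences.
2. Recommend the member's unbought items that score highest in that profile.

Cluster assignments are taken as given here. In practice they would come from a clustering algorithm such as k-means. The 0/1 purchase matrix is hypothetical, with B, C and D in one cluster ("X") and A and E in another ("Y").

```python
def cluster_profiles(prefs, assignment):
    """Average the preference vectors of each cluster's members ("typical" preferences)."""
    sums, counts = {}, {}
    for customer, vec in prefs.items():
        c = assignment[customer]
        if c not in sums:
            sums[c] = [0.0] * len(vec)
            counts[c] = 0
        sums[c] = [s + x for s, x in zip(sums[c], vec)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def cluster_recommend(prefs, assignment, customer, k=2):
    """Recommend the customer's unbought items that score highest in their cluster's profile."""
    profile = cluster_profiles(prefs, assignment)[assignment[customer]]
    unbought = [j for j, owned in enumerate(prefs[customer]) if not owned and profile[j] > 0]
    return sorted(unbought, key=lambda j: profile[j], reverse=True)[:k]


# Hypothetical purchases of Books 1..6 (column index 0 = Book 1) and cluster labels.
prefs = {
    "A": [1, 0, 0, 1, 0, 0],
    "B": [0, 1, 1, 0, 1, 0],
    "C": [0, 1, 1, 0, 0, 0],
    "D": [0, 1, 0, 0, 0, 1],
    "E": [1, 0, 0, 1, 0, 0],
}
assignment = {"A": "Y", "B": "X", "C": "X", "D": "X", "E": "Y"}
print(cluster_profiles(prefs, assignment)["X"])
print(cluster_recommend(prefs, assignment, "C"))
```

With this toy data, the profile of cluster X rates Book 2 highest, then Book 3, then Books 5 and 6, and it rates Books 1 and 4 at zero.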
Clustering
- Customers B, C and D are « clustered » together. Customers A and E are clustered into another, separate group
- « Typical » preferences for CLUSTER are:
  - Book 2: very high
  - Book 3: high
  - Books 5 and 6: may be recommended
  - Books 1 and 4: not recommended at all
Clustering
- How does it work?
- Any customer classified as a member of CLUSTER will receive recommendations based on the preferences of the group:
  - Book 2 will be highly recommended to Customer F
Clustering
- Problem: customers may belong to more than one cluster; clusters may overlap
- Predictions are then averaged across the clusters, weighted by participation
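Here is a minimal sketch of this weighted average. It assumes that each customer has a participation weight per cluster, for instance the soft memberships produced by a fuzzy or probabilistic clustering method, and that each cluster has a « typical » preference vector. The prediction for item j is the sum over clusters c of w_c × profile_c[j], divided by the sum of the weights. The cluster names, weights, and profiles below are hypothetical.

```python
def soft_cluster_prediction(memberships, profiles):
    """Blend cluster-level predictions for a customer who belongs to several clusters.

    memberships: cluster -> participation weight of this customer (hypothetical values).
    profiles: cluster -> "typical" preference vector of that cluster.
    Returns, for each item, the participation-weighted average of the cluster profiles.
    """
    total = sum(memberships.values())
    if total <= 0:
        raise ValueError("participation weights must sum to a positive value")
    n_items = len(next(iter(profiles.values())))
    return [
        sum(w * profiles[c][j] for c, w in memberships.items()) / total
        for j in range(n_items)
    ]


# Hypothetical profiles for three items and a customer who is 75% in X, 25% in Y.
profiles = {"X": [0.0, 1.0, 0.5], "Y": [1.0, 0.0, 0.5]}
print(soft_cluster_prediction({"X": 0.75, "Y": 0.25}, profiles))
```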
Clustering
- Pros:
  - Clustering techniques work on aggregated data: faster
  - Clustering can also be applied as a « first step » to shrink the set of candidate neighbors in a collaborative filtering algorithm
- Cons:
  - Recommendations (per cluster) are less relevant than with collaborative filtering (per individual)
Association rules
- Clustering works at the group (cluster) level
- Collaborative filtering works at the customer level
- Association rules work at the item level
Association rules
- Past purchases are transformed into relationships between items that are commonly purchased together
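One way to mine such relationships is to count co-purchases and keep the one-to-one rules « A → B » that pass two thresholds:
- support: the number of baskets that contain both A and B;
- confidence: the fraction of baskets containing A that also contain B.

This is a simplified pairwise sketch, not a full algorithm such as Apriori. The baskets and both thresholds are hypothetical.

```python
from collections import Counter
from itertools import permutations


def pairwise_rules(baskets, min_support=2, min_confidence=0.5):
    """Mine one-to-one association rules "A -> B" from past purchases.

    support(A -> B)    = number of baskets containing both A and B
    confidence(A -> B) = support(A -> B) / number of baskets containing A
    """
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        items = set(basket)
        item_count.update(items)
        pair_count.update(permutations(items, 2))  # ordered pairs (A, B) with A != B
    rules = {}
    for (a, b), n in pair_count.items():
        conf = n / item_count[a]
        if n >= min_support and conf >= min_confidence:
            rules[(a, b)] = conf
    return rules


# Hypothetical purchase history, one list per basket.
baskets = [
    ["Book 3", "Book 5"],
    ["Book 2", "Book 3", "Book 5"],
    ["Book 2", "Book 3"],
    ["Book 1", "Book 4"],
]
for (a, b), conf in sorted(pairwise_rules(baskets).items()):
    print(f"{a} -> {b}: confidence {conf:.2f}")
```

Rules are directional. In this toy data, every basket with Book 5 also contains Book 3, so the rule Book 5 → Book 3 has confidence 1.0. The reverse rule, Book 3 → Book 5, has confidence of only 2/3.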
Association rules
- These association rules are then used to make recommendations
- If a visitor has shown interest in Book 5, they will be recommended Book 3 as well
- Of course, recommendations are restricted to rules that meet a minimum confidence level
Association rules
- What if recommendations can be made using more than one piece of information?
- Recommendations are aggregated:
  - If a visitor is interested in Books 3 and 5, they will be recommended Book 2, then Book 3
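Aggregation can be sketched as a vote. First, rules below a minimum confidence are discarded. Then every remaining rule whose antecedent the visitor is interested in « votes » for its consequent, with its confidence as the weight. Summing confidences is an assumption made here for illustration; taking the maximum is another option. The rule set is hypothetical.

```python
def aggregate_recommend(rules, interests, min_confidence=0.6):
    """Rank candidate items by the summed confidence of the rules the visitor's interests fire.

    rules: (antecedent, consequent) -> confidence, e.g. produced by a rule miner.
    interests: items the visitor has shown interest in.
    """
    scores = {}
    for (a, b), conf in rules.items():
        # Fire only confident rules whose consequent the visitor does not already have.
        if a in interests and b not in interests and conf >= min_confidence:
            scores[b] = scores.get(b, 0.0) + conf
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical one-to-one rules with their confidences.
rules = {
    ("Book 3", "Book 2"): 0.8,
    ("Book 5", "Book 2"): 0.7,
    ("Book 5", "Book 3"): 1.0,
    ("Book 5", "Book 6"): 0.9,
    ("Book 3", "Book 1"): 0.4,
}
print(aggregate_recommend(rules, {"Book 3", "Book 5"}))
```

Book 2 rises to the top because two separate rules support it. Book 1 is dropped because its only rule falls below the confidence threshold.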
Association rules
- Pros:
  - Fast to implement
  - Fast to execute
  - Not much storage space required
  - Not « individual » specific
  - Very successful in broad applications for large populations, such as shelf layout in retail stores
- Cons:
  - Not suitable if preferences change rapidly
  - It is tempting not to apply restrictive confidence rules, which lets weakly supported recommendations through
Information filtering
- Association rules compare items based on past purchases
- Information filtering compares items based on their content
- Also called « content-based filtering » or « content-based recommendations »
Information filtering
- What is the « content » of an item?
- It can be explicit « attributes » or « characteristics » of the item. For example, for a film:
  - Action / adventure
  - Featuring Bruce Willis
  - Year 1995
- It can also be « textual content » (title, description, table of contents, etc.)
  - Several techniques exist to compute the distance between two textual documents
Information filtering
- How does it work?
  - A textual document is scanned and parsed
  - Word occurrences are counted (words may be stemmed)
  - Some words or « tokens » are ignored. These include « stop words » (the, a, for) and words that do not appear in enough documents
  - Each document is transformed into a normalized TF-IDF vector of size N (Term Frequency / Inverse Document Frequency)
  - The distance between every pair of vectors is computed
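The pipeline can be sketched in plain Python under these assumptions:
- a tiny, illustrative stop-word list and no stemming;
- raw term counts as TF and log(n / df) as IDF, which is one of several TF-IDF variants;
- sparse dictionaries in place of dense size-N vectors;
- cosine distance (1 minus cosine similarity) as the distance measure.

The `min_df` parameter stands in for "words that do not appear in enough documents". The sample documents are hypothetical.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "the", "for", "of", "to", "your", "using"}  # illustrative list


def tokenize(text):
    """Parse a document into lowercase word tokens, dropping stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tfidf_vectors(docs, min_df=1):
    """Turn each document into an L2-normalized TF-IDF vector (a sparse dict term -> weight)."""
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter(term for c in counts for term in c)  # number of documents containing each term
    n = len(docs)
    vocab = {t for t, f in df.items() if f >= min_df}  # drop terms that are too rare
    vectors = []
    for c in counts:
        v = {t: tf * math.log(n / df[t]) for t, tf in c.items() if t in vocab}
        norm = math.sqrt(sum(w * w for w in v.values()))
        vectors.append({t: w / norm for t, w in v.items()} if norm else {})
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity; the vectors are already normalized, so a dot product suffices."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    return 1.0 - dot


docs = ["Data mining for CRM", "Mastering data mining", "Introduction to marketing"]
vecs = tfidf_vectors(docs)
for i in (1, 2):
    print(docs[i], round(cosine_distance(vecs[0], vecs[i]), 3))
```

Documents that share no weighted terms end up at the maximum distance of 1.0. Documents that share rarer terms end up closer together.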
Information filtering
- An (unrealistic) example: how can recommendations be computed among 8 books based only on their titles?
- Books selected:
  - Building data mining applications for CRM
  - Accelerating Customer Relationships: Using CRM and Relationship Technologies
  - Mastering Data Mining: The Art and Science of Customer Relationship Management
  - Data Mining Your Website
  - Introduction to marketing
  - Consumer behavior
  - marketing research, a handbook
  - Customer knowledge management
[Table: distances between the book-title vectors; the values are not recoverable from the extracted text]
Information filtering
- A customer is interested in the following book: « Building data mining applications for CRM »
- The system computes the distances between this book and the 7 others
- The « closest » books are recommended:
  - #1: Data Mining Your Website
  - #2: Accelerating Customer Relationships: Using CRM and Relationship Technologies
  - #3: Mastering Data Mining: The Art and Science of Customer Relationship Management
  - Not recommended: Introduction to marketing
  - Not recommended: Consumer behavior
  - Not recommended: marketing research, a handbook
  - Not recommended: Customer knowledge management
Information filtering
- Pros:
  - No need for past purchase history
  - Not extremely difficult to implement
- Cons:
  - « Static » recommendations
  - Not effective if the content is not very informative, e.g., information filtering is better suited to recommending technical books than novels or movies
Classifiers
- Classifiers are general computational models
- They may take as inputs:
  - A vector of item features (action / adventure, Bruce Willis)
  - Customer preferences (likes action / adventure)
  - Relations among items
- They may give as outputs:
  - A classification
  - A rank
  - A preference estimate
- The classifier can be a neural network, a Bayesian network, a rule induction model, etc.
- The classifier is trained using a training set
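As one concrete instance of the Bayesian family listed above, here is a minimal naive Bayes sketch. It predicts whether a customer will like an item from the item's binary features. The training set of rated items is hypothetical. Laplace (add-one) smoothing is an assumption made here to avoid zero probabilities. A real recommender would need a much larger training set.

```python
import math
from collections import defaultdict


def train_naive_bayes(examples):
    """Train a Bernoulli naive Bayes model on (feature_set, label) pairs.

    Each example is the feature set of an item the customer rated, labeled
    "like" or "dislike". The model stores raw counts; smoothing happens at prediction.
    """
    features = set().union(*(f for f, _ in examples))
    label_count = defaultdict(int)
    feature_count = defaultdict(int)  # (label, feature) -> number of items with that feature
    for feats, label in examples:
        label_count[label] += 1
        for f in feats:
            feature_count[(label, f)] += 1
    return features, dict(label_count), dict(feature_count)


def predict(model, feats):
    """Return the most probable label for an item with the given feature set."""
    features, label_count, feature_count = model
    total = sum(label_count.values())
    best, best_score = None, -math.inf
    for label, n in label_count.items():
        score = math.log(n / total)  # log prior
        for f in features:
            # Laplace-smoothed probability that an item with this label has feature f.
            p = (feature_count.get((label, f), 0) + 1) / (n + 2)
            score += math.log(p if f in feats else 1 - p)
        if score > best_score:
            best, best_score = label, score
    return best


# Hypothetical training set: items one customer rated.
examples = [
    ({"action", "bruce_willis"}, "like"),
    ({"action"}, "like"),
    ({"romance"}, "dislike"),
    ({"romance", "comedy"}, "dislike"),
]
model = train_naive_bayes(examples)
print(predict(model, {"action", "bruce_willis"}))
```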
Classifiers
- Pros:
  - Versatile
  - Can be combined with other methods to improve the accuracy of recommendations
- Cons:
  - Needs a relevant training set