- Number of slides: 22
Recommendation Systems
Prof. Dr. Daning Hu
Department of Informatics, University of Zurich
Nov 13th, 2012
Outline
- Introduction
- Recommendation Systems Approaches
  - Collaborative Filtering
  - Content-based
- Social Contagion
- Ref Book: Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences)
  - http://www.amazon.com/Social-Network-Analysis-Applications-Structural/dp/0521387078
Introduction
- Recommendation systems are a subclass of information filtering systems that seek to predict the 'rating' or 'preference' that a user would give to an item or social element they have not yet considered (Wiki), using a model built from
  - the characteristics of an item (content-based approaches), or
  - the user's social environment (collaborative filtering approaches).
- Studying consumer purchase behavior in e-commerce settings
  - In particular, the evolution of interactions among consumers and products reflected in online sales transactions.
Underlying Technologies: Machine Learning
- Recommendation systems are instances of personalization software,
  - adapting to the individual needs, interests, and preferences of each user,
  - as part of Customer Relationship Management (CRM).
- Machine Learning (ML) aims to learn a user model or profile of a particular user based on:
  - Sample interactions
  - Rated examples
- The learned model is used to filter information and predict consumer behavior.
Collaborative Filtering
- Maintain a database of many users' ratings of a variety of items.
- For a given user, find other similar users whose ratings strongly correlate with the current user's.
- Recommend items rated highly by these similar users but not yet rated by the current user.
- Used by Amazon, etc.
Collaborative Filtering
[Figure: each user in the user database has a vector of ratings for items A to Z. The active user's ratings are correlation-matched against the database, and recommendations (item C in the example) are extracted from the best-matching users.]
Collaborative Filtering Method
- Weight all users with respect to similarity with the active user.
- Select a subset of the users (neighbors) to use as predictors.
- Normalize ratings and compute a prediction from a weighted combination of the selected neighbors' ratings.
- Present items with highest predicted ratings as recommendations.
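The four steps above can be sketched in Python. This is only an illustration under stated assumptions: the similarity weights are taken as given (the values below are hypothetical), "normalize" is interpreted as mean-centering each neighbor's ratings, and the names `predict_rating` and `recommend` are invented for this example.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def predict_rating(active, others, weights, item, k=2):
    """Predict the active user's rating for `item` from the k most similar neighbors."""
    # Step 2: neighbors are the k highest-weighted users who rated the item.
    candidates = [u for u in others
                  if item in others[u] and weights.get(u, 0.0) != 0.0]
    neighbors = sorted(candidates, key=lambda u: abs(weights[u]), reverse=True)[:k]
    base = mean(active.values())
    norm = sum(abs(weights[u]) for u in neighbors)
    if norm == 0:
        return base  # no usable neighbor: fall back to the user's own mean
    # Step 3: mean-center each neighbor's rating (assumed normalization),
    # then combine the deviations weighted by similarity.
    offset = sum(weights[u] * (others[u][item] - mean(others[u].values()))
                 for u in neighbors)
    return base + offset / norm


def recommend(active, others, weights, k=2, top_n=1):
    """Step 4: return the unrated items with the highest predicted ratings."""
    unrated = sorted({i for row in others.values() for i in row} - set(active))
    scores = {i: predict_rating(active, others, weights, i, k) for i in unrated}
    return sorted(unrated, key=lambda i: scores[i], reverse=True)[:top_n]


# Toy data; the weights (step 1) are hypothetical and assumed precomputed.
active = {"A": 9, "B": 3, "Z": 5}
others = {
    "u1": {"A": 10, "B": 4, "C": 8, "Z": 1},
    "u2": {"A": 6, "B": 4, "C": 2, "Z": 5},
}
weights = {"u1": 0.9, "u2": 0.3}
print(recommend(active, others, weights))
```

Other normalizations (for example z-scores) and other neighbor-selection rules (for example a similarity threshold) would fit the same outline.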
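Similarity Weighting
- Typically use the Pearson correlation coefficient between the ratings of the active user, a, and another user, u:
  $$c_{a,u} = \frac{\mathrm{covar}(r_a, r_u)}{\sigma_{r_a}\,\sigma_{r_u}}$$
- $r_a$ and $r_u$ are the ratings vectors for the $m$ items rated by both a and u; $r_{i,j}$ is user i's rating for item j.
- Covariance:
  $$\mathrm{covar}(r_a, r_u) = \frac{1}{m}\sum_{j=1}^{m} (r_{a,j} - \bar{r}_a)(r_{u,j} - \bar{r}_u), \qquad \bar{r}_x = \frac{1}{m}\sum_{j=1}^{m} r_{x,j}$$
- Standard deviation:
  $$\sigma_{r_x} = \sqrt{\frac{1}{m}\sum_{j=1}^{m} (r_{x,j} - \bar{r}_x)^2}$$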
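As a small illustration, the Pearson weight between two users can be computed over their co-rated items as follows. The function name `pearson` and the choice to return 0.0 when the correlation is undefined (fewer than two common items, or constant ratings) are assumptions of this sketch, not something the slides specify.

```python
from math import sqrt


def pearson(ratings_a, ratings_u):
    """Pearson correlation over the items rated by both users."""
    common = sorted(set(ratings_a) & set(ratings_u))
    m = len(common)
    if m < 2:
        return 0.0  # assumed convention: too little overlap to correlate
    mean_a = sum(ratings_a[j] for j in common) / m
    mean_u = sum(ratings_u[j] for j in common) / m
    covar = sum((ratings_a[j] - mean_a) * (ratings_u[j] - mean_u)
                for j in common) / m
    sd_a = sqrt(sum((ratings_a[j] - mean_a) ** 2 for j in common) / m)
    sd_u = sqrt(sum((ratings_u[j] - mean_u) ** 2 for j in common) / m)
    if sd_a == 0 or sd_u == 0:
        return 0.0  # assumed convention: constant ratings carry no signal
    return covar / (sd_a * sd_u)


a = {"A": 1, "B": 2, "C": 3}
u = {"A": 2, "B": 4, "C": 6, "D": 9}   # same ordering of A, B, C as user a
v = {"A": 3, "B": 2, "C": 1}           # opposite ordering
print(pearson(a, u), pearson(a, v))
```

Only items rated by both users enter the sums, so item D in the example is ignored.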
Cons
- Cold start: there must already be enough other users in the system to find a match.
- Sparsity: the user/ratings matrix is sparse, so it is hard to find users who have rated the same items.
- First rater: cannot recommend an item that has not been previously rated.
- Popularity bias: cannot recommend items to someone with unique tastes.
  - Tends to recommend popular items.
Content-based Approaches
- Recommendations are based on information about the content of items rather than on other users' opinions.
- Uses machine learning algorithms to induce a profile of the user's preferences from examples, based on content features.
  - No need for data on other users.
  - No cold-start or sparsity problems.
  - Able to recommend to users with unique tastes.
  - No first-rater problem.
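One very simple way to induce such a profile is sketched below. The slides do not name a learning algorithm, so this centroid-style profile with cosine scoring is an assumption chosen for brevity, and the item features and function names are hypothetical.

```python
from math import sqrt


def build_profile(item_features, user_ratings):
    """Sum the feature vectors of rated items, weighted by (rating - user mean)."""
    mean = sum(user_ratings.values()) / len(user_ratings)
    profile = {}
    for item, rating in user_ratings.items():
        for feat, val in item_features[item].items():
            profile[feat] = profile.get(feat, 0.0) + (rating - mean) * val
    return profile


def cosine(p, q):
    dot = sum(p[f] * q.get(f, 0.0) for f in p)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def rank_unrated(item_features, user_ratings):
    """Rank items the user has not rated by similarity to the profile."""
    profile = build_profile(item_features, user_ratings)
    unrated = sorted(i for i in item_features if i not in user_ratings)
    return sorted(unrated,
                  key=lambda i: cosine(profile, item_features[i]),
                  reverse=True)


# Hypothetical items described by binary content features.
item_features = {
    "book1": {"scifi": 1, "space": 1},
    "book2": {"romance": 1},
    "book3": {"scifi": 1, "robots": 1},
    "book4": {"romance": 1, "drama": 1},
}
user_ratings = {"book1": 9, "book2": 2}
print(rank_unrated(item_features, user_ratings))
```

Note that only this one user's ratings are used, which is why no data on other users is needed.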
Combining Content and Collaboration
- Content-based and collaborative methods have complementary strengths and weaknesses.
- Combine methods to obtain the best of both:
  - Apply both methods and combine recommendations.
  - Use collaborative data as content.
  - Use content-based predictor as another collaborator.
  - Use content-based predictor to complete collaborative data.
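The last option, completing the collaborative data with a content-based predictor, can be sketched as follows. The predictor here is a deliberately crude stand-in (average of the user's ratings within the same genre, an assumption of this example); any content-based model with the same call signature could be plugged in.

```python
def complete_ratings(ratings, items, content_predict):
    """Fill every missing (user, item) cell with a content-based prediction."""
    return {
        user: {i: row[i] if i in row else content_predict(user, i) for i in items}
        for user, row in ratings.items()
    }


# Hypothetical content data and a toy content-based predictor.
genres = {"A": "scifi", "B": "romance", "C": "scifi"}
ratings = {"ann": {"A": 9, "B": 3}, "bob": {"A": 8, "C": 7}}


def genre_predictor(user, item):
    """Mean of the user's ratings in the item's genre, else the user's overall mean."""
    row = ratings[user]
    same = [r for i, r in row.items() if genres[i] == genres[item]]
    pool = same if same else list(row.values())
    return sum(pool) / len(pool)


dense = complete_ratings(ratings, ["A", "B", "C"], genre_predictor)
print(dense)
```

The completed matrix can then be passed to a collaborative filtering step, which no longer has to cope with missing entries.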
Using Social Contagion for Recommendations
- Intelligent Advertising, Product Recommendation
- Who are the most influential people?
- What are the patterns of information diffusion?
Social Contagion Theory – Le Bon et al. 1895
- Le Bon, Park and Blumer, the three major theorists, assumed that something happens in a crowd situation that can cause people to become irrational.
- The social pathology and social contagion perspectives: the idea that someone who already has the affliction (behavior) can pass it on to someone else, and it can rapidly infect others
  - Gabriel Tarde's work on the 'laws of imitation'
- Applications: viral marketing, social media marketing
Social Recommendations for Marketing
- Mass marketing is not the best way to attract people
  - Expensive
  - Usually not very focused
- Recommendations by people we know are more effective than input from unknown individuals
  - Content: Our friends know what we like
  - Homophily: Our friends and we are more likely to share interests and preferences
  - Biased: We (usually) listen more to what our friends say
  - Inexpensive
Data
- The dataset for this study was collected from a large online OSS community, Ohloh, which provides information about 11,800 OSS projects involving 94,330 people
  - Positive evaluation relationships
  - Developers' sociological features: nationality, geographical location, etc.
  - OSS project related information: primary programming language, development activity, ratings, etc.
  - From software revision control repositories: Subversion, CVS and Git.
- The Ohloh web site provides a REST-based application programming interface (API) for users to access and query its data.
Figure 1. Sample data from Ohloh developers
Statistical Analysis on Link Formation
- Dependent variable: whether a developer D participates in an OSS project P at time T, coded as a binary "Kudo" link variable.
- Independent variables include three types of possible determinants
  - Homophily factors
  - Shared affiliation factors
  - Preferential attachment factors
Conditional Logit Analysis
- Conditional logistic models (CLMs) have been widely used to examine the determinants that affect individuals' choices (McFadden 1980; McFadden et al. 1974; Powell et al. 2005).
  - Model human choice behavior: project participation choices.
- It is specified as follows:
  $$P(y_i = 1 \mid X) = \frac{\exp(\beta^{\top} X_i)}{\sum_{j \in J} \exp(\beta^{\top} X_j)}$$
  where $y_i$ is the observed choice of the new developer to participate in project i, and $X_i$ is a vector of the factors that influence such a choice. $J$ is the alternative set of projects available. The unknown coefficients $\beta$ are typically estimated by maximum likelihood methods.
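To make the specification concrete, the sketch below computes conditional logit choice probabilities over an alternative set and the log-likelihood that a maximum likelihood routine would maximize. It assumes the standard conditional logit form (a softmax over the linear scores of the alternatives); the function names and the toy factor vectors are hypothetical, and no optimizer is included.

```python
from math import exp, log


def choice_probabilities(beta, alternatives):
    """P(choose alternative i) = exp(beta.x_i) / sum_j exp(beta.x_j)."""
    utilities = [sum(b * x for b, x in zip(beta, x_j)) for x_j in alternatives]
    top = max(utilities)                      # subtract the max for numerical stability
    weights = [exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(beta, choice_sets):
    """Sum of log-probabilities of the observed choices.

    choice_sets: list of (alternatives, chosen_index) pairs, one per decision.
    """
    return sum(log(choice_probabilities(beta, alts)[chosen])
               for alts, chosen in choice_sets)


# Toy alternative set: three projects, two factors each (hypothetical values).
alternatives = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(choice_probabilities([0.0, 0.0], alternatives))   # all-zero coefficients
print(log_likelihood([0.5, 0.2], [(alternatives, 2)]))
```

With all coefficients at zero every alternative is equally likely, and estimation amounts to searching for the coefficients that make the observed choices most probable.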
Predicting Future Evaluation Choices
- Our analysis also provides a prediction mechanism using the conditional logistic model and the discovered determinants.
- For instance, if developer a and developer b
  - Live in New York City (coefficient of homophily in location: 5.190)
  - Use Java as their primary programming language (coefficient: 1.623)
  - etc.
- then the probability that a chooses to positively evaluate b from an alternative set can be calculated.
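A worked illustration of this calculation, using only the two coefficients quoted above: the alternative set (developers b, c, d) and their indicator values are hypothetical, and the remaining determinants (the "etc.") are left out, so the resulting numbers are illustrative only.

```python
from math import exp

# Coefficients quoted on the slide.
BETA_SAME_CITY = 5.190
BETA_SAME_LANGUAGE = 1.623

# Hypothetical alternative set for developer a:
# (shares a's city?, shares a's primary language?) as 0/1 indicators.
candidates = {
    "b": (1, 1),   # lives in New York City and uses Java, like a
    "c": (0, 1),   # uses Java only
    "d": (0, 0),   # shares neither
}


def evaluation_probabilities(candidates):
    """Conditional logit probabilities that a positively evaluates each candidate."""
    scores = {name: BETA_SAME_CITY * city + BETA_SAME_LANGUAGE * lang
              for name, (city, lang) in candidates.items()}
    total = sum(exp(s) for s in scores.values())
    return {name: exp(s) / total for name, s in scores.items()}


probs = evaluation_probabilities(candidates)
for name in sorted(probs):
    print(name, round(probs[name], 4))
```

Because the location coefficient is so much larger than the language coefficient, sharing a city dominates the predicted choice in this toy alternative set.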


