Interdisciplinary Privacy Course, June 2010. Web Mining and Privacy. Bettina Berendt, K.U. Leuven, Belgium, www.berendt.de
What is Web Mining? And who am I?
Knowledge discovery (aka data mining): "the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data."
Web mining: the application of data mining techniques on the content, (hyperlink) structure, and usage of Web resources (navigation, queries, content access & creation).
Web mining areas: Web content mining, Web structure mining, Web usage mining.
Why Web / data mining? 3
Agenda
1. What is (Web) data mining? And what does it have to do with privacy? (a simple view)
2. Examples of data mining and "privacy-preserving data mining": association-rule mining (& privacy-preserving AR mining); collaborative filtering (& privacy-preserving collaborative filtering)
3. A second look at ... privacy
4. A second look at ... Web / data mining
5. The goal: more than modelling and hiding - towards a comprehensive view of Web mining and privacy. Threats, opportunities and solution approaches.
6. An outlook: data mining for privacy
What is (Web) data mining? And what does it have to do with privacy? (vis-à-vis the cryptographic techniques we heard about yesterday) – a simple view – 5
1. Behaviour on the Web (and elsewhere) Data 6
2. Web (and other data) mining: data → privacy problems!
Privacy Problems: Example 1 (the AOL search-log release). Technical background of the problem:
- the dataset allows for Web mining (e.g., which search queries lead to which site choices),
- it violates k-anonymity (e.g., "Lilburn": likely k = the number of inhabitants of Lilburn).
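A quick way to see why such a release violates k-anonymity is to count how many distinct individuals share each combination of quasi-identifying values. A minimal sketch in Python with pandas; the column names and toy records are purely illustrative assumptions, not the AOL schema:

```python
import pandas as pd

# Hypothetical search-log release: pseudonym plus quasi-identifying fields.
log = pd.DataFrame({
    "pseudonym": ["4417749", "4417749", "0815", "0815", "0815"],
    "town":      ["Lilburn", "Lilburn", "Springfield", "Springfield", "Springfield"],
    "age_range": ["60-70",   "60-70",   "20-30",       "20-30",       "20-30"],
})

def k_anonymity(df, quasi_identifiers):
    """Smallest group size over all combinations of quasi-identifier values."""
    # Count each pseudonym only once per quasi-identifier combination.
    groups = df.drop_duplicates(["pseudonym"] + quasi_identifiers)
    return groups.groupby(quasi_identifiers).size().min()

print(k_anonymity(log, ["town", "age_range"]))  # 1 -> at least one person is uniquely identifiable
```

A result of k = 1, as here, means some combination of quasi-identifiers (a small town plus an age range, say) singles out a single user, which is exactly the "Lilburn" situation on the slide.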
Privacy Problems: Example 2. Where do people live who will buy the Koran soon? Technical background of the problem: a mashup of different data sources (Amazon wishlists, Yahoo! People (addresses), Google Maps), each with insufficient k-anonymity, allows for attribute matching and thereby inferences.
Privacy Problems: Example 3. Predicting political affiliation from Facebook profile and link data (1): most conservative traits (trait name, trait value, weight):
- Group: george w bush is my homeboy (45.88831329)
- Group: college republicans (40.51122488)
- Group: texas conservatives (32.23171423)
- Group: bears for bush (30.86484689)
- Group: kerry is a fairy (28.50250433)
- Group: aggie republicans (27.64720818)
- Group: keep facebook clean (23.653477)
- Group: i voted for bush (23.43173116)
- Group: protect marriage one man one woman (21.60830487)
[Lindamood et al. 09 & Heatherly et al. 09]
Predicting political affiliation from Facebook profile and link data (2): most liberal traits (trait name, trait value, weight):
- activities: amnesty international (4.659100601)
- employer: hot topic (2.753844959)
- favorite tv shows: queer as folk (9.762900035)
- grad school: computer science (1.698146579)
- hometown: mumbai (3.566007713)
- relationship status: in an open relationship (1.617950632)
- religious views: agnostic (3.15756412)
- looking for: whatever i can get (1.703651985)
[Lindamood et al. 09 & Heatherly et al. 09]
3. Cryptographic privacy solutions: data ("not all!")
4. "Privacy-preserving data mining": data ("not all!")
Two examples: Association-rule mining (& privacy-preserving AR mining) Collaborative filtering (& privacy-preserving collaborative filtering) 14
Reminder: the use of AR mining for store layout (Amazon; earlier: Wal-Mart, ...). Where to put: spaghetti, butter?
Data: "market basket data": attributes with boolean domains. In a table, each row is a basket (aka transaction).
Transaction 1: spaghetti, tomato sauce
Transaction 2: spaghetti, bread
Transaction 3: spaghetti, tomato sauce, bread
Transaction 4: bread, butter
Transaction 5: bread, tomato sauce
Generating large k-itemsets with Apriori (minimum support = 40%), on the transactions above.
Step 1: candidate 1-itemsets
- spaghetti: support = 3 (60%)
- tomato sauce: support = 3 (60%)
- bread: support = 4 (80%)
- butter: support = 1 (20%)
Contd.
Step 2: large 1-itemsets: spaghetti, tomato sauce, bread
Candidate 2-itemsets:
- {spaghetti, tomato sauce}: support = 2 (40%)
- {spaghetti, bread}: support = 2 (40%)
- {tomato sauce, bread}: support = 2 (40%)
Contd.
Step 3: large 2-itemsets: {spaghetti, tomato sauce}, {spaghetti, bread}, {tomato sauce, bread}
Candidate 3-itemsets:
- {spaghetti, tomato sauce, bread}: support = 1 (20%)
Step 4: large 3-itemsets: {} (none)
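A minimal, illustrative Python sketch of this level-wise candidate generation and pruning on the toy baskets above (an unoptimised sketch of the Apriori idea, not a production implementation):

```python
from itertools import combinations

baskets = [
    {"spaghetti", "tomato sauce"},
    {"spaghetti", "bread"},
    {"spaghetti", "tomato sauce", "bread"},
    {"bread", "butter"},
    {"bread", "tomato sauce"},
]
min_support = 0.4  # 40%

def support(itemset):
    """Fraction of baskets that contain the whole itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def apriori(baskets, min_support):
    items = {item for basket in baskets for item in basket}
    large = [{frozenset([i]) for i in items if support({i}) >= min_support}]
    while large[-1]:
        # Candidate (k+1)-itemsets: unions of large k-itemsets that differ in one item,
        # kept only if every k-subset is itself large (the apriori pruning step).
        candidates = {a | b for a in large[-1] for b in large[-1] if len(a | b) == len(a) + 1}
        candidates = {c for c in candidates
                      if all(frozenset(s) in large[-1] for s in combinations(c, len(c) - 1))}
        large.append({c for c in candidates if support(c) >= min_support})
    return [level for level in large if level]

for level in apriori(baskets, min_support):
    print(sorted(map(sorted, level)))
```

Running it reproduces the slides: the large 1-itemsets {spaghetti}, {tomato sauce}, {bread} and the three large 2-itemsets, while {spaghetti, tomato sauce, bread} is generated as a candidate but dropped at 20% support.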
The apriori principle and the pruning of the search tree - an example of "the data mining approach". The search space is the lattice of all itemsets over {spaghetti, tomato sauce, bread, butter}: the four 1-itemsets, the six 2-itemsets, the four 3-itemsets, and the full 4-itemset {spaghetti, tomato sauce, bread, butter}. As soon as an itemset is known to be infrequent (here, anything containing butter), all of its supersets can be pruned from the search.
From itemsets to association rules.
Schema: if subset, then large k-itemset, with support s and confidence c:
- s = (support of the large k-itemset) / (# transactions)
- c = (support of the large k-itemset) / (support of the subset)
Example: if {spaghetti} then {spaghetti, tomato sauce}
- support: s = 2/5 (40%)
- confidence: c = 2/3 (66%)
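Rule generation is a simple post-processing step over the frequent itemsets. A short sketch, reusing the `baskets` and `support()` names from the Apriori example above (so those names are assumptions carried over from that sketch):

```python
from itertools import combinations

def rules(itemset, min_confidence=0.6):
    """Generate rules 'antecedent -> itemset' with their support and confidence."""
    out = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(itemset, size):
            s = support(set(itemset))                              # support of the whole itemset
            c = support(set(itemset)) / support(set(antecedent))   # confidence of the rule
            if c >= min_confidence:
                out.append((set(antecedent), set(itemset), round(s, 2), round(c, 2)))
    return out

print(rules({"spaghetti", "tomato sauce"}))
# includes ({'spaghetti'}, {'spaghetti', 'tomato sauce'}, 0.4, 0.67), matching the slide's 40% / 66%
```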
Two examples: Association-rule mining (& privacy-preserving AR mining) Collaborative filtering (& privacy-preserving collaborative filtering) 26
Privacy-preserving data mining (PPDM)
- Database inference problem: "the problem that arises when confidential information can be derived from released data by unauthorized users"
- Objective of PPDM: "develop algorithms for modifying the original data in some way, so that the private data and private knowledge remain private even after the mining process"
- Approaches:
  - Data distribution: decentralized holding of data
  - Data modification: aggregation/merging into coarser categories; perturbation or blocking of attribute values; swapping values of individual records; sampling
  - Data or rule hiding: push the support of sensitive patterns below a threshold
"Privacy-preserving Web mining" example: find patterns, unlink personal data Volvo S 40 website targets people in 20 s n Are visitors in their 20 s or 40 s? n Which demographic groups like/dislike the website? n An example of the "Randomization Approach" to PPDM: R. Agrawal and R. Srikant, "Privacy Preserving Data Mining", SIGMOD 2000. 28
Randomization approach: overview. Original records (e.g. age 30, salary 70K; age 50, salary 40K) pass through a randomizer, which releases perturbed records (e.g. 65, 20K; 25, 60K). From these, the distributions of age and salary are reconstructed and fed into the data mining algorithms, which produce the model.
Reconstruction problem. Original values x_1, x_2, ..., x_n are drawn from an (unknown) probability distribution X. To hide these values, we add y_1, y_2, ..., y_n drawn from a known probability distribution Y. Given the perturbed values x_1 + y_1, x_2 + y_2, ..., x_n + y_n and the probability distribution of Y, estimate the probability distribution of X.
Intuition (Reconstruct single point) Use Bayes' rule for density functions 31
Reconstructing the distribution: combine the estimates of where each point came from, over all points; this gives an estimate of the original distribution.
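A minimal numerical sketch of this iterative Bayesian reconstruction idea (a discretised version of the Agrawal-Srikant estimator; the uniform noise, bin widths and fixed iteration count are simplifying assumptions made here, not choices from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Original (hidden) values and the noise used to randomize them.
x = rng.normal(40, 8, size=5000)          # an "age"-like attribute
y = rng.uniform(-20, 20, size=5000)       # noise from a known distribution Y
w = x + y                                 # only w and the distribution of Y are published

bins = np.linspace(0, 100, 51)            # discretise the domain of X into 50 bins
mid = (bins[:-1] + bins[1:]) / 2
f_y = lambda d: np.where(np.abs(d) <= 20, 1 / 40, 0.0)   # density of the uniform noise Y

f_x = np.full(len(mid), 1 / len(mid))     # start from a uniform estimate of f_X
for _ in range(20):                       # fixed number of iterations (assumption)
    # Posterior that observation w_i originated from bin a: f_Y(w_i - a) * f_X(a), normalised per point.
    post = f_y(w[:, None] - mid[None, :]) * f_x[None, :]
    post /= post.sum(axis=1, keepdims=True) + 1e-12
    f_x = post.mean(axis=0)               # new estimate: average posterior over all points
    f_x /= f_x.sum()

print(mid[np.argmax(f_x)])                # the recovered mode lies near the true one (~40)
```

The point of the slide carries over: the individual x_i stay hidden behind the noise, yet the shape of their distribution can be recovered well enough for mining.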
Reconstruction: Bootstrapping 34
Seems to work well! 35
Two examples: Association-rule mining (& privacy-preserving AR mining) Collaborative filtering (& privacy-preserving collaborative filtering) 36
What is collaborative filtering? "People like what people like them like" – regardless of support and confidence 37
User-based collaborative filtering
- Idea: people who agreed in the past are likely to agree again
- To predict a user's opinion for an item, use the opinions of similar users
- Similarity between users is decided by looking at their overlap in opinions for other items
- Next step: build a model of user types ("global model" rather than "local patterns" as the mining result)
Example: user-based collaborative filtering (rows = users, columns = Items 1-5; "?" = rating to be predicted):
User 1: 8, 1, ?, 2, 7
User 2: 2, ?, 5, 7, 5
User 3: 5, 4, 7 (only three items rated)
User 4: 7, 1, 7, 3, 8
User 5: 1, 7, 4, 6, 5
User 6: 8, 3, 7 (only three items rated)
Similarity between users (same ratings as above):
User 1: 8, 1, ?, 2, 7
User 2: 2, ?, 5, 7, 5
User 4: 7, 1, 7, 3, 8
- How similar are users 1 and 2?
- How similar are users 1 and 4?
- How do you calculate similarity?
Popular similarity measures:
- cosine-based similarity
- adjusted cosine-based similarity
- correlation-based similarity
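The formulas on the original slide were images and are not preserved here. As a rough guide, a Python sketch of the three measures applied to the rating rows above, computed only over co-rated items (exact definitions, especially of "adjusted cosine", vary between sources, so treat this as one common reading):

```python
import numpy as np

def corated(u, v):
    """Indices of items rated by both users (NaN marks 'no rating')."""
    return ~np.isnan(u) & ~np.isnan(v)

def cosine_sim(u, v):
    m = corated(u, v)
    return np.dot(u[m], v[m]) / (np.linalg.norm(u[m]) * np.linalg.norm(v[m]))

def correlation_sim(u, v):            # Pearson correlation over co-rated items
    m = corated(u, v)
    uc, vc = u[m] - u[m].mean(), v[m] - v[m].mean()
    return np.dot(uc, vc) / (np.linalg.norm(uc) * np.linalg.norm(vc))

def adjusted_cosine_sim(u, v):        # subtract each user's own mean rating before the cosine
    uc, vc = u - np.nanmean(u), v - np.nanmean(v)
    m = corated(u, v)
    return np.dot(uc[m], vc[m]) / (np.linalg.norm(uc[m]) * np.linalg.norm(vc[m]))

user1 = np.array([8, 1, np.nan, 2, 7])
user2 = np.array([2, np.nan, 5, 7, 5])
user4 = np.array([7, 1, 7, 3, 8])
print(cosine_sim(user1, user2), correlation_sim(user1, user4))
```

On these rows, user 4 comes out far more similar to user 1 than user 2 does, which is the intuition the "how similar are users 1 and 2 / 1 and 4?" questions aim at.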
Algorithm 2: k-nearest neighbours are people who have historically had the same taste as our user (the original figure shows example neighbour ratings 7, 5, 7, 8, 4 for the target item). Aggregation function: often a weighted sum; the weight depends on the similarity.
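One common way to write this similarity-weighted aggregation (a standard textbook form; the slide itself does not fix the exact formula) is:

```latex
\hat{r}_{u,i} \;=\; \bar{r}_u \;+\;
\frac{\sum_{v \in N_k(u)} \mathrm{sim}(u,v)\,\bigl(r_{v,i} - \bar{r}_v\bigr)}
     {\sum_{v \in N_k(u)} \bigl|\mathrm{sim}(u,v)\bigr|}
```

Here N_k(u) is the set of the k most similar users to u who have rated item i, and the mean-centering with the users' average ratings compensates for people who rate systematically high or low.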
Outlook: model-based collaborative filtering. Instead of using ratings directly ("memory-based collaborative filtering"), develop a model of user ratings and use the model to predict ratings for new items. To build the model: Bayesian networks (probabilistic), clustering (classification), rule-based approaches (e.g., association rules between co-purchased items).
Two examples: Association-rule mining (& privacy-preserving AR mining) Collaborative filtering (& privacy-preserving collaborative filtering) 46
Collaborative filtering: idea and architecture
- Basic idea of collaborative filtering: "Users who liked this also liked ...": generalize from "similar profiles"
- Standard solution, at the community site / centralized: compute, from all users and their ratings/purchases etc., a global model; to derive a recommendation for a given user, find "similar profiles" in this model and derive a prediction
- Mathematically: depends on simple vector computations in the user-item space
Distributed data mining / secure multi-party computation: the principle explained by secure sum. Given values x_1, ..., x_n belonging to n entities, compute the sum S = x_1 + ... + x_n such that each entity ONLY knows its own input and the result of the computation (the aggregate sum of the data).
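A toy sketch of the secure-sum protocol (ring topology, arithmetic modulo a large number M; the modulus and the example values are illustrative assumptions, and the usual semi-honest, non-colluding setting is assumed):

```python
import random

M = 10**9                      # modulus, assumed larger than any possible sum
values = [12, 7, 30, 5]        # x_1 .. x_n, one private value per party

def secure_sum(values):
    r = random.randrange(M)          # party 1 masks its value with a random offset
    running = (values[0] + r) % M
    for x in values[1:]:             # each further party only sees a masked partial sum
        running = (running + x) % M
    return (running - r) % M         # party 1 removes its mask, revealing only the total

print(secure_sum(values), sum(values))   # both print 54
```

Because every intermediate value is uniformly random modulo M, no party learns anything about the others' inputs beyond the final sum, as long as parties do not collude.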
Canny: collaborative filtering with privacy
- Each user starts with their own preference data and knowledge of who their peers are in their community.
- By running the protocol, users exchange various encrypted messages.
- At the end of the protocol, every user has an unencrypted copy of the linear model (Λ, ψ) of the community's preferences.
- They can then use this model to extrapolate their own ratings.
- At no stage does unencrypted information about a user's preferences leave their own machine.
- Users outside the community can request a copy of the model (Λ, ψ) from any community member and derive recommendations for themselves.
Canny (2002), Proc. IEEE Symp. Security and Privacy; Proc. SIGIR
Privacy-preserving data publishing (PPDP)
- Taking a broader look: in contrast to the general assumptions of PPDM, arbitrary mining methods may be performed after publishing; hence adversary models are needed.
- Objective: "access to published data should not enable the attacker to learn anything extra about any target victim compared to no access to the database, even with the presence of any attacker's background knowledge obtained from other sources" (this needs to be relaxed by assumptions about the background knowledge).
- A comprehensive current survey: Fung et al., ACM Computing Surveys, 2010.
... and many more (and more advanced and comprehensive) PPDM methods and algorithms ... Problem solved?
A second look at. . . privacy (very much influenced by joint work with Seda Gürses, see her presentation in this course) 52
1. Privacy as confidentiality: "the right to be let alone" - and to hide data. Is this all there is to privacy?
2. Privacy as control: informational self-determination ("Don't do THIS!")
- e.g. data privacy: "the right of the individual to decide what information about himself should be communicated to others and under what circumstances" (Westin, 1970)
- behind much of data-protection legislation (see Eleni Kosta's talk)
3. Privacy as practice: identity construction
3. Privacy as practice: identity construction and the societal re-negotiation of the public/private divide; "privacy negotiations" (incl. work by/with Teltzrow, Preibusch, Spiekermann)
A second look at. . . Web / data mining 57
Is this how it works? 58
No ...!
- How do people like/buy books?
- Should we show the recommendations at the top or bottom of the page? Only to registered customers?
- What if someone bought a book as a present for their father?
- What do our Web-server logs tell us about viewing behaviour?
- How can we combine Web-server and transaction logs?
- Which data noise do we have to remove from our logs?
- Which of these association rules are frequent/confident enough?
Data mining IS-PART-OF Knowledge Discovery? Data mining IS-PART-OF Knowledge Discovery! 60
As an aside: what can happen when "business understanding" and data understanding are neglected 61
62 The goal: More than modelling and hiding – Towards a comprehensive view of Web mining and privacy [Berendt, Data Mining and Knowledge Discovery, accepted]
Goal: from the simple view ... towards a more complex view
Plan: a matrix of privacy notions (privacy as confidentiality, as control, as practice) versus KD phases (business / application understanding, data preparation, modelling, evaluation, deployment); for each cell: challenges, opportunities, solution approaches. (In the following, a simplification: by phase; cell differentiation only in selected cases.)
Business understanding: Business models based on personal-data-as-value 65
Business understanding: Business models based on avoiding data collection 66
Data understanding, in particular data collection
- Threats: data collection may in itself be intrusive
- Opportunities: new forms of data collection (e.g. anonymous incident reporting)
- Solution approaches: anonymisation technology; use of pseudonyms; other PETs that lead to fewer data being collected
Data preparation: data selection and data integration
- Threats: data selection and integration can lead to record linkage and therefore inferences; control via purpose limitation becomes essential
- ... threat or opportunity?
Data integration: an example
- Paper published by the MovieLens team (collaborative-filtering movie ratings), who were considering publishing a ratings dataset; see http://movielens.umn.edu/
- Public dataset: users mention films in forum posts
- Private dataset (may be released, e.g., for research purposes): users' ratings
- Film IDs can easily be extracted from the posts
- Observation: every user will talk about items from a sparse relation space (those, generally few, films s/he has seen)
[Frankowski, D., Cosley, D., Sen, S., Terveen, L., & Riedl, J. (2006). You are what you say: Privacy risks of public mentions. In Proc. SIGIR'06]
Generalisation with more robust de-anonymization attacks and different data: [Narayanan, A., & Shmatikov, V. (2009). De-anonymizing social networks. In Proc. 30th IEEE Symposium on Security and Privacy]
Merging identities - the computational problem
- Given a target user t from the forum users, find similar users (in terms of which items they related to) in the ratings dataset
- Rank these users u by their likelihood of being t; e.g., measure the likelihood by TF-IDF weights over the items m the user mentioned
- Evaluate:
  - If t is in the top k of this list, then t is k-identified
  - Count the percentage of users who are k-identified
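A rough sketch of this ranking step. The exact scoring of Frankowski et al. differs in its details; an IDF-style weighting of mentioned items (rare items are highly identifying) is the simplifying assumption here, and the toy user/item ids are made up:

```python
import math
from collections import Counter

# ratings[user] = set of item ids that user rated in the private dataset
ratings = {
    "u1": {"m1", "m2", "m3"},
    "u2": {"m2", "m4"},
    "u3": {"m1", "m5", "m6"},
}
n_users = len(ratings)
# IDF: items rated by few users carry the most identifying weight
idf = {m: math.log(n_users / sum(m in items for items in ratings.values()))
       for items in ratings.values() for m in items}

def rank_candidates(mentioned_items):
    """Rank ratings-dataset users by how well they match a forum user's mentioned items."""
    scores = Counter()
    for user, items in ratings.items():
        scores[user] = sum(idf[m] for m in mentioned_items if m in items)
    return scores.most_common()          # the target is k-identified if it appears in the top k

print(rank_candidates({"m5", "m6"}))     # u3 ranks first: rare items are highly identifying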
Results 72
What do you think helps? 73
Data preparation: data construction - definition and examples
"Constructive data preparation operations such as the production of derived attributes, entire new records, or transformed values for existing attributes." May involve usage and/or prediction of values for attributes such as:
- gender, age
- ethnicity, skin colour, sexual orientation
- "people who are nostalgic for the former East German State" [http://www.sociovision.de/loesungen/sinus-milieus.html]
- "terror risk score" (cf. Pilkington, E. (2006). Millions assigned terror risk score on trips to the US. The Guardian, 2 Dec. 2006. http://www.guardian.co.uk/usa/story/0,,1962299,00.html)
Data preparation: data construction - analysis
- Threats: the construction and naming of new attributes may create controversial psychological or social categories. The intentional or unintentional reification produces a social category or norm that may be offensive per se and/or lend itself to abuses such as further privacy-relevant activities (privacy as practice).
- At the same time an opportunity? (Imagine categories like "prolific donors to online free-speech causes" and, during modelling, findings that they do "good" things.)
- Solution approaches: anything that avoids such inferences (all PPDM/PPDP)? However, with the focus of PPDM on (i) data utility and (ii) avoiding inferences on / damages to individuals, the creation of new attributes and profiling are explicitly not addressed.
Modelling - definition and analysis (threats and opportunities)
- Identification of interesting patterns: global characterizations of the modelled set of instances (e.g. clusters) or local characterizations of a subset of all instances (e.g. association rules)
- Threats:
  - KD result patterns may be descriptions of or ascriptions to unwished-for social categories (see above)
  - they may also have implications on the public-private divide: "[A system in which individuals and groups determine which description best fits them] also addresses the second sense of privacy - that of public participation in the definition of the public/private divide. One of the most insidious aspects of market profiling is that the social models thus produced are private property [e.g., trade secrets]. ... When this private social modeling is combined with the private persuasive techniques of targeted marketing, the result is an anti-democratic [...] process of social shaping." [Phillips (2004), p. 703]
- Opportunities:
  - Controversial relationships as a possible starting point of liberating debates that further privacy as practice
  - Example: abortion?! [Donohue and Levitt (2001)]
Modelling - analysis: solution approaches from PPDM
- Modifications to the data (so that "private data remain private", see above) AND
- Modifications to the results (so that "private knowledge remains private")
- Rule-hiding example: discrimination-aware data mining [Pedreschi et al. (2008)]
  - Discriminatory attributes (US law): race, religion, pregnancy status, ...
  - Discriminatory classification rules propose a decision (e.g., whether to give a loan) based on a discriminatory attribute in a direct way (appearing in the rule premise) or an indirect way (appearing in an associated rule)
  - The authors propose metrics to control for such discrimination
Evaluation
- The step at which to ascertain that the results of the previous stages "properly achieve[...] the business objectives. A key objective is to determine if there is some important business issue that has not been sufficiently considered. At the end of this phase, a decision on the use of the data mining results should be reached."
- Review all the previously raised problems to make sure that the deployment will be as privacy-protecting as possible (or as desired).
- Look at unexpected results!
- Example discrimination-aware data mining: aware of pre-defined discriminatory categories / mining patterns; but what about newly found categories?
Deployment - definition: the gained insight is used, for example, in real-time personalization of Web page delivery and design, and in decision processes: what contract to offer or deny a customer, whether to search a traveller at the border or not, ...
Deployment - analysis
- Threats: these operational steps may
  - be intrusive per se: e.g., searching someone at an airport in response to a high "terror risk score", searching their home and/or computer,
  - contribute to the knowledge about a data subject and thus be similar to more data being collected and stored,
  - install social categories and norms as 'facts', with all the consequences of such redefinitions of reality: less consumer choice, heightened social inequalities, more people treated as criminals, etc.,
  - be wrongly applied due to the inherently statistical nature of patterns: error margins (e.g. misclassification errors); inconvenience and worse for false positives! Survey of incidents: e.g. Daten-speicherung.de (2010) Fälle von Datenmissbrauch und -irrtümern [cases of data abuse and errors]. http://datenspeicherung.de/wiki/index.php?title=F%C3%A4lle_von_Datenmissbrauch_und_-irrt%C3%BCmern&oldid=3639
- Opportunities: for reverse patterns, see above
- Solution approaches: economic pressure (loss of goodwill / public image)?! See the Facebook users discussion (e.g. Gürses, Rizk & Günther, 2008)
Discussion item: what is this an example of? Tracing anonymous edits in Wikipedia: http://wikiscanner.virgil.gr/
[Method: Attribute matching] 82
Results (an example) 83
An outlook: Data mining for privacy 84
"Experiential vs. Reflective" Experiental cognition and reflective cognition "The experiential mode leads to a state in which we perceive and react to the events around us, efficiently and effortlessly. This. . . is a key component of efficient performance. The reflective mode is that of comparison and contrast, of thought, of decision making. This is the mode that leads to new ideas, novel responses. . [a proper balance is needed]. . . Alas, our educational system is more and more trapped in an experiential mode: the brilliant inspired lecturer, the prevalence of prepackaged films. . . To engage the student, the textbooks that follow a predetermined sequence. " [Norman, Things That Make Us Smart] Feedback Goal: 2010] and awareness tools for privacy protection? ! use data mining as basis [Gürses & Berendt, 85
Example: Privacy Wizards for Social Networking Sites [Fang & LeFevre 2010]
- Interface: the user specifies what they want to share with whom
  - not in an abstract way ("group X" or "friends of friends" etc.)
  - not for every friend separately
  - but for a subset of friends, and the system learns the implicit rules behind that
- Data mining: active learning (the system asks only about the most informative friend instances)
- Results: good accuracy; better for "friends by communities" (linkage information) than for "friends by profile" (their profile data)
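A generic uncertainty-sampling sketch of such an active-learning loop. This illustrates the idea only; it is not the authors' actual wizard, and the feature encoding, classifier choice and question budget are assumptions made for the example:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(200, 10))                   # friends encoded by community/profile features
true_rule = ((X[:, 0] == 1) & (X[:, 3] == 0)).astype(int)  # hidden "implicit rule": share iff in community 0 but not 3
labels = np.full(200, -1)                                # -1 = the user has not been asked yet

for i in rng.choice(200, size=5, replace=False):         # a few seed questions
    labels[i] = true_rule[i]                             # in reality: ask "share with this friend?"

for _ in range(15):                                      # budget of questions to the user
    clf = DecisionTreeClassifier(max_depth=3).fit(X[labels >= 0], labels[labels >= 0])
    proba = clf.predict_proba(X)[:, 1] if clf.n_classes_ > 1 else np.zeros(200)
    unlabeled = np.where(labels < 0)[0]
    ask = unlabeled[np.argmin(np.abs(proba[unlabeled] - 0.5))]   # most uncertain friend = most informative question
    labels[ask] = true_rule[ask]

print((clf.predict(X) == true_rule).mean())              # accuracy of the learned sharing policy
```

The design point mirrors the slide: by always asking about the friend the current classifier is least sure about, a small number of questions suffices to recover the user's implicit sharing rule for everyone else.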
Privacy Wizards ... - more feedback: "expert interface" showing the learned classifier
88 Thank you!
References I (in the order in which they appear in the slides)
Barbaro, M., & Zeller, T. (9 August 2006). A face is exposed for AOL searcher no. 4417749. New York Times.
Owad, T. (2006). Data mining 101: Finding subversives with Amazon wishlists. http://www.applefritter.com/bannedbooks
Lindamood, J., Heatherly, R., Kantarcioglu, M., & Thuraisingham, B. M. (2009). Inferring private information using social network data. In Proceedings of the 18th International Conference on World Wide Web, WWW 2009, Madrid, Spain, April 20-24, 2009. ACM, pp. 1145-1146.
Heatherly, R., Kantarcioglu, M., & Thuraisingham, B. (2009). Social network classification incorporating link type values. In IEEE International Conference on Intelligence and Security Informatics 2009.
Verykios, V. S., Bertino, E., Fovino, I. N., Provenza, L. P., Saygin, Y., & Theodoridis, Y. (2004). State-of-the-art in privacy preserving data mining. SIGMOD Record 33(1): 50-57.
Agrawal, R., & Srikant, R. (2000). Privacy-preserving data mining. In SIGMOD Conference. ACM, pp. 439-450.
Canny, J. (2002). Collaborative filtering with privacy. In Proceedings of the 2002 IEEE Symposium on Security and Privacy, p. 45, May 12-15, 2002.
Canny, J. F. (2002). Collaborative filtering with privacy via factor analysis. SIGIR 2002: 238-245.
Fung, B. C. M., Wang, K., Chen, R., & Yu, P. S. (2010). Privacy-preserving data publishing: A survey on recent developments. ACM Computing Surveys 42(4).
References II (in the order in which they appear in the slides)
Berendt, B. (accepted). More than modelling and hiding: Towards a comprehensive view of Web mining and privacy. Data Mining and Knowledge Discovery.
Frankowski, D., Cosley, D., Sen, S., Terveen, L., & Riedl, J. (2006). You are what you say: Privacy risks of public mentions. In Proc. SIGIR'06.
Narayanan, A., & Shmatikov, V. (2009). De-anonymizing social networks. In Proc. 30th IEEE Symposium on Security and Privacy 2009.
Phillips, D. (2004). Privacy policy and PETs: The influence of policy regimes on the development and social implications of privacy enhancing technologies. New Media & Society 6(6): 691-706.
Donohue, J., & Levitt, S. (2001). The impact of legalized abortion on crime. Quarterly Journal of Economics 116(2): 379-420.
Gürses, S., Rizk, R., & Günther, O. (2008). Privacy design in online social networks: Learning from privacy breaches and community feedback. In Proc. of the Twenty-Ninth International Conference on Information Systems.
Norman, D. (1993). Things That Make Us Smart: Defending Human Attributes in the Age of the Machine. Perseus Books.
Gürses, S., & Berendt, B. (2010). The social web and privacy: Practices, reciprocity and conflict detection in social networks. In Ferrari, E., & Bonchi, F. (eds.), Privacy-Aware Knowledge Discovery: Novel Applications and New Techniques. Chapman & Hall/CRC Press.
Fang, L., & LeFevre, K. (2010). Privacy wizards for social networking sites. WWW 2010: 351-360.
Sources: I have re-used slides and pictures ... (thanks to the Web community!)
Slides 10-11 are from http://www.utdallas.edu/~bxt043000/cs7301_s10/Lecture25.ppt
Slides 28-35 are from (some slightly adapted) http://www.rsrikant.com/talks/pakdd02.ppt
Slides 39-45 are from (some slightly adapted) http://www.abdn.ac.uk/~csc263/teaching/AIS/lectures/abdn.only/CollaborativeFiltering.ppt
All picture credits are in the "PowerPoint comment fields".
Further reading: surveys
Web data mining: Bing Liu (2007). Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer, Berlin etc.
Privacy-preserving data mining: Aggarwal, C. C., & Yu, P. S. (eds.) (2008). Privacy-Preserving Data Mining: Models and Algorithms. Springer.
Privacy-preserving data publishing: Fung, B. C. M., Wang, K., Chen, R., & Yu, P. S. (2010). Privacy-preserving data publishing: A survey on recent developments. ACM Computing Surveys 42(4).