d9729152c7a8ca8a930e48c0f77c5301.ppt
- Number of slides: 97
Advanced databases – Inferring implicit/new knowledge from data(bases): Web mining, esp. Web usage mining
Bettina Berendt, Katholieke Universiteit Leuven, Department of Computer Science
http://www.cs.kuleuven.be/~berendt/teaching/2008w/adb/
Last update: 11 November 2008
Berendt: Advanced databases, first semester 2008, http://www.cs.kuleuven.be/~berendt/teaching/2008w/adb/
Semi-structured and unstructured data
- Unstructured data: has "no" structure (esp. not a relational one)
- Common sources of unstructured data include:
  - Documents: Word documents, PowerPoint presentations, newsletters, source code, hard-copy documents
  - Images and graphics
- Semi-structured data: has "some" structure (partly structured, partly unstructured)
- Common sources of semi-structured data include:
  - E-mails
  - TCP/IP packets
  - XML data
  - Images and graphics
  - Documents (all listed previously)
Web and text are two particularly interesting representatives.
Agenda
- Intro: Web Mining, specifically Web Usage Mining
- Task "Find association rules"; apriori algorithm
- Data Acquisition, Understanding, and Preparation
- Forms of analysis; mining techniques
- Case study 1: A multi-channel retailer (method: association-rule discovery)
- Case study 2: Search in an educational portal (method: sequence mining / generalized-sequence discovery)
- Case study 3: Search in a community portal
What Web pages answer my information need?
What Web pages are "good" (better than others)?
What should I buy?
CRM questions example: Why go to a shop ... if everything is available on the Internet?
How do people search?
Web Mining
- Knowledge discovery (aka data mining): "the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data." [1]
- Web mining: the application of data mining techniques to the content, (hyperlink) structure, and usage of Web resources.
- Web mining areas:
  - Web content mining
  - Web structure mining
  - Web usage mining
[1] Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., & Uthurusamy, R. (Eds.) (1996). Advances in Knowledge Discovery and Data Mining. Boston, MA: AAAI/MIT Press.
Web Usage Mining: Basics and data sources
Definition of Web usage mining:
- Discovery of meaningful patterns from data generated by client-server transactions on one or more Web servers
Typical sources of data:
- Automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies
- E-commerce and product-oriented user events (e.g., shopping cart changes, ad or product click-throughs, purchases)
- User profiles and/or user ratings
- Meta-data, page attributes, page content, site structure
This is a slide from 2002 ...
Web usage is more than "browsing": Interactions on the Web
Social viewpoint
- User – server
  - Search engine
  - Online store
  - Digital library
  - ...
- User – user
  - "Web 2.0" (and all its precursors)
Technical viewpoint
- Access content ("read")
- Create content ("write")
- Navigate
Structure of the rest (as always ...)
[Figure: CRISP-DM process diagram, http://www.crisp-dm.org/Images/187343_CRISPart.jpg]
Agenda (next part): Task "Find association rules"; apriori algorithm
The data: The Market-Basket Model
- A large set of items, e.g., things sold in a supermarket.
- A large set of baskets, each of which is a small set of the items, e.g., the things one customer buys on one day.
[Table: rows Basket 1, Basket 2, ..., Basket n; columns Item 1, Item 2, ..., Item m]
- Application examples:
  - Items = books etc.; baskets = the purchase(s) of one customer (Amazon: "bought"; WalMart)
  - Items = Web pages; baskets = visit/session (Amazon: "looked at")
  - Items = words; baskets = documents (related content?!)
What to look for?: Support
- Simplest question: find sets of items that appear "frequently" in the baskets.
- Support for itemset I = the percentage of baskets containing all items in I.
  - Terminology is not completely consistent: often, "support" denotes the (total) number of baskets containing all items in I.
- Given a support threshold s, sets of items that appear in at least s baskets are called frequent itemsets.
Example
- Items = {milk, coke, pepsi, beer, juice, spaghetti}.
- Minimum support = 3/8, i.e., 3 baskets.
  B1 = {m, c, b, s}       B2 = {m, p, j}
  B3 = {m, b, s}          B4 = {c}
  B5 = {m, p, b}          B6 = {m, c, b, j, s}
  B7 = {c, b}             B8 = {b, c}
- Frequent itemsets: {m}, {c}, {b}, {s}, {m, b}, {c, b}, {m, s}, {b, s}, {m, b, s}
  (juice occurs only in B2 and B6, so {j} stays below the minimum of 3 baskets)
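The counts in this example can be checked mechanically. The following is a minimal brute-force sketch in Python, not the algorithm from the lecture; the names `BASKETS`, `support_count` and `frequent_itemsets_bruteforce` are chosen here for illustration. It enumerates every possible itemset and keeps those contained in at least 3 baskets.

```python
from itertools import combinations

# The eight baskets of the example (m=milk, c=coke, p=pepsi, b=beer, j=juice, s=spaghetti).
BASKETS = [
    {"m", "c", "b", "s"},       # B1
    {"m", "p", "j"},            # B2
    {"m", "b", "s"},            # B3
    {"c"},                      # B4
    {"m", "p", "b"},            # B5
    {"m", "c", "b", "j", "s"},  # B6
    {"c", "b"},                 # B7
    {"b", "c"},                 # B8
]


def support_count(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for basket in baskets if wanted <= basket)


def frequent_itemsets_bruteforce(baskets, min_count):
    """Enumerate all itemsets; keep those found in at least `min_count` baskets."""
    items = sorted(set().union(*baskets))
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = support_count(candidate, baskets)
            if count >= min_count:
                frequent[candidate] = count
    return frequent


result = frequent_itemsets_bruteforce(BASKETS, min_count=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])):
    # Print the itemset, its absolute support, and its support as a fraction.
    print(itemset, count, count / len(BASKETS))
```

Brute force is only feasible because there are six items; juice and pepsi each occur in two baskets and therefore do not reach the threshold.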
Scale of Problem
- WalMart sells 100,000 items and can store billions of baskets.
- The Web has over 100,000 words and billions of pages.
Association Rules
- If-then rules about the contents of baskets.
- {i1, i2, …, ik} → j means: "if a basket contains all of i1, …, ik, then it is likely to contain j."
- The confidence of this association rule is the probability (more exactly: frequency) of j given i1, …, ik:
  confidence = support({i1, i2, …, ik, j}) / support({i1, i2, …, ik})
- Generally written as [formula not preserved in the extracted slide text]
Example (contd.)
  B1 = {m, c, b, s}       B2 = {m, p, j}
  B3 = {m, b, s}          B4 = {c}
  B5 = {m, p, b}          B6 = {m, c, b, j, s}
  B7 = {c, b}             B8 = {b, c}
- Frequent itemsets: {m}, {c}, {b}, {s}, {m, b}, {c, b}, {m, s}, {b, s}, {m, b, s}
- The association rule {m, b} → s has support 3/8 = 0.375 and confidence 3/4 = 0.75
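As a small check of the numbers for {m, b} → s, here is a sketch in Python. The function names are illustrative; the only assumption beyond the slide is that a rule whose left-hand side never occurs has undefined confidence, which this sketch reports as an error.

```python
BASKETS = [
    {"m", "c", "b", "s"}, {"m", "p", "j"}, {"m", "b", "s"}, {"c"},
    {"m", "p", "b"}, {"m", "c", "b", "j", "s"}, {"c", "b"}, {"b", "c"},
]


def support_count(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for basket in baskets if wanted <= basket)


def rule_support(antecedent, consequent, baskets):
    """Support of a rule = support (as a fraction) of all items it mentions."""
    return support_count(set(antecedent) | set(consequent), baskets) / len(baskets)


def rule_confidence(antecedent, consequent, baskets):
    """Fraction of baskets containing the antecedent that also contain the consequent."""
    antecedent_count = support_count(antecedent, baskets)
    if antecedent_count == 0:
        raise ValueError("antecedent does not occur in any basket")
    both = support_count(set(antecedent) | set(consequent), baskets)
    return both / antecedent_count


print(rule_support({"m", "b"}, {"s"}, BASKETS))     # 0.375
print(rule_confidence({"m", "b"}, {"s"}, BASKETS))  # 0.75
```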
Another measure of "Interestingness": Lift
- The lift of an association rule is the amount by which the support differs from what you would expect if items were selected independently of one another:
  lift(I → j) = support(I together with j) / (support(I) × support({j}))
- Note: this refers to "support" as a percentage.
- Example: the association rule {m, b} → s has lift (3/8) / ((4/8) × (3/8)) = 2
- Aside: all pattern types have "interestingness measures"; cf. accuracy for classification!
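The lift computation can be sketched in Python as follows. This is an illustration under the reading used in the example above (observed support divided by the product of the two individual supports, all as fractions); the function names are not from the lecture.

```python
BASKETS = [
    {"m", "c", "b", "s"}, {"m", "p", "j"}, {"m", "b", "s"}, {"c"},
    {"m", "p", "b"}, {"m", "c", "b", "j", "s"}, {"c", "b"}, {"b", "c"},
]


def support_fraction(itemset, baskets):
    """Fraction of baskets that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for basket in baskets if wanted <= basket) / len(baskets)


def lift(antecedent, consequent, baskets):
    """Observed joint support divided by the support expected under independence."""
    joint = support_fraction(set(antecedent) | set(consequent), baskets)
    expected = support_fraction(antecedent, baskets) * support_fraction(consequent, baskets)
    if expected == 0:
        raise ValueError("antecedent or consequent does not occur in any basket")
    return joint / expected


print(lift({"m", "b"}, {"s"}, BASKETS))  # (3/8) / ((4/8) * (3/8)) = 2.0
```

A lift above 1 means the items co-occur more often than independence would predict; a lift of exactly 1 means no association.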
Finding (Mining for) Association Rules
- A typical question: "find all association rules with support ≥ s and confidence ≥ c."
  - Note: the "support" of an association rule is the support of the set of items it mentions.
- Hard part: finding the high-support (frequent) itemsets.
  - Checking the confidence of association rules involving those sets is relatively easy.
Basic idea of the apriori algorithm
(= the basic algorithm for AR mining; many variants exist)
- The apriori principle: an itemset can only be frequent if all of its subsets are also frequent. This helps to prune many candidates!
- Iterative two-pass algorithm: for each k = 1, ... (itemset size)
  1. generate all candidates
  2. check for frequency in the database
In the example
  B1 = {m, c, b, s}       B2 = {m, p, j}
  B3 = {m, b, s}          B4 = {c}
  B5 = {m, p, b}          B6 = {m, c, b, j, s}
  B7 = {c, b}             B8 = {b, c}
[Figure: itemset lattice for the example, with the nodes m, c, b, s, p, j; mb, ms, mc, mp, cb, cs, cp, bs, bp, sp; mbs, mbc, cbs]
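The level-wise idea can be sketched in Python as below. This is a simplified illustration, not an optimized or canonical implementation: candidates of size k are built by joining frequent (k-1)-itemsets and are kept only if all their (k-1)-subsets are frequent (the apriori principle); one pass over the baskets then counts them. The function name `apriori` and the absolute threshold `min_count` are choices made for this sketch.

```python
from itertools import combinations

BASKETS = [
    {"m", "c", "b", "s"}, {"m", "p", "j"}, {"m", "b", "s"}, {"c"},
    {"m", "p", "b"}, {"m", "c", "b", "j", "s"}, {"c", "b"}, {"b", "c"},
]


def apriori(baskets, min_count):
    """Return {frozenset(itemset): count} for all itemsets in >= min_count baskets."""
    baskets = [frozenset(b) for b in baskets]

    # k = 1: count single items.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    all_frequent = dict(frequent)

    k = 2
    while frequent:
        previous = set(frequent)
        # Step 1: generate candidates of size k from frequent (k-1)-itemsets,
        # pruning any candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a in previous:
            for b in previous:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in previous for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Step 2: check the candidates for frequency in the database.
        counts = {c: sum(1 for basket in baskets if c <= basket) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent


found = apriori(BASKETS, min_count=3)
for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), found[itemset])
```

On the example, level 2 only considers the six pairs over {m, c, b, s}, and level 3 only considers {m, b, s}; candidates such as {m, c, b} are pruned without counting because {m, c} is not frequent.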
Agenda (next part): Data Acquisition, Understanding, and Preparation
Web Usage Mining
Discovery of meaningful patterns from data generated by client-server transactions on one or more Web servers
Typical sources of data:
- Automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies
- E-commerce and product-oriented user events (e.g., shopping cart changes, ad or product click-throughs, etc.)
- User profiles and/or user ratings
- Meta-data, page attributes, page content, site structure
Data collection
[Figure: Web server, Client (Browser), Proxy]
What's in a typical Web server log ... (Requests to www.acr-news.org)
<ip_addr> - - <date><method><file><protocol><code><bytes><referrer><user_agent>
203.30.5.145 - - [01/Jun/1999:03:09:21 -0600] "GET /Calls/OWOM.html HTTP/1.0" 200 3942 "http://www.lycos.com/cgi-bin/pursuit?query=advertising+psychology&maxhits=20&cat=dir" "Mozilla/4.5 [en] (Win98; I)"
203.30.5.145 - - [01/Jun/1999:03:09:23 -0600] "GET /Calls/Images/earthani.gif HTTP/1.0" 200 10689 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
203.30.5.145 - - [01/Jun/1999:03:09:24 -0600] "GET /Calls/Images/line.gif HTTP/1.0" 200 190 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
203.252.234.33 - - [01/Jun/1999:03:12:31 -0600] "GET / HTTP/1.0" 200 4980 "" "Mozilla/4.06 [en] (Win95; I)"
203.252.234.33 - - [01/Jun/1999:03:12:35 -0600] "GET /Images/line.gif HTTP/1.0" 200 190 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
203.252.234.33 - - [01/Jun/1999:03:12:35 -0600] "GET /Images/red.gif HTTP/1.0" 200 104 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
203.252.234.33 - - [01/Jun/1999:03:12:35 -0600] "GET /Images/earthani.gif HTTP/1.0" 200 10689 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
203.252.234.33 - - [01/Jun/1999:03:11 -0600] "GET /CP.html HTTP/1.0" 200 3218 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
203.30.5.145 - - [01/Jun/1999:03:13:25 -0600] "GET /Calls/AWAC.html HTTP/1.0" 200 104 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
... and what does it mean? (Requests to www.acr-news.org)
<ip_addr> - - <date><method><file><protocol><code><bytes><referrer><user_agent>
(same log excerpt as on the previous slide)
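To make the field layout concrete, the sketch below splits one log entry into the fields named in the format line. It is a minimal Python illustration that assumes the entries follow the common "combined" access-log layout seen in the excerpt; the regular expression and the function name `parse_log_line` are written for this example and are not part of any server software.

```python
import re
from datetime import datetime

# One named group per field of the format line:
# <ip_addr> - - <date><method><file><protocol><code><bytes><referrer><user_agent>
LOG_PATTERN = re.compile(
    r'(?P<ip_addr>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<file>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not parse."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    try:
        record["date"] = datetime.strptime(record["date"], "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None  # malformed timestamp
    record["code"] = int(record["code"])
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record


SAMPLE = (
    '203.30.5.145 - - [01/Jun/1999:03:09:21 -0600] '
    '"GET /Calls/OWOM.html HTTP/1.0" 200 3942 '
    '"http://www.lycos.com/cgi-bin/pursuit?query=advertising+psychology&maxhits=20&cat=dir" '
    '"Mozilla/4.5 [en] (Win98; I)"'
)

record = parse_log_line(SAMPLE)
for field, value in record.items():
    print(f"{field:10} {value}")
```

Read this way, the first entry says: host 203.30.5.145 requested /Calls/OWOM.html with GET over HTTP/1.0, the server answered with status 200 and 3942 bytes, and the visitor arrived from a Lycos search for "advertising psychology".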
Sources and destinations
Logs may extend beyond visits to the site and show where a visitor was before (referrer) ...
203.30.5.145 - - [01/Jun/1999:03:09:21 -0600] "GET /Calls/OWOM.html HTTP/1.0" 200 3942 "http://www.lycos.com/cgi-bin/pursuit?query=advertising+psychology&maxhits=20&cat=dir" "Mozilla/4.5 [en] (Win98; I)"
... and where s/he went next (URL rewriting): [example not preserved in the extracted slide text]
Preprocessing of Web Usage Data
[Figure: preprocessing pipeline with the elements Raw Usage Data, Cleaning, User/Session Identification, Page View Identification, Path Completion, Server Session File, Episode Identification, Episode File, Usage Statistics, Site Structure and Content. A second version of the figure marks part of the pipeline as "not always necessary and/or done".]
Data Preprocessing (1)
Data cleaning
- remove irrelevant references and fields in server logs
- remove references due to spider navigation
- remove erroneous references
- add missing references due to caching (done after sessionization)
Data integration
- synchronize data from multiple server logs
- integrate semantics, e.g.,
  - meta-data (e.g., content labels)
  - e-commerce and application server data
- integrate demographic / registration data
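The cleaning steps listed above can be sketched as a simple filter over parsed log records. This is a hedged Python illustration: the record keys (`file`, `user_agent`, `code`), the list of ignored file suffixes, and the robot hints are all assumptions made for this example, and real sites would tune them to their own content and traffic.

```python
# Hypothetical choices for this sketch; adjust per site.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")


def is_relevant(record):
    """Keep a request only if it looks like a successful page view by a human."""
    path = record["file"].split("?", 1)[0].lower()
    if path.endswith(IGNORED_SUFFIXES):
        return False  # irrelevant reference: embedded image, stylesheet, script
    if path == "/robots.txt":
        return False  # only spiders ask for this file
    if any(hint in record["user_agent"].lower() for hint in ROBOT_HINTS):
        return False  # reference due to spider navigation
    if record["code"] >= 400:
        return False  # erroneous reference
    return True


def clean(records):
    """Return the records that survive all cleaning rules."""
    return [r for r in records if is_relevant(r)]


raw = [
    {"file": "/Calls/OWOM.html", "user_agent": "Mozilla/4.5 [en] (Win98; I)", "code": 200},
    {"file": "/Calls/Images/line.gif", "user_agent": "Mozilla/4.5 [en] (Win98; I)", "code": 200},
    {"file": "/CP.html", "user_agent": "ExampleBot/1.0", "code": 200},
    {"file": "/missing.html", "user_agent": "Mozilla/4.06 [en] (Win95; I)", "code": 404},
]
print([r["file"] for r in clean(raw)])  # ['/Calls/OWOM.html']
```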
Data Preprocessing (2)
Data transformation
- user identification
- sessionization / episode identification
- pageview identification
  - a pageview is a set of page files and associated objects that contribute to a single display in a Web browser
Data reduction
- sampling and dimensionality reduction (ignoring certain pageviews / items)
Identifying user transactions (i.e., sets or sequences of pageviews, possibly with associated weights)
Why sessionize?
- The quality of the patterns discovered in KDD depends on the quality of the data on which mining is applied.
- In Web usage analysis, these data are the sessions of the site visitors: the activities performed by a user from the moment she enters the site until the moment she leaves it.
- It is difficult to obtain reliable usage data due to proxy servers and anonymizers, dynamic IP addresses, missing references due to caching, and the inability of servers to distinguish among different visits.
- Cookies and embedded session IDs produce the most faithful approximation of users and their visits, but are not used in every site, and not accepted by every user.
- Therefore, heuristics are needed that can sessionize the available access data.
Mechanisms for User Identification
Examples: page tags (using JavaScript), some browser plugins
Examples of "software agents"
Page tagging with JavaScript: see also http://www.bruceclay.com/analytics/disadvantages.htm
Sessionization strategies: Sessionization heuristics
These heuristics are quite accurate! (see Spiliopoulou et al., 2003)
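The slide's table of heuristics is not preserved in this text, so the following Python sketch shows only one commonly used time-based heuristic as an illustration: requests from the same visitor belong to one session until the gap between consecutive requests exceeds a threshold. Identifying a visitor by IP address plus user agent and the 30-minute threshold are assumptions of this sketch, not values taken from the lecture.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(requests, max_gap=timedelta(minutes=30)):
    """Split each visitor's requests into sessions at gaps longer than `max_gap`."""
    by_visitor = defaultdict(list)
    for request in requests:
        # Assumption: (IP address, user agent) approximates one visitor.
        by_visitor[(request["ip_addr"], request["user_agent"])].append(request)

    sessions = []
    for visitor_requests in by_visitor.values():
        visitor_requests.sort(key=lambda r: r["date"])
        current = [visitor_requests[0]]
        for previous, request in zip(visitor_requests, visitor_requests[1:]):
            if request["date"] - previous["date"] > max_gap:
                sessions.append(current)  # gap too long: close the session
                current = []
            current.append(request)
        sessions.append(current)
    return sessions


start = datetime(1999, 6, 1, 3, 9, 21)
requests = [
    {"ip_addr": "203.30.5.145", "user_agent": "Mozilla/4.5", "date": start, "file": "/a.html"},
    {"ip_addr": "203.30.5.145", "user_agent": "Mozilla/4.5",
     "date": start + timedelta(minutes=5), "file": "/b.html"},
    {"ip_addr": "203.30.5.145", "user_agent": "Mozilla/4.5",
     "date": start + timedelta(minutes=50), "file": "/c.html"},
    {"ip_addr": "203.252.234.33", "user_agent": "Mozilla/4.06",
     "date": start + timedelta(minutes=3), "file": "/"},
]
for session in sessionize(requests):
    print([r["file"] for r in session])
```

With these four requests the first visitor yields two sessions (the 45-minute pause splits them) and the second visitor yields one.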
Path Completion
- Refers to the problem of inferring missing user references due to caching.
- Effective path completion requires extensive knowledge of the link structure within the site.
- Referrer information in server logs can also be used in disambiguating the inferred paths.
- The problem gets much more complicated in frame-based sites.
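One way to picture referrer-based path completion is the simplified Python sketch below. It assumes that when a request's referrer is not the page viewed last but an earlier page of the same session, the visitor went back to that page through cached copies (for example with the browser's back button), and it re-inserts those unlogged page views. It ignores the site's link structure and frames, so it is an illustration of the idea rather than a complete method; the function name and input format are choices made here.

```python
def complete_path(requests):
    """requests: list of (page, referrer) pairs in time order for one session."""
    path = []
    for page, referrer in requests:
        if path and referrer and referrer != path[-1] and referrer in path:
            # Index of the most recent visit to the referrer page.
            last_seen = len(path) - 1 - path[::-1].index(referrer)
            # Pages stepped back through, ending at the referrer itself.
            backtrack = path[last_seen:-1][::-1]
            path.extend(backtrack)
        path.append(page)
    return path


logged = [("A", None), ("B", "A"), ("C", "B"), ("D", "A")]
print(complete_path(logged))  # ['A', 'B', 'C', 'B', 'A', 'D']
```

In the example the log contains no second request for B or A, yet D was reached from A, so the sketch infers that the visitor stepped back from C via B to A before following the link to D.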
Why integrate semantics?
Basic idea: associate each requested page with one or more domain concepts, to better understand the process of navigation / Web usage
Example: a shopping site
From ...
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:03:51 +0100] "GET /search.html?l=ostsee%20strand&syn=023785&ord=asc HTTP/1.0" 200 1759
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:05:06 +0100] "GET /search.html?l=ostsee%20strand&p=low&syn=023785&ord=desc HTTP/1.0" 200 8450
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:06:41 +0100] "GET /mlesen.html?Item=3456&syn=023785 HTTP/1.0" 200 3478
To ...
[Figure: concept-level view of the same requests, with the labels Search by category, Refine search, Search by category+title, Choose item, Look at individual product]
From URLs to topics / concepts: Basics of semantic session modelling
- 1 request → 1 concept or n concepts
  - also possible: n requests → 1 concept
- Concepts can concern content or service
- Concepts can be part of an ontology (simple case: concept hierarchy)
- Session = set / sequence / tree / graph of requests
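A mapping from requests to concepts can be sketched with a few rules over the URL path and query parameters. The Python example below is purely illustrative: the rules and the concept labels ("search", "refined search", "product view", "other") are invented for this sketch and are not the mapping used in the lecture's shopping-site example, whose exact rules are not given in the text.

```python
from urllib.parse import parse_qs, urlsplit


def concept_for(url):
    """Map one requested URL to a single concept label (hypothetical rules)."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    if parts.path == "/mlesen.html" and "Item" in params:
        return "product view"
    if parts.path == "/search.html":
        # Assumption of this sketch: an extra parameter `p` marks a refined search.
        return "refined search" if "p" in params else "search"
    return "other"


session_urls = [
    "/search.html?l=ostsee%20strand&syn=023785&ord=asc",
    "/search.html?l=ostsee%20strand&p=low&syn=023785&ord=desc",
    "/mlesen.html?Item=3456&syn=023785",
]
print([concept_for(url) for url in session_urls])
# ['search', 'refined search', 'product view']
```

The session then becomes a sequence of concepts instead of a sequence of raw URLs, which is the form the later mining steps operate on.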
Ontology-based behaviour modelling – basic ideas (1)
- The request for a Web page signals interest in the concept(s) and relations dealt with in this page: interest in the obtained content as well as in the requested service.
- Formally: a request as a (multi)set, or as a vector, of concepts/relations.
Resulting format: If the request is the instance
Usually a flat file (format like a Web server log) or a database
Resulting format: If a session is the instance
- What features can a session have?
- Refer again to the example:
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:03:51 +0100] "GET /search.html?l=ostsee%20strand&syn=023785&ord=asc HTTP/1.0" 200 1759
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:05:06 +0100] "GET /search.html?l=ostsee%20strand&p=low&syn=023785&ord=desc HTTP/1.0" 200 8450
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:06:41 +0100] "GET /mlesen.html?Item=3456&syn=023785 HTTP/1.0" 200 3478
[Figure: concept-level view with the labels Search by category, Refine search, Search by category+title, Choose item, Look at individual product]
Web Usage and E-Business Analytics: Basic Framework for E-Commerce Data Analysis
[Figure: architecture diagram with the components Site Content, Content Analysis Module, Web/Application Server Logs, Data Cleaning / Sessionization Module, Data Integration Module, Integrated Sessionized Data, E-Commerce Data Mart, Session Analysis / Static Aggregation, OLAP Tools, OLAP Analysis, Data Cube, Site Map, Site Dictionary, Operational Database (customers, orders, products), Data Mining Engine, Pattern Analysis]
Agenda (next part): Forms of analysis; mining techniques
Web Usage and E-Business Analytics
Different levels of analysis:
- Session Analysis
- Static Aggregation and Statistics
- OLAP
- Data Mining
Session Analysis
Simplest form of analysis: examine individual or groups of server sessions and e-commerce data.
Advantages:
- Gain insight into typical customer behaviors.
- Trace specific problems with the site.
Drawbacks:
- LOTS of data.
- Difficult to generalize.
Static Aggregation (Reports)
Most common form of analysis. Data are aggregated by predetermined units such as days or sessions. Generally gives the most "bang for the buck."
Advantages:
- Gives a quick overview of how a site is being used.
- Minimal disk space or processing power required.
Drawbacks:
- No ability to "dig deeper" into the data.
Online Analytical Processing (OLAP)
Allows changes to the aggregation level for multiple dimensions. Generally associated with a data warehouse.
Advantages and drawbacks:
- Very flexible
- Requires significantly more resources than static reporting.
Data Mining: Going deeper
- Prediction of next event: sequence mining, Markov chains
- Discovery of associated events or application objects: association rules
- Discovery of visitor groups with common properties and interests: clustering
- Discovery of visitor groups with common behaviour: session clustering
- Characterization of visitors with respect to a set of predefined classes: classification
- Card fraud detection
KDD Techniques for Web Applications: Examples (1)
Calibration of a Web server:
- Prediction of the next page invocation over a group of concurrent Web users under certain constraints
  - Sequence mining, Markov chains
Cross-selling of products:
- Mapping of Web pages/objects to products
- Discovery of associated products
  - Association rules, sequence mining
- Placement of associated products on the same page
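For the "prediction of the next page invocation" task, a first-order Markov chain is the simplest model: it estimates, from past sessions, how often each page is followed by each other page. The Python sketch below is a minimal illustration with invented page names; the function names are chosen for this example.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count, for every page, which page follows it and how often."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current_page, next_page in zip(session, session[1:]):
            transitions[current_page][next_page] += 1
    return transitions


def predict_next(transitions, page):
    """Most frequent successor of `page` and its estimated probability, or None."""
    if page not in transitions:
        return None
    next_page, count = transitions[page].most_common(1)[0]
    total = sum(transitions[page].values())
    return next_page, count / total


# Invented sessions, each a sequence of requested pages.
sessions = [
    ["home", "catalog", "product"],
    ["home", "catalog", "cart"],
    ["home", "catalog", "product"],
    ["home", "search"],
]
model = fit_transitions(sessions)
print(predict_next(model, "catalog"))
print(predict_next(model, "home"))
```

A server could use such estimates, for instance, to pre-load the page most likely to be requested next; higher-order models or sequence mining take longer histories into account.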
KDD Techniques for Web Applications: Examples (2)
Sophisticated cross-selling and up-selling of products:
- Mapping of pages/objects to products of different price groups
- Identification of customer groups
  - Clustering, classification
- Discovery of associated products of the same/different price categories
  - Association rules, sequence mining
- Formulation of recommendations to the end-user
  - Suggestions on associated products
  - Suggestions based on the preferences of similar users
Agenda (next part): Case study 1: A multi-channel retailer (method: association-rule discovery)
CRM questions example: Why go to a shop ... if everything is available on the Internet?
A multi-channel retailer, its business goals, and analysis questions
- General goals: "standard e-tailer goals": attract users/shoppers and convert them into customers
- Specific goals: assess the success of the Web site in relation to other distribution channels
- Background: Internet market shares [BCG 2002]
Questions of the evaluation:
- What business metrics can be calculated from Web usage data, transaction data, and demographic data for determining online success?
- Are there cross-channel effects between a company's e-shop and its physical stores?
The site
Outline of the KDD process
- Business understanding: customer buying process
- Data: Web server sessions, transaction information
- Data understanding (main step): modelling the semantics of the site in terms of a hierarchy of service concepts
- Data preparation:
  - Session IDs; usual data cleaning steps
  - Linking of sessions and transaction information (anonymized)
- Modelling / pattern discovery:
  - Web metrics, cluster analysis, association rules, sequence mining
  - plus correlation analysis, questionnaire study, qualitative market analysis
- Evaluation: interesting patterns
Agenda – Case Study
- Business understanding
- Data understanding and preparation
- Pattern discovery + evaluation: Success metrics
- Pattern discovery + evaluation: Behavioural patterns
- Pattern discovery + evaluation: User types
- Pattern discovery + evaluation: Behaviour & demographics
Description of the site and its services
- The retailer operates an e-shop and more than 5000 retail shops in over 10 European countries.
- It sells a wide range of consumer electronics.
- Online customers can pay, pick up or take delivery, and return both online and offline.
- Web pages provide for all tasks in the customer buying process.
Purchase Phases (Page Concepts) at Large MC Retailers
[Figure: phase chain built up step by step: Home (Acquisition) → Product Impression → Product Click-Through → Offline Info → Transaction → Purchase]
1. Acquisition (home): all Web pages that are semantically related to the initial acquisition of a visitor.
2. Catalogue information: pages providing an overview of product categories.
3. Information product (infprod): pages displaying information about a specific product.
4. Offline information (offinfo): all pages related to any offline information: store locator (pages for finding physical stores in one's neighbourhood), information about offline services, offline referrers, etc.
5. Transaction (transact): steps before an actual purchase, starting with a customer entering the order process: check-out, input of customer data, payment and delivery preferences (online or offline), etc.
6. Purchase: indicates whether a visitor completed the transaction process and bought a product, e.g., invocation of an order confirmation page.
Agenda – Case Study
- Business understanding
- Data understanding and preparation
- Pattern discovery + evaluation: Behavioural patterns
Data and data preparation
Data sources and sample:
- 92,467 sessions from the company's Web logs from 21 days in 2002
- Anonymized transaction information of 13,653 customers who bought online over a period of 8 months in 2001/02
- 621 transaction records (21 days) were linked to Web-usage records
Data preparation:
- Sessions were determined by session IDs
- Robot visits eliminated, usual data cleaning steps
- Each URL request mapped to a service concept from {c1, ..., cn}
- Session representation: s = [w1, ..., wn], with wi = weight of ci, indicating whether or not the concept was visited (1/0), or how often it was visited
- Customer record: feature vector incl. session and transaction data
Site semantics: A service concept hierarchy
760,535 page requests were mapped onto the concepts from this hierarchy:
[Figure: concept hierarchy rooted at "Any"; node labels in extraction order: Any, Services, Game, Registration, Acquisition, Home, Offline Referrer, Advertiser, Company Infos, Other, Offline, Service and Support, Transaction, Information, Catalog, Store Locator, Shopping Cart, Fulfillment/Service, Customer Data, Payment, Information, Product. Legend: marked nodes = multi-channel concepts]
Types of patterns
• Conversion rates (~ confidence of content-specified sequential association rules) for assessing business success
• Association rule and sequence analysis for understanding online/offline preferences and their temporal development
• Cluster analysis for customer segmentation
• Correlation analysis for investigating the relationship between demographic indicators and online/offline preferences
Session representation
Each session is represented as a feature vector on the multi-channel concepts.
Two methods were used for the definition of new conversion metrics:

• weighted-concept method (number of visits to a concept)

Session  home  infcat  infprod  service  transact  purch.  offinfo
A        0     3       7        4        2         1       0
B        1     3       5        0        0         0       2
...

• dichotomized-concept method (whether or not a concept was visited)

Session  home  infcat  infprod  service  transact  purch.  offinfo
A        0     1       1        1        1         1       0
B        1     1       1        0        0         0       1
...
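The two representations can be sketched in a few lines of Python. This is only an illustration under assumptions: the concept identifiers are taken from the table header, while the function names and the order of visits in the example session are made up for the sketch.

```python
from collections import Counter

# Concept order follows the table header on the slide.
CONCEPTS = ["home", "infcat", "infprod", "service", "transact", "purch", "offinfo"]


def weighted_vector(visited_concepts):
    """Weighted-concept method: number of visits to each concept."""
    counts = Counter(visited_concepts)
    return [counts[c] for c in CONCEPTS]


def dichotomized_vector(visited_concepts):
    """Dichotomized-concept method: 1 if the concept was visited, else 0."""
    return [1 if w > 0 else 0 for w in weighted_vector(visited_concepts)]


# Hypothetical click sequence that yields the counts of session B in the table.
session_b = ["home"] + ["infcat"] * 3 + ["infprod"] * 5 + ["offinfo"] * 2

print(weighted_vector(session_b))      # [1, 3, 5, 0, 0, 0, 2]
print(dichotomized_vector(session_b))  # [1, 1, 1, 0, 0, 0, 1]
```

The dichotomized vector discards visit frequency, so it is the natural input for metrics that only ask whether a phase was reached at all.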
"Internal consistency" of preferences – payment and delivery preferences
• Online payment → Direct delivery (s=0.27, c=0.97); < 1/3 traditional online users!
• Online payment → In-store pickup (s=0.02, c=0.03)
• Cash on delivery → Direct delivery (s=0.02, c=0.03)
• In-store payment → In-store pickup (s=0.69, c=0.94)
s: support, c: confidence of the sequence
Site is primarily used to collect information.
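As a reminder of how s and c are obtained, here is a minimal Python sketch. It treats each transaction as a plain set of items, which is a simplification of the sequence setting used in the case study. The item names and the ten toy transactions are invented for illustration and do not reproduce the retailer's data.

```python
def support_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = share of all transactions containing both sides
    confidence = share of antecedent transactions that also contain the consequent
    """
    n_total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical toy data: 10 transactions with one payment and one delivery item each.
toy_transactions = (
    [{"online_payment", "direct_delivery"}] * 3
    + [{"online_payment", "in_store_pickup"}] * 1
    + [{"in_store_payment", "in_store_pickup"}] * 6
)

s, c = support_confidence(
    toy_transactions, {"online_payment"}, {"direct_delivery"}
)
print(s, c)  # 0.3 0.75
```

A high confidence together with a low support, as in the toy output, means the rule is reliable for the customers it covers but covers only a minority of them.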
"Internal consistency" of preferences – return preferences
• Return In-store (s=0.06, c=0.87)
• Return Mail-in (s=0.04, c=0.13)
s: support, c: confidence of the association rule
Customers may wish personal assistance (a result supported by the service mix analysis of different multi-channel retailers and by questionnaire results).
Development of preferences over time
s: support, c: confidence of the sequence
• Direct delivery →
  - In-store pickup in 1 following transaction (s=0.001, c=0.15)
  - Direct delivery in all following transactions (s=0.003, c=0.85)
• In-store pickup →
  - Direct delivery in 1 following transaction (s=0.001, c=0.10) (*)
  - In-store pickup in all following transactions (s=0.004, c=0.90)
Results for payment migration are similar.
90% of repeat customers did not change transaction preferences at all.
Rule (*) as an indicator of the development of trust?!
Agenda
• Intro: Web Mining, specifically Web Usage Mining
  - task "Find association rules"; Apriori algorithm
• Data Acquisition, Understanding, and Preparation
• Forms of analysis; mining techniques
• Case study 1: A multi-channel retailer
  - method: Association-rule discovery
• Case study 2: Search in an educational portal
  - method: Sequence mining / generalized-sequence discovery
• Case study 3: Search in a community portal
The site
Business understanding / problem definition:
• How do users search in this online catalog?
• Which search criteria are popular?
• Which are efficient?
The concept hierarchies / site ontology (excerpt)
• SEITE1-...LI (1st page of a list) or SEITEn-...LI (further page); "Seite" = page
• LA ("Land", i.e. country/state)
• SA ("Schulart", i.e. school type)
• SU ("Suche", i.e. search)
Sequence mining – one result pattern: successful search for a school in Germany

    select t
    from node a b, template a * b as t
    where a.url startswith "SEITE1-"
      and a.occurrence = 1
      and b.url contains "1SCHULE"
      and b.occurrence = 1
      and (b.support / a.support) >= 0.2

Example URL of a list page:
/liste.html?offset=920&zeilen=20&anzahl=1323&sprache=de&sw_kategorie=de&erscheint=&suchfeld=&suchwert=&staat=de&region=by&schultyp=
[Figure: one example pattern, with steps labelled "a refinement", "a repetition", "a continuation"]
(Berendt & Spiliopoulou, VLDB J. 2000)
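To make the condition (b.support / a.support) >= 0.2 concrete, the following Python sketch approximates the template a * b on a list of sessions. It is a deliberately simplified reading, not WUM's aggregate-tree algorithm: support is counted as the number of sessions, only the first page matching a is considered, and b may be any later page. The concept names in the toy sessions (for example SEITE1-SULI) are hypothetical.

```python
def template_confidence(sessions, is_a, is_b):
    """Approximate b.support / a.support for the template a * b.

    a.support: sessions containing a page that matches is_a
    b.support: those sessions in which a page matching is_b follows
               (after any number of intermediate pages) the first a-page
    """
    support_a = 0
    support_ab = 0
    for pages in sessions:
        first_a = next((i for i, p in enumerate(pages) if is_a(p)), None)
        if first_a is None:
            continue
        support_a += 1
        if any(is_b(p) for p in pages[first_a + 1:]):
            support_ab += 1
    return support_ab / support_a if support_a else 0.0


# Hypothetical sessions, already mapped from URLs to concept names.
toy_sessions = [
    ["SEITE1-SULI", "SEITE2-SULI", "1SCHULE"],  # search, continuation, school found
    ["SEITE1-LALI", "1SCHULE"],                 # list by state, school found
    ["SEITE1-SULI", "SEITE1-SULI"],             # repeated search, no school page
    ["HOME", "1SCHULE"],                        # no first list page at all
    ["SEITE1-SALI"],                            # list by school type only
]

confidence = template_confidence(
    toy_sessions,
    is_a=lambda url: url.startswith("SEITE1-"),
    is_b=lambda url: "1SCHULE" in url,
)
print(confidence, confidence >= 0.2)  # 0.5 True
```

In this toy data two of the four sessions that start a list search reach a school page, so the pattern would pass the 0.2 threshold of the query.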
An overview of the WUM formalism and algorithm
Berendt, B. (2007). Web Usage Mining - Modelling: frequent-pattern mining I (sequence mining with WUM, classification and clustering). http://vasarely.wiwi.hu-berlin.de/WebMining07/index5_final.ppt, pp. 10-19
The site
Understanding the semantics of requests
Step 1: Domain ontology
• community portal ka2portal.aifb.uni-karlsruhe.de
• ontology-based:
  - Knowledge base in F-Logic
  - Static pages: annotations
  - Dynamic pages: generated from queries
  - Queries also in F-Logic
  - Logs contain these queries
[Figure label: affiliation]
Agenda
• Intro: Web Mining, specifically Web Usage Mining
  - task "Find association rules"; Apriori algorithm
• Data Acquisition, Understanding, and Preparation
• Forms of analysis; mining techniques
• Case study 1: A multi-channel retailer
  - method: Association-rule discovery
• Case study 2: Search in an educational portal
  - method: Sequence mining / generalized-sequence discovery
• Case study 3: Search in a community portal
  - method:
You decide!
In the preparation of a log file
(recommendations for open-source tools are shown in green)
1. Use qualitative methods for application understanding (read!)
2. Inspect the site and the URLs for data understanding
   1. Generate Analog reports for getting base statistics of usage
   2. Build concept system / hierarchy and mapping: URLs → concepts (notation: WUMprep regex)
3. Use WUMprep for data preparation
   1. Remove unwanted entries (pictures etc.)
   2. Sessionize
   3. Remove robots
   4. Replace URLs by concepts
   5. (Build a database)
4. Use WEKA for modelling
   1. [Transform log file into ARFF (WUMprep4WEKA)]
   2. Cluster, classify, find association rules, ...
5. Use WUM for modelling
6. Select patterns based on objective interestingness measures (support, confidence, lift, ...) and on subjective interestingness measures (unexpected? application-relevant?)
7. Present results in tabular, textual and graphical form (use Excel, ...)
8. Interpret the results
9. Make recommendations for site improvement etc.
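The data preparation in step 3 can be illustrated with a short Python sketch. It is not WUMprep and does not use its configuration or regex notation; it only mimics the same ideas under stated assumptions: log entries are already parsed into (host, timestamp, URL, user agent) tuples, robots are recognised by a crude user-agent pattern and dropped before sessionizing, sessions are split by a 30-minute inactivity timeout per host, and the URL-to-concept rules are hypothetical.

```python
import re
from datetime import datetime, timedelta

# All patterns below are illustrative assumptions, not WUMprep settings.
UNWANTED = re.compile(r"\.(gif|jpe?g|png|css|js|ico)$", re.IGNORECASE)
ROBOT_AGENT = re.compile(r"bot|crawler|spider", re.IGNORECASE)
CONCEPT_RULES = [
    (re.compile(r"^/search"), "SEARCH"),
    (re.compile(r"^/person"), "PERSON"),
]
TIMEOUT = timedelta(minutes=30)


def to_concept(url):
    """Replace a URL by the first matching concept, or OTHER."""
    for pattern, concept in CONCEPT_RULES:
        if pattern.search(url):
            return concept
    return "OTHER"


def prepare(entries):
    """Turn (host, timestamp, url, user_agent) tuples into concept sessions."""
    sessions = []
    open_session = {}  # host -> (timestamp of last request, session list)
    for host, ts, url, agent in sorted(entries, key=lambda e: e[1]):
        path = url.split("?", 1)[0]
        # Remove unwanted entries (pictures etc.) and robot requests.
        if UNWANTED.search(path) or ROBOT_AGENT.search(agent):
            continue
        # Sessionize: start a new session after a long pause.
        last = open_session.get(host)
        if last is None or ts - last[0] > TIMEOUT:
            session = []
            sessions.append(session)
        else:
            session = last[1]
        session.append(to_concept(url))
        open_session[host] = (ts, session)
    return sessions


t0 = datetime(2008, 11, 11, 10, 0)
example_entries = [
    ("h1", t0, "/search?q=x", "Mozilla"),
    ("h1", t0 + timedelta(minutes=1), "/logo.gif", "Mozilla"),
    ("h1", t0 + timedelta(minutes=2), "/person/42", "Mozilla"),
    ("h2", t0 + timedelta(minutes=3), "/search", "Googlebot"),
    ("h1", t0 + timedelta(hours=2), "/index.html", "Mozilla"),
]
print(prepare(example_entries))  # [['SEARCH', 'PERSON'], ['OTHER']]
```

When the log carries explicit session IDs, as in case study 1, grouping by that ID is preferable to the timeout heuristic used here.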
In the case study:
1. Use qualitative methods for application understanding (read!)
2. Inspect the site and the URLs for data understanding
   1. Generate Analog reports for getting base statistics of usage
   2. Build concept system / hierarchy and mapping: URLs → concepts (notation: WUMprep regex)
3. Use WUMprep for data preparation
   1. Remove unwanted entries (pictures etc.)
   2. Sessionize
   3. Remove robots
   4. Replace URLs by concepts
   5. (Build a database)
   done
4. Use WEKA for modelling
   1. [Transform log file into ARFF (WUMprep4WEKA)]
   2. Cluster, classify, find association rules, ...
5. Use WUM for modelling
6. Select patterns based on objective interestingness measures (support, confidence, lift, ...) and on subjective interestingness measures (unexpected? application-relevant?)
7. Present results in tabular, textual and graphical form (use Excel, ...)
8. Interpret the results
9. Make recommendations for site improvement etc.
URLs of the tools
• Analog: http://www.analog.cx/
• WUMprep: http://www.hypknowsys.de/
• WEKA: http://www.cs.waikato.ac.nz/ml/weka/
• WUM: http://www.hypknowsys.de/
Short introductions to WUMprep
• Lüderitz, S. (2006). Pre-processing of webserver logs for data mining. http://www.cs.kuleuven.be/~berendt/teaching/2007w/adb/Lecture/OtherSlides/luederitz-presentation1-slides_2006_07_10.pdf (pp. 30-32)
• Dettmar, G. (2003). Logfile-Preprocessing using WUMprep. http://warhol.wiwi.hu-berlin.de/~berendt/lehre/2003w/wmi/Student_Presentations/Gebhard_WUMprep.pdf
Materials for your case study
• Original log
• A transformed log (to simplify your work of sessionizing)
• Some explanation:
  - http://www.cs.kuleuven.be/~berendt/teaching/2008w/adb/Lecture/OtherSlides/explaining-the-ka2portal-logs.html
  - (original log and transformed log are hyperlinked there)
• The ontology
  - http://annotation.semanticweb.org/iswc.daml
  - You can browse this ontology (it is the default ontology, see Wizard), for example with the Ontomat tool: http://annotation.semanticweb.org/ontomat/simple.html
• Unfortunately, the site itself is not running any more! Use www.archive.org to inspect earlier versions.
To structure your case study:
More details in: CRISP-DM 1.0. Step-by-step data mining guide. www.crisp-dm.org/CRISPWP-0800.pdf
Agenda
• Intro: Web Mining, specifically Web Usage Mining
  - task "Find association rules"; Apriori algorithm
• Data Acquisition, Understanding, and Preparation
• Forms of analysis; mining techniques
• Case study 1: A multi-channel retailer
  - method: Association-rule discovery
• Case study 2: Search in an educational portal
  - method: Sequence mining / generalized-sequence discovery
• Case study 3: Search in a community portal
• More unstructured/semistructured data: text
References / background reading (1)
• Association rule mining is a basic technique covered in all major textbooks (see literature list of last session)
• The apriori algorithm was introduced by
  - Agrawal, R. and Srikant, R. 1994. Fast Algorithms for Mining Association Rules in Large Databases. In Proceedings of the 20th International Conference on Very Large Data Bases (September 12-15, 1994). J. B. Bocca, M. Jarke, and C. Zaniolo, Eds. Very Large Data Bases. Morgan Kaufmann Publishers, San Francisco, CA, 487-499. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.40.7506
• Web mining
  - Baldi, P., Frasconi, P., & Smyth, P. (2003). Modeling the Internet and the Web. Probabilistic Methods and Algorithms. Chichester, UK: John Wiley & Sons. http://ibook.ics.uci.edu/
  - Bing Liu (2006). Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications). Springer. http://www.cs.uic.edu/%7Eliub/WebMiningBook.html
• A general overview of Web usage mining
  - Srivastava, J., Desikan, P., & Kumar, V. (2004). Web Mining - Concepts, Applications and Research Directions. In H. Kargupta, A. Joshi, K. Sivakumar, & Y. Yesha (Eds.), Data Mining: Next Generation Challenges and Future Directions (pp. 405-423). Menlo Park, CA: AAAI/MIT Press. (earlier, longer version: http://www.ieee.org.ar/downloads/Srivastava-tut-paper.pdf)
References / background reading (2)
• Data preparation for Web usage mining
  - Cooley, R., Mobasher, B., & Srivastava, J. (1999). Data preparation for mining world wide web browsing patterns. J. of Knowledge and Inform. Systems, 1, 5–32. http://citeseer.ist.psu.edu/cooley99data.html
  - Spiliopoulou, M., Mobasher, B., Berendt, B., & Nakagawa, M. (2003). A framework for the evaluation of session reconstruction heuristics in Web-usage analysis. INFORMS Journal on Computing, 15, 171-190. http://warhol.wiwi.hu-berlin.de/~berendt/Papers/spiliopoulou_etal_2003.pdf
• Case study 1
  - Teltzrow, M., & Berendt, B. (2003). Web-Usage-Based Success Metrics for Multi-Channel Businesses. In Proceedings of the WebKDD 2003 Workshop - Webmining as a Premise to Effective and Intelligent Web Applications. August 27th, 2003, Washington DC, USA. Held in conjunction with The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. http://warhol.wiwi.hu-berlin.de/~teltzrow/teltzrow_berendt_webkdd03.pdf
  - Teltzrow, M., Berendt, B., & Günther, O. (2003). Consumer behaviour at multi-channel retailers. In Proceedings of the 4th IBM eBusiness Conference, School of Management, University of Surrey, 9th December 2003. http://warhol.wiwi.hu-berlin.de/~berendt/Papers/teltzrow_berendt_guenther_2003.pdf
• Case study 2
  - Berendt, B. & Spiliopoulou, M. (2000). Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal, 9, 56-75. http://vasarely.wiwi.hu-berlin.de/Home/berendt-spiliopoulou-vldbj00.pdf
Acknowledgements
• Slides 14-21 have been inspired by Jeffrey Ullman (n.d.). Association rules. http://infolab.stanford.edu/~ullman/mining/assoc-rules1.ppt
• The LaTeX formulas for confidence and lift were taken from http://en.wikipedia.org/wiki/Association_rule