ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)
Introduction to Data Mining
Olivia R. Liu Sheng, Ph.D.
Emma Eccles Jones Presidential Chair of Business

Outline
• Introduction
  – Why data mining?
  – What is data mining?
  – Data mining process
• Types of Data Mining Tasks
• Main Data Mining Tools
• Reading
  – T2, Ch. 1

Why Business Intelligence Systems?
• Knowledge Management Problems (drowning in data, starving for knowledge)
  1. Can't access data (easily). E.g., data from different branches, years, functional areas, etc.
  2. Give me only what's important (knowledge). E.g., which products do customers tend to buy together?
  3. I need to reduce data to what's important by slicing and dicing. E.g., by branch, product, year, etc.

Why Business Intelligence Systems?
  4. Data inconsistency and poor data quality. E.g., the 2001 PC sales amounts for SLC reported by the CFO and by the SLC Account Manager are not the same.
  5. Need to improve the practice of making informed decisions. E.g., did the VP for Marketing decide on the advertising budgets for branches in the SW region based on their sales performance over the last five years?
  6. Querying the database is hard and slow. E.g., the VP for Marketing, the CFO and the Account Manager had to wait for the MIS Department to generate sales performance reports and analyses.

Why Business Intelligence Systems?
• ROI Problems
  7. Can I get more value out of my data? Ans: Make informed, potent decisions using knowledge extracted from integrated and consistent data over a long period of time.
  8. Can I do this cost-effectively?
  9. Can I easily scale up or change how I get knowledge out of my data? Options: manually versus automatically identifying knowledge.

Why data mining?
• OLAP can only provide shallow data analysis (the "what")
  – Ex: sales distribution by product

Why data mining?
• Shallow data analysis is not sufficient to support business decisions, which need the "how"
  – Ex: how to boost sales of other products
  – Ex: when people buy product 6, what other products are they likely to buy? (cross-selling)

Why data mining?
• OLAP can only do shallow data analysis
  – OLAP is based on SQL:
      SELECT PRODUCTS.PNAME, SUM(SALESFACTS.SALES_AMT)
      FROM DBSR.PRODUCTS, DBSR.SALESFACTS
      WHERE PRODUCTS.PRODUCT_KEY = SALESFACTS.PRODUCT_KEY
      GROUP BY PRODUCTS.PNAME;
  – By its nature, SQL is poorly suited to implementing complicated algorithms.
• Complicated algorithms need to be developed to support deep data analysis: data mining
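To make the "shallow" OLAP query above concrete, here is a minimal runnable sketch using Python's built-in sqlite3 module. It assumes a simplified two-table schema with only the columns named in the query; the DBSR schema prefix is dropped, an ORDER BY is added so the output order is predictable, and the sample rows are invented for illustration.

```python
import sqlite3

def sales_by_product(product_rows, sales_rows):
    """Run the slide's aggregation query against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # Assumed, simplified schema: only the columns the query uses.
    conn.execute("CREATE TABLE PRODUCTS (PRODUCT_KEY INTEGER PRIMARY KEY, PNAME TEXT)")
    conn.execute("CREATE TABLE SALESFACTS (PRODUCT_KEY INTEGER, SALES_AMT REAL)")
    conn.executemany("INSERT INTO PRODUCTS VALUES (?, ?)", product_rows)
    conn.executemany("INSERT INTO SALESFACTS VALUES (?, ?)", sales_rows)
    query = """
        SELECT PRODUCTS.PNAME, SUM(SALESFACTS.SALES_AMT)
        FROM PRODUCTS, SALESFACTS
        WHERE PRODUCTS.PRODUCT_KEY = SALESFACTS.PRODUCT_KEY
        GROUP BY PRODUCTS.PNAME
        ORDER BY PRODUCTS.PNAME
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical sample data.
products = [(1, "Diapers"), (2, "Beer")]
sales = [(1, 10.0), (1, 5.0), (2, 8.0)]
totals = sales_by_product(products, sales)
print(totals)
```

The query answers "what sold, and how much", but says nothing about which products sell together, which is the gap the following slides address.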

Why Data Mining?
Walmart (!?)
Diaper + Beer = $$$ ?

Market Basket (Association Rule) Analysis
A market basket is a collection of items purchased by a customer in an individual transaction, which is a well-defined business activity.
Ex:
• a customer's visit to a grocery store
• an online purchase from a virtual store such as Amazon

Market Basket (Association Rule) Analysis
Market basket analysis is a common analysis run against a transaction database to find sets of items, or itemsets, that appear together in many transactions. Each pattern extracted through the analysis consists of an itemset and the number of transactions that contain it.
Applications:
• improve the placement of items in a store
• the layout of mail-order catalog pages
• the layout of Web pages
• others?
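The definition above (an itemset plus the number of transactions containing it) can be sketched in a few lines of Python. This is a brute-force count over every item combination of a fixed size, not an optimized algorithm such as Apriori; the function names, the `min_count` threshold and the sample baskets are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def count_itemsets(transactions, size=2):
    """Count how many transactions contain each itemset of the given size."""
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates, fix item order
        for itemset in combinations(items, size):
            counts[itemset] += 1
    return counts

def frequent_itemsets(transactions, size=2, min_count=2):
    """Keep only itemsets that appear in at least min_count transactions."""
    counts = count_itemsets(transactions, size)
    return {itemset: n for itemset, n in counts.items() if n >= min_count}

# Hypothetical transactions.
sample_baskets = [
    ["diapers", "beer", "milk"],
    ["diapers", "beer"],
    ["milk", "bread"],
    ["diapers", "milk"],
]
pairs = frequent_itemsets(sample_baskets, size=2, min_count=2)
print(pairs)
```

Enumerating all combinations grows quickly with basket size, which is one reason dedicated mining algorithms exist.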

• Degenerate key provides additional grouping of fact records
• Impractical to view market baskets using OLAP tools
Degenerate Key: ORDER_NO
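As a rough illustration of how a degenerate key groups fact records into market baskets, the sketch below collects the products of each order. It assumes each fact row has been reduced to an (ORDER_NO, product) pair; that row layout and the sample values are hypothetical.

```python
from collections import defaultdict

def baskets_by_order(fact_rows):
    """Group fact rows into one basket (set of products) per ORDER_NO."""
    grouped = defaultdict(set)
    for order_no, product in fact_rows:
        grouped[order_no].add(product)
    return dict(grouped)

# Hypothetical fact rows: (ORDER_NO, product name).
facts = [
    (1001, "diapers"),
    (1001, "beer"),
    (1002, "milk"),
    (1001, "diapers"),  # repeated line item in the same order
]
order_baskets = baskets_by_order(facts)
print(order_baskets)
```

Each resulting basket can then be fed to a market basket analysis, which is awkward to express as an OLAP slice.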

Why data mining?
• OLAP results generated from data sets with a large number of attributes are difficult to interpret
  – Ex: cluster the customers of my company for target marketing
  – Pick two attributes related to a customer: income level and sales amount

Why data mining?
  – Ex: cluster the customers of my company for target marketing
  – Pick three attributes related to a customer: income level, education level and sales amount
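The clustering example above can be sketched with a plain k-means loop, which works the same way for two, three or more attributes. K-means is swapped in here only as one common clustering technique; the slides do not name an algorithm. The customer values are invented and assumed to be pre-scaled to the 0..1 range, and the starting centroids are picked by hand.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignment, centroids

# Hypothetical customers: (income level, education level, sales amount), scaled to 0..1.
customers = [(0.1, 0.2, 0.1), (0.2, 0.1, 0.2), (0.9, 0.8, 0.9), (0.8, 0.9, 0.8)]
labels, centers = kmeans(customers, centroids=[customers[0], customers[2]])
print(labels)
```

With three or more attributes the clusters can no longer be read off a simple chart, which is where an algorithm like this earns its keep.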

What is data mining?
• Data mining is a process to extract hidden and interesting patterns from data.
• Data mining is a step in the process of Knowledge Discovery in Databases (KDD).

What is NOT Data Mining?
• Not SQL
  – SQL: extraction of detailed data
• Not OLAP
  – OLAP: summary, trends, forecasts
• Not magic
  – Data mining: based on algorithms that can discover hidden patterns. It is interactive, not fully automated.

Major data mining tasks
• Association rule mining
  – e.g., to cross-sell, identify other items that a customer tends to buy if the customer has already purchased item A
• Clustering
  – e.g., for target marketing, identify clusters of similar customers
• Classification
  – e.g., for fraud detection, identify which customer or transaction is fraudulent
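For the cross-selling task, one standard way to score "tends to buy" is rule confidence: of the transactions that contain item A, the fraction that also contain another item. The sketch below uses that measure, which the slides do not define explicitly; the function names and the sample transactions are hypothetical.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    with_antecedent = [set(t) for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    with_both = sum(1 for t in with_antecedent if consequent in t)
    return with_both / len(with_antecedent)

def cross_sell_candidates(transactions, item):
    """Rank other items by how often they accompany the given item."""
    others = {i for t in transactions if item in t for i in t} - {item}
    scored = [(o, rule_confidence(transactions, item, o)) for o in others]
    # Highest confidence first; ties broken alphabetically.
    return sorted(scored, key=lambda s: (-s[1], s[0]))

# Hypothetical transactions.
history = [["A", "B"], ["A", "B", "C"], ["A", "C"], ["B", "C"], ["A", "B"]]
candidates = cross_sell_candidates(history, "A")
print(candidates)
```

Confidence alone can be misleading for items that are popular anyway, so real tools usually combine it with other measures.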

Steps of the KDD Process
[Diagram] Data -> Step 1: Selection -> Target Data -> Step 2: Cleaning -> Preprocessed Data -> Step 3: Transformation -> Transformed Data -> Step 4: Data Mining -> Patterns -> Step 5: Interpretation & Evaluation -> Knowledge

Steps of the KDD Process
• Step 1: select the columns (attributes) and rows (records) of interest to be mined
• Step 2: clean errors from the selected data
• Step 3: transform the data so that it is suitable for high-performance data mining
• Step 4: data mining
• Step 5: filter out non-interesting patterns from the data mining results
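The five steps above can be strung together as a small pipeline. This is only a sketch under stated assumptions: records are Python dicts, "cleaning" means dropping records with missing values, the mining step is a simple pair count, and all field names and sample rows are invented.

```python
from collections import Counter
from itertools import combinations

def select(records, columns, keep_row):
    """Step 1: keep only the columns and rows of interest."""
    return [{c: r.get(c) for c in columns} for r in records if keep_row(r)]

def clean(records):
    """Step 2: drop records with missing values."""
    return [r for r in records if all(v not in (None, "") for v in r.values())]

def transform(records):
    """Step 3: reshape rows into one basket (set of products) per order."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["order_no"], set()).add(r["product"])
    return list(baskets.values())

def mine(baskets):
    """Step 4: count how many baskets contain each pair of products."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts

def evaluate(patterns, min_count):
    """Step 5: filter out patterns that occur too rarely to be interesting."""
    return {p: n for p, n in patterns.items() if n >= min_count}

# Hypothetical raw data.
raw = [
    {"order_no": 1, "product": "diapers", "branch": "SLC", "clerk": "x"},
    {"order_no": 1, "product": "beer", "branch": "SLC", "clerk": "y"},
    {"order_no": 2, "product": "diapers", "branch": "SLC", "clerk": "x"},
    {"order_no": 2, "product": "beer", "branch": "SLC", "clerk": "x"},
    {"order_no": 2, "product": None, "branch": "SLC", "clerk": "x"},
    {"order_no": 3, "product": "milk", "branch": "NYC", "clerk": "z"},
]
target = select(raw, ["order_no", "product"], lambda r: r["branch"] == "SLC")
preprocessed = clean(target)
transformed = transform(preprocessed)
patterns = mine(transformed)
knowledge = evaluate(patterns, min_count=2)
print(knowledge)
```

In practice the process is iterative: what step 5 reveals often sends the analyst back to adjust the selection or cleaning steps.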

Data mining – on what kind of data
• Transactional database
• Data warehouse
• Flat file
• Web data
  – Web content
  – Web structure
  – Web log

[Diagram] Data warehousing (DW) and data mining (DM) processes
DW: Step 1: Acquisition -> Raw data -> Step 2: Selection -> Target data for DW -> Step 3: Cleaning & preprocessing -> Preprocessed data for DW -> Step 4: Transformation -> Transformed data for DW -> Data warehouse
DM: Step 1: Selection -> Target data for DM -> Step 2: Cleaning & preprocessing -> Preprocessed data for DM -> Step 3: Transformation -> Transformed data for DM -> Step 4: Data mining -> Patterns -> Step 5: Interpretation & evaluation -> Discovered knowledge
Other elements shown: OLAP & reporting, Domain expert

Data Mining Tools
• Over 100 commercial data mining tools are available, and new entries keep arriving
• Tools offer a variety of functionality and features, making evaluation and comparison difficult

Evaluation Criteria
1. System Requirements
2. Data Access
3. Mining Performance
4. User Interface
5. Visualization
[Diagram] Server side: Database or Flat files, Data Mover (Data Access), Data Mining Engine. Client: Tool Manager (often GUI), Visualization Tools, End Users.

Data Mining Tools: Market Leaders
Class choice

Web Analytics Software Providers
• http://surfaid.dfw.ibm.com/web/home/index.html
• http://pro.blogger.com/
• http://www.clickstream.com/
• http://www.deepmetrix.com/index.asp?source=google&keyword=web+analytics
• http://www.eloqua.com/srch/analytics.asp
• http://www.intellitracker.com/
• http://www.maxamine.com/
• http://www.mediahouse.com/
• http://www.netiq.com/webtrends/default.asp
• http://www.omniture.com/products.html
• http://www.sitebrand.com/?source=jan
• http://www.statsoftinc.com/
• http://www.urchin.com/
• http://www.webabacus.com/
• http://www.websidestory.com/
• http://www.databeacon.com/index_IE.html
• http://www.sane.com/ads/whoiscoming.html