3e322bf02a3356508ce9606b144cf98e.ppt
- Number of slides: 12
MIS 451 Building Business Intelligence Systems
Introduction to Data Mining
Why data mining?
- OLAP can only provide shallow data analysis: it answers "what" questions
- Ex: sales distribution by product
Why data mining?
- Shallow data analysis is not sufficient to support business decisions, which ask "how" questions
- Ex: how to boost sales of other products
- Ex: when people buy product 6, what other products are they likely to buy? (cross-selling)
Why data mining?
- OLAP can only do shallow data analysis
- OLAP is based on SQL:
    SELECT PRODUCTS.PNAME, SUM(SALESFACTS.SALES_AMT)
    FROM DBSR.PRODUCTS, DBSR.SALESFACTS
    WHERE (PRODUCTS.PRODUCT_KEY = SALESFACTS.PRODUCT_KEY)
    GROUP BY PRODUCTS.PNAME;
- By its nature, SQL is not suited to implementing complicated algorithms
- Complicated algorithms need to be developed to support deep data analysis: data mining
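
To make the OLAP-style query above concrete, here is a minimal sketch that runs the same join and GROUP BY against an in-memory SQLite database through Python's standard `sqlite3` module. The sample rows are invented for illustration, the `DBSR` schema prefix is dropped, and an `ORDER BY` is added only to make the output order predictable; none of these come from the course database.

```python
import sqlite3

def sales_by_product(product_rows, sales_rows):
    """Return [(product name, total sales amount)] via a join + GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PRODUCTS (PRODUCT_KEY INTEGER PRIMARY KEY, PNAME TEXT)")
    conn.execute("CREATE TABLE SALESFACTS (PRODUCT_KEY INTEGER, SALES_AMT REAL)")
    conn.executemany("INSERT INTO PRODUCTS VALUES (?, ?)", product_rows)
    conn.executemany("INSERT INTO SALESFACTS VALUES (?, ?)", sales_rows)
    rows = conn.execute(
        """
        SELECT PRODUCTS.PNAME, SUM(SALESFACTS.SALES_AMT)
        FROM PRODUCTS, SALESFACTS
        WHERE PRODUCTS.PRODUCT_KEY = SALESFACTS.PRODUCT_KEY
        GROUP BY PRODUCTS.PNAME
        ORDER BY PRODUCTS.PNAME  -- added here for a stable output order
        """
    ).fetchall()
    conn.close()
    return rows

# Hypothetical sample data (not from the course database).
SAMPLE_PRODUCTS = [(1, "Bread"), (2, "Milk")]
SAMPLE_SALES = [(1, 2.0), (1, 3.0), (2, 1.5)]

totals = sales_by_product(SAMPLE_PRODUCTS, SAMPLE_SALES)
print(totals)
```

The result is a "what" answer (total sales per product); it says nothing about which products tend to be bought together.
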
Why data mining?
- OLAP results generated from data sets with a large number of attributes are difficult to interpret
- Ex: cluster the customers of my company (target marketing)
- Pick two attributes related to a customer: income level and sales amount
Why data mining?
- Ex: cluster the customers of my company (target marketing)
- Pick three attributes related to a customer: income level, education level, and sales amount
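
The customer-clustering example can be sketched with a tiny k-means routine. The slides do not name a clustering algorithm, so k-means is an assumption here, as are the sample customers, the choice of two clusters, and seeding the centroids with the first k points. The attributes are left unscaled for brevity; in practice they would usually be normalized first so that sales amount does not dominate the distance.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Minimal k-means; the first k points are used as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical customers: (income level, education level, sales amount).
CUSTOMERS = [
    (20, 12, 100), (22, 12, 120), (21, 14, 105),
    (80, 16, 900), (85, 18, 950), (82, 16, 920),
]

labels, centroids = kmeans(CUSTOMERS, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The same function works for two, three, or more attributes, which is the point of the slides: a plot stops being readable as attributes are added, while the algorithm does not change.
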
What is data mining?
- Data mining is a process for extracting hidden and interesting patterns from data.
- Data mining is a step in the process of Knowledge Discovery in Databases (KDD).
Steps of the KDD Process
[Figure: Data -> Step 1: Selection -> Target Data -> Step 2: Cleaning -> Preprocessed Data -> Step 3: Transformation -> Transformed Data -> Step 4: Data Mining -> Patterns -> Step 5: Interpretation & Evaluation -> Knowledge]
Steps of the KDD Process
- Step 1: select the columns (attributes) and rows (records) of interest to be mined
- Step 2: clean errors from the selected data
- Step 3: transform the data into a form suitable for high-performance data mining
- Step 4: data mining
- Step 5: filter out uninteresting patterns from the data mining results
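
The five steps above can be sketched as a chain of small functions. Everything concrete here is an assumption made for illustration: the records, the field names, the income threshold of 50000, the use of simple pair counting as the "mining" step, and the minimum count of 2 as the interestingness filter.

```python
from collections import Counter

# Hypothetical raw records; customer B has a missing income value.
RAW = [
    {"customer": "A", "income": "52000", "region": "N", "product": "bread"},
    {"customer": "B", "income": "", "region": "S", "product": "milk"},
    {"customer": "C", "income": "87000", "region": "N", "product": "bread"},
    {"customer": "D", "income": "31000", "region": "S", "product": "milk"},
    {"customer": "E", "income": "91000", "region": "N", "product": "wine"},
]

def select(records):
    """Step 1: keep only the attributes of interest."""
    return [{"income": r["income"], "product": r["product"]} for r in records]

def clean(records):
    """Step 2: drop records whose income is missing."""
    return [r for r in records if r["income"].strip()]

def transform(records):
    """Step 3: bin income into 'high'/'low' so it is easy to count."""
    return [("high" if int(r["income"]) >= 50000 else "low", r["product"])
            for r in records]

def mine(pairs):
    """Step 4: count how often each (income band, product) pair occurs."""
    return Counter(pairs)

def evaluate(patterns, min_count=2):
    """Step 5: keep only patterns seen at least min_count times."""
    return {p: n for p, n in patterns.items() if n >= min_count}

knowledge = evaluate(mine(transform(clean(select(RAW)))))
print(knowledge)  # {('high', 'bread'): 2}
```
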
Data mining: on what kind of data?
- Transactional database
- Data warehouse
- Flat file
- Web data
  - Web content
  - Web structure
  - Web log
Major data mining tasks
- Association rule mining: cross-selling
- Clustering: target marketing
- Classification: potential customer identification, fraud detection
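
As a small illustration of association rule mining for cross-selling, the sketch below scores a rule such as "bread -> milk" with support and confidence, the two standard measures for such rules. The slides do not define these measures, so they are brought in here as an assumption, and the baskets are made up.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Among transactions containing antecedent, the fraction also containing consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical market baskets.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(BASKETS, {"bread", "milk"})
rule_confidence = confidence(BASKETS, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Here bread and milk appear together in 2 of 4 baskets (support 0.5), and 2 of the 3 baskets containing bread also contain milk (confidence about 0.67), which is the kind of evidence a cross-selling decision would rest on.
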
- Reading: data mining book, chapter 1