
Number of slides: 30

Chapter 1: Why & What is Data Mining?
Note: This slide set includes both Chapter 1 material and additional material from the instructor.

Data Mining is a subset of Business Intelligence (BI)

Topics to Discuss in Session #1
• What is Data Mining (DM)?
• Who uses DM?
• Why DM?
• Where DM?
• When DM?
• How DM?
• Why study DM?

[What, Who] Data Mining – Definition & Goal
• Definition
  – DM is the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules.
• Goal
  – To allow an “enterprise”* to IMPROVE its ______ through better understanding of its ______.
  – Potential for competitive advantage.
* Synonyms include: corporation, firm, non-profit organization, government agency.

Foundations of Data Mining
• Data mining is the process of using “raw” data to infer important “business” relationships.
• Despite a consensus on the value of data mining, a great deal of confusion exists about what it is.
• Data mining is a collection of powerful techniques intended for analyzing large amounts of data.
• There is no single data mining approach, but rather a set of techniques that can be used on their own or in combination with each other.

[Why, Where, When] Data Mining – Why Now?
1. Data are being produced.
2. Data are being warehoused.
3. Computing power is more affordable.
4. Competitive pressures are enormous.
5. Data mining software is available.

[How] Customer Relationship Management (CRM)

[How] Customer Relationship Management (CRM)
In order to form a learning relationship with its customers, an enterprise (firm) must be able to:
1. Notice – what its customers are doing
2. Remember – what it and its customers have done over time
3. Learn – from what it has remembered
4. Act on – what it has learned, to make customers more profitable
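
The Notice / Remember / Learn / Act cycle can be sketched as a loop over customer transactions. This is a minimal illustration under stated assumptions: the `Transaction` record, the `CustomerMemory` class, and the "offer the customer's most frequently bought category" rule are all hypothetical stand-ins, not a method taken from the chapter.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Transaction:
    """Hypothetical transaction record: which customer bought what."""
    customer_id: str
    category: str
    amount: float


class CustomerMemory:
    """Toy sketch of the Notice / Remember / Learn / Act cycle."""

    def __init__(self) -> None:
        # Remember: per-customer history of every transaction noticed so far.
        self.history: dict[str, list[Transaction]] = defaultdict(list)

    def notice(self, txn: Transaction) -> None:
        # Notice what a customer is doing and store it, so it is remembered over time.
        self.history[txn.customer_id].append(txn)

    def learn(self, customer_id: str) -> str | None:
        # Learn from what was remembered: the category this customer buys most often.
        counts = Counter(t.category for t in self.history.get(customer_id, []))
        if not counts:
            return None
        return counts.most_common(1)[0][0]

    def act(self, customer_id: str) -> str:
        # Act on what was learned (hypothetical rule: promote the favorite category).
        favorite = self.learn(customer_id)
        if favorite is None:
            return f"{customer_id}: send general welcome offer"
        return f"{customer_id}: send offer for {favorite}"


memory = CustomerMemory()
for txn in [
    Transaction("c1", "books", 12.0),
    Transaction("c1", "music", 8.0),
    Transaction("c1", "books", 20.0),
    Transaction("c2", "garden", 35.0),
]:
    memory.notice(txn)

print(memory.act("c1"))  # c1: send offer for books
print(memory.act("c3"))  # c3: send general welcome offer
```

A real CRM system would remember far richer history and learn with actual data mining models; the point here is only that "Act" depends on what was noticed and remembered earlier.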

[How] Based on “Transaction” Data

[How] Based on “Transaction” Data (continued)

[How] Identifying and Remembering Relationships Is the Key!

Group Exercise #1
• Time box = 15 minutes
• Teams of 4 or fewer
• Discuss DM situations among yourselves and pick one to report to the class
• What to report (verbally – 5 minutes max):
  – Describe the DM situation
  – How does it help the enterprise?
• Presentations… another 15 to 30 minutes

Why Study Data Mining?
• Open discussion to identify the reasons

Topics to Discuss in Session #2
• Data Mining History
• Data Warehouse
• Data Mart

Data Mining History
• The approach has roots in practice dating back more than 40 years.
• In the 1960s and 1970s, data mining was called statistical analysis, and the pioneers were statistical software companies such as SAS and SPSS.
• By the late 1980s, these traditional techniques had been augmented by new methods such as fuzzy logic, heuristics and neural networks.

Definitions of a Data Warehouse
1. “A subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process” – W. H. Inmon
2. “A copy of transaction data, specifically structured for query and analysis” – Ralph Kimball

Data Warehouse
• For organizational learning to take place, data from many sources must be gathered together and organized in a consistent and useful way – hence, Data Warehousing (DW).
• DW allows an organization (enterprise) to remember what it has noticed about its data.
• Data Mining techniques make use of the data in a DW.

Data Warehouse
[Diagram: Enterprise “Database” (Customers, Orders, Transactions, Vendors, etc.) → copied, organized, summarized → Data Warehouse → Data Mining]
Data Miners:
• “Farmers” – they know
• “Explorers” – unpredictable

Data Warehouse
• A data warehouse is a copy of transaction data specifically structured for querying, analysis and reporting, and hence for data mining.
• Note that the data warehouse contains a copy of the transactions, and that copy is not updated or changed later by the transaction system.
• Also note that this data is specially structured and may have been transformed when it was copied into the data warehouse.
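
The "copy, possibly transformed, never updated by the source" idea can be sketched with Python's built-in `sqlite3` module and an in-memory database. The table names (`orders`, `dw_orders`), their columns, the cents-to-dollars and date-to-month transformations, and the fixed load date are illustrative assumptions, not a prescribed warehouse design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Operational (transaction-system) table: rows can be updated in place.
cur.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, "
    "amount_cents INTEGER, order_date TEXT)"
)
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "alice", 1250, "2024-01-05"), (2, "bob", 4000, "2024-01-06")],
)

# Warehouse table: structured for analysis and loaded as a copy.
cur.execute(
    "CREATE TABLE dw_orders (order_id INTEGER, customer TEXT, "
    "amount_dollars REAL, order_month TEXT, load_date TEXT)"
)

# Copy with transformation: cents -> dollars, full date -> month, plus a load stamp.
cur.execute(
    """
    INSERT INTO dw_orders
    SELECT order_id, customer, amount_cents / 100.0, substr(order_date, 1, 7), '2024-02-01'
    FROM orders
    """
)

# Later, the transaction system changes an order...
cur.execute("UPDATE orders SET amount_cents = 9999 WHERE order_id = 1")

# ...but the warehouse copy keeps the value as of the load.
print(cur.execute("SELECT amount_dollars FROM dw_orders WHERE order_id = 1").fetchone())  # (12.5,)
```

Real warehouses load on a schedule through ETL tooling; the sketch only shows why the warehouse row and the operational row can legitimately differ.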

Data Mart
• A Data Mart is a smaller, more focused Data Warehouse – a mini-warehouse.
• A Data Mart typically reflects the business rules of a specific business unit within an enterprise.

Data Warehouse to Data Mart
[Diagram: a Data Warehouse and a Data Mart, each supplying decision support information]

Data Warehouse & Mart
• Set of “tables” – 2 or more dimensions
• Designed for aggregation
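
To make "2 or more dimensions" and "designed for aggregation" concrete, here is a minimal sketch in plain Python. The fact rows, the dimension names (`unit`, `region`, `month`), the `roll_up` helper, and the idea of carving out a mart by filtering on one business unit are illustrative assumptions; real warehouses usually do this with SQL over dimensional tables.

```python
from collections import defaultdict

# Hypothetical fact rows: each sale is described by dimensions plus one measure.
facts = [
    {"unit": "grocery", "region": "West", "month": "2024-01", "sales": 120.0},
    {"unit": "grocery", "region": "West", "month": "2024-02", "sales": 80.0},
    {"unit": "grocery", "region": "East", "month": "2024-01", "sales": 50.0},
    {"unit": "pharmacy", "region": "West", "month": "2024-01", "sales": 200.0},
]


def roll_up(rows, *dims):
    """Sum the 'sales' measure over every combination of the given dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row["sales"]
    return dict(totals)


# Warehouse-wide view: total sales by business unit.
print(roll_up(facts, "unit"))
# {('grocery',): 250.0, ('pharmacy',): 200.0}

# Data mart: only the grocery unit's rows, aggregated by region and month.
grocery_mart = [row for row in facts if row["unit"] == "grocery"]
print(roll_up(grocery_mart, "region", "month"))
# {('West', '2024-01'): 120.0, ('West', '2024-02'): 80.0, ('East', '2024-01'): 50.0}
```

The same `roll_up` call works on the whole warehouse or on the smaller mart; the mart simply starts from the rows that matter to one business unit.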

Group Exercise #2
• Time box = 15 minutes
• Teams of 4 or fewer
• Discuss Data Warehouse to Data Mart situations among yourselves and pick one to report to the class
• What to report (verbally – 5 minutes max):
  – Describe the DW to Data Mart situation
  – How does it help the enterprise’s “business” unit?
• Presentations… another 15 to 30 minutes

Topics to Discuss in Session #3
• Data Mining Flavors
• Data Mining Examples
• Data Mining Tasks
• Data Mining’s Biggest Challenge
• What does all of this mean?

Data Mining Flavors
• Directed – attempts to explain or categorize some particular target field, such as income or response.
• Undirected – attempts to find patterns or similarities among groups of records without the use of a particular target field or collection of predefined classes.
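
The difference between the two flavors can be sketched on toy numbers. This is an illustration chosen for this note, not the chapter's method: the directed part fits an ordinary least-squares line that predicts a chosen target field (income from years of education), while the undirected part runs a tiny two-group k-means over the income values with no target at all. The data are invented and chosen to lie exactly on a line so the output is easy to check; `fit_line` and `two_means` are hypothetical helper names.

```python
# Toy records (invented so the numbers are easy to check):
# years of education and household income in $1000s.
education = [10, 12, 12, 16, 16, 18]
income = [30, 38, 38, 54, 54, 62]


def fit_line(xs, ys):
    """Directed: ordinary least squares for ys ~ a + b * xs, with ys as the target field."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    return mean_y - b * mean_x, b


def two_means(values, iterations=10):
    """Undirected: 1-D k-means with k = 2; no target field, only similarity between values."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to the nearer of the two current centers.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


a, b = fit_line(education, income)
print(f"income ~ {a:.1f} + {b:.1f} * education")  # income ~ -10.0 + 4.0 * education
print(two_means(income))  # [[30, 38, 38], [54, 54, 62]]
```

The directed model needs the target field to exist; the undirected grouping only needs a notion of similarity, and it is up to the analyst to interpret what the groups mean.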

Data Mining Examples in Enterprises
• US Government
  – FBI – track down criminals (SD Police also)
  – Treasury Dept. – suspicious international funds transfers
• Phone companies
• Supermarkets & superstores (Vons, Albertsons, Wal-Mart, Costco)
• Mail-order, online order (L.L. Bean, Victoria’s Secret, Lands’ End)
• Financial institutions (BofA, Wells Fargo, Charles Schwab)
• Insurance companies (USAA, Allstate, State Farm)
• Tons of others…

Data Mining Tasks
• Classification – example: Fr, So, Jr, Sr (freshman, sophomore, junior, senior)
• Estimation – example: household income
• Prediction – example: predict the average credit card balance transfer amount
• Affinity grouping – example: people who buy X often also buy Y, with probability Z%
• Clustering – similar to classification, but with no predefined classes
• Description and profiling – behavior begets an explanation, such as “More guys prefer In-n-Out Burger than do gals.”
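
The affinity-grouping example ("people who buy X often also buy Y, with probability Z%") corresponds to the confidence of a rule X → Y: the number of baskets containing both X and Y divided by the number containing X. The sketch below is a deliberately simple exhaustive search over item pairs, not the Apriori algorithm or any particular product's implementation; the baskets, the `rule_confidence`/`all_rules` helpers, and the 60% threshold are invented for illustration, and rule support is ignored.

```python
from itertools import permutations

# Invented market baskets: the set of items bought in each transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def rule_confidence(baskets, x, y):
    """Confidence of 'buys x -> also buys y' = baskets with x and y / baskets with x."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0
    return sum(1 for b in with_x if y in b) / len(with_x)


def all_rules(baskets, min_confidence=0.6):
    """Check every ordered pair of items and keep rules at or above the threshold."""
    items = sorted(set().union(*baskets))
    rules = []
    for x, y in permutations(items, 2):
        conf = rule_confidence(baskets, x, y)
        if conf >= min_confidence:
            rules.append((x, y, round(conf * 100)))
    return rules


for x, y, pct in all_rules(baskets):
    print(f"people who buy {x} also buy {y} with probability {pct}%")
# people who buy bread also buy milk with probability 75%
# people who buy butter also buy bread with probability 100%
# people who buy eggs also buy milk with probability 100%
# people who buy milk also buy bread with probability 75%
```

Checking every pair is fine for a handful of items; with thousands of products, dedicated association-rule algorithms prune the search instead.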

Data Mining’s Biggest Challenge
• The largest challenge a data miner may face is the sheer volume of data in the data warehouse.
• It is quite important, then, that summary data also be available to get the analysis started.
• A major problem is that this sheer volume may mask the important relationships the data miner is interested in.
• The ability to overcome the volume and interpret the data is quite important.

What Does All of This Mean?
• On a regular basis, “farmers” and “explorers” use their data warehouses to guide and answer a wide variety of questions.
• Nothing is free, however, and the benefits do come with a cost.
• The value of a data warehouse and subsequent data mining comes from the new and changed business processes it enables, including competitive advantage.
• There are limitations, though: a data warehouse cannot correct problems with its data, although it may help identify them more clearly.

End of Chapter 1