
  • Number of slides: 13

OLAM and Data Mining: Concepts and Techniques

Introduction
• Data explosion problem:
  – Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses, and other information repositories
• We are drowning in data, but starving for knowledge!
• Data warehousing and data mining:
  – On-line analytical processing: query-driven data analysis
  – Data mining: the efficient discovery of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Evolution of Database Technology
• 1960s:
  – Data collection, database creation, IMS and network DBMS
• 1970s:
  – Relational data model, relational DBMS
• 1980s:
  – RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s:
  – Data mining and data warehousing, multimedia databases, and Web technology

What is data mining?
• Data mining: the process of efficiently discovering previously unknown patterns, relationships, and rules in large databases and data warehouses
• Goal: help the human analyst understand the data
• SQL query:
  – How many bottles of wine did we sell in the 1st quarter of 1999 in Poland vs. Austria?
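The SQL-style question above has a single, fully specified answer. A minimal sketch using Python's built-in sqlite3 module shows what such a query could look like. The `sales` table, its columns, and all rows are hypothetical and invented for illustration only.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (country TEXT, product TEXT, sale_date TEXT, bottles INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Poland", "wine", "1999-01-15", 120),
        ("Poland", "wine", "1999-03-02", 80),
        ("Austria", "wine", "1999-02-20", 150),
        ("Austria", "wine", "1999-05-11", 60),  # 2nd quarter, filtered out
        ("Poland", "beer", "1999-01-20", 300),  # not wine, filtered out
    ],
)

# Bottles of wine sold in the 1st quarter of 1999, per country.
query = """
    SELECT country, SUM(bottles)
    FROM sales
    WHERE product = 'wine'
      AND sale_date >= '1999-01-01' AND sale_date < '1999-04-01'
      AND country IN ('Poland', 'Austria')
    GROUP BY country
    ORDER BY country
"""
totals = dict(conn.execute(query).fetchall())
print(totals)
```

The analyst has to know in advance exactly what to ask. A data mining query, by contrast, asks the system to find the patterns itself.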

What is data mining?
• Data mining query:
  – How do the buyers of wine in Poland and Austria differ?
  – What else do the buyers of wine in Austria buy along with wine?
  – How can the buyers of wine be characterized?

What is data mining?
• Data mining (knowledge discovery in databases):
  – Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information from data in large databases
• Alternative names and their “inside stories”:
  – Knowledge discovery in databases (KDD: SIGKDD), knowledge extraction, data archeology, data dredging, information harvesting, business intelligence, etc.
  – Data mining: a misnomer?
• What is not data mining?
  – Expert systems or small statistical programs
  – OLAP

Data Mining: A KDD Process
• Steps of a KDD process:
  – Learning the application domain:
    • relevant prior knowledge and goals of the application
  – Creating a target data set: data selection
  – Data cleaning and preprocessing (may take 60% of the effort!)
  – Data reduction and projection:
    • finding useful features, dimensionality/variable reduction, invariant representation
  – Choosing the functions of data mining:
    • summarization, classification, regression, association, clustering
  – Choosing the mining algorithm(s)
  – Data mining: search for patterns of interest
  – Interpretation: analysis of results
    • visualization, transformation, removing redundant patterns, etc.
  – Use of discovered knowledge
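The data-handling steps above can be sketched as a small pipeline. This is an illustrative sketch only, not a prescribed implementation: the records, field names, and helper functions (`select`, `clean`, `reduce_features`, `summarize`) are all assumptions made up for the example, and the mining function chosen here is plain summarization.

```python
from collections import defaultdict
from statistics import mean

# Toy records, invented for illustration only.
raw = [
    {"customer": "a", "country": "Poland", "age": 34, "spend": 120.0, "note": "x"},
    {"customer": "b", "country": "Poland", "age": None, "spend": 80.0, "note": "y"},
    {"customer": "c", "country": "Austria", "age": 51, "spend": 200.0, "note": "z"},
    {"customer": "d", "country": "Austria", "age": 47, "spend": None, "note": "w"},
    {"customer": "e", "country": "Austria", "age": 29, "spend": 100.0, "note": "v"},
]


def select(records, countries):
    """Create the target data set: keep only the rows of interest."""
    return [r for r in records if r["country"] in countries]


def clean(records, required):
    """Data cleaning: drop rows with missing values in the required fields."""
    return [r for r in records if all(r[f] is not None for f in required)]


def reduce_features(records, features):
    """Data reduction and projection: keep only the useful features."""
    return [{f: r[f] for f in features} for r in records]


def summarize(records, key, value):
    """Mining function (summarization): mean of `value` per `key`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r[value])
    return {k: mean(v) for k, v in groups.items()}


target = select(raw, {"Poland", "Austria"})
cleaned = clean(target, ["spend"])          # customer "d" has no spend value
reduced = reduce_features(cleaned, ["country", "spend"])
pattern = summarize(reduced, "country", "spend")
print(pattern)
```

Interpretation and use of the result (the last two steps) remain with the analyst; the code only produces the candidate pattern.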

Data Mining and Business Intelligence
[Figure: layered diagram; the potential to support business decisions increases toward the top]
• Layers, top to bottom:
  – Making Decisions
  – Data Presentation: Visualization
  – Data Mining: Information Discovery
  – Data Exploration: Statistical Analysis, Reporting
  – Data Warehouses/Data Marts: OLAP, MDA
  – Data Sources: Paper, Files, Database systems, OLTP, WWW
• Roles shown alongside the layers: End User, Business Analyst, Data Analyst, DBA

An OLAM Architecture
[Figure: architecture diagram; components as labeled, from the user down to the data]
• Mining query / Mining result
• User GUI API
• OLAM Engine, OLAP Engine
• Data Cube API
• MDDB, Meta Data
• Filtering & Integration, Database API, Filtering
• Data cleaning, Data integration
• Databases, Data Warehouse

Data Mining: Confluence of Multiple Disciplines
• Database systems, data warehouse and OLAP
• Statistics
• Machine learning
• Visualization
• Information science
• High performance computing
• Other disciplines:
  – Neural networks, mathematical modeling, information retrieval, pattern recognition, etc.

Data Mining: On What Kind of Data?
• Relational databases
• Data warehouses
• Transactional databases
• Advanced DB systems and information repositories
  – Object-oriented and object-relational databases
  – Spatial databases
  – Time-series data and temporal data
  – Text databases and multimedia databases
  – Heterogeneous and legacy databases
  – WWW

Data Mining Functionality
Data mining methods may be classified into six basic classes:
• Associations
  – Finding rules like “if the customer buys mustard, sausage, and beer, then the probability that he/she buys chips is 50%”
• Classification
  – Classify data based on the values of the decision attribute, e.g. classify patients based on their “state”
• Clustering
  – Group data to form new classes, e.g. cluster customers based on their behavior to find common patterns
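The “probability” in an association rule such as the mustard, sausage, and beer example is usually called the rule's confidence: the share of transactions containing the antecedent that also contain the consequent. A minimal sketch follows; the baskets are invented and chosen so that the rule comes out at 50%, and the helper names are assumptions.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the share that also contain `consequent`."""
    n_antecedent = count(transactions, antecedent)
    if n_antecedent == 0:
        return 0.0
    return count(transactions, set(antecedent) | set(consequent)) / n_antecedent


# Invented market baskets, for illustration only.
baskets = [
    {"mustard", "sausage", "beer", "chips"},
    {"mustard", "sausage", "beer"},
    {"mustard", "sausage", "beer", "chips", "bread"},
    {"mustard", "sausage", "beer", "bread"},
    {"beer", "chips"},
    {"bread", "milk"},
]

rule_conf = confidence(baskets, {"mustard", "sausage", "beer"}, {"chips"})
rule_supp = support(baskets, {"mustard", "sausage", "beer", "chips"})
print(rule_conf, rule_supp)
```

Four baskets contain the antecedent and two of those also contain chips, so the confidence is 2/4. Support is reported alongside it because a high-confidence rule that covers very few transactions is rarely interesting.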

Data Mining Functionality
• Sequential patterns
  – Finding rules like “if the customer buys a TV and then, a few days later, a camera, then the probability that he/she will buy a video within 1 month is 50%”
• Time-series similarities
  – Finding similar sequences (or subsequences) in time series (e.g. stock analysis)
• Outlier detection
  – Finding anomalies/exceptions/deviations in data
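As one possible illustration of outlier detection, the sketch below flags values whose modified z-score, based on the median and the median absolute deviation (MAD), exceeds a threshold. The technique, the 3.5 threshold, the 0.6745 scaling constant, and the data are choices made for this example; the slides do not prescribe a specific method.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median.
    """
    med = median(values)
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        # No spread around the median: the score is undefined, flag nothing.
        return []
    return [x for x in values if abs(0.6745 * (x - med) / mad) > threshold]


# Invented daily purchase counts with one obvious anomaly.
daily_purchases = [10, 12, 11, 13, 12, 11, 95]
outliers = mad_outliers(daily_purchases)
print(outliers)
```

A median-based score is used here instead of the mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself in a small sample.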