  • Number of slides: 73

ACAI’05/SEKT’05 ADVANCED COURSE ON KNOWLEDGE DISCOVERY
Data Mining and Decision Support Integration
Marko Bohanec
Jožef Stefan Institute, Department of Knowledge Technologies & University of Ljubljana, Faculty of Administration

Data Mining vs. Decision Support
[Diagram: Data Mining: data → knowledge discovery from data → model. Decision Support: decision makers + experts + decision analysts → modeling → model.]
Use of models:
• classification
• clustering
• evaluation
• analysis
• visualization
• explanation
• ...

Overview
1. Decision Support:
  – Decision problem
  – Decision-making
  – Decision support
  – Decision analysis
  – Multi-attribute modeling
2. Decision Support and Data Mining
  – How to combine and integrate DS and DM?
    • DS for DM
    • DM for DS
    • DM, then DS
    • DS, then DM
    • DM and DS
  – DS for DM: ROC space
  – DM and DS: Combining DEX and HINT

Literature
Part I: Basic Technologies
  – Chapter 3: Decision Support
  – Chapter 4: Integration of Data Mining and Decision Support
Part II: Integration Aspects of DM and DS
  – Chapter 7: DS for DM: ROC Analysis
Part III: Applications of DM and DS
  – Chapter 15: Five Decision Support Applications
  – Chapter 16: Large and Tall Buildings
  – Chapter 17: Educational Planning

1. Decision Support
• Decision Problem
• Decision-Making
• Decision Support
• Decision Analysis
• Multi-Attribute Modeling
Chapter 3 – M. Bohanec: Decision Support

Decision-Making
Decision: The choice of one among a number of alternatives
Decision-Making: A process of making the choice that includes:
• Assessing the problem
• Collecting and verifying information
• Identifying alternatives
• Anticipating consequences of decisions
• Making the choice using sound and logical judgment based on available information
• Informing others of decision and rationale
• Evaluating decisions

Decision Problem
[Diagram: options (alternatives) and goals]
• FIND the option that best satisfies the goals
• RANK options according to the goals
• ANALYSE, JUSTIFY, EXPLAIN, …, the decision

Types of Decisions
• Easy (routine, everyday) vs. Difficult (complex)
• One-Time vs. Recurring
• One-Stage vs. Sequential
• Single Objective vs. Multiple Objectives
• Individual vs. Group
• Structured vs. Unstructured
• Tactical, Operational, Strategic

Characteristics of Complex Decisions
• Novelty
• Unclearness: Incomplete knowledge about the problem
• Uncertainty: Outside events that cannot be controlled
• Multiple objectives (possibly conflicting)
• Group decision-making
• Important consequences of the decision
• Limited resources

Decision-Making
[Diagram: Human DM → Decision Sciences; Machine DM → Decision Systems]
Decision Systems:
• Switching circuits
• Processors
• Computer programs
• Systems for routine DM
• Autonomous agents
• Space probes

Decision-Making
[Diagram: Decision-Making branches into Decision Sciences and Decision Systems; Decision Sciences branches into Normative, Descriptive, and Decision Support]
• Normative: Decision Theory, Utility Theory, Game Theory, Theory of Choice
• Descriptive: Cognitive Psychology, Social and Behavioral Sciences

Decision Support: Methods and tools for supporting people involved in the decision-making process
Central Disciplines:
• Operations Research and Management Sciences
• Decision Analysis
• Decision Support Systems
Contributing and Related Disciplines:
• Decision Sciences (other than DS itself)
• Statistics, Applied Mathematics
• Computer Sciences: Information Systems, Databases, Data Warehouses, OLAP
• Artificial Intelligence: Expert Systems, ML, NN, GA
• Knowledge Discovery from Databases and Data Mining
Other Methods and Tools:
• Representation and visualization tools
• Methods and tools for organizing data, facts, thoughts, ...
• Communication technology
• Mediation systems

Decision-Making
[Diagram: Decision-Making branches into Decision Sciences and Decision Systems; Decision Sciences branches into Normative, Descriptive, and Decision Support]
• Decision Support: OR/MS, DA, DSS, Other
• DA: Decision trees, Influence diagrams, Multi-attribute models

Decision Analysis: Applied Decision Theory
Provides a framework for analyzing decision problems by
• structuring and breaking them down into more manageable parts,
• explicitly considering the:
  – possible alternatives,
  – available information,
  – uncertainties involved, and
  – relevant preferences
• combining these to arrive at optimal (or "good") decisions

The Decision Analysis Process
[Flowchart, steps in the order extracted:]
• Identify decision situation and understand objectives
• Identify alternatives
• Decompose and model
  – problem structure
  – uncertainty
  – preferences
• Sensitivity analyses
• Choose best alternative
• Implement decision

Evaluation Models
[Diagram labels: options, EVALUATION, MODEL, ANALYSIS]

Types of Models in Decision Analysis
• Decision Trees [example: Investment; Invest (Succeed / Fail) vs. Do not invest]
• Influence Diagrams [example nodes: Invest?, Costs, Risks, Results, Success?, Return]
• Multi-Attribute Utility Models
• Analytic Hierarchy Process

Multi-Attribute Models
[Diagram: cars evaluated by an attribute tree: CAR → PRICE (buying, maint), TECH (safety, COMF (doors, pers, lug)); caption: problem decomposition]

Tree of Attributes
Decomposition of the problem to sub-problems ("Divide and Conquer!")
[Tree: CAR → PRICE (BUYING, MAINTEN), TECH. CHAR. (SAFETY, COMFORT)]
The most difficult stage!

Utility Functions (Aggregation)
Aggregation: bottom-up aggregation of attributes’ values
[Tree: CAR → PRICE 75% (BUYING, MAINTEN), TECH. CHAR. 25% (SAFETY, COMFORT)]
[Table residue: COMFORT scale low … exc; TECH. CH. values unacc, accept, good, exc]
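The weighted part of this slide can be sketched as a weighted sum. This is only an illustration: the 75% / 25% weights for PRICE and TECH. CHAR. come from the slide, while the 0 to 1 utility scale, the function names, and the sample scores are assumptions made for the example.

```python
# Minimal sketch of weight-based aggregation at the CAR node.
# Only the 75% / 25% weights are taken from the slide; the numeric
# 0..1 utility scale and all names are hypothetical.

CAR_WEIGHTS = {"PRICE": 0.75, "TECH_CHAR": 0.25}


def weighted_utility(scores, weights):
    """Return the weighted sum of child utilities; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


def evaluate_car(price_utility, tech_utility):
    """Aggregate the two child utilities of CAR into one value."""
    scores = {"PRICE": price_utility, "TECH_CHAR": tech_utility}
    return weighted_utility(scores, CAR_WEIGHTS)


if __name__ == "__main__":
    # A cheap car with mediocre technical characteristics (made-up scores).
    print(evaluate_car(price_utility=0.8, tech_utility=0.4))
```

In a full model the same function would be applied at every aggregate attribute, from the leaves towards the root.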

Evaluation and Analysis
EVALUATION
• direction: bottom-up (terminal → root attributes)
• result: each option evaluated
• inaccurate/uncertain data?

Evaluation and Analysis
ANALYSIS
• interactive inspection
• “what-if” analysis
• sensitivity analysis
• explanation

DEXi: Computer Program for Multi-Attribute Decision Making
• Creation and editing of
  – model structure (tree of attributes)
  – value scales of attributes
  – decision rules (incl. using weights)
  – options and their descriptions (data)
• Evaluation of options (can handle missing values)
• “What-if” analysis
• Reporting:
  – tables
  – charts
http://www-ai.ijs.si/Marko.Bohanec/dexi.html

Some Application Areas
1. INFORMATION TECHNOLOGY
• evaluation of computers
• evaluation of software
• evaluation of Web portals
2. PROJECTS
• evaluation of projects
• evaluation of proposals and investments
• product portfolio evaluation
3. COMPANIES
• business partner selection
• performance evaluation of companies
4. PERSONNEL MANAGEMENT
• personnel evaluation
• selection and composition of expert groups
• evaluation of personal applications
• educational planning
5. MEDICINE and HEALTHCARE
• risk assessment
• diagnosis and prognosis
6. OTHER AREAS
• assessment of technologies
• assessments in ecology and environment
• granting personal/corporate loans

Allocation of Housing Loans
[Attribute tree; labels as extracted: Ownership Present Suitability Solving Housing Stage Work stage Advantages Earnings Priority Status Maint/Employ Health Family Soc-Health Social Age Children]

Medicine: Breast Cancer Risk Assessment
Bohanec, M., Zupan, B., Rajkovič, V.: Applications of qualitative multi-attribute decision models in health care, International Journal of Medical Informatics 58-59, 191-205,

Evaluation and Analysis of Options

Selective Explanation of Options

Diabetic Foot Risk Assessment
Who:
• General Hospital Novo Mesto, Slovenia
• IJS
• Infonet, d.o.o.
Why:
• Reduce the number of amputations
• Improve the risk assessment methodology
• Improve the DSS module of clinical information system
How:
• Develop multi-attribute risk assessment model
• Evaluate it on patient data (about 3400 patients)
• Integrate into the clinical information system
Chapter 15 – M. Bohanec, V. Rajkovič, B. Cestnik: 5 DS

Diabetic Foot Risk Assessment
Model Structure

2. Combining Data Mining and Decision Support
• How to combine DS and DM?
• DS for DM: ROC space
• DM and DS: Combining DEX and HINT
Chapter 4 – N. Lavrač, M. Bohanec: Integration of DM

Data Mining vs. Decision Support
[Diagram: Data Mining: data → knowledge discovery from data → model. Decision Support: decision makers + experts + decision analysts → modeling → model.]
Use of models:
• classification
• clustering
• evaluation
• analysis
• visualization
• explanation
• ...

DM + DS Integration?
[Diagram: Data Mining ? Decision Support]

DM + DS Integration!

Combining DM and DS
• “DS for DM”:
  – ROC methodology
  – meta-learning
• “DM for DS”:
  – MS Analysis Services
  – model revision (from data)
• “DM, then DS” (sequential application):
  – Decisions-At-Hand approach
• “DS, then DM” (sequential application):
  – using models in data pre-processing for DM
• “DM and DS” (parallel application):
  – combining through models, e.g., DEXi and HINT
  – considering different problem dimensions

“DS for DM”
[Diagram: Data Mining, Decision Support]
Decision support within the DM process, e.g., ROC curves

ROC space
• True positive rate = #true pos. / #pos.
  – TPr1 = 40/50 = 80%
  – TPr2 = 30/50 = 60%
• False positive rate = #false pos. / #neg.
  – FPr1 = 10/50 = 20%
  – FPr2 = 0/50 = 0%
• ROC space has
  – FPr on X axis
  – TPr on Y axis
Chapter 7 – Slides by Peter Flach
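The two rates above map each classifier to one point in ROC space. The following minimal sketch reproduces the slide's arithmetic; the function and variable names are illustrative and not taken from the course material.

```python
# Minimal sketch: place a classifier in ROC space from its confusion counts.
# The counts are the ones on the slide; the names are hypothetical.

def roc_point(true_pos, false_pos, n_pos, n_neg):
    """Return (FPr, TPr), i.e. the (x, y) position in ROC space."""
    false_pos_rate = false_pos / n_neg
    true_pos_rate = true_pos / n_pos
    return false_pos_rate, true_pos_rate


# Classifier 1: 40 of 50 positives found, 10 of 50 negatives misclassified.
classifier_1 = roc_point(true_pos=40, false_pos=10, n_pos=50, n_neg=50)
# Classifier 2: 30 of 50 positives found, no false positives.
classifier_2 = roc_point(true_pos=30, false_pos=0, n_pos=50, n_neg=50)

if __name__ == "__main__":
    print(classifier_1)  # (0.2, 0.8)
    print(classifier_2)  # (0.0, 0.6)
```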

The ROC convex hull
[Plot: false positive rate (X axis) vs. true positive rate (Y axis)]

The ROC convex hull
[Plot: false positive rate (X axis) vs. true positive rate (Y axis)]
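The plots themselves did not survive extraction, so here is a hedged sketch of how a ROC convex hull can be computed. It assumes the usual construction: take the upper convex hull of the classifiers' (FPr, TPr) points together with the two trivial classifiers (0, 0) and (1, 1). The monotone-chain scan and the two extra sample points are choices made for this example, not something stated on the slides.

```python
# Minimal sketch: upper convex hull of classifiers in ROC space.
# Points are (FPr, TPr) pairs. Classifiers below the hull are dominated
# by some combination of the classifiers on the hull.

def _cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Return the hull vertices ordered from (0, 0) to (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the previous vertex while it lies on or below the segment
        # from the vertex before it to p (no clockwise turn).
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


if __name__ == "__main__":
    classifiers = [
        (0.2, 0.8),  # classifier 1 from the ROC space slide
        (0.0, 0.6),  # classifier 2 from the ROC space slide
        (0.3, 0.6),  # hypothetical classifier, dominated
        (0.5, 0.5),  # hypothetical classifier on the diagonal
    ]
    print(roc_convex_hull(classifiers))
```

With these sample points, only the two classifiers from the ROC space slide remain on the hull between the trivial end points.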

Choosing a classifier
[Plot: false positive rate (X axis) vs. true positive rate (Y axis)]

Choosing a classifier
[Plot: false positive rate (X axis) vs. true positive rate (Y axis)]

“DM for DS”
[Diagram: Data Mining, Decision Support]
Introducing DM methods into the DS process:
  – MS SQL Server - Analysis Services
  – model revision

“DM for DS”: Model Revision

Sequential Application: “First DS, then DM”
[Diagram: Decision Support → Model 1 → Data Mining → Model 2]

“First DS, then DM” in Data Pre-Processing
[Diagram labels: Input attributes, Generated attributes]

Sequential Application: “First DM, then DS”
[Diagram: Data Mining → Model 1 → Decision Support → Model 2]

Decisions-At-Hand Schema
[Diagram: Data Mining (Model Construction) → Decision Model in XML → (Synchronization or Upload) → Decision Support Shells … on Palm, … on the Web]
Blaž Zupan et al.: http://www.ailab.si/app/palm/

“DM and DS” Through Model Development
[Diagram labels: Data, Requirements, Expertise, Data Mining, Decision Support, Model]
Common modeling formalism
Chapter 4 + references

Multi-Attribute Decision Models
[Diagram: Expertise → Decision Support (DEX) → Model ← Data Mining (HINT) ← Data]
Model: Qualitative Hierarchical Multi-Attribute Decision Models

1. Model: Qualitative Multi-Attribute Models
• Decomposition of the problem to less complex subproblems
• Qualitative attributes
• Decision rules
[Tree: CAR → PRICE (buying, maint), TECH (safety, COMFORT (doors, pers, lug))]
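A qualitative hierarchical model of this kind can be sketched as a tree whose aggregate attributes each carry an explicit rule table. The tree below follows the CAR example on the slide; the three-value scale and the "worst input wins" tables are placeholders invented for the illustration and are not the decision rules of any actual DEX model.

```python
# Minimal sketch of a qualitative hierarchical multi-attribute model.
# Structure follows the CAR tree; scale and rule tables are hypothetical.
from itertools import product

SCALE = ["unacc", "acc", "good"]  # assumed ordered scale, worst to best


def worst_of_table(n_inputs):
    """Build an explicit rule table whose output is the worst input value."""
    return {
        combo: min(combo, key=SCALE.index)
        for combo in product(SCALE, repeat=n_inputs)
    }


# Aggregate attribute -> (child attributes, decision rule table).
MODEL = {
    "CAR": (["PRICE", "TECH"], worst_of_table(2)),
    "PRICE": (["buying", "maint"], worst_of_table(2)),
    "TECH": (["safety", "COMFORT"], worst_of_table(2)),
    "COMFORT": (["doors", "pers", "lug"], worst_of_table(3)),
}


def evaluate(attribute, option):
    """Evaluate bottom-up: basic attributes are read from the option."""
    if attribute not in MODEL:
        return option[attribute]
    children, table = MODEL[attribute]
    return table[tuple(evaluate(child, option) for child in children)]


if __name__ == "__main__":
    car = {"buying": "good", "maint": "acc", "safety": "good",
           "doors": "good", "pers": "good", "lug": "acc"}
    print(evaluate("CAR", car))  # acc
```

Because every rule is a plain lookup table, an expert can inspect or overwrite individual entries, which is the property that makes such models suitable for explanation and "what-if" analysis.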

2. Expertise
Understanding of the decision problem and ways for its solving by:
• Decision owner(s)
• Expert(s)
• Decision analyst(s)
• User(s)
3. Data
Previously solved decision problems
• Attribute-value representation

4. DEX
"An Expert System Shell for Multi-Attribute Decision Making"
Functionality:
1. Acquisition of attributes and their hierarchy.
2. Acquisition and consistency checking of decision rules.
3. Description, evaluation and analysis of options.
4. Explanation of evaluation results.
Over 50 real-life applications:
• Health-care
• Education
• Industry
• Land-use planning
• Ecology
• Evaluation of enterprises, products, projects, investments, ...

5. HINT
Hierarchy INduction Tool: Automated development of hierarchical models from data based on Function Decomposition
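To give a feel for function decomposition, the sketch below performs a single decomposition step on a table of examples: it replaces y = F(X) with y = G(free, c) and c = H(bound), where c is a newly invented intermediate concept. It is a simplified illustration under stated assumptions (consistent examples, a fixed choice of bound attributes, greedy colouring of incompatible columns) and is not the HINT implementation; all names are hypothetical.

```python
# Minimal sketch of one function-decomposition step (not HINT itself).
# examples: list of (attribute_dict, class_value) pairs, assumed consistent.

def decompose(examples, bound, free):
    """Return (H, G) with c = H[bound values] and y = G[free values + (c,)]."""
    # Partition matrix stored by columns: one column per combination of
    # bound-attribute values, mapping each free-value row to its class.
    columns = {}
    for attrs, y in examples:
        col = tuple(attrs[a] for a in bound)
        row = tuple(attrs[a] for a in free)
        columns.setdefault(col, {})[row] = y

    def compatible(c1, c2):
        # Columns agree on every row in which both are defined.
        return all(c2[r] == v for r, v in c1.items() if r in c2)

    # Greedy colouring: incompatible columns must get different values
    # of the new concept c; compatible ones may share a value.
    keys = sorted(columns)
    H = {}
    for k in keys:
        used = {H[o] for o in H if not compatible(columns[k], columns[o])}
        H[k] = next(c for c in range(len(keys)) if c not in used)

    # G is the original function re-expressed over (free values, c).
    G = {}
    for col, rows in columns.items():
        for row, y in rows.items():
            G[row + (H[col],)] = y
    return H, G


if __name__ == "__main__":
    # Toy target: y = max(x1, min(x2, x3)) over values 0..2.
    data = [({"x1": a, "x2": b, "x3": c}, max(a, min(b, c)))
            for a in range(3) for b in range(3) for c in range(3)]
    H, G = decompose(data, bound=["x2", "x3"], free=["x1"])
    print(len(set(H.values())))  # number of values of the new concept
```

On the toy target the step rediscovers an intermediate concept with three values that behaves like min(x2, x3), leaving y = max(x1, c) as the remaining function. A full hierarchy-induction tool would additionally search over candidate bound sets and apply the step recursively.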

HINT: Further Information
http://magix.fri.uni-lj.si/hint/

HINT Implementation: In ORANGE
http://magix.fri.uni-lj.si/orange/

Application: Housing Loan Allocation
• User: Housing Fund of the Republic of Slovenia
• Task: Allocating available funds to applicants for housing loans
• Method: Using a multi-attribute model for priority evaluation of applications
• Supported by a DSS since 1991:
  – Completed floats of loans: 21
  – Applications: 44378 received, 27813 approved
  – Allocated loans: 254 million € (2/3 of housing loans in Slovenia)

Modes of Operation
1. DEX only: from expertise
2. HINT only: from data
3. Supervised: from data under expert supervision
4. Serial: HINT-developed model subsequently refined by the expert
5. Parallel: parallel development of model(s) by DEX and HINT
6. Combined: combining sub-models developed in different ways

1. DEX-Only Mode

2. HINT-Only Mode (1 of 2)
Reconstruction of the original model from unstructured data:
• Real-life data from one float in 1994
• 1932 applications
• 12 attributes (2 to 5 values)
• 722 unique examples
• 3.7% coverage of the attribute space
• unsupervised decomposition
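The coverage figure relates the number of unique examples to the size of the attribute space, that is, the product of the attributes' value counts. The slide does not list the individual value counts, so the sketch below only derives the space size implied by the two stated numbers; the function name and the small demonstration call are assumptions.

```python
# Minimal sketch: coverage of the attribute space by unique examples.
from math import prod


def coverage(n_unique_examples, values_per_attribute):
    """Fraction of all attribute-value combinations present in the data."""
    return n_unique_examples / prod(values_per_attribute)


# Space size implied by the slide: 722 unique examples at 3.7% coverage.
implied_space_size = 722 / 0.037

if __name__ == "__main__":
    print(round(implied_space_size))  # roughly 19,500 combinations
    # Hypothetical toy case: 6 examples over attributes with 2, 3, 4 values.
    print(coverage(6, [2, 3, 4]))  # 0.25
```

In other words, HINT had to reconstruct the model from fewer than one in twenty-five of the possible attribute-value combinations.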

2. HINT-Only Mode (2 of 2)
Results:
• Relatively good overall structure
• Inappropriate structure around c3
• Excellent classification accuracy:
  – HINT: 94.7 ± 2.5 %
  – C4.5: 88.9 ± 3.9 %

3. Supervised Mode (1 of 4)
Unstructured dataset:
Redundant: cult_hist, fin_sources

3. Supervised Mode (2 of 4)
All partitions with b=3 and minimal ν (=3) [11 of 120]
New concept: status

3. Supervised Mode (3 of 4)
All partitions with b=3 and minimal ν (=4) [3 of 56]
New concepts: social and then present

3. Supervised Mode (4 of 4)
Final structure
Results:
• Expert satisfied with the structure
• Improved classification accuracy:
  – supervised: 97.8 ± 1.8 %
  – unsupervised: 94.7 ± 2.5 %

4. Serial Mode
1. Develop an initial model by HINT from data
2. Extend/enhance the model "manually" using DEX
For example:
1. Take the model developed by HINT in supervised mode
2. Add the attributes cult_hist and fin_sources:
  – Extend the model structure
  – Define the corresponding decision rules

5. Parallel Mode
Develop two or more independent models by HINT and DEX for:
• comparison
• "second opinion"
• flexibility
For example, in this research we developed:
1. one DEX model
2. two HINT models: in supervised and unsupervised mode

6. Combined Mode
Develop a single model using sub-models developed
• by different methods and
• from different sources
Hypothetical example:
1. Develop subtree for status by HINT
2. Develop soc-health by HINT from a different data set
3. A real-estate expert develops the house subtree using DEX
4. All three models "glued" together in DEX by a loan-allocation expert

DEX and HINT: Results
• Integration of DM and DS for model-based problem solving
• Requirements:
  – common model representation
  – expertise and data (possibly partial)
  – methods for "automatic" (DM) and "manual" (DS) model development
• Offers a multitude of method combinations:
  – independent, serial, parallel, combined, …
• Specific schema:
  – qualitative hierarchical multi-attribute models
  – DEX as a DS method
  – HINT as a DM method
• Real-world application: Housing loan allocation
  – Application of DEX-only, HINT-only, supervised and parallel modes
  – Integration of DS and DM through HINT improved both the classification accuracy and comprehensibility of the model

Parallel Applications: Multiple DM models, then DS
[Diagram: Data Mining → Model 1, Model 2 → Decision Support → Model 3]

Problem: Prediction of Academic Achievement
[Timeline: Primary School (grades 1 ... 7 8) → Prediction → High School (years 1 2 3 4)]
Predicted outcome:
• 5: graduates: 4 or 5
• 4: graduates: 2 or 3
• 3: prolonged
• 2: fails soon
• 1: fails late
Chapter 17 – S. Gasar, M. Bohanec, V. Rajkovič

DM + DS Integration: Academic Achievement
[Diagram labels: Data; DM: Weka; DM: HINT; DS: DEXi]

Parallel Application: EC Harris
Chapter 16 – Steve Moyle, Marko Bohanec, Eric

Conclusion
• DM & DS approaches are:
  – complementary
  – supplementary
• New and developing research area
• Typical combinations:
  – DS for DM
  – DM for DS
  – DM, then DS
  – DS, then DM
  – DM and DS
• Open questions:
  – formalization (framework) of DM&DS integration
  – common methodologies and approaches
  – standardization