- Number of slides: 73
ACAI'05/SEKT'05 ADVANCED COURSE ON KNOWLEDGE DISCOVERY
Data Mining and Decision Support Integration
Marko Bohanec
Jožef Stefan Institute, Department of Knowledge Technologies & University of Ljubljana, Faculty of Administration
Data Mining vs. Decision Support
[Diagram: Data Mining: data, model (knowledge discovery from data); Decision Support: decision makers + experts + decision analysts, model (modeling)]
Use of models:
• classification
• clustering
• evaluation
• analysis
• visualization
• explanation
• . . .
Overview
1. Decision Support:
– Decision problem
– Decision-making
– Decision support
– Decision analysis
– Multi-attribute modeling
2. Decision Support and Data Mining
– How to combine and integrate DS and DM?
• DS for DM
• DM for DS
• DM, then DS
• DS, then DM
• DM and DS
– DS for DM: ROC space
– DM and DS: Combining DEX and HINT
Literature
Part I: Basic Technologies
– Chapter 3: Decision Support
– Chapter 4: Integration of Data Mining and Decision Support
Part II: Integration Aspects of DM and DS
– Chapter 7: DS for DM: ROC Analysis
Part III: Applications of DM and DS
– Chapter 15: Five Decision Support Applications
– Chapter 16: Large and Tall Buildings
– Chapter 17: Educational Planning
1. Decision Support
• Decision Problem
• Decision-Making
• Decision Support
• Decision Analysis
• Multi-Attribute Modeling
Chapter 3 – M. Bohanec: Decision Support
Decision-Making
Decision: The choice of one among a number of alternatives
Decision-Making: A process of making the choice that includes:
• Assessing the problem
• Collecting and verifying information
• Identifying alternatives
• Anticipating consequences of decisions
• Making the choice using sound and logical judgment based on available information
• Informing others of decision and rationale
• Evaluating decisions
Decision Problem
[Diagram: options (alternatives) and goals]
• FIND the option that best satisfies the goals
• RANK options according to the goals
• ANALYSE, JUSTIFY, EXPLAIN, …, the decision
Types of Decisions
• Easy (routine, everyday) vs. Difficult (complex)
• One-Time vs. Recurring
• One-Stage vs. Sequential
• Single Objective vs. Multiple Objectives
• Individual vs. Group
• Structured vs. Unstructured
• Tactical, Operational, Strategic
Characteristics of Complex Decisions
• Novelty
• Unclearness: Incomplete knowledge about the problem
• Uncertainty: Outside events that cannot be controlled
• Multiple objectives (possibly conflicting)
• Group decision-making
• Important consequences of the decision
• Limited resources
Decision-Making
Human DM: Decision Sciences
Machine DM: Decision Systems
• Switching circuits
• Processors
• Computer programs
• Systems for routine DM
• Autonomous agents
• Space probes
Decision-Making
[Diagram: Decision-Making divides into Decision Sciences and Decision Systems]
Decision Sciences:
• Normative: Decision Theory, Utility Theory, Game Theory, Theory of Choice
• Descriptive: Cognitive Psychology, Social and Behavioral Sciences
• Decision Support
Decision Support: Methods and tools for supporting people involved in the decision-making process
Central Disciplines:
• Operations Research and Management Sciences
• Decision Analysis
• Decision Support Systems
Contributing and Related Disciplines:
• Decision Sciences (other than DS itself)
• Statistics, Applied Mathematics
• Computer Sciences: Information Systems, Databases, Data Warehouses, OLAP
• Artificial Intelligence: Expert Systems, ML, NN, GA
• Knowledge Discovery from Databases and Data Mining
Other Methods and Tools:
• Representation and visualization tools
• Methods and tools for organizing data, facts, thoughts, . . .
• Communication technology
• Mediation systems
Decision-Making
[Diagram: Decision-Making: Decision Sciences (Normative, Descriptive, Decision Support) and Decision Systems; boxes under Decision Support: OR/MS, DA, DSS, Decision trees, Influence diagrams, Multi-attribute models, Other]
Decision Analysis: Applied Decision Theory
Provides a framework for analyzing decision problems by
• structuring and breaking them down into more manageable parts,
• explicitly considering the:
– possible alternatives,
– available information,
– uncertainties involved, and
– relevant preferences
• combining these to arrive at optimal (or "good") decisions
The Decision Analysis Process
• Identify decision situation and understand objectives
• Identify alternatives
• Decompose and model
– problem structure
– uncertainty
– preferences
• Sensitivity Analyses
• Choose best alternative
• Implement Decision
Evaluation Models
[Diagram labels: options, EVALUATION, MODEL, ANALYSIS]
Types of Models in Decision Analysis
• Decision Trees [example: Investment: Invest (Succeed, Fail) or Do not invest]
• Influence Diagrams [example nodes: Invest?, Costs, Risks, Results, Success?, Return]
• Multi-Attribute Utility Models
• Analytic Hierarchy Process
Multi-Attribute Models
[Diagram: problem decomposition for cars. CAR: PRICE (buying, maint), TECH (safety, COMF (doors, pers, lug))]
Tree of Attributes
Decomposition of the problem into sub-problems ("Divide and Conquer!")
The most difficult stage!
[Diagram: CAR: PRICE (BUYING, MAINTEN), TECH. CHAR. (SAFETY, COMFORT)]
Utility Functions (Aggregation)
Aggregation: bottom-up aggregation of attributes' values
[Diagram: CAR tree (PRICE: BUYING, MAINTEN; TECH. CHAR.: SAFETY, COMFORT) annotated with weights 75% and 25%, and a decision table relating attribute values (low, med, high) to TECH. CHAR. values (unacc, accept, good, exc)]
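The two aggregation styles on this slide can be sketched in Python. This is an illustration only: reading the weights as 75% BUYING and 25% MAINTEN under PRICE is an assumption, and the numeric 1 to 5 scores, the rule table, and the function names are hypothetical, not DEXi's actual rules.

```python
# Sketch of bottom-up aggregation; weight assignment and rules are assumptions.

# Weighted aggregation (assumed reading of the slide): PRICE from BUYING and MAINTEN,
# with child scores on a hypothetical 1 (worst) to 5 (best) scale.
PRICE_WEIGHTS = {"BUYING": 0.75, "MAINTEN": 0.25}

def weighted_sum(scores, weights):
    """Combine numeric child scores into one parent score."""
    return sum(weights[name] * scores[name] for name in weights)

# Rule-based aggregation (hypothetical table): (SAFETY, COMFORT) -> TECH. CHAR.
TECH_RULES = {
    ("low", "low"): "unacc",  ("low", "med"): "unacc",   ("low", "high"): "unacc",
    ("med", "low"): "unacc",  ("med", "med"): "accept",  ("med", "high"): "good",
    ("high", "low"): "unacc", ("high", "med"): "good",   ("high", "high"): "exc",
}

def tech_char(safety, comfort):
    """Look up the qualitative TECH. CHAR. value for one option."""
    return TECH_RULES[(safety, comfort)]

price_score = weighted_sum({"BUYING": 2, "MAINTEN": 4}, PRICE_WEIGHTS)
tech_value = tech_char("high", "med")
print(price_score, tech_value)  # 2.5 good
```

A lookup table keeps the qualitative rules explicit and easy for an expert to inspect, while the weighted sum only works once values have been mapped to numbers.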
Evaluation and Analysis
EVALUATION
• direction: bottom-up (terminal → root attributes)
• result: each option evaluated
• inaccurate/uncertain data?
Evaluation and Analysis
ANALYSIS
• interactive inspection
• “what-if” analysis
• sensitivity analysis
• explanation
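Bottom-up evaluation and a simple "what-if" analysis can be sketched as follows. The attribute names follow the car example, but the rule tables, value scales, and function names are hypothetical and chosen only to keep the sketch small.

```python
# Hypothetical model: each aggregate attribute maps to (children, rule table).
RULES = {
    "PRICE": (("BUYING", "MAINTEN"), {
        ("high", "high"): "high", ("high", "low"): "high",
        ("low", "high"): "high",  ("low", "low"): "low",
    }),
    "CAR": (("PRICE", "TECH"), {
        ("high", "bad"): "unacc", ("high", "good"): "accept",
        ("low", "bad"): "unacc",  ("low", "good"): "exc",
    }),
}

def evaluate(attr, option):
    """Evaluate attr bottom-up: leaves come from the option, aggregates from rules."""
    if attr not in RULES:
        return option[attr]
    children, table = RULES[attr]
    return table[tuple(evaluate(child, option) for child in children)]

def what_if(option, attr, values, root="CAR"):
    """Re-evaluate the root for each alternative value of one input attribute."""
    return {value: evaluate(root, {**option, attr: value}) for value in values}

car = {"BUYING": "low", "MAINTEN": "high", "TECH": "good"}
baseline = evaluate("CAR", car)
scenarios = what_if(car, "MAINTEN", ["high", "low"])
print(baseline, scenarios)
```

Here lowering MAINTEN changes PRICE and, through it, the overall CAR evaluation, which is the kind of effect a what-if analysis is meant to expose.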
DEXi: Computer Program for Multi-Attribute Decision Making
• Creation and editing of
– model structure (tree of attributes)
– value scales of attributes
– decision rules (incl. using weights)
– options and their descriptions (data)
• Evaluation of options (can handle missing values)
• “What-if” analysis
• Reporting:
– tables
– charts
http://www-ai.ijs.si/Marko.Bohanec/dexi.html
Some Application Areas
1. INFORMATION TECHNOLOGY
• evaluation of computers
• evaluation of software
• evaluation of Web portals
2. PROJECTS
• evaluation of projects
• evaluation of proposals and investments
• product portfolio evaluation
3. COMPANIES
• business partner selection
• performance evaluation of companies
4. PERSONNEL MANAGEMENT
• personnel evaluation
• selection and composition of expert groups
• evaluation of personal applications
• educational planning
5. MEDICINE and HEALTHCARE
• risk assessment
• diagnosis and prognosis
6. OTHER AREAS
• assessment of technologies
• assessments in ecology and environment
• granting personal/corporate loans
Allocation of Housing Loans
[Diagram: attribute tree; node labels: Ownership, Present, Suitability, Solving, Housing, Stage, Work stage, Advantages, Earnings, Priority, Status, Maint/Employ, Health, Family, Soc-Health, Social, Age, Children]
Medicine: Breast Cancer Risk Assessment
Bohanec, M., Zupan, B., Rajkovič, V.: Applications of qualitative multi-attribute decision models in health care, International Journal of Medical Informatics 58-59, 191-205,
Evaluation and Analysis of Options
Selective Explanation of Options
Diabetic Foot Risk Assessment
Who:
• General Hospital Novo Mesto, Slovenia
• IJS
• Infonet, d.o.o.
Why:
• Reduce the number of amputations
• Improve the risk assessment methodology
• Improve the DSS module of clinical information system
How:
• Develop multi-attribute risk assessment model
• Evaluate it on patient data (about 3400 patients)
• Integrate into the clinical information system
Chapter 15 – M. Bohanec, V. Rajkovič, B. Cestnik: 5 DS
Diabetic Foot Risk Assessment: Model Structure
2. Combining Data Mining and Decision Support
• How to combine DS and DM?
• DS for DM: ROC space
• DM and DS: Combining DEX and HINT
Chapter 4 – N. Lavrač, M. Bohanec: Integration of DM
Data Mining vs. Decision Support
[Diagram: Data Mining: data, model (knowledge discovery from data); Decision Support: decision makers + experts + decision analysts, model (modeling)]
Use of models:
• classification
• clustering
• evaluation
• analysis
• visualization
• explanation
• . . .
DM + DS Integration? [Diagram: Data Mining ? Decision Support]
DM + DS Integration!
Combining DM and DS
• “DS for DM”:
– ROC methodology
– meta-learning
• “DM for DS”:
– MS Analysis Services
– model revision (from data)
• “DM, then DS” (sequential application):
– Decisions-At-Hand approach
• “DS, then DM” (sequential application):
– using models in data pre-processing for DM
• “DM and DS” (parallel application):
– combining through models, e.g., DEXi and HINT
– considering different problem dimensions
“DS for DM”
Decision support within the DM process, e.g., ROC curves
[Diagram: Data Mining, Decision Support]
ROC space
• True positive rate = #true pos. / #pos.
– TPr1 = 40/50 = 80%
– TPr2 = 30/50 = 60%
• False positive rate = #false pos. / #neg.
– FPr1 = 10/50 = 20%
– FPr2 = 0/50 = 0%
• ROC space has
– FPr on X axis
– TPr on Y axis
Chapter 7 – Slides by Peter Flach
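The arithmetic on this slide can be reproduced with a few lines of Python. The counts are the ones given above (50 positives, 50 negatives); the function name `roc_point` is only an illustrative choice.

```python
def roc_point(true_pos, false_pos, n_pos, n_neg):
    """Return a classifier's position in ROC space as (FPr, TPr)."""
    tpr = true_pos / n_pos    # true positive rate, plotted on the Y axis
    fpr = false_pos / n_neg   # false positive rate, plotted on the X axis
    return fpr, tpr

# Classifier 1: 40 of 50 positives found, 10 of 50 negatives misclassified.
c1 = roc_point(true_pos=40, false_pos=10, n_pos=50, n_neg=50)
# Classifier 2: 30 of 50 positives found, no false positives.
c2 = roc_point(true_pos=30, false_pos=0, n_pos=50, n_neg=50)

print(c1)  # (0.2, 0.8)
print(c2)  # (0.0, 0.6)
```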
The ROC convex hull [Plot: false positive rate (x axis) vs. true positive rate (y axis)]
The ROC convex hull [Plot: false positive rate (x axis) vs. true positive rate (y axis)]
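A minimal sketch of computing the ROC convex hull, assuming each classifier is given as an (FPr, TPr) point. The first two points are the classifiers from the ROC space slide; the third point and the function names are hypothetical. The hull is the upper boundary from (0, 0) to (1, 1), built here with a standard monotone-chain scan.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def roc_convex_hull(points):
    """Return the upper convex hull of the points plus the trivial classifiers."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the line to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

classifiers = [(0.2, 0.8), (0.0, 0.6), (0.5, 0.6)]  # third point is hypothetical
hull = roc_convex_hull(classifiers)
print(hull)
```

Classifiers that fall below the hull, such as the hypothetical (0.5, 0.6) point, are never optimal for any combination of class distribution and misclassification costs.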
Choosing a classifier [Plot: false positive rate (x axis) vs. true positive rate (y axis)]
Choosing a classifier [Plot: false positive rate (x axis) vs. true positive rate (y axis)]
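One common way to choose among classifiers in ROC space is to minimize expected misclassification cost for a given class distribution. The sketch below assumes that this is the criterion the plots illustrate; the cost values, the parameter names, and the function names are hypothetical, and the two points are the classifiers from the ROC space slide.

```python
def expected_cost(fpr, tpr, p_pos, cost_fn, cost_fp):
    """Expected cost per example: missed positives plus false alarms."""
    return p_pos * (1 - tpr) * cost_fn + (1 - p_pos) * fpr * cost_fp

def choose_classifier(classifiers, p_pos, cost_fn=1.0, cost_fp=1.0):
    """Return the name of the classifier with the lowest expected cost."""
    return min(
        classifiers,
        key=lambda name: expected_cost(*classifiers[name], p_pos, cost_fn, cost_fp),
    )

# (FPr, TPr) pairs for the two classifiers.
classifiers = {"C1": (0.2, 0.8), "C2": (0.0, 0.6)}

when_false_alarms_costly = choose_classifier(classifiers, p_pos=0.5, cost_fp=5.0)
when_misses_costly = choose_classifier(classifiers, p_pos=0.5, cost_fn=5.0)
print(when_false_alarms_costly, when_misses_costly)
```

With balanced classes and equal costs both classifiers reach the same accuracy (80%), so the choice only becomes clear once the costs or class proportions are skewed.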
“DM for DS”
Introducing DM methods into the DS process:
– MS SQL Server - Analysis Services
– model revision
[Diagram: Data Mining, Decision Support]
“DM for DS”: Model Revision
Sequential Application: “First DS, then DM”
[Diagram: Decision Support, Model 1, Data Mining, Model 2]
“First DS, then DM” in Data Pre-Processing
[Diagram: Input attributes, Generated attributes]
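As a rough sketch of this idea, an expert-defined (DS) sub-model can be applied to every record to generate a new attribute before any DM method sees the data. The rule, the attribute names, and the helper names below are hypothetical and only illustrate the pre-processing step.

```python
def price_model(buying, maint):
    """Hypothetical expert rule: PRICE is 'high' if either cost is high."""
    return "high" if "high" in (buying, maint) else "low"

def add_generated_attribute(records, name, inputs, model):
    """Return copies of the records extended with one model-generated attribute."""
    return [{**r, name: model(*(r[a] for a in inputs))} for r in records]

# Input attributes only (hypothetical examples).
data = [
    {"buying": "low", "maint": "low", "safety": "high", "class": "good"},
    {"buying": "high", "maint": "low", "safety": "high", "class": "unacc"},
]

# Input attributes plus the generated attribute PRICE, ready for a DM method.
extended = add_generated_attribute(data, "PRICE", ["buying", "maint"], price_model)
print(extended)
```

The learner then works with the higher-level PRICE concept alongside, or instead of, the raw inputs.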
Sequential Application: “First DM, then DS”
[Diagram: Data Mining, Model 1, Decision Support, Model 2]
Decisions-At-Hand Schema
[Diagram: Data Mining (Model Construction); Decision Model in XML (Synchronization or Upload); Decision Support Shells … on Palm, … on the Web]
Blaž Zupan et al.: http://www.ailab.si/app/palm/
“DM and DS” Through Model Development
[Diagram: Data and Requirements feed Data Mining; Expertise feeds Decision Support; both produce a Model in a common modeling formalism]
Chapter 4 + references
Multi-Attribute Decision Models
[Diagram: Expertise, Decision Support (DEX); Data, Data Mining (HINT); Model: Qualitative Hierarchical Multi-Attribute Decision Models]
1. Qualitative Multi-Attribute Models
• Decomposition of the problem to less complex subproblems
• Qualitative attributes
• Decision rules
[Diagram: CAR: PRICE (buying, maint), TECH (safety, COMFORT (doors, pers, lug))]
2. Expertise
Understanding of the decision problem and ways for its solving by:
• Decision owner(s)
• Expert(s)
• Decision analyst(s)
• User(s)
3. Data
Previously solved decision problems
• Attribute-value representation
4. DEX
"An Expert System Shell for Multi-Attribute Decision Making"
Functionality:
1. Acquisition of attributes and their hierarchy.
2. Acquisition and consistency checking of decision rules.
3. Description, evaluation and analysis of options.
4. Explanation of evaluation results.
Over 50 real-life applications:
• Health-care
• Education
• Industry
• Land-use planning
• Ecology
• Evaluation of enterprises, products, projects, investments, . . .
5. HINT
Hierarchy INduction Tool: Automated development of hierarchical models from data, based on Function Decomposition
HINT: Further Information: http://magix.fri.uni-lj.si/hint/
HINT Implementation: In ORANGE: http://magix.fri.uni-lj.si/orange/
Application: Housing Loan Allocation
• User: Housing Fund of the Republic of Slovenia
• Task: Allocating available funds to applicants for housing loans
• Method: Using a multi-attribute model for priority evaluation of applications
• Supported by a DSS since 1991:
– Completed floats of loans: 21
– Applications: 44378 received, 27813 approved
– Allocated loans: 254 million € (2/3 of housing loans in Slovenia)
Modes of Operation
1. DEX only: from expertise
2. HINT only: from data
3. Supervised: from data under expert supervision
4. Serial: HINT-developed model subsequently refined by the expert
5. Parallel: parallel development of model(s) by DEX and HINT
6. Combined: combining sub-models developed in different ways
1. DEX-Only Mode
2. HINT-Only Mode (1 of 2)
Reconstruction of the original model from unstructured data:
• Real-life data from one float in 1994
• 1932 applications
• 12 attributes (2 to 5 values)
• 722 unique examples
• 3.7% coverage of the attribute space
• unsupervised decomposition
2. HINT-Only Mode (2 of 2)
Results:
• Relatively good overall structure
• Inappropriate structure around c3
• Excellent classification accuracy:
– HINT: 94.7 ± 2.5 %
– C4.5: 88.9 ± 3.9 %
3. Supervised Mode (1 of 4) Unstructured dataset: Redundant: cult_hist, fin_sources
3. Supervised Mode (2 of 4) All partitions with b=3 and minimal ( =3) [11 of 120] New concept: status
3. Supervised Mode (3 of 4) All partitions with b=3 and minimal ( =4) [3 of 56] New concepts: social and then present
3. Supervised Mode (4 of 4)
Final structure
Results:
• Expert satisfied with the structure
• Improved classification accuracy:
– supervised: 97.8 ± 1.8 %
– unsupervised: 94.7 ± 2.5 %
4. Serial Mode
1. Develop an initial model by HINT from data
2. Extend/enhance the model "manually" using DEX
For example:
1. Take the model developed by HINT in supervised mode
2. Add the attributes cult_hist and fin_sources:
– Extend the model structure
– Define the corresponding decision rules
5. Parallel Mode
Develop two or more independent models by HINT and DEX for:
• comparison
• "second opinion"
• flexibility
For example, in this research we developed:
1. one DEX model
2. two HINT models: in supervised and unsupervised mode
6. Combined Mode
Develop a single model using sub-models developed
• by different methods and
• from different sources
Hypothetical example:
1. Develop subtree for status by HINT
2. Develop soc-health by HINT from a different data set
3. A real-estate expert develops the house subtree using DEX
4. All three models "glued" together in DEX by a loan-allocation expert
DEX and HINT: Results
• Integration of DM and DS for model-based problem solving
• Requirements:
– common model representation
– expertise and data (possibly partial)
– methods for "automatic" (DM) and "manual" (DS) model development
• Offers a multitude of method combinations:
– independent, serial, parallel, combined, …
• Specific schema:
– qualitative hierarchical multi-attribute models
– DEX as a DS method
– HINT as a DM method
• Real-world application: Housing loan allocation
– Application of DEX-only, HINT-only, supervised and parallel modes
– Integration of DS and DM through HINT improved both the classification accuracy and comprehensibility of the model
Parallel Applications: Multiple DM models, then DS
[Diagram: Data Mining, Model 1, Model 2, Decision Support, Model 3]
Problem: Prediction of Academic Achievement
[Diagram: Primary School (1 . . . 7 8), High School (1 2 3 4), Prediction]
• 5: graduates: 4 or 5
• 4: graduates: 2 or 3
• 3: prolonged
• 2: fails soon
• 1: fails late
Chapter 17 – S. Gasar, M. Bohanec, V. Rajkovič
DM + DS Integration: Academic Achievement
[Diagram: Data; DM: Weka; DM: HINT; DS: DEXi]
Parallel Application: EC Harris
Chapter 16 – Steve Moyle, Marko Bohanec, Eric
Conclusion
• DM & DS approaches are:
– complementary
– supplementary
• New and developing research area
• Typical combinations:
– DS for DM
– DM for DS
– DM, then DS
– DS, then DM
– DM and DS
• Open questions:
– formalization (framework) of DM&DS integration
– common methodologies and approaches
– standardization