
e5c969d92fd4872ab1f98ee57978848f.ppt
- Number of slides: 32
Enterprise and Business Intelligence Systems (e.bis.business.utah.edu) Research Lab, UA -> UU
Director: Olivia R. Liu Sheng, Ph.D.
Emma Eccles Jones Presidential Chair of Business
School of Accounting and Information Systems
David Eccles School of Business, University of Utah
801-585-9071, olivia.sheng@business.utah.edu
10/2002
e.bis Research Focus
Enterprise Systems
• E-procurement technology
• Web content caching and storage management
• Enterprise application integration
• Process modeling and re-use
• System security and risk management
• Portal design and management
Business Intelligence Systems
• Decision support systems
• Data/web mining
• Knowledge management
• Knowledge refreshing
• Personalization
e.bis Research Output
• Models
• Methods
• Technology
• Analyses
Fueled by Applications!
Faculty
• Olivia R. Liu Sheng, Ph.D. (UU)
• Paul Hu, Ph.D. (UU)
Ph.D. students and post-docs (affiliations as listed: UA, UA, UA, UU)
• Xiao Fang, 5th-yr Ph.D. student
• Lin, 3rd-yr Ph.D. student
• Wei Gao, 3rd-yr Ph.D. student
• Hua Su, post-doc
• Xiaoyun Sun, 1st-yr Ph.D. student
• Zhongmin Ma, 1st-yr Ph.D. student
6 to 10 Master's and undergraduate students per year
International and industrial collaborators
Web Mining for Knowledge Management
What is Data Mining?
• The automated process of discovering relationships and patterns in data
• Related terms: knowledge discovery in databases (KDD), machine learning
• A step in the knowledge discovery process consisting of particular algorithms (methods) that, under some acceptable objective, produce a particular enumeration of patterns (models) over the data
• An iterative process within which progress is defined by “discovery”, through either automatic or manual methods
• The application of statistical and artificial intelligence techniques (algorithms) for discovering patterns and regularities in large volumes of data
Why Data Mining?
• Type of knowledge (more abstract) and the level of sophistication in required computation, e.g.,
  – Which buyers are likely to be late on future payments?
  – Which sellers are likely to be late on future deliveries?
  – If a seller increases product-in-week by x units, what percentage increase in sales can be expected?
  – Which buyers are similar in their buying power and in their product and contract preferences?
• The frequency of discovering and applying the knowledge is met with bottlenecks in human processing
  – Decision support for buyers, sellers and market hosts at each transaction decision point
• Data visualization needs
  – Going beyond business charts (e.g., pie, line, bar charts)
  – Maps, trees, 2-D and 3-D
Taxonomies of Data Mining
• By Tasks
• By Data
Data Mining Tasks
• Association/Sequential Patterns
  – The discovery of co-occurrence correlations among a set of items.
• Clustering
  – Identifying clusters embedded in the data, where a cluster is a collection of data objects that are “similar” to one another.
• Classification
  – Analyzing a set of training data and constructing a model for each class based on the features in the data.
• Class Description
  – Providing a concise and succinct summarization of a collection of data.
• Time-series Analysis
  – Analyzing large sets of time-series data to find certain regularities and interesting characteristics.
Market Basket (Association Rule) Analysis
A market basket is a collection of items purchased by a customer in an individual customer transaction, which is a well-defined business activity.
Ex:
• a customer’s visit to a grocery store
• an online purchase from a virtual store such as ‘Amazon’
Market Basket (Association Rule) Analysis
Market basket analysis is a common analysis run against a transaction database to find sets of items, or itemsets, that appear together in many transactions. Each pattern extracted through the analysis consists of an itemset and the number of transactions that contain it.
Applications:
• improve the placement of items in a store
• the layout of mail-order catalog pages
• the layout of Web pages
• others?
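The pattern described on this slide, an itemset paired with the number of transactions that contain it, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `count_itemsets`, the `min_count` threshold, and the sample baskets are assumptions invented for the example, not part of the original slides.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, size=2, min_count=2):
    """Count how many transactions contain each itemset of the given size."""
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so every itemset has one canonical form.
        items = sorted(set(basket))
        for itemset in combinations(items, size):
            counts[itemset] += 1
    # Keep only itemsets that appear in at least min_count transactions.
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


# Hypothetical transactions, invented for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
frequent_pairs = count_itemsets(baskets, size=2, min_count=3)
for itemset, n in sorted(frequent_pairs.items()):
    print(itemset, n)
```

Enumerating every combination is fine for small baskets; practical tools prune the search (for example, Apriori-style candidate generation), which this sketch does not attempt.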
Clustering
• Clustering distributes data into several groups so that similar objects fall into the same group.
• For example, we can cluster customers based on their purchase behavior.
• Applications: customer, web content, document and gene segmentation
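As one possible illustration of clustering customers by purchase behavior, the sketch below uses plain k-means. The slides do not name a specific clustering algorithm, so k-means is an assumption here, as are the two features (visits per month, average spend per visit), the starting centroids, and the sample data.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from p to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Recompute each centroid as the mean of its cluster;
        # keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return clusters, centroids


# Hypothetical customers: (visits per month, average spend per visit).
customers = [(1, 20), (2, 25), (1, 30), (9, 200), (10, 220), (8, 210)]
clusters, centers = kmeans(customers, centroids=[(1, 20), (9, 200)])
print(clusters)
print(centers)
```

Because the two features are on very different scales, a real analysis would normally standardize them first; that step is omitted to keep the sketch short.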
Classification
• Classification classifies data into pre-defined outcome classes
• Example:
Classification
[Figure: decision tree with split nodes “Age < 25” and “Car Type in {sports}” and leaf labels High, Low, High]
Applications: customer profiling, shopping prediction, diagnostic decision support
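A decision tree like the one on this slide can be read as a set of nested rules. The sketch below assumes one plausible arrangement of the leaves: Age < 25 leads to High; otherwise Car Type in {sports} leads to High and anything else to Low. The extracted slide text does not preserve the tree layout or say what High and Low measure, so both the arrangement and the function name `classify` are assumptions.

```python
def classify(age, car_type):
    """Apply the decision tree as nested rules (leaf arrangement is assumed)."""
    if age < 25:
        return "High"
    if car_type in {"sports"}:
        return "High"
    return "Low"


# Hypothetical records, invented for illustration only.
for age, car_type in [(23, "family"), (40, "sports"), (40, "truck")]:
    print(age, car_type, classify(age, car_type))
```

In practice the tree would be learned from training data rather than written by hand; this only shows how a finished tree assigns a pre-defined class to a new record.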
By Data
• Structured alphanumeric data – buyer, supplier, product, order, bank account
• Image data – satellite, patient, document, handwriting, facial, etc.
• Spatial data – map, traffic, geological, CAD, graphics, etc.
By Data, Cont’d
• Temporal data – time series, population, stock, inventory, sales, etc.
• Spatial and temporal data – trajectory
• Text – documents, web pages, etc.
• Video/audio – surveillance video, voice, music, etc.
Web (Data) Mining
• Web data – generated or used by the Web
• Web content – static or dynamic
• Web structure – hyperlinks
• Web usage – web access log
Why is Web Mining Important?
• Rich data gathering and access medium
• A variety of important applications
  – Information retrieval
  – E-commerce – CRM, SCM, etc.
  – Knowledge management
• Interesting challenges
  – Scalability – global, multi-lingual, growth
  – Agility of knowledge
What is “knowledge”?
• Relationships and patterns in data
• Organized, analyzed and understandable
• Truths, beliefs, perspectives, concepts, procedures, judgments, expectations, methodologies, heuristics, restrictions, know-how
• Applicable to problem solving and decision making
• DBs, documents, policies and procedures as well as the un-captured, tacit expertise and experience
• Actionable, at the right place and right time!!!
What is Knowledge Management?
• Views:
  – Process (KM activities)
  – Goal (operational efficiency and innovations)
  – Methodology (formalization, control and technology)
• Delphi Group: “Leveraging collective wisdom to increase responsiveness and innovation.”
What is a KM program?
• Processes
• Organizational structure and policies
• Management theories and methodologies
• Information assurance
• Technologies and resources
• Implementation, training and change management
• Measurement, maintenance and evolution
A multi-disciplinary effort!!!
• Managerial and cultural
• Technological and engineering
KM Process
Identify, Collect, Organize, Represent, Store, Locate, Retrieve, Extract, Discover, Visualize, Interpret, Share, Transfer, Adapt, Apply, Monitor, Evaluate, Create
Data Mining & KM
• Data mining discovers knowledge
• Data mining supports management of KM infrastructure
  – (Personalized) content management
  – Security management
  – Workflow management
  – Scalable performance
Web Mining & KM
• Web mining discovers knowledge
• Web mining supports management of web KM portals
  – R&D
  – Intranet
  – Consulting
  – B2B, B2C, e-government, e-financing, e-risk management
Web Mining & Knowledge Refreshing
The KDD Process
Data
-> Step 1: Selection -> Target Data
-> Step 2: Cleaning & Preprocessing -> Preprocessed Data
-> Step 3: Transformation -> Transformed Data
-> Step 4: Data Mining -> Patterns
-> Step 5: Interpretation & Evaluation -> Discovered Knowledge
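The five KDD steps can be pictured as a chain of small functions, each consuming the output of the previous one. The sketch below is purely illustrative: the record fields (`buyer`, `region`, `amount`), the order-size threshold of 100, and the trivial counting used as the "data mining" step are all assumptions chosen to keep the example short.

```python
from collections import Counter


def select(records, region):
    """Step 1: Selection. Keep only the records relevant to the analysis."""
    return [r for r in records if r["region"] == region]


def clean(records):
    """Step 2: Cleaning & Preprocessing. Drop records with missing amounts."""
    return [r for r in records if r["amount"] is not None]


def transform(records):
    """Step 3: Transformation. Reduce each record to (buyer, order size class)."""
    return [(r["buyer"], "large" if r["amount"] >= 100 else "small") for r in records]


def mine(rows):
    """Step 4: Data Mining. Here, simply count how often each pattern occurs."""
    return Counter(rows)


def interpret(patterns, min_count):
    """Step 5: Interpretation & Evaluation. Keep patterns seen often enough."""
    return {p: n for p, n in patterns.items() if n >= min_count}


# Hypothetical order records, invented for illustration only.
orders = [
    {"buyer": "A", "region": "west", "amount": 120},
    {"buyer": "A", "region": "west", "amount": 150},
    {"buyer": "B", "region": "west", "amount": None},
    {"buyer": "B", "region": "east", "amount": 300},
    {"buyer": "A", "region": "west", "amount": 40},
]
knowledge = interpret(mine(transform(clean(select(orders, "west")))), min_count=2)
print(knowledge)
```

The process is iterative in practice: results from Step 5 often send the analyst back to an earlier step, which a straight function chain like this does not show.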
Types of Domain Knowledge
• DBA knowledge
• Domain expert knowledge
• Data mining expert knowledge
[Figure: the KDD process diagram (Step 1: Selection through Step 5: Interpretation & Evaluation), annotated with these knowledge types]
Fundamental Problems
• The database is very large
• The number of rules resulting from mining activity is also large
• The knowledge derived from a database reflects only the current state of the database
Issues in the KDD Process
• Agility
• Scalability
[Figure: the KDD process diagram]
Knowledge Refreshing
• The process to efficiently update discovered knowledge as data and domain knowledge change.
• Goals
  – Up-to-date knowledge (Agility)
  – Knowledge re-use (Scalability)
Type of Changes
[Figure: the KDD process diagram marked with “NEW” labels. Recoverable labels: Data, Target Data, Preprocessed Data, Transformed Data, Patterns, Discovered Knowledge, DBA Knowledge, Domain Expert Knowledge, Data Mining Expert Knowledge]
Knowledge Refreshing
• Needs assessment
  – Monitoring vs. analytic approaches
  – Monitoring/estimating changes in knowledge to determine if and when to re-mine
• Incremental data mining (learning)
  – How to leverage knowledge previously discovered from data mining to improve computational efficiency and quality of knowledge
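The idea behind incremental data mining can be sketched with itemset counts: rather than re-mining the whole database when new transactions arrive, the previously discovered counts are updated using only the new data. This is a minimal sketch under simplifying assumptions (pair counts only, insertions only, no deletions or changes in domain knowledge); the function names `pair_counts` and `refresh` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_counts(transactions):
    """Count how many transactions contain each pair of items."""
    counts = Counter()
    for basket in transactions:
        counts.update(combinations(sorted(set(basket)), 2))
    return counts


def refresh(old_counts, new_transactions):
    """Update previously mined counts by scanning only the new transactions."""
    updated = Counter(old_counts)
    updated.update(pair_counts(new_transactions))
    return updated


# Hypothetical data, invented for illustration only.
old_data = [["a", "b"], ["a", "b", "c"]]
new_data = [["b", "c"], ["a", "c"]]

previous = pair_counts(old_data)         # knowledge discovered earlier
refreshed = refresh(previous, new_data)  # scans only new_data
print(refreshed == pair_counts(old_data + new_data))
```

The refreshed counts match a full re-mine of the combined data while touching only the new transactions, which is the efficiency gain the slide points to.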