
62f3e533856f38d7b49c05d6aa0a52ab.ppt
- Количество слайдов: 65
Data Warehouse 1 Sharif University
Objectives • • Need for Data Warehouse. What is Data Warehouse? Data Warehouse Properties. Data Warehouse Architectures. Data Marts. Corporate Information Factory. Extraction, Transportation, Loading and Transformation. • Design in Data Warehouses. • Data Warehousing Schemas. 2 Sharif University
Decision support questions that enterprises need to have answered • How did sales representatives perform over different periods of time? • What are the popular products? • What types of customers buy what types of products? • How much are the various internal organizations spending on what products? 3 Sharif University
Cont. • What were the variances between the amounts budgeted and the amounts spent? • What positions are being filled by people with what types of background? • What is the average pay for people within different age brackets? 4 Sharif University
What is a Data Warehouse? • A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing • A common way of introducing data warehousing is to refer to the characteristics of a data warehouse as set forth by “ William Inmon ”: Subject Oriented – Integrated – Nonvolatile – Time Variant – 5 Sharif University
Data Warehouse Properties Subject Oriented Integrated Data Warehouse Non Volatile 6 Time Variant Sharif University
Subject Oriented • Data is categorized and stored by business subject rather than by application. • For example, to learn more about your company’s sales data , "Who was our best customer for this item, in this region last year? " This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented. Product Region Time Operational Systems 7 Customer Financial Information Data Warehouse Subject Area Sharif University
Integrated Data warehouses must put data from disparate sources into a consistent format. 8 Sharif University
Time Variant (time series) • Data is stored as a series of snapshots, each representing a • period of time. Time Data Jan/03 Data for January Feb/03 Data for February Mar/03 Data for March Data Warehouse 9 Sharif University
Non Volatile • Typically data in the data warehouse is not updated or deleted. Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred. Load Operational Databases INSERT Read Warehouse Database Read UPDATE DELETE 10 Sharif University
Other Characteristics of Data Warehouse • • 11 Summarized Not Normalized Meta Data Sources (Both operational and external data are presents) Sharif University
Summary Data – Provide fast access to pre-computed data – Reduce use of • I/O • CPU • Memory – Distill from • Source systems - lightly summarized • Pre-calculated summaries - highly summarized – Determine requirements early 12 Sharif University
Summary Data • • Average Maximum Total Percentage Units Sold Sales($) Store Product A Total Dimension Data Product B Fact Data Total Product C Total 13 Sharif University
Summary Data Product Time Store Summary Fact (Derived) 14 Sharif University
Normalization – Normalized data contains no • Redundancy. • Repeating data. • Key independent columns. – Denormalized data often • Improves efficiency in OLAP systems. • Exists in data warehouse databases. • Comprises derived or summary data. – Star and snowflake models are denormalized. 15 Sharif University
Meta Data (Data about Data) Provides information about the content of the warehouse. Meta Data includes: • A guide to moving data to the warehouse • Rules for summarization • Business terms used to describe data • Technical terminology • Rules for data extractions 16 Sharif University
Data Warehouse Architectures • • • 17 Data Warehouse Architecture (Basic) Data Warehouse Architecture (with a Staging Area and Data Marts) Sharif University
Data Warehouse Architecture (Basic) • 18 End users directly access data derived from several source systems through the data warehouse. Sharif University
Data Warehouse Architecture (with a Staging Area) you need to clean and process your operational data before putting it into the warehouse. You can do this programmatically, although most data warehouses use a staging area instead. 19 Sharif University
Data Warehouse Architecture (with a Staging Area and Data Marts) you may want to customize your warehouse’s architecture for different groups within your organization. You can do this by adding data marts, which are systems designed for a particular line of business. 20 Sharif University
Data Marts A Data Mart is a small warehouse designed for strategic business unit or a department. Data Mart Advantages: • • • The cost is low. Implementation time is shorter. They are controlled locally rather than centrally. They contain less information than the data warehouse and hence have more rapid response. They allow a business unit to build its own DSS without relying on a centralized IS department. Data Mart Types: • • 21 Replicated Data Marts. Stand-alone Data Marts. Sharif University
Corporate Information Factory Information Workshop Library & Toolbox Workbench Information Feedback Exploration Warehouse API Internet API Other Operational Systems Management 22 CIF Data Management Data Delivery Operational Data Store Legacy API Data Acquisition Tr. I Data Mining Warehouse DSI Oper Mart Data Warehouse ERP DSI OLAP Data Mart External DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Major Business Functions Information Workshop Library & Toolbox Workbench Information Feedback Business Intelligence Exploration Warehouse External API ERP Business Data Internet API Operations Acquisition API Other Operational Systems Management 23 CIF Data Management Data Delivery Operational Data Store Legacy API Data Warehouse DSI Data Mining Warehouse DSI OLAP Data Mart DSI Business Management Tr. I Oper Mart DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Operational Systems Information Workshop Library & Toolbox External API ERP Internet API Data Acquisition Systems Management CIF Data Management Data Delivery Warehouse Tr. I OLAP Data Mart DSI Oper Mart Operational Data Store Other Operational Systems 24 Information Feedback Operational Systems are the internal and external core systems that run the day-to-day business Exploration operations. They are accessed through application DSI Warehouse program interfaces (APIs) and are the source of data Data Warehouse Data Mining for the data warehouse and operational data store. DSI Legacy API Workbench DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
External Data Information Workshop Library & Toolbox Workbench Information Feedback Exploration Warehouse Internet API Legacy API Other Operational Systems Management 25 Data Warehouse ERP DSI OLAP Data API DSI Data Mining Warehouse External DSI Data CIF Data Acquisition. External Data is any data outside the normal Management Delivery Mart data collected through an enterprise’s internal Operational Data Store applications. Generally, external data, such as Tr. I Oper Mart DSI demographic, credit, competitor, and financial information, is purchased by the enterprise Meta Data Management from a vendor of such information. Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Data Acquisition Information Workshop Library & Toolbox Data Acquisition is the set of processes that capture, integrate, transform, cleanse, Information Feedback and load source data into the data warehouse and operational data store. Exploration Warehouse External API Internet API Other Operational Systems Management 26 CIF Data Management Data Delivery Operational Data Store Legacy API Data Acquisition Tr. I DSI Data Mining Warehouse DSI OLAP Data Mart DSI Oper Mart Data Warehouse ERP API Workbench DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Data Problems 27 Sharif University
Data Warehouse Library & Toolbox Information Workshop Workbench Information Feedback Exploration Warehouse External API API Data Warehouse ERP Internet Data Acquisition CIF Data Management Data Delivery Operational subject-oriented, Data Store Legacy The Data Warehouse is a Tr. I integrated, time-variant, non-volatile collection Other API of data used to support the strategic decision. Operational Systems making process for. Metaenterprise. the Data Management Systems Management 28 Data Acquisition Management Operation & Administration Service Management DSI Data Mining Warehouse DSI OLAP Data Mart DSI Oper Mart DSI Change Sharif Management University
Operational Data Store The Operational & Toolbox an subject. Library Data Store is. Information Workshop oriented, integrated, current, volatile collection of data used to support the tactical Information Feedback decision-making process for the enterprise. Workbench Exploration Warehouse API Internet API Data Acquisition Other Operational Systems Management CIF Data Management Data Delivery Operational Data Store Legacy API 29 Data Warehouse ERP Tr. I DSI Data Mining Warehouse DSI OLAP Data Mart DSI Oper Mart External DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Comparing an Operational Data Store and a Data Warehouse 30 Sharif University
CIF Data Management Library & Toolbox CIF Data Management is the set of processes that Information Workshop Workbench protect the integrity and continuity of the data within and across the data warehouse and operational data Information Feedback store. It may employ a staging area for cleansing and synchronizing data. Exploration Warehouse API Internet API Data Acquisition Other Operational Systems Management CIF Data Management Data Delivery Operational Data Store Legacy API 31 Data Warehouse ERP Tr. I DSI Data Mining Warehouse DSI OLAP Data Mart DSI Oper Mart External DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Transactional Interface Information Workshop Library & Toolbox Workbench The Transactional Interface is an easy-to-use and intuitive interface for the end user to. Feedback and Information access manipulate data in the operational data store. Exploration Warehouse API Internet API Data Acquisition Other Operational Systems Management CIF Data Management Data Delivery Operational Data Store Legacy API 32 Data Warehouse ERP Tr. I DSI Data Mining Warehouse DSI OLAP Data Mart DSI Oper Mart External DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Data Delivery Information Workshop Library & Toolbox Workbench Data Delivery is the set of processes that Information Feedback enables end users and their supporting IT groups to filter, format, and deliver data to data marts and oper-marts. External API Internet API Other Operational Systems Management 33 CIF Data Management Data Delivery Operational Data Store Legacy API Data Acquisition Tr. I DSI Data Mining Warehouse DSI OLAP Data Mart DSI Oper Mart Data Warehouse ERP API Exploration Warehouse DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Exploration Warehouse Library & Toolbox Information Workshop Workbench Information Feedback Exploration Warehouse External API API Data Warehouse ERP Internet Data Acquisition CIF Data Management Data Delivery Operational Data Store data mart Legacy The Exploration Warehouse is a whose Tr. I Other API purpose is to provide a safe haven for exploratory and Operational ad hoc processing. An exploration warehouse may Systems Meta Data Management utilize specialized technologies to provide fast response times with the ability to access the entire database. Service Systems Data Acquisition Management 34 Management Operation & Administration Management DSI Data Mining Warehouse DSI OLAP Data Mart DSI Oper Mart DSI Change Sharif Management University
Data Mining Warehouse Library & Toolbox Information Workshop Workbench Information Feedback Exploration Warehouse API API Data Warehouse ERP Internet Data Acquisition CIF Data Management Data Delivery Operational Data Store Legacy Tr. I The API Data Mining Warehouse includes tasks Other known as knowledge extraction, data Operational Systems Meta Data Management archaeology, data exploration, data pattern processing and data harvesting. Systems Management 35 Data Acquisition Management Operation & Administration Service Management DSI Data Mining Warehouse DSI OLAP Data Mart DSI Oper Mart External DSI Change Sharif Management University
OLAP Data Mart Information Workshop Library & Toolbox Workbench The OLAP (online analytical Information Feedback processing) Data Mart is aggregated and/or summarized data that is derived External from the data warehouse and tailored to support the multidimensional requirements of a given business Data ERP API Warehouse unit or business function. Internet API Other Operational Systems Management 36 CIF Data Management Tr. I DSI Data Mining Warehouse DSI OLAP Data Mart DSI Oper Mart Data Delivery Operational Data Store Legacy API Data Acquisition Exploration Warehouse DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Oper-Mart Information Workshop Library & Toolbox The Oper-Mart is a subset of data derived from of the. Workbench operational data store used in tactical analysis and usually stored in a multidimensional manner (star schema or Information Feedback hypercube). They may be created in a temporary manner and dismantled when no longer needed. Exploration External Warehouse API Internet API Data Acquisition Other Operational Systems Management CIF Data Management Tr. I Data Mining Warehouse DSI OLAP Data Mart DSI Oper Mart Data Delivery Operational Data Store Legacy API 37 Data Warehouse ERP DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Decision Support Interface Information Workshop Library & Toolbox Workbench The Decision Support Interface is an easy-to-use, intuitive tool to enable Information Feedback end user capabilities such as exploration, data mining, OLAP, query, and reporting to distill information from data. External API Internet API Other Operational Systems Management 38 CIF Data Management Data Delivery Operational Data Store Legacy API Data Acquisition Tr. I DSI Data Mining Warehouse DSI OLAP Data Mart DSI Oper Mart Data Warehouse ERP Exploration Warehouse DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Meta Data Management Information Workshop Library & Toolbox Workbench Information Feedback External API ERP Internet API Systems Management CIF Data Management Data Delivery Warehouse DSI Tr. I OLAP Data Mart DSI Oper Mart Operational Data Store Other Operational Systems 39 Data Acquisition Legacy API Meta Data Management is the set of processes for Exploration DSI Warehouse managing the information needed to promote data Data Warehouse legibility, use, and administration. Data Mining DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Information Feedback Information Workshop Library & Toolbox Workbench Information Feedback Exploration Warehouse External API ERP API Internet API Legacy DSI Data Warehouse Feedback Information is the set of processes Data Mining DSI Warehouse Data transmit. CIF Data that the intelligence Data gained through Acquisition of the Corporate Information Factory to Management Delivery usage OLAP Data DSI Mart Operational appropriate data stores. Data Store API Operational Systems Management 40 Tr. I Other Oper Mart DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Information Workshop Library & Toolbox Workbench Information Feedback Exploration Warehouse External API ERP API Internet API Legacy API Systems Management 41 Data Information Workshop is the set of the facilities that Warehouse Data Mining DSI optimize use of the Corporate Information Factory by Warehouse Data CIF Data organizing Management its capabilities and knowledge, and then Acquisition Delivery OLAP Data assimilating them into the business process. DSI Mart Other Operational Systems DSI Operational Data Store Tr. I Oper Mart DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Library and Toolbox Information Workshop Library & Toolbox Workbench Information Feedback Exploration Warehouse External API ERP API Internet API Legacy API Other Operational Systems Management 42 DSI The Library and. Data Toolbox is the collection of meta data Warehouse Data Mining DSI and capabilities that provides information to effectively Warehouse Data CIF Data use and administer the Corporate Information Factory. Acquisition Management Delivery The library provides the medium from OLAP Data knowledge which DSI Mart Operational is enriched. The Store Data toolbox is a vehicle for organizing, Tr. I locating, and accessing capabilities. Oper Mart DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Workbench Information Workshop Library & Toolbox Workbench Information Feedback Exploration Warehouse API Data The Workbench is a strategic mechanism for ERP Warehouse automating. Data integration of capabilities. Data the and CIF Data Internet knowledge into the business process. Delivery Acquisition Management API Other Operational Systems Management 43 Operational Data Store Legacy Tr. I DSI Data Mining Warehouse DSI OLAP Data Mart DSI Oper Mart External DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Operations and Administration Information Workshop Library & Toolbox Workbench Information Feedback External API ERP Internet API Systems Management CIF Data Management Data Delivery Warehouse Tr. I OLAP Data Mart DSI Oper Mart Operational Data Store Other Operational Systems 44 Data Acquisition Legacy API Operation and Administration is the set of Exploration activities required to ensure smooth daily DSI Warehouse operations, to ensure that resources are Data Warehouse optimized, and to ensure that growth Data managed. is Mining DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Systems Management Information Workshop Library & Toolbox Workbench Information Feedback External API ERP Internet API Systems Management CIF Data Management Data Delivery Warehouse DSI Tr. I OLAP Data Mart DSI Oper Mart Operational Data Store Other Operational Systems 45 Data Acquisition Legacy API Systems Management is the set of Exploration processes for maintaining, versioning, and Warehouse upgrading the core technology on which the Data Warehouse data, software, and tools operate. Data Mining DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Data Acquisition Management Information Workshop Library & Toolbox Workbench Information Feedback External API ERP Internet API Systems Management CIF Data Management Data Delivery Warehouse Tr. I OLAP Data Mart DSI Oper Mart Operational Data Store Other Operational Systems 46 Data Acquisition Legacy API Data Acquisition Management is the set of processes that manage and maintain processes Explorationto capture used DSI Warehouse source data and its preparation for loading into the Data Warehouse data warehouse or operational data store. Data Mining DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Service Management Information Workshop Library & Toolbox External API ERP API Internet API Acquisition Systems Management Delivery Tr. I OLAP Data Mart DSI Oper Mart Operational Data Store Other Operational Systems 47 Information Feedback Service Management is the set of processes for promoting user satisfaction and productivity within the Corporate Information Factory. It. Exploration DSI includes Warehouse processes that. Data manage and maintain service level agreements, Warehouse for change, user Mining DSI requests Data Warehouse Data CIF Data communications, and the data delivery mechanisms. Legacy API Workbench DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Change Management Information Workshop Library & Toolbox Workbench Information Feedback External API ERP Internet API Systems Management CIF Data Management Data Delivery Warehouse DSI Tr. I OLAP Data Mart DSI Oper Mart Operational Data Store Other Operational Systems 48 Data Acquisition Legacy API Exploration Change Management is the set of processes Warehouse coordinating modifications to the Corporate Data Warehouse Information Factory. Data Mining DSI Meta Data Management Data Acquisition Management Operation & Administration Service Management Change Sharif Management University
Extraction, Transportation, Loading and Transformation (ETL) OLTP Databases Staging File Warehouse Database Purchase specialist tools, or develop programs • • Transportation - move data into the warehouse • 49 Extraction - select data using different methods Loading and Transformation - validate, clean, integrate, and time stamp data Sharif University
Data Quality - Importance Ensure data is Relevant • Useful • Quality • Accurate Useful • Accessible Quality Clean up Accurate Change Operational systems Warehouse Restructure • Large time consuming task 50 Accessible Sharif University
An Example Sale 1/2/98 12: 00: 01 Ham Pizza $10. 00 Sale 1/2/98 12: 00: 02 Cheese Pizza $15. 00 Browser: http: // Cu s tom ers + : $12. 00 Return 1/2/98 12: 00: 03 Anchovy Pizza - $12. 00 Sale 1/2/98 12: 00: 04 Sausage Pizza $11. 00 Sale 1/2/98 12: 00: 01 Ham Pizza $10. 00 Sale 1/2/98 12: 00: 02 Cheese Pizza $15. 00 Sale 51 orof 12: 00: 02 Anchovy Pizza 1/2/98 a rec 1/2/98 X as Sale Hollywood 12: 00: 04 Sausage Pizza $11. 00 Sharif University
Extraction in Data Warehouses • Logical Extraction Methods – Full Extraction • The data is extracted completely from the source system. – Incremental Extraction • At a specific point in time, only the data that has changed since a well-defined event back in history will be extracted. • Physical Extraction Methods – Online Extraction • The data is extracted directly from the source system itself. – Offline Extraction • • 52 Flat files Dump files Redo and archive logs Transportablespaces Sharif University
Changing Data First time load Operational Databases Warehouse Database Refresh Purge or Archive Refresh 53 Sharif University
Transportation in Data Warehouses • Transportation Mechanisms in Data Warehouses – Transportation Using Flat Files – Transportation Through Distributed Operations – Transportation Using Transportable Tablespaces 54 Sharif University
Transportation in Data Warehouses • Transportation Using Flat Files – The most common method for transporting data is by the transfer of flat files, using mechanisms such as FTP or other remote file system access protocols • Transportation Through Distributed Operations – Distributed queries, either with or without gateways, can be an effective mechanism for extracting data. These mechanisms also transport the data directly to the target system. • Transportation Using Transportable Tablespaces – Some Databases such as Oracle and DB 2 introduced an important mechanism for transporting data: transportablespaces. This feature is the fastest way for moving large volumes of data between two databases. 55 Sharif University
Loading and Transformation in Data Warehouses • Loading Mechanisms – – SQL*Loader External Tables OCI and Direct-Path APIs Export/Import • Transformation Mechanisms – Transformation Using SQL – Transformation Using PL/SQL – Transformation Using Table Functions 56 Sharif University
Incremental Development – Focus on business functionality – Deliver business benefit – Are suited to Project and Program warehouse evolution Management – Once an increment is complete the selection and scope of the next increment is defined – Each increment follows the same phase sequence 57 Strategy Incremental Development Definition Analysis Design Build ETA Enterprise Technical Architecture Transition to Production Discovery Sharif University
Roles – The project team: roles and responsibilities – Common roles • Analyst, Database Administrator, Programmer, Tester – Warehouse specific roles • DW Architect, Metadata Architect, Data Quality Administrator, DW Administrator 58 Sharif University
Design in Data Warehouses • Logical Design in Data Warehouses – Data Warehousing Schemas • Star • Snowflake • Constellation • Physical Design in Data Warehouses – Physical Design Structures • • 59 Tablespaces Tables and Partitioned Tables Views Integrity Constraints Dimensions Indexes and Partitioned Indexes Materialized Views Sharif University
Data Warehousing Schemas • Star • Snowflake • Constellation 60 Sharif University
Star Schema Product Table Product_id Product_desc … • The center of the star consists of one or more fact tables and the points of the star are the dimension tables. Time Table Day_id Month_id Period_id Year_id 61 Store Table Store_id District_id. . . Sales Fact Table Product_id Store_id Item_id Day_id Sales_dollars Sales_units. . . Item Table Item_id Item_desc. . . Sharif University
Snowflake Schema Product Table Product_id Product_desc • d Store Table Store_id Store_desc District_id District Table District_id District_desc Sales Fact Table Item_id Store_id Sales_dollars Sales_units Time Table Week_id Period_id Year_id 62 Item Table Item_id Item_desc Dept_id Dept Table Dept_id Dept_desc Mgr_id Mgr Table Dept_id Mgr_name Sharif University
Constellation Warehouse Table Warehouse_id Warehouse_loc Product Table Product_id Product_desc Inventory Fact Table Product_id Shelf_id Cost_dollars Qty_on_hand Sales Fact Table Item_id Store_id Sales_dollars Sales_units Time Table Week_id Period_id Year_id 63 Store Table Store_id District_id Item Table Item_id Dept_id Sharif University
Summary • • • 64 Need for Data Warehouse. What is Data Warehouse? Data Warehouse Properties. Data Warehouse Architectures. Data Marts. Corporate Information Factory. Extraction, Transportation, Loading and Transformation. Design in Data Warehouses. Data Warehousing Schemas. Sharif University
Q&A Internal and external systems 65 Data warehouse Decision makers Sharif University
62f3e533856f38d7b49c05d6aa0a52ab.ppt