Скачать презентацию Data Warehouse Fundamentals Chapter 5 Analyzing and Defining Скачать презентацию Data Warehouse Fundamentals Chapter 5 Analyzing and Defining

4fdc369156e63ef23cc4de6cc8c4de06.ppt

  • Количество слайдов: 57

Data Warehouse Fundamentals Chapter 5 Analyzing and Defining Business Requirements for a Data Warehouse Data Warehouse Fundamentals Chapter 5 Analyzing and Defining Business Requirements for a Data Warehouse Paul K Chen 1

Chapter 5 - Objectives u u u Learn what is the definition of the Chapter 5 - Objectives u u u Learn what is the definition of the business requirements Understand the role of business dimensions related to DW business requirements Learn specifically the steps in defining and recording DW business requirements Review methods for gathering requirements (JAD, Interviews and Sampling) Discuss briefly architecture concepts impacted by business requirements

Definition of The Business Requirements u The definition of requirements is the user’s statement Definition of The Business Requirements u The definition of requirements is the user’s statement of how he or she wants to do business, and the information required to support his or her new methods of operations.

Definition of The Business Requirements The requirements can be broadly divided into two areas: Definition of The Business Requirements The requirements can be broadly divided into two areas: 1 Functional requirements—written in user terminology since it is user operations that are being described. 2 Non-functional requirements –these are the limitations and demands imposed upon the computing solutions; such as architectural plan, data storage specifications and information system performance expectations.

Requirements As the Driving Force for Data Warehousing u u u Understand why business Requirements As the Driving Force for Data Warehousing u u u Understand why business requirements are the driving force Discuss how requirements drive every development phase Specifically Learn how requirements influence data design Review the impact of requirements on architecture Note the special considerations for ETL and metadata Examine how requirements shape information delivery

Business Requirements As the Driving Force Business Requirements Maintenance Planning & Management Design Architecture Business Requirements As the Driving Force Business Requirements Maintenance Planning & Management Design Architecture Infrastructure Data Acquisition Data Storage Information Delivery Deployment Construction Architecture Infrastructure Data Acquisition Data Storage Information Delivery

Dimensional Nature of Business Data The business data of sales units (fact) is measured Dimensional Nature of Business Data The business data of sales units (fact) is measured analyzed in three dimensional. Product Time Geography

Examples of Business Facts and Dimensions Supermarket Chain Time Promotion Sale Unit Store Product Examples of Business Facts and Dimensions Supermarket Chain Time Promotion Sale Unit Store Product Insurance Business Agent Time Claims Insured Party Status Policy Manufacturing Company Cust-ship-to Time Ship from Shipment Ship Mode Deal Product Airline Company Customer Time Flight Frequent Fare Class Flyer Flights Status Airport

Defining and Recording Information Requirements for a Data Warehouse u Nine-Step Methodology includes the Defining and Recording Information Requirements for a Data Warehouse u Nine-Step Methodology includes the following steps: Step 1: Choosing the process Step 2: Choosing the grain Step 3: Identifying and conforming the dimensions Step 4: Choosing the facts Step 5: Storing pre-calculations in the fact table Step 6: Rounding out the dimension tables Step 7: Choosing the duration of the database Step 8: Tracking slowly changing dimensions Step 9: Deciding the query priorities and the query modes.

Step 1: Choosing The Process (Subject Area) u The process (function) refers to the Step 1: Choosing The Process (Subject Area) u The process (function) refers to the subject matter of a particular data mart. u First data mart built should be the one that is most likely to be delivered on time, within budget, and to answer the most commercially important business questions.

Subject Area Selecting the first subject area or areas to be populated Use the Subject Area Selecting the first subject area or areas to be populated Use the enterprise level data model in selecting appropriate subject area(s) u Three Options: -- Implement a single subject area (best option) -- Implement a subset of a subject area -- Implement a subset of several subject areas (most common) • Determine how much data should be loaded and its variety u

Step 2: Choosing The Grain u Decide what a record of the fact table Step 2: Choosing The Grain u Decide what a record of the fact table is to represent. u Identify dimensions of the fact table. The grain decision for the fact table also determines the grain of each dimension table. u Also include time as a core dimension, which is always present in star schemas. Due to disk space constraint, data selected must be time relevant in terms of trend, predictability, and profitability for the enterprise.

Step 3: Identifying And Conforming The Dimensions u Dimensions set the context for asking Step 3: Identifying And Conforming The Dimensions u Dimensions set the context for asking questions about the facts in the fact table. u If any dimension occurs in two data marts, they must be exactly the same dimension, or one must be a mathematical subset of the other. u A dimension used in more than one data mart is referred to as being conformed (shared).

Step 4: Choosing The Facts u The grain of the fact table determines which Step 4: Choosing The Facts u The grain of the fact table determines which facts can be used in the data mart. u Facts should be numeric and additive. u Unusable facts include: – non-numeric facts – non-additive facts – fact at different granularity from other facts in table.

Fact Criteria Weight the Fact attributes based upon the following criteria: u They exhibit Fact Criteria Weight the Fact attributes based upon the following criteria: u They exhibit measurable results to the Users and Management. • They are visible within the business and through management. • They are manageable.

Subject Area Subject areas are collections of like data that support analysis of the Subject Area Subject areas are collections of like data that support analysis of the major subjects in a business. Election criteria: u They consist of two or more attributes. • They are essential to the successful operation of the target system or business area to meet client objectives. • They can be defined by governing business rules.

Step 5: Storing Pre-Calculations In The Fact Table u Once the facts have been Step 5: Storing Pre-Calculations In The Fact Table u Once the facts have been selected each should be reexamined to determine whethere are opportunities to use pre-calculations.

ADD DERIVED DATA • Benefits Less space used Enhanced performance Breaking_lease Percentage_of_breaking_lease (< 3_months) ADD DERIVED DATA • Benefits Less space used Enhanced performance Breaking_lease Percentage_of_breaking_lease (< 3_months) Percentage_of_breaking_lease (>3 but < 6 months) Percentage_of_breaking_lease (>6 but <9 months) Percentage_of_breaking_lease (>9 but <12 months) Percentage_of_breaking_lease (> 12 months)

Add Summarization Schemes u Simple summation u Summation by group u Aggregation u Vertical Add Summarization Schemes u Simple summation u Summation by group u Aggregation u Vertical summarization

Simple Summation --Add Summarization Schema Individual Daily sales Date Product Qty Jan 1 nuts Simple Summation --Add Summarization Schema Individual Daily sales Date Product Qty Jan 1 nuts 100 Jan 1 nuts 200 Sales $ 300 600 Jan 2 nuts 300 100 900 300 Jan 3 Nuts 50 40 150 120 Daily Sales Summary Date Product Qty Jan 1 Nuts 300 Jan 2 Nuts 400 Jan 3 Nuts 90 Sales $ 900 1, 200 20

Summation By Group data attributes based on usage and stability. • Group stable and Summation By Group data attributes based on usage and stability. • Group stable and slowly changing data all in one table • Group unstable and frequently changing data all in another table

Aggregation is used to create data marts. For instance, a group of users frequently Aggregation is used to create data marts. For instance, a group of users frequently perform analysis comparing sales across geographic regions, broken by product line. If a data mart were created that stores the sales data already aggregated to the desired level, the users’ queries would be simpler.

Aggregation Add up amounts by day in sql: SELECT date, sum (amt) FROM SALE Aggregation Add up amounts by day in sql: SELECT date, sum (amt) FROM SALE GROUP BY date sale p Store date amt p 1 c 1 p 2 c 2 p 1 c 3 p 2 c 1 1 1 2 1 ans 1 2 4 3 date 1 2 Roll Up Drill Down sum 6 4

Vertical Summarization building upon a single dimensional theme: • • • Monthly renters Total Vertical Summarization building upon a single dimensional theme: • • • Monthly renters Total # of all renters Total # of new renters Total rental income Monthly sales Staff name Total sales $ Total houses sold

Step 6: Rounding Out The Dimension Tables u Text descriptions are added to the Step 6: Rounding Out The Dimension Tables u Text descriptions are added to the dimension tables. u Text descriptions should be as intuitive and understandable to the users as possible. u Usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables.

Step 7: Choosing The Duration Of The Database u Duration measures how far back Step 7: Choosing The Duration Of The Database u Duration measures how far back in time the fact table goes. For ex. Insurance &Tax Considerations. u Very large fact tables raise at least two very significant data warehouse design issues. – Often difficult to source increasing old data. – It is mandatory that the old versions of the important dimensions be used, not the most current versions, known as the ‘Slowly Changing Dimension’ problem.

Step 8: Tracking Slowly Changing Dimensions u Slowly changing dimension problem means that the Step 8: Tracking Slowly Changing Dimensions u Slowly changing dimension problem means that the proper description of the old dimension data must be used with old fact data. u Often, a generalized key must be assigned to important dimensions in order to distinguish multiple snapshots of dimensions over a period of time.

Step 9: Deciding The Query Priorities And The Query Modes u Most critical physical Step 9: Deciding The Query Priorities And The Query Modes u Most critical physical design issues affecting the enduser’s perception includes: – physical sort order of the fact table on disk – presence of pre-stored summaries or aggregations. u Additional physical design issues include administration, backup, indexing performance, and security.

Criteria For Assessing The Dimensionality Of A Data Warehouse u Criteria proposed by Ralph Criteria For Assessing The Dimensionality Of A Data Warehouse u Criteria proposed by Ralph Kimball to measure the extent to which a system supports the dimensional view of data warehousing. u Twenty criteria divided into three broad groups: architecture, administration, and expression.

Architectural Criteria Architectural criteria describes way the entire system is organized. – – – Architectural Criteria Architectural criteria describes way the entire system is organized. – – – – Explicit declaration Conformed dimensions and facts Dimensional integrity Open aggregate navigation Dimensional symmetry Dimensional scalability Sparsity tolerance

Administration Criteria Administration criteria are considered to be essential to the ‘smooth running’ of Administration Criteria Administration criteria are considered to be essential to the ‘smooth running’ of a dimensionally-oriented data warehouse. – – – Graceful modification Dimensional replication Changed dimension notification Surrogate key administration International consistency

Expression Criteria Expression criteria are mostly analytic capabilities that are needed in real-life situations. Expression Criteria Expression criteria are mostly analytic capabilities that are needed in real-life situations. – – – – Multiple-dimension hierarchies Ragged-dimension hierarchies Multiple valued dimensions Slowly changing dimensions Roles of a dimension Hot-swappable dimensions On-the-fly fact range dimension On-the-fly behaviour dimension

Business Requirements (Use Automaker Sales as an example) In order to get an idea Business Requirements (Use Automaker Sales as an example) In order to get an idea of the data to be used by the sales and Inventory department, a facilitation session was held with 15 key end users and the IT data warehouse team. The following business questions were generated from that meeting: u What is the sales trend in quantity and dollar amounts sold each Make, Model, Series and Color for a specific dealer, for each

Matching User Requirements to DW Data Requirements (Develop Fact Table) Primary Key u dealer_id Matching User Requirements to DW Data Requirements (Develop Fact Table) Primary Key u dealer_id u month_year u sales_area_id u make u model u series

Determine Dimensions & Attributes Dimensions u sales_area_dim u sales_time_dim u dealer_dim Attributes u dealer_mms_sales_qty Determine Dimensions & Attributes Dimensions u sales_area_dim u sales_time_dim u dealer_dim Attributes u dealer_mms_sales_qty u dealer_mms_sales_dollar_amt u dealer_ytd_mms_sales_qty u dealer_ytd_mms_sales_amt u dealer_inventory_qty

Determine Dimensions & Attributes Determine Dimensions & Attributes

Collecting The Business Requirements via JAD Sessions JAD (Joint Application Development) vs. Traditional Way Collecting The Business Requirements via JAD Sessions JAD (Joint Application Development) vs. Traditional Way of Gathering Requirements u JAD sessions (also called facilitated session) are used to gather information and feedback and confirm the results of requirements gathering. u JAD sessions replace the traditional way of conducting a series of interviews on a one-to-one basis with the users. Advantages: Achieving consensus during the session when multiple sources of information exist, raising and addressing issues or assigning them for resolution, and immediately confirming information.

JAD Session u JAD sessions are used to scope the project. Each session should JAD Session u JAD sessions are used to scope the project. Each session should last two to three day. They are very focused and fast-paced. u JAD sessions can be very formal and follow strict guidelines or be informal group sessions.

JAD - Roles Whether they are formal or informal, there are four necessary roles JAD - Roles Whether they are formal or informal, there are four necessary roles to be filled: u u Facilitator The Facilitator is the session leader. It is the facilitator’s responsibility to ensure that the objectives of the sessions are met. Scribes(s) Scribes are responsible for recording the minutes of the session and optionally constructing deliverables using an automated tool as the session progresses.

JAD - Roles u User The users provide knowledge specific to the scope of JAD - Roles u User The users provide knowledge specific to the scope of the project. u Developers are the team members who will be building the system.

JAD Session The session is divided into three segments: u u u Introduction: Welcoming JAD Session The session is divided into three segments: u u u Introduction: Welcoming remarks; description of the facilities such as rest room locations, messages, reviewing the agenda and setting expectations. Conducting the session: To confirm deliverables set out in the session objectives. Wrapping up the session: By summarizing progress towards the objectives; reviewing the agenda for the next one and obtaining feedback from the participants.

JAD Session Potential drawbacks u The commitment of a large block of time for JAD Session Potential drawbacks u The commitment of a large block of time for all participants u Requirements collected could be less than satisfactory due to unpredictability of the JAD session or organizational culture not sufficiently developed to enable the concerted efforts required to be productive in a JAD setting.

Five Steps in Interview Preparation u Reading background material u Establishing interview objectives u Five Steps in Interview Preparation u Reading background material u Establishing interview objectives u Deciding when to interview u Preparing the interviewee u Deciding on question type and structure

Two Types of Questions – Open-End Questions vs. Closed Questions Open-ended interview questions Open Two Types of Questions – Open-End Questions vs. Closed Questions Open-ended interview questions Open describes the interviewee’s options for responding. They are open. Advantages: u Putting the interviewee at ease u Allowing more spontaneity Disadvantages: u Possibly losing control of the interview u May not get the types of answers you want

Two Types of Questions – Open-End Questions vs. Closed Questions Closed interview questions Such Two Types of Questions – Open-End Questions vs. Closed Questions Closed interview questions Such as “ How many subordinates do you have? Benefits: u Getting to relevant data u Keeping control over the interview Drawbacks: u Failing to obtain rich detail u Intimidating the interviewee

Three Basic Ways of Structuring Interviews u Pyramid Structure: Starting from closed questions, then Three Basic Ways of Structuring Interviews u Pyramid Structure: Starting from closed questions, then gradually expand into open territory. u Funnel Structure: The reverse of pyramid structure approach. u Diamond-Shaped: A combination of the two above structures.

The Needs for Sampling u Containing costs u Speeding up the data gathering u The Needs for Sampling u Containing costs u Speeding up the data gathering u Improving effectiveness u Reducing bias

Sampling Design Four steps: u Determine the data to be collected or described u Sampling Design Four steps: u Determine the data to be collected or described u Determine the population to be sampled u Choose the type of sample u Decide on the sample size

Kinds of Information Sought in Investigation Type of hard data (other than interviewing and Kinds of Information Sought in Investigation Type of hard data (other than interviewing and observation) - Quantitative Data u u Reports for decision making Performance reports Records Data capture forms

Kinds of Information Sought in Investigation Qualitative Data u u u Memos Signs in Kinds of Information Sought in Investigation Qualitative Data u u u Memos Signs in bulletin boards and in work areas Corporate websites Manuals Policy handbooks

The Architectural Plan (Non-Functional Requirements) Relevant Architecture Concepts Impacted by Requirements • • • The Architectural Plan (Non-Functional Requirements) Relevant Architecture Concepts Impacted by Requirements • • • Client / Server Architecture Data Warehouse Parallel Database Technology RAID Technology

Client / Server Architecture Machine Configuration Example Client / Server Architecture Machine Configuration Example

Client / Server architecture provides flexibility to support different combinations of host machine configurations. Client / Server architecture provides flexibility to support different combinations of host machine configurations. 3 Tier hosting configuration for Typical Queries (or any process that makes a high volume of DB calls) suffers in performance due to Network Messaging overhead.

2 Tier hosting configuration supports the performance requirements of Typical Queries by eliminating 99. 2 Tier hosting configuration supports the performance requirements of Typical Queries by eliminating 99. 8% of Network Messaging overhead.

Data Warehouse Parallel Database Technologies u u Shared memory architecture (SMP) – All the Data Warehouse Parallel Database Technologies u u Shared memory architecture (SMP) – All the servers share all the data Shared nothing architecture (MPP) – Each server has its own partition of data

RAID Technology u u u RAID 0: Sector interleave, no error checking (no redundancy) RAID Technology u u u RAID 0: Sector interleave, no error checking (no redundancy) RAID 1: Mirroring (duplicate copy) RAID 2: Bit interleave with error correction codes on multiple drives RAID 3: Bit interleave with error correction on single drive RAID 4: Sector interleave with dedicated parity drive RAID 5: Sector interleave, parity stored on all drives

Tugas u Jelaskan apa yang dimaksud dengan bussines requirement? u Apa yang Anda lakukan Tugas u Jelaskan apa yang dimaksud dengan bussines requirement? u Apa yang Anda lakukan dalam bussiness requirement?