Скачать презентацию 8 7 2008 Introduction to SQL Server 2008 Integration Services Скачать презентацию 8 7 2008 Introduction to SQL Server 2008 Integration Services

02 - Business_Intelligence_Fundamentals_SSIS.pptx

  • Количество слайдов: 40

8/7/2008 Introduction to SQL Server 2008 Integration Services 8/7/2008 Introduction to SQL Server 2008 Integration Services

2 Disclaimer The information contained in this slide deck represents the current view of 2 Disclaimer The information contained in this slide deck represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication. This slide deck is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this slide deck may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this slide deck. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this slide deck does not give you any license to these patents, trademarks, copyrights, or other intellectual property. Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place or event is intended or should be inferred. © 2008 Microsoft Corporation. All rights reserved. Microsoft, SQL Server, Office System, Visual Studio, Share. Point Server, Office Performance. Point Server, . NET Framework, Pro. Clarity Desktop Professional are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Where Are We? Data Warehouse Data Sources Data Marts Staging Area Manual Cleansing Client Where Are We? Data Warehouse Data Sources Data Marts Staging Area Manual Cleansing Client Access

Presentation title goes in here 4 1. Reviewing ETL Fundamentals 2. Introducing SSIS Components Presentation title goes in here 4 1. Reviewing ETL Fundamentals 2. Introducing SSIS Components 3. Designing ETL Packages 4. Slowly Changing Dimension Transformation Agenda

5 Module Overview Reviewing ETL Fundamentals Introducing SSIS Components Designing ETL Packages Slowly Changing 5 Module Overview Reviewing ETL Fundamentals Introducing SSIS Components Designing ETL Packages Slowly Changing Dimension Transformation

6 Reviewing ETL Fundamentals Data Integration Business Scenarios SQL Server 2008 Data Integration Populating 6 Reviewing ETL Fundamentals Data Integration Business Scenarios SQL Server 2008 Data Integration Populating Dimension Tables Populating Fact Tables

7 Data Integration The ability to transform corporate data into meaningful and actionable information 7 Data Integration The ability to transform corporate data into meaningful and actionable information Challenges Retrieve and merge data from multiple sources Cleanse and transform the data Load the data into appropriate data stores for analysis and reporting Enterprises spend 60%– 80% of their resources in the data integration stage

8 Business Scenarios Populating data warehouses (ETL) Cleaning and standardizing data Merging data from 8 Business Scenarios Populating data warehouses (ETL) Cleaning and standardizing data Merging data from heterogeneous data stores Automating administrative tasks

9 SQL Server 2008 Integration Services Satisfies traditional demands of ETL operations and general-purpose 9 SQL Server 2008 Integration Services Satisfies traditional demands of ETL operations and general-purpose data integration Provides a robust, flexible, fast, scalable, and extensible architecture Challenges traditional ETL design approaches

Populating Dimension Tables Dimension source Transform Correlate records New record? Y N Type 1 Populating Dimension Tables Dimension source Transform Correlate records New record? Y N Type 1 change? Y Update changed column(s) N Type 2 change? Y Expire existing record Insert new record 10

Populating Fact Tables Fact source Transform Lookup dimension key Lookup failed? N Repeat for Populating Fact Tables Fact source Transform Lookup dimension key Lookup failed? N Repeat for each dimension key Y Insert new dimension record Insert new record 11

12 Introducing SSIS Components Architecture The Package Connection Manager Control Flow Data Flow 12 Introducing SSIS Components Architecture The Package Connection Manager Control Flow Data Flow

Examining the SSIS build environment DEMO 13 Examining the SSIS build environment DEMO 13

Architecture SQL Server Integration Services (SSIS) service SSIS object model Two distinct runtime engines: Architecture SQL Server Integration Services (SSIS) service SSIS object model Two distinct runtime engines: Control flow Data flow 32 -bit and 64 -bit editions 14

15 The Package The basic unit of work, deployment, and execution An organized collection 15 The Package The basic unit of work, deployment, and execution An organized collection of: Connection managers Control flow components Data flow components Variables Event handlers Configurations Can be designed graphically or built programmatically Saved in XML format to the file system or SQL Server

16 Connection Manager Logical representation of a connection Stored in the package and cannot 16 Connection Manager Logical representation of a connection Stored in the package and cannot be shared between packages Used by package elements Does not need to connect to SQL Server

Connection Manager DEMO 17 Connection Manager DEMO 17

18 Control Flow Control flow is a process-oriented workflow engine A package contains a 18 Control Flow Control flow is a process-oriented workflow engine A package contains a single control flow Control flow elements Containers Tasks Precedence constraints Variables

19 Containers Provide structure and services for Grouping tasks Implementing repeating flows Execute in 19 Containers Provide structure and services for Grouping tasks Implementing repeating flows Execute in sequence defined by precedence constraints in control flow Manage variable and transactional boundaries

Data Flow Task Encapsulates the data flow engine Extract Transform Load 20 Data Flow Task Encapsulates the data flow engine Extract Transform Load 20

Control Flow and Data Flow DEMO 21 Control Flow and Data Flow DEMO 21

22 Scripting Tasks Use a Script Task to develop scripts that are precompiled for 22 Scripting Tasks Use a Script Task to develop scripts that are precompiled for better performance Develop VB. NET or C#. NET scripts in Visual Studio for Applications (VSTA) using these features Intelli. Sense Color coding Integrated Help References: –. NET Framework libraries – Managed assemblies – COM objects Debugging Consider a Custom Task to reuse functionality between packages

Precedence Constraints Link containers and tasks (executables) to control the order of execution Configure Precedence Constraints Link containers and tasks (executables) to control the order of execution Configure conditions that determine whether the executable runs Success, Failure, or Completion Constraints Expressions Logical AND/OR for multiple constraints 23

24 Variables customize package behavior by changing expression values or object properties System variables 24 Variables customize package behavior by changing expression values or object properties System variables store values collected while the package runs User variables store values assigned By default in the current package By a variable passed by a parent package By an expression All variables use case-sensitive names Variables can be scoped at package, container, or task level

Developing a Query-Driven E-Mail Delivery System DEMO 25 Developing a Query-Driven E-Mail Delivery System DEMO 25

26 Data Flow The Data Flow Task Encapsulates the data flow engine Exists in 26 Data Flow The Data Flow Task Encapsulates the data flow engine Exists in the context of an overall control flow Performs traditional ETL in addition to other extended scenarios Is fast and scalable Data Flow Components Extract data from Sources Load data into Destinations Modify data with Transformations Service Paths Connect data flow components Create the pipeline

Data Flow Sources extract data from Relational tables and views Files Analysis Services databases Data Flow Sources extract data from Relational tables and views Files Analysis Services databases 27

Data Flow Destinations load data to Relational tables and views Files Analysis Services databases Data Flow Destinations load data to Relational tables and views Files Analysis Services databases and objects Data. Readers and Recordsets Enterprise Edition only 28

Row Transformations Update column values or create new columns Transform each row in the Row Transformations Update column values or create new columns Transform each row in the pipeline input 29

Rowset Transformations Create new rowsets that can include Aggregated values Sorted values Sample rowsets Rowset Transformations Create new rowsets that can include Aggregated values Sorted values Sample rowsets Pivoted or unpivoted rowsets Are also called asynchronous components Mar-2008 Microsoft Developer & Platform Evangelism 30

Split and Join Transformations Distribute rows to different outputs Create copies of the transformation Split and Join Transformations Distribute rows to different outputs Create copies of the transformation inputs Join multiple inputs into one output Perform lookup operations 31

Loading and Integrating Data from Multiple Sources DEMO 2 32 Loading and Integrating Data from Multiple Sources DEMO 2 32

33 Designing ETL Packages Orchestrating the ETL Process Populating Dimension Tables Populating Fact Tables 33 Designing ETL Packages Orchestrating the ETL Process Populating Dimension Tables Populating Fact Tables

34 Orchestrating the ETL Process A single package performs ETL for a single dimension 34 Orchestrating the ETL Process A single package performs ETL for a single dimension or fact table A “master package” orchestrates the sequence of the entire ETL process by using Execute Package tasks Execute all dimension packages first Execute all fact table packages second A robust design includes extensive logging to store execution history and audit trails Implement package logging to capture most execution details Customize and extend logging by using control flow tasks

Populating Dimension Tables Dimension source Transform Correlate records New record? Y N Type 1 Populating Dimension Tables Dimension source Transform Correlate records New record? Y N Type 1 change? Most dimension packages benefit from this transformation Y Update changed column(s) N Type 2 change? Y Expire existing record Insert new record 35

Populating Fact Tables Fact source Transform Lookup dimension key The Lookup transformation is ideal Populating Fact Tables Fact source Transform Lookup dimension key The Lookup transformation is ideal for dimension key lookups and can be configured to manage lookup failures Lookup failed? N Process each dimension key Y Insert new dimension record Insert new record 36

37 Slowly Changing Dimension Transformation The wizard-based configuration promotes rapid ETL development Supports Type 37 Slowly Changing Dimension Transformation The wizard-based configuration promotes rapid ETL development Supports Type 0 (Fixed Attribute) Type 1 (Changing Attribute) Type 2 (Historical Attribute) Inferred member management Automatically constructs the downstream data flow Handles the majority of slowly changing dimension scenarios

SCD Type 1 Existing record is updated History is not preserved Last. Name update SCD Type 1 Existing record is updated History is not preserved Last. Name update to Valdez-Smythe 38

SCD Type 2 Existing record is ‘expired’ and new record inserted History is preserved SCD Type 2 Existing record is ‘expired’ and new record inserted History is preserved Most common form of Slowly Changing Dimension Sales. Territory. Key update to 10 39

ISV Innovation will now be the headquarters for all U. S. ISV training. We’re ISV Innovation will now be the headquarters for all U. S. ISV training. We’re dedicated to providing comprehensive training for the entire developer community so be sure to make www. isvinnovation. com your first stop for all your training needs. This site is filled with fantastic new resources, features, and functionality. We plan to make this site a one-stop shop for all your ISV needs so if you have any feedback please contact us as usisv@microsoft. com. 40