02 - Business_Intelligence_Fundamentals_SSIS.pptx
- Количество слайдов: 40
8/7/2008 Introduction to SQL Server 2008 Integration Services
2 Disclaimer The information contained in this slide deck represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication. This slide deck is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this slide deck may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this slide deck. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this slide deck does not give you any license to these patents, trademarks, copyrights, or other intellectual property. Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place or event is intended or should be inferred. © 2008 Microsoft Corporation. All rights reserved. Microsoft, SQL Server, Office System, Visual Studio, Share. Point Server, Office Performance. Point Server, . NET Framework, Pro. Clarity Desktop Professional are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
Where Are We? Data Warehouse Data Sources Data Marts Staging Area Manual Cleansing Client Access
Presentation title goes in here 4 1. Reviewing ETL Fundamentals 2. Introducing SSIS Components 3. Designing ETL Packages 4. Slowly Changing Dimension Transformation Agenda
5 Module Overview Reviewing ETL Fundamentals Introducing SSIS Components Designing ETL Packages Slowly Changing Dimension Transformation
6 Reviewing ETL Fundamentals Data Integration Business Scenarios SQL Server 2008 Data Integration Populating Dimension Tables Populating Fact Tables
7 Data Integration The ability to transform corporate data into meaningful and actionable information Challenges Retrieve and merge data from multiple sources Cleanse and transform the data Load the data into appropriate data stores for analysis and reporting Enterprises spend 60%– 80% of their resources in the data integration stage
8 Business Scenarios Populating data warehouses (ETL) Cleaning and standardizing data Merging data from heterogeneous data stores Automating administrative tasks
9 SQL Server 2008 Integration Services Satisfies traditional demands of ETL operations and general-purpose data integration Provides a robust, flexible, fast, scalable, and extensible architecture Challenges traditional ETL design approaches
Populating Dimension Tables Dimension source Transform Correlate records New record? Y N Type 1 change? Y Update changed column(s) N Type 2 change? Y Expire existing record Insert new record 10
Populating Fact Tables Fact source Transform Lookup dimension key Lookup failed? N Repeat for each dimension key Y Insert new dimension record Insert new record 11
12 Introducing SSIS Components Architecture The Package Connection Manager Control Flow Data Flow
Examining the SSIS build environment DEMO 13
Architecture SQL Server Integration Services (SSIS) service SSIS object model Two distinct runtime engines: Control flow Data flow 32 -bit and 64 -bit editions 14
15 The Package The basic unit of work, deployment, and execution An organized collection of: Connection managers Control flow components Data flow components Variables Event handlers Configurations Can be designed graphically or built programmatically Saved in XML format to the file system or SQL Server
16 Connection Manager Logical representation of a connection Stored in the package and cannot be shared between packages Used by package elements Does not need to connect to SQL Server
Connection Manager DEMO 17
18 Control Flow Control flow is a process-oriented workflow engine A package contains a single control flow Control flow elements Containers Tasks Precedence constraints Variables
19 Containers Provide structure and services for Grouping tasks Implementing repeating flows Execute in sequence defined by precedence constraints in control flow Manage variable and transactional boundaries
Data Flow Task Encapsulates the data flow engine Extract Transform Load 20
Control Flow and Data Flow DEMO 21
22 Scripting Tasks Use a Script Task to develop scripts that are precompiled for better performance Develop VB. NET or C#. NET scripts in Visual Studio for Applications (VSTA) using these features Intelli. Sense Color coding Integrated Help References: –. NET Framework libraries – Managed assemblies – COM objects Debugging Consider a Custom Task to reuse functionality between packages
Precedence Constraints Link containers and tasks (executables) to control the order of execution Configure conditions that determine whether the executable runs Success, Failure, or Completion Constraints Expressions Logical AND/OR for multiple constraints 23
24 Variables customize package behavior by changing expression values or object properties System variables store values collected while the package runs User variables store values assigned By default in the current package By a variable passed by a parent package By an expression All variables use case-sensitive names Variables can be scoped at package, container, or task level
Developing a Query-Driven E-Mail Delivery System DEMO 25
26 Data Flow The Data Flow Task Encapsulates the data flow engine Exists in the context of an overall control flow Performs traditional ETL in addition to other extended scenarios Is fast and scalable Data Flow Components Extract data from Sources Load data into Destinations Modify data with Transformations Service Paths Connect data flow components Create the pipeline
Data Flow Sources extract data from Relational tables and views Files Analysis Services databases 27
Data Flow Destinations load data to Relational tables and views Files Analysis Services databases and objects Data. Readers and Recordsets Enterprise Edition only 28
Row Transformations Update column values or create new columns Transform each row in the pipeline input 29
Rowset Transformations Create new rowsets that can include Aggregated values Sorted values Sample rowsets Pivoted or unpivoted rowsets Are also called asynchronous components Mar-2008 Microsoft Developer & Platform Evangelism 30
Split and Join Transformations Distribute rows to different outputs Create copies of the transformation inputs Join multiple inputs into one output Perform lookup operations 31
Loading and Integrating Data from Multiple Sources DEMO 2 32
33 Designing ETL Packages Orchestrating the ETL Process Populating Dimension Tables Populating Fact Tables
34 Orchestrating the ETL Process A single package performs ETL for a single dimension or fact table A “master package” orchestrates the sequence of the entire ETL process by using Execute Package tasks Execute all dimension packages first Execute all fact table packages second A robust design includes extensive logging to store execution history and audit trails Implement package logging to capture most execution details Customize and extend logging by using control flow tasks
Populating Dimension Tables Dimension source Transform Correlate records New record? Y N Type 1 change? Most dimension packages benefit from this transformation Y Update changed column(s) N Type 2 change? Y Expire existing record Insert new record 35
Populating Fact Tables Fact source Transform Lookup dimension key The Lookup transformation is ideal for dimension key lookups and can be configured to manage lookup failures Lookup failed? N Process each dimension key Y Insert new dimension record Insert new record 36
37 Slowly Changing Dimension Transformation The wizard-based configuration promotes rapid ETL development Supports Type 0 (Fixed Attribute) Type 1 (Changing Attribute) Type 2 (Historical Attribute) Inferred member management Automatically constructs the downstream data flow Handles the majority of slowly changing dimension scenarios
SCD Type 1 Existing record is updated History is not preserved Last. Name update to Valdez-Smythe 38
SCD Type 2 Existing record is ‘expired’ and new record inserted History is preserved Most common form of Slowly Changing Dimension Sales. Territory. Key update to 10 39
ISV Innovation will now be the headquarters for all U. S. ISV training. We’re dedicated to providing comprehensive training for the entire developer community so be sure to make www. isvinnovation. com your first stop for all your training needs. This site is filled with fantastic new resources, features, and functionality. We plan to make this site a one-stop shop for all your ISV needs so if you have any feedback please contact us as usisv@microsoft. com. 40