Experiences with Real-Time Data Warehousing Using Oracle Database 10g
Mike Schmitz, High Performance Data Warehousing, mike.schmitz@databaseperformance.com
Michael Brey, Principal Member of Technical Staff, ST/NEDC Oracle Engineering, Oracle Corporation
Agenda
- The meaning of Real-Time in Data Warehousing
- Customer Business Scenario
  - Customer environment
  - "Real-Time" requirement
- Our Real-Time Solution
  - Real-time data architecture
  - Incremental operational source change capture
  - Transformation and population into the DW target
  - Simplified functional demonstration
- Asynchronous Change Data Capture (Oracle)
  - Performance characteristics and considerations
My Background
An independent data warehousing consultant specializing in the dimensional approach to data warehouse / data mart design and implementation, with in-depth experience applying efficient, scalable techniques to both large-scale data warehouses and small-scale, platform-constrained data mart implementations. I deliver dimensional design and implementation as well as ETL workshops in the U.S. and Europe. I have helped implement data warehouses using Redbrick, Oracle, Teradata, DB2, Informix, and SQL Server on mainframe, UNIX, and NT platforms, working with small and large businesses across a variety of industries, including customers such as Hewlett Packard, American Express, General Mills, AT&T, Bell South, MCI, Oracle Slovakia, J.D. Power and Associates, Mobil Oil, The Health Alliance of Greater Cincinnati, and the French railroad SNCF.
Real-Time in Data Warehousing
- Systems are complex environments
  - Business rules
  - Various data process flows and dependencies
- Almost never pure Real-Time
  - Some latency is a given
- What do you need?
  - Real Time
  - Near Real-Time
  - Just in Time for the business
Customer Business Scenario
- Client provides software solutions for utility companies
- Utility companies have plants generating energy supply
  - Recommended maximum output capacity
  - Reserve capacity
  - Buy supplemental energy as needed
- Peak demand periods are somewhat predictable
  - Cheaper to buy energy ahead
  - Expensive to have unused capacity
- Each day is pre-planned on historical behavior
- Existing data warehouse supports the planning function
  - Reduced option expenses
  - Reduced supplemental energy costs
Customer "Real-Time" Requirement
- More timely data improves operational decisions
  - Compare today's plant output volumes to yesterday's or last week's average
  - Know when to purchase additional options or supplies
- Customer target
  - Actual data within a 5-minute lag
  - Use a single query
  - Use a single tool
Sample Analysis Graph (chart)
Our Real-Time Solution: Overview
Three-step approach:
1. Implement a real-time DW data architecture
2. Near real-time incremental change capture from the operational system
3. Transformation and propagation (population) of change data to the DW
Our Real-Time Solution: Real-Time DW Data Architecture
- Add a Real-Time "Partition" to our Plant Output fact table for current-day activity
  - Separate physical table
  - No indexes or RFI constraints during daily activity (incoming data already has RFI enforced)
  - Combined with the Plant Output fact table through a UNION ALL view (a SQL sketch follows below)
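As a rough illustration of this architecture, the SQL below sketches the current-day table and the UNION ALL view using the object and column names that appear later in the demo (F_PLANT_OUTPUT, F_CURRENT_DAY_PLANT_OUTPUT, V_PLANT_OUTPUT); the column data types are assumptions for illustration, not the presenters' actual DDL.

    -- Current-day "real-time partition": same shape as the historical fact table,
    -- but a plain, unindexed heap table.
    CREATE TABLE ao_cdc_dw.f_current_day_plant_output (
        generating_plant_key      NUMBER NOT NULL,
        output_day_key            NUMBER NOT NULL,
        output_minute_key         NUMBER NOT NULL,
        output_actual_qty_in_kwh  NUMBER
    );

    -- Queries see one logical fact table spanning history plus the current day.
    CREATE OR REPLACE VIEW ao_cdc_dw.v_plant_output AS
        SELECT generating_plant_key, output_day_key, output_minute_key,
               output_actual_qty_in_kwh
          FROM ao_cdc_dw.f_plant_output
        UNION ALL
        SELECT generating_plant_key, output_day_key, output_minute_key,
               output_actual_qty_in_kwh
          FROM ao_cdc_dw.f_current_day_plant_output;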
Our Real-Time Solution: Change Capture and Population
1. Incremental change capture from the operational system
   - Synchronous or asynchronous
2. Transformation and propagation (population) of change data to the DW
   - Continuous trickle feed or periodic batch
(Diagram: Operations -> Synch CDC (Trigger) or Asynch CDC -> Staging -> Batch -> DW)
Our Real-Time Solution: Incremental Change Capture
- Done with Oracle's Change Data Capture (CDC) functionality
  - Synchronous CDC available with Oracle 9i
  - Asynchronous CDC with Oracle 10g
- Asynchronous CDC is the preferred mechanism
  - Decouples change capture from the operational transaction
Asynchronous CDC
- Based on LogMiner
- SQL interface to change data
- Publish/subscribe paradigm
- Parallel access to log files, leveraging Oracle Streams
- Parallel transformation of data
(Diagram: OLTP DB -> redo log files -> logical change data -> transform (SQL, PL/SQL, Java) -> Oracle 10g DW tables)
Our Real-Time Solution: Population of Change Data into the DW
- Continuous
  - Change table owner creates a trigger to populate the warehouse real-time partition
- Periodic batch
  - Utilizes the Subscribe interface
  - Subscribes to specific table and column changes through a view
  - Sets a window and extracts the changes at the required period
  - Purges the view and moves the window
Our Real-Time Solution: The Daily Process
- Integrate daily changes into the historical fact table
  - At the end of the day, index the current-day table and apply constraints (NOVALIDATE)
  - Create a new fact table partition
  - Exchange the current-day table with the new partition
  - Create the next day's "Real-Time Partition" table
Simplified Functional Demo: Schema Owners
- AO_CDC_OP
  - Owns the operational schema
- AO_CDC
  - Owns the CDC change sets and change tables (needs special CDC privileges: the CDC publish role)
- AO_CDC_DW
  - Owns the data warehouse schema (also needs special CDC privileges: the CDC subscribe role)
Simplified Functional Demo: Operational Schema (diagram)
Simplified Functional Demo: Data Warehouse Schema (diagram)
What do we have?
- Operational transaction table
  - AO_CDC_OP.PLANT_OUTPUT
- DW historical partitioned fact table
  - AO_CDC_DW.F_PLANT_OUTPUT
- DW current-day table ("Real-Time Partition")
  - AO_CDC_DW.F_CURRENT_DAY_PLANT_OUTPUT
- Data warehouse UNION ALL view
  - AO_CDC_DW.V_PLANT_OUTPUT
First
- The CDC user publishes:
  - Create a change set (CDC_DW)
  - Add supplemental logging for the operational table
  - Create a change table for the operational table (CT_PLANT_OUTPUT)
  - Force logging on the tablespace to catch any bulk insert /*+ APPEND */ (non-logged) activity
  (A sketch of these publish calls follows below.)
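The following is a minimal sketch of what these publish steps might look like with the DBMS_CDC_PUBLISH package in Oracle 10g, run as the AO_CDC publisher. The HotLog change source name, the column data types, the specific CREATE_CHANGE_TABLE flag values, and the tablespace name TS_OP_DATA are assumptions for illustration, not the presenters' actual script.

    -- Supplemental logging on the source table so the redo contains full column values.
    ALTER TABLE ao_cdc_op.plant_output ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

    -- Force logging so direct-path (/*+ APPEND */) inserts still generate redo.
    ALTER TABLESPACE ts_op_data FORCE LOGGING;   -- tablespace name is a placeholder

    BEGIN
      -- Change set over the predefined HotLog change source.
      DBMS_CDC_PUBLISH.CREATE_CHANGE_SET(
          change_set_name    => 'CDC_DW',
          description        => 'Plant output changes for the DW',
          change_source_name => 'HOTLOG_SOURCE');

      -- Change table tracking DML on the operational table.
      DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE(
          owner             => 'AO_CDC',
          change_table_name => 'CT_PLANT_OUTPUT',
          change_set_name   => 'CDC_DW',
          source_schema     => 'AO_CDC_OP',
          source_table      => 'PLANT_OUTPUT',
          column_type_list  => 'PLANT_ID NUMBER, OUTPUT_TS DATE, OUTPUT_IN_KWH NUMBER',
          capture_values    => 'both',
          rs_id             => 'y',
          row_id            => 'n',
          user_id           => 'n',
          timestamp         => 'y',
          object_id         => 'n',
          source_colmap     => 'n',
          target_colmap     => 'y',
          options_string    => NULL);

      -- Start capturing into the change set.
      DBMS_CDC_PUBLISH.ALTER_CHANGE_SET(
          change_set_name => 'CDC_DW',
          enable_capture  => 'y');
    END;
    /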
Next – Transform and Populate
- One of two ways:
- Continuous feed
  - Logged insert activity
  - Permits nearer real-time
  - Constant system load
- Periodic batch feed
  - Permits non-logged bulk operations
  - You set the lag time: how often do you run the batch process? (hourly, every five minutes)
  - Less system load overall
The Continuous Feed
- Put an insert trigger on the change table that joins to the dimension tables to pick up the dimension keys and performs any necessary transformations (a sketch follows below)
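A minimal sketch of such a trigger, assuming the change table exposes the source columns PLANT_ID, OUTPUT_TS, and OUTPUT_IN_KWH (as the batch extraction SQL later in the deck does). The trigger name is hypothetical, and a real version would also filter on the change table's operation column so that only insert records are propagated.

    CREATE OR REPLACE TRIGGER ao_cdc.trg_ct_plant_output_feed
    AFTER INSERT ON ao_cdc.ct_plant_output
    FOR EACH ROW
    BEGIN
      -- Look up the dimension keys for the changed row and land it in the
      -- current-day ("real-time partition") fact table.
      INSERT INTO ao_cdc_dw.f_current_day_plant_output
          (generating_plant_key, output_day_key, output_minute_key,
           output_actual_qty_in_kwh)
      SELECT p.generating_plant_key,
             d.output_day_key,
             m.output_minute_key,
             :NEW.output_in_kwh
        FROM ao_cdc_dw.d_generating_plant p,
             ao_cdc_dw.d_output_day       d,
             ao_cdc_dw.d_output_minute    m
       WHERE p.plant_id             = :NEW.plant_id
         AND d.output_day           = TRUNC(:NEW.output_ts)
         AND m.output_time_24hr_nbr = TO_NUMBER(TO_CHAR(:NEW.output_ts, 'HH24MI'));
    END;
    /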
The Batch Feed
- The CDC schema owner
  - Authorizes AO_CDC_DW to select from the change table (the select is done through a generated view)
- The DW schema owner
  - Subscribes to the change table and the columns needed (with a centralized EDW approach this would usually be the whole change table), supplying a subscription name and view name
  - Activates the subscription
- Extract
  - Extend the window
  - Extract the changed data via the view (same code as the trigger)
  - Purge the window (a logical delete; physical deletion is handled by the CDC schema owner)
(A sketch of the subscribe calls follows below.)
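A rough sketch of this subscribe flow with the Oracle 10g DBMS_CDC_SUBSCRIBE package. The subscription name PO_SUB and the explicit grant are illustrative assumptions; the subscriber view name PO_ACTIVITY_VIEW matches the extraction SQL on the next slide, which would run between EXTEND_WINDOW and PURGE_WINDOW.

    -- As the CDC publisher (AO_CDC): let the DW owner read the change table.
    GRANT SELECT ON ao_cdc.ct_plant_output TO ao_cdc_dw;

    -- As the DW owner (AO_CDC_DW): create and activate a subscription.
    BEGIN
      DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION(
          change_set_name   => 'CDC_DW',
          description       => 'Plant output batch feed',
          subscription_name => 'PO_SUB');

      DBMS_CDC_SUBSCRIBE.SUBSCRIBE(
          subscription_name => 'PO_SUB',
          source_schema     => 'AO_CDC_OP',
          source_table      => 'PLANT_OUTPUT',
          column_list       => 'PLANT_ID, OUTPUT_TS, OUTPUT_IN_KWH',
          subscriber_view   => 'PO_ACTIVITY_VIEW');

      DBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION(subscription_name => 'PO_SUB');
    END;
    /

    -- Each batch cycle: move the window forward, extract, then purge.
    BEGIN
      DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(subscription_name => 'PO_SUB');
      -- ... run the insert /*+ APPEND */ from PO_ACTIVITY_VIEW into the
      --     current-day fact table (see the extraction SQL below) ...
      DBMS_CDC_SUBSCRIBE.PURGE_WINDOW(subscription_name => 'PO_SUB');
    END;
    /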
Extraction from the Change Table View

    insert /*+ APPEND */ into ao_cdc_dw.F_CURRENT_DAY_PLANT_OUTPUT
        (generating_plant_key, output_day_key, output_minute_key, output_actual_qty_in_kwh)
    select p.generating_plant_key,
           d.output_day_key,
           m.output_minute_key,
           new.output_in_kwh
      from ao_cdc_dw.PO_ACTIVITY_VIEW new
      inner join ao_cdc_dw.d_generating_plant p
              on new.plant_id = p.plant_id
      inner join ao_cdc_dw.d_output_day d
              on trunc(new.output_ts) = d.output_day
      inner join ao_cdc_dw.d_output_minute m
              on to_number(substr(to_char(new.output_ts, 'YYYYMMDD HH24:MI:SS'), 10, 2)
                        || substr(to_char(new.output_ts, 'YYYYMMDD HH24:MI:SS'), 13, 2))
                 = m.output_time_24hr_nbr;
Next Step
- Add the current day's activity (the contents of the current-day fact table) to the historical fact table as a new partition
  - Index and apply constraints to the current-day fact table
  - Add a new empty partition to the fact table
  - Exchange the current-day fact table with the partition
  - Create the new current-day fact table
  (A sketch of the partition exchange follows below.)
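A minimal sketch of this end-of-day step, assuming F_PLANT_OUTPUT is range-partitioned on a YYYYMMDD-style OUTPUT_DAY_KEY. The partition names, key values, and index/constraint names are placeholders for illustration, not the presenters' actual DDL.

    -- Prepare the current-day table: index it and apply constraints without re-checking rows.
    CREATE INDEX ao_cdc_dw.i_cd_plant_output
        ON ao_cdc_dw.f_current_day_plant_output (generating_plant_key, output_day_key);

    ALTER TABLE ao_cdc_dw.f_current_day_plant_output
        ADD CONSTRAINT fk_cd_plant FOREIGN KEY (generating_plant_key)
            REFERENCES ao_cdc_dw.d_generating_plant (generating_plant_key)
            ENABLE NOVALIDATE;

    -- Add an empty partition for the day just ended, then swap the current-day
    -- table into it (a metadata-only operation).
    ALTER TABLE ao_cdc_dw.f_plant_output
        ADD PARTITION p_20031015 VALUES LESS THAN (20031016);

    ALTER TABLE ao_cdc_dw.f_plant_output
        EXCHANGE PARTITION p_20031015
        WITH TABLE ao_cdc_dw.f_current_day_plant_output
        WITHOUT VALIDATION;

    -- Finally, recreate an empty current-day table for the next day's feed
    -- (a fresh CREATE TABLE matching the fact table's shape).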
Let's step thru this live
Summary
- We created a real-time partition for current-day activity
- We put CDC on the operational table and created a change table populated by an asynchronous process (reads the redo log)
- We demonstrated a continuous feed to the DW using a trigger-based approach
- We demonstrated a batch DW feed using the CDC subscribe process
- We showed how to add the current-day table to the fact table and set up the next day's table
- An electronic copy of the SQL used to build this prototype is available by emailing mike.schmitz@databaseperformance.com
Michael Brey, Principal Member of Technical Staff, ST/NEDC Oracle Engineering, Oracle Corporation
Overview
- Benchmark description
- System description
- Database parameters
- Performance data
The Benchmark
- Customer OLTP benchmark run internally at Oracle
- Insurance application handling customer inquiries and quotes over the phone
- N users perform M quotes
- Quote = actual work performed during a call with a customer
- Mixture of inserts, updates, deletes, singleton selects, cursor fetches, rollbacks/commits, savepoints
- Compute the average time for all quotes across users
System Info
- SunFire 4800, a standard shared-memory multiprocessor (SMP)
- 8 900-MHz CPUs
- 16 GB physical memory
- Solaris 5.8
- Database storage: striped across 8 Sun StorEdge T3 arrays (9 x 36.4 GB each)
Database Parameters
- parallel_max_servers = 20
- streams_pool_size = 400M (default is 10% of the shared pool)
- shared_pool_size = 600M
- Buffer cache = 128M
- Redo buffers = 4M
- processes = 600
Change Data Capture (CDC)

                              Sync           Async HotLog      Async AutoLog
Available                     Oracle 9i      Oracle 10g        Oracle 10g
Source system cost
  (system resources)          Part of txn    Minimal           Minimal
Part of transaction           YES            NO                NO
Changes seen                  Real time      Near real time    Variable
Systems                       1              1                 2
Tests
- Conducted tests with Asynchronous HotLog CDC enabled and disabled, and with Sync CDC
- Asynchronous HotLog CDC tests conducted at different log usage levels
  - Approximately 10%, 50%, and 100% of all OLTP tables with DML operations were included in CDC
- Tests run with:
  - 250 concurrent users
  - Continuous peak workload after ramp-up
  - 175 transactions per second
Impact on Transaction Time (chart)
CPU Consumption: Supplemental Logging (chart)
CPU Consumption: 10% DML Change Tracking (chart)
CPU Consumption: 50% DML Change Tracking (chart)
CPU Consumption: 10% and 100% DML Change Tracking (chart)
Latency of Change Tracking
- Latency is defined as the time between the actual change and its reflection in the change capture table
  - Latency = time[change record insert] - time[redo log insert]
- Latency measurements were made for the 100% Asynchronous HotLog CDC run
  - 53.5% of records arrived in less than 1 sec
  - 99.7% of records arrived in less than 2 secs
  - Remaining records arrived in less than 3 secs
- Asynchronous CDC kept up with the constant high OLTP workload at all times
Summary
- Change Data Capture enables enterprise-ready, near real-time capture of change data
  - No falling behind, even in constant high-load OLTP environments
  - Minimal impact on the originating OLTP transactions
  - Predictable additional resource requirements, driven solely by the amount of change tracking
- Oracle provides the flexibility to meet your "on-time" business needs
Q & A
Next Steps: Data Warehousing DB Sessions
Monday
- 11:00 AM, #40153, Room 304 - Oracle Warehouse Builder: New Oracle Database 10g Release
- 3:30 PM, #40176, Room 303 - Oracle Database 10g: A Spatial VLDB Case Study
- 4:00 PM, #40166, Room 130 - Oracle Database 10g SQL Model Clause
Tuesday
- 8:30 AM, #40125, Room 130 - Building a Terabyte Data Warehouse, Using Linux and RAC
- 3:30 PM, #40177, Room 303 - Security and the Data Warehouse
- 5:00 PM, #40043, Room 104 - Data Pump in Oracle Database 10g: Foundation for Ultrahigh-Speed Data Movement
For more info on Oracle BI/DW go to http://otn.oracle.com/products/bi/db/dbbi.html
Next Steps: Data Warehousing DB Sessions
Thursday
- 8:30 AM, #40179, Room 304 - Oracle Database 10g Data Warehouse Backup and Recovery
- 11:00 AM, #36782, Room 304 - Experiences with Real-Time Data Warehousing Using Oracle 10g
- 1:00 PM, #40150, Room 102 - Turbocharge Your Database, Using the Oracle Database 10g SQLAccess Advisor
Business Intelligence and Data Warehousing demos, all four days in the Oracle Demo Campground:
- Oracle Database 10g
- Oracle OLAP
- Oracle Data Mining
- Oracle Warehouse Builder
- Oracle Application Server 10g
For more info on Oracle BI/DW go to http://otn.oracle.com/products/bi/db/dbbi.html
Reminder – please complete the OracleWorld online session survey. Thank you.