0b3932de892ff2aebbb35849e5d31933.ppt
- Количество слайдов: 20
® IBM Software Group Database Replication and Change Propagation Technologies for Continuous Availability WMO TECO-WIS Convention Seoul, November 8 th, 2006 Serge Bourbonnais Database Replication Silicon Valley laboratory
IBM Software Group | DB 2 Information Management Software Abstract - IBM Database Replication Technologies • Database replication technologies allow an IT infrastructure to achieve continuous availability of the enterprise operations, by providing solutions for Disaster Recovery, Workload Isolation, and Information Integration. • When used for Disaster Recovery, each replicated database can be fully active, and copies do not need to be identical. Some trade-offs include administrative costs, and the overhead of capturing and applying changes. • For Workload Isolation, the replication process can manage conflicts that may arise from the application workload, database constraints, loading a target while changes are still occurring at the source, or changes arriving out of order in a multi-node configuration. Conflict resolution either relies on timestamp and the origin of each change, or on a designated master. Configurations for data distribution and consolidation to/from hundreds of databases can also be deployed. • For Information Integration, the replication process deals with heterogeneous data schemas, data stores, or even data models. IBM Database replication technologies can capture and propagate changes with low-latency at high-throughput over long distances, while preserving database transactional integrity, and tolerating system outages or intermittent connectivity.
IBM Software Group | DB 2 Information Management Software Agenda § Database Replication Technologies for continuous availability 4 In support of the Global Enterprise: From Continuous Availability to Business Integration 4 Where Database Replication fits 4 Where replication does not fit § IBM Product Architecture and Capabilities 4 Capture, Apply, Federation, and Transforms 4 Topologies, Conflict Detection and Resolution § Sample Implementations
IBM Software Group | DB 2 Information Management Software Why Replication in an Information System? 1. Disaster Recovery Requirements § Goal: High-Availability § Applications: Standby copy for failover, Scheduled and Unscheduled outages Requirements Less § Requirements: Minimize recovery time and eliminate or reduce data loss. Preserve transactional consistency. 2. Workload Isolation § Goal: High-Availability, Improve Performance § Applications: Data Distribution/Consolidation, Regional Data Centers, Caches § Requirements: Maintain live copies or subsets for working in disconnected mode, often geographically distributed. Need to detect and resolve conflicts, if any. Data mappings and transformations. 3. Information Integration § Goal: High-Availability, Improve Performance, Global Enterprise View § Applications: Analytics, Enterprise Business Integration § Requirements: Moving data to/from heterogeneous data stores. Cleansing and transformations. Assembling objects with data from several sources. More
IBM Software Group | DB 2 Information Management Software From Disaster Recovery to Information Integration: Replication needs are cumulating - Semantics are increasing. Requirements and Scenarios Maintain a full database copy for Disaster Recovery Database Maintain a logical database subset for Disaster Recovery, Workload Isolation Relational tables Business Objects <order><oid>197</oid> <pid>AS 207</pid> <desc>Wheel</desc> <qty>1</qty> </order> Propagated Objects Technologies and Products Publish changes (with transformations) for Disaster Recovery, Workload Isolation, Data Integration Log Shipping • HADR (LUW) Disk Mirroring • GDPS • PPRC Logical Replication • Data Propagator • Q Replication Event Publishing • Q Event Publish • II Federation Integration Software • Data. Stage Database Replication Application Space
IBM Software Group | DB 2 Information Management Software Application Space for Database Replication Technologies § Database Replication is a good fit when: q Asynchronous Capture and Delivery q Outages. Network, servers, a site, the RDBMS. q Occasionally connected q Non-identical sources and targets q Different platforms. OS, RDBMS, even data models q Different shapes. Sub-setting required q Row-level transformations. , Codepages, Schemas q Update-Anywhere with possible conflicts q Only possible with replication q When some data loss is tolerable in case of a major disaster. q Often, solution can be designed to limit loss to a few seconds. q Fast delivery over large distances (1000 s km) q Several 10 000 s rows/second achievable (up to 100 000 row/sec) q Avoid or minimize full-refresh of data at the target § Other factors: q Minimize down-time, administrative cost, application performance impact Replication Technologies guarantee Transactional Consistency with Resilience
IBM Software Group | DB 2 Information Management Software Limits of Database Replication technologies q. Non-zero Data Loss required in case of Disaster (fire, flood) Use Synchronous technologies instead, i. e. , HADR, PPRC q. Set-level. transformations are required on the data Use ETL software instead However, replication can be used to feed a staging area for ETL tools. Replication can hide the differences between the target and the source (database schema, data model, codepage, hardware architecture) and provide a continuous, asynchronous feed. q. Business Objects need to be assembled Develop applications in the application layer Other factors q Cost-Benefit analysis of the solutions, given the requirements
IBM Software Group | DB 2 Information Management Software IBM SQL Replication § Staging is in relational tables § Control, and Monitoring information also in relational tables § Transport is over a database connection Source server Target servers Staging Tables DB 2 database recovery log Capture Apply DB 2 Apply SQL DB 2 Apply Non-DB 2 Control tables Triggers • Informix • Oracle • Sybase • SQL Server Apply DB 2 Informati on Integrator DB 2 Non-DB 2 • z/OS • i. Series • UDB LUW • Informix • Oracle • Sybase • SQL Server • Teradata
IBM Software Group | DB 2 Information Management Software A parenthesis: Database Federation §Remote objects (structured files, tables, spreadsheets) appear to the application as if local tables in a DB 2 database § Local and non-local data can be manipulated in the same SQL statement CREATE NICKNAME ORAT 3 FOR ORACLE 9. SCOTT. T 3 INSERT INTO ORAT 3 VALUES(5) SELECT * FROM ORAT 3 T 1 is a Table; ORAT 3 a Nicknames appear as local tables. For example: > db 2 list tables Table/View ---------------T 1 ORAT 3 CUSTOMERS Schema -------BOURBON Type ----T N T
IBM Software Group | DB 2 Information Management Software IBM Q Replication § Staging and Transport over MQSeries persistent message queues § High-throughput, Low-latency. Apply with parallel agents Source server Web. Sphere Queue Manager (or client) DB 2 database recovery log Target servers Apply Admin queue Capture DB 2 Send queues Apply • z/OS • UDB LUW Control Tables • z/OS • UDB LUW • VM/VSE Non-DB 2 Control tables Restart queue Apply DB 2 Informati on Integrator Control Tables • Informix • Oracle • Sybase • SQL Server • Teradata
IBM Software Group | DB 2 Information Management Software Performance § Q Replication is between 3 to 10 times faster than SQL replication 4 Higher throughput and shorter latency 4 Capture measured throughput: 49000 rows/second (V 9. 1) 4 Latency less than 2 seconds achievable over 1000 s of kilometers § Measured Time to clear up receive queues after an outage 4 1, 000 rows accumulated in target receive queue 4 Continuous arrival rate: 5, 000 rows per second 4 Time to re-sync target database = 91 seconds (1) Turbo Freeway (2064 -216) 2 LPARs 4 CP for the source system and 4 cp for the target system.
IBM Software Group | DB 2 Information Management Software Q Replication subscriptions - defining target copies § Projection over columns and rows of a table: 4 Only changes for subscribed tables are sent 4 Some transactions can be ignored (e. g. , by owner ID, trans ID, with signal or command) 4 Some operations can be ignored (e. g. , delete) 4 Filter rows with a predicate (e. g. , WHERE : LOCATION ='EAST' AND : SALES > (SELECT SUM(expense) FROM STORES WHERE stores. deptno = : DEPTNO) § Database Schema mappings examples: 4 1 column to N columns, e. g. , [ : C 1 || : C 2] 4 N columns 1 columns, e. g. , [substr(: C 2, 2, 3)] 4 Generated columns, e. g. , [CURRENT TIMESTAMP] Capture side: ORDERS oid price Replication handles codepage conversion, architecture difference Apply side IBMORDERS ibm. ID price ts
IBM Software Group | DB 2 Information Management Software Q Replication Subscription Types § Unidirectional 4 Changes are replicated in one direction 4 1: N – N: 1 topologies – Distribution and Consolidation Source 4 Changes can be filtered and transformed Target(s) § Bidirectional – master/slave 4 Changes replicated in both directions 4 Conflicts detected on data values: Primary Secondary/backup • Conflict rules: Check key, changed only, or all columns 4 One server designated as winner • Conflict action: Force, ignore, merge change 4 Tree topologies only 4 Minimum overhead § Peer to peer – no master, use timestamps 4 Conflicts resolved by using most recent version, no master copy - Handles out of order arrivals (e. g. , delete before insert) 4 Requires extra columns and triggers
IBM Software Group | DB 2 Information Management Software Data Distribution from a (CCD) staging area SQL Apply Q Capture Q Apply Q Replication Read/write Source Table MQSeries Target Table SQL Connections SQL Apply Read-only CCD Table … Target Table SQL Apply … Target Table
IBM Software Group | DB 2 Information Management Software Consistent Changed Data (CCD) Apply targets § Usages: 4 AUDIT trail of database changes. • Answer: Who changed what, when, and how? 4 Staging table for data distribution (with SQL Apply) PARTS_CCD COMMITSEAUTHID OPERATI LOGMARK XPARTN Q ON ER O current O 1 USER_A U timestamp A 7571 current 2 USER_B I null A 7981 timestamp current 3 USER_A D null A 7981 timestamp XPRICE E 4. 31 5. 0 3 null 121. 0 3 null • For updates, before values can be optionally present in the CCD (e. g. , XPARTNO) • Condensed CCD: Contains only the latest changed value of each row • Complete CCD: Initially created with values for all rows from the source table.
IBM Software Group | DB 2 Information Management Software Event Publishing Function § Capture changed data in real time § Correlate by transactions within a single database § Output: XML or CSV Usage § Building the Data Warehouse § Business Integration § Auditing requirements DB 2 z/OS and LUW IMS VSAM CA IDMS Data. Stage Log-based capture Capture Software AG Adabas Web. Sphere MQ Web. Sphere Business Integration Web. Sphere MQ Integrator Broker Target DBs User Application JMS-aware Application
IBM Software Group Mazda Challenge § Support 700 dealers in USA § Trouble matching customer demand with available inventory § More current data needed to track sales achievements with period-end goals Solution § Sales and inventory information is replicated every minute to portal server § Improved access to current data without changes to existing IT infrastructure “Within 5 weeks of receiving the [Web. Sphere] Information Integrator product we were able to implement it in our … environments. It now provides us up to the minute sales activity. ” Joe Neria, Software Consultant. Mazda Business benefits § Increased auto sales § Improved dealer satisfaction § Currency of information improved by 93% Technology benefits § Re-used existing application and data base infrastructure § Decreased network load compared to full data refreshes 4 times an hour § Ease and speed of deployment 17
IBM Software Group International provider of financial & investment services Challenge § Corporate initiative to provide customers better performing real-time queries by utilizing multiple sites. § Replication of critical order processing details for core business functionality Solution §Q Replication for high speed movement of up to 10 Million transactions to secondary site several thousand miles away. Current implementation is Uni. Directional with peer-to-peer plans. Business benefits § Replicating 5 -10 Million transactions with less than 2 seconds latency. § More efficient and cost-effective resource utilization § Secondary platform services reporting and business intelligence queries and acts as backup to primary Technology benefits § Real-time back up of secondary system provides results in increased capacity for peak workloads. 18
IBM Software Group Citi. Street Challenge § Support single sign-on access through both Web and IVR applications ensuring 24 x 7 portal access for plan participants and sponsors Solution § Support redundant, active single sign-on applications for failover processing replicating profile changes between them in real time. “Since nearly 10 million of Citi. Street customers are offered 24 -hour access to their retirement accounts, the company can't afford downtime and must be able to replicate data changes when they happen. We fully replicate our database over redundancy data lines, so to us the stability and speed of that asynchronous replication is strategic for us. " Barry Strasnick , CIO Citi. Street Overview • Citi. Street is one of the largest and most experienced global benefits providers servicing over 9 million plan participants across all markets. Citi. Street was formed in partnership between subsidiaries of State Street Corporation and Citigroup Business benefits § Ensure application availability for plan participants and sponsors § The new solutions from IBM will improve data integrity with a reduced level of maintenance Technology benefits § Maintain bi-directional synchronization of profile updates (approx 175, 000 updates daily) in real time 19
IBM Software Group | DB 2 Information Management Software Summary § IBM develops Data Propagation technologies to provide Continuous Availability and achieve a Global Integrated view of the enterprise in an heterogeneous environment § Q Replication (IBM Web. Sphere Replication Server) delivers low latency, high throughput, and resilience. It is best-of-breed for heavy OLTP workloads, providing resilience and preserving transactional integrity throughout outages while minimizing the need for full data refreshes.
0b3932de892ff2aebbb35849e5d31933.ppt