- Number of slides: 54
IBM Systems and Technology Group
GDPS/Active-Active Overview (1.4)
David Petersen, petersen@us.ibm.com
© 2014 IBM Corporation
Trademarks
The following are trademarks of the International Business Machines Corporation in the United States and/or other countries: IBM*, IBM (logo)*, ibm.com*, AIX*, DB2*, DS6000, DS8000, Dynamic Infrastructure*, ESCON*, FlashCopy*, GDPS*, HyperSwap, Parallel Sysplex*, POWER5, Redbooks*, Sysplex Timer*, System p*, System z*, Tivoli*, z/OS*, z/VM*
* Registered trademarks of IBM Corporation
The following are trademarks or registered trademarks of other companies.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
InfiniBand is a trademark and service mark of the InfiniBand Trade Association.
Intel, Intel logo, Intel Inside logo, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
* All other products may be trademarks or registered trademarks of their respective companies.
Notes:
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
Agenda
§ Level set
§ Requirements
§ Concepts
§ Configurations
§ Sample Scenarios
§ Use Cases
§ Summary
Suite of GDPS service products to meet various business requirements for availability and disaster recovery
§ GDPS/PPRC HM: Continuous Availability of Data within a Data Center
– Single Data Center; applications remain active
– Continuous access to data in the event of a storage outage
– RPO=0 [RTO secs] for disk only
§ GDPS/PPRC: Continuous Availability with DR within Metropolitan Region
– Two Data Centers; systems remain active
– Multi-site workloads can withstand site and/or storage failures
– RPO=0, RTO mins (<20 km) / RTO<1 h (>20 km)
§ GDPS/GM & GDPS/XRC: Disaster Recovery Extended Distance
– Two Data Centers; rapid systems D/R with "seconds" of data loss
– Disaster Recovery for out of region interruptions
– RPO secs, RTO<1 h
§ GDPS/MGM & GDPS/MzGM: CA Regionally and Disaster Recovery Extended Distance
– Three Data Centers; high availability for site disasters
– Disaster recovery for regional disasters
– RPO=0, RTO mins/<1 h & RPO secs, RTO<1 h
RPO – recovery point objective
RTO – recovery time objective
Interagency Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System [Docket No. R-1128] (April 7, 2003)
1. Identify clearing and settlement activities in support of critical financial markets
2. Determine appropriate recovery and resumption objectives for clearing and settlement activities in support of critical markets
– Core clearing and settlement organizations should develop the capacity to recover and resume clearing and settlement activities within the business day on which the disruption occurs with the overall goal of achieving recovery and resumption within two hours after an event
3. Maintain sufficient geographically dispersed resources to meet recovery and resumption objectives
– Back-up arrangements should be as far away from the primary site as necessary to avoid being subject to the same set of risks as the primary location
– The effectiveness of back-up arrangements in recovering from a wide-scale disruption should be confirmed through testing
4. Routinely use or test recovery and resumption arrangements
– One of the lessons learned from September 11 is that testing of business recovery arrangements should be expanded
How Much Interruption can your Business Tolerate?
Ensuring Business Continuity (from Standby to Active/Active):
§ Disaster Recovery
– Restore business after an unplanned outage
§ High Availability
– Meet Service Availability objectives, e.g., 99.9% availability or 8.8 hours of downtime a year
§ Continuous Availability
– No downtime (planned or not)
What is the cost of 1 hour of downtime during core business hours?
Enterprises that operate across time zones no longer have any 'off-hours' window; Continuous Availability is required.
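The 99.9% figure on this slide can be checked with simple arithmetic. The sketch below is illustrative only: the function name and the 365-day year are assumptions, not part of the deck.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; leap years ignored for simplicity (assumption)


def annual_downtime_hours(availability: float) -> float:
    """Return the hours per year a service may be down at the given availability (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


# Compare a few common availability targets.
for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} available -> {annual_downtime_hours(target):.1f} h/year")
```

At 99.9% the result is about 8.76 hours, which the slide rounds to 8.8 hours a year.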
Disruptions affect more than the bottom line …
§ August 18, 2013: Google total eclipse sees 40 percent drop in Internet traffic
§ August 22, 2013: Nasdaq: 'Connectivity issue' led to three-hour shutdown
§ July 20, 2013: DMV Computers Fail Statewide, Police Can't Access Database
§ April 16, 2013: American Airlines Grounds Flights Nationwide
… with enormous impact on the business
§ Downtime costs can equal up to 16 percent of revenue [1]
§ 4 hours of downtime is severely damaging for 32 percent of organizations [2]
§ Data is growing at explosive rates: from 161 EB in 2007 to 988 EB in 2010 [3]
§ Some industries impose fines for downtime and for inability to meet regulatory compliance
§ Downtime ranges from 300 to 1,200 hours per year, depending on industry [1]
[1] Infonetics Research, The Costs of Enterprise Downtime: North American Vertical Markets 2005, Rob Dearborn and others, January 2005
[2] Continuity Central, "Business Continuity Unwrapped," 2006, http://www.continuitycentral.com/feature0358.htm
[3] The Expanding Digital Universe: A Forecast of Worldwide Information Growth Through 2010, IDC white paper #206171, March 2007
Evolving customer requirements
§ Shift focus from failover model to near-continuous availability model (RTO near zero)
§ Access data from any site (unlimited distance between sites)
§ Multi-sysplex, multi-platform solution
– "Recover my business rather than my platform technology"
§ Ensure successful recovery via automated processes (similar to GDPS technology today)
– Can be handled by less-skilled operators
§ Provide workload distribution between sites (route around failed sites, dynamically select sites based on ability of site to handle additional workload)
§ Provide application level granularity
– Some workloads may require immediate access from every site, other workloads may only need to update other sites every 24 hours (less critical data)
– Current solutions employ an all-or-nothing approach (complete disk mirroring, requiring extra network capacity)
From High Availability to Continuous Availability
§ GDPS/PPRC: failover model; recovery time ≈ 2 min; distance < 20 km
§ GDPS/XRC or GDPS/GM: failover model; recovery time < 1 hour; unlimited distance
§ GDPS/Active-Active: near CA model; recovery time < 1 minute; unlimited distance
§ GDPS/Active-Active is for mission critical workloads that have stringent recovery objectives that cannot be achieved using existing GDPS solutions
– RTO approaching zero, measured in seconds for unplanned outages
– RPO approaching zero, measured in seconds for unplanned outages
– Non-disruptive site switch of workloads for planned outages
– At any distance
§ Active-Active is NOT intended to substitute for a local availability solution such as Parallel Sysplex
There are multiple GDPS service products under the GDPS solution umbrella to meet various customer requirements for Availability and Disaster Recovery
§ GDPS/PPRC HM: Continuous Availability of Data within a Data Center
– Single Data Center; applications remain active
– Continuous access to data in the event of a storage outage
– RPO=0 [RTO secs] for disk only
§ GDPS/PPRC: Continuous Availability with DR within Metropolitan Region
– Two Data Centers; systems remain active
– Multi-site workloads can withstand site and/or storage failures
– RPO=0, RTO mins (<20 km) / RTO<1 h (>20 km)
§ GDPS/GM & GDPS/XRC: Disaster Recovery Extended Distance
– Two Data Centers; rapid systems D/R with "seconds" of data loss
– Disaster Recovery for out of region interruptions
– RPO secs, RTO<1 h
§ GDPS/MGM & GDPS/MzGM: CA Regionally and Disaster Recovery Extended Distance
– Three Data Centers; high availability for site disasters
– Disaster recovery for regional disasters
– RPO=0, RTO mins/<1 h & RPO secs, RTO<1 h
§ GDPS/Active-Active: CA, DR, & Cross-site Workload Balancing Extended Distance
– Two or more Active Data Centers
– Automatic workload switch in seconds; seconds of data loss
– RPO secs, RTO secs
RPO – recovery point objective
RTO – recovery time objective
Active/Active Sites concept
§ Two or more sites, separated by unlimited distances, running the same applications and having the same data to provide:
– Cross-site Workload Balancing
– Continuous Availability
– Disaster Recovery
§ Data at geographically dispersed sites kept in sync via replication
[Diagram: connections arrive at a Workload Distributor, which routes them to the sites; replication runs between the sites.]
Workloads are managed by a client and routed to one of many replicas, depending upon workload weight and latency constraints; this extends workload balancing to sysplexes across multiple sites.
Monitoring spans the sites and now becomes an essential element of the solution for site health checks, performance tuning, etc.
Active/Active Sites Configurations
§ Configurations
1. Active/Standby – GA date 30 June 2011
2. Active/Query – GA date 31 October 2013
3. …
§ A configuration is specified on a workload basis
§ A workload is the aggregation of these components
– Software: user written applications (e.g., COBOL programs) and the middleware run time environment (e.g., CICS regions, InfoSphere Replication Server instances and DB2 subsystems)
– Data: related set of objects that must preserve transactional consistency and optionally referential integrity constraints (such as DB2 Tables, IMS Databases and VSAM Files)
– Network connectivity: one or more TCP/IP addresses & ports (e.g., 10.10.1:80)
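The three-part definition of a workload (software, data, network connectivity) plus a per-workload configuration can be pictured as a small data structure. This is a hypothetical sketch for illustration; the class names, field names, sample values and the `192.0.2.10` documentation address are all assumptions and do not reflect how GDPS/Active-Active actually stores its definitions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Configuration(Enum):
    # The two configurations named on the slide.
    ACTIVE_STANDBY = "active/standby"
    ACTIVE_QUERY = "active/query"


@dataclass(frozen=True)
class Endpoint:
    host: str  # TCP/IP address
    port: int


@dataclass
class Workload:
    name: str
    configuration: Configuration
    software: list[str] = field(default_factory=list)   # applications and middleware
    data: list[str] = field(default_factory=list)       # objects that must stay consistent
    endpoints: list[Endpoint] = field(default_factory=list)

    def is_routable(self) -> bool:
        """A workload needs at least one address and port before connections can be routed."""
        return bool(self.endpoints)


# Hypothetical example workload.
atm = Workload(
    name="ATM",
    configuration=Configuration.ACTIVE_STANDBY,
    software=["COBOL application", "CICS region", "DB2 subsystem"],
    data=["DB2 table ACCOUNTS"],
    endpoints=[Endpoint("192.0.2.10", 80)],
)
print(atm.name, atm.configuration.value, atm.is_routable())
```

Keeping the configuration on the workload, rather than on the site, mirrors the slide's point that a configuration is specified on a workload basis.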
Active/Standby configuration
[Diagram: connections reach a Workload Distributor that uses static routing with automatic failover. Applications A and B are active in one site and standby in the other; work is queued during a switch. DB2, IMS and VSAM data is replicated between site 1 and site 2.]
This is a fundamental paradigm shift from a failover model to a continuous availability model
Active/Query configuration
§ Appl A (gold) is in active/standby configuration
• performing updates in active site [site 2]
§ Appl B (grey) is in active/query configuration
• using same data as Appl A but read only
• active to both site 1 & site 2, but favor site 1
• Appl B query routing according to Appl A latency policy
[Diagram: connections reach a Workload Distributor; DB2, IMS and VSAM data is replicated from site 2 to site 1. Three routing states are shown for Appl B:]
• [A] latency=2: within the "max latency" policy, route more queries to site 1
• [A] latency>5: "max latency" policy has been exceeded, route all queries to site 2
• [A] latency<3: as latency is less than "reset latency", follow policy to skew queries to site 1
Read-only or query connections are routed to both sites, while update connections are routed only to the active site
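The latency-driven query routing on this slide can be sketched as a small state machine. This is a minimal illustration under stated assumptions: the thresholds 5 and 3 are read off the slide's example, the class and method names are invented, and the hysteresis behaviour (stay on the active site until latency drops below "reset latency") is an interpretation of the slide, not documented product logic.

```python
from dataclasses import dataclass


@dataclass
class QueryRouter:
    max_latency: float = 5.0    # above this, the replica is considered too stale for queries
    reset_latency: float = 3.0  # below this, normal skewed routing resumes
    exceeded: bool = False      # remembers that max_latency was crossed

    def route(self, replication_latency: float) -> str:
        """Return the routing decision for query connections."""
        if replication_latency > self.max_latency:
            self.exceeded = True
        elif replication_latency < self.reset_latency:
            self.exceeded = False
        # Between the two thresholds the previous decision is kept.
        return "active-site-only" if self.exceeded else "favor-standby-site"


router = QueryRouter()
for latency in (2, 6, 4, 2.5):
    print(latency, router.route(latency))
```

Using two thresholds instead of one avoids flapping between sites when latency hovers around a single limit.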
What is a GDPS/Active-Active environment?
§ Two Production Sysplex environments (also referred to as sites) in different locations
– One active, one standby for each defined update workload, and potentially a query workload active in both sites
– Software-based replication between the two sysplexes/sites
• DB2, IMS and VSAM data is supported
§ Two Controller Systems
– Primary/Backup
– Typically one in each of the production locations, but there is no requirement that they are co-located in this way
§ Workload balancing/routing switches
– Must be Server/Application State Protocol (SASP) compliant
• RFC 4678 describes SASP
– What switches/routers are SASP-compliant? … the following are those we know about
• Cisco Catalyst 6500 Series Switch Content Switching Module
• F5 BIG-IP Switch
• Citrix NetScaler Appliance
• Radware Alteon Application Switch (bought Nortel appliance line)
Sample scenario – both sites active for individual workloads
[Diagram: the network connects through SASP-compliant routers to a first-tier load balancer (F5), which routes WKLD 1, 2 and 3 to second-tier load balancers (Sysplex Distributor) in each site. Site 1 holds sysplex AAPLEX1 (production systems A1P1, A1P2) and AA Controller AAC1 (backup); Site 2 holds sysplex AAPLEX2 (production systems A2P1, A2P2) and AA Controller AAC2 (primary). Each workload is active in one site and standby in the other; S/W replication keeps DB2, VSAM and IMS data in sync between the sites.]
What S/W makes up a GDPS/Active-Active environment?
§ GDPS/Active-Active
§ IBM Tivoli NetView Monitoring for GDPS, which requires:
– IBM Tivoli NetView for z/OS
• IBM Tivoli NetView for z/OS Enterprise Management Agent (NetView agent), separately orderable
§ System Automation for z/OS
§ IBM Multi-site Workload Lifeline for z/OS
§ Middleware – DB2, IMS, CICS…
§ Replication Software
– IBM InfoSphere Data Replication for DB2 for z/OS (IIDR for DB2)
– IBM InfoSphere Data Replication for IMS for z/OS (IIDR for IMS)
– IBM InfoSphere Data Replication for VSAM for z/OS (IIDR for VSAM)
§ Optionally the Tivoli OMEGAMON XE monitoring products
– Individually or part of a suite
Integration of a number of software products
Automation – deeper insight
§ All components of a Workload should be defined in SA* as
– One or more Application Groups (APG)
– Individual Applications (APL)
§ The Workload itself is defined as an Application Group
§ SA z/OS keeps track of the individual members of the Workload's APG and reports a "compound" status to the A/A Controller
[Diagram: MYWORKLOAD/APG contains TCP/IP, DB2, CICS/APG (CICS TOR, CICS AOR), CAPTURE and APPLY, linked by HP relationships; DDS=OnDemand | Asis.]
Legend
– DDS: Default Desired Status
– HP: HasParent
– OnDemand: Resource is UNAVAILABLE at IPL time
– Asis: Resource is kept in the state it is at IPL time
* Note that although SA is required on all systems, you can be using an alternative automation product to manage your workloads.
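The idea of reporting one "compound" status for a group of members can be illustrated with a simple roll-up function. This is a conceptual sketch only: the three status values and the roll-up rule are simplifications chosen for illustration and are not the actual SA z/OS status model.

```python
def compound_status(member_states: dict[str, str]) -> str:
    """Roll up member states into one group status.

    Illustrative rule (an assumption, not SA z/OS behaviour):
    all members available -> AVAILABLE, some -> DEGRADED, none -> UNAVAILABLE.
    """
    states = set(member_states.values())
    if states == {"AVAILABLE"}:
        return "AVAILABLE"
    if "AVAILABLE" in states:
        return "DEGRADED"
    return "UNAVAILABLE"


# Hypothetical members of MYWORKLOAD/APG, named after the diagram labels.
my_workload = {
    "DB2": "AVAILABLE",
    "CICS TOR": "AVAILABLE",
    "CICS AOR": "AVAILABLE",
    "CAPTURE": "UNAVAILABLE",
    "APPLY": "AVAILABLE",
}
print(compound_status(my_workload))
```

A single rolled-up value lets the controller reason about the workload as a unit without tracking every member itself.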
Automation – sharing components between workloads
Infrastructure
§ Certain components of a Workload, for instance DB2, could be also viewed as "infrastructure"
§ Relationship(s) from the Workload ensure that the supporting "infrastructure" resources are available when needed
§ Infrastructure is typically started at IPL time
[Diagram: infrastructure resources JES, TCP/IP, Agt, DB2, MQ and Lifeline, linked by HP relationships to MYWORKLOAD_2/APG, which contains CICS/APG (CICS TOR, CICS AOR), APPLY and CAPTURE.]
Automation – sharing components between workloads
Shared members
§ Other components of a Workload, for instance, capture and apply engines can also be shared
§ However, GDPS requires that they are members of the Workload
Rationale
§ The A/A Controller needs to know the capture and apply engines that belong to a Workload in order to
– Quiesce work properly including replication
– Send commands to them
[Diagram: infrastructure resources TCP/IP, JES, Agt, DB2, Lifeline and MQ; CAPTURE and APPLY are shared between MYWKL_31/APG (with CICS31/APG) and MYWKL_32/APG (with CICS32/APG).]
Software replication – Deeper Insight
[Diagram: updates flow from SOURCE 1-3 through the log, Capture, the network and Apply to TARGET 1-3. Replication latency (E2E) is made up of capture latency, network latency and apply latency.]
1. Transaction committed
2. Capture reads the DB updates from the log
3. Capture sends the updates to Apply
4. Apply receives the updates from Capture
5. Apply applies the DB updates to the target databases
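The latency breakdown on this slide can be expressed as arithmetic over the step timestamps. The sketch below is illustrative: the class and field names are invented, and the choice of which steps bound each component (capture = commit to send, network = send to receive, apply = receive to applied) is an assumption about the diagram, not a documented definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationTimestamps:
    """Times in milliseconds for one replicated transaction (hypothetical fields)."""
    committed_ms: int  # step 1: transaction committed at the source
    sent_ms: int       # step 3: Capture sends the updates
    received_ms: int   # step 4: Apply receives the updates
    applied_ms: int    # step 5: Apply has updated the target database

    @property
    def capture_latency_ms(self) -> int:
        return self.sent_ms - self.committed_ms

    @property
    def network_latency_ms(self) -> int:
        return self.received_ms - self.sent_ms

    @property
    def apply_latency_ms(self) -> int:
        return self.applied_ms - self.received_ms

    @property
    def end_to_end_latency_ms(self) -> int:
        return self.applied_ms - self.committed_ms


ts = ReplicationTimestamps(committed_ms=0, sent_ms=400, received_ms=450, applied_ms=900)
print(ts.capture_latency_ms, ts.network_latency_ms, ts.apply_latency_ms, ts.end_to_end_latency_ms)
```

With these boundaries the three components always sum to the end-to-end replication latency.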
Connectivity – deeper insight
[Diagram: site-1 (SYSPLEX 1) holds the Primary Controller (Lifeline Advisor, NetView) and systems SYS-A and SYS-B; site-2 (SYSPLEX 2) holds the Secondary Controller (Lifeline Advisor, NetView) and systems SYS-C and SYS-D. Each system in the application/database tier runs TCP/IP, server applications and a Lifeline Agent, and has an SE. A 1st-tier LB sits in front of a 2nd-tier LB in each site. Numbered connections in the legend: Advisor to Agent, Advisor to LBs, Advisor to SEs, Advisor NMI.]
GDPS/A-A configuration
[Diagram: the network, the TEP Interface and the GDPS Web Interface connect through the first-tier LB (F5). Controllers AAC1 and AAC2 act as primary and backup, running the NetView master and backup and the primary and secondary Lifeline Advisor (LLAdvisor). In Site 1, production systems A1P1 and A1P2, and in Site 2, production systems A2P1 and A2P2, sit behind a second-tier LB (Sysplex Distributor) and run TEMA, LLAgent, MQ/TCPIP, DB2 Rep, IMS Rep, and CICS/DB2 and IMS applications. Workload 1 and Workload 3 are active in one site and standby in the other, connected by S/W replication.]
IMS VSAM replication resources are not shown, for clarity
Planned workload switch
[Diagram: the switch is initiated from the GDPS Web Interface (the TEP Interface is also shown). The controllers (AAC1, AAC2) SWITCH ROUTING at the first-tier LB (F5) and the second-tier LBs (Sysplex Distributor, primary and backup). WKLD 1, 2 and 3 run CICS/DB2 applications with DB2 replication between AAPLEX1 (A1P1, A1P2) in Site 1 and AAPLEX2 (A2P1, A2P2) in Site 2; the switched workload changes from active to standby in one site and from standby to active in the other, and the replication direction reverses.]
Note: multiple workloads and needed infrastructure resources are not shown, for clarity
Unplanned site failure
[Diagram: WKLD 1, 2 and 3 run CICS/DB2 applications with DB2 replication between AAPLEX1 (A1P1, A1P2) in Site 1 and AAPLEX2 (A2P1, A2P2) in Site 2, behind the first-tier LB (F5) and second-tier LBs (Sysplex Distributor). When the active site fails, the controller issues STOP ROUTING for the failed site, work is queued, and an automatic switch issues START ROUTING to the standby site, which becomes active.]
§ Failure Detection Interval = 60 sec
§ SITE_FAILURE = Automatic
Note: multiple workloads and needed infrastructure resources are not shown, for clarity
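The two policy values on this slide suggest a simple decision rule, sketched below. This is a hypothetical illustration: the function name, the returned action strings, and the non-automatic branch (prompting an operator) are assumptions and are not taken from GDPS documentation.

```python
def site_failure_action(
    seconds_since_last_contact: float,
    detection_interval: float = 60.0,       # "Failure Detection Interval = 60 sec" on the slide
    site_failure_policy: str = "Automatic",  # "SITE_FAILURE = Automatic" on the slide
) -> str:
    """Decide what to do about a site that has stopped responding."""
    if seconds_since_last_contact < detection_interval:
        # Not yet considered failed; keep routing as before.
        return "keep-routing"
    if site_failure_policy == "Automatic":
        # Stop routing to the failed site and start routing to the standby site.
        return "switch-to-standby"
    # Assumed alternative: leave the decision to an operator.
    return "prompt-operator"


for elapsed in (10, 59, 60, 120):
    print(elapsed, site_failure_action(elapsed))
```

The detection interval trades recovery time against false positives: a shorter interval switches sooner but reacts to brief network glitches.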
Go Home scenario
After an unplanned workload/site outage
Note: transactions may have been stranded in the failed site: they had completed execution and committed data to the database at the time of the failure, but this data had not been replicated to the standby site.
§ Assume the data is still available on the disk subsystems
§ Start the site or workload that had failed
§ Restart replication from the site brought back online to the currently active site; this delivers any stranded changes resulting from the unplanned outage (*)
§ Re-synchronize the recovering site with data from the currently active site, by starting replication in the other direction
§ Re-direct the workload, once the recovered site is operational and can process workloads
After a planned workload/site outage
Note: as the process to perform a planned site switch ensures that there are no stranded updates in the active site at the start of the switch, there is no need to start replication in the opposite direction in order to deliver stranded updates.
§ Start the site or workload that had been stopped
§ Re-synchronize the restarted site or workload with data from the currently active site, by starting replication from the active to the now standby site
§ Re-direct the workload, once the restarted site is both operational and the data replication has caught up, and it can now process workloads
(*) Attempts to apply the stranded changes to the data in the active site may result in an exception or conflict, as the before image of the stranded update will no longer match the updated value in the active site. For IMS replication, the adaptive apply process will discard the update and issue messages to indicate that there has been a conflict and an update has been discarded. For DB2 replication, the update may not be applied, depending on conflict handling policy settings, and additionally an exception record will be inserted into a table.
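The conflict described in the footnote, where a stranded update's before image no longer matches the value in the active site, can be shown with a toy apply routine. This is a conceptual sketch: the function, the in-memory dictionary standing in for the target database, and the exception-list format are all assumptions and do not represent the IIDR apply implementation or its policies.

```python
def apply_stranded_update(target: dict, key: str, before, after, exceptions: list) -> bool:
    """Apply an update only if the target still holds the expected before image.

    On a mismatch the update is discarded and an exception record is kept,
    loosely mirroring the behaviour described in the footnote.
    """
    current = target.get(key)
    if current != before:
        exceptions.append({"key": key, "expected": before, "found": current, "discarded": after})
        return False
    target[key] = after
    return True


# The active site has already moved the balance on since the failure.
active_site = {"acct-1": 150, "acct-2": 40}
exceptions: list = []

print(apply_stranded_update(active_site, "acct-1", before=100, after=120, exceptions=exceptions))
print(apply_stranded_update(active_site, "acct-2", before=40, after=55, exceptions=exceptions))
print(active_site, exceptions)
```

Recording discarded updates rather than silently dropping them is what makes the later manual reconciliation possible.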
Disk Replication and Software Replication with GDPS
[Diagram: connections reach a Workload Distributor in front of Active Sysplex A and Standby Sysplex B. SW replication managed by GDPS/A-A copies DB2, IMS and VSAM data between them; disk replication managed by GDPS/MGM copies all Sysplex A data (DB2, IMS, VSAM, system volumes, batch, other) to DR Sysplex A.]
§ SW replication for DB2/IMS/VSAM: RTO a few seconds
§ HW replication for all data in region: RTO < 1 hour
Two switch decisions for Sysplex A problems …
§ Workload Switch: switch to SW copy (B); once problem is fixed, simply restart SW replication
§ Site Switch: switch to SW copy (B) and restart DR Sysplex A from the disk copy
Disk Replication Integration
§ Provide DR for whole production sysplex (A/A workloads & non-A/A workloads)
§ Restore A/A Sites capability for A/A Sites workloads after a planned or unplanned region switch
§ Restart batch workloads after the prime site is restarted and re-synced
§ The disk replication integration is optional
SW replication for IMS/DB2 and/or VSAM: RTO a few seconds
HW replication for all data in region: RTO < 1 hour
GDPS Disk Replication Integration
[Diagram: connections reach a Workload Distributor in front of Sysplex A in Region A and Sysplex B in Region B. WKLD-1, WKLD-2 and WKLD-3 are each active in one sysplex and standby in the other, with SW replication for DB2, IMS and/or VSAM between them. Each sysplex's data (system, batch, other) is mirrored by HW replication: MM within the region and GM to a copy in the other region (Sysplex A' and Sysplex B').]
HW replication for all data in region
High Availability in Region & DR Protection in other Region
Unplanned Region Switch – Restart A/A & non-A/A workloads
[Diagram: Region A (Sysplex A) has failed; in Region B, Sysplex B runs WKLD-1, WKLD-2 and WKLD-3 active behind the Workload Distributor, and Sysplex A' is restarted from the disk copy. HW replication is suspended.]
1. Switch A/A workloads from Region A to Region B
2. Recover Sysplex A secondary/tertiary disk
3. Restart Sysplex A' in Region B
Potential manual tasks … (not automated by GDPS)
4. Start software replication from B to A' using adaptive (force) apply
5. Start software replication from A' to B with default (ignore) apply
6. Manually reconcile exceptions from force (step 4)
Deployment of GDPS/Active-Active
§ Option 1 – create new sysplex environments for active/active workloads
– Simplifies operations as scope of Active/Active environment is confined to just this or these specific workloads and the Active/Active managed data
§ Option 2 – Active/Active workload and traditional workload co-exist within the same sysplex
– Still will need new active sysplex for the second site
– Increased complexity to manage recovery of Active/Active workload to one place, and remaining systems to a different environment, from within the same sysplex
– Existing GDPS/PPRC customers will have to implement GDPS co-operation support between GDPS/PPRC and GDPS/Active-Active
No single right answer: will depend on client environment and requirements/objectives
GDPS/A-A 1.4 New function summary
§ Active/Query configuration
– Fulfills SoD made when the Active/Standby configuration was announced
§ VSAM Replication support
– Adds to IMS and DB2 as the data types supported
– Requires either CICS TS V5 for CICS/VSAM applications or CICS VR V5 for logging of non-CICS workloads
§ Support for IIDR for DB2 (Qrep) Multiple Consistency Groups
– Enables support for massive replication scalability
§ Workload switch automation
– Avoids manual checking for replication updates having drained as part of the switch process
§ GDPS/PPRC Co-operation support
– Enables GDPS/PPRC and GDPS/A-A to coexist without issues over who manages the systems
§ Disk replication integration
– Provides tight integration with GDPS/MGM for GDPS/A-A to be able to manage disaster recovery for the entire sysplex
GDPS Active/Active Sites Customer Use Case – reducing planned outage downtime by 90%
Customer Background
§ Large Chinese financial institution
§ Several critical workloads
– Self-services (ATMs)
– Internet banking (query-only)
§ Workloads access data from DB2 tables through CICS
§ Planned outages
– Minor application upgrades (as needed)
• Often included DB2 table schema changes
– Quarterly application version upgrades
• Other planned maintenance activities such as software infrastructure
Customer Goal - Seeking a better recovery time for planned outages
§ Critical workloads were down for three to four hours
– Scheduled for 3rd shifts local time on weekends to limit impact to banking customers
• Still affected customers accessing accounts from other world-wide locations
– Site taken down for application upgrades, possible database schema changes, scheduled maintenance
• All business stopped
• Required manual coordination across geographic locations to block and resume routing of connections into data center
• Reload of DB2 data elongated outage period
§ Goal was to reduce planned outage time for these workloads down to minutes
Customer Solution – Leveraging continuous availability solution to provide better recovery time for planned outages
§ Solution provides
– A transactionally consistent copy of DB2 on a remote site
• IBM InfoSphere Data Replication for DB2 for z/OS (IIDR) - provides a high-performance replication solution for DB2
– A method to easily switch selected workloads to a remote site without any application changes
• IBM Multi-site Workload Lifeline (Lifeline) - facilitates planned outages by rerouting workloads from one site to another without disruption to users
– A centralized point of control to manage the graceful switch
• GDPS Active/Active Sites - coordinates interactions between IIDR and Lifeline to enable a non-disruptive switch of workloads without loss of data
§ Reduced impact to their banking customers!
– Total outage time for update workloads was reduced from 3-4 hours down to about 2 minutes
– Total outage time for the query workload was reduced from 3-4 hours down to under 2 minutes
Summary
§ Manages availability at a workload level
§ Provides a central point of monitoring & control
§ Manages replication between sites
§ Provides the ability to perform a controlled workload site switch
§ Provides near-continuous data and systems availability and helps simplify disaster recovery with an automated, customized solution
§ Reduces recovery time and recovery point objectives, measured in seconds
§ Facilitates regulatory compliance management with a more effective business continuity plan
§ Simplifies system resource management
GDPS/Active-Active is the next generation of GDPS
Thank You (shown on the slide in many languages)
Additional Charts
Pre-requisite software matrix
Columns: Pre-requisite software [version/release level] | GDPS Controller | A-A Systems | non A-A Systems
Operating Systems
§ z/OS 1.13 or higher | YES | YES
Application Middleware
§ DB2 for z/OS V9 or higher | NO | YES, workload dependent | as required
§ IMS V11 | NO | YES, workload dependent | as required
§ WebSphere MQ V7.0.1 | NO | MQ is only required for DB2 data replication | as required
§ CICS Transaction Server for z/OS V5.1 | NO | YES 1) | as required
§ CICS VSAM Recovery for z/OS V5.1 | NO | YES 1) | as required
Replication
§ InfoSphere Data Replication for DB2 for z/OS 10.2 and SPE | NO | YES, workload dependent | as required 2)
§ InfoSphere Data Replication for IMS for z/OS V11.1 | NO | YES, workload dependent | as required 2)
§ InfoSphere Data Replication for VSAM for z/OS V11.1 | NO | YES, workload dependent | as required 2)
1) CICS TS and CICS VR are required when using VSAM replication for A-A workloads
2) Non-Active/Active systems & their workloads can, if required, use Replication Server instances, but not the same instances as the A-A workloads
Pre-requisite software matrix (cont)
Pre-requisite software [version/release level]; columns: GDPS Controller, A-A Systems, non A-A Systems
Management and Monitoring
§ GDPS/A-A V1.4: YES 3), YES, YES 3)
§ IBM Tivoli NetView Monitoring for GDPS v6.2 4): YES
§ IBM Tivoli Management Services for z/OS V6.3 Fix Pack 1 or later: YES 5), YES 6)
§ IBM Tivoli Monitoring V6.3 Fix Pack 1 or later: NO, NO, NO
§ Tivoli System Automation for z/OS V3.4 + SPE APARs: YES, YES
§ IBM Multi-site Workload Lifeline for z/OS Version 2.0: YES, NO
3) GDPS/A-A requires the installation of the GDPS satellite code in production systems where A-A workloads run
4) IBM Tivoli NetView Monitoring for GDPS v6.2 requires IBM Tivoli NetView for z/OS V6.2. The v6.2.1 releases of NetView Monitoring for GDPS and NetView for z/OS have just become generally available.
5) IBM Tivoli NetView Management Services for z/OS is required for the NetView for z/OS Enterprise Management Agent to monitor the A-A solution.
6) IBM Tivoli NetView Management Services for z/OS is optionally required where the NetView for z/OS Enterprise Management Agent runs to monitor NetView itself, or where OMEGAMON XE products are deployed.
Optional Monitoring Products
Additional products such as Tivoli OMEGAMON XE on z/OS, Tivoli OMEGAMON XE for DB2, and Tivoli OMEGAMON XE for IMS may optionally be deployed to provide specific monitoring of products that are part of the Active/Active Sites solution.
Note: Details of cross-product dependencies are listed in the PSP information for GDPS/A-A, which can be found by selecting Upgrade: GDPS and Subset: AAV1R4 at the following URL: http://www14.software.ibm.com/webapp/set2/psearch/search?domain=psp&new=y
Pre-requisite products
§ IBM Multi-site Workload Lifeline v2.0
  – Advisor: runs on the Controllers and provides information to the external load balancers on where to send connections, and information to GDPS on the health of the environment
    • There is one primary and one secondary Advisor
  – Agent: runs on all production images with active/active workloads defined and provides information to the Lifeline Advisor on the health of that system
§ IBM Tivoli NetView Monitoring for GDPS v6.2 or higher
  – Runs on all systems and provides automation and monitoring functions. This new product requires IBM Tivoli NetView for z/OS at the same version/release. The NetView Enterprise Master runs on the Primary Controller
§ IBM Tivoli Monitoring v6.3 FP1
  – Can run on zLinux or on distributed servers. Provides the monitoring infrastructure and portal, plus alerting/situation management, via Tivoli Enterprise Portal, Tivoli Enterprise Portal Server and Tivoli Enterprise Monitoring Server
  – If running NetView Monitoring for GDPS v6.2.1 and NetView for z/OS v6.2.1, ITM v6.3 FP3 is required
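The Advisor/Agent split can be pictured with a minimal sketch. This is not the Lifeline interface: the class names, the `healthy` flag and the rule that a site is eligible when at least one of its systems reports healthy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AgentReport:
    """Health report from a hypothetical agent on one production image."""
    site: str
    system: str
    healthy: bool


@dataclass
class Advisor:
    """Hypothetical stand-in for the advisor role: collects agent reports."""
    reports: dict = field(default_factory=dict)

    def receive(self, report: AgentReport) -> None:
        # Keep only the most recent report for each system.
        self.reports[(report.site, report.system)] = report

    def sites_accepting_connections(self) -> list:
        # Assumption: a site is eligible if any of its systems is healthy.
        return sorted({r.site for r in self.reports.values() if r.healthy})


advisor = Advisor()
advisor.receive(AgentReport("Site1", "A1P1", healthy=True))
advisor.receive(AgentReport("Site1", "A1P2", healthy=False))
advisor.receive(AgentReport("Site2", "A2P1", healthy=False))
eligible_sites = advisor.sites_accepting_connections()
```

In this sketch a later report from the same system replaces the earlier one, so a site drops out of the eligible list as soon as its last healthy system reports a problem.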
Pre-requisite products (cont)
§ IBM InfoSphere Data Replication for DB2 for z/OS v10.2
  – Runs on production images where required to capture (active) and apply (standby) data updates for DB2 data. Relies on MQ as the data transport mechanism (QREP)
§ IBM InfoSphere Data Replication for IMS for z/OS v11.1
  – Runs on production images where required to capture (active) and apply (standby) data updates for IMS data. Relies on TCP/IP as the data transport mechanism
§ IBM InfoSphere Data Replication for VSAM for z/OS v11.1
  – Runs on production images where required to capture (active) and apply (standby) data updates for VSAM data. Relies on TCP/IP as the data transport mechanism. Requires CICS TS or CICS VR
§ System Automation for z/OS v3.4 or higher
  – Runs on all images. Provides a number of critical functions:
    • BCPii for GDPS
    • Remote communications capability to enable GDPS to manage sysplexes from outside the sysplex
    • System Automation infrastructure for workload and server management
§ Optionally, the OMEGAMON XE products can provide additional insight into the underlying components for Active/Active Sites, such as z/OS, DB2, IMS, the network, and storage
  – There are two "suite" offerings that include the OMEGAMON XE products (OMEGAMON Performance Management Suite and Service Management Suite for z/OS)
Terminology
§ Active/Active Sites
  – This is the overall concept of the shift from a failover model to a continuous availability model.
  – Often used to describe the overall solution, rather than any specific product within the solution.
§ GDPS/Active-Active
  – The name of the GDPS product which provides, along with the other products that make up the solution, the capabilities mentioned in this presentation, such as workload, replication and routing management.
Two Types of Active/Active Workloads
§ Update Workloads: currently run only in what is defined as an active/standby configuration
  – perform updates to the data associated with the workload
  – have a relationship with the data replication component
  – not all transactions within an update workload are necessarily update transactions
§ Query Workloads: run in what is defined as an active/query configuration
  – must not perform any updates to the data associated with the workload
  – can run (that is, be active) in both sites at the same time
  – must be associated with an update workload
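The rules for the two workload types can be written down as a small consistency check. This is a sketch only: the `Workload` class, the `kind` strings and both function names are hypothetical and do not correspond to any GDPS definition syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Workload:
    name: str
    kind: str                                # "update" or "query"
    associated_update: Optional[str] = None  # required for query workloads


def validate(workloads):
    """Return a list of rule violations (empty if the definitions are consistent)."""
    update_names = {w.name for w in workloads if w.kind == "update"}
    errors = []
    for w in workloads:
        if w.kind not in ("update", "query"):
            errors.append(f"{w.name}: unknown kind {w.kind!r}")
        elif w.kind == "query" and w.associated_update not in update_names:
            errors.append(f"{w.name}: query workload needs an associated update workload")
    return errors


def sites_where_active(workload, active_site, standby_site):
    """Update workloads are active in one site; query workloads may be active in both."""
    if workload.kind == "update":
        return [active_site]
    return [active_site, standby_site]


orders = Workload("orders", "update")
orders_inquiry = Workload("orders-inquiry", "query", associated_update="orders")
orphan = Workload("reports", "query")
```

Here `validate([orders, orders_inquiry])` reports no violations, while adding `orphan` yields one message because that query workload names no update workload.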
Multiple Consistency Groups (MCGs): for DB2, for ultra-large-scale replication needs
§ A Consistency Group (CG) corresponds to a set of DB2 tables for which the replication apply process maintains transactional consistency, by applying data-dependent transactions serially and other transactions in parallel
§ Multiple Consistency Groups (MCGs) are primarily used to provide scalability
  – if and when one CG (a Single Consistency Group) cannot keep up with all transactions for one workload
  – query workloads can tolerate data replicated with eventual consistency
§ Q Replication (V10.2.1) can coordinate the Apply programs across CGs to guarantee that a time-consistent point across all CGs can be established at the standby site, following a disaster or outage, before workloads are switched to this standby site
§ GDPS operations on a workload control and coordinate replication for all CGs that belong to this workload
  – For example, 'STOP REPLICATION' for a workload stops replication in a coordinated manner for all CGs (all queues and Capture/Apply programs)
  – GDPS supports up to 20 consistency groups for each workload
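The idea of applying data-dependent transactions serially and the rest in parallel can be illustrated with a simple batching sketch. This is not the Q Replication algorithm: it assumes each transaction is described by the set of row keys it touches, and treats two transactions as dependent when they share a key. The function name and the key format are hypothetical.

```python
def schedule(transactions):
    """Group transactions into ordered batches.

    transactions: list of (txn_id, set_of_keys) in source commit order.
    Transactions in the same batch share no keys, so they could be applied
    in parallel; batches must be applied one after another.
    """
    last_level = {}  # key -> batch index of the latest transaction touching it
    batches = []
    for txn_id, keys in transactions:
        # A transaction goes one batch after the latest batch it depends on.
        level = 1 + max((last_level[k] for k in keys if k in last_level), default=-1)
        if level == len(batches):
            batches.append([])
        batches[level].append(txn_id)
        for k in keys:
            last_level[k] = level
    return batches


txns = [
    ("T1", {"acct:1"}),
    ("T2", {"acct:2"}),
    ("T3", {"acct:1", "acct:3"}),  # depends on T1 through acct:1
    ("T4", {"acct:4"}),
]
batches = schedule(txns)
```

With this input, T1, T2 and T4 land in the first batch and T3 in the second, because only T3 touches a key already written by an earlier transaction.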
Multiple consistency groups (MCG): deeper insight
[Figure: Site-1 (DB2 sharing group) replicating over the network to Site-2 (DB2 sharing group). Labels: Workload 1 with CG1 (Capture 1, Apply 1; SOURCE 1 to 3, TARGET 1 to 3), annotated "Single CG meets the requirements of majority of workloads"; Workload 2 with CG2 and CG3 (Capture 2, Apply 2; SOURCE 4 to 6, TARGET 4 to 6), annotated "Multiple consistency groups"; send queue and receive queue; MQ Manager 1 to 4; "Multiple channels for throughput rate".]
Note: Eventual consistency is suitable for a large number of READONLY applications
Conceptual view
[Figure: Connections arrive at the workload distribution layer (any load balancer or workload distributor that supports the Server Application State Protocol, SASP), which routes workloads to the active sysplex. One site runs the active update workload and an active query workload; the other runs the standby update workload and an active query workload; S/W data replication runs between them. Control information is passed between the systems and the workload distributor by the Controllers (Workload Lifeline, Tivoli NetView, System Automation, …).]
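A distributor that receives routing recommendations could pick a target along the following lines. This sketch assumes the recommendations arrive as one numeric weight per target, with zero meaning "send no new connections"; the function name and the target names are hypothetical, and no SASP message handling is shown.

```python
import random


def choose_target(weights, rng):
    """Pick one target in proportion to its weight; weight 0 is never chosen."""
    eligible = {member: w for member, w in weights.items() if w > 0}
    if not eligible:
        raise RuntimeError("no eligible target for new connections")
    members = list(eligible)
    return rng.choices(members, weights=[eligible[m] for m in members], k=1)[0]


# Update workload: only the active sysplex carries a non-zero weight.
update_weights = {"sysplex-active": 10, "sysplex-standby": 0}
# Query workload: both sysplexes may receive connections.
query_weights = {"sysplex-active": 6, "sysplex-standby": 4}

rng = random.Random(0)  # seeded only to make the sketch repeatable
update_picks = {choose_target(update_weights, rng) for _ in range(100)}
query_picks = {choose_target(query_weights, rng) for _ in range(100)}
```

Setting a weight to zero is how this sketch models routing being stopped to a site: existing logic stays unchanged and the target simply stops being selected.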
High level architecture
[Figure: layered stack. Labels: GDPS/Active-Active; Lifeline; Workload; Monitoring; Replication; MQ; TCP/IP; IMS; DB2; VSAM; SA zOS; NetView; z/OS; System z Hardware.]
Sample scenario: all workloads active in one site
[Figure: network routing for WKLD 1, 2 and 3. Labels: SASP-compliant routers; first-tier load balancer (F5); second-tier load balancer (Sysplex Distributor, primary and backup); AA Controllers [AAC1] and [AAC2]; sysplexes [AAPLEX1] and [AAPLEX2]; production systems [A1P1], [A1P2], [A2P1], [A2P2]; wkld-1, wkld-2 and wkld-3 shown as active or standby; DB2, VSAM and IMS data at Site 1 and Site 2, linked by S/W replication.]
What is an Active/Active Workload?
A workload is the aggregation of these components:
§ Software: user-written applications (e.g. COBOL programs) and the middleware run-time environment (e.g. CICS regions, InfoSphere Replication Server instances and DB2 subsystems)
§ Data: a related set of objects that must preserve transactional consistency and, optionally, referential integrity constraints (e.g. DB2 tables, IMS databases, VSAM files)
§ Network connectivity: one or more TCP/IP addresses and ports (e.g. 10.10.1:80)
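As a data structure, the three components of a workload might be grouped as in the sketch below. The class and field names, and every value in the example, are hypothetical and chosen only to mirror the three bullets; this is not a GDPS definition format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Endpoint:
    address: str
    port: int


@dataclass
class WorkloadDefinition:
    name: str
    software: list = field(default_factory=list)   # applications and middleware
    data: list = field(default_factory=list)       # objects kept transactionally consistent
    endpoints: list = field(default_factory=list)  # TCP/IP addresses and ports

    def missing_components(self):
        """Names of the components that are still empty."""
        parts = {"software": self.software, "data": self.data, "endpoints": self.endpoints}
        return [name for name, items in parts.items() if not items]


payments = WorkloadDefinition(
    name="payments",                            # hypothetical workload name
    software=["CICS region", "DB2 subsystem"],
    data=["DB2 table PAYMENTS"],                # hypothetical table name
    endpoints=[Endpoint("192.0.2.10", 80)],     # documentation-range address
)
incomplete = WorkloadDefinition(name="draft", software=["COBOL program"])
```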
Data: deeper insight
[Figure: SOURCE [SUBS] tables T1 and T2 replicated through SENDQ [SQ1] and RECVQ [SQ1] to TARGET [SUBS] tables T1 and T2.]
§ In DB2 Replication, the mapping between a table at the source and a table at the target is called a subscription
  – Example shows 2 subscriptions, for tables T1 and T2
§ A subscription belongs to a QMap, which defines the send queue that is used to send data for that subscription
  – Example shows that both subscriptions are using the same QMap (SQ1)
§ In IMS Replication, a subscription is a combination of a source server and a target server
  – The subscription is the object that is started/stopped by GDPS/A-A
  – This corresponds to the QMap in Q Replication
§ Each IMS Replication subscription contains a list of replication mappings
  – There is one replication mapping for each IMS database being replicated
  – This corresponds to a subscription in Q Replication
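The relationship between subscriptions and QMaps in the DB2 example can be modelled in a few lines. The `Subscription` class and `group_by_qmap` function are hypothetical names for illustration; they are not part of any replication product interface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """One source-table to target-table mapping, sent over one QMap."""
    source_table: str
    target_table: str
    qmap: str


def group_by_qmap(subscriptions):
    """Return {qmap: [source tables]} so each QMap can be handled as one unit."""
    grouped = defaultdict(list)
    for sub in subscriptions:
        grouped[sub.qmap].append(sub.source_table)
    return dict(grouped)


# The slide's example: two subscriptions (T1, T2) sharing QMap SQ1.
subs = [
    Subscription("T1", "T1", "SQ1"),
    Subscription("T2", "T2", "SQ1"),
]
by_qmap = group_by_qmap(subs)
```

The same shape would fit the IMS case described above if the QMap is read as the IMS subscription and each entry as one replication mapping.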
Architectural building blocks
[Figure: WAN and SASP-compliant routers, used for workload distribution, in front of an active production system and a standby production system, with a Primary Controller and a Backup Controller connected over the SE/HMC LAN. Production system labels: z/OS, Lifeline Agent, Workload, IMS/DB2/VSAM, Replication Capture (active) or Replication Apply (standby), MQ, TCP/IP, NetView, SA, Other Automation Product. Controller labels: z/OS, Lifeline Advisor, NetView, GDPS/A-A, SA and BCPii, Tivoli Monitoring.]
GDPS/Active-Active (the product)
§ Automation code extends many of the techniques tried and tested in other GDPS products, and in many client environments, for managing mainframe CA and DR requirements
§ Control code runs only on Controller systems
§ Workload management: start/stop components of a workload in a given sysplex
§ Software replication management: start/stop replication for a given workload between sites
§ Disk replication management: ability to manipulate GDPS/MGM from GDPS/A-A
§ Routing management: start/stop routing of connections to a site
§ System and server management: STOP (graceful shutdown) of a system; LOAD, RESET, ACTIVATE and DEACTIVATE of the LPAR for a system; and capacity on demand actions such as CBU/OOCoD
§ Monitoring the environment and alerting for unexpected situations
§ Planned/unplanned situation management and control: planned or unplanned site or workload switches; automatic actions such as automatic workload switch (policy dependent)
§ Powerful scripting capability for complex/compound scenario automation
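A scripted, compound action such as a planned workload switch can be thought of as an ordered list of steps that halts at the first failure. The sketch below is not GDPS script syntax: the step names, their order (stop routing, wait for replication to catch up, start routing to the other site) and the `run_script` helper are all assumptions made for illustration.

```python
def run_script(steps, actions):
    """Run named steps in order; stop at the first step that reports failure.

    Returns (completed_steps, failed_step), where failed_step is None on success.
    """
    completed = []
    for step in steps:
        if not actions[step]():
            return completed, step
        completed.append(step)
    return completed, None


# Hypothetical planned switch of one workload from Site1 to Site2.
switch_steps = [
    "stop_routing_to_site1",
    "wait_for_replication_to_catch_up",
    "start_routing_to_site2",
]

# Stand-in actions; a real environment would issue the actual operations here.
all_ok = {step: (lambda: True) for step in switch_steps}
replication_stuck = dict(all_ok, wait_for_replication_to_catch_up=lambda: False)

ok_result = run_script(switch_steps, all_ok)
failed_result = run_script(switch_steps, replication_stuck)
```

Stopping at the first failure keeps the sketch from starting routing to the new site while updates are still in flight, which is the point of ordering the steps this way.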