
Disaster Recovery
Broad Team – UCSD, UCOP, and others! (special credit to Kris Hafner & Elazar Harel)
Presenter: Paul Weiss – Executive Director, UCOP/IR&C – paul.weiss@ucop.edu
March 9-11, 2009 • Long Beach, CA • cenic09.cenic.org
Agenda
• Business view and background as to how and why
• The services portfolio
• Technical details
• Network implications
• Lessons learned, going forward

RIDING THE WAVES OF INNOVATION • cenic09.cenic.org
Situation as of Q2 2006
• UCSD had almost no DR plan in place
• UCOP used an IBM contract in Colorado
  – Cost: $200K/yr + $600K/month if ever used
  – Had insufficient gear and network reserved; we cautiously estimate it would cost > 50% more if updated appropriately
  – 40 hrs of testing/year limit, difficult to schedule
  – RPO (Recovery Point Objective) <= 7 days
  – RTO (Recovery Time Objective) <= 3 days
  – Required UCOP personnel to activate and operate
  – Past testing indicated a decent mainframe recovery plan in place, but limited distributed-system capability
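The RPO and RTO objectives above drive the rest of the design: RPO bounds how much data may be lost, RTO bounds how long recovery may take. A minimal sketch of how an RPO target translates into a data-freshness check; the helper name and example values are illustrative, not from the deck:

```python
from datetime import timedelta

# Hypothetical helper illustrating the RPO concept: any data changed
# since the last successful sync is lost in a disaster, so the age of
# the last sync must stay within the RPO window.
def meets_rpo(last_sync_age: timedelta, rpo: timedelta) -> bool:
    return last_sync_age <= rpo

old_rpo = timedelta(days=7)    # the old IBM-contract objective
sync_age = timedelta(days=2)   # example: last good copy is two days old

print(meets_rpo(sync_age, old_rpo))            # within the 7-day window
print(meets_rpo(sync_age, timedelta(days=1)))  # fails a 1-day RPO
```

The same freshness check works for any sync interval, which is why near-real-time mirroring (later slides) makes a sub-day RPO achievable.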
DR Concept
• UCOP required shorter RPO & RTO
• Found a trusted partner (UCSD)
  – Willingness to be “married”
• Technical choices
• Change management – ongoing
• One “team”
• Common principles
• Use the WAN, “stupid”
Keys to Approach
– Buy enough storage; synchronize data in real or near-real time; avoid loading data during an actual DR event
– Mainframe – CBU option and buy memory
– Other servers – buy sufficient gear to have capacity available to run at either location without having to repurpose servers during an event
– Must be able to test and retest – DR is not STATIC!
The decision to do it!
Advantages of this Approach
• Costs for UCOP are comparable to the old DR plan
• Costs for UCSD are < 50% of a vendor solution
• Capability is dramatically improved
  – RTO and RPO < 1 day (and will be far less)
  – Can test as often as needed (we need it!)
  – Equipment is there and operational
  – More services can be “easily” added (and have been!) after the initial investment, and can be optimized over time
  – UC personnel “on the other side” will assist in case of disaster; the long-term goal is to recover without any personnel from the down location immediately available
Initial Critical Success Factors
• UCOP assigned 0.5 FTE of staff dedicated to drive the effort
• One Team – UCOP and UCSD
• Agree to basic principles, including $$$
• Fight scope creep
• Engage procurement personnel
• Communicate, communicate
• Test, Test
• The WAN!
Current UCOP-to-UCSD DR Portfolio
– All mainframe services (including 9 (and soon to be 10) PPS instances & UCRS)
– AYSO and all Benefits services
– Endowment and Investment Accounting System
– Active Directory
– VPN
– Email & file sharing
– Web servers
– Banking/Treasury systems
– Loan programs
– Risk Services
The Picture - Part I
[diagram showing UCOP and UCSD]
Current UCSD-to-UCOP DR Portfolio
– All mainframe services (including HR, financial, and student transactional backend systems)
– All web-based systems for HR/PPS, financial, student, telecommunications billing, etc.
– Google search appliances
– Multi-terabyte data warehouse
– Multi-terabyte production data for all mainframe and open systems
– Dev and QA testing data and LPARs for mainframe applications
– Stand-alone systems for Intl. Student tracking, Audit, Coeus, and DARS systems
Future UCSD-to-UCOP DR Portfolio
– Portal/CMS backup for campus, business, and student portals
– Single sign-on, roles, affiliates authentication/authorization failover
– VPN
– Active Directory – domain controllers
– Core MTA (IronPort for now)
– Blackberry
– Mailing lists
– Mailbox machines
The Picture - Part II
[diagram showing UCOP and UCSD]
Then it got interesting
As positive word got out, more locations and functional areas realized that DR was achievable. So…
Other DR services in place or committed to
– UC Effort Reporting System (Q3 2009)
– UCOP Office of Technology Transfer Informix DB
– UCOP IdP Shibboleth server
– UC Replacement Security Number (RSN)
– UCOP TSM server
– UC Pathways (Q3 2009)
– UCSD Med mainframe, PPRC
– UCSB distributed DNS server
– UCLA Continuing Education of the Bar
– UCSD External Relations
– UCDC file server
– Irvine secondary DNS and web server
– SD Coastal Data Information Program
And a Special Case!
– UCSB mainframe load
– Four steps:
  • DR from UCSB to UCOP utilizing PPRC
  • Do a failover test to UCOP; if fully successful, keep production at UCOP
  • DR from UCOP to UCSD - trivial
  • Turn off the UCSB mainframe
The Picture - Part III
[diagram showing UCI, San Diego Coastal, UCOP, UCSD External Relations, UCSDMC, UCDC, UCSB, and UCLA CEB]
Services being Considered
– UCOP California Institute for Energy and Environment
– UCLA Med PPRC
And what’s next? Broader discussions are now occurring, not just with UCOP, but between more and more UC players – a nice “halo” effect, with many leveraging the WAN!
Technical Details
• SD & OP (and SB & SDMC) purchased comparable hardware
• IBM SAN & Cisco SAN switches, supporting global mirroring (PPRC – Peer-to-Peer Remote Copy)
• Mainframe – memory upgrade and CBU option; must have sufficient capacity on both sides to support the total load
• Worked through CENIC and local network teams to set up appropriate links for PPRC to ensure throughput
• Wrote (and are writing) special monitoring tools
• Set up remote tape capabilities so we don’t have to use an outside vendor for offsite storage of tape copies
• Remember that this hardware needs to be on a normal refresh cycle, just like the hardware on your primary floor
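The slide mentions writing special monitoring tools. A minimal sketch of the kind of threshold check such a tool might perform; `get_lag_seconds` is a hypothetical stand-in for a real query against the SAN's management interface, and the 300-second threshold is an assumed value, not one from the deck:

```python
# Hypothetical replication-lag monitor sketch. In a real tool,
# get_lag_seconds would query the mirroring hardware; here it is a
# caller-supplied callable so the logic can be shown self-contained.
ALERT_THRESHOLD_SECONDS = 300  # assumed acceptable mirroring lag

def check_replication(get_lag_seconds) -> str:
    lag = get_lag_seconds()
    if lag > ALERT_THRESHOLD_SECONDS:
        return f"ALERT: replication lag {lag}s exceeds {ALERT_THRESHOLD_SECONDS}s"
    return f"OK: replication lag {lag}s"

# Examples with stubbed-out lag sources:
print(check_replication(lambda: 42))   # healthy
print(check_replication(lambda: 900))  # would trigger an alert
```

Keeping the lag source pluggable also makes the check easy to exercise during the frequent DR tests the deck calls for.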
Network concerns
• Frame size
  – For low traffic, a default end-to-end MTU of 1500 bytes works fine
  – OP/SD (more traffic) had to move to “jumbo frames” – 2300 bytes seems to work
• On HPR today; need to move to DC
• At OP – likely upgrade to 10 Gb (at 1 Gb now)
• Must refine SLAs & due diligence
  – Acceptable catch-up (RPO issue)
  – Better understanding of traffic
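A rough back-of-the-envelope on why the larger frames above help bulk replication traffic: per-frame header and interrupt overhead is fixed, so fewer frames move the same payload. The 40-byte overhead assumes plain IPv4 + TCP headers with no options (an assumption for illustration, not a figure from the slide):

```python
IP_TCP_OVERHEAD = 40  # bytes: IPv4 (20) + TCP (20) headers, no options

def frames_needed(total_bytes: int, mtu: int) -> int:
    """Frames required to move total_bytes at a given MTU (assumed model)."""
    payload_per_frame = mtu - IP_TCP_OVERHEAD
    return -(-total_bytes // payload_per_frame)  # ceiling division

one_gib = 1 << 30
print(frames_needed(one_gib, 1500))  # standard Ethernet frames
print(frames_needed(one_gib, 2300))  # the ~2300-byte frames used between OP and SD
```

On this simple model the 2300-byte frames cut the frame count by roughly a third for the same transfer, before counting any savings in per-frame processing.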
Network Layout
[diagram]
Implications due to “Success”
• OP WAN capacity connection upgrade
• Change management is a lot more complicated
• Some technical “lock-in”
• Insufficient documentation and test plans – even now
• Better monitoring tools required
• Org processes can be stressed
Lessons Learned
• The WAN is an underutilized/unrecognized asset
• Geography is less of an inhibitor than many believe
• This project will never be completed
• Can/should continuously optimize over time (examples – virtualization, better sharing)
• Adding DR capability is easier after the initial heavy lifting – e.g., the mainframe