Condor Build & Test: NMI, OMII, ETICS
Peter F. Couvares, Associate Researcher, Condor Team
Computer Sciences Department, University of Wisconsin-Madison
http://www.cs.wisc.edu/condor

How the Condor Team Got Started in the Build/Test Business: Prehistory
› Oracle shamed^H^H^Hinspired us.
› The Condor team was in the stone age, producing modern software to help people reliably automate their computing tasks -- with our bare hands.
  • Every Condor release took weeks/months to do.
  • Build by hand on each platform, discover lots of bugs introduced since the last release, track them down, re-build, etc.

What Did Oracle Do?
› Oracle selected Condor as the resource manager underneath their Automated Integration Management Environment (AIME)
› Relied on to perform automated build and regression testing of multiple components for Oracle's flagship Database Server product.
› Oracle chose Condor because they liked the maturity of Condor's core components.

Doh!
› Oracle used distributed computing to automate their build/test cycle, with huge success.
› If Oracle can do it, why can’t we?
› Use Condor to build Condor!
› NSF Middleware Initiative (NMI)
  • right initiative at the right time!
  • opportunity to collaborate with others to do for production software developers like Condor what Oracle was doing for themselves
  • important service to the scientific computing community

NMI Statement
› Purpose – to develop, deploy and sustain a set of reusable and expandable middleware functions that benefit many science and engineering applications in a networked environment
› Program encourages open source software development and development of middleware standards

Why should you care?
From our experience, the functionality, robustness and maintainability of a production-quality software component depend on the effort involved in building, deploying and testing the component.
  • If it is true for a component, it is definitely true for a software stack
  • Doing it right is much harder than it appears from the outside
  • Most of us had very little experience in this area

Goals of the NMI Build & Test System
› Design, develop and deploy a complete build system (HW and SW) capable of performing daily builds and tests of a suite of disparate software packages on a heterogeneous (HW, OS, libraries, …) collection of platforms
› And make it:
  • Dependable
  • Traceable
  • Manageable
  • Portable
  • Extensible
  • Schedulable

The Build Challenge
› Automation – “build the component at the push of a button!”
  • always more to it than just “configure” & “make”
  • e.g., ssh to right host; cvs checkout; untar; setenv, etc.
› Reproducibility – “build the version we released 2 years ago!”
  • Well-managed & comprehensive source repository
  • Know your “externals” and keep them around
› Portability – “build the component on nodeX.cluster.com!”
  • No dependencies on “local” capabilities
  • Understand your hardware & software requirements
› Manageability – “run the build daily on 15 platforms and email me the outcome!”

The Testing Challenge
› All the same challenges as builds (automation, reproducibility, portability, manageability), plus:
› Flexibility
  • “test our RHEL 4 binaries on RHEL 5!”
  • “run our new tests on our old binaries”
  • important to decouple build & test functions
  • making tests just a part of a build -- instead of an independent step -- makes it difficult/impossible to:
    • run new tests against old builds
    • test one platform’s binaries on another platform
    • run different tests at different frequencies
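
One way to picture the decoupling described on this slide: a test run takes an existing build artifact as its input, instead of being a step inside the build. The following is a minimal Python sketch under that assumption. All names in it (`Build`, `SuiteRun`, `plan_runs`, the version and suite labels) are hypothetical and are not part of the NMI software.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Build:
    """A finished build artifact (hypothetical model, not an NMI type)."""
    version: str
    platform: str  # platform the binaries were built on


@dataclass(frozen=True)
class SuiteRun:
    """A test job that consumes an existing build as its input."""
    build: Build
    suite: str
    run_platform: str  # where the tests execute; may differ from build.platform


def plan_runs(builds, suites, run_platforms):
    """Pair every stored build with every suite and every target platform."""
    return [SuiteRun(b, s, p) for b, s, p in product(builds, suites, run_platforms)]


# Old binaries, new tests, two target platforms (labels are illustrative).
old_build = Build(version="old-release", platform="RHEL4")
runs = plan_runs([old_build], ["new-tests"], ["RHEL4", "RHEL5"])
for run in runs:
    print(f"{run.suite}: {run.build.platform} binaries "
          f"({run.build.version}) on {run.run_platform}")
```

Because the build is only an input, the same stored artifact can be re-tested later, on another platform, or on a different schedule, without rebuilding anything.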

“Eating Our Own Dogfood”
› What Did We Do?
  • We built the NMI Build & Test Lab on top of Condor, DAGMan, and other distributed computing technologies to automate the build, deploy, and test cycle.
  • To support it, we’ve had to construct and manage a dedicated, heterogeneous distributed computing facility.
    • Opposite extreme from typical “cluster” -- instead of 1000’s of identical CPUs, we have a handful of CPUs each for ~40 platforms.
    • Much harder to manage! You try finding a sysadmin tool that works on 40 platforms!
  • We’re just another big Condor user
    • If Condor sucks, we feel the pain.

NMI Build & Test Facility
[Diagram. INPUT: Spec File, Customer Source Code, Customer Build/Test Scripts. NMI Build & Test Software: DAG, DAGMan, Condor Queue. Distributed Build/Test Pool: build/test jobs, results. OUTPUT: Web Portal, Finished Binaries, MySQL Results DB.]
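
The flow in the diagram (a spec file goes in, a DAG of Condor jobs comes out) can be sketched roughly as follows. This is an illustrative Python sketch only: the `spec` dictionary, the function name `make_dag`, the platform labels, and the submit-file names are assumptions, not the real NMI spec-file format or the DAG that the NMI software actually generates. The output uses DAGMan's `JOB` and `PARENT ... CHILD` statements, with one build node feeding one test node per platform.

```python
def make_dag(spec):
    """Return DAGMan input text: one build -> test chain per platform.

    `spec` is a hypothetical stand-in for the spec file in the diagram.
    """
    lines = []
    for platform in spec["platforms"]:
        build_node = f"build_{platform}"
        test_node = f"test_{platform}"
        # Each node points at a Condor submit description file (names assumed).
        lines.append(f"JOB {build_node} {build_node}.sub")
        lines.append(f"JOB {test_node} {test_node}.sub")
        # DAGMan starts the test node only after the build node succeeds.
        lines.append(f"PARENT {build_node} CHILD {test_node}")
    return "\n".join(lines) + "\n"


# Platform labels below are made up for illustration.
spec = {"component": "example", "platforms": ["x86_rhel4", "x86_64_rhel5"]}
dag_text = make_dag(spec)
print(dag_text)
```

Keeping the per-platform chains independent means a failure on one platform does not block builds and tests on the others.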

Numbers
  100    CPUs
  39     HW/OS “Platforms”
  34     OS
  9      HW Arch
  3      Sites
  ~100   GB of results per day
  ~1400  Builds/tests per month
  ~350   Condor jobs per day

Condor Build & Test
› Automated Condor Builds
  • Two (sometimes three) separate Condor versions, each automatically built using NMI on 13-17 platforms nightly
  • Stable, developer, special release branches
› Automated Condor Tests
  • Each nightly build’s output becomes the input to a new NMI run of our full Condor test suite
› Ad-Hoc Builds & Tests
  • Each Condor developer can use NMI to submit ad-hoc builds & tests of their experimental workspaces or CVS branches to any or all platforms

More Condor Testing Work
• Advanced Test Suite
  • Using binaries from each build, we deploy an entire self-contained Condor pool on each test machine
  • Runs a battery of Condor jobs and tests to verify critical features
  • Currently >150 distinct tests
    • each executed for each build, on each platform, for each release, every night
• Flightworthy Initiative
  • Ensuring continued “core” Condor scalability, robustness
  • NSF funded, like NMI
  • Producing new tests all the time
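
To give a feel for "each executed for each build, on each platform, for each release, every night", here is a rough back-of-the-envelope calculation in Python. It assumes that the figures quoted in these slides (at least 150 tests, 13-17 platforms, two or three release branches) all apply to the same nightly run and that every test runs on every platform. The slides do not state this exactly, so treat the result as an order-of-magnitude estimate rather than a measured number.

```python
def nightly_executions(tests, platforms, releases):
    """Individual test executions per night if every test runs everywhere."""
    return tests * platforms * releases


# Low end: 150 tests, 13 platforms, 2 release branches.
low = nightly_executions(150, 13, 2)
# High end: 150 tests, 17 platforms, 3 release branches.
high = nightly_executions(150, 17, 3)
print(f"roughly {low} to {high} test executions per night")
```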

NMI Build & Test Customers
› NMI Build & Test Facility was built to serve all NMI projects
› Who else is building and testing?
  • Globus
  • NMI Middleware Distribution
    • many “grid” tools, including Condor & Globus
  • Virtual Data Toolkit (VDT) for the Open Science Grid (OSG)
    • 40+ components
  • Soon TeraGrid, NEESgrid, others…

Build & Test Beyond NMI
› We want to integrate with other, related software quality projects, and share build/test resources...
  • an international (US/Europe/China) federation of build/test grids…
  • Offer our tools as the foundation for other B&T systems
  • Leverage others’ work to improve our own B&T service

OMII-UK
• Integrating software from multiple sources
  • Established open-source projects
  • Commissioned services & infrastructure
• Deployment across multiple platforms
  • Verify interoperability between platforms & versions
• Automatic Software Testing vital for the Grid
  • Build Testing – Cross platform builds
  • Unit Testing – Local Verification of APIs
  • Deployment Testing – Deploy & run package
  • Distributed Testing – Cross domain operation
  • Regression Testing – Compatibility between versions
  • Stress Testing – Correct operation under real loads
• Distributed Testbed
  • Need a breadth & variety of resources not power
  • Needs to be a managed resource – process

NMI/OMII-UK Collaboration
› Phase I: OMII-UK developed automated builds & tests using the NMI Build & Test Lab at UW-Madison
› Phase II: OMII-UK deployed their own instance of the NMI Build & Test Lab at Southampton University
  • Our lab at UW-Madison is well and good, but some collaborators want/need their own local facilities.
› Phase III (in progress): Move jobs freely between UW and OMII-UK B&T labs as needed.

Next: ETICS
• Build system, software configuration, service infrastructure, dissemination, EGEE, gLite, project coord.
• NMI Build & Test Framework, Condor, distributed testing tools, service infrastructure
• Software configuration, service infrastructure, dissemination
• Web portals and tools, quality process, dissemination, DILIGENT
• Test methods and metrics, unit testing tools, EBIT

ETICS Project Goals
› ETICS will provide a multi-platform environment for building and testing middleware and applications for major European e-Science projects
› “Strong point is automation: of builds, of tests, of reporting, etc. The goal is to simplify life when managing complex software management tasks”
  • One button to generate finished package (e.g., RPMs) for any chosen component
› ETICS is developing a higher-level web service and DB to generate B&T jobs -- and use multiple, distributed NMI B&T Labs to execute & manage them
  • This work complements the existing NMI Build & Test system and is something we want to integrate & use to benefit other NMI users!

ETICS Web Interface

OMII-Japan
• What They’re Doing
  • “…provide service which can use on-demand autobuild and test systems for Grid middlewares on on-demand virtual cluster. Developers can build and test their software immediately by using our autobuild and test systems”
  • Underlying B&T Infrastructure is NMI Build & Test Software

This was a Lot of Work… But It Got Easier Each Time
› Deployments of the NMI B&T Software with international collaborators taught us how to export Build & Test as a service.
› Tolya Karp: International B&T Hero
  • Improved (i.e., wrote) NMI install scripts
  • Improved configuration process
  • Debugged and solved a myriad of details that didn’t work in new environments

What This Means For You
› NMI B&T Lab Deployment Experience + Improved Packaging + Improved Portability…
› We now have a unique ability to give you not only source code, but a whole production build & test infrastructure to go along with it
› … and we have done it for a number of users already

New Condor+NMI Users
› Yahoo
  • First industrial user to deploy NMI B&T Framework to build/test custom Condor contributions
› Hartford Financial
  • Deploying it as we speak…

What’s to Come
› More US & international collaborations
  • OMII-Europe
  • More Industrial User/Developers…
› New Features
  • Becky Gietzel: parallel testing!
  • Major new feature: multiple co-scheduled resources for individual tests
  • Going beyond multi-platform testing to cross-platform parallel testing
› UW-Madison B&T Lab: ever more platforms
  • “it’s time to make the doughnuts”
› Questions?