Software Support for High Performance Problem Solving on the Grid
An Overview of the GrADS Project
Sponsored by NSF NGS
Ken Kennedy, Center for High Performance Software, Rice University
http://www.cs.rice.edu/~ken/Presentations/GrADSOverview.pdf
Principal Investigators
Francine Berman, UCSD; Andrew Chien, UCSD; Keith Cooper, Rice; Jack Dongarra, Tennessee; Ian Foster, Chicago; Dennis Gannon, Indiana; Lennart Johnsson, Houston; Ken Kennedy, Rice; Carl Kesselman, USC ISI; John Mellor-Crummey, Rice; Dan Reed, UIUC; Linda Torczon, Rice; Rich Wolski, UCSB
Other Contributors
Dave Angulo, Chicago; Henri Casanova, UCSD; Holly Dail, UCSD; Anshu Dasgupta, Rice; Sridhar Gullapalli, USC ISI; Charles Koelbel, Rice; Anirban Mandal, Rice; Gabriel Marin, Rice; Mark Mazina, Rice; Celso Mendes, UIUC; Otto Sievert, UCSD; Martin Swany, UCSB; Satish Vadhiyar, Tennessee; Shannon Whitmore, UIUC; Asim YarKhan, Tennessee
National Distributed Problem Solving
[Diagram: databases and supercomputers linked across the national Grid into a single problem-solving system.]
GrADS Vision
• Build a National Problem-Solving System on the Grid
  — Transparent to the user, who sees a problem-solving system
• Software Support for Application Development on Grids
  — Goal: Design and build programming systems for the Grid that broaden the community of users who can develop and run applications in this complex environment
  — Presenting a high-level application development interface
    – If programming is hard, the Grid will not reach its potential
• Challenges:
  — Designing and constructing applications for adaptability
  — Late mapping of applications to Grid resources
  — Monitoring and control of performance
    – When should the application be interrupted and remapped?
Today: Globus
• Developed by Ian Foster and Carl Kesselman
  — Grew from the I-Way (SC'95)
• Basic services for distributed computing
  — Resource discovery and information services
  — User authentication and access control
  — Job initiation
  — Communication services (Nexus and MPI)
• Applications are programmed by hand
  — Many applications
  — User responsible for resource mapping and all communication
    – Existing users acknowledge how hard this is
Today: Condor
• Support for matching application requirements to resources
  — User and resource provider write ClassAd specifications
  — System matches ClassAds for applications with ClassAds for resources (sketched below)
    – Selects the “best” match based on a user-specified priority
  — Can extend to the Grid via Globus (Condor-G)
• What is missing?
  — User must handle application mapping tasks
  — No dynamic resource selection
  — No checkpoint/migration (resource re-selection)
  — Performance matching is simplistic
    – Priorities coded into ClassAds
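To make the matchmaking idea concrete, here is a minimal Python sketch of the concept, not Condor's actual ClassAd language or API: each side states requirements and a rank expression, and the matchmaker pairs the job with the highest-ranked resource satisfying the requirements. All attribute and function names here are illustrative assumptions.

```python
# Hypothetical sketch of ClassAd-style matchmaking; attribute names and
# the requirements/rank functions are illustrative, not Condor's API.

job = {"owner": "ken", "memory_needed": 512, "arch": "x86"}

resources = [
    {"name": "torc3",   "arch": "x86",   "memory": 256,  "mips": 800},
    {"name": "opus1",   "arch": "x86",   "memory": 1024, "mips": 500},
    {"name": "cypher2", "arch": "alpha", "memory": 2048, "mips": 900},
]

def requirements(job, res):
    # The job's Requirements: right architecture and enough memory.
    return res["arch"] == job["arch"] and res["memory"] >= job["memory_needed"]

def rank(job, res):
    # The job's Rank: a user-specified priority; here, prefer faster machines.
    return res["mips"]

def match(job, resources):
    # Select the feasible resource with the best rank (the "best" match).
    feasible = [r for r in resources if requirements(job, r)]
    return max(feasible, key=lambda r: rank(job, r), default=None)

print(match(job, resources))  # -> opus1: the only x86 machine with >= 512 MB
```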
GrADS Strategy
• Goal: Reduce the work of preparing an application for Grid execution
  — Provide generic versions of key components currently built into applications
    – E.g., scheduling, application launch, performance monitoring
• Key Issue: What is in the application and what is in the system?
  — GrADS: Application = configurable object program
    – Code, mapper, and performance modeler
GrADSoft Architecture
[Architecture diagram: a source application and libraries feed a whole-program compiler that produces a configurable object program; a resource negotiator/scheduler and binder map it onto the Grid runtime system; a real-time performance monitor detects performance problems and feeds performance feedback back to the software components.]
Configurable Object Program
• Goal: Provide the minimum needed to automate resource selection and program launch
• Code
  — Today: MPI program
  — Tomorrow: more general representations
• Mapper
  — Defines required resources and affinities to specialized resources
  — Given a set of resources, maps computation to those resources
    – “Optimal” performance, given all requirements met
• Performance Model
  — Given a set of resources and a mapping, estimates performance
  — Serves as objective function for the Resource Negotiator/Scheduler
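A minimal sketch of what this triple might look like as an interface (hypothetical Python; the slides specify only that the object bundles code, a mapper, and a performance model):

```python
# Hypothetical interface for a GrADS configurable object program.
# Class and method names are assumptions for illustration.

class ConfigurableObjectProgram:
    def __init__(self, code):
        self.code = code  # today: an MPI program

    def mapper(self, resources):
        """Given a set of resources, map the computation onto them."""
        raise NotImplementedError

    def performance_model(self, resources, mapping):
        """Estimate execution time for a mapping; this is the objective
        function the Resource Negotiator/Scheduler minimizes."""
        raise NotImplementedError
```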
GrADSoft Architecture: Execution Environment
[Same architecture diagram, highlighting the execution environment: the resource negotiator/scheduler, binder, real-time performance monitor, and Grid runtime system.]
Execution Cycle
• Configurable Object Program is presented
  — Space of feasible resources must be defined
  — Mapping strategy and performance model provided
• Resource Negotiator solicits acceptable resource collections
  — Performance model is used to evaluate each (see the loop sketched below)
  — Best match is selected and contracted for
• Execution begins
  — Binder tailors program to resources
    – Carries out final mapping according to the mapping strategy
    – Inserts sensors and actuators for performance monitoring
• Contract monitoring is performed continuously during execution
  — Soft violation detection based on fuzzy logic
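Put together, the cycle amounts to a selection loop over candidate resource collections with the performance model as objective function. A hedged Python sketch, reusing the hypothetical interface above; `candidates` stands in for whatever the Resource Negotiator solicits:

```python
# Sketch of the GrADS negotiation step; all names are assumptions.

def negotiate(app, candidates):
    """Pick the resource collection with the lowest predicted time."""
    scored = []
    for resources in candidates:              # acceptable collections
        mapping = app.mapper(resources)       # app-provided mapping strategy
        est = app.performance_model(resources, mapping)
        scored.append((est, resources, mapping))
    est, resources, mapping = min(scored, key=lambda s: s[0])
    # Not shown: contract for `resources`; the binder then tailors the
    # program (final mapping, sensor/actuator insertion), execution is
    # launched, and the contract monitor watches continuously, triggering
    # remapping on soft violations.
    return resources, mapping, est
```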
GrADS Program Execution System
[Diagram: an Application Manager (one per application) coordinates the performance model, the scheduler/resource negotiator, and the configurable application through the GrADS Information Repository; the binder performs mapping, sensor insertion, and launch onto Grid resources and services, where a contract monitor watches execution.]
GrADSoft Architecture: Program Preparation System
[Same architecture diagram, highlighting the program preparation system: source application, libraries, and the whole-program compiler that produces the configurable object program, driven by performance feedback.]
Program Preparation Tools
• Goal: Provide tools to support the construction of Grid-ready applications (in the GrADS framework)
• Performance modeling
  — Challenge: synthesis and integration of performance models
    – Combine expert knowledge, trial execution, and scaled projections
  — Focus on binary analysis, derivation of scaling factors
• Mapping
  — Construction of mappers from parallel programs
    – Mapping of task graphs to resources (graph clustering)
  — Integration of mappers and performance modelers from components
• High-level programming interfaces
  — Problem-solving systems: integration of components
Generation of Mappers
• Start from a parallel program
  — Typically written using a communication library (e.g., MPI)
  — Can be composed from library components
• Construct a task graph
  — Vertices represent tasks
  — Edges represent data sharing
    – Read-read: undirected edges
    – Read-write in any order: directed edges (dependences)
    – Weights represent volume of communication
  — Identify opportunities for pipelining
• Use a clustering algorithm to match tasks to resources (toy example below)
  — One option: global weighted fusion
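A toy version of the clustering step (a Python sketch under simplifying assumptions; the GrADS mapper generator used richer schemes such as global weighted fusion): greedily merge the pair of task clusters joined by the heaviest communication edge until no more clusters remain than resources, so that the light edges become the inter-resource cuts.

```python
# Toy greedy clustering of a weighted task graph onto k resources.
# Edges carry communication volume; merging the heaviest edge first
# approximates "minimize inter-resource communication". Illustrative only.

def cluster(tasks, edges, k):
    # tasks: list of task ids; edges: {(a, b): communication volume}
    cluster_of = {t: {t} for t in tasks}
    while len({id(c) for c in cluster_of.values()}) > k:
        # Heaviest edge whose endpoints are still in different clusters.
        cross = [(v, a, b) for (a, b), v in edges.items()
                 if cluster_of[a] is not cluster_of[b]]
        if not cross:
            break
        _, a, b = max(cross)
        merged = cluster_of[a] | cluster_of[b]
        for t in merged:
            cluster_of[t] = merged
    return {frozenset(c) for c in cluster_of.values()}

print(cluster(["t1", "t2", "t3", "t4"],
              {("t1", "t2"): 30, ("t2", "t3"): 5, ("t3", "t4"): 20}, 2))
# -> {frozenset({'t1','t2'}), frozenset({'t3','t4'})}: cuts the light edge
```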
Constructing Scalable, Portable Models
• Construct application signatures
• Measure static characteristics
• Measure dynamic characteristics for multiple executions
  — computation
  — memory access locality
  — message frequency and size
• Determine sensitivity of aggregate dynamic characteristics to
  — data size
  — processor count
  — machine characteristics
• Build the model via integration (see the fitting sketch below)
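One concrete way to realize the scaled-projection step (an assumed functional form for illustration, not the project's actual model): fit the coefficients of a hypothesized scaling law to timings from a few trial executions, then extrapolate to larger problems and machines.

```python
# Least-squares fit of a hypothesized scaling law T(n, p) = a*n^3/p + b*n^2
# (compute term plus communication term) to measured runs. The form is an
# assumption chosen to resemble dense linear algebra.
import numpy as np

# (matrix size n, processor count p, measured seconds) from trial executions
runs = [(2000, 4, 14.1), (4000, 4, 109.0), (4000, 8, 56.5), (8000, 8, 437.0)]

A = np.array([[n**3 / p, n**2] for n, p, _ in runs])
t = np.array([sec for _, _, sec in runs])
(a, b), *_ = np.linalg.lstsq(A, t, rcond=None)

def predict(n, p):
    """Project run time for a configuration not measured directly."""
    return a * n**3 / p + b * n**2

print(predict(16000, 32))  # extrapolate to a larger run
```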
High Level Programming
• Rationale
  — Programming is hard, and getting harder with new platforms
  — Professional programmers are in short supply
  — High performance will continue to be important
• Strategy: Make the End User a Programmer
  — Professional programmers develop components
  — Users integrate components using:
    – Problem-solving environments (PSEs) based on scripting languages (possibly graphical)
    – Examples: Visual Basic, Tcl/Tk, AVS, Khoros
• Achieving High Performance
  — Translate scripts and components to a common intermediate language
  — Optimize the resulting program using whole-program compilation
Whole-Program Compilation
[Diagram: a script plus component and user libraries pass through a translator into intermediate code, which a global optimizer and code generator compile to an executable.]
• Problem: long compilation times, even for short scripts!
• Problem: expert knowledge on specialization lost
Telescoping Languages
[Diagram: the L1 class library is digested offline by a compiler generator (which could run for hours) into an L1 compiler that understands library calls as primitives; a user script then passes through the script translator, the L1 compiler, and the vendor compiler to produce an optimized application.]
Telescoping Languages: Advantages
• Compile times can be reasonable
  — More compilation time can be spent on libraries
  — Script compilations can be fast
    – Components reused from scripts may be included in libraries
• High-level optimizations can be included
  — Based on specifications of the library designer
    – Properties often cannot be determined by compilers
    – Properties may be hidden after low-level code generation
• User retains substantive control over language performance
  — Mature code can be built into a library and incorporated into the language
• Reliability can be improved
  — Specialization by the compilation framework, not the user
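The payoff of precompiling the library is specialization at script-compile time. A hedged sketch of the idea in Python (names are hypothetical; real telescoping compilers operate on the language's intermediate representation, not source-level tables): when the generated compiler can prove a property of a call site, it substitutes a specialized variant the library designer supplied.

```python
# Toy illustration of telescoping-language specialization: the library
# designer declares variants plus the property under which each is valid;
# the pre-generated script compiler picks a variant per call site from
# statically proved properties. All names are assumptions.

VARIANTS = {
    "solve": [
        # (required property of the argument, specialized routine)
        ("tridiagonal", "solve_tridiag"),   # O(n) specialized code
        ("symmetric",   "solve_cholesky"),  # ~2x faster than general LU
        (None,          "solve_lu"),        # general fallback
    ],
}

def specialize(call, known_properties):
    """Rewrite a library call using properties proved at compile time."""
    for prop, routine in VARIANTS[call]:
        if prop is None or prop in known_properties:
            return routine
    raise KeyError(call)

# The compiler proved the matrix argument is symmetric at this call site:
print(specialize("solve", {"symmetric"}))  # -> solve_cholesky
```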
Applications
• Matlab Compiler
  — Automatically generated from LAPACK or ScaLAPACK
• Matlab SP*
  — Based on a signal processing library
• Optimizing Component Integration System
  — DOE Common Component Architecture
  — High component invocation costs
• Generator for ARPACK
  — Avoid recoding the developer version by hand
• System for Analysis of Cancer Experiments*
  — Based on S+ (collaboration with M. D. Anderson Cancer Center)
• Flexible Data Distributions in HPF
  — Data distribution == collection of interfaces that meet specs
• Generator for Grid Computations*
  — GrADS: automatic generation of NetSolve
Testbeds
• Goal: Provide a vehicle for experimentation with the dynamic components of the GrADS software framework
• MacroGrid (Carl Kesselman)
  — Collection of processors running Globus and the GrADS framework
    – Consistent software environment
  — At all 9 GrADS sites (but 3 are really useful)
    – Availability listed on web page
  — Permits experimentation with real applications
• MicroGrid (Andrew Chien)
  — Cluster of processors (currently Compaq Alphas and x86 clusters)
  — Runs standard Grid software (Globus, Nexus, GrADS middleware)
  — Permits simulation of varying loads and configurations
    – Stress GrADS components (performance modeling and control)
Research Strategy
• Applications studies
  — Prototype a series of applications using components of the envisioned execution system
    – ScaLAPACK and Cactus demonstration projects
• Move from hand development to an automated system
  — Identify key components that can be isolated and built into a Grid execution system
    – E.g., prototype reconfiguration system
  — Use experience to elaborate the design of software support systems
• Experiment
  — Use testbeds to evaluate results and refine the design
Progress Report
• Testbeds working
• Preliminary application studies complete
  — ScaLAPACK and Cactus
  — GrADS functionality built in
ScaLAPACK Across 3 Clusters
[Plot: time in seconds (0–3500) versus matrix size (0–20,000) for processor selections drawn from the OPUS, TORC, and CYPHER clusters — e.g., 5 OPUS; 8 OPUS; 6 OPUS + 5 CYPHER; 8 OPUS + 6 CYPHER; 8 OPUS + 4 TORC + 4 CYPHER; 8 OPUS + 2 TORC + 6 CYPHER; 2 OPUS + 4 TORC + 6 CYPHER.]
Largest Problem Solved
• Matrix of size 30,000
  — 7.2 GB for the data
• 32 processors to choose from at UIUC and UT
  — Not all machines have 512 MB; some have as little as 128 MB
• The performance model chose 17 processors in 2 clusters from UT
• Computation took 84 minutes
  — 3.6 Gflop/s total
  — 210 Mflop/s per processor
    – Processors are 500 MHz, i.e., 500 Mflop/s peak
  — ScaLAPACK on a dedicated cluster of 17 processors would get about 50% of peak
    – This Grid computation achieved about 20% less than ScaLAPACK
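These figures are mutually consistent, assuming double precision and the standard LU flop count (neither is stated explicitly on the slide):

$$30{,}000^2 \times 8\,\text{B} \approx 7.2\,\text{GB}, \qquad \tfrac{2}{3}\,(30{,}000)^3 \approx 1.8\times 10^{13}\ \text{flops}$$

$$\frac{1.8\times 10^{13}\ \text{flops}}{84 \times 60\ \text{s}} \approx 3.6\ \text{Gflop/s}, \qquad \frac{3.6\ \text{Gflop/s}}{17} \approx 210\ \text{Mflop/s}$$

That is roughly 42% of the 500 Mflop/s per-processor peak, which matches the claim of about 20% below ScaLAPACK's ~50% of peak on a dedicated cluster.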
PDSYEVX – Timing Breakdown
[Figure: per-phase timing breakdown chart; details not recoverable from the extraction.]
Cactus
[Diagram: SDSC IBM SP (1024 processors; 5 × 12 × 17 = 1020 used) and NCSA Origin Array (256 + 128 processors; 5 × 12 × (4 + 2 + 2) = 480 used), connected by an OC-12 line (but only 2.5 MB/s achieved); Gig-E at 100 MB/s within each site.]
• Solved equations for gravitational waves (real code)
  — Tightly coupled; communication required through derivatives
  — Must communicate 30 MB/step between machines
  — Time step takes 1.6 sec
• Used 10 ghost zones along the direction spanning the machines: communicate every 10 steps
  — Compression/decompression on all data passed in this direction
• Achieved 70–80% scaling, ~200 GF (only 14% scaling without these tricks)
• Gordon Bell Award winner at SC2001
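A rough consistency check on the ghost-zone trick (my reading, assuming "30 MB/step" denotes the volume of one inter-site ghost-zone exchange, and ignoring overlap of computation and communication):

$$\frac{30\ \text{MB}}{2.5\ \text{MB/s}} = 12\ \text{s per exchange}, \qquad \frac{12\ \text{s}}{10\ \text{steps}} = 1.2\ \text{s/step amortized}$$

Amortized over 10 steps, the exchange costs less than the 1.6 s compute step, whereas exchanging every step would cost 12 s against 1.6 s of computation; compression shrinks the exchanged volume further, which is where the reported 70–80% scaling comes from.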
Progress Report
• Testbeds working
• Application studies complete
  — ScaLAPACK and Cactus
  — GrADS functionality built in
• Prototype execution system complete
  — All components of the execution system (except rescheduling/migration)
  — Six applications working in the new framework
  — Demonstrations at SC'02
    – ScaLAPACK, FASTA, Cactus, GrADSAT
    – In the NPACI, NCSA, Argonne, Tennessee, and Rice booths
• Prototype program preparation tools under development
  — Black-box performance model construction: preliminary experiments
  — Prototype mapper generator complete
    – Generated a Grid version of the HPF application Tomcatv
SC'02 Demo Applications
• ScaLAPACK
  — LU decomposition of large matrices
• Cactus
  — Solver for gravitational wave equations
  — Collaboration with Ed Seidel’s GridLab
• FASTA
  — Biological sequence matching on distributed databases
• Smith-Waterman
  — Another sequence matching application, using a stronger algorithm
• Tomcatv
  — Vectorized mesh generation written in HPF
• Satisfiability
  — An NP-complete problem useful in circuit design and verification
Summary
• Goal:
  — Design and build programming systems for the Grid that broaden the community of users who can develop and run applications in this complex environment
• Strategy:
  — Build an execution environment that automates the most difficult tasks
    – Maps applications to available resources
    – Manages adaptation to varying loads and changing resources
  — Automate the process of producing Grid-ready programs
    – Generate performance models and mapping strategies semi-automatically
  — Construct programs using high-level domain-specific programming interfaces
Resources
• GrADS Web Site
  — http://hipersoft.rice.edu/grads/
  — Contains:
    – Planning reports
    – Technical reports
    – Links to papers