

  • Number of slides: 33

Software Support for High Performance Problem Solving on the Grid
An Overview of the GrADS Project
Sponsored by NSF NGS
Ken Kennedy, Center for High Performance Software, Rice University
http://www.cs.rice.edu/~ken/Presentations/GrADSOverview.pdf

Principal Investigators
Francine Berman, UCSD; Andrew Chien, UCSD; Keith Cooper, Rice; Jack Dongarra, Tennessee; Ian Foster, Chicago; Dennis Gannon, Indiana; Lennart Johnsson, Houston; Ken Kennedy, Rice; Carl Kesselman, USC ISI; John Mellor-Crummey, Rice; Dan Reed, UIUC; Linda Torczon, Rice; Rich Wolski, UCSB

Other Contributors
Dave Angulo, Chicago; Henri Casanova, UCSD; Holly Dail, UCSD; Anshu Dasgupta, Rice; Sridhar Gullapalli, USC ISI; Charles Koelbel, Rice; Anirban Mandal, Rice; Gabriel Marin, Rice; Mark Mazina, Rice; Celso Mendes, UIUC; Otto Sievert, UCSD; Martin Swany, UCSB; Satish Vadhiyar, Tennessee; Shannon Whitmore, UIUC; Asim Yarkan, Tennessee

National Distributed Problem Solving (diagram: database, supercomputer, and user sites linked across the Grid)

GrADS Vision
• Build a National Problem-Solving System on the Grid
  — Transparent to the user, who sees a problem-solving system
• Software Support for Application Development on Grids
  — Goal: Design and build programming systems for the Grid that broaden the community of users who can develop and run applications in this complex environment
    – If programming is hard, the Grid will not reach its potential
• Challenges:
  — Presenting a high-level application development interface
  — Designing and constructing applications for adaptability
  — Late mapping of applications to Grid resources
  — Monitoring and control of performance
    – When should the application be interrupted and remapped?

Today: Globus
• Developed by Ian Foster and Carl Kesselman
  — Grew from the I-Way (SC-95)
• Basic services for distributed computing
  — Resource discovery and information services
  — User authentication and access control
  — Job initiation
  — Communication services (Nexus and MPI)
• Applications are programmed by hand
  — Many applications
  — User responsible for resource mapping and all communication
    – Existing users acknowledge how hard this is

Today: Condor
• Support for matching application requirements to resources
  — User and resource provider write ClassAd specifications
  — System matches ClassAds for applications with ClassAds for resources
    – Selects the “best” match based on a user-specified priority
  — Can extend to Grid via Globus (Condor-G)
• What is missing?
  — User must handle application mapping tasks
  — No dynamic resource selection
  — No checkpoint/migration (resource re-selection)
  — Performance matching is straightforward
    – Priorities coded into ClassAds
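The matchmaking idea above can be sketched in a few lines. This is a hypothetical miniature, not the real ClassAd language: a job and each machine advertise attributes plus a requirements predicate, and the matchmaker keeps mutually acceptable pairs and picks the one maximizing the job's rank expression.

```python
# Toy ClassAd-style matchmaking (illustrative names, not Condor's API).
job = {
    "attrs": {"ImageSize": 2000, "Owner": "alice"},
    # Job requires a machine with enough memory for its image.
    "requirements": lambda mine, other: other["Memory"] >= mine["ImageSize"],
    # Job ranks feasible machines by speed.
    "rank": lambda mine, other: other["Mips"],
}

machines = [
    {"attrs": {"Name": "m1", "Memory": 1024, "Mips": 500},
     "requirements": lambda mine, other: True},
    {"attrs": {"Name": "m2", "Memory": 4096, "Mips": 800},
     "requirements": lambda mine, other: True},
    {"attrs": {"Name": "m3", "Memory": 8192, "Mips": 300},
     "requirements": lambda mine, other: True},
]

def match(job, machines):
    # Keep machines where both sides' requirements hold, then take the
    # machine that maximizes the job's rank.
    feasible = [m for m in machines
                if job["requirements"](job["attrs"], m["attrs"])
                and m["requirements"](m["attrs"], job["attrs"])]
    return max(feasible,
               key=lambda m: job["rank"](job["attrs"], m["attrs"]),
               default=None)

best = match(job, machines)
print(best["attrs"]["Name"])  # m1 lacks memory; m2 outranks m3 → "m2"
```

The slide's criticism is visible even in this sketch: the match is static and priority-driven, with no performance model and no re-selection once execution starts.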

GrADS Strategy
• Goal: Reduce work of preparing an application for Grid execution
  — Provide generic versions of key components currently built in to applications
    – E.g., scheduling, application launch, performance monitoring
• Key Issue: What is in the application and what is in the system?
  — GrADS: Application = configurable object program
    – Code, mapper, and performance modeler

GrADSoft Architecture (diagram: Source Application, Libraries, and Software Components feed the Whole-Program Compiler, producing a Configurable Object Program; the Scheduler/Resource Negotiator and Binder negotiate with the Grid Runtime System; the Real-time Performance Monitor reports Performance Problems and Performance Feedback back to the compiler)

Configurable Object Program
• Goal: Provide minimum needed to automate resource selection and program launch
• Code
  — Today: MPI program
  — Tomorrow: more general representations
• Mapper
  — Defines required resources and affinities to specialized resources
  — Given a set of resources, maps computation to those resources
    – “Optimal” performance, given all requirements met
• Performance Model
  — Given a set of resources and mapping, estimates performance
  — Serves as objective function for Resource Negotiator/Scheduler
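The three-part object above can be sketched as a small interface. All names here are assumptions for illustration, not the GrADS APIs: a program bundles its code with a mapper (resources → task placement) and a performance model (resources + mapping → estimated runtime).

```python
# Sketch of a "configurable object program": code + mapper + performance
# model, as the slide describes. Names and signatures are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class ConfigurableObjectProgram:
    code: str                                                # e.g. an MPI binary
    mapper: Callable[[List[str]], Dict[str, str]]            # resources -> placement
    perf_model: Callable[[List[str], Dict[str, str]], float] # estimated runtime

# A toy instance: place tasks round-robin; predict runtime as work / #hosts.
tasks = ["t0", "t1", "t2", "t3"]
prog = ConfigurableObjectProgram(
    code="./a.out",
    mapper=lambda res: {t: res[i % len(res)] for i, t in enumerate(tasks)},
    perf_model=lambda res, mapping: 100.0 / len(res),
)

mapping = prog.mapper(["hostA", "hostB"])
print(mapping["t2"], prog.perf_model(["hostA", "hostB"], mapping))
# t2 lands on hostA (round-robin), estimate 50.0
```

The point of the packaging is the next slide's execution cycle: because the performance model is carried with the program, a generic scheduler can use it as an objective function without knowing anything about the application.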

GrADSoft Architecture (same diagram, highlighting the Execution Environment: Scheduler/Resource Negotiator, Binder, Real-time Performance Monitor, and Grid Runtime System, driven by the Configurable Object Program)

Execution Cycle
• Configurable Object Program is presented
  — Space of feasible resources must be defined
  — Mapping strategy and performance model provided
• Resource Negotiator solicits acceptable resource collections
  — Performance model is used to evaluate each
  — Best match is selected and contracted for
• Execution begins
  — Binder tailors program to resources
    – Carries out final mapping according to mapping strategy
    – Inserts sensors and actuators for performance monitoring
• Contract monitoring is performed continuously during execution
  — Soft violation detection based on fuzzy logic
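The negotiation step reduces to an optimization over candidate resource collections, with the application's performance model as objective function. A minimal sketch, with an invented toy model (real GrADS models are far richer):

```python
# Resource negotiation sketch: evaluate each candidate resource collection
# with the performance model and contract for the best one.

def perf_model(resources):
    # Toy model: runtime shrinks with more hosts but pays a per-host
    # coordination cost, so "more hosts" is not automatically better.
    work, overhead = 120.0, 5.0
    return work / len(resources) + overhead * len(resources)

candidates = [
    ["a"],                           # 120/1 + 5*1 = 125.0
    ["a", "b"],                      # 120/2 + 5*2 =  70.0
    ["a", "b", "c"],                 # 120/3 + 5*3 =  55.0
    ["a", "b", "c", "d", "e", "f"],  # 120/6 + 5*6 =  50.0
]

best = min(candidates, key=perf_model)
print(len(best), perf_model(best))  # the 6-host collection wins here
```

During execution the same model backs the performance contract: the monitor compares observed progress against the prediction and flags a (soft, fuzzy-logic) violation when they diverge enough to justify remapping.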

GrADS Program Execution System (diagram: an Application Manager, one per application, holds the Performance Model and Scheduler/Resource Negotiator for the Configurable Application; the Binder performs mapping and sensor insertion and launches onto Grid Resources and Services; the Contract Monitor and GrADS Information Repository close the loop)

GrADSoft Architecture (same diagram, highlighting the Program Preparation System: Source Application, Libraries, Software Components, and the Whole-Program Compiler that produces the Configurable Object Program, with Performance Feedback flowing back in)

Program Preparation Tools
• Goal: provide tools to support the construction of Grid-ready applications (in the GrADS framework)
• Performance modeling
  — Challenge: synthesis and integration of performance models
    – Combine expert knowledge, trial execution, and scaled projections
  — Focus on binary analysis, derivation of scaling factors
• Mapping
  — Construction of mappers from parallel programs
    – Mapping of task graphs to resources (graph clustering)
  — Integration of mappers and performance modelers from components
• High-level programming interfaces
  — Problem-solving systems: integration of components

Generation of Mappers
• Start from parallel program
  — Typically written using a communication library (e.g. MPI)
  — Can be composed from library components
• Construct a task graph
  — Vertices represent tasks
  — Edges represent data sharing
    – Read-read: undirected edges
    – Read-write in any order: directed edges (dependences)
    – Weights represent volume of communication
  — Identify opportunities for pipelining
• Use a clustering algorithm to match tasks to resources
  — One option: global weighted fusion
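The clustering step can be illustrated with a greedy edge-fusion heuristic — a simple stand-in for the slide's "global weighted fusion", not the GrADS algorithm itself: repeatedly merge the two task clusters joined by the heaviest communication edge until one cluster remains per resource, so the largest communication volumes stay on-node.

```python
# Greedy task-graph clustering sketch (hypothetical example data).
edges = {            # (task, task) -> communication volume
    ("t0", "t1"): 30,
    ("t1", "t2"): 5,
    ("t2", "t3"): 40,
    ("t0", "t3"): 2,
}
tasks = ["t0", "t1", "t2", "t3"]
n_resources = 2

parent = {t: t for t in tasks}   # union-find forest over tasks

def find(t):
    while parent[t] != t:
        t = parent[t]
    return t

n_clusters = len(tasks)
# Visit edges heaviest-first; fuse endpoints while too many clusters remain.
for (a, b), _w in sorted(edges.items(), key=lambda kv: -kv[1]):
    ra, rb = find(a), find(b)
    if ra != rb and n_clusters > n_resources:
        parent[ra] = rb
        n_clusters -= 1

groups = {t: find(t) for t in tasks}
# The heavy edges (t2,t3)=40 and (t0,t1)=30 end up intra-cluster; only the
# light edges (weight 5 and 2) cross the cluster boundary.
print(groups["t2"] == groups["t3"], groups["t0"] == groups["t1"])
```

Directed (read-write) edges would additionally constrain ordering and expose the pipelining opportunities the slide mentions; this sketch only uses the weights.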

Constructing Scalable, Portable Models
• Construct application signatures
• Measure static characteristics
• Measure dynamic characteristics for multiple executions
  — computation
  — memory access locality
  — message frequency and size
• Determine sensitivity of aggregate dynamic characteristics to
  — data size
  — processor count
  — machine characteristics
• Build the model via integration
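The "scaled projection" idea amounts to fitting a parametric model to a few trial executions and extrapolating. A minimal sketch with invented data and an assumed model form T(n, p) ≈ a·(n/p) + b·p, fit by 2×2 least squares (real GrADS models combine many more characteristics):

```python
# Fit a toy runtime model to trial executions, then project to an
# unmeasured configuration. Data and model form are illustrative only.
runs = [  # (data size n, processor count p, measured time in seconds)
    (1000, 2, 54.0),
    (2000, 4, 58.0),
    (4000, 8, 66.0),
    (4000, 16, 57.0),
]

# Normal equations for the two features x1 = n/p (work) and x2 = p (overhead).
s11 = s12 = s22 = y1 = y2 = 0.0
for n, p, t in runs:
    x1, x2 = n / p, float(p)
    s11 += x1 * x1; s12 += x1 * x2; s22 += x2 * x2
    y1 += x1 * t;   y2 += x2 * t
det = s11 * s22 - s12 * s12
a = (y1 * s22 - y2 * s12) / det
b = (s11 * y2 - s12 * y1) / det

predict = lambda n, p: a * (n / p) + b * p
print(round(a, 2), round(b, 2), round(predict(8000, 32), 1))
# Recovers a=0.1, b=2.0 (the data was generated from them) and
# projects 89.0 s for the larger, never-measured run.
```

Sensitivity analysis in the slide's sense corresponds to checking how the fitted coefficients move as data size, processor count, and machine characteristics vary across the trial runs.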

High Level Programming
• Rationale
  — programming is hard, and getting harder with new platforms
  — professional programmers are in short supply
  — high performance will continue to be important
• Strategy: Make the End User a Programmer
  — professional programmers develop components
  — users integrate components using:
    – problem-solving environments (PSEs) based on scripting languages (possibly graphical)
    – examples: Visual Basic, Tcl/Tk, AVS, Khoros
• Achieving High Performance
  — translate scripts and components to common intermediate language
  — optimize the resulting program using whole-program compilation

Whole-Program Compilation (diagram: Script, Component Library, and User Library enter the Script Translator, producing Intermediate Code for the Global Optimizer and Code Generator)
Problem: long compilation times, even for short scripts!
Problem: expert knowledge on specialization lost

Telescoping Languages (diagram: an L1 Class Library feeds a Script Compiler Generator — which could run for hours — producing an L1 Compiler that understands library calls as primitives; a script passes through the Script Translator, the L1 Compiler, and the Vendor Compiler to yield the Optimized Application)

Telescoping Languages: Advantages
• Compile times can be reasonable
  — More compilation time can be spent on libraries
  — Script compilations can be fast
    – Components reused from scripts may be included in libraries
• High-level optimizations can be included
  — Based on specifications of the library designer
    – Properties often cannot be determined by compilers
    – Properties may be hidden after low-level code generation
• User retains substantive control over language performance
  — Mature code can be built into a library and incorporated into language
• Reliability can be improved
  — Specialization by compilation framework, not user

Applications
• Matlab Compiler
  — Automatically generated from LAPACK or ScaLAPACK
• Matlab SP*
  — Based on signal processing library
• Optimizing Component Integration System
  — DOE Common Component Architecture
  — High component invocation costs
• Generator for ARPACK
  — Avoid recoding developer version by hand
• System for Analysis of Cancer Experiments*
  — Based on S+ (collaboration with M. D. Anderson Cancer Center)
• Flexible Data Distributions in HPF
  — Data distribution == collection of interfaces that meet specs
• Generator for Grid Computations*
  — GrADS: automatic generation of NetSolve

Testbeds
• Goal:
  — Provide vehicle for experimentation with the dynamic components of the GrADS software framework
• MacroGrid (Carl Kesselman)
  — Collection of processors running Globus and GrADS framework
    – Consistent software environment
  — At all 9 GrADS sites (but 3 are really useful)
    – Availability listed on web page
  — Permits experimentation with real applications
• MicroGrid (Andrew Chien)
  — Cluster of processors (currently Compaq Alphas and x86 clusters)
  — Runs standard Grid software (Globus, Nexus, GrADS middleware)
  — Permits simulation of varying loads and configurations
    – Stress GrADS components (performance modeling and control)

Research Strategy
• Applications Studies
  — Prototype a series of applications using components of envisioned execution system
    – ScaLAPACK and Cactus demonstration projects
• Move from Hand Development to Automated System
  — Identify key components that can be isolated and built into a Grid execution system
    – e.g., prototype reconfiguration system
  — Use experience to elaborate design of software support systems
• Experiment
  — Use testbeds to evaluate results and refine design

Progress Report
• Testbeds Working
• Preliminary Application Studies Complete
  — ScaLAPACK and Cactus
  — GrADS functionality built in

ScaLAPACK Across 3 Clusters (chart: time in seconds, 0–3500, vs. matrix size, 0–20,000, for processor combinations drawn from the OPUS, TORC, and CYPHER clusters — e.g. 5 OPUS; 8 OPUS; 6 OPUS, 5 CYPHER; 8 OPUS, 6 CYPHER; 8 OPUS, 2 TORC, 6 CYPHER; 8 OPUS, 4 TORC, 4 CYPHER; 2 OPUS, 4 TORC, 6 CYPHER)

Largest Problem Solved
• Matrix of size 30,000
  — 7.2 GB for the data
  — 32 processors to choose from at UIUC and UT
    – Not all machines have 512 MB; some have as little as 128 MB
  — Performance model chose 17 processors in 2 clusters from UT
  — Computation took 84 minutes
    – 3.6 Gflop/s total
    – 210 Mflop/s per processor
    – Processors are 500 MHz, i.e. 500 Mflop/s peak
    – ScaLAPACK on a cluster of 17 processors would get about 50% of peak
    – For this Grid computation, 20% less than ScaLAPACK
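The slide's figures are mutually consistent, which is worth checking: an n×n double-precision matrix, the usual 2/3·n³ flop count for LU factorization (an assumption here; the slide does not name the operation), 17 processors, 84 minutes.

```python
# Sanity-check the slide's numbers for the 30,000 x 30,000 run.
n, procs, minutes = 30_000, 17, 84

gbytes = n * n * 8 / 1e9                        # 8-byte doubles
gflops = (2 / 3) * n**3 / (minutes * 60) / 1e9  # assumed LU flop count
per_proc_mflops = gflops / procs * 1000

print(round(gbytes, 1))        # 7.2 GB, matching the slide
print(round(gflops, 1))        # 3.6 Gflop/s total, matching the slide
print(round(per_proc_mflops))  # 210 Mflop/s per processor, matching the slide
```

210 Mflop/s against a 500 Mflop/s peak is 42% of peak, which is indeed about 20% below the ~50% a dedicated 17-processor ScaLAPACK cluster achieves, as the slide states.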

PDSYEVX – Timing Breakdown (chart slide)

Cactus (diagram: SDSC IBM SP, 1024 procs, grid 5 x 12 x 17 = 1020, and NCSA Origin Array, 256+128 procs, grid 5 x 12 x (4+2+2) = 480; Gig-E at 100 MB/sec within sites, connected by an OC-12 line — but only 2.5 MB/sec)
• Solved equations for gravitational waves (real code)
  — Tightly coupled, communications required through derivatives
  — Must communicate 30 MB/step between machines
  — Time step takes 1.6 sec
• Used 10 ghost zones along direction of machines: communicate every 10 steps
• Compression/decompression on all data passed in this direction
• Achieved 70-80% scaling, ~200 GF (only 14% scaling without tricks)
• Gordon Bell Award Winner at SC’2001
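The arithmetic behind the ghost-zone trick follows directly from the slide's numbers, ignoring compression and communication/computation overlap (so the real run does better than this crude estimate): 30 MB must cross a 2.5 MB/s wide-area link per exchange, against a 1.6 s compute step.

```python
# Why exchanging every 10th step matters (slide's numbers, simplified model).
step_mb, link_mb_s, step_s = 30.0, 2.5, 1.6

transfer_s = step_mb / link_mb_s             # 12 s per wide-area exchange
naive_eff = step_s / (step_s + transfer_s)   # exchange every step

# With 10 ghost zones, machines exchange only every 10th step, so one
# 12 s transfer is amortized over 10 compute steps.
k = 10
amortized_eff = (k * step_s) / (k * step_s + transfer_s)

print(round(transfer_s, 1), round(naive_eff, 2), round(amortized_eff, 2))
# 12.0 s transfer; ~12% efficiency naively (close to the slide's 14%
# without tricks) vs. ~57% amortized; compression and overlap account
# for the rest of the reported 70-80%.
```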

Progress Report
• Testbeds Working
• Application Studies Complete
  — ScaLAPACK and Cactus
  — GrADS functionality built in
• Prototype Execution System Complete
  — All components of Execution System (except rescheduling/migration)
  — Six applications working in new framework
  — Demonstrations at SC 02
    – ScaLAPACK, FASTA, Cactus, GrADSAT
    – In NPACI, NCSA, Argonne, Tennessee, Rice booths
• Prototype Program Preparation Tools Under Development
  — Black-box performance model construction: preliminary experiments
  — Prototype mapper generator complete
    – Generated Grid version of HPF application Tomcatv

SC 02 Demo Applications
• ScaLAPACK
  — LU decomposition of large matrices
• Cactus
  — Solver for gravitational wave equations
  — Collaboration with Ed Seidel’s GridLab
• FASTA
  — Biological sequence matching on distributed databases
• Smith-Waterman
  — Another sequence matching application using a strong algorithm
• Tomcatv
  — Vectorized mesh generation written in HPF
• Satisfiability
  — An NP-complete problem useful in circuit design and verification

Summary
• Goal:
  — Design and build programming systems for the Grid that broaden the community of users who can develop and run applications in this complex environment
• Strategy:
  — Build an execution environment that automates the most difficult tasks
    – Maps applications to available resources
    – Manages adapting to varying loads and changing resources
  — Automate the process of producing Grid-ready programs
    – Generate performance models and mapping strategies semi-automatically
  — Construct programs using high-level domain-specific programming interfaces

Resources
• GrADS Web Site
  — http://hipersoft.rice.edu/grads/
  — Contains:
    – Planning reports
    – Technical reports
    – Links to papers