Performance Engineering Large Scale Computing Systems SC 07

Number of slides: 27

Performance Engineering Large Scale Computing Systems
SC 07 - APART Workshop on: Performance Analysis and Optimization of High-End Computing Systems
Dr. Frederica Darema
CISE/NSF

Outline
• The BIG PICTURE
• Applications Directions
• Computing Platforms Directions
• Research and Technology Directions
• Examples of some advances
• Future Challenges and Opportunities

Science, Engineering, and “Commercial” Applications Environments: how are they shaping up for the future?
What does it entail for Large-Scale Computing, and for Large-Scale High-End Computing?

Small-Scale and Large-Scale Systems
– Increasing complexity of systems and applications …
• Processing at multiple levels
• Computation and data processing, both at the application and the instruments/sensors side
• New Computational Units
  – Beyond commodity microprocessors / superscalar / (D)MT: GPU/(GP)²Us, MC-Ps, MT, FPGAs, GPUs, …
  – Populating: high-end platforms, workstations, visualization servers, data servers, etc.
• Potentially:
  – MC-Ps, FPGAs, GPUs at the application side
  – MC-Ps, FPGAs, GPUs at the data acquisition side
• One kind of processor EVERYWHERE???
• Or a mix of MC-Ps, FPGAs, GPUs???
• Pros and deficiencies in each; advances close gaps
• Complexity persists and increases

Platforms Directions
Timeline (Past, Present, Future):
– Vector Processors
– SIMD MPPs
– Distributed Memory MPs
– Shared Memory MPs
– Distributed Platforms, Heterogeneous Computers and Networks
• Heterogeneity
  – architecture (computer & network)
  – node power (supernodes, MCP)
• Latencies
  – variable (internode, intranode)
• Bandwidths
  – different for different links
  – different based on traffic
[Figure: Petaflops Platform (Grid-in-a-Box) and Distributed Platform; labels: tac-com, alg accelerator, MPP, NOW, fire cntl, data base, SAR, SP]

Applications Directions
Past:
– Mostly monolithic
– Mostly one programming language
– Computation Intensive
– Batch
– Hours/days
Present/Future:
– Multi-Modular
– Multi-Language
– Multi-Developers
– Multi-Source Data
– Computation Intensive
– Data Intensive
– Real Time
– Few Minutes/hours
– Visualization
– Interactive Steering
– Integrated Simulations & Experiments
– Dynamic Data Driven Applications Systems

Example of new applications and systems directions
Dynamic Data Driven Application Systems (DDDAS) (www.cise.nsf.gov/dddas & www.dddas.org)
DDDAS: the ability to dynamically incorporate additional data into an executing application and, in reverse, the ability of an application to dynamically steer the measurement process
[Figure labels: Theory (First Principles); Simulations (Math. Modeling, Phenomenology, Observation Modeling, Design); Experiment, Measurements, Field-Data (on-line/archival), User]
Dynamic Integration of Computation & Measurements/Data (from the Real-Time to the High-End)
Unification of Computing Platforms & Sensors/Instruments
DDDAS guides sensor systems architectures
Challenges:
– Application Simulations Methods
– Algorithmic Stability
– Measurement/Instrumentation Methods
– Computing Systems Software Support
Dynamic Feedback & Control Loop
Software Architecture Frameworks
Synergistic, Multidisciplinary Research
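
The two directions of the DDDAS definition above (data flowing into a running application, and the application steering what is measured next) can be sketched as a small loop. This is only an illustration under stated assumptions: the function name `dddas_loop`, the blending rule with a fixed `gain`, and the "measure where uncertainty is highest" steering policy are all hypothetical and are not taken from any DDDAS project.

```python
# Illustrative DDDAS-style feedback loop; every name here is hypothetical.
from typing import Callable, List


def dddas_loop(
    state: List[float],
    uncertainty: List[float],
    measure: Callable[[int], float],
    steps: int,
    gain: float = 0.5,
) -> List[int]:
    """Run `steps` iterations and return the measurement sites chosen."""
    chosen = []
    for _ in range(steps):
        # Steering: the application picks where to measure next
        # (assumed policy: the site whose estimate is least certain).
        site = max(range(len(state)), key=lambda i: uncertainty[i])
        # Incorporation: blend the new measurement into the running state
        # (assumed rule: simple fixed-gain correction).
        observation = measure(site)
        state[site] += gain * (observation - state[site])
        uncertainty[site] *= 1.0 - gain
        chosen.append(site)
    return chosen


if __name__ == "__main__":
    truth = [1.0, 4.0, 2.0]          # stand-in for the physical system
    state = [0.0, 0.0, 0.0]          # the simulation's current estimate
    uncertainty = [1.0, 3.0, 2.0]
    sites = dddas_loop(state, uncertainty, lambda i: truth[i], steps=4)
    print(sites, state)
```

A real system would replace the fixed-gain update with a proper data-assimilation method and the `measure` callback with an instrument or sensor interface; the sketch only shows where the two couplings sit in the control loop.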

TeraGrid
• A distributed system of unprecedented scale
• 30+ TF, 1+ PB, 40 Gb/s net
• Unified user environment across resources
• User software environment
• User support resources
• Integrated new partners to introduce new capabilities
• Additional computing, visualization capabilities
• New types of resources: data collections, instruments
• Created an initial community of over 500 users, 80 PIs
• Created User Portal in collaboration with NMI
courtesy Charlie Catlett

DDDAS: Beyond Grid Computing
“Extended Grid” – “SuperGRID”: the Application Platform is the computational & measurement system
[Figure labels: Measurement Grids; Computational Platforms; Archival/Stored Data; Sensors; Instruments; Applications; Computational Grids]
SuperGrids: Dynamically Coupled Networks of Data and Co

Examples of TeraGrid Applications
• Aquaporin Mechanism (Schulten, UIUC): animation pointed to by the 2003 Nobel chemistry prize announcement
• Atmospheric Modeling (Droegemeier, OU)
• Reservoir Modeling (Wheeler/UT Austin, Saltz/OSU, Parashar/Rutgers)
• Groundwater/Flood Modeling (Maidment, Wells, UT)
• Lattice-Boltzmann Simulations (Coveney, UCL; Bruce Boghosian, Tufts)
Advanced Support for TeraGrid Applications: TeraGrid staff are “embedded” with applications to create
– Functionally distributed workflows
– Remote data access, storage and visualization
– Distributed data mining
– Ensemble and parameter sweep run and data management
courtesy Charlie Catlett

To address the complexity of today’s and future systems, applications, and their environments, we need systematic modeling and analysis approaches for designing, supporting the runtime of, and managing such systems:
Systems Performance Engineering

Background
• Systems Modeling and Analysis increasingly important:
  – systems design cycle and runtime
  – measurements (static and runtime)
  – functional correctness of hw, hw and sw performance, dependability, reliability, power management, security, debugging, …
• Traditionally/in the past (for example):
  – modeling specific aspects or components, rather than the full system
  – architectural simulators trade speed for accuracy
  – full-system simulators trade accuracy for speed
• Want modeling/simulation capabilities that
  – are accurate, with cycle-level resolution
  – model the entire system completely
  – simulate execution of real workloads (full applications or realistic benchmarks) on top of real operating systems
  – allow users to probe features in the systems (hardware, systems software, application)
• A number of research efforts are addressing such challenges, and more…

System Modeling and Analysis
Develop methods and tools for modeling, measuring, analyzing, evaluating, and predicting the performance, dependability, reliability, runtime management, debugging, security, etc., for design & runtime support of complex computing and communications systems
• Hardware and Software modeling
  – methods, tools and measurements, providing multimodal, hierarchical or multilevel modeling and analysis capabilities of such systems;
  – methods that describe components of the system, but also the system as a whole, and enable assessment of the effects of individual hardware and software layers and components of these systems;
  – ability to describe the system in multiple levels of detail (characteristics and time-scales);
  – combine different (hybrid) methods of describing components and layers, from analytical, statistical, to simulation, etc.
  – performance specification languages and compilers
  – testing & validation of developed methods and tools

System Modeling and Analysis
• Modeling and measurement approaches
  – capabilities to describe, analyze and predict the behavior of the components as well as the systems;
  – analysis and prediction due to characteristics or changes in the application, system software, hardware;
  – multilevel approaches and multi-modal approaches
• Performance Frameworks
  – combine tools in “plug-and-play” fashion
  – multiple views of the system
• Use of systems modeling and analysis methods and tools beyond the design cycle, that is: to support optimized application composition, mapping, and runtime with performance, dependability, and fault tolerance

Systems Modeling and Analysis
[Figure labels]
Performance Frameworks: Application Models; File/IO Models; OS Scheduler Models; Architecture/Network Models; Memory Models; ...
Layers shown:
– Distributed Applications
– Prog. Models, Compilers, Libraries, Tools, Visualization
– Collaboration Environments; Scalable I/O, Data Management, Archiving/Retrieval Services; Authentication/Authorization; Fault Recovery Services
– Distributed Systems Management
– Distributed, Heterogeneous, Dynamic, Adaptive Computing Platforms and Networks
– Memory Technology; CPU Technology; Device Technology; ...

Multiple views of the system
The Operating Systems’ view
[Figure labels]
Models: Application Models; IO/File Models; OS Scheduler Models; Architecture/Network Models; Memory Models; ...
Layers shown:
– Distributed Applications
– Languages, Compilers, Libraries, Tools, Visualization
– Collaboration Environments; Scalable I/O, Data Management, Archiving/Retrieval Services; Authentication/Authorization; Dependability Services; Other Services ...
– Distributed Systems Management
– Distributed, Heterogeneous, Dynamic, Adaptive Computing Platforms and Networks
– Memory Technology; CPU Technology; Device Technology; ...

Technology for integrated feedback & control
Runtime Compiling System (RCS) and Dynamic Application Composition
[Figure labels: Application Model; Dynamic Analysis Situation; Distributed Programming Model; Application Program; Compiler Front-End; Application Intermediate Representation; Compiler Back-End; Launch Application(s); Dynamically Link & Execute; Performance Measurements & Models; Application Components & Frameworks; Adaptable Computing Systems Infrastructure; Distributed Computing Resources; Distributed Platform (tac-com, alg accelerator, MPP, NOW, fire cntl, data base, SAR, SP)]

A great set of efforts is developing systems modeling methods along these directions, leading to performance frameworks
Emphasis on Multidisciplinary Research (across sub-areas of CS)
Application-driven validation of research and technology advances
Collaborations with industry are fruitful
Projects can be found in the proceedings of the Next Generation Software Workshop Series, organized every year in conjunction with IPDPS

GrADS Project & VGrADS
PI: Kennedy (& Dan Reed, Andrew Chien, Fran Berman, Dennis Gannon, Ian Foster, Jack Dongarra, et al.)
Project Goals: To develop program preparation system support for computational Grid applications, and technologies to support efficient run-time management of computational Grid resources and achieve reliable performance under varying load.
GrADSoft Architecture
[Figure labels: Program Preparation System; Program Execution System; Performance Feedback; Software Components; Source Application; Libraries; Whole-Program Compiler; Configurable Object Program; Performance Problem; Service Negotiator; Real-time Performance Monitor; Negotiation; Scheduler; Dynamic Optimizer; Grid Runtime System]
Performance Contracts - At the Heart of the GrADS Model:
• Fundamental mechanism for managing mapping and execution
What are they?
• Mappings from resources to performance
• Mechanisms for determining when to interrupt and reschedule
Abstract Definition
• Random Variable: r(A, I, C, t0) with a probability distribution
• A = app, I = input, C = configuration, t0 = time of initiation
• Important statistics: lower and upper bounds (95% confidence)
Challenge
• When should a contract be considered violated?
• Strict adherence balanced against cost of reconfiguration
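
The performance-contract idea above (bounds derived from a distribution, and a violation rule that weighs strict adherence against the cost of reconfiguring) can be illustrated with a short sketch. It is not the GrADS implementation: the normal approximation for the 95% bounds, the `patience` rule that tolerates transient excursions, and both function names are assumptions made for this example.

```python
# Illustrative performance-contract check; names and policy are hypothetical.
from statistics import mean, stdev
from typing import Iterable, List, Tuple


def contract_bounds(samples: List[float], z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% bounds as mean +/- z * stdev.

    Assumes the predicted performance r(A, I, C, t0) is roughly normal;
    the slides only state that r has a probability distribution.
    """
    m, s = mean(samples), stdev(samples)
    return m - z * s, m + z * s


def should_reschedule(
    observed: Iterable[float], bounds: Tuple[float, float], patience: int = 3
) -> bool:
    """Declare a violation only after `patience` consecutive out-of-bound readings.

    A larger `patience` tolerates transient load spikes and so avoids paying
    the reconfiguration cost too eagerly (an assumed policy).
    """
    low, high = bounds
    streak = 0
    for value in observed:
        streak = streak + 1 if not (low <= value <= high) else 0
        if streak >= patience:
            return True
    return False


if __name__ == "__main__":
    bounds = contract_bounds([10.0, 11.0, 9.0, 10.0, 10.0])
    print(bounds)
    print(should_reschedule([10.0, 15.0, 10.0, 15.0, 15.0], bounds))
    print(should_reschedule([10.0, 15.0, 16.0, 17.0], bounds))
```

The `patience` parameter is one simple way to express the trade-off named in the "Challenge" bullets; a production scheduler could instead compare the estimated cost of migration against the predicted loss from staying put.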

Dynamic Adaptive Systems Software for Robust and Dependable Large-Scale Systems {Adve &

Montage - An Integrated End-to-End Design and Development Framework for Wireless Networks
• PI: Rappaport (& Browne, Shakkottai, Ramakrishnan, Varadarajan) {UTAustin, VTech}
• Project advanced the state of the art in fast and efficient methods for simulating large-scale networks
• Deliverables:
  – Generated a wide range of analytical and simulation-based modeling methods
  – Developed a wireless channel simulator (the Site Specific Software Simulator for Wireless, S^4W)
    • S^4W was used by the PIs to develop more powerful and efficient techniques for improved end-to-end network performance for users of both wired and wireless networks. S^4W has been used by several universities (in the US and Canada), industry (Boeing), NASA, and commercial business (Schlotzsky’s Deli)
    • Developed fast simulation capabilities of networks
    • Fast hybrid network simulation using spatiotemporal dilations; FluNet: hybrid simulation-emulation environment, based on combined fluid models
    • Developed scalable parallel discrete event simulator (Shakkottai, Ramakrishnan)
    • Open Network Emulator
  – Highly scalable distributed direct code execution environment; supports both simulation and emulation in a single tool; novel method, using the notion of Relativistic Time, so that the global virtual time is derived by dilating the real (wall-clock) time
  – Productivity with Performance through Components & Composition (Browne)
    • P-COM^2 environment: automated compile-time/runtime composition of parallel programs, applied here to performance modeling
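
The slide says only that global virtual time is derived by dilating wall-clock time. One simple reading of that statement is a linear mapping with a constant dilation factor, sketched below. The function names, the constant factor, and the convention that a factor of 10 means "ten wall-clock seconds per virtual second" are assumptions for illustration; the Open Network Emulator's actual mechanism is not described in the source.

```python
# Illustrative linear time dilation; names and convention are hypothetical.

def virtual_time(wall_now: float, wall_start: float, dilation: float) -> float:
    """Map wall-clock time to virtual time under a constant dilation factor.

    dilation > 1 slows the virtual clock relative to the wall clock,
    giving the host more real time to execute each virtual second.
    """
    if dilation <= 0:
        raise ValueError("dilation must be positive")
    return (wall_now - wall_start) / dilation


def wall_deadline(virtual_t: float, wall_start: float, dilation: float) -> float:
    """Inverse mapping: the wall-clock instant at which virtual_t is reached."""
    if dilation <= 0:
        raise ValueError("dilation must be positive")
    return wall_start + virtual_t * dilation


if __name__ == "__main__":
    # 20 wall-clock seconds after start, with dilation 10, is virtual second 2.
    print(virtual_time(25.0, 5.0, 10.0))
    print(wall_deadline(2.0, 5.0, 10.0))
```

Because every node can compute the same mapping from its local clock, no explicit global-time messages are needed in this simplified picture, provided the clocks are synchronized; that property is an inference from the sketch, not a claim from the slide.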

A Fast, Cycle-Accurate Computer System Technology

Fast and Accurate Simulation of Scalable Computer Systems {Falsafi & Hoe}
ProtoFlex addresses full-system and scaling complexity for FPGA-based simulation in two ways. Hybrid emulation (a) avoids reconstruction of the entire system on FPGAs. Interleaved emulation (b) lets us decouple the size and complexity of the simulated system from that of the underlying FPGA host.
[Figure: (a) Hybrid Emulation; (b) Multiple-context Interleaved Emulation]
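
As a rough software analogy for the multiple-context interleaving named in panel (b), the sketch below time-multiplexes several simulated contexts onto a single execution engine in round-robin order, so the number of simulated contexts is independent of the number of engines. This is a generic illustration, not ProtoFlex's design: the `interleave` function, the list-of-strings "programs", and the strict round-robin policy are all assumptions.

```python
# Illustrative round-robin interleaving of contexts on one engine (hypothetical).
from typing import List, Tuple


def interleave(contexts: List[List[str]]) -> List[Tuple[int, str]]:
    """Execute one step per live context per round; return the execution trace."""
    pcs = [0] * len(contexts)  # per-context program counter (saved state)
    trace: List[Tuple[int, str]] = []
    while any(pc < len(ctx) for pc, ctx in zip(pcs, contexts)):
        for cid, ctx in enumerate(contexts):
            if pcs[cid] < len(ctx):
                # The single engine runs the next step of context `cid`.
                trace.append((cid, ctx[pcs[cid]]))
                pcs[cid] += 1
    return trace


if __name__ == "__main__":
    for cid, step in interleave([["a0", "a1"], ["b0"], ["c0", "c1", "c2"]]):
        print(cid, step)
```

Adding a fourth simulated context only lengthens each round; it does not require a second engine, which is the decoupling the slide text describes.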

Examples of Modeling & Analysis Efforts (Performance Modeling Frameworks)
• FPGA Accelerated Simulation Technologies: functional simulator + timing model (implemented in FPGAs) for the fastest cycle-accurate, full-system simulator (within 1-3 orders of magnitude of real hardware)
• Fast and accurate simulator through sampling: checkpointing to capture the microarchitectural state, and performing cycle-accurate simulation in the selected sampled regions, to simulate full (unmodified) applications
• Structural and composable performance simulation of complex systems: the effort constructs simulators from system descriptions and component libraries (e.g., an Itanium 2 simulator accurate to 3% of actual hardware was produced in 11 weeks)
• Real-time large-scale network simulation environment, through a hybrid of continuous and event-driven simulation paradigms: a fluid-model representation of the mean traffic and a packet-oriented simulation. The hybrid testbed will combine advantages of analytical models, simulation and emulation, and physical network testbeds.
• Component-based software environment for simulation, emulation and synthesis of network protocols, integrating model-checking with event-driven simulations to allow performance evaluation and protocol validation in a unified way
• End-to-end design and development framework for large-scale wireless networks, composed through capabilities developed under problem solving environments: application compile-time and runtime composition methods to compose the simulation and emulation systems for setting up experimental testbeds; performance engineering methods (of the POEMS project); the Weaves runtime and the P-COM for parallel/distributed execution of discrete event simulations; integration of low-level channel models with higher-level protocol layers; and the relativistic time temporal model developed under the collaboration.
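
The sampling approach in the list above can be illustrated with a toy estimator: run the expensive, detailed model only inside periodic sample windows, then extrapolate the measured cycles-per-instruction to the whole run. This is a minimal sketch under assumptions: `cost` stands in for a cycle-accurate simulator, the systematic (fixed-period) sampling scheme and the function name are hypothetical, and checkpointing of microarchitectural state is not modeled.

```python
# Illustrative sampled simulation; `cost` is a stand-in for a detailed simulator.
from typing import Callable


def sampled_cycle_estimate(
    cost: Callable[[int], int], total: int, period: int, window: int
) -> float:
    """Estimate total cycles by detailed-simulating `window` instructions
    at the start of every `period` instructions and extrapolating the CPI."""
    if total <= 0 or period <= 0 or window <= 0:
        raise ValueError("total, period and window must be positive")
    sampled_cycles = 0
    sampled_instrs = 0
    for start in range(0, total, period):
        end = min(start + window, total)
        # Detailed (slow) simulation happens only inside the sample window.
        sampled_cycles += sum(cost(i) for i in range(start, end))
        sampled_instrs += end - start
    cpi = sampled_cycles / sampled_instrs
    return cpi * total


if __name__ == "__main__":
    # Toy workload: even-numbered instructions take 1 cycle, odd ones take 3.
    toy_cost = lambda i: 1 if i % 2 == 0 else 3
    print(sampled_cycle_estimate(toy_cost, total=1000, period=100, window=10))
```

In this toy case only 100 of the 1000 instructions are simulated in detail. Real workloads have phases, so the accuracy of such an estimate depends on how the sample windows are chosen and on warming up cache and predictor state before each window.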

Examples of Modeling & Analysis Efforts (Application modeling, resource management, …)
• Modeling system for enabling algorithm designers and programmers to develop, evaluate and compare application algorithms for CMP/CMT systems
• Software tools to enable access to coordinated information collected through hardware-based profiling of local and remote memory access of application computation and communication patterns
• Dynamic profiling of application phases for optimizing power consumption under set performance constraints for reconfigurable multi-core environments and data servers
• Cross-platform performance estimation by partial execution of applications, capturing computation and communication parameters, and generalizing prediction to problem-scaling scenarios, in parallel and distributed platforms
• Language support for continuous monitoring of distributed systems, grids and other data-centric and network systems
• Adaptive resource sharing mechanisms autonomically matching resources to dynamically changing needs via statistical and stochastic approaches
• Data-driven resource allocation in complex systems, through workload characterization, analytical models and policy development
• Compiler-enabled model- and measurement-driven adaptation environment for dependability and performance (performability)
• Engineering reliability at software design time by coupling software component architectural models with statistical methods to address uncertainties in the design stage
• Tools for pro-active runtime system health monitoring and enhancement for large-scale parallel systems: collecting data over extended periods of time and in real time, analyzing it through on-line models, filtering and correlating evolving failure data with respect to factors such as workload and operating temperature, and using this information to schedule or checkpoint jobs

Summary Thoughts
• Large-scale high-end systems cannot be treated as isolated platforms
• Such systems demand enhanced and optimized computation, communication and data management capabilities, in the presence of resource heterogeneity, dynamicity, adaptivity
• Need to advance the technologies that will automate the mapping of complex and dynamic applications on complex platforms with multiple and heterogeneous levels of processors, memory, and networks
• Modeling and Analysis Methods (Performance Engineering of systems) are crucial in enabling optimized design, runtime, and management of such systems

Dynamic Adaptive Systems Software for Robust and Dependable Large-Scale Systems
Award 0406351: A Compiler-Enabled Model- and Measurement-Driven Adaptation Environment for Dependability and Performance
William Sanders and Vikram Adve
Develops compiler-controlled performance data monitoring together with performance models for adaptive and optimized runtime support, in environments where the underlying computational, communication, and storage resources may be changing, as well as environments where the application requirements may also be changing.
Combines and advances in novel directions work on dynamic runtime compilation methods (LLVM) developed by Adve in 0093426 (CAREER) NGS: Techniques and Applications of Dynamic Compilation; and system-level integrated performance methods developed by Sanders in 0228762 - Next Generation Software: An Integrated Framework for Performance Engineering and Resource-Aware Compilation.
In addition to the multidisciplinary work from two sub-areas of computer science (compilers, and performance modeling and analysis), the project includes collaboration with industry, specifically with two senior researchers from AT&T Labs-Research, which provides resources such as production-level software to drive and validate the research methods, and also provides opportunities for student internships at the AT&T Research Lab.
Other technical impacts of the individual projects (LLVM):
The LLVM compiler infrastructure has been publicly distributed since October 2003 and downloaded well over 2000 times since. It has attracted at least 40 serious users in academia (instructors and researchers) and industry (startups and established companies). Apple Computer has not only adopted LLVM but has also set up an active group of developers working on incorporating LLVM in Apple’s products, such as the next release of MacOS due in Spring 2007. A paper, Automatic Pool Allocation, on novel methods developed under the project and incorporated in LLVM, won a Best Paper award at the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), the premier conference in the area of compilers.
Other technical impacts of the individual projects (Möbius):
Möbius is a performance engineering framework and tool for the evaluation of distributed and parallel computing systems, accounting for system components including the application software itself, the operating system, and the underlying computing and communication hardware. The framework provides a means by which multiple, heterogeneous models can be composed together, each representing a different module (software or hardware), component, or view of the system.
Möbius has made a significant worldwide impact in the research area of stochastic model analysis. The impact spans both academic and commercial domains. In addition to being the principal tool used in the graduate-level system reliability courses at the University of Illinois, USA and the Univ. of Florence, Italy, Möbius has been licensed to over 150 university sites throughout the world for teaching and research purposes.
International partnerships: research groups from the Univ. of Twente, Dortmund University, University of the Federal Armed Forces München, and Saarland University are partnering with the Möbius team to develop plug-in modules for the Möbius framework. The first International Möbius Developer’s Working Group meeting was held in Sept. 2004, further increasing the number of groups that use Möbius in their research.
Möbius has also been licensed for commercial use to many companies, including Motorola, Iridium, Pioneer Hybrids, Windber Research Institute, General Dynamics and Boeing. For example, Möbius has been used for numerous telecommunications and computer system applications at Motorola and was designated one of three company-wide system availability modeling packages.
Recently, researchers have begun to use Möbius for biological applications; over 25 universities, Pioneer Hybrid (the world's largest seed producer), and Windber Research Incorporated (a non-profit research organization with projects studying the disease progression of breast cancer) have licensed it for use with biological systems.