c6f1585e933d498adae080406aeb0266.ppt
- Number of slides: 41
SAGA-based Frameworks: Supporting Application Usage Modes
Shantenu Jha
Director, Cyber-Infrastructure Development, CCT
Asst Research Professor, CS
e-Science Institute, Edinburgh
http://www.cct.lsu.edu/~sjha
http://saga.cct.lsu.edu
Outline (1)
• Understanding Distributed Applications (DA)
– How DA differ from HPC or parallel applications; challenges of DA
– Development Objectives (IDEAS)
• Understanding SAGA (and the SAGA landscape)
– Rough taxonomy of distributed applications
• Using SAGA to develop distributed applications
– Examples: applications and application frameworks
– Discuss how IDEAS are met
• Some SAGA-based tools and projects
– Advantages of standards
• Derive (initial) user requirements for FutureGrid
Understanding Distributed Applications: Critical Perspectives
• The number of applications that utilize multiple sites sequentially, concurrently or asynchronously is low (~5%)
– Not referring to tightly-coupled applications across multiple sites
• Distributed CI: Is the whole greater than the sum of the parts?
• Managing data and applications across multiple resources is (increasingly) hard
– Distributed Data/Jobs vs Bring it to the Computing
– Compute where the data is, or move the data to where the computing is
• Challenges are set to get worse, qualitatively and quantitatively
– Increasing complexity, heterogeneity and scale
Understanding Distributed Applications
• Require coordination over multiple, distributed sites
– Scale-up and scale-out
– Peta/Exa/Atta - scientific applications requiring multiple runs, ensembles, workflows etc.
• Core characteristics of logically and physically distributed applications are the SAME
• Application Usage Mode: composed using the application as the UNIT of execution
– Not a workflow (i.e., composed using control and data flow)
– Usage Mode: closer to an abstract workflow (template)
– Examples: run once; a set of copies of an application with varied input data (ensemble); loosely-coupled ensembles ...
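The ensemble usage mode can be sketched in a few lines. This is an illustration only, using the Python standard library rather than SAGA: `application` is a hypothetical stand-in for one run of a real code, and `run_ensemble` is an assumed helper name. The application is the unit of execution, and only the input varies between copies.

```python
from concurrent.futures import ThreadPoolExecutor


def application(temperature):
    """Hypothetical stand-in for one complete run of an application."""
    return {"temperature": temperature, "energy": -1.5 * temperature}


def run_ensemble(inputs, max_concurrent=4):
    """Run one copy of the application per input; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(application, inputs))


# Four copies of the same application, differing only in input data.
results = run_ensemble([300, 310, 320, 330])
```

In a distributed setting each copy would be a job on some resource instead of a thread; the structure of the usage mode stays the same.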
Understanding Distributed Applications: Development Challenges
• Fundamentally a hard problem:
• Dynamic resources, heterogeneous resources
• Add to it: complex underlying infrastructure
• Programming systems for distributed applications:
• Incomplete? Customization? Extensibility?
• What should the end-user control? Must control?
• Computational models of distributed computing
• Range of DA, no clear taxonomy
• More than (peak) performance
• Application Usage Mode
• Interplay of application, infrastructure and usage mode
Understanding Distributed Applications: Implicit vs Explicit?
• Which approach (implicit vs explicit) is used depends on:
– How is the application used?
– Is there a need to control/marshal more than one resource?
– Why are distributed resources being used?
– How much can be kept out of the application?
• Can't predict in advance? Not obvious what to do; application-specific metric
• If possible, applications should not be explicitly distributed
• GATEWAYS approach: implicit for the end-users
• Supporting Applications? Or Application Usage Modes?
Understanding Distributed Applications: Development Objectives
• Interoperability: Ability to work across multiple distributed resources
• Distributed Scale-Out: The ability to utilize multiple distributed resources concurrently
• Extensibility: Support new patterns/abstractions, different programming systems, functionality & infrastructure
• Adaptivity: Response to fluctuations in dynamic resources and in the availability of dynamic data
• Simplicity: Accommodate the above distributed concerns at different levels easily…
• Challenge: How to develop DA effectively and efficiently with the above as first-class objectives?
SAGA: Basic Philosophy
• There is a lack of programmatic approaches that:
– Provide general-purpose, common grid functionality for applications and thus hide underlying complexity and varying semantics
– Hide "bad" heterogeneity and provide the means to address "good" heterogeneity
– Provide building blocks upon which to construct higher levels of functionality and abstractions
– Meet the needs of a broad spectrum of applications: simple distributed scripts, gateways, smart applications and production-grade tooling, workflows…
• Simple, integrated, stable, uniform and high-level interface
– Simple and Stable: 80:20 restricted scope and standard
– Integrated: Similar semantics & style across commonly used distributed functional requirements
– Uniform: Same interface for different distributed systems
• SAGA: Provides application* developers with the basic units required to compose high-level functionality across different distributed systems
(*) One person's Application is another person's Tool
SAGA: In a Thousand Words
SAGA Job Submission: Role of Adaptors (middleware binding)
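The role of adaptors can be illustrated with a minimal sketch. This is not the SAGA API: `JobService`, `ForkAdaptor` and `EchoAdaptor` are hypothetical names, and the "ssh" backend is a placeholder that submits nothing. The point is only that application code talks to one uniform interface while the URL scheme selects the middleware binding.

```python
import subprocess
import sys
from urllib.parse import urlparse


class ForkAdaptor:
    """Hypothetical adaptor: runs the job as a local child process."""

    def run(self, host, argv):
        done = subprocess.run(argv, capture_output=True, text=True, check=True)
        return done.stdout.strip()


class EchoAdaptor:
    """Placeholder for a remote backend; only reports what it would submit."""

    def run(self, host, argv):
        return f"would submit {argv!r} to {host}"


# Scheme -> adaptor binding (an assumed registry, for illustration only).
ADAPTORS = {"fork": ForkAdaptor(), "ssh": EchoAdaptor()}


class JobService:
    """Uniform front end; the URL scheme picks the adaptor."""

    def __init__(self, url):
        parts = urlparse(url)
        if parts.scheme not in ADAPTORS:
            raise ValueError(f"no adaptor for scheme {parts.scheme!r}")
        self._adaptor = ADAPTORS[parts.scheme]
        self._host = parts.hostname

    def run(self, argv):
        return self._adaptor.run(self._host, argv)


# The calling code is identical for both backends; only the URL differs.
local = JobService("fork://localhost")
remote = JobService("ssh://remote.example.org")
print(local.run([sys.executable, "-c", "print('hello')"]))
print(remote.run(["/bin/date"]))
```

Supporting a new middleware then means registering one more adaptor, without touching application code.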
SAGA Job API: Example
SAGA Job Package
SAGA File Package
File API: Example
SAGA Advert
SAGA Advert API: Example
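As a rough illustration of the advert idea (a shared store of attributes that distributed components use to publish and look up state), here is a sketch on top of SQLite, one of the backends listed under the advert adaptors. It is not the SAGA Advert API: `AdvertStore` and its methods are hypothetical names, and the hierarchical-path layout is an assumption.

```python
import sqlite3


class AdvertStore:
    """Hypothetical path -> attributes store backed by SQLite."""

    def __init__(self, database=":memory:"):
        self._db = sqlite3.connect(database)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS advert ("
            "path TEXT, key TEXT, value TEXT, PRIMARY KEY (path, key))"
        )

    def set_attribute(self, path, key, value):
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO advert VALUES (?, ?, ?)",
                (path, key, value),
            )

    def get_attribute(self, path, key):
        row = self._db.execute(
            "SELECT value FROM advert WHERE path = ? AND key = ?", (path, key)
        ).fetchone()
        return None if row is None else row[0]

    def list_entries(self, prefix):
        """Return all entry paths that start with the given prefix."""
        rows = self._db.execute(
            "SELECT DISTINCT path FROM advert "
            "WHERE substr(path, 1, ?) = ? ORDER BY path",
            (len(prefix), prefix),
        )
        return [row[0] for row in rows]


store = AdvertStore()
store.set_attribute("/demo/workers/0", "state", "idle")
store.set_attribute("/demo/workers/1", "state", "busy")
store.set_attribute("/demo/workers/0", "state", "busy")  # overwrite earlier value
```

A master could poll `list_entries("/demo/workers/")` to discover workers, while each worker updates its own `state` attribute.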
SAGA: Other Packages
SAGA: Implementations
• Currently there are several implementations under active development:
– C++ Reference Implementation (LSU), OMII-UK http://saga.cct.lsu.edu/cpp/
– Java Implementation (VU Amsterdam), part of the OMII-UK project http://saga.cct.lsu.edu/java/
– JSAGA (IN2P3/CNRS) http://grid.in2p3.fr/jsaga/
– DEISA (partial): job, file package
• C++: currently at v1.3.3 (October 2009)
• Python bindings to the C++ implementation available
• Good-faith effort to keep things working
SAGA: Available Adaptors
• Job Adaptors: Fork (localhost), SSH, Condor, Globus GRAM2, OMII GridSAM, Amazon EC2, Platform LSF
• File Adaptors: Local FS, Globus GridFTP, Hadoop Distributed Filesystem (HDFS), CloudStore KFS, OpenCloud Sector-Sphere
• Replica Adaptors: PostgreSQL/SQLite3, Globus RLS
• Advert Adaptors: PostgreSQL/SQLite3, Hadoop HBase, Hypertable
SAGA: Available Adaptors
• Other Adaptors: default RPC / Stream / SD
• Planned Adaptors: CURL file adaptor, gLite job adaptor
• Open issues:
– Consolidating the adaptor code base and adding rigorous tests in order to improve adaptor quality
– The Capability Provider Interface (CPI, the 'Adaptor API') is not documented or standardized (yet), but looking at existing adaptor code should get you started if you want to develop your own adaptor
• Proof by example ...
SAGA and Distributed Applications
Taxonomy of Distributed Applications
• Example of Distributed Execution Mode: Implicitly Distributed
– 1000 job submissions on the TG
– SAGA shell example/tutorial
• Example of Explicit Coordination and Distribution: Explicitly Distributed
– DAG-based workflows
– EnKF-HM application
• Example of SAGA-based Frameworks
– MapReduce, Pilot-Jobs
Developing Distributed Application Frameworks
• Frameworks: logical structure for capturing application requirements, characteristics & patterns
• Pattern: commonly recurring modes of computation
– Programming, deployment, execution, data-access ...
• Abstraction: mechanism to support patterns and application characteristics
• Frameworks are designed to either:
– Support patterns: Map-Reduce, Master-Worker, Hierarchical Job-Submission
– Provide the abstractions and/or support the requirements & characteristics of applications
– i.e. encode a Usage Mode using a Framework
Abstractions for Distributed Computing (1)
• BigJob: Container Task
• Adaptive, Type A: fixed number of replicas; vary the cores assigned
Abstractions for Distributed Computing (2) SAGA Pilot-Job (Glide-In)
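The pilot-job abstraction can be sketched as a small simulation. This is a simplified model, not the BigJob implementation: `PilotJob` and `drain` are hypothetical names, and sub-job execution is reduced to bookkeeping of cores. It shows the key property that, once the container job holds its cores, sub-jobs are placed inside it without going through the batch queue again.

```python
from collections import deque


class PilotJob:
    """Hypothetical container job holding a fixed number of cores."""

    def __init__(self, resource, cores):
        self.resource = resource
        self.free_cores = cores
        self.running = []   # names of sub-jobs currently holding cores
        self._sizes = {}    # sub-job name -> cores it holds

    def try_start(self, name, cores):
        """Start a sub-job if enough cores are free; no batch queue involved."""
        if cores > self.free_cores:
            return False
        self.free_cores -= cores
        self._sizes[name] = cores
        self.running.append(name)
        return True

    def finish(self, name):
        """Release the cores held by a finished sub-job."""
        self.running.remove(name)
        self.free_cores += self._sizes.pop(name)


def drain(pilot, subjobs):
    """Run all (name, cores) sub-jobs through the pilot, wave by wave."""
    pending = deque(subjobs)
    waves = []
    while pending:
        wave = []
        for _ in range(len(pending)):
            name, cores = pending.popleft()
            if pilot.try_start(name, cores):
                wave.append(name)
            else:
                pending.append((name, cores))  # retry in a later wave
        if not wave:
            raise ValueError("a sub-job needs more cores than the pilot holds")
        waves.append(wave)
        for name in wave:
            pilot.finish(name)
    return waves


pilot = PilotJob("cluster-a", cores=8)
waves = drain(pilot, [("a", 4), ("b", 4), ("c", 4), ("d", 8)])
```

Varying the cores requested per sub-job while keeping the number of sub-jobs fixed corresponds to the "Type A" adaptivity named on the previous slide.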
Coordinate Deployment & Scheduling of Multiple Pilot-Jobs
Distributed Adaptive Replica Exchange (DARE) Scale-Out, Dynamic Resource Allocation and Aggregation
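For orientation, the exchange step at the heart of replica exchange can be sketched as follows, assuming temperature-based replica exchange with the standard Metropolis criterion. The slides do not give DARE's details, so the function names, the energy units (kcal/mol) and the example values are assumptions made for illustration.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for swapping replicas i and j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_exchange(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the swap between the two replicas is accepted."""
    return rng.random() < exchange_probability(energy_i, temp_i, energy_j, temp_j)


# Colder replica (300 K) has the higher energy: the swap is always accepted.
p_favourable = exchange_probability(-100.0, 300.0, -110.0, 310.0)
# Colder replica already has the lower energy: accepted only sometimes.
p_unfavourable = exchange_probability(-110.0, 300.0, -100.0, 310.0)
```

Because an exchange needs only the two energies and temperatures, replicas can run on different resources and coordinate through a small amount of shared state.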
Multi-Physics Runtime Frameworks: Extensibility
• Coupled multi-physics requires two distinct but concurrent simulations
• Can co-scheduling be avoided? Adaptive execution model: yes
• Load-balancing required. Capability comes for free!
• First demonstrated multi-platform Pilot-Job: TG (MD) – Condor (CFD)
Dynamic Execution Reduced Time to Solution
Ensemble Kalman Filters: Heterogeneous Sub-Tasks
• Ensemble Kalman filters (EnKF) are recursive filters that handle large, noisy data; here the EnKF is used for history matching and reservoir characterization
• EnKF is a particularly interesting case of irregular, hard-to-predict run-time characteristics
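To make the "recursive filter" concrete, here is a minimal sketch of one EnKF analysis step. It assumes a scalar state, a direct observation of that state (identity observation operator) and the perturbed-observation variant of the filter; `enkf_update` is a hypothetical helper, not part of the EnKF-HM application. In the real application each ensemble member is a full reservoir simulation, which is where the irregular run times come from.

```python
import random
import statistics


def enkf_update(forecast, observation, obs_variance, rng):
    """One scalar EnKF analysis step; returns (analysis ensemble, gain)."""
    spread = statistics.variance(forecast)    # sample forecast variance P
    gain = spread / (spread + obs_variance)   # Kalman gain K with H = 1
    analysis = []
    for member in forecast:
        # Each member assimilates its own noisy copy of the observation.
        perturbed = observation + rng.gauss(0.0, obs_variance ** 0.5)
        analysis.append(member + gain * (perturbed - member))
    return analysis, gain


rng = random.Random(0)
forecast = [1.0, 2.0, 3.0, 4.0]
analysis, gain = enkf_update(forecast, observation=5.0, obs_variance=1.0, rng=rng)
```

The filter is recursive in the sense that the analysis ensemble is propagated forward by the model and becomes the forecast for the next observation.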
Results: Scale-Out Performance
• Using more machines decreases the time to completion (TTC) and the variation between experiments
• Using BQP decreases the TTC and the variation between experiments further
• The lowest TTC is achieved when using BQP and all available resources
Performance Advantage from Scale-Out But Why does BQP Help?
Understanding Distributed Applications: Development Objectives Redux
• Interoperability: Ability to work across multiple distributed resources
– SAGA: middleware-agnostic
• Distributed Scale-Out: The ability to utilize multiple distributed resources concurrently
– Support for multiple Pilot-Jobs: Ranger, Abe, QB
• Extensibility: Support new patterns/abstractions, different programming systems, functionality & infrastructure
– Pilot-Job also used for coupled CFD-MD; integrated BQP
• Adaptivity: Response to fluctuations in dynamic resources and in the availability of dynamic data
• Simplicity: Accommodate the above distributed concerns at different levels easily…
SAGA: Bridging the Gap between Infrastructure and Applications Focus on Application Development and Characteristics, not infrastructure details
SAGA-based Tools and Projects
• JSAGA from IN2P3 (Lyon) http://grid.in2p3.fr/jsaga/index.html
– Slides Ack: Sylvain Renaud
• GANGA-DIANE (EGEE) http://faust.cct.lsu.edu/trac/saga/wiki/Applications/GangaSAGA
– Slides Ack: Jackub Mosciki, Massimo L, O. Weidner
• NAREGI/KEK (Active)
• DESHL: DEISA-based Shell and Workflow library
• XtreemOS
• SD Specification, with gLite adaptors
JSAGA: Implementer and user of SAGA
• JSAGA uses SAGA in a module (jobs collection), which hides the heterogeneity of grid infrastructures
• JSAGA implements SAGA (core engine + plug-ins) to hide the heterogeneity of middlewares
• Layers shown: Applications; JSAGA jobs collection; SAGA; JSAGA core engine + plug-ins; Legacy APIs
Projects using JSAGA
• Elis@: a web portal for submitting jobs to industrial and research grid infrastructures
• SimExplorer: a set of tools for managing simulation experiments; includes a workflow engine that submits jobs to heterogeneous distributed computing resources
• JJS: a tool for efficiently running short-lived jobs on EGEE
• JUX: a multi-protocol file browser
DIANE Integration (cont.): Diane without SAGA vs. Diane with SAGA
Applications on heterogeneous resources
• Federating resources! (Master)
• Payload distribution: application-aware (and resource-aware) scheduling
• Agents scheduling
• Heterogeneous resources allocation (Ganga + Ganga/SAGA): Ganga/gLite, Ganga/SAGA (to TeraGrid), Ganga/SAGA (to *)
• Not in this demo: cloud resources, additional Grid infrastructures…
Acknowledgements
• SAGA Team and DPA Team and the UK-EPSRC (UK EPSRC: DPA, OMII-UK PAL)
• People:
– SAGA D&D: Hartmut Kaiser, Ole Weidner, Andre Merzky, Joohyun Kim, Lukasz Lacinski, João Abecasis, Chris Miceli, Bety Rodriguez-Milla
– SAGA Users: Andre Luckow, Yaakoub el-Khamra, Kate Stamou, Cybertools (Abhinav Thota, Jeff, N. Kim), Owain Kenway
– Google SoC: Michael Miceli, Saurabh Sehgal, Miklos Erdelyi
– Collaborators and Contributors: Steve Fisher & Group, Sylvain Renaud (JSAGA), Go Iwai & Yoshiyuki Watase (KEK)
– DPA: Dan Katz, Murray Cole, Manish Parashar, Omer Rana, Jon Weissman