dc2f95737aefac5fa109d20a3fd0911a.ppt
- Number of slides: 16
Beyond Workflows - DOE Cloud Computing Paradigm and the SDM Role and Future
Mladen A. Vouk, Nagiza Samatova, Paul Breimyer, Pierre Mouallem, Mei Nagappan, and the whole SPA team (list available separately)
Scientific Data Management Center – Scientific Process Automation Group
NC State University, Raleigh, NC 27695
Overview
• Scientific Workflow technology
– A success story from the past 7 years in the SDM center (a technology used in production or otherwise by application people)
– Developed components: Workflows, Provenance, “Dashboard”, other
• DOE SDM “Cloud”: vision for the future of the SDM center
– Integration of components: Intelligent Analytics and Social Networks, Component-based “cloud”, Integrated Services (service-oriented architecture)
• Sustainable science: long-term approach for the survival of SDM center technology (beyond SciDAC and longer)
– Integration of Research, Engineering, Transfer-of-Technology, Partnerships, Results (ROI, TOC)
Scientific Process Automation
• A key differentiating element of a successful information technology (IT) is its ability to become a true, valuable, and economical contributor to cyberinfrastructure.
• An IT-assisted workflow represents a series of structured activities and computations that arise in information-assisted problem solving.
• Scientific process automation principles, as well as production-level pilots, are SDM’s key contribution over the last 7 years
– Smoky Mountains retreat.
• From NC State: numerous publications, 3 graduated Ph.D. students and 4 M.S. (with thesis) students, several more in progress, and several generations of software.
Environment
[Diagram labels: Computations; Analytics; Networking; Local/Remote …; “Cloud” Services; Orchestration (Kepler); Control Panels (Dashboard) & Display; Data, Databases, Provenance…; Storage]
Workflow Framework
[Diagram labels: Control Plane (light data flows); Kepler; Execution Plane (“Heavy Lifting” computations and flows); Provenance, Tracking & Meta-Data (DBs and Portals); Synchronous or Asynchronous]
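The split between a light control plane, a heavy execution plane, and provenance tracking, run either synchronously or asynchronously, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Kepler's API: `ControlPlane`, `heavy_computation`, and the in-memory provenance list are hypothetical names, and a local thread pool stands in for the execution plane.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass, field
import time


def heavy_computation(n: int) -> int:
    """Stand-in for an execution-plane task (hypothetical workload)."""
    return sum(i * i for i in range(n))


@dataclass
class ControlPlane:
    """Hypothetical control plane: exchanges light messages, records provenance."""
    provenance: list = field(default_factory=list)
    pool: ThreadPoolExecutor = field(
        default_factory=lambda: ThreadPoolExecutor(max_workers=2)
    )

    def _record(self, task: str, event: str) -> None:
        # Provenance entry: which task, what happened, and when.
        self.provenance.append({"task": task, "event": event, "time": time.time()})

    def run_sync(self, name: str, fn, *args):
        """Synchronous mode: the caller blocks until the task finishes."""
        self._record(name, "started")
        result = fn(*args)
        self._record(name, "finished")
        return result

    def run_async(self, name: str, fn, *args) -> Future:
        """Asynchronous mode: return a future at once; the task runs in the pool."""
        self._record(name, "submitted")

        def tracked():
            result = fn(*args)
            # Recorded inside the worker, so it precedes future.result().
            self._record(name, "finished")
            return result

        return self.pool.submit(tracked)
```

For example, `ControlPlane().run_async("sum", heavy_computation, 1000)` returns a future immediately, while `run_sync` with the same arguments blocks; both leave two provenance entries for the task.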
Actor/Process in a Broader Sense
[Diagram labels: In; Out; Network/“Cloud”]
bsub < code_run
------ where code_run is a script ------
#! /bin/csh
source /usr/local/lsf/conf/cshrc.lsf
# run-time limit of 5 minutes, 100 job slots
#BSUB -W 5
#BSUB -n 100
mpiexec ./code
# standard output and error files; %J expands to the job ID
#BSUB -o /share/vouk/WFLOW/code.out.%J
#BSUB -e /share/vouk/WFLOW/code.err.%J
# job name
#BSUB -J codevouk
-------------
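One way to read the actor picture is that the "In" side supplies job parameters and the "Out" side is whatever the batch system returns. The sketch below, in Python, renders a job script with the same `#BSUB` options as the slide and pipes it to `bsub`. The `LsfJob` class and its field names are hypothetical. The sketch also groups the directives ahead of the command, which differs from the order on the slide, and it omits the site-specific `source` line.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class LsfJob:
    """Inputs of the actor: the parameters that vary from run to run."""
    name: str          # job name (-J)
    command: str       # what the job actually runs
    tasks: int         # number of job slots (-n)
    wall_minutes: int  # run-time limit in minutes (-W)
    log_prefix: str    # path prefix for the .out/.err files

    def render(self) -> str:
        """Build a csh job script carrying the #BSUB options from the slide."""
        lines = [
            "#! /bin/csh",
            f"#BSUB -J {self.name}",
            f"#BSUB -n {self.tasks}",
            f"#BSUB -W {self.wall_minutes}",
            f"#BSUB -o {self.log_prefix}.out.%J",
            f"#BSUB -e {self.log_prefix}.err.%J",
            self.command,
        ]
        return "\n".join(lines) + "\n"


def submit(job: LsfJob) -> str:
    """Pipe the script to bsub on stdin, as `bsub < code_run` does.

    Needs an LSF installation on the PATH; returns bsub's reply text.
    """
    done = subprocess.run(
        ["bsub"], input=job.render(), text=True, capture_output=True, check=True
    )
    return done.stdout
```

With the values from the slide, the job would be `LsfJob("codevouk", "mpiexec ./code", 100, 5, "/share/vouk/WFLOW/code")`; calling `render()` needs no cluster, so the generated script can be inspected before `submit()` is ever used.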
Modular Framework
[Diagram labels: Supercomputers + Analytics Nodes; Kepler; Storage; Auth; Dash; Meta-Data about: Processes, Data, Workflows, System, Apps & Environment; Access; Rec; Data; Disp API; Store API; Management API; Orchestration; Trust]
Read More …
• M. P. Singh and M. A. Vouk, "Network Computing," in John G. Webster (editor), Encyclopedia of Electrical and Electronics Engineering, John Wiley & Sons, New York, Vol. 14, pp. 114-132, 1999
• S. Klasky, M. Beck, V. Bhat, E. Feibush, B. Ludäscher, M. Parashar, A. Shoshani, D. Silver and M. Vouk, "Data management on the fusion computational pipeline," SciDAC 2005, Journal of Physics: Conference Series 16 (2005), 510-520, doi:10.1088/1742-6596/16/1/070
• Ilkay Altintas, Oscar Barney, Zhengang Cheng, Terence Critchlow, Bertram Ludaescher, Steve Parker, Arie Shoshani and Mladen Vouk, "Accelerating the scientific exploration process with scientific workflows," SciDAC 2006, Journal of Physics: Conference Series 46 (2006), 468-478, doi:10.1088/1742-6596/46/1/065
• M. A. Vouk, I. Altintas, R. Barreto, J. Blondin, Z. Cheng, T. Critchlow, A. Khan, S. Klasky, J. Ligon, B. Ludaescher, P. A. Mouallem, S. Parker, N. Podhorszki, A. Shoshani, C. Silva, "Automation of Network-Based Scientific Workflows," Proc. of the IFIP WoCo 9 on Grid-based Problem Solving Environments: Implications for Development and Deployment of Numerical Software, IFIP WG 2.5 on Numerical Software, Prescott, AZ, 2006, printed in IFIP, Vol. 239, "Grid-Based Problem Solving Environments," eds. Gaffney PW and Pool JCT (Boston: Springer), pp. 35-61, 2007
• S. Klasky, R. Barreto, A. Kahn, M. Parashar, N. Podhorszki, S. Parker, D. Silver, M. A. Vouk, "Collaborative visualization spaces for petascale simulations," Proceedings of the CTS 2008 International Symposium on Collaborative Technologies and Systems, pp. 203-211, doi:10.1109/CTS.2008.4543933, 10-23 May 2008
• More… http://sdm.ncsu.edu
DOE Cloud
• “Cloud” computing builds on decades of research in virtualization, distributed computing, utility computing, grids, and more recently networking, web and software services.
• It implies a seamless, service-oriented, component-based architecture: delivery of an integrated and orchestrated suite of on-demand functions to an end-user through composition of both loosely and tightly coupled functions, or services, often network-based. Its characteristics include reduced information technology overhead for the end-user, service orchestration, virtualization of resources, great flexibility, reduced total cost of ownership, and different “flavors”.
• Intelligent Analytics and Knowledge-Creating Social Networks, Component-based “Clouds”, Seamless/Integrated Services
• Necessary in the context of peta- and exa-scale sciences, data, etc.
“Analytics Cloud”
[Diagram labels: Workflow control plane; Concept-driven Analytics; Knowledge creation & Integration, Social Networking, Provenance, Tracking & Meta-Data (DBs and Portals); W/F Engine; Generation Wizard; Run-time Manager and Scheduler; Synchronous & Asynchronous Services; Execution Plane - “Heavy duty” in-cloud Computations, Flows; Services; Analytics Enabled Resources; Supercomputers; Clusters; Active Storage; Other “cloud” devices]
Components
• Reusability (elements can be re-used in other workflows),
• Substitutability (alternative implementations are easy to insert, very precisely specified interfaces are available, run-time component replacement mechanisms exist, there is ability to verify and validate substitutions, etc.),
• Extensibility and scalability (ability to readily extend the system component pool and to scale it, increase capabilities of individual components, have an extensible and scalable architecture that can automatically discover new functionalities and resources, etc.),
• Customizability (ability to customize generic features to the needs of a particular scientific domain and problem),
• Composability (easy construction of more complex functional solutions using basic components, reasoning about such compositions, etc.).
There are other characteristics that are also very important:
• Reliability and availability of the components and services,
• Cost: the cost of the services, total cost of ownership, economy of scale,
• Security and privacy, and so on.
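Substitutability and composability can be made concrete with a small Python sketch. It assumes a single, precisely specified interface (`Component.run`) that every element implements; all class names here are hypothetical and are not taken from Kepler or the SDM tools.

```python
from typing import Protocol


class Component(Protocol):
    """The precisely specified interface every workflow element satisfies."""

    def run(self, data: list[float]) -> list[float]: ...


class Scale:
    """Reusable element: multiplies every value by a fixed factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def run(self, data: list[float]) -> list[float]:
        return [x * self.factor for x in data]


class ClipNegative:
    """Replaces negative values with zero."""

    def run(self, data: list[float]) -> list[float]:
        return [max(x, 0.0) for x in data]


class DropNegative:
    """Alternative implementation, substitutable wherever ClipNegative fits."""

    def run(self, data: list[float]) -> list[float]:
        return [x for x in data if x >= 0.0]


class Pipeline:
    """Composability: a chain of components is itself a component."""

    def __init__(self, *stages: Component) -> None:
        self.stages = stages

    def run(self, data: list[float]) -> list[float]:
        for stage in self.stages:
            data = stage.run(data)
        return data
```

Here `Pipeline(ClipNegative(), Scale(2.0)).run([-1.0, 0.5, 3.0])` gives `[0.0, 1.0, 6.0]`, and swapping in `DropNegative()` changes the result to `[1.0, 6.0]` without touching the other stages. Because `Pipeline` has the same `run` method, pipelines can be nested inside larger pipelines.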
Example: Meta-Data Framework
[Diagram labels: Supercomputers + Analytics; Kepler?; Storage; Auth; Rec API; DB; Disp API; Other...; Dash; Orchestration; Custom Web]
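The path from "Rec API" through "DB" to "Disp API" and the dashboard might look roughly like the Python sketch below. It assumes that "Rec" means recording and "Disp" means display. The `MetaDataStore` class, its table layout, and the use of SQLite are illustrative choices, not the SDM center's actual schema or interfaces.

```python
import json
import sqlite3


class MetaDataStore:
    """Hypothetical recording and display APIs over one shared database."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "workflow TEXT NOT NULL, kind TEXT NOT NULL, payload TEXT NOT NULL)"
        )

    def record(self, workflow: str, kind: str, **payload) -> int:
        """Recording side: called by the orchestrator or by running jobs."""
        cur = self.db.execute(
            "INSERT INTO events (workflow, kind, payload) VALUES (?, ?, ?)",
            (workflow, kind, json.dumps(payload)),
        )
        self.db.commit()
        return cur.lastrowid

    def display(self, workflow: str) -> list[dict]:
        """Display side: what a dashboard would query, oldest event first."""
        rows = self.db.execute(
            "SELECT kind, payload FROM events WHERE workflow = ? ORDER BY id",
            (workflow,),
        )
        return [{"kind": kind, **json.loads(payload)} for kind, payload in rows]
```

A workflow step could call `store.record("run-42", "process", step="simulate", status="started")`, and a dashboard could later call `store.display("run-42")` to list everything recorded for that run. The workflow identifier and field names are made up for the example.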
Fault-Tolerance – Clouds of Clouds
[Diagram label: Master DB (replicated)]
User Categories
• Developers (10)
• Service Authors (100 to 1,000)
• Service Integrators (100 to 10,000)
• End-users (1,000 to ?)
Read More …
• Sam Averitt, Michael Bugaev, Aaron Peeler, Henry Shaffer, Eric Sills, Sarah Stein, Josh Thompson, Mladen Vouk, “Virtual Computing Laboratory (VCL),” in the Proceedings of the International Conference on Virtual Computing Initiative, May 7-8, 2007, IBM Corp., Research Triangle Park, NC, pp. 1-16.
• Mladen Vouk, Sam Averitt, Michael Bugaev, Andy Kurth, Aaron Peeler, Andy Rindos*, Henry Shaffer, Eric Sills, Sarah Stein, Josh Thompson, “‘Powered by VCL’ - Using Virtual Computing Laboratory (VCL) Technology to Power Cloud Computing,” published in the Prelim. Proceedings of the 2nd International Conference on Virtual Computing Initiative, 15-16 May 2008, RTP, NC, pp. 1-10, final version to be available through the ACM Digital Library
• Mladen A. Vouk, “Cloud Computing – Issues, Research and Implementations,” ITI 08, to appear in IEEE Digital Library
• Google for “cloud computing” …
• Other…
Sustainable Science
• A long-term approach for the survival of SDM center technology (beyond SciDAC and longer)
• Research
• Engineering
• Transfer-of-Technology
• Partnerships with scientists
• Operational open-source tools
• Visible results (agreed-upon ROI, and an accounting of TOC)