
  • Number of slides: 41

The TeraGrid: An essential tool for 21st century science
Craig Stewart, Associate Dean, Research Technologies; Chief Operating Officer, Pervasive Technology Labs; Chair, Coalition for Academic Scientific Computing; IU TeraGrid Resource Partner PI
Indiana University
stewart@iu.edu
17 February 2008

License Terms
• Please cite this presentation as: Stewart, C. A. The TeraGrid: An essential tool for 21st century science. (Presentation). 17 February 2010, Annual Meeting American Association for Advancement of Science, Boston, MA. Available from: http://hdl.handle.net/2022/14527
• Portions of this document that originated from sources outside IU are shown here and used by permission or under licenses indicated within this document. Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.
• Except where otherwise noted, the contents of this presentation are copyright 2011 by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share – to copy, distribute and transmit the work and to remix – to adapt the work under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.

Outline
• Why this workshop may be valuable to you
  – (Time-consuming computations on the critical path of your research? Need more storage? Do you provide scientific services/resources over the Web?)
• What is cyberinfrastructure?
• Examples of TeraGrid uses
• More detailed info about the TeraGrid
  – Architecture
  – Storage
  – Computation
  – Science Gateway use and support, including Visualization
  – Data source and service hosting
• How can you get going using the TeraGrid?
  – Resources are available to use
  – Help using the system is available
  – At the end of the talk we will help those who wish (and have laptops here) start the application process. You need your CV to finish the whole process, but you can do some of the work and save it
NB: ‘Tufte was here’

What is Cyberinfrastructure?
• Indiana University’s definition of Cyberinfrastructure: “Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible.”
• This and other information is in the Wikipedia definition of Cyberinfrastructure
• Some basic terms
  – TFLOPS - Trillions of FLOating Point operations per Second (mathematical operations) (10^12)
  – Processor hour - one hour of processor (CPU) utilization
  – TB - terabyte; PB - petabyte
  – Parallel programming
  – MPI - Message Passing Interface
  – WSRF - Web Services Resource Framework
©Trustees of Indiana University. May be reused so long as IU and TeraGrid logos remain, and any modifications to original are noted. Courtesy Craig A. Stewart, IU

What is the TeraGrid?
• An instrument (cyberinfrastructure) that delivers high-end IT resources - storage, computation, visualization, and data/service hosting - almost all of which are UNIX-based under the covers; some hidden by Web interfaces
  – A data storage and management facility: over 20 Petabytes of storage (disk and tape), over 100 scientific data collections
  – A computational facility - over 750 TFLOPS in parallel computing systems and growing
  – (Sometimes) an intuitive way to do very complex tasks, via Science Gateways, or get data via data services
• A service: help desk and consulting, Advanced Support for TeraGrid Applications (ASTA), education and training events and resources
• The largest individual cyberinfrastructure facility funded by the NSF, which supports the national science and engineering research community
• Something you can use without financial cost - allocated via peer review (and without double jeopardy)

Examples of what you can do with the TeraGrid: Simulation of cell membrane processes
• Simulation of TonB-dependent transporter (TBDT)
• Used 400,000 processor (CPU) hours on systems at National Center for Supercomputing Applications, IU, Pittsburgh Supercomputing Center [45 years with one processor]
• Modeled mechanisms for allowing transport of molecules through cell membrane
• Experimental analysis not possible!
• Work by Emad Tajkhorshid and James Gumbart, of University of Illinois Urbana-Champaign. Mechanics of Force Propagation in TonB-Dependent Outer Membrane Transport. Biophysical Journal 93: 496-504 (2007).
• Results of the simulation may be seen at www.life.uiuc.edu/emad/TonB-BtuB/btub2.5Ans.mpg
Image courtesy of Emad Tajkhorshid, UIUC
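
The bracketed "45 years with one processor" figure can be checked with a short calculation. The sketch below is illustrative only: it assumes a 365-day year with the processor busy around the clock, and the 1,000-processor case is a hypothetical that assumes perfect parallel scaling, which real codes rarely achieve.

```python
# Assumption: a 365-day year with the processor busy 24 hours a day.
HOURS_PER_YEAR = 24 * 365


def processor_hours_to_years(processor_hours: float, processors: int = 1) -> float:
    """Wall-clock years needed to deliver the given number of processor hours."""
    return processor_hours / (processors * HOURS_PER_YEAR)


# Figure from the slide: 400,000 processor hours.
single = processor_hours_to_years(400_000)
print(f"{single:.1f} years on one processor")

# Hypothetical: 1,000 processors with perfect scaling (an idealization).
parallel_days = processor_hours_to_years(400_000, processors=1_000) * 365
print(f"{parallel_days:.1f} days on 1,000 processors")
```

Under these assumptions the single-processor figure comes out a little over 45 years, in line with the slide.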

Predicting storms
• Hurricanes and tornadoes cause massive loss of life and damage to property
• TeraGrid supported spring 2007 NOAA and University of Oklahoma Hazardous Weather Testbed
  – Major Goal: assess how well ensemble forecasting predicts thunderstorms, including the supercells that spawn tornadoes
  – Nightly reservation at PSC
  – Delivers “better than real time” prediction
  – Used 675,000 CPU hours for the season
  – Used 312 TB on HPSS storage at PSC
Slide courtesy of Dennis Gannon, IU, and LEAD Collaboration

Solve any Rubik’s Cube in 26 moves?
• Rubik's Cube is perhaps the most famous combinatorial puzzle of its time
• > 43 quintillion states (4.3 x 10^19)
• Gene Cooperman and Dan Kunkle of Northeastern Univ. proved any state can be solved in 26 moves
• 7 TB of distributed storage on TeraGrid allowed them to develop the proof
Source: http://www.physorg.com/news99843195.html
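
The "> 43 quintillion states" figure follows from the standard counting argument for the 3x3x3 cube, which the slide does not spell out. The sketch below uses that textbook formula: permutations and orientations of the 8 corners and 12 edges, halved because corner and edge permutations must have matching parity.

```python
from math import factorial


def rubiks_cube_states() -> int:
    """Number of reachable states of a standard 3x3x3 Rubik's Cube."""
    corner_positions = factorial(8)    # 8 corner pieces can be permuted
    corner_orientations = 3 ** 7       # the last corner's twist is forced
    edge_positions = factorial(12)     # 12 edge pieces can be permuted
    edge_orientations = 2 ** 11        # the last edge's flip is forced
    # Corner and edge permutations must share parity, which halves the count.
    return (corner_positions * corner_orientations
            * edge_positions * edge_orientations) // 2


states = rubiks_cube_states()
print(f"{states:,}")    # exact count
print(f"{states:.1e}")  # 4.3e+19, matching the slide
```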

• Resources for many disciplines!
• > 40,000 processors in aggregate
• Resource availability will grow during 2008 at unprecedented rates

The TeraGrid Map
[Figure: map of TeraGrid sites. Sites labeled: Grid Infrastructure Group (UChicago), UW, PSC, UC/ANL, NCAR, PU, NCSA, IU, Caltech, UNC/RENCI, ORNL, Tennessee, USC/ISI, SDSC, TACC, LONI/LSU. Legend: Resource Provider (RP), Software Integration Partner, Network Hub]
©University of Chicago, Courtesy Dane Skow, Director, TeraGrid Infrastructure Group. Used with Permission.

But you don’t care - TeraGrid Architecture
[Figure: architecture diagram. Labels: POPS (for now), User Portal, Science Gateways, Command Line; TeraGrid Infrastructure (Accounting, Network, Authorization, …); RP 1, RP 2, RP 3; Compute Service, Viz Service, Data Service]
©University of Chicago, Courtesy Dane Skow, Director, TeraGrid Infrastructure Group. Used with Permission and modified substantially from original by Craig A. Stewart


Data storage and management: Tape
• TeraGrid provides persistent (up to Feb 2010+) storage on disk and tape
• Could you benefit from having a spare copy of your data stored someplace removed from your home location?
• Allocatable tape-based storage systems:
  – IU (Indiana University) - geographically distributed
  – NCAR (National Center for Atmospheric Research) - also supports dual copy
  – NCSA (National Center for Supercomputing Applications)
  – SDSC (San Diego Supercomputer Center)
  – Note: most sites have massive data storage systems that provide storage in support of computation
• Command line usage is reasonably straightforward with GridFTP; IU is developing a GUI
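
To give a feel for what GridFTP command line usage can look like, here is a minimal sketch that only builds (and prints) a transfer command for `globus-url-copy`, the Globus Toolkit's GridFTP client. The slides do not name a client or any endpoints, so the tool choice, the host `gridftp.example.org`, and both file paths are assumptions for illustration; a real transfer also typically needs a valid grid credential and the actual endpoint of the storage site.

```python
import shlex


def gridftp_copy_command(local_path: str, remote_host: str, remote_path: str,
                         parallel_streams: int = 4) -> list[str]:
    """Build a globus-url-copy command that uploads one local file.

    -vb reports transfer performance; -p sets the number of parallel streams.
    The host and paths passed in are hypothetical placeholders.
    """
    source = f"file://{local_path}"                       # local file URL
    destination = f"gsiftp://{remote_host}{remote_path}"  # GridFTP URL
    return ["globus-url-copy", "-vb", "-p", str(parallel_streams),
            source, destination]


# Hypothetical example: archive a results file to a remote storage system.
cmd = gridftp_copy_command("/home/alice/results.tar",
                           "gridftp.example.org",
                           "/archive/alice/results.tar")
print(shlex.join(cmd))  # printed for inspection, not executed
```

The command is returned as a list so it could be passed to `subprocess.run` without shell quoting problems once real endpoints are substituted.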


Data storage and management: Disk
• GPFS-WAN (General Parallel File System Wide Area Network). ~ 1 petabyte
  – Home at San Diego Supercomputer Center; may be accessed as if it were a local file system from NCAR, NCSA, IU, UC/ANL
• IU Data Capacitor - Lustre
  – 1 petabyte of spinning disk
  – Primarily for short term storage of data
• Long term disk storage allocations
  – Indiana University, National Center for Supercomputing Applications, San Diego Supercomputer Center

TeraGrid High Performance Computing Systems 2007-8
[Figure: Computational Resources by site. Sites labeled: UC/ANL, NCSA, NCAR, Tennessee, SDSC, PU, IU, ORNL, LONI/LSU, TACC. Labels: 2007 (504 TF), 2008 (~1 PF). Size approximate, not to scale]
Slide courtesy Tommy Minyard, TACC

Two examples of TeraGrid supercomputers
• Newest addition to the TeraGrid - Texas Advanced Computing Center’s Ranger
  – Biggest open supercomputer in world
  – 504 TFLOPS Sun Constellation
  – 15,744 AMD Quad-core “Barcelona” processors
  – Disk subsystem - 1.7 petabytes
• IU’s Big Red
  – 30 TFLOPS
  – Particularly good for molecular dynamics codes
  – Biggest system in the TeraGrid in summer 2006
[Image: Big Red]
Ranger info courtesy of Tommy Minyard, TACC
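
The Ranger numbers above can be combined into a rough per-core figure. This is a back-of-the-envelope sketch using only the values on the slide (504 TFLOPS, 15,744 quad-core processors, and Big Red's 30 TFLOPS); it treats the quoted TFLOPS as aggregate peak, which is an assumption, and says nothing about sustained performance on real codes.

```python
def per_core_gflops(total_tflops: float, processors: int,
                    cores_per_processor: int) -> tuple[int, float]:
    """Return (total cores, GFLOPS per core) for a system's aggregate rating."""
    cores = processors * cores_per_processor
    return cores, total_tflops * 1000 / cores  # 1 TFLOPS = 1000 GFLOPS


# Ranger figures from the slide: 504 TFLOPS, 15,744 quad-core processors.
cores, gflops = per_core_gflops(504, 15_744, 4)
print(f"{cores:,} cores, about {gflops:.1f} GFLOPS per core")

# Ranger relative to Big Red (30 TFLOPS on the slide).
ratio = 504 / 30
print(f"Ranger is about {ratio:.1f}x Big Red by this measure")
```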

Science Gateways
• A Science Gateway is a domain-specific computing environment, typically accessed via the Web, that provides a scientific community with end-to-end support for a particular scientific workflow
• Science Gateways are distinguished from Web portals (http://en.wikipedia.org/wiki/Web_portal) in that portals “present information from diverse sources in a unified way.”
• Hides complexity (pay no attention to the grid behind the curtain…)

LEAD (portal.leadproject.org)
• Simple enough an undergraduate can use it!
• National Center for Supercomputing Applications (NCSA) and IU teamed up to support WxChallenge weather forecast competition. 64 teams, 1000 students, ~16,000 CPU hours on Big Red
• XBaya is available from http://www.collab-ogce.org/

Purdue’s NanoHUB (www.nanohub.org)

U. Chicago SIDGrid (sidgrid.ci.uchicago.edu)

IU Render Portal
[Images by Chris Matusek and Ralf Frieser]
• Supports scientific visualization
• Supports education in visualization, graphics, and new media

Purdue TeraDRE

TeraGrid Science Gateways
Accessible at http://www.teragrid.org/programs/sci_gateways/
Title | Discipline
Open Science Grid (OSG) | Advanced Scientific Computing
Special PRiority and Urgent Computing Environment (SPRUCE) | Advanced Scientific Computing
Massive Pulsar Surveys using the Arecibo L-band Feed Array (ALFA) | Astronomical Sciences
National Virtual Observatory (NVO) | Astronomical Sciences
High Resolution Daily Temperature and Precipitation Data for the Northeast United States | Atmospheric Sciences
Linked Environments for Atmospheric Discovery (LEAD) | Atmospheric Sciences
Computational Chemistry Grid (GridChem) | Chemistry
Computational Science and Engineering Online (CSE-Online) | Chemistry
Network for Earthquake Engineering Simulation (NEES) | Earthquake Hazard Mitigation
GEON (GEOsciences Network) | Earth Sciences
NanoHUB | Nanotechnology
TeraGrid Geographic Information Science Gateway (GISolve) | Geography

TeraGrid Science Gateways (continued)
Accessible at http://www.teragrid.org/programs/sci_gateways/
Title | Discipline
CIG Science Gateway for the Geodynamics Community | Geophysics
QuakeSim (QuakeSim) | Geophysics
The Earth System Grid (ESG) | Global Atmospheric Research
National Biomedical Computation Resource (NBCR) | Integrative Biology and Neuroscience
Developing Social Informatics Data Grid (SIDGrid) | Language, Cognition, and Social Behavior
Neutron Science TeraGrid Gateway (NSTG) | Materials Research
Biology and Biomedicine Science Gateway | Molecular Biosciences
Open Life Sciences Gateway (OLSG) | Molecular Biosciences
The Telescience Project | Neuroscience Biology
Grid Analysis Environment (GAE) | Physics
SCEC Earthworks Project | Seismology
TeraGrid Visualization Gateway | Visualization, Image Processing

Hosting services
• Remember that old Waffle House commercial?
• If you have a data set or a data resource that serves a national community (or even a community that extends beyond your home institution… or a community you would like to extend beyond your home institution) …
• Hosting of your service is available from Indiana University via our Quarry system!

MutDB (www.mutdb.org)
http://www.chembiogrid.org/

Getting an account and allocation
• Get a POPS (Partnership Online Proposal System) account
• Apply for a DAC allocation (Development Allocation Committee): < 5 TB disk, < 25 TB tape storage, and/or < 30,000 Standard Units (SUs - related to CPU hours - in general an SU on one of the newer TeraGrid systems is about 0.5 CPU hours)
• Wait a month (although IU can help you shorten that!)
• Read the introductory documentation
• Use the TeraGrid KB if you need it
• Ask for help (researchtechnologies@iu.edu, help@teragrid.org)
• Go discover!
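
Before filling out the request, it can help to check rough estimates against the DAC limits quoted above. The sketch below encodes those limits (< 5 TB disk, < 25 TB tape, < 30,000 SUs) and the slide's rule of thumb of about 0.5 CPU hours per SU on newer systems. The `Request` class and function names are hypothetical, not part of POPS, and the SU conversion is only the approximate figure from the slide.

```python
from dataclasses import dataclass

# DAC limits as quoted on the slide (each request must stay below them).
DAC_MAX_DISK_TB = 5
DAC_MAX_TAPE_TB = 25
DAC_MAX_SUS = 30_000
# Slide's rule of thumb for newer systems; the real rate varies by machine.
CPU_HOURS_PER_SU = 0.5


@dataclass
class Request:
    """Hypothetical container for a planned start-up request."""
    disk_tb: float = 0.0
    tape_tb: float = 0.0
    sus: float = 0.0


def dac_problems(req: Request) -> list[str]:
    """Return a list of reasons the request exceeds DAC limits (empty if none)."""
    problems = []
    if req.disk_tb >= DAC_MAX_DISK_TB:
        problems.append(f"disk {req.disk_tb} TB is not below {DAC_MAX_DISK_TB} TB")
    if req.tape_tb >= DAC_MAX_TAPE_TB:
        problems.append(f"tape {req.tape_tb} TB is not below {DAC_MAX_TAPE_TB} TB")
    if req.sus >= DAC_MAX_SUS:
        problems.append(f"{req.sus} SUs is not below {DAC_MAX_SUS} SUs")
    return problems


def sus_to_cpu_hours(sus: float) -> float:
    """Approximate CPU hours for a number of SUs, using the slide's 0.5 figure."""
    return sus * CPU_HOURS_PER_SU


plan = Request(disk_tb=2, tape_tb=10, sus=20_000)
print(dac_problems(plan) or "fits within DAC limits")
print(f"about {sus_to_cpu_hours(plan.sus):,.0f} CPU hours")
```

A request that comes back with problems would need a larger allocation type, which this sketch does not model.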

Go to the POPS page https://pops-submit.teragrid.org/

Create a POPS Login

Indicate that you are “New” to the TeraGrid

Indicate that this is a “Start-up” Request

Select DAC-TG (nonintuitive)

Fill out PI information

Skip Co-PIs probably (unless Co-PI has current funding and you don’t)

Fill out info on your project

Fill out info on your funding

Make reasonable estimates about your computing

Upload your CV and Submit! when ready

Additional info
• www.researchtechnologies.iu.edu (also pervasive.iu.edu)
• Getting started guide - includes examples of good proposals: http://www.teragrid.org/userinfo/getting_started.php
• Review criteria: http://www.teragrid.org/userinfo/access/allocationspolicy.php
• When you’re in a foreign country there is nothing like a guide. If you need help with the application process contact IU consultants at researchtechnologies@iu.edu or submit a help request via the TeraGrid (help@teragrid.org)
• If you are interested in having a data collection or science gateway hosted on the TeraGrid, definitely contact IU directly (researchtechnologies@iu.edu). Do the same if you are interested in Advanced Support for TeraGrid Applications
• If you are anxious to get going, contact us as soon as you have your DAC allocation request submitted and we can provide a local login for up to 6 weeks of use

Acknowledgements
• IU’s involvement as a TeraGrid Resource Partner is supported in part by the National Science Foundation under Grants No. ACI-0338618, OCI-0451237, OCI-0535258, and OCI-0504075.
• The IU Data Capacitor is supported in part by the National Science Foundation under Grant No. CNS-0521433.
• The Grid Infrastructure Group management of the TeraGrid, and Dane Skow's leadership thereof, is funded by NSF grant 0503697.
• Purdue’s involvement as a TeraGrid Resource Partner is supported in part by the National Science Foundation under Grant No. OCI-050399.
• This research was supported in part by the Pervasive Technology Labs and the Indiana METACyt Initiative. Both Indiana University initiatives are supported by the Lilly Endowment, Inc.
• This work was supported in part by Shared University Research grants from IBM, Inc. to Indiana University.
• The LEAD portal is developed under the leadership of IU Professors Dr. Dennis Gannon and Dr. Beth Plale, and supported by NSF grant 331480. Marcus Christie and Suresh Marru of the Extreme! Computing Lab contributed the LEAD graphics.
• The ChemBioGrid Portal is developed under the leadership of IU Professor Dr. Geoffrey C. Fox and Dr. Marlon Pierce and funded via the Pervasive Technology Labs (supported by the Lilly Endowment, Inc.) and the National Institutes of Health grant P20 HG003894-01.
• Many of the ideas presented in this talk were developed under a Fulbright Senior Scholar’s award to Stewart, funded by the US Department of State and the Technische Universitaet Dresden.
• Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF), National Institutes of Health (NIH), Lilly Endowment, Inc., or any other funding agency.
• This work is made possible by the dedicated efforts of the expert staff of the Research Technologies Division of University Information Technology Services, the faculty and staff of the Pervasive Technology Labs, and the staff of UITS generally. Erik Cornet, Mike Lowe, Scott Tiege, Michael Grobe, and Malinda Lingwall helped with this presentation.
• Thanks to the faculty and staff with whom we collaborate locally at IU and globally (within the US via the TeraGrid, and internationally via collaboration with Technische Universitaet Dresden)
Thank you! Any questions?