
The Global Storage Grid
Or, Managing Data for “Science 2.0”
Ian Foster, Computation Institute, Argonne National Laboratory & University of Chicago
“Web 2.0”
- Software as services
  - Data- & computation-rich network services
- Services as platforms
  - Easy composition of services to create new capabilities (“mashups”), which may themselves be made accessible as new services
- Enabled by massive infrastructure buildout
  - Google projected to spend $1.5B on computers, networks, and real estate in 2006
  - Dozens of others are spending substantially
- Paid for by advertising
(Source: Declan Butler, Nature)
Science 2.0 example: Cancer Bioinformatics Grid (caBIG)
(Figure: a BPEL engine takes a BPEL workflow document and workflow inputs, links a data service at uchicago.edu to analytic services at duke.edu and osu.edu, and returns the workflow results)
caBIG: https://cabig.nci.nih.gov/ ; BPEL work: Ravi Madduri et al.
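The figure’s control flow is easy to restate in code. A minimal sketch of what the engine does, assuming hypothetical REST endpoints and JSON payloads (the real caBIG services were SOAP/WSRF, and the URLs below are placeholders):

```python
# Sketch of the workflow the BPEL engine executes: fetch data from one
# service, fan it out to two analytic services, and collect the results.
# Endpoints and payload shapes are hypothetical stand-ins.
import json
import urllib.request

def call_service(url: str, payload: dict) -> dict:
    """POST a JSON payload to a service and return its JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def run_workflow(inputs: dict) -> dict:
    # Step 1: pull the input data set from the data service.
    data = call_service("https://data.example.uchicago.edu/query", inputs)
    # Step 2: run two independent analyses on the same data.
    r1 = call_service("https://analytic.example.duke.edu/analyze", data)
    r2 = call_service("https://analytic.example.osu.edu/analyze", data)
    # Step 3: the combined results are the workflow outputs.
    return {"duke": r1, "osu": r2}
```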
Science 2.0 example: Virtual Observatories
(Figure: a gateway connects users, via discovery and analysis tools, to distributed data archives; S. G. Djorgovski)
Science 1.0 → Science 2.0: For Example, Digital Astronomy
- Science 1.0: “Tell me about this star”
- Science 2.0: “Tell me about these 20K stars”; support 1000s of users
- E.g., Sloan Digital Sky Survey, ~40 TB; others much bigger soon
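In practice, “tell me about these 20K stars” becomes a bulk catalog query. A hedged sketch in SDSS SkyServer style (PhotoObj and its columns follow the public SDSS schema, but the sky region, row limit, and submission route are illustrative assumptions):

```python
# Bulk "tell me about these 20K stars" query, SDSS SkyServer style.
# PhotoObj, objID, ra/dec, psfMag_*, and type = 6 (star) follow the public
# SDSS schema; the region and limit are illustrative only.
query = """
SELECT TOP 20000 objID, ra, dec, psfMag_g, psfMag_r, psfMag_i
FROM PhotoObj
WHERE ra BETWEEN 180.0 AND 182.0
  AND dec BETWEEN -1.0 AND 1.0
  AND type = 6    -- photometric classification: star
"""
# Submit via the SkyServer SQL search page or CasJobs; a Science 2.0
# service would instead expose this query as a web-service operation.
```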
Global Data Requirements
- Service consumer
  - Discover data
  - Specify compute-intensive analyses
  - Compose multiple analyses
  - Publish results as a service
- Service provider
  - Host services enabling data access/analysis
  - Support remote requests to services
  - Control who can make requests
  - Support time-varying, data- and compute-intensive workloads
Analyzing Large Data: “Move Computation to the Data”
- But:
  - Amount of computation can be enormous
  - Load can vary tremendously
  - Users want to compose distributed services, so data must sometimes be moved anyway
- Fortunately:
  - Networks are getting much faster (in parts)
  - Workloads can have significant locality of reference
“Move Computation to the Core”
(Figure: a poorly connected “periphery” surrounds a highly connected “core”)
Highly Connected “Core”: For Example, TeraGrid
- 75 teraflops (trillion calculations/s) = 12,500 times faster than all 6 billion humans on Earth each doing one calculation per second
- 30 gigabits/s to large sites = 20-30 times major university links = 30,000 times my home broadband = 1 full-length feature film per second
- 16 supercomputers: 9 different types, multiple sizes
- World’s fastest network
- Globus Toolkit and other middleware providing single login, application management, data movement, web services
(Map: ANL, SDSC, NCSA, PSC, ORNL, TACC, PU, IU, Starlight, LA, Atlanta)
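The headline ratios are simple divisions (the home-broadband comparison implies a link of roughly 1 Mb/s, which is an assumption rather than a figure from the slide):

```latex
\frac{75\times10^{12}\ \text{ops/s}}{6\times10^{9}\ \text{people}\times 1\ \text{op/s}} = 12{,}500,
\qquad
\frac{30\ \text{Gb/s}}{1\ \text{Mb/s}} = 30{,}000
```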
Open Science Grid: 4000 jobs running, May 7-14, 2006 (usage chart)
The Two Dimensions of Science 2.0: Function & Resource
- Decompose across network: decouple resource & service providers
- Clients integrate dynamically
  - Select & compose services
  - Select “best of breed” providers
  - Publish results as new services
(Figure: users, discovery tools, analysis tools, data archives; S. G. Djorgovski)
Technology Requirements: Integration & Decomposition
- Service-oriented applications
  - Wrap applications & data as services
  - Compose services into workflows
- Service-oriented Grid infrastructure
  - Provision physical resources to support application workloads
(Figure: users compose services into workflows, whose invocation drives application services over provisioned resources)
“The Many Faces of IT as Service,” ACM Queue, Foster & Tuecke, 2005
Technology Requirements: Within the Core …
- Provide “service hosting services” that allow consumers to negotiate the hosting of arbitrary data analysis services
- Dynamically manage resources (compute, storage, network) to meet diverse computational demands
- Provide strong internal security, bridging to diverse external sources of attributes
Globus Software Enables Grid Infrastructure
- Web service interfaces for behaviors relating to integration and decomposition
  - Primitives: resources, state, security
  - Services: execution, data movement, …
- Open source software that implements those interfaces
  - In particular, the Globus Toolkit (GT4)
- All standard Web services
  - “Grid is a use case for Web services, focused on resource management”
Open Source Grid Software: Globus Toolkit v4 (www.globus.org)
- Security: Credential Mgmt, Delegation, Community Authorization, Authentication/Authorization
- Data Mgmt: Data Replication, Replica Location, Data Access & Integration, Reliable File Transfer, GridFTP
- Execution Mgmt: Grid Telecontrol Protocol, Community Scheduling Framework, Workspace Management, Grid Resource Allocation & Management
- Info Services: WebMDS, Trigger, Index
- Common Runtime: Python Runtime, C Runtime, Java Runtime
“Globus Toolkit Version 4: Software for Service-Oriented Systems,” LNCS 3779, 2-13, 2005
GT4 Data Services
- Data movement
  - GridFTP: secure, reliable, performant (disk-to-disk on TeraGrid)
  - Reliable File Transfer: managed transfers
  - Data Replication Service: managed replication
- Replica Location Service
  - Scales to 100s of millions of replicas
- Data Access & Integration services
  - Access to, and server-side processing of, structured data
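Moving one file with GridFTP from a script looks like the following sketch; globus-url-copy and its -vb (verbose performance) and -p (parallel streams) flags are standard GT4 client options, while the hosts and paths are placeholders:

```python
# Sketch of driving a GridFTP third-party transfer from Python via the
# standard GT4 globus-url-copy client. Hostnames and paths are placeholders.
import subprocess

src = "gsiftp://gridftp.example-a.org/data/run42/output.dat"
dst = "gsiftp://gridftp.example-b.org/replica/run42/output.dat"

subprocess.run(
    ["globus-url-copy", "-vb", "-p", "4", src, dst],
    check=True,  # raise if the transfer fails
)
```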
Security Services
- Attribute Authority (ATA): issues signed attribute assertions (incl. identity, delegation & mapping)
- Authorization Authority (AZA): makes decisions based on assertions & policy
(Figure: a VO admin delegates to user A the assertion “user B can use service A”; VO member attributes flow from the VO ATA and mapping ATA through the VO AZA to the VO-A and VO-B services)
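A minimal sketch of the AZA’s decision step, combining assertions with policy. Real deployments used signed SAML assertions; the dataclass below is a simplified, unsigned stand-in, and the issuer/attribute names are invented for illustration:

```python
# Minimal sketch of an Authorization Authority decision: grant access iff
# a trusted issuer asserts the required attribute about the subject.
from dataclasses import dataclass

@dataclass(frozen=True)
class AttributeAssertion:
    issuer: str      # which ATA vouches for this assertion
    subject: str     # the user the attribute is about
    attribute: str   # e.g., "member:VO-A"

# Policy: which attribute, from which trusted issuer, each service requires.
POLICY = {
    "service-A": ("vo-a-ata", "member:VO-A"),
    "service-B": ("vo-b-ata", "member:VO-B"),
}

def authorize(subject: str, service: str,
              assertions: list[AttributeAssertion]) -> bool:
    issuer, attribute = POLICY[service]
    return any(a.issuer == issuer and a.subject == subject
               and a.attribute == attribute for a in assertions)

# Example: user B presents a VO-A membership attribute.
claims = [AttributeAssertion("vo-a-ata", "userB", "member:VO-A")]
assert authorize("userB", "service-A", claims)
```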
Service Hosting
(Figure: a client, subject to policy, uses a resource provider’s interface to allocate/provision, configure, initiate, monitor, and control an activity in a hosted environment)
Realized by WSRF (or WS-Transfer/WS-Management, etc.), Globus GRAM, Virtual Workspaces
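The “initiate activity” arrow maps to a GRAM job submission. A sketch using GT4’s globusrun-ws client (the -submit/-c options are standard GT4 usage; treat the -F factory contact and the hostname as assumptions for this illustration):

```python
# Sketch of initiating an activity via GT4 WS-GRAM: submit a simple job
# to a ManagedJobFactoryService. The factory URL below is a placeholder.
import subprocess

factory = "https://gram.example.org:8443/wsrf/services/ManagedJobFactoryService"

subprocess.run(
    ["globusrun-ws",
     "-submit",          # submit and wait for completion
     "-F", factory,      # which hosting service runs the activity
     "-c", "/bin/hostname"],
    check=True,
)
```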
Virtual OSG Clusters
(Figure: an OSG cluster runs in Xen hypervisors hosted on a TeraGrid cluster)
“Virtual Clusters for Grid Communities,” Zhang et al., CCGrid 2006
Managed Storage: GridFTP with NeST (Demoware)
(Figure: a custom application speaks GSI-FTP to a GT4 GridFTP server for file transfers and chirp to a NeST server for lot operations, etc.; a NeST module in the GridFTP server uses chirp to reach the NeST server, which manages the disk storage)
Bill Allcock, Nick LeRoy, Jeff Weber, et al.
Hosting Science Services
1) Integrate services from other sources: virtualize external services as VO services
2) Coordinate & compose: create new services from existing ones
(Figure: content providers and capacity providers supply community services)
“Service-Oriented Science,” Science, 2005
Virtualizing Existing Services
- Establish service agreement with the service (e.g., WS-Agreement)
- Delegate use to community (“VO”) users
(Figure: a VO admin delegates access to existing services to VO users A and B)
The Globus-Based LIGO Data Grid
- LIGO Gravitational Wave Observatory; sites include Birmingham, Cardiff, AEI/Golm
- Replicating >1 terabyte/day to 8 sites
- >40 million replicas so far
- MTBF = 1 month
www.globus.org/solutions
Data Replication Service
- Pull “missing” files to a storage system
(Figure: given a list of required files, the Data Replication Service uses the Replica Location Index and Local Replica Catalog for data location, and GridFTP via the Reliable File Transfer Service for data movement)
“Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005
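The pull model on this slide reduces to a simple loop. A minimal sketch, where the three classes and transfer() are simplified in-memory stand-ins for the Replica Location Index, the Local Replica Catalog, and an RFT-managed GridFTP transfer:

```python
# Sketch of the DRS pull model: for each required logical file not yet
# local, look up a remote replica, transfer it, and register the new copy.

class ReplicaIndex:
    def __init__(self, mapping: dict):
        self.mapping = mapping             # logical name -> replica URLs
    def lookup(self, name: str) -> list:
        return self.mapping.get(name, [])

class LocalCatalog:
    def __init__(self):
        self.entries: dict = {}            # logical name -> local URL
    def has(self, name: str) -> bool:
        return name in self.entries
    def register(self, name: str, url: str) -> None:
        self.entries[name] = url

def transfer(src: str, dst: str) -> None:
    print(f"transfer {src} -> {dst}")      # stand-in for an RFT request

def replicate_missing(required, index, local, storage_url):
    for name in required:
        if local.has(name):
            continue                       # replica already present here
        sources = index.lookup(name)       # candidate remote replicas
        if not sources:
            raise LookupError(f"no replica of {name} found")
        dest = f"{storage_url}/{name}"
        transfer(sources[0], dest)
        local.register(name, dest)         # make the new replica findable
```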
Data Replication Service: Dynamic Deployment
- Deployment stack, bottom to top: procure hardware → deploy hypervisor/OS → deploy virtual machine → deploy container → deploy service
- (Figure: DRS, LRC, and GridFTP run as VO services in a JVM inside a VM, on a hypervisor/OS, on a physical machine)
- State exposed & accessed uniformly at all levels
- Provisioning, management, and monitoring at all levels
Decomposition Enables Separation of Concerns & Roles
- User to service provider: “Provide access to data D at S1, S2, S3 with performance P”
- Service provider to resource provider: “Provide storage with performance P1, network with P2, …”
- Mechanisms: replica catalog, user-level multicast, …
(Figure: data D replicated from S1 to S2 and S3)
Example: Biology
- Public PUMA knowledge base: information about proteins analyzed against ~2 million gene sequences
- Back-office analysis on Grid: millions of BLAST, BLOCKS, etc., runs on OSG and TeraGrid
Natalia Maltsev et al., http://compbio.mcs.anl.gov/puma2
Genome Analysis & Database Update (GADU) on OSG: 3,000 jobs, April 24, 2006 (usage chart)
Example: Earth System Grid
- Climate simulation data
  - Different user classes
  - Per-collection control
  - Server-side processing
- Implementation (GT)
  - Portal-based User Registration Service (PURSE)
  - PKI, SAML assertions
  - GridFTP, GRAM, SRM
- >2000 users
- >100 TB downloaded
www.earthsystemgrid.org (DOE OASCR)
“Inside the Core” of ESG (architecture figure)
Example: Astro Portal Stacking Service
- Purpose: on-demand “stacks” of random locations within a ~10 TB dataset
- Challenge
  - Rapid access to 10-10K “random” files
  - Time-varying load
- Solution: dynamic acquisition of compute & storage, behind a web page or web service over the Sloan data
(Figure: many cutouts summed into one stacked image)
Joint work with Ioan Raicu & Alex Szalay
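The stacking operation in the figure is just co-addition of cutouts. A minimal sketch with NumPy, where fetch_cutout() is a hypothetical stand-in (here filled with random noise) for reading one cutout from the image store:

```python
# Sketch of the stacking operation: co-add small cutouts around many sky
# positions so sources too faint to see individually emerge in the mean.
import numpy as np

rng = np.random.default_rng(0)

def fetch_cutout(ra: float, dec: float, size: int) -> np.ndarray:
    # Stand-in for reading real pixels from the ~10 TB Sloan image store.
    return rng.normal(0.0, 1.0, (size, size))

def stack(positions, size=32):
    """Mean image over cutouts centered on the given (ra, dec) positions."""
    total = np.zeros((size, size))
    for ra, dec in positions:
        total += fetch_cutout(ra, dec, size)
    return total / len(positions)

# Noise averages down as 1/sqrt(N): a stack of 400 cutouts is ~20x quieter.
stacked = stack([(180.0 + 0.01 * i, 0.0) for i in range(400)])
print(stacked.std())   # ~0.05 for unit-variance noise
```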
Preliminary Performance (TeraGrid, LAN GPFS)
(Performance chart)
Joint work with Ioan Raicu & Alex Szalay
Example: CyberShake
Calculate hazard curves by generating synthetic seismograms from an estimated rupture forecast.
Pipeline: rupture forecast → strain Green tensor → synthetic seismogram → spectral acceleration → hazard curve → hazard map
Tom Jordan et al., Southern California Earthquake Center
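The last computational step, from per-rupture spectral accelerations to a hazard curve, is a standard probabilistic seismic hazard calculation. A simplified sketch that treats each rupture as an independent Poisson event with a single simulated spectral acceleration (real CyberShake runs use full distributions; the numbers below are illustrative, not CyberShake data):

```python
# Hazard-curve sketch: annual probability that spectral acceleration (SA)
# exceeds each threshold, assuming independent Poissonian ruptures.
import math

ruptures = [   # (annual rupture rate, simulated SA in g at the site)
    (0.010, 0.35),
    (0.002, 0.80),
    (0.050, 0.10),
]

def exceedance_probability(threshold: float) -> float:
    """P(SA > threshold in one year) = 1 - exp(-sum of exceeding rates)."""
    rate = sum(r for r, sa in ruptures if sa > threshold)
    return 1.0 - math.exp(-rate)

# One point per threshold traces out the hazard curve.
for a in (0.05, 0.2, 0.5):
    print(f"P(SA > {a:.2f} g) = {exceedance_probability(a):.4f}")
```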
Enlisting TeraGrid Resources
(Figure: a provenance catalog, data catalog, VO service catalog, and workflow scheduler/engine coordinate SCEC storage, TeraGrid storage, a VO scheduler, and TeraGrid compute)
- 20 TB, 1.8 CPU-years
Ewa Deelman, Carl Kesselman, et al., USC Information Sciences Institute
http://dev.globus.org
- Guidelines (Apache)
- Infrastructure (CVS, email, Bugzilla, wiki)
- Projects include …
dev.globus: Community Driven Improvement of Globus Software, NSF OCI
Summary
- “Science 2.0” (science as service, and service as platform) demands:
  - New infrastructure: service hosting
  - New technology: hosting & management
  - New policy: hierarchically controlled access
- Data & storage management cannot be separated from computation management
  - And these increasingly become community roles
- A need for new technologies, skills, & roles
  - Creating, publishing, hosting, discovering, composing, archiving, explaining … services
Acknowledgements
- Carl Kesselman for many discussions
- Many colleagues, including those named on slides, for research collaborations and/or slides
- Colleagues involved in the TeraGrid, Open Science Grid, Earth System Grid, caBIG, and other Grid infrastructures
- Globus Alliance members for Globus software R&D
- DOE OASCR, NSF OCI, & NIH for support
For More Information
- Globus Alliance: www.globus.org
- dev.globus: dev.globus.org
- Open Science Grid: www.opensciencegrid.org
- TeraGrid: www.teragrid.org
- Background: The Grid, 2nd Edition (www.mkp.com/grid2); www.mcs.anl.gov/~foster
Thanks to DOE, NSF, and NIH for research support!