

Introduction to Grid Computing

Ann Chervenak and Ewa Deelman
USC Information Sciences Institute

Outline
- Motivation
- Definition and characteristics of Grids
- Example Grid applications
- Grid Architecture
- How a Grid Is Assembled
- Overview of the Globus Toolkit
  - Security Tools
  - Monitoring and Discovery System
  - Computing/Execution Tools
  - Data Tools
- A more detailed example: The Earth System Grid

Motivation: Supporting Scientific Applications
- Computation intensive
  - Large-scale simulation and analysis (climate modeling, galaxy formation, gravity waves, event simulation)
  - Engineering (parameter studies, linked models)
- Data intensive
  - Experimental data analysis (high energy physics)
  - Image & sensor analysis (astronomy, climate)
- Distributed collaboration
  - Remote visualization (climate studies, biology)
  - Online instrumentation (microscopes, x-ray)
  - Engineering (large-scale structural testing)
- Large, complex scientific problems
  - Require people in several organizations to collaborate
  - Share computing resources, data, instruments

The Grid Problem
- Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources (from “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”)
- Enable communities (“Virtual Organizations”) to share geographically distributed resources as they pursue common goals
- Assuming the absence of…
  - central location
  - central control
  - omniscience
  - existing trust relationships

An Old Idea …
- “The time-sharing computer system can unite a group of investigators …. one can conceive of such a facility as an … intellectual public utility.”
  - Fernando Corbato and Robert Fano, 1966
- “We will perhaps see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country.”
  - Len Kleinrock, 1967

A Few Grid Application Examples

Earth System Grid objectives

To support the infrastructural needs of the national and international climate community, ESG is providing crucial technology to securely access, monitor, catalog, transport, and distribute data in today’s Grid computing environment.

[Figure: HPC hardware running climate models, ESG sites, and the ESG Portal]

Slide courtesy of Dave Bernholdt, ORNL

ESG Facts and Figures

ESG Portal at NCAR
- 130 TB of data at four locations
- 840,331 files
- Includes the past 6 years of joint DOE/NSF climate modeling experiments

IPCC AR4 ESG Portal
- 28 TB of data at one location
- 68,400 files
- Generated by a modeling campaign coordinated by the Intergovernmental Panel on Climate Change
- Model data from 11 countries

Usage
- 3,200 registered users
- 818 registered analysis projects
- Downloads to date: 25 TB, 91,000 files
- IPCC downloads (as of 10/12/06): 123 TB, 543,500 files, 300 GB/day (average)
- 300 scientific papers published to date based on analysis of IPCC AR4 data

[Figure: Worldwide ESG user base, Nov 2004 – Oct 2006]

Slide courtesy of Dave Bernholdt, ORNL

NSF’s TeraGrid*
- TeraGrid DEEP: Integrating NSF’s most powerful computers (60+ TF)
  - 2+ PB Online Data Storage
  - National data visualization facilities
  - World’s most powerful network (national footprint)
- TeraGrid WIDE Science Gateways: Engaging Scientific Communities
  - 90+ Community Data Collections
  - Growing set of community partnerships spanning the science community
  - Leveraging NSF ITR, NIH, DOE and other science community projects
  - Engaging peer Grid projects such as Open Science Grid in the U.S., as well as peer Grids in Europe and Asia-Pacific
- Base TeraGrid Cyberinfrastructure: Persistent, Reliable, National
  - Coordinated distributed computing and information environment

[Figure: map of TeraGrid sites: UC/ANL, PU, NCSA, PSC, IU, ORNL, UCSD, UT]

A National Science Foundation investment in cyberinfrastructure: $100M 3-year construction (2001-2004); $150M 5-year operation & enhancement (2005-2009)

* Slide courtesy of Ray Bair, Argonne National Laboratory

Data Grids for High Energy Physics

[Figure: tiered data distribution model, summarized below]
- There is a “bunch crossing” every 25 nsecs. There are 100 “triggers” per second. Each triggered event is ~1 MByte in size.
- Into the Online System: ~PBytes/sec
- Online System to Offline Processor Farm (~20 TIPS): ~100 MBytes/sec
- Offline Processor Farm to Tier 0, the CERN Computer Centre: ~100 MBytes/sec
- Tier 0 to Tier 1 regional centres (France, Germany, Italy, FermiLab ~4 TIPS): ~622 Mbits/sec or air freight (deprecated)
- Tier 1 to Tier 2 centres (~1 TIPS each, e.g., Caltech): ~622 Mbits/sec
- Tier 2 to institutes (~0.25 TIPS each, with a physics data cache): ~622 Mbits/sec
- Institutes to Tier 4 physicist workstations: ~1 MBytes/sec
- 1 TIPS is approximately 25,000 SpecInt95 equivalents.
- Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.

Image courtesy Harvey Newman, Caltech
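The rates in the figure are consistent with one another: 100 triggers per second at ~1 MByte each gives the ~100 MBytes/sec leaving the online system. The short sketch below redoes that arithmetic from the slide's own numbers. It treats each "~" value as exact and uses decimal units (1 TB = 10^6 MB). The function names are illustrative only.

```python
# Figures from the slide; each "~" value is treated as exact.
BUNCH_SPACING_NS = 25      # a bunch crossing every 25 ns
TRIGGERS_PER_SEC = 100     # triggered events kept per second
EVENT_SIZE_MB = 1.0        # ~1 MByte per triggered event
LINK_MBIT_PER_SEC = 622    # ~622 Mbits/sec wide-area link


def crossings_per_second(spacing_ns: int) -> int:
    return 1_000_000_000 // spacing_ns


def recorded_rate_mb_per_sec(triggers_per_sec: float, event_size_mb: float) -> float:
    return triggers_per_sec * event_size_mb


def mbits_to_mbytes(mbits: float) -> float:
    return mbits / 8  # 8 bits per byte


def daily_volume_tb(rate_mb_per_sec: float) -> float:
    return rate_mb_per_sec * 86_400 / 1_000_000  # decimal units: 1 TB = 10**6 MB


crossings = crossings_per_second(BUNCH_SPACING_NS)
rate = recorded_rate_mb_per_sec(TRIGGERS_PER_SEC, EVENT_SIZE_MB)
print(f"Bunch crossings per second: {crossings:,}")                        # 40,000,000
print(f"Fraction kept by the trigger: {TRIGGERS_PER_SEC / crossings:.1e}")  # 2.5e-06
print(f"Recorded rate: {rate:.0f} MB/s")                                   # 100 MB/s
print(f"One day at that rate: {daily_volume_tb(rate):.2f} TB")             # 8.64 TB
print(f"622 Mbit/s link: {mbits_to_mbytes(LINK_MBIT_PER_SEC):.2f} MB/s")    # 77.75 MB/s
```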

Elements of a Grid
- Resource sharing
  - Computers, storage systems, sensors, networks, …
  - This sharing is always conditional: issues of trust, policy, negotiation, payment, etc.
- Coordinated problem solving
  - Distributed data analysis, computation, simulation, collaboration, …
- Dynamic, multi-institutional virtual organizations
  - Community overlays on classic organizational structures
  - May be large or small, static or dynamic

Two Rules or Principles of the Grid
- Can’t rely on homogeneity of resources
  - In practice, resources in a large, distributed environment will be heterogeneous
  - STRATEGY: Plan for diverse systems and use mechanisms to manage heterogeneity
- Can’t rely on trust among participants
  - Sites will not be willing to share their resources if they cannot trust clients from other sites
  - STRATEGY: Provide a security model that can express complicated social networks
  - STRATEGY: Use full disclosure when making requests (who is requesting, authorizing, and authenticating the request) and give service owners tools to enforce local policies
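The trust strategies above can be sketched minimally, assuming a request simply carries its three disclosed identities as strings. `GridRequest`, `LocalPolicy`, and all example names are hypothetical. A real Grid security system (for example, GSI) verifies these identities cryptographically rather than trusting labels.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GridRequest:
    """A request that discloses who is behind it (hypothetical structure)."""
    requester: str      # who is asking
    authorizer: str     # who vouches for the request, e.g. a virtual organization
    authenticator: str  # who verified the requester's identity, e.g. a CA
    action: str         # what is being requested, e.g. "submit_job"
    cpu_hours: float = 0.0


@dataclass
class LocalPolicy:
    """Rules a resource owner enforces locally, whatever the VO says."""
    trusted_authenticators: set[str]
    trusted_authorizers: set[str]
    allowed_actions: set[str]
    max_cpu_hours: float
    banned_requesters: set[str] = field(default_factory=set)

    def decide(self, req: GridRequest) -> tuple[bool, str]:
        # Every disclosed party is checked; the site has the final say.
        if req.authenticator not in self.trusted_authenticators:
            return False, "untrusted authenticator"
        if req.authorizer not in self.trusted_authorizers:
            return False, "untrusted authorizer"
        if req.requester in self.banned_requesters:
            return False, "requester banned by local policy"
        if req.action not in self.allowed_actions:
            return False, f"action {req.action!r} not permitted"
        if req.cpu_hours > self.max_cpu_hours:
            return False, "exceeds local CPU-hour limit"
        return True, "accepted"


policy = LocalPolicy(
    trusted_authenticators={"Example Grid CA"},
    trusted_authorizers={"climate-vo"},
    allowed_actions={"submit_job", "read_data"},
    max_cpu_hours=500.0,
)
ok_request = GridRequest("/O=Grid/CN=Jane Doe", "climate-vo", "Example Grid CA", "submit_job", 100.0)
big_request = GridRequest("/O=Grid/CN=Jane Doe", "climate-vo", "Example Grid CA", "submit_job", 5000.0)
unknown_vo = GridRequest("/O=Grid/CN=Sam Lee", "other-vo", "Example Grid CA", "read_data")
for req in (ok_request, big_request, unknown_vo):
    print(req.requester, policy.decide(req))
```

In this sketch, membership in a trusted VO is necessary but not sufficient. The site can still refuse a request on its own terms, such as the CPU-hour limit.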

Grid Infrastructure
- Provides distributed management
  - Of physical resources
  - Of software services
  - Of communities and their policies
- Unified treatment
  - Build on Web Services framework
  - Use Web Services Resource Framework (WS-RF), Web Services Notification (WS-Notification), etc. to represent and access state associated with a service
  - Common management abstractions & interfaces
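To make "state associated with a service" concrete, here is a small sketch of a resource that exposes named properties and notifies subscribers when a property changes. It only mirrors the idea behind WS-RF resource properties and WS-Notification subscriptions. The class and method names are hypothetical and are not the API of either specification or of any toolkit.

```python
from typing import Any, Callable

# listener(property_name, old_value, new_value)
Listener = Callable[[str, Any, Any], None]


class StatefulResource:
    """Service state exposed as named properties, with change notification.

    Hypothetical sketch of the idea behind WS-RF resource properties and
    WS-Notification subscriptions; not the API of either specification.
    """

    def __init__(self, **properties: Any) -> None:
        self._props: dict[str, Any] = dict(properties)
        self._subscribers: dict[str, list[Listener]] = {}

    def get_property(self, name: str) -> Any:
        return self._props[name]

    def subscribe(self, name: str, listener: Listener) -> None:
        self._subscribers.setdefault(name, []).append(listener)

    def set_property(self, name: str, value: Any) -> None:
        old = self._props.get(name)
        self._props[name] = value
        if old != value:  # notify only on an actual change
            for listener in self._subscribers.get(name, []):
                listener(name, old, value)


events: list[tuple[str, Any, Any]] = []
job = StatefulResource(status="Pending", host="node17")
job.subscribe("status", lambda name, old, new: events.append((name, old, new)))
job.set_property("status", "Active")
job.set_property("status", "Active")  # unchanged, so no notification
job.set_property("status", "Done")
print(events)  # [('status', 'Pending', 'Active'), ('status', 'Active', 'Done')]
```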

Elements of the End-to-End Problem Include …
- Massively parallel petascale simulation
- High-performance parallel I/O
- Remote visualization
- High-speed reliable data movement
- Terascale local analysis
- Data access and analysis by external users
- Troubleshooting problems in end-to-end system
- Security
- Orchestration of these various activities

Slide courtesy of Ian Foster

Layered Grid Architecture

Layered Grid Architecture (By Analogy to Internet Architecture)

Grid layers, top to bottom:
- Application
- Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
- Resource: “Sharing single resources”: negotiating access, controlling use
- Connectivity: “Talking to things”: communication (Internet protocols) & security
- Fabric: “Controlling things locally”: access to, & control of, resources

Internet Protocol Architecture counterparts: the Internet Application layer spans the Grid Application, Collective, and Resource layers; Transport and Internet correspond to Connectivity; Link corresponds to Fabric.

Protocols, Services, and APIs Occur at Each Level

Stack, top to bottom:
- Applications
- Languages/Frameworks
- Collective Service APIs and SDKs
- Collective Service Protocols
- Collective Services
- Resource APIs and SDKs
- Resource Service Protocols
- Resource Services
- Connectivity APIs
- Connectivity Protocols
- Local Access APIs and Protocols
- Fabric Layer

Important Points
- Built on Internet protocols & services
  - Communication, routing, name resolution, etc.
- “Layering” here is conceptual; it does not imply constraints on who can call what
  - Protocols/services/APIs/SDKs will, ideally, be largely self-contained
  - Some things are fundamental: e.g., communication and security
  - But it is advantageous for higher-level functions to use common lower-level functions

The Hourglass Model
- Focus on architecture issues
  - Propose set of core services as basic infrastructure
  - Use to construct high-level, domain-specific solutions
- Design principles
  - Keep participation cost low
  - Enable local control
  - Support for adaptation
  - “IP hourglass” model

[Figure: hourglass with Applications and Diverse global services at the top, Core services at the narrow waist, and Local OS at the bottom]
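The narrow waist of the hourglass can be sketched as a small abstract interface. Every site adapts its local system to this interface, and higher-level services are written only against it. The adapters and the `parameter_sweep` service below are hypothetical stand-ins that record job state instead of launching real processes.

```python
import itertools
from abc import ABC, abstractmethod


class CoreComputeService(ABC):
    """The narrow waist: the only interface higher-level services may use."""

    @abstractmethod
    def submit(self, executable: str, args: list[str]) -> str:
        """Start a job and return a site-scoped job identifier."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Return the job's state in a common vocabulary."""


class BatchQueueSite(CoreComputeService):
    """Hypothetical adapter for a site whose local OS fronts a batch queue."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._jobs: dict[str, str] = {}
        self._ids = itertools.count(1)

    def submit(self, executable: str, args: list[str]) -> str:
        job_id = f"{self.name}/batch-{next(self._ids)}"
        self._jobs[job_id] = "PENDING"  # queued until the local scheduler runs it
        return job_id

    def status(self, job_id: str) -> str:
        return self._jobs[job_id]


class WorkstationSite(CoreComputeService):
    """Hypothetical adapter for a single machine that starts jobs at once."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._jobs: dict[str, str] = {}
        self._ids = itertools.count(1)

    def submit(self, executable: str, args: list[str]) -> str:
        job_id = f"{self.name}/pid-{next(self._ids)}"
        self._jobs[job_id] = "ACTIVE"
        return job_id

    def status(self, job_id: str) -> str:
        return self._jobs[job_id]


def parameter_sweep(sites: list[CoreComputeService], executable: str,
                    values: list[float]) -> list[str]:
    """A domain-specific service built only on the core interface:
    one job per parameter value, spread across sites round-robin."""
    rotation = itertools.cycle(sites)
    return [next(rotation).submit(executable, [str(v)]) for v in values]


sites = [BatchQueueSite("siteA"), WorkstationSite("siteB")]
jobs = parameter_sweep(sites, "simulate", [0.1, 0.2, 0.3])
print(jobs)  # ['siteA/batch-1', 'siteB/pid-1', 'siteA/batch-2']
```

In this sketch, adding a new kind of site means writing one adapter, while `parameter_sweep` stays unchanged. This is one way to read "keep participation cost low" and "enable local control".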

Connectivity Layer Protocols & Services
- Communication protocols
  - Internet protocols: IP, DNS, routing, etc.
- Security protocols and infrastructure
  - Uniform authentication, authorization, and message protection mechanisms in a multi-institutional setting
  - Single sign-on, delegation, identity mapping
  - E.g., public key technology, SSL, X.509, GSS-API
  - Supporting infrastructure: Certificate Authorities, certificate & key management, …

GSI: www.gridforum.org/security
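Identity mapping is commonly done with a table from a user's X.509 distinguished name (DN) to one or more local accounts, in the style of the Globus grid-mapfile. The parser below handles a simplified version of that format: a quoted DN followed by comma-separated account names, with `#` comments. The DNs and account names are made up. A production system would verify the certificate chain before consulting the table.

```python
import shlex

# Hypothetical entries; real files list DNs issued by the site's trusted CAs.
GRIDMAP_TEXT = """
# DN                                   local account(s)
"/O=Grid/OU=Example/CN=Jane Doe"       jdoe
"/O=Grid/OU=Example/CN=Sam Lee"        slee,shared
"""


def parse_gridmap(text: str) -> dict[str, list[str]]:
    """Parse 'quoted DN, then account[,account...]' lines; '#' starts a comment."""
    mapping: dict[str, list[str]] = {}
    for line in text.splitlines():
        fields = shlex.split(line, comments=True)
        if not fields:
            continue  # blank or comment-only line
        if len(fields) < 2:
            raise ValueError(f"entry has no local account: {line!r}")
        dn = fields[0]
        accounts = [a for a in ",".join(fields[1:]).split(",") if a]
        mapping[dn] = accounts
    return mapping


def map_identity(mapping: dict[str, list[str]], dn: str) -> str:
    """Return the default (first) local account for an authenticated DN."""
    accounts = mapping.get(dn)
    if not accounts:
        raise PermissionError(f"no local account mapped for {dn}")
    return accounts[0]


gridmap = parse_gridmap(GRIDMAP_TEXT)
print(map_identity(gridmap, "/O=Grid/OU=Example/CN=Jane Doe"))  # jdoe
```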

Resource Layer Protocols & Services
- Job submission and management tools
  - Remote allocation, advance reservation, control of compute resources
- Data Transport Tools
  - High-performance data access & transport
- Information Provider
  - Collects information about the current state of a resource and makes it available to higher-level services
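Advance reservation implies an admission check: a new reservation is accepted only if the resource is never oversubscribed during its interval. The sketch below assumes a single pool of interchangeable nodes and half-open time intervals. It illustrates the check and is not how any particular Grid scheduler implements reservations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    start: float  # inclusive, e.g. hours from now
    end: float    # exclusive
    nodes: int


class ReservationTable:
    """Admission control for advance reservations on one pool of identical nodes."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.reservations: list[Reservation] = []

    def load_at(self, t: float) -> int:
        """Nodes already promised at time t."""
        return sum(r.nodes for r in self.reservations if r.start <= t < r.end)

    def try_reserve(self, req: Reservation) -> bool:
        if req.start >= req.end or req.nodes <= 0:
            raise ValueError("reservation needs start < end and nodes > 0")
        # Load only rises at reservation start times, so the peak inside
        # [req.start, req.end) is at req.start or at a start within it.
        checkpoints = {req.start} | {
            r.start for r in self.reservations if req.start <= r.start < req.end
        }
        if any(self.load_at(t) + req.nodes > self.capacity for t in checkpoints):
            return False
        self.reservations.append(req)
        return True


table = ReservationTable(capacity=64)
results = [
    table.try_reserve(Reservation(0, 4, 32)),
    table.try_reserve(Reservation(2, 6, 32)),
    table.try_reserve(Reservation(3, 5, 8)),   # would need 72 nodes at t=3
    table.try_reserve(Reservation(4, 8, 32)),  # fits once the first one ends
]
print(results)  # [True, True, False, True]
```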

Collective Layer Protocols & Services
- Information Services
  - Aggregate and publish information about resource characteristics
  - Monitor current status of resources
- Resource brokers
  - Resource discovery and allocation
- Metadata and Replica Catalogs
- Data Management Services (e.g., replication)
- Co-reservation and co-allocation services
- Workflow management services
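Metadata and replica catalogs can work together: a query over descriptive attributes finds logical file names (LFNs), and a replica lookup then finds where the physical copies live. The in-memory classes below are a hypothetical sketch of that two-step lookup. The attribute names, file names, and the `gsiftp://` URLs on `example.org` hosts are invented for illustration.

```python
from collections import defaultdict


class MetadataCatalog:
    """Maps logical file names (LFNs) to descriptive attributes."""

    def __init__(self) -> None:
        self._attrs: dict[str, dict[str, str]] = {}

    def add(self, lfn: str, **attrs: str) -> None:
        self._attrs[lfn] = attrs

    def query(self, **criteria: str) -> list[str]:
        """Return LFNs whose attributes match every criterion."""
        return sorted(
            lfn for lfn, attrs in self._attrs.items()
            if all(attrs.get(k) == v for k, v in criteria.items())
        )


class ReplicaCatalog:
    """Maps each LFN to the physical file names (PFNs) of its copies."""

    def __init__(self) -> None:
        self._replicas: defaultdict[str, set[str]] = defaultdict(set)

    def register(self, lfn: str, pfn: str) -> None:
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn: str, pfn: str) -> None:
        copies = self._replicas.get(lfn)
        if copies is not None:
            copies.discard(pfn)
            if not copies:
                del self._replicas[lfn]

    def lookup(self, lfn: str) -> list[str]:
        return sorted(self._replicas.get(lfn, set()))


meta = MetadataCatalog()
meta.add("tas_A1B_2000.nc", model="modelX", experiment="A1B", variable="tas")
meta.add("pr_A1B_2000.nc", model="modelX", experiment="A1B", variable="pr")

replicas = ReplicaCatalog()
replicas.register("tas_A1B_2000.nc", "gsiftp://site-a.example.org/data/tas_A1B_2000.nc")
replicas.register("tas_A1B_2000.nc", "gsiftp://site-b.example.org/archive/tas_A1B_2000.nc")

for lfn in meta.query(experiment="A1B", variable="tas"):
    print(lfn, replicas.lookup(lfn))
```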

Example: High-Throughput Computing System
- App: High Throughput Computing System
- Collective (App): Dynamic checkpoint, job management, failover, staging
- Collective (Generic): Brokering, certificate authorities
- Resource: Access to data, access to computers, access to network performance data
- Connect: Communication, service discovery (DNS), authentication, authorization, delegation
- Fabric: Storage systems, schedulers
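Job management with checkpointing and failover, listed above at the application-specific Collective layer, can be sketched as follows. The job runs on one resource; if that resource fails, the job resumes from the last checkpoint on the next one. The sites here are simulated functions with hypothetical names and failure points.

```python
from typing import Callable, Optional

# run(start_step, total_steps) -> completed step count, or raises SiteFailure
SiteRunner = Callable[[int, int], int]


class SiteFailure(Exception):
    """Raised when a site fails; carries the last checkpointed step."""

    def __init__(self, checkpoint: int) -> None:
        super().__init__(f"site failed after checkpointing step {checkpoint}")
        self.checkpoint = checkpoint


def simulated_site(steps_before_failure: Optional[int] = None) -> SiteRunner:
    """Hypothetical site: runs from `start` and fails after a fixed number of steps."""
    def run(start: int, total: int) -> int:
        if steps_before_failure is not None and start + steps_before_failure < total:
            raise SiteFailure(start + steps_before_failure)
        return total
    return run


def run_with_failover(sites: list[tuple[str, SiteRunner]],
                      total_steps: int) -> tuple[int, list[tuple[str, str]]]:
    """Try sites in order, resuming each attempt from the last checkpoint."""
    checkpoint = 0
    log: list[tuple[str, str]] = []
    for name, run in sites:
        try:
            checkpoint = run(checkpoint, total_steps)
            log.append((name, "completed"))
            return checkpoint, log
        except SiteFailure as failure:
            checkpoint = failure.checkpoint  # keep the work done so far
            log.append((name, f"failed at step {checkpoint}"))
    raise RuntimeError(f"all sites failed; last checkpoint {checkpoint}")


sites = [("siteA", simulated_site(30)), ("siteB", simulated_site(50)), ("siteC", simulated_site())]
steps, log = run_with_failover(sites, total_steps=100)
print(steps, log)
```

Without the checkpoint, siteB would have restarted from step 0. With it, the second attempt picks up at step 30.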

Example: Grid Services for Data-Intensive Applications
- App: Discipline-Specific Data Grid Application
- Collective (App): Coherency control, replica selection, task management, data placement services, …
- Collective (Generic): Replica catalog, replica management, co-allocation, certificate authorities, metadata catalogs, …
- Resource: Access to data, access to computers, access to network performance data, …
- Connect: Communication, service discovery (DNS), authentication, authorization, delegation
- Fabric: Storage systems, clusters, network caches, …
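Replica selection can draw on the network performance data listed at the Resource layer. The sketch below assumes each candidate replica comes with a measured bandwidth and a fixed per-transfer overhead. It picks the copy with the smallest estimated transfer time (overhead + size / bandwidth). The hosts and figures are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    pfn: str                 # physical location of this copy
    bandwidth_mbit_s: float  # recently measured throughput to the site
    overhead_s: float        # fixed per-transfer cost (connection setup, etc.)


def estimated_transfer_s(replica: Replica, size_mb: float) -> float:
    """Overhead plus size / bandwidth, converting MB to Mbit (x8)."""
    return replica.overhead_s + size_mb * 8 / replica.bandwidth_mbit_s


def select_replica(replicas: list[Replica], size_mb: float) -> Replica:
    if not replicas:
        raise ValueError("no replicas registered for this file")
    return min(replicas, key=lambda r: estimated_transfer_s(r, size_mb))


candidates = [
    Replica("gsiftp://far-fast.example.org/data/run42.dat", bandwidth_mbit_s=622, overhead_s=0.5),
    Replica("gsiftp://near-slow.example.org/data/run42.dat", bandwidth_mbit_s=100, overhead_s=0.05),
]
print(select_replica(candidates, size_mb=2000).pfn)  # far-fast wins for a large file
print(select_replica(candidates, size_mb=1).pfn)     # near-slow wins for a tiny file
```

With these made-up numbers, the high-bandwidth site wins for a 2000 MB file (about 26 s versus 160 s). The low-overhead site wins for a 1 MB file. So in this model the choice depends on file size, not on bandwidth alone.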