
  • Number of slides: 41

Outline
• Motivation
• Definition and characteristics of Grids
• Example Grid applications
• Grid Architecture
• How a Grid Is Assembled
• Overview of the Globus Toolkit
  - Security Tools
  - Monitoring and Discovery System
  - Computing/Execution Tools
  - Data Tools
• A more detailed example: The Earth System Grid

How Are Grids Built and Used Today?

Methodology
• Building a Grid system or application is currently an exercise in software integration.
  - Define user requirements
  - Derive system requirements or features
  - Survey existing components
  - Identify useful components
  - Develop components to fit into the gaps
  - Integrate the system
  - Deploy and test the system
  - Maintain the system during its operation
• This should be done iteratively, with many loops and eddies in the flow.
Slide Courtesy of Lee Liming, ANL

What End Users Need
Secure, reliable, on-demand access to data, software, people, and other resources (ideally all via a Web Browser!)
Slide Courtesy of Lee Liming, ANL

How it Really Happens
[Figure: layered diagram. Components shown: Web Browser, Simulation Tool, Data Viewer Tool, Chat Tool, Web Portal, Registration Service, Credential Repository, Telepresence Monitor, Data Catalog, Certificate authority, Compute Servers, Camera, Database services.]
• Users work with client applications
• Application services organize VOs & enable access to other services
• Collective services aggregate &/or virtualize resources
• Resources implement standard access & management interfaces

How it Really Happens
• Implementations are provided by a mix of
  - Application-specific code
  - “Off the shelf” tools and services
  - Tools and services from the Grid community (Globus + others using the same standards)
• Glued together by…
  - Application development
  - System integration
Slide Courtesy of Lee Liming, ANL

The Importance of Community
• All Grid technology is evolving rapidly.
  - Web services standards
  - Grid interfaces
  - Grid implementations
  - Grid resource providers (ASP, SSP, etc.)
• Community is important!
  - Best practices (OGF, OASIS, etc.)
  - Open source (Linux, Axis, Globus, etc.)
• Application of community standards is vital.
  - Increases leverage
  - Mitigates (a bit) effects of rapid evolution
  - Paves the way for future integration/partnership
Slide Courtesy of Lee Liming, ANL

Overview of the Globus Toolkit

What Is the Globus Toolkit?
• The Globus Toolkit is a collection of solutions to problems that frequently come up when trying to build collaborative distributed applications.
• Heterogeneity
  - Simplifying heterogeneity for application developers.
  - We are increasingly including more “vertical solutions” that implement typical application patterns.
• Security
  - The Grid Security Infrastructure (GSI) provides security mechanisms that operate at the service/community level, allowing collaborators to share resources without blind trust.
• Standards
  - Our goal has been to capitalize on and encourage use of existing standards (IETF, W3C, OASIS, OGF).
  - The Toolkit also includes reference implementations of new/proposed standards in these organizations.

What’s In the Globus Toolkit?
• A Grid development environment
  - Develop new OGSA-compliant Web Services
  - Develop applications using Java or C/C++ Grid APIs
  - Secure applications using basic security mechanisms
• A set of basic Grid services
  - Job submission/management
  - File transfer (individual, queued)
  - Database access
  - Data management (replication, metadata)
  - Monitoring/Indexing system information
• Tools and Examples
• The prerequisites for many Grid community tools

Globus Toolkit 4.1 Components

Security Tools
• Basic Grid Security Mechanisms
• Certificate Generation Tools
• Certificate Management Tools
  - Getting users “registered” to use a Grid
  - Getting Grid credentials to wherever they’re needed in the system
• Authorization/Access Control Tools
  - Storing and providing access to system-wide authorization information

Basic Grid Security Mechanisms
• Basic Grid authentication and authorization mechanisms come in two flavors.
  - Pre-Web services
  - Web services
• Both are included in the Globus Toolkit, and both provide vital security features.
  - Grid-wide identities implemented as PKI certificates
  - Transport-level and message-level authentication
  - Ability to delegate credentials to agents
  - Ability to map between Grid & local identities
  - Local security administration & enforcement
  - Single sign-on support implemented as “proxies”
  - A “plug in” framework for authorization decisions

Simple CA and MyProxy Service
• SimpleCA: a convenient method of setting up a certificate authority (CA).
  - The Certificate Authority can then be used to issue certificates for users and services that work with GSI and WS-Security.
  - Simple CA is intended for operators of small Grid testing environments and users who are not part of a larger Grid.
• MyProxy is a remote service that stores user credentials.
  - Users can request proxies for local use on any system on the network.
  - Web Portals can request user proxies for use with back-end Grid services.
• Grid administrators can pre-load credentials in the server for users to retrieve when needed.

CAS: Community Authorization Service
• CAS allows resource providers to specify coarse-grained access control policies in terms of communities as a whole.
• Fine-grained access control is delegated to the community.
• Resource providers maintain ultimate authority over their resources (including per-user control and auditing) but are spared most day-to-day policy administration tasks.
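
The split between coarse-grained provider policy and fine-grained community policy can be sketched as a two-level check. This is only an illustration of the idea, not the CAS interface: the class names, the `banned_users` override, and the `authorize` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceProvider:
    """Coarse-grained policy: what each community as a whole may do (hypothetical)."""
    community_rights: dict[str, set[str]] = field(default_factory=dict)
    banned_users: set[str] = field(default_factory=set)  # provider keeps per-user control

    def allows(self, community: str, user: str, action: str) -> bool:
        if user in self.banned_users:
            return False
        return action in self.community_rights.get(community, set())


@dataclass
class Community:
    """Fine-grained policy: what each member may do, managed by the community."""
    name: str
    member_rights: dict[str, set[str]] = field(default_factory=dict)

    def allows(self, user: str, action: str) -> bool:
        return action in self.member_rights.get(user, set())


def authorize(provider: ResourceProvider, community: Community,
              user: str, action: str) -> bool:
    """Grant access only when both the community and the provider agree."""
    return community.allows(user, action) and provider.allows(community.name, user, action)
```

In this sketch the provider only ever lists communities, so adding or removing a community member needs no change on the provider side, while `banned_users` stands in for the provider's ultimate authority.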

Shibboleth and GridShib
• Federated authorization and authentication management for Web-based services
• Home institutions handle user authentication and role (“attribute”) information
• Resource providers use role info for authorization
• Allows for user privacy (resource provider sees a “handle” from the home institution, which can protect identity)
• GridShib is a Globus Incubator project
  - Allows Grid proxies to be used to look up Shibboleth attributes

Monitoring and Discovery System: The Globus Index Service
• Provides a registry capability
  - Services register with the index service to make their presence known to other Grid components
  - Index service can also proactively subscribe to specific services based on configuration
• Provides a caching capability
  - Data, datatype, data provider information
  - Caches resource property values from registered services
• Indexes can be set up for a variety of uses, projects
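
A minimal sketch of the registry-plus-cache idea, assuming an in-memory store. The class and method names are hypothetical and are not the MDS API; the time-to-live expiry is an added assumption used here to show why cached values need refreshing.

```python
import time


class IndexService:
    """Toy registry that caches the last reported properties of each service."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds      # assumption: entries expire if not refreshed
        self._clock = clock
        self._entries = {}           # service name -> (properties, time stored)

    def register(self, service, properties):
        """Register a service, or refresh its cached resource properties."""
        self._entries[service] = (dict(properties), self._clock())

    def lookup(self, service):
        """Return cached properties, or None if unknown or expired."""
        entry = self._entries.get(service)
        if entry is None:
            return None
        properties, stored_at = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[service]
            return None
        return properties

    def services(self):
        """List the names of services with a live registration."""
        return sorted(name for name in list(self._entries)
                      if self.lookup(name) is not None)
```

A client would call `lookup` on the index instead of contacting each service directly, which is the point of caching resource property values in one place.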

Monitoring and Discovery System: The Globus Trigger Service
• Triggers email alerts when pre-defined conditions are met
  - Especially useful for system status alerts in “production” operation situations
• Monitoring behavior
  - Subscribes to a set of resource properties
  - Runs a set of pre-configured tests on the resulting data streams to evaluate trigger conditions
  - When a trigger condition matches, sends email to the preconfigured address(es)
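
The monitoring behavior above can be sketched as a loop over pre-configured rules. The `Trigger` type and `evaluate_triggers` function are hypothetical names, and `send_email` is passed in as a plain callable so the sketch does not assume any particular mail setup.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trigger:
    name: str
    condition: Callable[[dict], bool]  # test run against the resource properties
    recipients: list[str]              # preconfigured address(es)


def evaluate_triggers(properties, triggers, send_email):
    """Run every trigger test; notify recipients of each trigger that matches.

    Returns the names of the triggers that fired.
    """
    fired = []
    for trigger in triggers:
        if trigger.condition(properties):
            for address in trigger.recipients:
                send_email(address, f"Trigger '{trigger.name}' fired: {properties}")
            fired.append(trigger.name)
    return fired
```

For example, a rule such as `Trigger("gridftp-down", lambda p: p.get("gridftp") != "up", ["ops@example.org"])` would fire whenever the monitored `gridftp` property is anything other than `"up"`.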

Computing/Processing Tools
• Workflow Managers
  - Organize and coordinate task execution within a complicated application
  - Often coordinate data movement and task execution
  - E.g., Pegasus workflow manager from ISI
• Metaschedulers
  - Optimize use of distributed compute pools

GRAM - Basic Job Submission and Control Service
• A uniform service interface for remote job submission and control
  - Includes file staging and I/O management
  - Includes reliability features
  - Supports basic Grid security mechanisms
  - Available in Pre-WS and WS
• GRAM is not a scheduler.
  - No scheduling
  - No metascheduling/brokering
  - Often used as a front-end to schedulers, and often used to simplify metaschedulers/brokers

Grid-enabled Schedulers
• Scheduling systems that are easily integrated with GRAM via plug-ins
• Wide variety of capabilities (supported models)
• Wide variety of support (commercial vs. open source)
• Note that GRAM can be used as either an interface to a scheduler or the interface that a scheduler uses to submit a job to a resource.
• Condor
• OpenPBS
• Torque
• PBSPro
• Sun Grid Engine
• Platform LSF
• GridWay (new component of Globus Toolkit)

Data Tools
• GridFTP
  - Fast, secure data transport
• The Reliable File Transfer Service (RFT)
  - Data movement services for GT4
• The Replica Location Service (RLS)
  - Distributed registry that records locations of data copies
• The Data Replication Service (DRS)
  - Integrates RFT and RLS to replicate and register files
• The Data Access and Integration Service (DAIS)
  - Service to access relational and XML databases
Data Services in Development
• Managed Object Placement Service (MOPS)
• Policy-driven Data Placement Services

GridFTP
• A high-performance, secure data transfer service optimized for high-bandwidth wide-area networks
  - FTP with extensions
  - Uses basic Grid security (control and data channels)
  - Multiple data channels for parallel transfers
  - Partial file transfers
  - Third-party (direct server-to-server) transfers
[Figure: Basic Transfer: one control channel, several parallel data channels. Third-party Transfer: control channels to each server, several parallel data channels between servers.]
• OGF recommendation GFD.20
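
Parallel data channels and partial file transfers both rest on addressing a file by byte range. The sketch below only illustrates that arithmetic: `split_into_ranges` is a hypothetical helper, and the even, contiguous split is an assumption for illustration, not how GridFTP itself assigns blocks to channels.

```python
def split_into_ranges(file_size, channels):
    """Return (offset, length) pairs that together cover the whole file.

    Each pair could be sent on its own data channel; a single pair on its
    own describes a partial file transfer.
    """
    if file_size < 0 or channels < 1:
        raise ValueError("file_size must be >= 0 and channels >= 1")
    base, extra = divmod(file_size, channels)
    ranges = []
    offset = 0
    for i in range(channels):
        # Spread the remainder over the first `extra` channels.
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue  # more channels than bytes: leave this channel idle
        ranges.append((offset, length))
        offset += length
    return ranges
```

For instance, `split_into_ranges(10, 3)` yields `[(0, 4), (4, 3), (7, 3)]`, three non-overlapping ranges that cover all ten bytes.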

RFT - File Transfer Queuing
• A WSRF service for queuing file transfer requests
  - Server-to-server transfers
  - Checkpointing for restarts
  - Database back-end for failovers
• Allows clients to request transfers and then “disappear”
  - No need to manage the transfer
  - Status monitoring available if desired
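
Queuing with checkpointed restarts can be sketched as follows. This is a toy model under stated assumptions: `TransferQueue` and `copy_chunk` are hypothetical names, and checkpoints live in a dictionary here where RFT is described as using a database back-end.

```python
from collections import deque


class TransferQueue:
    """Toy transfer queue that records how many bytes of each request are done."""

    def __init__(self):
        self._pending = deque()   # requests not yet finished, in submission order
        self._checkpoints = {}    # (source, dest) -> bytes already copied

    def submit(self, source, dest, size):
        """Queue a request; the client does not need to stay connected."""
        self._pending.append((source, dest, size))
        self._checkpoints.setdefault((source, dest), 0)

    def status(self, source, dest):
        """Bytes copied so far, or None if the request is unknown."""
        return self._checkpoints.get((source, dest))

    def run(self, copy_chunk, chunk_size=4):
        """Process the queue; if copy_chunk raises, a later run() resumes
        the same request from its last checkpoint."""
        while self._pending:
            source, dest, size = self._pending[0]
            key = (source, dest)
            while self._checkpoints[key] < size:
                n = min(chunk_size, size - self._checkpoints[key])
                copy_chunk(source, dest, self._checkpoints[key], n)
                self._checkpoints[key] += n   # checkpoint only after success
            self._pending.popleft()
```

Because the checkpoint advances only after a chunk succeeds, a failure in the middle of a transfer costs at most one chunk of repeated work on restart.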

OGSA-DAI
• OGSA interface for accessing XML and relational data stores
• Implements the GGF DAIS WG standard
[Figure: data services for SQL, XML, and file data. Labels: Data Service Resource, Data Resource Accessor; data resources: Relational, XMLDB, Files.]

The Globus Replica Location Service
• Distributed registry
• Records the locations of data copies
• Allows replica discovery
• RLS maintains mappings between logical identifiers and target names
• Must perform and scale well:
  - support hundreds of millions of objects
  - hundreds of clients
[Figure: Replica Location Indexes and Local Replica Catalogs]
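
The two tiers named on this slide can be sketched with in-memory dictionaries: a Local Replica Catalog maps logical names to target names, and a Replica Location Index records which catalogs know about each logical name. The method names and the pull-style `update_from` call are assumptions for illustration, not the RLS client API.

```python
class LocalReplicaCatalog:
    """Maps logical identifiers to the target names stored at one site."""

    def __init__(self, site):
        self.site = site
        self._mappings = {}  # logical name -> set of target names

    def add(self, logical, target):
        self._mappings.setdefault(logical, set()).add(target)

    def targets(self, logical):
        return sorted(self._mappings.get(logical, ()))

    def logical_names(self):
        return set(self._mappings)


class ReplicaLocationIndex:
    """Records which catalogs hold mappings for each logical identifier."""

    def __init__(self):
        self._holders = {}  # logical name -> list of catalogs

    def update_from(self, catalog):
        """Absorb a catalog's current list of logical names."""
        for logical in catalog.logical_names():
            holders = self._holders.setdefault(logical, [])
            if catalog not in holders:
                holders.append(catalog)

    def find(self, logical):
        """Replica discovery: ask only the catalogs known to hold the name."""
        results = []
        for catalog in self._holders.get(logical, []):
            results.extend(catalog.targets(logical))
        return sorted(results)
```

Keeping only logical names in the index, and the full mappings in the per-site catalogs, is what lets the index stay small relative to the number of physical replicas.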

RLS in Production Use: LIGO
• Laser Interferometer Gravitational Wave Observatory
• Currently uses RLS servers at 10 sites
  - Contain mappings from 11 million logical files to over 120 million physical replicas

The Data Replication Service
• Design based on publication component of the LIGO Lightweight Data Replicator system
  - Developed by Scott Koranda
• Client specifies (via DRS interface) which files are required at local site
• DRS uses:
  - Globus Delegation Service to delegate proxy credentials
  - RLS to discover where replicas exist in the Grid
  - Selection algorithm to choose among available source replicas (provides a callout; default is random selection)
  - Reliable File Transfer (RFT) service to copy data to site
    - Via GridFTP data transport protocol
  - RLS to register new replicas
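
The discover, select, transfer, register sequence above can be sketched as one function. Everything here is a stand-in: `replicate` and its callable parameters are hypothetical, credential delegation is left out, and `select` plays the role of the selection callout with random choice as its default.

```python
import random


def replicate(logical_names, discover, transfer, register, local_site,
              select=random.choice):
    """Copy each requested file to the local site and register the new replica.

    discover(logical) -> list of source replica names (stands in for RLS lookup)
    transfer(source, destination)  (stands in for RFT over GridFTP)
    register(logical, destination) (stands in for RLS registration)
    Returns {logical name: new replica name, or None if no source exists}.
    """
    report = {}
    for logical in logical_names:
        sources = discover(logical)
        if not sources:
            report[logical] = None   # nothing to copy from
            continue
        source = select(sources)     # selection callout; default is random
        destination = f"{local_site}/{logical}"
        transfer(source, destination)
        register(logical, destination)
        report[logical] = destination
    return report
```

Registering only after the transfer returns means the catalog never advertises a replica that was not actually copied, at least in this simplified model.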

Goal of Incubator Project: Facilitate Open Contributions to Globus by Additional Groups
• Distributing code under an open source license does not guarantee open development, contributions
• Under CDIGS, we adopted governance and open contribution models based on Apache Jakarta
• Incubator Process defines steps needed to move:
  - From a proposed Candidate Project completely outside the Globus infrastructure
  - To an Incubator Project, part of the Globus Incubator framework
  - To a full Project that is part of Globus
• Incubator Management Project (IMP)
  - Oversees incubator process from first contact to becoming a Globus project

24 Active Incubator Projects
• CoG Workflow
• Distributed Data Management (DDM)
• Dynamic Accounts
• Grid Authentication and Authorization with Reliably Distributed Services (GAARDS)
• Gavia-Meta Scheduler
• Gavia-Job Submission Client
• Grid Development Tools for Eclipse (GDTE)
• Grid Execution Mgmt. for Legacy Code Apps. (GEMLCA)
• GridShib
• Higher Order Component Service Architecture (HOCSA)
• Introduce
• Local Resource Manager Adaptors (LRMA)
• MEDICUS (Medical Imaging and Computing for Unified Information Sharing)
• Metrics
• MonMan
• NetLogger
• Open GRid OCSP (Online Certificate Status Protocol)
• Portal-based User Registration Service (PURSe)
• ServMark
• SJTU GridFTP GUI Client (SGGC)
• Swift
• UCLA Grid Portal Software (UGP)
• Workflow Enactment Engine Project (WEEP)
• Virtual Workspaces

Active Committers from 28 Institutions
• Aachen Univ. (Germany)
• Argonne National Laboratory
• CANARIE (Canada)
• CertiVeR
• Children’s Hospital Los Angeles
• Delft Univ. (The Netherlands)
• Indiana Univ.
• Kungl. Tekniska Högskolan (Sweden)
• Lawrence Berkeley National Lab
• Leibniz Supercomputing Center (Germany)
• NCSA
• Ohio State Univ.
• Semantic Bits
• Shanghai Jiao Tong University (China)
• Univ. of British Columbia (Canada)
• National Research Council of Canada
• UCLA
• Univ. of Chicago
• Univ. of Marburg (Germany)
• Univ. of Muenster (Germany)
• Univ. Politecnica de Catalunya (Spain)
• Univ. of Rochester
• USC Information Sciences Institute
• Univ. of Victoria (Canada)
• Univ. of Vienna (Austria)
• Univ. of Westminster (UK)
• Univa Corp.
• Univ. of Delaware

Globus Software: dev.globus.org
[Figure: map of Globus software, grouped by area (Common Runtime, Security, Execution Mgmt, Data Mgmt, Info Services, Other).
Globus Projects: OGSA-DAI, GT4, MPICH G2, Java Runtime, C Runtime, Python Runtime, Delegation, MyProxy, CAS, C Sec, GSIOpenSSH, Data Replica Location, GridWay, GridFTP, MDS4, GRAM, Reliable File Transfer, GT4 Docs, Incubation Mgmt.
Incubator Projects: Swift, MonMan, GAARDS, MEDICUS, Cog WF, Virt WkSp, GDTE, GridShib, OGRO, UGP, Introduce, PURSE, HOC-SA, LRMA, GEMLCA, Dyn Acct, Gavia JSC, WEEP, Gavia MS, NetLogger, DDM, Metrics, SGGC, ServMark.]

Selected Incubator Projects
• Introduce
  - Enables rapid development of Globus compliant grid services; enables users to graphically create services & add methods, resources, properties & security constraints
  - Committers: Ohio State Univ., UC/ANL
• MEDICUS (Medical Imaging and Computing for Unified Information Sharing)
  - Federates medical imaging and computing resources for clinical and research applications
  - Committers: Children’s Hospital Los Angeles, USC/ISI
• The Virtual Workspaces Service
  - Allows an authorized Grid client to deploy an environment described by workspace meta-data on a specified resource quota
  - Committers: ANL/UC

More Detail: The Earth System Grid

DOE Earth System Grid
Goal: Enable sharing & analysis of high-volume data from advanced earth system models
www.earthsystemgrid.org

Underlying Technologies
[Figure: ESG architecture diagram.
Storage and services: NCAR Cache, NCAR MSS, ORNL HPSS, LANL Cache, NERSC, DISK Cache, SRM, RLS, MyProxy, OPeNDAP-G.
ESG Web Portal functions: User Registration, Catalogs Browsing, Data Search, Access Control, Climate Metadata, Data Download, Data Subsetting, Data Publishing, Usage Metrics, Monitoring Services.
Actors: Data Provider (Web Browser; publish) and Data User (Web Browser, DML; search, browse, download).]
Figure Courtesy of Dave Bernholdt, ORNL

Data Management
• Data Mover Lite (DML)
• Storage Resource Manager (SRM)
• GridFTP
• Globus Replica Location Service (RLS)
[Figure: ESG architecture diagram, repeated from the “Underlying Technologies” slide]

Security Services
• Grid Security Infrastructure (GSI)
• PURSE User registration
• MyProxy
[Figure: ESG architecture diagram, repeated from the “Underlying Technologies” slide]

Monitoring ESG with Globus Monitoring and Discovery System
• ESG consists of heterogeneous components deployed across multiple administrative domains
• The climate community has come to depend on the ESG infrastructure as a critical resource
  - Failures of ESG components or services can disrupt the work of many scientists
  - Need to minimize infrastructure downtime
• Monitoring components to determine their current state and detect failures is essential
• Monitoring systems:
  - Collect, aggregate, and sometimes act upon data describing system state
  - Monitoring can help users make resource selection decisions and help administrators detect problems

Monitoring Overall System Status

Trigger Actions Based on Monitoring Information
• MDS4 Trigger service periodically polls for data
• Based on the current resource status, Trigger service determines whether specified trigger rules and conditions are satisfied
  - If so, performs specified action for each trigger
• Current action: Trigger service sends email to system administrators when services fail
  - Ideally, system failures can be detected and corrected by administrators before they affect larger ESG community
• Future plans: include richer recovery operations as trigger actions, e.g., automatic restart of failed services
Slide Courtesy of Ann Chervenak, ISI