
  • Number of slides: 32

Grid Initiatives for e-Science virtual communities in Europe and Latin America
The middleware
Grid Computing - DCC/FCUP
www.gisela-grid.eu

Disclaimer
• This presentation is based on materials provided and authorized by the EGEE project and is available to download and use according to the terms of the following license: http://creativecommons.org/licenses/by-nc-sa/2.5/

OUTLINE
• The EGEE Project
  – Objective
  – Relationship to other projects
• The gLite middleware
  – Middleware decomposition
    § Foundation
    § High-level services

Part I: The EGEE Project

The EGEE project
• EGEE
  – 1 April 2004 to 31 March 2006
  – 71 partners in 27 countries, federated in regional Grids
• EGEE-II
  – 1 April 2006 to 31 March 2008
  – 91 partners in 32 countries
  – 13 Federations
• EGEE-III
  – 1 April 2008 to 31 March 2010
  – More than 120 partners
• Objectives
  – Large-scale, production-quality infrastructure for e-Science
  – Attracting new resources and users from industry as well as science
  – Improving and maintaining the "gLite" Grid middleware
• US partners in EGEE-II: Univ. Chicago, Univ. South. California, Univ. Wisconsin, RENCI

Main lines of the EGEE project
• Infrastructure operation
  – Currently includes sites across 39 countries
  – Continuous monitoring of grid services & automated site configuration/management
• Middleware
  – Production quality middleware distributed under business friendly open source licence
• User Support: managed process from first contact through to production usage
  – Training
  – Expertise in grid-enabling applications
  – Online helpdesk
  – Networking events (User Forum, Conferences etc.)
• Interoperability
  – Expanding geographical reach and interoperability with related infrastructures
[Figure: related infrastructures; legible labels: KnowARC, TWGRID]

Applications on EGEE
• Applications from an increasing number of domains
  – Astrophysics
  – Computational Chemistry
  – Earth Sciences
  – Financial Simulation
  – Fusion
  – Geophysics
  – High Energy Physics
  – Life Sciences
  – Multimedia
  – Material Sciences
  – …
• Book of abstracts: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-005.pdf

EU projects related to EGEE
[Figure: EU projects related to EGEE; surviving label: EU GRID]

Sustainability: Beyond EGEE-III
• Need to prepare for a permanent Grid infrastructure
  – Ensure reliable and adaptive support for all sciences
  – Independent of short project funding cycles
  – Infrastructure managed in collaboration with national grid initiatives
[Figure label: EGI]

Part II: The gLite middleware
Programming the Grid with gLite: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf

Middleware structure
• Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware
• Higher-Level Grid Services are supposed to help users build their computing infrastructure but should not be mandatory
• Foundation Grid Middleware will be deployed on the EGEE infrastructure
  – Must be complete and robust
  – Should allow interoperation with other major grid infrastructures
  – Should not assume the use of Higher-Level Grid Services
[Figure: layered diagram. Applications on top; Higher-Level Grid Services: Workload Management, Replica Management, Visualization, Workflow, Grid Economies, ...; Foundation Grid Middleware: Security model and infrastructure, Computing (CE) and Storage Elements (SE), Accounting, Information and Monitoring]
Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
gLite @ OMII-Europe All-Hands meeting, Bologna, 12-13 February 2007

gLite Services Decomposition
• 6 High Level Services + CLI & API
• Legend: Available; Foreseen in the architecture (only Job Provenance was available at the end of EGEE-II)
[Figure: gLite services decomposition diagram]

gLite components
• UI: User Interface
• CE: Computing Element
• SE: Storage Element
• WN: Worker Node
• WMS: Workload Management System
• VOMS: Virtual Organization Membership Service
• LB: Logging and Bookkeeping
• MonBOX: monitoring
• LFC: LCG File Catalog
• BDII: Berkeley Database Information Index, stores all information about the resources available in the grid infrastructure

Job Workflow in gLite
[Figure: job workflow diagram. Legible labels: UI; voms-proxy-init; JDL; Input "sandbox"; Output "sandbox"; Job Submit Event; Job Query; Job Status; Resource Broker; Expanded JDL; Input "sandbox" + Broker Info; Globus RSL; Job Submission Service; Computing Element; Storage Element; LFC Catalog; DataSets info; Information Service; SE & CE info; Publish; Author. & Authen.; Logging & Book-keeping]

Job Workflow in gLite
[Figure: job workflow diagram, this time showing WMProxy. Legible labels: UI; voms-proxy-init; JDL; Input "sandbox"; Output "sandbox"; Job Submit Event; Job Query; Job Status; WMProxy; Resource Broker; Expanded JDL; Input "sandbox" + Broker Info; Globus RSL; Job Submission Service; Computing Element; Storage Element; LFC Catalog; DataSets info; Information Index; SE & CE info; Publish; Author. & Authen.; Logging & Book-keeping]
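
The workflow starts from a JDL description plus input and output sandboxes. As a rough illustration only, the Python sketch below renders a job description as JDL-style `Name = value;` lines. The helper `to_jdl` and the file names are hypothetical; the attribute names are ones commonly seen in JDL files, and the exact syntax accepted by a given gLite release should be checked against its JDL documentation.

```python
def to_jdl(attrs):
    """Render a dict as JDL-style 'Name = value;' lines (illustrative helper)."""
    def fmt(value):
        # Lists become {...} groups, strings are double-quoted.
        if isinstance(value, (list, tuple)):
            return "{" + ", ".join(fmt(v) for v in value) + "}"
        if isinstance(value, str):
            return '"' + value + '"'
        return str(value)
    return "\n".join("{} = {};".format(k, fmt(v)) for k, v in attrs.items())

# Hypothetical job: all file names are made up for the example.
job = {
    "Executable": "run.sh",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["run.sh", "input.dat"],    # shipped with the job
    "OutputSandbox": ["std.out", "std.err"],    # retrieved after the job ends
}
jdl_text = to_jdl(job)
print(jdl_text)
```

The input sandbox lists what travels from the UI with the job, and the output sandbox lists what comes back once the job has finished.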

High Level Services: Workload Management
• Resource brokering, workflow management, I/O data management
  → Web Service interface: WMProxy
  – Task Queue: keeps non-matched jobs
  – Information SuperMarket: optimized cache of the information system
  – Match Maker: assigns jobs to resources according to user requirements
  – Job submission & monitoring
    → Condor-G
    → ICE (to CREAM)
  – External interactions:
    § Information System
    § Data Catalogs
    § Logging & Bookkeeping
    § Policy Management system (G-PBox)
• CREAM: Computing Resource Execution and Management
• ICE: Interface to CREAM Environment
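
The interplay between the Match Maker and the Task Queue can be sketched in a few lines of Python. This is a toy model, not the WMS implementation: the resource fields (`free_cpus`, `vos`), the ranking rule (most free CPUs wins) and all names are assumptions made for the example.

```python
from collections import deque

def match(job, resources):
    """Return the best resource satisfying the job's requirements, or None."""
    candidates = [r for r in resources
                  if r["free_cpus"] >= job["cpus"] and job["vo"] in r["vos"]]
    # Illustrative ranking policy: prefer the resource with the most free CPUs.
    return max(candidates, key=lambda r: r["free_cpus"], default=None)

def schedule(jobs, resources):
    """Assign each job to a resource; unmatched jobs wait in the task queue."""
    task_queue = deque()
    assignments = {}
    for job in jobs:
        best = match(job, resources)
        if best is None:
            task_queue.append(job["id"])
        else:
            assignments[job["id"]] = best["name"]
            best["free_cpus"] -= job["cpus"]
    return assignments, task_queue

# Hypothetical resources and jobs.
resources = [
    {"name": "ce1", "free_cpus": 4, "vos": {"biomed"}},
    {"name": "ce2", "free_cpus": 2, "vos": {"biomed", "cms"}},
]
jobs = [
    {"id": "j1", "cpus": 2, "vo": "biomed"},
    {"id": "j2", "cpus": 2, "vo": "cms"},
    {"id": "j3", "cpus": 8, "vo": "cms"},
]
assignments, task_queue = schedule(jobs, resources)
print(assignments, list(task_queue))
```

In this model a job that no resource can currently satisfy is kept rather than rejected, which is the role the slide gives to the Task Queue.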

Grid Foundation: Security
• Authentication based on X.509 PKI infrastructure
  – Certificate Authorities (CA) issue (long lived) certificates identifying individuals (much like a passport)
    § Commonly used in web browsers to authenticate to sites
  – Trust between CAs and sites is established (offline)
  – In order to reduce vulnerability, on the Grid user identification is done by using (short lived) proxies of their certificates
• Proxies can
  – Be delegated to a service such that it can act on the user's behalf
  – Include additional attributes (like VO information via the VO Membership Service VOMS)
  – Be stored in an external proxy store (MyProxy)
  – Be renewed (in case they are about to expire)
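
The proxy properties listed above (short lifetime, delegation, extra VO attributes, renewal before expiry) can be modelled with a small Python sketch. It involves no cryptography and is not how X.509 proxies are actually issued; the `Proxy` class, its fields and the one-hour renewal margin are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Proxy:
    subject: str
    not_after: datetime          # expiry time of this (short lived) proxy
    vo_attributes: tuple = ()    # e.g. VO group information added by VOMS

    def is_valid(self, now):
        return now < self.not_after

    def needs_renewal(self, now, margin=timedelta(hours=1)):
        # Renew when the remaining lifetime falls within the margin.
        return self.not_after - now <= margin

    def delegate(self, now, lifetime):
        # A delegated proxy must not outlive the proxy it is derived from.
        expiry = min(self.not_after, now + lifetime)
        return Proxy(self.subject + "/CN=proxy", expiry, self.vo_attributes)

# Hypothetical user and times.
now = datetime(2007, 2, 12, 9, 0)
user_proxy = Proxy("/O=Example/CN=Alice", now + timedelta(hours=12), ("/biomed",))
service_proxy = user_proxy.delegate(now, timedelta(hours=24))
print(service_proxy.not_after, user_proxy.needs_renewal(now))
```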

Grid Foundation: Security
• Local Centre Authorization Service (LCAS) handles authorization requests to the local computing fabric
• Local Credential Mapping Service (LCMAPS) provides all local credentials needed for jobs allowed into the fabric
• Batch Local ASCII Helper
  – The protocol (BLAHP): provides a set of plain ASCII commands used by Condor-C (and CREAM) to manage jobs on the batch systems
  – The daemon (BLAHPD): implements the helper daemon responsible for converting BLAHP commands into batch system actions, interpreting their results and reporting them in BLAHP format

Grid foundation: Information Systems
• Generic Information Provider (GIP)
  – Provides LDIF information about a grid service in accordance with the GLUE Schema
  [Figure labels: GIP, Cache, Provider, Plugin, Config File, LDIF File]
• BDII: Information system in gLite 3.0 (by LCG)
  – LDAP database that is updated by an external process
  – More than one DB is used to separate read and write
  – A port forwarder is used internally to select the correct DB
• Abbreviations
  – LDIF: LDAP Data Interchange Format
  – LDAP: Lightweight Directory Access Protocol
  – GLUE: Grid Laboratory Uniform Environment
  – BDII: Berkeley Database Information Index
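
To make the "LDIF information about a grid service" concrete, here is a minimal Python sketch that formats one directory entry as LDIF text. It is not the GIP itself: the helper `to_ldif`, the host name and the values are hypothetical, and the GLUE-style attribute names are quoted from memory, so they should be treated as assumptions and checked against the GLUE Schema.

```python
def to_ldif(dn, attributes):
    """Format a single directory entry as LDIF text (illustrative helper)."""
    lines = ["dn: " + dn]
    for name, values in attributes.items():
        # An attribute may carry one value or several.
        if not isinstance(values, (list, tuple)):
            values = [values]
        lines.extend("{}: {}".format(name, v) for v in values)
    return "\n".join(lines) + "\n"

# Hypothetical computing element entry; names and values are made up.
entry = to_ldif(
    "GlueCEUniqueID=ce.example.org:2119/jobmanager-pbs-short,mds-vo-name=resource,o=grid",
    {
        "objectClass": ["GlueCE"],
        "GlueCEUniqueID": "ce.example.org:2119/jobmanager-pbs-short",
        "GlueCEStateFreeCPUs": 12,
        "GlueCEStateWaitingJobs": 3,
    },
)
print(entry)
```

An information system such as the BDII would load entries of this shape into its LDAP database, where clients can then query them.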

Grid foundation: Information Systems
• R-GMA: provides a uniform method to access and publish distributed information and monitoring data
  – Used for job and infrastructure monitoring in gLite 3.0
  – Working to add authorization
• Service Discovery:
  – Provides a standard set of methods for locating Grid services
  – Currently supports R-GMA, BDII and XML files as backends
  – Will add local cache of information
  – Used by some DM and WMS components in gLite 3.0

Grid foundation: Computing Element
• Three flavours available now:
  → LCG-CE (GT2 GRAM)
    § In production now but will be phased out next year
  → gLite-CE (GSI-enabled Condor-C)
    § Already deployed but still needs thorough testing and tuning. Being done now
  → CREAM (WS-I based interface)
    § Deployed on the JRA1 preview test-bed. After a first testing phase it will be certified and deployed together with the gLite-CE
    § Our contribution to the OGF-BES group for a standard WS-I based CE interface
    § CREAM and WMProxy demo at SC06!
• BLAH is the interface to the local resource manager (via plug-ins)
  – CREAM and gLite-CE
  – Information pass-through: pass parameters to the LRMS to help job scheduling
[Figure: Computing Element architecture. Legible labels: WMS, Clients; Grid; Site; Information System; Computing Element; bdII; R-GMA; CEMon; glexec + LCAS/LCMAPS; BLAH; WN; LRMS]

Grid foundation: Accounting
• APEL: Uses R-GMA to propagate and display job accounting information for infrastructure monitoring
  – Reads LRMS log files provided by LCG-CE and BLAH
  – Preparing an update for gLite 3.0 to use the files from BLAH
• DGAS: Collects, stores and transfers accounting data. Compliant with privacy requirements
  – Reads LRMS log files provided by LCG-CE and BLAH
  – Stores information in a site database (HLR) and optionally in a central HLR. Access granted to user, site and VO administrators
  – Not yet certified in gLite 3.0. Deployment plan:
    § DGAS is in certification at INFN
    § It will send records to the GOC via DGAS2APEL
• HLR: Home Location Registers: manage user and resource accounts

Grid foundation: Storage Element
• Storage Element
  – Common interface: SRMv1, migrating to SRM v2.2
  – Various implementations from LCG and other external projects
    § disk-based: DPM, dCache / tape-based: Castor, dCache
  – Support for ACLs in DPM (in future in Castor and dCache)
    § synchronization of ACLs between SEs
  – Common rfio library for Castor and DPM being added
• Posix-like file access:
  – Grid File Access Layer (GFAL) by LCG
    § Support for ACL in the SRM layer (currently in DPM only)
    § Support for SRMv2
  – gLite I/O
    § Support for ACLs from the file catalog and interfaced to Hydra for data encryption
    § Not certified in gLite 3.0. To be retired once all of its functionality is also available in GFAL
• Hydra: encrypts files and stores them on normal storage elements

High Level Services: Catalogues
• File Catalogs
  – LFC from LCG
    § interfaced to POOL (Disk Pool Manager – DPM)
    § LFC replication and backup
  – Hydra: stores keys for data encryption
    § interfaced to GFAL
    § Released with gLite 3.1
  – AMGA Metadata Catalog: generic metadata catalogue
    § Joint ARDA development. Used mainly by Biomed

High Level Services: File transfer
• FTS: Reliable, scalable and customizable file transfer
  – Manages transfers through channels
    § mono-directional network pipes between two sites
  – Web service interface
  – Automatic discovery of services
  – Support for different user and administrative roles
  – Adding support for pre-staging and new proxy renewal schema
  – Support for SRMv2.2, delegation, VOMS-aware proxy renewal in certification

High Level Services: Workload mgmt.
• WMS helps the user access computing resources
  – Resource brokering, management of job input/output, ...
• LCG-RB: GT2 + Condor-G
  – To be replaced when the gLite WMS proves to be reliable
• gLite WMS: Web service (WMProxy) + Condor-G
  – Management of complex workflows (DAGs) and compound jobs
    § bulk submission and shared input sandboxes
    § support for input files on different servers (scattered sandboxes)
  – Support for shallow resubmission of jobs
  – Job File Perusal: file peeking during job execution
  – Supports collection of information from CEMon, BDII, R-GMA and from DLI and StorageIndex data management interfaces
  – Support for parallel jobs (MPI) when the home dir is not shared
  – Deployed for the first time in gLite 3.0

WMS/LB/UI and CE
• New WMS deployed and thoroughly debugged
  – CMS: 100 collections * 200 jobs/collection, 3 UIs, 33 CEs
    § ~2.5 h to submit jobs (0.5 seconds/job)
    § ~17 hours to transfer jobs to a CE (3 seconds/job, 26K jobs/day)
    § Negligible failure rate due to WMS
  – Shallow resubmission
    § failure rate drops to less than 1% with 3 resubmissions
• Stability problems
  – also investigating other deployment scenarios to make it more robust
• gLite CE still to be tested and optimized
[Figure: plots labelled CMS and ATLAS]
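
The per-job figures on this slide follow from simple arithmetic, reproduced below as a back-of-envelope check in Python. The `residual_failure` function is an added illustration that assumes every submission attempt fails independently with the same probability; the slide itself does not state such a model.

```python
SECONDS_PER_HOUR = 3600

# CMS test from the slide: 100 collections of 200 jobs each.
total_jobs = 100 * 200

# Average time per job, derived from the total durations quoted on the slide.
submit_s_per_job = 2.5 * SECONDS_PER_HOUR / total_jobs    # about 0.45 s/job
transfer_s_per_job = 17 * SECONDS_PER_HOUR / total_jobs   # about 3.06 s/job

def residual_failure(p_fail, resubmissions):
    """Failure probability after retries, ASSUMING independent attempts."""
    return p_fail ** (resubmissions + 1)

print(total_jobs, round(submit_s_per_job, 2), round(transfer_s_per_job, 2))
print(residual_failure(0.3, 3))
```

The computed averages agree with the slide's rounded values of 0.5 and 3 seconds per job. Under the independence assumption, even a per-attempt failure probability of 0.3 would end up below 1% after 3 resubmissions.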

High Level Services: Workflows
• A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs
  [Figure: example DAG with nodes nodeA, nodeB, nodeC, nodeD, nodeE]
• A Collection is a group of jobs with no dependencies
  – basically a collection of JDLs
• A Parametric job is a job having one or more attributes in the JDL that vary their values according to parameters
• Using compound jobs it is possible to have one-shot submission of a (possibly very large, up to thousands) group of jobs
  – Submission time reduction
    § Single call to WMProxy server
    § Single Authentication and Authorization process
    § Sharing of files between jobs
  – Availability of both a single Job ID to manage the group as a whole and an ID for each single job in the group
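
Two of the compound job types can be illustrated with a short Python sketch (Python 3.9 or later, for `graphlib`). The dependency edges between nodeA through nodeE are invented for the example, since the figure's actual edges did not survive extraction, and the `_PARAM_` placeholder convention is an assumption about how parametric attributes are written.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each node maps to the set of nodes it depends on.
dag = {
    "nodeB": {"nodeA"},
    "nodeC": {"nodeA"},
    "nodeD": {"nodeB", "nodeC"},
    "nodeE": {"nodeD"},
}
# A valid execution order runs every job after all of its dependencies.
order = list(TopologicalSorter(dag).static_order())

def expand_parametric(template, start, stop, step=1):
    """Expand one parametric attribute into one value per parameter."""
    return [template.replace("_PARAM_", str(i)) for i in range(start, stop, step)]

inputs = expand_parametric("input_PARAM_.dat", 0, 3)
print(order)
print(inputs)
```

A collection corresponds to the degenerate case of a graph with no edges, so its jobs can run in any order.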

High Level Services: Job Information
• Logging and Bookkeeping service
  – Tracks jobs during their lifetime (in terms of events)
  – LBProxy for fast access
  – L&B API and CLI to query jobs
  – Support for "CE reputability ranking": maintains recent statistics of job failures at CEs and feeds back to WMS to aid planning
• Job Provenance: stores long term job information
  – Supports job rerun
  – Helps unload the L&B
  – Released with gLite 3.1
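
The idea behind "CE reputability ranking" (keep recent failure statistics per CE and feed them back to the planner) can be sketched as follows. This is a guess at the general mechanism, not the L&B implementation; the class name, the sliding-window size and the ranking rule are all assumptions.

```python
from collections import defaultdict, deque

class CEReputation:
    """Keeps the most recent job outcomes per CE (illustrative model)."""

    def __init__(self, window=100):
        # Only the last `window` outcomes per CE are remembered.
        self._recent = defaultdict(lambda: deque(maxlen=window))

    def record(self, ce, succeeded):
        self._recent[ce].append(bool(succeeded))

    def failure_rate(self, ce):
        outcomes = self._recent.get(ce)
        if not outcomes:
            return 0.0  # no history yet: treat the CE as unpenalized
        return outcomes.count(False) / len(outcomes)

    def rank(self, ces):
        # Lowest recent failure rate first.
        return sorted(ces, key=self.failure_rate)

# Hypothetical outcomes for two CEs.
reputation = CEReputation(window=4)
for ok in (True, False, False, True):
    reputation.record("ce1", ok)
for ok in (True, True, True, False):
    reputation.record("ce2", ok)
ranking = reputation.rank(["ce1", "ce2"])
print(ranking, reputation.failure_rate("ce1"))
```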

Highlights: Job Priorities
• Applications ask for the possibility to diversify the access to fast/slow queues depending on the user role/group inside the VO
• GPBOX is a tool that provides the possibility to define, store and propagate fine-grained VO policies
  – based on VOMS groups and roles
  – enforcement of policies at sites: sites may accept/reject policies
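
A minimal Python sketch of the policy idea: the VO defines which group or role gets which queue, and a site only enforces the policies it has accepted. This does not reflect GPBOX's actual policy language or API; the function, the policy list and the `/vo/group/Role=...` strings are hypothetical examples in the style of VOMS attributes.

```python
def select_queue(user_attributes, policies, accepted_by_site, default="slow"):
    """Pick a queue from VO policies that the site has agreed to enforce."""
    for attribute, queue in policies:      # policies are listed in priority order
        if attribute in user_attributes and attribute in accepted_by_site:
            return queue
    return default

# Hypothetical VO policies: production role gets the fast queue.
policies = [
    ("/biomed/Role=production", "fast"),
    ("/biomed", "slow"),
]
site_accepts = {"/biomed/Role=production", "/biomed"}

production_user = {"/biomed", "/biomed/Role=production"}
ordinary_user = {"/biomed"}

print(select_queue(production_user, policies, site_accepts))
print(select_queue(ordinary_user, policies, site_accepts))
```

If a site rejects the production policy, the same user falls through to the next accepted policy, which mirrors the slide's point that enforcement remains a site decision.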

Summary
• gLite 3 is
  – the next generation middleware for grid computing
  – developed according to a well defined process
    § controlled by the EGEE Technical Coordination Group
  – deployed on the EGEE production infrastructure
    § More than 200 sites
  – under continuing development to provide increased robustness, usability, and functionality
    § On the preview testbed: CREAM, Job Provenance, glexec on the WNs, GPBOX
• gLite sources: http://glite.cvs.cern.ch/cgi-bin/glite.cgi/

www.glite.org