
Status of the European DataGrid Project
Charles Loomis (LAL/CNRS)
LAL, December 12, 2002
Outline:
• Introduction & Goals
• EDG Architecture
• EDG Deployment & Use
• External Software
• Typical Failure Modes
• Future Developments
C. Loomis – Status EDG – Dec. 12, 2002 – 1

European DataGrid (EDG)
• EU-funded, 3-year project (2001-3)
• Goals:
  - develop grid middleware
  - deploy onto working testbed
  - demonstrate grid technology with working applications
• Strong application component: unique!
• 6 Partners; 21 Associates
EDG Organization:
  WP1   Workload Mgt.
  WP2   Data Mgt.
  WP3   Info. & Monitoring Sys.
  WP4   Fabric Mgt.
  WP5   Storage Mgt.
  WP6   Testbed
  WP7   Networking
  WP8   HEP Apps.
  WP9   Earth Ob. Apps.
  WP10  Biomedical Apps.
  WP11  Dissemination
  WP12  Project Mgt.

EDG Goals
Actors: End Users, Site Administrators, Virtual Organizations
Transparent Access:
• Allow users transparent access to authorized resources with single authentication.
• Allow users to delegate authorization to services.
• High-level selection of resources, including datasets.
Virtual Organizations:
• Allow groups of people to acquire resources from sites.
• Allow organizations to manage resource use among members.
Optimization:
• Allow optimal use of resources at site and grid levels.

EDG Architecture
[Diagram: job flow between User Interface, Resource Broker, Information Systems, and Site X]
• User Interface: submits jobs to the Resource Broker and retrieves the output.
• Resource Broker: queries the Information Systems; broker chooses optimal site for job; submits the job to the site and retrieves the output.
• Information Systems: MDS and Replica Catalogs; sites publish state.
• Site X: Computing Element, Storage Element.
Global Batch System:
• Centralized architecture.
• Heavy infrastructure.
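The brokering step in the architecture above can be sketched in a few lines. This is only an illustration under stated assumptions, not the EDG Resource Broker's actual algorithm: the `SiteState` fields, the `choose_site` function, the site names, and the "most free CPUs wins" ranking are all hypothetical. The information system is modeled as a plain list of published site states, and the replica catalog as a dictionary from dataset name to the sites holding a copy.

```python
from dataclasses import dataclass, field


@dataclass
class SiteState:
    """State a site publishes to the information system (hypothetical fields)."""
    name: str
    free_cpus: int
    allowed_vos: set = field(default_factory=set)


def choose_site(sites, replica_catalog, vo, dataset=None):
    """Return the name of the best site for a job, or None if none qualifies.

    Assumed policy: the site must accept the user's VO, must have a free CPU,
    and must hold a replica of the requested dataset (if one is requested).
    Among the remaining candidates, the site with the most free CPUs wins.
    """
    candidates = [
        site for site in sites
        if vo in site.allowed_vos
        and site.free_cpus > 0
        and (dataset is None or site.name in replica_catalog.get(dataset, set()))
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda site: site.free_cpus).name


# Hypothetical published state and replica catalog, for illustration only.
sites = [
    SiteState("site-a", free_cpus=12, allowed_vos={"atlas", "cms"}),
    SiteState("site-b", free_cpus=40, allowed_vos={"cms"}),
    SiteState("site-c", free_cpus=3, allowed_vos={"atlas"}),
]
replica_catalog = {"run-001": {"site-a", "site-c"}}

print(choose_site(sites, replica_catalog, vo="cms"))
print(choose_site(sites, replica_catalog, vo="atlas", dataset="run-001"))
```

Even in this toy form the broker depends on fresh, complete state from every site, which is why the centralized design puts so much load on the information system.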

Comments (Centralized Architecture)
Optimization of Resources:
• Resource Broker
  - must know state of grid and schedule effectively
  - requires knowledge of site policies and user/job details
• Information System (MDS & RC)
  - must respond quickly to high-volume and high-rate queries
Central Points-of-Failure:
• Resource Broker (redundancy at VO-level)
• MDS (unique hierarchy; some redundancy possible)
With high-rate submissions:
• RB requires lots of memory, CPU, disk space.
• MDS requires lots of file descriptors, CPU.

Authentication & Authorization
[Diagram: certificate, proxy, and VO membership flow between User, Certification Authorities, Virtual Organizations, and Site X]
• User: requests a certificate from a Certification Authority; registers with a Virtual Organization; proxy sent to the site for authentication.
• Certification Authorities: ~15 national CAs (France, INFN, …).
• Example certificate subject: /C=FR/O=CNRS/OU=LAL/CN=Charles Loomis/[email protected] in2p3.fr
• Virtual Organizations: ~10 different VOs (ATLAS, CMS, …).
• Site X (Computing Element, Storage Element): updates CRLs; retrieves membership lists; accepts/rejects the request.
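The accept/reject step at the site can be sketched as three lookups: is the issuing CA trusted, is the certificate on that CA's revocation list, and does the subject appear in a VO membership list. This is a simplified illustration, not the Globus GSI implementation. Signature verification, proxy chains, and validity dates are deliberately left out, and the `accept_request` function, the dictionary layout, and the example names are assumptions made for this sketch.

```python
def accept_request(proxy, trusted_cas, crls, vo_members):
    """Return (accepted, reason) for a proxy presented to a site service.

    proxy       -- dict with 'subject', 'issuer' and 'serial' (hypothetical layout)
    trusted_cas -- set of CA names the site trusts
    crls        -- dict: CA name -> set of revoked serial numbers
    vo_members  -- dict: VO name -> set of member subject names
    """
    issuer = proxy["issuer"]
    if issuer not in trusted_cas:
        return False, "untrusted CA"
    if proxy["serial"] in crls.get(issuer, set()):
        return False, "certificate revoked"
    vos = sorted(vo for vo, members in vo_members.items()
                 if proxy["subject"] in members)
    if not vos:
        return False, "not a member of any accepted VO"
    return True, "member of " + ", ".join(vos)


# Hypothetical data, for illustration only.
trusted_cas = {"/C=XX/O=Example CA"}
crls = {"/C=XX/O=Example CA": {1002}}
vo_members = {"atlas": {"/C=XX/O=Example/CN=Alice"}}

alice = {"subject": "/C=XX/O=Example/CN=Alice",
         "issuer": "/C=XX/O=Example CA", "serial": 1001}
print(accept_request(alice, trusted_cas, crls, vo_members))
```

Each lookup relies on data the site fetches from elsewhere (CRLs from the CAs, membership lists from the VOs), so every one of them is a place where stale or unreachable data changes the outcome.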

Comments
Infrastructure:
• ~15 national CAs as production service
• 10 Virtual Organizations
  - High-Energy Physics: ALICE, BaBar, ATLAS, CMS, DZero, LHCb
  - Earth Observation
  - Biomedical Applications
  - Misc.: WP6, ITeam, Guidelines
Limited Central Points-of-Failure:
• VO Membership Server (for VO members)
• Certification Authority (for CA members)
Caching and infrequent updates minimize these problems, but compromise security.
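The trade-off between caching and security can be made concrete with a small sketch. It assumes a site keeps a local copy of a VO membership list and refreshes it only after a fixed interval. The `CachedMembership` class, its parameters, and the one-hour interval are hypothetical and not taken from the EDG software. The cache keeps answering while the membership server is down, but a member who has been removed is still accepted until the next refresh.

```python
class CachedMembership:
    """Local copy of a VO membership list, refreshed at most every max_age seconds."""

    def __init__(self, fetch, max_age, clock):
        self._fetch = fetch        # callable returning the current member names
        self._max_age = max_age    # seconds between refresh attempts
        self._clock = clock        # callable returning the current time in seconds
        self._members = set()
        self._fetched_at = None

    def is_member(self, name):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._max_age:
            try:
                self._members = set(self._fetch())
                self._fetched_at = now
            except ConnectionError:
                if self._fetched_at is None:
                    raise  # nothing cached yet, so there is no safe answer
                # Server unreachable: keep answering from the stale copy.
        return name in self._members


def demo():
    """Show that a removed member stays accepted until the cache expires."""
    server_list = {"/CN=alice", "/CN=bob"}
    now = [0]
    cache = CachedMembership(lambda: server_list, max_age=3600,
                             clock=lambda: now[0])
    results = [cache.is_member("/CN=bob")]      # first call fetches the list
    server_list.discard("/CN=bob")              # VO removes bob
    now[0] = 1800
    results.append(cache.is_member("/CN=bob"))  # still answered from the cache
    now[0] = 3600
    results.append(cache.is_member("/CN=bob"))  # cache refreshed
    return results


print(demo())
```

A shorter `max_age` narrows the window in which a removed member is still accepted, at the cost of more load on, and more dependence on, the central server.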

Deployment & Use
Production Testbed (1.4.0):
• For applications to use & stress software in “semi-production” environment.
• 8 sites (5 countries)
Development Testbed (1.4.0):
• To facilitate testing and integration of new middleware.
• 3 sites (3 countries)
  Site       Location          CPUs
  CC-IN2P3   Lyon (F)           400
  CERN       Geneva (CH)        164
  CNAF       Bologna (I)         40
             Legnaro (I)         50
  NIKHEF     Amsterdam (NL)      22
             Padova (I)          12
  RAL        Rutherford (GB)     16
Unlabeled values from the slide: 5 (CC-IN2P3 row), 6 (CERN row), 2 (row unclear).
Application Use:
• CMS Event Simulation
• ATLAS Event Simulation
• Regular Tutorial Use
Stability:
• Filled Grid this week!

Globus Experience
GSI Security (OK):
• Some limitations with size of proxies.
GridFTP (OK):
• Recent protocol change because of security fix.
Replica Catalog (OK, limited):
• Unannounced, unnecessary schema change.
GateKeeper/JobManager (Poor):
• Race conditions under load leading to failures.
• High resource use; poor response to errors.
Information System-MDS (Poor):
• Serious problems with stability.
• Query times increase dramatically under load.

Globus Experience (cont.)
Interaction:
• Generally responsive to identified problems.
• Little advance warning of major changes.
  - Schema changes.
  - Rewrite of JobManager/Batch System interface.
Testing:
• Essentially non-existent by Globus.
• Major delays in EDG because of MDS and Gatekeeper.
• Finding/testing/fixing of major problems done outside Globus.
“High-level” services inappropriate for production environment.

Condor Experience
Condor-G:
• Used for reliable job submission from Resource Broker.
• Developers are responsive to problems and provide quick fixes.
• Encountered few problems in our testing.
Condor:
• Supported “batch” system for EDG.
• Largely untested, but expect to use with next major release.

Typical Failure Modes
Operations:
• CRL generation (CA); CRL update (sites)
• Network accessibility (VO LDAP servers)
• Misconfiguration of services (typically SE)
Poor implementation (bugs):
• Most catastrophic ones eliminated.
Resource Exhaustion:
• File descriptors, ports, disk space.
Design Limitations:
• Central points-of-failure (RB, MDS).

Future Developments
EDG Plans:
• Advanced data management
  - Real “Storage Element”.
  - Replica Location Service (distributed Replica Catalog)
  - Replica Manager (higher-level user interface)
• Job Management
  - job splitting, checkpointing
  - interactive jobs
• Replace MDS with R-GMA.
• More robust, consistent security model.
• Local resources better tied to grid credentials.
OGSA (Open Grid Services Architecture):
• New services written as web services.
• Probably no complete conversion within EDG lifetime.

SlashGrid
File System:
• Uses grid credentials for access to local files.
• Frees grid user from local unix account.
  - Simplifies mapping of users to accounts.
  - Allows true account recycling.
More Uses:
• Could hide remote access to data.
• Provide compatibility to Globus security model.
• …
Implementation:
• User-space daemon on top of Coda kernel module.
• Plug-in interface allows easy extension.

Authentication & Authorization (VOMS)
[Diagram: certificate, “ticket”, and proxy flow between User, Certification Authorities, VOMS, and Site X]
• User: requests a certificate from a Certification Authority; requests a “ticket” from VOMS; proxy sent to the site for authentication and authorization.
• Certification Authorities: ~15 national CAs (France, INFN, …).
• Site X (Computing Element, Storage Element): updates CRLs; accepts/rejects the request.
• Local Authorization Decision!
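With VOMS, the authorization information travels with the proxy instead of being pulled by the site as a membership list, and the site decides locally what that information permits. The sketch below illustrates only that local decision. The ticket layout, the attribute names, and the `site_policy` table are hypothetical and do not reproduce the real VOMS attribute format. Verification of the VOMS signature on the ticket is omitted.

```python
def local_decision(ticket, site_policy, action):
    """Return True if any attribute carried in the ticket permits `action` here.

    ticket      -- dict with 'vo' and a list of 'attributes' (hypothetical layout)
    site_policy -- dict: (vo, attribute) -> set of actions this site permits
    """
    granted = set()
    for attribute in ticket["attributes"]:
        granted |= site_policy.get((ticket["vo"], attribute), set())
    return action in granted


# Hypothetical site policy and ticket, for illustration only.
site_policy = {
    ("atlas", "member"): {"submit_job", "read_data"},
    ("atlas", "production"): {"submit_job", "read_data", "write_data"},
}
ticket = {"vo": "atlas", "attributes": ["member"]}

print(local_decision(ticket, site_policy, "read_data"))
print(local_decision(ticket, site_policy, "write_data"))
```

The policy table belongs to the site, so two sites can answer differently for the same ticket.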

Conclusions
Software & Testbed:
• Production-quality security infrastructure in place.
• Production and development testbeds:
  - Deployed.
  - Starting to see heavy use by end-users.
  - Reasonable stability for the first time.
• Failure modes:
  - Moving from bugs and operations problems to design and resource limitations.
Unanswered Questions:
• Can optimization be achieved? At what level?
• How can resources be limited, reserved, and shared?
• Can efficient scheduling be done with inhomogeneous site policies?