  • Number of slides: 19

Tony Doyle
a.doyle@physics.gla.ac.uk
GridPP – Making the Grid Work for the Science
ATSE e-Science Visit, Edinburgh, 20 April 2004
Tony Doyle - University of Glasgow

Contents
• Context
1. General (yesterday)
2. Process (today)
3. Operations (tomorrow)
• Start where Steve left off yesterday... End up where Andrew begins tomorrow...
– How does the Grid Work?
– Performance Indicators
– Why was the “failure rate” ~20%?
– Software Process
– External dependencies
– Managing a distributed project...
– Is GridPP a Grid?
• What is the Grid anyway? (from PP perspective)
– Demo...

How Does the Grid Work?
0. Web User Interface… or CLI
1. Authentication: grid-proxy-init
2. Job submission: edg-job-submit
3. Monitoring and control: edg-job-status, edg-job-cancel, edg-job-get-output
4. Data publication and replication: globus-url-copy, RLS
5. Resource scheduling – use of Mass Storage Systems: JDL, sandboxes, storage elements
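The command-line steps on this slide can be sketched as an ordered list of commands. This is a minimal illustration only: the JDL file name and the job identifier are hypothetical placeholders, the commands are assembled but not executed, and no command options are shown because the slide names none.

```python
import shlex


def workflow_commands(jdl_path, job_id):
    """Return the CLI steps named on the slide as argument lists, in order."""
    return [
        ["grid-proxy-init"],             # 1. authentication: create a proxy credential
        ["edg-job-submit", jdl_path],    # 2. job submission from a JDL description
        ["edg-job-status", job_id],      # 3. monitoring
        ["edg-job-get-output", job_id],  # 3. retrieve the output sandbox
    ]


if __name__ == "__main__":
    # "hello.jdl" and "<job-id>" are placeholders, not values from the talk.
    for argv in workflow_commands("hello.jdl", "<job-id>"):
        print(shlex.join(argv))
```

In practice the job identifier is the value returned when the job is submitted, so steps 2 and 3 would be chained rather than given a fixed string.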

Job Submission (behind the scenes)
[Diagram. Components: UI, Resource Broker, Job Submission Service, Compute Element, Storage Element, Replica Catalogue, Information Service, Logging & Book-keeping, Author. & Authen. Flow labels: grid-proxy-init, JDL, Input “sandbox”, DataSets info, SE & CE info, Publish, Input “sandbox” + Broker Info, Expanded JDL, Globus RSL, Job Submit Event, Job Query, Job Status, Output “sandbox”.]
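To make the JDL and sandbox labels concrete, here is a hedged sketch that renders a small job description. The attribute names (Executable, StdOutput, StdError, InputSandbox, OutputSandbox) are standard JDL attributes, but the helper functions and the file names are hypothetical and not taken from the talk.

```python
def jdl_string(value):
    """Quote a value as a JDL string literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def jdl_list(values):
    """Render a list of strings as a JDL list: {"a", "b"}."""
    return "{" + ", ".join(jdl_string(v) for v in values) + "}"


def render_jdl(executable, input_sandbox, output_sandbox,
               stdout="std.out", stderr="std.err"):
    """Build a minimal JDL text; the sandboxes list files shipped with the job and returned from it."""
    lines = [
        f"Executable = {jdl_string(executable)};",
        f"StdOutput = {jdl_string(stdout)};",
        f"StdError = {jdl_string(stderr)};",
        f"InputSandbox = {jdl_list(input_sandbox)};",
        f"OutputSandbox = {jdl_list(output_sandbox)};",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # "run.sh" is an illustrative file name only.
    print(render_jdl("run.sh", ["run.sh"], ["std.out", "std.err"]))
```

The input sandbox travels from the UI to the compute element with the job, and the output sandbox is what edg-job-get-output later retrieves.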

How do I Authorize?
[Diagram. Directory o=xyz, dc=eu-datagrid, dc=org: ou=People with CN=Homer Simpson and CN=Tony Doyle, each with an Authentication Certificate. VO Directory o=testbed, dc=eu-datagrid, dc=org: ou=Testbed1, ou=People, ou=???, with CN=Steven Hawking and CN=Tony Doyle, each with an Authentication Certificate. The “Authorization Directory” feeds mkgridmap, which combines it with local users and a ban list to produce the grid-mapfile.]
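The mapping step in this diagram can be sketched as follows. This is not the mkgridmap tool itself, only an illustration of the idea under stated assumptions: VO members are given as certificate subject names, each is mapped to one local account, and banned subjects are dropped. The function name, the example subjects and the account name are hypothetical.

```python
def build_gridmap(vo_members, local_account, banned=()):
    """Return grid-mapfile lines ("<subject>" <account>) for members not on the ban list."""
    banned_set = set(banned)
    lines = []
    for subject in sorted(set(vo_members)):  # de-duplicate and keep output stable
        if subject in banned_set:
            continue  # the ban list overrides VO membership
        lines.append(f'"{subject}" {local_account}')
    return lines


if __name__ == "__main__":
    # Illustrative subject names only.
    members = [
        "/O=Example/CN=Tony Doyle",
        "/O=Example/CN=Steven Hawking",
        "/O=Example/CN=Homer Simpson",
    ]
    for line in build_gridmap(members, "gridpp001",
                              banned=["/O=Example/CN=Homer Simpson"]):
        print(line)
```

The separation matters: the certificate proves who a user is (authentication), while the VO directory and the site's own ban list decide what that user may do locally (authorization).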

UK Certificate Authority and Virtual Organisation membership
• PP “users” engaged from many institutes
• UK e-Science Certificate Authority now used in application testbed
• UK participating in 6 out of 9 EDG Virtual Organisations

Performance indicators (as measured by end users)
Conclusion: prototype performance, but with quality assurance mechanisms built-in

Why was the “failure rate” ~20%?
I. Experiment Layer
II. Application Middleware
III. Grid Middleware
IV. Facilities and Fabrics
• Component Testing e.g. RB Stress Tests (LCG)
• RB never crashed
• Ran without problems at load for several days in a row: 20 streams with 100 jobs each (typical error rate ~2% still present)
• RB stress test in a job storm of 50 streams, 20 jobs each:
– 50% of the streams ran out of connections between UI and RB (a configuration parameter, but limited by machine constraints)
– Remaining 50% of streams finished normally (2% error rate)
– Time between job-submit and return of the command (acceptance by the RB) is 3.5 seconds (independent of number of streams)
• PROBLEMS ARE END-TO-END: e.g. site advertisement communicated via class ads to all sites (inc. e.g. CNAF) results in RB sending application jobs (e.g. AliEn for ALICE) to a “black hole”. These are recorded as “failures” (application corrects for these via re-submission)
• OTHER “PROBLEM” IS INCORPORATION OF ADDED FUNCTIONALITY
– ~Resolved by adherence to software process coupled to testbed structure… improved significantly within LCG (leading to EGEE)
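The figures on this slide can be turned into rough job counts. The sketch below is arithmetic on the quoted numbers only. The expected number of submissions per successful job additionally assumes that every attempt fails independently with the same probability, which is an assumption made here and not a claim from the talk; a “black hole” site would make failures correlated.

```python
from fractions import Fraction


def failed_jobs(streams, jobs_per_stream, error_rate):
    """Expected number of failed jobs for a given number of streams."""
    return streams * jobs_per_stream * error_rate


def expected_submissions(failure_rate):
    """Mean attempts per successful job if attempts fail independently (assumption)."""
    return 1 / (1 - failure_rate)


two_percent = Fraction(2, 100)

# Sustained load: 20 streams of 100 jobs at ~2% error rate.
steady_failures = failed_jobs(20, 100, two_percent)

# Job storm: 50 streams of 20 jobs, half of the streams lost their connection,
# the surviving half ran with a 2% error rate.
surviving_streams = 50 * Fraction(1, 2)
storm_failures = failed_jobs(surviving_streams, 20, two_percent)

# End-to-end "failure rate" of ~20%, corrected by re-submission.
attempts = expected_submissions(Fraction(20, 100))

if __name__ == "__main__":
    print(steady_failures, storm_failures, attempts)
```

Under that independence assumption a 20% failure rate costs about a quarter more submissions, which is why re-submission at the application layer can mask the problem without removing it.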

DataGrid Release Milestones
Evaluations (2.0.12)
• Features (2.0.12)
– R-GMA replaced MDS
– Refactored workload mgt.
– Interactive, MPI, chkpt. jobs
– Replica Location Service
– Web Service SE
EU Review (2.1.13)
• Features (2.1.13) [0.5 Mloc]
– Reasonable stability, reliability
– VOMS incorporated
– Bug fixes for all services
• Stabilisation time on application testbed typically a few months

Software Process Infrastructure
[Diagram labels: LCG grid software; applications (LHC experiments, projects, etc.); LCG Application Area: POOL, SEAL, PI, SIMU; SPI Infrastructure: common services and infrastructure; tools, templates, training; general QA, tests, integration, release.]
– Adopt the same set of tools, standards and procedures
– Adopt commonly used open-source or commercial software when easily available
– Avoid “do it yourself solutions”
– Avoid commercial software, since it may give licensing problems
Similar ways of working (process)

SPI Services Overview
[Diagram labels. General Services: CVS service, Collaborative Facilities, External Software, Web Portal, Task Management, Mailing Lists. Software Development: Coding, Quality Assurance, Deployment and Installation, Analysis and Design, Development, Release, Testing, Build systems, Specifications, Documentation.]
Provide General Services needed by each project
– CVS repository, Web Site, Software Library
– Mailing Lists, Bug Reports, Task Management, Collaborative Facilities
Provide solutions specific to the Software Development phases
– Tools, Templates, Policies, Support, Documentation, Examples

External Software
• We install software needed by Particle Physics projects
• Open Source and Public Domain software (libraries and tools) like:
– Compilers (icc, ecc)
– HEP made packages
– Scientific libraries (GSL)
– General tools (python)
– Test tools (cppunit, qmtest)
– Database software (mysql, mysql++)
– Documentation generators (lxr, doxygen)
– XML parsers (XercesC)
• There are currently 50 different packages, plus others under evaluation, amounting to more than 300 installations
• The LCG projects propose what to install in agreement with LHC needs
• The platforms are decided by the Architect Forum
– Linux RedHat 7.3 with the compilers
  • gcc 3.2 (rh73_gcc32)
  • icc 7.1 (rh73_icc71)
  • ecc 7.1 (rh73_ecc71)
– Windows
  • Visual Studio .NET 7.1 (win32_vc7)
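The Linux platform tags listed on this slide appear to follow a simple pattern: an operating system tag, an underscore, then the compiler name with the dots removed from its version. The helper below is a hypothetical sketch of that pattern inferred from the three Linux examples only; the Windows tag win32_vc7 does not follow it, so the rule should not be read as the project's actual naming scheme.

```python
def platform_tag(os_tag, compiler, version):
    """Compose a tag such as rh73_gcc32 from an OS tag, compiler name and dotted version."""
    return f"{os_tag}_{compiler}{version.replace('.', '')}"


if __name__ == "__main__":
    # The three Linux platforms named on the slide.
    for compiler, version in [("gcc", "3.2"), ("icc", "7.1"), ("ecc", "7.1")]:
        print(platform_tag("rh73", compiler, version))
```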

How is the process applied?
Middleware Validation: From Testbed to Production
[Diagram. Testbeds: Development Testbed ~15 CPU, Certification Testbed ~40 CPU, Application Testbed ~1000 CPU, Production. Stages: Build, Unit Test, Integration, Certification, Application Certification. Actors: WPs, Build system, Integration Team, Test Group, Apps. Representatives, Users. Flow labels: add unit tested code to repository; run nightly build & auto. tests; tagged package; individual WP tests; overall release tests; tagged release selected for certification; tagged candidate releases; Grid certification; certified candidate releases; certified release selected for deployment; certified public release for use by apps.; 24x7; problem reports; fix problems. Process to: test frameworks, test support, test policies, test documentation, test platforms/compilers.]

The UK Testbed

e.g. ScotGrid: Glasgow, Edinburgh and Durham
• Glasgow farm: WNs on a private network with outbound NAT in place
• 100,000 jobs completed (900,000 CPU hours)
• Data Management Testbed
• Shared resources (LHC, CDF and Bioinformatics)
• 34 dual blade servers and 5 TB FastT500 being integrated now (next door)
• Edinburgh: 24 TB FastT700 and 8-way server: data storage focus
• Durham: 40 node farm
• All being integrated into LCG-2
[Diagram labels: ScotGRID; EDG 1.4 CE, SE; EDG 2.1 SE; BIO; 59 x WN; CE; LHC; CDF; MON.]

Managing a Distributed Project: GridPP1 Project Status?
• 76% of the 190 GridPP1 tasks have been successfully completed

What is “The Grid” Anyway? / Is GridPP a Grid?
http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf
1. Coordinates resources that are not subject to centralized control
– YES. This is why development and maintenance of a UK-EU-US testbed is important
2. … using standard, open, general-purpose protocols and interfaces
– YES... Globus/CondorG/EDG meet this requirement. Common experiment application layers are also important here.
3. … to deliver nontrivial qualities of service
– NO(T YET)… Experiments define whether this is true: currently only ~100,000 jobs submitted via the testbed, c.f. internal component tests of up to 10,000 jobs per day. Next step: LCG-2 deployment outcome… this year

What is The Grid Anyway? From Particle Physics Perspective
The Grid is:
• not hype, but surrounded by it
• a working prototype running on testbed(s)…
• about seamless discovery of PC resources around the world
• using evolving standards for interoperation
• the basis for particle physics computing in the 21st Century
• not (yet) as transparent as end-users want it to be

The Grid: Demonstrations
http://www.gridpp.ac.uk/demos/
Demos used to establish that e.g. the two LHC multi-purpose detector collaborations
• can run jobs on an International Grid
• use common Grid infrastructure with secure Grid access
• But this doesn’t mean that the Grid works in production mode (yet)
• This is, however, significant