WP4 Fabric Management: 3rd EU Review
Maite Barroso, CERN
DataGrid is a project funded by the European Commission under contract IST-2000-25182
3rd EU Review, 19-20/02/2004

Outline
• Objectives (3'): summary of objectives for the whole project
• Achievements (5'): summary of all useful products
• Lessons learned (3')
• Future & Exploitation (4')
• Questions (10')

WP4: main objective
"To deliver a computing fabric comprised of all the necessary tools to manage a center providing grid services on clusters of thousands of nodes."
• User job management (Grid and local)
• Automated management of large clusters

WP4 objective
"To deliver a computing fabric comprised of all the necessary tools to manage a center providing grid services on clusters of thousands of nodes."
• User job management (Grid and local)
• Automated management of large clusters
The development work was divided into 6 subtasks:
• Configuration Management
• Installation Management
• Monitoring
• Fault Tolerance
• Resource Management
• Gridification

DataGrid Architecture
[Diagram: layered architecture, top to bottom; WP4 provides the Grid Fabric Services layer]
• Local Computing: Local Application, Local Database
• Grid Application Layer: Data Management, Job Management, Metadata Management, Object to File Mapping
• Collective Services: Information & Monitoring, Replica Manager, Grid Scheduler
• Underlying Grid Services: SQL Database Services, Computing Element Services, Storage Element Services, Replica Catalog, Authorization Authentication and Accounting, Service Index
• Grid Fabric Services (WP4): Resource Management, Configuration Management, Monitoring and Fault Tolerance, Node Installation & Management, Fabric Storage Management

WP4 Architecture design and the ideas behind it
• Modularity: independent modules that can work together as a complete solution, but that can also be taken independently and cleanly interfaced to already existing solutions
• Node autonomy: resolve local problems locally if possible
  - Cache the node configuration profile and keep a local monitoring buffer
• Information model: configuration is distinct from monitoring
  - Configuration == desired state (what we want)
  - Monitoring == actual state (what we have)
• Aggregation of configuration information
  - Good experience with LCFG concepts, with central configuration template hierarchies
• Scheduling of intrusive actions
• Plug-in authorization and credential mapping
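The information model above (configuration as desired state, monitoring as actual state) can be illustrated with a minimal sketch. This is not WP4 code: the function name `diff_state`, the dictionary layout, and the sample keys and values are all assumptions made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def diff_state(desired: Dict[str, Any],
               actual: Dict[str, Any]) -> List[Tuple[str, Any, Any]]:
    """Return (key, desired_value, actual_value) for every key that differs.

    A key missing on one side is reported with None for that side, so this
    sketch cannot tell "missing" apart from an explicit None value.
    """
    keys = sorted(set(desired) | set(actual))
    return [(k, desired.get(k), actual.get(k))
            for k in keys
            if desired.get(k) != actual.get(k)]


# Hypothetical node profile (what we want) and monitoring sample (what we have).
desired = {"kernel": "2.4.21", "sshd": "running", "afs_client": "installed"}
actual = {"kernel": "2.4.21", "sshd": "stopped"}

drift = diff_state(desired, actual)
for key, want, have in drift:
    print(f"{key}: want={want!r} have={have!r}")
```

Keeping the two states in separate structures means the comparison can run on the node itself against a cached profile, which fits the node-autonomy idea of resolving local problems locally.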

Automated management of large clusters
[Diagram: Grid and Fabric; components shown: Computing Element, RMS, Monitoring System, Installation System, Fault Tolerance, Configuration System]

Automated management of large clusters
[Diagram: Node, with Fault Tolerance System, Monitoring System, Configuration System, Installation System]

Automated management of large clusters
[Diagram: WP4 Fault Tolerance framework acting on a Node]

User job management (Grid and local)
[Diagram: job flow from the Workload Management System (WP1), external to the fabric, into the fabric]
• Legend: Gridification component, WP4 subsystem, non-WP4 subsystem
• External to fabric: Workload Management System (WP1)
• Internal to fabric: CE (Computing Element), job repository, RMS, farms, SE (Storage Element, WP5)
• LCAS plug-ins: static list, wall-clock time, quota check
• LCMAPS: uid/gid, other tokens

Achievements
• Long-term solution for system installation and configuration: a modular, robust, reliable and scalable system that addresses the needs of large computing clusters
• Interim solution proposed to the EU DataGrid testbed as installation and configuration management toolkit while the final quattor framework was being developed
• Framework for monitoring performance, system status and environmental changes for all resources contained in a fabric

Achievements
• RMS (Resource Management System): its main task is to maintain control over the fabric's farm resources and to ensure the efficient scheduling and execution of user (grid or local) jobs and their coordination with maintenance tasks
• Fault Tolerance Framework: framework for automatic fault detection and correction
• Gridification components (Computing Element, Local Centre Authorization Service, Local Credential Mapping Service): provide the mechanism for grid services to access the local fabric services, that is, secure job submission and job control
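The division of labour between an authorization service with plug-ins and a credential mapping service can be sketched as follows. This is an illustrative sketch only and not the actual LCAS or LCMAPS interface: every name here (`JobRequest`, `static_list_plugin`, `wallclock_plugin`, `authorize`, `map_credentials`), the subject strings, and the rule that every plug-in must accept a request are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class JobRequest:
    subject: str          # grid identity of the submitter (hypothetical format)
    wallclock_hours: int  # requested wall-clock time


# An authorization plug-in takes a request and answers allow (True) or deny.
AuthzPlugin = Callable[[JobRequest], bool]


def static_list_plugin(allowed: Set[str]) -> AuthzPlugin:
    """Allow only subjects that appear in a fixed list."""
    return lambda req: req.subject in allowed


def wallclock_plugin(max_hours: int) -> AuthzPlugin:
    """Allow only requests within a wall-clock time limit."""
    return lambda req: req.wallclock_hours <= max_hours


def authorize(req: JobRequest, plugins: List[AuthzPlugin]) -> bool:
    """Assumed policy: the request passes only if every plug-in allows it."""
    return all(plugin(req) for plugin in plugins)


def map_credentials(req: JobRequest,
                    mapping: Dict[str, Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Map a grid subject to a local (uid, gid) pair, or None if unmapped."""
    return mapping.get(req.subject)


# Hypothetical site policy and local account mapping.
plugins = [static_list_plugin({"/O=Example/CN=Alice"}), wallclock_plugin(48)]
mapping = {"/O=Example/CN=Alice": (5001, 500)}

alice = JobRequest("/O=Example/CN=Alice", wallclock_hours=12)
mallory = JobRequest("/O=Example/CN=Mallory", wallclock_hours=12)

alice_ok = authorize(alice, plugins)
mallory_ok = authorize(mallory, plugins)
alice_ids = map_credentials(alice, mapping) if alice_ok else None
```

Separating the allow/deny decision from the mapping to local credentials lets a site change its access policy without touching its account mapping, and vice versa.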

Lessons learned
• Fabric Management components are not grid components themselves, but they are essential for a working grid.
• Experience with existing tools and prototypes helped to gather requirements and early feedback from users.
• There is a real need to be able to install, configure and manage the sites:
  - Correctly, to avoid configuration errors that may affect not only the site but the whole grid response
  - Automatically, to reduce the workload of system administrators
  - Supporting adaptability, properly managing resource reconfigurations in a fault-tolerant way
  - In a reproducible way

Future & Exploitation
• All the WP4 partners are committed to continuing support for the WP4 middleware?? To be discussed during the workshop.
• Technical evolution (commitment from partners not needed; could be for whoever wants to work in this field in the future):
  - Gridification components: the components will evolve in the directions set by GGF for authorization and authentication (LCAS: GGF standards for expressing access policies; LCMAPS: support for more services, such as file access using GridFTP, and better OS insulation). Support and extension will be undertaken by EGEE.
  - RMS: evolution towards resource management in data-intensive cluster computing. Evolution towards OGSA.
  - LCFGng: no support or evolution after the end of the project.
  - Quattor: some open issues are being tackled by the partners: an overall installation toolkit and comprehensive end-user documentation. Future work on security enhancements (e.g. fine-grained authorization for access to CDB, data encryption). Porting to Solaris 9 and to future RH versions or other Linux distributions.
  - Lemon: displays/GUIs, enhancement of the simple data model, sensors for other platforms (Windows).
  - Fault Tolerance: improvements to rule design (web spider?), user FT API.

Future & Exploitation
WP4 products have been deployed not only within the EDG testbed, but also at other sites and in other Grid projects/environments (map of Europe with all the sites?):
• Virtual Laboratory for E-science project (The Netherlands)
• Fermilab's Site Authentication and Authorization service (SAZ). This triggered the development of the authorization call-out mechanism within Globus
• LHC Computing Grid project (LCG)
• CrossGrid
• GridIce project
• CERN Computing Centre (~2000 nodes)
• Universidad Autonoma de Madrid (Spain)
• University of Liverpool (UK)
• NIKHEF (The Netherlands)
• LAL (Laboratoire de l'Accélérateur Linéaire, Orsay, France)
• Zuse Institute Berlin (ZIB)

Future & Exploitation
• An excellent example of WP4 product exploitation by a production site is CERN:
  - The CERN Computer Centre was one of the main sources of WP4 requirements
  - Very close collaboration to test and evaluate some of the WP4 products (Lemon and quattor)
  - After a successful evaluation, they adopted them and made the necessary changes to run them in the production clusters (~2000 nodes)
  - Support and future evolution will be taken over by them??

Future & Exploitation
General concepts:
• Move from testbeds to production fabrics
• A production fabric has:
  - Inertia … as a virtue!
  - Charted QoS
  - Scalability
  - Procedures and manageability
• Cautious introduction
  - Retain qualities and add functionality!

Service Lifecycle Focuses
[Diagram: progression from Prototype to Production]
• Prototype: Proliferation, Elaboration
  - Focus on functionality
  - Performance and scalability
• Risks
  - Destabilisation
  - Workload
• Production: Simplification, Automation
  - Focus on uniformity, minimisation
  - Process and procedure
  - Availability and reliability
  - Stability and robustness

Questions?