0b0ac72e9a2802ec81426ea50085c2e5.ppt
- Number of slides: 16
WP4 Fabric Management, 3rd EU Review
Maite Barroso, CERN, Maite.Barroso.Lopez@cern.ch
DataGrid is a project funded by the European Commission under contract IST-2000-25182
3rd EU Review, 19-20/02/2004
Outline
- Objectives from the technical annex
- Achievements: summary of WP4 products
- Lessons learned
- Future
- Exploitation
- Questions
WP4: main objective
"To deliver a computing fabric comprised of all the necessary tools to manage a center providing grid services on clusters of thousands of nodes."
- User job management (Grid and local)
- Automated management of large clusters
WP4 Architecture concepts
- Modularity: open interfaces and protocols
- Scalability: thousands of machines
- Automation: minimize manual interventions
- Node autonomy: operations are handled locally whenever possible
- Site autonomy: a site must keep control of its local resources
Fabric Management
[Diagram: Grid Users and Local Users around the FABRIC, with the subsystems Fabric Gridification, Resource Management, Installation, Configuration Management, Monitoring and Fault Tolerance]
Fabric Management: user job management
- Fabric Gridification (LCAS, LCMAPS): secure job submission and job control; local authorization and mapping of grid credentials
- Resource Management (RMS): ensures the efficient scheduling and execution of user (grid or local) jobs and their coordination with maintenance tasks
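As an illustration of "local authorization and mapping of grid credentials", the two steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the LCAS or LCMAPS interface: the function names, the allow and ban lists, the certificate subjects and the account names are all hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical site policy; none of these values come from the slides.
ALLOWED_VOS: Set[str] = {"atlas", "cms"}
BANNED_SUBJECTS: Set[str] = {"/O=Grid/CN=Revoked User"}
VO_TO_ACCOUNT: Dict[str, str] = {"atlas": "atlas001", "cms": "cms001"}


def authorize(subject: str, vo: str) -> bool:
    """Local authorization: the site decides whether this grid user may run here."""
    return subject not in BANNED_SUBJECTS and vo in ALLOWED_VOS


def map_to_local_account(subject: str, vo: str) -> Optional[str]:
    """Credential mapping: translate an authorized grid identity into a local account."""
    if not authorize(subject, vo):
        return None  # rejected before any mapping takes place
    return VO_TO_ACCOUNT.get(vo)


if __name__ == "__main__":
    print(map_to_local_account("/O=Grid/CN=Alice", "atlas"))
    print(map_to_local_account("/O=Grid/CN=Revoked User", "cms"))
```

Keeping the authorization decision separate from the mapping step reflects the site autonomy concept: the site keeps the final say over who uses its local resources.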
Fabric Management: automated management of large clusters
- Installation and Configuration Management: the fabric-wide CDB provides central storage and management of all fabric configuration information; subsystems running on the nodes take care of managing software packages and configuring local services
- Monitoring: provides a framework for gathering, transporting, storing and accessing performance, system status and environmental changes for all resources contained in a fabric
- Fault Tolerance: framework for automatic fault detection and correction; correlation engines regularly check the monitoring data and, if the data is not between defined limits, they trigger alarms or recovery actions
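The fault tolerance behaviour described on this slide (check monitoring data against defined limits, then trigger an alarm or a recovery action) can be sketched as below. This is a minimal sketch, not the WP4 implementation: the `Limit` class, the `correlate` function, the metric names and the sample values are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Limit:
    """Allowed range for one metric, plus an optional recovery action."""
    low: float
    high: float
    recovery: Optional[Callable[[str], str]] = None  # hypothetical hook


def restart_daemon(node: str) -> str:
    """Stand-in recovery action; a real one would act on the node itself."""
    return f"recovery: restart daemon on {node}"


def correlate(samples: Dict[str, Dict[str, float]],
              limits: Dict[str, Limit]) -> List[str]:
    """Check every sample against its limit and list the actions triggered."""
    actions: List[str] = []
    for node, metrics in sorted(samples.items()):
        for metric, value in sorted(metrics.items()):
            limit = limits.get(metric)
            if limit is None or limit.low <= value <= limit.high:
                continue  # no limit defined, or value within limits
            if limit.recovery is not None:
                actions.append(limit.recovery(node))
            else:
                actions.append(f"alarm: {metric}={value} on {node}")
    return actions


# Hypothetical limits and monitoring samples.
LIMITS = {
    "load_avg": Limit(low=0.0, high=8.0),
    "daemon_up": Limit(low=1.0, high=1.0, recovery=restart_daemon),
}
SAMPLES = {
    "node001": {"load_avg": 2.5, "daemon_up": 1.0},
    "node002": {"load_avg": 12.0, "daemon_up": 0.0},
}

if __name__ == "__main__":
    for line in correlate(SAMPLES, LIMITS):
        print(line)
```

In this sketch a metric with a recovery hook is repaired automatically and every other violation raises an alarm, which is one simple way to express "alarms or recovery actions".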
Fabric Management
[Diagram: the same architecture annotated with WP4 products: LCAS and LCMAPS (Fabric Gridification), RMS (Resource Management), and the Installation, Configuration Management, Monitoring and Fault Tolerance framework]
Lessons learned
- Fabric management components are not grid components themselves, but they are essential for a working grid.
- Fabric management components need to be deployed, stabilized and understood by system administrators before the rest of the middleware components.
- Experience with existing tools helped to gather requirements and early feedback from users and site administrators. But interim solutions tend to live longer than expected!
Lessons learned
- There is a real need to be able to install, configure and manage small, medium and large sites alike (the complexity is the same!).
- Sites find it very difficult to change their fabric management framework:
  - It implies learning a new framework: new procedures, new tools.
  - It has to coexist with legacy tools, services and procedures, hence it has to be modular and have very clean interfaces so that they can be replaced incrementally.
  - The EDG sites were testbeds, where tools and procedures could be imposed. This is not the case for production sites.
Future: move from testbeds to production fabrics
- Testbed: functionality
- Performance and scalability
- Simplification, automation
- Process and procedure
- Production: focus on providing a service; availability and reliability; stability and robustness
Future
- Gridification: evolution in the directions marked by GGF for authorization and authentication
- RMS: support and extension will be undertaken by EGEE; evolution towards data-aware scheduling in clusters
- EDG-LCFGng: no support after the end of the project; replaced by quattor, the final solution for installation and configuration management
- Quattor:
  - Security enhancements (e.g. fine-grained authorization for access to CDB, data encryption)
  - Porting to Solaris and to future RH versions or other Linux distributions
- Lemon:
  - Displays/GUIs
  - Port to other platforms (Solaris, Windows)
- Fault Tolerance:
  - User FT API
  - Port to other platforms
Exploitation
WP4 products have been deployed within the EDG testbed and within other production sites and Grid projects/environments:
- LHC Computing Grid project (LCG)
- CrossGrid project
- GridIce project
- Virtual Laboratory for E-science project
- FlowGrid project
- INFN Grid project
- CERN Computing Centre (~2000 nodes)
- Universidad Autonoma de Madrid (Spain)
- University of Liverpool (UK)
- NIKHEF (The Netherlands)
- LAL (Orsay, France)
- ZIB (Berlin, Germany)
- KIP (Heidelberg, Germany)
- Fermilab (U.S.)
- BARC (India)
Exploitation
These deployments will ensure the maintenance and evolution of the WP4 framework after the end of the DataGrid project.
Summary
- WP4 has delivered a complete and evolvable fabric management framework.
- Initial deployments at production sites show that the framework is accepted.
- The growing user community will ensure the continued existence of the WP4 fabric management framework after the end of the DataGrid project.
WP4 release schedule
[Timeline spanning 2001 to 2003]
- EDG 1.0: LCFG, CE Info Providers
- EDG 1.2: LCAS + edg_gatekeeper
- EDG 1.4.x: LCFGng, RMS, Fabric Monitoring
- EDG 2.0: LCMAPS, LCAS, VOMS
- Quattor: start and finish milestones are marked on the timeline