- Number of slides: 30
Managing Computer Centre machines with Quattor
Germán Cancio and Piotr Poznański, IT/FIO
Post-C5, 12/12/03
http://quattor.org
DataGrid is a project funded by the European Commission under contract IST-2000-25182
IT Post-C5, 12.2003
Outline
- Concepts
- Architecture and functionality
- Deployment and next steps
Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 2
quattor in a nutshell
- quattor: fabric management system developed by EDG WP4
  - Configuration, installation and management of fabric nodes
- Used to manage most of the Linux nodes in the CERN CC
  - >1700 nodes out of ~2000
  - Multiple functionality (batch nodes, disk servers, tape servers, DB, web, …)
  - Heterogeneous hardware (memory, HD size, ...)
- Part of ELFms, together with
  - LEMON monitoring system
  - LEAF Hardware and State Mgmt system
Key concepts behind quattor
- Autonomous nodes:
  - Local configuration files
  - No remote management scripts
  - No reliance on global file systems (AFS/NFS)
- Central control:
  - Primary configuration is kept centrally (and replicated on the nodes)
  - A single source for all configuration information
- Reproducibility:
  - Idempotent operations
  - Atomicity of operations
- Scalability:
  - Load-balanced servers, scalable protocols
- Use of standards:
  - HTTP, XML, RPM/PKG, SysV init scripts, …
- Portability:
  - Linux, Solaris
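The reproducibility bullets (idempotent and atomic operations) can be illustrated with a minimal Python sketch. It is not quattor code: the function name `write_config_atomically` and the approach (compare, write to a temporary file, rename) are assumptions chosen to show the idea of an operation that is safe to repeat and never leaves a half-written file.

```python
import os
import tempfile


def write_config_atomically(path: str, content: str) -> bool:
    """Bring the file at `path` to `content`; return True only if it changed.

    Hypothetical helper, not part of quattor.
    """
    # Idempotence: if the file already has the desired content, do nothing.
    try:
        with open(path, "r", encoding="utf-8") as fh:
            if fh.read() == content:
                return False
    except FileNotFoundError:
        pass

    # Atomicity: write a temporary file in the same directory, then rename it
    # over the target, so readers see either the old or the new file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(content)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True
```

Calling the function twice with the same content changes the file at most once, which is what makes it safe to run from boot scripts, cron, or on every configuration change.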
quattor architecture - overview
[Diagram labels: Node, Configuration Management, Node Management]
- Configuration Management
  - Configuration Database
  - Configuration access and caching
  - Graphical and Command Line Interfaces
- Node and Cluster Management
  - Automated node installation
  - Node Configuration Management
  - Software distribution and management
Configuration Management
Configuration Information
- Configuration is expressed using the Pan language
- Information is arranged in templates
  - Common properties set only once
- Using templates it is possible to create hierarchies to match service structure
[Diagram: template hierarchy]
- CERN CC (name_srv1: 137.138.16.5, time_srv1: ip-time-1)
  - lxbatch (cluster_name: lxbatch, master: lxmaster01, pkg_add(lsf5.1))
  - lxplus (cluster_name: lxplus, pkg_add(lsf5.1))
    - lxplus001 (eth0/ip: 137.138.4.246, pkg_add(lsf5.1_debug))
    - lxplus020 (eth0/ip: 137.138.4.225)
    - lxplus029
  - disk_srv
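How a template hierarchy lets common properties be set only once can be sketched in Python. This is not Pan and does not reproduce Pan semantics. The dictionary layout, the `PARENT` mapping, the `resolve` function, and the rule that package lists accumulate down the hierarchy are all assumptions made for illustration. The values come from the slide's example, and the placement of the `lsf5.1_debug` package on lxplus001 is itself an assumption about the diagram.

```python
from typing import Dict, List, Optional

# Hypothetical templates using the values from the slide.
TEMPLATES: Dict[str, Dict[str, object]] = {
    "cern_cc": {"name_srv1": "137.138.16.5", "time_srv1": "ip-time-1"},
    "lxplus": {"cluster_name": "lxplus", "packages": ["lsf5.1"]},
    "lxplus001": {"eth0/ip": "137.138.4.246", "packages": ["lsf5.1_debug"]},
}

# Each template names the template it inherits from (assumed structure).
PARENT: Dict[str, str] = {"lxplus": "cern_cc", "lxplus001": "lxplus"}


def resolve(name: str) -> Dict[str, object]:
    """Build a node profile by applying templates from the root downwards."""
    chain: List[str] = []
    current: Optional[str] = name
    while current is not None:
        chain.append(current)
        current = PARENT.get(current)

    profile: Dict[str, object] = {}
    for template_name in reversed(chain):  # most general template first
        for key, value in TEMPLATES[template_name].items():
            if isinstance(value, list):
                # Lists (here: packages) accumulate instead of being replaced.
                profile[key] = list(profile.get(key, [])) + value
            else:
                # Scalars set lower in the hierarchy override inherited ones.
                profile[key] = value
    return profile
```

With this sketch, `resolve("lxplus001")` picks up the name and time servers from the site level and the cluster name from the cluster level without lxplus001 repeating them.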
Configuration Management Infrastructure
[Diagram. Components: GUI, CLI, Scripts, CDB, RDBMS, pan, XML, Cache, CCM, Node. Interface labels: SOAP, SQL, HTTP, Perl]
Configuration Database (CDB)
- Keeps complete configuration information
- Configuration describes the desired state of the managed machines.
- Data consistency is enforced by a transaction mechanism
  - All changes are done in transactions
- Configuration is validated and kept under version control
  - Built-in validation (e.g. types), user-defined validation
- Going back to previous versions of the configuration is possible
  - Full history is kept in CVS.
- Conflicts of concurrent modification of the same configuration information are detected
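The combination of transactions, validation, version history and conflict detection can be sketched as follows. This is an illustrative in-memory model only. The class `ConfigStore`, its methods, and the per-key conflict rule are assumptions and say nothing about how CDB is implemented (the slides only state that history is kept in CVS).

```python
from typing import Callable, Dict, List


class ConflictError(Exception):
    """Another commit changed the same key after this change was started."""


class ValidationError(Exception):
    """The proposed configuration failed validation."""


class ConfigStore:
    """Hypothetical versioned store: every commit is all-or-nothing."""

    def __init__(self, validator: Callable[[Dict[str, object]], None]):
        self._validator = validator
        self.history: List[Dict[str, object]] = [{}]  # version 0 is empty
        self._last_changed: Dict[str, int] = {}

    @property
    def version(self) -> int:
        return len(self.history) - 1

    def snapshot(self, version: int = -1) -> Dict[str, object]:
        """Return a copy of a given version (default: the latest)."""
        return dict(self.history[version])

    def commit(self, base_version: int, changes: Dict[str, object]) -> int:
        # Conflict detection: reject if a touched key changed since the base.
        for key in changes:
            if self._last_changed.get(key, 0) > base_version:
                raise ConflictError(key)
        candidate = self.snapshot()
        candidate.update(changes)
        self._validator(candidate)      # raises ValidationError on bad data
        self.history.append(candidate)  # all changes become visible together
        for key in changes:
            self._last_changed[key] = self.version
        return self.version


def require_int_memory(config: Dict[str, object]) -> None:
    """Example of a user-defined validation rule (hypothetical key name)."""
    if "memory_mb" in config and not isinstance(config["memory_mb"], int):
        raise ValidationError("memory_mb must be an integer")
```

Because old snapshots are never modified, going back to a previous version amounts to reading `snapshot(n)` for an earlier `n`.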
SQL Query Interface
- We can ask about properties spanning across machines
- We can run SQL queries (SELECT) and create views:
  - “give me all machines with more than 512 Mbytes of memory”
  - “give me all machines that belong to lxplus”
- Portability: available for Oracle and MySQL
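The two example questions can be expressed as SQL along the following lines. The slides do not show the actual CDB schema, so the `machines` table, its columns, the `lxplus_nodes` view and the sample rows are all made up for illustration. SQLite (from the Python standard library) stands in for Oracle or MySQL.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical schema and sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE machines ("
        "name TEXT PRIMARY KEY, cluster TEXT, memory_mb INTEGER)"
    )
    conn.executemany(
        "INSERT INTO machines VALUES (?, ?, ?)",
        [
            ("lxplus001", "lxplus", 512),     # invented memory sizes
            ("lxplus020", "lxplus", 1024),
            ("lxbatch001", "lxbatch", 2048),  # invented node name
        ],
    )
    # A view answering "give me all machines that belong to lxplus".
    conn.execute(
        "CREATE VIEW lxplus_nodes AS "
        "SELECT name FROM machines WHERE cluster = 'lxplus'"
    )
    return conn


def machines_with_more_memory_than(conn: sqlite3.Connection, mb: int) -> List[str]:
    """Answer "give me all machines with more than N Mbytes of memory"."""
    rows = conn.execute(
        "SELECT name FROM machines WHERE memory_mb > ? ORDER BY name", (mb,)
    )
    return [row[0] for row in rows]


def lxplus_members(conn: sqlite3.Connection) -> List[str]:
    """Read the cluster membership through the view."""
    rows = conn.execute("SELECT name FROM lxplus_nodes ORDER BY name")
    return [row[0] for row in rows]
```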
Examples of information in CDB
- Hardware
  - CPU
  - Hard disk
  - Network card
  - Memory size
  - Node location in CC
- Software
  - Repository definitions
  - Service definitions = groups of packages (RPMs)
- System
  - Partition table
  - Load balancing information
- Cluster information
  - Cluster name and type
  - Batch master
- Audit information
  - Contract type and number
  - Purchase date
Graphical User Interface - PanGUIn
Configuration Cache Manager (CCM)
- Runs on every managed node
- Provides a local interface to the node’s configuration information
- Information is downloaded from CDB and cached:
  - Access to the configuration is fast
  - Avoids load peaks on CDB servers
  - Disconnected operation is supported
  - Information is kept in sync with CDB using a notification/polling mechanism
- Access to local configuration information is performed through an easy-to-use API
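The caching behaviour described above (download, cache locally, keep working when the server is unreachable) can be sketched like this. It is a simplified model, not the CCM API. The class `ConfigCache`, its `refresh` and `get_value` methods, and the JSON cache file are assumptions, and the injected `fetch` function stands in for the download of the node profile.

```python
import json
import os
from typing import Callable, Dict


class ConfigCache:
    """Hypothetical local cache of a node's configuration profile."""

    def __init__(self, cache_path: str, fetch: Callable[[], Dict[str, str]]):
        self._cache_path = cache_path
        self._fetch = fetch  # downloads the profile; may raise OSError

    def refresh(self) -> bool:
        """Try to update the cache; return False if the server is unreachable."""
        try:
            profile = self._fetch()
        except OSError:
            return False  # disconnected: keep serving the cached copy
        tmp_path = self._cache_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump(profile, fh)
        os.replace(tmp_path, self._cache_path)  # swap in the new profile
        return True

    def get_value(self, path: str) -> str:
        """Read a value from the local cache only; never contacts the server."""
        with open(self._cache_path, encoding="utf-8") as fh:
            return json.load(fh)[path]
```

Reads always go to the local file, so lookups stay fast and put no load on the central servers. A `refresh` triggered by a notification or a polling timer is the only point of contact with them.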
Node (Cluster) Management
Managing (cluster) nodes
[Diagram; node types shown: standard nodes, managed nodes]
- Software servers: SWRep with packages (RPM, PKG), accessed via http, nfs, ftp
- Managed nodes: SW Package Manager (SPMA) with cache, Node Configuration Manager (NCM), CCM (fed from CDB)
  - Installed software: kernel, system, applications, ...
  - System services: AFS, LSF, SSH, accounting, ...
- Install server: Install Manager, base OS via nfs/http, dhcp, pxe
  - Vendor system installer (RH73, RHES, Fedora, …) for node (re)install
Install Manager
- Sits on top of the standard vendor installer, and configures it
  - Which OS version to install
  - Network and partition information
  - What core packages
  - Custom post-installation instructions
- Automated generation of control file (KickStart)
- It also takes care of managing DHCP (and TFTP/PXE) entries
- Can get its configuration information from CDB or via command line
- Available for RedHat Linux (Anaconda installer)
  - Allows for plugins for other distributions (SuSE, Debian) or Solaris
Node Configuration (I)
- NCM (Node Configuration Manager) is responsible for ensuring that reality on a node reflects the desired state in CDB.
- Framework system, where service-specific plug-ins called Components make the necessary system changes
  - Regenerate local config files (e.g. /etc/sshd_config)
  - Restart/reload services (SysV scripts)
  - Configuration dependencies (e.g. configure network before sendmail)
- Components invoked on boot, via cron or on CDB config changes
- Porting of SUE features to NCM components started, to be completed with next CERN certified Linux and Solaris versions.
  - Currently available: grub, quota, snmp, dns, automounter, network, inetd, globuscfg, spma, sysaccounting, edg-cfg, …
  - Keep portability between Linux/Solaris whenever possible
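The dependency bullet (configure network before sendmail) is a topological ordering problem. The sketch below shows one way to compute a valid run order with Python's standard `graphlib` module (Python 3.9+). The function `run_order` and the dependency table are hypothetical and do not describe how NCM resolves dependencies internally.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def run_order(pre_dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return components ordered so each runs after its pre-dependencies.

    `pre_dependencies` maps a component name to the set of components that
    must be configured before it. Raises graphlib.CycleError on a cycle.
    """
    return list(TopologicalSorter(pre_dependencies).static_order())


# Hypothetical dependency table using component names from the slide.
EXAMPLE_DEPENDENCIES: Dict[str, Set[str]] = {
    "network": set(),
    "dns": {"network"},
    "sendmail": {"network"},
}
```

A circular declaration such as `{"a": {"b"}, "b": {"a"}}` raises `CycleError` instead of producing an order, which is a reasonable place to report a misconfigured dependency.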
Node Configuration (II)
- Component support libraries for ease of component development
  - Configuration information access
  - Configuration file manipulation
  - Advanced file operations
  - Process management
  - Exception handling libraries
- A tool geared towards sysadmins/operators allows querying and visualizing the node’s configuration profile.
Software Management (I - Server)
(SPMA and SWRep were introduced in post-C5 14/3/03)
- SWRep = Software Repository
- Universal repository for storing software:
  - Extendable to multiple platforms and packagers (RH Linux RPM, Solaris PKG, others like Debian pkg)
  - Multiple package versions/releases
- Management (“product maintainers”) interface:
  - ACL-based mechanism to grant/deny modification rights (packages associated to “areas”)
- Client access: via standard protocols
  - HTTP, AFS/NFS, FTP
- Replication: using standard tools (rsync)
  - Load balancing, redundancy
Software Management (II - Clients)
- SPMA = Software Package Management Agent
- Manage all or a subset of packages on the nodes
  - On production nodes: wipe out unknown packages, (re)install missing ones.
  - On development nodes (or desktops): non-intrusive, configurable management of system and security updates.
- Package manager, not only upgrader
  - Can roll back package versions
  - Transactional verification of operations
- Portability: Generic plug-in framework
  - Plug-ins available for Linux RPM and Solaris PKG (can be extended)
- Scalability:
  - Supports HTTP (also FTP, AFS/NFS)
  - Time smearing
  - Package pre-caching
- Possible to access multiple repositories (division/experiment specific)
- Modularity: can be configured via CDB, or locally
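Two of the ideas above can be sketched in a few lines of Python: managing the full package set on a production node (remove unknown packages, install missing ones, move to the desired version in either direction), and time smearing so that nodes do not all hit the repository at once. The functions `plan_operations` and `smeared_delay` are hypothetical and are not SPMA's interface. The sketch also simplifies to one version per package, whereas the slides note that SPMA can handle several versions of the same package.

```python
import random
from typing import Dict, List, Tuple

Operation = Tuple[str, str, str]  # (action, package name, version)


def plan_operations(desired: Dict[str, str],
                    installed: Dict[str, str]) -> List[Operation]:
    """Compute the steps that take `installed` to exactly `desired`."""
    operations: List[Operation] = []
    # Wipe out packages that are not part of the desired configuration.
    for name in sorted(installed):
        if name not in desired:
            operations.append(("remove", name, installed[name]))
    # Install missing packages; replace those at the wrong version, which
    # covers both upgrades and rollbacks.
    for name in sorted(desired):
        if name not in installed:
            operations.append(("install", name, desired[name]))
        elif installed[name] != desired[name]:
            operations.append(("replace", name, desired[name]))
    return operations


def smeared_delay(node_name: str, window_seconds: int) -> float:
    """Pick a per-node start delay within the window (time smearing).

    Seeding with the node name makes the delay stable for a given node
    while spreading different nodes across the window.
    """
    return random.Random(node_name).uniform(0, window_seconds)
```

Running `plan_operations` on a node that already matches its desired state returns an empty list, so repeated runs are harmless.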
Deployment
Quattor deployment @ CERN
- Quattor is used by FIO to manage most CC Linux nodes:
  - >1700 nodes, 15 clusters; to be scaled up to >5000 in 2006-8 (LHC)
  - LXPLUS, LXBATCH, LXSHARE, LXBUILD, disk and tape servers, Oracle DB servers
  - RedHat 7.3 and RHES 2.1
  - RHES 30 (also on IA64) to come soon
- Not deployed: InstallMgr, as CERN legacy solution (AIMS) still in use
  - AIMS interfaced to CDB by FIO (not fully automatic)
- Server cluster (LXSERV) hosting CDB and SWRep replicas
  - 4 RH73 nodes
  - CDB: ~260 general templates, and 2 templates per node (one derived from LANDB): >3800 templates in total
  - SWRep: >5900 software packages
- Solaris clusters, server nodes and desktops to come for Solaris 9
  - Cf. Ignacio’s C5 presentation
FIO usage examples @ CERN-CC
- LSF batch system upgrade:
  - Upgrade from LSF 4.2 to LSF 5.1 on >1000 nodes within 15 minutes, without service interruption
- Security updates:
  - All security upgrades are done by SPMA
    - SSH security updates
    - KDE upgrades (~400 MB per node) on >700 nodes
    - etc … (~once a week!)
- Kernel upgrades:
  - SPMA can handle multiple versions of the same package -> installation and activation (after reboot) of a new kernel can be separated in time
  - NCM component configures which kernel version to use
Deployment outside CERN-CC
- EDG: no time for wide deployment
  - Estimated effort for moving from LCFG to quattor exceeded remaining EDG lifetime
  - EDG focus on stability rather than middleware functionality
- Tutorials held at HEPiX and EDG conferences have generated positive feedback and interest:
  - Experiments: LHCb, Atlas
  - HEP institutes: UAM Madrid, LAL/IN2P3, Liverpool University, NIKHEF
  - Projects: Grille5K (CNRS France)
Work in Progress (I)
- EDG finish-up
  - End user documentation, install guide, packaging
  - FIO will continue maintaining Quattor (as part of ELFms) after EDG finishes.
- Remaining developments
  - Scalability audit
  - Data encryption mechanisms: for sensitive data
  - CDB fine-grained access control (ACLs)
  - Specialized GUIs and CLIs (e.g. operators)
- Improved procedures and workflows
  - Take into account new commands and functionality
  - Release cycle (‘test’, ‘new’, ‘production’ branches of CDB information)
- Finish migration out of legacy tools
  - Finish SUE migration: port of SUE features to NCM components in time for next certified Linux
  - ASIS + SUE/security + rpmupdate phaseout was finished in August.
Work in Progress (II)
- Single CDB for CERN-CC
  - Inclusion of Solaris clusters and nodes
- More integration with ELFms LEMON/LEAF
  - Interfaces from LEAF HMS/SMS to quattor being developed
  - LEMON sensors for quattor
- Upgrade LXSERV service cluster
  - New hardware
  - Split in front-end / back-end nodes
- LCG-2 integration for LXBATCH nodes (WorkerNodes)
  - Deploy software from EDG, LCG, experiment SW
  - Deploy NCM components for local grid services
http://quattor.org
Differences with ASIS/SUE
ASIS (see post-C5 14/3/2003):
- Scalability
  - HTTP vs. shared file system
- Supports native packaging system (RPM, PKG)
- Manages all software on the node
- ‘real’ Central Configuration database
- (But: no end-user GUI, no package generation tool)
SUE:
- Focus on configuration, not installation
- Powerful configuration language
  - True hierarchical structures
  - Extendable data manipulation language
  - (user defined) typing and validation
  - Sharing of configuration data between components now possible
- Central Configuration Database
- Supports unconfiguring services
- Improved dependency model
  - Pre/post dependencies
- Revamped component support libraries
Differences with EDG-LCFG
- New and powerful configuration language
  - True hierarchical structures
  - Extendable data manipulation language
  - (user defined) typing and validation
  - Sharing of configuration data between components now possible
- Modularity
  - Clearly defined interfaces and protocols
  - Mostly independent modules
  - “light” functionality built in (e.g. package management)
- Portability
  - Plug-in architecture -> Linux and Solaris
- Removed non-scalable protocols
  - NFS mounts not necessary any longer
- Enhanced components
  - New component support libraries
  - Native configuration access API (NVA-API)
- Enhanced management of software packages
  - ACLs for SWRep
  - No need for RPM ‘header’ files
- Stick to the standards where possible
  - Installation subsystem uses system installer
  - Components don’t replace SysV init.d subsystem
NCM Component example

    [...]
    sub Configure {
        my ($self, $config) = @_;

        # access configuration information
        my $arch = $config->getValue('/system/architecture');  # CDB API
        $self->Fail("not supported") unless ($arch eq 'i386');

        # (re)generate and/or update local config file(s)
        open(myconfig, '/etc/myconfig'); …

        # notify affected (SysV) services if required
        if ($changed) {
            system('/sbin/service myservice reload'); …
        }
    }

    sub Unconfigure { ... }


