Managing Computer Centre machines with Quattor
Germán Cancio and Piotr Poznański, IT/FIO
Post-C5, 12/12/03
http://quattor.org
DataGrid is a project funded by the European Commission under contract IST-2000-25182

Outline
• Concepts
• Architecture and Functionality
• Deployment, next steps
Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański

quattor in a nutshell
• quattor: fabric management system developed by EDG WP4
  - Configuration, installation and management of fabric nodes
• Used to manage most of the Linux nodes in the CERN CC
  - >1700 nodes out of ~2000
  - Multiple functionality (batch nodes, disk servers, tape servers, DB, web, …)
  - Heterogeneous hardware (memory, HD size, ...)
• Part of ELFms, together with
  - LEMON monitoring system
  - LEAF Hardware and State Mgmt system

Key concepts behind quattor
• Autonomous nodes:
  - Local configuration files
  - No remote management scripts
  - No reliance on global file systems (AFS/NFS)
• Central control:
  - Primary configuration is kept centrally (and replicated on the nodes)
  - A single source for all configuration information
• Reproducibility:
  - Idempotent operations
  - Atomicity of operations
• Scalability:
  - Load-balanced servers, scalable protocols
• Use of standards:
  - HTTP, XML, RPM/PKG, SysV init scripts, …
• Portability:
  - Linux, Solaris
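
The "idempotent" and "atomic" properties listed under Reproducibility can be illustrated with a small sketch. This is not quattor code: the function name and the file it writes are hypothetical, and it only assumes that a node-side tool regenerates a local configuration file from a desired content string.

```python
import os
import tempfile


def write_config_atomically(path: str, content: str) -> bool:
    """Bring the file at `path` to `content`; return True only if it changed."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            if fh.read() == content:
                return False  # idempotent: already in the desired state
    except FileNotFoundError:
        pass  # no file yet, so it has to be created

    # Write to a temporary file in the same directory, then rename it over
    # the target, so readers see either the old or the new file, never a
    # half-written one.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(content)
        os.replace(tmp_path, path)  # atomic on the same file system
    except BaseException:
        os.unlink(tmp_path)
        raise
    return True
```

Running the function twice with the same content changes the file only the first time, which is what makes repeated invocations (on boot, from cron) safe.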

quattor architecture - overview
• Configuration Management
  - Configuration Database
  - Configuration access and caching
  - Graphical and Command Line Interfaces
• Node and Cluster Management
  - Automated node installation
  - Node Configuration Management
  - Software distribution and management

Configuration Management

Configuration Information
• Configuration is expressed using the Pan language
• Information is arranged in templates
  - Common properties set only once
• Using templates it is possible to create hierarchies to match service structure
[Figure: example template hierarchy. CERN CC (name_srv1: 137.138.16.5, time_srv1: ip-time-1) at the top; below it the clusters lxbatch (cluster_name: lxbatch, master: lxmaster01, pkg_add (lsf5.1)), lxplus (cluster_name: lxplus, pkg_add (lsf5.1)) and disk_srv; below lxplus the nodes lxplus001 (eth0/ip: 137.138.4.246, pkg_add (lsf5.1_debug)), lxplus020 (eth0/ip: 137.138.4.225) and lxplus029.]
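
To make the idea of "common properties set only once" concrete, here is a rough Python sketch of how a node profile could be derived from a chain of templates. It is not Pan and not how the Pan compiler works; the `TEMPLATES` dictionary, the `parent` field and the `resolve` function are hypothetical, and the values are taken from the example hierarchy on the slide.

```python
# Hypothetical stand-ins for templates: each names an optional parent
# and the properties it sets itself.
TEMPLATES = {
    "cern_cc": {
        "parent": None,
        "props": {"name_srv1": "137.138.16.5", "time_srv1": "ip-time-1"},
    },
    "lxplus": {
        "parent": "cern_cc",
        "props": {"cluster_name": "lxplus", "packages": ["lsf5.1"]},
    },
    "lxplus001": {
        "parent": "lxplus",
        "props": {"eth0/ip": "137.138.4.246", "packages": ["lsf5.1_debug"]},
    },
}


def resolve(name):
    """Merge a template with all its ancestors, most general first."""
    chain = []
    while name is not None:
        chain.append(TEMPLATES[name])
        name = TEMPLATES[name]["parent"]

    profile = {}
    for template in reversed(chain):
        for key, value in template["props"].items():
            if key == "packages":
                # package additions accumulate down the hierarchy
                profile["packages"] = profile.get("packages", []) + value
            else:
                # scalar properties: the more specific template wins
                profile[key] = value
    return profile
```

Here `resolve("lxplus001")` picks up the name and time servers from the site level and the cluster name from the cluster level without the node template repeating them.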

Configuration Management Infrastructure
[Figure: configuration management infrastructure. Labels shown: GUI, CLI and Scripts in front of CDB (SOAP); pan and XML; RDBMS (SQL); on the Node, the CCM with its Cache (HTTP, Perl).]

Configuration Database (CDB)
• Keeps complete configuration information
• Configuration describes the desired state of the managed machines.
• Data consistency is enforced by a transaction mechanism
  - All changes are done in transactions
• Configuration is validated and kept under version control
  - Built-in validation (e.g. types), user-defined validation
• Going back to previous versions of the configuration is possible
  - Full history is kept in CVS.
• Conflicts of concurrent modification of the same configuration information are detected
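
One common way to get the two properties named above (all-or-nothing transactions and detection of concurrent modifications) is revision checking at commit time. The sketch below shows that general technique with an in-memory store; it is an assumption for illustration, not a description of how CDB itself is implemented, and `TemplateStore`, `ConflictError` and the `validate` callback are hypothetical names.

```python
class ConflictError(Exception):
    """Raised when a template changed after the session read it."""


class TemplateStore:
    def __init__(self):
        self._content = {}
        self._revision = {}
        self.history = []  # names touched by each committed transaction

    def read(self, name):
        """Return (content, revision); unknown templates are at revision 0."""
        return self._content.get(name), self._revision.get(name, 0)

    def commit(self, changes, validate):
        """Apply `changes` = {name: (base_revision, new_content)} atomically."""
        # Phase 1: check everything before touching the store.
        for name, (base_revision, content) in changes.items():
            if self._revision.get(name, 0) != base_revision:
                raise ConflictError(name)  # someone else committed first
            validate(name, content)  # expected to raise ValueError if invalid
        # Phase 2: all checks passed, so apply every change.
        for name, (base_revision, content) in changes.items():
            self._content[name] = content
            self._revision[name] = base_revision + 1
        self.history.append(sorted(changes))
```

If two sessions read the same template and both try to commit, the second commit fails with `ConflictError` and leaves the store untouched.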

SQL Query Interface
• We can ask about properties spanning across machines
• We can run SQL queries (SELECT) and create views:
  - "give me all machines with more than 512 Mbytes of memory"
  - "give me all machines that belong to lxplus"
• Portability: available for Oracle and MySQL
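
The two example questions could look roughly as follows in SQL. The slide does not show the actual CDB schema, so the `machines` table, its columns and the sample rows are invented for illustration, and SQLite (from the Python standard library) stands in for Oracle or MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and data; the real CDB tables are not shown in the talk.
conn.execute(
    "CREATE TABLE machines (name TEXT PRIMARY KEY, cluster TEXT, memory_mb INTEGER)"
)
conn.executemany(
    "INSERT INTO machines VALUES (?, ?, ?)",
    [
        ("lxplus001", "lxplus", 1024),
        ("lxplus020", "lxplus", 512),
        ("lxbatch001", "lxbatch", 2048),
    ],
)

# "give me all machines with more than 512 Mbytes of memory"
big_memory = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM machines WHERE memory_mb > 512 ORDER BY name"
    )
]

# "give me all machines that belong to lxplus", kept as a reusable view
conn.execute(
    "CREATE VIEW lxplus_nodes AS SELECT name FROM machines WHERE cluster = 'lxplus'"
)
lxplus = [row[0] for row in conn.execute("SELECT name FROM lxplus_nodes ORDER BY name")]
```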

Examples of information in CDB
• Hardware
  - CPU
  - Hard disk
  - Network card
  - Memory size
  - Node location in CC
• Software
  - Repository definitions
  - Service definitions = groups of packages (RPMs)
• System
  - Partition table
  - Load balancing information
• Cluster information
  - Cluster name and type
  - Batch master
• Audit information
  - Contract type and number
  - Purchase date

Graphical User Interface - PanGUIn

Configuration Cache Manager (CCM)
• Runs on every managed node
• Provides a local interface to the node's configuration information
• Information is downloaded from CDB and cached:
  - Access to the configuration is fast
  - Avoids load peaks on CDB servers
  - Disconnected operation is supported
  - Information is kept in sync with CDB using a notification/polling mechanism
• Access to local configuration information is performed through an easy-to-use API
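
The caching behaviour described above (fast local reads, continued operation when the server is unreachable) can be sketched as follows. This is only an illustration of the pattern, not the CCM implementation or its API: `ProfileCache`, `refresh`, `get_value` and the JSON cache file are assumptions, whereas the real system transports XML profiles over HTTP.

```python
import json
import os


class ProfileCache:
    """Keeps the last successfully fetched profile on local disk."""

    def __init__(self, cache_path, fetch):
        self.cache_path = cache_path
        self.fetch = fetch  # callable returning the profile as a dict

    def refresh(self) -> bool:
        """Try to update the cache; return False if the server is unreachable."""
        try:
            profile = self.fetch()
        except OSError:
            return False  # disconnected: keep serving the cached copy
        tmp_path = self.cache_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump(profile, fh)
        os.replace(tmp_path, self.cache_path)  # swap in the new copy atomically
        return True

    def get_value(self, path):
        """Read one configuration value from the local cache only."""
        with open(self.cache_path, encoding="utf-8") as fh:
            return json.load(fh)[path]
```

Because `get_value` never contacts the server, lookups cost a local file read and keep working during a network outage; `refresh` would be triggered by a notification or a polling timer.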

Node (Cluster) Management

Managing (cluster) nodes
[Figure: Software servers (SWRep) hold RPM/PKG packages and serve them over http, nfs or ftp to the SW Package Manager (SPMA) and its cache on the managed nodes, which handles the installed software (kernel, system, applications, ...). The Node Configuration Manager (NCM) configures system services (AFS, LSF, SSH, accounting, ...) using CDB information obtained through CCM. The install server provides the base OS over nfs/http with dhcp and pxe; the Install Manager drives the vendor system installer (RH73, RHES, Fedora, …) for node (re)install.]

Install Manager
• Sits on top of the standard vendor installer, and configures it
  - Which OS version to install
  - Network and partition information
  - What core packages
  - Custom post-installation instructions
• Automated generation of control file (KickStart)
• It also takes care of managing DHCP (and TFTP/PXE) entries
• Can get its configuration information from CDB or via command line
• Available for RedHat Linux (Anaconda installer)
  - Allows for plugins for other distributions (SuSE, Debian) or Solaris

Node Configuration (I)
• NCM (Node Configuration Manager) is responsible for ensuring that reality on a node reflects the desired state in CDB.
• Framework system, where service-specific plug-ins called Components make the necessary system changes
  - Regenerate local config files (e.g. /etc/sshd_config)
  - Restart/reload services (SysV scripts)
  - Configuration dependencies (e.g. configure network before sendmail)
• Components invoked on boot, via cron or on CDB config changes
• Porting of SUE features to NCM components started, to be completed with next CERN certified Linux and Solaris versions.
  - Currently available: grub, quota, snmp, dns, automounter, network, inetd, globuscfg, spma, sysaccounting, edg-cfg, …
  - Keep portability between Linux/Solaris whenever possible
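
The "configure network before sendmail" rule amounts to running components in dependency order. A minimal sketch of that ordering, using a topological sort from the Python standard library (Python 3.9 or later), is shown below. The dependency table is hypothetical (only "network before sendmail" comes from the slide), and NCM's own dispatch logic is not shown in the talk.

```python
from graphlib import TopologicalSorter

# component -> components that must be configured before it.
# Only "network before sendmail" is from the talk; the dns entries are assumed.
PRE_DEPENDENCIES = {
    "network": set(),
    "dns": {"network"},
    "sendmail": {"network", "dns"},
}


def configuration_order(dependencies):
    """Return the components in an order that respects all pre-dependencies."""
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(dependencies).static_order())


order = configuration_order(PRE_DEPENDENCIES)
```

A dispatcher would then call each component's configure step in that order, and a circular dependency is reported up front instead of producing a half-configured node.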

Node Configuration (II)
• Component support libraries for ease of component development
  - Configuration information access
  - Configuration file manipulation
  - Advanced file operations
  - Process management
  - Exception handling libraries
• A tool geared towards sysadmins/operators makes it possible to query and visualize the node's configuration profile.

Software Management (I - Server)
(SPMA and SWRep were introduced in post-C5 14/3/03)
• SWRep = Software Repository
• Universal repository for storing Software:
  - Extendable to multiple platforms and packagers (RH Linux RPM, Solaris PKG, others like Debian pkg)
  - Multiple package versions/releases
• Management ("product maintainers") interface:
  - ACL based mechanism to grant/deny modification rights (packages associated to "areas")
• Client access: via standard protocols
  - HTTP, AFS/NFS, FTP
• Replication: using standard tools (rsync)
  - load balancing, redundancy

Software Management (II - Clients)
• SPMA = Software Package Management Agent
• Manage all or a subset of packages on the nodes
  - On production nodes: wipe out unknown packages, (re)install missing ones.
  - On development nodes (or desktops): non-intrusive, configurable management of system and security updates.
• Package manager, not only upgrader
  - Can roll back package versions
  - Transactional verification of operations
• Portability: Generic plug-in framework
  - Plug-ins available for Linux RPM and Solaris PKG (can be extended)
• Scalability:
  - Supports HTTP (also FTP, AFS/NFS)
  - Time smearing
  - Package pre-caching
• Possible to access multiple repositories (division/experiment specific)
• Modularity: can be configured via CDB, or locally
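
The difference between the production policy (wipe out unknown packages) and the non-intrusive policy can be expressed as a comparison between the desired and the installed package lists. The sketch below illustrates that comparison only; `plan_operations` is a hypothetical name, the version strings are invented, and it assumes one version per package name, whereas SPMA itself can keep several versions of the same package.

```python
def plan_operations(desired, installed, remove_unknown=True):
    """Compare {name: version} mappings and return sorted (action, name, version)."""
    operations = []
    for name, version in desired.items():
        if name not in installed:
            operations.append(("install", name, version))
        elif installed[name] != version:
            # covers both upgrades and rollbacks to an older version
            operations.append(("replace", name, version))
    if remove_unknown:
        # production policy: anything not in the desired list is removed
        for name, version in installed.items():
            if name not in desired:
                operations.append(("remove", name, version))
    return sorted(operations)


desired = {"openssh": "3.1p1-14", "lsf": "5.1"}
installed = {"openssh": "3.1p1-6", "lsf": "5.1", "games": "1.0"}

production_plan = plan_operations(desired, installed)
desktop_plan = plan_operations(desired, installed, remove_unknown=False)
```

Computing the full plan before executing anything is also what allows the set of operations to be verified as one transaction.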

Deployment

Quattor deployment @ CERN
• Quattor is used by FIO to manage most CC Linux nodes:
  - >1700 nodes, 15 clusters, to be scaled up to >5000 in 2006-8 (LHC)
  - LXPLUS, LXBATCH, LXSHARE, LXBUILD, disk and tape servers, Oracle DB servers
  - RedHat 7.3 and RHES 2.1
  - RHES 3.0 (also on IA64) to come soon
• Not deployed: InstallMgr, as CERN legacy solution (AIMS) still in use
  - AIMS interfaced to CDB by FIO (not fully automatic)
• Server cluster (LXSERV) hosting CDB and SWRep replicas
  - 4 RH73 nodes
  - CDB: ~260 general templates, and 2 templates per node (one derived from LANDB): >3800 templates in total
  - SWRep: >5900 software packages
• Solaris clusters, server nodes and desktops to come for Solaris 9
  - Cf. Ignacio's C5 presentation

FIO usage examples @ CERN-CC
• LSF batch system upgrade:
  - Upgrade from LSF 4.2 to LSF 5.1 on >1000 nodes within 15 minutes, without service interruption
• Security updates:
  - All security upgrades are done by SPMA
    · SSH security updates
    · KDE upgrades (~400 MB per node) on >700 nodes
    · etc. … (~once a week!)
• Kernel upgrades:
  - SPMA can handle multiple versions of the same package -> installation and activation (after reboot) of a new kernel can be separated in time
  - NCM component configures which kernel version to use

Deployment outside CERN-CC
• EDG: no time for wide deployment
  - Estimated effort for moving from LCFG to quattor exceeded remaining EDG lifetime
  - EDG focus on stability rather than middleware functionality
• Tutorials held at HEPiX and EDG conferences have generated positive feedback and interest:
  - Experiments: LHCb, Atlas
  - HEP institutes: UAM Madrid, LAL/IN2P3, Liverpool University, NIKHEF
  - Projects: Grille 5K (CNRS France)

Work in Progress (I)
• EDG finish-up
  - End user documentation, install guide, packaging
  - FIO will continue maintaining Quattor (as part of ELFms) after EDG finishes.
• Remaining developments
  - CDB fine grained access control (ACLs)
  - Scalability audit
  - Data encryption mechanisms: for sensitive data
  - Specialized GUIs and CLIs (e.g. operators)
• Improved procedures and workflows
  - Take into account new commands and functionality
  - Release cycle ('test', 'new', 'production' branches of CDB information)
• Finish migration out of legacy tools
  - Finish SUE migration: port of SUE features to NCM components in time for next certified Linux
  - ASIS + SUE/security + rpmupdate phaseout was finished in August.

Work in Progress (II)
• Single CDB for CERN-CC
  - Inclusion of Solaris clusters and nodes
• More integration with ELFms LEMON/LEAF
  - Interfaces from LEAF HMS/SMS to quattor being developed
  - LEMON sensors for quattor
• Upgrade LXSERV service cluster
  - New hardware
  - Split in front-end / back-end nodes
• LCG-2 integration for LXBATCH nodes (WorkerNodes)
  - Deploy software from EDG, LCG, experiment SW
  - Deploy NCM components for local grid services

http://quattor.org

Differences with ASIS/SUE
ASIS: (See post-C5 14/3/2003)
• Scalability
  - HTTP vs. shared file system
• Supports native packaging system (RPM, PKG)
• Manages all software on the node
• 'real' Central Configuration database
• (But: no end-user GUI, no package generation tool)
SUE:
• Focus on configuration, not installation
• Powerful configuration language
  - True hierarchical structures
  - Extendable data manipulation language
  - (user defined) typing and validation
  - Sharing of configuration data between components now possible
• Central Configuration Database
• Supports unconfiguring services
• Improved dependency model
  - Pre/post dependencies
• Revamped component support libraries

Differences with EDG-LCFG
• New and powerful configuration language
  - True hierarchical structures
  - Extendable data manipulation language
  - (user defined) typing and validation
  - Sharing of configuration data between components now possible
• Modularity
  - Clearly defined interfaces and protocols
  - Mostly independent modules
  - "light" functionality built in (e.g. package management)
• Portability
  - Plug-in architecture -> Linux and Solaris
• Removed non-scalable protocols
  - NFS mounts not necessary any longer
• Enhanced components
  - New component support libraries
  - Native configuration access API (NVA-API)
• Enhanced management of software packages
  - ACLs for SWRep
  - No need for RPM 'header' files
• Stick to the standards where possible
  - Installation subsystem uses system installer
  - Components don't replace SysV init.d subsystem

NCM Component example

    [...]
    sub Configure {
        my ($self, $config) = @_;

        # access configuration information
        my $arch = $config->getValue('/system/architecture');   # CDB API
        $self->Fail("not supported") unless ($arch eq 'i386');

        # (re)generate and/or update local config file(s)
        open (myconfig, '/etc/myconfig'); …

        # notify affected (SysV) services if required
        if ($changed) {
            system('/sbin/service myservice reload'); …
        }
    }

    sub Unconfigure { ... }