
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How)
CE + WN + site BDII: Installation and configuration
Bouchra RAHIM ([email protected] ma)
Africa 6 2010 - Joint EUMEDGRID-Support/EPIKH School for Grid Site Administrators
Rabat, 01.06.2011
www.epikh.eu

Outline
• Computing Element overview
• Worker Node overview
• CE CREAM overview
• gLite stack overview
• gLite CE site BDII
• gLite CE CREAM and WN

gLite stack overview

gLite overview: worker node

gLite overview
• User Interface (UI): the point of access for users to the gLite grid services
• WMS: the component that optimizes resource usage
• CE: the machine that manages the worker nodes
• WN: the machines that actually execute applications
• SE: the machines where files are stored
• LFC: used to "find" files on the grid
• BDII: the service responsible for publishing all information about your site
• Logging and Bookkeeping: as its name says, it logs job events and alerts the user when a job is finished

Computing Element Overview
• The Computing Element provides some of the main services of a site.
• Main functionalities:
  – job management (job submission, job control)
  – job status updates for the WMS
  – communication with the site BDII, which publishes all information about the Computing Element
• It can run several kinds of batch system:
  – Torque + MAUI
  – LSF
  – SGE
  – Condor

Torque + MAUI
• Torque server service:
  – pbs_server provides basic batch services such as receiving/creating a batch job.
• Torque client service:
  – pbs_mom places jobs into execution. It is also responsible for returning the job's output to the user.
• MAUI system service:
  – job_scheduler holds the site's policy, which decides which job is executed and when.

Site BDII*
• It used to be installed on the CE by default, but it is now better to install it on a dedicated server, physical or virtual.
• It collects information from all site GRISes** (for example SE, RB, LFC, etc.)
• The service is named bdii
• Log file: /opt/bdii/var/bdii.log
*BDII = Berkeley Database Information Index
**GRIS = Grid Resource Information Service

Worker Node Element Overview
• Worker Nodes are the machines that actually execute your job.
• Users can only access their services through a Computing Element.
• Their characteristics are collected by the Computing Element, which publishes all information through the BDII services.

CE CREAM overview
• CREAM = Computing Resource Execution And Management
• It accepts job submission requests coming from a WMS, as well as other job management requests.
• It exposes a web services interface.

Requirements
• Three or more machines:
  – one for the CE installation;
  – one for the site BDII installation;
  – the others for the WN installation.
• Architecture: 64 bit
• Operating System: Scientific Linux 5
• Two machines (CE and BDII) with a public IP address and both forward and reverse address resolution in the DNS
• The CE machine must be equipped with an X.509 certificate

BDII Installation

Preparing the Linux machine
• Network Time Protocol settings
  # yum install ntp
• Copy the ntp.conf file and the ntp directory from ftp://repo.magrid.ma/pub/CE_WN_BDII/ to /etc/ (WinSCP)
• Synchronize the date
  # /etc/init.d/ntpd stop
  # ntpdate ntp.marwan.ma
• Start the ntpd service and configure it to start on boot
  # /etc/init.d/ntpd start
  # chkconfig ntpd on

Preparing the Linux machine
• Disable SELinux: make sure /etc/selinux/config contains the line:
  SELINUX=disabled
• Check that you have a valid hostname
  # hostname -f
  # cat /etc/hosts
• Stop iptables
  # /etc/init.d/iptables stop
  # chkconfig iptables off
• Reboot
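The SELinux check above can be scripted. The following is a minimal sketch, not part of gLite or YAIM: the function name selinux_disabled is hypothetical, and the script only reads the file, it changes nothing.

```shell
#!/bin/sh
# Hypothetical helper (not part of gLite): succeeds when the given
# SELinux config file contains an uncommented SELINUX=disabled line.
selinux_disabled() {
    grep -Eq '^[[:space:]]*SELINUX=disabled[[:space:]]*$' "$1" 2>/dev/null
}

# Usage: pass the config file path; the default is the path named on the slide.
conf="${1:-/etc/selinux/config}"
if selinux_disabled "$conf"; then
    echo "SELinux is disabled in $conf"
else
    echo "SELinux is NOT disabled in $conf (or the file is missing)"
fi
```

Note that the setting in the file only takes effect after the reboot listed as the last step.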

Repository set up - BDII
• Add the middleware-specific repositories to the system repositories
  # cd /etc/yum.repos.d/
  # mv dag.repo.stop
  # export MREPO=http://repo.magrid.ma/yumrepo/glite32
  # REPOS="dag lcg-CA glite-BDII_site"
  # for name in $REPOS; do wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo; done
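Before downloading anything, the loop can be run as a dry run that only prints the wget commands. This is a sketch under the assumption that the repository URL on the slide reads http://repo.magrid.ma/yumrepo/glite32; the function name repo_commands is hypothetical.

```shell
#!/bin/sh
# Dry run: print the wget command for each repository file instead of
# running it. MREPO and the repository names are the values from the slide
# (the glite32 suffix is an assumed reading of the extracted text).
MREPO="http://repo.magrid.ma/yumrepo/glite32"
REPOS="dag lcg-CA glite-BDII_site"

repo_commands() {
    for name in $REPOS; do
        echo "wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo"
    done
}

repo_commands
```

Once the printed commands look right, they can be executed by piping the output to sh as root.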

Package installation - BDII
• Use yum to install the required packages
  # yum install lcg-CA ca-policy-egi-core ca-policy-lcg
  # yum install glite-BDII_site

YAIM Configuration
• All the sample configuration files are located in the /opt/glite/yaim/examples/siteinfo directory
• It is better to work on a copy of the original files
  # mkdir -p /opt/glite/yaim/etc/siteinfo/services/
  # cp /opt/glite/yaim/examples/siteinfo/site-info.def /opt/glite/yaim/etc/siteinfo/site-info.def
  # cp /opt/glite/yaim/examples/siteinfo/services/glite-bdii_site /opt/glite/yaim/etc/siteinfo/services/glite-bdii_site
  # cp /opt/glite/yaim/examples/users.conf /opt/glite/yaim/etc/siteinfo/users.conf
  # cp /opt/glite/yaim/examples/groups.conf /opt/glite/yaim/etc/siteinfo/groups.conf
  # cp /opt/glite/yaim/examples/siteinfo/edgusers.conf /opt/glite/yaim/etc/siteinfo/edgusers.conf

YAIM Configuration
• You can find some template files in: ftp://repo.magrid.ma/pub/CE_WN_BDII/
• Edit the site-info.def file and change the following variables:
  – SITE_NAME=MA-ZZ-School (name of the site)
  – CE_HOST=pcXX.magrid.ma (XX is the machine that will be the CE)
  – SITE_BDII_HOST=pcYY.magrid.ma (the current machine)
• Edit the services/glite-bdii_site file and change the following variables:
  – SITE_NAME=MA-ZZ-School
  – SITE_DESC="MA-ZZ-School"

YAIM Configuration - BDII
• Run the configuration command:
  # /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/siteinfo/site-info.def -n glite-BDII_site
• If everything is OK, run a basic test:
  # ldapsearch -x -h pcYY.magrid.ma -p 2170 -b "mds-vo-name=local,o=grid"
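A quick way to judge the ldapsearch result is to count the entries it returns, since each entry in LDIF output begins with a "dn:" line. The sketch below assumes nothing about gLite itself; count_entries is a hypothetical helper, and the sample LDIF is made up for illustration, not real BDII output.

```shell
#!/bin/sh
# Count LDAP entries in LDIF text read from standard input.
# Each entry starts with a "dn:" line; wrapped continuation lines
# start with a space and are therefore not counted.
count_entries() {
    grep -c '^dn:'
}

# Made-up sample input, only to show the helper at work.
count_entries <<'EOF'
dn: mds-vo-name=local,o=grid
objectClass: GlueTop

dn: GlueSiteUniqueID=MA-ZZ-School,mds-vo-name=local,o=grid
objectClass: GlueSite
EOF
```

On a live system the ldapsearch command from the slide would be piped into count_entries; a count of zero suggests the BDII is not publishing anything yet.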

CE CREAM Installation (on Torque/PBS)

Preparing the Linux machine
• Network Time Protocol settings
  # yum install ntp
• Copy the ntp.conf file and the ntp directory from ftp://repo.magrid.ma/pub/CE_WN_BDII/ to /etc/ (WinSCP)
• Synchronize the date with an NTP server
  # /etc/init.d/ntpd stop
  # ntpdate ntp.marwan.ma
• Start the ntpd service and configure it to start on boot
  # /etc/init.d/ntpd start
  # chkconfig ntpd on

Preparing the Linux machine
• Disable SELinux: make sure /etc/selinux/config contains the line:
  SELINUX=disabled
• Check that you have a valid hostname
  # hostname -f
  # cat /etc/hosts
• Stop iptables
  # /etc/init.d/iptables stop
  # chkconfig iptables off
• Reboot

Repository set up - CE
• Add the middleware-specific repositories to the system repositories
  # cd /etc/yum.repos.d/
  # mv dag.repo.stop
  # export MREPO=http://repo.magrid.ma/yumrepo/glite32
  # REPOS="dag lcg-CA glite-CREAM glite-TORQUE_server glite-TORQUE_utils"
  # for name in $REPOS; do wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo; done

Package installation - CE
• Use yum to install the required packages
  # yum clean all
  # yum install lcg-CA ca-policy-egi-core ca-policy-lcg
  # yum install glite-CREAM
  # yum install glite-TORQUE_server glite-TORQUE_utils
• Because of a dependency problem in the Tomcat distribution of SL 5, install xml-commons-apis first:
  # yum install xml-commons-apis

Before configuration - Host Certificates
• Some preliminary steps before configuration:
  – copy the host certificate and key to the default path:
  # cd
  # mv /root/pcXXcert.pem /etc/grid-security/hostcert.pem
  # mv /root/pcXXkey.pem /etc/grid-security/hostkey.pem
  # chmod 400 /etc/grid-security/hostkey.pem
  # chmod 600 /etc/grid-security/hostcert.pem
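After moving the files, their permissions can be verified with a short script. This is a sketch, assuming GNU stat (as shipped with Scientific Linux); has_mode is a hypothetical helper and the expected modes are the ones set on the slide.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's octal mode with an expected value.
# Relies on GNU stat (-c %a); fails if the file does not exist.
has_mode() {
    [ "$(stat -c %a "$1" 2>/dev/null)" = "$2" ]
}

# Expected modes as set on the slide: 400 for the key, 600 for the certificate.
for pair in /etc/grid-security/hostkey.pem:400 /etc/grid-security/hostcert.pem:600; do
    file=${pair%%:*}
    mode=${pair##*:}
    if has_mode "$file" "$mode"; then
        echo "OK   $file has mode $mode"
    else
        echo "FAIL $file is missing or does not have mode $mode"
    fi
done
```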

YAIM configuration - CE
• The main file to edit is site-info.def, where you specify some general settings and the parameters of the other components (CE CREAM)
• Other files to be edited are: wn-list.conf, users.conf, groups.conf, services/glite-creamce
• Set the variables to correct values, replacing the example ones.
  # vi services/glite-creamce
  CEMON_HOST=pcXX.$MY_DOMAIN
  CREAM_DB_USER=eumed
  CREAM_DB_PASSWORD=grid2011
  BLPARSER_HOST=pcXX.$MY_DOMAIN

YAIM configuration - CE
• Declare the worker nodes in wn-list.conf
  # vi wn-list.conf
  pcAA.magrid.ma
  pcBB.magrid.ma

YAIM configuration - CE
  CE_HOST=pcYY.magrid.ma
  CE_CPU_MODEL=XEON                 #cat /proc/cpuinfo
  CE_CPU_VENDOR=Intel
  CE_CPU_SPEED=2230
  CE_OS=ScientificSL
  CE_OS_RELEASE=5.5                 #cat /etc/redhat-release
  CE_OS_VERSION="Boron"
  CE_OS_ARCH=x86_64
  CE_MINPHYSMEM=512                 #cat /proc/meminfo on WN
  CE_MINVIRTMEM=512
  CE_PHYSCPU=1                      #total cpu in site
  CE_LOGCPU=4
  CE_SMPSIZE=4
  CE_OUTBOUNDIP=TRUE
  CE_INBOUNDIP=FALSE
  CE_OTHERDESCR="Cores=4,Benchmark=6.5-HEP-SPEC06"
• See: http://gkswiki.fzk.de/index.php5/Configuration_of_the_CREAM_CE

YAIM configuration - CE
• How to set CE_SI00, CE_SF00, CE_CAPABILITY, CE_OTHERDESCR?
• Look up the value for your processor at these links:
  – http://www.italiangrid.org/grid_operations/site_manager/HEP-SPEC06
  – https://hepix.caspur.it/processors/dokuwiki/doku.php?id=benchmarks:results
  – https://hepix.caspur.it/benchmarks/doku.php?id=bench:results_sl5_x86_64_gcc_412
• For example, for an Intel XEON 5520 2.23 GHz without Hyper-Threading, the table at the previous link gives a value of 95, and with a conversion factor of 1 HS06 = 40 this yields 95 × 40 = 3800, so:
  CE_SI00=3800
  CE_SF00=3800
  CE_CAPABILITY="CPUScalingReferenceSI00=3800"
  CE_OTHERDESCR="Cores=4,Benchmark=23.75-HEP-SPEC06"
• Where (3800/40)/4 = 23.75 is the benchmark value per core
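The arithmetic of this example can be reproduced with a small script, which may help when plugging in the value for a different processor. This is only a sketch: the function names si00 and benchmark_per_core are hypothetical, and the conversion factor of 40 is simply the one quoted on the slide.

```shell
#!/bin/sh
# Reproduce the slide's arithmetic with awk (POSIX sh has no floating point).
# si00: table value times conversion factor (the slide uses 1 HS06 = 40).
si00() {
    awk -v hs="$1" -v f="$2" 'BEGIN { printf "%d\n", hs * f }'
}

# benchmark_per_core: (SI00 / conversion factor) / number of cores.
benchmark_per_core() {
    awk -v si="$1" -v f="$2" -v n="$3" 'BEGIN { printf "%.2f\n", (si / f) / n }'
}

# Values from the slide: table value 95, factor 40, 4 cores.
SI00=$(si00 95 40)
echo "CE_SI00=$SI00"
echo "CE_OTHERDESCR=\"Cores=4,Benchmark=$(benchmark_per_core "$SI00" 40 4)-HEP-SPEC06\""
```

With the slide's inputs this prints CE_SI00=3800 and a Benchmark value of 23.75, matching the settings listed above.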

YAIM configuration - CE
  BATCH_SERVER=$CE_HOST
  JOB_MANAGER=lcgpbs
  CE_BATCH_SYS=pbs
  BATCH_LOG_DIR=/var/spool/pbs
  APEL_DB_PASSWORD=grid2011
  DGAS_ACCT_DIR=/var/spool/pbs/server_priv/accounting
  VOS="eumed"
  QUEUES="eumed"
  EUMED_GROUP_ENABLE="eumed"

YAIM configuration - CE
• After editing, run the configuration commands:
  # /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/siteinfo/site-info.def -n creamCE -n TORQUE_server -n TORQUE_utils
  # /opt/glite/yaim/bin/yaim -r -s /opt/glite/yaim/etc/siteinfo/site-info.def -n creamCE -f config_cream_blparser
• See: http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:devel:install-cream32

Check the CE
• http://grid.pd.infn.it/cream/field.php?n=Main.CheckYourCREAMCEConfiguration
• Download the script
  # wget http://grid.pd.infn.it/cream/CheckCreamConf/current/CheckCreamConf.pl
  # chmod +x CheckCreamConf.pl
• Run it:
  # ./CheckCreamConf.pl
• Check the output in CheckCreamConf.log

WN CREAM Installation (on Torque/PBS)

Preparing the Linux machine
• Network Time Protocol settings
  # yum install ntp
• Copy the ntp.conf file and the ntp directory from ftp://repo.magrid.ma/pub/CE_WN_BDII/ to /etc/ (WinSCP)
• Synchronize the date
  # /etc/init.d/ntpd stop
  # ntpdate ntp.marwan.ma
• Start the ntpd service and configure it to start on boot
  # /etc/init.d/ntpd start
  # chkconfig ntpd on

Preparing the Linux machine
• Disable SELinux: make sure /etc/selinux/config contains the line:
  SELINUX=disabled
• Check that you have a valid hostname
  # hostname -f
  # cat /etc/hosts
• Stop iptables
  # /etc/init.d/iptables stop
  # chkconfig iptables off
• Reboot

Repository set up - WN
• Add the middleware-specific repositories to the system repositories
  # cd /etc/yum.repos.d/
  # mv dag.repo.stop
  # export MREPO=http://repo.magrid.ma/yumrepo/glite32
  # REPOS="dag lcg-CA glite-WN glite-TORQUE_client"
  # for name in $REPOS; do wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo; done

Package installation - WN
• Use yum to install the required packages
  # yum clean all
  # yum install -y lcg-CA ca-policy-egi-core ca-policy-lcg
  # yum groupinstall glite-WN
  # yum install glite-TORQUE_client

WN - YAIM Configuration
• You can use the same configuration files edited on the CE:
  – this can be done on all worker nodes of a site;
  – so you don't need to re-edit anything!
• Copy the configuration files from the CE machine using the scp command:
  # mkdir /opt/glite/yaim/etc/siteinfo/
  # mkdir /opt/glite/yaim/etc/siteinfo/services
  # Copy the following files from the CE: site-info.def, users.conf, groups.conf and wn-list.conf
  [email protected] YY: /opt/glite/yaim/etc/siteinfo/site-info.def
  # Copy the glite-wn file from examples/services

WN - YAIM Configuration
• Ready to configure now
  # /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/siteinfo/site-info.def -n glite-WN -n TORQUE_client
• A basic test:
  – check the status of pbs_mom
  – run pbsnodes -a

Testing installation

Tests on CE
• Log in to the CE over SSH to test whether the CE can see the WNs and whether all main services are up and running
  # pbsnodes
  # /etc/init.d/gLite status

Tests on CE
• Log in to the CE over SSH and then become a grid pool user:
  # su - eumed001
• Create a file with the following content:
  $ vi test.sh
  #!/bin/sh
  sleep 20    #(it's useful to see the job status)
  hostname
• Set the permissions so that it is executable:
  $ chmod 700 test.sh

Tests on CE
• Launch the job locally on the CE
  $ qsub -q eumed test.sh
• Then check the list of jobs in execution on the CE
  $ qstat -a
  ce.localdomain:
                                                             Req'd  Req'd   Elap
  Job ID            Username Queue  Jobname  SessID NDS TSK Memory Time  S Time
  ----------------- -------- ------ -------- ------ --- --- ------ ----- - ----
  0.pc22.magrid.ma  eumed001 short  test.sh    5839  --  --     -- 00:15 R   --
• If you want more information:
  $ qstat -f 3
• If you want to abort a job:
  $ qdel 3    #3 is the job ID

Tests on CE
• If the "qstat -a" command prints no output, no jobs are being executed on the CE. This means your previous job has terminated, so you can now list its output.
  $ ls
  test.sh.e3  test.sh.o3
  $ cat test.sh.e3    #error file
  $ cat test.sh.o3    #output file
  wn.localdomain

JDL example
  $ vim hostname-cream.jdl
  Type = "Job";
  JobType = "Normal";
  Executable = "/bin/hostname";
  StdOutput = "hostname.out";
  StdError = "hostname.err";
  OutputSandbox = {"hostname.err", "hostname.out"};
  Arguments = "-f";
  OutputSandboxBaseDestUri = "gsiftp://localhost/tmp";

Working test
• Log in to the UI over SSH to test whether the CE can receive and execute a simple job
  $ ssh grid. [email protected] 01. magrid. ma    #password: gridXX
• Set up the user certificate (as root):
  # mkdir /home/grid01/.globus
  # cp /root/user_cert/usercert.pem /home/grid01/.globus/usercert.pem
  # cp /root/user_cert/userkey.pem /home/grid01/.globus/userkey.pem
  # chown grid01 /home/grid01/.globus/usercert.pem
  # chown grid01 /home/grid01/.globus/userkey.pem
  # chmod 400 /home/grid01/.globus/userkey.pem
  # su - grid01
• Create a proxy (GRID pass phrase: grid2011):
  $ voms-proxy-init --voms eumed
• Submit the job and check its status:
  $ glite-ce-job-submit -r pc22.magrid.ma:8443/cream-pbs-eumed -o ID hostname-cream.jdl
  $ glite-ce-job-status -i ID

Troubleshooting
• Which logs should you open if something goes wrong?
  – /var/log/messages, for general errors
  – /opt/glite/var/log (especially glite-ce-cream.log)
  – /var/spool/pbs/server_priv/accounting/, if even local submission to the batch system doesn't work.

References
• INFNGRID generic installation guide:
  – http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:install-3_2
• YAIM configuration variables:
  – https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables
• CE CREAM installation guide:
  – GLITE Cream CE 3.2 SL5 Installation Guide [INFNGRID Release Wiki]
• YAIM system administrator guide:
  – https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide400
• EUMEDGRID wiki:
  – http://wiki.eumedgrid.eu/bin/view
• EuMedGRID sites installation and setup tips:
  – http://wiki.eumedgrid.eu/twiki/bin/view/InfrastructureStatus/EumedSiteInstallation
• How To Check And Test Your CREAMCE:
  – http://grid.pd.infn.it/cream/field.php?n=Main.HowToCheckAndTestYourCREAMCE

Thank you for your kind attention! Any questions?