- Number of slides: 20
Consorzio COMETA - PI2S2 FESR Project
Long-term job submission and monitoring using grid services
Riccardo Bruno
INFN, Catania Section
23/07/2007
Meeting on the use of parallel applications in PI2S2
www.consorzio-cometa.it
Outline
• Long-term job submission
– MyProxyServer
– Renewal
– The renewal process and JDL tag
– Long-term job submission
• Long-term job monitoring
– Middleware tools
– How to do monitoring efficiently
– The Watchdog
– Watchdog use example
– The main script
– The watchdog flow
– The main script code
– Some outputs
– The future …
• References
Catania, Meeting on the use of parallel applications in PI2S2, 23.07.2007
Long-term job submission
MyProxyServer
– The proxy has a limited lifetime (default is 12 h)
• It is a bad idea to create a longer-lived proxy
– Use a MyProxy server instead:
• myproxy-init --voms
Renewal
• A dedicated service on the RB can renew the proxy automatically: [edg-wl-renewd]
– /etc/init.d/edg-wl-proxyrenewal
• Some dedicated flags are required when creating the long-term proxy credential with myproxy-init:
– -d : use the proxy certificate subject (DN) as the default username, instead of the LOGNAME environment variable
– -n : don't prompt for a passphrase
bash-2.05b$ myproxy-init --voms cometa -d -n
Your identity: /C=IT/O=GILDA/L=INFN Catania/CN=Riccardo Bruno/Email=riccardo.bruno@ct.infn.it
Enter GRID pass phrase for this identity:
Creating proxy ..... Done
Proxy Verify OK
Your proxy is valid until: Fri Jul 23 09:30:33 2007
A proxy valid for 168 hours (7.0 days) for user /C=IT/O=GILDA/L=INFN Catania/CN=Riccardo Bruno/Email=riccardo.bruno@ct.infn.it now exists on grid001.ct.infn.it.
The renewal process and JDL tag
• 5 or 10 minutes before the proxy expires, the RB proxy renewal daemon performs the following steps:
– contacts the MyProxy server indicated in the JDL and asks for a new delegation
– contacts the VOMS server to add the ACs
– transfers the new VOMS-enabled proxy to the WNs running the job
• An additional attribute has to be added to the JDL:
– MyProxyServer = "grid001.ct.infn.it";
§ This attribute tells the RB which MyProxy server has to be contacted to renew the credentials. Otherwise, a default one is taken from the UI VO configuration settings: glite_wmsui.conf
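Adding the JDL attribute can be automated. The following POSIX sh sketch appends MyProxyServer to a JDL file only when it is not already set; add_myproxy_server is a hypothetical helper (not a gLite command), and grid001.ct.infn.it is simply the server used in these slides.

```shell
#!/bin/sh
# Hypothetical helper: ensure a JDL file carries the MyProxyServer attribute,
# so the RB renewal daemon knows which MyProxy server to contact.
add_myproxy_server() {
    jdl=$1      # path to the JDL file
    server=$2   # MyProxy host, e.g. grid001.ct.infn.it
    # JDL (ClassAd) attribute names are case-insensitive, hence grep -i
    if grep -qi '^[[:space:]]*MyProxyServer[[:space:]]*=' "$jdl"; then
        echo "MyProxyServer already set in $jdl"
        return 0
    fi
    # If the file does not end with a newline, add one before appending
    if [ -s "$jdl" ] && [ -n "$(tail -c 1 "$jdl")" ]; then
        echo >> "$jdl"
    fi
    # Every JDL attribute is terminated by ';'
    printf 'MyProxyServer = "%s";\n' "$server" >> "$jdl"
    echo "MyProxyServer = \"$server\" added to $jdl"
}
```

Usage: `add_myproxy_server testmyproxy.jdl grid001.ct.infn.it` before calling edg-job-submit. Running it twice leaves a single attribute in the file.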
Long-term job submission
• Create the long-term proxy on the MyProxy server
– myproxy-init --voms cometa -d -n
• Create a new proxy or get the delegation from the MyProxy server
– voms-proxy-init -voms cometa
– myproxy-get-delegation -d -a $X509_USER_PROXY (please notice that you must already have a valid proxy on the UI)
• Submit the job normally
– edg-job-submit -o jid testmyproxy.jdl
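The three steps can be collected in a small wrapper. This is a POSIX sh sketch, assuming the gLite UI commands listed on this slide are on the PATH; run and submit_long_term_job are hypothetical helpers, and DRY_RUN (default 1) only prints the commands so the sequence can be checked on a machine without a UI.

```shell
#!/bin/sh
# Illustrative values taken from the slides
VO=${VO:-cometa}
JDL=${JDL:-testmyproxy.jdl}
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command; execute it only when DRY_RUN is not 1
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

submit_long_term_job() {
    # 1. Store a long-term credential on the MyProxy server
    #    (-d: DN as username, -n: no passphrase, needed for renewal)
    run myproxy-init --voms "$VO" -d -n || return 1
    # 2. Create a short-lived local VOMS proxy
    #    (alternative: myproxy-get-delegation -d -a "$X509_USER_PROXY",
    #     which needs an already valid proxy on the UI)
    run voms-proxy-init -voms "$VO" || return 1
    # 3. Submit as usual; the job ID is saved in the file "jid"
    run edg-job-submit -o jid "$JDL"
}
```

With `DRY_RUN=0 submit_long_term_job` the real commands are executed; myproxy-init and voms-proxy-init will still prompt for the GRID pass phrase.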
Renewal feedback
• This job was executed with a delegated proxy 1 hour long (myproxy-get-delegation -d -t 1:00 -a $X509_USER_PROXY)
Starting at: 20070720124320
subject   : /C=IT/O=INFN/…/CN=proxy/CN=proxy/CN=limited proxy …
type      : limited proxy
strength  : 512 bits
path      : /tmp/globus-tmp.unime-wn-03.27834.0
timeleft  : 0:56:58
=== VO cometa extension information ===
VO        : cometa
subject   : /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Riccardo Bruno
issuer    : /C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it
attribute : /cometa/Role=NULL/Capability=NULL
timeleft  : 11:56:01
• The 1st call to voms-proxy-info returns 0:56:58 as time left
…
Other output from the job's core execution (just a sleep)
subject   : /C=IT/O=INFN/…/CN=proxy/CN=proxy/CN=limited proxy …
type      : limited proxy
strength  : 512 bits
path      : /tmp/globus-tmp.unime-wn-03.27834.0
timeleft  : 8:45:18
=== VO cometa extension information ===
VO        : cometa
subject   : /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Riccardo Bruno
issuer    : /C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it
attribute : /cometa/Role=NULL/Capability=NULL
timeleft  : 10:26:00
Ending at: 20070720141321
• After the job's core execution, the 2nd call to voms-proxy-info gives 8:45:18 as time left
• Please notice also the different subjects:
/C=IT/O=INFN/…/CN=proxy/CN=proxy/CN=limited proxy
/C=IT/O=INFN/…/CN=proxy/CN=proxy/CN=limited proxy
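The same check can be made automatically from inside the job script. A POSIX sh sketch, assuming the voms-proxy-info output format shown above (the first timeleft line is the proxy lifetime, the second belongs to the VOMS AC); strip_zeros, to_seconds, first_timeleft and proxy_renewed are hypothetical helpers.

```shell
#!/bin/sh
strip_zeros() {
    # Drop leading zeros so "08" is not parsed as an octal number
    v=$1
    while [ "${v#0}" != "$v" ]; do v=${v#0}; done
    echo "${v:-0}"
}

to_seconds() {
    # Convert an H:MM:SS time-left value to seconds
    old_ifs=$IFS; IFS=:
    set -- $1
    IFS=$old_ifs
    echo $(( $(strip_zeros "$1") * 3600 + $(strip_zeros "$2") * 60 + $(strip_zeros "$3") ))
}

first_timeleft() {
    # Read voms-proxy-info output on stdin; print the first timeleft value
    sed -n 's/^timeleft *: *//p' | head -n 1
}

proxy_renewed() {
    # Succeed if the time left grew between two calls ($1 before, $2 after)
    [ "$(to_seconds "$2")" -gt "$(to_seconds "$1")" ]
}
```

Usage: store `voms-proxy-info -all | first_timeleft` before and after the core execution, then run `proxy_renewed "$before" "$after" && echo "proxy renewed"`. With the values on this slide, 0:56:58 is 3418 s and 8:45:18 is 31518 s, so the check succeeds, while the AC time left (11:56:01 to 10:26:00) simply decreases by the 1:30:01 that elapsed.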
Long-term job monitoring
Middleware tools
• Currently, gLite offers the following ways to monitor job execution:
– Interactive jobs, or direct X server communication via SSH tunneling
§ The user is forced to use an interactive JDL
§ The X client must be kept open for the whole job duration
– R-GMA
§ Using dedicated producers requires code changes, which are not always possible
§ Code changes are error-prone and need to be tested
– AMGA
§ Using the AMGA APIs requires code changes as well
How to do monitoring efficiently
• IDEA: perform job monitoring still using grid services, in the least invasive way possible.
– Observations:
§ Almost all jobs submitted to the grid are driven by shell scripts
• Shell scripts allow gathering valuable information in case of faults
• Shell scripts can drive more complex batch processing
§ Both the SE and the file catalog can be used as the simplest information system (IS) on the grid
• The lfc-* and lcg-* tools are already available for creating and retrieving files
• The latency of the storage CLI tools is very low compared to the duration of long-term jobs
– Requirements:
§ The monitoring tool should be configurable according to the user's needs
• A few shell environment variables can be used to configure the monitoring tool
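Configuring the tool through environment variables can look like the following POSIX sh sketch. Only WD_USER_PATH appears in these slides; WD_VO, WD_SE, WD_DELAY and WD_FILES are hypothetical names chosen to mirror the watchdog parameters (VO, SE, DELAY, list of files), and all defaults are illustrative.

```shell
#!/bin/sh
# Hypothetical configuration loader: each variable keeps the user's value
# if already exported, otherwise it falls back to a default.
load_wd_config() {
    WD_VO=${WD_VO:-cometa}                               # VO for lcg-*/lfc-* calls
    WD_USER_PATH=${WD_USER_PATH:-/grid/$WD_VO/$(id -un)} # catalog folder
    WD_SE=${WD_SE:-}                                     # SE host; empty = VO default
    WD_DELAY=${WD_DELAY:-60}                             # seconds between snapshots
    WD_FILES=${WD_FILES:-"file.out file.err"}            # files to monitor
    # Reject a non-numeric delay early instead of failing inside the loop
    case $WD_DELAY in
        ''|*[!0-9]*) echo "WD_DELAY must be a non-negative integer" >&2; return 1 ;;
    esac
}
```

A user would then only export the variables that differ from the defaults (for example `WD_DELAY=300`) before starting the watchdog.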
The Watchdog
• The Watchdog is a shell script to be included in the main script.
– Some watchdog features:
§ It starts in the background before the long-term job is run
§ The watchdog runs as long as the main job
§ The main script can stop it and wait until the watchdog has finished
§ It is easily and highly configurable
§ The watchdog does not compromise the CPU power of the WN
§ The watchdog is really simple and its behavior can be extended by the user
• The best way to explain the watchdog is through a usage example …
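A minimal watchdog skeleton with these properties might look as follows. This POSIX sh sketch assumes the control-file protocol of the main script in this presentation (the watchdog runs while watchdog.ctrl exists and creates watchdog.done when it exits); snapshot_hook is a hypothetical placeholder for the real work of copying files to the SE/catalog, and it assumes the watchdog creates watchdog.ctrl before the main script removes it.

```shell
#!/bin/sh
WD_DELAY=${WD_DELAY:-60}   # seconds between snapshots (illustrative default)

snapshot_hook() {
    # Placeholder: the real watchdog would copy the monitored files here
    echo "snapshot at $(date +%y%m%d%H%M%S)"
}

watchdog() {
    rm -f watchdog.done
    touch watchdog.ctrl
    while [ -e watchdog.ctrl ]; do
        snapshot_hook
        # Sleep in 1 s steps: almost no CPU is used, and a removed
        # control file is noticed quickly
        i=0
        while [ "$i" -lt "$WD_DELAY" ] && [ -e watchdog.ctrl ]; do
            sleep 1
            i=$((i + 1))
        done
    done
    snapshot_hook          # final snapshot once the main job has finished
    touch watchdog.done    # notify the main script
}
```

In a watchdog.sh file the last line would simply be `watchdog`, so that `./watchdog.sh > watchdog.out &` starts it in the background.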
Watchdog use example
• The simplest use case involves the following files:
– The JDL: script.jdl
– The main script file: script.sh
– The watchdog script file: watchdog.sh
[Diagram: script.jdl, with InputSandbox = script.sh, watchdog.sh and OutputSandbox = file.out, file.err, watchdog.out]
Type = "Job";
JobType = "Normal";
Executable = "/bin/bash";
StdOutput = "file.out";
StdError = "file.err";
InputSandbox = {"watchdog.sh", "script.sh"};
OutputSandbox = {"file.out", "file.err", "watchdog.out"};
Arguments = "script.sh";
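Before submitting, it can help to verify that every file named in the InputSandbox exists in the submission directory. A POSIX sh sketch; check_input_sandbox is a hypothetical helper, and it assumes the attribute is written on a single line with the exact spelling used in script.jdl above (multi-line lists and file names with spaces are not handled).

```shell
#!/bin/sh
# Hypothetical pre-submission check of the JDL InputSandbox list
check_input_sandbox() {
    jdl=$1
    missing=0
    # Extract the text between { and }, then split on commas and drop quotes/blanks
    files=$(sed -n 's/^[[:space:]]*InputSandbox[[:space:]]*=[[:space:]]*{\(.*\)}.*/\1/p' "$jdl" \
        | tr ',' '\n' | tr -d '" \t')
    for f in $files; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return $missing
}
```

Usage: `check_input_sandbox script.jdl && edg-job-submit -o jid script.jdl`.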
The main script
• It is good practice to structure the main script as follows:
– Get information about the WN
– Start the watchdog
– Execute and control the main job
– Stop the watchdog
– Collect information about the job execution
The watchdog flow
[Flowchart] Configuration: VO, USERPATH, FILE Catalog, SE, DELAY, LIST OF FILES
– Initialization: USERPATH/JobId on the file catalog/SE
– Enter the loop
– For each file in the list, take a snapshot (only increments are copied)
– If the CTRL file still exists, repeat the loop
– Otherwise, create the notification file
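The "only increments are copied" step can be sketched in POSIX sh as below: for each monitored file, the bytes appended since the previous snapshot are written to a new piece named with the same timestamp_filename pattern seen in the catalog listing. Keeping the offset in a hidden .wd_offset_* file is a hypothetical choice, the upload of the piece (e.g. with lcg-cr into USERPATH/JobId) is left as a comment because it needs a grid UI, and piece names are unique only if snapshots of the same file are at least one second apart.

```shell
#!/bin/sh
# Hypothetical incremental snapshot of one monitored file
snapshot_increment() {
    f=$1
    [ -f "$f" ] || return 0
    base=$(basename "$f")
    state=".wd_offset_$base"                      # bytes already copied
    old=$(cat "$state" 2>/dev/null || echo 0)
    size=$(wc -c < "$f" | tr -d ' ')
    [ "$size" -gt "$old" ] || return 0            # nothing new since last time
    piece="$(date +%y%m%d%H%M%S)_$base"
    tail -c +"$((old + 1))" "$f" > "$piece"       # copy only the new bytes
    # Record what was actually copied, even if the file grew meanwhile
    added=$(wc -c < "$piece" | tr -d ' ')
    echo "$((old + added))" > "$state"
    echo "$piece"
    # Upload "$piece" to the SE/catalog here (not shown)
}
```

Inside the watchdog loop this would be called once per entry of the file list, e.g. `for f in file.out file.err; do snapshot_increment "$f"; done`.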
The main script code
#
# watchdog - Riccardo Bruno 200707
#
echo "Starting at: $(date +%y%m%d%H%M%S)"
HOSTNAME=$(hostname -f)
USER=$(whoami)
ARG1=$1
LOCALDIR=$(pwd)
echo "***************"
echo "HOST: $HOSTNAME"
echo "USER: $USER"
echo "ARGS: $ARG1"
echo "LOCALDIR is: $LOCALDIR"
echo "HOMEDIR is: $HOME"
echo "Content of home: "
ls -l "$HOME"
echo "Content of current dir: "
ls -l .
echo "***************"
# start the watchdog in the background
chmod +x watchdog.sh
./watchdog.sh > watchdog.out &
# perform 8 iterations, 15 seconds each (2 minutes)
for i in $(seq 1 8)
do
  echo "This is my output at: $(date +%y%m%d%H%M%S)"
  echo "This is my error at: $(date +%y%m%d%H%M%S)" 1>&2
  sleep 15
done
# stop the watchdog and wait until it has finished
rm -f watchdog.ctrl
while [ ! -e watchdog.done ]
do
  sleep 1
  echo "Waiting for watchdog: $(date +%y%m%d%H%M%S)"
done
echo "Watchdog closed"
echo "done" 1>&2
echo "Ending at: $(date +%y%m%d%H%M%S)"
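The final wait loop never ends if the watchdog dies before creating watchdog.done. A bounded variant, sketched in POSIX sh; stop_watchdog is a hypothetical name and the 300 s default timeout is an assumed value.

```shell
#!/bin/sh
# Hypothetical bounded replacement for "stop and wait for the watchdog"
stop_watchdog() {
    timeout=${1:-300}    # seconds to wait for watchdog.done
    rm -f watchdog.ctrl  # ask the watchdog to stop
    waited=0
    while [ ! -e watchdog.done ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Watchdog did not finish within ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "Watchdog closed"
}
```

The main job's results are then still collected even when the monitoring side has failed; the non-zero return value lets the script record that in file.err.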
Some outputs
[brunor@glite-tutor tmp]$ lfc-ls -l /grid/gilda/brunor/2DFfQYycd5guISZSU3ZdOQ
-rw-rw-r--   1 1023     102      2211 Jul 18 16:13 070718161318_testmyproxy.out
-rw-rw-r--   1 1023     102        85 Jul 18 16:14 070718161347_testmyproxy.err
…
[brunor@glite-tutor brunor_2DFfQYycd5guISZSU3ZdOQ]$ cat file.out
Starting at: 070713155443
********************
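Since each piece name starts with a fixed-width yymmddHHMMSS timestamp, a plain lexical sort puts the pieces of one file in chronological order. A POSIX sh sketch; snapshot_pieces is a hypothetical helper that reads one name per line (for instance from `lfc-ls` on the job folder) and keeps only the pieces of the given file.

```shell
#!/bin/sh
# Hypothetical filter: list the snapshot pieces of file $1 in time order
snapshot_pieces() {
    while IFS= read -r name; do
        case $name in
            # 12-digit timestamp, underscore, then the monitored file name
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_"$1")
                printf '%s\n' "$name" ;;
        esac
    done | sort
}
```

Usage: `lfc-ls /grid/gilda/brunor/<JobId> | snapshot_pieces testmyproxy.out | tail -n 1` gives the most recent piece; if the pieces are increments and have been copied locally, concatenating them in this order rebuilds the file.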
The future …
• The watchdog can easily be improved
– Use a special folder in the catalog as a virtual UI on the WN, allowing the user to issue shell commands: WD_USER_PATH/
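The "virtual UI" idea could be prototyped as an extra step in the watchdog loop. In this POSIX sh sketch a local directory stands in for the catalog folder under WD_USER_PATH, and the .cmd/.out naming and process_commands function are hypothetical: each command file is run once with /bin/sh, its output and exit status are stored next to it, and the command file is removed.

```shell
#!/bin/sh
# Hypothetical command folder processing for a "virtual UI" on the WN
process_commands() {
    dir=$1
    for cmd in "$dir"/*.cmd; do
        [ -e "$cmd" ] || continue            # the glob matched nothing
        out="${cmd%.cmd}.out"
        /bin/sh "$cmd" > "$out" 2>&1         # run the user's command
        echo "exit: $?" >> "$out"            # record its exit status
        rm -f "$cmd"                         # run each command only once
    done
}
```

In a real deployment, the command files would be fetched from and the outputs copied back to the catalog folder with the lcg-*/lfc-* tools, and only the job owner should be able to write into that folder, since its content is executed on the WN.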
References
• The watchdog wiki
– https://grid.ct.infn.it/twiki/bin/view/PI2S2/WatchdogUtility
Questions…