Grid application development with gLite and P-GRADE Portal
Miklos Kozlovszky, MTA SZTAKI
m.kozlovszky@sztaki.hu
www.eu-egee.org
Enabling Grids for E-sciencE
EGEE-II INFSO-RI-031688
Contents
• P-GRADE Portal in a nutshell
• Workflow development with the Portal
• Workflow execution with the Portal
• Scaling up to a parametric workflow
Short history of P-GRADE Portal
• Parallel Grid Application and Development Environment
• Initial development started in the Hungarian SuperComputing Grid project in 2003
• It has been continuously developed since 2003
• Detailed information: http://www.portal.p-grade.hu/
• Open-source community development since January 2008: https://sourceforge.net/projects/pgportal/
Downloads of the open-source P-GRADE Portal
• 110 downloads within the first month
• ~697 total downloads so far
Main P-GRADE related projects
• EU SEE-GRID-1 (2004-2006)
  – Integration with LCG-2 and gLite
• EU SEE-GRID-2, SEE-GRID-SCI (2006-2008 / 2008-2010)
  – Parameter sweep extension
• EU CoreGrid (2005-2008)
  – To solve grid interoperation for job submission
  – To solve grid interoperation for data handling: SRB, OGSA-DAI
• GGF GIN (2006)
  – Providing the GIN Resource Testing portal
• EGEE 2, 3 (2006-2010)
  – RESPECT program tool used for training and application development
• ICEAGE (2006-2008)
  – P-GRADE Portal is used for training as the official portal of the GILDA training infrastructure
• EU EDGeS (2008-2009)
  – Transparent access to any EGEE and Desktop Grid systems
Portal installations
Multi-Grid service portal
To be used today!
Motivations for developing P-GRADE Portal
• P-GRADE Portal should
  – Hide the complexity of the underlying grid middleware
  – Provide a high-level graphical user interface that is easy to use for e-scientists
  – Support many different grid programming approaches:
    § Simple scripts and control (sequential and MPI job execution)
    § Scientific application plug-ins
    § Complex workflows
    § Parameter sweep applications, at both job and workflow level
    § Interoperability: transparent access to grids based on different middleware technologies (both computing and data resources)
  – Support several levels of parallelism
Layers in a Grid system
• Application toolkits, standards
• Higher-level grid services (brokering, …)
• Basic Grid services: AA, job submission, info, …
• Graphical interface
• P-GRADE Portal services
• Command line tools
• Grid middleware
What is a P-GRADE Portal workflow?
• A directed acyclic graph where
  – Nodes represent jobs (batch programs to be executed on a computing element)
  – Ports represent input/output files the jobs expect/produce
  – Arcs represent file transfer operations
• Semantics of the workflow:
  – A job can be executed if all of its input files are available
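The execution rule above (a job can run once all of its input files are available) can be sketched in a few lines. This is only an illustration of the semantics, not the portal's implementation; the function names and the diamond-shaped example workflow are hypothetical.

```python
from typing import Dict, List, Set


def runnable_jobs(inputs: Dict[str, Set[str]],
                  available: Set[str],
                  done: Set[str]) -> List[str]:
    """Jobs that have not run yet and whose input files are all available."""
    return sorted(job for job, needed in inputs.items()
                  if job not in done and needed <= available)


def run_workflow(inputs: Dict[str, Set[str]],
                 outputs: Dict[str, Set[str]],
                 initial_files: Set[str]) -> List[List[str]]:
    """Group the jobs into waves; jobs in one wave could run in parallel."""
    available = set(initial_files)
    done: Set[str] = set()
    waves: List[List[str]] = []
    while True:
        ready = runnable_jobs(inputs, available, done)
        if not ready:
            return waves
        waves.append(ready)
        for job in ready:
            done.add(job)
            available |= outputs[job]  # arcs deliver output files downstream


# Hypothetical workflow: A feeds B and C, which both feed D.
INPUTS = {"A": {"in.dat"}, "B": {"a.out"}, "C": {"a.out"}, "D": {"b.out", "c.out"}}
OUTPUTS = {"A": {"a.out"}, "B": {"b.out"}, "C": {"c.out"}, "D": {"result.dat"}}
print(run_workflow(INPUTS, OUTPUTS, {"in.dat"}))  # [['A'], ['B', 'C'], ['D']]
```

Jobs B and C land in the same wave because neither depends on the other, which is the workflow-branch parallelism the portal exploits.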
Three levels of parallelism
– Job level: parallel execution inside a workflow node (MPI job as workflow component). Each job can be a parallel program.
– Workflow level: parallel execution among workflow nodes (WF branch parallelism). Multiple jobs can run in parallel.
– PS workflow level: parameter study execution of the workflow. Multiple instances of the same workflow can process different data files.
Example 1: Computational chemistry
Department of Chemistry, University of Perugia
• Solution of the Schrödinger equation for triatomic systems using a time-dependent (RWAVEPR) or time-independent (ABC) method
• Sequential Fortran 90
• A single execution can take between 5 and 10 hours
• Many simulations at the same time
• Workflow diagram label: 25 times
Example 2: Ultra-short range weather forecast
Hungarian Meteorology Service
• Forecasting dangerous weather situations (storms, fog, etc.) is a crucial task in the protection of life and property
• Processed information: surface-level measurements, high-altitude measurements, radar, satellite, lightning, results of previously computed models
• Requirements:
  – Execution time < 10 min
  – High resolution (1 km)
• Workflow diagram labels: 25x, 10x, 25x, 5x
Grid interoperation by P-GRADE
Accessing Globus, gLite and ARC based grids simultaneously
Typical user scenario: compilation phase
• Components: certificate servers, portal server, grid services
• Steps: upload source(s); compile and edit; download binary(ies)
Typical user scenario: application development phase
• Components: certificate servers, portal server, grid services
• Steps: start editor; open and edit, or develop, a workflow; save workflow
Typical user scenario: workflow execution phase
• Components: certificate servers, portal server, grid services
• Steps: download proxy certificates; transfer files and submit jobs; monitor jobs; visualize jobs and workflow progress; download (small) results
P-GRADE Portal structural overview
• Client: Web browser, Java Webstart workflow editor
• P-GRADE Portal server:
  – User interface layer: presents the user interface
  – Internal layer (Java classes): represents the internal concepts
  – Grid layer (gLite and Globus command line tools): interfaces with grid services
• Grid: EGEE and Globus Grid services (gLite WMS, LFC, …; Globus GRAM, GridFTP, …)
Interface layer
• Client: Web browser, Java Webstart workflow editor
• P-GRADE Portal server, user interface layer:
  – Web server
  – GridSphere Web portal framework
  – GridSphere portlets
  – P-GRADE portlets
  – Workflow monitor: Java applet generator
  – Workflow editor: Java Webstart application
Interface layer functionalities
• Login
• Welcome
• ...
• Workflow portlet
  – Workflow manager, Storage, Upload
• Certificate portlet
  – Upload, download and other operations
• Settings portlet
  – Grid settings, Quota settings
• File management
  – Manage files in the grid
• Compiler portlet
  – Compile jobs on portal server
P-GRADE vs. non-P-GRADE portlets
• GridSphere 2.x Grid Portal framework
• P-GRADE Portal portlets
Portlets/functionalities of P-GRADE Portal
• Settings (portlet)
• Certificate and proxy management (portlet)
• Information system visualization (portlet)
• Graphical workflow editing
• Workflow manager (portlet)
• LFC (EGEE) file management (portlet)
• Compilation support (portlet)
• Fault-tolerance support
Settings portlet
• The portal administrator can
  – connect the portal to several grids
  – register the basic resources of the connected grids
• The user can customize the connected grids by adding and removing resources
Certificate and proxy management portlet
• The user can upload certificates for various grids to the MyProxy server
• The user can download proxies and allocate them to grids
• The user can use as many proxies simultaneously as there are grids connected to the portal
• As a result, parallel branches of a workflow can be executed simultaneously in several grids (e.g. SEE-GRID access and HUNGRID access)
MyProxy interaction in P-GRADE: Certificate Manager
• To start your session on the Grid you must create a proxy certificate on the portal server
• “Certificates” portlet:
  – to upload a proxy into MyProxy servers
  – to download a proxy from MyProxy into the portal server
Certificate Manager: downloading a proxy
1. MyProxy server access details:
   • Hostname
   • Port number
   • User name (from upload)
   • Password (from upload)
2. Proxy parameters:
   • Lifetime
   • Comment
3. Grid association
Certificate Manager: associating the proxy with a grid
This operation displays the details of the certificate and the list of available Grids (defined by the portal administrator)
Solving Grid interoperation by P-GRADE Portal
• Different jobs can be executed in parallel in different grids
• Diagram labels: P-GRADE Portal, EGEE Grid, UK NGS, London, Paris, Athens
Interoperation vs. interoperability
As defined by the GIN (Grid Interoperation Now) CG (Community Group) of the OGF (Open Grid Forum):
• Interoperation: a short-term solution that defines what needs to be done to achieve interoperation between current production grids using existing technologies
• Interoperability: the native ability of Grids and Grid middleware to interact directly via common open standards
• Diagram labels: Grid 1, Grid 2, Grid 3 with P-GRADE Portal (interoperation); Grid 1, Grid 2, Grid 3 (interoperability)
Information system portlet
Graphical workflow editing
• The aim is to define a DAG of batch jobs:
  1. Drag and drop components: jobs and ports
  2. Define their properties
  3. Connect ports by channels (no cycles, no loops, no conditions)
  4. The JDL file is generated automatically
Workflow Editor
Properties of a job:
• Binary executable
• Type of executable
• Number of required processors
• Command line parameters
• The resource to be used for the execution:
  – Grid/VO
  – (Computing element)
Direct resource selection: which computing element to use?
• “I still don’t know which resource to use!”
• The information system portlet queries BDII and GIIS servers
Automatic resource selection
1. Select a broker Grid/VO for the job (e.g. GILDA_LCG2_broker/GILDA_gLite_broker)
2. (Describe the ranks and requirements of the job in JDL)
3. The portal will use the broker to find the best resource for the job!
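To give a feel for what "ranks and requirements in JDL" look like, here is a minimal sketch that renders a small job description. The helper `render_jdl` and the file names are hypothetical, and the output is not the JDL the portal itself generates; the attribute names (`Executable`, `InputSandbox`, `Requirements`, `Rank`) and the Glue expressions follow common gLite JDL usage and should be checked against the gLite manual for your middleware version.

```python
from typing import Sequence


def render_jdl(executable: str,
               arguments: str = "",
               input_sandbox: Sequence[str] = (),
               output_sandbox: Sequence[str] = ("std.out", "std.err"),
               requirements: str = "",
               rank: str = "") -> str:
    """Render a minimal JDL text; optional attributes are skipped when empty."""
    def quoted_list(items: Sequence[str]) -> str:
        return "{" + ", ".join(f'"{item}"' for item in items) + "}"

    lines = [f'Executable = "{executable}";']
    if arguments:
        lines.append(f'Arguments = "{arguments}";')
    lines.append('StdOutput = "std.out";')
    lines.append('StdError = "std.err";')
    if input_sandbox:
        lines.append(f"InputSandbox = {quoted_list(input_sandbox)};")
    lines.append(f"OutputSandbox = {quoted_list(output_sandbox)};")
    if requirements:
        lines.append(f"Requirements = {requirements};")  # which CEs may match
    if rank:
        lines.append(f"Rank = {rank};")  # how the broker orders the matches
    return "\n".join(lines)


# Hypothetical job: only production CEs, prefer the shortest estimated wait.
EXAMPLE_JDL = render_jdl(
    "sim.sh",
    input_sandbox=["sim.sh"],
    requirements='other.GlueCEStateStatus == "Production"',
    rank="-other.GlueCEStateEstimatedResponseTime",
)
print(EXAMPLE_JDL)
```

The broker keeps only the resources that satisfy `Requirements` and then picks the one with the highest `Rank`, which is why the estimated response time is negated.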
Workflow Editor: defining broker jobs
• Select a Grid with a broker (*_BROKER)
• Ignore the resource field
• If the default JDL is not sufficient, use the built-in JDL editor
Workflow Editor: built-in JDL editor
JDL: look at the gLite Users’ manual!
Workflow Editor: defining input/output files
File properties:
• Type:
  – input: the job reads the file
  – output: the job generates the file
• File type:
  – local: comes from my desktop
  – remote: comes from an SE
• File: location of the file
• Internal file name: the executable reads the file under this name, e.g. fopen(“file.in”, …)
• File storage type (output files only):
  – Permanent: final result
  – Volatile: only data channel
How to refer to an I/O file?
• Local input file
  – Client-side location: c:\experiments\11-04.dat
• Local output file
  – result.dat
• Remote input file
  – LFC logical file name (LFC file catalog is required; EGEE VOs): lfn:/grid/gilda/kozlovszky/11-04.dat
  – GridFTP address (in Globus Grids): gsiftp://somengshost.ac.uk/mydir/11-04.dat
• Remote output file
  – LFC logical file name (LFC file catalog is required; EGEE VOs): lfn:/grid/gilda/kozlovszky/11-04_-_result.dat
  – GridFTP address (in Globus Grids): gsiftp://somengshost.ac.uk/mydir/result.dat
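The three kinds of file reference on this slide can be told apart by their prefix. The sketch below assumes only that: `lfn:` marks an LFC logical file name, `gsiftp://` marks a GridFTP address, and anything else is treated as a local file. The function name is hypothetical and the portal's own validation may be stricter.

```python
def classify_reference(ref: str) -> str:
    """Classify a file reference by its prefix (assumed convention, see lead-in)."""
    if ref.startswith("lfn:"):
        return "remote (LFC logical file name)"
    if ref.startswith("gsiftp://"):
        return "remote (GridFTP address)"
    return "local"


for example in (
    "lfn:/grid/gilda/kozlovszky/11-04.dat",
    "gsiftp://somengshost.ac.uk/mydir/result.dat",
    "result.dat",
):
    print(example, "->", classify_reference(example))
```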
Local vs. remote files
• Your binary can also access data services directly:
  – GridFTP API
  – GFAL API
  – lfc-*, lcg-* commands
• Diagram labels: portal server, grid services, storage elements, computing elements; local input files and executables; local output files; remote input files; remote output files; “Only the permanent files!”
Workflow manager
• Lists available workflows
• Enables submitting, aborting and deleting existing workflows
• Shows status, logs and results of workflow executions
• Orchestrates job executions inside a workflow
Workflow management (workflow portlet)
• The portlet presents the status, size and output of the available workflows in the “Workflow” list
• It has a Quota manager to control the users’ storage space on the server
• The portlet also contains the “Abort”, “Attach”, “Details”, “Delete” and “Delete all” buttons to handle the execution of workflows
• The “Attach” button opens the workflow in the Workflow Editor
• The “Details” button gives an overview of the jobs of the workflow
Workflow execution (observation by the workflow portlet)
White, red or green color means the job is in the initial, running or finished state, respectively
LFC (EGEE) file management
Compilation support
Logs provided for each job
Analysis of the log
• 2008.01.09 09:32:19 - Proxy with VOMS extensions created for VO "voce" with accounting group "".
• 2008.01.09 09:32:19 - Job submission in progress...
• 2008.01.09 09:32:23 - Job has been submitted successfully!
• 2008.01.09 09:32:23 - Job identifier is: "https://skurut1.cesnet.cz:9000/mD_8VzPhm8AmIToTJKtigg"
• 2008.01.09 09:32:26 - EGEE job's status has changed to "Waiting" (host is ).
• 2008.01.09 09:33:00 - EGEE job's status has changed to "Ready" (host is ce1-egee.srce.hr).
• 2008.01.09 09:35:46 - EGEE job's status has changed to "Waiting" (host is egee-ce.grid.niif.hu).
• 2008.01.09 09:36:19 - EGEE job's status has changed to "Ready" (host is ce.cyf-kr.edu.pl).
• 2008.01.09 09:36:53 - EGEE job's status has changed to "Waiting" (host is ce.cyf-kr.edu.pl).
• 2008.01.09 09:37:26 - EGEE job's status has changed to "Done" (host is egee-ce.grid.niif.hu).
• 2008.01.09 09:37:26 - Job found to be finished. Checking again if this is really the case.
• 2008.01.09 09:38:03 - EGEE job's status has changed to "Ready" (host is egee-ce1.gup.uni-linz.ac.at).
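When a log like this gets long, it helps to extract just the status changes and the hosts the broker tried. The following sketch parses the status lines with a regular expression; the line layout is inferred from the log excerpt on this slide, and the function names are hypothetical rather than part of the portal.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Layout inferred from the slide: timestamp, status in quotes, host in parentheses.
STATUS_LINE = re.compile(
    r"""^(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}) - """
    r"""EGEE job's status has changed to "([^"]+)" """
    r"""\(host is ([^)]*)\)\.$"""
)


def parse_status_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (timestamp, status, host) for a status-change line, else None."""
    match = STATUS_LINE.match(line.lstrip("• ").strip())
    if match is None:
        return None
    timestamp, status, host = match.groups()
    return timestamp, status, host.strip()


def hosts_tried(lines: Iterable[str]) -> List[str]:
    """Distinct hosts in order of first appearance; empty host fields are skipped."""
    seen: List[str] = []
    for line in lines:
        parsed = parse_status_line(line)
        if parsed and parsed[2] and parsed[2] not in seen:
            seen.append(parsed[2])
    return seen


SAMPLE = [
    '2008.01.09 09:32:23 - Job has been submitted successfully!',
    '2008.01.09 09:32:26 - EGEE job\'s status has changed to "Waiting" (host is ).',
    '2008.01.09 09:33:00 - EGEE job\'s status has changed to "Ready" (host is ce1-egee.srce.hr).',
    '2008.01.09 09:35:46 - EGEE job\'s status has changed to "Waiting" (host is egee-ce.grid.niif.hu).',
]
print(hosts_tried(SAMPLE))  # ['ce1-egee.srce.hr', 'egee-ce.grid.niif.hu']
```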
Fault-tolerant Grid applications
• Utilizing
  – Condor DAGMan’s rescue mechanism
  – EGEE job resubmission mechanism of WMS
• If the EGEE broker leaves a job stuck in a CE’s queue, the portal automatically
  – kills the job on this site and
  – resubmits the job to the broker, prohibiting this site
• As a result
  – the portal guarantees the correct submission of a job as long as there exists at least one matching resource
  – job submission is reliable even in an unreliable grid
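The kill-and-resubmit behaviour described above can be sketched as a retry loop that grows an exclusion list. This is a simplified model, not the portal's code: the broker is replaced by "take the first site that is not excluded", the stuck-job check is a caller-supplied function, and the default of three tries is only an assumption borrowed from the "try 1 out of 3" message in the portal log.

```python
from typing import Callable, List, Optional, Set, Tuple


def submit_with_exclusion(sites: List[str],
                          gets_stuck: Callable[[str], bool],
                          max_tries: int = 3) -> Tuple[Optional[str], Set[str]]:
    """Return (site where the job ran, sites excluded along the way)."""
    excluded: Set[str] = set()
    for _ in range(max_tries):
        candidates = [site for site in sites if site not in excluded]
        if not candidates:
            break  # no matching resource is left
        site = candidates[0]  # stand-in for the broker's match-making
        if not gets_stuck(site):
            return site, excluded
        excluded.add(site)  # kill the job there and prohibit this site
    return None, excluded


# Hypothetical sites: the first two leave the job stuck in their queues.
STUCK = {"ce-a.example.org", "ce-b.example.org"}
SITES = ["ce-a.example.org", "ce-b.example.org", "ce-c.example.org"]
print(submit_with_exclusion(SITES, lambda site: site in STUCK))
```

Because every failed site is excluded before the next attempt, the loop never returns to a queue that already held the job, which is the property that makes resubmission converge when at least one working resource matches.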
Fault-tolerance by P-GRADE Portal
• 09:33: The broker assigned the job to a site: ce1-egee.srce.hr
• 09:35: The broker moved the job to another site: egee-ce.grid.niif.hu
• 09:36: Again the broker moved the job to another site: ce.cyf-kr.edu.pl
• 09:37: The broker indicated that the job is Done, but...
• 09:38: ...it turned out that the job was not finished (Done - Failed status); it was only moved to another site: egee-ce1.gup.uni-linz.ac.at
• 09:39: Again the broker moved the job to another site: ares02.cyf-kr.edu.pl
• 09:39: Again the broker moved the job to another site: ce.cyf-kr.edu.pl
• 09:40: After trying 10 different sites the VOCE broker gave up and aborted the job (ShallowRetryCount was set to 10):
  2008.01.09 09:40:16 - The job has been aborted!
Fault-tolerance by P-GRADE Portal
Our fault-tolerant portal did not give up:
• 2008.01.09 09:40:16 - The job can be submitted again (try 1 out of 3, excluding host(s): ce.cyf-kr.edu.pl)
• 2008.01.09 09:40:17 - Proxy with VOMS extensions created for VO "voce" with accounting group "".
• 2008.01.09 09:40:17 - Job submission in progress...
• 2008.01.09 09:40:27 - Job has been submitted successfully!
• 2008.01.09 09:40:27 - Job identifier is: "https://skurut1.cesnet.cz:9000/o22BTVqQsvwzj2wn5KP8_A"
• 2008.01.09 09:40:30 - EGEE job's status has changed to "Waiting" (host is ).
• 2008.01.09 09:41:04 - EGEE job's status has changed to "Ready" (host is eszakigrid66.inf.elte.hu).
Fault-tolerance by P-GRADE Portal
• 2008.01.09 09:41:37 - EGEE job's status has changed to "Scheduled" (host is eszakigrid66.inf.elte.hu).
• 2008.01.09 09:44:57 - EGEE job's status has changed to "Done" (host is eszakigrid66.inf.elte.hu).
• 2008.01.09 09:44:57 - Job found to be finished. Checking again if this is really the case.
• 2008.01.09 09:45:34 - EGEE job's status has changed to "Waiting" (host is eszakigrid66.inf.elte.hu).
• 2008.01.09 10:06 - The job's status hasn't changed for 20 minutes, resubmitting...
(It is a quite frequently occurring problem in EGEE-like grids that the broker leaves jobs stuck in CEs' queues.) In such a case the portal automatically kills the job on this site and resubmits it to the broker.
• 2008.01.09 10:06 - Proxy with VOMS extensions created for VO "voce" with accounting group "".
• 2008.01.09 10:06 - Job submission in progress...
• 2008.01.09 10:06:12 - Job has been submitted successfully!
• 10: The job successfully finished with exit code 0 on site: ce.ui.savba.sk
Lessons learnt
• P-GRADE Portal provides
  – An easy-to-use but powerful workflow system (graphical editor, workflow manager, etc.)
  – Three levels of parallelism
    § MPI job level
    § Workflow branch level
    § Parameter sweep at workflow level
  – A multi-grid/multi-VO access mechanism for various grids (LCG-2, gLite and GT2)
    § Simultaneous access
    § Transparent access
    § Migrating a workflow from one grid to another requires no modification in the workflow
Learn once, use everywhere
Develop once, execute anywhere
Thank you!
www.portal.p-grade.hu
pgportal@lpds.sztaki.hu
www.eu-egee.org