Enabling Grids for E-sciencE
A WS-based Interface to the WMS: the WMProxy
Fabrizio Pacini, on behalf of the JRA1 IT-CZ Datamat group
email: fabrizio.pacini@datamat.it
Catania, 9-11 Jan 2006
www.eu-egee.org
INFSO-RI-508833
EGEE is a project funded by the European Union under contract IST-2003-508833

Outline
• WMProxy Intro
• What’s new
  – New request types
  – Job’s sandboxes management
  – New features
• Client tools
• Next Steps
• Doc
INFSO-RI-508833 NA4 Generic Applications Meeting, 9-11 Jan 2006, Catania

WMProxy (1/6)
• WMProxy (Workload Manager Proxy)
  – is a new service providing access to the gLite Workload Management System (WMS) functionality through a simple Web Services based interface
  – has been designed to efficiently handle a large number of job submission and control requests to the WMS
  – the service interface follows the Web Services and SOA standards, in particular adhering to WS-I
  – No WSRF: specs and implementations are still not mature enough
  – Developed in C++ using gSOAP 2.7.6b as SOAP stub generator

WMProxy (2/6)
• WMProxy provides...
  – A WS-I compliant WSDL description of the services made available by the WMS
  – A core service performing validation, conversion, environment preparation etc. for each incoming request before delivering it to the WM Task Queue
  – A set of client tools to interact with it
• ...i.e., it is not only a WS interface: the NS component has been completely refactored
  – to include some of the logic previously “embedded” in the UI
  – to provide new functionalities
  – to provide better error reporting
  – to improve performance, usability and scalability

WMProxy (3/6)
[Figure: WMProxy architecture. Labels: SOAP over HTTPS; Apache + GridSite (hosting environment); FastCGI protocol over a full-duplex socket; WMProxy instances (gSOAP); helpers; JA; MM; LBProxy; WM Task Queue]

WMProxy (4/6)
• WMProxy runs as a dynamic FastCGI application in an Apache+GridSite container (the Web Service hosting framework)
  – GridSite provides the Grid-based authentication and authorization environment, removing the need for the service to manipulate Grid credentials (such as X.509, GSI and VOMS)
  – FastCGI is a language-independent, scalable, open extension to CGI that provides high performance and persistence
    § FastCGI applications use Unix or TCP sockets to communicate with the web server
    § Instances of the WMProxy service are spawned/killed dynamically according to demand (a sort of load balancing)
    § No per-request startup and initialization overhead

WMProxy (5/6)
• The delegation approach has been changed
  – Delegation is no longer part of the authentication process
  – Can be done only once for multiple jobs
  – WMProxy imports the delegation port type provided by GridSite and shared by all gLite components
• LCMAPS is used for user mapping, as it is for the CE
  – the GSI-free flavour of LCMAPS
  – works with VOMS pool accounts/groups too
• Authorization based on DN+FQAN
  – Using GridSite GACL

WMProxy (6/6)
Main operations
• getProxyReq (delegation)
• putProxy (delegation)
• jobRegister
• jobStart
• jobCancel
• jobPurge
• jobSubmit
• jobListMatch
• getOutputFileList
• getSandboxDestURI
• getBulkSandboxDestURI
• getFreeQuota
• getMaxInputSandboxSize
• getVersion
• addACLItem
• …
We have tried to stay aligned as much as possible with the CREAM operations (see next talk) to obtain a uniform interface

New request types (1/4)
• DAGs:
  – Directed Acyclic Graphs of jobs: sets of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs
    § The jobs are the nodes (vertices) of the graph
    § The edges (arcs) identify the dependencies
  – Not really new, but their management has been improved
    § Shared sandboxes
    § Attribute inheritance
    § Attribute references between nodes and with the ‘parent’
[Figure: example DAG with nodes nodeA, nodeB, nodeC, nodeD, nodeF]

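As a rough illustration of how dependencies constrain execution, the sketch below derives one valid start order for a small DAG using Python's standard `graphlib` module. The node names echo the figure, but the edges are hypothetical (the figure's arcs are not recoverable here), and this is not how the WMS itself schedules DAG nodes.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each key may start only after every node in
# its set has finished. The edges are illustrative, not taken from the slide.
dependencies = {
    "nodeA": set(),
    "nodeB": {"nodeA"},
    "nodeC": {"nodeA"},
    "nodeD": {"nodeB", "nodeC"},
}

def execution_order(deps):
    """Return one order in which the nodes could be started."""
    # static_order() raises graphlib.CycleError if the graph has a cycle,
    # i.e. if the description is not a valid DAG.
    return list(TopologicalSorter(deps).static_order())

order = execution_order(dependencies)
print(order)
```

Any order returned places a node after all the nodes it depends on; several orders can be valid for the same graph.
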
New request types (2/4)
• Parametric Jobs:
  – A parametric job is a job having one or more parametric attributes in the JDL
  – The parametric attributes in the JDL vary their values according to a parameter
  – Submitting a parametric job results in the submission of several instances of the same job, differing only in the value of the parameter
  – The value of the parameter is made available in the job runtime environment
    [
      JobType = "Parametric";
      Executable = "cms_sim.exe";
      StdInput = "input_PARAM_.txt";
      StdOutput = "output_PARAM_.txt";
      StdError = "error_PARAM_.txt";
      Parameters = 2500;
      ParameterStep = 10;
      …
    ]

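The expansion of `_PARAM_` can be sketched in a few lines of Python. This is only an illustration: it assumes the parameter starts at 0 and that `Parameters` is an exclusive upper bound, neither of which is stated on the slide, and the function name `expand_parametric` is hypothetical.

```python
def expand_parametric(template, parameters, step, start=0):
    """Generate one attribute set per parameter value.

    Assumption (not from the slide): values run from `start` up to, but not
    including, `parameters`, in increments of `step`.
    """
    jobs = []
    for value in range(start, parameters, step):
        # Replace the _PARAM_ placeholder in every string attribute.
        jobs.append({name: text.replace("_PARAM_", str(value))
                     for name, text in template.items()})
    return jobs

template = {
    "StdInput": "input_PARAM_.txt",
    "StdOutput": "output_PARAM_.txt",
    "StdError": "error_PARAM_.txt",
}
jobs = expand_parametric(template, parameters=2500, step=10)
print(len(jobs), jobs[0]["StdInput"], jobs[-1]["StdInput"])
```

Under these assumptions the JDL above would yield 250 instances, one for each multiple of 10 below 2500.
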
New request types (3/4)
• Job Collections:
  – A collection is a set of independent jobs that, for some reason known to the user, have to be submitted, monitored and controlled as a single request
  – The JDL description of a collection is quite simple, as it basically consists of a list of JDL descriptions (the sub-jobs)
  – The same features as for DAGs are available
    § Shared sandboxes
    § Attribute inheritance
    § Attribute references between nodes and with the ‘parent’
    [
      Type = "collection";
      nodes = {
        [ <job descr 1> ],
        [ <job descr 2> ],
        …
      };
      …
    ]

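Since a collection is essentially a list of sub-job descriptions, assembling one can be sketched as plain string handling. The helper `build_collection` and the two sub-job bodies below are hypothetical and only illustrate the layout shown on the slide; they are not part of the WMProxy client API.

```python
def build_collection(job_descriptions):
    """Wrap JDL sub-job bodies into a single collection description."""
    nodes = ",\n".join(f"    [ {body.strip()} ]" for body in job_descriptions)
    return f'[\n  Type = "collection";\n  nodes = {{\n{nodes}\n  }};\n]'

# Illustrative sub-job bodies (assumed, not from the slide).
sub_jobs = [
    'Executable = "/bin/hostname"; StdOutput = "host.out";',
    'Executable = "/bin/date"; StdOutput = "date.out";',
]
collection_jdl = build_collection(sub_jobs)
print(collection_jdl)
```
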
New request types (4/4)
• Support for the new types relies strongly on newly developed JDL converters and on the DAG submission support
  – All JDL conversions are performed on the server
  – A single submission for several jobs
• All new request types can be monitored and controlled through a single handle (the request id)
  – Each sub-job can, however, be followed up and controlled independently through its own id
• “Smarter” WMS client commands/API
  – allow submission of DAGs, collections and parametric jobs exploiting the concept of “shared sandbox”
  – allow automatic generation and submission of collections and DAGs from sets of JDL files located in user-specified directories on the UI

Shared Sandboxes for sub-jobs of compound jobs
• JDL has been extended to allow specification of the input sandbox at the level of the compound request (i.e. DAGs, collections and parametric jobs)
  – This sandbox is “inherited” by all sub-jobs of the compound job that do not specify their own sandbox
  – This input sandbox is transferred only once by the new WMS client commands, but can be accessed by all sub-jobs of the compound job
  – Sub-job sandboxes can also refer to single files of the “shared sandbox”, e.g.
      InputSandbox = root.InputSandbox[0];
  – or to sandboxes of other sub-jobs, e.g.
      InputSandbox = root.nodes.nodeA.description.OutputSandbox[2];

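The inheritance rule can be modelled with a small lookup: a sub-job uses its own input sandbox if it defines one, otherwise the one set on the compound request. The dictionary layout, file names and function name below are assumptions made for illustration, not the server-side data model.

```python
def effective_input_sandbox(root, node):
    """Return the node's own InputSandbox, or the inherited root one."""
    return node.get("InputSandbox", root.get("InputSandbox", []))

# Hypothetical compound request: nodeA inherits, nodeB overrides.
compound = {
    "InputSandbox": ["common.conf", "setup.sh"],
    "nodes": {
        "nodeA": {"Executable": "a.sh"},
        "nodeB": {"Executable": "b.sh", "InputSandbox": ["b_only.dat"]},
    },
}

resolved = {name: effective_input_sandbox(compound, node)
            for name, node in compound["nodes"].items()}
print(resolved)
```
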
‘Scattered’ Input Sandboxes
• The input sandbox can contain
  – file paths on the UI machine (i.e. the usual way)
  – URIs pointing to files on a remote GridFTP/HTTPS server
      InputSandbox = {
        "gsiftp://neo.datamat.it:2811/var/prg/sim.exe",
        "https://ghemon.cnaf.infn.it:8443/data/idat_1",
        "file:///home/pacio/myconf"
      };
• A base URI to be applied to all sandbox files can also be specified
      InputSandboxBaseURI = "gsiftp://matrix.datamat.it:2811/var";
• Only local files (file://) are uploaded to the WMS node
• Files pointed to by URIs are downloaded directly to the WN by the JobWrapper just before the job is started

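The split between files uploaded to the WMS node and files fetched later on the WN can be sketched by inspecting each entry's URI scheme. This is a simplified model using Python's `urllib.parse`; the function name is hypothetical, and treating scheme-less entries as local UI paths is an assumption based on "the usual way" above.

```python
from urllib.parse import urlparse

def split_input_sandbox(entries):
    """Separate entries uploaded from the UI from those fetched on the WN."""
    upload_to_wms, fetch_on_wn = [], []
    for entry in entries:
        scheme = urlparse(entry).scheme
        # Plain paths (no scheme) and file:// URIs are local to the UI.
        if scheme in ("", "file"):
            upload_to_wms.append(entry)
        else:
            fetch_on_wn.append(entry)
    return upload_to_wms, fetch_on_wn

input_sandbox = [
    "gsiftp://neo.datamat.it:2811/var/prg/sim.exe",
    "https://ghemon.cnaf.infn.it:8443/data/idat_1",
    "file:///home/pacio/myconf",
]
upload, remote = split_input_sandbox(input_sandbox)
print(upload)
print(remote)
```
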
‘Scattered’ Output Sandboxes
• JDL has been enriched with new attributes for specifying the destinations of the files listed in the OutputSandbox attribute
      OutputSandbox = { "jobOutput", "run1/event1", "jobError" };
      OutputSandboxDestURI = {
        "gsiftp://matrix.datamat.it/var/jobOutput",
        "https://grid003.ct.infn.it:8443/home/cms/event1",
        "gsiftp://matrix.datamat.it/var/jobError"
      };
• A base URI to be applied to all sandbox files can also be specified
      OutputSandboxBaseDestURI = "gsiftp://neo.datamat.it/home/run1/";
• When the job has completed execution, the JobWrapper copies the files to the specified destinations without passing through the WMS node

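The example suggests that each OutputSandbox entry is paired with the OutputSandboxDestURI entry at the same position. The sketch below models that pairing; the positional rule is inferred from the example rather than stated on the slide, and `plan_output_transfers` is a hypothetical helper.

```python
def plan_output_transfers(output_sandbox, dest_uris):
    """Map each output file to its destination, pairing entries by position."""
    if len(output_sandbox) != len(dest_uris):
        raise ValueError("OutputSandbox and OutputSandboxDestURI differ in length")
    return dict(zip(output_sandbox, dest_uris))

output_sandbox = ["jobOutput", "run1/event1", "jobError"]
dest_uris = [
    "gsiftp://matrix.datamat.it/var/jobOutput",
    "https://grid003.ct.infn.it:8443/home/cms/event1",
    "gsiftp://matrix.datamat.it/var/jobError",
]
transfers = plan_output_transfers(output_sandbox, dest_uris)
for source, destination in transfers.items():
    print(source, "->", destination)
```
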
‘Compressed’ Sandboxes (gLite version >= 1.5)
• A compressed archive is created from the input sandbox files using the libtar and zlib libraries
  – This is done automatically by the WMProxy client commands
  – This mechanism can be enabled/disabled by the user through the JDL (AllowZippedISB attribute)
• The archive is transferred (instead of single files) to the WMS
  – Besides the gain brought by compression, this saves the overhead of several calls to globus-url-copy
• The WMProxy service untars the files into the job directories when the job is ‘started’ and removes the archive

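The pack, transfer once, unpack and remove cycle can be imitated with Python's standard `tarfile` module. This is only an analogue of the idea: the real client and service are C++ code built on libtar and zlib, and the file names and helper functions here are invented for the demonstration.

```python
import os
import tarfile
import tempfile

def pack_sandbox(paths, archive_path):
    """Create one gzip-compressed tar archive holding all sandbox files."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            tar.add(path, arcname=os.path.basename(path))
    return archive_path

def unpack_sandbox(archive_path, job_dir):
    """Extract the archive into the job directory, then remove the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(job_dir)
    os.remove(archive_path)

# Demonstration with two throwaway files (hypothetical names).
work_dir = tempfile.mkdtemp()
job_dir = tempfile.mkdtemp()
sources = []
for name in ("a.txt", "b.txt"):
    path = os.path.join(work_dir, name)
    with open(path, "w") as handle:
        handle.write("payload of " + name)
    sources.append(path)

archive = pack_sandbox(sources, os.path.join(work_dir, "isb.tar.gz"))
unpack_sandbox(archive, job_dir)
unpacked = sorted(os.listdir(job_dir))
print(unpacked)
```

A single archive means a single transfer, which is where the saving over repeated per-file copies comes from.
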
...again about Sandboxes
• ‘Scattered’ sandboxes located on SEs providing a GridFTP interface are supported
  – allows inclusion of bigger files in the sandbox
  – allows saving disk space on the WMS node
  – allows simple sharing and reuse of sandboxes between different jobs and nodes of a compound job through very simple JDL descriptions
  – ...the old approach is, however, still supported
• NOTE: the WMProxy does not manage the areas on external servers where sandboxes are stored; it just uses them
• WMProxy also supports sandboxes located on GridSite HTTPS servers
  – Needs the htcp client installed on the WNs (not there by default yet)
• The WMProxy itself provides a GridSite HTTPS server
  – This makes job files accessible with a Web browser presenting an X.509 certificate

Job’s files perusal (gLite version >= 1.5)
• Allows users to inspect the content of a job’s files while the job is running
  – The simple way, i.e. no restricted shell for debugging yet: a process running on the WN together with the job transfers chunks of selected job files to the WMS node or to a specified URI (PerusalFilesDestURI)
  – The file selection can be specified/changed/removed by the user through a specific WMProxy operation (enableFilePerusal)
  – The WMProxy client allows retrieving and visualizing the chunks of job files generated so far (getPerusalFiles operation)

WMProxy client tools (1/2)
WMProxy can be accessed through:
• WSDL
  – Developers can generate client stubs themselves in their favourite language from the published WSDL
• C++/Java/Python API
  – Light client libraries generated using gSOAP, Axis and SOAPpy respectively
  – Includes the security/delegation parts that are not covered by the stubs generated from the WSDL
  – Hides the messy details of the WSDL/SOAP tooling
  – Python API available starting from gLite 1.5

WMProxy client tools (2/2)
• C++ command line (this is not the usual Python CLI):
  – glite-wms-job-delegate-proxy
  – glite-wms-job-submit
  – glite-wms-job-list-match
  – glite-wms-job-cancel
  – glite-wms-job-output
  – glite-wms-job-perusal
  – glite-wms-job-proxy-info (still in dev)
• Full support for bulk submission
• Very similar to and “homogeneous” with the CREAM CLI
• Job follow-up still has to be done through the Python commands:
  – glite-job-status
  – glite-job-logging-info

Next Steps
• Service profiling
  – First measurements of bulk submission performance are promising
    § gLite 1.5 => ~180 secs for 500 jobs
  – The goal is to get to ~60 secs for 1000 jobs in the short term
• JSDL support
  – XML-based language to describe job characteristics
  – Currently a GGF recommendation; emerging as a standard
  – The plan is to map JSDL to JDL and provide the needed extensions to the JSDL schema
• AuthZ on a per-job basis using GACL
  – Control user access to single jobs (e.g. cancel, get-output, perusal etc.) through a GACL set by the job’s owner
• Scalability and high availability, e.g.:
  – Make the service usable behind a DNS alias switch
  – Mirrored file system between independent nodes
  – ...

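To put the profiling figures in per-job terms, the short calculation below compares the measured and target submission rates. It uses only the two approximate numbers quoted on the slide; the variable names are illustrative.

```python
# Figures quoted on the slide (both approximate).
measured_seconds, measured_jobs = 180, 500   # gLite 1.5
goal_seconds, goal_jobs = 60, 1000           # short-term goal

measured_per_job = measured_seconds / measured_jobs
goal_per_job = goal_seconds / goal_jobs

# Ratio of per-job costs, computed from integers to avoid rounding noise.
speedup = (measured_seconds * goal_jobs) / (measured_jobs * goal_seconds)

print(f"measured: {measured_per_job:.2f} s/job")
print(f"goal:     {goal_per_job:.2f} s/job")
print(f"required speedup: {speedup:.1f}x")
```

In other words, the stated goal corresponds to roughly a sixfold reduction in per-job submission time.
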
Conclusions
• The first release of WMProxy was included in gLite 1.4
• Since then (Sept 2005)
  – Problems raised during the testing and certification phase have been addressed (the usual Savannah channel)
  – Close collaboration with some experiments/groups (e.g. CDF, ECGI, CMS, ATLAS) has continued in order to evolve/improve the service
  – The result is part of gLite 1.5 and will be part of the big merge with LCG
• Your feedback can be very valuable for steering future developments of the WMProxy service
  – Send questions/feedback/requirements to egee@datamat.it

Available doc
• WSDL documentation
  – http://lxmi.infn.it/egee-jra1-wm/wmproxy
  – http://jra1mw.cvs.cern.ch:8180/cgi-bin/jra1mw.cgi/org.glite.wms.wmproxyinterface/WMProxy.wsdl
• WMProxy User’s Guide
  – https://edms.cern.ch/document/674643/1
• JDL Attributes Specification
  – https://edms.cern.ch/document/590869/1
  – http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/api_doc/wms_jdl/index.html
• API documentation
  – http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/glite-wmproxy-api-index.shtml