- Number of slides: 15
Resource Selection in OSG & SAM-On-The-Fly
Parag Mhashilkar, Fermi National Accelerator Laboratory
Condor Week 2006, April 25, 2006

Resource Selection in OSG
Overview
- Why Resource Selection Service?
- Resource Selection Service in OSG
- Collaborators Involved
- Resource Selection Service Architecture
- Current Status
- Future Work

Why Resource Selection Service?
- A job can have special requirements. Example: disk > 1 GB, memory > 256 MB.
- Resources can provide special services. Example: disk > 5 GB, memory > 512 MB, software toolkit-X installed, etc.
- Without a resource selection service, the user has to keep track of the availability of every resource that can run the job.
- A resource selection service can:
  - Gather the information about the job and the resources and decide where the job should run.
  - Dereference abstract attributes to bind to the job during matchmaking or execution time.
[Figure: a job goes through the Resource Selection Service, which chooses among resources 1, 2, 3, ..., N]

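The matchmaking idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the ReSS or Condor implementation: the site names, the attribute names (`disk_gb`, `memory_mb`, `software`) and the helper `select_resources` are all hypothetical. The job's requirements are the slide's own example (disk > 1 GB, memory > 256 MB).

```python
from typing import Any, Callable, Dict, List

Resource = Dict[str, Any]

# Hypothetical resource descriptions; names and values are made up.
RESOURCES: List[Resource] = [
    {"name": "site-a", "disk_gb": 0.5, "memory_mb": 1024, "software": set()},
    {"name": "site-b", "disk_gb": 5, "memory_mb": 512, "software": {"toolkit-X"}},
    {"name": "site-c", "disk_gb": 20, "memory_mb": 128, "software": set()},
    {"name": "site-d", "disk_gb": 10, "memory_mb": 2048, "software": set()},
]


def job_requirements(resource: Resource) -> bool:
    """The slide's example job: disk > 1 GB and memory > 256 MB."""
    return resource["disk_gb"] > 1 and resource["memory_mb"] > 256


def select_resources(
    requirements: Callable[[Resource], bool], resources: List[Resource]
) -> List[str]:
    """Return the names of all resources that satisfy the job's requirements."""
    return [r["name"] for r in resources if requirements(r)]


matches = select_resources(job_requirements, RESOURCES)
print(matches)  # ['site-b', 'site-d']
```

Without such a service, the user would have to maintain the `RESOURCES` table by hand and repeat this filtering for every job.
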
Resource Selection Service (ReSS) in OSG
- The Resource Selector is a component of the OSG Job Management Infrastructure.
- Sponsored by PPDG, the project started in Sep 2005 with the aim of developing and deploying a Resource Selection Service that VOs with job management requirements similar to DZero's can use.
- Requirements that ReSS should support:
  - A community of 100 users, submitting jobs to 10 job schedulers.
  - 10,000 jobs per day, with bursts of 2,000 per hour.
  - 100 clusters.
  - Job and resource descriptions in classad format with 200 attributes and 5 KB of information.
- With ReSS:
  - The emphasis is on supporting several Virtual Organizations (VOs) based on policies.
  - VOs can tag resources that are certified to run their jobs, making resource selection more manageable.

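To put the load requirements in per-second terms, here is a small back-of-the-envelope calculation. The input figures come from the slide; the last line assumes, for illustration only, that each of the 100 clusters publishes a single 5 KB classad per update round.

```python
JOBS_PER_DAY = 10_000
BURST_JOBS_PER_HOUR = 2_000
CLUSTERS = 100
CLASSAD_KB = 5

avg_rate = JOBS_PER_DAY / (24 * 3600)   # average jobs per second
burst_rate = BURST_JOBS_PER_HOUR / 3600  # jobs per second during a burst
burst_factor = burst_rate / avg_rate     # how much a burst exceeds the average

# Assumption (not from the slide): one classad per cluster per update round.
kb_per_round = CLUSTERS * CLASSAD_KB

print(f"average: {avg_rate:.3f} jobs/s")        # average: 0.116 jobs/s
print(f"burst:   {burst_rate:.3f} jobs/s")      # burst:   0.556 jobs/s
print(f"burst factor: {burst_factor:.1f}x")     # burst factor: 4.8x
print(f"resource data per round: {kb_per_round} KB")  # resource data per round: 500 KB
```
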
Collaborators Involved
- VOs
  - DZero
  - ATLAS
  - LIGO
  - FermiGrid
- Fermilab
- OSG TG-MIG group
- CEMon group from INFN
- Condor group from UW Madison
- GLUE group from INFN

Resource Selection Service Architecture
[Figure: architecture diagram. A job asks "What gate?" and is sent by the Condor Scheduler to the selected gate (Gate 3 in the example). The Condor Match Maker receives job classads from the scheduler and resource classads from the Info Gatherer. The Info Gatherer collects info from the CEMon at each gate (Gate 1, Gate 2, Gate 3). Each CE runs CEMon, GIP and job-managers in front of a cluster.]

Architecture ...
- Generic Information Provider (GIP) describes resources in LDIF format using the GLUE Schema.
- CEMon provides a flexible plug-in mechanism to translate this information into classads.
- Information Gatherer (IG):
  - Subscribes to several CEMons to gather information about the CEs and advertises it to several Condor pools.
  - Acts as an adapter between the CEMons and the Condor matchmaker.
- Support for callouts to external matchmaking functions. These functions make matchmaking more extensible.

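The translation step in this pipeline (GLUE attributes in LDIF form turned into classad attributes) can be illustrated with a rough sketch. It is not the CEMon plug-in or the Information Gatherer code: the function names are hypothetical, the sample values are made up, and the parser deliberately ignores LDIF features such as line continuations and base64 values.

```python
from typing import Dict


def ldif_to_attributes(ldif_text: str) -> Dict[str, str]:
    """Parse simple 'Name: value' LDIF lines into a dict (no continuations, no base64)."""
    attrs: Dict[str, str] = {}
    for line in ldif_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        attrs[key.strip()] = value.strip()
    return attrs


def to_classad(attrs: Dict[str, str]) -> str:
    """Render attributes as 'Name = value' lines; integers stay bare, strings are quoted."""
    lines = []
    for key, value in attrs.items():
        if value.lstrip("-").isdigit():
            lines.append(f"{key} = {value}")
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{key} = "{escaped}"')
    return "\n".join(lines)


# Made-up sample; the attribute names follow GLUE 1.x naming.
SAMPLE_LDIF = """
GlueCEUniqueID: ce.example.org:2119/jobmanager-condor
GlueCEInfoHostName: ce.example.org
GlueCEStateFreeCPUs: 42
"""

classad_text = to_classad(ldif_to_attributes(SAMPLE_LDIF))
print(classad_text)
```

In the architecture described on the slide, the result of such a translation is what gets advertised to the Condor pools, where the matchmaker can evaluate it against job classads.
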
Current Status
- The first release of ReSS is scheduled to be included in OSG ITB-0.5.0.
- Focus on testing the functionality, scalability and stress behavior of the Information Gatherer:
  - Validate classads from different sites so they can be used for common resource selection criteria.
  - Study the scalability and investigate how the IG handles O(10) CEMon registrations and the processing and transfer of O(100) classads to the condor_collector.
  - Stress test study of the IG: simulate the load of the production environment by increasing tenfold the frequency of classad publication by the O(10) CEMons.
- Stress test the matchmaking infrastructure by submitting O(1) job/sec for 1 hour. In particular, push the limits …
- Evaluate the efficiency of the condor_negotiator when using call-outs to external code for matchmaking.

Future Work
- Working on deployment procedures for OSG production in the context of VDT.
- Work with other VOs whose requirements are similar to those mentioned earlier, and extend ReSS support to them.
- Improve the scalability of ReSS beyond the RunII experiments.
- Have end-to-end Samgrid-OSG integration by OSG 0.6.0.

Sam-on-the-fly
Overview
- What is SAM?
- Why sam-on-the-fly?
- Addressing the Challenges
- Current Status

What is SAM?
- Samgrid consists of:
  - Job Management (JIM)
  - Data Management (SAM)
- SAM stands for 'Sequential Access via Metadata'.
- The project was started in 1997 by DZero.
- SAM is organized around the concept of a dataset (catalog of file metadata).
- Experiments: DZero, CDF, MINOS
[Figure: Samgrid, made up of Job Management (JIM) and Data Management (SAM)]

Why Sam-on-the-fly?
- Sites have resources that are available for a longer duration. For example, a cluster at UW has 1 TB of disk for DZero users for the next 2 months.
- SAM-on-the-fly tries to address the issue of making such resources available to the users dynamically.
- Before DZero users can use such a resource, there is a need to:
  - Deploy and configure SAM services like:
    - Station (collection of resources controlled by the SAM system)
    - Stager (service to handle staging of files on disk used by SAM)
    - FSS (service to interface with the FS)
    - File transfer services like gridftp, sam_fcp, etc.
  - Register SAM services with the central SAM DB.
  - Start and stop SAM station services.
  - Do the cleanup when the lease period expires.
  - Set up firewall and security configurations.

Addressing the Challenges
[Figure: a job and the leased resource]
1. Deploy and configure SAM
2. Register SAM services with the SAM system
3. Start SAM services for the duration of the lease
4. When the lease expires, stop SAM
5. Do the cleanup

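The five steps above map naturally onto a setup/teardown pattern. The sketch below models them with a Python context manager; it is not the SAM-on-the-fly code. The function `sam_lease`, the site name and the log messages are hypothetical placeholders, and the end of the lease is modeled simply as leaving the `with` block.

```python
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def sam_lease(site: str, log: List[str]) -> Iterator[None]:
    """Hypothetical lease lifecycle: set up SAM, hand over control, always tear down."""
    log.append(f"1. deploy and configure SAM at {site}")
    try:
        log.append(f"2. register SAM services for {site}")
        log.append(f"3. start SAM services at {site}")
        yield  # the lease is active; jobs can use the resource here
    finally:
        # Runs when the lease ends, including when an error occurs above.
        log.append(f"4. stop SAM services at {site}")
        log.append(f"5. clean up {site}")


events: List[str] = []
with sam_lease("uw-cluster", events):
    events.append("run jobs during the lease")

for event in events:
    print(event)
```

Putting steps 4 and 5 in the `finally` clause reflects the requirement that the resource is returned in a clean state when the lease period expires, whatever happened during the lease.
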
Current Status
- Automated the product deployment steps.
- Semi-automated the SAM services registration steps.
- Automated starting and stopping of SAM services.
- This project is a work in progress.
- People:
  - Fermi National Accelerator Laboratory
  - University of Wisconsin Madison: Alain Roy and Hidayat Teonadi.

References
- Resource Selection Service for OSG
  - http://www.opensciencegrid.org
  - http://osg.ivdgl.org/twiki/bin/view/ResourceSelection/WebHome
- SAM
  - http://projects.fnal.gov/samgrid
Thanks to Miron and the Condor Group for all the support!
Questions?


