
  • Number of slides: 31

SAMGrid: JIM and CDF Development
Rick St. Denis, University of Glasgow
• CDF Accepts the Need for the Grid
  – Requirements
• How to Meet the Need
  – Status of SAMGrid for CDF
4 March 2004, GridPP 9th Collaboration Meeting

Spokespersons’ Requirements for CDF
• Maximize physics output @ low Lumi
  – L3 output rate: 80 -> 360 Hz by 06
• CDF needs the Grid
• Finance Director’s review, International Committee: 50% computing outside FNAL
• CDFGrid supported by FNAL PAC

Scale of CDF Requirements

        THz    %offsite   CPU Speed   #duals
FY04     3.7   25%        3 GHz        150
FY05     9.0   50%        5 GHz       +360
FY06    16.5   50%        8 GHz       +220

6-7 sites, 100 Duals each, by 2006 + 700 @FNAL
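The "#duals" column can be roughly cross-checked against the other columns. The sketch below rests on two assumptions that the slide does not state: that "#duals" counts the offsite dual-CPU nodes added in each year, and that one dual node delivers twice the listed CPU speed. The function name and data layout are illustrative only.

```python
# Rough check of the "#duals" column. Assumptions (not stated on the slide):
# (1) "#duals" is the number of offsite dual-CPU nodes added in that year;
# (2) one dual node delivers twice the listed CPU speed.
REQUIREMENTS = [
    # (fiscal year, total THz, offsite fraction, CPU speed in GHz)
    ("FY04", 3.7, 0.25, 3.0),
    ("FY05", 9.0, 0.50, 5.0),
    ("FY06", 16.5, 0.50, 8.0),
]


def offsite_duals_added(rows):
    """Return {year: estimated dual nodes to add offsite in that year}."""
    added = {}
    previous_offsite_ghz = 0.0
    for year, total_thz, offsite_fraction, cpu_ghz in rows:
        offsite_ghz = total_thz * 1000.0 * offsite_fraction
        new_ghz = offsite_ghz - previous_offsite_ghz  # capacity still missing
        added[year] = new_ghz / (2.0 * cpu_ghz)       # two CPUs per dual node
        previous_offsite_ghz = offsite_ghz
    return added


if __name__ == "__main__":
    for year, duals in offsite_duals_added(REQUIREMENTS).items():
        print(f"{year}: about {duals:.1f} duals")
```

Under these assumptions the estimates are about 154.2, 357.5 and 234.4 duals, which is close to the 150, +360 and +220 quoted in the table.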

CDF Computing Model
• Develop Analysis on desktop
  – Access to all CDF data from anywhere
• Large scale processing on batch clusters
  – Submission from anywhere
  – Interactive tools: ls, top, head/tail/cat
  – Output to scratch space or desktop
Implemented now with CAF

Use Cases for Summer 2004
• User Level MC Production
  – All CDF users have access
  – No data on site -> SAM write
• User Level Data Access
  – All users have access
  – Selected samples on site: Full SAM Support
SAM Essential for Summer 2004

Medium Term Vision
• Many Sites
• Fully transparent submission to all of CDF resources: 75% FNAL, 25% outside
• Fully transparent input and output of data

Summer 04 Functionality
• User selects submission site, saying what dataset they will use
• System checks they can do this (privileges)
• User access with SAM/dCache
• User registers output with SAM

October 04
• To extend beyond 25% outside computing, JIM is essential: JIM test for CDF June 04, production October 04
• HOWEVER: It already seems that the 25% resources are not sufficient for the production passes: will want JIM earlier.

CDF Grid from a User Perspective
[Diagram. Labels: CAF GUI/CLI; AC++; Uses SAM; Grid; sites: FermiCAF, Italy, Toronto, Korea, Taiwan, UK; note: "Only Grid Lab Outside Fermilab"]

CDF Grid Strategy
• 25% of CDF Computing from external resources. All CDF computing on CDF Grid by April 15: Utilize resources fully controlled by CDF: Kerberos/fbsng: dCAF + SAM
• October 15, 2004: JIM to capture shared resources
• June 2005: 50% of Computing resources external

[Architecture diagram. Labels, in slide order: Anywhere; @ each site; Desktop; Simple JIM; Private LAN; Globus GK; CAF Submitter; SAM Station; @ regional centers; Condor Submitter; WN; Private LAN; dCache; @FNAL; SAM DB; Condor Matchmaker. Timeline notes: June 2004 testing; June 2005 required.]

Detailed JIM
[Diagram. Legend: flow of job, data, meta-data. Labels, in slide order: User Interface; Submission; Global Job Queue; Resource Selector; Match Making; Global DH Services; Info Gatherer; SAM Naming Server; Info Collector; Grid Client; SAM Log Server; Resource Optimizer; MSS; Cluster; Data Handling; Grid Gateway; SAM DB Server; Site; RC; Local Job Handling; SAM Station (+other servs); Cache; SAM Stager(s); Local Job Handler (CAF, D0MC, BS, ...); AAA; Worker Nodes; Bookkeeping Service; Info Manager; JIM Advertise; Dist. FS; MetaData Catalog; MDS; Info Providers; Web Serv; Grid Monitoring; XML DB server; Site Conf.; Glob/Loc JID map; User Tools; Site]

Meeting the Needs
• Progress in SAM
• JIM Status
• RunJob
• CDFGrid Workshop: “Nerd’s Paradise”
• Strict Project Management and process to respond to operational issues

Progress in SAM
• Dbserver, the database server between applications and Oracle, was upgraded to use a common schema for CDF and D0.
• All CDF data files are in SAM
• SAM is in beta testing on the CDF CAF (1200 CPUs): passed 20 TB/Day delivery
• Minos uses SAM for its Data Handling
• Steve Mrenna (Phenomenology) depositing ALPGEN files in SAM for common CDF/D0 use.

JIM Deployment Issues
Focus:
• 200 jobs each getting 200 files generated 120000 requests simultaneously to the DBServer!
  – Sensible sam: reliability went to 60%. Now add retries.
Training Users
• D0 has D0 Tools: Big script; determines where user is and copies files: harder to get into a sandbox;
• CAF conditions users!
Distribution and compatibility:
• This has made great strides with SAM, now time for JIM
Communication with the expert!
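The slide says only that retries were added after 120000 simultaneous requests drove reliability down to 60%. One common way to retry without having every client hit the server again at the same moment is exponential backoff with random jitter. The sketch below is a generic illustration of that idea, not the SAM client code; `call_with_retries` and its parameters are hypothetical names.

```python
import random
import time


def call_with_retries(request, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call request() until it succeeds or max_attempts is reached.

    Hypothetical helper, not part of SAM. The wait before each retry is
    base_delay * 2**attempt (capped at max_delay), scaled by a random
    factor in [0.5, 1.0) so that many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * (0.5 + rng() / 2.0))
```

A caller would wrap each database request, for example `call_with_retries(lambda: fetch_file_location(name))`, where `fetch_file_location` stands in for whatever request the job makes.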

RunJob
• Dedicated farms at FNAL will go away and RunJob will be used for production processing of data
• CDF will use RunJob for MC production
• Dave Evans worked for CDF for 2 mo.: has made CDFRunJob based on RunJob (Shakar), a tool common to CMS. Morag will work on this.

Florida workshop:
• 11 installations in about 2 hours. Integrated with dCAF in 2 cases in 2 days. Now 20!
• 3 in Asia, 4 in Europe
• 6 sites committed to summer 2004 usage of their facilities for all of CDF (mostly MC)
• SAM installation now: initsam cdf
• Follow-up on April 1.
• Each site has a local user support person to reduce load on core development team.
• Generally: Security ate 80% of the effort!


Florida Workshop: After 2 Days

2 TB/Day: Karlsruhe

CDF Dcache on CAF
ALL CDF on CAF reads 20 TB/Day



Dcache and SAM
• Dcache shapes traffic into disk: If a SAM cache is large, need to use Dcache instead of NFS mounts
• Dcache gives the user what is requested. 1 TB gets same priority as 1 GB: CDF users must send email requesting data to be staged.
• SAM examines consumption rate before staging next files
  – No EMAIL needed.
• SAM uses Dcache for its Caching at FNAL.
• This needs further work with SRM
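The idea of examining the consumption rate before staging more files can be illustrated with a small sketch. It is a toy model under stated assumptions, not SAM's actual algorithm, which the slide does not describe: the function name, its parameters and the 600-second look-ahead window are all hypothetical.

```python
def files_to_stage(bytes_consumed, elapsed_seconds, staged_unread_bytes,
                   pending_files, lookahead_seconds=600.0):
    """Pick the next files to stage, based on how fast the job reads.

    Hypothetical model: stage just enough data to cover lookahead_seconds
    at the measured read rate. pending_files is a list of (name, size)
    pairs in delivery order. A job that reads slowly, or not at all,
    triggers little or no staging.
    """
    if elapsed_seconds <= 0:
        return []
    rate = bytes_consumed / elapsed_seconds   # measured bytes per second
    target = rate * lookahead_seconds         # bytes wanted on disk
    chosen = []
    buffered = staged_unread_bytes            # already staged, not yet read
    for name, size in pending_files:
        if buffered >= target:
            break
        chosen.append(name)
        buffered += size
    return chosen
```

In this model a job that has read 1 GB in 100 s, with 1 GB already staged and unread, would be given three more 2 GB files, while an idle job would be given none.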

SAMGrid Management
[Organization chart. Boxes: SAM Management Team; SAM Project Leaders; SAM Technical Leaders; SAM Operations and Projects; SAM Design]

SAMGrid Development Process
[Process diagram. Labels, in slide order: Chaired by Project Leaders; Chaired by Technical Managers; SAMGrid Operations/Projects; Issue Raised; SAMGrid Design; SAMGrid Management Team; Grid Deliverables; Subproject]

Subproject Organization
• Each Subproject has a subproject leader (SPL) responsible for making a plan and reporting progress.
• Each Subproject has one of the Technical leaders evaluating against an assessment template.
• No deliverable requires more than 3 mo work to deliver.

SubProject Assessment Template
1. Background Documents
2. Project Definition/Mission Statement
3. Deliverables and timetable
4. Inter-project deliverables
5. Project status
6. Challenges and Critical Path Items
7. Lessons Learned
8. Project specific comments, alternate views

SAMGrid Assigned SubProjects
[Chart. Labels, in slide order: MC / Reconstruction; Housekeeping; Work Flow; Package; MCRequest; Housekeeping; H Stream for CDF; JIM: MCD0; Test Harness; User analysis Apps; JIM: D0 Tools; Infrastructure; Common API; Retire CDF Replica Catalog; Database Server Rewrite; Database Servers to Linux; Configuration Management; Caching; Metadata Query with configurable Params]

Status of Assessments
• Subprojects defined
• Interviews conducted on about ½
• Assessment reports being written

Conclusions
• CDF has embraced the need for the Grid to achieve its physics mission
• Progress in deployment and robustness testing has brought SAM into CDF
• JIM is rapidly solving its problems
• … with the help of a review and management process