

• Number of slides: 18

DØ Grid Computing
Gavin Davies, Frédéric Villeneuve-Séguier, Imperial College London
On behalf of the DØ Collaboration and the SAMGrid team
The 2007 Europhysics Conference on High Energy Physics (EPS-HEP 2007), Manchester, England, 19-25 July 2007

Outline
• Introduction
  – DØ Computing Model
  – SAMGrid Components
  – Interoperability
• Activities
  – Monte Carlo Generation
  – Data Processing
• Conclusion
  – Next Steps / Issues
  – Summary

Introduction
• Tevatron – running experiments (less data than the LHC, but still PBs per experiment)
  – Growing: great physics, and better still to come
  – Have >3 fb⁻¹ of data and expect up to 5 fb⁻¹ more by end 2009
• Computing model: a data grid (SAM) for all data handling, originally with distributed computing, evolving to automated use of common grid tools/solutions (SAMGrid) for all tasks
  – Started with production tasks, e.g. MC generation and data processing
    • Greatest need and easiest to ‘gridify’ – ahead of the wave for a running experiment
  – Based on SAMGrid, but with a programme of interoperability from very early on
    • Initially LCG and then OSG
  – Increased automation; user analysis considered last
    • SAM already gives remote data analysis

Computing Model
[Diagram: data flow between Remote Farms, Central Farms, Data Handling Services, Central Storage, User Desktops, Remote Analysis Systems and Central Analysis Systems; data types include Raw Data, RECO, MC and User Data]
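The slide itself is only a diagram; as a purely illustrative sketch (the routing below is inferred from the node labels on the slide, the "Detector" source is assumed, and none of these names are DØ software), the producer/consumer roles might be modelled like this:

```python
# Illustrative model of the DØ computing-model diagram; names are invented,
# only the node labels (farms, storage, analysis systems) come from the slide.

DATA_FLOW = {
    # producer                   (data tier,   destination)
    "Detector":                  ("Raw Data",  "Central Storage"),   # assumed source of raw data
    "Central Farms":             ("RECO",      "Central Storage"),   # reconstruction output
    "Remote Farms":              ("MC",        "Central Storage"),   # Monte Carlo production
    "Central/Remote Analysis":   ("User Data", "User Desktops"),     # analysis output to users
}

def route(producer: str) -> str:
    """All transfers go through the data handling services (SAM)."""
    tier, destination = DATA_FLOW[producer]
    return f"{producer} --[{tier}]--> SAM data handling --> {destination}"

for site in DATA_FLOW:
    print(route(site))
```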

Components – Terminology
• SAM (Sequential Access via Metadata)
  – Well-developed metadata & distributed data replication system
  – Originally developed by DØ & FNAL-CD, now used by CDF & MINOS
• JIM (Job Information and Monitoring)
  – Handles job submission and monitoring (all but data handling)
  – SAM + JIM → SAMGrid, a computational grid
• Runjob – handles job workflow management
• Automation – d0repro tools, automc
• (UK role – project leadership, key technology and operations)
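As a rough illustration of the division of labour just listed (SAM for data handling, JIM for job submission and monitoring), here is a minimal sketch; the classes and methods are hypothetical stand-ins, not the real SAM/JIM interfaces:

```python
# Hypothetical sketch of the SAMGrid split of responsibilities.
# None of these classes or methods correspond to actual SAM/JIM APIs.

class SAM:
    """Data handling: metadata catalogue plus distributed data replication."""
    def get_dataset(self, dataset: str) -> list:
        # A real SAM station resolves a dataset to files and stages them into
        # its local cache; here we just return placeholder file names.
        return [f"{dataset}/file_{i:04d}.raw" for i in range(3)]

class JIM:
    """Job submission and monitoring: everything except data handling."""
    def submit(self, executable: str, dataset: str) -> str:
        job_id = f"jim-{abs(hash((executable, dataset))) % 10000}"
        print(f"submitted {executable} over '{dataset}' as {job_id}")
        return job_id

class SAMGrid:
    """SAM + JIM together form the computational grid."""
    def __init__(self):
        self.sam, self.jim = SAM(), JIM()

    def run(self, executable: str, dataset: str) -> str:
        files = self.sam.get_dataset(dataset)          # data handling via SAM
        print(f"SAM delivered {len(files)} files")
        return self.jim.submit(executable, dataset)    # submission/monitoring via JIM

SAMGrid().run("d0reco", "example-p20-raw-dataset")
```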

SAMGrid Interoperability
• Long programme of interoperability – LCG first and then OSG
• Step 1: Co-existence – use shared resources with a SAM(Grid) headnode
  – Widely done for both MC and the p17 2004/05 data reprocessing
• Step 2: SAMGrid interface – SAM does the data handling & JIM the job submission
  – Basically a forwarding mechanism (see the sketch below)
  – SAMGrid-LCG: first used early 2006 for data fixing; MC & p20 data reprocessing since
  – SAMGrid-OSG: learnt from SAMGrid-LCG; p20 data reprocessing (spring 2007)
  – Replicate as needed
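A minimal sketch of the forwarding idea in Step 2, assuming only what the slide states (jobs submitted through SAMGrid are forwarded to LCG or OSG resources while SAM keeps the data handling at the execution cluster); the site names and routing table are invented for illustration and are not the real forwarding-node code:

```python
# Illustrative-only sketch of the SAMGrid forwarding mechanism; the example
# site names and the BACKENDS table are assumptions, not real configuration.

BACKENDS = {
    "lcg-site.example.org": "LCG",   # hypothetical European LCG site
    "osg-site.example.edu": "OSG",   # hypothetical US OSG site
}

def dispatch(job: dict, site: str) -> str:
    grid = BACKENDS.get(site)
    if grid is None:
        # Native SAMGrid execution site: no forwarding needed.
        return f"run {job['executable']} directly on SAMGrid site {site}"
    # Forwarding node: the job is handed on to the LCG/OSG gateway,
    # while the SAM station attached to the cluster still delivers the data.
    return (f"forward {job['executable']} to {site} via the {grid} gateway; "
            f"SAM station stages dataset '{job['dataset']}' locally")

print(dispatch({"executable": "d0reco", "dataset": "p20-raw"}, "lcg-site.example.org"))
print(dispatch({"executable": "d0reco", "dataset": "p20-raw"}, "native-samgrid-site"))
```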

SAM plots
http://d0db-prd.fnal.gov/sam/diagnostics.html
[Plot: monthly data consumption, now above 1 PB/month]
• Over 10 PB (250 B events) in the last year
• Up to 1.6 PB moved per month (a ×5 increase over 2 years ago)
• SAM TV – monitor SAM and the SAM stations: http://d0om.fnal.gov/sam.TV/current/
• Continued success: SAM shifters – often remote
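As a back-of-the-envelope check on the throughput numbers above (a rough average across all data tiers and repeat deliveries, not an official DØ figure):

```python
# Rough consistency check of the SAM throughput figures quoted above.
moved_last_year_bytes = 10e15       # "over 10 PB" delivered in the last year
events_last_year      = 250e9       # "250 B events"
peak_month_bytes      = 1.6e15      # "up to 1.6 PB moved per month"

avg_bytes_per_event = moved_last_year_bytes / events_last_year
print(f"average delivered volume per event: ~{avg_bytes_per_event/1e3:.0f} kB")      # ~40 kB
print(f"average monthly volume: ~{moved_last_year_bytes/12/1e15:.2f} PB/month")      # ~0.83 PB
print(f"peak month vs average: ~{peak_month_bytes/(moved_last_year_bytes/12):.1f}x") # ~1.9x
```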

SAMGrid plots – I
• JIM: >10 active execution sites
  – “Moving to forwarding nodes”
  – “No longer add red dots”
• http://samgrid.fnal.gov:8080/list_of_schedulers.php
• http://samgrid.fnal.gov:8080/list_of_resources.php
• http://samgrid.fnal.gov:8080/known_scheduler.php?scheduler_name=samgrid.fnal.gov

SAMGrid plots – II
[Map of execution sites: “native” SAMGrid (Europe), SAMGrid-LCG forwarding mechanism (Europe), SAMGrid-OSG forwarding mechanism (US), “native” SAMGrid (China!)]

Monte Carlo
• Massive increase with the spread of SAMGrid use & LCG (OSG later)
• p17/p20 – 550 M events since 09/05
• Up to 12 M events/week
  – Downtimes due to software transitions, p20 reprocessing and site availability
• 80% in Europe – 30% in France
• UKRAC – full details on the web: http://www.hep.ph.ic.ac.uk/~villeneu/d0_uk_rac.html
• LCG grid-wide submission reached a scaling problem
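For a rough sense of scale, and assuming "09/05" means September 2005 (about 95 weeks before this conference), the quoted totals imply the average rate below; this is only an estimate, not a DØ-reported number:

```python
# Back-of-the-envelope MC production rate, assuming "since 09/05" means
# September 2005, i.e. roughly 95 weeks up to July 2007.
total_events = 550e6
weeks        = 95
avg_per_week = total_events / weeks
print(f"average rate: ~{avg_per_week/1e6:.1f} M events/week")   # ~5.8 M/week
print(f"peak/average: ~{12e6/avg_per_week:.1f}x")               # 12 M/week peak is ~2x the average
```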

Data – reprocessing & fixing – I
• p14 reprocessing: Winter 2003/04
  – 100 M events remotely, 25 M in the UK
  – Distributed computing rather than Grid
• p17 reprocessing: Spring–Autumn 2005 (site certification)
  – ×10 larger, i.e. 1 B events, 250 TB, from raw
  – SAMGrid as the default
• p17 fixing: Spring 2006
  – All Run IIa – 1.4 B events in 6 weeks
  – SAMGrid-LCG ‘burnt in’
• Increasing functionality
  – Primary processing tested, will become the default

Data – reprocessing & fixing – II
• p20 (Run IIb) reprocessing – Spring 2007
  – Improved reconstruction & detector calibration for Run IIb data (2006 and early 2007)
  – ~500 M events (75 TB)
  – Reprocessing using native SAMGrid, SAMGrid-OSG (& SAMGrid-LCG)
  – First large-scale use of SAMGrid-OSG
  – Up to 10 M events produced/merged remotely per day (initial goal was 3 M/day)
  – Successful reprocessing
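For scale, a rough estimate from the figures above (not official DØ numbers; the real campaign ramped up rather than running at the peak rate throughout):

```python
# Rough scale of the p20 reprocessing campaign from the figures quoted above.
events        = 500e6     # ~500 M events
volume_bytes  = 75e12     # ~75 TB input
peak_per_day  = 10e6      # up to 10 M events/day
goal_per_day  = 3e6       # initial goal

print(f"average event size: ~{volume_bytes/events/1e3:.0f} kB")   # ~150 kB/event
print(f"days at peak rate : ~{events/peak_per_day:.0f} days")     # ~50 days
print(f"days at goal rate : ~{events/goal_per_day:.0f} days")     # ~167 days
```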

Integration of a “grid” (OSG) – p20 reprocessing
• Such exercises ‘debug’ a grid
  – Revealed some teething troubles
  – Solved quickly thanks to “LCG”: the GOC, our OSG and LCG partners, and SAMGrid-LCG experience
  – Up to 3 M events/day at full speed
[Plot annotations: “OSG (initially)”, “A lot of green”, “A lot of red”]

Next steps / issues
• Complete endgame development
  – Additional functionality/usage – skimming, primary processing on the grid as default (& at multiple sites?)
  – Additional resources – completing the forwarding nodes
    • Full data/MC functionality for both LCG & OSG
    • Scaling issues in accessing the full LCG & OSG worlds
  – Data analysis – how gridified do we go? An open issue
    • Need to be ‘interoperable’ (Fermigrid, LCG sites, OSG, …)
    • Will need development, deployment and operations effort
• “Steady” state – goal to reach by end of CY07 (≥2 yrs running)
  – Maintenance of existing functionality
  – Continued experimental requests
  – Continued evolution as grid standards evolve
• Manpower
  – Development, integration and operation handled by the dedicated few

Summary / plans
• Tevatron & DØ performing very well
  – A lot of data & physics, with more to come
• SAM & SAMGrid critical to DØ
  – The grid computing model is as important as any sub-detector
• Without our LCG and OSG partners it would not have worked either
  – Largest grid ‘data challenges’ in HEP (I believe)
  – Learnt a lot about the technology, and especially how it scales
  – Learnt a lot about the organisation/operation of such projects
  – Some of these lessons can be abstracted and benefit others…
  – Accounting model evolved in parallel (~$4 M/yr)
• Baseline: ensure (scaling for) production tasks
  – Further improvements to operational robustness/efficiency underway
• In parallel, the open question of data analysis – will need to go part way

Back-ups

SAMGrid Architecture
[Architecture diagram]

Interoperability architecture
[Diagram: network boundaries, forwarding node, LCG/OSG cluster, VO-service (SAM); arrows mark job flow and “offers service” relations]