
Grid Computing at LHC and ATLAS Data Challenges. IMFP-2006, El Escorial, Madrid, Spain, April 4, 2006. Gilbert Poulard (CERN PH-ATC)
Overview
- Introduction
- LHC experiments: computing challenges
- WLCG: Worldwide LHC Computing Grid
- ATLAS experiment
  - Building the Computing System
- Conclusions
Introduction: LHC at CERN (aerial view of the Geneva region showing the LHC ring, CERN, Geneva and Mont Blanc, 4810 m)
LHC Computing Challenges
- Large distributed community
- Large data volume ... and access to it for everyone
- Large CPU capacity
Challenge 1: Large, distributed community
- "Offline" software effort: 1000 person-years per experiment
- Software life span: 20 years
- ~5000 physicists around the world, around the clock (ATLAS, CMS, LHCb, ...)
Large data volume (cells lost in the original layout are marked "-")

  Experiment  Rate [Hz]  RAW [MB]  ESD/rDST/RECO [MB]  AOD [kB]  Monte Carlo [MB/evt]  MC as % of real
  ALICE HI    100        12.5      -                    250       300                   100
  ALICE pp    100        1         0.04                 4         0.4                   100
  ATLAS       200        1.6       0.5                  100       2                     20
  CMS         150        1.5       0.25                 50        2                     100
  LHCb        2000       0.025     -                    0.5       -                     20

- 50 days running in 2007
- 10^7 seconds/year pp from 2008 on: ~2 x 10^9 events/experiment
- 10^6 seconds/year heavy ion
Large CPU capacity: ATLAS resources in 2008
- Assume 2 x 10^9 events per year (1.6 MB per event)
- First-pass reconstruction will run at the CERN Tier-0
- Re-processing will be done at the Tier-1s (Regional Computing Centres) (10)
- Monte Carlo simulation will be done at the Tier-2s (e.g. physics institutes) (~30)
  - Full simulation of ~20% of the data rate
- Analysis will be done at Analysis Facilities, Tier-2s, Tier-3s, ...

                            CPU (MSI2k)  Disk (PB)  Tape (PB)
  Tier-0                    4.1          0.4        5.7
  CERN Analysis Facility    2.7          1.9        0.5
  Sum of Tier-1s            24.0         14.4       9.0
  Sum of Tier-2s            19.9         8.7        0.0
  Total                     50.7         25.4       15.2

  (~50 000 of today's CPUs)
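A back-of-the-envelope check of the data volumes behind these resource numbers; a minimal sketch using only the rates and event sizes quoted on the slides above (200 Hz, 1.6 MB RAW, 0.5 MB ESD, 100 kB AOD, 10^7 live seconds per year), everything else being illustrative:

```python
# Rough estimate of the annual ATLAS data volume from the quoted numbers.
RATE_HZ = 200            # event rate after the Event Filter
LIVE_SECONDS = 1e7       # effective pp running time per year
EVENT_SIZES_MB = {"RAW": 1.6, "ESD": 0.5, "AOD": 0.1}

events_per_year = RATE_HZ * LIVE_SECONDS             # ~2e9 events
print(f"events/year: {events_per_year:.1e}")

for name, size_mb in EVENT_SIZES_MB.items():
    volume_pb = events_per_year * size_mb / 1e9      # 10^9 MB = 1 PB
    print(f"{name}: {volume_pb:.1f} PB/year")        # RAW ~3.2, ESD ~1.0, AOD ~0.2
```

The ~3.2 PB/year of RAW data alone makes clear why several petabytes of tape and disk per year appear in the table above.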
CPU requirements (pie chart of CERN, Tier-1 and Tier-2 shares): 58% of the requirement is pledged
Disk requirements (pie chart of CERN, Tier-1 and Tier-2 shares): 54% of the requirement is pledged
Tape requirements (pie chart of CERN and Tier-1 shares): 75% of the requirement is pledged
LHC Computing Challenges
- Large distributed community
- Large data volume ... and access to it for everyone
- Large CPU capacity
- How to face these problems?
- CERN Computing Review (2000-2001):
  - "Grid" is the chosen solution
  - "Build" the LCG (LHC Computing Grid) project
  - Roadmap for the LCG project, and for the experiments
- In 2005 LCG became WLCG
What is the Grid?
- The World Wide Web provides seamless access to information that is stored in many millions of different geographical locations.
- The Grid is an emerging infrastructure that provides seamless access to computing power and data storage capacity distributed over the globe.
  - Global resource sharing
  - Secure access
  - Resource use optimization
  - The "death of distance"
  - Open standards
The Worldwide LHC Computing Grid Project (WLCG)
- Collaboration
  - LHC experiments
  - Grid projects: Europe, US
  - Regional and national centres
- Choices
  - Adopt Grid technology
  - Go for a "Tier" hierarchy
- Goal
  - Prepare and deploy the computing environment to help the experiments analyse the data from the LHC detectors.
(Diagram: CERN Tier-0; Tier-1s such as SARA, Taipei and centres in the UK, France, Italy, Spain, Germany and the USA; Tier-2s formed by labs and universities, with grids for regional groups and physics study groups; Tier-3 physics-department and desktop resources.)
The Worldwide LCG Collaboration
- Members
  - The experiments
  - The computing centres: Tier-0, Tier-1, Tier-2
- Memorandum of Understanding
  - Resources, services, defined service levels
  - Resource commitments pledged for the next year, with a 5-year forward look
WLCG services are built on two major science grid infrastructures:
- EGEE: Enabling Grids for E-sciencE
- OSG: US Open Science Grid
Enabling Grids for E-sciencE (EGEE)
- EU-supported project
- Develop and operate a multi-science grid
- Assist scientific communities to embrace grid technology
- First phase concentrated on operations and technology
- Second phase (2006-08): emphasis on extending the scientific, geographical and industrial scope
  - world-wide grid infrastructure
  - international collaboration
  - in phase 2 will have >90 partners in 32 countries
Open Science Grid (OSG)
- Multi-disciplinary consortium
  - Running physics experiments: CDF, D0, LIGO, SDSS, STAR
  - US LHC collaborations
  - Biology, computational chemistry
  - Computer science research
  - Condor and Globus
  - DOE laboratory computing divisions
  - University IT facilities
- OSG today
  - 50 Compute Elements
  - 6 Storage Elements
  - VDT 1.3.9
  - 23 VOs
Architecture: Grid services
- Storage Element
  - Mass Storage System (MSS): CASTOR, Enstore, HPSS, dCache, etc.
  - Storage Resource Manager (SRM) provides a common way to access the MSS, independent of the implementation
  - File Transfer Services (FTS), provided e.g. by GridFTP or srmCopy
- Computing Element
  - Interface to the local batch system, e.g. the Globus gatekeeper
  - Accounting, status query, job monitoring
- Virtual Organization Management
  - Virtual Organization Management Services (VOMS)
  - Authentication and authorization based on the VOMS model
- Grid Catalogue Services
  - Mapping of Globally Unique Identifiers (GUIDs) to local file names
  - Hierarchical namespace, access control
- Interoperability
  - EGEE and OSG both use the Virtual Data Toolkit (VDT)
  - Different implementations are hidden by common interfaces
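The catalogue idea above (a GUID resolved to a logical file name and to site-local replicas) can be illustrated with a toy model; this is a minimal sketch of the concept only, with invented names and structures, not the real LFC/RLS interface:

```python
# Toy replica catalogue: a GUID maps to a logical file name and to the
# physical replicas held at different sites. Illustration only.
from dataclasses import dataclass, field

@dataclass
class CatalogueEntry:
    lfn: str                                      # logical file name (hierarchical namespace)
    replicas: dict = field(default_factory=dict)  # site -> site-local physical file name

catalogue: dict[str, CatalogueEntry] = {}

def register(guid: str, lfn: str, site: str, pfn: str) -> None:
    entry = catalogue.setdefault(guid, CatalogueEntry(lfn))
    entry.replicas[site] = pfn

def lookup(guid: str, preferred_site: str) -> str:
    """Return a physical replica, preferring the local site if one exists."""
    entry = catalogue[guid]
    return entry.replicas.get(preferred_site) or next(iter(entry.replicas.values()))

register("guid-1234", "/grid/atlas/dc2/evgen/file001.root",
         "CERN", "castor:/castor/cern.ch/atlas/file001.root")
register("guid-1234", "/grid/atlas/dc2/evgen/file001.root",
         "BNL", "dcache:/pnfs/bnl.gov/atlas/file001.root")
print(lookup("guid-1234", "BNL"))
```

The point of the common interface is exactly this indirection: jobs refer to GUIDs or logical names, and the catalogue hides which MSS implementation actually holds the bytes.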
Technology: Middleware
- Currently, the LCG-2 middleware is deployed at more than 100 sites
- It originated from Condor, EDG, Globus, VDT, and other projects
- It will now evolve to include functionality of the gLite middleware provided by the EGEE project, which has just been made available
- Site services include security, the Computing Element (CE), the Storage Element (SE), and monitoring and accounting services, currently available both from LCG-2 and gLite
- VO services such as the Workload Management System (WMS), file catalogues, information services and file transfer services exist in both flavours (LCG-2 and gLite), maintaining close relations with VDT, Condor and Globus
Technology: Fabric
- Moore's law still holds for processors and disk storage
  - For CPUs and disks we count a lot on the evolution of the consumer market
  - For processors we expect an increasing importance of 64-bit architectures and multi-core chips
- Mass storage (tapes and robots) is still a computer-centre item with computer-centre pricing
  - It is too early to draw conclusions on new tape drives and robots
- Networking has seen a rapid evolution recently
  - Ten-gigabit Ethernet is now in the production environment
  - Wide-area networking can already count on 10 Gb/s connections between Tier-0 and the Tier-1s; this will move gradually to the Tier-1 to Tier-2 connections
Common physics applications
- Core software libraries
  - SEAL-ROOT merger
  - Scripting: CINT, Python
  - Mathematical libraries
  - Fitting, MINUIT (in C++)
- Data management
  - POOL: ROOT I/O for bulk data, RDBMS for metadata
  - Conditions database: COOL
- Event simulation
  - Event generators: generator library (GENSER)
  - Detector simulation: GEANT4 (ATLAS, CMS, LHCb)
  - Physics validation: compare GEANT4, FLUKA, test beam
- Software development infrastructure
  - External libraries
  - Software development and documentation tools
  - Quality assurance and testing
  - Project portal: Savannah
The hierarchical model
- Tier-0 at CERN
  - Record RAW data (1.25 GB/s ALICE; 320 MB/s ATLAS)
  - Distribute a second copy to the Tier-1s
  - Calibrate and do first-pass reconstruction
- Tier-1 centres (11 defined)
  - Manage permanent storage: RAW, simulated and processed data
  - Capacity for reprocessing and bulk analysis
- Tier-2 centres (>~100 identified)
  - Monte Carlo event simulation
  - End-user analysis
- Tier-3
  - Facilities at universities and laboratories
  - Access to data and processing in Tier-2s and Tier-1s
  - Outside the scope of the project
Tier-1s (experiments served with priority)

  Tier-1 Centre                      ALICE  ATLAS  CMS  LHCb
  TRIUMF, Canada                            X
  GridKa, Germany                    X      X      X    X
  CC-IN2P3, France                   X      X      X    X
  CNAF, Italy                        X      X      X    X
  SARA/NIKHEF, NL                    X      X           X
  Nordic Data Grid Facility (NDGF)   X      X
  ASCC, Taipei                              X      X
  RAL, UK                            X      X      X    X
  BNL, US                                   X
  FNAL, US                                         X
  PIC, Spain                                X      X    X
Tier-2s: ~100 identified, and the number is still growing.
Tier-0/1/2 connectivity: National Research Networks (NRENs) at the Tier-1s include ASnet, LHCnet/ESnet, GARR, RENATER, DFN, SURFnet6, NORDUnet, RedIRIS, UKERNA and CANARIE.
Prototypes
- It is important that the hardware and software systems developed in the framework of LCG are exercised in increasingly demanding challenges.
- Data Challenges were recommended by the 'Hoffmann Review' of 2001. Although the main goal was to validate the distributed computing model and to gradually build up the computing systems, the results have also been used for physics performance studies and for detector, trigger and DAQ design. Limitations of the Grids have been identified and are being addressed.
  - A series of Data Challenges has been run by the 4 experiments.
- Presently, a series of Service Challenges aims at realistic end-to-end testing of experiment use cases over an extended period, leading to stable production services.
- The project 'A Realisation of Distributed Analysis for LHC' (ARDA) is developing end-to-end prototypes of distributed analysis systems using the EGEE middleware gLite for each of the LHC experiments.
Service Challenges
- Purpose
  - Understand what it takes to operate a real grid service, running for days/weeks at a time (not just limited to experiment Data Challenges)
  - Trigger and verify Tier-1 and large Tier-2 planning and deployment, tested with realistic usage patterns
  - Get the essential grid services ramped up to target levels of reliability, availability, scalability and end-to-end performance
- Four progressive steps from October 2004 through September 2006
  - End 2004: SC1, data transfer to a subset of Tier-1s
  - Spring 2005: SC2, including mass storage, all Tier-1s, some Tier-2s
  - 2nd half 2005: SC3, Tier-1s and >20 Tier-2s, first set of baseline services
  - Jun-Sep 2006: SC4, pilot service
Key dates for service preparation
- Sep 05: SC3 service phase
- Jun 06: SC4 service phase
- Sep 06: initial LHC service in stable operation
- Apr 07: LHC service commissioned
(Timeline 2005-2008: SC3, SC4, LHC service operation; cosmics, first beams, first physics, full physics run)
- SC3: reliable base service; most Tier-1s, some Tier-2s; basic experiment software chain; grid data throughput of 1 GB/s, including mass-storage throughput of 500 MB/s (150 MB/s and 60 MB/s at Tier-1s)
- SC4: all Tier-1s, major Tier-2s; capable of supporting the full experiment software chain including analysis; sustain the nominal final grid data throughput (~1.5 GB/s mass-storage throughput)
- LHC service in operation from September 2006: ramp up to full operational capacity by April 2007; capable of handling twice the nominal data throughput
ARDA: A Realisation of Distributed Analysis for LHC
- Distributed analysis on the Grid is the most difficult and least defined topic
- ARDA sets out to develop end-to-end analysis prototypes using the LCG-supported middleware
- ALICE uses the AliRoot framework based on PROOF
- ATLAS has used DIAL services with the gLite prototype as back-end; this is rapidly evolving
- CMS has prototyped the 'ARDA Support for CMS Analysis Processing' (ASAP), which is used by several CMS physicists for daily analysis work
- LHCb has based its prototype on GANGA, a common project between ATLAS and LHCb
Production grids: what has been achieved
- Basic middleware
- A set of baseline services agreed, and initial versions in production
- All major LCG sites active
- 1 GB/s distribution data rate, mass storage to mass storage: >50% of the nominal LHC data rate
- Grid job failure rate 5-10% for most experiments, down from ~30% in 2004
- Sustained 10k jobs per day
- >10k simultaneous jobs during prolonged periods
Summary on WLCG
- Two grid infrastructures are now in operation, on which we are able to complete the computing services for LHC
- Reliability and performance have improved significantly over the past year
- The focus of Service Challenge 4 is to demonstrate a basic but reliable service that can be scaled up by April 2007 to the capacity and performance needed for the first beams
- Development of new functionality and services must continue, but we must be careful that this does not interfere with the main priority for this year: reliable operation of the baseline services
(From Les Robertson, CHEP'06)
ATLAS: A Toroidal LHC ApparatuS
- Detector for the study of high-energy proton-proton collisions
- The offline computing will have to deal with an output event rate of 200 Hz, i.e. 2 x 10^9 events per year with an average event size of 1.6 MB
- Researchers are spread all over the world
- ATLAS: ~2000 collaborators, ~150 institutes, 34 countries
- Detector: diameter 25 m; barrel toroid length 26 m; end-cap end-wall chamber span 46 m; overall weight 7000 tonnes
The Computing Model (schematic; 1 PC in 2004 = ~1 kSpecInt2k)
- Event Builder -> Event Filter (~159 kSI2k): ~PB/s from the detector, 10 GB/s into the Event Filter; some data for calibration and monitoring go to the institutes (450 Mb/s), and calibrations flow back
- Tier-0 at CERN (~5 MSI2k): first-pass processing, no simulation; ~300 MB/s/T1/experiment out to the Tier-1s
- Tier-1 regional centres (e.g. US, Italian, Spanish (PIC) and UK (RAL) regional centres): ~7.7 MSI2k/T1 and ~2 PB/year/T1 (the diagram also quotes ~9 PB/year/T1); 622 Mb/s links to the Tier-2s
- Tier-2 centres (e.g. the UK Northern Tier: Lancaster, Liverpool, Manchester, Sheffield): ~200 kSI2k and ~200 TB/year per Tier-2; the Tier-2s do the bulk of the simulation; each Tier-2 has ~25 physicists working on one or more channels and should hold the full AOD, TAG and relevant physics-group summary data
- Desktops/workstations, fed from a physics data cache (~0.25 TIPS) at 100-1000 MB/s
ATLAS Data Challenges (1)
- LHC Computing Review (2001): "Experiments should carry out Data Challenges of increasing size and complexity to validate their Computing Model, their complete software suite, and their Data Model, and to ensure the correctness of the technical choices to be made."
ATLAS Data Challenges (2)
- DC1 (2002-2003)
  - First ATLAS exercise on a world-wide scale: O(1000) CPUs at peak
  - Put in place the full software chain: simulation of the data; digitization; pile-up; reconstruction
  - Production system: tools for bookkeeping of data and jobs (~AMI), monitoring, code distribution
  - "Preliminary" Grid usage
    - NorduGrid: all production performed on the Grid
    - US: Grid used at the end of the exercise
    - LCG-EDG: some testing during the Data Challenge, but no "real" production
  - Many people involved: at least one person per contributing site
  - Lessons learned
    - Management of failures is a key concern
    - Automate to cope with the large number of jobs
  - "Built" the ATLAS DC community
- Physics: Monte Carlo data needed for the ATLAS High Level Trigger Technical Design Report
ATLAS Data Challenges (3)
- DC2 (2004)
  - Similar exercise to DC1 (scale; physics processes), BUT
  - Introduced the new ATLAS Production System (ProdSys)
    - Unsupervised production across many sites spread over three different Grids (US Grid3; ARC/NorduGrid; LCG-2)
    - Based on DC1 experience with AtCom and GRAT: a core engine with plug-ins
    - 4 major components: production supervisor; executors; common data management system; common production database
    - Use middleware components as much as possible; avoid inventing ATLAS's own version of the Grid: use the middleware broker, catalogs, information system, ...
- Immediately followed by the "Rome" production (2005)
  - Production of simulated data for an ATLAS physics workshop in Rome in June 2005, using the DC2 infrastructure
ATLAS Production System
- ATLAS uses 3 Grids
  - LCG (= EGEE)
  - ARC/NorduGrid (evolved from EDG)
  - OSG/Grid3 (US)
- Plus the possibility of local batch submission (4 interfaces)
- Input and output must be accessible from all Grids
- The system makes use of the native Grid middleware as much as possible (e.g. Grid catalogs); it does not "re-invent" its own solution
ATLAS Production System
- In order to handle the task of the ATLAS DCs, an automated production system was developed. It consists of 4 components:
  - The production database, which contains abstract job definitions
  - A supervisor (Windmill; Eowyn) that reads the production database for job definitions and presents them to the different Grid executors in an easy-to-parse XML format
  - The executors, one for each Grid flavour, that receive the job definitions in XML format and convert them to the job description language of that particular Grid
  - DonQuijote (DQ), the ATLAS Data Management System, which moves files from their temporary output locations to their final destination on some Storage Element and registers the files in the Replica Location Service of that Grid
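A minimal sketch of how these four components interact; the class and method names below are invented for illustration (the real Windmill/Eowyn supervisor and the executors exchange XML and speak each Grid's own job description language):

```python
# Illustrative supervisor/executor pattern, in the spirit of the system above.
class ProductionDB:
    """Holds abstract job definitions (transformation, parameters, ...)."""
    def __init__(self, jobs):
        self.pending = list(jobs)

    def fetch(self, n):
        batch, self.pending = self.pending[:n], self.pending[n:]
        return batch

class Executor:
    """One per Grid flavour: translates an abstract job into that Grid's job language."""
    def __init__(self, flavour):
        self.flavour = flavour

    def submit(self, job):
        # In reality: build a JDL/xRSL/Condor description and hand it to the middleware.
        print(f"[{self.flavour}] submitting {job['name']} ({job['transformation']})")
        return {"job": job, "state": "finished", "output": f"{job['name']}.pool.root"}

class DataManagement:
    """Stand-in for DonQuijote: move outputs and register them in the Grid's catalogue."""
    def register(self, result, site):
        print(f"registering {result['output']} at {site}")

def supervisor_loop(db, executors, dms):
    # Pull job definitions, spread them across the Grids, register the outputs.
    while db.pending:
        for flavour, executor in executors.items():
            for job in db.fetch(1):
                result = executor.submit(job)
                if result["state"] == "finished":
                    dms.register(result, site=flavour)

jobs = [{"name": f"dc2.simul.{i:06d}", "transformation": "G4 simulation"} for i in range(4)]
supervisor_loop(ProductionDB(jobs),
                {"LCG": Executor("LCG"), "NorduGrid": Executor("NorduGrid"), "Grid3": Executor("Grid3")},
                DataManagement())
```

The design point this sketch illustrates is the plug-in structure: the supervisor and the production database are Grid-agnostic, and only the executors know about a particular middleware flavour.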
The 3 Grid flavours: LCG-2 (ATLAS DC2, autumn 2004). The number of sites and the resources are evolving quickly.
The 3 Grid flavours: Grid3 (ATLAS DC2, autumn 2004)
- As of September 2004: 30 sites, multi-VO, shared resources, ~3000 CPUs (shared)
- The deployed infrastructure has been in operation since November 2003
- Currently running 3 HEP and 2 biological applications
- Over 100 users authorized to run on Grid3
The 3 Grid flavours: NorduGrid (ATLAS DC2, autumn 2004)
- NorduGrid is a research collaboration established mainly across the Nordic countries, but it includes sites from other countries as well
- It contributed a significant part of DC1 (using the Grid in 2002)
- It supports production on several operating systems
- >10 countries, 40+ sites, ~4000 CPUs, ~30 TB of storage
Production phases (persistency: Athena-POOL; volumes quoted for 10^7 events)
- Event generation (Pythia): physics events (HepMC), ~5 TB
- Detector simulation (Geant4): hits + MC truth, ~20 TB
- Pile-up: simulated physics events combined with minimum-bias events (hits + MC truth)
- Digitization (with pile-up): digits (RDO) + MC truth, ~20 TB
- Event mixing and byte-stream conversion: mixed events in byte-stream (raw) format, ~30 TB
- Reconstruction: ESD and AOD, ~5 TB
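The chain and the per-stage volumes can be totalled up as a quick worked example; the stage/volume pairing follows the reading of the diagram above and the code itself is only illustrative:

```python
# ATLAS production chain with the per-stage output volumes quoted for 10^7 events.
N_EVENTS = 10_000_000
PHASES = [
    ("event generation (Pythia)",    "HepMC events",           5),
    ("detector simulation (Geant4)", "hits + MC truth",       20),
    ("pile-up + digitization",       "digits (RDO)",          20),
    ("event mixing / byte-stream",   "byte-stream raw data",  30),
    ("reconstruction",               "ESD + AOD",              5),
]

total_tb = 0
for name, output, volume_tb in PHASES:
    total_tb += volume_tb
    print(f"{name:32s} -> {output:22s} ~{volume_tb:3d} TB")

print(f"total: ~{total_tb} TB for {N_EVENTS:.0e} events "
      f"(~{total_tb * 1e6 / N_EVENTS:.0f} MB/event across all formats)")
```

In other words, every fully processed simulated event carries several megabytes of intermediate and final data through the chain, which is why each phase's output placement matters for the production system.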
ATLAS productions
- DC2
  - Few datasets
  - Different types of jobs:
    - Physics event generation: very short
    - Geant simulation (Geant3 in DC1; Geant4 in DC2 and "Rome"): long, more than 10 hours
    - Digitization: medium, ~5 hours
    - Reconstruction: short
  - All types of jobs run sequentially, each phase one after the other
- "Rome"
  - Many different (>170) datasets
  - Different physics channels
  - Same types of jobs (event generation, simulation, etc.)
  - All types of jobs run in parallel
- Now "continuous" production
  - Goal is to reach 2 M events per week
- The different type of running has a large impact on the production rate
ATLAS productions: countries (sites), shown as DC2 (and "Rome") site counts
- Australia 1 (0), Austria 1, Canada 4 (3), CERN 1, Czech Republic 2, Denmark 4 (3), France 1 (4), Germany 1+2, Greece 0 (1), Hungary 0 (1), Italy 7 (17), Japan 1 (0), Netherlands 1 (2), Norway 3 (2), Poland 1, Portugal 0 (1), Russia 0 (2), Slovakia 0 (1), Slovenia 1, Spain 3, Sweden 7 (5), Switzerland 1 (1+1), Taiwan 1, UK 7 (8), USA 19
- DC2: 20 countries, 69 sites; "Rome": 22 countries, 84 sites
  - Per-Grid breakdown: DC2 13 countries/31 sites and "Rome" 17 countries/51 sites on one Grid; DC2 7 countries/19 sites and "Rome" 7 countries/14 sites on another
- Spring 2006: 30 countries, 126 sites (LCG: 104; OSG/Grid3: 8; NDGF: 14)
ATLAS DC2: jobs (total as of 30 November 2004): 20 countries, 69 sites, ~260 000 jobs, ~2 MSI2k-months (pie chart of the job distribution over sites).
Rome production: number of jobs as of 17 June 2005 (pie chart; the largest individual sites each contributed roughly 4-6% of the jobs).
Rome production statistics
- 173 datasets
- 6.1 M events simulated and reconstructed (without pile-up)
- Total simulated data: 8.5 M events
- Pile-up done for 1.3 M events
  - 50 k reconstructed
ATLAS production (2006) (plot)
ATLAS production (July 2004 - May 2005) (plot)
ATLAS & Service Challenge 3
- Tier-0 scaling tests: test of the operations at the CERN Tier-0
- Original goal: a 10% exercise
  - Preparation phase: July-October 2005
  - Tests: October 2005 - January 2006
ATLAS & Service Challenge 3
- The Tier-0 facility at CERN is responsible for the following operations:
  - Calibration and alignment
  - First-pass ESD production
  - First-pass AOD production
  - TAG production
  - Archiving of primary RAW and first-pass ESD, AOD and TAG data
  - Distribution of primary RAW and first-pass ESD, AOD and TAG data
ATLAS SC3/Tier-0 (1)
- Components of the Tier-0:
  - Castor mass storage system and local replica catalogue
  - CPU farm
  - Conditions DB
  - TAG DB
  - Tier-0 production database
  - Data management system, Don Quijote 2 (DQ2)
  - All orchestrated by the Tier-0 Management System TOM, based on the ATLAS Production System (ProdSys)
ATLAS SC3/Tier-0 (2)
- Deploy and test
  - LCG/gLite components (main focus on the Tier-0 exercise):
    - FTS server at T0 and T1s
    - LFC catalogue at T0, T1s and T2s
    - VOBOX at T0, T1s and T2s
    - SRM Storage Element at T0, T1s and T2s
  - ATLAS DQ2-specific components:
    - Central DQ2 dataset catalogues
    - DQ2 site services (sitting in the VOBOXes)
    - DQ2 client for TOM
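The dataset idea behind DQ2 (datasets as named collections of files, with sites subscribing to the datasets they want replicated) can be sketched as follows; the structures, function names and the dataset name are invented for illustration and do not reflect the real DQ2 interfaces:

```python
# Toy dataset catalogue in the spirit of DQ2. Illustration only.
datasets: dict[str, list[str]] = {}        # dataset name -> list of file GUIDs
subscriptions: dict[str, set[str]] = {}    # dataset name -> sites subscribed to it
holdings: dict[str, set[str]] = {}         # site -> GUIDs already stored there

def create_dataset(name, guids):
    datasets[name] = list(guids)

def subscribe(site, name):
    subscriptions.setdefault(name, set()).add(site)

def files_to_transfer(site, name):
    """GUIDs a subscribed site still needs; a site service would feed these to FTS."""
    if site not in subscriptions.get(name, set()):
        return []
    return [g for g in datasets[name] if g not in holdings.get(site, set())]

create_dataset("mc.simul.illustrative.AOD", ["guid-0001", "guid-0002", "guid-0003"])
subscribe("CERN", "mc.simul.illustrative.AOD")
subscribe("BNL", "mc.simul.illustrative.AOD")
holdings["BNL"] = {"guid-0001"}
print(files_to_transfer("BNL", "mc.simul.illustrative.AOD"))   # ['guid-0002', 'guid-0003']
```

The site services running in the VOBOXes play roughly the role of files_to_transfer here: they compare subscriptions with local holdings and drive the actual transfers through FTS.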
ATLAS Tier-0 data flow (nominal rates)
- Per-stream characteristics: RAW 1.6 GB/file, 0.2 Hz, 17k files/day, 320 MB/s, 27 TB/day; ESD 0.5 GB/file, 0.2 Hz, 17k files/day, 100 MB/s, 8 TB/day; AOD 10 MB/file, 2 Hz, 170k files/day, 20 MB/s, 1.6 TB/day; AODm 500 MB/file, 0.04 Hz, 3.4k files/day, 20 MB/s, 1.6 TB/day
- EF -> Castor: RAW at 320 MB/s
- Castor -> tape: RAW + ESD + AODm at 0.44 Hz, 37k files/day, 440 MB/s
- Castor -> Tier-1s: RAW + ESD (2x) + AODm (10x) at 1 Hz, 85k files/day, 720 MB/s
- Castor <-> reconstruction CPU farm: 0.4 Hz, 190k files/day, 340 MB/s in one direction; 2.24 Hz, 170k files/day (temporary) plus 20k files/day (permanent), 140 MB/s in the other
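The per-stream rates above are mutually consistent; a small sketch using only the sizes, rates and copy multiplicities quoted on this slide reproduces the aggregate flows:

```python
# Consistency check of the nominal Tier-0 rates.
# Each stream: (rate in Hz, file/event size in MB).
streams = {
    "RAW":  (0.2, 1600),   # 1.6 GB per event
    "ESD":  (0.2,  500),
    "AOD":  (2.0,   10),
    "AODm": (0.04, 500),
}

def rate_mb_s(names):
    return sum(streams[n][0] * streams[n][1] for n in names)

print(f"EF -> Castor (RAW):            {rate_mb_s(['RAW']):4.0f} MB/s")               # 320
print(f"Castor -> tape (RAW+ESD+AODm): {rate_mb_s(['RAW', 'ESD', 'AODm']):4.0f} MB/s")  # 440
# Export to the Tier-1s: one shared RAW copy, two ESD copies, ten AODm copies.
export = rate_mb_s(["RAW"]) + 2 * rate_mb_s(["ESD"]) + 10 * rate_mb_s(["AODm"])
print(f"Castor -> Tier-1s:             {export:4.0f} MB/s")                            # 720
```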
Scope of the Tier-0 scaling test
- It was only possible to test:
  - EF writing into Castor
  - ESD/AOD production on the reconstruction farm
  - Archiving to tape
  - Export to Tier-1s of RAW/ESD/AOD
- The goal was to test as much as possible, as realistically as possible
- Mainly a data-flow/infrastructure test (no physics value)
  - Calibration & alignment processing not included yet
  - Nor the ConditionsDB and TagDB streams
Oct-Dec 2005 test: some results
Castor writing rates (Dec 19-20), shown in the plot:
- EF farm -> Castor (write RAW)
- Reco farm -> Castor: reco jobs (write ESD + temporary AOD) and AOD-merging jobs (write AOD)
Tier-0 internal test, Jan 28-29, 2006 (plot)
- Reading (nominal rate 780 MB/s): disk -> worker nodes; disk -> tape
- Writing (nominal rate 460 MB/s): SFO -> disk; worker nodes -> disk
- Writing (nominal rate 440 MB/s): disk -> tape
(The plot marks the 780, 460 and 440 MB/s levels.)
ATLAS SC4 tests (June to December 2006)
- Complete Tier-0 test
  - Internal data transfer from the "Event Filter" farm to the Castor disk pool, Castor tape, and the CPU farm
  - Calibration loop and handling of conditions data, including distribution of conditions data to Tier-1s (and Tier-2s)
  - Transfer of RAW, ESD, AOD and TAG data to the Tier-1s
  - Transfer of AOD and TAG data to the Tier-2s
  - Data and dataset registration in the DB
- Distributed production
  - Full simulation chain run at Tier-2s (and Tier-1s), with data distribution to Tier-1s, other Tier-2s and the CAF
  - Reprocessing of raw data at Tier-1s, with data distribution to other Tier-1s, Tier-2s and the CAF
- Distributed analysis
  - "Random" job submission accessing data at Tier-1s (some) and Tier-2s (mostly)
  - Tests of the performance of job submission, distribution and output retrieval
- Need to define and test the Tiers infrastructure and the Tier-1/Tier-2 associations
ATLAS Tier-1 "2008" resources

                              CPU (MSI2k)   %      Disk (PB)   %      Tape (PB)   %
  Canada TRIUMF               1.06          4.4    0.62        4.3    0.4         4.4
  France CC-IN2P3             3.02          12.6   1.76        12.2   1.15        12.8
  Germany FZK                 2.4           10     1.44        10     0.9         10
  Italy CNAF                  1.76          7.3    0.8         5.5    0.67        7.5
  Nordic Data Grid Facility   1.46          6.1    0.62        4.3    0.62        6.9
  Netherlands SARA            3.05          12.7   1.78        12.3   1.16        12.9
  Spain PIC                   1.2           5      0.72        5      0.45        5
  Taiwan ASGC                 1.87          7.8    0.83        5.8    0.71        7.9
  UK RAL                      1.57          6.5    0.89        6.2    1.03        11.5
  USA BNL                     5.3           22.1   3.09        21.4   2.02        22.5
  Total 2008 pledged          22.69         94.5   12.55       87     9.11        101.4
  2008 needed                 23.97         100    14.43       100    8.99        100
  2008 missing                1.28          5.5    1.88        13     -0.12       -1.4
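A quick check of the shortfall implied by the last three rows; a sketch using only the totals from the table (small differences from the quoted percentages are rounding in the original table):

```python
# Pledged vs. needed ATLAS Tier-1 resources for 2008, from the table above.
pledged = {"CPU (MSI2k)": 22.69, "Disk (PB)": 12.55, "Tape (PB)": 9.11}
needed  = {"CPU (MSI2k)": 23.97, "Disk (PB)": 14.43, "Tape (PB)": 8.99}

for resource in pledged:
    missing = needed[resource] - pledged[resource]
    print(f"{resource:12s} pledged {pledged[resource]:6.2f} "
          f"({100 * pledged[resource] / needed[resource]:5.1f}% of need), "
          f"missing {missing:+6.2f}")
```

Disk is the tightest resource (roughly 13% short of the requirement), while the tape pledge slightly exceeds the stated need.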
ATLAS Tier-1/Tier-2 associations (SC4 draft)
- Tier-1s and their draft share of ATLAS disk: TRIUMF (Canada) 5.3%; CC-IN2P3 (France) 13.5%; FZK-GridKa (Germany) 10.5%; CNAF (Italy) 7.5%; SARA (Netherlands) 13.0%; Nordic Data Grid Facility; PIC (Spain) 5.5%; ASGC (Taiwan) 7.7%; RAL (UK) 7.5%; BNL (USA) 24%
- Tier-2s or planned Tier-2s named in the draft: East and West T2 Federations (Canada); CC-IN2P3 AF, Romanian T2, GRIF, LPC, HEP-Beijing; DESY, Munich Fed., Polish T2 Fed., Wuppertal Uni., Freiburg Uni., FZU AS (CZ); INFN T2 Fed.; SouthGrid, Grid London, NorthGrid, ScotGrid; Spanish ATLAS T2 Fed., LIP T2; Taiwan AF Fed., Melbourne Uni.; BU/HU T2, Midwest T2, Southwest T2; ICEPP Tokyo, Russian Fed., CSCS (CH), HEP-IL Fed., UIBK, Brazilian T2 Fed.
- Most Tier-2s are associated with one of the Tier-1s above; some had no association yet
Computing System Commissioning
- We have defined the high-level goals of the Computing System Commissioning operation during 2006
  - More a running-in of continuous operation than a stand-alone challenge
- The main aim of Computing System Commissioning will be to test the software and computing infrastructure that we will need at the beginning of 2007:
  - Calibration and alignment procedures and conditions DB
  - Full trigger chain
  - Event reconstruction and data distribution
  - Distributed access to the data for analysis
- At the end (autumn-winter 2006) we will have a working and operational system, ready to take data with cosmic rays at increasing rates
Conclusions (ATLAS)
- Data Challenges (1, 2) and productions ("Rome"; the current "continuous" production)
  - Have proven that the 3 Grids, LCG-EGEE, OSG/Grid3 and ARC/NorduGrid, can be used in a coherent way for real large-scale productions
    - Possible, but not easy
- In SC3
  - We succeeded in reaching the nominal data transfer rate at Tier-0 (internally) and reasonable transfers to the Tier-1s
- SC4
  - Should allow us to test the full chain using the new WLCG middleware and infrastructure and the new ATLAS production and data management systems
  - This will include a more complete Tier-0 test, distributed productions and distributed analysis tests
- Computing System Commissioning
  - Will have as its main goal a fully working and operational system
  - Leading to a physics readiness report
Thank you