Perspectives on LHC Computing
José M. Hernández (CIEMAT, Madrid)
On behalf of the Spanish LHC Computing community
Jornadas CPAN 2013, Santiago de Compostela

The LHC Computing Challenge
§ The Large Hadron Collider (LHC) delivered billions of recorded collisions to the experiments in Run 1 (2010-2012)
  § ~100 PB of data stored at CERN on tape
§ The Worldwide LHC Computing Grid (WLCG) provides compute and storage resources for data processing, simulation and analysis
  § ~300k cores, ~200 PB disk, ~200 PB tape
§ The computing challenge was met with great success
  § Unprecedented data volume analyzed in record time, delivering great scientific results (e.g. Higgs boson discovery)
José Hernández LHC Computing Perspectives 2

Global effort, global success

Computing is part of the global effort
José Hernández CMS Computing Upgrade and Evolution 28 October 2013, Seoul, Korea 4

WLCG (initial) computing model
§ Distributed computing resources managed using Grid technologies that needed to be developed
§ Centers interconnected via private and national high-capacity Ethernet networks
§ Centers provide mass storage (disk/tape servers) and CPU resources (x86 CPUs)
§ Hierarchical tiered structure
  § Detector data prompt reconstruction and calibration at the Tier-0 at CERN
  § Data-intensive processing at Tier-1s
  § User analysis and simulation production at Tier-2s (LHCb: simulation only)
  § Data tape archival at Tier-0 and Tier-1s
  § Data caches at Tier-2s (except LHCb)
All available WLCG resources have been used intensively during LHC Run 1

ATLAS Computing scale in LHC Run 1
§ 150k slots continuously utilized
§ ~1.4M jobs/day completed
§ More than 5 GB/s transfer rate worldwide
[Figure: transfer-rate plot, label "10 GB/s"]

CMS Computing scale in LHC Run 1
§ ~100 PB transferred between sites
  § ~2/3 for data analysis at T2s
§ Resource usage saturation. In 2012:
  § 70k slots continuously utilized
  § ~500k jobs/day completed

Computing challenges for Run 2
§ Computing in LHC Run 1 was very successful, but Run 2 from 2015 poses new challenges
§ Increased energy and luminosity delivered by LHC in Run 2
  § More complex events to process
  § Event reconstruction time (CMS ~2x)
§ Higher output rate to record
  § Maintain similar trigger thresholds and sensitivity to Higgs physics and to potential new physics
  § ATLAS, CMS event rate to storage 2.5x
§ Need a substantial increase of computing resources that we probably cannot afford

Upgrading LHC Computing in LS1
§ The shutdown period is a valuable opportunity to assess
  § Lessons and operational experiences of Run 1
  § Computing demands of Run 2
  § The technical and cost evolution of computing
§ Undertake intensive planning and development to prepare LHC Computing for 2015 and beyond
  § While sustaining steady-state, full-scale operations
  § With an assumption of constrained funding
§ This has been happening internally to the experiments and collaboratively with CERN IT, WLCG, and common software and computing projects
§ Upgrade in parallel to accelerator and detector upgrades to push the frontiers of HEP

Computing strategy for Run 2
§ Increase resources in WLCG as much as possible
  § Try to conform to the constrained budget situation
§ Make a more efficient and flexible use of the available resources
  § Reduce CPU and storage needs
    § Fewer reprocessing passes, fewer simulated events, more compact data format, reduced data replication factor
  § Intelligent dynamic data placement
    § Automatic replication of hot data and deletion of cold data
  § Break down the boundaries between the computing tiers
    § Run reconstruction, simulation and analysis at Tier-1 or Tier-2 interchangeably
    § Tier-1s as an extension of the Tier-0
    § Keep higher service level and custodial tape storage at Tier-1
  § Centralized production of group analysis datasets
    § Shrink 'chaotic analysis' to only what really is user specific
    § Remove redundancies in processing and storage, reducing operational workloads while improving turnaround for users
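
The "intelligent dynamic data placement" bullet can be illustrated with a small popularity-based policy. This is only a sketch under stated assumptions: the `Dataset` record, the `plan_actions` function and every threshold below are hypothetical names and values chosen for illustration, not the placement systems actually used by the experiments.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Dataset:
    """Hypothetical bookkeeping record for one dataset."""
    name: str
    accesses_last_week: int  # popularity measure (assumed metric)
    replicas: int            # number of disk replicas currently held


def plan_actions(
    datasets: List[Dataset],
    hot_threshold: int = 100,   # assumed: accesses/week that count as "hot"
    cold_threshold: int = 1,    # assumed: below this a dataset is "cold"
    min_replicas: int = 1,      # never delete the last disk copy
    max_replicas: int = 5,      # cap on automatic replication
) -> List[Tuple[str, str]]:
    """Return (action, dataset name) pairs: replicate hot data, trim cold data."""
    actions = []
    for ds in datasets:
        if ds.accesses_last_week >= hot_threshold and ds.replicas < max_replicas:
            actions.append(("replicate", ds.name))
        elif ds.accesses_last_week < cold_threshold and ds.replicas > min_replicas:
            actions.append(("delete_replica", ds.name))
    return actions
```

A real system would also weigh dataset size, site free space and network cost; the sketch only shows the hot/cold decision itself.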

Access to new resources for Run 2
§ Access to opportunistic resources
  § HPC clusters, academic or commercial clouds, volunteer computing
  § Significant increase in capacity at low cost (satisfy capacity peaks)
§ Use HLT farm for offline data processing
  § A significant resource (>10k slots)
  § During extended periods with no data taking and even inter-fill periods
§ Adopt advanced architectures
  § Processing in Run 1 done under Enterprise Linux on x86 CPUs
  § Many-core processors, low-power CPUs, GPU environments
  § Challenging heterogeneous environment
  § Parallelization of processing applications will be key

Computing resources increase
§ ~25% yearly growth in preliminary requests for Run 2
§ Benefit from technology evolution to buy more capacity with the same money
[Figures: requested CPU capacity (HS06) and storage (PB)]
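
A back-of-the-envelope check shows why ~25% yearly growth alone does not cover the Run 2 demand. The sketch below assumes that CPU demand scales as the product of per-event reconstruction time (~2x for CMS) and event rate to storage (2.5x); that scaling is a simplification introduced here, not a statement from the slides, and the function names are illustrative.

```python
import math


def capacity_after(years: int, yearly_growth: float = 0.25) -> float:
    """Capacity relative to today after compounding yearly growth."""
    return (1.0 + yearly_growth) ** years


def years_to_reach(factor: float, yearly_growth: float = 0.25) -> float:
    """Years of compound growth needed to multiply capacity by `factor`."""
    return math.log(factor) / math.log(1.0 + yearly_growth)


# Assumed naive demand factor: 2x reconstruction time times 2.5x event rate.
naive_cpu_factor = 2.0 * 2.5

if __name__ == "__main__":
    print(f"capacity after 3 years: {capacity_after(3):.2f}x")
    print(f"years to reach {naive_cpu_factor:.0f}x: {years_to_reach(naive_cpu_factor):.1f}")
```

Three years of 25% growth give roughly a factor of two, while a naive factor of five would take about seven years, which is why the strategy also relies on efficiency gains and new resources.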

Processing evolution
Transistor count growth is holding up… but clock speed growth suffered a heat death…
§ Sustaining throughput growth by replacing ever faster processors with a higher number of cores, co-processors, concurrency features
§ New environment: high concurrency, modest memory/core, GPUs
§ Multi-core now, many-core soon: finer-grained parallelism needed
§ Many or most of our codes require extensive overhauls
§ Being adapted: Geant4, ROOT, reconstruction code, experiment frameworks

Data Management
§ Where is LHC in Big Data terms?
Big Data in 2012 (Wired Magazine 4/2013)
  § Business emails sent: 3000 PB/year (doesn't count; not managed as a coherent data set)
  § Facebook uploads: 180 PB/year
  § Google search: 100 PB
  § Digital health: 30 PB
  § LHC data: 15 PB/yr
  § YouTube: 15 PB/yr
  § Also shown, without figures: Library of Congress, Climate DB, US Census, Nasdaq
§ Reputed capacity of NSA's new Utah data center: 5000 PB (50-100 MW, $2 billion)
§ Current LHC data set, all data products: ~250 PB
We are big…

Data Management evolution
§ Data access model during LHC Run 1
  § Pre-locate and replicate data at sites, send jobs to the data
§ We need more efficient distributed data handling, lower disk storage demands and better use of available CPU resources
  § The network has been very reliable and has experienced a large increase in bandwidth
  § (Aspire to) send only the data you need, only where you need it (and cache it when it arrives)
  § Towards transparent distributed data access enabled by the network
§ Industry has used this approach for years in content delivery networks
§ Already successful approaches during Run 1…

Data Management evolution in Run 1
§ Scalable access to conditions data
  § Frontier for scalable distributed DB access
  § Caching web proxies provide hierarchical, highly scalable cache-based data access
§ Experiment software provisioning to the worker nodes
  § CERNVM File System (CVMFS)
§ Evolve towards a distributed data federation…
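
The idea of hierarchical, cache-based access can be sketched with a chain of read-through caches. This is a toy model under stated assumptions: `ReadThroughCache` is a hypothetical class written for illustration and is not the Frontier or CVMFS API; real proxies also handle expiry, eviction and consistency, which are omitted here.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy cache layer: serve from local store, otherwise ask the upstream layer."""

    def __init__(self, upstream: Callable[[str], str]) -> None:
        self.upstream = upstream          # next layer: another cache or the origin
        self.store: Dict[str, str] = {}   # local copies of fetched payloads
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self.store:
            self.hits += 1
            return self.store[key]
        # Miss: fetch once from upstream and keep a copy for later requests.
        self.misses += 1
        value = self.upstream(key)
        self.store[key] = value
        return value
```

Chaining a per-node cache to a site-level cache (`ReadThroughCache(site.get)`) means that many worker nodes asking for the same conditions payload reach the central database only once per site.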

Data Management evolution
§ Distributed data federation
  § A collection of disparate storage resources transparently accessible across a wide area via a common namespace (CMS AAA, ATLAS FAX)
§ Needs efficient remote I/O
  § CMS has invested heavily in I/O optimizations within the application to allow efficient reading of the data over the (long-latency) network using the xrootd technology while maintaining a high CPU efficiency
§ Extending initial use cases: fallback on local access failure, overflow from busy sites, interactive access to data, use of diskless sites
§ Interesting approach: ATLAS event service
  § Ask for exactly what you need, have it delivered by a service that knows how to get it to you efficiently
  § Return the outputs in a ~steady stream, such that a worker node (WN) can be lost with little lost processing
  § Well suited to transient opportunistic resources and volunteer computing, where preemption cannot be avoided
  § Well suited for high-CPU, low-I/O workflows
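
The "fallback on local access failure" use case can be sketched as trying an ordered list of sources for the same logical file name. This is a minimal illustration under stated assumptions: `read_with_fallback` and the reader callables are hypothetical, and no real xrootd client call is used or implied.

```python
from typing import Callable, Iterable, List


def read_with_fallback(
    logical_name: str,
    sources: Iterable[Callable[[str], bytes]],
) -> bytes:
    """Try each source in order and return data from the first that succeeds.

    Each source is a callable taking a logical file name; a typical order
    would be local site storage first, then a remote federation reader.
    """
    errors: List[OSError] = []
    for source in sources:
        try:
            return source(logical_name)
        except OSError as exc:
            # Remember the failure and fall through to the next source.
            errors.append(exc)
    raise OSError(f"all sources failed for {logical_name!r}: {errors}")
```

Because every source is addressed by the same logical name, the job does not need to know where the data physically lives, which is the point of the common namespace.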

From Grid to Clouds
§ Turning computing into a utility providing infrastructure as a service
§ Clouds evolve, complement and extend the Grid
  § Decrease heterogeneity seen by the user (hardware virtualization)
  § VMs provide a uniform user interface to resources
  § Integrate diverse resources manageably
  § Isolate software from physical hardware
  § Dynamic provision of resources
  § New resources (commercial, research clouds)
  § Huge community behind Cloud software
§ Grid of clouds already used by LHC experiments
  § Several sites provide a Cloud interface
  § ATLAS: ~450k production jobs from Google over a few weeks
  § Tests on Amazon EC2 spot pricing: ~economically viable

Conclusions
§ LHC computing performed extremely well at all levels in Run 1
  § We know how to deliver, adapting where necessary
  § Excellent networks, flexible and adaptable computing models and software systems paid off in exploiting resources
§ LHC computing needs to face new challenges for LHC Run 2
  § Large increase of computing resources required from 2015
  § Live within constrained budgets
  § Use resources we own as fully and efficiently as possible
  § Support the major development program required
  § Access to opportunistic and cloud resources, explore new computer and processing architectures
  § Evolve towards dynamic data access & distributed parallel computing
§ Explosive growth in data and (highly granular) processors in the wider world gives us a powerful ground for success in our evolution path
§ Evolve towards a more dynamic, efficient and flexible system