The ROOT Project in the multi-core CPU era
• CHEP 06, Mumbai
• 15 February 2006
• René Brun
• CERN

Plan of talk
• ROOT: 11 years old!
• Still many developments
• Multi-core CPUs: parallelism
• ROOT, Software Obesity and the GRID
René Brun, CERN ROOT in the multi-core cpu era

ROOT: a long story
• Started in January 1995. ROOT had to face many sociological obstacles at a time when most users were changing experiments and languages, and were lost in many fights. “Every problem has its root in failure of a relationship” (The Times of India, Tuesday 14 February)
• This initial opposition was a key element in the success of the project. By spotting the inevitable weaknesses of some early designs, it forced the team to react quickly. A development method involving more and more users has been essential to get feedback. Designing a large system like ROOT is an iterative process, and it has involved many people in many experiments.
• ROOT is now strongly supported at CERN and FNAL. Many thanks to the management and my colleagues in the LCG project for facilitating a convergent process.

ROOT project: some numbers
• The ROOT project is comparable in size and complexity to the software of each LHC experiment. See, for instance, the evaluation by the sloccount tool:
    Total Physical Source Lines of Code (SLOC) = 1,709,170
    Development Effort Estimate, Person-Years (Person-Months) = 495.97 (5,951.63)
    Schedule Estimate, Years (Months) = 5.66 (67.97)
    Estimated Average Number of Developers = 87.57
    Total Estimated Cost to Develop = $66,998,665
• sloccount, by David A. Wheeler, assumes:
    Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05)
    Basic COCOMO model, Months = 2.5 * (person-months**0.38)
    average salary = $56,286/year, overhead = 2.40
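
The sloccount figures follow directly from the two Basic COCOMO formulas quoted on the slide. A minimal sketch of that arithmetic is given below; the struct and function names are illustrative only and are not part of sloccount or ROOT.

```cpp
#include <cassert>
#include <cmath>

// Result of a Basic COCOMO estimate (names are hypothetical, chosen for this sketch).
struct CocomoEstimate {
    double personMonths;    // development effort
    double scheduleMonths;  // elapsed calendar time
    double developers;      // average team size over the schedule
    double costDollars;     // effort * salary * overhead
};

// ksloc: thousands of physical source lines of code.
// The default salary and overhead are the values quoted on the slide.
inline CocomoEstimate basicCocomo(double ksloc,
                                  double salaryPerYear = 56286.0,
                                  double overhead = 2.40) {
    assert(ksloc > 0.0);
    CocomoEstimate e{};
    e.personMonths   = 2.4 * std::pow(ksloc, 1.05);            // Person-Months = 2.4 * KSLOC**1.05
    e.scheduleMonths = 2.5 * std::pow(e.personMonths, 0.38);   // Months = 2.5 * person-months**0.38
    e.developers     = e.personMonths / e.scheduleMonths;
    e.costDollars    = (e.personMonths / 12.0) * salaryPerYear * overhead;
    return e;
}
```

Calling `basicCocomo(1709.170)` gives an effort, schedule and team size that agree with the slide to the printed precision; the cost differs from the quoted $66,998,665 only by rounding of the intermediate person-years.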

ROOT person power
• Only people working full time on the project (chart: CERN + FNAL)

Presentations about ROOT & co at CHEP 06
• 98 - PROOF - The Parallel ROOT Facility. Distributed Data Analysis - Monday 13 February 15:00. Presenter: GANIS, Gerardo (CERN)
• Xxx - Recent Developments in the ROOT I/O and TTrees. Software Components and Libraries - Monday 13 February 16:00. Presenter: Dr. CANAL, Philippe (FNAL)
• 187 - ROOT GUI, General Status. Software Tools and Information Systems - Monday 13 February 16:40. Presenter: RADEMAKERS, Fons (CERN)
• 227 - New Developments of ROOT Mathematical Software Libraries. Software Components and Libraries - Tuesday 14 February 16:00. Presenter: Dr. MONETA, Lorenzo (CERN)
• 188 - From Task Analysis to the Application Design. Software Tools and Information Systems - Monday 13 February 17:00. Presenter: Mr. RADEMAKERS, Fons (CERN)
• 383 - New features in ROOT geometry modeller for representing non-ideal geometries. Software Components and Libraries - Wednesday 15 February 14:00. Presenter: CARMINATI, Federico (CERN)
• 129 - ROOT I/O for SQL databases. Software Components and Libraries - Monday 13 February 17:40. Presenter: Dr. LINEV, Sergey (GSI DARMSTADT)
• 93 - ROOT 3D graphics. Software Components and Libraries - Wednesday 15 February 16:00. Presenter: BRUN, Rene (CERN)
• 185 - Reflex, reflection for C++. Software Components and Libraries - Tuesday 14 February 14:00. Presenter: Dr. ROISER, Stefan (CERN)
• 92 - ROOT 2D graphics visualisation techniques. Poster - Monday 13 February 11:00
• 91 - ROOT 3D graphics overview and examples. Poster - Monday 13 February 11:00
• 189 - Recent User Interface Developments. Poster - Monday 13 February 11:00
• 186 - ROOT/CINT/Reflex integration. Poster - Monday 13 February 11:00
• 407 - Performance and Scalability of xrootd. Distributed Data Analysis - Wednesday 15 February 17:00. Presenter: HANUSHEVSKY, Andrew (Stanford Linear Accelerator Center)
• 228 - The structure of the new ROOT Mathematical Software Libraries. Poster - Wednesday 15 February 09:00
• 249 - XrdSec - A high-level C++ interface for security services in client-server applications. Poster - Wednesday 15 February 09:00
• 408 - xrootd Server Clustering. Poster - Wednesday 15 February 09:00

Multi Core CPUs
Impact on ROOT

Multi Core CPUs
• http://www.intel.com/technology/computing/archinnov/platform2015/
• This is going to affect the evolution of ROOT in many areas

Moore’s law revisited
Your laptop in 2016 with:
• 32 processors
• 16 Gbytes RAM
• 16 Tbytes disk
• > 50 of today’s laptops

Impact on ROOT
• There are many areas in ROOT that can benefit from a multi-core architecture. Because the hardware is becoming available on commodity laptops, it is urgent to implement the most obvious ones as soon as possible.
• Multi-core often implies multi-threading. Several areas must be made not only thread-safe but also thread-aware.
• PROOF is an obvious candidate. By default a ROOT interactive session should run in PROOF mode. It would be nice if this could be made totally transparent to the user.
• Speed up I/O with multi-threaded I/O and read-ahead
• Buffer compression in parallel
• Minimization function in parallel
• Interactive compilation with ACLiC in parallel
• etc.
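
As an illustration of the "buffer compression in parallel" item, here is a minimal sketch using only the C++ standard library. It assumes the buffers are independent of each other, and it uses a toy run-length encoder as a stand-in for a real compressor; `rleCompress` and `compressAll` are hypothetical names and not ROOT functions.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Toy run-length encoder standing in for a real compressor (assumption:
// any pure function of a single buffer could take its place).
// Output is a sequence of (run length, byte) pairs, run length <= 255.
inline std::string rleCompress(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
        out.push_back(static_cast<char>(run));
        out.push_back(in[i]);
        i += run;
    }
    return out;
}

// Compress independent buffers concurrently, one asynchronous task per buffer.
// Results are returned in the same order as the inputs.
inline std::vector<std::string> compressAll(const std::vector<std::string>& buffers) {
    std::vector<std::future<std::string>> tasks;
    tasks.reserve(buffers.size());
    for (const std::string& b : buffers) {
        // b refers to an element of buffers, which outlives every task
        // because all futures are collected before this function returns.
        tasks.push_back(std::async(std::launch::async, [&b] { return rleCompress(b); }));
    }
    std::vector<std::string> out;
    out.reserve(tasks.size());
    for (auto& t : tasks) out.push_back(t.get());
    assert(out.size() == buffers.size());
    return out;
}
```

One task per buffer keeps the sketch short; with many small buffers a fixed pool of worker threads would likely be a better fit, since thread start-up would otherwise dominate. On some toolchains the program has to be built with thread support enabled (for example `-pthread`).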

CPU/Node hierarchy
            Laptop node             Local cluster    GRID(s)
            1 -> 32 -> ?? N cpus    1000 x N cpus    100 x 1000 nodes
latency     100 nanos               100 micros       100 millis
• Batch jobs pushed to the GRID: maximum number of jobs run in one week/month
• Interactive jobs run on the laptop and use processors on the GRID: real time important for short/medium queries
• Analysis mainly on laptop and ONE cluster on the GRID

Software Obesity
• Use local power as much as possible.
• Can we simplify software installation on the GRID?
• A proposal

Observations
• A considerable amount of time is spent installing software (up to one day for an expert).
• Porting to a new platform is non-trivial.
• Dependency problems arise when many packages must be installed.
• Only a small subset of the software is used.
• The installation may require a huge amount of disk space. Users are scared to download a new version.
• This does not fit well with the GRID concept.
• The GRID should be used to simplify this process, not to make it more complex.

LHC software
                                   Alice      Atlas      CMS         ROOT
number of lines in header files    102282     698208     104923      153775
classes total                      1815       8910       ???         1500
classes in dict                    1669       >4120      2140 835    1422
lines in dict                      479849     ???        103057      698000
classes c++ lines                  577882     1524866    277923      857390
total lines classes+dict           1057731    ???        380980      1553390
total f77 lines                    736751     928574     ???         3000
directories                        540        19522      <500        958
comp time                          25'        750'       90'         30'
lines compiled/s                   1196       50 (70)    71          863

Source of inefficiencies with Shared Libs
• -fPIC (Position Independent Code) introduces a 20 per cent degradation (10 to 30%)
• With many shared libs, the percentage of classes and code actually used is small => swapping (20%)
• Because shared libs are generated for maximum portability, one cannot use the advanced features of the local processor when compiling. The same optimization level is used everywhere
• But a very large fraction of the code does not need to be optimized: no gain at execution, big loss when compiling
• A small fraction of the code should be compiled with the highest possible optimization (10%)
• May be a factor 2 loss!

Shared Libs vs Archive Libs
• In the Fortran era, there was often one subroutine per file
• The loader takes only the subroutines really referenced. However, the percentage of referenced but unused code has increased with time.
• Shared libs were efficient at a time when code could be shared between different tasks on time-sharing systems.
• Shared libs have partially solved the link time problem.
• Shared libs are not a solution for the long term.
• Archive libs are unusable in a large system, but nice for building static modules
• What to do?

Fraction of ROOT code really used in a batch job
(chart: shared lib size in bytes)

Fraction of ROOT code really used in a job with graphics

Fraction of code really used in one program
(chart: % functions used, % classes used)

We are wasting a lot of time in writing/reading .o or .so files to/from disk
• CINT: 10000 l/s
• c++: 800 l/s
• *.cxx, *.h: 100 Mb
• *.o: 110 Mb
• *.so: 76 Mb
(diagram: *.cxx, *.h, c++, *.o, ld, *.so, myapp, memory)

Proposal for a new scenario
Introducing BOOT
A Software Bootstrap system

What is BOOT?
• A small system to facilitate the life of many users doing mainly data analysis with ROOT and their own classes (users + experiment).
• It is a very small subset of ROOT (5 to 10 per cent)
• The same idea could be extended to other domains, like simulation and reconstruction.

What is BOOT?
• A small, easy to install, standalone executable module (< 5 Mbytes)
• One click in the web browser
• It must be a stable system that can cope with old and new versions of other packages, including ROOT itself.
• It will include:
  • A subset of ROOT I/O, network and Core classes
  • A subset of Reflex
  • A subset of CINT (could also have a python flavor)
  • Possibly a GUI object browser
• From the BOOT GUI or command line, the referenced software (URL) will be automatically downloaded and locally compiled/cached in a transparent way.
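
The download-on-reference behaviour in the last bullet can be sketched as a small pull-style cache. This is only an illustration under stated assumptions: `PullCache` and its `Fetcher` callback are hypothetical names, the payload is a plain string, and the actual download and compilation steps are hidden behind the injected callback.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical pull-style cache: nothing is fetched until it is first referenced,
// and every later reference is served from the local cache.
class PullCache {
public:
    // The fetcher stands in for "download (and possibly compile) what this URL refers to".
    using Fetcher = std::function<std::string(const std::string& url)>;

    explicit PullCache(Fetcher fetch) : fetch_(std::move(fetch)) { assert(fetch_); }

    // Return the payload for url, fetching it only on the first request.
    const std::string& get(const std::string& url) {
        auto it = cache_.find(url);
        if (it == cache_.end()) {
            ++downloads_;
            it = cache_.emplace(url, fetch_(url)).first;
        }
        return it->second;
    }

    // Number of times the fetcher was actually invoked.
    std::size_t downloads() const { return downloads_; }

private:
    Fetcher fetch_;
    std::unordered_map<std::string, std::string> cache_;
    std::size_t downloads_ = 0;
};
```

Injecting the fetcher keeps the sketch independent of any network library, and lets the same cache front either a source download or the result of a local compilation.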

BOOT and existing applications
• BOOT must be able to run with the existing codes, maybe with reduced possibilities.
• The next slides show a few use cases to illustrate the ideas.
• Do not take the syntax as a final word.

BOOT: Use Case 1
• Assumes BOOT is already installed on your machine user@xxx.yyy.zzz
• Nothing else on the machine, except the compiler (no ROOT, etc.)
• Import a ROOT file containing histograms, Trees and other classes (usecase1.root)
• Browse the contents of the file
• Draw a histogram

Use Case 1
• Local machine (user@xxx.yyy.zzz):
  • usecase1.root (2 Mbytes) contains references (URL) to classes in namespace ROOT
  • Local cache with the source of the classes really used + binaries for the classes or functions that are automatically generated from the interpreter (like the ACLiC mechanism)
• Server (pcroot@cern.ch): http://root.cern.ch/coderoot is a compressed ROOT file containing:
  • the full ROOT source tree, automatically built from CVS (25 Mbytes)
  • the ROOT classes dictionary DS generated by Reflex (5 Mbytes)
  • the full classes documentation objects generated by the source parser (5 Mbytes)

Use Case 1 pictures
(screenshots: usecase1.root, code.root)

Use Case 2
• BOOT already installed
• Want to write the shortest possible program using some classes in namespace ROOT and some classes from another namespace YYYY

    //This code can be interpreted line by line,
    //executed as a script, or compiled with C/C++
    //after corresponding code generation
    use ROOT, YYYY=http://cms.cern.ch/packages/yyyy
    h = new TH1F("h","example",100,0,1);
    v = new LorentzVector(….);
    gener = new myClass(v.x());
    h.Fill(gener.Something());
    h.Draw();

Use Case 3
• A variant of Use Case 2
• A bug has been found in class LorentzVector of ROOT and fixed in the new version ROOT6

    use ROOT, YYYY=http://cms.cern.ch/packages/yyyy
    use ROOT6=http://root.cern.ch/root6/code.root
    use ROOT6::LorentzVector
    h = new TH1F("h","example",100,0,1);
    v = new LorentzVector(….);
    gener = new myClass(v.x());
    h.Fill(gener.Something());
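
One possible reading of the `use` directives in Use Cases 2 and 3 is a two-level lookup: each namespace maps to a code URL, and a per-class override redirects a single class to another namespace. The sketch below illustrates that reading only; `SourceResolver` and its methods are hypothetical, and the slides do not specify how BOOT would actually resolve names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical resolver for the "use" directives sketched in Use Cases 2 and 3.
class SourceResolver {
public:
    // "use NAME=URL": register where the code of a namespace lives.
    void useNamespace(const std::string& ns, const std::string& url) {
        assert(!ns.empty());
        namespaces_[ns] = url;
    }

    // "use NS::Class": take this one class from namespace ns instead of the default.
    void useClass(const std::string& ns, const std::string& cls) {
        assert(!cls.empty());
        overrides_[cls] = ns;
    }

    // URL of the code providing cls: a per-class override wins, otherwise the
    // default namespace is used. Returns an empty string if the namespace is unknown.
    std::string resolve(const std::string& cls, const std::string& defaultNs) const {
        auto ov = overrides_.find(cls);
        const std::string& ns = (ov != overrides_.end()) ? ov->second : defaultNs;
        auto it = namespaces_.find(ns);
        return it != namespaces_.end() ? it->second : std::string();
    }

private:
    std::map<std::string, std::string> namespaces_;  // namespace -> code URL
    std::map<std::string, std::string> overrides_;   // class -> namespace
};
```

With ROOT and ROOT6 registered and `useClass("ROOT6", "LorentzVector")` applied, `resolve("LorentzVector", "ROOT")` returns the ROOT6 URL while every other ROOT class still resolves to the default one, which is the effect Use Case 3 describes.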

Use Case 4
• High Level ROOT Selector understanding named collections in memory (ROOT, STL) or collections in ROOT files.

    use ROOT
    use ATLFAST=http://atlas.cern.ch/atlfastcode.root
    TFile f("mcrun.root");
    for each entry in f.Tree
       for each electron in Electrons
          h.Fill(electron.m_Pt);
    h.Draw
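
The nested "for each" loop above maps naturally onto range-based loops over in-memory collections. The sketch below assumes plain structs in place of the ATLFAST classes and a minimal histogram in place of TH1F; `Electron`, `Event`, `Hist1D` and `fillElectronPt` are stand-ins written for this illustration, not ROOT or ATLFAST types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Electron { double m_Pt; };                       // stand-in for the ATLFAST electron class
struct Event    { std::vector<Electron> Electrons; };   // one Tree entry with its named collection

// Minimal fixed-binning 1-D histogram (a stand-in for TH1F, not ROOT's class).
class Hist1D {
public:
    Hist1D(std::size_t nbins, double lo, double hi)
        : lo_(lo), hi_(hi), bins_(nbins, 0UL) { assert(nbins > 0 && lo < hi); }

    void Fill(double x) {
        if (x < lo_ || x >= hi_) return;                // under/overflow is ignored in this sketch
        std::size_t i = static_cast<std::size_t>((x - lo_) / (hi_ - lo_) * bins_.size());
        if (i >= bins_.size()) i = bins_.size() - 1;    // guard against rounding at the upper edge
        ++bins_[i];
    }

    unsigned long binContent(std::size_t i) const { return bins_.at(i); }

private:
    double lo_, hi_;
    std::vector<unsigned long> bins_;
};

// "for each entry in f.Tree / for each electron in Electrons / h.Fill(electron.m_Pt)"
inline void fillElectronPt(const std::vector<Event>& tree, Hist1D& h) {
    for (const Event& entry : tree)
        for (const Electron& electron : entry.Electrons)
            h.Fill(electron.m_Pt);
}
```

The point of the slide's syntax is that the user names the collection and the member and nothing else; here the same information appears as `entry.Electrons` and `electron.m_Pt`, with the iteration written out explicitly.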

Use Case 5: Event Displays
• In general, Event Displays require the full experiment infrastructure (Pacific, Obelix, WonderLand, Crocodile).
• This is complex and not good for users and OUTREACH.
• A data file with the visualization scripts is far more powerful
• This implies that the GUI must be fully scriptable. This is the case for the ROOT GUI.
(diagram: a data file holding event data in a Tree and C++ scripts)

Requirements: work to do
• libCore already has all the infrastructure for client-server communications and for accessing remote files on the GRID.
• We must understand how to use subsets of the compilers and linkers to bypass disk I/O.
• We must understand how to emulate a dynamic linker using pre-compiled objects in memory.
• We have to investigate various code generation tools and the coupling with an extended version of CINT (and possibly python).
• We must understand how to use the STL functionality without its penalty. Dynamic templates are also necessary.

Procedure
• These are just ideas. Making a firm proposal requires more investigation and prototyping.
• It must be clear that the top priority is the consolidation of ROOT to be ready for LHC data taking. This should not be an excuse for not looking forward.
• This work will continue as a background activity.

Conclusions
• After more than 10 years of intensive development, the CORE work packages are consolidated.
• Important developments in PROOF, Math, CINT, Reflex, 3-D graphics.
• All packages must be adapted to the multi-threading environment made necessary by multi-core CPUs.
• Instead of pushing gigabytes of source or shared libs to the GRID worker nodes, BOOT could greatly optimize and simplify the use of the GRID. BOOT will use a PULL technique to download, incrementally, only the software (source) necessary to run an application.
• Hoping to show a working BOOT at the next CHEP

Spare Slides

“Classic” approach
(diagram: catalog, files, storage, batch farm, queues, manager; query, data file splitting, submit, jobs running myAna.C, outputs, merging, final analysis)
• “static” use of resources
• jobs frozen, 1 job / worker node
• “manual” splitting, merging
• limited monitoring (end of single job)
G. Ganis, CHEP 06, 15 Feb 2006

The PROOF approach
(diagram: catalog, files, storage, scheduler, PROOF farm with MASTER; PROOF query: data file list, myAna.C; feedbacks, final outputs (merged))
• farm perceived as extension of local PC
• more dynamic use of resources
• real time feedback
• automated splitting and merging
G. Ganis, CHEP 06, 15 Feb 2006

Atlas packages with > 10000 lines
• 3 million lines of code
• 1200 packages
211677  dice                 fortran=211641
187691  atrecon              fortran=138126, cpp=49354
129793  MuonSpectrometer     fortran=121321, python=3715, csh=2613, sh=2136
118504  Tools                cpp=67337, ansic=19012, python=13770, sh=7373, yacc=5659, fortran=3024, lex=1971
116327  PhysicsAnalysis      cpp=107348, python=6070, sh=1649, csh=1260
115143  geant3               fortran=115040, ansic=67
112445  TileCalorimeter      cpp=108580, python=2209, csh=920, sh=736
108200  atutil               fortran=108000, ansic=164
80866   Applications         fortran=71764, cpp=6961, ansic=1865
74721   Calorimeter          cpp=65917, python=7854, sh=490, csh=460
67822   atlfast              fortran=67786
64838   Tracking             cpp=60255, python=2092, csh=1380, sh=1104
59429   Generators           fortran=28136, cpp=25538, python=4123, sh=872, csh=760
49926   graphics             java=40719, cpp=8312, python=321, sh=255, csh=220
40058   AtlasTest            cpp=25159, python=5131, sh=4815, perl=4145, csh=517
39576   Control              cpp=22030, python=15904, sh=907, csh=693
31192   DetectorDescription  ansic=29540, csh=680, sh=562, python=343
29500   TestBeam             cpp=27433, python=1491, csh=320, sh=256
25001   Reconstruction       sh=10297, fortran=7559, python=5393, csh=1667
18989   atlsim               fortran=17561, cpp=1380
18328   InnerDetector        python=11466, csh=2860, sh=2641, ansic=1343
17291   Simulation           python=13653, sh=2126, csh=1302, fortran=169
16139   Database             perl=8310, sh=4299, java=2209, csh=709, python=566
14250   Event                cpp=13522, python=296, csh=240, sh=192
12930   gcalor               fortran=12894
11955   Trigger              python=7860, csh=1780, sh=1673, perl=634
11195   LArCalorimeter       python=6133, ansic=2045, csh=1620, sh=1347

Alice packages with > 10000 lines
• 1.5 million lines of code
398742  PDF       fortran=398729, ansic=13
146414  PYTHIA6   fortran=140748, cpp=5413, ansic=153, pascal=100
128337  HLT       cpp=127601, ansic=605, sh=100, csh=31
128103  ITS       cpp=128010, sh=93
105763  MUON      cpp=105673, sh=90
94548   DPMJET    fortran=94267, cpp=281
72400   STEER     cpp=72400
52443   HBTAN     cpp=51260, fortran=1183
51489   TPC       cpp=51479, sh=10
50932   PHOS      cpp=50639, csh=293
46176   TRD       cpp=46176
41998   ISAJET    fortran=40483, cpp=1494, pascal=21
39407   RALICE    cpp=29764, ansic=9355, sh=288
35916   EMCAL     cpp=35410, fortran=383, csh=123
31820   ANALYSIS  cpp=31820
27751   HERWIG    fortran=27246, cpp=477, ansic=28
27025   FMD       cpp=27021, sh=4
26667   TOF       cpp=26667
24258   EVGEN     cpp=24258
21588   HIJING    fortran=21099, cpp=489
20562   JETAN     cpp=19687, fortran=875
18344   RAW       cpp=18344
15232   STRUCT    cpp=15232
13142   PMD       cpp=13142
12945   RICH      cpp=12945
10966   FASTSIM   cpp=10966
10944   MONITOR   cpp=10944
10659   ZDC       cpp=10659

h.Draw()
(diagram: in local mode, h.Draw() is issued from CINT; the libraries below are reached through the Plug-in Manager, marked pm)
• libCore: I/O, TSystem, …
• libHist: TH1, TH2, …
• libHistPainter: THistPainter, TPainter3DAlgorithms, …
• libGpad: TPad, TFrame, …
• libGraf: TGraph, TGaxis, TPave, …
• libX11: drawline, drawtext, …

Problem with STL Inlining
• STL containers are very nice. However, they have a very high cost in a real, large environment.
• Compiling code with STL is much slower because of inlining (STL is only in header files). The situation improves a bit with precompiled headers (e.g. in gcc 4), but not much.
• Object modules are bigger
• The compiler or linker is able to eliminate duplicate code in ONE object file or shared lib, not across libraries.
• If you have 100 shared libs, it is likely that you have the code for std::vector push_back or iterators 100 times!
• Inlining is nice if used with care (or in toy benchmarks). It may have the opposite effect, generating more cache misses in a real application.
• Templates are statically defined and difficult to use in a dynamic interactive environment.

Can we gain with a better packaging?
• Yes and no
• One shared lib per class implies more administration, more dictionaries, more dependencies.
• 80 shared libs for ROOT is already a lot. 500 would be nonsense
• A CORE library is essential. However, some developers do not like this and penalize/complicate the life of the vast majority of users.
• Plug-in Manager helps
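
The way a plug-in manager helps with packaging is by deferring the cost of a library until something in it is actually requested. The sketch below illustrates that idea with the standard library only; it is not ROOT's TPluginManager API, and `PluginManager`, `Painter`, `HistPainter`, `addHandler` and `find` are hypothetical names. The factory callback stands in for "load the shared library and construct the object".

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class for drawing back-ends.
struct Painter {
    virtual ~Painter() = default;
    virtual std::string name() const = 0;
};

// Hypothetical concrete plug-in, named after the painter shown on the h.Draw() slide.
struct HistPainter : Painter {
    std::string name() const override { return "THistPainter"; }
};

// Handlers are registered cheaply by key; the (potentially expensive) factory
// runs only the first time the plug-in is requested.
class PluginManager {
public:
    using Factory = std::function<std::unique_ptr<Painter>()>;

    void addHandler(const std::string& key, Factory f) {
        assert(f);
        factories_[key] = std::move(f);
    }

    // Returns nullptr when no handler is registered for key.
    Painter* find(const std::string& key) {
        auto loaded = loaded_.find(key);
        if (loaded != loaded_.end()) return loaded->second.get();
        auto f = factories_.find(key);
        if (f == factories_.end()) return nullptr;
        ++loads_;
        auto inserted = loaded_.emplace(key, f->second());
        return inserted.first->second.get();
    }

    // Number of plug-ins actually instantiated so far.
    int loads() const { return loads_; }

private:
    std::map<std::string, Factory> factories_;
    std::map<std::string, std::unique_ptr<Painter>> loaded_;
    int loads_ = 0;
};
```

A batch job that never draws anything would register the handler but never call `find`, so in this model the painter code is never loaded, which matches the observation that only a small fraction of the code is used in any one program.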