8a62afff31e591342278902148ba7ecc.ppt
- Number of slides: 49
How to benchmark an HEP worker node
Evaluation of HEP worker nodes
Michele Michelotto (at pd.infn.it), INFN Padova
Workshop CCR 08, LNGS
Computing model
[Figure: tiered grid computing model. CERN is Tier 0; Tier-1 centres are labelled by country and lab (USA FNAL, USA BNL, UK, France, Italy, Germany, Japan); below them are Tier-2 sites, Tier-3 physics departments, labs (Lab a, b, c, m), universities (Uni a, b, n, x) and desktops. Two overlays are marked: a grid for a regional group and a grid for a physics study group.]
Storage easier
• Tape storage:
– Very easy: event size x number of events, in Petabytes
• Disk storage:
– Easy again: event size, in Terabytes
– Gibibyte vs Gigabyte
• 1 TB = 1 TeraByte = 1000^4 bytes
• 1 TiB = 1 TebiByte = 1024^4 bytes
– Raw Terabytes or RAID protected (RAID 5, RAID 6)?
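The difference between the decimal and binary units above can be checked with a few lines of Python. This is a minimal sketch; the names `TB`, `TIB` and `tb_to_tib` are illustrative and do not come from the slides.

```python
# Decimal (SI) versus binary (IEC) storage units, expressed in bytes.
TB = 1000 ** 4   # 1 terabyte
TIB = 1024 ** 4  # 1 tebibyte


def tb_to_tib(terabytes: float) -> float:
    """Convert a capacity quoted in TB into TiB."""
    return terabytes * TB / TIB


if __name__ == "__main__":
    # A disk sold as "1 TB" is roughly 0.909 TiB.
    print(f"1 TB = {tb_to_tib(1):.3f} TiB")
```

The gap is about 9% at the tera scale, which is why a storage pledge should state which unit it uses.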
But computing power?
• Tricky: events/sec? Which events? Sim or Reco?
• We need some kind of benchmark
– MIPS (meaningless instructions per second)
– VUPS (VAX units per second)
– CERN Unit
– MHz
– SPEC
– SI2K...
T1 + T2 CPU budget - LHC
SI2K frozen
• SI2K is the benchmark used up to now to measure the computing power of all the HEP experiments
– Computing power requested by experiments from the TDR
– Computing power provided by a Tier-[0, 1, 2]
• SI2K is the nickname for the SPEC CPU Int 2000 benchmark
– Came after SPEC 89, SPEC Int 92 and SPEC Int 95
– Declared obsolete by SPEC in 2006
– Replaced by SPEC with CPU Int 2006
Transition problem
• Impossible to find published SPEC Int 2000 results for the new processors (e.g. the not-so-new 4-core Clovertown)
• Impossible to find published SPEC Int 2006 results for old processors (before 2006)
– E.g. old P4 Xeon, P4, AMD 2xx
• You can't convert from SI2000 to SI2006
– but the ratio for the x86 architecture is in the 137 to 172 range
The SI2K inflation
• The main problem with SI2000 in our community: it is no longer proportional to the performance of HEP codes (as it once was)
• You can buy processors with a huge SI2K number but with a smaller increase in real performance
Nominal SI vs real SI
• CERN (and FZK) started to use a new currency: SI2K measured with gcc, the GNU C compiler, using two flavours of optimization
– FZK: high tuning: gcc -O3 -funroll-loops -march=$ARCH
– CERN: low tuning: gcc -O2 -fPIC -pthread
FZK Measurement
• In 2001 SPEC with gcc was 80% of the average published data
• In 2006 the gap was much wider
Nominal SI vs real SI
• For tenders, FZK uses SI2K with FZK tuning (gcc-high) and adds 25% to "normalize" to year 2001
• CERN and FZK proposal to WLCG: use SI2K with CERN tuning (gcc-low) and add 50% to normalize
• Run n copies in parallel
– Where n is the number of cores in the worker node
– To take into account the drop in performance of a multicore machine when fully loaded
Too many SI2K
• Take as an example a worker node with two dual-core Intel Woodcrest 5160 at 3.06 GHz
• SI2K nominal: 2929 – 3089 (min – max)
• SI2K sum on 4 cores: 11716 - 12536
• SI2K gcc-low: 5523
• SI2K gcc-high: 7034
• SI2K gcc-low + 50%: 8284
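To see how far apart these ratings are, the sketch below recomputes the "gcc-low + 50%" figure and expresses each gcc-based rating as a fraction of the lower nominal sum (4 x 2929 = 11716). The variable and function names are illustrative, not from the slides.

```python
# SI2K figures quoted for the 2 x Woodcrest 5160 worker node (4 cores).
NOMINAL_SUM_MIN = 11716  # lower published per-core value (2929) summed over 4 cores
GCC_LOW = 5523           # measured with CERN (low) tuning
GCC_HIGH = 7034          # measured with FZK (high) tuning


def with_uplift(score: float, percent: float) -> float:
    """Apply a percentage normalization uplift to a measured score."""
    return score * (1 + percent / 100)


gcc_low_plus_50 = with_uplift(GCC_LOW, 50)

if __name__ == "__main__":
    ratings = [
        ("gcc-low", GCC_LOW),
        ("gcc-high", GCC_HIGH),
        ("gcc-low + 50%", gcc_low_plus_50),
    ]
    for label, value in ratings:
        share = value / NOMINAL_SUM_MIN
        print(f"{label:14s} {value:8.1f}  {share:.0%} of nominal sum")
```

The three ratings come out at about 47%, 60% and 71% of the nominal sum, so the same node can be quoted with very different SI2K values depending on the convention.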
Wrong way
• Old way:
– Take the SI2000 measurement you prefer from SPEC (or an average) and multiply by the number of cores in your farm
• Other variations:
– Take SI2000 with gcc on one core and multiply by the number of cores
– Take SI2000 rate
WLCG SI2K How-to
• Run SI2000 with gcc 3, 32 bit, with CERN flags
– gcc -O2 -fPIC -pthread -m32
• Run N copies of this SI2000 in parallel, where N is the number of cores
• Sum all the results
• Add 50%
– This is the SI2K of one machine
• Sum over all the machines
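The bookkeeping part of this recipe (sum the parallel copies, add 50%, sum over machines) can be sketched as follows. The function names and the per-copy scores are hypothetical placeholders; running the SPEC suite itself is outside the scope of the sketch.

```python
from typing import Iterable, Sequence

WLCG_UPLIFT = 1.5  # the "add 50%" step of the recipe


def machine_si2k(per_copy_scores: Sequence[float]) -> float:
    """WLCG SI2K of one machine: sum of the N parallel copies, plus 50%."""
    return sum(per_copy_scores) * WLCG_UPLIFT


def farm_si2k(machines: Iterable[Sequence[float]]) -> float:
    """WLCG SI2K of a farm: sum of the per-machine ratings."""
    return sum(machine_si2k(scores) for scores in machines)


if __name__ == "__main__":
    # Hypothetical 4-core node: placeholder per-copy scores, not measurements.
    node = [1000.0, 1000.0, 1000.0, 1000.0]
    print(machine_si2k(node))       # 6000.0
    print(farm_si2k([node, node]))  # 12000.0
```

Each entry in `per_copy_scores` is meant to be the result of one of the N simultaneous runs, so the multicore slowdown is already included in the inputs.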
Exercise
• Compute the WLCG official rating of a farm with 224 Dell Blade M1000e 2 x 5420
– Number of cores/server: 8
– SI2K gcc-low: 10218
– 10218 * 224 ≈ 2289000
– Total SI2K: 2289 kSI2K
– + 50%:
– Total WLCG SI2K: 3433 kSI2K
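The arithmetic of the exercise can be reproduced directly. This is a small sketch using only the numbers quoted on the slide; the constant names are illustrative.

```python
SERVERS = 224
SI2K_GCC_LOW_PER_SERVER = 10218  # 8 parallel copies, already summed
WLCG_UPLIFT = 1.5                # add 50%

total_si2k = SERVERS * SI2K_GCC_LOW_PER_SERVER
wlcg_si2k = total_si2k * WLCG_UPLIFT

if __name__ == "__main__":
    print(f"Total SI2K:      {total_si2k / 1000:.0f} kSI2K")  # 2289 kSI2K
    print(f"Total WLCG SI2K: {wlcg_si2k / 1000:.0f} kSI2K")   # 3433 kSI2K
```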
What a mess
• SI2K is easy to measure but is not maintained any more
• How can you ask a vendor to measure SI2K if they can't buy it?
• Is SI2006 the right substitute?
• Or SI2006 rate?
• Or SPEC FP 2006?
CMS sw SIM and Pythia
• CMS Monte Carlo simulation (32 bit) and Pythia (64 bit) show the same performance once normalized
• Both published SPECint 2006 and SPECint 2006 with gcc show the same behaviour
• Published SI2K does not match HEP software
• SI2K with CERN tuning is better, but not as good as SI2006
Babar Tier A Results
• If you normalize by core and clock, all new processors have the same performance
• Doubling the older generation CPU
• SI2006 matches this pattern (ratio of published to gcc is constant)
• SI2000-gcc is better than nominal SI2K
• SI2000 clearly doesn't work
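Normalizing "by core and clock" means dividing a machine's score by the number of cores times the clock frequency. The sketch below applies that to the two SI2K gcc-low figures quoted elsewhere in this talk rather than to the Babar measurements, which are not given in the text. The 2.5 GHz clock for the 2 x 5420 server is an assumption that does not appear on the slides, and the function name is illustrative.

```python
def per_core_per_ghz(score: float, cores: int, clock_ghz: float) -> float:
    """Normalize a whole-machine score by core count and clock frequency."""
    return score / (cores * clock_ghz)


# (label, SI2K gcc-low, cores, clock in GHz)
MACHINES = [
    ("2 x Woodcrest 5160", 5523, 4, 3.06),  # figures from the Woodcrest example
    ("2 x 5420 blade", 10218, 8, 2.5),      # 2.5 GHz clock is an assumption
]

if __name__ == "__main__":
    for label, score, cores, clock in MACHINES:
        value = per_core_per_ghz(score, cores, clock)
        print(f"{label:20s} {value:6.1f} SI2K per core per GHz")
```

The same function can be fed event rates from an experiment application instead of SI2K to reproduce the comparison described on the slide.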
Atlas
• Here 100% is Xeon 5160
• Few results for SI2006+gcc, but no difference from CMS and Babar
• Few results also from published SI2006, because of several old architectures
• SI2K+gcc is not bad
• Published SI2K heavily overestimates the new Xeon
• Atlas simulation, once normalized, performs the same on the new Intel "Core" or AMD "Opteron" (like CMS, Babar)
Many gaps
• Easy to find published SPEC results
– But only for new machines
• Difficult to measure:
– Not easy to get a machine on loan from a server reseller or manufacturer
– Not easy to borrow a machine from colleagues
– Always for short periods of time
– A SPEC run can last 15-20 hours
• Need a set of dedicated worker nodes to make SPEC and HEP application measurements
HEPiX group
• A group with people from the major labs (CERN, FZK, DESY, RAL, INFN, JLAB, TRIUMF), formed after an IHEPCCC request
• And people appointed by the experiments (CMS, ATLAS, ALICE, LHCb)
• Several machines
– lxbench cluster at CERN
– Harpertown and Barcelona at INFN PD
– Harpertown at DESY
Measure
• SI2000 with gcc 3, 32 bit, CERN tuning, parallel
• SI2006 with gcc 3, 32 bit, CERN tuning, parallel
• SFP2006 with gcc 4, 32 bit, CERN tuning, parallel
– Because SPEC FP doesn't compile with gcc 3
• For each experiment
– GEN, SIM, DIGI, RECO on the same set of events
SPEC rate vs parallel
All The Machines
• lxbench01 – 2 x Nocona 2.8 GHz/1 MB, 2 x 1 GB
• lxbench02 – 2 x Irwindale 2.8 GHz/2 MB, 4 x 1 GB DDR 333
• lxb6106 – 2 x Irwindale 2.8 GHz/2 MB, 2 x 1 GB DDR 333
• lxb7006 – 2 x Irwindale 2.8 GHz/2 MB, 2 x 1 GB DDR-II 400
• lxbench03 – 2 x Opteron 275 2.2 GHz/2 MB, 4 x 1 GB DDR-II 400
• lxbench04 – 2 x Woodcrest 2.66 GHz/4 MB, 8 x 1 GB DDR-II 533
• lxb7609 – 2 x Woodcrest 3.00 GHz/4 MB, 4 x 2 GB DDR-II 667
All The Machines, cont.
• lxbench05 – 2 x Woodcrest 3.00 GHz/4 MB, 8 x 1 GB DDR-II 533
• lxbench06 – 2 x Opteron 2218 rev. F 2.6 GHz/2 MB, 8 x 1 GB DDR-II 667
• lxbench07 – 2 x Clovertown 2.33 GHz/2 x 4 MB, 8 x 2 GB DDR-II 667
• lxbench08 – 2 x Harpertown E5410 2.33 GHz/2 x 4 MB, 8 x 2 GB DDR-II 667
• lxcmssrv07 – 2 x Harpertown E5410 2.33 GHz/2 x 4 MB, 16 GB
• lxcmssrv08 – 2 x Opteron Barcelona 2352 2.10 GHz/4 x 512 kB + 2 x 2 MB, 16 GB
• Desy – 2 x Harpertown E5440 2.83 GHz/2 x 4 MB, 16 GB
SPECint2000 Results
[Figure: bar chart of SPECint2000 per machine, vertical axis from 0 to 12000, for lxbench01, lxbench02, lxb6106, lxb7006, lxbench03, lxbench04, lxb7609, lxbench05, lxbench06, lxbench07 and lxbench08.]
SPEC 2000 vs. SPEC 2006
[Figure: scatter plot with SPEC 2000 on the horizontal axis (0 to 12000) and SPEC 2006 on the vertical axis (0 to 70). Series: SPEC 2006 int 32 and SPEC 2006 fp 32, each with a linear fit.]
ATLAS v12: ATLAS vs. SPEC
[Figure: per-machine comparison for lxbench01 to lxbench07, vertical axis from 0 to 8. Series: ATLAS Generation, ATLAS Simulation, ATLAS Digitization, ATLAS Reconstruction, ATLAS Total, SPECint2000, SPECint2006, SPECfp2006.]
LHCb GEN+SIM
• 4 hours per run
• Min bias p-p events, GEN+SIM
LHCb - Reconstruction
• 20 minutes for each run
• Min bias digitized events as input
Alice pp
Alice PbPb
CMS RECO bench 01
CMS RECO bench 04
Alice results (preliminary)
Experiment results versus benchmark (source: IHEPCCC/HEPiX Benchmarking WG)

Test                        SPECint2000  SPECint2006  SPECfp2006
pp MinBias
  GEN+SIM                   0.974        0.981        0.980
  DIGI                      0.949        0.959        0.979
  RECO                      0.956        0.966        0.989
  TOTAL(SUM)                0.965        0.974        0.983
PbPb per 2, 8.6 - 11.2 fm
  GEN+SIM                   0.976        0.983        0.982
  DIGI                      0.754        0.752        0.682
  RECO                      0.942        0.949        0.943
  TOTAL(SUM)                0.976        0.983
CMS results (preliminary) (1)
Experiment results versus benchmark

Test                        SPECint2000  SPECint2006  SPECfp2006
HiggsZZ4LM190
  GEN+SIM                   0.983        0.988        0.986
  DIGI                      0.971        0.977        0.974
  RECO                      0.979        0.985        0.983
  TOTAL(SUM)                0.982        0.988        0.986
MinBias
  GEN+SIM                   0.982        0.988        0.986
  DIGI                      0.972        0.978        0.973
  RECO                      0.970        0.976        0.970
  TOTAL(SUM)                0.981        0.987        0.984
CMS results (preliminary) (2)
QCD_80_120 0.986 0.984
DIGI 0.973 0.980 0.976
RECO 0.975 0.981 0.977
TOTAL(SUM)
SingleElectronE1000
GEN+SIM 0.980 0.986 0.983
GEN+SIM 0.983 0.989 0.988
DIGI 0.970 0.976 0.974
RECO 0.962 0.968 0.960
TOTAL(SUM) 0.983 0.989 0.987
CMS results (preliminary) (3)
QCD_80_120 0.986 0.984
DIGI 0.973 0.980 0.976
RECO 0.975 0.981 0.977
TOTAL(SUM)
SingleElectronE1000
GEN+SIM 0.980 0.986 0.983
GEN+SIM 0.983 0.989 0.988
DIGI 0.970 0.976 0.974
RECO 0.962 0.968 0.960
TOTAL(SUM) 0.983 0.989 0.987
Conclusion
• Waiting for an official decision on the next benchmark:
– SI2006 (gcc, parallel): PROBABLY
– SI2006 rate (gcc):
– SI2006 (published): easier to do, but risk of future divergence
– C++ subset of SI2006 (best fit, but risk of future divergence)
– SI2000 (gcc): obsolete
• TODAY: you are supposed to use the old SI2K (gcc-low) + 50%
Nehalem
IMC + cache
• Integrated memory controller
• 2 threads per core

Latency (cycles)        L1   L2   L3
Nehalem 2.66 GHz        4    11   39
Penryn Q9460 2.66 GHz   3    15   N/A
2009: Nehalem vs Shanghai
• 45 nm
• 700 M transistors
• Same architecture
• AMD uses 2 MB of L2 + 6 MB of L3, for 8 MB total
• Intel uses 1 MB of L2 + 8 MB of L3, for 9 MB total
• Time to market? Clock? Price?