Hepmark: evaluating the power of computing nodes in HEP
Michele Michelotto
Padova, Ferrara, Bologna
CdS - July 2008 - michele michelotto - INFN PD

Computing model
[Diagram: centres of various levels. Recoverable labels: CERN Tier 0; Tier 1 (USA FNAL, USA BNL, UK, France, Italy, Japan, Germany); Tier 2 (labs and universities); Tier 3 physics department; desktop; grid for a regional group; grid for a physics study group.]

Computing requirements
• Tape and disk storage
  – Very easy: events → Terabyte
• Disk storage
  – Easy again: events → Terabyte
  – (1000 x 1000 or 1024 x 1024?)
  – RAID-protected or raw size?
• Computing power
  – Tricky: events/sec? Sim or reco?
  – MIPS, CERN Unit, MHz, SPEC, SI2K...

• SI2K is the benchmark used up to now to measure the computing power of all the HEP experiments
  – Computing power requested by an experiment
  – Computing power provided by a Tier-[0, 1, 2]
• SI2K is the nickname for the SPEC CPU Int 2000 benchmark
  – Came after SPEC89, SPECint92 and SPECint95
  – Declared obsolete by SPEC in 2006
  – Replaced by SPEC with CPU Int 2006

T1 + T2 CPU budget - LHC
[Figure; labels: SI2K, measured in k€]

The SI2K inflation
• The main problem with SI2000 in our community: it is no longer proportional to the performance of HEP codes (as it once was)
• You can buy processors with a huge SI2K number but a smaller increase in real performance
• SI2K results for the latest generation of processors are affected by inflation

Nominal SI vs real SI
• So CERN (and FZK) started to use a new currency: SI2K measured with gcc, the GNU C compiler, using two flavours of optimization
  – High tuning: gcc -O3 -funroll-loops -march=$ARCH
  – Low tuning: gcc -O2 -fPIC -pthread

• CERN proposal: use as site rating the "Real SI", i.e. SI2K measured with gcc-low and increased by 50%
  – Actually this makes sense only for a short period of time and for the latest generation of processors
• Run n copies in parallel
  – Where n is the number of cores in the worker node
  – To take into account the drop in performance of a multicore machine when fully loaded

Too many SI2K: an example
• Take as an example a worker node with two Intel Woodcrest dual-core 5160 at 3.06 GHz
• SI2K nominal: 2929 – 3089 (min – max)
• SI2K sum on 4 cores: 11716 – 12536
• SI2K gcc-low: 5523
• SI2K gcc-high: 7034
• SI2K gcc-low + 50%: 8284
• The goal is to find a "commercially maintained" benchmark to replace SI2K
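The "gcc-low + 50%" figure follows directly from the CERN proposal. A minimal sketch of the arithmetic, assuming the result is truncated to an integer (the quoted 8284 is consistent with truncating 8284.5); the function name `real_si` is illustrative and not from the talk:

```python
def real_si(gcc_low_si2k: float, uplift: float = 0.50) -> int:
    """Return the 'Real SI' rating: SI2K measured with gcc-low, raised by `uplift`.

    Truncating to an integer is an assumption chosen to match the quoted figures.
    """
    return int(gcc_low_si2k * (1.0 + uplift))


# Figures quoted for the dual Woodcrest 5160 worker node (32-bit gcc-low run).
print("Real SI:", real_si(5523))

# Nominal per-core SI2K (min and max) summed over the 4 cores of the node.
print("Nominal sum on 4 cores:", 4 * 2929, "to", 4 * 3089)
```

Note that 4 x 3089 gives 12356, while the quoted upper sum is 12536; two digits may have been transposed in the original slide.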

• Cache: importance of the cache architecture
  – 1st level, 2nd level, 3rd level, cache latency (time), cache bandwidth (transfer speed), shared or exclusive?
• Access time to memory
• Power consumption
  – Example: a big Tier 2 with 500 boxes needs 100 kW
  – About 800 MWh in one year
  – Energy cost 0.12 Euro per kWh: energy bills of 100 kEuro/year
  – A 10% improvement in power efficiency means 10 k€/year savings
  – And savings on the infrastructure (power distribution, UPS, cooling)
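The power figures can be checked with a short calculation. This is a sketch that assumes the 100 kW load is constant around the clock for a 365-day year; the function name is illustrative:

```python
HOURS_PER_YEAR = 24 * 365  # assumes continuous operation, non-leap year


def yearly_energy_cost(power_kw: float, price_eur_per_kwh: float):
    """Return (energy in MWh, cost in kEuro) for one year at constant load."""
    energy_kwh = power_kw * HOURS_PER_YEAR
    energy_mwh = energy_kwh / 1000.0
    cost_keur = energy_kwh * price_eur_per_kwh / 1000.0
    return energy_mwh, cost_keur


energy, cost = yearly_energy_cost(power_kw=100, price_eur_per_kwh=0.12)
print(f"{energy:.0f} MWh/year, {cost:.1f} kEuro/year")
print(f"10% efficiency gain saves about {0.10 * cost:.1f} kEuro/year")
```

Under these assumptions the result is roughly 876 MWh and 105 kEuro per year, which the slide rounds to 800 MWh and 100 kEuro.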

Many gaps
• Difficult to measure:
  – Not easy to get machines on loan from server resellers or manufacturers
  – Not easy to borrow machines from colleagues
  – Always for short periods of time
  – A SPEC run can last 15-20 hours
• Need a set of dedicated worker nodes for SPEC and HEP application measurements

People (FTE)
• Padova: Michele Michelotto (1° Tecn.) 0.70, Matteo Menguzzato (Univ) 0.40
• Ferrara: Alberto Gianoli (1° Tecn.) 0.20
• Bologna: Franco Brasolin (CTER) 0.20
• Total FTE: 1.5
Milestones
• 2009: Understand SPEC 2006. Propose a new benchmark to replace SI2K. Measure the performance of the current architectures for Monte Carlo SIM (evt/sec vs SPEC)
• 2009/2010: Power performance, cache profiling
Funding table (layout lost in extraction)
• Column labels: int, estero (abroad), consumo (consumables), inventario (inventory), TOTALI (totals)
• Row labels: FE, PD, Totali, 2010
• Values as extracted: 1.00 2.00 3.00 | 1.00 16.00 22.00 | 2.00 3.00 16.00 23.00 | 1.00 3.00 2.00 16.00 22.00

Mem Intel vs AMD
• Who is faster?
• It depends on the block size
• In the red zones Intel is better
• In the green zone AMD is better

Cache behaviour
• 54xx has lower latency even with a bigger cache
• The 3 processors behave very differently in the range between 4 MB and 64 MB
• If your (HEP) application works in this range, you will see a big change in performance when changing processor

CMS software: SIM and Pythia
• CMS Monte Carlo simulation (32 bit) and Pythia (64 bit) show the same performance once normalized
• Both SPECint2006 published and SPECint2006 with gcc show the same behaviour
• SI2K published does not match HEP software
• SI2K CERN is better, but not as good as SI2006

BaBar Tier A results
• If you normalize by core and clock, all new processors have the same performance
• Double that of the older-generation CPUs
• SI2006 matches this pattern (the ratio of published to gcc results is constant)
• SI2000-CERN is better than SI2K nominal
• SI2000 clearly doesn't work
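Normalizing "by core and clock" can be read as dividing the measured throughput by the number of cores and the clock frequency. A minimal sketch under that reading; the function name and the two machines are hypothetical placeholders, not measurements from the talk:

```python
def normalized_throughput(events_per_sec: float, cores: int, clock_ghz: float) -> float:
    """Return events per second per core per GHz."""
    if cores <= 0 or clock_ghz <= 0:
        raise ValueError("cores and clock_ghz must be positive")
    return events_per_sec / (cores * clock_ghz)


# Hypothetical machines with made-up numbers, for illustration only.
older = normalized_throughput(events_per_sec=6.0, cores=2, clock_ghz=3.0)
newer = normalized_throughput(events_per_sec=16.0, cores=4, clock_ghz=2.0)

print("newer / older:", newer / older)
```

With such a normalization, processors of the same generation should land close to each other, and a generation change shows up as a step in the normalized value.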

4-core processor
[Figure]

Intel 54xx
[Figure]

AMD 4-core
[Figure]

Transactional load (comparison between processors)
• Performance does not drop on the new 4-core processors
• Clovertown drops with respect to Harpertown
• A dual-core processor holds up only up to load 3

Perf/watt
• AMD Barcelona at 65 nm: performance per watt similar to Intel Xeon at 45 nm


Memory Intel vs AMD
• Access time very similar
• At 1 GB (the typical footprint of a HEP application) the new AMD behaves better
• But the new Xeon 54xx are much better than the 53xx


Cache behaviour
• We need to study the behaviour of typical HEP applications
  – Simulation, event generation, reconstruction, analysis
  – To understand how to write more efficient applications

Power issues
• Power consumption changes from one processor to another
  – Clock, high-k dielectric, active power management, clock throttling

An HEP data center
• Need to measure power usage for HEP applications
• Example: a big Tier 2 with 500 boxes needs 100 kW
  – Like the whole computing centre (CED) of INFN Padova
  – About 800 MWh in one year
  – Energy cost 0.12 Euro per kWh: energy bills of 100 kEuro/year
  – A 10% improvement in power efficiency means 10 k€/year savings
  – And savings on the infrastructure (power distribution, UPS, cooling)

Financial request
• Need to buy a new worker node each time a new processor is released in the dual-processor market segment
  – Only if significantly new features are present
  – One or two each for Intel and AMD per year
  – 4 kEuro each (dual proc, 2 GB/core, 1 disk)
  – 2 boxes to start with

Transition problem
• Impossible to find published SPEC Int 2000 results for the new processors (e.g. the not-so-new 4-core Clovertown)
• Impossible to find published SPEC Int 2006 results for old processors (before 2006)
  – E.g. old P4 Xeon, P4, AMD 2xx
• You can't convert from SI2000 to SI2006, but the ratio for the x86 architecture is in the 137 – 172 range
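Although no exact conversion exists, the quoted ratio range still brackets an estimate. The sketch below assumes the ratio is SI2000 divided by SI2006 (the slide does not state the direction) and returns only a lower and upper bound; the names are illustrative:

```python
# Assumed direction: ratio = SI2000 / SI2006, for x86 only (range quoted in the talk).
RATIO_MIN = 137.0
RATIO_MAX = 172.0


def si2006_bounds(si2000: float):
    """Return (low, high) bounds on SI2006 implied by the quoted x86 ratio range."""
    return si2000 / RATIO_MAX, si2000 / RATIO_MIN


low, high = si2006_bounds(3000.0)
print(f"SI2000 = 3000 corresponds to SI2006 between {low:.1f} and {high:.1f}")
```

The width of the interval is a reminder that a single conversion factor would be misleading.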

Even more
• Actually all the gcc results in the previous slide are on i386 (32 bit)
• If you would like to know how your code runs on a 64-bit machine, you can measure SPECint2000 with gcc on x86_64
• So for the worker node with two Intel Woodcrest dual-core 5160 at 3.06 GHz:
• SI2K nominal: 2929 – 3089 (min – max)
• SI2K on 4 cores: 11716 – 12536
• SI2K gcc-low: 6021
• SI2K gcc-high: 6409
• SI2K gcc-low + 50%: 9031

ATLAS
• Here 100% is Xeon 5160
• Few results for SI2006+gcc, but no difference from CMS and BaBar
• Few results also for SI2006 published, because of several old architectures
• SI2K+gcc is not bad
• SI2K published heavily overestimates the new Xeon
• ATLAS simulation, once normalized, performs the same on the new Intel "Core" or AMD "Opteron" (like CMS, BaBar)

Power consumption
[Figure]

Power meter
• Need a device to measure voltage and current
• With logging capabilities
• E.g. Fluke 1735

FZK measurement
• In 2001 SPEC with gcc was 80% of the average published data
• In 2006 the gap was much wider

Which is better?
• I started to measure the performance of HEP codes on several machines
• The goal was to find a "commercially maintained" benchmark to replace SI2K
• I compared HEP code with
  – SI2K published results
  – SI2K measured with gcc and "CERN" tuning
  – SI2006 and SI2006 rate published results
  – SI2006 and SI2006 with gcc 4 (32 and 64 bit)
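One simple way to judge how well a benchmark tracks HEP code is to check whether the ratio of application throughput to benchmark score stays constant across machines. The sketch below is an illustration of that idea, not the method used in the talk; the function name and all numbers are hypothetical placeholders:

```python
def tracking_spread(app_rates: dict, bench_scores: dict) -> float:
    """Return max/min of (application rate / benchmark score) across machines.

    A value of 1.0 means the benchmark is exactly proportional to the
    application; larger values mean it over- or under-rates some machines.
    """
    ratios = [app_rates[name] / bench_scores[name] for name in app_rates]
    return max(ratios) / min(ratios)


# Made-up events/sec and benchmark scores for two hypothetical machines.
app = {"machine_a": 10.0, "machine_b": 20.0}
proportional_bench = {"machine_a": 5.0, "machine_b": 10.0}
inflated_bench = {"machine_a": 5.0, "machine_b": 20.0}

print("proportional benchmark spread:", tracking_spread(app, proportional_bench))
print("inflated benchmark spread:", tracking_spread(app, inflated_bench))
```

A candidate replacement for SI2K would be one whose spread stays close to 1.0 over the machines and HEP applications of interest.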

Cache
• In the 80s the latency was 3-10 clock cycles
• Now latency is thousands of clock cycles
• Importance of the cache architecture
  – 1st level, 2nd level, 3rd level
  – Cache latency (time)
  – Cache bandwidth (transfer speed)
  – Shared or exclusive?