Extending PAPI to Multiple Measurement Domains · Presented by Jack Dongarra, Kevin London, Shirley Moore, Philip Mucci, Daniel Terpstra, and Haihang You · University of Tennessee and Oak Ridge National Laboratory
Motivation · Increasing CPU speeds and densities place greater importance on: - Thermal health and management - Power consumption · Higher processor counts make communication metrics more critical: - Bandwidth - Latency - Dropped packets - Bytes transferred · No industry-standard interfaces exist for measuring these metrics. · Hybrid machines require simultaneous access to multiple processor counter substrates.
PAPI 3.0 Design · [Layer diagram] · Portable Layer: PAPI High Level, PAPI Low Level, Hardware Independent Layer · Machine Specific Layer: PAPI Machine Dependent Substrate, Kernel Extension, Operating System, Hardware Performance Counters
PAPI 4.0 Multiple Substrate Design · [Layer diagram] · Portable Layer: PAPI High Level, PAPI Low Level, Hardware Independent Layer · Machine Specific Layer: PAPI Machine Dependent Substrates (PAPI CPU Dependent Substrate, PAPI Network Dependent Substrate), Kernel Extensions, Operating System, Hardware Performance Counters, Off-Processor Hardware Counters
Multiple Measurements · HPCC HPL benchmark on Opteron with 3 performance metrics: - FLOPS, temperature, network sends/receives · Temperature is read from an on-chip thermal diode.
For More Information · http://icl.cs.utk.edu/papi/ - Software and documentation - Reference materials - Papers and presentations - Third-party tools - Mailing lists · Team members: - Jack Dongarra, Kevin London, Shirley Moore, Philip Mucci, Daniel Terpstra, Haihang You
Correlating Temperature and PAPI Events · Can multi-substrate PAPI be used to correlate temperature with PAPI presets? · Measure temperature and all 42 PAPI presets on an Opteron cluster across the HPCC suite. · Statistically examine the results for correlations using cluster analysis and principal component analysis.
Dendrogram of Temperature and PAPI Events · Cluster analysis shows 8 PAPI preset events that behave similarly to temperature. · Half are L2 cache related. · Also: - Resource stalls - Hardware interrupts - TLB misses - Total cycles · [Dendrogram leaf labels: ACPI_TEMP, PAPI_TOT_CYC, PAPI_TLB_TL, PAPI_HW_INT, PAPI_RES_STL, PAPI_L2_STM, PAPI_L2_TCM, PAPI_L2_DCR]
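The grouping step behind a dendrogram like this one can be sketched in a few lines. The slides do not state which distance measure or linkage rule was used, so this sketch assumes a distance of 1 - r (Pearson correlation) and single linkage. The series names and sample values are made up for illustration and are not measurements from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(vx * vy)


def cluster(series, threshold):
    """Single-linkage agglomerative clustering on distance 1 - r.

    Assumption: the slides do not name the distance or linkage used.
    Merging stops once the closest pair of clusters is farther apart
    than `threshold`.
    """
    clusters = [{name} for name in series]

    def distance(c1, c2):
        return min(1.0 - pearson(series[a], series[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        d, i, j = min(
            (distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical samples: one temperature trace and three counter traces.
series = {
    "TEMP": [40.0, 41.0, 43.0, 44.0, 46.0],
    "EVENT_A": [10, 12, 16, 18, 22],  # rises with temperature
    "EVENT_B": [9, 8, 6, 5, 3],       # falls as temperature rises
    "EVENT_C": [5, 1, 4, 1, 5],       # no clear relation
}
clusters = cluster(series, threshold=0.2)
temp_cluster = next(c for c in clusters if "TEMP" in c)
print(sorted(temp_cluster))
```

With this distance, only events that rise and fall together with temperature join its cluster. Using 1 - |r| instead would also pull in inversely related events.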
Normalized Graph of Clustered Events
Principal Component Analysis · Simplifies a dataset by transforming it to a new coordinate system. · The first principal component captures the greatest variance. · In this example, the first two components contain the bulk of the temperature variance.
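To make the coordinate transformation concrete, here is a minimal sketch that finds the first principal component of a small dataset by power iteration on its covariance matrix. The slides do not say which tool or algorithm the authors used, so power iteration is an assumption made to keep the example dependency-free. The two-column sample data is invented.

```python
import math


def covariance(rows):
    """Sample covariance matrix of rows (one observation per row)."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iterations=200):
    """Dominant eigenvector and eigenvalue of cov via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient gives the variance along the component.
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return v, eigenvalue


# Hypothetical data: the second column is roughly twice the first.
rows = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]]
cov = covariance(rows)
component, variance = first_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = variance / total_variance
print(component, explained)
```

The ratio `explained` is the fraction of total variance captured by the first component. Entries of `component` with the same sign move together along that axis, and entries with opposite signs move inversely.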
First Principal Component · Inversely proportional: PAPI_TLB_TL, PAPI_L2_STM, PAPI_RES_STL, PAPI_TLB_DM, PAPI_L2_STM, PAPI_FPU_IDL · Proportional: ACPI_THERM, PAPI_TOT_INS, PAPI_FP_INS, PAPI_L1_TCA, PAPI_L1_TCH, PAPI_L1_ICR, PAPI_L1_ICA, PAPI_L1_DCH, PAPI_FML_INS, PAPI_L1_DCA, PAPI_FAD_INS, PAPI_FP_OPS, PAPI_L1_ICH
First vs. Second Principal Component · Proportional: PAPI_L1_ICH, PAPI_L1_ICR, PAPI_FPU_IDL, PAPI_L1_DCH, PAPI_TLB_DM, PAPI_L1_DCA, PAPI_TLB_TL, PAPI_TOT_INS, PAPI_HW_INT, PAPI_VEC_INS, PAPI_RES_STL, PAPI_FML_INS, PAPI_L2_TCM, PAPI_FP_INS, PAPI_L2_DCM, PAPI_FAD_INS, PAPI_L1_TCM · Inversely proportional: PAPI_L1_DCM
Temperature Correlation · Multi-substrate PAPI made it easy to collect the data needed to analyze and reduce the number of performance metrics required. · Found approximately 10 events that are either directly or inversely proportional to temperature. · Redundancy suggests using as few as 4-5 events to estimate temperature. · Potential for automated search for relevant performance metrics on new hardware.
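One way to picture estimating temperature from a handful of events is an ordinary least-squares fit. The slides do not describe an estimation method, so the linear model here is purely an assumption, and the two feature columns and temperature values are invented so that the fit is exact.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear(features, target):
    """Least-squares weights [intercept, w1, w2, ...] via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    d = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    atb = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(d)]
    return solve(ata, atb)


def predict(weights, feature):
    """Estimated temperature for one sample of event rates."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature))


# Hypothetical event rates (two events per sample) and temperatures,
# generated from temperature = 30 + 2*a - b for illustration only.
features = [(1, 5), (2, 3), (4, 4), (6, 1), (7, 2)]
temperature = [27.0, 31.0, 34.0, 41.0, 42.0]
weights = fit_linear(features, temperature)
print(weights, predict(weights, (3, 3)))
```

A positive weight marks an event that is directly proportional to temperature in this model, and a negative weight marks an inversely proportional one. With real counter data the fit would not be exact and would need validation on held-out runs.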
PAPI 4.0 Status · Multi-substrate development complete · Some CPU platforms not yet ported · Substrates available for: - ACPI (Advanced Configuration and Power Interface) - Myrinet MX · Substrates under development for: - InfiniBand - GigE · Friendly User release available now for CVS checkout · Release target: Q3 2006 · Acknowledgement: This work was supported by the U.S. Department of Energy Los Alamos Computer Science Institute under subcontract R7A827-79200 through Rice University.
PAPI 4.0 · Multi-substrate work complete · Substrates available for: - ACPI (Advanced Configuration and Power Interface) - Myrinet MX · Substrates under development for: - InfiniBand - GigE · Friendly User release available now for CVS checkout · PAPI 4.0 Beta release expected Q3 2006
Support Slide: Setting Up the Counters · Test is run on a 1.4 GHz AMD Opteron: - Supports 42 PAPI preset events - 4 hardware counters · HPCC calls a function to set up PAPI events and uses a timer: - Run on 1 processor, since only the temperature of 1 processor is of interest · Multiway multiplexing: - Need 11 eventsets to monitor all events - Each eventset gets a 20 ms timeslice - Randomized order of eventsets - Results are logged after 5 iterations · Resulted in 1631 logged results of 43 different performance metrics (42 PAPI presets and 1 temperature)
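The eventset count above follows from the counter limit: 42 presets spread over 4 hardware counters need ceil(42 / 4) = 11 eventsets. The sketch below reproduces that arithmetic and a randomized rotation. It assumes that an "iteration" means one pass over all eventsets and that any 4 events can share an eventset, neither of which the slide confirms. The event names are placeholders, and no PAPI calls are made.

```python
import random

NUM_PRESETS = 42         # PAPI preset events supported on the test machine
NUM_COUNTERS = 4         # hardware counters available at once
TIMESLICE_MS = 20        # time each eventset is active
ITERATIONS_PER_LOG = 5   # passes completed before results are logged


def build_eventsets(events, counters):
    """Split events into consecutive groups of at most `counters` events.

    Assumption: real hardware may restrict which events can be counted
    together, which this simple split ignores.
    """
    return [events[i:i + counters] for i in range(0, len(events), counters)]


def schedule(num_eventsets, iterations, rng):
    """Eventset indices in run order, reshuffled on every pass."""
    order = []
    for _ in range(iterations):
        indices = list(range(num_eventsets))
        rng.shuffle(indices)
        order.extend(indices)
    return order


events = [f"PRESET_{i:02d}" for i in range(NUM_PRESETS)]  # placeholder names
eventsets = build_eventsets(events, NUM_COUNTERS)
slots = schedule(len(eventsets), ITERATIONS_PER_LOG, random.Random(0))
ms_per_log = len(slots) * TIMESLICE_MS
print(len(eventsets), ms_per_log)
```

Under these assumptions each logged result spans 11 x 5 x 20 ms = 1100 ms of measurement, ignoring the overhead of switching eventsets.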