
CS 7960-4 Lecture 16
Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power
S. Kaxiras, Z. Hu, M. Martonosi
Proceedings of ISCA-28, July 2001

Leakage Power Trends
• Leakage grows with: number of transistors (increasing), supply voltage (decreasing), use of low threshold voltages (increasing)
• L1 and L2 caches are the biggest contributors (high transistor budgets)

Leakage Power

Vdd-Gating
• Leakage can be reduced by gating off the supply voltage to the circuit
• When applied to a cache, the contents of the SRAM cell are lost
• Cache decay: apply Vdd-gating when you do not care about cache contents

Lifetime of a Cache Line

Overheads
• Hardware to determine when to decay
• Introduces additional cache misses
• Normalized cache leakage power = active ratio (fraction of cache that is powered on) + (counter overhead : leakage) × activity + (L2 access energy : leakage) × number of misses
• Increased execution time (< 0.7%)
• L2 access/leakage ratio is ~9
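The normalized-leakage expression on this slide can be evaluated directly. The sketch below is only an illustration of the arithmetic: the function and parameter names are hypothetical, and it assumes that `activity` and `extra_misses` are dimensionless rates already normalized consistently with the two energy ratios. Only the L2 access/leakage ratio of about 9 comes from the slide; the other numbers are made up.

```python
def normalized_leakage_power(active_ratio, counter_to_leak, activity,
                             l2_to_leak, extra_misses):
    """Sum of the three terms on the slide (all names are illustrative).

    active_ratio    -- fraction of the cache that is powered on
    counter_to_leak -- counter overhead : leakage energy ratio
    activity        -- counter activity (assumed normalized rate)
    l2_to_leak      -- L2 access energy : leakage energy ratio (~9 on the slide)
    extra_misses    -- decay-induced misses (assumed normalized rate)
    """
    return (active_ratio
            + counter_to_leak * activity
            + l2_to_leak * extra_misses)


# Hypothetical operating point: half the cache powered on, counter overhead
# ignored, and a small rate of decay-induced misses.
example = normalized_leakage_power(0.5, 0.0, 0.0, 9.0, 0.01)
print(f"normalized leakage power: {example:.2f}")
```

A cache with no decay corresponds to an active ratio of 1 and no extra misses, so any result below 1.0 is a net saving under these assumptions.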

Skier’s Dilemma
New skis: $400
Ski rentals: $20
Heuristic: Buy skis after rental cost = purchase price
Ski trips: 5, 10, 15, 20, 25, 50
Optimal: $100, $200, $300, $400
Heuristic: $100, $200, $300, $800
Likewise, decay a cache line when the cost of an additional miss equals leakage dissipated so far
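The break-even rule can be checked with a few lines of Python. This is a minimal sketch using the slide's prices ($400 to buy, $20 to rent). The function names are hypothetical, and the exact trip on which the purchase happens (the first trip after rental spending has reached the purchase price) is an assumption.

```python
def optimal_cost(trips, rent=20, buy=400):
    """Cost when the total number of trips is known in advance."""
    return min(trips * rent, buy)


def heuristic_cost(trips, rent=20, buy=400):
    """Rent until rental spending equals the purchase price, then buy.

    Assumption: the purchase is made on the first trip after the
    break-even point, so the rental money is already spent by then.
    """
    break_even_trips = buy // rent
    if trips <= break_even_trips:
        return trips * rent
    return break_even_trips * rent + buy


for trips in (5, 10, 15, 20, 25, 50):
    print(trips, optimal_cost(trips), heuristic_cost(trips))
```

Under these assumptions the heuristic never pays more than twice the optimal cost, which is the property that makes the same rule attractive for deciding when to decay a cache line without knowing its future accesses.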

Tracking Dead Time
• Each line has a 2-bit counter that gets reset on every access and gets incremented every 2500 cycles through a global signal (negligible overhead)
• After 10,000 clock cycles, the counter reaches the max value and triggers a decay
• Adaptive decay: Start with a short decay period; if you have a quick miss, double the period; if there is no miss, halve the period
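The counter scheme and the adaptive rule above can be sketched for a single cache line as follows. This is a toy model, not the hardware from the paper: the class and function names are hypothetical, the threshold of four ticks is an assumption chosen so that 4 × 2500 matches the slide's 10,000 cycles (how the real 2-bit counter encodes saturation is not modeled), and the period bounds in `next_decay_period` are made up.

```python
TICK_CYCLES = 2500     # global tick period from the slide
TICKS_TO_DECAY = 4     # assumption: 10,000 / 2,500 ticks with no access


class Line:
    """One cache line with a small per-line counter."""

    def __init__(self):
        self.counter = 0
        self.powered = True

    def access(self):
        """Touch the line; return True if it had already decayed."""
        was_decayed = not self.powered
        self.powered = True
        self.counter = 0          # every access resets the counter
        return was_decayed

    def global_tick(self):
        """Called once per TICK_CYCLES for every powered line."""
        if self.powered:
            self.counter += 1
            if self.counter >= TICKS_TO_DECAY:
                self.powered = False   # gate off Vdd: contents are lost


def simulate(access_cycles, total_cycles):
    """Return the cycles at which an access found the line decayed."""
    line = Line()
    accesses = set(access_cycles)
    decayed_hits = []
    for cycle in range(1, total_cycles + 1):
        if cycle % TICK_CYCLES == 0:
            line.global_tick()
        if cycle in accesses and line.access():
            decayed_hits.append(cycle)
    return decayed_hits


def next_decay_period(period, quick_miss, min_period=1000, max_period=512000):
    """Adaptive rule from the slide; the bounds are hypothetical."""
    if quick_miss:
        return min(period * 2, max_period)   # decayed too early: back off
    return max(period // 2, min_period)      # no miss: decay more eagerly


print(simulate([100, 3000, 20000], 25000))
```

Because the global tick is not aligned with the last access, a line in this model decays somewhere between 7,500 and 10,000 cycles after its last access rather than at exactly 10,000.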

Results

Overheads

Adaptive Technique

Other Results
• The L2 cache is equally suited to decay techniques: lifetimes are scaled by a factor of 10, but an extra miss also costs a lot more
• For their experiments, there is little interference from multiprogramming
• Some instructions can easily be identified as last touches to a cache block, giving potential for early cache decay

The GALS Approach
• Dynamic voltage (and frequency) scaling (DVS) has favorable power-performance characteristics: 3% power savings for ~1% performance loss
• Distributing a single clock is going to be much harder in the future, which will naturally result in multiple clock domains
• DVS can be applied to each individual domain; identifying critical regions will allow better IPC
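The 3%-for-1% figure is consistent with the usual first-order model of dynamic power, P ∝ V²f. The sketch below assumes, as a simplification not stated on the slide, that supply voltage scales linearly with frequency and that performance loss equals the frequency reduction. The function name is hypothetical.

```python
def relative_dynamic_power(freq_scale):
    """Dynamic power relative to nominal under P ~ V^2 * f.

    Assumption: voltage is scaled by the same factor as frequency.
    """
    voltage_scale = freq_scale
    return voltage_scale ** 2 * freq_scale


# Slow the clock by 1% and see how much dynamic power is saved.
savings = 1 - relative_dynamic_power(0.99)
print(f"power savings for a 1% slowdown: {savings:.2%}")
```

Under this cubic model a 1% frequency reduction saves just under 3% of dynamic power, which matches the ratio quoted above.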

Multi-Clock Domain Processor

Interfacing Domains
• There are queues between each pair of domains
• Producers place data in the queues and consumers pull data out, in an asynchronous fashion
• Synchronization delays make the IPC slightly lower than the base case
• Occupancy in the queue means the consumer can slow down; an empty queue implies the producer can slow down

Next Week’s Paper
• “Reducing Power with Dynamic Critical Path Information”, J. S. Seng, E. S. Tune, D. M. Tullsen, Proceedings of MICRO-34, Dec 2001
