


MINIMISING DYNAMIC POWER CONSUMPTION IN ON-CHIP NETWORKS
Robert Mullins
Computer Architecture Group, Computer Laboratory, University of Cambridge, UK

Communication-Centric Architectures
• Future performance gains will primarily come from increasing the number of IP cores in a system, not their complexity or operating frequency
• Many reasons:
  – Diminishing returns from simply scaling what we have
  – Energy efficiency
  – Complexity
  – Fault tolerance
  – Economics

On-Chip Networks
• An efficient general-purpose chip-wide communication infrastructure is becoming essential
• One flexible networking option is to use packet-switched networks with support for virtual channels

The Lochside Router
• Router Architecture
  – Highly parameterised implementation
  – Packet-switched network with virtual-channel flow control
  – Best-case latency is one cycle per network hop
• Results presented here are from post-P&R simulations targeting a 90 nm technology
[Figure: Lochside Chip (2004/05), 180 nm technology. Labels: TILE; R; Traffic Generator, Debug & Test]

Exploiting Speculation to Reduce Communication Latency
Peh/Dally (2001)

Exploiting Speculation to Reduce Communication Latency

Aims of this work
• Apply existing power-saving techniques to an on-chip network design
  – e.g. clock and signal gating, gate-level optimisations, etc.
  – Importance of applying such techniques before making comparisons
• Measure power consumption and provide an accurate breakdown of where the remaining power is dissipated
• Where is the best place to look for future power savings?

Measuring and Optimizing Dynamic Power
• Our Test Case
  – 8 mm x 8 mm die
  – 4 x 4 mesh network
  – Low-latency routers, best-case latency is one cycle per hop (incl. interconnect)
  – 1.2 V, 90 nm technology
  – 4 input buffers per VC
  – 4 VCs per input port
  – 48 x 80-bit network links
  – 800 MHz @ WC PVT
    • ~32 FO4 clock period
  – Results reported at 250 MHz
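The test-case figures can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes the 48 links are counted as unidirectional router-to-router links, and that each input buffer holds one 80-bit word. Neither assumption is stated on the slide, and the function names are made up for this example.

```python
def mesh_link_count(cols: int, rows: int) -> int:
    """Count unidirectional router-to-router links in a cols x rows mesh."""
    horizontal_pairs = (cols - 1) * rows   # neighbouring routers along each row
    vertical_pairs = cols * (rows - 1)     # neighbouring routers along each column
    return 2 * (horizontal_pairs + vertical_pairs)  # one link in each direction


def clock_period_ps(freq_mhz: float) -> float:
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / freq_mhz


links = mesh_link_count(4, 4)
period_ps = clock_period_ps(800.0)
implied_fo4_ps = period_ps / 32      # the slide quotes ~32 FO4 per clock period
bits_per_input_port = 4 * 4 * 80     # ASSUMPTION: each buffer holds one 80-bit word

print(links, period_ps, implied_fo4_ps, bits_per_input_port)
```

Under these assumptions a 4 x 4 mesh gives 48 links, matching the slide, and an 800 MHz clock at roughly 32 FO4 implies an FO4 delay of about 39 ps at the worst-case corner.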

Interconnect Delay/Energy Trade-offs
• Power dissipated in network links depends on how links are spaced and buffered
• At least a factor of 3 difference in energy consumption over the range of potential interconnect options
• Could move to low-swing differential schemes for even greater energy savings
For the results we assume minimum-spaced wires and an optimal energy x delay product

Clock Gating
• Clock gating optimisations applied at two levels:
  – Local Clock Gating
    • Automated clock gating within router
    • Some tuning of RTL involved to maximise opportunities for synthesis tool
  – Router-Level Clock Gating
    • Exploit opportunities to gate the clock as it enters the router
    • Isolates the router's clock completely; only static power consumption remains
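As a reminder of why gating helps, the usual first-order model of switching power can be written as below. None of these symbols appear on the slides; they are the standard textbook quantities (activity factor α, switched capacitance C, supply voltage V_dd, clock frequency f), plus a hypothetical gated fraction g of clock cycles, so treat this as a sketch under those assumptions.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{dd}^{2} \, f
\qquad\Longrightarrow\qquad
P_{\text{clk,gated}} \approx (1 - g)\, C_{\text{clk}} \, V_{dd}^{2} \, f
```

The clock net toggles every cycle, so its activity factor is taken as 1 when ungated. As g approaches 1 for an idle router, the clock contribution vanishes, which is consistent with the statement that only static power remains.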

Router-Level Clock Gating
• Clock gating exposes clock tree insertion delay
• Need to know early if router will be required
• Generate 'early valid' signals in neighbouring routers
  – Early-valid signals are slightly pessimistic
  – Based on what is requested, not granted
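The idea of deriving early-valid from requests rather than grants can be sketched in a few lines. This is a toy model, not the Lochside logic: the port names, the fixed-priority arbiter, and the credit check are all assumptions chosen only to show why the signal is pessimistic but never misses a real transfer.

```python
from typing import Dict, Optional

OUTPUTS = ("north", "east", "south", "west")


def early_valid(requests: Dict[str, Optional[str]]) -> Dict[str, bool]:
    """Assert early-valid toward a neighbour if ANY input requests that output."""
    return {out: any(req == out for req in requests.values()) for out in OUTPUTS}


def grants(requests: Dict[str, Optional[str]],
           has_credit: Dict[str, bool]) -> Dict[str, Optional[str]]:
    """Hypothetical arbiter: fixed priority by input name, blocked without credit."""
    result: Dict[str, Optional[str]] = {}
    for out in OUTPUTS:
        contenders = sorted(inp for inp, req in requests.items() if req == out)
        result[out] = contenders[0] if contenders and has_credit[out] else None
    return result


# Each input port names the output it wants this cycle (None = idle).
requests = {"local": "east", "west": "east", "south": "north", "north": None}
has_credit = {"north": False, "east": True, "south": True, "west": True}

ev = early_valid(requests)
gr = grants(requests, has_credit)

for out in OUTPUTS:
    print(out, "early_valid =", ev[out], "granted_to =", gr[out])
```

In this example the north neighbour sees early-valid asserted even though nothing is granted in that direction, so its clock would be enabled unnecessarily. The reverse never happens: every granted output also has early-valid set.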

Gate-Level Optimizations and Signal Gating
• Automated signal gating and gate-level power optimisations had minimal impact
• Inserting signal gating logic manually did reduce input FIFO power requirements significantly
• The reported results could be further improved (by 12%) by enabling logic optimisation across module boundaries
  – This was restricted so that we could accurately determine where power is dissipated

Analysis of Power Consumption
[Figure: Power consumption of a single router and its links]
• Simple power optimisations can cut power requirements to a quarter, and many more opportunities to save power remain
• Network is ~5% of core area
• Perhaps 10% of system power at present
• Don't make comparisons without optimizing power!

Analysis of Power Consumption
• 22% static power, 11% inter-router links
• ~1% global clock tree
• 65% dynamic power
  – Power breakdown
    • ~50% of dynamic power is consumed in local clock tree and input FIFOs
    • ~30% on router datapath
    • ~20% on scheduling and arbitration
  – Scheduling is probably more complex than typical implementations due to speculation
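To see what the breakdown means as a share of total power, the percentages can be multiplied through. The sketch below assumes the ~50/30/20 split applies to the 65% dynamic slice, which is how the slide reads but is not stated explicitly; the dictionary keys are labels invented for this example.

```python
# Top-level shares of total power, in percent (from the slide).
total_share = {
    "static": 22,
    "inter_router_links": 11,
    "global_clock_tree": 1,
    "dynamic": 65,
}

# Split of the dynamic slice, in percent of dynamic power (approximate on the slide).
dynamic_split = {
    "local_clock_tree_and_input_fifos": 50,
    "router_datapath": 30,
    "scheduling_and_arbitration": 20,
}

# Convert each dynamic component into a percentage of TOTAL power.
share_of_total = {
    name: total_share["dynamic"] * pct / 100
    for name, pct in dynamic_split.items()
}

accounted_for = sum(total_share.values())  # not exactly 100 because of rounding

for name, pct in share_of_total.items():
    print(f"{name}: {pct:.1f}% of total")
print(f"top-level shares sum to {accounted_for}%")
```

On these figures the local clock tree and input FIFOs together account for roughly a third of total power, which is larger than either the static or the link component.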

Low-Power On-Chip Networks
• Interconnect and static power set to increase
  – Many low-power link technologies
    • Low-swing differential techniques
  – Power gating and other leakage reduction techniques
• Potential power savings begin to require lots of different techniques
  – No one silver bullet?

Low-Power On-Chip Networks
• Topology
  – Don't want to sacrifice the general or at least multipurpose nature of our networked SoC
  – Results suggest higher-radix routers and longer interconnects could reduce power
    • Probably not a long-term solution
    • Reduces path diversity, bad for fault tolerance
• Architecture
  – Scope for minimising memory required to store precomputed router schedule (particular to our router)
  – Simpler routers
  – Single-cycle routers reduce power? Speculation for low power?

Supporting Best-Effort (BE) and Guaranteed Services (GS) Efficiently
• Current timing of the datapath and link suggests additional GS data could be routed in the same clock cycle
  – Allocate datapath/link to GS traffic for first ½ of clock cycle
    • Double capacity of network
  – Exploit simpler GS circuit-switched routing when possible
  – Reduce power
• Very little additional overhead

Clocking On-Chip Networks
• Network system timing issues are interesting: naturally event-driven, not synchronous
• Work is investigating placing local data-driven clock generators in each network router
  – Clock is stretched when there is no data to be routed
  – Clock matches rate of incoming data streams
  – Robust synchronisation solution (true GALS)
  – Also investigating incorporating power gating support
• See also Distributed Clock Generator (DCG) (Fairbanks/Moore)

Challenges and Future Work
• These are early results in a much more rigorous study on the power requirements of networked on-chip communication
  – Much more soon!
• Exploiting a general-purpose on-chip network
  – Exploiting execution diversity to improve energy efficiency
  – Multi-use platforms and Virtual-IP
  – Fault tolerance
  – Networks of processing elements or networks that process?
• Scope for removing unnecessary interfaces and boundaries
• Impact of networking on IP and processor core design

Thank You