Adaptive System on a Chip (aSoC) for Low-Power Signal Processing
Andrew Laffely, Jian Liang, Prashant Jain, Ning Weng, Wayne Burleson, Russell Tessier
Department of Electrical and Computer Engineering, University of Massachusetts, Amherst
{alaffely, jliang, pjain, nweng, burleson, tessier}@ecs.umass.edu
This material is based upon work supported by the National Science Foundation under Grant No. 9988238. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Overview
• Motivation
• Video Processing
• Architecture
• Dynamic Power Management
• Core, Interconnect, and Clock

Problem
• Wireless video processing requires
  • High throughput
  • Low power
  • Flexibility

System on a Chip Solutions
• Take advantage of parallelism
• Potentially improved performance
• Allow use and reuse of existing integrated components
• If
  • The application can be partitioned
  • The appropriate architecture is used

Proposed Architecture: aSoC
• High throughput
  • Heterogeneous processor elements
  • Use the right tool for the job
  • Fast and predictable interconnect
• Flexible
  • Runtime reconfiguration of cores and interconnect
• Power consumption
  • Implement power saving features in both cores and interconnect
  • Use reconfiguration to dynamically control power consumption

aSoC: adaptive System on a Chip
• Tiled SoC architecture
[Figure: grid of tiles holding cores: Motion Estimation and Compensation, DCT, Control, Encrypt, VLE, FIR, Viterbi, Memory]

aSoC: adaptive System on a Chip
• Tiled SoC architecture
• Supports the use of independently developed heterogeneous cores
• Pick and place the cores that best perform the given application
  • Increase performance
  • Save power
• Cores may be any number of tiles in size
[Figure: grid of tiles holding cores: Motion Estimation and Compensation, DCT, Control, Encrypt, VLE, FIR, Viterbi, Memory]

aSoC: adaptive System on a Chip
• Tiled SoC architecture
• Supports the use of independently developed heterogeneous cores
• Connected with an interconnect mesh
  • Restricted to near-neighbor communications
  • Creates a pipeline
  • Decreases cycle time
[Figure: grid of tiles holding cores: Motion Estimation and Compensation, DCT, Control, Encrypt, VLE, FIR, Viterbi, Memory]

aSoC: adaptive System on a Chip
• Tiled SoC architecture
• Supports the use of independently developed heterogeneous cores
• Connected with a fixed interconnect mesh
• Using a communication interface (CI) to manage data
  • Network port (coreport) for each core
  • Each CI uses a memory and FSM to repetitively process a predefined schedule of communications
[Figure: grid of tiles holding cores: Motion Estimation and Compensation, DCT, Control, Encrypt, VLE, FIR, Viterbi, Memory; the crossbar is highlighted]

[Figure: communication interface block diagram; labels: Stream Control, Inputs, Outputs, Core, North, South, East, West, Local Config., Decoder/Controller, Instruction Memory, PC]
• Instruction memory
  • Holds the predetermined schedule of communications
• PC
  • Selects and synchronizes the communications
• Decoder
  • Sets crossbar
• Controller
  • Sets PC
  • Interprets incoming configuration commands
• Crossbar
  • Any input to any set of outputs
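The division of labor among instruction memory, PC, and crossbar can be sketched in a few lines of Python. This is a minimal illustrative model, not the aSoC hardware: the class name `CommunicationInterface`, the port names, and the encoding of one schedule entry as a dictionary from an input port to a set of output ports are all assumptions made for the example.

```python
from typing import Dict, List, Set

# Hypothetical port names; the slide lists North, South, East, West and Core.
Instruction = Dict[str, Set[str]]  # input port -> set of output ports


class CommunicationInterface:
    """Toy model of a CI: a schedule memory stepped by a program counter."""

    def __init__(self, schedule: List[Instruction]) -> None:
        self.schedule = schedule  # plays the role of the instruction memory
        self.pc = 0               # selects the current schedule entry

    def step(self, inputs: Dict[str, int]) -> Dict[str, int]:
        """Apply the current crossbar setting to the inputs, then advance the PC."""
        instruction = self.schedule[self.pc]
        outputs: Dict[str, int] = {}
        for src, dests in instruction.items():
            if src in inputs:
                # Crossbar: one input may drive any set of outputs.
                for dst in dests:
                    outputs[dst] = inputs[src]
        # The schedule repeats, so the PC wraps around.
        self.pc = (self.pc + 1) % len(self.schedule)
        return outputs


ci = CommunicationInterface([
    {"core": {"east"}},          # cycle 0: local core drives the east link
    {"west": {"east", "core"}},  # cycle 1: west input fans out to two outputs
])
first = ci.step({"core": 7})
second = ci.step({"west": 3})
```

Because the schedule is fixed ahead of time, each cycle needs only a memory lookup and a crossbar setting; no routing decision is made at run time in this model.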

Example: Communication
• A given application requires periodic communications from Core A to Core C
• aSoC uses a prescheduled communication STREAM
  • Core A places the data in a dedicated STREAM between the two tiles
  • Core C pulls the data from that STREAM
• The tile-to-tile communication uses 3 cycles
[Figure: tiles Core A, Core B, Core C; label: Stream A-D]

Example: Stream
[Figure: tiles A, B, C; step 1 at tile A: Core to East]

Example: Stream
[Figure: tiles A, B, C; label: Stream A-D; step 2 at tile B: West to East]

Example: Stream
[Figure: tiles A, B, C; step 3 at tile C: West to Core]

Example: Stream
[Figure: tiles A, B, C; label: Stream A-D; step 1: Core to East, step 2: West to East, step 3: West to Core, then Loop Back]
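The three steps of the stream example (Core to East at A, West to East at B, West to Core at C) can be traced with a small simulation. The sketch below is illustrative only: the `SCHEDULE` table, the `simulate` function, and the assumption that a word leaving a tile's east port arrives at the neighbor's west input on the next cycle are simplifications chosen for the example.

```python
from typing import Dict, List, Optional, Tuple

TILES = ["A", "B", "C"]  # three tiles in a row, west to east

# SCHEDULE[tile][cycle] = (input port, output port), or None when idle.
SCHEDULE: Dict[str, List[Optional[Tuple[str, str]]]] = {
    "A": [("core", "east"), None, None],  # step 1: Core to East
    "B": [None, ("west", "east"), None],  # step 2: West to East
    "C": [None, None, ("west", "core")],  # step 3: West to Core
}


def simulate(word: int) -> List[Tuple[int, str, int]]:
    """Send one word from core A; return (cycle, tile, value) deliveries."""
    west_in: Dict[str, Optional[int]] = {t: None for t in TILES}
    core_out = {"A": word}  # only core A produces data in this example
    delivered: List[Tuple[int, str, int]] = []
    for cycle in range(3):
        next_west: Dict[str, Optional[int]] = {t: None for t in TILES}
        for i, tile in enumerate(TILES):
            slot = SCHEDULE[tile][cycle]
            if slot is None:
                continue
            src, dst = slot
            value = core_out.get(tile) if src == "core" else west_in[tile]
            if value is None:
                continue
            if dst == "east" and i + 1 < len(TILES):
                next_west[TILES[i + 1]] = value  # arrives next cycle
            elif dst == "core":
                delivered.append((cycle + 1, tile, value))
        west_in = next_west
    return delivered


result = simulate(42)
```

In this model the word reaches core C on the third cycle, matching the 3-cycle figure quoted for the example; after that the schedule loops back and the same slots are reused for the next word.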

Static Scheduled Communications
• Creates system scalability by “eliminating” network congestion
• Many interconnect segments managed with time division multiplexing
  • Lots of bandwidth
• Improves SoC performance by up to a factor of 8
[Figure: grid of tiles holding cores: Motion Estimation and Compensation, DCT, Control, Encrypt, VLE, FIR, Viterbi, Memory]

Power Consumption?
• Provide reconfiguration methods for cores and CI
• Develop programmable clocking systems at each tile

Power Aware Core
• Custom motion estimation core
• Choose search method
  • Full search: 960-600 mW (depending on bit width and pel sub-sampling)
  • Spiral search: 76 mW
  • Three step search: 25 mW
Data taken with Synopsys Power Compiler at the RTL level
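One way to use these figures is to pick a search method from a power budget. The sketch below is a hypothetical policy, not something the slides specify: the function name `choose_search_mode`, the ordering of modes from most to least exhaustive, and the use of 600 mW (the low end of the full-search range) as the full-search cost are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# (mode name, power in mW) taken from the slide, ordered from the most
# exhaustive search to the cheapest. Full search uses the low end of its
# 600-960 mW range, i.e. reduced bit width and pel sub-sampling (assumption).
SEARCH_MODES: List[Tuple[str, float]] = [
    ("full", 600.0),
    ("spiral", 76.0),
    ("three_step", 25.0),
]


def choose_search_mode(budget_mw: float) -> Optional[str]:
    """Return the most exhaustive mode whose power fits the budget, if any."""
    for name, power_mw in SEARCH_MODES:
        if power_mw <= budget_mw:
            return name
    return None  # no mode fits; the core would have to stay off


mode_for_100mw = choose_search_mode(100.0)
```

A real controller would also weigh video quality and deadlines; the point here is only that the three modes span more than an order of magnitude in power, so mode selection is a strong power knob.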

aSoC Support
• Multiple streams in and out through dedicated coreports
• Easy to manage on both sides of the port
• Schedule configuration streams in with the data
  • Stream A: Input Frame
  • Stream B: Configuration (choose search mode and size)
  • Stream C: Motion Vectors
[Figure: Motion Estimation Core with coreports in1, in2, out1, out2 carrying Streams A, B, and C]

Reconfigurable Interconnect
• P-frame
• I-frame
[Figure: P-frame and I-frame data paths; labels: Input Frame, ME, MC, summation (+/-), DCT]

aSoC Support
• Lumped ME, MC, and summation into one double core
[Figure: Motion Estimation & Compensation core and DCT core]

aSoC Support: P-Frame
[Figure: labels: Input Frame (Stream A), Motion Estimation & Compensation, Difference Frame (Stream B), DCT]

aSoC Support: Schedule Change
[Figure: labels: Input Frame (Stream A), Motion Estimation & Compensation, Difference Frame (Stream B), Configuration Streams (C & D), DCT]

aSoC Support: Schedule Change
[Figure: labels: Input Frame (Stream A), Motion Estimation & Compensation, PC, Schedule 1, Schedule 2, Configuration (Stream C), Difference Frame (Stream B), DCT]

aSoC Support: Schedule Change
[Figure: labels: Input Frame (Stream A), Motion Estimation & Compensation, PC, Schedule 1, Schedule 2, Configuration (Stream D), DCT]

aSoC Support: Schedule Change
[Figure: labels: Input Frame (Stream A’), Motion Estimation & Compensation, PC, Schedule 1, Schedule 2, Configuration (Stream D), DCT]

aSoC Support: I-Frame
[Figure: labels: Input Frame (Stream A’), Motion Estimation & Compensation (OFF), DCT]

Operating Frequency?
• Interconnect synchronized
  • H-tree clock distribution
• Core frequencies depend on critical path
  • Tile provides clock reference
  • Coreport provides asynchronous boundary
• Dynamic core configuration requires dynamic clock configuration
  • aSoC clock reference provides multiples of the interconnect clock (…, 4x, 2x, 1x, 0.5x, 0.25x, …)
  • Configured through the tile controller
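Choosing a core clock from the available multiples amounts to picking the slowest multiple that still meets the core's required rate. The following sketch assumes only the five multiples listed on the slide are available and that slower is always preferable for power; `pick_multiplier` and its parameters are hypothetical names introduced for the example.

```python
from typing import Sequence

# Multiples of the interconnect clock listed on the slide; the ellipses there
# suggest more may exist, so treating this list as complete is an assumption.
MULTIPLIERS: Sequence[float] = (0.25, 0.5, 1.0, 2.0, 4.0)


def pick_multiplier(required_mhz: float, interconnect_mhz: float) -> float:
    """Return the smallest multiple whose clock meets the required frequency."""
    if required_mhz <= 0 or interconnect_mhz <= 0:
        raise ValueError("frequencies must be positive")
    for multiple in MULTIPLIERS:  # ascending, so the first hit is the slowest
        if multiple * interconnect_mhz >= required_mhz:
            return multiple
    raise ValueError("required frequency exceeds the fastest available clock")


# Example: a core that needs 30 MHz next to a 100 MHz interconnect.
slow_core = pick_multiplier(30.0, 100.0)
```

With a 100 MHz interconnect, a core needing 30 MHz would run at 0.5x (50 MHz) in this model, while one needing 150 MHz would need 2x.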

Mixed vs. Fixed Core Frequencies
• Cores not designed with clock gating
• Core power from Synopsys RTL simulation
• Interconnect power from SPICE
• Assumes 10-cycle schedule, 4 pixels/word

Current Density and Clocking
• Red: fixed worst-case clocking
  • Short spikes of high current
• Green: optimal independent clocking
  • Slow and low
• Optimal clocking eliminates current spikes (improved battery life)
[Figure: current vs. time plots for ME: Full Search, ME: Spiral, ME: Three Step Search, and DCT; time axis marked with Process Start and Deadline]

Configuration Overhead
• Configuration adds up to 2 streams per tile
  • Only 2 required for data
• Total BW = 5 x T x N
  • 5 streams/(cycle, tile)
  • T tiles
  • N cycles in schedule
• Single tile can support up to 50 different streams in a 10-cycle schedule
[Figure: DCT core with Input Frame (Stream B), Transform Frame (Stream D), and Configuration Streams]
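The bandwidth formula is simple enough to check directly. The sketch below just evaluates Total BW = 5 x T x N; the function name `total_stream_slots` and the 16-tile example are illustrative choices, not figures from the slides.

```python
STREAMS_PER_CYCLE_PER_TILE = 5  # from the slide: 5 streams/(cycle, tile)


def total_stream_slots(tiles: int, cycles: int) -> int:
    """Total BW = 5 x T x N, counted in stream slots per schedule."""
    return STREAMS_PER_CYCLE_PER_TILE * tiles * cycles


# One tile with a 10-cycle schedule, as on the slide.
single_tile = total_stream_slots(tiles=1, cycles=10)

# A hypothetical 16-tile array with the same schedule length.
sixteen_tiles = total_stream_slots(tiles=16, cycles=10)
```

The single-tile case reproduces the slide's figure of 50 streams in a 10-cycle schedule, which is the headroom against which the up-to-2 configuration streams per tile are compared.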

Configuration Power Overhead
• Configuration streams used infrequently
  • Once per macroblock or once per frame
• Architecture disables unused streams
  • Data valid bit already used for flow control
• Only 4-9% of interconnect power is due to configuration streams

Conclusion
• aSoC supports dynamic power management with reconfiguration
  • Cores
  • Interconnect
  • Clocks
• Low configuration overhead in both
  • Communication bandwidth
  • Power

Future Work
• Add reconfigurable voltage supplies at each tile
• Finish test chip
• Import larger applications

Questions

aSoC: adaptive System on a Chip
[Figure: grid of tiles holding cores: Motion Estimation and Compensation, DCT, Control, Encrypt, VLE, FIR, Viterbi, Memory; callouts: Tile, Cores, Interconnect, Interface]

Example: Stream
[Figure: tiles A, B, C; label: Stream A-D]

Partitioning
• Automated partitioning is a non-trivial problem
• For small signal processing systems, user-defined partitioning may be possible
• Key: perfectly partitioning the system may not be possible
  • How can the SoC mitigate the penalty?