
  • Number of slides: 18

The Minimax Cache: An Energy-Efficient Framework for Media Processors
Osman S. Unsal, Israel Koren, C. Mani Krishna, Csaba Andras Moritz
Department of Electrical and Computer Engineering, University of Massachusetts, Amherst
February 5, 2002
HPCA-8, Boston, Massachusetts

CPU Power Dissipation
[Figure: CPU power dissipation data. Sources: IEEE Journal of SSC, Nov. 96; Proceedings of ISSCC 94; Cool Chips, Micro-32, 99]
• Concentrate on the L1 data cache

Minimax Cache Philosophy
• Leverage data access characteristics of media applications:
  – Low scalar memory footprint
  – High scalar access frequency
• Leverage multimedia-sensitive compile-time partitioning of memory accesses:
  – Partition scalars from non-scalars
  – Map scalars to a Minicache
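The partitioning idea above can be illustrated with a minimal sketch. It is not the authors' compiler pass: the `Access` record, the `is_scalar` flag, and the `partition` function are hypothetical names, and the sketch simply assumes that a compile-time analysis has already marked each memory access as scalar or non-scalar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    name: str        # hypothetical variable name, for illustration only
    is_scalar: bool  # assumed to come from a compile-time analysis


def partition(accesses):
    """Split accesses into (minicache-bound scalars, L1-bound non-scalars)."""
    to_minicache, to_l1 = [], []
    for access in accesses:
        # Scalars are mapped to the Minicache; everything else stays in L1.
        (to_minicache if access.is_scalar else to_l1).append(access)
    return to_minicache, to_l1


accesses = [Access("i", True), Access("frame", False), Access("total", True)]
mini, l1 = partition(accesses)
print([a.name for a in mini], [a.name for a in l1])
```

Because the decision is made once per access site at compile time, no run-time estimation hardware is needed in this sketch.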

Conventional Minibuffer vs. Minimax Cache

                          Conventional Minibuffer   Minimax Cache
Estimation/Partitioning   Dynamic                   Static
Structures                Cache + Buffer            Cache + Minicache

• Applicable to existing architectures
• Simpler compiler analysis

Intel StrongARM SA-1100
Others:
• Samsung ARM7
• Hitachi SH-2
• Fujitsu SPARClite

Simulation Setup
[Diagram: tool flow with the labels Approximate Footprint Analysis, High-Level Analysis, Minimax Cache Specific Code Generation, SimpleScalar/Wattch, Annotations, High-Level Optimizations, SUIF/MachSUIF]

Benchmarks

ADPCM    Adaptive differential pulse code modulation audio coding
EPIC     Image compression coder based on wavelet decomposition
G721     Voice compression coder based on the G.711 and G.723 standards
GSM      Rate speech transcoding coder based on the GSM standard
JPEG     A lossy image compression coder
MESA     OpenGL clone: using Mipmap quadrilateral texture mapping
MPEG     Lossy motion video compression decoder
PEGWIT   Public key encryption coder; generates a public key
RASTA    Speech recognition front-end processing

Approximate Footprint

Application   Epic   G721   Gsm    Jpeg   Rasta
Size          203    32     146    83     152

Application   Epic   G721   Gsm    Jpeg   Rasta
32 reg.       32.0   4.5    2.3    1.1    16.0
16 reg.       62.4   38.8   37.2   46.5   36.0

• Scalar memory requirements are low!
• The percentage of scalars in total memory accesses is high!

Experimental Setup

General Parameters   1 GHz, 0.35 μm, 2.5 V
Issue                Single, in-order
L1 D-Cache           64K, 2-way
Minicache            256 b, fully-associative
L1 I-Cache           64K, 2-way
L2 Cache             None/256K
Register File Size   16/32
Main Memory          100 cycles

Minicache Miss Rates
[Figure: Minicache miss rates plotted against cache size]
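To make the miss-rate measurement concrete, here is a minimal sketch of how the miss rate of a small fully-associative cache could be computed from an address trace. It is not the SimpleScalar setup used in the slides: the LRU replacement policy, the 32-byte line size, the function name `miss_rate`, and the example trace are all assumptions chosen for illustration.

```python
from collections import OrderedDict


def miss_rate(addresses, cache_bytes, line_bytes=32):
    """Miss rate of a fully-associative cache, assuming LRU replacement."""
    num_lines = cache_bytes // line_bytes
    if num_lines < 1:
        raise ValueError("cache must hold at least one line")
    lines = OrderedDict()  # keys are line tags, ordered from LRU to MRU
    misses = 0
    for addr in addresses:
        tag = addr // line_bytes
        if tag in lines:
            lines.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= num_lines:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return misses / len(addresses) if addresses else 0.0


# Hypothetical byte-address trace, not data from the slides.
trace = [0, 4, 8, 64, 0, 128, 64]
for size in (64, 128, 256):
    print(size, miss_rate(trace, size))
```

Sweeping `cache_bytes` over several sizes, as in the loop above, gives the kind of miss-rate-versus-size curve the slide plots.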

Minimax Performance (No L2)
[Figure: relative execution time, 8K + 512-byte Minicache vs. 16K monolithic cache, no L2]

Minimax Performance
[Figure: relative execution time, 8K + 512-byte Minicache vs. 16K monolithic cache; series: No L2, 256K L2]

Energy-Delay Product
[Figure: energy-delay product; series: Monolithic, 128 b, 256 b, 512 b, 1024 b]

Energy-Delay Product (L2)
[Figure: energy-delay product with L2; series: Monolithic, 128 b, 256 b, 512 b, 1024 b]
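For readers unfamiliar with the metric, the energy-delay product is the energy consumed multiplied by the execution time, so lower is better. The sketch below shows how a configuration could be normalized against a monolithic baseline; the function names and the sample numbers are hypothetical and are not results from the slides.

```python
def energy_delay_product(energy_joules, delay_seconds):
    """Energy-delay product: energy multiplied by execution time."""
    return energy_joules * delay_seconds


def relative_edp(energy, delay, base_energy, base_delay):
    """EDP of a configuration normalized to a baseline (below 1.0 is better)."""
    return (energy_delay_product(energy, delay)
            / energy_delay_product(base_energy, base_delay))


# Hypothetical values: half the energy at the same delay as the baseline.
print(relative_edp(0.5, 1.0, 1.0, 1.0))
```

A relative value of 0.6, for example, would correspond to a 40% reduction in energy-delay product compared with the baseline.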

A Videophone Application
• Amalgam of 3 media cores:
  • Mpeg for video
  • GSM for voice
  • Rasta for MOM
• Weighted as follows:
  • Mpeg 60%
  • GSM 20%
  • RASTA 20%
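One plausible reading of the weighting above is a weighted arithmetic mean of a per-benchmark metric. The slides do not state the exact combination rule, so the sketch below is an assumption, and the function name `weighted_metric` and the input values are hypothetical.

```python
# Weights taken from the slide: Mpeg 60%, GSM 20%, RASTA 20%.
VIDEOPHONE_WEIGHTS = {"mpeg": 0.6, "gsm": 0.2, "rasta": 0.2}


def weighted_metric(per_benchmark, weights=VIDEOPHONE_WEIGHTS):
    """Weighted arithmetic mean of a per-benchmark metric (assumed rule)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weight * per_benchmark[name] for name, weight in weights.items())


# Hypothetical relative metric values, not results from the slides.
print(weighted_metric({"mpeg": 0.5, "gsm": 1.0, "rasta": 1.0}))
```

Since Mpeg carries 60% of the weight, any improvement in the video core dominates the combined figure under this rule.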

Conclusion
• Compile-time partitioning of scalars
• Simple compiler implementation, minimal architectural modifications
• 30% to 60% more energy-delay efficient
• Idea applicable to other types of small-footprint, frequently used data

Cool-* (star) Project
• Compiler-enabled low-power processor
• Key ideas:
  – Leverage static information speculatively
  – Implement static and static-dynamic execution paths in addition to conventional
• 30-60% reduction in power

Cool-* Publications
• Unsal O. S., Wang Z., Koren I., Krishna C. M., Moritz C. A., “On Memory Behavior of Scalars in Embedded Multimedia Systems,” MPI Workshop, ISCA 01
• Unsal O. S., Ashok R., Koren I., Krishna C. M., Moritz C. A., “Cool-Cache for Hot Multimedia,” MICRO 01
• Unsal O. S., Koren I., Krishna C. M., Moritz C. A., “Cool-Fetch: A Static Compiler-Enabled Energy-Efficient Fetch Throttling Architecture,” Technical Report, ECE Dept., University of Massachusetts, Amherst