- Number of slides: 18
The Minimax Cache: An Energy-Efficient Framework for Media Processors
Osman S. Unsal, Israel Koren, C. Mani Krishna, Csaba Andras Moritz
Department of Electrical and Computer Engineering, University of Massachusetts, Amherst
February 5, 2002, HPCA-8, Boston, Massachusetts
CPU Power Dissipation
[Figure; sources: IEEE Journal of SSC, Nov. 96; Proceedings of ISSCC 94; Cool Chips, Micro-32, 99]
Concentrate on L1 data cache
Minimax Cache Philosophy
• Leverage data access characteristics of media applications:
  – Low scalar memory footprint
  – High scalar access frequency
• Leverage multimedia-sensitive compile-time partitioning of memory accesses:
  – Partition scalars from non-scalars
  – Map scalars to a Minicache
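The partitioning idea above can be illustrated with a minimal sketch. It is not the authors' compiler pass: the `MemAccess` record, the `partition` function, and the variable names are hypothetical, and the scalar/non-scalar classification is assumed to be known already at compile time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemAccess:
    name: str        # variable being accessed (hypothetical names)
    is_scalar: bool  # True for scalars, False for arrays and other aggregates


def partition(accesses):
    """Statically route each access: scalars to the minicache, the rest to the L1 data cache."""
    routes = {"minicache": [], "l1_dcache": []}
    for acc in accesses:
        target = "minicache" if acc.is_scalar else "l1_dcache"
        routes[target].append(acc.name)
    return routes


# Hypothetical accesses from a media kernel.
accesses = [
    MemAccess("step_index", True),
    MemAccess("pixel_block", False),
    MemAccess("valpred", True),
]
routes = partition(accesses)
print(routes)
```

Because the decision is made once per variable rather than per dynamic access, no run-time estimation hardware is needed in this sketch.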
Conventional Minibuffer vs. Minimax Cache
                          Conventional Minibuffer   Minimax Cache
Estimation/Partitioning   Dynamic                   Static
Structures                Cache + Buffer            Cache + Minicache
• Applicable to existing architectures
• Simpler compiler analysis
Intel StrongARM SA-1100
Others:
• Samsung ARM7
• Hitachi SH2
• Fujitsu SPARClite
Simulation Setup
[Flow diagram; stages as labeled: Approximate Footprint Analysis; High-Level Analysis; Minimax Cache Specific Code Generation; SimpleScalar/Wattch; Annotations; High-Level Optimizations; SUIF/MachSUIF]
Benchmarks
ADPCM    Adaptive differential pulse code modulation audio coding
EPIC     Image compression coder based on wavelet decomposition
G721     Voice compression coder based on the G.711 and G.723 standards
GSM      Rate speech transcoding coder based on the GSM standard
JPEG     A lossy image compression coder
MESA     OpenGL clone: using mipmap quadrilateral texture mapping
MPEG     Lossy motion video compression decoder
PEGWIT   Public key encryption coder; generates a public key
RASTA    Speech recognition front-end processing
Approximate Footprint
Application   Size
Epic          203
G721          32
Gsm           146
Jpeg          83
Rasta         152

Application   32 reg.   16 reg.
Epic          32.0      62.4
G721          4.5       38.8
Gsm           2.3       37.2
Jpeg          1.1       46.5
Rasta         16.0      36.0
• Scalar memory requirements are low!
• The percentage of scalars in total memory accesses is high!
Experimental Setup
General Parameters   1 GHz, 0.35 μm, 2.5 V
Issue                Single, in-order
L1 D-Cache           64 K, 2-way
Minicache            256 b, fully associative
L1 I-Cache           64 K, 2-way
L2 Cache             None/256 K
Register File Size   16/32
Main Memory          100 cycles
Minicache Miss Rates
[Figure; axis: Cache Size]
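How miss rate depends on the size of a small fully-associative cache can be sketched with a toy simulator. This is an illustration only, not the SimpleScalar setup used for the slides: LRU replacement, the 4-byte line size, and the synthetic trace are all assumptions.

```python
from collections import OrderedDict


def miss_rate(addresses, cache_bytes, line_bytes=4):
    """Miss rate of a fully-associative cache with LRU replacement (assumed policy)."""
    num_lines = cache_bytes // line_bytes
    lines = OrderedDict()  # resident line tags, ordered oldest to newest
    misses = 0
    for addr in addresses:
        tag = addr // line_bytes
        if tag in lines:
            lines.move_to_end(tag)  # refresh LRU position on a hit
        else:
            misses += 1
            if len(lines) >= num_lines:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return misses / len(addresses)


# Hypothetical trace: 48 scalar words (192 bytes) touched round-robin, 10 passes.
trace = [4 * i for i in range(48)] * 10
for size in (128, 256, 512, 1024):
    print(size, miss_rate(trace, size))
```

With this trace, a 128-byte cache thrashes because the working set exceeds its capacity, while any size of 256 bytes or more sees only the compulsory misses of the first pass.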
Minimax Performance (No L2)
8 K + 512-byte Minicache vs. 16 K monolithic cache
[Figure; axis: Relative Execution Time; series: No L2]
Minimax Performance
8 K + 512-byte Minicache vs. 16 K monolithic cache
[Figure; axis: Relative Execution Time; series: No L2, 256 K L2]
Energy-Delay Product
[Figure; series: Monolithic, 128 b, 256 b, 512 b, 1024 b]
Energy-Delay Product (L2)
[Figure; series: Monolithic, 128 b, 256 b, 512 b, 1024 b]
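For readers unfamiliar with the metric, the energy-delay product is energy consumed multiplied by execution time, so lower is better. The sketch below shows how a relative improvement could be computed; the function names and the input numbers are hypothetical and are not measurements from the slides.

```python
def energy_delay_product(energy_joules, delay_seconds):
    """Energy-delay product: energy consumed times execution time."""
    return energy_joules * delay_seconds


def edp_improvement(baseline_edp, new_edp):
    """Fractional EDP reduction relative to the baseline (0.4 means 40% lower)."""
    return 1.0 - new_edp / baseline_edp


# Hypothetical numbers: the candidate halves the energy but runs 20% slower.
baseline = energy_delay_product(2.0, 1.0)
candidate = energy_delay_product(1.0, 1.2)
print(edp_improvement(baseline, candidate))
```

The example shows why the product is used: an energy saving only counts to the extent that it is not paid back in longer execution time.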
A Videophone Application
• Amalgam of 3 media cores:
  • MPEG for video
  • GSM for voice
  • RASTA for MOM
• Weighted as follows:
  • MPEG 60%
  • GSM 20%
  • RASTA 20%
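One plausible reading of the weighting above is a weighted sum of a per-core metric. The sketch below assumes that linear combination; the `videophone_metric` function and the per-core input values are hypothetical.

```python
# Weights taken from the slide; combining them as a weighted sum is an assumption.
WEIGHTS = {"mpeg": 0.6, "gsm": 0.2, "rasta": 0.2}


def videophone_metric(per_core):
    """Combine a per-core metric (e.g. relative energy-delay product) into one figure."""
    if set(per_core) != set(WEIGHTS):
        raise ValueError("expected exactly the keys: mpeg, gsm, rasta")
    return sum(WEIGHTS[core] * per_core[core] for core in WEIGHTS)


# Hypothetical per-core values, relative to a baseline of 1.0.
combined = videophone_metric({"mpeg": 0.5, "gsm": 0.7, "rasta": 0.6})
print(combined)
```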
Conclusion
• Compile-time partitioning of scalars
• Simple compiler implementation, minimal architectural modifications
• 30% to 60% more energy-delay efficient
• Idea applicable to other types of small-footprint, frequently used data
Cool- (star) Project
• Compiler-enabled low-power processor
• Key ideas:
  – Leverage static information speculatively
  – Implement static and static-dynamic execution paths in addition to conventional
• 30-60% reduction in power
Cool- Publications
• Unsal O. S., Wang Z., Koren I., Krishna C. M., Moritz C. A., “On Memory Behavior of Scalars in Embedded Multimedia Systems,” MPI Workshop, ISCA 01
• Unsal O. S., Ashok R., Koren I., Krishna C. M., Moritz C. A., “Cool-Cache for Hot Multimedia,” MICRO 01
• Unsal O. S., Koren I., Krishna C. M., Moritz C. A., “Cool-Fetch: A Static Compiler-Enabled Energy-Efficient Fetch Throttling Architecture,” Technical Report, ECE Dept., University of Massachusetts, Amherst