

Hot Chips 16, August 24, 2004
OptimoDE: Programmable Accelerator Engines Through Retargetable Customization
Nathan Clark, Hongtao Zhong, Kevin Fan, Scott Mahlke
Krisztián Flautner, Koen Van Nieuwenhove
CCCP Research Group, University of Michigan (http://cccp.eecs.umich.edu)
ARM Limited

OptimoDE Overview
• OptimoDE
  – A configurable VLIW-styled Data Engine architecture
  – Targeted at intensive data processing
• Characteristics
  – Very wide performance envelope
    • Power / area / speed tradeoff
    • Exploiting parallelism in applications
  – Unlimited data path configuration options
  – User extensible through ISA customization
• Semi-automatic design system
  – User-in-the-loop design, retargetable compiler toolchain

OptimoDE in a System On Chip
[Block diagram of the SoC. Labeled blocks: ARM CPU, Data Engine with data memories (DATA 1, DATA 2) and control (CTRL), SRAM, FIFO, I/O, switch, SDRAM controller, DMA controller, interrupt controller, AHB bus matrix.]

OptimoDE Architecture Model
• Functional Units
  – ALU, ACU, Multipliers
  – Custom
• Memory
  – RAM (asynch / synch)
  – ROM
• I/O ports
  – addressable
  – handshake protocol
• Registers
  – Register files
• Interconnect
  – Direct connection
  – Shared bus
• Controller
• All layers required
• Intra-layer configuration
[Diagram: stacked layers labeled function units, memories, I/O ports, interconnect, registers (regs), interconnect, and controller (µC).]

Design Toolchain
[Flow diagram. Labels: µA Definition, µA Evaluation, OptimoDE Library, User Library, Librarian, DesignDE (instantiate, save), DEvelop (set target), ISS (load, run / profile), µA include files, and a C source whose main() calls dct().]

Compiler Toolchain
[Flow diagram: DEvelop takes the C source and a description as input, passes a dataflow graph through the stages below with analysis feedback, and emits microcode.]
• analysis/opti – Syntax checks, dataflow analysis
• bind – Match architecture and dataflow graph
• schedule – Optimize code and register use

32-point DCT Microarchitecture
[Diagram. Labeled blocks: inport, outport, rom, ram_1, ram_2, acu_2, acu_3, alu_1, alu_2, imm_1, imm_2, control.]
• 2 Custom FUs, 2 RAM, 1 ROM, 3 ACU, 2 I/O ports
• Designer responsible for creating custom units manually

Retargetable Customization
• Prototype 2 technologies in OptimoDE
  – Automated ISA customization
  – Retargetable customization to an “application-area”
• Customizing for 1 application
  – Programmability: nominally programmable
  – Critical problem: cannot sustain performance across similar applications
  – How well does a custom ISA generalize?
    • 5 encryption algorithms, create custom design for each
    • Average loss >80% versus native [MICRO, 2003]
  – Proactive generalization creates a retargetable design

Creating Custom Instructions
• Candidate discovery – Identify customization opportunities
  • Examine program DFG
  • Partition DFG at:
    – Memory operations
    – Unprofitable edges
  • Enumerate candidate subgraphs within each partition
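The discovery steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the OptimoDE tool's implementation: the opcode names in MEMORY_OPS, the function names, the max_size limit, and the cut_edges argument (standing in for whatever test marks an edge unprofitable) are all hypothetical.

```python
MEMORY_OPS = {"load", "store"}  # assumed opcode names for memory operations


def partition_dfg(ops, edges, cut_edges=frozenset()):
    """Split a dataflow graph into partitions.

    ops:       dict mapping node id -> opcode string
    edges:     iterable of (producer, consumer) node-id pairs
    cut_edges: edges judged unprofitable (the judging itself is not modeled)
    Returns (list of node sets, undirected adjacency dict).
    """
    keep = {n for n, op in ops.items() if op not in MEMORY_OPS}
    adj = {n: set() for n in keep}
    for a, b in edges:
        if a in keep and b in keep and (a, b) not in cut_edges:
            adj[a].add(b)
            adj[b].add(a)
    parts, seen = [], set()
    for start in sorted(keep):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # depth-first walk collects one connected component
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        parts.append(comp)
    return parts, adj


def enumerate_candidates(part, adj, max_size):
    """Enumerate connected subgraphs of one partition, up to max_size nodes."""
    found = set()
    frontier = {frozenset([n]) for n in part}
    while frontier:
        found |= frontier
        grown = set()
        for sub in frontier:
            if len(sub) < max_size:
                for n in sub:
                    for m in adj[n] - sub:  # grow by one neighbouring node
                        grown.add(sub | {m})
        frontier = grown - found
    return found


def discover_candidates(ops, edges, max_size=3, cut_edges=frozenset()):
    """Partition the DFG, then list every candidate subgraph as a sorted tuple."""
    parts, adj = partition_dfg(ops, edges, cut_edges)
    result = []
    for part in parts:
        result.extend(tuple(sorted(s))
                      for s in enumerate_candidates(part, adj, max_size))
    return sorted(result)


# Hypothetical toy DFG: load -> shl -> and -> add -> store, plus load -> xor.
OPS = {0: "load", 1: "shl", 2: "and", 3: "add", 4: "store", 5: "xor"}
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 5)]
candidates = discover_candidates(OPS, EDGES, max_size=3)
```

Exhaustive enumeration grows quickly with partition size, which is presumably why the slide partitions the graph first; the size cap here plays the same limiting role in the toy.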

Grouping and Selection
• Group candidate subgraphs with same structure
• Estimate performance and cost for each group
• Greedily select groups to implement in hardware subject to budget
• 1 CFU created per group
[Figure: candidate subgraphs labeled Group 1 through Group 4, annotated with estimated cost and gain. Legible annotations: Cost: 2 adders, Gain: 10,000 cycles; Cost: 0.5 adders, Gain: 1,000 cycles; Cost: 1 adder, Gain: 1,500 cycles; Gain: 2,500 cycles.]
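A minimal sketch of budget-constrained greedy selection follows. The slide does not say how groups are ranked, so ordering by gain per unit cost is an assumption, as are the Group class, the select_groups function, the group names, the budget of 3 adders, and the cost given to group D.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    name: str
    cost: float  # estimated hardware cost, in adder equivalents
    gain: float  # estimated cycles saved


def select_groups(groups, budget):
    """Pick groups in order of gain/cost ratio (assumed ranking) until the
    remaining budget cannot fit the next one; groups that do not fit are skipped."""
    chosen, remaining = [], budget
    for g in sorted(groups, key=lambda g: g.gain / g.cost, reverse=True):
        if g.cost <= remaining:
            chosen.append(g)
            remaining -= g.cost
    return chosen


# Hypothetical groups; the numbers loosely echo the figure annotations,
# and the cost of D is invented for the example.
groups = [
    Group("A", cost=2.0, gain=10_000),
    Group("B", cost=0.5, gain=1_000),
    Group("C", cost=1.0, gain=1_500),
    Group("D", cost=1.5, gain=2_500),
]
selected = select_groups(groups, budget=3.0)
total_gain = sum(g.gain for g in selected)
```

Greedy selection is not guaranteed to be optimal (this is a knapsack-style problem), but it is simple and matches the "greedily select" wording on the slide.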

Proactively Generalize Groups
• Cost-effectively extend group functionality to enable reuse
• Wildcard – multiple functionality at nodes
• Subsumed – configurable interconnect to bypass nodes
[Figure: an example subgraph with Input 1, Input 2, and Output, built from >>, 0xFF, |, +, and constant 0x8 nodes, shown next to a generalized version whose nodes accept alternatives such as "0x8, 0x4" and "+, |, &".]
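The two generalization mechanisms can be illustrated with a small matcher. This is a simplified sketch restricted to linear chains of operations, whereas real CFU subgraphs are general dataflow graphs; the Slot class, the can_execute function, and the example CFU are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    ops: frozenset            # opcodes this node can perform (wildcard if several)
    bypassable: bool = False  # True if interconnect can route around it (subsumed)


def can_execute(cfu, chain):
    """Return True if the opcode chain maps onto the CFU slots in order,
    using each slot at most once and skipping only bypassable slots."""
    def match(i, j):
        if j == len(chain):
            # All operations placed: every unused slot must be bypassable.
            return all(s.bypassable for s in cfu[i:])
        if i == len(cfu):
            return False
        if chain[j] in cfu[i].ops and match(i + 1, j + 1):
            return True
        return cfu[i].bypassable and match(i + 1, j)
    return match(0, 0)


# Hypothetical generalized CFU: shift, optional mask, then a wildcard node.
cfu = [
    Slot(frozenset({">>"})),
    Slot(frozenset({"&"}), bypassable=True),
    Slot(frozenset({"+", "|", "&"})),
]
```

With this CFU, the chain [">>", "&", "+"] uses all three slots, [">>", "|"] bypasses the mask node and relies on the wildcard, and ["+", "&"] is rejected because the first slot can only shift.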

Native Speedups
[Chart]

Importance of Generalization
[Chart. Labels: 3des, Blowfish, Rc4, AES, Sha. Key: application run – application designed for.]

Case Study - MD5
[Chart]

OptimoDE Design for this Point
[Diagram: data engine with ALU1, ALU2, ACU, CFU, register files (RF), SRAM, and control memory. The CFU subgraphs take up to four inputs and are built from +, >>, <<, |, ^, and & nodes with constants 0x1B and 0x5.]

Die Area Breakdown
• OptimoDE = 5.5 mm² in 0.13 µm
• ARM926EJ = 5.0 mm² in 0.13 µm
[Chart: area breakdown across ALU1, ALU2, ACU, CFU, register files (RF), SRAM, and control memory.]

Conclusions
• OptimoDE
  – Configurable VLIW-style data engine architecture
  – Automated tools for implementing embedded signal and data processing solutions
• Automatic retargetable customization
  – Customized design combined with cost-effective generalization
  – Performance programmability: performance stability across a family of similar applications

For More Information
• CCCP group website
  – cccp.eecs.umich.edu
• ARM OptimoDE information
  – www.arm.com/products/CPUs/families/OptimoDE.html

Designing for a Domain
[Chart]