- Number of slides: 19
Hot Chips 16, August 24, 2004
OptimoDE: Programmable Accelerator Engines Through Retargetable Customization
Nathan Clark, Hongtao Zhong, Kevin Fan, Scott Mahlke
CCCP Research Group, University of Michigan
http://cccp.eecs.umich.edu
Krisztián Flautner, Koen Van Nieuwenhove
ARM Limited
OptimoDE Overview
• OptimoDE
  – A configurable VLIW-style Data Engine architecture
  – Targeted at intensive data processing
• Characteristics
  – Very wide performance envelope
    • Power / area / speed tradeoff
    • Exploiting parallelism in applications
  – Unlimited data path configuration options
  – User extensible through ISA customization
• Semi-automatic design system
  – User-in-the-loop design, retargetable compiler toolchain
OptimoDE in a System On Chip
[Figure: SoC block diagram. Labeled blocks: ARM CPU, Data Engine with MEM, AHB Bus Matrix, SDRAM Controller, SDRAM, DMA Controller, Interrupt Controller, SRAM, FIFO, switch, I/O; ports marked M and S; DATA 1, DATA 2, and CTRL connections.]
OptimoDE Architecture Model
• Functional units
  – ALU, ACU, multipliers
  – Custom
• Memory
  – RAM (asynchronous / synchronous)
  – ROM
• I/O ports
  – Addressable
  – Handshake protocol
• Registers
  – Register files
• Interconnect
  – Direct connection
  – Shared bus
• Controller
• All layers required
• Intra-layer configuration
[Figure: layered model stacking function units, interconnect, memories and registers, interconnect, and the μC controller.]
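The layered model above can be pictured as a small data structure with an "all layers required" check. This is only a sketch: the class name, field names, and string values are hypothetical and do not reflect the OptimoDE tools' own description format. It also assumes I/O ports are optional, which the slide does not state.

```python
from dataclasses import dataclass, field


@dataclass
class DataEngine:
    """Hypothetical description of one data engine configuration."""
    functional_units: list = field(default_factory=list)  # e.g. "alu", "acu", "mul", custom units
    memories: list = field(default_factory=list)          # e.g. "ram_sync", "ram_async", "rom"
    io_ports: list = field(default_factory=list)          # assumed optional in this sketch
    registers: list = field(default_factory=list)         # register files
    interconnect: str = ""                                # "direct" or "shared_bus"
    controller: str = ""

    def missing_layers(self):
        """Return the names of required layers that are still empty."""
        layers = {
            "functional_units": self.functional_units,
            "memories": self.memories,
            "registers": self.registers,
            "interconnect": self.interconnect,
            "controller": self.controller,
        }
        return sorted(name for name, value in layers.items() if not value)


engine = DataEngine(
    functional_units=["alu", "acu"],
    memories=["ram_sync", "rom"],
    registers=["rf0"],
    interconnect="shared_bus",
)
print(engine.missing_layers())
```

Within each layer the contents remain free to vary (intra-layer configuration); the check only enforces that no layer is left out.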
Design Toolchain
[Figure: design toolchain flow. Labeled elements: μA Definition (DesignDE, instantiating from the OptimoDE Library and User Library via the Librarian, saving μA .inc files), μA Evaluation (DEvelop, set target, #include of the μA .inc files), and the ISS (load, run / profile). Example source: main() { … = dct(); }.]
Compiler Toolchain
[Figure: DEvelop compiler flow, with inputs labeled C Source and Description, producing microcode, with analysis feedback. Stages: analysis/optimization (syntax checks, dataflow analysis), bind (match architecture and dataflow graph), schedule (optimize code and register use). Illustrated with a dataflow graph of + and * operations between INPUT and OUTPUT.]
32-point DCT Microarchitecture
[Figure: datapath with inport, outport, rom, ram_1, ram_2, alu_1, alu_2, acu_2, acu_3, imm_1, imm_2, and control.]
• 2 custom FUs, 2 RAM, 1 ROM, 3 ACU, 2 I/O ports
• Designer responsible for creating custom units manually
Retargetable Customization
• Prototype 2 technologies in OptimoDE
  – Automated ISA customization
  – Retargetable customization to an “application-area”
• Customizing for 1 application
  – Programmability: nominally programmable
  – Critical problem: cannot sustain performance across similar applications
  – How well does a custom ISA generalize?
    • 5 encryption algorithms, create custom design for each
    • Average loss >80% versus native [MICRO, 2003]
  – Proactive generalization creates a retargetable design
Creating Custom Instructions
• Candidate discovery
  – Identify customization opportunities
    • Examine program DFG
    • Partition DFG at:
      – Memory operations
      – Unprofitable edges
    • Enumerate candidate subgraphs within each partition
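The discovery steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the DFG encoding, the `MEMORY_OPS` set, the `cut_edges` parameter (standing in for "unprofitable edges", whose definition the slide does not give), and the `max_size` limit are all hypothetical.

```python
MEMORY_OPS = {"load", "store"}  # assumption: opcodes that act as partition boundaries


def partition_dfg(ops, edges, cut_edges=frozenset()):
    """Return (adjacency, partitions) after dropping memory ops and cut edges.

    ops:       dict mapping node id -> opcode string
    edges:     iterable of (producer, consumer) node-id pairs
    cut_edges: edges judged unprofitable (hypothetical input)
    """
    keep = {n for n, op in ops.items() if op not in MEMORY_OPS}
    adj = {n: set() for n in keep}
    for src, dst in edges:
        if src in keep and dst in keep and (src, dst) not in cut_edges:
            adj[src].add(dst)
            adj[dst].add(src)

    # Each connected component of what remains is one partition.
    partitions, seen = [], set()
    for start in sorted(keep):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adj[node] - component)
        seen |= component
        partitions.append(component)
    return adj, partitions


def enumerate_candidates(partition, adj, max_size=3):
    """Enumerate connected node subsets of a partition, up to max_size nodes."""
    found = set()
    frontier = {frozenset([n]) for n in partition}
    while frontier:
        found |= frontier
        grown = set()
        for sub in frontier:
            if len(sub) < max_size:
                for node in sub:
                    for neighbor in adj[node] - sub:
                        grown.add(sub | {neighbor})
        frontier = grown - found
    return found


ops = {1: "load", 2: "shr", 3: "and", 4: "add", 5: "store", 6: "xor"}
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 6)]
adj, partitions = partition_dfg(ops, edges)
for part in partitions:
    print(sorted(part), len(enumerate_candidates(part, adj)))
```

Bounding the subgraph size keeps the enumeration tractable in this sketch; exhaustive enumeration grows quickly with partition size.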
Grouping and Selection
• Group candidate subgraphs with same structure
• Estimate performance and cost for each group
• Greedily select groups to implement in hardware subject to budget
• 1 CFU (custom function unit) created per group
[Figure: candidate subgraphs labeled Group 1 through Group 4, annotated with per-group estimates: cost 2 adders, gain 10,000 cycles; cost 0.5 adders, gain 1,000 cycles; cost 1 adder, gain 1,500 cycles; gain 2,500 cycles.]
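A minimal sketch of grouping and greedy selection follows. Several things here are assumptions rather than details from the slides: candidates are assumed to arrive with a canonical structure key (real structural matching would need graph isomorphism), groups are ranked by gain per unit cost, and all names and numbers are illustrative.

```python
from collections import defaultdict


def group_candidates(candidates):
    """Sum the estimated cycle savings of candidates that share a structure key."""
    gains = defaultdict(int)
    for structure, cycles_saved in candidates:
        gains[structure] += cycles_saved
    return dict(gains)


def greedy_select(gains, costs, budget):
    """Pick groups in order of gain per unit cost while they still fit the budget."""
    ranked = sorted(gains, key=lambda g: gains[g] / costs[g], reverse=True)
    chosen, remaining = [], budget
    for group in ranked:
        if costs[group] <= remaining:
            chosen.append(group)  # one CFU would be built for this group
            remaining -= costs[group]
    return chosen


# Illustrative numbers only (cost in adder-equivalents, gain in cycles).
candidates = [("shr-and", 6000), ("shr-and", 4000), ("add-xor", 1000),
              ("or-add", 1500), ("and-or", 2500)]
costs = {"shr-and": 2.0, "add-xor": 0.5, "or-add": 1.0, "and-or": 1.0}

gains = group_candidates(candidates)
selected = greedy_select(gains, costs, budget=3.0)
print(selected)
```

A greedy pass is not guaranteed to find the best set under the budget (the underlying problem resembles knapsack), but it is simple and matches the "greedily select" wording of the slide.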
Proactively Generalize Groups
• Cost-effectively extend group functionality to enable reuse
• Wildcard – multiple functionality at nodes
• Subsumed – configurable interconnect to bypass nodes
[Figure: a subgraph of >>, 0xFF, |, and + operations on Input 1 and Input 2 producing Output, shown before and after generalization; the generalized nodes carry alternatives such as the constants 0x8 and 0x4 and the operators +, |, and &.]
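The two generalization mechanisms can be illustrated with a deliberately simplified model, assuming a CFU is a linear chain of stages rather than a general subgraph. Each stage accepts a set of opcodes (wildcard) and may be skipped (subsumed). The `Stage` class and `can_execute` function are hypothetical names for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    ops: frozenset            # opcodes this node can perform (wildcard if more than one)
    bypassable: bool = False  # True if the interconnect can route around this node


def can_execute(stages, op_chain):
    """Check whether a chain of ops maps, in order, onto the CFU stages.

    reachable[i] is True when the first i ops can be mapped onto the stages
    processed so far, with every unused stage being bypassable.
    """
    reachable = [True] + [False] * len(op_chain)
    for stage in stages:
        nxt = [False] * (len(op_chain) + 1)
        for i, ok in enumerate(reachable):
            if not ok:
                continue
            if stage.bypassable:
                nxt[i] = True      # skip this stage
            if i < len(op_chain) and op_chain[i] in stage.ops:
                nxt[i + 1] = True  # run op i on this stage
        reachable = nxt
    return reachable[len(op_chain)]


cfu = [
    Stage(frozenset({"shr"})),
    Stage(frozenset({"and"}), bypassable=True),
    Stage(frozenset({"add", "or"})),
]
print(can_execute(cfu, ["shr", "and", "add"]))
print(can_execute(cfu, ["shr", "or"]))
print(can_execute(cfu, ["shr", "xor"]))
```

The dynamic-programming form is used instead of a single greedy pass because a stage that matches an op is sometimes better bypassed so that a later, non-bypassable stage can take that op.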
Native Speedups
Importance of Generalization
[Figure: chart covering 3DES, Blowfish, RC4, AES, and SHA. Key: application run – application designed for.]
Case Study - MD5
OptimoDE Design for this Point
[Figure: datapath with ALU 1, ALU 2, ACU, CFU, register files (RF), SRAM, and control memory, alongside two CFU dataflow graphs: one combining Input 1 through Input 4 with +, >> 0x1B, << 0x5, and |; one combining Input 1 through Input 3 with ^ and &.]
Die Area Breakdown
OptimoDE = 5.5 mm² in 0.13 μm
ARM926EJ = 5.0 mm² in 0.13 μm
[Figure: area breakdown across ALU 1, ALU 2, ACU, CFU, register files (RF), SRAM, and control memory.]
Conclusions
• OptimoDE
  – Configurable VLIW-style data engine architecture
  – Automated tools for implementing embedded signal and data processing solutions
• Automatic retargetable customization
  – Customized design combined with cost-effective generalization
  – Performance programmability: performance stability across a family of similar applications
For More Information
• CCCP group website
  – cccp.eecs.umich.edu
• ARM OptimoDE information
  – www.arm.com/products/CPUs/families/OptimoDE.html
Designing for a Domain


