2b92ccc907cb6291ea5549fcef4ea3e6.ppt
- Number of slides: 22
Instruction Generation for Hybrid Reconfigurable Systems
Ryan Kastner, Seda Ogrenci-Memik, Elaheh Bozorgzadeh and Majid Sarrafzadeh
{kastner, seda, elib, majid}@cs.ucla.edu
Embedded and Reconfigurable Systems Group, Computer Science Department, UCLA, Los Angeles, CA 90095
ICCAD ’01: November, 2001
Outline
- Introduction
  - Programmability
  - Hybrid Reconfigurable Systems
  - Strategically Programmable System
- Instruction Generation
  - Uses in Hybrid Reconfigurable Systems
  - Relation to Template Generation and Matching
  - Algorithm for Template Generation and Matching
- Experiments
- Conclusion
Programmability
- Future systems need programmability at multiple levels of the computational hierarchy
[Figure: datapath with Control, ADD, MUL, FU, Register and Memory Bank blocks]
Computational Hierarchy:
- Programmability level: Gate Level, Microarchitecture Level
- Basic unit of computation: Bit, Byte, Instruction (8 – 128 bits); Boolean Operation (and, or, xor), Arithmetic Operation, Functional Operation
- Communication: Direct wires; Bundles of wires, Bus; memory connections, registers
Hybrid Reconfigurable Systems have programmability at one or more levels
Tradeoffs
[Figure: Configuration Time (hundreds of cycles to thousands of cycles) and Flexibility, shown over a datapath with Control, ADD, MUL, FU, Register and Memory Bank blocks]
Level | Types of Programmable Units | Example Platform
Gate level | CLBs, LUTs | Xilinx, Altera
Microarchitecture level | Datapath unit, Control unit, RAM | Chameleon Systems
Architecture level | Custom instructions, Register banks | Tensilica, Improv
Hybrid Reconfigurable Systems should find a happy medium
SPS - Strategically Programmable System
- Embed (hard or soft) computational units, called Versatile Programmable Blocks (VPBs), into FPGA-like fabric
- Combine programmable units from gate, microarchitecture and architecture levels
- Balance flexibility and configuration time
- Need automated method of determining the functionality of VPBs
[Figure: FPGA-like fabric with embedded VPB and Memory blocks]
Overview of SPS Compiler
- Input: set of applications specified in high-level code (C/C++, Fortran, MOC)
- Compile to low-level specification
- Determine VPB functionality
- SPS Architecture Generation
- VPB Module Synthesis, Placement, Routing
- Output: SPS Architecture
VPB Instruction Generation
- Given a set of applications, what computation should be implemented on VPBs?
[Figure: set of applications mapped onto a fabric of VPB and RAM blocks]
- Want complex, commonly occurring computation patterns
- Look for computational patterns at the instruction level
- Basic operation is add, multiply, shift, etc.
Problem Definition
- Determining VPB functionality requires regularity extraction
- Regularity extraction: find common sub-structures (templates) in one or a collection of graphs
- Each application can be specified by a collection of graphs (CDFGs)
- Templates are implemented as VPBs
- Two related sub-problems:
  - Template Matching
  - Template Generation
Template Matching – Formal Def’n
- Problem 1: Given a directed, labeled graph G(N, A) and a library of templates, each of which is a directed labeled graph Ti(V, E), find every subgraph of G that is isomorphic to any Ti
[Figure: directed labeled graph G and templates T1 to T6, built from +, *, &, % and || operations]
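To make Problem 1 concrete, here is a minimal brute-force sketch in Python. It is not the method used in the talk: it assumes the graph and each template are given as a label dictionary plus an edge list, and it only checks that labels and template edges are preserved (it does not require the match to be an induced subgraph). The function name `find_matches` and the example graph are hypothetical.

```python
from itertools import permutations

def find_matches(g_labels, g_edges, t_labels, t_edges):
    """Return every mapping template-node -> graph-node that preserves labels and edges."""
    g_edge_set = set(g_edges)
    t_nodes = list(t_labels)
    matches = []
    # Try every ordered choice of distinct graph nodes for the template nodes.
    for candidate in permutations(g_labels, len(t_nodes)):
        mapping = dict(zip(t_nodes, candidate))
        if any(t_labels[t] != g_labels[mapping[t]] for t in t_nodes):
            continue  # operation labels differ
        if all((mapping[u], mapping[v]) in g_edge_set for u, v in t_edges):
            matches.append(mapping)
    return matches

# Hypothetical example: G computes (a * b) + (c * d);
# the template is a multiply feeding an add.
g_labels = {0: "*", 1: "*", 2: "+"}
g_edges = [(0, 2), (1, 2)]
t_labels = {"m": "*", "a": "+"}
t_edges = [("m", "a")]
matches = find_matches(g_labels, g_edges, t_labels, t_edges)
```

Exhaustive enumeration like this is only practical for small templates; it is meant to illustrate the problem statement, not to scale.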
Template Matching – Formal Def’n
- Problem 2: Given an infinite number of each of a set of templates Ω = {T1, … , Tk} and an overlapping set of subgraphs of the given graph G(N, E) which are isomorphic to some member of Ω, minimize k as well as Σ xi, where xi is the number of templates of type Ti used, such that the number of nodes left uncovered is the minimum.
[Figure: graph G covered by instances of the templates]
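Problem 2 is a covering problem over overlapping matches. The sketch below is a simple greedy heuristic, not an optimal solver and not necessarily what the authors use: it assumes each match is given as a template name plus the set of graph nodes it covers, prefers larger matches, and skips any match that overlaps one already chosen. The name `greedy_cover` and the example data are hypothetical.

```python
from collections import Counter

def greedy_cover(num_nodes, matches):
    """matches: list of (template_name, frozenset_of_node_ids).

    Returns the chosen non-overlapping matches and the number of uncovered nodes.
    """
    covered = set()
    chosen = []
    # Larger matches first; sorted() is stable, so ties keep input order.
    for name, nodes in sorted(matches, key=lambda m: -len(m[1])):
        if covered.isdisjoint(nodes):
            chosen.append((name, nodes))
            covered |= nodes
    return chosen, num_nodes - len(covered)

# Hypothetical example: a 5-node graph with four overlapping matches.
matches = [
    ("T1", frozenset({0, 1, 2})),
    ("T2", frozenset({2, 3})),
    ("T2", frozenset({3, 4})),
    ("T2", frozenset({0, 1})),
]
chosen, uncovered = greedy_cover(5, matches)
# Number of instances used per template type (the x_i of Problem 2).
usage = Counter(name for name, _ in chosen)
```

In this example the heuristic happens to cover every node with one instance each of two templates; in general a greedy choice can leave nodes uncovered that an exact method would cover.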
Template Generation
- Templates may not always be given as input
- An automatic regularity extraction algorithm must develop its own templates
- Generate a set of templates such that:
  - Number of templates is minimized
  - Covering of the graph is maximized
Related Work
- Useful in a wide variety of CAD applications
  - Data path regularity [Chowdhary 98], [Callahan 99]
  - Scheduling [Ly 95]
  - System partitioning [Rao 93]
  - Low power design [Mehra 96]
  - Soft macros: CPR [Cadambi 99] for PipeRench architecture
An Algorithm for Simultaneous Template Generation and Matching
Formal Definition
1. Given a labeled digraph G(V, E)
2. # C is a set of edge types
3. C ← ∅
4. while (stop_conditions_not_met(G))
5.     C ← profile_graph(G)
6.     cluster_common_edges(G, C)
Informal Definition
1. Find the most common edge type
2. Contract common edges
3. Repeat until stopping condition met
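The loop above can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: it assumes an edge type is the pair (source label, destination label), that only the single most common type is clustered per iteration, that non-overlapping edges are picked greedily in input order, and that the loop stops when the most common type occurs fewer than `min_count` times. The supernode label format and `min_count` are hypothetical.

```python
from collections import Counter

def profile_graph(labels, edges):
    """Count each edge type, i.e. each (source label, destination label) pair."""
    return Counter((labels[u], labels[v]) for u, v in edges)

def cluster_common_edges(labels, edges, edge_type):
    """Contract non-overlapping edges of the given type into supernodes."""
    used, merged_into = set(), {}
    for u, v in edges:
        if (labels[u], labels[v]) == edge_type and u not in used and v not in used:
            used.update((u, v))
            merged_into[v] = u  # v is absorbed into u
    new_label = "(" + edge_type[0] + "->" + edge_type[1] + ")"
    supernodes = set(merged_into.values())
    new_labels = {n: (new_label if n in supernodes else lab)
                  for n, lab in labels.items() if n not in merged_into}
    new_edges = set()
    for u, v in edges:
        u2, v2 = merged_into.get(u, u), merged_into.get(v, v)
        if u2 != v2:  # drop the contracted edge itself
            new_edges.add((u2, v2))
    return new_labels, sorted(new_edges)

def generate_templates(labels, edges, min_count=2):
    templates = []
    while edges:
        edge_type, count = profile_graph(labels, edges).most_common(1)[0]
        if count < min_count:  # no frequently occurring edge type left
            break
        labels, edges = cluster_common_edges(labels, edges, edge_type)
        templates.append(edge_type)
    return templates, labels, edges

# Hypothetical example: three multiplies feeding two adds.
labels = {0: "*", 1: "*", 2: "+", 3: "*", 4: "+"}
edges = [(0, 2), (1, 2), (3, 4), (2, 4)]
templates, final_labels, final_edges = generate_templates(labels, edges)
```

Because a contracted supernode gets its own label, later iterations can merge supernodes with further operations, which is how templates of more than two operations would emerge in this sketch.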
Explanation of Algorithm
- Profile edges: find most common edge types
- Edge contraction: merge adjacent nodes and maintain connectivity
- Stopping conditions:
  - Reach certain number of templates
  - Graph sufficiently covered
  - No frequently occurring edge type
[Figure: example graph of * and + nodes with the most common edge type highlighted, before and after contracting an edge]
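The three stopping conditions could be checked with a predicate like the one below. This is a sketch under stated assumptions: the thresholds `max_templates`, `target_coverage` and `min_share` are hypothetical parameters, coverage is taken to be the fraction of original nodes already absorbed into templates, and "frequently occurring" is read as the share of the most common edge type among the remaining edges.

```python
from collections import Counter

def stop_conditions_not_met(edge_types, num_templates, covered_nodes, total_nodes,
                            max_templates=10, target_coverage=0.9, min_share=0.10):
    """Return True while template generation should keep running.

    edge_types: one (source label, destination label) pair per remaining edge.
    total_nodes is assumed to be greater than zero.
    """
    if num_templates >= max_templates:
        return False  # reached the template budget
    if covered_nodes / total_nodes >= target_coverage:
        return False  # graph sufficiently covered
    if not edge_types:
        return False  # nothing left to contract
    top_count = Counter(edge_types).most_common(1)[0][1]
    # Continue only if the most common edge type is still frequent enough.
    return top_count / len(edge_types) >= min_share

# Hypothetical example: three "* -> +" edges and one "+ -> +" edge remain.
remaining = [("*", "+")] * 3 + [("+", "+")]
keep_going = stop_conditions_not_met(remaining, num_templates=0,
                                     covered_nodes=0, total_nodes=5)
```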
Algorithm in Action
[Figure: example graph of >>, *, &, % and + operations with four candidate edges, Edge 1 to Edge 4, of the most common type]
- Create conflict graph over Edge 1 to Edge 4
- Determine MIS
- Contract edges 2 and 4 and record the resulting template
- Iteration 2: contract edges again on the reduced graph
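The conflict-graph step can be illustrated with a short Python sketch. It assumes two candidate edges conflict when they share a node, and it uses a greedy minimum-degree heuristic, which yields a maximal independent set but not necessarily a maximum one; the slides do not say which method the authors use. The example edges are hypothetical and are not the graph from the slide.

```python
def build_conflict_graph(candidate_edges):
    """candidate_edges: dict name -> (src, dst). Edges sharing a node conflict."""
    names = list(candidate_edges)
    conflicts = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if set(candidate_edges[a]) & set(candidate_edges[b]):
                conflicts[a].add(b)
                conflicts[b].add(a)
    return conflicts

def greedy_independent_set(conflicts):
    """Repeatedly pick the remaining vertex with the fewest remaining conflicts."""
    remaining = {n: set(nbrs) for n, nbrs in conflicts.items()}
    chosen = []
    while remaining:
        # Ties are broken by name so the result is deterministic.
        pick = min(remaining, key=lambda n: (len(remaining[n]), n))
        chosen.append(pick)
        removed = remaining[pick] | {pick}
        for n in removed:
            remaining.pop(n, None)
        for nbrs in remaining.values():
            nbrs -= removed
    return sorted(chosen)

# Hypothetical example: four candidate edges of the same type along a chain.
candidate_edges = {"e1": (0, 1), "e2": (1, 2), "e3": (2, 3), "e4": (4, 3)}
conflicts = build_conflict_graph(candidate_edges)
to_contract = greedy_independent_set(conflicts)
```

The edges in `to_contract` share no nodes, so they can all be contracted in the same iteration.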
Algorithm Summary
- Algorithm can be generalized and used in a variety of applications
  - Easily extended to hypergraphs
  - Input/output pin restrictions can easily be added
  - Performs template generation and matching simultaneously
- We target the algorithm towards VPB generation in SPS
Experimental Setup
- Set of applications specified in C
- SUIF & Machine-SUIF produce the Control Flow Graph
- Dataflow Graph Generation Pass produces the Control Dataflow Graph
[Figure: example dataflow graph of + and * operations]
Experimental Setup
- MediaBench files
- Compile to CDFGs (Control Dataflow Graphs)
- Perform Template Generation and Matching
- Gather Statistics: Graph Coverage, Num. Templates
[Figure: example dataflow graph of + and * operations]
Experimental Setup - Benchmarks
- Selected files from MediaBench
Benchmark and C File | Description
mpeg2 motion.c | Motion vector decoding
mpeg2 getblk.c | DCT block decoding
adpcm.c | ADPCM to/from 16-bit PCM
epic convolve.c | 2D general image convolution
jpeg jctrans.c | Transcoding compression
jpeg jdmerge.c | Color conversion
rasta fft.c | Fast Fourier Transform
rasta noise_est.c | Noise estimation functions
gsm_decode.c | GSM decoding
gsm_encode.c | GSM encoding
Similarity Across Applications
Operation | motion | jdmerge | getblk | gsm_dec | jctrans
ADD | 50.3% | 84.6% | 44.5% | 29.6% | 84.6%
MUL | 36.3% | 13.8% | 24.0% | 22.4% | 13.8%
Template Coverage
Template | motion | jdmerge | getblk | gsm_dec | jctrans
ADDADD | 14.5% | 9.1% | 3.2% | 3.6% | 9.1%
ADDMUL | 0.0% | 0.4% | 0.6% | 0.0% | 0.4%
MULADD | 36.3% | 13.0% | 21.5% | 22.4% | 13.0%
MULMUL: 0.0%, 1.3%, 0.0% (only three of the five values survive in the source)
Experimental Results
- Techniques
  - Simple: restrict templates to two operations
  - No restrictions: unlimited number of operations
- Stopping condition: most common edge occurs < x% (x from 5 to 25)
Summary
- Systems need programmability at multiple levels of the computational hierarchy
- Introduced SPS as a Hybrid Reconfigurable System
- Developed an instruction generation algorithm to determine VPB functionality
- Showed that common templates can be found across a similar set of applications
- An efficient covering is possible using simple templates
- Future work: create methods to uncover more complex templates