

  • Number of slides: 19

Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples
Wu, Jinyuan
Fermilab, PPD/EED
April 2007

Introduction
• Short Course (1/2 day):
 – “How to Design Compact FPGA Functions: Resource awareness design practices.”
 – http://www-ppd.fnal.gov/EEDOffice.W/Projects/ckm/comadc/Compact.FPGAdesign.pdf
• Refresher Course (45 min):
 – “Resource Saving in Micro-Computer Software & FPGA Firmware Designs”
 – http://www-ppd.fnal.gov/EEDOffice.W/Projects/ckm/comadc/Resource.Saving.ppt
• This Document:
 – Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples
What can be done with an FPGA?

Example: ADC Using FPGA
[Figure: AMP & Shaper channels feeding FPGA TDC inputs; RC network (R1, R2, C) producing VREF; labels V1 to V4 (input voltages) and T1 to T4 (transition times)]
• Analog signals from the AMP & Shapers are fed directly to FPGA pins.
• FPGA outputs and a passive RC network are used to generate a ramping reference voltage, VREF.
• The input voltages and VREF are compared using the FPGA differential input receivers.
• The times of the transitions, which represent the input voltage values, are digitized by TDC blocks in the FPGA.
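The conversion from a measured transition time back to a voltage can be sketched as follows. This assumes the RC network charges exponentially from 0 V toward a fixed output level starting at t = 0. The constants `V_HIGH` and `TAU` are hypothetical values, not taken from the slides. A real design would calibrate this curve instead of relying on nominal R and C.

```python
import math

# Hypothetical ramp parameters (assumptions, not from the slides): an FPGA
# output drives the RC network from 0 V toward V_HIGH, starting at t = 0.
V_HIGH = 2.5      # volts, assumed FPGA output level
TAU = 100e-9      # seconds, assumed R*C time constant

def vref(t):
    """Reference voltage of the charging RC ramp at time t (seconds)."""
    return V_HIGH * (1.0 - math.exp(-t / TAU))

def crossing_time(v_in):
    """Time at which the ramp crosses v_in; this is what the TDC records."""
    return -TAU * math.log(1.0 - v_in / V_HIGH)

def time_to_voltage(t):
    """Convert a TDC-measured transition time back into the input voltage."""
    return vref(t)

if __name__ == "__main__":
    lsb = 0.69e-9  # TDC bin width in seconds, as quoted on the TDC slide
    for v in (0.3, 0.8, 1.6):
        t = crossing_time(v)
        t_meas = round(t / lsb) * lsb  # quantize the time as a TDC would
        print(f"V_in={v:.2f} V  T={t * 1e9:.2f} ns  V_rec={time_to_voltage(t_meas):.4f} V")
```

The ramp is nonlinear, so one time bin corresponds to a different voltage step at different points on the curve. That mapping is the reason the inverse function, or a calibration table, is needed.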

TDC Inside FPGA (Multiple Sampling, Clock Domain Changing)
[Figure: sampling registers Q0 to Q3 clocked by phases c0, c90, c180, c270; clock domain changing registers QD, QE, QF; Transition Detection & Encode; Coarse Time Counter; outputs DV, T0, T1, TS; 4 channels]
• Sampling rate: 360 MHz x 4 phases = 1.44 GHz.
• LSB = 1/1.44 GHz = 0.69 ns.
• Logic elements with critical timing are assigned as shown.
• Logic elements with non-critical timing are placed freely by the fitter of the compiler.
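The arithmetic behind the 1.44 GHz effective rate can be checked with a behavioral model. In this simplified sketch, four registers sample a clean rising edge at the c0, c90, c180 and c270 phases of a 360 MHz clock. The first phase that sees the edge becomes the fine time, and the cycle count becomes the coarse time. This models the idea only, not the slide's actual circuit or its clock-domain crossing.

```python
# Behavioral sketch (assumption: an ideal rising edge, no metastability).
F_CLK = 360e6
PERIOD = 1.0 / F_CLK          # about 2.78 ns
PHASES = 4                    # c0, c90, c180, c270
LSB = PERIOD / PHASES         # about 0.69 ns -> 1.44 GHz effective sampling

def sample(t_hit, cycle):
    """Values latched by the four phase registers during one clock cycle.

    Register k samples at cycle*PERIOD + k*LSB and reads 1 once the edge has arrived.
    """
    return [1 if cycle * PERIOD + k * LSB >= t_hit else 0 for k in range(PHASES)]

def tdc(t_hit, max_cycles=1000):
    """Return the hit time in LSB units: coarse count * 4 + fine phase."""
    for coarse in range(max_cycles):       # coarse time counter
        q = sample(t_hit, coarse)
        if any(q):
            fine = q.index(1)              # transition detection & encode
            return coarse * PHASES + fine
    raise ValueError("hit outside the measurement window")

if __name__ == "__main__":
    print(f"LSB = {LSB * 1e9:.3f} ns, code for 5.3 ns hit = {tdc(5.3e-9)}")
```

The reported time is the first sample at or after the edge, so the quantization error always lies between 0 and one LSB.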

ADC Test: Waveform Digitization on BD3_19
[Figure: FPGA TDC test setup with the VREF network (labeled values 50, 50, 100 and 1000 pF); input waveform, overlap trigger and reference voltage traces; raw and converted data]
A lot can be done with an FPGA if one can imagine it.

Micro-computing vs. Reconfigurable Computing
[Figure: computing (100+3-4)*5+7 = ?; data 100, 3, 4, 5, 7; control LD, (+), (-), (*), (+); a CPU driven by Data and Program vs. an FPGA defined by Configuration]
• In a microprocessor, users specify a program that runs on fixed logic circuits.
• In an FPGA, users specify the logic circuits (as well as the program).
• FPGA computing need not follow microprocessor architectures (but useful experience can be borrowed).
• The usefulness of FPGA reconfigurable computing is yet to be fully appreciated.
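The slide's example expression can be evaluated in two ways to show the contrast. The first is a CPU-style accumulator machine that reuses one ALU under a program. The second is an FPGA-style dataflow in which every operator is its own circuit. The `PROGRAM` encoding is a hypothetical illustration, not the slide's instruction set.

```python
import operator

# Data and control sequence from the slide: (100 + 3 - 4) * 5 + 7
DATA = [100, 3, 4, 5, 7]
PROGRAM = [("LD", None), ("+", operator.add), ("-", operator.sub),
           ("*", operator.mul), ("+", operator.add)]

def run_cpu(data, program):
    """Accumulator machine: every instruction goes through the same ALU, one per step."""
    acc = 0
    for value, (name, op) in zip(data, program):
        acc = value if name == "LD" else op(acc, value)
    return acc

def fpga_dataflow(a, b, c, d, e):
    """Fixed dataflow: in hardware each line is a separate operator circuit,
    all present at once, so a pipeline could accept new data every clock."""
    s1 = a + b      # adder #1
    s2 = s1 - c     # subtractor
    s3 = s2 * d     # multiplier
    return s3 + e   # adder #2

if __name__ == "__main__":
    print(run_cpu(DATA, PROGRAM), fpga_dataflow(*DATA))
```

Both give the same result. The difference is in resources: the CPU model spends time reusing one unit, and the dataflow model spends silicon on dedicated units.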

Example: Track Fitting
[Figure: track crossing detector planes at (z - z0) = -4, -2, 0 (z = z0), +2, +4; y axis marked -4h, 0, 4h]

Relative Errors of Several Track Fitter Schemes
[Figure: relative errors of the Least Square Fitter and the Multiplier-less FPGA LS Fitter]

Least Square Fitter
[Figure: coefficient streams c7 to c1, d7 to d1, e7 to e1 and hit stream y7 to y1 feeding three multiplier (X) and accumulator (S) pairs]
• The fit parameters can be described as inner products.
• Hit coordinates and coefficients are fed in simultaneously.
• The inner products can be calculated with multiplier-accumulator structures.
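Each fit parameter is an inner product of the hit coordinates with a precomputed coefficient vector, and this can be sketched directly. The setup here is hypothetical: 7 planes at z = -3 to 3 and a 3-parameter model y = p0 + p1*z + p2*z**2. The slide only shows three coefficient sets (c, d, e). The coefficients are the rows of (A^T A)^-1 A^T, computed offline. The streaming part is the multiply-accumulate loop.

```python
# Hypothetical geometry and model (assumptions): 7 planes, quadratic track.
Z = [-3, -2, -1, 0, 1, 2, 3]

def ls_coefficients(zs):
    """Rows of (A^T A)^-1 A^T: the constant vectors c, d, e, computed offline."""
    A = [[1.0, z, z * z] for z in zs]
    n, m = 3, len(zs)
    # Augmented matrix [A^T A | A^T], reduced to [I | (A^T A)^-1 A^T].
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         + [A[k][i] for k in range(m)] for i in range(n)]
    for col in range(n):                              # Gauss-Jordan elimination
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def mac_fit(coeffs, ys):
    """Multiplier-accumulators: each hit y_i is fed with c_i, d_i, e_i."""
    acc = [0.0] * len(coeffs)
    for i, y in enumerate(ys):            # one hit per clock cycle
        for k, row in enumerate(coeffs):  # parallel MACs in hardware
            acc[k] += row[i] * y
    return acc

if __name__ == "__main__":
    C = ls_coefficients(Z)
    hits = [2.0 + 0.5 * z + 0.1 * z * z for z in Z]
    print(mac_fit(C, hits))
```

Because the coefficients depend only on the geometry, the FPGA never has to invert a matrix. It only streams hits and coefficients into the accumulators.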

Multiplier-less (ML) Quasi-Least Square Fitter
[Figure: input streams x7 to x1 and y7 to y1; coefficient terms (e.g. +1 and 4, -1 and 8, -16 and 128) driving two shift (<<), add/sub (+/-) and accumulate (S) stages]
• The coefficients are described as “two-bit” numbers, e.g.: 5 = 4 + 1; 7 = 8 - 1; 112 = 128 - 16.
• Each multiplication is replaced with two shift & add/sub operations.
• Fetching a measurement point (i.e., y1, y2, etc.) takes two clock cycles, which allows two shift & add/sub operations.
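The “two-bit” coefficient idea can be sketched in software. Each integer coefficient is approximated by the sum of two signed powers of two, and the product is then formed with two shift & add/sub steps, one per clock cycle. The brute-force search for the closest pair is an illustrative choice, not necessarily how the original coefficients were chosen.

```python
def terms(max_shift=16):
    """Candidate terms (sign, shift): zero plus +/- 2**k for k = 0..max_shift."""
    out = [(0, 0)]
    for k in range(max_shift + 1):
        out += [(1, k), (-1, k)]
    return out

def term_value(t):
    sign, shift = t
    return sign * (1 << shift)

def two_term(c, max_shift=16):
    """Closest approximation of integer c by two signed powers of two."""
    pairs = ((t1, t2) for t1 in terms(max_shift) for t2 in terms(max_shift))
    return min(pairs, key=lambda p: abs(c - term_value(p[0]) - term_value(p[1])))

def shift_add_multiply(y, pair):
    """Replace y * c with two shift & add/sub operations."""
    acc = 0
    for sign, shift in pair:       # one operation per clock cycle
        acc += sign * (y << shift)
    return acc

if __name__ == "__main__":
    for c in (5, 7, 112, 11):
        pair = two_term(c)
        print(c, pair, shift_add_multiply(1, pair))
```

Coefficients such as 5, 7 and 112 are exact. Others, such as 11, are only approximated, which is why the fitter is called “quasi” least square.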

Inaccuracy Doesn’t Matter a Lot of the Time
[Figure: results of the Multiplier-less Quasi-Least Square FPGA Fitter compared with the Least Square Fitter]

Fitting is easy. Matching hits is harder.
[Figure: hit-matching approaches compared by complexity, grouped as Software, FPGA Typical and FPGA Resource Saving Approaches. Labels: nested loops for(){ for(){…} } at O(n^2), O(n^3) and O(n^4); O(n)*O(N); Comparator Array, O(n)*O(N^2); CAM; Hough Transform; Hash Sorter in RAM, O(n)*O(N); Tiny Triplet Finder, O(n)*O(N*log N)]
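Matching by binned lookup in RAM avoids comparing every pair, and a hypothetical doublet-matching task can show this. Each hit in layer A is paired with hits in layer B that lie within a coordinate window. The bin count `N` and `WINDOW` are assumptions. The second function only follows the general idea of a hash sorter: B hits are written into RAM by coordinate bin, and each A hit reads a fixed number of bins. It does not reproduce the author's Hash Sorter design.

```python
N = 64        # hypothetical number of coordinate bins (RAM depth)
WINDOW = 1    # hypothetical matching window, in bins

def match_nested(a_hits, b_hits):
    """Software style, O(n^2): compare every A hit with every B hit."""
    return sorted((a, b) for a in a_hits for b in b_hits if abs(a - b) <= WINDOW)

def match_binned(a_hits, b_hits):
    """RAM style, O(n): one write per B hit, then a fixed number of reads per A hit."""
    ram = [[] for _ in range(N)]          # one RAM location per coordinate bin
    for b in b_hits:                      # fill phase
        ram[b].append(b)
    pairs = []
    for a in a_hits:                      # lookup phase: 2*WINDOW+1 bins per hit
        for addr in range(max(0, a - WINDOW), min(N, a + WINDOW + 1)):
            pairs += [(a, b) for b in ram[addr]]
    return sorted(pairs)

if __name__ == "__main__":
    a, b = [3, 10, 40], [2, 4, 11, 39, 50]
    print(match_nested(a, b))
    print(match_binned(a, b))
```

Both functions find the same pairs. The binned version replaces the inner loop over all B hits with a short, fixed-length read of neighboring RAM addresses.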

Resource Saving Tricks
• Loop Reduction Tricks: The number of computations in a given task is reduced by (1) using fewer iterations in loops and/or (2) using fewer operations in each iteration.
• Non-Loop Reduction Tricks: The number of computations in a given task is unchanged. FPGA resources are saved by (1) reusing resources multiple times via sequencing and/or (2) using transistor-saving resources such as RAM.

Resource Saving Tricks: Loop Reduction
• Tiny Triplet Finder, O(n)*O(N*log(N)): bit arrays, bit array shifters and bit-wise coincidence logic.
• Recursive implementation of an FIR filter [Figure: taps *h1, *h2, ..., *h[K] accumulated (S) into y[n]; recursive form built from x[n] - x[n-K] and s[n] - s[n-K], with scale factors *R1/R3 and *R2/R3]
• Multiplier-less (ML) approaches: shift (<<), add/sub (+/-) and accumulate (S) in place of a multiplier (X).
• FFT: O(n)*O(log(N)).
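The recursive FIR idea can be shown in its simplest case, a K-tap filter with all coefficients equal to 1 (a moving sum). The general weighted form on the slide is not reproduced here. Computed directly, the filter needs K additions per sample. The recursion s[n] = s[n-1] + x[n] - x[n-K] needs one addition and one subtraction per sample, whatever the value of K.

```python
def fir_direct(x, K):
    """Direct K-tap moving sum: K additions per output sample."""
    return [sum(x[max(0, n - K + 1):n + 1]) for n in range(len(x))]

def fir_recursive(x, K):
    """Recursive form s[n] = s[n-1] + x[n] - x[n-K]: two operations per sample."""
    s, out = 0, []
    for n, xn in enumerate(x):
        s += xn - (x[n - K] if n >= K else 0)   # add newest sample, drop oldest
        out.append(s)
    return out

if __name__ == "__main__":
    x = [3, 1, 4, 1, 5, 9, 2, 6]
    print(fir_direct(x, 3))
    print(fir_recursive(x, 3))
```

In hardware, x[n-K] would come from a K-deep delay line, such as a shift register or a RAM FIFO. That kind of storage is cheaper than K adders.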

Resource Saving Tricks: Non-Loop Reduction
• Sequencing [Figure: Initialization followed by OP1, OP2, OP3, OP4 in sequence]
• Using RAM: Hash Sorter/Histogram [Figure: Initialization steps 1, 2, 3 and operations OP4, OP3, OP2, OP1]
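“Using RAM” can be illustrated with a histogram. One approach builds a comparator and a counter for every bin. The other uses a single RAM whose address is the input value, with one read-modify-write per input. The bin count is a hypothetical value. The RAM is cleared before each use, which is our reading of the “Initialization” steps in the figure.

```python
NBINS = 16   # hypothetical number of histogram bins

def histogram_comparators(values):
    """One comparator + counter per bin: NBINS comparisons per input."""
    counts = [0] * NBINS
    for v in values:
        for b in range(NBINS):           # in hardware: NBINS parallel comparators
            if v == b:
                counts[b] += 1
    return counts

def histogram_ram(values, ram=None):
    """A single RAM block serves as the whole counter array."""
    ram = ram if ram is not None else [0] * NBINS
    for addr in range(NBINS):            # initialization: clear every word
        ram[addr] = 0
    for v in values:
        ram[v] = ram[v] + 1              # read, increment, write back
    return ram

if __name__ == "__main__":
    vals = [1, 3, 3, 15, 0, 3]
    print(histogram_comparators(vals))
    print(histogram_ram(vals))
```

The result is the same, but the RAM version needs no per-bin logic. The RAM uses far fewer transistors per bit than logic cells do.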

An Example of Inexplicit Computing & Hidden Resource
[Figure: Deserial. and Input Ctrl blocks feeding a RAM (WA, W/R, RA) addressed by BCO; bus widths 16 and 32; Hit(s) output]
• Data with random time stamps are re-ordered according to beam crossing (BCO).
• Data with the same BCO are output together, and the bandwidth becomes smaller.
• Inexplicit computing (sorting) is performed with a hidden resource (RAM; it should be static RAM, not dynamic RAM).
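The RAM-based re-ordering can be sketched behaviorally. Hits arriving with random BCO stamps are written to the RAM region addressed by BCO (modulo a hypothetical depth `DEPTH`) and are then read out one BCO at a time. The sort never appears as an explicit algorithm: it falls out of the addressing. Timing and the real write/read arbitration are not modeled.

```python
DEPTH = 8   # hypothetical number of BCO slots held in the RAM

def reorder_by_bco(hits, first_bco, n_bco):
    """hits: (bco, data) pairs in arrival order, all within the read-out window."""
    if n_bco > DEPTH:
        raise ValueError("window larger than RAM depth")
    ram = [[] for _ in range(DEPTH)]
    for bco, data in hits:                            # write side: address = BCO
        ram[bco % DEPTH].append(data)
    out = []
    for bco in range(first_bco, first_bco + n_bco):   # read side: sequential BCO
        slot = ram[bco % DEPTH]
        out.append((bco, list(slot)))
        slot.clear()                                  # free the slot for reuse
    return out

if __name__ == "__main__":
    hits = [(5, "a"), (3, "b"), (5, "c"), (4, "d"), (3, "e")]
    for bco, group in reorder_by_bco(hits, 3, 3):
        print(bco, group)
```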

Why Save Resources? Why Not?

The Fever of Moore’s Law vs. Maxwell’s Equations
[Figure: Op/sec versus year, 1998 to 2010, annotated WRW; label: MIT, 2002]
• During the hot days of Moore’s Law, the rules of thumb were:
 – BRB: Buy Rather than Build
 – URU: Use Rather than Understand
 – WRW: Wait Rather than Work
• From fundamental principles such as Maxwell’s Equations, we know that Moore’s Law has limits. Technology advances should come from:
 – The I3 Law: Imagination, Innovation & Implementation.

Total Useful Work = (Clock Frequency) x (Silicon Size) x (Efficiency)
[Figure: factors F, S and E; E is primarily the users’ responsibility]
• There is much room to improve computation efficiency in both microcomputer software and FPGA firmware.
• Resource awareness saves not only direct cost but also indirect costs such as power consumption, PC board layout and cooling.
• Unnecessary artificial complexity confuses people, often including the designer.
• Resource saving helps today, when technology stalls.
• Resource saving helps in the future, as technology progresses.