Progress Report: FPGA-based Infrastructure
Henry Chen, henryic@ee.ucla.edu
June 11, 2010

Motivation
• Architectural & algorithmic exploration/optimization
• High-performance/high-throughput computation
• Closed-loop test environment [1, 2]

Platform Architecture [3]
• Large design effort; amortize widely
• As general-purpose as possible
  – Large memories
  – High I/O bandwidth
• Use embedded CPU to provide high-level interface to FPGA resources

IBOB
• IBOB (Interconnect Break-Out Board)
  – 1x Virtex-II Pro (FPGA + PowerPC 405)
  – 2x 18 Mb (36-bit) SRAMs (~250 MHz)
  – 2x CX4 10 Gb high-speed serial
  – 2x Z-DOK+ high-speed differential GPIO (80 diff pairs)
  – 80x LVCMOS/LVTTL GPIO
• RS232 UART to PPC; major I/O bottleneck
  – read_xps/write_xps
• Our primary test platform; have 2 in-house

ROACH
• ROACH (Reconfigurable Open Architecture Compute Hardware)
  – 1x Virtex-5 FPGA
  – External PPC 440
  – 1x DDR2 DIMM
  – 2x 72 Mbit (18-bit) QDR SRAMs (~350 MHz)
  – 4x CX4
  – 2x Z-DOK+ (80 diff pairs)
• External PPC provides much faster interface to FPGA resources (1 GbE)
• None in-house (for now)

BEE2
• BEE2 (Berkeley Emulation Engine)
  – 5x Virtex-II Pro
  – 20x DDR2 DRAM DIMMs (200 MHz)
  – 18x CX4 ports
• High-End Reconfigurable Computer
  – High I/O bandwidth per FPGA
  – High memory capacity per FPGA
• Have one in-house

BORPH [4]
• Linux kernel modification for hardware abstraction; runs on embedded CPU connected to FPGA
• “Hardware process”
  – Programming an FPGA is like running a Linux executable
  – Some FPGA resources accessible in Linux process memory space
• Makes FPGA board look just like Linux workstation
• Used on BEE2, ROACH; limited version on IBOB w/ expansion board

Design Environment
• Simulink
  – Schematic-like
  – Integration w/ Matlab for analysis
• Good for dataflow designs (i.e., DSP)
• Designed by BWRC, now maintained by international collaboration
• Tutorials aplenty! See wiki

Design Environment
• Xilinx System Generator for Simulink
• Custom DSP and system blocksets
• One-click design compilation

Testing w/ ROACH + KATCP
• Digital frontend receiver (Rashmi)

[Figure: test setup block diagram. Labels: Matlab, 1 GbE, PowerPC, FPGA, BRAM, QDR SRAM, LVDS IO, ASIC Test Board, ASIC]

Testing Requirements
• High TX clock rate (400 MHz target)
  – Beyond practical limits of IBOB’s V2P
• Long test vectors (~4 Mb)
• Asynchronous clock domains for TX and RX

Asynchronous Clock Domains
• Easily supported by FPGA hardware
• XSG has very limited capability for expressing multiple clocks; CE toggling
• Further restricted by bee_xps tool automation; assumes single-clock design (though many different clocks available)

Asynchronous Clock Domains
• Manually merged separate designs for test vector and readback datapaths
[Figure labels: Fixed 60 MHz RX; 255-315 MHz TX]

Results
• Test up to 315 MHz w/ loadable vectors in QDR; up to 340 MHz with pre-compiled vectors in ROMs
• 55 dB SNR @ 20 MHz bandwidth

Limitations
• DDR output FF critical path @ 340 MHz (clock out)
• QDR SRAM bus interface critical path @ 315 MHz
• Output clock jitter?
• LVDS receivers usually only 400-500 Mbps
  – OK for data, not good for faster clocks
  – Get LVDS I/O cells?
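As a rough check on the LVDS point above: a clock toggles twice per period, so a forwarded clock at f MHz needs the same edge rate as a 1010... data pattern at 2f Mbps. The sketch below assumes single-data-rate data (one bit per TX clock cycle) and takes 500 Mbps, the upper figure quoted above, as the receiver limit. Both are illustrative assumptions, not measured values.

```python
# Rough edge-rate comparison for an LVDS receiver rated in Mbps.
# Assumptions (not measurements): data is single-data-rate, and the
# receiver limit is 500 Mbps.

LVDS_LIMIT_MBPS = 500.0


def data_rate_mbps(clock_mhz: float) -> float:
    """SDR data carries one bit per clock cycle."""
    return clock_mhz


def clock_equivalent_mbps(clock_mhz: float) -> float:
    """A clock has two transitions per cycle, like a 1010... stream at 2x the rate."""
    return 2.0 * clock_mhz


def within_limit(rate_mbps: float, limit_mbps: float = LVDS_LIMIT_MBPS) -> bool:
    """True if the required rate does not exceed the receiver rating."""
    return rate_mbps <= limit_mbps


if __name__ == "__main__":
    for f_mhz in (315.0, 340.0):
        print(
            f_mhz,
            "data ok:", within_limit(data_rate_mbps(f_mhz)),
            "clock ok:", within_limit(clock_equivalent_mbps(f_mhz)),
        )
```

Under these assumptions the data lines stay inside the rating at both tested rates, while the forwarded clock does not, which matches the "OK for data, not good for faster clocks" observation.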

Future Design Recommendations
• Send source-synchronous clock with returned data
• Send synchronization information with returned data
  – “Vector warning” or frame start
  – Data valid
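One way to picture these recommendations: if the returned stream carries a frame-start flag and a data-valid flag, the capture side can align itself without guessing the round-trip latency. The sketch below is a software model only; the tuple layout `(frame_start, valid, data)` and the name `capture_frame` are hypothetical, not part of any existing design.

```python
from typing import Iterable, List, Tuple

# Hypothetical per-cycle sample layout: (frame_start, valid, data)
Sample = Tuple[bool, bool, int]


def capture_frame(stream: Iterable[Sample], length: int) -> List[int]:
    """Collect `length` valid words, beginning at the first frame_start."""
    captured: List[int] = []
    started = False
    for frame_start, valid, data in stream:
        if not started and frame_start:
            started = True  # everything before this point is ignored
        if started and valid:
            captured.append(data)
            if len(captured) == length:
                break
    return captured


if __name__ == "__main__":
    stream = [
        (False, True, 9),   # valid, but before frame start: ignored
        (False, False, 0),
        (True, True, 1),    # frame start
        (False, False, 0),  # bubble: data-valid low
        (False, True, 2),
        (False, True, 3),
        (False, True, 4),   # beyond requested length
    ]
    print(capture_frame(stream, 3))
```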

KATCP
• Comm. protocol interfacing to BORPH
• Can be implemented over TCP telnet connection
• Libraries and clients for C, Python
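To make the "text over a TCP connection" point concrete, the sketch below formats and parses a KATCP-style message line: a type character (`?` request, `!` reply, `#` inform), a name, and whitespace-separated arguments, terminated by a newline. It handles only a subset of the escape sequences and no message IDs. The request name `read` and the register name `example_reg` are illustrative assumptions, not documented commands of any particular server.

```python
# Minimal KATCP-style line formatting (subset of escapes; sketch only).

ESCAPES = [
    ("\\", "\\\\"),  # backslash first, so later escapes are not re-escaped
    (" ", "\\_"),
    ("\t", "\\t"),
    ("\n", "\\n"),
    ("\r", "\\r"),
]
UNESCAPES = {"\\": "\\", "_": " ", "t": "\t", "n": "\n", "r": "\r", "@": ""}


def escape_arg(arg: str) -> str:
    """Escape one argument so it contains no raw whitespace."""
    if arg == "":
        return "\\@"  # marker for an empty argument
    for raw, esc in ESCAPES:
        arg = arg.replace(raw, esc)
    return arg


def unescape_arg(arg: str) -> str:
    """Reverse escape_arg in a single left-to-right pass."""
    out = []
    chars = iter(arg)
    for ch in chars:
        if ch == "\\":
            code = next(chars, "")
            out.append(UNESCAPES.get(code, code))
        else:
            out.append(ch)
    return "".join(out)


def format_message(mtype: str, name: str, args=()) -> bytes:
    """Build one message line, e.g. b'?read example_reg 0 4\\n'."""
    if mtype not in ("?", "!", "#"):
        raise ValueError("message type must be '?', '!' or '#'")
    words = [mtype + name] + [escape_arg(str(a)) for a in args]
    return (" ".join(words) + "\n").encode("ascii")


def parse_message(line: bytes):
    """Split one message line into (type, name, [arguments])."""
    words = line.decode("ascii").rstrip("\r\n").split()
    head, raw_args = words[0], words[1:]
    return head[0], head[1:], [unescape_arg(a) for a in raw_args]


if __name__ == "__main__":
    msg = format_message("?", "read", ["example_reg", 0, 4])
    print(msg)
    print(parse_message(msg))
```

Because each message is a single line of text, the same bytes can be typed by hand into a telnet session, which is convenient for debugging.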

KATCP Matlab Client
• For our purposes, replaces read_xps, write_xps
• Can program FPGA directly from Matlab; no more JTAG cable!
• Provides byte-level read/write granularity
• Increases speed from ~KB/s to ~MB/s
  – Room for improvement; currently high protocol overhead
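For a sense of scale, the sketch below estimates how long a ~4 Mb test vector (the size quoted under Testing Requirements) takes to load at the order-of-magnitude rates above. The specific rates of 1 KB/s and 1 MB/s stand in for "~KB/s" and "~MB/s", and "Mb" is read as megabits; these are assumptions, not measurements.

```python
# Back-of-the-envelope load time for one test vector.
# Assumed values: 4e6 bits per vector, 1 KB/s serial, 1 MB/s Ethernet.

VECTOR_BITS = 4e6
SERIAL_BYTES_PER_S = 1e3
ETHERNET_BYTES_PER_S = 1e6


def transfer_seconds(size_bits: float, rate_bytes_per_s: float) -> float:
    """Time to move size_bits at the given byte rate, ignoring protocol overhead."""
    return (size_bits / 8.0) / rate_bytes_per_s


if __name__ == "__main__":
    print("serial  :", transfer_seconds(VECTOR_BITS, SERIAL_BYTES_PER_S), "s")
    print("ethernet:", transfer_seconds(VECTOR_BITS, ETHERNET_BYTES_PER_S), "s")
```

With these placeholder rates the load time drops from several minutes to well under a second per vector.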

Towards Streaming
• Transition to TCP/IP-based protocols facilitates streaming
• Osort test vectors: 10 Mb of data at ~Mb/s (IBOB)
  – Single-vector load and read via SRAM
  – LWIP UDP read/write_xps
  – Ethernet streaming w/o going through shared memory

New Windows Server(s)
• dsp experiencing severe stability problems
• eecls-{1,2,3,4}.ee.ucla.edu
  – Windows Server 2008 (32-bit)
  – Matlab R2007b (+ XSG 10.1)
  – Matlab R2009b (+ XSG 11.5, Synphony 2009.12)
  – Xilinx Suite 10.1
  – Xilinx Suite 11.5
  – ModelSim 6.6a
  – Synplify 2010.03
• sherwin is now a print server

References
[1] D. Marković et al., “ASIC Design and Verification in an FPGA Environment,” IEEE CICC, 2007.
[2] D. Marković, UCLA EEM216A, Fall 2008, Lecture 20.
[3] C. Chang et al., “BEE2: A High-End Reconfigurable Computing System,” IEEE Design & Test of Computers, 2005.
[4] H. So and R. Brodersen, “A Unified Hardware/Software Runtime Environment for FPGA-Based Reconfigurable Computers using BORPH,” ACM TECS, 2008.