6d101a545bb9ecfb736c2a707dc4df5a.ppt
- Number of slides: 58
1 IP & SoC Verification 경종민 kyung@ee.kaist.ac.kr
2 Contents
• IP Verification
  – Cycle-level
  – Transaction-level
  – Testbench build-up
  – Hardware debugging
  – An example (iPROVE)
• SoC Verification
  – Design flow
  – Multi-level, multi-lingual verification
  – Multiple-FPGA set-up
  – Debugging
  – An example (iSAVE)
3 IP verification
• Important issues
  – IP reuse
  – Testbench reuse
  – Debuggability
• Testbench issues
  – Various testbench support
    • HLL: C/C++
    • HDL: Verilog and VHDL
    • De facto standards: SCE-MI, SystemC, OpenVera and so on
  – Various levels of testbench
    • Transaction-level: control at the command level, e.g., read/write
    • Cycle-level: pin-by-pin control
    • Abstract-bus-level: standard on-chip networks
4 Cycle-level verification
[Figure: the testbench (C/HDL) and the device driver form the S/W simulation part; a PCI channel connects them to the FPGA part, which holds the PCI controller, the buffer/pin signal generator and the DUT (HDL).]
5 Cycle-level verification
• SW: Testbench
  – Modeled in HDL or C
  – Generates stimulus at every clock cycle
  – Checks the result of the DUT at every clock cycle
• HW: DUT
  – Mapped on an FPGA
  – Stimuli are transferred through a system bus, e.g., PCI.
  – All signals are assigned to the DUT concurrently after they have been transferred from the SW testbench.
• Operating speed
  – Faster than SW simulation, because the HDL or C model of the DUT is accelerated in the FPGA.
  – Determined by the interface requirement (number and bandwidth of the signals to be transferred) and by the bandwidth of the interface (PCI).
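The per-cycle loop described above can be sketched in C. This is only an illustration under stated assumptions: the FPGA side is replaced by a software stand-in, and `dut_pins_t`, `dut_step` and `run_cycle_level` are hypothetical names, not part of any iPROVE API. The DUT is assumed to be an 8-bit accumulator with an active-low reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pin-level view of the DUT (an 8-bit accumulator).
   In a real set-up these values would cross the PCI channel every cycle. */
typedef struct {
    uint8_t resetb;  /* active-low reset (input) */
    uint8_t din;     /* data input */
    uint8_t acc;     /* accumulator output */
} dut_pins_t;

/* Stand-in for "advance the DUT by one clock cycle". */
static void dut_step(dut_pins_t *p)
{
    assert(p != NULL);
    p->acc = p->resetb ? (uint8_t)(p->acc + p->din) : 0;
}

/* Cycle-level testbench: drive stimulus, advance the clock and check the
   result at every cycle. Returns the number of mismatching cycles. */
static int run_cycle_level(const uint8_t *stimulus, int n)
{
    dut_pins_t pins = { .resetb = 0, .din = 0, .acc = 0 };
    uint8_t expected = 0;
    int mismatches = 0;

    dut_step(&pins);               /* one cycle with reset asserted */
    pins.resetb = 1;

    for (int i = 0; i < n; i++) {
        pins.din = stimulus[i];    /* assign all inputs for this cycle */
        dut_step(&pins);           /* one clock edge */
        expected = (uint8_t)(expected + stimulus[i]);
        if (pins.acc != expected)  /* check the outputs of this cycle */
            mismatches++;
    }
    return mismatches;
}
```

Every iteration touches every pin, which is why the operating speed in this mode depends on the bandwidth of the channel between testbench and DUT.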
6 Transaction-level verification
[Figure: the testbench and the device driver form the S/W simulation part, with DMA access to main memory; a PCI channel connects them to the FPGA part, which holds the PCI controller, the transactor and the DUT.]
7 Transaction-level verification
• SW: Testbench
  – Modeled in C
  – Generates stimulus and checks the result of the DUT
  – Only the information needed to form a transaction is transferred to the DUT.
    • i.e., command, address and data
• HW: DUT and transactor
  – Mapped on an FPGA
  – The transactor knows how to interpret the transaction and generates from it all the signals the DUT needs.
• Operating speed
  – HW and SW parts operate independently.
  – Faster than both cycle-level verification and SW simulation.
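A transaction and a transactor can be sketched in C as follows. This is a minimal sketch, not the iPROVE transactor: the types `transaction_t` and `pin_cycle_t`, the function `transactor_expand` and the two-cycle protocol (one address/control cycle followed by one idle cycle) are assumptions made for illustration. In the real flow the transactor is hardware mapped on the FPGA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { CMD_READ, CMD_WRITE } cmd_t;

/* What the testbench sends: command, address and data only. */
typedef struct {
    cmd_t    cmd;
    uint32_t addr;
    uint32_t data;
} transaction_t;

/* Hypothetical pin values of a simple memory port for one clock cycle. */
typedef struct {
    uint8_t  cs;     /* chip select */
    uint8_t  we;     /* write enable */
    uint32_t addr;
    uint32_t wdata;
} pin_cycle_t;

/* Transactor: expands one transaction into cycle-by-cycle pin values.
   Returns the number of cycles written, or 0 if `out` is too small. */
static size_t transactor_expand(const transaction_t *t,
                                pin_cycle_t *out, size_t cap)
{
    assert(t != NULL && out != NULL);
    if (cap < 2)
        return 0;

    /* Cycle 0: address/control phase. */
    out[0] = (pin_cycle_t){
        .cs    = 1,
        .we    = (uint8_t)(t->cmd == CMD_WRITE),
        .addr  = t->addr,
        .wdata = (t->cmd == CMD_WRITE) ? t->data : 0
    };
    /* Cycle 1: idle, all pins deasserted. */
    out[1] = (pin_cycle_t){ 0 };
    return 2;
}
```

Only the `transaction_t` has to cross the channel; the pin-level detail is produced next to the DUT, which is what lets the HW and SW parts run independently.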
8 Testbench automation
• Overview
  – SCE-MI
  – Vera
  – TestBuilder
9 SCE-API (MI)
• Standard Co-Emulation API (Modeling Interface)
  – SW part implemented in C or C++, with recommendations on the HW implementation
  – Based on IKOS' multi-channel co-modeling technology: TIP (Transaction Interface Portal)
  – SCE-API Consortium
    • Founded June 2000
    • Aptix, CoWare, IKOS, Mentor, STMicroelectronics, Synopsys, TransEDA
  – SCE-API version 1.0 modeling interface
    • SCE-MI v1.0 released through the Open SystemC Initiative (http://www.systemc.org), April 2001
  – Accellera's Interface Technical Committee
    • Merged into ITC, Oct. 2001
  – SCE-DI (Debug Interface) & SCE-CI (Control Interface) in progress
10 Vera
• Vera
  – Functional verification language for testbench description
  – The language specification can be obtained from the OpenVera site (http://www.opera-vera.com)
  – Vera language
    • Object-oriented language
    • Includes HDL features
      – Waiting for a clock event
      – Bit data type, bit operations (extraction, concatenation)
      – Data expectation ('do something when the expectation is hit')
        » 0, 100 bus.ack == 1; // ack must become 1 within 100 cycles
  – Vera verification environment
    • Commercial product from Synopsys
    • Vera source code is compiled and runs with the HDL simulator in which the DUT is simulated.
    • Additional features
      – Automatic stimulus generation, coverage analysis
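The idea behind a data expectation with a cycle window can be illustrated in C. This is not Vera and does not model Vera's exact semantics; `expect_within` is a hypothetical helper that scans a recorded trace of one signal and reports when the expected value first appears inside the window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the cycle index (0-based, counted from the start of the window)
   at which the signal first equals `value`, or -1 if that does not happen
   within `window` cycles or before the trace ends. */
static int expect_within(const uint8_t *trace, size_t len,
                         size_t window, uint8_t value)
{
    assert(trace != NULL);
    size_t limit = (window < len) ? window : len;
    for (size_t i = 0; i < limit; i++) {
        if (trace[i] == value)
            return (int)i;
    }
    return -1;  /* expectation missed */
}
```

With `trace` holding the sampled `ack` values, `expect_within(trace, len, 100, 1)` plays the role of the "ack must become 1 within 100 cycles" check; a return value of -1 would be reported as a failed expectation.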
11 Vera
[Figure: Vera tool flow with the blocks Vera compiler, Vera shell, Vera PLI, HDL simulator and DUT. File types: .vr (Vera source), .vrh (Vera header), .vro (Vera object), .vrl (Vera list). Legend: supplied by user; automatically generated by the Vera compiler.]
12 TestBuilder
• Transaction-based verification
  – Functional verification at a higher level of abstraction
  – The engineer develops tests from a system-level perspective
  – Advantages
    • Enhances reusability of each component in the testbenches
    • Improves debugging and coverage analysis
[Figure: transaction-level tests drive a signal-level TVM (transactor), which drives the design. TVM: Transaction Verification Model]
13 TestBuilder
• How TestBuilder operates
  – Transaction-level tests (C/C++/TestBuilder)
      while () {
          Tx.send_packet(..);
          Mem.expect_write(..);
          ..
      }
  – Signal-level TVM (implementable using TestBuilder/HDL)
      Tx.send_packet(..) {
          header = "hd";
          address = 0xff0011;
          data = 0xff0011;
      }
[Figure: the tests reach the TVM and the DUV in the HDL simulation through a C library (PLI/FLI).]
14 Specman
• Functional testbench automation tool by Verisity (http://www.verisity.com)
  – Its concept is similar to Vera, but it started earlier and is more widely used.
• The user specification is described in the e language
[Figure: the interface spec and test plan written in e, together with legacy code in C/VHDL/Verilog, feed Specman Elite, which provides automatic testbench generation, data and temporal checking, and coverage analysis for the DUT.]
15 Debugging feature
• Built-In Logic Analyzer (BILA)
  – DUT boundary: ports
  – DUT internal: internal nodes
[Figure: PC connected over PCI to the iPROVE board.]
16 Hardware debugging schemes
• Low-speed scheme
  – Operating speed: < 10 MHz
  – There is no dedicated storage element in the device.
  – All debugging information is transferred to main memory or a large storage device at every cycle.
  – The readback scheme of Xilinx devices is a typical example.
  – Usually, the scheme needs only a small number of IO pins.
    • JTAG interface: 4 pins (TCK, TDI, TMS, TDO)
    • 8-bit parallel interface (CLK, INIT, CS, RW, D[7:0])
17 Hardware debugging schemes
• High-speed scheme
  – Operating speed: < 100 MHz
  – There are several dedicated storage elements, which can be internal or external memories.
  – All debugging information is stored in the dedicated elements.
  – Typical examples
    • Xilinx: ChipScope
    • Altera: SignalTap II
18 What is iPROVE
• iPROVE is a small-scale design verification tool that provides APIs for interfacing with C/C++, HDL and de facto standards.
• API
  – Proprietary C/C++ API
  – Proprietary Verilog API
  – SCE-API/MI
[Figure: the testbench and/or the remaining blocks in C, HDL and/or SystemC connect over the PCI bus to the IP in HDL/EDIF.]
19 iPROVE tool positioning
[Figure: running speed (10 Hz to 100 MHz) versus investment for SW simulator, HW accelerator, iPROVE, HW emulator, rapid prototype and real silicon.]
20 iPROVE typical usage: IP verification
• IP verification without prototyping
[Figure: the testbench on the PC connects over PCI to the IP in iPROVE. Three modes are shown: cycle-level verification (automatically generated module, signal information, DUT signals), transaction-based verification (test, transactors, transactions, DUT signals) and abstract-bus-based verification (test, BFM, bus, DUT). Other labels: DPP, interactive IO.]
21 iPROVE typical usage: DPP
[Figure: large-size data moves between the PC and a multimedia board through the iPROVE DPP over PCI.]
22 iPROVE structure
• User design
  – Verilog, VHDL
• User testbench
  – C/C++, Verilog, VHDL
• OS
  – Windows 2000 or XP
  – Linux
• De facto standards
  – SCE-MI/API
  – SystemC
  – OpenVera
  – TestBuilder
• API
  – C/C++
    • Visual C
    • Borland C
    • GNU GCC under Cygwin
  – Verilog
23 iPROVE design flow
24 iPROVE design flow
[Figure: flow stages synthesis, P&R, compilation and execution, grouped under mapping, running the testbench, and debugging with BILA.]
25 Cycle-level with Verilog (1/3) • A simple ALU example
26 Cycle-level with Verilog (2/3)
• Step 1: Start with the EDIF of the ALU (a synthesizer is needed)
• Step 2: Make the FPGA mapping data
• Step 3: Modify the testbench by inserting PLIs for iPROVE
• Step 4: Run the ALU with iPROVE and the HDL simulator
  – The testbench runs on the host computer.
  – The DUT goes to iPROVE.
27 Testbench example (Cycle-level)

Testbench:

    `define CARD_ID 0
    module alu_top();
      // inputs and outputs
      …
      always #5 clk = ~clk;
      initial begin
        $dumpfile("alu.vcd");
        $dumpvars();
    `ifdef iPROVE
        $iProveOpenCard(`CARD_ID);
        $iProveInitCard(`CARD_ID, "ALU.tcf");
        $iProveLoadModuleInfoFile(`CARD_ID, "ALU.mit");
        $iProveCycLoadSignalInfoFile("alu", "ALU.pin");
    `endif
        clk = 1'b0;
        resetb = 1'b1;
        repeat (2) @(posedge clk);
        resetb = 1'b0;
        repeat (2) @(posedge clk);
        resetb = 1'b1;
        // other testbench codes
    `ifdef iPROVE
        $iProveCloseCard(`CARD_ID);
    `endif
        $stop;
      end
    endmodule

DUT selection (alu_proxy is automatically generated by the iPROVE software):

    `ifdef iPROVE
      alu_proxy(…)
    `else
      alu(…)
    `endif

DUT instance:

    alu_sim(.resetb(resetb), .clk(clk), .cmd(cmd),
            .src1(op1), .src2(op2), .cin(carry),
            .result(result), .cf(cf), .vf(vf),
            .nf(nf), .zf(zf));
    // other testbench codes

Inside alu_proxy:

    $iProveCycSignalWrite(modhl_alu, sighdl_reset, reset);
    $iProveCycSignalWrite(modhl_alu, sighdl_cmd, cmd);
    …
    $iProveCycClockAdvanceByModule(modhl_alu, sighdl_clk);
    $iProveCycSignalRead(modhl_alu, sighdl_cf, cf);
    $iProveCycSignalRead(modhl_alu, sighdl_vf, vf);
    …

* alu_proxy is the image of the ALU mapped on the FPGA.
* $iProve…; is a system task for iPROVE, defined as a PLI routine.
28 Transaction-level with C (1/3) • A simple SSRAM example
29 Transaction-level with C (2/3)
• Step 1: Start with the EDIF of the SSRAM (a synthesizer is needed)
• Step 2: Make the FPGA mapping data
• Step 3: Modify the testbench by inserting PLIs for iPROVE
• Step 4: Run the SSRAM with iPROVE and the HDL simulator
  – The DUT and the transactor go to iPROVE.
  – The testbench runs on the host computer.
30 Testbench example (Transaction-level)

    #include "iprove.h"

    void TestBench(void);  /* declared before use in main() */

    int main(int argc, char** argv)
    {
        // other codes
        iProveOpenCard(card_id);
        iProveInitCard(card_id, tcf);
        iProveLoadModuleInfoFile(card_id, mit);
        iProveGetModuleHandle(instance_name, &module_handle);
        iProveAllocReadBuffer(module_handle, sbm_size);
        iProveAllocWriteBuffer(module_handle, sbm_size);
    #ifdef BILA
        /* configure and arm the built-in logic analyzer */
        iProveBILAConfig(card_id, trg);
        iProveBILATrigOn(card_id);
    #endif
        iProveStart(card_id);
        TestBench();
    #ifdef BILA
        /* upload the dump and convert it to VCD */
        iProveBILAUpload(card_id, dmp);
        bila_info.cid = card_id;
        bila_info.dump_filename = dmp;
        bila_info.signallist_filename = lst;
        WithCheck(iProveDump2Vcd(&bila_info, 1, vcd));
    #endif
        iProveStop(card_id);
        iProveCloseCard(card_id);
        return 0;
    }

    void TestBench(void)
    {
        // other codes
        iProveCmdWrite(module_handle, &cmd, 1);
        iProveDataWrite(module_handle, pbuf, num, &tmp);
        // other codes
        iProveCmdWrite(module_handle, &cmd, 1);
        iProveDataRead(module_handle, pbuf, num, &tmp);
        // other codes
    }
31 Performance comparisons
[Figure: performance chart for four designs: IDCT (59K gates), FPACC0 (56K gates), FPACC1 (104K gates), FPACC2 (208K gates).]
32 iPROVE performance
• iPROVE provides a speed-up of more than x2000.
• Example
  – FPACC2 (floating-point number calculation IP)
  – Gate count: 208,479
  – Logic usage: 99% of one XCV1000E
• Speed relative to ModelSim
  – ModelSim: x1
  – iPROVE with ModelSim: x47
  – iPROVE with cycle-level C-API: x69
  – iPROVE with transaction-level C-API: x2053
33 iPROVE summary
• Easy to use, with a short set-up time to emulation
  – No or only minor source modification
  – The same testbench for simulation and emulation
• Various verification modes
  – Cycle, transaction and abstract-bus modes
• Powerful debugging
  – BILA (Built-In Logic Analyzer) as a real hardware logic analyzer
• High-performance interface to the S/W side
  – High-speed DMA feature
  – High- to low-level languages such as C/C++, Verilog and VHDL
• Open interface system
  – The API layer provides an easy-to-interface mechanism to de facto standards
• Scalability
  – Multiple iPROVE cards as well as various gate-count options
34 SoC Verification
• Key technologies in SoC verification
  – Early/consistent verification environment
  – Progressive refinement
  – Multi-level, multi-lingual verification
35 ASIC Verification Methods
[Figure: running speed (10 Hz to 100 MHz) versus investment for SW simulator, HW accelerator, HW emulator, rapid prototype and real silicon. The ideal verification solution lies toward "make it faster" and "make it cheaper".]
36 What's the point in SoC verification?
• Mixture of SW and HW
  – Make it easier to cooperate with a processor model such as an ISS or BFM
• Mixture of pre-verified and not-yet-verified components
  – Make it easier to utilize legacy IPs that are already verified
• Mixture of different languages and different abstraction levels
  – Provide a common interface structure between SoC components
37 Canonical SoC design flow
[Figure: design flow with the stages system spec.; HW-SW co-design (system design with HW IP and SW IP, HW/SW partitioning, functional verification); HW development (HW refinement UT->T->RTL, gate, gate-level verification); SW development (SW refinement with RTOS mapping, final code, software verification); HW-SW co-verification.]
• Emulator
  – In-system emulator
  – HW-SW co-debugging
38 Tools for HW-SW Co-Verification
[Figure: the same design flow, highlighting HW development, SW development and HW-SW co-verification.]
  – High-level synthesis
  – Testbench automation
  – IP accelerator
  – HW-SW co-simulation
  – ISS
  – RTOS simulator
39 Tools for System-level
[Figure: the same design flow, highlighting system spec. and HW-SW co-design.]
• System-level design (performance analysis tools)
  – Hot-spot analyzer
  – High-level cycle count estimation
  – High-level power analysis
  – High-level chip area estimation
  – On-chip-bus traffic estimation
40 Verification Environment
• Early testbench set-up
  – An accurate and fast testbench set-up early in the design greatly reduces verification time and effort.
• Consistent testbench utilization
  – Once the testbench is built, it must be reused consistently in the following design steps.
• In-system testbench
  – The testbench must be switchable between SW simulation and in-system verification to cover all corner cases.
41 In-System Verification
[Figure: design stages spec., RTL, gate, silicon and board, linked by design, synthesis, manufacture and integration test. Verification activities: functional verification, formal verification and test patterns, with in-system behavioral-level verification and in-system gate-level verification.]
42 Flexible Verification Environment
[Figure: in the conventional verification environment, the C model, the HDL design and the gate-level design are paired with a C testbench, an HDL testbench and an in-system testbench respectively. In the flexible environment, the C, HDL and in-system testbenches can be used across the C model, the HDL design and the gate-level design.]
43 Progressive refinement
• With the advent of the design reuse methodology for System-on-a-Chip designs, a mixture of C, HDL, EDIF netlist and IP core blocks has to be verified together as one system.
• For a large design, it is necessary to verify each design block/module one after another until the whole design is verified.
• IP has to be prepared at various abstraction levels in order to support the progressive refinement process.
[Figure: incremental/progressive refinement through UTF, TF, BCA, RTL and EDIF. A typical SoC chip contains a uP core, D-cache, SRAM, FLASH, USB, MPEG, FIFO and logic.]
44 Multi-Level & Multi-Lingual
[Figure: SoC content (memory 20~50%, other IPs >20%, custom contents 15~20%, multiple programmable cores 20%) against level of abstraction (algorithm, functional, behavioral, RTL, gate; UT, BCA, CA, TA) and description language (EDIF gate-level netlist, HDL (Verilog, VHDL), SystemC (HW), C/C++ (SW)).]
UT: untimed, BCA: bus cycle accurate, CA: cycle accurate, TA: timing accurate, RTL: register transfer level
45 Supporting Multi-Language
• Simulation vehicle
  – HDL simulator
  – User C process (C/C++/SystemC model for HW or SW)
  – ISS for embedded processor core
  – Test description language (Vera, TestBuilder)
• Emulation vehicle
  – FPGA containing one or more IPs (enables gate-level IP verification)
  – FPGA interfacing with the target system (enables in-system verification)
• Communication channel between vehicles
  – IPC (inter-process communication) for designs simulated in multiple processes
  – Dedicated device driver for designs mapped in FPGAs
46 Supporting Multi-Level
• Bridging the abstraction gap
  – Using a transactor
    [Figure: a transaction-level C/HDL model connects through read and write channels and a transactor to a cycle-accurate HDL/EDIF model.]
  – Using a cycle-level transactor
    [Figure: a cycle-accurate C/HDL model connects through a cycle-accurate API, read and write channels, and a cycle-level transactor to a cycle-accurate model.]
47 Multi-Level & Multi-Lingual
[Figure: C sessions (designs in SystemC and C), HDL sessions (designs in VHDL and Verilog) and EDIF sessions (designs in EDIF, target board) each attach through a transactor to the inter-lingual communication I/F protocol (TIE).]
48 iSAVE-MP & MPEG 2/4
[Figure: screenshot of the iSAVE-MP main and TIM GUI windows with the decoded image, next to the MPEG board.]
49 SoC model with ARM CCM
[Figure: debugger, ARM CCM, AMBA model, memory model, address decoder and IP models; IP in HDL attaches through a bus wrapper over ILC (Inter-Lingual Communication).]
50 Using Multiple FPGAs
• Using multiple FPGAs
  – Partitioning into multiple FPGAs
  – Bus split
[Figure: the host processor runs the ARM ISS and the memory model; a transactor and bus split logic connect them to FPGA 1 and FPGA 2, which hold IP0 to IP3.]
51 Debugging in Multi-Level
• Traditional debugging tools
  – Design in the emulation vehicle
    • Logic analyzer
  – Design in the simulation vehicle
    • Source-level debugger
    • Waveform viewer
• Challenges in SoC
  – How to manage waveforms from different abstraction levels
  – How to manage trigger conditions
  – How to probe out internal signals of designs in emulation vehicles
52 Debugging in Multi-Level
• Built-in logic analyzer
  – The built-in logic analyzer lets the designer watch what is actually going on.
  – The built-in logic analyzer samples the states of the DUT and stores them in the external dump memory (non-intrusive).
[Figure: flow of configure FPGA, download trigger, run, upload and post-processing to VCD; the design under test and the built-in logic analyzer sit in the FPGA next to the external dump memory.]
53 Debugging in Multi-Level
• Built-in logic analyzer
  – The triggering condition is dynamically configured.
  – After the emulation is over, the dump data in the external memory is read and processed to generate a VCD file for the waveform viewer.

    $date Fri Dec 6 22:50:56 2002 $end
    $version 4.10 $end
    $timescale 100 ps $end
    $scope module BILA $end
    $var reg 32 ! user_data $end
    $var reg 1 " write_en $end
    $var reg 4 $ mode $end
    …
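The post-processing step, turning uploaded dump samples into VCD text, can be sketched in C. This is a minimal sketch under stated assumptions and not the iPROVE `dump2vcd` tool: `bila_sample_t` and `dump_to_vcd` are hypothetical names, the dump is assumed to hold one 1-bit signal (`write_en`) with a timestamp per sample, and the output goes to a caller-supplied buffer instead of a file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical dump record: sample time (in timescale units) and one bit. */
typedef struct {
    uint32_t time;
    uint8_t  write_en;
} bila_sample_t;

/* Writes a VCD header followed by one value-change entry for each sample
   whose value differs from the previous one. Returns the number of
   characters written, or -1 if the buffer is too small. */
static int dump_to_vcd(const bila_sample_t *s, size_t n,
                       char *buf, size_t cap)
{
    static const char header[] =
        "$timescale 100 ps $end\n"
        "$scope module BILA $end\n"
        "$var reg 1 \" write_en $end\n"
        "$upscope $end\n"
        "$enddefinitions $end\n";

    assert(s != NULL && buf != NULL);
    size_t len = strlen(header);
    if (len >= cap)
        return -1;
    memcpy(buf, header, len + 1);

    for (size_t i = 0; i < n; i++) {
        if (i > 0 && s[i].write_en == s[i - 1].write_en)
            continue;  /* VCD records changes only */
        int k = snprintf(buf + len, cap - len, "#%u\n%u\"\n",
                         (unsigned)s[i].time,
                         (unsigned)(s[i].write_en & 1u));
        if (k < 0 || (size_t)k >= cap - len)
            return -1;
        len += (size_t)k;
    }
    return (int)len;
}
```

The identifier character `"` after each value ties the change to the `$var` declaration in the header, in the same way as in the sample VCD header above.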
54 Debugging in Multi-Level
• Probing internal nodes
  – Sometimes the designer wants to watch internal nodes of the design.
  – Internal node probing enables this by wiring the internal nodes out to the boundary of the DUT top block.
[Figure: an internal node in a sub-block of the DUT is wired out through the top block to the built-in logic analyzer and the external dump memory.]
55 Debugging in Multi-Level
• Monitoring software variables
  – Software dump data is merged with hardware dump data (Built-In Logic Analyzer) to generate a unified waveform.
  – The waveform contains both hardware and software debugging information.
[Figure: dump data with timing information from the built-in logic analyzer and from software variables passes through dump2vcd to produce a VCD with both hardware and software debugging information.]
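The merge of hardware and software dump data can be sketched in C as a two-way merge by timestamp. The names `dump_event_t`, `source_t` and `merge_dumps` are hypothetical, and the sketch assumes that each dump is already sorted by time and that both use the same time base; how the real tool aligns the two time bases is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SRC_HW, SRC_SW } source_t;

/* One dump entry with timing information. */
typedef struct {
    uint32_t time;
    source_t source;
    uint32_t value;
} dump_event_t;

/* Merges two time-sorted dumps into one time-sorted stream.
   On equal timestamps the hardware event comes first.
   Returns the number of merged events, or 0 if `out` is too small. */
static size_t merge_dumps(const dump_event_t *hw, size_t n_hw,
                          const dump_event_t *sw, size_t n_sw,
                          dump_event_t *out, size_t cap)
{
    assert(out != NULL);
    if (cap < n_hw + n_sw)
        return 0;

    size_t i = 0, j = 0, k = 0;
    while (i < n_hw && j < n_sw)
        out[k++] = (sw[j].time < hw[i].time) ? sw[j++] : hw[i++];
    while (i < n_hw)
        out[k++] = hw[i++];
    while (j < n_sw)
        out[k++] = sw[j++];
    return k;
}
```

The merged stream can then be written out as a single VCD, so that hardware signals and software variables appear on one time axis.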
56 Using Multiple FPGAs
• Synchronous Built-In Logic Analyzer
  – When the design is partitioned into multiple FPGAs, the Built-In Logic Analyzer (BILA) in each FPGA samples the internal state of that FPGA.
  – All the dump data are merged to provide the user with a unified and synchronized waveform.
[Figure: FPGA #1 to FPGA #n each contain a DUT partition and a BILA; dump data #1 to #n pass through dump2vcd to produce a unified VCD.]
57 Using Multiple FPGAs • Multiple FPGAs with multiple Processes – BILA in each FPGA samples FPGA states. – SVA in each process samples program states. – All of the dump data are merged
58 Using Multiple FPGAs
• Multiple FPGAs with multiple processes
[Figure: process #1 to process #m each contain an SVA, and FPGA #1 to FPGA #n each contain a DUT partition and a BILA; the dump data from all SVAs and BILAs pass through dump2vcd to produce a unified VCD.]