- Number of slides: 54
An Automatic Approach to Generate Haste Code from Simulink Specifications
Maurizio Tranchero (1), Leonardo M. Reyneri (1), Arjan Bink (2), and Mark de Wit (2)
(1) Politecnico di Torino, Department of Electronics, Italy
(2) Handshake Solutions, The Netherlands

Outline
- Simulink-Based Design and CodeSimulink
- Haste Coding Choices
- Simulink-Specific Issues
- Proposed Flow and its Implementation
- Case Studies and Performance
Haste Code from Simulink

Simulink-Based Design
Simulink®: what is it? and why?
- General-purpose graphical tool able to describe and simulate heterogeneous systems
- Based on MATLAB®
- Widely used in different application and industrial areas: signal and image processing, control, aerospace, modeling, etc.
- Does not require knowledge of electronic/digital design; allows interdisciplinary teams
- Uses the dataflow (DF) computational model

Simulink diagrams
- A set of interconnected blocks
- Each block performs an operation (e.g. a multiply-and-accumulate model)
- Includes stimuli and test points
[Figure: multiply-and-accumulate model with blocks labeled SOURCES, MULTIPLY, ADD, ACCUM, DISPLAY RESULTS]

Simulink to develop digital systems
- Simulink is very good at general-purpose modeling, but:
  - what are the implications of HW/SW implementations?
  - what about the effects of data representation?
  - what about the effects of timing, latencies and delays?
- Can Simulink models be implemented physically? Yes, but some external tools are required:
  - For SW: Real-Time Workshop (The MathWorks)
  - For HW: System Generator (Xilinx), DSP-Builder (Altera), HDL Coder (The MathWorks), CodeSimulink (Politecnico di Torino)
  - For mixed HW/SW: CodeSimulink (Politecnico di Torino)

Data Flow vs. Register Transfer
- Simulink is natively Data Flow (each block computes only when data is valid)
- Sequential SW is DF (by compilation)
- Synchronous HW is natively Register Transfer (each block computes independently of data being valid)
- RT has the problem of repipelining and synchronization
- Asynchronous HW is natively DF (because of handshake)
- Analog systems are time-continuous
- True Simulink HW has to be DF!
- Mixed HW/SW systems have to be DF!

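The difference between the two models can be sketched in a few lines of Python. This is only an illustrative behavioral model, not code produced by any of the tools named in the slides; the class names DataflowAdder and RegisterTransferAdder are hypothetical.

```python
from collections import deque


class DataflowAdder:
    """Hypothetical two-input dataflow block: it fires only when both inputs hold a token."""

    def __init__(self):
        self.a = deque()    # tokens waiting on input A
        self.b = deque()    # tokens waiting on input B
        self.out = deque()  # tokens produced so far

    def step(self):
        # Dataflow rule: compute only when every input has valid data.
        if self.a and self.b:
            self.out.append(self.a.popleft() + self.b.popleft())
            return True
        return False


class RegisterTransferAdder:
    """Hypothetical clocked block: it computes on every tick, valid data or not."""

    def __init__(self):
        self.a = 0
        self.b = 0
        self.out = 0

    def tick(self):
        self.out = self.a + self.b


def demo():
    df = DataflowAdder()
    df.a.extend([1, 2, 3])
    df.b.append(10)  # only one token is available on input B
    fired = [df.step() for _ in range(3)]
    return fired, list(df.out)
```

In this sketch demo() fires the dataflow block once and then stalls until more tokens reach input B, whereas the register-transfer block would keep producing a value on every tick, which is why RT designs need explicit synchronization.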
Commercial Simulink HW/SW tools
- Non Simulink-compliant (they are RT, not DF!):
  - System Generator (Xilinx)
  - DSP-Builder (Altera)
  - HDL Coder (The MathWorks)
  - they use Simulink ONLY as a graphical interface
  - they do NOT support any Simulink block
- Simulink-compatible (fully DF):
  - Real-Time Workshop (The MathWorks; only for SW!)
  - CodeSimulink/SMT 6040 (Politecnico di Torino); also supports mixed HW/SW/analog systems
  - both implement Simulink blocksets natively in a transparent manner

CodeSimulink/SMT 6040 Tool
Our Tool: CodeSimulink/SMT 6040
- True DF, Simulink-compatible, model-based, hybrid codesign environment
- Co-simulates: SW + digital HW + analog HW + external world (e.g. mechanical), modeling the real behavior of the chosen implementation(s)
- Generates: SW (C) + digital HW (VHDL) + analog (SPICE), or VHDL + Haste code
- Digital either: synchronous DF or asynchronous DF
- Commercially available (SMT 6040)
- Student edition available at http://polimage.polito.it/groups/codesimulink.html

A Simple CodeSimulink model
Implementation Parameters
- Available parameters:
  - DATAWIDTH (number of bits)
  - BINARYPOINT (position of fixed point)
  - REPRESENTATION ((un)signed, sign/modulus, floating point)
  - OVERFLOW (saturation/wraparound)
  - TRUNCATION (floor, ceil, round, etc.)
  - PIPELINE (latency, speed)
[Figure: example word with a sign bit (+/-) followed by the bits 1 0 1 1 0, representing +5.50]

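A minimal Python sketch of how DATAWIDTH, BINARYPOINT, OVERFLOW and TRUNCATION could interact when a real value is mapped to a fixed-point code. The function names quantize and to_real and the two's-complement assumption are illustrative choices, not CodeSimulink's actual implementation.

```python
import math


def quantize(value, datawidth, binarypoint, signed=True,
             overflow="saturate", truncation="round"):
    """Map a real value to an integer code (assumption: two's complement when signed)."""
    scaled = value * (1 << binarypoint)  # shift the binary point
    if truncation == "floor":
        code = math.floor(scaled)
    elif truncation == "ceil":
        code = math.ceil(scaled)
    elif truncation == "round":
        code = math.floor(scaled + 0.5)  # round half up
    else:
        raise ValueError("unknown truncation mode")

    if signed:
        lo, hi = -(1 << (datawidth - 1)), (1 << (datawidth - 1)) - 1
    else:
        lo, hi = 0, (1 << datawidth) - 1

    if overflow == "saturate":
        code = max(lo, min(hi, code))              # clip to the representable range
    elif overflow == "wraparound":
        code = (code - lo) % (hi - lo + 1) + lo    # modular wrap into the range
    else:
        raise ValueError("unknown overflow mode")
    return code


def to_real(code, binarypoint):
    """Inverse mapping: integer code back to the real value it represents."""
    return code / (1 << binarypoint)
```

With a 6-bit signed word and BINARYPOINT = 2, quantize(5.5, 6, 2) gives the code 22, i.e. the bit pattern 010110, which matches the +5.50 example word on the slide (sign bit followed by 1 0 1 1 0).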
From Simulink to CodeSimulink
- An automatic process composed of these steps:
  - Model simulation
  - Model conversion (namely 1-to-1 block substitution)
  - HW parameter setting (based on simulation results)
[Figure annotations: double precision (64 b) floating point; selectable data width; integer, fixed point, floating point...; signed, unsigned, modulus & sign; wrap around, saturate output]

A Semi-Automatic Process
- This conversion cannot be completely automated
- Input and output blocks have to be inserted manually
- Some block parameters have to be set manually (overflow, truncation and pipeline)

CodeSimulink Environment
[Figure: flow diagram with boxes System Description, Functional + Timing Simulation, HW-SW Partitioning, Digital HW, SW, Analog HW, Dig. HW Compiler, RTW, An. HW Compiler, Synchronous, Asynchronous, PCB Tool, P&R, Target Programming, Schematic]

Advantages of (Code)Simulink
- Flexibility: very high (short redesign time); no need to take care of interfaces and timing; quick system-level performance optimization
- Reusability: may use existing Simulink models
- Time-to-market: very short (consequently), although the design is suboptimal (it can be optimized later on)
- Accessibility: does not require an experienced designer; simpler integration of work teams with heterogeneous know-how
- Academic: optimal for teaching Electronic Systems and Asynchronous Circuits; student version available

Advantages of CodeSimulink
- Allows choosing the implementation later in the design flow
- Timing analysis and pipeline balancing
- Natively handles scalars, vectors, matrices
- Supports multi-system (multi-platform, multi-core, multi-SW, multi-FPGA, multiple time domains, GALS, mixed synch/asynch, hybrid, etc.)
- Supports synchronous bit-parallel, bit-serial, and bundled-data asynchronous designs
- Interfaces to low-level simulators (ModelSim, MaxPlus, Quartus, ISE, Spice-like)

Limitations of CodeSimulink
- Best suited to data-dominated systems
- Mostly fixed-rate (does not mean synchronous!) sampling strategy (including multi-rate)
- Library-based (sub-optimal)
- Fast timing models (optional) require technology characterization

Library Blocks
- Large library of blocks (blockset) including:
  - Low-level Simulink blockset: addition, multiplication, min/max, floating/fixed point converters, etc.
  - High-level functions: FIR filters, FFTs, custom transfer functions, etc.
  - Special-purpose functions
  - Interface blocks:
    - I/Os
    - SW/HW/SW
    - Analog/digital/analog
    - Synchronous/asynchronous

CodeSimulink digital blocks
Each CodeSimulink block is translated into:
- A combinational functional block (VHDL)
- A sequential protocol controller + register. Either:
  - VHDL, synchronous
  - VHDL, asynchronous
  - Haste code
[Figure labels: CHANNEL; REQ, ACK; CLK, VAL, RDY]

Asynchronous CodeSimulink
- Just change the protocol handling box
- Supports bundled-data transfers
- Analyzes and optimizes timing
- Timing analysis identifies bottlenecks and helps to minimize them
- Forces timing constraints during synthesis accordingly
- Adds a delay line according to the required timing
- Prevents optimization on the delay line

Haste Coding Choices
VHDL Usage within TiDE
- CodeSimulink uses a library-based approach: each block is described in VHDL
- To reuse such code, an automatic conversion into Verilog (which is fully supported in TiDE) has been provided using RTL Compiler

Coding Styles in Haste
- Different coding styles are available: which is the best for Simulink blocks?
- To benchmark, we used a simple datapath made of:
  - 4 different arithmetic operations
  - 2 x 16-bit-wide inputs
  - 1 x 3-bit-wide selector
  - 1 x 32-bit-wide output
[Figure: datapath with operators *, +, >, x3]

Multiple vs. Single Processes
- The same block described as a single process or as an ensemble of concurrent processes produces different implementations

Single process:
    forever do
      multiplier(...) || adder(...) || comparator(...) || fixedGain(...)
    od

Multiple processes:
    forever do multiplier(...) od
    || forever do adder(...) od
    || forever do comparator(...) od
    || forever do fixedGain(...) od

Shared Variables vs. Channels
- Variables are cheaper than channels
- Channels automatically handle synchronization

Variables version:
    & C = func( & i0 ? var T & i1 ? var T ): T. ( i0 + i1 ) fit T
    & pipeline: main proc( & x ? chan T broad pas & y ! chan T ).
    begin
    | forever do
        wait( outprobe( x ) ) ;
        ( y!C(.i0(dataprobe(x)), .i1(B(.i0(dataprobe(x)), .i1(A(dataprobe(x)))))) || x?~ )
      od
    end

Channels version:
    & C = proc( & i0 ? chan T & i1 ? chan T & o0 ! chan T ).
    begin
    | forever do
        wait( outprobe(i0) * outprobe(i1) ) ;
        o0!( dataprobe( i0 ) + dataprobe( i1 ) ) fit T ;
        ( i0?~ || i1?~ )
      od
    end
    & pipeline: main proc( & x ? chan T broad pas & y ! chan T ).
    begin
      & c0 : chan T
      & c1 : chan T
    | A(x, c0) || B(c0, x, c1) || C(c1, x, y)
    end

Tupled vs. Separated Channels
- Tupled channels (c) are cheaper than separated ones (b)
- But they can introduce deadlock in several configurations

Deadlock Exposed
Register Insertion
- Using Haste and TiDE 5.2, registers are inserted at the inputs
- Simulink blocks usually have one output and one or more inputs
- We would like to have the register on the output, for less area occupation
- At the moment (TiDE 5.2) this is not possible, but it will be in TiDE 6

Some Figures (htcomp + htmap)
Column headers: System; Datapath; Global forever do; Multiple forever do; Independ. parallel inputs; Tupled inputs; Area [mm2]; C-gates
Row groups: Pipelined version; Fully combinational
Rows as extracted:
    V - V - 932.2 156.7
    V - - V V - 902.0 156.3
    - V V - 848.7 129.3
    - V V - 871.7 124.7
    V - - V 331.0 14.3
    V - - V 298.0 8.3

Conclusions
- After this analysis we can decide to:
  - Use the multiple-process description
  - Use channels instead of variables
  - Use separated channels
  - Leave registers unoptimized, relying on compiler optimization

Simulink-Specific Issues
Multidimensional Objects
- Simulink models can easily process scalars, vectors or matrices
- Depending on throughput constraints we can decide to process each data component serially or in parallel
[Figure: serial vector: the stream 1, 3, 5, 7 enters one block labeled 2 and leaves as 2, 6, 10, 14; parallel vector: the components 1, 3, 5, 7 enter separate blocks labeled 2 and leave as 2, 6, 10, 14]

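The serial and parallel options can be sketched in Python using the numbers from the slide (vector 1, 3, 5, 7 and a gain of 2). The function names are hypothetical and the model ignores timing; it only shows that one shared operator is reused over several steps in the serial case, while the parallel case uses one operator per component.

```python
def gain_serial(stream, k):
    """Serial processing: a single shared multiplier handles one component per step."""
    for x in stream:
        yield k * x


def gain_parallel(vector, k):
    """Parallel processing: one multiplier per component, all used in the same step."""
    return [k * x for x in vector]


def multipliers_needed(vector, parallel):
    """Rough area proxy (assumption): number of multiplier instances."""
    return len(vector) if parallel else 1
```

Both variants produce the same values; the trade-off is area (four multipliers against one) versus throughput (one step against four).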
Sampling Blocks
- Sampling blocks are the ones with special timing constraints, i.e., they have to guarantee data processing in a fixed amount of time
- They can be used to change the input/output data rate
- The main blocks belonging to this category are:
  - unit delay
  - zero order hold
  - rate transition

Unit Delay
- It introduces one memory stage from input to output
- When a "Sampling Time" period has elapsed:
  - The old data (multiple data, in case of arrays) is generated on the output
  - A new data item (multiple components, in case of arrays) is sampled
[Figure: FSM for scalar data]

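A behavioral sketch of the unit delay in Python, assuming that sample() is called exactly once per "Sampling Time" period. The class name, the method name and the initial value are illustrative assumptions, not part of the generated Haste or VHDL.

```python
class UnitDelay:
    """One memory stage: each sampling period emits the stored value and stores a new one."""

    def __init__(self, initial=0):
        self.state = initial  # assumption: the initial content of the memory stage

    def sample(self, new_value):
        old = self.state        # the old data is generated on the output
        self.state = new_value  # the new data is sampled
        return old
```

Feeding the values 1, 2, 3 yields 0, 1, 2: the output is the input delayed by one sampling period. The same code works for arrays if a whole list is passed as one data item.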
Zero Order Hold
- It maintains the output data until a "Sampling Time" period has elapsed
- When it elapses, newly acquired input data (possibly multiple) is transferred to the output
[Figure: FSM for scalar data]

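A behavioral sketch of the zero order hold in Python. It assumes a base time step faster than the sampling period, with the period expressed as an integer number of base steps; the class name, the step() method and the initial output value are hypothetical.

```python
class ZeroOrderHold:
    """Holds the output constant and refreshes it once per sampling period."""

    def __init__(self, period, initial=0):
        self.period = period    # sampling period, in base time steps (assumption)
        self.output = initial   # value held before the first period elapses
        self.elapsed = 0

    def step(self, value):
        """Advance one base time step with the current input; return the held output."""
        self.elapsed += 1
        if self.elapsed == self.period:
            self.output = value  # period elapsed: transfer the input to the output
            self.elapsed = 0
        return self.output
```

With a period of 3 base steps and inputs 1 through 7, the output stays at the initial value for two steps, switches to 3 when the first period elapses, and switches to 6 after the second.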
Rate Transition
- It is a superset of the previous blocks: it is used to change the data rate from input to output, either increasing or decreasing it
- Replicates/consumes tokens
- It can be described as a cascade of "unit delay" and "zero order hold" blocks

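Token replication and consumption can be sketched in Python as follows. The function names and the choice of which token survives when the rate decreases are assumptions made for illustration; the slides do not specify them.

```python
def increase_rate(tokens, factor):
    """Raise the output rate by replicating each input token `factor` times."""
    return [t for t in tokens for _ in range(factor)]


def decrease_rate(tokens, factor):
    """Lower the output rate by keeping one token out of every `factor`.

    Assumption: the first token of each group is kept and the others are consumed.
    """
    return tokens[::factor]
```

For example, increase_rate([1, 2], 3) replicates tokens to give 1, 1, 1, 2, 2, 2, and decrease_rate([1, 2, 3, 4, 5, 6], 2) consumes every second token to give 1, 3, 5.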
Sampling Blocks Implementation
- All these blocks have to be connected to a clock/timing (?!?) signal to guarantee timing
- To reduce the overhead introduced by clock interaction, it is possible to use a fully asynchronous version of such blocks, yet precisely timed
- Timing clock interaction is still necessary, but it could be moved to the I/Os

Simulink-Haste Flow Implementation
The Flow
- Integrates CodeSimulink with the existing TiDE flow
- Each block is converted into both Haste and RTL code
[Figure: flow diagram with boxes Simulink Model, CodeSimulink, VHDL Descriptions, Haste Description, RTL Compiler, htcomp + htmap, Verilog Descriptions, HT Back-end]

Haste File Generated
- Is composed of 6 parts:
  - Type definitions (used in the file itself)
  - Top-level procedure interface definition
  - Internal channels
  - Internal functions (the interface to RTL code)
  - Internal procedures (protocol management and function instances)
  - Procedure instances and connections

E.g.: Haste File Generated
    // Types Definition
    & STD_LOGIC_VECTOR_17 = type [0..2^17-1]
    & STD_LOGIC_VECTOR_16 = type [0..2^16-1]
    & STD_LOGIC_VECTOR_15 = type [0..2^15-1]
    & STD_LOGIC_VECTOR_14 = type [0..2^14-1]
    & STD_LOGIC_VECTOR_1 = type [0..2^1-1]
    // Top entity instance
    & inout1 : main proc(
        & DIGINA ? chan STD_LOGIC_VECTOR_15
        & DIGINB ? chan STD_LOGIC_VECTOR_14
        & DIGOUTA ! chan STD_LOGIC_VECTOR_17
      ).
    begin
    ...

E.g.: Haste File Generated
    // Functions declarations
    & sim_sum1_f = func (
        & A1 ? var STD_LOGIC_VECTOR_16
        & A2 ? var STD_LOGIC_VECTOR_1
      ): STD_LOGIC_VECTOR_17. import
    // Component declarations
    & sim_sum1 = proc (
        & Y1 ! chan STD_LOGIC_VECTOR_17
        & A1 ? chan STD_LOGIC_VECTOR_16
        & A2 ? chan STD_LOGIC_VECTOR_1
      ).
    begin
      & v_A1 : var STD_LOGIC_VECTOR_16
      & v_A2 : var STD_LOGIC_VECTOR_1
    | forever do
        ( A1 ? v_A1 || A2 ? v_A2 ) ;
        Y1 ! sim_sum1_f(.A1( v_A1 ), .A2( v_A2 ) )
      od
    end

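The generated types are ranges of the form [0..2^N-1], and sim_sum1_f maps a 16-bit and a 1-bit operand to a 17-bit result. The Python sketch below checks why 17 bits are enough, assuming the imported function performs an unsigned addition (its body is not shown in the slides, so this is an assumption); the helper names are hypothetical.

```python
def range_type_bits(upper):
    """Number of bits needed to hold every value of a range type [0..upper]."""
    return max(1, upper.bit_length())


def sum_result_bits(width_a, width_b):
    """Width needed for the sum of two unsigned operands of the given widths."""
    max_sum = (2 ** width_a - 1) + (2 ** width_b - 1)  # largest possible result
    return range_type_bits(max_sum)
```

For the operands in the listing, the largest sum is 65535 + 1 = 65536, which needs 17 bits, consistent with the STD_LOGIC_VECTOR_17 output type.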
E.g.: Haste File Generated
    // Internal signal declarations
    & Y1_5 : chan STD_LOGIC_VECTOR_1 broad
    & Y1_4 : chan STD_LOGIC_VECTOR_17 broad
    & Y1_1 : chan STD_LOGIC_VECTOR_16 broad
    ...
    // Component instantiation
    sim_constant (.Y1( Y1_5 ) ) ||
    sim_digOut (.A1( Y1_4 ), .DIGIO( DIGOUTA ) ) ||
    sim_sum1 (.Y1( Y1_4 ), .A1( Y1_1 ), .A2( Y1_5 ) )
    ...

E.g.: VHDL File Generated
- VHDL is used to describe the block functionality
- For each block an HDL file will be generated with the desired parameters (data width, binary point...)

    -- Top entity instance
    ENTITY sim_sum1 IS
      PORT (
        DIGOUTA_i   : IN  STD_LOGIC_VECTOR(15 downto 0);
        DIGIN_VALA0 : IN  STD_LOGIC;
        DIGIN_RDYA  : OUT STD_LOGIC;
        DIGOUTB_i   : IN  STD_LOGIC_VECTOR(0 downto 0);
        DIGIN_VALA0 : IN  STD_LOGIC;
        DIGOUTA_o   : OUT STD_LOGIC_VECTOR(16 downto 0);
        DIGOUT_VALA : OUT SIM_SIGVAL_SYNCHPAR;
        DIGOUT_RDYA : IN  STD_LOGIC;
        nRESET      : IN  STD_LOGIC;
        CLK         : IN  STD_LOGIC -- left unconnected in this implementation
      );
    END sim_sum1;

Conversion of Simulink models
[Figure: the Simulink model side by side with the compiled Haste program, which repeats the sim_sum1 function and component declarations listed earlier]

Case Studies
3-input, 32-bit adder
Simple 16-bit ALU (*, +, <, gain)
8th-order, 20-bit-wide IIR filter
Results
- Speed comparison by simulation on a Cyclone II FPGA
- Area comparison on a commercial 90 nm ASIC library

Proprietary Audio Test Chip
Area [um2] | Handwritten + TiDE 5.2 (not available) | CodeSimulink + TiDE 5.2 | CodeSimulink + TiDE 6.0
Sequential | 32,018 | 89,792 | 11,632
Logic | 141,676 | 357,368 | 152,468
Total | 173,694 | 468,746 | 164,100
- TiDE 5.2 has limitations (e.g. registers placed at the inputs instead of the outputs) which made the Simulink-to-Haste conversion very inefficient.
- TiDE 6.0 has overcome these limitations, and the automatically generated ASIC is smaller than the handwritten one.

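The total-area figures can be turned into ratios with a few lines of Python. The mapping of the three values to designs follows the order in which the numbers and column labels appear on the slide, which is an assumption; the dictionary keys are hypothetical names.

```python
# Total area in um2, copied from the slide (column order is an assumption).
TOTAL_AREA_UM2 = {
    "handwritten_tide52": 173_694,
    "codesimulink_tide52": 468_746,
    "codesimulink_tide60": 164_100,
}


def relative_area(design, reference="handwritten_tide52"):
    """Area of `design` divided by the area of the reference design."""
    return TOTAL_AREA_UM2[design] / TOTAL_AREA_UM2[reference]
```

Under that assumption, the CodeSimulink + TiDE 6.0 design comes out at roughly 0.94 times the handwritten area, while the CodeSimulink + TiDE 5.2 design is roughly 2.7 times larger.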
Conclusions
- Optimization at system level (CodeSimulink) followed by automatic translation to Haste can achieve the same quality as manual coding in Haste followed by hand optimization at Haste level
- The major improvement, however, is in productivity, maintainability and reusability of the CodeSimulink model
- System-level cosimulation reduces development risks and makes optimization and interdisciplinary interactions much easier
- Time to market is significantly shorter
- The performance reduction due to library-based design (about 10-20% on average) is more than compensated by the performance improvement achievable with high-level specification, simulation and optimization
- Further manual optimizations are feasible if economic returns justify them

That's all, folks! Thank you for your attention!


