Design Technology
BY HASSAN AL MANASRAH TAMIR AL ZU’BI

Outline
• Introduction
• Automation: synthesis
• Verification: hardware/software co-simulation
• Reuse: intellectual property cores
• Design process models

Introduction: System Design Goals

Introduction
• What does “design” mean?
  • The task of defining system functionality and converting that functionality into a physical implementation, while
    • satisfying constrained metrics
    • optimizing other design metrics
• Designing embedded systems is hard because of
  • Complex functionality
    • Millions of possible environment scenarios
    • Ex: elevator controller, with many possible combinations of buttons being pressed
  • Many competing, tightly constrained metrics
  • Productivity gap
    • As low as 10 lines of code or 100 transistors produced per day

Improving productivity
• Design technologies are developed to improve productivity; we focus on technologies advancing the hardware/software view
• Automation: a computer program replaces manual design
  • Synthesis, which has made hardware design look like software design
• Reuse: the process of using predesigned components
  • Cores in the hardware domain
• Verification: the task of ensuring correctness/completeness of each design step
  • Hardware/software co-simulation
[Figure: automation, verification, and reuse applied on the path from specification to implementation]

Automation: synthesis
• The parallel evolution of compilation and synthesis
• Synthesis levels
• Logic synthesis
  • Two-level logic minimization
  • Multi-level logic minimization
  • FSM synthesis
  • Technology mapping
• Register-transfer synthesis
• Behavioral synthesis
• System synthesis and hardware/software co-design

The parallel evolution of compilation and synthesis
• Early design was mostly hardware; software was fairly simple
• Software complexity increased with the advent of the general-purpose processor
• Different techniques for software design and hardware design caused a division of the two fields
• The hardware and software design fields are rejoining
  • Both can start from a behavioral description in the sequential program model
The co-design ladder (figure), starting from sequential program code (e.g., C, VHDL):
• Software side: compilers (1960s, 1970s) → assembly instructions → assemblers, linkers (1950s, 1960s) → machine instructions → microprocessor plus program bits
• Hardware side: behavioral synthesis (1990s) → register transfers → RT synthesis (1980s, 1990s) → logic equations / FSMs → logic synthesis (1970s, 1980s) → logic gates → VLSI, ASIC, or PLD implementation

Cont.
• Software design evolution
  • Machine instructions: a collection of machine instructions (0s and 1s) is called a program
  • Assemblers convert assembly programs into machine instructions, because working directly with huge numbers of 0s and 1s is hard
  • Compilers translate sequential programs into assembly
• Hardware design evolution
  • Interconnected logic gates
  • Logic synthesis converts logic equations or FSMs into gates
  • Register-transfer (RT) synthesis converts FSMDs into FSMs, logic equations, and predesigned RT components (registers, adders, etc.)
  • Behavioral synthesis converts sequential programs into FSMDs
• Hardware design involves many more dimensions than compilation
  • A compiler only has to generate assembly instructions that implement the program
  • A hardware designer is also concerned about size, power, performance, and other metrics

Synthesis Levels
• Gajski’s Y-chart: each axis represents a type of description
  • Behavioral: defines outputs as a function of inputs (e.g., addition)
  • Structural: implements behavior by connecting components with known behavior (e.g., carry-ripple adder)
  • Physical: gives sizes/locations of components and wires on chip/board
• Abstraction levels on each axis, from highest to lowest:
  Behavioral             Structural              Physical
  Sequential programs    Processors, memories    Boards
  Register transfers     Registers, FUs, MUXs    Chips
  Logic equations/FSM    Gates, flip-flops       Modules
  Transfer functions     Transistors             Cell layout
• Synthesis converts behavior at a given level to structure at the same level or lower
  • E.g., FSM → gates, flip-flops (same level)
  • FSM → transistors (lower level)
  • Not allowed: FSM → registers, FUs (higher level)
  • Not allowed: FSM → processors, memories (higher level)

Logic Synthesis
• Converts logic-level behavior to a structural implementation
  • Logic equations and/or FSMs are converted to connected gates
• Combinational logic synthesis
  • Two-level minimization
  • Multilevel minimization
• FSM synthesis
  • State minimization
  • State encoding

Two-level minimization
• Represent logic function as sum of products (or product of sums)
  • AND gate for each product
  • OR gate for the sum
• Gives the best possible performance: at most a 2-gate delay
• Goal: minimize size
  • Minimum cover
  • Minimum cover that is prime
• Example sum of products: F = abc'd' + a'b'cd + a'bcd + ab'cd
  • Direct implementation: 4 4-input AND gates and 1 4-input OR gate → 40 transistors

Minimum Cover
• Minimum # of AND gates (sum of products)
• Literal: variable or its complement
  • a or a', b or b', etc.
• Minterm: product of literals
  • Each variable appears exactly once, as itself or as its complement
  • abc'd', ab'cd, a'bcd, etc.
• Implicant: product of literals
  • Each variable appears no more than once
  • abc'd', a'cd, etc.
  • Covers 1 or more minterms: a'cd covers a'bcd and a'b'cd
• Cover: set of implicants that covers all minterms of the function
• Minimum cover: cover with minimum # of implicants

Cont.
• Minimum cover: K-map approach
  • Karnaugh map (K-map)
    • 1 represents a minterm
    • Circle represents an implicant
  • Minimum cover: covering all 1’s with min # of circles
• K-map for the sum of products F = abc'd' + a'b'cd + a'bcd + ab'cd:
        cd
  ab    00  01  11  10
  00     0   0   1   0
  01     0   0   1   0
  11     1   0   0   0
  10     0   0   1   0
• Minimum cover: F = abc'd' + a'cd + ab'cd
  • One circle for abc'd', one for a'cd (rows 00 and 01 of column 11), one for ab'cd
• Minimum cover implementation: 2 4-input AND gates, 1 3-input AND gate, 1 3-input OR gate → 28 transistors
• Example: direct vs. min cover
  • Fewer gates: 4 vs. 5
  • Fewer transistors: 28 vs. 40

Minimum cover that is prime
• Minimum # of inputs to AND gates
• Prime implicant
  • Implicant not covered by any other implicant
  • Max-sized circle in K-map
• Minimum cover that is prime
  • Covering with min # of prime implicants
  • Min # of max-sized circles
• K-map: minimum cover that is prime
        cd
  ab    00  01  11  10
  00     0   0   1   0
  01     0   0   1   0
  11     1   0   0   0
  10     0   0   1   0
  • F = abc'd' + a'cd + b'cd
• Implementation: 1 4-input AND gate, 2 3-input AND gates, 1 3-input OR gate → 26 transistors
• Example: prime cover vs. min cover
  • Same # of gates: 4 vs. 4
  • Fewer transistors: 26 vs. 28
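
The three forms of F on these slides can be checked against each other by exhaustive truth table, and the transistor counts can be reproduced from the gate sizes. The sketch below assumes 2 transistors per gate input (the rule stated on the multilevel example slide) and an OR gate with one input per AND gate; the function names are illustrative only.

```python
from itertools import product

def direct(a, b, c, d):
    # F = abc'd' + a'b'cd + a'bcd + ab'cd
    return ((a and b and not c and not d) or (not a and not b and c and d)
            or (not a and b and c and d) or (a and not b and c and d))

def min_cover(a, b, c, d):
    # F = abc'd' + a'cd + ab'cd
    return (a and b and not c and not d) or (not a and c and d) or (a and not b and c and d)

def prime_cover(a, b, c, d):
    # F = abc'd' + a'cd + b'cd
    return (a and b and not c and not d) or (not a and c and d) or (not b and c and d)

def transistors(and_sizes):
    # Assumption: 2 transistors per gate input; the OR gate has one input per AND gate.
    gate_inputs = sum(and_sizes) + len(and_sizes)
    return 2 * gate_inputs

inputs = list(product([False, True], repeat=4))
equivalent = all(
    bool(direct(*v)) == bool(min_cover(*v)) == bool(prime_cover(*v)) for v in inputs
)
counts = {
    "direct": transistors([4, 4, 4, 4]),
    "min_cover": transistors([4, 3, 4]),
    "prime_cover": transistors([4, 3, 3]),
}
print(equivalent, counts)
```

Under these assumptions the counts come out as 40, 28, and 26, matching the slides.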

Minimum cover: heuristics
• K-maps give optimal solution every time
  • Functions with > 6 inputs too complicated
• Use computer-based tabular method
  • Finds all prime implicants
  • Finds min cover that is prime
  • Also optimal solution every time
  • Problem: 2^n minterms for n inputs
    • 32 inputs = 4 billion minterms
    • Exponential complexity
• Heuristic
  • Solution technique where optimal solution not guaranteed
  • Hopefully comes close
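
As a rough illustration of the first step of the tabular method (finding all prime implicants), here is a minimal sketch in the style of Quine-McCluskey. It repeatedly merges implicants that differ in one bit; whatever can no longer be merged is prime. It does not perform the second step (selecting a minimum cover), and the string encoding with '-' for a dropped variable is an assumption of this sketch.

```python
from itertools import combinations

def combine(x, y):
    """Merge two implicants that differ in exactly one specified bit, else None."""
    diff = [i for i, (p, q) in enumerate(zip(x, y)) if p != q]
    if len(diff) == 1 and "-" not in (x[diff[0]], y[diff[0]]):
        i = diff[0]
        return x[:i] + "-" + x[i + 1:]
    return None

def prime_implicants(minterms, n_vars):
    """Return the set of prime implicants as strings over '0', '1', '-'."""
    current = {format(m, f"0{n_vars}b") for m in minterms}
    primes = set()
    while current:
        merged, used = set(), set()
        for x, y in combinations(sorted(current), 2):
            z = combine(x, y)
            if z is not None:
                merged.add(z)
                used.update((x, y))
        primes |= current - used  # anything that could not be merged is prime
        current = merged
    return primes

# Minterms of F = abc'd' + a'b'cd + a'bcd + ab'cd, with a as the most significant bit.
primes = prime_implicants([0b1100, 0b0011, 0b0111, 0b1011], 4)
print(sorted(primes))
```

For this function the result corresponds to abc'd', a'cd, and b'cd. The pairwise comparison over up to 2^n minterms is what makes the exact method impractical for large n.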

Heuristics: iterative improvement
• Start with initial solution, i.e., original logic equation
• Repeatedly make modifications toward better solution
• Common modifications
  • Expand
    • Replace each nonprime implicant with a prime implicant covering it
    • Delete all implicants covered by new prime implicant
  • Reduce
    • Opposite of expand
  • Reshape
    • Expands one implicant while reducing another
    • Maintains total # of implicants
  • Irredundant
    • Selects min # of implicants that cover from existing implicants
• Synthesis tools differ in modifications used and the order they are used

Multilevel logic minimization
• Trade performance for size
  • Increase delay for lower # of gates
• In the size-vs-delay plot:
  • Gray area represents all possible solutions
  • Circle with X represents ideal solution
    • Generally not possible
  • 2-level gives best performance
    • Max delay = 2 gates
    • Solve for smallest size
  • Multilevel gives pareto-optimal solution
    • Minimum delay for a given size
    • Minimum size for a given delay
[Figure: size vs. delay plot showing 2-level minimization and multi-level minimization]

Example
• Minimized 2-level logic function: F = adef + bdef + cdef + gh
  • Requires 5 gates with 18 total gate inputs
    • 4 ANDs and 1 OR
• After algebraic manipulation: F = (a + b + c)def + gh
  • Requires only 4 gates with 11 total gate inputs
    • 2 ANDs and 2 ORs
  • Fewer inputs per gate
  • Assume each gate input = 2 transistors
    • Reduced by 14 transistors: 36 (18 * 2) down to 22 (11 * 2)
  • Sacrifices performance for size
    • Inputs a, b, and c now have a 3-gate delay
• Iterative improvement heuristic commonly used
[Figure: gate diagrams of the 2-level minimized and multilevel minimized circuits]
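
The algebraic manipulation can be double-checked by comparing both forms over all 256 input combinations and recomputing the gate-input totals. This is only a sketch; the per-gate fan-in lists are read off the two equations above.

```python
from itertools import product

def two_level(a, b, c, d, e, f, g, h):
    # F = adef + bdef + cdef + gh
    return (a and d and e and f) or (b and d and e and f) or (c and d and e and f) or (g and h)

def multilevel(a, b, c, d, e, f, g, h):
    # F = (a + b + c)def + gh
    return ((a or b or c) and d and e and f) or (g and h)

same = all(
    bool(two_level(*v)) == bool(multilevel(*v)) for v in product([False, True], repeat=8)
)

# Fan-in of each gate in the two implementations.
two_level_gates = [4, 4, 4, 2, 4]  # adef, bdef, cdef, gh, final OR
multilevel_gates = [3, 4, 2, 2]    # (a+b+c), (...)def, gh, final OR
TRANSISTORS_PER_INPUT = 2          # assumption stated on the slide
saving = TRANSISTORS_PER_INPUT * (sum(two_level_gates) - sum(multilevel_gates))
print(same, sum(two_level_gates), sum(multilevel_gates), saving)
```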

FSM synthesis
• Converting FSM to gates
• State minimization
  • Reduce # of states
    • Identify and merge equivalent states: outputs and next states are the same for all possible inputs
    • Smaller state register and fewer gates
  • Tabular method gives exact solution
    • Table of all possible state pairs
    • If n states, n^2 table entries
  • Heuristics used with large # of states
• State encoding
  • Unique bit sequence for each state
  • If n states, log2(n) bits to represent n unique encodings
  • n! possible encodings
  • Thus, heuristics common
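
A minimal sketch of the table-of-pairs idea, assuming a Mealy-style FSM given as two dictionaries (next state and output per state and input). A pair is marked as distinguishable when its outputs differ, or when some input leads to an already marked pair; the pairs left unmarked can be merged. The 4-state machine and all names here are hypothetical.

```python
from itertools import combinations

def equivalent_pairs(states, inputs, nxt, out):
    """Table-of-pairs method: return the state pairs that are never distinguished."""
    pairs = list(combinations(sorted(states), 2))
    # Mark pairs whose outputs differ for some input.
    marked = {(s, t) for s, t in pairs if any(out[s][i] != out[t][i] for i in inputs)}
    changed = True
    while changed:
        changed = False
        for s, t in pairs:
            if (s, t) in marked:
                continue
            # A pair is also distinguishable if some input leads to a marked pair.
            if any(tuple(sorted((nxt[s][i], nxt[t][i]))) in marked for i in inputs):
                marked.add((s, t))
                changed = True
    return [p for p in pairs if p not in marked]

def encoding_bits(n_states):
    """Smallest number of bits that gives n_states unique codes."""
    return (n_states - 1).bit_length()

# Hypothetical 4-state Mealy FSM with one binary input.
nxt = {"A": {0: "B", 1: "C"}, "B": {0: "A", 1: "D"},
       "C": {0: "C", 1: "A"}, "D": {0: "D", 1: "B"}}
out = {"A": {0: 0, 1: 0}, "B": {0: 0, 1: 0},
       "C": {0: 1, 1: 0}, "D": {0: 1, 1: 0}}
merged = equivalent_pairs("ABCD", [0, 1], nxt, out)
print(merged, encoding_bits(4), encoding_bits(2))
```

In this example A merges with B and C with D, so the state register shrinks from 2 bits to 1.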

Technology mapping
• Library of gates available for implementation
  • Simple: only 2-input AND, OR gates
  • Complex
    • Various-input AND, OR, NAND, NOR, etc. gates
    • Efficiently implemented meta-gates (e.g., AND-OR-INVERT, MUX)
• Final structure consists of specified library’s components only
• If technology mapping is integrated with logic synthesis
  • More efficient circuit
  • More complex problem

Register-transfer synthesis
• Converts FSMD to custom single-purpose processor
  • Datapath
    • Register units to store variables (complex data types)
    • Functional units (arithmetic operations)
    • Connection units (buses, MUXs)
  • FSM controller
    • Controls datapath
• Key sub-problems
  • Allocation: instantiate storage, functional, and connection units
  • Binding: mapping FSMD operations to specific units

Behavioral synthesis
• Also called high-level synthesis
• Converts single sequential program to single-purpose processor
  • Unlike an FSMD, the program does not have to schedule operations into states
  • Behavioral synthesis tools use advanced techniques to carry out scheduling and allocation
• Key sub-problems
  • Allocation
  • Binding
  • Scheduling: assign the sequential program’s operations to states
• Optimizations important
  • Compiler optimizations: constant propagation, dead-code elimination, loop unrolling
  • Advanced techniques for allocation, binding, scheduling
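
To make the scheduling sub-problem concrete, here is a small sketch that assigns operations to states as early as their data dependencies and the allocated functional units allow. It assumes every operation takes one state and that operations are listed in program order; the operation names and unit types are hypothetical, and real tools use considerably more advanced techniques.

```python
def schedule(ops, deps, unit_of, available):
    """Assign each operation to the earliest state where inputs are ready and a unit is free.

    ops: operation names in program order
    deps: op -> list of ops whose results it needs
    unit_of: op -> functional-unit type
    available: unit type -> number of allocated units
    """
    state_of = {}
    busy = {}  # (state, unit type) -> units already used in that state
    for op in ops:
        state = 1 + max((state_of[d] for d in deps[op]), default=0)
        while busy.get((state, unit_of[op]), 0) >= available[unit_of[op]]:
            state += 1
        state_of[op] = state
        busy[(state, unit_of[op])] = busy.get((state, unit_of[op]), 0) + 1
    return state_of

# z = (a*b + c*d) - e, split into four operations.
ops = ["m1", "m2", "s", "z"]
deps = {"m1": [], "m2": [], "s": ["m1", "m2"], "z": ["s"]}
unit_of = {"m1": "mul", "m2": "mul", "s": "alu", "z": "alu"}

two_multipliers = schedule(ops, deps, unit_of, {"mul": 2, "alu": 1})
one_multiplier = schedule(ops, deps, unit_of, {"mul": 1, "alu": 1})
print(two_multipliers, one_multiplier)
```

The two runs show how allocation and scheduling interact: allocating one multiplier instead of two saves hardware but adds a state.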

System synthesis
• Embedded systems are becoming much more complex
  • Multiple processes may provide better performance/power
  • May be better described using concurrent sequential programs
• System synthesis: convert 1 or more processes into 1 or more processors (a collection of processors)
• Tasks
  • Transformation
    • Can merge 2 exclusive processes into 1 process
    • Can break 1 large process into separate processes
  • Allocation
    • Essentially design of the system architecture
    • Select processors to implement processes
    • Also select memories and buses

Cont.
• Tasks (cont.)
  • Partitioning
    • Mapping 1 or more processes to 1 or more processors
    • Variables among memories
    • Communications among buses
  • Scheduling
    • Determining when each of the multiple processes on a single processor will have a chance to execute on the processor
    • Memory accesses and bus communications must be scheduled as well
• Tasks performed in a variety of orders
• Iteration among tasks common

Cont.
• Synthesis driven by constraints
  • E.g., meet performance requirements at minimum cost
    • Allocate as much behavior as possible to general-purpose processor (low-cost/flexible implementation)
    • Minimum # of SPPs used to meet performance
• System synthesis for GPP only (software)
  • Common for decades
    • Multiprocessing
    • Parallel processing
    • Real-time scheduling
• Hardware/software codesign
  • Simultaneous consideration of GPPs/SPPs during synthesis
  • Made possible by maturation of behavioral synthesis in the 1990s

Verification

Verification
• The task of ensuring that a design is correct and complete
  • Correctness: the design implements its specification correctly
  • Completeness: the design’s specification describes appropriate output responses to all relevant input sequences
• There are two main verification approaches
  • Formal verification
  • Simulation

Formal Verification
• An approach that analyzes a design to prove or disprove certain properties, such as the correctness of a particular design or the completeness of a behavioral description
• Correctness verification
  • Verify that a particular structural description correctly implements a behavioral description by proving the two descriptions equivalent
  • Example: prove that an ALU structural implementation is equivalent to its behavioral description
    1. Derive Boolean equations for the outputs
    2. Create a truth table for the equations
    3. Compare it to the truth table from the original behavior
• Completeness verification
  • Verifying the completeness of a behavioral description means proving that certain situations will never occur
  • Example: formally prove that the elevator door can never open while the elevator is moving
    1. Derive the conditions for the door being open
    2. Show that these conditions conflict with the conditions for the elevator moving
• Drawbacks
  • Formal verification is very hard
  • Limited to small designs or to verifying only certain key properties
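
The three-step correctness check can be sketched on a circuit small enough to enumerate. The example below uses a 1-bit full adder as a stand-in for the ALU (an assumption made only to keep the truth table tiny): one function gives the behavior as arithmetic, the other gives the Boolean equations of a gate-level structure, and the two truth tables are compared.

```python
from itertools import product

def behavioral(a, b, cin):
    # Behavior: add three 1-bit values, return (sum bit, carry out).
    total = a + b + cin
    return total & 1, total >> 1

def structural(a, b, cin):
    # Boolean equations derived from a gate-level full adder.
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def truth_table(fn):
    # One row per input combination.
    return {bits: fn(*bits) for bits in product((0, 1), repeat=3)}

equivalent = truth_table(behavioral) == truth_table(structural)
print(equivalent)
```

The same enumeration becomes infeasible as the number of inputs grows, which is one reason formal verification is limited to small designs or key properties.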

Simulation
• An approach in which we create a model of the design that can be executed on a computer
  • We provide input values to the model and check that its output values match the expected values
• Correctness verification
  • Example: check that an ALU structural implementation matches its behavioral description
    1. Provide all possible input combinations to the model
    2. Check the ALU outputs for correct results
• Completeness verification
  • Example: check that the elevator door is closed whenever the elevator is moving
    1. Provide all possible input sequences
    2. Check that the door is always closed when the elevator is moving
• Simulating all possible inputs is impossible in practice
  • A 32-bit ALU has 2^32 * 2^32 = 2^64 possible input combinations, which would take far too long to simulate
  • The designer can only simulate a tiny subset of possible inputs, which includes typical values and boundary values
• Simulation increases confidence in the correctness/completeness of the design, but does not prove anything
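
The size of the input space for the 32-bit ALU can be worked out directly. The simulation rate of 1 million input combinations per second below is a hypothetical figure chosen for illustration, not a number from the slide.

```python
OPERAND_BITS = 32
combos = 2 ** OPERAND_BITS * 2 ** OPERAND_BITS  # two 32-bit operands

COMBOS_PER_SECOND = 1_000_000       # hypothetical simulation rate
SECONDS_PER_YEAR = 365 * 24 * 3600

years = combos / COMBOS_PER_SECOND / SECONDS_PER_YEAR
print(combos, round(years))
```

Even at that optimistic rate, exhaustive simulation would take more than half a million years.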

Simulation advantages & disadvantages
• Simulation has several advantages over a physical implementation with respect to testing and debugging the system
  • Controllability: the ability to control the execution of the system, such as time and the system’s data inputs
  • Observability: the ability to examine system values; the user can stop the simulation and observe internal system values
  • Debugging: the user can stop the simulation at any point, change the input values, internal values, or environment values, and then restart
  • Setup time: simulation takes less setup time than a physical implementation, and allows the system to be tested and its outputs checked before it is built in hardware
• Simulation has disadvantages
  • Setting up a simulation can take much time for a complex external environment
  • The models of the environment are likely incomplete, so environment behavior may not be modeled correctly
  • Simulation speed is much slower than the speed of a physical implementation

Cont…
• The most significant disadvantage is simulation speed
  • E.g., a physical implementation of a microprocessor may execute 100 million instructions per second, while a simulation of its gate-level model may execute only 10 instructions per second: a big gap
• Simulation is slow for several reasons
  • Sequentializing a parallel design
    • Suppose a design has 1,000,000 logic gates; in hardware, all these gates operate in parallel
    • The simulator must read the inputs and compute the outputs of each gate, one gate at a time
  • Several programs are added between the simulated system and the real hardware
    • The simulator has to interpret the system, take the inputs, compute, and generate the outputs, all of which takes time
    • In addition, the simulator runs under an operating system, which adds further delay
• Overcoming slow simulation speed
  • Reduce the amount of real time simulated
    • Simulate only milliseconds of execution instead of hours
  • Use a faster simulator; there are two ways to make a simulator faster
    • Build and use special hardware for simulation, known as emulators
    • Use a simulator that is less precise and accurate, at the cost of reduced controllability and observability
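
The "sequentializing" point can be illustrated with a toy gate-level simulator. In hardware all gates switch concurrently, but the loop below must visit them one at a time, so the number of steps grows with the number of gates. The netlist format and the half-adder example are hypothetical.

```python
import operator

GATES = {"AND": operator.and_, "OR": operator.or_, "XOR": operator.xor}

def simulate(netlist, inputs):
    """Evaluate gates one at a time, in order; return signal values and the step count."""
    values = dict(inputs)
    steps = 0
    for out, kind, x, y in netlist:  # netlist must be listed in topological order
        values[out] = GATES[kind](values[x], values[y])
        steps += 1
    return values, steps

# Hypothetical half adder: in hardware both gates switch in parallel,
# but the simulator needs one step per gate.
half_adder = [("sum", "XOR", "a", "b"), ("carry", "AND", "a", "b")]
values, steps = simulate(half_adder, {"a": 1, "b": 1})
print(values, steps)
```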

Cont…
• Don’t need gate-level analysis for all simulations
  • E.g., cruise control: don’t care what happens at every input/output of each logic gate
• Simulating RT components: ~10x faster
• Cycle-based simulation: ~100x faster
  • Accurate at clock boundaries only
  • No information on signal changes between boundaries
• Faster simulator often combined with reduction in real time simulated
  • If willing to simulate for 10 hours
    • Use instruction-set simulator
    • Real execution time simulated: 10 hours * 1/10,000 = 0.001 hour = 3.6 seconds
• Time needed for 1 hour of real execution (figure):
  Slowdown      Method                                     Time
  1             IC                                         1 hour
  10            FPGA                                       1 day
  100           Hardware emulation                         4 days
  1,000         Throughput model                           1.4 months
  10,000        Instruction-set simulation                 1.2 years
  100,000       Cycle-accurate simulation                  12 years
  1,000,000     Register-transfer-level HDL simulation     >1 lifetime
  10,000,000    Gate-level HDL simulation                  1 millennium
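
The figures on this slide follow from simple scaling, sketched below. The slowdown factors are the ones listed on the slide; the helper names are illustrative, and the slide's times are rounded.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365

slowdown = {
    "IC": 1,
    "FPGA": 10,
    "hardware emulation": 100,
    "throughput model": 1_000,
    "instruction-set simulation": 10_000,
    "cycle-accurate simulation": 100_000,
    "register-transfer-level HDL simulation": 1_000_000,
    "gate-level HDL simulation": 10_000_000,
}

def simulation_hours(real_hours, factor):
    # Wall-clock hours needed to simulate real_hours of execution.
    return real_hours * factor

def simulated_real_seconds(budget_hours, factor):
    # Seconds of real execution covered by a fixed simulation budget.
    return budget_hours * 3600 / factor

emulation_days = simulation_hours(1, slowdown["hardware emulation"]) / HOURS_PER_DAY
gate_level_years = simulation_hours(1, slowdown["gate-level HDL simulation"]) / HOURS_PER_YEAR
iss_seconds = simulated_real_seconds(10, slowdown["instruction-set simulation"])
print(emulation_days, gate_level_years, iss_seconds)
```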

Hardware/software co-simulation
• A co-simulator is a simulator designed to hide the details of integrating an ISS and an HDL simulator
• There are many simulation approaches, varying in speed, precision, and accuracy
  • From very detailed simulation, such as a gate-level model, to very abstract simulation, such as an instruction-level model
• Simulation tools evolved separately for hardware and software
  • Software: general-purpose processor (GPP), typically simulated with an instruction-set simulator (ISS)
  • Hardware: single-purpose processor (SPP), typically simulated with models in an HDL environment
• Integrating GPPs and SPPs onto a single IC increased the need to simulate the two kinds of processor together, by merging the software and hardware simulation tools
• There are two approaches to merging software and hardware simulation
  • The simple way: create an HDL model of the GPP that runs the system’s software, and integrate it with the HDL model of the SPP. This has two disadvantages:
    • Much slower than an ISS
    • Less observable/controllable than an ISS
  • Create communication between the GPP’s ISS and the SPP’s HDL simulator: each runs in its own simulator, and data is transferred between them when needed. This is known as hardware/software co-simulation

Cont…
• Modern hardware/software co-simulators, in addition to integrating the two simulators, minimize the communication between them
• E.g., memory shared between the GPP and the SPP, which every processor has to access: where should the memory go?
  • In the ISS? The HDL simulator must stall for each memory access
  • In the HDL simulator? The ISS must stall when fetching each instruction
• The solution is to model the memory in both the ISS and the HDL simulator, and to update only the shared data in both
• Huge speedups (100x or more) reported with this technique

Emulators
• An emulator is a general physical device onto which a system can be mapped relatively quickly, and which can be placed in the system’s real environment
• Created to solve the problems of simulation: expensive environment setup, incomplete environment models, and slow simulation speed
• An emulator consists of a microprocessor IC and monitoring and controlling circuits
  • It may contain tens or hundreds of FPGAs
  • It usually supports debugging tasks
• Emulation has several advantages over simulation
  • Mapped relatively quickly
    • Hours, days
  • Can be placed in the real environment
    • No environment setup time
    • No incomplete environment model
  • Typically faster than simulation
    • Hardware implementation

Cont…
• Emulation also has disadvantages
  • Still not as fast as a real implementation
    • E.g., an emulated cruise controller may not respond fast enough to keep control of the car
  • Mapping is still time consuming
    • E.g., when mapping a complex SOC to 10 FPGAs, just partitioning it into 10 parts could take weeks
  • Can be very expensive
    • Top-of-the-line FPGA-based emulator: $100,000 to $1 million
    • Leads to a resource bottleneck: a company may be able to afford only one emulator, so groups have to wait for it

Reuse: intellectual property cores
• Designers have long had commercial off-the-shelf (COTS) components
  • Predesigned, packaged ICs
  • Reduce design and debug time
• System-on-chip (SOC): implementing all components of a system on a single chip
  • Made possible by increasing IC capacities
• This is changing the way COTS components are sold
  • Sold as intellectual property (IP) rather than as actual ICs
  • Delivered as behavioral, structural, or physical descriptions
  • Designers can integrate these descriptions with others to form one large SOC
• Processor-level components are known as cores
  • A core is a GPP or SPP IP component

Cont…
• Soft core
  • Synthesizable behavioral description
  • Typically written in HDL (VHDL/Verilog)
• Firm core
  • Structural description
  • Typically provided in HDL
• Hard core
  • Physical description
  • Provided in a variety of physical layout file formats
[Figure: Gajski’s Y-chart, with soft, firm, and hard cores corresponding to the behavioral, structural, and physical axes]

Hard/Soft core advantages & disadvantages
• Hard cores
  • Ease of use
    • Developer already designed and tested the hard core
    • Can use right away
    • Can expect it to work correctly
  • Predictability
    • Size, power, performance predicted accurately
  • Specific to an exact IC process, and not easily mapped (retargeted) to a different process
    • E.g., core available for vendor X’s 0.25 micrometer CMOS process
    • Can’t use with vendor X’s 0.18 micrometer process
    • Can’t use with vendor Y
• Soft cores
  • Can be synthesized to nearly any technology
  • Can be optimized for a particular use
    • E.g., delete unused portions of the core, giving lower power and smaller designs
  • Require more design effort
  • May not work in a technology they were not tested for
  • Not as optimized as a hard core for the same processor, since hard cores have been given more attention

Firm core advantages & disadvantages
• Compromise between hard and soft cores
  • Some retargetability
  • Limited optimization
  • Better predictability/ease of use

New challenges to processor providers
• Cores have dramatically changed the business model of GPP and SPP vendors, affecting both pricing models and IP protection
• Pricing models
  • In the past
    • Vendors sold the product to designers as an IC
    • Designers had to buy any additional copies, because they could not (economically) copy the IC from the original
  • Today
    • Vendors can sell IP instead of the ICs themselves
    • Designers incorporate the IP into an SOC and can make as many copies as needed
    • Vendors can use different pricing models
      • Royalty-based model: similar to the old IC model; the designer pays for each additional copy produced
      • Fixed price model: one price for the IP, and designers can make as many copies as needed
      • Many other models are used
• IP protection (see next slide)

IP protection
• IP protection has become a key concern of core providers
• In the past
  • Illegally copying an IC was very difficult
    • Reverse engineering required tremendous, deliberate effort
    • “Accidental” copying was not possible
• Today
  • Cores are sold in electronic format
    • Deliberate and accidental unauthorized copying are easier
  • Vendors give safeguards great weight when selling their products
    • Contracts between vendors and designers to ensure the IP is not copied or distributed
    • Encryption techniques to limit the actual exposure of the IP
    • Watermarking to determine whether a particular instance of a processor was copied and whether the copy was authorized

New challenges to processor users
• Cores pose new challenges for designers who use GPPs and SPPs
• Licensing arrangements
  • Purchasing cores is not as easy as purchasing ICs
  • More contracts enforcing the pricing model and IP protection, possibly requiring legal assistance
• Extra design effort
  • Especially for soft cores
    • Must still be synthesized and tested
    • Minor differences in synthesis tools can cause problems
• Verification requirements more difficult
  • Extensive testing for synthesized soft cores and for soft/firm cores mapped to a particular technology
    • Ensure correct synthesis
    • Timing and power vary between implementations
  • No direct access to a core once it has been integrated into a chip
    • Cores are buried within the IC
    • Cannot simply replace a bad core, as a bad IC could be replaced in the past

Design process model
• Describes the order in which design steps are processed; each step has many substeps
  1. Behavior description step
  2. Behavior to structure conversion step
  3. Mapping structure to physical implementation step
• Waterfall model
  • Proceed to the next step only after the current step is completed
• Spiral model
  • Proceed through the 3 steps in order but with less detail
  • Repeat the 3 steps, gradually increasing detail
  • Keep repeating until the desired system is obtained
  • Becoming extremely popular (hardware & software development)
[Figures: waterfall design model and spiral design model, each over the behavioral, structural, and physical steps]

Waterfall method
• If the designer has 6 months to build a system, the process is:
  1. The designer starts by describing the behavior of the system completely, which may take two months
  2. Once fully satisfied that the behavior is correct, the designer moves to the structural design, which may also take two months
  3. Once fully satisfied that the structure is correct, the designer does the physical implementation
• Drawbacks
  • After moving to the next step, we cannot go back to the previous one
  • Not very realistic
    • Bugs are often found in later steps that must be fixed in an earlier step
      • E.g., when testing the structure we notice that we forgot to handle a certain input condition at the behavior level
    • A prototype is often needed to know the complete desired behavior
      • E.g., customer adds features after a product demo
    • System specifications commonly change
      • E.g., to remain competitive by reducing power or size, certain features are dropped
  • Unexpected iterations back through the 3 steps cause missed deadlines
    • Lost revenues
    • May never make it to market
[Figure: waterfall design model, behavioral → structural → physical]

Spiral method
• If the designer has 6 months to build a system, the process is:
  1. The designer starts by describing the basic, incomplete behavior of the system, which may take a few weeks
  2. The designer proceeds to the structural design, which may also take a few weeks
  3. The designer creates a physical prototype of the system, which is used to test out the basic functions
  4. Go back to the first step and continue
• First iteration of the 3 steps is incomplete
  • Much faster, though
  • End up with a prototype
    • Use it to test basic functions
    • Get an idea of functions to add/remove
  • Experience from the original iteration helps in the following iterations of the 3 steps
• Drawbacks
  • The designer must come up with ways to obtain structure and physical implementations quickly
    • E.g., the designer uses FPGAs for the prototype, since generating new silicon for the final product takes a long time
    • May have to use more tools, which could require extra effort/cost
  • Could require more time than the waterfall method because of the overhead of creating physical prototypes
    • This is the case if the waterfall method would have produced a correct implementation the first time
[Figure: spiral design model over the behavioral, structural, and physical steps]

General-purpose processor design models
• Previous slides focused on SPPs
• Can apply equally to GPPs
  • Waterfall model
    • Structure developed by a particular company
    • Acquired by the embedded system designer
    • Designer develops software (behavior)
    • Designer maps application to architecture
      • Compilation
      • Manual design
  • Spiral-like model
    • Beginning to be applied by embedded system designers

Spiral-like model
• Designer develops or acquires architecture
• Develops application(s)
• Maps application to architecture
• Analyzes design metrics
• Now makes a choice
  • Modify mapping
  • Modify application(s) to better suit architecture
  • Modify architecture to better suit application(s)
    • Not as difficult now
      • Maturation of synthesis/compilers
      • IPs can be tuned
• Continue refining to lower abstraction level until a particular implementation is chosen
[Figure: Y-chart with architecture and application(s) feeding mapping, followed by analysis]

Summary
• Design technology seeks to reduce the gap between IC capacity growth and designer productivity growth
• Synthesis has changed digital design
• Increased IC capacity means sw/hw components coexist on one chip
• Design paradigm shift to core-based design
• Simulation essential but hard
• Spiral design process is popular

References
• Frank Vahid and Tony Givargis, Embedded System Design: A Unified Hardware/Software Introduction