• Number of slides: 39

Hardware Software Co-Design
Group 8: Sandeep V, Vivek Radhakrishnan, Shravan Rao, Deepa Nagaraj, Jagat Bhushan, Rathish Kumar, Sameera Markod

Agenda
• Introduction
• Concept
• Fundamental Phases of HSC
• Power Metrics
• Low Power Design
• Hardware/Software Partitioning
• Co-synthesis
• Implementation Example
• Conclusion

Need for HW/SW Co-Design
• Earlier, hardware and software components were designed in isolation.
• Drawbacks:
  - Suboptimal implementation.
  - Hardware/software integration issues.
  - Uncertainty in system functionality.
  - High cost due to long development cycles.

Hardware Software Co-design (HSC)
• Concept: meeting system-level objectives by exploiting the synergy of hardware and software through their concurrent design.
• Advantages:
  - Shorter development cycles
  - Low-power designs
  - Component reuse
  - Lower cost

Understanding HSC
• HSC is divided into three key areas: co-development, co-verification and co-simulation.
• Co-development is the detailed implementation of the hardware and software components.
• Co-verification is the verification of these components using co-simulation technology.
• In simple terms, in HSC the system is modeled and simulated at various levels of abstraction, the trade-offs are analyzed, the system is partitioned into hardware and software components, and both parts are designed concurrently.

Fundamental Phases of HSC
[Figure: HSC design flow. Blocks: Formal Specification, Model Library, Models, Modeling, Simulation and Verification, Design Evaluation, Modification & Refinements, Verified Design, HW/SW Partitioning, Software Synthesis (C code), Hardware Synthesis (VHDL code), Interfacing, Implementation and Prototyping]

Fundamental Phases of HSC (continued)
• Formal specification: documents the system's requirements, its constraints (such as power and size), its functionality and behavior, and its major interfaces.
• Modeling: a set of descriptions that define the model of the design.
  - Structural model: describes the hardware and software components that make up the system.
  - Functional model: describes the system's functionality.
  - Dynamic model: describes the state transitions occurring inside the system.
• Model library: provides knowledge about existing designs that can be reused.

Fundamental Phases of HSC (continued)
• Simulation and verification: performed with CAD-tool-based simulation engines. Initial conditions are set through scenario-based test cases.
• Hardware/software partitioning: the verified design is translated into C code (software components) and hardware description language code such as Verilog or VHDL (hardware components). The Fiduccia-Mattheyses (FM) algorithm is an efficient heuristic used both for functional partitioning in HSC and in circuit placement.
• Trade-offs considered during HW/SW partitioning (standard processor, core processor, ASIP, ASIC):
  - Performance: Medium, Highest
  - Power: High, Medium-low, Lowest
  - Flexibility: Medium, High, Low
  - Design time: Low (software), Medium, Highest, High (hardware)
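
The slide only names the FM algorithm. As a rough illustration, the Python sketch below implements a simplified Fiduccia-Mattheyses pass for a two-way HW/SW split. It relies on several assumptions that do not come from the slides: tasks are graph nodes, edge weights are communication costs, side 0 is software and side 1 is hardware, and the goal is a minimum cut. It also omits the gain-bucket data structure that gives real FM its linear time per pass.

```python
# Simplified Fiduccia-Mattheyses (FM) sketch for a two-way HW/SW split.
# Assumptions: nodes are tasks, edge weights are communication costs,
# side 0 = software, side 1 = hardware, objective = minimum cut weight.

def cut_cost(edges, side):
    """Total weight of edges whose endpoints lie on different sides."""
    return sum(w for (u, v), w in edges.items() if side[u] != side[v])


def fm_pass(nodes, edges, side, min_size):
    """One FM pass: move every node once, keep the best prefix of moves."""
    adj = {n: [] for n in nodes}
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))

    side = dict(side)
    sizes = [sum(1 for n in nodes if side[n] == s) for s in (0, 1)]
    locked = set()
    cost = cut_cost(edges, side)
    best_cost, best_side = cost, dict(side)

    def gain(n):
        # Cut reduction if n switches sides: edges to the other side stop
        # being cut, edges to its own side start being cut.
        return sum(w if side[m] != side[n] else -w for m, w in adj[n])

    for _ in range(len(nodes)):
        # Unlocked nodes whose move leaves at least min_size nodes behind.
        candidates = [(gain(n), n) for n in nodes
                      if n not in locked and sizes[side[n]] - 1 >= min_size]
        if not candidates:
            break
        g, n = max(candidates)  # best gain, even if negative
        sizes[side[n]] -= 1
        side[n] = 1 - side[n]
        sizes[side[n]] += 1
        locked.add(n)
        cost -= g
        if cost < best_cost:
            best_cost, best_side = cost, dict(side)
    return best_side, best_cost


def fm_partition(nodes, edges, side, min_size=1):
    """Repeat FM passes until a pass no longer reduces the cut."""
    cost = cut_cost(edges, side)
    while True:
        new_side, new_cost = fm_pass(nodes, edges, side, min_size)
        if new_cost >= cost:
            return side, cost
        side, cost = new_side, new_cost


nodes = ["A", "B", "C", "D"]
edges = {("A", "B"): 5, ("C", "D"): 5, ("B", "C"): 1, ("A", "D"): 1}
initial = {"A": 0, "B": 1, "C": 0, "D": 1}
side, cost = fm_partition(nodes, edges, initial)
print(side, cost)  # {'A': 1, 'B': 1, 'C': 0, 'D': 0} 2
```

Each pass tentatively moves every node once, even when the gain is negative, and then keeps the best prefix of moves. This is what lets FM escape shallow local minima. A real HW/SW partitioner would also weigh execution time, area and power, not just the cut.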

Fundamental Phases of HSC (continued)
• Interfacing: communication between system components requires interfaces.
  - HW-HW, HW-SW and SW-SW.
  - Examples: memory-mapped I/O, interrupt handling, busy waiting and handshaking.
• Implementation and prototyping:
  - The system functionality is broken down into small, domain-independent, concurrent and communicating processes.
  - Performance requirements such as operating frequency, throughput and latency are defined.
  - Allocation: each process is assigned to either the HW or the SW domain.
  - The processes are scheduled.
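
The following small Python model makes the memory-mapped I/O and busy-waiting interface concrete: a software driver polls a hardware accelerator. The register map, the addresses and the "double the input" operation are all hypothetical. The hardware is simulated by a `tick()` method instead of running on its own clock.

```python
# Sketch of a SW-HW interface: memory-mapped registers plus busy-waiting.
# Register map, addresses and the doubling operation are hypothetical.

class Accelerator:
    """Toy accelerator model exposing memory-mapped registers."""
    CTRL, STATUS, DATA_IN, DATA_OUT = 0x00, 0x04, 0x08, 0x0C
    START_BIT = DONE_BIT = 0x1

    def __init__(self, latency_cycles=3):
        self.regs = {self.CTRL: 0, self.STATUS: 0, self.DATA_IN: 0, self.DATA_OUT: 0}
        self.latency_cycles = latency_cycles
        self._remaining = 0

    def write(self, addr, value):
        self.regs[addr] = value
        if addr == self.CTRL and value & self.START_BIT:
            self.regs[self.STATUS] = 0  # clear DONE when a job starts
            self._remaining = self.latency_cycles

    def read(self, addr):
        return self.regs[addr]

    def tick(self):
        """Advance the hardware model by one clock cycle."""
        if self._remaining > 0:
            self._remaining -= 1
            if self._remaining == 0:
                self.regs[self.DATA_OUT] = 2 * self.regs[self.DATA_IN]
                self.regs[self.CTRL] = 0
                self.regs[self.STATUS] = self.DONE_BIT


def offload(acc, value):
    """Software side: write operand, set START, busy-wait on DONE, read result."""
    acc.write(acc.DATA_IN, value)
    acc.write(acc.CTRL, acc.START_BIT)
    polls = 0
    while (acc.read(acc.STATUS) & acc.DONE_BIT) == 0:
        acc.tick()  # real hardware would advance on its own clock
        polls += 1
    return acc.read(acc.DATA_OUT), polls


result, polls = offload(Accelerator(latency_cycles=3), 21)
print(result, polls)  # 42 3
```

Busy waiting is simple but keeps the CPU spinning for the whole latency. An interrupt-driven interface would let the CPU sleep or do other work until DONE is signaled.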

Low Power Design Approach by HSC
• System-level partitioning, i.e. the assignment of operations to HW and SW, strongly affects power consumption, system cost and performance.
• A hierarchical power-efficiency system is an efficient way to reduce dynamic power consumption.

High-Level Power Estimation Metrics
• Embedded systems are categorized as:
  - Timing-constrained systems: speed is the most important design constraint.
  - Area-constrained systems: area is the most important constraint.
• Timing-constrained systems can be further categorized by system throughput (T) into fixed-throughput, maximum-throughput and burst-throughput modes.
• For timing-constrained systems, the energy-to-throughput ratio is defined as $\mathrm{ETR} = P/T^2$, where $P$ is the power.
• Energy per operation times area: $\mathrm{EAP} = (P \cdot A)/T$, where $A$ is the area.
• Power is estimated by considering the HW and SW parts separately.
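
Here is a quick worked example of the two metrics. The numbers are invented, with power P in watts, throughput T in operations per second and area A in mm². ETR follows from the energy per operation P/T divided once more by T.

```python
# Worked example of ETR and EAP with hypothetical design numbers.

def etr(power_w, throughput):
    """Energy-to-throughput ratio: (P / T) / T = P / T**2 (lower is better)."""
    return power_w / throughput ** 2


def eap(power_w, area_mm2, throughput):
    """Energy per operation times area: (P * A) / T (lower is better)."""
    return power_w * area_mm2 / throughput


# Design A: faster but larger and hungrier; design B: slower and smaller.
designs = {"A": (1.0, 4.0, 200.0), "B": (0.5, 2.0, 100.0)}  # (P, area, T)
for name, (p, a, t) in designs.items():
    print(name, etr(p, t), eap(p, a, t))
```

With these assumed numbers, design A has the lower ETR (2.5e-5 versus 5e-5) and design B has the lower EAP (0.01 versus 0.02). This shows why timing-constrained and area-constrained systems are compared with different metrics.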

Hardware Part Power Estimation
• Power dissipated by an ASIC:
  $P_{avg} = P_{IO} + P_{core}$ (1)
  $P_{core} = P_{DP} + P_{MEM} + P_{CNTR} + P_{PROC}$ (2)
  where
  - $P_{IO}$: power dissipated by the I/O
  - $P_{core}$: power dissipated by the core
  - $P_{DP}$: power dissipated by the data path
  - $P_{MEM}$: power dissipated by the memory
  - $P_{CNTR}$: power dissipated by the control logic
  - $P_{PROC}$: power dissipated by the processor

Hardware Part Power Estimation (continued)
  $P_{DP} = P_{REG} + P_{MUX} + P_{FU}$ (3)
  $P_{CNTR} = P_{IN} + P_{STATE\_REG} + P_{COMB} + P_{OUT}$ (4)
  where
  - $P_{REG}$: power dissipated by the registers
  - $P_{MUX}$: power dissipated by the multiplexers
  - $P_{FU}$: power dissipated by the functional units
  - $P_{IN}$: power dissipated by the primary inputs
  - $P_{STATE\_REG}$: power dissipated by the state registers
  - $P_{COMB}$: power dissipated by the combinational logic
  - $P_{OUT}$: power dissipated by the primary outputs
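
Equations (1) to (4) compose naturally into a hierarchical model, and the Python sketch below mirrors that decomposition. The component values are hypothetical numbers (for example in mW). In practice they would come from lower-level estimators for each block.

```python
from dataclasses import dataclass

# Hierarchical ASIC power model following equations (1)-(4).
# All component values are hypothetical inputs (e.g. in mW).


@dataclass
class DataPath:
    reg: float  # P_REG: registers
    mux: float  # P_MUX: multiplexers
    fu: float   # P_FU: functional units

    def power(self) -> float:
        return self.reg + self.mux + self.fu  # eq. (3)


@dataclass
class Controller:
    inputs: float     # P_IN: primary inputs
    state_reg: float  # P_STATE_REG: state registers
    comb: float       # P_COMB: combinational logic
    outputs: float    # P_OUT: primary outputs

    def power(self) -> float:
        return self.inputs + self.state_reg + self.comb + self.outputs  # eq. (4)


@dataclass
class AsicPower:
    io: float            # P_IO
    datapath: DataPath   # P_DP
    mem: float           # P_MEM
    control: Controller  # P_CNTR
    proc: float          # P_PROC

    def core(self) -> float:
        return self.datapath.power() + self.mem + self.control.power() + self.proc  # eq. (2)

    def average(self) -> float:
        return self.io + self.core()  # eq. (1)


asic = AsicPower(io=12.0, datapath=DataPath(3.0, 1.0, 6.0), mem=8.0,
                 control=Controller(0.5, 1.0, 2.0, 0.5), proc=20.0)
print(asic.core(), asic.average())  # 42.0 54.0
```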

Software Part Power Estimation
• Average power dissipated by the processor while running a program: $P_{SW} = I_{AVG} \cdot V_{DD}$
  - $I_{AVG}$ is the average current.
  - $V_{DD}$ is the supply voltage.
• Energy: $E_{SW} = P_{SW} \cdot t_{SW}$
• $t_{SW}$ is the execution time: $t_{SW} = N_{CLK} \cdot t_{CLK}$, where $N_{CLK}$ is the number of clock cycles needed to execute the program and $t_{CLK}$ is the clock period.
• The processor's instruction set and addressing modes affect its average power dissipation.
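
This is a worked example of the software energy formulas. It assumes an average current of 0.2 A at 1.8 V and a program that needs 10^6 cycles on a 100 MHz clock.

```python
# Worked example: P_SW = I_AVG * V_DD, t_SW = N_CLK * t_CLK, E_SW = P_SW * t_SW.
# Current, voltage, cycle count and clock frequency are assumed values.

def software_energy(i_avg_a, vdd_v, n_clk, f_clk_hz):
    p_sw = i_avg_a * vdd_v      # average power (W)
    t_clk = 1.0 / f_clk_hz      # clock period (s)
    t_sw = n_clk * t_clk        # execution time (s)
    return p_sw, t_sw, p_sw * t_sw  # energy (J)


p, t, e = software_energy(i_avg_a=0.2, vdd_v=1.8, n_clk=1_000_000, f_clk_hz=100e6)
print(f"P = {p:.2f} W, t = {t * 1e3:.1f} ms, E = {e * 1e3:.1f} mJ")
# P = 0.36 W, t = 10.0 ms, E = 3.6 mJ
```

Because the energy scales with $N_{CLK}$, an instruction sequence that completes the same work in fewer cycles saves energy even at the same average power.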

Low Power Co-design Techniques
• There are three types of power dissipation in a system:
  - Dynamic power: $P_{dynamic} = C_L \cdot N_{SW} \cdot V_{DD}^2 \cdot f$
  - Static power
  - Short-circuit power
• We focus mainly on reducing dynamic power dissipation.
• $N_{SW}$ (the average number of circuit state switches) is reduced by techniques such as minimizing the Hamming distance between consecutive operations/instructions or minimizing the number of operations.
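
One way to see how the Hamming distance relates to $N_{SW}$ is to count the bit flips between consecutive instruction words on a bus. The sketch below assumes the instructions are independent, so any order is legal, and it uses made-up 4-bit encodings. The greedy nearest-neighbour ordering is a simple heuristic chosen for illustration, not a method taken from the slides.

```python
# Approximate N_SW as the Hamming distance between consecutive words on a bus.
# Assumes independent instructions (any order is legal); encodings are made up.

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def total_switching(seq):
    return sum(hamming(x, y) for x, y in zip(seq, seq[1:]))


def greedy_low_switching_order(seq):
    """Nearest-neighbour heuristic: always issue the word closest to the last one."""
    remaining = list(seq[1:])
    order = [seq[0]]
    while remaining:
        nxt = min(remaining, key=lambda w: hamming(order[-1], w))
        remaining.remove(nxt)
        order.append(nxt)
    return order


original = [0b0000, 0b1111, 0b0001, 0b1110]
reordered = greedy_low_switching_order(original)
print(total_switching(original), total_switching(reordered))  # 11 5
```

In this example the switching count drops from 11 to 5 bit flips. Because $P_{dynamic}$ is proportional to $N_{SW}$, this simple model predicts the bus's dynamic power falls in the same ratio. The model ignores differences in line capacitance and coupling.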

Low Power Design at Various Levels
• System-level power-aware design covers power and energy management and modeling issues at various levels.
Microarchitecture-Level Low Power Design
• Instruction set architecture (ISA) level:
  - A common approach is to combine multiple processor instructions into a single, complex, low-power instruction.
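
Here is a minimal sketch of ISA-level instruction combining: a peephole pass that fuses a multiply and a dependent add into a single multiply-accumulate. The three-address tuple format and the `mac` opcode are hypothetical. The pass also assumes that a temporary register not read later in the block is dead, because it performs no live-out analysis.

```python
# Peephole pass fusing "mul tmp, a, b" + "add dst, tmp, c" into "mac dst, a, b, c".
# Instruction format (opcode, dst, src...) and the "mac" opcode are hypothetical.

def _read_later(reg, rest):
    """True if any later instruction reads reg (operands follow the destination)."""
    return any(reg in ins[2:] for ins in rest)


def fuse_mul_add(program):
    out, i = [], 0
    while i < len(program):
        ins = program[i]
        if ins[0] == "mul" and i + 1 < len(program):
            _, tmp, a, b = ins
            nxt = program[i + 1]
            # The add must read tmp exactly once, and tmp must be dead afterwards.
            if (nxt[0] == "add" and (nxt[2] == tmp) != (nxt[3] == tmp)
                    and not _read_later(tmp, program[i + 2:])):
                _, dst, x, y = nxt
                addend = y if x == tmp else x
                out.append(("mac", dst, a, b, addend))  # dst = a * b + addend
                i += 2
                continue
        out.append(ins)
        i += 1
    return out


prog = [("mul", "r1", "r2", "r3"),
        ("add", "r4", "r1", "r5"),
        ("sub", "r6", "r4", "r2")]
fused = fuse_mul_add(prog)
print(len(prog), "->", len(fused), fused)
```

The fused program issues two instructions instead of three, so fewer instructions are fetched and decoded. That reduction is where the power saving of a combined instruction is expected to come from.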

Microarchitecture-Level Low Power Design (continued)
• Instruction cache (I-cache) and bus level:
  - Compress the instructions in memory. This saves instruction-fetch energy because fewer bits are transferred per fetch.
  - Use a loop cache: keep tight loops in a small loop cache instead of accessing a larger cache block.
• Cache region reservation:
  - Regions of the cache are reserved for applications with tight memory requirements.
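
The loop-cache idea can be checked with a back-of-the-envelope model. The sketch assumes relative per-fetch energies of 1.0 for the I-cache and 0.2 for the loop cache, which are illustrative values rather than measurements. It also assumes the loop body fits in the loop cache.

```python
# Loop-cache fetch-energy model. Per-fetch energies are assumed relative units.

def fetch_energy(body_len, iterations, e_icache, e_loop=None, loop_cache_size=0):
    total_fetches = body_len * iterations
    if e_loop is None or body_len > loop_cache_size:
        return total_fetches * e_icache  # every fetch goes to the I-cache
    # The first iteration fills the loop cache from the I-cache;
    # the remaining iterations hit the small loop cache.
    return body_len * e_icache + body_len * (iterations - 1) * e_loop


baseline = fetch_energy(16, 1000, e_icache=1.0)
with_lc = fetch_energy(16, 1000, e_icache=1.0, e_loop=0.2, loop_cache_size=32)
print(baseline, round(with_lc, 1), f"{1 - with_lc / baseline:.0%} saved")
```

Under these assumptions, a 16-instruction loop run 1000 times costs 16000 units from the I-cache alone and about 3213 units with the loop cache. That is roughly an 80% reduction in fetch energy.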

Microarchitecture-Level Low Power Design (continued)
• Voltage and frequency scaling:
  - DVS (dynamic voltage scaling): the processor is augmented with hardware blocks that allow the supply voltage to be changed dynamically.
  - Reducing $V_{DD}$ and/or the clock frequency saves substantial power.
  - DVS heuristics usually trade off power savings against delay.
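
This minimal DVS sketch chooses the operating point with the lowest energy that still meets a deadline. It models energy per cycle as $C_{eff} \cdot V_{DD}^2$, which is consistent with the dynamic power formula. The operating points and $C_{eff}$ are assumed values.

```python
# DVS sketch: lowest-energy operating point that still meets the deadline.
# Operating points and the effective capacitance are assumed values.

OPERATING_POINTS = [(200e6, 1.2), (400e6, 1.5), (600e6, 1.8)]  # (f in Hz, V_DD in V)


def choose_point(cycles, deadline_s, points=OPERATING_POINTS, c_eff=1e-9):
    feasible = [(f, v) for f, v in points if cycles / f <= deadline_s]
    if not feasible:
        return None  # the deadline cannot be met at any setting
    # Energy per task = C_eff * V^2 * cycles, independent of f in this model.
    f, v = min(feasible, key=lambda p: c_eff * p[1] ** 2 * cycles)
    return f, v, cycles / f, c_eff * v ** 2 * cycles  # (f, V, time s, energy J)


print(choose_point(cycles=1e8, deadline_s=0.3))
print(choose_point(cycles=1e8, deadline_s=0.1))
```

For 10^8 cycles and a 0.3 s deadline, the sketch picks the 400 MHz, 1.5 V point. It uses (1.5/1.8)² ≈ 0.69 of the energy of the 600 MHz point but takes 0.25 s instead of about 0.17 s. This is the power-versus-delay trade-off that DVS heuristics must balance.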

Compiler-Level Low Power Design
• Voltage and frequency scaling:
  - Compiler-driven DVS using "checkpoints": statically determined points in the program where the processor frequency and voltage can be changed.
• Memory organization:
  - Memory accounts for around 50% of the power dissipation.
  - Categorize the memory-subsystem accesses and customize the memory architecture for the access types and locality patterns.
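
The checkpoint idea can be sketched as follows. At each statically inserted checkpoint, a hypothetical runtime picks the slowest frequency level that still finishes the remaining worst-case cycles before the deadline. This reclaims the slack left by segments that ran faster than their worst case. The frequencies, cycle counts and deadline are all assumed values.

```python
# Checkpoint-based DVS sketch. Frequency levels, cycle counts and the deadline
# are assumed values; one checkpoint is placed before each program segment.

FREQS_HZ = [50e6, 100e6, 150e6, 200e6]


def pick_frequency(remaining_wc_cycles, time_left_s, freqs=FREQS_HZ):
    """Slowest level that still finishes the remaining worst case in time."""
    if time_left_s <= 0:
        return max(freqs)
    needed = remaining_wc_cycles / time_left_s
    for f in sorted(freqs):
        if f >= needed:
            return f
    return max(freqs)  # deadline cannot be guaranteed; run as fast as possible


def run_with_checkpoints(segments, deadline_s):
    """segments: list of (worst_case_cycles, actual_cycles)."""
    t, chosen = 0.0, []
    for k, (_, actual) in enumerate(segments):
        remaining_wc = sum(wc for wc, _ in segments[k:])
        f = pick_frequency(remaining_wc, deadline_s - t)
        chosen.append(f)
        t += actual / f  # the segment runs its actual cycles at frequency f
    return chosen, t


freqs, finish = run_with_checkpoints([(1e6, 0.25e6), (1e6, 1e6)], deadline_s=0.01)
print(freqs, finish)
```

In this run the first segment uses far fewer cycles than its worst case. The second checkpoint can therefore drop from 200 MHz to 150 MHz and still finish in about 7.9 ms, within the 10 ms deadline.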

Operating-System-Level Low Power Design
• An RTOS typically uses a modular architecture that can be adapted for low-power applications.
• I/O device scheduling:
  - Power-aware I/O device scheduling.
  - Online algorithms that take the task schedule and the device-usage list as inputs can produce a sequence of sleep/idle states for each device.
• Power and energy profiling of the OS:
  - Software layers in an OS consume significant power.
  - Poorly implemented idle loops can increase power consumption.
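
This is a simplified sketch of power-aware I/O device scheduling. Given the intervals in which a device is needed (derived from the task schedule), it emits a sequence of active, idle and sleep states. The "break-even time" rule is a common modelling assumption added here: sleep only if the gap is long enough to pay back the transition cost. The sketch also works on a known schedule, so it is not a true online algorithm.

```python
# Power-aware device state sequence from a known usage list.
# The break-even threshold is an assumed modelling parameter.

def device_states(usage, break_even):
    """usage: sorted, non-overlapping (start, end) intervals when the device is busy."""
    states = []
    for i, (start, end) in enumerate(usage):
        if i > 0:
            gap_start = usage[i - 1][1]
            gap = start - gap_start
            if gap > 0:
                # Sleep only when the idle gap can amortize sleep/wake-up costs.
                mode = "sleep" if gap >= break_even else "idle"
                states.append((mode, gap_start, start))
        states.append(("active", start, end))
    return states


for state in device_states([(0, 2), (3, 4), (10, 12)], break_even=3):
    print(state)
```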

Operating-System-Level Low Power Design (continued)
• Jitter:
  - DVS schemes can be used to reclaim slack for power savings.
Network-Level Low Power Design
• Network protocols that increase determinism.
• Applied in wireless networks to build low-power ad hoc networks.
• Energy Efficient Ethernet (EEE).

Low-Power Hardware/Software Partitioning
• Uses a hierarchical power-efficiency system, which offers a better trade-off of performance against power.
• Configuration-aware data partitioning:
  - Depending on the application requirements, the reconfiguration overhead affects the data-partitioning process.
• Multi-rate cyclic scheduling algorithm:
  - Not only minimizes the schedule length (thus allowing cheaper PEs) but also significantly reduces reconfiguration energy.

RC and Power-Performance Trade-offs
• Reconfigurable computing (RC) is an alternative to ASICs and general-purpose processors.
• It provides the flexibility of software processors together with the efficiency and throughput of hardware coprocessors.
• It can reduce power consumption by performing computations more efficiently.

Architecture

Architecture (continued)
• DRP: dynamically reconfigurable processor.
• L2: on-chip multibank memory subsystem.
• L1: local memory buffer.
Each DRP processor has its own clock signal, which makes this a globally asynchronous, locally synchronous (GALS) architecture.

Architecture of a DRP
• 1) The load unit
• 2) The store unit
• 3) The dynamically reconfigurable logic
This organization allows three processes to run concurrently:

Architecture of a DRP (continued)
• 1) The load unit receives data for the next computation.
• 2) The reconfigurable logic processes data from a buffer in the load unit and stores the processed data in a buffer of the store unit.
• 3) The store unit sends the previously processed data to the L2 memory subsystem.
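
A simple timing model can estimate the benefit of running the load, compute and store processes concurrently. The Python sketch below assumes constant per-block stage times and buffers large enough that no unit ever stalls waiting for space. The numbers are illustrative only.

```python
# Timing of the three DRP processes (load, compute, store): sequential versus
# overlapped. Assumes constant stage times and sufficiently large buffers.

def sequential_time(n_blocks, t_load, t_comp, t_store):
    return n_blocks * (t_load + t_comp + t_store)


def overlapped_time(n_blocks, t_load, t_comp, t_store):
    load_end = comp_end = store_end = 0
    for _ in range(n_blocks):
        load_end += t_load                              # load unit fetches the next block
        comp_end = max(load_end, comp_end) + t_comp     # logic needs data and must be free
        store_end = max(comp_end, store_end) + t_store  # store unit drains results to L2
    return store_end


print(sequential_time(4, 2, 5, 2), overlapped_time(4, 2, 5, 2))  # 36 24
```

With 4 blocks and stage times of 2, 5 and 2 time units, overlapping cuts the total from 36 to 24 units. For long runs, throughput approaches one block per slowest-stage time, which here is the reconfigurable logic.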

Energy–Performance Trade-off Results

Energy–Performance Trade-off Results (continued)
• Dynamic reconfiguration improves on the HW/SW partitioning approach by:
  1) 62.5% when using two DRP processors
  2) 47.3% when using three DRP processors

Software Hardware Co-Synthesis
• Hardware/software co-synthesis is the process of partitioning a system specification into hardware and software modules to meet performance, power and cost goals.
• Embedded computing systems must meet tight cost, power-consumption and performance constraints.
• Embedded computing systems are often heterogeneous multiprocessors with multiple CPUs and hardwired processing elements (PEs).

Software Hardware Co-Synthesis Design Platform
• Hardware/software co-design can be used either to design systems from scratch or to reuse an existing platform. The CPU + accelerator architecture is one common co-design platform. It can be realized as:
  1) A PC-based system with the accelerator housed on a board plugged into the PC bus.
  2) A custom printed circuit board, using either an FPGA or a custom integrated circuit for the accelerator.
  3) A platform FPGA that includes a CPU and an FPGA fabric on a single chip.
  4) A custom integrated circuit, in which the accelerator implements a function in less area and with lower power consumption.

Software Hardware Co-Synthesis Algorithms
• Given a system specification, a co-synthesis algorithm produces a detailed description of an architecture that meets the design constraints and optimizes a set of costs.
• Ideally, it tries to satisfy multiple objective functions simultaneously, such as execution time, price and average power consumption.
• A co-synthesis algorithm must:
  - select the processing elements (PEs) and communication resources to use in the embedded system (allocation);
  - determine which resource will carry out each portion of the specification's computation and communication (assignment);
  - produce a schedule for all of the specification's computation and communication (scheduling).

Software Hardware Co-Synthesis Algorithm Inputs
• Task graphs have been used for many years to specify concurrent software. Task graphs are generally not described at the operator level, so they provide a coarser-grained description of the functionality.
• In addition to representing the program to be implemented, we must also represent the hardware platform being designed. A table gives the execution time of each process for each type of processing element.

Software Hardware Co-Synthesis Algorithm Inputs (continued)
• A co-synthesis program can easily look up the execution time of a process once it knows the type of PE to which the process has been allocated. In general, a multidimensional table can have several entries for each row/column pair, including:
  1) The CPU time entry gives the computation time required for a process.
  2) The communication time entry gives the time required to send data across a link; the amount of data is specified by the source-destination pair.
  3) The cost entry gives the manufacturing cost of a processing element or communication link.
  4) The power entry gives the power consumption of a PE or communication link; this entry can be further subdivided into static and dynamic power components.
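
This deliberately small co-synthesis sketch ties allocation, assignment and scheduling together, driven by a PE table like the one described above. Everything in it is hypothetical: the PE types, costs and execution times, the four-task graph and the fixed link delay. Its strategy (exhaustive allocation plus greedy earliest-finish assignment) is also far simpler than published co-synthesis algorithms.

```python
from itertools import combinations

# Hypothetical PE library: manufacturing cost and execution time of each task per PE type.
PE_LIBRARY = {
    "asic": {"cost": 30, "time": {"A": 1, "B": 1, "C": 9, "D": 1}},
    "cpu":  {"cost": 10, "time": {"A": 4, "B": 6, "C": 6, "D": 3}},
    "dsp":  {"cost": 20, "time": {"A": 3, "B": 2, "C": 3, "D": 2}},
}
# Hypothetical task graph A -> {B, C} -> D, listed in topological order.
PREDS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}


def schedule(allocation, link_delay):
    """Assignment + scheduling: each task goes to the allocated PE where it finishes first."""
    pe_free = {pe: 0 for pe in allocation}
    finish, where = {}, {}
    for task, preds in PREDS.items():
        best = None
        for pe in allocation:
            # Data from a predecessor on another PE pays a fixed link delay (assumed).
            ready = max((finish[p] + (link_delay if where[p] != pe else 0) for p in preds),
                        default=0)
            end = max(ready, pe_free[pe]) + PE_LIBRARY[pe]["time"][task]
            if best is None or end < best[0]:
                best = (end, pe)
        finish[task], where[task] = best
        pe_free[best[1]] = best[0]
    return max(finish.values()), where


def cosynthesize(deadline, link_delay=1):
    """Allocation: try every set of PE types (one instance each); keep the cheapest feasible."""
    best = None
    for k in range(1, len(PE_LIBRARY) + 1):
        for allocation in combinations(sorted(PE_LIBRARY), k):
            cost = sum(PE_LIBRARY[pe]["cost"] for pe in allocation)
            makespan, where = schedule(allocation, link_delay)
            if makespan <= deadline and (best is None or cost < best[0]):
                best = (cost, allocation, makespan, where)
    return best


print(cosynthesize(deadline=10))
# (20, ('dsp',), 10, {'A': 'dsp', 'B': 'dsp', 'C': 'dsp', 'D': 'dsp'})
print(cosynthesize(deadline=9))
# (50, ('asic', 'dsp'), 7, {'A': 'asic', 'B': 'asic', 'C': 'dsp', 'D': 'asic'})
```

With a deadline of 10, a single DSP is the cheapest feasible architecture. Tightening the deadline to 9 forces a heterogeneous ASIC + DSP allocation, in which task C (slow on the ASIC) runs on the DSP.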

A Low-Power Biomedical Signal Processor ASIC Based on Hardware/Software Co-Design

Conclusion
Hardware/software co-design promises an integrated approach in which hardware and software are designed in parallel. Using and reusing hardware and software designs can lead to products with good performance and shorter design and development times than traditional integrated-circuit design methodologies. System-level power-performance trade-offs for fine-grained reconfigurable computing have been explored. A configuration-aware data-partitioning technique for reconfigurable architectures has been proposed, and it has been shown how the reconfiguration overhead directly affects the data-partitioning process. Thus, hardware/software co-design is the key design technology for digital systems.
