
- Number of slides: 39
Hardware Software Co-Design
Group 8
Sandeep V., Vivek Radhakrishnan, Shravan Rao, Deepa Nagaraj, Jagat Bhushan, Rathish Kumar, Sameera Markod
Agenda
• Introduction
• Concept
• Fundamental Phases of HSC
• Power Metric
• Low power design
• Hardware software partition
• Cosynthesis
• Implementation example
• Conclusion
Need of HW/SW Co-Design
• Earlier: hardware and software components were designed in isolation.
• Drawbacks:
  - Less than the best possible implementation.
  - Hardware and software component integration issues.
  - Uncertainty in system functionality.
  - High cost due to long development cycles.
Hardware Software Co-design (HSC)
• Concept: Meeting system-level objectives by exploiting the synergism of hardware and software through their concurrent design.
• Advantages:
  - Shorter development cycles
  - Low power designs
  - Component re-use
  - Less expensive
Understanding of HSC
• HSC is partitioned into distinct key areas: Co-Development, Co-Verification and Co-Simulation.
• Co-Development is the detailed implementation of hardware and software components.
• Co-Verification is the verification of these components using Co-Simulation technology.
• In simple words, HSC refers to co-design in which the system is modeled and simulated at various levels of abstraction, the trade-offs are analyzed, the system is partitioned into hardware and software components, and both are designed concurrently.
Fundamental Phases of HSC
[Figure: HSC design flow. Blocks: Formal Specification, Models Library, Modeling, Modification & Refinements, Simulation and Verification, Design Evaluation, Verified Design, HW/SW Partitioning, Software Synthesis (C code), Hardware Synthesis (VHDL code), Interfacing, Implementation and Prototyping]
Fundamental Phases of HSC Continued…
• Formal Spec: Documents the system's requirements, constraints such as power and size, functionality and behavior, and major interfaces.
• Modeling: Contains a set of instructions that describes the model of the design.
  - Structural Model: Describes the hardware and software components which make up the system.
  - Functional Model: Describes the system's functionality.
  - Dynamic Model: Describes the state transitions occurring inside the system.
• Model Library: Provides knowledge about existing designs which can be reused.
Fundamental Phases of HSC Continued…
• Simulation and verification: CAD tool based simulation engines. Initial conditions are set through scenario-based test cases.
• Hardware/Software partitioning: The verified design is translated into C code (software components) and Verilog code (hardware components). The Fiduccia-Mattheyses (FM) algorithm is an efficient algorithm used in functional partitioning for HSC and in circuit placement.
• Trade-offs considered during HW/SW partitioning (options compared: Std Proc, Core Proc, ASIP, ASIC):
  - Performance: Medium, Highest
  - Power: High, Medium-low, Lowest
  - Flexibility: Medium, High, Low
  - Design Time: Low (Software), Medium, Highest, High (Hardware)
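
The FM algorithm named on this slide can be illustrated with a much simplified single pass, assuming FM means the Fiduccia-Mattheyses move-based heuristic. The sketch below works on a plain graph with two-pin edges and recomputes gains from scratch; the real algorithm works on hypergraph nets and uses bucket lists for speed. The node names, edges and the `max_imbalance` parameter are hypothetical.

```python
def cut_size(edges, side):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if side[u] != side[v])


def fm_pass(nodes, edges, side, max_imbalance=2):
    """One simplified FM-style pass: move the best unlocked node, lock it,
    and remember the best partition seen along the way."""
    side = dict(side)
    best_side, best_cut = dict(side), cut_size(edges, side)
    locked = set()
    for _ in nodes:
        size0 = sum(1 for n in nodes if side[n] == 0)
        best_node, best_gain = None, None
        for n in nodes:
            if n in locked:
                continue
            # Skip moves that would violate the balance constraint.
            new_size0 = size0 + (1 if side[n] == 1 else -1)
            if abs(new_size0 - (len(nodes) - new_size0)) > max_imbalance:
                continue
            # Gain = cut edges removed minus uncut edges that become cut.
            gain = sum(1 if side[u] != side[v] else -1
                       for u, v in edges if n in (u, v))
            if best_gain is None or gain > best_gain:
                best_node, best_gain = n, gain
        if best_node is None:
            break
        side[best_node] = 1 - side[best_node]
        locked.add(best_node)
        cut = cut_size(edges, side)
        if cut < best_cut:
            best_cut, best_side = cut, dict(side)
    return best_side, best_cut


# Hypothetical example: two triangles joined by the single edge c-d.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"), ("c", "d")]
start = {"a": 0, "b": 0, "d": 0, "c": 1, "e": 1, "f": 1}
final_side, final_cut = fm_pass(nodes, edges, start)
```

In a co-design setting, side 0 and side 1 could stand for the software and hardware domains, and an edge for communication between two functions.
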
Fundamental Phases of HSC Continued…
• Interfacing: Communication between system components requires interfaces.
  - HW-HW, HW-SW, SW-SW.
  - Examples: memory-mapped I/O interface, interrupt handling, busy waiting and handshaking.
• Implementation and prototyping:
  - Breaking down system functionality into small, domain-independent, concurrent and interacting/communicating processes.
  - Performance requirements such as operating frequency, throughput and latency are defined.
  - Allocation process: the action of assigning each process either to the HW or the SW domain.
  - Scheduling process.
Low Power Design approach by HSC
• System-level partitioning, i.e. the assignment of operations to HW and SW, greatly impacts low power achievement, system cost and performance.
• Hierarchical power efficiency system
  - Efficient way to reduce dynamic power consumption
High-Level Power Estimation Metrics
• Embedded systems are categorized as:
  - Timing constrained system: speed is the most important design constraint.
  - Area constrained system: area is the most important constraint.
• Timing constrained systems can be further categorized, depending on system throughput (T), as fixed throughput mode, maximum throughput mode, and burst throughput mode.
• Energy to throughput ratio (ETR), defined for timing constrained systems: ETR = power / T^2.
• Energy per operation by area: EAP = (power * A) / T.
• Power is estimated by considering HW and SW separately.
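
As a small worked example of the two metrics on this slide, the sketch below evaluates ETR and EAP for two design points. The design points and their units are hypothetical and only serve to show that the design with the higher power can still have the better (lower) ETR.

```python
def energy_to_throughput_ratio(power, throughput):
    """ETR = power / T^2, i.e. energy per operation divided by throughput."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return power / throughput ** 2


def energy_area_product(power, area, throughput):
    """EAP = (power * A) / T, i.e. energy per operation times area."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return power * area / throughput


# Hypothetical design points: power in W, area in mm^2, throughput in ops/s.
design_a = {"power": 2.0, "area": 10.0, "throughput": 100.0}
design_b = {"power": 1.0, "area": 12.0, "throughput": 50.0}

etr_a = energy_to_throughput_ratio(design_a["power"], design_a["throughput"])
etr_b = energy_to_throughput_ratio(design_b["power"], design_b["throughput"])
eap_a = energy_area_product(**design_a)
eap_b = energy_area_product(**design_b)
```

Here design A draws twice the power of design B but has half the ETR, because its throughput is twice as high.
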
Hardware part power Estimation
• Power dissipated by ASIC:
  P_avg = P_IO + P_core   (1)
  P_core = P_DP + P_MEM + P_CNTR + P_PROC   (2)
• P_IO = power dissipated by I/O.
• P_core = power dissipated by the core.
• P_DP = power dissipated by the data-path.
• P_MEM = power dissipated by memory.
• P_CNTR = power dissipated by control logic.
• P_PROC = power dissipated by the processor.
Hardware part power Estimation continued…
  P_DP = P_REG + P_MUX + P_FU   (3)
  P_CNTR = P_IN + P_STATE_REG + P_COMB + P_OUT   (4)
• P_REG = power dissipated by registers.
• P_MUX = power dissipated by multiplexers.
• P_FU = power dissipated by functional units.
• P_IN = power dissipated by primary inputs.
• P_STATE_REG = power dissipated by state registers.
• P_COMB = power dissipated by combinational logic.
• P_OUT = power dissipated by primary outputs.
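
Equations (1) to (4) form a simple hierarchy of sums, which the following sketch makes explicit. The class and function names are illustrative, and the component values (in mW) are made up for the example rather than taken from any measurement.

```python
from dataclasses import dataclass


@dataclass
class DataPathPower:
    reg: float  # P_REG
    mux: float  # P_MUX
    fu: float   # P_FU

    def total(self):
        # Equation (3): P_DP = P_REG + P_MUX + P_FU
        return self.reg + self.mux + self.fu


@dataclass
class ControlPower:
    primary_in: float   # P_IN
    state_reg: float    # P_STATE_REG
    comb: float         # P_COMB
    primary_out: float  # P_OUT

    def total(self):
        # Equation (4): P_CNTR = P_IN + P_STATE_REG + P_COMB + P_OUT
        return self.primary_in + self.state_reg + self.comb + self.primary_out


def core_power(data_path, memory, control, processor):
    # Equation (2): P_core = P_DP + P_MEM + P_CNTR + P_PROC
    return data_path.total() + memory + control.total() + processor


def average_power(io, core):
    # Equation (1): P_avg = P_IO + P_core
    return io + core


# Hypothetical component estimates in mW.
dp = DataPathPower(reg=3.0, mux=1.0, fu=6.0)
cntr = ControlPower(primary_in=0.5, state_reg=1.0, comb=2.0, primary_out=0.5)
p_core = core_power(dp, memory=8.0, control=cntr, processor=20.0)
p_avg = average_power(io=5.0, core=p_core)
```
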
Software part power Estimation
• Average power dissipated by the processor while running a program: P_SW = I_AVG * V_dd.
  - I_AVG is the average current.
  - V_dd is the supply voltage.
• Energy is given by E_SW = P_SW * t_SW.
• t_SW is the execution time, given by t_SW = N_CLK * t_CLK.
• N_CLK is the number of clock cycles needed to execute the program and t_CLK is the clock period.
• The instruction set and addressing modes of the processor affect the average power dissipation.
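
The three relations on this slide chain together directly. A minimal sketch, assuming SI units and a hypothetical program profile (average current, supply voltage, cycle count and clock period are invented for the example):

```python
def software_energy(i_avg, v_dd, n_clk, t_clk):
    """Return (P_SW, t_SW, E_SW) for a program run.

    P_SW = I_AVG * V_dd, t_SW = N_CLK * t_CLK, E_SW = P_SW * t_SW.
    """
    p_sw = i_avg * v_dd
    t_sw = n_clk * t_clk
    e_sw = p_sw * t_sw
    return p_sw, t_sw, e_sw


# Hypothetical profile: 200 mA at 3.3 V, one million cycles at 100 MHz.
p_sw, t_sw, e_sw = software_energy(i_avg=0.2, v_dd=3.3,
                                   n_clk=1_000_000, t_clk=10e-9)
```

For these numbers the program draws about 0.66 W for 10 ms, so it consumes about 6.6 mJ.
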
Low Power Codesign Techniques
• There are 3 types of power dissipated in a system:
  - Dynamic power (P_dynamic = C_L * N_SW * V_dd^2 * f)
  - Static
  - Short circuit
• We mainly focus on reducing the dynamic power dissipation.
• N_SW (average number of circuit state switches) is reduced by techniques such as minimizing the Hamming distance between consecutive operations/instructions or minimizing the number of operations.
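
To illustrate the Hamming-distance idea, the sketch below counts the bit flips between consecutive instruction words and compares two orderings of the same words. The 4-bit encodings are hypothetical, and reordering is only legal when the instructions involved are independent of each other.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two words differ."""
    return bin(a ^ b).count("1")


def total_switching(words):
    """Sum of bit flips between each pair of consecutive words."""
    return sum(hamming_distance(x, y) for x, y in zip(words, words[1:]))


def dynamic_power(c_load, n_sw, v_dd, freq):
    """P_dynamic = C_L * N_SW * V_dd^2 * f."""
    return c_load * n_sw * v_dd ** 2 * freq


# Hypothetical 4-bit instruction encodings, in original and reordered sequence.
original = [0b0000, 0b1111, 0b0001, 0b1110]
reordered = [0b0000, 0b0001, 0b1111, 0b1110]

flips_original = total_switching(original)
flips_reordered = total_switching(reordered)
```

The reordered sequence toggles 5 bits instead of 11, which lowers N_SW and therefore the dynamic power for the same C_L, V_dd and f.
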
Low Power design at various levels
• System-level power-aware design includes power and energy management and modeling issues at various levels.
Microarchitecture Level low power design
• Instruction Set Architecture (ISA) Level
  - A common approach is to combine multiple instructions of a processor into one single complex low power instruction.
Microarchitecture Level low power design contd
• Instruction-Cache (I-Cache) and Bus Level
  - Compress the instructions in memory
  - Saves instruction fetch energy by using fewer bits on a fetch
  - Use a loop cache: keep the tight loop in a small loop cache instead of accessing a larger block
• Cache Region Reservation
  - Regions in cache are reserved for applications with tight memory requirements
Microarchitecture Level low power design contd
• Voltage and frequency scaling
  - DVS (Dynamic Voltage Scaling)
  - Augmenting the design with hardware blocks that allow the supply voltage to be changed dynamically
  - Reduction of V_dd and/or frequency saves substantial power
  - DVS heuristics usually trade off power savings against delay
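
The trade-off between power savings and delay can be sketched as a simple DVS heuristic: pick the slowest operating point that still meets a deadline. The operating points, cycle count and deadline below are hypothetical, and the energy estimate keeps only the cycles * V_dd^2 factor, since the switched capacitance is the same at every point.

```python
# Hypothetical (frequency in Hz, V_dd in V) pairs, sorted from slow to fast.
OPERATING_POINTS = [
    (100e6, 0.9),
    (200e6, 1.1),
    (400e6, 1.3),
]


def pick_operating_point(cycles, deadline, points=OPERATING_POINTS):
    """Return the slowest (freq, v_dd) that finishes within the deadline,
    or None if even the fastest point is too slow."""
    for freq, v_dd in points:
        if cycles / freq <= deadline:
            return freq, v_dd
    return None


def relative_energy(cycles, v_dd):
    """Dynamic energy up to a constant factor: cycles * V_dd^2."""
    return cycles * v_dd ** 2


cycles = 2e6
chosen = pick_operating_point(cycles, deadline=15e-3)
saving = 1 - relative_energy(cycles, chosen[1]) / relative_energy(cycles, 1.3)
```

With these numbers the 200 MHz point is chosen: the task takes 10 ms instead of 5 ms, and the dynamic energy drops by roughly 28% compared with running at 1.3 V.
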
Compiler Level Low power design
• Voltage and frequency scaling
  - Compiler-driven DVS using 'checkpoints'
  - Checkpoints are statically determined points in the program where processor frequency and voltage can be changed
• Memory Organization
  - Memory contributes around 50% of power dissipation
  - Categorize the memory subsystem accesses and customize the memory architecture for access type and locality patterns
Operating System level low power design
• An RTOS typically uses a modular architecture which can be adapted for low power applications
• I/O device scheduling
  - Power-aware I/O device scheduling
  - Online algorithms with the task schedule and device usage list as inputs can produce a sequence of sleep/idle states for each device
• Power and energy profiling of OS
  - Software layers in an OS consume significant power
  - Poorly built-in idle cycles can increase power consumption
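
A much simplified sketch of the I/O device scheduling idea: given a task schedule and a device usage list, derive a per-slot state sequence for each device. The slot-based schedule, the task and device names, and the `min_sleep_slots` threshold (a stand-in for the point where sleeping pays off) are all assumptions made for illustration. The sketch also looks at the whole schedule at once, whereas an online algorithm would decide as tasks arrive.

```python
def device_states(schedule, usage, devices, min_sleep_slots=2):
    """Map each device to a list of 'active', 'idle' or 'sleep' per time slot.

    schedule: task name running in each time slot
    usage:    task name -> set of devices the task uses
    """
    states = {}
    for dev in devices:
        busy = [dev in usage.get(task, set()) for task in schedule]
        seq = []
        i = 0
        while i < len(busy):
            if busy[i]:
                seq.append("active")
                i += 1
                continue
            # Measure the length of the unused gap starting at slot i.
            j = i
            while j < len(busy) and not busy[j]:
                j += 1
            gap = j - i
            # Only long gaps justify entering the sleep state.
            state = "sleep" if gap >= min_sleep_slots else "idle"
            seq.extend([state] * gap)
            i = j
        states[dev] = seq
    return states


# Hypothetical schedule and device usage list.
schedule = ["t1", "t2", "t2", "t1", "t3"]
usage = {"t1": {"uart"}, "t2": {"flash"}, "t3": {"uart", "flash"}}
result = device_states(schedule, usage, devices=["uart", "flash"])
```
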
Operating System level low power design contd
• Jitter
  - DVS schemes can be used for reclaiming the slack for power savings
Network Level low power design
• Network protocols to increase determinism
• Applied in wireless networks to develop low power ad-hoc networks
• Energy Efficient Ethernet (EEE)
Low power hardware software partitioning
• Using a hierarchical power efficiency system
  - Advantage of higher performance for a lower power trade-off
• Configuration-aware data partitioning
  - Depending on the application requirements, the reconfiguration overhead impacts the data-partitioning process
• Multi-rate cyclic scheduling algorithm
  - Minimizes schedule length (thus allowing cheaper PEs) and also significantly reduces reconfiguration energy
RC and power-performance tradeoffs
• Reconfigurable computing (RC) is an alternative to ASICs and general-purpose processors.
• It provides the flexibility of software processors and the efficiency and throughput of hardware coprocessors.
• It improves power consumption by performing computations more effectively.
ARCHITECTURE
Architecture contd
• DRP - dynamically reconfigurable processors.
• L2 - on-chip multibank memory subsystem.
• L1 - local memory buffer.
Each DRP processor has its own clock signal, which means that this is a kind of globally-asynchronous, locally-synchronous (GALS) architecture.
Architecture of a DRP
• 1) The load unit
• 2) The store unit
• 3) The dynamically reconfigurable logic
This approach allows three processes to run concurrently. They are:
Architecture of a DRP contd
• 1) The load unit receives data for the next computation.
• 2) The reconfigurable logic processes data from a buffer in the load unit and stores the processed data in a buffer of the store unit.
• 3) The store unit sends the previously processed data to the L2 memory subsystem.
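
The benefit of running the load, compute and store processes concurrently can be estimated with a small timing model. This is a sketch under simplifying assumptions that are not from the slides: every block takes the same time in each stage, and the buffers between stages never fill up. The stage times are hypothetical.

```python
def sequential_finish_time(n_blocks, t_load, t_compute, t_store):
    """Each block is loaded, processed and stored before the next one starts."""
    return n_blocks * (t_load + t_compute + t_store)


def pipelined_finish_time(n_blocks, t_load, t_compute, t_store):
    """Load, compute and store overlap across consecutive blocks."""
    load_end = compute_end = store_end = 0.0
    for _ in range(n_blocks):
        # The load unit starts the next block as soon as it is free.
        load_end = load_end + t_load
        # Compute waits for its input data and for the previous computation.
        compute_end = max(load_end, compute_end) + t_compute
        # Store waits for the processed data and for the previous store.
        store_end = max(compute_end, store_end) + t_store
    return store_end


# Hypothetical stage times in microseconds for four data blocks.
t_seq = sequential_finish_time(4, t_load=2, t_compute=5, t_store=1)
t_pipe = pipelined_finish_time(4, t_load=2, t_compute=5, t_store=1)
```

With these numbers the overlapped version finishes in 23 time units instead of 32, and the slowest stage (compute) sets the steady-state rate.
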
Energy–Performance Tradeoff Results
Energy–Performance Tradeoff Results contd
• Dynamic reconfiguration improves on the HW/SW partitioning approach by:
  1) 62.5% when using two DRP processors
  2) 47.3% when using three DRP processors
Software Hardware Co-Synthesis
• Hardware-software co-synthesis is the process of partitioning a system specification into hardware and software modules to meet performance, power and cost goals.
• Embedded computing systems must meet tight cost, power consumption, and performance constraints.
• Embedded computing systems are often heterogeneous multiprocessors with multiple CPUs and hardwired processing elements (PEs).
Software Hardware Co-Synthesis Design Platform
• Hardware/software co-design can be used either to design systems from scratch or to reuse an existing platform. The CPU + accelerator architecture is one common co-design platform:
  1) A PC-based system with the accelerator housed on a board plugged into the PC bus.
  2) A custom printed circuit board, using either an FPGA or a custom integrated circuit for the accelerator.
  3) A platform FPGA that includes a CPU and an FPGA fabric on a single chip.
  4) A custom integrated circuit, for which the accelerator implements a function in less area and with lower power consumption.
Software Hardware Co-Synthesis Algorithms
• Given a system specification, a co-synthesis algorithm produces a detailed description of an architecture that meets the design constraints and optimizes a set of costs.
• Ideally, such algorithms try to satisfy multiple objective functions simultaneously, such as execution time, price, and average power consumption.
• A co-synthesis algorithm must:
  - select the processing elements (PEs) and communication resources to use in the embedded system (allocation),
  - determine which resource will carry out each portion of the specification's computation and communication (assignment),
  - produce a schedule for all of the specification's computation and communication (scheduling).
Software Hardware Co-Synthesis Algorithm Inputs
• Task graphs have been used for many years to specify concurrent software. Task graphs are generally not described at the operator level, so they provide a coarser-grained description of the functionality.
• In addition to representing the program to be implemented, we must also represent the hardware platform being designed. A table gives the execution time of each process for each type of processing element.
Software Hardware Co-Synthesis Algorithm Inputs contd
• A co-synthesis program can easily look up the execution time for a process once it knows the type of PE to which the process has been allocated. In general, a multidimensional table could have several entries for each row/column pair, including:
  1) The CPU time entry shows the computation time required for a process.
  2) The communication time entry gives the time required to send data across a link; the amount of data is specified by the source-destination pair.
  3) The cost entry gives the manufacturing cost of a processing element or communication link.
  4) The power entry gives the power consumption of a PE or communication link; this entry can be further subdivided into static and dynamic power components.
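
The inputs described on these two slides (a task graph and a process-by-PE execution time table) are enough for a toy assignment and scheduling step. The sketch below is a greedy list scheduler, not any particular published co-synthesis algorithm: it takes the allocation (one CPU and one ASIC) as given, ignores communication time, cost and power, and uses a made-up task graph and table.

```python
# Hypothetical task graph: task -> list of predecessor tasks.
TASK_GRAPH = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

# Hypothetical execution time table: task -> PE type -> time units.
EXEC_TIME = {
    "A": {"cpu": 4, "asic": 2},
    "B": {"cpu": 6, "asic": 3},
    "C": {"cpu": 5, "asic": 4},
    "D": {"cpu": 3, "asic": 1},
}


def topological_order(preds):
    """Order tasks so that every task comes after its predecessors."""
    order, placed = [], set()
    while len(order) < len(preds):
        progressed = False
        for task, deps in preds.items():
            if task not in placed and all(d in placed for d in deps):
                order.append(task)
                placed.add(task)
                progressed = True
        if not progressed:
            raise ValueError("task graph contains a cycle")
    return order


def list_schedule(preds, exec_time, pes):
    """Assign each task to the PE on which it finishes earliest.

    Returns task -> (pe, start, finish).
    """
    pe_free = {pe: 0 for pe in pes}
    schedule = {}
    for task in topological_order(preds):
        # A task is ready once all of its predecessors have finished.
        ready = max((schedule[p][2] for p in preds[task]), default=0)
        best = None
        for pe in pes:
            start = max(ready, pe_free[pe])
            finish = start + exec_time[task][pe]  # table lookup by PE type
            if best is None or finish < best[2]:
                best = (pe, start, finish)
        schedule[task] = best
        pe_free[best[0]] = best[2]
    return schedule


result = list_schedule(TASK_GRAPH, EXEC_TIME, pes=["cpu", "asic"])
makespan = max(finish for _, _, finish in result.values())
```

In this example task C lands on the CPU because the ASIC is busy with B, which shows how assignment and scheduling interact. A fuller sketch would add the communication, cost and power entries of the table to the objective.
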
A LOW POWER BIOMEDICAL SIGNAL PROCESSOR ASIC BASED ON HARDWARE SOFTWARE CO DESIGN
CONCLUSION
• Hardware/software co-design promises an integrated approach in which hardware and software are designed in parallel.
• The use and reuse of hardware and software designs can lead to products with good performance and a shorter design and development time compared to traditional integrated circuit design methodologies.
• System-level power-performance tradeoffs for fine-grained reconfigurable computing were explored.
• A configuration-aware data-partitioning technique for reconfigurable architectures was presented, showing how the reconfiguration overhead directly impacts the data-partitioning process.
• Thus hardware/software co-design is the key design technology for digital systems.
REFERENCES
[1] Juanjo Noguera and Rosa M. Badia, "System-Level Power-Performance Tradeoffs for Reconfigurable Computing", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 7, July 2006.
[2] Juanjo Noguera and Rosa M. Badia, "HW/SW Codesign Techniques for Dynamically Reconfigurable Architectures", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 10, no. 4, August 2002.
[3] Li Shang, Robert P. Dick, and Niraj K. Jha, "SLOPES: Hardware-Software Cosynthesis of Low-Power Real-Time Distributed Embedded Systems With Dynamically Reconfigurable FPGAs", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 3, March 2007.
[4] D. E. Thomas, J. K. Adams, and H. Schmit, "A Model and Methodology for Hardware/Software Codesign", IEEE Design & Test of Computers, vol. 10, no. 3, pp. 6-15, Sept. 1993.
REFERENCES contd
[5] Osman S. Unsal and Israel Koren, "System-Level Power-Aware Design Techniques in Real-Time Systems", Proceedings of the IEEE, vol. 91, no. 7, pp. 1055-1069, July 2003.
[6] Iftikhar, Khan, Taikyeong Ted Jeong, Gyungleen Park, and Anthony P. Ambler, "A HW/SW Co-design Methodology: An Accurate Power Efficiency Model and Design Metrics for Embedded System", 2009 10th ACIS International Conference on Software Engineering, Artificial Intelligences, Networking and Parallel/Distributed Computing.
[7] William Fornaciari, Paolo Gubian, Donatella Sciuto, and Cristina Silvano, "Power Estimation of Embedded Systems: A Hardware/Software Codesign Approach", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 6, no. 2, June 1998.
REFERENCES contd
[8] Donald E. Thomas, Jay K. Adams, and Herman Schmit (Carnegie Mellon University), "A Model and Methodology for Hardware-Software Codesign", IEEE Design & Test of Computers, 1993.
[9] Pallav Gupta, "Hardware-Software Codesign", IEEE Potentials, December 2001/January 2002.
[10] Giovanni De Micheli and Rajesh K. Gupta, "Hardware/Software Co-Design", Proceedings of the IEEE, vol. 85, no. 3, pp. 349-365, March 1997.
[11] Z. D. Nie, L. Wang, W. G. Chen, T. Zhang, and Y. T. Zhang, "A Low Power Biomedical Signal Processor ASIC Based on Hardware Software Codesign", 31st Annual International Conference of the IEEE EMBS, Minneapolis, Minnesota, USA, September 2-6, 2009.
[12] Ralf Joost and Ralf Salomon, "Hardware-Software Co-Design in Practice: A Case Study in Image Processing", IEEE, 2006.