- Number of slides: 41
Universität Dortmund Hardware/Software Codesign P. Marwedel, Univ. Dortmund, Informatik 12, 05/06 - 1 -
Design productivity gap
© Lauro Rizzatti, Marketing Vice President, Emulation & Verification Engineering (EVE), lauro@eve-usa.com
Today: people talking about crises!
Previous ITRS editions have documented a design productivity gap: the number of available transistors grows faster than the ability to meaningfully design them. Yet, investment in process technology has by far dominated investment in design technology (DT).
Good news: Enabling progress in DT continues. :-)
Bad news: Test cost has grown exponentially relative to manufacturing cost. Today, many design technology gaps are crises.
[ITRS, Design Report 2003, http://public.itrs.net/]
Current approach: Improving DT step-by-step
[ITRS, Design Report 2003, http://public.itrs.net/]
Reuse as a way out
Pre-designed standard components to be used:
• Standard software components
• Standard hardware components
Platform-based design
Platform-based design
A platform is a family of architectures satisfying a set of constraints imposed to allow the reuse of hardware and software components.
However, a hardware platform is not enough. Quick, reliable, derivative design requires using a platform application programming interface (API) to extend the platform toward application software.
In general, a platform is an abstraction layer that covers many possible refinements to a lower level.
Platform-based design is a meet-in-the-middle approach: In the top-down design flow, designers map an instance of the upper platform to an instance of the lower, and propagate design constraints [Sangiovanni-Vincentelli, 2002].
Platform-based design
Bottom-up and top-down:
• Find the appropriate platform levels.
• Map an instance of the upper platform onto a lower platform, considering appropriate constraints.
• Define platform-level parameters.
[Figure labels: Platform instances, Platform abstraction levels]
Platform-based design
Few design areas suitable for PBD:
• System Platform Stack: The main application area. The primary notion of PBD originates here.
• Network Platforms: Equivalent to protocol stacks.
• Analog Platform: Performance models, behavioral models and interconnection models.
Decouples the application development process from the architectural implementation process.
[Sangiovanni-Vincentelli, DAC 2004]
Iterative approach (1)
Guided by performance evaluation
Essentially the same as our flow …
[Figure labels: System behavior, System architecture, Mapping, Performance simulation, Refine, Implementation]
Iterative approach: SpecC model
Overview of design activities
• Task-level concurrency management: Which tasks in the final system?
• High-level transformations: Transformations that are outside the scope of traditional compilers.
• Hardware/software partitioning: Which operations are mapped to hardware, which to software?
• Compilation: Hardware-aware compilation.
• Scheduling: Performed several times, with varying precision.
• Design space exploration: Set of possible designs, not just one.
Task-level concurrency management
Granularity: size of tasks (e.g. in instructions).
Readable specifications and efficient implementations may require different task structures.
Consequence: granularity changes.
Merging of tasks
• Reduced overhead of context switches
• More global optimization of machine code
• Reduced overhead for inter-process/task communication
Splitting of tasks
• No blocking of resources while waiting for input
• More flexibility for scheduling, possibly improved result
Merging and splitting of tasks
The most appropriate task graph granularity depends upon the context, so both merging and splitting may be required.
Merging and splitting of tasks should be done automatically, depending upon the context.
Automated rewriting of the task system - Example -
Attributes of a system that needs rewriting
Tasks blocking after they have already started running
Work by Cortadella et al.
1. Transform each of the tasks into a Petri net,
2. Generate one global Petri net from the nets of the tasks,
3. Partition the global net into “sequences of transitions”,
4. Generate one task from each such sequence.
Mature, commercial approach not yet available.
Result, as published by Cortadella
[Figure annotations: Reads only at the beginning; Initialization task; Never true; Always true]
Optimized version of Tin
[Figure annotations: Never true; Always true; j==i-1; j; i]
    Tin () {
        READ (IN, sample, 1);
        sum += sample; i++;
        DATA = sample; d = DATA;
    L0: if (i < N) return;
        DATA = sum/N; d = DATA;
        d = d*c; WRITE(OUT, d, 1);
        sum = 0; i = 0;
        return;
    }
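To try the behavior of the optimized Tin outside its original environment, the following is a minimal C sketch. READ and WRITE are replaced by hypothetical array-backed helpers (read_in, write_out). The values of N and c, the int sample type, and the buffers are assumptions for illustration, not taken from Cortadella's publication. The DATA channel is folded into the local variable d.

```c
#include <assert.h>
#include <stddef.h>

#define N 4          /* assumption: samples per averaging block */
#define MAX_OUT 8    /* assumption: capacity of the output buffer */

static const int *in_buf;        /* hypothetical stand-in for port IN */
static size_t in_pos;
static int out_buf[MAX_OUT];     /* hypothetical stand-in for port OUT */
static size_t out_count;

static int sum, i;
static const int c = 2;          /* assumption: scaling coefficient */

static int read_in(void)
{
    assert(in_buf != NULL);
    return in_buf[in_pos++];
}

static void write_out(int d)
{
    assert(out_count < MAX_OUT);
    out_buf[out_count++] = d;
}

/* One activation: reads exactly one sample at the beginning and
   never blocks afterwards. */
void Tin(void)
{
    int sample = read_in();
    int d;

    sum += sample;
    i++;
    if (i < N)
        return;            /* block not complete yet */

    d = sum / N;           /* block average (integer division) */
    d = d * c;
    write_out(d);
    sum = 0;
    i = 0;
}
```

Pointing in_buf at an array of samples and calling Tin() once per sample produces one output value for every N inputs.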
Task-level concurrency management (2)
• The dynamic behavior of applications is getting more attention.
• Energy consumption reduction is the main target.
• Some classes of applications (e.g. video processing) have a considerable variation in processing power requirements depending on input data.
• Static design-time methods are becoming insufficient.
• Runtime-only methods are not feasible for embedded systems.
How about mixed approaches?
Example of a mixed TCM
Static (compile-time) methods can ensure WCET-feasible schedules, but waste energy in the average case …
… or they can define a probability for violating the deadline.
Mixed methods use compile-time analysis to define a set of possible execution parameters for each task. The runtime scheduler selects the most energy-saving, deadline-preserving combination.
[Figure labels: Task 1, Task 2, Task 3; axes E and t; Deadline]
[IMEC, Belgium, http://www.imec.be/]
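The runtime selection step can be sketched in C as follows. This is only an illustrative greedy heuristic under stated assumptions, not IMEC's scheduler: tasks run sequentially, each task has a design-time curve of (time, energy) points sorted from fastest to slowest, and all names (ParetoPoint, TaskCurve, select_points) and units are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_POINTS 4

/* One design-time operating point of a task (hypothetical units). */
typedef struct {
    int time;    /* execution time */
    int energy;  /* energy consumed */
} ParetoPoint;

/* Points sorted by increasing time and decreasing energy;
   points[0] is the fastest configuration. */
typedef struct {
    ParetoPoint points[MAX_POINTS];
    size_t count;
} TaskCurve;

/* Starts with the fastest point of every task, then repeatedly slows
   down the task that saves the most energy per extra time unit while
   the deadline still holds. Returns the total energy, or -1 if even
   the fastest points miss the deadline. choice[i] receives the index
   of the point selected for task i. */
int select_points(const TaskCurve *tasks, size_t n, int deadline,
                  size_t *choice)
{
    int total_time = 0, total_energy = 0;

    for (size_t i = 0; i < n; i++) {
        assert(tasks[i].count >= 1 && tasks[i].count <= MAX_POINTS);
        choice[i] = 0;
        total_time += tasks[i].points[0].time;
        total_energy += tasks[i].points[0].energy;
    }
    if (total_time > deadline)
        return -1;

    for (;;) {
        size_t best = n;               /* n means "no candidate" */
        int best_dt = 0, best_de = 0;

        for (size_t i = 0; i < n; i++) {
            if (choice[i] + 1 >= tasks[i].count)
                continue;
            const ParetoPoint *cur = &tasks[i].points[choice[i]];
            const ParetoPoint *nxt = &tasks[i].points[choice[i] + 1];
            int dt = nxt->time - cur->time;      /* extra time */
            int de = cur->energy - nxt->energy;  /* energy saved */
            if (dt <= 0 || de <= 0 || total_time + dt > deadline)
                continue;
            /* compare de/dt with best_de/best_dt without dividing */
            if (best == n || (long)de * best_dt > (long)best_de * dt) {
                best = i;
                best_dt = dt;
                best_de = de;
            }
        }
        if (best == n)
            break;
        choice[best]++;
        total_time += best_dt;
        total_energy -= best_de;
    }
    return total_energy;
}
```

A greedy choice like this is cheap enough for runtime use but does not guarantee the optimal combination; that trade-off is the reason a heuristic is used at all.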
Example of a mixed TCM
“Gray-box”: Extract only the information needed for scheduling.
Transformations: Merge and/or split tasks. (Functionality comparable to Cortadella’s approach.)
Find Pareto curves for each task.
Runtime scheduler: uses a heuristic to combine the Pareto curves.
[IMEC, Belgium, http://www.imec.be/]
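Finding a Pareto curve means discarding every operating point that another point beats in both time and energy. A minimal C sketch of that filtering step is shown here; the OperatingPoint type, the function name, and the simple quadratic algorithm are assumptions chosen for clarity, not the IMEC implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Candidate configuration of one task (hypothetical units). */
typedef struct {
    int time;
    int energy;
} OperatingPoint;

/* Copies the non-dominated points of in[0..n-1] to out (which must
   hold n entries) and returns how many were kept. A point is dropped
   if another point is no worse in both dimensions and better in at
   least one; of exact duplicates only the first is kept. */
size_t pareto_filter(const OperatingPoint *in, size_t n,
                     OperatingPoint *out)
{
    size_t kept = 0;

    assert(in != NULL && out != NULL);
    for (size_t a = 0; a < n; a++) {
        int dominated = 0;

        for (size_t b = 0; b < n && !dominated; b++) {
            if (b == a)
                continue;
            int no_worse = in[b].time <= in[a].time &&
                           in[b].energy <= in[a].energy;
            int better = in[b].time < in[a].time ||
                         in[b].energy < in[a].energy;
            if (no_worse && (better || b < a))
                dominated = 1;
        }
        if (!dominated)
            out[kept++] = in[a];
    }
    return kept;
}
```

The surviving points preserve their input order, so sorting the candidates by time beforehand yields a curve ordered from fastest to slowest.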
Floating-point to fixed-point conversion
• Pros:
  – Lower cost
  – Faster
  – Lower power consumption
  – Sufficient SQNR, if properly scaled
  – Suitable for portable applications
• Cons:
  – Decreased dynamic range
  – Finite word-length effect, unless properly scaled
    • Overflow and excessive quantization noise
  – Extra programming effort
© Ki-Il Kum, et al. (Seoul National University): A Floating-point To Fixed-point C Converter For Fixed-point Digital Signal Processors, 2nd SUIF Workshop, 1996
Fixed-Point Data Format
• Floating-Point vs. Fixed-Point (exponent, mantissa)
  – Floating-Point: automatic computation and update of each exponent at run-time
  – Fixed-Point: implicit exponent, determined off-line
• Integer vs. Fixed-Point
  (a) Integer: S 1 0 0 . . . 0 0 1 0
  (b) Fixed-Point: S 1 0 0 . . . 0 0 1 0, with a hypothetical binary point after the IWL=3 integer bits; the remaining bits form the FWL
© Ki-Il Kum, et al
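The role of the implicit exponent can be illustrated with a small C sketch. It assumes 16-bit words with one sign bit, so that FWL = 15 - IWL; the helper names (fwl_of, to_fixed, to_double) and the rounding and saturation policy are illustrative assumptions, not part of the slides.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_LENGTH 16   /* assumption: 16-bit words, 1 sign bit */

/* Fractional word length implied by the integer word length. */
static int fwl_of(int iwl)
{
    assert(iwl >= 0 && iwl <= WORD_LENGTH - 1);
    return WORD_LENGTH - 1 - iwl;
}

/* Quantizes value to the format given by iwl: scale by 2^FWL,
   round to nearest, saturate to the 16-bit range. */
int16_t to_fixed(double value, int iwl)
{
    double scaled = value * (double)(1L << fwl_of(iwl));

    scaled += (scaled >= 0) ? 0.5 : -0.5;
    if (scaled > 32767.0)
        return INT16_MAX;
    if (scaled < -32768.0)
        return INT16_MIN;
    return (int16_t)scaled;
}

/* Interprets the stored integer q using the off-line exponent. */
double to_double(int16_t q, int iwl)
{
    return (double)q / (double)(1L << fwl_of(iwl));
}
```

With IWL=0, the value 0.9 is stored as the integer 29491 (0x7333), the same constant that appears in the converted iir1 example later in the deck.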
Assignment and Addition/Subtraction
Assume y = x, with x (IWL=2) and y (IWL=3): x is shifted right by one bit (x>>1) to match the IWL of y.
Let result = x + y: the IWLs are equalized first, so x>>1 is added to y.
[Figure: bit layouts of x, x>>1, y and result, each with sign bit s]
© Ki-Il Kum, et al
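The IWL equalization can be written out as a short C sketch. It assumes 16-bit operands, ignores overflow of the sum, and relies on arithmetic right shift for negative values (implementation-defined in C, but common); align_iwl and fixed_add are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Re-scales v from integer word length `from` to a larger or equal
   integer word length `to` by shifting out low-order fraction bits. */
int16_t align_iwl(int16_t v, int from, int to)
{
    assert(to >= from && to - from < 16);
    return (int16_t)(v >> (to - from));
}

/* Adds two fixed-point values after equalizing their IWLs to the
   larger one. The IWL of the sum is reported through result_iwl.
   Overflow handling is deliberately left out of this sketch. */
int16_t fixed_add(int16_t x, int x_iwl, int16_t y, int y_iwl,
                  int *result_iwl)
{
    int iwl = (x_iwl > y_iwl) ? x_iwl : y_iwl;

    *result_iwl = iwl;
    return (int16_t)(align_iwl(x, x_iwl, iwl) + align_iwl(y, y_iwl, iwl));
}
```

For example, 1.5 at IWL=2 is stored as 12288 and 2.25 at IWL=3 as 9216; the sum is 6144 + 9216 = 15360, which reads as 3.75 at IWL=3.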
Multiplication
Assume result = x * y, with x (IWL=2) and y (IWL=3): x * y -> result (IWL=2+3).
[Figure: bit layouts of x, y and result with sign bits s]
© Ki-Il Kum, et al
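A hedged C sketch of the multiplication rule: the full product of two 16-bit operands has two sign bits, so keeping only its upper 16 bits (the mulh operation named later in the deck) yields an IWL of IWL(x) + IWL(y) + 1 until the duplicate sign bit is shifted out. The 16-bit word size and the helper mulh_result_iwl are assumptions; right shift of a negative product is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 16 bits of the 32-bit product of two 16-bit operands. */
int16_t mulh(int16_t x, int16_t y)
{
    int32_t product = (int32_t)x * (int32_t)y;

    return (int16_t)(product >> 16);
}

/* IWL of the mulh result while it still carries the second sign
   bit; a left shift by one afterwards gives x_iwl + y_iwl. */
int mulh_result_iwl(int x_iwl, int y_iwl)
{
    assert(x_iwl >= 0 && y_iwl >= 0);
    return x_iwl + y_iwl + 1;
}
```

For example, 1.5 at IWL=2 (12288) times 2.0 at IWL=3 (8192) gives mulh = 1536, which reads as 3.0 at IWL=6.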
Development Procedure
Floating-Point C Program -> Range Estimator -> Range Estimation C Program -> Execution -> IWL information
Floating-Point C Program + IWL information (or manual specification) -> Floating-Point to Fixed-Point C Program Converter -> Fixed-Point C Program
© Ki-Il Kum, et al
Range Estimator
Floating-Point C Program -> C pre-processor -> C front-end -> ID assignment -> Subroutine call insertion -> SUIF-to-C converter -> Range Estimation C Program -> Execution -> IWL Information
Range Estimation C Program:
    float iir1(float x)
    {
        static float s = 0;
        float y;
        y = 0.9 * s + x;
        range(y, 0);
        s = y;
        range(s, 1);
        return y;
    }
© Ki-Il Kum, et al
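What the inserted range() calls might do can be sketched in C. This simplified version only tracks the largest magnitude seen per variable ID and derives an IWL from it; the actual estimator by Kum et al. may use different statistics, and NUM_IDS, max_abs and estimated_iwl are hypothetical names.

```c
#include <assert.h>

#define NUM_IDS 2   /* assumption: one ID per instrumented variable */

static double max_abs[NUM_IDS];

/* Records the magnitude of one observed value for variable `id`. */
void range(double value, int id)
{
    double a = (value < 0) ? -value : value;

    assert(id >= 0 && id < NUM_IDS);
    if (a > max_abs[id])
        max_abs[id] = a;
}

/* Smallest IWL such that every observed magnitude is below 2^IWL. */
int estimated_iwl(int id)
{
    int iwl = 0;
    double limit = 1.0;

    assert(id >= 0 && id < NUM_IDS);
    while (max_abs[id] >= limit) {
        limit *= 2.0;
        iwl++;
    }
    return iwl;
}

/* The instrumented filter from the slide. */
float iir1(float x)
{
    static float s = 0;
    float y;

    y = 0.9 * s + x;
    range(y, 0);
    s = y;
    range(s, 1);
    return y;
}
```

Driving iir1 with inputs just below 1.0 lets y and s approach 10, so both end up with an estimated IWL of 4, matching the IWL values annotated for y and s in the converted program.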
Floating-Point to Fixed-Point Program Converter
Fixed-Point C Program:
    int iir1(int x)
    {
        static int s = 0;
        int y;
        y = sll(mulh(29491, s) + (x >> 5), 1);
        s = y;
        return y;
    }
• mulh
  – to access the upper half of the multiplied result
  – target-dependent implementation
• sll
  – to remove 2nd sign bit
  – opt. overflow check
© Ki-Il Kum, et al
Floating-Point to Fixed-Point Program Converter
Fixed-Point C Program:
    int iir1(int x)
    {
        static int s = 0;
        int y;
        y = sll(mulh(29491, s) + (x >> 5), 1);
        s = y;
        return y;
    }
Annotations:
• 0.9 @ IWL=0: 0111 0011 0011 0011 = 0x7333 = 29491
• x: IWL=0; y: IWL=4; s: IWL=4
• “mulh” result: IWL = 0+4+1 = 5
• x>>5 aligns x for the add
© Ki-Il Kum, et al
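To make the converted program executable on a host machine, mulh and sll need definitions. The versions here are plain-C assumptions for 16-bit data (the slides call the real ones target dependent), with the optional overflow check written as an assert. Right shifts of negative values are implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Upper half of the 32-bit product of two 16-bit values. */
int mulh(int a, int b)
{
    assert(a >= INT16_MIN && a <= INT16_MAX);
    assert(b >= INT16_MIN && b <= INT16_MAX);
    return (int)(((int32_t)a * (int32_t)b) >> 16);
}

/* Left shift that drops the second sign bit; written as a
   multiplication so that negative values are well defined. */
int sll(int v, int n)
{
    int r;

    assert(n >= 0 && n < 15);
    r = v * (1 << n);
    assert(r >= INT16_MIN && r <= INT16_MAX);  /* overflow check */
    return r;
}

/* x has IWL=0; s and y have IWL=4 (11 fraction bits). */
int iir1(int x)
{
    static int s = 0;
    int y;

    y = sll(mulh(29491, s) + (x >> 5), 1);
    s = y;
    return y;
}
```

With a constant input of 0.5 (16384 at IWL=0), the first output is 1024, which reads as 0.5 at IWL=4. Later outputs settle slightly below 10240 (5.0), the floating-point steady state, because mulh and the shifts truncate.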
Performance Comparison - Machine Cycles -
© Ki-Il Kum, et al
Performance Comparison - Machine Cycles - (continued)
© Ki-Il Kum, et al
Performance Comparison - SNR -
© Ki-Il Kum, et al
Fundamental considerations of tradeoffs by Brodersen (Berkeley)
Fridge: Fixed-Point Programming and Design Environment
• Developed at RWTH Aachen, commercialized by Synopsys as part of the CoCentric tool suite.
• Uses type definition features of C++ to define abstract data types (e.g. ‘fixed’).
• Incorporated into SystemC. (It’s used for bit-true simulation.)
• Needs architecture-dependent back-end optimizations.
[ISS, Aachen, http://www.iss.rwth-aachen.de/]
Fridge: Fixed-Point Programming and Design Environment
Workflow overview:
• Input: floating-point algorithm + designer-supplied annotations.
• Conversion: iterative, with feedback through simulation.
• Back end exploits architectural features (e.g. mulh, sat, round).
• Output: target-optimized integer C code.
[ISS, Aachen, http://www.iss.rwth-aachen.de/]
Fridge: Fixed-Point Programming and Design Environment
Conversion steps:
• Designer annotates some operands (with WL, IWL, …).
• Hybrid code: partially converted to fixed-point.
• Interpolation: automatic annotation of the remaining operands; each operand is transferred into a fixed-point type.
• Code generation: generates pure C code.
• Back end (DSP back end): optimizes for the target.
• Bit-true simulation.
[ISS, Aachen, http://www.iss.rwth-aachen.de/]
Today’s summary
• Design productivity gap: No final remedy available, but step-by-step improvements keep costs in a reasonable range.
• Platform-based design: Reuse is the key. PBD is the systematic approach to it.
• Task concurrency management: Optimize the task set. Goals: non-blocking job execution and increased energy efficiency.
• Floating-point to fixed-point: Fixed-point arithmetic uses integer operations, which need simpler and faster hardware than floating-point operations.