  • Number of slides: 33

DAC 2006 CAD Challenges for Leading-Edge Multimedia Designs

NOMADIK
“The challenge of low power, high performance and scalable multimedia acceleration”
Alain Artieri - Patrick Blouet
STMicroelectronics
July 26, 2006

Multimedia Computing Landscape

The convergence paradigm
• Personal Computer
• Consumer Electronics
• Mobile Phone
• Converging on: New Mobile Multimedia Computing Architecture

Consumer versus Computer
• Consumer Products
  - High quality of service
  - Designed for worst case
  - Highly parallel architecture
  - Hardware accelerators
  - Open platform, multi OS
  - Flexibility
  - Rich set of standard interfaces for storage and connectivity
• Personal Computer
  - Monolithic processor architecture
  - High MHz for performance
  - High power consumption
  - Open OS
  - Flexibility
  - Rich set of standard interfaces for storage and connectivity
New computing architecture must combine the best of both worlds

Cell Phones: a Key Driver
Year      1990    2000    2005    2010
M Units   <100    400     700     900
Features: Voice & Data, Multimedia, Global Convergence

Competing Technical Constraints
• Scalability
• Multimedia Performance
• Low Power

Multimedia Performance Requirements:
• Multiple video standards, encode and decode (MPEG-4, H.264, WMV, …), up to HDTV format
• High resolution: VGA screen and above in small form factor, output to HDTV with large screen
• Multi-megapixel camera, DSC-class image reconstruction chain and picture improvement
• Sophisticated audio use cases: combination of multiple codecs, sound effects, speech codecs, …
• Advanced 3D graphics acceleration for gaming
• Consume & produce high bandwidth multimedia content

Low Power: A key system technology driver
• Of course a product feature:
  - Battery life time
• But helps product manufacturability
  - Stacking in a power budget
• And product cost
  - Low cost packaging
  - No heat sink

Nomadik Architecture Overview

Application Processor Content
Blocks: Host Processor, Multimedia Accelerator, Peripherals, Embedded Memory
• Host processor & peripherals: no differentiation
• Multimedia Acceleration: differentiating factor
• The architecture & design challenge is in Multimedia Acceleration (Audio, Video, Imaging, Graphics)
• This is where innovation is required and competitive advantage is built

Nomadik Multimedia Acceleration Model
Diagram blocks: Interconnect; DSP, DSP, …; Tightly Coupled HW; DMA engine, …, DMA engine
Diagram labels: Multiple DSP; Attached to HW acceleration; Data mover
• Multiple DSP based sub-systems
• Symmetrical DSPs (generic S/W component can run anywhere)
• Attached HW resources (dependence resolved at component manager level)
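
One way to read "dependence resolved at component manager level" is that a component declares which attached hardware it needs, and a manager picks a DSP sub-system that has it, while generic components can land anywhere. The sketch below illustrates that reading only. The function name, the sub-system names and the resource names are hypothetical and do not come from the Nomadik software.

```python
def place_component(required_hw, subsystems):
    """Return the name of the first sub-system that can host a component.

    required_hw : iterable of HW resource names the component depends on
                  (empty for a generic component that can run anywhere).
    subsystems  : dict mapping sub-system name -> set of attached HW resources.
    All names are illustrative assumptions, not a real Nomadik API.
    """
    needed = set(required_hw)
    for name, attached_hw in subsystems.items():
        # A sub-system qualifies when every needed resource is attached to it.
        if needed <= set(attached_hw):
            return name
    raise LookupError(f"no sub-system provides {sorted(needed)}")


# Hypothetical platform description: symmetrical DSPs, different attached HW.
SUBSYSTEMS = {
    "dsp_video": {"video_codec"},
    "dsp_imaging": {"image_reconstruction"},
}

print(place_component([], SUBSYSTEMS))                        # generic component
print(place_component(["image_reconstruction"], SUBSYSTEMS))  # HW-dependent one
```

A real manager would also weigh load and power state; first-fit is used here only to keep the dependence check visible.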

Multiple DSP approach benefits
• High computing performance:
  - Multiple non-interfering domains of intense activity, each having its own processor, DMA services and hardware accelerators for data intensive functions
  - Hardware acceleration embedding standard functions (e.g. video codec, image reconstruction & improvement)
  - Highest & predictable performance through a careful bus and memory hierarchy design
• Low Power (target: 100s of mW):
  - Intrinsic low power sub-systems
  - Fine grain power management at sub-system level
  - Leakage management by switching sub-systems on & off

Power management
• Combination of multiple techniques:
  - Dynamic power reduction:
    * Clock gating
    * Voltage scaling (DVFS)
    * Pulse-Width Modulation (PWM)
  - Static power reduction:
    * Biasing
    * Power On/Off switching (Power gating)
• A global system issue from power management inside the OS down to silicon process (e.g. gate leakage)

DVFS Principle
Control path: Operating System Load Monitor (SW) -> CPU performance requirements -> Voltage/Frequency Tables -> CPU Voltage
CPU performance   CPU voltage   Energy saving
100%              1.3 V         -
85%               1.2 V         28%
62%               1.1 V         55%
Process Requirements:
- Large voltage excursion
- Low leakage
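
The savings on this slide are close to what the usual first-order CMOS model gives, with dynamic power proportional to f·V². The slide does not state which model was used, so the model is an assumption here. The sketch below recomputes the savings for the three operating points, taking 100% performance at 1.3 V as the reference.

```python
def dvfs_saving(freq_ratio, voltage, v_max=1.3):
    """Fractional dynamic-power saving relative to full speed at v_max.

    Assumes dynamic power is proportional to f * V^2 (first-order CMOS
    model; this is an assumption, the slide does not name its model).
    """
    relative_power = freq_ratio * (voltage / v_max) ** 2
    return 1.0 - relative_power


# Operating points read from the slide: (relative performance, CPU voltage in V)
OPERATING_POINTS = [(1.00, 1.3), (0.85, 1.2), (0.62, 1.1)]

for perf, volt in OPERATING_POINTS:
    print(f"{perf:.0%} at {volt} V -> saving {dvfs_saving(perf, volt):.3f}")
```

Under this model the two reduced operating points come out within one percentage point of the 28% and 55% quoted on the slide.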

PWM Principle
Control path: Operating System Load Monitor (SW) -> CPU performance requirements -> Active clock ratio table; CPU Voltage stays fixed
CPU performance   CPU voltage   Energy saving
100%              1.0 V         -
85%               1.0 V         15%
62%               1.0 V         38%
Process Requirements:
- Clock as fast as possible
- Source bias or switch off when clock is stopped
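
With the voltage fixed, the PWM savings equal the fraction of time the clock is stopped, provided the stopped state costs nothing. The sketch below makes that explicit and adds a `stopped_power` parameter to show how residual leakage would erode the saving. That parameter and its example value are illustrative assumptions, not figures from the slide.

```python
def pwm_average_power(active_ratio, stopped_power=0.0):
    """Average power, normalised to the always-running power at 1.0 V.

    active_ratio  : fraction of time the clock runs (active clock ratio table).
    stopped_power : residual power while the clock is stopped, as a fraction
                    of running power (hypothetical; 0.0 is the ideal case).
    """
    if not 0.0 <= active_ratio <= 1.0:
        raise ValueError("active_ratio must be within [0, 1]")
    return active_ratio + (1.0 - active_ratio) * stopped_power


def pwm_saving(active_ratio, stopped_power=0.0):
    """Fractional saving relative to a clock that never stops."""
    return 1.0 - pwm_average_power(active_ratio, stopped_power)


for ratio in (1.00, 0.85, 0.62):
    ideal = pwm_saving(ratio)
    leaky = pwm_saving(ratio, stopped_power=0.10)  # assumed 10% residual power
    print(f"active {ratio:.0%}: ideal {ideal:.3f}, with residual {leaky:.3f}")
```

The gap between the two columns is the reason the slide asks for source bias or switch-off while the clock is stopped.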

Multi-step PWM
• Power management state machine under SW control
  - Source Bias for short clock stop period
  - Power off with context save/restore for long period
• Short stop: Source Bias (reduced leakage)
• Long stop: Power Off (zero leakage), with context save before and restore after
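
The two-level stop policy can be sketched as a small software-controlled state machine. This is a minimal illustration, not the Nomadik implementation. The class name, the microsecond threshold and the dictionary used as "context" are all assumptions introduced for the example.

```python
from enum import Enum


class PowerState(Enum):
    RUN = "run"
    SHORT_STOP = "source bias, reduced leakage"
    LONG_STOP = "power off, zero leakage"


class PowerManager:
    """Hypothetical SW power state machine for one sub-system."""

    def __init__(self, long_stop_threshold_us=1000):
        # Assumed break-even idle time above which power-off is worth the
        # cost of saving and restoring context.
        self.long_stop_threshold_us = long_stop_threshold_us
        self.state = PowerState.RUN
        self.saved_context = None

    def stop(self, expected_idle_us, context):
        if self.state is not PowerState.RUN:
            raise RuntimeError("sub-system is already stopped")
        if expected_idle_us >= self.long_stop_threshold_us:
            self.saved_context = dict(context)  # context save
            context.clear()                     # models state lost at power off
            self.state = PowerState.LONG_STOP
        else:
            # Source bias retains state, so nothing is saved.
            self.state = PowerState.SHORT_STOP
        return self.state

    def resume(self, context):
        if self.state is PowerState.LONG_STOP:
            context.update(self.saved_context)  # context restore
            self.saved_context = None
        self.state = PowerState.RUN
        return context


pm = PowerManager()
ctx = {"pc": 42}
print(pm.stop(50, ctx).name, pm.resume(ctx))
print(pm.stop(5000, ctx).name, pm.resume(ctx))
```

Choosing the threshold is where the design-level information (leakage in each state, save/restore cost) has to reach the software.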

Power management
• Power mode changes are managed by software:
  - Constraints and impact must be known by the software developer.
  - Information initially needed only at design level is now flowing into the software space.
• Power awareness in the software world is coming from the design world through a better link between design tools and software development tools.
• Need for a power view of the application accessible to software developers.

Software Architecture for Multimedia Acceleration

Complex Multimedia Software Stack
Stack elements as listed on the slide: User Interface; Operating System; Multimedia Framework; Multimedia API; Media Network Server; Execution Infrastructure; Codecs, Sensors, Presentation; Hardware
Annotations: Upward pervasion of design constraints; SoC design perimeter

Objectives
• A unified programming model for distributed computing
  - One S/W component can run anywhere possible
  - Dynamically configurable
  - Run complex algorithms that require more than one DSP
• Enforce software architecture
  - Modularity
  - Component programming model
  - Multimedia framework
• Comprehensive debug
  - System level monitoring
  - Component observable by construction (auto code instrumentation)

Complex use case illustration
• 16 QCIF decode
• 1 Grab & Viewfinder
• Graphics & control on Host CPU
• SVGA display
• 100 mW

Architecture evolution

SoC evolution across technology nodes
Year                          2004   2006   2008   2010   2012
Technology Node (nm)          90     65     45     32     22
Loosely coupled Sub-Systems   2      4      6      8      12
General Purpose CPU           Single -> Multiple
Hardware Accelerator          Hardwired -> Reconfigurable
• Constant SoC die size
• Slow evolution of peripherals (area decrease)
• General purpose CPU sub-system complexity doubles at each node (constant area)
• Embedded memory capacity doubles at each node (constant area)
• Loosely coupled DSP sub-system complexity increases by 30% at each node (30% area decrease)

Main trends
• Host CPU evolving toward multi-core architecture to meet the performance increase requirements
• HW acceleration mapped on reconfigurable arrays
  - Performance close to dedicated HW in many areas
  - Good fit with regular design constraints imposed by 45 nm process and beyond
  - Excellent structure for best optimized power management
  - And … FLEXIBILITY …

Reconfigurable Hardware (DSP fabric)
• Target signal processing and arithmetic intensive applications
• Reconfigurable array of simple DSP cores (CNode)
• Low power architecture
  - Hierarchical clock gating
  - Distributed leakage control (fine grain power gating)
• Programmable DMA engine
• Reconfigurable at run time, multi task

Mapping Flow
Flow stages: Behavioral code; DFG; Partitioning / static scheduling; Coarse grained configuration
Behavioral code fragment shown on the slide:
  Procedure(In, Out, inout)
  Constant A, b, c, …;
  Begin
    X=a-in[0];
    …
  End;
Diagram labels: Clusters; MUX level 0, level 1, level 2; nodes N0_i/N0_o, N1_i/N1_o, N2_i/N2_o; Data in; Data out; ILP + software pipelining
• ALUs execute a cyclic micro-sequence
• Data exchanges through hierarchical clustered interconnect
• Configuration step is sequence loading and interconnect programming

What can fit in 45 mm² in 45 nm
• Programmable Multimedia Accelerator: 3 × (L1, DSP, HW, DMA); 192 CNode (40 GOPS); Video H/W; Imaging H/W; HW; DMA
• Interconnect
• 4 MB Multi-port Embedded Memory
• L2, L1, Host Core 2
• Peripherals & analog

CAD Challenges

Main areas of CAD challenges
• Low Power design
  - Static & Dynamic power global optimization
  - Power control is becoming very fine grain. Must be tightly linked with software environment.
  - Power control is beyond the pure SoC. System level power view is needed.
• Software design
  - Efficient software design on hierarchical multiprocessor engine
  - Capability to architect & design software architecture as efficiently as HW
  - Capture tools, simulation, verification, automated code generation

Main areas of CAD challenges
• Synthesis on Reconfigurable hardware
  - Configuring the hardware network
    * 3D place & route of massively parallel code on arrays of DSPs
    * Design constraints going up into the software
      + Reconfiguration latency
      + Expected performance
  - Reconfigurable hardware managed at software level
    * Software development environment has to be aware of reconfigurable hardware
      + Profiling to extract hot spots and the benefit of moving them to hardware
      + Code generation as well as reconfiguration sequence for hardware
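
The interplay between profiling, expected performance and reconfiguration latency can be illustrated with a simple break-even estimate. This is a sketch under stated assumptions: the cost model (one reconfiguration amortised over several calls) and all numbers are hypothetical, not measurements from the presentation.

```python
def offload_benefit_us(sw_time_us, hw_time_us, reconfig_latency_us,
                       calls_per_config=1):
    """Net time saved by moving one hot spot onto reconfigurable hardware.

    sw_time_us          : profiled time per call when run in software
    hw_time_us          : expected time per call on the reconfigurable fabric
    reconfig_latency_us : one-off cost of loading the configuration
    calls_per_config    : calls served before the fabric is reconfigured again
    Positive result: offloading pays off. All inputs are illustrative.
    """
    per_call_gain = sw_time_us - hw_time_us
    return calls_per_config * per_call_gain - reconfig_latency_us


# Same hot spot, two usage patterns (made-up numbers).
print(offload_benefit_us(100, 20, 500, calls_per_config=1))
print(offload_benefit_us(100, 20, 500, calls_per_config=10))
```

A tool flow would need both the profile (software side) and the latency and performance figures (hardware side) to make this decision, which is why the constraints have to be visible in the software environment.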

Conclusion
• For multimedia processors, the complexity is moving to software design
  - Hardware complexity resolved through regular design (multicore host, multi-DSP, coarse-grained DSP fabric)
• CAD challenge lies essentially in S/W design tools
  - Multimedia software execution infrastructure, simulation, debug
  - Programmable hardware acceleration