6106a8d6dbcdb8f70ca2e6ceacb10170.ppt
- Number of slides: 89
Embedded Systems Architecture
- Introduction
- Sensors
- Actuators
- A/D and D/A Converters
- Communication
- Processing Units
- Conclusion
CDA 4630/5636 – Spring 2018. Copyright © 2018 Prabhat Mishra
Components of Embedded Systems: Display, Analog-Digital Converter, Embedded Computing (Processors, Memories, …), Digital-Analog Converter, Actuators, Sensors, Environment
Sensors
Processing of physical data starts with capturing this data. Sensors can be designed for virtually every physical stimulus: heat, light, sound, weight, velocity, acceleration, electrical current, voltage, pressure, ...
Many physical effects are used for constructing sensors:
- law of induction (generation of voltages in a changing magnetic field),
- photoelectric effects.
Artificial Leg
Artificial Hand
Prosthetic Hand
Prosthetic Hand with Sensors (identity protected)
Artificial Eyes © Dobelle Institute (www.dobelle.com)
Charge-Coupled Devices (CCD) Image Sensors: Based on charge transfer to next pixel cell
CMOS Image Sensors Based on standard production process for CMOS chips, allows integration with other components.
Comparison CCD/CMOS sensors. Source: B. Diericks: CMOS image sensor concepts. Photonics West 2000 short course.
Biometrical Sensors. Example: fingerprint sensor (© Siemens, VDE): matrix of 256 x 256 elements. Voltage ~ distance. Resistance also computed. Not fooled by photos or wax copies. Carbon dust? Integrated into ID mouse.
Discretization of Time. Sampling: how often is the signal converted? Quantization: how many bits are used to represent a sample?
Aliasing. Signal frequency: 5.6 Hz. Sampling frequency: 9 Hz.
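A minimal sketch of where the aliased signal appears for the numbers on this slide. It assumes the usual folding rule (the alias is the distance from the signal frequency to the nearest integer multiple of the sampling frequency); the function name `alias_frequency` is illustrative and not part of the slides.

```c
/* Apparent (aliased) frequency of a sampled sine wave.
   Assumption: both frequencies are positive. */
double alias_frequency(double f_signal, double f_sample)
{
    /* nearest integer multiple of the sampling frequency */
    long k = (long)(f_signal / f_sample + 0.5);
    double diff = f_signal - (double)k * f_sample;
    return diff < 0.0 ? -diff : diff;   /* absolute value without libm */
}
```

For a 5.6 Hz signal sampled at 9 Hz this gives about 3.4 Hz, which is below the 4.5 Hz Nyquist limit and therefore indistinguishable from a genuine 3.4 Hz signal after sampling.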
Analog to Digital Conversion
Sampling: how often is the signal converted?
- More than twice the highest frequency present in the input signal
Quantization: how many bits are used to represent a sample?
- Sufficient to provide the required dynamic range: a 16-bit A/D gives $20 \log_{10}(2^{16}) \approx 96$ dB (human ear limit)
- Under-loading: dynamic range not used properly
- Clipping: input signal beyond the dynamic range
Aliasing: erroneous signals, not present in the analog domain, but present in the digital domain
- Use anti-aliasing filters
- Sample at a higher than necessary rate
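The dynamic-range figure can be checked with a short sketch. It assumes the ideal relation of roughly 6.02 dB per bit, i.e. 20·log10(2) per bit; the helper name `dynamic_range_db` is hypothetical.

```c
/* Ideal dynamic range of an n-bit quantizer in dB: 20*log10(2^n).
   The constant is log10(2), written out to avoid needing libm. */
double dynamic_range_db(int bits)
{
    const double LOG10_2 = 0.30102999566;
    return 20.0 * LOG10_2 * (double)bits;
}
```

With 16 bits this evaluates to about 96.3 dB, matching the 96 dB quoted for a 16-bit A/D; each additional bit adds about 6 dB.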
Analog-to-Digital Converter. [Figure, three panels. Proportionality: analog input from 0 V to Vmax = 7.5 V in 0.5 V steps mapped to the 4-bit codes 0000 to 1111. Analog to digital: analog input (V) sampled at times t1 to t4, digital output 0100, 1000, 0110, 0101. Digital to analog: digital input 0100, 1000, 0110, 0101 at times t1 to t4, analog output (V).]
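The proportionality between voltage and code can be sketched as follows, using the figure's values (4 bits, Vmax = 7.5 V, so one code step is 0.5 V). Rounding to the nearest code and clamping out-of-range inputs are assumptions, as the slide does not state them; `adc_code` and `dac_voltage` are illustrative names.

```c
/* Ideal n-bit converter pair with codes 0 .. 2^n - 1 spanning 0 V .. vmax. */
unsigned adc_code(double vin, double vmax, int bits)
{
    unsigned max_code = (1u << bits) - 1u;
    double lsb = vmax / (double)max_code;       /* voltage per code step */
    if (vin <= 0.0)  return 0u;                 /* clamp below range */
    if (vin >= vmax) return max_code;           /* clamp above range (clipping) */
    return (unsigned)(vin / lsb + 0.5);         /* round to nearest code */
}

double dac_voltage(unsigned code, double vmax, int bits)
{
    unsigned max_code = (1u << bits) - 1u;
    return vmax * (double)code / (double)max_code;
}
```

Under these assumptions, 2.0 V maps to code 0100 and code 1000 maps back to 4.0 V, consistent with the values in the figure.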
Flash A/D Converter
- Parallel comparison with reference voltages
- Speed: O(1)
- HW complexity: O(n), with n = # of distinguished voltage levels
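A behavioural sketch of a flash converter: one comparator per threshold, with the output code equal to the number of comparators that fire. The loop stands in for hardware that evaluates all comparisons simultaneously; evenly spaced thresholds and the name `flash_adc` are assumptions.

```c
/* Flash conversion with `levels` distinguishable levels, i.e. levels-1
   comparators against evenly spaced reference voltages (assumed spacing). */
unsigned flash_adc(double vin, double vref, unsigned levels)
{
    unsigned code = 0u;
    for (unsigned k = 1u; k < levels; k++) {
        double threshold = vref * (double)k / (double)levels;
        if (vin >= threshold) {
            code++;             /* comparator k outputs '1' */
        }
    }
    return code;                /* thermometer code counted into binary */
}
```

The loop body is what costs O(n) hardware: every additional level needs its own comparator and reference tap, while the conversion time stays constant.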
Successive Approximation
Key idea: binary search:
- Set MSB='1'; if too large: reset MSB
- Set MSB-1='1'; if too large: reset MSB-1
- ...
Speed: O(log(n)). Hardware complexity: O(log(n)), with n = # of distinguished voltage levels; slow, but high precision possible.
Successive Approximation
Given an analog input signal whose voltage should range from 0 to 15 volts, and an 8-bit digital encoding, calculate the correct encoding for 5 volts.
- ½(Vmax – Vmin) = 7.5 volts. Greater than 5, so reset to '0': 0
- ½(7.5) = 3.75 volts. Less than 5, so keep '1': 01
- 3.75 + ½(3.75) = 3.75 + 1.88 = 5.63 volts. Greater than 5, so reset to '0': 010
- 3.75 + ½(1.88) = 3.75 + 0.94 = 4.69 volts. Less than 5, so keep '1': 0101
- 3.75 + 0.94 + ½(0.94) = 4.69 + 0.47 = 5.16 volts. Greater than 5, so reset to '0': 01010
- 4.69 + ½(0.47) = 4.69 + 0.24 = 4.93 volts. Less than 5, so keep '1': 010101
- 4.93 + ½(0.24) = 4.93 + 0.12 = 5.05 volts. Greater than 5, so reset to '0': 0101010
- 4.93 + ½(0.12) = 4.99 volts. Less than 5, so keep '1': 01010101
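The binary search in this example can be sketched in a few lines of C. The function name `sar_encode` and the choice to keep a bit when the trial voltage exactly equals the input are assumptions; the weights (7.5 V, 3.75 V, ...) follow the worked example.

```c
#include <stdint.h>

/* Successive approximation: try each bit from MSB to LSB and keep it
   only if the voltage represented so far does not exceed the input. */
uint32_t sar_encode(double vin, double vmin, double vmax, int bits)
{
    uint32_t code = 0u;
    double approx = vmin;                  /* voltage of the bits kept so far */
    double weight = (vmax - vmin) / 2.0;   /* weight of the MSB */
    for (int b = bits - 1; b >= 0; b--) {
        if (approx + weight <= vin) {      /* not too large: keep '1' */
            approx += weight;
            code |= (uint32_t)1 << b;
        }                                  /* otherwise the bit stays '0' */
        weight /= 2.0;
    }
    return code;
}
```

For 5 V in a 0 to 15 V range with 8 bits, the loop keeps bits 6, 4, 2 and 0, giving 01010101 (decimal 85), the same sequence of decisions as in the example.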
Digital-to-Analog (D/A) Converters. Various types, can be quite simple, e.g.:
Output voltage is proportional to the number represented by x. Due to Kirchhoff's laws: Current into Op-Amp = 0: Hence: Finally:
Stepper Motor Controller
Stepper motor: rotates a fixed number of degrees when given a "step" signal.
- In contrast, a DC motor just rotates when power is applied.
- Rotation is achieved by applying a specific voltage sequence to the coils.
A controller greatly simplifies this.
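Stepper Motor Controller
sbit SM_A, SM_B, SM_AP, SM_BP;   // ports (8051-style bit variables; pin addresses are board-specific)
int curr_pos;                    // tells us the current step position (0..3)
void ms_delay(int ms);           // millisecond delay, provided elsewhere

void move_one_step(int dir /* 0=CW, 1=CCW */) {
    // Assumed full-step, two-phase-on sequence for coils A, B, A', B'
    static const int SM_TBL[4][4] = {
        {1, 1, 0, 0},
        {0, 1, 1, 0},
        {0, 0, 1, 1},
        {1, 0, 0, 1}
    };
    curr_pos = (curr_pos + (dir == 0 ? 1 : 3)) % 4;  // +3 mod 4 steps backwards
    SM_A  = SM_TBL[curr_pos][0];
    SM_B  = SM_TBL[curr_pos][1];
    SM_AP = SM_TBL[curr_pos][2];
    SM_BP = SM_TBL[curr_pos][3];
    ms_delay(50);                // let the rotor settle before the next step
}

void reset() {                   // must be called to synchronize
    curr_pos = 0;
    for (int i = 0; i < 4; i++) {
        move_one_step(0);
    }
}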
Actuators. Huge variety of actuators and output devices. Microsystems motors as examples (© MCNC).
Micro-array of Mirrors. TI Fellow Dr. Larry Hornbeck, 2015; TI Fellow Dr. Larry Hornbeck, 1998.
LCD: Liquid Crystal Display. N rows by M columns. Controller built into the LCD module. Simple microprocessor interface using ports. Software controlled.
Display Technologies
- Liquid Crystal Display (LCD)
- Cathode Ray Tube (CRT)
- Plasma Display Panel (PDP)
- Digital Light Processing (DLP)
- Liquid Crystal on Silicon (LCoS)
- Laser Video Display
- Light Emitting Diode (LED)
- Surface-conduction electron-emitter display (SED)
- Field-emission display (FED)
- Electronic paper (EPD)
- Organic light-emitting diode (OLED)
- Quantum dot display (QLED)
Communication: Hierarchy Inverse relation between volume and urgency quite common: Sensor/actuator busses
Communication: Requirements Real-time behavior Efficient, economical Bandwidth and communication delay Robustness Fault tolerance Maintainability Diagnosability Security
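Basic Techniques: Electrical Robustness. Single-ended vs. differential signals. [Figure: single-ended signal referenced to a shared ground versus a differential pair with local ground.] Differential: voltage at the input of the Op-Amp positive: '1'; otherwise '0'. Combined with twisted pairs; most noise is added to both wires.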
Evaluation (Twisted Pairs)
Advantages:
- Subtraction removes most of the noise
- Changes of voltage levels have no effect
- Reduced importance of ground wiring
- Higher speed
Disadvantages:
- Requires negative voltages
- Increased number of wires and connectors
Applications:
- USB, FireWire, ISDN
- Ethernet (STP/UTP CAT 5 cables)
- Differential SCSI
- High-quality analog audio signals
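A small sketch of why subtraction removes noise that is added to both wires. The voltage levels (+1 V / -1 V for the differential pair, a 0.5 V threshold for the single-ended line) and both function names are illustrative assumptions.

```c
/* Differential receiver: only the difference between the wires matters,
   so any offset common to both wires cancels out. */
int differential_receive(double v_plus, double v_minus)
{
    return (v_plus - v_minus) > 0.0 ? 1 : 0;
}

/* Single-ended receiver: compares one wire against a fixed threshold
   relative to ground, so added noise shifts the decision. */
int single_ended_receive(double v, double threshold)
{
    return v > threshold ? 1 : 0;
}
```

With 3 V of common-mode noise, a differential '0' sent as (-1 V, +1 V) arrives as (2 V, 4 V) and is still read as '0', while a single-ended '0' sent as 0 V arrives as 3 V and is misread as '1'.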
Real-time behavior
Carrier-sense multiple-access/collision-detection (CSMA/CD, standard Ethernet): no guaranteed response time.
Alternatives:
- Token rings, token busses
- Carrier-sense multiple-access/collision-avoidance (CSMA/CA)
  - WLAN techniques with request preceding transmission
  - Each partner gets an ID (priority). After each bus transfer, all partners try setting their ID on the bus; partners detecting a higher ID disconnect themselves from the bus. The highest-priority partner gets a guaranteed response time; others only if they are given a chance.
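The ID-based arbitration described above can be sketched as a bit-by-bit contest on a wired-OR bus. This follows the slide's convention that the higher ID wins; the 8-bit ID width, the 32-node limit, unique IDs, and the function name `arbitrate` are assumptions for illustration.

```c
#include <stdint.h>

/* Returns the index of the node that wins arbitration, or -1 on bad input.
   Nodes put their IDs on the bus MSB first; a node that sends '0' but
   sees '1' on the bus has detected a higher ID and disconnects. */
int arbitrate(const uint8_t ids[], int n)
{
    if (n <= 0 || n > 32) return -1;
    uint32_t active = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    for (int bit = 7; bit >= 0; bit--) {
        unsigned bus = 0u;                      /* wired-OR of all active nodes */
        for (int i = 0; i < n; i++)
            if ((active >> i) & 1u) bus |= (unsigned)(ids[i] >> bit) & 1u;
        for (int i = 0; i < n; i++)
            if (((active >> i) & 1u) && bus && !((ids[i] >> bit) & 1u))
                active &= ~(1u << i);           /* lost: leave the bus */
    }
    for (int i = 0; i < n; i++)
        if ((active >> i) & 1u) return i;
    return -1;
}
```

Because the winner is decided without destroying its transmission, the highest-priority node never waits longer than one ongoing transfer, which is the guaranteed response time mentioned above.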
Example 1: Sensor/Actuator Bus Real-time behavior is very important Different techniques: Many wires less wires expensive & flexible CNC: Computerized Numerical Control
Example 2: Field bus
More powerful/expensive than sensor interfaces; serial busses preferred.
Examples:
- Process Field Bus (Profibus)
  - http://www.profibus.com
  - Token passing;
  - 9.6 kbit/s (1200 m) to 500 kbit/s (200 m);
  - too slow to be used for hard time constraints.
Field Buses
Controller area network (CAN)
- Designed by Bosch and Intel in 1981;
- Used in cars and other equipment;
- Differential signaling with twisted pairs,
- Arbitration using CSMA/CA,
- Throughput between 10 kbit/s and 1 Mbit/s,
- Low and high-priority signals,
- Max. latency of 134 µs for high priority signals,
- Coding similar to that of serial (RS-232) lines of PCs, with modifications for differential signaling.
- http://www.can.bosch.com
Field Buses
The Time-Triggered-Protocol (TTP) [Kopetz et al.]
- for fault-tolerant safety systems like airbags in cars.
FlexRay: TDMA (Time Division Multiple Access) protocol, developed by the FlexRay consortium (BMW, Ford, Bosch, DaimlerChrysler, General Motors, Motorola, Philips).
- Combination of a variant of the TTP and the byteflight [Byteflight Consortium, 2003] protocol.
- Designed to meet key automotive requirements.
- Complements the major in-vehicle networking standards.
- A high data rate can be achieved: initially targeted for a data rate of approximately 10 Mbit/s; however, the design of the protocol allows much higher data rates to be achieved.
Example 3: Wireless Communication
Wireless Communication
- IEEE 802.11 a/b/g
- UMTS (Universal Mobile Telecommunications System)
  - Bandwidth is becoming a scarce resource.
- DECT (Digital Enhanced Cordless Telecommunications)
  - Standard used for wireless phones in Europe
- Bluetooth
  - Connects devices, e.g., mobile phone and headset
Global Energy Consumption. [Figure: energy consumption in quadrillion British thermal units (Btu); OECD: Organization for Economic Co-operation and Development.] Source: U.S. Energy Information Administration, International Energy Outlook 2011.
Global Greenhouse Gas Emissions
Global Greenhouse Gas Emissions. Improved efficiency through information technology can reduce greenhouse gas emissions by 29% across the transport, industrial, residential and commercial sectors. Source: 2009 U.S. Greenhouse Gas Inventory Report.
Power Consumption in Data Centers. In 2006, data centers used 1.5% (60 billion kWh/year) of all the electricity produced in the US … if nothing significant is done about the situation, this consumption will rise to 2.9% by 2011. Source: Report to Congress on Server and Data Center Energy Efficiency, EPA 2007.
The Energy/Flexibility Conflict. [Figure: operations per watt (MOps/mW, from 0.01 to 10) versus technology generation (1.0µ, 0.5µ, 0.25µ, 0.13µ, 0.07µ) for ASICs, reconfigurable computing, DSP-ASIPs, and processors (µPs); annotation: poor design techniques.]
Power and Energy. [Figure: power P plotted over time t; the energy E is the area under the curve.] In many cases, faster execution also means less energy, but the opposite may be true if power has to be increased to allow faster execution.
Power and Energy. Power is drawn from a voltage source. Power: $P(t) = V \cdot I(t)$. Energy: $E = \int_0^T P(t)\,dt$. Average power: $P_{avg} = E / T$.
Dynamic Power: power needed to charge and discharge load capacitances when transistors switch.
- The capacitor needs to charge for the output to be '1'.
- For the output to be '0', the capacitor needs to discharge.
This repeats $T \cdot f_{sw}$ times over an interval of $T$, so $P_{dyn} = C V^2 f_{sw} = \alpha C V^2 f$. Here, $\alpha$ is the activity factor and $f$ is the clock frequency.
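A minimal numeric sketch of switching power, assuming the standard relation P = α·C·V²·f (activity factor, switched capacitance, supply voltage, clock frequency). The function name and the example values are illustrative.

```c
/* Dynamic (switching) power in watts.
   alpha : activity factor (fraction of cycles in which the node switches)
   c_load: switched capacitance in farads
   vdd   : supply voltage in volts
   f_clk : clock frequency in hertz */
double dynamic_power(double alpha, double c_load, double vdd, double f_clk)
{
    return alpha * c_load * vdd * vdd * f_clk;
}
```

For example, α = 0.1, C = 1 nF, V = 1 V and f = 1 GHz give about 0.1 W; halving the supply voltage alone cuts this to a quarter because of the quadratic dependence on V.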
Low Power vs. Low Energy
Minimizing the power consumption matters for:
- Design of the power supply
- Design of voltage regulators
- Dimensioning of interconnect
- Short term cooling
Minimizing the energy consumption matters because of:
- Restricted availability of energy (mobile systems)
  - Limited battery capacities (only slowly improving)
  - Very high costs of energy (solar panels, in space)
- Cooling
  - High costs
  - Limited space
- Dependability
  - Long lifetimes, low temperatures
Information Processing
- ASIC
- Processor
  - Energy efficiency
  - Code-size efficiency
  - Run-time efficiency
    - Special features of DSP processors
    - Multimedia instructions
    - Very Long Instruction Word (VLIW) machines
- Reconfigurable Hardware
- Memory
Application Specific Circuits (ASIC) Custom-designed circuits necessary if ultimate speed or energy efficiency is the goal and large numbers can be sold. Approach suffers from long design times and high costs.
Reducing Energy Consumption. [Figure: thermal images of a Pentium and a Crusoe running the same multimedia application; www.transmeta.com.] Infrared cameras (FLIR) can be used to detect thermal distribution.
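Dynamic Power Management (DPM)
- RUN: operational
- IDLE: a SW routine may stop the CPU when not in use, while monitoring interrupts
- SLEEP: shutdown of on-chip activity
[Figure: power state machine of the StrongARM SA-1100. RUN: 400 mW; IDLE: 50 mW; SLEEP: 160 µW. Transitions: RUN to IDLE 10 µs, IDLE to RUN 10 µs, RUN to SLEEP 90 µs, IDLE to SLEEP 90 µs, SLEEP to RUN 160 ms.]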
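One way to use these numbers is a simple policy that picks the deepest power state whose wake-up time is still acceptable. The sketch below uses the StrongARM SA-1100 power values from the slide and assumes wake-up times of 10 µs from IDLE and 160 ms from SLEEP; the policy itself and all identifiers are illustrative, not taken from the slides.

```c
typedef enum { DPM_RUN = 0, DPM_IDLE = 1, DPM_SLEEP = 2 } dpm_state_t;

/* Power per state in microwatts: 400 mW, 50 mW, 160 uW. */
static const unsigned long DPM_POWER_UW[3]  = { 400000UL, 50000UL, 160UL };
/* Assumed time to get back to RUN, in microseconds. */
static const unsigned long DPM_WAKEUP_US[3] = { 0UL, 10UL, 160000UL };

unsigned long dpm_power_uw(dpm_state_t s)
{
    return DPM_POWER_UW[s];
}

/* Deepest state whose wake-up latency fits the tolerated delay. */
dpm_state_t dpm_pick_state(unsigned long max_wakeup_us)
{
    if (DPM_WAKEUP_US[DPM_SLEEP] <= max_wakeup_us) return DPM_SLEEP;
    if (DPM_WAKEUP_US[DPM_IDLE]  <= max_wakeup_us) return DPM_IDLE;
    return DPM_RUN;
}
```

The large gap between the 10 µs and 160 ms wake-up times is the reason SLEEP only pays off for long idle periods, even though it draws far less power than IDLE.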
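Dynamic Voltage Scaling (DVS)
$E = P \times T$, $P \propto V^2$ (E: energy, P: power, T: time, V: voltage).
Example: a task is given with workload (W) and deadline (D). Assume that idle energy is negligible.
- Running at voltage $V$ for time $T$: $E_1 \propto V_1^2 \cdot T_1 = V^2 \cdot T$
- Running at voltage $V/2$ for time $2T$: $E_2 \propto V_2^2 \cdot T_2 = \frac{V^2}{4} \cdot 2T = E_1 / 2$
[Figure: two schedules up to the deadline D, one at voltage V finishing at time T, one at voltage V/2 finishing at time 2T.]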
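The example can be reproduced numerically with a short sketch. It assumes only the proportionality E ∝ V²·T used on this slide, so the result is a relative energy in arbitrary units; `relative_energy` is a hypothetical helper.

```c
/* Relative energy of running at supply voltage v for duration t,
   using E proportional to V^2 * T (arbitrary units). */
double relative_energy(double v, double t)
{
    return v * v * t;
}
```

Comparing `relative_energy(0.5, 2.0)` with `relative_energy(1.0, 1.0)` gives a ratio of 0.5: running at half the voltage for twice as long halves the energy, provided the longer run still meets the deadline.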
Dynamic Voltage Scaling
Code Size Efficiency RISC machines designed for run-time, not for code-size-efficiency Compression techniques: key idea
Code-size Efficiency
Compression techniques (continued):
- 2nd instruction set, e.g., the ARM Thumb instruction set
[Figure: the 16-bit Thumb instruction ADD Rd #constant (fields: major opcode 001, minor opcode 10, Rd with source = destination, constant) is expanded into the ARM instruction 1110 001 01001 0 Rd 0000 Constant, with the constant zero-extended.]
- Reduction to 65-70% of original code size
- 130% of ARM performance with 8/16-bit memory
- 85% of ARM performance with 32-bit memory
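The expansion of a 16-bit Thumb instruction into its 32-bit ARM equivalent can be sketched for the ADD Rd, #constant case. The sketch assumes the classic encodings (Thumb: 001 10 Rd imm8; ARM: condition "always", immediate ADD with flags set, Rd used as both source and destination); the function name is hypothetical and only this one instruction form is handled.

```c
#include <stdint.h>

/* Expand Thumb "ADD Rd, #imm8" into the ARM instruction "ADDS Rd, Rd, #imm8".
   Returns 0 if the halfword is not of the form 001 10 Rd imm8. */
uint32_t thumb_add_imm_to_arm(uint16_t thumb)
{
    if ((thumb >> 11) != 0x06u) {       /* top five bits must be 00110 */
        return 0u;
    }
    uint32_t rd  = ((uint32_t)thumb >> 8) & 0x7u;   /* 3-bit register number */
    uint32_t imm = (uint32_t)thumb & 0xFFu;         /* 8-bit constant, zero-extended */
    /* 1110 | 001 | 0100 (ADD) | S=1 | Rn = 0Rd | Rd = 0Rd | rotate 0000 | imm8 */
    return 0xE2900000u | (rd << 16) | (rd << 12) | imm;
}
```

The 3-bit register field and 8-bit constant are what make the Thumb form half the size: only registers R0 to R7 and small constants can be expressed, and the decoder fills in the remaining ARM fields with fixed values.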
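Domain-oriented Architectures
Application: $y[j] = \sum_{i=0}^{n-1} x[j-i] \cdot a[i]$
$\forall i: 0 \le i \le n-1: \; y_i[j] = y_{i-1}[j] + x[j-i] \cdot a[i]$
Architecture: ADSP 210x (analog.com). [Figure: data path with address registers A0, A1, A2 and an address generation unit (AGU) computing i+1, j-i+1; data memory D holding x and program memory P holding a; registers AX, AY, AF feeding the ALU (+, -, ..) with result AR; registers MX (x[j-i]), MY (a[i]), MF feeding the multiplier (*) with product x[j-i]*a[i] accumulated (+, -) into MR (y_{i-1}[j]).]
- Parallelism
- Dedicated registers
MR:=0; A1:=1; A2:=n-2; MX:=x[n-1]; MY:=a[0];
for ( j:=1 to n) { MR:=MR+MX*MY; MY:=a[A1]; MX:=x[A2]; A1++; A2-- }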
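For comparison with the DSP register-transfer code, the same filter equation can be written as a plain C loop. This is a sketch for one output sample; the name `fir_output` and the requirement that the caller supplies j >= n-1 (so that all x[j-i] exist) are assumptions.

```c
/* One output sample of y[j] = sum over i = 0..n-1 of x[j-i] * a[i].
   Precondition (assumed): j >= n - 1, so every index j-i is valid. */
double fir_output(const double x[], const double a[], int n, int j)
{
    double y = 0.0;                 /* plays the role of register MR */
    for (int i = 0; i < n; i++) {
        y += x[j - i] * a[i];       /* multiply/accumulate step */
    }
    return y;
}
```

On a general-purpose processor each iteration needs separate instructions for the two loads, the address updates, the multiply and the add; the DSP data path performs the multiply/accumulate and both address updates in parallel.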
Digital Signal Processing (DSP) Multiply/accumulate (MAC) and zero-overhead loop (ZOL) instructions Heterogeneous registers Separate address generation units (AGUs)
Digital Signal Processing (DSP)
Modulo addressing: Am++ means Am := (Am+1) mod n (implements a ring or circular buffer in memory).
[Figure: sliding window over the signal x at times t1 and t2. Memory at t = t1: .. x[n-2] x[n-1] x[0] x[1] ..; memory at t2 = t1+1: .. x[n-3] x[n-2] x[n-1] x[n] x[1].]
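Modulo addressing is what a ring buffer does in software. The following sketch keeps the last `WINDOW` samples and overwrites the oldest one on each push; the window size of 4 and all identifiers are illustrative choices.

```c
#define WINDOW 4u   /* illustrative window size n */

typedef struct {
    double   data[WINDOW];
    unsigned next;              /* index that the next sample will overwrite */
} ring_t;

/* Store a new sample and advance the index: next := (next + 1) mod n. */
void ring_push(ring_t *r, double sample)
{
    r->data[r->next] = sample;
    r->next = (r->next + 1u) % WINDOW;
}

/* Read a sample by age: 0 = newest, WINDOW-1 = oldest. */
double ring_get(const ring_t *r, unsigned age)
{
    return r->data[(r->next + WINDOW - 1u - (age % WINDOW)) % WINDOW];
}
```

A DSP with modulo addressing performs the wrap-around in the address generation unit, so the `%` operation costs no extra instructions there.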
Multimedia Instructions
- Many registers, adders etc. are very wide (32 or 64 bits).
- Most multimedia data types are narrow, e.g., 8 bits per color, 16 bits per audio sample.
- 2 - 8 values can be stored per register and added, e.g., 4 additions per instruction; carry disabled at word boundaries.
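What "carry disabled at word boundaries" means can be emulated in portable C on an ordinary 32-bit register. The sketch adds four packed 8-bit values with wrap-around; it is a software emulation of the idea, not the instruction of any particular processor, and the function name is hypothetical.

```c
#include <stdint.h>

/* Add four packed unsigned bytes lane by lane (wrap-around within each lane).
   The low 7 bits of every lane are added normally; the top bit of each lane
   is then combined with XOR so that no carry crosses into the next lane. */
uint32_t packed_add_u8x4(uint32_t a, uint32_t b)
{
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    return low7 ^ ((a ^ b) & 0x80808080u);
}
```

For a = 0x01FF7F10 and b = 0x0101017F the result is 0x0200808F: the lane 0xFF + 0x01 wraps to 0x00 without disturbing its neighbour, which a plain 32-bit addition would not guarantee.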
HP precision architecture (hp PA) Half word add instruction HADD: Half word add? Optional saturating arithmetic. Up to 10 instructions can be replaced by HADD.
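Saturating arithmetic clamps a result at the largest representable value instead of wrapping around. The sketch below emulates a half word add on two unsigned 16-bit values packed into a 32-bit word; the unsigned variant, the 32-bit register width, and the function name are assumptions and do not reproduce the exact HADD semantics.

```c
#include <stdint.h>

/* Add two pairs of unsigned 16-bit half words with saturation at 0xFFFF. */
uint32_t hadd_sat_u16x2(uint32_t a, uint32_t b)
{
    uint32_t lo = (a & 0xFFFFu) + (b & 0xFFFFu);
    uint32_t hi = (a >> 16) + (b >> 16);
    if (lo > 0xFFFFu) lo = 0xFFFFu;     /* clamp instead of wrapping */
    if (hi > 0xFFFFu) hi = 0xFFFFu;
    return (hi << 16) | lo;
}
```

Written out like this, the masking, shifting, comparing and clamping take roughly ten ordinary operations, which illustrates how a single saturating half word add can replace a sequence of scalar instructions.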
Application Scaled interpolation between two images Next word = next pixel, same color. 4 pixels processed at a time.
Pentium MMX Architecture
- 64-bit vectors represent 8 bytes, 4 words or 2 double-word encoded numbers; wrap-around/saturating options.
- Multimedia registers mm0 - mm7, consistent with floating-point registers (OS unchanged).

Instruction | Options | Comments
Padd[b/w/d], PSub[b/w/d] | wrap around, saturating | addition/subtraction of bytes, words, double words
Pcmpeq[b/w/d], Pcmpgt[b/w/d] | | Result = "11..11" if true, "00..00" otherwise
Pmullw | | multiplication, 4*16 bits, least significant word
Pmulhw | | multiplication, 4*16 bits, most significant word
Psra[w/d], Psll[w/d/q], Psrl[w/d/q] | No. of positions in register or instruction | Parallel shift of words, double words or 64-bit quad words
Punpckl[bw/wd/dq], Punpckh[bw/wd/dq] | | Parallel unpack
Packss[wb/dw] | saturating | Parallel pack
Pand, Pandn, Por, Pxor | | Logical operations on 64-bit words
Mov[d/q] | | Move instruction
VLIW Processors VLIW: Very Long Instruction Word Detection of parallelism is done by compiler, not by hardware at run-time (inefficient). Parallel operations (instructions) encoded in one long word (instruction packet), each instruction controlling one functional unit.
Partitioned Register Files
- Many memory ports are required to supply enough operands per cycle.
- Memories with many ports are expensive.
- Registers are partitioned into sets, e.g. for TI C60x:
[Figure: data path A with register file A and functional units L1, S1, M1, D1; data path B with register file B and functional units D2, M2, S2, L2; address bus and data bus.]
Microcontrollers: MHS 80C51
- 8-bit CPU optimised for control applications
- Extensive Boolean processing capabilities
- 64 k Program Memory address space
- 64 k Data Memory address space
- 4 k bytes of on-chip Program Memory
- 128 bytes of on-chip data RAM
- 32 bi-directional and individually addressable I/O lines
- Two 16-bit timers/counters
- Full duplex UART
- 6 sources/5-vector interrupts with 2 priority levels
- On-chip clock oscillators
- Very popular CPU with many different variations
Reconfigurable Logic
Full custom chips may be too expensive, software may be too slow.
- Use of configurable hardware
  - e.g., field programmable gate arrays (FPGAs)
Application areas
- Fast prototyping
  - configuring mobile phone according to local standards
- Low volume applications
Example: Xilinx Virtex II FPGAs
Floorplan of VIRTEX II FPGAs CLB: Configurable Logic Block
Configurable Logic Block (CLB)
Access time will be a problem
Speed gap between processor and memory increases:
- early sixties (Atlas): page fault ~ 2500 instructions
- 2002 (2 GHz µP): access to DRAM ~ 500 instructions
- penalty for a cache miss will soon be the same as for a page fault in Atlas
[Figure: speed versus years (0 to 5); CPU speed grows by a factor of 1.5-2 p.a. (2x every 2 years), DRAM speed by a factor of 1.07 p.a.]
[P. Machanik: Approaches to Addressing the Memory Wall, TR Nov. 2002, U. Brisbane]
Access times and energy consumption Example (CACTI Model): "Currently, the size of some applications is doubling every 10 months" [STMicroelectronics, Medea+ Workshop, Stuttgart, Nov. 2003]
Energy consumption by Memory
[Figure: two pie charts, based on actual measurements. Source: V. Tiwari.]
- Mobile PC Thermal Design (TDP) System Power: 600/500 MHz µP 37%, LCD 10" 19%, Other 13%, Memory+Graphics 12%, Power Supply 10%, HDD 9%. CPU dominates thermal design power.
- Mobile PC Average System Power: LCD 10" 30%, HDD 19%, Memory+Graphics 15%, 600/500 MHz µP 13%, Other 13%, Power Supply 10%. Multiple platform components comprise average power.
“CPU” Power Dissipation 42% / 40% memory-related ! IEEE Journal of SSC Nov. 96 Based on slide by and ©: Osman S. Unsal, Israel Koren, C. Mani Krishna, Csaba Andras Moritz, University of Massachusetts, Amherst, 2001 Proceedings of ISSCC 94
Real-time Capability
Timing behavior has to be predictable. Features that cause problems:
- Caches with difficult to predict replacement strategies
- Unified caches (conflicts between instructions and data)
- Pipelines with difficult to predict stall cycles ("bubbles")
- Interrupts that are possible any time
- Memory refreshes that are possible any time
- Instructions that have data-dependent execution times
[Dagstuhl workshop on predictability, Nov. 17-19, 2003]
No caches, use Scratch Pad memories
Why not just use a Cache ? 1. Predictability? Worst case execution time (WCET) may be large [P. Marwedel et al. , ASPDAC, 2004]
Scratch pad Memory (SPM). [Figure: memory hierarchy with processor, main memory and SPM; the scratch pad memory occupies part of the address space 0 .. FFF.. and needs no tag memory.] Example: ARM7TDMI cores, well-known for low power consumption.
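Because a scratch pad is simply a range of the address space, deciding whether an access goes to the SPM is a plain range check with no tag comparison. The base address and size below are hypothetical values chosen for illustration; they are not taken from the slides or from any ARM7TDMI memory map.

```c
#include <stdint.h>
#include <stdbool.h>

#define SPM_BASE 0x00300000u   /* hypothetical start of the scratch pad */
#define SPM_SIZE 0x00001000u   /* hypothetical size: 4 KiB */

/* True if the address falls into the scratch pad region. */
bool in_scratchpad(uint32_t addr)
{
    return addr >= SPM_BASE && addr < SPM_BASE + SPM_SIZE;
}
```

Since the outcome depends only on the address and not on the access history, every access has a known latency, which is what makes scratch pads attractive for worst-case execution time analysis.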
Multi-processor ARM (MPARM). [Figure: ARM cores, each with an SPM, connected through an interconnect (AMBA or STBus) to Shared Main Memory, an Interrupt Device and a Semaphore Device.]
- Fairchild F8 of 1975 contained 64 bytes of scratchpad
- …
- Intel's Knights Landing processor has a 16 GB MCDRAM that can be configured as either a cache, scratchpad memory, or divided in two parts
Conclusions
- Embedded systems consist of a wide variety of hardware (analog/digital) components.
- Sensors and actuators interact with the physical world.
- Communication needs to be efficient/real-time.
- Processor design needs to be aware of energy efficiency, performance, code size, etc.
- Memory design must also satisfy many constraints: cache is not suitable in real-time systems.
- Reconfigurable systems provide a trade-off between flexibility and efficiency.