Types of Processors - Chapter 2 -
Different classes of processors
To achieve efficient designs, embedded systems draw on several classes of processors:
• Microcontrollers
• RISC processors
• Digital Signal Processors (DSP)
• Multimedia processors
• Application Specific Instruction Set Processors (ASIP)
• Other classes
Microcontrollers - classes of embedded processors
• relatively slow
• very area efficient
• intended for control-intensive applications
• microprogrammed CISC architecture
• the number of clock cycles differs from instruction to instruction
• limited computational and storage resources
• relatively narrow data path (8 or 16 bits)
Microcontrollers - classes of embedded processors (cont.)
• complex instruction set that provides a convenient programming interface, i.e. dense code
• control-oriented application domain
• rich set of instructions for bit-level data manipulation and for peripheral components such as timers or serial I/O ports
• simple processors, such as the 8051 and 6502
• nowadays reused in customized form as microcontroller cores for embedded systems (ESs)
Microcontroller – block diagram -
Microcontroller – detailed block diagram -
Timer as constituent of microcontroller
RISC - classes of embedded processors
• evolved from CISC architectures
• Harvard architecture: separate data and instruction memories
• pipelined instruction execution
• offer only a very basic set of instructions
• instructions are executed at very high speed
• all instructions have the same size and require the same number of clock cycles to execute
RISC - classes of embedded processors (cont.)
• load/store architecture
• large number of general-purpose registers, which reduces the number of memory accesses in a machine program
• for a fixed application, the code size for a RISC exceeds the code size for a CISC
• popular RISC processors for ESs are the ARM RISC core, the MIPS RISC core, and TRICO
• low power consumption (100 mW), suitable for portable battery-powered systems
Clock Frequency Versus Year for Various Representative Machines
Fundamental attributes
The key metrics for characterizing a microprocessor include:
• performance
• power consumption
• cost (chip area)
• high availability (fault tolerance)
Instruction Level Parallelism - Definition
The next step in performance enhancement beyond pipelining is to execute several instructions in parallel.
Instruction-Level Parallelism (ILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations, such as memory loads and stores, integer additions, and floating-point multiplications, to execute in parallel.
Parallel processor systems tend to take one of two forms:
• multiprocessors: relatively large tasks, such as procedures or loop iterations, are executed in parallel
• instruction-level parallel (ILP) processors: individual instructions are executed in parallel
ILP processors
Processors that exploit ILP have been much more successful than multiprocessors in the general-purpose workstation/PC market because they can improve performance on conventional programs, which has not been possible on multiprocessors.
The two most common architectures for ILP are:
• superscalar processors
• Very Long Instruction Word (VLIW) processors
The structure of ILP processors
In an ILP processor, some of the execution units execute integer operations while the others execute floating-point operations.
What is ILP?
ILP processors exploit the fact that many of the instructions in a sequential program do not depend on the instructions that immediately precede them in the program.
Consider the following sequence:
What is ILP? (cont.)
The dependencies require that instructions 1, 3, and 5 be executed in that order to generate the correct result, but instructions 2 and 4 can be executed before, after, or in parallel with any of the other instructions without changing the result of the program fragment.
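The instruction sequence from the slide is not reproduced in this text, so the following is only a minimal sketch on a hypothetical five-instruction fragment with the same dependency shape (1, 3, 5 chained; 2 and 4 free). The register names and the `PROGRAM` table are assumptions made for illustration. It finds, for each instruction, the earlier instructions whose results it reads.

```python
from typing import Dict, List, Tuple

# Hypothetical fragment (not the one on the slide):
# instruction number -> (destination register, source registers)
PROGRAM: Dict[int, Tuple[str, Tuple[str, ...]]] = {
    1: ("r1", ("r2", "r3")),
    2: ("r4", ("r5", "r6")),
    3: ("r7", ("r1", "r8")),    # reads r1, produced by instruction 1
    4: ("r9", ("r10", "r11")),
    5: ("r12", ("r7", "r13")),  # reads r7, produced by instruction 3
}


def true_dependencies(program: Dict[int, Tuple[str, Tuple[str, ...]]]) -> Dict[int, List[int]]:
    """Map each instruction to the earlier instructions whose results it reads."""
    deps: Dict[int, List[int]] = {}
    last_writer: Dict[str, int] = {}  # register -> most recent instruction writing it
    for num in sorted(program):
        dest, sources = program[num]
        deps[num] = sorted({last_writer[s] for s in sources if s in last_writer})
        last_writer[dest] = num
    return deps


deps = true_dependencies(PROGRAM)
for num, producers in deps.items():
    print(num, "depends on", producers)
```

In this fragment instructions 2 and 4 have empty dependency lists, so a scheduler would be free to place them in any cycle, while 1, 3, and 5 form a chain.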
Division of responsibilities between the compiler and the hardware
If ILP is to be achieved, the compiler and the runtime hardware must between them perform the following functions:
• determine the dependencies between operations
• determine which operations are independent of any operation that has not yet completed
• schedule these independent operations to execute at some particular time, on some specific functional unit, and assign each a register into which its result may be deposited
Breakdown of tasks between compiler and runtime hardware
Superscalar processors - basic principle
Superscalar processors contain hardware that examines a sequential program to locate instructions that can be executed in parallel. This allows them to maintain compatibility between generations and to achieve speedups on programs that were compiled for sequential processors, but the hardware can examine only a limited window of instructions when selecting instructions to execute in parallel, which can reduce performance.
Superscalar processors can achieve speedups when running programs that were compiled for sequential (non-ILP) processors, without requiring recompilation.
Superscalar execution
Instead of ‘scalar’ execution, where in each cycle only one instruction can be resident in each pipeline stage, ‘superscalar’ execution allows two or more instructions to be at the same pipeline stage in the same cycle.
Superscalar execution allows multiple instructions that are adjacent in program order to be in the same stage of processing simultaneously.
Superscalar design requires significant replication of resources to support fetching, decoding, executing, and writing back multiple instructions in every cycle.
General superscalar organization
Superpipelining - an alternative approach
An alternative approach to achieving greater performance is referred to as ‘superpipelining’.
Superpipelining exploits the fact that many pipeline stages perform tasks that require less than half a clock cycle.
Superscalar vs Superpipeline
Limitations
The superscalar approach depends on the ability to execute multiple instructions in parallel.
ILP refers to the degree to which, on average, the instructions of a program can be executed in parallel.
A combination of compiler-based optimization and hardware techniques can be used to maximize ILP.
Fundamental limitations to parallelism with which the system must cope are:
• data dependencies:
  - true data dependencies
  - output dependencies
  - antidependencies
• procedural dependencies (control dependencies)
• resource conflicts (structural dependencies)
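The three kinds of data dependency can be told apart by comparing the registers two instructions read and write. The sketch below assumes a simplified instruction model (one destination register, a set of source registers); the `Instr` type and register names are illustrative assumptions, not part of the slides.

```python
from typing import List, Set, Tuple

# Simplified instruction model (an assumption): (destination register, source registers)
Instr = Tuple[str, Set[str]]


def classify(first: Instr, second: Instr) -> List[str]:
    """Return the data dependencies of `second` on an earlier instruction `first`."""
    dest1, src1 = first
    dest2, src2 = second
    kinds: List[str] = []
    if dest1 in src2:
        kinds.append("true (RAW)")    # second reads what first writes
    if dest1 == dest2:
        kinds.append("output (WAW)")  # both write the same register
    if dest2 in src1:
        kinds.append("anti (WAR)")    # second overwrites what first reads
    return kinds


print(classify(("r1", {"r2", "r3"}), ("r4", {"r1", "r5"})))  # true dependency
print(classify(("r1", {"r2"}), ("r1", {"r3"})))              # output dependency
print(classify(("r1", {"r2"}), ("r2", {"r3"})))              # antidependency
```

Only a true dependency reflects a real flow of data; output dependencies and antidependencies arise from register reuse, which is why hardware can often remove them by renaming registers.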
Effect of dependencies
Data dependencies
Design issues: ILP versus Machine Parallelism
ILP and Machine Parallelism (MP) are two related concepts in processor design, so it is important to distinguish clearly between them.
ILP exists when instructions in a sequence are independent and thus can be executed in parallel by overlapping.
ILP is a measure of how many instructions can be executed together on an infinitely wide superscalar machine.
ILP vs Machine Parallelism
MP is a measure of the ability of the processor to take advantage of ILP.
MP is determined by the number of instructions that can be fetched and executed at the same time (the number of parallel pipelines) and by the speed and sophistication of the mechanisms that the processor uses to find independent instructions.
Both ILP and MP are important factors in enhancing performance.
Example for ILP and MP
The code
for (i = 0; i < 100; i++) a[i] = a[i] + 1;
has a considerable amount of parallelism, because no iteration depends on any other. A machine built with 100 functional units and memory ports could, ideally, give a 100x speedup.
Example for ILP and MP (cont.)
In many cases the amount of ILP is determined simply by the ratio of instructions that carry dependencies (data, structural, and control) to other instructions.
Fewer branches and fewer true data dependencies will increase ILP.
More functional units will increase MP.
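The interplay between ILP and MP in the 100-iteration loop can be sketched with a very idealized model: every iteration is one independent single-cycle operation, and loop overhead and memory latency are ignored. The function names are hypothetical; the point is only that the achieved speedup is capped by whichever of ILP and MP is smaller.

```python
import math


def ideal_cycles(independent_ops: int, functional_units: int) -> int:
    """Cycles needed when every operation is independent and takes one cycle."""
    return math.ceil(independent_ops / functional_units)


def ideal_speedup(independent_ops: int, functional_units: int) -> float:
    """Speedup over a machine that executes one operation per cycle."""
    return independent_ops / ideal_cycles(independent_ops, functional_units)


for units in (1, 4, 100, 200):
    print(units, ideal_cycles(100, units), ideal_speedup(100, units))
```

With 200 units the speedup stays at 100, because the program offers only 100 independent operations; with 4 units it is limited to 4 by the machine.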
Instruction issue and instruction issue policy
Machine parallelism is not simply a matter of having multiple instances of each pipeline stage. The processor must also be able to identify ILP and to orchestrate the fetching, decoding, and execution of instructions in parallel.
The term instruction issue refers to the process of initiating instruction execution in the processor’s functional units.
The term instruction issue policy refers to the protocol used to issue instructions.
Instruction issue policies
Superscalar instruction issue policies can be grouped into the following three categories:
• in-order issue with in-order completion
• in-order issue with out-of-order completion
• out-of-order issue with out-of-order completion
Instruction issue policy - examples
We assume a superscalar pipeline capable of fetching and decoding two instructions at a time, with three separate functional units and two instances of the write-back pipeline stage.
The examples assume the following constraints on a six-instruction code fragment:
• I1 requires two cycles to execute
• I3 and I4 conflict for the same functional unit
• I5 depends on the value produced by I4
• I5 and I6 conflict for a functional unit
In Order Issue and in Order Completion
In Order Issue Out of Order Completion
Out of Order Issue and Out of Order Completion
Another Example of out-of-order execution
Conceptual Description of Superscalar Processing
Superscalar processor - How execution progresses
Superscalar Internal Structure
Another Superscalar Internal Structure
Instruction Flow, Register and Memory Dataflow
VLIW processors - basic principles
VLIW processor architectures require that programs be recompiled for the new architecture, but they achieve very good performance on programs written in sequential languages such as C or Fortran once these programs are recompiled for a VLIW processor.
VLIW is one particular style of processor design that tries to achieve high levels of ILP by executing long instruction words composed of multiple operations.
VLIW processors take a different approach to ILP than superscalar processors: they rely on the compiler to determine which instructions may be executed in parallel and to provide that information to the hardware.
VLIW instruction & VLIW processor
In VLIW processors, each instruction specifies several independent operations that are executed in parallel by the hardware.
Scheduling a sequence of operations for execution on a VLIW processor with 3 execution units - Example
Consider the following sequence:
VLIW scheduling will be:
VLIW - different flavours of parallelism
The number of operations in a VLIW instruction is equal to the number of execution units in the processor.
Each operation specifies the instruction that will be executed on the corresponding execution unit in the cycle in which the VLIW instruction is issued.
There is no need for the hardware to examine the instruction stream to determine which instructions may be executed in parallel.
The compiler is responsible for ensuring that all of the operations in an instruction can be executed simultaneously.
Pros and cons of VLIW - advantages
The main advantages of VLIW architectures are:
• simpler instruction issue logic, which often allows VLIW processors to fit more execution units onto a given amount of chip space than superscalar processors
• the compiler generally has a larger-scale view of the program than the instruction issue logic in a superscalar processor, and is therefore generally better than the issue logic at finding instructions to execute in parallel
Pros and cons of VLIW - disadvantages
The most significant disadvantages of VLIW processors are:
VLIW programs only work correctly when executed on a processor with the same number of execution units and the same instruction latencies as the processor they were compiled for. Code written for a machine with 4 concurrent integer units cannot exploit additional execution units in a later model. Likewise, code optimized for a newer VLIW with 8 concurrent integer units would not function correctly on an older machine with fewer units.
Pros and cons of VLIW - disadvantages (cont.)
In addition, if the compiler cannot find enough parallel operations to fill all of the slots in an instruction, it must place explicit NOP operations into the corresponding operation slots. This causes VLIW programs to take more memory than equivalent programs for superscalar processors.
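The cost of NOP padding can be made concrete with a small sketch of how a compiler might pack operations into fixed-width VLIW instructions. This is a simple greedy scheduler written for illustration, not the algorithm of any particular compiler; it assumes unit latency, considers only true dependencies, and uses hypothetical operation names (`op1` to `op5`).

```python
from typing import Dict, List, Set


def pack_bundles(deps: Dict[str, Set[str]], width: int = 3) -> List[List[str]]:
    """Greedily pack operations into VLIW instructions of `width` slots.

    An operation is placed once all of its producers sit in earlier
    instructions (unit latency assumed); unused slots are filled with NOP.
    """
    done: Set[str] = set()
    remaining = list(deps)  # dict order is taken as program order
    bundles: List[List[str]] = []
    while remaining:
        ready = [op for op in remaining if deps[op] <= done][:width]
        if not ready:
            raise ValueError("cyclic dependencies")
        bundles.append(ready + ["NOP"] * (width - len(ready)))
        done.update(ready)
        remaining = [op for op in remaining if op not in ready]
    return bundles


# Hypothetical operations: op -> set of operations it depends on
deps = {
    "op1": set(),
    "op2": set(),
    "op3": {"op1"},
    "op4": {"op3"},
    "op5": {"op2", "op4"},
}
bundles = pack_bundles(deps)
nop_slots = sum(b.count("NOP") for b in bundles)
for b in bundles:
    print(b)
print("NOP slots:", nop_slots, "of", 3 * len(bundles))
```

Because most of these operations form a dependency chain, more than half of the slots end up holding NOPs, which is the code-size penalty the slide describes.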
Defoe Processor – VLIW Representative
Itanium Bundle
Itanium Register Set
Parallelism of Instruction Execution and Instruction Issue
The ways to exploit instruction parallelism: Scalar & Superscalar
The ways to exploit instruction parallelism: Super-pipeline & VLIW
Typical applications of VLIW and superscalar processors
VLIW processors are often used in digital signal processing (DSP) applications, where high performance and low cost are critical.
Superscalar processors are mainly used in general-purpose computers such as workstations and PCs, because customers demand software compatibility between generations of a processor.
Improving performance
In general, performance can be improved by increasing IPC and/or clock frequency, or by decreasing the instruction count.
RISC architectures seek to increase both frequency and IPC via pipelining and the use of cache memories, at the expense of increased instruction count.
CISC microprocessors employ a RISC-like internal representation to achieve higher frequency while maintaining a lower instruction count.
The VLIW concept, revived with EPIC (Explicitly Parallel Instruction Computing), uses the compiler to schedule instructions statically. Exploiting parallelism statically can enable simpler control logic and help EPIC achieve higher IPC and higher frequency.
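The trade-off between instruction count, IPC, and frequency follows from the standard relation: execution time = instruction count / (IPC x clock frequency). The sketch below applies it to two made-up design points; all numbers are hypothetical and chosen only to show how a higher instruction count can be offset by a higher IPC.

```python
def execution_time(instruction_count: float, ipc: float, frequency_hz: float) -> float:
    """Execution time in seconds: instructions / (instructions per cycle * cycles per second)."""
    return instruction_count / (ipc * frequency_hz)


# Hypothetical design points (illustrative numbers only)
fewer_instructions = execution_time(instruction_count=1.0e9, ipc=1.0, frequency_hz=1.0e9)
more_instructions = execution_time(instruction_count=1.2e9, ipc=1.5, frequency_hz=1.0e9)

print(fewer_instructions, more_instructions)
```

In this example the second design executes 20% more instructions but still finishes sooner, because its IPC is 50% higher at the same frequency.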
DSP - classes of embedded processors
• designed for arithmetic-intensive signal processing applications
• instruction set tuned for fast execution of algorithms such as digital filtering and FFT
• special hardware components: hardware multipliers and dedicated address generation units
• instructions can be executed in parallel (VLIW architecture)
DSP - classes of embedded processors (cont.)
• unlike RISCs, DSPs use special-purpose registers (e.g. a dedicated accumulator register)
• can operate in a special arithmetic mode, saturation mode
• due to irregularities in the processor architecture, compiler construction is more difficult than for other processor classes
• the market leader in DSPs is Texas Instruments
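Saturation mode can be illustrated by comparing it with ordinary two's-complement wraparound. The sketch below assumes 16-bit signed operands; the function names are hypothetical and the code models the arithmetic only, not any specific DSP instruction.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def wrap_add16(a: int, b: int) -> int:
    """16-bit two's-complement addition: overflow wraps around."""
    s = (a + b) & 0xFFFF
    return s - 0x10000 if s >= 0x8000 else s


def sat_add16(a: int, b: int) -> int:
    """16-bit saturating addition: overflow clamps to the nearest limit."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


print(wrap_add16(30000, 10000), sat_add16(30000, 10000))
print(wrap_add16(-30000, -10000), sat_add16(-30000, -10000))
```

In signal processing, clamping is usually preferable because a wrapped sum flips sign and produces a large, audible or visible error, whereas a saturated sum is merely clipped.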
Signal-processing architectures
The following signal-processing architectures can be identified on the market today:
• ASIC - Application Specific Integrated Circuit
• ASSP - Application Specific Standard Product
• configurable processors (Configurable Processor)
• DSP - Digital Signal Processor
• FPGA - Field Programmable Gate Array
• MCU - Microcontroller
• RISC/GPP - Reduced Instruction Set Computer / General Purpose Processor
Criteria used to assess the capabilities of processing elements
• Time to market (the time from when a product is conceived to when it is produced) - very important
• Performance - very important
• Price - very important
• Development ease (the design tools that are available) - very important
• Power (consumption) - of medium importance
• Feature flexibility - not of great importance
Criteria for assessing the suitability of a given architecture for real-time signal processing
Types of programmable VLSI circuits
Types of programmable VLSI circuits (cont.)
• ASIC - custom-designed circuits that perform a single task. Changes are very difficult to make to these circuits late in the design process. The control unit is usually hard-wired.
• ASPP - a programmable architecture (with respect to the data path) that can perform a number of different tasks (activities). The control unit is microprogram-based. Several programs are stored in the microprogram memory, each corresponding to one task.
• ASIP - also has a programmable data path, which in terms of flexibility lies somewhere between an ASPP and a DSP. Here the data path has a somewhat more general structure because, as in standard processors, it contains a register file (RF) and an ALU. It differs from a DSP in that its instruction set is quite restricted and its number of internal buses is not as large. The use of ASIPs is limited to specific applications that can be executed quickly.
Types of programmable VLSI circuits (cont.)
• DSP - digital signal processors are, like microcontroller units (such as the popular Intel 80C51 or Motorola MC68HC11), self-contained computing machines with built-in I/O channels and memory, but with considerably superior capabilities for mathematical manipulation and an architecture better suited to processing the data types (above all arrays) typical of digital signal processing. Today DSPs have become key VLSI components built into communication, medical, military, industrial, and various other consumer products. Researchers and designers often, with justification, regard them as a class of microprocessors optimized for digital signal processing.
• MPU - general-purpose processor units that can, at the cost of reduced execution speed, perform tasks of any type without restriction.
Multiply-accumulate - a DSP-specific feature
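Multiply-accumulate (MAC) is the inner operation of digital filters: each step multiplies a sample by a coefficient and adds the product to an accumulator. The sketch below computes one FIR filter output this way; the function name and the sample values are illustrative assumptions. On a DSP, each loop iteration would typically map to a single MAC instruction.

```python
from typing import Sequence


def fir_output(samples: Sequence[int], coeffs: Sequence[int]) -> int:
    """One FIR filter output: the sum of sample * coefficient products."""
    acc = 0  # plays the role of the dedicated accumulator register
    for x, h in zip(samples, coeffs):
        acc += x * h  # one multiply-accumulate step
    return acc


y = fir_output([1, 2, 3], [4, 5, 6])
print(y)
```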
Multiple execution units - a DSP-specific feature
Difference in memory architectures between standard MPUs and DSP processors
Address generator and memory access in DSP processors
Typical I/O organization in DSP processors
Typical DSP processor application - TMS320C240
Conventional versus enhanced DSP
Organization of the execution units and memory (program & data) in the TMS320C62xx
SIMD DSP processors
Operating principle of a 64-bit adder split into four 16-bit subwords.
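The idea of a 64-bit adder split into four 16-bit subwords can be modelled in software: the four lanes are added independently and any carry out of a lane is discarded rather than propagated into its neighbour. This is a minimal sketch assuming wraparound (modular) lanes; the helper names `pack`, `unpack`, and `simd_add16x4` are hypothetical.

```python
from typing import List

LANES = 4
LANE_BITS = 16
LANE_MASK = 0xFFFF


def pack(lanes: List[int]) -> int:
    """Pack four 16-bit values into one 64-bit word, lane 0 in the low bits."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word: int) -> List[int]:
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def simd_add16x4(a: int, b: int) -> int:
    """Add two packed words lane by lane; carries never cross lane boundaries."""
    result = 0
    for i in range(LANES):
        shift = i * LANE_BITS
        lane_sum = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)
        result |= (lane_sum & LANE_MASK) << shift  # drop the carry out of the lane
    return result


total = unpack(simd_add16x4(pack([1, 2, 0xFFFF, 4]), pack([10, 20, 1, 40])))
print(total)
```

In the example, lane 2 overflows (0xFFFF + 1) and wraps to 0 without disturbing lane 3, which is exactly what cutting the carry chain of the wide adder achieves in hardware.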
Performance characteristics of some DSP processors
BDTI is a performance measure.
Parallel processing of independent instructions
TMS320C10
TMS320C206
TMS320VC33
Multimedia processors - classes of embedded processors
• relatively new on the market; architecturally related to RISCs and DSPs
• intended for multimedia applications: audio, image, or video signal processing
• the architecture follows the VLIW paradigm
• different functional units can operate in parallel
• use general-purpose registers, like RISCs
Multimedia processors - classes of embedded processors (cont.)
• the architecture is more regular than in DSPs
• the compiler is responsible for exploiting ILP in a program
• examples of multimedia processors are the C6201 (up to 8 parallel instructions per cycle) and the Trimedia TM1000 (up to 5 parallel instructions per cycle)
Multimedia processor - TM1000
Multimedia processor - STn8810
ASIPs - classes of embedded processors
• Microcontrollers, RISCs, DSPs, and multimedia processors are domain-specific: they are tuned for a certain application domain, but not for the given application itself
• ASIPs are a compromise between domain-specific processors and non-programmable ASICs
• ASIPs are programmable, but they serve only a very narrow range of applications
ASIPs - classes of embedded processors (cont. 1)
• ASIPs can be parameterized
• the basic architecture of an ASIP is fixed, but it can be customized for a given application by setting a number of different parameters
• word lengths may be adjusted to the required precision, register files may be sized, and the available special hardware components tuned
• since these parameters are mostly orthogonal to each other, a large number of different configurations of a single ASIP may be available
ASIPs - classes of embedded processors (cont. 2)
• ASIPs are very efficient, but a large number of different compilers are normally required
• retargetable compilers are capable of generating code for any particular ASIP configuration
Regular versus retargetable compilation
ASIP in the context of processor HW implementation class
The energy - flexibility gap
Definitions of ASIP-related terms
From the application point of view, the technical literature uses the acronym ASIP to describe two different kinds of digital ICs:
• Application-Specific Integrated Processor: any kind of digital IC used for data processing; the term does not imply any kind of instruction-set-oriented or programmable data processing
• Application-Specific Instruction Set Processor (Application-Specific Instruction Processor): a programmable application-specific processor that uses the concept of an instruction set architecture for data processing
Evolution of design criteria in CMOS integrated circuits
Power dissipation in time
“CMOS circuits dissipate little power by nature. So believed circuit designers” (Kuroda-Sakurai, 95)
[Figure: power (W) versus year, 1980 to 1995; annotated trend of x4 every 3 years]
“By the year 2000 power dissipation of high-end ICs will exceed the practical limits of ceramic packages, even if the supply voltage can be feasibly reduced.”
Gloom and Doom predictions
Power density will increase
VDD, Power and Current Trend
[Figure: supply voltage (V), power per chip (W), and VDD current (A) versus year, 1998 to 2014]
Source: International Technology Roadmap for Semiconductors, 1999 update, sponsored by the Semiconductor Industry Association in cooperation with the European Electronic Component Association (EECA), the Electronic Industries Association of Japan (EIAJ), the Korea Semiconductor Industry Association (KSIA), and the Taiwan Semiconductor Industry Association (TSIA). (Taken from Sakurai’s ISSCC 2001 presentation.)
Power Delivery Problem (not just California) Your car starter ! Source: Shekhar Borkar, Intel
Power Consumption New Dimension in Design
Sources of Power Consumption
• The three major sources of power consumption in digital CMOS circuits, plus a minor fourth term, are summarized by:
$P = P_1 + P_2 + P_3 + P_4$
where:
$P_1$ - capacitive switching power (dynamic, dominant)
$P_2$ - short-circuit power (dynamic)
$P_3$ - leakage current power (static)
$P_4$ - static power dissipation (minor)
Research Efforts in Low-Power Design
Reducing the Power Dissipation
• The power dissipation can be minimized by reducing:
  • supply voltage
  • load capacitance
  • switching activity
- Reducing the supply voltage brings a quadratic improvement.
- Reducing the load capacitance improves both power dissipation and circuit speed.
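The quadratic effect of the supply voltage follows from the standard expression for capacitive switching power in CMOS, P = a * C * V^2 * f, where a is the switching activity, C the load capacitance, V the supply voltage, and f the clock frequency. The slides do not state this formula explicitly, so the sketch below, including its symbol names and numeric values, is an illustrative assumption.

```python
def switching_power(activity: float, c_load_farads: float, vdd_volts: float, freq_hz: float) -> float:
    """Capacitive switching power in watts: activity * C * Vdd^2 * f."""
    return activity * c_load_farads * vdd_volts ** 2 * freq_hz


# Hypothetical operating points: same circuit, supply voltage halved
p_full = switching_power(activity=0.5, c_load_farads=1e-9, vdd_volts=2.5, freq_hz=100e6)
p_half = switching_power(activity=0.5, c_load_farads=1e-9, vdd_volts=1.25, freq_hz=100e6)

print(p_full, p_half, p_full / p_half)
```

Halving the supply voltage cuts the switching power to a quarter, whereas halving the capacitance or the switching activity would only halve it.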
Amount of Reducing the Power Dissipation
Gate Delay and Power Dissipation in Terms of Supply Voltage
Needs for Low-Power
• Efficient methodologies and technologies are needed for the design of high-throughput, low-power digital systems.
• The main interest of many researchers is now oriented towards lowering the energy dissipation of these systems while still maintaining high throughput in real-time processing.
Batteries - classification
Depending on how they are used, batteries are divided into:
• primary - intended to be charged once, used until they are discharged, and then thrown away
• secondary - can be recharged and discharged multiple times
Properties
1. Energy density - a measure of how much energy a battery can store in a given volume or mass. It can be expressed in two ways:
   • volumetric energy density, usually measured in watt-hours per liter (Wh/L)
   • gravimetric energy density, measured in watt-hours per kilogram (Wh/kg)
2. Memory effect - some secondary batteries exhibit a property known as the memory effect. If these batteries are used until fully discharged, they can be recharged to their originally declared capacity. But if they are only partially discharged before recharging, their energy capacity is reduced. After a large number of charge and discharge cycles, these batteries become completely useless.
Properties (cont.)
3. Cycle life - the number of charge and discharge cycles a battery can withstand before it becomes unusable.
4. Working voltage - the voltage available from a single cell, determined by the battery's chemical composition.
5. Self discharge - the rate at which a battery discharges on its own when not in use.
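The two forms of energy density differ only in what the stored energy is divided by. The sketch below computes both for a made-up cell; the function name and the cell's energy, volume, and mass are hypothetical values chosen for illustration, not data for any real battery.

```python
from typing import Tuple


def energy_densities(energy_wh: float, volume_l: float, mass_kg: float) -> Tuple[float, float]:
    """Return (volumetric density in Wh/L, gravimetric density in Wh/kg)."""
    return energy_wh / volume_l, energy_wh / mass_kg


# Hypothetical cell: 10 Wh stored in 0.02 L and 0.05 kg
volumetric, gravimetric = energy_densities(energy_wh=10.0, volume_l=0.02, mass_kg=0.05)
print(round(volumetric), "Wh/L,", round(gravimetric), "Wh/kg")
```

Which figure matters more depends on the product: volumetric density for thin devices where space is the constraint, gravimetric density where weight is.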
Battery technology - types
Ni-Cd - the most commonly used type. These batteries are characterized by high-energy current and are built into devices that can drive small motors. The memory effect, high self-discharge rate, and low energy density are drawbacks that make them unsuitable for cellular phones and notebook computers.
Alkaline - have a somewhat better energy density than Ni-Cd and are mainly used as single-use batteries. A rechargeable type also exists, but its energy density drops quickly with repeated charging.
Ni-MH - Nickel Metal Hydride batteries are commonly used in cellular phones and notebook computers because their price is acceptable and their energy density is relatively high. Unfortunately, their self-discharge rate is high, which makes them unsuitable for certain applications. This battery type long served as a bridge between Ni-Cd and lithium-ion batteries, but it lost its lead because of falling lithium battery prices.
Battery technology - types (cont.)
Lithium-ion - characterized by high energy density. They are standard in cellular phones and notebook computers. They are very thin (down to 0.5 mm). Their price has dropped drastically in recent years.
Lithium polymer - characterized by high energy density and can be formed into various shapes, so they fit the form of the product very well.
Photovoltaic cells - convert ambient light into electrical energy and can be used for low-power devices such as calculators.
Fuel cells - convert hydrocarbons into electrical energy and have a very high energy density. Recharging these cells is similar to refilling a lighter. Their energy density is 3 to 5 times better than that of lithium-ion batteries, but they are impractical for portable electronic devices.
Critical metrics for battery technology
Product implementation
The best battery technology for a portable electronic device is determined during the product analysis phase. The designer must strike a balance between high energy capacity, small battery dimensions (small form factor), and price in order to create a successful product concept. To make the solution realistic, the manufacturer must consider the battery's form (shape), recharging/replacement requirements, mechanical mounting, connectors, and power management electronics.
Batteries come in many forms. Standard forms are AA, AAA, C, and D cells and lithium button cell batteries, which can be bought in practically any store. These battery types are preferable if we want them to be easily replaceable by a wide range of users. On the other hand, lithium-ion and Ni-MH batteries are available in various forms (rectangular, non-cylindrical, etc.), as well as in some custom-made forms.


