85fff1ec3e7d3877b70e377243b4f617.ppt
- Number of slides: 157
Chapter 1: Microcomputers and Microprocessors
Microprocessor Evolution and Performance

Contents
- Introduction to microcomputer system
- Microprocessor evolution
  - The Intel processor family
- Microprocessor performance

Introduction to Microcomputer
- A microcomputer can be interpreted as a machine with:
  - I/O devices for input/output
  - A microprocessor for processing
  - Memory units for storage
  - Buses for connecting the above components
- In 1970, a microcomputer was normally interpreted as a computer considerably smaller than a minicomputer, possibly using ROM for program storage

Basic hardware units
- Input
  - e.g. keyboard, mouse
- Microprocessor
  - e.g. 8085, 8086, MC68000 microprocessors
- Memory
  - e.g. RAM, hard disk
- Output
  - e.g. monitor, printer

Buses
- Buses: external connections to input/output units
- Major buses:
  - Address bus: address of memory locations containing instructions or data
  - Data bus: contents of memory locations
  - Control bus: synchronization and handshaking between components

General Architecture (figure): input unit; microprocessing unit; memory unit (primary memory, secondary memory); output unit
Processor History: Vacuum Tubes to ICs

First Generation Computers
- Vacuum tube technology
  - Large, air-conditioned room
  - Tube lifetime: 3,000 hours
- Useless machine?
  - 1951: first UNIVAC I (UNIVersal Automatic Computer) delivered
  - 1952: prediction of the presidential election on CBS
  - 1952: IBM Model 701 Data Processing System

Second Generation Computers
- The transistor is born (solid-state era)
  - 1948: invention of bipolar transistors
    - 1956: Nobel Prize in Physics: Drs. William Shockley, John Bardeen and Walter H. Brattain (Bell Labs)
  - 1954: Bell Labs: all-transistorized computer (TRADIC)
    - 800 transistors
    - Much less heat
    - More reliable and less costly

Second Generation Computers
- Mainframe computers
  - 1958: IBM's first transistorized computers, 7070/7090
  - 1959: 1401 (business-oriented model)
  - Built on circuit boards mounted into rack panels, or frames
  - Main frame (mainframe): the CPU portion of the computer
  - Popular with business and industry
Third Generation Computers
- Invention of the IC: 1959
  - Dr. Robert Noyce (Fairchild) and Jack Kilby (TI)
  - Kilby: fabricating resistors, capacitors and transistors on a germanium wafer, and connecting these parts with fine gold wires
  - Noyce: isolating individual components with reverse-biased diodes, and depositing an adherent metal film over the circuit, thus connecting the components
  - First IC: 2-transistor multivibrator
  - By the mid-1960s: memory chips with 1,000 components are common

Third Generation Computers
- 1964: IBM 360 Series (32-bit)
  - The first to use IC technology
    - A family of 6 compatible computers
  - 40 different I/O and auxiliary storage devices
  - Memory capacity: 16 K words to over 1 MB
  - 16 32-bit registers
  - 24-bit address bus
  - 128-bit data bus

Third Generation Computers
- 1964: IBM 360 Series (32-bit)
  - 375,000 computations per second
    - (far below the 150 MIPS of a Pentium 100)
  - $5 billion development cost
- IBM became the leading mainframe company

Minicomputer
- 1960s: Space Race between the US and the USSR
  - IC industry boom
  - A tremendous demand by scientists and engineers for an inexpensive computer that they could operate by themselves
  - 1965: DEC PDP-8 (by Edson de Castro's group)
    - Low-cost ($25,000) minicomputer
    - 12-bit
    - 16-bit PDP-11
  - Supermini …
Microprocessors: CPU on a Chip
- 1968: Intel (Integrated Electronics)
  - Founded by Robert Noyce and Gordon Moore (Fairchild)
  - Original goal: the semiconductor memory market
  - 1969: customized ICs for Busicom calculators
  - Ted Hoff and Stan Mazor: proposed a 4-bit CPU on a single chip, plus ROM and RAM chips

Microprocessors: CPU on a Chip
- 1971: 4000 Family
  - By Federico Faggin
  - 4001: 2 K ROM with 4-bit I/O port
  - 4002: 320-bit RAM, 4-bit output port
  - 4003: 10-bit serial-in parallel-out shift register
  - 4004: 4-bit processor
- Processor-on-a-chip: the microprocessor era

Microprocessors: CPU on a Chip
- 1972: 8008, 8-bit
- 1974: 8080, an improved version

Microprocessors: CPU on a Chip
- 8-bit CPUs
- 16-bit address (64 K)
  - MC6800: Motorola
  - 6502: MOS Technology (spin-off from Motorola)
    - Apple II, Apple DOS
  - Z-80: Zilog (spin-off from Intel)
    - Z-80 cards on Apple II, CP/M

Microprocessors: CPU on a Chip
- 16-bit CPUs (late 1970s)
  - 8086, 80186, 80286: Intel
    - PC, PC-DOS, MS-DOS, SCO Unix
  - MC68000: Motorola
    - 16-bit instructions
    - Hardware multiply and divide
    - 20-bit address buses (1 MB)
    - Workstations: Sun 3

Microprocessors: CPU on a Chip
- 32-bit CPUs
  - 80386, 80486: Intel
  - MC68020, 68030: Motorola
- 64-bit CPUs
  - Pentium, Pentium Pro (64-bit external data bus, 32-bit internal registers; not recognized as 64-bit CPUs in terms of internal register word length)
Microcomputers: Computers Based on Microprocessors
- 1975: MITS Altair 8800 (kit)
  - $399, i8080, programmed by depositing 1s/0s via front-panel switches
- Other computers boom
  - 8080: MITS, …
  - 6800: SWTPC 6800, …
  - Z-80: TRS-80, …
  - 6502: Apple I, 8 K, programmed with BASIC
    - Steve Jobs & Steve Wozniak, millionaires from PC COM's …

Personal Computers: the Open Architecture Era
- 1981: IBM PC
  - A system board (motherboard)
  - Intel 8088 processor
  - 16 K memory
  - 5 expansion slots
    - Third-party vendors supply various I/O adapter cards
    - Open architecture
    - Computer with interchangeable components

Microcontrollers: Microcomputers on a Chip
- Microcontroller: a computer on a chip
  - Microprocessor, plus
  - On-chip memory, plus
  - Input/output ports
- 1995: microcontrollers outsold microprocessors 10:1
  - Embedded in various equipment:
    - Thermostats, machine tools, communication, automotive, …
- Evolution: getting greater I/O capabilities
  - Intel: MCS-51, MCS-96, …
High-Performance Processors
- Supercomputers
  - Aircraft design, global climate modeling, oil-bearing formations, molecular design of new drugs, financial behavior
  - CDC 6600, 7600: Seymour Cray
  - Cray-1: 1976, the first true supercomputer
    - ECL, 128 kW power consumption
    - 130 MFLOPS (Pentium 100: 150 MFLOPS)
    - $5.1 million

High-Performance Processors
- Parallel processors
  - Tens of gigaflops
  - Multiple processors wired by a common bus
  - Each is given a portion of the problem to solve
  - Hypercube: early 1980s
    - Cosmic Cube, iPSC (with i860 RISC chips)
  - 2D rectangular mesh architecture: multiple processors at each node
    - Intel: teraflops computer with 4500 nodes, each powered by 2 Pentium Pro 200 processors

RISC vs. CISC
- RISC: Reduced Instruction Set Computer (1980s)
  - A small number of fixed-length instructions
  - Simple addressing modes
  - A large number of registers
  - Instructions executed in one clock cycle
- Intel i860 ("Cray on a Chip")
  - 82 instructions, each 32 bits long
  - Four addressing modes
  - 32 general-purpose registers

RISC vs. CISC
- CISC: Complex Instruction Set Computer
  - A large number of variable-length instructions
  - Multiple addressing modes
  - A small number of registers
  - Multiple clock cycles to execute an instruction
- Intel 8086
  - Over 3000 instruction forms, 1-6 bytes
  - 9 addressing modes
  - 8 general-purpose registers
  - Execution takes from 2 to 80+ cycles

RISC vs. CISC
- RISC
  - Control unit is much simpler (simpler instructions, execution in 1 CLK)
  - Faster execution with less total on-chip logic
  - Control-unit chip area: 10% (vs. 50% for CISC)
  - More area for register file, data and instruction caches, FPU, and co-processor
  - PowerPC: 32-bit, by IBM, Apple, Motorola
  - SPARC: for Sun Microsystems workstations
Application-Specific Processors
- DSP chips
  - Mostly for analog signal processing
  - ADC-DSP-DAC architecture
  - Avoid processing analog signals with discrete circuits involving capacitors and inductors
  - DSP: performs complex mathematical functions
    - Digital filters, spectrum analysis

Application-Specific Processors
- DSP chip architecture
  - Separate data/program areas: Harvard architecture
  - Hardware multipliers and adders, optimized to execute in a single cycle
  - Arithmetic pipelining: several instructions operated on at once
  - Hardware loop control
  - Multiple I/O ports for communication with other processors

Summary of Processor History
- 1940s: vacuum tubes, large and power-hungry
- 1950s: transistors (1948-)
- 1959: first IC (second industrial revolution)
- 1960s: ICs became popular for building CPUs
- 1971: Intel 4004 microprocessor (2300 transistors); start of the microprocessor age
- Late 1970s: 8080/85

Summary of Processor History
- 1980: RISC (reduced instruction set computer)
- CISC (complex instruction set computer) vs. RISC
- CISC family: Intel 80x86, Pentium; Motorola 68000 series
- All others are RISC series
Evolution of Intel Processors: 4004 ('71) to Pentium Pro ('95-)

INTEL
- Integrated Electronics
  - 1968: founded by Robert Noyce and Gordon Moore
  - IA: Intel Architecture (e.g., IA-16, IA-32, IA-64), which since the 8008 ('72) has become the de facto standard
- Evolution:
  - Internal register sizes
  - External bus widths
  - Real, Protected, and Virtual 8086 modes

4-bit Processors
- 4004
  - First microprocessor
  - Became available in 1971
  - 4-bit microprocessor:
    - 4-bit registers and 4-bit data bus
    - Number of transistors: 2300
    - Minimum feature size: 10 microns
    - Address bus: 10 bits / 1 K
    - 0.06 MIPS (at 0.108 MHz)
    - No internal cache

8-bit Processors
- 8008, 8080, 8085
  - The 8080 became available in 1974
  - 8-bit microprocessors
8086: IA standard
- Became available in 1978
  - 16-bit data bus
  - 20-bit address bus (was 16-bit for the 8080)
  - Memory organization: 16 segments of 64 KB (1 MB limit)
- CPU reorganized into BIU (bus interface unit) and EU (execution unit)
  - Allows fetch and execution to proceed simultaneously
- Internal registers expanded to 16 bits
  - Low and high bytes can be accessed separately

8086
- Hardware multiply and divide instructions
- External math co-processor
- Instruction set upward compatible with the 8080/8085 at the assembly-source level
- 8086: defined the 80x86 architecture

8086
- Not quite successful
  - 16-bit data bus: requires two separate 8-bit memory banks
  - Memory chips were expensive

8088: PC standard
- Became available in 1979, almost identical to the 8086
- 8-bit external data bus: for hardware compatibility with 8080-era support chips
- 16-bit internal registers and internal data bus (same as 8086)
- 20-bit address bus (was 16-bit for the 8080)
  - BIU redesigned
- Memory organization: 16 segments of 64 KB (1 MB limit)
  - Two memory accesses for 16-bit data (less efficient)
  - But lower cost
- 8088: used by the IBM PC (1981), 16 K-64 K, 4.77 MHz

80186, 80188: High Integration CPU
- PC system:
  - 8088 CPU + various supporting chips
    - Clock generator
    - 8251: serial I/O (RS-232)
    - 8253: timer/counter
    - 8255: PPI (programmable peripheral interface)
    - 8257: DMA controller
    - 8259: interrupt controller
- 80186/80188: 8086/8088 + supporting functions
  - Compatible instruction set (+ 9 new instructions)
80286
- Became available in 1982
- Used in the IBM AT computer (1984)
- 16-bit data bus
- Clock speed 25% faster than the 8088, throughput 5 times greater than the 8088
- 24-bit address bus (16 MB) (vs. 20-bit / 1 MB for the 8086)

80286: Real vs. Protected Modes
- Larger address space: 24-bit address bus
  - Real Mode vs. Protected Mode
- Real Mode:
  - Power-on default mode
  - Functions like an 8086: uses the 20 least significant address lines (1 MB)
  - Software compatible with the 8086
  - 16 new instructions (for Protected Mode management)
  - Faster than the 8086: redesigned processor, plus higher clock rate (6-8 MHz)

80286: Real vs. Protected Modes
- Protected Mode:
  - Multi-program environment
  - Each program has a predetermined amount of memory
  - Addressed via segment selector (physical addresses invisible): 16 MB addressable
  - Multiple programs loaded at once (within their respective segments), protected from reads/writes by each other

80286: Real vs. Protected Modes
- Protected Mode:
  - Cannot be switched back to Real Mode, which prevents illegal access by switching back and forth between modes
- A faster 8086 only?
  - MS-DOS requires that all programs run in Real Mode
Clock Speed
- Electrical signals cannot change instantaneously (a transition period is required)
- The system clock provides the timing signal for synchronization
- Clock speed cannot be used to compare the performance of microprocessors with different instruction sets or internal designs
  - e.g., a 66 MHz Pentium is twice as fast as a 66 MHz 80486
80386 DX (a.k.a. 80386)
- Available in 1985, a major redesign of the 8086/80286
  - Compatibility commitment through 2000
- 32-bit data and address buses (4 GB memory)
  - Real Address Mode: 1 MB visible, same as 286 Real Mode
  - Protected Virtual Address Mode:
    - On-board MMU
    - Segmented tasks of 1 byte to 4 GB
      - Segment base, limit, and attributes defined by a descriptor register
    - Page swapping: 4 K pages, up to 64 TB virtual memory space
    - Windows, OS/2, Unix/Linux

80386 DX (a.k.a. 80386)
- Virtual 8086 mode (a special Protected Mode feature): permits multiple 8086 virtual machines and multitasking (each similar to Real Mode)
  - Windows (multiple MS-DOS sessions)
- Clock rate:
  - Max. 40 MHz, 2 clock pulses per read/write bus cycle
  - External memory cache to avoid wait states
    - Fast SRAM
    - 93% hit rate with a 64 K cache
- Compatible instructions (14 new)

80386 SX
- 80386 SX (for the transition to 32-bit):
  - 16-bit data bus / 32-bit registers
  - 24-bit address bus
80486 DX
- 1989: a polished 386, 6 new OS-level instructions
- Virtually identical to the 386 in terms of compatibility
- RISC design concepts
  - Fewer clock cycles per operation; a single clock cycle for the most frequently used instructions
  - Max. 50 MHz
  - 5-stage execution pipeline
    - Portions of 5 instructions execute at once

80486 DX
- Highly integrated:
  - On-board 8 K memory cache
  - FPP (equivalent to the external 80387 co-processor)
- Twice as fast as the 386 at any given clock rate
  - 20 MHz 486 ~= 40 MHz 386

80486 SX
- 80486 SX
  - NOT a 16-bit version for transition purposes
  - No coprocessor
  - No internal cache
  - For low-end applications
  - Max. 33 MHz only

80486 DX2/DX4: Overdrive Chips
- Processor speed increased too fast
  - Redesigning the microcomputer for compatibility becomes harder
  - Solution: separate the internal speed from the external speed, so each can be improved independently
- 80486 DX2/DX4: internal clock is twice/three times (NOT four times) the external clock, so the chip runs faster internally

80486 DX2/DX4: Overdrive Chips
- System board design is independent of processor upgrades (less expensive components are allowed)
- Processor operates at its maximum data rate internally
  - Only the slow accesses to external data operate at the system board rate
  - The internal cache offsets the speed gap
- 486 DX2 66: 66 MHz internal, 33 MHz external
- 486 DX4 100: 100 MHz internal, 33 MHz external (3x)
- Overdrive sockets: for upgrading a 486 DX/SX to a 486 DX2/DX4 (with overdrive socket pin-outs)
Pentium: Superscalar Processor
- Available in 1993
- 32-bit architecture
- Superscalar architecture
  - Scaling: scaling down the etchable feature size to increase the complexity of an IC (e.g., DRAM)
    - From 10 microns (4004) to 0.13 microns (2001)
  - Superscalar: goes beyond simply scaling down
  - Two instruction pipelines, each with its own ALU, address generation circuitry, and data cache interface
  - Executes two different instructions simultaneously

Pentium: Superscalar Processor
- On-board cache
  - Separate 8 K data and code caches to avoid access conflicts
- FPP
- Instruction pipeline: 8 stages
- Optimized floating-point functions
  - 5x-10x the FLOPS of the 486
  - 2x the performance of the 486 at any clock rate

Pentium: Superscalar Processor
- Compatibility with the 386/486:
  - Internal 32-bit registers and address bus
  - Data bus expanded to 64 bits for a higher data transfer rate
    - Compare the 8088 to 386 SX transition

Pentium: Superscalar Processor
- Non-clone competition from AMD, Cyrix
- Development of brand identity by Intel
Pentium Pro: Two Chips in One
- Became available in 1995
- Superscalar of degree 3
  - Can execute 3 instructions simultaneously
- Optimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp)
- Two separate silicon dies in the same package
  - Processor: 0.35 µm, 5.5 million transistors
  - 256 KB (or 512 KB) Level 2 cache included in the package, 15.5 million transistors in a smaller area

Pentium Pro: Two Chips in One
- On-board Level 2 cache
  - Simplifies system board design
  - Requires less space
  - Gains faster communication with the processor
- Internal (Level 1) cache: 8 K
- Pentium Pro 133 ~= 2 x Pentium 66 ~= 4 x 486 DX2 66

Pentium Pro: Dynamic Execution
- Dynamic execution: reduces idle processor time by predicting instruction behavior
  - Multiple branch prediction: looks as far as 30 instructions ahead to anticipate program branches
  - Data flow analysis: looks at upcoming instructions and determines whether they are available for processing, depending on other instructions; determines the optimal execution sequence
  - Speculative execution: executes instructions in a different order than entered; speculative results are stored until the final state can be determined
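Processor Future: What's More from Moore's Law?

Moore's Law
- In 1965, Gordon Moore observed that the number of components per integrated circuit had been doubling roughly every year
- He forecast that this trend would continue through 1975
- The popular statement "the number of transistors per integrated circuit doubles every 18 months" is a later paraphrase; in 1975 Moore himself revised the rate to a doubling every two years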
Moore’s Law
Other Microprocessors
- Motorola family
  - From the 6809 through the 68040
- PowerPC
  - Joint venture between Apple, IBM, and Motorola
- RISC processors
  - DEC Alpha, MIPS, Sun SPARC, etc.

CISC vs. RISC
- CISC (Complex Instruction Set Computer)
  - CISC processors have a large, versatile instruction set that supports many complex addressing modes
  - Moves complexity from software to hardware
- RISC (Reduced Instruction Set Computer)
  - RISC processors have a small instruction set
  - Moves complexity from hardware to software

Microprocessor Performance
- Two main factors:
- Response time
  - The time between the start and completion of a task, also referred to as execution time
- Throughput
  - The total amount of work done in a given time

MIPS
- Million Instructions Per Second
  - $\text{MIPS} = \dfrac{\text{instruction count}}{\text{execution time (in seconds)} \times 10^{6}}$
- It specifies performance inversely to execution time
- Faster machines have a higher MIPS rating
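The MIPS definition can be checked with a short calculation. The Python sketch below is illustrative only: the function name `mips` and the sample workload are assumptions, not figures from the slides.

```python
def mips(instruction_count: int, execution_time_s: float) -> float:
    """Return the MIPS rating: instruction count / (execution time in s * 10^6)."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return instruction_count / (execution_time_s * 1e6)


# Hypothetical program: 30 million instructions executed in 0.5 s.
print(mips(30_000_000, 0.5))  # 60.0
```

Halving the execution time for the same instruction count doubles the rating, which is the inverse relationship the slide describes.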
Some Problems of MIPS
- Cannot compare computers with different instruction sets, since the instruction count will certainly differ
- MIPS varies between programs on the same computer

iCOMP
- An index provided by Intel for comparing the performance of its 32-bit microprocessors
- Based on a variety of performance components that represent integer mathematics, graphics, etc.
- Combines the results of a set of software application benchmarks

Chapter 2: Computer Codes, Programming, and Operating Systems
- Number Systems
- Computer Codes
- Programming
- Operating Systems

Number Systems
- Decimal: base 10
- Binary: base 2
- Octal: base 8
- Hexadecimal: base 16

Base Conversion: 2 ↔ 10
- Binary to decimal
  - $D = \sum_{i=0}^{n-1} b_i \times 2^i$
- Decimal to binary
  - Repeated subtraction
    - $D' = \sum_{i=0}^{m-1} b_i \times 2^i = D - 2^m$ (with $b_m = 1$)
    - $D \leftarrow D'$ and $m \leftarrow m'$, where $m'$ is the largest exponent such that $2^{m'} \le D'$ (so $b_{m'} = 1$)
  - Long division
    - $D' = \lfloor D/2 \rfloor$ with remainder $b_i$, then $D \leftarrow D'$
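The conversion methods on this slide can be sketched in a few lines of Python. This is a minimal illustration for non-negative integers; the function names are assumptions chosen for this example, and Python's built-in `int(s, 2)` and `bin()` would normally be used instead.

```python
def binary_to_decimal(bits: str) -> int:
    """D = sum of b_i * 2**i, with b_0 as the rightmost character."""
    return sum(int(b) << i for i, b in enumerate(reversed(bits)))


def decimal_to_binary_subtract(d: int) -> str:
    """Repeated subtraction of the largest power of two that still fits."""
    if d == 0:
        return "0"
    bits = []
    for exp in range(d.bit_length() - 1, -1, -1):  # start at largest m with 2**m <= d
        if d >= 1 << exp:
            bits.append("1")
            d -= 1 << exp                          # D' = D - 2**m
        else:
            bits.append("0")
    return "".join(bits)


def decimal_to_binary_divide(d: int) -> str:
    """Repeated division by 2; the remainders are the bits, LSB first."""
    if d == 0:
        return "0"
    bits = []
    while d > 0:
        d, remainder = divmod(d, 2)
        bits.append(str(remainder))
    return "".join(reversed(bits))


print(binary_to_decimal("1011"))        # 11
print(decimal_to_binary_subtract(11))   # 1011
print(decimal_to_binary_divide(11))     # 1011
```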
MCS-51 Program Development (figure): Editor → .ASM → Assembler (X8051) → .OBJ → Linker (Link) → .HEX → Target; the Program Symbol Converter (CVTSYM) handles the .SYM/.SDT symbol files for the ICE

Chapter 3: 80x86 Processor Architecture
- 8086/88
- Segmented Memory
- 80386
- 80486
- Pentium Pro

The 8086 and 8088
- Processor Model
- Programming Model
8086 Processor Model: BIU+EU
- BIU
  - Memory and I/O address generation
- EU
  - Receives code and data from the BIU
    - Not connected to the system buses
  - Executes instructions
  - Saves results in registers, or passes them to the BIU for memory and I/O

8086 Processor Model (figure): BIU with address generation and bus control, segment registers CS, ES, SS, DS, the IP, and the instruction queue; EU with registers AH/AL, BH/BL, CH/CL, DH/DL, BP, DI, SI, SP, the ALU, and the flags

Fetch and Execution Cycle
- BIU+EU allows the fetch and execution cycles to overlap
  - 0. System boot: the instruction queue is empty
  - 1. IP => BIU => address bus, then IP++
  - 2. Mem[IP-1] => InstrQ[tail++]
  - 3a. InstrQ[head] => EU => execution
  - 3b. Mem[IP++] => InstrQ[tail++]
    - May be multiple instructions
  - Repeat 3a + 3b (overlapped)

Waiting Conditions: Memory Access
- BIU+EU: execute (almost) continuously without waiting
- Waiting condition: accessing memory locations not in the queue
  - BIU suspends instruction fetch
  - Issues the external memory address
  - Resumes instruction fetch and execution

Waiting Conditions: Jump
- The next instruction is a jump
  - Instructions in the queue are discarded
  - EU waits for the next instruction at the jump location to be fetched by the BIU
  - Execution resumes

Waiting Conditions: Long Instructions
- A long instruction is being executed
  - Instruction queue is full
  - BIU waits
  - Instruction fetch resumes after the EU pulls one or two bytes from the queue

BIU: 8088 vs. 8086
- The BIU is the major difference
- 8088:
  - Data bus: 8-bit (vs. 16-bit on the 8086)
  - Instruction queue: 4 bytes (vs. 6 bytes on the 8086)
- Only 30% slower than the 8086
  - If the queue is kept full
8086 Programming Model (figure): data registers AH/AL, BH/BL, CH/CL, DH/DL; segment registers CS, ES, SS, DS; pointer and index registers BP, DI, SI, SP; Flags H and Flags L; IP

8086 Programming Model
- Data Group:
  - AX (AH+AL): Accumulator
  - BX (BH+BL): Base
  - CX (CH+CL): Counter
  - DX (DH+DL): Data

8086 Programming Model
- Segment Group:
  - CS: Code Segment
  - DS: Data Segment
  - ES: Extra Segment
  - SS: Stack Segment
- Segment Registers:
  - Base addresses of particular segments

8086 Programming Model
- Pointer/Index Group:
  - IP: Instruction Pointer → CS
  - SI: Source Index → DS
  - DI: Destination Index → ES
  - SP: Stack Pointer → SS
- Index Registers:
  - Index (offset) or pointer relative to a base address

8086 Flag Word
Flags L (bit 7 down to bit 0): SF ZF X AF X PF X CF
- CF: Carry Flag
  - CF = 0: no carry (ADD) or borrow (SUB)
  - CF = 1: carry/borrow out of the high-order bit
- PF: (even) Parity Flag (even number of 1s in the low-order 8 bits of the result)
- AF: Auxiliary Carry: carry/borrow on bit 3 (low nibble of AL)
- ZF: Zero Flag (1: result is zero)
- SF: Sign Flag (0: positive, 1: negative)

8086 Flag Word
Flags H (bit 15 down to bit 8): X X X X OF DF IF TF
- TF: Trap Flag (single-step after the next instruction; cleared by the single-step interrupt)
- IF: Interrupt-Enable Flag: enables maskable interrupts
- DF: Direction Flag: auto-decrement (1) or auto-increment (0) of the index on string operations
- OF: Overflow Flag: the signed result cannot be expressed within the number of bits in the destination operand
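To make the flag definitions concrete, the following Python sketch computes the arithmetic status flags for an 8-bit addition such as `ADD AL, BL`. It is a simplified model written for this example (the function name `add8_flags` is hypothetical), not a description of the 8086's internal logic.

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and return the result with the status flags."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    return {
        "result": result,
        "CF": int(total > 0xFF),                      # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),   # even number of 1 bits
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),    # carry out of bit 3
        "ZF": int(result == 0),                       # result is zero
        "SF": result >> 7,                            # copy of the sign bit
        # Signed overflow: both operands differ in sign from the result.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }


print(add8_flags(0x7F, 0x01))
# {'result': 128, 'CF': 0, 'PF': 0, 'AF': 1, 'ZF': 0, 'SF': 1, 'OF': 1}
```

Here 0x7F + 0x01 overflows the signed range (+127 + 1), so OF and SF are set while CF stays clear; 0xFF + 0x01 gives the opposite pattern (CF and ZF set, OF clear).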
Segmented Memory
- Linear vs. segmented
  - Linear addressing:
    - The entire memory is regarded as a whole
    - The entire memory space is available all the time
  - Segmented:
    - Memory is divided into segments
    - A process is limited to accessing designated segments at a given time

8086 Memory Organization
- Even and odd memory banks
  - 16-bit data bus → one two-byte access or two one-byte accesses
  - Allows the processor to work on bytes or on words (16-bit)
    - I/O operations are normally conducted in bytes
  - Can handle odd-length instructions
    - Single-byte instructions
    - Multiple-byte (and very long) instructions

8086 Memory Organization
- Memory space:
  - 20-bit address bus
  - Linearly, 1 MB directly addressable
- Memory banks
  - Can read 16-bit data (512 K words) from the even- and odd-addressed banks simultaneously
    - Needs two memory banks in parallel
    - BHE control line: allows addressing the even bank, the odd bank, or both

Memory Organization: Alignment
- Endianness:
  - One way to model a multi-byte CPU register
    - AX → AH+AL
  - Two ways to store operands in memory
- Big-endian CPU (IBM 370, M68*, SPARC):
  - High-order-byte-first (HOBF)
  - Maps the highest-order byte of the internal register to the lowest (1st) memory byte address
  - The operand address is the address of the MSB
    - MOV R1, N: N is the 1st byte in memory and the MSB of the register

Memory Organization: Alignment
- Little-endian CPU (DEC, Intel):
  - Low-order-byte-first (LOBF)
  - Maps the lowest-order byte of the register to the 1st memory byte
  - The operand address is the address of the LSB (1st memory byte)
    - MOV AX, N: N is the 1st byte in memory and the LSB of the register
    - AL ← [N], AH ← [N+1]
- Configurable:
  - Can switch between big- and little-endian, or
  - Provides instructions that convert 16-/32-bit data between the two byte orderings (80486)
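The two byte orderings can be observed directly with Python's built-in `int.to_bytes` and `int.from_bytes`. The value 0x1234 is an arbitrary example, with 0x12 playing the role of AH and 0x34 the role of AL.

```python
value = 0x1234  # 16-bit operand: high-order byte 0x12, low-order byte 0x34

little = value.to_bytes(2, byteorder="little")  # Intel ordering: LSB at address N
big = value.to_bytes(2, byteorder="big")        # big-endian ordering: MSB at address N

print(little.hex())  # 3412
print(big.hex())     # 1234

# Reading little-endian bytes back with the wrong byte order swaps the halves.
print(hex(int.from_bytes(little, byteorder="big")))  # 0x3412
```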
8086 Memory Organization
- Aligned operand
  - Operand aligned at even-byte (word/dword) boundaries
  - Allows a single access to read/write one operand
    - Through an internal shift/swap mechanism, if necessary
- Misaligned words:
  - Word operand does not start at an even address
  - Needs 2 bus cycles to read/write the word (8086)
    - Issues two addresses to access the two even-aligned words that contain the operand
    - Slower, but transparent to the programmer

8086 Memory Organization
- 8088
  - Always 2 cycles for word operations
    - Aligned or not
  - Because of the 8-bit external data bus
    - A single memory bank is sufficient
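The alignment rules above reduce to a small decision function. The sketch below is a simplified model for 16-bit word transfers only; the function name and the `data_bus_bits` parameter are assumptions made for this illustration.

```python
def word_bus_cycles(address: int, data_bus_bits: int = 16) -> int:
    """Bus cycles needed to transfer one 16-bit word at a given byte address."""
    if data_bus_bits == 8:
        return 2                                # 8088: one byte per cycle, aligned or not
    if data_bus_bits == 16:
        return 1 if address % 2 == 0 else 2     # 8086: odd addresses need two cycles
    raise ValueError("only 8- and 16-bit data buses are modelled")


print(word_bus_cycles(0x1000))       # 1  (8086, aligned word)
print(word_bus_cycles(0x1001))       # 2  (8086, misaligned word)
print(word_bus_cycles(0x1000, 8))    # 2  (8088)
```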
8086 Memory Map
- Memory map: how the memory space is allocated
  - ROM area: boot, BIOS
  - RAM: OS/user applications and data
  - Unused
  - Reserved: for future hardware/software uses
  - Dedicated: for specific system interrupt and reset functions, etc.

Segment Registers
- 16 memory segments of 64 K
- 16-bit offset each
- CS, DS, ES, SS

Logical and Physical Addresses
- Physical: 20-bit
- Logical: 16-bit segment and 16-bit offset
  - 16-byte segment boundaries
- Address translation
  - E.g., CS:IP gives the physical address CS × 16 + IP
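Because segments start on 16-byte boundaries, the 8086 forms a 20-bit physical address by shifting the 16-bit segment value left by four bits and adding the 16-bit offset. The Python sketch below illustrates this rule; the function name and the sample addresses are chosen for this example only.

```python
def physical_address(segment: int, offset: int) -> int:
    """Translate an 8086 logical address segment:offset into a 20-bit physical address."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    return ((segment << 4) + offset) & 0xFFFFF   # 20 address lines: wraps at 1 MB


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0xFFFF, 0x0000)))  # 0xffff0
```

Different segment:offset pairs can name the same byte; for example 0x1235:0x0000 also maps to 0x12350.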
80286
- First with Protected Mode
- Review of 286 Protected Mode
80386 Model
- Refines the 286 Protected Mode
- Expands to 32-bit registers
- New Virtual 8086 Mode
80386 Review
80386: Real vs. Protected Modes
- Larger address space: 32-bit address bus (4 GB)
  - Real Mode vs. Protected Mode (refined from the 286)
- Real Mode:
  - Power-on default mode
  - Functions like an 8086: (1) uses only the 20 least significant address lines (1 MB); (2) segmented memory retained (64 K segments)
  - Software compatible with the 286
- New Real Mode features:
  - Access to the 32-bit register set
  - Two new segment registers: FS, GS

80386: Real vs. Protected Modes
- Protected Mode:
  - New addressing mechanism vs. Real Mode
  - Supports protection levels
  - Segment size: 1 byte to 4 GB (not fixed at 64 K)
  - Segment register: a pointer into a descriptor table
    - Not a base address

80386: Real vs. Protected Modes
- Protected Mode:
  - Descriptor table (8 bytes per entry):
    - 32-bit base address of the segment
    - Segment size
    - Access rights
  - Memory address = base address (in table) + offset (in instruction)
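The descriptor lookup can be sketched as follows. This is a deliberately simplified model: the `Descriptor` class, its field names, and the `SegmentLimitError` exception are hypothetical, and the real 8-byte descriptor layout and access-rights encoding are not reproduced.

```python
from dataclasses import dataclass


class SegmentLimitError(Exception):
    """Raised when an offset falls outside the segment (hypothetical name)."""


@dataclass
class Descriptor:
    base: int        # 32-bit base address of the segment
    limit: int       # highest valid offset inside the segment (simplified)
    writable: bool   # stand-in for the access-rights field


def linear_address(table: list, index: int, offset: int) -> int:
    """Memory address = base address (from the table entry) + offset."""
    descriptor = table[index]
    if offset > descriptor.limit:
        raise SegmentLimitError(f"offset {offset:#x} exceeds limit {descriptor.limit:#x}")
    return (descriptor.base + offset) & 0xFFFFFFFF


# Hypothetical table with one 64 KB segment based at 1 MB.
table = [Descriptor(base=0x00100000, limit=0xFFFF, writable=True)]
print(hex(linear_address(table, 0, 0x20)))  # 0x100020
```

The point of the indirection is that the program supplies only a table index and an offset; the base address and size stay under operating-system control.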
80386: Real vs. Protected Modes
- Protected Mode:
  - Paging mechanism:
    - Maps the 32-bit linear address (base + offset) => physical address and page frame address
    - (4 K page frames in system memory)
    - 64 TB of virtual memory
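As a rough illustration of paging with 4 K pages, the sketch below splits a 32-bit linear address into the fields used by the 80386's two-level scheme (10-bit directory index, 10-bit table index, 12-bit offset; this split is background knowledge, not stated on the slide). The `translate` helper uses a flat Python dictionary in place of real page tables, which is a simplification made for this example.

```python
PAGE_SIZE = 4096  # 4 K pages


def split_linear_address(linear: int) -> tuple:
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    linear &= 0xFFFFFFFF
    directory = linear >> 22           # top 10 bits
    table = (linear >> 12) & 0x3FF     # middle 10 bits
    offset = linear & 0xFFF            # low 12 bits: byte within the 4 K page
    return directory, table, offset


def translate(linear: int, page_frames: dict) -> int:
    """Map a linear address to a physical one via {page number: frame base address}."""
    linear &= 0xFFFFFFFF
    page_number = linear >> 12
    return page_frames[page_number] + (linear & (PAGE_SIZE - 1))


print(split_linear_address(0x00403025))                   # (1, 3, 37)
print(hex(translate(0x00403025, {0x403: 0x0009A000})))    # 0x9a025
```

A missing dictionary key here plays the role of a page that is not present in system memory, which on real hardware would trigger a page fault so the operating system can swap the page in.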
80386: Real vs. Protected Modes
- Protected Mode:
  - Protection mechanism:
    - Tasks, data, and instructions are assigned a privilege level (PL)
    - Tasks running at a lower PL cannot access tasks or data segments at a higher PL
    - Running programs are protected from one another

80386: Real vs. Protected Modes
- Two ways to run 8086 programs:
  - Real Mode
  - Virtual 8086 Mode
- Virtual 8086 Mode:
  - Runs multiple 8086 programs and other 386 (Protected Mode) programs independently
  - Each 8086 program sees 1 MB (mapped via paging to anywhere in the 4 GB space)
  - V8086 and Protected Mode programs run simultaneously

80386 Processor Model

80386 Processor Model: BIU+CPU+MMU
- BIU
  - Controls the 32-bit address and data buses
  - Keeps the instruction queue full (16 bytes)
- Address pipelining
  - The address of the next memory location is output halfway through the current bus cycle
  - More address decode time
  - Slower memory chips are acceptable
  - Easier to keep up with the faster (2 CLK) bus cycle of the 386

80386 Processor Model: BIU
- Dynamic data bus sizing
  - Switches between a 16-bit and a 32-bit data bus on the fly
  - Accommodates external 16-bit memory cards or I/O devices
  - Adjusts bus timing to use only the least significant 16 bits

80386 Processor Model: BIU
- External memory
  - 4 memory banks (4 x 8 = 32 bits)
  - BE0-BE3 for bank selection
  - Access a byte, word, or double word
    - Aligned operands: 1 bus cycle
    - Misaligned operands (address not a multiple of 4): 2 bus cycles
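The bank-selection idea can be sketched by computing, for each bus cycle, which of the four byte-wide banks take part in an access. This is a simplified model under stated assumptions: the function name is hypothetical, bit k of each mask stands for BEk being asserted (active-high here for readability), and a second cycle is assumed only when the operand crosses a 4-byte boundary.

```python
def bus_cycles_386(address: int, size: int) -> list:
    """Return one bank-enable mask (bit k = BEk asserted) per bus cycle."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    cycles = {}
    for byte_address in range(address, address + size):
        dword = byte_address >> 2      # which 32-bit row across the four banks
        bank = byte_address & 3        # which bank holds this byte
        cycles[dword] = cycles.get(dword, 0) | (1 << bank)
    return [cycles[d] for d in sorted(cycles)]


print([bin(m) for m in bus_cycles_386(0x1000, 4)])  # ['0b1111']
print([bin(m) for m in bus_cycles_386(0x1001, 4)])  # ['0b1110', '0b1']
```

An aligned double word enables all four banks in a single cycle, while the same operand at address 0x1001 needs a second cycle for the byte that spills into the next 32-bit row.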
80386 Processor Model: CPU
- CPU = IU (instruction unit) + EU (execution unit)
  - Fetching and execution overlap
- IU:
  - Retrieves instructions from the queue
  - Decodes them
  - Stores them in the decoded-instruction queue
- EU: ALU + registers (32-bit)
  - Executes decoded instructions

80386 Processor Model: MMU
- Segmentation unit
  - Real Mode: generates the 20-bit physical address
  - Protected Mode: stores base/size/rights in descriptor registers
    - Caches entries from the descriptor tables held in RAM
    - Faster operation
- Paging unit
  - Determines the physical addresses associated with active segments (divided into 4 K pages)
  - Virtual memory support to allow larger programs

80386 Programming Model
- General-purpose registers
  - Data and address groups
  - Status and control flags
    - VM, RF, NT, IOPL
  - Segment group

80386 Programming Model
- Special-purpose registers

80386 Programming Model
- Memory management
  - Segment descriptors
    - Keep base, size, access rights
    - 3 types of tables: global (GDT), local (LDT), interrupt (IDT)
    - Addressing:
      - Index (into a table) + RPL
      - Base + offset (from instruction)
  - Paging
    - TLB

80386 Programming Model
- Protection (PL)
  - Task: CPL
  - Instruction: RPL
  - Data segment: DPL
- Gates
  - Special descriptors that allow access to higher-PL tasks from lower-PL tasks
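The interaction of CPL, RPL, and DPL for a data-segment access can be summarised in one comparison. The sketch assumes the usual x86 numbering, which the slides do not spell out: level 0 is the most privileged and level 3 the least, so a "higher PL" corresponds to a numerically smaller value. The function name is hypothetical.

```python
def can_access_data(cpl: int, rpl: int, dpl: int) -> bool:
    """Allow a data-segment access only when max(CPL, RPL) <= DPL (0 = most privileged)."""
    for level in (cpl, rpl, dpl):
        if level not in range(4):
            raise ValueError("privilege levels are 0..3")
    return max(cpl, rpl) <= dpl


print(can_access_data(cpl=0, rpl=0, dpl=3))  # True: kernel code reading user data
print(can_access_data(cpl=3, rpl=3, dpl=0))  # False: user code reading kernel data
print(can_access_data(cpl=0, rpl=3, dpl=0))  # False: the RPL weakens the request
```

The third case shows why the RPL exists: privileged code acting on behalf of a less privileged caller can mark the selector with the caller's level, so the check fails as it would for the caller.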
80486 Review …
80486 DX
- 1989: a polished 386 with 6 new OS-level instructions
- virtually identical to the 386 in terms of compatibility
- RISC design concepts
  - fewer clock cycles per operation; a single clock cycle for the most frequently used instructions
  - max. 50 MHz
  - 5-stage execution pipeline
    - portions of 5 instructions execute at once
80486 DX
- Highly integrated
  - on-board 8 KB memory cache
  - FPP (equivalent to the external 80387 coprocessor)
- Twice as fast as the 386 at any given clock rate
  - 20 MHz 486 ~= 40 MHz 386
80486 SX
- 80486 SX
  - NOT a 16-bit-bus version for transition purposes (unlike the 386SX)
  - no coprocessor
  - internal 8 KB cache retained (as on the DX)
  - for low-end applications
  - max. 33 MHz only
80486 DX2/DX4: Overdrive Chips
- Processor speed increased too fast
  - redesigning the microcomputer for compatibility becomes harder
  - solution: separate the internal speed from the external speed so that each can be improved independently
- 80486 DX2/DX4: the internal clock is twice/three times (NOT four times) the external clock, so the chip runs faster internally
80486 DX2/DX4: Overdrive Chips
- System board design is independent of processor upgrades (less expensive components are allowed)
- The processor operates at its maximum data rate internally
  - only the slower accesses to external data run at the system board rate
  - the internal cache offsets the speed gap
- 486 DX2-66: 66 MHz internal, 33 MHz external
- 486 DX4-100: 100 MHz internal, 33 MHz external (3x)
- Overdrive sockets: for upgrading a 486 DX/SX to a 486 DX2/DX4 (with Overdrive socket pin-outs)
486 Processor Features
- 386 features
  - Real/Protected modes
  - memory management
  - PLs
  - registers and bus sizes
- New features
  - 6 OS instructions
  - 8 KB/16 KB on-board cache (the cache was external on the 386)
486 Processor Features
- A better 386
  - 5-stage instruction pipeline
    - IF/ID/EX => PF/D1/D2/EX/WB
    - PF: prefetch instructions into the queue (2 x 16 bytes)
    - D1: determine the opcode
    - D2: determine the memory addresses of operands
    - EX: execute the indicated operation
    - WB: update the registers
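Note (illustrative sketch, not from the original slides): how the five stages overlap in an idealized pipeline. The model assumes every stage takes exactly one clock and that there are no stalls, which real code does not guarantee. The function names are made up for this example.

```python
STAGES = ("PF", "D1", "D2", "EX", "WB")


def pipeline_schedule(n_instructions):
    """Map each clock cycle to {stage: instruction index} for an ideal
    pipeline in which instruction i enters PF at cycle i."""
    schedule = {}
    for instr in range(n_instructions):
        for depth, stage in enumerate(STAGES):
            schedule.setdefault(instr + depth, {})[stage] = instr
    return schedule


def total_cycles(n_instructions):
    """Clocks needed to finish n instructions with no stalls."""
    if n_instructions == 0:
        return 0
    return n_instructions + len(STAGES) - 1


for cycle, busy in sorted(pipeline_schedule(6).items()):
    print(cycle, [busy.get(stage, "-") for stage in STAGES])
print(total_cycles(100))  # 104
```

Once the pipeline is full (cycle 4 onward in the printout), portions of five instructions are in flight at once and one instruction completes per clock.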
486 Processor Features
- Reduced instruction cycle times
  - 5-stage instruction pipeline (e.g., Fig. 3.18)
  - instruction cycle times:
    - 8086: 4 CLK
    - 80386: 2 CLK
    - 80486: 1 CLK (close to RISC)
    - about 2x faster than the 386
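Note (illustrative sketch, not from the original slides): the cycle times above turned into execution time. The clocks-per-instruction figures are the nominal ones listed on the slide. The instruction count and the helper name `exec_time_us` are assumptions for this example, and real programs mix instructions with different timings.

```python
CLOCKS_PER_INSTRUCTION = {"8086": 4, "80386": 2, "80486": 1}


def exec_time_us(cpu, instructions, clock_mhz):
    """Execution time in microseconds:
    instructions * clocks per instruction / clock rate in MHz."""
    return instructions * CLOCKS_PER_INSTRUCTION[cpu] / clock_mhz


n = 1000  # hypothetical instruction count
print(exec_time_us("80386", n, 40))  # 50.0
print(exec_time_us("80486", n, 20))  # 50.0
print(exec_time_us("80486", n, 40))  # 25.0
```

At the same clock rate the 486 finishes in half the time of the 386, which matches the "about 2x faster" figure.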
486 Processor Model: 386 + FPU + Cache
- 386 units retained: BIU, CPU, MMU
- New: FPU (80387) + cache (8 KB/16 KB)
- FPU
  - 387 on board
    - 0.8 micron process => number of transistors increased (275K => over 1 million)
    - simplified system board design
    - speeds up FP operations
486 Processor Model: Cache
- Cache (8 KB; 16 KB on the DX4)
  - Function: bridge the processor-memory bandwidth gap
    - 8088: 4.77 MHz
    - 80486: 50 MHz
    - Pentium: 100 MHz
    - Pentium Pro: 133 MHz
    - main memory (DRAM): relatively slow
  - Fast static RAMs (SRAM) serve as cache
486 Processor Model: Cache
- Organization
  - 8 KB
  - 4-way set associative
    - 4 direct-mapped caches wired in parallel
    - each block maps to a set of 4 lines
  - unified: data and code in the same cache
  - write-through: update both the cache and memory on write operations
486 Processor Model: Cache
- Locality (why do caches help?)
  - spatial locality: e.g., arrays of data
  - temporal locality: e.g., loops in code
- Operations on hit/miss
- 128-bit cache lines
  - 32 bits x N to capture locality (N = 4)
  - 128 bits = 16 bytes
486 Processor Model: Cache
- Mapping
  - memory => many-to-many => cache
  - Data RAM: holds memory data
  - Tag RAM: holds memory address information
- 3 methods of mapping
  - fully associative: a memory block can go to any cache line
  - direct mapped: a memory block goes to one specific line
    - thrashing
  - set associative: a memory block goes to a set of cache lines
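Note (illustrative sketch, not from the original slides): how a set-associative cache with the 486 parameters given earlier (8 KB, 4-way, 16-byte lines) would split an address into tag, set index, and byte offset. The 32-bit address width and the function name `split_address` are assumptions for this example.

```python
CACHE_BYTES = 8 * 1024
WAYS = 4
LINE_BYTES = 16
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 128 sets of 4 lines each


def split_address(addr):
    """Return (tag, set_index, offset) for a memory address.
    The tag is what the Tag RAM stores; the set index selects the set;
    the offset selects a byte within the 16-byte line."""
    offset = addr % LINE_BYTES
    set_index = (addr // LINE_BYTES) % NUM_SETS
    tag = addr // (LINE_BYTES * NUM_SETS)
    return tag, set_index, offset


print(NUM_SETS)                                      # 128
print([hex(v) for v in split_address(0x12345678)])   # ['0x2468a', '0x67', '0x8']
print(split_address(0x0000), split_address(0x0800))  # same set, different tags
```

Addresses 2 KB apart land in the same set. In a direct-mapped cache they would compete for a single line (thrashing); with 4 ways, up to four such blocks can coexist.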
486 Processor Model: Cache
- Replacement policy (LRU)
  - valid bits: are all 4 lines in use?
    - NO => use any unused line
    - YES => find one to replace
  - LRU bits: which line is least recently used
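Note (illustrative sketch, not from the original slides): the replacement decision above for one 4-line set. This is a simplification: it keeps an exact recency order in a Python list, whereas hardware usually approximates LRU with a few bits per set. The class name `CacheSet` is made up for this example.

```python
class CacheSet:
    """One set of a set-associative cache with exact LRU replacement."""

    def __init__(self, ways=4):
        self.ways = ways
        # Tags of valid lines, ordered from least to most recently used.
        # A line that is not in the list counts as invalid (unused).
        self.lines = []

    def access(self, tag):
        """Return True on a hit, False on a miss (the line is then filled)."""
        if tag in self.lines:
            self.lines.remove(tag)
            self.lines.append(tag)       # mark as most recently used
            return True
        if len(self.lines) == self.ways:  # all lines valid: evict the LRU one
            self.lines.pop(0)
        self.lines.append(tag)            # otherwise an unused line is taken
        return False


s = CacheSet()
print([s.access(t) for t in (1, 2, 3, 4, 1, 5, 2)])
# [False, False, False, False, True, False, False]
print(s.lines)  # [4, 1, 5, 2]
```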
Pentium Review …
Pentium: Superscalar Processor
- Available in 1993
- 32-bit architecture
- Superscalar architecture
  - Scaling: scaling down the etchable feature size to increase the complexity of ICs (e.g., DRAM)
    - from 10 microns (4004) to 0.13 microns (2001)
  - Superscalar: goes beyond simply scaling down
  - Two instruction pipelines, each with its own ALU, address generation circuitry, and data cache interface
  - Executes two different instructions simultaneously
Pentium: Superscalar Processor
- On-board cache
  - separate 8 KB data and code caches to avoid access conflicts
- FPP
- Instruction pipeline: 8 stages
- Optimized floating-point functions
  - 5x-10x the FLOPS of the 486
  - 2x the performance of the 486 at any clock rate
Pentium: Superscalar Processor
- Compatibility with the 386/486
  - internal 32-bit registers and address bus
  - data bus expanded to 64 bits for a higher data transfer rate
    - compare the 8088-to-386SX transition
Pentium: Superscalar Processor
- Non-clone competition from AMD and Cyrix
- Development of a brand identity by Intel
Pentium Pro Review …
Pentium Pro: Two Chips in One
- Became available in 1995
- Superscalar of degree 3
  - can execute 3 instructions simultaneously
- Optimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp)
- Two separate silicon dies in the same package
  - processor: 0.35 micron, 5.5 million transistors
  - 256 KB (or 512 KB) Level 2 cache included in the package: 15.5 million transistors in a smaller area
Pentium Pro: Two Chips in One
- On-board Level 2 cache
  - simplifies system board design
  - requires less space
  - gains faster communication with the processor
- Internal (Level 1) cache: 8 KB
- Pentium Pro 133 ~= 2 x Pentium 66 ~= 4 x 486 DX2-66
Pentium Pro: Dynamic Execution
- Dynamic execution: reduces idle processor time by predicting instruction behavior
  - Multiple branch prediction: looks as far as 30 instructions ahead to anticipate program branches
  - Data flow analysis: looks at upcoming instructions and determines whether they are available for processing, depending on other instructions; determines the optimal execution sequence
  - Speculative execution: executes instructions in a different order than entered; speculative results are stored until their final states can be determined
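Note (illustrative sketch, not from the original slides): a toy version of the data flow analysis idea, in which an instruction may issue as soon as the results it reads are available, regardless of program order. This is not the Pentium Pro's actual mechanism. It assumes one-cycle latency for every instruction, tracks only read-after-write dependencies (as if registers were renamed), and issues at most 3 instructions per cycle to mirror "superscalar of degree 3". All names are made up for this example.

```python
def schedule(instructions, width=3):
    """instructions: list of (dest_register, [source_registers]) in program
    order. Returns a list of cycles, each a list of the instruction indices
    issued in that cycle."""
    # For each instruction, find the earlier instructions that produce its inputs.
    last_writer = {}
    deps = []
    for i, (dest, sources) in enumerate(instructions):
        deps.append({last_writer[s] for s in sources if s in last_writer})
        last_writer[dest] = i

    done = set()
    pending = list(range(len(instructions)))
    cycles = []
    while pending:
        # Ready = all producers finished in an earlier cycle; oldest first.
        ready = [i for i in pending if deps[i] <= done][:width]
        cycles.append(ready)
        done.update(ready)
        pending = [i for i in pending if i not in ready]
    return cycles


program = [
    ("r1", []),            # 0: r1 = load a
    ("r2", ["r1"]),        # 1: r2 = r1 + 1
    ("r3", []),            # 2: r3 = load b
    ("r4", ["r3"]),        # 3: r4 = r3 * 2
    ("r5", ["r2", "r4"]),  # 4: r5 = r2 + r4
]
print(schedule(program))           # [[0, 2], [1, 3], [4]]
print(schedule(program, width=1))  # [[0], [1], [2], [3], [4]]
```

Instruction 2 issues ahead of instruction 1 because it does not depend on any pending result, so the five instructions finish in three cycles instead of five.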