
  • Number of slides: 157

Chapter 1 Microcomputers and Microprocessors: Microprocessor Evolution and Performance

Contents
- Introduction to microcomputer systems
- Microprocessor evolution
  - The Intel processor family
- Microprocessor performance

Introduction to Microcomputer
- A microcomputer can be interpreted as a machine with:
  - I/O devices for input/output
  - a microprocessor for processing
  - memory units for storage
  - buses for connecting the above components
- In 1970, a microcomputer was normally understood as a computer considerably smaller than a minicomputer, possibly using ROM for program storage

Basic hardware units
- Input
  - e.g. keyboard, mouse
- Microprocessor
  - e.g. 8085, 8086, MC68000 microprocessors
- Memory
  - e.g. RAM, hard disk
- Output
  - e.g. monitor, printer

Buses
- Buses: external connections to the input/output units
- Major buses:
  - Address bus: address of memory locations containing instructions or data
  - Data bus: contents of memory locations
  - Control bus: synchronization and handshaking between components

General Architecture (figure: input unit, memory unit with primary and secondary memory, microprocessing unit, output unit)

Processor History: Vacuum Tubes to ICs

First Generation Computers
- Vacuum tube technology
  - Large, air-conditioned room
  - Tube lifetime: 3,000 hours
- A useless machine?
  - 1951: first UNIVAC I (UNIVersal Automatic Computer) delivered
  - 1952: prediction of the presidential election by CBS
  - 1952: IBM Model 701 data processing system

Second Generation Computers
- The transistor is born (solid-state era)
  - 1948: invention of bipolar transistors
    - 1956: Nobel Prize in Physics: Drs. William Shockley, John Bardeen and Walter H. Brattain (Bell Labs)
  - 1954: Bell Labs: all-transistorized computer (TRADIC)
    - 800 transistors
    - Much less heat
    - More reliable and less costly

Second Generation Computers
- Mainframe computers
  - 1958: IBM's first transistorized computers, 7070/7090
  - 1959: 1401 (business-oriented model)
  - Built on circuit boards mounted into rack panels, or frames
  - Main frame (mainframe): the CPU portion of the computer
  - Popular with business and industry

Third Generation Computers
- Invention of the IC: 1958-1959
  - Dr. Robert Noyce (Fairchild) and Jack Kilby (TI)
  - Kilby: fabricated resistors, capacitors and transistors on a germanium wafer and connected these parts with fine gold wires
  - Noyce: isolated individual components with reverse-biased diodes and deposited an adherent metal film over the circuit, thus connecting the components
  - First IC: 2-transistor multivibrator
  - By the mid-1960s: memory chips with 1,000 components were common

Third Generation Computers
- 1964: IBM System/360 series (32-bit)
  - The first to use IC technology
    - A family of 6 compatible computers
  - 40 different I/O and auxiliary storage devices
  - Memory capacity: 16K words to over 1 MB
  - Sixteen 32-bit registers
  - 24-bit address bus
  - 128-bit data bus

Third Generation Computers
- 1964: IBM System/360 series (32-bit)
  - 375,000 computations per second
    - (far below the 150 MIPS of a Pentium 100)
  - $5 billion development cost
- IBM became the leading mainframe company

Minicomputer
- 1960s: space race between the US and USSR
  - IC industry boom
  - A tremendous demand by scientists and engineers for an inexpensive computer that they could operate by themselves
  - 1965: DEC PDP-8 (by Edson de Castro's group)
    - Low-cost ($25,000) minicomputer
    - 12-bit
    - 16-bit PDP-11
  - Supermini …

Microprocessors: CPU on a Chip
- 1968: Intel (Integrated Electronics)
  - Founded by Robert Noyce and Gordon Moore (from Fairchild)
  - Original goal: the semiconductor memory market
  - 1969: customized ICs for Busicom calculators
  - Ted Hoff and Stan Mazor proposed a 4-bit CPU on a single chip, plus ROM and RAM chips

Microprocessors: CPU on a Chip
- 1971: 4000 family
  - Designed by Federico Faggin
  - 4001: 2-Kbit ROM with 4-bit I/O port
  - 4002: 320-bit RAM, 4-bit output port
  - 4003: 10-bit serial-in, parallel-out shift register
  - 4004: 4-bit processor
- Processor on a chip: the microprocessor era

Microprocessors: CPU on a Chip
- 1972: 8008, 8-bit
- 1974: 8080, an improved version

Microprocessors: CPU on a Chip
- 8-bit CPUs
- 16-bit address (64K)
  - MC6800: Motorola
  - 6502: MOS Technology (spin-off from Motorola)
    - Apple II, Apple DOS
  - Z-80: Zilog (spin-off from Intel)
    - Z-80 cards on Apple II, CP/M

Microprocessors: CPU on a Chip
- 16-bit CPUs (late 1970s)
  - 8086, 80186, 80286: Intel
    - PC, PC-DOS, MS-DOS, SCO Unix
  - MC68000: Motorola
    - 16-bit instructions
    - Hardware multiply and divide
    - 24-bit address bus (16 MB)
    - Workstations: Sun-3

Microprocessors: CPU on a Chip
- 32-bit CPUs
  - 80386, 80486: Intel
  - MC68020, 68030: Motorola
- "64-bit" CPUs
  - Pentium, Pentium Pro (64-bit external data bus but 32-bit internal registers, so not recognized as 64-bit CPUs in terms of internal register word length)

Microcomputers: Computers Based on Microprocessors
- 1975: MITS Altair 8800 (kit)
  - $399, Intel 8080, programmed by depositing 1s/0s via front-panel switches
- Other computers boomed
  - 8080: MITS, …
  - 6800: SWTPC 6800, …
  - Z-80: TRS-80, …
  - 6502: Apple I, 8K, programmed with BASIC
    - Steve Jobs & Steve Wozniak, millionaires from PC companies …

Personal Computers: the Open Architecture Era
- 1981: IBM PC
  - A system board (motherboard)
  - Intel 8088 processor
  - 16K memory
  - 5 expansion slots
    - Third-party vendors supply various I/O adapter cards
    - Open architecture
    - Computer with interchangeable components

Microcontrollers: Microcomputers on a Chip
- Microcontroller: a computer on a chip
  - Microprocessor, plus
  - on-chip memory, plus
  - input/output ports
- 1995: microcontrollers outsold microprocessors 10:1
  - Embedded in various equipment:
    - thermostats, machine tools, communication, automotive, …
- Evolution: increasingly greater I/O capabilities
  - Intel: MCS-51, MCS-96, …

High-Performance Processors
- Supercomputers
  - Aircraft design, global climate modeling, oil-bearing formations, molecular design of new drugs, financial behavior
  - CDC 6600, 7600: Seymour Cray
  - Cray-1: 1976, the first true supercomputer
    - ECL, 128 kW power consumption
    - 130 MFLOPS (Pentium 100: 150 MFLOPS)
    - $5.1 million

High-Performance Processors
- Parallel processors
  - Tens of gigaflops
  - Multiple processors wired by a common bus
  - Each is given a portion of the problem to solve
  - Hypercube: early 1980s
    - Cosmic Cube, iPSC (with i860 RISC chips)
  - 2D rectangular mesh architecture: multiple processors at each node
    - Intel: teraflops computer with 4,500 nodes, each powered by two Pentium Pro 200s

RISC vs. CISC
- RISC: Reduced Instruction Set Computer (1980s)
  - A small number of fixed-length instructions
  - Simple addressing modes
  - A large number of registers
  - Instructions executed in one clock cycle
- Intel i860 ("Cray on a Chip")
  - 82 instructions, each 32 bits long
  - Four addressing modes
  - 32 general-purpose registers

RISC vs. CISC
- CISC: Complex Instruction Set Computer
  - A large number of variable-length instructions
  - Multiple addressing modes
  - A small number of registers
  - Multiple clock cycles per instruction
- Intel 8086
  - Over 3,000 instruction forms, 1-6 bytes long
  - 9 addressing modes
  - 8 general-purpose registers
  - Execution from 2 to 80+ cycles

RISC vs. CISC
- RISC
  - Control unit is much simpler (simpler instructions, executed in 1 clock cycle)
  - Faster execution with less total on-chip logic
  - Control logic takes about 10% of the chip area (vs. 50% for CISC)
  - More area for the register file, data and instruction caches, FPU, and coprocessor
  - PowerPC: 32-bit, by IBM, Apple, Motorola
  - SPARC: for Sun Microsystems workstations

Application-Specific Processors
- DSP chips
  - Mostly for analog signal processing
  - ADC-DSP-DAC architecture
  - Avoid processing analog signals with discrete circuits involving capacitors and inductors
  - DSP: performs complex mathematical functions
    - Digital filters, spectrum analysis

Application-Specific Processors
- DSP chip architecture
  - Separate data/program areas: Harvard architecture
  - Hardware multipliers and adders, optimized to execute in a single cycle
  - Arithmetic pipelining: several instructions operated on at once
  - Hardware loop control
  - Multiple I/O ports for communication with other processors
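The digital-filter workload mentioned above is dominated by multiply-accumulate (MAC) steps, which is what the hardware multipliers and adders are built to finish in one cycle. The following is a minimal Python sketch of a direct-form FIR filter, written only to show where the MACs occur; the function name `fir_filter` and the sample coefficients are illustrative assumptions, not taken from any DSP vendor's library.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k].

    Samples before x[0] are treated as zero. Each pass through the inner
    loop is one multiply-accumulate (MAC), the step a DSP's hardware
    multiplier and adder aim to complete in a single cycle.
    """
    output = []
    for n in range(len(samples)):
        acc = 0  # plays the role of the DSP accumulator register
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]  # one MAC
        output.append(acc)
    return output


# Hypothetical 3-tap smoothing filter with integer coefficients.
print(fir_filter([1, 2, 3, 4], [1, 2, 1]))  # [1, 4, 8, 12]
```

For N taps, each output sample costs N MACs; on a DSP, hardware loop control would run the inner loop without per-iteration branch overhead.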

Summary of Processor History
- 1940s: vacuum tubes, large and power-hungry
- 1950s: transistor (1948-)
- 1959: first IC (second industrial revolution)
- 1960s: ICs became popular for building CPUs
- 1971: Intel 4004 microprocessor (2,300 transistors): start of the microprocessor age
- Late 1970s: 8080/85

Summary of Processor History
- 1980: RISC (reduced instruction set computer)
- CISC (complex instruction set computer) vs. RISC
- CISC family: Intel 80x86, Pentium; Motorola 68000 series
- Most others are RISC designs

Evolution of Intel Processors: 4004 ('71) to Pentium Pro ('95)

Intel
- Integrated Electronics
  - 1968: founded by Robert Noyce and Gordon Moore
  - IA (Intel Architecture, e.g., IA-16, IA-32, IA-64) has been the de facto standard since the 8008 ('72)
- Evolution:
  - Internal register sizes
  - External bus widths
  - Real, Protected, and Virtual 8086 modes

4-bit Processors
- 4004
  - First microprocessor
  - Became available in 1971
  - 4-bit microprocessor:
    - 4-bit registers & 4-bit data bus
    - Number of transistors: 2,250
    - Minimum feature size: 10 microns
    - Address bus: 10 bits/1K
    - 0.06 MIPS (@ 0.108 MHz)
    - No internal cache

8-bit Processors
- 8008, 8080, 8085
  - Became available in 1972 (8008), 1974 (8080), and 1976 (8085)
  - 8-bit microprocessors (8080 pictured)

8086: IA Standard
- Became available in 1978
  - 16-bit data bus
  - 20-bit address bus (was 16-bit for the 8080)
  - Memory organization: 16 segments of 64 KB (1 MB limit)
- CPU reorganized into the BIU (bus interface unit) and EU (execution unit)
  - Allows fetch and execution to proceed simultaneously
- Internal registers expanded to 16 bits
  - The low and high bytes can be accessed separately

8086
- Hardware multiply and divide instructions
- External math coprocessor
- Instruction set compatible with the 8080/8085 at the assembly-source level
- 8086: defined the 80x86 architecture

8086
- Not quite successful
  - 16-bit data bus: requires two separate 8-bit memory banks
  - Memory chips were expensive

8088: PC Standard
- Became available in 1979, almost identical to the 8086
- 8-bit data bus: for hardware compatibility with the 8080
- 16-bit internal registers and internal data bus (same as 8086)
- 20-bit address bus (was 16-bit for the 8080)
  - BIU redesigned
- Memory organization: 16 segments of 64 KB (1 MB limit)
  - Two memory accesses for 16-bit data (less efficient)
  - But lower cost
- 8088: used by the IBM PC (1981), 16K-64K, 4.77 MHz

80186, 80188: High-Integration CPU
- PC system:
  - 8088 CPU + various supporting chips
    - Clock generator
    - 8251: serial I/O (RS-232)
    - 8253: timer/counter
    - 8255: PPI (programmable peripheral interface)
    - 8257: DMA controller
    - 8259: interrupt controller
- 80186/80188: 8086/8088 + supporting functions
  - Compatible instruction set (+ 9 new instructions)

80286
- Became available in 1982
- Used in the IBM AT computer (1984)
- 16-bit data bus
- Clock speed 25% faster than the 8088, throughput 5 times greater than the 8088
- 24-bit address bus (16 MB) (vs. 20-bit/1 MB for the 8086)

80286: Real vs. Protected Modes
- Larger address space: 24-bit address bus
  - Real Mode vs. Protected Mode
- Real Mode:
  - Power-on default mode
  - Functions like an 8086: uses the 20 least significant address lines (1 MB)
  - Software compatible with the 8086
  - 16 new instructions (for Protected Mode management)
  - Faster than the 8086: redesigned processor, plus a higher clock rate (6-8 MHz)

80286: Real vs. Protected Modes
- Protected Mode:
  - Multi-program environment
  - Each program has a predetermined amount of memory
  - Addressed via segment selectors (physical addresses invisible): 16 MB addressable
  - Multiple programs loaded at once (within their respective segments), protected from reads/writes by each other

80286: Real vs. Protected Modes
- Protected Mode:
  - Cannot be switched back to Real Mode (short of a processor reset), to avoid illegal access by switching back and forth between modes
- Just a faster 8086?
  - MS-DOS requires that all programs be run in Real Mode

Clock Speed
- Electrical signals cannot change instantaneously (a transition period is required)
- The system clock provides the timing signal for synchronization
- Clock speed alone cannot be used to compare the performance of microprocessors with different instruction sets, or even different designs of the same instruction set
  - e.g., a 66 MHz Pentium is twice as fast as a 66 MHz 80486
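One way to see why clock rate alone is misleading is the standard textbook CPU performance equation: time = instruction count × average cycles per instruction (CPI) / clock rate. The sketch below assumes both chips run the same x86 program (so the instruction count is equal) and uses hypothetical average CPIs of 2.0 for the 80486 and 1.0 for the Pentium; these figures are illustrative, not measured.

```python
def cpu_time_seconds(instruction_count, avg_cpi, clock_hz):
    """CPU time = instruction count x average CPI / clock rate."""
    return instruction_count * avg_cpi / clock_hz


CLOCK_HZ = 66_000_000              # both processors at 66 MHz
PROGRAM_INSTRUCTIONS = 66_000_000  # same program, so same instruction count

t_486 = cpu_time_seconds(PROGRAM_INSTRUCTIONS, 2.0, CLOCK_HZ)      # assumed CPI
t_pentium = cpu_time_seconds(PROGRAM_INSTRUCTIONS, 1.0, CLOCK_HZ)  # assumed CPI
print(t_486, t_pentium, t_486 / t_pentium)  # 2.0 1.0 2.0
```

With equal clock rates, the speed ratio comes entirely from CPI; across different instruction sets the instruction count would differ as well, which is why clock rate cannot settle the comparison.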

80386DX (aka 80386)
- Available in 1985, a major redesign of the 8086/286
  - Compatibility commitment through 2000
- 32-bit data and address buses (4 GB memory)
  - Real Address Mode: 1 MB visible, like 286 Real Mode
  - Protected Virtual Address Mode:
    - On-board MMU
    - Segmented tasks of 1 byte to 4 GB
      - Segment base, limit, and attributes defined by a descriptor register
    - Page swapping: 4K pages, up to 64 TB virtual memory space
    - Windows, OS/2, Unix/Linux

80386DX (aka 80386)
- Virtual 8086 mode (a special Protected Mode feature): permits multiple 8086 virtual machines for multitasking (each behaves like Real Mode)
  - Windows (multiple MS-DOS sessions)
- Clock rate:
  - Max. 40 MHz, 2 clock pulses per read/write bus cycle
  - External memory cache to avoid wait states
    - Fast SRAM
    - 93% hit rate with 64K cache
- Compatible instruction set (14 new instructions)
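The benefit of the external cache can be estimated with the usual average-access-time formula, t_avg = h × t_cache + (1 - h) × t_memory. This sketch uses the 93% hit rate quoted above; the 25 ns SRAM and 100 ns DRAM access times are hypothetical values chosen for illustration, and the model ignores any extra overhead on a miss.

```python
def average_access_time_ns(hit_rate, cache_ns, memory_ns):
    """Simple model: hits cost cache_ns, misses cost memory_ns."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns


HIT_RATE = 0.93   # from the slide: 64K external cache
SRAM_NS = 25.0    # hypothetical cache access time
DRAM_NS = 100.0   # hypothetical main-memory access time

print(round(average_access_time_ns(HIT_RATE, SRAM_NS, DRAM_NS), 2))  # 30.25
```

Under these assumed timings, the average access costs about 30 ns instead of 100 ns, which is how a small, fast SRAM lets the 386 avoid most wait states.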

80386SX
- 80386SX (for the transition to 32-bit)
  - 16-bit data bus/32-bit registers
  - 24-bit address bus

80486DX
- 1989: a polished 386, 6 new OS-level instructions
- Virtually identical to the 386 in terms of compatibility
- RISC design concepts
  - Fewer clock cycles per operation; a single clock cycle for the most frequently used instructions
  - Max. 50 MHz
  - 5-stage execution pipeline
    - Portions of 5 instructions execute at once

80486DX
- Highly integrated:
  - On-board 8K memory cache
  - FPP (floating-point processor, equivalent to an external 80387 coprocessor)
- Twice as fast as the 386 at any given clock rate
  - 20 MHz 486 ≈ 40 MHz 386

80486SX
- 80486SX
  - NOT a 16-bit version for transition purposes
  - No coprocessor (FPP)
  - Internal 8K cache retained
  - For low-end applications
  - Max. 33 MHz only

80486DX2/DX4: Overdrive Chips
- Processor speeds increased too fast
  - Redesigning the microcomputer for compatibility becomes harder
  - Solution: separate the internal speed from the external speed, so each can improve independently
- 80486DX2/DX4: the internal clock is twice/three times (NOT four times) the external clock, so the chip runs faster internally

80486DX2/DX4: Overdrive Chips
- System board design is independent of processor upgrades (less expensive components are allowed)
- The processor operates at its maximum data rate internally
  - Only slow accesses to external data run at the system board rate
  - The internal cache offsets the speed gap
- 486DX2-66: 66 MHz internal, 33 MHz external
- 486DX4-100: 100 MHz internal, 33 MHz external (3x)
- Overdrive sockets: for upgrading 486DX/SX to 486DX2/DX4 (with Overdrive socket pin-outs)

Pentium: Superscalar Processor
- Available in 1993
- 32-bit architecture
- Superscalar architecture
  - Scaling: scaling down the etchable feature size to increase IC complexity (e.g., DRAM)
    - 10 microns (4004) to 0.13 microns (2001)
  - Superscalar: goes beyond simply scaling down
  - Two instruction pipelines, each with its own ALU, address generation circuitry, and data cache interface
  - Executes two different instructions simultaneously

Pentium: Superscalar Processor
- On-board cache
  - Separate 8K data and code caches to avoid access conflicts
- FPP
- Instruction pipeline: 5 stages for integer instructions, 8 stages for floating-point instructions
- Optimized floating-point functions
  - 5x-10x the FLOPS of the 486
  - 2x the performance of the 486 at any clock rate

Pentium: Superscalar Processor
- Compatibility with 386/486:
  - Internal 32-bit registers and address bus
  - Data bus expanded to 64 bits for a higher data transfer rate
    - Compare the 8088 and 386SX transitions

Pentium: Superscalar Processor
- Non-clone competition from AMD, Cyrix
- Development of brand identity by Intel

Pentium Pro: Two Chips in One
- Became available in 1995
- Superscalar of degree 3
  - Can execute 3 instructions simultaneously
- Optimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp)
- Two separate silicon dies in the same package
  - Processor: 0.35 µm, 5.5 million transistors
  - 256 KB (/512 KB) Level 2 cache included in the package, 15.5 million transistors in a smaller area

Pentium Pro: Two Chips in One
- On-board Level 2 cache
  - Simplifies system board design
  - Requires less space
  - Gains faster communication with the processor
- Internal (Level 1) cache: 8K
- Pentium Pro 133 ≈ 2x Pentium 66 ≈ 4x 486DX2-66

Pentium Pro: Dynamic Execution
- Dynamic execution: reduces idle processor time by predicting instruction behavior
  - Multiple branch prediction: looks as far as 30 instructions ahead to anticipate program branches
  - Data flow analysis: looks at upcoming instructions and determines whether they are available for processing, depending on other instructions; determines optimal execution sequences
  - Speculative execution: executes instructions in a different order from the one in which they were entered; speculative results are stored until final states can be determined

Processor Future: What's More from Moore's Law?

Moore's Law
- In 1965, Gordon Moore predicted that the number of components per integrated circuit would double every year
- He forecast that this trend would continue through 1975
- In 1975 he revised the rate to a doubling every two years; the widely quoted "every 18 months" figure is a later popularization
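Moore's prediction is an exponential, N(t) = N0 × 2^((t - t0)/T), where T is the doubling period. The helper below is a hypothetical illustration that projects a transistor count from a starting point; the 4004's 2,300 transistors in 1971 come from this chapter, and the doubling period is a parameter so that the one-year and two-year rates can be compared.

```python
def projected_transistors(start_count, start_year, target_year, doubling_years):
    """Exponential growth: N(t) = N0 * 2 ** ((t - t0) / T)."""
    return start_count * 2 ** ((target_year - start_year) / doubling_years)


# Starting from the Intel 4004 (about 2,300 transistors, 1971).
print(projected_transistors(2300, 1971, 1975, 2))  # 9200.0
print(projected_transistors(2300, 1971, 1975, 1))  # 36800.0
```

Real chips do not track any single curve exactly; the projection only shows how strongly the result depends on the assumed doubling period.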

Moore's Law (chart)

Other Microprocessors
- Motorola family
  - From the 6809 through the 68040
- PowerPC
  - Joint venture between Apple, IBM, and Motorola
- RISC processors
  - DEC Alpha, MIPS, Sun SPARC, etc.

CISC vs. RISC
- CISC (Complex Instruction Set Computer)
  - CISC processors have a large, versatile instruction set that supports many complex addressing modes
  - Moves complexity from software to hardware
- RISC (Reduced Instruction Set Computer)
  - RISC processors have a small instruction set
  - Moves complexity from hardware to software

Microprocessor Performance
- Two main factors:
- Response time
  - The time between the start and completion of a task, also referred to as execution time
- Throughput
  - The total amount of work done in a given time

MIPS
- Million Instructions Per Second
  - $\text{MIPS} = \dfrac{\text{Instruction count}}{\text{Execution time (s)} \times 10^{6}} = \dfrac{\text{Instruction count}}{\text{Execution time } (\mu\text{s})}$
- It specifies performance inversely to execution time
- Faster machines have a higher MIPS rating
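A direct transcription of the MIPS formula, with hypothetical numbers: a program of 50 million instructions that runs in 0.5 s rates 100 MIPS. The second call illustrates the problem raised next: the same machine, running a different (hypothetical) program with slower instructions, gets a different rating for the same 0.5 s.

```python
def mips(instruction_count, execution_time_s):
    """MIPS = instruction count / (execution time in seconds x 10**6)."""
    return instruction_count / (execution_time_s * 1e6)


print(mips(50_000_000, 0.5))  # 100.0
print(mips(20_000_000, 0.5))  # 40.0
```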

Some Problems of MIPS
- Cannot compare computers with different instruction sets, since the instruction counts will certainly differ
- MIPS varies between programs on the same computer

iCOMP
- An index provided by Intel for comparing the performance of its 32-bit microprocessors
- Based on a variety of performance components representing integer mathematics, graphics, etc.
- Combines the results of a set of software application benchmarks

Chapter 2 Computer Codes, Programming, and Operating Systems
- Number Systems
- Computer Codes
- Programming
- Operating Systems

Number Systems
- Decimal: base 10
- Binary: base 2
- Octal: base 8
- Hexadecimal: base 16

Base Conversion: 2 ↔ 10
- Binary to decimal
  - $D = \sum_{i=0}^{n-1} b_i \times 2^i$
- Decimal to binary
  - Repeated subtraction
    - $D' = \sum_{i=0}^{m-1} b_i \times 2^i = D - 2^m$, where $m$ is the highest exponent with $b_m = 1$
    - $D \leftarrow D'$ and $m \leftarrow m'$ ($m'$: max exponent s.t. $b_{m'} = 1$); repeat until $D = 0$
  - Long division
    - $D' = \lfloor D/2 \rfloor$ with remainder $b_i$ ($i = 0, 1, \ldots$); $D \leftarrow D'$
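The three procedures above can be sketched in Python as follows. The function names are hypothetical, and bit strings are written most significant bit first; Python's built-in `bin()` would give the same results and is a useful cross-check.

```python
def binary_to_decimal(bits):
    """D = sum of b_i * 2**i, where b_0 is the rightmost bit of the string."""
    value = 0
    for i, b in enumerate(reversed(bits)):
        value += int(b) * 2 ** i
    return value


def decimal_to_binary_subtraction(d):
    """Repeated subtraction: remove the largest power of two still <= D."""
    if d == 0:
        return "0"
    top = d.bit_length() - 1          # exponent of the highest 1 bit
    bits = ["0"] * (top + 1)
    while d > 0:
        m = d.bit_length() - 1        # largest m with 2**m <= d, so b_m = 1
        bits[top - m] = "1"
        d -= 2 ** m                   # D <- D' = D - 2**m
    return "".join(bits)


def decimal_to_binary_division(d):
    """Long division: successive remainders of division by 2 are b_0, b_1, ..."""
    if d == 0:
        return "0"
    bits = []
    while d > 0:
        d, remainder = divmod(d, 2)   # D <- D' = floor(D / 2)
        bits.append(str(remainder))   # remainder is the next bit b_i
    return "".join(reversed(bits))


print(binary_to_decimal("1101"))           # 13
print(decimal_to_binary_subtraction(13))   # 1101
print(decimal_to_binary_division(13))      # 1101
```

Long division produces the bits from least to most significant, so they are reversed at the end; repeated subtraction finds them from the top down.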

MCS-51 Program Development (figure: Editor → .ASM → Assembler (X8051) → .OBJ → Linker (Link) → .HEX → Target; .SDT → Program Symbol Converter (CVTSYM) → .SYM → ICE)

Chapter 3 80x86 Processor Architecture
- 8086/88
- Segmented Memory
- 80386
- 80486
- Pentium Pro

The 8086 and 8088: Processor Model, Programming Model


8086 Processor Model: BIU + EU
- BIU
  - Memory & I/O address generation
- EU
  - Receives code and data from the BIU
    - Not connected to the system buses
  - Executes instructions
  - Saves results in registers, or passes them to the BIU for memory and I/O

8086 Processor Model (figure: BIU with address generation and bus control, segment registers CS, ES, SS, DS, instruction pointer IP, and the instruction queue; EU with registers AH/AL, BH/BL, CH/CL, DH/DL, BP, DI, SI, SP, the ALU, and the flags)

Fetch and Execution Cycle
- BIU + EU allows the fetch and execution cycles to overlap
  - 0. System boot: the instruction queue is empty
  - 1. IP ⇒ BIU ⇒ address bus, and IP++
  - 2. Mem[IP-1] ⇒ InstrQ[tail++]
  - 3a. InstrQ[head++] ⇒ EU ⇒ execution
  - 3b. Mem[IP++] ⇒ InstrQ[tail++]
    - Possibly multiple instructions
  - Repeat 3a + 3b (overlapped)
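A rough way to quantify the overlap is to treat the BIU and EU as an idealized two-stage pipeline. The sketch below assumes every instruction has the same fetch time and execute time (in clock cycles), that there are no jumps or operand memory accesses (the waiting conditions on the following slides), and that the queue never limits the BIU; the function names and cycle counts are hypothetical.

```python
def sequential_cycles(n, fetch, execute):
    """No overlap: each instruction is fetched, then executed."""
    return n * (fetch + execute)


def overlapped_cycles(n, fetch, execute):
    """Idealized BIU/EU overlap modeled as a two-stage pipeline.

    The first instruction pays fetch + execute; after that, one instruction
    completes every max(fetch, execute) cycles, because the slower unit
    sets the pace.
    """
    if n == 0:
        return 0
    return fetch + execute + (n - 1) * max(fetch, execute)


print(sequential_cycles(4, 2, 3))  # 20
print(overlapped_cycles(4, 2, 3))  # 14
```

When execution is slower than fetching, the BIU runs ahead and fills the queue, so the EU sets the pace; that is the situation described under "Long Instructions".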

Waiting Conditions: Memory Access
- BIU + EU: execute (almost) continuously without waiting
- Waiting condition: accessing memory locations not in the queue
  - BIU suspends instruction fetch
  - Issues the external memory address
  - Resumes instruction fetch and execution

Waiting Conditions: Jump
- Jump instruction is next
  - Instructions in the queue are discarded
  - EU waits for the instruction at the jump target to be fetched by the BIU
  - Resumes execution

Waiting Conditions: Long Instructions
- A long instruction is being executed
  - Instruction queue is full
  - BIU waits
  - Resumes instruction fetch after the EU pulls one or two bytes from the queue

BIU: 8088 vs. 8086
- The BIU is the major difference
- 8088:
  - Data bus: 8-bit (vs. 16-bit for the 8086)
  - Instruction queue: 4 bytes (vs. 6 bytes for the 8086)
- Only 30% slower than the 8086
  - If the queue is kept full

8086 Programming Model (figure: AH/AL, BH/BL, CH/CL, DH/DL; CS, ES, SS, DS; BP, DI, SI, SP; Flags H/Flags L; IP)

8086 Programming Model
- Data group:
  - AX (AH+AL): Accumulator
  - BX (BH+BL): Base
  - CX (CH+CL): Counter
  - DX (DH+DL): Data

8086 Programming Model
- Segment group:
  - CS: Code Segment
  - DS: Data Segment
  - ES: Extra Segment
  - SS: Stack Segment
- Segment registers:
  - Base address of particular segments

8086 Programming Model
- Pointer/index group (paired segment register in parentheses):
  - IP: Instruction Pointer (CS)
  - SI: Source Index (DS)
  - DI: Destination Index (ES)
  - SP: Stack Pointer (SS)
- Index registers:
  - Index (offset) or pointer relative to a base address
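The segment and pointer/index registers combine into a 20-bit physical address: the 16-bit segment value is shifted left by 4 bits (multiplied by 16) and the 16-bit offset is added. A minimal sketch with a hypothetical helper name follows; since the 8086 has only 20 address lines, the sum is truncated to 20 bits, so addresses just past 1 MB wrap around to the bottom of memory.

```python
def physical_address(segment, offset):
    """8086 real-mode address: segment * 16 + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350  (e.g. CS:IP)
print(hex(physical_address(0x1002, 0x0000)))  # 0x10020
print(hex(physical_address(0x1000, 0x0020)))  # 0x10020  (same byte, different pair)
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0      (wraps past 1 MB)
```

Because segments start every 16 bytes and each spans up to 64 KB, many segment:offset pairs name the same physical byte.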

8086 Flag Word
- Flag L (bits 7-0): SF ZF X AF X PF X CF
  - CF: Carry Flag
    - CF = 0: no carry (ADD) or borrow (SUB)
    - CF = 1: carry/borrow out of the high-order bit
  - PF: (Even) Parity Flag (1 if there is an even number of 1s in the low-order 8 bits of the result)
  - AF: Auxiliary Carry: carry/borrow on bit 3 (low nibble of AL)
  - ZF: Zero Flag (1: result is zero)
  - SF: Sign Flag (0: positive, 1: negative)

8086 Flag Word
- Flag H (bits 15-8): X X X X OF DF IF TF
  - TF: Trap Flag (single-step after the next instruction; cleared by the single-step interrupt)
  - IF: Interrupt-Enable Flag: enables maskable interrupts
  - DF: Direction Flag: auto-decrement (1) or auto-increment (0) of index registers in string operations
  - OF: Overflow Flag: the signed result cannot be expressed within the number of bits in the destination operand
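To make the flag definitions concrete, here is a sketch that computes the six status flags (CF, PF, AF, ZF, SF, OF) for an 8-bit addition, following the definitions on the two slides above. It is a simplified model of a byte ADD only, not a full 8086 emulator, and the function name is hypothetical.

```python
def add8_flags(a, b):
    """Add two 8-bit values and derive the status flags as ADD would set them."""
    a &= 0xFF
    b &= 0xFF
    full = a + b
    result = full & 0xFF
    flags = {
        "CF": int(full > 0xFF),                        # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),    # even number of 1s
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),     # carry out of bit 3
        "ZF": int(result == 0),                        # result is zero
        "SF": (result >> 7) & 1,                       # copy of bit 7
        # Signed overflow: both operands share a sign that the result lacks.
        "OF": int((a ^ result) & (b ^ result) & 0x80 != 0),
    }
    return result, flags


result, flags = add8_flags(0x7F, 0x01)   # +127 + 1 overflows a signed byte
print(hex(result), flags)
# 0x80 {'CF': 0, 'PF': 0, 'AF': 1, 'ZF': 0, 'SF': 1, 'OF': 1}
```

The example shows why CF and OF are separate: 0x7F + 0x01 produces no unsigned carry, but the signed result (+128) does not fit in a byte.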

Segmented Memory
- Linear vs. segmented
  - Linear addressing:
    - The entire memory is regarded as a whole
    - The entire memory space is available all the time
  - Segmented:
    - Memory is divided into segments
    - A process is limited to accessing designated segments at a given time

8086 Memory Organization
- Even and odd memory banks
  - 16-bit data bus: one two-byte access or two one-byte accesses
  - Allows the processor to work on bytes or on words (16-bit)
    - I/O operations are normally conducted in bytes
  - Can handle odd-length instructions
    - Single-byte instructions
    - Multiple-byte (and very long) instructions

8086 Memory Organization
- Memory space:
  - 20-bit address bus
  - 1 MB directly addressable as a linear space
- Memory banks:
  - Can read 16-bit data (512K words) from the even- and odd-addressed banks simultaneously
    - Requires two memory banks in parallel
    - The BHE (Bus High Enable) control line, together with A0, selects the even bank, the odd bank, or both

Memory Organization: Alignment
- Endianness:
  - One way to model a multi-byte CPU register:
    - AX = AH + AL
  - Two ways to store such operands in memory
- Big-endian CPUs (IBM 370, M68000 family, SPARC):
  - High-order byte first (HOBF)
  - Maps the highest-order byte of the register to the lowest (first) memory byte address
  - The operand's address is the address of its MSB
    - MOV R1, N: N is the first byte in memory and the MSB of the register

Memory Organization: Alignment
- Little-endian CPUs (DEC, Intel):
  - Low-order byte first (LOBF)
  - Maps the lowest-order byte of the register to the first memory byte
  - The operand's address is the address of its LSB (the first memory byte)
    - MOV AX, N: N is the first byte in memory and the LSB of the register
    - AL ← [N], AH ← [N+1]
- Configurable:
  - Can switch between big- and little-endian, or
  - Provides instructions that convert data between the two byte orderings (e.g., BSWAP on the 80486 for 32-bit values)
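The two byte orderings, and the little-endian MOV AX, [N] rule (AL from N, AH from N+1), can be reproduced with Python's `int.to_bytes`/`int.from_bytes`. The helper names below are hypothetical; `bswap32` only emulates what the 80486 BSWAP instruction does to a 32-bit register.

```python
def store_word(value: int, little_endian: bool = True) -> bytes:
    """Lay out a 16-bit register value as it would appear in memory."""
    return value.to_bytes(2, "little" if little_endian else "big")


def load_ax(memory: bytes, n: int) -> tuple[int, int, int]:
    """Model MOV AX, [N] on a little-endian CPU: AL <- [N], AH <- [N+1]."""
    al = memory[n]
    ah = memory[n + 1]
    return (ah << 8) | al, ah, al


def bswap32(value: int) -> int:
    """Reverse the four bytes of a 32-bit value (converts between orderings)."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


if __name__ == "__main__":
    print(store_word(0x1234).hex(), store_word(0x1234, little_endian=False).hex())
```

On an Intel CPU the word 0x1234 is stored as the bytes 34 12, while a big-endian CPU stores 12 34; `bswap32` maps one layout to the other.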

8086 Memory Organization
- Aligned operands:
  - Operand aligned at an even-byte (word/dword) boundary
  - Allows a single access to read/write one operand
    - Through an internal shift/swap mechanism, if necessary
- Misaligned words:
  - Word operand does not start at an even address
  - Needs 2 bus cycles to read/write the word (8086)
    - Issues two addresses to access the two even-aligned words containing the operand
    - Slower, but transparent to the programmer

8086 Memory Organization
- 8088:
  - Always 2 bus cycles for word operations, aligned or not
  - Because of its 8-bit external data bus
    - A single memory bank is sufficient
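The bank-select and cycle-count rules from the preceding memory-organization slides can be modelled in a few lines. BHE# is active low, and A0 = 0 enables the even bank. This is a sketch: the function names are hypothetical, and timing, wait states, and the internal byte swap on the data bus are not modelled.

```python
def cycles_8086(address: int, size: int) -> list[dict[str, int]]:
    """Bus cycles an 8086 needs for a byte (size=1) or word (size=2) access.

    Each cycle is described by A0 and BHE# (active low):
      A0=0, BHE#=0 -> both banks (whole word)
      A0=0, BHE#=1 -> even bank only
      A0=1, BHE#=0 -> odd bank only
    """
    if size == 1:
        odd = address & 1
        return [{"A0": odd, "BHE#": 0 if odd else 1}]
    if size == 2:
        if address % 2 == 0:
            # Aligned word: one cycle with both banks enabled
            return [{"A0": 0, "BHE#": 0}]
        # Misaligned word: low byte from the odd bank, then the high byte
        # from the even bank at address + 1
        return [{"A0": 1, "BHE#": 0}, {"A0": 0, "BHE#": 1}]
    raise ValueError("the 8086 transfers bytes or words")


def cycles_8088(size: int) -> int:
    """8-bit external bus: one bus cycle per byte, regardless of alignment."""
    return size
```

An aligned word costs one 8086 cycle, a misaligned word two, and any word always costs two on the 8088.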

8086 Memory Map
- Memory map: how the memory space is allocated
  - ROM area: boot, BIOS
  - RAM: OS/user applications and data
  - Unused
  - Reserved: for future hardware/software uses
  - Dedicated: for specific system interrupt and reset functions, etc.

Segment Registers
- 64 KB memory segments × 16 = 1 MB
- 16-bit offset within each segment
- CS, DS, ES, SS

Logical and Physical Addresses
- Physical address: 20 bits
- Logical address: 16-bit segment and 16-bit offset
  - Segments start on 16-byte boundaries
- Address translation:
  - e.g., CS:IP, physical address = CS × 16 + IP
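The translation on the slide above (segment × 16 + offset) is easy to check numerically. This sketch assumes 8086 behaviour, where a sum above 0xFFFFF wraps around to the bottom of memory; the table of default segments mirrors the pointer/index slide, and all names are hypothetical.

```python
# Default segment for each pointer register (DI uses ES for string instructions)
DEFAULT_SEGMENT = {"IP": "CS", "SP": "SS", "SI": "DS", "DI": "ES"}


def physical_address(segment: int, offset: int) -> int:
    """20-bit 8086 physical address = segment * 16 + offset, wrapping at 1 MB."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


def effective(regs: dict[str, int], pointer: str) -> int:
    """Physical address reached by a pointer register through its default segment."""
    return physical_address(regs[DEFAULT_SEGMENT[pointer]], regs[pointer])


if __name__ == "__main__":
    print(hex(physical_address(0x1234, 0x0010)))
```

Because segments start every 16 bytes, many logical addresses alias the same physical byte: 1000:0020 and 1002:0000 both reach 0x10020.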

80286
- First with Protected Mode
- Review of 286 Protected Mode …

80286
- Became available in 1982
- Used in the IBM AT computer (1984)
- 16-bit data bus
- Clock speed 25% faster than the 8088, throughput 5 times greater than the 8088
- 24-bit address bus (16 MB), vs. 20-bit (1 MB) on the 8086

80286: Real vs. Protected Modes
- Larger address space: 24-bit address bus
  - Real Mode vs. Protected Mode
- Real Mode:
  - Power-on default mode
  - Functions like an 8086: uses the 20 least significant address lines (1 MB)
  - Software compatible with the 8086
  - 16 new instructions (for Protected Mode management)
  - Faster 286: redesigned processor, plus a higher clock rate (6-8 MHz)

80286: Real vs. Protected Modes
- Protected Mode:
  - Multi-program environment
  - Each program has a predetermined amount of memory
  - Addressed via segment selectors (physical addresses invisible): 16 MB addressable
  - Multiple programs loaded at once (within their respective segments), protected from reads/writes by each other

80286: Real vs. Protected Modes
- Protected Mode:
  - Cannot be switched back to Real Mode without a processor reset, to avoid illegal access by switching back and forth between modes
- A faster 8086 only?
  - MS-DOS requires that all programs run in Real Mode

80386 Model
- Refines the 286 Protected Mode
- Expands to 32-bit registers
- New Virtual 8086 Mode

80386 Review

80386DX (aka 80386)
- Available in 1985, a major redesign of the 8086/286
  - Compatibility commitment through 2000
- 32-bit data and address buses (4 GB memory)
  - Real Address Mode: 1 MB visible, like 286 Real Mode
  - Protected Virtual Address Mode:
    - On-board MMU
    - Segmented tasks of 1 byte to 4 GB
      - Segment base, limit, and attributes defined by a descriptor register
    - Page swapping: 4K pages, up to 64 TB virtual memory space
    - Windows, OS/2, Unix/Linux

80386DX (aka 80386)
- Virtual 8086 mode (a special Protected Mode feature): permits multiple 8086 virtual machines, i.e., multitasking of programs that each behave as in Real Mode
  - Windows (multiple MS-DOS sessions)
- Clock rate:
  - Max. 40 MHz, 2 clock pulses per read/write bus cycle
  - External memory cache to avoid wait states
    - Fast SRAM
    - 93% hit rate with a 64K cache
- Compatible instructions (14 new)
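To see why a 93% hit rate matters, a common first-order model charges each hit the cache access time and each miss a full main-memory access. The 25 ns SRAM and 100 ns DRAM figures below are hypothetical, chosen only for illustration; real 386 systems counted clock cycles and wait states rather than nanoseconds.

```python
def average_access_time(hit_rate: float, t_cache: float, t_memory: float) -> float:
    """Simple model: hits cost t_cache, misses cost a full main-memory access."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * t_cache + (1.0 - hit_rate) * t_memory


if __name__ == "__main__":
    # Hypothetical timings in ns: 25 ns SRAM cache, 100 ns DRAM
    print(average_access_time(0.93, 25.0, 100.0))
```

Under these assumed timings the average is about 30 ns, much closer to the SRAM speed than to the DRAM speed, even though the cache is small.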

80386SX
- 80386SX (for the transition to 32 bits):
  - 16-bit data bus / 32-bit registers
  - 24-bit address bus

80386: Real vs. Protected Modes
- Larger address space: 32-bit address bus (4 GB)
  - Real Mode vs. Protected Mode (refined from the 286)
- Real Mode:
  - Power-on default mode
  - Functions like an 8086: (1) uses only the 20 least significant address lines (1 MB); (2) segmented memory retained (64K segments)
  - Software compatible with the 286
- New Real Mode features:
  - Access to the 32-bit register set
  - Two new segment registers: FS, GS

80386: Real vs. Protected Modes
- Protected Mode:
  - New addressing mechanism vs. Real Mode
  - Supports protection levels
  - Segment size: 1 byte to 4 GB (not a fixed 64K)
  - Segment register: pointer into a descriptor table
    - not a base address

80386: Real vs. Protected Modes
- Protected Mode:
  - Descriptor table (8 bytes per entry):
    - 32-bit base address of the segment
    - Segment size (limit)
    - Access rights
  - Memory address = base address (from the table) + offset (from the instruction)
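The 8-byte descriptor entry described above packs the 32-bit base and a 20-bit limit into non-contiguous fields, a layout that keeps it compatible with the 286 descriptor format. The sketch below decodes one entry and forms base + offset. It assumes an expand-up data segment and treats any offset above the limit as a fault, ignoring type bits and privilege checks; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    base: int      # 32-bit segment base address
    limit: int     # highest valid offset, after granularity scaling
    access: int    # access-rights byte (present bit, DPL, type)
    granular: bool # G bit: limit counted in 4K units


def decode_descriptor(raw: bytes) -> Descriptor:
    if len(raw) != 8:
        raise ValueError("descriptor entries are 8 bytes")
    # Limit bits 15-0 in bytes 0-1, bits 19-16 in the low nibble of byte 6
    limit = raw[0] | (raw[1] << 8) | ((raw[6] & 0x0F) << 16)
    # Base bits 15-0 in bytes 2-3, bits 23-16 in byte 4, bits 31-24 in byte 7
    base = raw[2] | (raw[3] << 8) | (raw[4] << 16) | (raw[7] << 24)
    granular = bool(raw[6] & 0x80)
    if granular:
        # G=1: the 20-bit limit counts 4K pages
        limit = (limit << 12) | 0xFFF
    return Descriptor(base, limit, raw[5], granular)


def linear_address(desc: Descriptor, offset: int) -> int:
    """Memory address = base (from the table) + offset (from the instruction)."""
    if offset > desc.limit:
        raise IndexError("offset beyond segment limit (general protection fault)")
    return (desc.base + offset) & 0xFFFFFFFF
```

A 4 GB flat segment uses limit 0xFFFFF with G = 1; a small segment such as one covering 64K uses G = 0 and a byte-granular limit.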

80386: Real vs. Protected Modes
- Protected Mode:
  - Paging mechanism:
    - Maps the 32-bit linear address (base + offset) to a physical address (page frame address + offset within the page)
    - 4K page frames in system memory
    - 64 TB of virtual memory
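The 386 paging unit splits a 32-bit linear address into a 10-bit page-directory index, a 10-bit page-table index, and a 12-bit offset within a 4K page. The sketch below walks a two-level structure represented with Python dicts. Real entries are 32-bit words carrying present, read/write, and user/supervisor bits, none of which are modelled here; the names are hypothetical.

```python
def split_linear(linear: int) -> tuple[int, int, int]:
    """Split a 32-bit linear address into directory index, table index, offset."""
    directory = (linear >> 22) & 0x3FF  # bits 31-22
    table = (linear >> 12) & 0x3FF      # bits 21-12
    offset = linear & 0xFFF             # bits 11-0, offset inside the 4K page
    return directory, table, offset


def translate(linear: int, page_directory: dict[int, dict[int, int]]) -> int:
    """Two-level walk: directory entry -> page table -> 4K page frame address."""
    d, t, off = split_linear(linear)
    page_table = page_directory.get(d)
    if page_table is None or t not in page_table:
        raise LookupError(f"page fault at {linear:#010x}")
    frame = page_table[t]  # physical address of the 4K page frame
    return frame + off
```

Only the upper 20 bits are translated; the low 12 bits pass through unchanged, which is why a page frame is always 4K-aligned.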

80386: Real vs. Protected Modes
- Protected Mode:
  - Protection mechanism:
    - Tasks/data/instructions are assigned a privilege level (PL); PL 0 is the most privileged, PL 3 the least
    - Tasks running at a less privileged level cannot access tasks or data segments at a more privileged level
    - Running programs are protected from one another

80386: Real vs. Protected Modes
- Two ways to run 8086 programs:
  - Real Mode
  - Virtual 8086 Mode
- Virtual 8086 Mode:
  - Runs multiple 8086 programs and other 386 (Protected Mode) programs independently
  - Each 8086 program sees 1 MB (mapped via paging to anywhere in the 4 GB space)
  - V86 and Protected Mode programs run simultaneously

80386 Processor Model

80386 Processor Model: BIU + CPU + MMU
- BIU:
  - Controls the 32-bit address and data buses
  - Keeps the instruction queue full (16 bytes)
- Address pipelining:
  - Address of the next memory location is output halfway through the current bus cycle
  - Gives more address decode time
  - Slower memory chips are acceptable
  - Makes it easier to keep up with the faster (2 CLK) bus cycle of the 386

80386 Processor Model: BIU
- Dynamic data bus sizing:
  - Switches between a 16- and 32-bit data bus on the fly
  - Accommodates external 16-bit memory cards or I/O devices
  - Adjusts bus timing to use only the least significant 16 bits

80386 Processor Model: BIU
- External memory:
  - 4 memory banks (4 × 8 = 32 bits)
  - BE0-BE3 for bank selection
  - Accesses a byte, word, or doubleword
    - Aligned operands: 1 bus cycle
    - Misaligned operands that cross a doubleword (4-byte) boundary: 2 bus cycles
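Which byte enables are active, and whether a second bus cycle is needed, follows from grouping the bytes of an access by the aligned doubleword they fall in. This sketch returns one list of active enables (lane numbers 0-3, for BE0-BE3) per bus cycle; the function name is hypothetical, and the order in which real hardware runs the two cycles of a split access is not modelled.

```python
def byte_enable_cycles(address: int, size: int) -> list[list[int]]:
    """Return, per bus cycle, the active byte lanes (BE0-BE3) for a 386 access."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    cycles: dict[int, list[int]] = {}
    for a in range(address, address + size):
        dword = a >> 2                               # aligned doubleword (A31-A2)
        cycles.setdefault(dword, []).append(a & 3)   # byte lane n -> BEn
    # One bus cycle per doubleword touched
    return list(cycles.values())
```

A word at address 0x101 stays inside one doubleword and needs a single cycle (BE1, BE2), while a word at 0x103 straddles a boundary and needs two.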

80386 Processor Model: CPU
- CPU = IU (instruction unit) + EU (execution unit)
  - Fetching and execution overlap
- IU:
  - Retrieves instructions from the queue
  - Decodes them
  - Stores them in the decoded-instruction queue
- EU: ALU + registers (32-bit)
  - Executes decoded instructions

80386 Processor Model: MMU
- Segmentation unit:
  - Real Mode: generates the 20-bit physical address
  - Protected Mode: stores base/size/rights in descriptor registers
    - Caches descriptor tables held in RAM
    - Faster operations
- Paging unit:
  - Determines physical addresses associated with active segments (divided into 4K pages)
  - Virtual memory support allows larger programs

80386 Programming Model
- General-purpose registers:
  - Data and address groups
  - Status and control flags
    - VM, RF, NT, IOPL
  - Segment group

80386 Programming Model
- Special-purpose registers

80386 Programming Model
- Memory management:
  - Segment descriptors:
    - Keep base, size, access rights
    - 3 types of tables: global (GDT), local (LDT), interrupt (IDT)
    - Addressing:
      - Index (into a table) + RPL
      - Base + offset (from the instruction)
  - Paging:
    - TLB
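The "index + RPL" value held in a segment register is a 16-bit selector. Besides the 13-bit index and the 2-bit RPL it carries a table-indicator (TI) bit that chooses the GDT or the LDT. The 64 TB virtual-memory figure quoted for the 386 is commonly derived from this layout: 2^13 entries × 2 tables × 4 GB per segment. The function and constant names below are hypothetical.

```python
def decode_selector(selector: int) -> dict[str, int]:
    """Split a 16-bit selector: index (bits 15-3), TI (bit 2), RPL (bits 1-0)."""
    return {
        "index": (selector >> 3) & 0x1FFF,  # entry number in the GDT or LDT
        "TI": (selector >> 2) & 1,          # 0 = GDT, 1 = LDT
        "RPL": selector & 3,                # requested privilege level
    }


def descriptor_offset(selector: int) -> int:
    """Each descriptor is 8 bytes, so its byte offset in the table is index * 8."""
    return decode_selector(selector)["index"] * 8


# 2**13 indexes x 2 tables (GDT/LDT) x 4 GB per segment = 2**46 bytes = 64 TB
VIRTUAL_SPACE = (2 ** 13) * 2 * (2 ** 32)
```

For example, selector 0x001B names GDT entry 3 (byte offset 24) with RPL 3.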

80386 Programming Model
- Protection (PL):
  - Task: CPL (current privilege level)
  - Instruction: RPL (requested privilege level)
  - Data segment: DPL (descriptor privilege level)
- Gates:
  - Special descriptors that allow less privileged tasks controlled access to more privileged tasks
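For data segments, the CPL/RPL/DPL comparison reduces to a single test: the numerically larger (less privileged) of CPL and RPL must not exceed the segment's DPL. This sketch covers only that data-segment rule; transfers to code segments and calls through gates follow different rules that are not modelled, and the function name is hypothetical.

```python
def can_load_data_segment(cpl: int, rpl: int, dpl: int) -> bool:
    """Data-segment access check: numerically, max(CPL, RPL) <= DPL.

    Level 0 is the most privileged and level 3 the least, so a task running
    at CPL 3 cannot load a data segment whose DPL is 0.
    """
    for level in (cpl, rpl, dpl):
        if level not in range(4):
            raise ValueError("privilege levels are 0-3")
    return max(cpl, rpl) <= dpl
```

The RPL term lets a privileged routine deliberately weaken its own access when it uses a selector supplied by a less privileged caller.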

80486 Review …

80486DX
- 1989: a polished 386, 6 new OS-level instructions
- Virtually identical to the 386 in terms of compatibility
- RISC design concepts:
  - Fewer clock cycles per operation; a single clock cycle for the most frequently used instructions
  - Max. 50 MHz
  - 5-stage execution pipeline
    - Portions of 5 instructions execute at once

80486DX
- Highly integrated:
  - On-board 8K memory cache
  - FPP (equivalent to the external 80387 coprocessor)
- Twice as fast as the 386 at any given clock rate
  - 20 MHz 486 ≈ 40 MHz 386

80486SX
- 80486SX:
  - NOT a 16-bit version for transition purposes
  - No coprocessor (FPU)
  - Internal 8K cache retained
  - For low-end applications
  - Max. 33 MHz only

80486DX2/DX4: OverDrive Chips
- Processor speed increased too fast
  - Redesigning the microcomputer for compatibility became harder
  - Solution: separate the internal speed from the external speed, so each can improve independently
- 80486DX2/DX4: the internal clock is twice/three times (NOT four times) the external clock, so the chip runs faster internally

80486DX2/DX4: OverDrive Chips
- System board design is independent of processor upgrades (less expensive components are allowed)
- Processor operates at its maximum data rate internally
  - Only slow accesses to external data run at the system board rate
  - Internal cache offsets the speed gap
- 486DX2-66: 66 MHz internal, 33 MHz external
- 486DX4-100: 100 MHz internal, 33 MHz external (3×)
- OverDrive sockets: for upgrading a 486DX/SX to a 486DX2/DX4 (with OverDrive socket pin-outs)

486 Processor Features
- 386 features:
  - Real/Protected Modes
  - Memory management
  - Privilege levels (PLs)
  - Registers and bus sizes
- New features:
  - 6 OS instructions
  - 8K/16K on-board cache (external on the 386)

486 Processor Features
- A better 386:
  - 5-stage instruction pipeline
    - IF/ID/EX => PF/D1/D2/EX/WB
    - PF: prefetch instructions into the queue (2 × 16 bytes)
    - D1: determine the opcode
    - D2: determine the memory addresses of operands
    - EX: execute the indicated operation
    - WB: update registers

486 Processor Features
- Reduced instruction cycle times:
  - 5-stage instruction pipeline (e.g., Fig. 3.18)
  - Instruction cycle times:
    - 8086: 4 CLK
    - 80386: 2 CLK
    - 80486: 1 CLK (close to RISC)
    - About 2× faster than the 386
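The 1-CLK figure for the 486 is a throughput number: each instruction still passes through all five stages, but a new one can enter PF every clock. This sketch prints an ideal timing diagram (no stalls, no cache misses, no dependencies) and compares cycle counts with and without overlap; the names are hypothetical.

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]


def pipeline_schedule(n_instructions: int) -> list[list[str]]:
    """Ideal in-order pipeline: instruction i enters PF in cycle i (no stalls)."""
    width = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = ["  "] * width  # blank cells are two spaces to align with stage names
        for s, stage in enumerate(STAGES):
            row[i + s] = stage
        rows.append(row)
    return rows


def total_cycles(n_instructions: int, pipelined: bool = True) -> int:
    """k + n - 1 cycles with overlap, k * n cycles without (k = number of stages)."""
    k = len(STAGES)
    return k + n_instructions - 1 if pipelined else k * n_instructions


if __name__ == "__main__":
    for i, row in enumerate(pipeline_schedule(3)):
        print(f"I{i + 1}: " + " ".join(row))
```

For 100 instructions the ideal pipeline needs 104 cycles instead of 500, i.e. close to one instruction per clock once the pipeline is full.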

486 Processor Model: 386 + FPU + Cache
- 386 units retained: BIU, CPU, MMU
- New: FPU (80387) + cache (8K/16K)
- FPU:
  - 387 on board
    - 0.8 µm process => transistor count increased (275K => over 1 million)
    - Simplified system board design
    - Faster FP operations

486 Processor Model: Cache
- Cache (8K, or 16K on the DX4):
  - Function: bridges the gap between processor speed and memory bandwidth
    - 8088: 4.77 MHz
    - 80486: 50 MHz
    - Pentium: 100 MHz
    - Pentium Pro: 133 MHz
    - Main memory (DRAM): relatively slow
  - Fast static RAM (SRAM) used as cache

486 Processor Model: Cache
- Organization:
  - 8K
  - 4-way set associative
    - 4 direct-mapped caches wired in parallel
    - Each block maps to a set of 4 lines
  - Unified: data and code in the same cache
  - Write-through: both the cache and memory are updated on write operations

486 Processor Model: Cache
- Locality (why do caches help?):
  - Spatial locality: e.g., arrays of data
  - Temporal locality: e.g., loops in code
- Operations on hit/miss
- 128-bit cache lines
  - 32 bits × N to capture locality (N = 4)
  - 128 bits = 16 bytes

486 Processor Model: Cache
- Mapping:
  - Memory => many-to-many => cache
  - Data RAM: holds memory data
  - Tag RAM: holds memory address information
- 3 methods of mapping:
  - Fully associative: a memory block can go to any cache line
  - Direct mapped: a memory block goes to one specific line
    - Thrashing
  - Set associative: a memory block goes to a set of cache lines

486 Processor Model: Cache
- Replacement policy (LRU):
  - Valid bits: are all 4 lines in use?
    - NO => use any unused line
    - YES => find one to replace
  - LRU bits: record which line is least recently used
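The cache slides (8K, 16-byte lines, 4-way set associative, LRU replacement) can be tied together in a tag-only simulator. With those parameters there are 8192 / (16 × 4) = 128 sets. This sketch models exact LRU as the slide describes (real hardware may approximate it) and ignores data storage and write-through; the class name is hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Tag-only model of a set-associative cache with true-LRU replacement."""

    def __init__(self, size_bytes: int = 8 * 1024, line_bytes: int = 16, ways: int = 4):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)  # 8K / (16 * 4) = 128 sets
        # One OrderedDict per set: tags ordered from least to most recently used
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        block = address // self.line_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)          # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:         # all lines valid: evict the LRU one
            lines.popitem(last=False)
        lines[tag] = None                   # fill an unused (or freed) line
        return False
```

Alternating between two addresses that map to the same set hits almost every time with 4 ways, but misses every time in a direct-mapped cache of the same size (ways=1), which is the thrashing case from the mapping slide.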

Pentium Review …

Pentium: Superscalar Processor
- Available in 1993
- 32-bit architecture
- Superscalar architecture:
  - Scaling: shrinking the etchable feature size to increase IC complexity (e.g., DRAM)
    - 10 microns (4004) to 0.13 microns (2001)
  - Superscalar: goes beyond simply scaling down
  - Two instruction pipelines, each with its own ALU, address generation circuitry, and data cache interface
  - Executes two different instructions simultaneously
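Issuing two instructions per clock is only possible when the second does not depend on the first. The sketch below groups a program into in-order issue slots of one or two instructions using a deliberately simplified, hypothetical rule: no read-after-write or write-after-write conflict on registers. The real Pentium applies additional pairing restrictions (for example, only certain simple instructions can pair) that are not modelled here.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str
    writes: frozenset[str]
    reads: frozenset[str]


def ins(text: str, writes: set[str], reads: set[str]) -> Instr:
    return Instr(text, frozenset(writes), frozenset(reads))


def can_pair(first: Instr, second: Instr) -> bool:
    # No read-after-write or write-after-write dependency between the two
    return not (first.writes & (second.reads | second.writes))


def issue(program: list[Instr]) -> list[tuple[Instr, ...]]:
    """Group a program into in-order issue slots of one or two instructions."""
    slots = []
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            slots.append((program[i], program[i + 1]))
            i += 2
        else:
            slots.append((program[i],))
            i += 1
    return slots
```

Two independent MOVs issue together, but ADD CX,AX must wait for ADD AX,BX because it reads the register the first one writes.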

Pentium: Superscalar Processor
- On-board cache:
  - Separate 8K data and code caches to avoid access conflicts
- FPP
- Instruction pipeline: 8 stages for floating-point instructions (integer instructions use 5)
- Optimized floating-point functions:
  - 5×-10× the FLOPS of the 486
  - 2× the performance of the 486 at any clock rate

Pentium: Superscalar Processor
- Compatibility with the 386/486:
  - Internal 32-bit registers and address bus
  - Data bus expanded to 64 bits for a higher data transfer rate
    - Compare the 8088-to-386SX transition

Pentium: Superscalar Processor
- Non-clone competition from AMD and Cyrix
- Development of brand identity by Intel

Pentium Pro Review …

Pentium Pro: Two Chips in One
- Became available in 1995
- Superscalar of degree 3
  - Can execute 3 instructions simultaneously
- Optimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp)
- Two separate silicon dies in the same package:
  - Processor: 0.35 µm, 5.5 million transistors
  - 256 KB (or 512 KB) Level 2 cache included in the package, 15.5 million transistors in a smaller area

Pentium Pro: Two Chips in One
- On-board Level 2 cache:
  - Simplifies system board design
  - Requires less space
  - Gains faster communication with the processor
- Internal (Level 1) cache: 8K
- Pentium Pro 133 ≈ 2 × Pentium 66 ≈ 4 × 486DX2-66

Pentium Pro: Dynamic Execution
- Dynamic execution: reduces idle processor time by predicting instruction behavior
  - Multiple branch prediction: looks as far as 30 instructions ahead to anticipate program branches
  - Data flow analysis: examines upcoming instructions to determine whether they are ready for processing or depend on other instructions, and determines optimal execution sequences
  - Speculative execution: executes instructions in a different order than they were entered; speculative results are stored until their final states can be determined
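Branch prediction can be illustrated with the classic per-branch 2-bit saturating counter. This is a swapped-in textbook technique, not the Pentium Pro's actual predictor, which is more elaborate. Counter values 0-1 predict "not taken" and 2-3 predict "taken"; the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counters: 0-1 predict not taken, 2-3 taken."""

    def __init__(self) -> None:
        self.counters: dict[int, int] = {}

    def predict(self, branch_addr: int) -> bool:
        # Unseen branches start at 1 (weakly not taken)
        return self.counters.get(branch_addr, 1) >= 2

    def update(self, branch_addr: int, taken: bool) -> None:
        c = self.counters.get(branch_addr, 1)
        self.counters[branch_addr] = min(c + 1, 3) if taken else max(c - 1, 0)


def accuracy(outcomes: list[bool], addr: int = 0x1000) -> float:
    """Fraction of a single branch's outcomes that the predictor guesses correctly."""
    p = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        correct += p.predict(addr) == taken
        p.update(addr, taken)
    return correct / len(outcomes)
```

For a loop branch that is taken nine times and then falls through, repeated three times, the predictor mispredicts only the very first iteration and each loop exit, so 26 of 30 predictions are correct. The two-bit hysteresis is what keeps a single loop exit from flipping the prediction for the next pass.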