1e16039e9890cc1ab1e092a93de72f0d.ppt
- Number of slides: 44
Computer Architecture (“MAMAS”, 234267), Spring 2014
Lecturer: Yoav Etsion
Reception: Mon 15:00, Fishbach 306-8
TAs: Nadav Amit, Gil Einziger, Franck Sala
Presentation based on slides by David Patterson, Avi Mendelson, Lihu Rappoport, Adi Yoaz and Dan Tsafrir
Computer Architecture 2014 – Introduction
Computer System Structure
Computer System Components: Archaic (Lecture 1, U. Weiser)
  [Diagram labels: CPU, cache, CPU-memory bus, main memory, bus adapter, I/O bus, I/O controllers (network + WLAN, disk, printer, scanner, keyboard, mouse, ...)]
Computer System Components: Yesterday
  [Diagram labels: CPU, cache, main memory, North Bridge, South Bridge, I/O controllers (network + WLAN, disk, printer, scanner, keyboard, mouse, ...)]
Computer System Components: Now
  [Diagram labels: CPU with MC+cache+G, main memory, South Bridge (printer, scanner, keyboard, mouse, ...; network + WLAN; disk/SSD)]
Classical Motherboard Diagram
- More to the “north” = closer to the CPU = faster
- [Diagram labels, north to south:]
  - CPU, cache, CPU bus
  - North Bridge: memory controller, on-board graphics, IOMMU; memory bus to DDR2 or DDR3 (channel 1, channel 2); PCI Express 2.0 to an external graphics card
  - South Bridge: IO controller (serial port, parallel port, floppy drive, keyboard, mouse), USB controller, SATA controller (hard disk, DVD drive), PCI (sound card with speakers, LAN adapter to LAN), PCI Express ×1
Course Focus
- Start from CPU (=processor)
  - Instruction set, performance
  - Pipeline, hazards
  - Branch prediction
  - Out-of-order execution
- Move on to Memory Hierarchy
  - Caching
  - Main memory
  - Virtual memory
- Move on to PC Architecture
  - System & chipset, DRAM, I/O, disk, peripherals
- End with some Advanced Topics
The Processor
Architecture vs. Microarchitecture
- Architecture = the processor features as seen by its user = interface
  - Instruction set, number of registers, addressing modes, …
- Microarchitecture = the manner in which the processor implements the architecture = implementation details
  - Cache size and structure, number of execution units, …
- Note: different processors with different u-archs can support the same arch
  - Example: ARM V8, ARM V9
- We will address both
Why Should We Care?
- Abstractions enhance productivity, so:
  - If we know the arch (=interface),
  - why should we care about the u-arch (=internals)?
- Same goes for arch
  - Just details for a programmer of a high-level language
Recent Processor Trends
  Source: http://www.scidacreview.org/0904/html/multicore.html
Well-Known Moore’s Law
  Graph taken from: http://www.intel.com/technology/mooreslaw/index.htm
The Story in a Nutshell
  [Figure, plotted series: transistors (1000s), clock speed (MHz), power (W), instructions/cycle (ILP)]
Took the Industry by Surprise
Dire Implications: Performance
Dire Implications: Sales
Dire Implications: Sales
Dire Implications: Programmers
Supercomputing: “Top 500 list”
Dire Implications: Supercomputing
Processor Performance
Metrics: IC, CPI, IPC
- CPUs work according to a clock signal
  - Clock cycle: measured in nanoseconds ($10^{-9}$ of a second)
  - Clock frequency = 1/|clock cycle|: in GHz ($10^{9}$ cycles/sec)
- Instruction Count (IC)
  - Total number of instructions executed in the program
- Cycles Per Instruction (CPI)
  - Average #cycles per instruction (in a given program):
    $\mathrm{CPI} = \frac{\#\text{cycles required to execute the program}}{\mathrm{IC}}$
  - IPC (= 1/CPI): instructions per cycle. Can be > 1; see the “Story in a Nutshell” slide
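The definitions above can be exercised with a minimal sketch. The cycle and instruction counts below are hypothetical values chosen for illustration, not measurements of any real processor, and the function names are not from the slides.

```python
def cpi(total_cycles: int, instruction_count: int) -> float:
    """Average cycles per instruction for one program run."""
    return total_cycles / instruction_count


def ipc(total_cycles: int, instruction_count: int) -> float:
    """Instructions per cycle, the reciprocal of CPI."""
    return instruction_count / total_cycles


def clock_frequency_ghz(clock_cycle_ns: float) -> float:
    """Frequency in GHz for a clock cycle given in nanoseconds (1/ns = GHz)."""
    return 1.0 / clock_cycle_ns


# Hypothetical run: 2,000,000 instructions executed in 1,000,000 cycles.
print(cpi(1_000_000, 2_000_000))   # 0.5
print(ipc(1_000_000, 2_000_000))   # 2.0
print(clock_frequency_ghz(0.5))    # 2.0
```

An IPC of 2.0 in this made-up run illustrates the slide's remark that IPC can exceed 1.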
Minimizing Execution Time
- CPU Time: time required to execute a program
  $\text{CPU Time} = \mathrm{IC} \times \mathrm{CPI} \times \text{clock cycle}$
- Our goal: minimize CPU Time (by reducing any of the components above)
  - Minimize clock cycle: increase GHz (processor design)
  - Minimize CPI: u-arch (e.g., more execution units)
  - Minimize IC: arch (e.g., SSE instructions)
    - SSE = Streaming SIMD Extensions (Intel)
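As a rough illustration of CPU Time = IC × CPI × clock cycle, the sketch below computes execution time for a hypothetical program and shows the effect of halving CPI. All numbers are assumptions made up for the example.

```python
def cpu_time_seconds(ic: int, cpi: float, clock_cycle_ns: float) -> float:
    """CPU time = IC x CPI x clock cycle, with the cycle given in nanoseconds."""
    return ic * cpi * clock_cycle_ns / 1e9


# Hypothetical program: 1e9 instructions, CPI of 1.5, 0.5 ns cycle (2 GHz).
baseline = cpu_time_seconds(1_000_000_000, 1.5, 0.5)
# Same program on a hypothetical u-arch that halves CPI.
better_cpi = cpu_time_seconds(1_000_000_000, 0.75, 0.5)

print(baseline)    # 0.75
print(better_cpi)  # 0.375
```

Because the three factors multiply, halving any one of them (IC, CPI, or the clock cycle) halves the CPU time in this model.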
Alternative Way to Calculate CPI
- $IC_i$ = #times an instruction of type $i$ is executed in the program
- IC = #instructions executed in the program = $\sum_i IC_i$
- $F_i$ = relative frequency of type-$i$ instructions = $IC_i / IC$
- $CPI_i$ = #cycles to execute a type-$i$ instruction
  - e.g.: $CPI_{add} = 1$, $CPI_{mul} = 3$
- #cycles required to execute the program: $\#cyc = \sum_i CPI_i \times IC_i$
- CPI: $CPI = \frac{\#cyc}{IC} = \frac{\sum_i CPI_i \times IC_i}{IC} = \sum_i F_i \times CPI_i$
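A small sketch can check that the two routes to CPI agree: total cycles divided by IC, and the frequency-weighted sum of per-type CPIs. The per-type cycle counts (add = 1, mul = 3) come from the slide; the instruction mix is a hypothetical assumption.

```python
def cpi_from_counts(counts: dict, cycles_per_type: dict) -> float:
    """CPI as (total cycles) / IC, where total cycles = sum of CPI_i * IC_i."""
    ic = sum(counts.values())
    total_cycles = sum(counts[t] * cycles_per_type[t] for t in counts)
    return total_cycles / ic


def cpi_from_frequencies(counts: dict, cycles_per_type: dict) -> float:
    """CPI as the sum of F_i * CPI_i, where F_i = IC_i / IC."""
    ic = sum(counts.values())
    return sum((counts[t] / ic) * cycles_per_type[t] for t in counts)


# Hypothetical mix: 750 adds and 250 multiplies.
mix = {"add": 750, "mul": 250}
cycles = {"add": 1, "mul": 3}

print(cpi_from_counts(mix, cycles))       # 1.5
print(cpi_from_frequencies(mix, cycles))  # 1.5
```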
Performance Evaluation: How?
- Performance depends on
  - Application
  - Input
- Mathematical analysis
Benchmarks
- Use benchmarks & measure how long it takes
  - Use real applications (=> no absolute answers)
- Preferably standardized benchmarks (+input), e.g.,
  - SPEC INT: integer apps
    - Compression, C compiler, Perl, text processing, …
  - SPEC FP: floating point apps (mostly scientific)
  - TPC benchmarks: measure transaction throughput (DB)
  - SPEC JBB: models wholesale company (Java server, DB)
- Sometimes you see FLOPS (“peak” or “sustained”)
  - Supercomputers (Top 500 list), against LINPACK
Evaluating Performance
- Use a performance simulator to evaluate the performance of a new feature / algorithm
  - Models the u-arch in great detail
  - Run 100’s of representative applications
- Produce the performance S-curve
  - Sort the applications according to the IPC increase
  - Baseline (0%) is the processor without the new feature
- [Figure: two example S-curves. Bad S-curve: negative outliers, positive outliers. Good S-curve: positive outliers, small negative outliers.]
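The S-curve construction can be sketched as follows, assuming per-application IPC values are already available from some simulator. The application names and IPC numbers are hypothetical placeholders, and sorting from worst to best is one reasonable reading of "sort according to the IPC increase".

```python
def s_curve(baseline_ipc: dict, feature_ipc: dict) -> list:
    """Return (app, IPC change in %) pairs sorted from worst to best change."""
    changes = {
        app: 100.0 * (feature_ipc[app] - baseline_ipc[app]) / baseline_ipc[app]
        for app in baseline_ipc
    }
    return sorted(changes.items(), key=lambda item: item[1])


# Hypothetical simulator results, without and with the new feature.
baseline = {"app_a": 1.0, "app_b": 2.0, "app_c": 1.0}
with_feature = {"app_a": 1.25, "app_b": 1.5, "app_c": 1.0}

for app, change in s_curve(baseline, with_feature):
    print(f"{app}: {change:+.1f}%")
# app_b: -25.0%
# app_c: +0.0%
# app_a: +25.0%
```

Plotting the sorted percentages against application rank gives the S shape; the entries at the two ends of the list are the negative and positive outliers.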
Amdahl’s Law
- Suppose we accelerate the computation such that
  - P = portion of computation we make faster
  - S = speedup experienced by the portion we improved
- For example
  - If an improvement can speed up 40% of the computation => P = 0.4
  - If the improvement makes the portion run twice as fast => S = 2
- Then overall speedup = $\frac{1}{(1 - P) + \frac{P}{S}}$
  - For the example: $\frac{1}{0.6 + 0.2} = 1.25$
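Amdahl's Law in its standard form, overall speedup = 1 / ((1 - P) + P / S), can be evaluated with a short sketch. The function name and the input validation are illustrative choices, not part of the slides.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work is made s times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# Slide example: 40% of the computation runs twice as fast.
print(round(amdahl_speedup(0.4, 2.0), 3))  # 1.25
# FP example: only 10% of the program runs twice as fast.
print(round(amdahl_speedup(0.1, 2.0), 3))  # 1.053
```

Even an unbounded S cannot push the overall speedup past 1 / (1 - P), which is why improving a rarely used portion pays off so little.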
Amdahl’s Law - Example
- FP operations improved to run 2x faster
  - S = 2, but…
  - P = 0.1: the improvement only affects 10% of the program
  - Speedup: $\frac{1}{(1 - 0.1) + \frac{0.1}{2}} = \frac{1}{0.95} \approx 1.053$
- Conclusion
  - Better to make the common case fast…
Amdahl’s Law – Parallelism
- When parallelizing a program
  - P = proportion of program that can be made parallel
  - 1 - P = inherently serial
  - N = number of processing elements (say, cores)
  - Speedup: $\frac{1}{(1 - P) + \frac{P}{N}}$
- Serial component imposes a hard limit: as $N \to \infty$, the speedup approaches $\frac{1}{1 - P}$
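The hard limit imposed by the serial component can be seen numerically with the parallel form of Amdahl's Law, speedup = 1 / ((1 - P) + P / N). The 90% parallel fraction below is a hypothetical value picked for illustration.

```python
def parallel_speedup(p: float, n: int) -> float:
    """Speedup on n processing elements when a fraction p is parallelizable."""
    return 1.0 / ((1.0 - p) + p / n)


def speedup_limit(p: float) -> float:
    """Upper bound on speedup as n grows without bound (requires p < 1)."""
    return 1.0 / (1.0 - p)


# Hypothetical program that is 90% parallelizable.
for cores in (2, 10, 100, 1000):
    print(cores, round(parallel_speedup(0.9, cores), 2))
print("limit", round(speedup_limit(0.9), 2))
```

With these assumed numbers the speedup on 10 cores is about 5.26, and no core count can exceed the bound of about 10.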
Instruction Set Design
- [Diagram: software / instruction set / hardware layers]
- The ISA is what the user & compiler see
- The HW implements the ISA
Considerations in ISA Design
- Instruction size
  - Long instructions take more time to fetch from memory
  - Longer instructions require a larger memory
    - Important for small (embedded) devices, e.g., cell phones
- Number of instructions (IC)
  - Reduce IC => reduce runtime (at a given CPI & frequency)
- Virtues of instruction simplicity
  - Simpler HW allows for: higher frequency & lower power
  - Cheaper HW
  - Optimization can be applied better to simpler code
Basing Design Decisions on Workload
- [Histogram: immediate argument’s size in bits. x-axis: immediate data bits (0 to 15); y-axis: 0% to 30%; series: Int. Avg., FP Avg.]
- 1% of data values > 16 bits
- Having 16 bits is likely good enough
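A workload-driven decision of this kind could be sketched as a coverage calculation: given a trace of immediate values, what fraction fits in a field of a given width? The trace below is a hypothetical sample, and treating immediates as signed two's-complement values is an assumption, not something the slide states.

```python
def bits_needed(value: int) -> int:
    """Bits required to hold value as a signed two's-complement immediate."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


def coverage(immediates: list, width_bits: int) -> float:
    """Fraction of immediates that fit in a field of width_bits bits."""
    fits = sum(1 for v in immediates if bits_needed(v) <= width_bits)
    return fits / len(immediates)


# Hypothetical trace of immediate values collected from some workload.
trace = [0, 1, -1, 4, 100, -128, 1000, 32767, 40000, 70000]

print(coverage(trace, 8))   # 0.6
print(coverage(trace, 16))  # 0.8
```

On a real trace, the width at which coverage flattens out suggests how large the immediate field needs to be.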
CISC Processors
- CISC: Complex Instruction Set Computer
  - Example: x86
  - The idea: a high-level machine language
    - Back when people programmed in assembly, CISC was supposedly easier
- Characteristics
  - Many instruction types, with many addressing modes
  - Some of the instructions are complex
    - Execute complex tasks
    - Require many cycles
  - ALU operations directly on memory (e.g., arr[j] = arr[i]+n)
    - Registers not used (and, accordingly, only a few registers exist)
  - Variable-length instructions
    - Common instructions get short codes to save code length
But it Turns Out…
Rank  Instruction              % of total executed
1     load                     22%
2     conditional branch       20%
3     compare                  16%
4     store                    12%
5     add                      8%
6     and                      6%
7     sub                      5%
8     move register-register   4%
9     call                     1%
10    return                   1%
      Total                    96%
Simple instructions dominate instruction frequency
CISC Drawbacks
- Complex instructions and complex addressing modes
  - => complicate the processor
  - => slow down the simple, common instructions
  - => contradict “Make the Common Case Fast”
- Compilers don’t use complex instructions / indexing methods
- Variable-length instructions are a real pain in the neck
  - Difficult to decode a few instructions in parallel
    - As long as an instruction is not decoded, its length is unknown
    - It is unknown where the instruction ends
    - It is unknown where the next instruction starts
  - An instruction may be longer than a cache line
    - Or even longer than a page (in theory)
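The decoding problem can be illustrated with a toy encoding. This is a hypothetical format invented for the sketch (the first byte of each instruction stores its total length in bytes), not the x86 encoding. Finding where instruction k starts requires decoding every instruction before it, whereas fixed-length boundaries follow from arithmetic alone.

```python
def variable_length_boundaries(code: bytes) -> list:
    """Start offsets under a toy encoding whose first byte is the length."""
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        length = code[offset]  # unknown until this instruction is inspected
        if length == 0:
            raise ValueError("zero-length instruction")
        offset += length       # only now is the next start known
    return starts


def fixed_length_boundaries(code: bytes, width: int = 4) -> list:
    """Start offsets when every instruction is exactly width bytes long."""
    return list(range(0, len(code), width))


# Toy program with instructions of 2, 1, 3 and 2 bytes.
program = bytes([2, 0xAA, 1, 3, 0xBB, 0xCC, 2, 0xDD])

print(variable_length_boundaries(program))  # [0, 2, 3, 6]
print(fixed_length_boundaries(program))     # [0, 4]
```

The loop in the variable-length case carries a dependency from one iteration to the next, which is the serial chain a parallel decoder has to break or speculate around.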
RISC Processors
- RISC: Reduced Instruction Set Computer
  - The idea: simple instructions enable fast hardware
- Characteristics
  - A small instruction set, with only a few instruction formats
  - Simple instructions
    - Execute simple tasks
    - Most of them require a single cycle (with pipeline)
  - A few indexing methods
  - Load/Store machine: ALU operations on registers only
    - Memory is accessed using Load and Store instructions only
    - Many orthogonal registers
    - Three-address machine: Add dst, src1, src2
  - Fixed-length instructions
- Examples: MIPS™, SPARC™, Alpha™, Power™
RISC Processors (Cont.)
- Simple arch => simple u-arch
  - Room for larger on-die caches
  - Smaller => faster
  - Easier to design & validate (=> cheaper to manufacture)
  - Shorten time-to-market
- Compiler can be smarter
  - Better pipeline usage
  - Better register allocation
  - More general-purpose registers (=> fewer memory refs)
- Existing RISC processors are not “pure” RISC
  - Various complex operations added along the way
Compilers and ISA
- Ease of compilation
  - Orthogonality:
    - No special registers
    - Few special cases
    - All operand modes available with any data type or instruction type
  - Regularity:
    - No overloading for the meanings of instruction fields
  - Streamlined:
    - Resource needs easily determined
- Register assignment is critical too
  - Easier if lots of registers
Still, CISC Is Dominant
- x86 (CISC) dominates the processor market
  - Not necessarily because it is CISC…
- Legacy
  - A vast amount of existing software
  - Intel, AMD, Microsoft benefit
  - But they invest a lot of money to compensate for the disadvantages
- CISC arch internally emulates RISC
  - Starting at Pentium II and K6, x86 processors translate CISC instructions into RISC-like operations internally
  - Inside, the core is a RISC machine
Software Specific Extensions
- Extend arch to accelerate exec of specific apps
- Example: SSE™ (Streaming SIMD Extensions)
  - 128-bit packed (vector) / scalar single-precision FP (4×32)
  - Introduced on Pentium® III in ’99
  - 8 new 128-bit registers (XMM0 – XMM7)
  - Accelerates graphics, video, scientific calculations, …
- Packed (128 bits): [x3 x2 x1 x0] + [y3 y2 y1 y0] = [x3+y3 x2+y2 x1+y1 x0+y0]
- Scalar (128 bits): [x3 x2 x1 x0] + [y3 y2 y1 y0] = [y3 y2 y1 x0+y0]
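The difference between the packed and scalar forms can be modeled in plain Python. This sketch only mimics the lane semantics drawn on the slide; it does not use real SIMD hardware or any SSE intrinsics. Lists are indexed by lane, so element 0 is x0, and the sample values are hypothetical.

```python
def add_packed(x: list, y: list) -> list:
    """Add all four lanes element-wise, as in the packed diagram."""
    return [a + b for a, b in zip(x, y)]


def add_scalar(x: list, y: list) -> list:
    """Add lane 0 only; lanes 1..3 are copied from y, as drawn on the slide."""
    return [x[0] + y[0]] + list(y[1:])


# Hypothetical register contents, listed from lane 0 to lane 3.
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]

print(add_packed(x, y))  # [11.0, 22.0, 33.0, 44.0]
print(add_scalar(x, y))  # [11.0, 20.0, 30.0, 40.0]
```

On hardware, the packed form performs the four additions with a single instruction, which is where the reduction in instruction count comes from.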
BACKUP
Compatibility
- Backward compatibility (HW responsibility)
  - When buying new hardware, it can run existing software:
    - i5 can run SW written for Core 2 Duo, Pentium 4, Pentium M, Pentium III, Pentium, 486, 386, 286
- Forward compatibility (SW responsibility)
  - For example: MS Word 2003 can open an MS Word 2010 doc
  - Commonly supports one or two generations behind
- Architecture-independent SW
  - Run SW on top of a VM that does JIT (just-in-time compilation): JVM for Java and CLR for .NET
  - Interpreted languages: Perl, Python


