
  • Number of slides: 22

COSC 3430 Computer Architecture
Lecture 09: Single cycle control and Multicycle Implementation
PH 3: Chapter 5, sections 5.4 and 5.5
COSC 3430 L09 Multicycle Implementation.1

Single cycle datapath control

Control
• Selecting the operations to perform (ALU, read/write, etc.)
• Controlling the flow of data (multiplexor inputs)
• Information comes from the 32 bits of the instruction
• Example: add $8, $17, $18
• Instruction format:
    000000  10001  10010  01000  00000  100000
    op      rs     rt     rd     shamt  funct
• ALU's operation is based on instruction type and function code
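As a minimal sketch of how the six R-format fields come out of the 32 instruction bits, the example word for add $8, $17, $18 can be split with shifts and masks. The function name `decode_r_type` and the constant `ADD_WORD` are illustrative names, not part of the lecture.

```python
def decode_r_type(word):
    """Split a 32-bit MIPS R-format word into its six fields."""
    return {
        "op":    (word >> 26) & 0x3F,  # bits 31-26
        "rs":    (word >> 21) & 0x1F,  # bits 25-21
        "rt":    (word >> 16) & 0x1F,  # bits 20-16
        "rd":    (word >> 11) & 0x1F,  # bits 15-11
        "shamt": (word >> 6) & 0x1F,   # bits 10-6
        "funct": word & 0x3F,          # bits 5-0
    }


# add $8, $17, $18 written field by field, as on the slide
ADD_WORD = 0b000000_10001_10010_01000_00000_100000

fields = decode_r_type(ADD_WORD)
print(fields)
```

The decoded values line up with the assembly operands: rs = 17, rt = 18, rd = 8, and funct = 32 (binary 100000) selects the add operation.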

Control
• e.g., what should the ALU do with this instruction
• Example: lw $1, 100($2)
    35   2    1    100
    op   rs   rt   16 bit offset
• ALU control inputs as developed in B.6
    0000   AND
    0001   OR
    0010   add
    0110   subtract
    0111   set-on-less-than
    1100   NOR
Not all of the above are used in this simplified datapath development
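The ALU control codes can be read as a small function table. The sketch below is a behavioural model only, assuming 32-bit operands and a signed comparison for set-on-less-than; the names `alu`, `to_signed`, and `MASK32` are illustrative. For lw $1, 100($2) the ALU is asked to add the base register and the offset.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (result, zero) for a 4-bit ALU control input."""
    a &= MASK32
    b &= MASK32
    if control == 0b0000:      # AND
        result = a & b
    elif control == 0b0001:    # OR
        result = a | b
    elif control == 0b0010:    # add
        result = (a + b) & MASK32
    elif control == 0b0110:    # subtract
        result = (a - b) & MASK32
    elif control == 0b0111:    # set-on-less-than (signed compare assumed)
        result = 1 if to_signed(a) < to_signed(b) else 0
    elif control == 0b1100:    # NOR
        result = ~(a | b) & MASK32
    else:
        raise ValueError("unused ALU control code")
    return result, int(result == 0)


# lw $1, 100($2): effective address = contents of $2 + 100 (base value is made up)
address, zero = alu(0b0010, 0x1000, 100)
```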

Control
• Must describe hardware to compute 4-bit ALU control input from
  - given instruction type
      00 = lw, sw
      01 = beq
      10 = arithmetic
  - function code for arithmetic
• ALUOp is a 2-bit output computed from instruction type
• Describe it using a truth table (can turn into gates)

Single cycle with control

Settings of the control lines from the opcode
    INSTRUCTION   OPCODE   Binary
    R-type        0        000000
    Beq           4        000100
    Lw            35       100011
    Sw            43       101011

Control. Generation of the control signals from opcode
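A minimal sketch of the opcode-to-control-line mapping follows. The signal values are an assumption: they follow the standard textbook single-cycle MIPS design and are not listed in the text here. Don't-care entries are modelled as `None`, and the names `CONTROL` and `main_control` are illustrative.

```python
X = None  # don't care: the signal has no effect for this instruction

# Keys are the 6-bit opcodes (Instr[31-26]); values are assumed textbook settings.
CONTROL = {
    0b000000: dict(name="R-type", RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    0b100011: dict(name="lw", RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    0b101011: dict(name="sw", RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    0b000100: dict(name="beq", RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}


def main_control(opcode):
    """Look up the control-line settings for a 6-bit opcode."""
    return CONTROL[opcode]


lw_signals = main_control(35)
```

A lookup table is used here because the main control is purely combinational: the outputs depend only on the six opcode bits.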

Truth table for the ALU 4-bit operation
We show an implementation of this truth table with gates on the next slide. This could be considered a 3-bit output since the MSB is always 0 for our problem.

Control. Generating the 4-bit ALU operation from ALUOp0 and ALUOp1 and the function code (bits 0-5)
Example: Suppose ALUOp1 = 1 and F1 = 1. All others = 0, except ALUOp0 is X. Output should be 0110.
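The gate network can be sketched as three Boolean equations. The equations below are one commonly used minimized form of the truth table, stated here as an assumption. They rely on the don't-cares, in particular on ALUOp = 11 never occurring for real instructions. The function name `alu_control` is illustrative.

```python
def alu_control(aluop1, aluop0, funct):
    """Compute the 4-bit ALU operation from ALUOp and the 6-bit function code."""
    f0 = funct & 1
    f1 = (funct >> 1) & 1
    f2 = (funct >> 2) & 1
    f3 = (funct >> 3) & 1

    op3 = 0                           # MSB is always 0 for this instruction subset
    op2 = aluop0 | (aluop1 & f1)      # subtract / slt / beq
    op1 = (1 - aluop1) | (1 - f2)     # cleared only for R-type AND / OR
    op0 = aluop1 & (f3 | f0)          # OR / slt
    return (op3 << 3) | (op2 << 2) | (op1 << 1) | op0


# Slide example: ALUOp1 = 1, F1 = 1, all other inputs 0, ALUOp0 is a don't-care
example = alu_control(1, 0, 0b000010)
print(format(example, "04b"))
```

Trying both values of ALUOp0 in the slide example gives the same output, which is what the X in the truth table expresses.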

Single Cycle Disadvantages & Advantages
• Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
  - especially problematic for more complex instructions like floating point multiply
[Timing diagram: Clk with Cycle 1 occupied by lw and Cycle 2 occupied by sw followed by "Waste"]
• May be wasteful of area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle
but
• Is simple and easy to understand

Single Cycle Implementation (an example)
• Calculate cycle time assuming negligible delays except: memory (200 ps), ALU and adders (100 ps), register file access (50 ps)
• Assuming only the above delays, which of the following implementations would be faster and by how much?
  1. An implementation in which every instruction operates in 1 clock cycle of a fixed length, or
  2. An implementation where every instruction executes in 1 clock cycle using a variable-length clock, which for each instruction is only as long as it needs to be. (Such an approach is not practical, but it will allow us to see what is being sacrificed when all the instructions must execute in a single clock of the same length.)

Example continued
• To compare performance, assume the following instruction mix: 25% loads, 10% stores, 45% ALU instructions, 15% branches, and 5% jumps.
• First compare the CPU execution times using the equation
    CPU time = Instr count × CPI × Clock cycle time, so
    CPU time = IC × Clock cycle time, since CPI = 1 for both cases

Steps and times for various instructions
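The per-class times can be sketched by summing the delays of the units each instruction class passes through. The unit lists below are an assumption based on the usual textbook breakdown (instruction fetch, register read, ALU, data memory, register write); the delays are the ones given for the example. The resulting totals agree with the 600, 550, 400, 350, and 200 ps figures used in the calculation.

```python
# Delays from the example: memory 200 ps, ALU/adders 100 ps, register file 50 ps
DELAY_PS = {"mem": 200, "alu": 100, "reg": 50}

# Assumed sequence of major functional units used by each instruction class
UNITS_USED = {
    "R-type": ["mem", "reg", "alu", "reg"],         # fetch, read, ALU, write back
    "lw":     ["mem", "reg", "alu", "mem", "reg"],  # fetch, read, address, load, write back
    "sw":     ["mem", "reg", "alu", "mem"],         # fetch, read, address, store
    "beq":    ["mem", "reg", "alu"],                # fetch, read, compare
    "j":      ["mem"],                              # fetch only
}

latency_ps = {
    name: sum(DELAY_PS[unit] for unit in units)
    for name, units in UNITS_USED.items()
}

for name, total in latency_ps.items():
    print(f"{name:7s} {total} ps")
```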

Example continued
• The clock cycle for a machine with a single clock cycle time for all instructions will be determined by the longest instruction, which is 600 ps, so CPU time = 600 ps × IC.
• A machine with a variable clock cycle time has an average time per instruction of
    CPU clock cycle = 600(25%) + 550(10%) + 400(45%) + 350(15%) + 200(5%) = 447.5 ps.
• Since the variable clock has a shorter average clock cycle, its CPU time = 447.5 ps × IC. The performance improvement is then 600/447.5 = 1.34.
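The arithmetic can be checked with a few lines. This is only a sketch of the calculation in the example; the dictionary names are illustrative, and the mix is kept as whole percentages so the weighted sum stays exact.

```python
# Time each instruction class needs (ps) and its share of the instruction mix (%)
LATENCY_PS = {"lw": 600, "sw": 550, "R-type": 400, "beq": 350, "j": 200}
MIX_PERCENT = {"lw": 25, "sw": 10, "R-type": 45, "beq": 15, "j": 5}

# Fixed clock: every instruction pays for the slowest one
fixed_cycle_ps = max(LATENCY_PS.values())

# Variable clock: each instruction pays only for what it uses
variable_cycle_ps = sum(
    LATENCY_PS[name] * MIX_PERCENT[name] for name in MIX_PERCENT
) / 100

# IC and CPI = 1 are the same on both machines, so they cancel in the ratio
speedup = fixed_cycle_ps / variable_cycle_ps

print(fixed_cycle_ps, variable_cycle_ps, round(speedup, 2))
```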

Example continued
• Hence the variable clock implementation is 1.34 times faster.
• Unfortunately, implementing a variable speed clock for each instruction class is extremely difficult, and the overhead for such an approach could be larger than any advantage gained. As we will later see, an alternative is to use a shorter clock cycle that does less work and then vary the number of clock cycles for the different instruction classes.
• The penalty for using a single-cycle design with a fixed clock cycle is significant, but might be acceptable for the small instruction set we are using. Early computers did exactly this. However, implementing a floating point unit, for example, or an ISA with more complex instructions, wouldn’t work well at all.

Example continued
• Because we must assume the clock cycle is equal to the worst-case delay for all instructions, we can’t use implementations that reduce the delay of the common case unless they also improve the worst-case time.
• A single cycle implementation thus violates one of our key design principles of making the common case fast.

Single Cycle Datapath with Control Unit
[Figure: single cycle datapath with control unit. Elements: PC, Instruction Memory (Read Address, Instr[31-0]), Control Unit (Instr[31-26]), Register File (Read Addr 1 from Instr[25-21], Read Addr 2 from Instr[20-16], Write Addr, Write Data, Read Data 1, Read Data 2), Sign Extend (Instr[15-0], 16 to 32 bits), ALU control (Instr[5-0]), ALU (zero, ovf), Data Memory (Address, Write Data, Read Data), Add (PC + 4), Shift left 2, branch-target Add. Control signals: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite, PCSrc.]

Where we are headed
• Single Cycle Problems:
  - What if we had a more complicated instruction like floating point?
  - The clock cycle is set by the longest instruction execution time.
  - Even with our simplified implementation, the clock cycle time will be determined by the time for a load instruction, which uses the instruction memory, register file, the ALU, data memory, and the register file again.
• One Solution: A multicycle datapath
  - use a “smaller” cycle time
  - have different instructions take different numbers of cycles

Multicycle Datapath Approach
• Let an instruction take more than 1 clock cycle to complete
  - Break up instructions into steps where each step takes a cycle while trying to
      - balance the amount of work to be done in each step
      - restrict each cycle to use only one major functional unit
  - Not every instruction takes the same number of clock cycles
• In addition to faster clock rates, multicycle allows functional units that can be used more than once per instruction as long as they are used on different clock cycles; as a result
  - only need one memory, but only one memory access per cycle
  - need only one ALU/adder, but only one ALU operation per cycle
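Since instructions now take different numbers of cycles, the average CPI depends on the instruction mix. The sketch below assumes the cycle counts of the usual textbook multicycle MIPS design (5 for loads, 4 for stores and ALU instructions, 3 for branches and jumps); these counts are an assumption and are not stated on the slide. The mix is the one from the single-cycle example.

```python
# Assumed clock cycles per instruction class in a multicycle design
CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3, "j": 3}

# Instruction mix from the single-cycle example, in percent
MIX_PERCENT = {"lw": 25, "sw": 10, "R-type": 45, "beq": 15, "j": 5}

# Average CPI is the mix-weighted number of cycles
average_cpi = sum(CYCLES[name] * MIX_PERCENT[name] for name in CYCLES) / 100

print(average_cpi)
```

With a different mix or different step counts the average changes, which is why CPI is no longer fixed at 1 as it was in the single-cycle comparison.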

Multicycle Datapath Approach, con’t
• At the end of a cycle
  - Store values needed in a later cycle by the current instruction in an internal register (not visible to the programmer). All (except IR) hold data only between a pair of adjacent clock cycles (no write control signal needed)
      IR – Instruction Register
      MDR – Memory Data Register
      A, B – regfile read data registers
      ALUout – ALU output register
  - Data used by subsequent instructions are stored in programmer visible registers (i.e., register file, PC, or memory)
[Figure: multicycle datapath with PC, a single Memory (Address, Read Data (Instr. or Data), Write Data), IR, MDR, Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data, Read Data 1, Read Data 2), A, B, ALU, and ALUout]
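To illustrate how the internal registers carry values from one cycle to the next, here is a rough sketch of lw $1, 100($2) stepping through a multicycle datapath. The five-step sequence is an assumption based on the usual textbook breakdown; the function name, the dictionary-based memory, and the base address are illustrative only.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit immediate to a Python integer."""
    return value - 0x10000 if value & 0x8000 else value


def run_lw_multicycle(memory, regs, pc):
    """Execute one lw, one step per cycle; returns (new_pc, cycles_used)."""
    # Cycle 1: instruction fetch, result latched in IR
    ir = memory[pc]
    pc = pc + 4
    # Cycle 2: decode and register fetch, results latched in A and B
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    a = regs[rs]
    b = regs[rt]  # read in every instruction, but not needed by lw
    # Cycle 3: address computation, result latched in ALUout
    alu_out = (a + sign_extend16(ir & 0xFFFF)) & 0xFFFFFFFF
    # Cycle 4: memory read, result latched in MDR
    mdr = memory[alu_out]
    # Cycle 5: write back to the programmer-visible register file
    regs[rt] = mdr
    return pc, 5


# lw $1, 100($2): op = 35, rs = 2, rt = 1, offset = 100
LW_WORD = (35 << 26) | (2 << 21) | (1 << 16) | 100

memory = {0: LW_WORD, 0x1064: 1234}  # one memory holds instruction and data
regs = [0] * 32
regs[2] = 0x1000                     # made-up base address

new_pc, cycles = run_lw_multicycle(memory, regs, 0)
```

Note that the single `memory` dictionary is read in cycle 1 (instruction) and again in cycle 4 (data), which is what lets the multicycle design get by with one memory.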

Next Lecture and Reminders
• Next lecture
  - MIPS multicycle datapath and control
      - Reading assignment: PH, Chapter 5.5