- Number of slides: 19
Reducing the Complexity of the Register File in Dynamic Superscalar Processors
Rajeev Balasubramonian, Sandhya Dwarkadas, and David H. Albonesi
In Proceedings of the 34th Annual ACM/IEEE International Symposium on Microarchitecture, pp. 237-248, MICRO 2001.
Nathir Rawashdeh, University of Massachusetts, Amherst
Low Power Architecture, Professor Moritz
Note: This presentation is, to a large extent, a reproduction of slides created by the School of Electrical Engineering at Korea University. I have altered them and added new slides to better suit my audience. Nathir Rawashdeh (3 November 2003)
Contents
- Motivation
- Reduce register file size: Two-Level Register File (1st Technique)
- Reduce port complexity: Banked Organization (2nd Technique)
- Evaluation
  - Two-Level Register File Evaluation
  - Banked Register File Evaluation
  - Combining the Two Techniques
Motivation
- Modern high-performance processors use an out-of-order superscalar core to dynamically extract instruction-level parallelism (ILP) from running applications.
  - The core examines a large window of in-flight instructions to find and issue multiple ready, independent instructions every cycle.
  - A larger instruction window:
    - achieves better ILP;
    - requires a larger register file, issue queue, and reorder buffer.
  - A large multi-ported register file can potentially compromise clock cycle time in future wire-limited technologies.
- Two methods are suggested in this paper:
  - a two-level register file organization, to reduce register file size requirements;
  - a banked organization, to reduce port complexity.
Motivation
- Conventional register file organization
  - Logical registers (lr) are renamed to physical registers (pr).
  - At instructions 1 and 2, lr5 is renamed to pr18.
  - The branch at instruction 3 is predicted not taken, so pr18 must be kept in case of a misprediction. The write to lr5 at instruction 5 must be allocated a new register, pr27.
  - pr18 can only be released to the free list after instruction 5 commits. From then on, lr5 is mapped to pr27.
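The renaming sequence on this slide can be sketched in a few lines of Python. This is a toy model, not the processor's actual rename logic: the class name `RenameMap`, its methods, and the free-list contents are assumptions made for illustration. Only the lr5, pr18 and pr27 numbers come from the slide.

```python
class RenameMap:
    """Toy rename table: logical register -> physical register (illustrative only)."""

    def __init__(self, free_list):
        self.free = list(free_list)  # physical registers available for allocation
        self.map = {}                # current logical -> physical mapping

    def rename_dest(self, lr):
        """Allocate a new physical register for a write to logical register lr.

        Returns (new_pr, old_pr). old_pr stays allocated until the writing
        instruction commits, so it can be restored after a mispredict.
        """
        old_pr = self.map.get(lr)
        new_pr = self.free.pop(0)
        self.map[lr] = new_pr
        return new_pr, old_pr

    def commit(self, old_pr):
        """When the overwriting instruction commits, release the previous mapping."""
        if old_pr is not None:
            self.free.append(old_pr)


rm = RenameMap(free_list=[18, 27, 30])   # hypothetical free list
pr_first, _ = rm.rename_dest(5)          # earlier write to lr5 receives pr18
pr_second, stale = rm.rename_dest(5)     # later write to lr5 receives pr27
free_before_commit = list(rm.free)       # pr18 is still held at this point
rm.commit(stale)                         # the later write commits, pr18 is freed
```

The point of the sketch is the gap between the second rename and the commit: during that time pr18 occupies a register even though it may have no remaining readers, which is the inefficiency the two-level organization targets.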
Two-Level Register File (1st Technique)
- Level One (L1) register file: holds register values that still have potential readers.
- Level Two (L2) register file: keeps the other register values, which are waiting to be released after their instructions commit.
- Effects:
  - Reduced register file access time, because only a smaller portion (L1) of the register file is on the critical path.
  - More energy is needed to copy register contents between L1 and L2.
Two-Level Register File
- Microarchitectural changes
  - Assumption: 8-way issue processor.
  - During rename, logical registers are mapped only to L1 physical registers; L2 registers are hidden from the rename process.
Two-Level Register File
- Usage table
  - Monitors the usage statistics of each L1 physical register.
  - Information maintained per register:
    - Pending consumer counter: keeps track of the number of pending consumers of the value.
      - Increment: during rename, each instruction that sources the register increments the counter.
      - Decrement: the same instruction decrements the counter when it issues, or when it is squashed after a mispredict.
    - Overwrite bit (single bit): set when the physical register is no longer the latest mapping for its logical register (the lr's mapping has changed to a different pr).
    - Result-written bit: indicates whether a result has been written into the physical register.
    - Sequence number 1: the sequence number of the branch immediately following the instruction that writes to this physical register.
    - Sequence number 2: the sequence number of the branch immediately preceding the next instruction that writes to the same logical register.
    - Sequence number counter size: log2(ROB size) bits.
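A minimal Python sketch of one usage-table entry may help make the bookkeeping concrete. The field and method names are illustrative, and the condition in `movable_to_l2` is an assumption inferred from the slides (value written, mapping overwritten, no pending consumers), not a quotation of the paper's exact rule. The ROB size of 128 is likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UsageEntry:
    """Usage-table state for one L1 physical register (illustrative names)."""
    pending_consumers: int = 0    # renamed readers that have not yet issued
    overwritten: bool = False     # no longer the latest mapping of its lr
    result_written: bool = False  # the value has been produced

    def on_rename_source(self):
        # An instruction that sources this register was renamed.
        self.pending_consumers += 1

    def on_issue_or_squash(self):
        # That instruction issued, or was squashed after a mispredict.
        self.pending_consumers -= 1

    def movable_to_l2(self):
        # ASSUMPTION: copy condition inferred from the slides, not quoted.
        return (self.result_written and self.overwritten
                and self.pending_consumers == 0)


def seq_counter_bits(rob_size):
    """Bits needed for a branch sequence number: ceil(log2(rob_size))."""
    return (rob_size - 1).bit_length()


entry = UsageEntry()
entry.on_rename_source()         # one consumer is renamed
entry.result_written = True
entry.overwritten = True
blocked = entry.movable_to_l2()  # a consumer is still pending
entry.on_issue_or_squash()       # the consumer issues
ready = entry.movable_to_l2()    # nothing left to read the value from L1
bits = seq_counter_bits(128)     # 128 is a hypothetical ROB size
```

The two sequence-number fields are left out of the entry here because they only matter for mispredict recovery, which the copy list handles.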
Two-Level Register File
- Single "L2 ID valid" bit
  - Added to each ROB entry.
  - Indicates that the destination register ID in that entry corresponds to an L2 register.
Two-Level Register File
- Copy list
  - Keeps track of L1-to-L2 copies for recovery from a branch mispredict.
  - Information maintained for each L2 entry:
    - The L1 physical register name that earlier contained the value.
    - The sequence number of the branch immediately following the instruction that writes to this physical register.
    - The sequence number of the branch immediately preceding the next instruction that writes to the same logical register.
  - The two stored branch sequence numbers delimit the live period of a physical register value, that is, the period during which instructions sourcing this value are dispatched.
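One way to read the live-period idea is as an interval test on branch sequence numbers. The sketch below is an interpretation, not the paper's recovery algorithm: it assumes that a value must be brought back to L1 when the mispredicted branch falls inside the stored interval, because the writer then survives the squash while the overwriting instruction does not. All names are hypothetical, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyListEntry:
    """One copy-list record per L2 entry (field names are illustrative)."""
    l2_index: int
    l1_name: int                 # L1 physical register that held the value
    seq_after_writer: int        # branch right after the writing instruction
    seq_before_overwriter: int   # branch right before the next writer of the lr


def entries_to_restore(copy_list, mispredicted_seq):
    """Return entries whose live period contains the mispredicted branch.

    ASSUMPTION: sequence numbers increase monotonically (no wraparound).
    """
    return [e for e in copy_list
            if e.seq_after_writer <= mispredicted_seq <= e.seq_before_overwriter]


copy_list = [
    CopyListEntry(l2_index=0, l1_name=18, seq_after_writer=3, seq_before_overwriter=3),
    CopyListEntry(l2_index=1, l1_name=22, seq_after_writer=5, seq_before_overwriter=9),
]
restore_at_3 = [e.l1_name for e in entries_to_restore(copy_list, 3)]
restore_at_7 = [e.l1_name for e in entries_to_restore(copy_list, 7)]
restore_at_12 = [e.l1_name for e in entries_to_restore(copy_list, 12)]
```

The first entry mirrors the earlier renaming example: the only branch between the two writes to lr5 is branch 3, so only a mispredict of that branch makes the old value live again.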
Minimally-Ported Banked Register File (2nd Technique)
- Motivation
  - The large number of register file ports in a wide-issue processor:
    - increases complexity, and hence power consumption;
    - increases register file access time, which will limit clock speed in future wire-limited technologies.
  - The number of ports required on average is far lower than the actual port count, which is sized for the worst case. Reasons:
    - Many operands are read off the bypass network, not from the register file.
    - Many instructions have only a single register operand.
    - A number of instructions produce results that are not written to the register file (branches, stores, and the effective-address computation part of a load or store).
Minimally-Ported Banked Register File
Evaluation
- Metrics used to evaluate the two-level and banked register file organizations:
  - IPC: instructions per cycle.
  - IPS: instructions per second = IPC / access time.
  - Assuming register file access time is the bottleneck (so that it sets the cycle time), IPS is a better measure than IPC.
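The IPS metric is simple enough to check by hand, but a short Python sketch shows why it can rank designs differently from IPC. The function name and both design points below are hypothetical numbers chosen for illustration; they are not results from the paper.

```python
def ips(ipc, access_time_ns):
    """Instructions per second, assuming cycle time equals register file access time."""
    cycle_time_s = access_time_ns * 1e-9
    return ipc / cycle_time_s


# Hypothetical design points (NOT measurements from the paper).
big_rf = ips(ipc=1.65, access_time_ns=1.0)    # larger file: higher IPC, slower access
small_rf = ips(ipc=1.60, access_time_ns=0.8)  # smaller file: lower IPC, faster access
speedup = small_rf / big_rf
```

With these made-up numbers the smaller register file loses about 3% in IPC yet comes out ahead in IPS, because its shorter access time allows a faster clock.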
Two-Level Register File Evaluation
- IPC (single-level vs. two-level register file)
  - [Figure: IPC of the single-level and two-level register files; plot callouts: 1.63, 1.65]
  - Gap between the two lines: the addition of L2 frees up more L1 registers.
  - The two-level organization has an IPC of 1.67 with just 80 L1 registers (and 80 L2 registers).
  - The single-level organization requires as many as 140 registers to attain an IPC of 1.65.
  - So, out of 140 physical registers, only about 80 are active at any given time. The remaining 60 have no consumers unless there is a misprediction or exception, and they can be moved away to the L2.
Two-Level Register File Evaluation
- IPS (single-level vs. two-level register file)
  - For the single-level register file, IPS peaks at a 100-entry register file.
  - For the two-level register file, the peak IPS is seen with a 60-entry L1.
  - The optimal IPS of the two-level organization is 17% better than the optimal IPS of a single-level register file, owing to the better access time of the two-level design.
Two-Level Register File Evaluation
- IPS on individual applications
  - The 100-L1 has the longest access time, but its IPS is not always worse than that of the 60-L1. In those cases, the 100-L1's IPC advantage outweighs the access time penalty.
  - The two-level organization achieves the best IPS because it maintains a low access time and an IPC comparable (within 1%) to the single-level 100-L1 design.
Banked Register File Evaluation
- Register file with N banks, each with a single read port and a single write port.
  - Base case: "single bank, 4 rd, 4 wr" is within 2% of the 24-ported case.
  - Third bar: penalty from read port conflicts, 1% IPC degradation.
  - Fourth bar: additional penalty from write port conflicts, 5% IPC degradation.
  - Port contention is worst for applications with high ILP.
Banked Register File Evaluation
- Reducing conflicts: move from 4 to 8 banks
  - With 8 banks there is almost no IPC degradation due to read/write port conflicts (compared with the 4 banks in the previous figure).
  - There is still a 2% IPC loss relative to the 24-ported design.
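The effect of the bank count on port conflicts can be illustrated with a small Python sketch. It assumes that a register's bank is selected by its number modulo the bank count, which is a common interleaving choice but is not stated on the slides. The function name and the list of register reads are also made up for the example.

```python
from collections import Counter


def read_conflicts(regs, n_banks, ports_per_bank=1):
    """Count the reads in one cycle that cannot be served by their bank.

    ASSUMPTION: bank index = register number modulo n_banks.
    """
    per_bank = Counter(r % n_banks for r in regs)
    return sum(max(0, n - ports_per_bank) for n in per_bank.values())


reads = [3, 7, 12, 16, 21]                # hypothetical register reads in one cycle
conflicts_4 = read_conflicts(reads, 4)    # banks hit: 3, 3, 0, 0, 1
conflicts_8 = read_conflicts(reads, 8)    # banks hit: 3, 7, 4, 0, 5
```

With four banks two of the five reads collide and would have to wait a cycle; with eight banks the same reads land in distinct banks. This is the intuition behind the reduced IPC loss when moving from 4 to 8 banks, not a model of the simulated workloads.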
Combining the Two Techniques
Summary of Various Organizations
- The two-level organization has a slightly lower IPC than the single-level one, but 17% better IPS due to shorter L1 access times. It pays an energy penalty for copying between L1 and L2.
- The banked register file (single port per bank) has a shorter access time (by more than a factor of 2) and needs 18 times less energy than a conventional organization.
- The choice of technique depends on the design goals.


