Low Power Clocking Through the Use of Dual Edge Triggered Flip-Flops
Gabriel Ricardo Theresa Holliday
ACSEL Lab, University of California, Davis

Outline
• Dual Edge Flip-Flops overview
• Standard Cell Characterization
• LEON Synthesis for SET design
• LEON Synthesis for DET design
• Issues with including Dual edge into synthesis flow
• Preliminary comparisons
• Conclusions and Future Work
• Questions

Symmetric Pulse Generator Flip-Flop (SPGFF)
• First stage (X and Y) is dynamic; second stage is a static NAND.
• Results in small delay.
• Can be sized to trade some delay for power.

Operation of SPGFF
• The transparency window created by CLK and CLK3 for stage 1 (CLK1 and CLK4 for stage 2) allows X (Y) to conditionally evaluate based on input D.
• The output-stage NAND allows X and Y to be passed to the output based on the clock value, without the need for a latch.

Transmission Gate Master Slave (TGMS)

Comparison between SPGFF and TGMS in 0.18 um

         Delay    Power     EDP           Clk load
SPGFF    356 ps   133 μW    1.70e-23 Js   12 fF
TGMS     354 ps   89.9 μW   1.13e-23 Js   16 fF
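
The EDP column can be cross-checked from the delay and power columns. The sketch below assumes that the energy per operation is approximated as power times delay, so that EDP = power x delay^2; the slides do not state the formula, but this assumption reproduces the tabulated values to within rounding. The function name `edp` is illustrative only.

```python
# Cross-check of the EDP column, assuming EDP = power * delay**2
# (energy approximated as power * delay). This formula is an assumption;
# the slides only give the resulting numbers.

def edp(delay_s: float, power_w: float) -> float:
    """Return the energy-delay product in joule-seconds."""
    energy_j = power_w * delay_s      # energy per operation (assumed)
    return energy_j * delay_s

# (delay in seconds, power in watts), taken from the comparison table
cells = {
    "SPGFF": (356e-12, 133e-6),
    "TGMS": (354e-12, 89.9e-6),
}

for name, (delay_s, power_w) in cells.items():
    print(f"{name}: EDP = {edp(delay_s, power_w):.2e} Js")
```

With nearly identical delays, the EDP gap between the two cells comes almost entirely from the power column.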

Advantages of SPGFF
• Lowest clock energy among the DET-CSEs, resulting in higher clock power savings.
• Energy-delay product comparable to high-performance single-edge triggered clocked storage elements.

Characterization Methodology – Generating synthesis views
• Created an automated process for generating Synopsys Liberty format (.lib) synthesis models.
• Uses Perl scripts and gspice (a SPICE pre/postprocessor).
• Characterized for timing and energy.
• Can easily be extended to generate Cadence synthesis models (.tlf).

Characterization Methodology – Trip-points
• Used the same trip-points as those in the technology library.
• Nominal conditions: 25˚C, 1.8 V supply.
• Can easily generate best- and worst-case corner models (over temperature and supply variation).
• Cell delay: defined as clock 50% rise/fall to output (Q or QN) 50% rise/fall.
• Transition time: 10%-90% rise time, 90%-10% fall time.
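
The trip-point definitions above can be applied to a sampled waveform roughly as follows. This is a minimal sketch, not the Perl/gspice flow used in the work: the helper names (`crossing_time`, `cell_delay`, `rise_time`, `fall_time`) and the example waveform are hypothetical, and crossings are found by simple linear interpolation between samples.

```python
# Sketch: measuring cell delay (50% to 50%) and transition time (10%-90%)
# from sampled waveforms. Helper names and the sample data are hypothetical.

VDD = 1.8  # nominal supply in volts, as stated on the slide

def crossing_time(t, v, level, rising=True):
    """First time the waveform crosses `level` in the given direction,
    using linear interpolation between adjacent samples."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")

def cell_delay(t, clk, out, clk_rising=True, out_rising=True):
    """Clock 50% point to output 50% point."""
    return (crossing_time(t, out, 0.5 * VDD, out_rising)
            - crossing_time(t, clk, 0.5 * VDD, clk_rising))

def rise_time(t, out):
    """10% to 90% rise time."""
    return (crossing_time(t, out, 0.9 * VDD, True)
            - crossing_time(t, out, 0.1 * VDD, True))

def fall_time(t, out):
    """90% to 10% fall time."""
    return (crossing_time(t, out, 0.1 * VDD, False)
            - crossing_time(t, out, 0.9 * VDD, False))

# Idealized ramps (time in ns): the clock rises over 0..1, the output over 1..3.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
clk = [0.0, 1.8, 1.8, 1.8, 1.8]
out = [0.0, 0.0, 0.9, 1.8, 1.8]
print(cell_delay(t, clk, out), rise_time(t, out))
```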

Trip-points - Falling

Trip-points - Rising

Characterization Methodology – Drive Characteristics
• Build a 5 x 5 non-linear delay table.
• Clock slope values (ns): 0.03, 0.1, 0.4, 1.5, 3
• Output load values (fF): 0.35, 21, 38.5, 147, 311
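
A table indexed this way is typically consumed by interpolating between the characterized points. The sketch below shows bilinear interpolation over the two axes listed above; the axis values come from the slide, while the delay values in `table` are made up for illustration (a planar function, so the interpolated result is easy to verify), and bilinear interpolation itself is an assumption about how a downstream tool would read the table.

```python
# Sketch: bilinear interpolation in a 5 x 5 delay table.
# Axis values are from the slide; the delay entries are hypothetical.
from bisect import bisect_right

CLOCK_SLOPES_NS = [0.03, 0.1, 0.4, 1.5, 3.0]
OUTPUT_LOADS_FF = [0.35, 21.0, 38.5, 147.0, 311.0]

def _bracket(axis, x):
    """Indices (i, i+1) of the axis segment used for x. The index is clamped
    to the table, so out-of-range x is extrapolated from the end segment."""
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1

def lookup(table, slope_ns, load_ff):
    """Interpolate table[slope][load] at an arbitrary (slope, load) point."""
    i0, i1 = _bracket(CLOCK_SLOPES_NS, slope_ns)
    j0, j1 = _bracket(OUTPUT_LOADS_FF, load_ff)
    u = (slope_ns - CLOCK_SLOPES_NS[i0]) / (CLOCK_SLOPES_NS[i1] - CLOCK_SLOPES_NS[i0])
    w = (load_ff - OUTPUT_LOADS_FF[j0]) / (OUTPUT_LOADS_FF[j1] - OUTPUT_LOADS_FF[j0])
    low = table[i0][j0] * (1 - w) + table[i0][j1] * w    # along load at slope i0
    high = table[i1][j0] * (1 - w) + table[i1][j1] * w   # along load at slope i1
    return low * (1 - u) + high * u

# Hypothetical delays in ps: 100 + 50 * slope + 2 * load (not measured data).
table = [[100 + 50 * s + 2 * c for c in OUTPUT_LOADS_FF] for s in CLOCK_SLOPES_NS]
print(lookup(table, 0.2, 24.0))
```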

Characterization Methodology – Trip-points
• Setup time: sweep the input transition towards the active edge until there is a 10% increase in clock-to-output delay.
• Hold time: sweep the input transition away from the active edge until there is a 10% increase in clock-to-output delay.
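
The 10% push-out criterion for setup time can be sketched as a search over the data-to-clock offset. Note two substitutions: the sketch uses bisection rather than the linear sweep described above, and `clk_to_q_ps` is a made-up analytic stand-in for a SPICE run, not a model of any real cell. The 3 ps tolerance and all function names are assumptions for illustration.

```python
# Sketch: finding setup time with a 10% clock-to-output push-out criterion.
# Bisection replaces the sweep; clk_to_q_ps is a hypothetical delay model.
import math

def clk_to_q_ps(data_lead_ps: float) -> float:
    """Hypothetical clock-to-output delay as a function of how far the data
    transition leads the active clock edge (delay grows as the lead shrinks)."""
    nominal = 350.0
    return nominal * (1.0 + math.exp(-data_lead_ps / 20.0))

def setup_time_ps(delay_fn, relaxed_ps=1000.0, tight_ps=0.0,
                  pushout=0.10, tol_ps=3.0):
    """Smallest data lead (within tol_ps) whose delay stays within the
    push-out limit relative to the relaxed (nominal) delay."""
    limit = delay_fn(relaxed_ps) * (1.0 + pushout)
    if delay_fn(tight_ps) <= limit:
        raise ValueError("tight end of the search range does not violate the limit")
    passing, failing = relaxed_ps, tight_ps
    while passing - failing > tol_ps:
        mid = 0.5 * (passing + failing)
        if delay_fn(mid) <= limit:
            passing = mid     # still within 10% push-out, move closer to the edge
        else:
            failing = mid     # push-out exceeded, back off
    return passing

print(f"setup time ~ {setup_time_ps(clk_to_q_ps):.1f} ps")
```

Hold time could be searched the same way with the offset measured on the other side of the clock edge.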

Characterization Methodology – Setup-hold
[Figure labels: 10% push-out, 10% push-out]

Characterization Methodology – Setup and Hold
• Build a 3 x 2 non-linear delay table (3 ps accuracy).
• Clock slope values (ns): 0.03, 3
• Data slope values (ns): 0.03, 0.9, 3

Characterization Methodology – Internal energy
• Internal energy characterized over the same data points as the drive characteristics (5 x 5 lookup table).
• Data pin and clock pin energy tables generated (1 x 5 lookup table).

Characterization Results - single vs dual-edge – D to Q delay
[Figure panels: SPGFF, TGMS]

What is typical output load?
• Extracted output loading from netlist for all CSEs.
• Average load = 24 fF (6.8 min. sized inverters)
• 90% of CSEs have load less than 60 fF (17 min. sized inverters)

Netlist extracted CSE output loading statistics

Characterization Results - single vs dual-edge – Delay
[Figure panels: SPGFF, TGMS; annotation: typical region of operation]

Characterization Results – zoomed-in - single vs dual-edge – delay
[Figure panels: SPGFF, TGMS]

Characterization Results - single vs dual-edge – Energy delay product
[Figure panels: SPGFF, TGMS]

Leon SPARC core configuration

Leon SPARC synthesis
• Synthesized using TSMC 0.18 um standard cell library.
• Target frequency of 200 MHz.
• Limit use of single-sized D-FF.

SET Synthesis flow

SET-CSE synthesis summary
Area and Power

Cell type                 Area (mm^2)   % total   Power (mW)   % total
Memory blocks             2.03          55%       214.3        72%
Core                      0.71          19%       73           24%
Clock tree (ideal net)    N/A                     11.6         4%
Total                     3.7                     299

Core summary

Core                      Area (mm^2)   % total core   Power (mW)
Sequential (1986 CSEs)    0.47          36%            26
Combinatorial + nets      0.24          64%            47
Total                     0.71                         73

Approximately 20k gates

Clock tree loading

Clock tree components            Loading (pF)
Sequential cells (1986 cells)    5.18
Memory macro cells (6)           1.37
Wire routing*                    11.4
Total                            17.94

* based on library wire-load model
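
As a rough illustration of how a clock load translates into power, the sketch below applies the standard switching-power relation P = C x V^2 x f to the total load in the table. It assumes the 1.8 V nominal supply and the 200 MHz target frequency quoted elsewhere in the deck, and one full charge/discharge of the load per clock period. The result is about 11.6 mW, which is consistent with the ideal-net clock tree power in the SET-CSE summary, although the slides do not say that figure was obtained this way.

```python
# Sketch: switching power of the clock net from its total capacitive load.
# Assumes P = C * V^2 * f with 1.8 V supply and a 200 MHz clock.

def switching_power_w(load_f: float, vdd_v: float, freq_hz: float) -> float:
    """Dynamic power of a net that charges and discharges once per period."""
    return load_f * vdd_v ** 2 * freq_hz

clock_load_f = 17.94e-12   # total clock tree loading from the table (farads)
power_w = switching_power_w(clock_load_f, vdd_v=1.8, freq_hz=200e6)
print(f"clock switching power ~ {power_w * 1e3:.1f} mW")
```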

Clock tree power estimation
• High-fanout nets are beyond the interpolation range of the library's wire-load models.
• Wire-load models are not meant for estimating balanced distribution nets such as clock nets.
• Using library wire-load models for the clock tree is not valid.
• Use an H-tree estimation equation to obtain a ballpark number.

H-tree estimation equation
• Equation developed by ACSEL lab member Nikola Nedovic.
• Recursively calculates H-tree loading for a given area, number of CSEs in design, and number of H-tree levels.

H-tree estimation method

H-tree estimation method
* Table taken from Nedovic, Nikola, Ph.D. Dissertation, UCD, “CLOCKED STORAGE ELEMENTS FOR HIGH-PERFORMANCE APPLICATIONS”

H-tree estimation method
• Equation reduces to two terms: load due to CSEs and load due to wiring.
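
The equation itself did not survive in this transcript, so the following is not Nedovic's formula. It is a generic sketch of the same idea under stated assumptions: an idealized H-tree whose segments alternate between horizontal and vertical and halve the region they serve at each level, a wire capacitance per unit length, and one clock pin load per CSE. All function names and parameter values are hypothetical.

```python
# Sketch: recursive load estimate for an idealized H-tree.
# NOT the equation from the dissertation; geometry and parameters are assumed.

def htree_wire_length(width, height, levels, horizontal=True):
    """Total wire length of an H-tree with `levels` branching levels.
    Each level adds one segment spanning half the current region and then
    recurses into the two halves with the orientation flipped."""
    if levels == 0:
        return 0.0
    if horizontal:
        return width / 2 + 2 * htree_wire_length(width / 2, height, levels - 1, False)
    return height / 2 + 2 * htree_wire_length(width, height / 2, levels - 1, True)

def htree_load_f(width_mm, height_mm, levels, n_cse, cse_clk_cap_f, wire_cap_f_per_mm):
    """Total clock load = load due to CSEs + load due to wiring."""
    cse_load = n_cse * cse_clk_cap_f
    wire_load = wire_cap_f_per_mm * htree_wire_length(width_mm, height_mm, levels)
    return cse_load + wire_load

# Illustrative numbers only: 2 mm x 2 mm region, 6 levels, 1986 CSEs with a
# 12 fF clock pin each, and a hypothetical 0.2 pF/mm of wire capacitance.
load = htree_load_f(2.0, 2.0, 6, n_cse=1986, cse_clk_cap_f=12e-15,
                    wire_cap_f_per_mm=0.2e-12)
print(f"estimated H-tree load ~ {load * 1e12:.1f} pF")
```

Splitting the result into the two terms makes it easy to see which one dominates as the number of levels or the CSE count changes.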

Total H-tree power
• Load switching power
• Clock driver power

SET-CSE synthesis summary with H-tree estimate
Area and Power

Cell type                        Area (mm^2)   % total   Power (mW)   % total
Memory blocks                    2.03          55%       214.3        66%
Core                             0.71          19%       63           19%
Clock tree (H-tree estimate)     N/A                     48.5         15%
Total                            3.7                     325

SET-CSE power profile with H-tree estimate

SET-CSE Core power profile

Modeling DET-CSEs for Synthesis
• Need to model the timing parameters for both edges.

Modeling DET-CSEs for Synthesis
• Can model complex timing relationships for synthesis.
[Figure labels: falling-edge timing arc, rising-edge timing arc]

Modeling DET-CSEs for Synthesis
• Synthesis tool will time, and (try to) meet constraints for, the dual-edge triggered synchronous system.

Modeling DET-CSEs for Synthesis
• Synthesis tool will use the worst timing arc relationship for the critical path constraint.
[Figure labels: Critical, Not Critical]

Modeling DET-CSEs for Synthesis
• Synthesis tools are not capable of inferring a dual-edge triggered device from HDL code.
• For meeting timing we only care about the strictest constraint anyway (i.e. for one pair of launch and capture edges).
• Unnecessary to model a complex timing device.

Modeling DET-CSEs for Synthesis
• Simply model the DET-CSE as a SET-CSE with worst-edge timing parameters.
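
One way to picture the worst-edge simplification is to collapse the rising-edge and falling-edge characterization tables into a single table by taking the slower entry at every point. The sketch below does that with an element-wise maximum; the table values are hypothetical, and treating "worst" as the maximum applies to the critical-path parameters (clock-to-output delay, setup time), which is an assumption about how the merge was done.

```python
# Sketch: collapsing per-edge timing tables into one worst-edge table.
# Table values are hypothetical, in ps.

def worst_edge(rising, falling):
    """Element-wise maximum of two equally shaped timing tables."""
    return [[max(r, f) for r, f in zip(r_row, f_row)]
            for r_row, f_row in zip(rising, falling)]

# Hypothetical clock-to-output delays for the two active edges.
clk_to_q_rising = [[340, 362], [351, 380]]
clk_to_q_falling = [[356, 359], [349, 391]]

# The single table a SET-style synthesis model would carry.
print(worst_edge(clk_to_q_rising, clk_to_q_falling))
```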

Synthesis flow for DET-CSEs

Synthesis flow for DET-CSEs
• Use synthesis directives to force use of the DET-CSE modeled device.
• Synthesize for target throughput, not frequency.
• Use worst-case models for meeting critical-path timing constraints.
• Generate a worst-case hold model to verify the race-path.
  • Fastest clk-Q with worst-case hold time

Modeling DET-CSEs for Synthesis
• Race-path modeling. May have under-constrained race-path.
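
The race-path check that a worst-case hold model is meant to support can be written as a simple margin calculation: the fastest clock-to-output delay plus the minimum logic delay must exceed the worst-case hold time (plus any clock skew). This is the textbook hold condition, shown as a sketch; the function name, the skew term, and the numbers are hypothetical.

```python
# Sketch: hold (race-path) margin using fastest clk-Q and worst-case hold.
# All values are hypothetical, in ps.

def race_margin_ps(fastest_clk_to_q_ps, min_logic_delay_ps,
                   worst_hold_ps, clock_skew_ps=0):
    """Positive margin means new data cannot race through and corrupt the
    capturing storage element; negative margin is a hold violation."""
    return (fastest_clk_to_q_ps + min_logic_delay_ps
            - worst_hold_ps - clock_skew_ps)

# A direct flop-to-flop path (no logic in between) is the most exposed case.
print(race_margin_ps(fastest_clk_to_q_ps=300, min_logic_delay_ps=0,
                     worst_hold_ps=50))
```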

DET-CSE synthesis summary with H-tree estimate
Area and Power

Cell type                                            Area (mm^2)   % total   Power (mW)   % total
Memory blocks                                        2.03          44%       214.3        72%
Core                                                 1.65          36%       64           21%
Clock tree (DET-CSE H-tree estimate) @ new freq.     N/A                     20.2         7%
Total                                                4.64                    298.5

DET-CSE power profile

DET Core summary

Core                      Area (mm^2)   % total core   Power (mW)   % total
Sequential (1986 CSEs)    1.41          85.5%          22           34%
Combinatorial + nets      0.24          14.5%          42           66%
Total                     1.65                         64

Approximately 20k gates (based on nand4)

DET-CSE power profile

Issues with DET-CSE integration
• Memory blocks are single-edge triggered and must be clocked at twice the core clock rate.
• Currently using a dual-edge triggered VHDL behavioral model for memory blocks for netlist simulations.
• Possible solutions:
  • Clock the memory blocks at 2x nominal.
  • Modify memory address and data latch to be dual-edge triggered.

Power Comparison of two design netlists
[Figure labels, SPGFF core: Total = 92.46 mW, Total = 84.2 mW]
[Figure labels, TGMS core: Total = 106.8 mW, Total = 111 mW]
• 27 mW savings
• 24% power savings in core

Summary of comparison
• 24% savings in core power.
• Estimated 28% increase in sequential cell area (17% increase in core area).
• Both meet specified performance @ 200 MHz (report zero slack).

Summary
• Established methods for automated cell characterization.
• Developed design flow for DET-CSE integration.
• Demonstrated pre-layout results.
• Obtained functional DET-CSE netlist.
• Investigated functionally enhanced DET-CSEs (scan, reset).

Future work
• Expand family of DET-CSEs (i.e. sizings, functionalities).
• Obtain more accurate clock tree loading.
• Perform layout of cells for more accurate comparison.

Functionally enhanced Dual-Edge Triggered Flip-Flops
• Need to show that functions such as reset and scan can be added to DET-CSEs.
• Need to analyze the power and performance impact of the added functionality.
  • Do DET-CSEs still result in practical power savings?

Scan in SPGFF

Scan in DFF
Functional Schematic of DFF with Scan

Clear in SPGFF

Clear in DFF

Preliminary Results of Adding Functionalities

              Delay            Power            EDP
SPGFF         356 ps           136 μW           1.73e-23 Js
With Scan     371 ps (4.2%)    143 μW (5%)      1.97e-23 Js (14%)
With Reset    407 ps (14%)     140 μW (3%)      2.32e-23 Js (34%)

              Delay            Power            EDP
SETFF         412 ps           82 μW            1.38e-23 Js
With Scan     483 ps (17%)     82 μW (0%)       1.89e-23 Js (37%)
With Reset    483 ps (17%)     71 μW (-13%)     1.65e-23 Js (20%)
