- Number of slides: 69
Low Power Clocking Through the Use of Dual Edge Triggered Flip-Flops
Gabriel Ricardo, Theresa Holliday
ACSEL Lab, University of California, Davis
Outline
- Dual Edge Flip-Flops overview
- Standard Cell Characterization
- LEON Synthesis for SET design
- LEON Synthesis for DET design
- Issues with including Dual edge into synthesis flow
- Preliminary comparisons
- Conclusions and Future Work
- Questions
Symmetric Pulse Generator Flip-Flop (SPGFF)
- The first-stage nodes, X and Y, are dynamic; the second stage is a static NAND.
- Results in small delay.
- Can be sized to trade some delay for power.
Operation of SPGFF
- The transparency window created by CLK and CLK3 for stage 1 (CLK1 and CLK4 for stage 2) allows X (Y) to conditionally evaluate based on input D.
- The output-stage NAND allows X and Y to be passed to the output based on the clock value, without the need for a latch.
Transmission Gate Master Slave (TGMS)
Comparison between SPGFF and TGMS in 0.18 μm
Cell    Delay    Power     EDP           Clk load
SPGFF   356 ps   133 μW    1.70e-23 Js   12 fF
TGMS    354 ps   89.9 μW   1.13e-23 Js   16 fF
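
The EDP column appears consistent with EDP = power x delay^2, that is, energy per operation taken as power x delay, multiplied by delay again. This definition is an assumption inferred from the numbers, not something the slides state. A minimal sketch that recomputes it from the delay and power columns:

```python
# Delay and power values copied from the comparison table.
cells = {
    "SPGFF": {"delay_s": 356e-12, "power_w": 133e-6},
    "TGMS": {"delay_s": 354e-12, "power_w": 89.9e-6},
}


def energy_delay_product(power_w, delay_s):
    """EDP in joule-seconds, ASSUMING energy per operation = power * delay."""
    return power_w * delay_s * delay_s


edp = {
    name: energy_delay_product(c["power_w"], c["delay_s"])
    for name, c in cells.items()
}

for name, value in edp.items():
    print(f"{name}: EDP = {value:.3e} Js")
```

Under this assumption the recomputed values land within about 1% of the tabulated ones, so any difference is at the level of rounding in the slide.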
Advantages of SPGFF
- Lowest clock energy among DET-CSEs, resulting in higher clock power savings.
- Energy-delay product comparable to high-performance single-edge triggered clocked storage elements.
Characterization Methodology – Generating synthesis views
- Created an automated process for generating Synopsys Liberty format (.lib) synthesis models.
- Uses Perl scripts and gspice (a SPICE pre/postprocessor).
- Characterized for timing and energy.
- Can easily be extended to generate Cadence synthesis models (.tlf).
Characterization Methodology – Trip-points
- Used the same trip-points as those in the technology library.
- Nominal conditions: 25 °C, 1.8 V supply.
- Can easily generate best- and worst-case corner models (over temperature and supply variation).
- Cell delay: defined as clock 50% rise/fall to output (Q or QN) 50% rise/fall.
- Transition time: 10%-90% rise time, 90%-10% fall time.
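
The trip-point definitions above can be illustrated on a sampled waveform. The sketch below is not the authors' Perl/gspice flow; the helper `crossing_time` and the idealized ramp waveforms are hypothetical, chosen only to show how 50% and 10%-90% measurements are taken.

```python
def crossing_time(times, values, level):
    """First time a piecewise-linear waveform crosses `level` (linear interpolation)."""
    points = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


VDD = 1.8  # nominal supply from the slide, in volts

# Hypothetical idealized waveforms, time in picoseconds.
clk_t, clk_v = [0.0, 100.0, 1000.0], [0.0, VDD, VDD]
q_t, q_v = [0.0, 300.0, 500.0, 1000.0], [0.0, 0.0, VDD, VDD]

# Cell delay: clock 50% point to output 50% point.
cell_delay_ps = crossing_time(q_t, q_v, 0.5 * VDD) - crossing_time(clk_t, clk_v, 0.5 * VDD)

# Rise transition time: output 10% point to output 90% point.
rise_time_ps = crossing_time(q_t, q_v, 0.9 * VDD) - crossing_time(q_t, q_v, 0.1 * VDD)

print(f"cell delay = {cell_delay_ps:.1f} ps, rise time = {rise_time_ps:.1f} ps")
```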
Trip-points - Falling
Trip-points - Rising
Characterization Methodology – Drive Characteristics
- Build a 5 x 5 non-linear delay table.
- Clock slope values (ns): 0.03, 0.1, 0.4, 1.5, 3
- Output load values (fF): 0.35, 21, 38.5, 147, 311
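
A table of this kind is typically consumed by interpolating between the characterized points. The sketch below uses the index values from the slide, but the delay entries are hypothetical placeholders (the slides do not list them), and bilinear interpolation is assumed as the lookup rule.

```python
from bisect import bisect_right

# Index values from the slide.
CLOCK_SLOPES_NS = [0.03, 0.1, 0.4, 1.5, 3.0]
OUTPUT_LOADS_FF = [0.35, 21.0, 38.5, 147.0, 311.0]

# HYPOTHETICAL 5 x 5 delay table in ns; real entries would come from SPICE runs.
DELAY_NS = [
    [0.2 + 0.1 * s + 0.002 * c for c in OUTPUT_LOADS_FF]
    for s in CLOCK_SLOPES_NS
]


def _bracket(axis, x):
    """Index i of the segment [axis[i], axis[i+1]] used for x, clamped to the table."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup(table, slope_ns, load_ff):
    """Bilinear interpolation; points outside the table are linearly extrapolated."""
    i = _bracket(CLOCK_SLOPES_NS, slope_ns)
    j = _bracket(OUTPUT_LOADS_FF, load_ff)
    s0, s1 = CLOCK_SLOPES_NS[i], CLOCK_SLOPES_NS[i + 1]
    c0, c1 = OUTPUT_LOADS_FF[j], OUTPUT_LOADS_FF[j + 1]
    u = (slope_ns - s0) / (s1 - s0)
    v = (load_ff - c0) / (c1 - c0)
    return (
        (1 - u) * (1 - v) * table[i][j]
        + u * (1 - v) * table[i + 1][j]
        + (1 - u) * v * table[i][j + 1]
        + u * v * table[i + 1][j + 1]
    )


delay_ns = lookup(DELAY_NS, slope_ns=0.25, load_ff=30.0)
print(f"interpolated delay = {delay_ns:.4f} ns")
```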
Characterization Methodology – Trip-points
- Setup time: sweep the input transition towards the active edge until there is a 10% increase in clock-to-output delay.
- Hold time: sweep the input transition away from the active edge until there is a 10% increase in clock-to-output delay.
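
The 10% push-out criterion for setup time can be sketched as a simple sweep. The function `toy_clk_to_q_ps` below is a hypothetical stand-in for a SPICE simulation, and the step size and starting point are illustrative assumptions, not values from the slides.

```python
import math


def toy_clk_to_q_ps(data_lead_ps):
    """HYPOTHETICAL model: delay grows as data arrives closer to the clock edge."""
    return 350.0 * (1.0 + math.exp(-data_lead_ps / 20.0))


def setup_time_ps(clk_to_q, start_lead_ps=500.0, step_ps=1.0,
                  pushout=0.10, min_lead_ps=-500.0):
    """Smallest swept data lead whose clock-to-output delay stays within the push-out limit."""
    nominal = clk_to_q(start_lead_ps)  # delay with data arriving well before the edge
    limit = nominal * (1.0 + pushout)
    lead = start_lead_ps
    # Move the data transition towards the active edge one step at a time.
    while lead - step_ps > min_lead_ps and clk_to_q(lead - step_ps) <= limit:
        lead -= step_ps
    return lead


setup_ps = setup_time_ps(toy_clk_to_q_ps)
print(f"setup time at 10% push-out = {setup_ps:.1f} ps")
```

A hold-time sweep would follow the same pattern with the data transition moved on the other side of the clock edge.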
Characterization Methodology – Setup-hold
[Figure: two plots, each annotated "10% push-out"]
Characterization Methodology – Setup and Hold
- Build a 3 x 2 non-linear delay table (3 ps accuracy).
- Clock slope values (ns): 0.03, 3
- Data slope values (ns): 0.03, 0.9, 3
Characterization Methodology – Internal energy
- Internal energy characterized over the same data points as the drive characteristics (5 x 5 lookup table).
- Data pin and clock pin energy tables generated (1 x 5 lookup table).
Characterization Results - single vs dual-edge – D to Q delay
[Figure panels: SPGFF, TGMS]
What is typical output load?
- Extracted output loading from the netlist for all CSEs.
- Average load = 24 fF (6.8 min. inverters).
- 90% of CSEs have a load of less than 60 fF (17 min. sized inverters).
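
Both load figures imply the same input capacitance for a minimum-sized inverter, which gives a quick consistency check. The sketch below only does that arithmetic; the per-inverter capacitance is inferred from the slide's numbers, not quoted from the technology library.

```python
# (load in fF, equivalent number of minimum-sized inverters) from the slide.
average_load = (24.0, 6.8)
p90_load = (60.0, 17.0)


def implied_inverter_cap_ff(load_ff, inverter_count):
    """Input capacitance per minimum inverter implied by a load and its inverter count."""
    return load_ff / inverter_count


cap_from_average = implied_inverter_cap_ff(*average_load)
cap_from_p90 = implied_inverter_cap_ff(*p90_load)

print(f"from average load: {cap_from_average:.2f} fF per inverter")
print(f"from 90% load:     {cap_from_p90:.2f} fF per inverter")
```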
Netlist extracted CSE output loading statistics
Characterization Results - single vs dual-edge – Delay
[Figure: SPGFF and TGMS; annotation: typical region of operation]
Characterization Results – zoomed-in - single vs dual-edge – delay
[Figure panels: SPGFF, TGMS]
Characterization Results - single vs dual-edge – Energy delay product
[Figure panels: SPGFF, TGMS]
Leon SPARC core configuration
Leon SPARC synthesis
- Synthesized using the TSMC 0.18 μm standard cell library.
- Target frequency of 200 MHz.
- Limit use to a single-sized D-FF.
SET-Synthesis flow
SET-CSE synthesis summary: Area and Power
Cell type                Area (mm²)   % total   Power (mW)   % total
Memory blocks            2.03         55%       214.3        72%
Core                     0.71         19%       73           24%
Clock tree (ideal net)   N/A          -         11.6         4%
Total                    3.7          -         299          -
Core summary
Core                     Area (mm²)   Power (mW)   % of total core power
Sequential (1986 CSEs)   0.47         26           36%
Combinatorial + nets     0.24         47           64%
Total                    0.71         73           -
Approximately 20 k-gates
Clock tree loading
Clock tree components           Loading (pF)
Sequential cells (1986 cells)   5.18
Memory macro cells (6)          1.37
Wire routing*                   11.4
Total                           17.94
* based on library wire-load model
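
As a rough check, the ideal-net clock power can be estimated from the total load as P = C x V^2 x f. The sketch below assumes the 1.8 V nominal supply and the 200 MHz target frequency given elsewhere in the slides, and one full charge/discharge of the load per clock cycle; that this is how the 11.6 mW figure in the SET-CSE summary was obtained is an inference, not something the slides state.

```python
def switching_power_w(load_f, vdd_v, freq_hz):
    """Dynamic power of a net that charges and discharges once per clock cycle."""
    return load_f * vdd_v ** 2 * freq_hz


total_clock_load_f = 17.94e-12  # total clock tree loading from the table
vdd_v = 1.8                     # nominal supply
clock_hz = 200e6                # synthesis target frequency

clock_power_mw = switching_power_w(total_clock_load_f, vdd_v, clock_hz) * 1e3
print(f"ideal-net clock power = {clock_power_mw:.2f} mW")
```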
Clock tree power estimation
- High-fanout nets are beyond the interpolation range of the library's wire-load models.
- Wire-load models are not meant for estimating balanced distribution nets such as clock nets.
- Using library wire-load models for the clock tree is not valid.
- Use an H-tree estimation equation to obtain a ballpark number.
H-tree estimation equation
- Equation developed by ACSEL Lab member Nikola Nedovic.
- Recursively calculates H-tree loading for a given area, number of CSEs in the design, and number of H-tree levels.
H-tree estimation method
H-tree estimation method
* Table taken from Nedovic, Nikola, Ph.D. Dissertation, UCD, “CLOCKED STORAGE ELEMENTS FOR HIGH-PERFORMANCE APPLICATIONS”
H-tree estimation method
- Equation reduces to: [equation image; terms labeled "Load due to CSEs" and "Load due to wiring"]
Total H-tree power
[Equation image; terms labeled "Load switching power" and "Clock driver power"]
SET-CSE synthesis summary with H-tree estimate: Area and Power
Cell type                      Area (mm²)   % total   Power (mW)   % total
Memory blocks                  2.03         55%       214.3        66%
Core                           0.71         19%       63           19%
Clock tree (H-tree estimate)   N/A          -         48.5         15%
Total                          3.7          -         325          -
SET-CSE power profile with H-tree estimate
SET-CSE Core power profile
Modeling DET-CSEs for Synthesis
- Need to model the timing parameters for both edges.
Modeling DET-CSEs for Synthesis
- Can model complex timing relationships for synthesis.
[Figure labels: falling-edge timing arc, rising-edge timing arc]
Modeling DET-CSEs for Synthesis
- The synthesis tool will time, and (try to) meet constraints for, the dual-edge triggered synchronous system.
Modeling DET-CSEs for Synthesis
- The synthesis tool will use the worst timing arc relationship for the critical path constraint.
[Figure labels: Critical, Not Critical]
Modeling DET-CSEs for Synthesis
- Synthesis tools are not capable of inferring a dual-edge triggered device from HDL code.
- For meeting timing we only care about the strictest constraint anyway (i.e. for one pair of launch and capture edges).
- It is unnecessary to model a complex timing device.
Modeling DET-CSEs for Synthesis
- Simply model the DET-CSE as a SET-CSE with worst-edge timing parameters.
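
One way to read "worst-edge timing parameters" is to take, for each parameter, the larger of the rising-edge and falling-edge values. The sketch below does that; the `EdgeTiming` type and all numeric values are hypothetical and only illustrate the collapse from two timing arcs to one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeTiming:
    """Timing parameters characterized for one clock edge (hypothetical container)."""
    clk_to_q_ps: float
    setup_ps: float
    hold_ps: float


def worst_edge_model(rising, falling):
    """Collapse both edges into one pessimistic single-edge model."""
    return EdgeTiming(
        clk_to_q_ps=max(rising.clk_to_q_ps, falling.clk_to_q_ps),
        setup_ps=max(rising.setup_ps, falling.setup_ps),
        hold_ps=max(rising.hold_ps, falling.hold_ps),
    )


# HYPOTHETICAL per-edge numbers, for illustration only.
rising_edge = EdgeTiming(clk_to_q_ps=340.0, setup_ps=60.0, hold_ps=20.0)
falling_edge = EdgeTiming(clk_to_q_ps=360.0, setup_ps=55.0, hold_ps=25.0)

set_equivalent = worst_edge_model(rising_edge, falling_edge)
print(set_equivalent)
```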
Synthesis flow for DET-CSEs
Synthesis flow for DET-CSEs
- Use synthesis directives to force use of the DET-CSE modeled device.
- Synthesize for target throughput, not frequency.
- Use worst-case models for meeting critical-path timing constraints.
- Generate a worst-case hold model to verify the race path.
  - Fastest clk-Q with worst-case hold time.
Modeling DET-CSEs for Synthesis
- Race-path modeling. May have an under-constrained race path.
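
Two of the points above lend themselves to a short sketch: a DET design reaches a given throughput at half the clock frequency, and the race-path (hold) model pairs the fastest clock-to-Q with the longest hold time. The function names and the per-edge numbers below are hypothetical, and taking the minimum and maximum across the two edges is one possible reading of the slide.

```python
def det_clock_hz(target_throughput_hz):
    """A DET-CSE captures data on both clock edges, so the clock runs at half the data rate."""
    return target_throughput_hz / 2.0


def race_path_model(rising, falling):
    """Pessimistic hold check: fastest clock-to-Q paired with the longest hold time."""
    return {
        "clk_to_q_ps": min(rising["clk_to_q_ps"], falling["clk_to_q_ps"]),
        "hold_ps": max(rising["hold_ps"], falling["hold_ps"]),
    }


# HYPOTHETICAL per-edge numbers, for illustration only.
rising_edge = {"clk_to_q_ps": 340.0, "hold_ps": 20.0}
falling_edge = {"clk_to_q_ps": 360.0, "hold_ps": 25.0}

clock_hz = det_clock_hz(200e6)  # throughput matching a 200 MHz single-edge design
hold_model = race_path_model(rising_edge, falling_edge)
print(f"DET clock = {clock_hz / 1e6:.0f} MHz, hold model = {hold_model}")
```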
DET-CSE synthesis summary with H-tree estimate: Area and Power
Cell type                                           Area (mm²)   % total   Power (mW)   % total
Memory blocks                                       2.03         44%       214.3        72%
Core                                                1.65         36%       64           21%
Clock tree (det-cse H-tree estimate) @ new freq.    N/A          -         20.2         7%
Total                                               4.64         -         298.5        -
DET-CSE power profile
DET Core summary
Core                     Area (mm²)   % total core   Power (mW)   % total
Sequential (1986 CSEs)   1.41         85.5%          22           34%
Combinatorial + nets     0.24         14.5%          42           66%
Total                    1.65         -              64           -
Approximately 20 k-gates (based on nand4)
DET-CSE power profile
Issues with DET-CSE integration
- Memory blocks are single-edge triggered and must be clocked at twice the core clock rate.
- Currently using a dual-edge triggered VHDL behavioral model for the memory blocks in netlist simulations.
- Possible solutions:
  - Clock the memory blocks at 2x nominal.
  - Modify the memory address and data latches to be dual-edge triggered.
Power Comparison of two design netlists
- SPGFF Core: Total = 92.46 mW; Total = 84.2 mW
- TGMS Core: Total = 106.8 mW; Total = 111 mW
- 27 mW savings
- 24% power savings in core
Summary of comparison
- 24% savings in core power.
- Estimated 28% increase in sequential cell area (17% increase in core area).
- Both meet specified performance @ 200 MHz (report zero slack).
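
The 27 mW and 24% figures are consistent with the 111 mW (TGMS) and 84.2 mW (SPGFF) totals on the comparison slide. That these two totals are the ones behind the quoted savings is an assumption; the sketch below just shows the arithmetic.

```python
tgms_core_mw = 111.0   # TGMS core total from the comparison slide
spgff_core_mw = 84.2   # SPGFF core total from the comparison slide

savings_mw = tgms_core_mw - spgff_core_mw
savings_pct = 100.0 * savings_mw / tgms_core_mw

print(f"savings = {savings_mw:.1f} mW ({savings_pct:.1f}% of the TGMS core power)")
```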
Summary
- Established methods for automated cell characterization.
- Developed a design flow for DET-CSE integration.
- Demonstrated pre-layout results.
- Obtained a functional DET-CSE netlist.
- Investigated functionally enhanced DET-CSEs (scan, reset).
Future work
- Expand the family of DET-CSEs (i.e. sizings, functionalities).
- Obtain more accurate clock tree loading.
- Perform layout of cells for a more accurate comparison.
Functionally enhanced Dual-Edge Triggered Flip-Flops
- Need to show that functions such as reset and scan can be added to DET-CSEs.
- Need to analyze the power and performance impact of the added functionality.
  - Do DET-CSEs still result in practical power savings?
Scan in SPGFF
Scan in DFF
[Figure: functional schematic of DFF with scan]
Clear in SPGFF
Clear in DFF
Preliminary Results of Adding Functionalities
             Delay           Power          EDP
SPGFF        356 ps          136 μW         1.73e-23 Js
With Scan    371 ps (4.2%)   143 μW (5%)    1.97e-23 Js (14%)
With Reset   407 ps (14%)    140 μW (3%)    2.32e-23 Js (34%)
             Delay           Power          EDP
SETFF        412 ps          82 μW          1.38e-23 Js
With Scan    483 ps (17%)    82 μW (0%)     1.89e-23 Js (37%)
With Reset   483 ps (17%)    71 μW (-13%)   1.65e-23 Js (20%)
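
The percentages in parentheses read as the change relative to the base flip-flop in each group. A minimal sketch of that calculation, using the SPGFF rows as given; the helper name `percent_change` is illustrative.

```python
def percent_change(base, value):
    """Relative change of `value` with respect to `base`, in percent."""
    return 100.0 * (value - base) / base


# (delay in ps, power in uW, EDP in Js) from the SPGFF rows of the table.
spgff = (356.0, 136.0, 1.73e-23)
spgff_scan = (371.0, 143.0, 1.97e-23)
spgff_reset = (407.0, 140.0, 2.32e-23)

scan_overhead = [percent_change(b, v) for b, v in zip(spgff, spgff_scan)]
reset_overhead = [percent_change(b, v) for b, v in zip(spgff, spgff_reset)]

print("scan  (delay, power, EDP):", [f"{x:.1f}%" for x in scan_overhead])
print("reset (delay, power, EDP):", [f"{x:.1f}%" for x in reset_overhead])
```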


