
1c89a3ab44c19c44a26bd6aeee3a0fc1.ppt
- Number of slides: 43
Deep Submicron Logic / Layout Synthesis
1999.11
Jun Dong Cho
Sungkyunkwan Univ., Dept. ECE
Mail: Jdcho@skku.ac.kr
Homepage: vada.skku.ac.kr
Agenda
- Design Methodology
- Recent Approaches in Logic / Layout Synthesis
- EDA Vendor & Their Tools
- Conclusion
Design Methodology
- Introduction: DSM Design Dilemma
- Current Design Methodology
- Recent Approaches in Design Methodology
  - Floorplan Approach
  - Super Glue Approach
  - Fixed Timing Approach
  - Simultaneous Optimization Approach
Introduction: DSM Design Dilemma
- As physical feature sizes decrease, the delay of electrical signals traveling in the interconnect between active devices and gates approaches the delay through the devices and gates themselves. Therefore, the parasitic information (resistance and capacitance) of the interconnect is critical to predicting circuit performance.
- The key to solving this problem is knowing more about the physical design, i.e. placement and estimated interconnect, early in the design cycle.
- Iterations between synthesis and layout increase dramatically due to timing and routability problems.
- Most current VLSI tools cannot properly handle the new problems raised by deep submicron technology, such as accurate RC extraction, transmission line effects and coupling effects. Even the models used for VLSI ASIC design, such as timing delay, routability, size and power dissipation, will need to be modified or improved.
Introduction: DSM Design Dilemma
- Although there is no official line for what constitutes deep submicron, the term generally refers to a CMOS device whose minimum logic gate length is 0.5 um or smaller. Deep submicron technology gives chip manufacturers the ability to put more gates on a chip and increase its density. This makes chips more powerful and smaller.
- Most current VLSI tools cannot properly handle the new problems raised by deep submicron technology, such as accurate RC extraction, transmission line effects and coupling effects. Even the models used for VLSI ASIC design, such as timing delay, routability, size and power dissipation, will need to be modified or improved.
- In non-submicron integrated circuits that do not require high clock speed, minimum-width lines can be used for clock distribution. Since the difference in logic gate delays along the signal paths dominates the clock skew, wire length does not affect the clock skew much, and interconnect wire delay is not a big issue. Under these conditions, the rule of thumb is to use the same number of identical buffers on each signal path, so that every component experiences the same logic gate delay.
Current Design Methodology
Floorplan Approach
- Existing EDA Vendor
  - Particularly emphasize the floorplan
  - Iterations between different tools
- Traditional Floorplan
  - No flexibility to fix timing problems caused by long wires
  - Overly constrained timing budgets
  - May fail at timing closure
  - Adds many buffers and oversizes gates on critical nets
Super Glue Approach
- Attempts to glue Front End and Back End
- Performs floorplan, placement, routing, timing verification in advance
- Optimizes few variables
- Merely moves the closure issue
- Little correlation with final Back End
- Difficult feedback to Front End
Fixed Timing Approach
- Attempts to break the problem
- Early aggressive timing optimization
- Based on simple conservative models
- Timing is set, back-end left for later
- Sub-optimal area and power results
- Trades one problem for another
- Difficult to extend to other optimizations
Simultaneous Layout Optimization Approach
- Simultaneous Placement and Global Optimization
- Placement, Routing, Timing, Logic Optimization, Clocks, Power, Crosstalk, Additional Effects
Recent Approaches in Logic / Layout Synthesis
- Layout Driven Logic Synthesis
  - Wireplanning in Logic Synthesis [ICCAD'98]
- Post Routing Optimization
  - Post-Routing Optimization with Routing Characterization [ISPD'99]
- Congestion Minimization
  - On the Behavior of Congestion Minimization During Placement [ISPD'99]
- Control Logic Layout Synthesis
  - C5M - A Control-Logic Layout Synthesis System for High-Performance Microprocessors [TCAD'98]
- PTL Logic / Layout Synthesis
  - Wave Steering in YADDs: A Novel Non-Iterative Synthesis and Layout Technique [ISPD'99]
Layout Driven Logic Synthesis (1) Main Feature
- Adopts the opposite approach to conventional logic synthesis: logic synthesis optimizes only for interconnect delay, ignoring the effect of gate delays.
- Based on the simple observation that if an output “o” depends on an input “I”, then the best way to connect “I” to “o” is through a path which is monotonic from “I” to “o”: no diversions in the path from “I” to “o”.
- Conventional logic synthesis can produce a circuit for which it is impossible to find a placement with no diversions in the input-output paths.
- “Illegal node”: a node is illegal if it cannot be placed anywhere on the die without causing a diversion in the circuit.
- The proposed approach has the advantage that it still maintains a distinction between the logic synthesis and place & route stages. It does not need to tightly couple synthesis and placement by frequently alternating between the two, which can be inefficient and may not converge at all.
Layout Driven Logic Synthesis (2) Problem Description
- Assumption
  - The die is represented by a rectangle with width and height.
  - The given logic circuit is pin-assigned.
  - The delay of a path is a linear function of its length (in general, the interconnect delay depends quadratically on the length of the interconnect, but it can be made linear by buffer insertion and wire sizing).
- I/O Pin-to-Pin delay model (IP-based synthesis vs slack-based synthesis)
  - Particularly suited for intellectual property (IP) blocks.
  - Arrival times of the pins are not known in advance. Thus, we aggressively minimize the delays for all I/O paths.
- Objective Function: Minimize the longest path of the circuit
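The linear-delay assumption above can be illustrated with a small sketch. It assumes a simplified Elmore-style model in which an unbuffered wire of length L has delay 0.5·r·c·L², and a buffer with a fixed delay `t_buf` is inserted every `segment` units of length. The function names, the unit values of `r` and `c`, and the omission of buffer input capacitance and driver resistance are all assumptions made for illustration, not part of the slides.

```python
def unbuffered_delay(length, r=1.0, c=1.0):
    """Elmore-style delay of a distributed RC wire: 0.5 * r * c * length^2."""
    return 0.5 * r * c * length ** 2


def buffered_delay(length, segment, t_buf, r=1.0, c=1.0):
    """Delay when a buffer (fixed delay t_buf) drives every full `segment` of wire.

    Simplification: buffer input capacitance and output resistance are ignored.
    """
    n_full, rest = divmod(length, segment)
    per_segment = unbuffered_delay(segment, r, c) + t_buf
    return n_full * per_segment + unbuffered_delay(rest, r, c)


if __name__ == "__main__":
    for length in (10, 20, 40):
        print(length, unbuffered_delay(length), buffered_delay(length, 5, 1.0))
```

Doubling the length quadruples the unbuffered delay but only doubles the buffered one, which is the sense in which buffer insertion makes path delay proportional to path length.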
Layout Driven Logic Synthesis (3)
- Synthesis Network
  - Placement tool places node z to minimize the longest path from b to y1 & y2.
- Decomposed Network
  - Not good for placement (a longest path exists).
  - y2 is independent of b, therefore b can be removed from the support set of y2.
  - Path from c to y1 is greater than its Manhattan distance.
- Optimal Placement Synthesis
  - Each node has a short Manhattan distance.
  - Aim is to guide logic synthesis such that it produces a circuit which is good for placement.
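The notion of a path being longer than its Manhattan distance can be checked directly. The sketch below treats a placed path as a list of (x, y) points and calls it monotonic when its rectilinear length equals the Manhattan distance between its endpoints. The function names and coordinates are hypothetical and are not taken from the slides.

```python
def manhattan(p, q):
    """Manhattan (rectilinear) distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def path_length(points):
    """Rectilinear length of a path visiting `points` in order."""
    return sum(manhattan(a, b) for a, b in zip(points, points[1:]))


def is_monotonic(points):
    """True if the path has no diversions, i.e. it is as short as possible."""
    return path_length(points) == manhattan(points[0], points[-1])


if __name__ == "__main__":
    direct = [(0, 0), (2, 1), (5, 3)]   # input pin, one node, output pin
    detour = [(0, 0), (3, 4), (1, 1)]   # the node is placed past the output
    print(is_monotonic(direct), is_monotonic(detour))
```

A node whose every feasible position forces `is_monotonic` to fail for some input-output path corresponds to what the slides call an illegal node.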
Layout Driven Logic Synthesis (4) Constraint Generation
- Region Placement Constraint
  - Partition the die into rectangles along the pin positions.
  - Label each region with the functions that can be placed in it.
  - Each region is labeled with a set of placement constraints (support set & transitive primary outputs).
  - r3: {c, d} is the support of y2, and {a, b} is the support of y1.
- Node Placement Constraint
  - Label each node with a placement constraint.
  - The node placement constraint of node n denotes the support set of n & its transitive primary outputs.
  - Can be easily computed by traversing the Boolean network in BFS manner.
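A minimal sketch of the BFS computation of a node placement constraint follows. It assumes the Boolean network is given as a dictionary mapping each node to its fanins. The network, node names and function names are hypothetical, chosen only so that the supports of y1 and y2 match the {a, b} and {c, d} mentioned above.

```python
from collections import deque


def reachable(start, edges):
    """Breadth-first search from `start` over the adjacency dict `edges`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def placement_constraint(node, fanins, primary_inputs, primary_outputs):
    """Return (support set, transitive primary outputs) of `node`."""
    # Invert the fanin relation to obtain fanouts.
    fanouts = {}
    for sink, sources in fanins.items():
        for src in sources:
            fanouts.setdefault(src, []).append(sink)
    support = reachable(node, fanins) & set(primary_inputs)
    outputs = reachable(node, fanouts) & set(primary_outputs)
    return support, outputs


if __name__ == "__main__":
    # Hypothetical network: z = f(a, b), y1 = g(z), y2 = h(c, d)
    fanins = {"z": ["a", "b"], "y1": ["z"], "y2": ["c", "d"]}
    pis, pos = ["a", "b", "c", "d"], ["y1", "y2"]
    for n in ("z", "y1", "y2"):
        print(n, placement_constraint(n, fanins, pis, pos))
```

Two traversals are used, one backwards through fanins and one forwards through fanouts, so the cost per node is linear in the size of the network.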
Layout Driven Logic Synthesis (5) Legalize & Synthesis
- Make Legal
  - Legal: the node n is legal if there is a region r where n can be placed.
  - Minimize the number of new Boolean nodes created.
  - Traverse the Boolean network in reverse topological order (a node is visited after its fanouts are visited).
  - On seeing an illegal node, collapse the node into its fanouts until the node becomes legal.
- Constraint-Driven Synthesis
  - Optimize the network to obtain a minimum-literal legal Boolean network.
  - Fast Extract: in every iteration, finds the two-cube divisor or two-literal cube that removes the largest number of literals.
  - Resubstitution: a node n is resubstituted into another node x if n divides x (and the legality of n is preserved).
  - Produces a legal Boolean network, which can be placed s.t. every path is monotonic.
Post Routing Optimization (1) Main Feature [Avante’ 99]
- The delay due to wire routing parasitics becomes non-negligible below 0.25 um.
- The traditional back-annotation approach cannot solve the timing problem because of inaccurate delay estimation.
- Many iterations occur between synthesis and layout.
- Timing convergence is not guaranteed.
- Timing Fidelity Before and After Routing
  - A minimal rectilinear spanning tree or Steiner tree is usually used to estimate the wire load and delay for the interconnects of a placement.
- VDSM Design
  - Routing congestion is more severe, so wires have to detour.
  - Coupling effect is usually large.
  - Timing discrepancies between pre- and post-routing
- Main cause of timing discrepancy
  - Coupling effect
  - Routing pattern
Post Routing Optimization (2) Coupling Effect & Routing Pattern Effect
- Coupling Effect
  - Increases signal delay
  - Introduces noise over neighboring wires
  - Dominates the wire load
  - Assumes coupling cap. exists between neighboring parallel wires only
- Routing Pattern Effect
  - MST, MRST: lower bound of total wire length
  - The more routing congestion, the larger the detours of nets, and the larger the timing discrepancy
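To make the routing pattern effect concrete, the sketch below computes the length of a rectilinear minimum spanning tree over a net's pins with Prim's algorithm and compares it with a routed length. The pin coordinates, the routed length and the name `detour_ratio` are hypothetical values used only for illustration.

```python
def rectilinear_mst_length(pins):
    """Total Manhattan length of a minimum spanning tree over `pins` (Prim, O(n^2))."""
    if len(pins) < 2:
        return 0

    def dist(i, j):
        return abs(pins[i][0] - pins[j][0]) + abs(pins[i][1] - pins[j][1])

    # best[i] = cheapest connection from pin i to the tree grown so far
    best = {i: dist(i, 0) for i in range(1, len(pins))}
    total = 0
    while best:
        nearest = min(best, key=best.get)
        total += best.pop(nearest)
        for i in best:
            best[i] = min(best[i], dist(i, nearest))
    return total


def detour_ratio(routed_length, pins):
    """Routed wire length relative to the pre-route spanning-tree estimate."""
    return routed_length / rectilinear_mst_length(pins)


if __name__ == "__main__":
    pins = [(0, 0), (4, 0), (4, 3)]
    print(rectilinear_mst_length(pins))   # pre-route estimate
    print(detour_ratio(10, pins))         # a congested route of length 10
```

A ratio well above 1 indicates a net whose post-route delay is likely to differ from the pre-route estimate.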
Post Routing Optimization (3) Routing Characterization
- Coupling characterization
  - Divide the layout floorplan into a 3D routing plan of small regions
- Routing congestion
Post Routing Optimization (4) Routing Pattern Prediction
- A routing pattern prediction is required to predict which regions the final routing will go through. This prediction can be passed down to a detailed router to guide the final routing.
- Routing Pattern Problem (RPP)
  - Given a routing graph, find a set S of connected regions which covers all terminals of this net, with objective to [...]
  - Subject to capacity constraints [...]
Post Routing Optimization (5) Main Feature
- Cluster Selection
  - Seed selection: choose gates in the range of critical slack.
  - Selection criteria: 1) criticality of the gate, 2) difference among the arrival times of the inputs to the gate, 3) number of fan-ins and fan-outs, 4) congestion of the neighboring area.
  - Grouping: cluster the instances adjacent to the selected seed to form a partition.
  - A user-specified window size is given to confine the logic change to a localized area.
- Incremental placement & logic optimization is performed.
Post Routing Optimization (6) Routing Characterization
- Routing Characterization / Prediction
  - When a cluster is transformed and placed, the routing of the changed nets is predicted based on the characterization.
  - The timing analyzer may use the coupling capacitance to estimate the timing after routing.
  - If the change improves the timing, it is committed and the routing tree is updated.
Congestion Minimization (1) Main Feature
- Automated cell placement for VLSI circuits has always been a key factor for achieving designs with optimized area usage, wiring congestion and timing behavior. As technology advances, the congestion problem becomes more important.
- Congestion in a layout means too many nets are routed in local regions.
- With the advent of over-the-cell routing, the goal of every place and route methodology has been to utilize area to prevent spilling of routes into channels.
- Multiple routing layers have enough routing resources to route most wires as long as there are not too many wires congested in the same region.
- Excessive congestion will result in a local shortage of routing resources.
Congestion Minimization (2) Definition of Congestion Cost
- Global bin (bin): partition a given chip into several rectilinear regions.
- Routing demand of a global edge e: number of nets crossing e
- Routing supply of a global edge e: function of the length of e (fixed value)
- Overflow: amount by which the routing demand exceeds the routing supply
- Measure of congestion
  - Total overflow of a placement
  - Number of congested edges
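The two congestion measures above can be sketched as follows, assuming demand and supply are given per global edge as dictionaries and that overflow is zero when demand does not exceed supply. The edge names and numbers are hypothetical.

```python
def overflow(demand, supply):
    """Amount by which routing demand exceeds routing supply (0 if it does not)."""
    return max(0, demand - supply)


def congestion_cost(demands, supplies):
    """Return (total overflow, number of congested edges) for a placement.

    `supplies` maps every global edge to its capacity; edges missing from
    `demands` are treated as having no nets crossing them.
    """
    overflows = [overflow(demands.get(e, 0), s) for e, s in supplies.items()]
    total = sum(overflows)
    congested = sum(1 for v in overflows if v > 0)
    return total, congested


if __name__ == "__main__":
    supplies = {"e1": 3, "e2": 3, "e3": 3}
    demands = {"e1": 5, "e2": 3, "e3": 1}
    print(congestion_cost(demands, supplies))
```

Both measures are zero exactly when no edge is over capacity, which is why a placer cannot distinguish between uncongested placements using overflow alone.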
Congestion Minimization (3) Consistent Model
- Congestion cost is router dependent.
- Congestion is dependent on the wiring cost.
- “Consistent”
  - Two routing models are defined to be “consistent” if the total weighted lengths of the routes are the same.
  - A model for a net consists of a set of segments. Each segment has a length and a weight. The total weighted length for a net is the sum of length times weight over its segments.
  - The total weighted length for all nets is the sum of the per-net totals.
Congestion Minimization (4) Congestion Distribution
- Correlation between Wirelength and Congestion
  - Total wirelength of a layout is equal to the total routing demand on all global edges.
  - Maximum routing demand is greater than or equal to the total wirelength divided by the number of global edges, i.e. the average routing demand.
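The bound above follows from the fact that a maximum is never smaller than a mean. As a short derivation, with symbols that are assumptions introduced here (the original notation is not preserved): let E be the set of global edges, d_e the routing demand of edge e, and WL the total wirelength measured in global-edge crossings.

```latex
\begin{aligned}
WL &= \sum_{e \in E} d_e
  && \text{(total wirelength equals total routing demand)} \\
\max_{e \in E} d_e &\ge \frac{1}{|E|} \sum_{e \in E} d_e = \frac{WL}{|E|}
  && \text{(the maximum is at least the average)}
\end{aligned}
```

Reducing wirelength therefore lowers this lower bound on peak demand, but it does not by itself guarantee that the peak demand goes down.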
Congestion Minimization (5) Theoretical Congestion Distribution
- Theoretical analysis
  - Expected number of wires crossing global edge e
Congestion Minimization (6) Objective Function
- An effective congestion objective should be sensitive to placement moves and directly related to the congestion cost.
- Objective function
  - Suppose the routing demand of e is [...] before a move and [...] after the move.
  - Direct overflow cost of this move
  - Cost = 0 when [...]
  - Cost is close to [...] when [...]
  - Don’t care (no congestion) when [...]
  - Overflow with Look-ahead
Control Logic Layout Synthesis (1) Main Feature
- High-performance control logic is sometimes implemented via custom (manual) layout.
  - Custom layout methods result in good electrical and area characteristics.
  - Productivity is very poor.
  - Using custom design for control logic is often a high-risk strategy because the reaction time to changes is long.
- Standard-cell and other fixed-library ASIC-like methods are often employed for control logic.
  - Design turnaround time using these methods is very fast and top-down constraints are accommodated well.
  - Overhead required to create the fixed cell library is substantial.
  - A poor timing/area/power tradeoff can occur.
- C5M: a new layout system for high-performance control logic which has been successfully used in the design of a 400 MHz IBM processor.
Control Logic Layout Synthesis (2) C5M Approach
- C5M generates hierarchical row-based macros for static CMOS logic.
- Schematic independence and device-sizing tuning are accomplished via on-the-fly leaf-cell synthesis.
- Flow
  - The macro HDL description is compiled into a gate-level schematic via logic synthesis. The synthesis target library consists of parameterized gate schematics and delay rules (no layout data).
  - Performance is optimized through manual or automatic device-size tuning.
  - The tuned schematic is restructured for cell generation through gate combining and splitting.
  - The leaf cells are synthesized to a macro-specific cell image.
  - The macro is assembled according to the macro image.
Control Logic Layout Synthesis (3) Leaf Cell Generation
- The leaf-cell schematic is converted into a symbolic layout using CCC (IBM cell compiler).
  - CCC operates by first splitting the devices in the schematic according to the maximum finger size, using selectable split strategies.
  - The placement engine accommodates multiple objectives such as minimum diffusion breaks, maximum gate alignment, minimum wire length, minimum number of contacts, etc.
- The symbolic layout is converted into a physical layout using CC (IBM layout compactor).
  - Uses the constraint-based, 1D model
  - Constraint-graph generation
  - Critical-path analysis
  - Wire-minimization
- Cell-image: the result of C5. It assures that the cells can be readily assembled: cell boundaries are regularized to enable cell abutment, and cell wiring is controlled to facilitate macro wiring.
Control Logic Layout Synthesis (4) Macro Assembly
- C5M uses a row-based macro assembly style.
- Placement is performed by an IBM Chip.Place: Qplace (quadratic programming model).
  - Not restricted to row-based models
  - Timing driven placement support
  - A number of functions for controlling cell placement through constraints or objectives
- Signal wiring is created using an IBM LGWire (maze router).
- Macro Image
  - Controls the top level physical design
  - Specifies pin assignments, bussing structure, macro shape, macro wiring porosity, row structure and configuration of special sub-macros
  - Size and pin data are automatically imported from the floorplan, and the macro image is constructed by an automatic utility that is parameterized with respect to the mask levels.
  - Bussing structure is a grid: Power/Ground (M1), Vertical Wire (M2), Horizontal Wire (M3)
PTL Logic / Layout Synthesis (1) Introduction
- YADD based layout technique
  - Linearized, pseudo-symmetric binary decision diagram based synthesis of a function
  - Can be directly mapped to pass transistor logic with very highly predictable delay and area
  - Based on low granularity 2-phase pipelining
- Advantage
  - Routing by abutment
  - Avoiding interconnect related parasitics
  - Delay: cell delay
  - Equalizes the delays of the different paths to very small margins of spread
  - Able to “Wave Steer” the circuits
- The obvious limitation
  - The size of the layout can be larger than that of the standard cell implementation.
  - In some cases the latency of our implementation can be more than that of the standard cell one, though the clocking frequency can still be high because of the coexistence of multiple data waves.
  - Will not be good for feedback systems
  - Will be good for data path circuits
PTL Logic / Layout Synthesis (2) Topology of Synthesized YADD Structure (1)
- LBDD
  - Defined as an Ordered BDD which grows linearly in the number of nodes per level
  - C2 and C3 in the 3rd level can be merged
  - Not every function can be represented by an LBDD
PTL Logic / Layout Synthesis (3) Topology of Synthesized YADD Structure (2)
- PSBDD (Pseudo-Symmetric BDD)
  - Allows for multiple levels labeled with the same variable
  - Created by repeated application of Shannon’s expansion
  - Merging adjacent non-conflicting nodes and/or join operation
  - Has a regular structure and can be directly mapped to layout
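Shannon’s expansion, which the construction above applies repeatedly, rewrites a function f as "if x then f with x=1 else f with x=0". The sketch below checks this identity on a small example. The helper names and the majority function are assumptions chosen for illustration, and the sketch does not attempt the merging or join steps.

```python
from itertools import product


def cofactor(f, var, value):
    """Return f with the keyword argument `var` fixed to `value`."""
    def fixed(**assignment):
        return f(**{**assignment, var: value})
    return fixed


def shannon(f, var):
    """Rebuild f from its two cofactors with respect to `var`."""
    f0, f1 = cofactor(f, var, 0), cofactor(f, var, 1)

    def expanded(**assignment):
        # Same selection a 2:1 multiplexer controlled by `var` would perform.
        return f1(**assignment) if assignment[var] else f0(**assignment)
    return expanded


def majority(a, b, c):
    """Example function: 1 when at least two of the inputs are 1."""
    return (a & b) | (b & c) | (a & c)


if __name__ == "__main__":
    g = shannon(majority, "a")
    same = all(
        g(a=a, b=b, c=c) == majority(a=a, b=b, c=c)
        for a, b, c in product((0, 1), repeat=3)
    )
    print(same)
```

Each application of the expansion produces one decision node with two children, which is why a diagram built this way maps naturally onto a grid of multiplexers.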
PTL Logic / Layout Synthesis (4) Topology of Synthesized YADD Structure (3)
- YADD (Yet Another Decision Diagram)
  - Generalization of the PSBDD and LBDD
  - Unrestricted ordering of child nodes of a parent
  - Two adjacent nodes in a level that can be merged
  - Any leaf node must be present only at the lowest level of the structure
  - Exterior don’t cares: the process of joining cofactors and repeating variables creates don’t cares which can be useful in the subsequent level
PTL Logic / Layout Synthesis (5) Topology of Synthesized YADD Structure (4)
- Exterior don’t care
  - Two adjacent nodes in a level are in conflict and no reordering of the parent nodes can merge them: not solvable
  - Assign some care values to the exterior don’t cares: merging is possible
- Interior don’t care
  - When both parents are the same
  - More powerful than exterior don’t care
PTL Logic / Layout Synthesis (6) Topology of Synthesized YADD Structure (5)
- Algorithm for generating YADDs
  - Goal: generate a YADD from a logic specification
  - Cost function: minimize the number of levels of the YADD
  - Input: blif / Output: YADD
  - Variable selected: maximize the number of don’t care minterm pairs after the merging
  - During any joining operation, the algorithm tries to create more interior don’t cares
PTL Logic / Layout Synthesis (7) VLSI Realization of the YADD
- Implementation
  - Regular two-dimensional structure of the YADD: the entire structure can be mapped directly to silicon by the simple expedient of replacing every node with a pass transistor logic MUX and an inverter.
  - Why an inverter? An n-FET pass transistor degrades a logic high signal at its input. Inverters:
    - Have faster rise and fall times
    - Carry out voltage restoration and improve noise margin
    - Can be sized selectively to equalize the different path delays
- Requirement for PTL circuits
  - No output should remain floating for any combination of inputs.
  - There should be no sneak paths in the circuits.
  - To make ‘safe buffer insertion’, no internal node should be kept floating.
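The node-to-MUX mapping can be illustrated at the logic level. The sketch below represents a decision diagram as nested tuples `(variable, low_child, high_child)` with leaves 0 and 1, and evaluates it as a tree of 2:1 multiplexers. The tuple encoding, the function names and the XOR example are assumptions for illustration. The restoring inverters are omitted because they do not change the logic value once their polarity is accounted for.

```python
def mux(select, low, high):
    """2:1 multiplexer: passes `high` when select is 1, otherwise `low`."""
    return high if select else low


def evaluate(node, inputs):
    """Evaluate a decision diagram given as nested (var, low, high) tuples."""
    if node in (0, 1):          # leaf: constant value
        return node
    var, low, high = node
    return mux(inputs[var], evaluate(low, inputs), evaluate(high, inputs))


if __name__ == "__main__":
    # Hypothetical diagram for a XOR b: the b-level appears under both branches of a.
    xor = ("a", ("b", 0, 1), ("b", 1, 0))
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, evaluate(xor, {"a": a, "b": b}))
```

Because every node selects exactly one of its two children, every input combination drives the output through exactly one path, which matches the requirement that no output floats.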
PTL Logic / Layout Synthesis (8) Physical Layout Details (1)
- Why use a 2-phase clock scheme?
  - If inputs are clocked simultaneously at a higher frequency to make many waves coexist in the structure, data will be corrupted.
PTL Logic / Layout Synthesis (9) Physical Layout Details (2)
- Why use D-FFs?
  - Delay logic values by an integer number of clock periods
  - For a YADD of depth L, we have (L-1)/2 FFs at the root level and 0 at the lowest level
  - The number of FFs increases by 1 every 2 levels
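The flip-flop counts above can be tabulated per level. This sketch assumes that levels are indexed from the root (index 0) to the lowest level (index L-1), and that (L-1)/2 is rounded down when L is even. Both the indexing and the rounding are assumptions, since the slide does not state them.

```python
def ffs_per_level(depth):
    """Number of D flip-flops delaying the input of each level, root first.

    Assumption: integer division, so the count steps by 1 every 2 levels.
    """
    return [(depth - 1 - level) // 2 for level in range(depth)]


if __name__ == "__main__":
    print(ffs_per_level(5))
```

For a depth of 5 this gives 2 flip-flops at the root and none at the two lowest levels, consistent with the counts stated above.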
PTL Logic / Layout Synthesis (10) Physical Layout Details (3)
- FF Cell
  - Skewing of input data to provide time alignment
  - Compact, low power, dynamic shift register cell used
- Driver
  - Convert from dynamic to static logic
  - Subsequent inverter and static CMOS inverter pair
References & Suggested Readings
[1] John A. Chandy, Prithviraj Banerjee. A Parallel Circuit-Partitioned Algorithm for Timing Driven Cell Placement. Proceedings of the 1997 IEEE International Conference on Computer Design: VLSI, 1997.
[2] Wilsin Gosti, Amit Narayan, Robert K. Brayton, Alberto L. Sangiovanni-Vincentelli. Wireplanning in Logic Synthesis. Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 26-33, 1998.
[3] Burns JL, Feldman JA. C5M - A Control-Logic Layout Synthesis System for High-Performance Microprocessors. IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems, V. 17 N. 1, 14-23, 1998.
[4] Peichen Pan. Performance-driven integration of Retiming and Resynthesis. Proceedings of the 36th ACM/IEEE Design Automation Conference, 1999.
[5] Chieh Changfan, Yu-Chin Hsu, Fur-Shing Tsai. Post-Routing Timing Optimization with Routing Congestion. Proceedings of the 1999 International Symposium on Physical Design, 1999.
[6] Maogang Wang, Majid Sarrafzadeh. On the Behavior of Congestion Minimization during Placement. Proceedings of the 1999 International Symposium on Physical Design, 1999.
[7] Kahng AB, Robins G, Singh A, Zelikovsky A. Filling algorithms and analyses for layout density control. IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems, V. 18 N. 4, 445-462, 1999.
[8] Chang SC, Cheng KT, Woo NS, Marek-Sadowska M. Postlayout Logic Restructuring Using Alternative Wires. IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems, V. 16 N. 6, 587-596, 1997.
[9] Salek AH, Lou JN, Pedram M. An integrated logical and physical design flow for deep submicron circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems, V. 18 N. 9, 1305-1315, 1999.
[10] Gaj K, Herr QP, Adler V, Krasniewski A, Friedman EG, Feldman MJ. Tools for the computer-aided design of multigigahertz superconducting digital circuits. IEEE Transactions on Applied Superconductivity, V. 9 N. 1, 18-38, 1999.