- Number of slides: 51
EDA Court: Hierarchical Construction and Timing Sign-off of SoCs. TAU 2013 Panel
The good side of hierarchy [Figure: hierarchy tree with fan-out k at each level: Chip (h=0), Chiplet (h=1), Core (h=2), Unit (h=3), Macro (h=4)]
Impact of pruning [Plot: fraction of chip at top vs. unpruned fraction, one curve per hierarchy depth h=0 to h=5] Sweet spot: 50 B objects, 2 M per macro, fraction at top = 4e-5, 4 levels of hierarchy, pruning = 93%
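The sweet-spot numbers can be cross-checked with simple arithmetic. The sketch below assumes a uniform tree (the same fan-out at every level), which the slide does not state; it only shows that the quoted fraction at top matches the ratio of one macro to the whole chip. It does not attempt to reproduce the 93% pruning figure, whose model is not given.

```python
# Figures quoted on the slide.
TOTAL_OBJECTS = 50e9       # "50 B objects"
OBJECTS_PER_MACRO = 2e6    # "2 M per macro"
LEVELS = 4                 # "4 levels of hierarchy"

# Number of macro-sized partitions needed to cover the chip.
macro_count = TOTAL_OBJECTS / OBJECTS_PER_MACRO

# Assumption (not from the slide): uniform fan-out k, so k**LEVELS == macro_count.
fanout = macro_count ** (1 / LEVELS)

# Ratio of one macro-sized partition to the whole chip.
macro_fraction = OBJECTS_PER_MACRO / TOTAL_OBJECTS

print(f"macros: {macro_count:.0f}")
print(f"fan-out per level: {fanout:.1f}")
print(f"macro/chip ratio: {macro_fraction:.0e}")
```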
The bad side of hierarchy • Accuracy? Pessimism? – Coupling noise? – Functional noise? – Multiple interacting clocks? • Parasitics on boundary nets? • Is “context” required? If so, we cannot “shelve and re-use” macros • Construction flow? • Draconian methodology restrictions?
Chandu Visweswariah, Distinguished Engineer, IBM, East Fishkill, NY, chandu@us.ibm.com; Oleg Levitsky, Solutions Architect, Cadence, San Jose, CA, oleg@cadence.com; Qiuyang Wu, Senior Staff Engineer, Synopsys, Hillsboro, OR, qwu@synopsys.com; Amit Shaligram, Principal Engineer, STMicroelectronics, Scottsdale, AZ, amit.shaligram@st.com; Guntram Wolski, Principal Engineer, Cisco, San Jose, CA, gwolski@cisco.com; Larry Brown, Design Center Engineer, IBM, San Jose, CA, lmbrown@us.ibm.com; Alex Rubin, Senior Engineer, IBM, San Jose, CA, rubin1@us.ibm.com; Alexander Skourikhin, EDA Engineer, Intel, Haifa, Israel, alexander.skourikhin@intel.com; Igor Keller, Senior Architect, Cadence, San Jose, CA, ikeller@cadence.com
Panel plan • Charge 1 (10 min): Hierarchical implementation and hence hierarchical timing sign-off don’t have a future. Plaintiff: Oleg Levitsky, Cadence. Defendant: Qiuyang Wu, Synopsys • Charge 2 (10 min): EDA tools and flows are inadequate for a construction flow: budgeting, IP models and hierarchical constraint development are lacking. Plaintiff: Amit Shaligram, STMicro. Defendant: Alex Rubin, IBM • Charge 3 (10 min): You can never really close out-of-context + Misdemeanor charge: too much additional complexity and software. Plaintiff: Guntram Wolski, Cisco. Defendant: Alexander Skourikhin, Intel • Charge 4 (10 min): Hierarchical timing cannot handle multiple interacting synchronous clocks. Plaintiff: Larry Brown, IBM. Defendant: Igor Keller, Cadence • Discussion and audience questions (30 min) • Verdicts and “damages” (5 min)
Charge 1: Hierarchical implementation and hence hierarchical timing sign-off don’t have a future Plaintiff: Oleg Levitsky, Cadence Defendant: Qiuyang Wu, Synopsys
Evolution of design flow: Prototype → Implement → Sign Off
Evolution of design flow [Figure: Prototype, Implement and Sign Off stages; the design is partitioned into blocks Blk 1, Blk 2, …, Blk n]
Evolution of design flow [Figure: Prototype, Implement and Sign Off stages over blocks Blk 1, Blk 2, …, Blk n] Quiz: Why hierarchical flow? • Create more work for managers • Contribute to real estate bubble • Control time to market schedule?
Hierarchical design flow [Figure: Prototype, Implement and Sign Off stages over blocks Blk 1, Blk 2, …, Blk n; labels: Complexity, Hierarchical scalability]
Hierarchical design flow [Figure: Prototype, Implement and Sign Off stages over blocks Blk 1, Blk 2, …, Blk n; Step 1, Step 2, …, Step n, tapeout] Flow convergence is key
Hierarchical design flow [Figure: Prototype, Implement and Sign Off stages over blocks Blk 1, Blk 2, …, Blk n; label: Convergence] • Technical challenges: – SI – Over-the-block routing – Useful skew distribution – CPPR modeling – Power budgeting – Channelless designs – … • Human factor: – Level of expertise – Human error – Lack of sleep
Hierarchical design flow [Figure: Prototype, Implement and Sign Off stages over blocks Blk 1, Blk 2, …, Blk n; labels: Complexity, Hierarchical scalability, Convergence] Failed to control TTM
What is the alternative?
Hierarchical Design and Timing Closure is the Only Way to Have a Future. Qiuyang Wu, Sr. Staff Engineer, Synopsys Inc. March 2013. © Synopsys 2013
Hierarchical Implementation is Proven • Way back when in the last century – Designs grew beyond the reach of flat implementation – Established hierarchical methodologies, tried and true • The success will continue because – naturally an iterative and gradual refinement process – relatively larger error margins and tolerances for tradeoff – more about reuse and integration, less about from scratch – … [Figure labels: +1 M gates, +100 M gates]
But, “Classic” Hierarchical Timing is Inadequate for Signoff • Gap #1 - Burden is on the users: “Garbage in, garbage out” – Block designers do not have quality constraints – Can’t close block timing with confidence: pessimism, optimism – Can’t create quality models: pessimism, optimism • Gap #2 - Language limitations: critical details can’t be elaborated – Chip level designers do not have means to express design intention – Can’t describe I/O timing context accurately and completely – Can’t cover different reuse scenarios • The rescue: flat signoff. [Figure: chip netlist, chip parasitics and full-chip golden constraints feed flat STA (golden); block netlist, block parasitics and ad-hoc block constraints feed hierarchical STA through block models (ILM, ETM, glass-box, black-box, …)] • However, hierarchical signoff is the only way to stay on top of the technology curve.
And Here is How to do Hierarchical Signoff • The Recipe on Top of Signoff Quality Engine • Provide hierarchical constraint management – Check and highlight inconsistencies • Provide context feedback and allow refinement – Produce accurate and elaborate timing environment • Provide Ease-of-Use through data / flow automation – Minimize/prevent user errors by construction • The Benefits Go Beyond Signoff – Design faster: throughput and interoperation with implementation – Design better: accuracy enables further optimization for power, leakage, robustness, area, etc.
Charge 2: EDA tools/flows are inadequate for a construction flow: budgeting, IP models, hierarchical constraint development are lacking Plaintiff: Amit Shaligram, STMicroelectronics Defendant: Alex Rubin, IBM
Hierarchical Constraints & Budgeting Amit Shaligram, Principal Engineer STMicroelectronics
Models – Accuracy, speed and compatibility • Which model to use? • ETM or .lib – Reasonable for use before clock tree. • ILM – Required after clock tree insertion • Model accuracy • Different modes at block and top level, block/top constraint mismatches • Handling of high fanout and static nets • Model compatibility • Models between different vendors/tools are not compatible. • Some tools create “physical ILMs”, others only “timing ILMs” • It takes time… • For a ~2 M instance block: 1 scenario (1 mode/1 corner) takes ~6-8 hours • Quickly becomes impractical with 25 blocks, ~5 modes and ~16 corners • Can someone create models on the fly? Just use the DEF!
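The "quickly becomes impractical" point follows from multiplying out the quoted numbers. The sketch below assumes every block is about as large as the ~2 M instance example and that every mode/corner combination needs its own model; neither is stated on the slide, and the machine count is a hypothetical value.

```python
# Figures quoted on the slide.
BLOCKS = 25
MODES = 5
CORNERS = 16
HOURS_PER_RUN = (6, 8)  # low and high estimate per scenario

def total_machine_hours(blocks, modes, corners, hours_per_run):
    """Machine-hours to build one model per block, mode and corner."""
    return blocks * modes * corners * hours_per_run

def wall_clock_days(machine_hours, machines):
    """Elapsed days if runs are spread evenly over `machines` hosts."""
    return machine_hours / machines / 24

low = total_machine_hours(BLOCKS, MODES, CORNERS, HOURS_PER_RUN[0])
high = total_machine_hours(BLOCKS, MODES, CORNERS, HOURS_PER_RUN[1])

print(f"model-generation runs: {BLOCKS * MODES * CORNERS}")
print(f"machine-hours: {low} to {high}")
# Hypothetical farm of 100 machines, perfectly balanced.
print(f"days on 100 machines: {wall_clock_days(low, 100):.1f} to {wall_clock_days(high, 100):.1f}")
```

Even with a perfectly balanced farm, each design iteration that invalidates the models repeats this cost.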
Budgeting • Floorplan and constraints – a chicken and egg problem! • Estimation of feedthru delays can be challenging. • Consider crosstalk effect! • Best practices not easy to follow all the time (FF at the boundary) • Critical path from a macro, legacy design, cannot tolerate extra latency • Managing hold violations with FF at the boundary • Uncommon clock path creates hold violations due to OCV impact. • SDC format limitations after clock tree insertion • Input/Output delay is specified with respect to virtual clock • Latency of virtual clock changes with every step of the flow (post-CTS, post-route SI)
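To make the budgeting problem concrete, here is a minimal sketch of one possible allocation rule: subtract the estimated feedthru delay and a crosstalk margin from the clock period, then split the remainder between the driving and receiving blocks in proportion to their estimated internal delays. The function name, the proportional rule and all numbers are illustrative assumptions, not a method described in the talk.

```python
def split_budget(period_ns, feedthru_ns, src_est_ns, dst_est_ns, margin_ns=0.0):
    """Split one clock period across a block-to-block path.

    Returns (src_budget_ns, dst_budget_ns). Hypothetical rule: the time left
    after feedthru delay and margin is shared in proportion to the estimates.
    """
    available = period_ns - feedthru_ns - margin_ns
    if available <= 0:
        raise ValueError("feedthru delay and margin consume the whole period")
    total_est = src_est_ns + dst_est_ns
    src_budget = available * src_est_ns / total_est
    dst_budget = available - src_budget
    return src_budget, dst_budget

# Hypothetical numbers: 1.0 ns period, 0.2 ns feedthru, 0.1 ns crosstalk margin.
src, dst = split_budget(1.0, 0.2, src_est_ns=0.3, dst_est_ns=0.4, margin_ns=0.1)

# Time consumed outside the source block; this is the kind of value an
# output-delay constraint against a virtual clock would carry.
external_for_src = 1.0 - src

print(f"source block budget: {src:.2f} ns")
print(f"destination block budget: {dst:.2f} ns")
print(f"external delay seen by source block: {external_for_src:.2f} ns")
```

If the feedthru estimate changes after routing, both budgets move, which is one way to read the "chicken and egg" remark.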
Hierarchical Constraints • Top down or bottom-up constraints development flow? • How to ensure that block and top constraints are aligned? • Constraint modifications required when using .lib or ILMs in top level • Generated clock definitions inside blocks create “new internal” clocks/pins • Handling large constraint files created within ILM generation flow(s) • Boundary conditions for hold? • How to estimate set_min_delay accurately? • Crosstalk effects of top level clock tree • How much margin is too much margin inside the blocks? • Using infinite timing windows inside the blocks is overkill
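One simple form of an alignment check is to compare clock definitions seen at the block and at the top. The sketch below assumes the clocks have already been extracted into plain name-to-period dictionaries; the function, clock names and periods are hypothetical, and a real flow would compare far more (latencies, exceptions, I/O delays).

```python
def find_clock_mismatches(top_clocks, block_clocks, tol_ns=1e-6):
    """Report block clocks that are missing at top or have a different period.

    Both arguments map clock name -> period in ns (hypothetical representation).
    """
    issues = []
    for name, block_period in sorted(block_clocks.items()):
        if name not in top_clocks:
            issues.append(f"{name}: defined in block only")
        elif abs(top_clocks[name] - block_period) > tol_ns:
            issues.append(
                f"{name}: period {block_period} ns in block vs {top_clocks[name]} ns at top"
            )
    return issues

# Hypothetical constraint data.
top_clocks = {"clk_core": 1.0, "clk_io": 2.5}
block_clocks = {"clk_core": 1.0, "clk_io": 2.0, "clk_gen": 4.0}

issues = find_clock_mismatches(top_clocks, block_clocks)
for issue in issues:
    print(issue)
```

Here `clk_gen` stands in for a generated clock that exists only inside the block, the "new internal" clock case mentioned above.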
Living in a flat world? March 27, 2013
Long list of charges that simply don’t stick… • Many teams have used hierarchy successfully to tape out designs! – Large problems require the use of “divide and conquer”. – Vast amount of design experience, understanding and overcoming practical challenges. • Tools help establish hand-shake across hierarchical levels. – Verification of boundary conditions and assumptions. – Automatic constraint generation and management. – Enforcement of best design practices. • Significant body of “do’s and don’ts” to help provide guidance, improve efficiency and reduce pessimism.
Follow best hierarchical design practices. Simple rules can make hierarchy easy(er)! • Isolate output loading from internal paths! • Flop bound the design! • Avoid critical paths crossing boundaries! • Use single macro clock input! [Figure: Macro A containing Flop 1 and Macro B containing Flop 2, each flop with D, Q and CLK pins]
Hierarchy is a “must have”! [Chart: run time (hours) vs. object count per unit, for deterministic and statistical timing; annotations: 44 M objects, 5X speedup, 10+ days] • Parallelizes timing and optimization of independent paths to improve overall efficiency. • Better supports timing closure when different macros / top level are at different “stages” of completeness. • Fosters uninterrupted design fix-up loop. • More resilient to failure.
Charge 3: You can never really close out-of-context + Misdemeanor charge: too much additional complexity and software Plaintiff: Guntram Wolski, Cisco Defendant: Alexander Skourikhin, Intel
Hierarchical Timing: Felonies or Misdemeanors? Guntram Wolski – Cisco Systems, Principal Engineer, Enterprise Networking Group
• You can come close, but that only counts in … Or if you start worst-casing things, you’ve overdesigned… • You can set goals/targets for blocks, but then reality sets in. You end up opening the block, as it is the “right thing to do” in order to close. • Multiple instances of same core – How do you wire over/through the cores? – Wiring bays: what if you don’t have enough in some areas? – Wire over the top == create new extraction/unique timing problems. – Noise issues: not every instance has the same IR drop/noise profile
• Requires strict PD requirements to be effective – Very strict methodology to be effective – Need flopped boundaries – Long distance routes/fly-overs need extra handling or must be pushed down – Legacy designs/IP integration cause immediate loss of benefit – Integration/adoption complexity seems greater than with other tools – Logic designers have very little interest in helping PD: “It’s good enough, live with it.” “I’m not paid to improve your problems, I just meet timing.” “I have to work on something else, you have to fix it.” • Are we leaving performance on the table? – Subchips need to be designed to guardbanded conditions on I/Os and IR drop
• Why are we not looking at taking advantage of parallelism? – Are these not many individual paths? – If DRC can run on 120 CPUs and benefit, why can’t timing? – Break up the problem and distribute to my farm…
Defense • Timing closure is an iterative process • Controllability is the key for success • Start from initial spec • Once design is getting mature, gradually refine environmental requirements and increase model accuracy • Finally, you see the “real” timing requirements, avoiding overdesign • Non-overdesigned multi-instantiated blocks are reality • Must see all the requirements (timing, parasitics) w/o worst casing • Clocks handling is the real challenge • Noise is never an issue (at most – make worst case between instances) • Reusable IPs are feasible • Have to use accurate block models (adjustable to a new env.) • Have to apply design restrictions on interfaces
Defense (cont.) • Have to apply methodological restrictions to block interfaces • Driver size, wire length, ports, etc. • All of them are manageable and ease integration on top level • Doesn’t necessarily lead to overdesign, due to accurate block models • Applicable to both flop and latch based designs • Timing analysis is highly parallelizable • Individual block analysis is naturally done in parallel • Top level analysis might – leverage multi-threading technologies in STA algorithms – be divided into clusters, with every cluster analyzed in parallel
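The cluster idea can be sketched as a map-and-merge: compute the worst slack of each cluster independently, then take the minimum. This assumes the clusters share no timing paths, which is the hard part in practice and is not addressed here. The data layout and names are hypothetical; threads are used only to show the structure, and a process pool or compute farm would be needed for real CPU parallelism in Python.

```python
from concurrent.futures import ThreadPoolExecutor

def worst_slack(paths):
    """Return the smallest (required - arrival) over one cluster's paths."""
    return min(required - arrival for arrival, required in paths)

def analyze_clusters(clusters, max_workers=4):
    """Analyze each cluster independently, then merge by taking the minimum."""
    names = list(clusters)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        slacks = list(pool.map(worst_slack, (clusters[n] for n in names)))
    per_cluster = dict(zip(names, slacks))
    return per_cluster, min(slacks)

# Hypothetical data: (arrival_ps, required_ps) for each timing path.
clusters = {
    "cluster_a": [(800, 1000), (950, 1000)],
    "cluster_b": [(400, 1000), (1020, 1000)],
    "cluster_c": [(700, 900)],
}

per_cluster, chip_worst = analyze_clusters(clusters)
print(per_cluster)
print(f"worst slack over all clusters: {chip_worst} ps")
```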
Summary • Efficient and Reliable Hierarchical Flow requires two essential factors: • A robust project methodology, which • Enforces design restrictions • Takes advantage of IP Reuse • Provides continuous timing picture throughout all project phases • Allows productive ECO work • Advanced EDA tools, which • Are flexible and allow controllability between accuracy and simplicity • Can efficiently handle Multi-X environments (X=system, corner, clocks, etc.) • Utilize parallel computing techniques • Support batch and ECO modes
Charge 4: Hierarchical timing cannot handle multiple interacting synchronous clocks Plaintiff: Larry Brown, IBM Defendant: Igor Keller, Cadence
Hierarchical timing cannot handle multiple interacting synchronous clocks • Define the problem:
Definition continued • If clk1X is later than clk2X, we reduce our setup margin. • If clk1X is earlier than clk2X, we reduce our hold margin. – We don’t know the real relationship between the two clocks until we have our top level established. • This makes it difficult to close timing on the logic macro and “put it on the shelf.” • The problem is magnified if the logic macro is re-used. – In that case, the setup and hold margins of the logic macro must span all existing clk1X-clk2X relationships.
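The effect of the clock relationship on margins can be written down directly. The sketch below assumes the first clock launches and the second captures, with skew defined as launch arrival minus capture arrival; that sign convention and all numbers are illustrative assumptions, since the slide's diagram is not available.

```python
def margins(setup_nominal_ps, hold_nominal_ps, skew_ps):
    """Margins for one instance.

    skew_ps > 0 means the launch clock arrives later than the capture clock,
    which eats into setup; skew_ps < 0 eats into hold.
    """
    return setup_nominal_ps - skew_ps, hold_nominal_ps + skew_ps

def worst_case_margins(setup_nominal_ps, hold_nominal_ps, skews_ps):
    """Margins a re-used macro must survive across all instance skews."""
    worst_setup = min(setup_nominal_ps - s for s in skews_ps)
    worst_hold = min(hold_nominal_ps + s for s in skews_ps)
    return worst_setup, worst_hold

# Hypothetical macro with 100 ps setup and 40 ps hold margin at zero skew,
# instantiated three times with different top-level clock skews.
skews = [-30, 10, 25]
setup_margin, hold_margin = worst_case_margins(100, 40, skews)
print(f"worst setup margin: {setup_margin} ps")
print(f"worst hold margin: {hold_margin} ps")
```

The setup limit is set by the most positive skew and the hold limit by the most negative one, so a shelved macro pays for both extremes at once.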
Fixes from timing methodology • Option 1: Assert an uncertainty between clk1X and clk2X in macro timing, and validate this uncertainty when running top level timing. – Problem with this: we leave performance/area on the table by lowering cycle time and/or over-padding hold fails. – If top level can’t meet this requirement, we must open up logic macro for further work. • Option 2: ???
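Option 1 amounts to a two-step contract: the macro is timed with an asserted uncertainty, and top-level timing later checks that each instance's actual skew stays within it. A minimal sketch of the validation step follows, with hypothetical instance names and values; it treats the uncertainty as a symmetric bound on skew magnitude, which is an assumption.

```python
def check_asserted_uncertainty(asserted_ps, instance_skew_ps):
    """Return the instances whose actual skew exceeds the asserted bound."""
    return {
        inst: skew
        for inst, skew in instance_skew_ps.items()
        if abs(skew) > asserted_ps
    }

# Hypothetical top-level results: skew between the two clock pins per instance.
instance_skew_ps = {"macro_0": 12, "macro_1": -35, "macro_2": 48}

violations = check_asserted_uncertainty(40, instance_skew_ps)
print(violations)

# Smallest symmetric uncertainty that would cover every instance.
required_ps = max(abs(s) for s in instance_skew_ps.values())
print(f"uncertainty needed to cover all instances: {required_ps} ps")
```

Any violation means either fixing the top-level clock distribution or re-opening the macro with a larger uncertainty, which is the cost the slide points out.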
The best solution: Fix the design. Update the design so we do not have multiple synchronous clock inputs in the first place.
Conclusion: Perhaps it’s more accurate to say that hierarchical timing can handle multiple synchronous clock inputs, but cannot do this without leaving performance and/or area on the table. In other words, it does not lead to the most efficient design.
Defense: • First and foremost, the defendant pleads not guilty • The charge from the plaintiff only means that there is no free lunch • For hierarchical timing to work, designers must follow certain rules – They are well described in Alex Rubin’s defense – Specifically, one should have a single clock pin in a block to avoid extra pessimism in hold/setup timing • In the case of multiple clock pins, the plaintiff himself exonerated the defendant by proposing a solution: it is possible to remove some of the pessimism by describing the relationship between the two clocks
Defense (cont.) • Advanced SI analysis today reduces pessimism if victim and aggressor share the same clock • SI analysis also becomes more problematic with multiple clock pins • With multiple clock pins one assumes the clocks are different, leading to – Pessimism if uncertainty is assigned to both pins – Optimism if no uncertainty is assigned • As is often true, the best way to resolve a problem is to avoid creating it: stick to the rules of a hierarchy-friendly design methodology
Ways to Remove the Limitation • There are ways to define the relationship between two internal clocks: – Through a parent external clock – Explicitly define ranges of skews • Parameterization of timing models with skew on two clocks is possible • These enhancements are feasible but need to be driven by real commercial interest
Q & A, Verdicts, Damages!!!


