

  • Number of slides: 198

From ESL to Implementation: Reinventing Hardware Design using Bluespec SystemVerilog™
© 2006, Bluespec, Inc.
Copyright © Bluespec Inc. 2006. Confidential and Proprietary

Joe Stoy
Founder and Principal Engineer
Bluespec Inc., 14-16 Spring Street, Waltham MA 02451, USA
+1 781 250 2206
stoy@bluespec.com
www.bluespec.com

Bluespec SystemVerilog Workshop Agenda
• Intro: why an HDL can affect overall productivity, from concept to silicon
• Behavior:
  - Rules: a new way to express HW behavior
  - Correctness: why rules help
  - Comparison with behavioral synthesis
  - Rule-based Interface Methods: modularizing rules
• Structure: improving the expression of HW structure using ideas from advanced programming languages
• Clock domains and gated clocks: compiler-guaranteed safety
• Testbenches using BSV
• Transaction Level Modeling/architecture exploration and refinement, within a single paradigm
• Synthesis quality: as good as hand-coded RTL
• Tool flows
• Futures:
  - Formal verification

Intro: why an improved HDL is central to addressing today’s chip design complexities

Moore’s Law: “Silicon capacity doubles every 18 to 24 months”
Today (2005):
• ~10-20 M gates
• 90 nm, 65 nm
Source: http://www.intel.com/technology/silicon/mooreslaw/index.htm

Today’s chips: “SoC”s (System on a Chip)
“IP” (“Intellectual Property”) blocks:
• Processors
• Caches, memories
• Interconnects
• DMAs
• Other peripheral blocks
• I/O blocks
E.g., cell phones, cell network base stations, TV set-top boxes, iPods, digital cameras, …

ASIC design flow, and costs
Flow over time: Architecture, Design, Verification and Test, Physical Design
• Can take ~12-24 months
• Can cost $10 million+ (and rising)
• Bug respin cost + market window cost

Verification costs and chip quality are getting worse
• 66% of new ICs/ASICs require at least one re-spin
• 75% of those re-spins are due to logical/functional errors (an increase from 71% two years prior)
Source: 2004 Collett study
Source: IBM/IBS, Inc.

Design affects everything!
Flow: Architecture, Design, Verification and Test, Physical Design
• Myth: improving the Design language will have little impact
• In fact, the Design language impacts all activities

How to improve productivity?
“It is a profoundly erroneous truism, repeated by all copybooks and by eminent people when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them. …”
Alfred North Whitehead, mathematician and philosopher (1861-1947)
[Example: long division used to be an advanced subject in the days of Roman numerals; Arabic numerals changed that]

The language of design is crucial!
Software analogy: Assembler, Fortran, C, C++, Java
• No theoretical difference (all Turing-complete)
  - “I can produce better code by writing it in Assembler”
    - Maybe, if you are given enough time!
    - “Better” = more efficient, but not more readable, maintainable, or reusable
• You can still write incorrect code; you still need to debug; you still need to verify. But the probabilities of certain bugs decrease and the kinds of bugs change, as you go to higher levels:
  - Register protocol, argument/result-passing protocol, stack protocol, byte/word-alignment issues
  - Reentrancy and recursion protocols
  - Memory layout of complex data
  - Memory allocation/deallocation
  - Type-misinterpretation
  - Code reuse (parameterization and polymorphism)

Some lessons from SW language history
• The size/complexity of the system that you can build, correctly, within a short time, improves with higher levels of abstraction
• But also, crucially, people will not/cannot use your new higher-level language for serious work
  - if it sacrifices efficiency
  - if it is unpredictable/uncontrollable

Evolution of HDLs (Hardware Description Languages)
Timeline:
• Hand-drawn circuit diagrams (schematics)
• Schematic capture (automated)
• ~1985: text-based RTL languages: Verilog & VHDL
• 1995, 2001, 2005: IEEE Verilog standards (also VHDL standards)
• 2004: SystemVerilog (Accellera); 2005: IEEE ?
(RTL = Register-Transfer Level)

Bluespec: Better Design Accelerates Everything!
Flow: Architecture, Design, Verification and Test, Physical Design
• More architectural flexibility during design
• Architectural exploration
• Early executable models
• 50% reduction from design to verified netlist
• 50% reduction in errors, faster correction
• Faster fixes, to achieve closure
• Better reuse
Fully synthesizable, without compromise!

Bluespec, Inc. company and technology background

Bluespec, Inc. background
Timeline (technology and VC funding flow from each stage to the next):
• ~1996: Research@MIT on high-level synthesis & verification
• 2000: Sandburst Corp: 10 Gb/s core router ASICs (Bluespec: further technology development)
• 2003: Bluespec, Inc.: high-level design and synthesis tool (SystemVerilog-based)

Bluespec, Inc.
• Headquartered in Waltham, MA
  - ~45 people (MA, CA, Europe, Armenia, India)
• Technology, 1997-present
  - MIT research: Professor Arvind, students & colleagues
  - Patented: HW synthesis from Rules
• Active IEEE P1800/Accellera member; SV language contributor, SystemC language contributor

What does Bluespec offer?
• A new and powerful way to explore and express designs: SystemVerilog (design subset) with Rules and Rule-based Interfaces
• Tools to simulate and to synthesize into quality RTL
• Feeding into existing RTL-to-chip tools/flows
[Diagram labels: Bluesim (cycle accurate); Bluespec Synthesis; Verilog 95 RTL; Verilog sim; RTL synthesis, physical design; tapeout]

Bluespec Solutions

Bluespec core technologies
• Design: executable specifications
  - Synthesizable, high-level concurrency semantics
  - Transactional interfaces for design with self-documenting protocol
• Verification: static and formal
  - Strong type checking
  - Interface connectivity and protocol checking
  - Race condition identification and management
  - Multiple-domain clock and interface checking
  - Rapid simulation with C/C++ functions

Bluespec tools
[Diagram labels: SystemC [ESE]; TRANSLATE; BSV; Common Synthesis Engine (parsing, static checking, optimization, scheduling, power optimization, RTL generation); Bluesim (gcc, .exe): rapid, source-level simulation and interactive debug of BSV; Blueview debug; SystemC simulation (parsing, libsystemc.h); Bluespec Synthesis: cycle-accurate RTL with Verilog sim]

Creating ESL methodologies
Abstraction levels, with purpose, design components, and prerequisites for each:
• Bandwidth Accurate
  - Purpose: architectural exploration
  - Design components: transactions, functional model
  - Prerequisites: simulation speed, instrumentation, protocol checking
• Latency Accurate
  - Purpose: software test platform
  - Design components: functional model with accurate timing and full concurrency
  - Prerequisites: simulation speed and register interfaces
• Cycle Accurate
  - Purpose: power optimization & firmware development
  - Design components: defined buses, registers, concurrency
  - Prerequisites: rapid changes in micro-architecture and automatic RTL generation
• Bit Accurate
  - Purpose: implementation & integration
  - Design components: automatically generated with rules & formal interfaces
  - Prerequisites: easy ECOs and timing closure
Across all levels: consistent verification and debugging paradigms; consistent connectivity through formal I/F methods

ESL to Implementation
• Technologies: concurrency semantics; formal interfaces; static, formal checking; low power optimization
• Tools: SystemC [ESE]; TRANSLATE; BSV; Bluespec Synthesis; Bluesim (gcc, .exe); Blueview; SystemC simulation (libsystemc.h); RTL
• Methodologies: Bandwidth Accurate; Latency Accurate; Cycle Accurate; Bit Accurate

Bluespec SystemVerilog Agenda: Technical Deep Dive
• Intro: why an HDL can affect overall productivity, from concept to silicon
• Behavior:
  - Rules: a new way to express HW behavior
  - Correctness: why rules help
  - Comparison with behavioral synthesis
  - Rule-based Interface Methods: modularizing rules
• Structure: improving the expression of HW structure using ideas from advanced programming languages
• Clock domains and gated clocks: compiler-guaranteed safety
• Testbenches using BSV
• Transaction Level Modeling/architecture exploration and refinement, within a single paradigm
  - Comparison with SystemC
• Synthesis quality: as good as hand-coded RTL
• Tool flows
  - Coexistence with Verilog/VHDL/SV/SystemC
• Futures:
  - Integration of Rules and Rule-based Interfaces into SystemC
  - Formal verification

Bluespec SystemVerilog™: a one-slide overview
Two dimensions raising the level of abstraction above VHDL/Verilog/SystemC (fully synthesizable):
• Behavioral: Rules and Rule-based Interfaces
  - For complex concurrency and control, across multiple shared resources, across module boundaries
• Structural:
  - High-level abstract types
  - Powerful static checking
  - Powerful parameterization
  - Powerful static elaboration
  - Advanced clock management


Complex concurrency with shared resources
• HW by its very nature is highly concurrent
  - A HW design can be viewed as a set of cooperating concurrent FSMs
  - The cooperation occurs through shared resources
• Today’s SoCs have enormous amounts of complicated concurrency and shared resources
• How do we express this today?
  - Concurrency is expressed with processes (“always” blocks in RTL)
  - Access to shared resources is tediously micro-managed (if-then-elses inside always blocks)
• Unfortunately, this does not scale
  - It leads to race conditions (inconsistent state in the shared resources), which are very tricky to discover, diagnose, and fix

Simple example with concurrency and shared resources
[Diagram: processes 0, 1, 2, guarded by cond0, cond1, cond2, act on registers x and y: process 0 applies +1 to x; process 1 applies -1 to x and +1 to y; process 2 applies -1 to y]
• Process 0: increments register x when cond0
• Process 1: transfers a unit from register x to register y when cond1
• Process 2: decrements register y when cond2
• Each register can only be updated by one process on each clock. Priority: 2 > 1 > 0
• Just like real applications, e.g.:
  - Packet arrives, is processed, departs

Which one is correct? (Process priority: 2 > 1 > 0)

    always @(posedge CLK) begin
      if (!cond2 && cond1) x <= x - 1;
      else if (cond0)      x <= x + 1;

      if (cond2)      y <= y - 1;
      else if (cond1) y <= y + 1;
    end

    always @(posedge CLK) begin
      if (!cond2 || cond1) x <= x - 1;
      else if (cond0)      x <= x + 1;

      if (cond2)      y <= y - 1;
      else if (cond1) y <= y + 1;
    end

• What’s required to verify that they’re correct?
• What if the priorities changed: cond1 > cond2 > cond0?
• What if the processes are in different modules?

With Bluespec, the design is direct (Process priority: 2 > 1 > 0)

    (* descending_urgency = "proc2, proc1, proc0" *)
    rule proc0 (cond0);
       x <= x + 1;
    endrule

    rule proc1 (cond1);
       y <= y + 1;
       x <= x - 1;
    endrule

    rule proc2 (cond2);
       y <= y - 1;
    endrule

Hand-written RTL: complexity due to:
• State-centric (for synthesizability)
• Scheduling clutter
BSV:
• Functional correctness follows directly from rule semantics
• Executable spec (operation-centric)
• Automatic handling of shared resource mux logic
• Same hardware as the RTL
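
The scheduling that the urgency attribute implies can be mimicked in plain C. The following is only a minimal sketch of one clock edge for this three-rule example, not the output of the Bluespec compiler; the names `State`, `clock_edge`, and `fire0`..`fire2` are made up for illustration. It assumes that two rules conflict exactly when they write the same register, and that the more urgent rule wins.

```c
#include <stdbool.h>

/* Register state of the example: the two shared counters. */
typedef struct { int x; int y; } State;

/* One clock edge for the three-process example.
 * Assumed urgency order (from the slide): proc2 > proc1 > proc0.
 * proc0 and proc1 both write x; proc1 and proc2 both write y,
 * so within each pair only the more urgent rule may fire. */
State clock_edge(State s, bool cond0, bool cond1, bool cond2)
{
    /* Scheduler: a rule fires if its condition holds and no more-urgent
     * conflicting rule fires in the same cycle. */
    bool fire2 = cond2;
    bool fire1 = cond1 && !fire2;   /* conflicts with proc2 on y */
    bool fire0 = cond0 && !fire1;   /* conflicts with proc1 on x */

    State next = s;                 /* registers hold their value by default */
    if (fire0) next.x = s.x + 1;
    if (fire1) { next.y = s.y + 1; next.x = s.x - 1; }
    if (fire2) next.y = s.y - 1;
    return next;
}
```

Under these assumptions, x is decremented exactly when `cond1 && !cond2`, which is the condition used by the hand-written variant with `&&` rather than the one with `||`.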

Now, let’s make a small change: add a new process and insert its priority
[Diagram: a new process 3, guarded by cond3, applies +2 to x and -2 to y; processes 0, 1, 2 are unchanged]
Process priority: 2 > 3 > 1 > 0

Changing the Bluespec design (Process priority: 2 > 3 > 1 > 0)

Pre-change:

    (* descending_urgency = "proc2, proc1, proc0" *)
    rule proc0 (cond0);
       x <= x + 1;
    endrule

    rule proc1 (cond1);
       y <= y + 1;
       x <= x - 1;
    endrule

    rule proc2 (cond2);
       y <= y - 1;
    endrule

Post-change:

    (* descending_urgency = "proc2, proc3, proc1, proc0" *)
    rule proc0 (cond0);
       x <= x + 1;
    endrule

    rule proc1 (cond1);
       y <= y + 1;
       x <= x - 1;
    endrule

    rule proc2 (cond2);
       y <= y - 1;
    endrule

    rule proc3 (cond3);
       y <= y - 2;
       x <= x + 2;
    endrule

Changing the Verilog design (Process priority: 2 > 3 > 1 > 0)

Pre-change:

    always @(posedge CLK) begin
      if (!cond2 && cond1) x <= x - 1;
      else if (cond0)      x <= x + 1;

      if (cond2)      y <= y - 1;
      else if (cond1) y <= y + 1;
    end

Post-change:

    always @(posedge CLK) begin
      if ((cond2 && cond0) || (cond0 && !cond1 && !cond3))
        x <= x + 1;
      else if (cond3 && !cond2)
        x <= x + 2;
      else if (cond1 && !cond2)
        x <= x - 1;

      if (cond2)      y <= y - 1;
      else if (cond3) y <= y - 2;
      else if (cond1) y <= y + 1;
    end
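
One way to gain confidence in hand-written conditions like these is to compare them exhaustively against a model of the rule semantics. The sketch below does that in C for the four-process version; it is an illustration only, with hypothetical names (`rules_edge`, `verilog_edge`, `count_mismatches`), and it assumes that rules conflict when they write the same register and that proc2 writes only y.

```c
#include <stdbool.h>

typedef struct { int x; int y; } State;

/* Rule-based model, urgency proc2 > proc3 > proc1 > proc0.
 * proc3 writes both x and y, so it conflicts with every other rule. */
State rules_edge(State s, bool c0, bool c1, bool c2, bool c3)
{
    bool fire2 = c2;
    bool fire3 = c3 && !fire2;             /* conflicts with proc2 on y        */
    bool fire1 = c1 && !fire2 && !fire3;   /* conflicts with proc2 and proc3   */
    bool fire0 = c0 && !fire3 && !fire1;   /* conflicts with proc3 and proc1   */

    State n = s;
    if (fire0) n.x = s.x + 1;
    if (fire1) { n.y = s.y + 1; n.x = s.x - 1; }
    if (fire2) n.y = s.y - 1;
    if (fire3) { n.y = s.y - 2; n.x = s.x + 2; }
    return n;
}

/* Transcription of the hand-written post-change Verilog conditions. */
State verilog_edge(State s, bool c0, bool c1, bool c2, bool c3)
{
    State n = s;
    if ((c2 && c0) || (c0 && !c1 && !c3)) n.x = s.x + 1;
    else if (c3 && !c2)                    n.x = s.x + 2;
    else if (c1 && !c2)                    n.x = s.x - 1;

    if (c2)      n.y = s.y - 1;
    else if (c3) n.y = s.y - 2;
    else if (c1) n.y = s.y + 1;
    return n;
}

/* Number of condition combinations (out of 16) on which the models disagree. */
int count_mismatches(void)
{
    State s = { 10, 10 };
    int bad = 0;
    for (int m = 0; m < 16; m++) {
        bool c0 = m & 1, c1 = m & 2, c2 = m & 4, c3 = m & 8;
        State a = rules_edge(s, c0, c1, c2, c3);
        State b = verilog_edge(s, c0, c1, c2, c3);
        if (a.x != b.x || a.y != b.y) bad++;
    }
    return bad;
}
```

A result of zero from `count_mismatches()` means the hand-derived conditions agree with the rule model on every input combination; the point of the slide is that this derivation and check must be redone by hand each time a process or priority changes.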

Key Benefits
• Executable specifications
• Rapid changes
• But, with fine-grained control of RTL:
  - Define the optimal architecture/microarchitecture
  - Debug at the source OR RTL level: the designer understands both
  - The Quality of Results (QoR) of RTL!

The concurrency complexities illustrated in the simple example are greatly magnified in real designs

A more complex example, from CPU design
• Speculative, out-of-order
• Many, many concurrent activities
[Diagram (Dave & Arvind, 2003): Fetch, Decode, ReOrder Buffer (ROB), Branch, ALU Unit, MEM Unit, Register File, Instruction Memory, and Data Memory, connected by FIFOs]

Many concurrent actions on common state: nightmare to manage explicitly
[Diagram: the Re-Order Buffer is a circular table with Head and Tail pointers; each entry holds State (E = Empty, W = Waiting), Instruction, Operand 1, Operand 2, and Result, with valid (V) flags]
Concurrent actions on the Re-Order Buffer:
• Decode Unit: put an instr into ROB
• Register File: get operands for instr; writeback results
• Resolve branches
• ALU Unit(s): get a ready ALU instr; put ALU instr results in ROB
• MEM Unit(s): get a ready MEM instr; put MEM instr results in ROB

But in BSV…
… you can code each operation in isolation, as a rule
… the tool guarantees that operations are INTERLOCKED (i.e. each runs to completion without external interference)
• Branch Resolution
  - …
• Commit Instr
  - Write results to register file (or allow memory write for store)
  - Set to Empty
  - Increment head pointer
• Write Back Results to ROB
  - Write back results to instr result
  - Write back to all waiting tags
  - Set to done
• Dispatch Instr
  - Mark instruction dispatched
  - Forward to appropriate unit
• Insert Instr in ROB
  - Put instruction in first available slot
  - Increment tail pointer
  - Get source operands - RF, prev instr

The key: Rules execute atomically
Reference semantics:

    while some rules are enabled
        choose one enabled rule
        execute it
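
The reference semantics is essentially an interpreter loop. As a rough sketch, it can be written in C with each rule modeled as a guard plus an action. The GCD rules used here are an illustrative assumption and do not come from the slides, as are all names (`Gcd`, `Rule`, `run_gcd`); "choose one enabled rule" is implemented as "take the first enabled rule in the table", which is one legal choice among many.

```c
#include <stdbool.h>

typedef struct { unsigned x; unsigned y; } Gcd;

/* A rule is a guard (when may it run?) plus an action (what does it do?). */
typedef struct {
    bool (*guard)(const Gcd *);
    void (*action)(Gcd *);
} Rule;

/* Hypothetical rule "swap": make x the smaller (or the non-zero) value. */
static bool swap_guard(const Gcd *s) { return s->y != 0 && (s->x > s->y || s->x == 0); }
static void swap_action(Gcd *s)      { unsigned t = s->x; s->x = s->y; s->y = t; }

/* Hypothetical rule "subtract": reduce y by x. */
static bool sub_guard(const Gcd *s)  { return s->x != 0 && s->x <= s->y; }
static void sub_action(Gcd *s)       { s->y -= s->x; }

/* Reference semantics: while some rule is enabled, choose one and execute it. */
unsigned run_gcd(unsigned a, unsigned b)
{
    static const Rule rules[] = {
        { swap_guard, swap_action },
        { sub_guard,  sub_action  },
    };
    Gcd s = { a, b };
    for (;;) {
        bool fired = false;
        for (unsigned i = 0; i < sizeof rules / sizeof rules[0]; i++) {
            if (rules[i].guard(&s)) {   /* choose the first enabled rule */
                rules[i].action(&s);    /* run it to completion          */
                fired = true;
                break;
            }
        }
        if (!fired) return s.x;         /* no rule enabled: done */
    }
}
```

For example, `run_gcd(15, 6)` fires swap, subtract, subtract, swap, subtract, subtract and then stops with x = 3. Because each action runs to completion before the next guard is evaluated, no intermediate state of one rule is ever visible to another.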

Atomicity: atomic

Atomicity: ατομος

Atomicity: a_tomic
• a = not
  - asymmetric
  - atypical
  - amoral

Atomicity: a_tomic
• a = not
  - asymmetric
  - atypical
  - amoral
• tom = cut
  - microtome
  - tomography
  - appendectomy
  - tome (of a multi-volume book)

Atomicity
• Rules are atomic: “not cut”
  - Whenever they run, they run to completion
    - never interrupted
  - No other activities are interleaved with them
• This greatly simplifies design
  - avoids many race conditions
  - easier to prove invariants

Extensive supporting theory in computer science literature
• Term Rewriting Systems, Terese, Cambridge Univ. Press, 2003, 884 pp.
• Parallel Program Design: A Foundation, K. Mani Chandy and Jayadev Misra, Addison Wesley, 1988
  - UNITY programming language for concurrent, reactive systems
• Term Rewriting and All That, Franz Baader and Tobias Nipkow, Cambridge Univ. Press, 1998, 300 pp.
• Using Term Rewriting Systems to Design and Verify Processors, Arvind and Xiaowei Shen, IEEE Micro 19:3, 1999, pp. 36-46
• Proofs of Correctness of Cache-Coherence Protocols, Stoy et al., in Formal Methods for Increasing Software Productivity, Berlin, Germany, 2001, Springer-Verlag LNCS 2021
• Superscalar Processors via Automatic Microarchitecture Transformation, Mieszko Lis, Master’s thesis, Dept. of Electrical Eng. and Computer Science, MIT, 2000
• … and more …
The intuitions underlying this theory are easy to use in practice

Synthesizing Rules into efficient clocked synchronous HW
- Automatically generates correct HW for the most error-prone parts of hand-written RTL
- While retaining transparency, predictability and designer control

Clocked synchronous hardware
The compiler translates BSV source code into Verilog RTL
[Diagram: transition logic takes the inputs I and the current state S held in a collection of state elements, and produces the outputs O and the next state S“Next”]

Clocked semantics
Reference semantics:

    while some rules are enabled
        choose one enabled rule
        execute it

Clocked semantics:

    every clock cycle:
        execute as many rules as you can
        provided the overall effect is as if they executed serially in some order
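
The "as if executed serially in some order" condition can be made concrete with a small C sketch. The two rules below are hypothetical (they are not taken from the slides): one copies x into y, the other increments x. When both fire in the same clock, each reads the old register values; the sketch checks whether that result matches at least one serial order.

```c
#include <stdbool.h>

typedef struct { int x; int y; } Regs;

/* Two illustrative rules (hypothetical, not from the slides). */
static Regs rule_copy(Regs s) { s.y = s.x;     return s; }  /* y <= x     */
static Regs rule_incr(Regs s) { s.x = s.x + 1; return s; }  /* x <= x + 1 */

/* Hardware view: in one clock, both rules read the OLD register values
 * and all writes land together at the clock edge. */
Regs both_in_one_clock(Regs s)
{
    Regs next = s;
    next.y = s.x;        /* rule_copy's write, computed from old x */
    next.x = s.x + 1;    /* rule_incr's write, computed from old x */
    return next;
}

static bool same(Regs a, Regs b) { return a.x == b.x && a.y == b.y; }

/* True if the one-clock result equals running the rules one at a time
 * in at least one order. */
bool one_clock_is_serializable(Regs s)
{
    Regs hw = both_in_one_clock(s);
    Regs copy_then_incr = rule_incr(rule_copy(s));
    Regs incr_then_copy = rule_copy(rule_incr(s));
    return same(hw, copy_then_incr) || same(hw, incr_then_copy);
}
```

Starting from x = 3, y = 0, the single clock yields x = 4, y = 3, which matches the order "copy, then increment" but not the reverse. One matching order is enough, so these two rules may share a clock. Two rules that swap x and y in separate halves (x <= y in one, y <= x in the other) would match neither order and could not.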

Rule semantics mapped to hardware semantics
[Diagram: in the Rules view, a sequence of rule steps (Ri, Rj, Rk, …); in the HW view, the same steps grouped into clocks]
• The effect of each cycle is as if a sequence of rules was executed one-at-a-time
• Consequence: the HW state can never be an interleaving of actions from different rules
• Rule atomicity (therefore, correctness) is preserved

Synthesizing a single rule

    rule foo (… cond … (x < y) …);
       … action …
       x <= x + z
       …
    endrule

[Diagram: the current state (x, y, z; register Q outputs) feeds the rule’s cond logic and action logic; the action logic produces next-state values (x’, y’, z’) for the register D inputs; the cond logic produces the enable signals (EN)]

Synthesizing multiple rules
Different rules can read/write common state. Therefore:
• Need multiplexing of next-state values into shared state element inputs
• Need control of which rules get to update next-state elements
  - Control of next-state “enables”
  - Control of next-state data multiplexers

Synthesizing multiple rules
[Diagram: Rule 1 … Rule N each contribute Action logic (Action 1 … Action N) and Cond logic (Cond 1 … Cond N); the rule control consists of a Scheduler driven by the conds, which produces the Data Select for the multiplexer in front of the state register’s D input and the register Enable]
• Scheduler ensures consistency with Rule semantics
• Usually the most error-prone part of hand-written RTL
  - Here, correct by construction
• Bluespec patented technology
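
The data-select multiplexer and enable for a single shared register can be sketched in C as follows. This is a simplified illustration with made-up names (`RuleWrite`, `reg_next`), not the structure of the generated RTL; it assumes the scheduler has already decided which rules fire and lets at most one writer of this register fire per cycle.

```c
#include <stdbool.h>
#include <stddef.h>

/* One rule's contribution to a shared register. */
typedef struct {
    bool will_fire;   /* decision from the scheduler                  */
    int  value;       /* next-state value from the rule's action logic */
} RuleWrite;

/* Next value of one register written by n rules.
 * Assumes the scheduler lets at most one writing rule fire per cycle. */
int reg_next(int current, const RuleWrite *w, size_t n)
{
    bool enable = false;      /* register EN input */
    int  data   = current;    /* register D input  */
    for (size_t i = 0; i < n; i++) {
        if (w[i].will_fire) { /* data-select mux picks the firing rule */
            enable = true;
            data   = w[i].value;
        }
    }
    return enable ? data : current;   /* no writer fires: hold the value */
}
```

The action logic that computes each `value` is whatever the designer wrote in the rule; only the selection and enable logic is added around it.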

Transparency and predictability
[Diagram: the same structure as for synthesizing multiple rules, with the rule control (Scheduler, Data Select, Enable) highlighted: Bluespec synthesis only adds this part]
• User-specified structures dominate area and critical paths
• Microarchitecture remains completely under user control

Comparing BSV to traditional “Behavioral Synthesis”

Function vs. Algorithm
• People often say: “I’m describing the algorithm of my HW block using C/C++ or Behavioral RTL”
• Actually, they’re describing the function, not the algorithm
• A function: spec of I/O behavior, without consideration for implementability, and in particular without consideration for cost in space (circuitry) or time (performance)
• An algorithm: a specific implementation with a particular cost model
  - Different computation models, with different cost models, usually require radically different algorithms for implementing the same function

“Behavioral Synthesis”
“Behavior” of design expressed as a sequential program (e.g., in C or procedural Verilog), which a Behavioral Synthesis tool turns into RTL
• Past products:
  - Synopsys Behavioral Compiler (withdrawn)
  - Get2Chip (absorbed into Cadence)
• Current products:
  - Mentor’s Catapult C
  - Synfora
  - Forte (in SystemC)
  - …

Behavioral Synthesis: the technology has a long history
Flow:
• Sequential source program
• Parsing …
• Control-flow graph (sequential CDFG)
• Dependency analysis and associated transforms (“automatic parallelization”)
  - Tractable only for certain loop-and-array codes, without any complex control (where it can work spectacularly well)
• Parallel CDFG
• Synthesis (target-specific), to one of:
  - Vector computers (~1975 …)
  - VLIW/IA64, Cellular, SIMD, dataflow, SMP, cluster, cache-friendly, … (~1980s …)
  - Hardware (RTL) (~1990s …)
(CDFG = Control/Data Flow Graph)

The “Automatic Parallelization” problem
• The input (C program) is totally sequential, because of C semantics
• We want the synthesized hardware to exploit parallelism, for high performance
• The Automatic Parallelization problem: undo/remove the input’s sequentiality, converting it into a parallel form

“Automatic Parallelization” example: matrix multiplication
[Diagram: element (i, j) of C is the inner product of row i of A and column j of B]

    void matmult(int A[N][N], int B[N][N], int C[N][N])
    {
        int i, j, k, innerProductSum;
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++) {
                innerProductSum = 0;
                for (k = 0; k < N; k++)
                    innerProductSum += A[i][k] * B[k][j];
                C[i][j] = innerProductSum;
            }
    }

“Automatic Parallelization” example: matrix multiplication
Can the k loop (inner product) be executed in parallel?
• The “*”s can be done in parallel, but the “+”s are still sequenced
[Diagram: the products of A[i,*] and B[*,j] for k = 0 … N-1 feed a sequential chain of additions, starting from 0 and ending in C[i,j]]

“Automatic Parallelization” example: matrix multiplication
A clever compiler could transform it into tree accumulation, which has more parallelism
• Depends on commutativity, associativity of “+”
  - May not be true if the integers can overflow!
  - May not be true for floating point numbers!
[Diagram: the products of A[i,*] and B[*,j] for k = 0 … N-1 feed a balanced tree of additions producing C[i,j]]
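
The floating-point caveat is easy to reproduce. The following C sketch compares a sequential accumulation with a tree accumulation over the same inputs; the function names and the input values are chosen for illustration, and the behavior described assumes IEEE 754 single-precision arithmetic without value-changing compiler options such as `-ffast-math`.

```c
#include <stddef.h>

/* Sequential accumulation, as in the original k loop: ((a0 + a1) + a2) + ... */
float sum_sequential(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Tree accumulation: sum each half independently, then combine. */
float sum_tree(const float *a, size_t n)
{
    if (n == 0) return 0.0f;
    if (n == 1) return a[0];
    size_t half = n / 2;
    return sum_tree(a, half) + sum_tree(a + half, n - half);
}
```

For the input {1e20f, 1.0f, -1e20f, 1.0f}, the sequential sum absorbs the first 1.0f into 1e20f, cancels to zero, and then adds the last 1.0f, giving 1.0f. The tree sum absorbs both small terms into the large ones and gives 0.0f. A tool that rewrites the chain into a tree therefore changes the result unless it is told that such differences are acceptable.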

“Automatic Parallelization” example: matrix multiplication
Can the i and j loops be executed in parallel?
• Not as written, because all the k loops read and write a single common variable, “innerProductSum”!
• A clever compiler can eliminate this using “scalar expansion”: converting it into an array
  - Note: most clever programmers would do the opposite!

    void matmult(int A[N][N], int B[N][N], int C[N][N])
    {
        int i, j, k, innerProductSum[N][N];
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++) {
                innerProductSum[i][j] = 0;
                for (k = 0; k < N; k++)
                    innerProductSum[i][j] += A[i][k] * B[k][j];
                C[i][j] = innerProductSum[i][j];
            }
    }

Automatic Parallelization: history
• Studied extensively since the 1960s (vectorizing/parallelizing/VLIW/EPIC software compilers)
• Fundamental problems:
  - Complex control structures, pointers and aliasing (memory indirection), dynamic data allocation, … are all difficult/impossible to parallelize automatically
  - C is often a bad starting point: the best parallel algorithm for a given function can be quite different from the best sequential algorithm
    - Parallel algorithm designers prefer to start with a clean slate from a functional specification, not a C algorithm with unnecessary sequential baggage
• Has succeeded only in a limited domain: simple array-based loop nests
  - The SW community has abandoned automatic parallelization of general-purpose programs; it is mostly used only for scientific/technical computing, linear algebra, …

Automatic Parallelization: transparency, predictability, controllability
Another common issue with automatic parallelization and behavioral synthesis:
• Designer loses intuition and precise control over generated output
• Behavioral synthesis: tool decides microarchitecture based on complex optimization criteria
  - “What HW will result, with this input C program?”
  - “What will be the effect on the resulting HW, if I make this change to the input C program?”
  - “What change should I make to the input C program, to improve the HW in this way?”

Behavioral Synthesis: Applicability
[Figure: IP blocks grouped by kind. Categories: Control; Complex Datapaths (e.g. processor/controller); Technical Algorithms (e.g. DSP/math). Example blocks shown: IDCT, motion compensator, DES, FIR filter]
Only few IP blocks may benefit from Behavioral Synthesis

Comparing “Model of Time” in BSV vs. Automatic Synthesis
Automatic synthesis from C/C++: completely untimed
  • No relationship between source model of time (sequential C code execution) and target model of time (HW clocks)
BSV: untimed to timed
  • Initially, designer writes arbitrarily complex rules, i.e., any amount of functional computation per rule
  • Designer refines this (splitting rules, if necessary) so that the functional computation per rule is feasible in HW in a target clock speed/technology
  • BSV tool schedules multiple rules per clock

Comparing Concurrency Model in BSV vs. SystemC
BSV: Rules
  • Atomic transactions
  • Tool generates control logic to manage concurrency
SystemC
  • Threads and events
      - Higher-level synchronization abstractions built on top of events: semaphores, locks, blocking methods, …
  • Designer manages atomicity explicitly (consistent access to multiple shared resources)

Historical improvements in concurrency control
[Figure: timeline from 1950 to today; vertical axis: higher level (less error-prone)]
Levels, from highest to lowest:
  • Atomic transactions (multiple resources)
  • Atomic objects (structured locking)
  • Semaphores (locks, events, …)
  • Cycle accounting
Examples marked on the chart: SW: pthreads; SW: Java; HW: RTL, SystemC; SW: Database Systems, Distributed Systems; HW: Bluespec

Elevating design above RTL
[Figure: comparison of Bluespec, SystemC and C/C++. Recoverable labels: “Rules with Methods”, “Explicit”, “LOC”, “Functionality”]

Bluespec SystemVerilog Agenda: Technical Deep Dive
Intro: why an HDL can affect overall productivity, from concept to silicon
Behavior:
  • Rules: a new way to express HW behavior
  • Correctness: why rules help
  • Comparison with behavioral synthesis
  • Rule-based Interface Methods: modularizing rules
Structure: improving the expression of HW structure using ideas from advanced programming languages
Clock domains and gated clocks: compiler-guaranteed safety
Testbenches using BSV
Transaction Level Modeling/architecture exploration and refinement, within a single paradigm
  • Comparison with SystemC
Synthesis quality: as good as hand-coded RTL
Tool flows
  • Coexistence with Verilog/VHDL/SV/SystemC
Futures:
  • Integration of Rules and Rule-based Interfaces into SystemC
  • Formal verification

Bluespec SystemVerilog™: a one-slide overview
Two dimensions raising the level of abstraction (fully synthesizable) above VHDL/Verilog/SystemC:
  • Behavioral: Rules and Rule-based Interfaces
      - For complex concurrency and control, across multiple shared resources, across module boundaries
  • Structural:
      - High-level abstract types
      - Powerful static checking
      - Powerful parameterization
      - Powerful static elaboration
      - Advanced clock management

Consider a FIFO, in RTL
[Figure: block mkFIFO with operations enq() and first()/deq(); ports dataIn (32 bits), enq_enab, notFull, first (32 bits), notEmpty, deq_enab]
In Verilog:
    module mkFIFO_model (output notFull, input [31:0] dataIn, input enq_enab,
                         output notEmpty, output [31:0] first, input deq_enab);
      …
    endmodule

    module mkFIFO_implem (output notFull, input [31:0] dataIn, input enq_enab,
                          output notEmpty, output [31:0] first, input deq_enab);
      …
    endmodule

Modules written to be used by others require detailed specifications
“Designware” FIFO and associated documentation
[Figure: FIFO block with ports data_in, push_req_n, pop_req_n, data_out, full, empty, clk, rstn]
A small sample of the informal, written interface specification (8 pages):
[Figure: excerpt of the written specification]

Module interfaces: summary critique of today’s RTL methodology
Two modules that implement the same interface have to repeat the same port list (tedious, error prone)
Interfaces are flat, unstructured port lists
  • No concept of grouping ports according to “transactions”
No specification of behavior on the interface
  • “enq_enab allowed only if notFull”
  • “dataIn should be valid with enq_enab”
  • “first only valid if notEmpty”
  • “deq_enab allowed only if notEmpty”
Behavior is typically specified in ad hoc text and timing diagrams
  • Verification obligation, often to incomplete specs

A FIFO in SystemVerilog
[Figure: block mkFIFO with operations enq() and first()/deq(); ports dataIn (32 bits), enq_enab, notFull, first (32 bits), notEmpty, deq_enab]
In SystemVerilog:
    interface FIFO;
      bit        notFull, enq_enab;
      bit [31:0] dataIn;
      bit        notEmpty, deq_enab;
      bit [31:0] first;
      modport ifc (output notFull, notEmpty, first,
                   input  dataIn, enq_enab, deq_enab);
    endinterface

    // An interface port needs an instance name; "f" is added here.
    module mkFIFO_model (FIFO.ifc f);
      …
    endmodule

    module mkFIFO_implem (FIFO.ifc f);
      …
    endmodule

Module interfaces: summary critique of SystemVerilog methodology
Interface port lists are separately specified (independent of any module implementing the interface)
  • Two modules that implement the same interface can share the same interface definition (improves “plug and play”)
But, still:
  • Interfaces are flat, unstructured port lists
      - No concept of grouping ports according to “transactions”
  • No specification of behavior on the interface
      - “enq_enab allowed only if notFull”
      - “dataIn should be valid with enq_enab”
      - “first only valid if notEmpty”
      - “deq_enab allowed only if notEmpty”
  • Behavior is typically specified in ad hoc text and timing diagrams
      - Verification obligation, often to incomplete specs
Note: SV does allow definition of tasks and functions inside an interface definition, and this provides some limited ability to group according to transactions and to encapsulate interface behavior

Rule-based Interfaces
Robust, parameterizable, correct-by-construction way to express interactions with a module
Extend Rule Semantics across module boundaries
Capture the protocol of a complete “transaction” with a module
Capture inter-transaction scheduling constraints

A FIFO in BSV
[Figure: FIFO with operations enq() and first()/deq()]
    interface FIFO#(type itemType);
       method Action   enq (itemType x);
       method itemType first ();
       method Action   deq ();
       method Action   clear ();
    endinterface
Each method captures a complete transaction protocol:
  • RDY
      - e.g., enq() is allowed (the FIFO is not full)
      - e.g., deq() is allowed (the FIFO is not empty)
  • ENABLE
      - e.g., when enq() or deq() is invoked
  • Input data buses (method arguments)
  • Output data buses (method results)
More abstract than port lists and ad hoc timing diagrams
  • Never have any timing errors at interfaces
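
As a minimal sketch of how the RDY conditions listed above can arise, the following hypothetical one-element FIFO implements the library FIFO interface. The module name mkOneElementFIFO and the registers slot and valid are assumptions made for illustration; they are not taken from the presentation. The guard written after each method header plays the role of that method's RDY signal.

```bsv
import FIFO::*;   // library package that declares the FIFO#(t) interface

// Hypothetical module, written only to illustrate method guards.
module mkOneElementFIFO (FIFO#(t))
   provisos (Bits#(t, tSize));          // t must have a bit representation

   Reg#(t)    slot  <- mkRegU;          // storage; contents undefined until first enq
   Reg#(Bool) valid <- mkReg(False);    // True while slot holds an item

   // Guard "!valid" is the RDY of enq: the FIFO is not full.
   method Action enq (t x) if (!valid);
      slot  <= x;
      valid <= True;
   endmethod

   // Guard "valid" is the RDY of first: the FIFO is not empty.
   method t first () if (valid);
      return slot;
   endmethod

   // Guard "valid" is the RDY of deq: the FIFO is not empty.
   method Action deq () if (valid);
      valid <= False;
   endmethod

   // No guard: clear is always ready.
   method Action clear ();
      valid <= False;
   endmethod
endmodule
```

A rule that calls enq on this module can fire only when valid is False, without the rule's author writing that test.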

Methods map directly into HW ports
[Figure: FIFO block with one port group per method; any module that provides a FIFO interface has these ports]
  • enq(): n-bit argument; has side effect (Action). Ports: n-bit data in, enab, rdy (not full)
  • first(): no argument; n-bit result. Ports: n-bit data out, rdy (not empty)
  • deq(): no argument; has side effect (Action). Ports: enab, rdy (not empty)
  • clear(): no argument; has side effect (Action). Ports: enab, rdy (always true)

Interface methods are HW!
Interface method declarations look like functions/procedures in SW
Uses of interface methods look like function/procedure calls in SW
But: think HW, not SW or process simulation!
A definition of an interface method in a module is a manifest bit of circuitry behind its ports
A use of an interface method is just a set of connections (wires) to the module interface ports
There is no “call/execute/return”, stack frame, …!

Interface methods fit smoothly into rules
[Figure: “route” block between one input FIFO and two output FIFOs]
    module …
       FIFO#(int) iFifo  <- mkFIFO;
       FIFO#(int) oFifo1 <- mkFIFO;
       FIFO#(int) oFifo2 <- mkFIFO;

       // Rule names are added here; the flattened slide shows none, but BSV rules need one.
       rule routeTo1 (iFifo.first[0] == 0);
          iFifo.deq;
          oFifo1.enq (iFifo.first);
       endrule

       rule routeTo2 (iFifo.first[0] == 1);
          iFifo.deq;
          oFifo2.enq (iFifo.first);
       endrule
    endmodule
All the implicit conditions (notFull, notEmpty) are handled automatically: the compiler incorporates them into the Rule conditions. This eliminates much clutter, and improves correctness.

Module interfaces: Inter-transaction scheduling constraints
Architect to Engineers: “Please design for me a FIFO in which I can enq and deq simultaneously (i.e., in the same clock)”
Engineers: “With my FIFO, you can enq and deq simultaneously …
  • Engineer 1 (“NaïveFIFO”): … in most cases, but not if it’s either empty or full.”
  • Engineer 2 (“PipelineFIFO”): … even if it’s full. (Think of it as a deq first, making room for a following enq, but squeezed into a single clock. This naturally fits into register semantics: read old value, write new value.)”
  • Engineer 3 (“BypassFIFO”): … even if it’s empty. (Think of it as an enq first, making an item available for a following deq, but squeezed into a single clock. This is just a bypass of a value from input to output.)”

Inter-transaction scheduling constraints
For 3 FIFO designs (capacity 2) and various conditions, allowable operations and their “in the same clock” semantics
Entries give the same-clock ordering of a simultaneous enq() and deq(); “-” means the two cannot both happen in that clock.

    # of elements in FIFO    0            1             2
    NaïveFIFO                -            enq || deq    -
    PipelineFIFO             -            enq || deq    deq < enq
    BypassFIFO               enq < deq    enq || deq    -

Module interfaces: Inter-transaction scheduling constraints
The FIFO variants have the same interface methods/wires, and differ only in scheduling of the interface transactions
  • “enq || deq”, “deq < enq”, “enq < deq”
They have different latency properties
  • NaïveFIFO, PipelineFIFO: minimum 1-tick latency
  • BypassFIFO: minimum 0-tick latency
      - This can affect “alignment” with associated data on other datapaths
Their control circuits have different properties:
  • PipelineFIFO: “notFull” depends on “deq_enab”
  • BypassFIFO: “notEmpty” depends on “enq_enab”
Their data paths have different properties:
  • BypassFIFO: combinational path from data in to data out (can affect timing closure)
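
One way to try these variants is sketched below. It assumes a BSV library in which the SpecialFIFOs package provides mkPipelineFIFO and mkBypassFIFO; those library modules hold one element, so they only approximate the capacity-2 designs discussed above. The module name mkVariantSketch and the registers next and last are hypothetical. The rules stay identical whichever constructor is chosen; only the generated control logic differs.

```bsv
import FIFO::*;
import SpecialFIFOs::*;   // assumed to provide mkPipelineFIFO and mkBypassFIFO

(* synthesize *)
module mkVariantSketch (Empty);
   // Swap this one line to change the scheduling contract of the FIFO.
   // Alternatives to try: mkFIFO, mkBypassFIFO.
   FIFO#(Bit#(8)) q <- mkPipelineFIFO;

   Reg#(Bit#(8)) next <- mkReg(0);   // next value to send
   Reg#(Bit#(8)) last <- mkReg(0);   // most recent value received

   // Fires whenever q.enq is ready.
   rule produce;
      q.enq (next);
      next <= next + 1;
   endrule

   // Fires whenever q.first and q.deq are ready.
   rule consume;
      last <= q.first;
      q.deq;
   endrule
endmodule
```

Neither rule mentions notFull or notEmpty; the compiler derives each rule's firing condition from the guards of the FIFO that was instantiated.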

Module interfaces: Inter-transaction scheduling constraints
“Client” HW that uses one of these FIFOs will be different, depending on which variant is used
  • Different control logic to obey different scheduling requirements
In RTL,
  • These differences are often undocumented, or poorly communicated from FIFO designer to FIFO user
  • Result: more verification surprises, bugs
With Rule-based Interface Methods
  • Precise vocabulary to specify and communicate scheduling
  • Control HW in client is automatically synthesized to take into account scheduling differences

Broad-brush differences between BSV and RTL: Module hierarchy
BSV has exactly the same notion of module hierarchy as RTL
  • In fact, more stringently so: even registers are modules (at the leaves of the hierarchy). In BSV, ordinary variables never represent registers.
Thus, designers exercise precise control over microarchitecture
  • “If so, how can BSV be a high-level HDL?”
      - Microarchitecture is the creative (and fun) part of HW design; it distinguishes good designs from bad. The designer should remain involved in this.
      - Complex concurrency and control is the hard and tedious part of HW design; it’s where most errors arise. BSV’s Rules dramatically simplify and automate this.

Modules, rules, interfaces, methods
[Figure: nested modules; labels: module, interface, state, rule]
The big picture: modules contain rules which use methods that are provided by sub-modules in their interfaces. Methods, too, can use other methods.

Example: a 2x2 switch, with stats
[Figure: two input FIFOs, each feeding a “Determine Queue” stage, two output FIFOs, and a “+1” counter labeled “Count certain packets”]
Packets arrive on two input FIFOs, and must be switched to two output FIFOs
Certain “interesting packets” must be counted

2x2 switch specs
Input FIFOs can be empty
Output FIFOs can be full
Shared resource collision on an output FIFO:
  • if packets available on both input FIFOs, both have same destination, and destination FIFO is not full
Shared resource collision on counter:
  • if packets available on both input FIFOs, each has different destination, both output FIFOs are not full, and both packets are “interesting”
Resolve collisions in favor of packets from the first input FIFO
Must have maximum throughput: a packet must move if it can, modulo the above rules

The meat of the BSV code
[Figure: the 2x2 switch diagram, with “Determine Queue” stages and the “+1 / Count certain packets” counter]
    module mkSmallSwitch (…);
       …
       (* descending_urgency = "r1, r2" *)
       rule r1; // for packets from FIFO i1
          let x = i1.first;
          let out = ((x[0] == 0) ? o1 : o2);
          i1.deq;
          out.enq (x);
          if (count(x)) c <= c + 1;
       endrule

       rule r2; // for packets from FIFO i2
          let x = i2.first;
          let out = ((x[0] == 0) ? o1 : o2);
          i2.deq;
          out.enq (x);
          if (count(x)) c <= c + 1;
       endrule
    endmodule: mkSmallSwitch
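
The slide elides the declarations around the two rules. The sketch below fills them in under stated assumptions so that the rules can be read in context: the Packet type (32 bits, bit 0 selecting the output), the predicate isInteresting (standing in for the slide's count(x)), the SmallSwitch interface and the name mkSmallSwitchSketch are all hypothetical, and the destination choice is written as an if/else rather than by selecting an interface. It is not the code from the presentation or the white paper.

```bsv
import FIFO::*;
import GetPut::*;

typedef Bit#(32) Packet;   // assumed packet format; bit 0 selects the output

// Hypothetical stand-in for the slide's count(x): top bit marks an interesting packet.
function Bool isInteresting (Packet p);
   return (p[31] == 1);
endfunction

// Hypothetical external interface for the switch.
interface SmallSwitch;
   interface Put#(Packet) in1;
   interface Put#(Packet) in2;
   interface Get#(Packet) out1;
   interface Get#(Packet) out2;
   method Bit#(16) interestingCount ();
endinterface

(* synthesize *)
module mkSmallSwitchSketch (SmallSwitch);
   FIFO#(Packet)  i1 <- mkFIFO;
   FIFO#(Packet)  i2 <- mkFIFO;
   FIFO#(Packet)  o1 <- mkFIFO;
   FIFO#(Packet)  o2 <- mkFIFO;
   Reg#(Bit#(16)) c  <- mkReg(0);

   // When both rules are enabled and conflict, r1 is preferred.
   (* descending_urgency = "r1, r2" *)
   rule r1;   // packets from FIFO i1
      let x = i1.first;
      i1.deq;
      if (x[0] == 0) o1.enq (x); else o2.enq (x);
      if (isInteresting (x)) c <= c + 1;
   endrule

   rule r2;   // packets from FIFO i2
      let x = i2.first;
      i2.deq;
      if (x[0] == 0) o1.enq (x); else o2.enq (x);
      if (isInteresting (x)) c <= c + 1;
   endrule

   interface in1  = toPut (i1);
   interface in2  = toPut (i2);
   interface out1 = toGet (o1);
   interface out2 = toGet (o2);

   method Bit#(16) interestingCount ();
      return c;
   endmethod
endmodule
```

How closely a version like this meets the maximum-throughput requirement depends on how the compiler schedules r1 and r2 against each other; the sketch is meant to show the shape of the code, not a tuned design.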

Commentary
Muxing into output FIFOs, and control of those muxes, automatically generated
Automatic handling of FIFO emptiness, FIFO fullness
  • This is part of BSV’s rule and interface method semantics
      - Impossible to read a junk value from an empty FIFO
      - Impossible to enqueue into a full FIFO
      - Impossible to race for multiple enqueues onto a FIFO
All control for resource sharing handled automatically
  • Rule atomicity ensures consistency
  • The “descending_urgency” attribute resolves collisions in favor of rule r1
The BSV code directly expresses design intent without all the clutter of control and shared-resource management, generating efficient, correct-by-construction RTL

Managing change
Now imagine the following changes to the existing code:
  • Some packets are multicast (go to both FIFOs)
  • Some packets are dropped (go to no FIFO)
  • More complex arbitration
      - FIFO collision: in favor of r1
      - Counter collision: in favor of r2
      - Fair scheduling
  • Several counters for several kinds of interesting packets
  • Non-exclusive counters (e.g., IP packets include TCP packets)
  • M input FIFOs, N output FIFOs (parameterized)
Suppose these changes are required 6 months after original coding
In BSV these are easy, because the source code remains uncluttered by all the complex control and mux logic; atomicity ensures correctness

Broad-brush differences between BSV and RTL: BSV is not simulation-centric
RTL and SystemC are simulation-centric
“Synthesizable subsets” were defined later
Many concepts/constructs are a consequence of this SW-process-like simulation view. E.g.,
  • Execution of a process has a program-counter-like “locus of control”
  • Variables have the semantics of updatable memory locations, updated when “execution reaches this statement”
  • Sensitivity lists
  • “If execution reaches this statement, the wire is driven with the value of the right-hand side”
  • Functions/procedures get called, execute, and return (stack-like semantics)
None of these are particularly meaningful from a HW point of view: the tail (simulation) is wagging the dog (HW description)
BSV is not simulation-centric, and in these respects, BSV is closer to the traditional HW view

Broad-brush differences between BSV and RTL: Datapaths and control paths
With BSV you don’t think separately about datapaths and control
Each Rule specifies the part of the datapath relevant for its behavior, and the control conditions under which the path is traversed
The Bluespec compiler combines these specifications to generate the final datapaths and control circuitry
  • No central datapath description
  • No central “control FSM”

Interface abstraction

Interface abstraction
Examples of BSV hierarchical and polymorphic interfaces (all synthesizable):
    interface Put#(type t);
       method Action put (t x);
    endinterface

    interface Get#(type t);
       …
    endinterface

    interface Client#(type reqType, type respType);
       interface Get#(reqType)  request;
       interface Put#(respType) response;
    endinterface

    interface Server#(type reqType, type respType);
       interface Put#(reqType)  request;
       interface Get#(respType) response;
    endinterface

    interface DMA#(type busReq, type busResp);
       interface Client#(busReq, busResp) dataMover;
       interface Server#(busReq, busResp) config;
    endinterface

Client/Server interfaces
Get/Put pairs are very common, and duals of each other, so the library defines Client/Server interface types for this purpose
    interface Client #(type req_t, type resp_t);
       interface Get#(req_t)  request;
       interface Put#(resp_t) response;
    endinterface

    interface Server #(type req_t, type resp_t);
       interface Put#(req_t)  request;
       interface Get#(resp_t) response;
    endinterface
[Figure: a client (get, put) wired to a server (put, get); each link carries req_t or resp_t data together with ready and enable signals]
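
A small sketch of the Get/Put duality, assuming the library packages GetPut and Connectable: a producer that offers a Get, a consumer that offers a Put, and a single mkConnection that joins them. The module names mkCounterSource, mkSumSink and mkGetPutSketch are hypothetical.

```bsv
import FIFO::*;
import GetPut::*;
import Connectable::*;

// Hypothetical producer: offers a stream of integers through a Get interface.
module mkCounterSource (Get#(int));
   FIFO#(int) outQ <- mkFIFO;
   Reg#(int)  n    <- mkReg(0);

   rule produce;
      outQ.enq (n);
      n <= n + 1;
   endrule

   // get removes the head of the queue and returns it.
   // It inherits the "not empty" guard of outQ.first and outQ.deq.
   method ActionValue#(int) get ();
      outQ.deq;
      return outQ.first;
   endmethod
endmodule

// Hypothetical consumer: accepts integers through a Put interface and sums them.
module mkSumSink (Put#(int));
   Reg#(int) total <- mkReg(0);

   method Action put (int x);
      total <= total + x;
   endmethod
endmodule

(* synthesize *)
module mkGetPutSketch (Empty);
   Get#(int) src  <- mkCounterSource;
   Put#(int) sink <- mkSumSink;

   // Creates the glue that moves one value per firing from src.get to sink.put.
   mkConnection (src, sink);
endmodule
```

The same mkConnection call also accepts a Client paired with a Server, which is how the Client/Server types above are typically joined.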

Client/Server interfaces
[Figure: mkProcessor (client) connected to mkCache (server ipc, client icm), which is connected to mkMem (server); each link is a get/put pair]
    interface CacheIfc;
       interface Server#(Req_t, Resp_t) ipc;
       interface Client#(Req_t, Resp_t) icm;
    endinterface

    module mkCache (CacheIfc);
       // from / to processor
       FIFO#(Req_t)  p2c <- mkFIFO;
       FIFO#(Resp_t) c2p <- mkFIFO;
       // to / from memory
       FIFO#(Req_t)  c2m <- mkFIFO;
       FIFO#(Resp_t) m2c <- mkFIFO;

       … rules expressing cache logic …

       interface ipc = fifosToServer (p2c, c2p);
       interface icm = fifosToClient (c2m, m2c);
    endmodule

mkConnection
Using these interface facilities, assembling systems becomes very easy
[Figure: mkProcessor (client) connected to mkCache (server ipc, client icm), which is connected to mkMem (server); each link is a get/put pair]
    interface CacheIfc;
       interface Server#(Req_t, Resp_t) ipc;
       interface Client#(Req_t, Resp_t) icm;
    endinterface

    module mkTopLevel (…);
       // instantiate subsystems
       Client #(Req_t, Resp_t) p <- mkProcessor;
       CacheIfc                c <- mkCache;   // CacheIfc is declared without type parameters
       Server #(Req_t, Resp_t) m <- mkMem;
       // instantiate connects
       mkConnection (p, c.ipc);
       mkConnection (c.icm, m);
    endmodule

Structural abstractions
The behavioral abstractions (Rules and Interface Methods), by themselves, tremendously improve productivity and correctness
  • A designer can be productive with Rules and Interface Methods after about a day of training
The structural abstractions (types, parameterization, static checking, elaboration) are an additional substantial multiplier

Example: a butterfly switch (crossbar)
[Figure: butterfly network with ports labeled 00, 01, 10, 11]
Basic building blocks: [figure]
Recursive construction: 1x1, 2x2, 4x4, …, NxN

Butterfly switch: code excerpts
    interface XBar #(type t);
       interface List#(Put#(t)) input_ports;
       interface List#(Get#(t)) output_ports;
    endinterface
Polymorphic (type parameter t)
Sub-interfaces (hierarchical)
Aggregation (lists, vectors of interfaces)

Butterfly switch: code excerpts
    module mkXBar #(Integer logn,                             // param
                    function Bit #(32) destinationOf (t x),   // param
                    module #(Merge2x1 #(t)) mkMerge2x1)       // param
                  (XBar #(t));                                // interface
       …
    endmodule: mkXBar
Size parameter: logn
Comb. circuit parameter: destinationOf
Module parameter: mkMerge2x1
  • Encapsulates flow-control, arbitration, queueing behavior of the 2x1 merge
Interfaces instead of port lists: XBar#(t)
Polymorphic: type parameter t

Butterfly switch: code excerpts
    module mkXBar #(Integer logn, …)
       if (logn == 0) …                      // BASE CASE
          FIFO#(t) f <- mkFIFO;
          …
       else …                                // RECURSIVE CASE
          XBar#(t) upper <- mkXBar (logn-1, …);
          XBar#(t) lower <- mkXBar (logn-1, …);
          …
          for (Integer j = 0; j < n; j = j + 1) …
             rule route; …
                if (! flip) merges[j].iport0.put (x);
                else        merges[jFlipped].iport1.put (x);
             endrule
    endmodule: mkXBar
Arbitrary elaboration
  • (here: conditional, recursion, loop)
All constructs can be elaborated
  • (first class modules, interfaces, rules, …)
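
The excerpt above is heavily elided, so here is a much smaller, self-contained sketch of one of the elaboration features it names: a loop that is unrolled at compile time into a set of rules. It shows loop elaboration only, not recursion. The delay-line design, the typedef Stages and the module name mkDelayLineSketch are assumptions for illustration and are unrelated to the butterfly switch code.

```bsv
import Vector::*;

typedef 4 Stages;   // assumed depth of the delay line

(* synthesize *)
module mkDelayLineSketch (Empty);
   // One register per stage, created during static elaboration.
   Vector#(Stages, Reg#(Bit#(8))) stage  <- replicateM (mkReg (0));
   Reg#(Bit#(8))                  source <- mkReg (1);

   // Feeds a new value into the first stage every clock.
   rule feed;
      stage[0] <= source;
      source   <= source + 1;
   endrule

   // The loop runs at compile time: it produces one rule per stage boundary,
   // each copying the previous stage's old value into the next stage.
   for (Integer i = 1; i < valueOf (Stages); i = i + 1) begin
      rule shift;
         stage[i] <= stage[i-1];
      endrule
   end
endmodule
```

Changing the typedef changes the number of registers and rules that are generated; no loop exists in the resulting hardware.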

Butterfly switch (see also whitepaper and/or demo for full code)
Summary:
- Advanced parameterization
- Recursive elaboration
- The switch itself: < 60 lines of BSV code
- First working (tested) prototype: < 1 day (including simple testbench)
- Fully synthesizable:
    - synthesized to netlist (Magma, TSMC 0.18u, 500 MHz)

Example: parameterized, pipelined, priority queue (P3Q)
[Figure: queue with enq and deq; enq: insertion point depends on “priority”]
Specs:
Must be synthesizable to quality HW
Must allow simultaneous (same clock) enq/deq
Must be parameterized with:
  • Capacity of queue
  • Item-type (data type of items being queued)
  • Precise bit-representation of item-type
  • Priority function (“item1 <= item2”)
  • Pipelined (2-clock) or non-pipelined (1-clock) enq op (to allow synthesis at range of clock speeds)
      - Pipelining should not affect external enq-to-deq latency

P3Q in Bluespec SystemVerilog
Written, tested, synthesized in ~3 days
About 610 lines of understandable, well commented code
  • (~400 lines ignoring comments)
Synthesized at 400 MHz (Magma, TSMC 0.18u) (see white paper)
Quote from expert commercial architect/designer who specified this problem: “I expect this to be a 10X improvement over what we do today”
Compares very well with solutions in any other SW programming language or HDL!

Advanced clock management
Clock domains:
  • Clock abstract type, with static checking of clock compatibility
  • So, impossible to connect across clock domains without a synchronizer
  • Rich, user-extensible library of synchronizers
Gated clocks, for power management
  • Clock gating conditions contribute to Rule conditions
  • So, impossible to communicate with a clock domain that is gated “off”

Power management: Multiple clock domains
One of the most effective ways to control power consumption
Divide the design into “islands” or “domains” that use a common clocking discipline
Run each domain at the slowest clock speed that is adequate to meet performance specs
“Gate” off clocks to domains that are currently not being used
  • E.g., digital camera circuits in a cell phone when the camera is not in use

Multiple clock domains: Typical design rules
Always use a “synchronizer” at domain boundaries
  • Unless the two clocks only differ in gating (same underlying “oscillator”)
Do not communicate with a gated-off domain
  • But you may still need to read “most recent values” before the clock was gated off
“Ignore” timing violations in synchronizers
  • By definition they violate clock timing discipline
  • “False paths” in synthesis constraints

Multiple clock domains: enforcing design rules in BSV
BSV treats Clock as a special abstract data type
  • Distinguished from all other types
  • Type-checking ensures that clocks never get mixed up with ordinary signals
For clock dividers, BSV provides only “trusted” primitives for deriving the divided clock from an existing clock
For clock generation, BSV provides only “trusted” primitives for elevating an ordinary signal into a Clock
Clocks can be used in expressions, parameters, arguments, arrays, …; type-checking ensures safety
    Clock c1;
    Clock c = (b ? c1 : c2);   // b must be known at compile-time

Multiple clock domains: enforcing design rules in the design language
• BSV provides primitives to associate a boolean signal with a Clock, as a gating signal
    Bool b1 = …;
    Bool b2 = …;
    Clock c1 <- mkGatedClock (b1, clocked_by c0);
    Clock c2 <- mkGatedClock (b2, clocked_by c1);
• New gating signals are “ANDed” with existing gating signals
• The compiler keeps track of the fact that c0, c1 and c2 differ only in gating signals (have a common oscillator)
  • c0, c1 and c2 are said to be “in the same clock family”
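The clock-family bookkeeping described on this slide can be sketched as a small Python model. This is only an illustration of the idea, not how the Bluespec compiler is implemented: the `Clock` class, the `gated`, `same_family` and `is_running` helpers, and the oscillator name `osc_main` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clock:
    """Toy stand-in for a BSV Clock: an oscillator name plus gating signal names."""
    oscillator: str
    gates: frozenset = frozenset()


def gated(base: Clock, gate: str) -> Clock:
    """Derive a clock whose new gate is ANDed with the gates already on `base`."""
    return Clock(base.oscillator, base.gates | {gate})


def same_family(a: Clock, b: Clock) -> bool:
    """Two clocks are in one family when they differ only in gating."""
    return a.oscillator == b.oscillator


def is_running(clk: Clock, signals: dict) -> bool:
    """A clock ticks only when every one of its gating signals is true."""
    return all(signals[g] for g in clk.gates)


# Mirrors the slide: c1 is c0 gated by b1, c2 is c1 gated by b2.
c0 = Clock("osc_main")          # hypothetical oscillator name
c1 = gated(c0, "b1")
c2 = gated(c1, "b2")
```

With `b1` true and `b2` false, `c1` is running but `c2` is not, even though both remain in the same family as `c0`.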

Multiple clock domains: enforcing design rules in the design language
• When instantiating a module, can connect Clocks as usual
  • Type-checking ensures that only a Clock signal can be connected to a Clock port
    IfcType ifc <- mkModule (…, c1, clocked_by c0)
• Statically checked rules also
  • ensure that each Rule-based Interface Method of the instantiated module is clocked with a unique Clock
  • keep track of which method is clocked by which Clock
    … ifc.method_A () …   // clocked by c1
    … ifc.method_B () …   // clocked by c0

Multiple clock domains: enforcing design rules in the design language
• In every Rule, type-checking ensures that all the methods used in the rule have a “compatible” clock (same clock family)
    rule foo (5 < mod1.method1());
       let x = mod2.method2 (True);
       mod3.method3 (x, x+1);
    endrule
• mod1.method1, mod2.method2 and mod3.method3 must have the same clock (or be in the same family)
  • If not, a static error is raised by the compiler
  • If, e.g., mod1.method1 has a different clock, the designer must insert a synchronizer module between mod1.method1 and its use in this rule, to resolve the incompatibility
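As a rough illustration of the static check described here, the following Python sketch rejects a rule whose methods come from more than one clock family. It is a toy model under stated assumptions: `check_rule_clocks`, `ClockDomainError` and the family names are hypothetical, and the real check happens inside the BSV compiler's type checker.

```python
class ClockDomainError(Exception):
    """Raised when one rule touches methods from unrelated clock families."""


def check_rule_clocks(rule_name, method_families):
    """Return the single clock family used by a rule, or raise.

    `method_families` maps each method used in the rule to the name of the
    clock family (common oscillator) that clocks it.
    """
    families = set(method_families.values())
    if len(families) > 1:
        raise ClockDomainError(
            f"rule {rule_name}: methods span clock families "
            f"{sorted(families)}; insert a synchronizer"
        )
    return next(iter(families), None)
```

For example, calling `check_rule_clocks("foo", {"mod1.method1": "oscB", "mod2.method2": "oscA"})` raises `ClockDomainError`, which corresponds to the compiler's static error.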

Multiple clock domains: enforcing design rules in the design language
• In every Rule, clock gating conditions are “ANDed” with the rule condition
    rule foo (5 < mod1.method1());
       let x = mod2.method2 (True);
       mod3.method3 (x, x+1);
    endrule
• The rule will not execute if any of the clocks of any of the methods is gated off
  • Therefore, it will not attempt to communicate with a method that is gated off
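The effect of ANDing gating conditions into the rule condition can be written as a one-line predicate. This Python sketch is only a model of the behaviour described above; `rule_will_fire` and its argument names are hypothetical.

```python
def rule_will_fire(rule_condition, clock_running):
    """Return True only if the rule's own condition holds and no clock is gated off.

    `clock_running` maps each method used by the rule to True when the clock
    driving that method is currently running (not gated off).
    """
    return rule_condition and all(clock_running.values())
```

A single gated-off clock among the methods is enough to keep the rule from firing, regardless of the explicit condition.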

Power management: Multiple clock domains, summary
• Today’s SoCs have numerous clock domains:
  • Different IP blocks run at different clock speeds
  • For power management
• Abstract types, type checking, and clock tracking can eliminate many of the common errors made by designers in managing multiple clock domains
  • Clean clocks: cannot accidentally use a (possibly skewed) signal for a clock
  • Cannot accidentally connect across clock domain boundaries of unrelated clocks without using a synchronizer
  • Cannot accidentally communicate with a module whose clock is currently gated off

Example: USB 2.0 UTMI
[Figure: a USB Host connected to a USB Device. Device blocks: Device Specific Logic, Serial Interface Engine, USB 2.0 Transceiver Macrocell (UTMI). Source: UTMI specification, version 1.05]
USB PHY, includes:
• Data serialization/deserialization
• Bit stuffing
• Clock recovery and synchronization
  • Including 480 Mbps serial mode

UTMI Implementation
[Figure: USB 2.0 Transceiver Macrocell block diagram. Labels: 480 MHz Input Clocks (8); 12 MHz; Macrocell Generated Clock; Analog Front End; Physical Interface (480/12 MHz); Oversampler; Receiver (120 MHz); Transmitter (120 MHz); Receive Word (16); Transmit Word (16); PhyOut; Serial Interface Engine (SIE) (30 MHz)]
BSV implementation: 480 MHz input clock, 13 clock domains!

UTMI implementation notes
• Developed by one engineer in 3 months
• Verified with Cadence eVC testbench
• Transmitter & receiver are separable components
• Synthesizes at 480 MHz in TSMC 0.18 using Magma with positive slack
• Absolutely no runtime clock debugging!

Reuse

About Reuse
• IP Reuse has traditionally been difficult because of
  • inflexibility: IP block can’t be “adjusted” for a different application
  • imprecision: undocumented scheduling/protocol assumptions
• All the language-based ideas we have discussed improve the situation:
  • Rules and Rule-based Interface Methods
    • Express complex concurrency across shared resources succinctly and naturally
    • Eliminate typical control-logic design errors, including race conditions, by automatically synthesizing the correct control logic
  • Types, type-checking and clock-checking eliminate careless mistakes by designers
  • Polymorphism and parameterization allow defining generic IP blocks that can be instantiated in widely differing contexts
  • Full-power static elaboration allows very succinct expression of regular structures, dramatically reducing code size, and eliminating tedium and careless mistakes in “cut-and-paste” manual replication

Bluespec SystemVerilog Agenda: Technical Deep Dive
• Intro: why an HDL can affect overall productivity, from concept to silicon
• Behavior:
  • Rules: a new way to express HW behavior
  • Correctness: why rules help
  • Comparison with behavioral synthesis
  • Rule-based Interface Methods: modularizing rules
• Structure: improving the expression of HW structure using ideas from advanced programming languages
• Clock domains and gated clocks: compiler-guaranteed safety
• Testbenches using BSV
• Transaction Level Modeling/architecture exploration and refinement, within a single paradigm
  • Comparison with SystemC
• Synthesis quality: as good as hand-coded RTL
• Tool flows
  • Coexistence with Verilog/VHDL/SV/SystemC
• Futures:
  • Integration of Rules and Rule-based Interfaces into SystemC
  • Formal verification

Bluespec SystemVerilog for Testbenches

Verification is still a bottleneck
• TB complexity grows along with exploding complexity in DUTs
  • Complex TB behaviors (simultaneous stimulus on multiple ports, pipelining, out-of-order processing)
  • Mixing new and old IPs in SoCs
  • Inadequate facilities to construct libraries of common TB design patterns
• Inadequate interface semantics
  • Complex data types
  • Complex interface protocols
  • Difficult to refine from TLM to Implementation Level
• Limited parameterization, and therefore limited reuse, of Verification IPs, Transactors, etc.
• Bluespec’s strengths can remove these bottlenecks

BSV improves verification
• Testbenches enjoy the same benefits:
  • Express complex concurrency correctly with Rules
  • State-machine generation
    • Succinct expression of stimulus patterns
  • Correct connection to DUT
    • Interface Methods are naturally transactional
    • Interface abstraction allows high-level interfaces
    • No interface timing errors
  • Clock discipline
  • Reuse due to parameterization

State machine generation
    // Specify an FSM generating a test sequence
    Stmt test_seq =
       seq
          for (i <= 0; i < NI; i <= i + 1)             // each input
             for (j <= 0; j < NJ; j <= j + 1) begin    // each output
                let pkt <- gen_packet ();
                send_packet (i, j, pkt);               // test i-j path in isolation
             end
          par   // test packet arbitration by sending packets in parallel
             send_packet (0, 1, pkt0);   // to output 1
             send_packet (1, 1, pkt1);   // to output 1 (collision)
          endpar
       endseq

    // Generate the FSM
    mkAutoFSM (test_seq);
• Easy to specify precise orchestration of stimulus
  • sequencing, parallel, iteration
• Same Rule semantics
  • automatically flow-controlled, robust to latency variations, etc.
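To make the seq/par orchestration concrete, the Python sketch below expands the same test plan into an ordered list of steps. It is an idealized model, assuming each send takes exactly one step, whereas the generated BSV FSM is flow-controlled and may take a variable number of cycles per action. The function name `expand_test_seq` is hypothetical.

```python
def expand_test_seq(ni, nj):
    """Return the stimulus schedule as a list of steps.

    Each step is a list of sends issued together. Sequential statements
    give one send per step; the par block gives one step with two sends.
    """
    steps = []
    # seq with nested for loops: one isolated send per (input, output) pair
    for i in range(ni):
        for j in range(nj):
            steps.append([("send_packet", i, j)])
    # par block: both sends are issued together and collide on output 1
    steps.append([("send_packet", 0, 1), ("send_packet", 1, 1)])
    return steps
```

With two inputs and three outputs this yields six isolated sends followed by one step containing the two colliding sends.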

Example: An Ethernet MAC testbench is created in BSV that corresponds to an existing SV TB. The testbench is then quickly extended into a switch for more realistic testing, at a fraction of the effort it would take to write and debug the original SV TB.

MAC Testbench Structure
[Figure: testbench block diagram. Legend: Verilog 95, Bluespec. Labels: Frame Source; Sink; Test DUT; Transmitting Packets; Receiving Packets; SWEM (Software Emulator) with Master WB IFC; DUT (MAC) with Slave WB IFC and MII Interface; MAC Interrupts; RAM with Slave WB IFC; PHY with MII Interface]

Adding concurrency
Original SV Tb
• Untimed Tb
• ~7000 lines of code
• No concurrency management
• Stand-alone checking
New Tb
• Timed Tb
• ~2600 lines of code
• Generalized Wishbone Model
• Includes infrastructure to handle concurrency
• Generalized Switch Router Environment
• Parameterized Verification Environment With Concurrency

Extended Example
• Combine DUTs into router/switch
  • Multiple DUTs
  • Packet routing across Wishbone bus
  • Wishbone now includes round-robin arbiter
• Little additional code required
  • Wishbone bus etc. already generalized
  • Instantiate multiple DUTs
  • Add Arbiter/Bank
  • Add serialization code (Frame -> WB)
• Original: ~2583 relevant lines
• Modified: ~2957 relevant lines

Original Testbench Structure
[Figure: testbench block diagram. Legend: Verilog 95, Bluespec. Labels: Frame Source; Sink; SWEM (Software Emulator) with Master WB IFC; DUT (MAC) with Slave WB IFC and MII Interface; MAC Interrupts; RAM with Slave WB IFC; PHY with MII Interface]

MAC Extended Example (as Router)
[Figure: router testbench block diagram. Legend: Verilog 95, Bluespec. Labels: several instances of the current testbench (Current Tb), each with a WB Serializer and M/S WB IFC; Wishbone Bus; Arbiter; Address Bank; Frame Source; Sink]


BSV for SoC (System on a Chip) design, a.k.a. ESL (Electronic System Level) design

Today’s chips: “SoC”s (System on a Chip)
• “IP” blocks
  • (“Intellectual Property”)
• Processors
• Caches, Memories
• Interconnects
• DMAs
• Other peripheral blocks
• I/O blocks
• E.g., cell phones, cell network base stations, TV set-top boxes, iPods, digital cameras, …

Design Issues
• Complex tradeoffs in deciding architectures; need early HW architecture metrics:
  • Processor power, cache organization, bus and interconnect sizing, latencies, throughputs
  • Pipelined transactions, bursts, out-of-order processing
• SW development needs to begin before HW is ready
• Simulation speed (“boot the OS on the processor and run the video app thru the MPEG decoder HW IP block”)
  • Simulation speed is inversely related to the level of detail being simulated

TLM: Transaction Level Models
• TLM is a level of abstraction well above the hardware implementation level, based on “transactions” at module interfaces
  • E.g., “send an Ethernet packet”, “read a disk sector”
  • Instead of:
    • “send a byte/word”
    • Wait for RDY, assert DATA_IN, assert ENABLE
• Advantages:
  • Models can be built quickly
  • Capture essential functionality and essential structure
  • Provide an environment for early development of embedded software
  • Much faster simulation

Ideal: One consistent platform for system exploration & design
[Figure: models arranged along two axes. Abstraction/refinement dimension: Transaction Models to Implementation. Architecture dimension: alternative models and implementations]

BSV: single-language methodology
[Figure: “Transaction Model in BSV (with embedded C)” leading by refinement to “HW Implementation in BSV”]
• BSV Tools allow embedding C code (for embedded SW, early modelling)
• Interface methods are naturally “transactional”
  • Interfaces can express complex interactions
• Rules are naturally “reactive”
• Types, parameterization, abstraction comparable to C++
• Rules make it easier to express complex concurrency (due to atomicity)
• HW metrics available from the beginning, for architecture decisions
• Good HW synthesis exists
• Single language environment, with strong semantics to enable disciplined refinement, testbench reuse, etc.

The importance of rapid architecture exploration
• Can you estimate the hardware size of an IP block, just by looking at the spec?
• Let’s look at what happened in three actual design activities:
  • LPM (Longest Prefix Match) in Internet Packet Router
  • MIPS processor 2-stage pipeline
  • 802.11a transmitter

A lookup table (sparse tree) for LPM
[Figure: routing table and its sparse-tree representation. Prefixes shown: 7.14.*.*, 7.14.7.3, 10.18.200.*, 10.18.200.5, 5.*.*.* and a default entry (*), with results labelled A to F. Example lookups, with columns IP address, Result and M Ref (number of memory references), include 10.18.201.5 with result F, 7.14.7.2 with result A, 5.13.7.2 with result E and 10.18.200.7 with result C]
Real-world lookup algorithms are more complex but all make a sequence of dependent memory references.

Software version of LPM (in C)
    int lpm (IPADDRESS ipa)
    {
        int p;
        p = RAM [ipa >> 16];                   // level 1 lookup (16 b)
        if (isLeaf(p)) return p;
        p = RAM [p + ((ipa >> 8) & 0xFF)];     // level 2 lookup (8 b); mask before adding
        if (isLeaf(p)) return p;
        p = RAM [p + (ipa & 0xFF)];            // level 3 lookup (8 b); mask before adding
        return p;
    }
Note: the C code says nothing about good microarchitectures for HW implementation
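The three dependent lookups can be exercised with a small runnable model. The Python sketch below follows the same 16/8/8-bit scheme; the tagged-tuple leaf encoding, the `build_ram` table layout and the two sample routes are assumptions made for illustration, not the encoding used in the original design. Note the parentheses around each shift-and-mask: `+` binds tighter than `&` in both C and Python.

```python
from collections import defaultdict

LEAF, PTR = "leaf", "ptr"   # hypothetical tags; the C version uses isLeaf(p)


def lpm(ram, ipa):
    """Three-level longest-prefix-match lookup on a 32-bit address."""
    p = ram[ipa >> 16]                        # level 1: top 16 bits
    if p[0] == LEAF:
        return p[1]
    p = ram[p[1] + ((ipa >> 8) & 0xFF)]       # level 2: next 8 bits
    if p[0] == LEAF:
        return p[1]
    p = ram[p[1] + (ipa & 0xFF)]              # level 3: last 8 bits
    return p[1]


def ip(a, b, c, d):
    """Pack a dotted quad into a 32-bit integer."""
    return (a << 24) | (b << 16) | (c << 8) | d


def build_ram():
    """Hypothetical table: 7.14.*.* -> A, 7.14.7.3 -> B, everything else -> F."""
    ram = defaultdict(lambda: (LEAF, "F"))    # default route
    l2 = 1 << 16                              # level-2 block sits after level 1
    l3 = l2 + 256                             # level-3 block sits after that
    ram[(7 << 8) | 14] = (PTR, l2)
    for i in range(256):
        ram[l2 + i] = (LEAF, "A")
        ram[l3 + i] = (LEAF, "A")
    ram[l2 + 7] = (PTR, l3)
    ram[l3 + 3] = (LEAF, "B")
    return ram
```

Each lookup needs the result of the previous one before it can issue the next memory read, which is the property that makes the hardware microarchitecture choice interesting.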

Longest Prefix Match for IP lookup
Even for such a small function, 3 dramatically different architectures (no doubt many more possibilities):
• Static pipeline: inefficient memory usage but simple design
• Linear pipeline: efficient memory usage through memory port replicator
• Circular pipeline: efficient memory with most complex control
Designer’s Ranking: 1, 2, 3. Which is “best”?
Source: Arvind, Nikhil, Rosenband & Dave, ICCAD 2004

Synthesis results
    LPM versions     Best Area (gates)      Best Speed (ns)
    Static V, I      8898                   3.60
    Static V, II     2271                   3.56
    Static BSV       2391 (5% larger)       3.32 (7% faster)
    Linear V         14759                  4.7
    Linear BSV       15910 (8% larger)      4.7 (same)
    Circular V       8103                   3.62
    Circular BSV     8170 (1% larger)       3.67 (2% slower)
V = Verilog; BSV = Bluespec SystemVerilog; TSMC 0.18 µm
• Microarchitecture is by far the most significant determinant of HW quality
• Even for an apparently “fixed” microarchitecture, clever microarchitecture optimization can have a dramatic effect
  • (Static V, I vs Static V, II)

(In)applicability of “behavioral synthesis”
• Traditional “behavioral synthesis” has a hard time with this example (just 10 lines of C!)
  • Hard to analyze a variable number of memory reads that are data-dependent on each other
  • Hard to interleave them to access a single shared resource (memory)
• Designer creativity needed to improve “Static V, I” into “Static V, II” (clever sharing of state machine)
• Designer creativity needed to come up with the circular pipeline

Design Activity 2
• MIT postgraduate course:
  • 6.884 Complex Digital Systems, Spring 2005
  • (see http://csg.csail.mit.edu/6.884/index.html)
• Lab task: design and synthesize a simple 2-stage MIPS processor pipeline
  • Can there really be much variation in this?
  • The next slide shows the variation in HW quality across the different lab project teams

Lab 2 Results
[Figure: scatter plot of the lab teams’ results, with the Pareto-optimal points marked]
Source: http://csg.csail.mit.edu/6.884/lab2-results.html

802.11a transmitter

802.11a: What’s the optimal implementation for power, area, performance?
• 802.11a Wi-Fi transmitter targeted at a wireless platform
• Final design: 4 milliwatts
[Figure labels: “accounts for > 95% area”; “Power Characterization”; “RTL for New Macro-/Micro-Architecture”]
Source: Dave, Pellauer, Gerding & Arvind

IFFT: Micro-architectural exploration
[Figure: 64-point IFFT dataflow from inputs in0 … in63 to outputs out0 … out63 through stages of radix-4 blocks; each of the 48 radix-4 blocks has the same structure]
• Folding stages? Each stage is almost identical, why not fold and re-use what you can?
• Sharing radix 4’s? Each stage’s 16 radix-4 blocks could also be implemented with 8, 4, 2 or 1 radix-4 block(s) used over multiple cycles

Superfolded circular pipeline: Just one Radix-4 node!
[Figure: inputs in0 … in63 pass through 64 4-way Muxes and 4 16-way Muxes into a single Radix-4 node, then through 4 16-way DeMuxes to outputs out0 … out63. Other labels: Permute_1, Permute_2, Permute_3; Index Counter 0 to 15; Stage Counter 0 to 2]
Designer intuition: most efficient design, lowest power
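The folding trade-off can be put in numbers with a back-of-the-envelope count. The Python sketch below assumes 3 stages of 16 radix-4 nodes (matching the stage counter 0 to 2 and index counter 0 to 15 above) and one cycle per pass through a bank of radix-4 blocks; it ignores muxes, permutations and registers, so it is only a rough counting model. The function name `folding_cost` is hypothetical.

```python
STAGES = 3             # stage counter runs 0 to 2
NODES_PER_STAGE = 16   # index counter runs 0 to 15


def folding_cost(blocks_per_stage, fold_stages):
    """Return (radix-4 blocks instantiated, cycles for one 64-point IFFT).

    blocks_per_stage: physical radix-4 blocks shared within a stage.
    fold_stages: True if one stage's hardware is reused for all stages.
    """
    if NODES_PER_STAGE % blocks_per_stage != 0:
        raise ValueError("blocks_per_stage must divide 16")
    cycles_per_stage = NODES_PER_STAGE // blocks_per_stage
    blocks = blocks_per_stage if fold_stages else STAGES * blocks_per_stage
    return blocks, STAGES * cycles_per_stage
```

Under these assumptions the fully unfolded design uses 48 blocks and 3 cycles, while the superfolded design uses 1 block and 48 cycles.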

Synchronous pipeline
[Figure: inQ, then f1, sReg1, f2, sReg2, f3, then outQ]
    rule sync-pipeline (True);
       inQ.deq();
       sReg1 <= f1(inQ.first());
       sReg2 <= f2(sReg1);
       outQ.enq(f3(sReg2));
    endrule
This is real IFFT code; just replace f1, f2 and f3 with stage_f code
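The key point of the rule above is that all register writes in one firing see the old register values. The Python sketch below models that cycle by cycle. It departs from the slide in two labeled ways: `None` marks an empty stage (the slide's rule has no valid bits), and the model keeps stepping after the input queue empties so the pipeline drains. The stage functions `f1`, `f2`, `f3` are arbitrary placeholders.

```python
from collections import deque


def f1(x): return x + 1     # placeholder stage functions, not IFFT stages
def f2(x): return x * 2
def f3(x): return x - 3


def run_sync_pipeline(inputs):
    """Simulate the 3-stage pipeline; return (outputs, cycles taken)."""
    in_q = deque(inputs)
    s_reg1 = s_reg2 = None   # None = bubble (an assumption, see lead-in)
    out_q = []
    cycles = 0
    while in_q or s_reg1 is not None or s_reg2 is not None:
        # Reads use the values from before this cycle...
        if s_reg2 is not None:
            out_q.append(f3(s_reg2))
        next_reg2 = f2(s_reg1) if s_reg1 is not None else None
        next_reg1 = f1(in_q.popleft()) if in_q else None
        # ...and all writes take effect together at the end of the cycle.
        s_reg1, s_reg2 = next_reg1, next_reg2
        cycles += 1
    return out_q, cycles
```

For n inputs the model finishes in n + 2 cycles, since a new item can enter every cycle once the pipeline is full.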

Folded pipeline
[Figure: inQ feeds a single function block f, which selects f1, f2 or f3 according to a stage register; results loop back through sReg or leave through outQ]
    rule folded-pipeline (True);
       if (stage==1) begin
          inQ.deq();
          sxIn = inQ.first();
       end
       else sxIn = sReg;
       sxOut = f(stage, sxIn);
       if (stage==3) outQ.enq(sxOut);
       else sReg <= sxOut;
       stage <= (stage==3) ? 1 : stage+1;
    endrule

    function f (stage, sx);
       case (stage)
          1: return f1(sx);
          2: return f2(sx);
          3: return f3(sx);
       endcase
    endfunction
This is real IFFT code too...
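A cycle-level model of the folded rule shows the cost of sharing one function block: each item occupies the hardware for three consecutive cycles. This Python sketch mirrors the rule's structure; the stage functions are arbitrary placeholders, and stopping once the queue is empty and the stage counter is back at 1 is an assumption of the model.

```python
from collections import deque


def f1(x): return x + 1     # placeholder stage functions, not IFFT stages
def f2(x): return x * 2
def f3(x): return x - 3


def f(stage, sx):
    """Select the stage function, like the case statement in the rule."""
    if stage == 1:
        return f1(sx)
    if stage == 2:
        return f2(sx)
    return f3(sx)


def run_folded_pipeline(inputs):
    """Simulate the folded pipeline; return (outputs, cycles taken)."""
    in_q = deque(inputs)
    stage, s_reg = 1, None
    out_q = []
    cycles = 0
    while in_q or stage != 1:
        sx_in = in_q.popleft() if stage == 1 else s_reg
        sx_out = f(stage, sx_in)
        if stage == 3:
            out_q.append(sx_out)
        else:
            s_reg = sx_out
        stage = 1 if stage == 3 else stage + 1
        cycles += 1
    return out_q, cycles
```

The outputs match a three-register pipeline built from the same functions, but n inputs take 3n cycles, which is the area-for-throughput trade being explored.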

Performance results
• 7 combinations created and explored within 5 days
• Designers were astounded to find that their intuitions were wrong and that the critical areas for reducing power were not where they suspected
[Figure: results chart with points marked “Optimal power” and “Original designer intuition”]

BSV Advantages
• It is essential to do architectural exploration for better (area, power, performance, ...) designs
• Bluespec enables rapid architectural exploration
• Fast, low-effort, low-risk changes enable:
  • Rapid architectural/micro-architectural exploration and optimization
  • Nimble responses to:
    • Feature/spec changes
    • Timing closure challenges
    • Bug fixes
    • Area optimizations

Architecture exploration: summary
• Despite the self-image of many experienced engineers, there is a wide margin of error in estimating the size of IP blocks without actually prototyping them (working out microarchitectures)
  • A bad estimate will leave you stuck with a sub-optimal design
• So Transaction Level Models, and their quick refinement to realistic hardware, are essential for accurate evaluation of candidate architectures
• Essential to have a design language that supports this
  • High levels of abstraction
  • High levels of static checking and elaboration
  • Synthesis from high level into quality hardware

BSV for architectural exploration
• Rules and Interface Methods are “transactional” in nature
  • Can be written at a very high level (in addition to the microarchitectural level)
• E.g., module interconnection using highly parameterized Get/Put interfaces
  • From complete packets to bits
  • Similar to SystemC TLM
• Clear semantics for splitting, joining, adding, removing
  • Rich theory developed over many decades in Computer Science
  • Enables disciplined refinement

Methods and Transaction Level Modeling
• Each method can be read as a transaction that can be applied against a module
• By just changing the level of abstraction of the arguments and results, we can move from realistic hardware to high-level models, using the single paradigm of methods
    Get#(Bit#(16)) m <- mkM;
    Put#(Bit#(16)) n <- mkN;
    rule r1 (…);
       Bit#(16) x <- m.get();
       n.put (x);
    endrule

    Get#(EtherPacket) m <- mkM;
    Put#(EtherPacket) n <- mkN;
    rule r1 (…);
       EtherPacket x <- m.get();
       n.put (x);
    endrule
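The same idea, one rule body reused at two levels of abstraction, can be mimicked with a generic channel in Python. This is a loose analogy rather than a translation of BSV's Get/Put library: the `Channel` class, its `can_get` guard, and the fields of `EtherPacket` are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Generic, TypeVar

T = TypeVar("T")


class Channel(Generic[T]):
    """Toy module offering get() and put() transactions on items of type T."""

    def __init__(self) -> None:
        self.items: Deque[T] = deque()

    def can_get(self) -> bool:
        return len(self.items) > 0

    def get(self) -> T:
        return self.items.popleft()

    def put(self, x: T) -> None:
        self.items.append(x)


@dataclass
class EtherPacket:
    """Hypothetical high-level transaction payload."""
    dst: str
    payload: bytes


def rule_r1(m: Channel, n: Channel) -> bool:
    """Move one item from m to n; fire only when m has something to get."""
    if not m.can_get():
        return False
    n.put(m.get())
    return True
```

`rule_r1` is unchanged whether the channels carry 16-bit words or whole `EtherPacket` values; only the payload type differs.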

Example: A reference platform is created to test a device driver for a hard disk microdrive. The reference platform allows either the device driver or the hardware model to be swapped out for the actual implementation. The model is instrumented with Assertions for monitoring transactions.

Creating a Reference Platform
[Figure: System containing RoS, DD, HW and a Monitor]
• RoS (Rest of System): periodically initiates a disk sector R/W transfer, and continues concurrent activity (non-blocking); handles callbacks asynchronously
• DD (Device Driver): converts sector transfer requests into IDE protocol consisting of IDE register R/Ws, and responding to IDE HW interrupts
• HW (IDE disk): models the IDE register command block and sector data buffer, and behavior in response to IDE commands written into the IDE command register
• Monitor: monitors all inter-block traffic, checks for immediate and temporal correctness conditions
• Traffic labels: disk sector R/W requests; callbacks; IDE register R/Ws; data for PIO reg reads; interrupts

Using the reference platform, replacing DD with real C code
[Figure: System cosimulation]
• In SystemC simulator, all written in C/C++/SystemC:
  • RoS (Rest of System): other C/C++/SystemC code
  • DD (Device Driver): now written in C; communications are simple function calls
• In Bluesim, all written in BSV (same reference model code):
  • Monitor
  • HW (IDE disk)
• Interface communication shim code automatically generated by the Bluespec compiler
• Understanding IDE: 2 weeks
• Coding and Verification: 5 days
• Integrating “C” Driver: 3 days

Example: AMBA AHB bus system, from transactional level to implementation level

E.g., AMBA AHB bus: transactional level (get/put)
[Figure labels: Master Block; Master transactional interface; Bus master-side transactional interface; Direct transactional interconnect (for faster simulation); Bus slave-side transactional interface; Slave Block]

E.g., AMBA AHB bus: mixed transactional/implementation levels
[Figure labels: Master Block with Master transactional interface; Bus master-side transactional interface adapter; Master Block with Master interface; Bus master-side interface; AHB; Bus slave-side interface; Slave Block; Bus slave-side transactional interface adapter; Slave transactional interface; Slave Block]

AMBA AHB bus: Implementation level
[Figure labels: Master Block; Master interface; Bus master-side interface; AHB; Bus slave-side interface; Slave Block]


Many proof points demonstrating
- General applicability
- Productivity
- HW quality

Designs with Bluespec
Bluespec has been used for every design listed:
• “RISC” processor, MIPS, Itanium, PowerPC, ARM
• L2 cache ctlr, DDR2 ctlr, SRAM ctlr, DMA ctlr, Debug controller
• OCP interconnect, Bus converters, AMBA, Arbiter
• PCI Express, PCI-X, USB, I2C
• 802.11a, Network proc, Queuing engines, Sorting queue, IP lookup
• IDCT, Motion compensator, DES, MPEG-4, IFFT, FIR filter, Pixel processor, Waveform generator, Pong
[Figure: the designs are placed across three application spaces: Complex Datapaths (e.g., processor/controller), Control, and Algorithms (e.g., DSP/math). Annotations: “Bluespec is the only next generation solution that addresses control and complex datapaths”; “Everyone else only addresses this application space”]

Bluespec vs. Hand-coded RTL
[Figure: comparison chart; segment labels: 7 Designs, 5 Designs, 18 Designs, 20 Designs]

IDCT design results
                                                   Verilog          Bluespec SystemVerilog
    RTL coding & unit verification                 2.5 man-weeks    1.3 man-weeks
    Top level verification                         1.5 man-weeks    1.2 man-weeks
    Total effort                                   4 man-weeks      2.5 man-weeks
    Lines of code                                  2716             723
    Latency (I → O) in clock cycles                172              171
    Gate count (2-input NAND; excluding memory)    52 K             48 K

Itanium: IA64 in Bluespec
Wunderlich & Hoe
The first model was developed in a few months by one student!

… and numerous other examples
Validated by customer experience
- 50% less time (or better) to verified, synthesized design
  - Even with no prior knowledge of BSV
- Area and time of synthesized design matched previous implementations done in Verilog/VHDL
  - Up to multi-million gate designs

Tools and tool flow

Tools and flow
[Diagram: tool flow. Labeled elements: Bluespec SystemVerilog source; Bluespec Synthesis; Blueview; Verilog 95 RTL (plus other Verilog/VHDL); Bluesim; Cycle Accurate; Verilog sim; VCD output; RTL synthesis; gates; Visualization (e.g., Debussy)]
Legend: files, Bluespec tools, 3rd party tools

Interactive Cross-Probing between Views (source, RTL, Novas Debussy/Verdi)
[Screenshot panels: RTL, Source, Waves]

Concurrency Semantics of Rules and Rule-based Interface Methods are also available in SystemC

Why integrate SystemC with Rules and Rule-based Interface Methods?
- Improve SystemC’s concurrency model
  - Atomic transactions vs. threads and events
  - Rule semantics across module boundaries
- Provide a path to high-level synthesis for control logic and complex datapaths
- Enable use of the same model for embedded software development, hardware exploration, and hardware implementation

[Diagram: SystemC layers and tool flow. Labeled elements: core SystemC class defs/libs; + TLM: TLM class defs/libs; + Rules: Rule class defs/libs; Refinement; Bluespec Synthesizable subset; Standard SystemC tools (gcc, OSCI sim, gdb, …); Bluesim; Bluespec synthesis tool; other Bluespec tools; RTL; Standard synthesis back-end tools; HW]

Components
- Additional classes and macros (esl.h)
  - Defines Bluespec Modules, Rules, Methods, Interfaces, etc.
- ESL Analyzer (“esepp”)
  - Parses Modules, Rules, Methods
  - Generates code to call elaborator with callback registrations, etc.
  - Generated code is compiled and linked with the rest of the system
  - Cannot be done with cpp
  - Original modules are not changed by the analyzer and can be compiled directly by gcc, but must be linked with ESEPP-generated code
- Run-time system (libesepro.a)
  - Elaborator
    - Determines priorities and scheduling ordering of rules and methods, executed
  - Run-time scheduler
    - “Fires” rules on every clock cycle

Components/flow
[Diagram: build flow. Labeled elements: Standard SystemC flow; systemc.h; Rule classes; esl.h; #include; dut.cpp; esepp; dut.epp; libsystemc.a; esepro.a; gcc; simulation executable]

Future: formal verification

Verification: Formal Methods - why?
- So far, Verification = Testing (by simulation)
  - Even the current use of assertions (PSL, SVA) is only a testing strategy
- Unfortunately, the size (# state elements) of today's chips makes it increasingly difficult or impossible to cover the state space by testing

Verification: Formal Methods Approach 1
- Use theorem-proving and other methods to prove assertions (rather than just testing assertions during simulation)
- Assertions can be written using PSL, SVA, …
- Advantage: coverage (assertion is always true, not just for a particular set of test cases)
- Caveat: verification can only be as good as the set of assertions being verified!
  - Does the set of assertions completely specify the design?
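
The difference between testing an assertion and establishing it for every reachable state can be illustrated on a toy design. The Python sketch below uses explicit-state enumeration, one of the simplest of the "other methods", not a theorem prover. Everything in it is hypothetical: the FIFO model, `DEPTH`, the rule lists, and the function name `find_violation`.

```python
from collections import deque

DEPTH = 4  # toy FIFO depth (assumption for this example)

# State is (head, tail, count). Each rule is a (guard, next_state) pair.
GOOD_RULES = [
    (lambda s: s[2] < DEPTH, lambda s: (s[0], (s[1] + 1) % DEPTH, s[2] + 1)),  # enq
    (lambda s: s[2] > 0,     lambda s: ((s[0] + 1) % DEPTH, s[1], s[2] - 1)),  # deq
]

# Same design with an off-by-one enqueue guard.
BUGGY_RULES = [
    (lambda s: s[2] <= DEPTH, lambda s: (s[0], (s[1] + 1) % DEPTH, s[2] + 1)),
    GOOD_RULES[1],
]


def invariant(s):
    """The assertion: occupancy is in range and consistent with the pointers."""
    head, tail, count = s
    return 0 <= count <= DEPTH and (tail - head) % DEPTH == count % DEPTH


def find_violation(initial, rules, holds):
    """Breadth-first search over all reachable states.

    Returns (violating_state, states_visited); violating_state is None when
    the assertion holds in every reachable state.
    """
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        s = frontier.popleft()
        if not holds(s):
            return s, len(seen)
        for guard, step in rules:
            if guard(s):
                t = step(s)
                if t not in seen:
                    seen.add(t)
                    frontier.append(t)
    return None, len(seen)
```

`find_violation((0, 0, 0), GOOD_RULES, invariant)` covers the whole reachable state space and reports no violation, while the same call with `BUGGY_RULES` returns a state whose count exceeds `DEPTH`. The slide's caveat still applies: the result says nothing about properties that `invariant` does not mention.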

Verification: Formal Methods Approach 2
- Prove the equivalence of a simple reference model with the implementation
- E.g., for a processor design:
  - Reference model: one instruction at a time, no pipelining, no speculation, no caching
  - Implementation: full implementation details
- Proof method:
  - Define a correspondence between each state in the reference model and a state in the implementation
  - For each state change in reference model, show that the implementation moves between corresponding states
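
A very small instance of this correspondence argument can be sketched in Python. The "processor" here only accumulates a list of operands, the implementation is a hypothetical two-stage pipeline, and all names (`ref_step`, `impl_step`, `buggy_impl_step`, `correspond`, `check_refinement`) are made up for illustration. The check walks the implementation's trace for one given program, so it is a bounded check rather than a proof; a real proof would quantify over all programs and states.

```python
from typing import Callable, Optional, Sequence, Tuple

RefState = Tuple[int, int]                   # (next operand index, accumulator)
ImplState = Tuple[int, Optional[int], int]   # (fetch index, pipeline register, accumulator)


def ref_step(prog: Sequence[int], s: RefState) -> RefState:
    """Reference model: consume one operand at a time, no pipelining."""
    i, acc = s
    if i >= len(prog):
        return s
    return (i + 1, acc + prog[i])


def impl_step(prog: Sequence[int], s: ImplState) -> ImplState:
    """Implementation: execute the in-flight operand while fetching the next."""
    i, reg, acc = s
    if reg is not None:
        acc += reg
    if i < len(prog):
        return (i + 1, prog[i], acc)
    return (i, None, acc)


def buggy_impl_step(prog: Sequence[int], s: ImplState) -> ImplState:
    """Same pipeline, but it forgets to drain the last in-flight operand."""
    i, reg, acc = s
    if i < len(prog):
        return (i + 1, prog[i], acc + (reg or 0))
    return (i, None, acc)  # bug: the value still in the register is dropped


def correspond(s: ImplState) -> RefState:
    """Map an implementation state to a reference state by completing in-flight work."""
    i, reg, acc = s
    return (i, acc if reg is None else acc + reg)


def check_refinement(prog: Sequence[int],
                     step: Callable[[Sequence[int], ImplState], ImplState]) -> bool:
    """Each implementation step must match zero or one reference steps."""
    s: ImplState = (0, None, 0)
    for _ in range(len(prog) + 2):
        t = step(prog, s)
        before, after = correspond(s), correspond(t)
        if after != before and after != ref_step(prog, before):
            return False
        s = t
    return correspond(s) == (len(prog), sum(prog))
```

With `impl_step` the check passes for a program such as `[3, 1, 4]`; with `buggy_impl_step` it fails at the step where the pipeline should drain, because the corresponding reference state changes in a way the reference model cannot produce.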

Verification: Formal Methods
Some references on formal verification using Rule semantics
- Parallel Program Design: A Foundation, K. Mani Chandy and Jayadev Misra, Addison Wesley, 1988
  - UNITY programming language for concurrent, reactive systems
- Using Term Rewriting Systems to Design and Verify Processors, Arvind and Xiaowei Shen, IEEE Micro 19:3, 1999, p. 36-46
- Cache Coherence Verification with TLA+, H. Akhiani, D. Doligez, P. Harter, L. Lamport, J. Scheid, M. Tuttle and Y. Yu, Proc. World Congress on Formal Methods in the Development of Computing Systems, Volume II, p. 1871-1872, September 20-24, 1999
- Proofs of Correctness of Cache-Coherence Protocols, Stoy et al., in Formal Methods for Increasing Software Productivity, Berlin, Germany, 2001, Springer-Verlag LNCS 2021
- Superscalar Processors via Automatic Microarchitecture Transformation, Mieszko Lis, Master's thesis, Dept. of Electrical Eng. and Computer Science, MIT, 2000

Verification: Formal Methods Summary
- Formal methods in verification are not yet in widespread use
- Many companies have started using formal methods on an experimental basis
- These methods will become increasingly important as chip complexity increases
- Design languages with strong formal semantics will improve the likelihood of success

Summary and wrapup

Bluespec SystemVerilog™: A one slide overview
Two dimensions raising the level of abstraction (fully synthesizable), starting from VHDL/Verilog/SystemC:
- Behavioral: Rules and Interface Methods
  - For complex concurrency and control, across multiple shared resources, across module boundaries
- Structural:
  - High-level abstract types
  - Powerful static checking
  - Powerful parameterization
  - Powerful static elaboration
  - Advanced clock management

Summary
Bluespec is using ideas from advanced programming languages:
- Behavior:
  - Rule-based systems, atomic transactions, correctness using invariants, modularity, achieving performance systematically via Rule-composition semantics, …
- Structural correctness, abstraction and elaboration
  - Complex types, abstract types, polymorphism (type parameterization), systematic overloading
  - Orthogonality (parameterization over all semantically meaningful concepts, including pieces of behavior)
  - Full programming power for structural descriptions
... to tackle the complexities of modern chip design
- Both individual HW blocks, and SoCs

Bluespec: Better Design Accelerates Everything!
[Diagram: design stages labeled Architecture, Design, Verification and Test, Physical Design]
Callouts:
- More architectural flexibility during design
- Architectural exploration
- Early executable models
- 50% reduction from design to verified netlist
- 50% reduction in errors, faster correction
- Faster fixes, to achieve closure
- Better reuse
Fully synthesizable, without compromise!

End
Thank you for your attention!