- Number of slides: 98

Asynchronous Circuit Compilation
Dr. Doug Edwards
doug@cs.man.ac.uk
Southampton: Oct 99

Overview
- Asynchronous circuits
- Advantages
- Asynchronous Design Paradigms
- Syntax Directed Compilation
  • Handshake Circuits
- Balsa
- Datapath Compilation
- Design Example - DMA Controller

Asynchronous (self-timed) Basics
- Synchronous circuits
  • a global clock separates system states
    – a time domain view of system activity
- Asynchronous circuits
  • input changes separate system states
    – a sequence or trace domain view of system activity

Why Asynchronous?
- Low Power
  • data-driven: power is only used to do useful work
  • zero power when idle, with instant restart
- Low EMI
  • in a clocked circuit, all noise is correlated
  • async circuits have “distributed” switching activity, leading to uncorrelated EMI

Why Asynchronous?
- No clock distribution problems
- Composability/Modularity
  • facilitates IP reuse
- Average Case Performance
  • exploits the fact that the worst case often occurs infrequently

Timing Models
- Delay Insensitive (DI)
  • delays in circuits & wires are arbitrary
- Quasi-Delay Insensitive (QDI)
  • similar to DI, but assuming isochronic forks
- Speed Independent (SI)
  • wires have no delays; gate delays are arbitrary
- Bounded Delay
  • single-sided timing constraints

Asynchronous Design Paradigms
- AFSMs - for fast controllers etc.
  • traditionally hard
    – hazards, races, state assignment problems
  • research has led to new techniques
    – STG/Petri net based SI circuits
    – Burst-Mode circuits
- Macromodule-like for larger systems
  • micropipeline approach, handshake circuits

Asynchronous Control
- With no clock, some other means is required to co-ordinate control flow
- Use a request/acknowledge handshake
[Figure labels: Sender, Req, Ack]

Signalling Protocols
- req & ack are abstractions:
  • layer a signalling protocol on top of them
- Two common protocols
  • 2-phase (transition signalling, NRZ)
  • 4-phase (Return-to-Zero signalling)

Data Validity Models
- Self Timed
  • the validity of the data is encoded within the data itself
    – redundant coding
  • e.g. Dual Rail: each data bit requires two wires: 00 -> no data, 01 -> ‘0’, 10 -> ‘1’
- Bundled Data approach
  • conventional datapath
  • validity is assured by imposing timing constraints

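The dual-rail code on this slide can be illustrated with a minimal Python sketch. The slide does not say which wire of each pair comes first, so the pair order used here, along with the function names `encode`, `is_valid` and `decode`, are assumptions made only for illustration.

```python
# Dual-rail codes as listed on the slide: two wires per data bit.
NO_DATA = (0, 0)   # spacer: no data present
ZERO = (0, 1)      # logical '0'
ONE = (1, 0)       # logical '1'

def encode(value, width):
    """Encode an integer as dual-rail pairs, least significant bit first."""
    return [ONE if (value >> i) & 1 else ZERO for i in range(width)]

def is_valid(word):
    """A word is valid once every bit carries a '0' or '1' code."""
    return all(pair in (ZERO, ONE) for pair in word)

def decode(word):
    """Decode a valid dual-rail word back to an integer."""
    if not is_valid(word):
        raise ValueError("word is incomplete or uses the unused code 11")
    return sum(1 << i for i, pair in enumerate(word) if pair == ONE)
```

Because validity is carried by the wires themselves, a receiver can call something like `is_valid` instead of relying on a timing assumption, which is the contrast with the bundled-data approach.
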
2-phase Protocol
- Events are transitions
[Figure: timing diagram; labels: Req, Ack, valid, 1 transaction]

4-phase Protocol
- Signals are returned to their initial state after each transaction
  • several possible interleavings of the signal transitions

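A small Python sketch can make the difference in wire activity concrete. It lists the req/ack transitions for a number of transactions under each protocol. The 4-phase ordering used here (req up, ack up, req down, ack down) is only one of the possible interleavings the slide mentions, and the function names are hypothetical.

```python
def four_phase(n):
    """Wire transitions for n transactions with return-to-zero signalling."""
    events = []
    for _ in range(n):
        events += ["req+", "ack+", "req-", "ack-"]
    return events

def two_phase(n):
    """Wire transitions for n transactions with transition signalling.

    Every transition is an event, so the wires are not returned to zero."""
    events, level = [], 0
    for _ in range(n):
        level ^= 1
        sign = "+" if level else "-"
        events += ["req" + sign, "ack" + sign]
    return events
```

In this model `two_phase(2)` and `four_phase(1)` produce the same four transitions: the 2-phase protocol completes two transactions in the wire activity that 4-phase spends on one.
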
Comparison of Approaches
- 2-phase/4-phase
  • 2-phase is conceptually simpler (once an event mind-set is adopted)
  • 2-phase circuits are slower & more complex
  • think 2-phase, build 4-phase
- Bundled-Data/Dual-rail
  • current orthodoxy: bundled data is faster, lower power and smaller in area, with a tolerancing task no worse than for a clocked design

Current Approach
- QDI control
- Bounded-Delay (bundled-data) datapath
- 4-phase signalling
- Amulet3i

Asynchronous HDLs
- Conventional programming languages lack 3 necessary constructs:
  • communication
  • parallelism/concurrency
  • sharing (of hardware)
- Conventional HDLs lack adequate
  • fine-grain concurrency
  • channel-based communication primitives

Asynchronous HDLs – 2
- Tangram, Balsa
  • CSP based + data types + …
  • based on underlying formal semantics
    – guarantees correct composition rules
    – easier composition than in sync circuits???
  • transparent compilation
    – each production rule in the language translates to an intermediate handshake circuit
    – allows the designer to infer circuit costs & performance from the program

Handshake Circuits - 1
- Circuits communicate along channels
- Channels connect ports at the circuit interface
- Ports have:
  • Type
  • Direction
  • Sense

Handshake Circuits - 2
- Port type determines the number of data wires
  • no data wires == control-only port!
- Port direction is input, output or control only
- Port sense
  • Active: initiates transfers
  • Passive: responds to requests

Micropipeline-Style Circuits: Push Circuits
[Figure: circuit (cct) with req, ack and data wires on a passive input and on an active output; animation steps:]
 1. Circuit waits for data
 2. Data arrives
 3. Data validity signalled
 4. Circuit accepts data
 5. Circuit signals data taken
 6. Circuit outputs data
 7. Circuit signals validity
 8. Receiver takes data

Micropipeline-Style Circuits
- 4-phase protocol not detailed
- Previous circuit decoupled input and output
  • implies a latch inside the handshake circuit
- An alternative is for the input handshake to enclose the output handshake

Enclosed Handshake: Push Circuits
[Figure: circuit (cct) with req, ack and data wires on input and output; animation steps:]
 1. Data arrives
 2. Data validity signalled
 3. Circuit accepts data
 4. Circuit outputs data
 5. Circuit signals validity
 6. Receiver takes data
 7. Input handshake completes: no latch required

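The difference between a decoupled and an enclosed handshake comes down to event ordering, which the following Python sketch records in a log. Requests and acknowledges are treated as single abstract events (the 4-phase detail is left out, as on the slides). The names `decoupled_stage` and `enclosed_stage` are hypothetical.

```python
def decoupled_stage(value, log):
    """Input handshake completes first, so the value must be held in a latch."""
    log.append("in.req")
    latch = value            # circuit accepts the data into its own storage
    log.append("in.ack")     # sender is released before the output starts
    log.append("out.req")
    log.append("out.ack")
    return latch

def enclosed_stage(value, log):
    """Output handshake happens inside the input handshake.

    No latch is needed because the sender holds the data until in.ack."""
    log.append("in.req")
    log.append("out.req")    # data is passed straight through
    log.append("out.ack")
    log.append("in.ack")     # sender is released only now
    return value
```

In the enclosed version `in.ack` is always the last event, which is why the stage needs no storage of its own. The price is that the sender stays blocked for the whole output handshake.
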
Tangram-Style Circuits: Pull Circuits
- Active-ported, control-driven circuits
[Figure: circuit (cct) with req, ack and data wires on an active input port; animation steps:]
 1. Circuit demands data
 2. Data is sent on demand
 3. Data is accepted and can then be released

Balsa
- Language for synthesising large async circuits & systems
- CSP/OCCAM background
- Tangram-like
  • based on Tangram compilation function
  • compiles to a small (but expanding) set of handshake circuits
  • origins: ESPRIT EXACT project

Balsa Language Features
- Data types based on sequence of bits
  • arrays and records are bit-based
  • element extraction is by array slicing
  • strict data typing
- Structural iteration
- Arrayed channels
- Parameterised & recursive functions

Balsa Language Features
- Enclosed selection semantics
  • allows passive ported circuits
  • allows push (micropipeline-style) circuits
  • allows unbuffered (latch-free) circuits
  • can be considered a restricted form of Burns’ probe construct

Balsa Source

Example: Single Place Buffer
    import [balsa.types.basic]      -- library mechanism
    public                          -- visibility
    type word is 16 bits            -- type declaration
    -- procedure definition; the ports are channel declarations
    procedure buffer (input i : word; output o : word) is
      local variable x : word       -- local variable implies latch
    begin
      loop                          -- repeat forever
        i -> x;     -- Input communication: read input channel into local variable x
                    -- ";" is the sequential operation
        o <- x      -- Output communication: output local variable x to output channel
      end
    end

Buffer Handshake Circuit: Single-place buffer
[Figure: handshake circuit; labels: activation channel, repeater (#), sequencer (;), transferrer (T), variable (x), channels i and o; animation steps:]
 1. Repeater is activated
 2. Sequencer handshakes to left transferrer
 3. Transferrer requests data from environment
 4. Data transferred to variable x
 5. Variable handshake completes
 6. Transferrer handshake completes to environment
 7. Transferrer handshake completes
 8. Sequencer handshakes to right transferrer
 9. Transferrer reads variable
10. Transferrer outputs to environment
11. Handshakes complete
12. Sequencer completes its input handshake
13. Repeater initiates another transfer, etc.

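The walk-through above can be mimicked in Python by treating each handshake component as a callable: calling it models the request, and its return models the acknowledge, so nested calls behave like enclosed handshakes. This is only an illustrative model, not how Balsa implements these components. All names are hypothetical, and the repeater is bounded by a `times` argument so that the sketch terminates.

```python
class Variable:
    """Storage element with a write port and a read port."""
    def __init__(self):
        self.value = None

    def write(self, value):
        self.value = value

    def read(self):
        return self.value

def transferrer(pull, push):
    """When activated, pull a value from one channel and push it to another."""
    return lambda: push(pull())

def sequencer(*actions):
    """When activated, handshake on each output in turn, left to right."""
    def activate():
        for action in actions:
            action()
    return activate

def repeater(action, times):
    """A real repeater loops forever; `times` bounds this sketch."""
    for _ in range(times):
        action()

def run_buffer(inputs):
    source = iter(inputs)   # environment driving input channel i
    sink = []               # environment consuming output channel o
    x = Variable()
    body = sequencer(
        transferrer(lambda: next(source), x.write),  # i -> x
        transferrer(x.read, sink.append),            # o <- x
    )
    repeater(body, len(inputs))
    return sink
```

Calling `run_buffer([3, 1, 4])` pushes each value through the variable in turn and returns them in the same order.
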
Example: Single Place Buffer
    import [balsa.types.basic]
    public
    type word is 16 bits
    procedure buffer (input i : word; output o : word) is
      local variable x : word
    begin
      loop
        i -> x;     -- Input communication
        o <- x      -- Output communication
      end
    end

Example: 2-place buffer
    import [balsa.types.basic]
    import [buffer1a]             -- reuse component
    public
    type word is 16 bits
    procedure buffer2c (input i : word; output o : word) is
      local channel c : word      -- internal channel connects two 1-place buffers
    begin
      buffer (i, c) ||            -- parallel composition
      buffer (c, o)               -- buffers connected by common signal name
    end

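A rough software analogue of this parallel composition is two threads joined by a queue. The Python sketch below is an approximation only. A `queue.Queue(maxsize=1)` stores one item, whereas a handshake channel stores nothing. The `DONE` marker and the feeder thread are hypothetical additions that let the example terminate.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker so the sketch terminates

def buffer(inp, out):
    """One-place buffer: loop i -> x; o <- x end."""
    while True:
        x = inp.get()        # i -> x
        out.put(x)           # o <- x
        if x is DONE:
            return

def buffer2(values):
    """Two one-place buffers in parallel, joined by internal channel c."""
    i, c, o = (queue.Queue(maxsize=1) for _ in range(3))
    items = list(values) + [DONE]
    threads = [
        threading.Thread(target=buffer, args=(i, c)),
        threading.Thread(target=buffer, args=(c, o)),
        threading.Thread(target=lambda: [i.put(v) for v in items]),  # environment
    ]
    for t in threads:
        t.start()
    received = list(iter(o.get, DONE))  # read channel o until the marker
    for t in threads:
        t.join()
    return received
```

As in the Balsa source, the two `buffer` instances know nothing about each other. They are connected only by sharing the channel `c`.
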
2-place Buffer Handshake Circuit
[Figure: handshake circuit; labels: par component, passivator, B, •, x, channels i, c and o]

2-place Buffer Handshake Circuit
[Figure: the same circuit with the buffers expanded; labels: par component, passivator, #, ;, •, T, x, channels i, c and o]

Peephole Optimisation
- Composition of handshake circuits leads to inefficiencies at circuit boundaries
- Straightforward peephole optimizations

Optimized 2-place Buffer Circuit
[Figure: optimised handshake circuit; labels: #, ;, •, T, x, i, control-only]

The Repeater
- “Formal” Definition
  $\mathrm{REP}(a^{\circ}, b^{\bullet}) = (a^{\circ} : \#[b^{\bullet}])$
  • $\bullet$ denotes active port
  • $\circ$ denotes passive port
  • : denotes handshake enclosure
  • # denotes repeat

The Repeater
- “Formal” Definition
  $\mathrm{REP}(a^{\circ}, b^{\bullet}) = (a^{\circ} : \#[b^{\bullet}]) = (a^{\circ}{\uparrow} : \#[b^{\bullet}{\uparrow} ; b^{\bullet}{\downarrow}]) = (a_r{\uparrow} : \#[b_r{\uparrow} ; b_a{\uparrow} ; b_r{\downarrow} ; b_a{\downarrow}])$
[Figure: repeater symbol; wire labels: aa, ar, br, ba]

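Assuming the expansion describes 4-phase signalling, with each br/ba pair first rising and then falling, the wire-level trace of the repeater can be sketched as a Python generator. The `+`/`-` suffixes for rising and falling transitions and the function name are conventions chosen for this example.

```python
from itertools import islice

def repeater_trace():
    """Yield the repeater's wire transitions: activation, then b handshakes forever."""
    yield "ar+"                      # activation request arrives on port a
    while True:
        yield from ("br+", "ba+", "br-", "ba-")   # one full 4-phase handshake on b

first_events = list(islice(repeater_trace(), 9))
```

No transition on aa ever appears in the trace: the activation handshake encloses an infinite repetition, so the repeater never acknowledges its activation.
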
The Transferrer
- Several Implementations
  • simplest
    – wire-only
[Figure: wire-only transferrer; labels: ar, aa, br, ba, cr, ca, data[n]]

Balsa Toolkit - 1
- balsa-c
  • the compiler for the language
- breeze2dot
  • produces a PostScript plot of the generated handshake circuits
- breezecost
  • reports the cost of the compiled circuit in arbitrary units

Balsa Toolkit - 2
- breeze2lard
  • the interface to the LARD simulation environment
    – Balsa source is translated to LARD
    – a simple test harness is generated
- balsa-md
  • an automatic makefile generation facility
- balsa-mgr
  • a GUI project manager

Mod-16 Counter (all even)

Bundled-Datapaths
- Problems
  • random standard cell layout
    – mixed control + datapath
  • timing analysis required
  • robustness of design reduced
- Possible Solutions
  • DI codes
  • hybrid bundled + DI
  • simpler timing analysis

DI Codes
- Dual Rail (used in 1st Tangram system)
  • can use standard cell approach without timing analysis
    – no need to distinguish between control & data
  • abandoned in favour of bundled-data
    – area cost in extra wires
    – area & time cost in completion detection
  • Tangram/Balsa generates push-pull pipelines with expensive synchronization

Generic Pipeline
- Passivators join compiled procedures
[Figure: pipeline; labels: compiled procedure, passivator, B, •, channels i, c and o]

Passivator Implementation
- Bundled Data
  [Figure labels: ar, br, C, aa, ba, data[n]]
- Dual Rail
  [Figure labels: n-wide C-gate, d0, d1, ..., dn-1 (n bits wide), aa, br, C, C, ba]

DI Code Synchronizations
- Expensive
  • need C-element synchronisation tree
- A partial solution (not always possible/desirable) is:
  • transform to push-style datapath
    – (not possible in Tangram, only Balsa)

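To see why dual-rail synchronisation costs a C-element tree, here is a minimal Python model of completion detection. For brevity it uses a single n-input C-element rather than a tree of small ones. The class and function names are hypothetical, and each bit is assumed to be a pair of rails that is valid when either rail is high.

```python
class CElement:
    """Muller C-element: the output follows the inputs when they all agree,
    otherwise it holds its previous value."""
    def __init__(self):
        self.out = 0

    def update(self, *inputs):
        if all(inputs):
            self.out = 1
        elif not any(inputs):
            self.out = 0
        return self.out

def completion(word, c):
    """word: list of rail pairs; returns the C-element output after this input."""
    return c.update(*[r1 | r0 for r1, r0 in word])
```

The output rises only after every bit has become valid and falls only after every bit has returned to the no-data state. Every bit of the word therefore feeds the detector, which is where the area and time cost comes from.
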
Push Pipeline
[Figure: push pipeline; labels: passive input port, B, •, connector (wires-only), channels i, c and o]

Hybrid Solutions
- Use DI coding within bundled datapath framework
  • e.g. use dual-rail carry signals within a conventional adder
    – early completion easily detected
- Average-case performance
- Only applicable to a few datapath operations

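One way to see why dual-rail carries give average-case performance: in a ripple-carry adder, a bit position whose two operand bits are equal fixes its carry-out locally, while a position whose bits differ must wait for the incoming carry. The longest run of differing positions therefore bounds how far any carry has to ripple. The Python sketch below computes that run. The function name is hypothetical, and this is a reasoning aid rather than a model of any particular adder circuit.

```python
def longest_propagate_run(a, b, width):
    """Length of the longest run of bit positions where a and b differ."""
    propagate = (a ^ b) & ((1 << width) - 1)
    longest = run = 0
    for i in range(width):
        if (propagate >> i) & 1:
            run += 1                 # carry-out here depends on carry-in
            longest = max(longest, run)
        else:
            run = 0                  # carry-out is decided locally
    return longest
```

For 4-bit operands, `longest_propagate_run(0b0101, 0b0101, 4)` is 0, while `longest_propagate_run(0b1111, 0b0001, 4)` is 3. A design that detects completion from the carries finishes early in the first case instead of always waiting for the worst-case delay.
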
Simpler Timing Analysis
- Separate control and datapath
  • generate regular, compiled datapath
    – area improvement over standard cell (because of regular layout)
    – generate matched delay paths (cf. self-timed PLAs)
  • must be able to recognize datapath
    – difficult: control often contains datapath-like elements
    – e.g. start at variables and work backwards...

Datapath meets Control
- Example: Balsa case statement
[Figure: case component; labels: data “n” bits wide, true/complement lines (dual-rail expansion), 1-hot encoding]

Case Component
- input from datapath
  • dual-rail simplifies internal logic
- expansions parameterisable
- “encode” component is similar
  • opposite of case with true/false expansion

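The idea of expanding an n-bit value into true/complement lines and decoding it to a 1-hot selection can be sketched in Python as follows. This is a functional illustration only; the function names are hypothetical and no timing or handshaking is modelled.

```python
def dual_rail_expand(value, n):
    """Return (true, complement) line pairs for an n-bit value, LSB first."""
    return [((value >> i) & 1, 1 - ((value >> i) & 1)) for i in range(n)]

def case_select(value, n):
    """One-hot select lines: line k is the AND of the true or complement
    line of each bit, chosen according to the bits of k."""
    rails = dual_rail_expand(value, n)
    return [
        int(all(rails[i][0] if (k >> i) & 1 else rails[i][1] for i in range(n)))
        for k in range(2 ** n)
    ]
```

With both polarities of every bit available, each select line is a plain AND with no inverters, which is one reading of "dual-rail simplifies internal logic". For example, `case_select(2, 2)` gives `[0, 0, 1, 0]`.
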
Simpler Timing Analysis
- Tool support required
  • use existing (non-Balsa) tools if possible
  • automatically add matched paths/delays to synthesised datapaths
- Design own cells where appropriate
  • e.g. hybrid stages

Future Work
- Provide support for DI, hybrid and datapath-compiled datapaths
  • even with datapath compilation, some datapath would still be standard cell
    – e.g. instruction decoder (control heavy)
    – datapath in control
  • cost of connecting separate blocks in layout
- Test Design required (datapath heavy)

Tool Enhancement
- balsa-c
  • support for attribution to select compilation mechanisms/optimisation schemes
- breeze2lard
  • new models
- balsa-netlist
  • new tech-mapping descriptions
  • interface to datapath compilers

AMULET3i
- Asynchronous macrocell
  • ARM compatible processor core
  • Full custom RAM
  • Compiled ROM
  • Balsa compiled DMA controller
  • Test I/F, synchronous and off-chip bus bridges
- Synchronous peripherals
  • Designed by commercial partner...

AMULET3 System
[Figure: block diagram; labels: CPU / RAM, ROM, Sync bridge, MARBLE, Periph 1, SOCB, DMAC]

DMA Local RAM Access
[Figure: same block diagram]

DMA Peripheral Accesses
[Figure: same block diagram, with additional label: DMA requests]

Requirements / Specification
- 16 clients, 32 channels
- 3 channel types - complicated register structure
- Programmable client-to-channel (1-to-many) mapping
- Support synchronous requests
- Transfers mostly between synchronous clients

Controller Structure

Two Controller Descriptions
- Sequential (previous slides)
  • very simple control flow
  • requires two passes through register bank
  • slow! Only memory decoupling helps
- Parallel (next slides)
  • decouple TE actions from memory R/W with a new unit: Transfer Interface
  • interrupt the register bank on end of transfer

“Parallel” Design

The Design
- 919 lines of Balsa describing register bank control, TE and TI
- Custom register banks and Synchronous Peripheral Interface
- Miscellaneous glue standard cells
  • Register bank controllers
  • MARBLE interfaces
- Compass Design Automation CAD

Implementation Technology
- 0.35 µm, 3LM CMOS
- Standard cells from ARM Ltd.
- Locally designed complex gates and asynchronous elements/gates
- Automated standard cell P&R
- Only “essential” and simple gate-level optimisation (by hand)

Design Partitioning
- Marble BUS: outside of DMA controller
- Balsa synthesised standard cells
- Custom “regular” layout
- Hand designed standard cells

DMA Controller Floor-Plan