- Number of slides: 57
Ian G. Clark IGClark@iee.org Thanks for the invite! http://Ian.GClark.net/
Talk Layout The Whole Group Open Problems Heterogeneous Metastability MOVIE BESST Async. Comms STELLA COHERENT COMFORT
Formal Techniques The Whole Group Heterogeneous Distributed & Concurrent Systems Async. Comms. Real-Time Networks Verification Fault Tolerance & Reliability Async. Design & Test BIST Controllers Metastability Models Synthesis CAD Direct Mapping PN STG HDL Software tools
MOVIE - “Model Visualisation for Asynchronous Circuit Design” The project addresses the development of theoretical models and an associated set of algorithms and software tools for graphical representation and visualisation of highly complex asynchronous circuit behaviour. New tools will enable skilled designers to achieve greater quality and productivity, and greater confidence in their designs. A few slides from DATE’03 …
Visualisation and Resolution of Coding Conflicts in Asynchronous Circuit Design A. Madalinski, V. Khomenko, A. Bystrov and A. Yakovlev University of Newcastle upon Tyne MOVIE Project
Motivation • state coding is necessary for implementability • manual vs. automatic resolution of coding conflicts – automatic can produce sub-optimal solutions – manual is crucial for finding good (low-latency, compact & elegant) synthesis solutions • interactivity is good! • a conflict complementary set (i.e. {b+, a-, b-, a+}) is called a ‘core’ • select cores and insert a signal to break the conflict.
Core selection: Height map [Figure: core map and height map, with inserted transition csc1+]
Signal insertion: an example [Figure: core map with Phase 1 and Phase 2, transitions csc1+ and csc1-] Part of the solving process: 888 CSC conflicts – 4 cores
BEhavioural Synthesis of Systems with heterogeneous Timing (BESST) supported by EPSRC at Newcastle University (project GR/R16754) Aim: The overall strategic goal of the project is generic methods and an associated set of software tools for synthesis of systems with heterogeneous timing, primarily focused on self-timed controllers and interfaces. Prof. Alex Yakovlev, Dr. Albert Koelmans, Dr. Frank Burns, and Mr. Delong Shang
Design Flow
System Synthesis Method • A new method has been proposed. It is not a syntax-directed translation. It semantically translates a system specification from a high level to an intermediate format, LPNs (Labelled Petri Nets) and CPNs (Coloured Petri Nets), and then directly maps the LPNs and CPNs to an SI (Speed Independent) circuit. • The method has been applied to some examples, such as a DMA controller.
What Has Been Done?
Current and Future Work • Current research is focused on optimization and scheduling, and will move on to system-level synthesis, for example partitioning and communication synthesis. • More complex examples are being studied. • Relative Timing (RT) techniques, among others, will be introduced to improve performance.
STELLA: Synthesis and Testing of Low-Latency Asynchronous Circuits Prof. A. Yakovlev (PI) Dr. A. Bystrov Prof. D. Kinniment Dr. A. Koelmans Dr. G. Russell Jan. 2003 -- Dec. 2005
Aims and Objectives • Develop the detailed implementation architecture of a low-latency controller with techniques for automated decomposition, synthesis and timing analysis (see e.g. CS-TR-743, CS-TR-754 – from ‘http://www.cs.ncl.ac.uk/’). • Develop the main supporting structures for off-line testing, such as internal scanning, for a class of stuck-at, bridging and delay faults with minimum speed overheads (see e.g. CS-TR-746). • Develop the detailed architecture for a snooper for on-line testing of self-timed structures with minimum area and power consumption overheads. • Develop a demonstrator chip employing the testable low-latency methodology; the application area will be an on-chip communication adaptor.
Example of Low-Latency structure • Output precomputation: Explicit Context Signals (ECS) • Latency reduction: inputs connected to output flip-flops
Interfacing to standard CAD tools • Maximum reuse of industrial CAD tools • Providing alternative solutions to the parts of the standard design flow • Compilation of RTL specs and structural Verilog netlists into asynchronous designs • Reuse of test-related standard CAD tools Methods developed in the course of work will be implemented in software tools and interfaced to the industrial CAD toolkits (Cadence), acting as a performance and test oriented asynchronous design front-end.
COMFORT - "asynchronous COmmunication Mechanisms FOr Real-Time systems" Objectives • To study a range of asynchronous communication mechanisms (ACMs) that can be used in constructing (distributed and concurrent) systems with heterogeneous timing • To develop hardware implementations for ACMs, (including self-timed circuits) for potential use in Systems-On-a-Chip (SOCs) and embedded (miniature, low power and EMC) applications
COHERENT - "COmputational HEteRogEneously timed NeTworks" Objectives • Development of a parameterised library of ACMs • Formal synthesis of multi-slot ACM algorithms • Develop RTNoC architecture (HETS) • Develop RTNoC design flow: functional spec, design, simulation, analysis, prototyping, implementation and testing • Test RTNoCs on real examples of control or vision systems; comparison with existing (centrally clocked) solutions
Introduction and Background [Figure: The Timing Modes Spectrum, running from sampled data / discrete time to non-sampled / continuous time; labels: Single clock synchronous, Parallel, Multiple clock domains, GALS, Heterogeneous (HETS), Asynchronous (self-timed), Analogue] • Sequential and synchronous is easier. • An intermediate solution: GALS. • Transfer of knowledge from the existing methods to the new solutions.
Introduction and Background Benefits of Asynchronous processing… • Improved EMC - dependent on data being processed. • Lower power - energy only used when work is done. Example – A to D conversion.
Tool Support • MASCOT / Real-Time network tools (internal to BAe). • Metropolis (Cadence Labs at Berkeley +++ (http://www.gigascale.org/metropolis/)) • Moses (http://www.tik.ee.ethz.ch/~moses/). Component re-use • Off the shelf processors or IP cores - “best in class” • MASCOT designs can be compiled down on to different hardware platforms Implementation • ‘SoPC’ - System on Programmable Chip - defined as ‘any complex ASIC with at least one computing engine’ Pat Mead, Altera: from IEE SoC forum in Cambridge 2001 • NoC: Benini/De Micheli work
NoC – Network on Chip • Large existing knowledge base. • Philips ‘ethernet on chip’. • Current networks are synchronous – cannot handle non-synchronous cores – like self-timed. • Global chip communication – increased power consumption. • Good for non-deterministic data communication. • Side step the synchronization and global clock issues. • Not suitable for Real-Time applications.
Baseline: Architectural aspect • Real-time networks and MASCOT approach – from RSRE/Phillips(67), BAe/Simpson(86) – for software systems – high time heterogeneity but relatively low speed • Globally-Asynchronous-Locally-Synchronous (GALS) – Chapiro(84), Muttersbach(00), Ginosar(00) – for VLSI circuits – high speed but very limited time heterogeneity
Heterogeneously Timed Nets (hets) (based on MASCOT standard symbols) [Figure: network with elements A1, A2, A3, A4, C1, C2, C3]
Hets Time/event/data-driven Data processing elements (active) [Figure: network with elements A1, A2, A3, A4, C1, C2, C3]
Hets Data communication elements (passive) - ACMs [Figure: network with elements A1, A2, A3, A4, C1, C2, C3]
Asynchronous data communications Processes are single threads of execution. writer time domain reader time domain Level of asynchrony is defined by WRITE and READ rules
Classification of ACMs
Hugo Simpson’s classification:
• Signal (event data): destructive write (write cannot be held up), destructive read (read can be held up)
• Pool (reference data): destructive write (write cannot be held up), non-destructive read (read cannot be held up)
• Channel (message data): non-destructive write (write can be held up), destructive read (read can be held up)
• Constant (configuration data): non-destructive write (write can be held up), non-destructive read (read cannot be held up)
Other ACM classifications: e.g. L. Lamport, 1986 (safe, regular and atomic registers)
Difficulty with Simpson’s classification • Destructive/Non-destructive does not intuitively imply temporal, Wait/No-wait division: – Destructive write cannot wait – Destructive read can wait • There is symmetry between Pool and Channel but no symmetry between Signal and Constant
Petri net capture of Simpson’s protocols [Figure: Petri nets for Signal, Pool, Channel and Constant, with places ‘empty’ and ‘full’ and transitions for destructive and non-destructive write and read]
Our interpretation [Figure: Petri nets for Signal, Pool, Channel and Message/Command, with transitions write, over-write, read and re-read and a place ‘unread’] Constant is a special case of Command
Our interpretation [Figure: the same four nets arranged in a grid; Busy Writer: Signal, Pool; Lazy Writer: Channel, Message/Command; Lazy Reader: Signal, Channel; Busy Reader: Pool, Message/Command]
Our classification of ACMs
• Lazy read = read only previously unread data (read can be held up)
• Busy read = may re-read already read data (read cannot be held up)
• Busy write = may over-write unread data (write cannot be held up)
• Lazy write = write only if previous data has been read (write can be held up)
Resulting classes: BW-LR (Signal), BW-BR (Pool), LW-LR (Channel), LW-BR (Command)
Signal vs Pool [Figure: a Pool linking two real-time (busy) domains, Real time 1 and Real time 2; a Signal linking a real-time (busy) domain to a data-driven (lazy) domain] Low Power!
Sample algorithms
Pool – with 3 slots – fully asynchronous
Writer: wr: write slot n; w0: l := n; w1: n := ¬(l, r);
Reader: r0: r := l; rd: read slot r;
Signal – with 2 slots – conditionally asynchronous
Writer: wr: write slot w; w0: w := ¬r;
Reader: r0: r := ¬r; rd: wait until w ¬= r; read slot r;
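The three-slot Pool algorithm on this slide can be tried out in software. The following is a minimal Python sketch, assuming that slots are numbered 0 to 2, that ¬(l, r) means "any slot other than l and r", and that each labelled statement executes atomically. The class name, the initial values and the sequential test drive are illustrative assumptions, not taken from the slides.

```python
class ThreeSlotPool:
    """Sketch of the 3-slot Pool: l = last written, n = next to write, r = being read."""

    def __init__(self, initial=None):
        # Assumed initial state (not given on the slide): slot 0 holds the initial item.
        self.slots = [initial, None, None]
        self.l = 0
        self.r = 0
        self.n = 1

    def write(self, item):
        self.slots[self.n] = item            # wr: write slot n
        self.l = self.n                      # w0: l := n
        self.n = next(s for s in range(3)    # w1: n := ¬(l, r)
                      if s not in (self.l, self.r))

    def read(self):
        self.r = self.l                      # r0: r := l
        return self.slots[self.r]            # rd: read slot r


pool = ThreeSlotPool(initial="init")
pool.write("A")
pool.write("B")        # over-writing unread data: the writer is never held up
first = pool.read()    # the reader picks up the latest item
second = pool.read()   # re-reading: the reader is never held up
```

Because the writer always picks a slot different from both l and r, it never touches the slot the reader is using, which is why neither side has to wait.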
What is a slot? - Slot: Shared memory for one item of data. - Multiple slots: There is no temporal independence with only one slot. (There will always be situations when both processes clash in time on the one data slot.) - Capacity: Not to be confused with the number of slots. It takes a minimum of 3 slots to make a capacity 1 pool.
Data Properties
Coherence
Write: ‘07:57’; ‘07:58’; ‘07:59’; ‘08:00’; ‘08:01’; ‘08:02’; ‘08:03’;
Read: ‘07:57’; ‘07:59’; ‘07:00’; ‘08:02’;
Freshness
Write: ‘07:57’; ‘07:58’; ‘07:59’; ‘08:00’; ‘08:01’; ‘08:02’; ‘08:03’;
Read: ‘07:57’; ‘07:58’; ‘08:02’;
Sequence
Write: ‘07:57’; ‘07:58’; ‘07:59’; ‘08:00’; ‘08:01’; ‘08:02’; ‘08:03’;
Read: ‘07:57’; ‘07:59’; ‘07:58’; ‘08:02’;
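Two of these data properties can be checked mechanically on a write/read trace. The sketch below is a hedged illustration in Python: it assumes that coherence means "every item read is exactly one of the items written" and that sequence means "items are read in the order they were written". Reading the Coherence and Sequence rows of the slide as examples of violations is also an assumption. Freshness is not checked, since it depends on when each read happens relative to the writes, which the slide does not record. All function and variable names are hypothetical.

```python
WRITTEN = ["07:57", "07:58", "07:59", "08:00", "08:01", "08:02", "08:03"]


def is_coherent(written, read):
    """True if every item read is exactly one of the items written."""
    return all(item in written for item in read)


def is_in_sequence(written, read):
    """True if items are read in the order written (re-reads are allowed)."""
    positions = [written.index(item) for item in read if item in written]
    return all(a <= b for a, b in zip(positions, positions[1:]))


coherence_example = ["07:57", "07:59", "07:00", "08:02"]  # '07:00' was never written
freshness_example = ["07:57", "07:58", "08:02"]
sequence_example = ["07:57", "07:59", "07:58", "08:02"]   # '07:58' read after '07:59'
```

Under these assumed definitions, the value ‘07:00’ looks like a mixture of ‘07:59’ and ‘08:00’, which is what a reader could see if it accessed a slot while the writer was half-way through updating it.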
SIGNAL: Data latency If a reader cycle immediately follows a writer cycle what data does it get? Write X post Does the reader read X?
SIGNAL: Data latency
Writer: write slot w; w := not r;
Reader: r := not r; wait until w ¬= r; read slot r;
Trace (initially w=0, r=0):
• Writer: write slot 0 (Write X), then w := not r = 1
• Reader: r := not r = 1; w == r, therefore made to wait
SIGNAL: Data latency
Writer: write slot w; w := not r;
Reader: r := not r; wait until w ¬= r; read slot r;
Trace (initially w=0, r=0):
• Writer: write slot 0 (Write X), then w := not r = 1
• Reader: r := not r = 1; w == r, therefore made to wait
• Writer: write slot 1 (Write Y), then w := not r = 0
• Reader: w ¬= r, so Read
This implies 0 capacity.
Trade-off between slots, capacity and latency. A 3 slot signal has capacity 1, and does not make the reader wait as here.
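The trace on this slide can be replayed step by step. The following Python sketch assumes both control variables start at 0 and that each statement is atomic; the class and method names are hypothetical, and the reader's wait is modelled as a test (`reader_may_read`) rather than as a blocking call.

```python
class TwoSlotSignal:
    """Sketch of the 2-slot Signal with control variables w and r."""

    def __init__(self):
        self.slots = [None, None]
        self.w = 0  # slot the writer will write next
        self.r = 0  # slot the reader wants to read

    def write(self, item):
        self.slots[self.w] = item   # write slot w
        self.w = 1 - self.r         # w := not r

    def advance_reader(self):
        self.r = 1 - self.r         # r := not r

    def reader_may_read(self):
        return self.w != self.r     # wait until w ¬= r

    def read(self):
        if not self.reader_may_read():
            raise RuntimeError("reader must wait: w == r")
        return self.slots[self.r]   # read slot r


signal = TwoSlotSignal()
signal.write("X")                        # slot 0 := X, w becomes 1
signal.advance_reader()                  # r becomes 1
blocked = not signal.reader_may_read()   # w == r, so the reader has to wait
signal.write("Y")                        # slot 1 := Y, w becomes 0
item = signal.read()                     # the reader is released and reads slot 1
```

In this interleaving the reader receives Y, and X is left in slot 0 without ever being delivered, which is the 0-capacity behaviour the slide points out.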
Modeling the algorithms Example statement: “w := not r;” [Figure: subnet W0 in the Signal, with places start, finish, r=0, r=1, w=0, w=1] Non-abstract models for ease of understanding. This is atomic – some statements need to be 2 stage
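To make the idea of a statement-level Petri net concrete, here is a small token-game sketch in Python for “w := not r;”. It is an assumed reconstruction and not the authors' model: it uses the place names visible on the slide (start, finish, r=0, r=1, w=0, w=1) and one transition per combination of the current values of r and w, each firing atomically. The dictionary name and the `fire` helper are hypothetical.

```python
# Each transition maps to (places consumed, places produced).
# The r place is consumed and put back, so the statement only tests r.
W0_SUBNET = {
    "r=1, w=1 -> w=0": ({"start", "r=1", "w=1"}, {"finish", "r=1", "w=0"}),
    "r=1, w=0 stays":  ({"start", "r=1", "w=0"}, {"finish", "r=1", "w=0"}),
    "r=0, w=0 -> w=1": ({"start", "r=0", "w=0"}, {"finish", "r=0", "w=1"}),
    "r=0, w=1 stays":  ({"start", "r=0", "w=1"}, {"finish", "r=0", "w=1"}),
}


def fire(marking):
    """Fire the single enabled transition of the subnet and return the new marking."""
    enabled = [name for name, (pre, _) in W0_SUBNET.items() if pre <= marking]
    if len(enabled) != 1:
        raise ValueError(f"expected exactly one enabled transition, got {enabled}")
    pre, post = W0_SUBNET[enabled[0]]
    return (marking - pre) | post


after = fire({"start", "r=1", "w=1"})
```

A marking here is simply the set of places holding a token, which is sufficient because every place in this sketch holds at most one token.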
Modeling the algorithms w=0/1 write subnet r=0/1 R 0 subnet W 0 subnet setting referencing read subnet Slot_0/1 read/unread
Sub-models and the ‘enable’ place [Figure: ‘fresh and valid’ sub-model between ‘write post’ and ‘write end’] The written slot is set to fresh and valid; the other slot is set to not fresh. This should appear as an atomic action to the other process.
Sub-models and the ‘enable’ place write end testing sub-model enable part of the reader model
Metastability Active clock edge time
a normal state-transition
Metastability Active clock edge time Output Propagation delay Every flip-flop has at least three equilibrium points, two stable and one unstable. Input Set-up time
Metastable transients
Metastability Active clock edge time 0 Output Propagation delay M Input Set-up time Keep away from data path! 1
Analysis and Some Results Exhaustive ‘reachability’ search – all process interleavings covered. 3 slot pool Control {1, 2, 3} Arbiter req. 4 slot pool Control {0, 1} No arbiter Capacity 1+delay Capacity 1 2 slot signal Control {0, 1} No arbiter Capacity 0~1 3 slot signal Control {1, 2, 3} No arbiter Capacity 1
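An exhaustive reachability search of this kind can be illustrated on the 2-slot Signal. The Python sketch below is a simplified stand-in for the Petri net analysis, not the tool used in the project: it assumes each statement is atomic, represents a state as (writer position, reader position, w, r), and explores every interleaving breadth-first. A "clash" is taken to mean that the writer is writing and the reader is reading the same slot at the same time. All names are hypothetical.

```python
from collections import deque


def successors(state):
    """Yield every state reachable in one writer step or one reader step."""
    wpc, rpc, w, r = state
    # Writer: 'wr' = currently writing slot w, 'w0' = about to do w := not r.
    if wpc == "wr":
        yield ("w0", rpc, w, r)
    else:
        yield ("wr", rpc, 1 - r, r)
    # Reader: 'r0' = about to do r := not r, 'wait' = waiting for w != r,
    # 'rd' = currently reading slot r.
    if rpc == "r0":
        yield (wpc, "wait", w, 1 - r)
    elif rpc == "wait":
        if w != r:
            yield (wpc, "rd", w, r)
    else:
        yield (wpc, "r0", w, r)


def reachable(initial):
    """Breadth-first search over all interleavings from the initial state."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def clashes(state):
    wpc, rpc, w, r = state
    return wpc == "wr" and rpc == "rd" and w == r


states = reachable(("wr", "r0", 0, 0))
clash_states = [s for s in states if clashes(s)]
```

Under these assumptions the search finds no clash state, because the reader only enters its read step when w differs from r and the writer can only ever set w to the opposite of r.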
VLSI design layout (chip fab’ed in June 2000 via EUROPRACTICE) 4-slot Pool ACM
4-slot ACM part (details on testing in 9th Async UK Forum paper)
Applications Distributed CCTV • Advisor EU Project. Control systems • Broom balancer. Sensor networks • Condition based maintenance In car network • simple RC oscillator – vast clock range with temp.
Conclusion The Whole Group Open Problems Heterogeneous COHERENT MOVIE BESST Metastability STELLA Async. Comms. COMFORT
Open questions • Analysis of dynamic systems with ACMs in. • Testing intermittent faults, online-testing (e.g. cross talk). • Folding of Petri Nets • Synthesis from partial orders.
Acknowledgements More info on team and projects Leader: Alex Yakovlev. Academics: Graeme Chester, Tony Davies, David Kinniment, Albert Koelmans, Maciej Koutny, Gordon Russell, Sergio Velastin. Collaborators: Eric Campbell, Hugo Simpson, +++. Researchers: Frank Burns, Alex Bystrov, David Fraser, Marta Pietkiewicz-Koutny, Delong Shang, Fei Xia. Students: Fei Hao, Victor Khomenko, Agnes Madalinski, Danil Sokolov, Maria Valera, +++.