  • Number of slides: 31

A Channel-Based Asynchronous Low Power High-Performance Standard Cell-Based Sequential Decoder Implemented with QDI Templates
Recep Ö. Özdağ & Peter A. Beerel
University of Southern California

Motivation and Approach
Our Goal: Close to Full-Custom Performance with ASIC Design Times
USC Asynchronous Group

Channel Based Asynchronous Design
[Figure: a synchronous system (clock, data) next to an asynchronous system in which a sender and a receiver are connected by an asynchronous channel with data and Ack wires]
Dual-Rail Channel
• Two wires per data bit
• One acknowledgment wire
• Generalizes to 1-of-N coding
• Advantage: delay-insensitive communication
Synchronization and communication between blocks are implemented with handshaking over asynchronous channels by sending and receiving “data tokens”.
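As a minimal sketch of the encoding listed on this slide, the following Python models a 1-of-N code word as a tuple of wires, with dual-rail as the N = 2 case. The function names, the convention that wire index i carries value i, and the all-low "neutral" (no token) state are assumptions made for illustration; the slide does not specify them.

```python
from typing import List, Optional, Sequence, Tuple


def encode_1_of_n(value: int, n: int = 2) -> Tuple[int, ...]:
    """Return a 1-of-N code word: exactly one of the n wires is high."""
    if not 0 <= value < n:
        raise ValueError("value must be in range(n)")
    return tuple(1 if i == value else 0 for i in range(n))


def decode_1_of_n(wires: Sequence[int]) -> Optional[int]:
    """Return the encoded value, or None when no data token is present."""
    high = sum(wires)
    if high == 0:
        return None  # assumed neutral state: all wires low
    if high > 1:
        raise ValueError("illegal code word: more than one wire high")
    return list(wires).index(1)


def encode_word(bits: Sequence[int]) -> List[Tuple[int, ...]]:
    """Dual-rail encode a multi-bit word: two wires per data bit."""
    return [encode_1_of_n(b, 2) for b in bits]
```

Under these assumptions a 2-bit word occupies four data wires, and a receiver can tell from the wires alone whether a token has arrived, which is what lets the single acknowledgment wire complete the handshake without a clock.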

Channel-Based Design
[Figure: an ASIC block diagram (Main FSM, Reg A, Reg B, Reg C, Memory, Register Bank, Adder, Multiplier, Subtract/Divider, Adder/Mult.) decomposed into leaf cells (BN-1, BN-2, BN-3; FAN-1, FAN-2, FAN-3, FA0) connected by channels]
Netlist consists of leaf cells communicating along channels

Asynchronous Leaf Cells
[Figure: leaf cells (L) with input channels and output channels, shown as a linear pipeline, a conditional join, and a conditional split]

Template-Based Leaf-Cell Design
• Each pipeline style (QDI, timed…) has a different blueprint
• Create a library using a blueprint to implement the lowest level communicating blocks
[Figure: blueprint for a QDI N-input M-output pipeline stage, with leaf-cell instances (L) for a 2-input 1-output pipeline stage and a 1-input 2-output pipeline stage]
Generation of instances from templates is straightforward

Background: Caltech’s QDI Templates
[Figure: completion detector in which bit 0, bit 1, …, bit n each feed an OR gate and the OR outputs feed a C element that produces Done]
[Figure: function block built from an nMOS network with precharge control and evaluation control, between channels L and R]
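The completion detector labels on this slide (one OR per bit feeding a C element that produces Done) can be sketched behaviorally as follows. This is a hedged Python model, not the authors' circuit: the class names are hypothetical, and it assumes dual-rail bits and the usual Muller C-element behavior (output rises when all inputs are high, falls when all are low, and otherwise holds its previous value).

```python
from typing import Sequence, Tuple


class CElement:
    """Behavioral Muller C-element with a stored output."""

    def __init__(self) -> None:
        self.out = 0

    def update(self, inputs: Sequence[int]) -> int:
        if all(inputs):
            self.out = 1      # every input high: output rises
        elif not any(inputs):
            self.out = 0      # every input low: output falls
        return self.out       # mixed inputs: hold previous value


class CompletionDetector:
    """One OR per dual-rail bit, combined by a C-element into Done."""

    def __init__(self) -> None:
        self.c = CElement()

    def update(self, bits: Sequence[Tuple[int, int]]) -> int:
        # A bit is valid when either of its two rails is high.
        valid = [int(rail0 or rail1) for rail0, rail1 in bits]
        return self.c.update(valid)
```

In this model Done rises only once every bit carries data and falls only once every bit has returned to neutral, so a slow bit cannot be missed in either direction.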

PCHB Performance Analysis
Cycle time = $3\,t_{Eval} + 2\,t_{CD} + 2\,t_{c} + t_{prech}$
[Figure: diagram with nodes labeled L11, L21, L31, L12, L22, L32]
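To make the cycle-time expression concrete, here is a small Python sketch. It assumes the slide's formula reads 3 t_Eval + 2 t_CD + 2 t_c + t_prech, and that the terms stand for the evaluation, completion-detection, C-element, and precharge delays respectively; the function names are hypothetical, and any delay values passed in are the caller's own, not measurements from the presentation.

```python
def pchb_cycle_time(t_eval: float, t_cd: float, t_c: float, t_prech: float) -> float:
    """Cycle time under the assumed formula 3*t_eval + 2*t_cd + 2*t_c + t_prech."""
    return 3 * t_eval + 2 * t_cd + 2 * t_c + t_prech


def throughput_mhz(cycle_time_ns: float) -> float:
    """Convert a cycle time in nanoseconds to a throughput in MHz."""
    if cycle_time_ns <= 0:
        raise ValueError("cycle time must be positive")
    return 1000.0 / cycle_time_ns


def delay_share(t_eval: float, t_cd: float, t_c: float, t_prech: float) -> dict:
    """Fraction of the cycle attributable to each delay term."""
    total = pchb_cycle_time(t_eval, t_cd, t_c, t_prech)
    return {
        "eval": 3 * t_eval / total,
        "cd": 2 * t_cd / total,
        "c": 2 * t_c / total,
        "prech": t_prech / total,
    }
```

Because the evaluation delay is counted three times and the completion-detection delay twice, a breakdown like `delay_share` indicates where a template change would shorten the cycle the most.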

Outline

Background on Fano Algorithm
• Fano algorithm is a depth-first tree-search algorithm [Fano 64]
• Achieves good performance with a low average complexity
[Figure: example search of a code tree from the root. Branches carry branch bits (00, 01, 10, 11) and branch metrics (+3, -5, -10), annotated with "0 errors" and "1 error"; one branch direction is the estimate that a 1 was transmitted and the other that a 0 was transmitted. Total path metrics shown: +3, +1, -2, -5. Rows below the tree list the received branch bits, the decoded bits, and the decoded bit index (1, 2, 3, …).]
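The path metrics in the tree figure can be illustrated with a short Python sketch. It covers only the metric bookkeeping, not the full Fano search with its threshold adjustment. The mapping from bit errors to branch metric (0 errors: +3, 1 error: -5, 2 errors: -10) is an assumption read off the figure labels, and the function names and sample branch bits are hypothetical.

```python
from typing import Sequence

# Assumed branch metric per number of bit errors in a 2-bit branch.
BRANCH_METRIC = {0: 3, 1: -5, 2: -10}


def bit_errors(received: str, estimated: str) -> int:
    """Hamming distance between received and estimated branch bits."""
    if len(received) != len(estimated):
        raise ValueError("branch bit strings must have equal length")
    return sum(r != e for r, e in zip(received, estimated))


def branch_metric(received: str, estimated: str) -> int:
    """Metric of a single branch under the assumed mapping."""
    return BRANCH_METRIC[bit_errors(received, estimated)]


def path_metric(received: Sequence[str], estimated: Sequence[str]) -> int:
    """Total path metric: the sum of branch metrics along the path."""
    return sum(branch_metric(r, e) for r, e in zip(received, estimated))
```

For example, `path_metric(["10", "01", "11"], ["10", "01", "10"])` adds two error-free branches and one single-error branch, giving 3 + 3 - 5 = 1. The decoder follows a path while its total metric stays acceptable and moves back when it falls too low.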

The Synchronous Architecture [Asilomar 99]
Critical path consists of 2 ALUs and 2 MUXes

Outline
• Introduction and Background
• The Asynchronous Fano Design
• The Back-End Asynchronous Design Flow
• Summary of Contributions

The Asynchronous Fano

The Asynchronous Architecture
[Figure: Skip-Ahead Unit block diagram with blocks XOR_SPLIT, ERROR-DETECT, FILTER, MERGE, XOR, FAST SHIFT REGISTER, and FAST DECISION REGISTER; signals To BMU, From BMU, noError, Comparison Result, Decision_bit, SkipAhead Decision, and BMU Decision. Received data is compared with the estimated branch bits.]
The critical path of the Skip Ahead Unit runs at 450 MHz (post layout)

The Memory Design

Fano: Error-Free Operation
[Figure: time markers at 17971 ns and 18449 ns]

Fano: Error Operation
[Figure: time markers at 17537 ns and 25361 ns, with annotations "Error Encountered" and "Move back"]

The Layout
[Figure: chip layout with labeled blocks: Received Memory, Branch Metric Calculator, Skip Ahead Unit, Counter, Threshold Adjust Unit, Lookup Table, Decision Memory]

Outline
• Introduction and Background
• The Asynchronous Fano Design
• The Back-End Asynchronous Design Flow
• Summary of Contributions

Physical Design Flow
Standard Flow Works

Cell Library Flow: Alternatives
• Used for the Fano Algorithm
• More suitable for designs with relaxed timing assumptions at the leaf cell level
• Used for the STFB based adder
• More suitable for designs with strict timing assumptions at the leaf cell level
Leaf cell level or gate level place and route

Cell Library Flow
Developed asynchronous gate library

Transistor Sizing
Create a number of subtypes for different strengths

Charge-Sharing Considerations
• Output inverters and staticizers are internal to all dynamic cells and form part of the known minimum load on the dynamic node (allowing a 10% dip in voltage)
• On each dynamic gate, extensive simulation confirms that the minimum load is sufficient to prevent charge-sharing problems
Output inverters and staticizers are encapsulated with the dynamic logic into a single gate
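The 10% voltage-dip budget can be related to capacitances with a first-order charge-conservation estimate. The Python below is only a back-of-the-envelope sketch, not the simulation-based check described on the slide. It assumes a dynamic node precharged to the supply voltage that shares charge with internal capacitance starting fully discharged, and the function names are hypothetical.

```python
def dynamic_node_dip(c_node: float, c_shared: float) -> float:
    """Fractional voltage dip on a precharged node after charge sharing.

    Charge conservation gives V_final = Vdd * c_node / (c_node + c_shared),
    so the dip relative to Vdd is c_shared / (c_node + c_shared).
    """
    if c_node <= 0 or c_shared < 0:
        raise ValueError("capacitances must be positive (c_shared may be 0)")
    return c_shared / (c_node + c_shared)


def min_node_capacitance(c_shared: float, max_dip: float = 0.10) -> float:
    """Smallest node load keeping the dip at or below max_dip.

    From c_shared / (c_node + c_shared) <= max_dip it follows that
    c_node >= c_shared * (1 - max_dip) / max_dip.
    """
    if not 0 < max_dip < 1:
        raise ValueError("max_dip must be between 0 and 1")
    return c_shared * (1 - max_dip) / max_dip
```

Under this simplified model a 10% budget requires the load on the dynamic node to be at least nine times the shared internal capacitance, which is one way to see why a guaranteed minimum load inside each cell helps.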

Netlist extraction
Verilog netlist (.v) for placement and routing
Verilog netlist of library gates is auto-generated

Placement, Routing and Extraction

Chip Assembly
• Stream-in blocks layout (from SE to Virtuoso)
• Block placement and routing
• DRC, LVS and netlist extraction (.sp)
• Post-layout simulation
Future Work:
• Static timing
• Automatic block placement and routing
• Synthesis

Summary

Thank You

Skip-Ahead Unit with RSPCHB
A 14% throughput improvement in the Skip-Ahead Unit using RSPCHB instead of PCHB
[Figure: Skip-Ahead Unit block diagram with blocks XOR_SPLIT, ERROR-DETECT, FILTER, MERGE, XOR, FAST SHIFT REGISTER, and FAST DECISION REGISTER; signals To BMU, From BMU, noError, Comparison Result, Decision_bit, SkipAhead Decision, and BMU Decision. Received data is compared with the estimated branch bits.]

Overview of New Pipeline Templates
2-D
Style     Timing Assumptions   Throughput
PCHB      DI/QDI               772 MHz
RSPCHB    QDI                  920 MHz
LP 2/2+   Moderate             1.0 GHz
HC        Aggressive           1.2 GHz
Foundation of design space exploration trading robustness for performance
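The throughput figures on this slide can be turned into cycle times and relative speedups with a few lines of Python. This is an illustrative calculation only: the pairing of each style with its throughput is a reading of the flattened table (PCHB 772 MHz, RSPCHB 920 MHz, LP 2/2+ 1.0 GHz, HC 1.2 GHz), and the names below are hypothetical.

```python
# Throughput per template style in MHz, as read from the slide's table.
THROUGHPUT_MHZ = {
    "PCHB": 772.0,
    "RSPCHB": 920.0,
    "LP 2/2+": 1000.0,
    "HC": 1200.0,
}


def cycle_time_ns(throughput_mhz: float) -> float:
    """Cycle time in nanoseconds for a throughput given in MHz."""
    if throughput_mhz <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / throughput_mhz


def speedup_over(baseline: str, style: str) -> float:
    """Throughput ratio of one style relative to a baseline style."""
    return THROUGHPUT_MHZ[style] / THROUGHPUT_MHZ[baseline]


if __name__ == "__main__":
    for name, mhz in THROUGHPUT_MHZ.items():
        ratio = speedup_over("PCHB", name)
        print(f"{name:8s} {cycle_time_ns(mhz):.3f} ns  x{ratio:.2f} vs PCHB")
```

These ratios compare the templates' own throughput figures; they are a separate measure from the 14% improvement reported for the Skip-Ahead Unit as a whole.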