  • Number of slides: 40

Digital Integrated Circuits: A Design Perspective
Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic
Modified and integrated by Davide Bertozzi
Design Methodologies: Array-based design
© Digital Integrated Circuits 2nd, Design Methodologies

Late-Binding Implementation
• Until now, all methodologies have required a complete run through the fabrication process
  - very high NRE (nonrecurring expense)
• Array-based implementations have lower manufacturing costs
  - attractive for small series
  - lower performance/density, higher power
Array-based: pre-diffused (gate arrays) or pre-wired (FPGAs)

Gate Array: Sea-of-gates
• Wafers of pre-diffused transistors are pre-manufactured
• The desired interconnections are added to determine the overall function of the chip
  - only a few more metallization steps, applied to the pre-diffused wafers in a week or less
  - manufacturing is independent of the final application (standard masks)
[Figure: uncommitted cell (PMOS and NMOS rows) and committed cell (4-input NOR)]
The channelless layout is called "sea-of-gates" (it also has no predefined contacts)

Gate Array: Primitive cells
[Figure: uncommitted cell (PMOS and NMOS rows)]
• How to determine
  - the composition of primitive cells? Maximum transistor utilization must be ensured
  - the size of primitive transistors? Flexibility is needed to drive arbitrary loads
Static design decisions affect a wide range of designs!

Sea-of-gates Primitive Cells: alternative cell structures
• Using oxide isolation
  - isolated cells consist of N transistors
• Using gate isolation
  - long rows of transistors sharing the same diffusion area
  - some transistors must be tied to Vdd or GND to isolate neighboring gates
In principle, gate isolation leads to higher transistor density

Sea-of-gates Primitive Cells: transistor sizing challenge
Interconnect-oriented nature of GAs (propagation delay dominated by interconnect capacitance)
• Favors larger device sizes
  - large area overhead when unused
• Connect smaller devices in parallel (e.g., 2 rows of small NMOS transistors, connected in parallel when needed)
• Small devices for pass-transistor logic or memory cells
[Figure: cell using oxide isolation, with smaller devices]
Utilization factors largely depend on the application
  - from 100% (regular structures) to lower than 75%
Mapping a design onto a gate array is largely automated

Example: Base Cell of Gate-Isolated GA
Base cell: 1 pMOS, 1 nMOS
Cell height: 21 tracks
From Smith 97

Example: Flip-Flop in Gate-Isolated GA
From Smith 97

Sea-of-gates
• Memories can be implemented on top of gate arrays
  - inefficient (similar to standard cells)
• GAs integrated with memory macros (embedded gate array)
[Figure: random logic and memory subsystem; LSI Logic LEA300K (0.6 µm CMOS)]
Courtesy LSI Logic

Comparison
• Gate array:
  - lower manufacturing cost
  - larger area
  - interconnect-centric programming (!)
  - regular and fixed layout: load factors, wiring parasitics, … can be accurately estimated
• Standard cell:
  - higher manufacturing cost
  - lower area
  - less emphasis on routing
  - load factors and parasitics are only known after placement, routing and extraction
[Slide annotation: loss of interest]
Some gate array design approaches also leverage the regularity and predictability of interconnects

The return of gate arrays?
Array of prediffused cells with a superimposed wiring grid
Via-programmable gate array (VPGA)
[Figure: via-programmable cross-point between metal-5 and metal-6, with programmable via]
Exploits the regularity of interconnects
[Pileggi 02]

Prewired Arrays
Solution: programming in the field, outside the silicon foundry!
Classification of prewired arrays (or field-programmable gate arrays, FPGAs):
• Based on programming technique
  - fuse-based (program-once)
  - non-volatile
  - RAM-based
• Programmable logic style
  - array-based
  - look-up table based
• Programmable interconnect style
  - channel routing
  - mesh networks

Prewired Arrays
Starting from a regular array of cells...
• How do we implement programmable logic? How can we commit logic to perform any possible Boolean function?
• How do we store the program/configuration that commits the programmable array to a certain logic function?

Configuration storage
• Fuse-based FPGA
  - uses fuses (to be blown) or antifuses (to be short-circuited)
  - small area overhead, but one-time programmable
• Nonvolatile FPGA
  - program stored in EEPROM/Flash
  - functionality retained until the next programming round
  - additional process steps (e.g., ultrathin oxides), high programming voltages
• Volatile FPGA
  - program stored in RAM cells
  - at power-up, the configuration is reloaded from external non-volatile memory
  - RAM cells programmed as a giant shift register
  - linear programming vs. multi-cell programming
  - a regular CMOS process is sufficient
  - the logic function can be modified on the fly during execution (partial reconfiguration capability)
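The "giant shift register" style of loading a volatile FPGA can be illustrated with a small behavioral sketch. This is only a toy model under stated assumptions, not any vendor's bitstream format; the class `ConfigChain` and its methods are hypothetical names introduced here.

```python
class ConfigChain:
    """Hypothetical model: configuration RAM cells chained as one shift register."""

    def __init__(self, num_cells):
        self.cells = [0] * num_cells  # all configuration bits start cleared

    def shift_in(self, bit):
        # On each clock, every cell takes its neighbor's value
        # and the new bit enters at cell 0; the last bit falls off the end.
        self.cells = [bit] + self.cells[:-1]

    def load(self, bitstream):
        # Serial (linear) programming: one configuration bit per clock cycle.
        for bit in bitstream:
            self.shift_in(bit)
        return len(bitstream)  # number of clock cycles used


chain = ConfigChain(4)
cycles = chain.load([1, 0, 1, 1])
print(chain.cells, cycles)
```

In this model the first bit shifted in ends up in the last cell, and programming time grows linearly with the number of configuration cells, which is the cost that multi-cell programming schemes try to reduce.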

Antifuse-Based FPGA
[Figure: antifuse cross-section: polysilicon, 10 nm ONO dielectric, n+ antifuse diffusion]
Open by default, closed by applying a current pulse (melting of the dielectric)
The opposite holds for fuses
From Smith 97

Prewired Arrays
...starting from a regular array of cells...
• How do we implement programmable logic? How can we commit logic to perform any possible Boolean function?
  - array-based approach
  - cell-based approach
• How do we store the program/configuration that dedicates the programmable array to a certain logic function?

Logic (programmable logic devices, PLD)
• PLA: programmable AND array, programmable OR array
• PROM: fixed AND array, programmable OR array
• PAL: programmable AND array, fixed OR array
AND array: include input in the minterm. OR array: include minterm in the output.
Fixed arrays trade off flexibility for density and power
[Figure: PLA, PROM and PAL with inputs I5..I0 and outputs O3..O0; legend distinguishes programmable and fixed connections]
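The AND-plane/OR-plane structure can be sketched in a few lines of Python. This is a minimal behavioral model, assuming a simple encoding chosen for illustration (the function `pla_eval` and the example programming are hypothetical, not taken from the slides).

```python
def pla_eval(and_plane, or_plane, inputs):
    """Evaluate a two-level AND-OR array.

    and_plane: list of product terms; each term maps an input index to the
               value (0 or 1) that input must have to be included in the term.
    or_plane:  list of outputs; each output lists the product terms it sums.
    """
    # AND plane: a product term is true when all of its programmed inputs match.
    products = [all(inputs[i] == v for i, v in term.items()) for term in and_plane]
    # OR plane: an output is true when any of its programmed product terms is true.
    return [int(any(products[p] for p in terms)) for terms in or_plane]


# Hypothetical programming: O0 = I0*I1 + I2', O1 = I0*I1
and_plane = [{0: 1, 1: 1}, {2: 0}]
or_plane = [[0, 1], [0]]

print(pla_eval(and_plane, or_plane, [1, 1, 1]))
print(pla_eval(and_plane, or_plane, [0, 0, 0]))
```

In this encoding a PLA lets both `and_plane` and `or_plane` be programmed, a PAL would fix `or_plane`, and a PROM would fix `and_plane` to contain every one of the 2^n minterms.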

Programming a PROM
[Figure: PROM with inputs X2, X1, X0 and outputs f1, f0 (two outputs marked NA); dots mark programmed nodes]
A large fraction of the PROM is unused!
Complex logic functions lead to:
  - low performance
  - low programming density
And in general, no registers or flip-flops!
PLDs are less and less attractive

More Complex PAL
How can sequential logic be implemented with PLDs?
• Outputs can be fed back as a subset of the inputs
• Programmable D, T, J-K or clocked S-R flip-flop
• i inputs, j minterms/macrocell, k macrocells
From Smith 97

Multi-level logic advantages
Reduced sum-of-products form: x = ADF + AEF + BDF + BEF + CDF + CEF + G
  - 6 x 3-input AND gates + 1 x 7-input OR gate (may not exist!)
  - 25 wires (19 literals plus 6 internal wires)
Factored form: x = (A + B + C)(D + E)F + G
  - 1 x 3-input OR gate, 2 x 2-input OR gates, 1 x 3-input AND gate
  - 10 wires (7 literals plus 3 internal wires)
Such optimizations are unsupported by PLAs
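As a quick check, the equivalence of the two forms can be verified exhaustively over all 128 input combinations. The function names `sop` and `factored` are chosen here for illustration.

```python
from itertools import product


def sop(a, b, c, d, e, f, g):
    # Reduced sum-of-products form: six 3-input ANDs feeding one 7-input OR.
    return ((a and d and f) or (a and e and f) or (b and d and f) or
            (b and e and f) or (c and d and f) or (c and e and f) or g)


def factored(a, b, c, d, e, f, g):
    # Factored (multi-level) form.
    return ((a or b or c) and (d or e) and f) or g


# Compare both forms on every assignment of the seven variables.
equivalent = all(
    bool(sop(*v)) == bool(factored(*v))
    for v in product([False, True], repeat=7)
)
print(equivalent)  # True
```

The literal counts on the slide follow directly from the expressions: 6 x 3 + 1 = 19 literals in the sum-of-products form against 3 + 2 + 1 + 1 = 7 in the factored form.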

Array-based Programmable Logic
• (+) REGULAR STRUCTURE
  - accurate parasitic, area, power, speed estimates
• (+) SUITABLE FOR 2-LEVEL LOGIC
  - e.g., functions with a large fan-in
  - or functions that map well onto 2-level logic (e.g., FSMs)
• (-) HIGHER OVERHEAD
  - capacitance of intermediate nodes negatively affects performance and power
  - risk of underutilization, especially in PLAs (and waste of power)
The alternative is CELL-BASED PROGRAMMABLE LOGIC...

2-input mux as programmable logic block
A mux used as a logic function generator: F = A when S = 0, F = B when S = 1
[Table: configurations of A, B and S (each tied to 0, 1, X or Y) and the resulting function F, e.g. XY and X + Y]
By properly connecting inputs A, B and S to variables X and Y, 10 different logic functions can be obtained
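A few such configurations can be tried out with a short sketch. The helper names `mux` and `truth_table`, and the three configurations shown, are illustrative choices rather than a reproduction of the slide's table.

```python
from itertools import product


def mux(a, b, s):
    # 2-input multiplexer: selects A when S = 0 and B when S = 1.
    return b if s else a


def truth_table(cfg):
    """cfg = (A, B, S), each tied to a constant 0/1 or to a variable "X"/"Y".

    Returns F for (X, Y) = 00, 01, 10, 11.
    """
    rows = []
    for x, y in product([0, 1], repeat=2):
        env = {0: 0, 1: 1, "X": x, "Y": y}  # resolve constants and variables
        rows.append(mux(env[cfg[0]], env[cfg[1]], env[cfg[2]]))
    return tuple(rows)


print(truth_table((0, "Y", "X")))   # A=0, B=Y, S=X gives X AND Y
print(truth_table(("Y", 1, "X")))   # A=Y, B=1, S=X gives X OR Y
print(truth_table((1, 0, "X")))     # A=1, B=0, S=X gives NOT X
```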

Logic Cell of Actel Fuse-Based FPGA
More complex logic gates with multiple muxes
Used in Actel fuse-based FPGAs
Implements any 2- or 3-input logic function, some 4-input logic functions, and a latch

Look-up Table Based Logic Cell
[Figure: EXOR example]
The look-up table stores the truth table of a logic function (with n inputs, any logic function of n inputs can be implemented)
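The idea that the configuration bits are simply the truth table can be sketched as follows. The `LUT` class and the bit ordering (first input as the most significant address bit) are assumptions made for this illustration.

```python
class LUT:
    """Hypothetical n-input look-up table: the configuration is the truth table."""

    def __init__(self, n_inputs, config_bits):
        if len(config_bits) != 2 ** n_inputs:
            raise ValueError("an n-input LUT needs exactly 2**n configuration bits")
        self.n = n_inputs
        self.bits = list(config_bits)

    def __call__(self, *inputs):
        if len(inputs) != self.n:
            raise ValueError("wrong number of inputs")
        # The inputs form the address into the stored truth table
        # (first input is the most significant address bit).
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return self.bits[address]


# Program a 2-input LUT as EXOR: outputs for inputs 00, 01, 10, 11.
xor2 = LUT(2, [0, 1, 1, 0])
print([xor2(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

Reprogramming the same cell as another function only requires a different bit list, for example `[0, 0, 0, 1]` for AND.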

Extensions for sequential cells
[Figure: LUT-based logic cell with a LUT, a D flip-flop (D, Q, CLK) and a Sel signal]
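A behavioral sketch of such a cell is given below. It assumes, since the slide only shows the figure, that the LUT output drives the flip-flop's D input and that Sel chooses between the combinational LUT output and the registered output Q; the class `LogicCell` is a hypothetical name.

```python
class LogicCell:
    """Hypothetical LUT + D flip-flop cell; sel=1 selects the registered output."""

    def __init__(self, lut_bits, sel):
        self.lut_bits = list(lut_bits)  # truth table of the LUT
        self.sel = sel                  # configuration bit for the output mux
        self.q = 0                      # flip-flop state

    def _lut(self, inputs):
        address = 0
        for bit in inputs:              # first input is the most significant bit
            address = (address << 1) | bit
        return self.lut_bits[address]

    def output(self, inputs):
        # Output mux: registered (Q) or combinational (LUT) path.
        return self.q if self.sel else self._lut(inputs)

    def clock(self, inputs):
        # Rising clock edge: the flip-flop captures the LUT output.
        self.q = self._lut(inputs)


# Toggle flip-flop: a 1-input LUT programmed as NOT, fed by the cell's own output.
cell = LogicCell([1, 0], sel=1)
trace = []
for _ in range(4):
    cell.clock([cell.output([0])])
    trace.append(cell.output([0]))
print(trace)  # [1, 0, 1, 0]
```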

Sizing LUTs
Source: Altera white paper: FPGA Architecture
• Small LUTs increase the number of logic levels needed to implement a function and, hence, the circuit delay.
• Large LUTs increase silicon area and cost, since some of their inputs are not used in the logic implementation.

LUT-Based Logic Cell
Complex cells are obtained by adding more LUTs, increasing LUT size and inserting flip-flops and muxes
[Figure: CLB for Xilinx 4000 Series, with inputs C1..C4, D1..D4 and F1..F4, logic-function blocks, control bits, and multiplexers controlled by the configuration program]
Courtesy Xilinx

• How to make interconnects programmable?

Array-Based Programmable Wiring
Interconnect point: pass transistor with memory cell M (Flash or SRAM)
[Figure: cells with input/output pins, horizontal tracks, vertical tracks and programmed interconnections]

Crossing points
• Pass transistor
  - large number of transistors and control signals
  - high fan-out wires: delay and power
• Fuse/antifuse
  - fuses: long programming times (few connections usually needed)
  - antifuses require less programming
  - one-time programmable
Array-based wire programming has been successful only in the write-once class of FPGAs

Network
Each logic cell output is routed north, west, south or east
Connectivity through RAM-programmable switching or connect matrices
[Figure: switch box, connect box, interconnect point]
Courtesy Dehon and Wawrzynek

Transistor Implementation of Mesh
• The pass transistor induces a threshold-voltage drop, which limits performance
  - remedies: level restorers, zero-Vth transistors, boosted control signals, ...
• Inefficient for global interconnects
Courtesy Dehon and Wawrzynek

Hierarchical Mesh Network
• Most mesh-based FPGA architectures offer alternative wiring resources that allow for effective global wiring
• Reduced fanout and reduced resistance
Courtesy Dehon and Wawrzynek

ALTERA EPLD Block Diagram
Nonvolatile FPGA
Logic cells are PLA elements (called Logic Array Blocks, LABs)
16 macrocells per LAB
[Figure: primary inputs, macrocell]
Courtesy Altera

Altera MAX
From Smith 97

Altera MAX Interconnect Architecture
• Array-based (MAX 3000-7000)
  - simple, predictable
  - does not scale well
• Mesh-based (MAX 9000)
  - wide channels (48 to 96 wires)
  - beyond 560 macrocells
[Figure: LABs connected through a PIA (array-based) and through row and column channels (mesh-based)]
Courtesy Altera

Xilinx 4000 Interconnect Architecture
Combines the look-up table based approach with mesh-based interconnect
[Figure: CLB surrounded by single, double, quad, long, global long, direct connect, carry chain and global clock wiring resources, with per-type wire counts]
Figure annotations:
  - low-delay inter-CLB connections
  - distributed over long distances
  - can also be configured as an array of memory cells
Courtesy Xilinx

RAM-based FPGA
Xilinx XC4025
Horizontal and vertical routing channels are easily recognizable
• 1000 CLBs: 32 x 32 array
• 25000 equivalent gates
• 422 kbits of programming RAM
• CLB at 250 MHz
• Multi-CLB adder: 20-50 MHz
• One 32-bit adder: 62 CLBs
Courtesy Xilinx

Heterogeneous Programmable Platforms
Centered around an FPGA fabric
Xilinx Virtex-II Pro
• Embedded memories
• Embedded PowerPC
• Hardwired multipliers
• High-speed I/O (3.125 Gbps transceivers)
Courtesy Xilinx

Berkeley Pleiades Processor
Centered around an ARM8 core
[Figure: FPGA, reconfigurable data-path, interface, ARM8 core]
  - ARM8: system manager
  - intensive computations offloaded to a reconfigurable datapath (adders, multipliers, ASIP, ...)
  - FPGA for bit manipulation
• 0.25 µm 6-level metal CMOS
• 5.2 mm x 6.7 mm
• 1.2 million transistors
• 40 MHz at 1 V
• 2 extra supplies: 0.4 V, 1.5 V
• 1.5~2 mW power dissipation