

  • Number of slides: 26

Giga-Scale System-On-A-Chip
International Center on System-on-a-Chip (ICSOC)
Jason Cong, University of California, Los Angeles
Tel: 310-206-2775, Email: cong@cs.ucla.edu
(Other participants are listed inside)

Background: “Double Exponential” Growth of Design Complexity
• C1: complexity due to exponential increase of chip capacity
  – More devices
  – More power
  – Heterogeneous integration, ...
• C2: complexity due to exponential decrease of feature size
  – Interconnect delay
  – Coupling noise
  – EMI, ...
• Design complexity: C1 × C2

Motivation: Productivity Gap
[Figure: Chip Capacity and Designer Productivity. Axes: Logic Transistors/Chip (K) and Transistor/Staff-Month; years marked include 1998 and 2003. Complexity growth rate: 58%/Yr. Productivity growth rate: 21%/Yr. Source: NTRS’97]
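
The two growth rates on this slide compound, so the gap between what a chip can hold and what a team can design widens every year. The arithmetic can be sketched as follows; the function name `gap_ratio` and the assumption that both rates stay constant from year to year are illustrative and not taken from the slides.

```python
COMPLEXITY_GROWTH = 0.58    # complexity growth rate per year (from the slide)
PRODUCTIVITY_GROWTH = 0.21  # productivity growth rate per year (from the slide)


def gap_ratio(years: float) -> float:
    """Factor by which complexity outgrows productivity after `years` years.

    Assumes both rates compound annually and stay constant (an assumption).
    """
    return ((1 + COMPLEXITY_GROWTH) / (1 + PRODUCTIVITY_GROWTH)) ** years


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(f"after {n:2d} years the gap has grown by a factor of {gap_ratio(n):.2f}")
```

Under these assumptions the gap grows by a factor of about 1.3 per year.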

Project Summary
• Develop new design methodology to enable efficient giga-scale integration for system-on-a-chip (SOC) designs
• Project includes three major components
  – SOC synthesis tools and methodologies
  – SOC verification, test, and diagnosis
  – SOC design driver: network processor

Research Team by Institutions
• US
  – UCLA: Jason Cong
  – UC Santa Barbara: Tim Cheng
• Taiwan
  – NTHU: Shi-Yu Huang, Tingting Hwang, J. K. Lee, Youn-Long Lin, C. L. Liu, Cheng-Wen Wu, Allen Wu
  – NCTU: Jing-Yang Jou
• China
  – Tsinghua Univ.: Jinian Bian, Xianlong Hong, Zeyi Wang, Hongxi Xue
  – Peking Univ.: Xu Cheng
  – Zhejiang Univ.: Xiaolang Yan

Current Research Team
• US
  – UCLA: Jason Cong
  – UC Santa Barbara: Tim Cheng
• Taiwan
  – NTHU: Shi-Yu Huang, Tingting Hwang, J. K. Lee, Youn-Long Lin, C. L. Liu, Cheng-Wen Wu, Allen Wu
  – NCTU: Jing-Yang Jou
• China
  – Tsinghua Univ.: Jinian Bian, Xianlong Hong, Zeyi Wang, Hongxi Xue
  – Peking Univ.: Xu Cheng
  – Zhejiang Univ.: Xiaolang Yan
• Several new faculty members in the 7 institutions
• Guest members from National University of Singapore, Purdue Univ., and UCLA (EE Dept)

Thrust 1 -- SOC Synthesis Environment/Methodology (Led by Jason Cong)
[Diagram: synthesis flow]
• Design Spec
• VHDL/C Co-Simulation
• Design Partitioning
• ASIC Synthesis
• Code Generation for Retargetable Compiler and Assembler Generator
• DSP Synthesis and Optimization
• FPGA Synthesis and Technology Mapping
• Interconnect-Driven High-level Synthesis for IP Reuse
• Physical Synthesis for Full-Chip Assembly
• Target components: Embedded Processors, DSPs, Embedded FPGAs, Customized Logic

Interconnect Bottleneck in Nanometer Designs
• Challenge: Single-cycle full-chip communication is no longer possible
  – Not supported by the current CAD toolset
• ITRS’01 0.07 um Tech
  – 5.63 GHz across-chip clock
  – 800 mm2 (28.3 mm x 28.3 mm)
  – IPEM BIWS estimations
    • Buffer size: 100x
    • Driver/receiver size: 100x
  – On semi-global layer (tier 3): can travel up to 11.4 mm in one cycle
[Figure: regions of the chip reachable in 1, 2, 3, 4, and 5 cycles; distance axis marked 0, 11.4, 22.8, 28.3]
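
The numbers on this slide can be turned into a rough cycle count for any wire length. The sketch below assumes Manhattan routing and a fixed reach of 11.4 mm per cycle; the helper names `cycles_needed` and `manhattan_mm` are hypothetical and only illustrate the arithmetic.

```python
import math

REACH_MM_PER_CYCLE = 11.4  # maximum distance covered in one cycle (from the slide)
DIE_EDGE_MM = 28.3         # die edge length (from the slide)


def cycles_needed(distance_mm: float) -> int:
    """Clock cycles for a signal to cover `distance_mm` of wire (at least one)."""
    return max(1, math.ceil(distance_mm / REACH_MM_PER_CYCLE))


def manhattan_mm(src, dst) -> float:
    """Manhattan wire length between two (x, y) points given in mm (an assumption)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


if __name__ == "__main__":
    edge = cycles_needed(DIE_EDGE_MM)
    corner = cycles_needed(manhattan_mm((0.0, 0.0), (DIE_EDGE_MM, DIE_EDGE_MM)))
    print(f"edge to edge: {edge} cycles, corner to corner: {corner} cycles")
```

With these numbers, crossing one die edge takes 3 cycles and a corner-to-corner Manhattan route takes 5, which appears consistent with the 1-to-5-cycle regions in the slide's figure.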

Regular Distributed Register Architecture
• Use register banks: registers in each island are partitioned to k banks for 1-cycle, 2-cycle, … k-cycle interconnect communication in each island
• Cluster with area constraint
• Highly regular
[Figure: grid of islands joined by Global Interconnect. Each island holds a Local Computational Cluster (LCC) with units such as MUX, ADD, and MUL, an FSM, and a Register File split into 1-cycle, 2-cycle, … k-cycle banks. Island dimensions are labeled Wi and Hi.]

MCAS: Architectural Synthesis for Multi-Cycle Communication Using RDR Architecture
[Flow diagram]
• Input: C program
• CDFG generation (produces the CDFG)
• MCAS (Multi-Cycle Architectural Synthesis)
  – Resource allocation & functional unit binding (produces the ICG)
  – Scheduling-driven placement (produces locations)
  – Placement-driven rescheduling & rebinding
  – Register and port binding
  – Datapath & FSM generation
• Outputs: RTL VHDL, floorplan constraints, multi-cycle path constraints

MCAS flow vs. Synopsys Behavioral Compiler (on Virtex-II)
• Synopsys Behavioral Compiler setting: default (optimizing latency)
• Average latency ratio of MCAS vs. BC: 69%
[Figure: latency and resource comparison]

Optimality Study of Large-Scale Circuit Placement
• Construction of Placement Example with Known Optimal (PEKO) [C. Chang et al., 2003]
  – Construct instances with a known optimal solution using the characteristics of the original problem
  – First quantitative evaluation of the optimality of circuit placement
  – Existing placement algorithms can be 70% to 150% away from the optimal
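
The idea of building a benchmark whose optimum is known in advance can be illustrated with a toy case. This is a simplified sketch, not the PEKO construction itself: it assumes unit-size cells on a grid and only 2-pin nets between adjacent cells, so every net already has the smallest possible length and the reference wirelength is optimal by construction. All function names are hypothetical, and the random shuffle merely stands in for a placer under evaluation.

```python
import random


def build_instance(rows: int, cols: int):
    """Build a grid placement plus 2-pin nets between adjacent cells.

    Every net has length 1 in the reference placement. Two distinct cells
    cannot share a slot, so no placement can make any net shorter: the
    reference wirelength (one unit per net) is optimal by construction.
    """
    placement = {r * cols + c: (c, r) for r in range(rows) for c in range(cols)}
    nets = []
    for r in range(rows):
        for c in range(cols):
            cell = r * cols + c
            if c + 1 < cols:
                nets.append((cell, cell + 1))     # right neighbour
            if r + 1 < rows:
                nets.append((cell, cell + cols))  # neighbour in the next row
    return placement, nets


def hpwl(placement, nets) -> int:
    """Total half-perimeter wirelength of all nets under `placement`."""
    total = 0
    for net in nets:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def shuffled(placement, seed: int = 0):
    """Stand-in for a placer under test: reassign the same slots at random."""
    rng = random.Random(seed)
    slots = list(placement.values())
    rng.shuffle(slots)
    return dict(zip(placement.keys(), slots))


if __name__ == "__main__":
    reference, nets = build_instance(8, 8)
    optimal = hpwl(reference, nets)
    achieved = hpwl(shuffled(reference), nets)
    print(f"optimal={optimal}, achieved={achieved}, "
          f"gap={(achieved - optimal) / optimal:.0%} above optimal")
```

Because the optimum is known, the quality of any placer can be reported as a percentage above optimal, which is the kind of figure quoted on the slide.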

High Interest in the Community
• Coverage in three EE Times articles
  – Placement tools criticized for hampering IC designs [Feb ’03]
  – IC placement benchmarks needed, researchers say [April ’03]
  – FPGA placement performance [Nov ’03]
• More than 150 downloads from our website
  – Cadence, IBM, Intel, Magma, Mentor Graphics, Synopsys, etc.
  – CMU, SUNY, UCB, UCSD, UIC, UMichigan, UWaterloo, etc.
• Used in every placement since its publication
http://ballade.cs.ucla.edu/~pubbench

Floorplanning & Interconnect Planning
• Based on the proposed Corner Block List (CBL) representation, propose several extended corner block lists (ECBL, CCBL, and SUB-CBL) to speed up floorplanning and handle more complicated L/T-shaped and rectilinear-shaped blocks.
• Propose floorplanning algorithms with some geometric constraints, such as boundary, abutment, and L/T-shaped blocks.
• Propose integrated floorplanning and buffer planning algorithms with consideration of congestion.
• Using research results from UCLA on interconnect planning
• About 30 papers published in DAC, ICCAD, ISPD, ASPDAC, ISCAS and Transactions.

P/G Network Analysis & Optimization
• Propose an area minimization method for power distribution networks using efficient nonlinear programming techniques (ICCAD 2001, accepted by IEEE Trans. on CAD)
• Propose a decoupling capacitance optimization algorithm for robust on-chip power delivery (ASPDAC 2004, ASICON 2003)

Parasitic R/L/C Extraction
• 3-D R/C Extraction using Boundary Element Method (BEM)
• Quasi-Multiple Medium (QMM) BEM algorithms
• Hierarchical Block BEM (HBBEM) technique
• Fast 3-D Inductance Extraction (FIE)
• Papers were published in ASPDAC, ASICON and IEEE Transactions on MTT

Thrust 2 -- SOC Verification, Test, and Diagnosis (Led by Tim Cheng)
[Diagram: Verification and Testing]
• Testing and diagnosis for heterogeneous SOC
  – Self-testing using on-chip programmable components
  – Self-testing for on-chip analog/mixed-signal components
  – New test techniques for deep-submicron embedded memories
• Enabling techniques for semi-formal functional verification
  – Scalable constraint-solving techniques
  – Automatic/semi-automatic functional vector generation from HDL code
  – Integrated framework for simulation, vector generation and model checking

Key Results - Verification
• Developed and released ATPG-based SAT solvers for circuits (Univ. of California, Santa Barbara)
  – Integrating structural ATPG and SAT techniques with new conflict learning
  – CSAT: Fast combinational solver (released in March 2003)
    • Demonstrated 10-100X speedup over state-of-the-art SAT solvers on industrial test cases (reported by Intel and Calypto)
    • Has been integrated into Intel’s FV verification system and a startup’s verification engine
    • Publications: DATE 2003 and DAC 2003
  – Satori 2: Fast sequential solver (released in Dec. 2003)
    • Demonstrated 10X-200X speedup over a commercial, sequential ATPG engine on public benchmark circuits
    • Publications: ICCAD 2003, HLDVT 2003 and ASPDAC 2004

Key Results - Testing
A new Statistical Delay Testing and Diagnosis framework consisting of five major components (UCSB):
• Statistical timing analysis
• Statistical critical path selection [DAC’02, ICCAD’02]
  – Selecting statistical long & true paths whose tests maximize detection of parametric failures
• Path coverage metric [ASPDAC’03]
  – Estimating the quality of a path set
• Selection/Generation of high quality tests for target paths [ITC’01][DATE 2004]
  – Identifying tests that activate longer delay along the target path
• Delay fault diagnosis based on statistical timing model [DATE’03, VTS’03, DAC’03]
  – Ref: Krstic, Wang, Cheng, & Abadir, DATE’03 (Best Paper Award in Test)
[Figure: framework blocks: ATPG/Pattern Selection, Critical Path Selection, Path Filtering, Static Timing Analysis, Diagnosis, Defect Injection & Simulation, Dynamic Timing Simulator, Statistical Timing Analysis Framework (Cell-based characterization)]

Key Results - Testing
• On-Chip Jitter Extraction for Bit-Error-Rate (BER) Testing of Multi-GHz Signal (UCSB)
  – Using on-chip, single-shot measurement unit to sample signal periods for spectral analysis
  – Demonstrated, through simulation, accurate extraction of multiple sinusoids and random jitter components for a 3 GHz signal
  – Publications: ASPDAC 2004 and DATE 2004

Thrust 3 – Design Driver: Network Security Processor (Led by Prof. C. W. Wu & Xu Cheng)
• Applications: IPSec, SSL, VPN, etc.
• Functionalities:
  – Public key: RSA, ECC
  – Secret key: AES
  – Hashing (Message authentication): HMAC (SHA-1/MD5)
  – Truly random number generator (FIPS 140-1, 140-2 compliant)
• Target technology: 0.18 μm or below
• Clock rate: 200 MHz or higher (internal)
• 32-bit data and instruction word
• 10 Gbps (OC192)
• Power: 1 to 10 mW/MHz at 3 V (LP to HP)
• Die size: 50 mm2
• On-chip bus: AMBA (Advanced Microcontroller Bus Architecture)

Encryption Modules (PKEM)
• Public key encryption module
  – Operations:
    • 32-bit word-based modular multiplication
    • Multiplication over GF(p) and GF(2^m)
• An RSA cryptography engine with small area overhead and high speed
• Scalable word-width
• TSMC 0.35 μm
• 34K gates (1.7 × 1.8 mm2)
• 100 MHz clock
• Scalable key length
• Throughput
  – 512-bit key: 1.79 Kbps/MHz
  – 1024-bit key: 470 bps/MHz
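
The throughput figures on this slide are normalized per MHz, so absolute rates follow from the 100 MHz clock. The sketch below does that conversion; the operations-per-second estimate additionally assumes that one RSA operation processes exactly one key-length block, which the slide does not state. Function names are hypothetical.

```python
CLOCK_MHZ = 100.0  # clock rate quoted on the slide

# Throughput in bits per second per MHz, keyed by RSA key length in bits (from the slide).
THROUGHPUT_BPS_PER_MHZ = {512: 1790.0, 1024: 470.0}


def throughput_bps(key_bits: int, clock_mhz: float = CLOCK_MHZ) -> float:
    """Absolute throughput in bits per second at the given clock rate."""
    return THROUGHPUT_BPS_PER_MHZ[key_bits] * clock_mhz


def operations_per_second(key_bits: int, clock_mhz: float = CLOCK_MHZ) -> float:
    """RSA operations per second, assuming one key-length block per operation."""
    return throughput_bps(key_bits, clock_mhz) / key_bits


if __name__ == "__main__":
    for bits in (512, 1024):
        print(f"{bits}-bit key: {throughput_bps(bits) / 1e3:.0f} Kbps, "
              f"about {operations_per_second(bits):.0f} operations/s")
```

At 100 MHz this works out to about 179 Kbps for 512-bit keys and 47 Kbps for 1024-bit keys.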

Encryption Modules (SKEM)
• Secret key encryption module
  – Operations:
    • Matrix operations, manipulation
• AES cryptography
• 32-bit external interface
• 58K gates
• Over 200 MHz clock
• Throughput: 2 Gbps
• Support key length of 128/192/256 bits
[Table: implementation results]
  Technology: TSMC 0.25 μm CMOS
  Package: 128 CQFP
  Core Size: 1,279 x 1,271 μm2
  Gate Count: 63.4K
  Max. Freq.: 250 MHz
  Throughput: 2.977 Gbps (128-bit key), 2.510 Gbps (192-bit key), 2.169 Gbps (256-bit key)
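
The throughput table can be cross-checked against the structure of AES. The sketch below converts each throughput figure into clock cycles per 128-bit block, assuming the figures were measured at the 250 MHz maximum frequency (the slide does not say so explicitly) and reading the middle row as the 192-bit key listed in the bullet above. The 128-bit block size and the 10/12/14 round counts come from the AES standard; the function name is hypothetical.

```python
AES_BLOCK_BITS = 128  # AES block size, the same for every key length
MAX_FREQ_HZ = 250e6   # maximum frequency from the table

# Throughput in bits per second, keyed by key length in bits (from the table).
THROUGHPUT_BPS = {128: 2.977e9, 192: 2.510e9, 256: 2.169e9}

# Number of AES rounds per key length (from the AES standard).
AES_ROUNDS = {128: 10, 192: 12, 256: 14}


def cycles_per_block(key_bits: int, freq_hz: float = MAX_FREQ_HZ) -> float:
    """Clock cycles spent per 128-bit block at the given clock frequency."""
    blocks_per_second = THROUGHPUT_BPS[key_bits] / AES_BLOCK_BITS
    return freq_hz / blocks_per_second


if __name__ == "__main__":
    for bits in (128, 192, 256):
        print(f"{bits}-bit key: {cycles_per_block(bits):.2f} cycles/block, "
              f"{AES_ROUNDS[bits]} rounds")
```

Under that assumption each block takes about 10.75, 12.75, and 14.75 cycles, close to the round count plus a fraction of a cycle of overhead, which would be consistent with roughly one AES round per clock cycle.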

International Collaborations
• Joint NSF/NSC workshop in Aug. 1999 on SOC (Hsin-Chu, Taiwan)
• First team preparation meeting for the proposed center in Jan. 2000 (Yokohama, Japan)
• 2nd planning meeting held in April 2000 (Hawaii, US)
• 3rd planning meeting in Aug. 2000 (Chengde, China)
• Proposal submitted to NSF in Aug. 2000 and funded in Dec. 2000
• Workshops
  – March 30-31, 2001 in Taipei, Taiwan
  – June 23-24, 2001 in Los Angeles, USA
  – August 31-September 1, 2001 in Hangzhou, China
  – March 28-29, 2002, National Tsing Hua University, Hsinchu, Taiwan
  – August 20-21, 2002, Peking University, Beijing, China
  – November 15-16, 2002, University of California, Santa Barbara
  – March 27-29, 2003, National Taiwan University, Taipei, Taiwan
  – December 19-21, 2003, Yunnan University, Kunming, China

Publications
• 56 research publications up to this point
• 17 in top conferences/journals (DAC, ICCAD, ASPDAC, ITC, etc.) in the field

People & Education
• Many interactions among participants from different institutes
• Two new IEEE fellows:
  – Prof. Xianlong Hong, Tsinghua Univ.
  – Prof. Cheng-Wen Wu, National Tsing Hua Univ.
• Involved many young faculty members and researchers
• Trained an army of graduate students