Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing • Jun-Dong Cho, Sungkyunkwan University

Contents • Reconfigurable platforms • Software Defined Radio • SW/HW co-design case studies • SW/HW co-design tools • Network on Chip • Lab introduction • Research proposal

SoC and Customizable Platform-Based Design • [Figure blocks: DSP; Reconfigurable Hardware (Fine Grain); ASIC 1; ASIC 2; Reconfigurable Hardware (Coarse Grain)]

Semiconductor Revolutions - Makimoto's wave • [Figure: timeline from 1957 to 2007 in ten-year steps; technology labels: TTL; LSI, MSI; µproc., memory; ASICs, accelerators; FPGAs; soft CPUs; coarse grain; designer labels: hardware people; CS people; new breed needed]

Abstract • iSoC is an on-chip communication architecture intended to improve the scalability and flexibility of SoC design • Dynamic configuration • The regular, flexible structure of iSoC provides a predictable framework for modeling the traffic, power, speed, and area requirements of global communication

IBM's CoreConnect • Bandwidth has been expanded from the initial 32 bits up to 128 bits

Sonics Smart Interconnect IP

SMART (Sonics Methodology and Architecture for Rapid Time-to-Market) • Plug-and-play on-chip communications network • Packet-based • 50 employees in a year • Provides IP and a design environment; supports SoC design • Alliance with Cadence • SiliconBackplane III targets communications + media

Nexperia Digital Video Platform • Designing the initial platform, along with the PNX8500, wasn't quick and easy. • It involved about 300 hardware, software and systems people working between 1999 and 2001, of whom 60 were involved with hardware.

Development Direction • Expand communication bandwidth to keep up with the growth of multimedia applications and the large burst data transfers they require • Dual-core architecture (ARM + DSP)

On-Chip Network Architecture ● Development of router/scheduler algorithms ● Network model design and verification using SystemC ● Design of core IP for star and mesh on-chip networks ● Design of master/slave network interfaces and a high-performance memory management interface

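Building an On-Chip-Network-Based SoC Design Platform and Design Environment ● Development of a distributed crossbar switch topology generation and IP mapping tool ● Development of an IP-to-mesh-tile mapping tool ● Development of a network topology generation tool based on analysis of data flow between IPs; construction of the SoC platform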
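
The IP-to-mesh-tile mapping step listed above can be illustrated with a small sketch. This is not the tool from the slides: the cost model (traffic volume times Manhattan hop count), the exhaustive search, and all names and traffic figures below are assumptions chosen only to show the idea on a 2x2 mesh.

```python
from itertools import permutations

def hops(a, b):
    """Manhattan distance between two (row, col) mesh tiles."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def mapping_cost(placement, traffic):
    """Assumed cost model: sum over IP pairs of traffic volume x hop count."""
    return sum(vol * hops(placement[src], placement[dst])
               for (src, dst), vol in traffic.items())

def best_mapping(ips, tiles, traffic):
    """Exhaustively try every assignment of IPs to tiles (small meshes only)."""
    best = None
    for perm in permutations(tiles, len(ips)):
        placement = dict(zip(ips, perm))
        cost = mapping_cost(placement, traffic)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best

# Hypothetical IP blocks and inter-IP traffic volumes (arbitrary units).
ips = ["cpu", "dsp", "mem", "io"]
tiles = [(r, c) for r in range(2) for c in range(2)]
traffic = {("cpu", "mem"): 8, ("dsp", "mem"): 6, ("cpu", "io"): 1, ("cpu", "dsp"): 2}

cost, placement = best_mapping(ips, tiles, traffic)
print(cost, placement)
```

Exhaustive search grows factorially with the number of tiles, so a practical tool would presumably use a heuristic; the sketch only makes the objective concrete.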

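Application Areas - Supports protocols that guarantee QoS, making it suitable for real-time applications and applications that require high data bandwidth - System-level chips needed to implement products such as multimedia SoCs, portable and communication terminals, Internet set-top boxes, game consoles, and network terminals - SoC design for high-capacity multimedia applications such as high-frame-rate video and 3D graphics - Build a platform-based design environment that combines the core on-chip network IP and design support tools into a single platform, and use it across a variety of SoC designs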

Recent Research Trends • Intel's Reconfigurable Radio Architecture (mesh + nearest neighbor) • Reconfigurable Baseband Processing, picoChip • Portable Components using Containers for Heterogeneous Platforms, Mercury Computer Systems, Inc. • Configurable platforms: Altera Excalibur, Xilinx Virtex FPGA • Adaptive Computing Machine, QuickSilver Technology • Mercury, Sky, Galileo, Tundra (crossbars, bridges) • Virginia Tech's reconfigurable hardware

66% of chips do not work on first silicon (2004) • Mid-90s: 6 months late => 31% earnings loss • Today: 3 months late = $500M loss

HIERARCHY OF PLATFORMS

Full Application Platform • Users design full applications on top of hardware and software architectures • Nexperia • Texas Instruments' OMAP multimedia platform • Infineon's M-Gold 3G wireless platform • Parthus' Bluetooth platforms • ARM's PrimeXsys wireless platform

Processor-Centric Platform • Focuses on access to a configurable processor but doesn't model complete applications • Improv Systems • ARC • Tensilica • Triscend

Communication-Centric Platform • Provides an interconnect architecture but doesn't typically provide a processor or a full application • Sonics' SiliconBackplane • PalmChip's CoreFrame architectures

Fully Programmable Platform • Consists of FPGA logic and a processor core • Altera's Excalibur, Xilinx' Virtex-II Pro and QuickLogic's QuickMIPS • Xilinx-IBM XBlue architecture

SDR Solutions in Five Tiers • Tier 0: traditional hardware implementation • Tier 1, SCR (software controlled radio): control features for multiple hardware elements are implemented in software • Tier 2, SDR (software defined radio): modulation and baseband processing are implemented in software, while multi-frequency RF is implemented in fixed-function hardware • Tier 3, ISR (ideal software radio): programmability is extended through an RF implementation that performs analog conversion at the antenna • Tier 4, USR (ultimate software radio): in addition to digital processing capability, provides fast switching between communication protocols (within a few milliseconds) • Sandbridge (ARM + 4 DSPs)

Introduction • Wireless processing systems require heavy computation at high throughput but operate under strict power constraints • Reconfigurable SoC implementations pursue performance gains through parallelism and make use of IP reuse • Algorithm partitioning guided by performance prediction based on hot-spot bottlenecks (or traffic)

Introduction • Scheduled interconnect – Link utilizations are substantially smaller than on a bus, since communication is distributed and pipelined throughout the system. – Eliminates the congestion caused by the bus and the header overhead present in dynamic routing. • The Reconfigurable Architecture Workstation (RAW) project has reexamined static communication as a mechanism for general-purpose computing. • A regular interconnect structure and static scheduling eliminate unnecessary interconnect switching • Balancing the computational load across all cores improves performance • Overhead of the configuration streams – Configuration streams must be scheduled periodically along with the data – Configuration streams use 4% of the bandwidth • Depending on data content variation and the system operating environment, the core interfaces and the cores themselves are dynamically reconfigured into low-power mode

Scheduled Communication • A tiled architecture • Each tile is a computational core, and the tile interfaces together form the network • The core interface allows the use of heterogeneous processing taking place on one or more tiles • The system is connected using a statically scheduled mesh of interconnect • Data moves between neighboring tiles through a communication pipeline, which allows fast clock rates and time-division sharing of interconnect resources • The ability to reconfigure the cores and the interconnect at run time enables dynamic power management
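
The statically scheduled, pipelined mesh described above can be sketched as a tiny cycle-level simulation. Everything here is an illustrative assumption rather than the actual interface design: the port names (N, S, E, W, CORE), the representation of a schedule as a cyclic list of per-cycle port connections, and the `run` function itself.

```python
# Port directions as (row, col) offsets, and the port a neighbor receives on.
DIRS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}

def run(schedule, inject, cycles):
    """Simulate a statically scheduled mesh.

    schedule[tile] is a list of per-cycle dicts {in_port: out_port}, repeated
    cyclically. inject[(cycle, tile)] is a value the core sends that cycle.
    Returns (cycle, tile, value) tuples for data delivered to a core.
    """
    inputs = {}      # (tile, in_port) -> value visible during this cycle
    delivered = []
    for t in range(cycles):
        for (cyc, tile), val in inject.items():
            if cyc == t:
                inputs[(tile, "CORE")] = val
        next_inputs = {}
        for tile, slots in schedule.items():
            slot = slots[t % len(slots)]
            for in_port, out_port in slot.items():
                if (tile, in_port) not in inputs:
                    continue  # no valid data on this port: nothing is driven
                val = inputs[(tile, in_port)]
                if out_port == "CORE":
                    delivered.append((t, tile, val))
                else:
                    dr, dc = DIRS[out_port]
                    nbr = (tile[0] + dr, tile[1] + dc)
                    next_inputs[(nbr, OPPOSITE[out_port])] = val  # one hop per cycle
        inputs = next_inputs
    return delivered

# One stream from tile (0,0) to tile (0,2), passing through (0,1).
schedule = {
    (0, 0): [{"CORE": "E"}],
    (0, 1): [{"W": "E"}],
    (0, 2): [{"W": "CORE"}],
}
inject = {(0, (0, 0)): "a", (1, (0, 0)): "b"}
result = run(schedule, inject, cycles=5)
print(result)
```

Because each hop takes exactly one cycle and the connections are fixed in advance, no headers or arbitration are needed; a second value can enter the pipeline while the first is still in flight.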

Adaptive System on Chip

Communication Interface - Stream data that passes through a communication interface is scheduled for a specific communication clock cycle based on data link availability. - The result of scheduling for each interface is a set of

9-core and 16-core Mode

Evaluation Methodology

Performance of the Benchmarks

Dynamic Power Management • Dynamic power management saves power by lowering frequency, using separate clock domains, in response to run-time variation in data content • In the DCT implementation, high-order bits whose computed values do not change are bypassed to eliminate switching • Links are connected only for valid stream data, eliminating unnecessary switching • Prefetch many frames into an optimal-sized buffer [npettis@purdue.edu]
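
The effect of connecting links only for valid stream data can be made concrete by counting bit toggles, which is a common first-order proxy for dynamic switching power. The sketch below is an assumption-laden illustration: the gating policy (hold the last valid word on the wires), the function names, and the sample words are all invented for the example.

```python
def toggles(words):
    """Total number of bit flips between consecutive words on a link."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

def gate_invalid(words, valid):
    """Assumed gating policy: hold the last valid word whenever data is invalid."""
    held, out = 0, []
    for word, is_valid in zip(words, valid):
        if is_valid:
            held = word
        out.append(held)
    return out

# Hypothetical 8-bit link traffic; invalid cycles carry don't-care values.
words = [0x0F, 0xF0, 0x0F, 0x3C, 0xFF]
valid = [True, False, True, False, True]

ungated = toggles(words)
gated = toggles(gate_invalid(words, valid))
print(ungated, gated)
```

In this made-up trace the invalid words account for most of the wire activity, so holding the link steady during invalid cycles removes most of the toggles.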

Dynamic Power Management • Reconfigurable clock-based system balancing creates an environment of just-in-time computing, which can reduce overall power usage. • Taking advantage of interconnect flexibility allows a system to dynamically change functionality and avoid unused computational units. • Interconnect power consumption is low, and the overhead due to configuration streams is under 10% for both bandwidth and power.

Power Metric • Based on network activity and HSPICE circuit simulation of the interconnect, the network power consumption (Pint) is expressed in terms of the following quantities (the equation itself is not preserved in this transcript): • T: the number of tiles • PIF/D: the overhead of the instruction memory fetch and decode • s: the number of streams • Nvs and Nivs: the numbers of valid and invalid transfers for stream s • Ps: the power consumed in transferring 1 bit through stream s

iSOC Compiler • Divides applications into parts, each of which fits into a specific core. • Determines data communications between the cores in a space-time fashion. • Generates interconnect memory contents for each individual interface.
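
Determining communications "in a space-time fashion" can be pictured as reserving (link, time slot) pairs for each stream. The sketch below is not the compiler's actual algorithm; it assumes column-first (XY) routing, a fixed schedule period, and a greedy first-fit choice of start slot, and all names are hypothetical.

```python
def xy_route(src, dst):
    """Directed links from src to dst, moving along the row first, then the column."""
    links, cur = [], src
    while cur != dst:
        if cur[1] != dst[1]:
            nxt = (cur[0], cur[1] + (1 if dst[1] > cur[1] else -1))
        else:
            nxt = (cur[0] + (1 if dst[0] > cur[0] else -1), cur[1])
        links.append((cur, nxt))
        cur = nxt
    return links

def schedule_streams(streams, period):
    """Greedy first-fit: give each stream the earliest start slot with no link conflict."""
    busy = set()      # reserved (link, slot) pairs
    result = {}
    for name, (src, dst) in streams.items():
        route = xy_route(src, dst)
        for start in range(period):
            # Hop i of the route uses its link one cycle after hop i-1 (pipelined).
            slots = [(link, (start + i) % period) for i, link in enumerate(route)]
            if not any(s in busy for s in slots):
                busy.update(slots)
                result[name] = (start, route)
                break
        else:
            raise ValueError(f"no free slot for stream {name}")
    return result

# Two hypothetical streams that share the link (0,0) -> (0,1).
streams = {"s0": ((0, 0), (0, 2)), "s1": ((0, 0), (1, 1))}
plan = schedule_streams(streams, period=4)
print({name: start for name, (start, _) in plan.items()})
```

The per-interface schedule tables would then be derived from the reserved slots; that last step is omitted here.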

References • aSoC: A Scalable, Single-Chip Communications Architecture. Jian Liang, Sriram Swaminathan, and Russell Tessier. Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01003. {jliang, tessier}@ecs.umass.edu • Configurable Platforms With Dynamic Platform Management: An Efficient Alternative to Application-Specific System-on-Chips. Krishna Sekar (ksekar@ece.ucsd.edu) and Sujit Dey (dey@ece.ucsd.edu), Dept. of ECE, UC San Diego, La Jolla, CA; Kanishka Lahiri (klahiri@nec-labs.com), NEC Laboratories America, Princeton, NJ.

OMAP™ (Open Multimedia Application Platform) • The OMAP architecture includes SW/OS support that can control the platform's overall clocking and idle modes • The dual-core architecture makes it possible to assign each task to the most suitable processor

Memory vs Reused-IP

ED² • SMT (Simultaneous Multi-Threading): 20% speed-up and 24% power overhead [yingmin@cs.virginia.edu], using PowerTimer, a PowerPC simulator • Slow-down using DVS: 10% energy gain; scheduling: 15% increase in energy saving
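
The SMT figures above can be turned into an energy-delay-squared comparison with a few lines of arithmetic. This is a worked example under stated assumptions, not a result from the cited study: it assumes "20% speed-up" means execution time divided by 1.2, that "24% power overhead" applies to average power, and that energy is average power times delay.

```python
def relative_metrics(speedup, power_overhead):
    """Metrics relative to a baseline with delay = power = energy = 1."""
    delay = 1.0 / (1.0 + speedup)        # assumed meaning of "speed-up"
    power = 1.0 + power_overhead         # assumed to be average power
    energy = power * delay
    return {
        "delay": delay,
        "energy": energy,
        "ED": energy * delay,
        "ED2": energy * delay ** 2,
    }

m = relative_metrics(speedup=0.20, power_overhead=0.24)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions energy rises by about 3%, yet ED² falls to roughly 0.72 of the baseline, because the metric weights delay quadratically.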

Time-Space Exploration • Enumerate all trade-offs and select the one with the most benefit. • Branch-and-bound method for estimating every SoC metric.
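
A branch-and-bound search over time-space trade-offs might look like the sketch below. The problem formulation is an assumption made for illustration: each module has a few candidate implementations with an area and a time, total time is the plain sum of module times, and the goal is the minimum total time within an area budget. Module names and numbers are hypothetical.

```python
def branch_and_bound(modules, area_budget):
    """modules: list of option lists; each option is (name, area, time)."""
    n = len(modules)
    # Optimistic bound: fastest option for every remaining module, ignoring area.
    min_time_rest = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        min_time_rest[i] = min_time_rest[i + 1] + min(t for _, _, t in modules[i])

    best = {"time": float("inf"), "choice": None}

    def search(i, area, time, choice):
        if area > area_budget:
            return                      # prune: over the area budget
        if time + min_time_rest[i] >= best["time"]:
            return                      # prune: cannot beat the incumbent
        if i == n:
            best["time"], best["choice"] = time, list(choice)
            return
        for name, a, t in modules[i]:
            choice.append(name)
            search(i + 1, area + a, time + t, choice)
            choice.pop()

    search(0, 0, 0, [])
    return best["time"], best["choice"]

# Hypothetical hardware/software options per module: (name, area, time).
modules = [
    [("fft_hw", 5, 2), ("fft_sw", 1, 9)],
    [("dct_hw", 4, 3), ("dct_sw", 1, 7)],
    [("vit_hw", 6, 1), ("vit_sw", 1, 8)],
]
best_time, best_choice = branch_and_bound(modules, area_budget=12)
print(best_time, best_choice)
```

Plain enumeration would visit all eight combinations here; the two pruning tests let the search skip branches that are infeasible or provably no better than the best design found so far.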

Jiang Xu and Wayne Wolf, Princeton University • First decide on an architecture, and assign estimated requirements to unavailable modules. • Adjust the requirements using performance analysis in a trial-and-error fashion. • Based upon the requirements, purchase IP cores and design customized modules. • Several iterations may be needed to reach a final design. • It is very helpful if designers can get performance models of IP cores before buying them. • Cadence Virtual Component Codesign (VCC)

A Multimedia Embedded Chip