9cfc05798de5d15a039ffdf39a28816a.ppt
- Number of slides: 45
SOC & Embedded System Group
- Embedded Systems
  - Embedded OS: 曾建超
  - Multimedia: 蔡淳仁, 蔡文錦
  - Low-power mobile: 曹孝櫟
  - Storage: 張立平
- SOC Design & CAD
  - Network: 林盈達
  - Architecture and Systems: 鍾崇斌, 單智君
  - Wireless Baseband Processor: 許騰尹
  - Multimedia SOC Design: 蔡淳仁, 彭文孝
  - Electronic Design Automation: 李毅郎
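Research Interests: Chien-Chao Tseng (曾建超), Institute of Network Engineering and Institute of System Design, College of Computer Science, National Chiao Tung University, cctseng@csie.nctu.edu.tw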
Wireless Access to Internet
- 3G/GPRS/PHS?
- WiMax/WLAN/Bluetooth/PAN
- Heterogeneous wireless overlay networks: multi-interface handheld devices
- My interests: roaming and handoffs :-)
- Embedded OS for Multi-interface Handheld Devices
  - Linux/Windows XP/CE
  - Cross-layer design for real-time applications: driver, network, and application layers (VoIP)
- Heterogeneous Wireless Networks
  - WLAN/WiMax/3G/GPRS/PHS
  - Roaming and handovers
- Multi-tier Wireless Network
  - WPAN, WLAN and mobile router
  - Roaming and routing
- Embedded Wireless Mesh and Sensor Networks
  - Address assignment and routing
- Secured and Fast Accesses to Wireless Network
- Figure labels: PHS, 3G/GPRS, WLAN
Embedded Systems (Assistant Professor 曹孝櫟): Research Directions
- Embedded software for B3G/4G mobile devices
  - Protocol stacks for 4G access
  - Embedded operating systems and device drivers, and their optimization for mobile devices
- Cooperate with international and local vendors and institutes to develop 4G/multimode radio SoCs
- Establish reference embedded software for next-generation mobile devices and radio SoCs
Embedded Systems (Assistant Professor 曹孝櫟): R&D Results, Low-Power and Fast-Handover Cellular/WLAN Dual-Mode Mobiles
- System architecture and prototype of a cellular/WLAN dual-mode mobile
- Power consumption evaluation
- Handover latency evaluation
- Awards
  - 2005 Mobile Communications Contest, Industrial Development Bureau, MOEA
  - 2005 Software Contest, National Center for High-Performance Computing
  - 2006 Embedded Software Contest, MOE
Prof. Li-Pin Chang (張立平)
- Recent research directions
  - Embedded storage systems
  - Real-time systems and scheduling algorithms
  - Hardware-software co-design
Embedded Storage: Efficient Wear-Leveling Algorithm for Flash Memory
- Goal: capture uneven usage across millions of blocks and level it
- Result: the fastest, most effective, and most economical approach available
- Figure: access pattern (LBA vs. time) and block usage (erase cycle count vs. block number); heavily used blocks wear out quickly
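The idea of leveling erase counts can be illustrated with a small simulation. This is a minimal sketch of generic dynamic wear leveling (always write to the least-worn free block), not the algorithm presented on the slide; the function name `simulate_wear` and the block model are assumptions made for illustration only.

```python
def simulate_wear(num_blocks, writes, leveling):
    """Replay logical block writes and return the erase count of each physical block.

    Hypothetical model: every rewrite of a logical block costs one erase.
    Assumes fewer distinct logical blocks than physical blocks, so a free
    block always exists when leveling is enabled.
    """
    erase_count = [0] * num_blocks
    if not leveling:
        # Naive in-place update: logical block i always lives in physical block i.
        for lba in writes:
            erase_count[lba] += 1
        return erase_count

    mapping = {}                   # logical block -> physical block
    free = set(range(num_blocks))  # physical blocks holding no live data
    for lba in writes:
        # Out-of-place update: choose the least-worn free block.
        new = min(free, key=lambda b: (erase_count[b], b))
        free.remove(new)
        old = mapping.get(lba)
        if old is not None:
            # The stale copy is erased and returned to the free pool.
            erase_count[old] += 1
            free.add(old)
        mapping[lba] = new
    return erase_count


if __name__ == "__main__":
    hot_writes = [0] * 80          # one "hot" logical block written repeatedly
    print("naive:  ", simulate_wear(8, hot_writes, leveling=False))
    print("leveled:", simulate_wear(8, hot_writes, leveling=True))
```

With a single hot logical block, the naive mapping concentrates all erases on one physical block, while the leveled variant spreads them across the pool. Blocks holding cold data are never moved in this sketch; handling them is what static wear leveling adds.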
Real-Time Systems: Overload Management for Real-Time Object Tracking
- Inter-arrival time of frames: 4 ms
- Workload-scaling factor: 4/7 (57%)
- Firm real-time (FE): (c, 4), ((4, 7), c, 4); figure labels (1, 4), ((4, 7), 2, 4), with dropped jobs marked "drop"
- Proportional adjustment (PP): (c, 4), (c, 7); figure labels (1, 4), (2, 7)
- Figure: average RMS error for each scheme
Hardware-Software Co-design: Reconfigurable Computing for Overload Management
- Past achievement
  - Overload management for event-driven real-time embedded systems
- Work in progress
  - Deal with transient workload bursts using hardware acceleration
  - Move critical tasks onto FPGA
    - Computing resource reclamation
    - On-line floorplanning
    - On-line topology reconfiguration for network-on-chip (NoC)
Embedded Systems (蔡文錦): Research Directions
- Low-power embedded systems
- Video compression/decompression
Near-Future Plan
- Low-power AVC/H.264 video codec algorithm and system design
Multimedia Embedded Systems Lab (蔡淳仁): Research Directions
- SoC design for advanced video codecs
- DVB/MHP middleware and Java runtime
- Java processor for DVB/MHP
- Flexible multimedia codec SoC platforms
- OS kernel scheduler for tightly-coupled heterogeneous multi-core platforms
Multimedia Embedded Systems Lab: R&D Results
- H.264 codec accelerators on ARM Integrator
- Java processor accelerating technologies on Spartan 3 and ML-310 platforms (based on the open source JOP project)
- Video rate control for HW/SW codesigned SoCs (patent application)
- Tightly-coupled H.264 encoder on TI OMAP5912
- Tightly-coupled kernel scheduler module for ARM-Linux on TI OMAP5912
Future Plans
- Implementation of a flexible multimedia codec SoC platform
- Design of a new Java processor for DVB/MHP
- Design of hardware-friendly psychovisual models for video codecs
- Clean design of a multi-core OS kernel suitable for tightly-coupled task scheduling
Architecture and Systems: Research Directions (單智君, 鍾崇斌)
- Embedded processor and SoC
- Java processor, JIT compilation, and VM
- DSP designs and compilation
- Low-power systems
- Graphic processor
- Superscalar ARM processor
- Reconfigurable computing
Architecture and Systems: R&D Results
- ARM9-compatible processor with video/audio capabilities
- Java stack operations folding
- Memory-constrained Java just-in-time compiler
- DSP instruction set extensions
- Low-power branch target buffer
- Low-power bus encodings
- Low-power cache memory
- Graphic processor design techniques
- Superscalar ARM
- Reconfigurable computing
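ARM9-compatible Processor with Audio/Video Capabilities
- ARMAVP (ARM Audio Video Processor) is a 32-bit microprocessor with a well-balanced five-stage pipeline: Fetch Unit, Decoder Unit, Execution Unit, Memory Access Unit, and Write Back Unit. Each stage is optimized for performance to raise the clock frequency, and efficient mechanisms reduce the impact of slow memory on processor performance.
- Features
  - Supports conditional execution
  - ABP buffer design
  - Reduced instruction fetch time
  - Precise interrupt control structure
  - Asynchronous memory access
  - Dynamic register bank mapping
  - Fast handling of branch instructions
  - Multi-function, efficient execution path
  - Distributed instruction control encoding
- Functional verification and evaluation
  - All functions have been verified on an Altera EP20K600EBC652-1. Based on simulation results for the decode stage, the design runs at 45 MHz on the FPGA and is expected to reach 210 MHz when implemented as a chip.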
DSP: Instruction Set Extensions
- Current research topics
  - Multiple-issue architecture
    - Exploring ISE in a multiple-issue architecture, such as superscalar or Very Long Instruction Word (VLIW)
  - Hardware reusability
    - Reuse the same or similar hardware resources in different ASFUs while keeping the same performance
  - Overcoming the register file read/write port constraint
    - Schedule the inputs and outputs of an ASFU in different time slots
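Low-power Bus Encodings
- For bus architectures with different characteristics, we propose different low-power bus encoding schemes. The schemes use a variety of coding methods so that data travelling over the bus is sent in its most power-efficient form, which saves power.
- Bus encoding architecture: at the sender, an encoder turns the original data into encoded data plus extra control lines; at the receiver, a decoder restores the original data.
- Low-power bus encoding schemes (buses between the processor and the data, instruction, or unified memory)
  - Data address bus: T0_BI_1, Variable-Stride, SRWEC
  - Data bus: Leading-bytes encoding
  - Instruction address bus: T0 + Discontinuous Address Table
  - Instruction bus: BIBITS with Register Relabeling
  - Combined instruction/data address bus: I/D Selector, T0 DAT + Stride-Table
  - Combined instruction/data bus: I/D Selector, BIBITS_RR + Leading-bytes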
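The encoder, extra control line, and decoder arrangement can be illustrated with classic bus-invert coding. This is a minimal sketch of that textbook technique, offered on the assumption that it conveys the general idea; it is not an implementation of T0_BI_1, BIBITS, or any other scheme named on the slide. The function names and the 8-bit width are hypothetical.

```python
def bus_invert_encode(words, width=8):
    """Encode words as (bus_value, invert_line) pairs using bus-invert coding.

    If more than half of the data lines would toggle, the inverted word is
    sent instead and the extra invert line is raised. Simplification: the
    invert line itself is ignored when deciding whether to invert.
    """
    mask = (1 << width) - 1
    prev = 0                      # value currently driven on the data lines
    encoded = []
    for w in words:
        w &= mask
        toggles = bin(prev ^ w).count("1")
        if toggles > width // 2:
            sent, inv = (~w) & mask, 1
        else:
            sent, inv = w, 0
        encoded.append((sent, inv))
        prev = sent
    return encoded


def bus_invert_decode(encoded, width=8):
    """Recover the original words from (bus_value, invert_line) pairs."""
    mask = (1 << width) - 1
    return [(~sent) & mask if inv else sent for sent, inv in encoded]


def count_transitions(values):
    """Count bit toggles over a sequence of bus values, starting from all zeros."""
    prev, total = 0, 0
    for v in values:
        total += bin(prev ^ v).count("1")
        prev = v
    return total


if __name__ == "__main__":
    words = [0x00, 0xFF, 0x00, 0xFF]
    enc = bus_invert_encode(words)
    raw = count_transitions(words)
    # Include the invert line as a ninth wire when counting encoded toggles.
    coded = count_transitions([s | (inv << 8) for s, inv in enc])
    print("raw toggles:", raw, "encoded toggles:", coded)
    print("round trip ok:", bus_invert_decode(enc) == words)
```

Since dynamic bus power grows with the number of line toggles, fewer transitions per transfer is the quantity such encodings try to reduce.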
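Low-power Cache Memory
- Cache memory accounts for more than 50% of total processor power consumption
- Low-power cache memory design
  - Loop buffer: place loop code in a low-power loop buffer to save instruction-fetch power
  - Power manager: put infrequently used cache blocks into a low-power mode to save cache static power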
Graphic Processor
- Research goal: study next-generation graphics processor architectures, proposing hardware architectures and software verification environments in three areas: pixel shaders, texture, and depth processing.
- Current results
  1. A dynamically reconfigurable graphics hardware for a resource-reallocatable rendering pipeline
  2. A reconfigurable texture mapping architecture
  3. Implementation of texture compression by GPU driver
  4. Register renaming for pixel shader data/value management
  5. Instruction scheduling mechanism for 3D GPU pixel shaders
  6. An efficient texture memory system design
  7. Alpha blending without Z sort
Superscalar ARM
- Goal: a superscalar embedded processor featuring
  - 800 MHz clock rate @ 0.13 um
  - 1.8 DMIPS/MHz: superscalar performance under tough pipeline latency
  - 800K gate count: cost-effective design
- Directions and achievements
  - Micro-architecture: a 12-stage dual-issue superscalar processor with good instruction fetch rate, issue rate, and efficient forwarding
  - Simulator: a cycle-accurate simulator modeling more details than the well-known SimpleScalar simulator
  - Compiler: working on the GCC machine description to optimize performance
Reconfigurable Computing
- Motivations
  - Improving the design methodology of embedded system hardware
  - Providing better performance at low development cost
  - Shortening the time-to-market of SoC products
- Research issues
  - Hardware/software partitioning
  - Synthesis technology
  - Reconfigurable processing element design
- Figure: Reconfigurable Architecture (1/2)
Research Overview in SOC and Embedded Systems (林盈達)
- Research theme
  - Content networking with deep packet inspection by software and hardware solutions, with applications in Internet security (intrusion detection, anti-virus, anti-spam, content filtering, MSN/P2P management)
- Embedded software
  - Embedded Linux solutions: 7-in-1, 10-in-1
  - A startup company, L7 Networks (L7-Networks.com), 2002, for all-in-one security gateways
- SoC
  - Key component in content networking: string matching, which needs hardware acceleration
  - FPGA-based development to accelerate the Aho-Corasick and Bloom filtering algorithms
Embedded and SoC Group: Selected R&D Results (2/2)
- 7-in-1 integrated security gateway
- String matching engine to accelerate the Aho-Corasick machine
- Unified content filtering hardware platform
- String matching hardware with Bloom filters
7-in-1 Integrated Security Gateway
- 7-in-1: VPN, firewall, NAT, routing, content filtering, intrusion detection, bandwidth management
- Launched a startup in 2002: L7 Networks Inc.
- Figure: outbound (LAN/DMZ to WAN) and inbound (WAN to DMZ/LAN) traffic paths through the gateway. Blocks shown: MAC filter, in-LAN filter, out-LAN filter, in-WAN filter, out-WAN filter, redirect to FTP/POP3/SMTP/Web/URL filter, policy route, route, bandwidth management, many-to-one NAT, deNAT, IPsec VPN, IPsec deVPN, intrusion detection (sniff), alerting system.
String Matching Engine to Accelerate the Aho-Corasick Machine
- New parallel architecture with pre-hashing and root-indexing
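For readers unfamiliar with the automaton being accelerated, the following is a minimal software sketch of standard Aho-Corasick multi-pattern matching. It does not model the pre-hashing or root-indexing of the hardware architecture named on the slide, and the function names `build_automaton` and `find_all` are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for non-empty patterns."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(p)

    # Breadth-first pass: failure links point to the longest proper suffix
    # of the current path that is also a path from the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for p in out[state]:
            hits.append((i - len(p) + 1, p))
    return hits


if __name__ == "__main__":
    print(find_all("ushers", ["he", "she", "his", "hers"]))
```

Every input character triggers at least one table lookup whose result depends on the previous state, which is one reason a hardware engine for this automaton is attractive for line-rate content inspection.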
Unified Content Filtering Hardware Platform
- Resolves content filtering issues
  - Matching without interrupting the CPU
  - Management of multiple connections
  - On-the-fly matching of non-fixed payloads
  - Multiple patterns and multiple matched outputs
- Figure: content filtering hardware reading text descriptors in DRAM; descriptor fields shown include text pointer, text length, FA state, first and last match offsets, match ID, and status.
String Matching Hardware with Bloom Filters
- Platform
  - Xilinx ML310 Embedded Development Platform with embedded PowerPC 405 processor
  - Xilinx Virtex-II Pro XC2VP30 FPGA
  - MontaVista Linux Professional Edition 3.0
- Feature set
  1. Allow the maximum shift distance where possible.
  2. Reconfigure rules easily.
  3. Keep hardware complexity constant.
- Figure: shift controller with entering and leaving bytes; Bloom filters (1) to (3); detection of prefix(p, 1), prefix(p, 2), and factors in p.
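The role of a Bloom filter in string matching can be shown with a small software sketch: a compact bit array answers "possibly present" or "definitely absent" for each text window, and only the rare positives are verified exactly. This is a generic illustration under stated assumptions, not the shift-based hardware design on the slide. The class `BloomFilter`, the function `scan`, the sizes, and the use of SHA-256 as the hash are all choices made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over byte strings (hypothetical parameters)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive several hash positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(self.bits[pos] for pos in self._positions(item))


def scan(text, patterns):
    """Return start offsets where any pattern occurs.

    Assumes all patterns are non-empty byte strings of equal length.
    """
    window = len(patterns[0])
    bloom = BloomFilter()
    exact = set(patterns)
    for p in patterns:
        bloom.add(p)
    hits = []
    for i in range(len(text) - window + 1):
        chunk = text[i:i + window]
        # Cheap filter first; exact check rules out false positives.
        if chunk in bloom and chunk in exact:
            hits.append(i)
    return hits


if __name__ == "__main__":
    print(scan(b"xxGET /etc/passwd", [b"/etc"]))
```

This sketch advances one byte at a time; the slide's design additionally uses prefix and factor detection to shift by the maximum safe distance, which is not modeled here.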
Embedded and SoC Group: Major Projects
- Excellence Project: Next Generation Information Communication Networks (follow-up Excellence Program, National Science Council, 2004~2008)
  - 林盈達, 曾文貴 (with 24 faculty members)
- Network Benchmarking Lab (ITRI-NCTU Network Benchmarking Lab, www.nbl.org.tw, Industrial Development Bureau, MOEA, 2003~2007)
  - 林盈達
- Attack Session Extraction and Comparison with Nessus (Cisco San Jose, 2005~2006)
  - 林盈達
- Content-based Network Security - Content Classification: Design, Implementation, and Evaluation (integrated project, National Science Council, 2004~2006)
  - 林盈達 (with 李程輝, 孫雅麗)
- Open Source Product Testing Tools: In-Lab Live Testing (National Science Council, 2005~2006)
  - 林盈達
Biography of Ying-Dar Lin (林盈達)
- B.S., NTU-CSIE, 1988
- Ph.D., UCLA-CS, 1993
- Professor, NCTU-CS, 1999~
- Founder and Director, ITRI-NCTU Network Benchmarking Lab (NBL; www.nbl.org.tw), 2002~
- Co-Founder, L7 Networks Inc. (www.L7.com.tw), co-invested by D-Link, ZyXEL, and Advantech, 2002
- Consultant, CCL/ITRI, 2002~
- Well-cited paper: Multihop Cellular: A New Architecture for Wireless Communications, INFOCOM 2000, Y.D. Lin and Y.C. Hsu; number of citations: 150
- Areas of research interest
  - Design, implementation, analysis, and benchmarking of Internet gateway devices (10-in-1: routing, NAT, firewall, VPN, IDP, CF, anti-virus, anti-spam, IM, P2P, bandwidth management, link load balance, etc.)
  - Internet security and QoS
  - Content networking
  - Test technologies for switch, router, WLAN, security, and VoIP
- Publications
  - International journal: 39
  - International conference: 33
  - IETF Internet Draft: 1
  - Industrial articles: 124
  - Books: 2
  - Patents: 16
  - Tech transfers: 8
Wireless Baseband Processor (許騰尹)
- MIMO OFDM PHY
- Ultra low-power PHY
- Generic PHY architecture
- Chip implementations
Wireless Baseband Processor
- Block: gate count, maximum frequency
  - Spreading: 500, 80 MHz
  - PAM Match Filter: 4800, 80 MHz
  - Clock Recovery: 1500, 178 MHz
  - CTRL: 1500, 80 MHz
  - Digital Divider: 900, 60 MHz
  - Clock Generator: 2600, 165 MHz
- Figure labels: Spreading, PAM, Match Filter, Clock Generator, Clock Recovery, Divider
Prototype 802.11b Baseband+MAC Chip
- Item: specification
  - Technology: 0.25 um CMOS 1P5M
  - VLSI type: cell-based design
  - Function: 802.11b Baseband+MAC
  - System frequency: 44 MHz
  - Package: 208 QFP
  - Gate count: not available
  - Chip size: not available
  - Power supply: 2.5 V (digital), 3.3 V (analog)
  - Power dissipation: 650 mW
- Chip photo labels: A/D (Q), A/D (I), PLL, D/A
Architecture and Systems: R&D Results
- ARM9-compatible processor with video/audio capabilities (technology transfer)
- Java stack operations folding (patents)
- Asynchronous 8051 on FPGA
- Low-power branch target buffer (patent application)
- Low-power bus encodings (patent applications)
- Graphic processor design techniques
SOC Electronic Design Automation (李毅郎): Research Directions
- Reliable interconnect design
  - Crosstalk-driven interconnect design
  - Design-for-Manufacture (DFM) interconnect design
- Layout migration
  - VLSI cell migration with topology preservation
- Post-layout platform for verification and optimization
SOC Electronic Design Automation: R&D Results
- Tile-based gridless ECO router with graph reduction
  - Two times faster than existing tile-based routers
- NEMO: a new full-chip gridless router
  - Faster than all academic gridless routers
- Crosstalk-driven track assignment
- Pre-detailed routing design flow considering capacitive- and inductive-noise constraints
SOC EDA Group RD Results - New ECO Routing Design Flow
SOC EDA Group RD Results - Full-Chip Gridless Router
Electronic System Level Design (彭文孝), http://mapl.nctu.edu.tw
- Figure labels: traditional design flow; design flow with ESL; system-level verification and integration; first-time silicon success
Design Practice: Transaction Level Modeling for H.264 Decoder (彭文孝), http://mapl.nctu.edu.tw
- Figure blocks: CPU, cache, SDRAM controller, video pipe, output interface, data transaction bus, control bus, bus arbitration
SoC for Multi-Standard Video Codec (彭文孝), http://mapl.nctu.edu.tw
- Video codec
- HD capturing
- System on chip: ARM-9 CPU, 3A functionalities, color transform, embedded SRAM and on-chip bus, networking
- Bus arbitration
- Architecture C model
VLSI/SOC Research for Graphics System (Prof. 范倫達)
- VLSI Information Processing LAB
- Advisor: Lan-Da Van (ldvan@cs.nctu.edu.tw)
- 3-D graphics demo here!
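VLSI/SOC Research for Adaptive Communications (Prof. 范倫達)
- Building a virtual SOC platform using CoWare Platform Architect
  - Provide a virtual system platform on which software engineers can develop programs
  - Raise the abstraction level of system simulation to improve system verification efficiency
  - Develop performance evaluation metrics: use simulation results for these metrics to find the best system architecture configuration, as a basis for system development
    - Time spent in each function under different hardware/software configurations
    - Number of bus accesses made by each module under different hardware/software configurations
- Block diagram of platform: ARM926 (instruction and data ports, iTCM, dTCM, clock, reset), AHB, APB, ROM, RAM, SW stub, FFT HW, Display, din, 1/8. Memory locations (size) shown: 0x0 (0x100000), 0x4000000 (0x100000), 0x10000000 (0x4), 0xc0000 (0x1). Address bits / data bits shown: 20/32 and 32/32.
- Flow: virtual SOC verification platform, IP implementation, FFT/IFFT chip design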