NoC: Network OR Chip?
Israel Cidon, Technion

Technion’s NoC Research
- PIs:
  - Israel Cidon (networking)
  - Ran Ginosar (VLSI)
  - Idit Keidar (distributed systems)
  - Avinoam Kolodny (VLSI)
- Students:
  - Evgeny Bolotin
  - Reuven Dobkin
  - Zvika Guz
  - Arkadiy Morgenshtein
  - Zigi Walter
  - Roman Gindin

Origins of the NoC concept
- Early publications:
  - Guerrier and Greiner (2000): “A generic architecture for on-chip packet-switched interconnections”
  - Hemani, Jantsch, Kumar, Postula, Oberg, Millberg and Lindqvist (2000): “Network on chip: An architecture for billion transistor era”
  - Dally and Towles (2001): “Route packets, not wires: on-chip interconnection networks”
  - Wingard (2001): “MicroNetwork-based integration of SoCs”
  - Rijpkema, Goossens and Wielage (2001): “A router architecture for networks on silicon”
  - De Micheli and Benini (2002): “Networks on chip: A new paradigm for systems on chip design”
  - Bolotin, Cidon, Ginosar and Kolodny (2004): “QNoC: QoS architecture and design process for network on chip”

Evolution or Paradigm Shift?
[Figure labels: network link, network router, computing module, bus]
- Architectural paradigm shift
  - Replace wire spaghetti by an intelligent network infrastructure
- Design paradigm shift
  - Busses and signals replaced by packets
- Organizational paradigm shift
  - Create a new discipline, a new infrastructure responsibility

Characteristics of a successful paradigm shift
- Addresses a critical and topical need
- Enables a quantum leap in productivity and application
- Resistance from legacy experts
- Requires a major change of mindset and skills!
Think: Networking, not Bus evolution!

Critical needs addressed by NoC
1) Efficient interconnect: delay, power, noise, scalability, reliability
2) Increase system integration productivity
3) Enable Chip Multi Processors

NoC offers Area and Power Scalability
For the same performance, compare the wire area and power of:
- NoC
- Point-to-point
- Simple bus
- Segmented bus
E. Bolotin et al., “Cost Considerations in Network on Chip”, Integration, special issue on Network on Chip, October 2004

4 Decades of Network 101
- Evolved from busses and p-t-p connections
- Extensive architectures, modeling and analysis research
- Architecture is about optimizing network costs
- Different goals and element costs => different architectures:
  - Local Area Networks (LANs)
  - Metropolitan Area Networks (MANs)
  - System interconnect networks (SAN, InfiniBand…)
  - WAN (TCP/IP, ATM…)
  - Wireless networks
- Cross layered design
- Early architecture standardization is an optimization burden!

Local Area Networks (LANs)
- Critical need
  - Distributing operations and sharing of heterogeneous systems
- Constraints
  - Standardization
- Main cost
  - Incremental cost (NICs, wiring)
- Typical optimized architecture:
  - Low cost hubs/switches
  - Tree like architecture
  - Exploit low cost local BW
    - Shared media
    - Broadcast
  - Host embedded NICs

System interconnect (SAN, InfiniBand)
- Critical need
  - Create a powerful specialized system from low cost units
- Constraints
  - Low latency
- Main cost
  - Total system cost per MIP
- Typical architecture:
  - Wormhole/cut through
  - Connection based
  - Over-provisioned network
  - High degree/regular topology
  - Specific optimizations (e.g. RDMA)

WAN (TCP/IP, ATM…)
- Critical need
  - Global application networking (collaboration, WWW, file sharing, voice)
- Constraints
  - Scalability
  - Heterogeneous user and application QoS requirements
- Main cost
  - Physical infrastructure (mainly long distance trunks)
- Typical architecture of choice:
  - Packet switching
  - Irregular, small degree networks of high speed trunks
  - Optimization of topology and link capacities

CAN optimization
- The design envelope (constraints)
  - Collection of designs supported by a given chip
  - Convex hull of the traffic requirements of all configurations
  - QoS constraints
  - Other requirements (e.g. design automation…)
- The main cost(s)
  - Total area
  - Power
  - Others
    - Design time, verification and testability
- Optimization variables
  - Switching mechanism
  - QoS
  - Topology (incl. link capacities)
  - Routing
  - Flow and congestion control
  - Buffering
  - Application support
  - …

One NoC does not fit all!
[Figure: reconfiguration rate (at design time, at boot time, during run time) versus flexibility (from single application to general purpose computer), positioning ASIC, FPGA, ASSP and CMP]
I. Cidon and K. Goossens, in “Networks on Chips”, G. De Micheli and L. Benini, Morgan Kaufmann, 2006

One NoC does not fit all!
[Figure: traffic unpredictability (at design time, at configuration, run time) versus flexibility (from single application to general purpose computer), positioning ASIC, FPGA, ASSP and CMP]
- A large solution range!
I. Cidon and K. Goossens, in “Networks on Chips”, G. De Micheli and L. Benini, Morgan Kaufmann, 2006

Apply paradigm to ASIC based NoC
- Design envelope / constraints
  - Well defined inter-module traffic
  - Automatic synthesis
  - Variable QoS requirement
- Main cost
  - Power and area
- Architecture of choice:
  - Wormhole or small frame switching
  - Small # of buffers, VCs, tables
  - Simple QoS mechanisms (which?)
  - Topology and routing optimized for cost

Example: QNoC
Quality-of-service NoC architecture for ASICs
- Traffic requirements are known a-priori
- Overall approach
  - Wormhole switching
  - QoS based on priority classes
  - Small buffer/VC budget
  - In-order SP XY routing
  - Irregular topology
  - Optimized link capacities
[Figure: irregular mesh topology; routers (R) are labeled with grid coordinates such as (0, 0), (2, 1) and (4, 4)]
* E. Bolotin, I. Cidon, R. Ginosar and A. Kolodny, “QNoC: QoS architecture and design process for Network on Chip”, JSA special issue on NoC, 2004.

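The XY routing named in the approach above can be sketched as follows. This is a minimal illustration, assuming a full mesh and (x, y) router coordinates. The function name `xy_route` is hypothetical, and the sketch does not handle the missing links of an irregular topology.

```python
def xy_route(src, dst):
    """Return the routers visited from src to dst: move along X first, then along Y.

    Assumes every router on the path exists (full mesh); coordinates are (x, y).
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:              # first correct the X coordinate
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:              # then correct the Y coordinate
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
```

Because every packet of a given source-destination pair follows the same deterministic path, packets arrive in order, and the path length equals the Manhattan distance (a shortest path).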
Quality-of-Service in QNoC
- Multiple priority classes
  - Define latency
  - Preemptive
  - Possible ASIC classes
    - Signaling
    - Real Time Stream
    - Read-Write
    - DMA Block Transfer
- Statistical guarantees
  - E.g. <0.01% arrive later than required
* E. Bolotin, I. Cidon, R. Ginosar and A. Kolodny, “QNoC: QoS architecture and design process for Network on Chip”, JSA special issue on NoC, 2004.

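Preemptive priority classes can be illustrated with a per-port arbiter that always forwards the most urgent pending flit. This is a sketch only: the class order (signaling first, block transfer last), the `PriorityPort` name and the flit-level granularity are assumptions for illustration, not details given on the slide.

```python
from collections import deque

# Assumed class order: index 0 is the most urgent.
CLASSES = ["signaling", "real_time", "read_write", "block_transfer"]


class PriorityPort:
    """One output port with a separate flit queue per service class."""

    def __init__(self):
        self.queues = {c: deque() for c in CLASSES}

    def enqueue(self, cls, flit):
        self.queues[cls].append(flit)

    def next_flit(self):
        """Pick the head flit of the highest-priority non-empty queue."""
        for c in CLASSES:
            if self.queues[c]:
                return c, self.queues[c].popleft()
        return None


port = PriorityPort()
port.enqueue("block_transfer", "b0")
port.enqueue("block_transfer", "b1")
first = port.next_flit()           # a block-transfer flit starts going out
port.enqueue("signaling", "s0")    # an urgent flit arrives mid-packet
second = port.next_flit()          # it preempts the remaining block-transfer flit
third = port.next_flit()
```

Arbitrating per flit rather than per packet is what lets a short signaling packet overtake a long DMA block transfer that is already in flight.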
QNoC Design Flow
1) Extract inter-module traffic
2) Place modules
3) Allocate link capacities
4) Verify QoS and cost

QNoC Design Flow: allocate link capacities
- Optimize capacity for performance/power tradeoff
- Capacity allocation is a traditional WAN optimization problem, however:

Wormhole Delay Modeling
- Approximate delay analysis in wormhole networks
  - Multiple virtual channels
  - Different link capacities
  - Different communication demands
- The model combines a queuing delay term with a flit interleaving delay approximation
* I. Walter, Z. Guz, I. Cidon, R. Ginosar and A. Kolodny, “Efficient Link Capacity and QoS Design for Wormhole Network-on-Chip,” DATE 2006.

The Capacity Allocation Problem
- Given:
  - System topology and routing
  - Each flow’s bandwidth (f_i) and delay bound (T_i^REQ)
- Minimize total link capacity
- Such that: every flow’s delay meets its delay bound

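The problem stated above can be written as a constrained optimization. The symbols $E$ (set of links), $F$ (set of flows), $C_e$ (capacity of link $e$), $\pi_i$ (route of flow $i$) and $T_i(\cdot)$ (delay of flow $i$ under the delay model) are notation assumed here for illustration. Only $f_i$ and $T_i^{REQ}$ come from the slide, and the second constraint is an added assumption that each link can carry its offered load.

```latex
\begin{aligned}
\min_{\{C_e\}_{e \in E}} \quad & \sum_{e \in E} C_e \\
\text{subject to} \quad & T_i\bigl(\{C_e\}\bigr) \le T_i^{REQ} && \forall i \in F, \\
& \sum_{i \in F \,:\, e \in \pi_i} f_i < C_e && \forall e \in E .
\end{aligned}
```
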
Capacity Allocation: Realistic Example
- A SoC-like system with realistic traffic demands and delay requirements
- “Classic” design: 41.8 Gbit/sec
- Using the algorithm: 28.7 Gbit/sec
- Total capacity reduced by 30%
[Figure: link capacities on a mesh with nodes 00 to 23, before and after optimization]

Optimizing routing on Irregular Mesh
[Figure: example routes “Around the Block” and “Dead End”]
Goal: Minimize the total size of routing tables
E. Bolotin, I. Cidon, R. Ginosar and A. Kolodny, "Routing Table Minimization for Irregular Mesh NoCs", DATE 2007.

Saving Table Hardware
Traditional solutions: full routing tables
- Destination based routing: at router
- Source routing: at sources
Solution idea: use reduced tables
- Store only relevant destinations (PLA)
- Default function (“Go XY” or “Don’t turn”) + table for deviations

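The “default function + table for deviations” idea can be sketched as a router that stores entries only for destinations where the default “Go XY” rule would fail. The class and function names, the (x, y) coordinates and the port labels (E, W, N, S, LOCAL) are hypothetical choices for this illustration.

```python
def xy_default(cur, dst):
    """Default function ("Go XY"): step along X until aligned, then along Y."""
    if cur[0] != dst[0]:
        return "E" if dst[0] > cur[0] else "W"
    if cur[1] != dst[1]:
        return "N" if dst[1] > cur[1] else "S"
    return "LOCAL"


class ReducedTableRouter:
    """Router that keeps table entries only where routing deviates from XY."""

    def __init__(self, position, deviations=None):
        self.position = position
        self.deviations = dict(deviations or {})   # destination -> output port

    def output_port(self, dst):
        if dst in self.deviations:                 # stored exception
            return self.deviations[dst]
        return xy_default(self.position, dst)      # everything else: default rule


# Router (1, 1) must detour north for destination (3, 1), e.g. around a missing link.
router = ReducedTableRouter((1, 1), {(3, 1): "N"})
```

The table holds one entry here instead of one per destination, which is the hardware saving the slide refers to.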
Routing Heuristics for Irregular Mesh
[Figure panels: random problem instances; systems with real applications. Compared schemes: distributed routing (full tables), X-Y routing with deviation tables, source routing for deviation points]

Efficient Routing Results
[Figure: savings, and scaling of savings with network size]

NoC for Shared Memory CMP
- Constraints
  - Multiple access to coherent cache
  - Unpredictable traffic pattern
  - QoS requirements (fetch, pre-fetch)
- Main cost
  - CMP power / area performance
- Architecture of choice:
  - Tailored for a given CMP
  - In-order/adaptive routing?
  - Simple QoS mechanisms?
  - Regular topology? Is CMP symmetric?
  - Built in support functions (multicast, search…)

NoC can facilitate critical transactions
* E. Bolotin, Z. Guz, I. Cidon, R. Ginosar and A. Kolodny, “The Power of Priority: NoC based Distributed Cache Coherency”, NOCS 2007.

Priority NoC: Results

NoC Based FPGA Architecture
[Figure labels: functional unit, NoC for inter-routing, routers, configurable region (user logic), configurable network interface]

NoC for FPGA
- Design envelope / constraints
  - Many ASIC like applications for a given FPGA
  - Hard NoC infrastructure: efficient but inflexible
  - Soft logic is reusable but has inferior performance
- Average NoC cost of most demanding designs
  - Hard grid links and router logic
  - Total configured NoC logic used
- Architecture of choice:
  - Regular and uniform grid
  - In-order/load balanced routing
  - Hard logic for links, routers
  - Soft logic for routing algorithms, headers, CNIs
  - Soft NoC tuning (routing, CNI) for a given implementation

Source Toggle XY
- Unlike TXY, traffic to same destination is not split
- Maximum capacity similar to TXY
- The route is a bitwise XOR of source and destination ID
- Can be extended to weighted source toggle (WOT)

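One possible reading of the XOR rule above is sketched below: the bits of the source and destination IDs are XORed together and reduced to a single parity bit, which selects XY or YX routing for that pair. The reduction to one parity bit, the integer IDs and the function name `stxy_order` are assumptions made for illustration; the slide does not spell out the exact rule.

```python
def stxy_order(src_id, dst_id):
    """Choose the dimension order for a source-destination pair.

    Assumed rule: parity of (src_id XOR dst_id); 0 selects XY, 1 selects YX.
    """
    parity = bin(src_id ^ dst_id).count("1") & 1
    return "YX" if parity else "XY"


order_a = stxy_order(5, 3)   # 5 ^ 3 = 0b110, even parity
order_b = stxy_order(5, 2)   # 5 ^ 2 = 0b111, odd parity
```

Under this reading the choice depends only on the pair of IDs, so all traffic between a given source and destination follows one route and stays in order, while different pairs are spread over XY and YX paths.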
Two Hotspots
[Figures: design envelope for various distances between the hotspots; maximum capacity for WOT]

Generic NoC Problems
Many shared problems across the design spectrum, examples:
- Need for a low latency class of service
- Verification and predictability
- Power control of NoCs
- Centralized vs. distributed control
- Is a single NoC enough per chip?
  - Bus examples suggest otherwise
- Hot modules slow incoming NoC traffic
  - Off chip systems
  - Shared memory subsystems
  - Expensive functional units

NoC clogging by hot modules
[Figure labels: IP 1, IP 2, IP 3, each attached through an interface]
- HM is not a local problem
- Transparent to NoC performance
Walter, Cidon, Ginosar and Kolodny, “Access Regulation to Hot-Modules in Wormhole NoCs”, NOCS 2007.

Source Fairness
[Figure labels: IP (HM), interface]
- No “fairness” is guaranteed, since routers’ arbitration is based on local state
- The farther the source is from the destination, the more arbitrations its worm has to win
- The HM bandwidth isn’t fairly shared

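The effect of purely local arbitration can be illustrated with a simplified model that is not taken from the slides: sources sit on a linear chain of routers leading to the hot module, every source is saturated, and each router splits its output evenly between its local source and the traffic arriving from upstream. The function name `hm_shares` is hypothetical.

```python
from fractions import Fraction


def hm_shares(num_sources):
    """Share of hot-module bandwidth per source; index 0 is adjacent to the HM.

    Assumed model: linear chain, saturated sources, fair two-way local arbitration.
    """
    shares = []
    remaining = Fraction(1)               # bandwidth still available upstream
    for hop in range(num_sources):
        if hop == num_sources - 1:
            shares.append(remaining)      # farthest router has no upstream competitor
        else:
            shares.append(remaining / 2)  # local source wins half of the arbitrations
            remaining /= 2                # the rest is left for all farther sources
    return shares


shares = hm_shares(4)
```

In this model the share roughly halves with every extra hop, so a source adjacent to the hot module gets four times the bandwidth of one three hops away, even though every individual router arbitrates fairly.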
Hot Module Distributed Arbitration
- Control is distributed or centralized
- Centralized control can account for dependencies
- Requests and grants are sent at a high service level
- Requests and grants include additional data as needed
  - Requested quota, source queue size, priority, deadline, etc.
  - Granted quota, scheduling of transmissions, etc.
- Initial credits hide the request-grant latency under light load

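The request/grant exchange with initial credits can be sketched as below. The equal-split allocation policy, the class names `HotModuleController` and `Source`, and the measurement of quotas in flits are all assumptions for illustration; the slide only lists what requests and grants may carry.

```python
class HotModuleController:
    """Grants quotas so that total granted traffic fits the hot module's capacity."""

    def __init__(self, capacity_per_round):
        self.capacity = capacity_per_round

    def grant(self, requests):
        """requests: source -> requested quota. Returns source -> granted quota."""
        grants = {s: 0 for s in requests}
        remaining = self.capacity
        pending = {s: q for s, q in requests.items() if q > 0}
        while remaining > 0 and pending:
            share = max(remaining // len(pending), 1)   # equal split of what is left
            for s in sorted(pending):
                g = min(share, pending[s], remaining)
                grants[s] += g
                pending[s] -= g
                remaining -= g
                if remaining == 0:
                    break
            pending = {s: q for s, q in pending.items() if q > 0}
        return grants


class Source:
    """A source may send only as many flits as it holds credits for."""

    def __init__(self, initial_credits):
        self.credits = initial_credits      # usable before any grant arrives

    def can_send(self, flits):
        return flits <= self.credits

    def send(self, flits):
        if not self.can_send(flits):
            raise ValueError("not enough credits; wait for a grant")
        self.credits -= flits

    def receive_grant(self, quota):
        self.credits += quota


controller = HotModuleController(capacity_per_round=8)
grants = controller.grant({"a": 10, "b": 2, "c": 10})

src = Source(initial_credits=2)
src.send(2)                      # light load: sent at once, no request round trip
src.receive_grant(grants["a"])   # heavier load must wait for a grant
```

With initial credits, a lightly loaded source never waits for the request-grant round trip; the controller only becomes the limiting factor once demand exceeds the hot module's capacity.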
Hot vs. non-Hot Module Traffic
[Figure panels: HM traffic without control; HM traffic with control; other traffic without control; other traffic with control]

Conclusions
- NoC is a chip design paradigm shift
- Introduces many diverse and new networking challenges
- No killer NoC for all chips
- Should not comply with any X-AN concept
  - May include centralized mechanisms
  - May involve more than one NoC/Bus mechanism
  - May combine several communication methodologies
    - Low latency NoC/Bus for metadata and urgent signals
- Beware of early standardization and legacy barriers
- Mutual benefit for VLSI-Networking collaboration
NoC: A Network AND A Chip


