NoC: Network OR Chip?
Israel Cidon, Technion

Technion’s NoC Research
  • PIs:
      - Israel Cidon (networking)
      - Ran Ginosar (VLSI)
      - Idit Keidar (Dist. Systems)
      - Avinoam Kolodny (VLSI)
  • Students:
      - Evgeny Bolotin
      - Reuven Dobkin
      - Zvika Guz
      - Arkadiy Morgenshtein
      - Zigi Walter
      - Roman Gindin

Origins of the NoC concept
  • Early publications:
      - Guerrier and Greiner (2000), “A generic architecture for on-chip packet-switched interconnections”
      - Hemani, Jantsch, Kumar, Postula, Oberg, Millberg and Lindqvist (2000), “Network on chip: An architecture for billion transistor era”
      - Dally and Towles (2001), “Route packets, not wires: on-chip interconnection networks”
      - Wingard (2001), “MicroNetwork-based integration of SoCs”
      - Rijpkema, Goossens and Wielage (2001), “A router architecture for networks on silicon”
      - De Micheli and Benini (2002), “Networks on chip: A new paradigm for systems on chip design”
      - Bolotin, Cidon, Ginosar and Kolodny (2004), “QNoC: QoS architecture and design process for network on chip”

Evolution or Paradigm Shift?
[Figure labels: network link, network router, computing module, bus]
  • Architectural paradigm shift
      - Replace wire spaghetti by an intelligent network infrastructure
  • Design paradigm shift
      - Busses and signals replaced by packets
  • Organizational paradigm shift
      - Create a new discipline, a new infrastructure responsibility

Characteristics of a successful paradigm shift
  • Addresses a critical and topical need
  • Enables a quantum leap in productivity and application
  • Resistance from legacy experts
  • Requires a major change of mindset and skills!
Think: Networking, not Bus evolution!

Critical needs addressed by NoC
  1) Efficient interconnect: delay, power, noise, scalability, reliability
  2) Increase system integration productivity
  3) Enable Chip Multi Processors
[Figure: interconnected modules]

NoC offers Area and Power Scalability
For the same performance, compare the wire area and power of:
  • NoC
  • Point-to-point
  • Simple bus
  • Segmented bus
E. Bolotin et al., “Cost Considerations in Network on Chip”, Integration, special issue on Network on Chip, October 2004

4 Decades of Network 101
  • Evolved from busses and p-t-p connections
  • Extensive architectures, modeling and analysis research
  • Architecture is about optimizing network costs
  • Different goals and element costs => different architectures:
      - Local Area Networks (LANs)
      - Metropolitan Area Networks (MANs)
      - System interconnect networks (SAN, InfiniBand …)
      - WAN (TCP/IP, ATM…)
      - Wireless networks
  • Cross layered design
  • Early architecture standardization is an optimization burden!

4 Decades of Network 101
[Figure only]

Local Area Networks (LANs)
  • Critical need
      - Distributing operations and sharing of heterogeneous systems
  • Constraints
      - Standardization
  • Main cost
      - Incremental cost (NICs, wiring)
  • Typical optimized architecture:
      - Low cost hubs/switches
      - Tree-like architecture
      - Exploit low cost local BW (shared media, broadcast)
      - Host embedded NICs

System interconnect (SAN, InfiniBand)
  • Critical need
      - Create a powerful specialized system from low cost units
  • Constraints
      - Low latency
  • Main cost
      - Total system cost per MIP
  • Typical architecture:
      - Wormhole/cut through
      - Connection based
      - Over-provisioned network
      - High degree/regular topology
      - Specific optimizations (e.g. RDMA)

WAN (TCP/IP, ATM…)
  • Critical need
      - Global application networking (collaboration, WWW, file sharing, voice)
  • Constraints
      - Scalability
      - Heterogeneous user and application QoS requirements
  • Main cost
      - Physical infrastructure (mainly long distance trunks)
  • Typical architecture of choice:
      - Packet switching
      - Irregular, small degree networks of high speed trunks
      - Optimization of topology and link capacities

CAN optimization
  • The design envelope (constraints)
      - Collection of designs supported by a given chip
      - Convex hull of traffic requirements of all configurations
      - QoS constraints
      - Other requirements (e.g. design automation…)
  • The main cost(s)
      - Total area
      - Power
      - Others: design time, verification and testability
  • Optimization variables
      - Switching mechanism
      - QoS
      - Topology (incl. link capacities)
      - Routing
      - Flow and congestion control
      - Buffering
      - Application support
      - …

One NoC does not fit all!
[Figure: reconfiguration rate (at design time, at boot time, during run time) versus flexibility (single application to general purpose computer); plotted design points: ASIC, FPGA, ASSP, CMP]
I. Cidon and K. Goossens, in “Networks on Chips”, G. De Micheli and L. Benini, Morgan Kaufmann, 2006

One NoC does not fit all!
[Figure: traffic unpredictability (at design time, at configuration, run time) versus flexibility (single application to general purpose computer); plotted design points: ASIC, FPGA, ASSP, CMP]
  • A large solution range!
I. Cidon and K. Goossens, in “Networks on Chips”, G. De Micheli and L. Benini, Morgan Kaufmann, 2006

Apply paradigm to ASIC based NoC
  • Design envelope / constraints
      - Well-defined inter-module traffic
      - Automatic synthesis
      - Variable QoS requirements
  • Main cost
      - Power and area
  • Architecture of choice:
      - Wormhole or small frame switching
      - Small # of buffers, VCs, tables
      - Simple QoS mechanisms (which?)
      - Topology and routing optimized for cost

Example: QNoC
Quality-of-service NoC architecture for ASICs
  • Traffic requirements are known a priori
  • Overall approach
      - Wormhole switching
      - QoS based on priority classes
      - Small buffer/VC budget
      - In-order SP XY routing
      - Irregular topology
      - Optimized link capacities
[Figure: irregular mesh of routers (R) labeled with grid coordinates such as (0, 0), (2, 3), (5, 0)]
* E. Bolotin, I. Cidon, R. Ginosar and A. Kolodny, “QNoC: QoS architecture and design process for Network on Chip”, JSA special issue on NoC, 2004.
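The XY routing named in the approach can be sketched as follows. This is a minimal illustration on a full grid, not the QNoC implementation: the function name `xy_route` and the (x, y) reading of the coordinates are assumptions, and an irregular mesh would additionally need a way around missing routers.

```python
def xy_route(src, dst):
    """Return the hops from src to dst, travelling along X first, then along Y.

    src and dst are (x, y) tuples on a full grid (an assumption for this sketch).
    """
    x, y = src
    path = [(x, y)]
    step = 1 if dst[0] > x else -1
    while x != dst[0]:          # first correct the X coordinate
        x += step
        path.append((x, y))
    step = 1 if dst[1] > y else -1
    while y != dst[1]:          # then correct the Y coordinate
        y += step
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
```

Because every packet between the same pair of routers follows the same path, packets of a flow cannot overtake each other, which is consistent with the in-order property listed above.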

Quality-of-Service in QNoC
  • Multiple priority classes
      - Define latency
      - Preemptive
      - Possible ASIC classes: Signaling, Real Time Stream, Read-Write, DMA Block Transfer
  • Statistical guarantees
      - E.g. <0.01% arrive later than required
* E. Bolotin, I. Cidon, R. Ginosar and A. Kolodny, “QNoC: QoS architecture and design process for Network on Chip”, JSA special issue on NoC, 2004.
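Preemptive priority classes can be illustrated with a toy output-port arbiter. This is a sketch under stated assumptions, not the QNoC router: one flit is sent per cycle, the class order below is assumed to run from highest to lowest priority, and the names `arbitrate` and `simulate` are hypothetical.

```python
from collections import deque

# Assumed priority order, highest first.
CLASSES = ["Signaling", "Real-Time", "Read-Write", "Block-Transfer"]


def arbitrate(queues):
    """Pick the next flit: the highest-priority class with a pending flit wins."""
    for cls in CLASSES:
        if queues[cls]:
            return cls, queues[cls].popleft()
    return None


def simulate(arrivals, cycles):
    """arrivals maps a cycle number to a list of (class, flit) pairs arriving then."""
    queues = {cls: deque() for cls in CLASSES}
    sent = []
    for t in range(cycles):
        for cls, flit in arrivals.get(t, []):
            queues[cls].append(flit)
        winner = arbitrate(queues)
        if winner is not None:
            sent.append(winner[1])
    return sent


arrivals = {
    0: [("Block-Transfer", "B0"), ("Block-Transfer", "B1"), ("Block-Transfer", "B2")],
    1: [("Signaling", "S0")],
}
order = simulate(arrivals, 5)
```

In this run the signaling flit S0 arrives while the block transfer is in progress and is sent before the remaining block-transfer flits, which is the flit-level preemption the slide refers to.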

QNoC Design Flow
  1) Extract inter-module traffic
  2) Place modules
  3) Allocate link capacities
  4) Verify QoS and cost

QNoC Design Flow
  1) Extract inter-module traffic
  2) Place modules
  3) Allocate link capacities
  4) Verify QoS and cost
[Figure: modules placed on a grid of routers (R)]

QNoC Design Flow
  1) Extract inter-module traffic
  2) Place modules
  3) Allocate link capacities
  4) Verify QoS and cost
[Figure: modules placed on a grid of routers (R)]
  • Optimize capacity for performance/power tradeoff
  • Capacity allocation is a traditional WAN optimization problem, however:

Wormhole Delay Modeling
  • Approximate delay analysis in wormhole networks
      - Multiple virtual channels
      - Different link capacities
      - Different communication demands
  • Queuing delay
  • Flit interleaving delay approximation
* I. Walter, Z. Guz, I. Cidon, R. Ginosar and A. Kolodny, “Efficient Link Capacity and QoS Design for Wormhole Network-on-Chip”, DATE 2006.

The Capacity Allocation Problem
  • Given:
      - System topology and routing
      - Each flow’s bandwidth (f_i) and delay bound (T_i^REQ)
  • Minimize total link capacity
  • Such that: each flow i meets its delay bound T_i^REQ
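Why per-link allocation can save capacity may be seen in a toy comparison. This is only a sketch: a fixed target utilization stands in for the delay constraint (an assumption, not the wormhole delay model of the DATE 2006 paper), and the link names, flow values and function names are all hypothetical.

```python
def link_loads(flows):
    """Sum the bandwidth of every flow crossing each link.

    flows is a list of (route, bandwidth) pairs; route is a list of link names.
    """
    loads = {}
    for route, bandwidth in flows:
        for link in route:
            loads[link] = loads.get(link, 0.0) + bandwidth
    return loads


def uniform_capacity(loads, utilization):
    """Every link gets the capacity that the busiest link needs."""
    per_link = max(loads.values()) / utilization
    return per_link * len(loads)


def per_link_capacity(loads, utilization):
    """Each link gets only the capacity its own load needs."""
    return sum(load / utilization for load in loads.values())


# Two hypothetical flows sharing link "b".
flows = [(["a", "b"], 2.0), (["b", "c"], 1.0)]
loads = link_loads(flows)
uniform_total = uniform_capacity(loads, utilization=0.5)
tailored_total = per_link_capacity(loads, utilization=0.5)
```

With these numbers the uniform sizing needs 18 capacity units in total and the per-link sizing needs 12, because lightly loaded links are no longer sized for the busiest one.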

Capacity Allocation: Realistic Example
  • A SoC-like system with realistic traffic demands and delay requirements
  • “Classic” design: 41.8 Gbit/sec
  • Using the algorithm: 28.7 Gbit/sec
  • Total capacity reduced by roughly 30%
[Figure: link capacities on a grid of nodes 00 to 23, before and after optimization]

Optimizing routing on Irregular Mesh
  • Around the Block
  • Dead End
Goal: Minimize the total size of routing tables
E. Bolotin, I. Cidon, R. Ginosar and A. Kolodny, "Routing Table Minimization for Irregular Mesh NoCs", DATE 2007.

Saving Table Hardware
Traditional solutions: full routing tables
  • Destination based routing: tables at each router
  • Source routing: tables at the sources
Solution idea: use reduced tables
  • Store only relevant destinations (PLA)
  • Default function (“Go XY” or “Don’t turn”) + table for deviations
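The "default function plus deviation table" idea can be sketched as a per-router lookup. This is an illustration only: the port names, the (x, y) coordinates and the functions `default_xy_port` and `next_port` are assumptions, not taken from the paper.

```python
def default_xy_port(cur, dst):
    """Default "Go XY" rule: correct X first, then Y. Coordinates are (x, y)."""
    if dst[0] > cur[0]:
        return "E"
    if dst[0] < cur[0]:
        return "W"
    if dst[1] > cur[1]:
        return "N"
    if dst[1] < cur[1]:
        return "S"
    return "LOCAL"


def next_port(cur, dst, deviations):
    """Use the router's small deviation table if dst is listed, else the default rule."""
    if dst in deviations:
        return deviations[dst]
    return default_xy_port(cur, dst)


# Hypothetical router at (0, 0) whose east neighbour is missing on the way to (2, 1):
# only that one destination needs a table entry.
table_at_origin = {(2, 1): "N"}
port_deviated = next_port((0, 0), (2, 1), table_at_origin)
port_default = next_port((0, 0), (0, 3), table_at_origin)
```

The table holds one entry here instead of one entry per destination, which is the hardware saving the slide is after.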

Routing Heuristics for Irregular Mesh
  • Distributed Routing (full tables)
  • X-Y Routing with Deviation Tables
  • Source Routing for Deviation Points
[Figure panels: random problem instances; systems with real applications]

Efficient Routing Results
[Figure panels: savings; scaling of savings with network size]

NoC for Shared Memory CMP
  • Constraints
      - Multiple access to coherent cache
      - Unpredictable traffic pattern
      - QoS requirements (fetch, pre-fetch)
  • Main cost
      - CMP power / area performance
  • Architecture of choice:
      - Tailored for a given CMP
      - In-order/adaptive routing?
      - Simple QoS mechanisms?
      - Regular topology? Is the CMP symmetric?
      - Built-in support functions (multicast, search…)

NoC can facilitate critical transactions
* E. Bolotin, Z. Guz, I. Cidon, R. Ginosar and A. Kolodny, “The Power of Priority: NoC based Distributed Cache Coherency”, NoCs 2007.

Priority NoC: Results
[Figure only]

NoC Based FPGA Architecture
[Figure labels: functional unit; NoC for inter-routing; routers; configurable region (user logic); configurable network interface]

NoC for FPGA
  • Design envelope / constraints
      - Many ASIC-like applications for a given FPGA
      - Hard NoC infrastructure: efficient but inflexible
      - Soft logic is reusable but has inferior performance
  • Average NoC cost of most demanding designs
      - Hard grid links and router logic
      - Total configured NoC logic used
  • Architecture of choice:
      - Regular and uniform grid
      - In-order/load balanced routing
      - Hard logic for links, routers
      - Soft logic for routing algorithms, headers, CNIs
      - Soft NoC tuning (routing, CNI) for a given implementation

Source Toggle XY
  • Unlike TXY, traffic to same destination is not split
  • Maximum capacity similar to TXY
  • The route is a bitwise XOR of source and destination ID
  • Can be extended to weighted source toggle (WOT)
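One possible reading of the XOR rule is sketched below. It is an interpretation, not the published algorithm: the sketch assumes the parity of the XOR of the two integer IDs selects between an XY and a YX route, and the function names are hypothetical.

```python
def parity(n):
    """Return 1 if n has an odd number of set bits, else 0."""
    return bin(n).count("1") % 2


def stxy_order(src_id, dst_id):
    """Choose the dimension order for a source/destination pair.

    Assumption for this sketch: even parity of (src_id XOR dst_id) means XY,
    odd parity means YX.
    """
    return "XY" if parity(src_id ^ dst_id) == 0 else "YX"


order_a = stxy_order(0b0101, 0b0100)   # XOR = 0b0001, odd parity
order_b = stxy_order(0b0101, 0b0110)   # XOR = 0b0011, even parity
```

Under this reading the choice depends only on the two IDs, so all packets of one source/destination pair take the same route, while different pairs are spread over both dimension orders.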

Two Hotspots
Design envelope for various distances between the hotspots for WOT
[Figure: maximum capacity]

Generic NoC Problems
Many shared problems across the design spectrum, for example:
  • Need for a low latency class of service
  • Verification and predictability
  • Power control of NoCs
  • Centralized vs. distributed control
  • Is a single NoC enough per chip?
      - Bus examples suggest otherwise
  • Hot modules slow incoming NoC traffic
      - Off chip systems
      - Shared memory subsystems
      - Expensive functional units

NoC clogging by hot modules
[Figure: IP 1, IP 2 and IP 3 attached to the NoC through their interfaces]
  • HM is not a local problem
  • Transparent to NoC performance
Walter, Cidon, Ginosar and Kolodny, “Access Regulation to Hot-Modules in Wormhole NoCs”, NOCS 2007.

Source Fairness
[Figure: sources and their interfaces at different distances from the hot module IP (HM)]
  • No “fairness” is guaranteed, since routers’ arbitration is based on local state
  • The farther a source is from the destination, the more arbitrations its worm has to win
  • The HM bandwidth isn’t fairly shared
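The unfairness can be made concrete with a back-of-the-envelope model. The setup is an assumption for illustration: sources sit along a single line of routers leading to the hot module, every source is saturated, and each router splits its output evenly between its local source and the traffic arriving from farther upstream.

```python
def hm_shares(n_sources):
    """Fraction of hot-module bandwidth obtained by each source.

    Index 0 is the source closest to the hot module. Each router gives half of
    its output to its local source and half to upstream traffic; the last
    router has no upstream traffic, so its source keeps whatever remains.
    """
    shares = []
    remaining = 1.0
    for i in range(n_sources):
        if i == n_sources - 1:
            shares.append(remaining)
        else:
            shares.append(remaining / 2)
            remaining /= 2
    return shares


shares = hm_shares(4)
```

Under these assumptions the nearest source gets half of the hot module's bandwidth while the two farthest sources get one eighth each, even though every individual router arbitrates fairly.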

Hot Module Distributed Arbitration
  • Control is distributed or centralized
  • Centralized control can account for dependencies
  • Requests and grants are sent at a high service level
  • Requests and grants include additional data as needed
      - Requested quota, source queue size, priority, deadline, etc.
      - Granted quota, scheduling of transmissions, etc.
  • Initial credits hide the request-grant latency under light load
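A request/grant exchange of this kind could look as follows in a centralized variant. This is a sketch, not the scheme from the paper: the max-min fair policy, the flit-based quota unit and the name `grant_quotas` are all assumptions, and requests carry only the requested quota.

```python
def grant_quotas(requests, capacity):
    """Split `capacity` flits among requesters in a max-min fair way.

    requests maps a source name to the number of flits it asks for.
    Returns a dict mapping each source to its granted quota.
    """
    grants = {src: 0 for src in requests}
    pending = dict(requests)
    remaining = capacity
    while pending and remaining > 0:
        share = remaining // len(pending)   # equal share for this round
        if share == 0:
            break
        for src in list(pending):
            give = min(share, pending[src])
            grants[src] += give
            pending[src] -= give
            remaining -= give
            if pending[src] == 0:           # fully served sources drop out
                del pending[src]
    return grants


grants = grant_quotas({"IP1": 2, "IP2": 10, "IP3": 10}, capacity=12)
```

Here IP1 receives its full request of 2 flits and the other two sources split the rest, 5 flits each, regardless of how far each source is from the hot module.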

Hot vs. non-Hot Module Traffic
[Figure panels: HM traffic without control; HM traffic with control; other traffic without control; other traffic with control]

Conclusions
  • NoC is a chip design paradigm shift
  • Introduces many diverse and new networking challenges
  • No killer NoC for all chips
  • Should not comply with any X-AN concept
      - May include centralized mechanisms
      - May involve more than one NoC/Bus mechanism
      - May combine several communication methodologies, e.g. a low latency NoC/Bus for metadata and urgent signals
  • Beware of early standardization and legacy barriers
  • Mutual benefit for VLSI-Networking collaboration
NoC: A Network AND A Chip