
2aef0bdeb9ba8460e9006cce0053d4a0.ppt
- Number of slides: 27
Architecture and Routing for NoC-based FPGA
Israel Cidon*
*Joint work with Roman Gindin and Idit Keidar
One NoC does not fit all!
[Figure: labels recovered from the slide: traffic uncertainty; flexibility; CMP; run time; configuration; FPGA; SoC; chip design; single application; general-purpose computer.]
Source: I. Cidon and K. Goossens, in "Networks on Chips", G. De Micheli and L. Benini, Morgan Kaufmann, 2006.
Israel Cidon - Technion
Field Programmable Gate Array - 101
- Flexible soft logic
  - Configurable logic blocks (CLBs) and routing channels
    - Programmed look-up tables (LUTs)
    - Configurable switching boxes
- Area-, power- and speed-efficient hard logic
  - Wire and clock infrastructure
  - Special-purpose modules, e.g., CPU, SerDes
Challenges for Future FPGA
- Scalability of design methodology
- Dominance of wire delays
  - Already more than 50% of delay
- Power
- Complex communication patterns
- Prototyping for NoC-based SoCs
NoC-Based FPGA Architecture
[Figure labels: functional unit; NoC for inter-routing; routers; configurable region (user logic); configurable network interface.]
Hard or soft NoC?
- Why hard
  - Interconnect is a performance bottleneck
  - Interconnect power
  - Part of FPGA infrastructure
- Why soft
  - Application is not known when the network is built
  - Provides maximum flexibility
  - Prevents resource lockup
Suggested FPGA NoC Architecture
NoC element: implementation
- Wires, repeaters, etc.: hard
- Routers, including VCs, buffers, QoS support: hard
- Network interfaces: soft, Configurable Network Interface (CNI)
- Routing algorithm and headers: soft, determined in CNI
- Routing tables: soft
FPGA Routing - Optimization Problem
- A set of applications with different architectures and different traffic patterns
- All implemented on the same chip
- A common, efficient NoC
The NoC design problem
- Design envelope
  - Collection of designs supported by a given programmable chip
- The cost
  - Hard grid links
    - For uniform grids: the capacity of the most congested link
  - NoC logic
    - Hard logic for routers
    - Soft logic for routing tables, headers, CNIs
- The variables
  - Number of "hard-coded" wires per link
  - Possible configurable routing schemes
Routing Schemes
- XY
  - (+) Very simple logic
  - (+) Deadlock free
  - (-) Unbalanced: high cost in uniform-capacity grids
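The XY scheme and the "capacity of the most loaded link" cost can be sketched as follows. This is a minimal illustration, not the authors' tool: the coordinate convention, the function names `xy_route` and `max_link_load`, and the traffic values are all assumptions made for the example.

```python
from collections import defaultdict

def xy_route(src, dst):
    """Return the directed links of the XY route: move along X first, then Y."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def max_link_load(flows, route=xy_route):
    """flows maps (src, dst) -> bandwidth; returns the load of the busiest link."""
    load = defaultdict(float)
    for (src, dst), bw in flows.items():
        for link in route(src, dst):
            load[link] += bw
    return max(load.values(), default=0.0)

# Hypothetical hotspot traffic: three sources send one unit each to node (2, 2).
flows = {((0, 0), (2, 2)): 1.0, ((1, 0), (2, 2)): 1.0, ((0, 2), (2, 2)): 1.0}
print(max_link_load(flows))  # 2.0: two flows share the links leading into column 2
```

Because every link in a uniform grid must be sized for the busiest one, this single number is what the later schemes try to reduce.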
Toggle XY (TXY)
- Split packets evenly between XY and YX routes
- Deadlock avoided with 2 VCs
- Near-optimal for symmetric traffic (permutations) [Seo et al. 05; Towles & Dally 02]
- (+) Simple
- (+) Better balanced
- (-) Split routes
- (-) Does not take into account the traffic pattern
Weighted Schemes
- TXY does not always produce the best results
[Figure: maximum capacity for two hotspots at (1, 1) and (1, 2) on a 5 x 5 grid; panels: TXY, Optimum.]
WTXY
- Given a traffic pattern, choose the XY/YX ratio with the lowest maximum capacity
- Compute the ratio at programming time
- Load it into the Cxy field in the router
- Router chooses the XY route with probability Cxy, otherwise YX
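One way to picture the programming-time step is a simple sweep over candidate ratios. The sketch below assumes a single chip-wide Cxy value, models the random choice by its expected load, and uses a brute-force search; the slides do not say how the ratio is actually computed, and the names `route`, `max_load`, and `best_cxy` are hypothetical.

```python
from collections import defaultdict

def route(src, dst, x_first):
    """Directed links of a dimension-ordered route (XY if x_first, else YX)."""
    pos, links = list(src), []
    for axis in ((0, 1) if x_first else (1, 0)):
        while pos[axis] != dst[axis]:
            nxt = list(pos)
            nxt[axis] += 1 if dst[axis] > pos[axis] else -1
            links.append((tuple(pos), tuple(nxt)))
            pos = nxt
    return links

def max_load(flows, cxy):
    """Expected load of the busiest link when a fraction cxy of each flow goes XY."""
    load = defaultdict(float)
    for (src, dst), bw in flows.items():
        for link in route(src, dst, True):
            load[link] += cxy * bw
        for link in route(src, dst, False):
            load[link] += (1.0 - cxy) * bw
    return max(load.values(), default=0.0)

def best_cxy(flows, steps=100):
    """Sweep cxy over [0, 1]; return the (cxy, max link load) pair with the lowest load."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(((c, max_load(flows, c)) for c in candidates), key=lambda t: t[1])

# Hypothetical traffic: a heavy diagonal flow plus a light straight flow.
flows = {((0, 0), (1, 1)): 3.0, ((0, 1), (1, 1)): 1.0}
cxy, load = best_cxy(flows)
print(f"TXY (Cxy = 0.5): {max_load(flows, 0.5):.2f}; WTXY (Cxy = {cxy:.2f}): {load:.2f}")
```

In this toy case the even split of TXY overloads the link shared with the straight flow, and shifting more of the diagonal flow to XY lowers the peak.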
TXY, WTXY Limitation
- Traffic split
  - Packets of the same flow take different paths
- Delays may cause out-of-order arrivals
- Re-ordering buffers are costly
Ordered Routing Algorithms
- One route per source-destination (S-D) pair
  - No traffic splitting
[Figure panels: Unordered Routing; Ordered Routing.]
Source Toggle XY (STXY)
- The route is a function of source and destination ID
  - Bitwise XOR
- Very simple algorithm
- Maximum capacity is similar to TXY
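The slide only states that the route is derived from a bitwise XOR of the source and destination IDs. The sketch below fills in the missing details with assumptions: row-major node IDs and the parity of the XOR result as the selector. Both choices, and the names `node_id` and `stxy_uses_xy`, are illustrative rather than taken from the talk.

```python
def node_id(node, width):
    """Row-major node ID on a grid that is `width` nodes wide (assumed numbering)."""
    x, y = node
    return y * width + x

def stxy_uses_xy(src, dst, width):
    """Assumed rule: even parity of (src_id XOR dst_id) selects XY, odd selects YX."""
    diff = node_id(src, width) ^ node_id(dst, width)
    return bin(diff).count("1") % 2 == 0

# On a hypothetical 4-wide grid, the choice is fixed per S-D pair.
for dst in [(1, 1), (2, 1), (3, 1)]:
    order = "XY" if stxy_uses_xy((0, 0), dst, 4) else "YX"
    print((0, 0), "->", dst, "uses", order)
```

Whatever the exact rule, the point is that the decision depends only on the S-D pair, so all packets of a flow follow one path and arrive in order.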
Weighted Ordered Toggle - WOT
- Weighted Ordered Toggle (WOT)
  - Route per S-D pair is chosen at programming time
  - Each source stores a routing bit for each destination
- Objective: minimize max link capacity
  - Optimal route assignment is difficult
WOT Min-max Route Assignment
- Initial assignment: STXY
- Make changes that reduce the capacity:
  - Find the most loaded link
  - Among S-D pairs sharing this link, change the one that minimizes the max capacity (if possible)
- Sub-optimal
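The iteration above can be sketched as a greedy loop. This is one reading of the slide, not the authors' implementation: the initial assignment is passed in as a parameter (the slide starts from STXY), ties on the busiest link are broken arbitrarily, and the names `route`, `link_loads`, and `wot_assign` are hypothetical.

```python
from collections import defaultdict

def route(src, dst, x_first):
    """Directed links of a dimension-ordered route (XY if x_first, else YX)."""
    pos, links = list(src), []
    for axis in ((0, 1) if x_first else (1, 0)):
        while pos[axis] != dst[axis]:
            nxt = list(pos)
            nxt[axis] += 1 if dst[axis] > pos[axis] else -1
            links.append((tuple(pos), tuple(nxt)))
            pos = nxt
    return links

def link_loads(flows, choice):
    """Per-link load when each S-D pair uses its single chosen route."""
    load = defaultdict(float)
    for pair, bw in flows.items():
        for link in route(*pair, choice[pair]):
            load[link] += bw
    return load

def wot_assign(flows, initial):
    """Greedy min-max: flip one S-D pair on the busiest link while that lowers the peak."""
    choice = dict(initial)
    while True:
        load = link_loads(flows, choice)
        if not load:
            return choice, 0.0
        hot = max(load, key=load.get)          # most loaded link
        peak = load[hot]
        best_pair, best_peak = None, peak
        for pair in flows:
            if hot not in route(*pair, choice[pair]):
                continue                       # only pairs sharing the hot link
            trial = dict(choice)
            trial[pair] = not choice[pair]
            trial_peak = max(link_loads(flows, trial).values())
            if trial_peak < best_peak:
                best_pair, best_peak = pair, trial_peak
        if best_pair is None:                  # no flip helps: stop (possibly sub-optimal)
            return choice, peak
        choice[best_pair] = not choice[best_pair]

# Hypothetical example: pair a starts on YX and collides with the straight flow b.
a, b = ((0, 0), (1, 1)), ((0, 1), (1, 1))
flows = {a: 2.0, b: 1.0}
choice, peak = wot_assign(flows, {a: False, b: True})
print(choice[a], peak)  # True 2.0: a is moved to XY and the peak drops from 3.0 to 2.0
```

Each accepted flip strictly lowers the peak, so the loop terminates, but as the slide notes it can stop at an assignment that is not globally optimal.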
Iteration Demonstration
[Figure: sources S1, S2, S3 and destinations D1, D2, D3.]
Benchmarks
- Previous work considers uniform permutations
- Chips have one or more hotspots
  - CPU, on-chip memory, off-chip memory interface
- We use several hotspot traffic models
- We also use a real-world example
Single Hotspot
Two Hotspots
[Figure: maximum capacity; design envelope for various distances between the hotspots for WOT.]
Three Hotspots
- Maximum capacity vs. minimum distance between the hotspots
Mixed Traffic Model
- Three parameters per node
  - A probability to be a hotspot
  - A probability to send data to a hotspot
  - A probability to send data to a non-hotspot
- Average improvement for WOT vs. TXY is 12% and vs. XY is 25%
Real-World Example
- Based on Bertozzi: video encoder
  - Mapping and placement are done manually
Real-World Example
- Maximum capacity
  - WOT: 1053
  - STXY: 1377
  - XY: 1539
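To put the three figures on a common scale, the relative reduction in maximum capacity can be computed directly from the slide's numbers. The helper name `reduction` is only for illustration.

```python
# Maximum capacity values as reported on the slide.
peak = {"WOT": 1053, "STXY": 1377, "XY": 1539}

def reduction(baseline, improved):
    """Fraction by which `improved` lowers the maximum capacity relative to `baseline`."""
    return (baseline - improved) / baseline

for name in ("STXY", "XY"):
    print(f"WOT vs. {name}: {reduction(peak[name], peak['WOT']):.1%} lower maximum capacity")
# WOT vs. STXY: 23.5% lower maximum capacity
# WOT vs. XY: 31.6% lower maximum capacity
```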
Summary
- A new NoC-based architecture for FPGA
- A design methodology for this architecture
- WOT routing algorithm
  - Balanced
  - In-order
  - Low cost