  • Number of slides: 18

The Crosspoint Queued Switch
Yossi Kanizo (Technion, Israel)
Joint work with Isaac Keslassy (Technion, Israel) and David Hay (Politecnico di Torino, Italy)

Typical Switch Architectures
[Figure: linecards connected to a switch fabric; assumes an instantaneous closed loop]
Ø IQ – Input Queued
Ø CICQ – Combined Input and Crosspoint Queued

Single-Rack Router
Ø Instantaneous closed loop → works in a single rack
  Ø Problem: multi-rack routers

Current Router Architectures
Is the closed loop still instantaneous?
[Source: N. McKeown]

Time Trends
[Figure: time trends, in ns]

Hiding Propagation Delays
Ø Traditional solutions:
  Ø Increase time-slot → poor switch performance
  Ø Hide propagation delays using buffers → impractical amount of buffering
Ø Proposed solution: closed loop → open loop
  Ø Performance degradation vs. instantaneous closed loop

Outline
Ø CQ: Open-loop switch architecture
Ø Performance Evaluation
  Ø Analytical results
  Ø Simulations
CQ performance degradation is not significant

Proposed Architecture: The Crosspoint-Queued (CQ) Switch
[Figure: linecards connected to the switch core over 10s of meters]
Ø No queues in the linecards
Ø Buffering only inside the fabric
Ø Independent output schedulers
Ø Drops when buffers are full
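
The architecture above can be sketched as a small slotted simulation. This is an illustrative model only, not the authors' simulator: the function name `simulate_cq`, the arrival-then-departure ordering within a slot, and the choice of a longest-queue-first (LQF) scheduler at each output are assumptions made here for the example.

```python
import random

def simulate_cq(n, b, rho, slots, seed=0):
    """Sketch of an n x n CQ switch with b cells per crosspoint.

    Assumed traffic: Bernoulli arrivals with probability rho per input,
    destinations chosen uniformly at random.
    Returns (arrived, delivered, dropped, still_buffered).
    """
    rng = random.Random(seed)
    # occupancy[i][j] = cells waiting at crosspoint (input i, output j)
    occupancy = [[0] * n for _ in range(n)]
    arrived = delivered = dropped = 0
    for _ in range(slots):
        # Arrival phase: no linecard queues, cells go straight to the fabric.
        for i in range(n):
            if rng.random() < rho:
                j = rng.randrange(n)
                arrived += 1
                if occupancy[i][j] < b:
                    occupancy[i][j] += 1
                else:
                    dropped += 1  # crosspoint buffer full: the cell is dropped
        # Departure phase: every output decides on its own (open loop).
        # Assumption: LQF, i.e. serve the fullest crosspoint in the column.
        for j in range(n):
            longest = max(range(n), key=lambda i: occupancy[i][j])
            if occupancy[longest][j] > 0:
                occupancy[longest][j] -= 1
                delivered += 1
    still_buffered = sum(sum(row) for row in occupancy)
    return arrived, delivered, dropped, still_buffered

if __name__ == "__main__":
    a, d, x, q = simulate_cq(n=32, b=4, rho=0.9, slots=5000)
    print(f"arrived={a} delivered={d} dropped={x} buffered={q}")
```

No information flows back from the fabric to the inputs in this sketch: an input never learns whether its cell was stored or dropped, which is what makes the loop open.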

CQ Properties
Ø Open loop
  Ø No communication overhead
Ø No linecard queues
  Ø No linecard queue management
Ø “Router on a chip”
  Ø Buffering and switch fabric on same chip

Why not 10 years ago?
Ø No need: single rack
Ø No technology: SRAM density
  Ø Moore’s law: density doubling every 2.5 years
  Ø Aggressive 128 x 128 CQ switch: 4 cells of 64 bytes per crosspoint → 64 cells today
Ø Conservative buffer requirements
  Ø TCP Stanford model with smaller buffer needs [Appenzeller, Keslassy and McKeown ’04]
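
The SRAM-density figures above can be checked with a few lines of arithmetic. The sketch reads "4 cells → 64 cells" as ten years of density doubling every 2.5 years, which is an interpretation of the slide rather than something it states outright; the helper name `cells_after` is hypothetical.

```python
def cells_after(cells_then, years, doubling_period_years):
    """Cells per crosspoint after `years` of density doubling."""
    return cells_then * 2 ** (years / doubling_period_years)

N = 128            # switch size (128 x 128 crosspoints)
CELL_BYTES = 64    # bytes per cell

cells_then = 4
cells_today = cells_after(cells_then, years=10, doubling_period_years=2.5)

# Total on-chip buffering: one buffer per crosspoint.
bytes_then = N * N * cells_then * CELL_BYTES
bytes_today = N * N * int(cells_today) * CELL_BYTES

print(cells_today)             # 64.0
print(bytes_then / 2**20)      # 4.0  (MiB)
print(bytes_today / 2**20)     # 64.0 (MiB)
```

Four doublings in ten years give a factor of 16, so the same chip area that held 4 cells per crosspoint holds 64, and the total buffering grows from 4 MiB to 64 MiB.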

Outline
Ø CQ: Our open-loop switch architecture
Ø Performance Evaluation
  Ø Analytical results
  Ø Simulations

100% Throughput as B→∞
Ø Throughput bounds: OQ(2B-1) ≤ CQ(B) ≤ OQ(NB)
[Figure labels: buffer size B; LQF scheduling algorithm; 100% throughput]
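
One way to read the bounds on this slide is as a squeeze argument. The notation below is introduced here for illustration and is not from the slides: $T_{\mathrm{OQ}}(b)$ denotes the throughput of an output-queued switch with buffer size $b$, and $T_{\mathrm{CQ}}(B)$ the throughput of a CQ switch with $B$ cells per crosspoint. The sketch assumes admissible traffic, so that an OQ switch approaches 100% throughput as its buffer grows.

```latex
\[
  T_{\mathrm{OQ}}(2B-1) \;\le\; T_{\mathrm{CQ}}(B) \;\le\; T_{\mathrm{OQ}}(NB) \;\le\; 1
\]
\[
  \lim_{B\to\infty} T_{\mathrm{OQ}}(2B-1) = 1
  \quad\Longrightarrow\quad
  \lim_{B\to\infty} T_{\mathrm{CQ}}(B) = 1
\]
```

Since the lower bound already tends to 1, the CQ throughput is forced to 1 as well; the upper bound only says that a CQ switch cannot beat an OQ switch holding all $NB$ cells of an output column in one queue.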

Uniform Traffic, B=1
Ø Uniform traffic model:
  Ø At each time-slot, at each of the N inputs: Bernoulli IID packet arrivals with probability ρ
  Ø Each packet is destined for one of the N outputs uniformly at random
Ø Theorem: Under uniform traffic and B=1, the performance of the switch is independent of the specific work-conserving scheduling algorithm
  Ø Intuition: Symmetry
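
The uniform traffic model can be sampled directly, as in the sketch below. The function names and the symbol `rho` for the per-input arrival probability are choices made for this example. Under the model, each crosspoint sees an arrival with probability rho/N per slot, so the chance of no arrival at a given crosspoint is 1 - rho/N.

```python
import random

def uniform_arrivals(n, rho, rng):
    """One time-slot: (input, output) pairs of the arriving packets."""
    # Each input independently receives a packet with probability rho,
    # and the packet picks its output uniformly at random.
    return [(i, rng.randrange(n)) for i in range(n) if rng.random() < rho]

def crosspoint_arrival_rates(n, rho, slots, seed=0):
    """Empirical per-slot arrival rate at every crosspoint (i, j)."""
    rng = random.Random(seed)
    counts = [[0] * n for _ in range(n)]
    for _ in range(slots):
        for i, j in uniform_arrivals(n, rho, rng):
            counts[i][j] += 1
    return [[c / slots for c in row] for row in counts]

if __name__ == "__main__":
    n, rho = 4, 0.8
    rates = crosspoint_arrival_rates(n, rho, slots=20000)
    print("expected per-crosspoint rate:", rho / n)
    for row in rates:
        print(" ".join(f"{r:.3f}" for r in row))
```

Every crosspoint ends up with roughly the same load, which is the symmetry the theorem's intuition refers to.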

Uniform Traffic, B=1
Ø Theorem: The throughput and waiting time of a CQ switch with B=1 are: [formulas not preserved], with q = 1 − ρ/N
  Ø Throughput goes to 100% as N goes to infinity
Ø Proof: Based on Z-transform

Models for larger buffers
Ø Approximate Performance Analysis
Ø Model for exhaustive round-robin scheduling
  Ø Based on modifications to polling system with zero switch-over times
Ø Model for random scheduling algorithm
Ø Show 100% throughput as N→∞

Trace-Driven Simulation
[Figure: 32 x 32 CQ switch with different buffer sizes (in units of 64-byte packets)]
Buffers of size 64 suffice to ensure 99% throughput for N=32.

Conclusions
Ø CQ is open loop → allows multi-rack configuration
Ø CQ provides easy scheduling
Ø CQ is feasible to implement in a single chip
Ø CQ shows good performance in simulations

Thank You