Internet Algorithms Design and Analysis Alex Kesselman MPI

Скачать презентацию Internet Algorithms Design and Analysis Alex Kesselman MPI

610a3336dff329c5772b4bb00812f386.ppt

Количество слайдов: 192

Internet Algorithms: Design and Analysis Alex Kesselman, MPI Mini. Course, Oct. 2004

Algorithms for Networks • Networking provides a rich new context for algorithm design – algorithms are used everywhere in networks – – many new scenarios and very stringent constraints – – at the end-hosts for packet transmission in the network: switching, routing, caching, etc. high speed of operation large-sized systems cost of implementation require new approaches and techniques 2

Methods In the networking context – – – we also need to understand the “performance” of an algorithm: How well does a network or a component that uses a particular algorithm perform, as perceived by the user? performance analysis is concerned with metrics like delay, throughput, loss rates, etc metrics of the designer and of theoretician not necessarily the same 3

Recent Algorithm Design Methods • Motivated by the desire – – • for simplementations and for robust performance Several methods of algorithm design can be used in the networking context – – randomized algorithms approximation algorithms online algorithms distributed algorithms 4

In this Mini Course… • We will consider a number of problems in networking • Show various methods for algorithm design and for performance analysis 5

Network Layer Functions • transport packet from sending to receiving hosts • network layer protocols in every host, router important functions: • path determination: route taken by packets from source to dest. • switching: move packets from router’s input to appropriate router output application transport network data link physical network data link physical network data link physical application transport network data link physical 6

The Internet Core Edge Router 7

Internet Routing Algorithms Balaji Prabhakar

Network looks like Graph ! 9

Routing protocol 5 Goal: determine “good” path (sequence of routers) thru network from source to dest. Graph abstraction for routing algorithms: • graph nodes are routers • graph edges are physical links – link cost: delay, $ cost, or congestion level 2 A B 2 1 D 3 C F 1 3 1 5 E 2 • “good” path: – typically means minimum cost path – other def’s possible 10

Routing Algorithms Classification Global or decentralized information? Global: • all routers have complete topology, link cost info • “link state” algorithms Decentralized: • router knows physicallyconnected neighbors, link costs to neighbors • iterative process of info exchange with neighbors • “distance vector” algorithms Static or dynamic? Static: • routes change slowly over time Dynamic: • routes change more quickly – periodic update – in response to link cost changes 11

Link-State Routing Algorithms: OSPF Compute least cost paths from a node to all other nodes using Dijkstra’s algorithm. – advertisement carries one entry per neighbor router – advertisements disseminated via flooding 12

Dijkstra’s algorithm: example Step 0 1 2 3 4 5 start N A AD ADEBCF D(B), p(B) D(C), p(C) D(D), p(D) D(E), p(E) D(F), p(F) 2, A 1, A 5, A infinity 2, A 4, D 2, D infinity 2, A 3, E 4, E 5 2 A B 2 1 D 3 C 3 1 5 F 1 E 2 13

Route Optimization Improve user performance and network efficiency by tuning OSPF weights to the prevailing traffic demands. AT&T customers or peers backbone customers or peers 14

Route Optimization • Traffic engineering – Predict influence of weight changes on traffic flow – Minimize objective function (say, of link utilization) • Inputs – Networks topology: capacitated, directed graph – Routing configuration: routing weight for each link – Traffic matrix: offered load each pair of nodes • Outputs – Shortest path(s) for each node pair – Volume of traffic on each link in the graph – Value of the objective function 15

Example B 1 1 2 D A 1 2 C E Links AB and BD are overloaded Change weight of CD to 1 to improve routing (load balancing) ! 16

References 1. 2. Anja Feldmann, Albert Greenberg, Carsten Lund, Nick Reingold, Jennifer Rexford, and Fred True, "Deriving traffic demands for operational IP networks: Methodology and experience, " IEEE/ACM Transactions on Networking, pp. 265 -279, June 2001. Bernard Fortz and Mikkel Thorup, "Internet traffic engineering by optimizing OSPF weights, " in Proc. IEEE INFOCOM, pp. 519 -528, 2000. 17

Distance Vector Routing: RIP • Based on the Bellman-Ford algorithm – At node X, the distance to Y is updated by where DX(Y) denote the distance at X currently from X to Y, N(X) is set of the neighbors of node X, and c(X, Z) is the distance of the direct link from X to Z 18

Distance Table: Example A Below is just one step! The algorithm repeats for ever! 1 distance tables from neighbors computation 7 B 1 C 2 8 E D 2 D () A B D E’s distance table A 0 7 1 15 1, A A: 1 B 7 0 8 8 8, B B: 8 C 1 2 9 4 4, D C: 4 D 0 2 2, D D: 2 1 8 2 E: 0 c(E, A) c(E, B) c(E, D) 19 destinations E distance table E sends to its neighbors

Link Failure and Recovery • Distance vectors: exchanged every 30 sec • If no advertisement heard after 180 sec --> neighbor/link declared dead – routes via neighbor invalidated – new advertisements sent to neighbors – neighbors in turn send out new advertisements (if tables changed) – link failure info quickly propagates to entire net 20

The bouncing effect dest cost B C 1 2 dest cost 1 A B A C 1 1 1 25 C dest cost A B 2 1 21

C sends routes to B dest cost B C 1 2 A B A C ~ 1 1 25 C dest cost A B 2 1 22

B updates distance to A dest cost B C 1 2 A B A C 3 1 1 25 C dest cost A B 2 1 23

B sends routes to C dest cost B C 1 2 A B A C 3 1 1 25 C dest cost A B 4 1 24

How are these loops caused? • Observation 1: – B’s metric increases • Observation 2: – C picks B as next hop to A – But, the implicit path from C to A includes itself! 25

Solutions • Split horizon/Poisoned reverse – B does not advertise route to C or advertises it with infinite distance (16) • Works for two node loops – does not work for loops with more nodes 26

Example where Split Horizon fails A B 1 1 1 C D 1 • When link breaks, C marks D as unreachable and reports that to A and B • Suppose A learns it first. A now thinks best path to D is through B. A reports a route of cost=3 to C. • C thinks D is reachable through A at cost 4 and reports that to B. • B reports a cost 5 to A who reports new cost to C. • etc. . . 27

Comparison of LS and DV algorithms Message complexity • LS: with n nodes, E links, O(n. E) msgs sent • DV: exchange between neighbors only – larger msgs Speed of Convergence • LS: requires O(n. E) msgs – may have oscillations • DV: convergence time varies – routing loops – count-to-infinity problem Robustness: what happens if router malfunctions? LS: – node can advertise incorrect link cost – each node computes only its own table DV: – DV node can advertise incorrect path cost – error propagates thru network 28

Hierarchical Routing Our routing study thus far - idealization • all routers identical • network “flat” … not true in practice scale: with 50 million destinations: • can’t store all dest’s in routing tables! • routing table exchange would swamp links! administrative autonomy • internet = network of networks • each network admin may want to control routing in its own network 29

Hierarchical Routing • aggregate routers into regions, “autonomous systems” (AS) • routers in same AS run same routing protocol – “intra-AS” routing protocol gateway routers • special routers in AS • run intra-AS routing protocol with all other routers in AS • also responsible for routing to destinations outside AS – run inter-AS routing protocol with other gateway routers 30

Internet AS Hierarchy Inter-AS border (exterior gateway) routers Intra-AS interior (gateway) routers 31

Intra-AS and Inter-AS routing C. b a Host h 1 C b A. a Inter-AS routing between A and B A. c a d c b A Intra-AS routing within AS A B. a a c B Host h 2 b Intra-AS routing within AS B 32

Peer-to-Peer Networks: Chord Balaji Prabhakar

A peer-to-peer storage problem • 1000 scattered music enthusiasts • Willing to store and serve replicas • How do you find the data? 34

The Lookup Problem N 1 Key=“title” Value=MP 3 data… Publisher N 2 Internet N 4 N 5 N 3 ? Client Lookup(“title”) N 6 35

Centralized lookup (Napster) N 1 N 2 Set. Loc(“title”, N 4) Publisher@N 4 Key=“title” Value=MP 3 data… N 3 DB N 9 N 6 N 7 Client Lookup(“title”) N 8 Simple, but O(N) state and a single point of failure 36

Flooded queries (Gnutella) N 2 N 1 Publisher@N 4 Key=“title” Value=MP 3 data… N 6 N 7 N 3 Lookup(“title”) Client N 8 N 9 Robust, but worst case O(N) messages per lookup 37

Routed queries (Freenet, Chord, etc. ) N 2 N 1 Publisher N 3 N 4 Client Lookup(“title”) Key=“title” Value=MP 3 data… N 6 N 7 N 8 N 9 38

Chord Distinguishing Features • Simplicity • Provable Correctness • Provable Performance 39

Chord Simplicity • Resolution entails participation by O(log(N)) nodes • Resolution is efficient when each node enjoys accurate information about O(log(N)) other nodes 40

Chord Algorithms • Basic Lookup • Node Joins • Stabilization • Failures and Replication 41

Chord Properties • Efficient: O(log(N)) messages per lookup – N is the total number of servers • Scalable: O(log(N)) state per node • Robust: survives massive failures 42

Chord IDs • • Key identifier = SHA-1(key) Node identifier = SHA-1(IP address) Both are uniformly distributed Both exist in the same ID space • How to map key IDs to node IDs? 43

Consistent Hashing [Karger 97] • Target: web page caching • Like normal hashing, assigns items to buckets so that each bucket receives roughly the same number of items • Unlike normal hashing, a small change in the bucket set does not induce a total remapping of items to buckets 44

Consistent Hashing [Karger 97] Key 5 Node 105 K 5 N 105 K 20 Circular 7 -bit ID space N 32 A key is stored at its successor: node with next higher ID N 90 K 80 45

Basic lookup N 120 N 10 “Where is key 80? ” N 105 “N 90 has K 80” N 32 K 80 N 90 N 60 46

Simple lookup algorithm Lookup(my-id, key-id) n = my successor if my-id < n < key-id call Lookup(id) on node n // next hop else return my successor // done • Correctness depends only on successors 47

“Finger table” allows log(N)-time lookups ¼ ½ 1/8 1/16 1/32 1/64 1/128 N 80 48

Finger i points to successor of n+2 i N 120 112 ¼ ½ 1/8 1/16 1/32 1/64 1/128 N 80 49

Lookup with fingers Lookup(my-id, key-id) look in local finger table for highest node n s. t. my-id < n < key-id if n exists call Lookup(id) on node n // next hop else return my successor // done 50

Lookups take O(log(N)) hops N 5 N 10 K 19 N 20 N 110 N 99 N 32 Lookup(K 19) N 80 N 60 51

Node Join Linked List Insert N 25 N 36 1. Lookup(36) N 40 K 38 52

Node Join (2) N 25 2. N 36 sets its own successor pointer N 36 N 40 K 38 53

Node Join (3) N 25 3. Copy keys 26. . 36 from N 40 to N 36 K 30 N 40 K 38 54

Node Join (4) N 25 4. Set N 25’s successor pointer N 36 K 30 N 40 K 38 Update finger pointers in the background Correct successors produce correct lookups 55

Stabilization • Case 1: finger tables are reasonably fresh • Case 2: successor pointers are correct; fingers are inaccurate • Case 3: successor pointers are inaccurate or key migration is incomplete • Stabilization algorithm periodically verifies and refreshes node knowledge – Successor pointers – Predecessor pointers – Finger tables 56

Failures and Replication N 120 N 113 N 102 N 85 Lookup(90) N 80 doesn’t know correct successor, so incorrect lookup 57

Solution: successor lists • Each node knows r immediate successors • After failure, will know first live successor • Correct successors guarantee correct lookups • Guarantee is with some probability 58

Choosing the successor list length • Assume 1/2 of nodes fail • P(successor list all dead) = (1/2)r – I. e. P(this node breaks the Chord ring) – Depends on independent failure • P(no broken nodes) = (1 – (1/2)r)N – r = 2 log(N) makes prob. = 1 – 1/N 59

References Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan, ``Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications, '‘IEEE/ACM Transactions on Networking, Vol. 11, No. 1, pp. 17 -32, February 2003. 60

Switch Scheduling Algorithms Balaji Prabhakar

Basic Architectural Components 1. Routing Table 2. Interconnect 3. Output Scheduling Forwarding Decision Routing Table Forwarding Decision 62

Switching Fabrics Output Queued Input Queued Multi stage 3 7 5 2 6 0 1 4 7 2 3 5 6 1 0 4 Combined Input and Output Queued Batcher Sorter 7 77 5 0 4 66 2 5 5 45 3 1 6 54 1 3 0 33 0 4 3 22 6 2 1 01 4 6 2 20 Self-Routing Network 000 001 Parallel Packet Switches 010 011 100 101 110 111 63

Input Queueing Data In Scheduler configuration Data Out 64

Background 1. [Karol et al. 1987] Throughput limited to by head-of-line blocking for Bernoulli IID uniform traffic. 2. [Tamir 1989] Observed that with “Virtual Output Queues” (VOQs) Head-of-Line blocking is reduced and throughput goes up. 65

Head of Line Blocking 66

Input Queueing Virtual output queues 69

Background Scheduling via Matching 3. [Anderson et al. 1993] Observed analogy to maximum size matching in a bipartite graph. Matching O(N 2. 5) 4. [Mc. Keown et al. 1995] (a) Maximum size match can not guarantee 100% throughput. (b) But maximum weight match can – O(N 3). 70

Background Speedup 5. [Chuang, Goel et al. 1997] Precise emulation of a central shared memory switch is possible with a speedup of two and a “stable marriage” scheduling algorithm. 6. [Prabhakar and Dai 2000] 100% throughput possible for maximal matching with a speedup of two. 71

Simulation Input Queueing Output Queueing 72

Using Speedup 1 2 1 73

Scheduling algorithms to achieve 100% throughput 1. 2. 3. 4. • • • Basic switch model. When traffic is uniform (Many algorithms…) When traffic is non-uniform. Technique: Birkhoff-von Neumann decomposition. Load balancing. Technique: 2 -stage switch. Technique: Parallel Packet Switch. 74

Basic Switch Model S(n) A 11(n) L 11(n) 1 1 D 1(n) A 1 N(n) AN 1(n) DN(n) N ANN(n) N LNN(n) 75

Some Definitions 3. Queue occupancies: Occupancy L 11(n) LNN(n) 76

Some possible performance goals When traffic is admissible 77

Algorithms that give 100% throughput for uniform traffic • Quite a few algorithms give 100% throughput when traffic is uniform • “Uniform”: the destination of each cell is picked independently and uniformly and at random (uar) from the set of all outputs. 79

Maximum size bipartite match • Intuition: maximizes instantaneous throughput L 11(n)>0 Maximum Size Match LN 1(n)>0 “Request” Graph Bipartite Match • Gives 100% throughput for uniform traffic. 80

Some Observations • A maximum size match (MSM) maximizes instantaneous throughput. • But a MSM is complex – O(N 2. 5). • In general, maximal matching is much simpler to implement, and has a much faster running time. • A maximal size matching is at least half the size of a maximum size matching. 81

Maximal vs. Maximum Matching A 1 B 2 C 3 D 4 E 5 5 F 6 E F 6 Maximal Matching Maximum Matching 82

TDM Scheduling Algorithm If arriving traffic is i. i. d with destinations picked uar across outputs, then a “TDM” schedule gives 100% throughput. C A 2 B 3 C D 4 A B 1 D 1 2 A B 1 2 3 C 3 4 D 4 Permutations are picked uar from the set of N! permutations. 83

Why doesn’t maximizing instantaneous throughput give 100% throughput for non-uniform traffic? Three possible matches, S(n): 84

Example: With random arrivals, but known traffic matrix • Assume we know the traffic matrix, and the arrival pattern is random: • Then we can simply choose: 86

Birkhoff - von Neumann Decomposition Turns out, any L can always be decomposed into a linear (convex) combination of matrices, (M 1, …, Mr) by Birkhoff-von Neumann. 87

Birkhoff ‘ 1946 Decomposition Example = 0. 2 + 0. 4 + 88

In practice… • Unfortunately, we usually don’t know traffic matrix L a priori, so we can: – measure or estimate L, or – use the current queue occupancies. 89

2 -stage Switch Motivation: 1. If traffic is uniformly distributed, then even a simple TDM schedule gives 100% throughput. 2. So why not force non-uniform traffic to be uniformly distributed? 91

2 -stage Switch S 1(n) A 1(n) 1 1 L 11(n) A’ 1(n) S 2(n) 1 1 A’N(n) AN(n) N D 1(n) DN(n) N N N LNN(n) Bufferless Load-balancing Stage Buffered Switching Stage 92

2 -stage Switch 93

Parallel Packet Switches Definition: A PPS is comprised of multiple identical lower-speed packet-switches operating independently and in parallel. An incoming stream of packets is spread, packet-bypacket, by a demultiplexor across the slower packet-switches, then recombined by a multiplexor at the output. We call this “parallel packet switching” 94

Architecture of a PPS Demultiplexor R (s. R/k) 1 OQ Switch (s. R/k) 1 1 OQ Switch 2 Demultiplexor R R Multiplexor Demultiplexor R Multiplexor 2 3 3 OQ Switch Demultiplexor R (s. R/k) R Multiplexor R N=4 k=3 N=4 R (s. R/k) 95

Parallel Packet Switches Results [Iyer et al. ] If S >= 2 then a PPS can precisely emulate a FIFO output queued switch for all traffic patterns, and hence achieves 100% throughput. 96

References 1. 2. 3. 4. 5. 6. C. -S. Chang, W. -J. Chen, and H. -Y. Huang, "Birkhoff-von Neumann input buffered crossbar switches, " in Proceedings of IEEE INFOCOM '00, Tel Aviv, Israel, 2000, pp. 1614 – 1623. N. Mc. Keown, A. Mekkittikul, V. Anantharam, and J. Walrand. Achieving 100% Throughput in an Input-Queued Switch. IEEE Transactions on Communications, 47(8), Aug 1999. A. Mekkittikul and N. W. Mc. Keown, "A practical algorithm to achieve 100% throughput in input-queued switches, " in Proceedings of IEEE INFOCOM '98, March 1998. L. Tassiulas, “Linear complexity algorithms for maximum throughput in radio networks and input queued switchs, ” in Proc. IEEE INFOCOM ‘ 98, San Francisco CA, April 1998. C. -S. Chang, D. -S. Lee, Y. -S. Jou, “Load balanced Birkhoff-von Neumann switches, ” Proceedings of IEEE HPSR ‘ 01, May 2001, Dallas, Texas. S. Iyer, N. Mc. Keown, "Making parallel packet switches practical, " in Proc. IEEE INFOCOM `01, April 2001, Alaska. 97

Competitive Analysis: Theory and Applications in Networking Balaji Prabhakar

Decision Making Under Uncertainty: Online Algorithms and Competitive Analysis • Online Algorithm: – Inputs arrive online (one by one) – Algorithm must process each input as it arrives – Lack of knowledge of future arrivals results in inefficiency • Malicious, All-powerful Adversary: – Omniscient: monitors the algorithm – Generates “worst-case” inputs • Competitive Ratio: – Worst ratio of the “cost” of online algorithm to the “cost” of optimum algorithm 99

Competitive Analysis: Discussion • Very Harsh Model – All powerful adversary • But. . – Can often still prove good competitive ratios – Really tough Testing-Ground for Algorithms – Often leads to good rules of thumb which can be validated by other analyses – Distribution independent: doesn’t matter whether traffic is heavy-tailed or Poisson or Bernoulli 100

Competitive Analysis in Networking: Outline • Shared Memory Switches • Multicast Trees – The Greedy Strategy • Routing and Admission Control – The Exponential Metric • More Restricted Adversaries – Adversarial Queueing Theory • Congestion Control 101

Interconnects Output Queueing Individual Output Queues Centralized Shared Memory 1 2 N 102

Buffer Model • We consider Nx. N switch • Shared memory able to hold M bytes • Packets may be either: – accepted/rejected – preempted • All packets have the same size M 103

Shared Memory Example 104

Competitive Analysis Aim: maximize the total number of packets transmitted For each packet sequence S denote, • VOPT(S): value of best possible solution, • VA(S): value obtained by algorithm A Throughput-Competitive Ratio: MAXS {VOPT(S) / VA(S)} Uniform performance guarantee 105

Longest Queue Drop Policy When a packet arrives: – Always accept if the buffer is not full – Otherwise we accept the packet and drop a packet from the tail of the longest queue 106

Longest Queue Drop Policy M=9 107

LQD Policy Analysis Theorem 1 (UB): The competitive ratio of the LQD Policy is at most 2. Theorem 2 (LB): The competitive ratio of the LQD policy is at least 2. Theorem 3 (LB): The competitive ratio of any online policy is at least 4/3. 108

Proof Outline (UB) Definition: An OPT packet p sent at time t is an extra packet if the LQD port is idle. Claim: There exists a matching between each packet from EXTRA to a packet in LQD. OPT LQD EXTRA 109

Matching Construction • For each unmatched OPT packet p in a higher position than the LQD queue length: • When p arrives and it is accepted by both OPT and LQD then match p to itself • Otherwise, match p to any unmatched packet in LQD • If a matched LQD packet p is preempted, then the preempting packet replaces p. 110

Proof Outline (UB) OPT LQD 111

Proof Outline (UB) Lemma: The matching process never fails. • Notice: V(OPT) V(LQD) + V(EXTRA) • Existence of matching implies: V(EXTRA) V(LQD) • We obtain that: V(OPT) 2 V(LQD) 112

Proof Outline (LB) Scenario (active ports 1 & 2): • At t = 0 two bursts of M packets to 1 & 2 arrive. • The online retains at most M/2, say 1’s packets. • During the following M time slots one packet destined to 2 arrives. • The scenario is repeated. 113

Proof Outline (LB-LQD) Scenario: • the switch memory M = A 2/2 + A • the number of output ports N = 3 A Active A Ovld. A A Idle 114

Proof Outline (LB-LQD) Active output ports: • have an average load of 1 with period A • the bursts to successive ports are evenly staggered in time Overloaded output ports: • receive exactly 2 packets every time slot 115

Proof Outline (LB-LQD) OPT ensures that both the active and overloaded output ports are completely utilized. At the same time throughput of the active output ports in LQD is ( 2 -1)A. 116

Other Policies Complete Partition: N-competitive – Allocate to each output port M/N buffer space Complete Sharing: N-competitive – Admit packets into the buffer if there is some free space 117

Other Policies Cont. Static Threshold: N-competitive – Set the threshold for a queue length to M/ N – A packet is admitted if the threshold is not violated and there is a free space Dynamic Threshold: open problem – Set the threshold for a queue length to the amount of the free buffer space – All packets above threshold are rejected 118

Steiner Tree Problem Objective: find a minimum cost tree connecting S. 120

KMB Algorithm (Offline) Due to [Kou, Markowsky and Berman 81’] • Step 1: Construct a complete directed distance graph G 1=(V 1, E 1, c 1). • Step 2: Find the min spanning tree T 1 of G 1. • Step 3: Construct a subgraph GS of G by replacing each edge in T 1 by its corresponding shortest path in G. • Step 4: Find the min spanning tree TS of GS. • Step 5: Construct a Steiner tree TH from TS by deleting edges in TS if necessary, so that all the leaves in TH are Steiner points. 121

KMB Algorithm Cont. Worst case time complexity O(|S||V|2). Cost no more than 2(1 - 1/l) *optimal cost where l = number of leaves in the steiner tree. 122

KMB Example A A 4 1 10 H I 1/2 G 1 1 1 B 8 1 F 2 C 9 E 2 D B D 4 4 A 1 4 H C I 1/2 1 G A 4 1 D B 1 4 B 1 F 2 4 C E 2 D C Destination Nodes Intermediate Nodes 123

KMB Example Cont. A A 1 1 H I 1/2 G B 1 1 F 2 C I 1 1/2 1 E B 1 1 F 2 2 C D E 2 D Destination Nodes Intermediate Nodes 124

Incremental Construction of Multicast Trees • Fixed Multicast Source s – K receivers arrive one by one – Must adapt multicast tree to each new arrival without rerouting existing receivers – Malicious adversary generates bad requests – Objective: Minimize total size of multicast tree a s r 1 b C=3/2 Can create worse sequences 125

Dynamic Steiner Tree (DST) • G=(V, E) weighted, undirected, connected graph. • Si V is the set of terminal nodes to be connected at step i. 126

Two Classes of Online Algorithms • Shortest Path Algorithm – Each receiver connects using shortest path to source (or to a core) • DVMRP [Waitzman, Partridge, Deering ’ 88] • CBT [Ballardie, Francis, Crowcroft ‘ 93] • PIM [Deering et al. ’ 96] • Greedy Algorithm [Imase and Waxman ‘ 91] – Each receiver connects to the closest point on the existing tree – Independently known to the Systems community • The “naive” algorithm [Doar and Leslie ‘ 92] • End-system multicasting [Faloutsos, Banerjea, Pankaj ’ 98; Francis ‘ 99] 127

Shortest Path Algorithm: Example r 1 r 2 s r 3 r. K N • Receivers r 1, r 2, r 3, … , r. K join in order 128

Shortest Path Algorithm r 1 r 2 s r 3 r. K N • Cost of shortest path tree K N 129

Shortest Path Algorithm Competitive Ratio s r 1 r 2 r 3 r. K • Optimum Cost K + N • If N is large, the competitive ratio is K 130

Greedy Algorithm • Theorem 1: For the greedy algorithm, competitive ratio = O(log K) • Theorem 2: No algorithm can achieve a competitive ratio better than log K [Imase and Waxman ’ 91] Greedy algorithm is the optimum strategy 131

Proof of Theorem 1 [Alon and Azar ’ 93] • L = Size of the optimum multicast tree • pi = amount paid by online algorithm for ri – i. e. the increase in size of the greedy multicast tree as a result of adding receiver ri • Lemma 1: The greedy algorithm pays 2 L/j or more for at most j receivers – Assume the lemma – Total Cost 2 L (1 + 1/2 + 1/3 + … 1/K) ¼ 2 L log K 132

Proof of Lemma 3 Suppose towars a contradiction that there are more than j receivers for which the greedy algorithm paid more than 2 L/j – Let these be r 1, r 2, … , rm, for m larger than j – Each of these receivers is at least 2 L/j away from each other and from the source 133

Tours and Trees r 1 s r 3 r 2 r 4 rm s rm r 1 Each segment 2 L/j, Tour cost > 2 L r 2 r 3 One can construct tour from tree by repeating edges at most twice, Tour cost 2 L r 4 134

The Exponential Cost Metric • Consider a resource with capacity C • Assume that a fraction of the resource has been consumed • Exponential cost “rule of thumb”: The cost of the resource is given by m for appropriately chosen m • Intuition: Cost increases steeply with – Bottleneck resources become expensive Cost l 136

Applications of Exponential Costs • Exponential cost “rule of thumb” applies to – – – Online Routing Online Call Admission Control Stochastic arrivals Stale Information Power aware routing 137

The Online Routing Problem • Connection establishment requests arrive online in a VPN (Virtual Private Network) • Must assign a route to each connection and reserve bandwidth along that route – PVCs in ATM networks – MPLS + RSVP in IP networks • Oversubscribing is allowed – Congestion = the worst oversubscribing on a link • Goal: Assign routes to minimize congestion • Assume all connections have identical b/w requirement, all links have identical capacity 138

Online Routing Problem: Example a s r 1 b C=2 Can create worse sequences 139

Online Algorithm for Routing • L = Fraction of bandwidth of link L that has been already reserved • m = N, the size of the network • The Exponential Cost Algorithm: – Route each incoming connection on current cheapest path from src to dst – Reserve bandwidth along this path [Aspnes et al. ‘ 93] 140

Online Algorithm for Routing • Theorem 1: The exponential cost algorithm achieves a competitive ratio of O(log N) for congestion • Theorem 2: No algorithm can achieve competitive ratio better than log N in asymmetric networks This simple strategy is optimum! 141

Online Admission Control and Routing • Connection establishment requests arrive online • Must assign a route to each connection and reserve bandwidth along that route • Oversubscribing is not allowed – Must perform admission control • Goal: Admit and route connections to maximize total number of accepted connections (throughput) 143

Exponential Metric and Admission Control • When a connection arrives, compute the cheapest path under current exponential costs • If the cost of the path is less than m then accept the connection; else reject [Awerbuch, Azar, Plotkin ’ 93] • Theorem: This simple algorithm admits at least O(1/log N) as many calls as the optimum 144

Assume Stochastic Arrivals • • • Connection arrivals are Poisson, durations are Memory-less Assume fat links (Capacity >> log N) Theorem: The exponential cost algorithm results in 1. Near-optimum congestion for routing problem 2. Near-optimum throughput for admission problem [Kamath, Palmon, Plotkin ’ 96] Near-optimum: Compt. ratio = (1+e) for e close to 0 146

Versatility of Exponential Costs • Guarantees of log N for Competitive ratio against malicious adversary • Near-optimum for stochastic arrivals • Near-optimum given fixed traffic matrix [Young ’ 95; Garg and Konemann ’ 98] 147

Exponential Metrics and Stale Information • Exponential metrics continue to work well if – Link states are a little stale – Shortest paths are reused over small intervals rather than recomputed for each connection – No centralized agent [Goel, Meyerson, Plotkin ’ 01] • Caveat: Still pretty hard to implement 149

Power Aware Routing • Consider a group of small mobile nodes eg. sensors which form an adhoc network – Bottleneck Resource: Battery – Goal: Maximize the time till the network partitions • Assign a cost to each mobile node which is m where = fraction of battery consumed – Send packets over the cheapest path under this cost measure • O(log n) competitive against an adversary – Near-optimum for stochastic/fixed traffic 151

Adversarial Queueing Theory Motivation • Malicious, all-knowing adversary – Injects packets into the network – Each packet must travel over a specified route • Suppose adversary injects 3 packets per second from s to r – Link capacities are one packet per second s r – No matter what we do, we will have unbounded queues and unbounded delays – Need to temper our definition of adversaries 153

Adversarial Queueing Theory Bounded Adversaries • Given a window size W, and a rate r < 1 – For any link L, and during any interval of duration T > W, the adversary can inject at most r. T packets which have link L in their route • Adversary can’t set an impossible task!! – More gentle than competitive analysis • Will study packet scheduling strategies – Which packet to forward if more than one packets are waiting to cross a link? 154

Some Interesting Scheduling Policies • FIFO: First In First Out • LIFO: Last In First Out • NTG: Nearest To Go – Forward a packet which is closest to destination • FTG: Furthest To Go – Forward a packet which is furthest from its destination • LIS: Longest In System – Forward the packet that got injected the earliest – Global FIFO • SIS: Shortest In System – Forward the packet that got injected the last – Global LIFO 155

Stability in the Adversarial Model • Consider a scheduling policy (eg. FIFO, LIFO etc. ) • The policy is universally stable if for networks and all “bounded adversaries”, the packet delays and queue sizes remain bounded • FIFO, LIFO, NTG are not universally stable [Borodin et al. ‘ 96] • LIS, SIS, FTG are universally stable [Andrews et al. ‘ 96] 156

Adversarial Queueing Model: Routing Using the Exponential Cost Metric • Adversary injects packets into the network but gives only the src, dst – The correct routes are hidden • Need to compute routes – Again, use the exponential cost metric – Reset the cost periodically to zero – Use any stable scheduling policy • Theorem: The combined routing and scheduling policy is universally stable [Andrews et al. ’ 01] 157

The Problem S o u r c e s S i n k s • What rates should the users use to send their data? • How to keep the network efficient and fair? • Goal: match the available bandwidth ! 159

Model Description • Model – Time divided into steps – Oblivious Adversary – Source select xi Available Bandwidth bi chosen by the Adversary Algorithm picks and sends xi • Severe cost function Time 160

Competitive Ratio • An Algorithm achieves • Optimal (offline) achieves • Seek to minimize 161

Adversary Model • Unrestricted Adversary – Has too much power • Fixed Range Adversary • µ-multiplicative adversary • {α, β}-additive adversary 162

Fixed Range Model • Adversary selects any value • Deterministic Algorithm – Optimal would never select a rate > c • If optimal does, adversary can select c, causing the algorithm to send 0 – Optimal selects c – In that case, adversary selects d – Competitive ratio is d/c 163

Fixed range – Randomized Algorithm • No randomized algorithm can achieve competitive ratio better than 1+ln(d/c) in the fixed range model with range [c, d] • Proof : – Yao’s minimax principle – Consider a randomized adversary against deterministic algorithms – Adversary can choose g(y) = c/y^2 in [c, d) – With probability c/d chooses d 164

Proof continued …. • If the algorithm picks xi = x • The expected optimal is at most 165

µ-multiplicative model – Randomized Algorithm • No randomized algorithm can achieve competitive ratio better than ln(µ) + 1 • Proof: – Adversary can always choose bi in [bi, µbi] 166

Randomized Algorithm 4 log(µ) + 12 • Assumptions –relaxed later– µ is a power of 2 – b 1 is in the range [1, 2µ) • Algorithm (MIMD) – At step 1, pick at random x 1 power of 2 between 1 and 2µ – On failure, xi+1 = xi/2; – On success, xi+1 = 2µxi; • Claim: – Competitive ratio of 4 log(µ) + 12 167

Proof outline • Think about choosing one deterministic algorithm from log(2µ) + 1 choices • Think about the algorithms as an ensemble running in parallel • Will show that the ensemble manages to send at least opt/4. [A bit of work] • Once this is done, picking one algorithm gives opt/4(log(µ)+2) 168

Proof (1/3) • Algorithms pick consecutive sequence • Ensemble is successful – bi falls in the picked range – ei : largest value sent by any algorithm – bi < 2 ei – At the next step, if the bandwidth increases or stays constant, the ensemble will succeed • bi < 2 ei , bi+1 < µbi => bi+1 < 2µei • Bandwidth lies in the range covered by the ensemble 169

Proof (2/3) • Need to worry about decreasing bandwidth – May decrease very fast – Ensemble achieved ei at step i – Now it was unsuccessful at step i + 1 • Could not have been more than ei available – At step i+2, they all divide their rates by 2 • Could not have been more than ei/2 available – By induction, one can show that : • ei + ei/2 + ei/4 + …. = 2 ei 170

Proof (3/3) • Optimal algorithm could have achieved at most 4 ei – Up to 2 ei in at step I because it is not constrained to choose a power of 2 – 2 ei when the ensemble were not successful • Summing over all time steps, at least we can transmit opt/4 • µ- assumption -> round µ to the next power of 2. Result in log(µ) + 3 algorithms 171

References 1. 2. 3. 4. 5. 6. N. Alon and Y. Azar. On-line Steiner trees in the Euclidean plane. Discrete and Computational Geometry, 10(2), 113 -121, 1993. M. Andrews, B. Awerbuch, A. Fernandez, J. Kleinberg, T. Leighton, and Z. Liu. Universal stability results for greedy contentionresolution protocols. Proceedings of the 37 th IEEE Conference on Foundations of Computer Science, 1996. M. Andrews, A. Fernandez, A. Goel, and L. Zhang. Source Routing and Scheduling in Packet Networks. To appear in the proceedings of the 42 nd IEEE Foundations of Computer Science, 2001. J. Aspnes, Y. Azar, A. Fiat, S. Plotkin, and O. Waarts. On-line load balancing with applications to machine scheduling and virtual circuit routing. Proceedings of the 25 th ACM Symposium on Theory of Computing, 1993. B. Awerbuch, Y. Azar, and S. Plotkin. Throughput competitive online routing. Proceedings of the 34 th IEEE symposium on Foundations of Computer Science, 1993. A. Ballardie, P. Francis, and J. Crowcroft. Core Based Trees(CBT) An architecture for scalable inter-domain multicast routing. Proceedings of the ACM SIGCOMM, 1993. 172

References [Contd. ] 7. 8. 9. 10. 11. 12. A. Borodin, J. Kleinberg, P. Raghavan, M. Sudan, and D. Williamson. Adversarial queueing theory. Proceedings of the 28 th ACM Symposium on Theory of Computing, 1996. S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C. Liu, and L. Wei. The PIM architecture for wide-area multicast routing. IEEE/ACM Transactions on Networking, 4(2), 153 -162, 1996. M. Doar and I. Leslie. How bad is Naïve Multicast Routing? IEEE INFOCOM, 82 -89, 1992. M. Faloutsos, A. Banerjea, and R. Pankaj. Qo. SMIC: quality of service sensitive multicast Internet protocol. Computer Communication Review, 28(4), 144 -53, 1998. P. Francis. Yoid: Extending the Internet Multicast Architecture. Unrefereed report, http: //www. isi. edu/div 7/yoid/docs/index. html. N. Garg and J. Konemann. Faster and simpler algorithms for multicommodity flow and other fractional packing problems. Proceedings of the 39 th IEEE Foundations of Computer Science, 1998. 173

References [Contd. ] 13. 14. 15. 16. 17. 18. 19. A. Goel, A. Meyerson, and S. Plotkin. Distributed Admission Control, Scheduling, and Routing with Stale Information. Proceedings of the 12 th ACM-SIAM Symposium on Discrete Algorithms, 2001. A. Goel and K. Munagala. Extending Greedy Multicast Routing to Delay Sensitive Applications. Short abstract in proceedings of the 11 th ACM-SIAM Symposium on Discrete Algorithms, 2000. Long version to appear in Algorithmica. M. Imase and B. Waxman. Dynamic Steiner tree problem. SIAM J. Discrete Math. , 4(3), 369 -384, 1991. C. Intanagonwiwat, R. Govindan, and D. Estrin. Directed diffusion: A scalable and robust communication paradigm for sensor networks. Proceedings of the 6 th Annual International Conference on Mobile Computing and Networking (Mobi. COM), 2000. A. Kamath, O. Palmon, and S. Plotkin. Routing and admission control in general topology networks with Poisson arrivals. Proceedings of the 7 th ACM-SIAM Symposium on Discrete Algorithms, 1996. D. Waitzman, C. Partridge, and S. Deering. Distance Vector Multicast Routing Protocol. Internet RFC 1075, 1988. N. Young. Randomized rounding without solving the linear program. Proceedings of the 6 th ACM-SIAM Symposium on Discrete Algorithms, 1995. 174

References [Contd. ] 20. R. Karp, E. Koutsoupias, C. Papadimitriou, and S. Shenker, “Optimization problems in congestion control”. In Proceedings of the 41 st Annual IEEE Symposium of Foundation of Computer Science. 21. S. Arora, B. Brinkman, “A Randomized Online Algorithm for Bandwidth Utilization ” 175

Non-Preemptive Scheduling of Optical Switches Balaji Prabhakar

Optical Fabric Tunable Lasers . . . Receivers . . . q. Switching is achieved by tuning lasers to different wavelengths q. The time to tune the lasers can be much longer than the duration of a cell 177

Model Description Ø Input-queued switch. Ø Scheduler picks a new configuration (matching). Ø There is a configuration delay C. Ø Then the configuration is held for a pre -defined period of time. 178

The Bipartite Scheduling Problem Ø The makespan of the schedule: • total holding time + • the configuration overhead. Ø Goal: minimize the makespan. Ø Preemptive: cells from a single queue can be scheduled in different configurations. Ø Non-preemptive: all cells from a single queue are scheduled in just one configuration. 179

Non-Preemptive Scheduling èMinimizes the number of reconfigurations. èAllows to design low complexity schedulers, which can operate at high speeds. èHandles efficiently variable size packets: no need to keep packet reassembly buffers. 180

Greedy Algorithm The weight of each edge is the occupancy of the corresponding input queue. 1. Create a new matching. 2. Go over uncovered edges in order of nondecreasing weight. Add the edge to the matching if possible marking it as covered. 3. If there are uncovered edges, goto Step 1. 181

Greedy Algorithm Example 7 3 4 2 5 Total holding time: 7+3+2 Configuration overhead: 3 C 2 1 7 4 3 2 5 2 1 182

Analysis of Greedy: Complexity Theorem 1: Greedy needs at most 2 N-1 configurations. Proof outline: • Consider all VOQi* and all VOQ*j • There can be at most 2 N-1 such queues • At each iteration, at least one of the corresponding edges is covered • Thus, after 2 N-1 iterations VOQij must be served. 183

Analysis of Greedy: Makespan Theorem 2 (UB): Greedy achieves an approximation factor of at most 2 for all values of C. Theorem 3 (Greedy-LB): Greedy achieves an approximation factor of at least 2 for C=. 184

Proof of Theorem 2 Consider the k-th matching and let (i, j) be the heaviest edge of weight w. Lemma 1: There at least k/2 edges of weight w incident to either input i or output j. Proof outline: In all iterations 1, . . . , k-1 Greedy chosen edge of weight w incident to i or j. 185

Proof of Theorem 2 Cont. Observation 1: OPT’s schedule contains at least k/2 configurations. Observation 2: The k/2 -th largest holding time in OPT’s schedule is at least w. The theorem follows ! 186

Hardness Results Theorem 4 (General-LB): The NPBS problem is NP-hard for all values of C and hard to approximate within a factor better than 7/6. Proof outline: [GW 85, CDP 01] • Reduction from the Restricted Timetable Design problem, asg. of teachers for 3 hrs. • Encoding as a demand matrix, C=. • There is an optimal non-preemptive schedule that contains 3 matchings. • Works for all values of C ! 187

Offline vs. Online Ø We considered Greedy in the offline case Ø What if packets constantly arrive ? Ø We use the idea of batch scheduling Ø Avoids starvation since all queued cells are included in the next batch 188

Batch Scheduling Batch-(k+1) R 1 Batch-(k) Crossbar 1 1 R N N R 1 N R N 189

Requirements Ø We have shown that the makespan of Greedy is at most twice that of OPT Ø A moderate speedup of 2 will allow us to provide strict delay guarantees for any admissible traffic 190

Open Problems • Close the gap between the upper and the lower bound (2 vs. 7/6), • Consider packet-mode scheduling 191

Literature Ø Preemptive scheduling: § [Inukai 79] Inukai. An Efcient SS/TDMA Time Slot Assignment Algorithm. IEEE Trans. on Communication, 27: 1449 -1455, 1979. § [GW 85] Gopal and Wong. Minimizing the Number of Switchings in a SS/TDMA System. IEEE Trans. Communication, 33: 497 -501, 1985. § [BBB 87] Bertossi, Bongiovanni and Bonuccelli. Time Slot Assignment in SS/TDMA systems with intersatellite links. IEEE Trans. on Communication, 35: 602 -608. 1987. § [BGW 91] Bonuccelli, Gopal and Wong. Incremental Time Slot Assignement in SS/TDMA satellite systems. IEEE Trans. on Communication, 39: 1147 -1156. 1991. § [GG 92] Ganz and Gao. Efficient Algorithms for SS/TDMA scheduling. IEEE Trans. on Communication, 38: 1367 -1374. 1992 § [CDP 01] Crescenzi, Deng and Papadimitriou. On Approximating a Scheduling Problem, Journal of Combinatorial Optimization, 5: 287 -297, 2001. § [TD 02] Towles and Dally. Guaranteed Scheduling for Switches with Conguration Overhead. Proc. of INFOCOM'02. § [LH 03] Li and Hamdi, -Adjust Algorithm for Optical Switches with Reconguration Delay. Proc. of ICC'03. §. . . many others Ø Non-preemptive scheduling: § [PR 00] Prais and Ribeiro. Reactive GRASP: An Application to a Matrix Decomposition Problem in TDMA Trafc Assignment. INFORMS Journal on Computing, 12: 164 -176, 2000. 192