
CS 514: Intermediate Course in Operating Systems
Professor Ken Birman
Vivek Vishnumurthy: TA

Resilient Overlay Networks
- A hot new idea from MIT
  - Shorthand name: RON
- Today:
  - What's a RON?
  - Are these a good idea, or just antisocial?
  - What next: Underlay networks

What is an overlay network?
- A network whose links are based on (end-to-end) IP hops
- And that has multi-hop forwarding
  - I.e., a path through the network will traverse multiple IP "links" (sketched below)
- As a "network" service (i.e., not part of an application per se)
  - I.e., we are not including Netnews, IRC, or caching CDNs in our definition
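To make the "links are end-to-end IP hops" idea concrete, here is a minimal C++ sketch (not from the lecture; the names OverlayLink, OverlayPath, and the addresses are illustrative): each overlay link is just a pair of end-host IP addresses, and an overlay path chains several such links, each of which may itself cross many IP routers.

```cpp
#include <iostream>
#include <string>
#include <vector>

// An overlay "link" is an end-to-end IP path between two hosts;
// the IP routers in between are invisible at the overlay layer.
struct OverlayLink {
    std::string src_ip;   // overlay node at one end
    std::string dst_ip;   // overlay node at the other end
};

// An overlay path does multi-hop forwarding: it traverses several
// overlay links, each of which may itself cross many IP routers.
using OverlayPath = std::vector<OverlayLink>;

int main() {
    OverlayPath path = {
        {"192.0.2.10", "198.51.100.7"},    // first overlay hop (one full IP path)
        {"198.51.100.7", "203.0.113.42"},  // second overlay hop (another IP path)
    };
    std::cout << "Overlay path with " << path.size() << " overlay hops:\n";
    for (const auto& hop : path)
        std::cout << "  " << hop.src_ip << " -> " << hop.dst_ip << "\n";
}
```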

Why do we have overlay networks?
- Historically (late 80's, early 90's):
  - to get functionality that IP doesn't provide
  - as a way of transitioning a new technology into the router infrastructure
    - MBone: IP multicast overlay
    - 6bone: IPv6 overlay

Why do we have overlay networks?
- More recently (mid-90's):
  - to overcome scaling and other limitations of "infrastructure-based" networks (overlay or otherwise)
    - Yoid and End-System Multicast
    - Two "end-system" or (nowadays) "peer-to-peer" multicast overlays
  - Customer-based VPNs are kind-of overlay networks
    - If they do multi-hop routing, which most probably don't

Why do we have overlay networks?
- Still more recently (late 90's, early 00's):
  - to improve the performance and reliability of native IP!!!
    - RON (Resilient Overlay Network)
    - Work from MIT (Andersen, Balakrishnan)
    - Based on results of Detour measurements by Savage et al., Univ. of Washington, Seattle

End-to-end effects of Internet path selection
- Savage et al., Univ. of Washington, Seattle
- Compared the path found by Internet routing with alternates
  - Alternates composed by gluing together two Internet-routed paths
  - Round-trip time, loss rate, bandwidth
  - Data sets: Paxson, plus new ones from UW
- Found improvements in all metrics with these "Detour" routes

BGP and other studies
- Paxson (ACIR), Labovitz (U Mich), Chandra (U Texas)
- Show that outages are frequent (>1%)
- BGP can take minutes to recover

RON Rationale
- BGP cannot respond to congestion
- Because of information hiding and policy, BGP cannot always find the best path
  - Private peering links often cannot be discovered
- BGP cannot respond quickly to link failures
- However, a small dynamic overlay network can overcome these limitations

BGP lack of response to congestion
- Very hard for routing algorithms to respond to congestion (route around it)
- Problem is oscillations
  - Traffic moves from the congested link to a lightly-loaded link, then the lightly-loaded link becomes congested, etc.
- ARPANET (~70-node network) struggled with this for years
  - Khanna and Zinky finally solved this (SIGCOMM '89)
  - Heavy damping of responsiveness

BGP information hiding
[Figure: the Internet sees aggregates 30.1/16 (via ISP 1) and 20.1/16 (via ISP 2); Site 1 is 30.1.3/24 behind ISP 1 and Site 2 is 20.1.5/24 behind ISP 2, with a private peering link between the two sites.]
Private peering link: Site 1 and Site 2 can exchange traffic, but Site 2 cannot receive Internet traffic via ISP 1 (even if policy allows it). (A small routing-table sketch follows.)
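A hedged C++ sketch of why the hiding matters (not from the lecture; the tables and next-hop names are illustrative, using the prefixes from the figure): the public Internet's table has no entry for the private peering route, so a longest-prefix lookup can only reach Site 2 via ISP 2, while Site 1's own table also knows the private route.

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Parse a dotted-quad IPv4 address into a 32-bit integer (sketch: no error handling).
uint32_t ip(const std::string& s) {
    uint32_t a, b, c, d; char dot;
    std::istringstream in(s);
    in >> a >> dot >> b >> dot >> c >> dot >> d;
    return (a << 24) | (b << 16) | (c << 8) | d;
}

struct Route { uint32_t prefix; int len; std::string next_hop; };

// Longest-prefix match over a small routing table.
std::optional<std::string> lookup(const std::vector<Route>& table, uint32_t addr) {
    std::optional<std::string> best;
    int best_len = -1;
    for (const auto& r : table) {
        uint32_t mask = r.len == 0 ? 0 : ~uint32_t(0) << (32 - r.len);
        if ((addr & mask) == (r.prefix & mask) && r.len > best_len) {
            best = r.next_hop;
            best_len = r.len;
        }
    }
    return best;
}

int main() {
    // What the public Internet sees: only the aggregates the ISPs advertise.
    std::vector<Route> internet_view = {
        {ip("30.1.0.0"), 16, "ISP 1"},
        {ip("20.1.0.0"), 16, "ISP 2"},
    };
    // What Site 1 sees: it also knows the private peering route to Site 2,
    // which is never advertised into BGP.
    std::vector<Route> site1_view = {
        {ip("0.0.0.0"), 0, "ISP 1 (default)"},
        {ip("20.1.5.0"), 24, "private peering link to Site 2"},
    };

    uint32_t dst = ip("20.1.5.99");  // a host in Site 2
    std::cout << "Internet view: via " << lookup(internet_view, dst).value() << "\n";
    std::cout << "Site 1 view:   via " << lookup(site1_view, dst).value() << "\n";
    // If Site 2's path through ISP 2 fails, the Internet still only knows
    // "via ISP 2" -- the working private route exists only inside Site 1.
}
```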

Acceptable Use Policies
- Why might Cornell hide a link?
  - Perhaps Cornell has a great link to the Arecibo telescope in Puerto Rico but doesn't want all the traffic to that island routing via Cornell
    - E.g., we pay for it, and need it for scientific research
    - But any Cornell traffic to Puerto Rico routes on our dedicated link
- This is an example of an "AUP"
  - "Cornell traffic to 123.45.0.0/16 can go via link x, but non-Cornell traffic is prohibited"

BGP information hiding
[Figure: the same topology, now with a failed link (X); Site 2 (20.1.5/24) loses its Internet path even though the private peering link to Site 1 (30.1.3/24) is still up.]

RON can bypass BGP information hiding
[Figure: the same topology with the failed link (X); RON nodes RON 1 (30.1.3.5, Site 1), RON 2 (20.1.5.7, Site 2), and RON 3 (in the Internet) forward traffic to Site 2 over the private peering link.]
…but in doing so may violate the AUP

RON test network had private peering links

BGP link failure response
- BGP cannot respond quickly to changes in AS path
  - Hold-down to prevent flapping
  - Policy limitations
- But BGP can respond locally to link failures
  - And local topology can be engineered for redundancy

Local router/link redundancy
- eBGP and/or iBGP can respond to a peering failure without requiring an AS path change
[Figure: an ISP with redundant routers (R) between AS 1 and AS 2.]
- Intra-domain routing (i.e., OSPF) can respond to internal ISP failures
- AS path responsiveness is not strictly necessary to build robust internets with BGP
- Note: the telephone signalling network (SS7, a data network) is built this way

Goals of RON
- Small group of hosts cooperate to find better-than-native-IP paths
  - ~50 hosts max, though working to improve
- Multiple criteria, application selectable per packet
  - Latency, loss rate, throughput
  - Better reliability too
- Fast response to outages or performance changes
  - 10-20 seconds
- Policy routing
  - Avoid paths that violate the AUP (Acceptable Usage Policy) of the underlying IP network
- General-purpose library that many applications may use
  - C++

Some envisioned RON applications
- Multi-media conference
- Customer-provided VPN
- High-performance ISP

Basic approach
- Small group of hosts
- All ping each other, a lot
  - On the order of every 10 seconds
  - 50 nodes produce 33 kbps of traffic per node!
- Run a simplified link-state algorithm over the N² mesh to find best paths (see the path-selection sketch below)
  - Metric and policy based
- Route over the best path with a specialized metric- and policy-tagged header
- Use hysteresis to prevent route flapping
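A minimal latency-only sketch of path selection over the full probe mesh (illustrative, not the RON implementation: real RON runs a link-state protocol with several metrics, policy tags, and hysteresis). It simply picks between the direct overlay link and the best single-intermediate detour, the case the results slide says captures almost all of the benefit.

```cpp
#include <iostream>
#include <limits>
#include <vector>

struct Choice {
    double latency_ms;
    int via;  // -1 means "use the direct path"
};

// Choose the direct path or the best one-intermediate detour from the
// N x N matrix of probed latencies.
Choice best_path(const std::vector<std::vector<double>>& lat, int src, int dst) {
    const int n = static_cast<int>(lat.size());
    Choice best{lat[src][dst], -1};  // start with the direct Internet path
    for (int mid = 0; mid < n; ++mid) {
        if (mid == src || mid == dst) continue;
        double detour = lat[src][mid] + lat[mid][dst];
        if (detour < best.latency_ms) best = {detour, mid};
    }
    return best;
}

int main() {
    const double INF = std::numeric_limits<double>::infinity();
    // Hypothetical probe results (ms); INF models an outage on the direct path.
    std::vector<std::vector<double>> lat = {
        {0,  80, 30},
        {80,  0, 25},
        {30, 25,  0},
    };
    lat[0][1] = INF;  // direct path 0 -> 1 has failed

    Choice c = best_path(lat, 0, 1);
    if (c.via < 0)
        std::cout << "Use direct path, " << c.latency_ms << " ms\n";
    else
        std::cout << "Detour via node " << c.via << ", " << c.latency_ms << " ms\n";
}
```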

Major results (tested with 12- and 16-node RONs)
- Recovers from most complete outages and all periods of sustained high loss rates of >30%
- 18 sec average to route around failures
- Routes around throughput failures, doubles throughput 5% of the time, reduces loss probability by >0.05
- Single-hop detour provides almost all the benefit

RON Architecture
[Figure: RON node architecture. Callouts: simple send(), recv(callback) API; local or shared among nodes; a relational DB allows a rich set of query types; the router is itself a RON client. An API sketch follows below.]
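The slide only names a send()/recv(callback) conduit interface; the C++ sketch below is an assumed rendering of what such an interface could look like (the class name Conduit, the Packet type, and the signatures are illustrative, not the actual RON library API).

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Illustrative conduit-style interface: clients (including the RON router,
// which is just another client) send packets and register a callback for
// received packets. Signatures are assumptions, not RON's real API.
using Packet = std::vector<uint8_t>;
using RecvCallback = std::function<void(const Packet&, const std::string& from)>;

class Conduit {
public:
    explicit Conduit(RecvCallback cb) : on_recv_(std::move(cb)) {}

    // Hand a packet to the RON forwarder, tagged with the destination RON node.
    void send(const Packet& pkt, const std::string& dst) {
        std::cout << "send " << pkt.size() << " bytes to " << dst << "\n";
        // ...in a real system this would consult the router and emit UDP...
    }

    // Called by the forwarder when a packet for this conduit arrives.
    void deliver(const Packet& pkt, const std::string& from) { on_recv_(pkt, from); }

private:
    RecvCallback on_recv_;
};

int main() {
    Conduit c([](const Packet& pkt, const std::string& from) {
        std::cout << "recv " << pkt.size() << " bytes from " << from << "\n";
    });
    c.send({'h', 'i'}, "ron-node-2");
    c.deliver({'o', 'k'}, "ron-node-2");
}
```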

Conduit, Forwarder, and Router

RON Header (inspired by IPv6!…but not IPv6)
[Figure: RON packet header diagram. Callouts: runs under IP; IPv4; runs over UDP; performance metrics; unique per flow, cached to speed up the (3-phase) forwarding decision; selects conduit. A hypothetical struct rendering follows.]
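The slide does not give the on-the-wire layout, so the struct below is only a hypothetical rendering of the fields the callouts mention (flow identifier, a type that selects the conduit, a policy tag, a metric choice), carried inside a UDP payload over IPv4. Field names, widths, and ordering are assumptions, not the real RON format.

```cpp
#include <cstdint>

// Hypothetical rendering of the fields the slide's callouts mention; this is
// NOT the actual RON wire format. The header rides inside a UDP payload,
// which in turn rides over IPv4.
#pragma pack(push, 1)
struct RonHeader {
    uint8_t  version;      // RON header version
    uint8_t  packet_type;  // selects which conduit receives the packet
    uint16_t policy_tag;   // which routing policy applies
    uint8_t  metric;       // which metric to optimize (latency / loss / throughput)
    uint8_t  hop_limit;    // guard against forwarding loops
    uint32_t flow_id;      // unique per flow; cached to speed the 3-phase forwarding decision
    uint32_t src_node;     // RON-level source identifier
    uint32_t dst_node;     // RON-level destination identifier
};
#pragma pack(pop)

static_assert(sizeof(RonHeader) == 18, "packed header sketch should be 18 bytes");

int main() { return 0; }
```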

Link evaluation
- Defaults (see the metric sketch below):
  - Latency (sum of):
    - lat_{i+1} = α · lat_i + (1 - α) · new_sample_{i+1}, an exponentially weighted moving average (α = 0.9)
  - Loss rate (product of):
    - average of the last 100 samples
  - Throughput (minimum of):
    - Noisy, so look for at least a 50% improvement
    - Use the simple TCP throughput formula: √1.5 / (rtt · √p), where p = loss probability
- Plus, the application can run its own
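A small C++ sketch of these default per-link estimators. The constants (α = 0.9, the 100-sample loss window, the √1.5 / (rtt·√p) formula, the 50% hysteresis note) come from the slide; the class and method names are illustrative.

```cpp
#include <cmath>
#include <deque>
#include <iostream>

// Sketch of the default link-evaluation metrics described on the slide.
class LinkEstimator {
public:
    // Exponentially weighted moving average of latency:
    //   lat <- alpha * lat + (1 - alpha) * new_sample
    void add_latency_sample(double rtt_ms) {
        lat_ms_ = have_lat_ ? kAlpha * lat_ms_ + (1 - kAlpha) * rtt_ms : rtt_ms;
        have_lat_ = true;
    }

    // Loss rate: average over the last 100 probe outcomes (1 = lost, 0 = received).
    void add_loss_sample(bool lost) {
        window_.push_back(lost ? 1 : 0);
        if (window_.size() > 100) window_.pop_front();
    }

    double latency_ms() const { return lat_ms_; }

    double loss_rate() const {
        if (window_.empty()) return 0.0;
        int lost = 0;
        for (int x : window_) lost += x;
        return static_cast<double>(lost) / window_.size();
    }

    // Simple TCP throughput score: sqrt(1.5) / (rtt * sqrt(p)).
    // Noisy, so a path should look ~50% better before switching on throughput.
    double throughput_score() const {
        double p = loss_rate();
        if (p <= 0.0) p = 1e-4;  // avoid division by zero for a lossless window
        return std::sqrt(1.5) / ((lat_ms_ / 1000.0) * std::sqrt(p));
    }

private:
    static constexpr double kAlpha = 0.9;
    bool have_lat_ = false;
    double lat_ms_ = 0.0;
    std::deque<int> window_;
};

int main() {
    LinkEstimator e;
    for (int i = 0; i < 20; ++i) {
        e.add_latency_sample(40.0 + (i % 3));  // ~40 ms samples
        e.add_loss_sample(i % 10 == 0);        // ~10% loss
    }
    std::cout << "latency  ~" << e.latency_ms() << " ms\n"
              << "loss     ~" << e.loss_rate() << "\n"
              << "tput est ~" << e.throughput_score() << " (relative score)\n";
}
```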

Responding to failure
- Probe interval: 12 seconds
- Probe timeout: 3 seconds
- Routing update interval: 14 seconds

RON overhead

  Nodes:            10        20        30       40       50
  Traffic per node: 1.8 Kbps  5.9 Kbps  12 Kbps  21 Kbps  32 Kbps

- Probe overhead: 69 bytes
- RON routing overhead: 60 + 20(N-1) bytes
- 50 nodes: allows recovery times between 12 and 25 s
(A back-of-the-envelope calculation follows.)
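A hedged back-of-the-envelope estimate of where the table's numbers come from, assuming each node sends one 69-byte probe per peer every 12 s and one routing update of 60 + 20(N-1) bytes per peer every 14 s (the slide's constants). This simple model is not the paper's exact accounting, but it lands close to the table (about 1.6 Kbps at 10 nodes, about 31 Kbps at 50 nodes).

```cpp
#include <iostream>

// Rough per-node probing + routing traffic for an N-node RON, using the
// constants from the slides. Illustrative only.
double per_node_kbps(int n) {
    const double probe_bytes = 69.0;
    const double probe_interval_s = 12.0;
    const double update_bytes = 60.0 + 20.0 * (n - 1);
    const double update_interval_s = 14.0;
    const int peers = n - 1;

    double bps = peers * probe_bytes * 8.0 / probe_interval_s +
                 peers * update_bytes * 8.0 / update_interval_s;
    return bps / 1000.0;
}

int main() {
    for (int n = 10; n <= 50; n += 10)
        std::cout << n << " nodes: ~" << per_node_kbps(n) << " Kbps per node\n";
}
```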

Two policy mechanisms (see the policy-check sketch below)
- Exclusive cliques
  - Only members of the clique can use the link
  - Good for "Internet2" policy
    - No commercial endpoints went over Internet2 links
- General policies
  - BPF-like (Berkeley Packet Filter) packet matcher and a list of denied links
- Note: in spite of this, AUPs may easily, even intentionally, be violated
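An illustrative C++ sketch of the two mechanisms: an exclusive clique (only members may ride certain links) and a general deny-list consulted per packet. RON's real general mechanism is a BPF-like packet matcher; a simple source/destination check stands in for it here, and all names are assumptions.

```cpp
#include <iostream>
#include <set>
#include <string>

struct Link { std::string from, to; };

// Sketch of RON-style forwarding policy: exclusive-clique membership plus a
// list of denied overlay links. Not the actual RON implementation.
class PolicyTable {
public:
    void add_clique_member(const std::string& node) { clique_.insert(node); }
    void deny_link(const std::string& from, const std::string& to) {
        denied_.insert(from + "->" + to);
    }

    // May a packet from src to dst be forwarded over this overlay link?
    bool allows(const std::string& src, const std::string& dst, const Link& l) const {
        if (denied_.count(l.from + "->" + l.to)) return false;           // general deny-list
        bool clique_link = clique_.count(l.from) && clique_.count(l.to); // e.g. Internet2-only
        if (clique_link && !(clique_.count(src) && clique_.count(dst)))
            return false;  // non-members (e.g. commercial endpoints) may not ride clique links
        return true;
    }

private:
    std::set<std::string> clique_;   // nodes allowed to use the exclusive links
    std::set<std::string> denied_;   // explicitly denied overlay links
};

int main() {
    PolicyTable policy;
    policy.add_clique_member("cornell");
    policy.add_clique_member("mit");
    policy.deny_link("cisco", "nyu");

    Link internet2{"cornell", "mit"};
    std::cout << std::boolalpha
              << policy.allows("cornell", "mit", internet2) << "\n"  // true: both in clique
              << policy.allows("cisco", "mit", internet2) << "\n";   // false: non-member source
}
```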

RON deployment (19 sites)
[Figure: map of deployment sites: .com (ca), dsl (or), cci (ut), aros (ut), utah.edu, .com (tx), cmu (pa), dsl (nc), nyu, cornell, cable (ma), cisco (ma), mit; arrows point to the overseas sites vu.nl, lulea.se, ucl.uk, kaist.kr, and a university in Venezuela (.ve).]

AS view

Latency CDF
~20% improvement?

Same latency data, but as scatterplot
Banding due to different host pairs

[Figure: scatterplot of 30-min average loss rate with RON vs. 30-min average loss rate on the Internet (13,000 samples). RON greatly improves loss rate; the RON loss rate is never more than 30%.]

An order-of-magnitude fewer failures (30-minute average loss rates)

  Loss Rate    10%   20%   30%   50%   80%   100%
  RON Better   479   127    32    20    14     10
  No Change     57     4     0     0     0      0
  RON Worse     47    15     0     0     0      0

Resilience Against DoS Attacks

Some unanswered questions
- How much benefit comes from smaller or larger RONs?
  - Would a 4-node RON buy me much?
- Do results apply to other user communities?
  - Testbed consisted mainly of high-bandwidth users (3 home broadband)
  - Research networks may have more private peering than residential ISPs

Some concerns
- Modulo the unanswered questions, RON clearly provides an astonishing benefit
- However...

Some concerns
- Is RON TCP-unfriendly?
  - A RON path change looks like a non-slow-started TCP connection
  - On the other hand, RON endpoints (TCP) would back off after a failure

Some concerns
- Would large-scale RON usage result in route instabilities?
  - Small scale probably doesn't, because a few RONs are not enough to saturate a link
  - Note: Internet stability is built on congestion avoidance within a stable path, not rerouting

Some concerns
- RON's ambient overhead is significant
  - Lots of RONs would increase overall Internet traffic and lower performance
  - This is not TCP-friendly overhead
  - 32 Kbps (50-node RON) is equivalent to high-quality audio or low-quality video!
  - Clearly the Internet can't support much of this
  - RON folks are working to improve overhead

RON creators' opinion on overhead
- "Not necessarily excessive"
- "Our opinion is that this overhead is not necessarily excessive. Many of the packets on today's Internet are TCP acknowledgments, typically sent for every other TCP data segment. These 'overhead' packets are necessary for reliability and congestion control; similarly, RON's active probes may be viewed as 'overhead' that help achieve rapid recovery from failures."

Some concerns
- RONs break AUPs (Acceptable Usage Policies)
  - RON has its own policies, but requires user cooperation and diligence

Underlay Networks
- The idea here is that the Internet has a lot of capacity
  - So suppose we set some resources to the side and constructed a "dark network"
  - It would lack the entire IP infrastructure but could carry packets, like Ethernet
  - Could we build a new Internet on it with better properties?

The vision: side-by-side Internets
[Figure: The Internet, MediaNet, SecureNet, and ReliableNet running side by side; shared HW, but not "internet".]

Doing it on the edges
- We might squeak by doing it only at the edges
  - After all, the core of the Internet is "infinitely fast" and loses no packets (usually)
  - So if the issues are all on the edge, we could offer these new services just at the edge
  - Moreover, if a data center owns a fat pipe to the core, we might be able to do this just on the consumer side, and just on the last few hops…

Pros and cons
- Pros:
  - With a free hand, we could build new and much "stronger" solutions, at least in desired dimensions
  - New revenue opportunities for ISPs
- Cons:
  - These "SuperNets" might need non-standard addressing or non-IP interfaces, which users might then reject
  - And ISPs would need to agree on a fee model and how to split the revenue

Summary
- The Internet itself has become a serious problem for at least some applications
- To respond, people have started to hack the network with home-brew routing
- But a serious response might need to start from the ground up, and would face political and social challenges!