  • Number of slides: 32

Evaluating Potential Routing Diversity for Internet Failure Recovery
*Chengchen Hu, +Kai Chen, +Yan Chen, *Bin Liu
*Tsinghua University, +Northwestern University

Internet Failures
  • Failure is part of everyday life in IP networks
      • e.g., 675,000 excavation accidents in 2004 [Common Ground Alliance]
      • Network cable cuts every few days …
  • Real-world emergencies or disasters can lead to substantial Internet disruption
      • Earthquakes
      • Storms
      • Terrorist incident: 9.11 event
      • …

Example: Taiwan earthquake incident
  • Large earthquakes hit south of Taiwan on 26 December 2006
  • Only two of nine cross-sea cables were not affected
  • There was abundant physical-level connectivity, but it took too long for ISPs to find and use it.
  • Figures cited from "Aftershocks from the Taiwan Earthquakes: Shaking up Internet transit in Asia", NANOG 42

How reliable is the Internet?
  • The Internet is not as reliable as people expect! [Wu, CoNEXT'07]
      • 32% of ASes are vulnerable to a single critical customer-provider link cut
      • 93.7% of Tier-1 ISPs' single-homed customers are lost from the peered ISP after a Tier-1 depeering
  • Our question: can we find more resources to increase Internet reliability, especially when an Internet emergency happens?

Basic Idea
  • Two places where we can find more routing diversity:
      • Internet eXchange Points (IXPs)
          • Co-location where multiple ASes exchange their traffic
          • Participant ASes in an IXP may not be connected via BGP
      • Internet valley-free routing policy
          • AS relationships: customer-provider, peering, sibling
          • Peering relaxation (PR): allow an AS to carry traffic from its peer to its provider
          • Mentioned in [Wu, CoNEXT'07], but without evaluation
  • Our main focus:
      • How much can we gain from these two potential resources, i.e., IXP and PR?
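
To make the valley-free rule and its relaxation concrete, here is a minimal sketch of a path checker. It is not the authors' implementation. The relationship labels ("c2p", "p2c", "p2p", "sib"), the function name, and in particular the reading of PR as "at most one customer-to-provider hop is tolerated directly after a peering link" are assumptions made for illustration.

```python
# Hypothetical labels for the relationship of each hop along an AS path:
#   "c2p" customer-to-provider (uphill), "p2c" provider-to-customer (downhill),
#   "p2p" peering, "sib" sibling.
VALID_RELS = {"c2p", "p2c", "p2p", "sib"}


def is_valid_path(rels, allow_pr=False):
    """Return True if the hop sequence is valley-free.

    With allow_pr=True, one uphill hop right after a peering link is accepted.
    This is an assumed interpretation of peering relaxation, not a definition
    taken from the slides.
    """
    for rel in rels:
        if rel not in VALID_RELS:
            raise ValueError(f"unknown relationship: {rel!r}")

    UP, PEERED, DOWN = 0, 1, 2
    state = UP
    pr_used = False
    for rel in rels:
        if rel == "sib":
            continue  # sibling hops do not change the phase
        if state == UP:
            if rel == "p2p":
                state = PEERED
            elif rel == "p2c":
                state = DOWN
            # "c2p" keeps climbing
        elif state == PEERED:
            if rel == "p2c":
                state = DOWN
            elif rel == "c2p" and allow_pr and not pr_used:
                pr_used = True  # relaxed step: peer forwards to its provider
                state = UP
            else:
                return False
        else:  # DOWN: only further downhill hops are allowed
            if rel != "p2c":
                return False
    return True


path = ["c2p", "p2p", "c2p", "p2c"]
print(is_valid_path(path), is_valid_path(path, allow_pr=True))
```

Under these assumptions the example path is rejected by the strict policy and accepted once PR is enabled, which is the kind of extra routing diversity the slide refers to.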

Dataset for Evaluation
  • Most complete AS topology graph
      • BGP data
          • Route Views, RIPE/RIS, Abilene, CERNET BGP View
      • P2P traceroute
          • Traceroute data from 992,000 IPs in over 3,700 ASes
      • In total, 120K AS links with AS relationships
      • http://aqualab.cs.northwestern.edu/projects/SidewalkEnds.html [Chen et al., CoNEXT'09]
  • IXP data
      • PCH + Peeringdb + Euro-IX (~200 IXPs)
      • 3,468 participant ASes

Failure Models
  • Tier-1 depeering
      • Real example: Cogent and Level 3 depeering
  • Tier-1 provider-customer link teardown
      • Reported in NANOG forum
  • Mixed types of link breakdown
      • 9.11 event, Taiwan earthquakes, 2003 Northeast blackout

Evaluation Metrics
  • Recovery Ratio
      • # of recovered <src-dst> AS pairs versus total # of affected AS pairs
  • Path Diversity
      • # of increased link-disjoint AS paths between affected AS pairs
  • Shifted Path
      • # of link-disjoint AS paths shifted onto a normal link after we use IXP or PR resources
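
As an illustration of the first metric, the sketch below computes a recovery ratio on a toy AS graph. It is a simplification, not the evaluation code behind the slides: reachability is plain breadth-first search over undirected links and ignores routing policy, and the function names and sample topology are invented for the example.

```python
from collections import deque


def build_adjacency(links):
    """Build an undirected adjacency map from (AS, AS) link tuples."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def reachable(adj, src, dst):
    """Breadth-first search; policy constraints are deliberately ignored."""
    if src == dst:
        return True
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def recovery_ratio(surviving_links, extra_links, affected_pairs):
    """Recovered <src-dst> pairs divided by all affected pairs."""
    if not affected_pairs:
        raise ValueError("affected_pairs must not be empty")
    adj = build_adjacency(list(surviving_links) + list(extra_links))
    recovered = sum(1 for s, d in affected_pairs if reachable(adj, s, d))
    return recovered / len(affected_pairs)


# Toy example: link B-C stands in for a new IXP or PR link.
surviving = [("A", "B"), ("C", "D")]
pairs = [("A", "D"), ("A", "E")]
print(recovery_ratio(surviving, [("B", "C")], pairs))
```

A policy-aware version would replace `reachable` with a search restricted to paths the routing policy permits.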

Results: Tier-1 Depeering
  • 36 experiments for 9 Tier-1 ASes
  • Recovery ratio: most of the lost AS pairs can be recovered

Results: Tier-1 Depeering
  • Path diversity: multiple AS paths between lost AS pairs

Results: Tier-1 Depeering
  • Shifted path
      • On average, 3.75 ~ 17.2 for all 36 experiments
      • Moderate traffic load shifted onto the unaffected links

Economic model
  • B pays A for recovery
  • [Diagram labels: A, B, peer, IXP, P-C]
  • Business model
      • Risk alliance (like airlines): price is determined beforehand
      • Pay by bandwidth & duration, or by bits (95th percentile)
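
The slide mentions 95th-percentile charging without spelling it out. A minimal sketch of one common way to compute it follows, assuming periodic bandwidth samples and the nearest-rank percentile method; the function names, the sampling scheme, and the per-Mbps price are all illustrative assumptions.

```python
def percentile_95(samples_mbps):
    """Nearest-rank 95th percentile: the top 5% of samples are ignored."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Integer ceiling of 0.95 * n, then convert the rank to a 0-based index.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def recovery_charge(samples_mbps, price_per_mbps):
    """Charge B for transit bought from A (price_per_mbps is hypothetical)."""
    return percentile_95(samples_mbps) * price_per_mbps


# 100 samples of 1..100 Mbps: the five highest samples are discarded.
samples = list(range(1, 101))
print(percentile_95(samples), recovery_charge(samples, 2))
```

Because short bursts above the 95th percentile are not billed, this scheme fits recovery traffic that may spike briefly right after a failure.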

Communication channel
  • Search for peers
      • Have direct connections to peers
  • Search for co-located ASes in the same IXP
      • ASes are connected by switches in modern IXPs
      • Messages are broadcast with the help of the switches
      • Message confidentiality with public-key crypto

Automatic communication
  • Query message (failed AS)
      • Who is connected to specific destination ASes?
  • Reply message (surviving AS)
      • I can provide BW1 bandwidth to the destination AS
  • ACK (failed AS)
      • I would like to buy BW2 (<= BW1)
  • Set up BGP sessions
  • Withdraw BGP sessions
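
The query/reply/ACK exchange above could be modelled roughly as follows. This is a sketch only: the message classes, field names, and helper functions are hypothetical, the AS numbers come from the private-use range, and the BGP session setup and teardown steps are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    sender: int                # failed AS
    destinations: frozenset    # destination ASes it lost


@dataclass(frozen=True)
class Reply:
    sender: int        # surviving AS offering help
    destination: int
    bw1: float         # bandwidth it can provide


@dataclass(frozen=True)
class Ack:
    sender: int        # failed AS
    helper: int
    destination: int
    bw2: float         # bandwidth it wants to buy, never more than bw1


def answer(query, helper_as, spare_bw):
    """A surviving AS replies for each queried destination it can reach."""
    return [
        Reply(helper_as, dest, bw)
        for dest, bw in sorted(spare_bw.items())
        if dest in query.destinations and bw > 0
    ]


def acknowledge(failed_as, reply, demand):
    """The failed AS buys what it needs, capped at the offered BW1."""
    return Ack(failed_as, reply.sender, reply.destination, min(demand, reply.bw1))


query = Query(65001, frozenset({65010, 65020}))
replies = answer(query, 65002, {65010: 40.0, 65030: 10.0})
ack = acknowledge(65001, replies[0], demand=100.0)
print(ack)
```

Capping BW2 at BW1 in `acknowledge` encodes the constraint stated on the slide.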

Check available connectivity & bandwidth
  • Connectivity
      • traceroute
  • Available bandwidth
      • Maximum capacity is already known
      • Estimate the amount which has been used
          • Y. Zhang, M. Roughan, N. Duffield, and A. Greenberg, "Fast Accurate Computation of Large-Scale IP Traffic Matrices from Link Loads," ACM SIGMETRICS, 2003.
      • Subtract
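
The last step is simple arithmetic, sketched below. The usage estimate would come from a traffic-matrix method such as the one cited; here it is just passed in as a number. The function name and the clamp at zero are assumptions for illustration.

```python
def available_bandwidth(capacity_mbps, estimated_used_mbps):
    """Known capacity minus estimated usage, never below zero."""
    if capacity_mbps < 0 or estimated_used_mbps < 0:
        raise ValueError("bandwidth values must be non-negative")
    return max(0.0, capacity_mbps - estimated_used_mbps)


print(available_bandwidth(10_000, 6_500))
```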

Optimal selection of helper ISPs
  • From a single victim ISP perspective
      • Buy transit from a minimal number of ASes
      • Recover all the (prioritized) traffic
      • Least cost

Selection heuristic
  • Lost connectivity to {Di}, with bandwidth demand {Bi}
  • [symbol missing in source] is how much bandwidth AS j could provide to Di

Selection heuristic
  • Lost connectivity to {Di}, with bandwidth demand {Bi}
  • Score each (helper) AS j with [formula missing in source]
  • Select the AS with the largest score (select the one with the lowest price if scores are equal)
  • [Figure: example scores 3, 2.3, 5, 2.1]

Selection heuristic: update
  • Lost connectivity to {Di}, with bandwidth demand {Bi} updated

Selection heuristic: rescore and select
  • Lost connectivity to {Di}, with bandwidth demand {Bi}
  • [Figure: example scores 1, 0, 0.3, 0.1]

Summary
  • First work to evaluate the potential routing diversity via IXP and PR with the most complete AS topology graph.
  • 40%-80% of affected AS pairs can be recovered via IXP and PR with multiple paths and moderate shifted paths.
  • Points out a new avenue for Internet failure recovery.
  • Possible and practical mechanisms to utilize potential routing diversity.
  • We look forward to feedback and collaboration from IXPs/ISPs!

Thank you! Q&A

Backup

Results: Tier-1 provider-customer links teardown
  • Recovery ratio
  • Path diversity
      • 4.64 for 10 Tier-1 provider-customer links teardown
      • 4.54 for 20 Tier-1 provider-customer links teardown
  • Shifted path
      • The average number of shifted paths when 10, 20 and 30 links are damaged is 3.4, 4.0 and 4.2, respectively.

Results: Mixed types of links breakdown
  • Taiwan earthquake, 9 big victim ASes
  • Recovery ratio

Results: Mixed types of links breakdown
  • Path diversity

Results: Mixed types of links breakdown
  • Shifted path

System framework
  • Adding an Emergency Recovery (ER) module in a router's control plane
  • Setting up the communications between ER and the Intra-TE Resource Management modules.

Building communication channel
  • An example

Optimal selection of ISPs to help
  • From global view
      • Min. shift path or tuned AS-links
          • s.t. recover all the (prioritized) traffic we could
      • or
      • Max. recovery ratio
          • s.t. shift path or tuned AS-links
  • From a single ISP
      • Min. cost for the ISP
          • s.t. recover all the (prioritized) traffic we could
      • or
      • Max. recovery ratio
          • s.t. cost for the ISP

Selection heuristic
  • Lost connectivity to {Di}, with bandwidth demand {Bi}
  • [symbol missing in source] is how much bandwidth AS j could provide to Di
  • Score each (helper) AS j with [formula missing in source]
  • Select the helper AS with the largest score (select the one with the lowest price if scores are equal)
  • Update {Di} by deleting the recovered AS
  • Update {Bi} by subtracting the recovered bandwidth
  • Rescore and select the next helper AS
  • Iterate until all [symbol missing in source] are recovered
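
The greedy loop listed above can be sketched as follows. The scoring formula did not survive in this transcript, so the score used here (the total remaining demand a helper could cover) is a stand-in assumption, not the authors' formula. The data layout, function name, and sample numbers are likewise illustrative.

```python
def select_helpers(demand, offers, price):
    """Greedy helper selection.

    demand: {destination: bandwidth still needed}           ({Di}, {Bi})
    offers: {helper: {destination: bandwidth it can give}}
    price:  {helper: price}, used only to break score ties
    Returns (chosen helpers in order, unmet demand).
    """
    remaining = {d: b for d, b in demand.items() if b > 0}
    candidates = set(offers)
    chosen = []

    def score(helper):
        # ASSUMED score: how much of the remaining demand this helper covers.
        return sum(min(offers[helper].get(d, 0), need)
                   for d, need in remaining.items())

    while remaining and candidates:
        # Largest score first; on equal scores, the lowest price wins.
        best = max(sorted(candidates), key=lambda j: (score(j), -price[j]))
        if score(best) <= 0:
            break  # nobody left can contribute
        chosen.append(best)
        candidates.remove(best)
        for dest in list(remaining):
            remaining[dest] -= min(offers[best].get(dest, 0), remaining[dest])
            if remaining[dest] <= 0:
                del remaining[dest]  # destination fully recovered
    return chosen, remaining


demand = {"D1": 10, "D2": 5}
offers = {"X": {"D1": 10}, "Y": {"D1": 4, "D2": 5}, "Z": {"D2": 5}}
price = {"X": 3, "Y": 2, "Z": 1}
print(select_helpers(demand, offers, price))
```

In the toy input, X is picked first because it covers the most demand; Y and Z then tie on score for the remaining destination, and the cheaper Z is chosen, matching the tie-break rule on the slide.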