Inter-Domain Traffic Engineering Principles, Applications and Case Studies

Who We Are
§ Josh Wepman
  § Applications Engineer/Snake Oil Salesman
  § Ixia NetOps
  § [email protected] com
§ Joe Abley
  § Toolmaker/Engineer/Token Canadian
  § MFN PAIX
  § [email protected] net

What We Are Talking About
§ Inter-domain Measurement, Analysis and Control
§ Improving Connectivity
  § With whom?
  § Where?
  § At what speed?

What we are NOT talking about
§ MPLS
§ DiffServ
§ RSVP
§ CR-LDP
§ All sorts of other words with lots of capital letters that have become associated with “traffic engineering…”

Goals For The Afternoon
§ Methods and concepts on how to "improve" inter-domain connectivity
  § Depending on who YOU are, "improve" will have different meanings
§ Finding ways to reduce the impact of failure in peer or transit networks
  § a.k.a. "increasing reliability"
§ WARNING: Some operational complexity may arise!
  § Put on your peril-sensitive glasses...

Presentation Outline
§ Inter-Domain TE Goals Definition
§ Inter-domain TE Measurement
§ Applying Data to Address Your Goals
§ Eliciting Control and the Feedback-Loop
§ Conceptual Examples
§ Who is Doing This Stuff?
§ Real_Live_Network Examples
§ No Questions? Good!

Inter-Domain TE Goals Definition
§ Iteration-1 – Conceptual: Define Goals, Measure, Analyze, Refine Goals, Action
§ What is it you need to accomplish?

Examples of Goals
§ Need to offload my "NSFnet" peering links outbound (congestion management)
§ Need to expand my inter-domain peering links cluefully (growth)
§ Need to find some people to provide my services to (sales)
  § That's right, I said it… sell stuff!!!

Adjusting Your Assumptions
§ Be prepared to adjust your assumptions based on measured data!
§ What you planned to do and what you end up doing may differ substantially.
  § Do not fear - this is real network data!
§ Clue should increase as valid network data becomes available and is consulted

Data Needs…
§ What data sets are required?
  § Active measurement data
  § BGP routing data
  § Flow-export data
  § SNMP
§ Some public tools available (cflowd, zebra, ping, scotty, etc.)
§ Some commercial products available…

Inter-domain TE Measurement
Also Known As: Getting good, problem/goal-specific data!

Assumed Network Model
§ Hierarchical Network Model
§ Ingress/Egress Network services are separated from Transit Services
§ Works in other network models (as we will show), but this is what we are focusing on...

Hierarchical Network Model
[Diagram labels: Core, Network Services, Core 1, Core 2, Peer 1, Peer 2, LocalASN, RemoteASN, AS 2, AS 3, AS 4, AS 9]

Types of Data to Measure
§ Routing Data
  § Focus here is BGP
§ Traffic Data
  § Flow-export v5 is the focus here
§ Active Measurement Performance Data
  § Ping/Traceroute/One-way delay/Jitter

Routing Data
§ Routers generally do this well
  § Core competency by design (Routers route...)
§ Different data sets are available for measurement
  § IBGP (Good if you are looking at the whole system, looking outbound or using a flat network model)
  § Route-Reflection (Often needed for inbound analysis, can create some complexity in flat network models)
  § EBGP (Good for seeing your neighbor's view of you)
§ Choose the right one to measure based on your needs/goals

Routing Data – In/Outbound
[Diagram: IBGP vs. Route-Reflection. Labels: Core, Network Services, Core 1, Core 2, Peer 1, Peer 2, Collector, Data, Routes, LocalASN, RemoteASN, AS 2, AS 3, AS 4, AS 9]

Routing Data – In/Outbound
§ When your goal is outbound characterization, and your measurement point is the exit point for traffic, IBGP is your guy/girl/other.
  § Routes are always external, and thus always propagated (sans election and policy of course)
  § “Protocols hate being anthropomorphized”
§ When your goal is inbound characterization, and your measurement point is the entry point for traffic, Route-Reflection must be used.
  § Only way to get internal routes “cleanly”

Route Data – Full Mesh (tangent)
§ Value of full mesh monitoring…
  § Historical route tracking
  § Policy benchmarking
  § Tracking MED-selection issues
  § Identifying disasters the FIRST time cluefully
    § Don’t just wait for it to happen again!
    § PLEASE!
    § For everyone’s sake!
§ Slightly off topic, but pretty darn important!

Route Data – Full Mesh (pic)
[Diagram labels: Core 1, Core 2, Collector, Core 2, Core 1]

Traffic Accounting Data
§ Also Known As:
  § Flow-export
  § NetFlow
  § Cflow
  § A MAJOR pain in the AS!

The Quick Skinny on Flow
§ Packet and Byte counters per unique set of traffic attributes
§ Measured from strategic routers per input interface
  § Which interfaces depends on your defined goals/needs...
§ Come a long way in the last few years
  § In some respects…
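
A minimal sketch of what "packet and byte counters per unique set of traffic attributes" means in practice. The `FlowKey` fields are a simplified, assumed subset of what a flow-export record keys on, and the addresses come from documentation ranges; none of this is taken from a real exporter.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical, simplified set of attributes that identify one flow."""
    input_if: int
    src_addr: str
    dst_addr: str
    protocol: int
    src_port: int
    dst_port: int


def account(packets):
    """Accumulate [packet_count, byte_count] per unique FlowKey.

    `packets` is an iterable of (FlowKey, packet_length_in_bytes) pairs.
    """
    counters = defaultdict(lambda: [0, 0])
    for key, length in packets:
        counters[key][0] += 1       # one more packet in this flow
        counters[key][1] += length  # its bytes
    return dict(counters)


web = FlowKey(1, "192.0.2.1", "198.51.100.7", 6, 40000, 80)
dns = FlowKey(1, "192.0.2.1", "198.51.100.53", 17, 40001, 53)
flows = account([(web, 1500), (web, 40), (dns, 76)])
```

Two packets that share every attribute land in the same counter pair; anything that differs in one attribute (here the DNS packet) starts a new flow.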

Flow Data Inbound - Easy
[Diagram labels: Core, Network Services, Core 1, Core 2, Peer 1, Peer 2, Collector, Data, Routes, LocalASN, RemoteASN, AS 2, AS 3, AS 4, AS 9]

Flow Data Outbound - Easy
[Diagram labels: Core, Network Services, Core 1, Core 2, Peer 1, Peer 2, Collector, Data, Routes, LocalASN, RemoteASN, AS 2, AS 3, AS 4, AS 9]

Flow Data Outbound - Harder
[Diagram labels: AS 2, AS 4, Core, AS 6, Core, AS 3]

Flow Data Outbound - Harder
§ Since flow-export data is inbound only, all potential feeder links in a non-hierarchical, mixed-services device must be accounted for in order to catch all outbound traffic
§ Issue: How do you know what data coming in core link 4 is bound for the local external link?
§ Route Reflection is bad here! Can double-count!
§ Problem exacerbated by complex policy

18 Words or less on flow data
§ Micro-management of networks based on flows == BAD
§ Macro-management of networks based on flows == GOOD

Operational Challenges (1)
§ Keep this in mind!
§ Gilb’s Law:
  § “Anything can be measured in a way that is superior to not measuring it at all.”

Operational Challenges (2)
§ ACLs vs. data-export in the great beast!
§ Sampled NetFlow on the GSR is usually distributed to the LCs
§ ACL > SNF > PIRC > IP Coloring > BGP Policy accounting > FR Traffic policing which is not FR traffic shaping
§ Apparently this changes in 12.0(18)S

Operational Challenges (3)
§ Some releases of JUNOS have bugs where only flow data from the highest-numbered ifIndex gets exported
§ Check for PR 20159

Operational Challenges (4)
§ On high-speed interfaces, the best you can realistically do is sample at some ratio < 1:1
  § If you need to count bytes, this will introduce errors
  § If you need to compare samples, make sure the samples are normalized
    § This does NOT mean multiply by interval!
§ Lack of current research on statistical validity of flow data based on samples
  § Last research circa 1993
  § Research predates substantial HTTP traffic

Operational Challenges (5)
§ The Gilb-Wepman Construct:
  § “The total P.I.T.A. factor experienced through the process of network measurement is far less than the total P.I.T.A. factor experienced through planning and engineering a network without network measurements.”
§ P.I.T.A. = Pain In The Ass
  § Those without customers may be unfamiliar with this term

Performance Data
§ Active measurement
  § Round-trip vs. one-way
§ mrtg and link utilization
§ Important, but not part of our examples
  § Short on time sadly…
§ Helps in goal selection and re-selection
§ Bottom line – is it better or worse?

Applying Data to your Goals
§ What to do with all this data?
§ Traffic Accounting Data applied to Routing data?
§ Traffic Load per attribute or route
  § The focus here is on traffic stats (byte and packet rates) per AS-PATH

AS-PATH / Traffic-data tables
§ Traffic load per AS-PATH creates a tree of traffic relationships
  § (101) X bits/sec
  § (101, 1234) Y bits/sec
  § (101, 1234, 9995) Z bits/sec
  § 101 -> 1234 -> 9995
  § X+Y+Z -> Z
§ Addresses the middle-mile ASes instead of the traditional first or last ASN.
§ Allows "TO" (source/sink) and "THROUGH" (transit) values instead of just "TO" values.
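
One way to read the "TO" and "THROUGH" values off a per-AS-PATH traffic table is sketched below. The function name `to_and_through` and the sample rates are illustrative assumptions; only the AS numbers come from the slide.

```python
from collections import defaultdict


def to_and_through(path_rates):
    """Split traffic per AS into TO (path ends there) and THROUGH (path transits it).

    `path_rates` maps an AS-PATH tuple to a traffic rate in bits/sec.
    """
    to = defaultdict(float)
    through = defaultdict(float)
    for path, rate in path_rates.items():
        last = path[-1]
        to[last] += rate
        # Every other AS on the path carries this traffic as transit.
        # set() avoids counting a prepended AS more than once.
        for asn in set(path[:-1]) - {last}:
            through[asn] += rate
    return dict(to), dict(through)


# X = 5, Y = 3, Z = 2 bits/sec (made-up numbers) for the three paths on the slide.
rates = {(101,): 5.0, (101, 1234): 3.0, (101, 1234, 9995): 2.0}
to, through = to_and_through(rates)
total_via_101 = to[101] + through[101]
```

With these numbers AS 101 sinks X itself and transits Y+Z, so everything reached via 101 adds up to X+Y+Z, while AS 9995 at the leaf only sees Z.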

Data Aggregation - Time
§ Aggregate data over timeframes (macro-level view)
  § Long term averages
  § Short term benchmarks
    § Of course, short term means “~long term”.
§ Micro-management of networks based on flows
  § BAD!
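
A rough sketch of time aggregation, assuming byte counts arrive as (timestamp, bytes) pairs; the function name and the one-hour bucket are illustrative choices, not something the slides prescribe.

```python
from collections import defaultdict


def average_rates(samples, bucket_seconds):
    """Return {bucket_start: average bits/sec} for (timestamp, byte_count) samples.

    Timestamps are seconds; each sample is credited to the bucket it falls in.
    """
    totals = defaultdict(int)
    for timestamp, byte_count in samples:
        bucket_start = timestamp - (timestamp % bucket_seconds)
        totals[bucket_start] += byte_count
    return {
        start: total * 8 / bucket_seconds
        for start, total in sorted(totals.items())
    }


samples = [(0, 450_000), (300, 450_000), (3600, 1_800_000)]
hourly = average_rates(samples, 3600)
```

Widening `bucket_seconds` gives the long-term averages; keeping it at an hour or so is closer to the short-term benchmark view.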

Data Aggregation - Interfaces
§ Aggregate across the set of interfaces that represent your problem statement
§ What interfaces am I interested in?
  § Can be interface specific (one)
  § Can be router specific (many)
  § Can be domain wide (all)
  § Can be N of M interfaces (some)
    § Pretty common…

What to do with all this?
§ What does one do once they have all this data?

Eliciting Control and The Feedback Loop
§ Sit down, Josh
§ Begone with your Snake Oil
§ It’s time to beat on some routers

Assumptions about your Routing Architecture
§ Routes to external networks are in BGP
§ Your IGP tells you how to find the NEXT_HOP addresses in BGP
§ We select exit points for traffic based on BGP path selection, not some other weird thing
§ If your routing policy differs significantly from this, you have more problems than measurement can solve

Fixing Outbound Traffic
§ Mark policy on BGP routes at the place where you learn them
  § General policy -- prefer peering links over expensive transit links, prefer private peering links over public peering links
  § Specific policy -- temporarily avoid NAP X for traffic to AS Y, prefer AS C to reach remote network D

Tweakable Knobs
§ LOCAL_PREF
§ MED
§ AS_PATH
§ Check your vendor’s BGP path selection tiebreaker list, and choose a set of knobs that gives you the kind of control your policy dictates
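
To illustrate how the three knobs interact, here is a deliberately simplified sketch of exit selection: highest LOCAL_PREF, then shortest AS_PATH, then lowest MED. Real implementations have longer, vendor-specific tiebreaker lists (and usually compare MED only between routes from the same neighbor AS), so treat the ordering, the `Route` class and the LOCAL_PREF values as assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical view of one candidate route to the same prefix."""
    exit_name: str
    local_pref: int
    as_path: tuple
    med: int = 0


def best_route(routes):
    """Pick a route by highest LOCAL_PREF, then shortest AS_PATH, then lowest MED."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.med))


# General policy from the previous slide, expressed as made-up LOCAL_PREF values:
# private peering (300) over public peering (200) over transit (100).
routes = [
    Route("transit", 100, (64500, 64510)),
    Route("public-peer", 200, (64501, 64502, 64510)),
    Route("private-peer", 300, (64503, 64504, 64505, 64510)),
]
best = best_route(routes)
```

The private peering exit wins even though it has the longest AS_PATH, which is why LOCAL_PREF marked at the point of learning is the bluntest of the knobs.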

Control of Outbound Traffic
§ Danger, Will Robinson!
  § Helpdesk phone may ring
§ Small change, pause, check, log, pause, breathe, repeat
§ Exit selection is a reasonably precise science

Fixing Inbound Traffic
§ Controlling inbound traffic flow is all about trying to influence the BGP path selection decisions that happen in networks you don’t control
§ Some of those networks you pay money to.
  § Money is sometimes an appropriate weapon
§ It’s nice to buy people drinks at NANOG

Tweakable Knobs
§ Provider-specific knobs
  § whois -h whois.ra.net as1755
§ CIDR abuse
  § Cheap trick
  § Longest prefix wins
§ AS_PATH stuffing
§ AS_PATH pollution
  § Another cheap trick
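
Two of these inbound knobs can be sketched in a few lines: "longest prefix wins" (why announcing a more-specific pulls traffic toward one link) and AS_PATH stuffing (prepending your own AS to make a path look longer). The prefixes, link names and AS number below are documentation-range placeholders, and the helper names are hypothetical.

```python
import ipaddress


def longest_match(destination, table):
    """Return the link for the most specific prefix in `table` covering `destination`.

    `table` maps prefix strings to link names; returns None if nothing matches.
    """
    addr = ipaddress.ip_address(destination)
    matching = []
    for prefix, link in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matching.append((net.prefixlen, link))
    if not matching:
        return None
    return max(matching, key=lambda m: m[0])[1]


def prepend(as_path, local_as, times):
    """Return `as_path` with `local_as` stuffed in front `times` extra times."""
    return (local_as,) * times + tuple(as_path)


# The aggregate is announced on link-A, a more-specific half on link-B.
table = {"203.0.113.0/24": "link-A", "203.0.113.128/25": "link-B"}
upper_half = longest_match("203.0.113.200", table)
lower_half = longest_match("203.0.113.10", table)
stuffed = prepend((64500,), 64500, 2)
```

Remote networks that hear both announcements send the upper half of the /24 toward link-B regardless of path length, which is what makes it a cheap trick; the stuffed path only nudges networks that get as far as comparing AS_PATH length.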

Responsible Citizenship
§ Some tweakable knobs have an unwelcome impact on the networks of others
  § Have you met my friend, MED?
§ Your relationship with your target networks is symbiotic
§ It is inappropriate to make demands of someone else’s routing policy, but asking nicely is OK

Conceptual Examples (1)
§ Who are the top consumers of my network resources?
  § Top sources of traffic
  § Top sinks of traffic
  § Asymmetry
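
A small sketch of the "top consumers" question, assuming per-AS byte totals have already been aggregated from flow data. The helper names, the AS numbers (documentation range) and the choice of "inbound share" as the asymmetry measure are all assumptions made for illustration.

```python
def top_n(byte_counts, n):
    """Return the n (asn, bytes) pairs with the largest byte counts, biggest first."""
    return sorted(byte_counts.items(), key=lambda item: item[1], reverse=True)[:n]


def inbound_share(received, sent):
    """Fraction of each AS's total traffic that flows from it to us (0.5 = symmetric)."""
    shares = {}
    for asn in set(received) | set(sent):
        total = received.get(asn, 0) + sent.get(asn, 0)
        if total:
            shares[asn] = received.get(asn, 0) / total
    return shares


received_from = {64500: 900, 64501: 100, 64502: 500}  # bytes sourced by each AS
sent_to = {64500: 100, 64501: 700, 64502: 500}        # bytes sunk by each AS

top_sources = top_n(received_from, 2)
top_sinks = top_n(sent_to, 1)
shares = inbound_share(received_from, sent_to)
```

An AS with a share near 1.0 is mostly a source, near 0.0 mostly a sink, and around 0.5 roughly balanced.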

Conceptual Examples (2)
§ Traffic Aggregation Points and Peering Optimisation
  § Appropriate network expansion
  § Offloading the expensive peer
    § Mitigating settlement fees and traffic ratios
  § Mitigating congestion
    § Do it without MED selection issues
    § Maximize route availability (N>1 copies, not 1 or 0)

Conceptual Examples (3)
§ Theft-over-IP (how to know when peers are stealing from you)
  § Peers dumping traffic at you for routes you didn’t send them
  § Rather rude
  § Catch them in the act
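
Catching this in the act amounts to comparing what arrives on a peering interface against what was advertised to that peer. A minimal sketch, assuming flow records reduced to (destination address, bytes) pairs and a known list of advertised prefixes; names and addresses are placeholders.

```python
import ipaddress


def unadvertised_traffic(flows, advertised_prefixes):
    """Return {destination: bytes} for traffic outside every advertised prefix.

    `flows` is an iterable of (destination_address, byte_count) pairs seen
    inbound on the peering interface.
    """
    nets = [ipaddress.ip_network(prefix) for prefix in advertised_prefixes]
    suspect = {}
    for dst, byte_count in flows:
        addr = ipaddress.ip_address(dst)
        if not any(addr in net for net in nets):
            suspect[dst] = suspect.get(dst, 0) + byte_count
    return suspect


advertised_to_peer = ["203.0.113.0/24"]
inbound_flows = [
    ("203.0.113.9", 1000),    # covered by an advertised route
    ("198.51.100.20", 4000),  # not advertised to this peer
    ("198.51.100.20", 500),
]
suspect = unadvertised_traffic(inbound_flows, advertised_to_peer)
```

Anything left in `suspect` is traffic the peer is sending toward routes it was never given, which is worth a closer look before picking up the phone.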

Who is doing this stuff?
§ Yahoo! - Jeffrey Papen (TUNDRA Tool)
  § Peering Analysis, Capacity Planning, Performance Analysis
  § Features:
    § Custom macros for AS analysis:
      § Source and Destination AS bandwidth details
      § Transit AS (hop counts) bandwidth summary data
      § Bandwidth forecasting; peering merit analysis
    § Billing formulas for cost/benefit budget analysis
  § Also:
    § Analyze internal usage for Charge Back Billing
    § POP-to-POP Network Performance Analysis (latency / loss)
    § DOS attack detection

Destination vs. Transit Traffic – UUNet (Yahoo – TUNDRA Output)

Who is doing this stuff?
§ MFN
§ Lots of people, we think
§ Not enough people, we think

Real Live Network Examples 1
§ We peer with a particular large regional ISP in several places. Due to various familiar reasons, the demands on the peering circuits approach supply
§ Who are the top talkers and top listeners that we reach via this peer? Maybe we can peer with them directly
  § Not just sinks, but traffic aggregation points (middle mile)

Network Facts
§ Topology is not pure core/edge in some locations, so we might expect some complexities
§ All peering routers happen to be GSR 12000s
§ Peering circuits are all OC12
§ Backbone links are mostly OC48

Data Collection
§ Relative traffic volumes
  § Low NetFlow sample ratio is OK
§ Turning on “ip route-cache flow sampled” seems like it can cause traffic belches
§ Turn off all inbound ACLs on peering interfaces
§ Turn off all outbound ACLs on peering routers
§ Drink from the Hose
§ Take off every /var

Analysis of Data
§ Relative byte count through and to networks reached through the peer in question
§ Ranked list of peering candidates
§ Absolute numbers don’t really matter; we have a list of people we should be talking to, in order of how useful they would be to peer with

SeeASP Output

Real Live Network Examples 2
§ AS R wants to peer
  § That’s fine, we’ll public peer with anybody. We’re easy.
§ AS R wants to private peer right away, since they say we send them 140M of traffic already
§ Can we confirm those numbers before we dedicate a port to them?

Network Facts
§ We currently reach AS R through AS T
§ We peer with AS T in six places
§ One of the peering routers is a 7500, which doesn’t do SNF
§ One of the peering routers is also being used to collect data to answer the previous question

More Network Facts
§ Topology is not edge/core everywhere
§ We want numbers out of this, so we need to manage the SNF ratios
§ K1dd13s keep attacking the routers
  § Ops folk attack K1dd13s with ACLs
  § The ACL attacks the SNF
  § The SNF dies!

Analysis
§ We only have traffic samples, but we want absolute numbers
§ We have interface byte and packet counters
§ We can take AS R traffic as a proportion of all AS T traffic, and divide up the mrtg/duck data in proportion
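
The proportional estimate described above can be sketched as follows. The function name, the sampled byte totals and the 400 Mbit/s counter-derived rate are invented for illustration; only the method (sampled share times measured interface rate) comes from the slide.

```python
def estimate_rate(sampled_bytes_by_as, target_as, interface_bits_per_sec):
    """Estimate absolute traffic toward `target_as` from sampled flow data.

    The sampled share of `target_as` among all traffic on the peering link is
    applied to the rate measured from the interface counters.
    """
    total_sampled = sum(sampled_bytes_by_as.values())
    if total_sampled == 0:
        return 0.0
    share = sampled_bytes_by_as.get(target_as, 0) / total_sampled
    return interface_bits_per_sec * share


# Sampled bytes seen leaving toward AS T, split by final destination AS.
sampled = {"AS R": 250, "everyone else via AS T": 750}
estimate = estimate_rate(sampled, "AS R", 400_000_000)
```

The sampled byte counts never need scaling by the sample ratio here, because only their proportions are used; the absolute figure comes entirely from the interface counters.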

Summary
§ What did we talk about?
  § Answering specific, ad-hoc questions by attacking them with numbers
  § Inter-Domain Traffic Engineering is an iterative process (lather, rinse, repeat)
§ What didn’t we talk about?
  § Experience exporting from Juniper (and other non-Cisco) routers
  § Construction of a full-time, general-purpose measurement infrastructure
  § What if my vendor does not support flow-export and traffic accounting?
§ Questions?
  § No? Good.