- Number of slides: 160
DCR Project: Internet Routing System (Model)
Material extracted from the tutorial on BGP routing presented in a training seminar (ECODE FP7 Project)
Dimitri Papadimitriou and Olivier Bonaventure, Alcatel-Lucent Bell and Université catholique de Louvain (UCL)
Outline
§ Internet Routing and Protocols
  - Organization of the Internet
  - Intra-domain routing: static routing, distance vector routing, link-state routing
  - Inter-domain routing: path vector routing
§ BGP basics
§ BGP in large networks
TCP/IP Model
§ Application (FTP, Telnet, WWW, email); the Sockets interface sits between the application and the OS
§ Transport: User Datagram Protocol (UDP), unreliable datagram delivery; Transmission Control Protocol (TCP), reliable byte stream delivery
§ Network layer: Internet Protocol (IP), best-effort datagram delivery (host to host)
§ Data link layer: Ethernet
§ Physical interface (PHY): physical connection
Internetworking - Layer 3: Routers
[Figure: host on network A, router (forwards IP datagrams), host on network B. Layers shown: application (Telnet, FTP, HTTP, email), transport, network (IP), data link (Ethernet), physical (100Base-TX, GbE PHY)]
Internet Routing
§ Internet domains consist of devices called routers, each comprising a routing engine and a forwarding engine (and a management agent)
§ Routing engine: processes routing information (exchanged between routers using a routing protocol such as BGP) in order to compute routes (using a shortest path algorithm)
  - Route entries (composed of a destination, a next-hop interface, and a metric) are stored in the routing information base (RIB)
  - Routing entries are subsequently used by the forwarding engine
§ Forwarding engine: transfers an incoming IP datagram to the outgoing interface leading to the next hop (a router closer to the traffic destination) by performing a longest prefix match lookup, keyed on the datagram's destination address, on the forwarding entries stored in the forwarding information base (FIB)
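The longest prefix match performed by the forwarding engine can be illustrated with a minimal Python sketch. The FIB contents, next-hop addresses and interface names below are hypothetical, and the linear scan is chosen for readability only; real routers use specialised structures such as tries or TCAMs.

```python
import ipaddress

# Hypothetical FIB: destination prefix -> (next hop, outgoing interface).
FIB = {
    ipaddress.ip_network("0.0.0.0/0"): ("192.0.2.1", "eth0"),
    ipaddress.ip_network("10.0.0.0/8"): ("192.0.2.2", "eth1"),
    ipaddress.ip_network("10.1.0.0/16"): ("192.0.2.3", "eth2"),
}


def lookup(fib, destination):
    """Return the FIB entry of the most specific prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    # Keep every prefix that contains the destination address.
    matches = [net for net in fib if addr in net]
    if not matches:
        return None  # no route: the datagram would be dropped
    # The longest prefix (largest prefix length) wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return fib[best]
```

With this table, a datagram for 10.1.2.3 matches all three prefixes and is forwarded according to the /16 entry, while 8.8.8.8 only matches the default route.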
What is Routing?
§ Routing components: routing information exchange, routing algorithm, routing information base (RIB) and its entries
Architecture of a normal IP router
§ Control plane: routing protocol and routing table. The "best" paths selected from the routing table built by the routing protocols are installed in the forwarding table
§ Data plane, per IP packet: classification (Class.), policing (Pol.), forwarding, shaping (Shap.)
§ Forwarding decision based on longest prefix match
§ Update of TTL and checksum fields in IP packets
Simplified router model
§ Router (as modeled in the simulator): a routing engine (routing information processing, RIB) and a forwarding engine (forwarding table, interfaces)
§ Data structures: RIB (or RT) = Routing Information Base; FT = Forwarding Table. Note: the Forwarding Information Base (which populates the FT) is derived from the RIB
§ Processing: the "best" paths selected from the routing table built by the routing protocol are installed in the forwarding table (FT)
§ Routing information processing: route selection, route computation, route dissemination
§ Forwarding engine: forwarding decision based on longest prefix match; update of TTL and checksum fields in IP packets
§ No simulation of the forwarding table
Routing Information Base (RIB)
§ Repository in which all IP routing protocols place their routes (routing entries)
§ The RIB is not specific to any routing protocol
§ Routes are inserted into the RIB whenever a routing protocol learns or computes a new route
§ When a destination becomes unreachable, the route is first marked unusable and later removed from the RIB, as per the specifications of the routing protocol it was learned from
§ Note: the RIB is NOT used for forwarding IP datagrams
§ The RIB is also referred to as the Routing Table (RT)
Routing vs Forwarding (RIB vs FIB)
§ Routing Information Base (RIB): table that contains all destinations to which the router may forward IP datagrams
  - RIB entries can be used to populate the FIB based on some selection criteria. These entries can also be used as a source to re-advertise routing information
  - The RIB may contain multiple paths to the same destination (possibly with the same or different degrees of preference, and possibly advertised from multiple sources)
§ Forwarding Information Base (FIB): table containing the information necessary to forward IP datagrams
  - Describes a database indexing network prefixes versus router interface identifiers
  - At minimum, contains the interface identifier and next-hop information for each reachable destination network prefix
  - Note: contains unique paths only (i.e., does not contain secondary paths)
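As an illustration of how a FIB with unique paths can be derived from a RIB holding several candidate paths, here is a small Python sketch. The selection criterion (lowest preference value first, then lowest metric) is an assumption made for the example, as are the field names and the sample routes; the slides only say that "some selection criteria" are applied.

```python
from collections import namedtuple

# Hypothetical RIB entry layout; "preference" plays the role of a
# per-protocol trust value where lower is better (an assumption).
Route = namedtuple("Route", "prefix next_hop interface preference metric source")

EXAMPLE_RIB = [
    Route("10.0.0.0/8", "192.0.2.1", "eth0", 110, 20, "ospf"),
    Route("10.0.0.0/8", "192.0.2.2", "eth1", 20, 0, "bgp"),
    Route("172.16.0.0/12", "192.0.2.3", "eth2", 120, 3, "rip"),
]


def build_fib(rib):
    """Keep one path per prefix: lowest preference, then lowest metric."""
    best = {}
    for route in rib:
        current = best.get(route.prefix)
        if current is None or (route.preference, route.metric) < (
            current.preference,
            current.metric,
        ):
            best[route.prefix] = route
    # The FIB only needs the next hop and the outgoing interface.
    return {prefix: (r.next_hop, r.interface) for prefix, r in best.items()}
```

The RIB keeps both paths to 10.0.0.0/8, whereas the resulting FIB contains a single entry for that prefix.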
Dynamic Routing Operations
§ The basic functions of dynamic routing are:
  - Processing of routing information: route computation (algorithm) resulting in routing table entries
  - Maintenance of the RIB
  - Distribution of routing information (routes, network topology state, etc.)
§ Routing functions are implemented by the routing engine
Internet Routing Protocols
§ Interior Gateway Protocol (IGP)
  - Routing of IP datagrams inside each domain
  - Only knows the topology of its own domain (all routers within a given AS are managed by a single administrative unit)
§ Exterior Gateway Protocol (EGP)
  - Routing of IP packets between domains
  - Each domain is considered as a black box
[Figure: four interconnected domains, Domain 1 to Domain 4]
Inter vs Intra-domain Routing Protocols
§ IGP: intra-domain routing (within an AS)
  - Allows routers to transmit IP packets towards their destination along the best path = shortest path (metrics: number of hops, link cost)
  - IGP routing protocols: distance vector RIPv2, or link state OSPFv2 (RFC 2328) and IS-IS (RFC 1195)
  - All routers exchange routing information: each domain router can obtain routing information for the whole domain (all routers within a given AS are managed by a single administrative unit)
§ EGP: inter-domain routing (between ASes)
  - Routing policies based on business relationships
  - No common metrics, and limited cooperation
  - Policy-based, path-vector routing protocol: external/internal Border Gateway Protocol (eBGP/iBGP)
[Figure: domains running an IGP and iBGP internally, interconnected by eBGP sessions]
Inter vs Intra-domain Routing Protocols
[Figure: AS 15, AS 16 and AS 76. Step 1: ASes as abstract nodes (eBGP only). Step 2: ASes with internal structure (iBGP sessions inside each AS, eBGP sessions between ASes)]
Outline
§ Internet Routing and Protocols
  - Organization of the Internet: topology; types of domains (transit domain, stub domain, example of domain)
  - Intra-domain routing: static routing, distance vector routing, link-state routing
  - Inter-domain routing: path vector routing
§ BGP basics
§ BGP in large networks
Organization of the Internet
§ Internet: infrastructure composed of an interconnected set of (heterogeneous) networks, architected around a distributed routing system that is partitioned into independently administered domains (autonomous systems)
§ A domain is a set of routers, links, hosts and local area networks under the same administrative control
  - A domain can be very large: AS 568 (SUMNET-AS DISO-UNRRA) contains 73154560 IP addresses
  - A domain can be very small: AS 2111 (IST-ATRIUM TE Experiment) is a single PC running Linux
§ The Internet is composed of ~30,000 autonomous systems (AS)
Organization of the Internet
§ Domains are interconnected in various ways
  - The interconnection of all domains should in theory allow packets to be sent anywhere
  - Usually an IP datagram needs to cross a few ASes (3 to 4; 3.4 on average) to reach its destination
Evolution of the Internet Topology (1)
§ 1986: NSF builds NSFNet as a backbone linking 6 supercomputer centers at 56 kbps; huge increase in connections, especially from universities
§ 1987: 10,000 hosts; 1989: 100,000 hosts; 1992: 1 million hosts
§ 1988: NSFNet backbone upgrades to 1.5 Mbps
§ 1991: NSF lifts restrictions on the commercial use of the Net
§ 1994: NSF reverts back to a research network (vBNS); the backbone of the Internet consists of multiple private backbones
§ Before '95: strictly hierarchical network with a single central backbone (NSFNet backbone, regional networks, campus networks)
Evolution of the Internet Topology (2)
§ Between 1995 and 1999: increased meshedness between ISP backbones and customers
§ Decentralization: from a single backbone network to a conglomeration of hundreds of backbones and thousands of ISPs
§ Loss of hierarchy and abstraction: from a hierarchical network to increasingly meshed interconnection
§ Significant bandwidth increase: link capacity grows from T1 (1.5 Mbps) and T3 (45 Mbps) to OC-12 (622 Mbps) and OC-48 (2.5 Gbps)
[Figure: meshed interconnection of AS 1 to AS 4, with routers R1 and R4]
Evolution of the Internet Topology (3)
§ Can be viewed as structured into tiers
§ Tier-1 ISPs, a.k.a. backbone providers
  - A dozen or so (12 to 20 ASes) large international or large national ISPs, interconnected by multiple private peering points (shared cost)
  - Provide transit service (no “upstream” provider)
  - Examples: AT&T, Verizon, Sprint, Level 3, etc.
§ Tier-2 ISPs
  - Regional or national ISPs (on the order of 1k ASes)
  - Customer of T1 ISP(s), at least one and often two, and provider of T3 ISP(s)
  - Shared-cost with other T2 ISPs
  - Examples: France Telecom, BT, Belgacom
§ Tier-3 ISPs, a.k.a. stub ASes
  - Smaller ISPs, corporate networks, content providers (on the order of 10k ASes)
  - Customers of T2 or T1 ISPs (no transit service to other ISPs)
  - Shared-cost with other T3 ISPs
§ Interconnections
  - An ISP runs (private) Points of Presence (PoPs) where its customers and other ISPs connect to it
  - ISPs also connect at (public) Network Access Points (NAPs); this is called public peering
Tier-1 ISP
§ “Tier-1” ISPs (a.k.a. backbone providers, e.g., AT&T, Verizon, Sprint, Level 3, Qwest): national/international coverage, treating each other as equals (peers)
§ Tier-1 providers interconnect privately = multiple private peering
§ Tier-1 providers also interconnect at public network access points (NAPs) = public peering
[Figure: Tier-1 ISPs interconnected directly and through a NAP]
Tier-2 ISP
§ “Tier-2” ISPs (often regional or national): ISPs that connect to one or more Tier-1 ISPs, and possibly to other Tier-2 ISPs
§ A Tier-2 ISP pays a Tier-1 ISP for connectivity to the rest of the Internet: the Tier-2 ISP is a customer of the Tier-1 ISP
§ Tier-2 ISPs also peer privately with each other, and interconnect publicly at NAPs
[Figure: Tier-2 ISPs attached to Tier-1 ISPs at a PoP]
Tier-3 ISP
§ “Tier-3” ISPs: last-hop (“access”) networks (closest to end systems)
§ Tier-3 ISPs are customers of higher-tier ISPs, which connect them to the rest of the Internet
[Figure: Tier-3 ISPs attached to Tier-2 ISPs, which attach to Tier-1 ISPs interconnected through a NAP and PoPs]
Organization of the Internet
§ Tier-1 ISPs
  - A dozen or so large ISPs interconnected by shared-cost peering
  - Provide transit service
  - Examples: UUNet, Level 3, Sprint, ...
§ Tier-2 ISPs
  - Regional or national ISPs
  - Customer of T1 ISP(s)
  - Provider of T3 ISP(s)
  - Shared-cost with other T2 ISPs
  - Examples: France Telecom, BT, Belgacom
§ Tier-3 ISPs
  - Smaller ISPs, corporate networks, content providers
  - Customers of T2 or T1 ISPs
  - Shared-cost with other T3 ISPs
AS Ranking
§ Two ranking methods are proposed:
  - Degree-based: ASes are ranked by their degree in the AS topology graph: http://as-rank.caida.org/
  - AS-relationship-based: ASes are ranked by their customer cone size
§ See http://as-rank.caida.org/data
  - RouteViews BGP AS links annotated with inferred relationships
  - Dataset date: 20080818
  - Alpha parameter of inference algorithm: 0.01000
  - Format:
Summary
§ Based on AS connectivity and relationships, the Internet routing infrastructure can be viewed as a three-tier hierarchy
  - Core: a dozen or so Tier-1 providers forming the top level of the hierarchy
  - Middle: a few thousand ASes (Tier-2 providers) that provide transit service but are not part of the core
  - Edge: tens of thousands of stub ASes that do not provide transit service; usually local ISPs, ASPs and CSPs
Types of domains
§ The Internet consists of routing domains, i.e., Autonomous Systems (AS), interconnected with each other:
  - Transit domain: provider, hooking many ASes together
  - Stub domain: smaller corporation/domain, with at least one and usually two connections to other domains, and no transit service to other domains
§ Two-level routing:
  - Intra-domain: the administrator is responsible for the choice of routing protocol within the network (usually a link-state routing protocol)
  - Inter-domain: the standard for inter-domain routing is BGP
Types of domains (1)
§ Transit domain
  - A transit domain allows external domains to use its own infrastructure to send packets to other domains
§ Examples
  - UUNet, OpenTransit, GEANT, Internet2, RENATER, EQUANT, BT, Telia, Level 3, ...
[Figure: transit domains T1, T2 and T3 interconnecting stub domains S1 to S4]
Types of domains (2)
§ Stub domain
  - A stub domain does not allow external domains to use its infrastructure to send packets to other domains
  - A stub is connected to at least one transit domain
  - Single-homed stub: connected to one transit domain
  - Dual-homed stub: connected to two transit domains
§ Content stub domain (Content Service Provider)
  - Large web servers: Yahoo, Google, MSN, TF1, BBC, ...
§ Access-rich stub domain (Access Service Provider)
  - ISPs providing Internet access via CATV, ADSL, ...
[Figure: stub domains S1 to S4 attached to transit domains T1 and T3]
Multihomed domains
§ Definition: use of redundant network links/connections to the same or different domains for the purposes of external connectivity
§ Objectives:
  - Robustness in case of failure (link, upstream domain)
  - Performance (load balancing)
  - Cost
§ Multi-homed stub AS: connectivity to multiple immediate upstream transit domains
§ Multi-homed transit AS
[Figure: stub domain S3 and transit domains T1, T2 and T3]
A transit domain : Easynet
A transit domain : GEANT
A transit domain : BT/IGnite
A large transit domain : UUNet
Composition of Internet paths
§ Most Internet paths contain a sequence of
  - 0 or more Customer->Provider relationships
  - 0 or 1 Peer-to-Peer relationships
  - 0 or more Provider->Customer relationships
[Figure: AS 1, AS 2, AS 3, AS 4, AS 7, AS 8 and AS 9 connected by customer-provider links ($) and shared-cost (peering) links]
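The path composition rule above can be checked mechanically. The Python sketch below assumes each hop of a path has already been labelled with a hypothetical relationship tag ("c2p" for Customer->Provider, "p2p" for Peer-to-Peer, "p2c" for Provider->Customer); how such labels are inferred is outside its scope.

```python
import re

# Hypothetical one-letter codes for the three relationship tags.
_CODES = {"c2p": "u", "p2p": "p", "p2c": "d"}


def follows_path_rule(hops):
    """True if hops is 0+ c2p, then 0 or 1 p2p, then 0+ p2c."""
    encoded = "".join(_CODES[hop] for hop in hops)
    # u* = climbing to providers, p? = at most one peering link,
    # d* = descending to customers.
    return re.fullmatch(r"u*p?d*", encoded) is not None
```

For example, `["c2p", "p2p", "p2c"]` satisfies the rule, while `["p2c", "c2p"]` (descending to a customer and then climbing again) does not.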
Intradomain Routing
§ Goal
  - Allow routers to transmit IP datagrams along the best path towards their destination
  - Best usually means the shortest path, measured as the sum of link costs or as the number of hops along the path
  - Sometimes best means the least loaded path
  - Allow alternate routes to be found in case of failures
§ Behavior
  - All routers exchange routing information
  - Each domain router can obtain routing information for the whole domain
  - The network operator or the routing protocol selects the cost of each link
Intradomain Routing Protocols
§ Static routing
  - Only useful in very small domains
§ Distance vector routing
  - Routing Information Protocol (RIPv2 - RFC 2453)
  - Still used in small domains despite its limitations
§ Link-state routing
  - Open Shortest Path First (OSPF - RFC 2328): developed to address the needs of large, scalable internetworks that RIP could not. Widely used in enterprise networks and ISPs
  - Integrated Intermediate System-Intermediate System (IS-IS - RFC 1195), or Dual IS-IS: extended version of IS-IS (RFC 1142) that supports multiple routed protocols including IP. Widely used by ISPs
Distance Vector Routing: Bellman-Ford
§ Distance Vector routing algorithm (a.k.a. Bellman-Ford algorithm)
  - Each router maintains a vector with an entry for every destination that contains: distance = cost to reach the destination from this router; direction = direct link that is on that least-cost path
  - Each router periodically sends its vector to its direct neighbors containing, for each known prefix: the IP address prefix; the distance between itself and the destination
  - Upon receiving a vector, a router updates its local vector based on the direct link's cost and the received vector
§ Consequently
  - Distance Vector (DV) routing algorithms do not allow a router to know the exact topology of an internetwork
  - Routers discover the best path to destinations based on accumulated metrics from each neighbor
Distance Vector Routing: Bellman-Ford
§ Define distances at each node x
  - $d_x(y)$ = cost of the least-cost path from x to y
§ Update distances based on neighbors (each node only has a “next-hop view”)
  - $d_x(y) = \min_{v} \{ c(x,v) + d_v(y) \}$ over all neighbors v of x
  - Example for node u with neighbors v and w: $d_u(z) = \min \{ c(u,v) + d_v(z),\ c(u,w) + d_w(z) \}$
§ Every node sends its vector to its directly connected neighbors
§ Upon receiving a vector, a router updates its local vector based on the direct link's cost and the received vector
§ After a few iterations, the routing table converges to a consistent state
[Figure: example topology with nodes s, t, u, v, w, x, y, z and weighted links]
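The update performed when a vector is received can be sketched in a few lines of Python. The data layout (a dictionary mapping each destination to a `(distance, next_hop)` pair) and the function name are assumptions made for this illustration, not part of any protocol specification.

```python
INF = float("inf")


def dv_update(own_vector, neighbor, link_cost, neighbor_vector):
    """Merge a neighbor's vector into own_vector; return True if it changed.

    own_vector maps destination -> (distance, next_hop).
    neighbor_vector maps destination -> distance as seen by the neighbor.
    """
    changed = False
    for dest, dist in neighbor_vector.items():
        candidate = link_cost + dist  # c(x, v) + d_v(y)
        current, via = own_vector.get(dest, (INF, None))
        # Adopt a strictly better path, or follow the current next hop
        # when its own distance changed (even if it got worse).
        if candidate < current or (via == neighbor and candidate != current):
            own_vector[dest] = (candidate, neighbor)
            changed = True
    return changed
```

The second condition is what lets bad news propagate at all: a router must accept a worse distance from the neighbor it currently routes through.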
Distance Vector Routing: Count-to-Infinity
§ Routing loops can occur when routers propagate a reverse route (= route pointing back to the router from which packets were received)
§ “Count-to-infinity” problem (loop between three or more routers = cycle in the network topology graph)
  - Example: routers A, B, C and D connected by links of cost 1; the A-D link fails
  - A tells B and C that D is unreachable
  - B tells A that D is reachable via C with cost=3 (since the route is through C, split horizon doesn't apply)
  - A tells C that D is reachable through A (cost=4)
  - Etc.
§ Rule of thumb: with distance vectors, good news travels quickly and bad news travels slowly (resulting in slow convergence)
Distance Vector Routing: Count-to-Infinity
§ How to avoid the count-to-infinity problem?
  - Define “infinity” as a maximum distance (16 is often used in distance vector routing protocols) to prevent routing updates from looping endlessly
  - Avoid (two-hop) routing loops using split horizon: a technique for preventing reverse routes between two routers
§ Simple Split Horizon: when sending updates out an interface, do not send networks that were learned from an update that came in on the same interface
  - Rule: do not send vector information back in the direction from which it came
§ Split Horizon with Poisoned Reverse: when sending updates out a particular interface, mark as unreachable any networks that were learned from an update received on that interface (considered safer and stronger)
  - Rule: do send vector information back, with the distance set to infinity
§ Note: split horizon does not solve the “count-to-infinity” problem for loops (cycles) between three or more routers
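The two split horizon variants differ only in what is placed in the outgoing update. A minimal Python sketch, assuming the local vector maps each destination to a `(distance, next_hop)` pair and that each neighbor is reached over its own interface (both are simplifications made for the example):

```python
INFINITY = 16  # maximum distance used as "unreachable"


def build_update(vector, out_neighbor, mode="poisoned_reverse"):
    """Build the vector advertised to out_neighbor.

    mode is "none", "split_horizon" or "poisoned_reverse".
    """
    update = {}
    for dest, (dist, next_hop) in vector.items():
        if next_hop == out_neighbor:
            if mode == "split_horizon":
                continue  # do not send the route back where it came from
            if mode == "poisoned_reverse":
                dist = INFINITY  # send it back, marked unreachable
        update[dest] = min(dist, INFINITY)
    return update
```

Simple split horizon produces shorter updates, while poisoned reverse explicitly tells the neighbor that the reverse route must not be used.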
Link State Routing
§ Assumptions: each router can discover the state of the links to its neighbors and the cost of each link
§ Basic principle: if every router knows how to reach its directly connected neighbors, the local knowledge of each router can be distributed to all other routers so that every router can construct a weighted graph. With this knowledge, a node can always determine the shortest path to any other router
§ Each router has its own topological database (LSDB) on which the SPF algorithm is run, producing an SPF tree and then a routing table
[Figure: routers R1, R2 and R3 between networks N1 and N2 (hosts A and B); a topology change is announced in a Link State Update and triggers, on each router, the process that updates its routing table]
Terminology (1)
§ Link: interface on a router
§ Interface: connection between a router and one of its attached networks (an interface is referred to as a link)
  - An interface has state information associated with it, obtained from the underlying lower-level protocols and the routing protocol itself
  - State: the functional level of an interface, which determines whether or not full adjacencies are allowed to form over the interface
§ Link state (LS): description of a router interface (= link)
  - A single IP interface address and interface mask (unless the network is an unnumbered point-to-point network)
  - Output cost(s): cost of sending a data packet on the interface, expressed in the link state metric (advertised as the interface link cost). The cost of an interface must be greater than zero
  - List of neighboring routers: other routers attached through this link
§ Flooding: mechanism used to reliably distribute the local topology description so as to keep the routers' Link State Database (LSDB) up to date
Terminology (2)
§ Link State PDU
  - Unit of data describing the local state of a router's interfaces and adjacencies
  - Each LS PDU is flooded throughout the “routing domain”
  - The collected LS advertisements of all routers and networks form the link state database
§ Link State Database (LSDB)
  - Collection of all LS PDUs (re-)originated by the area's routers
  - Logically separate LSDB for each area the router is connected to
  - Two routers interfacing the same area must have, for that area, identical LSDBs
§ Each router advertises its directly connected networks via LS PDUs
§ Every router has its own view of the network: it builds a “topological database”
§ Example: Router 1 is aware of 2 paths to 32.2.3.0/24; this provides redundancy should one of the routers fail
[Figure: routers R1 to R4 joined by links of metric 1, with prefixes 14.2.2.0/24, 32.2.3.0/24, 32.2.5.0/24 and 32.2.9.0/24, and a topological database table with columns Router, Destination, Next Hop, Cost]
Terminology (3)
§ Shortest Path computation: performed on the link state database in order to produce a router's routing table
  - Each router has an identical LSDB, leading to an identical representation of the network topology graph
  - Each router generates its routing table from this graph by computing a tree of shortest paths with the local router as the root of this tree
§ Shortest Path Tree (SPT)
  - Derived from the collected LS PDUs using the Dijkstra algorithm (Shortest Path algorithm)
  - Shortest-path tree with the local router as root; gives the shortest path to any IP destination network or host (only the next hop to the destination is used in the forwarding process)
§ Routing table (a.k.a. Routing Information Base or RIB)
  - Derived from the LSDB using the SPF algorithm
  - Each entry of this table is indexed by a destination, and contains the destination's cost and a set of paths (each described by its type and next hop) to use in forwarding packets to the destination
Link State Protocol and SPF Algorithm
§ Link-state protocols
  - Reliably flood routing information to all routers of the same routing domain
  - Each router knows the topology of the entire routing domain
  - Converge more quickly than distance vector protocols (link-state protocols are less prone to routing loops)
[Figure: routers R1 to R4 between networks N1 and N2 (hosts A and B) exchange Link State PDUs; each router keeps a topological database (LSDB), runs the SPF algorithm on it to obtain an SP tree, and derives its routing table]
Link State Concept (1)
§ Definitions
  - Link-state PDUs (LS PDUs): PDUs carrying routing information exchanged between routers
  - Topological database (LSDB): collection of information gathered from LS PDUs
  - Shortest Path First (SPF) algorithm: performed on the database, resulting in the SPF tree
  - Routing table: repository of the known paths and interfaces
§ Steps
  1. Flooding of link-state information: routers send LS PDUs to their adjacent neighbors
  2. Building a topological DB: LS PDUs are used to build a topological database
  3. SPF algorithm: computes the shortest path first tree
  4. SP tree: the root of the tree is the individual router
  5. Routing table: the resulting routing entries populate the routing table
Link State Concept (2)
§ 1. Flooding of link-state information
  - Each node (router) on the network announces to all other routers on the network its own piece of link-state information (including who its neighboring routers are and the cost of the link to each of them)
  - Example: “Hi, I'm Router 1, and I can reach Router 2 via a T1 link and I can reach Router 3 via an Ethernet link.”
  - Each router sends these announcements to all of the routers in the network
Link State Concept (3)
§ 2. Building a topological database
  - Each router collects link-state information from other routers into its LSDB
§ 3. Shortest-Path First (SPF), Dijkstra's algorithm
  - Using this information, routers can recreate a network topology graph
  - Path-selection model (to destination prefix): minimum hop count or sum of link costs
Link State Concept (4)
§ 4. Shortest path tree
  - The SPF algorithm computes the SP tree, with the local router as the root of the tree and the other routers and the links to them as the branches of the tree
§ 5. Routing table
  - Using this information, the router creates a routing table populated by routing entries
Link State Advertisement Processing
§ Flowchart, applied to each LSA received in an LSU:
  1. Is the entry in the LSDB? If no, go to A
  2. If yes: is the seq. # the same? If yes, ignore the LSA
  3. If no: is the seq. # higher? If yes, go to A. If no, send an LSU with newer information to the source. End
  A. Add to database, send LS Ack, flood the LSA, run SPF to calculate the new routing table. End
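The decision flow for a received LSA can be written as a short Python function. This is a sketch of the flowchart only: the LSDB is modelled as a plain dictionary, the returned action names are hypothetical labels, and real protocols add ageing, checksums and sequence-number wraparound handling that are ignored here.

```python
def process_lsa(lsdb, lsa_id, seq, body):
    """Apply the flowchart to one LSA and return the list of actions taken.

    lsdb maps lsa_id -> (sequence number, body) and is updated in place.
    """
    stored = lsdb.get(lsa_id)
    if stored is not None:
        stored_seq = stored[0]
        if seq == stored_seq:
            return ["ignore"]  # same copy already in the database
        if seq < stored_seq:
            # The sender holds stale information: reply with our newer copy.
            return ["send_newer_lsu_to_source"]
    # Unknown entry or higher sequence number (branch "A" of the flowchart).
    lsdb[lsa_id] = (seq, body)
    return ["add_to_database", "send_ls_ack", "flood_lsa", "run_spf"]
```

Only the last branch modifies the database and triggers a new SPF run, which is why duplicate LSAs received during flooding are cheap to handle.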
Shortest-Path Problem
§ Given: network topology with link costs
  - c(x, y): link cost from node x to node y
  - c(x, y) set to infinity if x and y are not direct neighbors
§ Compute: from a given source (u), least-cost paths to all other nodes
  - p(i): predecessor node along the path from the source to i
[Figure: example topology with nodes s, t, u, v, w, x, y, z and weighted links, next to the shortest-path tree from u, which leaves u through interfaces I/f 1 and I/f 2]
Dijkstra's Algorithm (iterative algorithm)
§ Notation
  - c(x, y): link cost from node x to y; c(x, y) = infinity if x and y are not direct neighbors
  - D(v): current value of the cost of the path from the source to destination v
  - S: set of nodes whose least-cost path is definitively known
§ Initialization
    S = {u}    /* u is the source node */
    for all nodes v
        if v adjacent to u then D(v) = c(u, v)
        else D(v) = ∞
§ Repeat
    find w not in S such that D(w) is a minimum
    add w to S
    update D(v) for all v adjacent to w and not in S:
        D(v) = min( D(v), D(w) + c(w, v) )
  until all nodes are in S
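A runnable Python version of the algorithm is sketched below. It uses a binary heap to find the node w with minimum D(w), which is an implementation choice rather than something prescribed by the slides, and it also records the predecessor p(i) of each node so that the first hop can be recovered. The sample graph is hypothetical and is not the topology of the figures.

```python
import heapq

# Hypothetical topology: node -> {neighbor: link cost}.
GRAPH = {
    "u": {"v": 3, "w": 2},
    "v": {"u": 3, "w": 4, "y": 2},
    "w": {"u": 2, "v": 4, "x": 1},
    "x": {"w": 1, "y": 5},
    "y": {"v": 2, "x": 5},
}


def dijkstra(graph, source):
    """Return (D, p): least costs from source and predecessor of each node."""
    dist = {source: 0}
    pred = {}
    in_s = set()  # the set S of nodes whose least cost is final
    heap = [(0, source)]
    while heap:
        d_w, w = heapq.heappop(heap)  # w not in S with minimum D(w)
        if w in in_s:
            continue  # stale heap entry
        in_s.add(w)
        for v, cost in graph[w].items():
            new_cost = d_w + cost  # D(w) + c(w, v)
            if v not in in_s and new_cost < dist.get(v, float("inf")):
                dist[v] = new_cost
                pred[v] = w
                heapq.heappush(heap, (new_cost, v))
    return dist, pred


def first_hop(pred, source, dest):
    """Walk the predecessors back from dest to find the next hop at source."""
    node = dest
    while pred[node] != source:
        node = pred[node]
    return node
```

Calling `dijkstra(GRAPH, "u")` yields the cost to every node, and `first_hop` turns the shortest-path tree into the next-hop information that a routing table at u would store.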
Shortest-Path Tree: Example
§ Shortest-path tree from u
[Figure: tree rooted at u, leaving u through interfaces I/f 1 and I/f 2]
§ Routing table at u
    Destination  Next-hop  Cost
    v            v         3
    w            w         2
    x            w         3
    y            v         5
    z            v         6
    s            w         6
    t            w         8
§ Forwarding table at u
    Destination  Interface  Next-hop
    v            1          v
    w            2          w
    x            2          w
    y            1          v
    z            1          v
    s            2          w
    t            2          w
Inter-domain routing
§ Goals
  - Allow IP packets to be transmitted along the best path towards their destination, through several transit domains, while taking into account the routing policies of each domain and without knowing the detailed topology of those domains
  - From an interdomain viewpoint, best path often means cheapest path
  - Each domain is free to specify, inside its routing policy, the domains for which it agrees to provide a transit service and the method it uses to select the best path to reach a destination
Inter-domain Routing Protocols
§ Border Gateway Protocol (BGP) version 4 (RFC 4271) is an inter-domain routing protocol
  - Exchanges routing information between ASes while guaranteeing loop-free path selection
  - BGP is similar to a distance vector protocol, but is called “path vector” instead: a BGP router advertises in its vector only reachability information and the associated path attributes for each destination (which avoids loops), with no costs or hop counts
  - Unlike IGPs such as RIPv2 and OSPFv2, BGP does not use metrics like hop count, link cost, or delay. Instead, BGP makes its routing decisions (best path selection) based on network policies and route selection rules applied in sequence to various path attributes
  - Supports classless inter-domain routing (CIDR) and route aggregation
§ THE inter-domain routing protocol of the Internet
Routing Protocols: Comparison
§ Link State
  - Dissemination: reliably flood topology information (link state PDUs) to all routers within the routing domain
  - Algorithm: Dijkstra's shortest path; best end-to-end paths are computed locally at each router (to determine the next hop)
  - Condition: works only if policy is shared and uniform
  - Convergence: fast due to flooding; responds rapidly to topology changes
  - Protocols: OSPFv2 (RFC 2328), IS-IS (RFC 1195)
§ Distance Vector
  - Dissemination: update distances from neighbors' distances; each router knows little about the network topology
  - Algorithm: Bellman-Ford shortest path; best end-to-end paths result from the composition of all next-hop choices
  - Condition: notion of distance and direction
  - Convergence: slow (bouncing effect, count-to-infinity)
  - Protocols: RIPv1 (RFC 1058), RIPv2 (RFC 2453)
§ Path Vector
  - Dissemination: update paths based on neighbors' paths
  - Algorithm: local policy to rank paths; route updates include the entire AS path information to prevent count-to-infinity
  - Condition: does not require any notion of distance; does not require uniform policies at all routers
  - Convergence: slow due to path exploration (path hunting)
  - Protocols: BGPv4
Summary
§ Types of domains
  - Transit domain
  - Stub domain
§ Intradomain routing
  - Selects the best route towards each destination based on one metric
  - Static routing
  - Distance vector routing
  - Link-state routing
Outline
§ Internet Routing
§ BGP basics
  - Routing policies
  - The Border Gateway Protocol
  - How to prefer some routes over others
§ BGP in large networks
Domains versus Autonomous Systems
§ The BGP interdomain routing protocol deals with Autonomous Systems (AS)
  - An AS is defined as “a set of routers under a single technical administration ... that presents a consistent picture of what destinations are reachable through it.”
  - Each AS is identified by its AS number
§ In practice
  - A domain is often equivalent to an AS
  - A domain may be composed of several ASes (ex: Worldcom uses AS 701, AS 702, ...)
  - Many domains do not have an AS number (ex: small networks connected to one provider without using BGP)
Types of interdomain links
§ Two types of interdomain links
  - Private link: usually a leased line between two routers belonging to the two connected domains
  - Connection via a public interconnection point: usually a Gigabit or higher Ethernet switch that interconnects routers belonging to different domains
[Figure: routers R1 and R2 on a private link between DomainA and DomainB; routers R1 to R4 attached to a public interconnection point, showing physical links versus interdomain links]
Routing policies § In theory BGP allows each domain to define its own routing policy. . . § In practice there are two common policies Customer-provider peering – Customer c buys Internet connectivity from provider P Shared-cost peering – Domains x and y agree to exchange packets by using a direct link or through an interconnection point
Customer-provider peering
§ Principle
  - Customer sends to its provider its internal routes and the routes learned from its own customers
  - Provider will advertise those routes to the entire Internet to allow anyone to reach the Customer
  - Provider sends to its customers all known routes
  - Customer will be able to reach anyone on the Internet
[Figure: AS 1, AS 2, AS 3, AS 4 and AS 7 connected by customer-provider links ($)]
Shared-cost peering
§ Principle
  - PeerX sends to PeerY its internal routes and the routes learned from its own customers
  - PeerY will use the shared link to reach PeerX and PeerX's customers
  - PeerX's providers are not reachable via the shared link
  - PeerY sends to PeerX its internal routes and the routes learned from its own customers
  - PeerX will use the shared link to reach PeerY and PeerY's customers
  - PeerY's providers are not reachable via the shared link
[Figure: AS 1, AS 2, AS 3, AS 4 and AS 7 connected by customer-provider links ($) and shared-cost links]
Routing policies
§ A domain specifies its routing policy by defining, on each BGP router, two sets of filters for each peer
  - Import filter: specifies which routes can be accepted by the router among all the routes received from a given peer
  - Export filter: specifies which routes can be advertised by the router to a given peer
§ Filters can be defined in RPSL (Routing Policy Specification Language)
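The customer-provider and shared-cost principles translate into a simple export filter. The Python sketch below is an illustration only: the relationship labels ("internal", "customer", "peer", "provider") and the function name are assumptions, and real policies are expressed in router configuration or RPSL rather than in code like this.

```python
def export_allowed(learned_from, advertise_to):
    """Decide whether a route may be advertised to a neighbor.

    learned_from: "internal", "customer", "peer" or "provider".
    advertise_to: "customer", "peer" or "provider".
    """
    if advertise_to == "customer":
        # A provider sends all known routes to its customers.
        return True
    # Towards providers and shared-cost peers, only internal routes and
    # routes learned from customers are advertised.
    return learned_from in ("internal", "customer")
```

Under this filter a domain never carries traffic between two of its providers or peers, which matches the statement that a peer's providers are not reachable via the shared link.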
Outline § Internet Routing § BGP basics § Routing policies The Border Gateway Protocol How to prefer some routes over others BGP in large networks
The Border Gateway Protocol
§ Principle
– Path vector protocol: a BGP router advertises its best route to each destination
– ... with incremental updates: advertisements are only sent when their content changes
[Figure: prefix 1.0.0.0/8 originated by AS1 and propagated among AS1, AS2, AS4 and AS5; each advertisement carries the prefix and an ASPath that grows at every hop, for example AS1, then AS4:AS1]
"Origin" of the routes announced by BGP
§ Where do the routes announced by a BGP router come from?
– Learned from other BGP routers
  – The BGP router only propagates the received routes
– Static configuration
  – The BGP router is configured to advertise some prefixes
  – Drawback: requires manual configuration
  – Advantage: stable set of advertised prefixes
– Learned from an Interior Gateway Protocol
  – The prefixes received from the IGP are advertised by the BGP router, usually as an aggregate
  – Advantage: BGP advertisements follow the network state; a prefix is automatically withdrawn by BGP if it is no longer reachable via the IGP
  – Drawback: BGP announcements will be unstable if the IGP is unstable
Policies and BGP
§ Two mechanisms to support policies in BGP
– Each domain decides for itself which route is best to reach a destination, based on the routes learned from its peers
  – The chosen best route is not necessarily the "shortest" route, as it would be with IGPs
  – Only the best route towards each destination can be announced to external peers
– Each domain determines, on its own, which routes can be advertised to each peer
  – An AS does not necessarily advertise all the routes that it knows to all its neighbors
BGP Routing Information Base (RIB)
§ The BGP RIB consists of three distinct parts
– Adj-RIBs-In
  – Stores routing information learned from inbound UPDATE messages received from other BGP speakers
  – These routes are available as input to the Decision Process after applying Import Policy rules (import filter)
– Loc-RIB
  – Contains the local routing information the BGP speaker selects by applying its local policies to the routing information contained in its Adj-RIBs-In
  – These are the routes that will be used by the local BGP speaker
– Adj-RIBs-Out
  – Stores routing information the local BGP speaker selected for advertisement to its peers
  – This routing information is carried in the local BGP speaker's UPDATE messages after applying Export Policy rules (export filter)
Conceptual model of a BGP router (1)
[Figure: BGP messages from Peer[1] ... Peer[N] pass through a per-peer import filter with attribute manipulation into the Adj-RIB-In (all acceptable routes); the BGP Decision Process selects one best route to each destination into the Loc-RIB; these routes pass through a per-peer export filter with attribute manipulation into the Adj-RIB-Out and are sent as BGP messages to Peer[1] ... Peer[N]]
§ Import filter(Peer[i]): determines which BGP messages are acceptable from Peer[i]
§ Export filter(Peer[i]): determines which routes can be sent to Peer[i]
§ BGP decision process: selects the best route towards each destination
§ BGP Loc-RIB: contains the routes that have been selected by the local BGP speaker's Decision Process
Conceptual model of a BGP router (2)
§ Processing pipeline (constrained by the operator's policies and configuration language)
– Receive BGP Updates
– Apply Import Policies: input filtering of routes and treatment of attributes, based on prefix and attributes
– Best Route Selection (BGP Decision Process): the best routes are placed in the Loc-RIB
– Install forwarding entries for best routes in the IP Forwarding Table
– Apply Export Policies: output filtering of routes and treatment of attributes
– Send BGP Updates
§ Adj-RIB-In: contains unprocessed routing information that has been advertised to the local BGP speaker by its peers
§ Loc-RIB: contains the routes that have been selected by the local BGP speaker's Decision Process
§ Adj-RIB-Out: contains the routes for advertisement to specific peers by means of the local speaker's UPDATE messages
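The three RIBs and the two filters can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class and method names (`Route`, `Speaker`, `receive`, `advertise`) are hypothetical, the import and export filters only perform AS-path loop checks, and the decision process is reduced to "highest local-pref, then shortest AS path".

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple          # AS numbers, most recently added AS first
    next_hop: str
    local_pref: int = 100


@dataclass
class Speaker:
    my_as: int
    adj_rib_in: dict = field(default_factory=dict)   # peer -> {prefix: Route}
    loc_rib: dict = field(default_factory=dict)      # prefix -> best Route
    adj_rib_out: dict = field(default_factory=dict)  # peer -> {prefix: Route}

    def receive(self, peer, route):
        # Import filter (simplified): refuse routes whose path contains our AS.
        if self.my_as not in route.as_path:
            self.adj_rib_in.setdefault(peer, {})[route.prefix] = route
        self._decide(route.prefix)

    def _decide(self, prefix):
        candidates = [rib[prefix] for rib in self.adj_rib_in.values()
                      if prefix in rib]
        if candidates:
            # Simplified decision: highest local-pref, then shortest AS path.
            self.loc_rib[prefix] = max(
                candidates, key=lambda r: (r.local_pref, -len(r.as_path)))
        else:
            self.loc_rib.pop(prefix, None)

    def advertise(self, peer, peer_as, my_address):
        out = {}
        for prefix, best in self.loc_rib.items():
            # Export filter (simplified): skip peers already on the path.
            if peer_as in best.as_path:
                continue
            # Over eBGP, prepend our AS and set ourselves as next hop.
            out[prefix] = replace(best, as_path=(self.my_as,) + best.as_path,
                                  next_hop=my_address, local_pref=100)
        self.adj_rib_out[peer] = out
        return out
```

A real router stores many more attributes and applies operator-defined policies in both filters; the sketch only shows how routes flow from Adj-RIB-In through the Loc-RIB to Adj-RIB-Out.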
BGP Decision Process
§ When a BGP speaker receives more than one route for the same IPv4 prefix, the BGP route selection rules are used to choose which IPv4 route is installed by BGP
– Highest Local Preference (enforce relationships)
– Shortest AS-PATH
– Lowest MED (traffic engineering)
– eBGP preferred over iBGP (iBGP < eBGP)
– Lowest IGP cost to BGP egress
– Lowest router ID (tie breaker)
§ Note: routing information that is not selected or not yet processed is usually kept, in case the currently selected information is withdrawn or superseded
BGP: Principles of operation
§ Principles
– BGP relies on the incremental exchange of path vectors
– A BGP session is established over a TCP connection between peers
– Each peer sends all its active routes
– As long as the BGP session remains up, the BGP routing tables are updated incrementally
[Figure: R1 in AS3 and R2 in AS4 exchanging BGP messages over a BGP session]
BGP: Principles of operation (2)
§ Simplified model of BGP: 2 types of BGP path vectors
– UPDATE
  – Used to announce a route towards one prefix
  – Content of UPDATE: destination address/prefix; interdomain path used to reach the destination (AS-Path); Nexthop (address of the router advertising the route)
– WITHDRAW
  – Used to indicate that a previously announced route is not reachable anymore
  – Content of WITHDRAW: unreachable destination address/prefix
BGP: Session Initialization
Initialize_BGP_Session(RemoteAS, RemoteIP)
{
  /* Initialize and start BGP session */
  /* Send BGP OPEN message to RemoteIP on port 179 */
  /* Follow BGP state machine */
  /* Advertise local routes and routes learned from peers */
  foreach (destination d inside BGP-Loc-RIB)
  {
    B = build_BGP_UPDATE(d);
    S = apply_export_filter(RemoteAS, B);
    if (S <> NULL)
    { /* send UPDATE message */
      send_UPDATE(S, RemoteAS, RemoteIP);
    }
  }
  /* Entire RIB was sent */
  /* New UPDATEs will be sent only to reflect local or distant changes in routes */
  ...
}
Events during a BGP session
1. Addition of a new route to RIB
– A new internal route was added on the local router
  – Static route added by configuration
  – Dynamic route learned from IGP
– Reception of UPDATE message announcing a new or modified route
2. Removal of a route from RIB
– Removal of an internal route
  – Static route is removed from router configuration
  – Intradomain route declared unreachable by IGP
– Reception of WITHDRAW message
– Loss of BGP session: all routes learned from this peer are removed from RIB
Export and Import filters
BGPMsg apply_export_filter(RemoteAS, BGPMsg)
{
  /* check if RemoteAS already received the route */
  if (RemoteAS isin BGPMsg.ASPath)
    BGPMsg = NULL;
  /* Many additional export policies can be configured: */
  /* Accept or refuse the BGPMsg */
  /* Modify selected attributes inside BGPMsg */
  return BGPMsg;
}
BGPMsg apply_import_filter(RemoteAS, BGPMsg)
{
  /* check that we are not already inside ASPath */
  if (MyAS isin BGPMsg.ASPath)
    BGPMsg = NULL;
  /* Many additional import policies can be configured: */
  /* Accept or refuse the BGPMsg */
  /* Modify selected attributes inside BGPMsg */
  return BGPMsg;
}
BGP: Processing of UPDATES
Recvd_BGPMsg(Msg, RemoteAS)
{
  B = apply_import_filter(RemoteAS, Msg);
  if (B == NULL) /* Msg not acceptable */
    return;
  if IsUPDATE(Msg)
  {
    Old_Route = BestRoute(Msg.prefix);
    Insert_in_RIB(Msg);
    Run_Decision_Process(RIB);
    if (BestRoute(Msg.prefix) <> Old_Route)
    { /* best route changed */
      B = build_BGP_Message(Msg.prefix);
      S = apply_export_filter(RemoteAS, B);
      if (S <> NULL) /* announce best route */
        send_UPDATE(S, RemoteAS);
      else if (Old_Route <> NULL)
        send_WITHDRAW(Msg.prefix);
    }
  }
  ...
BGP: Processing of WITHDRAW
Recvd_BGPMsg(Msg, RemoteAS)
  ...
  if IsWITHDRAW(Msg)
  {
    Old_Route = BestRoute(Msg.prefix);
    Remove_from_RIB(Msg);
    Run_Decision_Process(RIB);
    if (BestRoute(Msg.prefix) <> Old_Route)
    { /* best route changed */
      B = build_BGP_Message(Msg.prefix);
      S = apply_export_filter(RemoteAS, B);
      if (S <> NULL) /* still one best route */
        send_UPDATE(S, RemoteAS, RemoteIP);
      else if (Old_Route <> NULL) /* no best route anymore */
        send_WITHDRAW(Msg.prefix, RemoteAS, RemoteIP);
    }
  }
}
The BGP messages
§ Variable length messages with a fixed size header (drawn 32 bits wide)
– Marker (16 bytes): all bits set to 1
– Length: 16 bits
– Type
– Max length of BGP messages: 4096 bytes
§ Message types
– OPEN: used to establish a BGP session
– UPDATE: used to send new routes and to remove unusable routes
– NOTIFICATION: used to inform the remote peer of an error; the BGP session is closed upon transmission or reception of a NOTIFICATION message
– KEEPALIVE: sent periodically on each BGP session when no other message is sent (a common default is one message every 30 seconds)
– ROUTE_REFRESH: used to ask a peer to re-send its routes, for example after a change of import policy
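The fixed header described above can be packed and parsed with Python's `struct` module. This is a sketch: the function names are hypothetical, and the numeric type codes (1 to 5) are taken from the BGP-4 and Route Refresh specifications rather than from the slide.

```python
import struct

# Marker (16 bytes), Length (16 bits), Type (8 bits), network byte order.
BGP_HEADER = struct.Struct("!16sHB")
MARKER = b"\xff" * 16
MAX_MESSAGE_LENGTH = 4096

# Type codes as assigned in the BGP specifications (assumption: not on the slide).
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION",
                 4: "KEEPALIVE", 5: "ROUTE_REFRESH"}


def build_message(msg_type, payload=b""):
    """Prefix a payload with the fixed BGP header."""
    length = BGP_HEADER.size + len(payload)
    if length > MAX_MESSAGE_LENGTH:
        raise ValueError("BGP message too long")
    return BGP_HEADER.pack(MARKER, length, msg_type) + payload


def parse_header(data):
    """Return (total length, message type name) from the first 19 bytes."""
    marker, length, msg_type = BGP_HEADER.unpack_from(data)
    if marker != MARKER:
        raise ValueError("bad marker")
    if not BGP_HEADER.size <= length <= MAX_MESSAGE_LENGTH:
        raise ValueError("bad length")
    return length, MESSAGE_TYPES.get(msg_type, "UNKNOWN")
```

A KEEPALIVE is then simply `build_message(4)`: a header with no payload.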
The OPEN message
§ Used to establish a BGP session between two BGP peers
§ Fields (drawn 32 bits wide)
– Version: currently version 4
– My AS Number: AS number of the BGP peer sending the message
– Hold Time: maximum delay between successive KEEPALIVE and/or UPDATE messages
– BGP Identifier: usually the IPv4 loopback address of the BGP peer
– Opt. Len and Optional Parameters: variable length, encoded in TLV format; used notably for capabilities negotiation
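Establishment of a BGP session
[Figure: time-sequence diagram between two routers. The initiator issues CONNECT.req, sent as a TCP SYN to port 179; the peer sees CONNECT.ind and answers with CONNECT.resp, sent as SYN+ACK; the initiator sees CONNECT.conf and returns an ACK, after which the TCP connection is established on both sides. Each router then sends a BGP OPEN message (DATA.req(OPEN)), which the other acknowledges; once both OPEN messages have been exchanged, the BGP session is established on both sides.]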
The UPDATE message
§ Single message type used to carry both IPv4 route announcements and route withdrawals
§ Fields (drawn 32 bits wide)
– # Withdrawn routes
– Withdrawn routes (variable length): a list of entries, each made of LEN (prefix length in bits) followed by the withdrawn prefix (1-4 octets)
– Tot. Path Attr. Len
– Path attributes (variable length)
– Network Layer Reachability Information (variable length): a list of entries, each made of LEN (prefix length in bits) followed by the advertised prefix (1-4 octets)
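The (LEN, prefix) encoding used in both the withdrawn-routes and NLRI fields keeps only as many prefix octets as the prefix length requires. A minimal Python sketch, with hypothetical function names and IPv4 only:

```python
import ipaddress


def encode_prefix(prefix):
    """Encode an IPv4 prefix as one length octet plus the significant octets."""
    net = ipaddress.ip_network(prefix)
    n_octets = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:n_octets]


def decode_prefixes(data):
    """Decode a concatenation of (LEN, prefix) entries into prefix strings."""
    prefixes = []
    i = 0
    while i < len(data):
        prefix_len = data[i]
        n_octets = (prefix_len + 7) // 8
        # Pad the truncated prefix back to 4 octets with zeros.
        raw = data[i + 1:i + 1 + n_octets] + bytes(4 - n_octets)
        prefixes.append(f"{ipaddress.IPv4Address(raw)}/{prefix_len}")
        i += 1 + n_octets
    return prefixes
```

For instance, `194.100.0.0/24` takes four bytes on the wire (one length octet and three prefix octets), while `1.0.0.0/8` takes only two.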
BGP Path Attributes
§ A BGP update (announcement / withdraw) includes path attribute values and Network Layer Reachability Information (NLRI)
§ Reachability information is encoded as one or more 2-tuples of the form <length, prefix>
BGP Path Attributes (2)
§ ORIGIN: mandatory attribute that defines the origin of the path information as generated by the BGP speaker that originates the associated routing information. It is one of: IGP (from a network statement), EGP (from an external peer), INCOMPLETE (from IGP redistribution)
§ AS_PATH: mandatory attribute identifying the AS sequence through which the routing information carried in the UPDATE message has passed. Provides a mechanism for loop detection. The local AS is added only when the route is sent to an external peer
§ NEXT_HOP: mandatory attribute that identifies the (unicast) IP address of the router that should be used as the next hop to the destination prefixes listed in the NLRI field of the UPDATE message
§ MED (Multi-Exit Discriminator): optional non-transitive attribute used on external (inter-AS) links to discriminate among multiple exit or entry points to the same neighboring AS. Optionally used by a BGP speaker's Decision Process to discriminate among multiple entry points to a neighboring AS
§ LOCAL_PREF: attribute included in all UPDATE messages that a given BGP speaker sends to internal peers (iBGP). A BGP speaker uses it to inform its other internal peers of the advertising speaker's degree of preference for an advertised route. It overrides AS_PATH length in the decision process and is propagated throughout the local AS. Never advertised to eBGP peers
The KEEPALIVE and NOTIFICATION messages
§ The KEEPALIVE message
– BGP message containing only the default header
– Every HoldTime/3 seconds, send a KEEPALIVE message if no recent BGP message was sent
§ The NOTIFICATION message
– Indicates a problem in the processing of a BGP message
– The BGP session is released upon transmission/reception of NOTIFICATION
– Fields: Err Code, SubCode, additional data (variable length)
– Example errors
  – 2: OPEN Message Error (Unsupported Version, Unsupported Optional Parameter, ...)
  – 3: UPDATE Message Error (Malformed Attribute List, ...)
  – 4: Hold Timer Expired
  – 5: Finite State Machine Error
  – 6: Cease
BGP and IP: a first example
§ Initial updates
[Figure: AS10 (R1, 194.100.0.0/24), AS20 (R2), AS30 (R3, 194.100.1.0/24) and AS40 (R4) connected by BGP sessions]
– UPDATE sent by R1 (AS10): prefix: 194.100.0.0/24, NextHop: R1, ASPath: AS10
– UPDATE sent by R2 (AS20): prefix: 194.100.0.0/24, NextHop: R2, ASPath: AS20:AS10
– UPDATE sent by R4 (AS40): prefix: 194.100.0.0/24, NextHop: R4, ASPath: AS40:AS10
§ What happens if link AS10-AS20 goes down?
BGP and IP: a second example
[Figure: R1 (AS10, 194.100.0.0/24) linked to R2 (AS20, 194.100.2.0/23) over 195.100.0.0/30, with R2 at 195.100.0.2; R2 (195.100.0.5) linked to R3 (AS30, 194.100.1.0/24, 195.100.0.6) over 195.100.0.4/30]
§ UPDATE prefix: 194.100.0.0/24, NextHop: 195.100.0.1, ASPath: AS10
§ UPDATE prefix: 194.100.2.0/23, NextHop: 195.100.0.2, ASPath: AS20
§ Main path attributes of the UPDATE message
– NextHop: IP address of the router used to reach the destination
– ASPath: path followed by the route advertisement
BGP and IP: a second example (2)
[Figure: R1 (AS10, 194.100.0.0/24) linked to R2 (AS20, 194.100.2.0/23) over 195.100.0.0/30, with R2 at 195.100.0.2; R2 (195.100.0.5) linked to R3 (AS30, 194.100.1.0/24, 195.100.0.6) over 195.100.0.4/30]
§ UPDATE prefix: 194.100.0.0/24, NextHop: 195.100.0.5, ASPath: AS20:AS10
§ UPDATE prefix: 194.100.2.0/23, NextHop: 195.100.0.5, ASPath: AS20
§ UPDATE prefix: 194.100.1.0/24, NextHop: 195.100.0.2, ASPath: AS20:AS30
§ UPDATE prefix: 194.100.1.0/24, NextHop: 195.100.0.6, ASPath: AS30
BGP and IP: a second example (3)
[Figure: R1 (AS10, 194.100.0.0/24) linked to R2 (AS20, 194.100.2.0/23) over 195.100.0.0/30, with R2 at 195.100.0.2; R2 (195.100.0.5) linked to R3 (AS30, 194.100.1.0/24, 195.100.0.6) over 195.100.0.4/30]
§ WITHDRAW prefix: 194.100.1.0/24
Outline § Internet Routing § BGP basics § Routing policies The Border Gateway Protocol How to prefer some routes over others BGP in large networks
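How to prefer some routes over others?
§ How to ensure that packets will flow on the primary link?
[Figure: R1 in AS1 connected to RA and RB in AS2 by a backup link (2 Mbps) and a primary link (34 Mbps)]
§ How to prefer the cheap link over the expensive link?
[Figure: R1 in AS1 connected by an expensive link towards AS2 (RA, RB) and by a cheap link towards AS4 (R2); AS3 (R3) and AS5 (R5) lie beyond]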
How to prefer some routes over others (2)?
[Figure: BGP messages from Peer[1] ... Peer[N] pass through the import filter with attribute manipulation into the BGP RIB (all acceptable routes); the BGP Decision Process selects one best route to each destination; best routes pass through the export filter with attribute manipulation and are sent to Peer[1] ... Peer[N]]
§ Import filter
– Selection of acceptable routes
– Addition of local-pref attribute inside received BGP Msg
  – Normal quality route: local-pref=100
  – Better than normal route: local-pref=200
  – Worse than normal route: local-pref=50
§ Simplified BGP Decision Process
– Select routes with the highest local-pref
– If there are several routes, choose routes with the shortest AS Path
– If there are still several routes, apply the tie-breaking rule
How to prefer some routes over others (3)?
[Figure: R1 in AS1 connected to RA (backup, 2 Mbps) and RB (primary, 34 Mbps) in AS2]
§ RPSL-like policy for AS1
aut-num: AS1
import: from AS2 RA at R1 set localpref=100;
        from AS2 RB at R1 set localpref=200;
        accept ANY
export: to AS2 RA at R1 announce AS1
        to AS2 RB at R1 announce AS1
§ RPSL-like policy for AS2
aut-num: AS2
import: from AS1 R1 at RA set localpref=100;
        from AS1 R1 at RB set localpref=200;
        accept AS1
export: to AS1 R1 at RA announce ANY
        to AS1 R1 at RB announce ANY
How to prefer some routes over others (4)?
[Figure: R1 in AS1 connected by an expensive link to RA in AS2 and by a cheap link to R2 in AS4; RB (AS2), R3 (AS3) and R5 (AS5) lie beyond]
§ RPSL policy for AS1
aut-num: AS1
import: from AS2 RA at R1 set localpref=100;
        from AS4 R2 at R1 set localpref=200;
        accept ANY
export: to AS2 RA at R1 announce AS1
        to AS4 R2 at R1 announce AS1
– AS1 will prefer to send packets over the cheap link
– But the flow of the packets destined to AS1 will depend on the routing policy of the other domains
Limitations of local-pref
§ In theory, each domain is free to define its order of preference for the routes learned from external peers
[Figure: AS1 announces 1.0.0.0/8 to AS3 and AS4, which are also connected to each other]
– Preferred paths for AS3: 1. AS4:AS1, 2. AS1
– Preferred paths for AS4: 1. AS3:AS1, 2. AS1
§ How to reach 1.0.0.0/8 from AS3 and AS4?
Limitations of local-pref (2)
§ AS1 sends its UPDATE messages (Prefix: 1.0.0.0/8, ASPath: AS1) to AS3 and AS4
– Preferred paths for AS3: 1. AS4:AS1, 2. AS1
– Routing table for AS3: 1.0.0.0/8 ASPath: AS1 (best)
– Preferred paths for AS4: 1. AS3:AS1, 2. AS1
– Routing table for AS4: 1.0.0.0/8 ASPath: AS1 (best)
Limitations of local-pref (3)
§ First possibility: AS3 sends its UPDATE first (Prefix: 1.0.0.0/8, ASPath: AS3:AS1)
– Preferred paths for AS3: 1. AS4:AS1, 2. AS1
– Routing table for AS3: 1.0.0.0/8 ASPath: AS1 (best)
– Preferred paths for AS4: 1. AS3:AS1, 2. AS1
– Routing table for AS4: 1.0.0.0/8 ASPath: AS1; 1.0.0.0/8 ASPath: AS3:AS1 (best)
§ Stable route assignment
Limitations of local-pref (4)
§ Second possibility: AS4 sends its UPDATE first (Prefix: 1.0.0.0/8, ASPath: AS4:AS1)
– Preferred paths for AS3: 1. AS4:AS1, 2. AS1
– Routing table for AS3: 1.0.0.0/8 ASPath: AS1; 1.0.0.0/8 ASPath: AS4:AS1 (best)
– Preferred paths for AS4: 1. AS3:AS1, 2. AS1
– Routing table for AS4: 1.0.0.0/8 ASPath: AS1 (best)
§ Another (but different) stable route assignment
Limitations of local-pref (5)
§ Third possibility: AS3 and AS4 send their UPDATE together
– AS3 sends UPDATE Prefix: 1.0.0.0/8, ASPath: AS3:AS1
– AS4 sends UPDATE Prefix: 1.0.0.0/8, ASPath: AS4:AS1
– Preferred paths for AS3: 1. AS4:AS1, 2. AS1
– Preferred paths for AS4: 1. AS3:AS1, 2. AS1
§ AS3 prefers the indirect path and will thus send a WITHDRAW, since its chosen best path is now via AS4
§ AS4 prefers the indirect path and will thus send a WITHDRAW, since its chosen best path is now via AS3
Limitations of local-pref (6)
§ Third possibility (cont.): AS3 and AS4 send their UPDATE together, then each sends WITHDRAW Prefix: 1.0.0.0/8
– Preferred paths for AS3: 1. AS4:AS1, 2. AS1
– Preferred paths for AS4: 1. AS3:AS1, 2. AS1
§ AS3 learns that the indirect route is not available anymore
– AS3 will reannounce its direct route...
§ AS4 learns that the indirect route is not available anymore
– AS4 will reannounce its direct route...
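The three possibilities can be replayed with a toy simulation. The sketch below is an illustration only, with hypothetical names: each AS is reduced to a single choice between its "direct" route (ASPath AS1) and its preferred "indirect" route via the other AS, and the indirect route is usable only while the other AS itself uses its direct route (otherwise the path is withdrawn or loops).

```python
def best(other_choice):
    """Pick the preferred indirect route when the neighbour offers one."""
    return "indirect" if other_choice == "direct" else "direct"


def simulate(simultaneous, rounds=6):
    """Return the successive (AS3 choice, AS4 choice) pairs."""
    as3, as4 = "direct", "direct"   # state after AS1's initial UPDATEs
    history = [(as3, as4)]
    for _ in range(rounds):
        if simultaneous:
            # Both react to the other's previous announcement at once.
            as3, as4 = best(as4), best(as3)
        else:
            # AS3's UPDATE reaches AS4 first, then AS3 reacts to AS4.
            as4 = best(as3)
            as3 = best(as4)
        history.append((as3, as4))
    return history


print(simulate(simultaneous=False)[-1])   # settles: AS4 goes via AS3
print(simulate(simultaneous=True)[:4])    # flips back and forth
```

When one AS announces first, the system settles after one round; when both always react at the same instant, the pair alternates between (direct, direct) and (indirect, indirect) indefinitely. Real routers are not perfectly synchronised, so in practice this configuration eventually falls into one of the two stable assignments.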
More limitations of local-pref
§ Unfortunately, interdomain routing may not converge at all in some cases
[Figure: AS0 connected to AS1, AS3 and AS4, which are also connected to each other]
– Preferred paths for AS1: 1. AS3:AS0, 2. AS0
– Preferred paths for AS3: 1. AS4:AS0, 2. AS0
– Preferred paths for AS4: 1. AS1:AS0, 2. AS0
§ How to reach a destination inside AS0 in this case?
local-pref and economic relationships
§ In practice, local-pref is often used to enforce economic relationships
§ Local-pref values used by AS1
– > 1000 for the routes received from a Customer
– 500 to 999 for the routes learned from a Peer
– < 500 for the routes learned from a Provider
[Figure: AS1 with providers Prov1 and Prov2, peers Peer1 to Peer4 (shared-cost links) and customers Cust1 and Cust2 (customer-provider links, $)]
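The local-pref ranges above, together with the export rules of customer-provider and shared-cost peering, can be written as two small functions. This is a sketch: the concrete values 1100, 700 and 100 are arbitrary picks inside the ranges on the slide, and the function names are hypothetical.

```python
# Arbitrary example values chosen inside the ranges used by AS1.
LOCAL_PREF = {"customer": 1100, "peer": 700, "provider": 100}


def import_local_pref(learned_from):
    """local-pref set by the import filter, by relationship with the sender."""
    return LOCAL_PREF[learned_from]


def may_export(learned_from, send_to):
    """Export rule: own and customer routes go to everyone,
    peer and provider routes go to customers only."""
    if learned_from in ("self", "customer"):
        return True
    return send_to == "customer"
```

With these rules, a route learned from a provider is never announced to a peer or to another provider, so AS1 does not carry transit traffic between two domains that do not pay it.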
Consequence of this utilization of local-pref
§ Which route will be used by AS1 to reach AS5?
§ And how will AS5 reach AS1?
§ Internet paths are often asymmetrical
[Figure: AS1 to AS8 connected by shared-cost and customer-provider ($) links]
Guidelines for a safe utilisation of local-pref
§ The directed graph composed of the customer -> provider links is loop-free
– An AS cannot be a provider of one of its own (direct or indirect) providers
[Figure: AS1, AS2 and AS3 connected by customer-provider ($) links]
§ An AS always prefers a route via a customer over a route via a provider or a peer
– With some restrictions on the graph composed of peer-to-peer relationships, it is also possible to allow an AS to give the same preference to a route via a customer or via a peer
Summary
§ Routing policies
– Two main routing policies
  – Customer-Provider relationship
  – Peer-to-Peer relationship
§ The Border Gateway Protocol
– Path vector protocol with incremental updates
– Import and export filters to implement routing policies
– Utilisation of local-pref
  – Influence BGP decision process
  – Prefer some routes over others
  – Be careful with possible oscillations due to bad settings
Outline
§ Internet Routing Protocols
§ BGP basics
§ BGP in large networks
– The needs for iBGP
– Confederations and Route Reflectors
– The BGP decision process
– Scalable routing policies
BGP and IP: second example
[Figure: AS10 (R1 at 195.100.0.1, 194.100.0.0/23), AS20 (R2 and R4, 194.100.2.0/23 and 194.100.4.0/23) and AS30 (R3 at 195.100.0.6); links R1-R2 on 195.100.0.0/30 (R2 at 195.100.0.2), R2-R4 on 195.100.0.8/30 (R2 at 195.100.0.10, R4 at 195.100.0.9) and R4-R3 on 195.100.0.4/30 (R4 at 195.100.0.5); BGP sessions R1-R2 and R4-R3]
§ Problem: how can R2 (resp. R4) advertise to R4 (resp. R2) the routes learned from AS10 (resp. AS30)?
BGP and IP: second example (2)
[Figure: AS10 (R1, 194.100.0.0/23), AS20 (R2 and R4, running an IGP) and AS30 (R3); links R1-R2 on 195.100.0.0/30, R2-R4 on 195.100.0.8/30 and R4-R3 on 195.100.0.4/30; BGP sessions R1-R2 and R4-R3]
§ First solution: use the IGP (OSPF/IS-IS, RIP) to carry BGP routes
§ Drawbacks
– The IGP may not be able to support so many routes
– The IGP does not carry BGP attributes like ASPath!
The AS7007 incident
[Figure: RX in ASX announces 4.0.0.0/8 with path ASX:AS3:AS6 to R1 in AS7007; R2 in AS7007 announces 4.0.0.0/8 with path AS7007 to RY in ASY]
§ A single configuration error in two routers
– All routes learned from ASX on R1 were redistributed to R2 via the IGP, and R2 announced them to ASY
§ Consequence
– AS7007 advertised routes claiming that almost all IP addresses belonged to AS7007
– These routes were shorter than the real routes
– Two hours of disruption for large parts of the Internet!
iBGP and eBGP
[Figure: AS10 (R1, 194.100.0.0/23), AS20 (R2 and R4) and AS30 (R3); eBGP sessions R1-R2 and R4-R3, iBGP session R2-R4]
§ Solution: use BGP to carry routes between all routers of the domain
– Two different types of BGP sessions
  – eBGP between routers belonging to different ASes
  – iBGP between each pair of routers belonging to the same AS
– Each BGP router inside ASx maintains an iBGP session with all other BGP routers of ASx (full iBGP mesh)
– Note that the iBGP sessions do not necessarily follow the physical topology
iBGP versus eBGP
§ Differences between iBGP and eBGP
– The local-pref attribute is only carried inside messages sent over iBGP sessions
– Over an eBGP session, a router only advertises its best route towards each destination
  – Usually, import and export filters are defined for each eBGP session
– Over an iBGP session, a router advertises only its best routes learned over eBGP sessions
  – A route learned over an iBGP session is never advertised over another iBGP session
  – Usually, no filter is applied on iBGP sessions
iBGP and eBGP: Example
[Figure: AS10 (R1 at 195.100.0.1, 194.100.0.0/23), AS20 (R2 and R4) and AS30 (R3 at 195.100.0.6); eBGP sessions R1-R2 and R4-R3, iBGP session R2-R4]
§ UPDATE (via eBGP): Prefix: 194.100.0.0/23, NextHop: 195.100.0.1, ASPath: AS10
§ UPDATE (via iBGP): Prefix: 194.100.0.0/23, NextHop: 195.100.0.1, ASPath: AS10, Local-pref: 1000
§ UPDATE (via eBGP): Prefix: 194.100.0.0/23, NextHop: 195.100.0.5, ASPath: AS20:AS10
§ Note that the next-hop and the AS-Path of BGP update messages are only updated when sent over an eBGP session
iBGP and eBGP: Packet Forwarding
[Figure: AS10 (R1 at 195.100.0.1, 194.100.0.0/23), AS20 (R2 and R4) and AS30 (R3); eBGP sessions R1-R2 and R4-R3, iBGP session R2-R4]
§ IGP routing table of R2
– 195.100.0.0/30: West
– 195.100.0.4/30: via 195.100.0.9
– 195.100.0.8/30: South
– 194.100.4.0/23: via 195.100.0.9
– 194.100.2.0/23: North
§ BGP routing table of R2
– 194.100.0.0/23: via 195.100.0.1
§ IGP routing table of R4
– 195.100.0.0/30: via 195.100.0.10
– 195.100.0.4/30: East
– 195.100.0.8/30: North
– 194.100.2.0/23: via 195.100.0.10
– 194.100.4.0/23: West
§ BGP routing table of R4
– 194.100.0.0/23: via 195.100.0.1
iBGP and eBGP: Packet Forwarding (2)
[Figure: AS10 (R1 at 195.100.0.1, 194.100.0.0/23), AS20 (R2 and R4) and AS30 (R3); eBGP sessions R1-R2 and R4-R3, iBGP session R2-R4]
§ IGP routing table of R4
– 195.100.0.0/30: via 195.100.0.10
– 195.100.0.4/30: East
– 195.100.0.8/30: North
– 194.100.2.0/23: via 195.100.0.10
– 194.100.4.0/23: West
§ BGP routing table of R4
– 194.100.0.0/23: via 195.100.0.1
§ Forwarding table of R4
– 194.100.0.0/23: via 195.100.0.10
– 195.100.0.0/30: via 195.100.0.10
– 195.100.0.4/30: East
– 195.100.0.8/30: North
– 194.100.2.0/23: via 195.100.0.10
– 194.100.4.0/23: West
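The forwarding table of R4 is obtained by resolving each BGP next hop through the IGP table. A minimal Python sketch of this recursive lookup, using R4's tables from the slide; the function name and the dictionary representation are assumptions made for the example.

```python
import ipaddress


def build_forwarding_table(igp_table, bgp_table):
    """Merge IGP routes with BGP routes whose next hop is resolved via the IGP."""
    fib = dict(igp_table)
    for prefix, bgp_next_hop in bgp_table.items():
        address = ipaddress.ip_address(bgp_next_hop)
        matches = [p for p in igp_table
                   if address in ipaddress.ip_network(p)]
        if not matches:
            continue  # unreachable next hop: the BGP route is ignored
        # Longest-prefix match on the BGP next hop.
        longest = max(matches,
                      key=lambda p: ipaddress.ip_network(p).prefixlen)
        fib[prefix] = igp_table[longest]
    return fib


igp_r4 = {
    "195.100.0.0/30": "via 195.100.0.10",
    "195.100.0.4/30": "East",
    "195.100.0.8/30": "North",
    "194.100.2.0/23": "via 195.100.0.10",
    "194.100.4.0/23": "West",
}
bgp_r4 = {"194.100.0.0/23": "195.100.0.1"}

fib_r4 = build_forwarding_table(igp_r4, bgp_r4)
print(fib_r4["194.100.0.0/23"])
```

The BGP next hop 195.100.0.1 falls inside 195.100.0.0/30, which the IGP reaches via 195.100.0.10, so the BGP prefix inherits that outgoing entry.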
Using non-BGP routers
[Figure: AS10 (R1, 194.100.0.0/23), AS20 (R2, R5 and R4) and AS30 (R3, 12.0.0.0/8); eBGP sessions R1-R2 and R4-R3, iBGP session R2-R4; R5 is an internal backbone router between R2 and R4]
§ Problem: what happens when there are internal backbone routers between BGP routers inside an AS?
– The iBGP session between BGP routers is easily established when the IGP is running, since iBGP runs over a TCP connection
– How to populate the routing table of the backbone routers to ensure that they will be able to route any IP packet?
Using non-BGP routers (2)
[Figure: AS10 (R1, 194.100.0.0/23), AS20 (R2, R5 and R4) and AS30 (R3); eBGP sessions R1-R2 and R4-R3, iBGP session R2-R4; R5 is an internal backbone router between R2 and R4]
§ First solution: use tunnels between BGP routers to encapsulate interdomain packets
– GRE tunnel
  – Needs static configuration; be careful with MTU issues
– MPLS tunnel
  – Can be dynamically established in an MPLS-enabled backbone
MPLS in large ISP networks
§ Only one BGP table lookup inside the AS
§ Use a hierarchy of labels
– The top label is used to reach the egress router
– The second label is used to reach the eBGP peer
§ Ingress border router
– Maintains full BGP routing table
– Attaches two labels based on the routing table
§ Inside AS1, towards the egress border router, packets are label switched
[Figure: AS1 with border and backbone routers (R1, R2, R5, R7, B3, B4, B6) and external routers RA to RH]
Using non-BGP routers (3)
[Figure: AS10 (R1, 194.100.0.0/23), AS20 (R2, R5 and R4) and AS30 (R3, 12.0.0.0/8); eBGP sessions R1-R2 and R4-R3, iBGP session R2-R4; R5 is an internal backbone router between R2 and R4]
§ Second solution: use the IGP (OSPF/IS-IS, RIP) to redistribute interdomain routes to internal backbone routers
§ Drawbacks
– The size of the BGP tables may completely overload the IGP
– Make sure that BGP routes learned by R2 and injected inside the IGP will not be re-injected inside BGP by R4!
Using non-BGP routers (4)
[Figure: AS10 (R1, 194.100.0.0/23), AS20 (R2, R5 and R4) and AS30 (R3, 12.0.0.0/8); eBGP sessions R1-R2 and R4-R3, iBGP sessions between R2, R5 and R4]
§ Classical solution: run BGP on internal backbone routers
– Internal backbone routers need to participate in the iBGP full mesh
– Internal backbone routers receive BGP routes via iBGP but never advertise any routes
– Remember: a route learned over an iBGP session is never advertised over another iBGP session
The roles of IGP and BGP
[Figure: AS10 (R1, 194.100.0.0/23), AS20 (R2, R5 and R4) and AS30 (R3, 12.0.0.0/8); eBGP sessions R1-R2 and R4-R3, iBGP sessions between R2, R5 and R4]
§ Role of the IGP inside AS20
– Distribute internal topology and internal addresses (R2-R4-R5)
§ Role of BGP inside AS20
– Distribute the routes towards external destinations
– The IGP must run to allow BGP routers to establish iBGP sessions
The iBGP full mesh
§ Drawback: N*(N-1)/2 iBGP sessions for N routers
[Figure: routers R fully meshed with iBGP sessions]
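The quadratic growth is easy to see with a few values. The helper below is a hypothetical illustration of the N*(N-1)/2 formula and nothing more.

```python
def full_mesh_sessions(n_routers):
    """Number of iBGP sessions needed to fully mesh n_routers BGP routers."""
    if n_routers < 0:
        raise ValueError("number of routers must not be negative")
    return n_routers * (n_routers - 1) // 2


for n in (4, 10, 100, 1000):
    print(n, "routers ->", full_mesh_sessions(n), "iBGP sessions")
```

Going from 10 to 100 routers multiplies the number of sessions by 110 (45 to 4950), and each router must be reconfigured whenever a router is added.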
Outline
§ Internet routing
§ BGP basics
§ BGP in large networks
– The needs for iBGP
– Confederations and Route Reflectors
– The BGP decision process
– Scalable routing policies
How to scale iBGP in large domains?
§ Confederations
– Divide the large domain into smaller sub-domains
  – Use iBGP full mesh inside each sub-domain
  – Use eBGP between sub-domains
– Each router is configured with two AS numbers
  – Its confederation AS number
  – Its Member-AS AS number
– Usually, a single IGP covers the whole domain
[Figure: confederation AS20 split into Member-AS AS65001 and Member-AS AS65002; routers R are linked by iBGP sessions inside each Member-AS and by eBGP sessions between Member-ASes]
Confederations: example
[Figure: confederation AS20 made of Member-AS AS65020 (including R1, R2) and Member-AS AS65021 (including R5, R6), with iBGP sessions inside each Member-AS and an eBGP session between R1 and R6; RX in AS10 is an eBGP peer of R2 and RY in AS30 is an eBGP peer of R5]
§ UPDATE (via eBGP): Prefix: 1.0.0.0/8, ASPath: AS10
– On the eBGP session between R2 and RX, R2 belongs to AS20
– On the eBGP session between R5 and RY, R5 belongs to AS20
– On the eBGP session between R1 and R6, R1 belongs to AS65020 and R6 belongs to AS65021
Confederations: example (2)
[Figure: confederation AS20 made of Member-AS AS65020 (including R1, R2) and Member-AS AS65021 (including R5, R6); RX in AS10 is an eBGP peer of R2 and RY in AS30 is an eBGP peer of R5]
§ UPDATE (via iBGP): Prefix: 1.0.0.0/8, ASPath: AS10
§ UPDATE (via eBGP): Prefix: 1.0.0.0/8, ASPath: [AS65020]:AS10
– When propagating an UPDATE via eBGP to another router of the same confederation, R1 inserts its Member-AS number in the AS_PATH
Confederations: example (3)
[Figure: confederation AS20 made of Member-AS AS65020 (including R1, R2) and Member-AS AS65021 (including R5, R6); RX in AS10 is an eBGP peer of R2 and RY in AS30 is an eBGP peer of R5]
§ UPDATE (via iBGP): Prefix: 1.0.0.0/8, ASPath: [AS65020]:AS10
§ UPDATE (via eBGP): Prefix: 1.0.0.0/8, ASPath: AS20:AS10
– When propagating an UPDATE via eBGP to a router outside its confederation, R5 removes the internal path from the AS_PATH and inserts its Confederation AS number in the AS_PATH
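The AS_PATH handling of the three confederation slides can be sketched as follows. The representation is an assumption made for the example: the path is a pair (internal Member-AS sequence, external AS sequence), and the two function names are hypothetical.

```python
def to_confederation_peer(as_path, member_as):
    """eBGP between Member-ASes: prepend the Member-AS to the internal part."""
    internal, external = as_path
    return ([member_as] + internal, external)


def to_external_peer(as_path, confederation_as):
    """eBGP leaving the confederation: drop the internal part and
    prepend the confederation AS number."""
    _internal, external = as_path
    return ([], [confederation_as] + external)


received_by_r2 = ([], [10])                                  # ASPath: AS10
sent_by_r1 = to_confederation_peer(received_by_r2, 65020)    # [AS65020]:AS10
sent_by_r5 = to_external_peer(sent_by_r1, 20)                # AS20:AS10
print(sent_by_r1, sent_by_r5)
```

From the point of view of AS30, the confederation therefore looks like the single AS20: the Member-AS numbers never leave the confederation.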
Route reflectors: an alternative to confederations
§ Route reflectors
– A route reflector is a special router that is allowed to propagate the routes learned over iBGP sessions on other iBGP sessions
[Figure: left, normal iBGP full mesh between R1, R2 and R3; right, iBGP with one route reflector RR, to which R2 and R3 each keep an iBGP session]
Behavior of a Route Reflector
§ Two types of iBGP peers of a route reflector
– RR client peers R1, R2, ..., RN (do not participate in the iBGP full mesh)
– Non-client peers RX, RY, RZ (participate in the iBGP full mesh)
Behavior of a Route Reflector
§ Route received from an eBGP session or a client peer
– Select best path
– Advertise to all client peers and all non-client peers
§ Route received from a non-client peer
– Select best path
– Advertise to all client peers
[Figure: route reflector RR with client peers R1, R2, ..., RN and non-client peers RX, RY, RZ, all over iBGP]
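The two reflection rules can be expressed as a small function that returns the iBGP peers to which the best path is re-advertised. The function name and the `source_kind` labels are hypothetical; the sketch also excludes the peer the route came from, which already has it.

```python
def reflection_targets(source, source_kind, clients, non_clients):
    """iBGP peers that receive the best path learned from `source`."""
    if source_kind in ("ebgp", "client"):
        targets = set(clients) | set(non_clients)
    elif source_kind == "non-client":
        targets = set(clients)
    else:
        raise ValueError(f"unknown peer kind: {source_kind}")
    targets.discard(source)   # no need to send the route back to its sender
    return targets


# A reflector with two clients and one non-client peer.
print(reflection_targets("R2", "client", {"R2", "R3"}, {"RR6"}))
print(reflection_targets("RR6", "non-client", {"R2", "R3"}, {"RR6"}))
```

The first call returns R3 and RR6; the second returns only the clients R2 and R3, since routes learned from a non-client peer are not reflected to other non-client peers.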
Fault tolerance of route reflectors
§ How to avoid having the RR as a single point of failure?
– Solution: allow each client peer to be connected to 2 RRs
– Issue: configuration errors may cause redistribution loops
  – ORIGINATOR_ID is used to carry the router ID of the originator of the route
  – CLUSTER_LIST contains the list of RRs that sent the UPDATE message inside the current AS
[Figure: client peers R1, R2, ..., RN each connected over iBGP to both RR1 and RR2]
Route reflectors: an example
[Figure: AS20 contains R2, R3, R5 and the route reflectors RR1 and RR6; RX and RZ in AS10 are eBGP peers of R2 and R3; RY in AS30 is an eBGP peer of R5]
§ UPDATE (via eBGP) from RX: Prefix: 1.0.0.0/8, ASPath: AS10
§ UPDATE (via eBGP) from RZ: Prefix: 1.0.0.0/8, ASPath: AS10
– R2 and R3 are clients of Route Reflector RR1
– RR1 and RR6 are in iBGP full mesh
– R5 is client of Route Reflector RR6
Route reflectors: an example (2)
[Figure: AS20 contains R2, R3, R5 and the route reflectors RR1 and RR6; RX and RZ in AS10 are eBGP peers of R2 and R3; RY in AS30 is an eBGP peer of R5]
§ UPDATE (via iBGP): Prefix: 1.0.0.0/8, ASPath: AS10, Nexthop: RX
§ UPDATE (via iBGP): Prefix: 1.0.0.0/8, ASPath: AS10, Nexthop: RZ
– RR1 will select its best path towards 1.0.0.0/8 and will re-advertise it, adding the ORIGINATOR_ID and the CLUSTER_LIST
Route reflectors: an example (3)
§ RR1 prefers the path to 1.0.0.0/8 via RX-R2
  – RR1 advertises this path to its client peer (R3)
    – The path is not advertised to R2 since R2 already has it
  – RR1 advertises this path to its non-client peer (RR6)
§ UPDATE (via iBGP) sent by RR1 to R3 and to RR6
  – Prefix: 1.0.0.0/8, ASPath: AS10, Nexthop: RX, ORIGINATOR_ID: R2, CLUSTER_LIST: RR1
Route reflectors: an example (4)
§ RR6 advertises the path to 1.0.0.0/8 via RX-R2 to its client peer R5
  – UPDATE (via iBGP): Prefix: 1.0.0.0/8, ASPath: AS10, Nexthop: RX, ORIGINATOR_ID: R2, CLUSTER_LIST: RR1:RR6
§ R5 will remove ORIGINATOR_ID and CLUSTER_LIST before advertising the path to RY via eBGP
Hierarchy of route reflectors
§ In large domains, a hierarchy of route reflectors can be built
[Figure: iBGP sessions among routers R1 to R6, route reflectors RR1, RR2, RR4 and RR5, and route reflectors RRA, RRB and RRC]
Confederations versus Route reflectors
§ Confederations
  – Solve iBGP scaling
  – Redundancy with iBGP full mesh inside each Member AS
  – Possible to run one IGP per Member AS
  – Require manual router configuration
  – Can be used when merging domains
  – Can lead to some routing oscillations
§ Route reflectors
  – Solve iBGP scaling
  – Redundancy by using redundant RRs
  – Usually a single IGP for the whole AS
  – Require manual router configuration
  – Can lead to some routing oscillations
Outline
§ Internet Routing
§ BGP basics
§ BGP in large networks
  – The needs for iBGP
  – Confederations and Route Reflectors
  – The BGP decision process
  – Scalable routing policies
The BGP decision process
§ Route processing in a BGP router
  – BGP messages from Peer[1] ... Peer[N] go through an import filter and attribute manipulation
  – All acceptable routes are stored in the BGP RIB
  – The BGP decision process selects one best route to each destination
  – Best routes go through an export filter and attribute manipulation before being sent in BGP messages to Peer[1] ... Peer[N]
§ BGP Decision Process
  – Ignore routes with unreachable nexthop
  – Prefer routes with highest local-pref
  – Prefer routes with shortest ASPath
  – Prefer routes with smallest MED
  – Prefer routes learned via eBGP over routes learned via iBGP
  – Prefer routes with closest next-hop
  – Tie breaking rules
    – Prefer routes learned from router with lowest router id
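The ordered steps of the decision process can be sketched as a lexicographic comparison. This is a simplified illustration, not an implementation: the `Route` fields and the `best_route` function are hypothetical names, and the sketch compares MED across all routes, whereas real BGP only compares MED between routes received from the same neighboring AS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Attributes used by the decision process (illustrative field names)."""
    nexthop_reachable: bool
    local_pref: int
    as_path: tuple
    med: int
    learned_via_ebgp: bool
    igp_cost_to_nexthop: int
    router_id: int


def best_route(routes):
    """Return the best route, or None if no nexthop is reachable."""
    # Step 1: ignore routes with an unreachable nexthop.
    candidates = [r for r in routes if r.nexthop_reachable]
    if not candidates:
        return None
    # Remaining steps, applied in order as a lexicographic key.
    return min(
        candidates,
        key=lambda r: (
            -r.local_pref,           # highest local-pref
            len(r.as_path),          # shortest ASPath
            r.med,                   # smallest MED (simplified, see lead-in)
            not r.learned_via_ebgp,  # eBGP before iBGP
            r.igp_cost_to_nexthop,   # closest next-hop
            r.router_id,             # lowest router id
        ),
    )


via_ibgp = Route(True, 100, ("AS2",), 0, False, 1, 1)
via_ebgp = Route(True, 100, ("AS2",), 0, True, 50, 2)
print(best_route([via_ibgp, via_ebgp]) is via_ebgp)
```

In the usage example the two routes tie on local-pref, ASPath length and MED, so the eBGP-learned route wins even though its next-hop is farther away.
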
The shortest AS-Path step in the BGP decision process
§ Motivation
  – BGP does not contain a real "metric"
  – Use length of AS-Path as an indication of the quality of routes
    – Not always a good indicator
§ Consequence
  – Internet paths tend to be short, 3-5 AS hops
  – Many paths converge at Tier-1 ISPs and those ISPs carry lots of traffic
[Figure: routers R0 to R4 and RA, RB, RC]
The prefer eBGP over iBGP step in the BGP decision process
§ Motivation: hot potato routing
  – A router should try to get rid of packets sent to external domains as soon as possible
§ Example: AS1 learns 1.0.0.0/8 from AS2 over two eBGP sessions
  – UPDATE: Prefix: 1.0.0.0/8, ASPath: AS2, NextHop: R2
  – UPDATE: Prefix: 1.0.0.0/8, ASPath: AS2, NextHop: R3
  – R6's routing table
    – 1/8: AS2 via R2 (eBGP, best)
    – 1/8: AS2 via R3 (iBGP)
  – R7's routing table
    – 1/8: AS2 via R2 (iBGP)
    – 1/8: AS2 via R3 (eBGP, best)
[Figure: AS1 with R6, R7 and R8; AS2 with R0, R2 and R3; link costs C=50, C=1, C=1 and C=98; flow of IP packets towards 1.0.0.0/8]
The closest nexthop step in the BGP decision process
§ Motivation: hot potato routing
  – A router should try to get rid of packets sent to external domains as soon as possible
§ Example: a content provider (R9) sends to 1.0.0.0/8 through AS1
  – UPDATE: Prefix: 1.0.0.0/8, ASPath: AS2, NextHop: R2
  – UPDATE: Prefix: 1.0.0.0/8, ASPath: AS2, NextHop: R3
  – R8's routing table
    – 1/8: AS2 via R2 (NH=R7, best)
    – 1/8: AS2 via R3 (NH=R6)
[Figure: AS1 with R6, R7 and R8; AS2 with R0, R2 and R3; link costs C=50, C=1 and C=98; flow of IP packets]
The lowest MED step in the BGP decision process
§ Motivation: cold potato routing
  – In a multi-connected AS, indicate which entry border router is closest to the advertised prefix
    – Usually MED = IGP cost
§ Example: a content provider (R9) sends to 1.0.0.0/8 through AS1
  – UPDATE: Prefix: 1.0.0.0/8, ASPath: AS2, NextHop: R2, MED: 1
  – UPDATE: Prefix: 1.0.0.0/8, ASPath: AS2, NextHop: R3, MED: 98
  – R8's routing table
    – 1/8: AS2 via R2 (MED=1, best)
    – 1/8: AS2 via R3 (MED=98)
[Figure: AS1 with R6, R7 and R8; AS2 with R0, R2 and R3; link costs C=1, C=50, C=1 and C=98; flow of IP packets]
The lowest router id step in the BGP decision process
§ Motivation
  – A router must be able to determine one best route towards each destination prefix
    – A router may receive several routes with comparable attributes towards one destination
  – Example: two UPDATE messages for 1.0.0.0/8 (advertised by R0 in AS1)
    – Prefix: 1.0.0.0/8, ASPath: AS2:AS1
    – Prefix: 1.0.0.0/8, ASPath: AS3:AS1
§ Consequence
  – A router with a low IP address will be preferred
[Figure: R0 in AS1 (1.0.0.0/8), AS2, AS3, and routers R1, R2 and R3]
More on the MED step in the BGP decision process
§ Unfortunately, the processing of the MED is more complex than described earlier
§ Correct processing of the MED
  – MED values can only be compared between routes received from the SAME neighboring AS
    – Routes which do not have the MED attribute are considered to have the lowest possible MED value
  – Selection of the routes containing MED values
      for m = all routes still under consideration
        for n = all routes still under consideration
          if (neighborAS(m) == neighborAS(n)) and (MED(n) < MED(m))
            remove route m from consideration
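The selection rule above can be written as a short runnable sketch. It assumes routes are dictionaries with a `neighbor_as` key and an optional `med` key (both names are illustrative); a missing MED is treated as 0, the lowest possible value.

```python
def med_step(routes):
    """Keep only the routes not beaten on MED by a route from the same neighbor AS."""
    def med(route):
        # A route without the MED attribute gets the lowest possible value.
        return route.get("med", 0)

    return [
        m for m in routes
        if not any(
            n["neighbor_as"] == m["neighbor_as"] and med(n) < med(m)
            for n in routes
        )
    ]


routes = [
    {"neighbor_as": "AS3", "med": 21},
    {"neighbor_as": "AS2", "med": 0},
    {"neighbor_as": "AS3", "med": 20},
    {"neighbor_as": "AS2", "med": 9},
]
for survivor in med_step(routes):
    print(survivor)
```

With these four routes, one route per neighboring AS survives (the AS2 route with MED 0 and the AS3 route with MED 20); the later steps of the decision process then choose between them.
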
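Why such a complex MED step?
§ Each neighboring AS sets the MED according to its own criteria, so MED values from different ASes (here AS2 and AS3) are not on a common scale
§ Routes towards R0 (in AS0) received by AS1
  – R0: AS3:AS0, MED=21
  – R0: AS3:AS0, MED=20
  – R0: AS2:AS0, MED=0
  – R0: AS2:AS0, MED=9
[Figure: content provider R9 sending through AS1 (R6, R6b, R7b, R8; link costs C=50, C=1, C=1) towards R0 in AS0 via AS3 (R3, R4) and AS2 (R2, R5; link cost C=9); flow of IP packets]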
Route oscillations with MED
§ Consider a single prefix advertised by R0 in AS0
  – R1, R2 and R3 always prefer their direct eBGP path
  – Due to the utilization of route reflectors, RR1 and RR3 only know a subset of the three possible paths
  – This limited knowledge is the cause of the oscillations
§ Routes received via eBGP
  – R1 (from RX): R0: ASX:AS0, MED=0
  – R2 (from RZ): R0: ASZ:AS0, MED=1
  – R3 (from RZ): R0: ASZ:AS0, MED=0
[Figure: route reflectors RR1 and RR3, routers R1, R2 and R3, external routers RX and RZ; eBGP sessions, iBGP sessions and physical links with costs C=1, C=2, C=1 and C=4]
Route oscillations with MED (2)
§ RR3's best path selection
  – If RR3 only knows the R3-RZ path, this path is preferred and advertised to RR1
  – If RR3 knows the R1-RX and R3-RZ paths, R1-RX is best (IGP cost) and RR3 doesn't advertise a path to RR1
  – If RR3 knows the R2-RZ and R3-RZ paths, RR3 prefers the R3-RZ path (MED) and R3-RZ is advertised to RR1
[Figure: same topology as on the previous slide]
Route oscillations with MED (3)
§ RR1's best path selection
  – If RR1 knows the R1-RX, R2-RZ and R3-RZ paths, R1-RX is preferred and RR1 advertises this path to RR3
    – But if RR1 advertises R1-RX, RR3 does not advertise any path!
  – If RR1 knows the R1-RX and R2-RZ paths, RR1 prefers the R2-RZ path and advertises this path to RR3
    – But if RR1 advertises R2-RZ, RR3 prefers and advertises R3-RZ!
[Figure: same topology as on the previous slides]
Other problems with Route Reflectors
§ Consider one prefix advertised by RX, RY, RZ
  – Ra, Rb, and Rc will all prefer their direct eBGP path
  – RR1, RR2 and RR3 will never reach an agreement
[Figure: route reflectors RR1, RR2 and RR3, routers Ra, Rb and Rc with eBGP sessions to RX, RY and RZ; iBGP sessions and physical links with costs C=5, C=5 and C=1]
Forwarding problems with Route Reflectors
§ Consider a prefix advertised by RX and RY
  – BGP routing will converge
    – RR1 (and R1) prefer path via RX, RR2 (and R2) prefer path via RY
  – But forwarding of IP packets will cause a loop!
    – R1 sends packets towards prefix via R2 (to reach RX, its best path)
    – R2 sends packets towards prefix via R1 (to reach RY, its best path)
[Figure: route reflectors RR1 and RR2, routers R1 and R2, external routers RX and RY; eBGP sessions, iBGP sessions and physical links with costs C=1, C=1 and C=5]
© O. Bonaventure, 2008
Outline
§ Internet Routing
§ BGP basics
§ BGP in large networks
  – The needs for iBGP
  – AS Confederations and Route Reflectors
  – The BGP decision process
  – Scalable routing policies
The Community attribute
§ Principle
  – Optional transitive attribute containing a set of communities
    – One community is represented as a 32-bit value
  – Each community acts as a marker
    – Usually routes with the same marker are treated in the same manner
§ Standardized communities
  – NO_EXPORT (0xFFFFFF01)
  – NO_ADVERTISE (0xFFFFFF02)
§ Delegated communities
  – 65536 communities have been delegated to each AS
    – ASX:0 through ASX:65535
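As an illustration of the 32-bit representation, the sketch below packs an `ASX:value` pair into one integer, with the AS number in the upper 16 bits and the value in the lower 16 bits. The helper names `encode_community` and `decode_community` are hypothetical.

```python
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02


def encode_community(asn, value):
    """Pack ASN:value into a 32-bit community (16 bits each)."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("ASN and value must each fit in 16 bits")
    return (asn << 16) | value


def decode_community(community):
    """Split a 32-bit community into its (ASN, value) pair."""
    return community >> 16, community & 0xFFFF


c = encode_community(286, 1000)
print(hex(c), decode_community(c))
print(decode_community(NO_EXPORT))
```
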
Scalable routing policies with communities
§ Principle
  – Attach the same community value to all routes that need to receive the same treatment
  – Routes are marked according to where they were learned: from a peer, a provider or a customer
[Figure: providers Prov1 and Prov2, peers Peer1 to Peer4, customers Cust1 and Cust2, connected to routers R by customer-provider ($) and shared-cost links]
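A minimal sketch of this principle, under stated assumptions: the AS number 65000 and the community values 100, 200 and 300 are made up for illustration, and the export rule is the common customer, peer and provider policy (everything to customers, only customer routes to peers and providers), which this slide does not spell out. Routes originated locally are left out of the sketch.

```python
LOCAL_AS = 65000  # hypothetical AS number

# Hypothetical community values, one per type of neighbor.
TAGS = {
    "customer": (LOCAL_AS, 100),
    "peer": (LOCAL_AS, 200),
    "provider": (LOCAL_AS, 300),
}


def tag_on_import(route, neighbor_kind):
    """Return a copy of the route marked with where it was learned."""
    tagged = dict(route)
    tagged["communities"] = set(route.get("communities", ())) | {TAGS[neighbor_kind]}
    return tagged


def may_export(route, neighbor_kind):
    """Export filter that only looks at the community, not at the prefix."""
    if neighbor_kind == "customer":
        return True  # customers receive all routes
    # Peers and providers only receive routes learned from customers.
    return TAGS["customer"] in route["communities"]


from_peer = tag_on_import({"prefix": "1.0.0.0/8"}, "peer")
from_cust = tag_on_import({"prefix": "2.0.0.0/8"}, "customer")
print(may_export(from_peer, "provider"), may_export(from_cust, "provider"))
```

The export filter stays the same size however many neighbors are added, which is what makes the approach scalable.
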
More complex routing policies with communities
§ Other utilizations of communities
  – Research ISP providing two types of services
    – Access to research networks for universities
    – Access to the commercial Internet for universities and government institutions
    – Solution
      – Tag routes learned from research network and commercial Internet
      – Only announce the universities to research network
      – Only advertise research network to universities
  – Commercial ISP providing several transit services
    – Full transit service
      – Announce all known routes to all customers
      – Advertise customer routes to all peers, customers, providers
    – Client routes only
      – Only advertise to those customers the routes learned from customers, but not the routes learned from peers
      – Advertise the routes learned from those customers only to customers
Other utilizations of communities
§ Communities used for tagging
  – Community attached by the router that receives a route to indicate the country where the route was received
    – Example (Eunet, AS286)
      – 286:1000 + countrycode for public peer routes
      – 286:2000 + countrycode for private peer routes
      – 286:3000 + countrycode for customer routes
    – Another example (C&W, AS3561)
      – 3561:SRCC
      – S: Peer or Customer
      – R: Regional code
      – CC: ISO 3166 country code
  – Community to indicate the IX where the route was learned
    – Example: AS13129 (Global Access Telecommunications)
      – 13129:2110: route learned at DE-CIX
      – 13129:2120: route learned at INXS
      – 13129:2130: route learned at SFINX