Title: The Basics of BGP (Border Gateway Protocol) Routing and its Performance in Today
1The Basics of BGP (Border Gateway Protocol)
Routing and its Performance in Todays Internet
- Presenter Sophia Poku
- Slides taken from presentation by
Nina Taft
2Outline
1. Highlights 2. Addressing and CIDR 3. BGP
Messages and Prefix Attributes 4. BGP Decision
and Filtering Processes 5. I-BGP 6. Route
Reflectors 7. Multihoming 8. Aggregation 9.
Routing Instability 10. BGP Table Growth
3Routing Protocols
A
E-BGP
AS2
AS (Autonomous System) - a collection of routers
under the same technical and administrative
domain. EGP (External Gateway Protocol) - used
between two ASs to allow them to exchange
routing information so that traffic can be
forwarded across AS borders. Example BGP
4Routers used
- Internal Router directly connects networks
belonging to the same area - It runs a single copy of the basic routing
protocol - Border/Boundary Router exchanges routing
information with routers belonging to other AS
5Purpose to share connectivity information
AS2
BGP
AS1
A
border router
internal router
6BGP Sessions
- Primary function is to exchange
network-reachability information (includes AS s) - Uses TCP to establish connection
- Initially node advertises ALL routes it wants
neighbor to know (could be gt50K routes) - Ongoing only inform neighbor of changes
- One router can participate in many BGP sessions.
AS1
AS3
AS2
7Configuration and Policy
- A BGP node has a notion of which routes to share
with its neighbor. It may only advertise a
portion of its routing table to a neighbor. - A BGP node does not have to accept every route
that it learns from its neighbor. It can
selectively accept and reject messages. - What to share with neighbors and what to accept
from neighbors is determined by the routing
policy, that is specified in a routers
configuration file.
8Addressing Schemes
- Original addressing schemes (class-based)
- 32 bits divided into 2 parts
- Class A
- 0xxx or 1-126 in decimal subnet mask255.0.0.0
- Class B
- 10xx or 128-192 in decimal
- Subnet mask255.255.0.0
- Class C
- 110x or 192-223 in decimal, Subnet
Mask255.255.255.0
2 million nets 256 hosts
9CIDR (Classless Inter-Domain Routing)
- CIDR introduced to solve 2 problems
- exhaustion of IP address space
- size and growth rate of routing table
10Problem 1 Lifetime of Address Space
- Example an organization needs 500 addresses. A
single class C address not enough (256 hosts).
Instead a class B address is allocated. (64K
hosts) Thats overkill -a huge waste. - CIDR allows networks to be assigned on arbitrary
bit boundaries. - permits arbitrary sized masks 178.24.14.0/23 is
valid - requires explicit masks to be passed in routing
protocols - CIDR solution for example above organization is
allocated a single /23 address (equivalent of 2
class Cs).
11Problem 2 Routing Table Size
Without CIDR
With CIDR
232.71.0.0 232.71.1.0 232.71.2.0 .. 232.71.255.0
232.71.0.0/16
12CIDR Classless Inter-Domain Routing
- Address format ltIP address/prefix Pgt. The prefix
denotes the upper P bits of the IP address. - E.g. in CIDR address 206.13.01.48/25, the /25
indicates the first 25 bits are used to identify
a unique network, the remaining bits are hosts - Idea - use aggregation - provide routing for a
large number of customers by advertising one
common prefix. - This is possible because nature of addressing is
hierarchical - Summarizing routing information reduces the size
of routing tables, but allows to maintain
connectivity. - Aggregation is critical to the scalability and
survivability of the Internet
13Address Arithmetic Address Blocks
- The ltaddress/prefixgt pair defines an address
block - Examples
- 128.15.0.0/16 gt 128.15.0.0 - 128.15.255.255
- 188.24.0.0/13 gt 188.24.0.0 - 188.31.255.255
consider 2nd octet in binary - Address block sizes
- a /13 address block has 232-13 addresses(524288)
(/16 has 232-16 65536) - a /13 address block is 8 times as big as a /16
address blockbecause 232-13 232-16 23
14CIDR longest prefix match
- Because prefixes of arbitrary length allowed,
overlapping prefixes can exist. - Example router hears 124.39.0.0/16 from one
neighborand 124.39.11.0/24 from another neighbor - Router forwards packet according to most specific
forwarding information, called longest prefix
match - Packet with destination 124.39.11.32 will be
forwarded using /24 entry. - Packet w/destination 124.39.22.45 will be
forwarded using /16 entry
15Will CIDR work ?
- For CIDR to be successful need
- address registries must assign addresses using
CIDR strategy - providers and subscribers should configure their
networks, and allocate addresses to allow for a
maximum amount of aggregation - BGP must be configured to do aggregation as much
as possible - Factors that complicate achieving aggregation
- multihoming, proxy aggregation, changing providers
16Four Basic Messages
- Open Establishes BGP session (uses TCP port
179) - Notification Report unusual conditions
- UpdateInform neighbor of new routes that become
activeInform neighbor of old routes that become
inactive - Keepalive Inform neighbor that connection is
still viable
17BGP Database
- 1.Neighbor table
- List of BGP neighbors
- 2. BGP forwarding table
- List of all networks learned from each
neighbor - 3. IP routing table
- List of best path to destination networks
18OPEN Message
- During session establishment, two BGP speakers
exchange their - AS numbers
- BGP identifiers (usually one of the routers IP
addresses) - Router ID
- Holdtime
- Open messages are confirmed using a keep-alive
message sent by a peer and must be confirmed
before updates - A BGP speaker has option to refuse a session
- Select the value of the hold timer
- maximum time to wait to hear something from
other end before assuming session is down. - authentication information (optional)
19NOTIFICATION and KEEPALIVE Messages
- NOTIFICATION
- Indicates an error
- terminates the TCP session
- gives receiver an indication of why BGP session
terminated - Examples header errors, hold timer expiry, bad
peer AS, bad BGP identifier, malformed attribute
list, missing required attribute, AS routing
loop, etc. - KEEPALIVE
- protocol requires some data to be sent
periodically. If no UPDATE to send within the
specified time period, then send KEEPALIVE
message to assure partner that connection still
alive
20UPDATE Message
- Updates are sent using TCP to ensure delivery
- used to either advertise and/or withdraw
unfeasible prefixes from routing table - path attributes list of attributes that pertain
to ALL the prefixes in the Reachability Info field
FORMAT
Withdrawn routes length (2 octets)
Withdrawn routes (variable length)
Total path attributes length (2 octets)
Path Attributes (variable length)
Reachability Information (variable length)
21Advertising a prefix
- When a router advertises a prefix to one of its
BGP neighbors - information is valid until first router
explicitly advertises that the information is no
longer valid - BGP does not require routing information to be
refreshed - if node A advertises a path for a prefix to node
B, then node B can be sure node A is using that
pathitself to reach the destination.
22BGP Attributes
- Attributes routes learned via BGP have
associated properties that are used to determine
the best route to a destination when multiple
paths exist to a particular destination - Local Preference
- Multi-Exit Discriminator (MED)
- Origin
- AS-path
- Next-hop
23Attribute ORIGIN
- ORIGIN
- Who originated the announcement? Where was a
prefix injected into BGP? indicates how BGP
learned about a particular route - IGP route is interior to the originating AS.
This value is Value set using network router
configuration command to inject router into BGP - EGP route learned via the External Gateway
Protocol - Incomplete (often used for static routes) origin
of routes unknown or learned in some other way
24Attributes AS_PATH
- a list of ASs through which the announcement for
a prefix has passed - each AS prepends its AS to the AS-PATH
attribute when forwarding an announcement - useful to detect and prevent loops
25Attribute NEXT HOP
- IP address used to reach the advertising router
- For EBGP session, NEXT HOP IP address of
neighbor that announced the route. - For IBGP sessions, if route originated inside AS,
NEXT HOP IP address of neighbor that announced
the route - For routes originated outside AS, NEXT HOP of
EBGP node that learned of route, is carried
unaltered into IBGP.
BGP Table at Router C
IP Routing Table at Router C
26Next-Hop Contd
- Router C advertises 172.16.1.0 with next hop
10.1.1.1 - A propagates it within its AS
27Attribute Multi-Exit Discriminator (MED)
- when ASs interconnected via 2 or more links
- AS announcing prefix sets MED
- enables AS2 to indicate its preference
- AS receiving prefix uses MED to select link
- a way to specify how close a prefix is to the
link it is announced on
28Attribute Local Preference
- Used to prefer an exit point from the local AS
- Used to indicate preference among multiple paths
for the same prefix anywhere in the internet. - The higher the value the more preferred
- Exchanged between IBGP peers only. Local to the
AS. - Often used to select a specific exit point for a
particular destination
29Routing Process Overview
Choose best route
accept, deny, set preferences
forward, not forward set MEDs
Output policy engine
BGP table
IP routing table
30Input Policy Engine
- Inbound filtering controls outbound traffic
- filters route updates received from other peers
- filtering based on IP prefixes, AS_PATH,
community - denying a prefix means BGP does not want to reach
that prefix via the peer that sent the
announcement - accepting a prefix means traffic towards that
prefix may be forwarded to the peer that sent the
announcement - Attribute Manipulation
- sets attributes on accepted routes
- example specify LOCAL_PREF to set priorities
among multiple peers that can reach a given
destination
31BGP Decision Process
- 1. Choose route with highest LOCAL-PREF
- 2. If have more than 1 route, select route with
shortest AS-PATH - 3. If have more than 1 route, select according
to lowest ORIGIN type where IGP lt EGP lt
INCOMPLETE - 4. If have more than 1 route, select route with
lowest MED value - 5. Select min cost path to NEXT HOP using IGP
metrics - 6. If have multiple internal paths, use BGP
Router ID to break tie.
32Output Policy Engine
- Outbound Filtering controls inbound traffic
- forwarding a route means others may choose to
reach the prefix through you - not forwarding a route means others must use
another router to reach the prefix - may depend upon whether E-BGP or I-BGP peer
- example if ORIGINEGP and you are a non-transit
AS and BGP peer is non-customer, then dont
forward - Attribute Manipulation
- sets attributes such as AS_PATH and MEDs
33 Transit vs. Nontransit AS
Transit traffic traffic whose source and
destination are outside the AS
Nontransit AS does not carry transit traffic
Transit AS does carry transit traffic
- Advertise own routes only
- Do not propagate routes learned from other ASs
- case 1
- Advertises its own routes PLUS routes
- learned from other ASs
34Internal BGP (I-BGP)
- Used to distribute routes learned via EBGP to all
the routers within an AS - I-BGP and E-BGP are same protocol in that
- same message types used
- same attributes used
- same state machine
- BUT use different rules for readvertising
prefixes - Rule 1 prefixes learned from an E-BGP neighbor
can be readvertised to an I-BGP neighbor, and
vice versa - Rule 2 prefixes learned from an I-BGP neighbor
cannot be readvertised to another I-BGP neighbor
35I-BGP Preventing Loops and Setting Attributes
- Why rule 2? To prevent announcements from
looping. - In E-BGP can detect via AS-PATH.
- AS-PATH not changed in I-BGP
- Implication of rule a full mesh of I-BGP
sessions between each pair of routers in an AS is
required - Setting Attributes The router that injects the
route into the I-BGP mesh is responsible for - setting the LOCAL-PREF attribute
- prepending AS to AS-PATH
36Route Reflectors
- Problem requiring a full mesh of I-BGP sessions
between all pairs of routers is hard to manage
for large ASs. - Solution
- group routers into clusters.
- Assign a leader to each cluster, called a route
reflector (RR). - Members of a cluster are called clients of the RR
- I-BGP Peering
- clients peer only with their RR
- RRs must be fully meshed
RR
RR
clusters
clients
37Route Reflectors Rule on Announcements
- Provides mechanisms for minimizing the of
updates messages transmitted within an AS and
reducing the amount of data propagated in each
message. - If received from RR, reflect to clients
- If received from a client, reflect to RRs and
clients - If received from E-BGP, reflect to all - RRs and
clients - RRs reflect only the best route to a given
prefix, not all announcements they receive. - helps size of routing table
- sometimes clients dont need to carry full table
38Avoiding Loops with Route Reflectors
- Loops cannot be detected by traditional approach
using AS-PATH because AS-PATH not modified within
an AS. - Announcements could leave a cluster and re-enter
it. - Two new attributes introduced
- ORIGINATOR_ID router id of routes originator in
ASrule announcement discarded if returns to
originator - CLUSTER_LIST a sequence of cluster ids. set by
RRs.rule if an RR receives an update and the
cluster list contains its cluster id, then update
is discarded.
39Multihoming
40Single-homed vs. Multi-homed subscribers
- A single-homed network has one connection to the
Internet (i.e., to networks outside its domain) - A multi-homed network has multiple connections to
the Internet. Two scenarios - can be multi-homed to a single provider
- can be multi-homed to multiple providers
- Why multi-home?
- Reliability
- Performance
41Single-homed AS
- Subscriber called a stub AS
- Provider-Subscriber communication for route
advertisement - static configuration. most common.
- Providers router is configured with subscribers
prefix. - good if customers routes can be represented by
small set of aggregate routes - bad if customer has many noncontiguous subnets
- can use BGP between provider and stub AS
Provider
R1
R2
Subscriber
42Multihoming to Multiple Providers
43Multihoming Issues
- Load sharing
- how distribute the traffic over the multiple
links? - Reliability
- if load sharing leads to preferencing certain
links for certain subnets, is reliability
reduced? - Address/Aggregation
- which subnet addresses should the multihomed
customer use? - how will this affect its providers ability to
aggregate routes?
44Load sharing from ISP to Customer using
attributes
- Goal provider splits traffic across 2 links
according to prefix - Implement this strategy using attributes
- customer sets MEDs
- provider sets LOCAL_PREF
R2
R3
Customer
45Load sharing from Customer to ISP using policy
blue announcements red traffic
- Goal send traffic to ISPs customers on one
link send traffic to the rest of the Internet on
2nd link - Implement using policy to control announcements
ISP
R1
R2
advertise customer routes only
advertise default route 0/0
R3
traffic
Customer
46Address/Aggregation Issue
- Where should the customer get its address block
from? - 1. From ISP1
- 2. From ISP2
- 3. From both ISP1 and ISP2
- 4. Independently from address registry
(cases 1 and 2 are equivalent)
47Case 1 2 Get address block from one ISP
- example customer gets address from ISP 1
- ISP 1s aggregation is not broken
- customers prefix not aggregatable at ISP 2
- longer prefix becomes a traffic magnet
- How good is load sharing?If all ISPs generate
same amount of traffic for customer, then
ISP2-customer link twice as loaded as
ISP1-customer link
48Case 3 Get address block from both ISPs
- announcement policy announce prefix only to its
parent - advantage both ISPs can aggregate the prefix
they receive - disadvantage lose reliability
- load balancing good? depends upon how much
traffic sent to each prefix
140.20/16
200.50/16
140.20.1/24
200.50.1/24
140.20.1/24
200.50.1/24
49Case 4 obtain address block from registry
- no aggregation possible
- no traffic magnets created
- good reliability
- how to achieve load sharing?
- customer breaks address block into 2 /25 blocks,
and announce one per link (but may lose
reliability) - OR use method of AS-PATH manipulation
150.55.10/24
150.55.10/24
50AS-PATH manipulation
- Idea prepend your AS number in AS-PATH multiple
times to discourage use of a link - makes AS-PATH seem longer than it is
- recall BGP decision process uses shortest AS-PATH
length as a criteria for selecting best path - Example ISP 3 will choose path through AS 2
because its AS-PATH appears shorter
150.55.10/24 - 33
51Aggregation
52Address Arithmetic When is aggregation
possible? Case 1
- Process of combining characteristics of several
different routes in such a way that a single
route is advertised. - Possible when one prefix contained in another.
- Example Two ASs having customer-provider
relationship. Provider does the aggregation. - Provider has address block 18.0.0.0/8
- Its customers have address blocks 18.6.0.0/15 and
18.9.0.0/15 - Provider announces its address block only
- Rule Prefix p1 contains prefix p2 iff
length(p2) gt length(p1) AND address(p2) /
232-length(p1) address(p1) / 232-length(p1)
53Address Arithmetic When is aggregation
possible? Case 2
- Some pairs of consecutive prefixes
- Example routes within the same AS
- AS has 2 address blocks 1.2.2.0/24
0000001.00000010.00000010.00000000/241.2.3.0/24
0000001.00000010.00000011.00000000/24can
announce instead 1.2.2.0/23 - Rule consecutive prefixes p1 and p2 are
aggregatable iff - length(p1)length(p2) ANDaddress(p1) /
232-length(p1) 1 address(p2) / 232-length(p2)
AND address(p1) 233-length(p1) 0
54Aggregation and Multihoming
- Most common scenario customer breaks its address
block in 2 for load sharing purposes.YET, also
advertises whole block for reliability.
AS 2
1.2.3.0/24
1.2.2.0/24
1.2.2.0/23
1.2.2.0/23
customer
1.2.2.0/23
55Black holes and cardinal sins
- The cardinal sin of BGP routing is advertising
routes that you don't know how to get to called
"black-holing" someone - If you announce part of IP space owned by someone
else, using a more-specific prefix, all their
traffic flows to you. Makes that address block
disconnected from the Internet. - Example black holes can happen inadvertently by
non-careful aggregation
wrong !! 100.24.0.0/18
100.24/16
222.2/16
ISP 1 100.24/16
ISP 2 222.2/16
56Limitations to Aggregation
- Lack of hierarchical allocation of address space
prior to CIDR (before 1995) - A single AS can have noncontiguous address blocks
- Customer AS and provider AS can have
non-contiguous address blocks - Reluctance of customers to renumber their address
space when they switch providers - Multi-homing
- multi-homed prefixes require global visibility
- may choose not to load sharing
57Rules of Thumb for Aggregation
- To avoid black holes an ISP is not allowed to
aggregate routes unless it is a supernet of the
address block it wants to aggregate - In other words, an ISP has to specifically
announce each IP address range of its downstream
customers that are not part of its own address
space.
58Routing Instability
59Route Stability
- Routing instability rapid fluctuation of network
reachability information - route flapping when a route is withdrawn and
re-announced repeatedly in a short period of
time - happens via UPDATE messages
- because messages propagate to global Internet,
route flapping behavior can cascade and
deteriorate routing performance in many places - Effects increased packet loss, increased network
latency, CPU overhead, loss of connectivity - Route Stability Studies by C. Labovitz, R. Malan
F. Jahanian
60Types of Routing Updates
- Forwarding instability
- reflects legitimate topology changes
- e.g., changes in Prefix, NEXT_HOP and/or ASPATH
- affects forwarding paths used
- Policy fluctuation
- reflects changes in policy
- e.g., changes in MED, LOCAL_PREF, etc.
- may not necessarily affect forwarding paths used
- Pathological
- redundant messages
- reflect neither topology nor policy changes
61Whos Responsible?
- ASs
- No single AS dominates instability statistics
- No correlation between the size of an AS and its
share of updates generated. - Prefixes
- Instability is evenly distributed across routes.
- Example of measurements
- 75 of AADiff events come from prefixes change
less than 10 times a day. - 80-90 of instability comes from prefixes that
are announced less than 50 times/day.
62Sources of Instabilities in General
- router configuration errors
- transient physical and data link problems
- software bugs
- problems with leased lines (electrical timing
issues that cause false alarms of disconnect) - router failures
- network upgrades and maintenance
63Instability Problem and Cause. Example 1.
- Problem 3-5 million duplicate withdrawals
- Cause stateless BGP implementation
- time-space tradeoff no state maintained on what
advertised to peers - when receive any change, transmit withdrawal to
all peers regardless of whether previously
notified or not - sent updates for both explicit and implicit
withdrawals - By 1998, most vendors had BGP implementations
with partial state. - Result number of WWDups reduced by an order of
magnitude
64Instability Problem and Cause. Example 2
- Problem duplicate announcements
- Cause min-advertisement timer stateless BGP
- min-adv timer wait 30 seconds. Combine all
received updates in last 30 seconds into single
outbound update message (if possible). - within 30 seconds route can be withdrawn and
re-announced so that there is no net change to
original announcement - Solution Have BGP keep some state about recently
sent messages to peers. Avoid sending duplicate
messages
65Instability Problem and Cause. Example 3
Net X
AS 1
AS3
6
5
4
10
3
2
AS2
border router
internal router
Example interaction IGP/BGP policy set MED
using IGP metrics, such as shortest
path
E-BGP
IGP
66Controlling route instability Route Dampening
- track number of times a route has flapped over a
period of time - routes that exhibit a high level of instability
in a short period of time should be suppressed
(not advertised) - penalize ill behaved routes proportionally to
their expected future stability - if a suppressed route stops flapping for a long
enough period of time, unsuppress it (readvertise)
67Route Dampening
suppress limit
penalty
reuse limit
time
68Route Dampening Algorithm
- Each time a route flaps, increase the penalty
- If the route has not flapped in the last
half-life time units, then cut penalty in half. - If the penalty gt suppress limit, then suppress
the route - If the penalty lt reuse limit, then free a
suppressed route
69Side Effect of Route Dampening
- A legitimate update may arrive and it will be
ignored because that route has been suppressed
and not yet released. - The modification needed for the legitimate
announcement is delayed
70Aggregation can help route flapping
- If a more-specific route is flapping, but
provider only announces aggregated prefix, then
other networks dont see route flap. - Hence aggregation can mask route flapping.
- Aggregation helps combat instability because it
reduces the number of networks visible in the
core Internet.
not flapping
140.40/16
140.40.4/24
flapping