Nick Feamster - PowerPoint PPT Presentation


1
Interdomain Routing: Correctness and Stability
  • Nick Feamster

3
Is correctness really that important?
  • The Internet is increasingly becoming part of the
    mission-critical infrastructure (a public
    utility!).

Big problem: very poor understanding of
how to manage it.
4
Why does routing go wrong?
  • Complex policies
    • Competing / cooperating networks
    • Each with only limited visibility
  • Large scale
    • Tens of thousands of networks
    • ...each with hundreds of routers
    • ...each routing to hundreds of thousands of IP
      prefixes

5
What can go wrong?
Some things are out of the hands of networking
research.
But two-thirds of the problems are caused by
configuration of the routing protocol.
6
Review: Simple operation
[Figure: route advertisements between Autonomous Systems (ASes), with MIT as an example network]
7
...but complex configuration!
Flexibility for realizing goals in a complex
business landscape:
  • Which neighboring networks can send traffic
  • Where traffic enters and leaves the network
  • How routers within the network learn routes to
    external destinations

[Figure: traffic, route vs. no route; flexibility vs. complexity]
8
Configuration Semantics
Ranking: route selection
[Figure: ranking example with Customer, Competitor, Primary, and Backup links]
9
What types of problems does configuration cause?
  • Persistent oscillation (today's reading)
  • Forwarding loops
  • Partitions
  • Blackholes
  • Route instability

10
These problems are real
"...a glitch at a small ISP triggered a major
outage in Internet access across the country.
The problem started when MAI Network
Services...passed bad router information from one
of its customers onto Sprint." -- news.com,
April 25, 1997
[Image: Florida Internet Barn]
"Microsoft's websites were offline for up to 23
hours...because of a router misconfiguration...it
took nearly a day to determine what was wrong and
undo the changes." -- wired.com, January 25,
2001
"WorldCom Inc...suffered a widespread outage on its
Internet backbone that affected roughly 20
percent of its U.S. customer base. The network
problems...affected millions of computer users
worldwide. A spokeswoman attributed the outage to
'a route table issue.'" -- cnn.com,
October 3, 2002
"A number of Covad customers went out from 5pm
today due to, supposedly, a DDOS (distributed
denial of service attack) on a key Level3 data
center, which later was described as a route leak
(misconfiguration)." -- dslreports.com,
February 23, 2004
12
Several Big Problems a Week
13
Why is routing hard to get right?
  • Defining correctness is hard
  • Interactions cause unintended consequences
    • Each network independently configured
    • Unintended policy interactions
  • Operators make mistakes
    • Configuration is difficult
    • Complex policies, distributed configuration

14
Correctness Specification
Safety: The protocol converges to a stable path
assignment for every possible initial state and
message ordering
15
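Safety: No Persistent Oscillation
[Figure: destination AS 0 and ASes 1, 2, 3, each listing its permitted paths in order of preference: AS 1 prefers 1 3 0 over 1 0, AS 2 prefers 2 1 0 over 2 0, AS 3 prefers 3 2 0 over 3 0]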
Varadhan, Govindan, Estrin, Persistent Route
Oscillations in Interdomain Routing, 1996
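The oscillation in this figure can be reproduced with a short simulation. This is a minimal sketch, not the model from the cited paper: it assumes the ASes are activated in a fixed round-robin order and that an activated AS sees its neighbors' current routes; the names BAD_GADGET, best_path, and simulate are hypothetical. One activation order that cycles is enough to show the configuration is unsafe; a stable run under one order does not prove safety, because safety must hold for every ordering.

```python
from typing import Dict, List, Optional, Tuple

Path = Tuple[int, ...]

# Hypothetical encoding of the figure: permitted paths per AS, most preferred
# first. AS 0 is the destination.
BAD_GADGET: Dict[int, List[Path]] = {
    1: [(1, 3, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
    3: [(3, 2, 0), (3, 0)],
}


def best_path(node: int, ranking: Dict[int, List[Path]],
              state: Dict[int, Optional[Path]]) -> Optional[Path]:
    """Most preferred permitted path that the next hop currently supports."""
    for path in ranking[node]:
        next_hop = path[1]
        # A path is available only if the next hop is using exactly its suffix.
        if state.get(next_hop) == path[1:]:
            return path
    return None


def simulate(ranking: Dict[int, List[Path]], max_rounds: int = 20) -> str:
    """Activate ASes round-robin; report 'stable', 'oscillates', or 'undecided'."""
    state: Dict[int, Optional[Path]] = {0: (0,)}
    state.update({node: None for node in ranking})
    seen = set()
    for _ in range(max_rounds):
        before = dict(state)
        for node in sorted(ranking):
            state[node] = best_path(node, ranking, state)
        if state == before:
            return "stable"
        snapshot = tuple(sorted(state.items()))
        if snapshot in seen:
            # A previously seen global state recurs: the system cycles.
            return "oscillates"
        seen.add(snapshot)
    return "undecided"
```

Under this activation order, simulate(BAD_GADGET) returns "oscillates"; if AS 1 instead ranks 1 0 above 1 3 0, the same run settles and returns "stable".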
16
Strawman: Global Policy Check
  • Require each AS to publish its policies
  • Detect and resolve conflicts

Problems:
  • ASes typically unwilling to reveal policies
  • Checking for convergence is NP-complete
  • Failures may still cause oscillations

17
Think Globally, Act Locally
  • Key features of a good solution
    • Safety: guaranteed convergence
    • Expressiveness: allow diverse policies for each
      AS
    • Autonomy: do not require revelation/coordination
    • Backwards-compatibility: no changes to BGP
  • Local restrictions on configuration semantics
    • Ranking
    • Filtering

18
Main Idea of Today's Paper
  • Permit only two business arrangements
    • Customer-provider
    • Peering
  • Constrain both filtering and ranking based on
    these arrangements to guarantee safety
  • Surprising result: these arrangements correspond
    to today's (common) behavior

Gao and Rexford, Stable Internet Routing without
Global Coordination, IEEE/ACM ToN, 2001
19
Relationship 1: Customer-Provider
  • Filtering
    • Routes from customer to everyone
    • Routes from provider only to customers

[Figure: a customer and its providers; advertisements and traffic from the customer to other destinations, and from other destinations to the customer]
20
Relationship 2: Peering
  • Filtering
    • Routes from peer only to customers
    • No routes from other peers or providers

[Figure: advertisements between two peers]
21
Rankings
  • Routes from customers over routes from peers
  • Routes from peers over routes from providers
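The filtering rules from the two relationship slides and the ranking rules here fit in a few lines. This is a minimal sketch of the guideline, not code from the paper: the names Rel, may_export, and best_route are hypothetical, and breaking ties between equally ranked routes by shorter AS path is an added assumption that the guideline itself does not require.

```python
from enum import IntEnum
from typing import List, Tuple


class Rel(IntEnum):
    """Relationship to the neighbor a route is learned from or sent to.
    Larger values are preferred under the ranking guideline."""
    PROVIDER = 0
    PEER = 1
    CUSTOMER = 2


# Hypothetical route representation: (learned-from relationship, AS path).
Route = Tuple[Rel, Tuple[int, ...]]


def may_export(learned_from: Rel, export_to: Rel) -> bool:
    """Filtering: customer routes go to everyone; peer and provider
    routes go only to customers."""
    return learned_from == Rel.CUSTOMER or export_to == Rel.CUSTOMER


def best_route(routes: List[Route]) -> Route:
    """Ranking: customer over peer over provider. Ties are broken by
    shorter AS path (an assumption for this sketch)."""
    return min(routes, key=lambda r: (-r[0], len(r[1])))
```

For example, a route learned from a provider may be sent to a customer but not to a peer, and a three-hop customer route still beats a one-hop provider route because relationship is compared before path length.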

22
Additional Assumption: Hierarchy
Disallowed!
23
Safety Proof Sketch
  • System state: the current route at each AS
  • Activation sequence: revisit some routers'
    selection based on those of neighboring ASes

24
Activation Sequence Intuition
  • Activation emulates a message ordering
  • Activated router has received and processed all
    messages corresponding to the system state
  • Fair activation: all routers receive and
    process outstanding messages

25
Safety Proof Sketch
  • State: the current route at each AS
  • Activation sequence: revisit some routers'
    selection based on those of neighboring ASes
  • Goal: find an activation sequence that leads to a
    stable state
  • Safety: satisfied if that activation sequence is
    contained within any fair activation sequence

26
Proof, Step 1: Customer Routes
  • Activate ASes from customer to provider
  • AS picks a customer route if one exists
  • Decision of one AS cannot cause an earlier AS to
    change its mind

An AS picks a customer route when one exists
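Step 1 needs an activation order that visits every customer before its providers. Such an order is a topological sort of the customer-provider graph, and it exists exactly when that graph has no cycles, which is the hierarchy assumption above. A minimal sketch using Python's standard graphlib (3.9+); the function name step1_activation_order and the providers_of input format are hypothetical, and peering links are ignored because Step 1 concerns only customer routes.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set


def step1_activation_order(providers_of: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Order that activates every customer before its providers, or None
    if customer-provider links form a cycle (hierarchy violated).

    providers_of: hypothetical input mapping each AS to its set of providers.
    """
    # TopologicalSorter emits a node only after all of its predecessors,
    # so make each provider's predecessors be its customers.
    customers_of: Dict[str, Set[str]] = {asn: set() for asn in providers_of}
    for customer, providers in providers_of.items():
        for provider in providers:
            customers_of.setdefault(provider, set()).add(customer)
    try:
        return list(TopologicalSorter(customers_of).static_order())
    except CycleError:
        return None
```

Reversing the returned order gives a provider-to-customer sequence of the kind Step 2 uses.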
27
Proof, Step 2: Peer and Provider Routes
  • Activate remaining ASes from provider to customer
  • Decision of one Step-2 AS cannot cause an earlier
    Step-2 AS to change its mind
  • Decision of Step-2 AS cannot affect a Step-1 AS

AS picks a peer or provider route when no
customer route is available
28
Ranking and Filtering Interactions
  • Allowing more flexibility in ranking
    • Allow same preference for peer and customer
      routes
    • Never choose a peer route over a shorter customer
      route
  • ...at the expense of stricter AS graph assumptions
    • Hierarchical provider-customer relationship (as
      before)
    • No private peering with (direct or indirect)
      providers

[Figure: peering]
29
Some problems
  • Requires acyclic hierarchy (global condition)
  • Cannot express many business relationships

[Figure: example relationships among Sprint, Abovenet, Verio, PSINet, and a customer]
Question: Can we relax the constraints on
filtering? What happens to rankings?
30
Other Possible Local Rankings
  • Accept only next-hop rankings
    • Captures most routing policies
    • Generalizes customer/peer/provider
    • Problem: system not safe
  • Accept only shortest hop count rankings
    • Guarantees safety under filtering
    • Problem: not expressive

Feamster, Johari, Balakrishnan, Implications
of Autonomy for the Expressiveness of Policy
Routing, SIGCOMM 2005
31
What Rankings Violate Safety?
Theorem. Permitting paths of length n+2 over
paths of length n will violate safety under
filtering.
Theorem. Permitting paths of length n+1 over
paths of length n will result in a dispute wheel.
32
What about properties of resulting paths, after
the protocol has converged?
We need additional correctness properties.
33
Correctness Specification
Safety: The protocol converges to a stable path
assignment for every possible initial state and
message ordering
Path Visibility: Every destination with a usable
path has a route advertisement
If there exists a path, then there exists a route
Example violation: network partition
Route Validity: Every route advertisement
corresponds to a usable path
If there exists a route, then there exists a path
Example violation: routing loop
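A route validity violation shows up when packets are actually forwarded along the advertised routes. A minimal sketch, assuming each router's forwarding decision for one destination is summarized as a single next hop; the function name trace and the next_hop table are hypothetical. Following next hops from a starting router either reaches the destination, revisits a router (a routing loop), or hits a router with no route (a blackhole).

```python
from typing import Dict, Optional


def trace(next_hop: Dict[str, Optional[str]], start: str, dest: str) -> str:
    """Follow next hops toward dest; return 'delivered', 'loop', or 'blackhole'."""
    visited = set()
    node = start
    while node != dest:
        if node in visited:
            # Revisiting a router means the advertised routes form a cycle.
            return "loop"
        visited.add(node)
        nxt = next_hop.get(node)
        if nxt is None:
            # This router has no route: traffic is dropped here.
            return "blackhole"
        node = nxt
    return "delivered"
```

For example, trace({"a": "b", "b": "a"}, "a", "d") returns "loop", even though both a and b hold a route for d.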
34
Path Visibility: Internal BGP (iBGP)
Default: full-mesh iBGP. Doesn't scale.
Large ASes use route reflection.
Route reflector: non-client routes over client
sessions; client routes over all sessions.
Client: don't re-advertise iBGP routes.
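The re-advertisement rules above can be written as a function from "where the route came from" to "which iBGP sessions it goes out on". A minimal sketch; the names reflector_targets and client_targets are hypothetical, and treating a route learned over eBGP as going to every iBGP neighbor is standard BGP behavior added here for completeness rather than something stated on the slide.

```python
from typing import Optional, Set


def reflector_targets(source: Optional[str], clients: Set[str],
                      non_clients: Set[str]) -> Set[str]:
    """iBGP neighbors to which a route reflector re-advertises a route.

    source: the iBGP neighbor the route was learned from, or None if the
    route was learned over eBGP.
    """
    everyone = clients | non_clients
    if source is None:
        # eBGP-learned routes go to every iBGP neighbor.
        return set(everyone)
    if source in clients:
        # Client routes go over all sessions (except back to the sender).
        return everyone - {source}
    if source in non_clients:
        # Non-client routes go only over client sessions.
        return set(clients)
    raise ValueError(f"unknown iBGP neighbor: {source}")


def client_targets(source: Optional[str], ibgp_neighbors: Set[str]) -> Set[str]:
    """A client does not re-advertise iBGP routes; only eBGP-learned routes go out."""
    return set(ibgp_neighbors) if source is None else set()
```

The asymmetry is the point: a route that a reflector learns from a non-client never reaches other non-clients through that reflector, so non-clients must be connected to each other directly.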
35
iBGP Signaling: Static Check
Theorem. Suppose the iBGP reflector-client
relationship graph contains no cycles. Then, path
visibility is satisfied if, and only if, the set
of routers that are not route reflector clients
forms a clique. Condition is easy to check with
static analysis.
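The clique condition in the theorem is a direct static check over the iBGP session list. A minimal sketch, assuming sessions are given as unordered router pairs and that, as the theorem requires, the reflector-client graph has already been verified to be acyclic (this code does not check that); the names top_level and path_visibility_ok are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set


def top_level(routers: Set[str], clients_of: Dict[str, Set[str]]) -> Set[str]:
    """Routers that are not a client of any route reflector."""
    all_clients = set().union(*clients_of.values())
    return set(routers) - all_clients


def path_visibility_ok(routers: Set[str], clients_of: Dict[str, Set[str]],
                       sessions: Set[FrozenSet[str]]) -> bool:
    """True if the non-client routers form a clique of iBGP sessions.
    Assumes the reflector-client graph is acyclic (not checked here)."""
    top = top_level(routers, clients_of)
    return all(frozenset(pair) in sessions
               for pair in combinations(sorted(top), 2))
```

With reflector r1 serving client c1 and sessions r1-r2, r1-r3, r1-c1, the check fails because r2 and r3 are both top-level routers without a session between them; adding r2-r3 makes it pass.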
36
How do we guarantee these additional properties
in practice?
37
Today: Reactive Operation
What happens if I tweak this policy?
[Figure: configure, observe, and ask "Desired effect?"; if no, revert; if yes, wait for the next problem]
  • Problems cause downtime
  • Problems often not immediately apparent

38
Goal: Proactive Operation
  • Idea: Analyze configuration before deployment

Many faults can be detected with static analysis.
Feamster and Balakrishnan, Detecting BGP
Configuration Faults with Static Analysis, NSDI
2005
39
rcc Overview
[Figure: rcc takes distributed router configurations (single AS) and a correctness specification; the configurations become a normalized representation, the specification becomes constraints, and rcc reports faults]
Challenges
  • Analyzing complex, distributed configuration
  • Defining a correctness specification
  • Mapping specification to constraints

40
rcc Implementation
[Figure: distributed router configurations (Cisco, Avici, Juniper, Procket, etc.) pass through a preprocessor and parser into a relational database (MySQL); a verifier checks constraints against the database and reports faults]
41
Summary: Faults across 17 ASes
Every AS had faults, regardless of network size.
Most faults can be attributed to distributed
configuration.
[Figure: faults by category: route validity and path visibility]
42
rcc: Take-home lessons
  • Static configuration analysis uncovers many
    errors
  • Major causes of error
    • Distributed configuration
    • Intra-AS dissemination is too complex
    • Mechanistic expression of policy

http://nms.csail.mit.edu/rcc/ (about 100 downloads,
70 network operators)
43
Two Philosophies
  • This lecture: Accept the Internet as is. Devise
    band-aids.
  • Another direction: Redesign Internet routing to
    guarantee safety, route validity, and path
    visibility

44
Preventing Errors in the First Place
Before: conventional iBGP
[Figure: eBGP and iBGP sessions]
After: RCP gets best iBGP routes (and IGP
topology)
Feamster et al., The Case for Separating Routing
from Routers, SIGCOMM FDNA, 2004
Caesar et al., Design and Implementation of a
Routing Control Platform, NSDI, 2005
46
Configuration Syntax (Example)
router bgp 7018
 neighbor 192.0.2.10 remote-as 65000
 neighbor 192.0.2.10 route-map IMPORT in
 neighbor 192.0.2.20 remote-as 7018
 neighbor 192.0.2.20 route-reflector-client
!
route-map IMPORT permit 1
 match ip address 199
 set local-preference 80
!
route-map IMPORT permit 2
 match as-path 99
 set local-preference 110
!
route-map IMPORT permit 3
 set community 7018:1000
!
ip as-path access-list 99 permit 65000
access-list 199 permit ip host 192.0.2.0 host 255.255.255.0
access-list 199 permit ip host 10.0.0.0 host 255.0.0.0
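The route-map above is evaluated clause by clause, and the first clause whose match conditions hold decides the outcome. A minimal model of that behavior, assuming routes are simple records and a default local preference of 100; the Route class, ACL_199, AS_PATH_99, and import_policy are hypothetical names for this sketch, not part of any router API. Access list 199 uses the extended-ACL idiom in which "host <network> host <mask>" matches one exact prefix, and the AS-path list is treated as a regular expression over the space-separated AS path.

```python
import re
from dataclasses import dataclass
from ipaddress import IPv4Network, ip_network
from typing import List, Optional

# Exact prefixes matched by "access-list 199" (network + mask pairs).
ACL_199 = {ip_network("192.0.2.0/24"), ip_network("10.0.0.0/8")}
# "ip as-path access-list 99 permit 65000": an unanchored regular
# expression, so it matches any AS path containing the string 65000.
AS_PATH_99 = re.compile("65000")


@dataclass
class Route:
    prefix: IPv4Network
    as_path: List[int]
    local_pref: int = 100            # assumed default local preference
    community: Optional[str] = None


def import_policy(route: Route) -> Route:
    """Model of route-map IMPORT: the first matching clause wins."""
    if route.prefix in ACL_199:                                  # permit 1
        route.local_pref = 80
    elif AS_PATH_99.search(" ".join(map(str, route.as_path))):  # permit 2
        route.local_pref = 110
    else:                                     # permit 3: no match clause
        route.community = "7018:1000"
    return route
```

Because evaluation stops at the first match, a route for 192.0.2.0/24 learned from AS 65000 gets local preference 80, not 110, even though it also satisfies clause 2. Ordering effects like this are one reason the same policy can be expressed in many different ways.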
47
Why is Routing Hard to Get Right?
  • Defining correctness is hard
  • Operators make mistakes
    • Configuration is difficult
    • Complex policies, distributed configuration
  • Interactions cause unintended consequences
    • Each network independently configured
    • Unintended policy interactions

48
Which faults does rcc detect?
[Figure: faults found by rcc, relative to latent faults, potentially active faults, and end-to-end failures]
49
Normalizing Router Configuration
Challenge
  • Hundreds of routers distributed across an AS
  • Thousands to tens of thousands of lines per
    router
  • Many ways to express identical policy

Solution
  • Express configuration with centralized tables
  • Check constraints by issuing queries on tables

[Figure: example Sessions table]
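Checking a constraint by querying normalized tables can be illustrated with an in-memory SQLite database. This is a hypothetical example, not rcc's actual schema or query: it assumes a sessions table with one row per iBGP neighbor statement, with neighbor addresses already resolved to router names during normalization, and it flags sessions that are configured on one end but not the other.

```python
import sqlite3
from typing import List, Tuple


def one_sided_sessions(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (router, neighbor) sessions that the neighbor does not configure back."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical normalized table: one row per iBGP neighbor statement.
        conn.execute("CREATE TABLE sessions (router TEXT, neighbor TEXT)")
        conn.executemany("INSERT INTO sessions VALUES (?, ?)", rows)
        query = """
            SELECT s.router, s.neighbor
            FROM sessions AS s
            WHERE NOT EXISTS (
                SELECT 1 FROM sessions AS t
                WHERE t.router = s.neighbor AND t.neighbor = s.router)
            ORDER BY s.router, s.neighbor
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

The query never needs to know which vendor syntax each router used; once the configurations are in one table, the constraint is a single self-join.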
50
Route Validity: Consistent Export
Possible Causes:
  • Malice/deception
  • iBGP signaling partition
  • Inconsistent export policy

neighbor 10.4.5.6 route-map PEER out
route-map PEER permit 10
 set as-path prepend 123 123

neighbor 10.1.2.3 route-map PEER out
route-map PEER permit 10
 set as-path prepend 123
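The two configurations above export the same routes with different amounts of prepending, so the peer sees different AS-path lengths at different peering points. A minimal sketch of detecting that from observed advertisements; the Advert tuple format and the function name inconsistent_prefixes are hypothetical, and the check compares only AS-path length, matching this prepending example, whereas a fuller consistent-export check may compare other attributes as well.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record: (prefix, peering point, AS path as advertised).
Advert = Tuple[str, str, Tuple[int, ...]]


def inconsistent_prefixes(adverts: Iterable[Advert]) -> Dict[str, List[int]]:
    """Map each prefix whose AS-path length differs across peering points
    to the sorted list of lengths observed."""
    lengths: Dict[str, Set[int]] = defaultdict(set)
    for prefix, _point, as_path in adverts:
        lengths[prefix].add(len(as_path))
    return {p: sorted(ls) for p, ls in lengths.items() if len(ls) > 1}
```

For instance, a prefix advertised as 123 123 123 at one peering point and 123 123 at another is reported with lengths [2, 3], while a prefix advertised identically everywhere is not reported.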
51
Inconsistent Export Observed at AT&T
15% of destinations inconsistent for >4 days
[Figure: percentage of destinations with inconsistent routes vs. percentage of time]
Feamster et al., BorderGuard: Detecting Cold
Potatoes from Peers. ACM IMC, October 2004.
52
Example Bogon Routes
Feamster et al., An Empirical Study of Bogon
Route Advertisements. ACM CCR, January 2005.
53
rcc Interface
54
Parsing Configuration
55
List of Faults