COMS/CSEE 4140 Networking Laboratory Lecture 06

1
COMS/CSEE 4140 Networking Laboratory Lecture 06
  • Salman Abdul Baset
  • Spring 2008

2
Announcements
  • Lab 4 (5-7) due next week before your lab slot
  • Prelab 5 due next week.
  • There will be Lab 5 next week.
  • Midterm (March 10th, duration 1.5 hours)
  • Assignment 2 issues
  • aslookup compilation?
  • ISP name: nslookup or whois for IP address
  • Lab 4 (count-to-infinity issues)

3
Agenda
  • Autonomous Systems (AS)
  • Policy vs. distance based routing
  • Border gateway protocol (BGP)
  • Transmission control protocol (TCP)

4
Autonomous Systems Terminology
  • Local traffic: traffic with source or
    destination in the AS
  • Transit traffic: traffic that passes through
    the AS
  • Stub AS: has a connection to only one AS and
    carries only local traffic
  • Multihomed AS: has connections to >1 AS, but
    does not carry transit traffic
  • Transit AS: has connections to >1 AS and
    carries transit traffic
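
The three AS types above can be written as a small decision rule. This is only a sketch: the function name and its two parameters are made up for illustration and do not come from the lecture.

```python
def classify_as(num_neighbor_ases: int, carries_transit: bool) -> str:
    """Classify an AS using the terminology above (hypothetical helper)."""
    if num_neighbor_ases < 1:
        raise ValueError("an AS needs a connection to at least one other AS")
    if carries_transit:
        # Transit traffic passes through the AS, so it needs >1 neighbor
        if num_neighbor_ases < 2:
            raise ValueError("transit requires connections to more than one AS")
        return "transit"
    # No transit traffic: one neighbor means stub, several mean multihomed
    return "stub" if num_neighbor_ases == 1 else "multihomed"


print(classify_as(1, False))
print(classify_as(2, False))
print(classify_as(3, True))
```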

5
Stub and Transit Networks
  • AS 1, AS 2, and AS 5 are stub networks
  • AS 2 is a multi-homed stub network
  • AS 3 and AS 4 are transit networks

6
Selective Transit
  • Example
  • Transit AS 3 carries traffic between AS 1 and AS
    4 and between AS 2 and AS 4
  • But AS 3 does not carry traffic between AS 1 and
    AS 2
  • The example shows a routing policy.

7
Customer/Provider
  • A stub network typically obtains access to the
    Internet through a transit network.
  • Transit network that is a provider may be a
    customer for another network
  • Customer pays provider for service

8
Customer/Provider and Peers
  • Transit networks can have a peer relationship
  • Peers provide transit between their respective
    customers
  • Peers do not provide transit between peers
  • Peers normally do not pay each other for service
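
The peering rules above amount to an export filter. The sketch below assumes the common convention that routes learned from a provider are treated like routes learned from a peer; that part is an assumption and is not stated on the slide. All names are illustrative.

```python
CUSTOMER, PEER, PROVIDER = "customer", "peer", "provider"


def should_export(learned_from: str, send_to: str) -> bool:
    """Decide whether a route learned from one neighbor is announced to another.

    Both arguments describe what the neighbor is to us.
    """
    # Routes to our own customers are announced to everyone
    if learned_from == CUSTOMER:
        return True
    # Routes learned from a peer are passed on to customers only:
    # peers do not provide transit between peers.
    # Assumption: provider-learned routes follow the same rule.
    return send_to == CUSTOMER


print(should_export(CUSTOMER, PEER))   # customer route announced to a peer
print(should_export(PEER, PEER))       # no transit between peers
```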

9
Shortcuts through peering
  • Note that peering reduces upstream traffic
  • Delays can be reduced through peering
  • But Peering may not generate revenue

10
ASNs already assigned
Source: http://www.potaroo.net/tools/asn32/
Private ASNs: 64512-65535
11
ASNs in use
12
ASN projections
13
Autonomous Routing Domains Don't Always Need BGP
or an ASN
ARDs versus ASes
Qwest: nail up routes for 130.132.0.0/16 pointing to Yale
Yale University (130.132.0.0/16): nail up default routes 0.0.0.0/0 pointing to Qwest
Static routing is the most common way of
connecting an autonomous routing domain to the
Internet. This helps explain why BGP is a
mystery to many.
14
ASNs Can Be Shared (RFC 2270)
AS 701: UUNet
AS 7046: Crestar Bank
AS 7046: NJIT
AS 7046: Hood College
128.235.0.0/16
ASN 7046 is assigned to UUNet. It is used
by customers that are single-homed to UUNet but need
BGP for some reason (load balancing, etc.); see RFC
2270.
15
ARDs and ASes Summary
  • Most ARDs have no ASN (statically routed at
    Internet edge)
  • Some unrelated ARDs share the same ASN (RFC
    2270)
  • Some ARDs are implemented with multiple ASNs
    (example Worldcom)

ASes are just an implementation detail of
Inter-domain routing
16
Agenda
  • Autonomous Systems (AS)
  • Policy vs. distance based routing
  • Border gateway protocol (BGP)
  • Transmission control protocol (TCP)

17
Why not minimize AS hop Count?
Shortest path routing is not compatible with
commercial relations
18
Customer versus Provider
provider
customer
Customer pays provider for access to the Internet
19
The Peering Relationship
20
Peering Provides Shortcuts
21
Peering Wars
Peer:
  • Reduces upstream transit costs
  • Can increase end-to-end performance
  • May be the only way to connect your customers to
    some part of the Internet (Tier 1)
Don't Peer:
  • You would rather have customers
  • Peers are usually your competition
  • Peering relationships may require periodic
    renegotiation

Peering struggles are by far the most
contentious issues in the ISP world! Peering
agreements are often confidential.
22
Agenda
  • Autonomous Systems (AS)
  • Policy vs. distance based routing
  • Border gateway protocol (BGP)
  • Transmission control protocol (TCP)

23
The Gang of Four
24
BGP Overview
  • BGP = Border Gateway Protocol v4, RFC 1771
    (about 60 pages)
  • Note: In the context of BGP, a gateway is nothing
    else but an IP router that connects autonomous
    systems.
  • Interdomain routing protocol for routing between
    autonomous systems.
  • Uses TCP to establish a BGP session and to send
    routing messages over the BGP session.
  • Updates are incremental: only new or changed
    routes are sent.
  • BGP is a path vector protocol. Routing messages
    in BGP contain complete routes.
  • Network administrators can specify routing
    policies.

25
BGP Policy-based Routing
  • Each node is assigned an AS number (ASN)
  • BGP's goal is to find any AS-path (not an optimal
    one). Since the internals of the AS are never
    revealed, finding an optimal path is not
    feasible.
  • The network administrator sets BGP's policies to
    determine the best path to reach a destination
    network.

26
The Border Gateway Protocol (BGP)
BGP: RFC 1771

Optional extensions: RFC 1997 (communities), RFC
2439 (damping), RFC 2796 (reflection), RFC 3065
(confederation)

Routing policy configuration languages
(vendor-specific)

Current best practices in management of
interdomain routing
BGP was not DESIGNED. It EVOLVED.
27
BGP Route Processing
  • Receive BGP updates
  • Apply import policies: filter routes, tweak
    attributes
  • Best route selection, based on attribute values
  • Best route table: install forwarding entries for
    best routes in the IP forwarding table
  • Apply export policies: filter routes, tweak
    attributes
  • Transmit BGP updates
  • Policies are open-ended programming, constrained
    only by the vendor configuration language
28
BGP Attributes
Value  Code                    Reference
-----  ----------------------  ---------
1      ORIGIN                  RFC1771
2      AS_PATH                 RFC1771
3      NEXT_HOP                RFC1771
4      MULTI_EXIT_DISC         RFC1771
5      LOCAL_PREF              RFC1771
6      ATOMIC_AGGREGATE        RFC1771
7      AGGREGATOR              RFC1771
8      COMMUNITY               RFC1997
9      ORIGINATOR_ID           RFC2796
10     CLUSTER_LIST            RFC2796
11     DPA                     Chen
12     ADVERTISER              RFC1863
13     RCID_PATH / CLUSTER_ID  RFC1863
14     MP_REACH_NLRI           RFC2283
15     MP_UNREACH_NLRI         RFC2283
16     EXTENDED COMMUNITIES    Rosen
...
255    reserved for development
Not all attributes need to be present in every
announcement.
From IANA: http://www.iana.org/assignments/bgp-parameters
29
LOCAL_PREF Attribute
Forces outbound traffic to take primary link,
unless link is down.
30
NEXT_HOP Attribute
  • EGP: IP address used to reach the advertising
    router
  • IGP: next-hop address is carried into local AS

31
AS_PATH Attribute
  • Used to detect routing loops and find shortest
    paths
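
Loop detection with AS_PATH can be sketched as follows: a router rejects any route whose path already contains its own ASN, and prepends its ASN before announcing a route further. The function names are illustrative, not part of any BGP implementation.

```python
from typing import List


def accept_route(local_asn: int, as_path: List[int]) -> bool:
    """Reject a route that has already passed through our own AS (a loop)."""
    return local_asn not in as_path


def announce(local_asn: int, as_path: List[int]) -> List[int]:
    """Prepend our ASN before passing the route on to a neighbor AS."""
    return [local_asn] + as_path


path = announce(3, [2])          # AS 3 forwards a route originated by AS 2
print(path)
print(accept_route(1, path))     # AS 1 is not on the path yet
print(accept_route(2, path))     # AS 2 would see its own ASN: loop
```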

32
Shedding Inbound Traffic with ASPATH Prepending
Prepending will (usually) force inbound traffic
from AS 1 to take the primary link
AS 1 (provider)
  primary link: 192.0.2.0/24 ASPATH = 2
  backup link:  192.0.2.0/24 ASPATH = 2 2 2
AS 2 (customer): 192.0.2.0/24
Yes, this is a Glorious Hack
33
But Padding Does Not Always Work
AS 1 (provider), primary link: 192.0.2.0/24 ASPATH = 2
AS 3 (provider), backup link: 192.0.2.0/24 ASPATH = 2 2 2 2 2 2 2 2 2 2 2 2 2
AS 2 (customer): 192.0.2.0/24
AS 3 will send traffic on the backup link because
it prefers customer routes, and local preference
is considered before ASPATH length! Padding in
this way is often used as a form of load balancing.
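
Why padding fails here can be seen in a two-step comparison: LOCAL_PREF first, AS_PATH length only as a tie-breaker. The sketch below covers just these two steps of route selection, not the full BGP decision process. The local preference values 100 (customer) and 90 (peer) are borrowed from the COMMUNITY slide, and treating AS 1 as a peer of AS 3 is an assumption.

```python
from typing import List, NamedTuple


class Route(NamedTuple):
    via: str
    local_pref: int
    as_path: List[int]


def best_route(routes: List[Route]) -> Route:
    # Higher LOCAL_PREF wins; AS_PATH length is only consulted on a tie
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


# AS 3's two routes to 192.0.2.0/24
backup = Route("customer AS 2 (backup link)", 100, [2] * 13)   # heavily padded
via_as1 = Route("AS 1 (primary link)", 90, [1, 2])             # assumed peer route

print(best_route([backup, via_as1]).via)
```

The padded customer route still wins, because its path length is never compared against a route with lower local preference.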
34
COMMUNITY Attribute to the Rescue!
AS 3: normal customer local pref is 100, peer
local pref is 90
AS 1 (provider), primary link: 192.0.2.0/24 ASPATH = 2
AS 3 (provider), backup link: 192.0.2.0/24 ASPATH = 2 COMMUNITY = 3:70
Customer import policy at AS 3:
  • If 3:90 in COMMUNITY then set local preference to 90
  • If 3:80 in COMMUNITY then set local preference to 80
  • If 3:70 in COMMUNITY then set local preference to 70
AS 2 (customer): 192.0.2.0/24
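
The customer import policy at AS 3 can be sketched as a lookup table. The community values are written in the usual AS:value form (3:70, 3:80, 3:90), which is an assumption about the slide's notation. The default of 100 is the normal customer local preference from the slide.

```python
# Hypothetical encoding of AS 3's customer import policy
COMMUNITY_TO_LOCAL_PREF = {"3:90": 90, "3:80": 80, "3:70": 70}
DEFAULT_CUSTOMER_LOCAL_PREF = 100


def import_local_pref(communities) -> int:
    """Return the local preference AS 3 assigns to a customer route."""
    for tag, pref in COMMUNITY_TO_LOCAL_PREF.items():
        if tag in communities:
            return pref
    return DEFAULT_CUSTOMER_LOCAL_PREF


print(import_local_pref({"3:70"}))   # backup announcement tagged by AS 2
print(import_local_pref(set()))      # untagged customer route
```

With local preference 70, the backup route now loses to a peer route (local preference 90), which is the effect AS 2 wanted.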
35
BGP Issues - What is a BGP Wedgie?
  • BGP policies make sense locally
  • Interaction of local policies allows multiple
    stable routings
  • Some routings are consistent with intended
    policies, and some are not
  • If an unintended routing is installed (BGP is
    wedged), then manual intervention is needed to
    change to an intended routing
  • When an unintended routing is installed, no
    single group of network operators has enough
    knowledge to debug the problem

Full wedgie
36
YouTube blocking
  • Pakistan blocks YouTube
  • How? (according to BBC)
  • Advertise a shorter route to reach YouTube
  • The incorrect short route gets propagated
  • Seen by two thirds of the Internet
  • Traffic to YouTube goes through Pakistan
  • Since Pakistan blocked YouTube, all traffic
    reaches a dead end!

37
Dynamic Routing Protocols Summary
  • Dynamic routing protocols: RIP, OSPF, BGP
  • RIP uses a distance vector algorithm and converges
    slowly (the count-to-infinity problem)
  • OSPF uses a link state algorithm and converges
    fast, but it is more complicated than RIP.
  • Both RIP and OSPF find the lowest-cost path.
  • BGP uses a path vector algorithm; its path
    selection algorithm is complicated and is
    influenced by policies.
  • BGP has its own problems: see BGP Wedgies by Tim Griffin

38
More Readings (Optional)
  • BGP Wedgies: Bad Routing Policy Interactions that
    Cannot be Debugged
  • JI's Intro to interdomain routing.
  • "Interdomain Setting of PlanetLab Nodes."
    PlanetLab Meeting, May 14, 2004.
  • Understanding the Border Gateway Protocol (BGP)
  • ICNP 2002 Tutorial Session

39
Agenda
  • Autonomous Systems (AS)
  • Policy vs. distance based routing
  • Border gateway protocol (BGP)
  • Transmission control protocol (TCP)

40
Transmission Control Protocol (RFC)
  • Reliable and in-order byte-stream service
  • TCP format
  • Connection establishment
  • Flow control
  • Reaction to congestion
  • Packet corruption

41
TCP Format
  • TCP segments have a 20-byte header with >= 0
    bytes of data.

42
TCP header fields
  • Sequence Number (SeqNo)
  • Sequence number is 32 bits long.
  • So the range of SeqNo is
  • 0 <= SeqNo <= 2^32 - 1 (about 4.3 Gbyte)
  • Each sequence number identifies a byte in the
    byte stream
  • Initial Sequence Number (ISN) of a connection is
    set during connection establishment
  • Q: What are possible requirements for ISN?

43
TCP header fields
  • Acknowledgement Number (AckNo)
  • Acknowledgements are piggybacked, i.e.,
  • a segment from A -> B can contain an
    acknowledgement for data sent in the B -> A
    direction
  • Q: Why is piggybacking good?
  • A host uses the AckNo field to send
    acknowledgements. (If a host sends an AckNo in a
    segment it sets the ACK flag)
  • The AckNo contains the next SeqNo that a host
    wants to receive. Example: The acknowledgement
    for a segment with sequence numbers 0-1500 is
    AckNo = 1501

44
TCP header fields
  • Acknowledgement Number (cont'd)
  • TCP uses the sliding window flow protocol (see CS
    457) to regulate the flow of traffic from sender
    to receiver
  • TCP uses the following variation of sliding
    window
  • no NACKs (Negative ACKnowledgement)
  • only cumulative ACKs
  • Example
  • Assume the sender sends two segments with bytes
    1..1500 and 1501..3000, but the receiver only
    gets the second segment.
  • In this case, the receiver cannot acknowledge the
    second packet. It can only send AckNo = 1
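
The cumulative ACK rule can be sketched as a function that walks the received byte ranges in order and stops at the first gap. The function and parameter names are illustrative, and byte ranges are inclusive as in the example above.

```python
def cumulative_ack(received, first_seq=1):
    """Return the AckNo: the next byte the receiver expects in order.

    received: iterable of (first_byte, last_byte) inclusive ranges.
    first_seq: sequence number of the first byte of the stream.
    """
    next_expected = first_seq
    for first, last in sorted(received):
        if first > next_expected:
            break  # gap: nothing beyond this point can be acknowledged
        next_expected = max(next_expected, last + 1)
    return next_expected


print(cumulative_ack([(1501, 3000)]))               # only second segment arrived
print(cumulative_ack([(1, 1500), (1501, 3000)]))    # both segments arrived
```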

45
TCP header fields
  • Header Length (4 bits)
  • Length of header in 32-bit words
  • Note that TCP header has variable length (with
    minimum 20 bytes)

46
TCP header fields
  • Flag bits
  • URG: Urgent pointer is valid
  • If the bit is set, the following bytes contain an
    urgent message in the range SeqNo <= urgent
    message <= SeqNo + urgent pointer
  • ACK: Acknowledgement Number is valid
  • PSH: PUSH Flag
  • Notification from sender to the receiver that the
    receiver should pass all data that it has to the
    application.
  • Normally set by sender when the sender's buffer
    is empty

47
TCP header fields
  • Flag bits
  • RST: Reset the connection
  • The flag causes the receiver to reset the
    connection
  • Receiver of a RST terminates the connection and
    informs the higher-layer application about the
    reset
  • SYN: Synchronize sequence numbers
  • Sent in the first packet when initiating a
    connection
  • FIN: Sender is finished with sending
  • Used for closing a connection
  • Both sides of a connection must send a FIN
48
TCP header fields
  • Window Size
  • Each side of the connection advertises the window
    size
  • Window size is the maximum number of bytes that a
    receiver can accept.
  • Maximum window size is 2^16 - 1 = 65535 bytes
  • TCP Checksum
  • TCP checksum covers both TCP header and TCP
    data (also covers some parts of the IP header)
  • 16-bit one's complement
  • Urgent Pointer
  • Only valid if URG flag is set
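
The 16-bit one's complement checksum can be sketched as below. The function computes the checksum over whatever bytes it is given. Building the actual TCP input (pseudo-header with parts of the IP header, then TCP header and data) is left out, and the function name is illustrative.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:               # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


data = bytes.fromhex("0001f203f4f5f6f7")
csum = internet_checksum(data)
print(hex(csum))
# A receiver that sums the data together with the checksum gets zero
print(internet_checksum(data + csum.to_bytes(2, "big")))
```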

49
TCP header fields
  • Options

50
TCP header fields
  • Options
  • NOP is used to pad TCP header to multiples of 4
    bytes
  • Maximum Segment Size
  • Window Scale Options
  • Increases the TCP window from 16 to 32 bits,
    i.e., the window size is interpreted differently
  • Q: What is the different interpretation?
  • This option can only be used in the SYN segment
    (first segment) during connection establishment
    time
  • Timestamp Option
  • Can be used for roundtrip measurements

51
Three-Way Handshake
52
Why is a Two-Way Handshake not enough?
Will be discarded as a duplicate SYN
When aida initiates the data transfer (starting
with SeqNo = 15322112355), mng will reject all
data.
53
TCP Connection Termination
54
Connection termination with tcpdump
  • 1 mng.poly.edu.telnet > aida.poly.edu.1121: F
    172488734:172488734(0) ack 1031880221 win 8733
  • 2 aida.poly.edu.1121 > mng.poly.edu.telnet: .
    ack 172488735 win 17484
  • 3 aida.poly.edu.1121 > mng.poly.edu.telnet: F
    1031880221:1031880221(0) ack 172488735 win
    17520
  • 4 mng.poly.edu.telnet > aida.poly.edu.1121: . ack
    1031880222 win 8733

55
TCP States in Normal Connection Lifetime
56
TCP State Transition Diagram: Opening A Connection
57
TCP State Transition Diagram: Closing A Connection
Issue close()
58
2MSL Wait State
  • 2MSL Wait State = TIME_WAIT
  • When TCP does an active close, and sends the
    final ACK, the connection must stay in the
    TIME_WAIT state for twice the maximum segment
    lifetime.
  • 2MSL = 2 * Maximum Segment Lifetime
  • Why? TCP is given a chance to resend the final
    ACK. (Server will timeout after sending the FIN
    segment and resend the FIN)
  • The MSL is set to 2 minutes or 1 minute or 30
    seconds.

59
Rules for sending Acknowledgments
  • TCP has rules that influence the transmission of
    acknowledgments
  • Rule 1: Delayed Acknowledgments
  • Goal: Avoid sending ACK segments that do not
    carry data
  • Implementation: Delay the transmission of (some)
    ACKs
  • Rule 2: Nagle's rule
  • Goal: Reduce transmission of small segments
  • Implementation: A sender cannot send multiple
    segments with a 1-byte payload (i.e., it must
    wait for an ACK)
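
Nagle's rule can be sketched as a tiny sender model: a small segment goes out only when nothing is awaiting an ACK, otherwise the bytes are buffered. The class is a simplification with made-up names. In particular it assumes that every ACK covers all outstanding data.

```python
class NagleSender:
    """Toy model of a sender that follows Nagle's rule."""

    def __init__(self, mss: int):
        self.mss = mss
        self.buffer = b""
        self.unacked = False     # True while a sent segment awaits its ACK

    def write(self, data: bytes):
        """Application writes data; returns the segments sent right away."""
        self.buffer += data
        return self._flush()

    def on_ack(self):
        """Assumption: the ACK acknowledges everything that is in flight."""
        self.unacked = False
        return self._flush()

    def _flush(self):
        sent = []
        # Full-size segments may always be sent
        while len(self.buffer) >= self.mss:
            sent.append(self.buffer[:self.mss])
            self.buffer = self.buffer[self.mss:]
            self.unacked = True
        # A small segment is sent only if no data is unacknowledged
        if self.buffer and not self.unacked:
            sent.append(self.buffer)
            self.buffer = b""
            self.unacked = True
        return sent


s = NagleSender(mss=1460)
print(s.write(b"a"))    # first character goes out immediately
print(s.write(b"b"))    # buffered: "a" is not yet acknowledged
print(s.write(b"c"))    # still buffered
print(s.on_ack())       # ACK arrives: buffered characters leave in one segment
```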

60
Delayed Acknowledgement
  • TCP delays transmission of ACKs for up to 200ms
  • Goal: Avoid sending ACK packets that do not carry
    data.
  • The hope is that, within the delay, the receiver
    will have data ready to be sent to the sender.
    Then, the ACK can be piggybacked with a data
    segment
  • In the example:
  • Delayed ACK explains why the ACK of character
    and the echo of character are sent in the same
    segment
  • The duration of delayed ACKs can be observed in
    the example when Argon sends ACKs
  • Exceptions
  • ACK should be sent for every second full sized
    segment
  • Delayed ACK is not used when packets arrive out
    of order

61
Observing Delayed Acknowledgements
  • Remote terminal applications (e.g., Telnet) send
    characters to a server. The server interprets the
    character and sends the output at the server to
    the client.
  • For each character typed, you see three packets
  • Client -> Server: Send typed character
  • Server -> Client: Echo of character (or user
    output) and acknowledgement for first packet
  • Client -> Server: Acknowledgement for second packet

62
Observing Delayed Acknowledgements
  • This is the output of typing 3 (three) characters
  • Time 44.062449: Argon -> Neon: Push, SeqNo
    0:1(1), AckNo 1
  • Time 44.063317: Neon -> Argon: Push, SeqNo
    1:2(1), AckNo 1
  • Time 44.182705: Argon -> Neon: No Data, AckNo
    2
  • Time 48.946471: Argon -> Neon: Push, SeqNo
    1:2(1), AckNo 2
  • Time 48.947326: Neon -> Argon: Push, SeqNo
    2:3(1), AckNo 2
  • Time 48.982786: Argon -> Neon: No Data, AckNo
    3
  • Time 55.116581: Argon -> Neon: Push, SeqNo
    2:3(1), AckNo 3
  • Time 55.117497: Neon -> Argon: Push, SeqNo
    3:4(1), AckNo 3
  • Time 55.183694: Argon -> Neon: No Data, AckNo 4

63
Why 3 segments per character?
  • We would expect four segments per character
  • But we only see three segments per character
  • This is due to delayed acknowledgements

64
Observing Nagle's Rule
  • This is the output of typing 7 characters
  • Time 16.401963: Argon -> Tenet: Push, SeqNo
    1:2(1), AckNo 2
  • Time 16.481929: Tenet -> Argon: Push, SeqNo
    2:3(1), AckNo 2
  • Time 16.482154: Argon -> Tenet: Push, SeqNo
    2:3(1), AckNo 3
  • Time 16.559447: Tenet -> Argon: Push, SeqNo
    3:4(1), AckNo 3
  • Time 16.559684: Argon -> Tenet: Push, SeqNo
    3:4(1), AckNo 4
  • Time 16.640508: Tenet -> Argon: Push, SeqNo
    4:5(1), AckNo 4
  • Time 16.640761: Argon -> Tenet: Push, SeqNo
    4:8(4), AckNo 5
  • Time 16.728402: Tenet -> Argon: Push, SeqNo
    5:9(4), AckNo 8

65
Observing Nagle's Rule
  • Observation: Transmission of segments follows a
    different pattern, i.e., there are only two
    segments per character typed
  • Delayed acknowledgment does not kick in at Argon
  • The reason is that there is always data at Argon
    ready to be sent when the ACK arrives
  • Why is Argon not sending the data (typed
    character) as soon as it is available?

66
Resetting Connections
  • Resetting connections is done by setting the RST
    flag
  • When is the RST flag set?
  • Connection request arrives and no server process
    is waiting on the destination port
  • Abort (Terminate) a connection Causes the
    receiver to throw away buffered data. Receiver
    does not acknowledge the RST segment

67
TCP Congestion Control
  • TCP has a mechanism for congestion control. The
    mechanism is implemented at the sender
  • The window size at the sender is set as follows:
  • Send Window = MIN(flow control window,
    congestion window)
  • where
  • flow control window is advertised by the receiver
  • congestion window is adjusted based on feedback
    from the network
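
The send window rule is a single min(). The sketch below also subtracts the bytes already in flight to show how much more the sender may transmit. That extra parameter is an illustrative addition, not something stated on the slide.

```python
def usable_window(flow_control_window: int, congestion_window: int,
                  bytes_in_flight: int = 0) -> int:
    """Bytes the sender may still transmit right now."""
    send_window = min(flow_control_window, congestion_window)
    # bytes_in_flight (sent but not yet acknowledged) is a hypothetical extra
    return max(0, send_window - bytes_in_flight)


print(usable_window(65535, 4380, 1460))   # congestion window is the limit
print(usable_window(1000, 4380, 1460))    # receiver's window is the limit
```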

68
TCP Congestion Control
  • TCP congestion control is governed by two
    parameters
  • Congestion Window (cwnd)
  • Slow-start threshold value (ssthresh)
  • Initial value is 2^16 - 1
  • Congestion control works in two modes
  • slow start (cwnd < ssthresh)
  • congestion avoidance (cwnd >= ssthresh)

69
Slow Start
  • Initial value: Set cwnd = 1
  • Note: Unit is a segment size. TCP actually is
    based on bytes and increments by 1 MSS (maximum
    segment size)
  • The receiver sends an acknowledgement (ACK) for
    each segment
  • Note: Generally, a TCP receiver sends an ACK for
    every other segment.
  • Each time an ACK is received by the sender, the
    congestion window is increased by 1 segment
  • cwnd = cwnd + 1
  • If an ACK acknowledges two segments, cwnd is
    still increased by only 1 segment.
  • Even if ACK acknowledges a segment that is
    smaller than MSS bytes long, cwnd is increased by
    1.
  • Does Slow Start increment slowly? Not really. In
    fact, the increase of cwnd is exponential

70
Slow Start Example
  • The congestion window size grows very rapidly
  • For every ACK, we increase cwnd by 1 irrespective
    of the number of segments ACKed
  • TCP slows down the increase of cwnd when cwnd >=
    ssthresh

71
Congestion Avoidance
  • Congestion avoidance phase is started if cwnd has
    reached the slow-start threshold value
  • If cwnd >= ssthresh then each time an ACK is
    received, increment cwnd as follows
  • cwnd = cwnd + 1/cwnd
  • So cwnd is increased by one only if all cwnd
    segments have been acknowledged.

72
Example of Slow Start/Congestion Avoidance
  • Assume that ssthresh = 8

[Figure: cwnd (in segments) versus roundtrip times, with ssthresh marked]
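
The growth of cwnd for ssthresh = 8 can be reproduced with a small simulation. It is a sketch under simplifying assumptions: one ACK per segment, no losses, and in congestion avoidance the 1/cwnd increment uses the window size at the start of the round, so the window grows by exactly one segment per roundtrip.

```python
from fractions import Fraction


def cwnd_per_round(ssthresh: int, rounds: int):
    """Return the congestion window (in segments) at the start of each roundtrip."""
    cwnd = Fraction(1)
    history = []
    for _ in range(rounds):
        window = int(cwnd)
        history.append(window)
        for _ in range(window):            # one ACK per segment sent this round
            if cwnd < ssthresh:
                cwnd += 1                  # slow start: +1 segment per ACK
            else:
                cwnd += Fraction(1, window)    # congestion avoidance: +1/cwnd per ACK
    return history


print(cwnd_per_round(ssthresh=8, rounds=7))
```

The window doubles every roundtrip until it reaches ssthresh and then grows by one segment per roundtrip.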
73
Responses to Congestion
  • So, TCP assumes there is congestion if it detects
    a packet loss
  • A TCP sender can detect lost packets via
  • Timeout of a retransmission timer
  • Receipt of a duplicate ACK
  • TCP interprets a timeout as a binary congestion
    signal. When a timeout occurs, the sender
    performs
  • ssthresh is set to half the current size of the
    congestion window
  • ssthresh = cwnd / 2
  • cwnd is reset to one
  • cwnd = 1
  • and slow-start is entered
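
The reaction to a timeout is short, but the order matters: ssthresh must be derived from the window in use when the timeout fired, before cwnd is reset. A minimal sketch in units of segments, with an illustrative function name:

```python
def on_timeout(cwnd: float):
    """Return (new_cwnd, new_ssthresh) after a retransmission timeout."""
    ssthresh = cwnd / 2    # computed from the old cwnd, before the reset
    cwnd = 1               # back to one segment: slow start is entered
    return cwnd, ssthresh


print(on_timeout(16))
```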

74
Fast Retransmit
  • If three or more duplicate ACKs are received in a
    row, the TCP sender believes that a segment has
    been lost.
  • Then TCP performs a retransmission of what seems
    to be the missing segment, without waiting for a
    timeout to happen.
  • Enter slow start
  • ssthresh = cwnd/2
  • cwnd = 1
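
Detecting the trigger for fast retransmit only needs a counter of duplicate ACKs. The class below is a sketch with made-up names. It reports the moment the third duplicate arrives and leaves the retransmission itself to the caller.

```python
class DupAckDetector:
    """Counts duplicate ACKs and signals when fast retransmit should fire."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no: int) -> bool:
        """Return True exactly when the threshold-th duplicate ACK arrives."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            return self.dup_count == self.threshold
        # A new AckNo acknowledges new data: start counting again
        self.last_ack = ack_no
        self.dup_count = 0
        return False


d = DupAckDetector()
print([d.on_ack(a) for a in [1001, 1001, 1001, 1001, 1001]])
```

The first ACK for 1001 is not a duplicate, so the signal is raised on the fourth ACK carrying that number.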

75
Fast Recovery
  • Fast recovery avoids slow start after a fast
    retransmit
  • Intuition: Duplicate ACKs indicate that data is
    getting through
  • After three duplicate ACKs:
  • Retransmit packet that is presumed lost
  • ssthresh = cwnd/2
  • cwnd = cwnd + 3
  • (note the order of operations)
  • Increment cwnd by one for each additional
    duplicate ACK
  • When ACK arrives that acknowledges new data
    (here AckNo = 6148), set
  • cwnd = ssthresh
  • enter congestion avoidance

76
Flavors of TCP Congestion Control
  • TCP Tahoe (1988, 4.3BSD Tahoe)
  • Slow Start
  • Congestion Avoidance
  • Fast Retransmit
  • TCP Reno (1990, 4.3BSD Reno)
  • Fast Recovery
  • New Reno (1996)
  • SACK (1996)
  • RED (Floyd and Jacobson 1993)

77
SACK
  • SACK = Selective acknowledgment
  • Issue: Reno and New Reno retransmit at most 1
    lost packet per round trip time
  • Selective acknowledgments: The receiver can
    acknowledge non-contiguous blocks of data (SACK
    0-1023, 1024-2047)
  • Multiple blocks can be sent in a single segment.
  • TCP SACK
  • Enters fast recovery upon 3 duplicate ACKs
  • Sender keeps track of SACKs and infers if
    segments are lost. Sender retransmits the next
    segment from the list of segments that are deemed
    lost.
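
How a sender might infer lost data from SACK blocks can be sketched as a search for holes between the cumulative ACK and the selectively acknowledged blocks. Blocks are written as inclusive byte ranges in the style of the slide's example. The real SACK option encodes block edges differently, and the names here are illustrative.

```python
def missing_ranges(cum_ack: int, sack_blocks):
    """Return byte ranges not yet acknowledged below the highest SACKed byte.

    cum_ack: next byte expected by the receiver (cumulative AckNo).
    sack_blocks: list of (first_byte, last_byte) inclusive ranges.
    """
    holes = []
    next_needed = cum_ack
    for first, last in sorted(sack_blocks):
        if first > next_needed:
            holes.append((next_needed, first - 1))   # gap before this block
        next_needed = max(next_needed, last + 1)
    return holes


# Receiver has bytes 1024-2047 and 3072-4095, but nothing before 1024
print(missing_ranges(0, [(1024, 2047), (3072, 4095)]))
```

The sender would then retransmit from the first hole in the list.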

78
TCP in Linux
  • Congestion control algorithm is pluggable
  • /proc/sys/net/ipv4/tcp_congestion_control
  • TCP read and write buffer sizes
  • /proc/sys/net/ipv4/tcp_rmem and
    /proc/sys/net/ipv4/tcp_wmem

79
Midterm questions
  • ARP, ICMP, UDP, TCP, RIP, OSPF, BGP
  • Compare and contrast design principles in
    protocols.
  • Fragmentation