Title: Computer Networking: A Top-Down Approach Featuring the Internet
Chapter 3 Transport Layer
- Our goals:
  - understand the principles behind transport-layer services:
    - multiplexing/demultiplexing
    - reliable data transfer
    - flow control
    - congestion control
  - learn about the transport-layer protocols in the Internet:
    - UDP: connectionless transport
    - TCP: connection-oriented transport
    - TCP congestion control
Chapter 3 Roadmap
- 3.1 Introduction and Transport-layer services
- 3.2 Multiplexing and Demultiplexing
- 3.3 Connectionless transport UDP
- 3.4 Principles of reliable data transfer
- 3.5 Connection-oriented transport TCP
- 3.6 Principles of congestion control
- 3.7 TCP congestion control
- 3.8 Summary
3.1 Introduction and Transport-Layer Services
Introduction
- A transport-layer protocol provides logical communication between application processes running on different hosts.
  - From an application's perspective, it is as if the hosts running the processes were directly connected.
  - In reality, the hosts may be connected via numerous routers and a wide range of link types.
Introduction
- Transport protocols run in end systems:
  - the send side breaks application messages into segments and passes them to the network layer;
  - the receive side reassembles segments into messages and passes them to the application layer.
- The Internet provides two transport services to applications: TCP and UDP.
Transport vs. Network Layer
- network layer: logical communication between hosts
- transport layer: logical communication between processes
  - relies on, and enhances, network-layer services
Transport vs. Network Layer
- Household analogy: 12 kids in one family send letters to each of the 12 kids in another family.
  - application messages = letters in envelopes
  - processes = kids
  - hosts = houses
  - transport protocol = Ann and Bill, the kids who collect and distribute the mail in their respective houses
  - network-layer protocol = postal service
Internet transport-layer protocols
- The application developer must choose one of the two transport services.
- TCP:
  - reliable, in-order, connection-oriented delivery
  - congestion control
  - flow control
- UDP:
  - unreliable, unordered, connectionless delivery
  - no-frills extension of best-effort IP
Internet transport-layer protocols
- IP's service model is a best-effort service:
  - IP makes its best effort to deliver segments between hosts, but it makes no guarantees.
- Transport protocols extend IP's host-to-host delivery service to a process-to-process delivery service:
  - transport-layer multiplexing and demultiplexing
Chapter 3 Roadmap
- 3.1 Introduction and Transport-layer services
- 3.2 Multiplexing and Demultiplexing
- 3.3 Connectionless transport UDP
- 3.4 Principles of reliable data transfer
- 3.5 Connection-oriented transport TCP
- 3.6 Principles of congestion control
- 3.7 TCP congestion control
- 3.8 Summary
3.2 Multiplexing and Demultiplexing
Introduction
- When the transport layer receives segments, it must deliver the data to the appropriate application process.
- More precisely, it delivers the data to a socket of that process.
Multiplexing/demultiplexing
[Figure: processes and their sockets on hosts 1, 2, and 3]
How does demultiplexing work?
- Multiplexing/demultiplexing requires:
  1) sockets with unique identifiers, and
  2) source and destination port-number fields in each segment.
- Port numbers:
  - 16-bit numbers (0 to 65535)
  - 0 to 1023 are well-known port numbers, e.g., HTTP uses 80 and SMTP uses 25.
Connectionless demultiplexing
- A UDP socket is fully identified by a two-tuple: <dest IP address, dest port number>.
- When a host receives a UDP segment, it:
  - checks the destination port number in the segment, and
  - directs the segment to the socket with that port number (demultiplexing).
Connectionless demux (cont.)
- Process P3 creates its socket with:
  DatagramSocket serverSocket = new DatagramSocket(6428);
- Segments with different source IP addresses and/or source port numbers are directed to the same socket.
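A minimal sketch of this rule, assuming a host with a single IP address: the demultiplexer is modeled as a map from destination port to a socket's receive queue. The classes UdpSegment and UdpDemux are hypothetical illustrations, not the Java or operating-system socket API. Note that `deliver` never looks at the source address or source port.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** A received UDP segment, reduced to the fields that matter here (hypothetical model). */
final class UdpSegment {
    final String srcIp;
    final int srcPort;
    final int dstPort;
    final String data;

    UdpSegment(String srcIp, int srcPort, int dstPort, String data) {
        this.srcIp = srcIp;
        this.srcPort = srcPort;
        this.dstPort = dstPort;
        this.data = data;
    }
}

/** Hypothetical UDP demultiplexer for a single-address host: the key is the destination port only. */
final class UdpDemux {
    private final Map<Integer, Queue<UdpSegment>> socketsByPort = new HashMap<>();

    /** Models new DatagramSocket(port): returns the new socket's receive queue. */
    Queue<UdpSegment> bind(int port) {
        if (socketsByPort.containsKey(port)) {
            throw new IllegalStateException("port " + port + " already in use");
        }
        Queue<UdpSegment> queue = new ArrayDeque<>();
        socketsByPort.put(port, queue);
        return queue;
    }

    /** Delivers a segment by destination port; srcIp and srcPort are never consulted. */
    boolean deliver(UdpSegment segment) {
        Queue<UdpSegment> queue = socketsByPort.get(segment.dstPort);
        if (queue == null) {
            return false; // no socket bound to that port: the segment is dropped in this model
        }
        queue.add(segment);
        return true;
    }
}
```

Two segments from different clients, both addressed to port 6428, end up in the same queue.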
Connection-oriented demultiplexing
- A TCP socket is identified by a four-tuple: <source IP address, source port, dest IP address, dest port>.
- The receiving host uses all four values to direct a segment to the appropriate socket.
- Server side:
  ServerSocket welcomeSocket = new ServerSocket(PNum);
  Socket connectionSocket = welcomeSocket.accept();
Connection-oriented demultiplexing
- A server host may support many simultaneous TCP sockets, each identified by its own 4-tuple.
- Web servers have a different socket for each connecting client.
  - With non-persistent HTTP, there is even a different socket for each request.
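By contrast, a TCP demultiplexer can be sketched as a map keyed by the whole four-tuple. FourTuple and TcpDemux are hypothetical names. The connection socket is represented only by a label, and listening (welcome) sockets are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** The four values a receiving host uses to pick a TCP connection socket. */
final class FourTuple {
    final String srcIp;
    final int srcPort;
    final String dstIp;
    final int dstPort;

    FourTuple(String srcIp, int srcPort, String dstIp, int dstPort) {
        this.srcIp = srcIp;
        this.srcPort = srcPort;
        this.dstIp = dstIp;
        this.dstPort = dstPort;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FourTuple)) {
            return false;
        }
        FourTuple t = (FourTuple) o;
        return srcPort == t.srcPort && dstPort == t.dstPort
                && srcIp.equals(t.srcIp) && dstIp.equals(t.dstIp);
    }

    @Override
    public int hashCode() {
        return Objects.hash(srcIp, srcPort, dstIp, dstPort);
    }
}

/** Hypothetical TCP demux table: every accepted connection gets its own socket. */
final class TcpDemux {
    private final Map<FourTuple, String> connections = new HashMap<>();

    /** Models accept(): registers a new connection socket under its 4-tuple. */
    void accept(FourTuple key, String socketName) {
        connections.put(key, socketName);
    }

    /** Returns the connection socket for an arriving segment, or null if none matches. */
    String demux(FourTuple key) {
        return connections.get(key);
    }
}
```

Because the key includes the source address and source port, two clients that connect to the same server port 80 land on different sockets, and so do two connections from the same client that use different source ports.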
[Figure: Web Server and TCP]
3.2 Multiplexing and Demultiplexing: summary
- Multiplexing and demultiplexing extend the host-to-host delivery service to a process-to-process delivery service.
  - demultiplexing: delivering the data in received segments to the correct socket
  - multiplexing: gathering data from sockets, creating segments, and passing the segments to the network layer
- Topics covered:
  - multiplexing/demultiplexing requirements
  - connectionless demultiplexing
  - connection-oriented demultiplexing
Chapter 3 Roadmap
- 3.1 Introduction and Transport-layer services
- 3.2 Multiplexing and Demultiplexing
- 3.3 Connectionless transport UDP
- 3.4 Principles of reliable data transfer
- 3.5 Connection-oriented transport TCP
- 3.6 Principles of congestion control
- 3.7 TCP congestion control
- 3.8 Summary
3.3 Connectionless Transport: UDP
Introduction
- UDP is a no-frills, bare-bones transport protocol that provides:
  - multiplexing/demultiplexing
  - light error checking
- With UDP, the application is (almost) talking directly with IP.
- UDP segment:
  - source/dest port fields: used for multiplexing/demultiplexing
  - length and checksum fields
Introduction
UDP segment format (each row is 32 bits):
  | source port (16 bits)   | dest port (16 bits)   |
  | length (16 bits)        | checksum (16 bits)    |
  | application data (message)                      |
- length: the length in bytes of the UDP segment, including the header
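To illustrate the format, the sketch below lays out a UDP segment with java.nio.ByteBuffer, which writes in big-endian (network) byte order by default. It assumes the checksum is left at 0, which for UDP over IPv4 means "no checksum computed". UdpHeader is a hypothetical helper class.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper that builds a UDP segment in the format above (checksum left at 0). */
final class UdpHeader {
    static final int HEADER_BYTES = 8; // four 16-bit fields

    static byte[] encode(int srcPort, int dstPort, byte[] payload) {
        checkPort(srcPort);
        checkPort(dstPort);
        int length = HEADER_BYTES + payload.length;   // the length field counts header + data
        if (length > 0xFFFF) {
            throw new IllegalArgumentException("segment too long for a 16-bit length field");
        }
        ByteBuffer buf = ByteBuffer.allocate(length); // big-endian, i.e. network byte order
        buf.putShort((short) srcPort);
        buf.putShort((short) dstPort);
        buf.putShort((short) length);
        buf.putShort((short) 0);                      // checksum field: 0 = not computed (IPv4)
        buf.put(payload);
        return buf.array();
    }

    /** Reads a 16-bit header field as an unsigned value; field offsets are 0, 2, 4, 6. */
    static int field(byte[] segment, int offset) {
        return ByteBuffer.wrap(segment).getShort(offset) & 0xFFFF;
    }

    private static void checkPort(int port) {
        if (port < 0 || port > 0xFFFF) {
            throw new IllegalArgumentException("port must fit in 16 bits: " + port);
        }
    }
}
```

A 5-byte payload gives a 13-byte segment, and the length field holds 13, not 5.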
Introduction
- UDP is a connectionless transport-layer protocol:
  - no handshaking between sender and receiver
  - each UDP segment is handled independently of the others
- Example: DNS
Why use UDP?
- Why is there a UDP?
  - finer application-level control over what data is sent, and when
  - no connection establishment, so no connection-setup delay
  - no connection state: no buffers, connection parameters, or sequence/ACK number parameters
  - small segment header overhead
Applications using UDP
[Table: applications that use UDP]
1. UDP Segment Structure
[Figure: UDP segment format (32 bits wide): source port, dest port, length (in bytes, including header), checksum, application data (message)]
- Why does UDP provide a checksum?
  - Not all links on the path provide error checking.
  - Errors can also be introduced while a packet is stored in a buffer.
2. UDP checksum
- Goal: detect errors in the transmitted segment.
- Sender:
  - treats the segment contents as a sequence of 16-bit integers;
  - adds them up, wrapping any overflow (a carry out of the most significant bit) around into the least significant bit;
  - takes the 1s complement of the sum as the checksum;
  - puts the checksum value into the UDP checksum field.
2. UDP checksum
- Receiver:
  - computes the sum of the received segment, including the checksum field;
  - checks whether the computed sum is all ones:
    - NO: an error is detected;
    - YES: no error is detected. But could there be errors nonetheless?
UDP Checksum Example
- Note: when adding the numbers, a carry out of the most significant bit must be added back into the result (wraparound).
- Example: add two 16-bit integers
      1110 0110 0110 0110     (0xE666)
    + 1101 0101 0101 0101     (0xD555)
    1 1011 1011 1011 1011     17-bit result: the leading 1 wraps around
      1011 1011 1011 1100     sum (0xBBBB + 1 = 0xBBBC)
      0100 0100 0100 0011     checksum: 1s complement of the sum (0x4443)
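The sender and receiver computations can be sketched in Java as follows, with each 16-bit word passed as an int. InternetChecksum is a hypothetical helper. It does not handle odd-length data (padding), so it is a sketch of the arithmetic rather than a full UDP implementation.

```java
/** Hypothetical helper for the 16-bit 1s-complement checksum described above. */
final class InternetChecksum {
    /** 1s-complement sum of 16-bit words: a carry out of bit 15 is added back into bit 0. */
    static int onesComplementSum(int... words) {
        int sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            sum = (sum & 0xFFFF) + (sum >>> 16); // wraparound
        }
        return sum;
    }

    /** Sender: the checksum is the bitwise complement of the sum, kept to 16 bits. */
    static int checksum(int... words) {
        return ~onesComplementSum(words) & 0xFFFF;
    }

    /** Receiver: the words plus the checksum must add up to all ones (0xFFFF). */
    static boolean verify(int checksum, int... words) {
        int sum = onesComplementSum(words) + checksum;
        sum = (sum & 0xFFFF) + (sum >>> 16);
        return sum == 0xFFFF;
    }
}
```

For the words 0xE666 and 0xD555, `onesComplementSum` gives 0xBBBC and `checksum` gives 0x4443. Flipping a single bit makes `verify` fail. However, `verify(0x4443, 0xE667, 0xD554)` still returns true: the lowest bit of one word flipped from 0 to 1 and the lowest bit of the other from 1 to 0, so the sum is unchanged. This is why a passing checksum does not guarantee an error-free segment.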
3.3 UDP Summary
- UDP provides only multiplexing/demultiplexing and light error checking on top of IP.
- UDP is a connectionless transport protocol.
- Why applications choose UDP
- UDP segment structure
- UDP's checksum
Chapter 3 Roadmap
- 3.1 Introduction and Transport-layer services
- 3.2 Multiplexing and Demultiplexing
- 3.3 Connectionless transport UDP
- 3.4 Principles of reliable data transfer
- 3.5 Connection-oriented transport TCP
- 3.6 Principles of congestion control
- 3.7 TCP congestion control
- 3.8 Summary
3.4 Principles of Reliable Data Transfer
Introduction
- A framework for discussing reliable data transfer.
Introduction - Methodology
- Step by step: incrementally develop a reliable data transfer protocol, considering increasingly complex models of the underlying channel.
Introduction
- The sending application invokes rdt_send() to send data; this calls the sender side of rdt.
- When a packet arrives at the receiving side of the channel, rdt_rcv() is called.
- The receiver calls deliver_data() to deliver the data to the application layer.
- We consider only unidirectional data transfer, from sender to receiver.
Reliable data transfer - getting started
[Figure: reliable data transfer: send side and receive side]
Reliable data transfer - getting started
- We use finite state machines (FSMs) to specify the sender and the receiver.
  - Arrows indicate the transition from one state to another.
  - Only an event can cause a transition.
  - Actions are taken when the event occurs.
- Each arrow is labeled with the event causing the state transition and the actions taken on the state transition; Λ means no event or no action.
1.1 Reliable transfer over a perfectly reliable channel: rdt1.0
- The underlying channel is perfectly reliable:
  - no bit errors, no loss of packets, in-order delivery.
- Separate FSMs for the sender and the receiver:
  - the sender sends data into the underlying channel;
  - the receiver reads data from the underlying channel.
1.1 Reliable transfer over a perfectly reliable channel: rdt1.0
- Sender:
  - event: accepts data from the application via rdt_send(data)
  - actions: creates a packet containing the data, packet = make_pkt(data), and sends the packet into the channel, udt_send(packet)
- Receiver:
  - event: receives a packet from the channel via rdt_rcv(packet)
  - actions: extracts the data from the packet, extract(packet, data), and passes the data up, deliver_data(data)
1.2 Reliable transfer over a channel with bit errors: rdt2.0
- A packet may be corrupted, but packets are never lost.
- How should rdt handle corrupted packets?
  - Who is responsible for checking for errors in a packet? The receiver checks whether the received packet contains an error; for this, the sender must add extra bits to the packet.
  - How is the error handled? The receiver informs the sender, and the sender retransmits the corrupted packet.
1.2 Reliable transfer over a channel with bit errors: rdt2.0
- Reliable data transfer protocols based on such retransmission are called Automatic Repeat reQuest (ARQ) protocols. They need:
  - error detection
  - receiver feedback:
    - positive acknowledgements (ACKs): the receiver explicitly tells the sender that a packet was received OK
    - negative acknowledgements (NAKs): the receiver explicitly tells the sender that a packet had errors
  - retransmission
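rdt2.0 FSM specification
Sender:
- State "Wait for call from above":
  - on rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt) → "Wait for ACK or NAK"
- State "Wait for ACK or NAK":
  - on rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt) → stay
  - on rdt_rcv(rcvpkt) && isACK(rcvpkt): Λ → "Wait for call from above"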
rdt2.0 operation with no errors
[Figure: rdt2.0 sender and receiver FSMs: operation with no errors]
Receiver:
- State "Wait for call from below":
  - on rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK) → stay
  - on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt, data); deliver_data(data); udt_send(ACK) → stay
rdt2.0 error scenario
[Figure: rdt2.0 sender and receiver FSMs: a corrupted packet makes the receiver send a NAK, and the sender retransmits]
rdt2.0 has a fatal flaw!
- What happens if an ACK/NAK is corrupted?
  - The sender doesn't know what happened at the receiver!
  - It can't just retransmit: the retransmission might be a duplicate.
- Options for handling corrupted ACKs/NAKs:
  - add new message type(s)
  - add enough bits for error detection and correction
  - retransmit anyway, which can create duplicate packets
rdt2.0 has a fatal flaw!
- Handling duplicates:
  - the sender adds a sequence number to each packet;
  - the sender retransmits the current packet if the ACK/NAK is garbled;
  - the receiver discards (doesn't deliver up) duplicate packets.
- Do ACKs/NAKs need a sequence number?
  - No. In a stop-and-wait protocol, the sender knows that any ACK or NAK it receives was generated in response to its most recently transmitted packet.
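rdt2.1: handles garbled ACK/NAKs
Sender:
- State "Wait for call 0 from above":
  - on rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt) → "Wait for ACK or NAK 0"
- State "Wait for ACK or NAK 0":
  - on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)): udt_send(sndpkt) → stay
  - on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): Λ → "Wait for call 1 from above"
- State "Wait for call 1 from above":
  - on rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt) → "Wait for ACK or NAK 1"
- State "Wait for ACK or NAK 1":
  - on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)): udt_send(sndpkt) → stay
  - on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): Λ → "Wait for call 0 from above"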
rdt2.1: handles garbled ACK/NAKs
Receiver:
- State "Wait for 0 from below":
  - on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): extract(rcvpkt, data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt) → "Wait for 1 from below"
  - on rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt) → stay
  - on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt) → stay
- State "Wait for 1 from below":
  - on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): extract(rcvpkt, data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt) → "Wait for 0 from below"
  - on rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt) → stay
  - on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt) → stay
- Why does the receiver send an ACK when it gets a packet with the unexpected sequence number?
rdt2.1 discussion
- Sender:
  - a sequence number is added to each packet;
  - two sequence numbers (0, 1) suffice. Why?
  - must check whether the received ACK/NAK is corrupted, so the receiver puts a checksum in each ACK/NAK;
  - twice as many states in the FSM: the state must remember whether the current packet has sequence number 0 or 1.
rdt2.1 discussion
- Receiver:
  - must check whether the received packet is a duplicate; the state indicates whether 0 or 1 is the expected sequence number.
  - Note: the receiver cannot know whether its last ACK/NAK was received OK at the sender.
rdt2.2: a NAK-free protocol
- rdt2.1 uses both positive and negative acknowledgments; rdt2.2 is a NAK-free protocol:
  - instead of a NAK, the receiver sends an ACK for the last packet received OK;
  - the receiver must explicitly include the sequence number of the packet being ACKed;
  - a duplicate ACK at the sender results in the same action as a NAK: retransmit the current packet.
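The receiver side of rdt2.2 can be sketched as follows. Rdt22Receiver and onPacket are hypothetical names. Corruption is passed in as a flag instead of being computed from a checksum, and the returned int stands for the sequence number carried in the ACK.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an rdt2.2 (alternating-bit, NAK-free) receiver. */
final class Rdt22Receiver {
    private int expectedSeq = 0;                       // 0 or 1
    final List<String> delivered = new ArrayList<>();  // data passed up via deliver_data()

    /** Handles one arriving packet and returns the sequence number in the ACK it sends back. */
    int onPacket(int seq, String data, boolean corrupt) {
        if (!corrupt && seq == expectedSeq) {
            delivered.add(data);          // new data: extract and deliver it
            expectedSeq = 1 - expectedSeq;
            return seq;                   // ACK the packet just received
        }
        // Corrupt packet or duplicate: re-ACK the last packet received OK.
        // The sender treats this duplicate ACK the way rdt2.1 treated a NAK.
        return 1 - expectedSeq;
    }
}
```

A corrupted packet 1 that arrives after packet 0 produces a second ACK 0, which tells the sender to resend packet 1.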
[Figure: rdt2.2 sender FSM]
[Figure: rdt2.2 receiver FSM]
1.3 rdt3.0: channels with errors and loss
- New assumption: the underlying channel can also lose packets (data or ACKs).
  - Checksums, sequence numbers, ACKs, and retransmissions will help, but are not enough.
- How do we handle loss?
  - Who is responsible for detecting packet loss? The sender.
  - It uses a timer.
  - How long should the sender wait before resending a packet it believes was lost?
1.3 rdt3.0: channels with errors and loss
- Approach: the sender waits a reasonable amount of time for an ACK.
  - If no ACK is received before the timer expires, the sender retransmits the packet.
  - This requires a countdown timer.
  - If the packet (or its ACK) was only delayed, not lost, the retransmission causes a duplicate packet, but sequence numbers already handle this.
  - The receiver must specify the sequence number of the packet being ACKed.
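rdt3.0 sender
- State "Wait for call 0 from above":
  - on rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer → "Wait for ACK0"
  - on rdt_rcv(rcvpkt): Λ → stay
- State "Wait for ACK0":
  - on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)): Λ → stay
  - on timeout: udt_send(sndpkt); start_timer → stay
  - on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0): stop_timer → "Wait for call 1 from above"
- State "Wait for call 1 from above":
  - on rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer → "Wait for ACK1"
  - on rdt_rcv(rcvpkt): Λ → stay
- State "Wait for ACK1":
  - on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,0)): Λ → stay
  - on timeout: udt_send(sndpkt); start_timer → stay
  - on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1): stop_timer → "Wait for call 0 from above"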
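The sketch below simulates rdt3.0's alternating-bit behaviour over a lossy channel. It assumes scripted losses: two sets of transmission indices whose data packet or ACK the channel drops. A timer expiry is modeled simply as "no ACK came back", so there is no real clock, no corruption, and no premature timeout. Rdt30Sim and run are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical simulation of rdt3.0's alternating-bit sender and receiver over a lossy channel. */
final class Rdt30Sim {
    /**
     * Sends each message in turn. lostData / lostAcks hold transmission indices (0, 1, 2, ...)
     * whose data packet / ACK the channel drops. Returns what the receiver delivers upward.
     */
    static List<String> run(List<String> messages, Set<Integer> lostData, Set<Integer> lostAcks) {
        List<String> delivered = new ArrayList<>();
        int senderSeq = 0;     // seq bit of the packet currently awaiting its ACK
        int expectedSeq = 0;   // receiver state: seq bit of the next new packet
        int transmission = 0;  // counts every udt_send() of a data packet
        for (String msg : messages) {
            boolean acked = false;
            while (!acked) {
                int t = transmission++;           // udt_send(sndpkt); start_timer
                if (lostData.contains(t)) {
                    continue;                     // packet lost: timeout, the loop resends it
                }
                if (senderSeq == expectedSeq) {   // new packet: deliver it
                    delivered.add(msg);
                    expectedSeq ^= 1;
                }                                 // otherwise a duplicate: discarded but re-ACKed
                acked = !lostAcks.contains(t);    // ACK lost -> timeout, resend
            }
            senderSeq ^= 1;                       // matching ACK: stop_timer, flip the seq bit
        }
        return delivered;
    }
}
```

Suppose the first transmission of "b" is lost (lostData = {1}) and the ACK for its second transmission is lost (lostAcks = {2}). Then "b" goes out three times, but the sequence-number check delivers it only once, and run returns [a, b, c].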
[Figures: rdt3.0 in action]
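2. Performance of rdt3.0
- rdt3.0 works, but its performance stinks.
- Example: 1 Gbps link, 15 ms end-to-end propagation delay, 1 KB packet:
  - transmission time $L/R = 8000\ \text{bits} / 10^9\ \text{bits/s} = 8\ \mu s = 0.008$ ms
  - $U_{\text{sender}}$ (utilization: the fraction of time the sender is busy sending) $= \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} \approx 0.00027$
  - one 1 KB packet every 30.008 ms → about 267 kb/s throughput over a 1 Gbps link
- The network protocol limits the use of the physical resources!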
rdt3.0: Performance of the stop-and-wait protocol
[Figure: sender/receiver timeline: first packet bit transmitted at t = 0; last packet bit transmitted at t = L/R; first packet bit arrives; last packet bit arrives and the receiver sends the ACK; the ACK arrives and the next packet is sent at t = RTT + L/R]
Pipelining: increased utilization
[Figure: sender/receiver timeline: first packet bit transmitted at t = 0, last bit of the first packet at t = L/R; the receiver sends an ACK as the last bit of each of the 1st, 2nd, and 3rd packets arrives; the first ACK arrives and the next packet is sent at t = RTT + L/R]
- $U_{\text{sender}} = \frac{3L/R}{RTT + L/R} = \frac{0.024}{30.008} \approx 0.0008$
- Utilization increases by a factor of 3!
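The utilization arithmetic can be checked with a few lines of Java. The sketch assumes 1 KB = 8000 bits, which is consistent with the 30.008 ms cycle in the example (a 30 ms RTT plus 0.008 ms of transmission time). It also assumes that packetsPerRtt packets are sent back-to-back in each RTT + L/R cycle. Utilization is a hypothetical helper name.

```java
/** Hypothetical helper: sender utilization and throughput for stop-and-wait or pipelining. */
final class Utilization {
    /**
     * packetsPerRtt: packets sent back-to-back before waiting (1 = stop-and-wait)
     * lBits: packet size L in bits; rBps: link rate R in bits/s; rttSec: round-trip time in s
     */
    static double utilization(int packetsPerRtt, double lBits, double rBps, double rttSec) {
        double transmit = lBits / rBps;                    // L/R
        return packetsPerRtt * transmit / (rttSec + transmit);
    }

    /** Effective throughput in bits/s: packetsPerRtt * L bits every RTT + L/R seconds. */
    static double throughput(int packetsPerRtt, double lBits, double rBps, double rttSec) {
        return packetsPerRtt * lBits / (rttSec + lBits / rBps);
    }
}
```

`utilization(1, 8000, 1e9, 0.030)` gives about 0.00027, and `throughput(1, 8000, 1e9, 0.030)` gives about 266,600 bit/s (≈ 267 kb/s). With packetsPerRtt = 3, both values triple.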
2. Pipelined protocols
- Pipelining: the sender allows multiple in-flight, yet-to-be-acknowledged packets.
  - The range of sequence numbers must be increased.
  - Buffering is needed at the sender and/or the receiver.
2. Pipelined protocols
- Two basic approaches to pipelined error recovery:
  - Go-Back-N: if a packet is lost, the sender retransmits all yet-to-be-ACKed packets in the window.
  - Selective Repeat: if a packet is lost, the sender retransmits only the lost packet.
3. Go-Back-N
- Sender:
  - k-bit sequence number in the packet header
  - a window of up to N consecutive unACKed packets is allowed
- The sender's view of the sequence numbers:
  - [0, send_base-1]: already transmitted and ACKed
  - [send_base, nextseqnum-1]: sent but not yet ACKed
  - [nextseqnum, send_base+N-1]: usable for packets that can be sent right now
  - [send_base+N, Max_SeqNum]: cannot be used until an unACKed packet in the pipeline has been ACKed
Sliding-window protocol
- Window: the range of permissible sequence numbers for sent-but-not-yet-ACKed packets, within the sequence-number space; the window size is N.
- Window sliding: as the protocol operates, the window slides forward over the sequence-number space.
3. Go-Back-N (GBN) Sender
- Invocation from above: check whether the window is full.
  - Yes: return the data to the upper layer.
  - No: create and send the packet.
- Receipt of an ACK(n):
  - cumulative ACK: ACK(n) means that all packets up to and including sequence number n have been correctly received
  - may receive duplicate ACKs (see receiver)
  - the window slides forward over the sequence-number space
- Timeout(n): retransmit packet n and all higher-sequence-number packets in the window.
GBN sender (one timer): extended FSM
- rdt_send(data):
    if (nextseqnum < send_base + N) {
        sndpkt[nextseqnum] = make_pkt(nextseqnum, data, chksum)
        udt_send(sndpkt[nextseqnum])
        if (send_base == nextseqnum)
            start_timer
        nextseqnum = (nextseqnum + 1) mod 2^k
    } else
        refuse_data(data)
- timeout:
    start_timer
    udt_send(sndpkt[send_base])
    udt_send(sndpkt[send_base + 1])
    ...
    udt_send(sndpkt[nextseqnum - 1])
- rdt_rcv(rcvpkt) && corrupt(rcvpkt):
    Λ
- rdt_rcv(rcvpkt) && notcorrupt(rcvpkt):
    send_base = getacknum(rcvpkt) + 1
    if (send_base == nextseqnum)
        stop_timer
    else
        start_timer
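A minimal Java sketch of the extended FSM above follows. To keep it short, it makes several simplifications:
- sequence numbers are unbounded (no mod 2^k wraparound);
- the timer is modeled as a flag;
- each udt_send() is recorded in a list instead of being transmitted;
- ACKs outside [send_base, nextseqnum-1] are ignored.

GbnSender is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the GBN sender: one timer, cumulative ACKs. */
final class GbnSender {
    private final int n;                    // window size N
    private int base = 0;                   // send_base
    private int nextSeqNum = 0;             // nextseqnum
    private boolean timerRunning = false;
    private final List<String> sndpkt = new ArrayList<>(); // copies kept for retransmission
    final List<Integer> sent = new ArrayList<>();           // log of udt_send() calls, by seq

    GbnSender(int n) { this.n = n; }

    /** rdt_send(data); returns false for refuse_data(data) when the window is full. */
    boolean rdtSend(String data) {
        if (nextSeqNum >= base + n) {
            return false;
        }
        sndpkt.add(data);                   // sndpkt[nextseqnum] = make_pkt(...)
        sent.add(nextSeqNum);               // udt_send(sndpkt[nextseqnum])
        if (base == nextSeqNum) {
            timerRunning = true;            // start_timer
        }
        nextSeqNum++;
        return true;
    }

    /** timeout: start_timer, then resend sndpkt[base] ... sndpkt[nextseqnum-1]. */
    void timeout() {
        timerRunning = true;
        for (int seq = base; seq < nextSeqNum; seq++) {
            sent.add(seq);
        }
    }

    /** Uncorrupted ACK carrying acknum: cumulative, so base moves to acknum + 1. */
    void ack(int acknum) {
        if (acknum < base || acknum >= nextSeqNum) {
            return;                         // stale or bogus ACK: ignored in this sketch
        }
        base = acknum + 1;
        timerRunning = base != nextSeqNum;  // stop_timer if nothing is outstanding, else start_timer
    }

    int base() { return base; }
    boolean timerRunning() { return timerRunning; }
}
```

With N = 3, a fourth packet is refused until ACK 0 slides the window. A timeout then resends every packet from base to nextseqnum-1, which is the "go back N" behaviour.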
GBN receiver: extended FSM
- ACK-only: always send an ACK for the correctly received packet with the highest in-order sequence number.
  - may generate duplicate ACKs
  - needs to remember only expectedseqnum
- Out-of-order packet:
  - discard it (don't buffer) → no receiver buffering!
  - re-ACK the packet with the highest in-order sequence number
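The matching receiver needs to keep only expectedseqnum. In this hypothetical sketch, the return value is the sequence number the receiver ACKs. A return of -1 before anything has arrived in order is a convention of the sketch only. Out-of-order and corrupt packets are simply discarded.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the GBN receiver: its only state is expectedseqnum, with no buffering. */
final class GbnReceiver {
    private int expectedSeqNum = 0;
    final List<String> delivered = new ArrayList<>();

    /** Handles one packet and returns the seq number it ACKs (-1: nothing in order yet). */
    int onPacket(int seq, String data, boolean corrupt) {
        if (!corrupt && seq == expectedSeqNum) {
            delivered.add(data);          // extract and deliver_data
            expectedSeqNum++;
        }
        // Out-of-order or corrupt packets are discarded; either way the receiver (re-)ACKs
        // the highest in-order sequence number, which may be a duplicate ACK.
        return expectedSeqNum - 1;
    }
}
```

If packet 1 is lost, packet 2 is thrown away and answered with a duplicate ACK 0. The sender must later resend both 1 and 2.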
[Figure: GBN in action]
GBN final words
- Extended FSM and event-based programming: an event procedure is called
  - by other procedures in the protocol stack, or
  - as the result of an interrupt, such as a timer expiring.
- GBN incorporates almost all of the techniques of reliable data transfer: sequence numbers, cumulative ACKs, checksums, and timers.
- However, a single loss can cause many packets to be retransmitted needlessly, so GBN suffers from performance problems.
4. Selective Repeat (SR)
- The receiver individually acknowledges all correctly received packets.
  - Out-of-order packets are buffered for eventual in-order delivery to the upper layer.
- The sender resends only the packets for which no ACK was received.
  - The sender keeps a timer for each unACKed packet.
- Sender window:
  - N consecutive sequence numbers
  - again limits the sequence numbers of sent, unACKed packets
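The receiver-side buffering described above can be sketched as follows, again with unbounded sequence numbers. Packets in [rcv_base, rcv_base+N-1] are ACKed and buffered. As an assumption not stated on this slide, packets in [rcv_base-N, rcv_base-1] are ACKed again, because their earlier ACKs may have been lost. Everything else is ignored. SrReceiver is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch of a Selective Repeat receiver with unbounded sequence numbers. */
final class SrReceiver {
    private final int n;                                     // window size N
    private int rcvBase = 0;
    private final TreeMap<Integer, String> buffer = new TreeMap<>();
    final List<String> delivered = new ArrayList<>();

    SrReceiver(int n) { this.n = n; }

    /** Returns true if an ACK(seq) is sent back, false if the packet is ignored. */
    boolean onPacket(int seq, String data) {
        if (seq >= rcvBase && seq < rcvBase + n) {            // inside the receive window
            buffer.putIfAbsent(seq, data);                    // buffer (a duplicate is kept once)
            while (buffer.containsKey(rcvBase)) {             // deliver any in-order run
                delivered.add(buffer.remove(rcvBase));
                rcvBase++;
            }
            return true;                                      // individual ACK(seq)
        }
        // Already delivered, but the earlier ACK may have been lost: ACK it again.
        return seq >= rcvBase - n && seq < rcvBase;
    }

    int rcvBase() { return rcvBase; }
}
```

Packets 1 and 2 that arrive before 0 are ACKed and held. When 0 arrives, all three are delivered in order.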
[Figure: sender and receiver views of the sequence-number space]
Selective Repeat
- Sender:
  - data from above: if the next available sequence number is in the window, send the packet
  - timeout(n): resend packet n and restart timer n
  - ACK(n) in [sendbase, sendbase+N-1]:
    - mark packet n as received
    - if n is the smallest unACKed packet, advance the window base to the next unACKed sequence number
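The sender events above can be sketched as follows. The per-packet timers are not modeled as clocks: the caller invokes timeout(n) for whichever packet's timer expired. Sequence numbers are unbounded, and SrSender is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of a Selective Repeat sender (one logical timer per unACKed packet). */
final class SrSender {
    private final int n;                                   // window size N
    private int sendBase = 0;
    private int nextSeqNum = 0;
    private final TreeSet<Integer> acked = new TreeSet<>(); // ACKed seqs at or above sendBase
    final List<Integer> sent = new ArrayList<>();           // log of transmissions

    SrSender(int n) { this.n = n; }

    /** Data from above: send only if the next sequence number is inside the window. */
    boolean send() {
        if (nextSeqNum >= sendBase + n) {
            return false;
        }
        sent.add(nextSeqNum++);           // udt_send and start the timer for this packet
        return true;
    }

    /** timeout(seq): resend only that packet (and restart its timer). */
    void timeout(int seq) {
        if (seq >= sendBase && seq < nextSeqNum && !acked.contains(seq)) {
            sent.add(seq);
        }
    }

    /** ACK(seq) in [sendBase, sendBase+N-1]: mark it, then slide past any ACKed prefix. */
    void ack(int seq) {
        if (seq < sendBase || seq >= nextSeqNum) {
            return;                       // outside the window or never sent: ignored
        }
        acked.add(seq);
        while (acked.remove(sendBase)) {  // advance to the next unACKed sequence number
            sendBase++;
        }
    }

    int sendBase() { return sendBase; }
}
```

If ACK 1 arrives before ACK 0, the window base stays at 0. A timeout for packet 0 resends only packet 0, and its ACK then moves the base straight to 2.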
[Figure: Selective Repeat in action]
4. Selective Repeat: dilemma
- Example:
  - sequence numbers 0, 1, 2, 3; window size N = 3
  - the receiver cannot tell the two scenarios (a) and (b) apart!
  - in (a), it incorrectly passes duplicate data up as new data
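4. Selective Repeat: dilemma
- Q: What relationship must hold between the size of the sequence-number space and the window size?
- Worst case: the receiver has received and ACKed a whole window, so rcv_base = send_base + N, but all of those ACKs were lost.
  - The sender may still retransmit any packet in [send_base, send_base+N-1], while the receiver accepts [send_base+N, send_base+2N-1].
  - Together these ranges span [send_base, send_base+2N-1], i.e., 2N sequence numbers.
- So, with k-bit sequence numbers, we must guarantee $2N \le 2^k$, i.e., $N \le 2^{k-1}$.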
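The worst-case argument can be checked by brute force. The hypothetical helper below takes the worst case described above: rcv_base = send_base + N, with the old packets [send_base, send_base+N-1] still retransmittable. It reports whether any old sequence number, reduced mod 2^k, collides with one the receiver would accept as new.

```java
/** Hypothetical check of the SR window-size constraint by brute force. */
final class SrWindowCheck {
    /** True if, with k-bit sequence numbers and window n, an old packet can look like a new one. */
    static boolean ambiguous(int k, int n) {
        int space = 1 << k;                  // number of distinct sequence numbers, 2^k
        int sendBase = 0;
        int rcvBase = sendBase + n;          // receiver has ACKed the whole window
        for (int old = sendBase; old < sendBase + n; old++) {         // may be retransmitted
            for (int fresh = rcvBase; fresh < rcvBase + n; fresh++) { // accepted as new data
                if (old % space == fresh % space) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

`ambiguous(2, 3)` is true, which is the N = 3, sequence-numbers-0-to-3 example. `ambiguous(2, 2)` is false. In general, the result is true exactly when N > 2^(k-1).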
Reliable data transfer summary
[Table: summary of reliable data transfer mechanisms]
Chapter 3 Roadmap
- 3.1 Introduction and Transport-layer services
- 3.2 Multiplexing and Demultiplexing
- 3.3 Connectionless transport UDP
- 3.4 Principles of reliable data transfer
- 3.5 Connection-oriented transport TCP
- 3.6 Principles of congestion control
- 3.7 TCP congestion control
- 3.8 Summary
3.5 Connection-Oriented Transport Service: TCP
Introduction
- point-to-point, full-duplex connection:
  - one sender, one receiver
  - bidirectional data flow in the same connection
- reliable, in-order byte stream:
  - no message boundaries
- pipelined:
  - TCP congestion control and flow control together set the window size
1. The TCP Connection
- Connection-oriented:
  - a 3-way handshake, i.e., an exchange of control messages, establishes the connection
  - it initializes the sender and receiver state before data is exchanged
- Send and receive buffers:
  - the application passes a stream of data through the socket to TCP
  - TCP directs the data to the connection's send buffer
  - the specification does not say when the buffered data is sent; TCP sends it in segments at its own convenience
1. The TCP Connection
- Overview:
  - A TCP connection consists of buffers, variables (such as the MSS), and a socket connection to a process in each host.
  - No buffers or variables are allocated to the connection in the network elements (routers) between the hosts.
- Maximum transmission unit (MTU): the maximum size of a layer-3 PDU (packet); 1500 bytes on Ethernet.
- Maximum segment size (MSS): the maximum amount of application data (payload) in a segment; usually 1460 bytes on Ethernet, i.e., 1500 bytes minus 20-byte IP and 20-byte TCP headers.
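3.5.2 TCP segment structure (each row is 32 bits)
  | source port # (16 bits)                 | dest port # (16 bits)        |
  | sequence number (32 bits)                                               |
  | acknowledgement number (32 bits)                                        |
  | head len | not used | U A P R S F      | receive window (16 bits)     |
  | checksum (16 bits)                      | urg data pointer (16 bits)   |
  | options (variable length)                                               |
  | application data (variable length, at most MSS)                         |
- Flag bits: U = URG, A = ACK, P = PSH, R = RST, S = SYN, F = FIN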
2.2.1 Sequence and ACK numbers
- Sequence number:
  - TCP views data as an unstructured, ordered byte stream.
  - A segment's sequence number is the byte-stream number of the first byte in the segment.
- ACK number:
  - the sequence number of the next byte expected from the other side
  - cumulative ACK
- Both sides of a TCP connection randomly choose an initial sequence number.
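The numbering rules can be illustrated with a small hypothetical helper. It assumes the first data byte is numbered firstSeq and that every segment except possibly the last carries exactly mss bytes. Numbers wrap modulo 2^32 because the sequence-number field is 32 bits wide.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: sequence numbers of a byte stream's segments and the cumulative ACK. */
final class TcpSeqNumbers {
    static final long MOD = 1L << 32; // the sequence-number field is 32 bits wide

    /** firstSeq numbers the first data byte; each segment carries at most mss bytes. */
    static List<Long> segmentSeqs(long firstSeq, long streamBytes, int mss) {
        List<Long> seqs = new ArrayList<>();
        for (long offset = 0; offset < streamBytes; offset += mss) {
            seqs.add((firstSeq + offset) % MOD); // seq = number of the segment's first byte
        }
        return seqs;
    }

    /** Cumulative ACK after bytesInOrder bytes have arrived in order: the next byte expected. */
    static long cumulativeAck(long firstSeq, long bytesInOrder) {
        return (firstSeq + bytesInOrder) % MOD;
    }
}
```

`segmentSeqs(42, 2500, 1000)` returns [42, 1042, 2042], and the receiver then ACKs 2542. A single byte sent with Seq=42 is acknowledged with ACK=43.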
2.2.1 Sequence and ACK numbers
- Q: How does the receiver handle out-of-order segments?
  - The TCP specification doesn't say; it is up to the implementer. The receiver may either:
    - immediately discard out-of-order segments, or
    - buffer the out-of-order segments (the choice taken in practice).
2.2.2 A Case Study - Telnet
[Figure: simple telnet scenario between Host A and Host B]
- The user types 'C': Host A sends Seq=42, ACK=79, data = 'C'.
- Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data = 'C' (the ACK is piggybacked on the echo).
- Host A ACKs receipt of the echoed 'C': Seq=43, ACK=80.
3. TCP Round Trip Time and Timeout
- Q: How should TCP set the timeout interval?
  - It must be at least longer than the RTT, but the RTT varies → RTT estimation.
  - Too short: premature timeouts and unnecessary retransmissions.
  - Too long: slow reaction to segment loss.
3. TCP Round Trip Time and Timeout
- Q: How do we estimate the RTT?
  - SampleRTT: the measured time from segment transmission until ACK receipt
    - measured for one segment at a time, i.e., roughly one measurement per RTT
    - ignores retransmitted segments. Why?
  - SampleRTT varies, so we want a smoother estimated RTT:
    - average several recent measurements, not just the current SampleRTT → EstimatedRTT
3.1 Estimating the Round-Trip Time
  $\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$
- Exponential weighted moving average (EWMA):
  - the influence of a past sample decreases exponentially fast
  - typical value (RFC 2988): $\alpha = 0.125$
3.1 Estimating the Round-Trip Time
[Figure: SampleRTT and EstimatedRTT over time]
3.1 Estimating the Round-Trip Time
- Setting the timeout:
  - EstimatedRTT plus a safety margin
  - large variation in EstimatedRTT → larger safety margin
- First, estimate how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$ (typically $\beta = 0.25$)
- Then set the timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$
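The three formulas can be combined into a small estimator. The sketch follows RFC 2988 in two details that the slides leave open:
- the first sample initializes EstimatedRTT = SampleRTT and DevRTT = SampleRTT/2;
- DevRTT is updated with the old EstimatedRTT before EstimatedRTT itself is updated.

It omits the RFC's clock-granularity term and its minimum timeout. RttEstimator is a hypothetical name, and the unit (milliseconds in the example) is up to the caller.

```java
/** Hypothetical sketch of TCP's RTT estimation and timeout formulas (alpha = 0.125, beta = 0.25). */
final class RttEstimator {
    private static final double ALPHA = 0.125;
    private static final double BETA = 0.25;
    private double estimatedRtt;
    private double devRtt;
    private boolean hasSample = false;

    void addSample(double sampleRtt) {
        if (!hasSample) {                 // first measurement: RFC 2988-style initialization
            estimatedRtt = sampleRtt;
            devRtt = sampleRtt / 2;
            hasSample = true;
            return;
        }
        // DevRTT uses EstimatedRTT from before this sample; then EstimatedRTT is updated (EWMA).
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRtt - estimatedRtt);
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRtt;
    }

    double timeoutInterval() {
        if (!hasSample) {
            throw new IllegalStateException("no RTT sample yet");
        }
        return estimatedRtt + 4 * devRtt;
    }

    double estimatedRtt() { return estimatedRtt; }
    double devRtt() { return devRtt; }
}
```

Consider samples of 100, 100, and 200 ms. After the first, the timeout is 100 + 4·50 = 300 ms. After the second, it shrinks to 250 ms as DevRTT decays. After the third, EstimatedRTT = 112.5 ms and DevRTT = 53.125 ms.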
4. TCP reliable data transfer
- TCP creates a reliable data transfer service on top of IP's unreliable service:
  - it ensures that the byte stream delivered to the receiver is exactly the byte stream that was sent
  - cumulative ACKs
  - a single retransmission timer
4. TCP reliable data transfer
- Retransmissions are triggered by:
  - timeout events
  - duplicate ACKs
- Initially, consider a simplified TCP sender that:
  - ignores duplicate ACKs
  - ignores flow control and congestion control
4.1 TCP sender events
- Data received from the application:
  - create a segment with a sequence number; the sequence number is the byte-stream number of the first data byte in the segment
  - start the timer if it is not already running (think of the timer as being for the oldest unACKed segment); expiration interval: TimeoutInterval
4.1 TCP sender events
- Timeout:
  - retransmit the segment that caused the timeout
  - restart the timer
- ACK received:
  - if it acknowledges previously unACKed segments:
    - update what is known to be ACKed (SendBase)
    - start the timer if there are still outstanding segments
4.1 TCP Sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number (i.e., SendBase)
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

- Comment:
  - SendBase-1 is the last cumulatively ACKed byte.
  - Example: SendBase-1 = 71 and y = 73, so the receiver wants byte 73 and beyond; y > SendBase, so new data is ACKed.
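The pseudocode maps directly onto three event handlers. This hypothetical sketch logs the sequence number of each segment handed to IP instead of sending anything. It models the timer as a flag and compares sequence numbers as plain longs, ignoring 32-bit wraparound.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the simplified TCP sender (no dup-ACK handling, flow or congestion control). */
final class SimpleTcpSender {
    private long nextSeqNum;
    private long sendBase;
    private boolean timerRunning = false;
    final List<Long> transmissions = new ArrayList<>(); // seq of every segment passed to IP

    SimpleTcpSender(long initialSeqNum) {
        nextSeqNum = initialSeqNum;
        sendBase = initialSeqNum;
    }

    /** event: data received from application above. */
    void onAppData(int length) {
        transmissions.add(nextSeqNum);  // create segment with seq NextSeqNum, pass it to IP
        if (!timerRunning) {
            timerRunning = true;        // start timer
        }
        nextSeqNum += length;
    }

    /** event: timer timeout; the oldest unACKed segment starts at SendBase. */
    void onTimeout() {
        transmissions.add(sendBase);
        timerRunning = true;            // start timer
    }

    /** event: ACK received, with ACK field value y (cumulative). */
    void onAck(long y) {
        if (y > sendBase) {
            sendBase = y;
            timerRunning = sendBase < nextSeqNum; // keep timing only if unACKed data remains
        }
    }

    long sendBase() { return sendBase; }
    boolean timerRunning() { return timerRunning; }
}
```

With InitialSeqNum = 92, sending 8 and then 20 bytes produces segments 92 and 100. ACK 100 moves SendBase to 100 with the timer still running, ACK 120 stops the timer, and a late ACK 100 is then ignored.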
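4.1 TCP retransmission scenarios
1) Lost ACK scenario (duplicate segment):
  - Host A sends Seq=92 with 8 bytes of data; Host B's ACK=100 is lost.
  - A's timer for Seq=92 expires, so A retransmits Seq=92, 8 bytes of data.
  - B discards the duplicate bytes and sends ACK=100 again; A's SendBase becomes 100.
2) Premature timeout:
  - Host A sends Seq=92 (8 bytes) and Seq=100 (20 bytes); B replies with ACK=100 and ACK=120.
  - A's timer for Seq=92 expires before these ACKs arrive, so A retransmits Seq=92 (8 bytes).
  - The ACKs then arrive: SendBase = 100, then SendBase = 120. B answers the retransmitted segment with another ACK=120, and segment 100 is not retransmitted.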
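4.1 TCP retransmission scenarios
3) Cumulative ACK scenario:
  - Host A sends Seq=92 (8 bytes) and Seq=100 (20 bytes).
  - B's ACK=100 is lost, but ACK=120 arrives before A's timer expires.
  - Because ACKs are cumulative, ACK=120 covers both segments: SendBase = 120 and nothing is retransmitted.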
4.2 Doubling the timeout interval
- On a timeout event, TCP:
  - retransmits the not-yet-acknowledged segment with the smallest sequence number;
  - sets the next timeout interval to twice the previous value, rather than deriving it from the latest EstimatedRTT and DevRTT.
- This provides a limited form of congestion control: it slows down the sender's rate rather than retransmitting packets persistently.
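The doubling rule can be sketched as a tiny state holder. It assumes that the interval returns to EstimatedRTT + 4·DevRTT once the sender derives a fresh value from its estimates, for example after new data or a new ACK. TimeoutBackoff is a hypothetical name, and no upper bound on the interval is applied.

```java
/** Hypothetical sketch of timeout doubling; intervals are in seconds. */
final class TimeoutBackoff {
    private double intervalSec;

    TimeoutBackoff(double initialIntervalSec) {
        intervalSec = initialIntervalSec;
    }

    double interval() { return intervalSec; }

    /** Timeout event: after retransmitting (not modeled), double the interval. */
    void onTimeout() {
        intervalSec *= 2;
    }

    /** Other events: derive the interval from the latest estimates again. */
    void onFreshEstimate(double estimatedRttSec, double devRttSec) {
        intervalSec = estimatedRttSec + 4 * devRttSec;
    }
}
```

Starting from 0.75 s, two consecutive timeouts give 1.5 s and then 3.0 s. A fresh estimate with EstimatedRTT = 0.1 s and DevRTT = 0.05 s brings the interval back to 0.3 s.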
4.3 Fast Retransmit
- The timeout period is often relatively long, causing a long delay before a lost packet is resent.
- Detect lost segments via duplicate ACKs:
  - the sender often sends many segments back-to-back;
  - if a segment is lost, there will likely be many duplicate ACKs.
TCP ACK Generation [RFC 1122, RFC 2581]
- Event at receiver: arrival of an in-order segment with the expected sequence number; all data up to the expected sequence number already ACKed.
  TCP receiver action: delayed ACK. Wait up to 500 ms for the next segment; if no next segment arrives, send an ACK.
- Event at receiver: arrival of an in-order segment with the expected sequence number; one other in-order segment has an ACK pending.
  TCP receiver action: immediately send a single cumulative ACK, ACKing both in-order segments.
- Event at receiver: arrival of an out-of-order segment with a higher-than-expected sequence number; gap detected.
  TCP receiver action: immediately send a duplicate ACK, indicating the sequence number of the next expected byte.
- Event at receiver: arrival of a segment that partially or completely fills the gap.
  TCP receiver action: immediately send an ACK, provided that the segment starts at the lower end of the gap.
4.3 Fast Retransmit
- If the sender receives 3 duplicate ACKs for the same data, it assumes that the segment following the ACKed data was lost.
- Fast retransmit: resend that segment before its timer expires.
4.3 Fast Retransmit
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {  /* a duplicate ACK for already ACKed data */
        increment count of duplicate ACKs received for y
        if (count of duplicate ACKs received for y == 3) {
            /* fast retransmit */
            resend segment with sequence number y
        }
    }
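The ACK-received event can be sketched with an explicit duplicate counter. Timer handling is left out. One assumption goes beyond the pseudocode: only ACKs equal to SendBase count as duplicates, while older (reordered) ACKs are ignored. FastRetransmitSender is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the ACK-received event with duplicate-ACK counting. */
final class FastRetransmitSender {
    private long sendBase;
    private int dupAckCount = 0;                      // duplicates seen for y == sendBase
    final List<Long> fastRetransmits = new ArrayList<>();

    FastRetransmitSender(long sendBase) {
        this.sendBase = sendBase;
    }

    void onAck(long y) {
        if (y > sendBase) {                           // ACKs new data
            sendBase = y;
            dupAckCount = 0;                          // (timer restart omitted in this sketch)
        } else if (y == sendBase) {                   // duplicate ACK for already-ACKed data
            dupAckCount++;
            if (dupAckCount == 3) {
                fastRetransmits.add(y);               // resend segment y before the timer fires
            }
        }
        // y < sendBase: a stale, reordered ACK; ignored here
    }
}
```

The first ACK 120 is new. The next two are duplicates and trigger nothing, and the third duplicate triggers one fast retransmit of the segment starting at byte 120.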
4.4 Is TCP GBN or SR?
- The sender maintains only SendBase and NextSeqNum, and the receiver uses cumulative ACKs, so TCP looks like a GBN protocol.
- Differences between TCP and GBN:
  - most TCP receivers buffer out-of-order segments;
  - on a timeout, the sender retransmits only the segment at SendBase;
  - the sender doesn't retransmit segment n if the ACK for segment n+1 arrives before segment n's timeout.
4.4 Is TCP GBN or SR?
- RFC 2018 (SACK): selective acknowledgment
  - the receiver selectively ACKs out-of-order segments
  - the sender selectively retransmits segments, skipping those already ACKed
  - this looks like SR
- TCP's error-recovery mechanism is a hybrid of GBN and SR.
5. Flow Control
- The receiving side of a TCP connection has a receive buffer.
- The application process may read from the buffer more slowly than the sender sends.
5. Flow Control
- Speed-matching service: matches the sending rate to the receiving application's draining rate.
- The sender maintains a variable called the receive window, which indicates how much free buffer space is available at the receiver.
5. TCP Flow control: how does it work?
- Spare room in the receiver's buffer:
  $\text{RcvWindow} = \text{RcvBuffer} - [\text{LastByteRcvd} - \text{LastByteRead}]$
- http://wps.aw.com/aw_kurose_network_3/0,9212,1406346-,00.html
5. TCP Flow control: how does it work?
- The receiver advertises the spare room in its buffer by including RcvWindow in the segment header.
- The sender limits its unACKed data to RcvWindow.
- A problem: suppose host A (the receiver) advertises RcvWindow = 0 to host B. B must then stop sending, and if A has nothing to send to B, A never gets a chance to tell B that its buffer has since been emptied.
- Solution: the TCP specification requires B to keep sending segments with one data byte while A's RcvWindow is 0. A's ACKs for these segments eventually advertise a nonzero window.
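The bookkeeping on both sides reduces to two subtractions. This hypothetical helper assumes byte counters that do not wrap. It encodes the sender's rule as LastByteSent - LastByteAcked ≤ RcvWindow; LastByteSent and LastByteAcked are names introduced here for the sender's counters.

```java
/** Hypothetical sketch of TCP flow-control bookkeeping (all values in bytes). */
final class FlowControl {
    /** Receiver: spare room advertised as RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]. */
    static long rcvWindow(long rcvBuffer, long lastByteRcvd, long lastByteRead) {
        long buffered = lastByteRcvd - lastByteRead;   // received but not yet read by the app
        if (buffered < 0 || buffered > rcvBuffer) {
            throw new IllegalArgumentException("inconsistent counters or buffer overflow");
        }
        return rcvBuffer - buffered;
    }

    /** Sender: new bytes it may still send while keeping LastByteSent - LastByteAcked <= RcvWindow. */
    static long sendable(long lastByteSent, long lastByteAcked, long rcvWindow) {
        long inFlight = lastByteSent - lastByteAcked;
        return Math.max(0, rcvWindow - inFlight);
    }
}
```

With a 4096-byte buffer holding 2000 unread bytes, the advertised window is 2096 bytes. A sender that already has 1000 bytes in flight may add 1096 more. When `rcvWindow` returns 0, `sendable` is 0 as well, and this is exactly when the sender must fall back to one-byte segments to learn when space frees up.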
6. TCP Connection Management
- Recall: TCP sender and receiver establish a connection before exchanging data:
- initialize sequence numbers, buffers, and flow-control info (e.g. RcvWindow)
- the client initiates the connection:
- Socket clientSocket = new Socket(host, port);
- the server accepts it:
- Socket connectionSocket = welcomeSocket.accept();
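The two socket calls above can be exercised end to end with a small, self-contained Java program. It assumes both ends run on one machine (loopback) and that the server echoes one line in upper case, in the spirit of the textbook's TCPClient/TCPServer pair; the class and method names are hypothetical. The operating system performs the three-way handshake inside `new Socket(...)` and `accept()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch: one client-server exchange on the loopback interface.
public class ConnectDemo {

    public static String roundTrip(String message) {
        try (ServerSocket welcomeSocket = new ServerSocket(0)) { // port 0: any free port
            Thread server = new Thread(() -> {
                try (Socket connectionSocket = welcomeSocket.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(connectionSocket.getInputStream()));
                     PrintWriter out = new PrintWriter(connectionSocket.getOutputStream(), true)) {
                    out.println(in.readLine().toUpperCase()); // reply with the line in upper case
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            server.start();
            try (Socket clientSocket = new Socket(InetAddress.getLoopbackAddress(),
                                                  welcomeSocket.getLocalPort());
                 PrintWriter out = new PrintWriter(clientSocket.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(clientSocket.getInputStream()))) {
                out.println(message);
                String reply = in.readLine();
                server.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello")); // HELLO
    }
}
```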
6.1 TCP Connection Management
- Step 1: the client sends a SYN segment (SYN = 1, no data)
- the initial sequence number client_isn is chosen randomly
- Step 2: the server replies with a SYNACK segment (SYN = 1, ACK = 1, no data)
- allocates buffers and variables
- acknowledgment number = client_isn + 1
- randomly chooses its own initial sequence number server_isn
- Step 3: the client replies with an ACK segment (ACK = 1), which may carry data (piggybacked)
- the client allocates buffers and variables
- acknowledgment number = server_isn + 1
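The header values exchanged in the three steps can be traced with a hypothetical Java model that keeps only the fields the slide mentions (SYN bit, ACK bit, sequence number, acknowledgment number). Sequence numbers are treated as 32-bit values that wrap; the uniform random ISNs in `main` are a simplification.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical model of the handshake's header fields.
public class ThreeWayHandshake {
    static final long SEQ_SPACE = 1L << 32; // 32-bit sequence numbers wrap around

    public static final class Segment {
        public final boolean syn, ack;
        public final long seq, ackNum;

        Segment(boolean syn, boolean ack, long seq, long ackNum) {
            this.syn = syn; this.ack = ack; this.seq = seq; this.ackNum = ackNum;
        }

        @Override public String toString() {
            return "SYN=" + (syn ? 1 : 0) + " ACK=" + (ack ? 1 : 0) + " seq=" + seq + " ack=" + ackNum;
        }
    }

    // Step 1: client sends SYN carrying its randomly chosen client_isn.
    public static Segment syn(long clientIsn) {
        return new Segment(true, false, clientIsn, 0);
    }

    // Step 2: server acknowledges client_isn + 1 and announces server_isn.
    public static Segment synAck(Segment syn, long serverIsn) {
        return new Segment(true, true, serverIsn, (syn.seq + 1) % SEQ_SPACE);
    }

    // Step 3: client acknowledges server_isn + 1; its own sequence number is client_isn + 1.
    public static Segment ack(Segment synAck) {
        return new Segment(false, true, synAck.ackNum, (synAck.seq + 1) % SEQ_SPACE);
    }

    public static void main(String[] args) {
        ThreadLocalRandom rng = ThreadLocalRandom.current();
        Segment s1 = syn(rng.nextLong(SEQ_SPACE));
        Segment s2 = synAck(s1, rng.nextLong(SEQ_SPACE));
        Segment s3 = ack(s2);
        System.out.println(s1 + "\n" + s2 + "\n" + s3);
    }
}
```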
6.1 TCP Connection Management
6.2 Closing a Connection
- Step 1: the client sends a FIN segment.
- Step 2: the server replies with an ACK and a FIN segment.
- Step 3: the client replies with an ACK segment.
- it enters a timed wait, during which it responds with an ACK to any received FIN
- typically 30 s
- Step 4: the server receives the ACK, and the connection is closed.
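The client's side of the close can be sketched as a small state machine. The state names follow the standard TCP state machine (RFC 793), as used in client lifecycle diagrams; the event names and class are hypothetical, and the server's ACK and FIN are modeled as two separate arrivals.

```java
// Sketch of the client's side of connection teardown.
public class ClientClose {
    public enum State { ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, CLOSED }
    public enum Event { APP_CLOSE, RECV_ACK, RECV_FIN, TIMER_EXPIRED }

    public static State next(State s, Event e) {
        switch (s) {
            case ESTABLISHED:
                if (e == Event.APP_CLOSE) return State.FIN_WAIT_1;  // Step 1: send FIN
                break;
            case FIN_WAIT_1:
                if (e == Event.RECV_ACK) return State.FIN_WAIT_2;   // Step 2: server's ACK
                break;
            case FIN_WAIT_2:
                if (e == Event.RECV_FIN) return State.TIME_WAIT;    // Step 3: send ACK, start timer
                break;
            case TIME_WAIT:
                // A repeated FIN is ACKed again without changing state.
                if (e == Event.TIMER_EXPIRED) return State.CLOSED;  // e.g. after 30 s
                break;
            default:
                break;
        }
        return s; // any other event leaves the state unchanged
    }

    public static void main(String[] args) {
        State s = State.ESTABLISHED;
        Event[] events = {Event.APP_CLOSE, Event.RECV_ACK, Event.RECV_FIN,
                          Event.RECV_FIN, Event.TIMER_EXPIRED};
        for (Event e : events) {
            s = next(s, e);
            System.out.println(e + " -> " + s);
        }
    }
}
```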
6.3 TCP Connection Management
Figure: TCP server lifecycle and TCP client lifecycle
3.5 Connection-Oriented Transport: TCP
- connection-oriented:
- point-to-point and full-duplex
- allocates buffers and variables and initializes the sequence numbers
- TCP segment structure
- RTT estimation and timeout
- reliable data transfer:
- doubling the timeout interval
- fast retransmit
- a hybrid of GBN and SR
- flow control
- connection management
Introduction
- What's congestion?
- informally: too many sources sending too much data too fast for the network to handle
- quite different from flow control!
- manifestations of congestion:
- lost packets (buffer overflow at routers)
- long delays (queueing in router buffers)
- a top-10 problem!
3.6 Principles of congestion control
1.1 Causes/costs of congestion: scenario 1
- two senders, two receivers
- $\lambda_{in}$: each sender's rate of original data
- one router, infinite buffers
- link capacity C
- no retransmission
- large delays when congested
- maximum achievable throughput
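Stated as formulas, assuming the two connections share the output link of capacity C equally (the usual textbook reading of this scenario): each receiver's throughput tracks the sending rate up to C/2, and the average queueing delay grows without bound as the sending rate approaches C/2.

```latex
\lambda_{out} =
\begin{cases}
  \lambda_{in}, & 0 \le \lambda_{in} < C/2 \\
  C/2,          & \lambda_{in} \ge C/2
\end{cases}
\qquad
\text{average delay} \to \infty \ \text{ as } \lambda_{in} \to C/2
```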
1.2 Causes/costs of congestion: scenario 2
- one router, finite buffers
- the sender retransmits lost packets
Figure: Hosts A and B send through a router with finite shared buffers; $\lambda_{in}$ = original data, $\lambda'_{in}$ = original plus retransmitted data, $\lambda_{out}$ = delivered throughput.
1.2 Causes/costs of congestion: scenario 2
Figure: $\lambda_{out}$ versus offered load (axes up to R/2) in cases a, b, and c; the maximum throughput reaches R/2, R/3, and R/4, respectively.
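- a. no packet lost: $\lambda_{in} = \lambda_{out}$
- b. perfect retransmission, only when a packet is lost: $\lambda'_{in} > \lambda_{out}$
- c. retransmission of delayed (not lost) packets makes $\lambda'_{in}$ larger than in the perfect case for the same $\lambda_{out}$
- costs of congestion:
- more work (retransmissions) for a given goodput
- unneeded retransmissions: the link carries multiple copies of a packet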
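The throughput values in the figure translate into concrete costs. For case b, assuming the offered load has reached $\lambda'_{in} = R/2$ while goodput is $R/3$, one third of the transmitted traffic is retransmission; for case c, assuming each packet is forwarded twice on average, goodput halves to $R/4$:

```latex
\text{b: } \frac{\lambda'_{in} - \lambda_{out}}{\lambda'_{in}}
  = \frac{R/2 - R/3}{R/2} = \frac{1}{3}
\qquad
\text{c: } \lambda_{out} = \frac{\lambda'_{in}}{2} = \frac{R/2}{2} = \frac{R}{4}
```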
1.3 Causes/costs of congestion: scenario 3
- four senders
- multi-hop paths
- timeout/retransmit
Q: What happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
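Figure: four hosts (including Host C and Host D) send over multi-hop paths through routers with finite shared output-link buffers; $\lambda_{in}$ = original data, $\lambda'_{in}$ = original data plus retransmitted data, $\lambda_{out}$ = delivered throughput.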
1.3 Causes/costs of congestion: scenario 3
Figure: hosts A, B, C, and D connected through routers R1 to R4
- Another cost of congestion:
- when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
2. Approaches to Congestion Control
- two broad approaches to congestion control:
- end-to-end congestion control:
- no explicit feedback from the network
- congestion is inferred from loss and delay observed by the end systems
- network-assisted congestion control:
- routers provide feedback to end systems
- a single bit indicating congestion (SNA, DECbit, ECN, ATM)
- or an explicit rate at which the sender should send
Introduction
- TCP has each sender limit its sending rate as a function of perceived network congestion.
- How does a sender perceive that the network is congested?
- How does a sender limit its sending rate?
- What algorithm should the sender use to change its sending rate as a function of congestion?
3.7 TCP Congestion Control
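1. How to Limit the Sending Rate?
- the sender limits transmission:
- $\mathit{LastByteSent} - \mathit{LastByteAcked} \le \min(\mathit{CongWin}, \mathit{RcvWin})$
- if $\mathit{RcvWin} \gg \mathit{CongWin}$, this reduces to
- $\mathit{LastByteSent} - \mathit{LastByteAcked} \le \mathit{CongWin}$
- roughly, $\text{rate} \approx \mathit{CongWin}/\mathit{RTT}$ bytes/sec
- CongWin is dynamic, a function of perceived network congestion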
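The sender-side limit can be written as a small Java helper (sizes in bytes). The names are hypothetical, and the rate function uses the rough one-window-per-RTT approximation.

```java
// Hypothetical helper for the sender-side limit above (all sizes in bytes).
public class SendLimit {

    // LastByteSent - LastByteAcked must stay <= min(CongWin, RcvWin);
    // returns how many more bytes may be sent right now.
    public static long usableWindow(long congWin, long rcvWin,
                                    long lastByteSent, long lastByteAcked) {
        long unacked = lastByteSent - lastByteAcked;
        return Math.max(0, Math.min(congWin, rcvWin) - unacked);
    }

    // Rough sending rate: one window per round-trip time.
    public static double approxRateBytesPerSec(long congWin, double rttSeconds) {
        return congWin / rttSeconds;
    }

    public static void main(String[] args) {
        // RcvWin >> CongWin, so CongWin is the binding limit.
        System.out.println(usableWindow(8_000, 1_000_000, 10_000, 5_000)); // 3000
        System.out.println(approxRateBytesPerSec(8_000, 0.1));             // about 80000 bytes/s
    }
}
```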
2. How to Perceive Congestion?
- a retransmission indicates congestion
- loss event: a timeout or 3 duplicate ACKs
- the TCP sender reduces its sending rate (CongWin) after a loss event
3. Algorithm for Changing the Sending Rate
- arrival of ACKs indicates the network is OK:
- increase the sending rate by increasing CongWin
- timeout or 3 duplicate ACKs:
- decrease the sending rate by reducing CongWin
- congestion control algorithm:
- AIMD (Additive Increase, Multiplicative Decrease)
- slow start
- reaction to timeout events
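4. AIMD (1)
- multiplicative decrease:
- cut CongWin in half after a loss event
- $\mathit{CongWin} \leftarrow \mathit{CongWin}/2$, but CongWin should always stay $\ge 1$ MSS
- additive increase:
- increase CongWin by 1 MSS every RTT
- probing: each ACK increases CongWin by $\mathit{MSS}^2/\mathit{CongWin}$ bytes; about $\mathit{CongWin}/\mathit{MSS}$ ACKs arrive per RTT, so the total increase per RTT is about 1 MSS
- this is the congestion-avoidance phase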
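The per-ACK rule can be checked numerically: applying one increment per ACK over a full window grows CongWin by just under 1 MSS, because the window grows slightly during the RTT and later increments are smaller. A sketch with a hypothetical 1000-byte MSS and hypothetical method names:

```java
// Hypothetical AIMD helpers; window sizes in bytes.
public class Aimd {

    // Additive increase, per ACK: CongWin += MSS * MSS / CongWin.
    public static double onAck(double congWin, double mss) {
        return congWin + mss * mss / congWin;
    }

    // Multiplicative decrease: halve CongWin, but never go below 1 MSS.
    public static double onLoss(double congWin, double mss) {
        return Math.max(mss, congWin / 2);
    }

    // Apply one ACK per segment of the current window (one RTT's worth).
    public static double afterOneRtt(double congWin, double mss) {
        int acks = (int) (congWin / mss);
        double w = congWin;
        for (int i = 0; i < acks; i++) {
            w = onAck(w, mss);
        }
        return w;
    }

    public static void main(String[] args) {
        double mss = 1000;
        double w = afterOneRtt(10 * mss, mss);
        System.out.printf("after one RTT: %.1f bytes%n", w); // a little under 11000
        System.out.println("after loss:   " + onLoss(w, mss));
    }
}
```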
4. AIMD (2)
Figure: CongWin over time for a long-lived TCP connection
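5. Slow Start
- When a connection begins, CongWin = 1 MSS.
- Example: MSS = 500 bytes, RTT = 200 ms
- initial rate = $\mathit{MSS}/\mathit{RTT} = (500 \times 8\ \text{bits})/(0.2\ \text{s}) = 20$ kbps
- the available bandwidth may be $\gg \mathit{MSS}/\mathit{RTT}$
- it is desirable to ramp up quickly to a respectable rate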
5. Slow Start
- When a connection begins, the sender increases its rate exponentially fast until the first loss event:
- on receiving each ACK, it increases CongWin by 1 MSS
- so CongWin doubles every RTT
- CongWin increases exponentially
- this initial phase is called slow start
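The doubling follows directly from "+1 MSS per ACK". A hypothetical sketch, assuming the window is counted in MSS units, there are no losses, and every segment of a window is ACKed within one RTT:

```java
import java.util.Arrays;

// Hypothetical slow-start sketch: window in segments (MSS units), no losses.
public class SlowStart {

    // Initial rate MSS/RTT in bits per second.
    public static double initialRateBps(int mssBytes, double rttSeconds) {
        return mssBytes * 8 / rttSeconds;
    }

    // CongWin at the start of each RTT: each ACK adds 1 MSS, so the window doubles.
    public static int[] windowPerRtt(int rounds) {
        int[] history = new int[rounds];
        int congWin = 1;
        for (int r = 0; r < rounds; r++) {
            history[r] = congWin;
            int acksThisRtt = congWin; // one ACK per segment sent in this RTT
            congWin += acksThisRtt;    // +1 MSS per ACK
        }
        return history;
    }

    public static void main(String[] args) {
        System.out.println(initialRateBps(500, 0.2));         // about 20000 bps (20 kbps)
        System.out.println(Arrays.toString(windowPerRtt(5))); // [1, 2, 4, 8, 16]
    }
}
```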
5. TCP Slow Start (more)
Figure: Host A sends one segment, then two segments, then four segments to Host B, one round per RTT
- initial rate is slow but ramps up exponentially
fast
6. Reaction to Timeout Events
- After 3 dup ACKs:
- CongWin is cut in half
- the window then grows linearly (AIMD)
- But after a timeout event:
- CongWin is instead set to 1 MSS
- the window then grows exponentially (slow start)
- up to a threshold, then grows linearly (AIMD)
- $\mathit{Threshold} = \mathit{CongWin}/2$, using CongWin just before the timeout
6. Reaction to Timeout Events
- Threshold determines the CongWin value at which slow start ends and congestion avoidance begins.
- Threshold is initially set to a large value.
- When a loss event occurs, $\mathit{Threshold} = \mathit{CongWin}/2$.
- Timeout:
- CongWin = 1 MSS
- slow start until CongWin reaches Threshold, then congestion avoidance
- 3 dup ACKs:
- $\mathit{Threshold} = \mathit{CongWin}/2$, $\mathit{CongWin} = \mathit{Threshold}$
- increase linearly (congestion avoidance)
- Canceling the slow-start phase after 3 dup ACKs is called fast recovery.
Summary: TCP Congestion Control
- When CongWin is below Threshold, the sender is in the slow-start phase, and the window grows exponentially.
- When CongWin is above Threshold, the sender is in the congestion-avoidance phase, and the window grows linearly.
- When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
- When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
TCP Sender Congestion Control

| Event | State | TCP sender action | Commentary |
|---|---|---|---|
| ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to Congestion Avoidance | Results in a doubling of CongWin every RTT |
| ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase: CongWin grows by 1 MSS every RTT |
| Loss event detected by triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to Congestion Avoidance | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS |
| Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to Slow Start | Enter slow start |
| Duplicate ACK | SS or CA | Increment the duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed |
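The table reads naturally as a small state machine. The Java sketch below follows its rows directly; it is a simplified model (byte-valued window, hypothetical class and method names, no actual segment transmission), not a complete TCP Reno implementation.

```java
// Hypothetical sketch of the sender table above; sizes in bytes.
// It models only the CongWin/Threshold bookkeeping, not segment transmission.
public class CongestionControl {
    public enum State { SLOW_START, CONGESTION_AVOIDANCE }

    private final double mss;
    private double congWin;
    private double threshold;
    private State state = State.SLOW_START;
    private int dupAcks = 0;

    public CongestionControl(double mss, double initialThreshold) {
        this.mss = mss;
        this.congWin = mss;                // a connection starts with 1 MSS
        this.threshold = initialThreshold; // "initially set to a large value"
    }

    // ACK for previously unACKed data.
    public void onNewAck() {
        dupAcks = 0;
        if (state == State.SLOW_START) {
            congWin += mss;                           // doubles every RTT
            if (congWin > threshold) state = State.CONGESTION_AVOIDANCE;
        } else {
            congWin += mss * (mss / congWin);         // about +1 MSS per RTT
        }
    }

    // Duplicate ACK: only the count changes, until the third one.
    public void onDuplicateAck() {
        dupAcks++;
        if (dupAcks == 3) {
            threshold = congWin / 2;
            congWin = Math.max(threshold, mss);       // never below 1 MSS
            state = State.CONGESTION_AVOIDANCE;       // fast recovery
        }
    }

    public void onTimeout() {
        threshold = congWin / 2;
        congWin = mss;
        state = State.SLOW_START;
        dupAcks = 0;
    }

    public double congWin() { return congWin; }
    public double threshold() { return threshold; }
    public State state() { return state; }

    public static void main(String[] args) {
        CongestionControl cc = new CongestionControl(1000, 8000);
        for (int i = 0; i < 8; i++) cc.onNewAck();
        System.out.println(cc.state() + " " + cc.congWin()); // CONGESTION_AVOIDANCE 9000.0
        for (int i = 0; i < 3; i++) cc.onDuplicateAck();
        System.out.println(cc.state() + " " + cc.congWin()); // CONGESTION_AVOIDANCE 4500.0
        cc.onTimeout();
        System.out.println(cc.state() + " " + cc.congWin()); // SLOW_START 1000.0
    }
}
```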
6. Reaction to Timeout Events
Figure: CongWin versus time, measured in RTTs
7. TCP Throughput
- Ignoring slow start, what is the average throughput of TCP as a function of window size and RTT?
- Let W be the window size when loss occurs.
- When the window size is W, the throughput is W/RTT.
- Just after a loss, the window drops to W/2 and the throughput to W/(2 RTT).
- Average throughput: 0.75 W/RTT
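The 0.75 factor follows if the window grows linearly (congestion avoidance) from W/2 back to W between losses, so the throughput averages the two endpoints:

```latex
\overline{\text{throughput}}
  = \frac{1}{2}\left(\frac{W}{2\,RTT} + \frac{W}{RTT}\right)
  = \frac{3}{4}\,\frac{W}{RTT}
  = 0.75\,\frac{W}{RTT}
```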
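8. TCP Futures
- Consider a TCP connection with a 1500-byte MSS and a 100 ms RTT through which we want to send data at 10 Gbps.
- $\text{throughput} = W \cdot \mathit{MSS} \cdot 8 / \mathit{RTT}$ (MSS in bytes), so
- $W = \text{throughput} \cdot \mathit{RTT} / (\mathit{MSS} \cdot 8)$
- with throughput = 10 Gbps, $W \approx 83{,}333$ segments in flight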
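The same arithmetic as a small Java helper (the name is hypothetical); it reproduces W ≈ 83,333 for 10 Gbps, a 100 ms RTT, and a 1500-byte MSS.

```java
// Worked numbers: window (in segments) needed for a target throughput.
public class HighSpeedWindow {

    // W = throughput * RTT / (MSS * 8), with MSS in bytes and throughput in bits/s.
    public static double windowSegments(double throughputBps, double rttSeconds, int mssBytes) {
        return throughputBps * rttSeconds / (mssBytes * 8.0);
    }

    public static void main(String[] args) {
        double w = windowSegments(10e9, 0.100, 1500);
        System.out.printf("W = %.0f segments%n", w); // W = 83333 segments
    }
}
```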
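8. TCP Futures
- throughput in terms of the loss rate L:
- while CongWin increases from W/2 to W, the total number of packets sent is
- $\frac{W}{2} + \left(\frac{W}{2}+1\right) + \left(\frac{W}{2}+2\right) + \cdots + W = \frac{3W^2}{8} + \frac{3W}{4}$
- one packet is lost per cycle, so $L = 1 / \left(\frac{3W^2}{8} + \frac{3W}{4}\right)$; if W is large, $\frac{3W^2}{8} \gg \frac{3W}{4}$, so $L \approx \frac{8}{3W^2}$
- substituting $W \approx \sqrt{8/(3L)}$ into the average throughput $0.75\,W \cdot \mathit{MSS}/\mathit{RTT}$ gives $\text{throughput} \approx \frac{1.22 \cdot \mathit{MSS}}{\mathit{RTT}\sqrt{L}}$
- for 10 Gbps: $L \approx 2 \cdot 10^{-10}$. Wow!!!
- new versions of TCP for high-speed networks are needed!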
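A quick numerical check of the derivation, assuming W is even so that each cycle contains whole packets. The 1.22 constant is computed as 0.75·√(8/3), which is what substituting W ≈ √(8/(3L)) into 0.75·W·MSS/RTT yields; the class and method names are hypothetical.

```java
// Checks the cycle-count formula and the loss rate needed for 10 Gbps.
public class LossRateModel {

    // Packets sent while the window grows from W/2 to W, one more each RTT (W even).
    public static long packetsPerCycle(long w) {
        long total = 0;
        for (long x = w / 2; x <= w; x++) {
            total += x;
        }
        return total;
    }

    // Closed form: 3W^2/8 + 3W/4.
    public static double closedForm(double w) {
        return 3 * w * w / 8 + 3 * w / 4;
    }

    // Solve throughput = 0.75 * sqrt(8/3) * MSS / (RTT * sqrt(L)) for L (MSS in bytes).
    public static double requiredLossRate(double throughputBps, double rttSeconds, int mssBytes) {
        double k = 0.75 * Math.sqrt(8.0 / 3.0); // about 1.22
        double sqrtL = k * mssBytes * 8 / (rttSeconds * throughputBps);
        return sqrtL * sqrtL;
    }

    public static void main(String[] args) {
        System.out.println(packetsPerCycle(8) + " vs " + closedForm(8));      // 30 vs 30.0
        System.out.printf("L = %.2e%n", requiredLossRate(10e9, 0.100, 1500)); // about 2e-10
    }
}
```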
9. TCP Fairness
- fairness goal:
- if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K
9. TCP Fairness
Figure: Connection 1 throughput versus Connection 2 throughput (each axis up to R); repeated congestion-avoidance additive increase and loss-triggered halving of the window move the two connections toward the equal-bandwidth-share line.
9. Fairness and UDP
- multimedia apps often do not use TCP:
- they do not want their rate throttled by congestion control
- instead they use UDP:
- pump audio/video at a constant rate and tolerate packet loss
- research area: TCP friendliness
9. Fairness and Parallel TCP Connections
- Nothing prevents an app from opening parallel connections between 2 hosts.
- Web browsers do this.
- Example: a link of rate R carrying 9 connections:
- a new app that asks for 1 TCP connection gets rate R/10
- a new app that asks for 11 TCP connections gets 11R/20, more than R/2!
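The example's arithmetic, assuming the idealized fairness goal holds so that every TCP connection on the bottleneck gets an equal share (the class name is hypothetical):

```java
// Hypothetical model: every TCP connection on the bottleneck gets an equal share.
public class ParallelShare {

    // Rate an app gets with appConnections out of (existing + appConnections) total.
    public static double appRate(double linkRate, int existing, int appConnections) {
        return linkRate * appConnections / (existing + appConnections);
    }

    public static void main(String[] args) {
        double r = 1.0; // express results as fractions of R
        System.out.println(appRate(r, 9, 1));  // 0.1  (R/10)
        System.out.println(appRate(r, 9, 11)); // 0.55 (11R/20)
    }
}
```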
10. TCP Delay Modeling
- Q: How long does it take to receive an object from a Web server after sending a request?
- Ignoring congestion, the delay is influenced by:
- TCP connection establishment
- data transmission delay
- the slow-start period
10. TCP Delay Modeling
- Suppose:
- one link of rate R between client and server
- S: MSS (bits)
- O: object size (bits)
- no retransmissions (no loss, no corruption)
- Window size:
- first assume a fixed congestion window of W segments
- then a dynamic window, modeling slow start
10. Fixed Congestion Window (1)
- First case:
- $WS/R > RTT + S/R$: the ACK for the first segment in the window returns before a window's worth of data has been sent
delay $= 2\,RTT + O/R$
10. Fixed Congestion Window (2)
- Second case:
- $WS/R < RTT + S/R$: the sender waits for an ACK after sending a window's worth of data
delay $= 2\,RTT + O/R + (K-1)\,[S/R + RTT - WS/R]$, where K is the number of windows that cover the object
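Both cases fit in one function. A sketch assuming K = ⌈O/(WS)⌉ windows (so the object need not be a whole number of windows) and the slide's units (O and S in bits, R in bits/s, RTT in seconds); the class name is hypothetical.

```java
// Hypothetical latency model for a fixed congestion window of W segments.
public class FixedWindowDelay {

    // o, s in bits; r in bits/s; rtt in seconds; w in segments.
    public static double delay(double o, double s, double r, double rtt, int w) {
        double base = 2 * rtt + o / r;            // handshake + request, then transmission
        double windowTime = w * s / r;
        if (windowTime >= rtt + s / r) {
            return base;                          // case 1: the sender never stalls
        }
        int k = (int) Math.ceil(o / (w * s));     // number of windows covering the object
        // case 2: the sender stalls after every window except the last
        return base + (k - 1) * (s / r + rtt - windowTime);
    }

    public static void main(String[] args) {
        // S/R = 1 ms, RTT = 10 ms, object of 16 segments.
        System.out.println(delay(16_000, 1_000, 1e6, 0.010, 4));  // stalls, about 0.057 s
        System.out.println(delay(16_000, 1_000, 1e6, 0.010, 20)); // no stall, about 0.036 s
    }
}
```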
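TCP Delay Modeling: Slow Start (1)
Recall: K = number of windows that cover the object. How do we calculate K?
With windows of 1, 2, 4, ... segments, K is the smallest k with $2^k - 1 \ge O/S$, i.e. $K = \lceil \log_2(O/S + 1) \rceil$.
Calculation of Q, the number of idles for an infinite-size object, is similar (see HW).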
TCP Delay Modeling: Slow Start (2)
TCP Delay Modeling: Slow Start (3)
- Delay components:
- 2 RTT for connection establishment and request
- O/R to transmit the object
- time the server idles due to slow start
- P: number of actual server idles
- k: the kth window
- Example:
- O/S = 15 segments
- K = 4 windows
- Q = 2
- $P = \min\{K-1, Q\} = 2$
- the server idles P = 2 times
TCP Delay Modeling: Slow Start (4)
- When the object contains an infinite number of segments, how many times would the server stall? (This number is Q.)
Server idles $P = \min\{K-1, Q\}$ times
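The slow-start delay can be computed by summing the idle time after each window, using the standard model in which the server idles for max(0, S/R + RTT − 2^(k−1)·S/R) after window k. The slide does not give RTT and S/R for its example, so the values S/R = 1 and RTT = 2 time units are assumptions chosen to make Q = 2; names are hypothetical.

```java
// Hypothetical slow-start latency model: windows of 1, 2, 4, ... segments,
// no loss, object of `segments` segments, sOverR = S/R, consistent time units.
public class SlowStartDelay {

    // K: smallest k with 1 + 2 + ... + 2^(k-1) = 2^k - 1 >= O/S.
    public static int windowsToCover(long segments) {
        int k = 0;
        long covered = 0;
        while (covered < segments) {
            covered += 1L << k;
            k++;
        }
        return k;
    }

    // Idle time after window k: max(0, S/R + RTT - 2^(k-1) * S/R).
    static double idleAfterWindow(int k, double sOverR, double rtt) {
        return Math.max(0, sOverR + rtt - Math.pow(2, k - 1) * sOverR);
    }

    // P: number of windows 1..K-1 after which the server actually idles.
    public static int idleCount(long segments, double sOverR, double rtt) {
        int bigK = windowsToCover(segments);
        int p = 0;
        for (int k = 1; k < bigK; k++) {
            if (idleAfterWindow(k, sOverR, rtt) > 0) p++;
        }
        return p;
    }

    // Latency = 2 RTT + O/R + sum of idle times.
    public static double latency(long segments, double sOverR, double rtt) {
        int bigK = windowsToCover(segments);
        double idle = 0;
        for (int k = 1; k < bigK; k++) {
            idle += idleAfterWindow(k, sOverR, rtt);
        }
        return 2 * rtt + segments * sOverR + idle;
    }

    public static void main(String[] args) {
        System.out.println(windowsToCover(15));  // 4
        System.out.println(idleCount(15, 1, 2)); // 2
        System.out.println(latency(15, 1, 2));   // 22.0
    }
}
```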
TCP Delay Modeling: Slow Start (5)
HTTP Response Time Modeling
- Assume a Web page consists of:
- 1 base HTML page (of size O bits)
- M images (each of size O bits)
- Non-persistent HTTP, serial connections:
- M+1 TCP connections in series
- Response time = $(M+1)\,O/R + (M+1)\,2\,RTT$ + sum of idle times
- Persistent HTTP:
- 2 RTT to request and receive the base HTML file
- 1 RTT to request and receive the M images
- Response time = $(M+1)\,O/R + 3\,RTT$ + sum of idle times
- Non-persistent HTTP with X parallel connections:
- Suppose M/X is an integer.
- 1 TCP connection for the base file
- M/X sets of parallel connections for the images
- Response time = $(M+1)\,O/R + (M/X + 1)\,2\,RTT$ + sum