Title: LT-TCP: End-to-End Framework to Improve TCP Performance over Highly Lossy Wireless MANET
1. LT-TCP: End-to-End Framework to Improve TCP Performance over Highly Lossy Wireless MANET & Mesh Environments
- Vijay Subramanian, Shiv Kalyanaraman (Rensselaer Polytechnic Institute)
- K. K. Ramakrishnan (AT&T)
- Acknowledgments to Omesh Tickoo (now at Intel)
- (The multi-path version also involves Vicky
Sharma, Koushik Kar)
Project support from AFOSR ESC Hanscom and MIT
Lincoln Laboratory, Letter No. 14-S-06-0206
AT&T Labs Research
2. Multi-Tier NLOS MANETs & Meshes: Challenging Conditions for TCP
Wi-Fi meshes, WiMAX backhaul meshes, fixed/mobile convergence
Meshed backhaul: multiple NLOS hops; need low e2e latency, high goodput, low residual loss
Bursty losses, disruptions: protocols need to be loss-tolerant and provide reliability
3. TCP-SACK Performance: VERY Bad for the Combination of 5% PER + 100 ms RTT
From Chris Ramming's DARPA CBMANET overview slides
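A back-of-envelope sketch of why 5% PER with a 100 ms RTT is so damaging. It uses the well-known Mathis et al. steady-state approximation, which is our choice and not a model from these slides: because standard TCP treats every loss as congestion, throughput is capped near (MSS/RTT)·√(3/2)/√p no matter how fast the link is. The 1460-byte MSS is an assumption.

```python
import math


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput (Mathis et al.).

    throughput ~= (MSS / RTT) * sqrt(3/2) / sqrt(p)
    The model ignores timeouts, so it is optimistic at high loss rates.
    """
    c = math.sqrt(1.5)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


if __name__ == "__main__":
    # Assumed 1460-byte segments; 5% PER and 100 ms RTT as on the slide.
    bps = mathis_throughput_bps(1460, 0.100, 0.05)
    print(f"~{bps / 1e3:.0f} kbit/s, independent of the link rate")
```

The estimate comes out at roughly 640 kbit/s, and timeouts (not modeled) only push real TCP-SACK lower.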
4. Time Scales of Disruptions
- Multi-way tradeoff: high goodput, fairness, latency, residual loss, persistence beyond disruption time-scales
5. Multi-way Tradeoffs
[Figure: tradeoff among latency, goodput, residual loss/distortion, persistence beyond disruptions, and fairness]
- Easy to reduce loss by hugely over-provisioning FEC, and slamming goodput!
- Easy to reduce loss by ARQ persistence: keep retransmitting, trading off latency. This penalty is especially high at the links of a multi-hop lossy path.
- Hard: targeting FEC/ARQ to get the efficient tradeoff for a mix of interactive, streaming and bulk-transfer applications.
- Hard: division of functions/cross-layer. How much of the reliability features to put at each layer (PHY, link, transport)? How to manage the cooperation between layers? Which cross-layer interactions matter?
6. Loss-Tolerant TCP (LT-TCP)
7. LT-TCP: Problem Motivation
- Dynamic range
  - Can we extend the dynamic range of TCP into high-loss regimes?
  - Can TCP perform close to the residual capacity available under high loss rates?
- Congestion response
  - How should TCP respond to notifications due to congestion...
  - ...but not respond to packet erasures that do not signal congestion?
- Mix of reliability mechanisms
  - What mechanisms should be used to extend the operating point of TCP to packet loss rates from 0 to 30-50%?
  - How can Forward Error Correction (FEC) help?
  - How should the FEC be split between sending it proactively (insuring the data in anticipation of loss) and reactively (sending FEC in response to a loss)?
- Timeout avoidance
  - Timeouts are useful as a fall-back mechanism, but wasteful otherwise, especially under high loss rates.
  - How can we add mechanisms to minimize timeouts?
8. LT-TCP Building Blocks
- ECN environment
  - We infer congestion solely based on ECN markings.
  - The window is cut in response to ECN signals; hosts/routers have to be ECN-capable.
- Timeouts
  - The response to a timeout is the same as with standard TCP.
- Window granulation and adaptive MSS
  - We ensure that the window always has at least G segments (this allows dupacks to help recover from loss at small windows), which avoids timeouts.
  - The window size in bytes is initially the same as in normal SACK TCP.
  - The initial segment size is small, to accommodate G segments.
  - Packet size is continually adjusted so that we have at least G segments. Once we have G segments, packet size increases with window size.
- Loss estimation
  - The receiver continually tracks the loss rate and provides a running estimate of perceived loss back to the TCP sender through ACKs.
  - We use an EWMA-smoothed estimate of the packet erasure rate.
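The split between loss estimation and congestion response can be sketched as sender state that folds each ACK's erasure report into an EWMA but cuts the window only on ECN echoes. This is a minimal sketch, not the LT-TCP implementation: the class name `LtTcpSenderState`, the gain `alpha = 0.125`, and the plain halving rule are assumptions.

```python
from dataclasses import dataclass


@dataclass
class LtTcpSenderState:
    """Hypothetical sender state: loss estimate vs. congestion response."""

    cwnd_bytes: float
    loss_estimate: float = 0.0  # EWMA of the erasure rate reported in ACKs
    alpha: float = 0.125        # EWMA gain (assumed value, not from the slides)

    def on_ack(self, reported_erasure_rate: float, ecn_echo: bool) -> None:
        # Smooth the receiver's running erasure-rate report.
        self.loss_estimate = ((1 - self.alpha) * self.loss_estimate
                              + self.alpha * reported_erasure_rate)
        if ecn_echo:
            # Congestion is inferred only from ECN marks: cut the window.
            # (Real TCP reacts at most once per RTT; that is not modeled here.)
            self.cwnd_bytes /= 2
        # Erasures without ECN marks update the estimate, never the window.
```

The smoothed `loss_estimate` is what the adaptive PFEC/RFEC logic would consume, while `cwnd_bytes` follows ECN alone.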
9. Block Erasure Coding: Reed-Solomon FEC RS(N,K)
[Figure: an RS(N,K) block of size N, made of K data packets and (N-K) FEC packets]
- Can also use fountain codes (e.g., Tornado, Raptor, LT codes).
- Recovery is possible if we receive at least K packets out of N.
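The "at least K of N" rule turns into a one-round recovery probability once an erasure model is fixed. This sketch assumes independent erasures at rate p (bursty channels behave worse) and an MDS code such as RS(N,K); the example block sizes are hypothetical.

```python
from math import comb


def block_recovery_prob(n: int, k: int, p: float) -> float:
    """P(an MDS (n, k) block such as RS(N, K) is decodable in one round).

    Assumes i.i.d. packet erasures with probability p; decoding succeeds
    when at least k of the n packets arrive.
    """
    q = 1.0 - p
    return sum(comb(n, i) * q**i * p**(n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical block: 20 packets at 30% PER, with 10 vs. 14 data packets.
    for k in (10, 14):
        print(k, round(block_recovery_prob(20, k, 0.3), 3))
```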
10. Diversity Techniques: Hybrid ARQ/FEC
- Hybrid ARQ/FEC is a time-diversity technique.
- Error coding (PHY/MAC): bits are flipped, but the destination does not know which ones flipped.
- Erasure coding (link/transport): packets/fragments are erased; the destination knows which ones are missing but not their content. (OUR FOCUS)
11. Building Blocks: Proactive/Reactive FEC
- Proactive FEC (PFEC)
  - The TCP sender sends data in blocks, where a block contains K data segments and R FEC packets. The amount of FEC protection is determined by the current loss estimate.
  - Proactive FEC is based upon an estimate of the per-window loss rate (adaptive).
- Reactive FEC (RFEC)
  - Upon receipt of a dupack, reactive FEC packets are scheduled based on the following criteria:
    - Number of proactive FEC packets already sent.
    - Cumulative hole size seen in the decoding block at the receiver.
    - Loss rate currently estimated.
  - Reactive FEC complements retransmissions and future-block data transmissions; both are used to reconstruct packets at the receiver.
- DATA, PFEC and RFEC follow the spirit of TCP semantics (self-clocking and the packet-conservation principle).
12. LT-TCP Performance Preview with Multiple Flows
- Tradeoff: aggregate TCP goodput vs. block recovery latency/short file transfers
13. Short-Term Per-Block Loss: Binomial Distribution
14. Aside: Binomials for Different Loss Rates, N = 20
- As Npq >> 1, the binomial is better approximated by a normal distribution (especially near the mean):
  - symmetric, sharp peak at the mean, exponential-square (e^(-x²)) decay of the tails
  - (pmf concentrated near the mean)
[Figure: binomial pmfs with N = 20 for all cases: 10% PER (Npq = 1.8), 30% PER (Npq = 4.2), 50% PER (Npq = 5)]
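A short check of the figure's parameters: the sketch below reproduces Npq for N = 20 and compares the binomial pmf at the mean with the normal density of the same mean and variance. The gap shrinks as Npq grows, which is the approximation the bullet describes.

```python
from math import comb, exp, pi, sqrt


def binomial_pmf(n: int, p: float, x: int) -> float:
    """P(x of n units lost) with i.i.d. loss probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def normal_approx(n: int, p: float, x: float) -> float:
    """Normal density with the binomial's mean Np and variance Npq."""
    mu, var = n * p, n * p * (1 - p)
    return exp(-((x - mu) ** 2) / (2 * var)) / sqrt(2 * pi * var)


if __name__ == "__main__":
    n = 20
    for per in (0.1, 0.3, 0.5):
        mean = round(n * per)
        print(f"PER={per:.0%} Npq={n * per * (1 - per):.1f} "
              f"pmf(mean)={binomial_pmf(n, per, mean):.3f} "
              f"normal(mean)={normal_approx(n, per, mean):.3f}")
```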
15. Hybrid ARQ/FEC Scheme: Adaptivity to (μ, σ)
16. Adaptive MSS/Granulation in Lossy Networks
Tradeoff: small MSS ⇒ higher per-packet overheads
17. Design Questions
- How much granulation per block (G)? (Adaptive MSS)
- How does PFEC provisioning depend upon the loss statistics (μ, σ)? (Adaptive PFEC)
- How does RFEC provisioning depend upon the loss statistics (μ, σ) and the units needed (X)? (Adaptive RFEC)
- How to fit these building blocks in the context of TCP's congestion control and at the link layer?
18. How Much Granulation? Key Factor: P(all units lost)
σ = O(√N)
When all units of a block are lost, the HARQ will fail (leading to timeouts, etc.). This probability is non-trivial for N = 5; N > 10 is good enough.
3.175 blocks irrecoverably lost, i.e., all units lost and no feedback (e.g., timeout)
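Why larger N helps can be made concrete with two quantities. The chance that every unit of an N-unit block is erased falls as p^N, while the spread of per-block losses, σ = √(Npq), grows only as O(√N). The sketch assumes independent erasures; bursty losses make the all-lost case more likely. At 50% PER it gives roughly 3% for 5-unit blocks versus about 0.1% for 10-unit blocks.

```python
from math import sqrt


def p_all_units_lost(n: int, p: float) -> float:
    """P(every unit of an n-unit block is erased), i.i.d. erasures at rate p."""
    return p**n


def loss_std(n: int, p: float) -> float:
    """Std. dev. of per-block losses, sqrt(Npq); grows as O(sqrt(N))."""
    return sqrt(n * p * (1 - p))


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(n, f"{p_all_units_lost(n, 0.5):.4%}", round(loss_std(n, 0.5), 2))
```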
19. Tradeoff: Larger G ⇒ Per-Packet Overhead
[Figure: a 40-byte header on a 1000-byte packet ⇒ 2.5% overhead; a 40-byte header on a 400-byte packet ⇒ 10% overhead]
- At the TCP layer we cannot increase N (minimum granulation) freely ⇒ constrained by packetization overhead.
- 802.11b MAC per-packet overheads, such as the MAC-ACK sent at a lower rate per PDU, etc.!
- Tradeoff: G (reduces timeout risk) vs. per-packet overheads (reduce goodput).
- Sweet spot: G = 10, assuming a per-flow b-d product > 4000 bytes.
- E.g., a 10 Mbps link with a 50 ms RTT has a b-d product of 62.5 kB.
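The granulation rule and the numbers above can be tied together in a small sketch: segment size is the window divided by G until it reaches a maximum MSS, after which the segment count grows. The function names and the `min_mss`/`max_mss` bounds are hypothetical. With G = 10 and a 4000-byte window, segments are 400 bytes, the 10%-overhead case.

```python
HEADER_BYTES = 40  # per-packet TCP/IP header, as on the slide


def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes."""
    return rate_bps * rtt_s / 8


def adaptive_mss(window_bytes: int, g: int = 10,
                 min_mss: int = 64, max_mss: int = 1460) -> int:
    """Hypothetical granulation rule: make the window hold at least g segments.

    Small windows get small segments; once window / g exceeds max_mss,
    the segment size is capped and the segment count grows instead.
    """
    return max(min_mss, min(max_mss, window_bytes // g))


def header_overhead(mss: int) -> float:
    """Header bytes as a fraction of the segment payload."""
    return HEADER_BYTES / mss


if __name__ == "__main__":
    print(bdp_bytes(10e6, 0.050))  # 10 Mbps x 50 ms, in bytes
    print(adaptive_mss(4000), header_overhead(adaptive_mss(4000)))
```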
20. How Much PFEC? μ or μ + σ?
21. How Much PFEC? (μ - σ) or (μ - 2.5σ)?
22. How Much Adaptive PFEC? Summary
- PFEC < (μ - kσ), for small k, is very efficient, but it increases the burden on FEC in round 1 (latency penalty).
- PFEC > (μ + kσ) is very inefficient (goodput penalty), but it reduces the burden on FECs in round 1.
- Feasible PFEC choices: μ or μ + σ.
- We pick μ + σ for bursty-loss robustness.
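A sketch of the μ + σ choice for an N-unit block, reading the slide's symbols as the binomial per-block loss statistics μ = Np and σ = √(Npq). Rounding up with `ceil` is our assumption. With p = 0 the rule allocates no PFEC, so there is no overhead when there is no loss.

```python
import math


def pfec_units(n: int, p: float, k_sigma: float = 1.0) -> int:
    """Proactive FEC units for an n-unit block: ceil(mu + k*sigma).

    mu = N*p and sigma = sqrt(N*p*(1-p)) are the binomial per-block
    loss statistics; k_sigma = 1 gives the mu + sigma choice.
    """
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return math.ceil(mu + k_sigma * sigma)
```

For example, `pfec_units(20, 0.5)` gives 13 units (10 + √5, rounded up), and `k_sigma=0` falls back to the plain μ choice.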
23. How Much RFEC? Residual Units (X) Distribution
Conditional binomial. Chop!
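One plausible reading of the "conditional binomial, chop" picture, sketched under assumptions: losses L in an N-unit block are Binomial(N, p), the block carries a given number of PFEC units, and an MDS code needs X = L - PFEC more units when L exceeds the PFEC. Conditioning on X > 0 (the block was not recovered in round 1) chops off the binomial at or below the PFEC level.

```python
from math import comb


def residual_units_pmf(n: int, p: float, pfec: int) -> dict[int, float]:
    """Distribution of X, the extra units a block still needs after round 1.

    With L ~ Binomial(n, p) losses in an n-unit block carrying `pfec` proactive
    FEC units, the block needs X = L - pfec more units when L > pfec. The
    result is conditioned on X > 0 (blocks not recovered in round 1); the
    part of the binomial at or below `pfec` is "chopped" off.
    """
    tail = {l - pfec: comb(n, l) * p**l * (1 - p) ** (n - l)
            for l in range(pfec + 1, n + 1)}
    total = sum(tail.values())
    if total == 0:
        return {}  # round 1 always succeeds
    return {x: pr / total for x, pr in tail.items()}
```

With N = 20, p = 50% and 13 PFEC units, the most likely residual is X = 1, so most unrecovered blocks need only a handful of extra units.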
24. RFEC Issues
- Like PFEC, send more RFEC than the expected number of losses, to reduce dependence on future rounds.
- Problem: many blocks require only a small number of units (X = 1 to 5 units).
- Need to send >> X units when X is very small, to counter the small-N binomial effect.
- A high proportion of the RFEC sent is wasted.
- However, the absolute RFEC waste is low when PFEC > μ.
- Total FEC waste is still dominated by PFEC waste!
RFEC should be large enough to avoid the small-N binomial effect. Some RFEC over-provisioning is OK even for larger X, to avoid steep timeout penalties. Absolute overhead matters more than relative overhead. For TCP, we have to do this in a partially blind manner (X is not known) and stay in line with TCP self-clocking constraints, etc.
25. RFEC Issues: Effects
Model: RFEC in round 2: Y = (X + 3σ)/(1 - p). Add 3σ, scale up by 1/(1 - p), and round off to the nearest integer.
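The round-2 RFEC model written as a function. The slide's garbled symbol is read as σ, which is an assumption, and σ is passed in by the caller because the slides do not say how it is computed. Python's built-in `round()` rounds halves to even, so the sketch rounds half up explicitly.

```python
import math


def rfec_round2(x: int, sigma: float, p: float) -> int:
    """Y = (X + 3*sigma) / (1 - p), rounded to the nearest integer.

    Adding 3*sigma over-provisions against the small-N binomial effect;
    dividing by (1 - p) compensates for RFEC packets erased in round 2.
    """
    y = (x + 3 * sigma) / (1 - p)
    return math.floor(y + 0.5)  # round half up (built-in round() rounds half to even)
```

For instance, X = 2 missing units with σ = 1 at 50% PER gives Y = (2 + 3)/0.5 = 10 RFEC packets, five times the bare deficit, which is the over-provisioning the previous slide argues for.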
26. LT-TCP Packet Scheduling
Issue: when the block is recovered at the receiver, RFECs stuck in the pipe are wasted (lower goodput).
Solution: weighted round-robin transmission of RFEC packets, interleaved with future-block data and PFEC packets.
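A minimal weighted round-robin sketch of that interleaving. The 1:3 weights and packet labels are hypothetical. Each round sends up to `rfec_weight` RFEC packets for the current block, then up to `future_weight` data/PFEC packets of future blocks.

```python
from collections import deque


def wrr_interleave(rfec, future, rfec_weight=1, future_weight=3):
    """Weighted round-robin: per round, send up to `rfec_weight` RFEC packets
    for the current block, then up to `future_weight` data/PFEC packets of
    future blocks. Leftovers of either queue drain once the other is empty."""
    if rfec_weight < 1 or future_weight < 1:
        raise ValueError("weights must be >= 1")
    rq, fq = deque(rfec), deque(future)
    order = []
    while rq or fq:
        for _ in range(rfec_weight):
            if rq:
                order.append(rq.popleft())
        for _ in range(future_weight):
            if fq:
                order.append(fq.popleft())
    return order


if __name__ == "__main__":
    print(wrr_interleave(["R1", "R2"], ["D1", "D2", "D3", "D4", "P1"]))
```

Because RFEC is spread out rather than sent back to back, fewer RFEC packets are already in flight when the receiver reports the block recovered, and the sender can drop the rest of the RFEC queue.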
27. Putting It Together: LT-TCP
28. Modeling Insights: Tradeoffs
Analysis numbers (p = 50%):
- Goodput: 3.61 Mbps vs. 5 Mbps (max)
- PFEC waste: 1.0 Mbps (10%)
- RFEC waste: 0.39 Mbps (3.9%)
- Residual loss: 0.0%
- Weighted avg. rounds: 1.13
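A quick arithmetic check that these figures account for the whole residual capacity: 3.61 + 1.0 + 0.39 = 5 Mbps = C × (1 - p). The link capacity C = 10 Mbps is our inference from "5 Mbps (max)" at p = 50%, and it also makes the 10% and 3.9% waste figures fractions of C.

```python
def capacity_budget(capacity_mbps: float, p: float, goodput: float,
                    pfec_waste: float, rfec_waste: float) -> float:
    """Return the unexplained part of the residual capacity C*(1-p) in Mbps."""
    residual = capacity_mbps * (1 - p)
    return residual - (goodput + pfec_waste + rfec_waste)


if __name__ == "__main__":
    # Numbers from the slide; C = 10 Mbps is inferred, not stated.
    gap = capacity_budget(10.0, 0.5, goodput=3.61, pfec_waste=1.0, rfec_waste=0.39)
    print(f"unaccounted: {abs(gap):.2f} Mbps")
    print(f"PFEC waste: {1.0 / 10.0:.0%} of C, RFEC waste: {0.39 / 10.0:.1%} of C")
```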
29. Model Validation: Link/Transport Layer, Uniform/Bursty, 10%, 50% PER
Remarkably good match, especially at the transport layer (since we have abstracted several features).
30. Multiple-Flow Simulation Configuration
31. LT-TCP vs. SACK: Multiple Flows
32. Performance: Multiple Hops
33. Performance: Uniform vs. Bursty Losses
- ON/OFF loss process
  - The error rate toggles between 0.5p and 1.5p, for an average PER of p.
  - The sojourn time is randomized around a mean period of 10 ms (± 1 ms).
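A sketch of such an ON/OFF loss process. The slide does not say how the ±1 ms randomization is drawn; a uniform draw and starting in the high-PER state are assumptions. Because both states have the same mean sojourn, the time-averaged PER comes out at p.

```python
import random


def on_off_per_schedule(p, duration_ms, mean_ms=10.0, jitter_ms=1.0, seed=None):
    """ON/OFF loss process: PER alternates between 1.5p and 0.5p.

    Each sojourn lasts Uniform(mean - jitter, mean + jitter) ms. Returns
    (start_ms, length_ms, per) segments covering [0, duration_ms).
    Starting in the high-PER state is an arbitrary choice.
    """
    rng = random.Random(seed)
    t, high, segments = 0.0, True, []
    while t < duration_ms:
        length = min(rng.uniform(mean_ms - jitter_ms, mean_ms + jitter_ms),
                     duration_ms - t)
        segments.append((t, length, 1.5 * p if high else 0.5 * p))
        t += length
        high = not high
    return segments


def time_average_per(segments):
    """PER averaged over time, weighting each segment by its length."""
    total = sum(length for _, length, _ in segments)
    return sum(length * per for _, length, per in segments) / total
```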
34. Short-File Transfer Times: Utility of PFEC
35. Co-existence of TCP-SACK and LT-TCP: Cumulative Goodput
- We test fairness under a lossless scenario.
- Cumulative goodput is shown for a representative pair of flows (1 TCP-SACK and 1 LT-TCP) out of 10 flows total.
- We see that LT-TCP (starting later) achieves a fair allocation within 40-50 RTTs.
- This convergence is representative of the concurrent flows.
36. Fairness Comparisons
- Instantaneous goodput is shown for a representative pair of flows (1 TCP-SACK and 1 LT-TCP) out of 10 flows total.
- The goodput was measured in 100 ms intervals.
37. Value of LL-HARQ vs. LL-ARQ
- LL-HARQ: uses HARQ, with a strict limit of 1 retry (using ideas discussed earlier).
- LL-ARQ: persistent ARQ with 10 retries (no backoffs).
- LL-HARQ allows us to go multiple hops (as in meshed backhaul) without rapidly increasing the e2e-visible RTT.
- Useful for interactive applications: gaming, VoIP, etc.
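The latency side of this comparison can be sketched with plain retransmission counts. This is a simplification: the FEC gain inside HARQ is not modeled, erasures are independent, and each attempt costs one transmission slot. A 1-retry limit bounds each hop at 2 attempts, whereas 10 retries allow up to 11, and per-hop bounds add up along a multi-hop path. The price of the tight limit is a higher per-hop residual loss.

```python
def arq_expected_attempts(p: float, max_retries: int) -> float:
    """Expected transmissions per hop for ARQ with at most `max_retries` retries.

    Attempt i + 1 happens only if the first i attempts were all lost, so
    E[attempts] = sum_{i=0}^{max_retries} p**i.
    """
    return sum(p**i for i in range(max_retries + 1))


def arq_residual_loss(p: float, max_retries: int) -> float:
    """Probability that every attempt on a hop is lost."""
    return p ** (max_retries + 1)


if __name__ == "__main__":
    p, hops = 0.5, 4  # hypothetical PER and path length
    for name, retries in (("LL-ARQ", 10), ("1-retry bound", 1)):
        worst = hops * (retries + 1)  # worst-case attempts along the path
        mean = hops * arq_expected_attempts(p, retries)
        print(f"{name}: mean {mean:.2f}, worst {worst} attempts; "
              f"per-hop residual loss {arq_residual_loss(p, retries):.4f}")
```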
38. Division of Functions: Link vs. Transport?
Division of reliability functions between layers:
1. Delay-constrained HARQ at the link layer at each hop: maximize outage capacity with a small residual loss rate in severe bursty cases.
2. Delay-unconstrained HARQ at the transport layer to handle accumulated residual losses (only in severe bursty scenarios).
- If LL-HARQ is good, when do we need LT-TCP (beyond TCP-SACK)?
- Answer: with high bursty losses, each hop has a small residual loss rate.
- With multiple hops, TCP-SACK cannot absorb the accumulated residual loss.
- This naturally occurs as we trade off outage capacity vs. outage probability in fading channels,
- i.e., we want more backbone capacity ⇒ tolerate more short-term outages, hoping to use HARQ over longer time-scales!
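How small per-hop residual losses accumulate end to end can be seen from 1 - (1 - r)^H, assuming independent losses on each hop. With a hypothetical 1% residual per hop, five hops already give about 4.9% end-to-end loss, the same order as the 5% PER regime in which TCP-SACK performs badly.

```python
def e2e_residual_loss(per_hop_residual: float, hops: int) -> float:
    """Probability that a packet is lost on at least one of `hops` hops,
    assuming independent per-hop residual losses."""
    return 1 - (1 - per_hop_residual) ** hops


if __name__ == "__main__":
    for hops in (1, 3, 5, 8):
        print(hops, f"{e2e_residual_loss(0.01, hops):.2%}")
```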
39. Medium/Long Time-Scale Disruptions: Multi-Path LT-TCP (Preliminary)
40. Problem: Single Path: Limited Capacity, Delay, Loss
[Figure: x-axis: time]
- Network paths usually have:
  - low e2e capacity,
  - high latencies, and
  - high/variable residual loss rates.
41. Idea: Aggregate Capacity, Use Route Diversity!
42. Diversity: Burst-Error Tolerance, Variance Reduction with Multi-Paths
[Figure: path 1 error, path 2 error, ..., path n error, and the aggregate error]
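The variance-reduction effect can be quantified under a simplifying assumption of our own: every path's error fraction has the same variance σ² and every pair of paths has correlation ρ. The averaged error then has variance σ²(1 + (n - 1)ρ)/n, so independent paths divide the variance by n while perfectly correlated paths (ρ = 1) give no reduction at all.

```python
def aggregate_error_variance(sigma2: float, n: int, rho: float) -> float:
    """Variance of the average error fraction over n paths.

    Each path's error fraction has variance sigma2 and every pair of paths
    has correlation rho: Var = sigma2 * (1 + (n - 1) * rho) / n.
    """
    return sigma2 * (1 + (n - 1) * rho) / n


if __name__ == "__main__":
    for rho in (0.0, 0.5, 1.0):
        print(rho, aggregate_error_variance(1.0, 4, rho))
```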
43. Multi-Route (Path) Challenges
- Very bursty, lossy component paths need adaptive HARQ at the aggregate level.
- How to organize HARQ across paths (at the aggregate level)?
- How to efficiently achieve diversity gains from partially correlated paths?
- Note: perfect correlation gives no diversity gain.
- Bursty losses on paths become a resource, rather than a liability!
- How to scalably aggregate the information rate of a number of heterogeneous routes?
  - Different RTTs (e.g., 40 ms vs. 400 ms paths!)
  - Different nominal capacities (per-path windows)
  - Different per-path loss rates and burstiness characteristics
  - Many paths
- How does intelligent aggregation compare with naïve strategies?
44. Multi-Path LT-TCP Structure
[Figure: socket buffer]
- Map packets → paths intelligently, based upon Rank(p_i, RTT_i, w_i).
- Per-path congestion control (like TCP).
- Reliability at the aggregate, across paths (FEC block = weighted sum of windows; PFEC based upon a weighted average loss rate).
Note: our core ideas can be applied to other link-level multi-homing, network-level virtual paths, or non-TCP transport protocols.
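A sketch of the aggregate-level bookkeeping. The slides do not give the weights, so uniform weights are a placeholder, and the `PathState` class is hypothetical. The block size is a weighted sum of per-path windows, and the loss rate used for PFEC is averaged with each path's contribution to the block as its weight. The resulting (block size, loss rate) pair could then feed a single-path PFEC sizing rule.

```python
from dataclasses import dataclass


@dataclass
class PathState:
    cwnd_pkts: float   # per-path congestion window
    loss_rate: float   # per-path erasure-rate estimate


def aggregate_block(paths, weights=None):
    """Return (block_size, weighted_loss) for one multi-path FEC block.

    block_size = sum_i w_i * cwnd_i; the loss rate is averaged with each
    path's contribution to the block as its weight. Uniform weights by default.
    """
    weights = weights or [1.0] * len(paths)
    contrib = [w * p.cwnd_pkts for w, p in zip(weights, paths)]
    block = sum(contrib)
    if block == 0:
        return 0.0, 0.0
    loss = sum(c * p.loss_rate for c, p in zip(contrib, paths)) / block
    return block, loss
```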
45. Multi-Path Loss-Tolerant TCP
- Design features
  - FEC coding: reliability across all paths.
  - Block size is a weighted sum of per-path windows.
  - Concepts of loss-rate estimation and adaptive PFEC/RFEC/MSS are similar to LT-TCP, but need to be redesigned for multi-paths.
- Flow control
  - Per-path, done similarly to TCP, but ACKs for data sent can return on any path (reducing effective RTT)!
- Path ranking and packet mapping
  - Path rank is a function of the RTT, window size and loss rate of a path.
  - Used for intelligent mapping.
  - Goals: use longer paths for later blocks, and better (shorter, less lossy) paths for recovery of the current block.
- Adaptive MSS (Maximum Segment Size)
  - We modify the packet size on a path to reduce timeout probability. The variable-MSS scheme is simpler than in LT-TCP.
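A hypothetical version of the ranking-and-mapping goal. The slides name the inputs (RTT, window, loss rate) but not the rank function, so the score RTT/(1 - loss) and the open-window filter are our assumptions (loss rates are assumed below 1). Recovery packets for the current block go to the best open path; data for later blocks goes to the worst open path.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    rtt_s: float
    loss_rate: float  # assumed < 1
    cwnd_pkts: int
    inflight: int = 0


def delivery_score(path: Path) -> float:
    """Hypothetical rank metric (lower is better): RTT inflated by expected
    retransmissions, i.e. RTT / (1 - loss)."""
    return path.rtt_s / (1.0 - path.loss_rate)


def pick_path(paths, recovery: bool):
    """Send current-block recovery packets on the best open path and
    future-block data on the worst open path; None if every window is full."""
    open_paths = [p for p in paths if p.inflight < p.cwnd_pkts]
    if not open_paths:
        return None
    ranked = sorted(open_paths, key=delivery_score)
    return ranked[0] if recovery else ranked[-1]
```

Once the slow path's window fills, later-block data spills onto the next-worst open path, so no path sits idle.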
46. E.g.: Delay Heterogeneity
RTTs: 40 ms, 40 ms, k × 40 ms
Can we achieve scalable capacity aggregation despite this delay heterogeneity?
47. ? Delay Heterogeneity: Diversity-Aware vs. -Blind
[Figure panels: Diversity-Aware; Diversity-Blind]
Diversity-Aware: a small penalty is paid for 5-10X RTT heterogeneity (40 ms vs. 400 ms). The penalty declines for higher loss rates.
Diversity-Blind: 50% goodput penalty for all cases!
48. Bursty Loss Diversity in Routes
Equal RTTs: 40 ms. Each path has a 2-state Markov loss process (CTMC) with exponential sojourn times (avg. 250 ms), e.g., ON: 30% PER, OFF: 10% PER.
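A per-path loss trace for this setup can be generated as a two-state continuous-time Markov chain with exponential sojourns (mean 250 ms), alternating between the ON (30%) and OFF (10%) PER levels. The random initial state and function names are our choices. With equal mean sojourns, the long-run PER is the midpoint, 20%.

```python
import random


def ctmc_loss_trace(duration_s, mean_sojourn_s=0.25, per_on=0.30,
                    per_off=0.10, seed=None):
    """Two-state continuous-time Markov loss process for one path.

    Sojourn times in each state are exponential with mean `mean_sojourn_s`;
    returns (length_s, per) segments covering [0, duration_s).
    """
    rng = random.Random(seed)
    t, on, trace = 0.0, rng.random() < 0.5, []
    while t < duration_s:
        length = min(rng.expovariate(1.0 / mean_sojourn_s), duration_s - t)
        trace.append((length, per_on if on else per_off))
        t += length
        on = not on
    return trace


def mean_per(trace):
    """Time-averaged PER of a trace."""
    return sum(l * per for l, per in trace) / sum(l for l, _ in trace)
```

Generating one independent trace per path (different seeds) gives the kind of partially overlapping bursts that diversity-aware aggregation can exploit.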
49. Loss Diversity: Diversity-Aware vs. -Blind
50% goodput penalty with diversity-blind, and reduced goodput with ? paths
Total BW: 10 Mbps
50. Loss Diversity: Goodput Increase with Paths
51. Summary
- Improvement in TCP performance over lossy links with residual erasure rates of 0-50% (short- or long-term).
- LT-TCP design
  - Adaptive MSS ⇒ better flow of ACKs in the small-window regime.
  - Adaptive FEC (proactive and reactive) protects critical packets appropriately.
  - Adaptive ⇒ no overhead when there is no loss.
  - ECN to distinguish congestion from loss.
- LL-HARQ
  - Adaptation of ideas to build a link-layer HARQ scheme with a delay constraint (1 HARQ attempt).
  - Division of reliability functions between the transport and link layers.
- Multi-path LT-TCP
  - Extension to multi-path diversity.
  - Can handle heterogeneity in RTT, bandwidth, losses, and burstiness/outage due to any reason.
52. Related Projects
- Cross-layer issues in reliability for high-speed mesh networks
- WiMAX PHY/MAC modeling in ns-2 for large-scale simulations
- Cooperative MIMO/ST-coding, cooperative FEC for mesh/open-spectrum networks
- Free-space-optical (FSO) meshed networks: auto-configuration, space-time diversity
- Large-scale vehicular nets, DTNs: random walks and weak-state routing in large-scale, highly mobile MANETs, vehicular networks, and delay-tolerant opportunistic networks
53. Thanks!
- Vijay Subramanian
  - subrav@rpi.edu (Rensselaer Polytechnic Institute)
- Shivkumar Kalyanaraman
  - shivkuma@ecse.rpi.edu (Rensselaer Polytechnic Institute)
- K. K. Ramakrishnan
  - kkrama@research.att.com (AT&T Labs Research)
Papers, PPTs, audio talks, class videos: shiv rpi
PS: new grad course on broadband wireless communications online
54. Shortened Reed-Solomon FEC (per Window)
[Figure: shortened RS(N,K) block for one window (W): d data packets padded with z zeros (K = d + z), proactive FEC (P) and a reactive FEC inventory (R); block size N]
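Shortening is a standard way to use one fixed RS(N,K) code when a window carries fewer than K data packets. The z = K - d unused data positions are treated as zeros that are never transmitted, which leaves an (N - z, d) code with the same N - K parity symbols, still MDS. The sketch computes those parameters. The RS(255, 223) example and the reading that P and R are drawn from the N - K parity symbols are our assumptions.

```python
def shortened_rs(n: int, k: int, d: int) -> tuple[int, int, int]:
    """Shorten an RS(n, k) code so that a block carries only d <= k data symbols.

    The z = k - d remaining data positions are fixed to zero and never sent,
    which yields an (n - z, d) code with the same n - k parity symbols (still MDS).
    Returns (transmitted_block_length, d, z).
    """
    if not 0 < d <= k < n:
        raise ValueError("need 0 < d <= k < n")
    z = k - d
    return n - z, d, z
```

For example, a 20-packet window coded with RS(255, 223) becomes a (52, 20) code: 20 data packets plus up to 32 parity packets to split between proactive and reactive FEC.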
55. Co-existence of LT-TCP and SACK: Reaction to Loss, Congestion Windows
- 5 TCP-SACK and 5 LT-TCP flows. At t = 50 s, a burst error event occurs for a 100 ms period, with PER set to 50%.
- The congestion window for TCP-SACK is as shown.
- Recovery of cwnd for TCP-SACK after t = 50 s shows:
  - Following a timeout, TCP-SACK recovers quickly.
  - It does not get beaten down by LT-TCP's behavior during this vulnerable period.
- LT-TCP does not suffer a timeout during the loss period.
56. Distribution of Units Required in Round 2 (X)
57. Latency: LL-HARQ vs. LL-ARQ
- Even with 1 hop, latency effects are significant.