Title: The GRID ... Bad and Good News from the Network
1 The GRID ... Bad and Good News from the Network
Michael Welzl, michael.welzl_at_uibk.ac.at
Distributed and Parallel Systems Group, Institute of Computer Science, University of Innsbruck
2 Intro: How Grid engineers see the Internet
- Abstraction: simply use what is available
- still, performance is important (often the main goal)!
- sometimes strict performance bounds are required!
- Existing transport system (TCP/IP, routing, ..) takes good care of everything
- QoS makes things better, the Grid needs it!
- we now have a chance for that, thanks to IPv6
- Main problem: separated viewpoints!
- Grid engineers don't care about the network
- Network engineers don't care about applications on top
- Considering both things at once: theoretically ideal, but extra work :-(
3 The Bad News
- Brief overview of how some relevant things work
4 Current typical Internet functionality
- Transport system
- TCP ... reliable data stream with window-based congestion control
- UDP ... unreliable, take care of everything yourself
- QoS
- implemented, but not available on a global basis for several reasons
- one of them: global accounting regulations ... will not be solved!
- IPv6 doesn't change anything
- QoS can be made available to certain specific (virtual) communities!
- Measurements
- estimations, predictions etc. ... but no guarantees!
5 The Congestion Control problem
- Congestion control necessary
- adding fast links does not help!
total throughput w/o cc.: 20 kb/s
total throughput w/ cc.: 110 kb/s
6 TCP Congestion Control /1
- 1968/69: dawn of the Internet
- 1986: first congestion collapse
- 1988: "Congestion Avoidance and Control" (Jacobson/Karels): combined congestion/flow control for TCP
- Goal: stability - in equilibrium, no packet is sent into the network until an old packet leaves
- ack clocking, "conservation of packets" principle
- made possible through window-based stop-go behaviour
- Superposition of stable systems = stable
- -> network based on TCP with congestion control = stable
7 TCP Congestion Control /2
- If a packet or ack is lost (timeout, roughly 4 * RTT), set cwnd = 1, ssthresh = current bandwidth / 2 ("multiplicative decrease") - exponential backoff
- Several timers, based on RTT: good estimation is crucial!
- Later additions (TCP Reno, 1990): fast retransmit / fast recovery (notify sender of loss via duplicate acks)
Congestion Avoidance (linear)
Slow Start (exponential)
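The window behaviour described on this slide can be sketched in a few lines. This is a simplified, Tahoe-style model and not a faithful TCP implementation: one step per round trip, losses are signalled only by timeout, and the function name, the initial ssthresh of 16 segments and the loss schedule are assumptions chosen for illustration.

```python
def simulate_cwnd(rounds, loss_rounds, initial_ssthresh=16):
    """Return the congestion window (in segments) at the start of each round trip.

    Simplified model: a timeout occurs in every round listed in loss_rounds;
    fast retransmit / fast recovery are not modelled.
    """
    cwnd, ssthresh = 1, initial_ssthresh
    trace = []
    for rnd in range(rounds):
        trace.append(cwnd)
        if rnd in loss_rounds:
            # Timeout: halve the threshold (multiplicative decrease), restart at 1.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            # Slow start: the window doubles every round trip (exponential).
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: one more segment per round trip (linear).
            cwnd += 1
    return trace


print(simulate_cwnd(rounds=12, loss_rounds={7}))
# [1, 2, 4, 8, 16, 17, 18, 19, 1, 2, 4, 8]
```

The trace shows the exponential phase up to ssthresh, the linear phase after it, and the restart from a window of 1 after the timeout in round 7.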
8 TCP Congestion Control /3
- Timeout interpreted as congestion
- does not work well with noisy links!
- TCP over "long fat pipes"
- long time to reach equilibrium, MD problematic!
- Active Queue Management (something other than FIFO)
- circumvent synchronization (traffic phase effects)
- recent addition: ECN (notify sender of congestion using 1 bit)
- partially deployed - makes everything even less predictable!
- TCP rate fluctuations undesirable for, e.g., streaming media!
- proposals for TCP-friendly congestion control with smoother rate
- TCP very bad match for sporadic traffic (transience -> goto start)
9 The only alternative (?) ... UDP
- Unreliable stream: quite useless for the Grid
- implement reliability on top
- implement congestion control? ... too sophisticated
- Fairness (TCP-friendliness) is an issue!
- single UDP flow can harm a large number of TCP flows...
- problems with your own applications
- problems with your ISP (will look like an attack)
- danger of global congestion collapse
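One common way to make "TCP-friendliness" concrete is the simplified steady-state TCP throughput approximation, rate = (MSS / RTT) * sqrt(3 / (2p)), where p is the packet loss rate. The sketch below uses it to cap the sending rate of a UDP-based sender. The function names and the example numbers are assumptions for illustration, and the formula ignores timeouts, so it overestimates TCP throughput at high loss rates.

```python
import math


def tcp_friendly_rate(mss_bytes, rtt_s, loss_rate):
    """Approximate steady-state TCP throughput in bytes per second.

    Simplified model: rate = (MSS / RTT) * sqrt(3 / (2 * p)); timeouts are ignored.
    """
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return (mss_bytes / rtt_s) * math.sqrt(1.5 / loss_rate)


def capped_send_rate(wanted_rate, mss_bytes, rtt_s, loss_rate):
    """Limit a UDP sender to what a TCP flow would get on the same path."""
    return min(wanted_rate, tcp_friendly_rate(mss_bytes, rtt_s, loss_rate))


# Hypothetical path: 1460-byte segments, 100 ms RTT, 1% loss.
fair = tcp_friendly_rate(1460, 0.1, 0.01)
print(f"TCP-friendly rate: {fair / 1000:.1f} kB/s")
print(capped_send_rate(1_000_000, 1460, 0.1, 0.01) == fair)
```

A sender that measures RTT and loss and stays below this rate competes with TCP flows on roughly equal terms instead of pushing them aside.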
10 Simulations by a student in Linz
11-18 (Plots of the simulation results; x-axis: CBR rate, y-axis: TCP rate)
19 Elephants and Mice
20 Internet traffic characteristics
- MRTG trace (based on SNMP, accessing traffic counters in the MIB)
21 Internet traffic characteristics /2
- Traditional traffic modelling: queuing theory; notion: traffic follows a Poisson distribution
- Internet traffic is bursty - intuitive reasons:
- TCP is bursty by nature: congestion avoidance, payload vs. acks
- ACK compression can cause payload bursts due to ACK-clocking
- various packet sizes
- Bursts from queues aggregate as traffic traverses the net
- Burstiness of one flow affects other adaptive flows
22 Internet traffic characteristics /3
- Overlapping of independent on-off sources leads to a distribution with a heavy-tailed autocorrelation function
- Long-range dependence: "peaks sit on ripples which sit on waves"
- No "flattening" towards a mean as you zoom out - same structures may be found at different time scales, hence "self-similar"
- Traffic characteristics sometimes modeled with time series (fARIMA models) or wavelets
- Measurement of the "degree of self-similarity": Hurst parameter
- -> model approximation involves Hurst parameter estimation
- Calculations extremely difficult - Internet analysis mostly based on simulation!
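To illustrate what Hurst parameter estimation can look like in practice, here is a minimal sketch of one simple estimator, the aggregated-variance method. It is not necessarily the method the talk has in mind. The series is averaged over blocks of size m; for a self-similar process the variance of the block means scales like m^(2H-2), so H follows from the slope of a log-log fit. The function name and block sizes are assumptions, and the estimator is known to be rough on short traces.

```python
import math
import random


def hurst_aggregated_variance(series, block_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate the Hurst parameter H from the variance of block means."""
    xs, ys = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        mu = sum(means) / n_blocks
        var = sum((v - mu) ** 2 for v in means) / n_blocks
        if var > 0:
            xs.append(math.log(m))
            ys.append(math.log(var))
    if len(xs) < 2:
        raise ValueError("series too short for the chosen block sizes")
    # Least-squares slope of log(variance) against log(block size).
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return 1 + slope / 2  # slope = 2H - 2


random.seed(1)
noise = [random.gauss(0, 1) for _ in range(20000)]
print(round(hurst_aggregated_variance(noise), 2))
```

Independent noise, as in the example, should give a value close to 0.5; long-range dependent traffic yields values between 0.5 and 1.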
23 Measuring the network
- When you measure, you measure the past
- predictions / estimations with a ?? chance of success
- When you measure, you change the system
- think of CBR vs. TCP
- non-intrusiveness really important (e.g., monitor TCP behavior)
- Measurements yield no guarantees
- Internet traffic: result of user behavior!
- Research carried out in controllable, isolated environments
- Field trials are a necessary extra when you know that something works
24 The Good News
- Some helpful things that are, will, and can be done
25 Two great new standards
- SCTP
- standard mostly finished, might already be deployed in your next Windows / Linux / .. version
- Basically like TCP, but more efficient in certain aspects
- Example problem that is solved with SCTP:
- packets 1, 3 arrive
- TCP receiver keeps no. 3 until packet 2 arrives
- no way to force TCP to hand over packet 3
- no notion of separate packets on top of TCP - just a data stream
- Disadvantage: TCP congestion control
- DCCP
- Will definitely become a standard, but not as soon as SCTP
- Well-defined framework for unreliable TCP-friendly CC. schemes
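The packet 1, 3, 2 example above can be replayed with a toy model. The sketch below is not an SCTP or socket API; both function names are hypothetical. It only contrasts strictly in-order delivery (as with a TCP byte stream) with per-message delivery on arrival (as with SCTP's unordered delivery option).

```python
def ordered_delivery(arrivals):
    """TCP-like: hand data to the application strictly in sequence order.

    Returns, for each arriving packet, the list of packets handed over at that moment.
    """
    buffered, next_seq, delivered = set(), 1, []
    for seq in arrivals:
        buffered.add(seq)
        batch = []
        while next_seq in buffered:
            buffered.remove(next_seq)
            batch.append(next_seq)
            next_seq += 1
        delivered.append(batch)
    return delivered


def unordered_delivery(arrivals):
    """SCTP-like unordered delivery: each message is handed over on arrival."""
    return [[seq] for seq in arrivals]


arrivals = [1, 3, 2]  # packet 2 is delayed
print(ordered_delivery(arrivals))    # [[1], [], [2, 3]]
print(unordered_delivery(arrivals))  # [[1], [3], [2]]
```

With in-order delivery the application sees nothing when packet 3 arrives; it has to wait for packet 2, even if packet 3 is independent of it.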
26 The future of transport
- What happens when we have TCP + UDP + SCTP + DCCP?
- Increasingly hard to decide which protocol / parameters to use!
- Proposed solution: hide details from the application, decide based on a requirements specification
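A requirements-driven choice could look roughly like the sketch below. Everything here is hypothetical: the `Requirements` fields, the `choose_protocol` function and the decision rules are illustrative assumptions, not an existing API and not the selection logic proposed in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical application requirements (illustrative only)."""
    reliable: bool                      # every message must arrive
    ordered: bool                       # strict in-order delivery needed
    congestion_controlled: bool = True  # sender must be TCP-friendly


def choose_protocol(req):
    """Map a requirements specification to a transport protocol name."""
    if req.reliable:
        # Strict ordering: plain TCP; otherwise SCTP avoids head-of-line blocking.
        return "TCP" if req.ordered else "SCTP"
    # Unreliable traffic: DCCP adds congestion control, raw UDP has none.
    return "DCCP" if req.congestion_controlled else "UDP"


print(choose_protocol(Requirements(reliable=True, ordered=False)))   # SCTP
print(choose_protocol(Requirements(reliable=False, ordered=False)))  # DCCP
```

The application states what it needs; which protocol and parameters satisfy that stays hidden behind the interface and can change as new protocols are deployed.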
27 Something special about Grid traffic
- Predictable traffic pattern!
- There must be a way to exploit this...
- This is totally new to the Internet!
- Web: users create traffic
- FTP download: starts ... ends
- Streaming video: either CBR or depends on content! (head movement, ..)
- How to predict:
- automatic prediction
- analysis of running system
- analysis of source code
- specification as part of Grid Service Engineering
- Related: signaling traffic
- usually not a large amount of data
- to date, no serious efforts for tailored congestion control
28 Requirement + Behavior spec = better Transport!
- Example 1: require delay bound
- Possibility:
- Grid transport system (protocol interface) transmits packets 1, 2, 3, 4
- 1, 3, 4 received
- Delay bound of 3 exceeded: do not request retransmission of packet 2!
- save bandwidth - more efficient transmission of subsequent packets!
- Not possible with any existing transport protocol, not even SCTP!
- Example 2: is it fair that my sporadic traffic receives less network support during 10 minutes than a 10 minute FTP download?
- Possibility:
- reward transience with "aggression points", decrease points in times of transport
- such (unfair) ideas require IETF standardization
- but this is an option!!!
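The receiver-side decision in Example 1 can be sketched as follows. This is a hypothetical illustration, not an existing protocol: the function name is invented, and it assumes that each received packet carries a send timestamp comparable to the receiver's clock. A missing packet was sent before any later packet, so once a later packet is past its delay bound, a retransmission of the missing one would arrive too late as well.

```python
def retransmissions_to_request(missing, received, now, delay_bound):
    """Return the missing sequence numbers still worth requesting.

    missing:     sequence numbers that have not arrived
    received:    dict mapping sequence number -> send timestamp (seconds)
    now:         current time at the receiver (same clock, an assumption)
    delay_bound: maximum useful age of a packet in seconds
    """
    requests = []
    for seq in sorted(missing):
        later_send_times = [t for s, t in received.items() if s > seq]
        # If a later packet is already too old, the missing one is too old as well.
        expired = any(now - t > delay_bound for t in later_send_times)
        if not expired:
            requests.append(seq)
    return requests


received = {1: 0.00, 3: 0.02, 4: 0.03}  # packets 1, 3, 4 arrived; 2 is missing
print(retransmissions_to_request([2], received, now=0.05, delay_bound=0.1))  # [2]
print(retransmissions_to_request([2], received, now=0.15, delay_bound=0.1))  # []
```

In the second call packet 3 has exceeded its delay bound, so packet 2 is not requested again and the bandwidth goes to subsequent packets.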
29 Something special about the Grid itself
- Distributed system, active for a certain duration
- Can exploit distributed transport strategies
- Multicast
- P2P paradigm: "do work for others to enhance the total system" (for your own good) - e.g. transcoding, overlay multicast, ..
- Can exploit highly sophisticated network measurements!
- some take a long time
- some require a distributed infrastructure
- Examples:
- TCP monitoring
- Topology mapping (consider impact of stream 1 on stream 2)
- Packet pair based approaches
30 Example: pathchar (alternative: pchar)
- Underlying technique: packet pair
- send a large packet p1 followed by a small packet p2
- high probability that p2 is enqueued exactly behind p1
- at receiver: calculate bottleneck bandwidth via time between p1 and p2
- minimize error via multiple probes
- problem: different queueing mechanisms at bottleneck
- pathchar to wwoz.org (128.121.224.134)
- can't find path mtu - using 1500 bytes.
- doing 32 probes at each of 45 sizes (64 to 1500 by 32)
- 0 localhost
- 7936 Mb/s, 114 us (230 us)
- 1 sr1 (138.232.24.126)
- 819 Mb/s, 4 us (253 us)
- 2 r1b (138.232.10.126)
- ?? b/s, 9 us (256 us)
- 3 Ibk-GBS.ACO.net (193.171.19.1)
- 841 Kb/s, -1383 us (11.8 ms)
- 4 Wien2.ACO.net (193.171.12.209)
- ?? b/s, 1.33 ms (10.7 ms)
Note: output must be interpreted carefully!
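The packet pair calculation itself is short. The sketch below assumes that p2 was queued directly behind p1 at the bottleneck, so the gap between their arrivals equals the time the bottleneck link needs to transmit p2. The median over several probes limits the effect of pairs that were separated by cross traffic. The function name and sample values are assumptions; this is not how pathchar is implemented internally.

```python
import statistics


def bottleneck_bandwidth_bps(p2_size_bytes, arrival_gaps_s):
    """Estimate bottleneck bandwidth (bit/s) from packet pair arrival gaps.

    Each gap is the time between the arrival of p1 and p2 for one probe.
    """
    estimates = [p2_size_bytes * 8 / gap for gap in arrival_gaps_s if gap > 0]
    if not estimates:
        raise ValueError("need at least one positive arrival gap")
    # The median is robust against probes disturbed by cross traffic.
    return statistics.median(estimates)


# Hypothetical measurements: 1500-byte p2, one probe delayed by other traffic.
gaps = [0.0012, 0.0012, 0.0030, 0.0012, 0.0011]
print(f"{bottleneck_bandwidth_bps(1500, gaps) / 1e6:.1f} Mb/s")  # about 10 Mb/s
```

Negative or implausible values, such as the -1383 us in the pathchar output above, show why single probes cannot be trusted and results need careful interpretation.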
31 Vision
GRID
GRID TRANSPORT Layer (provide transparent QoS, perform sophisticated measurements, utilize knowledge of Grid behavior, ..)
Internet
32 The End
- ... of the presentation, but the beginning of a better Grid! :-)