Title: An Active Reliable Multicast Framework for the Grids
1An Active Reliable Multicast Framework for the Grids
- M. Maimour, C. Pham
- ICCS 2002, Amsterdam
- Network Support and Services for Computational
Grids - Sunday, April 21st, 2002
Action INRIA-RESO
http://www.ens-lyon.fr/LIP/RESAM
2Outline
- Motivations behind (reliable) multicast
- Use of active networks: the DyRAM protocol
- DyRAM main services
- Simulation results
- Conclusion
3From unicast
- Problem
- Sending the same data to many receivers via unicast is inefficient.
[Figure: a Sender, three Receivers, and six "data" packets on the links between them]
4to multicast on the Internet.
- Problem
- Sending the same data to many receivers via unicast is inefficient.
- Solution
- Using multicast is more efficient.
[Figure: a Sender, three Receivers, and four "data" packets on the links between them]
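The efficiency argument can be made concrete by counting link traversals. The sketch below is illustrative only: the topology (one sender, one router, three receivers) and the function names are assumptions, not taken from the slides.

```python
def unicast_link_transmissions(parent, source, receivers):
    """Count link traversals when the source sends one copy per receiver."""
    total = 0
    for receiver in receivers:
        node = receiver
        while node != source:
            total += 1            # this copy crosses the link parent[node] -> node
            node = parent[node]
    return total


def multicast_link_transmissions(parent, source, receivers):
    """Count link traversals when every tree link carries the packet once."""
    used_links = set()
    for receiver in receivers:
        node = receiver
        while node != source:
            used_links.add((parent[node], node))
            node = parent[node]
    return len(used_links)


# Hypothetical topology: sender -> router -> {r1, r2, r3}
parent = {"router": "sender", "r1": "router", "r2": "router", "r3": "router"}
receivers = ["r1", "r2", "r3"]

print(unicast_link_transmissions(parent, "sender", receivers))    # 6
print(multicast_link_transmissions(parent, "sender", receivers))  # 4
```

With unicast the sender-to-router link carries the same data three times; with multicast it carries it once and the router duplicates it.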
5Reliable multicast
- At the routing level, IP Multicast efficiently delivers packets to all the receivers subscribed to a multicast session, but without any reliability guarantees.
- Reliability (including flow and congestion control) is to be addressed at the transport level.
6Reliable multicast: a big win for grids
- Data replications
- Database updates
- Code and data transfers
- Data communications for distributed applications (collective gather operations, sync. barrier)
[Figure: grid sites joined to multicast address group 224.2.0.1: SDSC IBM SP (1024 procs, 5x12x17 = 1020); NCSA Origin Array (256+128+128, 5x12x(4+2+2) = 480); CPlant cluster (256 nodes)]
7Reliable multicast strategies
- End-to-end solutions
- Only the end hosts (the source and/or the receivers) are involved.
- Problem: the lack of topology information at the end hosts.
- In-network solutions
- Some intermediate nodes (router/server) are involved in the recovery process.
8Active networking solutions
- Active routers are able to perform customized computations on incoming packets:
- cache of data,
- feedback aggregation,
- filtering, subcasting,
-
9The DyRAM framework for grids (Dynamic Replier Active Reliable Multicast)
- In order to enable distributed grid applications, the main design goals are:
- low recovery latency using local recovery
- low memory usage in routers: local recovery is performed from the receivers (no cache in routers)
- low processing overheads in routers: light active services
10DyRAM loss recovery strategy: main active services
- DyRAM is NACK-based
- Global NACK suppression
- Early packet loss detection
- Subcast of repair packets
- Dynamic replier election
11Global NACKs suppression
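A minimal sketch of what NACK suppression at an active router could look like, assuming the router forwards only the first NACK it sees for a given lost packet and remembers which downstream links asked for it. The class and method names are hypothetical and the slides do not specify the router state; this is an illustration, not the DyRAM implementation.

```python
class NackSuppressor:
    """Per-router NACK state: one upstream NACK per lost sequence number."""

    def __init__(self):
        # seq -> set of downstream links that sent a NACK for it (assumed state)
        self.pending = {}

    def on_nack(self, seq, link):
        """Record the NACK; return True only if it should go upstream."""
        first_request = seq not in self.pending
        self.pending.setdefault(seq, set()).add(link)
        return first_request

    def on_repair(self, seq):
        """Return the links that asked for seq and clear the state."""
        return self.pending.pop(seq, set())


router = NackSuppressor()
print(router.on_nack(7, "link-A"))  # True: first NACK for packet 7, forwarded
print(router.on_nack(7, "link-B"))  # False: duplicate, suppressed
```

The state is dropped as soon as the repair passes through, which keeps the memory held in the router small.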
12Early packet loss detection
The repair latency can be reduced if the lost packet is requested as soon as possible.
These NACKs are ignored!
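One way to picture early detection: the active router watches the sequence numbers of the data packets it forwards, requests a missing packet itself as soon as it sees a gap, and then ignores receiver NACKs for a packet it has already requested. The sketch below assumes in-order sequence numbers starting at 0; the class and its names are hypothetical.

```python
class EarlyLossDetector:
    """Gap detection on sequence numbers at an active router (illustrative)."""

    def __init__(self):
        self.next_expected = 0
        self.requested = set()  # packets this router has already NACKed

    def on_data(self, seq):
        """Return the sequence numbers the router should NACK upstream now."""
        missing = []
        if seq > self.next_expected:
            # Everything between the expected packet and this one was lost.
            missing = list(range(self.next_expected, seq))
            self.requested.update(missing)
        self.requested.discard(seq)  # a requested packet (repair) has arrived
        self.next_expected = max(self.next_expected, seq + 1)
        return missing

    def on_receiver_nack(self, seq):
        """Return False when the NACK is redundant and can be ignored."""
        return seq not in self.requested


router = EarlyLossDetector()
router.on_data(0)
router.on_data(1)
print(router.on_data(4))           # [2, 3]: requested by the router itself
print(router.on_receiver_nack(2))  # False: already requested, ignored
```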
13Replier election
- A receiver is elected to be a replier for each lost packet (one recovery tree per packet).
- Load balancing can be taken into account for the replier election.
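The election can be sketched as follows, assuming the router picks the replier among the downstream links that did not send a NACK for the packet (so they presumably hold it) and balances load by preferring the link elected least often so far. Both the candidate rule and the tie-break are assumptions made for illustration.

```python
def elect_replier(all_links, nack_links, load):
    """Pick a downstream link to answer a loss, or None if nobody has the packet.

    all_links:  downstream links of this router
    nack_links: links that sent a NACK for the lost packet
    load:       dict link -> number of times it has been elected (updated in place)
    """
    candidates = [link for link in all_links if link not in nack_links]
    if not candidates:
        return None  # the request has to go further upstream
    # Least-loaded candidate first; the link name breaks ties deterministically.
    chosen = min(candidates, key=lambda link: (load.get(link, 0), link))
    load[chosen] = load.get(chosen, 0) + 1
    return chosen


load = {}
links = ["R1", "R2", "R3"]
print(elect_replier(links, {"R1"}, load))  # R2
print(elect_replier(links, {"R1"}, load))  # R3: R2 has already served once
```

Because the choice is made per lost packet, consecutive losses can be served by different receivers.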
14Replier election and repair subcast
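[Figure: DyRAM active routers D0 and D1 with numbered links (0, 1, 2) and receivers R1 to R7, illustrating replier election and a repair labelled "Repair 2"]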
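Subcasting a repair can be sketched as a filter on the router's downstream links: the repair is forwarded only on links that reported the loss, and never back on the link it arrived from. The function and parameter names are hypothetical, and the link numbers are only an example.

```python
def subcast_links(downstream_links, nack_links, arrival_link=None):
    """Return the links on which a repair packet should be forwarded."""
    return [
        link
        for link in downstream_links
        if link in nack_links and link != arrival_link
    ]


# The repair comes from a replier on link 0; only link 1 reported the loss.
print(subcast_links([0, 1, 2], {1}, arrival_link=0))  # [1]
```

A plain multicast of the repair would reach every link, including receivers that already have the packet.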
15The DyRAM framework for grids
- The backbone is very fast, so it runs nothing other than fast forwarding functions.
- Any receiver can be elected as a replier for a lost packet.
- A hierarchy of active routers can be used for processing specific functions at different layers of the hierarchy.
[Figure: a source attached over 1000 Base FX to active routers around a core network running at Gbit/s rates, with further active routers attached over 100 Base FX. One group of active routers is labelled NACK suppression, subcast, loss detection; another is labelled NACK suppression, subcast, replier election.]
16Some simulation results
- Network model and metrics used
- Local recovery from the receivers
- DyRAM vs. ARM (cache in routers)
- DyRAM early lost packet detection
17Network model
10 MBytes file transfer
Source router
18Metrics
- Load at the source: the number of retransmissions from the source.
- Load at the network: the consumed bandwidth.
- Completion time per packet (latency).
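The three metrics could be computed from a simulation trace along these lines. The record layout, the unit of bandwidth (bytes times hops), and the reading of completion time as the delay between the first transmission and the moment the last receiver holds the packet are all assumptions; the slides do not give the exact definitions.

```python
from dataclasses import dataclass


@dataclass
class Transmission:
    seq: int         # sequence number of the packet
    sender: str      # node that emitted this copy
    hops: int        # number of links the copy crossed
    is_repair: bool  # True for a retransmission


def source_load(log, source="source"):
    """Number of retransmissions emitted by the source."""
    return sum(1 for t in log if t.sender == source and t.is_repair)


def network_load(log, packet_size):
    """Consumed bandwidth, counted as bytes carried over all links."""
    return sum(t.hops * packet_size for t in log)


def mean_completion_time(sent_at, completed_at):
    """Average per-packet delay between first send and full delivery."""
    return sum(completed_at[s] - sent_at[s] for s in sent_at) / len(sent_at)


log = [
    Transmission(0, "source", 3, False),
    Transmission(1, "source", 3, False),
    Transmission(1, "source", 3, True),  # repair sent by the source
    Transmission(0, "R2", 2, True),      # repair sent by a replier
]
print(source_load(log))         # 1
print(network_load(log, 1000))  # 11000
```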
19Local recovery from the receivers (1)
4 receivers/group
- Local recoveries reduce the end-to-end delay (especially for high loss rates and a large number of receivers).
grp: 6 to 24
p = 0.25
20Local recovery from the receivers (2)
- As the group size increases, doing the recoveries from the receivers greatly reduces the bandwidth consumption.
48 receivers distributed in g groups => grp: 2 to 24
21DyRAM vs ARM
- ARM performs better than DyRAM only for very low
loss rates and with considerable caching
requirements
22DyRAM early lost packet detection
- The end-to-end latency is decreased when the early lost packet detection is enabled.
[Plot parameters: 4 receivers/group; grp: 6 to 24]
23Conclusions
- Reliability on large-scale multicast is difficult.
- Active services can provide more efficient solutions for reliable multicast related problems.
- The main DyRAM design goal is reducing end-to-end latencies using active services, which are kept as light as possible, making DyRAM more suitable for grid applications.