Title: A Survey on Parallel Computing in Heterogeneous Grid Environments
1. A Survey on Parallel Computing in Heterogeneous Grid Environments
- Takeshi Sekiya
- Chikayama-Taura Laboratory M1
- Nov 24, 2006
2. Parallel Computing in Grid Environments
- Growing opportunities to use multi-cluster environments
- But schemes designed for stand-alone clusters cause problems in grid-like usage
- New mechanisms are needed
  - Handling heterogeneity
  - Firewall/NAT traversal
  - Adaptation to dynamic environments
  - Monitoring
(Figure labels: dynamic change of CPU/network load; heterogeneous hardware and software; firewall/NAT; maintenance; complex configuration; failure; difficult to know what's happening)
3. Heterogeneous Environments
- Heterogeneous machines
  - Binaries are different
  - Complex configuration is required when hardware/software differs
- Heterogeneous networks
  - Synchronization overhead in parallel applications when latency/bandwidth differs
- Firewalls/NATs
4. Firewall/NAT
- Firewalls/NATs hinder bi-directional connectivity
- Bi-directional TCP/IP connectivity needs to be provided to support a wide spectrum of applications
(Figure label: Firewall or NAT)
5. Solutions to the Internet Asymmetric-Connectivity Problem
- MPI Environment on Grid with Virtual Machines (Tachibana et al. 2006)
  - Xen for VMs and VPN for the virtual network
  - Low-cost VM migration
- ViNe (Tsugawa et al. 2006)
  - A host called the Virtual Router
  - Overlay-network based
- WOW (Ganguly et al. 2006)
6. Outline
- Introduction
- WOW
- IPOP (IP over P2P)
- Routing IP on the P2P Overlay
- Connection Setup
- Joining an Existing Network
- NAT Traversal
- Experiments
- Summary
7. Objective and Approach
- The system is architected to
  - Adapt to heterogeneous environments
  - Present a cluster-like environment to end users
  - Scale to a large number of nodes
  - Facilitate the addition of nodes through self-organization of the virtual network
  - Require less manual configuration
- Approach with virtualization
  - Virtual machines
    - Homogeneous software
  - Self-organizing overlay network
    - All-to-all connectivity
8. Virtual Machine
- A homogeneous software environment
- Offers opportunities for load balancing and fault tolerance
- Users can use pre-configured systems
  - Linux distribution
  - Libraries and software
9. Virtual Network
(Figure labels, in the order shown: Virtual Grid Cluster; IPOP (IP over P2P); P2P Network; Physical Infrastructure; firewall; P2P overlay network)
10. IPOP (Ganguly et al. 2006)
- Characteristics
  - A virtual IP address space
  - Self-organizing
- Architecture
  - IP tunneling over P2P
  - A virtualized network interface (tap) captures virtual IP packets
  - Brunet P2P overlay network
11. Capturing Virtual IP Packets
- The tap appears to applications as a network interface
- IPOP translates virtual IP addresses to Brunet P2P network addresses
(Figure labels: application, tap, and IPOP on each of two hosts; Tunneling)
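The address translation step can be illustrated with a minimal sketch. It assumes that the overlay address is obtained by hashing the virtual IP with SHA-1, which is consistent with the 160-bit SHA-1 addresses mentioned for Brunet; the function name `overlay_address` and the sample IP are hypothetical, and the actual IPOP encoding may differ.

```python
import hashlib
import ipaddress

def overlay_address(virtual_ip: str) -> int:
    """Map a virtual IPv4 address to a 160-bit overlay address (assumed scheme)."""
    packed = ipaddress.IPv4Address(virtual_ip).packed  # 4 raw bytes
    digest = hashlib.sha1(packed).digest()             # 20 bytes = 160 bits
    return int.from_bytes(digest, "big")

# Hypothetical virtual IP used only for illustration.
addr = overlay_address("172.16.0.2")
print(f"{addr:040x}")  # 40 hex digits = 160 bits
```

Hashing spreads virtual IPs roughly uniformly over the address space, so nodes land at scattered positions on the ring regardless of how the virtual IPs were assigned.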
12. Brunet P2P
- Ring-structured overlay
- Organized connections
  - Near: with neighbors on the ring
  - Far: across the ring
- 160-bit SHA-1 hash addresses
- Greedy routing
  - Each node has a constant number of connections
  - O(log^2(n)) overlay hops
(Figure: ring of nodes n1 to n12 showing a multi-hop path from n1 to n7)
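Greedy routing on a ring can be sketched as follows. This is an illustrative model, not Brunet's implementation: addresses are small integers instead of 160-bit hashes, and the names `ring_distance`, `greedy_route`, and the 12-node example topology are assumptions made for the example.

```python
def ring_distance(a, b, size):
    """Shortest distance between two addresses on a ring of `size` addresses."""
    d = abs(a - b) % size
    return min(d, size - d)

def greedy_route(connections, src, dst, size):
    """Forward hop by hop to the connected node that is closest to dst."""
    path = [src]
    current = src
    while current != dst:
        best = min(connections[current],
                   key=lambda peer: ring_distance(peer, dst, size))
        if ring_distance(best, dst, size) >= ring_distance(current, dst, size):
            break  # no connection is closer: current is the nearest node
        path.append(best)
        current = best
    return path

# Toy ring of 12 nodes: near connections to both neighbors,
# plus one far connection between nodes 0 and 4.
SIZE = 12
ring = {i: [(i - 1) % SIZE, (i + 1) % SIZE] for i in range(SIZE)}
ring[0].append(4)
ring[4].append(0)

print(greedy_route(ring, 0, 6, SIZE))  # [0, 4, 5, 6]
```

The far connection shortens the path from six hops to three. The early exit also shows what happens to a packet addressed to a node that is not in the ring: it stops at the nearest existing node.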
13. Connection Setup: Connection Protocol
- Node A wishes to connect to node B
- A sends a CTM (Connect To Me) request to B over the P2P network
  - The CTM request contains A's URI
- When B receives the CTM request, B sends a CTM reply to A
  - The CTM reply contains B's URI
(Figure: A sends a CTM request to B; B returns a CTM reply)
URI (Uniform Resource Indicator), e.g. brunet.tcp:192.0.0.1:1024
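The CTM exchange can be sketched as two small message types and a peer that records the URIs it learns. The class and method names are hypothetical, the second URI is invented for illustration, and overlay transport is reduced to a direct method call.

```python
from dataclasses import dataclass

@dataclass
class CtmRequest:
    sender_uri: str   # URI of the node that wants to be connected to

@dataclass
class CtmReply:
    sender_uri: str   # URI of the node that answers

class Peer:
    def __init__(self, uri):
        self.uri = uri
        self.known_uris = []   # URIs learned from CTM messages

    def make_ctm_request(self):
        return CtmRequest(sender_uri=self.uri)

    def on_ctm_request(self, request):
        # Remember the requester's URI and answer with our own.
        self.known_uris.append(request.sender_uri)
        return CtmReply(sender_uri=self.uri)

    def on_ctm_reply(self, reply):
        self.known_uris.append(reply.sender_uri)

# Illustrative URIs; in the real system both messages travel over the P2P network.
a = Peer("brunet.tcp:192.0.0.1:1024")
b = Peer("brunet.tcp:192.0.0.2:1024")
reply = b.on_ctm_request(a.make_ctm_request())
a.on_ctm_reply(reply)
```

After the exchange each side knows the other's URI, which is what the linking protocol needs in order to contact the peer over the physical network.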
14. Connection Setup: Linking Protocol
- B sends a link request message to A over the physical network
- When A receives the link request, A simply responds with a link reply message
- Finally, a new connection is established between A and B
(Figure: B sends a link request to A; A returns a link reply; direct connection between A and B)
15. Linking Race Condition (1)
- A race condition may occur because the linking protocol can be initiated by both peers
(Figure: both peers send link requests and both receive link replies; both attempts succeed)
16. Linking Race Condition (2)
- When a node receives a link request, it checks that there is no existing connection or connection attempt
- When a node receives a link error, it restarts the protocol with a random back-off
(Figure: both peers send link requests; each finds that active linking is on and answers with a link error; after a random back-off, a new link request is answered with a link reply)
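The race resolution described above can be sketched as a small simulation. It is a simplified model under stated assumptions: message delivery is a method call, the back-off is drawn uniformly from one second, and all names (`LinkingNode`, `link_with_race`) are hypothetical.

```python
import random

class LinkingNode:
    def __init__(self, name):
        self.name = name
        self.connections = set()   # names of connected peers
        self.attempts = set()      # names of peers we are actively linking to

    def begin_link(self, peer):
        self.attempts.add(peer.name)

    def on_link_request(self, sender):
        # Refuse if a connection or an attempt to this peer already exists.
        if sender.name in self.connections or sender.name in self.attempts:
            return "link error"
        self.connections.add(sender.name)
        return "link reply"

    def on_response(self, peer, response, rng):
        self.attempts.discard(peer.name)
        if response == "link reply":
            self.connections.add(peer.name)
            return None
        return rng.uniform(0.0, 1.0)   # back-off delay before retrying

def link_with_race(a, b, rng):
    # Both peers start linking at the same time, so both requests are refused.
    a.begin_link(b)
    b.begin_link(a)
    response_to_a = b.on_link_request(a)
    response_to_b = a.on_link_request(b)
    delay_a = a.on_response(b, response_to_a, rng)
    delay_b = b.on_response(a, response_to_b, rng)
    # The peer with the shorter back-off retries first and now succeeds.
    first, second = (a, b) if delay_a <= delay_b else (b, a)
    first.begin_link(second)
    first.on_response(second, second.on_link_request(first), rng)
    return first.name

a, b = LinkingNode("A"), LinkingNode("B")
winner = link_with_race(a, b, random.Random(0))
```

Because the two back-off delays are almost never equal, one retry arrives while the other side is idle, so exactly one connection is created.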
17. Joining an Existing Network: Leaf Connection
- A new node N creates a leaf connection to an initial node I by directly using the linking protocol
- I acts as a forwarding agent for N
(Figure: new node N attached to initial node I by a leaf connection; the correct position of the new node on the ring is marked)
18. Joining an Existing Network: Send CTM Request
- N sends a CTM request addressed to itself over the P2P network
  - The CTM request contains N's URI
- The CTM request is received by N's right and left neighbors, since N is not yet in the ring
(Figure: the CTM request from new node N, attached to initial node I, reaches left neighbor L and right neighbor R)
19. Joining an Existing Network: Send CTM Reply
- L and R send CTM replies containing their URIs to I
- I forwards the CTM replies to N
(Figure: CTM replies travel from left neighbor L and right neighbor R to initial node I, and from I to new node N)
20. Joining an Existing Network: Linking Protocol
- Start the linking protocol
- L and R send link request messages to N over the physical network
(Figure: link requests go from left neighbor L and right neighbor R directly to new node N; initial node I is not involved)
21. Joining an Existing Network: Complete Joining
- N forms connections with its neighbors and is now in the ring
- N then acquires far connections
(Figure: new node N sits on the ring between left neighbor L and right neighbor R; initial node I is also shown)
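The "correct position" of a joining node can be illustrated by finding its would-be left and right neighbors among the addresses already on the ring. This sketch assumes that "left" means the next lower address and "right" the next higher one, with wrap-around; the function name and the small integer addresses are hypothetical.

```python
import bisect

def ring_neighbors(ring_addresses, new_address):
    """Return (left, right) neighbors of new_address on a non-empty ring.

    Assumes new_address is not already present in ring_addresses.
    """
    ordered = sorted(ring_addresses)
    i = bisect.bisect_left(ordered, new_address)
    left = ordered[i - 1]               # wraps to the largest address when i == 0
    right = ordered[i % len(ordered)]   # wraps to the smallest when i == len(ordered)
    return left, right

print(ring_neighbors([10, 40, 70], 50))  # (40, 70)
print(ring_neighbors([10, 40, 70], 5))   # (70, 10)
```

These are the two nodes that receive the CTM request that N addresses to itself.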
22. Adaptive Shortcut Creation
- High latencies were observed in experiments due to multi-hop overlay routing
- Shortcut creation
  - Count IPOP packets sent to other nodes
  - When the number of packets within an interval exceeds a threshold, initiate connection setup
  - Because maintaining connections incurs overhead, drop connections that are no longer in use
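The counting rule above can be sketched as a small policy object. The threshold, interval, and idle timeout are assumed parameters (the slides give no values), timestamps are passed in explicitly to keep the example deterministic, and the class name `ShortcutPolicy` is hypothetical.

```python
from collections import defaultdict, deque

class ShortcutPolicy:
    def __init__(self, threshold, interval, idle_timeout):
        self.threshold = threshold        # packets per interval that trigger a shortcut
        self.interval = interval          # length of the counting window (seconds)
        self.idle_timeout = idle_timeout  # drop shortcuts unused for this long
        self.recent = defaultdict(deque)  # peer -> timestamps of recent packets
        self.shortcuts = {}               # peer -> time of last packet

    def on_packet(self, peer, now):
        """Record a packet; return True if connection setup should start."""
        times = self.recent[peer]
        times.append(now)
        while times and now - times[0] > self.interval:
            times.popleft()               # forget packets outside the window
        if peer in self.shortcuts:
            self.shortcuts[peer] = now    # shortcut is still in use
            return False
        if len(times) > self.threshold:
            self.shortcuts[peer] = now
            return True
        return False

    def expire(self, now):
        """Drop and return shortcuts that have been idle too long."""
        idle = [p for p, last in self.shortcuts.items()
                if now - last > self.idle_timeout]
        for p in idle:
            del self.shortcuts[p]
        return idle

policy = ShortcutPolicy(threshold=3, interval=10, idle_timeout=30)
decisions = [policy.on_packet("B", t) for t in (0, 1, 2, 3)]
print(decisions)  # [False, False, False, True]
```

Calling `expire` periodically implements the last bullet: a shortcut that carries no traffic for longer than the idle timeout is dropped.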
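23. NAT
(Figure: Host a and Host b communicate through a NAT that separates the private network from the global network)
- IP addresses shown: 192.168.0.2, 133.11.238.100, 157.82.13.244
- Outbound packet in the private network: Src 192.168.0.2:5000, Dst 157.82.13.244:80
- Outbound packet in the global network: Src 133.11.23.100:6000, Dst 157.82.13.244:80
- Return packet in the global network: Src 157.82.13.244:80, Dst 133.11.23.100:6000
- Return packet in the private network: Src 157.82.13.244:80, Dst 192.168.0.2:5000
- NAT table: 192.168.0.2:5000 <-> 133.11.23.100:6000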
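The translation in this example can be reproduced with a toy NAT table. This is a sketch under assumptions: ports are allocated sequentially starting at 6000 to match the slide, the public address is taken from the packet lines, and the class name `Nat` is hypothetical.

```python
class Nat:
    def __init__(self, public_ip, first_port=6000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out_map = {}   # (private_ip, private_port) -> public_port
        self.in_map = {}    # public_port -> (private_ip, private_port)

    def outbound(self, src, dst):
        """Rewrite the source of a packet leaving the private network."""
        if src not in self.out_map:
            self.out_map[src] = self.next_port
            self.in_map[self.next_port] = src
            self.next_port += 1
        return (self.public_ip, self.out_map[src]), dst

    def inbound(self, src, dst):
        """Rewrite the destination of a returning packet, or drop it (None)."""
        ip, port = dst
        if ip != self.public_ip or port not in self.in_map:
            return None
        return src, self.in_map[port]

nat = Nat("133.11.23.100")
out = nat.outbound(("192.168.0.2", 5000), ("157.82.13.244", 80))
back = nat.inbound(("157.82.13.244", 80), ("133.11.23.100", 6000))
print(out)   # (('133.11.23.100', 6000), ('157.82.13.244', 80))
print(back)  # (('157.82.13.244', 80), ('192.168.0.2', 5000))
```

A packet arriving for a port with no table entry is dropped, which is why a host behind a NAT cannot be contacted until it has sent something out first.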
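24. NAT Traversal: UDP Hole Punching
(Figure: Host A, with IP A, is behind a NAT with IP N; Host B, with IP B, is behind a NAT with IP M)
- Packet from Host A: Src A:a, Dst M:m; after A's NAT: Src N:n, Dst M:m
- Packet from Host B: Src B:b, Dst N:n; after B's NAT: Src M:m, Dst N:n
- Packet delivered to Host A: Src M:m, Dst A:a
- NAT table on A's side: A:a <-> N:n
- NAT table on B's side: M:m <-> B:b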
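The effect of hole punching can be shown with a toy simulation rather than real sockets. It assumes NATs that keep the same public endpoint for every destination and admit inbound packets only from endpoints the internal host has already sent to; the names `SimNat` and `send` are hypothetical.

```python
class SimNat:
    """Toy NAT in front of a single internal host."""
    def __init__(self, public_ep, private_ep):
        self.public_ep = public_ep
        self.private_ep = private_ep
        self.allowed = set()   # remote endpoints the internal host has sent to

    def outbound(self, dst):
        self.allowed.add(dst)          # sending punches a hole for dst
        return self.public_ep, dst     # packet as seen on the global network

    def inbound(self, src):
        """Return the private endpoint if the packet is admitted, else None."""
        return self.private_ep if src in self.allowed else None

def send(sender_nat, receiver_nat):
    src, _dst = sender_nat.outbound(receiver_nat.public_ep)
    return receiver_nat.inbound(src)

nat_a = SimNat(public_ep=("N", "n"), private_ep=("A", "a"))
nat_b = SimNat(public_ep=("M", "m"), private_ep=("B", "b"))

first = send(nat_a, nat_b)    # dropped: B has not yet sent to N:n
second = send(nat_b, nat_a)   # admitted: A already sent to M:m
third = send(nat_a, nat_b)    # admitted: B has now sent to N:n
print(first, second, third)   # None ('A', 'a') ('B', 'b')
```

The first packet is lost, but it opens the hole that lets the peer's packet through, after which traffic flows in both directions.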
25. Experimental Setup
- Hosts: 2.0 GHz Xeon, Linux 2.4.20, VMware GSX
- Hosts: 2.4 GHz Xeon, Linux 2.4.20, VMware GSX
- Host: 1.3 GHz P-III, Linux 2.4.21, VMPlayer
- Host: 1.7 GHz P4, Win XP SP2, VMPlayer
- 34 compute nodes, 118 P2P router nodes on PlanetLab
26. Experiment 1: Joining and Shortcut Connections
- Node A: IPOP node
- Node B: new joining node
- A and B are in different network domains with NAT
- B sends ICMP packets to A at 1-second intervals
- Within period 1 (about 3 seconds), B establishes a route to other nodes
- Within period 2 (about 28 seconds), B establishes a shortcut connection to A
27. Experiment 2: PVM Parallel Application FastDNAml (1)
- Parallelization with a PVM-based master-workers model
- FastDNAml has a high computation-to-communication ratio
- Dynamic task assignment tolerates performance heterogeneity among computing nodes
(Figure: a master hands out tasks from a task pool to workers)
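Dynamic task assignment can be sketched with a shared task pool from which each worker pulls its next task when it becomes free. This example uses Python threads in place of PVM, and a per-worker sleep stands in for nodes of different speed; the function name and parameters are hypothetical.

```python
import queue
import threading
import time

def run_master_workers(tasks, worker_delays, work):
    pool = queue.Queue()
    for task in tasks:               # the master fills the task pool
        pool.put(task)
    results = {}
    done_by = [0] * len(worker_delays)
    lock = threading.Lock()

    def worker(index, delay):
        while True:
            try:
                task = pool.get_nowait()
            except queue.Empty:
                return               # pool drained: worker exits
            time.sleep(delay)        # stand-in for a slower or faster node
            value = work(task)
            with lock:
                results[task] = value
                done_by[index] += 1

    threads = [threading.Thread(target=worker, args=(i, d))
               for i, d in enumerate(worker_delays)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, done_by

results, done_by = run_master_workers(range(20), [0.001, 0.005], lambda x: x * x)
print(done_by)  # the faster worker typically completes more tasks
```

No worker is assigned a fixed share in advance, so a slow node simply takes fewer tasks instead of delaying the whole run.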
28. Experiment 2: PVM Parallel Application FastDNAml (2)
Configuration | Sequential execution (Node 2) | Parallel execution (30 nodes, shortcuts disabled) | Parallel execution (30 nodes, shortcuts enabled)
Execution time (sec) | 22272 | 2033 | 1642
Parallel speedup | n/a | 11.0 | 13.6
- Execution with shortcuts enabled is 24% faster than with shortcuts disabled
- The parallel speedup is 13.6x
- A speedup of 23x was reported in previous work on a homogeneous cluster
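The reported figures can be checked directly from the execution times. This is only a worked calculation; the variable names are chosen for the example.

```python
sequential = 22272      # seconds, sequential execution
no_shortcuts = 2033     # seconds, 30 nodes, shortcuts disabled
with_shortcuts = 1642   # seconds, 30 nodes, shortcuts enabled

speedup_no_shortcuts = sequential / no_shortcuts
speedup_with_shortcuts = sequential / with_shortcuts
gain = no_shortcuts / with_shortcuts - 1   # relative gain from shortcuts

print(round(speedup_no_shortcuts, 1),
      round(speedup_with_shortcuts, 1),
      round(gain * 100))  # 11.0 13.6 24
```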
29. Summary
- Introduced WOW
  - Scalable, fault-resilient, and low-management infrastructure
- Future work
  - Research on easy-to-use middleware for heterogeneous, adaptive Grid environments