Title: gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments
1. gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments
- 8/1/08
- Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura
- The University of Tokyo
2. Barriers of Grid Environments
- Grid = multiple clusters (LAN/WAN)
- Complex environment
  - Dynamic node joins
  - Resource removal/failure (network and nodes)
  - Connectivity: NAT/firewall
Grid-enabled frameworks are crucial to facilitate computing in these environments.
[Figure: clusters behind NAT/firewall, with nodes joining and leaving]
3. What Type of Applications?
- Typical usage: standalone jobs with no interaction among nodes
- Parallel and distributed applications
  - Orchestrate nodes for a single application
  - Map an existing application onto the Grid
  - Requires complex interaction
  -> frameworks must make it simple and manageable
4. Common Approaches (1)
- Programming-less: batch schedulers
  - Task placement (inter-cluster)
  - Transparent retries on failure
- Enables minimal interaction
  - Pass data via files/raw sockets
  - Embarrassingly parallel tasks
- Very limited in application
[Figure: submit / execute / redo cycle through a scheduler]
5. Common Approaches (2)
- Incorporate some user programming
  - e.g., Master-Worker frameworks
  - Program the master/worker(s): job distribution, handling worker join/leave, error handling
- Enables simple interaction
- Still limited in application
[Figure: master invoking doJob(), error(), join() handlers]
For more complex interaction (a larger problem set), frameworks must allow more flexible/general programming.
6. The Most Flexible Approach
- Parallel programming languages
  - Extending existing languages retains flexibility
  - Countless past examples: MultiLisp [Halstead 85], Java RMI, ProActive [Huet et al. 04], ...
- Problem: not designed in the context of the Grid
  - Node joins/leaves?
  - Resolving connectivity with NAT/firewall?
  - Coding becomes complex/overwhelming
Can we not complement this?
7. Our Contribution
- A Grid-enabled distributed object-oriented framework
  - With a focus on coping with complex environments: joins, failures, connectivity
  - Simple programming, minimal configuration
- A simple tool that acts as a glue for the Grid
- Implemented parallel applications on a Grid environment with 900 cores (9 clusters)
8. Agenda
- Introduction
- Related Work
- Proposal
- Evaluation
- Conclusion
9. Programming-less Frameworks
- Condor/DAGMan [Thain et al. 05]
  - Batch scheduler
  - Transparent retries / handles multiple clusters
- Extremely limited interaction among nodes
  - Tasks with DAG dependencies
  - Pass data using intermediate/scratch files
[Figure: a central manager assigns tasks to busy cluster nodes; interaction via files]
10. Restricted-Programming Frameworks
- Master-Worker model: Jojo2 [Aoki et al. 06], OmniRPC [Sato et al. 01], Ninf-C [Nakata et al. 04], NetSolve [Casanova et al. 96]
  - Event-driven master code handles join/leave
- Map-Reduce [Dean et al. 05]
  - Define 2 functions: map(), reduce()
  - Partial retries when nodes fail
- Ibis Satin [Wrzesinska et al. 06]
  - Distributed divide-and-conquer
  - Random work stealing accommodates join/leave
- Effective for specialized problem sets
  - Specializing on a problem/model makes mapping/programming easy
  - For unexpected models, users have to resort to out-of-band/ad-hoc means
[Figure: master-worker join/failure handlers; map()/reduce() over input data; divide-and-conquer on fib(n), fib(n-1)]
11. Distributed Object-Oriented Frameworks
- ABCL [Yonezawa 90], Java RMI, Manta [Maassen et al. 99], ProActive [Huet et al. 04]
- Distributed object-oriented
  - Disperse objects among resources
  - Load delegation/distribution
- Method invocations
  - RMI (Remote Method Invocation), e.g. foo.doJob(args)
  - Async. RMIs for parallelism
- RMI is a good abstraction
  - An extension of a general language
  - Allows flexible coding
[Figure: sync. and async. RMIs to a remote object foo that computes]
12. Hurdles for DOO on the Grid
- Race conditions
  - Simultaneous RMIs on 1 object
- Active Objects
  - 1 object = 1 thread
  - Deadlocks, e.g. on recursive calls
- Handling asynchronous events
  - e.g., handling node joins
  - Why not event-driven? The flow of the program is segmented and hard to follow
- Handling joins/failures
  - Difficult to handle transparently in a reasonable manner
13. Hurdles for Implementation
- Connectivity with NAT/firewall
  - Solution: build an overlay
- Existing implementations
  - ProActive [Huet et al. 04]: tree-topology overlay; the user must hand-write connectable points, configuring each link
  - Jojo2 [Aoki et al. 06]: 2-level hierarchical topology; SSH / UDP broadcast; assumes a network topology/setting that is out of user control
- Requirement: minimal user burden
[Figure: NAT and firewalled clusters; a per-link connection configuration file]
14. Summary of the Problems
- Distributed object-oriented programming on the Grid
  - Thread race conditions
  - Event handling
  - Node join/leave
  - Underlying connectivity
15. Proposal: gluepy
- A Grid-enabled distributed object-oriented framework, as a Python library
  - Glues together Grid resources via simple and flexible coding
- Resolves the issues in an object-oriented paradigm
  - SerialObjects: define ownership for objects; blocking operations unblock on events
  - Constructs for handling node join/leave: resolve the first-reference problem; failures are abstracted as exceptions
  - Connectivity (NAT/firewall): peers automatically construct an overlay
16. The Basic Programming Model
- RemoteObjects
  - Created in / mapped to a process
  - Accessible from other processes (RMI)
  - Passive objects: threads are not bound to objects
- Threads
  - Exist simply to gain parallelism
  - RMIs / async. invocations (RMIs) implicitly spawn a thread
- Futures
  - Returned for async. invocations
  - Placeholder for the result
  - An uncaught exception is stored and re-raised at collection
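These future semantics (a placeholder for the result, with an uncaught exception stored and re-raised at collection) closely mirror Python's standard `concurrent.futures`. The following is a runnable analogue of the model using only the standard library, not gluepy itself:

```python
# Sketch of future semantics with the standard library (not gluepy):
# an async call returns a placeholder; .result() blocks for the value,
# and an uncaught exception is re-raised only at collection time.
from concurrent.futures import ThreadPoolExecutor

def run(arg):
    if arg < 0:
        raise ValueError("bad arg")   # stored inside the future
    return arg * 2

with ThreadPoolExecutor(max_workers=2) as pool:
    ok = pool.submit(run, 21)         # async invocation -> future
    bad = pool.submit(run, -1)

    print(ok.result())                # 42: blocks until the result is ready
    try:
        bad.result()                  # the stored exception re-raises here
    except ValueError as e:
        print("collected error:", e)
```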
17. Programming in gluepy
- Basics: RemoteObject
  - Inherit the base class to become externally referenceable
  - Async. invocation with futures
  - No explicit threads: easier to maintain a sequential flow
  - Mutual exclusion? Events? -> SerialObjects

class Peer(RemoteObject):
    def run(self, arg):
        # work here
        return result

futures = []
for p in peers:
    f = p.run.future(arg)   # async. RMI run() on all peers
    futures.append(f)
waitall(futures)            # wait for all results
for f in futures:
    print f.get()           # read all results
18. Ownership with SerialObjects
- SerialObjects
  - Objects with mutual exclusion; a RemoteObject subclass
  - No explicit locks: an ownership for each object
  - Method call -> acquire; return -> release
- Method execution by only 1 thread: the owner thread
- The owner releases ownership on blocking operations
  - e.g., waitall(), an RMI to another SerialObject
- Pending threads contest for ownership; an arbitrary thread is scheduled
- Eliminates deadlocks for recursive calls
19. Signals to SerialObjects
- We don't want event-driven loops!
- Events -> signals
  - Blocking ops. unblock on a signal
- Signals are sent to objects
  - Unblock a thread blocking in the object's context
  - If none, unblock the next thread that blocks
- The unblocked thread can handle the signal (event)
20. SerialObjects in gluepy
- e.g., a queue
  - pop() blocks on an empty queue
  - add() calls signal() to unblock a waiter
- Atomic sections: between blocking ops in a method, a thread can update object attributes and do invocations on non-SerialObjects

class DistQueue(SerialObject):
    def __init__(self):
        self.queue = []
    def add(self, x):
        self.queue.append(x)
        if len(self.queue) == 1:
            self.signal()           # signal: wake a waiter
    def pop(self):
        while len(self.queue) == 0:
            wait()                  # block until signal
        x = self.queue.pop(0)
        return x
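The block-until-signal behavior of a SerialObject can be approximated locally (an analogue with plain Python threading, not gluepy code): a `threading.Condition` releases its lock while waiting, just as a SerialObject releases ownership across a blocking operation.

```python
# Local sketch of the block-until-signal semantics using threading.
# Condition.wait() releases the lock while blocked, analogous to a
# SerialObject releasing ownership during a blocking operation.
import threading

class LocalQueue(object):
    def __init__(self):
        self.cond = threading.Condition()
        self.queue = []

    def add(self, x):
        with self.cond:
            self.queue.append(x)
            self.cond.notify()          # "signal": wake a waiter

    def pop(self):
        with self.cond:
            while len(self.queue) == 0:
                self.cond.wait()        # block until signal
            return self.queue.pop(0)

q = LocalQueue()
t = threading.Thread(target=lambda: q.add("job"))
t.start()
print(q.pop())                          # blocks until add() signals
t.join()
```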
21. Managing Dynamic Resources
- Node join = a Python process starts
- Node leave = process termination
- Constructs for node joins/leaves
  - Node join -> the first-reference problem
    - Object lookup: obtain a reference to an existing object in the computation
  - Node leave -> an RMI exception
    - Catch it to handle the failure
[Figure: a joining node looks up objects in the computation; an RMI to an object on a failed node raises an exception]
22. e.g., Master-Worker in gluepy (1/3)
- Handles join/leave
  - Code for join: a join invokes signal()
  - The signal unblocks the blocking master thread

class Master(SerialObject):
    ...
    def nodeJoin(self, node):       # signal for join
        self.nodes.append(node)
        self.signal()
    def run(self):
        assigned = {}
        while True:
            while len(self.nodes) > 0 and len(self.jobs) > 0:
                ... # async. RMIs to idle workers
            readys = wait(futures)  # block; handle join on unblock
            if readys == None:
                continue
            for f in readys:
                ... # handle results
23e.g. Master-worker in gluepy (2/3)
for f in readys node, job
assigned.pop(f) try print
done, f.get() self.nodes.append(node)
except RemoteException, e
self.jobs.append(job)
- Failure handling
- Exception on collection
- Handle exception to resubmit task
Failure handling
24. e.g., Master-Worker in gluepy (3/3)
- Deployment
  - The master exports its object
  - Workers get a reference (lookup on join) and do an RMI to join

Master init:
    master = Master()
    master.register(master)
    master.run()

Worker init:
    worker = Worker()
    master = RemoteRef(master)      # lookup on join
    master.nodeJoin(worker)
    while True:
        sleep(1)
25. Automatic Overlay Construction (1)
- Solution for connectivity: automatically construct an overlay
- TCP overlay
  - On boot, acquire other peer info
  - Each node attempts connections to a small number of peers
  - Establishes a connected connection graph
[Figure: peers with global IPs, behind NAT, and behind firewalls attempting and establishing connections]
26. Automatic Overlay Construction (2)
- Firewalled clusters: automatic SSH port-forwarding
  - The user configures SSH info, e.g. a config-file line of the form:
    use <src_pat> <dst_pat>, prot=ssh, user=kenny
- Transparent routing
  - Peer-to-peer communication is routed over the overlay (AODV [Perkins 97])
27. RMI Failure Detection on the Overlay
- Problem with the overlay
  - A route consists of a number of connections
  - RMI failure = failure of any intermediate connection
- Path pointers
  - Recorded on each forwarding node
  - The RMI reply returns along the path it came
  - On failure of an intermediate connection, the preceding forwarding node back-propagates the failure to the RMI invoker
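The path-pointer idea can be shown with a toy model (the names and data structures here are illustrative assumptions, not gluepy internals): each forwarding node records which node the request came from, so when a downstream link fails the failure notice can be walked back along those pointers until it reaches the invoker.

```python
# Toy back-propagation along recorded path pointers (illustrative only).
# An RMI is forwarded A -> B -> C -> D; each hop records its predecessor.
route = ["A", "B", "C", "D"]            # A is the RMI invoker
path_pointer = {}                       # node -> preceding node on the route
for prev, node in zip(route, route[1:]):
    path_pointer[node] = prev

def propagate_failure(broken_at):
    """Walk the path pointers back from the node that observed the failure."""
    notified = []
    node = broken_at
    while node in path_pointer:         # stop once the invoker is reached
        node = path_pointer[node]
        notified.append(node)
    return notified                     # the invoker learns of the failure last

print(propagate_failure("C"))           # ['B', 'A']
```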
28. Agenda
- Introduction
- Related Work
- Proposal
- Evaluation
- Conclusion
29. Experimental Environment
- InTrigger: a Grid platform in Japan
- Max. scale: 9 clusters, over 900 cores
[Figure: InTrigger cluster map with core counts (hongo 98, chiba 186, kototoi 88, okubo 28, suzuk 72, hiro 88, imade 60, kyoto 70, mirai 48, istbs 316, tsubame 64); a mix of global and private IPs, firewalls, and links where all packets are dropped]
30. Necessary Configuration
- Configuration necessary for the overlay
  - 2 clusters (tsubame, istbs) require SSH port-forwarding to other clusters
  - -> 2 lines of configuration, adding connection instructions by regular expression

istbs cluster uses SSH for inter-cluster conn.:
    use 133\.11\.23\. (?!133\.11\.23\.), prot=ssh, user=kenny
tsubame cluster gateway uses SSH for inter-cluster conn.:
    use 131.112.3.1 (?!172\.17\.), prot=ssh, user=kenny
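The patterns above are ordinary regular expressions, with a negative lookahead on the destination: SSH is used only when the source matches the cluster's prefix and the destination does not. A quick check of how that lookahead behaves (assuming, as the slide suggests, that the patterns are matched against IP address strings):

```python
# How a 'use <src_pat> <dst_pat>' rule like the istbs one behaves:
# the source must be inside 133.11.23.*, the destination must NOT be.
import re

src_pat = r"133\.11\.23\."
dst_pat = r"(?!133\.11\.23\.)"          # negative lookahead

def needs_ssh(src, dst):
    return (re.match(src_pat, src) is not None and
            re.match(dst_pat, dst) is not None)

print(needs_ssh("133.11.23.5", "157.82.22.1"))   # True: leaving the cluster
print(needs_ssh("133.11.23.5", "133.11.23.9"))   # False: intra-cluster link
```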
31. Overlay Construction Simulation
- Evaluates the overlay construction scheme
- For different cluster configurations, varied the number of attempted connections per peer
- 1000 trials per cluster / attempted-connection configuration
[Graph: 28 global / 238 private peers case, 95%]
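The simulation methodology can be approximated with a short script. This is a sketch under assumptions not stated on the slide (uniform random peer selection, and peers can only *initiate* connections to globally reachable peers): each peer attempts up to k connections, and we check whether the resulting undirected graph is connected.

```python
# Toy version of the overlay-construction simulation (assumptions noted above).
import random

def connected_after_attempts(n_global, n_private, k, seed=0):
    rng = random.Random(seed)
    n = n_global + n_private              # peers 0..n_global-1 have global IPs
    adj = [set() for _ in range(n)]
    for p in range(n):
        # a peer can only initiate connections to globally reachable peers
        targets = [g for g in range(n_global) if g != p]
        for t in rng.sample(targets, min(k, len(targets))):
            adj[p].add(t)
            adj[t].add(p)                 # an established TCP link works both ways
    # depth-first search from peer 0 to test whether the graph is connected
    seen, stack = {0}, [0]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == n

trials = 100
ok = sum(connected_after_attempts(28, 238, 3, seed=s) for s in range(trials))
print("connected in %d/%d trials" % (ok, trials))
```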
32. Dynamic Master-Worker
- A Master object distributes work to Worker objects: 10,000 tasks as RMIs
- Workers repeatedly join/leave
- Tasks for failed nodes are redistributed
- No tasks were lost during the experiment
33. A Real-life Application
- A combinatorial optimization problem: the Permutation Flow Shop Problem
  - Parallel branch-and-bound
  - Master-Worker-like, but requires periodic exchange of bounds
- Code
  - 250 lines of Python glue code
  - Each worker node starts up sequential C code and communicates with its local Python process through pipes
34. Master-Worker Interaction
- The master does RMIs to workers: doJob()
- Workers do periodic RMIs to the master: exchange_bound()
- Not your typical master-worker: requires a flexible framework like ours
35. Performance
- Work rate = (sum of c_i) / (N * T)
  - c_i: total computation time per core
  - N: number of cores
  - T: completion time
- Slight drop with 950 cores, due to the master node becoming overloaded
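From the slide's variable definitions, the metric is presumably the fraction of available core-time spent on useful computation. A quick numeric illustration (the values below are made up, not measurements from the paper):

```python
# Work rate = (sum of per-core computation time) / (cores * completion time).
def work_rate(core_times, completion_time):
    n = len(core_times)
    return sum(core_times) / (n * completion_time)

# 4 cores, each busy for 90s of a 100s run -> work rate 0.9
print(work_rate([90.0, 90.0, 90.0, 90.0], 100.0))   # 0.9
```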
36. Troubleshoot Search Engine
- Ever been stuck debugging, or troubleshooting?
- Re-ranks query results obtained from Google
  - Uses results from machine-learning web forums
  - Performs natural language processing on page contents at query time
- Uses a Grid backend
  - Computationally intensive
  - Requires good response time, within 10s of seconds
[Figure: a query such as "vmware kernel panic" goes to the search engine, which farms computation out to the Grid backend]
37. Troubleshoot Search Engine: Overview
- Python CGI
- Leveraged sync/async RMIs to seamlessly integrate parallelism into a sequential program
- Merged CGIs with the Grid backend
38. Agenda
- Introduction
- Related Work
- Proposal
- Evaluation
- Conclusion
39. Conclusion
- gluepy: a Grid-enabled distributed object-oriented framework
- Supports simple and flexible coding for complex Grids
  - SerialObjects with signal semantics
  - Object lookup / exceptions on RMI failure
  - Automatic overlay construction
  - A tool to glue together Grid resources simply and flexibly
- Implemented and evaluated applications on the Grid
  - Max. scale: 900 cores (9 clusters)
  - NAT/firewall, with runtime joins/leaves
- Parallelized real-life applications, taking full advantage of gluepy constructs for seamless programming
40. Questions?
- gluepy is available from its homepage:
- www.logos.ic.i.u-tokyo.ac.jp/kenny/gluepy