Title: Distributing a Java Workflow Across a Network of Workstations
1. Trellis Driver
- Distributing a Java Workflow Across a Network of Workstations
Nicholas Lamb, Paul Lu, and Alona Fyshe
Department of Computing Science, University of Alberta
{nlamb, paullu, alona}_at_cs.ualberta.ca
2. Motivation
- Some applications in science consist of main job that drives other jobs
- Main or driver job controls workflow
[Figure: Driver Job and jobs J1, J2, J3, J4]
3. Motivation
- Languages provide functions for running external jobs on local server (e.g., system())
- Java's Runtime.exec() has drawback that location of jobs must be statically specified
- RMI offloads jobs onto remote hosts, but
  - RMI is a set of low-level mechanisms
  - RMI does not include scheduling
  - RMI does not provide job barrier function
4. Trellis Driver Introduction
- Trellis Driver Java package schedules jobs across metacomputers
- Trellis Driver integrates Java applications with Trellis metacomputing system
- TrellisDriver.exec() accepts command line and sends it to underlying metascheduler
- TrellisDriver.waitForAll() provides a job barrier
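The job barrier idea can be illustrated with a minimal sketch that runs entirely on local threads. It is not the Trellis Driver implementation: the class name BarrierSketch, the thread pool, and the use of Runnable jobs instead of command lines are all assumptions made so that the example is self-contained. Only the pattern (asynchronous exec calls followed by one blocking waitForAll) is taken from the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a job driver; NOT the real TrellisDriver class.
class BarrierSketch {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final List<Future<?>> pending = new ArrayList<>();

    // Asynchronous submission: returns at once, the job runs in the background.
    synchronized void exec(Runnable job) {
        pending.add(pool.submit(job));
    }

    // Job barrier: blocks until every job submitted so far has finished.
    void waitForAll() {
        List<Future<?>> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(pending);
            pending.clear();
        }
        for (Future<?> f : snapshot) {
            try {
                f.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        BarrierSketch driver = new BarrierSketch();
        AtomicInteger finished = new AtomicInteger();
        for (int i = 0; i < 8; i++) {
            driver.exec(finished::incrementAndGet); // stand-in for one external job
        }
        driver.waitForAll(); // nothing below runs until all 8 jobs are done
        System.out.println("jobs finished: " + finished.get());
        driver.shutdown();
    }
}
```

The point of the barrier is that the code after waitForAll() can safely read every job's output, which is what a multi-phase workflow needs between phases.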
5. Trellis Driver Implementation
[Figure: architecture diagram. Labels: Java Code (for i = 1 to n: td.exec(blast)); Trellis Driver (Producer, Consumer, next); Local OS (mqsub process); Trellis (Meta-queue)]
6. Example Application: Proteome Analyst (PA)
- Bioinformatics tool for annotating proteomes: http://www.cs.ualberta.ca/bioinfo/PA
- Training process: Machine-learn new classifier based on set of proteins with known annotations
- Prediction process: Use existing classifier to predict annotations of unknown proteins
- PA is best at predicting subcellular localization (Bioinformatics, March 2004)
7. Workflow Pipeline Shape
[Figure: two parallel pipelines]
Training process: Sequence -> BLAST -> Homologues -> Parsing -> Features -> Training -> Classifier
Prediction process: Sequence -> BLAST -> Homologues -> Parsing -> Features -> Prediction -> Class Label
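8. Workflow Pipeline Shape
[Figure: the same two pipelines, annotated with per-phase run times (read here as h:mm:ss)]
Phase 1 (Sequence -> BLAST -> Homologues): 5:19:27
Phase 2 (Homologues -> Parsing -> Features): 0:06:37
Phase 3 (Training -> Classifier): 1:08:17
Phase 4 (Prediction -> Class Label): 0:18:34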
9. Homogeneous Job Batching
- Homogeneous job batching refers to grouping together of multiple calls to same program in common metacomputer job
- Unbatched: Job1, Job2, Job3, Job4 submitted separately
- Batched: Job1+Job2 and Job3+Job4 submitted as two jobs
- Grouping multiple BLAST invocations in single call to mqsub greatly amortizes overhead
- Concatenation of BLAST commands transparent to PA application
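A minimal sketch of the grouping step, assuming the driver simply collects consecutive commands of one group and cuts them into chunks of size batchFactor. The class and method names are hypothetical, and the printed "mqsub ..." lines only mirror the notation used on the slides; they are not real mqsub syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: how commands for one job group might be cut into batches.
class BatchingSketch {
    // Groups consecutive commands into batches of at most batchFactor entries.
    static List<List<String>> batch(List<String> commands, int batchFactor) {
        if (batchFactor < 1) {
            throw new IllegalArgumentException("batchFactor must be >= 1");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < commands.size(); i += batchFactor) {
            int end = Math.min(i + batchFactor, commands.size());
            batches.add(new ArrayList<>(commands.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> cmds = Arrays.asList("cmd1", "cmd2", "cmd3", "cmd4", "cmd5");
        // With a batching factor of 2, five commands become three submissions.
        for (List<String> b : batch(cmds, 2)) {
            System.out.println("mqsub " + String.join(" ", b));
        }
    }
}
```

With N commands and batching factor k, the number of submissions drops from N to ceil(N / k), so any fixed per-submission cost is paid that many fewer times.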
10. Parallel BLAST Phase Data Flow
td.setGroup(Blast, 2)        (sets batching factor)
td.exec(Blast, cmd1)
td.exec(Blast, cmd2)
...
td.exec(Blast, cmdN-1)
td.exec(Blast, cmdN)
td.waitForAll()
Resulting submissions:
mqsub cmd1 cmd2
...
mqsub cmdN-1 cmdN
11. Parallel BLAST Phase Data Flow
td.setGroup(Blast, 2)
td.exec(Blast, cmd1)
td.exec(Blast, cmd2)
...
td.exec(Blast, cmdN-1)
td.exec(Blast, cmdN)
td.waitForAll()
Multiple concurrent mqsub processes:
mqsub cmd1 cmd2
...
mqsub cmdN-1 cmdN
12-14. Parallel BLAST Phase Results
15. Future Work
- Parallelize Machine Learning phase (runtime was more than 1 hr)
- Provide Heterogeneous Job Batching for job pipelines to ensure data affinity
- Test with geographically distant servers connected via wide-area networks (WANs)
- Underlying Trellis system is already being used across administrative domains and WANs (e.g., CISS-3)
16. Conclusions
- Scientific applications can benefit from scheduling workflows across metacomputers
- Contribution: Trellis Driver module provides convenient replacement for Runtime.exec()
- PA and Trellis Driver can obtain reasonable speed-ups for large BLAST phase
- Trellis Driver's overheads can be amortized by homogeneous job batching
17. Trellis Driver Architecture
- Job submission results in producer-consumer data flow pattern
- Application threads (producers) generate jobs
- Trellis Driver threads (consumers) process jobs
- Consumer count decoupled from producer count by bounded buffer implementation
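The producer-consumer pattern with a bounded buffer can be sketched with java.util.concurrent.ArrayBlockingQueue. This is an assumption about one reasonable way to build it, not the Trellis Driver source: the class name, the sentinel used for shutdown, and the counter that stands in for real job submission are all illustrative.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: producers and consumers share one bounded buffer.
class BoundedBufferSketch {
    // Unique sentinel object that tells a consumer to stop.
    private static final String STOP = new String("STOP");

    // Runs the given numbers of producer and consumer threads and
    // returns how many jobs the consumers processed in total.
    static int run(int producers, int consumers, int jobsPerProducer, int capacity) {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(capacity);
        AtomicInteger processed = new AtomicInteger();
        Thread[] consumerThreads = new Thread[consumers];
        Thread[] producerThreads = new Thread[producers];
        try {
            for (int c = 0; c < consumers; c++) {
                consumerThreads[c] = new Thread(() -> {
                    try {
                        // take() blocks while the buffer is empty.
                        while (buffer.take() != STOP) {
                            processed.incrementAndGet(); // stand-in for submitting the job
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                consumerThreads[c].start();
            }
            for (int p = 0; p < producers; p++) {
                final int id = p;
                producerThreads[p] = new Thread(() -> {
                    try {
                        for (int j = 0; j < jobsPerProducer; j++) {
                            buffer.put("job-" + id + "-" + j); // blocks while the buffer is full
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                producerThreads[p].start();
            }
            for (Thread t : producerThreads) {
                t.join();
            }
            for (int c = 0; c < consumers; c++) {
                buffer.put(STOP); // one stop marker per consumer
            }
            for (Thread t : consumerThreads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        // 3 producers, 2 consumers, 10 jobs each, buffer of 4 slots.
        System.out.println("processed: " + run(3, 2, 10, 4));
    }
}
```

Because both sides talk only to the buffer, the number of consumer threads can be tuned (for example, to limit concurrent submissions) without touching the application threads that produce jobs.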
18. Trellis Metacomputing System
- Trellis metacomputing system provides user-level aggregation of multiple computing systems
- Trellis offers load-balancing of workloads
- Metaqueues provide single point of submission for jobs
- mqsub: add jobs to metaqueue
- mqdel: remove jobs from metaqueue
- mqstat: list metaqueue contents
19. Trellis Driver API
setGroup(groupName, batchFactor): Register new job group
execSynch(commandLine): Run command synchronously
exec(commandLine, groupName, prodId): Run command asynchronously
waitForOne(key, prodId): Await completion of single job
waitForAll(prodId): Await completion of all jobs
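The slide lists method names and parameters but not their types or return values. The sketch below guesses at one plausible shape so that the difference between waitForOne and waitForAll is concrete: it assumes exec returns an integer key, that prodId is an integer identifying the calling producer, and that jobs are tracked per producer. The class KeyedDriverSketch is hypothetical, runs nothing remotely, and ignores groupName; a job here just records its command line.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory model of keyed, per-producer job tracking.
class KeyedDriverSketch {
    private final AtomicInteger nextKey = new AtomicInteger();
    // prodId -> (job key -> handle on the running job)
    private final Map<Integer, Map<Integer, CompletableFuture<Void>>> jobs =
            new ConcurrentHashMap<>();
    final List<String> completed = new CopyOnWriteArrayList<>();

    // Starts a job asynchronously and returns a key for it (return type assumed).
    int exec(String commandLine, String groupName, int prodId) {
        int key = nextKey.incrementAndGet();
        // Stand-in for real execution: the "job" only records its command line.
        CompletableFuture<Void> handle =
                CompletableFuture.runAsync(() -> completed.add(commandLine));
        jobs.computeIfAbsent(prodId, id -> new ConcurrentHashMap<>()).put(key, handle);
        return key;
    }

    // Blocks until the single job identified by key has finished.
    void waitForOne(int key, int prodId) {
        Map<Integer, CompletableFuture<Void>> mine = jobs.get(prodId);
        CompletableFuture<Void> handle = (mine == null) ? null : mine.remove(key);
        if (handle == null) {
            throw new IllegalArgumentException("unknown job key " + key);
        }
        handle.join();
    }

    // Blocks until all outstanding jobs of this producer have finished.
    void waitForAll(int prodId) {
        Map<Integer, CompletableFuture<Void>> mine = jobs.remove(prodId);
        if (mine != null) {
            mine.values().forEach(CompletableFuture::join);
        }
    }

    public static void main(String[] args) {
        KeyedDriverSketch td = new KeyedDriverSketch();
        int first = td.exec("cmd1", "Blast", 1);
        td.exec("cmd2", "Blast", 1);
        td.exec("cmd3", "Blast", 2);
        td.waitForOne(first, 1); // only cmd1 is guaranteed done here
        td.waitForAll(1);        // now every job of producer 1 is done
        td.waitForAll(2);
        System.out.println("completed jobs: " + td.completed.size());
    }
}
```

Keying jobs by producer means one application thread's barrier does not wait on jobs submitted by another thread, which is presumably why prodId appears in each asynchronous call.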
20. Java Support for Workflow Parallelism
- PA originally used Runtime.exec() to run BLAST jobs
- Runtime.exec() provides process-level concurrency within single server
- Not scalable to large inputs
- RMI works only between Java objects and methods
21. Parallel BLAST Phase Code Changes
- Code changes required to parallelize BLAST phase minimal
- add TrellisDriver.setGroup() at beginning
- replace Runtime.exec() with TrellisDriver.exec()
- add TrellisDriver.waitForAll() at end
22. Original PA Performance Results
- Trained and validated new classifier with training set of 3,916 sequences