Title: Case Studies of Using Condor for Scientists, Barcelona, 2006
Case Studies of Using Condor for Scientists
Barcelona, 2006
Agenda
- Extended users tutorial
- Advanced Uses of Condor
- Java programs
- DAGMan
- Stork
- MW
- Grid Computing
- Case studies, and a discussion of your applications' needs
BLAST
Background
- Each species has a genetic encoding within its cells
- Humans are made of approximately 10^14 cells
- The nucleus of each human cell contains 46 chromosomes
- Each chromosome contains between 231 and 2958 genes
- Each chromosome is made of approximately 25 million to 237 million base pairs
Base Pairs (Simplified)
- Each base pair is one of 4 nucleotides
- Each nucleotide is represented by one letter
- A C G T
The Science Issue
- Scientists ask many questions that pose computationally difficult problems:
  - map a species' genome
  - build a huge database of information
  - understand evolution at a genetic level, answering homology and related questions
  - identify mutations and genes to develop diagnoses and medical treatments
BLAST
- Basic Local Alignment Search Tool
- A very good pattern-matching program
- Answering the science questions often requires queries such as:
  - Does the following nucleotide sequence (1000 base pairs), or something close to it, appear in the database (several billion base pairs)? With what certainty is there a match?
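The core idea behind a BLAST-style query (find short exact "seed" words shared by the query and the database, then extend each seed while the alignment score stays good) can be sketched as below. This is a toy illustration under stated assumptions, not BLAST itself: the word size `k`, the +1/-1 match/mismatch scores, the `xdrop` cutoff, and the function names are hypothetical choices for readability, and extension is ungapped only. Real BLAST also reports a statistical measure (an E-value), which is what answers the "with what certainty" question.

```python
from collections import defaultdict


def kmer_index(db: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word of the database to its start positions."""
    index = defaultdict(list)
    for i in range(len(db) - k + 1):
        index[db[i:i + k]].append(i)
    return index


def best_ungapped_hit(query: str, db: str, k: int = 4, match: int = 1,
                      mismatch: int = -1, xdrop: int = 3) -> tuple[int, int, int, int]:
    """Toy seed-and-extend search (hypothetical parameters, ungapped only).

    Returns (score, query_start, db_start, length) of the best hit found.
    """
    index = kmer_index(db, k)
    best = (0, 0, 0, 0)
    for q in range(len(query) - k + 1):
        for d in index.get(query[q:q + k], []):
            seed_score = k * match  # the seed is an exact match of length k
            # Extend right; stop once the running score falls xdrop below its best.
            right_best, right_len, run, i = 0, 0, 0, 0
            while q + k + i < len(query) and d + k + i < len(db):
                run += match if query[q + k + i] == db[d + k + i] else mismatch
                i += 1
                if run > right_best:
                    right_best, right_len = run, i
                elif right_best - run > xdrop:
                    break
            # Extend left the same way.
            left_best, left_len, run, i = 0, 0, 0, 0
            while q - 1 - i >= 0 and d - 1 - i >= 0:
                run += match if query[q - 1 - i] == db[d - 1 - i] else mismatch
                i += 1
                if run > left_best:
                    left_best, left_len = run, i
                elif left_best - run > xdrop:
                    break
            total = seed_score + left_best + right_best
            if total > best[0]:
                best = (total, q - left_len, d - left_len, left_len + k + right_len)
    return best


print(best_ungapped_hit("ACGTACGT", "TTTTACGTACGTTTTT"))  # (8, 0, 4, 8)
```

Indexing the database once and then probing it with many queries is what makes splitting the query set across machines natural: each piece can run against its own copy of the same database.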
The Biological Magnetic Resonance Data Bank
- Department of Biochemistry at the University of Wisconsin-Madison
- Part of the Center for Eukaryotic Structural Genomics (CESG)
- Working on three-dimensional protein structures
The BMRB and BLAST
- The BMRB (with the help of the Condor Team) has a weekly set of automated BLAST runs
- These BLAST runs compare progress on the BMRB set of working proteins to the Protein Data Bank
Serial versus Parallel
- Too slow: the BMRB working set could be input to a single BLAST program execution
  - Load the Protein Data Bank database
  - Serially query the database with each protein in the working set
- Faster: divide the working set into pieces that allow parallel executions of BLAST
Weekly BMRB Runs
- Obtain and install the BLAST executable and the Protein Data Bank database
- Decide on the best way to split the BMRB working set of proteins to minimize the parallel execution time
- Make a custom DAG for this split
- Produce a report on the BMRB run
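The slides do not say how the BMRB working set is split. One common heuristic for minimizing the parallel execution time is greedy longest-processing-time (LPT) assignment, sketched below under the assumption that a protein's sequence length is a usable proxy for its BLAST running time. The function name and the input format (a mapping from protein name to sequence length) are hypothetical.

```python
import heapq


def split_working_set(lengths: dict[str, int], n_pieces: int) -> list[list[str]]:
    """Split proteins into n_pieces groups with roughly equal total length.

    Greedy LPT heuristic: take proteins longest first and give each one to the
    group whose total length (the assumed cost proxy) is currently smallest.
    """
    heap = [(0, i) for i in range(n_pieces)]  # (total length so far, group index)
    pieces: list[list[str]] = [[] for _ in range(n_pieces)]
    for name, length in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
        load, i = heapq.heappop(heap)
        pieces[i].append(name)
        heapq.heappush(heap, (load + length, i))
    return pieces


print(split_working_set({"a": 10, "b": 9, "c": 5, "d": 4, "e": 3}, 2))
# [['a', 'd', 'e'], ['b', 'c']]
```

Each resulting piece would become the query input of one BLAST node in the DAG; the wall-clock time of the weekly run is then roughly set by the heaviest piece.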
The Custom DAG
[Figure: a DAG of B and E nodes, where B is BLAST and E is Extract results]
An Economics Application
- Computations are done at points on a coordinate plane
- Initial values are known along the axes
- Computing one point at a time (serial execution) is too slow
- Each point depends on 2 neighboring points:
  - (x,y) can be computed knowing (x-1,y) and (x,y-1)
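Because (x,y) needs only (x-1,y) and (x,y-1), every point on the same anti-diagonal x + y = s can be computed at the same time once the previous diagonal is done. The sketch below (a hypothetical helper `wavefronts`, with interior points numbered from 1 and the known axis values at index 0) lists those waves, showing that an nx-by-ny grid needs only nx + ny - 1 rounds of parallel work instead of nx * ny serial steps.

```python
def wavefronts(nx: int, ny: int) -> list[list[tuple[int, int]]]:
    """Group the points (x, y), 1 <= x <= nx, 1 <= y <= ny, into waves.

    Wave s holds the points with x + y == s. Both inputs of such a point,
    (x-1, y) and (x, y-1), have x + y == s - 1, so they are either known
    axis values (an index of 0) or were computed in the previous wave.
    """
    waves = []
    for s in range(2, nx + ny + 1):
        waves.append([(x, s - x) for x in range(max(1, s - ny), min(nx, s - 1) + 1)])
    return waves


for step, wave in enumerate(wavefronts(3, 3), start=1):
    print(step, wave)
# 1 [(1, 1)]
# 2 [(1, 2), (2, 1)]
# 3 [(1, 3), (2, 2), (3, 1)]
# 4 [(2, 3), (3, 2)]
# 5 [(3, 3)]
```

A DAG does not need these waves spelled out: DAGMan starts each node as soon as its parents finish, which yields the same diagonal progression.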
The Coordinate Plane
[Figure, repeated over several slides: a grid with both axes numbered 1 to 6; the legend marks points as "known result" or "inputs ready"]
The DAG
[Figure: one node per point, labeled x-y (1-1, 1-2, 1-3, 1-4, 2-1, 2-2, 2-3, 3-1, 3-2, 4-1, etc.)]
Use DAGMan
- Write a program to generate the DAG input file
- The submit description file (and the executable)
is the same for each node in the DAG
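A program that generates the DAG input file can be quite short. The sketch below writes one `Job` line per point, a `Parent ... Child ...` line for its interior dependencies, and `Vars` lines that pass file names to the shared submit description file. It assumes that file refers to the macros as $(left), $(below), and $(result), and it uses a hypothetical naming scheme: node x-y writes file<x>-<y>, and the known axis values are already in file0-<y> and file<x>-0.

```python
def generate_dag(nx: int, ny: int, submit_file: str = "gonkulate.submit") -> str:
    """Return the text of a DAGMan input file for an nx-by-ny grid of points.

    Every node x-y uses the same submit description file. Hypothetical file
    naming: node x-y writes file<x>-<y>; axis values live in file0-<y> and
    file<x>-0 and are not DAG nodes.
    """
    lines = []
    for x in range(1, nx + 1):
        for y in range(1, ny + 1):
            node = f"{x}-{y}"
            lines.append(f"Job {node} {submit_file}")
            # (x, y) depends on (x-1, y) and (x, y-1); index-0 points are known inputs.
            parents = [f"{px}-{py}" for px, py in ((x - 1, y), (x, y - 1)) if px >= 1 and py >= 1]
            if parents:
                lines.append(f"Parent {' '.join(parents)} Child {node}")
            # Macros the submit file can reference as $(left), $(below), $(result).
            lines.append(f'Vars {node} left="file{x - 1}-{y}"')
            lines.append(f'Vars {node} below="file{x}-{y - 1}"')
            lines.append(f'Vars {node} result="file{x}-{y}"')
    return "\n".join(lines) + "\n"


print(generate_dag(2, 2))
```

Writing the output of, say, `generate_dag(6, 6)` to a file (the name `gonkulate.dag` is hypothetical) and running `condor_submit_dag gonkulate.dag` would hand the whole grid to DAGMan. Loops run x-outer, y-inner, so every parent is declared with a `Job` line before any `Parent` line mentions it.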
DAG Input File
    Job 1-1 gonkulate.submit
    Job 1-2 gonkulate.submit
    Parent 1-1 Child 1-2
    Job 2-1 gonkulate.submit
    Parent 1-1 Child 2-1
    Job 1-3 gonkulate.submit
    Parent 1-2 Child 1-3
    Job 2-2 gonkulate.submit
    Parent 1-2 2-1 Child 2-2
    Vars 2-2 left="file1-2"
    Vars 2-2 below="file2-1"
    Vars 2-2 result="file2-2"
    . . .
DAG input file, continued
    Job 3-4 gonkulate.submit
    Parent 2-4 3-3 Child 3-4
    Vars 3-4 left="file2-4"
    Vars 3-4 below="file3-3"
    Vars 3-4 result="file3-4"
    . . .
Submit Description File
- In gonkulate.submit:
    universe                = vanilla
    executable              = gonkulate
    output                  = $(result)
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files    = $(left), $(below)
    log                     = gonkulate.log
    notification            = Never
    queue
Nug30
Description of Nug30
- nug30 (a Quadratic Assignment Problem instance of size 30) had been the "holy grail" of computational QAP research since 1968
- In 2000, Anstreicher, Brixius, Goux, and Linderoth set out to solve this problem
- Even using a mathematically sophisticated and well-engineered algorithm, they estimated that solving the problem would require 11 CPU years
Nugent's Problem
- There is a set of N locations and a set of N facilities, and each facility must be assigned to a location. To measure the cost of each possible assignment, the flow between each pair of facilities is multiplied by the distance between the pair's assigned locations, and then a sum is taken over all of the pairs.
- For Nug30, N = 30
QAP Definition
- The formal definition of the quadratic assignment problem: given two sets, P ("facilities") and L ("locations"), of equal size, together with a weight function $w: P \times P \to \mathbb{R}$ and a distance function $d: L \times L \to \mathbb{R}$, find the bijection $f: P \to L$ (the assignment) such that the cost function

$$\sum_{a,b \in P} w(a,b) \cdot d(f(a), f(b))$$

is minimized.
- Usually the weight and distance functions are viewed as square real-valued matrices.
(Source: Wikipedia)
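The cost function translates directly into code when w and d are stored as square matrices and the bijection f is stored as a tuple with f[a] = the location of facility a. The sketch below (hypothetical function names and toy 3 x 3 data) evaluates the cost and finds the optimum by trying all N! permutations, which is only feasible for tiny N.

```python
from itertools import permutations


def qap_cost(w, d, f):
    """Sum of w(a, b) * d(f(a), f(b)) over all ordered pairs of facilities a, b."""
    n = len(w)
    return sum(w[a][b] * d[f[a]][f[b]] for a in range(n) for b in range(n))


def qap_brute_force(w, d):
    """Try every bijection f: P -> L; return (minimum cost, best f)."""
    return min((qap_cost(w, d, f), f) for f in permutations(range(len(w))))


# Toy instance: w[a][b] = flow between facilities, d[i][j] = distance between locations.
w = [[0, 3, 1],
     [3, 0, 2],
     [1, 2, 0]]
d = [[0, 1, 4],
     [1, 0, 2],
     [4, 2, 0]]
print(qap_brute_force(w, d))  # (22, (0, 1, 2))
```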
Scope of the Problem
- This QAP instance is difficult because of the very large number of possible facility assignments.
- The number of possible assignments is factorial in the number of facilities:
  - $N! = N \times (N-1) \times (N-2) \times \cdots \times 2 \times 1$
  - $30! \approx 2.65 \times 10^{32}$
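The factorial above can be checked directly. The enumeration rate in the last lines is a hypothetical figure, used only to show why trying every assignment is hopeless.

```python
import math
from functools import reduce
from operator import mul

# N! = N x (N-1) x ... x 2 x 1, written out as a product for N = 30.
n_assignments = reduce(mul, range(1, 31), 1)
assert n_assignments == math.factorial(30)

print(n_assignments)           # 265252859812191058636308480000000
print(f"{n_assignments:.2e}")  # 2.65e+32

rate = 1e9  # hypothetical: assignments evaluated per second
years = n_assignments / rate / (365.25 * 24 * 3600)
print(f"{years:.1e} years to enumerate every assignment at that rate")
```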
The Simplified Approach
- Method of choice is branch and bound
- The complete tree has 30! nodes as leaves
- Branching grows the tree
- Bounding results in pruning the tree
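A minimal branch-and-bound sketch for a tiny QAP follows. Each tree level assigns the next facility to a free location (branching), and a partial assignment is discarded as soon as its lower bound is no better than the best complete assignment found so far (bounding and pruning). The bound used here, the cost of the pairs already assigned, is a deliberately weak stand-in that is valid only when all weights and distances are nonnegative. It is not the quadratic programming bound used for Nug30, and the function name and data are hypothetical.

```python
def qap_branch_and_bound(w, d):
    """Return (minimum cost, best assignment) for a small QAP with w, d >= 0."""
    n = len(w)
    best_cost, best_f = float("inf"), None

    def partial_cost(f):
        # Cost of the pairs whose facilities are both assigned. With nonnegative
        # data, completing f can only add cost, so this is a valid lower bound.
        k = len(f)
        return sum(w[a][b] * d[f[a]][f[b]] for a in range(k) for b in range(k))

    def search(f, free):
        nonlocal best_cost, best_f
        bound = partial_cost(f)
        if bound >= best_cost:
            return  # bounding: prune this whole subtree
        if len(f) == n:
            best_cost, best_f = bound, tuple(f)  # leaf: a complete assignment
            return
        for loc in sorted(free):  # branching: facility len(f) goes to loc
            f.append(loc)
            search(f, free - {loc})
            f.pop()

    search([], set(range(n)))
    return best_cost, best_f


w = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]
d = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
print(qap_branch_and_bound(w, d))  # (22, (0, 1, 2))
```

A stronger bound prunes larger subtrees earlier; that is why the choice of bound, rather than the search loop itself, dominates the running time on an instance like Nug30.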
The Nug30 Solution
- Used a new bounding technique, the quadratic programming bound, developed by Anstreicher and Brixius
- Sequential execution would have taken 7 years, so parallelizing the algorithm was important
- Used MW
Nug30 Computational Grid
CPUs   Arch/OS         Location
 414   Intel/Linux     Argonne
  96   SGI/Irix        Argonne
1024   SGI/Irix        NCSA
  16   Intel/Linux     NCSA
  45   SGI/Irix        NCSA
 246   Intel/Linux     Wisconsin
 146   Intel/Solaris   Wisconsin
 133   Sun/Solaris     Wisconsin
 190   Intel/Linux     Georgia Tech
  94   Intel/Solaris   Georgia Tech
  54   Intel/Linux     Italy (INFN)
  25   Intel/Linux     New Mexico
  12   Sun/Solaris     Northwestern
   5   Intel/Linux     Columbia U.
  10   Sun/Solaris     Columbia U.
- Used tricks to make it look like one Condor pool
- Flocking
- Glidein
- 2510 CPUs total
Workers Over Time
Nug30 Solved
- Wall clock time: 6 days, 22:04:31 (hours:minutes:seconds)
- Average number of machines: 653
- CPU time: 11 years
- Parallel efficiency: 93%
The Football Pool Problem
Win by Gambling
- Each week, 6 games are played
- The outcome of each game is one of:
  - win
  - lose
  - tie
Bet, and Win
- Correctly predict 5 of the 6 games, and you win
- What is the minimum number of predictions you must make to guarantee winning?
Known Values
Number of games   Minimum predictions
3                 5
4                 9
5                 27
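Winning with all but one game correct means a ticket covers every result that differs from it in at most one game, so the question is the smallest set of tickets whose combined coverage includes every possible result. The exhaustive search below (hypothetical function names; outcomes encoded as W, L, T; "all but one correct" assumed to be the rule behind each row of the table) reproduces the smallest known value. It also shows why this approach cannot reach 6 games: even 4 games already means searching subsets of 81 possible tickets.

```python
from itertools import combinations, product

OUTCOMES = "WLT"  # each game ends in a win, a loss, or a tie


def wins(ticket, result):
    """A ticket wins if at most one game is predicted wrongly (e.g. 5 of 6 right)."""
    return sum(t != r for t, r in zip(ticket, result)) <= 1


def min_tickets(n_games):
    """Smallest number of tickets that guarantees a win, by exhaustive search.

    Only feasible for very small n_games.
    """
    words = list(product(OUTCOMES, repeat=n_games))
    # Bitmask of the results on which each possible ticket wins.
    masks = [sum(1 << j for j, r in enumerate(words) if wins(t, r)) for t in words]
    full = (1 << len(words)) - 1
    for k in range(1, len(words) + 1):
        for combo in combinations(masks, k):
            covered = 0
            for m in combo:
                covered |= m
            if covered == full:
                return k
    return len(words)


print(min_tickets(3))  # 5
```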
Problem Description
- A covering code
- An NP-hard problem
- Many years of research and effort for 6 games have led to: 65 <= minimum number of predictions <= 73
- An integer programming problem
- The best solver is the commercial application CPLEX
Why the Problem is Difficult
- Number of possible tickets: 3^6 = 729
- The tree that represents the problem (and its solutions) has many isomorphic branches, which makes it difficult to prune.
- New techniques have been developed that narrow the interval containing the solution
- The latest approach solves many smaller problems using MW
Solution!
- Not yet...
- The first effort (many CPU years' worth of time) had a very small error in its input
- The second effort is still in progress
- All this to improve the lower bound from 65 to 70, thereby narrowing the range for the solution