Title: Bioinformatics and Graphical Models: Computation, approximation, and their value
1 Bioinformatics and Graphical Models: Computation, approximation, and their value
- MSR
- Nebojsa Jojic, Vladimir Jojic, Chris Meek, David Heckerman
- UW
- Jim Mullins, Mark Jensen, Jerry Learn
2 Overview
- Computational cost of usual algorithms
- State of the art
- Phylogeny alignment
- Phylogeny sequence modeling
- Approximations and their pitfalls
- Recombination
- Analogy to other ML domains
- Graphical model
- Experiments and computational cost
- Value of the computation
- Potential applications
- Drug discovery cycle
- Value of time and clinical success
- Market size and growth
- Discussion
3 Rational vaccine design (Jim Mullins et al.)
- Rational design
- Analysis of sequences to form a model of virus evolution (phylogenies, etc.)
- Develop vaccines that target as much variability as possible
- Traditional design
- Trial and error
- Educated guesses
4 State of the art sequence analysis programs
- Example
- Rational AIDS vaccine design
- Analysis of the envelope gene from a single patient in one visit
- 200 sequences with 600 base pairs each
- Overnight to align
- From 1-2 hours to 2-3 days to build a tree, depending on how much search you are willing to do
- This does not include modeling the inter-sequence dependencies or coupling alignment and tree search, and it ignores recombination
- The total length of the HIV genome is about 10,000 base pairs, and the number of samples is practically limited only by cost
5 Computational cost of a slightly more detailed analysis
- Metropolis search over all trees on 400 sequences of the full genome (10k) would last around 2 years on one machine
- Exact search intractable!
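
To give a sense of why exact search over trees is out of reach, the sketch below counts unrooted binary tree topologies on n labeled sequences with the standard (2n-5)!! formula. The function name `num_unrooted_trees` is illustrative and does not come from the slides. The count covers topologies only; branch lengths, alignment, and recombination would enlarge the search space further.

```python
import math

def num_unrooted_trees(n_leaves: int) -> int:
    """Count unrooted bifurcating tree topologies on n labeled leaves: (2n-5)!!."""
    if n_leaves < 3:
        return 1  # a single trivial topology for fewer than 3 leaves
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n-5.
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count

# Sequence counts mentioned in the slides (200 and 400), plus a small case.
for n in (5, 200, 400):
    log10_count = math.log10(num_unrooted_trees(n))
    print(f"{n} sequences: about 10^{log10_count:.0f} topologies")
```

Even a sampler that visits millions of trees per second sees only a vanishing fraction of this space, which is why stochastic search and approximation are used instead of enumeration.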
6 Approximation
- Free energy as a bound on negative log-likelihood
- Computation and approximation of the free energy
- Iterative conditional modes
- Mean-field method
- Structured variational techniques
- (Loopy) belief propagation
- Sampling techniques
- How tight is the bound?
- What does the looseness translate to?
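
The bound in the first bullet can be checked numerically on a toy model. The sketch below assumes a single discrete hidden variable h with three states and a made-up joint p(x, h) for one fixed observation x; none of these numbers come from the slides. It computes the free energy F(q) = sum_h q(h) (log q(h) - log p(x, h)) and confirms that the gap to -log p(x) equals KL(q || p(h | x)), so the bound is tight only when q is the exact posterior.

```python
import math

def free_energy(q, joint):
    """F(q) = sum_h q(h) * (log q(h) - log p(x, h)) for a fixed observation x."""
    return sum(qh * (math.log(qh) - math.log(ph))
               for qh, ph in zip(q, joint) if qh > 0)

def kl(q, p):
    """KL(q || p) for two discrete distributions over the same states."""
    return sum(qh * math.log(qh / ph) for qh, ph in zip(q, p) if qh > 0)

joint = [0.10, 0.03, 0.02]          # hypothetical p(x, h) for h = 0, 1, 2
evidence = sum(joint)               # p(x)
nll = -math.log(evidence)           # negative log-likelihood
posterior = [p / evidence for p in joint]

q_uniform = [1 / 3, 1 / 3, 1 / 3]   # a deliberately poor approximation
q_point = [1.0, 0.0, 0.0]           # point mass on the most likely state

gap_uniform = free_energy(q_uniform, joint) - nll
gap_point = free_energy(q_point, joint) - nll
gap_exact = free_energy(posterior, joint) - nll

print(f"-log p(x)           = {nll:.4f}")
print(f"gap, uniform q      = {gap_uniform:.4f}")
print(f"gap, point-mass q   = {gap_point:.4f}")
print(f"gap, exact posterior = {gap_exact:.4f}")
```

A point-mass q is the kind of approximation iterative conditional modes works with, while mean-field and structured variational methods restrict q to factorized families. In each case the looseness is the KL divergence between the chosen q and the true posterior.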
7 An example of the approximation issues
8 An example of the approximation issues
9 An example of the approximation issues: Tightness of the bounds
Variational technique
Exact EM algorithm
10 Recombination
- In HIV, the rate of recombination has recently been estimated to be ¼ of the rate of mutation!
- Combinatorial explosion in inference
11 Similar situations in other domains where graphical models work well
- Occlusion in video
- Source interaction in audio
- Composition of images
12 Occlusion in audio
Figure labels: Speaker1, Speaker2, M, 1-M, Retrieved Speaker1, Retrieved Speaker2
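
One way to read the M and 1-M labels on this slide is as a masking model, in which each time-frequency cell of the mixture is taken from one of the two speakers. The slides do not spell out the model, so the sketch below is an assumption: the functions `mix` and `retrieve`, the toy energy values, and the binary mask are all illustrative.

```python
def mix(s1, s2, mask):
    """Compose the mixture elementwise: x = M * s1 + (1 - M) * s2."""
    return [m * a + (1 - m) * b for a, b, m in zip(s1, s2, mask)]

def retrieve(x, mask):
    """Split the mixture into the cells assigned to each source by the mask."""
    est1 = [m * v for v, m in zip(x, mask)]
    est2 = [(1 - m) * v for v, m in zip(x, mask)]
    return est1, est2

# Hypothetical energies in four time-frequency cells.
speaker1 = [5.0, 0.2, 4.0, 0.1]
speaker2 = [0.3, 6.0, 0.5, 3.0]
mask = [1, 0, 1, 0]  # 1 where speaker1 dominates, 0 where speaker2 does

x = mix(speaker1, speaker2, mask)
est1, est2 = retrieve(x, mask)
print(x, est1, est2)
```

Here the mask is given. In an actual separation task it is a hidden variable that has to be inferred together with the sources, which is where the graphical model and its approximate inference come in.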
13 Epitome of an image
A set of image patches
Input image
Epitome
14 Layers from a single photograph
Figure labels: es, em, S1, s2, M, x
15 Modeling alignment and recombination by learning a library of gene patterns
16 Experimental results
17 Value of computation (from Tufts Center)
18 Growth
- Human viruses
- West Nile
- SARS
- Hepatitis C
- Polio
- Animal viruses
- FIV
- Pig, chicken and cow viruses
- Most bacterial diseases
- Parasitic diseases
- The first sign of success of rational design might trigger a great increase in the number of diseases tackled
19 How can MS/MSR be involved?
- MS: Architecture, platform, tools
- Storage, transmission, computation
- E.g., parallelizable computation on a single machine; peer-to-peer networks for parallel computation on multiple machines
- MSR
- Helping to speed up the scientific progress that leads to new opportunities for growth
- Advising MS on the research direction in the community and the future requirements for the platform