Lecture 4: Introduction to maximum likelihood in computational biology
1
Lecture 4: Introduction to maximum likelihood in
computational biology
  • Rachel Karchin
  • BME 580.688/580.488 and CS 600.488 Spring 2009

2
Overview
  • Big picture view of statistics, probability,
    likelihood.
  • How people are actually using maximum likelihood
    estimation in a few computational sequence
    analysis methods

3
What are statistics and probability?
  • Statistics
  • The study of uncertainty: how to measure it and
    what to do about it.
  • Probability
  • An area of mathematics and philosophy devoted to
    the quantification of uncertainty. Fairly recent
    in the history of ideas; it started around 1650.

Material courtesy of David Draper
4
Three main schools of probability
  • Classical
  • Enumerate the elemental outcomes (EOs) of a
    process s.t. they are equipossible. With
    nA = the number of EOs favorable to A and
    n = the total number of EOs,
    P(A) = nA / n

Material courtesy of David Draper, based on Oakes
1990
5
Three main schools of probability
  • Frequentist
  • Considers only attributes A of events. An event
    is a phenomenon that is inherently and
    independently repeatable under identical
    conditions. P(A) = the limiting value of the
    relative frequency with which A occurs in the
    repetitions.

Material courtesy of David Draper, based on Oakes
1990
6
Three main schools of probability
  • Bayesian
  • Personal or subjective approach. Imagine betting
    with someone about the truth of proposition A,
    and ask myself what odds O (in favor of A) I
    would need to give or receive to judge the bet
    fair. Then
    P(A) = O / (1 + O)
    For example, if I judge 3:1 odds in favor of A
    to be fair, P(A) = 3/4.

Material courtesy of David Draper, based on Oakes
1990
7
Computational biology example
  • A = "this mutation is pathogenic"

8
Subjectivity is always involved, isn't it?
  • All three approaches require me to use my
    judgment to identify a recognizable subset of
    mutations whose members' pathogenicity differs
    from the larger set of mutations by an amount
    that I judge as large enough to matter in a
    practical sense.

Material based on David Draper
9
Subjectivity is always involved, isn't it?
  • Within that subset, I regard pathogenicity to be
    close enough to constant so as not to be worth
    bothering about.

10
Example: mutations that cause cystic fibrosis
  • What is a disease-causing CFTR mutation?
  • Thousands of different mutations have been found
    in this gene that impact protein function
    severely, moderately, or mildly.

Cystic Fibrosis Transmembrane Regulator
Image from Hanrahan 2004
11
Example: mutations that cause cystic fibrosis
  • Clinical symptoms from these defects also have a
    wide range:
  • Obstructive lung disease
  • Pancreatic Insufficiency
  • Pseudomonas infection
  • Male infertility

Cystic Fibrosis Transmembrane Regulator
Image from Hanrahan 2004
12
Example: mutations that cause cystic fibrosis
  • One way to define a pathogenic CFTR mutation:
  • 2/3 of patients with the mutation should have a
    sweat chloride measurement > 60
    milliequivalents/liter plus PI, obstructive lung
    disease, and pseudomonas infection

Cystic Fibrosis Transmembrane Regulator
Image from Hanrahan 2004
13
Computational biology example
  • A = "this mutation is pathogenic"
  • What is P(A)?

[Venn diagram: A ∪ Aᶜ = all mutations in CFTR;
A = CFTR mutations for which 2/3 of patients have
a sweat chloride measurement > 60
milliequivalents/liter plus PI, obstructive lung
disease, and pseudomonas]
14
Classical
  • Consider a space of elemental outcomes. What is
    P(A)?

P(A) = nA / n

[Venn diagram as above: A ∪ Aᶜ = all mutations in
CFTR; A = the pathogenic subset defined on the
previous slide]
15
Frequentist
  • Consider a space of repeated events that occur
    independently under identical conditions.
  • What is P(A)?

[Diagram: a population of mutations m1, m2, m3,
m4, m5, m6, . . ., mn from which a sample is
drawn; A marks the pathogenic mutations in each]
16
Likelihood
Under the assumption that the events we see
occur independently under identical conditions,
the outcome "mutation is pathogenic or not" for
each sample i is a random variable Y(i).
17
Likelihood
  • Independent Y(i) implies that the joint sampling
    distribution of all of them
  • is the product of the separate, or marginal,
    sampling distributions:

P(Y(1) = y1, . . ., Y(n) = yn) =
P(Y(1) = y1) · . . . · P(Y(n) = yn)

18
Likelihood
Identically distributed Y(i) implies

P(Y(i) = y(i)) = f(y(i) | θ) for all i,

where f(· | θ) is a pdf with parameter θ.
19
Likelihood
Simple example: if the Y(i) are from a Bernoulli
distribution, each one is distributed according
to a Bernoulli pdf with parameter θ:

f(y | θ) = θ^y (1 - θ)^(1-y),  y ∈ {0, 1}
20
Likelihood
Let y represent the vector of data values (y1,
y2, y3, . . ., yn). Before we see the data, the
joint sampling distribution

p(y | θ) = f(y1 | θ) · f(y2 | θ) · . . . · f(yn | θ)

is a function of y for fixed θ. It tells us how
the data would be likely to behave in the future
if we take an iid sample from f(· | θ).
Bayesian Modeling Inference and Prediction,
Draper
21
Likelihood
In 1921, Fisher suggested the following idea:
after we see the data, p(y | θ) can be
interpreted as a function of θ for fixed y. He
called this the likelihood function for θ:
l(θ | y) = p(y | θ).
Bayesian Modeling Inference and Prediction,
Draper
22
Likelihood
Fisher tried to create a theory of inference
about θ using only the likelihood function.
Bayesian Modeling Inference and Prediction,
Draper
23
Likelihood
Example with the Bernoulli pdf:
y = (0,0,1,0,1,1,1,0,1,0,1,0,0,0,0,1)
Let s = y1 + y2 + . . . + yn (the number of 1s).
Then
l(θ | y) = θ^s (1 - θ)^(n-s).
Now the likelihood function depends on the data
vector y only through s. Fisher called s a
sufficient statistic.
Bayesian Modeling Inference and Prediction,
Draper
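To make this concrete, here is a minimal R sketch
(my addition, not part of the original slides)
that computes the likelihood from the example data
through the sufficient statistic s:

y <- c(0,0,1,0,1,1,1,0,1,0,1,0,0,0,0,1)  # example data from the slide
n <- length(y)
s <- sum(y)                               # sufficient statistic: number of 1s

# Likelihood as a function of theta; it depends on y only through s
lik <- function(theta) theta^s * (1 - theta)^(n - s)

theta.grid <- seq(0.01, 0.99, by = 0.01)
plot(theta.grid, lik(theta.grid), type = "l",
     xlab = "theta", ylab = "likelihood")
abline(v = s / n, lty = 2)                # the peak is at the sample mean s/n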
24
Likelihood
Example with the Bernoulli pdf:
sample size n = 400, s = 72

[Plot: l(θ | y) = θ^72 (1 - θ)^328 as a function
of θ for fixed y]
Bayesian Modeling Inference and Prediction,
Draper
25
Likelihood
Possible problems: look at those numbers on the
y-axis! Hardware/software ability to work with
floating point numbers this small is
questionable. Solution: logarithms.
Bayesian Modeling Inference and Prediction,
Draper
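A small R sketch (my addition) of the underflow
issue and the logarithm fix, using the n = 400,
s = 72 example:

n <- 400; s <- 72
theta <- 0.18

lik <- theta^s * (1 - theta)^(n - s)  # about 1e-82: uncomfortably small,
                                      # and it underflows for larger n
loglik <- s * log(theta) + (n - s) * log(1 - theta)  # about -189: no problem

print(lik)
print(loglik)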
26
Log Likelihood
Locally quadratic around its maximum for
sufficiently large n
Bayesian Modeling Inference and Prediction,
Draper
27
Maximum Likelihood
The logarithm function is monotone increasing, so
the θ that maximizes the log likelihood also
maximizes the likelihood. For a function this
well-behaved, we can set its first partial
derivative with respect to θ to 0 and solve.
Bayesian Modeling Inference and Prediction,
Draper
28
Maximum Likelihood
Fisher also had the idea that the maximum of the
likelihood function (which is based on the data
we've seen, the sample) would be a good estimate
of θ, which is a parameter that represents the
population, which we don't see.
Bayesian Modeling Inference and Prediction,
Draper
29
Maximum Likelihood
  • Bernoulli example
  • The maximum likelihood estimate of θ for the
    Bernoulli distribution is the sample mean
    (average number of successes): θ-hat = s / n

For a function this well-behaved, we can set its
first partial derivative with respect to θ to 0
and solve (see the derivation below).
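For completeness, here is the short derivation
behind this result, written in LaTeX (my
reconstruction; the slide showed only the
conclusion):

\log l(\theta \mid y) = s \log\theta + (n-s)\log(1-\theta)

\frac{\partial}{\partial\theta}\log l(\theta \mid y)
  = \frac{s}{\theta} - \frac{n-s}{1-\theta} = 0
  \quad\Longrightarrow\quad \hat{\theta} = \frac{s}{n}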
30
Maximum Likelihood
  • Another simple example to build intuition
  • We can simulate data from a Gaussian distribution
    with a fixed mean and standard deviation
  • This is how to do it in R:

> data <- rnorm(n = 20, mean = 5, sd = 1)
31
Gaussian Likelihood
  • Plot of Gaussian probability distribution with
    mean 5 and standard deviation 1.
  • The twenty vertical bars correspond to the
    twenty values in our simulated data
  • The height of each bar represents the probability
    of that value, given the assumed mean and
    standard deviation.
  • The likelihood function is the product of all
    those probabilities (heights).

None of the probabilities is particularly low,
and they are highest in the center of the
distribution, which is most heavily populated by
the data.
http://www-structmed.cimr.cam.ac.uk/Course/Likelihood/likelihood.html
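In R, the calculation this slide describes can be
sketched as follows (my addition; dnorm gives the
height of the Gaussian density at each data
value):

set.seed(1)                               # for reproducibility
data <- rnorm(n = 20, mean = 5, sd = 1)

heights <- dnorm(data, mean = 5, sd = 1)  # one height per data value
lik <- prod(heights)                      # likelihood = product of heights
loglik <- sum(dnorm(data, mean = 5, sd = 1, log = TRUE))  # safer on the log scale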
32
Gaussian Likelihood
What if we change the assumed mean of the
distribution to 4?
http://www-structmed.cimr.cam.ac.uk/Course/Likelihood/likelihood.html
33
Gaussian Likelihood
What if we change the assumed mean of the
distribution to 4?
The data values on the high end of the
distribution become very improbable, which will
reduce the likelihood function significantly.
Also, fewer of the data values are now in the
peak region.
http://www-structmed.cimr.cam.ac.uk/Course/Likelihood/likelihood.html
34
Gaussian Likelihood
What if we change the assumed mean of the
distribution to 6?
http://www-structmed.cimr.cam.ac.uk/Course/Likelihood/likelihood.html
35
Gaussian Likelihood
What if we have the correct mean but the wrong
standard deviation?
mean 5 and standard deviation 0.6
http://www-structmed.cimr.cam.ac.uk/Course/Likelihood/likelihood.html
36
Gaussian Likelihood
What if we have the correct mean but the wrong
standard deviation?
mean 5 and standard deviation 0.6
In the heavily populated centre, the probability
values go up, but the values in the two tails go
down even more, so that the overall value of the
likelihood is reduced.
http://www-structmed.cimr.cam.ac.uk/Course/Likelihood/likelihood.html
37
Gaussian Likelihood
What if we have the correct mean but the wrong
standard deviation?
mean 5 and standard deviation 2
http://www-structmed.cimr.cam.ac.uk/Course/Likelihood/likelihood.html
38
Gaussian Likelihood
What if we have the correct mean but the wrong
standard deviation?
mean 5 and standard deviation 2
The probabilities in the tails go up, but the
decrease in the heavily-populated centre means
that the overall likelihood goes down.
http://www-structmed.cimr.cam.ac.uk/Course/Likelihood/likelihood.html
39
Gaussian Likelihood
We can carry out a likelihood calculation for all
possible pairs of mean and standard deviation,
shown in the following contour plot. As you see,
the peak in this distribution is close to the
mean and standard deviation that were used in
generating the data from a Gaussian distribution.
http://www-structmed.cimr.cam.ac.uk/Course/Likelihood/likelihood.html
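A sketch of that grid calculation in R (my
addition; the grid ranges are arbitrary choices,
not from the slide):

set.seed(1)
data <- rnorm(n = 20, mean = 5, sd = 1)

means <- seq(3, 7, length.out = 80)
sds <- seq(0.4, 2.5, length.out = 80)

# Log likelihood at every (mean, sd) pair on the grid
loglik <- outer(means, sds, Vectorize(function(m, s)
  sum(dnorm(data, mean = m, sd = s, log = TRUE))))

contour(means, sds, loglik, xlab = "mean", ylab = "sd")
# The peak sits near the generating values, mean = 5 and sd = 1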
40
Maximum Likelihood
  • If you maximize l(θ | y)
  • and I maximize c · l(θ | y) with c > 0,
  • we'll get the same thing.
  • The likelihood function is only defined up to a
    positive multiplicative constant
  • Fisher's actual definition was
    l(θ | y) = c · p(y | θ), c > 0
41
Maximum Likelihood
  • Provides a basic principle for estimation of a
    (population) parameter θ from the
    frequentist/likelihood point of view.

42
Problems with this approach in practice
  • For distributions we are interested in, there is
    usually more than one parameter to estimate
  • Likelihood functions may be extremely
    complicated, and the maximum cannot be found
    analytically

43
Maximum Likelihood in computational biology
  • Real-world examples
  • Sequence similarity scores from a profile hidden
    Markov model

44
Profile hidden Markov models
Profile hidden Markov models. Eddy 1998
45
Multiple sequence alignment with no gaps (can be
represented by a BLOCKS model)
46
Multiple sequence alignment with gaps: deletions
and/or insertions
47
Sequence similarity scores from a profile hidden
Markov model
HMM trained on protein kinases

[Diagram: the HMM assigns a similarity score to
each sequence in a protein sequence database:
P42644 → S1, P48349 → S2, O70456 → S3, . . .,
Q15172 → S(k-1), O94376 → S(k)]
48
Sequence similarity scores from a profile hidden
Markov model
  • How is S distributed?
  • Depends on the choice of N (the null model)

49
Sequence similarity scores from a profile hidden
Markov model
[Figure: extreme value distribution, with peak
(location parameter) µ and scaling parameter λ]
The HMMer program models the null with background
amino acid frequencies from UniProt and scores
with an extreme value distribution.
50
Sequence similarity scores from a profile hidden
Markov model
What might the alternative distribution look like
here?

[Figure: extreme value distribution, as above]
The HMMer program models the null with background
amino acid frequencies from UniProt and scores
with an extreme value distribution.
51
Sequence similarity scores from a profile hidden
Markov model
[Figure: extreme value distribution, with peak
(location parameter) µ and scaling parameter λ]
Given a collection of random sequence scores,
what are the most likely values of λ and µ?
52
Sequence similarity scores from a profile hidden
Markov model
[Figure: extreme value distribution, as above]
Given a collection of random sequence scores,
what are the most likely values of λ and µ?
How could we do this?
53
Sequence similarity scores from a profile hidden
Markov model
Given a collection of random sequence scores,
what are the most likely values of λ and µ?

[Figure: extreme value distribution]
Maximum likelihood fitting of extreme value
distributions. Eddy 1997
54
Sequence similarity scores from a profile hidden
Markov model
Given a collection of random sequence scores,
what are the most likely values of λ and µ?
What's wrong with this?

[Figure: extreme value distribution]
Maximum likelihood fitting of extreme value
distributions. Eddy 1997
55
Maximum likelihood fit to EVD
Maximum likelihood fitting of extreme value
distributions. Eddy 1997
56
Maximum likelihood fit to EVD
After much painful algebra . . .
Maximum likelihood fitting of extreme value
distributions. Eddy 1997
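For reference, here is what the standard Gumbel
maximum likelihood conditions look like, written
in LaTeX. This is my reconstruction of the usual
derivation and is believed to match Eddy 1997;
treat it as a summary, not a quotation:

\hat{\mu} = -\frac{1}{\lambda}\,
  \log\!\Big( \frac{1}{n} \sum_{i=1}^{n} e^{-\lambda x_i} \Big)

\frac{1}{\lambda} - \frac{1}{n}\sum_{i=1}^{n} x_i
  + \frac{\sum_{i=1}^{n} x_i e^{-\lambda x_i}}
         {\sum_{i=1}^{n} e^{-\lambda x_i}} = 0

The second equation is solved numerically for λ,
after which µ follows from the first.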
57
How it's done in HMMer
  • Newton-Raphson algorithm

Using xi's from a simulation of 1000 samples
from an EVD with λ = 0.4 and µ = -20, and varying
λ (along the x-axis). This function is
well-behaved in the vicinity of the root.
Maximum likelihood fitting of extreme value
distributions. Eddy 1997
58
How it's done in HMMer
  • Numerical root finding with Newton-Raphson
    algorithm

[Figure: Newton-Raphson iterations from initial
guess x0 converging to the root x-hat]
59
How it's done in HMMer
  • Guess λ0
  • Apply Newton-Raphson:
  • Calculate the target function f and its
    derivative f' at λ0
  • If f is within some absolute tolerance of 0
    (e.g. 10^-6), stop: λ-hat found
  • Else estimate a new λ = λ - f/f' and iterate
    again
  • Use λ-hat to get µ-hat (a sketch follows below)
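A runnable R sketch of this loop (my addition; it
uses the standard Gumbel ML root equation for f
and a numerical derivative for f', rather than
HMMer's actual C code):

set.seed(1)
n <- 1000
x <- -20 - log(-log(runif(n))) / 0.4  # simulate EVD scores, lambda=0.4, mu=-20

# Root equation in lambda: f(lambda) = 1/lambda - mean(x)
#   + sum(x * exp(-lambda * x)) / sum(exp(-lambda * x))
f <- function(l) {
  w <- exp(-l * (x - max(x)))         # shifted by max(x) for numerical stability
  1 / l - mean(x) + sum(x * w) / sum(w)
}
fp <- function(l, h = 1e-6) (f(l + h) - f(l - h)) / (2 * h)  # numerical f'

lambda <- 0.2                          # initial guess lambda0
repeat {
  lambda <- lambda - f(lambda) / fp(lambda)  # Newton-Raphson update
  if (abs(f(lambda)) < 1e-6) break           # absolute tolerance, as on the slide
}
mu <- max(x) - log(mean(exp(-lambda * (x - max(x))))) / lambda  # then solve for mu
c(lambda = lambda, mu = mu)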

60
Sequence similarity scores from a profile hidden
Markov model
  • The SAM hidden Markov modeling program uses an
    unusual null model:
  • the reverse-sequence null model
  • The EVD does not fit these scores very well.

Karplus et al. Predicting protein structure using
only sequence information. Proteins
1999;S3:212-5
61
Reverse sequence null model scores
  • Reverse null model scores have fatter tails than
    expected under an EVD
  • A candidate distribution family is the
    heavy-tailed Student's t

Karplus et al. Calibrating E-values for hidden
Markov models using reverse-sequence null models.
62
Reverse sequence null model scores
  • Student-t density (location µ, scale s,
    tail-weight ν):
    f(x | ν, µ, s) = Γ((ν+1)/2) / (Γ(ν/2) s √(νπ)) ·
    (1 + ((x-µ)/s)² / ν)^(-(ν+1)/2)
  • By symmetry, set µ = 0
  • A priori, we know that the tail-weight parameter
    ν is small (heavy tail)
  • Without much prior information, it makes sense to
    estimate ν and s with maximum likelihood

Karplus et al. Calibrating E-values for hidden
Markov models using reverse-sequence null models.
63
Reverse sequence null model scores
  • Student-t log likelihood function:
    log L(ν, s | x) = Σi log f(xi | ν, µ=0, s)

The maximum likelihood estimates of ν and s
are the solutions to
∂ log L / ∂ν = 0 and ∂ log L / ∂s = 0
Karplus et al. Calibrating E-values for hidden
Markov models using reverse-sequence null models.
64
Reverse sequence null model scores
  • Can't do this analytically
  • Requires numerical multivariate optimization
    methods
  • For this problem, I used the GNU Scientific
    Library (GSL) to
  • 1. Initialize ν0 and s0
  • 2. Find the direction of steepest gradient ascent
  • 3. Bracket a local maximum along the line with
    golden section search
  • 4. Iterate through steps 2 and 3 until a
    convergence threshold is achieved
    (a sketch in R follows below)

Karplus et al. Calibrating E-values for hidden
Markov models using reverse-sequence null models.
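A compact R sketch of this fit (my addition; it
swaps in R's general-purpose optim() optimizer in
place of the GSL steepest-ascent / golden-section
routine described above, and uses simulated
scores):

set.seed(1)
x <- rt(5000, df = 3) * 1.5          # simulated scores: nu = 3, scale s = 1.5

negloglik <- function(p) {
  nu <- p[1]; s <- p[2]
  if (nu <= 0 || s <= 0) return(Inf) # keep the search in the valid region
  # log density of a scaled t: log f(x | nu, mu=0, s) = log dt(x/s, nu) - log s
  -sum(dt(x / s, df = nu, log = TRUE) - log(s))
}

fit <- optim(c(nu = 1, s = 1), negloglik)  # Nelder-Mead by default
fit$par                                    # estimates land near nu = 3, s = 1.5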
65
Minimization in multi-dimensional space is often
broken down into a series of 1-D line
minimizations
  • If we can find three points a, b and c with
    a < b < c and f(a) > f(c) > f(b) (and f is well
    behaved in this region), then there must exist at
    least one minimum point in the interval (a,c).
    The points a, b and c are said to bracket the
    minimum.
  • Don't need derivatives for a line search based on
    bracketing a minimum.

http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/BMVA96Tut/node16.html
http://www.bip.bham.ac.uk/osmart/msc/lect/lect2a.html
66
Golden section search in 1D
  • Evaluate the function at some point x in the
    larger of the two intervals (a,b) or (b,c).
  • If f(x) < f(b), then x replaces the midpoint b,
    and b becomes an endpoint.
  • If f(x) > f(b), then b remains the midpoint and x
    replaces one of the endpoints.
  • Either way, the bracketing interval gets smaller.
  • Repeat until the bracketing interval reaches a
    desired tolerance.
  • Choosing x at the golden section proportion
    yields the optimal interval reduction rate.
    (A sketch in R follows below.)

[Figure: successive bracketing triplets
(1,3,2), (1,3,4)]
Numerical Recipes in C: The Art of Scientific
Computing, Second Edition. Press et al.
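A compact R implementation of this search (my
addition; it places x at the golden section
proportion of the larger interval, as the slides
describe):

# Golden section search for a 1-D minimum, given a bracketing
# triplet a < b < c with f(b) < f(a) and f(b) < f(c).
golden <- function(f, a, b, c, tol = 1e-8) {
  phi <- (3 - sqrt(5)) / 2              # golden section proportion, ~0.382
  while (c - a > tol) {
    # evaluate at x inside the larger of (a,b) and (b,c)
    x <- if (c - b > b - a) b + phi * (c - b) else b - phi * (b - a)
    if (f(x) < f(b)) {                  # x becomes the new midpoint b
      if (x > b) a <- b else c <- b
      b <- x
    } else {                            # b stays; x becomes an endpoint
      if (x > b) c <- x else a <- x
    }
  }
  b
}

golden(function(t) (t - 2)^2, 0, 1, 5)  # returns ~2, the minimizer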
67
Golden section search in 1D
(Same procedure as the previous slide.)

[Figure: the bracketing triplet advances:
(1,3,2), (1,3,4), (5,3,4)]

Numerical Recipes in C: The Art of Scientific
Computing, Second Edition. Press et al.
68
Golden section search in 1D
(Same procedure as the previous slides.)

[Figure: the bracketing triplet sequence
completes: (1,3,2), (1,3,4), (5,3,4), (5,3,6)]

Numerical Recipes in C: The Art of Scientific
Computing, Second Edition. Press et al.
69
What you should know
  • Develop some intuition about likelihood, maximum
    likelihood, and its limitations
  • Identify problems where you need a maximum
    likelihood solution and be able to:
  • assess whether an analytic solution is possible
  • select and implement a numerical optimization
    algorithm when necessary