# Chapter 15: Likelihood, Bayesian, and Decision Theory

1
Chapter 15: Likelihood, Bayesian, and Decision Theory
• AMS 572
• Group members:
• Yen-hsiu Chen, Valencia Joseph, Lola Ojo, Andrea Roberson, Dave Roelfs, Saskya Sauer, Olivia Shy, Ping Tung

2
Introduction
"To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of." - R.A. Fisher
• Maximum likelihood, Bayesian inference, and decision theory have proven themselves useful and necessary in the sciences, such as physics, and in research in general.
• They provide a practical way to begin and carry out an analysis or experiment.

3
15.1 Maximum Likelihood Estimation
4
15.1.1 Likelihood Function
• Objective: estimating the unknown parameter(s) θ of a population distribution based on a random sample X1, …, Xn from that distribution
• Previous chapters: intuitive estimates
• e.g., the sample mean for the population mean
• To improve estimation, R. A. Fisher (1890-1962) proposed maximum likelihood estimation (MLE) in 1912-1922.

5
Ronald Aylmer Fisher (1890-1962)
• The greatest of Darwin's successors
• Known for:
• 1912: Maximum likelihood
• 1922: F-test
• 1925: Analysis of variance (Statistical Methods for Research Workers)
• Notable prizes:
• Royal Medal (1938)
• Copley Medal (1955)

Source: http://www-history.mcs.st-andrews.ac.uk/history/PictDisplay/Fisher.html
6
Joint p.d.f. vs. Likelihood Function
• Identical quantities
• Different interpretations
• Joint p.d.f. of X1, …, Xn:
• A function of x1, …, xn for given θ
• Probability interpretation
• Likelihood function of θ:
• A function of θ for given x1, …, xn
• No probability interpretation

7
Example: Normal Distribution
• Suppose X1, …, Xn is a random sample from a normal distribution with p.d.f.
  f(x | μ, σ²) = (1 / (σ√(2π))) exp(−(x − μ)² / (2σ²))
• Parameters θ = (μ, σ²); likelihood function:
  L(μ, σ² | x1, …, xn) = ∏ᵢ f(xᵢ | μ, σ²) = (2πσ²)^(−n/2) exp(−Σᵢ (xᵢ − μ)² / (2σ²))

8
15.1.2 Calculation of Maximum Likelihood Estimators (MLEs)
• MLE of an unknown parameter θ:
• The value θ̂ which maximizes the likelihood function
• Example of MLE:
• 2 independent Bernoulli trials with success probability p
• p is known to be either 1/4 or 1/3
• → parameter space T = {1/4, 1/3}
• Using the binomial distribution, the probabilities of observing x = 0, 1, 2 successes can be calculated

9
Example of MLE
• Probability of observing x successes:

| Parameter space T | x = 0 | x = 1 | x = 2 |
|---|---|---|---|
| p = 1/4 | 9/16 | 6/16 | 1/16 |
| p = 1/3 | 4/9 | 4/9 | 1/9 |

• When x = 0, the MLE is p̂ = 1/4
• When x = 1 or 2, the MLE is p̂ = 1/3
• The MLE is chosen to maximize the likelihood for the observed x
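The table's logic can be checked with a short sketch: maximize the likelihood over the finite parameter space T = {1/4, 1/3} from the example (function names are ours, not from the text).

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """Probability of x successes in n Bernoulli trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Finite parameter space T from the example.
space = [Fraction(1, 4), Fraction(1, 3)]

def mle(x, n=2):
    """MLE over T: the parameter value maximizing the likelihood L(p) = P(X = x | p)."""
    return max(space, key=lambda p: binom_pmf(x, n, p))

for x in (0, 1, 2):
    print(x, mle(x))   # x = 0 -> 1/4; x = 1 or 2 -> 1/3, matching the table
```

Using `Fraction` keeps the probabilities exact, so the comparison reproduces the table entries 9/16 vs. 4/9, etc., without rounding.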

10
15.1.3 Properties of MLEs
• Objective:
• Optimality properties in large samples
• Fisher information (continuous case):
  I(θ) = E[(∂/∂θ log f(X | θ))²]
• Equivalent forms of the Fisher information:
  (1) I(θ) = −E[∂²/∂θ² log f(X | θ)]
  (2) I(θ) = Var(∂/∂θ log f(X | θ))
13
MLE (Continued)
• For an i.i.d. sample X1, …, Xn, the Fisher information is In(θ) = n I1(θ), where I1(θ) is the information in a single observation.
14
MLE (Continued)
• Generalization of the Fisher information for a k-dimensional vector parameter θ = (θ1, …, θk): the Fisher information matrix I(θ) with entries
  I_ij(θ) = E[(∂/∂θi log f(X | θ))(∂/∂θj log f(X | θ))]
15
MLE (Continued)
• Cramér-Rao Lower Bound
• Let X1, X2, …, Xn be a random sample from p.d.f. f(x | θ).
• Let θ̂ be any estimator of θ with E(θ̂) = θ + B(θ), where B(θ) is the bias of θ̂. If B(θ) is differentiable in θ and if certain regularity conditions hold, then
  Var(θ̂) ≥ [1 + B′(θ)]² / In(θ)   (Cramér-Rao inequality)
• The ratio of the lower bound to the variance of any estimator of θ is called the efficiency of the estimator.
• An estimator with efficiency 1 is called an efficient estimator.
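As a concrete check of these definitions, here is a sketch using the standard Bernoulli example (our choice, not from the slides): for n i.i.d. Bernoulli(p) trials, In(p) = n / (p(1 − p)), and the sample proportion attains the Cramér-Rao bound, i.e., has efficiency 1.

```python
def fisher_info(p, n):
    """Fisher information of n i.i.d. Bernoulli(p) trials: n / (p(1-p))."""
    return n / (p * (1 - p))

def crlb(p, n):
    """Cramér-Rao lower bound for unbiased estimators of p (bias B = 0)."""
    return 1 / fisher_info(p, n)

def var_sample_proportion(p, n):
    """Exact variance of the sample proportion p_hat = X/n."""
    return p * (1 - p) / n

p, n = 0.3, 50
print(crlb(p, n), var_sample_proportion(p, n))  # equal, so efficiency = 1
```

The two printed values coincide, which is exactly the statement that X/n is an efficient estimator of p.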

16
15.1.4 Large Sample Inference Based on the MLEs
• Large-sample inference on an unknown parameter θ:
• For large n, the MLE θ̂ is approximately normal with mean θ and variance 1/In(θ)
• An approximate 100(1 − α)% CI for θ is θ̂ ± z_{α/2} / √In(θ̂)

17
15.1.4 Delta Method for Approximating the Variance of an Estimator
• Delta method:
• To estimate a nonlinear function h(θ):
• Suppose θ̂ is an estimator of θ with variance Var(θ̂), and h is a known differentiable function of θ.
• Then h(θ̂) estimates h(θ) with approximate variance Var(h(θ̂)) ≈ [h′(θ)]² Var(θ̂).
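A numerical sketch of the delta method; the odds function h(p) = p/(1 − p) and the sample values are our hypothetical example, not from the slides.

```python
def delta_var(h, theta_hat, var_theta_hat, eps=1e-6):
    """Delta-method variance: Var[h(theta_hat)] ~ h'(theta)^2 * Var(theta_hat),
    with h' approximated by a central difference."""
    dh = (h(theta_hat + eps) - h(theta_hat - eps)) / (2 * eps)
    return dh**2 * var_theta_hat

# Hypothetical example: variance of the estimated odds h(p) = p/(1-p),
# where p_hat = X/n has Var(p_hat) = p(1-p)/n.
p_hat, n = 0.4, 100
approx = delta_var(lambda p: p / (1 - p), p_hat, p_hat * (1 - p_hat) / n)
exact = p_hat / (n * (1 - p_hat) ** 3)   # closed form: h'(p)^2 * p(1-p)/n
print(approx, exact)
```

Since h′(p) = 1/(1 − p)², the closed form is [h′(p)]² · p(1 − p)/n = p / (n(1 − p)³); the numerical and closed-form values agree to many decimal places.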

18
15.2 Likelihood Ratio Tests
19
15.2 Likelihood Ratio Tests
The last section presented an inference for point estimation based on likelihood theory. In this section, we present a corresponding inference for testing hypotheses. Let f(x; θ) be a probability density function, where θ is a real-valued parameter taking values in an interval Ω that could be the whole real line. We call Ω the parameter space. An alternative hypothesis H1 will restrict the parameter to some subset ω1 of the parameter space Ω. The null hypothesis H0 then corresponds to the complement of ω1 with respect to Ω.
20
• Consider the two-sided hypothesis H0: θ = θ0 versus H1: θ ≠ θ0, where θ0 is a specified value.
We will test H0 versus H1 on the basis of the random sample X1, …, Xn from f(x; θ). If the null hypothesis holds, we would expect the likelihood to be relatively large when evaluated at the prevailing value θ0. Consider the ratio of two likelihood functions, namely
  λ = L(θ0) / max_{θ ∈ Ω} L(θ).
Note that λ ≤ 1, but if H0 is true, λ should be close to 1, while if H1 is true, λ should be smaller. For a specified significance level α, we have the decision rule: reject H0 in favor of H1 if λ ≤ c, where c is such that α = P_{θ0}(λ ≤ c). This test is called the likelihood ratio test.
21
Example 1
Let X1, …, Xn be a random sample of size n from a normal distribution with known variance σ². Obtain the likelihood ratio for testing H0: μ = μ0 versus H1: μ ≠ μ0.
Setting the derivative of the log-likelihood to zero gives μ̂ = X̄; this is a maximum since the second derivative, −n/σ², is < 0. Thus X̄ is the MLE of μ.
22
Example 1 (continued)
The likelihood ratio is
  λ = L(μ0) / L(X̄) = exp(−n(X̄ − μ0)² / (2σ²)).
λ ≤ c is equivalent to n(X̄ − μ0)² / σ² ≥ −2 ln c; thus the test rejects H0 when |X̄ − μ0| / (σ/√n) is large, i.e., it is the usual two-sided z-test.
23
Example 2
Let X1, …, Xn be a random sample from a Poisson distribution with mean θ > 0.
a. Show that the likelihood ratio test of H0: θ = θ0 versus H1: θ ≠ θ0 is based upon the statistic Y = X1 + … + Xn.
b. Obtain the null distribution of Y.
Setting the derivative of the log-likelihood to zero gives θ̂ = X̄; this is a maximum since the second derivative of the log-likelihood is negative. Thus X̄ = Y/n is the MLE of θ.
24
Example 2 (continued)
The likelihood ratio test statistic is
  λ = L(θ0) / L(θ̂) = e^(−nθ0) θ0^Y / (e^(−Y) (Y/n)^Y).
Under H0, Y = ΣXi ~ Poisson(nθ0), and λ is a function of Y.
25
Example 2 (continued)
• For θ0 = 2 and n = 5, find the significance level of the test that rejects H0 if Y ≤ c1 or Y ≥ c2, for suitable cutoffs c1 < c2.

The null distribution of Y is Poisson(10).
26
Composite Null Hypothesis
The likelihood ratio approach has to be modified slightly when the null hypothesis is composite. When testing the null hypothesis H0: μ = μ0 concerning a normal mean when σ² is unknown, the null hypothesis corresponds to the subset ω0 = {(μ0, σ²): σ² > 0} of the parameter space Ω = {(μ, σ²): −∞ < μ < ∞, σ² > 0}. The null hypothesis is composite, and since it is composite, it isn't certain which value of the parameter(s) prevails even under H0. So we take the maximum of the likelihood over ω0. The generalized likelihood ratio test statistic is defined as
  λ = max_{θ ∈ ω0} L(θ) / max_{θ ∈ Ω} L(θ).
27
Example 3
Let X1, …, Xn be a random sample of size n from a normal distribution with unknown mean and variance. Obtain the likelihood ratio test statistic for testing H0: μ = μ0 versus H1: μ ≠ μ0.

In Example 1, we found the unrestricted MLE of μ: μ̂ = X̄. Now σ² must be estimated as well. Since μ̂ = X̄ does not depend on σ², we only need to find the value of σ² maximizing L(X̄, σ²).
28
Example 3 (continued)
Setting the derivative of the log-likelihood to zero and checking that the second derivative is negative shows that
  σ̂² = (1/n) Σ(Xi − X̄)²
is the MLE of σ² in the unrestricted model, and that under H0
  σ̂0² = (1/n) Σ(Xi − μ0)²
is the MLE of σ². We can also write σ̂0² = σ̂² + (X̄ − μ0)².
29
Example 3 (continued)
Substituting the MLEs into the two maximized likelihoods gives the generalized likelihood ratio
  λ = (σ̂² / σ̂0²)^(n/2).
30
Example 3 (continued)
Rejection region: λ ≤ c, where c is such that α = P_{H0}(λ ≤ c). So
  λ^(−2/n) = σ̂0² / σ̂² = 1 + n(X̄ − μ0)² / Σ(Xi − X̄)² ≥ c^(−2/n).
Define s² = Σ(Xi − X̄)² / (n − 1) and t = (X̄ − μ0) / (s/√n). Then λ ≤ c implies t² ≥ (n − 1)(c^(−2/n) − 1), or |t| ≥ t_{n−1, α/2}. So the GLR test rejects H0 when |t| ≥ t_{n−1, α/2}, where t has a t-distribution with n − 1 degrees of freedom under H0.
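The key fact in Example 3 — the GLR λ is a decreasing function of t², λ = (1 + t²/(n − 1))^(−n/2), so rejecting for small λ is rejecting for large |t| — can be sketched as follows (the sample data are hypothetical):

```python
import math

def glr_from_t(t, n):
    """Generalized likelihood ratio as a decreasing function of t^2:
    lambda = (1 + t^2/(n-1))^(-n/2)."""
    return (1 + t * t / (n - 1)) ** (-n / 2)

def t_statistic(sample, mu0):
    """Usual one-sample t statistic (xbar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return (xbar - mu0) / math.sqrt(s2 / n)

sample = [5.1, 4.8, 5.6, 5.0, 4.7]   # hypothetical data
t = t_statistic(sample, mu0=5.0)
print(t, glr_from_t(t, len(sample)))
```

At t = 0 the ratio equals 1 (the data fully support H0), and λ shrinks monotonically as |t| grows, which is why the GLR test is exactly the two-sided t-test.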
31
15.3 Bayesian Inference
Bayesian inference refers to a statistical inference where new facts are presented and used to draw updated conclusions from a prior belief. The term "Bayesian" stems from the well-known Bayes' Theorem, which was first derived by Reverend Thomas Bayes.

Thomas Bayes (c. 1702 - April 17, 1761). Source: www.wikipedia.com
Thomas Bayes was a Presbyterian minister and a mathematician born in London who developed a special case of Bayes' theorem, which was published and studied after his death.

Bayes' Theorem (review):
  f(A | B) = f(A ∩ B) / f(B) = f(B | A) f(A) / f(B),   (15.1)
since f(A ∩ B) = f(B ∩ A) = f(B | A) f(A).
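A minimal numeric check of Eq. (15.1), with hypothetical probabilities; P(B) is obtained via the law of total probability.

```python
# Hypothetical values for a simple two-event setup.
p_a = 0.3                 # prior P(A)
p_b_given_a = 0.8         # P(B | A)
p_b_given_not_a = 0.2     # P(B | not A)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem (15.1): P(A|B) = P(B|A) P(A) / P(B).
p_a_given_b = p_b_given_a * p_a / p_b
print(p_b, p_a_given_b)
```

Observing B raises the probability of A from the prior 0.3 to the posterior ≈ 0.63, illustrating the prior-to-posterior updating discussed next.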
32
Some Key Terms in Bayesian Inference, in Plain English
• Prior distribution: the probability distribution of an uncertain quantity θ that expresses previous knowledge of θ (e.g., from past experience), in the absence of the current evidence
• Posterior distribution: the conditional distribution of θ once the evidence is taken into account. The posterior is computed from the prior and the likelihood function using Bayes' theorem.
• Posterior mean: the mean of the posterior distribution
• Posterior variance: the variance of the posterior distribution
• Conjugate priors: a family of prior probability distributions with the key property that the posterior distribution also belongs to the same family
33
15.3.1 Bayesian Estimation
So far we've learned that the Bayesian approach treats θ as a random variable, and the data are then used to update the prior distribution to obtain the posterior distribution of θ. Now let's move on to how we can estimate parameters using this approach.

(Using the text's notation) Let θ be an unknown parameter estimated from a random sample x1, x2, …, xn from a distribution with p.d.f./p.m.f. f(x | θ). Let p(θ) be the prior distribution of θ and p(θ | x1, x2, …, xn) be the posterior distribution. Note that p(θ | x1, x2, …, xn) is the conditional distribution of θ given the observed data x1, x2, …, xn. If we apply Bayes' Theorem (Eq. 15.1), the posterior distribution becomes

  p(θ | x1, x2, …, xn) = f(x1, x2, …, xn | θ) p(θ) / ∫ f(x1, x2, …, xn | θ) p(θ) dθ.   (15.2)

Note that the denominator ∫ f(x1, x2, …, xn | θ) p(θ) dθ is the marginal p.d.f. of X1, X2, …, Xn.
34
Bayesian Estimation (continued)
As seen in Equation 15.2, the posterior distribution represents what is known about θ after observing the data x1, x2, …, xn. From earlier chapters, we know that the likelihood of θ is f(x | θ).
So, to get a better idea of the posterior distribution, we note that

  posterior distribution ∝ likelihood × prior distribution, i.e., p(θ | x) ∝ f(x | θ) × p(θ).

For a detailed practical example of deriving the posterior mean and using Bayesian estimation, visit
http://www.stat.berkeley.edu/users/rice/Stat135/Bayes.pdf
35
Example 15.26
Let x be the number of successes in n i.i.d. Bernoulli trials with unknown success probability p = θ. Goal: show that the beta distribution is a conjugate prior on θ.
36
Example 15.26 (continued)
X has a binomial distribution with parameters n and p = θ:
  f(x | θ) = C(n, x) θ^x (1 − θ)^(n−x),  x = 0, 1, …, n
The prior distribution of θ is the beta distribution:
  p(θ) = θ^(a−1) (1 − θ)^(b−1) / B(a, b),  0 ≤ θ ≤ 1
37
Example 15.26 (continued)
Multiplying the prior by the likelihood, the posterior is proportional to θ^(x+a−1) (1 − θ)^(n−x+b−1): a beta distribution with parameters (x + a) and (n − x + b)!
38
• Notes:
• The parameters a and b of the prior distribution may be interpreted as prior successes and prior failures, with m = a + b being the total number of prior observations.
• After actually observing x successes and n − x failures in n i.i.d. Bernoulli trials, these parameters are updated to a + x and b + n − x, respectively.
• The prior and posterior means are, respectively,
  a / (a + b)  and  (a + x) / (a + b + n).
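The conjugate update described in the notes is a one-liner in code; in this sketch (function names ours) a Beta(a, b) prior combined with x successes in n trials yields a Beta(a + x, b + n − x) posterior with the stated means.

```python
def beta_update(a, b, x, n):
    """Posterior Beta parameters after observing x successes in n Bernoulli trials."""
    return a + x, b + n - x

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

a, b = 2, 3          # prior: 2 "prior successes", 3 "prior failures" (m = 5)
x, n = 7, 10         # observed data
a_post, b_post = beta_update(a, b, x, n)
print(beta_mean(a, b), beta_mean(a_post, b_post))
# prior mean a/(a+b) = 0.4; posterior mean (a+x)/(a+b+n) = 9/15 = 0.6
```

The posterior mean lands between the prior mean (0.4) and the sample proportion (0.7), weighted by the relative sizes of m and n.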
39
15.3.2 Bayesian Testing
Assumption: the null and alternative hypotheses restrict θ to the regions Θ0 and Θ1 of the parameter space. We reject H0 in favor of H1 if the posterior odds ratio
  P(θ ∈ Θ1 | x) / P(θ ∈ Θ0 | x) > k,
where k > 0 is a suitably chosen critical constant.
40

15.4 Decision Theory
• Abraham Wald (1902-1950) was the founder of statistical decision theory.
• His goal was to provide a unified theoretical framework for diverse problems, i.e., point estimation, confidence interval estimation, and hypothesis testing.

Source: http://www-history.mcs.st-andrews.ac.uk/history/PictDisplay/Wald.html
41
Statistical Decision Problem
• The goal is to choose a decision d from a set of possible decisions D, based on a sample outcome (data) x
• Decision space: D
• Sample space: the set of all sample outcomes, denoted by X
• Decision rule: a function d(x) which assigns to every sample outcome x ∈ X a decision d ∈ D

42
Continued
• Denote by X the random variable corresponding to x, and the probability distribution of X by f(x | θ).
• The above distribution depends on an unknown parameter θ belonging to a parameter space T.
• If one chooses a decision d when the true parameter is θ, a loss L(d, θ) is incurred; L is known as the loss function.
• A decision rule is assessed by evaluating its expected loss, called the risk function:
  R(d, θ) = E[L(d(X), θ)] = ∫ L(d(x), θ) f(x | θ) dx.

43
Example
• Calculate and compare the risk functions under squared error loss of two estimators of the success probability p from n i.i.d. Bernoulli trials. The first is the usual sample proportion of successes and the second is the Bayes estimator from Example 15.26:
• p̂1 = X/n
• and
• p̂2 = (a + X) / (m + n)
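Under squared error loss the two risk functions have closed forms — R(p̂1, p) = p(1 − p)/n, and R(p̂2, p) = variance plus squared bias — which this sketch compares; the Beta(2, 2) prior and n = 10 are illustrative choices of ours.

```python
def risk_sample_proportion(p, n):
    """Risk (MSE) of p_hat1 = X/n under squared error loss: p(1-p)/n."""
    return p * (1 - p) / n

def risk_bayes(p, n, a, b):
    """Risk of the Bayes estimator p_hat2 = (a + X)/(m + n), with m = a + b:
    variance n p(1-p)/(m+n)^2 plus squared bias ((a - m p)/(m+n))^2."""
    m = a + b
    var = n * p * (1 - p) / (m + n) ** 2
    bias = (a - m * p) / (m + n)
    return var + bias ** 2

# Compare the two risks across the parameter space for a Beta(2, 2) prior, n = 10.
for p in (0.1, 0.5, 0.9):
    print(p, risk_sample_proportion(p, 10), risk_bayes(p, 10, 2, 2))
```

Neither estimator dominates: the Bayes estimator has lower risk for p near the prior mean 0.5 and higher risk near the extremes, which is the typical conclusion of this comparison.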

44
Von Neumann (1928): Minimax
Source: http://jeff560.tripod.com/
45
How Minimax Works
• Focuses on risk avoidance
• Can be applied to both zero-sum and non-zero-sum
games
• Can be applied to multi-stage games
• Can be applied to multi-person games

46
Classic Example: The Prisoner's Dilemma
• Each player evaluates his/her alternatives, attempting to minimize his/her own risk
• From a common-sense standpoint, a sub-optimal equilibrium results

|  | Prisoner B stays silent | Prisoner B betrays |
|---|---|---|
| Prisoner A stays silent | Both serve six months | Prisoner A serves ten years; Prisoner B goes free |
| Prisoner A betrays | Prisoner A goes free; Prisoner B serves ten years | Both serve two years |
47
Classic Example with Probabilities
Two-player game with simultaneous moves, where the probabilities with which player two acts are known to both players.

| 1 \ 2 | Action A, P(A) = p | Action B, P(B) = q | Action C, P(C) = r | Action D, P(D) = 1 − p − q − r |
|---|---|---|---|---|
| Action A | -1 | 1 | -2 | 4 |
| Action B | -2 | 7 | 1 | 1 |
| Action C | 0 | -1 | 0 | 3 |
| Action D | 1 | 0 | 2 | 3 |
• When disregarding the probabilities when playing the game, (D, B) is the equilibrium point under minimax
• With probabilities (p = q = r = 1/4), player one will choose B. This is

48
how Bayes works
• View (p_i, q_i, r_i) as θ_i, where i = 1 in the previous example
• Letting i = 1, …, n, we get a much better idea of what Bayes meant by "states of nature" and how the probabilities of each state enter into one's strategy
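The two decision rules applied to the payoff table of the previous slide can be sketched as follows (function names are ours):

```python
# Payoff table from the slide: rows = player 1's actions A-D,
# columns = player 2's actions A-D.
payoffs = {
    "A": [-1, 1, -2, 4],
    "B": [-2, 7, 1, 1],
    "C": [0, -1, 0, 3],
    "D": [1, 0, 2, 3],
}

def minimax_choice(table):
    """Pick the action whose worst-case payoff is largest."""
    return max(table, key=lambda act: min(table[act]))

def bayes_choice(table, probs):
    """Pick the action with the highest expected payoff under the given
    probabilities for player 2's actions."""
    return max(table, key=lambda act: sum(x * p for x, p in zip(table[act], probs)))

print(minimax_choice(payoffs))             # D, as in the minimax analysis
print(bayes_choice(payoffs, [0.25] * 4))   # B, as in the Bayes analysis
```

With p = q = r = 1/4 the expected payoffs are A: 0.5, B: 1.75, C: 0.5, D: 1.5, so knowing the state probabilities changes the optimal action from the risk-avoiding D to B.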

49
Conclusion
• We covered three theoretical approaches in our presentation:
• Likelihood
• Provides statistical justification for many of the methods used in statistics
• MLE: a method used to make inferences about parameters of the underlying probability distribution of a given data set
• Bayesian and Decision Theory
• Bayesian theory
• Probabilities are associated with individual events or statements rather than with sequences of events
• Decision theory
• Describes and rationalizes the process of decision making, that is, making a choice among several possible alternatives