Chapter 15: Likelihood, Bayesian, and Decision Theory — Presentation Transcript

1
Chapter 15: Likelihood, Bayesian, and Decision Theory
  • AMS 572
  • Group Members:
  • Yen-hsiu Chen, Valencia Joseph, Lola Ojo,
  • Andrea Roberson, Dave Roelfs,
  • Saskya Sauer, Olivia Shy, Ping Tung

2
Introduction
"To call in the statistician after the experiment
is done may be no more than asking him to perform
a post-mortem examination he may be able to say
what the experiment died of." - R.A. Fisher
  • Maximum Likelihood, Bayesian, and Decision Theory
    are applied and have proven its selves useful and
    necessary in sciences, such as physics, as well
    as research in general.
  • They provide a practical way to begin and carry
    out an analysis or experiment.

3
15.1 Maximum Likelihood Estimation
4
15.1.1 Likelihood Function
  • Objective: estimating the unknown parameter θ of a population
    distribution based on a random sample X1, ..., Xn from that
    distribution
  • Previous chapters: intuitive estimates
  • e.g., the sample mean as an estimate of the population mean
  • To improve estimation, R. A. Fisher (1890-1962) proposed maximum
    likelihood estimation in 1912-1922.

5
Ronald Aylmer Fisher (1890-1962)
  • "The greatest of Darwin's successors"
  • Known for:
  • 1912: Maximum likelihood
  • 1922: F-test
  • 1925: Analysis of variance (Statistical Methods for Research Workers)
  • Notable prizes:
  • Royal Medal (1938)
  • Copley Medal (1955)

Source: http://www-history.mcs.st-andrews.ac.uk/history/PictDisplay/Fisher.html
6
Joint p.d.f. vs. Likelihood Function
  • Identical quantities
  • Different interpretations
  • Joint p.d.f. of X1, ..., Xn:
  • a function of x1, ..., xn for given θ
  • probability interpretation
  • Likelihood function of θ:
  • a function of θ for given x1, ..., xn
  • no probability interpretation (see the identity below)
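
For an i.i.d. sample, the two objects share one formula; a standard way to
write the relationship is:

  f(x_1, \ldots, x_n \mid \theta) \;=\; \prod_{i=1}^{n} f(x_i \mid \theta) \;=\; L(\theta \mid x_1, \ldots, x_n)

Read as a function of x1, ..., xn for fixed θ it is the joint p.d.f.; read
as a function of θ for the observed x1, ..., xn it is the likelihood.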

7
Example: Normal Distribution
  • Suppose X1, ..., Xn is a random sample from a normal distribution
    with p.d.f. f(x | μ, σ²)
  • Parameter θ = (μ, σ²); likelihood function L(μ, σ² | x1, ..., xn)
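
Written out for the normal model (standard formulas, included here for
completeness):

  f(x \mid \mu, \sigma^2) \;=\; \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\!\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}

  L(\mu, \sigma^2 \mid x_1, \ldots, x_n) \;=\; \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n}\exp\!\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right\}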

8
15.1.2 Calculation of Maximum Likelihood Estimators (MLE)
  • MLE of an unknown parameter θ:
  • the value which maximizes the likelihood function
  • Example of MLE:
  • 2 independent Bernoulli trials with success probability θ
  • θ is known to be either 1/4 or 1/3
  • => parameter space Θ = {1/4, 1/3}
  • Using the binomial distribution, the probabilities of observing
  • x = 0, 1, 2 successes can be calculated

9
Example of MLE
  • Probability of observing x successes, P(x | θ), for θ in the
    parameter space Θ = {1/4, 1/3}

  Number of successes x:       0       1       2
  θ = 1/4                     9/16    6/16    1/16
  θ = 1/3                     4/9     4/9     1/9

  • When x = 0, the MLE of θ is 1/4 (since 9/16 > 4/9)
  • When x = 1 or 2, the MLE of θ is 1/3 (since 4/9 > 6/16 and 1/9 > 1/16)
  • The MLE is chosen to maximize P(x | θ) over the parameter space for
    the observed x (a quick numerical check follows)
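
A quick way to check these numbers and the resulting MLE is a small
enumeration over the two candidate parameter values (a minimal sketch; the
parameter grid and sample size are the ones from this example):

  from math import comb

  # Candidate parameter values (the parameter space of this example)
  theta_grid = [1/4, 1/3]
  n = 2  # two independent Bernoulli trials

  def binom_pmf(x, n, theta):
      """P(X = x) for X ~ Binomial(n, theta)."""
      return comb(n, x) * theta**x * (1 - theta)**(n - x)

  for x in range(n + 1):
      # The MLE maximizes the likelihood over the parameter space
      mle = max(theta_grid, key=lambda t: binom_pmf(x, n, t))
      probs = {round(t, 4): round(binom_pmf(x, n, t), 4) for t in theta_grid}
      print(f"x = {x}: likelihoods = {probs}, MLE = {mle}")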

10
15.1.3 Properties of MLEs
  • Objective:
  • optimality properties in large samples
  • Fisher information (continuous case), Eq. (1)
  • Alternative form of the Fisher information, Eq. (2)
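
The two standard forms of the Fisher information for a single observation,
referred to as (1) and (2) above, are:

  I(\theta) \;=\; E\!\left[\left(\frac{\partial}{\partial\theta}\,\ln f(X \mid \theta)\right)^{\!2}\right]    (1)

  I(\theta) \;=\; -\,E\!\left[\frac{\partial^{2}}{\partial\theta^{2}}\,\ln f(X \mid \theta)\right]    (2)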
11
(No Transcript)
12
(No Transcript)
13
MLE (Continued)
  • Define the Fisher information for an i.i.d. sample
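
For an i.i.d. sample of size n the information adds across observations, so
the standard definition is:

  I_n(\theta) \;=\; E\!\left[\left(\frac{\partial}{\partial\theta}\,\ln \prod_{i=1}^{n} f(X_i \mid \theta)\right)^{\!2}\right] \;=\; n\,I(\theta)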

14
MLE (Continued)
  • Generalization of the Fisher information for a k-dimensional vector
    parameter
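
For a k-dimensional parameter θ = (θ1, ..., θk), the Fisher information
becomes a k × k matrix with (i, j) entry:

  I(\boldsymbol{\theta})_{ij} \;=\; E\!\left[\frac{\partial}{\partial\theta_i}\,\ln f(X \mid \boldsymbol{\theta})\;\frac{\partial}{\partial\theta_j}\,\ln f(X \mid \boldsymbol{\theta})\right], \qquad i, j = 1, \ldots, k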

15
MLE (Continued)
  • Cramér-Rao Lower Bound
  • Let X1, X2, ..., Xn be a random sample from p.d.f. f(x | θ).
  • Let θ̂ be any estimator of θ with E(θ̂) = θ + B(θ), where B(θ) is the
    bias of θ̂. If B(θ) is differentiable in θ and if certain regularity
    conditions hold, then the Cramér-Rao inequality bounds Var(θ̂) from
    below (see the form given after this list).
  • The ratio of the lower bound to the variance of any estimator of θ is
    called the efficiency of the estimator.
  • An estimator with efficiency 1 is called an efficient estimator.
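
In its standard form the inequality reads:

  \operatorname{Var}(\hat{\theta}) \;\ge\; \frac{\left[1 + B'(\theta)\right]^{2}}{n\,I(\theta)}

For an unbiased estimator (B(θ) = 0) this reduces to Var(θ̂) ≥ 1 / (n I(θ)).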

16
15.1.4 Large Sample Inference Based on the MLEs
  • Large sample inference on an unknown parameter θ
  • point estimate: the MLE θ̂
  • 100(1 - α)% CI for θ (see the interval below)
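
Using the asymptotic normality of the MLE, θ̂ is approximately
N(θ, 1/(n I(θ))) for large n, which gives the usual approximate interval:

  \hat{\theta} \;\pm\; z_{\alpha/2}\,\frac{1}{\sqrt{n\,I(\hat{\theta})}}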

17
15.1.4 Delta Method for Approximating the Variance of an Estimator
  • Delta method:
  • estimate a nonlinear function h(θ)
  • suppose that θ̂ is an estimator of θ and h is a known, differentiable
    function of θ
  • use a first-order Taylor expansion of h(θ̂) around θ (see below)
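
The resulting first-order (delta-method) approximation is:

  \operatorname{Var}\!\left(h(\hat{\theta})\right) \;\approx\; \left[h'(\theta)\right]^{2}\,\operatorname{Var}(\hat{\theta})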

18
15.2 Likelihood Ratio Tests
19
15.2 Likelihood Ratio Tests
The last section presented inference for point estimation based on
likelihood theory. In this section, we present the corresponding inference
for testing hypotheses. Let f(x; θ) be a probability density function,
where θ is a real-valued parameter taking values in an interval Ω that
could be the whole real line. We call Ω the parameter space. An
alternative hypothesis H1 restricts the parameter to some subset ω1 of the
parameter space Ω. The null hypothesis H0 then corresponds to the
complement of ω1 with respect to Ω.
20
  • Consider the two-sided hypotheses

H0: θ = θ0 versus H1: θ ≠ θ0, where θ0 is a specified value.
We will test H0 versus H1 on the basis of the random sample X1, ..., Xn
from f(x; θ). If the null hypothesis holds, we would expect the likelihood
L(θ0 | x1, ..., xn) to be relatively large when evaluated at the prevailing
value θ0. Consider the ratio of the two likelihood functions, namely
λ = L(θ0) / L(θ̂), where θ̂ is the MLE. Note that λ ≤ 1, but if H0 is true
λ should be close to 1, while if H1 is true λ should be smaller. For a
specified significance level α, we have the decision rule: reject H0 in
favor of H1 if λ ≤ c, where c is such that P(λ ≤ c | H0) = α.
This test is called the likelihood ratio test.
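
A useful large-sample companion to this rule (Wilks' theorem, stated here
for reference) is that under H0, with one parameter restricted and under
regularity conditions,

  -2\ln\lambda \;\xrightarrow{\;d\;}\; \chi^{2}_{1},

so for large n the rule "reject H0 if -2 ln λ ≥ χ²_{1,α}" has approximate
level α.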
21
Example 1
Let X1, ..., Xn be a random sample of size n from a normal distribution
with known variance σ². Obtain the likelihood ratio for testing
H0: μ = μ0 versus H1: μ ≠ μ0.
Setting the derivative of the log-likelihood with respect to μ equal to
zero gives μ̂ = x̄. This is a maximum since the second derivative of the
log-likelihood, -n/σ², is < 0. Thus x̄ is the MLE of μ.
22
Example 1 (continued)
The likelihood ratio is λ = L(μ0)/L(x̄). Rejecting H0 when λ ≤ c is
equivalent to rejecting when |x̄ - μ0| is sufficiently large; thus the
likelihood ratio test reduces to the usual two-sided z-test.
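
Under the normal model with known σ², the ratio simplifies (a standard
calculation) to:

  \lambda \;=\; \exp\!\left\{-\frac{n(\bar{x}-\mu_0)^2}{2\sigma^2}\right\}, \qquad -2\ln\lambda \;=\; \frac{n(\bar{x}-\mu_0)^2}{\sigma^2} \;=\; z^2, \quad z = \frac{\sqrt{n}\,(\bar{x}-\mu_0)}{\sigma}

so λ ≤ c is equivalent to |z| ≥ z_{α/2}.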
23
Example 2
Let X1, ..., Xn be a random sample from a Poisson distribution with mean
θ > 0.
a. Show that the likelihood ratio test of H0: θ = θ0 versus H1: θ ≠ θ0 is
based upon the statistic Y = ΣXi.
b. Obtain the null distribution of Y.
Setting the derivative of the log-likelihood equal to zero gives
θ̂ = Y/n = x̄. This is a maximum since the second derivative of the
log-likelihood is negative; thus x̄ is the MLE of θ.
24
Example 2 (continued)
The likelihood ratio test statistic is λ = L(θ0)/L(θ̂), and it is a
function of Y = ΣXi. Under H0, Y has a Poisson(nθ0) distribution.
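
Written out under the Poisson model, with θ̂ = Y/n:

  L(\theta) \;=\; \frac{e^{-n\theta}\,\theta^{\,Y}}{\prod_{i=1}^{n} x_i!}, \qquad \lambda \;=\; \frac{L(\theta_0)}{L(\hat{\theta})} \;=\; e^{\,n(\hat{\theta}-\theta_0)}\left(\frac{\theta_0}{\hat{\theta}}\right)^{\!Y}

which depends on the data only through Y; small values of λ correspond to Y
being far from nθ0.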
25
Example 2 (continued)
  • For θ0 = 2 and n = 5, find the significance level of the test that
    rejects H0 if Y ≤ c1 or Y ≥ c2 for chosen cutoffs c1 < c2.

The null distribution of Y is Poisson(nθ0) = Poisson(10).
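
A short calculation of the significance level for any pair of cutoffs (a
minimal sketch; the cutoffs c1 = 4 and c2 = 17 below are illustrative
placeholders, not the values used in the example):

  from math import exp, factorial

  def poisson_pmf(y, lam):
      """P(Y = y) for Y ~ Poisson(lam)."""
      return exp(-lam) * lam**y / factorial(y)

  lam = 5 * 2          # null mean: n * theta0 = 10
  c1, c2 = 4, 17       # illustrative cutoffs (assumed, not from the example)

  # Significance level = P(Y <= c1) + P(Y >= c2) under H0
  alpha = sum(poisson_pmf(y, lam) for y in range(0, c1 + 1))
  alpha += 1 - sum(poisson_pmf(y, lam) for y in range(0, c2))
  print(f"significance level = {alpha:.4f}")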
26
Composite Null Hypothesis
The likelihood ratio approach has to be modified slightly when the null
hypothesis is composite. When testing the null hypothesis H0: μ = μ0
concerning a normal mean when σ² is unknown, the parameter space Ω is a
subset of the (μ, σ²) half-plane. The null hypothesis is composite, since
it leaves σ² unspecified. Because the null hypothesis is composite, it
isn't certain which value of the parameter(s) prevails even under H0. So
we take the maximum of the likelihood over the null parameter set Ω0 as
well as over the full parameter space Ω. The generalized likelihood ratio
test statistic is defined below.
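
In its standard form, the generalized likelihood ratio statistic is:

  \lambda \;=\; \frac{\max_{\theta \in \Omega_0} L(\theta \mid x_1, \ldots, x_n)}{\max_{\theta \in \Omega} L(\theta \mid x_1, \ldots, x_n)}, \qquad 0 \le \lambda \le 1

and H0 is rejected for small values of λ.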
27
Example 3
Let X1, ..., Xn be a random sample of size n from a normal distribution
with unknown mean and variance. Obtain the likelihood ratio test statistic
for testing H0: μ = μ0 versus H1: μ ≠ μ0.

In Example 1, we found the unrestricted MLE of μ to be x̄, and this does
not depend on σ². Now σ² is also unknown, so we only need to find the
value of σ² maximizing the likelihood, first with μ = x̄ (unrestricted) and
then with μ = μ0 (under H0).
28
Example 3 (continued)
Setting the partial derivatives of the log-likelihood equal to zero and
checking that the stationary point is a maximum gives the MLEs under the
two models: under the unrestricted model, μ̂ = x̄ and
σ̂² = (1/n)Σ(xi - x̄)²; under H0, with μ fixed at μ0, the MLE of the
variance is σ̂0² = (1/n)Σ(xi - μ0)². We can also write
σ̂0² = σ̂² + (x̄ - μ0)².
29
Example 3 (continued)
30
Example 3 (continued)
Rejection region: reject H0 when λ ≤ c, where c is chosen such that
P(λ ≤ c | H0) = α. Substituting the MLEs shows that λ is a decreasing
function of |t|, where t = √n(x̄ - μ0)/s and s is the sample standard
deviation. Small λ therefore corresponds to large |t|, so the generalized
likelihood ratio test is the usual two-sided t-test: reject H0 if
|t| ≥ t_{n-1, α/2}.
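
The substitution referred to above works out as follows (a standard
derivation):

  \lambda \;=\; \left(\frac{\hat{\sigma}^{2}}{\hat{\sigma}_0^{2}}\right)^{\!n/2} \;=\; \left(\frac{1}{1 + (\bar{x}-\mu_0)^2/\hat{\sigma}^{2}}\right)^{\!n/2} \;=\; \left(\frac{1}{1 + t^{2}/(n-1)}\right)^{\!n/2}, \qquad t = \frac{\sqrt{n}\,(\bar{x}-\mu_0)}{s}

so λ ≤ c if and only if t² exceeds a corresponding constant.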
31
15.3 Bayesian Inference
Bayesian inference refers to statistical inference in which new evidence is
presented and used to draw updated conclusions from a prior belief. The
term "Bayesian" stems from the well-known Bayes' Theorem, which was first
derived by the Reverend Thomas Bayes.
Thomas Bayes (c. 1702 - April 17, 1761). Source: www.wikipedia.com
Thomas Bayes was a Presbyterian minister and mathematician born in London
who developed a special case of Bayes' theorem, which was published and
studied after his death.

Bayes' Theorem (review):

  f(A \mid B) \;=\; \frac{f(A \cap B)}{f(B)} \;=\; \frac{f(B \mid A)\,f(A)}{f(B)},    (15.1)

since f(A ∩ B) = f(B ∩ A) = f(B | A) f(A).
32
Some Key Terms in Bayesian Inference, in Plain English
  • prior distribution - the probability distribution of an uncertain
    quantity, θ, that expresses previous knowledge of θ (for example,
    from past experience) in the absence of the current evidence
  • posterior distribution - the distribution that takes the evidence
    into account; it is the conditional distribution of θ given the
    data. The posterior is computed from the prior and the likelihood
    function using Bayes' theorem.
  • posterior mean - the mean of the posterior distribution
  • posterior variance - the variance of the posterior distribution
  • conjugate priors - a family of prior probability distributions whose
    key property is that the posterior distribution also belongs to the
    same family as the prior

33
15.3.1 Bayesian Estimation
So far we've learned that the Bayesian approach treats θ as a random
variable, and the data are then used to update the prior distribution to
obtain the posterior distribution of θ. Now let's move on to how we can
estimate parameters using this approach.
(Using the text's notation) Let θ be an unknown parameter, and let
x1, x2, ..., xn be a random sample from a distribution with pdf/pmf
f(x | θ). Let p(θ) be the prior distribution of θ, and let
p(θ | x1, x2, ..., xn) be the posterior distribution. Note that
p(θ | x1, x2, ..., xn) is the conditional distribution of θ given the
observed data x1, x2, ..., xn. If we apply Bayes' Theorem (Eq. 15.1),
the posterior distribution becomes
  p(\theta \mid x_1, x_2, \ldots, x_n) \;=\; \frac{f(x_1, x_2, \ldots, x_n \mid \theta)\,p(\theta)}{\int f(x_1, x_2, \ldots, x_n \mid \theta)\,p(\theta)\,d\theta}    (15.2)

Note that the denominator, ∫ f(x1, x2, ..., xn | θ) p(θ) dθ, is the
marginal PDF of X1, X2, ..., Xn.
34
Bayesian Estimation (continued)
As seen in equation 15.2, the posterior distribution represents what is
known about θ after observing the data X = (x1, x2, ..., xn). From earlier
chapters, we know that the likelihood of θ is f(X | θ).
So, to get a better idea of the posterior distribution, we note that

posterior distribution ∝ likelihood × prior distribution,
i.e. p(θ | X) ∝ f(X | θ) × p(θ).

For a detailed practical example of deriving the posterior mean and using
Bayesian estimation, visit
http://www.stat.berkeley.edu/users/rice/Stat135/Bayes.pdf
35
Example 15.26
Let x be the number of successes from n i.i.d. Bernoulli trials with
unknown success probability p = θ. Show that the beta distribution is a
conjugate prior for θ.
Goal: show that if the prior on θ is a beta distribution, then so is the
posterior.
36
Example 15.26 (continued)
X has a binomial distribution with parameters n and p = θ,
x = 0, 1, ..., n.
The prior distribution of θ is the beta distribution with parameters a and
b, 0 ≤ θ ≤ 1; multiplying the prior by the likelihood gives the posterior
shown below.
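
Multiplying likelihood by prior (the key step of the example, written out):

  p(\theta \mid x) \;\propto\; \binom{n}{x}\theta^{x}(1-\theta)^{n-x} \cdot \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)} \;\propto\; \theta^{x+a-1}(1-\theta)^{n-x+b-1}

which is the kernel of a Beta(x + a, n - x + b) distribution.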
37
Example 15.26 (continued)
It is a beta distribution with parameters (x + a) and (n - x + b)!
38
  • Notes:
  • The parameters a and b of the prior distribution may be interpreted
    as "prior successes" and "prior failures", with m = a + b being the
    total number of prior observations.
  • After actually observing x successes and n - x failures in n i.i.d.
    Bernoulli trials, these parameters are updated to a + x and
    b + n - x, respectively.
  • The prior and posterior means are, respectively,
    a / (a + b) and (a + x) / (a + b + n).
39
15.3.2 Bayesian Testing
Assumption: prior probabilities are assigned to the hypotheses H0 and H1.
If the posterior odds in favor of H1 exceed k, we reject H0 in favor of H1,
where k > 0 is a suitably chosen critical constant.
40
                                      
15.4 Decision Theory
  • Abraham Wald (1902-1950) was the founder of statistical decision
    theory.
  • His goal was to provide a unified theoretical framework for diverse
    problems, i.e. point estimation, confidence interval estimation, and
    hypothesis testing.

Source: http://www-history.mcs.st-andrews.ac.uk/history/PictDisplay/Wald.html
41
Statistical Decision Problem
  • The goal is to choose a decision d from a set of possible decisions
    D, based on a sample outcome (data) x
  • Decision space: D
  • Sample space: the set of all sample outcomes, denoted by X
  • Decision rule: a function d(x) which assigns to every sample outcome
    x ∈ X a decision d(x) ∈ D

42
Continued
  • Denote by X the random variable corresponding to x, and the
    probability distribution of X by f(x | θ).
  • This distribution depends on an unknown parameter θ belonging to a
    parameter space Θ.
  • If one chooses a decision d when the true parameter is θ, a loss
    L(d, θ) is incurred; L is known as the loss function.
  • A decision rule is assessed by evaluating its expected loss, called
    the risk function:
  • R(d, θ) = E[L(d(X), θ)] = ∫ L(d(x), θ) f(x | θ) dx

43
Example
  • Calculate and compare the risk functions under squared error loss of
    two estimators of the success probability p from n i.i.d. Bernoulli
    trials. The first is the usual sample proportion of successes and
    the second is the Bayes estimator from Example 15.26 (see the sketch
    after this list):
  • δ1 = X/n
  • and
  • δ2 = (a + X) / (m + n), where m = a + b
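
A minimal numerical sketch of the comparison, using the closed-form risks
under squared error loss, R(δ1, p) = p(1-p)/n and
R(δ2, p) = [np(1-p) + (a - (a+b)p)²] / (a+b+n)²; the values n = 20 and
a = b = 3 are illustrative assumptions, not values from the example:

  # Risk (expected squared error) of the two estimators as functions of p.
  n, a, b = 20, 3, 3  # illustrative choices

  def risk_sample_proportion(p):
      """Risk of delta1 = X/n: the variance of the sample proportion."""
      return p * (1 - p) / n

  def risk_bayes(p):
      """Risk of delta2 = (a + X)/(a + b + n): variance plus squared bias."""
      return (n * p * (1 - p) + (a - (a + b) * p) ** 2) / (a + b + n) ** 2

  for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
      print(f"p = {p:.1f}: R(delta1) = {risk_sample_proportion(p):.5f}, "
            f"R(delta2) = {risk_bayes(p):.5f}")

Typically the Bayes estimator has the smaller risk for p near the prior
mean a/(a+b) and the larger risk for p near 0 or 1.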

44
Von Neumann (1928) Minimax
Source: http://jeff560.tripod.com/
45
How Minimax Works
  • Focuses on risk avoidance: a minimax rule guards against the worst
    case over the states of nature (see the formulation below)
  • Can be applied to both zero-sum and non-zero-sum games
  • Can be applied to multi-stage games
  • Can be applied to multi-person games
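
In decision-theoretic terms, the minimax rule can be written as:

  d^{*} \;=\; \arg\min_{d \in D}\; \max_{\theta \in \Theta}\; R(d, \theta)

i.e. d* minimizes the maximum risk over all possible states of nature.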

46
Classic Example: The Prisoner's Dilemma
  • Each player evaluates his/her alternatives, attempting to minimize
    his/her own risk
  • From a common sense standpoint, a sub-optimal equilibrium results

                             Prisoner B stays silent            Prisoner B betrays
  Prisoner A stays silent    Both serve six months              A serves ten years; B goes free
  Prisoner A betrays         A goes free; B serves ten years    Both serve two years
47
Classic Example: With Probabilities
Two-player game with simultaneous moves, where the probabilities with which
player two acts are known to both players.

  Player 1 \ Player 2   Action A (P = p)   Action B (P = q)   Action C (P = r)   Action D (P = 1-p-q-r)
  Action A                    -1                  1                 -2                    4
  Action B                    -2                  7                  1                    1
  Action C                     0                 -1                  0                    3
  Action D                     1                  0                  2                    3

  • When disregarding the probabilities when playing the game, (D, B) is
    the equilibrium point under minimax
  • With probabilities (p = q = r = 1/4), player one will choose B, the
    action with the highest expected payoff (see the calculation below)
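
The expected-payoff calculation behind that choice (a minimal sketch using
the payoff matrix above and p = q = r = 1/4):

  # Player 1's payoffs: rows = player 1's actions, columns = player 2's actions.
  payoffs = {
      "A": [-1, 1, -2, 4],
      "B": [-2, 7, 1, 1],
      "C": [0, -1, 0, 3],
      "D": [1, 0, 2, 3],
  }
  probs = [0.25, 0.25, 0.25, 0.25]  # P(A), P(B), P(C), P(D) for player 2

  # Expected payoff of each of player 1's actions against player 2's mixed play
  expected = {action: sum(u * pr for u, pr in zip(row, probs))
              for action, row in payoffs.items()}
  best = max(expected, key=expected.get)
  print(expected)              # {'A': 0.5, 'B': 1.75, 'C': 0.5, 'D': 1.5}
  print("best action:", best)  # B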

48
How Bayes Works
  • View (pi, qi, ri) as θi, where i = 1 in the previous example
  • Letting i = 1, ..., n, we get a much better idea of what Bayes meant
    by "states of nature" and how the probabilities of each state enter
    into one's strategy

49
Conclusion
  • We covered three theoretical approaches in our presentation
  • Likelihood:
  • provides statistical justification for many of the methods used in
    statistics
  • MLE - a method used to make inferences about the parameters of the
    underlying probability distribution of a given data set
  • Bayesian and decision theory:
  • paradigms used in statistics
  • Bayesian theory:
  • probabilities are associated with individual events or statements
    rather than with sequences of events
  • Decision theory:
  • describes and rationalizes the process of decision making, that is,
    making a choice among several possible alternatives

Source: http://www.answers.com/maximum%20likelihood,
http://www.answers.com/bayesian%20theory,
http://www.answers.com/decision%20theory
50
The End
Any questions for the group?