
Chapter 15: Likelihood, Bayesian, and Decision Theory

- AMS 572
- Group Members
- Yen-hsiu Chen, Valencia Joseph, Lola Ojo,
- Andrea Roberson, Dave Roelfs,
- Saskya Sauer, Olivia Shy, Ping Tung

Introduction

"To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of." - R.A. Fisher

- Maximum likelihood, Bayesian, and decision theory are widely applied and have proven themselves useful and necessary in the sciences, such as physics, as well as in research in general.
- They provide a practical way to begin and carry out an analysis or experiment.

15.1 Maximum Likelihood Estimation

15.1.1 Likelihood Function

- Objective: estimating the unknown parameter θ of a population distribution based on a random sample X1, …, Xn from that distribution
- Previous chapters used intuitive estimates, e.g. the sample mean for the population mean
- To improve estimation, R. A. Fisher (1890-1962) proposed maximum likelihood estimation in 1912-1922.

Ronald Aylmer Fisher (1890-1962)

- "The greatest of Darwin's successors"
- Known for
- 1912: Maximum likelihood
- 1922: F-test
- 1925: Analysis of variance (Statistical Methods for Research Workers)
- Notable prizes
- Royal Medal (1938)
- Copley Medal (1955)

Source: http://www-history.mcs.st-andrews.ac.uk/history/PictDisplay/Fisher.html

Joint p.d.f. vs. Likelihood Function

- Identical quantities
- Different interpretations
- Joint p.d.f. of X1, …, Xn
- A function of x1, …, xn for given θ
- Probability interpretation
- Likelihood function of θ
- A function of θ for given x1, …, xn
- No probability interpretation

Example Normal Distribution

- Suppose X1, …, Xn is a random sample from a normal distribution with p.d.f.
  f(x | μ, σ²) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))
- For the parameter θ = (μ, σ²), the likelihood function is
  L(μ, σ² | x1, …, xn) = ∏ f(xi | μ, σ²) = (2πσ²)^(−n/2) exp(−Σ(xi − μ)²/(2σ²))
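As a quick numeric sketch (the sample values below are made up for illustration), the normal log-likelihood can be evaluated directly; viewed as a function of μ with the data held fixed, it peaks at the sample mean:

```python
import math

def normal_log_likelihood(mu, sigma2, xs):
    """ln L(mu, sigma^2) = -(n/2) ln(2*pi*sigma^2) - sum((x - mu)^2) / (2*sigma^2)."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

xs = [4.9, 5.1, 5.3, 4.7, 5.0]   # hypothetical sample
xbar = sum(xs) / len(xs)         # = 5.0
# With sigma^2 held fixed, the likelihood is largest at mu = xbar:
assert normal_log_likelihood(xbar, 1.0, xs) > normal_log_likelihood(4.0, 1.0, xs)
```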

15.1.2 Calculation of Maximum Likelihood Estimators (MLE)

- The MLE of an unknown parameter θ is the value θ̂ which maximizes the likelihood function
- Example of MLE
- 2 independent Bernoulli trials with success probability p
- p is known to be either 1/4 or 1/3
- → parameter space T = {1/4, 1/3}
- Using the binomial distribution, the probabilities of observing x = 0, 1, 2 successes can be calculated

Example of MLE

- Probability of observing x successes:

Parameter space T | x = 0 | x = 1 | x = 2
1/4 | 9/16 | 6/16 | 1/16
1/3 | 4/9 | 4/9 | 1/9

- When x = 0, the MLE of p is 1/4
- When x = 1 or 2, the MLE of p is 1/3
- The MLE is chosen to maximize the likelihood for the observed x
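The table above can be reproduced in a few lines of Python; the `binom_pmf` helper below is ours, not from the slides:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

param_space = [1/4, 1/3]   # the two candidate values of p
n = 2
# For each observed x, pick the parameter value that maximizes the likelihood:
mles = {x: max(param_space, key=lambda p: binom_pmf(x, n, p)) for x in range(n + 1)}
# mles == {0: 1/4, 1: 1/3, 2: 1/3}, matching the table
```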

15.1.3 Properties of MLEs

- Objective
- Optimality properties in large samples
- Fisher information (continuous case)
- Equivalent forms of the Fisher information:

(1) I(θ) = E[(∂ ln f(X | θ)/∂θ)²]

(2) I(θ) = −E[∂² ln f(X | θ)/∂θ²]

MLE (Continued)

- The Fisher information for an i.i.d. sample X1, …, Xn is
  In(θ) = n I(θ)

MLE (Continued)

- Generalization of the Fisher information to a k-dimensional vector parameter θ = (θ1, …, θk): the k × k information matrix with entries
  Iij(θ) = E[(∂ ln f(X | θ)/∂θi)(∂ ln f(X | θ)/∂θj)]

MLE (Continued)

- Cramér-Rao Lower Bound
- Let X1, X2, …, Xn be a random sample from p.d.f. f(x | θ).
- Let θ̂ be any estimator of θ with E(θ̂) = θ + B(θ), where B(θ) is the bias of θ̂. If B(θ) is differentiable in θ and if certain regularity conditions hold, then
  Var(θ̂) ≥ [1 + B′(θ)]² / (n I(θ))  (Cramér-Rao inequality)
- The ratio of the lower bound to the variance of any estimator of θ is called the efficiency of the estimator.
- An estimator with efficiency 1 is called an efficient estimator.

15.1.4 Large Sample Inference Based on the MLEs

- Large sample inference on an unknown parameter θ:
- The MLE θ̂ is approximately N(θ, 1/In(θ)) for large n
- An approximate 100(1 − α)% CI for θ is
  θ̂ ± z(α/2) / √(In(θ̂))
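For Bernoulli trials, for instance, the Fisher information is I(p) = 1/(p(1 − p)), so the large-sample CI is the familiar Wald interval. A minimal sketch (the function name and sample numbers are our own):

```python
import math

def wald_ci(successes, n, z=1.96):
    """MLE p_hat with approximate 95% CI p_hat ± z / sqrt(n * I(p_hat)),
    where I(p) = 1/(p(1-p)) is the Bernoulli Fisher information."""
    p_hat = successes / n
    se = 1.0 / math.sqrt(n * (1.0 / (p_hat * (1 - p_hat))))   # = sqrt(p_hat(1-p_hat)/n)
    return p_hat - z * se, p_hat + z * se

lo, hi = wald_ci(40, 100)
# p_hat = 0.40; lo ≈ 0.304, hi ≈ 0.496
```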

15.1.4 Delta Method for Approximating the Variance of an Estimator

- Delta method: estimate a nonlinear function h(θ)
- Suppose that θ̂ is approximately N(θ, Var(θ̂)) and h is a known differentiable function of θ.
- Then h(θ̂) is approximately normal with mean h(θ), using
  Var(h(θ̂)) ≈ [h′(θ)]² Var(θ̂)
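A simulation sketch of the delta method for the log-odds h(p) = ln(p/(1 − p)), where h′(p) = 1/(p(1 − p)) and Var(p̂) = p(1 − p)/n give Var(h(p̂)) ≈ 1/(np(1 − p)); all specific numbers here are our own choices:

```python
import math
import random

random.seed(1)
n, p, reps = 200, 0.3, 5000

# Delta-method approximation: Var(h(p_hat)) ≈ [h'(p)]^2 * Var(p_hat) = 1/(n p (1-p))
approx_var = 1.0 / (n * p * (1 - p))

# Monte Carlo estimate of the actual variance of h(p_hat)
vals = []
for _ in range(reps):
    p_hat = sum(random.random() < p for _ in range(n)) / n
    vals.append(math.log(p_hat / (1 - p_hat)))
mean = sum(vals) / reps
sim_var = sum((v - mean) ** 2 for v in vals) / reps
# sim_var should come out close to approx_var
```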

15.2 Likelihood Ratio Tests

The last section presented an inference for pointwise estimation based on likelihood theory. In this section, we present a corresponding inference for testing hypotheses. Let f(x; θ) be a probability density function, where θ is a real-valued parameter taking values in an interval Ω that could be the whole real line. We call Ω the parameter space. An alternative hypothesis H1 will restrict the parameter to some subset ω1 of the parameter space Ω. The null hypothesis H0 is then the complement of ω1 with respect to Ω.

- Consider the two-sided hypothesis H0: θ = θ0 versus H1: θ ≠ θ0, where θ0 is a specified value.

We will test H0 versus H1 on the basis of the random sample X1, …, Xn from f(x; θ). If the null hypothesis holds, we would expect the likelihood L(θ0) = ∏ f(xi; θ0) to be relatively large when evaluated at the prevailing value θ0. Consider the ratio of two likelihood functions, namely

Λ = L(θ0) / L(θ̂), where θ̂ is the MLE.

Note that Λ ≤ 1, but if H0 is true Λ should be close to 1, while if H1 is true, Λ should be smaller. For a specified significance level α, we have the decision rule: reject H0 in favor of H1 if Λ ≤ c, where c is such that

α = P(Λ ≤ c | H0).

This test is called the likelihood ratio test.

Example 1

Let X1, …, Xn be a random sample of size n from a normal distribution with known variance σ². Obtain the likelihood ratio for testing H0: μ = μ0 versus H1: μ ≠ μ0.

The log-likelihood is
ln L(μ) = −(n/2) ln(2πσ²) − Σ(xi − μ)²/(2σ²).

Setting
d ln L(μ)/dμ = Σ(xi − μ)/σ² = 0
gives μ̂ = x̄. This is a maximum since
d² ln L(μ)/dμ² = −n/σ² < 0.
Thus x̄ is the MLE of μ.

Example 1 (continued)

The likelihood ratio is
Λ = L(μ0)/L(x̄) = exp(−n(x̄ − μ0)²/(2σ²)).

Λ ≤ c is equivalent to n(x̄ − μ0)²/σ² ≥ −2 ln c, thus the test rejects H0 when |√n (x̄ − μ0)/σ| ≥ z(α/2).
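A small numeric check (with made-up data) that −2 ln Λ = n(x̄ − μ0)²/σ², i.e. the likelihood ratio rule here is exactly the two-sided z-test:

```python
import math

def lrt_lambda(xs, mu0, sigma):
    """Λ = L(mu0) / L(xbar) = exp(-n (xbar - mu0)^2 / (2 sigma^2))."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.exp(-n * (xbar - mu0) ** 2 / (2 * sigma ** 2))

xs = [5.2, 4.8, 5.5, 5.1, 4.9, 5.6]   # hypothetical data
mu0, sigma = 5.0, 0.5
lam = lrt_lambda(xs, mu0, sigma)
z = math.sqrt(len(xs)) * (sum(xs) / len(xs) - mu0) / sigma
# "Λ small" is the same rejection rule as "|z| large":
assert abs(-2 * math.log(lam) - z ** 2) < 1e-9
```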

Example 2

Let X1, …, Xn be a random sample from a Poisson distribution with mean θ > 0.
a. Show that the likelihood ratio test of H0: θ = θ0 versus H1: θ ≠ θ0 is based upon the statistic Y = ΣXi.
b. Obtain the null distribution of Y.

The log-likelihood is
ln L(θ) = −nθ + (Σxi) ln θ − Σ ln(xi!).

Setting
d ln L(θ)/dθ = −n + Σxi/θ = 0
gives θ̂ = x̄. This is a maximum since
d² ln L(θ)/dθ² = −Σxi/θ² < 0.
Thus x̄ is the MLE of θ.

Example 2 (continued)

The likelihood ratio test statistic is
Λ = L(θ0)/L(x̄) = e^(−nθ0) θ0^Y / (e^(−Y) (Y/n)^Y),
where Y = Σxi, so Λ is a function of Y. Under H0, Y ~ Poisson(nθ0).

Example 2 (continued)

- For θ0 = 2 and n = 5, find the significance level of the test that rejects H0 if Y ≤ c1 or Y ≥ c2.

The null distribution of Y is Poisson(10), so α = P(Y ≤ c1) + P(Y ≥ c2) computed under Y ~ Poisson(10).

Composite Null Hypothesis

The likelihood ratio approach has to be modified slightly when the null hypothesis is composite. When testing a null hypothesis concerning a normal mean μ when σ² is unknown, the parameter space is Ω = {(μ, σ²): −∞ < μ < ∞, σ² > 0}. For H0: μ = μ0 the null hypothesis is composite, since ω0 = {(μ0, σ²): σ² > 0} contains more than one point of Ω.

Since the null hypothesis is composite, it isn't certain which value of the parameter(s) prevails even under H0. So we take the maximum of the likelihood over ω0. The generalized likelihood ratio test statistic is defined as

λ = [max over ω0 of L(θ)] / [max over Ω of L(θ)].

Example 3

Let X1, …, Xn be a random sample of size n from a normal distribution with unknown mean and variance. Obtain the likelihood ratio test statistic for testing H0: μ = μ0 versus H1: μ ≠ μ0.

In Example 1, we found the unrestricted MLE μ̂ = x̄; maximizing ln L over σ² as well gives

σ̂² = Σ(xi − x̄)²/n.

Since μ = μ0 is fixed under H0, we only need to find the value of σ² maximizing

ln L(μ0, σ²) = −(n/2) ln(2πσ²) − Σ(xi − μ0)²/(2σ²).

Example 3 (continued)

Setting
∂ ln L(μ0, σ²)/∂σ² = −n/(2σ²) + Σ(xi − μ0)²/(2σ⁴) = 0
gives
σ̂0² = Σ(xi − μ0)²/n,
which is a maximum since the second derivative is negative there. Thus σ̂0² is the MLE of σ² under H0. We can also write

Σ(xi − μ0)² = Σ(xi − x̄)² + n(x̄ − μ0)².

Example 3 (continued)

The generalized likelihood ratio is

λ = L(μ0, σ̂0²)/L(x̄, σ̂²) = (σ̂²/σ̂0²)^(n/2) = [1 + n(x̄ − μ0)²/Σ(xi − x̄)²]^(−n/2).

Example 3 (continued)

Rejection region: λ ≤ c, where c is such that α = P(λ ≤ c | H0). Define

t = √n (x̄ − μ0)/s, where s² = Σ(xi − x̄)²/(n − 1),

so that λ = [1 + t²/(n − 1)]^(−n/2). Then λ ≤ c implies t² ≥ (n − 1)(c^(−2/n) − 1), i.e. |t| exceeds a constant. So the test rejects H0 when |t| ≥ t(n−1, α/2), where t(n−1, α/2) is the upper α/2 critical point of the t distribution with n − 1 degrees of freedom.
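A numeric check (with made-up data) that the generalized likelihood ratio λ is an exact monotone function of the t statistic, λ = [1 + t²/(n − 1)]^(−n/2):

```python
import math

def glrt_lambda(xs, mu0):
    """λ = (σ̂²/σ̂0²)^(n/2), with σ̂² = Σ(x−x̄)²/n and σ̂0² = Σ(x−μ0)²/n."""
    n = len(xs)
    xbar = sum(xs) / n
    s2_hat = sum((x - xbar) ** 2 for x in xs) / n
    s2_null = sum((x - mu0) ** 2 for x in xs) / n
    return (s2_hat / s2_null) ** (n / 2)

xs = [5.2, 4.8, 5.5, 5.1, 4.9, 5.6]   # hypothetical data
mu0 = 5.0
n = len(xs)
xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)     # sample variance s²
t = math.sqrt(n) * (xbar - mu0) / math.sqrt(s2)
# Rejecting for small λ is the same rule as rejecting for large |t|:
assert abs(glrt_lambda(xs, mu0) - (1 + t ** 2 / (n - 1)) ** (-n / 2)) < 1e-12
```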

15.3 Bayesian Inference

Bayesian inference refers to a statistical inference where new facts are presented and used to draw updated conclusions from a prior belief. The term "Bayesian" stems from the well-known Bayes' Theorem, which was first derived by Reverend Thomas Bayes.

Thomas Bayes (c. 1702 - April 17, 1761)
Source: www.wikipedia.com

Thomas Bayes (pictured above) was a Presbyterian minister and a mathematician born in London who developed a special case of Bayes' theorem, which was published and studied after his death.

f(A | B) = f(A ∩ B) / f(B) = f(B | A) f(A) / f(B), since f(A ∩ B) = f(B ∩ A) = f(B | A) f(A)

Bayes' Theorem (review)

P(Ai | B) = P(B | Ai) P(Ai) / Σj P(B | Aj) P(Aj)   (15.1)

Some Key Terms in Bayesian Inference, in Plain English

- Prior distribution: the probability distribution of an uncertain quantity, θ, that expresses previous knowledge of θ from, for example, past experience, in the absence of the current evidence
- Posterior distribution: this distribution takes the evidence into account and is the conditional distribution of θ given the data. The posterior is computed from the prior and the likelihood function using Bayes' theorem.
- Posterior mean: the mean of the posterior distribution
- Posterior variance: the variance of the posterior distribution
- Conjugate priors: a family of prior probability distributions whose key property is that the posterior probability distribution also belongs to the same family as the prior

15.3.1 Bayesian Estimation

So far we've learned that the Bayesian approach treats θ as a random variable and that the data are used to update the prior distribution to obtain the posterior distribution of θ. Now let's move on to how we can estimate parameters using this approach.

(Using text notation) Let θ be an unknown parameter to be estimated from a random sample x1, x2, …, xn from a distribution with p.d.f./p.m.f. f(x | θ). Let p(θ) be the prior distribution of θ, and let p(θ | x1, x2, …, xn) be the posterior distribution. Note that p(θ | x1, x2, …, xn) is the conditional distribution of θ given the observed data x1, x2, …, xn. If we apply Bayes' Theorem (Eq. 15.1), our posterior distribution becomes

p(θ | x1, x2, …, xn) = f(x1, x2, …, xn | θ) p(θ) / ∫ f(x1, x2, …, xn | θ) p(θ) dθ   (15.2)

Note that the denominator, ∫ f(x1, x2, …, xn | θ) p(θ) dθ, is the marginal p.d.f. of X1, X2, …, Xn.

Bayesian Estimation (continued)

As seen in equation 15.2, the posterior distribution represents what is known about θ after observing the data x = (x1, x2, …, xn). From earlier chapters, we know that the likelihood of θ given the data is f(x | θ).

So, to get a better idea of the posterior distribution, we note that

posterior distribution ∝ likelihood × prior distribution, i.e. p(θ | x) ∝ f(x | θ) × p(θ).

For a detailed practical example of deriving the posterior mean and using Bayesian estimation, visit http://www.stat.berkeley.edu/users/rice/Stat135/Bayes.pdf

Example 15.26

Let X be the number of successes in n i.i.d. Bernoulli trials with unknown success probability p. Show that the beta distribution is a conjugate prior on p.

Goal: show that if the prior on p is a beta distribution, then so is the posterior.

Example 15.26 (continued)

X has a binomial distribution with parameters n and p:
f(x | p) = C(n, x) p^x (1 − p)^(n−x), x = 0, 1, …, n.

The prior distribution of p is the beta distribution:
p(p) ∝ p^(a−1) (1 − p)^(b−1), 0 < p < 1.

Example 15.26 (continued)

The posterior is proportional to
p^(x+a−1) (1 − p)^(n−x+b−1),
which is a beta distribution with parameters (x + a) and (n − x + b)!

- Notes
- The parameters a and b of the prior distribution may be interpreted as "prior successes" and "prior failures," with m = a + b being the total number of prior observations.
- After actually observing x successes and n − x failures in n i.i.d. Bernoulli trials, these parameters are updated to a + x and b + (n − x), respectively.
- The prior and posterior means are, respectively, a/(a + b) and (a + x)/(a + b + n).
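The conjugate update is just parameter arithmetic; a minimal sketch with a hypothetical Beta(2, 2) prior and made-up data:

```python
def beta_binomial_update(a, b, x, n):
    """Beta(a, b) prior + x successes in n Bernoulli trials -> Beta(a + x, b + n - x) posterior."""
    return a + x, b + (n - x)

a, b = 2, 2        # hypothetical prior: 2 "prior successes", 2 "prior failures"
x, n = 7, 10       # observed data: 7 successes in 10 trials
a_post, b_post = beta_binomial_update(a, b, x, n)   # (9, 5)
prior_mean = a / (a + b)                   # 2/4 = 0.5
post_mean = a_post / (a_post + b_post)     # 9/14 ≈ 0.643, pulled toward x/n = 0.7
```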

15.3.2 Bayesian Testing

Suppose we test H0: θ ∈ Θ0 versus H1: θ ∈ Θ1 based on the posterior distribution. If the posterior odds ratio

P(H1 | x) / P(H0 | x) > k,

we reject H0 in favor of H1, where k > 0 is a suitably chosen critical constant.
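For two simple hypotheses the posterior odds reduce to (likelihood ratio) × (prior odds). A sketch with hypothetical numbers of our own, for a binomial observation:

```python
from math import comb

def posterior_odds(x, n, theta0, theta1, pi0, pi1):
    """P(H1 | x) / P(H0 | x) for simple hypotheses H0: θ = θ0, H1: θ = θ1
    with prior probabilities pi0, pi1."""
    like0 = comb(n, x) * theta0 ** x * (1 - theta0) ** (n - x)
    like1 = comb(n, x) * theta1 ** x * (1 - theta1) ** (n - x)
    return (like1 * pi1) / (like0 * pi0)

# Hypothetical test: H0: θ = 0.5 vs H1: θ = 0.7, equal priors, 8 successes in 10 trials
odds = posterior_odds(8, 10, 0.5, 0.7, 0.5, 0.5)
k = 1.0                   # a chosen critical constant, not from the slides
reject_h0 = odds > k      # True here: the data favor H1
```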

15.4 Decision Theory

- Abraham Wald (1902-1950) was the founder of statistical decision theory.
- His goal was to provide a unified theoretical framework for diverse problems, i.e. point estimation, confidence interval estimation, and hypothesis testing.

Source: http://www-history.mcs.st-andrews.ac.uk/history/PictDisplay/Wald.html

Statistical Decision Problem

- The goal is to choose a decision d from a set of possible decisions D, based on a sample outcome (data) x
- The decision space is D
- The sample space, denoted by X, is the set of all sample outcomes
- A decision rule d is a function d(x) which assigns to every sample outcome x ∈ X a decision d ∈ D

Continued

- Denote by X the random variable corresponding to x, and the probability distribution of X by f(x | θ).
- This distribution depends on an unknown parameter θ belonging to a parameter space T.
- Suppose one chooses a decision d when the true parameter is θ; then a loss L(d, θ) is incurred, where L is known as the loss function.
- A decision rule is assessed by evaluating its expected loss, called the risk function:
  R(d, θ) = E[L(d(X), θ)] = ∫X L(d(x), θ) f(x | θ) dx.

Example

- Calculate and compare the risk functions, for squared error loss, of two estimators of the success probability p from n i.i.d. Bernoulli trials. The first is the usual sample proportion of successes and the second is the Bayes estimator from Example 15.26:
- p̂1 = X/n
- and
- p̂2 = (a + X)/(m + n), where m = a + b
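Under squared error loss both risks have closed forms: R(p̂1, p) = p(1 − p)/n, and R(p̂2, p) = [np(1 − p) + (a − mp)²]/(m + n)² (variance plus squared bias). A sketch with a hypothetical Beta(2, 2) prior:

```python
def risk_sample_proportion(p, n):
    """R(p̂1, p) = Var(X/n) = p(1-p)/n (unbiased, so no bias term)."""
    return p * (1 - p) / n

def risk_bayes(p, n, a, b):
    """R(p̂2, p) for p̂2 = (a + X)/(m + n), m = a + b: variance + bias^2."""
    m = a + b
    var = n * p * (1 - p) / (m + n) ** 2
    bias = (a + n * p) / (m + n) - p       # = (a - m*p)/(m + n)
    return var + bias ** 2

n, a, b = 10, 2, 2
# Near p = 1/2 the Bayes estimator has smaller risk; near the edges
# the sample proportion wins:
assert risk_bayes(0.5, n, a, b) < risk_sample_proportion(0.5, n)
assert risk_bayes(0.1, n, a, b) > risk_sample_proportion(0.1, n)
```

Neither estimator dominates the other for all p, which is exactly why a criterion such as minimax or Bayes risk is needed to choose between decision rules.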

Von Neumann (1928): Minimax

Source: http://jeff560.tripod.com/

How Minimax Works

- Focuses on risk avoidance
- Can be applied to both zero-sum and non-zero-sum games
- Can be applied to multi-stage games
- Can be applied to multi-person games

Classic Example: The Prisoner's Dilemma

- Each player evaluates his/her alternatives, attempting to minimize his/her own risk
- From a common-sense standpoint, a sub-optimal equilibrium results

| | Prisoner B stays silent | Prisoner B betrays
Prisoner A stays silent | Both serve six months | A serves ten years; B goes free
Prisoner A betrays | A goes free; B serves ten years | Both serve two years

Classic Example with Probabilities

Two-player game with simultaneous moves, where the probabilities with which player two acts are known to both players.

Player 1 \ Player 2 | Action A, P(A) = p | Action B, P(B) = q | Action C, P(C) = r | Action D, P(D) = 1 − p − q − r
Action A | -1 | 1 | -2 | 4
Action B | -2 | 7 | 1 | 1
Action C | 0 | -1 | 0 | 3
Action D | 1 | 0 | 2 | 3

- Disregarding the probabilities when playing the game, (D, B) is the equilibrium point under minimax
- With probabilities (p = q = r = 1/4), player one will choose B. This is how Bayes works.

- View (pi, qi, ri) as θi, where i = 1 in the previous example
- Letting i = 1, …, n, we get a much better idea of what Bayes meant by "states of nature" and how the probabilities of each state enter into one's strategy
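The two strategies for the payoff table above can be checked in a few lines; the dictionary below is our own encoding of the slide's matrix:

```python
# Player 1's payoffs: rows are player 1's actions, columns are player 2's actions A-D
payoffs = {
    "A": [-1, 1, -2, 4],
    "B": [-2, 7, 1, 1],
    "C": [0, -1, 0, 3],
    "D": [1, 0, 2, 3],
}

# Minimax (maximin) for player 1: the row whose worst-case payoff is largest
maximin_choice = max(payoffs, key=lambda r: min(payoffs[r]))       # "D"

# Bayes: with known column probabilities, maximize expected payoff
probs = [0.25, 0.25, 0.25, 0.25]   # p = q = r = 1/4
bayes_choice = max(payoffs,
                   key=lambda r: sum(v * pr for v, pr in zip(payoffs[r], probs)))  # "B"
```

This reproduces the slide's conclusion: the risk-avoiding minimax player picks D (worst case 0), while the Bayes player, exploiting the known probabilities, picks B (expected payoff 1.75).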

Conclusion

- We covered three theoretical approaches in our presentation
- Likelihood
- Provides statistical justification for many of the methods used in statistics
- MLE: a method used to make inferences about parameters of the underlying probability distribution of a given data set
- Bayesian and decision theory
- Paradigms used in statistics
- Bayesian theory
- Probabilities are associated with individual events or statements rather than with sequences of events
- Decision theory
- Describes and rationalizes the process of decision making, that is, making a choice among several possible alternatives

Sources: http://www.answers.com/maximum%20likelihood, http://www.answers.com/bayesian%20theory, http://www.answers.com/decision%20theory

The End

Any questions for the group?