2 Conditional probability distribution and likelihood
Let us assume that our random sample points come from a distribution with parameter(s) $\theta$. We do not know $\theta$. If we knew it, we could write the probability distribution of a single observation as $f(x|\theta)$. Here $f(x|\theta)$ is the conditional distribution of the observed random variable given the parameter. If we observe $n$ independent sample points from the same population, then the joint conditional probability distribution of all observations can be written as
$$f(x_1, x_2, \ldots, x_n | \theta) = \prod_{i=1}^{n} f(x_i | \theta)$$
We can write the product of the individual probability distributions because the observations are independent. $f(x|\theta)$ is the probability of the observation for discrete cases and the density of the distribution for continuous cases.
We can interpret $f(x_1, x_2, \ldots, x_n | \theta)$ as the probability of observing the given sample points if we knew the parameter $\theta$. If we varied the parameter, we would get different values for the probability $f$. When $f$ is viewed as a probability distribution, the parameters are fixed and the observations vary. For a given observation we define the likelihood to be equal to the conditional probability distribution.
3 Conditional probability distribution and likelihood Cont.
When we talk about the conditional probability distribution of the observations given the parameter(s), we assume that the parameters are fixed and the observations vary. When we talk about the likelihood, the observations are fixed and the parameters vary. That is the major difference between the likelihood and the conditional probability distribution. Sometimes, to emphasize that the parameters vary and the observations are fixed, the likelihood is written as
$$L(\theta | x_1, x_2, \ldots, x_n) = f(x_1, x_2, \ldots, x_n | \theta)$$
In this and the following lectures we will use one notation for probability and likelihood. When we talk about probability we will assume that the observations vary, and when we talk about likelihood we will assume that the parameters vary.
The principle of maximum likelihood states that the best parameters are those that maximise the probability of observing the current values of the observations. Maximum likelihood chooses the parameters that satisfy
$$\hat{\theta} = \arg\max_{\theta} f(x_1, x_2, \ldots, x_n | \theta)$$
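The difference between "observations vary" and "parameters vary" can be illustrated with a minimal sketch. It assumes, purely for illustration, that the observations are Poisson counts with mean `theta`; the data values, the grid, and the function names are hypothetical and not part of the lecture. The observations are held fixed and the joint probability is evaluated for many candidate parameter values.

```python
import math

def poisson_pmf(x, theta):
    """Probability of a single count x under a Poisson distribution with mean theta."""
    return theta ** x * math.exp(-theta) / math.factorial(x)

def likelihood(theta, sample):
    """Joint probability of independent observations, viewed as a function of theta."""
    value = 1.0
    for x in sample:
        value *= poisson_pmf(x, theta)
    return value

# Hypothetical fixed observations; only theta varies below.
sample = [2, 4, 3, 1, 5]

# Evaluate the likelihood on a grid of candidate parameter values.
grid = [0.5 + 0.01 * i for i in range(600)]
theta_hat = max(grid, key=lambda t: likelihood(t, sample))

print(theta_hat)                    # grid value with the largest likelihood
print(sum(sample) / len(sample))    # sample mean, for comparison
```

A grid search is only practical for one or two parameters; it is used here because it makes the "parameters vary, observations fixed" view explicit.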
4 Maximum likelihood
The purpose of maximum likelihood is to maximise the likelihood function and so estimate the parameters. If the derivatives of the likelihood function exist, this can be done by solving
$$\frac{\partial f(x_1, x_2, \ldots, x_n | \theta)}{\partial \theta} = 0$$
Solutions of this equation are the candidate values for the maximum likelihood estimator. If the solution is unique, it is the only candidate. In real applications there might be many solutions.
Usually the logarithm of the likelihood is maximised instead of the likelihood itself. Since log is a monotonically increasing function, the derivative of the likelihood and the derivative of the log of the likelihood have exactly the same roots wherever the likelihood is positive. If we use the fact that the observations are independent, then the joint probability distribution of all observations is equal to the product of the individual probabilities. We can write the log of the likelihood (denoted as $l$) as
$$l(\theta) = \ln f(x_1, x_2, \ldots, x_n | \theta) = \sum_{i=1}^{n} \ln f(x_i | \theta)$$
Usually working with sums is easier than working with products.
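One practical reason for preferring sums shows up in floating-point arithmetic: a product of many small probabilities or densities underflows to zero, while the sum of their logarithms stays representable. The sketch below assumes a standard normal density and a hypothetical, deterministic sample of 2000 points; neither comes from the lecture.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical sample: 2000 evenly spaced points in [-2, 2].
sample = [-2.0 + 4.0 * i / 1999 for i in range(2000)]

product = 1.0
log_sum = 0.0
for x in sample:
    p = normal_pdf(x, 0.0, 1.0)
    product *= p             # every factor is below 0.4, so this shrinks quickly
    log_sum += math.log(p)   # stays a finite negative number

print(product)   # the likelihood as a raw product
print(log_sum)   # the log-likelihood as a sum
```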
5 Maximum likelihood Example success and failure
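Let us consider two examples. The first example corresponds to a discrete probability distribution. Let us assume that we carry out trials. The possible outcomes of a trial are success or failure. The probability of success is $\theta$ and the probability of failure is $1-\theta$. We do not know the value of $\theta$. Let us assume we have $n$ trials, of which $k$ are successes and $n-k$ are failures. The value of the random variable describing a trial is either 0 (failure) or 1 (success). Let us denote the observations as $y = (y_1, y_2, \ldots, y_n)$. The probability of the observation $y_i$ at the $i$th trial is
$$f(y_i | \theta) = \theta^{y_i} (1-\theta)^{1-y_i}$$
Since the individual trials are independent, we can write for $n$ trials
$$f(y_1, y_2, \ldots, y_n | \theta) = \prod_{i=1}^{n} \theta^{y_i} (1-\theta)^{1-y_i} = \theta^{k} (1-\theta)^{n-k}$$
For the log of this function we can write
$$l(\theta) = k \ln\theta + (n-k) \ln(1-\theta)$$
The derivative of the log-likelihood with respect to the unknown parameter, set to zero, is
$$\frac{dl}{d\theta} = \frac{k}{\theta} - \frac{n-k}{1-\theta} = 0 \quad\Rightarrow\quad \hat{\theta} = \frac{k}{n}$$
The estimate for the parameter is equal to the fraction of successes.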
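The result can be checked numerically. The sketch below writes `theta` for the probability of success, uses a hypothetical sequence of ten trials, and confirms that the derivative of the log-likelihood vanishes at the fraction of successes and that nearby parameter values give a smaller log-likelihood. The function names are illustrative.

```python
import math

def log_likelihood(theta, k, n):
    """Log-likelihood of k successes in n independent success/failure trials."""
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

def score(theta, k, n):
    """First derivative of the log-likelihood with respect to theta."""
    return k / theta - (n - k) / (1.0 - theta)

# Hypothetical observations: 1 = success, 0 = failure.
observations = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]
n = len(observations)
k = sum(observations)

theta_hat = k / n   # fraction of successes

print(theta_hat)
print(score(theta_hat, k, n))                   # close to zero at the maximum
print(log_likelihood(theta_hat, k, n))
print(log_likelihood(theta_hat - 0.05, k, n))   # smaller than at theta_hat
print(log_likelihood(theta_hat + 0.05, k, n))   # smaller than at theta_hat
```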
6 Maximum likelihood Example success and failure
In the example of successes and failures the result was not unexpected, and we could have guessed it intuitively. More interesting problems arise when the parameter itself becomes a function of some other parameters, and possibly of observations as well. Let us say
$$\theta_i = g(x_i, \beta)$$
It may happen that the $x_i$ themselves are also random variables. If the function $g$ corresponds to the normal distribution, i.e. $g(x_i, \beta) = \Phi(x_i^T \beta)$ with $\Phi$ the standard normal cumulative distribution function, then the analysis is called probit analysis. The log-likelihood function then looks like
$$l(\beta) = \sum_{i=1}^{n} \left[ y_i \ln g(x_i, \beta) + (1-y_i) \ln\left(1 - g(x_i, \beta)\right) \right]$$
Finding the maximum of this function is more complicated. The problem can be treated as a non-linear optimisation problem. Problems of this kind are usually solved iteratively, i.e. a solution is guessed and then improved step by step.
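As an illustration of such an iterative solution, here is a minimal sketch of a probit fit with an intercept `b0` and one slope `b1`, so that the success probability for covariate value `x` is `Phi(b0 + b1 * x)`. It uses Fisher scoring, a Newton-type iteration that replaces the second derivatives by their expected values; this is one common choice and not necessarily the method intended in the lecture. The data, starting values, and function names are hypothetical.

```python
import math

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def log_likelihood(b0, b1, xs, ys):
    """Probit log-likelihood for binary responses ys and covariates xs."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = Phi(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def gradient_and_information(b0, b1, xs, ys):
    """Gradient of the log-likelihood and the expected information (2 x 2, symmetric)."""
    g0 = g1 = 0.0
    i00 = i01 = i11 = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        p, d = Phi(eta), phi(eta)
        r = d * (y - p) / (p * (1.0 - p))   # contribution to the gradient
        w = d * d / (p * (1.0 - p))         # contribution to the information
        g0 += r
        g1 += r * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (g0, g1), (i00, i01, i11)

def fit_probit(xs, ys, max_iter=100, tol=1e-10):
    """Start from a guess and improve it until the update becomes negligible."""
    b0 = b1 = 0.0   # initial guess
    for _ in range(max_iter):
        (g0, g1), (i00, i01, i11) = gradient_and_information(b0, b1, xs, ys)
        det = i00 * i11 - i01 * i01
        step0 = (i11 * g0 - i01 * g1) / det   # solve information * step = gradient
        step1 = (i00 * g1 - i01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1

# Hypothetical data: successes become more frequent as x grows.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

b0_hat, b1_hat = fit_probit(xs, ys)
print(b0_hat, b1_hat)
print(log_likelihood(b0_hat, b1_hat, xs, ys))
```

The example data are deliberately not perfectly separated by `x`; with perfectly separated data the likelihood has no finite maximum and an iteration like this one would not settle.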
7 Maximum likelihood Example normal distribution
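Now let us assume that the sample points come from a population with a normal distribution with unknown mean $\mu$ and variance $\sigma^2$. Let us assume that we have $n$ observations $y = (y_1, y_2, \ldots, y_n)$. We want to estimate the population mean and variance. The log-likelihood function then has the form
$$l(\mu, \sigma^2) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2$$
If we take the derivatives of this function with respect to the mean and the variance and set them to zero, we can write
$$\frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \mu) = 0, \qquad \frac{\partial l}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (y_i - \mu)^2 = 0$$
Fortunately the first of these equations can be solved without knowledge of the second one. If we then use the result of the first equation in the second (substituting $\mu$ by its estimate), we can solve the second equation as well. The results are the sample mean and the sample variance with divisor $n$:
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} y_i, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\mu})^2$$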
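A short numerical check of these estimators, using hypothetical data. It computes the mean first and then the variance with divisor n, and compares the latter with the two variance functions in Python's `statistics` module: `pvariance` divides by n, `variance` divides by n - 1. The maximum likelihood estimate of the variance coincides with the former.

```python
import math
import statistics

def normal_log_likelihood(mu, var, ys):
    """Log-likelihood of independent normal observations with mean mu and variance var."""
    n = len(ys)
    ss = sum((y - mu) ** 2 for y in ys)
    return -0.5 * n * math.log(2.0 * math.pi * var) - ss / (2.0 * var)

# Hypothetical observations.
ys = [4.1, 5.3, 3.8, 6.0, 5.2, 4.6]
n = len(ys)

mu_hat = sum(ys) / n                                # solves the first equation
var_hat = sum((y - mu_hat) ** 2 for y in ys) / n    # solves the second, given mu_hat

print(mu_hat, var_hat)
print(statistics.pvariance(ys))   # divisor n
print(statistics.variance(ys))    # divisor n - 1
print(normal_log_likelihood(mu_hat, var_hat, ys))
```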
8 Maximum likelihood Example normal distribution
The maximum likelihood estimator in this case gave the sample mean and the sample variance. Many statistical techniques are based on maximum likelihood estimation of the parameters when the observations are normally distributed. All parameters of interest are usually inside the mean value. In other words, the mean $\mu$ is a function of several parameters:
$$\mu = g(x_1, \ldots, x_p; \beta_1, \ldots, \beta_p)$$
The problem is then to estimate the parameters using the maximum likelihood estimator. The $x$-s are either fixed values or random variables. The parameters are the $\beta$-s. If this function is linear in the parameters, then we have linear regression.
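The connection with linear regression can be made explicit with a short derivation. The notation is an assumption made here for illustration: observation $i$ has mean $\mu_i = \sum_j \beta_j x_{ij}$, $X$ is the matrix with entries $x_{ij}$, and the variance $\sigma^2$ is common to all observations.

```latex
% Normal log-likelihood with a linear model for the mean
l(\beta, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2)
  - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\Bigl(y_i - \sum_{j}\beta_j x_{ij}\Bigr)^2

% Only the second term depends on beta, so maximising l over beta
% is the same as minimising the sum of squared residuals
\hat{\beta} = \arg\min_{\beta}\sum_{i=1}^{n}\Bigl(y_i - \sum_{j}\beta_j x_{ij}\Bigr)^2

% Setting the derivative with respect to beta to zero gives the normal equations
X^{T}X\,\hat{\beta} = X^{T}y
```

Under these assumptions the maximum likelihood estimate of the regression coefficients coincides with the least-squares estimate.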
9 Information matrix: observed and Fisher's
One of the important aspects of the likelihood function is its behaviour near the maximum. If the likelihood function is flat, then the observations have little to say about the parameters, because changes in the parameters do not cause large changes in the probability. That is to say, the same observation can be observed with similar probabilities for various values of the parameters. On the other hand, if the likelihood has a pronounced peak near the maximum, then small changes in the parameters cause large changes in the probability. In this case we say that the observation carries more information about the parameters. This is usually expressed through the second derivative (curvature) of the log-likelihood function. The observed information is equal to the second derivative of the minus log-likelihood function:
$$I_o(\theta) = -\frac{\partial^2 l(\theta)}{\partial\theta \, \partial\theta^T}$$
Usually it is calculated at the maximum of the likelihood. This information is different from that defined using entropy.
Example: in the case of successes and failures, with $\theta$ the probability of success, $k$ successes and $n-k$ failures, we can write
$$I_o(\theta) = \frac{k}{\theta^2} + \frac{n-k}{(1-\theta)^2}$$
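The link between curvature and information can be checked numerically for the success and failure example. The sketch below writes `theta` for the probability of success, compares the analytic second derivative of the minus log-likelihood with a finite-difference approximation, and shows that the same fraction of successes in a larger sample gives a more sharply peaked likelihood. The counts and function names are hypothetical.

```python
import math

def log_likelihood(theta, k, n):
    """Log-likelihood of k successes in n independent trials."""
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

def observed_information(theta, k, n):
    """Analytic second derivative of the minus log-likelihood."""
    return k / theta ** 2 + (n - k) / (1.0 - theta) ** 2

def numerical_information(theta, k, n, h=1e-4):
    """Minus a central finite-difference approximation to the second derivative."""
    upper = log_likelihood(theta + h, k, n)
    middle = log_likelihood(theta, k, n)
    lower = log_likelihood(theta - h, k, n)
    return -(upper - 2.0 * middle + lower) / h ** 2

# Same fraction of successes, two sample sizes.
small = observed_information(0.6, 6, 10)
large = observed_information(0.6, 60, 100)

print(small, numerical_information(0.6, 6, 10))
print(large, numerical_information(0.6, 60, 100))
```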
10 Information matrix: observed and Fisher's
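The expected value of the observed information matrix is called the expected information or Fisher's information matrix. The expectation is taken over the observations:
$$I(\theta) = E\left[ -\frac{\partial^2 l(\theta)}{\partial\theta \, \partial\theta^T} \right] = -\int \frac{\partial^2 l(\theta)}{\partial\theta \, \partial\theta^T} \, f(x_1, \ldots, x_n | \theta) \, dx_1 \cdots dx_n$$
It can be calculated at any value of the parameter. A remarkable fact about Fisher's information matrix is that, under the usual regularity conditions, it is also equal to the expected value of the product of the gradients (first derivatives):
$$I(\theta) = E\left[ \frac{\partial l(\theta)}{\partial\theta} \, \frac{\partial l(\theta)}{\partial\theta^T} \right]$$
Note that the observed information matrix depends on the particular observation, whereas the expected information matrix depends only on the probability distribution of the observations. (It is the result of an integration: when we integrate over variables we lose the dependence on those variables.)
When the sample size becomes large, the maximum likelihood estimate becomes approximately normally distributed with variance close to
$$\mathrm{var}(\hat{\theta}) \approx I^{-1}(\theta)$$
Fisher points out that the inverse of the observed information matrix gives a slightly better estimate of the variance than the inverse of the expected information matrix.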
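The equality of the two forms of the expected information can be verified exactly in the success and failure example, because the expectation over observations reduces to a finite sum over the number of successes. The sketch below writes `theta` for the probability of success and assumes hypothetical values for the number of trials and for `theta`; the function names are illustrative.

```python
import math

def binomial_pmf(k, n, theta):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * theta ** k * (1.0 - theta) ** (n - k)

def score(theta, k, n):
    """First derivative of the log-likelihood."""
    return k / theta - (n - k) / (1.0 - theta)

def observed_information(theta, k, n):
    """Second derivative of the minus log-likelihood."""
    return k / theta ** 2 + (n - k) / (1.0 - theta) ** 2

n, theta = 20, 0.3   # hypothetical number of trials and parameter value

# Expectation of the observed information over all possible outcomes.
expected_observed = sum(
    binomial_pmf(k, n, theta) * observed_information(theta, k, n)
    for k in range(n + 1)
)

# Expectation of the squared first derivative over all possible outcomes.
expected_score_squared = sum(
    binomial_pmf(k, n, theta) * score(theta, k, n) ** 2
    for k in range(n + 1)
)

print(expected_observed)
print(expected_score_squared)
```

Both sums depend on the chosen `theta` and `n` but not on any particular observed `k`, which is the point made above about the expected information.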
11 Information matrix: observed and Fisher's
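A more precise relation between expected information and variance is given by the Cramér-Rao inequality. According to this inequality, the variance of an unbiased estimator (and, for large samples, of the maximum likelihood estimator) can never be less than the inverse of the information:
$$\mathrm{var}(\hat{\theta}) \geq I^{-1}(\theta)$$
Now let us consider the example of successes and failures, with $\theta$ the probability of success. If we take the expectation of the second derivative of the minus log-likelihood function, using $E(k) = n\theta$, we get
$$I(\theta) = \frac{n\theta}{\theta^2} + \frac{n - n\theta}{(1-\theta)^2} = \frac{n}{\theta(1-\theta)}$$
If we take this at the point of maximum likelihood, then we can say that the variance of the maximum likelihood estimator can be approximated by
$$\mathrm{var}(\hat{\theta}) \approx \frac{\hat{\theta}(1-\hat{\theta})}{n}$$
This statement is true for large sample sizes.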
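A small simulation illustrates this approximation. The sketch below writes `theta` for the probability of success, repeatedly draws samples of success and failure trials with a known `theta`, computes the fraction of successes for each sample, and compares the spread of these estimates with the inverse of the expected information. The parameter value, sample size, number of replications, and seed are hypothetical choices.

```python
import random

def simulate_estimates(theta, n, replications, seed=0):
    """Maximum likelihood estimates (fractions of successes) from repeated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        k = sum(1 for _trial in range(n) if rng.random() < theta)
        estimates.append(k / n)
    return estimates

theta, n = 0.3, 50
estimates = simulate_estimates(theta, n, replications=5000)

mean_estimate = sum(estimates) / len(estimates)
empirical_variance = sum((e - mean_estimate) ** 2 for e in estimates) / len(estimates)
approximate_variance = theta * (1.0 - theta) / n   # inverse of the expected information

print(mean_estimate)
print(empirical_variance)
print(approximate_variance)
```

In practice the true `theta` is unknown, so the approximation is evaluated at the estimate, as described above.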
12 Exercise 1
a) Assume that we have n sample points drawn independently from a population with an exponential distribution
What is the maximum likelihood estimator for its parameter?
b) Now consider the case when the population has the distribution
What is the maximum likelihood estimator for its parameter? What is the observed information for the parameter? What is the expected information for the parameter?