45-733 lecture 8 (chapter 7)

- Point Estimation

Samples from populations

- There is some population we are interested in
- Families in the US
- Products coming off our assembly line
- Consumers in our product's market segment
- Employees

Samples from populations

- We are interested in some quantitative information (called variables) about these populations
- Income of families in the US
- Defects in products coming off our assembly line
- Perception of consumers of our product
- Productivity of our employees

Samples from populations

- All the information (accessible to statistics) about a quantity in a population is contained in its distribution function
- Real-world distribution functions are complicated things
- In real life, we usually know little or nothing about the distribution functions of the variables we are interested in

Samples from populations

- Because distribution functions are complex, we only try to find out about certain aspects of them (parameters)
- Average income of families in the US
- Rate of defects coming off our production line
- Percentage of customers who view our product favorably
- Average pieces/hour finished by a worker

Samples from populations

- Of course, we do not begin by knowing even these quantities
- One possibility is to measure the whole population
- This allows us to answer any question about the distribution or parameters, using the techniques of chapter 2
- However, this is almost always expensive and often infeasible

Samples from populations

- Instead, we take a sample
- Taking a sample
- We select only a few of the members of the population
- We measure the variables of interest for those members we select
- Examples
- Phone survey
- Take 1 out of each 10,000 units off our production line
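As a rough illustration of the two sampling examples, the sketch below draws a sample from a made-up production run in two ways: every 10,000th unit (as in the assembly-line example) and a simple random sample of the same size (closer in spirit to a phone survey). The run length of 200,000 units and all variable names are assumptions for illustration only.

```python
import random

rng = random.Random(4)  # fixed seed so the sketch is reproducible

# Hypothetical production run: unit IDs 0 .. 199,999 (made up for illustration).
units = range(200_000)

# Systematic sample: take 1 out of each 10,000 units.
systematic = list(units[::10_000])

# Simple random sample of the same size, drawn without replacement.
simple_random = rng.sample(units, len(systematic))

print(len(systematic), "units selected systematically")
print(len(simple_random), "units selected at random")
```

Either way, only the selected units are measured; everything said about the population afterward is inferred from them.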

Samples from populations

- The whole of statistics is figuring out what we can learn about the population from a sample
- What can we say about the distribution of a variable from the information in a sample?
- What can we say about the parameters we are interested in from our sample?
- How good is the information in our sample about the population?

Samples and statistics

- As a practical matter, we are usually interested in using our sample to say something about a parameter of the distribution we care about
- To get at this parameter, we construct a variable called an estimator or statistic

Sample and estimator

- An estimate is an informed guess at the value of a parameter
- An estimator is an algorithm or rule for turning samples into informed guesses about the value of a parameter
- An estimator is an algorithm for turning samples into estimates

Sample and estimator

- Example
- We are benchmarking our compensation policies for our salesforce
- Therefore, we are interested in how much salespeople who work in similar jobs for similar companies are paid
- Naturally, they are not all paid the same
- There is a distribution of salaries among these salespeople

Sample and estimator

- Example
- We don't need or want to know exactly how much each and every one of these comparable people is paid
- We don't need or want to know the exact distribution of pay for this job

Sample and estimator

- Example
- We do need and want to know some basic facts about pay in this job. For example:
- What is the mean salary?
- What is the median salary?
- What is the standard deviation of salary?
- What is the 25th percentile of salary?
- What is the 75th percentile of salary?
- How is salary related to
- Experience?
- Typical hours? Travel requirements?
- Job responsibilities? Etc.

Sample and estimator

- Example
- Each of these things can be regarded as a parameter, either of the distribution of salaries or of the joint distribution of salary and other variables
- Let's focus on mean salary
- We take a sample of salaries $s_1, s_2, \ldots, s_n$
- How can we get an estimate of $E(s) = \mu_s$?

Sample and estimator

- Example
- Let's focus on mean salary, $E(s) = \mu_s$
- There is a TRUE value of $\mu_s$
- This value is fixed (non-random)
- It is just a number, like 47,432.81
- We wish to know it
- Knowing it exactly would be nice
- If we can't know it exactly, a good guess would be useful

Sample and estimator

- Example
- Let's focus on mean salary
- We take a sample of salaries $s_1, s_2, \ldots, s_n$
- S-bar is an estimator
- S-bar tells us what to do with a sample to turn it into a guess at the (population) mean salary

Sample and estimator

- Example
- Let's focus on mean salary
- We take a sample of salaries $s_1, s_2, \ldots, s_n$
- S-bar is an estimator
- S-bar is a random variable with a distribution function of its own
- The distribution of s-bar depends on the distribution of the underlying s
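One way to see that s-bar has a distribution of its own is to draw many samples from the same population and apply the same rule to each. The sketch below does this for a made-up population of salaries; the lognormal shape, the population size, the sample size of 25, and all names are assumptions chosen only for illustration.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 salaries (in thousands); values are made up.
population = [rng.lognormvariate(4.0, 0.3) for _ in range(10_000)]
mu = fmean(population)  # the true parameter: fixed, not random


def sample_mean(sample):
    """The estimator s-bar: the same rule is applied to every sample."""
    return fmean(sample)


# Draw many samples of size n and record the estimate each one produces.
n = 25
estimates = [sample_mean(rng.sample(population, n)) for _ in range(2_000)]

print(f"true mean:            {mu:.2f}")
print(f"average of estimates: {fmean(estimates):.2f}")
print(f"spread of estimates:  {pstdev(estimates):.2f}")
print(f"spread of salaries:   {pstdev(population):.2f}")
```

The estimates differ from sample to sample, which is what it means for s-bar to be a random variable, and their spread is tied to the spread of the underlying salaries.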

Sample and estimator

- Example
- Let's focus on mean salary
- Suppose our sample is (in thousands) 55, 62, 43, 77, 89, 61
- Then our estimate would be $\bar{s} = (55 + 62 + 43 + 77 + 89 + 61)/6 = 64.5$

Sample and estimator

- Example
- Let's focus on mean salary
- Suppose our sample is (in thousands) 45, 52, 33, 67, 79, 51
- Then our estimate would be $\bar{s} = (45 + 52 + 33 + 67 + 79 + 51)/6 = 54.5$

Sample and estimator

- Example
- Let's focus on mean salary
- In both cases, the estimator is $\bar{s} = \frac{1}{n}\sum_{i=1}^{n} s_i$
- But in one case, the estimate is 64.5 and in the other example, the estimate is 54.5
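The same point in code, as a minimal sketch: the estimator is a function, and an estimate is what the function returns for one particular sample. The function name `sample_mean` is an illustrative choice; the two samples are the ones used in the example.

```python
from statistics import fmean


def sample_mean(sample):
    """Estimator: a rule that turns any sample into a guess at the population mean."""
    return fmean(sample)


# The two samples of salaries (in thousands) from the example.
sample_a = [55, 62, 43, 77, 89, 61]
sample_b = [45, 52, 33, 67, 79, 51]

# One estimator, two samples, two different estimates.
estimate_a = sample_mean(sample_a)
estimate_b = sample_mean(sample_b)
print(estimate_a, estimate_b)  # 64.5 54.5
```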

Sample and estimator

- A key distinction: estimator vs. estimate
- An estimate is a guess, based on a sample, at the value of a parameter
- It is a number, not random
- It is different for each sample, and depends on the sample
- An estimator is an algorithm, a rule, a formula for turning a sample into an estimate
- It is a random variable
- Its distribution depends only on the distribution of the underlying variable (and on the sample size)
- It is exactly the same from sample to sample

Sample and estimator

- Review
- We wish to know about (some quantity) in a population
- The distribution of the quantity: complete knowledge
- A parameter of the distribution: a summary of the info in the distribution
- An estimate is a guess at a parameter based on the information in a sample
- An estimator is a way of turning samples into guesses

All estimators are created equal?

- NOT!
- What makes for a good estimator?
- What makes for a good guess?
- Being exactly right all the time (can't be done)
- Being close to right, making few/small mistakes
- Being right on average
- Improving as the sample size grows

All estimators are created equal?

- There is a parameter we want to know, let's call it $\theta$. It has a true value that we don't know.
- We have an estimator, call it $\hat{\theta}_1$, which has some distribution
- We have another estimator, call it $\hat{\theta}_2$, which has some (other) distribution
- How can we know which of these two is better than the other?

All estimators are created equal?

- Some examples of estimators for $E(s) = \mu_s$
- The sample mean, $\bar{s} = \frac{1}{n}\sum_{i=1}^{n} s_i$

All estimators are created equal?

- Some examples of estimators for $E(s) = \mu_s$
- The sample mean plus one, $\bar{s} + 1$

All estimators are created equal?

- Some examples of estimators for $E(s) = \mu_s$
- The first observation, $s_1$

All estimators are created equal?

- Some examples of estimators for $E(s) = \mu_s$
- Roll a die and use the number of spots

All estimators are created equal?

- Some examples of estimators for $E(s) = \mu_s$
- Seven

All estimators are created equal?

- Some examples of estimators for $E(s) = \mu_s$
- It should be clear that the sample mean is the best of these estimators
- We want to develop objective criteria for evaluating estimators which allow us to conclude, for example, that the sample mean is the best of these estimators
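Before any formal criteria, a quick simulation can show how often each of the five example estimators lands near the truth. The sketch below assumes salaries are normally distributed with a true mean of 60 and a standard deviation of 15 (in thousands), uses samples of 20, and counts an estimate as "close" if it is within 5 of the true mean. All of these numbers are made up for illustration.

```python
import random
from statistics import fmean

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical truth and sampling setup (assumptions, not from the lecture).
MU, SIGMA, N = 60.0, 15.0, 20
TOLERANCE = 5.0   # "close" means within 5 (thousand) of the true mean
REPS = 5_000      # number of simulated samples

# The five example estimators, each a rule applied to a sample s.
estimators = {
    "sample mean": lambda s: fmean(s),
    "sample mean plus one": lambda s: fmean(s) + 1,
    "first observation": lambda s: s[0],
    "roll a die": lambda s: rng.randint(1, 6),  # ignores the sample entirely
    "seven": lambda s: 7,                        # ignores the sample entirely
}

hits = {name: 0 for name in estimators}
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    for name, rule in estimators.items():
        if abs(rule(sample) - MU) <= TOLERANCE:
            hits[name] += 1

share_close = {name: count / REPS for name, count in hits.items()}
for name, share in share_close.items():
    print(f"{name:22s} close to the truth in {share:.1%} of samples")
```

Under these assumptions the die and "seven" are never close, because the true mean is nowhere near their values; they would only look good if the truth happened to be small.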

All estimators are created equal?

- Consider the distribution of the sample mean

All estimators are created equal?

- Compared to the distribution of $\hat{\mu}_{s,2}$ (the sample mean plus one)

All estimators are created equal?

- Why do we like the distribution of the sample mean better?
- It is centered on the true value, $\mu_s$
- The estimator (the random variable) is more often close to the truth, $\mu_s$

All estimators are created equal?

- Consider the distribution of the sample mean

All estimators are created equal?

- Compare to the distribution of the first observation

All estimators are created equal?

- Why do we like the distribution of the sample mean better?
- Now, both are centered on the true value, $\mu_s$
- The sample mean is more often close to the truth, $\mu_s$
- Now, because it has smaller variance
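The variance comparison can be written out in one line, assuming the observations are independent draws with a common variance, written here as $\sigma_s^2$ (a symbol introduced for this sketch, not taken from the slides):

```latex
\begin{align*}
\operatorname{Var}(\bar{s})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} s_i\right)
   = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(s_i)
   = \frac{\sigma_s^2}{n}, \\
\operatorname{Var}(s_1) &= \sigma_s^2 .
\end{align*}
```

So for any sample size $n > 1$, the sample mean has a smaller variance than the first observation, and the gap widens as $n$ grows.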

All estimators are created equal?

- Consider the distribution of the sample mean

All estimators are created equal?

- Compare to the distribution of "seven"

All estimators are created equal?

- Why do we like the distribution of the sample mean better?
- Sample mean is centered on the true value, $\mu_s$, no matter what the true value is
- The estimator "seven" is only centered on the true value if the true value happens to be $\mu_s = 7$
- Similarly, the sample mean is close to the true value more often unless the true value is very close to seven

All estimators are created equal?

- Recall
- In general, we are trying to estimate a parameter whose value we do not know, $\theta$
- We have a proposed estimator, $\hat{\theta}_1$
- We have another proposed estimator, $\hat{\theta}_2$
- We want to know which is better
- So, we need some criteria to use to compare estimators

All estimators are created equal?

- The simplest criterion
- An estimator is good if it is always right
- But a parameter is just a fixed number, like 62
- An estimator is a random variable, so it can take on many values
- So, practically no estimator will be good by this criterion
- We must lower our standards!

All estimators are created equal?

- Bias and unbiasedness
- Since estimators are random variables, we can think about their expectations
- We are going to say that an estimator is unbiased if $E(\hat{\theta}) = \theta$

All estimators are created equal?

- Bias and unbiasedness
- An estimator is unbiased if it is (always) right on average
- An unbiased estimator is not systematically wrong

All estimators are created equal?

- Bias and unbiasedness
- The bias of an estimator is defined as $\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$
- Obviously, an unbiased estimator has a bias equal to zero

All estimators are created equal?

- Bias and unbiasedness
- The sample mean is unbiased
- The sample mean plus one is biased
- The sample mean plus one has a bias of 1
- This is why we like the sample mean better than the sample mean plus one
- Sample mean is better than sample mean plus one on the bias criterion
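These two claims follow from linearity of expectation. A short derivation, assuming every observation $s_i$ is drawn from the same distribution with mean $\mu_s$:

```latex
\begin{align*}
E(\bar{s})
  &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} s_i\right)
   = \frac{1}{n}\sum_{i=1}^{n} E(s_i)
   = \frac{1}{n}\, n\, \mu_s
   = \mu_s, \\
E(\bar{s} + 1) &= E(\bar{s}) + 1 = \mu_s + 1, \\
\mathrm{Bias}(\bar{s} + 1) &= E(\bar{s} + 1) - \mu_s = 1 .
\end{align*}
```

The first line says the sample mean has zero bias whatever the value of $\mu_s$; the last says the sample mean plus one is off by exactly 1 on average.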

All estimators are created equal?

- Some unbiased estimators
- The sample mean for the population mean
- The sample variance (with the $n - 1$ divisor) for the population variance
- The sample proportion for the population proportion

All estimators are created equal?

- Some biased estimators
- The sample standard deviation for the population standard deviation
- The sample median for the population median
- Sample percentiles for population percentiles
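It can seem odd that the sample variance is unbiased while its square root, the sample standard deviation, is not. A small simulation makes the difference visible. The sketch assumes normally distributed data with a true standard deviation of 10 and a deliberately small sample size of 5, where the bias is easiest to see; these values and all names are illustrative assumptions.

```python
import random
from statistics import fmean, stdev, variance

rng = random.Random(2)  # fixed seed so the sketch is reproducible

# Hypothetical truth: normal data with mean 50 and standard deviation 10.
MEAN, SIGMA = 50.0, 10.0
N = 5           # small samples make the bias of the sample sd visible
REPS = 20_000   # number of simulated samples

var_estimates = []
sd_estimates = []
for _ in range(REPS):
    sample = [rng.gauss(MEAN, SIGMA) for _ in range(N)]
    var_estimates.append(variance(sample))  # sample variance, divides by n - 1
    sd_estimates.append(stdev(sample))      # square root of the sample variance

avg_var = fmean(var_estimates)
avg_sd = fmean(sd_estimates)
print(f"average sample variance: {avg_var:7.2f}  (true variance {SIGMA**2:.2f})")
print(f"average sample sd:       {avg_sd:7.2f}  (true sd {SIGMA:.2f})")
```

The average sample variance settles near the true variance, while the average sample standard deviation settles below the true standard deviation; the square root is a nonlinear function, so being right on average does not carry over.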

All estimators are created equal?

- Variance (efficiency)
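- Suppose we are comparing two unbiased estimators, $\hat{\theta}_1$ and $\hat{\theta}_2$
- We say that $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if $\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2)$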

All estimators are created equal?

- Variance (efficiency)

All estimators are created equal?

- Variance (efficiency)
- We like the sample mean better than the first observation because its variance is lower

All estimators are created equal?

- Variance (efficiency)
- When we are talking about a group of unbiased estimators, the best estimator is the one with the least variance

All estimators are created equal?

- Mean squared error
- Consider these two estimators

All estimators are created equal?

- Mean squared error
- We might like $\hat{\theta}_1$ more than $\hat{\theta}_2$ even though $\hat{\theta}_1$ is biased and $\hat{\theta}_2$ is not
- We might like $\hat{\theta}_1$ better because it is near the true value of the parameter more often, even though it is biased

All estimators are created equal?

- Mean squared error
- To formalize this, we develop the mean squared error: $\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$

All estimators are created equal?

- Mean squared error
- The mean squared error is just the average squared mistake that the estimator makes
- So, even though $\hat{\theta}_1$ is biased and $\hat{\theta}_2$ is not, we might like $\hat{\theta}_1$ better since $\mathrm{MSE}(\hat{\theta}_1) < \mathrm{MSE}(\hat{\theta}_2)$
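The two estimators pictured on the original slide are not recoverable here, so as a stand-in the sketch below reuses two estimators from earlier in the lecture: the sample mean plus one (biased) and the first observation (unbiased). It assumes normal salaries with a true mean of 60 and a standard deviation of 15 and samples of 20; these values and the helper `summarize` are illustrative assumptions.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(3)  # fixed seed so the sketch is reproducible

# Hypothetical truth and sampling setup (assumptions, not from the lecture).
MU, SIGMA, N, REPS = 60.0, 15.0, 20, 5_000


def summarize(estimates, truth):
    """Return the simulated (bias, variance, mean squared error) of an estimator."""
    bias = fmean(estimates) - truth
    var = pvariance(estimates)
    mse = fmean((e - truth) ** 2 for e in estimates)
    return bias, var, mse


biased, unbiased = [], []
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    biased.append(fmean(sample) + 1)  # sample mean plus one: biased, low variance
    unbiased.append(sample[0])        # first observation: unbiased, high variance

b_bias, b_var, b_mse = summarize(biased, MU)
u_bias, u_var, u_mse = summarize(unbiased, MU)
print(f"sample mean plus one: bias={b_bias:6.2f} var={b_var:7.2f} mse={b_mse:7.2f}")
print(f"first observation:    bias={u_bias:6.2f} var={u_var:7.2f} mse={u_mse:7.2f}")
```

Under these assumptions the biased estimator makes a much smaller squared mistake on average, because its small bias is outweighed by its much smaller variance.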

All estimators are created equal?

- Mean squared error and bias
- There is a relationship between mean squared error and bias: $\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + [\mathrm{Bias}(\hat{\theta})]^2$
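The standard form of this relationship is that mean squared error equals variance plus squared bias. A short derivation, adding and subtracting $E(\hat{\theta})$ inside the square and writing $\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$:

```latex
\begin{align*}
\mathrm{MSE}(\hat{\theta})
  &= E\!\left[(\hat{\theta} - \theta)^2\right] \\
  &= E\!\left[\big(\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta\big)^2\right] \\
  &= E\!\left[\big(\hat{\theta} - E(\hat{\theta})\big)^2\right]
     + 2\big(E(\hat{\theta}) - \theta\big)\,
       \underbrace{E\!\left[\hat{\theta} - E(\hat{\theta})\right]}_{=\,0}
     + \big(E(\hat{\theta}) - \theta\big)^2 \\
  &= \mathrm{Var}(\hat{\theta}) + \big[\mathrm{Bias}(\hat{\theta})\big]^2 .
\end{align*}
```

For an unbiased estimator the second term vanishes, so comparing mean squared errors reduces to comparing variances, which is the efficiency criterion.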