5 Introduction to estimation

- Intro to statistical inference
- Sampling distribution of the mean
- Confidence intervals (σ known)
- Student's t distributions
- Confidence intervals (σ not known)
- Sample size requirements

Statistical inference

- Statistical inference: generalizing from a sample to a population with a calculated degree of certainty
- Two forms of statistical inference
  - Estimation: introduced in this chapter
  - Hypothesis testing: next chapter

Parameters and estimates

- Parameter: a numerical characteristic of a population
- Statistic: a value calculated in a sample
- Estimate: a statistic that "guesstimates" a parameter
- Example: the sample mean x-bar is the estimator of the population mean µ

Parameters and estimates are related, but they are not the same.

Parameters and statistics

                   Parameters        Statistics
Source             Population        Sample
Notation           Greek (µ, σ)      Roman (x-bar, s)
Random variable?   No                Yes
Calculated?        No                Yes

Sampling distribution of the mean

- x-bar takes on different values with repeated (different) samples, while µ remains constant
- Even though x-bar is variable, its behavior is predictable
- The behavior of x-bar is predicted by its sampling distribution, the Sampling Distribution of the Mean (SDM)

Simulation experiment

- Distribution of AGE in population.sav (figure at right)
  - N = 600
  - µ = 29.5 (center)
  - σ = 13.6 (spread)
  - Not Normal (shape)
- Conduct three sampling simulations
- For each experiment
  - Take multiple samples of size n
  - Calculate the sample means
  - Plot the means to get simulated SDMs
- Experiment A: each sample n = 1
- Experiment B: each sample n = 10
- Experiment C: each sample n = 30

Results of simulation experiment

- Findings
- SDMs are centered on µ (29.5)
- SDMs become tighter as n increases
- SDMs become more nearly Normal as n increases
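The three experiments can be imitated in a few lines. This is a minimal sketch under stated assumptions: the real population.sav data are not available here, so a synthetic right-skewed population of N = 600 "ages" stands in for it (its µ and σ will only roughly resemble 29.5 and 13.6). Samples are drawn with replacement, which is also an assumption. The printout lets you check the three findings: the means of the x-bars stay near µ, and their SD shrinks roughly like σ/√n.

```python
import random
import statistics

# ASSUMPTION: synthetic, right-skewed stand-in for population.sav (N = 600).
# These are NOT the real AGE values.
random.seed(1)
population = [15 + round(random.expovariate(1 / 14)) for _ in range(600)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population SD (divides by N)

def simulate_sdm(n, reps=5000):
    """Return the means of `reps` random samples of size n (drawn with replacement)."""
    means = []
    for _ in range(reps):
        sample = random.choices(population, k=n)
        means.append(sum(sample) / n)
    return means

results = {}
for label, n in [("A", 1), ("B", 10), ("C", 30)]:
    means = simulate_sdm(n)
    results[n] = means
    print(f"Experiment {label}: n = {n:2d}, "
          f"mean of x-bars = {statistics.fmean(means):5.2f} (mu = {mu:.2f}), "
          f"SD of x-bars = {statistics.stdev(means):5.2f} "
          f"(sigma/sqrt(n) = {sigma / n ** 0.5:5.2f})")
```

Plotting a histogram of `results[1]`, `results[10]` and `results[30]` reproduces the visual pattern: the skewed shape at n = 1 gives way to a tighter, more nearly Normal shape at n = 30.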

95% Confidence Interval for µ

Formula for a 95% confidence interval for µ when σ is known:

x-bar ± (1.96)(SEM), where SEM = σ / √n

Illustrative example

- Example
- Population with σ = 13.586 (known ahead of time)
- SRS: 21, 42, 5, 11, 30, 50, 28, 27, 24, 52
- n = 10, x-bar = 29.0
- SEM = σ / √n = 13.586 / √10 = 4.30
- 95% CI for µ
  - x-bar ± (1.96)(SEM)
  - 29.0 ± (1.96)(4.30)
  - 29.0 ± 8.4
  - (20.6, 37.4)
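The arithmetic in the illustrative example can be checked with a short sketch that uses only the summary values given above (x-bar = 29.0, σ = 13.586, n = 10). The helper name `z_ci_95` is hypothetical.

```python
import math

def z_ci_95(xbar, sigma, n):
    """95% CI for mu when sigma is known: x-bar +/- 1.96 * sigma / sqrt(n)."""
    sem = sigma / math.sqrt(n)  # standard error of the mean
    d = 1.96 * sem              # margin of error
    return xbar - d, xbar + d

lcl, ucl = z_ci_95(xbar=29.0, sigma=13.586, n=10)
print(f"95% CI for mu: ({lcl:.1f}, {ucl:.1f})")  # prints "95% CI for mu: (20.6, 37.4)"
```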

Margin of error

- Margin of error (d): half the confidence interval length
- Surround x-bar with the margin of error
- 95% CI for µ
  - x-bar ± (1.96)(SEM)
  - 29.0 ± (1.96)(4.30)
  - 29.0 ± 8.4

In 29.0 ± 8.4, 29.0 is the point estimate and 8.4 is the margin of error.

Interpretation of a 95% CI

We are 95% confident that the interval captures the parameter.

Other levels of confidence

Let α ≡ the probability that a confidence interval will not capture the parameter, so 1 − α ≡ the confidence level.

Confidence level (1 − α)   Alpha level (α)   z_{1−α/2}
.90                        .10               1.645
.95                        .05               1.96
.99                        .01               2.58
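The z_{1−α/2} column is the standard Normal percentile with cumulative probability 1 − α/2. A minimal sketch using Python's standard-library `statistics.NormalDist` computes the column for any α. The helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Return z_{1 - alpha/2}, the standard Normal percentile for a (1 - alpha)100% CI."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    print(f"confidence {1 - alpha:.2f}  alpha {alpha:.2f}  z = {z_critical(alpha):.3f}")
```

The unrounded value for α = .01 is 2.576, which the table rounds to 2.58.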

(1 − α)100% confidence interval for µ

Formula for a (1 − α)100% confidence interval for µ when σ is known:

x-bar ± (z_{1−α/2})(SEM), where SEM = σ / √n

Example: 99% CI, same data

- Same data as before
- 99% confidence interval for µ
  - x-bar ± (z_{1−.01/2})(SEM)
  - x-bar ± (z_{.995})(SEM)
  - 29.0 ± (2.58)(4.30)
  - 29.0 ± 11.1
  - (17.9, 40.1)

Confidence level and CI length

p. 5.9 demonstrates the effect of raising your confidence level: CI length increases, so the interval is more likely to capture µ.

Confidence level   CI for illustrative data   CI length
90%                (21.9, 36.1)               14.2
95%                (20.6, 37.4)               16.8
99%                (17.9, 40.1)               22.2

CI length = UCL − LCL
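The table above can be reproduced from the summary values (x-bar = 29.0, σ = 13.586, n = 10) and the tabled z values. This is a sketch, and the helper name `z_ci` is hypothetical. To match the slide, it rounds the limits to one decimal before taking CI length = UCL − LCL.

```python
import math

# z_{1 - alpha/2} values as tabled on the "Other levels of confidence" slide
Z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.58}

def z_ci(xbar, sigma, n, z):
    """(1 - alpha)100% CI for mu with sigma known: x-bar +/- z * sigma / sqrt(n)."""
    d = z * sigma / math.sqrt(n)  # margin of error
    return xbar - d, xbar + d

rows = []
for level, z in Z.items():
    lcl, ucl = z_ci(29.0, 13.586, 10, z)
    lcl, ucl = round(lcl, 1), round(ucl, 1)  # limits rounded as reported on the slide
    length = round(ucl - lcl, 1)             # CI length = UCL - LCL
    rows.append((level, lcl, ucl, length))
    print(f"{level:.0%}  ({lcl}, {ucl})  length {length}")
```

Because z grows with the confidence level while x-bar, σ and n stay fixed, the margin of error and therefore the CI length grow with it.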

Beware

- The prior CI formula applies only to
  - SRS
  - Normal SDMs
  - σ known ahead of time
- It does not account for
  - GIGO ("garbage in, garbage out")
  - Poor-quality samples (e.g., due to non-response)

When σ Is Not Known

- In practice we rarely know σ
- Instead, we calculate s and use it as an estimate of σ
- This adds another element of uncertainty to the inference
- A modification of z procedures, called Student's t distribution, is needed to account for this additional uncertainty

Students t distributions

Brilliant!

- William Sealy Gosset (1876-1937) worked for the Guinness brewing company and was not allowed to publish
- In 1908, writing under the pseudonym "Student," he described a distribution that accounted for the extra variability introduced by using s as an estimate of σ

t Distributions

- Student's t distributions are like a Standard Normal distribution but have broader tails
- There is more than one t distribution (a family)
- Each t has a different number of degrees of freedom (df)
- As df increases, t becomes increasingly like z

t table

- Each row is for a particular df
- Columns contain cumulative probabilities or tail regions
- The table contains t percentiles (like z scores)
- Notation: t_{df,p}; example: t_{9,.975} = 2.26
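Python's standard library has no t distribution, so this sketch computes t percentiles directly. It integrates the Student's t density numerically with Simpson's rule and inverts the cumulative probability by bisection. All function names are hypothetical, and the numerical method is a swapped-in choice, not how printed t tables were built. If SciPy is available, `scipy.stats.t.ppf(p, df)` should give the same values.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): symmetry plus Simpson's rule for the area between 0 and |x|."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    # Simpson weights 1, 4, 2, 4, ..., 2, 4, 1 (steps must be even)
    total += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_percentile(p, df):
    """t_{df,p} for p > 0.5, found by bisecting t_cdf on [0, 1000]."""
    lo, hi = 0.0, 1000.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"t_9,.975 = {t_percentile(0.975, 9):.2f}")  # the slide's table value is 2.26
for df in (5, 30, 1000):                            # as df grows, t approaches z = 1.96
    print(f"df = {df:4d}: t_.975 = {t_percentile(0.975, df):.3f}")
```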

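95% CI for µ, σ not known

Formula for a (1 − α)100% confidence interval for µ when σ is NOT known:

x-bar ± (t_{n−1,1−α/2})(sem), where sem = s / √n

This is the same as the z formula, except that z_{1−α/2} is replaced with t_{n−1,1−α/2} (df = n − 1) and SEM is replaced with sem, the estimated standard error s / √n.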

Illustrative example diabetic weight

- To what extent are diabetics overweight?
- Measure: % of ideal body weight = (actual body weight) ÷ (ideal body weight) × 100
- Data (n = 18): 107, 119, 99, 114, 120, 104, 88, 114, 124, 116, 101, 121, 152, 100, 125, 114, 95, 117
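The 95% CI for these data can be computed with a short sketch that applies the t formula to the 18 values listed. The critical value t_{17,.975} ≈ 2.110 comes from a standard t table (df = n − 1 = 17). It is not shown on the slides, so treat it as an external input.

```python
import math
import statistics

# % ideal body weight for the n = 18 diabetics in the example
weights = [107, 119, 99, 114, 120, 104, 88, 114, 124,
           116, 101, 121, 152, 100, 125, 114, 95, 117]

n = len(weights)
xbar = statistics.mean(weights)
s = statistics.stdev(weights)  # sample SD (divides by n - 1), estimates sigma
sem = s / math.sqrt(n)         # estimated standard error of the mean
t = 2.110                      # t_{17,.975} from a t table (df = n - 1 = 17)

lcl, ucl = xbar - t * sem, xbar + t * sem
print(f"n = {n}, x-bar = {xbar:.1f}, s = {s:.1f}, sem = {sem:.2f}")
print(f"95% CI for mu: ({lcl:.1f}, {ucl:.1f})")
```

The resulting limits, 105.6 and 120.0, are the ones used in the interpretation that follows.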

Interpretation of 95% CI for µ

- Remember that the CI seeks to capture µ, NOT x-bar
- 95% confidence means that 95% of similar intervals would capture µ (and 5% would not)
- For the diabetic body weight illustration, we can be 95% confident that the population mean is between 105.6 and 120.0
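The statement that "95% of similar intervals would capture µ" can be checked by simulation. This sketch assumes a hypothetical Normal population (µ = 29.5 and σ = 13.6 are borrowed from the AGE example only for concreteness) and repeatedly draws samples of n = 10. For each sample it builds a 95% t interval with t_{9,.975} ≈ 2.262 and counts how often the interval contains µ.

```python
import math
import random
import statistics

# ASSUMPTION: hypothetical Normal population; parameters are illustrative only.
MU, SIGMA, N = 29.5, 13.6, 10
T_975_DF9 = 2.262  # t_{9,.975}; the t table slide rounds it to 2.26
random.seed(7)

def interval_captures_mu():
    """Draw one SRS of size N, build a 95% t interval, and check whether it contains MU."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(N)  # sigma treated as unknown
    return xbar - T_975_DF9 * sem <= MU <= xbar + T_975_DF9 * sem

reps = 10_000
coverage = sum(interval_captures_mu() for _ in range(reps)) / reps
print(f"{coverage:.1%} of {reps} intervals captured mu; {1 - coverage:.1%} missed")
```

Any single interval either captures µ or does not. The 95% describes the long-run behavior of the procedure.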

Sample size requirements

- Assume SRS, Normality, valid data
- Let d ≡ the margin of error (half the confidence interval length)
- To get a CI with margin of error d, use n = (z_{1−α/2})²σ² / d², rounded up to the next whole number

Sample size requirements, illustration

- Suppose we have a variable with σ = 15

Smaller margins of error require larger sample sizes.
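The sample size formula n = (z_{1−α/2})²σ²/d² can be evaluated with a small sketch for σ = 15 at 95% confidence. The helper name `n_required` and the target margins d = 5, 2.5 and 1 are hypothetical choices made for illustration.

```python
import math

def n_required(sigma, d, z=1.96):
    """Smallest n whose (1 - alpha)100% CI has margin of error <= d, sigma known."""
    return math.ceil((z * sigma / d) ** 2)  # round up: n must be a whole number

for d in (5, 2.5, 1):  # hypothetical target margins of error
    print(f"d = {d}: n = {n_required(15, d)}")
```

Halving d roughly quadruples n, because d enters the formula squared.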

Acronyms

- SRS: simple random sample
- SDM: sampling distribution of the mean
- SEM: standard error of the mean
- CI: confidence interval
- LCL: lower confidence limit
- UCL: upper confidence limit