Statistics - PowerPoint PPT Presentation

Provided by: Ins2
Title: Statistics


1
Estimation
2
Estimators
  • Statistical Inference
  • Population - Sample - Conclusions about
    population
  • Point Estimate
  • Single value chosen from a sampling distribution
  • Little information about actual value of
    population parameter
  • Confidence Interval
  • Interval/range of values believed to include the
    unknown population parameter
  • Associated measure of the confidence that the
    interval contains the parameter

3
Confidence Interval (CI)
For μ when σ is known
  • Population normal, sample mean normal
  • Sample large, sample mean approximately normal (CLT)

Before sampling, there is a probability (1-α)
that the interval μ ± z_{1-α/2} · σ/√n will include
the sample mean.
After sampling, 100(1-α)% of the intervals
x̄ ± z_{1-α/2} · σ/√n will include the population mean μ.
Source: Aczel, A. (1998), Complete Business
Statistics, 4th ed., McGraw-Hill/Irwin, Mass.
(CD-ROM)
4
95% CI Around the Sample Mean
95% of the intervals around the sample mean are
expected to include μ.
5% of the intervals around the sample mean are
expected not to include μ.
[Figure: normal curve with central area (1-α) = 95%
and tail areas α/2 = 2.5% on each side]
Sources: Daniel, W. (1999), Biostatistics: A
Foundation for Analysis in the Health Sciences,
7th ed., John Wiley & Sons, p. 155. Aczel, A.
(1998), Complete Business Statistics, 4th ed.,
McGraw-Hill/Irwin, Mass. (CD-ROM)
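The repeated-sampling statement on this slide can be checked with a small simulation. This is a minimal sketch, not part of the original slides: the population values (mu, sigma), the sample size, and the number of repetitions are hypothetical, and σ is treated as known.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population and design, chosen only for illustration
mu, sigma, n, reps = 50.0, 10.0, 30, 2000

z = NormalDist().inv_cdf(0.975)      # reliability coefficient for 95%
half_width = z * sigma / sqrt(n)     # margin of error with sigma known

hits = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = mean(sample)
    # Count the interval as a hit when it contains the true mean
    if xbar - half_width <= mu <= xbar + half_width:
        hits += 1

coverage = hits / reps
print(f"Share of intervals containing mu: {coverage:.3f}")
```

The printed share should be close to 0.95, with some run-to-run variation if the seed is changed.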
5
CI Components
100(1-α)% CI for μ when σ is known (sampling from
a normal population or a large sample):
x̄ ± z_{1-α/2} · σ/√n
  • Estimator: x̄
  • Reliability coefficient: z_{1-α/2}
  • Standard error: σ/√n
  • Precision of the estimate (margin of error):
    z_{1-α/2} · σ/√n
Interpretation
a. Probabilistic: in repeated sampling, 100(1-α)%
of all intervals will include μ
b. Practical: we are 100(1-α)% confident that a
single interval contains μ
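The components named on this slide (estimator, reliability coefficient, standard error, margin of error) can be put together in a few lines. This is an illustrative sketch only: the function name z_interval and the numbers passed to it are hypothetical and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """CI for the mean when sigma is known: estimator +/- z * standard error."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # reliability coefficient
    standard_error = sigma / sqrt(n)
    margin = z * standard_error               # precision of the estimate
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs: sample mean 22, sigma 4, n 64
lower, upper = z_interval(sample_mean=22.0, sigma=4.0, n=64, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```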
6
The t Distribution
CI for μ when σ is unknown
Replace σ with the sample standard deviation s
Properties
  • Family of bell-shaped, symmetric distributions
  • Each member specified by the degrees of freedom
    (df) n-1
  • -∞ < t < +∞
  • E(t) = 0
  • For df > 2, Var(t) = df/(df-2)
  • t is less peaked and has higher tails than z
  • t → z as df → ∞ (n-1 → ∞)

Sources: Daniel, W. (1999), Biostatistics: A
Foundation for Analysis in the Health Sciences,
7th ed., John Wiley & Sons, p. 155. Aczel, A.
(1998), Complete Business Statistics, 4th ed.,
McGraw-Hill/Irwin, Mass. (CD-ROM)
7
CI Using t
100(1-α)% CI for μ when σ is unknown (sampling
from a normal population):
x̄ ± t_{1-α/2, n-1} · s/√n
  • Estimator: x̄
  • Reliability coefficient: t_{1-α/2, n-1}
  • Standard error: s/√n
  • Precision of the estimate (margin of error):
    t_{1-α/2, n-1} · s/√n
Interpretation
a. Probabilistic: in repeated sampling, 100(1-α)%
of all intervals will include μ
b. Practical: we are 100(1-α)% confident that a
single interval contains μ
8
CI for Difference Between Two Population Means
(μ1 - μ2)
We are interested in μ1 - μ2.
We know that x̄1 - x̄2 is an unbiased point
estimate of it, and that the variance of this
estimator is σ1²/n1 + σ2²/n2 (independent samples).
[Figure: sampling distribution of x̄1 - x̄2 centered
at μ1 - μ2, with central area (1-α) = 95% and
α/2 = 2.5% in each tail. Left tail: differences
this large or larger, look to the left. Right
tail: differences this large or larger, look to
the right.]
9
Scenarios
1. Independent populations normally distributed
with known variance, or independent populations
not normally distributed with known variance and
large sample size
2. Independent populations normally distributed,
or not normal with large sample size (unknown
equal variance)
3. Independent populations normally distributed,
or not normal with large sample size (unknown
unequal variance)
10
CI for Population Proportion
We are interested in p, the population proportion.
We know that p̂ (the sample proportion) is an
unbiased point estimate of it, and that the
variance of this estimator is p(1-p)/n.
We also know that if np and n(1-p) are both > 5,
then p̂ is approximately normally distributed.
CI for the population proportion:
p̂ ± z_{1-α/2} · √(p̂(1-p̂)/n)
Daniel's exercise 6.5.1: 90% CI for p? Sample: 947
patients, 166 with a history of sexual abuse.
In this case np̂ = 166 > 5 and n(1-p̂) = 947 - 166
= 781 > 5, so
0.1753 ± 1.645(0.01236) = (0.1550, 0.1956)
We are 90% confident that between 0.1550 and
0.1956 of the patients present a history of
sexual abuse.
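The arithmetic of exercise 6.5.1 can be reproduced as follows. This is a sketch under the large-sample normal approximation described on the slide; the function name proportion_ci is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Large-sample CI for a proportion: p_hat +/- z * sqrt(p_hat*(1-p_hat)/n)."""
    p_hat = successes / n
    # Rule of thumb from the slide: both counts should exceed 5
    if successes <= 5 or n - successes <= 5:
        raise ValueError("normal approximation not justified")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 166 of 947 patients with a history of sexual abuse, 90% confidence
lower, upper = proportion_ci(166, 947, 0.90)
print(round(lower, 4), round(upper, 4))
```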
11
CI for the Difference Between Two Population
Proportions
We are interested in p1 - p2.
We know that p̂1 - p̂2 (the difference in sample
proportions) is an unbiased point estimate of it,
and that an estimate of the variance of this
estimator is p̂1(1-p̂1)/n1 + p̂2(1-p̂2)/n2.
CI: (p̂1 - p̂2) ± z_{1-α/2} ·
√(p̂1(1-p̂1)/n1 + p̂2(1-p̂2)/n2)
Daniel's example 6.6.1: 96 boys, 123 girls.
Suicide attempts in 18 of the boys and 60 of the
girls. Assume independent random samples from
equivalent populations; find the 99% CI for the
difference in proportions of suicide attempts.
.3003 ± 2.58(.0602) = (.1450, .4556)
We are 99% confident that the difference in the
proportion of suicide attempts between girls and
boys is between .145 and .456.
NOTE: since the CI does not include 0, we conclude
that the two population proportions are not equal.
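The numbers in example 6.6.1 can be checked with a short script. This is a sketch only; the function name diff_proportion_ci is illustrative, and it uses the exact normal quantile (about 2.576) rather than the rounded 2.58, so the bounds differ from the slide in the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportion_ci(x1, n1, x2, n2, confidence):
    """Large-sample CI for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Girls: 60 of 123; boys: 18 of 96; 99% confidence
lower, upper = diff_proportion_ci(60, 123, 18, 96, 0.99)
print(round(lower, 3), round(upper, 3))
excludes_zero = lower > 0 or upper < 0
print("Interval excludes 0:", excludes_zero)
```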
12
Sample Size for Estimating Means
We are interested in designing a study to
estimate a given population parameter (MEAN) with
a certain precision.
Half-width of CI: d = reliability coefficient ×
standard error = z · σ/√n, so n = z²σ²/d²
Daniel's exercise 6.7.1: the goal is to estimate
the mean weight of babies born in the hospital
with a 99% CI 1 pound wide (d = 0.5). Estimate of
σ = 1.
99%: n = (2.58 × 1/0.5)² = 26.6. You need to weigh
at least 27 babies to be 99% confident that the
error will be within ±0.5 pounds.
95%: n = (1.96 × 1/0.5)² = 15.4. You need to weigh
at least 16 babies to be 95% confident that the
error will be within ±0.5 pounds.
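The sample sizes in exercise 6.7.1 follow from solving d = z · σ/√n for n and rounding up. A minimal sketch, with a made-up function name and σ treated as known:

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, d, confidence):
    """Smallest n whose CI half-width is at most d, assuming sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / d) ** 2)

# sigma = 1 pound, half-width d = 0.5 pounds
n_99 = sample_size_for_mean(sigma=1.0, d=0.5, confidence=0.99)
n_95 = sample_size_for_mean(sigma=1.0, d=0.5, confidence=0.95)
print(n_99, n_95)
```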
13
Sample Size for Estimating Proportions
We are interested in designing a study to
estimate a given population parameter
(PROPORTION) with a certain precision.
Half-width of CI: d = reliability coefficient ×
standard error = z · √(p(1-p)/n), so
n = z²p(1-p)/d²
Daniel's exercise 6.8.1: the goal is to determine
the proportion of adults living with hepatitis B
virus. Find the n required to estimate it within
0.03 with 95% confidence. In a similar area this
proportion is 0.20. What should n be if such an
estimate is not available?
With p = 0.20: n = (1.96)²(0.20)(0.80)/(0.03)² =
682.95. You need to recruit at least 683 people to
obtain an estimate within 0.03 with 95% confidence.
Without an estimate, use p = 0.5 (worst-case
scenario): n = (1.96)²(0.5)(0.5)/(0.03)² = 1067.1.
You need to recruit at most 1068 people to obtain
an estimate within 0.03 with 95% confidence.
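Both answers to exercise 6.8.1 can be reproduced by solving d = z · √(p(1-p)/n) for n. This is a sketch with an illustrative function name; when no prior estimate is given it falls back to p = 0.5, which maximizes p(1-p).

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(d, confidence, p=0.5):
    """Smallest n giving half-width d; p = 0.5 is the worst case."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / d ** 2)

n_with_estimate = sample_size_for_proportion(d=0.03, confidence=0.95, p=0.20)
n_worst_case = sample_size_for_proportion(d=0.03, confidence=0.95)
print(n_with_estimate, n_worst_case)
```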
14
Sample Size Guidelines
Sample size = (reliability coefficient)² ×
variance / (error width)², where the error width
is the half-width d of the CI
TO PLAN THE SAMPLE SIZE FOR YOUR STUDY, observe:
  • The desired confidence level (1-α)
  • The admissible error width
  • The parameter's variability (variance/st. dev.)
    in the population under study (pilot or
    worst-case scenario)
SAMPLE SIZE INCREASES with
  • Increased confidence level
  • Increased variability of the population parameter
  • Reduced error width

15
Critical Values for Z
Sampling from the same population:
(1) with a fixed sample size, a higher confidence
level gives a wider CI
(2) with a fixed confidence level, a larger
sample size gives a narrower CI
Source: Aczel, A. (1998), Complete Business
Statistics, 4th ed., McGraw-Hill/Irwin, Mass.
(CD-ROM)
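The two statements on this slide can be illustrated numerically. The sketch below assumes a z-based interval with known σ; the value σ = 10 and the sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sigma, n, confidence):
    """Full width of the z-based CI for the mean (sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)

sigma = 10.0  # hypothetical population standard deviation

# (1) Fixed sample size, rising confidence level: the CI widens
for confidence in (0.90, 0.95, 0.99):
    print(f"n=25, {confidence:.0%}: width {ci_width(sigma, 25, confidence):.2f}")

# (2) Fixed confidence level, rising sample size: the CI narrows
for n in (25, 100, 400):
    print(f"95%, n={n}: width {ci_width(sigma, n, 0.95):.2f}")
```

Quadrupling n halves the width, because the standard error scales with 1/√n.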
16
z or t?
[Flowchart: choosing z or t. Decision nodes:
population normally distributed? (Y/N); n large?
(Y/N). Outcomes: t, z, CLT, non-parametric]
Source: Daniel, W. (1999), Biostatistics: A
Foundation for Analysis in the Health Sciences,
7th ed., John Wiley & Sons, p. 165.
17
Point Estimate of Population Variance
We are interested in σ², the population variance
(assume a normally distributed population)
Daniel's example 5.3.1: ages of 5 children, μ = 10,
σ² = 8
Population values: 6 8 10 12 14
a) Sampling with replacement: E(s²) = σ²
b) Sampling without replacement: E(s²) = S², where
S² = Σ(x - μ)²/(N-1)
If N is large, N ≈ N-1 and σ² ≈ S²
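The two sampling schemes can be compared by listing every possible sample from the five ages and averaging the sample variances. This is a sketch that assumes samples of size n = 2, which this transcript does not state; S2 here denotes the population variance computed with divisor N-1.

```python
from itertools import combinations, product
from statistics import mean, pvariance, variance

population = [6, 8, 10, 12, 14]   # ages of the 5 children
n = 2                             # assumed sample size (not stated in the transcript)

sigma2 = pvariance(population)    # divisor N:   population variance
S2 = variance(population)         # divisor N-1

# a) With replacement: all ordered samples, repeats allowed
mean_s2_with = mean(variance(s) for s in product(population, repeat=n))

# b) Without replacement: all distinct subsets of size n
mean_s2_without = mean(variance(s) for s in combinations(population, n))

print("sigma^2 =", sigma2, " average s^2 with replacement =", mean_s2_with)
print("S^2 =", S2, " average s^2 without replacement =", mean_s2_without)
```

With replacement the average sample variance matches σ²; without replacement it matches S², which is why the two agree only when N is large.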
18
The Chi-Square Distribution
Sampling from a normal population, (n-1)s²/σ²
follows a χ² (Chi-square) distribution with n-1 df
Properties
  • Sum of independent squared standard normal
    variables
  • Family of asymmetric distributions
  • Each member specified by the degrees of freedom
    (df) n-1
  • 0 ≤ χ² < ∞
  • E(χ²) = df and Var(χ²) = 2df
  • χ² approaches a normal distribution as df → ∞
    (n-1 → ∞)

Source: Aczel, A. (1998), Complete Business
Statistics, 4th ed., McGraw-Hill/Irwin, Mass.
(CD-ROM)
19
Sampling Distribution for the Variance
Daniel's exercise 6.9.4: 99% CI for σ² and σ?
Sample of 30 patients (myocardial transit times),
s² = 1.03
To find the 99% CI using χ²:
P(χ²_{α/2} ≤ (n-1)s²/σ² ≤ χ²_{1-α/2}) = .99
Note: α = 0.01, so look for α/2 = 0.005 and
1-α/2 = 0.995 in the table (df = 29)
We are 99% confident that the population variance
is between 0.571 and 2.277 and that the
population standard deviation is between 0.756
and 1.509
Table formatting adapted from Aczel, A. (1998),
Complete Business Statistics, 4th ed.,
McGraw-Hill/Irwin, Mass. (CD-ROM)
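The interval in exercise 6.9.4 comes from dividing (n-1)s² by the two table values. In the sketch below the χ² quantiles for 29 df (13.121 and 52.336) are typed in as standard table lookups; they are not given in this transcript, so treat them as assumed inputs.

```python
from math import sqrt

n = 30
s2 = 1.03          # sample variance
df = n - 1

# Chi-square table values for 29 df (assumed lookups, not from the transcript)
chi2_low = 13.121  # 0.005 quantile
chi2_high = 52.336 # 0.995 quantile

# The larger table value gives the lower bound, and vice versa
var_lower = df * s2 / chi2_high
var_upper = df * s2 / chi2_low
sd_lower, sd_upper = sqrt(var_lower), sqrt(var_upper)

print(f"variance: ({var_lower:.3f}, {var_upper:.3f})")
print(f"std dev:  ({sd_lower:.3f}, {sd_upper:.3f})")
```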
20
Sampling Distribution for Ratio of Two Population
Variances
We are interested in σ1²/σ2²
Application: to verify whether two variances are
equal or different. Important when comparing two
population means.
Assumptions:
(1) Estimates s1² and s2² come from independent
samples
(2) Populations are normally distributed
(3) Arbitrarily assign s1² the larger of the two
sample variances
We know that (s1²/σ1²)/(s2²/σ2²) follows an F
distribution:
The ratio of two independent Chi-square random
variables, each divided by its degrees of freedom,
is a random variable with an F distribution
21
The F Distribution
Properties
  • Ratio of two independent Chi-square random
    variables, each divided by its df
  • Family of asymmetric distributions, skewed to the
    right
  • Each member specified by two degrees of freedom:
    numerator df and denominator df
  • 0 ≤ F < ∞

Source: Aczel, A. (1998), Complete Business
Statistics, 4th ed., McGraw-Hill/Irwin, Mass.
(CD-ROM)
22
CI for Ratio of Two Population Variances
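Daniel's exercise 6.10.4: 90% CI for the ratio of
the two population variances
s1² = 8, s2² = 15, n = 32 divided equally
(n1 = n2 = 16)
To find the 90% CI, note α = 0.1: look for
1-α/2 = 0.95 in the table with df1 = df2 = 15 for
the upper bound (F0.95 = 2.40), and for the lower
bound use F0.05 = 1/F0.95 = 1/2.40
We are 90% confident that the ratio of the larger
to the smaller variance (point estimate
15/8 = 1.875) is between 0.781 and 4.5
Table formatting adapted from Aczel, A. (1998),
Complete Business Statistics, 4th ed.,
McGraw-Hill/Irwin, Mass. (CD-ROM)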
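The bounds in exercise 6.10.4 can be reproduced from the quoted table value. This sketch assumes, as the reported interval implies, that the larger sample variance (15) is placed in the numerator; the variable names are illustrative.

```python
# Sample variances from the exercise
s_small_sq, s_large_sq = 8.0, 15.0

# Table value quoted on the slide: F(0.95) with 15 and 15 df
f_095 = 2.40
f_005 = 1 / f_095   # equal df, so the lower quantile is the reciprocal

ratio = s_large_sq / s_small_sq   # point estimate, larger over smaller

lower = ratio / f_095   # divide by the upper quantile
upper = ratio / f_005   # divide by the lower quantile

print(f"ratio = {ratio:.3f}, 90% CI = ({lower:.3f}, {upper:.3f})")
```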