Raoul LePage - PowerPoint PPT Presentation

About This Presentation
Title:

Raoul LePage


Slides: 59
Provided by: raoull
Learn more at: https://www.stt.msu.edu


Transcript and Presenter's Notes



1
Raoul LePage, Professor, Statistics and Probability
www.stt.msu.edu/lepage (click on STT315_Sp06)
Week 4
2
WEEK PLAN: Normal approximation of the binomial; Poisson and its normal approximation; exponential; selected material from Ch. 1.
3
Normal Approx of Binomial
$n = 10$, $p = 0.4$: mean $= np = 4$, sd $= \sqrt{npq} = 1.55$
4
Normal Approx of Binomial
$n = 30$, $p = 0.4$: mean $= np = 12$, sd $= \sqrt{npq} = 2.683$
5
Normal Approx of Binomial
$n = 100$, $p = 0.4$: mean $= np = 40$, sd $= \sqrt{npq} = 4.89898$
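A short Python sketch (our own illustration, not from the course materials) reproduces the three means and standard deviations above and, as an extra comparison the slides do not make, sets the exact binomial probability $P(X \le np)$ beside its normal approximation. The 0.5 continuity correction is an assumption of this sketch.

```python
import math

def binomial_mean_sd(n, p):
    """Mean np and standard deviation sqrt(npq) of a Binomial(n, p) count."""
    q = 1.0 - p
    return n * p, math.sqrt(n * p * q)

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

def normal_cdf(z):
    """Standard normal CDF written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for n in (10, 30, 100):
    mean, sd = binomial_mean_sd(n, 0.4)
    k = round(mean)                               # compare P(X <= np) both ways
    exact = binomial_cdf(k, n, 0.4)
    approx = normal_cdf((k + 0.5 - mean) / sd)    # +0.5: continuity correction (assumed)
    print(f"n={n}: mean={mean:g}, sd={sd:.4f}, exact={exact:.4f}, normal={approx:.4f}")
```

The gap between the exact and approximate columns shrinks as n grows, which is the point of the three slides.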
6
Poisson Distribution Governing Counts of Rare Events
$p(x) = e^{-\text{mean}}\,\text{mean}^x / x!$ for $x = 0, 1, 2, \ldots$ ad infinitum
7
Poisson
e.g. X = number of times the ace of spades turns up in 104 tries. X is approximately Poisson with mean 2 (= 104 × 1/52), so
$p(x) = e^{-\text{mean}}\,\text{mean}^x / x!$, e.g. $p(3) = e^{-2}\,2^3 / 3! = 0.18$
8
Poisson
e.g. X = number of raisins in MY cookie. The batter has 400 raisins and makes 144 cookies.
$E X = 400/144 = 2.78$ per cookie, $p(x) = e^{-\text{mean}}\,\text{mean}^x / x!$,
e.g. $p(2) = e^{-2.78}\,2.78^2 / 2! = 0.24$ (around 24% of cookies have 2 raisins)
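The two Poisson examples can be checked with a few lines of Python. This is a sketch; `poisson_pmf` is a hypothetical helper name, and the exact binomial count for the ace of spades (104 draws, chance 1/52 each, which assumes every draw is from a full, reshuffled deck) is printed only for comparison.

```python
import math

def poisson_pmf(x, mean):
    """p(x) = e^(-mean) * mean^x / x!  for x = 0, 1, 2, ..."""
    return math.exp(-mean) * mean**x / math.factorial(x)

# Ace of spades in 104 tries: mean = 104 * (1/52) = 2.
print(round(poisson_pmf(3, 2.0), 2))    # 0.18

# For comparison only: exact binomial chance of 3 aces of spades in 104 draws,
# assuming each draw is from a full, reshuffled deck (chance 1/52 each time).
print(math.comb(104, 3) * (1 / 52) ** 3 * (51 / 52) ** 101)

# Raisins: 400 raisins spread over 144 cookies.
raisin_mean = 400 / 144
print(round(raisin_mean, 2))                    # 2.78
print(round(poisson_pmf(2, raisin_mean), 2))    # 0.24
```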
9
Poisson
THE FIRST BEST THING ABOUT THE POISSON IS THAT
THE MEAN ALONE TELLS US THE ENTIRE
DISTRIBUTION! Note: Poisson sd $= \sqrt{\text{mean}}$.
10
400 RAISINS, 144 COOKIES
$E X = 400/144 = 2.78$ raisins per cookie, sd $= \sqrt{\text{mean}} = 1.67$ (for Poisson)
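To see that the Poisson sd really is $\sqrt{\text{mean}}$, one can compute the mean and sd directly from the probabilities $p(x)$. A Python sketch, truncating the infinite sum at a hypothetical cutoff `x_max = 100`, beyond which the remaining probability is negligible for a mean of 2.78:

```python
import math

def poisson_pmf(x, mean):
    """p(x) = e^(-mean) * mean^x / x!"""
    return math.exp(-mean) * mean**x / math.factorial(x)

def poisson_mean_sd(mean, x_max=100):
    """Mean and sd computed from the Poisson probabilities p(0), ..., p(x_max)."""
    probs = [poisson_pmf(x, mean) for x in range(x_max + 1)]
    mu = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mu) ** 2 * p for x, p in enumerate(probs))
    return mu, math.sqrt(var)

mu, sd = poisson_mean_sd(400 / 144)
print(round(mu, 2), round(sd, 2))   # 2.78 1.67
```

Only the mean was supplied; the sd came out of the probabilities, which is the "mean alone tells us the entire distribution" point.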
11
Poisson
THE SECOND BEST THING ABOUT THE POISSON IS THAT
FOR A MEAN AS SMALL AS 3 THE NORMAL APPROXIMATION
WORKS WELL.
[Figure: Poisson distribution with mean 2.78; sd $= \sqrt{\text{mean}} = 1.67$, special to Poisson]
12
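WE AVERAGE 127.8 ACCIDENTS PER MONTH
$E X = 127.8$ accidents. If Poisson, then sd $= \sqrt{127.8} = 11.3049$, and the approximate distribution is normal with mean 127.8 and sd 11.3 (sd $= \sqrt{\text{mean}}$ is special to Poisson).
[Figure: distribution of monthly accidents, mean 127.8]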
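A Python sketch comparing the exact Poisson probability with its normal approximation for the accident example. The threshold of 140 accidents and the 0.5 continuity correction are illustrative assumptions, not values from the slides; the exact sum is done in log space so that $127.8^x$ and $x!$ do not overflow.

```python
import math

MEAN = 127.8  # average accidents per month (from the slide)

def poisson_cdf(k, mean):
    """Exact P(X <= k) for a Poisson count, each term computed via logarithms."""
    return sum(math.exp(-mean + x * math.log(mean) - math.lgamma(x + 1))
               for x in range(k + 1))

def normal_cdf(z):
    """Standard normal CDF written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

sd = math.sqrt(MEAN)            # special to Poisson: sd = sqrt(mean)
print(round(sd, 4))             # 11.3049

k = 140                         # hypothetical threshold, for illustration only
exact = poisson_cdf(k, MEAN)
approx = normal_cdf((k + 0.5 - MEAN) / sd)
print(f"P(X <= {k}): exact {exact:.4f}, normal approx {approx:.4f}")
```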
13
Exponential
The lifetime distribution when death comes by a rare event.
14
Exponential density for mean life 57.3 years
Beware: the text uses the notation E(57.3) to denote the exponential distribution having mean 57.3. We will NOT do so!
[Figure: exponential density, mean 57.3 years]
15
Exponential tail areas for mean life 57.3 years
$P(X > x) = e^{-x/\text{mean}}$, so $P(X > 100) = e^{-100/57.3} = 0.1746$
[Figure: tail area, mean 57.3 years]
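The tail formula is the area under the exponential density $(1/\text{mean})\,e^{-t/\text{mean}}$ to the right of $x$. A Python sketch computes $P(X > 100)$ both from the formula and by a crude midpoint-rule integration of the density; the step size and cutoff are arbitrary choices of this sketch.

```python
import math

MEAN_LIFE = 57.3  # years

def exp_tail(x, mean):
    """P(X > x) = e^(-x/mean) for an exponential lifetime with the given mean."""
    return math.exp(-x / mean)

def exp_density(t, mean):
    """Exponential density (1/mean) * e^(-t/mean) for t >= 0."""
    return math.exp(-t / mean) / mean

# Midpoint rule from 100 out to 100 + 20 mean lives (the area beyond is negligible).
step = 0.01
n_steps = int(20 * MEAN_LIFE / step)
area = sum(exp_density(100 + (i + 0.5) * step, MEAN_LIFE) for i in range(n_steps)) * step

print(round(exp_tail(100, MEAN_LIFE), 4))  # 0.1746
print(round(area, 4))                      # 0.1746
```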
16
Selected material from Chapter 1
17
The overwhelming majority of samples of n from a population of N can stand in for the population.
THE GREAT TRICK OF STATISTICS
Population of N = 5: ATT, Sysco, Pepsico, GM, Dow
Sample of n = 2
18
THE GREAT TRICK OF STATISTICS (continued)
Population of N = 5: ATT, Sysco, Pepsico, GM, Dow
Sample of n = 2: ATT, Pepsico
19
Sample size n must be large, and a sample can stand in for only a few characteristics at a time, such as profit, sales, or dividend. SPECTACULAR FAILURES MAY OCCUR!
GREAT TRICK: SOME CAVEATS
Population of N = 5: ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9
Sample of n = 2
20
GREAT TRICK: SOME CAVEATS
This sample is obviously not representative.
Population of N = 5: ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9
Sample of n = 2: Sysco 21, Pepsi 42
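Enumerating every sample of size n = 2 (drawn without replacement) from the five values on the slide makes the caveat concrete. This Python sketch is our illustration; the cutoff of 5 units for calling a sample mean "close" is an arbitrary assumption.

```python
from itertools import combinations
from statistics import mean

# Values from the slide; the slides do not say which quantity they measure.
values = {"ATT": 12, "Sysco": 21, "Pepsi": 42, "GM": 8, "Dow": 9}
pop_mean = mean(values.values())
print("population mean:", pop_mean)  # population mean: 18.4

close = 0
for pair in combinations(values, 2):        # all 10 samples of n = 2
    sample_mean = mean(values[c] for c in pair)
    if abs(sample_mean - pop_mean) <= 5:    # hypothetical "close" cutoff
        close += 1
    print(pair, sample_mean)

print(f"{close} of 10 sample means are within 5 of the population mean")
# 3 of 10 sample means are within 5 of the population mean
```

With n this small most samples are poor stand-ins; Sysco and Pepsi together average 31.5.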
21
With replacement vs. without replacement.
HOW ARE WE SAMPLING?
Population of N = 5: ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9
Sample of n = 2
22
With replacement
HOW ARE WE SAMPLING?
Population of N = 5: ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9
Sample of n = 2: Pepsi 42, Pepsi 42
23
Rule of thumb: with and without replacement are about the same if $\sqrt{(N-n)/(N-1)} \approx 1$.
DOES IT MAKE A DIFFERENCE?
[Figure: with vs. without replacement, the same? Population of N, sample of n]
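The rule-of-thumb factor $\sqrt{(N-n)/(N-1)}$ is what is usually called the finite population correction: it is the factor by which the standard deviation of a without-replacement sample mean is smaller than the with-replacement one. A Python sketch evaluating it for population and sample sizes that appear in this lecture:

```python
import math

def fpc(N, n):
    """Rule-of-thumb factor sqrt((N - n) / (N - 1)); near 1 means little difference."""
    return math.sqrt((N - n) / (N - 1))

for N, n in [(5, 2), (500, 30), (500_000_000, 600)]:
    print(f"N={N}, n={n}: factor {fpc(N, n):.4f}")
# N=5, n=2: factor 0.8660
# N=500, n=30: factor 0.9705
# N=500000000, n=600: factor 1.0000
```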
24
WITH-replacement samples have no limit to the sample size n.
UNLIMITED SAMPLING
Population of N = 5: ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9
Sample of n = 6
25
UNLIMITED SAMPLING (continued)
Population of N = 5: ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9
Sample of n = 6, repeats allowed: Dow 9, Pepsi 42, Dow 9, Pepsi 42, ATT 12, GM 8
26
TOSS COIN 100 TIMES
Population of N = 2: H, T
27
TOSS COIN 100 TIMES
H, T, H, T, T, H, H, H, H, T,
T, H, H, H, H, H, H, H, T, T,
H, H, H, H, H, H, H, H, H, T,
H, T, H, H, H, H, T, T, H, H,
T, T, T, T, T, T, T, H, H, T,
T, H, T, T, H, H, H, T, H, T,
H, T, T, H, H, T, T, T, H, T,
T, T, T, T, H, H, T, H, T, T,
T, T, H, H, T, T, H, T, T, T,
T, H, T, H, H, T, T, T, T, T
Sample of n = 100 with replacement: 49 H and 51 T
Population of N = 2: H, T
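As a check on the count, the 100 tosses can be transcribed into Python and tallied. This is a sketch; the sequence is copied from the slide in order, ten tosses per row.

```python
tosses = """
H T H T T H H H H T
T H H H H H H H T T
H H H H H H H H H T
H T H H H H T T H H
T T T T T T T H H T
T H T T H H H T H T
H T T H H T T T H T
T T T T H H T H T T
T T H H T T H T T T
T H T H H T T T T T
""".split()

print(len(tosses), tosses.count("H"), tosses.count("T"))  # 100 49 51
```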
28
POPULATION CLONING: MANY COINS, SAME AS 2
TOSS COIN 100 TIMES
(The same 100 tosses as on slide 27.)
Sample of n = 100 with replacement: 49 H and 51 T
[Figure: a population of N made of many copies of the H and T coins]
29
HOW MANY SAMPLES ARE THERE?
2 from 5
30
HOW MANY SAMPLES ARE THERE?
30 from 500
31
HOW MANY SAMPLES ARE THERE?
There are far more samples with replacement.
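The slides leave the counts to the pictures. Here is a Python sketch that counts samples under three conventions; whether order matters is our assumption, since the slides do not say which convention they use.

```python
import math

def sample_counts(N, n):
    """Number of possible samples of n from a population of N under three conventions."""
    return {
        "ordered, with replacement": N**n,
        "ordered, without replacement": math.perm(N, n),
        "unordered, without replacement": math.comb(N, n),
    }

print(sample_counts(5, 2))
# {'ordered, with replacement': 25, 'ordered, without replacement': 20, 'unordered, without replacement': 10}

for label, count in sample_counts(500, 30).items():
    print(f"{label}: about {count:.2e}")
```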
32
They would have you believe the population is $\{8, 9, 12, 42\}$ and the sample is $\{42\}$. A SET is a collection of distinct entities.
CORRECTION TO PAGE 25 OF TEXT
Population: ATT 12, IBM 42, AAA 9, Pepsi 42, GM 8, Dow 9
WE SAMPLE COMPANIES; NUMBERS COME WITH THEM.
Sample: Pepsi 42, Pepsi 42
33
IF THE OVERWHELMING MAJORITY OF SAMPLES ARE GOOD
SAMPLES THEN WE CAN OBTAIN A GOOD SAMPLE BY
RANDOM SELECTION.
THE ROLE OF RANDOM SAMPLING
34
HOW TO SAMPLE RANDOMLY?
SELECTING A LETTER AT RANDOM
Digits are made to correspond to letters: a = 00–02, b = 03–05, ..., z = 75–77. Random digits then give random letters:
1559 9068 (Table 14, pg. 809) → 15 59 90 68 etc. (split into pairs) → f t w etc. (take the chosen letters; 90 corresponds to no letter and is passed over).
For samples without replacement, just pass over any duplicates.
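The letter-selection recipe can be sketched in Python. The helper names are hypothetical, and the digit string is the one quoted from Table 14; pairs 78 to 99 match no letter and are passed over, which is why 90 produces nothing.

```python
import string

def pair_to_letter(pair):
    """Map a two-digit pair to a letter: a = 00-02, b = 03-05, ..., z = 75-77.

    Pairs 78-99 correspond to no letter, so None is returned.
    """
    if pair > 77:
        return None
    return string.ascii_lowercase[pair // 3]

def letters_from_digits(digits, without_replacement=False):
    """Split random digits into pairs and turn each usable pair into a letter."""
    digits = digits.replace(" ", "")
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        letter = pair_to_letter(int(digits[i:i + 2]))
        if letter is None:
            continue                      # unused pair: pass over it
        if without_replacement and letter in chosen:
            continue                      # duplicate: pass over it
        chosen.append(letter)
    return chosen

print(letters_from_digits("1559 9068"))  # ['f', 't', 'w']
```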
35
The Great Trick is far more powerful than we have
seen. A typical sample closely estimates such
things as a population mean or the shape of a
population density. But it goes beyond this to
reveal how much variation there is among sample
means and sample densities. A typical sample
not only estimates population quantities; it
estimates the sample-to-sample variation of its
own estimates.
36
EXAMPLE ESTIMATING A MEAN
  • The average account balance is 421.34 for a
    random with-replacement sample of 50 accounts.
  • We estimate from this sample that the average
    balance is 421.34 for all accounts.
  • From this sample we also estimate and display a
    margin of error: 421.34 ± 65.22, using 1.96 s/√n.

s denotes "sample standard deviation"
37
SAMPLE STANDARD DEVIATION
NOTE: The sample standard deviation is $s = \sqrt{\sum (x - \bar{x})^2 / (n-1)}$. It may be
calculated in several equivalent ways, some
sensitive to rounding errors, even for n = 2.
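Two algebraically equivalent formulas for $s$ illustrate the rounding warning. In this Python sketch (our illustration, not the textbook's), the two-pass formula subtracts the mean before squaring, while the shortcut $\left(\sum x^2 - n\bar{x}^2\right)/(n-1)$ subtracts two huge, nearly equal numbers. With two hypothetical values near one billion, the shortcut loses the answer entirely in double-precision arithmetic, even though n = 2.

```python
import math

def s_two_pass(xs):
    """s = sqrt(sum((x - xbar)^2) / (n - 1)): subtract the mean first, then square."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

def s_shortcut(xs):
    """Equivalent on paper: sqrt((sum(x^2) - n * xbar^2) / (n - 1)).

    Subtracting two large, nearly equal numbers can cancel away
    every correct digit in floating-point arithmetic.
    """
    n = len(xs)
    xbar = sum(xs) / n
    var = (sum(x * x for x in xs) - n * xbar * xbar) / (n - 1)
    return math.sqrt(max(var, 0.0))  # rounding can even push var below zero

profits = [12.2, 15.3, 16.2, 12.8]            # slide 38 data: both formulas agree
print(round(s_two_pass(profits), 5), round(s_shortcut(profits), 5))  # 1.92765 1.92765

pair = [1e9 + 1, 1e9 + 3]                     # hypothetical values; true s = sqrt(2)
print(s_two_pass(pair), s_shortcut(pair))     # 1.4142135623730951 0.0
```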
38
EXAMPLE MARGIN OF ERROR CALCULATION
The following margin of error calculation for n = 4 is only an illustration; a sample of four
would not be regarded as large enough. Profits per sale: 12.2, 15.3, 16.2, 12.8. Mean = 14.125,
s = 1.92765, √4 = 2. Margin of error = ±1.96 × (1.92765 / 2). Report 14.125 ± 1.8891.
A precise interpretation of margin of error will be given later in the course, including the
role of 1.96. The interval 14.125 ± 1.8891 is called a 95% confidence interval for the
population mean. We used
$(12.2-14.125)^2 + (15.3-14.125)^2 + (16.2-14.125)^2 + (12.8-14.125)^2 = 11.1475$, so $s = \sqrt{11.1475/3} = 1.92765$.
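The same calculation in Python, using the standard library's `statistics.stdev` (which divides by n − 1, matching the slide's 11.1475/3). This is a sketch for checking the arithmetic, not a recommended workflow.

```python
import math
import statistics

profits = [12.2, 15.3, 16.2, 12.8]        # profits per sale (from the slide)
n = len(profits)
xbar = statistics.mean(profits)            # 14.125
s = statistics.stdev(profits)              # sample sd, divisor n - 1
margin = 1.96 * s / math.sqrt(n)

print(round(xbar, 3), round(s, 5), round(margin, 4))   # 14.125 1.92765 1.8891
print(f"{xbar:.3f} +/- {margin:.4f}")                  # 14.125 +/- 1.8891
```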
39
EXAMPLE ESTIMATING A PERCENTAGE
  • A random with-replacement sample of 50 stores
    participated in a test marketing. In 39 of these
    50 stores (i.e. 78%) the new package design
    outsold the old package design.
  • We estimate from this sample that in 78% of all
    stores the new design will outsell the old.
  • We also estimate a margin of error of ±11.5%.

Figured as $1.96\sqrt{\hat{p}\hat{q}}/\sqrt{n} = 1.96\sqrt{0.78 \times 0.22}/\sqrt{50} = 0.114823$ in the binomial setup.
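A Python sketch of the percentage margin of error; `proportion_margin` is a hypothetical helper name, and z = 1.96 is taken from the slide.

```python
import math

def proportion_margin(successes, n, z=1.96):
    """Sample proportion p_hat and margin z * sqrt(p_hat * q_hat) / sqrt(n)."""
    p_hat = successes / n
    q_hat = 1.0 - p_hat
    return p_hat, z * math.sqrt(p_hat * q_hat) / math.sqrt(n)

p_hat, margin = proportion_margin(39, 50)
print(p_hat, round(margin, 6))   # 0.78 0.114823
```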
40
Plot the average heights of tents placed at 10 and 14. Each tent has integral 1, as does their average.
TENTING TONIGHT
[Figure: the two tents and the density, their "average tent"]
41
THE MEAN OF A DENSITY IS TYPICALLY THE SAME AS
THE MEAN OF THE DATA FROM WHICH IT IS MADE.
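One way to see the tent construction numerically: in this Python sketch a tent is a triangle of area 1 centered at a data value, and the density is the average of the tents. The half-width of 3 is an arbitrary assumption (the slides do not give one). Because each tent is symmetric about its data value, the density's mean comes out equal to the data mean, 12; the integrals are approximated by sums on a fine grid.

```python
def tent(x, center, half_width):
    """Triangular tent with area 1: height 1/half_width at center, 0 beyond +/- half_width."""
    d = abs(x - center) / half_width
    return max(0.0, 1.0 - d) / half_width

def tent_density(x, data, half_width):
    """The density: the average height of the tents placed at the data values."""
    return sum(tent(x, c, half_width) for c in data) / len(data)

data = [10, 14]
h = 3.0                                   # hypothetical half-width
step = 0.001
grid = [i * step for i in range(24_001)]  # x from 0 to 24 covers both tents
heights = [tent_density(x, data, h) for x in grid]

area = sum(heights) * step                                      # integral of the density
density_mean = sum(x * f for x, f in zip(grid, heights)) * step
print(round(area, 3), round(density_mean, 3))                   # 1.0 12.0
```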
42
SMOOTHING DATA SO YOU CAN SEE IT
[Figure: tents at 10 and 14 and the density, their "average tent"]
43
Making the tents narrower isolates different
parts of the data and reveals more detail.
NARROWER TENTS MORE DETAIL
44
THE DENSITY BY ITSELF
[Figure: the density, made with narrow tents]
45
Histograms lump data into categories (the black boxes), which makes them less suitable for continuous data.
DENSITY OR HISTOGRAM?
[Figure: density vs. histogram]
46
Plot of the average heights of 5 tents placed at the data 12, 21, 42, 8, 9.
DENSITY FOR 12, 21, 42, 8, 9
[Figure: tents and density]
47

Narrower tents operate at higher resolution, but they may bring out features that are illusory.
IS DETAIL ILLUSORY?
[Figure: a kinkier and a smoother density; which do we trust?]
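A Python sketch of the resolution trade-off for the five values 12, 21, 42, 8, 9, reusing the tent idea with two hypothetical half-widths (5 and 20). At x = 30, between 21 and 42, the narrow tents report a gap (density 0) while the wide tents fill it in; with only five values, nothing in the data tells us which picture to trust.

```python
def tent(x, center, half_width):
    """Triangular tent with area 1 centered at a data value."""
    d = abs(x - center) / half_width
    return max(0.0, 1.0 - d) / half_width

def tent_density(x, data, half_width):
    """Average height of the tents at x."""
    return sum(tent(x, c, half_width) for c in data) / len(data)

data = [12, 21, 42, 8, 9]
for h in (5.0, 20.0):   # hypothetical narrow and wide half-widths
    print(h, round(tent_density(30, data, h), 4))
# 5.0 0.0
# 20.0 0.0105
```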
48
Population of N = 500 compared with two samples of n = 30 each.
BEWARE OVER-FINE RESOLUTION
POP mean 32.02
[Figure: population of N = 500 with 2 samples of n = 30]
49
BEWARE OVER-FINE RESOLUTION
Sample means are close: SAM1 mean 33.03, SAM2 mean 30.60, POP mean 32.02.
Densities are not good at fine resolution.
[Figure: population of N = 500 with 2 samples of n = 30]
50
The same two samples of n = 30 each from the population of 500.
WE DO BETTER AT COARSE RESOLUTION
SAM1 mean 33.03, SAM2 mean 30.60, POP mean 32.02
How about coarse resolution?
[Figure: population of N = 500 with 2 samples of n = 30]
51
WE DO BETTER AT COARSE RESOLUTION
SAM1 mean 33.03, SAM2 mean 30.60, POP mean 32.02
Good at coarse resolution.
[Figure: population of N = 500 with 2 samples of n = 30]
52
HOW ABOUT MEDIUM RESOLUTION?
SAM1 mean 33.03, SAM2 mean 30.60, POP mean 32.02
Medium resolution?
[Figure: population of N = 500 with 2 samples of n = 30]
53
HOW ABOUT MEDIUM RESOLUTION?
SAM1 mean 33.03, SAM2 mean 30.60, POP mean 32.02
Not good at medium resolution.
[Figure: population of N = 500 with 2 samples of n = 30]
54
A sample of only n = 600 from a population of N = 500 million (medium resolution).
SAMPLING ONLY 600 FROM 500 MILLION?
Large sample of n = 600? POP mean 32.02. Medium resolution?
[Figure: population of N = 500,000 with a sample of n = 600]
55
SAMPLING ONLY 600 FROM 500 MILLION? (MEDIUM resolution)
Sample of n = 600: sample mean 32.84, POP mean 32.02; the means are very close.
The densities are close.
[Figure: population of N = 500,000 with a sample of n = 600]
56
SAMPLING ONLY 600 FROM 500 MILLION? (FINE resolution)
Sample of n = 600: sample mean 32.84, POP mean 32.02.
At FINE resolution the densities are very close.
[Figure: population of N = 500,000 with a sample of n = 600]
57
TALKING POINTS
  • 1. The Great Trick of Statistics.
  • 1a. The overwhelming majority of all
    samples of n can stand in for the population to
    a remarkable degree.
  • 1b. Large n helps.
  • 1c. Do not expect a given sample to
    accurately reflect the population in many
    respects; that asks too much of a sample.
  • 2. The Law of Averages is one aspect of The
    Great Trick.
  • 2a. Samples typically have a mean that is
    close to the mean of the population.
  • 2b. Random samples are nearly certain to
    have this property since the overwhelming
    majority of samples do.
  • 3. A density is controlled by the width of the
    tents used.
  • 3a. Small samples zero in on coarse
    densities fairly well.
  • 3b. Samples in hundreds can perform
    remarkably well.
  • 3c. Histograms are notoriously unstable but
    remain popular.
  • Making a density from two to four values: issues
    of resolution.
  • With replacement vs. without: unlimited samples.
  • Using Table 14 to obtain a random sample.

58