Statistical Inference - PowerPoint PPT Presentation

Slides: 39
Provided by: robertas5
Transcript and Presenter's Notes

Title: Statistical Inference


1
Statistical Inference
2
Two Statistical Tasks
1. Description
2. Inference
3
Thus far, we have completed:
1. Descriptive Statistics
   a. Central tendency
      i. discrete variables
      ii. continuous variables
   b. Variation
      i. discrete variables
      ii. continuous variables
   c. Association
      i. discrete variables
4
Now we begin:
2. Inferential Statistics
   a. Estimation
   b. Hypothesis testing
5
Inferential statistics are based on random
sampling. A sample is a subset of some universe
(or population set). If (and only if) the
sample is selected according to the laws of
probability, we can make inferences about the
universe from known (statistical) characteristics
of the sample.
8
Random means selected so that each element in
the universe has exactly the same chance of being
picked for the sample (sometimes called an
equi-probability sample).
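
An equal-probability draw can be sketched in a few lines of Python. This is only an illustration: the universe of 1,000 numbered elements, the sample size of 50, and the seed are assumptions made for the example, not values from the slides.

```python
import random

# Hypothetical universe of 1,000 numbered elements (not from the slides).
universe = list(range(1, 1001))

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# random.sample draws without replacement; every element has the same
# chance (n / N) of ending up in the sample.
sample = rng.sample(universe, k=50)

inclusion_chance = len(sample) / len(universe)
print(sorted(sample)[:10])
print(f"chance of selection for each element: {inclusion_chance:.3f}")
```

Because nothing but the random number generator decides which elements are picked, the selected and unselected elements differ only by chance.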
9
Put differently, the only difference between
elements selected into the sample and those not
selected is pure chance (i.e., the luck of the
draw).
10
All inferential statistics evaluate the
probability that unlucky selection in creating a
random sample (the luck of the draw,
technically called sampling error) explains the
statistical outcomes obtained from random samples.
12
Sample 1: 75% cardinal (n1 = 4)
13
Sample 2: 0% cardinal (n2 = 4)
14
Sample 3: 25% cardinal (n3 = 4)
15
Percent cardinal    f
  0                 lowest
 25                 medium
 50                 highest
 75                 medium
100                 lowest
16
[Figure: distribution of percent cardinal in
random samples; horizontal axis: 0, 25, 50, 75, 100]
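
The lowest/medium/highest pattern for samples of size 4 can be reproduced with the binomial formula. The sketch below assumes that 50 percent of the universe is cardinal; the slides do not state this share, so treat `p` as an illustrative assumption.

```python
from math import comb

n = 4      # sample size, as in the slides
p = 0.5    # ASSUMPTION: 50 percent of the universe is cardinal

# Probability of k cardinals in a sample of n: C(n, k) * p^k * (1-p)^(n-k)
distribution = {
    100 * k // n: comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(n + 1)
}
for percent, prob in distribution.items():
    print(f"{percent:>3}% cardinal: {prob:.4f}")
```

Under that assumption, 50 percent cardinal is the most likely sample outcome, 25 and 75 percent are less likely, and 0 and 100 percent are the least likely.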
17
All statistics calculated on variables from a
random sample have a (known) sampling
distribution. Sampling distributions are the
theoretically possible distributions of
statistical outcomes from an infinite number of
random samples of the same size.
18
Knowing this, we do not actually need to draw an
infinite number of random samples. When we draw
ONE (large) random sample, CHANCES ARE that its
characteristics will be closer to the center of
its sampling distribution than the extremes. That
is, any sample statistic is likely to be close to
(rather than very different from) the actual
(unknown) value (parameter) in the universe.
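
A small simulation can illustrate why one large sample is usually enough. It is a sketch under stated assumptions: the true share of 0.5, the sample sizes of 10 and 1,000, the 200 repetitions, and the 5-point tolerance are all chosen for illustration and do not come from the slides.

```python
import random

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
TRUE_SHARE = 0.5        # ASSUMPTION: the (normally unknown) universe parameter

def share_close(sample_size, n_samples=200, tolerance=0.05):
    """Fraction of random samples whose proportion lands within
    `tolerance` of the true share."""
    close = 0
    for _ in range(n_samples):
        hits = sum(rng.random() < TRUE_SHARE for _ in range(sample_size))
        if abs(hits / sample_size - TRUE_SHARE) <= tolerance:
            close += 1
    return close / n_samples

small = share_close(10)
large = share_close(1000)
print(f"n=10:   {small:.2f} of samples within 5 points of the parameter")
print(f"n=1000: {large:.2f} of samples within 5 points of the parameter")
```

With the larger sample size, nearly all sample proportions should fall close to the parameter, while small samples scatter much more widely.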
19
For example, when we find that the value of χ²
for the association between two variables in a
large random sample is 13.748, chances are that
the (unknown) value of χ² for the universe (the
so-called true value) is similar rather than
very different.
20
The question is: Does this sample value of χ²
permit us to infer that the two variables are
(probably) related or are (probably) independent
in the universe? The answer requires knowing how
to use the Chi-Square sampling distribution(s).
21
Sampling distributions allow us to identify the
probability that a sample statistic has a similar
value in the universe from which the random
sample was drawn (that is, whether the value
holds in general, not merely for the sample).
22
Unfortunately, χ² has not one but several
sampling distributions, each differently shaped.
The one that is relevant for the specific
inference we wish to make can be identified by
knowing the number of degrees of freedom involved
in the calculation of this sample statistic.
24
In the case of contingency tables
(crosstabulations), degrees of freedom associated
with χ² are a function of the size of the table
(i.e., the number of rows and columns).
Specifically, df = (R - 1)(C - 1)
25
For example, a contingency table having two rows
and two columns (i.e., a 2 x 2 table) has only
one degree of freedom:
df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = (1)(1) = 1
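
The degrees-of-freedom rule for contingency tables is simple enough to express as a one-line function. The function name below is hypothetical, chosen only for this sketch.

```python
def degrees_of_freedom(rows, columns):
    """df = (R - 1)(C - 1) for an R x C contingency table."""
    if rows < 2 or columns < 2:
        raise ValueError("a contingency table needs at least 2 rows and 2 columns")
    return (rows - 1) * (columns - 1)

print(degrees_of_freedom(2, 2))  # 2 x 2 table
print(degrees_of_freedom(3, 2))  # 3 x 2 table
```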
26
           Column
Row      One    Two    Total
One        ?      ?      100
Two        ?      ?      200
Total    200    100      300
27
           Column
Row      One    Two    Total
One       96      ?      100
Two        ?      ?      200
Total    200    100      300
28
           Column
Row      One    Two    Total
One       96      4      100
Two        ?      ?      200
Total    200    100      300
29
           Column
Row      One    Two    Total
One       96      4      100
Two      104      ?      200
Total    200    100      300
30
           Column
Row      One    Two    Total
One       96      4      100
Two      104     96      200
Total    200    100      300
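
The sequence of tables shows what one degree of freedom means: once the marginal totals are fixed, a single cell determines all the others. A minimal sketch of that bookkeeping follows; the function name and argument layout are hypothetical.

```python
def complete_2x2(top_left, row_totals, column_totals):
    """Fill in a 2 x 2 table from its marginal totals and one known cell."""
    top_right = row_totals[0] - top_left
    bottom_left = column_totals[0] - top_left
    bottom_right = row_totals[1] - bottom_left
    # The remaining column total must agree; otherwise the margins are inconsistent.
    if top_right + bottom_right != column_totals[1]:
        raise ValueError("marginal totals are inconsistent")
    return [[top_left, top_right], [bottom_left, bottom_right]]

table = complete_2x2(96, row_totals=(100, 200), column_totals=(200, 100))
print(table)
```

Only the value 96 was free to vary; the other three cells follow by subtraction.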
31
An Example

                         Year
Party Preference    1984     1985    Total
Democrat             545      595    1,140
Independent          528      462      990
Republican           370      455      825
Total              1,443    1,512    2,955

For the crosstabulation in this table, χ² =
13.748. Is the association in this table
confined to the sample, or does this mean that
there was a real shift in party identification
from one year to the next? There are several
steps in answering this question.
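
As a check on the reported value, chi-square can be recomputed from the cell counts using the usual formula: the sum over cells of (observed - expected)² / expected, where expected = row total × column total / grand total. The variable names are illustrative; the counts are taken from the table.

```python
# Observed counts from the table: (1984, 1985) for each party preference.
observed = {
    "Democrat":    (545, 595),
    "Independent": (528, 462),
    "Republican":  (370, 455),
}
row_totals = {party: sum(counts) for party, counts in observed.items()}
column_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(column_totals)

chi_square = 0.0
for party, counts in observed.items():
    for j, obs in enumerate(counts):
        # Expected count if party preference and year were independent.
        expected = row_totals[party] * column_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

print(column_totals, grand_total)
print(round(chi_square, 3))
```

The result should come out very close to 13.748; any difference in the last digit reflects rounding in a hand calculation.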
32
Since these data are from a large random sample,
we can use the laws of chance to infer whether
this value represents a real shift in the
universe (i.e., among people in the U.S. in
general) or is merely an artifact of sampling
(bad luck in randomly selecting 2,955 people who
are NOT like the rest of the population).

We know that 13.748 is ONE of the values on a
sampling distribution of χ², but which sampling
distribution? Since df = 2, i.e., (3-1)(2-1),
we can determine that the sampling distribution
is the one whose values are located in row 2 of
the table in Appendix 4, the Critical Values of
Chi-Square.

We need a DECISION RULE or CUT POINT to decide
whether this represents a true shift or merely
the result of chance in drawing the random sample.
34
We must decide what chance of being wrong we want
to entertain in deciding between a true
relationship between changes over time and
political party preference (i.e., one that
actually exists in the universe) and an artifact
of sampling (i.e., a relationship that exists
nowhere else except in our sample due to the
luck of the draw).

Actually, with Appendix 4 we are limited to some
conventional probabilities of deciding
incorrectly: 10 percent (.10, column 1),
5 percent (.05, column 2), 1 percent (.01,
column 3), or 1/10 of 1 percent (.001, column 4).
Until we have introduced some additional
criteria, let's stick with a 5 percent chance of
incorrectly deciding between a real association
and chance.
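
For readers without the appendix at hand, the df = 2 row of a chi-square critical-value table can be reproduced directly. This sketch relies on a closed form that holds for 2 degrees of freedom only, where P(χ² > x) = exp(-x/2); other rows need a statistics library or a printed table. The function name is hypothetical.

```python
from math import log

def critical_value_df2(alpha):
    """Critical value of chi-square for df = 2 only.

    With 2 degrees of freedom, P(chi-square > x) = exp(-x / 2),
    so the cut point for a given alpha is -2 * ln(alpha).
    """
    return -2 * log(alpha)

for alpha in (0.10, 0.05, 0.01, 0.001):
    print(f"alpha = {alpha:<5}  critical value = {critical_value_df2(alpha):.3f}")
```

The smaller the chance of error we are willing to accept, the larger the sample chi-square must be before we attribute it to something other than chance.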
35
This is known as an alpha level (or significance
level) and is expressed as α = 0.05.

It means that we accept only a 5 percent chance
of deciding that an association is true in the
universe when it is actually due to chance (and
exists only in the sample). In other words, if
the association were due to chance alone, we
would have a 95 percent chance of deciding
correctly.
36
Having decided on an alpha level of .05 (i.e.,
accepting a 5 percent chance that we will decide
incorrectly) and knowing the appropriate
Chi-Square sampling distribution (one defined by
2 degrees of freedom), we can find the critical
value of χ². From row 2 (df = 2) and column 2
(α = .05) of Appendix 4, we find that the
appropriate critical value is 5.99. Since χ² for
the data was calculated to be 13.748 and since
13.748 is GREATER than the critical value, we
conclude that the odds favor there being a true
association between party preference and year of
poll. In other words, there is less than a 5
percent chance that this association could be due
to chance (by randomly selecting people who are
atypical of the rest of the population).
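
The decision rule can be written out as a short comparison. This is a sketch, not a general-purpose test: it uses the closed form P(χ² > x) = exp(-x/2), which is valid only for 2 degrees of freedom, and the variable names are illustrative.

```python
from math import exp, log

chi_square = 13.748   # sample value reported in the slides
alpha = 0.05          # chosen significance level

# Valid for df = 2 only, i.e., (3 - 1)(2 - 1):
# P(chi-square > x) = exp(-x / 2)
critical_value = -2 * log(alpha)
p_value = exp(-chi_square / 2)

exceeds_cut_point = chi_square > critical_value
print(f"critical value: {critical_value:.2f}")
print(f"p-value: {p_value:.5f}")
print("association unlikely to be due to chance alone" if exceeds_cut_point
      else "cannot rule out chance")
```

Comparing the sample value with the critical value and comparing the p-value with alpha are two views of the same rule, so they always lead to the same decision.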
38
Recapitulation
1. Statistical inference involves generalizing
   from a sample to a (statistical) universe.
2. Statistical inference is only possible with
   random samples.
3. Statistical inference estimates the
   probability that a sample result could be due
   to chance (in sample selection).
4. Sampling distributions are the keys that
   connect (known) sample statistics and
   (unknown) universe parameters.
5. Alpha levels are used to identify critical
   values on sampling distributions.