Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)

Slides: 79

Provided by: hpcus552

Transcript and Presenter's Notes

1
Basic Quantitative Methods in the Social
Sciences (AKA Intro Stats)

02-250-01
Lecture 5

2
Sampling Distributions

Inferential Statistics generalizes findings
obtained from samples to the populations that the
samples were drawn from
Samples need to be representative of the
populations they are drawn from so we use
random sampling

3
Random Sample

Random Sample: a sample in which each member of
the population has an equal chance of being
included
We cannot assume that a random sample is exactly
representative of its population
E.g., if we randomly choose 50 students from this
class, their mean age may not be exactly the
mean age of the entire class (the population:
approx. 230 students)

4
Random Sampling

Random sampling makes all the samples which could
be drawn from the population equally likely
(e.g., who is included in the 50 student sample)
Each of the possible samples of 50 students would
have mean ages that would slightly differ from
the population mean age
We measure this difference with sampling error

5
Sampling Error

Sampling Error: the difference between a
statistic and the parameter it estimates
E.g., if the population mean age was 24 and the
sample mean age was 21, we say we have a sampling
error of 3 years

6
Sampling Error

Because we usually don't collect data for an
entire population, we must have some way of
estimating the sampling error size and account
for it when we generalize sample information to
populations
We often obtain more samples to determine the
sampling error

7
Sampling Distributions

If we draw 6 samples of 50 students from this
class, we can obtain a better estimate of the
true population mean age than if we only drew one
sample
Suppose the mean ages for those 6 samples were as
follows:
25, 23, 23, 25, 25, 26
The mean of these 6 mean ages is 24.5
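
The arithmetic above can be checked with a short Python sketch. The variable names are illustrative only and are not part of the lecture.

```python
from statistics import mean

# Mean ages of the 6 samples listed on the slide
sample_means = [25, 23, 23, 25, 25, 26]

# The "mean of the means" used as the improved estimate of the population mean
grand_mean = mean(sample_means)

# How far apart the sample means are gives a rough feel for sampling error
spread = max(sample_means) - min(sample_means)

print(grand_mean)  # 24.5
print(spread)      # 3
```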

8
Sampling Distributions

Looking at the mean age of the first sample, 25
years, if we only had data for this one sample,
25 years would be our best estimate of the true
population mean
By taking more than one sample, we calculate a
more accurate estimate of the population mean,
24.5 years

9
Sampling Error

Since all of the ages are relatively close to
each other, we can say with greater certainty
that we have small sampling error for any one of
the sample means
If the samples' mean ages were much more
dissimilar, any one of the sample age means would
probably have a much higher sampling error

10
Sampling Error

This means that the variability of a statistic
over repeated samplings gives us some indication
of sampling error
If we continued to draw samples from the
population until all possible samples had been
drawn, and entered the statistic of interest
(mean age) from each sample into a frequency
distribution, the result would be known as a
sampling distribution

11
Sampling Distributions

Sampling Distribution: the distribution of a
statistic over repeated sampling from a specified
population
Using our previous example, the sampling
distribution of the mean for this class is a
distribution of the means of every possible
sample of 50 students
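
Drawing every possible sample of 50 is not practical, but the idea can be approximated by drawing many random samples. The sketch below assumes a made-up population of 230 ages (uniform between 18 and 30), since the class's real ages are not given. The population values, the seed, and the number of repetitions are all illustrative assumptions.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 230 students (invented values)
population = [rng.randint(18, 30) for _ in range(230)]

# Approximate the sampling distribution of the mean with 2000 samples of 50
sample_means = [mean(rng.sample(population, 50)) for _ in range(2000)]

population_mean = mean(population)
mean_of_means = mean(sample_means)       # should sit close to the population mean
spread_of_means = pstdev(sample_means)   # much smaller than the spread of ages

print(population_mean, mean_of_means, spread_of_means)
```

The mean of the sample means lands very close to the population mean, and the sample means vary far less than individual ages do.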

12
Expected Value

The mean of a sampling distribution of X̄
is known as the expected value of the mean:
the mean of the sample means
We use the symbol µ instead of X̄ for the mean
of a sampling distribution because it is a
population of terms

13
Standard Error

The standard deviation of a sampling distribution
is known as the standard error (σ_X̄): the
standard amount of difference between X̄ and µ
that is reasonable to expect simply by chance
The mean of any sample we take can be plotted on
the sampling distribution of X̄ if we know µ_X̄
and σ_X̄
The sampling distribution of X̄ is a normal
distribution (approximately so when samples are
reasonably large)

14
Sampling Distribution
[Figure: sampling distribution of the mean; the distance between µ_X̄ and the X̄ obtained from one sample is labelled sampling error]
15
Standard Error

The formula for standard error is as follows
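
The formula itself does not appear in this transcript. The standard definition is sketched below. It agrees with the worked example later in the lecture, where σ = 15 and n = 36 give a standard error of 2.5. The notation is assumed rather than taken from the slide.

```latex
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}
```

Here σ is the population standard deviation and n is the number of scores in each sample.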

16
Sampling Distributions

We usually don't know µ_X̄ and σ_X̄ and must
estimate σ_X̄
Sampling Distributions are the basis for many
statistical tests (e.g., the t-test; we'll talk
about this later)
Statistical tests are a mathematical way of
testing a hypothesis

17
Hypothesis Testing

Hypothesis testing is a way of examining a
statement about a relationship between
independent and dependent variables
Independent variable: the variable whose effects
the experimenter is interested in studying
Dependent variable: the variable that the
experimenter measures (the data)

18
Independent and Dependent Variables - Example

If an experimenter is interested in researching
how hours of studying for an exam affect
performance on a test, the variables are as
follows:
Independent Variable (IV): hours spent studying
Dependent Variable (DV): performance on test
(e.g., grade received)

19
Independent Variables

There are 2 broad types of IVs
Treatment Variable: a treatment the experimenter
applies to previously undifferentiated
participants
E.g., certain participants are told to study for
5 hours and others are told to study for 2 hours
Categorical Variable: a characteristic that is
inherent to, or pre-exists in, the participant
E.g., gender: you can't assign someone a gender

20
Levels of IV

We also talk about the levels of IVs: how we
break down the IV
E.g., if we are interested in studying the IV of
hours spent studying, it could have 2 levels: 2
hours and 5 hours
Studying the IV of gender has 2 levels: male and
female
The levels of an IV are compared on their DV
scores to look for a difference in outcome: is
there a difference in test performance between
those who study for 5 hours and those who study
for 2?

21
Time to Think

A nursing researcher wants to know if giving TLC
prolongs life in cancer patients. 50 cancer
patients are divided into two groups: group A
(n = 25) is given TLC by their nurses, and group B
(n = 25) is not. What is the DV, IV, and levels of
IV?
A researcher wants to know if members of the
Federal Liberal Party are wealthier than are
members of the Federal NDP. 100 members of each
party are asked to submit financial statements.
What is the DV, IV, and levels of IV?

22
Null Hypothesis

Tests of hypotheses in science are decisions to
retain or reject a null hypothesis (Ho)
Null hypothesis (Ho): a statement of
relationship between the IV and DV, usually a
statement of no difference or no relationship;
we assume there is no relationship between IV and
DV

23
Null Hypothesis Examples

Men and women do not differ in IQ (µ_men = µ_women)
Hours spent studying do not affect test
performance (µ_2 hours = µ_5 hours)
Height does not affect weight (µ_short =
µ_tall)

24
Null Hypotheses

Null hypotheses contain 3 components
The IV comparison being made
The DV being measured
The null relationship between IV and DV (e.g.,
do not differ)

25
Alternative Hypothesis

Although not directly tested, the Alternative
Hypothesis (Ha) does state a relationship, or
effect, of the IV on the DV; this is often
called the Research Hypothesis
E.g.,
Ha: Men and women do differ in IQ (µ_men ≠
µ_women)
Ha: Women have higher IQs than men (µ_women > µ_men)

26
Directional Ha

Ha: Women have higher IQs than men (µ_women >
µ_men) is a directional alternative hypothesis:
we state that one level of the IV will have
greater (or lesser) DV scores than the other
level
When we make a directional alternative
hypothesis, we have a reason (either based on
past research or a theory) to predict the
direction of the results (i.e., that a statistic
at one level of the IV will be greater or less
than the statistic at the other level of the IV)
(note: the above example is hypothetical only)

27
Non-Directional Ha

A non-directional alternative hypothesis does not
state the expected direction of effect
Ha: Men and women have differing IQs (µ_women ≠
µ_men)
We make a non-directional alternative hypothesis
when we have no reason to predict the direction
of the results. For instance, since there is no
theory or research body that would suggest that
women should have higher IQs than men, we would
only predict that women's IQs differ from
men's

28
Hypothesis Testing

Hypothesis testing looks at the observed
difference in DV scores between the levels of the
IV and compares this difference to the expected
difference (Ho)
Any difference in value of the DV between the
levels of the IV can be explained in 2 ways: the
effect of the IV or sampling error

29
Hypothesis Testing

Testing the null hypothesis is a way of
determining the probability that the observed
outcome could be found if the null hypothesis was
true
E.g., if we did find a difference between the IQs
of men and women, what is the chance we would
find this result if there is actually no
difference between their IQs?

30
Confidence Levels

When this probability drops below a certain
level (a criterion level), we call the result
significant
This criterion level is known as the significance
level of the test, or alpha (α); these slides
refer to it as the confidence level

31
Confidence Level

Confidence Level: a criterion level of
probability (alpha, α), set by the experimenter,
which acts as the reference for deciding whether
to reject or retain the null hypothesis
Significant Result at .05: we reject the null
hypothesis, accepting a 5% chance of rejecting
it when it is actually true.

32
Confidence Level

The confidence level is set by the experimenter,
but generally the convention is to use α = 0.05
or α = 0.01
For α = 0.05, this means that there is a 5%
chance we will reject the null hypothesis when it
is actually true

33
Rejecting the Null Hypothesis

If the likelihood of observing this outcome is
below the confidence level (α = 0.05 or α =
0.01), then we say that the result is significant
and we reject the null hypothesis
Significant results: reject Ho (there is a
difference)
Non-significant results: retain Ho (there is no
difference)

34
Type I and Type II Errors

When we decide to retain or reject the null
hypothesis, we never do so with 100% certainty
that we are making the right decision; we make
the decision with a known risk of being wrong
(set by the alpha level)
We can make an incorrect decision, resulting in 2
types of errors, Type I or Type II

35
Type I Errors

Type I Error: rejection of the null hypothesis
when it is true
We conclude that the IV affects or is related to
the DV when in reality the result was due to
sampling error
We see something that is not really there

36
Type I Error Example

If our null hypothesis is that men and women do
not differ in IQ, the Type I error is
Finding a result that men and women do differ in
IQ, when in reality they do not
We find this difference because of sampling error

37
Type II Errors

Type II Error: retention of the null hypothesis
when it is false
We conclude that the IV does not affect or is not
related to the DV when in reality there is an
effect or relationship
We fail to see something that is really there

38
Type II Error Example

If our null hypothesis is that men and women do
not differ in IQ, the Type II error is
Finding a result that men and women do not
differ in IQ, when in reality they do

39
Type I and Type II Errors
40
Type I and Type II Errors

The probability of making a Type I error is equal
to the confidence level of the statistical test
(α = 0.05 or α = 0.01)
When you lower the probability of making a Type I
error (e.g., use α = 0.01 instead of α = 0.05)
you increase the probability of making a Type II
error, all else being equal
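
The first point can be illustrated by simulation: when the null hypothesis really is true, a z-test should reject it in roughly a proportion α of samples. This minimal sketch borrows µ = 100, σ = 15 and n = 36 from the worked example later in the lecture. The function name, seed, and number of trials are assumptions made for illustration.

```python
import random
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, n=36, mu=100.0, sigma=15.0,
                      trials=4000, seed=1):
    """Estimate how often a two-tailed z-test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)                     # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population in which Ho really is true
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z_obs = (sum(sample) / n - mu) / standard_error
        if abs(z_obs) > z_crit:
            rejections += 1                       # this rejection is a Type I error
    return rejections / trials


rate_05 = type_i_error_rate(alpha=0.05)
rate_01 = type_i_error_rate(alpha=0.01)
print(f"alpha = 0.05: rejected a true Ho in {rate_05:.3f} of samples")
print(f"alpha = 0.01: rejected a true Ho in {rate_01:.3f} of samples")
```

Both rates should land near their alpha levels, with the stricter alpha producing fewer false rejections.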

41
Forget About It!

For this class, you do not need to know how to
determine the numerical value of a Type II error,
nor do you need to understand power
You do need to understand what a Type II error is

42
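Consider a Sampling Distribution of Arts
Students' GPAs.
[Figure: sampling distribution of mean GPA centred on µ_X̄ = 6, with one obtained sample mean X̄ = 10; the distance between them is labelled sampling error]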
43
What might this mean?

This sample's mean (10) appears to be
substantially larger than the population mean
(6). Why might this be?
Perhaps there is something distinct about this
sample such that it is not really part of this
sampling distribution to begin with (e.g., maybe
there are gifted arts students)
Alternatively, perhaps it's just a fluke, and we
just happened to have sampled a bunch of good
arts students. Stated differently, perhaps this
sample mean is part of the sampling distribution
of arts students

44
Reminder

We can determine the proportion of scores (in
this situation, sample means) that would fall to
the right of the sample mean in question by
looking at a normal distribution table (Table
E.10).
To do so, we need to know the Z value of this
sample mean. We will come back to this (but for
the sake of clarity, note that we will be learning to
calculate a z-test, which uses a slightly
different formula than the z-score formula that
you know)

45
One vs. Two Tailed Tests

The tails of a test set up our rejection region:
they determine how we decide to retain or
reject Ho
When we use a one-tailed test, we are testing the
null hypothesis for a directional alternative
hypothesis (e.g., Ha: women will have higher IQs
than men)
We are only interested in whether or not women
have higher IQs than men, not lower

46
Two-Tailed Tests

When we use a two-tailed test, we are testing the
null hypothesis for a non-directional alternative
hypothesis (e.g., Ha: women and men will have
different IQs)
Here, we are interested in whether or not women
have higher or lower IQs than men

47
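One vs. Two Tailed Tests (using α = 0.05)
[Figure: normal curves showing the rejection regions: 2.5% in each tail for a two-tailed test, 5% in a single tail for a one-tailed test]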
48
Two-Tailed Tests

Once we begin discussing t-tests, you will see
that the value that determines whether or not our
observed statistic falls above or below the α =
0.05 cutoff depends on a number of factors
For now, know that we reject Ho if our observed
statistic is significantly greater than our
expected statistic

49
Test Statistics

A test statistic is a number calculated from the
scores of a sample that allows us to test a Null
Hypothesis and make a decision to reject or
retain the Ho
We will be talking about various test statistics
for the remainder of the term, and will begin
with the z-statistic today

50
Z-scores Revisited

We know, by using the z-score formula, the
probability of obtaining a score less than a
given X value in a standard normal distribution
E.g., when X = 70, µ = 100, and σ = 20: z = (70 - 100) / 20 = -1.5

51
The smaller portion area is .0668 (from Table
E.10)
[Figure: normal curve with the area below z = -1.5 (X = 70) shaded, .0668; the mean is at z = 0 (X = 100)]
52
Interpreting z

This means that if we randomly select one score
from this distribution, the probability of that
score being less than 70 is .0668
But what if we want to test the hypothesis that a
sample of n scores (mean = 70) is actually a part
of the population (mean = 100, sd = 20)?
We no longer use the z-score formula; we use a
z-statistic
Remember: whenever we are testing a hypothesis,
we use a test statistic
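
The z-score example above (X = 70 in a distribution with mean 100 and sd 20) can be reproduced without a printed table. This minimal sketch uses `NormalDist` from Python's standard `statistics` module in place of Table E.10; the variable names are illustrative.

```python
from statistics import NormalDist

# Population from the slide: mean 100, standard deviation 20
population = NormalDist(mu=100, sigma=20)
x = 70

# z-score formula for a single score
z = (x - population.mean) / population.stdev

# Proportion of scores below x (the "smaller portion" in the table)
p_below = population.cdf(x)

print(z)                  # -1.5
print(round(p_below, 4))  # 0.0668
```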

53
What is Sigma?

Usually, we do not know sigma (σ), the sd for
a population (because obtaining data for an
entire population is usually not done)
Sometimes we do know sigma (e.g., for common
psychological tests)
When we know sigma, we can obtain the sampling
distribution of the mean when the Null Hypothesis
is true (that the sample does come from the
population)

54
Null Hypothesis

When we compare a sample mean with a population
mean, the Null Hypothesis is that the sample DOES
come from that population
Ho: X̄ = µ, or that 70 = 100
But how can 70 = 100?
Recall that a sample extracted from a population
with µ = 100 will more than likely result in a
sample mean that is above or below 100 because of
sampling error
When we test a Null Hypothesis, we are testing to
see if the sample mean and population mean are
statistically different from each other (i.e., at
an alpha level of .05, whether a difference as
large as 70 vs. 100 would occur less than 5% of
the time through sampling error alone)

55
Sampling Distribution of the Mean

In hypothesis testing, we set up the sampling
distribution of the mean and then calculate a
test statistic to determine if we can reject the
Ho
How is this done? Whenever we know σ we use a
z-test: we know X̄ for the one sample of
interest, we know µ for the population, so we
can calculate σ_X̄ (standard error for the
sampling distribution of the mean)

56
Standard Error Revisited

Last week, we stated that the standard deviation
of a sampling distribution of the mean is called
standard error
Standard error is used in test statistic formulae
because we are using sampling distributions of
the mean

57
Z Statistic

If testing a null hypothesis that a sample mean
is equal to the population mean (and sigma is
known), we must use the following formula for the
z-statistic (standard error instead of standard
deviation)
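
The formula does not appear in this transcript. The standard form of the one-sample z-statistic is sketched below. It agrees with the worked example later in the lecture, where X̄ = 105, µ = 100, σ = 15 and n = 36 give zobs = 2.00. The notation is assumed rather than taken from the slide.

```latex
z_{obs} = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}},
\qquad
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}
```

The only change from the z-score formula is the denominator: the standard error replaces the standard deviation.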

58
The z-statistic

Why zobs? When we test the Ho, we will compare
this zobs (our z observed) value with a zcrit
(our z critical) value
Note: zobs is often also called zobt (for z
obtained)
Hypothesis testing compares the absolute value of
zobs and zcrit in the following way
If |zobs| > |zcrit|, we reject the null hypothesis
If |zobs| < |zcrit|, we retain the null hypothesis
If |zobs| = |zcrit|, we retain the null hypothesis

59
Zcrit

The zcrit value is determined based on the alpha
level used (usually alpha = .05)
zcrit is the z-score beyond which a sample mean
would fall with probability less than .05 if the
sample came from the population (the score that
marks the tail)
We use Table E.10 to determine zcrit
Why might we be interested in this?
We will know if we are using a one-tailed or
two-tailed z-test based on our research question
If we use a one-tailed test, the area in that
tail is .05
If we use a two-tailed test, the area in EACH
tail is .025 (.05/2 tails)

60
Determining zcrit

When we discussed z-scores, we reviewed problems
where you knew the proportion of scores and
needed to determine the z-score (e.g., the
lowest 10%)
Determining zcrit is a similar process
Step 1: one-tailed or two-tailed?
Step 2: alpha = .05 or alpha = .01?
Step 3: Find the area in the smaller portion
column in Table E.10 to determine the zcrit

61
Tail Review
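[Figure: normal curves showing 2.5% in each tail for a two-tailed test and 5% in a single tail for a one-tailed test]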
62
zcrit for Two-tailed Tests

Alpha = .05 means that there is .025 per tail
Find .025 in the smaller portion column
zcrit = 1.96
Note! This is two-tailed, so this means
zcrit = ±1.96
Alpha = .01 means that there is .005 per tail
Find .005 in the smaller portion column
zcrit = 2.57
Note! The exact smaller portion of .005 is not
in the table. The values of .0049 and .0051 are
listed, so which do we use? Convention dictates
that we use zcrit = ±2.57

63
zcrit for One-tailed Tests

Alpha = .05 means that there is .05 in the tail
Find .05 in the smaller portion column: zcrit =
1.64
Note! The exact smaller portion of .05 is not
in the table. The values of .0495 and .0505 are
listed, so which do we use? Convention dictates
that we use zcrit = 1.64
Note! To determine if this is a + or - zcrit,
look at your Alternative Hypothesis
Alpha = .01 means that there is .01 in the tail
Find .01 in the smaller portion column: zcrit =
2.33
Note! .0099 and .0102 are listed; we use .0099 (z
= 2.33) because it is closest to .0100
Note! To determine if this is a + or - zcrit,
look at your Alternative Hypothesis
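
The table lookups above can be cross-checked by computing critical values directly. This minimal sketch uses `NormalDist().inv_cdf` from Python's standard library; the helper name `z_crit` is hypothetical. The computed values are not rounded to the table's grid, so they come out near 1.645 and 2.576 where the table conventions on these slides give 1.64 and 2.57.

```python
from statistics import NormalDist


def z_crit(alpha, tails):
    """Positive critical z for a given alpha and number of tails (1 or 2)."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Each tail holds alpha / tails of the area under the standard normal curve
    return NormalDist().inv_cdf(1 - alpha / tails)


two_tailed_05 = z_crit(0.05, tails=2)  # about 1.960
two_tailed_01 = z_crit(0.01, tails=2)  # about 2.576
one_tailed_05 = z_crit(0.05, tails=1)  # about 1.645
one_tailed_01 = z_crit(0.01, tails=1)  # about 2.326

for value in (two_tailed_05, two_tailed_01, one_tailed_05, one_tailed_01):
    print(round(value, 3))
```

The function returns the positive value only. As on the slides, the sign for a one-tailed test comes from the Alternative Hypothesis.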

64
Z-test Hypothesis Testing Steps

1. State level of significance
α = 0.05 (α = 0.05 is usually used)
OR α = 0.01
2. State IV, levels of IV, and DV
3. State the hypotheses
Null hypothesis (Ho)
Alternative Hypothesis (Ha)
Note! At this point you need to read the question
carefully to decide if you are testing a
directional or nondirectional hypothesis

65
Z-test Hypothesis Testing Steps

4. Determine if you are using a one-tailed or
two-tailed test
A one-tailed test is used when you test a
directional hypothesis
A two-tailed test is used when you test a
nondirectional hypothesis
5. Find the rejection region
I.e., find your zcrit!
It is usually a good idea to draw the normal
curve and plot your zcrit at this point; it
helps!

66
Z-test Hypothesis Testing Steps

6. Calculate your z statistic (zobs)
7. Compare zcrit to zobs
Plot zobs on your normal distribution
Compare the numerical value of zcrit to zobs

67
Step 7 Example 1 (alpha = .05)

If zobs = 2.59

Two-tailed (.025 in each tail): zcrit = ±1.96, zobs = 2.59
One-tailed (.05 in the tail): zcrit = 1.64, zobs = 2.59
68
Step 7 Example 2 (alpha = .05)

If zobs = -1.75

Two-tailed (.025 in each tail): zcrit = ±1.96, zobs = -1.75
One-tailed (.05 in the tail): zcrit = -1.64, zobs = -1.75
69
Step 7

Null Hypotheses are rejected when zobs falls in
the rejection region (the area beyond zcrit).
The rejection region is the tail of the
distribution
OR Null Hypotheses are rejected when
zobs > zcrit
BUT! What about when the zobs and zcrit are both
negative numbers?
In this case, think of rejecting Ho when the
absolute value of zobs > the absolute value of zcrit
Absolute value means that you remove the
negative sign from both numbers (e.g., the
absolute value of -5.5 is 5.5)
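
The comparison rule can be written as a small Python sketch and applied to the two Step 7 examples (zobs = 2.59 and zobs = -1.75). The function names are hypothetical. The one-tailed version also checks that zobs falls in the predicted tail, which is an added assumption: the slides state only the absolute-value rule.

```python
def reject_two_tailed(z_obs, z_crit):
    """Reject Ho when zobs lies beyond zcrit in either tail."""
    return abs(z_obs) > abs(z_crit)


def reject_one_tailed(z_obs, z_crit):
    """Reject Ho only when zobs lies beyond zcrit in the predicted tail.

    z_crit carries the sign of the predicted direction
    (e.g. +1.64 for "greater than", -1.64 for "less than").
    """
    same_tail = (z_obs > 0) == (z_crit > 0)
    return same_tail and abs(z_obs) > abs(z_crit)


# Example 1: zobs = 2.59
print(reject_two_tailed(2.59, 1.96))    # True
print(reject_one_tailed(2.59, 1.64))    # True

# Example 2: zobs = -1.75
print(reject_two_tailed(-1.75, 1.96))   # False
print(reject_one_tailed(-1.75, -1.64))  # True
```

Example 2 shows why the choice of tails matters: the same zobs is retained under a two-tailed test and rejected under a one-tailed test.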

70
Step 8 (Last One)

Step 8. State conclusions in words
Once you decide to reject or retain Ho, you need
to state your conclusions
So what does rejecting the Ho actually mean for
this research study?
OR What does retaining the Ho actually mean for
this research study?

71
Step 8 continued

Rejecting the Ho for z-tests means that the
sample mean is significantly different from the
population mean, i.e., there is less than a 5%
chance that a sample extracted from this
population would result in such a sample mean
(because it's in the tail end)
BUT! For one-tailed tests, make sure that you
state how they are different (i.e., is the sample
mean greater or less than the population mean)
Your conclusions should be clear enough that
anyone in the general public could understand
what the study found

72
An Example Using the Z-test

Scientists have come up with a breakthrough new
drug; they assert that taking this drug
will affect your IQ. Because it is so new they
are hoping it makes you smarter, but at this
point it might also make you dumber. A sample of
36 people has X̄ = 105, the population µ = 100, and
the population σ = 15. Test their hypothesis.

73
Example cont.

1. State level of significance: α = 0.05 (what
is usually used)
2. State IV and DV
IV: pill (levels: pill and no pill)
DV: IQ scores
3. Null hypothesis
The drug does not make you smarter or dumber
(i.e., the sample mean does not differ from the
population mean)
Alternative Hypothesis
The drug makes you either smarter or dumber

74
Example

4. B/c this hypothesis is non-directional, we use
a two-tailed test
5. Find the rejection region: α = 0.05, so with a
two-tailed test we want a critical value that
represents a region of rejection that makes up
0.025 of the area in each tail

[Figure: normal curve with .025 shaded in each tail]
75
Example

From Table E.10, we find that the critical value
for z is equal to -1.96 or +1.96
This means that zcrit = ±1.96
6. Calculate your statistic: zobs = (X̄ - µ) / (σ / √n)
= (105 - 100) / (15 / √36) = 5 / 2.5

76
Example

This means our zobs = 2.00
7. Compare zcrit to zobs
Is zobs > zcrit?
Yes! 2.00 > 1.96

[Figure: normal curve with .025 in each tail, zcrit at z = -1.96 and z = 1.96, and zobs = 2.00 beyond 1.96]
77
Example

B/c our zobs lies beyond zcrit we say our z-value
falls into the region of rejection; the value of
zobs is greater than the value of zcrit so we
choose to reject the Ho
8. We conclude that the IQ pill significantly
changes someone's IQ when they ingest it
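
The whole worked example can be reproduced with a short Python sketch. It assumes a two-tailed test, as in this example. The function name `z_test` and its return layout are hypothetical, and the critical value is computed rather than read from Table E.10.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test; returns (zobs, zcrit, reject Ho?)."""
    standard_error = sigma / sqrt(n)              # standard error of the mean
    z_obs = (sample_mean - mu) / standard_error   # observed test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # positive critical value
    return z_obs, z_crit, abs(z_obs) > z_crit


# IQ drug example: n = 36, sample mean 105, population mean 100, sigma 15
z_obs, z_crit, reject = z_test(sample_mean=105, mu=100, sigma=15, n=36)

print(z_obs)             # 2.0
print(round(z_crit, 2))  # 1.96
print(reject)            # True
```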

78
Work On It

The average number of times that a Canadian
donates blood by the time they reach the age of
50 is 10, with a population standard deviation of
3 times. Researchers think that nurses donate
more blood than average Canadians. 25
fifty-year-old nurses are asked how many times
they have given blood, and their mean number of
times donating blood is 15. Test the hypothesis
at the .01 level of significance.
