Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)

Slides: 79
Provided by: hpcus552

1
Basic Quantitative Methods in the Social
Sciences (AKA Intro Stats)
  • 02-250-01
  • Lecture 5

2
Sampling Distributions
  • Inferential Statistics generalizes findings
    obtained from samples to the populations that the
    samples were drawn from
  • Samples need to be representative of the
    populations they are drawn from so we use
    random sampling

3
Random Sample
  • Random Sample: a sample in which each member of
    the population has an equal chance of being
    included
  • We cannot assume that a random sample is exactly
    representative of its population
  • E.g., if we randomly choose 50 students from this
    class, their mean age may not be exactly the
    mean age of the entire class (the population,
    approx. 230 students)

4
Random Sampling
  • Random sampling makes all the samples which could
    be drawn from the population equally likely
    (e.g., who is included in the 50 student sample)
  • Each of the possible samples of 50 students would
    have mean ages that would slightly differ from
    the population mean age
  • We measure this difference with sampling error

5
Sampling Error
  • Sampling Error: the difference between a
    statistic and the parameter it estimates
  • E.g., if the population mean age was 24 and the
    sample mean age was 21, we say we have a sampling
    error of 3 years

6
Sampling Error
  • Because we usually don't collect data for an
    entire population, we must have some way of
    estimating the size of the sampling error and
    accounting for it when we generalize sample
    information to populations
  • We often obtain more samples to determine the
    sampling error

7
Sampling Distributions
  • If we draw 6 samples of 50 students from this
    class, we can obtain a better estimate of the
    true population mean age than if we only drew one
    sample
  • Suppose the mean ages for those 6 samples were as
    follows:
  • 25, 23, 23, 25, 25, 26
  • The mean of these 6 mean ages is 24.5
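
The arithmetic on this slide can be checked with a short Python sketch. The six values are the sample means listed above; the variable names are only illustrative.

```python
from statistics import mean

# Mean ages of the six samples listed on the slide.
sample_means = [25, 23, 23, 25, 25, 26]

# Averaging the six sample means gives the combined estimate
# of the population mean age.
mean_of_means = mean(sample_means)
print(mean_of_means)  # 24.5
```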

8
Sampling Distributions
  • Looking at the mean age of the first sample, 25
    years, if we only had data for this one sample,
    25 years would be our best estimate of the true
    population mean
  • By taking more than one sample, we calculate a
    more accurate estimate of the population mean,
    24.5 years

9
Sampling Error
  • Since all of the ages are relatively close to
    each other, we can say with greater certainty
    that we have small sampling error for any one of
    the sample means
  • If the samples' mean ages were much more
    dissimilar, any one of the sample mean ages would
    probably have a much higher sampling error

10
Sampling Error
  • This means that the variability of a statistic
    over repeated samplings gives us some indication
    of sampling error
  • If we continued to draw samples from the
    population until all possible samples had been
    drawn, and entered the statistic of interest
    (mean age) from each sample into a frequency
    distribution, the result would be known as a
    sampling distribution

11
Sampling Distributions
  • Sampling Distribution: the distribution of a
    statistic over repeated sampling from a specified
    population
  • Using our previous example, the sampling
    distribution of the mean for this class is a
    distribution of the means of every possible
    sample of 50 students

12
Expected Value
  • The mean of a sampling distribution of X̄ (µ_X̄)
    is known as the expected value of the mean:
    the mean of the sample means
  • We use the symbol µ instead of X̄ for the mean
    of a sampling distribution because it is a
    population of terms
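
To make the idea of a sampling distribution concrete, here is a minimal Python sketch that lists every possible sample from a tiny population and averages the sample means. The five ages and the sample size of 2 are hypothetical values chosen so the enumeration stays small; they are not from the lecture.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of five ages (an assumption for illustration).
population = [20, 22, 24, 26, 28]
n = 2  # size of each sample

# The sampling distribution of the mean: one mean per possible sample.
sample_means = [mean(sample) for sample in combinations(population, n)]

# The mean of all the sample means (the expected value of the mean)
# matches the population mean.
expected_value = mean(sample_means)
population_mean = mean(population)
print(len(sample_means), expected_value, population_mean)
```

With these numbers there are 10 possible samples, and both the expected value of the mean and the population mean come out to 24.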

13
Standard Error
  • The standard deviation of a sampling distribution
    is known as the standard error (σ_X̄): the
    standard amount of difference between X̄ and µ
    that is reasonable to expect simply by chance
  • The mean of any sample we take can be plotted on
    the sampling distribution of X̄ if we know µ_X̄
    and σ_X̄
  • The sampling distribution of X̄ is (approximately)
    a normal distribution when the samples are
    reasonably large

14
Sampling Distribution
[Figure: sampling distribution of the mean. The sampling error is the
distance between µ_X̄ and the X̄ obtained from one sample.]

15
Standard Error
  • The formula for standard error is as follows:
    σ_X̄ = σ / √n, where σ is the population standard
    deviation and n is the sample size
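
As a rough illustration, the standard error of the mean (the population standard deviation divided by the square root of the sample size) can be computed as below. The function name is an assumption, and the values σ = 15 and n = 36 are borrowed from the IQ example at the end of this lecture.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

# sigma = 15 and n = 36 come from the IQ example later in the lecture.
print(standard_error(15, 36))  # 2.5
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.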

16
Sampling Distributions
  • We usually don't know µ_X̄ and σ_X̄ and must
    estimate them
  • Sampling Distributions are the basis for many
    statistical tests (e.g., the t-test; we'll talk
    about this later)
  • Statistical tests are a mathematical way of
    testing a hypothesis

17
Hypothesis Testing
  • Hypothesis testing is a way of examining a
    statement about a relationship between
    independent and dependent variables
  • Independent variable: the variable whose effects
    the experimenter is interested in studying
  • Dependent variable: the variable that the
    experimenter measures (the data)

18
Independent and Dependent Variables - Example
  • If an experimenter is interested in researching
    how hours of studying for an exam affect
    performance on a test, the variables are as
    follows:
  • Independent Variable (IV): hours spent studying
  • Dependent Variable (DV): performance on test
    (e.g., grade received)

19
Independent Variables
  • There are 2 broad types of IVs
  • Treatment Variable: a treatment the experimenter
    applies to previously undifferentiated
    participants
  • E.g., certain participants are told to study for
    5 hours and others are told to study for 2 hours
  • Categorical Variable: a characteristic that is
    inherent to, or pre-exists in, the participant
  • E.g., gender: you can't assign someone a gender

20
Levels of IV
  • We also talk about the levels of IVs: how we
    break down the IV
  • E.g., if we are interested in studying the IV of
    hours spent studying, it could have 2 levels: 2
    hours and 5 hours
  • Studying the IV of gender has 2 levels: male and
    female
  • The levels of an IV are compared on their DV
    scores to look for a difference in outcome: is
    there a difference in test performance between
    those who study for 5 hours and those who study
    for 2?

21
Time to Think
  • A nursing researcher wants to know if giving TLC
    prolongs life in cancer patients. 50 cancer
    patients are divided into two groups: group A
    (n = 25) is given TLC by their nurses, and group B
    (n = 25) is not. What are the DV, IV, and levels
    of the IV?
  • A researcher wants to know if members of the
    Federal Liberal Party are wealthier than are
    members of the Federal NDP. 100 members of each
    party are asked to submit financial statements.
    What are the DV, IV, and levels of the IV?

22
Null Hypothesis
  • Tests of hypotheses in science are decisions to
    retain or reject a null hypothesis (Ho)
  • Null hypothesis (Ho): a statement of
    relationship between the IV and DV, usually a
    statement of no difference or no relationship;
    we assume there is no relationship between IV and
    DV

23
Null Hypothesis Examples
  • Men and women do not differ in IQ (µ_men = µ_women)
  • Hours spent studying do not affect test
    performance (µ_2 hours = µ_5 hours)
  • Height does not affect weight (µ_short =
    µ_tall)

24
Null Hypotheses
  • Null hypotheses contain 3 components
  • The IV comparison being made
  • The DV being measured
  • The null relationship between IV and DV (e.g.,
    do not differ)

25
Alternative Hypothesis
  • Although not directly tested, the Alternative
    Hypothesis (Ha) does state a relationship, or
    effect, of the IV on the DV; this is often
    called the Research Hypothesis
  • E.g.,
  • Ha: Men and women do differ in IQ (µ_men ≠
    µ_women)
  • Ha: Women have higher IQs than men (µ_women > µ_men)

26
Directional Ha
  • Ha: Women have higher IQs than men (µ_women >
    µ_men) is a directional alternative hypothesis:
    we state that one level of the IV will have
    greater (or lesser) DV scores than the other
    level
  • When we make a directional alternative
    hypothesis, we have a reason (either based on
    past research or a theory) to predict the
    direction of the results (i.e., that a statistic
    at one level of the IV will be greater or less
    than the statistic at the other level of the IV)
    (note: the above example is hypothetical only)

27
Non-Directional Ha
  • A non-directional alternative hypothesis does not
    state the expected direction of effect
  • Ha: Men and women have differing IQs (µ_women ≠
    µ_men)
  • We make a non-directional alternative hypothesis
    when we have no reason to predict the direction
    of the results. For instance, since there is no
    theory or research body that would suggest that
    women should have higher IQs than men, we would
    only predict that their IQs are different from
    men's

28
Hypothesis Testing
  • Hypothesis testing looks at the observed
    difference in DV scores between the levels of the
    IV and compares this difference to the difference
    expected under Ho
  • Any difference in value of the DV between the
    levels of the IV can be explained in 2 ways: the
    effect of the IV or sampling error

29
Hypothesis Testing
  • Testing the null hypothesis is a way of
    determining the probability that the observed
    outcome could be found if the null hypothesis
    were true
  • E.g., if we did find a difference between the IQs
    of men and women, what is the chance we would
    find this result if there is actually no
    difference between their IQs?

30
Significance Levels
  • When this probability drops below a certain
    level (a criterion level), we call the result
    significant
  • This criterion level is known as the significance
    level of the test, or alpha (α); the confidence
    level is 1 - α

31
Significance Level
  • Significance Level: a criterion level of
    probability (alpha, α), set by the experimenter,
    which acts as the reference for deciding whether
    to reject or retain the null hypothesis
  • Significant result at .05: we reject the null
    hypothesis, accepting a 5% chance of rejecting it
    when it is actually true

32
Significance Level
  • The significance level is set by the experimenter,
    but generally the convention is to use α = 0.05
    or α = 0.01
  • For α = 0.05, this means that there is a 5%
    chance we will reject the null hypothesis when it
    is actually true

33
Rejecting the Null Hypothesis
  • If the likelihood of observing this outcome is
    below the significance level (α = 0.05 or α =
    0.01), then we say that the result is significant
    and we reject the null hypothesis
  • Significant results: reject Ho (there is a
    difference)
  • Non-significant results: retain Ho (we have no
    evidence of a difference)

34
Type I and Type II Errors
  • When we decide to retain or reject the null
    hypothesis, we never do so with 100% certainty
    that we are making the right decision; we make
    the decision with a known probability of error
    (set by the alpha level)
  • We can make an incorrect decision, resulting in 2
    types of errors, Type I or Type II

35
Type I Errors
  • Type I Error: rejection of the null hypothesis
    when it is true
  • We conclude that the IV affects or is related to
    the DV when in reality the result was due to
    sampling error
  • We see something that is not really there

36
Type I Error Example
  • If our null hypothesis is that men and women do
    not differ in IQ, the Type I error is
  • Finding a result that men and women do differ in
    IQ, when in reality they do not
  • We find this difference because of sampling error

37
Type II Errors
  • Type II Error: retention of the null hypothesis
    when it is false
  • We conclude that the IV does not affect or is not
    related to the DV when in reality there is an
    effect or relationship
  • We fail to see something that is really there

38
Type II Error Example
  • If our null hypothesis is that men and women do
    not differ in IQ, the Type II error is
  • Finding a result that men and women do not
    differ in IQ, when in reality they do

39
Type I and Type II Errors
40
Type I and Type II Errors
  • The probability of making a Type I error is equal
    to the significance level of the statistical test
    (α = 0.05 or α = 0.01)
  • When you lower the probability of making a Type I
    error (e.g., use α = 0.01 instead of α = 0.05)
    you increase the probability of making a Type II
    error
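
The claim that the Type I error rate equals alpha can be checked with a small simulation. The sketch below is only an illustration: it assumes a normal population with µ = 100 and σ = 15, samples of 36, and it uses the z-statistic and the two-tailed critical value of 1.96 that are introduced later in this lecture. Every sample is drawn from a population where Ho is true, so each rejection is a Type I error.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA, N = 100, 15, 36   # assumed population and sample size
Z_CRIT = 1.96                # two-tailed critical value for alpha = .05
TRIALS = 5000

rejections = 0
for _ in range(TRIALS):
    # Draw a sample from a population where the null hypothesis is true.
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_mean = sum(sample) / N
    z_obs = (sample_mean - MU) / (SIGMA / math.sqrt(N))
    if abs(z_obs) > Z_CRIT:
        rejections += 1  # rejecting a true Ho is a Type I error

type_i_rate = rejections / TRIALS
print(type_i_rate)  # should land close to 0.05
```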

41
Forget About It!
  • For this class, you do not need to know how to
    determine the numerical value of a Type II error,
    nor do you need to understand power
  • You do need to understand what a Type II error is

42
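Consider a Sampling Distribution of Arts
Students' GPAs.
[Figure: sampling distribution of the mean with µ_X̄ = 6. The obtained
sample mean X̄ = 10 lies far to the right; the distance between them is
the sampling error.]
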
43
What might this mean?
  • This sample's mean (10) appears to be
    substantially larger than the population mean
    (6). Why might this be?
  • Perhaps there is something distinct about this
    sample such that it is not really part of this
    sampling distribution to begin with (e.g., maybe
    these are gifted arts students)
  • Alternatively, perhaps it's just a fluke, and we
    just happened to have sampled a bunch of good
    arts students. Stated differently, perhaps this
    sample mean is part of the sampling distribution
    of arts students

44
Reminder
  • We can determine the proportion of scores (in
    this situation, sample means) that would fall to
    the right of the sample mean in question by
    looking at a normal distribution table (Table
    E.10).
  • To do so, we need to know the z value of this
    sample mean. We will come back to this (but for
    the sake of clarity, note that we will be learning
    to calculate a z-test, which uses a slightly
    different formula than the z-score formula that
    you know)

45
One vs. Two Tailed Tests
  • The tails of a test set up our rejection region;
    they determine how we decide to retain or
    reject Ho
  • When we use a one-tailed test, we are testing the
    null hypothesis for a directional alternative
    hypothesis (e.g., Ha: women will have higher IQs
    than men)
  • We are only interested in whether or not women
    have higher IQs than men, not lower

46
Two-Tailed Tests
  • When we use a two-tailed test, we are testing the
    null hypothesis for a non-directional alternative
    hypothesis (e.g., Ha: women and men will have
    different IQs)
  • Here, we are interested in whether or not women
    have higher or lower IQs than men

47
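One vs. Two Tailed Tests (using α = 0.05)
[Figure: a two-tailed test with 2.5% of the area in each tail, and
one-tailed tests with 5% of the area in a single tail.]
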
48
Two-Tailed Tests
  • Once we begin discussing t-tests, you will see
    that the value that determines whether or not our
    observed statistic falls above or below the α =
    0.05 cutoff depends on a number of factors
  • For now, know that we reject Ho if our observed
    statistic is significantly different from our
    expected statistic

49
Test Statistics
  • A test statistic is a number calculated from the
    scores of a sample that allows us to test a Null
    Hypothesis and make a decision to reject or
    retain the Ho
  • We will be talking about various test statistics
    for the remainder of the term, and will begin
    with the z-statistic today

50
Z-scores Revisited
  • We know, by using the z-score formula, the
    probability of obtaining a score less than a
    given X value in a standard normal distribution
  • E.g., when X = 70, µ = 100, and σ = 20:
    z = (X - µ) / σ = (70 - 100) / 20 = -1.5

51
The smaller portion area is .0668 (from Table
E.10)
[Figure: normal curve with X = 70 at z = -1.5 and X = 100 at z = 0.
The area below z = -1.5 is .0668.]
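
The table lookup for this example can be reproduced with Python's standard library. This is a sketch that assumes the population values used in this example (mean 100, sd 20); `NormalDist` stands in for Table E.10.

```python
from statistics import NormalDist

mu, sigma = 100, 20   # population mean and sd used in this example
x = 70                # the score of interest

z = (x - mu) / sigma            # z-score of X = 70
p_below = NormalDist().cdf(z)   # area to the left of z on the standard normal curve

print(z)                  # -1.5
print(round(p_below, 4))  # 0.0668
```
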
52
Interpreting z
  • This means that if we randomly select one score
    from this distribution, the probability of that
    score being less than 70 is .0668
  • But what if we want to test the hypothesis that a
    sample of n scores (mean = 70) is actually a part
    of the population (mean = 100, sd = 20)?
  • We no longer use the z-score formula; we use a
    z-statistic
  • Remember: whenever we are testing a hypothesis,
    we use a test statistic

53
What is Sigma?
  • Usually, we do not know sigma (σ), the sd for
    a population (because obtaining data for an
    entire population is usually not done)
  • Sometimes we do know sigma (e.g., for common
    psychological tests)
  • When we know sigma, we can obtain the sampling
    distribution of the mean when the Null Hypothesis
    is true (that the sample does come from the
    population)

54
Null Hypothesis
  • When we compare a sample mean with a population
    mean, the Null Hypothesis is that the sample DOES
    come from that population
  • Ho: X̄ = µ, or that 70 = 100
  • But how can 70 = 100??
  • Recall that a sample extracted from a population
    with µ = 100 will more than likely result in a
    sample mean that is above or below 100 because of
    sampling error
  • When we test a Null Hypothesis, we are testing to
    see if the sample mean and population mean are
    statistically different from each other (with an
    alpha level of .05, whether a difference as large
    as 70 vs. 100 would occur less than 5% of the
    time through sampling error alone)

55
Sampling Distribution of the Mean
  • In hypothesis testing, we set up the sampling
    distribution of the mean and then calculate a
    test statistic to determine if we can reject the
    Ho
  • How is this done? Whenever we know σ, we use a
    z-test: we know X̄ for the one sample of
    interest, and we know µ and σ for the population,
    so we can calculate σ_X̄ (the standard error for
    the sampling distribution of the mean)

56
Standard Error Revisited
  • Last week, we stated that the standard deviation
    of a sampling distribution of the mean is called
    standard error
  • Standard error is used in test statistic formulae
    because we are using sampling distributions of
    the mean

57
Z Statistic
  • If testing a null hypothesis that a sample mean
    is equal to the population mean (and sigma is
    known), we must use the following formula for the
    z-statistic (standard error instead of standard
    deviation):
    zobs = (X̄ - µ) / σ_X̄, where σ_X̄ = σ / √n
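
A minimal Python sketch of the z-statistic follows: the difference between the sample mean and the population mean, divided by the standard error σ / √n. The function name is illustrative. The call reuses the earlier numbers (sample mean 70, µ = 100, σ = 20); the sample size n = 25 is an assumption, since the slides leave n unspecified there.

```python
import math

def z_statistic(sample_mean: float, mu: float, sigma: float, n: int) -> float:
    """z observed: (sample mean - population mean) / standard error."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error

# n = 25 is a hypothetical sample size chosen for illustration.
print(z_statistic(70, 100, 20, 25))  # -7.5
```

Note how dividing by the standard error rather than the standard deviation makes the same 30-point gap far more extreme for a sample mean (z = -7.5) than for a single score (z = -1.5).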

58
The z-statistic
  • Why zobs?? When we test the Ho, we will compare
    this zobs (our z observed) value with a zcrit
    (our z critical) value
  • Note: zobs is often also called zobt (for z
    obtained)
  • Hypothesis testing compares the absolute value of
    zobs and zcrit in the following way:
  • If |zobs| > |zcrit|, we reject the null hypothesis
  • If |zobs| < |zcrit|, we retain the null hypothesis
  • If |zobs| = |zcrit|, we retain the null hypothesis

59
Zcrit
  • The zcrit value is determined based on the alpha
    level used (usually alpha = .05)
  • zcrit is the z-score beyond which sample means
    would fall less than 5% of the time if the sample
    came from the population (the score that marks
    the tail)
  • We use Table E.10 to determine zcrit
  • Why might we be interested in this?
  • We will know if we are using a one-tailed or
    two-tailed z-test based on our research question
  • If we use a one-tailed test, the area in that
    tail is .05
  • If we use a two-tailed test, the area in EACH
    tail is .025 (.05 / 2 tails)

60
Determining zcrit
  • When we discussed z-scores, we reviewed problems
    where you know the proportion of scores and
    needed to determine the z-score (e.g., the
    lowest 10%)
  • Determining zcrit is a similar process
  • Step 1: one-tailed or two-tailed?
  • Step 2: alpha = .05 or alpha = .01?
  • Step 3: Find the area in the smaller portion
    column in Table E.10 to determine the zcrit

61
Tail Review
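[Figure: a two-tailed test with 2.5% of the area in each tail, and
one-tailed tests with 5% of the area in a single tail.]
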
62
zcrit for Two-tailed Tests
  • Alpha = .05 means that there is .025 per tail
  • Find .025 in the smaller portion column
  • zcrit = 1.96
  • Note! This is two-tailed, so this means
  • zcrit = ±1.96
  • Alpha = .01 means that there is .005 per tail
  • Find .005 in the smaller portion column
  • zcrit = ±2.57
  • Note! The exact smaller portion of .005 is not
    in the table. The values of .0049 and .0051 are
    listed, so which do we use?? Convention dictates
    that we use zcrit = ±2.57

63
zcrit for One-tailed Tests
  • Alpha = .05 means that there is .05 in the tail
  • Find .05 in the smaller portion column: zcrit =
    1.64
  • Note! The exact smaller portion of .05 is not
    in the table. The values of .0495 and .0505 are
    listed, so which do we use?? Convention dictates
    that we use zcrit = 1.64
  • Note! To determine if this is a + or - zcrit,
    look at your Alternative Hypothesis
  • Alpha = .01 means that there is .01 in the tail
  • Find .01 in the smaller portion column: zcrit =
    2.33
  • Note! .0099 and .0102 are listed; we use .0099 (z =
    2.33) because it is closest to .0100
  • Note! To determine if this is a + or - zcrit,
    look at your Alternative Hypothesis
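
For comparison with the table lookups, the critical values can also be computed directly. This is a sketch using `statistics.NormalDist` in place of Table E.10; the helper name `z_crit` is an assumption. The computed values carry more decimals than the table, so they differ slightly from the conventions above (for example 1.645 rather than 1.64, and 2.576 rather than 2.57).

```python
from statistics import NormalDist

def z_crit(alpha: float, tails: int) -> float:
    """Positive critical z for a one-tailed (tails=1) or two-tailed (tails=2) test."""
    area_per_tail = alpha / tails
    # inv_cdf returns the z-score with the given area to its left.
    return NormalDist().inv_cdf(1 - area_per_tail)

for alpha in (0.05, 0.01):
    for tails in (2, 1):
        print(alpha, tails, round(z_crit(alpha, tails), 3))
```

The four results are about 1.960 and 1.645 for alpha = .05 (two-tailed, one-tailed) and 2.576 and 2.326 for alpha = .01. The sign for a one-tailed test still has to come from the alternative hypothesis.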

64
Z-test Hypothesis Testing Steps
  • 1. State level of significance
  • α = 0.05 (α = 0.05 is usually used)
  • OR α = 0.01
  • 2. State IV, levels of IV, and DV
  • 3. State the hypotheses
  • Null hypothesis: Ho
  • Alternative hypothesis: Ha
  • Note! At this point you need to read the question
    carefully to decide if you are testing a
    directional or nondirectional hypothesis

65
Z-test Hypothesis Testing Steps
  • 4. Determine if you are using a one-tailed or
    two-tailed test
  • A one-tailed test is used when you test a
    directional hypothesis
  • A two-tailed test is used when you test a
    nondirectional hypothesis
  • 5. Find the rejection region
  • I.e., find your zcrit!
  • It is usually a good idea to draw the normal
    curve and plot your zcrit at this point; it
    helps!

66
Z-test Hypothesis Testing Steps
  • 6. Calculate your z statistic (zobs)
  • 7. Compare zcrit to zobs
  • Plot zobs on your normal distribution
  • Compare the numerical value of zcrit to zobs

67
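Step 7 Example 1 (alpha = .05)
  • If zobs = 2.59
  • Two-tailed (.025 in each tail): zcrit = ±1.96, and
    zobs = 2.59 falls beyond +1.96, in the rejection
    region
  • One-tailed (.05 in the upper tail): zcrit = +1.64,
    and zobs = 2.59 falls beyond +1.64, in the
    rejection region
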
68
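Step 7 Example 2 (alpha = .05)
  • If zobs = -1.75
  • Two-tailed (.025 in each tail): zcrit = ±1.96, and
    zobs = -1.75 does not fall beyond -1.96, so it is
    outside the rejection region
  • One-tailed (.05 in the lower tail): zcrit = -1.64,
    and zobs = -1.75 falls beyond -1.64, in the
    rejection region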
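
The two Step 7 examples can be sketched in Python as a check on whether zobs lands in the rejection region. The function name and the tail labels ('two', 'upper', 'lower') are assumptions for illustration; the critical values 1.96 and 1.64 are the table values used in this lecture.

```python
def in_rejection_region(z_obs: float, z_crit: float, tail: str) -> bool:
    """Return True if z_obs falls beyond the critical value.

    z_crit is given as a positive number; tail is 'two', 'upper', or 'lower'.
    """
    if tail == "two":
        return abs(z_obs) > z_crit
    if tail == "upper":
        return z_obs > z_crit
    if tail == "lower":
        return z_obs < -z_crit
    raise ValueError("tail must be 'two', 'upper', or 'lower'")

# Example 1: zobs = 2.59
print(in_rejection_region(2.59, 1.96, "two"))     # True
print(in_rejection_region(2.59, 1.64, "upper"))   # True
# Example 2: zobs = -1.75
print(in_rejection_region(-1.75, 1.96, "two"))    # False
print(in_rejection_region(-1.75, 1.64, "lower"))  # True
```

Example 2 shows why the choice of tails matters: the same zobs is retained under a two-tailed test but rejected under a one-tailed test in the lower tail.
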
69
Step 7
  • Null Hypotheses are rejected when zobs falls in
    the rejection region (the area beyond zcrit).
    The rejection region is the tail of the
    distribution
  • OR Null Hypotheses are rejected when
    zobs > zcrit
  • BUT! What about when the zobs and zcrit are both
    negative numbers??
  • In this case, think of rejecting Ho when the
    absolute value of zobs > the absolute value of
    zcrit
  • Absolute value means that you remove the
    negative sign from both numbers (e.g., the
    absolute value of -5.5 is 5.5)

70
Step 8 (Last One)
  • Step 8. State conclusions in words
  • Once you decide to reject or retain Ho, you need
    to state your conclusions
  • So what does rejecting the Ho actually mean for
    this research study?
  • OR What does retaining the Ho actually mean for
    this research study?

71
Step 8 continued
  • Rejecting the Ho for z-tests means that the
    sample mean is significantly different from the
    population mean, i.e., (at alpha = .05) there is
    less than a 5% chance that a sample extracted
    from this population would result in such a
    sample mean (because it's in the tail end)
  • BUT! For one-tailed tests, make sure that you
    state how they are different (i.e., is the sample
    mean greater or less than the population mean)
  • Your conclusions should be clear enough that
    anyone in the general public could understand
    what the study found

72
An Example Using the Z-test
  • Scientists have come up with a breakthrough new
    drug; they assert that taking this drug will
    affect your IQ. Because it is so new they
    are hoping it makes you smarter, but at this
    point it might also make you dumber. A sample of
    36 people has X̄ = 105, the population µ = 100 and
    the population σ = 15. Test their hypothesis.

73
Example cont.
  • 1. State level of significance: α = 0.05 (what
    is usually used)
  • 2. State IV and DV
  • IV: pill (levels: pill and no pill)
  • DV: IQ scores
  • 3. Null hypothesis
  • The drug does not make you smarter or dumber
    (i.e., the sample mean does not differ from the
    population mean)
  • Alternative Hypothesis
  • The drug makes you either smarter or dumber

74
Example
  • 4. B/c this hypothesis is non-directional, we use
    a two-tailed test
  • 5. Find the rejection region: α = 0.05, so with a
    two-tailed test we want a critical value that
    represents a region of rejection that makes up
    0.025 of the area in each tail

[Figure: normal curve with .025 of the area shaded in each tail.]

75
Example
  • From Table E.10, we find that the critical value
    for z is equal to +1.96 or -1.96
  • This means that zcrit = ±1.96
  • 6. Calculate your statistic:
    zobs = (X̄ - µ) / (σ / √n)
         = (105 - 100) / (15 / √36) = 5 / 2.5 = 2.00

76
Example
  • This means our zobs = 2.00
  • 7. Compare zcrit to zobs
  • Is zobs > zcrit??
  • Yes! 2.00 > 1.96

[Figure: normal curve with zcrit = -1.96 and +1.96 marking .025 in each
tail; zobs = 2.00 falls just beyond +1.96.]

77
Example
  • B/c our zobs lies beyond zcrit we say our z-value
    falls into the region of rejection: the value of
    zobs is greater than the value of zcrit, so we
    choose to reject the Ho
  • 8. We conclude that the IQ pill significantly
    changes someone's IQ when they ingest it
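
The whole worked example can be retraced in a few lines of Python. This sketch uses the values given in the problem (n = 36, sample mean 105, µ = 100, σ = 15, alpha = .05) and computes the critical value with `statistics.NormalDist` instead of Table E.10; variable names are illustrative.

```python
import math
from statistics import NormalDist

# Values given in the example.
n, sample_mean = 36, 105
mu, sigma = 100, 15
alpha = 0.05

standard_error = sigma / math.sqrt(n)         # 15 / 6 = 2.5
z_obs = (sample_mean - mu) / standard_error   # 5 / 2.5 = 2.0

# Non-directional hypothesis, so alpha is split across two tails.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

reject = abs(z_obs) > z_crit
print(z_obs, round(z_crit, 2), reject)  # 2.0 1.96 True
```

The result sits only just past the critical value, so the same data would not be significant at alpha = .01, where the two-tailed cutoff is about 2.57.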

78
Work On It
  • The average number of times that a Canadian
    donates blood by the time they reach the age of
    50 is 10, with a population standard deviation of
    3 times. Researchers think that nurses donate
    more blood than average Canadians. 25
    fifty-year-old nurses are asked how many times
    they have given blood, and their mean number of
    times donating blood is 15. Test the hypothesis
    at the .01 level of significance.