1
Lecture 13 Statistics

MA/MSc LVC U. of York Autumn 2007 Bill Haddican
Greg Guy
2
Lecture 13 Statistics
  • Outline
  • Conventions in presenting quantitative data
  • Tests of significance: when to use which?
  • Chi-square tests
  • Goldvarb
  • T-tests (seminar)
  • Pearson's r (seminar)

3
1. Some conventions
  • 1. Tables must always have a title (legend).
    Ideally, the title should be descriptive enough
    that your reader doesn't have to read the text to
    interpret it. That is, your table should be able
    to stand alone.
  • 2. Tables should also be numbered. The number
    precedes the legend, e.g.
  • Table 1. Frequencies of use of be like by
    speaker sex.
  • 3. Figures in columns should be lined up by decimal
    places.
  • 4. Provide Ns as well as %s when possible in
    your tables.

4
1. Some conventions

(Source: my diss.)
5
1. Some conventions
  • 5. You should refer to all tables in your text.
  • 6. Additional information needed to interpret the
    table, such as notes on differences in
    significance, goes below the table in a footnote.
  • 7. Figures must include units of measurement.
  • 8. On figures, the independent variable goes on the
    x-axis and the dependent variable goes on the
    y-axis.
  • 9. If your independent variable is a category
    variable, use a bar graph.

6
1. Some conventions

From Tagliamonte and Hudson 1999: 162
7
1. Some conventions
Figure X. Coda r deletion in NYC by style

From Labov 1966
8
1. Some conventions
  • 10. Cardinal rule for tables: Frequencies should
    reflect use of the dependent variable as a
    proportion of the total number of tokens of the
    independent variable (not the other way around).
  • Table X. Use of be like vs. say by speaker sex.
    (BAD Table!)

           Men         Women
           N     %     N     %
Be like    200   33    400   67
say        100   50    100   50
9
1. Some conventions
  • The %s here tell us that 33% of our be like
    tokens are by men and 67% are by women. Our say
    tokens are evenly distributed by sex.
  • But this isn't what we want to know. Rather,
    what we want to know is whether men use be like
    vs. say to a greater or lesser extent than women.
    To see this, we need to look at use of be like
    vs. say among men as a proportion of the total
    number of tokens for men, and likewise for women.

10
1. Some conventions
  • Table X. Use of be like vs. say by speaker sex.
    (GOOD Table!)

           Men         Women
           N     %     N     %
Be like    200   67    400   80
say        100   33    100   20
total      300   100   500   100
  • The %s here tell a very different story. Here,
    we see that women tend toward be like much more
    strongly than men.
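
The difference between the BAD and GOOD tables comes down to which total each cell is divided by. The following is a minimal sketch of both calculations using the counts from the tables above; the function names are illustrative, not part of any package.

```python
# Token counts from the be like vs. say tables above.
counts = {
    "be like": {"men": 200, "women": 400},
    "say": {"men": 100, "women": 100},
}

def percent_within_variant(counts):
    """BAD table: each cell as a % of all tokens of that variant."""
    result = {}
    for variant, by_sex in counts.items():
        variant_total = sum(by_sex.values())
        result[variant] = {
            sex: round(100 * n / variant_total) for sex, n in by_sex.items()
        }
    return result

def percent_within_sex(counts):
    """GOOD table: each cell as a % of all tokens produced by that sex."""
    sex_totals = {}
    for by_sex in counts.values():
        for sex, n in by_sex.items():
            sex_totals[sex] = sex_totals.get(sex, 0) + n
    return {
        variant: {sex: round(100 * n / sex_totals[sex]) for sex, n in by_sex.items()}
        for variant, by_sex in counts.items()
    }

print(percent_within_variant(counts))
# {'be like': {'men': 33, 'women': 67}, 'say': {'men': 50, 'women': 50}}
print(percent_within_sex(counts))
# {'be like': {'men': 67, 'women': 80}, 'say': {'men': 33, 'women': 20}}
```

Only the second function divides by the total for each category of the independent variable (speaker sex), which is what the cardinal rule asks for.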

11
2. What are tests of significance for?
  • Quantitative sociolinguistic work involves
    positing relationships between variables.

(Source: my diss.)
12
2. What are tests of significance for?
  • But how confident can we be that such
    distributions really do reflect a relationship
    between our variables and are not just due to chance?
  • Tests of significance, then, are used to provide
    an estimate of this chance.

13
2. What are tests of significance for?
  • Tests of significance weigh two competing
    hypotheses:
  • The null hypothesis: There is NO relationship
    between the dependent variable and the
    independent variable.
  • The experimental hypothesis: There IS a
    relationship between the dependent variable and
    the independent variable.

14
2. What are tests of significance for?
  • Example: speaker sex and be like usage.
  • The null hypothesis: There is NO relationship
    between speaker sex and be like usage.
  • The experimental hypothesis: There IS a
    relationship between speaker sex and be like
    usage.

15
2. What are tests of significance for?
  • Our test of significance--a chi-square test in
    this case--will help us decide if observed
    differences in be like use by speaker sex reflect
    a relationship or are coincidental.
  • Tests of significance generate a probability
    value, denoted as p. This is the probability
    of finding a difference at least as large as the
    one observed if the null hypothesis is correct.
  • Our p-value is the chance of seeing such a
    pattern when there is NO relationship between
    our variables.

16
2. What are tests of significance for?
  • In other words, the smaller our p, the stronger
    the evidence that there IS a relationship between
    our variables.
  • p = .05, for example, indicates a 5%, or 1/20,
    chance that a pattern at least this strong would
    arise as a fluke if there were no relationship.
  • p = .01 indicates a 1%, or 1/100, chance that such
    a pattern would arise by accident.
  • p = .001 indicates a .1%, or 1/1000, chance that
    such a pattern would arise by accident.
  • (These are some standard benchmarks used.)

17
2. What are tests of significance for?
  • In sociolinguistics and in other social sciences,
    p < .05 is a standard minimum threshold for
    positing a relationship.
  • In multivariate analyses that you've seen, for
    example, when a given figure is said to be "not
    significant", this means p > .05.

18
2. What are tests of significance for?
  • Table 4. Significant factor groups favoring
    (non-standard) participial affix doubling

Factor group              Frequency        Weight
                          N         %
Educational attainment
  High                    141/292   48     .25
  Medium                  187/208   89     .81
  Low                     186/215   87     .52
Sex
  Women                   229/296   77     .61
  Men                     285/419   68     .42

(Source: my diss.)

19
3. When to use which test
  • In your reading of sociolinguistic work, you'll
    have noticed different kinds of tests of
    significance: chi-square tests, t-tests, F-tests.
    There are others, but these are some of the most
    frequently used.
  • Which of these to use depends on the kind of data
    you have.
  • Nominal variables are categorical, e.g. male vs.
    female, Labour vs. Tory vs. Liberal.
  • Continuous variables are numerical or
    quantitative, e.g. formant frequencies,
    temperatures, heights.

20
3. When to use which test
  • Chi-square tests are used for two nominal
    variables.

Use of be like vs. say by speaker sex
         Be like   say
Men      70        30
Women    50        50
21
3. When to use which test
  • t-tests are used with a nominal variable and a
    continuous variable. More precisely, they
    compare the means of two samples.

F1s by social class
                 F1              Mean
Working class    400, 450, 500   450
Middle class     450, 500, 550   500
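
As a rough illustration of what "comparing the means of two samples" involves, the sketch below computes a t statistic for the F1 values in the table above. It assumes the classic pooled-variance (Student's) two-sample t-test, which the slides do not specify; the function name is illustrative.

```python
from math import sqrt
from statistics import mean, variance

# F1 values from the table above.
working_class = [400, 450, 500]
middle_class = [450, 500, 550]

def pooled_t(a, b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    # Standard error of the difference between the two means.
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err, df

t, df = pooled_t(working_class, middle_class)
print(round(t, 2), df)  # -1.22 4
```

With 4 degrees of freedom, the two-tailed critical value at p = .05 is 2.776, so a t of about -1.22 would not count as significant for samples this small.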
22
3. When to use which test
  • A good question that's undoubtedly in everyone's
    head right now: What kind of test should we use
    with two quantitative variables? Say, for
    example, that we want to look at F1 by speaker
    age.
  • Typically, measurements of correlation are used
    in such cases. These indicate to what degree one
    variable predicts or covaries with another.
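
One common measure of correlation is Pearson's r, listed in the outline as a seminar topic. The sketch below shows how it can be computed by hand. The ages and F1 values are invented purely to exercise the function and are not data from the lecture.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

# HYPOTHETICAL values, made up for illustration only.
ages = [20, 30, 40, 50, 60]
f1_values = [520, 500, 490, 470, 450]

print(round(pearson_r(ages, f1_values), 2))
```

r ranges from -1 to 1. Values near 0 indicate little linear covariation, and the sign shows whether one variable rises or falls as the other rises.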

23
4. The chi-square test
  • How it works.
  • What a chi-square test does is test the null
    hypothesis, that is, that there is NO
    relationship between our variables.
  • It compares observed values in a distribution
    with the expected values and measures the
    probability that the difference in these two is
    by chance.

24
4. The chi-square test
  • How it works.
  • The observed values are what we have in our data,
    which, let's suppose, is the following.
  • Observed values: Use of be like vs. say by
    speaker sex

         Be like   say   Total
Men      70        30    100
Women    50        50    100
Total    120       80    200
25
4. The chi-square test
  • How it works.
  • How, then, do we determine the expected values?
    First, look at the totals for be like and say.
    How would we expect them to be distributed if
    there were no relationship?
  • Expected values: Use of be like vs. say by
    speaker sex

         Be like   say   Total
Men                      100
Women                    100
Total    120       80    200
26
4. The chi-square test
  • How it works.
  • With no relationship, men and women would share
    each column total in proportion to their token
    counts. Here each sex contributes 100 of the 200
    tokens, so each gets half: 120/2 = 60 be like and
    80/2 = 40 say.
  • Expected values: Use of be like vs. say by
    speaker sex

         Be like   say   Total
Men      60        40    100
Women    60        40    100
Total    120       80    200
27
4. The chi-square test
  • How it works.
  • Now, in the previous example, figuring out the
    expected values was easy because the number of
    tokens for men and women was the same. What
    would we do if it wasn't?

28
4. The chi-square test
  • How it works.
  • Observed values: Use of be like vs. say by
    speaker sex

         Be like   say   Total
Men      89        45    134
Women    60        47    107
Total    149       92    241
29
4. The chi-square test
  • How it works.
  • Easy. The expected value for each cell will be
  • (Σ column × Σ row) / total.
  • Expected values: Use of be like vs. say by
    speaker sex

         Be like                    say   Total
Men      (134 × 149)/241 = 82.85          134
Women                                     107
Total    149                        92    241
30
4. The chi-square test
  • How it works.
  • And so on.
  • Expected values: Use of be like vs. say by
    speaker sex

         Be like   say                       Total
Men      82.85     (134 × 92)/241 = 51.15    134
Women                                        107
Total    149       92                        241
31
4. The chi-square test
  • How it works.
  • Once you calculate one cell's value in this way,
    you can calculate the rest by subtracting from
    the marginals.
  • Expected values: Use of be like vs. say by
    speaker sex

         Be like   say                    Total
Men      82.85     134 - 82.85 = 51.15    134
Women                                     107
Total    149       92                     241
32
4. The chi-square test
  • How it works.
  • Once filled out, our table of expected values
    will look like this.
  • Expected values: Use of be like vs. say by
    speaker sex

         Be like   say     Total
Men      82.85     51.15   134
Women    66.15     40.85   107
Total    149       92      241
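
The (row total × column total) / grand total rule can be sketched in a few lines. This is a minimal illustration using the observed counts from the example above; the function name is illustrative.

```python
# Observed counts: rows are Men, Women; columns are be like, say.
observed = [[89, 45],
            [60, 47]]

def expected_values(observed):
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

for label, row in zip(["Men", "Women"], expected_values(observed)):
    print(label, [round(value, 2) for value in row])
# Men [82.85, 51.15]
# Women [66.15, 40.85]
```

The same function reproduces the simpler case with equal row totals: for observed counts of 70/30 and 50/50 it returns 60 and 40 in both rows.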
33
4. The chi-square test
  • How it works.
  • We then compare the observed and the expected
    values in each cell. Note that the difference in
    each case is 6.15 (absolute value).

         Be like                   say
Men      (Obs) 89  (Exp) 82.85     (Obs) 45  (Exp) 51.15
Women    (Obs) 60  (Exp) 66.15     (Obs) 47  (Exp) 40.85
34
4. The chi-square test
  • How it works.
  • To estimate the probability that this
    difference is due to chance, we use the following
    formula.
  • χ² = Σ (observed - expected)² / expected
  • This means three steps:
  • 1. We square the difference between the observed and
    expected values in each cell.
  • 2. We divide this number by the expected value for
    each cell.
  • 3. We then add all of these cell values together.

35
4. The chi-square test
  • How it works.
  • Let's do this step by step.
  • First, subtract the expected from the observed
    for each cell.

         Be like               say
Men      89 - 82.85 = 6.15     45 - 51.15 = -6.15
Women    60 - 66.15 = -6.15    47 - 40.85 = 6.15
36
4. The chi-square test
  • How it works.
  • Second, square these differences.

         Be like             say
Men      6.15² = 37.82       (-6.15)² = 37.82
Women    (-6.15)² = 37.82    6.15² = 37.82
37
4. The chi-square test
  • How it works.
  • Third, divide these squares by the expected
    values for each cell.

         Be like                say
Men      37.82/82.85 = .46      37.82/51.15 = .74
Women    37.82/66.15 = .57      37.82/40.85 = .93
38
4. The chi-square test
  • How it works.
  • Finally, add all of these values together and
    this is our chi-square value.
  • χ² = .46 + .74 + .57 + .93 = 2.70
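
The three steps (square the difference, divide by the expected value, sum over cells) can be checked with a short script. This is a minimal sketch using the observed counts from the worked example; the function name is illustrative.

```python
def chi_square(observed):
    """Chi-square statistic for a table of observed counts (a list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under the null hypothesis of no relationship.
            exp = row_totals[i] * col_totals[j] / grand_total
            # Squared difference, scaled by the expected count.
            statistic += (obs - exp) ** 2 / exp
    return statistic

# Rows are Men, Women; columns are be like, say.
print(round(chi_square([[89, 45], [60, 47]]), 2))  # 2.7
```

Because the script does not round the intermediate values, the individual cell contributions may differ in the third decimal place from the hand calculation, but the total still rounds to 2.70.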

39
4. The chi-square test
  • How it works.
  • We then look up our chi-square on a chi-square
    table, which will give the probabilities
    (p-values) associated with each chi-square. (Such
    a table can be found on the web or in the back of
    any statistics book.)
  • To do this we will need to know the degrees of
    freedom (d.f.). The degrees of freedom for a
    chi-square is (rows - 1)(columns - 1). Recall that
    we have a 2 x 2 table, so our d.f. = (2 - 1)(2 - 1) = 1.

40
4. The chi-square test
  • How it works.
  • Looking in our table, we find the following at 1
    d.f.

chi-square   p
3.84         .05
5.41         .02
6.64         .01
10.83        .001
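
For the special case of 1 d.f., the lookup can also be done numerically, because a chi-square value with 1 d.f. is the square of a standard normal value. The sketch below relies on that identity and on `math.erfc` from the Python standard library. It does not apply to tables with more than 1 d.f., and the function name is illustrative.

```python
from math import erfc, sqrt

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    # P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    return erfc(sqrt(chi2 / 2))

for chi2 in (2.70, 3.84, 6.64):
    print(chi2, round(p_value_1df(chi2), 3))
```

The values for 3.84 and 6.64 come out at about .05 and .01, matching the lookup table, which is a quick sanity check on the formula.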

41
4. The chi-square test
  • How it works.
  • Our chi-square value of 2.70, then, falls below
    3.84, which means that there's more than a 5%
    chance of a difference this large arising by
    chance! (In fact, it's about 10%.)
  • At the level of p = .05, then, we do not reject the
    null hypothesis.
  • NB: chi-square tests cannot be performed when the
    expected frequency of any given cell is less than
    5 and are not ideal when total N < 20. Instead use
    Fisher's exact test.
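
For readers curious about what Fisher's exact test does with a small 2 x 2 table, here is a minimal sketch. It assumes the common two-sided definition, which sums the probabilities of all tables that are no more likely than the observed one with the margins held fixed. The counts are hypothetical and chosen only to illustrate the calculation.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small tolerance so that ties with the observed table are included.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# HYPOTHETICAL counts, too small for a chi-square test.
print(round(fisher_exact_two_sided([[3, 1], [1, 3]]), 3))  # 0.486
```

Unlike the chi-square test, this calculation does not depend on expected cell frequencies being large, which is why it is the usual fallback for sparse tables.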

42
4. The chi-square test
  • Other ways to do chi-square tests
  • Excel
  • On a spreadsheet, you'll need a table of
    observed values and a table of expected values,
    as above.
  • Click on a free cell. This is where your result
    will appear.
  • Then, on the Insert menu at the top, select
    "Function". A dialog box will then appear with
    two columns in it.
  • In the left column, select "Statistical". In the
    right column, select CHITEST.

43
4. The chi-square test
  • Other ways to do chi-square tests
  • Excel
  • A new dialog box will appear with two fields, one
    asking you for a range of observed values and one
    asking for a range of expected values.
  • You can input these by clicking and
    dragging the cursor over the relevant tables in
    your spreadsheet, and selecting OK.
  • Excel will then give you the p-value.

44
4. The chi-square test
  • Other ways to do chi-square tests
  • Web pages
  • An even easier solution is to use webpages such
    as the following:
  • http://www.graphpad.com/quickcalcs/contingency1.cfm
  • No explanation necessary for this!

45
5. Goldvarb
  • Goldvarb (Varbrul) is a kind of multivariate
    analysis (specifically, a logistic regression
    model).
  • In the kind of variation data that we typically
    work with, different kinds of factors combine to
    produce the patterns of variation we see.
  • A speaker's use of t-glottaling, say, may be
    influenced by his/her age, gender, and dialect, as
    well as linguistic factors such as preceding and
    following segment.
  • The problem, then, is to sort out the effect of
    these competing constraints on variation.

46
5. Goldvarb
  • What Goldvarb does is build a model of this
    variation that estimates the contribution of
    different factors to the dependent variable.
  • Variables correspond (roughly) to factor groups,
    and the different categories of these variables
    are called factors. The factor group sex, for
    example, will (presumably) have the factors male
    and female.
  • Each factor in each group is then assigned a
    weight which estimates its contribution to the
    application value, a variant of the dependent
    variable.
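
To give a feel for how factor weights combine, the sketch below assumes the standard logistic form used in Varbrul-type programs, in which the log-odds of the application value equal the log-odds of the input plus the log-odds of each applicable factor weight. The two weights are taken from Table 4 (Medium education .81, Women .61). The input of .50 is a hypothetical placeholder, since the slides do not report one, and the function names are illustrative.

```python
from math import exp, log

def logit(p):
    """Convert a probability to log-odds."""
    return log(p / (1 - p))

def inv_logit(x):
    """Convert log-odds back to a probability."""
    return 1 / (1 + exp(-x))

def predicted_probability(input_prob, weights):
    """Combine an input value and one factor weight per factor group."""
    return inv_logit(logit(input_prob) + sum(logit(w) for w in weights))

# Weights from Table 4; the input value is HYPOTHETICAL.
hypothetical_input = 0.50
print(round(predicted_probability(hypothetical_input, [0.81, 0.61]), 2))
```

Under this form a weight of .50 contributes nothing, weights above .50 favor the application value, and weights below .50 disfavor it.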

47
5. Goldvarb

(Source: my diss.)
48
5. Goldvarb
  • In reporting Goldvarb analyses, authors typically
    also provide the Ns and the input (also called the
    corrected mean or overall tendency). This is
    roughly the overall likelihood of occurrence of
    the application value.
  • It is also typical to report Ns and frequencies
    for each factor.
  • Note, also, that Goldvarb is only used with
    non-categorical variables. Variables for which
    variation is categorical or near categorical
    (> 95%) are excluded.

49
6. Conclusions
  • Goals
  • To review some conventions of presenting
    quantitative data.
  • To show when to use different kinds of tests of
    significance.
  • To show how to perform a chi-square test.

50
Further Reading
  • Guy, Gregory. 1993. The quantitative analysis of
    linguistic variation. (photocopy pack.)
  • Labov, William. 1966. The social stratification
    of English in New York City. Washington, D.C.:
    Center for Applied Linguistics.
  • Tagliamonte, Sali and Rachel Hudson. 1999. Be
    like et al. beyond America: The quotative system
    in British and Canadian English. Journal of
    Sociolinguistics 3: 147-172.
  • Garson webpage (nice explanation of Fisher's
    exact test):
  • http://www2.chass.ncsu.edu/garson/PA765/fisher.htm