Statistics for Linguistics Students presentation

1
Statistics for Linguistics Students

Michaelmas 2004
Week 4
Bettina Braun
www.phon.ox.ac.uk/bettina

2
Overview

Discussion of last assignment
z-distribution vs. t-distribution
Between-subjects design vs. Within-subjects
design
t-tests
for independent samples
for dependent samples

3
Exercise z-scores
1) The mean pause duration in a read text is 200 ms with a standard deviation of 50 ms. For the calculations, please specify how you reached your conclusion!
a) Is this a statistic or a parameter?
If we are interested in describing this particular read text, then it is a parameter. If we use this text to draw inferences about pause duration in any text, then it is a statistic.
b) What proportion of the data is above 70 ms?
z = (70 - 200) / 50 = -2.6; 0.47% of the data lie below 70 ms, so 99.53% of the data lie above 70 ms.
c) What proportion of the data falls between 100 ms and 300 ms?
z = ±2; 2.28% lie below 100 ms and 2.28% lie above 300 ms, so 95.44% lie between 100 ms and 300 ms.
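The table look-ups in this exercise can be checked numerically. The following is a minimal sketch using Python's standard library, assuming (as the exercise does) that pause durations are normally distributed; the variable names are illustrative only. The last result may differ from the table-based value in the last digit because of rounding.

```python
from statistics import NormalDist

# Values from the exercise: mean pause duration 200 ms, SD 50 ms.
# Assumption: pause durations follow a normal distribution.
pauses = NormalDist(mu=200, sigma=50)

z_70 = (70 - 200) / 50                        # z-score of 70 ms
below_70 = pauses.cdf(70)                     # proportion below 70 ms
above_70 = 1 - below_70                       # proportion above 70 ms
between = pauses.cdf(300) - pauses.cdf(100)   # proportion between 100 and 300 ms

print(f"z = {z_70:.1f}")
print(f"above 70 ms: {above_70:.2%}")
print(f"between 100 and 300 ms: {between:.2%}")
```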
4
Exercise sampling distribution

2) If we have a sample size of 50, what does the
sampling distribution of the means look like if
the population is
U-shaped
skewed-left, and
normally distributed?
Because of the central limit theorem, the
sampling distribution of the mean will be
approximately normally distributed for a sample
of this size, irrespective of the form of
the parent distribution

5
Exercise central limit theorem, standard error

3) What happens if the sample size increases for the following statistics?
Estimated mean: does it increase, decrease, or stay approximately the same? Why?
It stays approximately the same, as the sample mean is an adequate (unbiased) estimate of the population mean at any sample size.
Standard error: does it increase, decrease, or stay approximately the same? Why?
The standard error decreases in proportion to the square root of the sample size (see the formula for the standard error, SE = σ / √n).
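To make the square-root relationship concrete, here is a small sketch that evaluates SE = σ / √n for growing sample sizes. The population SD of 50 ms is borrowed from exercise 1 purely for illustration, and the function name is hypothetical.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 50  # assumed population SD in ms (taken from exercise 1 for illustration)
for n in (25, 100, 400):
    # Quadrupling the sample size halves the standard error.
    print(n, standard_error(sigma, n))
```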

6
What are frequency data?

Number of subjects/events in a given category
You can then test whether the observed
frequencies deviate from your expected
frequencies
E.g. in an election, there is an a priori chance
of 50-50 for each candidate.

7
χ²-test

Null hypothesis: there is no difference between
expected and observed frequency
Data
Calculation

            Kerry supporter   Bush supporter
observed
expected
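The counts for this slide are not preserved in the transcript, so the sketch below uses made-up counts (60 vs. 40 supporters, against an expected 50-50 split) to show the usual calculation, χ² = Σ (observed - expected)² / expected. The p-value shortcut via `math.erfc` is valid only for one degree of freedom; for other cases a χ² table or a statistics package would be needed.

```python
import math

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [60, 40]  # hypothetical counts, NOT taken from the slides
expected = [50, 50]  # a priori 50-50 split

chi2 = chi_square(observed, expected)
df = len(observed) - 1  # one variable with a = 2 categories: df = a - 1

# For df = 1 only: chi2 is the square of a standard normal variable,
# so p equals the two-tailed normal probability of sqrt(chi2).
p = math.erfc(math.sqrt(chi2 / 2))

print(chi2, df, round(p, 4))
```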
8
χ²-test

Limitations
All raw data for χ² must be frequencies
Each subject or event is counted only once (if we
wish to find out whether boys or girls are more
likely to pass or fail a test, we might observe
the performance of 100 children on a test. We may
not observe the performance of 25 children on 4
tests, however)
The total number of observations should be
greater than 20
The expected frequency in any cell should be
greater than 5

9
Looking up the p-value

Degrees of freedom
If there is one independent variable: df = (a - 1)
If there are two independent variables: df = (a - 1)(b - 1)
(a and b are the numbers of categories of each variable)

10
Exercise dependent and independent variables

Generally, in hypothesis testing, the independent
variable is hardly ever interval. Mostly it is
nominal, or ordinal
Differentiate between
Number of independent variables (e.g. gender and
exam year for the score example, i.e. 2)
Levels of an independent variable: the number
of values it can take (e.g. gender: generally 2)
The null-hypothesis is formulated to deny a
relation between dependent and independent
variable

11
Exercise dependent and independent variables

Imagine you have a text-to-speech synthesis
system. You are interested to find out whether
the acceptability (from 1 to 5) is increased if
you model short pauses at syntactic phrases.
dependent variable: acceptability (ordinal data)
independent variable: TTS with/without pause
model (2 levels)
Null hypothesis: the pause model does not
influence the acceptability rating

12
Exercise dependent and independent variables

Subjects learned 20 nonsense-words presented
visually. 30 minutes later they were tested for
retention. The next day, the same subjects
learned another 20 nonsense-words, this time in a
combined visual and auditory presentation. Again,
after 30 minutes they were tested for retention.
The researcher measured the number of correct
nonsense-words.
dependent variable: number of correct responses
(interval data)
independent variable: kind of presentation (2
levels)
null hypothesis: the number of correct responses
will be the same in the two conditions

13
Further influencing factors

Besides the independent variable, there might be
further factors that influence your dependent
variable.
Other factors might be confounded with our
independent variable (e.g. in the nonsense-word
retention task, the audio-visual presentation was
on a different day than the visual
presentation. Kind of presentation can thus be
confounded with time of presentation)
Systematic error

14
Counterbalancing

To avoid confounding variables, the conditions
have to be counterbalanced. Example:
Half the subjects do the visual
presentation first and the audio-visual
presentation second
Half the subjects do the task in the opposite
order
We often have a group of subjects perform the
task (not just one subject)
Also, in linguistic research, we often use
multiple repetitions or different lexicalisations
for a given condition (e.g. different words that
all have a CVCV structure)

15
Exercise drawing error-bars

Variables need to have the correct type!
Error bars show the 95% confidence interval for
the mean (i.e. the sample mean and the range within
which the population mean is expected to lie with
95% confidence)
One independent variable
Simple error bar for groups of variables
Two independent variables
Clustered error bar for groups of variables
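What an error bar of this kind represents can be sketched as follows. The reaction times are invented for illustration, and the interval uses the normal approximation (z ≈ 1.96); for a sample this small, a statistics package would typically use the somewhat larger critical t-value, giving a slightly wider interval.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical reaction times in ms for one condition (not from the slides).
scores = [512, 498, 530, 475, 520, 505, 490, 515]

m = mean(scores)
se = stdev(scores) / math.sqrt(len(scores))  # standard error of the mean

# Normal approximation: z for a two-sided 95% interval (about 1.96).
z = NormalDist().inv_cdf(0.975)
lower, upper = m - z * se, m + z * se

print(f"mean = {m:.1f}, 95% CI = [{lower:.1f}, {upper:.1f}]")
```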

16
Exercise drawing error-bars
Clustered error bars for two independent variables
17
Example testing if a sample is drawn from a
given population

A lecturer at Oxford University expects that
students at this university have a higher
IQ-score than the average British population.
From existing records, he knows that the mean
IQ-score in Britain is 200 with a standard
deviation of 32

18
Experimental Procedure

The Null-hypothesis H0 is that the IQ of Oxford
students is no different from the general public.
He randomly selects 40 students and gives them
the standard IQ test.
This results in a mean IQ-score of 210
Questions
Can he conclude that Oxford students have a
higher IQ?
Can he compare his sample to the population?

19
Comparison to population

The sample mean cannot directly be compared to
the whole population, but to the sampling
distribution of the sample mean (with samples of
size n = 40).
The sampling distribution has the same mean as
the population (200) and a standard error of
σ / √n = 32 / √40 ≈ 5.06

20
Calculating z-score

Since the sampling distribution will be normally
distributed (for n > 30), we can calculate the
z-score to see how likely a mean of 210 is if
the null hypothesis were true

z = (210 - 200) / (32 / √40) ≈ 1.98
There is a chance of 2.4% of obtaining a sample
mean of 210 or higher if the null hypothesis is true
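The calculation for the IQ example can be reproduced in a few lines. This is a sketch using only the numbers given on the slides (population mean 200, SD 32, n = 40, sample mean 210); the one-tailed probability matches the lecturer's directional expectation.

```python
import math
from statistics import NormalDist

mu, sigma = 200, 32        # population parameters from the example
n, sample_mean = 40, 210   # sample size and observed sample mean

se = sigma / math.sqrt(n)              # standard error of the mean
z = (sample_mean - mu) / se            # z-score of the observed mean
p_one_tailed = 1 - NormalDist().cdf(z) # chance of a mean this high under H0

print(f"SE = {se:.2f}, z = {z:.2f}, p = {p_one_tailed:.3f}")
```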
21
What if the population is unknown?

Often, we compare two different samples and we do
not know the population parameters (e.g. are
exam scores of the year 1990 and 2000 from the
same distribution?)
Independent variable (number of levels?)
Dependent variable (type?)

22
What if the population is unknown?

Often, we compare two different samples and we do
not know the population parameters (e.g. are
exam scores of the year 1990 and 2000 from the
same distribution?)
Independent variable (number of levels?): exam year (2
levels)
Dependent variable (type?): exam score (interval
data)

23
Hypothesis

Null hypothesis: the scores in the 2 exam years
were drawn from the same distribution
Comparison of the means of the two populations
(estimated from two representative samples)
What statistical test do we have to perform?

24
Between-subjects design (completely randomised)

All comparisons between the different conditions
are based on comparisons between different
(groups of) subjects
Each subject provides data for only one research
condition
Example: You want to test whether the pitch of
children under the age of 10 is dependent on
their gender (a given child is either male or
female!)

25
Within-subjects design (repeated measures)

All comparisons between different conditions are
based on comparisons within the same group of
subjects
Each subject provides data for all experimental
conditions (as many scores as experimental
conditions)
Example: You want to test whether the number of
reading errors is higher when a subject is
slightly drunk than when sober.

26
Why is this difference important?

On average, two scores from the same person (both
from P1, or both from P2) will be more alike than
two scores from different persons (one from P1 and
one from P2)
Scores from one person on the same task will be
correlated; this is taken into account by
within-subjects tests.
If a between-subjects test is used for a
within-subjects design, we may fail to find an
effect (type II error)
If a within-subjects test is used for a
between-subjects design, we might find an effect
that is actually not there (type I error)
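The first of these two risks can be illustrated with the t-tests introduced later in the lecture. In the sketch below, the reading-error counts are invented: subjects differ a lot from each other, but every subject makes one or two more errors when drunk. The paired statistic removes the between-subject variation and is large; the independent-samples statistic on the same numbers is small. Function names are hypothetical.

```python
import math
from statistics import mean, stdev, variance

def paired_t(a, b):
    """t for dependent samples: mean difference over its standard error."""
    diffs = [y - x for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def independent_t(a, b):
    """t for independent samples, using the pooled variance."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(b) - mean(a)) / math.sqrt(pooled * (1 / n1 + 1 / n2))

# Hypothetical reading errors for the same six subjects (not from the slides).
sober = [4, 7, 10, 13, 16, 19]
drunk = [6, 8, 12, 14, 18, 20]

t_paired = paired_t(sober, drunk)      # appropriate for this design
t_indep = independent_t(sober, drunk)  # ignores that scores are paired

print(f"paired t = {t_paired:.2f}, independent t = {t_indep:.2f}")
```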

27
Example

You want to test whether the precontext has an
effect on the prosodic realisation of
sentence-initial accents.
You construct 20 sentences, which can appear in
two different contexts, say contrastive and
non-contrastive.
Then you ask 20 subjects to read the 40 short
paragraphs and measure the pitch height of the
initial accent and the duration of the initial
word.
You want to know if accents are realised
differently in contrastive and non-contrastive
context.

28
Difficult cases

Different classes of dependent variables
If you are interested in articulatory precision
at two different speech rates, you might measure
the formant values of the vowels and the number
of sound elisions
These two dependent variables are taken from the
same speaker but this is not a within-subjects
design

29
Difficult cases

More than one measurement per subject, combined
to give one score
You are interested in the formant values of male
and female /a/. You have a list of 20 words,
containing an /a/. Each group of 10 speakers
reads the 20 words and you measure the formant
values. Then you compute the mean formant value of
/a/ for every speaker
Since the analysis is performed on only one score
per subject, this is not a within-subjects design

30
Which statistical test, when you have score data
(parametric tests)?

Between, within, mixed?   Number of independent variables?   Significance test
Between                   One                                Indep. t-test (2 levels); one-way ANOVA
Between                   More than one                      Two-/three-way ANOVA
Within                    One                                Paired t-test (2 levels); a x s ANOVA
Within                    More than one                      a x b (x c) x s ANOVA
Mixed
31
Assumptions for statistical tests on score data
(parametric tests)

The scores must be from an interval scale
The scores must be normally distributed in the
population
The variances in the conditions must be
homogeneous
Note: You can perform parametric tests only if
these assumptions are met!

32
T-Test

Student's t-test
How likely is it that two samples are taken from
the same population?
The t-test looks at the ratio of the difference in
group means to the variability within the groups

[Figure: Sample 1 and Sample 2. Taken from http://esa21.kennesaw.edu/modules/basics/exercise3/3-8.htm]
33
T-Tests

Calculating t-statistic
Comparable to the z-statistic, but dependent on the
degrees of freedom (df)
Degrees of freedom (df)
Independent t-test: N1 + N2 - 2
Paired t-test: N - 1
The critical t-value for α = 0.05 (5% risk of
finding an effect that is not actually there) is
dependent on df
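The dependence of the critical value on df can be sketched with a short excerpt from a standard two-tailed t-table for α = 0.05. The helper names and the group sizes are hypothetical; the look-up is deliberately conservative, falling back to the nearest smaller tabulated df (and hence a slightly larger critical value). The values t = 6.12 and df = 62 are the ones reported later in the lecture.

```python
# Two-tailed critical t-values for alpha = 0.05 (excerpt from a standard t-table).
CRITICAL_T_05 = {5: 2.571, 10: 2.228, 20: 2.086, 30: 2.042, 60: 2.000}

def df_independent(n1, n2):
    """Degrees of freedom for an independent-samples t-test."""
    return n1 + n2 - 2

def df_paired(n):
    """Degrees of freedom for a paired t-test with n pairs."""
    return n - 1

def critical_t(df):
    """Conservative look-up: use the largest tabulated df not exceeding df."""
    usable = [d for d in CRITICAL_T_05 if d <= df]
    if not usable:
        raise ValueError("df is below the smallest tabulated value")
    return CRITICAL_T_05[max(usable)]

df = df_independent(32, 32)  # hypothetical group sizes giving df = 62
t_observed = 6.12            # t-value reported later in the lecture
print(df, critical_t(df), t_observed > critical_t(df))
```

Note how the tabulated values shrink towards 1.96, the corresponding value of the normal distribution, as df grows.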

34
T-distribution

The more degrees of freedom, the closer the
t-distribution is to the normal
distribution

35
T-Table
36
One-tailed vs. two-tailed predictions

If we predict a direction of the difference, we
are making a one-tailed prediction
If we predict that there is a difference
(irrespective of direction), we are making a
two-tailed prediction
If there is not enough evidence for a directional
difference, a two-tailed test is safe.

37
Example

Hypothesis: reaction time in condition a is
significantly different from condition b
Null hypothesis: the reaction times are not
different in conditions a and b

38
Independent t-test in SPSS

Organise independent and dependent variables in
separate columns!

39
Independent t-test in SPSS

Test variable(s): dependent variable(s)
Grouping variable: independent variable

You have to specify the levels of the independent
variable (can only have two!)
40
How to interpret the output?
Descriptive statistics
If p > 0.05, variances are homogeneous
There is an effect of condition on rt
41
How to interpret the output?

Group statistics (descriptive statistics for the
conditions)
Independent samples test
Levene's test for equality of variances (if p >
0.05, then variances are homogeneous)
t-test for equality of means
t-value
df (N-2)
Significance level (2-tailed)
mean difference (difference between the means)

42
What do we report?

There is a significant effect of condition on
reaction time. The average reaction time in
condition a was 238.7ms longer than in condition
b (t = 6.12, df = 62, p < 0.001).
Interpretation?

43
Paired t-test in SPSS

Variables of different conditions have to be in
parallel columns.
Click on variables to compare and then

44
How to interpret the output?

Paired samples statistic (descriptive statistics)
Paired samples correlation (naturally, there
should be a rather strong correlation: subjects
with a slow rt will tend to be slow in both
conditions)
Paired samples t-test (t, df (N-1), significance
level)

45
What if the basic assumptions are not met

For example
if the distributions are very skewed
if you have ordinal data instead of interval data
You have to use non-parametric tests
There is a whole range of non-parametric tests;
I'll only show the most common ones

46
Non-parametric statistical tests (for one
independent variable only)

Between, within, mixed?   Number of levels of independent variable?   Significance test
Between                   Two                                         Mann-Whitney Test
Between                   More than two                               Kruskal-Wallis Test
Within                    Two                                         Wilcoxon Signed Ranks Test
Within                    More than two                               Friedman Test
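As an illustration of how a rank-based test avoids the interval-scale assumption, the sketch below computes the Mann-Whitney U statistic by direct pair counting. The acceptability ratings are invented (loosely following the text-to-speech example) and assume two different groups of listeners; judging significance would still require a U table or a statistics package.

```python
def mann_whitney_u(xs, ys):
    """U for xs: number of (x, y) pairs with x > y; ties count as 0.5."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in xs
        for y in ys
    )

# Hypothetical acceptability ratings (1 to 5) from two separate listener groups.
with_pauses = [4, 5, 3, 4, 5]
without_pauses = [2, 3, 3, 1, 2]

u_with = mann_whitney_u(with_pauses, without_pauses)
u_without = mann_whitney_u(without_pauses, with_pauses)

# The two U values always sum to n1 * n2; the smaller one is usually
# the value compared against a table of critical U values.
print(u_with, u_without, min(u_with, u_without))
```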
