Title: Magnitude estimation of linguistic acceptability: applications to research on developing grammars
1Magnitude estimation of linguistic acceptability
applications to research on developing grammars
- Antonella Sorace
- Utrecht, 13 November 2003
- antonella_at_ling.ed.ac.uk
2Outline
- Questions that are difficult to address with
conventional acceptability judgment tests.
- Magnitude Estimation: from psychophysics to
linguistics.
- What you can get with ME that you can't with
other methods.
- Applications of ME.
- Web-based ME and demo.
3The questions
- Anyone who deals with DEVELOPING grammars (in
language acquisition, language attrition,
diachronic change) is confronted with the
existence of gradience and optionality in
linguistic data.
4Optionality vs. gradedness
- Optionality is characteristic of a grammar that
allows different forms for the same meaning.
- Gradedness is a manifestation of optionality: the
likelihood with which optional variants appear or
are preferred.
5Differentiating among constraints
- Sorace & Keller (2003): hard vs. soft
constraints.
- Duffield (2003): underlying vs. surface
competence.
- In both cases, the distinction is between
narrow syntax and interface syntax: the
former is categorical and consists of formal
syntactic principles; the latter is determined by
the interaction of formal principles and specific
non-syntactic properties.
6The problem
- Can one capture this kind of data with
conventional acceptability judgment tests?
- The answer is NO, or not completely.
- Judgments of linguistic acceptability are
essential data in linguistic and language
acquisition research, but they are typically
elicited in informal ways that limit their
usefulness.
7Conventional measurements of linguistic
acceptability
- Judgments of linguistic acceptability usually
form binary category scales (acceptable vs.
unacceptable) or limited ordinal scales
(acceptable, ?, and stronger marks of
unacceptability).
- These scales require absolute rating judgments,
rather than relative ranking judgments.
- Ordinal scales do not provide information about
the relative distance between adjacent points on
the scale.
8Disadvantages of conventional scales
- Measurements on these scales have several
disadvantages:
- They are limited in their range of values.
- They are inconsistent in application.
- They are not susceptible to analysis via
parametric statistics.
- They are unsuited to comparisons between effects
of different linguistic constraints, to estimates
of systematic variability of judgments, etc.
- They are difficult to interpret (what do the
middle points on a rating scale mean?).
9What these scales can't capture
- relative strength of syntactic violations (are
they the same for native and non-native
speakers?)
- lexical-semantic hierarchies within the domain of
application of syntactic principles (are these
acquired by L2ers?)
- developmental optionality (do we find it in L2
endstate grammars?)
- (among many other things)
10In these cases, we want to measure
- The precise differences in acceptability between
sentences.
- The strength of preferences expressed by subjects
for one sentence over another.
11ME in psychophysics
- Magnitude estimation is an experimental technique
used to quickly and easily determine how much of
a given sensation a person is having.
- Stevens was the first experimenter to suggest
using magnitude estimations to quantitatively
scale sensation.
12- In a magnitude estimation experiment subjects are
presented with a standard stimulus (a modulus)
and are asked to express its magnitude by a
number.
- The subjects are then presented with a series of
stimuli that vary in intensity and are asked to
assign each of the stimuli a number relative to
the standard stimulus.
13- Subjects assign a number
- to the first stimulus (the modulus), to reflect the
magnitude of the pertinent characteristic
(length, loudness, brightness, etc.);
- to each successive stimulus, to indicate its
apparent magnitude relative to the first.
14Scaling
- Scaling is not about the absolute accuracy of
judgments.
- Scaling is about the relative relationships
between judgments of stimuli of different
intensities.
15Different modalities
- The numerical modality is the most common, but
other modalities are possible (e.g. line length).
- Other modalities can be more user-friendly,
particularly if you are testing people who (think
they) are numerically challenged.
16How can you be sure subjects understand how to
perform magnitude estimations?
- Many magnitude estimation experiments use a
control condition in which subjects are asked to
perform magnitude estimations of the length of a
line.
- Magnitude estimations of line length have been
shown to be proportional to the actual length of
the line.
17- If you can show that for a group of subjects
magnitude estimations increased proportionally
with the length of lines, you have established
that the subjects do indeed understand the
directions they have been given and can assign
numbers to their sensations systematically.
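One way to run the line-length check is to regress the logarithm of the estimates on the logarithm of the true lengths and look at the slope. The sketch below is only an illustration: the data are made up, and `loglog_slope` is a hypothetical helper, not part of any ME software. A slope near 1 means the estimates grow in proportion to line length.

```python
import math

def loglog_slope(lengths, estimates):
    """Least-squares slope of log(estimate) regressed on log(length)."""
    xs = [math.log(v) for v in lengths]
    ys = [math.log(v) for v in estimates]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Made-up calibration data: line lengths (mm) and one subject's numbers.
# These estimates are exactly proportional to length, so the slope is 1.
line_lengths = [10, 20, 40, 80, 160]
estimates = [5, 10, 20, 40, 80]

slope = loglog_slope(line_lengths, estimates)
print(round(slope, 3))
```

With real subjects the slope will not be exactly 1; how far it may drift before a subject is excluded is a decision for the experimenter.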
18Advantages of ME for physical dimensions
- ME provides measurements of subjective
impressions on a numerical scale, which can be
plotted against the objective measure of the
physical stimuli giving rise to the impressions.
- It does not restrict the number of values that
can be used.
- Linear regression of estimates against physical
measures in log-log coordinates produces a
straight line with a slope characteristic of the
physical property being assessed: equal ratios on
the physical dimension give rise to equal ratios
of judgments (Stevens' Power Law).
19The Power Law
- The magnitude of sensation varies as the
intensity of the physical stimulus raised to some
power m.
- S = a I^m, where S = sensation, a = constant,
I = intensity, m = exponent for a particular
sensation.
- When plotted on log-log axes, the power law plots
as a straight line with a slope equal to the exponent.
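The straight-line claim follows from taking logarithms. The short derivation below writes the law as S = a I^m (S sensation, I intensity, a constant, m exponent). The subscripts 1 and 2 are added here for illustration only and stand for any two stimuli.

```latex
\begin{align*}
S &= a\,I^{m} \\
\log S &= \log a + m \log I
  && \text{(a line with slope } m \text{ and intercept } \log a\text{)} \\
\frac{S_2}{S_1} &= \frac{a\,I_2^{m}}{a\,I_1^{m}}
  = \left(\frac{I_2}{I_1}\right)^{m}
  && \text{(equal stimulus ratios give equal sensation ratios)}
\end{align*}
```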
21Examples of the Power Law for psychophysical
judgment tasks
22What about linguistic acceptability?
- Unlike other dimensions, linguistic acceptability
has no obvious physical continuum to plot
against the informants' impressions.
23A psychophysical law for linguistic judgments?
- Keller (2003) has recently argued that a power
law of the same kind as that obtained in
psychophysics can be derived by plotting
estimated linguistic acceptability against the
number of linguistic constraints violated in the
stimuli.
24Extensions to non-physical domains
- Magnitude estimation has been adapted to judging
psycho-social continua with no objective metric:
prestige of occupations, support for political
policies, etc.
- Magnitude estimation was used on acceptability
judgments for the first time by Sorace (1992), not
to plot any function, but simply to compare the
results with those obtained using more familiar
techniques.
25Typical instructions
- Here's an example of what the instructions look
like.
26Instructions
- The purpose of this exercise is to get you to
judge the acceptability of some English
sentences. You will see a series of sentences on
the screen. These sentences are all different.
Some will seem perfectly okay to you, but others
will not. What we're after is not what you think
of the meaning of the sentence, but what you
think of the way it's constructed.
27- Your task is to judge how good or bad each
sentence is by assigning a number to it.
- You can use any number that seems appropriate to
you. For each sentence after the first, assign a
number to show how good or bad that sentence is
in proportion to the reference sentence.
28- For example, if the first sentence was
(1) cat the mat on sat the.
and you gave it a 1, and if the next example
(2) the dog the bone ate.
seemed 20 times better, you'd give it 20. If
it seems half as good as the reference sentence,
give it the number 0.5.
29- You can use any range of positive numbers you
like including, if necessary, fractions or
decimals.
- You should not restrict your responses to, say,
an academic marking scale.
- You may not use minus numbers or zero, of course,
because they aren't proper multiples or fractions
of positive numbers.
- If you forget the reference sentence, don't worry:
if each of your judgments is in proportion to the
first, you can judge the new sentence relative to
any of them that you do remember.
30- There are no 'correct' answers, so whatever seems
right to you is a valid response. Nor is there a
'correct' range of answers or a 'correct' place
to start.
- Any convenient positive number will do for the
reference.
- We are interested in your first impressions, so
don't spend too long thinking about your
judgment.
31- Remember:
- Use any number you like for the first sentence.
- Judge each sentence in proportion to the
reference sentence.
- Use any positive numbers you think appropriate.
32Validation of ME
- How can we be sure that people can reliably use
magnitude estimation to judge linguistic
acceptability given that there is no metric
measurement?
- Bard, Robertson and Sorace (1996) applied
standard validation procedures (i.e.
cross-modality matching and replication): people
had to use one modality to judge the magnitude of
the other.
33Choices about the modulus and face validity
- The experimenter has the option of assigning a
fixed number to the modulus.
- The other option is to leave the modulus in sight
throughout the experiment.
- This option has good face validity, but it
doesn't affect the ultimate reliability of the
estimates.
- People don't need to remember the modulus: if
they are making judgments proportionally, the
reference point shifts as they move on.
34Timed vs. untimed ME
- Timing the intervals between sentences may reduce
the likelihood that people consult metalinguistic
or prescriptive knowledge.
- Intervals have to be different for non-native
speakers: they have to be piloted carefully.
35Varying the instructions
- There is a tendency in some people to use a fixed
(usually 10-point) scale. This is possibly
because of familiarity with school marking
systems.
- If the instructions contain an explicit warning
against using a restricted range of numbers, the
tendency is much reduced.
- People are very sensitive to instructions: these
have to be as explicit and clear as possible.
36Applying ME to linguistic acceptability
- ME yields interval scales, which allow the
application of parametric statistics.
- Mathematical operations can be applied to the
estimates, allowing
- a direct indication of the speaker's ability
to discriminate between grammatical and
ungrammatical sentences;
- a direct measure of the strength of
speakers' preferences.
37Data analysis
- ME data need to be normalized. Two ways:
- Transforming raw magnitude values into logarithms
before carrying out any further operation.
- Dividing each numerical value by the modulus that
the subject had assigned to the reference
sentence, then carrying out analyses on the
log-transformed judgments.
- Any statistical package can do this!
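The second normalization route can be sketched in a few lines. The subjects, sentence ids and numbers below are invented, and `normalize` is a hypothetical helper rather than a function from any package. Base-10 logarithms are an arbitrary choice here; any base works as long as it is used throughout.

```python
import math

def normalize(raw_judgments, modulus):
    """Divide each raw estimate by the subject's modulus, then take log10."""
    if modulus <= 0 or any(v <= 0 for v in raw_judgments.values()):
        raise ValueError("magnitude estimates must be positive")
    return {item: math.log10(v / modulus) for item, v in raw_judgments.items()}

# Two invented subjects who use different number ranges but make the same
# proportional judgments end up with the same normalized values.
subject_a = normalize({"s1": 10, "s2": 100, "s3": 1}, modulus=10)
subject_b = normalize({"s1": 50, "s2": 500, "s3": 5}, modulus=50)
print(subject_a)
print(subject_b)

# A difference of log judgments is the log of their ratio, which gives a
# direct measure of preference strength between two sentences.
preference = subject_a["s2"] - subject_a["s1"]
```

In this example `preference` comes out as 1 on the log10 scale, meaning subject A rated s2 ten times better than s1.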
38Do you need a lab to use ME?
- No. ME is very adaptable and can be used with
pencil and paper, an overhead projector,
booklets, etc.
39Who can do ME?
- Any adults, although it may not be the technique
of choice if you are doing fieldwork with
low-literacy or low-numeracy people.
40Lexical gradience: the Auxiliary Selection
Hierarchy (Sorace 2000)
- CHANGE OF LOCATION 'BE'
- CHANGE OF STATE
- CONTINUATION OF A STATE
- EXISTENCE OF STATE
- UNCONTROLLED PROCESS
- CONTROLLED PROCESS (MOT)
- CONTROLLED PROCESS (-MOT) 'HAVE'
41Gradedness in Italian auxiliary selection (Bard,
Robertson & Sorace, 1996)
42Other recent ME applications on language
development
- L1 attrition in the use of referential pronouns
(Tsimpli et al. 2003; Filiaci 2003).
- Pronouns and clitics in L2 Spanish and Greek
(Parodi 2002).
- Focus in L2 Hungarian (Papp 1999).
- Verb movement and null subject parameters in L2
French and Spanish (Ayoun 2000).
- Residual verb raising in Faroese (Heycock 2003).
43WebExp
- Keller et al. (1998) developed a dedicated
interactive software package, WebExp, which can be
used to collect acceptability judgments remotely
over the Internet, as well as in standard
experimental conditions.
- The current version of WebExp is still available
but will undergo substantial revision in order to
improve compatibility.
44Collecting data at a distance: WebExp
- WebExp offers the following features for
conducting web-based experiments:
- Two experimental paradigms are supported:
magnitude estimation and sentence completion.
Both within-subject and between-subject designs
can be used.
- Automatic subject authentication is achieved by
conducting basic plausibility checks on the
subject's data and by verifying the subject's
e-mail address.
45- WebExp automatically creates an individual
randomization of the experimental materials for
each subject. The experimenter can impose
constraints on the randomization to prevent
certain experimental items from occurring
consecutively.
- WebExp records the time a subject takes to
respond to each experimental item. Automatic
checks can be carried out on both onset times and
completion times.
- The response data are stored in a format that can
be easily processed by standard statistics
packages.
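Constrained randomization of the kind described above can be sketched with simple rejection sampling: shuffle, check the constraint, and reshuffle on failure. This is an illustration only, not WebExp's actual algorithm. The materials, the condition labels and the function `constrained_shuffle` are all hypothetical.

```python
import random

def constrained_shuffle(items, same_group, rng, max_tries=10_000):
    """Shuffle until no two adjacent items belong to the same group."""
    order = list(items)
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(not same_group(a, b) for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no valid order found; the constraint may be too strict")

# Hypothetical materials: (sentence id, experimental condition).
materials = [("s1", "A"), ("s2", "A"), ("s3", "B"),
             ("s4", "B"), ("s5", "C"), ("s6", "C")]

# One random generator per subject gives each subject an individual order.
rng = random.Random(2003)
order = constrained_shuffle(materials, lambda a, b: a[1] == b[1], rng)
print(order)
```

Rejection sampling is easy to verify but can get slow when the constraints rule out most orders. Larger designs may need a constructive method instead.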
46- WebExp has been subjected to standard validation
procedures (Keller & Alexopoulou 2001), which
suggest that the data it produces are comparable
to lab-based data.
47Future developments
- We are going to test a non-numerical version of
ME with older children (4 - yr olds).
- Older children should be able to understand the
concept of proportionality.
48Conclusions
- Magnitude estimation can be used by naive
informants to judge linguistic acceptability.
Within-group estimates are consistent across
response modalities and between-group comparisons
show consistent statistically significant
effects.
- Magnitude estimation allows us to use the full
power of experimental design and statistical
analysis to test hypotheses derived from
linguistic theory.
49It works
- Magnitude estimation is particularly suited to the
investigation of developing/unstable grammars.
- ME has now been used in a wide range of language
studies on different topics and within different
theoretical frameworks.
50References
- Bard, E.G., Robertson, D. and Sorace, A. 1996.
Magnitude estimation of linguistic acceptability.
Language 72: 32-68.
- Keller, F. 2003. A psychophysical law for
linguistic judgments. Proceedings of the 25th
Annual Conference of the Cognitive Science
Society. Mahwah: Lawrence Erlbaum.
- Sorace, A. 1996. The use of acceptability
judgments in second language research. In V. T.
Bhatia and W. Ritchie (eds.) Handbook of Second
Language Acquisition. New York: Academic Press,
pp. 375-409.
- Sorace, A. and Keller, F. in press. Gradience in
linguistic data. To appear in Lingua.