Multiword Expressions Facilitate, not Hinder, Understanding 9 November 2006

1
Multiword Expressions Facilitate, not Hinder, Understanding
9 November 2006
  • Jerry Ball
  • Senior Research Psychologist
  • Human Effectiveness Directorate
  • Air Force Research Laboratory

2
Multiword Expressions: A Pain in the Neck?
  • According to Sag et al. (2002), Multiword
    Expressions (MWEs) are "a pain in the neck" for
    developing Natural Language Processing (NLP)
    systems
  • MWEs must be handled as exceptions to a
    word-based compositional semantics
  • Meaning of MWEs cannot be determined from
    meanings of individual words composed together
    according to syntax
  • Unfortunately, MWEs are ubiquitous in natural
    language
  • Sag, I., Baldwin, T., Bond, F., Copestake, A. and
    Flickinger, D. (2002). Multiword Expressions: A
    Pain in the Neck for NLP. In Proceedings of the
    Third International Conference on Intelligent
    Text Processing and Computational Linguistics

3
Multiword Expressions: A Pain in the Neck?
  • Maybe the current word-based compositional
    semantic approach to building NLP systems is
    missing something!
  • Words are the base meaningful units
  • Words are the base units of recognition
  • Meaning of expression is composed from meanings
    of words recognized independently and combined
    syntactically
  • But humans recognize and understand linguistic
    units holistically at multiple levels, not just
    words
  • Letter, Phoneme, Syllable, Morpheme, Word,
    Phrase, Text

4
Identifying Letters in Words
Count the number of F's in the following text:
FINISHED FILES ARE THE RESULT OF
YEARS OF SCIENTIFIC STUDY COMBINED WITH
THE EXPERIENCE OF YEARS
7
Composing Words from Letters
  • The word "of" is recognized holistically
  • "of" is not recognized by recognizing "o" and
    recognizing "f" and combining them to get "of"
  • Words can be recognized without recognizing the
    individual letters
  • Even when the task is to identify letters, this
    can be difficult for very common words
  • The "f" in "of" is perceptually implicit

Healy, A. F. (1976). Detection errors on the word
"the": Evidence for reading units larger than
letters. Journal of Experimental Psychology:
Human Perception and Performance, 2, 235-242.
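A program has no holistic word recognition, so a letter-by-letter scan of the exercise text is a useful contrast with human readers. The Python sketch below is only an illustration (the function names are hypothetical): it counts every F, and then counts again while ignoring each whole word OF, imitating a reader for whom the F in OF stays perceptually implicit.

```python
TEXT = (
    "FINISHED FILES ARE THE RESULT OF "
    "YEARS OF SCIENTIFIC STUDY COMBINED WITH "
    "THE EXPERIENCE OF YEARS"
)


def count_letter(text: str, letter: str) -> int:
    """Count occurrences of a letter by scanning character by character."""
    return sum(1 for ch in text.upper() if ch == letter.upper())


def count_letter_skipping(text: str, letter: str, skipped_word: str) -> int:
    """Count a letter while ignoring every occurrence of one whole word.

    This imitates a reader who perceives the skipped word as a single
    unit and never inspects its letters.
    """
    words = [w for w in text.upper().split() if w != skipped_word.upper()]
    return count_letter(" ".join(words), letter)


total_fs = count_letter(TEXT, "F")
fs_without_of = count_letter_skipping(TEXT, "F", "OF")
print(total_fs, fs_without_of)  # 6 3
```

The gap between the two counts comes entirely from the three occurrences of OF.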
8
Identifying Words
rscheearch ltteer waht lteter oredr wrod pclae
deosn't olny tihs taht frist uinervtisy lsat
rghit rset toatl mttaer mses iprmoetnt raed
aoccdrnig wouthit porbelm cmabrigde ltteers
bcuseae huamn deos raed sitll mnid ervey istlef
tihng wrod wlohe
12
Identifying Words in Context
Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy
it deosn't mttaer in waht oredr the ltteers in a
word are, the olny iprmoetnt tihng is taht the
frist and lsat ltteer be at the rghit pclae. The
rset can be a toatl mses and you can sitll raed
it wouthit porbelm. Tihs is bcuseae the huamn
mnid deos not raed ervey lteter by istlef, but
the wrod as a wlohe.
Rawlinson, G. E. (1976). The significance of
letter position in word recognition. Unpublished
PhD thesis, Psychology Department, University of
Nottingham, Nottingham, UK.
http://www.mrc-cbu.cam.ac.uk/mattd/Cmabrigde/
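Text like the passage above can be produced mechanically. The following Python sketch is one possible way to do it, not the procedure used for the original passage: it shuffles the interior letters of each word and keeps the first and last letters in place. The function names and the fixed seed are assumptions made for illustration.

```python
import random


def jumble_word(word: str, rng: random.Random) -> str:
    """Shuffle the interior letters of a word, keeping first and last fixed."""
    if len(word) <= 3 or not word.isalpha():
        return word  # too short to jumble, or contains punctuation
    interior = list(word[1:-1])
    rng.shuffle(interior)
    return word[0] + "".join(interior) + word[-1]


def jumble_text(text: str, seed: int = 0) -> str:
    """Jumble every whitespace-separated word in the text."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    return " ".join(jumble_word(w, rng) for w in text.split())


original = "the human mind does not read every letter by itself"
jumbled = jumble_text(original)
print(jumbled)
```

Feeding the same words to `jumble_word` one at a time, without the surrounding sentence, gives the isolated-word condition shown on the earlier "Identifying Words" slide.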
13
Words in Context are Easier to Recognize
  • It is easier to recognize words whose letters are
    jumbled within an expression than to recognize
    isolated words with jumbled letters
  • "toatl"
  • "a toatl mses"
  • More noise in the input can be tolerated when
    recognizing larger units
  • If the linguistic unit can't be recognized, the
    meaning cannot be determined!
  • Larger units facilitate recognition → larger
    units facilitate understanding

15
What's Wrong With Compositional Semantics?
  • Meaning of expression is composed from meaning of
    words recognized independently
  • Meaning of "black cat" equals meaning of "black"
    + meaning of "cat"
  • MWEs must be treated as exceptions
  • Meaning of "black ice" does not equal meaning of
    "black" + meaning of "ice"
  • "black ice" is actually clear, not black!
  • Why not recognize the largest units of meaning
    and simplify the problem?
  • Don't treat MWEs as exceptions
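One way to picture "recognize the largest units of meaning" is a lexicon in which multiword entries sit alongside single words and lookup always prefers the longest match. The Python sketch below is a toy illustration under that assumption, not a description of any particular NLP system. The lexicon entries, glosses, and function name are all made up for the example.

```python
# Toy lexicon: keys are tuples of words, values are hand-written glosses.
# Every entry here is an illustrative assumption.
LEXICON = {
    ("black",): "dark in color",
    ("cat",): "small feline",
    ("ice",): "frozen water",
    ("black", "ice"): "thin transparent ice on a road",
}
MAX_UNIT_LENGTH = max(len(key) for key in LEXICON)


def interpret(words: list[str]) -> list[str]:
    """Greedy longest-match lookup: prefer the largest stored unit."""
    glosses = []
    i = 0
    while i < len(words):
        # Try the longest candidate unit first, then shorter ones.
        for n in range(min(MAX_UNIT_LENGTH, len(words) - i), 0, -1):
            unit = tuple(words[i:i + n])
            if unit in LEXICON:
                glosses.append(LEXICON[unit])
                i += n
                break
        else:
            # No stored unit starts at this position.
            glosses.append(f"<unknown: {words[i]}>")
            i += 1
    return glosses


print(interpret(["black", "cat"]))  # ['dark in color', 'small feline']
print(interpret(["black", "ice"]))  # ['thin transparent ice on a road']
```

Because "black ice" is stored as a unit, the glosses for "black" and "ice" are never consulted for it, so no exception handling is needed.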

16
High Frequency Words
  • The meaning of high frequency words like "take"
    and "have" cannot be determined in isolation from
    the expressions in which they occur
  • Take "take" for instance
  • Take a hike
  • Take five
  • Take place
  • Have a blast
  • Don't have a cow
  • Have at it

17
High Frequency Words
  • Why are high frequency words the most ambiguous?
  • It isn't possible to have a separate word for
    every concept that may need to be expressed
  • Some words must be used in the expression of
    multiple concepts
  • The words used in the expression of multiple
    concepts are necessarily ambiguous and tend to be
    high frequency

18
Syllables, Morphemes or Words?
  • Possible but irrelevant words or morphemes
    within words are better recognized as meaningless
    syllables
  • It does not make sense to try to compose the
    meaning of "carpet" from the meanings of "car"
    and "pet"!
  • How do we avoid recognizing "car" and "pet" as
    meaningful?
  • Words in MWEs often function more like
    meaningless syllables than independent meaningful
    units!
  • The meanings of "ad" and "hoc" in "ad hoc"
  • Although "ad" and "hoc" have meanings in Latin
  • The meaning of "blue" in "blue moon"
  • Even if the meaning of "blue" is initially
    activated by "blue", it is not part of the
    meaning of "blue moon"

19
Syllables, Morphemes, Words or Expressions?
  • No sharp divide between syllables, morphemes,
    words and expressions
  • nonetheless vs. none the less
  • Is "none" a syllable or morpheme in "nonetheless"
    or a word in "none the less"?
  • whatever vs. what ever
  • alot vs. a lot
  • whatchamacallit vs. what do you call it

20
What are Acronyms?
  • Acronyms are MWEs that are perceptually
    re-encoded as a sequence of letters (written) or
    syllables corresponding to letters (spoken)!
  • AFMC vs. Air Force Materiel Command
  • Acronyms allow a single perceptual unit to encode
    an entire MWE!
  • Overcome limitations of visual and aural
    perceptual span

21
Frequency of Multiword Expressions
  • Conventionalized Expressions
  • We say "baked potato" and "roast beef", not "baked
    beef" or "roast potato" (although "roast
    potatoes" is OK)
  • 25% of expressions are conventionalized ways of
    saying things!
  • Erman, B. & Warren, B. (1999). The idiom
    principle and the open choice principle. Text,
    Vol. 20, pp. 29-62
  • Formulaic Language
  • As much as 70% of our adult native language may
    be formulaic! (Altenberg, 1990)
  • Wray, A. & Perkins, M. (2000). The functions of
    formulaic language: an integrated model. Language
    & Communication 20, pp. 1-29
  • Altenberg, B. (1990). Speech as linear
    composition. Proceedings of the Fourth Nordic
    Conference for English Studies

22
Frequency of Multiword Expressions
  • The number of MWEs in a speaker's lexicon is of
    the same order of magnitude as the number of
    single words
  • Jackendoff, R. (1997). The Architecture of the
    Language Faculty. Cambridge, MA: The MIT Press.
  • In WordNet, 41% of entries are multiword
  • The number of MWEs increases in specialized
    domains and acronyms are ubiquitous!
  • AFRL vs. Air Force Research Laboratory
  • BRAA vs. Bearing, Range, Altitude, Aspect
  • Sag, I., Baldwin, T., Bond, F., Copestake, A. and
    Flickinger, D. (2002). Multiword Expressions: A
    Pain in the Neck for NLP. In Proceedings of the
    Third International Conference on Intelligent
    Text Processing and Computational Linguistics

23
Processing Efficiency
  • There really isn't time to process spoken input
    one word at a time
  • Word-based compositionality is computationally
    too expensive
  • Even if each word in a 20 word sentence has only
    3 meanings (on average), there are 3^20 (about
    3.5 billion) possible combinations!
  • Extensive search is not a cognitively viable
    option
  • There must be constraints that minimize the
    number of alternatives
  • MWEs offer one such constraint
  • MWEs are directly retrievable from memory
    reducing the amount of processing required to
    determine meaning
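The arithmetic behind the efficiency argument is easy to check. If each of 20 words is disambiguated independently among 3 meanings, the number of candidate readings is 3 multiplied by itself 20 times. The Python sketch below works this out and compares it with a hypothetical chunking of the same sentence into 8 directly retrievable units. The figure of 8 units, and the assumption that each unit still has 3 meanings, are illustrative only.

```python
def candidate_readings(unit_count: int, senses_per_unit: int) -> int:
    """Sense assignments when each unit is disambiguated independently."""
    return senses_per_unit ** unit_count


# Word-by-word: 20 words, 3 meanings each.
word_by_word = candidate_readings(20, 3)

# Hypothetical chunking: the same 20 words grouped into 8 stored units,
# still assuming 3 meanings per unit.
chunked = candidate_readings(8, 3)

print(word_by_word)  # 3486784401
print(chunked)       # 6561
```

Even with ambiguity per unit held constant, fewer units shrink the search space by several orders of magnitude. If larger units are also less ambiguous, the reduction is greater still.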

24
Processing Efficiency
  • Humans can recognize letters in words more
    rapidly than letters in isolation
  • Word Superiority Effect
  • Can humans recognize words in MWEs more rapidly
    than words in isolation?
  • Multiword Superiority Effect?
  • Suggested by our ability to complete unfinished
    MWEs without seeing or hearing the entire final
    word
  • "kicked the bu"
  • "spill the b"
  • Suggested by the Cambridge Study example
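Completing an unfinished expression can be sketched as a prefix lookup over stored MWEs. The Python example below is a minimal illustration of that idea and makes no claim about how human memory retrieval works. The stored expressions and the function name are assumptions chosen for the example.

```python
# Toy store of multiword expressions; the entries are illustrative.
STORED_MWES = [
    "kicked the bucket",
    "spill the beans",
    "take a hike",
    "take five",
    "take place",
]


def complete(fragment: str, stored: list[str]) -> list[str]:
    """Return every stored expression that begins with the fragment."""
    fragment = fragment.strip().lower()
    return [mwe for mwe in stored if mwe.startswith(fragment)]


print(complete("kicked the bu", STORED_MWES))  # ['kicked the bucket']
print(complete("spill the b", STORED_MWES))    # ['spill the beans']
print(complete("take", STORED_MWES))
```

A long fragment usually leaves a single candidate, while a short one such as "take" still matches several stored expressions.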

25
Processing Efficiency
  • Perceptual processing is constrained by the
    visual perceptual span in reading and the size of
    the phonological buffer in speech
  • Mechanisms that shorten the visual and aural span
    should facilitate processing
  • Mechanisms that link perceptual units to larger
    units of meaning should facilitate processing

26
Processing Efficiency
  • Acronyms and abbreviations support efficient
    processing
  • HE vs. Human Effectiveness Directorate
  • AFRL/HE vs. Air Force Research Laboratory
  • They achieve this by associating a perceptual
    unit with a larger unit of meaning
  • "HE" is perceived as a unit
  • "HE" is stored as a unit and linked to "Human
    Effectiveness Directorate", which is also stored
    as a unit
  • Sometimes the original expression is lost or
    modified
  • AOC vs. Air and Space Operations Center
  • RADAR vs. ??? (Radio Detection and Ranging)

27
Processing Efficiency
  • Recognition of larger units competes with
    recognition of smaller units
  • If larger unit is recognized first, smaller unit
    remains implicit unless task requires accessing
    smaller unit
  • Recognition of smaller units of meaning is
    detrimental to understanding in many cases!
  • Irrelevant meanings
  • "car" in "carpet"
  • "a" in "a priori"
  • Literal interpretation of non-literal language
  • "Have a nice day!"
  • "I wasn't going to, but if you say so!"

28
Processing Efficiency
  • MWE Storage
  • Humans have a powerful associative memory
  • Storage of frequently occurring MWEs is
    psychologically plausible
  • MWE Perception
  • MWEs may be holistically perceivable
  • Perhaps in single fixation when reading
  • Advantage of acronyms and abbreviations in
    English
  • In written Hebrew, only consonants are written,
    which should facilitate recognition of MWEs
  • Via some concatenation mechanism in speech

29
Why MWEs are good!
  • The larger the linguistic unit, the less likely
    to be ambiguous
  • The larger the linguistic unit, the less
    susceptible to noise
  • The larger the linguistic unit, the more rapidly
    it can be recognized relative to individually
    recognizing the lower level elements of the unit
  • Bigger is better!

30
Summary
  • Humans have little difficulty understanding MWEs
  • NLP systems should be designed to handle MWEs as
    part and parcel of what they do, not treat them
    as exceptions that are a pain in the neck!
  • The result will be better NLP systems!

31
Questions?
32
Perceiving Larger Linguistic Units
  • Phonological Loop
  • 2 seconds of spoken input
  • Baddeley, A. (???)
  • Visual fixations
  • 4 letters to left of fixation
  • 9 letters to right of fixation
  • Carpenter & Just (???).

33
Storing Larger Linguistic Units
  • Long-Term Memory
  • About 4 Distinct Units in a Single Declarative
    Memory Chunk
  • Hierarchically organized
  • No limit to depth of hierarchy
  • Short-Term Working Memory
  • Phonological Loop
  • Visuo-Spatial Sketch Pad

34
Change in Meaning → Change in Form
  • Changes in meaning often result in changes in
    form
  • Grammaticization Processes
  • "going to" → "gonna"
  • "want to" → "wanna"
  • Bybee (2001) explains the processes of reduction
    and drift by which frequently co-occurring words
    come to have unique phonology (i.e. perceptual
    form) and meaning
  • Bybee, J. (2001). Phonology and Language Use.
    Cambridge, UK Cambridge University Press
  • Specialized Uses Lead to Specialized
    Pronunciation
  • "Whatever" used as a negative response
  • "Bad" used to mean "Good"