NLP 2?? ?? - PowerPoint PPT Presentation

Slides: 72
Provided by: Hoo1

Transcript and Presenter's Notes


1
NLP 2?? ??
  • Linguistic Essentials
  • (Ch 3)

2
Competence and Performance
  • Innate vs. learned, categorical vs. statistical
  • CFG (context-free grammar)
  • Performance

3
The Description of Language
  • Grammar
  • set of rules which describe what is allowable in
    a language
  • Classic Grammars (Quirk et al.)
  • meant for humans who know the language
  • definitions and rules are mainly supported by
    examples
  • no (or almost no) formal description tools;
    cannot be programmed
  • Explicit Grammar (CFG, LFG, GPSG, HPSG,
    Dependency Grammars, Link Grammars,...)
  • formal description
  • can be programmed and tested on data (texts)

4
Levels of (Formal) Description
  • 6 basic levels (more or less explicitly present
    in most theories)
  • and beyond (pragmatics/logic/...)
  • meaning (semantics)
  • (surface) syntax
  • morphology
  • phonology
  • phonetics/orthography
  • Each level has an input and output representation
  • output from one level is the input to the next
    (upper) level
  • sometimes levels might be skipped (merged) or
    split

5
Phonetics/Orthography
  • Input
  • acoustic signal (phonetics) / text (orthography)
  • Output
  • phonetic alphabet (phonetics) / text
    (orthography)
  • Deals with
  • Phonetics
  • consonant and vowel (and other) formation in the
    vocal tract
  • classification of consonants, vowels, ... in
    relation to frequencies, shape and position of
    the tongue and various muscles in the vocal tract
  • intonation
  • Orthography: normalization, punctuation, etc.

6
Phonology
  • Input
  • sequence of phones/sounds (in a phonetic
    alphabet) or normalized text (sequence of
    (surface) letters in one language's alphabet)
  • NB: phones vs. phonemes
  • Output
  • sequence of phonemes (~ (lexical) letters in an
    abstract alphabet)
  • Deals with
  • relation between sounds and phonemes (units which
    might have some function on the upper level)
  • e.g. [u] ~ oo (as in book), [æ] ~ a (cat), i ~ y
    (flies)

7
Morphology
  • Input
  • sequence of phonemes (~ (lexical) letters)
  • Output
  • sequence of pairs (lemma, (morphological) tag)
  • Deals with
  • composition of phonemes into word forms and their
    underlying lemmas (lexical units) + morphological
    categories (inflection, derivation, compounding)
  • e.g. quotations ~ quote/V + -ation (der. V->N) +
    NNS

8
(Surface) Syntax
  • Input
  • sequence of pairs (lemma, (morphological) tag)
  • Output
  • sentence structure (tree) with annotated nodes
    (all lemmas, (morphosyntactic) tags, functions),
    of various forms
  • Deals with
  • the relation between lemmas and morph. categories
    and the sentence structure
  • uses syntactic categories such as Subject, Verb,
    Object, ...
  • e.g. I/PP1 see/VB a/DT dog/NN ~
    ((I/sg)SB ((see/pres)V (a/ind dog/sg)OBJ)VP)S

9
Meaning (semantics)
  • Input
  • sentence structure (tree) with annotated nodes
    (lemmas, (morphosyntactic) tags, surface
    functions)
  • Output
  • sentence structure (tree) with annotated nodes
    (autosemantic lemmas, i.e. those with meaning in
    isolation, (morphosyntactic) tags, deep functions)
  • Deals with
  • relation between categories such as Subject,
    Object and (deep) categories such as Agent,
    Effect; adds other categories
  • e.g. ((I)SB ((was seen)V (by Tom)OBJ)VP)S ~
    (I/Sg/Pat/t (see/Perf/Pred/t) Tom/Sg/Ag/f)

10
...and Beyond
  • Input
  • sentence structure (tree) with annotated nodes
    (autosemantic lemmas, (morphosyntactic) tags,
    deep functions)
  • Output
  • logical form, which can be evaluated (true/false)
  • Deals with
  • assignment of objects from the real world to the
    nodes of the sentence structure
  • e.g. (I/Sg/Pat/t (see/Perf/Pred/t) Tom/Sg/Ag/f)
  • see(Mark-Twain[SSN:...],Tom-Sawyer[SSN:...])
    [Time:bef 99/9/27/14:15][Place:39°19'40"N 76°37'10"W]

11
Phonology
  • (Surface <-> Lexical) Correspondence
  • symbol-based (no complex structures)
  • En. (stem-final change)
  • lexical: b a b y + s (+ denotes start of ending)
  • surface: b a b i e s (phonetic-related: bébì0s)
  • Arabic (interfixing, inside-stem doubling) (lit.
    read)
  • lexical: kTb + uu + CVCCVC (CVCC...:
    vowel/consonant pattern)
  • surface: kuttub

12
Phonology Examples
  • German (umlaut) (Satz ~ sentence)
  • lexical: s A t z + e (A denotes umlautable a)
  • surface: s ä t z e (phonetic: zæc?, vs. zac)
  • Turkish (vowel harmony)
  • lexical: e v + l A r (houses), b a š + l A r
    (heads)
  • surface: e v l e r, b a š l a r
  • Czech (e-insertion, palatalization)
  • lexical: m a t E K + 0 (mothers/gen.), m a t E K
    + e (mother/dat.)
  • surface: m a t e k, m a t c e
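
The lexical-to-surface correspondences on the two phonology slides can be sketched as small rewrite functions. This is only an illustration under assumptions: the "+" boundary symbol, the function names and the vowel inventories are chosen here for the example and are not taken from the slides.

```python
def english_plural_surface(lexical: str) -> str:
    """Map a lexical form such as 'baby+s' to a surface form.

    Assumption: '+' marks the start of the ending, and a stem-final
    'y' after a consonant becomes 'ie' before the ending 's'.
    """
    stem, _, ending = lexical.partition("+")
    if (ending == "s" and len(stem) > 1 and stem.endswith("y")
            and stem[-2] not in "aeiou"):
        return stem[:-1] + "ies"
    return stem + ending


def turkish_plural_surface(lexical: str) -> str:
    """Resolve the abstract vowel 'A' in the ending by vowel harmony."""
    front, back = "eiöü", "aıou"  # assumed vowel classes
    stem, _, ending = lexical.partition("+")
    last_vowel = next((ch for ch in reversed(stem) if ch in front + back), None)
    harmonic = "e" if last_vowel is not None and last_vowel in front else "a"
    return stem + ending.replace("A", harmonic)


print(english_plural_surface("baby+s"))   # babies
print(turkish_plural_surface("ev+lAr"))   # evler
print(turkish_plural_surface("baš+lAr"))  # bašlar
```

Real two-level systems state such rules declaratively and compile them to finite-state transducers; the hand-written functions above only mimic the effect for these three examples.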

13
Parts of Speech and Morphology
  • Parts of Speech correspond to syntactic or
    grammatical categories such as noun, verb,
    adjective, adverb, pronoun, determiner,
    conjunction, and preposition.
  • Word categories are systematically related by
    morphological processes such as the formation of
    plural form from the singular form.
  • The major types of morphological processes are
    inflection, derivation and compounding.

14
Parts of Speech
  • Correspond to syntactic or grammatical categories
    such as noun, verb, adjectives, prepositions.
  • Word categories are systematically related by
    morphological processes such as the formation of
    plural form from the singular form, past tense
    from present tense.

15
The Parts of Speech
  • Noun: refers to entities like people, places,
    things or ideas.
  • Pronoun: a word that takes the place of a noun.
  • Proper noun: a name.
  • Determiner: describes the particular reference
    of a noun.
  • Adjective: describes the properties of nouns or
    pronouns.
  • Verb: describes the action in a sentence.
  • Adverb: describes a verb, an adjective or
    another adverb.
  • And many more

16
POS Labeling
  • Children (NOUN) eat (VERB) sweet(ADJECTIVE)
    candy(NOUN)
  • The(ARTICLE) children(NOUN) ate(VERB)
    the(ARTICLE) cake(NOUN)
  • The(ARTICLE) news(NOUN) has(AUXILIARY)
    been(MAIN VERB) quite(ADVERB)
    sad(ADJECTIVE) in(PREPOSITION) fact(NOUN)
    .(PERIOD)
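
The labeling above can be imitated with a plain dictionary lookup. The sketch below is a toy under assumptions: the lexicon holds only a few words from the first two example sentences, and the names LEXICON and label are made up for this illustration.

```python
# Toy lexicon: one label per word form (an assumption; real words are
# often ambiguous between several parts of speech).
LEXICON = {
    "children": "NOUN", "eat": "VERB", "ate": "VERB",
    "sweet": "ADJECTIVE", "candy": "NOUN",
    "the": "ARTICLE", "cake": "NOUN",
}


def label(sentence: str) -> list:
    """Attach a part-of-speech label to every whitespace-separated word."""
    return [(w, LEXICON.get(w.lower(), "UNKNOWN")) for w in sentence.split()]


for word, pos in label("The children ate the cake"):
    print(f"{word}({pos})", end=" ")
print()
```

A lookup like this cannot decide between competing labels for the same form, which is why tagging in practice needs context.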

17
Morphology: Morphemes & Order
  • Handles what is an isolated form in written text
  • Grouping of phonemes into morphemes
  • sequence deliverables → deliver, able and s (3
    units)
  • could as well be some ID numbers
  • e.g. deliver 23987, s 12, able 3456
  • Morpheme Combination
  • certain combinations/sequencing possible, others
    not
  • deliver+able+s, but not able+derive+s; noun+s,
    but not noun+ing
  • typically fixed (in any given language)
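
Grouping a written form into morphemes can be sketched as a dictionary-driven segmentation. The morpheme IDs below are the ones quoted on the slide; the function name and the longest-prefix-first strategy are assumptions made for this example.

```python
# Morpheme inventory with the ID numbers quoted on the slide.
MORPHEME_IDS = {"deliver": 23987, "able": 3456, "s": 12}


def segment(word, lexicon=MORPHEME_IDS):
    """Split word into known morphemes, or return None if impossible.

    Tries the longest matching prefix first and backtracks on failure.
    """
    if word == "":
        return []
    for end in range(len(word), 0, -1):
        prefix = word[:end]
        if prefix in lexicon:
            rest = segment(word[end:], lexicon)
            if rest is not None:
                return [prefix] + rest
    return None


morphemes = segment("deliverables")
print(morphemes)                             # ['deliver', 'able', 's']
print([MORPHEME_IDS[m] for m in morphemes])  # [23987, 3456, 12]
```

Note that this sketch checks only that each piece is a known morpheme. It does not enforce the ordering constraints mentioned on the slide, so an ill-formed sequence of known morphemes would also be accepted.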

18
Morphology: From Morphemes to Lemmas & Categories
  • Lemma: lexical unit, pointer to lexicon
  • might as well be a number, but typically is
    represented as the base form, or dictionary
    headword
  • possibly indexed when ambiguous/polysemous
  • state1 (verb), state2 (state-of-the-art), state3
    (government)
  • from one or more morphemes (root, stem,
    root+derivation, ...) (derivation vs.
    inflection)
  • Categories: non-lexical
  • small number of possible values (< 100, often <
    5-10)

19
Morphology Level: The Mapping
  • Formally: A+ → 2^(L,C1,C2,...,Cn)
  • A is the alphabet of phonemes (A+ denotes any
    non-empty sequence of phonemes)
  • L is the set of possible lemmas, uniquely
    identified
  • Ci are morphological categories, such as
  • grammatical number, gender, case
  • person, tense, negation, degree of comparison,
    voice, aspect, ...
  • tone, politeness, ...
  • part of speech (not quite morphological category,
    but...)
  • 2^(L,C1,C2,...,Cn) denotes the power set of
    (L,C1,C2,...,Cn)
  • A, L and Ci are obviously language-dependent
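
The mapping from a word form to a set of (lemma, categories) tuples can be sketched as a function returning a set. The table entries, the choice of two categories (part of speech and number) and the name analyze are illustrative assumptions, not part of the slides.

```python
# Toy analysis table: each form maps to all of its readings.
ANALYSES = {
    "flies": {("fly", "noun", "plural"), ("fly", "verb", "singular")},
    "dogs": {("dog", "noun", "plural")},
}


def analyze(form: str) -> frozenset:
    """Return the set of (lemma, POS, number) readings of a word form.

    The result is an element of the power set of L x C1 x C2, so an
    ambiguous form yields several tuples and an unknown form none.
    """
    return frozenset(ANALYSES.get(form, ()))


print(sorted(analyze("flies")))
print(analyze("unknownword"))
```

Returning a set rather than a single tuple is what makes the morphology level non-disambiguated: choosing among the readings is left to a later level.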

20
The Dictionary (or Lexicon)
  • Repository of information about words
  • Morphological
  • description of morphological behavior:
    inflection patterns/classes
  • Syntactic
  • Part of Speech
  • relations to other words
  • subcategorization (or surface valency frames)
  • Semantic
  • semantic features
  • valency frames
  • ...and any other! (e.g., translation)

21
The Categories: Part of Speech, Open and Closed
Categories
  • Part of Speech - POS (pretty much stable set
    across languages)
  • not so much morphological (can be looked up in a
    dictionary), but
  • morphological behavior is typically consistent
    within a POS category
  • Open categories (open to additions)
  • verb, noun, pronoun, adjective, numeral, adverb
  • subject to inflection (in general); subject to
    cross-category derivations
  • newly coined words always belong to open POS
    categories
  • potentially unlimited number of words
  • Closed categories
  • preposition, conjunction, article, interjection,
    clitic, particle
  • not a base for derivation (possibly only by
    compounding)
  • finite and (very) small number of words

22
The Categories: Part of Speech, Open Categories: Verbs
  • Verbs
  • infl. categories: person, number, tense, voice,
    aspect, gender, neg., ...
  • syntactic/semantic classification
  • ordinary: (to) speak, (to) write
  • auxiliaries: be, have, will, would, do, go
    (going)
  • modals: can, could, may, should, must, want
  • phasal: begin, end, start
  • morphological classification
  • conjugation type: regular/irregular (Ge.
    weak/strong/irregular)
  • conjugation class (Cz. 5 classes 100
    combinations)

23
The Categories: Part of Speech, Open Categories: Nouns
  • Nouns: infl. categories: number, gender, case,
    negation, ...
  • semantic classification
  • human/animal/(non-living) things:
    driver/bird/stone
  • concrete/abstract: computer/thought
  • common/proper: table/Hopkins
  • syntactic classification: countable/uncountable:
    book, water
  • morphological classification
  • pluralia/singularia tantum: data (is), police
    (are)
  • declension type (pattern or class) (Cz. 14
    basic patterns, plus deviations 300 patterns,
    irregular inflection)
  • adverbial nouns: afternoon, home, east (no
    inflection)

24
The Categories: Part of Speech, Open Categories: Pronouns
  • Pronouns: infl. categories: number, gender, case,
    negation, person
  • much like nouns (syntactic usage also similar)
  • (pro)noun: stands for a noun
  • classification (mostly syntactic/semantic)
  • personal: I, you, he, she, it, we, you, they
  • demonstrative: this, that
  • possessive: my, your, her, his, its, our, their;
    mine, yours, ours, ...
  • reflexive: myself, yourself, herself, ..., oneself
  • interrogative: what, which, who, whom, whose,
    that
  • indefinite (nominal): somebody, something, one
  • morphological classification: mostly
    idiosyncratic pattern

25
The Categories: Part of Speech, Open Categories: Adjectives
  • Adjectives
  • infl. categories: degree of comp., number,
    gender, case, negation
  • classification
  • ordinary: new, interesting, test (equipment)
  • possessive: John's, driver's
  • proper: Appalachian (Mountains)
  • often derived from verbs/nouns: teaching
    (assistant), trendy, stylish
  • morphological classification
  • mostly regular declension (Cz. 4 basic patterns,
    10 total)
  • degrees of comparison (En. big, bigger, biggest)
  • but large number of forms (agreement, cf.
    section on syntax)

26
The Categories: Part of Speech, Open Categories: Adverbs
  • Adverbs: infl. categories: degree of comp.,
    negation
  • open cat.: regular derivation from adjectives
    common
  • new → newly, interesting → interestingly
  • non-derived adverbs
  • ordinary: so, well, just, too, then, often, there
  • wh-adverbs (interrogative): why, when, where, how
  • degree adverbs/qualifiers: very, too
  • morphological classification (not much,
    really...)
  • degree of comparison: well, better, best
  • soon, sooner (other lang.: all 3 degrees regular)

27
The Categories: Part of Speech, Open Categories: Numerals
  • Numerals: infl. categories: number, gender, case,
    negation
  • open cat.: compounding (Ge. einundzwanzig, 21)
  • classification
  • cardinals: one, five, hundred
  • NB: million etc. often considered noun
  • ordinals/fractionals: first, second, thirtieth
  • quantifiers: all, many, some, none
  • multiplicative: times, twice (Cz.
    dvaadvacetkrát, 22-times)
  • multilateral: single, triple, twofold
  • morphological classification: as
    nouns/adjectives, many irreg.

28
The Categories: Part of Speech, Closed Categories
  • Closed categories: preposition, conjunction,
    article, interjection, clitic, particle
  • Morphological behavior: indeclinable (no
    declension, no conjugation)
  • preposition: of, without, by, to
  • conjunction
  • coordinating: and, but, or, however
  • subordinating: that, if, because, before, after,
    although, as
  • article: a, the
  • interjection: wow, eh, hello
  • clitic: 's (may be attached to whole phrases, at
    the end)
  • particle: yes, no, not; to (+verb)
  • many (otherwise) prepositions, if part of phrasal
    verbs, e.g. (look) up

29
The Categories: Number and Gender
  • Grammatical Number: Singular, Plural
  • nouns, pronouns, verbs, adjectives, numerals
  • computer / computers; (he) goes / (they) go
  • In some languages (Czech): Dual (nouns, pronouns,
    adjectives)
  • (Pl.) nohami / (Dl.) nohama (Cz. (by) legs (of
    sth) / (by) legs (of sb))
  • Grammatical Gender: Masculine, Feminine, Neuter
  • nouns, pronouns, verbs, adjectives, numerals
  • he/she/it; читал, читала, читало (Ru.
    (he/she/it) was-reading)
  • nouns (mostly) do not change gender for a single
    lexical unit
  • Also animate/inanimate (gram., some genders),
    etc.
  • Mädchen (Ge. girl, neuter); děti (Cz. children,
    masc. inanim.)

30
The Categories: Case
  • Case
  • English: only personal pronouns/possessives, 2
    forms
  • other languages: 4 (German), 6 (Russian), 7
    (Czech, Slovak, ...)
  • nouns, pronouns, adjectives, numerals
  • most common cases (forms in singular/plural;
    Cz. třída = class)
  • nominative: I/we (work); třída/třídy
  • genitive: (picture of) me/us; třídy/tříd
  • dative: (give to) me/us; třídě/třídám
  • accusative: (see) me/us; třídu/třídy
  • vocative: -/-; třído/třídy
  • locative: (about) me/us; třídě/třídách
  • instrumental: (by) me/us; třídou/třídami

31
The Categories: Person, Tense
  • Person
  • verbs, personal pronouns
  • 1st, 2nd, 3rd: (I) go, (you) go, (he) goes; (we)
    go, (you) go, (they) go
  • jdu, jdeš, jde; jdeme, jdete, jdou (Cz.)
  • Tense: English / (Cz. go) / (Pol. go)
  • past: (you) went / - / szliście
  • present: (you pl.) go / jdete / idziecie
  • future (!if not analytical): - / půjdete / -
  • concurrent (gerund): going / jda / idąc
  • preceding: - / - / szedłszy

33
Note on Tense
  • Grammars: more (syntactic/semantic) tenses
  • but morphology handles isolated words → some
    tenses can be defined and handled only at an
    upper level (surface syntax)
  • Examples of (traditional) tense (synthetical and
    analytical)
  • infinitive: (to) write (tenseless, personless,
    ..., except negation (Cz.))
  • simple present/past: (I) write/(she) writes;
    (I, she) wrote
  • progressive present/past: (I) am writing; (I) was
    writing
  • perfect present/past: (I) have written; (I) had
    written
  • all in passive voice (cf. later), too
  • (the book) is being/has been/had been written
    etc.
  • all in conditional mood, too (mood in Eng. not a
    morph. category!)
  • (the book) would have been written

34
The Categories: Voice & Aspect
  • Voice
  • active vs. passive
  • (I) drive / (I am being) driven
  • (Ich) setzte (mich) / (Ich bin) gesetzt (Ge. to
    sit down)
  • Aspect
  • imperfective vs. perfective
  • покупал / купил (Ru. I used to buy, I was
    buying / I (have) bought)
  • imperfective: continuous vs. iterative (repeating)
  • spal / spával (Cz. I was sleeping / I used to
    sleep (every ...))

35
The Categories: Negation, Degree of Comparison
  • Negation
  • even in English: impossible (~ not possible)
  • Cz.: every verb, adjective, adverb, some nouns;
    prefix ne-
  • Degree of Comparison (non-analytical)
  • adjectives, adverbs
  • positive (big), comparative (bigger), superlative
    (biggest)
  • Pol. (new): nowy, nowszy, najnowszy
  • Combination (by prefixing)
  • order? both possible (neg.: Cz./Pol. ne-/nie-,
    sup.: nej-/naj-)
  • Cz. nejnemožnější (the most impossible)
  • Pol. nienajwierniejszy (the most unfaithful)

36
Typology of Languages
  • By morphological features
  • Analytical: using (function) words to express
    categories
  • English, also French, Italian, ..., Japanese,
    Chinese
  • I would have been going ~ (Pol.) szłabym
  • Inflective: using prefix/suffix/infix, combines
    several categ.
  • Slavic: Czech, Russian, Polish, ... (not
    Bulgarian); also French, German; Arabic
  • (Cz. new (acc.)) novou (Adj, Fem., Sg., Acc.,
    Non-neg., Pos.)
  • Agglutinative: one category per (non-lexical)
    morpheme
  • Finnish, Turkish, Hungarian
  • (Fin. plural) -i-

37
Categories & Tags
  • Tagset
  • list of all possible combinations of category
    values for a given language
  • T ⊂ C1 × C2 × ... × Cn
  • typically a string of letters & digits
  • compact system: short idiosyncratic
    abbreviations
  • NNS (gen. noun, plural)
  • positional system: each position i corresponds to
    Ci
  • AAMP3----2A---- (gen. Adj., Masc., Pl., 3rd case
    (dative), comparative (2nd degree of comparison),
    Affirmative (no negation))
  • tense, person, variant, etc.: N/A (marked by
    empty position, or -)
  • Famous tagsets: Brown, Penn, Multext-East, ...
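
A positional tag can be decoded by pairing each character with the category assigned to its position. In the sketch below the 15 position names are an assumption modeled on Prague-style positional tagsets; the slide itself only confirms what the filled positions of the example mean (part of speech, gender, number, case, degree, negation).

```python
# Assumed names for the 15 positions of the tag.
POSITIONS = [
    "pos", "subpos", "gender", "number", "case",
    "possgender", "possnumber", "person", "tense", "grade",
    "negation", "voice", "reserve1", "reserve2", "variant",
]


def decode(tag: str) -> dict:
    """Return the non-empty category values of a positional tag."""
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} positions, got {len(tag)}")
    # '-' marks a category that does not apply to this word form.
    return {name: value for name, value in zip(POSITIONS, tag) if value != "-"}


print(decode("AAMP3----2A----"))
```

The fixed width makes such tags easy to filter by position, for example selecting every form whose fifth character is 3 to find all datives.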

38
Words Syntactic Functions
  • Typically, nouns refer to entities in the world
    like people, animals and things.
  • Determiners describe the particular reference of
    a noun and adjectives describe the properties of
    nouns.
  • Verbs are used to describe actions, activities
    and states.
  • Adverbs modify a verb in the same way as
    adjectives modify nouns. Prepositions are
    typically small words that express spatial or
    time relationships. Prepositions can also be used
    as particles to create phrasal verbs.
    Conjunctions and complementizers link two words,
    phrases or clauses.

39
Syntax or Phrase Structure: A simple context-free
grammar
The Grammar
  • S --> NP VP
  • NP --> AT NNS | AT NN | NP PP
  • VP --> VP PP | VBD | VBD NP
  • PP --> IN NP
The Lexicon
  • AT --> the
  • NNS --> children | students | mountains
  • VBD --> slept | ate | saw
  • IN --> in | of
  • NN --> cake
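
The grammar and lexicon of this slide can be tried out with a small chart recognizer. This is a sketch under assumptions: the prepositional rule is read as PP --> IN NP, alternatives on a rule line are read as separate rules, and the CYK-style chart is one possible technique chosen here, not something the slides prescribe.

```python
from itertools import product

# Binary rules A --> B C, indexed by the right-hand side (B, C).
BINARY = {
    ("NP", "VP"): {"S"},
    ("AT", "NNS"): {"NP"}, ("AT", "NN"): {"NP"}, ("NP", "PP"): {"NP"},
    ("VP", "PP"): {"VP"}, ("VBD", "NP"): {"VP"},
    ("IN", "NP"): {"PP"},
}
UNARY = {"VBD": {"VP"}}  # VP --> VBD
LEXICON = {
    "the": {"AT"}, "children": {"NNS"}, "students": {"NNS"},
    "mountains": {"NNS"}, "slept": {"VBD"}, "ate": {"VBD"},
    "saw": {"VBD"}, "in": {"IN"}, "of": {"IN"}, "cake": {"NN"},
}


def recognize(words):
    """Return True if the word list can be derived from S."""
    n = len(words)
    chart = {}  # (start, end) -> set of categories spanning words[start:end]
    for i, w in enumerate(words):
        cats = set(LEXICON.get(w, ()))
        for c in list(cats):
            cats |= UNARY.get(c, set())
        chart[i, i + 1] = cats
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            cats = set()
            for j in range(i + 1, k):
                for b, c in product(chart[i, j], chart[j, k]):
                    cats |= BINARY.get((b, c), set())
            chart[i, k] = cats
    return n > 0 and "S" in chart[0, n]


print(recognize("the children ate the cake".split()))  # True
print(recognize("children the ate".split()))           # False
```

A bottom-up chart is used because the rules NP --> NP PP and VP --> VP PP are left-recursive, which would send a naive top-down recursive-descent parser into an infinite loop.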
40
Syntax or Phrase Structure: A Parse Tree
41
A Simple Context-Free Grammar
  • The Grammar rules
  • S -> NP V
  • NP -> N
  • The Lexicon
  • N -> John | Gaurav | Ram
  • V -> walks | talks | eats | went ..
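
Because this grammar has no recursion, its whole language can be enumerated. The sketch below assumes the lexicon contains only the words listed on the slide (the slide's list of verbs is open-ended); the function name is made up for the example.

```python
from itertools import product

# Lexicon as listed on the slide; the verb list there is truncated.
N = ["John", "Gaurav", "Ram"]
V = ["walks", "talks", "eats", "went"]


def generate():
    """Enumerate every sentence derivable via S -> NP V and NP -> N."""
    return [f"{noun} {verb}" for noun, verb in product(N, V)]


sentences = generate()
print(len(sentences))  # 12
print(sentences[:3])   # ['John walks', 'John talks', 'John eats']
```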

42
Tag Sets
  • A tag indicates one of the conventional parts of
    speech.
  • Different tag sets have been used, e.g., the
    Brown Tag Set and the Penn Treebank Tag Set.
  • Tag examples: NP (proper noun), NN (singular
    noun), AT (article), DET (determiner).

43
Stochastic Grammars
  • Grammars obtained by adding probabilities in a
    fairly transparent way to algebraic (i.e.,
    non-probabilistic) grammars.
  • Stochastic grammars supplement underlying
    algebraic grammars.
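
Adding probabilities to an algebraic grammar can be sketched by attaching a number to each rule and multiplying along a derivation. The rules below reuse the simple context-free grammar from the earlier slide; every probability value is invented for this illustration, and lexical rules are left out to keep it short.

```python
import math

# (left-hand side, right-hand side) -> probability. The values are
# made up; for each left-hand side they sum to 1.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("AT", "NNS")): 0.5,
    ("NP", ("AT", "NN")): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("VP", ("VBD",)): 0.4,
    ("VP", ("VBD", "NP")): 0.5,
    ("VP", ("VP", "PP")): 0.1,
}


def derivation_probability(rules):
    """Probability of a derivation: the product of its rule probabilities."""
    return math.prod(PCFG[rule] for rule in rules)


# Derivation of "the children slept", ignoring the lexical rules.
derivation = [
    ("S", ("NP", "VP")),
    ("NP", ("AT", "NNS")),
    ("VP", ("VBD",)),
]
print(derivation_probability(derivation))
```

Dropping the numbers gives back the underlying algebraic grammar unchanged, which is the sense in which the probabilities only supplement it.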

44
Dependencies
  • Local dependency: a dependency between two words
    expressed within the same syntactic rule
    (n-grams model this well).
  • Non-local dependency: an instance in which two
    words can be syntactically dependent even though
    they occur far apart in a sentence.

45
Ambiguities
  • (1) Children eat sweet candy.
  • (2) Too much boiling will candy the molasses.
  • In sentence (1) candy is a noun, while in (2) it
    is a verb.
  • Word category (POS) ambiguity needs to be
    resolved.

46
Ambiguities (Cont.)
  • Semantic roles: determining thematic roles in a
    sentence.
  • Agent, Patient, Experiencer, Instrument, Goal.
  • Raju (AGENT) hit us (PATIENT) with a ball
    (INSTRUMENT).
  • Complicated by the notions of direct and indirect
    object, active and passive voice.

47
Ambiguities (Cont.)
  • Attachment ambiguities occur with phrases that
    could have been generated by two different nodes
    in the parse tree. E.g.: saw the man in the house
    with a pole.
  • Rare usage and spurious usage: A hectare is a
    hundred ares.

48
Garden-Path Sentences
  • Garden-Path sentences are sentences that lead you
    along a path that suddenly turns out not to work.

    E.g. The horse raced past the barn fell.

49
Local and Non-Local Dependencies
  • A local dependency is a dependency between two
    words expressed within the same syntactic rule.
  • A non-local dependency is an instance in which
    two words can be syntactically dependent even
    though they occur far apart in a sentence (e.g.,
    subject-verb agreement; long-distance
    dependencies such as wh-extraction).
  • Non-local phenomena are a challenge for certain
    statistical NLP approaches (e.g., n-grams) that
    model local dependencies.
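
Why n-grams capture only local dependencies can be seen in a tiny bigram model. The three-sentence corpus and the function name below are invented for this sketch; probabilities are plain relative frequencies without smoothing.

```python
from collections import Counter


def bigram_model(sentences):
    """Build a function prob(prev, word) from relative frequencies."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        contexts.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))

    def prob(prev, word):
        if contexts[prev] == 0:
            return 0.0
        return bigrams[prev, word] / contexts[prev]

    return prob


prob = bigram_model([
    "the dog barks",
    "the dogs bark",
    "the dogs in the park bark",
])
print(prob("dogs", "bark"))   # 0.5: adjacent subject and verb are modeled
print(prob("park", "barks"))  # 0.0: unseen only because of this tiny corpus
```

In the third sentence the verb agrees with the plural subject dogs, but the model conditions the verb on park, the word immediately before it. With more data containing "the park barks"-like sequences it would happily prefer the wrong verb form, since the subject lies outside its window.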

50
The Place of Syntax
  • Between Morphology and Meaning
  • Morphology provides/expects
  • lemmas (now it's time to extract syntactic
    information from a dictionary)
  • tags (Part-of-Speech and combination of
    morphological categories, such as number, case,
    tense, voice, ...)
  • and of course, we also have word order now to
    look at/provide
  • Typically multiple input (non-disambiguated
    morphology) / output (multiple syntactic
    structures, non-disambiguated)

51
Words, Phrases, Clauses, Sentences
  • Words
  • smallest units on the syntax level
  • function/autosemantic
  • Phrases
  • consist of words and/or phrases (constituents)
  • Clauses
  • have predicative meaning (single predicate)
  • Sentences
  • consist of clauses (one or more)

52
Words
  • Words
  • lexical units
  • auxiliary (function) words have grammatical
    function
  • autosemantic words (lexical words)
  • idioms
  • fixed phrases (non-compositional) -> words
  • Relate to other words
  • dictionary: repository of information for each
    word about its (idiosyncratic) relations to
    other words

53
Phrases
  • Phrases
  • sequences of words and/or phrases (i.e. of
    constituents)
  • may be discontinuous, sometimes
  • Types of Phrases
  • Simple/Clausal (i.e. clauses, which consist of
    phrases, behave like phrases... recursively!)
  • According to head type
  • Noun: a new book
  • Adjective: brand new
  • Adverbial: so much
  • Prepositional: in a class
  • Verb: catch a ball

54
Noun Phrases
  • Head noun
  • water
  • a book
  • new ideas
  • that small village
  • The greatest rise of interest rates since W.W.II
    within a single year
  • an operating system which, despite great efforts
    on the part of our administrators, fails all too
    often

55
Adjective Phrases
  • Head adjective
  • Simple APs very common, complex APs rare
  • old
  • very old
  • really very old
  • five times older than the oldest elephant in our
    ZOO
  • (was) sure, as far as I know, to be there first

56
Adverbial and Numerical Phrases
  • Head adverb
  • three times as much
  • quickly
  • really
  • (... speaks) more loudly than anybody could
    imagine
  • yesterday
  • Numerical Phrases
  • (... lasted) three hours
  • twenty-two

57
Prepositional Phrases
  • Head preposition
  • In fact, play the role of Adverbial Phrases often
  • in the City
  • at five o'clock
  • to a brightest future
  • without a glitch
  • to the point where neither of them could get out
    of it
  • up to five points
  • instead of Charles

58
Verb Phrases
  • Head verb
  • (It) rains
  • ... could ever see a large Unidentified Flying
    Object
  • ..., why (we) have got so much rain
  • Please!
  • On Sunday, (he) was driven to the hospital
  • (It) began to snow
  • (...) prohibits smoking in this area

59
Coordination of Phrases
  • Head conjunction, punctuation
  • and, or, but
  • cats and dogs
  • new or even newer
  • quickly and precisely
  • he came to the conclusion that it makes no sense
    to hide himself anymore and therefore we could
    hear him today
  • (trains) from and to Baltimore
  • eat your lunch now or at the picnic table

60
Ellipsis
  • Word or phrase missing where one would normally
    expect one; often happens in dialogues
  • Whom did you see there?
  • Peter. (verb missing)
  • Most common in coordination (written text)
  • Pittsburgh leads 4-0 but Detroit only 3-1.
    (verb missing in 2nd part)
  • Systematic in many languages: pro-drop (leave out
    a pers. pronoun in the Subject position)
  • (She) passed the exam easily.

61
Clauses
  • Predicative function
  • some activity of some subjects/objects, somewhere
    in time, under certain circumstances
  • Main clause
  • not part of a greater clause
  • Embedded clause
  • part of other clause, having some function (like
    a phrase)
  • Function of a Clause
  • same as for phrase, plus some (direct
    speech/discourse etc.)

62
Gaps (Non-Continuous Constituents)
  • Constituent moves from the expected position
  • happens in questions and relative clauses
  • Who(m) do you work for <gap: whom>?
  • strictly speaking, do you work should be you (do
    work)
  • I don't know why we have got so much rain
    <gap: why>?
  • On Sundays, I usually work <gap: On Sundays> but I
    stay home on Tuesdays.
  • The story he never wrote <gap: the story>
  • And finally the car she was supposed to use
    <gap: the car> for her trip to New York broke.
  • The last two also could be considered ellipsis
    (which) plus a gap.

63
Sentences
  • Consist of a single or several main clauses
  • If several main clauses
  • coordination, much like coordinated phrases
  • more coordinating conjunctions
  • and, or, but, (and) therefore, ...
  • In written text, starts with a capital letter
  • Ends by period/question mark/exclamation mark
  • not all periods end a sentence!
  • Sometimes even a semicolon (;) might be a sentence
    break (...vague)

64
Syntax Representation
  • Tree structure (tree in the sense of graph
    theory)
  • one tree per sentence
  • Two main ideas for the shape of the tree
  • phrase structure (~ derivation tree, cf. parsing
    later)
  • using bracketed grouping
  • brackets annotated by phrase type
  • heads (often) explicitly marked
  • dependency structure (lexical relations local,
    functions)
  • basic relation head (governor) - dependent
  • links (edges) annotated by syntactic function
    (Sb, Obj, ...)
  • phrase structure implicitly present (but 1:n
    mapping Dep → PS)

65
Phrase Structure Tree
  • Example

66
Dependency Tree
  • Example
  • rose/Pred(shares/Sb(DaimlerChrysler's/Atr),
    eighths/Adv(three/Atr), to/AuxP(22/Adv))
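
The bracketed dependency example can be held in a small tree structure whose edges carry the syntactic function. The class and function names below are assumptions for this sketch, and the word forms are one reading of the flattened example (in particular eighths).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    function: str                      # syntactic function of the edge to the head
    dependents: list = field(default_factory=list)


# rose is the root (Pred); each nested Node depends on its parent.
TREE = Node("rose", "Pred", [
    Node("shares", "Sb", [Node("DaimlerChrysler's", "Atr")]),
    Node("eighths", "Adv", [Node("three", "Atr")]),
    Node("to", "AuxP", [Node("22", "Adv")]),
])


def edges(node):
    """Yield (head, function, dependent) triples in depth-first order."""
    for dep in node.dependents:
        yield (node.word, dep.function, dep.word)
        yield from edges(dep)


for head, function, dependent in edges(TREE):
    print(f"{head} --{function}--> {dependent}")
```

Every word except the root appears exactly once as a dependent, which is what makes the structure a tree in the graph-theoretic sense.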

67
Semantic Roles
  • Most commonly, noun phrases are arguments of
    verbs. These arguments have semantic roles: the
    agent of an action, the patient, and other roles
    such as the instrument or the goal.
  • In English, these semantic roles correspond to
    the notions of subject and object.
  • But things are complicated by the notions of
    direct and indirect object, active and passive
    voice.

68
Subcategorization
  • Different verbs can relate different numbers of
    entities: transitive versus intransitive verbs.
  • Tightly related verb arguments are called
    complements, while less tightly related ones are
    called adjuncts. Prototypical examples of
    adjuncts tell us the time, place, or manner of
    the action or state described by the verb.
  • Verbs are classified according to the type of
    complements they permit. This is called
    subcategorization. Subcategorization captures
    syntactic as well as semantic regularities.

69
Attachment Ambiguity and Garden-Path Sentences
  • Attachment ambiguities occur with phrases that
    could have been generated by two different nodes
    in the parse tree. E.g. The child ate the cake
    with a spoon.
  • Genuinely ambiguous: Fruit flies like a banana.
  • Garden-Path sentences are sentences that lead
    along a path that suddenly turns out not to
    work. E.g. The horse raced past the barn fell.
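
The attachment ambiguity in the cake example can be made concrete by counting parse trees. The sketch reuses the rules of the simple context-free grammar from the earlier slide; the lexicon entries for child, a, spoon and with are assumptions added so that the sentence is covered, and the counting chart is one possible technique.

```python
from collections import defaultdict

# Binary rules as (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "AT", "NN"), ("NP", "NP", "PP"),
    ("VP", "VBD", "NP"), ("VP", "VP", "PP"),
    ("PP", "IN", "NP"),
]
# Assumed lexicon for the example sentence.
LEXICON = {
    "the": "AT", "a": "AT", "child": "NN", "cake": "NN",
    "spoon": "NN", "ate": "VBD", "with": "IN",
}


def count_parses(words, root="S"):
    """Count the distinct parse trees of words rooted in root."""
    n = len(words)
    chart = defaultdict(int)  # (start, end, category) -> number of trees
    for i, w in enumerate(words):
        if w in LEXICON:
            chart[i, i + 1, LEXICON[w]] = 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for parent, left, right in BINARY:
                    chart[i, k, parent] += chart[i, j, left] * chart[j, k, right]
    return chart[0, n, root]


print(count_parses("the child ate the cake with a spoon".split()))  # 2
print(count_parses("the child ate the cake".split()))               # 1
```

The two trees correspond to attaching the prepositional phrase to the verb phrase (eating with a spoon) or to the noun phrase (the cake that has a spoon).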

70
Semantics
  • Semantics is the study of the meaning of words,
    constructions, and utterances.
  • Semantics can be divided into two parts: lexical
    semantics and combination semantics.
  • Lexical semantics: hypernymy, hyponymy, antonymy,
    meronymy, holonymy, synonymy, homonymy, polysemy,
    and homophony.
  • Compositionality: the meaning of the whole is
    assumed to derive from the meanings of its parts,
    but in practice it often differs from them.
  • Idioms correspond to cases where the compound
    phrase means something completely different from
    its parts.

71
Pragmatics
  • Pragmatics is the area of studies that goes
    beyond the study of the meaning of a sentence and
    tries to explain what the speaker really is
    expressing.
  • Understand the scope of quantifiers, speech acts,
    discourse analysis, anaphoric relations.
  • The resolution of anaphoric relations is crucial
    to the task of information extraction.