CC384 Natural Language Engineering

Provided by: massimo90

1
CC384 - Natural Language Engineering
  • Linguistic Essentials
  • What NLE systems do

2
This lecture
  • An overview of basic linguistic terms and
    concepts used in the module
  • A basic architecture for NL systems

3
Levels of Linguistic Analysis
  • Word level
    • Parts of speech: DOG, EAT, RED
  • Sub-word level
    • Phonetics, Phonology
    • Morphology
  • Phrase level (syntax): THE RED DOG, CUTTING
    CORNERS, BY 3 O'CLOCK
  • Semantics
    • E.g., lexical semantics: COURSE / MODULE, DOG /
      ANIMAL
  • Discourse
    • E.g., anaphora: MOST STUDENTS SETTING OFF ON GAP
      YEAR TRIPS WILL NEED THEIR MONEY AND POSSESSIONS
      TO FIT SECURELY AND COMPACTLY INTO A RUCKSACK
      AND MONEYBELT

4
Words: Parts of speech
  • Words belong to different classes
    • The sad / sheep / run / always / most dog
      barked
  • Basic PARTS OF SPEECH
    • NOUNS: dog, man, car, law
    • ADJECTIVES: red, fat, brave
    • VERBS: run, barked
  • Best-known set of part-of-speech TAGS: the Brown tagset
    • NN for nouns
    • VB for verb base forms
    • JJ for adjectives in positive form
  • Notice: many words belong to more than one class
  • Open and closed classes
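To make the last two points concrete, here is a minimal Python sketch of a tagger lexicon that records every class a word can belong to. The word list and tag assignments are a hypothetical toy example using Brown-style tags; a real tagger would build its lexicon from an annotated corpus.

```python
# Hypothetical toy lexicon: word -> set of possible Brown-style tags.
LEXICON: dict[str, set[str]] = {
    "dog": {"NN", "VB"},       # "the dog" / "to dog someone's steps"
    "run": {"NN", "VB"},       # "a run" / "they run"
    "red": {"JJ", "NN"},       # "a red rose" / "dressed in red"
    "sheep": {"NN", "NNS"},    # one form for singular and plural
    "barked": {"VBD"},
    "the": {"AT"},
    "always": {"RB"},
}


def possible_tags(word: str) -> set[str]:
    """Candidate tags for a word; an empty set means the word is unknown."""
    return LEXICON.get(word.lower(), set())


def ambiguous_words(lexicon: dict[str, set[str]]) -> list[str]:
    """Words that belong to more than one class, in alphabetical order."""
    return sorted(word for word, tags in lexicon.items() if len(tags) > 1)


print(ambiguous_words(LEXICON))  # ['dog', 'red', 'run', 'sheep']
print(possible_tags("Barked"))   # {'VBD'}
```

A tagger then has to pick one tag per occurrence from these candidate sets, usually by looking at the neighbouring words.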

5
Nouns and Pronouns
  • Nouns: cat, dog, house, notebook
  • Plurals
    • Regular: dog -> dogs
    • Irregular: deer -> deer, ox -> oxen
  • Case
    • The woman's house
  • Pronouns: I, you, he / she / it, we, they
    • Accusative case: him, her, them
    • Genitive case: his, her, theirs
  • Reflexives
    • Herself
    • Mary saw her in the mirror
    • Mary saw herself in the mirror
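The regular / irregular split for plurals can be sketched as: look the noun up in an exception list first, and only fall back on spelling rules for regular plurals. The exception list and suffix rules below are illustrative assumptions, not a complete account of English plural formation.

```python
# Hypothetical (and deliberately short) exception list for irregular plurals.
IRREGULAR_PLURALS = {"deer": "deer", "ox": "oxen", "sheep": "sheep",
                     "child": "children", "man": "men", "woman": "women"}


def pluralize(noun: str) -> str:
    """Plural of an English noun: check the exceptions, then apply regular rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"                       # box -> boxes, church -> churches
    if noun.endswith("y") and noun[-2:-1] not in "aeiou":
        return noun[:-1] + "ies"                 # city -> cities (but day -> days)
    return noun + "s"                            # dog -> dogs


for n in ["dog", "deer", "ox", "box", "city", "day"]:
    print(n, "->", pluralize(n))
```

Checking exceptions before rules lets the rules cover the open-ended regular cases while the list blocks them for words such as *ox*.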

6
Words that go with nouns: determiners,
adjectives
  • Determiners
    • Articles: THE tree
    • Demonstratives: THIS tree
    • Quantifiers: MANY trees, MOST children, ...
    • Brown tags: AT for articles, DT for singular
      demonstratives (THIS, THAT), DTS for plural ones
      (THESE, THOSE)
  • Adjectives
    • Attributive use: A red rose, many intelligent children
    • Predicative use: That rose is red, Many children
      are intelligent
    • Comparative use: John is richer than Bill
    • Superlative use: John is the trendiest student in
      his class / John is the MOST incompetent mechanic
      I ever met.
    • Brown tags: JJ for positive form, JJR for
      comparatives, JJT for superlatives
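As a minimal sketch of how the three Brown adjective tags might be assigned, the code below checks a word against a hypothetical lexicon of positive forms, undoing the *-er* / *-est* suffixes with a few spelling adjustments. The lexicon and the spelling rules are assumptions for illustration only.

```python
from typing import Optional

# Hypothetical lexicon of adjectives in their positive (base) form.
POSITIVE = {"rich", "trendy", "red", "tall", "fat", "brave", "incompetent"}


def base_candidates(word: str, suffix: str) -> list[str]:
    """Possible base forms if `word` ends in `suffix` ('er' or 'est')."""
    if not word.endswith(suffix):
        return []
    stem = word[: -len(suffix)]
    candidates = [stem, stem + "e"]                 # richer -> rich, braver -> brave
    if stem.endswith("i"):
        candidates.append(stem[:-1] + "y")          # trendiest -> trendy
    if len(stem) > 1 and stem[-1] == stem[-2]:
        candidates.append(stem[:-1])                # fattest -> fat, redder -> red
    return candidates


def brown_adjective_tag(word: str) -> Optional[str]:
    """JJ, JJR or JJT for a form of a known adjective; None otherwise."""
    w = word.lower()
    if w in POSITIVE:
        return "JJ"
    for suffix, tag in (("est", "JJT"), ("er", "JJR")):
        if any(c in POSITIVE for c in base_candidates(w, suffix)):
            return tag
    return None


for w in ["red", "richer", "trendiest", "incompetent"]:
    print(w, brown_adjective_tag(w))
```

Note that in *the MOST incompetent mechanic* the adjective itself only receives JJ: the superlative meaning is carried by the separate word *most*, which a suffix-based check cannot see.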

7
Verbs
  • Used to describe
    • Actions: She threw the stone
    • Activities: She walked along the river
    • States: I have $50
  • Morphological forms
    • Base form: walk
    • 3rd singular present tense: walks
    • Gerund and present participle: walking
    • Past tense, past / passive participle: walked
  • Auxiliaries
    • John has been to Boston
  • Modal auxiliaries
    • You should spend more time with your family
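The four morphological forms of a regular verb can be generated from the base form. A minimal sketch follows, assuming simplified spelling rules; it does not handle irregular verbs (*be*, *have*, *see*) or consonant doubling (*stop -> stopped*).

```python
def regular_verb_forms(base: str) -> dict[str, str]:
    """Inflected forms of a regular English verb (simplified spelling rules)."""
    consonant_y = base.endswith("y") and base[-2:-1] not in "aeiou"
    if base.endswith("e"):
        past = base + "d"                                    # move -> moved
        ing = base + "ing" if base.endswith("ee") else base[:-1] + "ing"  # agreeing / moving
    elif consonant_y:
        past = base[:-1] + "ied"                             # carry -> carried
        ing = base + "ing"                                   # carry -> carrying
    else:
        past = base + "ed"                                   # walk -> walked
        ing = base + "ing"                                   # walk -> walking
    if base.endswith(("s", "x", "z", "ch", "sh")):
        third = base + "es"                                  # watch -> watches
    elif consonant_y:
        third = base[:-1] + "ies"                            # carry -> carries
    else:
        third = base + "s"                                   # walk -> walks
    return {
        "base": base,
        "third_singular_present": third,
        "gerund_present_participle": ing,
        "past_and_past_participle": past,
    }


print(regular_verb_forms("walk"))
```

For regular verbs the past tense and the past / passive participle share one form (*walked*), which is why they appear together in a single bullet above.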

8
Other parts of speech
  • Adverbs (RB)
    • She often travels to Las Vegas
  • Prepositions (IN)
    • In the glass, over the table
  • Particles (RP)
    • He put me off
  • Conjunctions (CC)
    • She bought her car, but she also considered
      leasing it.
    • She bought or leased the car.
    • Give me a peach or an apple.

9
Sub-word level: Morphology
  • Inflection
    • Dog / dogs
    • Eat / eats
  • Derivation
    • Adjectives into adverbs: -ly
      • widely (from wide)
      • But note: difficultly
    • Verbs into adjectives: -able
      • Understandable
  • Compounding
    • Tea kettle
    • Schadenfreude
    • Finnish: rautatieasemassa
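Derivational rules are easy to state but they overgenerate. The sketch below attaches *-ly* and *-able* with a few spelling adjustments (the adjustments are assumptions covering only common cases). Applied to *difficult*, it happily produces *difficultly*, a form English speakers rarely use (*with difficulty* is the usual choice), which is the point the slide flags.

```python
def derive(stem: str, suffix: str) -> str:
    """Naively attach a derivational suffix, with a few spelling adjustments."""
    if suffix == "ly" and stem.endswith("le"):
        return stem[:-1] + "y"             # gentle -> gently
    if suffix == "ly" and stem.endswith("y"):
        return stem[:-1] + "ily"           # happy -> happily
    if suffix == "able" and stem.endswith("e") and not stem.endswith("ee"):
        return stem[:-1] + "able"          # love -> lovable (but agree -> agreeable)
    return stem + suffix                   # wide -> widely, understand -> understandable


print(derive("wide", "ly"))           # widely
print(derive("understand", "able"))   # understandable
print(derive("difficult", "ly"))      # difficultly
```

A practical analyser would therefore pair such rules with a lexicon of attested derived forms rather than trusting the rules alone.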

10
Syntax: Phrase Structure
  • Words are organized in PHRASES
    • I put THE BAGELS in the freezer
    • I put THE BAGELS THAT WE HAD NOT EATEN in the
      freezer
  • Phrases are classified according to their main
    CONSTITUENT, or HEAD
  • Noun phrases
    • the bagels, the homeless old man that I tried to
      help yesterday
    • Mary, she, one of them
  • Verb phrases
    • Mary went to the store and bought a bagel
  • Adjective phrases
    • John is tall / very tall / quite certain to
      succeed
  • Sentences

11
Marking Phrase Constituents
  • BRACKETING
    • [S [NP The children] [VP ate [NP the cake]]]
  • TREES

S
    NP
        AT     the
        NNS    children
    VP
        VBD    ate
        NP
            AT     the
            NN     cake
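Bracketing and trees are two notations for the same structure. As a minimal sketch (the nested-tuple representation is an assumption chosen for illustration, not a standard format), the slide's tree for *the children ate the cake* can be stored as (label, children) pairs and printed back in fully labeled bracket form:

```python
from typing import Union

# A tree is either a word (str) or a (label, list_of_subtrees) pair.
Tree = Union[str, tuple]

TREE = ("S", [
    ("NP", [("AT", ["the"]), ("NNS", ["children"])]),
    ("VP", [("VBD", ["ate"]),
            ("NP", [("AT", ["the"]), ("NN", ["cake"])])]),
])


def bracket(tree: Tree) -> str:
    """Render a tree in labeled-bracket notation."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return "[" + label + " " + " ".join(bracket(c) for c in children) + "]"


def leaves(tree: Tree) -> list[str]:
    """The words of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1] for word in leaves(child)]


print(bracket(TREE))
# [S [NP [AT the] [NNS children]] [VP [VBD ate] [NP [AT the] [NN cake]]]]
print(" ".join(leaves(TREE)))  # the children ate the cake
```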
12
Semantics
  • Lexical semantics
    • (Near) synonyms: COURSE / MODULE
    • Hypernyms: DOG / ANIMAL
  • Compositional semantics
    • John ran
    • Red car
    • Red herring
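The hypernym relation (DOG / ANIMAL) is usually stored as IS-A links that can be followed upward. A minimal sketch with a hypothetical hand-built fragment of such a hierarchy; real systems typically draw these links from a lexical database such as WordNet.

```python
# Hypothetical hand-built fragment of an IS-A (hypernym) hierarchy.
HYPERNYM = {
    "dog": "canine", "canine": "carnivore", "cat": "feline",
    "feline": "carnivore", "carnivore": "mammal", "mammal": "animal",
}


def hypernym_chain(word: str) -> list[str]:
    """Follow IS-A links upward from `word` to the top of the hierarchy."""
    chain = [word]
    while chain[-1] in HYPERNYM:        # assumes the table contains no cycles
        chain.append(HYPERNYM[chain[-1]])
    return chain


def is_a(word: str, ancestor: str) -> bool:
    """True if `ancestor` is `word` itself or one of its hypernyms."""
    return ancestor in hypernym_chain(word)


print(hypernym_chain("dog"))  # ['dog', 'canine', 'carnivore', 'mammal', 'animal']
```

Synonymy (COURSE / MODULE) needs a different relation, for example grouping words into sets of near-equivalents; WordNet calls these sets synsets.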

13
Discourse
  • Anaphora
    • John arrived late. He always does that.
    • My car didn't start this morning. There was some
      problem with the engine fan.
  • Discourse relations
    • My car didn't start this morning BECAUSE there
      was some problem with the engine fan.
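Resolving *He* in the first example means finding an earlier mention that fits. Below is a minimal sketch of one common baseline heuristic: pick the most recent preceding mention that agrees with the pronoun in gender and number. The feature tables are hypothetical, and real resolvers also weigh syntax and salience.

```python
from typing import Optional

# Hypothetical agreement features: (gender, number); gender None = unspecified.
MENTION_FEATURES = {
    "John": ("m", "sg"), "Mary": ("f", "sg"),
    "car": ("n", "sg"), "engine": ("n", "sg"), "students": (None, "pl"),
}
PRONOUN_FEATURES = {
    "he": ("m", "sg"), "she": ("f", "sg"), "it": ("n", "sg"),
    "they": (None, "pl"), "their": (None, "pl"),
}


def agrees(pronoun: str, mention: str) -> bool:
    """Gender must match unless either side leaves it unspecified; number must match."""
    p_gender, p_number = PRONOUN_FEATURES[pronoun]
    m_gender, m_number = MENTION_FEATURES[mention]
    gender_ok = p_gender is None or m_gender is None or p_gender == m_gender
    return gender_ok and p_number == m_number


def resolve(pronoun: str, previous_mentions: list[str]) -> Optional[str]:
    """Most recent earlier mention that agrees with the pronoun, or None."""
    for mention in reversed(previous_mentions):
        if mention in MENTION_FEATURES and agrees(pronoun, mention):
            return mention
    return None


print(resolve("he", ["John", "car"]))  # John
print(resolve("it", ["John", "car"]))  # car
```

This resolves *He* to *John*, but it cannot handle *that*, which refers to an event (arriving late) rather than to a noun phrase.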

14
Levels of linguistic processing: the basic
pipeline of an NLP system (e.g., GATE)
15
Example: processing a query to a web search engine
Query: List the estate agents in Stratford, London.
[Figure: PREPROCESSING (TOKENIZATION) -> LEXICAL PROCESSING (POS TAGGING) -> SYNTACTIC PROCESSING (TERM IDENTIFICATION, STOP WORDS) -> SEMANTIC PROCESSING (SYNONYMS) -> WEB ACCESS]
16
Preprocessing: tokenizing, conversion to a
standard format (e.g., XML)
List the estate agents in Stratford, London
[Figure: PARAGRAPH MARKUP -> TOKENIZER]
<W C='W'>List</W> <W C='W'>the</W> <W C='W'>estate</W>
<W C='W'>agents</W> <W C='W'>in</W> <W C='W'>Stratford</W>
<W C='W'>,</W> <W C='W'>London</W>
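A minimal sketch of the tokenizing step, assuming a simple regular-expression tokenizer (not GATE's actual tokenizer) and the `<W C='W'>` markup used on this slide:

```python
import re
from xml.sax.saxutils import escape

# A word (letters/digits, optionally with one internal apostrophe) or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def to_markup(tokens: list[str], cls: str = "W") -> str:
    """Wrap each token in a <W C='...'> element, escaping XML special characters."""
    return " ".join(f"<W C='{cls}'>{escape(t)}</W>" for t in tokens)


tokens = tokenize("List the estate agents in Stratford, London")
print(tokens)
# ['List', 'the', 'estate', 'agents', 'in', 'Stratford', ',', 'London']
print(to_markup(tokens[:2]))  # <W C='W'>List</W> <W C='W'>the</W>
```

This naive pattern would split *Corp.* into *Corp* and *.*; the LASIE example later keeps *Corp.* as one token, which needs something like an abbreviation list.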
17
Processing Steps
  • LEXICAL PROCESSING
    • POS TAGGING
      • THE -> THE/DT, ESTATE -> ESTATE/NN
    • STEMMING / LEMMATIZATION
      • AGENTS -> AGENT (or even AGENT+N+PL)
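A minimal sketch of both lexical-processing steps, assuming a hypothetical lookup table in place of a real tagger and lemmatizer. Unknown capitalized words default to NNP here; real POS taggers use the surrounding context to choose among candidate tags, and a full morphological analyser would return something like AGENT+N+PL rather than just the lemma.

```python
# Hypothetical toy lexicon: lowercase word -> (tag, lemma).
TAG_LEMMA = {
    "list": ("VB", "list"), "the": ("DT", "the"), "estate": ("NN", "estate"),
    "agents": ("NNS", "agent"), "in": ("IN", "in"), ",": ("CM", ","),
}


def tag_and_lemmatize(tokens: list[str]) -> list[tuple[str, str, str]]:
    """Return (token, tag, lemma) triples using dictionary lookup only."""
    result = []
    for tok in tokens:
        entry = TAG_LEMMA.get(tok.lower())
        if entry is None:
            # Fallback guess: capitalized unknown words are proper nouns.
            entry = ("NNP", tok) if tok[:1].isupper() else ("NN", tok)
        result.append((tok, *entry))
    return result


triples = tag_and_lemmatize(
    ["List", "the", "estate", "agents", "in", "Stratford", ",", "London"])
print(" ".join(f"{w}/{t}" for w, t, _ in triples))
# List/VB the/DT estate/NN agents/NNS in/IN Stratford/NNP ,/CM London/NNP
```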

18
Lexical Processing, I: POS tagging
<W C='VB'>List</W> <W C='DT'>the</W> <W C='NN'>estate</W>
<W C='NNS'>agents</W> <W C='IN'>in</W> <W C='NNP'>Stratford</W>
<W C='CM'>,</W> <W C='NNP'>London</W>
19
Lexical Processing, II: lemmatizing / stemming
<W C='VB'>List</W> <W C='DT'>the</W> <W C='NN'>estate</W>
<W C='NNS'>agent</W> <W C='IN'>in</W> <W C='NNP'>Stratford</W>
<W C='CM'>,</W> <W C='NNP'>London</W>
20
Processing Steps, III: Syntactic Processing
  • Identify TERMS: ESTATE AGENT
  • Remove STOPWORDS (e.g., words tagged as DT, IN,
    VB, ...)

21
Practical (partial) parsing: identifying search
terms, filtering
<SEARCHTERM> <W C='NN'>estate</W> <W C='NN'>agent</W> </SEARCHTERM>
<SEARCHTERM> <W C='NNP'>Stratford</W> </SEARCHTERM>
<BOOL> <W C='CM'>,</W> </BOOL>
<SEARCHTERM> <W C='NNP'>London</W> </SEARCHTERM>
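This syntactic step can be sketched as: drop tokens whose tags are stop-word tags, keep commas as BOOL markers, and group each remaining run of adjacent words into one search term. The input is the tagged, lemmatized query hard-coded as (token, tag, lemma) triples, and the stop-tag set is the slide's example list, not an exhaustive one.

```python
STOP_TAGS = {"DT", "IN", "VB"}   # example stop-word tags from the slide

TAGGED = [("List", "VB", "list"), ("the", "DT", "the"), ("estate", "NN", "estate"),
          ("agents", "NNS", "agent"), ("in", "IN", "in"),
          ("Stratford", "NNP", "Stratford"), (",", "CM", ","),
          ("London", "NNP", "London")]


def search_terms(tagged: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """Group runs of non-stop-word lemmas into SEARCHTERMs; keep commas as BOOLs."""
    output: list[tuple[str, str]] = []
    current: list[str] = []

    def close_term() -> None:
        if current:
            output.append(("SEARCHTERM", " ".join(current)))
            current.clear()

    for _token, tag, lemma in tagged:
        if tag in STOP_TAGS:
            close_term()                  # a stop word ends the current term
        elif tag == "CM":
            close_term()
            output.append(("BOOL", ","))  # interpreted later (e.g. as AND)
        else:
            current.append(lemma)         # part of a multi-word term
    close_term()
    return output


print(search_terms(TAGGED))
# [('SEARCHTERM', 'estate agent'), ('SEARCHTERM', 'Stratford'), ('BOOL', ','), ('SEARCHTERM', 'London')]
```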
22
Processing Steps, IV: Semantic Processing
  • QUERY EXPANSION: ESTATE AGENT OR REAL ESTATE

23
Semantic processing: finding synonyms (or, better,
keywords), interpreting stop words
<SEARCHTERM> <W C='NN'>estate</W> <W C='NN'>agent</W> </SEARCHTERM>
<BOOL TYPE='OR'></BOOL>
<SEARCHTERM> <W C='NN'>real</W> <W C='NN'>estate</W> </SEARCHTERM>
<BOOL TYPE='AND'></BOOL>
<SEARCHTERM> <W C='NNP'>Stratford</W> </SEARCHTERM>
<BOOL TYPE='AND'> <W C='CM'>,</W> </BOOL>
<SEARCHTERM> <W C='NNP'>London</W> </SEARCHTERM>
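Query expansion and the interpretation of the comma can be sketched as follows. The synonym table is hypothetical, and the quoted `OR` / `AND` syntax is a generic boolean query format assumed for illustration, not the syntax of any particular search engine.

```python
# Hypothetical synonym table: the slide expands ESTATE AGENT with REAL ESTATE.
SYNONYMS = {"estate agent": ["real estate"]}

ITEMS = [("SEARCHTERM", "estate agent"), ("SEARCHTERM", "Stratford"),
         ("BOOL", ","), ("SEARCHTERM", "London")]


def build_query(items: list[tuple[str, str]]) -> str:
    """OR each search term with its synonyms, then AND the terms together."""
    clauses = []
    for kind, text in items:
        if kind == "SEARCHTERM":
            alternatives = [text] + SYNONYMS.get(text.lower(), [])
            quoted = " OR ".join(f'"{a}"' for a in alternatives)
            clauses.append(f"({quoted})" if len(alternatives) > 1 else quoted)
        # BOOL items (commas) are read as AND, which the join below supplies.
    return " AND ".join(clauses)


print(build_query(ITEMS))
# ("estate agent" OR "real estate") AND "Stratford" AND "London"
```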
24
More advanced examples: Information Extraction
Systems (e.g., LASIE)
25
Preprocessing, I: tokenizing
In July 1995 CEG Corp. posted net of $102
million, or 34 cents a share. Late last night the
company announced a growth of 20%.
[Figure: PARAGRAPH MARKUP -> TOKENIZER]
<P><W C='W'>In</W> <W C='W'>July</W> <W C='CD'>1995</W> <W C='W'>CEG</W>
<W C='W'>Corp.</W> <W C='W'>posted</W> <W C='W'>net</W> <W C='W'>of</W>
<W C='W'>$</W> <W C='CD'>102</W> <W C='W'>million</W> <W C='CM'>,</W>
<W C='W'>or</W> <W C='CD'>34</W> <W C='W'>cents</W> <W C='W'>a</W>
<W C='W'>share</W> <W C='.'>.</W> </P>
26
Preprocessing, I: tokenizing
[Figure: PARAGRAPH MARKUP -> TOKENIZER]
27
Preprocessing, II: sentence splitting
<P> <S> <W C='W'>In</W> <W C='W'>July</W> <W C='CD'>1995</W> <W C='W'>CEG</W>
<W C='W'>Corp.</W> <W C='W'>posted</W> <W C='W'>net</W> <W C='W'>of</W>
<W C='W'>$</W> <W C='CD'>102</W> <W C='W'>million</W> <W C='CM'>,</W>
<W C='W'>or</W> <W C='CD'>34</W> <W C='W'>cents</W> <W C='W'>a</W>
<W C='W'>share</W> <W C='.'>.</W> </S> </P>
<P> <S> <W C='W'>Late</W> <W C='W'>last</W> <W C='W'>night</W> <W C='W'>the</W>
<W C='W'>company</W> <W C='W'>announced</W> <W C='W'>a</W> <W C='W'>growth</W>
<W C='W'>of</W> <W C='CD'>20</W> <W C='W'>%</W> <W C='.'>.</W>
</S> </P>
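The main difficulty in sentence splitting is that a full stop does not always end a sentence, as with *Corp.* here. A minimal sketch, assuming a hypothetical abbreviation list:

```python
import re

# Hypothetical abbreviation list: a full stop after these does not end a sentence.
ABBREVIATIONS = {"corp.", "inc.", "ltd.", "mr.", "dr.", "e.g."}

TEXT = ("In July 1995 CEG Corp. posted net of $102 million, or 34 cents a share. "
        "Late last night the company announced a growth of 20%.")


def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace, except after known abbreviations."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?](?=\s)", text):
        end = match.end()
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue                      # e.g. "Corp." does not close the sentence
        sentences.append(text[start:end].strip())
        start = end
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences


for sentence in split_sentences(TEXT):
    print(sentence)
```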
28
Lexical Processing, I: POS tagging
<W C='NNP'>CEG</W> <W C='NN'>Corp.</W> <W C='VBD'>posted</W>
<W C='NN'>net</W> <W C='IN'>of</W> <W C='$'>$</W> <W C='CD'>102</W>
<W C='NN'>million</W> <W C='CM'>,</W>
29
Lexical Processing, II: lemmatizing / stemming
<W C='NNP'>CEG</W> <W C='NN'>Corp.</W> <W C='VBD'>post</W>
<W C='NN'>net</W> <W C='IN'>of</W> <W C='$'>$</W> <W C='CD'>102</W>
<W C='NN'>million</W> <W C='CM'>,</W>
30
An example of practical (partial)
parsing: identifying numerical expressions
<W C='NNP'>CEG</W> <W C='NN'>Corp.</W> <W C='VBD'>post</W>
<W C='NN'>net</W> <W C='IN'>of</W>
<NUMEX> <W C='$'>$</W> <W C='CD'>102</W> <W C='NN'>million</W> </NUMEX>
<W C='CM'>,</W>
31
An example of practical semantic processing:
identifying semantic type
<W C='NNP'>CEG</W> <W C='NN'>Corp.</W> <W C='VBD'>post</W>
<W C='NN'>net</W> <W C='IN'>of</W>
<NUMEX TYPE='MONEY'> <W C='$'>$</W> <W C='CD'>102</W> <W C='NN'>million</W> </NUMEX>
<W C='CM'>,</W>
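A minimal sketch that combines the two steps above (finding the numerical expression and assigning TYPE='MONEY') in a single regular expression. The pattern and its unit words are assumptions that cover only the forms in this example; the slides instead add the NUMEX markup over already tagged tokens.

```python
import re

# Hypothetical pattern for money amounts such as "$102 million" or "34 cents".
MONEY_RE = re.compile(
    r"\$\s?\d+(?:[.,]\d+)*(?:\s(?:million|billion|thousand))?"   # $102 million
    r"|\b\d+(?:[.,]\d+)*\s(?:cents?|dollars?|pounds?)\b"          # 34 cents
)


def tag_money(text: str) -> str:
    """Wrap each money expression in <NUMEX TYPE='MONEY'> ... </NUMEX>."""
    return MONEY_RE.sub(lambda m: f"<NUMEX TYPE='MONEY'>{m.group(0)}</NUMEX>", text)


print(tag_money("CEG Corp. posted net of $102 million, or 34 cents a share."))
```

A bare number such as *1995* is left untouched because it has neither a currency sign nor a unit word; deciding that it is a DATE would need a separate pattern.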
32
An example of discourse processing: resolving
anaphoric references
In July 1995 CEG Corp. posted net of $102
million, or 34 cents a share. Late last night the
company announced a growth of 20%.
33
Why language processing is hard
  • There is a virtually infinite number of ways of
    expressing the same information
    • E.g., different temporal terms
  • Virtually all text contains some noise; this
    holds even more for spoken output
    • It becomes particularly funny in the case of some
      instruction manuals:
      • "Large amounts of money should be kept on your
        derson. Other wise lockers are available"
      • "After put on the costume Put this parts inside
        the outerwear"
      • "This garment is selected with new materials for
        your daily best comfort, it's an original way of
        fashion Try it and you'll be enjoyed!"
  • The same string can mean different things
    • POS
    • The Dilbert slide

34
Ambiguity and humor
35
References
  • Jurafsky and Martin, chapters 3.1, 5.1, 5.2, 12.1
    (first edition: 3.1, 8.1, 8.2, and 9.1)
  • R. Huddleston, English Grammar: An Outline,
    Cambridge University Press, 1990.
  • This presentation is based on slides prepared by
    Massimo Poesio.