Overview of Statistical NLP

Slides: 24

Provided by: Jing156

Transcript and Presenter's Notes

Title: Overview of Statistical NLP

1
Overview of Statistical NLP

IR Group Meeting
March 7, 2006

2
Outline

Some basic/important NLP problems
Topics that have recently attracted much interest
NLP research groups
Discussion on the relation between NLP and IR

3
Levels of Analysis in NLP (from Dan Roth's CS598)

Morphology
How words are constructed
Syntax
Structural relations between words
Semantics
The meaning of words and of combinations of words
Pragmatics
How is a sentence used? What's its purpose?
Discourse (sometimes distinguished as a subfield of Pragmatics)
Relationships between sentences; global context

4
Some NLP Problems

N-gram Models
Word Sense Disambiguation
Lexical Acquisition
(POS) Tagging
(Syntactic) Parsing
Semantic Role Labeling (Semantic Parsing)
Named Entity Recognition
Textual Entailment

5
N-gram Models

The task: estimate P(w_n | w_1, ..., w_{n-1})
Approaches
Maximum likelihood estimation
Various smoothing methods
Applications
Automatic speech recognition
Spelling correction
Handwriting recognition
Statistical machine translation
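
A minimal sketch of the two approaches listed above, restricted to bigrams, i.e. estimating P(w_n | w_{n-1}). The function names, the boundary markers `<s>` and `</s>`, and the toy corpus are assumptions made for illustration. Add-k smoothing stands in for the "various smoothing methods" because it is the simplest one; the slides do not say which methods were meant.

```python
from collections import Counter

def train_bigram(sentences):
    """Collect context counts, bigram counts, and the predictable vocabulary."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        context_counts.update(padded[:-1])              # every token that has a successor
        bigram_counts.update(zip(padded, padded[1:]))   # (previous word, word) pairs
        vocab.update(padded[1:])                        # every token that can be predicted
    return context_counts, bigram_counts, vocab

def bigram_prob(prev, word, context_counts, bigram_counts, vocab, k=0.0):
    """P(word | prev). k=0 is the maximum likelihood estimate; k=1 is add-one smoothing."""
    numerator = bigram_counts[(prev, word)] + k
    denominator = context_counts[prev] + k * len(vocab)
    return numerator / denominator if denominator > 0 else 0.0

# Toy corpus (hypothetical data)
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
ctx, bi, vocab = train_bigram(corpus)
print(bigram_prob("the", "cat", ctx, bi, vocab))          # MLE for a seen bigram
print(bigram_prob("cat", "dog", ctx, bi, vocab))          # MLE for an unseen bigram is zero
print(bigram_prob("cat", "dog", ctx, bi, vocab, k=1.0))   # smoothing gives it some mass
```

The unseen bigram is the reason smoothing matters: under the plain maximum likelihood estimate, any sentence containing it would receive probability zero.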

6
Word Sense Disambiguation (WSD)

The task: determine which sense of an ambiguous word is invoked in a particular use of the word
Approaches
Supervised
Log-linear models
Information-theoretic
Memory-based learning (kNN)
Dictionary-based
Sense definitions
Thesauri
Translations in a second language
Unsupervised
Clustering using EM algorithm
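
As an illustration of the dictionary-based family that uses sense definitions, here is a rough sketch of a simplified Lesk-style disambiguator: it picks the sense whose definition shares the most words with the context. The sense inventory, the stopword list, and the function name are hypothetical; a real system would draw definitions from a dictionary or from WordNet.

```python
def simplified_lesk(context_tokens, sense_definitions, stopwords=frozenset()):
    """Return the sense whose definition overlaps most with the context words."""
    context = {t.lower() for t in context_tokens} - stopwords
    best_sense, best_overlap = None, -1
    for sense, definition in sense_definitions.items():
        gloss = set(definition.lower().split()) - stopwords
        overlap = len(context & gloss)
        if overlap > best_overlap:   # ties keep the sense listed first
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical two-sense inventory for "bank"
BANK_SENSES = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a river or lake",
}
STOPWORDS = frozenset({"a", "an", "the", "and", "or", "of", "on", "that", "at"})

print(simplified_lesk("he sat on the bank of the river".split(), BANK_SENSES, STOPWORDS))
print(simplified_lesk("she deposits money at the bank".split(), BANK_SENSES, STOPWORDS))
```

Removing stopwords matters here: without it, function words such as "and" would count as evidence for a sense.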

7
Word Sense Disambiguation (WSD)

Accuracy
Word-specific
Easy words: > 90%
Hard words: 50-70%
Applications
Statistical machine translation
Information retrieval

8
Lexical Acquisition

The task: develop algorithms and statistical techniques for filling the holes in existing machine-readable dictionaries by looking at the occurrence patterns of words in large text corpora
Examples
Verb subcategorization
Prepositional phrase attachment disambiguation
Selectional preferences
Semantic similarity

9
Semantic Similarity

The task: acquire a relative measure of similarity between two words
Approaches
Vector space measures (document space, word space, modifier space, etc.)
Probabilistic measures (KL divergence, etc.)
Applications
Information retrieval (query expansion)
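
The two families of measures can be sketched on co-occurrence counts, as below. Cosine stands in for the vector space measures and KL divergence for the probabilistic ones. The word vectors are made-up counts in a "word space", and the add-alpha smoothing in `kl_divergence` is an assumption added so that the divergence stays finite when a feature is missing from one word's profile.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse count vectors (dicts)."""
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def kl_divergence(p_counts, q_counts, alpha=1.0):
    """D(p || q) in bits over the union of features, with add-alpha smoothing (alpha > 0)."""
    features = p_counts.keys() | q_counts.keys()
    p_total = sum(p_counts.values()) + alpha * len(features)
    q_total = sum(q_counts.values()) + alpha * len(features)
    divergence = 0.0
    for f in features:
        p = (p_counts.get(f, 0) + alpha) / p_total
        q = (q_counts.get(f, 0) + alpha) / q_total
        divergence += p * math.log2(p / q)
    return divergence

# Hypothetical co-occurrence counts
car = {"drive": 4, "road": 3, "engine": 2}
truck = {"drive": 3, "road": 2, "load": 2}
banana = {"eat": 4, "peel": 2}

print(cosine(car, truck), cosine(car, banana))
print(kl_divergence(car, truck), kl_divergence(car, banana))
```

Note that cosine is a similarity (higher means closer) while KL divergence is a dissimilarity (lower means closer) and is not symmetric.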

10
POS Tagging

The task: label each word in a sentence with its appropriate part of speech
Major approaches
HMM
Transformation-based
Advantages: speed and storage
Other approaches
Neural networks, decision trees, memory-based learning, maximum entropy models
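
For the HMM approach, decoding is usually done with the Viterbi algorithm. The following is a minimal sketch for a first-order HMM in log space. The tag set and all probabilities are invented for illustration, and the small `floor` value used in place of zero probabilities is an assumption, not a proper smoothing method.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most likely tag sequence for `words` under a first-order HMM."""
    if not words:
        return []

    def lp(p):
        return math.log(max(p, floor))   # avoid log(0) for unseen events

    # best[i][t]: best log-probability of any tag path ending in tag t at position i
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0))
            back[i][t] = prev
    # Follow back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Hypothetical model parameters
TAGS = ["DT", "NN", "VB"]
START = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
TRANS = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}
EMIT = {
    "DT": {"the": 0.9},
    "NN": {"dog": 0.5, "barks": 0.1},
    "VB": {"barks": 0.5, "dog": 0.05},
}

print(viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT))
```

In a trained tagger the three tables would be estimated from a tagged corpus such as the Penn Treebank rather than written by hand.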

11
POS Tagging

Accuracy
95-97%
Achieved only when the application text and the training text are from a similar source
Applications
Preprocessing for higher-level NLP tasks: partial parsing, parsing, NER, etc.
Note: the best lexicalized probabilistic parsers are now good enough that they perform better starting with untagged text and doing the tagging themselves, rather than using a tagger as a preprocessor (Charniak 1997)

12
(Syntactic) Parsing

The task: find the most likely syntactic parse tree of a sentence
Approaches
Probabilistic context-free grammar (PCFG)
Supervised
Unsupervised
Lexicalized models
Dependency-based models
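
To make "most likely parse tree" concrete for the PCFG case, here is a small sketch of probabilistic CKY over a grammar in Chomsky normal form. The grammar, its rule probabilities, and the function name are hypothetical. Lexicalized and dependency-based models are not covered by this sketch.

```python
from collections import defaultdict

def pcky(words, lexical, binary, start="S"):
    """Best (probability, tree) for `words` under a CNF PCFG, or (0.0, None) if no parse."""
    n = len(words)
    # chart[(i, j)][A] = (best probability of A spanning words[i:j], tree)
    chart = defaultdict(dict)
    for i, w in enumerate(words):
        for (lhs, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1)][lhs] = (p, (lhs, w))
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):                       # split point
                for (lhs, left, right), p in binary.items():
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        lp, ltree = chart[(i, k)][left]
                        rp, rtree = chart[(k, j)][right]
                        prob = p * lp * rp
                        if prob > chart[(i, j)].get(lhs, (0.0, None))[0]:
                            chart[(i, j)][lhs] = (prob, (lhs, ltree, rtree))
    return chart[(0, n)].get(start, (0.0, None))

# Hypothetical toy grammar: rule -> probability
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("NP", "DT", "NN"): 0.6,
    ("VP", "VB", "NP"): 1.0,
}
LEXICAL = {
    ("DT", "the"): 1.0,
    ("NN", "dog"): 0.5,
    ("NN", "cat"): 0.5,
    ("VB", "chased"): 1.0,
    ("NP", "dogs"): 0.4,
}

prob, tree = pcky("the dog chased the cat".split(), LEXICAL, BINARY)
print(prob)
print(tree)
```

In the supervised setting the rule probabilities would be relative frequencies read off a treebank; in the unsupervised setting they would be estimated from raw text, typically with an EM-style procedure.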

13
(Syntactic) Parsing

Accuracy
Charniak 1997: recall 0.875, precision 0.874
Collins 1997: recall 0.881, precision 0.886
Applications
Input to other NLP tasks such as semantic role labeling and relation extraction
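
Parser recall and precision figures like those above are commonly computed over labeled constituents. The sketch below assumes that convention (a constituent counts as correct when its label and word span both match the gold tree, with part-of-speech nodes excluded); the slides do not state the exact scoring rules used. The tree encoding as nested tuples and the function names are also assumptions.

```python
from collections import Counter

def labeled_spans(tree, start=0, spans=None):
    """Collect (label, start, end) for every non-preterminal node; return (end, spans)."""
    if spans is None:
        spans = Counter()
    if isinstance(tree, str):            # a word
        return start + 1, spans
    label, children = tree[0], tree[1:]
    end = start
    for child in children:
        end, _ = labeled_spans(child, end, spans)
    is_preterminal = len(children) == 1 and isinstance(children[0], str)
    if not is_preterminal:
        spans[(label, start, end)] += 1
    return end, spans

def bracket_scores(gold, predicted):
    """Labeled bracket (precision, recall) of a predicted tree against a gold tree."""
    _, gold_spans = labeled_spans(gold)
    _, pred_spans = labeled_spans(predicted)
    matched = sum((gold_spans & pred_spans).values())
    precision = matched / sum(pred_spans.values()) if pred_spans else 0.0
    recall = matched / sum(gold_spans.values()) if gold_spans else 0.0
    return precision, recall

# Hypothetical example: the predicted tree misses the object NP
gold = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VB", "chased"), ("NP", ("DT", "the"), ("NN", "cat"))))
predicted = ("S",
             ("NP", ("DT", "the"), ("NN", "dog")),
             ("VP", ("VB", "chased"), ("DT", "the"), ("NN", "cat")))

print(bracket_scores(gold, predicted))
```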

14
Semantic Role Labeling

The task: identify the predicate-argument structures in sentences
Approaches
Supervised learning
Accuracy
Best: 70% (CoNLL 2004 shared task)
Applications
Information extraction
Question answering

15
Textual Entailment

The task: given two text fragments, recognize whether the meaning of one text is entailed by (can be inferred from) the other text
Approaches
Word overlap
Statistical lexical relations
Syntactic matching
Logic inference
Accuracy
0.56, best 0.60 (PASCAL Challenge 2005)
Applications
Question answering
Multi-document summarization
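
The simplest of the listed approaches, word overlap, can be sketched as follows: the hypothesis is judged entailed when enough of its words also appear in the text. The threshold of 0.8, the whitespace tokenization, and the function names are assumptions for illustration only.

```python
def word_overlap(text, hypothesis):
    """Fraction of distinct hypothesis words that also occur in the text."""
    text_words = set(text.lower().split())
    hyp_words = set(hypothesis.lower().split())
    if not hyp_words:
        return 0.0
    return len(hyp_words & text_words) / len(hyp_words)

def overlap_entails(text, hypothesis, threshold=0.8):
    """Crude entailment decision: True when overlap reaches the (hypothetical) threshold."""
    return word_overlap(text, hypothesis) >= threshold

text = "john bought a new car yesterday"
print(overlap_entails(text, "john bought a car"))
print(overlap_entails(text, "mary sold a boat"))
print(word_overlap(text, "john sold a car"))
```

The last call shows the weakness of the method: swapping "bought" for "sold" reverses the meaning but leaves most of the words shared, which is consistent with the modest accuracies reported above.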

16
Tools

Brill Tagger
Charniak Parser
Collins Parser
MiniPar
Semantic Parser
ASSERT Parser
CCG's demo

17
Corpora

WordNet
Penn Treebank (Sample)
PropBank
FrameNet

18
Other Tasks

Automatic Speech Recognition
Natural Language Generation
Automatic Summarization

19
Outline

Some basic/important NLP problems
Topics that have recently attracted much interest
NLP research groups
Discussion on the relation between NLP and IR

20
Recent topics

Unsupervised and semi-supervised approaches
Knowledge acquisition bottleneck
Semantic role labeling
Improve the performance of SRL
Use the results for other tasks
Relation extraction
WSD
Parsing
Statistical machine translation
Word alignment

21
Outline

Some basic/important NLP problems
Topics that have recently attracted much interest
NLP research groups
Discussion on the relation between NLP and IR

22
NLP Research Groups

USC/ISI
Stanford
UPenn
Johns Hopkins
UIUC
