ParaMor - PowerPoint PPT Presentation

About This Presentation
Title:

ParaMor

Description:

You are not being taken. 2nd person singular. Turkish ... Agglutinative Inuit language. 50,000 speakers. Per Langaard. 6. Carnegie Mellon. Christian Monson ... – PowerPoint PPT presentation

Slides: 37
Provided by: carnegieme
Category:
Tags: paramor | inuit


Transcript and Presenter's Notes

Title: ParaMor


1
ParaMor
  • Finding Paradigms Across Morphology
Christian Monson
2
Turkish Morphology: Beads on a String
present progressive
2nd person singular
take
passive
negative
You are not being taken
3
Turkish Morphology: Beads on a String
götür-ül-m-üyor-sun
  • götür: take
  • ül: passive
  • m: negative
  • üyor: present progressive
  • sun: 2nd person singular
You are not being taken
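The beads-on-a-string picture can be sketched in a few lines of Python. The morphemes and glosses come from the slide; the `BEADS` list and the two helper functions are names assumed here for illustration only.

```python
# Morphemes of the slide's Turkish example, in surface order, paired with
# their glosses. The data layout is an assumption made for this sketch.
BEADS = [
    ("götür", "take"),
    ("ül", "passive"),
    ("m", "negative"),
    ("üyor", "present progressive"),
    ("sun", "2nd person singular"),
]


def surface_form(beads):
    """Concatenate the morphemes, like threading beads on a string."""
    return "".join(morph for morph, _ in beads)


def segmentation(beads):
    """Return the same word with hyphens marking the morpheme boundaries."""
    return "-".join(morph for morph, _ in beads)


if __name__ == "__main__":
    print(surface_form(BEADS))
    print(segmentation(BEADS))
    for morph, gloss in BEADS:
        print(f"{morph}: {gloss}")
```

Morphological analysis is the inverse problem: recovering the list of beads when only the surface form is given.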
4
Applications of Computational Morphology
  • Machine Translation
  • Turkish-English (Oflazer, 2007)
  • Czech-English (Goldwater and McClosky, 2005)
  • Speech Recognition
  • Finnish (Creutz, 2006)
  • Information Retrieval

5
Challenges of Computational Morphology
  • Time Consuming
  • Kemal Oflazer estimates
  • 3-4 months to build basic Turkish analyzer
  • Plus lexicon development and maintenance
  • Expertise Needed
  • Greenlandic
  • Official language of Greenland
  • Agglutinative Inuit language
  • 50,000 speakers
  • Per Langaard

6
The Solution
Raw Text
Unsupervised Morphology Induction
7
Paradigms: The Structure of Morphology
ül
m
sun
üyor
present progressive
2nd person singular
take
passive
negative
8
Paradigms: The Structure of Morphology
  • Stem: take
  • Voice: ül (passive)
  • Polarity: m (negative)
  • Tense Mood: üyor (present progressive)
  • Person Number: sun (2nd person singular)
9
Paradigms: The Structure of Morphology
Tense Mood
Person Number
Stem
Voice
Polarity
ül
m
um
üyor
um
present progressive
take
passive
negative
1st person singular
10
Paradigms: The Structure of Morphology
Tense Mood
Person Number
Stem
Voice
Polarity
ül
m
um
üyor
um
Ø
present progressive
take
passive
negative
3rd person singular
11
Paradigms: The Structure of Morphology
Tense Mood
Person Number
Stem
Voice
Polarity
ül
m
um
üyor
um
Ø
uz
present progressive
take
passive
negative
1st person plural
12
Paradigms: The Structure of Morphology
Tense Mood
Person Number
Stem
Voice
Polarity
ül
m
um
üyor
um
Ø
uz
present progressive
take
passive
negative
13
Paradigms: The Structure of Morphology
Tense Mood
Person Number
Stem
Voice
Polarity
ül
m
um
üyor
yecek
um
Ø
uz
take
passive
negative
future
14
Paradigms: The Structure of Morphology
Tense Mood
Person Number
Stem
Voice
Polarity
ül
m
um
üyor
yecek
um
Ø
uz
take
passive
negative
15
Paradigms: The Structure of Morphology
Tense Mood
Person Number
Stem
Voice
Polarity
ül
m
um
üyor
yecek
um
Ø
uz
16
Paradigms: The Structure of Morphology
ül
m
um
üyor
yecek
um
Ø
uz
  • Paradigm
  • Set of mutually replaceable strings
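As a small illustration of "mutually replaceable strings", the sketch below swaps the person and number suffixes seen on the earlier slides (um, sun, Ø, uz) onto one fixed base. The base string, the variable names, and the use of an empty string for the null suffix Ø are assumptions of this sketch.

```python
# One paradigm: suffixes that can replace each other in the same slot.
# "" stands in for the null suffix (Ø on the slides).
PERSON_NUMBER = ["um", "sun", "", "uz"]

# Everything before the person/number slot in the running example:
# götür (take) + ül (passive) + m (negative) + üyor (present progressive).
BASE = "götürülmüyor"


def inflect(base, paradigm):
    """Attach every suffix of the paradigm to the same base."""
    return [base + suffix for suffix in paradigm]


if __name__ == "__main__":
    for form in inflect(BASE, PERSON_NUMBER):
        print(form)
```

Each output differs from the others only in the final slot, which is the property an unsupervised system can look for in raw text.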

17
Paradigms: The Structure of Morphology
Paradigms
ül
m
um
üyor
yecek
um
Ø
uz
  • Paradigm
  • Set of mutually replaceable strings

18
Paradigms: The Structure of Morphology
Paradigm
ül
m
um
üyor
yecek
um
Ø
uz
  • Paradigm
  • Set of mutually replaceable strings

20
Overview
  • ParaMor
  • Unsupervised morphology induction system

21
Overview
  • ParaMor
  • Unsupervised morphology induction system
  • Evaluation Methodology

22
Overview
  • ParaMor
  • Unsupervised morphology induction system
  • Evaluation Methodology
  • Results

23
The ParaMor Algorithm
24
The ParaMor Algorithm
  • Identify paradigms in 3 steps

25
The ParaMor Algorithm
  • Identify paradigms in 3 steps
  • Search for candidate paradigms

26
The ParaMor Algorithm
  • Identify paradigms in 3 steps
  • Search for candidate paradigms
  • Cluster candidates modeling the same paradigm

27
The ParaMor Algorithm
  • Identify paradigms in 3 steps
  • Search for candidate paradigms
  • Cluster candidates modeling the same paradigm
  • Filter

28
The ParaMor Algorithm
  • Paradigm discovery in 3 steps
  • Search for candidate paradigms
  • Cluster candidates modeling the same paradigm
  • Filter
  • Segment words
  • using the discovered paradigms
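The final step, segmenting words with the discovered paradigms, might look roughly like the sketch below. It is not ParaMor's actual procedure, which the slides do not spell out. It assumes a word is split as stem + suffix only when the same stem is also attested with another suffix of the same paradigm. The function name, the toy Spanish vocabulary, and the longest-suffix tie-break are all hypothetical.

```python
def segment(word, paradigms, vocabulary):
    """Return (stem, suffix) if some paradigm licenses a split, else None.

    A split is licensed when the word ends in a suffix of the paradigm and
    the remaining stem also occurs in the vocabulary with at least one
    other suffix of that paradigm. Among licensed splits, the longest
    suffix wins (an assumption made for this sketch).
    """
    best = None
    for suffixes in paradigms:
        for suffix in suffixes:
            if not word.endswith(suffix):
                continue
            stem = word[: len(word) - len(suffix)]
            if not stem:
                continue
            supported = any(
                other != suffix and stem + other in vocabulary
                for other in suffixes
            )
            if supported and (best is None or len(suffix) > len(best[1])):
                best = (stem, suffix)
    return best


if __name__ == "__main__":
    # Toy data, invented for illustration.
    paradigms = [{"a", "an", "ar", "ó"}]
    vocabulary = {"habla", "hablan", "hablar", "habló", "mesa"}
    print(segment("hablan", paradigms, vocabulary))
    print(segment("mesa", paradigms, vocabulary))
```

Requiring a second attested form keeps a word like "mesa" whole even though it happens to end in a suffix of the paradigm.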

29
Search for Candidate Paradigms
a ada adas ado ados an ar aron ó 1786
ra rada radas rado rados ran rar raron ró 23
Ø da das do dos n ndo r ron 118
strada stradas strado strar stró 7
a an ar ó 353
rada radas rado rados 53
Ø do n r 354
a as o os 892
strada strado strar stró 8
a an ar 413
rada rado rados 67
Ø n r 509
a o os 1410
strada strado stró 9
Ø r s 287
a an 1049
rada rado 89
Ø n 1874
a o 2304
Ø s 5501
strada strado 12
Ø es 874
strado 15
rado 167
an 1786
n 6051
a 8981
s 10662
es 2751
...
...
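Each row above pairs a candidate suffix set with a number. The slide does not label that number; the sketch below assumes it counts the stems that occur in the corpus with every suffix in the set. The function name and the toy vocabulary are hypothetical.

```python
def stems_taking_all(suffixes, vocabulary):
    """Stems t such that t + f is a vocabulary word for every suffix f."""
    suffixes = list(suffixes)
    first = suffixes[0]
    # Propose stems by stripping the first suffix from every word that has it.
    # "" (the null suffix, Ø) matches every word and leaves it unchanged.
    candidates = set()
    for word in vocabulary:
        if word.endswith(first):
            stem = word[: len(word) - len(first)]
            if stem:
                candidates.add(stem)
    # Keep only the stems attested with all suffixes of the candidate set.
    return {t for t in candidates if all(t + f in vocabulary for f in suffixes)}


if __name__ == "__main__":
    # Toy vocabulary, invented for illustration.
    vocabulary = {
        "habla", "hablan", "hablar", "habló",
        "canta", "cantan", "cantar",
        "mesa",
    }
    for candidate in (["a", "an", "ar", "ó"], ["a", "an", "ar"], ["a"]):
        stems = stems_taking_all(candidate, vocabulary)
        print(" ".join(candidate), len(stems))
```

Dropping a suffix from a candidate can only keep or enlarge its stem set, which matches the way the numbers on the slide grow as the suffix sets shrink.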
30
Search for Candidate Paradigms
a ada adas ado ados an ar aron ó 1786
ra rada radas rado rados ran rar raron ró 167
Ø da das do dos n ndo r ron 6051
rada radas rado rados 167
strada stradas strado strar stró 7
a an ar ó 1786
Ø do n r 6051
a as o os 8981
strada strado strar stró 8
rada rado rados 167
a an ar 1786
Ø n r 6051
a o os 8981
Ø r s 287
strada strado stró 9
a an 1786
rada rado 167
Ø n 6051
a o 8981
Ø s 5501
Ø es 10662
strada strado 12
strado 15
rado 167
an 1786
n 6051
a 8981
s 10662
es 10662
...
31
a ada adas ado ados an ar aron ó 1786
ra rada radas rado rados ran rar raron ró 167
Ø da das do dos n ndo r ron 6051
a ada ado ados an ar aron ó 1786
ra rada radas rado rados rar raron ró 167
Ø da das do dos n r ron 6051
rada radas rado rados rar raron ró 167
trada tradas trado trados trar traron tró 167
a ada ado an ar aron ó 1786
Ø da do dos n r ron 6051
a ado an ar aron ó 1786
Ø da do n r ron 6051
trada tradas trado trados trar tró 167
rada radas rado rados rar ró 167
trada tradas trado trar tró 30
strada stradas strado strar stró 7
a ado an ar ó 1786
rada radas rado rados rar 167
Ø do n r ron 6051
a an ar ó 1786
rada radas rado rados 167
Ø do n r 6051
a as o os 8981
trada trado trar tró 30
strada strado strar stró 8
Ø r s 287
a an ar 1786
rada rado rados 167
Ø n r 6051
a o os 8981
trada trado tró 30
strada strado stró 9
a an 1786
rada rado 167
Ø n 6051
a o 8981
Ø s 5501
trada trado 30
strada strado 12
Ø es 10662
strado 15
trado 30
rado 167
an 1786
n 6051
a 8981
s 10662
es 10662
...
...
...
32
  • 17 suffixes: a aba aban ada adas ado ados an ando ar ara aron arse ará arán aría ó; Cosine Similarity 0.715; 532 Covered Types
  • 15 Suffixes: a aba aban ada adas ado ados an ando ar aron arse ará arán ó; 25 Stems: anunci, aplic, apoy, celebr, consider, ...; 375 Covered Types
  • 16 suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán aría ó; Cosine Similarity 0.664; 451 Covered Types
  • 15 Suffixes: a aba ada adas ado ados an ando ar aron arse ará arán aría ó; 22 Stems: anunci, aplic, apoy, celebr, concentr, ...; 330 Covered Types
  • 15 Suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán ó; 23 Stems: anunci, apoy, confirm, consider, declar, ...; 345 Covered Types
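The figures on this slide can be checked with a short calculation. The sketch assumes that similarity is the cosine between two sets of covered types, |A ∩ B| / sqrt(|A| · |B|), and that a merged cluster covers the union of its children's types, so the overlap follows from inclusion-exclusion. Both are readings of the slide rather than a stated definition, though they do reproduce the two values shown. The function names are hypothetical.

```python
from math import sqrt


def set_cosine(a, b):
    """Cosine similarity between two sets viewed as binary vectors."""
    return len(a & b) / sqrt(len(a) * len(b))


def cosine_from_counts(size_a, size_b, size_union):
    """Same measure, computed from set sizes only.

    The overlap is recovered by inclusion-exclusion:
    |A ∩ B| = |A| + |B| - |A ∪ B|.
    """
    overlap = size_a + size_b - size_union
    return overlap / sqrt(size_a * size_b)


if __name__ == "__main__":
    # Candidates covering 330 and 345 types, merged cluster covering 451.
    print(round(cosine_from_counts(330, 345, 451), 3))
    # Candidate covering 375 types merged with the 451-type cluster -> 532.
    print(round(cosine_from_counts(375, 451, 532), 3))
```

Under these assumptions the first merge shares 224 covered types and the second shares 294.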
33
F1 (chart; series labels: ParaMor Morfessor, Bernhard 2, Morfessor, ParaMor)
34
ParaMor
Identify
Search Cluster Filter
Segment
36
Morphology in NLP
sun
götür
ül
m
um
üyor
present progressive
2nd person singular
take
passive
negative
You are not being taken