Title: On the Ambiguity of Serbian Texts and Methods to disambiguate it
1On the Ambiguity of Serbian Texts and Methods to
disambiguate it
- Cvetana Krstev, Duško Vitas
- University of Belgrade
8th Intex/Nooj Workshop
2What is the ambiguity?
- the assignment of different lemmas
- the assignment of different grammatical
categories
3The ambiguity in Serbian
- In Serbian many word forms are homographs although not homophones: stress marks are not recorded.
- gore adv. "up"
- gore adv. "worse"
- gòre P3s goreti,VEk "to burn"
- gòre A3s
- gòre P3s gorjeti,VIjk "to burn"
- gòre A3s
- gòre fs2 gora "forest"
4The ambiguity in Serbian (2)
- rodoslovna,rodoslovni.A2PosQakms2gakms4vaefs1gaefs5gakns2gaenp1gaenp4gaenp5g
- rodoslovne,rodoslovni.A2PosQaemp4gaefs2gaefp1gaefp4gaefp5g
- rodoslovni,rodoslovni.A2PosQadms1gaems4qaems5gaemp1gaemp5g
- rodoslovnih,rodoslovni.A2PosQaemp2gaefp2gaenp2g
- rodoslovnim,rodoslovni.A2PosQaems6gaemp3gaemp6gaemp7gaefp3gaefp6gaefp7gaens6gaenp3gaenp6gaenp7g
- rodoslovnima,rodoslovni.A2PosQaemp3gaemp6gaemp7gaefp3gaefp6gaefp7gaenp3gaenp6gaenp7g
- rodoslovno,rodoslovni.A2PosQaens1gaens4gaens5g
- rodoslovnog,rodoslovni.A2PosQadms2gadms4vadns2g
- rodoslovnoga,rodoslovni.A2PosQadms2gadms4vadns2g
- rodoslovnoj,rodoslovni.A2PosQaefs3gaefs7g
- rodoslovnom,rodoslovni.A2PosQadms3gadms7gaefs6gadns3gadns7g
- e: the form is the same for definite and indefinite
- g: the form is the same for animate and inanimate
- 9 sets of grammatical categories
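A minimal sketch of how an entry like those above could be split into word form, lemma, class tag, and individual category codes. It assumes that every category code is exactly six characters long (with definiteness in the second position, then gender, number, case, and animacy last); this layout is inferred from the entries and the legend, not stated on the slide. The name `parse_entry` is hypothetical.

```python
import re

# Assumed shape of one category code: 'a', definiteness (d/e/k), gender (m/f/n),
# number (s/p), case (1-7), animacy (g/q/v). Inferred from the entries shown.
CODE_RUN = re.compile(r"(?:a[dek][mfn][sp][1-7][gqv])+$")

def parse_entry(entry):
    """Split 'form,lemma.TAGcodes' into (form, lemma, tag, [codes])."""
    form, rest = entry.split(",", 1)
    lemma, tag = rest.split(".", 1)
    match = CODE_RUN.search(tag)
    if match is None:
        # No trailing run of six-character codes: return the tag unsplit.
        return form, lemma, tag, []
    run = match.group(0)
    codes = [run[i:i + 6] for i in range(0, len(run), 6)]
    return form, lemma, tag[:match.start()], codes

form, lemma, tag, codes = parse_entry("rodoslovnoj,rodoslovni.A2PosQaefs3gaefs7g")
print(form, lemma, tag, codes)
# prints: rodoslovnoj rodoslovni A2PosQ ['aefs3g', 'aefs7g']
```

Under this reading, a single form such as rodoslovnima carries nine separate category codes, which is what makes dictionary lookup alone so ambiguous.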
5Disambiguation process
- Reconstructing word forms
- Using filter dictionaries
- Using restricted dictionaries
- Using dictionaries of compounds
- Using disambiguation grammars
6Reconstructing word forms: date adverbial phrases
7Reconstructing word forms: date adverbial phrases (2)
- i izdavanxem YUBA kartica 20. februara 2002. godine.
- celog sistema. Zato je josx pocyetkom 1996. godine jedan
- i www.plivamed.net. U petom mjesecu 2001.godine smo oformlx
- cxe biti odrzxan u novembru ove godine u Neumu, a za prvog
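The slides do not reproduce the grammar used for date phrases, so the following is only a rough sketch of how the digit-based examples above might be recognized with a regular expression. The pattern, the list of genitive month names, and the function name `find_date_phrases` are illustrative assumptions, not the authors' grammar. Phrases without digits, such as "u novembru ove godine", are outside its scope.

```python
import re

# Genitive forms of month names, as used after a day number.
MONTHS_GENITIVE = (
    "januara", "februara", "marta", "aprila", "maja", "juna",
    "jula", "avgusta", "septembra", "oktobra", "novembra", "decembra",
)

# Optional "<day>. <month>" followed by "<year>." and the word "godine".
DATE_PHRASE = re.compile(
    r"(?:(?P<day>\d{1,2})\.\s+(?P<month>" + "|".join(MONTHS_GENITIVE) + r")\s+)?"
    + r"(?P<year>\d{4})\.\s*godine"
)

def find_date_phrases(text):
    """Return (day, month, year) tuples; day and month are None if absent."""
    return [
        (m.group("day"), m.group("month"), m.group("year"))
        for m in DATE_PHRASE.finditer(text)
    ]

print(find_date_phrases("i izdavanxem YUBA kartica 20. februara 2002. godine."))
print(find_date_phrases("U petom mjesecu 2001.godine smo"))
```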
8Reconstructing word forms: forms written with digits, etc.
9Reconstructing word forms: forms written with digits (2)
- sxkovi iznosili oko 500 hilxada maraka. Znacyajna usxteda
- poput SAP-ovog ili IBM-ovog, dobijate i organizaciju firme
- cyelicyne industrije 1890-ih nije postojao. Ali, poznata je
- sveta drma tezxinom od 81,7 milijardi dolara u 160 zemalxa,
- odnosno ukupno bezmalo pola milijarde (464 miliona)! Predxe
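As a hedged illustration of the token types in these examples, the sketch below separates a digit or acronym stem from an attached inflectional suffix (as in SAP-ovog or 1890-ih) and recognizes plain numbers with a decimal comma. The patterns and the name `classify_token` are assumptions made for this example; the slides do not show how the actual system handles these forms.

```python
import re

# Stem written with digits or capital letters, a hyphen, then a lowercase suffix.
SUFFIXED = re.compile(r"^(?P<stem>\d+|[A-Z]{2,})-(?P<suffix>[a-z]+)$")
# Integer or number with a decimal comma, e.g. 500 or 81,7.
NUMBER = re.compile(r"^\d+(?:,\d+)?$")

def classify_token(token):
    """Return (kind, stem, suffix) for a single token."""
    m = SUFFIXED.match(token)
    if m:
        kind = "numeral+suffix" if m.group("stem").isdigit() else "acronym+suffix"
        return kind, m.group("stem"), m.group("suffix")
    if NUMBER.match(token):
        return "number", token, ""
    return "word", token, ""

for tok in ["SAP-ovog", "1890-ih", "81,7", "500", "maraka"]:
    print(tok, classify_token(tok))
```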
10Using filter dictionaries
- mi,ja.PRO01Prssx3i
- mi,mi.PRO03Prspx1r
- mi,miti.V35ImperfTrIrefRefAysAzs
- li,li.PAR
- li,liti.V98ImperfTrItIrefAysAzs
11Using filter dictionaries (2)
- A very cautious filter dictionary with only 41 entries
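One way to picture a filter dictionary is as a small, higher-priority lexicon that is consulted before the main dictionary: if a form is found there, only the readings listed in the filter are kept. The sketch below assumes this priority mechanism and assumes that the filter drops the verb readings of mi and li; the slides do not list the actual 41 entries, so `FILTER_DICT` is hypothetical.

```python
# Readings from the main dictionary, as shown on the previous slide.
MAIN_DICT = {
    "mi": [
        "mi,ja.PRO01Prssx3i",
        "mi,mi.PRO03Prspx1r",
        "mi,miti.V35ImperfTrIrefRefAysAzs",
    ],
    "li": ["li,li.PAR", "li,liti.V98ImperfTrItIrefAysAzs"],
}

# Hypothetical filter content: keep only the frequent readings.
FILTER_DICT = {
    "mi": ["mi,ja.PRO01Prssx3i", "mi,mi.PRO03Prspx1r"],
    "li": ["li,li.PAR"],
}

def lookup(form):
    """Return the filter's readings if the form is listed there, else the main ones."""
    if form in FILTER_DICT:
        return FILTER_DICT[form]
    return MAIN_DICT.get(form, [])

print(lookup("mi"))
print(lookup("li"))
```

Keeping the filter small limits the risk of discarding a reading that is rare but correct in a given text.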
12Using restricted dictionaries
- Dictionaries contain lemmas for both standard pronunciations, Ekavian and Ijekavian. Texts, however, are usually written in only one.
- Dictionaries contain lemmas for both the Serbian and Croatian languages (or variants of Serbo-Croatian).
13Using restricted dictionaries (2)
- crvene,crven.A17Colaemp4gaefs2gaefp1gaefp4gaefp5g
- crvene,crveneti.V547ImperfItIrefRefEkPzpAysAzs
- crvene,crveniti.V54ImperfTrIrefPzp
- crvene,crvenxeti.V747ImperfItIrefRefIjkPzp
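The restriction can be sketched as a filter over the readings of a form: for a text known to be Ekavian, readings marked Ijk are dropped, and vice versa. The sketch assumes that the markers `Ek` and `Ijk` can be detected as substrings of the code part of an entry, which holds for the four readings above but is a simplification; the function name `restrict` is hypothetical.

```python
READINGS = [
    "crvene,crven.A17Colaemp4gaefs2gaefp1gaefp4gaefp5g",
    "crvene,crveneti.V547ImperfItIrefRefEkPzpAysAzs",
    "crvene,crveniti.V54ImperfTrIrefPzp",
    "crvene,crvenxeti.V747ImperfItIrefRefIjkPzp",
]

def restrict(readings, pronunciation):
    """Keep readings compatible with the text's pronunciation ('Ek' or 'Ijk')."""
    excluded = {"Ek": "Ijk", "Ijk": "Ek"}[pronunciation]
    kept = []
    for reading in readings:
        # Look for the marker only in the codes after the lemma.
        codes = reading.split(".", 1)[1]
        if excluded not in codes:
            kept.append(reading)
    return kept

print(restrict(READINGS, "Ek"))   # drops the crvenxeti reading
print(restrict(READINGS, "Ijk"))  # drops the crveneti reading
```

Readings without either marker, such as the adjective crven, are compatible with both pronunciations and are always kept.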
14Using dictionary of compounds
- bez obzira na,bez obzira na.PREPCNcnp4
- bez,bez.PREPp2
- na,na.INT
- na,na.PREPp4p7
- obzira,obzir.N1ms2qmp2q
- obzira,obzirati.V519ImperfItRefAysAzs
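A compound dictionary can be applied with a longest-match strategy: when a sequence of tokens is listed as a compound, it is treated as one unit and the readings of its component words are not considered. The sketch below assumes this preference for the compound reading, which the slide implies but does not state; `COMPOUNDS` holds only the one entry shown and `segment` is a hypothetical name.

```python
# Compound entries keyed by their token sequence.
COMPOUNDS = {
    ("bez", "obzira", "na"): "bez obzira na,bez obzira na.PREPCNcnp4",
}
MAX_LEN = max(len(key) for key in COMPOUNDS)

def segment(tokens):
    """Group tokens into units, preferring the longest listed compound."""
    units = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two tokens.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in COMPOUNDS:
                units.append(" ".join(candidate))
                i += n
                break
        else:
            # No compound starts here: keep the simple word.
            units.append(tokens[i])
            i += 1
    return units

print(segment(["bez", "obzira", "na", "to"]))
# prints: ['bez obzira na', 'to']
```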
15Using disambiguation grammars: positional constraint
It is an interjection if it is followed by an exclamation mark.
16Using disambiguation grammars: positional constraint (2)
After a sentence or phrase boundary, mi and ti are personal pronouns in the nominative case (after other possibilities have been excluded).
17Using disambiguation grammars: sequential constraint
da is a conjunction (and not a form of the verb dati, "to give") if it is followed by an auxiliary verb in clitic form.
18Using disambiguation grammars: sequential and positional constraints
- sxargarepe evropska unija ne samo da je prihvatila nasxu i
- da,.CONJ
- da,.ADV
- da,.INT
- da,.PAR
- da,dati.V103PerfTrIrefRefPzsAysAzs
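The sequential constraint on da can be sketched as a rule over adjacent tokens: if da is immediately followed by a clitic auxiliary, only the conjunction reading is kept; otherwise all readings remain. The set `CLITIC_AUX` below contains only the present-tense clitic forms of the auxiliary "to be" and is an illustrative subset; the actual grammars are built as graphs and are not shown in the slides.

```python
# Illustrative subset of clitic auxiliary forms (an assumption for this sketch).
CLITIC_AUX = {"sam", "si", "je", "smo", "ste", "su"}

DA_READINGS = [
    "da,.CONJ",
    "da,.ADV",
    "da,.INT",
    "da,.PAR",
    "da,dati.V103PerfTrIrefRefPzsAysAzs",
]

def disambiguate_da(tokens):
    """Return (position, readings) for every occurrence of 'da'."""
    result = []
    for i, token in enumerate(tokens):
        if token != "da":
            continue
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if following in CLITIC_AUX:
            result.append((i, ["da,.CONJ"]))
        else:
            # Rule does not apply: leave the ambiguity untouched.
            result.append((i, list(DA_READINGS)))
    return result

print(disambiguate_da("evropska unija ne samo da je prihvatila nasxu".split()))
# prints: [(4, ['da,.CONJ'])]
```

A rule of this kind deliberately says nothing about da in other contexts, so it removes readings only where the evidence is strong.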
19Using disambiguation grammars: agreement
An adjective, possessive pronoun or numeral has to agree in gender, number, and case with the noun that follows.
20Using disambiguation grammars: agreement (2)
- povecxati nxegov proboj u regionu. Rumunska proporcija
- u,.PREPp2
- u,.PREPp4
- u,.PREPp7
- regionu,region.N1ms3q
- regionu,region.N1ms7q
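The readings listed for "u regionu" suggest how a case constraint prunes combinations: only pairs in which the case governed by the preposition equals the case of the noun survive. The sketch below assumes that the digit after `PREPp` is the governed case and that the second-to-last character of the noun code is its case; both are inferred from the entries, and the helper names are hypothetical. It handles only preposition readings with a single case.

```python
import re

PREP_READINGS = ["u,.PREPp2", "u,.PREPp4", "u,.PREPp7"]
NOUN_READINGS = ["regionu,region.N1ms3q", "regionu,region.N1ms7q"]

def prep_case(reading):
    """Case governed by a single-case preposition reading (assumed layout)."""
    return re.search(r"PREPp(\d)$", reading).group(1)

def noun_case(reading):
    """Case of a noun reading: gender, number, case digit, animacy (assumed layout)."""
    return re.search(r"[mfn][sp](\d)[gqv]$", reading).group(1)

def compatible_pairs(preps, nouns):
    """Keep only (preposition, noun) reading pairs whose cases match."""
    return [
        (p, n)
        for p in preps
        for n in nouns
        if prep_case(p) == noun_case(n)
    ]

print(compatible_pairs(PREP_READINGS, NOUN_READINGS))
# prints: [('u,.PREPp7', 'regionu,region.N1ms7q')]
```

Of the six possible combinations, only one remains, so both words are disambiguated at once.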
21Using disambiguation grammars: agreement of personal names
Special rules for the agreement of first name and surname
22Using disambiguation grammars: agreement (2)
- raspalio je Mladxan Dinkicx sxakom o okrugli sto "Platne kartice
- Mladxan,Mladxan.N1002HumNPropFirstSRms1v
- Mladxan,mladxan.A7akms1gakms4q
- Dinkicx,Dinkicx.N28NPropHumLastSRms1v
23The order of grammar application
- Apply first
- Apply second
24Careful construction of grammars
- Syntactic ambiguity
- Zalagacxu se da ti trosxkovi budu minimalni.
- I will do my best to minimize these expenses.
- I will do my best to minimize your expenses.
- Although some cases are much more frequent...
- Klicke je bio voljan da da automobil.
- Klicke was willing to give the car.
- Mislio sam da ti tvoja gospoda ne da da je vida.
- I thought that your missus would not let you see her.
25Thank you!