Title: An example of an Arabic Sentence in UNL (universal) format
1Towards a Language-Independent Universal Digital
Library
The Second International Conference on Universal
Digital Libraries (ICUDL 2006), 17-19 November 2006,
Alexandria, Egypt
Sameh Alansary, Magdy Nagi, Noha Adly
sameh.alansary_at_bibalex.org
magdy.nagi_at_bibalex.org
noha.adly_at_bibalex.org
Bibliotheca Alexandrina
2Introduction
- IT has made the full-text assets of libraries
available digitally (independent of time, place
and copy), e.g.
- Million Book Project.
- Nasser Digital Library.
Universal Digital Library (UDL)
- Digitization alone does not lead to
universality in its fullest sense.
- A new dimension of universality should be added:
independence of language.
3Language-dependency blocks information
dissemination
- Language dependency maintains language barriers.
- If everyone could always read in their own
mother tongue, this would help in
- Dissemination of knowledge.
- Preservation of nationality and identity.
- Preventing cultural hegemony.
- 80% of books and e-materials are written in
English and 20% are written in other languages.
4Attempts to break language barriers
- Translation systems have been introduced (NLP).
- Approaches
1- Direct translation approach.
2- Transfer approach.
3- Interlingual approach.
- Examples of systems
- Google translation
http://www.google.ch/language_tools
- Fujitsu systems
http://www.fujitsu.com/global/services/translation
5Drawbacks of MT systems
1- The quality of results is often inadequate.
2- They work for a limited number of language
combinations.
3- They place a heavy load on the network.
To translate from and to only 10 languages,
10 grammars, 10 lexicons, 90 translation
dictionaries and 90 sets of translation rules
(one for each ordered pair of languages) are
needed, plus semantic processing in each language.
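The figures on this slide follow from counting ordered language pairs. A minimal sketch of that arithmetic, assuming one dictionary and one rule set per source-to-target pair; the function names are illustrative, and the pivot count assumes one enconverter and one deconverter per language, as in the UNL design described later:

```python
def pairwise_resources(n):
    """Resources for direct or transfer MT among n languages."""
    pairs = n * (n - 1)  # ordered source -> target pairs
    return {
        "grammars": n,
        "lexicons": n,
        "translation_dictionaries": pairs,
        "rule_sets": pairs,
    }


def pivot_resources(n):
    """Converters needed when every language maps to and from one pivot."""
    return {"enconverters": n, "deconverters": n}


print(pairwise_resources(10))
print(pivot_resources(10))
```

The pairwise count grows quadratically with the number of languages, while the pivot count grows linearly.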
6Towards a universal system for knowledge
representation
7Some questions to bear in mind
- How can we represent natural language materials
in a language-independent format? (a format is
required)
- Which system is suitable for representing
knowledge in the selected format? (a system is
required)
- How is this system going to work?
8Requirements for a universal representation of
knowledge
1- The content of the original material (meaning)
must not be lost.
2- This universal format should be understandable
by various platforms over the network.
3- This universal format should be decodable to
any natural language.
9UNL System
10What is UNL? (1)
- The Universal Networking Language (UNL) is an
artificial language for computers to express
information and knowledge that can be expressed
in natural language.
- Started in 1996 as an initiative of the UNU/IAS
in Japan.
- Developed for 15 languages: Arabic, Chinese,
English, French, German, Hindi, Indonesian,
Italian, Japanese, Korean, Portuguese, Russian,
Spanish, Thai, Swahili.
- Transferred to the UNDL Foundation in 2001.
11What is UNL? (2)
- It expresses information or knowledge of
natural language (NL) in the form of a semantic
network with hyper-nodes.
Example: The boy who works here went to school
UNL expression:
{unl}
agt(go(icl>move).@entry.@past, :01)
plt(go(icl>move).@entry.@past, school(icl>institution))
agt:01(work(icl>do), boy(icl>person).@entry)
plc:01(work(icl>do), here)
{/unl}
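Each line of a UNL expression is a binary relation: a three-letter label, an optional scope identifier, and two arguments. A minimal parsing sketch, assuming the conventional UNL notation with `>` in constraints, `@` before attributes and `:01` for scopes; `parse_relation` and `split_args` are illustrative names, not part of any UNL tool:

```python
import re

# Label, optional scope id (e.g. ":01"), then the two arguments in parentheses.
RELATION = re.compile(r"^(?P<rel>[a-z]{3})(?::(?P<scope>\w+))?\((?P<args>.*)\)$")


def split_args(text):
    """Split 'a(b,c), d' at the single top-level comma."""
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return text[:i].strip(), text[i + 1:].strip()
    raise ValueError(f"expected two arguments in {text!r}")


def parse_relation(line):
    """Return (label, scope, source, target) for one relation line."""
    m = RELATION.match(line.strip())
    if m is None:
        raise ValueError(f"not a UNL relation: {line!r}")
    source, target = split_args(m.group("args"))
    return m.group("rel"), m.group("scope"), source, target


lines = [
    "agt(go(icl>move).@entry.@past, :01)",
    "plt(go(icl>move).@entry.@past, school(icl>institution))",
    "agt:01(work(icl>do), boy(icl>person).@entry)",
    "plc:01(work(icl>do), here)",
]
for line in lines:
    print(parse_relation(line))
```

Commas inside a universal word's own parentheses are skipped by tracking the nesting depth, so constraints such as `agt>person,obj>thing` stay intact.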
12The boy who works here went to school
UNL hypergraph:
go(icl>move).@entry.@past --agt--> :01
go(icl>move).@entry.@past --plt--> school(icl>institution)
Hyper-node :01:
work(icl>do) --agt--> boy(icl>person).@entry
work(icl>do) --plc--> here
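A hyper-node is a sub-graph that another relation can point to as if it were a single node. The sketch below groups the relations of this example by scope and prints them with the hyper-node expanded in place. Headwords are shortened for readability, and the tuple layout is an assumption of this sketch rather than a UNL data format:

```python
from collections import defaultdict

# (label, scope, source, target); scope None means the main graph.
relations = [
    ("agt", None, "go", ":01"),
    ("plt", None, "go", "school"),
    ("agt", "01", "work", "boy"),
    ("plc", "01", "work", "here"),
]


def build_scopes(rels):
    """Group edges by the scope they belong to."""
    scopes = defaultdict(list)
    for label, scope, source, target in rels:
        scopes[scope].append((source, label, target))
    return scopes


def expand(scopes, scope=None, indent=0):
    """List edges of a scope, descending into hyper-nodes such as ':01'."""
    out = []
    for source, label, target in scopes[scope]:
        out.append(" " * indent + f"{source} -{label}-> {target}")
        if target.startswith(":"):
            out.extend(expand(scopes, target[1:], indent + 2))
    return out


print("\n".join(expand(build_scopes(relations))))
```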
13The UNL System
- Formalism: knowledge representation.
- System: components.
14The UNL-system components
- UNL language servers, each with an EnConverter
(EnCO) and a DeConverter (DeCO):
UNL <-> Chinese, UNL <-> Arabic, UNL <-> Spanish,
UNL <-> Hindi, UNL <-> Japanese, UNL <-> English.
- UNL tools: UNL Editor, UNL Viewer.
- UNL Proxy.
- UNL documents are exchanged over the Internet.
15A) Language servers
A UNL language server comprises
- EnConverter (NL to UNL), driven by analysis rules.
- DeConverter (UNL to NL), driven by generation rules.
- Resources: UNL-language dictionary, co-occurrence
dictionary, knowledge base.
- Web server with UNL documents.
16B) UNL Tools
1- UNL viewer.
2- UNL editor.
3- UNL verifier.
C) UNL Proxy Server
- Searches for UNL on the web, sends it to the
language server and displays it in the user's
chosen language.
17Mechanism of conversion between NL and UNL
- Inputs and outputs: natural language texts,
annotated natural language texts, UNL document.
- Tools: Annotation Editor, Universal Parser,
EnConverter, UNL Verifier, DeConverter.
- Resources: UW dictionary, word dictionary,
grammatical rules, co-occurrence dictionary, UNL KB.
- Delivery: web server (HTML/XML).
18UNL as a formal language: how does it
represent knowledge?
1- Universal words (UWs) to represent concepts.
Example: boy(icl>person),
hear(icl>perceive(agt>person,obj>thing))
2- Relations: 38 semantic relations can be
distinguished.
Example: agt, aoj, bas, con, coo, dur, etc.
3- Attributes to express the subjectivity of the
speaker.
Example: @past, @emphasis, @def, @not, etc.
194- Knowledge base (UNLKB).
- Defines the universal words.
- Provides linguistic knowledge of concepts.
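A node in a UNL relation combines a headword, an optional constraint list in parentheses, and zero or more attributes. A rough sketch of splitting such a node into its parts, assuming the conventional notation (`>` in constraints, `.@` before each attribute) and ignoring node identifiers; the class and function names are illustrative:

```python
import re
from dataclasses import dataclass, field


@dataclass
class UniversalWord:
    headword: str
    constraints: str = ""  # e.g. "icl>person"
    attributes: list = field(default_factory=list)  # e.g. ["entry", "past"]


# Headword, optional "(constraints)", then any number of ".@attribute" parts.
UW = re.compile(r"^(?P<head>[^(.@]+?)(?:\((?P<cons>.*)\))?(?P<attrs>(?:\.@\w+)*)$")


def parse_uw(text):
    m = UW.match(text.strip())
    if m is None:
        raise ValueError(f"not a universal word: {text!r}")
    attributes = re.findall(r"@(\w+)", m.group("attrs"))
    return UniversalWord(m.group("head"), m.group("cons") or "", attributes)


print(parse_uw("boy(icl>person).@entry"))
print(parse_uw("hear(icl>perceive(agt>person,obj>thing))"))
print(parse_uw("here"))
```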
20Ibrahim Shihata UNL Arabic Center (ISUAC)
- It was established at Bibliotheca Alexandrina.
- It is responsible for designing, implementing,
and maintaining the various components of the
Arabic language server.
- The Arabic language server will be capable of
- Enconverting Arabic texts to the universal
format.
- Deconverting the universal materials produced by
other language centers to Arabic.
21The Achievements of the ISUAC
A) Arabic language resources and tools.
B) Developing tools.
C) Arabic language-based universal materials.
22A) Arabic language resources and tools
1- The Arabic Dictionary
It is a repository of information for all UNL
Arabic grammars.
Each dictionary entry holds
- Head words (vocabulary of Arabic).
- Universal words (vocabulary of UNL).
- Linguistic features (linguistic information
about the head words).
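The three parts of a dictionary entry can be pictured as a small record indexed in both directions. This is only an illustrative sketch: the field names, the two sample entries and the feature labels are assumptions, not the actual format or content of the ISUAC dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    head_word: str       # Arabic vocabulary item
    universal_word: str  # UNL vocabulary item
    features: tuple      # linguistic information about the head word


entries = [
    DictionaryEntry("ولد", "boy(icl>person)", ("N", "MASC", "SG")),
    DictionaryEntry("مدرسة", "school(icl>institution)", ("N", "FEM", "SG")),
]

# Enconversion looks entries up by Arabic head word,
# deconversion by universal word.
by_head_word = {e.head_word: e for e in entries}
by_universal_word = {e.universal_word: e for e in entries}

print(by_head_word["ولد"].universal_word)
print(by_universal_word["school(icl>institution)"].features)
```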
232- Arabic EnConversion Rules
- They are responsible for enconverting Arabic to UNL.
- Arabic EnConversion Rules are able to
1- Perform morphological analysis to extract the
concepts that the Arabic words refer to.
2- Assign the exact semantic relations between
concepts as expressed in the context of the
Arabic sentence.
24- Simulation of how the EnConverter works
??? ???? ??? ?????? ?? 15 ????? 1918 ?? 18 ????
????? ?? ????? ???????????.
??? / /???? ??? ??????/ /??/ /15/?????/ /1918/
/??/ /18/ /????/ /?????/ /??/ /?????/
/???/????????/.
[Figure: labels assigned to the segments above:
delete, mod, obj, tim, plc]
25UNL Network
263- Arabic DeConversion Rules
- They are responsible for generating Arabic
sentences out of UNL networks.
- Arabic DeConversion Rules are able to
1- Select Arabic words that represent universal
concepts.
2- Arrange the concepts of the UNL network into a
syntactically well-formed sentence.
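The two steps above (word selection, then ordering) can be illustrated with a toy generator for the simplified network of "The boy went to school". Everything here is a hypothetical stand-in for the real DeConversion rules: the three-word lexicon, the verb-agent-destination ordering and the fixed preposition are assumptions made for this sketch only.

```python
# Step 1, word selection: universal word -> Arabic word form (hypothetical lexicon).
LEXICON = {
    "go(icl>move)": "ذهب",                  # dhahaba, "went"
    "boy(icl>person)": "الولد",              # al-walad, "the boy"
    "school(icl>institution)": "المدرسة",    # al-madrasa, "the school"
}

# Step 2, ordering: verb first, then the agent (agt), then the destination (plt).
ORDER = ["agt", "plt"]
PREPOSITION = {"plt": "إلى"}  # ila, "to"


def deconvert(entry, relations):
    """Generate a sentence starting from the entry node of the network."""
    words = [LEXICON[entry]]
    for label in ORDER:
        for rel, source, target in relations:
            if rel == label and source == entry:
                if label in PREPOSITION:
                    words.append(PREPOSITION[label])
                words.append(LEXICON[target])
    return " ".join(words)


relations = [
    ("agt", "go(icl>move)", "boy(icl>person)"),
    ("plt", "go(icl>move)", "school(icl>institution)"),
]
print(deconvert("go(icl>move)", relations))
```

A real rule set would also handle agreement, definiteness and embedded scopes, none of which is attempted here.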
27- Simulation of how the Deconverter works
[Figure: UNL network with the nodes
description(icl>action), Egypt,
outcome(icl>result).@entry,
collaboration(icl>action), more(aoj>thing), 150,
scientist(icl>scholar).@entry,
prominent(aoj>thing), scholar(icl>person),
accompany(agt>thing,obj>thing), 1798 and
Bonaparte(iof>person), linked by the relations
obj, aoj, mod, agt, bas, and, gol and tim. Each
node is mapped to an Arabic word, and the words
are arranged into the Arabic sentence below.]
??? ??? ????? ????? ???? ?? 150 ???? ? ???? ?????
????? ?????? ??????? ?? 1789 ??? ???
284- A Corpus for Modern Standard Arabic
- A representative sample (100 million) that
reflects the empirical usage of Modern Standard
Arabic.
- It plays a principal role in enhancing and
updating both EnConversion and DeConversion rules.
29B) Developing tools
1- Integrated Development Environment (IDE)
312- Corpus analysis software (GATE)
32C) Arabic language-based universal materials
- Library of Alexandria: the Fourth Pyramid.
- Abu Simbel: the Temple of the Sun.
- Nasser Digital Library.
- The Encyclopaedia of Famous Persons.
33An example of an Arabic Sentence in UNL
(universal) format
35???? ???? ??? ?????? ????? ?????? ???? ??????
???? ???? ??? ?? ??? 1888 ?? ???? ??? ?? ?? ????
??? ?? ???? ?? ????????? ????? ??? ??? ??? ??
??????? ??? ?? ??? ????? ?????? ?? ????? ??????
???????????? ???? ????? ???? ?????? ????? ??????
??????.
{unl}
aoj(son(icl>person):0I.@def.@entry, Gamal Abdel Nasser(iof>person):00)
mod(son(icl>person):0I.@def.@entry, Abd El-Naser Hosen(iof>person):23.@topic)
aoj(old(aoj>thing):1J, son(icl>person):0I.@def)
man(old(aoj>thing):1J, most(icl>how):15)
obj(born(obj>thing):31.@past, Abd El-Naser Hossain(iof>person):23.@topic)
and(get(agt>thing,obj>thing):6S.@past.@contrast, born(obj>thing):31.@past)
scn(born(obj>thing):31.@past, family(icl>group):5Q)
plc(born(obj>thing):31.@past, village(icl>region):4D)
tim(born(obj>thing):31.@past, year(icl>period):3M)
mod(year(icl>period):3M, 1888:41)
plc(village(icl>region):4D, upper Egypt(iof>place):58)
mod(village(icl>region):4D, Bani Morr(iof>village):4S)
mod(family(icl>group):5Q, farmer(icl>person):65.@pl.@def)
obj(get(agt>thing,obj>thing):6S.@past.@contrast, degree(icl>abstract thing):7N)
agt(allow(agt>thing,gol>thing,obj>thing):8M.@past, degree(icl>abstract thing):7N)
mod(degree(icl>abstract thing):7N, education(icl>activity):82.@def)
gol(allow(agt>thing,gol>thing,obj>thing):8M.@past, join(agt>person,obj>thing):9I.@present)
obj(allow(agt>thing,gol>thing,obj>thing):8M.@past, his(pos>he):97)
and(suffice(aoj>thing,obj>thing):CM.@present, join(agt>person,obj>thing):9I.@present)
obj(join(agt>person,obj>thing):9I.@present, job(icl>work):A7)
plc(job(icl>work):A7, postal service(icl>service):AN)
plc(postal service(icl>service):AN, Alexandria(iof>city):BB)
aoj(suffice(aoj>thing,obj>thing):CM.@present, salary(icl>money):BV)
mod(salary(icl>money):BV, his(pos>he):CB)
obj(suffice(aoj>thing,obj>thing):CM.@present, satisfy(agt>thing,obj>thing):DQ)
man(suffice(aoj>thing,obj>thing):CM.@present, hardly:DA)
obj(satisfy(agt>thing,obj>thing):DQ, demand(icl>wants):E6.@pl.@def)
mod(demand(icl>wants):E6.@pl.@def, life(icl>activity):EV.@def)
{/unl}
Language-Independent Format
36Is it going to work this way?
- Are there language servers ready to work?
- Are the universal materials deconvertible to
other languages?
What about Arabic?
- Is the Arabic language server able to enconvert
Arabic texts to the universal format?
- Is it also able to deconvert the universal
materials back to Arabic?
37A proof of the concept
38UNL-based Library Information System (UNL-LIS)
- It is a system for searching digital library
catalogs.
- It is built on the UNL KI; therefore
- The query is in natural language (two languages).
- The answer is also in natural language (7 languages).
39UNL LIS Core Architecture
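1- The user question (question in NL) is sent to
the LIS.
2- Enconversion process (language server: Enco
rules, dictionary): question in NL to question
in UNL.
3- Query engine: answers the question in UNL from
the UNL KB, which holds the MARC21 records
(MARC21 importing process), the encyclopedia and
the concept definitions. The result is the
answer in UNL.
4- Deconversion process (language server: Deco
rules, dictionary): answer in UNL to answer in NL.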
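The flow on this slide (import, enconvert, query, deconvert) can be mimicked with a toy pipeline. This is a sketch under heavy assumptions, not the UNL-LIS implementation: the two catalogue records, the single supported question pattern and the answer templates are all invented for illustration. Only the MARC21 tags 100 (main author) and 245 (title) and the universal word `write(agt>thing,obj>thing)` are taken from real usage.

```python
# Toy catalogue in a MARC21-like shape: tag 100 = main author, tag 245 = title.
RECORDS = [
    {"100": "Naguib Mahfouz", "245": "Palace Walk"},
    {"100": "Taha Hussein", "245": "The Days"},
]


def import_records(records):
    """MARC21 importing process: keep only (author, title) facts."""
    return [(r["100"], r["245"]) for r in records]


def enconvert(question):
    """Stand-in for a language server; understands only 'Who wrote <title>?'."""
    prefix = "Who wrote "
    if not (question.startswith(prefix) and question.endswith("?")):
        raise ValueError(f"unsupported question: {question!r}")
    return {"verb": "write(agt>thing,obj>thing)", "obj": question[len(prefix):-1]}


def query(facts, question_unl):
    """Query engine: fill in the missing agent of the question."""
    for author, title in facts:
        if title == question_unl["obj"]:
            return {**question_unl, "agt": author}
    return None


# Stand-in for deconversion: one answer template per output language.
TEMPLATES = {"en": "{agt} wrote {obj}.", "fr": "{agt} a écrit {obj}."}


def deconvert(answer_unl, lang):
    return TEMPLATES[lang].format(**answer_unl)


facts = import_records(RECORDS)
answer = query(facts, enconvert("Who wrote Palace Walk?"))
for lang in ("en", "fr"):
    print(deconvert(answer, lang))
```

Because the query engine works only on the language-independent form, adding an output language in this sketch means adding a template, not touching the catalogue or the query logic.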
40Demo Screen Shots
41{unl}
agt(begin(agt>thing,obj>action):12.@past.@entry, Naguib Mahfouz(iof>person):0N.@topic)
obj(begin(agt>thing,obj>action):12.@past.@entry, writing(icl>action):18)
tim(begin(agt>thing,obj>action):12.@past.@entry, year old:1S.@past)
aoj(year old:1S.@past, Naguib Mahfouz(iof>person):0N.@topic)
qua(year old:1S.@past, 17)
plc(born(aoj>thing):00, Cairo(iof>city):08)
aoj(born(aoj>thing):00, Naguib Mahfouz(iof>person):0N.@topic)
tim(born(aoj>thing):00, 1911:0H)
{/unl}
[/S]
Time 1.4 Sec Done!
{unl}
and(write(agt>thing,obj>thing):1K.@past.@entry, publish(agt>thing,obj>thing):0K.@past)
obj(write(agt>thing,obj>thing):1K.@past.@entry, novel(icl>tale):1B.@pl.@topic)
tim(write(agt>thing,obj>thing):1K.@past.@entry, before(icl>how(obj>thing)):1S)
aoj(more(icl>additional):1A, novel(icl>tale):1B.@pl.@topic)
qua(novel(icl>tale):1B.@pl.@topic, 10:16)
[/S]
46Conclusion
47Conclusion
- Language independence is a very important
dimension that should be considered in storing
and retrieving texts for a UDL.
- The UNL system is a promising formalism for
representing knowledge in a universal format.
- The ISUAC is less than 2 years old; however, it
is one of the most active language centres in
designing and implementing UNL materials and
tools.
- The UNL LIS has proved the feasibility of the
concept of language independence.
48Thank You
Any questions are welcome.