An example of an Arabic Sentence in UNL (universal) format

Provided by: bibalexOr5
Learn more at: http://www.bibalex.org
1
Towards a Language-Independent Universal Digital Library
The Second International Conference on Universal Digital Libraries (ICUDL 2006)
17-19 November 2006, Alexandria, Egypt
Sameh Alansary, Magdy Nagi, Noha Adly
sameh.alansary@bibalex.org, magdy.nagi@bibalex.org, noha.adly@bibalex.org
Bibliotheca Alexandrina
2
Introduction
  • IT has made the full-text assets of libraries
    available digitally (independent of time, place
    and copy).

e.g. - Million Book Project.
- Nasser Digital Library.
UDL
  • Digitization alone does not lead to
    universality in its fullest sense.
  • A new dimension of universality should be added:
    independence of language.

3
Language-dependency blocks information
dissemination
  • Language dependency maintains language barriers.
  • If everyone could always read in their own
    mother tongue, this would help in:
    - Dissemination of knowledge.
    - Preservation of nationality and identity.
    - Preventing cultural hegemony.
  • 80% of books and e-materials are written in
    English and 20% are written in other languages.

4
Attempts to break language barriers
  • Translation systems have been introduced (NLP).

Approaches:
1- Direct translation approach.
2- Transfer approach.
3- Interlingual approach.
  • Examples of systems:
    - Google translation
      http://www.google.ch/language_tools
    - Fujitsu systems
      http://www.fujitsu.com/global/services/translation

5
Drawbacks of MT systems
1- The quality of results is often inadequate.
2- They work for a limited number of language
combinations.
3- They place a heavy load on the network.
To translate from and to only 10 languages,
10 grammars, 10 lexicons, 90 translation
dictionaries and 90 sets of translation rules
(one per ordered pair of languages, 10 x 9)
will be needed, plus the need for semantic
processing in each language.
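The figures on this slide follow from counting ordered language pairs. The short sketch below reproduces them and, for comparison, counts the converters an interlingua such as UNL would need, assuming one enconverter and one deconverter per language. The function names are illustrative and do not come from the presentation.

```python
def direct_translation_resources(n_languages: int) -> dict:
    """Resources needed when every language is translated directly
    into every other language (one direction at a time)."""
    ordered_pairs = n_languages * (n_languages - 1)
    return {
        "grammars": n_languages,
        "lexicons": n_languages,
        "translation_dictionaries": ordered_pairs,
        "translation_rule_sets": ordered_pairs,
    }


def interlingua_resources(n_languages: int) -> dict:
    """Converters needed when every language only maps to and from
    a shared pivot representation (assumption: one of each per language)."""
    return {
        "enconverters": n_languages,
        "deconverters": n_languages,
    }


if __name__ == "__main__":
    print(direct_translation_resources(10))
    print(interlingua_resources(10))
```

The pairwise count grows quadratically with the number of languages, while the pivot count grows linearly, which is the motivation for the universal representation discussed next.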
6
Towards a universal system for knowledge
representation
7
Some questions to bear in mind
  • How can we represent natural language materials
    in a language-independent format? (a format is
    required)
  • Which system is suitable for representing
    knowledge in the selected format? (a system is
    required)
  • How is this system going to work?

8
Requirements for a universal representation of
knowledge
1- The content of the original material (meaning)
must not be lost.
2- This universal format should be understandable
by various platforms over the network.
3- This universal format should be decodable to
any natural language.
9
UNL System
10
What is UNL? (1)
  • The Universal Networking Language (UNL) is an
    artificial language for computers to express
    information and knowledge that can be expressed
    in natural language.
  • Started in 1996 as an initiative of the UNU/IAS
    in Japan.
  • R&D in UNL:

- Development on 15 languages: Arabic, Chinese,
English, French, German, Hindi, Indonesian,
Italian, Japanese, Korean, Portuguese, Russian,
Spanish, Thai, Swahili.
- Transferred to the UNDL Foundation in 2001.
11
What is UNL? (2)
  • It expresses the information or knowledge of
    natural language (NL) in the form of a semantic
    network with hyper-nodes.

Example: The boy who works here went to school
UNL expression:
UNL
agt(go(icl>move).@entry.@past, :01)
plt(go(icl>move).@entry.@past, school(icl>institution))
agt:01(work(icl>do), boy(icl>person).@entry)
plc:01(work(icl>do), here)
/UNL
12
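[Figure: UNL hyper-graph of "The boy who works here went to school". The entry node go(icl>move).@entry.@past has an agt arc to hyper-node 01 and a plt arc to school(icl>institution). Inside hyper-node 01, work(icl>do) has an agt arc to boy(icl>person).@entry and a plc arc to here.]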
13
The UNL System
[Diagram: the UNL System viewed as a formalism (knowledge representation) and as a system (components).]
14
The UNL-system components
[Figure: UNL language servers for Chinese, Arabic, Spanish, Hindi, Japanese and English (UNL <-> each language), each with an EnConverter (EnCO) and a DeConverter (DeCO), connected through the Internet to UNL documents and to the UNL Editor, UNL Viewer and UNL Proxy.]
15
A) Language servers
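[Figure: a UNL language server. The EnConverter (with analysis rules) converts NL to UNL and the DeConverter (with generation rules) converts UNL to NL. Other components shown: UNL-language dictionary, co-occurrence dictionary, knowledge base, and a web server with the UNL document.]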
16
B) UNL Tools
1- UNL viewer.
2- UNL editor.
3- UNL verifier.
C) UNL Proxy Server
  • Searches for UNL documents on the web, sends
    them to the language server and displays them
    in the user's chosen language.

17
Mechanism of conversion between NL and UNL
[Figure: conversion between NL and UNL. The EnConverter turns natural language texts into a UNL document, and the DeConverter turns a UNL document back into natural language texts. Other components shown: annotated natural language texts, Annotation Editor, Universal Parser, UNL Verifier, UW dictionary, word dictionary, co-occurrence dictionary, grammatical rules, UNL KB, and a web server (HTML/XML).]
18
UNL as a formal language: how does it
represent knowledge?
1- Universal Words (UWs) to represent concepts.
Example: boy(icl>person),
hear(icl>perceive(agt>person,obj>thing))
2- Relations: 38 semantic relations can be
distinguished.
Example: agt, aoj, bas, con, coo, dur, etc.
3- Attributes to express the subjectivity of the
speaker.
Example: @past, @emphasis, @def, @not, etc.
19
4- Knowledge base (UNLKB).
  • Defines the Universal Words.
  • Provides linguistic knowledge of concepts.
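The first three building blocks listed above (Universal Words, relations, attributes) can be pictured as a small data model. The sketch below is only an illustration under the assumption that a UW is written as `headword(constraints)` followed by dot-separated `@attributes`; the class names `UW` and `BinaryRelation` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class UW:
    """A Universal Word: a headword, optional constraints, and attributes."""
    headword: str
    constraints: str = ""               # e.g. "icl>person"
    attributes: Tuple[str, ...] = ()    # e.g. ("@entry", "@past")

    def __str__(self) -> str:
        text = self.headword
        if self.constraints:
            text += f"({self.constraints})"
        return text + "".join(f".{a}" for a in self.attributes)


@dataclass(frozen=True)
class BinaryRelation:
    """A labelled, directed link between two Universal Words."""
    label: str      # one of the semantic relations, e.g. "agt", "plt"
    source: UW
    target: UW

    def __str__(self) -> str:
        return f"{self.label}({self.source}, {self.target})"


go = UW("go", "icl>move", ("@entry", "@past"))
school = UW("school", "icl>institution")
went_to_school = BinaryRelation("plt", go, school)

if __name__ == "__main__":
    print(went_to_school)
```

Keeping attributes separate from the UW constraints mirrors the split described on the slide: the concept stays the same while tense, definiteness, or negation are layered on top.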

20
Ibrahim Shihata UNL Arabic Center (ISUAC)
  • It was established at Bibliotheca Alexandrina.
  • It is responsible for designing, implementing,
    and maintaining the various components of the
    Arabic language server.
  • The Arabic language server will be capable of:

- Enconverting Arabic texts to the universal
format.
- Deconverting the universal materials produced by
other language centers to Arabic.
21
The Achievements of the ISUAC
A) Arabic language resources and tools.
B) Developing tools.
C) Arabic language-based universal materials.
22
A) Arabic language resources and tools
1- The Arabic Dictionary
It is a repository of information for all UNL
Arabic grammars.
The dictionary contains:
- Head words (vocabulary of Arabic)
- Universal Words (vocabulary of UNL)
- Linguistic features (linguistic information about the head words)
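As a rough picture of the three parts of the dictionary, the sketch below models an entry as a head word, a Universal Word, and a set of features, with lookups in both directions. The two entries and the feature tags (`N`, `MASC`, `FEM`) are invented for illustration and are not taken from the ISUAC dictionary or its file format.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class DictionaryEntry:
    head_word: str               # Arabic vocabulary item
    universal_word: str          # UNL vocabulary item
    features: FrozenSet[str]     # linguistic information about the head word


# Hypothetical sample entries, for illustration only.
ENTRIES: List[DictionaryEntry] = [
    DictionaryEntry("ولد", "boy(icl>person)", frozenset({"N", "MASC"})),
    DictionaryEntry("مدرسة", "school(icl>institution)", frozenset({"N", "FEM"})),
]

# Enconversion looks up Arabic head words; deconversion looks up UWs.
BY_HEAD_WORD: Dict[str, DictionaryEntry] = {e.head_word: e for e in ENTRIES}
BY_UW: Dict[str, DictionaryEntry] = {e.universal_word: e for e in ENTRIES}

if __name__ == "__main__":
    print(BY_HEAD_WORD["ولد"].universal_word)
    print(BY_UW["school(icl>institution)"].head_word)
```

A real dictionary would map one head word to several candidate UWs and rely on the features to choose among them; this sketch keeps a one-to-one mapping for brevity.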
23
2- Arabic EnConversion Rules
  • They are responsible for enconverting Arabic to UNL.
  • Arabic EnConversion Rules are able to:

1- Perform morphological analysis to extract
the concepts that the Arabic words refer to.
2- Assign the exact semantic relation between
concepts as expressed in the context of the
Arabic sentence.
24
  • Simulation of how Enconverter works

[Figure: an Arabic sentence is segmented into words and EnConverter rules are applied to it. The rule actions shown are delete (twice) and the relations mod (four times), obj, tim (twice) and plc (three times). The Arabic script was not preserved in this transcript; only the numerals 15, 1918 and 18 survive.]
25
UNL Network
26
3- Arabic DeConversion Rules
  • They are responsible for generating Arabic
    sentences out of UNL networks.
  • Arabic DeConversion Rules are able to:

1- Select Arabic words that represent universal
concepts.
2- Arrange the concepts of the UNL network into a
syntactically well-formed sentence.
27
  • Simulation of how the Deconverter works

[Figure: a UNL network with the nodes description(icl>action), Egypt, outcome(icl>result).@entry, collaboration(icl>action), more(aoj>thing), 150, scientist(icl>scholar).@entry, prominent(aoj>thing), scholar(icl>person), accompany(agt>thing,obj>thing), 1798 and Bonaparte(iof>person), linked by the relations obj, aoj, mod, agt, bas, and, gol and tim. Below it are the Arabic words selected for each node and the generated Arabic sentence. The Arabic script was not preserved in this transcript.]
28
4- A Corpus for Modern Standard Arabic
  • A representative sample (100 million) that
    reflects the empirical usage of Modern Standard
    Arabic.
  • It plays a principal role in enhancing and
    updating both EnConversion and DeConversion rules.

29
B) Developing tools
1- Integrated Development Environment (IDE)
30
(No Transcript)
31
2- Corpus analysis software (GATE)
32
C) Arabic language-based universal materials.
- Library of Alexandria: the Fourth Pyramid.
- Abu Simbel: The Temple of the Sun.
- Nasser Digital Library.
- The Encyclopaedia of Famous Persons.
33
An example of an Arabic Sentence in UNL
(universal) format
34
(No Transcript)
35
[Arabic source sentence. The Arabic script was not preserved in this transcript; only the year 1888 survives.]
unl
aoj(son(icl>person):0I.@def.@entry, Gamal Abdel Nasser(iof>person):00)
mod(son(icl>person):0I.@def.@entry, Abd El-Naser Hosen(iof>person):23.@topic)
aoj(old(aoj>thing):1J, son(icl>person):0I.@def)
man(old(aoj>thing):1J, most(icl>how):15)
obj(born(obj>thing):31.@past, Abd El-Naser Hossain(iof>person):23.@topic)
and(get(agt>thing,obj>thing):6S.@past.@contrast, born(obj>thing):31.@past)
scn(born(obj>thing):31.@past, family(icl>group):5Q)
plc(born(obj>thing):31.@past, village(icl>region):4D)
tim(born(obj>thing):31.@past, year(icl>period):3M)
mod(year(icl>period):3M, 1888:41)
plc(village(icl>region):4D, upper Egypt(iof>place):58)
mod(village(icl>region):4D, Bani Morr(iof>village):4S)
mod(family(icl>group):5Q, farmer(icl>person):65.@pl.@def)
obj(get(agt>thing,obj>thing):6S.@past.@contrast, degree(icl>abstract thing):7N)
agt(allow(agt>thing,gol>thing,obj>thing):8M.@past, degree(icl>abstract thing):7N)
mod(degree(icl>abstract thing):7N, education(icl>activity):82.@def)
gol(allow(agt>thing,gol>thing,obj>thing):8M.@past, join(agt>person,obj>thing):9I.@present)
obj(allow(agt>thing,gol>thing,obj>thing):8M.@past, his(pos>he):97)
and(suffice(aoj>thing,obj>thing):CM.@present, join(agt>person,obj>thing):9I.@present)
obj(join(agt>person,obj>thing):9I.@present, job(icl>work):A7)
plc(job(icl>work):A7, postal service(icl>service):AN)
plc(postal service(icl>service):AN, Alexandria(iof>city):BB)
aoj(suffice(aoj>thing,obj>thing):CM.@present, salary(icl>money):BV)
mod(salary(icl>money):BV, his(pos>he):CB)
obj(suffice(aoj>thing,obj>thing):CM.@present, satisfy(agt>thing,obj>thing):DQ)
man(suffice(aoj>thing,obj>thing):CM.@present, hardly:DA)
obj(satisfy(agt>thing,obj>thing):DQ, demand(icl>wants):E6.@pl.@def)
mod(demand(icl>wants):E6.@pl.@def, life(icl>activity):EV.@def)
/unl
Language-Independent Format
36
Is it going to work this way?!!
  • Are there language servers ready to work?
  • Are the universal materials deconvertible to
    other languages?

What about Arabic??
  • Is the Arabic language server able to enconvert
    Arabic texts to universal format?
  • Is it also able to deconvert the universal
    materials back to Arabic?

37
A proof of the concept
38
UNL-based Library Information System (UNL-LIS)
  • It is a system for searching digital library
    catalogs.
  • It is built on the UNL KI; therefore:

- The query is in natural language (two languages).
- The answer is also in natural language (7 languages).

39
UNL LIS Core Architecture
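[Figure: UNL-LIS core architecture. A user question in NL goes through the enconversion process (language server with EnCo rules and dictionary) to become a question in UNL. The query engine produces an answer in UNL, which the deconversion process (language server with DeCo rules and dictionary) turns into an answer in NL. Other components shown: UNL KB, MARC21 records with a MARC21 importing process, encyclopedia, and concept definitions.]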
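One way to picture the middle step of this architecture is as pattern matching over UNL-style relations: the question arrives as a relation with one unknown argument, and the engine returns the stored relations that fit. The sketch below is a toy under that assumption, not the actual UNL-LIS query engine; `FACTS`, `Pattern`, and `match` are hypothetical names, and the three facts are simplified from the demo output on slide 41.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]                          # (relation, source, target)
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

# A tiny stand-in for knowledge imported from catalog records.
FACTS: List[Triple] = [
    ("aoj", "born(aoj>thing)", "Naguib Mahfouz(iof>person)"),
    ("plc", "born(aoj>thing)", "Cairo(iof>city)"),
    ("tim", "born(aoj>thing)", "1911"),
]


def match(pattern: Pattern, facts: List[Triple]) -> List[Triple]:
    """Return the facts that agree with every non-None slot of the pattern."""
    return [
        fact for fact in facts
        if all(p is None or p == value for p, value in zip(pattern, fact))
    ]


if __name__ == "__main__":
    # "Where was he born?" expressed as a relation with an unknown target.
    print(match(("plc", "born(aoj>thing)", None), FACTS))
```

Because both the question and the stored knowledge are in the same language-independent form, the matching step does not need to know which natural language the user typed the question in.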
40
Demo Screen Shots
41
unl
agt(begin(agt>thing,obj>action):12.@past.@entry, Naguib Mahfouz(iof>person):0N.@topic)
obj(begin(agt>thing,obj>action):12.@past.@entry, writing(icl>action):18)
tim(begin(agt>thing,obj>action):12.@past.@entry, year old:1S.@past)
aoj(year old:1S.@past, Naguib Mahfouz(iof>person):0N.@topic)
qua(year old:1S.@past, 17)
plc(born(aoj>thing):00, Cairo(iof>city):08)
aoj(born(aoj>thing):00, Naguib Mahfouz(iof>person):0N.@topic)
tim(born(aoj>thing):00, 1911:0H)
/unl
/S
Time 1.4 Sec Done!
unl
and(write(agt>thing,obj>thing):1K.@past.@entry, publish(agt>thing,obj>thing):0K.@past)
obj(write(agt>thing,obj>thing):1K.@past.@entry, novel(icl>tale):1B.@pl.@topic)
tim(write(agt>thing,obj>thing):1K.@past.@entry, before(icl>how(obj>thing)):1S)
aoj(more(icl>additional):1A, novel(icl>tale):1B.@pl.@topic)
qua(novel(icl>tale):1B.@pl.@topic, 10:16)
/S
42
(No Transcript)
43
(No Transcript)
44
(No Transcript)
45
(No Transcript)
46
Conclusion
47
Conclusion
  • Language independence is a very important
    dimension that should be considered in storing
    and retrieving texts for a UDL.
  • The UNL system is a promising formalism for
    representing knowledge in a universal format.
  • The ISUAC is less than 2 years old; however, it is
    one of the most active language centres in
    designing and implementing UNL materials and
    tools.
  • The UNL LIS has proved the feasibility of the
    concept of language independence.

48
Thank You
Any questions are welcome.