Adriana Roventini* - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Adriana Roventini, Rita Marinelli
Extending the Italian WordNet with the
Specialized Language of the Maritime Domain
Istituto di Linguistica Computazionale del CNR, Pisa, Italy
e-mail: rita.marinelli_at_ilc.cnr.it - adriana.roventini_at_ilc.cnr.it
2
Our purpose
To describe the construction, under way at the
Institute for Computational Linguistics, of a
terminological subset belonging to the maritime
lexical domain (in particular, the technical and
commercial/maritime transport domain).
3
WordNet
  • In the Princeton semantic WordNet (Miller et al.,
    1990), the meanings of words are represented in
    terms of their conceptual-semantic and lexical
    relations to other words.
  • It has been the tool of choice for building
    Natural Language Processing (NLP) systems of
    various kinds.

4
EWN
  • The main goals of EuroWordNet (EWN) are:
  • to develop a (multilingual) lexical resource,
    retaining the basic underlying design of WordNet
    1.5 (hereafter WN1.5);
  • to improve it in order to meet the needs of
    research in the field of NLP (Vossen, 1999).

5
Background
  • SI-TAL: an Italian national project (Integrated
    System for the Automatic Treatment of Language)
  • Aim: development of various integrated language
    resources and software tools for the automatic
    treatment of written and spoken Italian
  • ITALWORDNET (IWN): a lexical-semantic resource
    developed within the SI-TAL project, enlarging
    the first database built in EWN.

6
IWN
  • IWN: first built in the EWN project, then
    enlarged within SI-TAL (Integrated System for the
    Automatic Treatment of Language)
  • IWN database containing ca. 50,000 synsets:
  • Nouns
  • Verbs
  • Adjectives
  • Adverbs
  • Proper Names
  • IWN links synsets by lexical-semantic relations
  • Synonymy and hyponymy: the most important
    relations
  • Many other semantic relations encoded for various
    subsets of Italian nouns (common and proper),
    verbs, adjectives
  • IWN synsets linked to WordNet 1.5 through a
    generic ILI (InterLingual Index)

Not encoded in EWN
7
The IWN linguistic model
  • Synsets and the synonymy relation
  • The synset is the basic notion around which WN,
    EWN and IWN are built: a set of synonymous words
    belonging to the same Part-of-Speech (PoS) that
    can be interchanged in at least one context.
  • Synsets are connected by semantic relations
    to other synsets and to the ILI (an unstructured
    version of WN 1.5, containing all its synsets but
    not the relations among them).
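
The model above (a synset as a set of same-PoS variants, typed links to other synsets, and an equivalence link to the ILI) can be pictured with a small data structure. This is only an illustrative sketch, not the IWN implementation: the class name `Synset`, its field names and the ILI label are assumptions made for the example, while the synset IDs and relation names are borrowed from the imbarcare export example in this presentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Synset:
    """Hypothetical in-memory view of one synset (not the IWN schema)."""
    synset_id: str
    pos: str                                   # part of speech, e.g. "N" or "V"
    variants: List[str]                        # synonymous lemmas
    relations: Dict[str, List[str]] = field(default_factory=dict)
    ili_record: Optional[str] = None           # equivalence link to the ILI

    def add_relation(self, rel_type: str, target_id: str) -> None:
        """Record a language-internal link to another synset."""
        self.relations.setdefault(rel_type, []).append(target_id)


# IDs and relation names come from the imbarcare export example;
# the ILI label is a made-up placeholder.
imbarcare = Synset("V32560", "V", ["imbarcare"])
imbarcare.add_relation("has_hyperonym", "32127")   # fare
imbarcare.add_relation("has_hyponym", "36489")     # reimbarcare
imbarcare.ili_record = "ILI:placeholder"
```

Keeping the ILI link separate from the language-internal relations mirrors the distinction the slides draw between the two kinds of link.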

8
Also inherited from EWN
  • Language-internal relations link the
    language-specific synsets (mainly
    hyperonymy/hyponymy, i.e. the is-a relation, plus
    role, cause, purpose, part relations, etc.).
  • Equivalence relations link the Italian synsets
    to the InterLingual-Index (ILI).
  • Linking our wordnet to the ILI makes it possible
    to use IWN in multilingual applications.

9
Reasons for our choice
  • The globalisation of trade, business and travel,
    together with technological development (growing
    importance of transport).
  • The changes taking place in maritime activity
    and in the related terminology (this lexical
    domain carries considerable weight).
  • New techniques of communication, translation and
    diffusion of terms (dominance of the English
    language).

10
Building/structuring the terminological IWN
  • according to the design principles of the generic
    wordnet (applying the same model of semantic
    relations)
  • exploiting the possibility, available in IWN
    through the Inter-Lingual Index (ILI), of
    linking the specialized terms to the closest
    corresponding concepts in English.

11
Sources
  • Several information sources were used to select
    the base concepts (BCs):
  • the Dizionario Globale dei termini marinareschi,
    edited by the Capitaneria del Porto di Livorno,
    available online
  • the Dizionario di marina, edited by Barberi
    Squarotti G. and Gallinaro I. (2002)
  • the Glossario dello spedizioniere (Annuario
    Federspedi 1988)
  • the Dizionario di termini marittimi mercatili,
    compiled by P. R. Brodie and translated by E.
    Vincenzini, Lloyd's of London Press, Legal
    Publishing and Conferences Division, 1988.

12
Choice of the base concepts (BCs)
  • Design of the top level of the terminological
    database, identifying the most relevant and
    representative domain concepts, or base concepts
    (BCs),
  • i.e. those with a large number of hyponyms
    and/or used most frequently in this particular
    domain of maritime navigation and transport.

13
First Base-Concepts
  • A first nucleus of over 200 BCs was identified,
    such as nave (ship), porto (harbour), ormeggio
    (mooring), albero (mast), carico (cargo),
    spedizione (shipment), navigazione (navigation),
    trasporto (transport), tariffa (tariff), nolo
    (freight) and so on, which are sufficiently
    general and constitute the root nodes of the
    specialized database.

14
BCs export/import
  • as XML files
  • (see the example below concerning the verb
    imbarcare/to ship).

[Diagram: IWN, XML, IWNTerm]
15
  • Example of an XML export file: imbarcare (to
    ship)

    <WORD_MEANING ID="V32560" PART_OF_SPEECH="V">
      <GLOSS />
      <VARIANTS>
        <LITERAL LEMMA="imbarcare" SENSE="1" STATUS="CT" />
      </VARIANTS>
      <INTERNAL_LINKS>
        <RELATION TYPE="xpos_near_synonym" ID="2" INV_ID="2">
          <TARGET_WM ID="27869" PART_OF_SPEECH="N" LEMMA="imbarco" SENSE="1" GLOSS="" />
        </RELATION>
        <RELATION TYPE="has_hyperonym" ID="8" INV_ID="8">
          <TARGET_WM ID="32127" PART_OF_SPEECH="V" LEMMA="fare" SENSE="14" GLOSS="causare un cambiamento in un processo o uno stato (seguito da un infinito)." />
        </RELATION>
        <RELATION TYPE="has_hyponym" ID="10" INV_ID="10">
          <TARGET_WM ID="36489" PART_OF_SPEECH="V" LEMMA="reimbarcare" SENSE="1" GLOSS="" />
        </RELATION>
        <RELATION TYPE="involved_instrument" ID="31" INV_ID="31">
          <TARGET_WM ID="15111" PART_OF_SPEECH="N" LEMMA="imbarcatoio" SENSE="1" GLOSS="" />
        </RELATION>
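
As a rough illustration of how such an export file could be read back, the sketch below parses the imbarcare entry with Python's standard `xml.etree.ElementTree` module and lists its relations. The embedded XML repeats the entry shown on the slide; the closing `INTERNAL_LINKS` and `WORD_MEANING` tags are an assumption, since the slide cuts off before them. The function name `read_word_meaning` and the shape of the returned dictionary are hypothetical, and nothing here is meant to describe the actual IWN import tool.

```python
import xml.etree.ElementTree as ET

# Entry from the slide; the two closing tags at the end are assumed.
EXPORT = """\
<WORD_MEANING ID="V32560" PART_OF_SPEECH="V">
  <GLOSS />
  <VARIANTS>
    <LITERAL LEMMA="imbarcare" SENSE="1" STATUS="CT" />
  </VARIANTS>
  <INTERNAL_LINKS>
    <RELATION TYPE="xpos_near_synonym" ID="2" INV_ID="2">
      <TARGET_WM ID="27869" PART_OF_SPEECH="N" LEMMA="imbarco" SENSE="1" GLOSS="" />
    </RELATION>
    <RELATION TYPE="has_hyperonym" ID="8" INV_ID="8">
      <TARGET_WM ID="32127" PART_OF_SPEECH="V" LEMMA="fare" SENSE="14"
                 GLOSS="causare un cambiamento in un processo o uno stato (seguito da un infinito)." />
    </RELATION>
    <RELATION TYPE="has_hyponym" ID="10" INV_ID="10">
      <TARGET_WM ID="36489" PART_OF_SPEECH="V" LEMMA="reimbarcare" SENSE="1" GLOSS="" />
    </RELATION>
    <RELATION TYPE="involved_instrument" ID="31" INV_ID="31">
      <TARGET_WM ID="15111" PART_OF_SPEECH="N" LEMMA="imbarcatoio" SENSE="1" GLOSS="" />
    </RELATION>
  </INTERNAL_LINKS>
</WORD_MEANING>
"""


def read_word_meaning(xml_text):
    """Return the ID, PoS, lemmas and (relation type, target lemma) pairs."""
    root = ET.fromstring(xml_text)
    lemmas = [lit.get("LEMMA") for lit in root.iter("LITERAL")]
    links = [
        (rel.get("TYPE"), rel.find("TARGET_WM").get("LEMMA"))
        for rel in root.iter("RELATION")
    ]
    return {
        "id": root.get("ID"),
        "pos": root.get("PART_OF_SPEECH"),
        "lemmas": lemmas,
        "links": links,
    }


entry = read_word_meaning(EXPORT)
for rel_type, target in entry["links"]:
    print(f'{entry["lemmas"][0]} --{rel_type}--> {target}')
```

Each printed line pairs the source lemma with one relation type and its target lemma, which is the information an import step would need to recreate the links in the terminological database.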

16
New BCs
  • Other BCs were added ex novo: they were not
    present with their maritime senses in the generic
    database, but are very frequently used and
    representative of this specific domain, for
    instance nolo (freight), classe (class), fanale
    (light), punto (position), destino (destination),
    agente marittimo (shipping agent), spedizioniere
    (freight forwarder).

17
Example Punto (Position)
18
Use of Relations to codify specialized terms
  • The first nucleus of terms was enlarged by
    encoding hyponyms and using other semantic
    relations.

19
Example Ormeggio (Mooring)
20
Kind and Number of Terms
  • 2227 lemmas, corresponding to 1721 synsets and
    2355 word-senses, belonging to the maritime
    (technical/nautical and maritime transport)
    domain, all linked to the generic wordnet.
  • Terms from all grammatical categories (nouns,
    verbs, adjectives, adverbs and a small set of
    proper names) have been codified in the
    terminological database (3971 relations).

21
Example Porto (Harbour)
22
Polylexical Units
  • Base Concepts (BCs) as the root of a
    terminological sub-hierarchy
  • In many cases: hyponym = BC + adjective or
    prepositional phrase
  • For instance
  • carico (cargo):
  • carico completo (full cargo), carico di
    merci varie (general cargo), carico in coperta
    (deck cargo), carico parziale (part load cargo),
  • tariffa (tariff):
  • tariffa doganale (customs tariff), tariffa di
    trasporto (transport tariff), tariffa forfettaria
    (flat-rate tariff),
  • nolo (freight):
  • nolo anticipato (freight prepaid), nolo
    intero (full freight), nolo secondo il valore (ad
    valorem freight), nolo a destino (freight payable
    at destination).
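
The pattern on this slide, where a hyponym is formed by the BC followed by an adjective or prepositional phrase, suggests a simple way to propose candidate sub-hierarchies automatically. The sketch below groups the multiword terms from the slide under their head word. It is a head-word heuristic assumed purely for illustration, not the procedure the authors describe, and the function name `group_by_base_concept` is hypothetical; any candidate produced this way would still need manual checking.

```python
from collections import defaultdict

# Base concepts and multiword terms taken from the slide.
BASE_CONCEPTS = {"carico", "tariffa", "nolo"}

TERMS = [
    "carico completo", "carico di merci varie",
    "carico in coperta", "carico parziale",
    "tariffa doganale", "tariffa di trasporto", "tariffa forfettaria",
    "nolo anticipato", "nolo intero",
    "nolo secondo il valore", "nolo a destino",
]


def group_by_base_concept(terms, base_concepts):
    """Attach each 'BC + modifier' term to its BC as a candidate hyponym."""
    hierarchy = defaultdict(list)
    for term in terms:
        head, _, modifier = term.partition(" ")
        # Keep only terms whose first word is a BC and that have a modifier.
        if head in base_concepts and modifier:
            hierarchy[head].append(term)
    return dict(hierarchy)


hierarchy = group_by_base_concept(TERMS, BASE_CONCEPTS)
for bc, hyponyms in hierarchy.items():
    print(bc, "->", ", ".join(hyponyms))
```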

23
Linking Terms to the ILI
  • In maritime transport the English term or
    multiword (or its acronym) is often better known
    and more widely used than the Italian one.
  • Difficulty in finding the synonyms
  • Both the English term (or multiword) and the
    Italian one are included in the synset as
    variants (we thought this could be useful to
    non-professionals as well).

24
EXAMPLES
  • RO-RO (Roll On/Roll Off) usually indicates nave
    traghetto per automezzi (ferry for vehicle
    transport),
  • the abbreviation FOB (Free On Board) is used to
    say con le spese pagate fino a bordo (loading
    costs paid up to the ship's broadside),
  • CIF (Cost, Insurance and Freight) to say costi
    fino a bordo più assicurazione e nolo mare pagati
    (loading costs, insurance and sea-freight
    prepaid).

25
The Link Structure
  • the BCs identified for this terminological
    lexicon constitute the top level and are the root
    nodes for the plug-in operation which allows
    linking between the generic and the specialized
    wordnet.

26
Two types of plug-in relations are codified
  • the eq-plug-in relation, an equivalence
    (synonymy) relation between synsets of the two
    databases
  • the has-hyperonym(hyponym)-plug relation, a
    hyperonymy/hyponymy relation between
    synsets of the two databases.

27
Tool Facilities
  • simultaneous parallel consultation of the two
    databases, to facilitate insertion of the
    relations
  • integrated search across the two databases
  • if the lemma is found in both databases and
    there is an eq-plug-in relation between the
    synsets, the synset belonging to the specific
    domain eclipses the generic one in the
    integrated search.

28
Tool Facilities
  • Downward and horizontal relations (part-of
    relations, role relations, cause relations,
    derivation, etc.) are taken from the
    terminological wordnet.
  • Upward (hyperonymy) relations are taken from the
    generic one.
  • It is possible to access the generic database,
    the terminological database, or both at the
    same time.
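
The behaviour described for the integrated search can be sketched as follows: a terminological synset joined to a generic one by eq-plug-in eclipses it, keeps its own downward and horizontal relations, and borrows the upward (hyperonymy) relations from the generic synset. This is a minimal sketch under stated assumptions, not the authors' tool. The dictionary layout, the function name `integrated_lookup`, all synset IDs and the generic relation targets are hypothetical placeholders, and the sketch eclipses on the eq-plug-in link alone.

```python
# Relation types treated as "upward"; everything else counts as
# downward or horizontal in this sketch.
UPWARD = {"has_hyperonym"}

# Toy databases: all IDs and generic targets are made-up placeholders.
GENERIC = {
    "G-nolo": {
        "variants": ["nolo"],
        "relations": {
            "has_hyperonym": ["G-hyperonym-placeholder"],
            "has_hyponym": ["G-hyponym-placeholder"],
        },
    },
}
TERMINOLOGICAL = {
    "T-nolo": {
        "variants": ["nolo"],
        "relations": {"has_hyponym": ["T-nolo-anticipato", "T-nolo-intero"]},
    },
}
EQ_PLUG_IN = {"T-nolo": "G-nolo"}   # terminological synset -> generic synset


def integrated_lookup(lemma, generic, terminological, eq_plug_in):
    """Search both databases and merge synsets joined by eq-plug-in."""
    results, eclipsed = [], set()
    for term_id, term in terminological.items():
        if lemma not in term["variants"]:
            continue
        # Downward and horizontal relations come from the terminological side.
        relations = {r: list(t) for r, t in term["relations"].items()
                     if r not in UPWARD}
        generic_id = eq_plug_in.get(term_id)
        if generic_id in generic:
            eclipsed.add(generic_id)
            # Upward relations come from the generic side.
            for r, targets in generic[generic_id]["relations"].items():
                if r in UPWARD:
                    relations[r] = list(targets)
        results.append({"id": term_id, "source": "terminological",
                        "relations": relations})
    # Generic synsets are returned only when they have not been eclipsed.
    for gen_id, gen in generic.items():
        if lemma in gen["variants"] and gen_id not in eclipsed:
            results.append({"id": gen_id, "source": "generic",
                            "relations": dict(gen["relations"])})
    return results


results = integrated_lookup("nolo", GENERIC, TERMINOLOGICAL, EQ_PLUG_IN)
```

With the toy data, the search for nolo returns a single merged synset whose hyponyms come from the terminological database and whose hyperonym comes from the generic one.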
29
EXAMPLE Nolo (Freight)
30
Nolo plug-in (with downward relations)
31
Nolo plug-in (with upward relations)
32
EXAMPLE Bussola (Compass)
33
Bussola plug-in (with downward relations)
34
Bussola plug-in (with upward relations)
35
Differences between IWN and Dictionaries/Glossaries
  • The data are not only described (by the
    definition), but also codified (by relations).
  • Data that are structured only alphabetically in
    the dictionary edited by the Harbour Master
    (where, for example, all the information about
    bussola appears together and rather jumbled)
    become, in a relational database, synsets linked
    to each other by many types of semantic relations
    (hyperonymy, hyponymy, holonymy/meronymy, etc.),
    which can also be managed automatically.

36
FINAL REMARKS
  • Maritime terminology is an object of great
    interest in a maritime nation like Italy, which
    has a strong seafaring tradition.
  • The English terms prevail over their Italian
    synonyms.
  • Maritime terminology dictionaries are rare, and
    it is sometimes very difficult to find an English
    translation of these terms.

37
Instrument for work
  • Having definitions and translations of specific
    terms available is a useful instrument at work
    (export-import companies, maritime agencies,
    etc.), at school and in teaching activities of
    various kinds (nautical institutes, professional
    training, etc.) and, in general, whenever a
    reference to terms of this specific domain is
    needed.

38
  • From a commercial point of view, the English
    language prevails over all other languages:
    contracts, negotiations, chartering and operation
    documents of cargo ships (bills of lading,
    etc.) are in English, and so are a great number
    of reference books.
  • From the point of view of usefulness, there are
    circumstances in which it is necessary to refer
    to a translation of technical terms that is
    correct, up to date and absolutely unambiguous.

39
Our aim
  • to build a terminological database showing the
    semantic relations between different concepts,
    with a precise and correct linkage to the English
    terms, and then to make it a point of reference
    in circumstances such as legal actions, for
    instance, when the judge..
  • to carry on this research, increasing the number
    of terms and starting a cooperation with the
    official transport organizations in order to
    enrich and refine this product and to arrive at a
    definitive, recognized and validated version.
  • to start this kind of research for the Italian
    language.

40
Results
  • Specialized lexicon enlarged
  • Italian terms clarified
  • More effective management of Italian terms and
    English terms
  • In spite of globalisation, in a maritime
    country like ours it is absolutely essential not
    to lose our linguistic identity