OLIF V2

1
OLIF V2
  • Gr. Thurmair, April 2000

2
OLIF Overview
  • Rationale
  • Principles
  • Entries
  • Descriptions
  • Header
  • Examples
  • Status

3
The Exchange Problem
  • the same text is processed with different tools
  • how do these tools communicate?

4
The Resource Problem
Translation Project
  • how often will the same term be stored?
  • who pays for redundant maintenance?

5
The Migration Problem
use of translation tool 1
  • do you have to re-build your language resources?
  • who pays for rebuilding lexicons and memories?

6
Problems in Exchange
  • Different purposes of exchange
  • data import / export
  • data validation
  • Different content of exchange
  • terminological data, different for each system
  • lexicographical data, different for each system
  • Different structures of exchange files
  • trend towards markup structures (SGML/XML)

7
OLIF Principles
  • Keep it simple!
  • flat feature value structures
  • standard software environment
  • Keep it pragmatic!
  • worry only about what's there
  • bottom-up: compare what systems have

8
OLIF V2 formalism
  • Define a representation formalism
  • XML
  • general format processing tools available
  • but markup overhead -> not suitable for very large files
  • Coding standard: UTF-8
  • File structure
  • header: globals, defaults, definitions
  • body: sequence of entries
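The file layout described on this slide (an XML header followed by a body of entries, encoded in UTF-8) can be sketched with Python's standard xml.etree.ElementTree module. This is a minimal illustration under assumptions, not the OLIF V2 schema: the element names OLIF, HEADER, DEFAULT and BODY are hypothetical placeholders; only ENTRY, MONO, LG, CAN and CAT are taken from the examples in this presentation.

```python
import xml.etree.ElementTree as ET


def build_olif_document(defaults, entries):
    """Build a minimal OLIF-style tree: a header followed by a body of entries.

    OLIF, HEADER, DEFAULT and BODY are hypothetical element names;
    ENTRY, MONO, LG, CAN and CAT come from the slide examples.
    """
    root = ET.Element("OLIF")
    header = ET.SubElement(root, "HEADER")
    for feature, value in defaults.items():
        # One DEFAULT element per feature that has a file-wide default value.
        ET.SubElement(header, "DEFAULT", name=feature).text = value
    body = ET.SubElement(root, "BODY")
    for entry in entries:
        mono = ET.SubElement(ET.SubElement(body, "ENTRY"), "MONO")
        for tag in ("LG", "CAN", "CAT"):
            ET.SubElement(mono, tag).text = entry[tag]
    return root


def serialize(root):
    """Encode the tree as UTF-8 bytes with an XML declaration (Python 3.8+)."""
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


doc = build_olif_document(
    {"CAT": "noun"},
    [{"LG": "de", "CAN": "Brot", "CAT": "noun"},
     {"LG": "fr", "CAN": "compte en banque à l'étranger", "CAT": "noun"}],
)
print(serialize(doc).decode("utf-8"))
```

Serializing with encoding="utf-8" writes accented forms such as "à l'étranger" as raw UTF-8 bytes rather than character references, in line with the coding standard on the slide.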

9
OLIF principles
  • Entries are concept-based
  • i.e. we describe word senses: different readings -> different entries
  • Entries have (monolingual) descriptions
  • both for MT and terminology
  • Entries have links to other entries
  • intra-language cross-reference links
  • inter-language transfers (multilingual, directed)

10
Entry Structure
  • central information
  • definition features
  • administrative features
  • monolingual / linguistic feature set
  • terminological feature set
  • transfer features
  • cross-references / links
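One way to hold the entry structure above in memory is a record with one flat feature/value dictionary per feature group, in line with the "flat feature value structures" principle. The sketch below is a hypothetical Python model, not part of OLIF: the class names, the feature names (language, canonical_form, pos, reading) and the target-identifier strings are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A directed link from an entry to another entry (hypothetical model)."""
    kind: str     # "transfer" (other language) or "xref" (same language)
    target: str   # identifier of the target entry
    features: dict = field(default_factory=dict)  # e.g. equivalence or link type


@dataclass
class Entry:
    """One concept-based entry; each feature group is a flat feature/value dict."""
    definition: dict                                     # central information
    administrative: dict = field(default_factory=dict)
    monolingual: dict = field(default_factory=dict)      # linguistic feature set
    terminological: dict = field(default_factory=dict)
    links: list = field(default_factory=list)            # transfers and cross-references

    def key(self):
        # Different readings of the same canonical form yield different keys,
        # so each word sense is a separate entry.
        d = self.definition
        return (d["language"], d["canonical_form"], d["pos"], d.get("reading", 1))


bank1 = Entry({"language": "de", "canonical_form": "Bank", "pos": "N", "reading": 1})
bank2 = Entry({"language": "de", "canonical_form": "Bank", "pos": "N", "reading": 2})
bank1.links.append(Link("transfer", "en:bank:N:1", {"equivalence": "full"}))
```

Using default_factory gives every entry its own dictionaries and link list, so adding a link to one reading never affects another.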

11
Definition of the entry
  • Obligatory features
  • language
  • (only language or also locale?)
  • canonical form
  • do we need guidelines (multiwords)?
  • part of speech
  • only open word classes N, V, A, Adv, Prep
  • domain information (semantics)
  • -> a common top-level classification?
  • reading number
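The obligatory features listed above lend themselves to a simple completeness check. A minimal sketch, assuming entries are plain dictionaries with the hypothetical keys language, canonical_form, pos, domain and reading, and taking the word-class list exactly as given on the slide:

```python
# Word classes exactly as listed on the slide.
OPEN_CLASSES = {"N", "V", "A", "Adv", "Prep"}
# Hypothetical keys for the obligatory definition features.
OBLIGATORY = ("language", "canonical_form", "pos", "domain", "reading")


def check_definition(entry):
    """Return a list of problems with an entry's obligatory definition features."""
    problems = [f"missing {name}" for name in OBLIGATORY
                if entry.get(name) in (None, "")]
    pos = entry.get("pos")
    if pos and pos not in OPEN_CLASSES:
        problems.append(f"part of speech {pos!r} not in {sorted(OPEN_CLASSES)}")
    reading = entry.get("reading")
    if reading is not None and not (isinstance(reading, int) and reading >= 1):
        problems.append("reading number must be a positive integer")
    return problems


print(check_definition({"language": "de", "canonical_form": "Brot",
                        "pos": "N", "domain": "gv", "reading": 1}))  # → []
```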

12
Scope of the descriptions
  • minimal linguistic descriptions
  • features which everybody has / needs
  • minimal terminology descriptions
  • e.g. the feature set of Interval
  • minimal transfer descriptions
  • equitype, tests, transfers
  • minimal thesaurus / ontology relations
  • ISO standards
  • additional fields for personal use

13
Linguistic Descriptions
  • Morphological Features
  • entry type
  • (abbreviation, single word, compound, multiword)
  • inflection class
  • enumeration of inflection patterns, per category
  • gender
  • (special) number
  • singulare / plurale tantum
  • degree / comparative

14
Linguistic Descriptions
  • Syntactic Features
  • Syntactic Type
  • (subcategorisation of part-of-speech)
  • Syntactic Frame
  • Argument structures (DObj, PObj-for, ...)
  • (Transitivity)
  • intransitive, transitive
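Since transitivity appears in parentheses next to the syntactic frame, it may be derivable from the frame. The sketch below assumes, as a simplification not stated in the slides, that a verb is transitive exactly when its frame contains a direct object (DObj); the comma-separated frame notation and the function names are also hypothetical.

```python
def parse_frame(text):
    """Split a frame written as comma-separated argument labels, e.g. "DObj, PObj-for".

    The notation is a hypothetical stand-in for however a system stores frames.
    """
    return [arg.strip() for arg in text.split(",") if arg.strip()]


def transitivity(frame):
    """Derive a coarse transitivity value from an argument-structure frame.

    Simplifying assumption: transitive exactly when the frame has a direct object.
    """
    return "transitive" if "DObj" in frame else "intransitive"


print(transitivity(parse_frame("DObj, PObj-for")))  # → transitive
print(transitivity(parse_frame("PObj-for")))        # → intransitive
```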

15
Linguistic Descriptions
  • Semantic Features
  • Semantic type
  • for subclasses only?
  • who has / needs it?

16
Terminological Descriptions
  • Minimum needed to validate an entry (Interval)
  • Definition
  • Context
  • Scope
  • Comment / Note
  • (validation status)
  • (a three-level hierarchy)

17
Transfer Descriptions
  • Equivalence type
  • full - partial (subset / superset) - none
  • for reversible entries
  • Tests and Actions
  • (to be worked out)
  • Comment
  • (Definition of the target link)
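Because transfers are directed, reading a transfer in the opposite direction must also adjust its equivalence type. A minimal sketch, assuming that "subset" means the target's meaning is narrower than the source's: reversing swaps source and target, so subset and superset trade places while full and none stay unchanged. The enum values and the function name are hypothetical.

```python
from enum import Enum


class Equivalence(Enum):
    FULL = "full"
    SUBSET = "partial-subset"      # assumed: target meaning narrower than source
    SUPERSET = "partial-superset"  # assumed: target meaning broader than source
    NONE = "none"


# Swapping source and target turns a subset relation into a superset relation
# and vice versa; full and none equivalence are symmetric.
INVERSE = {
    Equivalence.FULL: Equivalence.FULL,
    Equivalence.SUBSET: Equivalence.SUPERSET,
    Equivalence.SUPERSET: Equivalence.SUBSET,
    Equivalence.NONE: Equivalence.NONE,
}


def reverse_transfer(source, target, equivalence):
    """Return the transfer read in the opposite direction."""
    return target, source, INVERSE[equivalence]


print(reverse_transfer("x", "y", Equivalence.SUBSET))
```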

18
Cross-Reference Descriptions
  • Linktype
  • thesaurus relations
  • broader / narrower / synonym / related
  • additional customisable relations
  • abbreviation_for, forbidden, outdated
  • Definition of the target link
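The thesaurus relations come in converse pairs, so a tool can maintain both directions of a link automatically. In this hypothetical sketch a link (A, relation, B) reads "B is a <relation> term of A"; broader and narrower are each other's converse, synonym and related are symmetric, and customisable relations such as abbreviation_for are stored in one direction only. The data layout (a dict of sets) and the entry identifiers are assumptions.

```python
# Converse of each thesaurus relation; synonym and related are symmetric.
INVERSE_RELATION = {
    "broader": "narrower",
    "narrower": "broader",
    "synonym": "synonym",
    "related": "related",
}


def add_xref(links, source, relation, target):
    """Record the cross-reference (source, relation, target) and, if known, its converse.

    `links` maps an entry id to a set of (relation, target) pairs. Relations
    without a listed converse (e.g. abbreviation_for) are stored one-way only.
    """
    links.setdefault(source, set()).add((relation, target))
    inverse = INVERSE_RELATION.get(relation)
    if inverse is not None:
        links.setdefault(target, set()).add((inverse, source))
    return links


links = {}
add_xref(links, "account", "narrower", "offshore account")
add_xref(links, "OA", "abbreviation_for", "offshore account")
print(links["offshore account"])  # → {('broader', 'account')}
```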

19
Administrative Information
  • Source of Entry
  • string
  • Author
  • creation author
  • last modification
  • Date
  • creation date
  • last modification date
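The administrative fields separate creation data, which should not change once written, from last-modification data, which every edit updates. A minimal Python sketch of that distinction; the class, field and method names and the sample values are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class AdminInfo:
    """Administrative fields of an entry (hypothetical field names)."""
    source: str                   # free-text source of the entry
    creation_author: str
    creation_date: date = field(default_factory=date.today)
    last_modification_author: str = ""
    last_modification_date: Optional[date] = None

    def touch(self, author, when=None):
        """Record an edit; the creation author and date are left untouched."""
        self.last_modification_author = author
        self.last_modification_date = when or date.today()


info = AdminInfo("hypothetical MT lexicon", "author-a", date(2000, 4, 1))
info.touch("author-b", date(2000, 5, 2))
print(info.creation_date, info.last_modification_date)  # → 2000-04-01 2000-05-02
```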

20
Header Information
  • Definition of Encoding
  • given in the XML statement (UTF-8)
  • Definition of features / values used
  • Definition of default values
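Header defaults let an entry omit features whose value is the file-wide default, and the header's feature/value definitions allow values to be checked. The sketch below is one possible treatment, assuming that an explicit entry value overrides the header default and that declared values are given as a set per feature; the function names and the declared value set are hypothetical.

```python
def apply_defaults(header_defaults, entry_features):
    """Fill in unspecified features from the header; explicit entry values win."""
    merged = dict(header_defaults)
    merged.update(entry_features)
    return merged


def undeclared_values(declared, features):
    """List (feature, value) pairs whose value the header does not declare.

    `declared` maps a feature name to its set of permitted values; features
    without a declaration are not checked.
    """
    return [(name, value) for name, value in features.items()
            if name in declared and value not in declared[name]]


defaults = {"LG": "de", "CAT": "noun"}
print(apply_defaults(defaults, {"CAN": "Brot"}))
# → {'LG': 'de', 'CAT': 'noun', 'CAN': 'Brot'}
```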

21
Example (1)
<ENTRY>
  <MONO>
    <LG> de </LG>
    <CAN> Brot </CAN>
    <CAT> noun </CAT>
    <SA> gv </SA>
    ....
  </MONO>
</ENTRY>

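Example (1) can be read with any standard XML parser. A minimal sketch with Python's xml.etree.ElementTree, using only the elements shown on the slide (the elided "...." part is left out), which collects the monolingual features into a flat dictionary; the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

# Example (1) without the elided "...." part.
EXAMPLE_1 = """<ENTRY>
  <MONO>
    <LG> de </LG>
    <CAN> Brot </CAN>
    <CAT> noun </CAT>
    <SA> gv </SA>
  </MONO>
</ENTRY>"""


def read_mono(entry_xml):
    """Return the monolingual features of an entry as a flat feature/value dict."""
    mono = ET.fromstring(entry_xml).find("MONO")
    # The slide pads values with spaces ("<LG> de </LG>"), so strip them.
    return {child.tag: (child.text or "").strip() for child in mono}


print(read_mono(EXAMPLE_1))
# → {'LG': 'de', 'CAN': 'Brot', 'CAT': 'noun', 'SA': 'gv'}
```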
22
Example (2a)
<Entry>
  <MONO>
    ....
    <CAN> offshore account </CAN>
    <CAT> noun </CAT>
    <SA> Money-Laundering </SA>
    ...
  </MONO>
  <XFR>
    <CAN> compte en banque à l'étranger </CAN>
    <LG> fr </LG>
    <CAT> noun </CAT>
    <SA> Money-Laundering </SA>
  </XFR>
  <XFR>
    <CAN> Auslandskonto </CAN>
    <LG> de </LG>
    ...
</Entry>
23
Example (2b)
<Entry>
  <MONO>
    <CAN> compte en banque à l'étranger </CAN>
    <CAT> noun </CAT>
    <SA> Money-Laundering </SA>
    ...
  </MONO>
  <XFR>
    <CAN> offshore account </CAN>
    <LG> en </LG>
    <CAT> noun </CAT>
    <SA> Money-Laundering </SA>
  </XFR>
  <XFR>
    <CAN> Auslandskonto </CAN>
    <LG> de </LG>
    ...
</Entry>

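Examples (2a) and (2b) carry the same information rooted on different languages: the English entry with French and German transfers, and the French entry with English and German transfers. Converting one into the other can be sketched as re-rooting the entry on one of its transfers. This is a hypothetical illustration, not an OLIF tool: the function name is invented, the elided parts are dropped, and the LG value of the source block is assumed to be en because the slide elides it.

```python
import copy
import xml.etree.ElementTree as ET

# Example (2a) with the elided parts dropped; <LG>en</LG> in MONO is an assumption.
EXAMPLE_2A = """<Entry>
  <MONO><LG>en</LG><CAN>offshore account</CAN><CAT>noun</CAT><SA>Money-Laundering</SA></MONO>
  <XFR><CAN>compte en banque à l'étranger</CAN><LG>fr</LG><CAT>noun</CAT><SA>Money-Laundering</SA></XFR>
  <XFR><CAN>Auslandskonto</CAN><LG>de</LG></XFR>
</Entry>"""


def pivot(entry, lang):
    """Re-root a multilingual entry on its transfer into `lang`.

    The chosen transfer becomes the monolingual block, the old monolingual
    block becomes a transfer, and the remaining transfers are copied as-is.
    The input tree is not modified.
    """
    transfers = entry.findall("XFR")
    chosen = next((x for x in transfers if x.findtext("LG", "").strip() == lang), None)
    if chosen is None:
        raise ValueError(f"no transfer into {lang!r}")
    result = ET.Element(entry.tag)
    ET.SubElement(result, "MONO").extend(copy.deepcopy(list(chosen)))
    ET.SubElement(result, "XFR").extend(copy.deepcopy(list(entry.find("MONO"))))
    for x in transfers:
        if x is not chosen:
            result.append(copy.deepcopy(x))
    return result


entry_2b = pivot(ET.fromstring(EXAMPLE_2A), "fr")
print(ET.tostring(entry_2b, encoding="unicode"))
```

A fuller converter would also invert any equivalence type on the reversed transfer; this sketch only moves the elements.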
24
OLIF Status
  • Implementation of a central DB
  • Implementation of OLIF parser generator
  • BUT
  • different flavours of OLIF
  • dependent on projects (OTELO - Aventinus)
  • different formalisms (own, SGML, XML)

25
Status
  • Verification of OLIF
  • MT lexicons (Logos, T1)
  • Term Bases (SAPterm, DanTerm)
  • Converter prototypes
  • T1 <-> OLIF, Logos <-> OLIF
  • Term Lookup systems
  • based on OLIF-type database
  • Comparisons with other formats