Title: OLIF V2
OLIF Overview
- Rationale
- Principles
- Entries
- Descriptions
- Header
- Examples
- Status
The Exchange Problem
- the same text is processed with different tools
- how do these tools communicate?
The Resource Problem
Translation Project
- how often will the same term be stored?
- who pays for redundant maintenance?
The Migration Problem
use of translation tool 1
- do you have to rebuild your language resources?
- who pays for rebuilding lexicons and memories?
Problems in Exchange
- Different purposes of exchange
  - data import / export
  - data validation
- Different content of exchange
  - terminological data, different for each system
  - lexicographical data, different for each system
- Different structures of exchange files
  - trend towards markup structures (SGML/XML)
OLIF Principles
- Keep it simple!
  - flat feature-value structures
  - standard software environment
- Keep it pragmatic!
  - worry only about what's there
  - bottom-up: compare what systems have
OLIF V2 formalism
- Define a representation formalism
  - XML
    - general format, processing tools available
    - but overhead in markup -> not suitable for very large files
  - Coding standard: UTF-8
- File structure
  - header: globals, defaults, definitions
  - body: sequence of entries
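The header/body split above can be sketched in a few lines of Python. This is only an illustration: the element names OLIF, HEADER, DEFAULT and BODY are assumptions made for the sketch (the slides only show ENTRY, MONO and the feature tags), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_olif_file(defaults, entries):
    """Assemble a header (default values) and a body (sequence of entries)."""
    root = ET.Element("OLIF")               # hypothetical root element
    header = ET.SubElement(root, "HEADER")  # hypothetical element name
    for feature, value in defaults.items():
        ET.SubElement(header, "DEFAULT", name=feature).text = value
    body = ET.SubElement(root, "BODY")      # hypothetical element name
    for features in entries:
        mono = ET.SubElement(ET.SubElement(body, "ENTRY"), "MONO")
        for tag, value in features.items():
            # flat feature-value structure: one child element per feature
            ET.SubElement(mono, tag).text = value
    # Serialise as UTF-8 bytes with an explicit XML declaration (Python 3.8+).
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


data = build_olif_file({"LG": "de"}, [{"CAN": "Brot", "CAT": "noun"}])
```

The result is a byte string that any standard XML tool can read back, which is the practical benefit of the "standard software environment" principle.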
OLIF principles
- Entries are concept-based
  - i.e. we describe word senses: different readings -> different entries
- Entries have (monolingual) descriptions
  - both for MT and terminology
- Entries have links to other entries
  - inner-language cross-reference links
  - inter-language transfers (multilingual, directed)
Entry Structure
- central information
- definition features
- administrative features
- monolingual / linguistic feature set
- terminological feature set
- transfer features
- cross-references / links
Definition of the entry
- Obligatory features
  - language
    - (only language or also locale?)
  - canonical form
    - do we need guidelines (multiwords)?
  - part of speech
    - only open word classes: N, V, A, Adv, Prep
  - domain information (semantics)
    - -> a common top-level classification?
  - reading no.
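One way to read the list above is that the five obligatory features together identify an entry. A minimal sketch under that assumption follows; the class name, field names and the sample "bank" readings are illustrative and not part of OLIF.

```python
from dataclasses import dataclass

# Part-of-speech values as listed on the slide.
OPEN_CLASSES = {"N", "V", "A", "Adv", "Prep"}


@dataclass(frozen=True)
class EntryKey:
    """Hypothetical key built from the obligatory features."""
    language: str
    canonical_form: str
    part_of_speech: str
    domain: str
    reading_no: int

    def __post_init__(self):
        if self.part_of_speech not in OPEN_CLASSES:
            raise ValueError(f"unsupported part of speech: {self.part_of_speech}")


# Different readings of the same word become different entries.
lexicon = {
    EntryKey("en", "bank", "N", "finance", 1): "financial institution",
    EntryKey("en", "bank", "N", "geography", 2): "side of a river",
}
```

Because the dataclass is frozen, keys are hashable and two entries collide only when all five features agree.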
Scope of the descriptions
- minimal linguistic descriptions
  - features which everybody has / needs
- minimal terminology descriptions
  - feature set of e.g. Interval
- minimal transfer descriptions
  - equitype, tests, transfers
- minimal thesaurus / ontology relations
  - ISO standards
- additional fields for personal use
Linguistic Descriptions
- Morphological Features
  - entry type
    - (abbreviation, single word, compound, multiword)
  - inflection class
    - enumeration of inflection patterns, per category
  - gender
  - (special) number
    - singulare / plurale tantum
  - degree / comparative
Linguistic Descriptions
- Syntactic Features
  - Syntactic Type
    - (subcategorisation of part-of-speech)
  - Syntactic Frame
    - Argument structures (DObj, PObj-for, ...)
  - (Transitivity)
    - intransitive, transitive
Linguistic Descriptions
- Semantic Features
  - Semantic type
    - for subclasses only?
    - who has / needs it?
Terminological Descriptions
- Minimum needed to validate an entry (Interval)
  - Definition
  - Context
  - Scope
  - Comment / Note
  - (validation status)
    - (a three-level hierarchy)
Transfer Descriptions
- Equivalence type
  - full / partial (subset / superset) / none
  - for reversible entries
- Tests and Actions
  - (to be worked out)
- Comment
- (Definition of the target link)
Cross-Reference Descriptions
- Linktype
  - thesaurus relations
    - broader / narrower / synonym / related
  - additional customisable relations
    - abbreviation_for, forbidden, outdated
- Definition of the target link
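The link types above could be modelled as a fixed thesaurus set plus an extensible set of custom relations. The sketch below assumes a target is identified by language and canonical form, which is a simplification chosen for illustration; `register_relation`, `CrossReference` and the relation name `preferred_over` are hypothetical.

```python
from dataclasses import dataclass

# Fixed thesaurus relations, as listed on the slide.
THESAURUS_RELATIONS = frozenset({"broader", "narrower", "synonym", "related"})
# Customisable relations; the initial values are the slide's examples.
custom_relations = {"abbreviation_for", "forbidden", "outdated"}


def register_relation(name):
    """Add a project-specific link type (hypothetical helper)."""
    custom_relations.add(name)


@dataclass(frozen=True)
class CrossReference:
    link_type: str
    target_language: str        # assumed part of the target definition
    target_canonical_form: str  # assumed part of the target definition

    def __post_init__(self):
        if self.link_type not in THESAURUS_RELATIONS | custom_relations:
            raise ValueError(f"unknown link type: {self.link_type}")


ref = CrossReference("abbreviation_for", "en", "machine translation")
register_relation("preferred_over")
custom_ref = CrossReference("preferred_over", "en", "offshore account")
```

Unknown link types are rejected at construction time, so a typo in a relation name surfaces during import rather than during lookup.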
Administrative Information
- Source of Entry
  - string
- Author
  - creation author
  - last modification author
- Date
  - creation date
  - last modification date
Header Information
- Definition of Encoding
  - given in the XML declaration (UTF-8)
- Definition of features / values used
- Definition of default values
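How header definitions and defaults might interact with entries can be sketched as follows. The dictionary layout, the function name `resolve` and the sample value sets are assumptions for illustration; the slides do not specify how the header encodes them.

```python
# Hypothetical in-memory form of a header: declared feature values and defaults.
HEADER = {
    "features": {
        "LG": {"de", "en", "fr"},
        "CAT": {"noun", "verb"},
    },
    "defaults": {"LG": "de"},
}


def resolve(entry, header):
    """Fill in header defaults, then check values against the declared sets."""
    merged = {**header["defaults"], **entry}  # entry values win over defaults
    for feature, value in merged.items():
        allowed = header["features"].get(feature)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{feature}={value!r} is not declared in the header")
    return merged


brot = resolve({"CAN": "Brot", "CAT": "noun"}, HEADER)
```

Features without a declared value set, such as CAN here, pass through unchecked, which keeps free-text fields usable.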
Example (1)
<ENTRY>
  <MONO>
    <LG> de </LG>
    <CAN> Brot </CAN>
    <CAT> noun </CAT>
    <SA> gv </SA>
    ....
  </MONO>
</ENTRY>
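Because the entry is a flat feature-value structure in XML, it can be read with a general-purpose parser. A minimal sketch using Python's standard library follows; the snippet repeats the entry above without the elided part, and `mono_features` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The entry from the slide, with the elided features left out.
SNIPPET = """
<ENTRY>
  <MONO>
    <LG> de </LG>
    <CAN> Brot </CAN>
    <CAT> noun </CAT>
    <SA> gv </SA>
  </MONO>
</ENTRY>
"""


def mono_features(entry_xml):
    """Return the monolingual features of one entry as a tag -> value dict."""
    entry = ET.fromstring(entry_xml)
    return {child.tag: (child.text or "").strip() for child in entry.find("MONO")}


features = mono_features(SNIPPET)
```

Surrounding whitespace inside the tags is stripped, so `<LG> de </LG>` and `<LG>de</LG>` yield the same value.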
Example (2a)
<Entry>
  <MONO>
    ....
    <CAN> offshore account </CAN>
    <CAT> noun </CAT>
    <SA> Money-Laundering </SA>
    ...
  </MONO>
  <XFR>
    <CAN> compte en banque à l'étranger </CAN>
    <LG> fr </LG>
    <CAT> noun </CAT>
    <SA> Money-Laundering </SA>
  </XFR>
  <XFR>
    <CAN> Auslandskonto </CAN>
    <LG> de </LG>
    ...
</Entry>
Example (2b)
<Entry>
  <MONO>
    <CAN> compte en banque à l'étranger </CAN>
    <CAT> noun </CAT>
    <SA> Money-Laundering </SA>
    ...
  </MONO>
  <XFR>
    <CAN> offshore account </CAN>
    <LG> en </LG>
    <CAT> noun </CAT>
    <SA> Money-Laundering </SA>
  </XFR>
  <XFR>
    <CAN> Auslandskonto </CAN>
    <LG> de </LG>
    ...
</Entry>
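Examples (2a) and (2b) describe the same concept from two source languages. The relation between them can be sketched as a pivot operation on a plain-dictionary form of an entry. This is an illustration only: `pivot_entry` is a hypothetical function, the dictionary layout is an assumption, and the pivot is only meaningful where the transfer is a full, reversible equivalence. Features elided on the slides are left out.

```python
def pivot_entry(entry, language):
    """Return a new entry whose MONO part is the transfer in `language`."""
    for i, xfr in enumerate(entry["XFR"]):
        if xfr["LG"] == language:
            others = entry["XFR"][:i] + entry["XFR"][i + 1:]
            return {
                "MONO": dict(xfr),
                # the old source description becomes the first transfer
                "XFR": [dict(entry["MONO"])] + [dict(o) for o in others],
            }
    raise KeyError(language)


entry_2a = {
    "MONO": {"LG": "en", "CAN": "offshore account",
             "CAT": "noun", "SA": "Money-Laundering"},
    "XFR": [
        {"LG": "fr", "CAN": "compte en banque à l'étranger",
         "CAT": "noun", "SA": "Money-Laundering"},
        {"LG": "de", "CAN": "Auslandskonto"},
    ],
}

entry_2b = pivot_entry(entry_2a, "fr")
```

The function copies every dictionary it touches, so the original entry stays unchanged and both directions can be kept side by side.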
OLIF Status
- Implementation of a central DB
- Implementation of OLIF parser generator
BUT
- different flavours of OLIF
  - dependent on projects (OTELO - Aventinus)
  - different formalisms (own, SGML, XML)
Status
- Verification of OLIF
  - MT lexicons (Logos, T1)
  - Term Bases (SAPterm, DanTerm)
- Converter prototypes
  - T1 <-> OLIF, Logos <-> OLIF
- Term Lookup systems
  - based on OLIF-type database
- Comparisons with other formats