Using OLIF, The Open Lexicon Interchange Format

1
Using OLIF, The Open Lexicon Interchange Format
  • Susan McCormick
  • OLIF2 Consortium
  • October 1, 2004

2
The OLIF Format
  • The Open Lexicon Interchange Format
  • XML-compliant standard
  • Supports exchange of lexical and terminological
    data for language technology applications
  • Handles basic exchange as well as more complex
    applications such as MT lexicons

3
The OLIF2 Consortium
  • OLIF v.2 was developed by the OLIF2 Consortium, a
    group of language technology companies and
    organizations interested in issues of MT
    data/term data exchange
  • Led by SAP
  • Members include Xerox, Microsoft, Trados, IBM,
    Systran, IAI, DFKI and Comprendium

4
Developing OLIF v.2
  • Based on OLIF prototype
  • Developed in EC-funded OTELO project proposing
    standards for users of disparate language tools
  • Original purpose of OLIF was to facilitate
    terminology exchange for industrial users of MT

5
Developing OLIF v.2
  • Version 2 adapted from OLIF prototype using input
    from:
  • Developers/users of 3 MT systems
  • Developers/users of terminology management
    systems
  • Other language standards projects: EAGLES, SALT,
    ISLE, MARTIF, TBX

6
OLIF Version 2
  • Released as open standard in 2002
  • XML-compliant
  • Covers 6 European languages
  • English, German, French, Spanish, Danish,
    Portuguese
  • Includes options for modeling administrative,
    morphological, syntactic and semantic data

7
Available to Users
  • XML implementation of OLIF specification in a DTD
  • Available from OLIF2 Consortium web site
  • www.olif.net

8
The OLIF File
  • Follows Terminology Markup Framework (TMF)
    structure
  • Header
  • Body
  • Shared resources

9
The OLIF Entry
  • Collection of monolingual data on a specified
    sense of a word or phrase
  • Optional links for cross-reference and transfer
  • Transfer is bilingual and unidirectional
  • Multiple transfers in multiple languages possible
    for single word sense

10
Key Data Categories
  • The OLIF entry is uniquely identified by 5 key
    data categories
  • Canonical form
  • Language
  • Part of speech
  • Subject field
  • Semantic reading
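
The five key data categories above can be read as a composite key. The following is a minimal Python sketch of that idea, not part of OLIF itself: the class name `OlifKey`, its field names, and the `lexicon` dictionary are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OlifKey:
    """Hypothetical container for the five OLIF key data categories."""
    can_form: str      # canonical form
    language: str
    pt_of_speech: str  # part of speech
    subj_field: str    # subject field
    sem_reading: str   # semantic reading


# A lexicon indexed by the five-part key: two entries count as the same
# entry only if all five key data categories match.
lexicon = {}

table_noun = OlifKey("table", "en", "noun", "general", "86")
lexicon[table_noun] = {"note": "hypothetical payload"}

# Same canonical form, different part of speech: a separate entry.
table_verb = OlifKey("table", "en", "verb", "general", "86")
lexicon[table_verb] = {"note": "hypothetical payload"}
```

Because the dataclass is frozen, instances are hashable and compare by value, so a key rebuilt from the same five values finds the same entry.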

11
Basic Well-Formed OLIF Entry
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>

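
As a rough illustration of how such a basic entry could be produced programmatically, here is a Python sketch using the standard library's `xml.etree.ElementTree`. The element names are taken from the slide; the helper `make_entry` is hypothetical, and the sketch does not validate against the OLIF DTD.

```python
import xml.etree.ElementTree as ET

# Element names as shown on the slide, in slide order.
KEY_TAGS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")


def make_entry(can_form, language, pt_of_speech, subj_field, sem_reading):
    """Build <entry><mono><keyDC>...</keyDC></mono></entry> (hypothetical helper)."""
    entry = ET.Element("entry")
    mono = ET.SubElement(entry, "mono")
    key_dc = ET.SubElement(mono, "keyDC")
    values = (can_form, language, pt_of_speech, subj_field, sem_reading)
    for tag, value in zip(KEY_TAGS, values):
        ET.SubElement(key_dc, tag).text = value
    return entry


entry = make_entry("table", "en", "noun", "general", "86")
xml_text = ET.tostring(entry, encoding="unicode")
```

Here `xml_text` holds the serialized entry on a single line; pretty-printing is left out to keep the sketch short.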
12
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
    <monoDC>
      <monoAdmin>
        <originator>Weber</originator>
        <adminStatus>ver</adminStatus>
      </monoAdmin>
      <monoMorph>
        <inflection>like book,books</inflection>
      </monoMorph>
      <monoSyn>
        <synType>cnt</synType>
        <synFrame>gencomp-opt</synFrame>
      </monoSyn>
      <monoSem>
        <semType>inform</semType>
      </monoSem>
    </monoDC>
  </mono>
</entry>

13
OLIF Entry with Cross-Reference
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
  <crossRefer>
    <keyDC>
      <canForm>row</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>69</semReading>
    </keyDC>
    <crLinkType>has-meronym</crLinkType>
  </crossRefer>
</entry>

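
To show how a consumer might read the cross-reference out of this entry, here is a hedged Python sketch using the standard `xml.etree.ElementTree` parser. The XML is the slide's example with a closing `</entry>` supplied as an assumption; the helper `key_of` is hypothetical.

```python
import xml.etree.ElementTree as ET

KEY_TAGS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")

# The slide's example; the closing </entry> tag is an assumption.
ENTRY_XML = """
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
  <crossRefer>
    <keyDC>
      <canForm>row</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>69</semReading>
    </keyDC>
    <crLinkType>has-meronym</crLinkType>
  </crossRefer>
</entry>
"""


def key_of(key_dc):
    """Return the five key data categories of a <keyDC> element as a tuple."""
    return tuple(key_dc.findtext(tag) for tag in KEY_TAGS)


entry = ET.fromstring(ENTRY_XML.strip())
source = key_of(entry.find("mono/keyDC"))

# Each cross-reference pairs a link type with the key of the target entry.
links = [
    (ref.findtext("crLinkType"), key_of(ref.find("keyDC")))
    for ref in entry.findall("crossRefer")
]
```

The target of the link is identified by its own five key data categories, so the cross-reference can be resolved against any lexicon indexed by that key.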
14
OLIF Entry with Transfer
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
  <transfer>
    <keyDC>
      <canForm>Tabelle</canForm>
      <language>de</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </transfer>
</entry>

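
Because a transfer is bilingual and unidirectional, a lookup table built from it maps source keys to target keys only. The Python sketch below illustrates this under stated assumptions: the closing `</entry>` tag is supplied, and `key_of` and `transfer_map` are hypothetical helpers, not OLIF tooling.

```python
import xml.etree.ElementTree as ET

KEY_TAGS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")

# The slide's example; the closing </entry> tag is an assumption.
ENTRY_XML = """
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
  <transfer>
    <keyDC>
      <canForm>Tabelle</canForm>
      <language>de</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </transfer>
</entry>
"""


def key_of(key_dc):
    """Return the five key data categories of a <keyDC> element as a tuple."""
    return tuple(key_dc.findtext(tag) for tag in KEY_TAGS)


def transfer_map(entries):
    """Map each source key to the list of its transfer target keys."""
    mapping = {}
    for entry in entries:
        source = key_of(entry.find("mono/keyDC"))
        targets = [key_of(t.find("keyDC")) for t in entry.findall("transfer")]
        if targets:
            mapping.setdefault(source, []).extend(targets)
    return mapping


transfers = transfer_map([ET.fromstring(ENTRY_XML.strip())])
source_key = ("table", "en", "noun", "general", "86")
target_key = ("Tabelle", "de", "noun", "general", "86")
```

Looking up `source_key` yields the German target, while `target_key` is not itself a key in the map: a German-to-English transfer would have to be stated on the German entry.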
15
Data Category Values
  • Allowed values specified by OLIF
  • Administrative, terminological, linguistic values
    based on:
  • General industry standards, e.g., allowed values
    for date derived from recommendations from ISO
    8601:1988
  • MT/Terminology standards, e.g., suggested values
    for subject field adapted from EC
  • Widely-recognized linguistic standards, e.g.,
    allowed values for gender based on longstanding
    gender description for European languages

16
User Extensions: The OLIF Data Category Registry
  • Users may declare and use their own values for
    certain data categories
  • Subject field
  • Semantic reading
  • Morphological structure
  • Part of speech
  • Inflection
  • Aspect
  • Syntactic type
  • Syntactic frame
  • Semantic type
  • Concept hierarchy

17
Organizing Based on Concept
  • Users may link monolingual entries via a concept
    identifier
  • These IDs can be used to organize entries as
    equivalent word senses associated with the same
    concepts rather than source word senses
    associated with transfers.

18
Entries Linked by Concept
<entry ConceptUserId="0731F16CCCD2D3119B4D">
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>

<entry ConceptUserId="0731F16CCCD2D3119B4D">
  <mono>
    <keyDC>
      <canForm>Tabelle</canForm>
      <language>de</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>

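
The concept-based organization can be sketched in Python by grouping entries on their shared `ConceptUserId` attribute. This is an illustration only: the wrapping `<entries>` root element is a hypothetical container added so that the two entries parse as one document, and it is not claimed to be the real OLIF file structure.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# <entries> is a hypothetical wrapper, not an OLIF element.
DOC_XML = """
<entries>
  <entry ConceptUserId="0731F16CCCD2D3119B4D">
    <mono>
      <keyDC>
        <canForm>table</canForm>
        <language>en</language>
        <ptOfSpeech>noun</ptOfSpeech>
        <subjField>general</subjField>
        <semReading>86</semReading>
      </keyDC>
    </mono>
  </entry>
  <entry ConceptUserId="0731F16CCCD2D3119B4D">
    <mono>
      <keyDC>
        <canForm>Tabelle</canForm>
        <language>de</language>
        <ptOfSpeech>noun</ptOfSpeech>
        <subjField>general</subjField>
        <semReading>86</semReading>
      </keyDC>
    </mono>
  </entry>
</entries>
"""

doc = ET.fromstring(DOC_XML.strip())

# concept ID -> {language code: canonical form}
by_concept = defaultdict(dict)
for entry in doc.findall("entry"):
    concept_id = entry.get("ConceptUserId")
    key_dc = entry.find("mono/keyDC")
    by_concept[concept_id][key_dc.findtext("language")] = key_dc.findtext("canForm")
```

Both monolingual entries end up under the same concept ID, so neither language is treated as the source of the other.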
19
What's Available to the OLIF User?
  • On www.olif.net
  • Complete XML DTD for download
  • Hyperlinked DTD for viewing
  • Graphical view of structure of DTD
  • Current specification for OLIF v.2
  • Formalization of OLIF data categories
  • Alphabetic list of XML elements and attributes
  • Fixed and recommended values for elements and
    attributes
  • Guidelines for formulating canonical forms
  • Sample OLIF entries

20
(No Transcript)
21
Using OLIF
  • Some applications
  • SAP has implemented an OLIF converter to exchange
    terminological data from its central termbase
    SAPterm
  • MT developers in OLIF2 Consortium currently
    developing OLIF converters (Comprendium, Systran)
  • OLIF User Forum: 60 members

22
What's New: XML Schema
  • OLIF XSD offers
  • 40 built-in data types
  • Allows creation of user-defined data types
  • Supports inheritance

23
What's New: The OLIF API
  • Based on OLIF XSD, Java classes created
  • Supports
  • Converting .csv files to OLIF
  • Converting from XML format to OLIF
  • Creating OLIF documents from scratch
  • Modifying OLIF documents

24
What to Expect this Year from OLIF
  • OLIF XSD and API are available to the user from
    www.olif.net
  • OLIF web site upgraded, updated
  • Requirements for modeling Japanese entries
    integrated

25
OLIF User Forum
  • Users of OLIF can access and post questions,
    messages and sample data from the OLIF group
    site
  • http://groups.yahoo.com/group/olifConsortium/