Symposium on Best Practice - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Symposium on Best Practice


1
Resource Conversion
  • William Lewis
  • CSU Fresno

2
Preliminaries
  • Eventually any resource will become obsolete
  • Hence resource conversion is inevitable
  • One should plan from the start for eventual
    conversion
  • Encode your resource such that it
  • is Migratable
  • is Reusable
  • will Endure

3
Best Practice?
  • Simons (this symposium) argues there are 3
    relevant formats for encoding data
  • (1) Working form
  • (2) Presentation form
  • (3) Archival form
  • (1) is tied to particular software.
  • (2) is generally generated from (1), but itself
    is often semantically sparse.

4
Best Practice!
  • Encoding a resource in archival form (3) ensures
    that
  • the resource is reusable, facilitating
    interoperability
  • the resource can be migrated to other formats
    (including presentation formats)
  • the resource endures

5
Data Survivability
  • Converting to archival XML form provides for data
    reuse and ensures survivability

[Diagram: Working Form → Conversion Process → Archival Form → HTML, PDF, other XML form]
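
The step from archival form to a presentation form can be sketched in a few lines. This is only an illustration: the element names (`entry`, `headword`, `pos`, `gloss`) and the sample record are assumptions made for this example, not the format used in the presentation.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical archival record; the element names are illustrative only.
ARCHIVAL_XML = """<entry>
  <headword>example</headword>
  <pos>n.</pos>
  <gloss>a &amp; b</gloss>
</entry>"""


def entry_to_html(xml_text: str) -> str:
    """Render one archival entry as a simple HTML paragraph."""
    entry = ET.fromstring(xml_text)
    # findtext returns the element text, or the default if it is missing.
    headword = escape(entry.findtext("headword", default=""))
    pos = escape(entry.findtext("pos", default=""))
    gloss = escape(entry.findtext("gloss", default=""))
    return f"<p><b>{headword}</b> <i>{pos}</i> {gloss}</p>"


html_out = entry_to_html(ARCHIVAL_XML)
print(html_out)
```

Because the presentation form is generated from the archival form, the HTML can be thrown away and regenerated whenever the display requirements change.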
6
Data Conversion (text)
  • Highly dependent on flexibility of working form
    and related software
  • Converting from a proprietary, binary format is
    most difficult, and is to be avoided
  • Converting from plain text output is easiest, but
    might be avoided due to potential data loss
  • Converting from enriched text form (Unicode
    compliant) or XML coded data is best, but may not
    always be possible

7
Data Conversion
[Diagram: Working Form → Conversion Process → Archival Form]
8
Data Conversion
[Diagram: Working Form, Intermediary Form (enriched text), Archival Form, linked by conversion processes (CP)]
9
Intermediary Conversion
  • Use
  • Print Function or Save as
  • Print Function or other file convert
  • Data Query (direct to XML?)
  • As is
  • Resources in
  • Word Processor
  • Spreadsheet
  • Proprietary Flat File DB
  • Relational DB
  • XML or enriched text (inc. Shoebox)

10
Intermediary Conversion
  • Important
  • Ensure that conversion to Intermediary Form
    suffers no data loss, or that the data loss
    suffered is minimal
  • Save As (and Print to File) are risky, in that
    data loss is possible
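
One rough way to check for data loss is to count the fields before and after the conversion. The sketch below assumes the data is in a Shoebox-style text form in which each field starts with a backslash marker. The markers and sample records are made up for illustration, and the check only detects missing fields, not altered content.

```python
from collections import Counter


def marker_counts(text: str) -> Counter:
    """Count how often each backslash field marker occurs."""
    counts = Counter()
    for line in text.splitlines():
        if line.startswith("\\"):
            marker = line.split(maxsplit=1)[0]
            counts[marker] += 1
    return counts


def lost_fields(before: str, after: str) -> dict:
    """Return markers that occur less often after conversion, with the shortfall."""
    b, a = marker_counts(before), marker_counts(after)
    return {m: b[m] - a[m] for m in b if a[m] < b[m]}


# Illustrative data: the second record loses its \ps field in conversion.
original = "\\lx alpha\n\\ps n\n\\ge first\n\n\\lx beta\n\\ps v\n\\ge second\n"
converted = "\\lx alpha\n\\ps n\n\\ge first\n\n\\lx beta\n\\ge second\n"

report = lost_fields(original, converted)
print(report)
```

An empty report does not prove the conversion is lossless, but a non-empty one is a cheap early warning.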

11
Final Conversion
  • Intermediary to Archival Form (Best Practice
    XML)
  • Font/Character transforms
  • Macros or methods for enriching and aligning data
    elements
  • Tables or glossaries defining how content and
    form should be interpreted
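
A font/character transform of this kind can be driven by a lookup table. The sketch below is a minimal example, assuming a legacy font that displays certain ASCII characters as IPA symbols. The three mappings and the sample string are hypothetical, and a real table would have to be built from the actual font.

```python
# Hypothetical legacy-font table: ASCII character -> Unicode IPA character.
LEGACY_TO_UNICODE = str.maketrans({
    "N": "\u014b",  # LATIN SMALL LETTER ENG
    "?": "\u0294",  # LATIN LETTER GLOTTAL STOP
    "S": "\u0283",  # LATIN SMALL LETTER ESH
})


def to_unicode(legacy_text: str) -> str:
    """Replace legacy font characters with their Unicode equivalents."""
    return legacy_text.translate(LEGACY_TO_UNICODE)


converted = to_unicode("paN?a")  # made-up string, not real dictionary data
print(converted)
```

Keeping the table as data, separate from the code, also documents how the legacy font should be interpreted.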

12
Data Conversion Case Study
  • Converting the Hopi Dictionary (Hill et al. 1998)
    from working form (legacy format)
  • Purpose
  • Build software to extract relevant data from
    working form
  • Generate reusable archival format
  • For dissemination on the Web
  • For use by others
  • To preserve data should DB software become
    unusable

13
Hopi Dictionary
  • Example entry from Hopi Dictionary

14
Hopi Dictionary Conversion
  • Until now
  • Generated text file from DB
  • Manually converted IPA fonts in MSWord
  • Generated PDFs for dissemination

15
Hopi Dictionary Conversion
  • New Process
  • Convert DB format to enriched text
  • Software transforms convert fonts in the text
    format to Unicode-compliant IPA
  • Identify the grammatical concepts used in
    entries, linked to GOLD (Farrar & Langendoen,
    this symposium)
  • Generate XML structured using modified EMELD
    IGT format (Bow, Hughes & Bird 2003)
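
The final step, from enriched text to XML, might look roughly like the following. This is a sketch under stated assumptions: the input is a Shoebox-style record with backslash markers, and the marker-to-element mapping and output element names are invented for this example. It does not reproduce the modified EMELD IGT format or the GOLD links used in the actual project.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from field markers to XML element names.
MARKER_TO_ELEMENT = {"lx": "headword", "ps": "pos", "ge": "gloss"}


def record_to_xml(record: str) -> ET.Element:
    """Convert one backslash-marked record into an <entry> element."""
    entry = ET.Element("entry")
    for line in record.splitlines():
        if not line.startswith("\\"):
            continue
        marker, _, value = line[1:].partition(" ")
        name = MARKER_TO_ELEMENT.get(marker)
        if name is None:
            # Keep unmapped fields instead of silently dropping them.
            child = ET.SubElement(entry, "field", {"marker": marker})
        else:
            child = ET.SubElement(entry, name)
        child.text = value.strip()
    return entry


record = "\\lx alpha\n\\ps n\n\\ge first\n\\xx note"  # illustrative data
xml_text = ET.tostring(record_to_xml(record), encoding="unicode")
print(xml_text)
```

Unknown markers are carried over as generic `field` elements so that the conversion itself does not become a source of data loss.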

16
Archival Hopi Dictionary Record
17
Recipe for Resource Conversion
  • Choose data format that is easily archived
  • Where the software provides for data migration,
    or,
  • The data format itself is easily converted
  • Use existing software to bring you as close to
    Archival Form as possible (Intermediary Form)
  • Clearly identify
  • Content and structural semantics (terms)
  • Fonts used (and transforms)
  • Data alignment
  • Construct transforms/macros/software to convert
    to Archival form