TEI: Text Encoding Initiative

1
TEI: Text Encoding Initiative
  • Computing in the Humanities

July 11, 2003, John A. Mess
2
Overview
  • Origins of Text Encoding Initiative
  • Features of Encoded Text
  • Tagging of Text
  • Examples

Adapted in part from Lou Burnard's Workshop
(http://www.tei-c.org/Talks/ESS2001/elsnet-4.htm) and
Project Gutenberg's TEI Guide for HTML Writers Guild
(http://gutenberg.hwg.org/teidtds.html)
3
Text Encoding Initiative
  • 1987, the date of the Poughkeepsie Conference
  • 1999, when the process of setting up the TEI
    Consortium began
  • Originally, a research project within the
    humanities
  • Sponsored by leading professional associations
  • Major influences:
    • digital libraries and text collections
    • language corpora
    • scholarly datasets
  • International consortium established June 1999
    (see http://www.tei-c.org/)

4
Goals of the TEI
  • better interchange and integration of scholarly
    data
  • support for all texts, in all languages, from all
    periods
  • guidance for the perplexed: what to encode
    • user-driven codification of existing best
      practice
  • assistance for the specialist: how to encode
    • loose framework into which unpredictable
      extensions can be fitted
  • These apparently incompatible goals result in a
    highly flexible, modular environment for DTD
    customization.

5
TEI Deliverables
  • A set of recommendations for text encoding,
    covering both generic text structures and some
    highly specific areas based on (but not limited
    by) existing practice
  • A very large collection of element definitions
    combined into a very loose document type
    declaration
  • A mechanism for creating multiple views (DTDs) of
    the foregoing

One such view and associated tutorial: TEI Lite
(http://www.tei-c.org/TEI/Lite/). For the full
picture, see http://www.tei-c.org/TEI/Guidelines/
6
Legacy of the TEI
  • A way of looking at what text really is: a
    codification of current scholarly practice
  • A set of shared assumptions and priorities about
    the digital agenda:
    • focus on content and function (rather than
      presentation)
    • identify generic solutions (rather than
      application-specific ones)

7
Designing a DTD for the TEI
  • How can a single mark-up scheme handle a large
    variety of requirements?
    • all texts are alike
    • every text is different
  • Learn from the database designers:
    • one construct, many views
    • each view a selection from the whole

8
The Chicago Pizza Model
  • A useful metaphor for expressing modularity, now
    implemented at http://www.hcu.ox.ac.uk/TEI/pizza.html

    <!ENTITY % base "deepDish | thinCrust | stuffed">
    <!ENTITY % topping "pepperoni | mushrooms | sausage | anchovies ...">
    <!ELEMENT pizza ((%base;), tomatoSauce, cheese, (%topping;)*)>
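The modularity behind the pizza metaphor comes from parameter entities: the element declaration stays fixed while the entities it references are redefined. The following is a minimal Python sketch of that substitution step, not a DTD processor. The `expand` helper and the entity table are hypothetical, and the elided toppings ("...") are simply left out.

```python
import re

# Hypothetical entity table modelled on the pizza declarations on this slide.
# The slide elides further toppings with "..."; they are omitted here.
ENTITIES = {
    "base": "deepDish | thinCrust | stuffed",
    "topping": "pepperoni | mushrooms | sausage | anchovies",
}

# Content model with parameter entity references written as %name;
PIZZA_MODEL = "((%base;), tomatoSauce, cheese, (%topping;)*)"


def expand(content_model: str, entities: dict) -> str:
    """Replace every %name; reference with that entity's replacement text."""
    return re.sub(
        r"%([A-Za-z][\w.]*);",
        lambda match: entities[match.group(1)],
        content_model,
    )


# The default view: every base and every topping is allowed.
expanded = expand(PIZZA_MODEL, ENTITIES)

# A customized view: redefine one entity, leave the element declaration alone.
thin_only = expand(PIZZA_MODEL, {**ENTITIES, "base": "thinCrust"})

print(expanded)
print(thin_only)
```

Redefining `base` changes the resulting content model without touching `PIZZA_MODEL`, which is the same move a TEI customization makes when it selects tagsets.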

9
To build a TEI pizza, take...
  • The core tagsets, the base of your choice, the
    toppings of your choice, (optionally) a reference
    to your extensions

    <?xml version="1.0" ?>
    <!DOCTYPE TEI.2 SYSTEM "tei2.dtd">
    <TEI.2>
      <teiHeader> .... </teiHeader>
      <text> .... </text>
    </TEI.2>

10
The core tagsets
  • Detailed metadata provision: the TEI Header
  • Tags for a large set of common textual
    requirements:
    • paragraphs
    • highlighted phrases
    • names, dates, numbers, abbreviations...
    • editorial tags
    • notes, cross-references, bibliography
    • verse and drama

11
The base tagsets define
  • The basic high-level structure of the document;
    one must be chosen from:
    • prose, verse, or drama
    • transcribed speech
    • dictionaries and terminology
  • or combine two or more using either:
    • the general base (anything anywhere)
    • the mixed base (homogenous divisions)

12
TEI additional tagsets
  • Sets of elements for specialised application
    areas; can be mixed and matched freely
  • Currently provided:
    • linking and alignment
    • analysis, feature structures, certainty
    • physical transcription
    • textual criticism, names and dates
    • graphs and trees, figures and tables
    • language corpora....
  • In preparation...
    • manuscript description

13
The Lampeter corpus in the Oxford Text Archive
  • Fairly typical requirements for language corpora:
    • light presentational tagging
    • structural markup for access
    • demographic information about text production
    • small number of tags to ease data capture and
      validation
  • Implementation tagsets:
    • prose base, and
    • tags from four additional sets; some extensions,
      many exclusions

14
Issues with the TEI
  • Unmodified TEI offers authors too many choices:
    • four different types of bibliographic citation
    • three (or four) different tags for proper names
    • an indigestibly rich choice of text editing tags
  • At the same time, unmodified TEI lacks:
    • detailed table model
    • detailed tags for mathematical and other formulae
    • front matter for modern publications
    • tags for multimedia objects
  • All this can be addressed by TEI customization

15
Why bother?
  • The TEI is a well-known reference point
  • Using the TEI enables:
    • sharing of data and resources
    • shared modular software development
    • lower learning curve and reduced training costs
  • The TEI is stable, rigorous, and well-documented
  • The TEI is also flexible, customizable, and
    extensible in documented ways
  • The architectural approach offers the best
    compromise for practical work.

16
Using the TEI for authoring
  • A DTD for authoring should be:
    • prescriptive rather than descriptive
    • closely tied to current authoring practice
    • very easy to use
  • This suggests that we need:
    • content-full tagging
    • only the tags we need
    • and all the tags we need
  • For details of version 4, see:
    • http://www.tei-c.org/P4X/index.html
    • http://etext.lib.virginia.edu/tei/uvatei.html

17
Graphical Layout
18
The overall structure of a unitary text
    <TEI.2>
      <teiHeader> <!-- ... --> </teiHeader>
      <text>
        <front>
          <!-- front matter of copy text goes here. -->
        </front>
        <body>
          <!-- body of text goes here. -->
        </body>
        <back>
          <!-- back matter of text, if any, here. -->
        </back>
      </text>
    </TEI.2>
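As a rough illustration of the skeleton above, this Python sketch parses a minimal TEI.2 document with the standard library and lists its top-level parts. It only checks well-formedness and element order; it does not validate against the TEI DTD, and the `outline` helper is a hypothetical name introduced here.

```python
import xml.etree.ElementTree as ET

# Minimal skeleton following the unitary text structure on this slide.
SKELETON = """<TEI.2>
  <teiHeader><!-- ... --></teiHeader>
  <text>
    <front><!-- front matter of copy text goes here. --></front>
    <body><!-- body of text goes here. --></body>
    <back><!-- back matter of text, if any, here. --></back>
  </text>
</TEI.2>"""


def outline(xml_source: str) -> dict:
    """Return the root tag, its children, and the children of <text>."""
    root = ET.fromstring(xml_source)  # comments are dropped by the default parser
    return {
        "root": root.tag,
        "top_level": [child.tag for child in root],
        "text_parts": [child.tag for child in root.find("text")],
    }


result = outline(SKELETON)
print(result)
```

The header always comes first and the text second; within the text, front and back matter are optional wrappers around the body.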

19
Core Structural Elements
    <!ELEMENT TEI.2 (teiHeader, text)>
    <!ELEMENT teiHeader
        (fileDesc, encodingDesc*, profileDesc*, revisionDesc?)>
    <!ELEMENT text
        ((index | interp | interpGrp | lb | milestone | pb | gap |
          anchor)*,
         (front, (index | interp | interpGrp | lb | milestone | pb |
          gap | anchor)*)?,
         (body | group),
         (index | interp | interpGrp | lb | milestone | pb | gap |
          anchor)*,
         (back, (index | interp | interpGrp | lb | milestone | pb |
          gap | anchor)*)?)>
    <!ELEMENT group
        ((argument | byline | docAuthor | docDate | epigraph | head |
          opener | salute | signed | index | interp | interpGrp | lb |
          milestone | pb | gap | anchor)*,
         (text | group),
20
TEI Header structure
    <teiHeader>
      <fileDesc> <encodingDesc> <profileDesc> <revisionDesc>
    </teiHeader>

21
The File Description <fileDesc>
  • Mandatory
  • Supplies full description of the electronic file
    itself, and its source/s
  • Must specify at least a title, a publication
    statement, and a source
  • Use of authority control is advisable but not
    required
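The minimum requirement above (a title, a publication statement, and a source) can be checked mechanically. This is a small Python sketch under the assumption that these correspond to the `titleStmt`, `publicationStmt`, and `sourceDesc` children of `fileDesc`. The helper name and the sample header text are hypothetical, and the check is no substitute for DTD validation.

```python
import xml.etree.ElementTree as ET

# Children of <fileDesc> assumed to carry the three mandatory statements.
REQUIRED = ("titleStmt", "publicationStmt", "sourceDesc")


def missing_from_file_desc(header_xml: str) -> list:
    """List the required <fileDesc> children that a header lacks."""
    header = ET.fromstring(header_xml)
    file_desc = header.find("fileDesc")
    if file_desc is None:
        return ["fileDesc"]
    present = {child.tag for child in file_desc}
    return [name for name in REQUIRED if name not in present]


# Hypothetical headers: the wording inside them is illustrative only.
complete = """<teiHeader><fileDesc>
  <titleStmt><title>Riders of the Purple Sage</title></titleStmt>
  <publicationStmt><p>Hypothetical publication statement.</p></publicationStmt>
  <sourceDesc><p>Hypothetical source description.</p></sourceDesc>
</fileDesc></teiHeader>"""

incomplete = """<teiHeader><fileDesc>
  <titleStmt><title>Untitled</title></titleStmt>
</fileDesc></teiHeader>"""

print(missing_from_file_desc(complete))
print(missing_from_file_desc(incomplete))
```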

22
The File Description
    <fileDesc>
      <titleStmt>
      <editionStmt>       (MARC 250)
      <publicationStmt>
      <extent>            (MARC 300)
      <sourceDesc>
      <notesStmt>         (MARC 786)
    </fileDesc>

23
The source description
  • May contain common TEI bibliographic elements:
    <bibl>, <biblStruct>,
  • or a nested file description <biblFull>
  • or a list <listBibl>
  • or a prose description
  • or specialised elements for transcribed speech
  • Or (for the born-digital document) simply the
    text "Original"

24
Crosswalks
TEI element            Dublin Core          MARC
<title type="main">    DC.title.main        246
<author>               DC.creator.name      100
<publicationStmt>      DC.publisher.name    260
<sourceDesc>           DC.source            500, 534
<classDecl>            DC.subject.schema    6xx
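A crosswalk like this one is essentially a lookup table. The Python sketch below encodes the rows from this slide and relabels a few header values with their Dublin Core names. The `to_dublin_core` helper and the sample record are hypothetical, and reading the numeric column as MARC fields is an assumption drawn from the numbers themselves.

```python
# Rows from the crosswalk slide: TEI element -> (Dublin Core label, MARC fields).
CROSSWALK = {
    "title": ("DC.title.main", ("246",)),  # <title type="main">
    "author": ("DC.creator.name", ("100",)),
    "publicationStmt": ("DC.publisher.name", ("260",)),
    "sourceDesc": ("DC.source", ("500", "534")),
    "classDecl": ("DC.subject.schema", ("6xx",)),
}


def to_dublin_core(tei_fields: dict) -> dict:
    """Relabel TEI header values with Dublin Core names; skip unmapped ones."""
    return {
        CROSSWALK[name][0]: value
        for name, value in tei_fields.items()
        if name in CROSSWALK
    }


# Hypothetical record; "extent" has no row in the crosswalk and is dropped.
record = {
    "title": "Riders of the Purple Sage",
    "author": "Zane Grey",
    "extent": "not mapped on this slide",
}
dc_record = to_dublin_core(record)
print(dc_record)
```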
25
The Body of a Text
26
Front Matter
    <text>
      <front>
        <titlePage>
          <docTitle>
            <titlePart>RIDERS OF THE PURPLE SAGE</titlePart>
          </docTitle>
          <docAuthor>
            ZANE GREY
          </docAuthor>
        </titlePage>
      </front>
      ...rest of book content here...
    </text>

27
Within a Text
    <div1 type="chapter">
      <head n="1">CHAPTER I.</head>
      <head n="chaptitle">LASSITER</head>
      <p>
        A sharp clip-crop of iron-shod hoofs deadened and died away, and
        clouds of yellow dust drifted from under the cottonwoods out over
        the sage.
      </p>
      ...
    </div1>

28
Multi-Part Books
    <text>
      <front>
        ...book front content here...
      </front>
      <group>
        <text>
          <front> ...part 1 front content here...</front>
          <body> ...part 1 body content here...</body>
          <back> ...part 1 back content (if any) here...</back>
        </text>
        <text> ...content of part 2...</text>
        <text> ...content of part 3...</text>
        ...etc...
      </group>
      <back>
        ...book back content (if any) here...
      </back>
    </text>

29
Drama Markup
    <stage type="entrance">Enter HELENA</stage>
    <sp>
      <speaker who="Hermia">HERMIA</speaker>
      <l>God speed fair Helena! whither away?</l>
    </sp>
    <sp>
      <speaker who="Helena">HELENA</speaker>
      <l>Call you me fair? that fair again unsay.</l>
      <l>Demetrius loves your fair: O happy fair!</l>
      <l>Your eyes are lode-stars; and your tongue's sweet air</l>
      <l>More tuneable than lark to shepherd's ear,</l>
      <l>When wheat is green, when hawthorn buds appear.</l>
      <l>Sickness is catching: O, were favour so,</l>
      <l>Yours would I catch, fair Hermia, ere I go;</l>
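Once speeches are tagged this way, simple questions about a play become queries over the markup. The following Python sketch counts verse lines per speaker in a shortened version of the exchange above. The wrapping `<div>` and the closing `</sp>` are assumptions added so the fragment is well-formed, and `lines_per_speaker` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened fragment of the slide's example; <div> and the final </sp> are
# added here so that the sample parses on its own.
SCENE = """<div>
  <stage type="entrance">Enter HELENA</stage>
  <sp>
    <speaker who="Hermia">HERMIA</speaker>
    <l>God speed fair Helena! whither away?</l>
  </sp>
  <sp>
    <speaker who="Helena">HELENA</speaker>
    <l>Call you me fair? that fair again unsay.</l>
    <l>Demetrius loves your fair: O happy fair!</l>
  </sp>
</div>"""


def lines_per_speaker(xml_source: str) -> Counter:
    """Count <l> elements inside each <sp>, keyed by the speaker's who value."""
    counts = Counter()
    for speech in ET.fromstring(xml_source).iter("sp"):
        who = speech.find("speaker").get("who")
        counts[who] += len(speech.findall("l"))
    return counts


counts = lines_per_speaker(SCENE)
print(dict(counts))
```

Keying on the `who` attribute rather than the printed name means variant speaker labels in the source text would still be counted together.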

30
Examples from Lady in Boomtown
    <p>The statistics on <rs type="place" key="GOLD1">Goldfield</rs> are
    from an article by Charles F. Spillman that appeared in the
    <rs type="place" key="NEVA1">Nevada</rs> News Letter of January 1,
    <date value="1916">1916</date>, a copy of which I had preserved. His
    figures corresponded to my own memory sufficiently for me to accept
    their accuracy.</p>
    <p>During <rs type="person" key="TASK1">Tasker Oddie's</rs> lifetime
    we talked for many happy hours about the old days, and much I have
    written came out of those conversations. He also furnished me with
    campaign literature pertaining to himself and
    <rs type="person" key="KEYP1">Key Pittman</rs>, together with
    magazine clippings about the battleship Nevada. All of these were
    returned to <rs type="person" key="TASK1">Senator Oddie</rs>, so I
    have no record of the published sources from which they were
    clipped. Most of the gossip came to me through my brother and my
    brother-in-law, who joined us when
    <rs type="place" key="TONO1">Tonopah</rs> began to boom. Whenever
    either one of them picked up <q>off the street</q> an exciting bit
    of information, they shared it with me and I added it to my
    notes.</p>
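The `key` attribute on `<rs>` is what makes this tagging useful: different surface forms of a name resolve to one entry. As a hedged illustration, this Python sketch builds a small index from a shortened excerpt of the passage above. The excerpt wording is abridged, the `<body>` wrapper is an assumption, and `build_index` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Abridged excerpt of the passage above, wrapped in <body> so it parses alone.
EXCERPT = """<body>
<p>The statistics on <rs type="place" key="GOLD1">Goldfield</rs> are from an
article by Charles F. Spillman.</p>
<p>During <rs type="person" key="TASK1">Tasker Oddie's</rs> lifetime we talked
for many happy hours. All of these were returned to
<rs type="person" key="TASK1">Senator Oddie</rs>. My brother joined us when
<rs type="place" key="TONO1">Tonopah</rs> began to boom.</p>
</body>"""


def build_index(xml_source: str) -> dict:
    """Group the text of every <rs> element under its key attribute."""
    index = {}
    for rs in ET.fromstring(xml_source).iter("rs"):
        entry = index.setdefault(rs.get("key"), {"type": rs.get("type"), "forms": set()})
        entry["forms"].add(rs.text)
    return index


index = build_index(EXCERPT)
for key in sorted(index):
    print(key, index[key]["type"], sorted(index[key]["forms"]))
```

Both "Tasker Oddie's" and "Senator Oddie" land under `TASK1`, so a name index or search can treat them as the same person.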