TEXT ENCODING INITIATIVE (TEI) - PowerPoint PPT Presentation

About This Presentation
Provided by: Franci116
Slides: 39

Transcript and Presenter's Notes



1
TEXT ENCODING INITIATIVE (TEI)
  • Inf 384C
  • Block II, Module C

2
TEI History
  • The developing organizations first met in 1987
  • Association for Computers and the Humanities
    (ACH)
  • Association for Computational Linguistics (ACL)
  • Association for Literary and Linguistic Computing
    (ALLC)
  • 1990: first version, TEI P1
  • 1992: TEI P2
  • 1994: TEI P3

3
TEI History Continued
  • Principles for the development of TEI
  • Standard format for data interchange in
    humanities research
  • Guidelines for encoding texts in the same format
  • Define a recommended syntax
  • Define a meta language for description of
    text-encoding schemes
  • Future Developments
  • Linguistic description and grammatical annotation
  • Historical analysis and interpretation
  • Base tag sets for further document types
  • Manuscript analysis and physical description of
    text

4
General Introduction to SGML and XML
5
The Evolution of SGML and XML
  • Late 1960s: IBM develops the Generalized Markup
    Language (GML)
  • 1970s-1980s: ANSI initiates a project to develop
    a standard text-description language based on GML
  • 1983: SGML becomes an industry standard
  • 1986: ISO ratifies SGML as an international
    standard (ISO 8879)
  • Early 1990s: Tim Berners-Lee develops HTML, a
    simple formatting markup language for the World
    Wide Web
  • Mid-1990s: the W3C develops XML to combine the
    flexibility of SGML with the simplicity of HTML

6
Benefits of SGML and XML
  • SGML is a toolkit for developing specialized
    markup languages
  • Specifies the structure of information
  • Enables interoperability between multiple
    platforms
  • Acts like a database
  • All-encompassing
  • The DTD acts as a blueprint for document
    structure
  • XML provides a manageable framework in which you
    can define your own elements

7
XML Syntax
  • Every element must have a start tag and an end
    tag (an empty element may use a single tag such
    as <lb/>)
  • Case is significant
  • Elements may not overlap
  • Elements can nest one inside another
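These rules are exactly what an XML parser checks when it tests a document for well-formedness. A minimal sketch using Python's standard-library `xml.etree.ElementTree` parser (our choice of tool; the slides do not name one); the sample strings are made up for illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "proper nesting": "<p>An <emph>important</emph> point</p>",
    "missing end tag": "<p>An <emph>important point</p>",
    "case mismatch": "<p>Text</P>",                      # case is significant
    "overlapping elements": "<p><b><i>bold italic</b></i></p>",
    "empty-element tag": "<p>Line one<lb/>line two</p>",  # single-tag form
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first and last samples are accepted; the other three each break one of the rules on the slide.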

8
The XML Environment
  • XML Editor
  • XML Parser/Validator
  • Display program
  • DTD or schema to define elements
  • Style sheet for display of elements

9
The XML Document
  • Document prologue
  • XML declaration
  • Document type declaration
  • Names the root element
  • Points to the DTD (an external file, an internal
    subset, or both)
  • Document itself
  • Bracketed by root element
  • Contains elements, attributes, entities
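The parts of a document listed above can be seen in a small hypothetical example: an XML declaration and a document type declaration (with a tiny internal DTD subset) form the prologue, and the root element brackets the rest. The sketch reads it with Python's `xml.dom.minidom`, which is built on the non-validating expat parser, so the DTD is parsed but its rules are not enforced:

```python
from xml.dom import minidom

# Hypothetical document: prologue (XML declaration + DOCTYPE), then the
# document itself, bracketed by the root element <article>.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article [
  <!ELEMENT article (title, para+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT para (#PCDATA)>
]>
<article>
  <title>A Sample Article</title>
  <para>The root element brackets all of the content.</para>
</article>
"""

doc = minidom.parseString(DOC)
print(doc.doctype.name)              # root element named in the DOCTYPE
print(doc.documentElement.tagName)   # root element actually used
print(len(doc.getElementsByTagName("para")))
```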

10
The Document Type Definition
11
The DTD: Document Type Definition
  • A DTD defines a document's structure
  • i.e. it is a set of rules and declarations that
    specify which tags can be used and what these
    tags can contain
  • A DTD makes validation possible
  • - determines which documents conform to the
    language
  • - reduces the possibility of errors
  • A DTD provides a blueprint for documents
  • - specifies how to handle elements
  • - specifies which elements are allowed

12
The DTD: Document Type Definition
  • The DTD has four main functions
  • 1. declares the set of allowed elements
    (vocabulary)
  • 2. defines a content model for each element
    (grammar)
  • 3. declares the set of allowed attributes for each
    element
  • 4. provides various mechanisms that make the
    model easier to manage
  • (Ray, Chapter 5, p 148)

13
Basic Structure of a DTD: Element Declaration
  • <!ELEMENT name (content-model)>
  • Serves two functions
  • Adds a new element
  • States what can go inside the element
  • Every element that appears in a valid document
    must be declared in the DTD
  • Order of declarations matters for entities (an
    entity must be declared before it is referenced);
    element declarations may appear in any order

14
<!ELEMENT name (content-model)>
  • name (vocabulary)
  • Denotes the NAME of the element as it appears in
    the markup tag
  • (case-sensitive; usually written in lowercase)
  • e.g. title, graphic, article, thingie
  • content-model (grammar)
  • Formula that delineates what kind of content, how
    many and in what order
  • Empty elements: EMPTY
  • No content restrictions (of little value): ANY
  • Only character data, no elements: (#PCDATA)
  • Only elements: a formula such as (title, para+)
  • Mixed content: (#PCDATA | name1 | name2)*
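An element-only content model is essentially a regular expression over the sequence of child element names: "," means "followed by", "|" means "or", and ?, *, + say how many. The sketch below makes that concrete by translating such a model into a Python regex. The helper names are hypothetical and this is a teaching aid, not a DTD validator: it ignores #PCDATA, EMPTY, ANY and parameter entities.

```python
import re

OPERATORS = {")", "|", "?", "*", "+"}


def content_model_to_regex(model: str) -> re.Pattern:
    """Translate an element-only content model, e.g. '(title, author?, para+)',
    into a regex matching child names that are each followed by one space."""
    parts = []
    for tok in re.findall(r"[A-Za-z_][\w.-]*|[(),|?*+]", model):
        if tok == "(":
            parts.append("(?:")                     # non-capturing group
        elif tok == ",":
            continue                                # a sequence is plain concatenation
        elif tok in OPERATORS:
            parts.append(tok)                       # "|" and ?, *, + carry over unchanged
        else:
            parts.append(f"(?:{re.escape(tok)} )")  # one child element name
    return re.compile("".join(parts))


def children_match(model: str, children: list[str]) -> bool:
    """True if the ordered list of child element names satisfies the model."""
    sequence = "".join(f"{name} " for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None


print(children_match("(title, author?, para+)", ["title", "para", "para"]))  # True
print(children_match("(title, author?, para+)", ["para", "title"]))          # False: wrong order
print(children_match("(head | p)*", []))                                     # True: zero or more
```

The trailing space after each name keeps names that share a prefix (such as p and para) from matching each other.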

15
Basic Structure of a DTD: Attribute Declaration
  • <!ATTLIST name attname1 atttype1 attdesc1
    attname2 atttype2 attdesc2>
  • Every attribute used on an element in a valid
    document must be declared
  • An element's attributes are normally declared
    together in one place, its attribute-list
    (ATTLIST) declaration

16
<!ATTLIST name attname1 atttype1 attdesc1>
  • name (vocabulary)
  • Name of the element to which the attributes belong
  • Same as the name used in that element's
    declaration
  • e.g. title, article, thingie
  • Attribute declarations
  • attname1: gives the attribute name
  • atttype1: specifies the attribute's type, either
    a keyword or an enumerated list of values
  • e.g. CDATA, NMTOKEN, ID, (low | high)
  • attdesc1: the default declaration, describing
    the attribute's behavior
  • 1. a default value, e.g. "high"
  • 2. a keyword: #REQUIRED (the author must specify
    a value), #IMPLIED (optional, no default), or
    #FIXED "value" (always that value)
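To show how the default declarations behave, here is a sketch that applies a hypothetical attribute list (element name and attributes invented for illustration) to an element's attributes, the way a validating parser would: it fills in defaults and rejects missing #REQUIRED values, changed #FIXED values, and values outside an enumeration. The Python representation of the declarations is an assumption of this example, not a standard API.

```python
# Hypothetical declarations for an element "article", mirroring:
#   <!ATTLIST article id      ID          #REQUIRED
#                     level   (low|high)  "high"
#                     lang    CDATA       #IMPLIED
#                     version CDATA       #FIXED "1.0">
# attname -> (atttype, attdesc). An enumerated type is a tuple of values;
# a #FIXED default is the tuple ("#FIXED", value).
ARTICLE_ATTS = {
    "id": ("ID", "#REQUIRED"),
    "level": (("low", "high"), "high"),
    "lang": ("CDATA", "#IMPLIED"),
    "version": ("CDATA", ("#FIXED", "1.0")),
}


def apply_attlist(decls: dict, attrs: dict) -> dict:
    """Check attributes against their declarations and fill in defaults.
    Only enumerations, #REQUIRED and #FIXED are checked; ID uniqueness
    and NMTOKEN syntax are not."""
    for name in attrs:
        if name not in decls:
            raise ValueError(f"undeclared attribute: {name}")
    result = dict(attrs)
    for name, (att_type, default) in decls.items():
        fixed = default[1] if isinstance(default, tuple) else None
        if name not in result:
            if default == "#REQUIRED":
                raise ValueError(f"missing required attribute: {name}")
            if default != "#IMPLIED":
                # A literal default or a #FIXED value is supplied by the DTD.
                result[name] = fixed if fixed is not None else default
            continue
        value = result[name]
        if fixed is not None and value != fixed:
            raise ValueError(f"{name} is #FIXED as {fixed!r}")
        if isinstance(att_type, tuple) and value not in att_type:
            raise ValueError(f"{name} must be one of {att_type}")
    return result


print(apply_attlist(ARTICLE_ATTS, {"id": "a1"}))
# {'id': 'a1', 'level': 'high', 'version': '1.0'}
```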

17
The DTD: Document Type Definition
  • It is important to remember that every document
    type definition is an interpretation of a text.
    There is no single DTD which encompasses any kind
    of absolute truth about a text, although it may
    be convenient to privilege some DTDs above others
    for particular types of analysis.
  • TEI Guidelines for Electronic Text Encoding and
    Interchange
  • http://etext.virginia.edu/TEI.html

18
The TEI DTD
  • Uses the basic structural elements of a general
    DTD
  • Designed to simplify the task of choosing an
    appropriate set of tags for the text in hand
  • The encoder selects an appropriate combination of
    smaller tag sets, each containing tags likely to
    be used together
  • 1. Core tag sets: standard components that are
    always included; no encoder action is needed
  • 2. Base tag sets: basic building blocks for
    particular text types; the encoder must select
    at least one
  • 3. Additional tag sets: extra tags compatible
    with all other tag sets; the encoder may add
    them to the base tags in any combination
  • http://www.tei-c.org/P4X/DTD/
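In the XML version of TEI P4, the tag sets were switched on in the document type declaration through parameter entities. The hypothetical helper below assembles such a declaration and enforces the "at least one base tag set" rule. The entity names (TEI.prose, TEI.linking and so on), the public identifier and the tag-set lists follow P4 conventions as we understand them; treat them as assumptions and check them against the P4 Guidelines before use.

```python
# Tag-set names follow TEI P4 parameter-entity conventions as we
# understand them (assumption: verify against the P4 Guidelines).
BASE_SETS = {"prose", "verse", "drama", "spoken", "dictionaries", "terminology"}
ADDITIONAL_SETS = {"linking", "figures", "analysis", "names.dates", "transcr"}


def tei_p4_doctype(base, additional=()):
    """Build a DOCTYPE that switches on the chosen tag sets.
    The core tag sets need no entry: they are always included."""
    base, additional = set(base), set(additional)
    if not base:
        raise ValueError("select at least one base tag set")
    unknown = (base - BASE_SETS) | (additional - ADDITIONAL_SETS)
    if unknown:
        raise ValueError(f"unknown tag set(s): {sorted(unknown)}")
    lines = ['<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN" "tei2.dtd" [',
             "  <!ENTITY % TEI.XML 'INCLUDE'>"]
    for name in sorted(base) + sorted(additional):
        lines.append(f"  <!ENTITY % TEI.{name} 'INCLUDE'>")
    lines.append("]>")
    return "\n".join(lines)


print(tei_p4_doctype(["prose"], ["linking", "figures"]))
```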

19
The TEI Header
20
Basic Elements of TEI
  • Paragraphs <p>
  • Punctuation <stop.abbr>, <stop.sent>
  • Quotations <q> or <quote>
  • Lists <list>, <item> etc.
  • Bibliographic citations <bibl>
  • THE HEADER! <teiHeader>
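A made-up fragment using several of these basic elements, parsed with Python's `xml.etree.ElementTree` to pull out list items, quotations and citations. The element names come from the slide; the text content and the surrounding `<div>` wrapper are placeholders assumed for the example.

```python
import xml.etree.ElementTree as ET

# Placeholder content; element names follow the slide (p, q, list, item, bibl).
FRAGMENT = """<div>
  <p>A paragraph containing <q>quoted words</q>.</p>
  <list>
    <item>First item</item>
    <item>Second item</item>
  </list>
  <bibl><author>Author Name</author>, <title>Title of Work</title></bibl>
</div>"""

root = ET.fromstring(FRAGMENT)
items = [item.text for item in root.iter("item")]   # every <item>, in document order
quotes = [q.text for q in root.iter("q")]
citations = ["".join(b.itertext()) for b in root.iter("bibl")]  # all text inside <bibl>
print(items)
print(quotes)
print(citations)
```

Because the markup records what each piece of text is, a program can collect all quotations or all citations without guessing from typography.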

21
The TEI Header
  • Required of every TEI text, composed of four
    parts
  • May be large and complex or very simple
  • The header may differ for documents not based on
    written text, such as computer files or spoken
    text
  • The header is not a library cataloging record,
    although the intent is similar

22
Four Parts
  • File Description <fileDesc>
  • Encoding Description <encodingDesc>
  • Text Profile <profileDesc>
  • Revision Description <revisionDesc>
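A sketch that builds a skeleton header with the four parts in order, using Python's `xml.etree.ElementTree`. It assumes the TEI rule that `<fileDesc>` needs a title statement, a publication statement and a source description; the other three parts are left as empty placeholders here (a real header would fill them in, or leave out those that are not needed). The function name and text values are hypothetical.

```python
import xml.etree.ElementTree as ET


def minimal_tei_header(title: str, publication: str, source: str) -> ET.Element:
    """Build a skeleton <teiHeader>. fileDesc gets its three required
    children; the other three parts are empty placeholders."""
    header = ET.Element("teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    ET.SubElement(ET.SubElement(file_desc, "titleStmt"), "title").text = title
    ET.SubElement(ET.SubElement(file_desc, "publicationStmt"), "p").text = publication
    ET.SubElement(ET.SubElement(file_desc, "sourceDesc"), "p").text = source
    for part in ("encodingDesc", "profileDesc", "revisionDesc"):
        ET.SubElement(header, part)
    return header


header = minimal_tei_header(
    "A Sample Text: an electronic edition",   # placeholder values
    "Distributed by a hypothetical project.",
    "Transcribed from a printed original.",
)
print([part.tag for part in header])
print(ET.tostring(header, encoding="unicode"))
```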

23
File Description <fileDesc>
  • <titleStmt>
  • <editionStmt>
  • <extent>
  • <publicationStmt>
  • <seriesStmt>
  • <notesStmt>
  • <sourceDesc>
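Because these parts have fixed names, software can read bibliographic information straight out of the file description. A sketch with a placeholder `<fileDesc>` (only some of the optional parts present), read with Python's `xml.etree.ElementTree`; the function name and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder fileDesc; editionStmt, seriesStmt and notesStmt are absent.
FILE_DESC = """<fileDesc>
  <titleStmt><title>Sample Title</title></titleStmt>
  <extent>Placeholder extent statement</extent>
  <publicationStmt><publisher>Example Publisher</publisher></publicationStmt>
  <sourceDesc><bibl>Placeholder description of the printed source</bibl></sourceDesc>
</fileDesc>"""


def summarize_file_desc(xml_text: str) -> dict:
    """Collect the text of selected fileDesc parts; absent parts map to None."""
    file_desc = ET.fromstring(xml_text)

    def text_of(path: str):
        node = file_desc.find(path)
        return "".join(node.itertext()).strip() if node is not None else None

    return {
        "title": text_of("titleStmt/title"),
        "edition": text_of("editionStmt"),
        "extent": text_of("extent"),
        "publisher": text_of("publicationStmt/publisher"),
        "source": text_of("sourceDesc"),
    }


summary = summarize_file_desc(FILE_DESC)
print(summary)
```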

24
Encoding Description <encodingDesc>
  • <projectDesc>
  • <samplingDecl>
  • <editorialDecl>
  • <tagsDecl>
  • <refsDecl>
  • <classDecl>
  • <fsdDecl>
  • <metDecl>
  • <variantEncoding>

25
Profile Description <profileDesc>
  • <creation>
  • <langUsage>
  • <textClass>

26
Revision Description <revisionDesc>
  • <revisionDesc>
  • <change>
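A sketch of recording revisions: each edit to the file adds a `<change>` entry to `<revisionDesc>`, listed most recent first (a widely used convention). The child structure used here (a date, a responsibility statement with name and role, and an item describing the change) is our understanding of the TEI P4 model for `<change>` and should be checked against the Guidelines; the function name, people and dates are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date


def add_change(revision_desc: ET.Element, when: date, who: str,
               role: str, what: str) -> ET.Element:
    """Insert a <change> at the top of <revisionDesc> (most recent first).
    Assumed P4 structure: date, respStmt (name, resp), item."""
    change = ET.Element("change")
    ET.SubElement(change, "date").text = when.isoformat()
    resp_stmt = ET.SubElement(change, "respStmt")
    ET.SubElement(resp_stmt, "name").text = who
    ET.SubElement(resp_stmt, "resp").text = role
    ET.SubElement(change, "item").text = what
    revision_desc.insert(0, change)
    return change


revision_desc = ET.Element("revisionDesc")
add_change(revision_desc, date(2003, 1, 15), "A. Encoder", "encoder", "Initial markup")
add_change(revision_desc, date(2003, 2, 1), "A. Encoder", "proofreader", "Corrected OCR errors")
print([change.findtext("item") for change in revision_desc])
print([change.findtext("date") for change in revision_desc])
```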

27
Examples and Application
28
Examples and Application
  • Dumble Geological Survey
  • A geological survey of Texas from the late 19th
    century, comprising twelve volumes
  • Digitally imaged monographs processed with OCR
    software to produce text
  • Text marked up in XML using the TEI Lite
    specification
  • http://www.lib.utexas.edu/books/dumble/

29
Dumble DTD
  • Element and Attribute definitions
  • Entity references
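Entity references let a DTD define a piece of text once and reuse it wherever it is referenced. The Dumble DTD's actual entities are not reproduced in this transcript, so the sketch declares a hypothetical entity in an internal DTD subset; Python's expat-based `xml.etree.ElementTree` parser replaces the reference with its text while parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical internal DTD subset: the entity "gst" stands for a long name.
DOC = """<!DOCTYPE text [
  <!ENTITY gst "Geological Survey of Texas">
]>
<text><p>Reports of the &gst; fill twelve volumes.</p></text>"""

root = ET.fromstring(DOC)
print(root.find("p").text)   # the reference &gst; has been expanded
```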

32
Dumble Header
  • Four basic sections
  • File description
  • Encoding description
  • Profile description
  • Revision description
  • Contains bibliographic information
  • Contains information on the creation of the
    digital file

36
Why XML?
  • Ability to record information about a document
    within the document.
  • Ability to separate structure from format
  • Ability to wrap or embed information in layers
    of XML
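Separating structure from format means the markup records only what each piece of text is, and a separate style sheet decides how it looks. In practice that job usually falls to a style-sheet language such as XSLT or CSS; in this sketch two small Python functions (hypothetical names) stand in for two style sheets and render one structural description in two different formats.

```python
import xml.etree.ElementTree as ET
from html import escape

# The structure is recorded once...
LIST_XML = "<list><item>Structure</item><item>Format</item></list>"


def to_plain_text(lst: ET.Element) -> str:
    """...one 'style sheet' renders it as a plain-text bullet list..."""
    return "\n".join(f"* {item.text or ''}" for item in lst.iter("item"))


def to_html(lst: ET.Element) -> str:
    """...and another renders the same structure as an HTML list."""
    items = "".join(f"<li>{escape(item.text or '')}</li>" for item in lst.iter("item"))
    return f"<ul>{items}</ul>"


lst = ET.fromstring(LIST_XML)
print(to_plain_text(lst))
print(to_html(lst))
```

Changing the presentation means writing a new renderer; the marked-up document itself stays untouched.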

37
XML Beyond TEI
  • Open Archives Initiative (OAI)
  • Semantic Web
  • Open Archival Information System (OAIS)
  • Digital Preservation
  • Information Discovery

38
References
  • A Sample TEI Markup: www.tei-c.org/Lite/U5-eg.html
  • Appendix A.2, Elements in TEI Lite:
    www.tei-c.org/Lite/U5-taglist.html
  • OAI: www.openarchives.org/
  • OAIS: http://www.rlg.org/longterm/oais.html
  • Learning XML, Erik T. Ray