The Digitization of Germanic Literary Texts: Problems and Proposals

About This Presentation
Slides: 29
Provided by: tdtcUnisi

Transcript and Presenter's Notes



1
The Digitization of Germanic Literary Texts: Problems and Proposals
2
Presentation Outline
  • Introduction
  • Character encoding
  • Metrical markup
  • Conclusion
  • The Digital Vercelli Book Project
  • http://islp.di.unipi.it/bifrost/vbd/

3
Introduction
  • Digital editions require digital objects
  • Image digitization and processing rely on mature,
    reliable techniques and tools
  • Text encoding can be a very time-consuming and
    difficult process
  • Literary texts belonging to the Old Germanic
    tradition present specific problems
  • Problems range from character encoding
    (transcription level) to meter encoding (edition
    level)

4
Character Encoding
  • What does text encoding mean?
  • What are characters for a computer?
  • What does character encoding mean?
  • "Code" really means number
  • A = 65 (dec.) = 41 (hex.) = 01000001 (bin.)
  • The first encoding standards: ASCII (7-bit, plus
    8-bit extensions), EBCDIC
  • The ISO ASCII-based standards: ISO 8859-1, etc.
  • More characters, but interchange problems
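The idea that a character code is simply a number can be checked directly. The following minimal Python sketch is not part of the original project, and the helper name `codes` is hypothetical. It prints the decimal, hexadecimal and 8-bit binary values of "A" and compares ASCII with EBCDIC (Python's `cp500` code page). It also shows that 7-bit ASCII cannot hold an Old English letter such as æ.

```python
# A computer stores each character as a number: its code.
def codes(ch):
    """Return the decimal, hexadecimal and 8-bit binary code of ch."""
    n = ord(ch)
    return n, hex(n), format(n, "08b")

print(codes("A"))  # (65, '0x41', '01000001')

# ASCII and EBCDIC assign different numbers to the same letter,
# one source of interchange problems between systems.
print("A".encode("ascii"))  # b'A'    (byte 0x41)
print("A".encode("cp500"))  # b'\xc1' (EBCDIC code page 500)

# 7-bit ASCII stops at 127, so Old English letters such as æ do not fit.
try:
    "æ".encode("ascii")
except UnicodeEncodeError:
    print("æ is outside ASCII")
```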

5
Old English Characters
  • Ancient writing systems present very specific
    problems
  • For instance, scribes writing in Old English
    modified the Latin alphabet to reflect OE
    phonological features:
  • modified letters (æ, œ, ð)
  • new letters (þ, ƿ)
  • unused letters (g, v), replaced by ᵹ and f
  • Significant variations related to different
    times, places (scriptoria), scribal habits and
    scripts

6
Problems in OE character visualization
  • ASCII and the ISO 8859 family lack a good number
    of important characters
  • From an HTML page of the DOE (Dictionary of Old
    English) corpus
  • The corresponding source code:
      <img src="T04290_files/etail-uppercase.gif" align="top" border="0">fne swa he cwæde Micel is gefea
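The gap can be made concrete. The small Python check below (an illustration only, with the hypothetical helper `latin1_missing`) asks which Old English characters a single-byte ISO 8859-1 encoding can store. Ash, eth and thorn are covered, but wynn, long s and the oe ligature are not. A legacy HTML page therefore has to fall back on workarounds such as images, while UTF-8 represents all of them.

```python
# Ash, eth, thorn, wynn, long s, oe ligature
OE_LETTERS = "æðþƿſœ"

def latin1_missing(text):
    """Return the characters of text that ISO 8859-1 cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

print(latin1_missing(OE_LETTERS))  # ['ƿ', 'ſ', 'œ']

# UTF-8 (a Unicode encoding) round-trips every one of them.
assert OE_LETTERS.encode("utf-8").decode("utf-8") == OE_LETTERS
```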

7
The Unicode Standard
  • The Unicode site: http://www.unicode.org/
  • A universal character encoding standard used for
    representation of text for computer processing
  • Fully compatible and synchronized with the
    corresponding versions of International Standard
    ISO/IEC 10646
  • Latest major revision: 4.1 (5.0 in beta)
  • 97720 different characters, room for many more
    (about 1 million)
  • Universal, efficient, unambiguous
  • Distinction between characters and glyphs
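Python's `unicodedata` module illustrates the character/glyph distinction. Each Old English letter is one abstract character with a fixed code point and name, whatever shape a font gives it. The sketch below is illustrative, and `describe` is a hypothetical helper. The last line shows that long s is a separate character but is compatibility-equivalent to s, which is useful for searching.

```python
import unicodedata

def describe(ch):
    """Code point, character and official Unicode name."""
    return f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}"

for ch in "æðþƿᵹſ":
    print(describe(ch))

# NFKC folds the compatibility character long s to plain s.
print(unicodedata.normalize("NFKC", "ſ"))  # s
```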

8
The Unicode Standard
9
Characters in an Old English manuscript
  • Considerable variation in shape for the same
    character
  • a s y M
  • Size variation
  • a i e
  • Special characters (abbreviations, punctuation)

10
Encoding OE Characters
  • Why encode non-standard characters?
  • To allow for paleographical analysis
  • To track scribal habits
  • To obtain a high-quality text-only facsimile
  • What to encode?
  • Not every letter variation is meaningful
  • How to encode?
  • Unicode + XML markup + MUFI-compliant font

11
Entities
  • Entities are empty boxes (think of constants
    in programming languages)
  • Entities must be declared at the beginning of the
    XML document (or, more often, in a separate file)
      <!ENTITY lows "&#61735;"> <!-- low s letter -->
      <!ENTITY longs "ſ"> <!-- long, f-shaped s letter -->
  • They allow for interchange with legacy operating
    systems and platforms
  • They simplify the handling of special
    characters (and more)
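An XML parser replaces each entity reference with its declared value. The following sketch places the two declarations from the slide in an internal DTD subset and parses them with Python's ElementTree, which expands internal entities. The sample text "hu&longs; &lows;" is made up for illustration. Note that `&#61735;` is simply decimal for the Private Use Area code point U+F127.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE text [
  <!ENTITY lows "&#61735;">  <!-- low s letter (PUA code point) -->
  <!ENTITY longs "&#383;">   <!-- long, f-shaped s letter -->
]>
<text>hu&longs; &lows;</text>"""

root = ET.fromstring(DOC)
# The references are gone; only the characters remain.
print([f"U+{ord(c):04X}" for c in root.text])
# ['U+0068', 'U+0075', 'U+017F', 'U+0020', 'U+F127']
```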

12
TEI P4 and Unicode
  • How to use entities
  • &longs; = s: not very useful
  • N.B. entity names are lost forever!!!
  • &longs; = ſ: visualization
  • <c type='longs'>s</c>: visualization + search
  • ... but what about missing characters?
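Here is why `<c type='longs'>s</c>` supports both uses. The element keeps the regular letter for searching, and the `type` value records which special form to display. The sketch below is a minimal illustration, not a TEI tool. The `DISPLAY` lookup table and both function names are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed table: @type value -> character to display.
DISPLAY = {"longs": "\u017f"}  # ſ LATIN SMALL LETTER LONG S

line = ET.fromstring('<l>hu<c type="longs">s</c></l>')

def search_text(el):
    """Plain text for searching: <c> contributes its regular letter."""
    return "".join(el.itertext())

def display_text(el):
    """Text for display: each typed <c> is swapped for its special form."""
    out = [el.text or ""]
    for child in el:
        if child.tag == "c" and child.get("type") in DISPLAY:
            out.append(DISPLAY[child.get("type")])
        else:
            out.append(display_text(child))
        out.append(child.tail or "")
    return "".join(out)

print(search_text(line))   # hus
print(display_text(line))  # huſ
```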

13
TEI P5 and Unicode
  • Use the <g> element in the text
      ... <g ref="#lows"/> ...
  • together with the <charDesc> one
      <charDesc>
        <char xml:id="lows">
          <charName>LATIN SMALL LETTER S LOW UNDER THE LINE</charName>
          <charProp>
            <localName>entity</localName>
            <value>lows</value>
          </charProp>
          <mapping type="standardized">s</mapping>
          <mapping type="PUA">U+F127</mapping>
        </char>
      </charDesc>
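With `<g>` and `<charDesc>`, each occurrence in the text only points to one declaration. A processor can then choose which mapping to output. The Python sketch below resolves `<g ref="#lows"/>` either to its standardized form or to its PUA code point. It assumes a simplified, namespace-free document, whereas real TEI P5 files use the TEI namespace. The sample text and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """<TEI>
  <charDesc>
    <char xml:id="lows">
      <charName>LATIN SMALL LETTER S LOW UNDER THE LINE</charName>
      <mapping type="standardized">s</mapping>
      <mapping type="PUA">U+F127</mapping>
    </char>
  </charDesc>
  <text>hu<g ref="#lows"/></text>
</TEI>"""

def char_table(root):
    """Map each <char> xml:id to a {mapping type: value} dict."""
    return {char.get(XML_ID): {m.get("type"): m.text for m in char.iter("mapping")}
            for char in root.iter("char")}

def to_unicode(value):
    """Turn 'U+F127' into the character itself; other values pass through."""
    return chr(int(value[2:], 16)) if value.startswith("U+") else value

def render(text_el, table, prefer):
    """Replace each <g> with the mapping of the preferred type."""
    out = [text_el.text or ""]
    for child in text_el:
        if child.tag == "g":
            mappings = table[child.get("ref").lstrip("#")]
            out.append(to_unicode(mappings[prefer]))
        out.append(child.tail or "")
    return "".join(out)

root = ET.fromstring(DOC)
table = char_table(root)
print(render(root.find("text"), table, "standardized"))     # hus
print(ascii(render(root.find("text"), table, "PUA")))       # 'hu\uf127'
```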

14
TEI P5 and Unicode
  • Another example
      <charDesc>
        <glyph xml:id="r1">
          <glyphName>LATIN SMALL LETTER R WITH ONE FUNNY STROKE</glyphName>
          <charProp>
            <localName>entity</localName>
            <value>r1</value>
          </charProp>
          <graphic url="r1img.png"/>
        </glyph>
      </charDesc>
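A `<glyph>` with a `<graphic>` also covers characters that no font provides. A processor could produce the same kind of `<img>` fallback seen in the DOE page, but generated from one declaration instead of hand-coded at each occurrence. The sketch below is hypothetical: the sample word and `to_html` are illustrative, and the namespace is omitted as in the previous example.

```python
import xml.etree.ElementTree as ET
from html import escape

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """<TEI>
  <charDesc>
    <glyph xml:id="r1">
      <glyphName>LATIN SMALL LETTER R WITH ONE FUNNY STROKE</glyphName>
      <graphic url="r1img.png"/>
    </glyph>
  </charDesc>
  <text>wo<g ref="#r1"/>d</text>
</TEI>"""

root = ET.fromstring(DOC)
glyphs = {g.get(XML_ID): {"name": g.findtext("glyphName"),
                          "url": g.find("graphic").get("url")}
          for g in root.iter("glyph")}

def to_html(text_el):
    """Emit HTML, turning each <g> into an <img> described by its glyph."""
    out = [escape(text_el.text or "")]
    for child in text_el:
        if child.tag == "g":
            g = glyphs[child.get("ref").lstrip("#")]
            out.append(f'<img src="{escape(g["url"])}" alt="{escape(g["name"])}"/>')
        out.append(escape(child.tail or ""))
    return "".join(out)

print(to_html(root.find("text")))
```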

15
Metrical Markup
  • Old Germanic meter features
  • non-isosyllabic
  • syllabic quantity not particularly relevant
  • long verse composed of two half-lines
  • half-lines bound by alliteration
  • stress pattern
  • No specific solutions in the TEI Guidelines
  • Several prosodic theories (from Sievers to Hoover)
  • Stylistic features: problems
  • Risk of complex, overlapping markup
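The alliteration that binds the two half-lines follows a few well-known rules. Any vowel alliterates with any other vowel, and the clusters sc, sp and st alliterate only with themselves. The Python sketch below encodes just these rules. It is a deliberate simplification that ignores stress placement and the palatal/velar distinctions of c and g, and the function names are hypothetical.

```python
VOWELS = set("aeiouyæœ")
CLUSTERS = ("sc", "sp", "st")  # alliterate only with themselves

def onset(word):
    """The alliterating element of a stressed word (simplified)."""
    w = word.lower()
    if w[0] in VOWELS:
        return "V"  # all vowels count as one class
    for cluster in CLUSTERS:
        if w.startswith(cluster):
            return cluster
    return w[0]  # otherwise the first consonant

def alliterates(a, b):
    return onset(a) == onset(b)

print(alliterates("weorc", "wuldorfaeder"))  # True
print(alliterates("stan", "sæ"))             # False
```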

16
General Structure of Old Germanic Meter
  • A markup proposal
      <lg>
        <l>
          <hl>Hwæt! Ic swefna cyst</hl>
          <hl>secgan wylle</hl>
        </l>
      </lg>
  • <lg> (line group) only needed where stanzas occur
    (Deor)
  • <hl> (half-line): syntactic sugar for <seg
    type="halfline">
  • <hlA> and <hlB> not needed
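Because `<hl>` is only syntactic sugar, a conversion step can turn it back into standard TEI markup before the text is handed to tools that know only `<seg>`. A minimal ElementTree sketch follows, with `desugar` as a hypothetical name.

```python
import xml.etree.ElementTree as ET

LINE = """<lg>
  <l>
    <hl>Hwæt! Ic swefna cyst</hl>
    <hl>secgan wylle</hl>
  </l>
</lg>"""

def desugar(root):
    """Rewrite every <hl> as the equivalent <seg type="halfline">."""
    for hl in list(root.iter("hl")):
        hl.tag = "seg"
        hl.set("type", "halfline")
    return root

root = desugar(ET.fromstring(LINE))
print([seg.text for seg in root.iter("seg")])
# ['Hwæt! Ic swefna cyst', 'secgan wylle']
```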

17
Meter encoding v. 1
  • A simple method to encode meter using attributes
    of <met> elements inside the <hl> element
      <hl>
        <met name="Sievers" code="D1" scan="//\x"/>
        <met name="Russom" code="x/Sx" scan="x/\x"/>
        <met name="Hoover" code="nAn" scan="xx /\x"/>
        ...
        HWÆT! WE GARDENA
      </hl>
  • Doesn't allow for alternative scansions using the
    same system
  • Doesn't take syllables into account (nor
    disagreement in syllable counts/stress patterns)
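The first limitation shows up as soon as the markup is processed. If scansions are collected by system name, a second scansion in the same system has nowhere to go. In this illustrative sketch (the function name is hypothetical), the collision is reported as an error rather than being silently overwritten.

```python
import xml.etree.ElementTree as ET

HL = r"""<hl>
  <met name="Sievers" code="D1" scan="//\x"/>
  <met name="Russom" code="x/Sx" scan="x/\x"/>
  <met name="Hoover" code="nAn" scan="xx /\x"/>
  HWÆT! WE GARDENA
</hl>"""

def scansions(hl):
    """Collect {system name: (code, scan)} from the <met> children."""
    result = {}
    for met in hl.findall("met"):
        name = met.get("name")
        if name in result:
            # v. 1 cannot tell two scansions in one system apart
            raise ValueError(f"second scansion for {name}")
        result[name] = (met.get("code"), met.get("scan"))
    return result

hl = ET.fromstring(HL)
print(scansions(hl)["Sievers"])          # ('D1', '//\\x')
print("".join(hl.itertext()).strip())    # HWÆT! WE GARDENA
```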

18
Meter encoding v. 2
  • A more complete (and complex) method
      <hl n="3a">
        <met system="Sievers" resp="Schwab"
          totalSyllables="5" scansion="D-1" Anacrusis="0"
          Extrametrical="0" Lift="1,2,4" halfLift="3"
          dip="5" allitGlyph="w" allitSound="/w/"
          allitPosition="1,2"/>
        <met system="Sievers" resp="Fulk"
          totalSyllables="4" scansion="D-1" Anacrusis="0"
          Extrametrical="0" Lift="1,2" halfLift="3" dip="4"
          allitGlyph="w" allitSound="/w/"
          allitPosition="1,2"/>
        weorc wuldorfaeder
      </hl>
  • Scansion not associated with the actual text ...
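Recording `resp` on each `<met>` makes scholarly disagreement explicit and therefore queryable. The sketch below compares the two Sievers scansions from the slide attribute by attribute, ignoring `resp`, and reports where Schwab and Fulk differ. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

HL = """<hl n="3a">
  <met system="Sievers" resp="Schwab" totalSyllables="5" scansion="D-1"
       Anacrusis="0" Extrametrical="0" Lift="1,2,4" halfLift="3" dip="5"
       allitGlyph="w" allitSound="/w/" allitPosition="1,2"/>
  <met system="Sievers" resp="Fulk" totalSyllables="4" scansion="D-1"
       Anacrusis="0" Extrametrical="0" Lift="1,2" halfLift="3" dip="4"
       allitGlyph="w" allitSound="/w/" allitPosition="1,2"/>
  weorc wuldorfaeder
</hl>"""

def disagreements(a, b):
    """Return {attribute: (value in a, value in b)} where two <met> differ."""
    keys = (set(a.attrib) | set(b.attrib)) - {"resp"}
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}

schwab, fulk = ET.fromstring(HL).findall("met")
print(disagreements(schwab, fulk))
# {'Lift': ('1,2,4', '1,2'), 'dip': ('5', '4'), 'totalSyllables': ('5', '4')}
```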

19
Meter encoding v. 2
  • ... in fact you could take it out of the text
      <hl n="3a" id="CH.3a">weorc wuldorfaeder</hl>
      ...
      <met target="CH.3a" system="Sievers"
        resp="Schwab" totalSyllables="5" scansion="D-1"
        Anacrusis="0" Extrametrical="0" Lift="1,2,4"
        halfLift="3" dip="5" allitGlyph="w"
        allitSound="/w/" allitPosition="1,2"/>
      <met target="CH.3a" system="Sievers" resp="Fulk"
        totalSyllables="4" scansion="D-1" Anacrusis="0"
        Extrametrical="0" Lift="1,2" halfLift="3" dip="4"
        allitGlyph="w" allitSound="/w/" allitPosition="1,2"/>
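In this stand-off arrangement, the link between scansion and half-line is carried only by `target`, so a processor must resolve it. The sketch below, with an abridged attribute set and a hypothetical function name, groups each `<met>` under the `<hl>` whose `id` it names. A dangling `target` surfaces as a `KeyError`.

```python
import xml.etree.ElementTree as ET

DOC = """<text>
  <hl n="3a" id="CH.3a">weorc wuldorfaeder</hl>
  <met target="CH.3a" system="Sievers" resp="Schwab" scansion="D-1"/>
  <met target="CH.3a" system="Sievers" resp="Fulk" scansion="D-1"/>
</text>"""

def attach(root):
    """Group stand-off <met> elements by the id of the half-line they target."""
    halflines = {hl.get("id"): hl for hl in root.iter("hl")}
    grouped = {}
    for met in root.iter("met"):
        hl = halflines[met.get("target")]  # KeyError if the target is missing
        grouped.setdefault(hl.get("id"), []).append(met.get("resp"))
    return grouped

print(attach(ET.fromstring(DOC)))  # {'CH.3a': ['Schwab', 'Fulk']}
```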

20
Meter encoding v. 2
  • To establish a direct connection between scansion
    and text, you have to mark syllables
  • You could add this to the simple model
      <hl>
        <met name="Russom" scan="/x/xx" sylls="1a.1.1
          1a.1.2 1a.1.3 1a.1.4 1a.1.5"/>
        <met name="Bliss" scan="/\xx" sylls="1a.1.1
          1a.1.3 1a.1.4 1a.1.5"/>
        <syl id="1a.1.1">þe</syl><syl id="1a.1.2">od</syl>
        <syl id="1a.1.3">cyn</syl><syl id="1a.1.4">in</syl>
        <syl id="1a.1.5">ga</syl>
      </hl>
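Once syllables carry ids, each scansion symbol can be paired with an actual piece of text. That pairing also checks whether a scansion has as many symbols as the syllables it claims. The sketch below assumes one scan character per listed syllable, which holds for both examples on the slide. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

HL = r"""<hl>
  <met name="Russom" scan="/x/xx" sylls="1a.1.1 1a.1.2 1a.1.3 1a.1.4 1a.1.5"/>
  <met name="Bliss" scan="/\xx" sylls="1a.1.1 1a.1.3 1a.1.4 1a.1.5"/>
  <syl id="1a.1.1">þe</syl><syl id="1a.1.2">od</syl>
  <syl id="1a.1.3">cyn</syl><syl id="1a.1.4">in</syl><syl id="1a.1.5">ga</syl>
</hl>"""

def align(hl, name):
    """Pair each scan symbol of system `name` with the syllable it refers to."""
    syllables = {syl.get("id"): syl.text for syl in hl.iter("syl")}
    met = next(m for m in hl.iter("met") if m.get("name") == name)
    ids, scan = met.get("sylls").split(), met.get("scan")
    if len(scan) != len(ids):
        raise ValueError(f"{name}: {len(scan)} symbols for {len(ids)} syllables")
    return [(syllables[i], symbol) for i, symbol in zip(ids, scan)]

hl = ET.fromstring(HL)
print(align(hl, "Russom"))
print(align(hl, "Bliss"))
```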

21
Meter encoding v. 3
  • The most complete (and complex!) method
      <fvLib id="PS" type="Prosodic Stress">
        <ignored id="x"/>                  <!-- ignored in scansion -->
        <dip id="SO"/>
        <dipResolution id="SOR"/>          <!-- second half of resolved lift -->
        <halfLiftLongPosition id="S1LP"/>  <!-- VCC -->
        <halfLiftLongNature id="S1LN"/>    <!-- long vowel -->
        <halfLiftShort id="S1S"/>
        <liftLongPosition id="S2LP"/>      <!-- lift long by position -->
        ...
      </fvLib>
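The library is essentially a lookup table from short ids to named prosodic values. The sketch below builds that table from the elements shown on the slide. ElementTree drops the XML comments by default, so only the feature elements are indexed. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

FVLIB = """<fvLib id="PS" type="Prosodic Stress">
  <ignored id="x"/>                  <!-- ignored in scansion -->
  <dip id="SO"/>
  <dipResolution id="SOR"/>          <!-- second half of resolved lift -->
  <halfLiftLongPosition id="S1LP"/>  <!-- VCC -->
  <halfLiftLongNature id="S1LN"/>    <!-- long vowel -->
  <halfLiftShort id="S1S"/>
  <liftLongPosition id="S2LP"/>      <!-- lift long by position -->
</fvLib>"""

def feature_index(fvlib):
    """Map each feature id to the name of the element that defines it."""
    return {el.get("id"): el.tag for el in fvlib}

index = feature_index(ET.fromstring(FVLIB))
print(index["S2LP"])  # liftLongPosition
```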

22
Meter encoding v. 3
  • The feature structure looks complex, but it need
    only be designed once
      <hl n="3a" id="CH.3a"><syll id="ch3a.1">weorc</syll>
        <syll id="ch3a.2">wul</syll><syll
        id="ch3a.3">dor</syll><syll id="ch3a.4">fae</syll>
        <syll id="ch3a.5">der</syll></hl>
      ....
      <linkGrp type="metrical prosody" domains="PS AT
        AP AG T1" targFunc="?">
        <!--...-->
        <link id="L1" targets="ch3a.1 S2LP A1 APW AGW"/>
        <link id="L2" targets="ch3a.2 S2LP A1 APW AGW"/>
        ...
      </linkGrp>
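Read back from the links, each syllable receives its list of feature ids. The sketch below assumes what the slide's examples suggest: the first id in `targets` is the syllable and the rest are feature values from the declared libraries. Only PS appears on the previous slide, so ids such as A1 are kept as opaque strings. The function name is hypothetical, and `targFunc` is omitted.

```python
import xml.etree.ElementTree as ET

DOC = """<text>
  <hl n="3a" id="CH.3a"><syll id="ch3a.1">weorc</syll><syll id="ch3a.2">wul</syll><syll id="ch3a.3">dor</syll><syll id="ch3a.4">fae</syll><syll id="ch3a.5">der</syll></hl>
  <linkGrp type="metrical prosody" domains="PS AT AP AG T1">
    <link id="L1" targets="ch3a.1 S2LP A1 APW AGW"/>
    <link id="L2" targets="ch3a.2 S2LP A1 APW AGW"/>
  </linkGrp>
</text>"""

def syllable_features(root):
    """Return (syllable text, [feature ids]) for each <link>."""
    sylls = {s.get("id"): s.text for s in root.iter("syll")}
    rows = []
    for link in root.iter("link"):
        first, *features = link.get("targets").split()  # assumed: syllable first
        rows.append((sylls[first], features))
    return rows

for row in syllable_features(ET.fromstring(DOC)):
    print(row)
```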

23
Stylistic features: the kenning
  • Main element
  • <kenning>
  • Using the <kenning> element without further
    markup is the simplest way to mark up kenningar in
    a text
  • Examples
      <kenning>swanrad</kenning>
      <kenning>beadoleoma</kenning>
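Even this minimal markup already allows kennings to be retrieved. The sketch below wraps the two examples in a hypothetical container element and collects the text of every `<kenning>`.

```python
import xml.etree.ElementTree as ET

TEXT = "<div><kenning>swanrad</kenning> <kenning>beadoleoma</kenning></div>"

def kenning_texts(root):
    """Plain text of every <kenning> in document order."""
    return ["".join(k.itertext()) for k in root.iter("kenning")]

print(kenning_texts(ET.fromstring(TEXT)))  # ['swanrad', 'beadoleoma']
```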

24
Stylistic features: the kenning
  • Sub-elements
  • <bw>: base word
  • To single out the base word in a kenning
  • <det>: determinant
  • To single out the determinant
  • <refer>: referent
  • Explicit markup of the object or person the
    kenning refers to
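With the sub-elements in place, each kenning can be broken down into its parts. The sketch below reads the determinant, base word and referent of a single kenning into a dictionary, using `None` for any part that is not marked. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

KENNING = "<kenning><det>beado</det><bw>leoma</bw><refer>sweord</refer></kenning>"

def analyse(kenning):
    """Determinant, base word and referent of one kenning (None if absent)."""
    return {part: kenning.findtext(part) for part in ("det", "bw", "refer")}

print(analyse(ET.fromstring(KENNING)))
# {'det': 'beado', 'bw': 'leoma', 'refer': 'sweord'}
```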

25
Stylistic features: the kenning
  • Attributes
  • type: specifies the type of kenning
  • level: specifies the level, i.e. whether the kenning
    is embedded in or contains another kenning, and its
    position in the hierarchy
  • class: specifies a general semantic class to which
    the kenning belongs
  • func: specifies the stylistic function of the
    kenning
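Since `level` records a kenning's position in the hierarchy, it duplicates information already implied by nesting. A consistency check can catch encoding mistakes. The sketch below assumes that level 1 means the outermost kenning, as in the example on the following slide. It reports each `(declared level, actual depth)` pair that disagrees, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = ('<kenning level="1"><det><kenning level="2"><det>heofon</det>'
       '<bw>engla</bw></kenning></det><bw>cyning</bw></kenning>')

def check_levels(root):
    """Compare each @level with the kenning's nesting depth (1 = outermost)."""
    problems = []
    def walk(el, depth):
        if el.tag == "kenning":
            depth += 1
            if el.get("level") not in (None, str(depth)):
                problems.append((el.get("level"), depth))
        for child in el:
            walk(child, depth)
    walk(root, 0)
    return problems

print(check_levels(ET.fromstring(DOC)))  # []
```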

26
Stylistic features: the kenning
  • Examples
      <kenning>
        <det>beado</det><bw>leoma</bw>
        <refer>sweord</refer>
      </kenning>
      <kenning level="1">
        <det>
          <kenning level="2">
            <det>heofon</det><bw>engla</bw></kenning>
        </det>
        <bw>cyning</bw>
      </kenning>
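Nested markup like this can be turned into a bracketed analysis that makes the hierarchy visible at a glance. The recursive sketch below renders each kenning as `[determinant + base word]` and appends the referent when one is marked. The output format and the function name are illustrative choices, not part of the proposal.

```python
import xml.etree.ElementTree as ET

SIMPLE = "<kenning><det>beado</det><bw>leoma</bw><refer>sweord</refer></kenning>"
NESTED = ('<kenning level="1"><det><kenning level="2"><det>heofon</det>'
          '<bw>engla</bw></kenning></det><bw>cyning</bw></kenning>')

def bracket(kenning):
    """Render [det + bw], recursing into kennings nested inside a part."""
    def part(tag):
        el = kenning.find(tag)
        if el is None:
            return "?"
        inner = el.find("kenning")
        return bracket(inner) if inner is not None else (el.text or "").strip()
    result = f"[{part('det')} + {part('bw')}]"
    referent = kenning.findtext("refer")
    return f"{result} = {referent}" if referent else result

print(bracket(ET.fromstring(SIMPLE)))  # [beado + leoma] = sweord
print(bracket(ET.fromstring(NESTED)))  # [[heofon + engla] + cyning]
```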

27
A Work in Progress...
  • Coming soon on the Digital Medievalist site
  • http://www.digitalmedievalist.org/
  • Collaborative edition on the wiki
  • Metrical-markup list for discussion
    (metricalmarkup-l_at_uleth.ca)
  • Feel free to ask and/or suggest!

28
Conclusion
  • The Digital Vercelli Book team
  • Federica Goria
  • Raffaele Cioffi
  • Emilia Di Maio
  • Roberto Rosselli Del Turco
  • The Metrical Markup team
  • Dorothy Carr Porter
  • Daniel Paul O'Donnell
  • Roberto Rosselli Del Turco