
Developing a Digital Library for the Humanities
  • Gregory Crane
  • Winnick Family Chair in Technology and Entrepreneurship
  • Professor of Classics
  • Director, Perseus Digital Library Project
  • http://www.perseus

Perseus Digital Library
  • Ongoing areas of development
  • 1987: DL on Classical Greek Culture
  • 1993: History of Science
  • 1996: Began work on Latin and Rome
  • 1997: Early Modern English
  • 1999: History and Topography of London
  • 2000: Ancient Egyptian Giza
  • 2000: Slavery and the US Civil War

Partner Institutions
  • Max Planck Institute for the History of Science
  • Museum of Fine Arts, Boston
  • Stoa Publishing Consortium
  • New Variorum Shakespeare Series, Modern Language
  • Special Collections at Tufts, Brandeis, the
    University of Pennsylvania

On-Going Support
  • National Endowment for the Humanities (DLI2,
    Preservation and Access, Education)
  • National Science Foundation (DLI2)
  • Fund for the Improvement of Postsecondary
    Education, Dept of Ed.
  • Max Planck Society

The Whole greater than the sum
  • Tufts Health Sciences Database
  • An on-line Medical School Curriculum
  • First iteration: 70% of the value
  • Second iteration: 90%
  • Third iteration: 130%
  • Data and system interact in increasingly
    dynamic ways.

Persistent value over time and space
  • "How many ages hence / Shall this our lofty scene
    be acted over, / In states unborn and accents yet unknown?"
  • Cassius in Julius Caesar
  • How do we structure data for:
  • Contemporary users we can't directly anticipate?
  • Systems not yet designed?

Radically New Documents
  • Reconstructions of historical spaces, e.g.:
  • UVA's Crystal Palace (London)
  • UCLA's Rome and VR Lab
  • Integrating Virtual Spaces with Sources
  • Museum of Fine Arts, Tombs at Giza
  • Greek Sculpture
  • The Streets of 19th Century London

Traditional Docs Rethought
  • Concordance: obsolete
  • Bibliographies: databases
  • Encyclopedias: automatic linking
  • Lexica and lexicography
  • FIRST automatically discovered semantic relations
  • THEN lexicographic work

Development is two-part
  • Ultimate end: radically new docs?
  • Short term: electronic incunabula
  • New Variorum Shakespeare
  • Electronic Marlowe
  • Tallis Street Maps
  • FIRST we thoroughly analyze what we have
  • THEN radical redesign emerges

Technology outruns Practice
  • The 3D Reconstruction/Virtual Space
  • Cutting edge technology
  • Still nascent scholarly practices
  • Mature Document Structures
  • Textual notes: 1908 Richard III
  • Traditional text citations: 1887 commentary

The More Things Stay the same...
  • Content can remain unchanged
  • Presentation is dynamic and flexible
  • The Dictionary knows what you are reading
  • Citations -> bidirectional links
  • Automatic linking by keyword
  • Text and atlas: plot sites in a document
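
The step from one-way citations to bidirectional links can be illustrated with a small index. This is a minimal sketch, not the Perseus implementation: the function name `build_link_index` and the sample citation pairs are assumptions made for illustration.

```python
from collections import defaultdict


def build_link_index(citations):
    """Turn one-way (source, target) citations into a two-way lookup."""
    outgoing = defaultdict(set)  # citing document -> passages it cites
    incoming = defaultdict(set)  # cited passage -> documents that cite it
    for source, target in citations:
        outgoing[source].add(target)
        incoming[target].add(source)
    return outgoing, incoming


# Hypothetical citation pairs: (citing document, cited passage).
citations = [
    ("commentary-1887", "Thuc. 1.1"),
    ("lexicon", "Thuc. 1.1"),
    ("commentary-1887", "Hdt. 1.1"),
]
outgoing, incoming = build_link_index(citations)

# A reader of the cited passage can now see everything that refers to it.
print(sorted(incoming["Thuc. 1.1"]))
```

The content of the citing documents is untouched; only the index is new, which is the sense in which content stays fixed while presentation changes.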

Current Paradigm: DL Diplomacy
  • Monolithic Systems (e.g., Perseus!)
  • One way to view each document
  • Intercommunication via metadata
  • DL as metadata for opaque objects
  • Major Problems
  • Renting access, rather than collecting content
  • All publications become ephemera

Three Strategies
  • 1) The Editing Problem
  • How do real authors create structured docs?
  • 2) Developing Radically New Docs
  • Archimedes DL on Mechanics
  • MFA Excavations at Giza
  • 3) Radical Repurposing of Print
  • Bolles Collection on London

Bolles Collection at Tufts
  • Documenting the history and topography of London
    and its environs
  • 35 "full-size" maps
  • 320 more specialized maps
  • 400 books (284 linear feet of shelf space)
  • 1,000 pamphlets
  • Paper Hypertexts
  • 10,000 extra illustrations

Bolles Electronic Archive
  • A Testbed for the Perseus Digital Library
  • Level 5 TEI Encoded Full Text
  • Quotes, languages, proper names, dates, money
  • High-end OCR and Double Keyboarding
  • OCR is ideal for some texts but not all
  • Keyboarding is much the best, money permitting

Bolles Initial Texts
  • Five Million Words now in L5 TEI
  • Will exceed 10 million by year's end
  • Surveys of London History and Topography
  • Stow, Maitland, Wilkinson, Allen, Thornbury
  • Commentary on social conditions
  • Mayhew, Archer, Hollingshead, Booth
  • Literary works with London as backdrop
  • Defoe, Dickens, Sherlock Holmes

  • 10,000 Grayscale Images
  • Mainly engravings of people and places
  • Opportunistic metadata (captions and context)
  • 2,400 Contemporary Images
  • Well catalogued and geo-referenced
  • QTVR Panoramas
  • 70 Tallis Map Elevations

Geospatial Data
  • Bartholomew 1:5000 data set for London
  • Modern data as reference and interchange
  • Historical maps georeferenced to Bartholomew data
  • 10 so far (c. 2 hours each)
  • Urban maps do not easily line up
  • How to create a historical GIS?
  • GPS waypoints
  • As of May 2000, good to within 10 m or better
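
Georeferencing a scanned map to modern reference data can be sketched as fitting a transform from control points. The example below assumes the simplest case, an exact affine transform from three control points, which is an illustrative simplification; the slides note that urban maps do not line up easily, so real work would need more points and a more flexible model. All function names and coordinates are hypothetical.

```python
def solve3(m, v):
    """Solve the 3x3 linear system m * p = v with Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for col in range(3):
        # Replace one column of m with v, as Cramer's rule requires.
        replaced = [row[:col] + [v[i]] + row[col + 1:] for i, row in enumerate(m)]
        solution.append(det(replaced) / d)
    return solution


def fit_affine(pixel_pts, map_pts):
    """Return a function mapping scan pixels to reference-map coordinates."""
    m = [[float(x), float(y), 1.0] for x, y in pixel_pts]
    ax = solve3(m, [float(px) for px, _ in map_pts])
    ay = solve3(m, [float(py) for _, py in map_pts])

    def to_map(x, y):
        return (ax[0] * x + ax[1] * y + ax[2], ay[0] * x + ay[1] * y + ay[2])

    return to_map


# Hypothetical control points: the same three street corners located on the
# scanned historical map (pixels) and in the modern reference data (metres).
pixel_pts = [(0, 0), (100, 0), (0, 100)]
map_pts = [(1000, 2000), (1050, 2000), (1000, 1950)]
to_map = fit_affine(pixel_pts, map_pts)
print(to_map(50, 50))
```

With these made-up points each pixel is half a metre and the vertical axis is flipped, so the centre of the scan lands midway between the three corners.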

Feature Extraction
  • Easy identification: dates, money
  • Known keywords and classes
  • The Getty TGN (1 million places with lon/lats)
  • The Bartholomew Gazetteer (10,000)
  • Indices to maps (e.g., Cruchley 1826: 4,200)
  • The Index/Abstract of the DNB (30,000)
  • Clean-up with rule-based proper name
    classification: "Mr NAME", "NAME street"
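
The easy cases and the rule-based clean-up can be sketched with a few regular expressions. This is a minimal illustration under stated assumptions: the patterns, the function name `extract_features`, and the sample sentence are invented here, and the two name rules are one reading of the "Mr NAME" and "NAME street" heuristics, not the project's actual rules.

```python
import re

# Assumed patterns: four-digit years beginning with 1, and pound amounts.
YEAR = re.compile(r"\b1[0-9]{3}\b")
MONEY = re.compile(r"£\d+(?:,\d{3})*(?:\.\d+)?")
# Rule-based name classification: a title marks a person, "Street" a place.
PERSON = re.compile(r"\bMr\.?\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)")
PLACE = re.compile(r"\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\s+[Ss]treet\b")


def extract_features(text):
    """Return the dates, money, people and places found in a passage."""
    return {
        "dates": YEAR.findall(text),
        "money": MONEY.findall(text),
        "people": PERSON.findall(text),
        "places": [f"{name} Street" for name in PLACE.findall(text)],
    }


# Made-up sample sentence; "Coleman" is a person once and a street once.
sample = "In 1826 Mr John Coleman paid £1,200 for a house in Coleman Street."
features = extract_features(sample)
for kind, values in features.items():
    print(kind, values)
```

The same surface string is classified differently depending on its context, which is the point of running such rules after gazetteer lookup rather than instead of it.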

Runtime Links
  • Runtime links supplement in-file tagging
  • 1) Where metadata is less precise
  • Metadata from unedited headers and captions
  • 2) Where the source does not contain data
  • If no dates, then scan for them
  • Use tagging for high-confidence data
  • Ideal situation: automated tags, hand-proofed
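
The "if no dates, then scan for them" rule can be sketched as a lookup that trusts in-file tags when they exist and falls back to scanning at runtime. The element name `date` follows TEI usage; the function name `dates_for`, the year pattern, and the sample fragments are assumptions for illustration only.

```python
import re
import xml.etree.ElementTree as ET

# Assumed pattern for the fallback scan: four-digit years beginning with 1.
YEAR = re.compile(r"\b1[0-9]{3}\b")


def dates_for(xml_text):
    """Prefer hand-checked <date> tags; otherwise scan the plain text."""
    root = ET.fromstring(xml_text)
    tagged = [d.text for d in root.iter("date") if d.text]
    if tagged:
        return "tagged", tagged
    plain = "".join(root.itertext())
    return "scanned", YEAR.findall(plain)


# Two hypothetical fragments: one with in-file tagging, one without.
print(dates_for("<p>Built in <date>1666</date>.</p>"))
print(dates_for("<p>Rebuilt in 1710 after the fire of 1666.</p>"))
```

Returning the provenance alongside the values lets a display layer treat scanned results as lower-confidence than tagged ones.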

Strategic Questions
  • Editions: a foundation for scholarship
  • Where does the editor's job start?
  • How does the editor's job change?
  • How do we define Corpus Editors?
  • People with domain expertise in content
  • Expertise in software and Library systems
  • Need for scholarly automated processing

Delivering Integrated Data
  • Good and rough maps for Cicero's Letters
  • "Coleman" delivers quite useful results
  • Map locates Coleman Street
  • Streets in the description of "Portsoken Ward"
  • Historical views of this section of London
  • Timeline 1: a linear history
  • Timeline 2: encyclopedic scatter

Further Work
  • Disambiguation, auto-cataloguing, time/space
  • VR interface: Tallis 1, 2 and headset
  • New challenging document types
  • Geospatial Data in Patterson's Journeys
  • Urban data in Booth and City Directories.
  • Tallis Map for Oxford Street with overall and
    more focused directories.

Research Projects
  • Robert Jacob and VR Interfaces
  • Figure: Tallis VR conversion 1
  • Figure: Tallis VR conversion 2
  • Figure: Head-mounted VR navigation
  • Holly Taylor and Cognitive Analysis
  • Spatial Cognition
  • Text Comprehension

  • Baseline Knowledge Environment
  • Practical and useful
  • Corpus Editions
  • Midway between editions and library digitization
  • Requires a new configuration of skills
  • The diplomatic, federated DL model is weak
  • Need access to full data for visualizations

Perseus Document Manager
  • Works with XML
  • Multiple granularities: sentence, section, etc.
  • Deals with overlapping document hierarchies
  • Combines internal and external metadata
  • Our metadata is in RDF and can be expressed as XML
  • Since all data and metadata -> XML:
  • Well suited to federated DL applications
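
Overlapping hierarchies (for example, pages versus sentences) cannot both be nested in one XML tree, and one common workaround is to record each unit as a span of character offsets. The sketch below illustrates that general standoff idea; it is not a description of how the Perseus Document Manager works, and the class, function names, and sample text are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    kind: str   # which hierarchy the unit belongs to, e.g. "page" or "sentence"
    label: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


def overlapping(span, spans, kind):
    """Return units of the given kind that share any text with span."""
    return [s for s in spans
            if s.kind == kind and s.start < span.end and span.start < s.end]


text = "First sentence here. Second sentence crosses a page break. Third."


def locate(kind, label, snippet):
    """Build a span from the position of a snippet in the sample text."""
    start = text.index(snippet)
    return Span(kind, label, start, start + len(snippet))


spans = [
    Span("page", "p1", 0, 40),          # assumed page break at offset 40
    Span("page", "p2", 40, len(text)),
    locate("sentence", "s1", "First sentence here."),
    locate("sentence", "s2", "Second sentence crosses a page break."),
    locate("sentence", "s3", "Third."),
]
s2 = spans[3]
print([p.label for p in overlapping(s2, spans, "page")])
```

Because both hierarchies are just lists of offsets over the same text, the same query serves any granularity: which sentences fall on a page, or which pages a section covers.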

Scalable DL
  • SGML/XML need translation for display
  • Can't maintain stylesheets for millions of docs
  • Intelligent display of various DTDs
  • Cheaply acquires XML/SGML docs
  • Individual Custom Style sheets allowed
  • Integration of Geo-spatial Data
  • Multilingual support, feature extraction
  • Integrated multi-resolution image support
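
One way to avoid a stylesheet per document is a shared table of default display rules with optional per-collection overrides. The following is a minimal sketch of that idea under stated assumptions: the rule table, the `render` function, and the sample fragment are invented for illustration and do not describe the actual system.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumed default rules: source element name -> HTML element used for display.
DEFAULT_RULES = {"div": "div", "head": "h2", "p": "p",
                 "quote": "blockquote", "name": "em"}


def render(elem, overrides=None):
    """Render an XML element as HTML, using defaults plus optional overrides."""
    rules = {**DEFAULT_RULES, **(overrides or {})}

    def walk(e):
        tag = rules.get(e.tag, "span")  # unknown elements fall back to <span>
        inner = escape(e.text or "")
        for child in e:
            inner += walk(child) + escape(child.tail or "")
        return f"<{tag}>{inner}</{tag}>"

    return walk(elem)


doc = ET.fromstring(
    "<div><head>Cheapside</head><p>Home of <name>Mr Stow</name>.</p></div>"
)
print(render(doc))                      # shared defaults
print(render(doc, {"name": "strong"}))  # a custom rule for one collection
```

Documents in an unfamiliar DTD still display, if plainly, because unknown elements degrade to a neutral wrapper instead of failing.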

Perseus Document Manager
  • Short-term development
  • Collecting new datasets into the Perseus DL
  • (leveraging Internet 2 investment)
  • Adding value, e.g.:
  • Sources for the History of Mechanics (Max Planck)
  • Duke Databank of Documentary Papyri
  • Books, maps etc. on the City of London
  • Shakespeare and Early modern English

Perseus Document Manager
  • Longer term: distribution of the system
  • How best to maintain and expand the system?
  • Open source?
  • Commercial Licensing?
  • Wait for third party to match PDM features?

Automatic Integration
  • Content analysis: various languages
  • Time: extracting and visualizing dates
  • Space: integrating historical geographic data
  • Names: establishing authority lists
  • Getty Thesaurus of Geographic Names
  • Names and coordinates
  • Encyclopedias, e.g., Harper's, DNB
  • Names and Dates
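
Extracting and visualizing dates can be sketched as counting year mentions per decade and drawing a text timeline. This is an illustrative sketch only: the year pattern, the function names, and the sample passage are assumptions, and a real timeline would need to handle date ranges, other centuries, and false positives.

```python
import re
from collections import Counter

# Assumed pattern: four-digit years beginning with 1.
YEAR = re.compile(r"\b1[0-9]{3}\b")


def decade_histogram(text):
    """Count year mentions per decade, sorted chronologically."""
    decades = Counter((int(year) // 10) * 10 for year in YEAR.findall(text))
    return sorted(decades.items())


def draw(histogram):
    """Draw one line per decade with one mark per mention."""
    return "\n".join(f"{decade}s {'#' * count}" for decade, count in histogram)


# Made-up sample passage.
passage = ("The fire of 1666 followed the plague of 1665. "
           "The cathedral was finished in 1710.")
histogram = decade_histogram(passage)
print(draw(histogram))
```

A narrative history tends to produce a steady run of decades, while an encyclopedic source scatters mentions widely, so even this crude plot distinguishes the two kinds of document.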

Our Research Agenda
  • Developing self-sustaining models
  • Publication of documents
  • Maintenance of software
  • Exploring Problem Sets in different domains
  • E.g., sparse data (antiquity) vs. rich (London)
  • Helping humanists rethink their position
  • Reaching new audiences
  • Changing habits

Technology matters: e.g., 19th c. Printing in
  • 20th century: radio/film/TV ambiguous
  • 19th century: print technology
  • 1810: c. 10,000 copies for a successful book
  • Audience for literature mainly upper class
  • 1850: hundreds of thousands
  • Audience vastly expands
  • Huge numbers read Dickens, etc.
  • 21st century: network technology?

The Future?
  • Two models
  • Reproduce current world in new form
  • Narrow/expensive distribution
  • Think about how that world may change
  • Broader/inexpensive distribution
  • What happens now sets the stage for
  • talk show cyber culture? or
  • a new dispersal of intellectual life?