Presenting Documents - PowerPoint PPT Presentation

Slides: 16
Provided by: cdau

Transcript and Presenter's Notes

1
Presenting Documents
  • How to Build a Digital Library
  • Ian H. Witten and David Bainbridge

2
Questions
  • What form are the documents in?
  • What structure do the documents have?
  • Which kinds of access do you want to provide?
  • What metadata is available?
  • How do you want to present the documents?

3
Presenting Documents
  • Structured documents (hierarchy)
  • Unstructured text documents
  • Page images
  • Page images and extracted text
  • Audio and photographic images
  • Video
  • Music
  • Foreign Language

4
Hierarchically Structured Text
  • Table of contents
  • Chapter, section, subsection, etc.
  • Granularity of document?
  • Example: Humanity Development Library
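
As a minimal sketch of the chapter/section/subsection hierarchy above, a document can be held as a nested structure and flattened into a table of contents. The `book` data and the function name `table_of_contents` are hypothetical and chosen only for illustration.

```python
def table_of_contents(node, depth=0):
    """Return indented title lines for a node and all of its descendants."""
    lines = ["  " * depth + node["title"]]
    for child in node.get("children", []):
        lines.extend(table_of_contents(child, depth + 1))
    return lines


# Hypothetical document: each node has a title and optional children.
book = {
    "title": "Book",
    "children": [
        {
            "title": "Chapter 1",
            "children": [{"title": "Section 1.1"}, {"title": "Section 1.2"}],
        },
        {"title": "Chapter 2"},
    ],
}

if __name__ == "__main__":
    print("\n".join(table_of_contents(book)))
```

Granularity then becomes a question of which level of this tree is treated as one searchable unit.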

5
Unstructured Text
  • Long scroll of plain text
  • Structure unknown to the digital library system
  • Browsing is less convenient
  • Pages of document may not correspond to physical
    pages of book
  • Example: Project Gutenberg Collection

6
Page Images
  • Digitized images of the document's pages
  • Document accuracy
    • OCR is error-prone
    • Duplicating layout is difficult
  • Space requirements
    • Requires 20 times more storage space than text
    • Increased download time
  • Need for text representation for searching
  • Difficult to highlight search terms on an image

7
Page Images and Extracted Text
  • Provide page images and extracted text
  • Search on extracted text
  • View image or extracted text
  • Example: Maori Newspaper Collection

8
Other Document Types
  • Audio and photographic images
    • Example: Oral History Collection
  • Video
    • Example: Music Video Collection
  • Music
    • Representations: printed notation, MIDI,
      synthesized performance, human performance
    • Example: Music Digital Library
  • Multiple Languages
    • Interface and/or documents
    • Example: Arabic Collection

9
Metadata
  • Provides information to facilitate access
  • Structured
  • Standardized

10
Metadata Examples
  • Conventional bibliographic listing
    • Title
    • Author
    • Date
    • Publication
    • Volume Number
    • Issue Number
    • Page Numbers
  • MARC
  • Dublin Core
  • METS
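
As a rough illustration of structured, standardized metadata, the sketch below keeps one record as a dictionary keyed by Dublin Core element names and renders it as HTML `<meta>` tags with the conventional `DC.` prefix. The record reuses the title and authors from slide 1; the `type` and `language` values and the helper name `to_meta_tags` are assumptions made for this example.

```python
from html import escape

# Keys are Dublin Core element names; values are illustrative.
record = {
    "title": "How to Build a Digital Library",
    "creator": ["Ian H. Witten", "David Bainbridge"],
    "type": "Text",      # assumed value
    "language": "en",    # assumed value
}


def to_meta_tags(rec):
    """Render each element (repeating multi-valued ones) as a <meta> tag."""
    tags = []
    for element, value in rec.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            tags.append(f'<meta name="DC.{element}" content="{escape(v)}">')
    return tags


if __name__ == "__main__":
    for tag in to_meta_tags(record):
        print(tag)
```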

11
Metadata Aspects
  • Historical
    • Describes provenance and preservation history
  • Functional
    • Describes usage, condition and audience
  • Technical
    • Describes interoperability requirements
  • Relational
    • Describes links and citations
  • Intellectual
    • Describes content or subject

12
Searching
  • Types of query
    • Boolean
    • Ranked
  • Case-folding and stemming
  • Phrase searching
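
The search features listed above can be sketched with a small positional inverted index. This is an illustration under stated assumptions, not how any particular digital library system implements search: `stem` is a toy suffix stripper (not the Porter algorithm), the ranking is a plain term-frequency count, and the three sample documents are made up.

```python
from collections import defaultdict


def stem(word):
    """Toy stemmer: strips a few common suffixes. Not the Porter algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    """Case-fold, split on whitespace, strip punctuation, then stem."""
    out = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if word:
            out.append(stem(word))
    return out


def build_index(docs):
    """Map each term to {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(terms(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def boolean_and(index, query):
    """Boolean query: documents containing every query term."""
    sets = [set(index.get(t, {})) for t in terms(query)]
    return set.intersection(*sets) if sets else set()


def ranked(index, query):
    """Ranked query: order documents by total occurrences of query terms."""
    scores = defaultdict(int)
    for t in terms(query):
        for doc_id, positions in index.get(t, {}).items():
            scores[doc_id] += len(positions)
    return sorted(scores, key=lambda d: (-scores[d], d))


def phrase(index, query):
    """Phrase query: documents where the query terms occur consecutively."""
    q = terms(query)
    hits = set()
    if not q:
        return hits
    for doc_id, starts in index.get(q[0], {}).items():
        for start in starts:
            if all(start + i in index.get(t, {}).get(doc_id, [])
                   for i, t in enumerate(q)):
                hits.add(doc_id)
                break
    return hits


# Made-up sample collection.
docs = {
    "d1": "Searching digital collections",
    "d2": "Digital collection search and browsing",
    "d3": "Browsing page images",
}
index = build_index(docs)
```

Because queries go through the same `terms` function as documents, "Digital SEARCHES" matches both "Searching digital collections" and "Digital collection search and browsing".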

13
Browsing
  • Based on metadata
  • Browsing alphabetical lists
    • Chinese is not alphabetic
  • Browsing by date
  • Browsing structures
    • Hierarchical classification structures
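
A minimal sketch of metadata-based browsing: records are grouped into buckets by a key derived from one metadata field, such as the first letter of the title or the year of the date. The collection names come from the earlier slides; the dates and the helper name `browse_by` are made up for illustration.

```python
from collections import defaultdict


def browse_by(records, key_func):
    """Group record titles into sorted buckets keyed by key_func(record)."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[key_func(rec)].append(rec["title"])
    return {key: sorted(titles) for key, titles in sorted(buckets.items())}


# Dates are invented for this example.
records = [
    {"title": "Oral History Collection", "date": "2001-03-10"},
    {"title": "Music Video Collection", "date": "2000-07-02"},
    {"title": "Arabic Collection", "date": "2001-11-20"},
    {"title": "Music Digital Library", "date": "2000-01-15"},
]

by_letter = browse_by(records, lambda r: r["title"][0].upper())
by_year = browse_by(records, lambda r: r["date"][:4])
```

First-letter bucketing assumes an alphabetic script, which is why the slide notes that it does not carry over to Chinese; such collections need a different ordering key.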

14
Phrase Browsing
  • Phrase: any sequence of words appearing more than
    once in the collection
  • Automatic phrase extraction
  • Key phrases
  • Phrase browser
  • Phrase hierarchy
  • Sorted by document and collection frequencies
  • Leaves are documents
  • Example: The Complete Works of Shakespeare
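
The definition of a phrase above (any word sequence that appears more than once in the collection) can be sketched by brute-force counting of word n-grams. This is only an illustration: the length limits `min_len` and `max_len` are assumptions added to keep the count small, the sample texts are made up, and a real phrase browser would likely use a more efficient extraction method.

```python
from collections import Counter


def repeated_phrases(docs, min_len=2, max_len=4):
    """Return {phrase: count} for word sequences seen more than once."""
    counts = Counter()
    for text in docs:
        words = text.lower().split()
        for n in range(min_len, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return {p: c for p, c in counts.items() if c > 1}


if __name__ == "__main__":
    sample = ["to be or not to be", "not to be taken lightly"]
    for p, c in sorted(repeated_phrases(sample).items()):
        print(c, p)
```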

15
Browsing Using Extracted Metadata
  • Acronyms
    • Example: Acronym Extraction Demo
  • Language identification
    • Example: Language Extraction Demo
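
One simple way to extract acronyms as metadata is to look for a parenthesized run of capitals and check it against the initials of the words just before it. The sketch below is a heuristic written for illustration, not the method used by the demo named above; the function name `extract_acronyms` and the sample text are assumptions.

```python
import re


def extract_acronyms(text):
    """Return {acronym: expansion} where initials of preceding words match."""
    found = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        acronym = match.group(1)
        # Words before the parenthesis; take as many as the acronym has letters.
        before = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = before[-len(acronym):]
        initials = "".join(word[0].upper() for word in candidate)
        if initials == acronym:
            found[acronym] = " ".join(candidate)
    return found


sample = (
    "Optical character recognition (OCR) is error-prone. "
    "Musical Instrument Digital Interface (MIDI) files are compact. "
    "Machine-Readable Cataloging (MARC) is missed by this heuristic."
)
```

The MARC case shows the limit of the heuristic: acronyms that take more than one letter from a word do not match the word initials and are skipped.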