Texts and Digital Objects - PowerPoint PPT Presentation

About This Presentation
Title:

Texts and Digital Objects

Slides: 34
Provided by: adamh163
Transcript and Presenter's Notes

Title: Texts and Digital Objects


1
Texts and Digital Objects
  • What seems to have changed

2
The web as universal library
  • Generation I: the ASCII text
  • Generation II: the XML text
  • Generation III: the book as object

3
The web as universal library
  • Generation I: the ASCII text
  • A web of text nodes with documents at the nodes
  • Generation II: the XML text
  • A web where the documents retain deep structure
    but the web is still the library
  • Generation III: the book as object
  • The library will be imported to the web. Page by
    page. Library by library. The web is simply a way
    of accessing the universal library of print
    objects.

4
But are we going backwards?
5
But are we going backwards?
  • Some of the movement looks a trifle retrograde

6
Generation I
  • The primacy of texts
  • "Nodes can in principle also contain non-text
    information such as diagrams, pictures, sound,
    animation etc. The term hypermedia is simply the
    expansion of the hypertext idea to these other
    media." (Tim Berners-Lee, 1989 proposal for a WWW,
    written at CERN)
  • Texts: hypertext, HTTP, and ASCII will do

7
Generation I circa 1995
  • A forest of connected texts which frankly doesn't
    look too great.

8
(No Transcript)
9
Project Gutenberg
  • Texts are what matter
  • Accuracy matters
  • Page numbering doesn't
  • Typography doesn't matter either

10
But a good deal is lost
  • Typography may not matter, but good web design
    does
  • Typography carries a lot of meta-data
  • Meta-data and the formal structure of the text
    need to be kept
  • Variety, flexibility, and machine-readability:
    XML
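
A minimal sketch of what "keeping the formal structure" can mean in practice. The XML fragment and its element names (book, chapter, title, para, emph) are invented for illustration and do not follow any particular schema. The same content is read once as flat text, as in Generation I, and once through its structure, as in Generation II.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up fragment; the element names are assumptions.
SAMPLE = """<book>
  <chapter n="1">
    <title>Loomings</title>
    <para>Call me <emph>Ishmael</emph>.</para>
  </chapter>
</book>"""


def flat_text(root):
    """Collapse the document to plain text, discarding all structure."""
    return " ".join("".join(root.itertext()).split())


def chapter_titles(root):
    """Use the markup to list (chapter number, title) pairs."""
    return [(ch.get("n"), ch.findtext("title")) for ch in root.iter("chapter")]


def emphasised(root):
    """Recover what typography would have shown as emphasis."""
    return [e.text for e in root.iter("emph")]


root = ET.fromstring(SAMPLE)
```

In the flat version the chapter title and the emphasis can no longer be told apart from the running text. In the marked-up version both remain queryable by a machine.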

11
(No Transcript)
12
Generation II circa 2000
  • Books repurposed for the web look a lot better
    than flat ASCII. But there is a big overhead.

13
Republished for the web
  • Inevitable duplication
  • Page numbers don't matter
  • Typography can be optimised for web browsers
  • Structure and added value are preserved
  • Links and HTTP connections are fine
  • But this re-purposing is a hassle and ultimately
    confusing

14
So Google has a better idea
  • Words matter
  • Pages matter
  • Books matter
  • Libraries matter
  • And they should be searched in the way that all
    other digital objects and collections can be
    searched

15
Generation III circa 2005
  • Put books on the web just as they are. Books, not
    texts, are the primary resource for a library.

16
Keep it simple
  • Scan every page of every book
  • OCR every word and symbol
  • Store every word and symbol in a database
  • Store an image of every page in the database
  • Know precisely where every word is on every page
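
The steps above can be sketched as a small data model. This is an illustration under stated assumptions, not a description of Google's actual system: the table layout, the function names build_index and find_word, and the shape of the OCR output (a word plus its bounding box in page pixels) are all hypothetical.

```python
import sqlite3


def build_index(pages):
    """Store page images and OCR'd words with their positions.

    pages: iterable of (book_id, page_no, image_bytes, words), where
    words is a list of (text, x, y, width, height) tuples. This input
    shape is an assumption about what an OCR step might produce.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE page (book_id TEXT, page_no INTEGER, image BLOB, "
        "PRIMARY KEY (book_id, page_no))"
    )
    db.execute(
        "CREATE TABLE word (book_id TEXT, page_no INTEGER, seq INTEGER, "
        "text TEXT, x INTEGER, y INTEGER, w INTEGER, h INTEGER)"
    )
    db.execute("CREATE INDEX word_text ON word (text)")
    for book_id, page_no, image, words in pages:
        # One row per page image.
        db.execute("INSERT INTO page VALUES (?, ?, ?)", (book_id, page_no, image))
        # One row per word, lower-cased for matching, with its position.
        db.executemany(
            "INSERT INTO word VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            [
                (book_id, page_no, seq, text.lower(), x, y, w, h)
                for seq, (text, x, y, w, h) in enumerate(words)
            ],
        )
    db.commit()
    return db


def find_word(db, term):
    """Return (book_id, page_no, x, y, w, h) for every occurrence of term."""
    rows = db.execute(
        "SELECT book_id, page_no, x, y, w, h FROM word "
        "WHERE text = ? ORDER BY book_id, page_no, seq",
        (term.lower(),),
    )
    return rows.fetchall()
```

Because each word row carries its page and bounding box, a search hit identifies both which page image to fetch and where on that image the word sits.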

17
How the Google system works
  • The browser has a JPEG and some HTML around it
  • The web page is an image with search terms
    highlighted
  • The intelligence is in the database
  • Search is precise and fast
  • The Google database would be the universal library
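
A rough sketch of "a JPEG and some HTML around it": the server sends the page image and lays translucent boxes over the matched words. The function name render_page, the class names, and the inline styling are assumptions made for illustration. The actual markup Google serves is not described in these slides.

```python
from html import escape


def render_page(image_url, width, height, hits):
    """Return an HTML fragment showing a page image with highlighted hits.

    hits: list of (x, y, w, h) boxes in image pixels, for example the
    word positions looked up in the database for the search term.
    """
    box_style = "position:absolute;background:rgba(255,255,0,0.4);"
    boxes = "".join(
        f'<div class="hit" style="{box_style}'
        f'left:{x}px;top:{y}px;width:{w}px;height:{h}px"></div>'
        for x, y, w, h in hits
    )
    # The wrapper is positioned relatively so the boxes align with the image.
    return (
        f'<div class="page" style="position:relative;'
        f'width:{width}px;height:{height}px">'
        f'<img src="{escape(image_url, quote=True)}" '
        f'width="{width}" height="{height}" alt="">'
        f"{boxes}</div>"
    )
```

The browser does no text processing at all here. It only displays an image with rectangles on top, which is why the intelligence can stay in the database.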

18
(No Transcript)
19
(No Transcript)
20
Pages really matter
  • Every print page is a web page
  • A book is just a collection of web pages
  • The concept of a union catalogue will now have
    its correlative: a union library collection (i.e.
    what is a duplicate?)
  • There is no such thing as a Google edition
  • Are the Google standards of preservation good
    enough?

21
Simplicity and Conservatism
  • Publishers should be flattered
  • Book designers, editors and typographers should
    be more than flattered
  • Authors are still authors
  • Catalogues and references work with minimal
    adjustment
  • Book warehouses become obsolete

22
So what is lost?
  • Perhaps publishers and authors lose profits?
  • The text is lost. The text is readable and
    searchable. But there is no text.
  • A searchable text, but not an entire and complete
    text. A collection of pages (JPEGs).
  • Certainly none of the deep structure of the XML
    is retained
  • Linkages and references are absent

23
What is gained?
  • Books: all texts, documents and libraries become
    fully searchable.
  • Automation of reading and accessibility of rare
    editions.
  • Incredibly cheap in relation to the enhanced
    availability
  • Bibliographies and Catalogues and other systems
    of metadata are preserved

24
There is much left to do
  • No fine structure in the pages
  • Poor navigation within the books
  • The commercial model has to be invented
  • It will not all be advertising driven

25
Exact Editions uses a Google-style platform for
magazines
  • Technology is similar but the sociology is
    different.

26
(No Transcript)
27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
(No Transcript)
31
(No Transcript)
32
Similar to Google Book Search
  • Platform for publishers of magazines
  • Publishers can add web functionality (links and
    advertisements)
  • PDF as input and automated production
  • Subscription or free access
  • Full web functionality (statistics and
    integration with web apps)

33
Adam Hodgkin
  • adam.hodgkin_at_exacteditions.com