Digital Preservation (E-Archiving) - PowerPoint PPT Presentation

Digital Preservation (E-Archiving) Marta Melgar García mmelgar_at_ine.es – PowerPoint PPT presentation

Slides: 50
Provided by: Joseb155
1
Digital Preservation (E-Archiving)
Marta Melgar García mmelgar_at_ine.es
2
Presentation Index
  • Introduction
  • Digital Preservation Strategies
  • Digital Preservation Problems
  • INE Journals digital repository
  • INEBase History
  • Our Virtual Library
  • Project Phases
  • The Technical Process in 3 steps
  • The Publisher
  • Visualization On Internet
  • Interesting Data
  • IT Data

3
Introduction
Digital Preservation definition
  • Digital preservation combines policies,
    strategies and actions that ensure access to
    information in digital formats over time.
  • Publications will be available and accessible
    for generations to come.
  • Source: American Library Association

4
Digital Preservation strategies
  • Digital preservation strategies and actions
    address content creation, integrity and
    maintenance.
  • Planning
  • Content creation
  • Content integrity
  • Content maintenance
  • Problems
  • Source: ALA

5
Digital Preservation strategies
  • Content creation includes:
  • Clear and complete technical specifications
  • Production of reliable master files
  • Sufficient descriptive, administrative and
    structural metadata to ensure future access
  • Detailed quality control of processes

6
Digital Preservation strategies
  • Program planning, management and evaluation
    should consider:
  • Risk assessment and management.
  • Cost-benefit analysis.
  • Legal issues.
  • The role of file formats, standards and metadata.
  • Storage and maintenance.
  • Disaster planning.
  • The relationship between preservation and access.
  • Preservation strategies, approaches, and
    methodologies.
  • Technology forecasting for preservation.
  • Source: Cornell University Library

7
Digital Preservation strategies
  • Content integrity includes:
  • Documentation of all policies, strategies and
    procedures
  • Use of persistent identifiers
  • Recorded provenance and change history for all
    objects
  • Verification mechanisms
  • Attention to security requirements
  • Routine audits
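As an illustration of the "verification mechanisms" and "routine audits" items above, here is a minimal sketch of a checksum-based fixity audit. It is not taken from the presentation: the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files on disk with a previously stored manifest."""
    current = build_manifest(root)
    shared = manifest.keys() & current.keys()
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(k for k in shared if manifest[k] != current[k]),
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

A manifest would be stored when the master files are produced; a routine audit then reruns `audit` and reports any file that is missing, changed or unexpected.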

8
Digital Preservation strategies
  • Content maintenance includes:
  • A computing and networking infrastructure
  • Storage and synchronization of files at multiple
    sites
  • Continuous monitoring and management of files
  • Programs for refreshing, migration and emulation
  • Written disaster prevention and recovery plans
  • Periodic review and updating of policies and
    procedures

9
Digital Preservation problems
  • We have to preserve records in an electronic
    era where change and speed are valued more highly
    than conservation and longevity.
  • Enormous amounts of digital information are
    already lost forever.
  • Information technologies become obsolete
    within a short period of time. This dynamic creates
    an unstable and unpredictable environment for
    the continuance of hardware and software.
  • There is a proliferation of document and media
    formats, each potentially carrying its own
    software and hardware dependencies. Copying these
    formats from one storage device to another is
    simple. However, merely copying bits is not
    sufficient for preservation purposes: if the
    software is not available, the information will
    be lost. Maintaining the integrity of links,
    embedded objects, etc. adds further complexity.
  • Digital preservation is expensive.
  • Intellectual property and licensing regimes are
    increasingly restrictive.
  • Source: http://www.ifla.org

10
INE Journals digital repository
In our Library we have created a digital
repository of printed journals.
  • Process steps
  • 1. In our OPAC (Online Public Access Catalogue),
    we select the 856 field (for electronic
    resources).
  • 2. We create a fixed URL. This URL points to our
    own server.
  • 3. We scan the journals to PDF format.
  • 4. We upload the PDF files to the server through
    FTP.
  • 5. We take the fixed URL as the root and add each
    PDF file under it.
  • 6. We link every file to the OPAC Web.
  • 7. We can then see the digitized file in our OPAC Web.
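Steps 2, 5 and 6 can be sketched roughly as follows. This is an illustration only: the base URL, the file name and the note text are hypothetical, and the indicator values shown (4 for HTTP, 1 for "version of resource") are an assumption about how a digitized copy of a printed journal might be coded in MARC 21, not something stated in the presentation.

```python
from urllib.parse import quote, urljoin

# Hypothetical fixed URL on the library server (step 2).
BASE_URL = "https://example.org/journals/"


def pdf_url(base_url: str, filename: str) -> str:
    """Join the fixed root URL with one scanned PDF file name (step 5)."""
    return urljoin(base_url, quote(filename))


def field_856(url: str, note: str = "Digitized issue (PDF)") -> str:
    """Render a MARC 856 field in plain text display form (step 6).

    Assumed coding: indicators '4' (HTTP) and '1' (version of resource),
    subfield $u for the URI and $z for a public note.
    """
    return f"856 41 $u {url} $z {note}"


if __name__ == "__main__":
    url = pdf_url(BASE_URL, "issue 01.pdf")
    print(field_856(url))
```

Percent-encoding the file name keeps the link valid even when scanned files contain spaces or accented characters.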

11
INE Journals digital repository
Field 856
12
INE Journals digital repository
13
INE Journals digital repository
14
INE Journals digital repository
15
INE Journals digital repository
16
INE Journals digital repository
17
INE Journals digital repository
  • Some interesting data
  • No implementation cost
  • Personnel involved: 2 people
  • Project duration: one and a half years
  • Current status: more than 1,000 journal issues
    digitized and published

18
INEbase history
Statistical books 1858-1997 available on the web
  • Background
  • 1996: The INE joins the Internet
  • 2000: INEbase is born → all statistical production
    offered on the Internet
  • 2004: What shall we do with past information only
    available in printed format? → Target: opening up
    to the public the historical collection of INE
    publications available only on paper

19
INEbase history: a new section of INEbase
  • We had to choose between different alternatives:
  • Tables in PC-Axis format
  • Complete PDF versions of the books
  • INEbase history

20
INEbase History: Our Virtual Library
21
INEbase History: Our Virtual Library
22
INEbase History: Our Virtual Library
1858 Yearbook
23
INEbase History: Our Virtual Library
Population (28 tables)
24
INEbase History: Our Virtual Library
25
INEbase History: Our Virtual Library
26
INEbase history: Project Phases
  • Phase 1.
  • What should be published? The most symbolic and
    representative volumes of public statistical
    activity
  • Statistical Yearbooks (1858-1997)
  • Population Censuses (1900-1970)
  • Outsource scanning (100,000 pages)
  • Outsource the software development
  • Phase 2.
  • Cataloguing starts
  • Software improvements suggested by use
  • 20 publications catalogued before publishing

27
INEbase history: Project Phases
  • Phase 3.
  • Internet launch takes place with 20 Yearbooks and
    1 Census
  • Phase 4.
  • Cataloguing and web publication of 78 Yearbooks
    and 9 Censuses (34 volumes)
  • Phase 5.
  • Incorporation of new publications
  • Scan the Agrarian Census and VS statistics
  • Programme adaptation
  • Cataloguing and publication

28
INEbase history: The Technical Process in 3 steps
  • 1. Scanning and OCR
  • Scanning using the originals
  • Unbinding (old and non-unique)
  • Guillotining (repeated and unimportant)
  • Microfiche (rare, old copies)
  • TIFF files obtained
  • OCR programme used to generate txt files → used
    for the search engine
  • Once the PDF file is obtained → ready to be
    catalogued

29
INEbase history: The Technical Process in 3 steps
  • 2. Cataloguing books into the system
    (cataloguer role)
  • 1st step: create an index with categories until we
    get to the final node, the statistical tables
  • 2nd step: associate one or more PDF documents with
    each node
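The two cataloguing steps describe a tree whose final nodes are statistical tables with PDF documents attached. A minimal sketch of such a structure is shown below; the class, field names and sample titles are hypothetical and do not reflect the actual INEbase software.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry of the index: a book, a chapter or a final table node."""
    title: str
    children: list["Node"] = field(default_factory=list)
    pdfs: list[str] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child


def tables(node: Node, path: tuple[str, ...] = ()):
    """Yield (full path, PDF list) for every final node under `node`."""
    here = path + (node.title,)
    if not node.children:
        yield " / ".join(here), node.pdfs
    for child in node.children:
        yield from tables(child, here)


if __name__ == "__main__":
    book = Node("Yearbook")                              # 1st step: index
    chapter = book.add(Node("Population"))
    chapter.add(Node("Table 1", pdfs=["table1.pdf"]))    # 2nd step: PDFs
    for full_path, pdfs in tables(book):
        print(full_path, pdfs)
```

Walking the tree with `tables` gives the flat list of tables together with their documents, which is the information a reader needs to go from the index to a PDF.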
30
INEbase history: The Technical Process in 3 steps
How is cataloguing done? A practical example
Creation of a virtual book: Statistical Yearbook
2010
Node blocked
31
INEbase history: The Technical Process in 3 steps
Creation of the publication index
Creating as many chapters as needed
32
INEbase history: The Technical Process in 3 steps
Creation of the tables and association with the
corresponding PDF document.
33
INEbase history: The Technical Process in 3 steps
Recreating the hierarchical tree: all the
publication's documents appear associated with
their corresponding table
Nodes unblocked
Cataloguer's work ends here
34
INEbase history: The Technical Process in 3 steps
  • 3. Revision before publishing
  • Cataloguing should be revised before being
    published
  • Who revises? → There is a specific role, the
    proof-reader, but this role has not really
    been used; in practice another cataloguer does
    the revision
  • Once the proof-reading work is finished, the book
    is ready for publication
  • Proof-reader's work ends here

35
INEbase history: The Publisher
Main task: to publish books. Other tasks: user
and transmission control, node translation
Blocked node
Published node
Unblocked node
Book ready to be shown on the Internet
And the translation process begins
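The three node states named on this slide (blocked, unblocked, published) can be modelled as a small state machine. The sketch below is illustrative: the allowed transitions are an assumption inferred from the workflow described in the slides (nodes are blocked during cataloguing, unblocked when the cataloguer finishes, then published), and none of the names come from the real system.

```python
from enum import Enum


class NodeState(Enum):
    BLOCKED = "blocked"      # being catalogued
    UNBLOCKED = "unblocked"  # cataloguing finished, awaiting publication
    PUBLISHED = "published"  # released by the publisher


# Assumed transitions; the presentation does not spell them out.
ALLOWED = {
    NodeState.BLOCKED: {NodeState.UNBLOCKED},
    NodeState.UNBLOCKED: {NodeState.BLOCKED, NodeState.PUBLISHED},
    NodeState.PUBLISHED: set(),
}


def advance(current: NodeState, target: NodeState) -> NodeState:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target
```

With this table a node cannot be published while a cataloguer still has it blocked, which matches the order of roles described in the slides.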
36
Transmission process: synchronization of servers
Cataloguing Server
Dissemination Server
This step might not be needed
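One way to picture the synchronization between the cataloguing server and the dissemination server is as a comparison of two file listings. The sketch below assumes each server can produce a mapping from file name to checksum; the function name and the sample data are hypothetical and say nothing about how the real transmission process works.

```python
def plan_sync(source: dict[str, str], target: dict[str, str]) -> dict[str, list[str]]:
    """Decide which files to copy to, or delete from, the target.

    Both arguments map a file name to a checksum. A file is copied when
    the target lacks it or holds a different checksum; a file is deleted
    when it no longer exists on the source.
    """
    return {
        "copy": sorted(
            name for name, digest in source.items() if target.get(name) != digest
        ),
        "delete": sorted(set(target) - set(source)),
    }


if __name__ == "__main__":
    cataloguing = {"y1858_t1.pdf": "aa11", "y1858_t2.pdf": "bb22"}
    dissemination = {"y1858_t1.pdf": "aa11", "old.pdf": "cc33"}
    print(plan_sync(cataloguing, dissemination))
```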
37
INEbase history: Visualisation on the Internet
38
INEbase history: Visualisation on the Internet
Yearbooks ordered by decades
39
INEbase history: The hierarchical tree
On the dissemination server
On the cataloguing programme
40
And just a click on the required table
And a 9-page PDF document is shown
41
INEbase history: Anything else to be taken into
account
Search engine
Change language
No. of tables
Size of pdf file
42
INEbase history: The search engine
Direct access to the pdf document
43
INEbase history: The search engine
The search engine is based on the table titles
(sorry, only in Spanish) and the hierarchical
tree (in English as well)
Of course, you might as well use INE's general
search engine
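A search based on table titles can be sketched as a simple word match. The example below is only an illustration of the idea, not the INEbase search engine: the function names and sample titles are invented, and the accent-insensitive matching is an assumption that seems useful for Spanish titles.

```python
import unicodedata


def normalize(text: str) -> list[str]:
    """Lower-case, strip accents and split a title into words."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch if ch.isalnum() else " " for ch in stripped).split()


def search(titles: dict[str, str], query: str) -> list[str]:
    """Return the ids of the tables whose title contains every query word."""
    wanted = set(normalize(query))
    if not wanted:
        return []
    return sorted(
        table_id
        for table_id, title in titles.items()
        if wanted <= set(normalize(title))
    )


if __name__ == "__main__":
    sample = {
        "t1": "Población por provincias",
        "t2": "Producción agrícola",
    }
    print(search(sample, "poblacion provincias"))
```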
44
INEbase history: The search engine
Population censuses: everything above is also valid
45
INEbase history: Some Interesting Data
  • 1- Economic data
  • Initial scanning stage: 12,000 euros, 110,000
    pages
  • External development: 90,000 euros
  • 2- Deadlines
  • Scanning and programme development: 6 months
  • Cataloguing: 20 months
  • 3- Amount of scanned pages
  • Yearbook: 70,000 pages
  • Census: 30,000 pages
  • Total: 100,000 pages
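The figures above allow a quick back-of-the-envelope calculation. The sketch below only restates the numbers from the slide; note that the slide quotes 110,000 pages for the scanning stage and 100,000 pages as the Yearbook plus Census total, and both are kept as given.

```python
# Figures as quoted on the slide.
scanning_cost_eur = 12_000
scanning_pages = 110_000
development_cost_eur = 90_000
yearbook_pages = 70_000
census_pages = 30_000

# Derived values.
cost_per_page = scanning_cost_eur / scanning_pages
total_cost_eur = scanning_cost_eur + development_cost_eur
catalogued_pages = yearbook_pages + census_pages

print(f"Scanning cost per page: {cost_per_page:.3f} euros")
print(f"Scanning plus development: {total_cost_eur} euros")
print(f"Yearbook plus Census pages: {catalogued_pages}")
```

Scanning works out at roughly 0.11 euros per page, so most of the 102,000 euro outlay went on external software development rather than on digitization itself.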

46
INEbase history: Some Interesting Data
  • 4- Personnel used
  • Cataloguing: 0-3 recording assistants
  • Index translator: 1 trainee
  • Publisher: 1-2 statisticians
  • IT support team
  • 5- How many people use INEbase History?
  • Page views in October: 77,623 (1.2% of total)

47
INEbase history: IT Data
  • IT infrastructure: a reasonably simple system
  • A cataloguing server houses a working copy of
    the database and the collection of PDF
    pages; multiple cataloguer PCs provided with a
    "client" application connect to the server
  • One of the components of the family of web
    servers at www.ine.es houses the dissemination
    server (the software, plus a copy of the database
    and a copy of the collection of PDF pages). This
    is the system that serves files to the Internet
  • There are copy and safety mechanisms between
    one environment and the other
  • The environment is similar to a content
    management programme

48
INEbase history: IT Data
  • IT infrastructure: a reasonably simple system
  • Client programmes developed with Microsoft .NET.
  • Server programme developed with Java.
  • Catalogue and dissemination database: Oracle 9i.
  • Programmes for working with PDF files obtained
    from a manufacturer specialised in this kind of
    software.
  • Conceptual design, requirements setting and
    selection of platforms: National Statistics
    Institute.
  • Scanning of originals: Proco S.A.
  • Technological partner (development): Sopra Group.

49
  • Thank you very much for your attention