Title: Collection Development and Web Publications at the British Library
1 Collection Development and Web Publications at the British Library
John Tuck, Head of British Collections
Digital Memory, Session 2, Tallinn, 24th November 2005
2 British Library Web Archiving Programme
- Three strands to the Programme:
  - an underpinning collection development policy
  - a UK collaborative approach through the UK Web Archiving Consortium (UKWAC)
  - international collaboration through the International Internet Preservation Consortium (IIPC)
3 British Library Web Archiving Programme: Why?
- Short life-span of web sites
- Looking ahead to extension of legal deposit to non-print material
- Pilot project Domain.uk as proof of concept
4 British Library Web Archiving Programme: Resources
- Team is divided across the Scholarship and Collections and IT directorates
- Web Archiving Programme Manager
- Curator, Web Archiving, whose responsibility includes definition of the collection development policy
- Other posts focussing on technical aspects/developments and on permissions, rights clearance and administration
5 Collection Development Policy
- Web Archiving High Level Collection Development Policy
- Given the huge scale and dynamic nature of the web (estimated at approx. 5 million UK-based web sites), the British Library does not consider it practicable or affordable to aim at truly comprehensive coverage of the UK web presence. The Library's strategy is based on:
  - a) taking a complete snapshot of the entire UK web presence at regular intervals (possibly annually or twice a year)
  - b) achieving a more intensive and selective harvesting of a limited number and well-defined range of sites, building up over time to perhaps 10,000. These would be sites judged to be of research value now and in the future, reflecting the national and cultural heritage, and including a number of sites which are exemplars of web innovation. Also included is an events-based, thematic collection strand
6 Web Archiving Collection Development: Progress
- Through its Curator, Web Archiving, the British Library has defined a more detailed development policy statement for UK web sites (see www.bl.uk/collections/britirish/modbritcdpwebsites.doc)
- Framework of curators within the British Library to assist the Curator, Web Archiving. Work also carried out with partners, e.g. within the UKWAC consortium
- Longer-term aim is to consider web sites as just another format to collect within an overall collection development policy
7 UK Web Archiving Consortium (UKWAC)
- Officially launched in June 2004
- Comprises six institutions: British Library (lead partner), Joint Information Systems Committee, National Library of Scotland, National Library of Wales, the National Archives, and the Wellcome Trust
- Two-year pilot project with the aims of putting in place a common framework, common approaches to rights-cleared web archiving, and an archive of websites (see www.webarchive.org.uk)
- To date has archived over 700 sites. British Library input has been 700 instances of 282 sites
- The first successful selective archive of UK web space which imposes no charge for including material or for access. Based on the National Library of Australia's web archiving application PANDORA
8 UKWAC: Permissions-based approach (1)
- From the outset it has been the intention to seek explicit rights clearance from website owners, pending secondary legislation for the deposit of UK websites
- Common licence/template devised by UKWAC
- Sites only mounted once explicit permission has been agreed
- Some exceptions in the case of events-based collection, e.g. Asian tsunami, UK general election 2005, and London bombings, July 2005: notice and takedown policy put in place
9 UKWAC: Permissions-based approach (2)
- British Library has sent out more than 1,500 permission requests but has received only approximately 400 positive replies: a 25% success rate. Very few outright rejections (10) but many queries (200) and many requests with no reply
- Not sustainable: impact both on collection size and on collection balance
- Secondary legislation through the Legal Deposit Libraries Act will address this. It may be the case that web sites will be brought up the agenda with a swifter schedule for implementation than originally thought
10 International Internet Preservation Consortium (IIPC)
- Website: http://netpreserve.org
- Mission
  - To acquire, preserve and make accessible knowledge and information from the Internet for future generations everywhere, promoting global exchange and international relations
- Goals
  - To enable the collection of a rich body of Internet content from around the world to be preserved in a way that it can be archived, secured and accessed over time
  - To foster the development and use of common tools, techniques and standards that enable the creation of international archives
  - To encourage and support national libraries everywhere to address Internet archiving and preservation
11 Working with IIPC
- Aim of IIPC is to put in place a range of tools and common standards for those tasked with web archiving
- We see IIPC and its developing tools and standards as the means of achieving a whole domain crawl of the UK
- Recently took part in a smart crawler project and procurement with the Bibliothèque nationale de France to put in place a prototype to enable large-scale web archiving: automatically locating content, frequency of capture and thematic linking. Complexities of the technology have led to a new approach, now to involve the British Library, BnF and Library of Congress
- National Library of New Zealand, Library of Congress and British Library also to work on improved curator tools to facilitate the interface and work of curators dealing with websites
12 e-Content: Future Collection Development for other e-Formats
- Offline digital publications
  - The British Library will seek to collect offline resources (e.g. CD-ROMs, disks, DVDs, not films) comprehensively to the level of approximately 80-90% of estimated published output. Collection will be within a scope generally defined as appropriate for current research or research in the future
13 e-Content: Future Collection Development for e-Formats
- Online e-journals
  - The British Library will seek to collect e-journals with a UK imprint comprehensively to the level of approximately 80% of published output, and within a scope generally defined as appropriate for current research or for research in the future. The 20% of material not collected will reflect out-of-scope material considered to be of non-research level, together with a small element of inevitable non-compliance
14 e-Content: Future Collection Development for e-Formats
- Online e-books
  - The same collection criteria as for e-journals apply to e-books, but we believe that the build-up to 80% will be slower than for e-journals, as to a large degree e-books currently replicate printed materials and very few are at research level. E-books are not prioritised by the legal deposit libraries in the UK as an area of early Regulation under the Legal Deposit Libraries Act 2003
15 e-Content: Future Collection Development for e-Formats
- Databases
  - In the case of databases, many may not be defined as publications under the Act and thus would not be eligible for legal deposit. For formally published databases, the British Library will seek to acquire comprehensively and within the same scope and proportions as for e-journals. Note is taken, however, of the dynamic and ephemeral nature of databases and the technical challenges they will present. From the perspective of the national published archive, databases can probably only meaningfully be collected on a snapshot or last-edition basis. At present online databases are being accorded a lowish priority for the Library from the perspective of both voluntary and statutory deposit. Many are more likely to be relevant to the web archiving programme
16 Voluntary Deposit of Electronic Publications: Practice
- Handheld (CD-ROMs etc.): declining in number; delivered to our Legal Deposit Office; processed as other physical materials. Fully catalogued and accessible in reading rooms
- On-line materials received through voluntary deposit: new set of procedures and workflows put in place; clear collection development policy defined, enabling selection; multitude of file extensions; on-line material stored as e-mails in first instance, then burned on DVD for storage (using Ex Libris DigiTool)
- Long-term objective is incorporation in Digital Object Management Programme, as part of overall digital preservation strategy