Nomadic Digital Library Research at Cornell

Transcript and Presenter's Notes

Title: Nomadic Digital Library Research at Cornell


1
Automated Digital Libraries
William Y. Arms
Department of Computer Science
Cornell University
2
Two Questions
3
Before Digital Libraries
Access to scientific, medical, legal information
In the United States
-- excellent if you belonged to a rich organization (e.g., a major university)
-- very poor otherwise
In many countries of the world
-- very poor for everybody
4
Question 1
Must access to scientific and professional
information be expensive?
5
Research Libraries are Expensive
library materials
buildings and facilities
staff
6
The Potential of Digital Libraries
7
Question 2
How effectively can computers be used for the skilled tasks of professional librarianship?
-- Time horizon: 5 to 20 years
-- All materials in digital form
8
Automated Library Services
9
Skilled Librarianship
People are skilled at judgment, understanding, discrimination, etc.
-- selection
-- cataloguing, indexing
-- seeking information
-- evaluating information
-- reference service
Can computers provide equivalent services?
10
Equivalent Services
Example: Cataloguing rules
-- Application of cataloguing rules to monographs is skilled
-- It is hard to imagine a computer system with these skills
but ...
-- Catalogs and cataloguing rules are the means, not the end
11
Equivalent Services
Information discovery
Why are web search services the most widely used information discovery tools in universities today?
12
Conventional Criteria
Web search services have many weaknesses
-- selection is arbitrary
-- index records are crude
-- no authority control
-- duplicate detection is weak
-- search precision is deplorable
yet they clearly satisfy important requirements ...
13
Effectiveness of Web Search
Inspec v. Google
Google is usually superior for general computing and computer science questions
> Broader coverage
> Adequate indexing records
> Better ranking
14
Simple Algorithms + Immense Computing Power
15
History: Licklider
J. C. R. Licklider, Libraries of the Future, 1965
-- envisaged digital libraries for scientists at their place of work
-- listed desiderata for a digital library
-- studied construction of fully automated digital libraries
-- put emphasis on artificial intelligence and natural language processing
16
History: Licklider
Licklider's predictions for digital libraries were remarkably good, but ...
-- over-optimistic about progress in artificial intelligence
-- underestimated what can be done by brute force computing
17
Brute Force Computing
Few people can appreciate the power of Moore's Law
-- Computing power doubles every 18 months
-- Increases roughly 100 times in 10 years
-- Increases roughly 10,000 times in 20 years
Simple algorithms + immense computing power may outperform human intelligence
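The growth figures on this slide follow from compounding the 18-month doubling period. A minimal sketch of that arithmetic, assuming smooth exponential growth (the function name and its parameters are illustrative, not from the presentation):

```python
def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Return how many times capacity multiplies after `years`,
    assuming it doubles once every `doubling_months` months."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


if __name__ == "__main__":
    for years in (10, 20):
        # Print the compounded factor rounded to a whole number.
        print(f"{years} years: about {growth_factor(years):,.0f} times")
```

Ten years is 120 / 18, or about 6.7 doublings, which gives a factor slightly above 100. Twenty years gives slightly above 10,000, so both results are close to the slide's rounded figures.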
18
Brute Force Computing
Example: Creators of the world champion chess program (Deep Thought, later Deep Blue)
-- moderate chess players
-- simple tree-search algorithm
-- very, very fast computer hardware
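To illustrate what a "simple tree-search algorithm" can look like, here is a minimal sketch of plain minimax over a toy game tree. This is an assumption made for illustration, not the actual Deep Thought or Deep Blue code. The tree encoding (nested lists with integer leaf scores) and the name `minimax` are hypothetical.

```python
from typing import List, Union

# A game tree is either a leaf score (int) or a list of child subtrees.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, maximizing: bool = True) -> int:
    """Return the best score reachable from `node` when both players
    play optimally and the players alternate turns at each level."""
    if isinstance(node, int):
        return node  # leaf: a position already evaluated
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)


# Toy tree: three candidate moves, each answered by two replies.
toy_tree = [[3, 5], [2, 9], [0, 7]]

if __name__ == "__main__":
    print(minimax(toy_tree))
```

The search itself is only a few lines. Playing strength in this style of program comes from how many positions the hardware can examine, which is the slide's point about brute force.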
19
An Anecdote
The question (Marvin Minsky)
-- How would you design a computer system that can answer questions such as, "Why was the space station a bad idea?"
The answer (Danny Hillis)
-- Design much more powerful computers!
20
Examples of Automated Digital Library Services
21
Web Search
Brute force indexing and retrieval
-- retrieve every page on the web
-- index every word
-- repeat every month
Getting better all the time
-- improved algorithms
-- faster computers and networks
-- analysis of users
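The "index every word" step can be sketched as an inverted index, a map from each word to the pages that contain it. This is a minimal illustration under assumed names (`build_index`, `search`, and the page identifiers in the usage note are hypothetical). It ignores crawling, scale, and ranking, and it is not a description of how any particular search service is built.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every word to the set of page identifiers containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

For example, given `pages = {"p1": "Digital libraries at Cornell", "p2": "Cornell chess club"}`, calling `search(build_index(pages), "cornell")` returns both identifiers, while `"digital cornell"` returns only `"p1"`.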
22
Web Search
Ranking algorithms
Closeness of match
-- vector space and statistical methods (Salton, et al., c. 1970)
Importance of digital object
-- Google ranks web pages by how many other pages link to them, and gives greater weight to links from higher-ranking pages
(NSF/DARPA/NASA Digital Libraries Initiative)
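The two ranking ideas on this slide can be sketched in a few lines. The first function scores closeness of match as the cosine of the angle between raw term-count vectors, a simplified form of the vector space approach. The second estimates page importance with a PageRank-style power iteration. The damping value of 0.85, the iteration count, and all function names are assumptions made for illustration, and neither function is presented as the exact method used by any search service.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine_similarity(query: str, document: str) -> float:
    """Cosine of the angle between the term-count vectors of two texts."""
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    dot = sum(q[term] * d[term] for term in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0


def link_rank(links: Dict[str, List[str]], damping: float = 0.85,
              iterations: int = 50) -> Dict[str, float]:
    """Score pages so that links from high-scoring pages count for more.

    `links` maps each page to the pages it links to. Scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score equally among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank
```

With `links = {"a": ["c"], "b": ["c"], "c": ["a"]}`, page `c` scores highest because two pages link to it, and `a` outranks `b` because its single incoming link comes from the highly ranked `c`.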
23
Archiving and Preservation
Internet Archive
-- Monthly, a web crawler gathers every open access web page with associated images
-- Web pages are preserved for future generations
-- Files are available for scholarly research
not perfect ...
-- HTML pages, images; no Java applets, style sheets
-- materials are dumped with no organization or indexing
-- access for scholars is rudimentary
24
Reference Linking
Web of Science (ISI)
-- input: combination of automatic means, skilled people
-- limited number of journals
-- very expensive
ResearchIndex (a.k.a. CiteSeer, a.k.a. ScienceIndex) (NEC)
-- fully automatic
-- all open access material in computer science
-- a free service
25
Beyond Text
Informedia (Carnegie Mellon)
Automatic processing of segments of video, e.g., television news.
Algorithms for
-- dividing raw video into discrete items
-- generating short summaries
-- indexing the sound track using speech recognition
-- recognizing faces
-- searching using natural language processing
(NSF/DARPA/NASA Digital Libraries Initiative)
26
Costs and Benefits
27
Costs of Catalogs and Indexes
Catalog, index and abstracting records are very expensive when created by skilled professionals
-- only available for certain categories of material (e.g., monographs, scientific journals)
-- contain limited fields of information (e.g., no contents page)
-- restricted to static information
High costs reduce effectiveness and access
28
Costs of Automated Digital Libraries
The Google company
-- 5.5 million searches daily
-- 85 people (half technical, 14 with Ph.D. in computing)
-- 2,500 PCs running Linux, with 80 terabytes of disk
The Internet Archive
-- 7 people with support from Alexa
(March 2000)
29
Overall
If you are rich ...
-- Research libraries, using commercial information services, provide excellent service at very high cost to a favored few
-- Automated digital libraries are a long way from providing the personal reference service available to a faculty member at a well-endowed university
but ...
30
The Model T Library
The Model T Ford, with mass production, brought
car travel to the masses ...
-- Automated digital libraries, with open access
materials, can already provide good service
at low cost
-- In the future automated digital libraries can
bring scientific, scholarly, medical and
legal information to everybody at
negligible cost
31
A Footnote
32
Library Expertise
The future of scientific and professional information is tied to computing, but ...
-- automated digital libraries need small teams of highly skilled people
-- development of automated digital libraries is bypassing libraries (Google, ResearchIndex, Informedia, Internet Archive)
The level of computing expertise in U.S. research libraries is depressingly low
33
Further reading
William Y. Arms, "Automated digital libraries." To be submitted to D-Lib Magazine, July/August 2000.
William Y. Arms, "Economic models for open-access publishing." iMP, March 2000. http://www.cisp.org/imp/march_2000/03_00arms.htm
34
Automated Digital Libraries
William Y. Arms
Department of Computer Science
Cornell University