Title: Applying open source ideas in digital libraries: the examples of RePEc and rclis
1 Applying open source ideas in digital libraries: the examples of RePEc and rclis
- Thomas Krichel
- 2005-06-01
2 who is me?
- I was an economist.
- I was a leisure digital librarian.
- NetEc since 1993
- RePEc since 1997
- I am "just another Perl hacker"
- I am a visionary
- but I'm not like St. John the Baptist
3 who is he?
4 he is "St. IGNUicus"
- A humorous creation of Richard M. Stallman (RMS)
- RMS is the father of the free software movement
- a geek
- a visionary
- St. IGNUicus shows an emphasis on the moral case
for free software, rather than the business case
5 moral case and business case
- Other folks in the free software movement avoid the "f" word
- free can mean cheap
- cheap can mean bad
- They stress the business case of free software
- They use the term "open source software" (OSS)
6 RMS and us
- Amen, I tell you: we librarians need to learn more from the OSS movement.
- We need to make the concepts that come from free software more a part of our business.
- Let us look at a key concept: free software.
7 free software according to RMS
- Free software comes with four freedoms
- The freedom to run the software, for any purpose
- The freedom to study how the program works, and adapt it to your needs
- The freedom to redistribute copies so you can help your neighbor
- The freedom to improve the program, and release your improvements to the public, so that the whole community benefits
8 what has this to do with us?
- Just replace free software with free information. Libraries are about free information.
- But the analogy is not quite as simple.
- When we talk about free information, we usually mean things that we can freely read (download): free as in zero price.
- We do not usually mean free information as information we are free to do things with: free as in freedom.
9 moral and business
- There is a moral case for free information.
- We rely on it.
- There is a business case for free information.
- We need to make it ourselves.
10 we rely on the moral case
- The citizen should be informed
- Individuals in the organization should have free access
- This is how we justify the resources given to us.
- Often, members of the community who pay get privileged access.
11 from moral case to business case
- To form the business case for free information, think of "free information" as "freedom to do things" rather than as zero price.
- Thus libraries can make a crucial business case for themselves as agents who transform information.
- Recall that there are whole industries out there that produce free information.
12 Now for something different
- RePEc is an example of an Open Library.
- An Open Library is loosely defined as an application of the OSS principles to libraries.
- vague
- in the making
- but has some history
- Looking at RePEc will make the ideas concrete.
13 History
- It started with me as a research assistant in the Economics Department of Loughborough University of Technology in 1990.
- a predecessor of the Internet allowed me to download free software without effort
- but academic papers had to be gathered in a painful way
14 CoREJ
- published by HMSO
- Photocopied tables of contents of recently published economics journals received at the Department of Trade and Industry
- Typed list of the working papers recently received by the University of Warwick library
- The latter was the more interesting.
15 working papers
- early accounts of research findings
- published by economics departments
- in universities
- in research centers
- in some government offices
- in multinational administrations
- disseminated through exchange agreements
- important because of the 4-year publishing delay
16 1991-1992
- I planned to circulate the Warwick working paper list over listserv lists
- I argued it would be good for them
- increase incentives to contribute
- increase revenue for ILL
- After many trials, Warwick refused.
- Toward the end of that time, I was offered a lectureship, and decided to get working on my own collection.
17 1993 BibEc and WoPEc
- Fethy Mili of Université de Montréal had a good collection of papers and gave me his data.
- I put his bibliographic data on a gopher server and called the service "BibEc"
- I also gathered the first ever online electronic working papers on a gopher server and called the service "WoPEc".
18 NetEc consortium
- BibEc: printed papers
- WoPEc: electronic papers
- CodEc: software
- WebEc: web resource listings
- JokEc: jokes
- HoPEc
- a lot of Ec!
19 WoPEc to RePEc
- WoPEc was a catalog record collection
- WoPEc remained the largest web access point
- but getting contributions was tough
- In 1996 I wrote the basic architecture for RePEc:
- ReDIF
- Guildford Protocol
20 creation of RePEc
- It came about when I finally got one other partner, the Dutch DEGREE project, a library-led consortium for working paper publication.
- I also had a contact in Sweden called Sune Karlsson, for whom I was instrumental in securing funding for a Swedish version of WoPEc called S-WoPEc.
- I put together a protocol that would allow us to work together.
21 1997 RePEc principle
- Many archives
- archives offer metadata about digital objects (mainly working papers)
- One database
- The data from all archives forms one single logical database despite the fact that it is held on different servers.
- Many services
- users can access the data through many interfaces.
- providers of archives offer their data to all interfaces at the same time. This provides for an optimal distribution.
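The "many archives, one database, many services" principle can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not RePEc's actual software: each archive is modelled as a list of records that each carry a globally unique handle (such as RePEc:sur:surrec:9601), and the names build_logical_database and service_view are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

# A record is a flat mapping of field names to values; every record is
# assumed to carry a globally unique "handle", e.g. "RePEc:sur:surrec:9601".
Record = Dict[str, str]


def build_logical_database(archives: Iterable[List[Record]]) -> Dict[str, Record]:
    """Merge the records held by many archives into one handle-keyed index."""
    database: Dict[str, Record] = {}
    for archive in archives:
        for record in archive:
            handle = record["handle"]
            if handle in database:
                # Handles must be unique across all archives.
                raise ValueError(f"duplicate handle: {handle}")
            database[handle] = record
    return database


def service_view(database: Dict[str, Record], keep: Callable[[Record], bool]) -> List[str]:
    """One 'service': any interface that selects from the shared database."""
    return sorted(handle for handle, record in database.items() if keep(record))
```

Because every service reads the same merged index, an archive that contributes its data once becomes visible through all of them, which is the distribution effect described on the slide.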
22 RePEc is based on 440 archives
- WoPEc
- EconWPA
- DEGREE
- S-WoPEc
- NBER
- CEPR
- US Fed in Print
- IMF
- OECD
- MIT
- University of Surrey
- CO PAH
23 to form a 300k item dataset
- 146,000 working papers
- 154,000 journal articles
- 1,600 software components
- 900 book and chapter listings
- 6,400 author contact and publication listings
- 8,400 institutional contact listings
24 RePEc is used in many services
- EconPapers
- NEP: New Economics Papers
- Inomics
- RePEc author service
- Z39.50 service by the DEGREE partners
- IDEAS
- RuPEc
- EDIRC
- LogEc
- CitEc
My concern is NEP, a human-mediated current awareness service for RePEc. This could be the subject of a more academic talk.
25 ReDIF describes documents
- Template-Type: ReDIF-Paper 1.0
- Title: Dynamic Aspect of Growth and Fiscal Policy
- Author-Name: Thomas Krichel
- Author-Person: RePEc:per:1965-06-05:thomas_krichel
- Author-Email: T.Krichel_at_surrey.ac.uk
- Author-Name: Paul Levine
- Author-Email: P.Levine_at_surrey.ac.uk
- Author-WorkPlace-Name: University of Surrey
- Classification-JEL: C61 E21 E23 E62 O41
- File-URL: ftp://www.econ.surrey.ac.uk/pub/RePEc/sur/surrec/surrec9601.pdf
- File-Format: application/pdf
- Creation-Date: 199603
- Revision-Date: 199711
- Handle: RePEc:sur:surrec:9601
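ReDIF templates are written as Field-Name: value lines, so a reader for them fits in a short function. The sketch below is hypothetical (parse_redif and values are illustrative names, not part of any RePEc tool). It assumes that field names are case-insensitive, as the mix of "Template-Type" and "template-type" in these examples suggests, and it treats any line without a field name as a continuation of the previous value. It ignores finer points of the real format, such as grouping the Author-* fields by author.

```python
import re
from typing import List, Tuple

# A field line looks like "Field-Name: value"; the field name may contain
# letters, digits and hyphens, and must start at the beginning of the line.
FIELD_LINE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.*)$")


def parse_redif(text: str) -> List[Tuple[str, str]]:
    """Parse one ReDIF-style template into ordered (field, value) pairs."""
    pairs: List[Tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue  # skip blank lines
        match = FIELD_LINE.match(line)
        if match:
            # Field names are normalised to lower case (assumed case-insensitive).
            pairs.append((match.group(1).lower(), match.group(2).strip()))
        elif pairs:
            # Assumption: any other line continues the previous value.
            field, value = pairs[-1]
            pairs[-1] = (field, f"{value} {line.strip()}")
    return pairs


def values(pairs: List[Tuple[str, str]], field: str) -> List[str]:
    """All values of a (possibly repeated) field, in template order."""
    wanted = field.lower()
    return [value for name, value in pairs if name == wanted]
```

Keeping the pairs in order, rather than collapsing them into a dictionary, preserves repeated fields such as the two Author-Name lines of the paper template.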
26 ReDIF describes persons (RAS)
- template-type: ReDIF-Person 1.0
- name-full: MANKIW, N. GREGORY
- name-last: MANKIW
- name-first: N. GREGORY
- handle: RePEc:per:1984-06-16:N__GREGORY_MANKIW
- email: ngmankiw_at_harvard.edu
- homepage: http://post.economics.harvard.edu/faculty/mankiw/mankiw.html
- workplace-institution: RePEc:edi:deharus
- workplace-institution: RePEc:edi:nberrus
- Author-Article: RePEc:aea:aecrev:v:76:y:1986:i:4:p:676-91
- Author-Article: RePEc:aea:aecrev:v:77:y:1987:i:3:p:358-74
- Author-Article: RePEc:aea:aecrev:v:78:y:1988:i:2:p:173-77
- ...
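RePEc handles have a visible internal structure: colon-separated components that start with RePEc and an archive code (sur, per, edi and aea in these templates). The hypothetical helpers below split a handle and, for article handles such as RePEc:aea:aecrev:v:76:y:1986:i:4:p:676-91, read the volume, year, issue and pages from the v/y/i/p key-value pairs. The splitting rules are inferred from the examples on these slides and are an assumption, not a statement of the RePEc handle specification.

```python
from typing import Dict, List, Tuple


def split_handle(handle: str) -> Tuple[str, List[str]]:
    """Return the archive code and the remaining components of a handle."""
    parts = handle.split(":")
    if len(parts) < 3 or parts[0] != "RePEc":
        raise ValueError(f"not a RePEc handle: {handle!r}")
    return parts[1], parts[2:]


def article_fields(handle: str) -> Dict[str, str]:
    """Read series, volume, year, issue and pages from an article handle.

    Assumes the layout seen in the examples: archive, series, then
    key/value pairs such as v:76:y:1986:i:4:p:676-91.
    """
    archive, rest = split_handle(handle)
    series, tokens = rest[0], rest[1:]
    if len(tokens) % 2:
        raise ValueError(f"unpaired key/value tokens in {handle!r}")
    fields = dict(zip(tokens[0::2], tokens[1::2]))
    fields.update(archive=archive, series=series)
    return fields
```

For example, split_handle("RePEc:edi:desuruk") gives ("edi", ["desuruk"]), and article_fields on the handle above yields volume "76", year "1986", issue "4" and pages "676-91".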
27 ReDIF describes institutions
- Template-Type: ReDIF-Institution 1.0
- Primary-Name: University of Surrey
- Primary-Location: Guildford
- Secondary-Name: Department of Economics
- Secondary-Phone: (01483) 259380
- Secondary-Email: economics_at_surrey.ac.uk
- Secondary-Fax: (01483) 259548
- Secondary-Postal: Guildford, Surrey GU2 5XH
- Secondary-Homepage: http://www.econ.surrey.ac.uk/
- Handle: RePEc:edi:desuruk
28 institutional registration
- This works through a system called EDIRC.
- Christian Zimmermann started it as a list of departments that have a web site.
- I persuaded him that his data would be more widely used if integrated into the RePEc database.
- Now he is a crucial RePEc leader.
29 author registration
- It started when funding allowed us to hire a student programmer to write an author registration system.
- The system went online as "HoPEc" in late 2000.
- It has been renamed "RePEc author service" (RAS)
- In 2002, a grant from OSI allowed a rewrite and expansion.
30 RePEc author service
- RePEc document data has author names as strings.
- The authors register with RAS to list contact details and identify the papers they wrote.
- This is classic authority control, but done by the authors.
- Currently one in three items in RePEc has at least one identified author.
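Because document data only holds author names as strings, the link between a registered person and a paper has to be confirmed by the author. A minimal sketch of that idea follows; candidate_papers and identified_share are hypothetical names, and matching against a set of name variations is an assumed simplification of how RAS proposes papers. The second function computes the kind of figure quoted on this slide: the share of items with at least one identified author.

```python
from typing import Dict, List, Set


def candidate_papers(name_variations: Set[str], documents: Dict[str, List[str]]) -> List[str]:
    """Handles of documents whose author-name strings match a name variation.

    The registrant then accepts or refuses each candidate; only accepted
    papers become part of the person's record.
    """
    wanted = {name.casefold() for name in name_variations}
    return sorted(
        handle
        for handle, author_names in documents.items()
        if any(name.casefold() in wanted for name in author_names)
    )


def identified_share(documents: Dict[str, List[str]], registrations: Dict[str, Set[str]]) -> float:
    """Fraction of documents claimed by at least one registered person."""
    if not documents:
        return 0.0
    claimed: Set[str] = set().union(*registrations.values())
    return sum(1 for handle in documents if handle in claimed) / len(documents)
```

Here documents maps a paper handle to its author-name strings, and registrations maps a person handle to the set of paper handles that person has accepted.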
31 LogEc
- It is a service by Sune Karlsson that tracks usage of items in the RePEc database
- abstract views
- downloads
- Christian Zimmermann sends a mail with a monthly usage summary to
- archive maintainers
- RAS registrants
32 authors' incentives
- Authors perceive the registration as a way to achieve common advertising for their papers.
- Author records are used to aggregate usage logs across RePEc user services for all papers of an author.
- This stimulates an "I am bigger than you are" mentality. Size matters!
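Aggregating usage "for all papers of an author" is a join between usage logs and author records. The sketch below assumes a hypothetical log format of (service, paper handle, event) tuples, with the event being "download" or "abstract" to mirror the two LogEc measures; the real LogEc data format is not described in these slides, and usage_by_author is an illustrative name.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical log entry: (service name, paper handle, event type)
LogEntry = Tuple[str, str, str]


def usage_by_author(
    logs: Iterable[LogEntry],
    authorship: Dict[str, Set[str]],
    event: str = "download",
) -> Counter:
    """Sum one kind of event over every paper each person has claimed,
    across all services that contributed log entries."""
    per_paper = Counter(handle for _service, handle, kind in logs if kind == event)
    totals: Counter = Counter()
    for person, papers in authorship.items():
        totals[person] = sum(per_paper[handle] for handle in papers)
    return totals
```

Calling most_common() on the result gives a ranking of registered authors, the kind of league table that feeds the "size matters" effect.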
33 recently
- In 2004, Péter Jacsó compared RePEc services with EconLit, the proprietary professional database.
- IDEAS and LogEc were Péter's pick
- EconLit was Péter's pan.
- He slammed the working paper coverage of EconLit.
- He could have slammed other things.
34 RePEc / EconLit partnership
- RePEc now delivers all its working paper data to EconLit, without getting the journal data of EconLit in return.
- This may seem absolutely perverse! A bunch of volunteers laboring for a multi-million concern!
- In fact it serves RePEc well because it adds officialdom.
35 summary so far
- We are talking about an open library as a collaboratory for the creation of large bibliographic aggregates.
- Thus we are mainly about the supply of data, rather than of services. This is one limitation. I will come to this later.
- RePEc only works for Economics. This is another limitation. I will talk about this now.
36 scholarly communication
- is mainly about scholars communicating
- between themselves
- to students, occasionally
- thus it is essentially a community activity
- traditionally, there have been two intermediaries acting as external agents:
- libraries
- publishers
37 when tradition ends
- Two external shocks
- The Internet comes along and reduces distribution costs to zero
- Computer technology comes along and reduces storage costs somewhat
- The opportunity sets of community members and external agents increase
- Proposition: the future depends much on what the community members decide. External agents have little impact.
38 discipline communities
- Scholars of various disciplines have varying habits of research, publication, and evaluation
- It is likely that the Internet will emphasize those differences rather than reduce them.
39 examples: disciplines with established informal publishing
- Preprint communities
- Physics → arxiv.org
- Mathematics → arxiv.org, partially
- Working paper communities
- Computer Science → CiteSeer
- (working papers disappearing)
- Economics → RePEc
40 change is tough
- Change has to come from inside the discipline.
- There has to be a pioneering individual who
- is technically well versed
- is managerially smart
- has extraordinary forward thinking
- is willing to take considerable risk with her career
- Ginsparg, Krichel, Giles and Lawrence are rare
41 and what about libraries?
- Libraries get it systematically wrong: they
- concentrate on access
- concentrate on readers
- concentrate on documents
- They need to
- move from access to impact
- move from the reader to the writer
- move from documents to people
42 example: the institutional repository
- The name is as attractive as a prison toilet
- They have been set up in many universities but remain empty
- They imply a top-down, Stalin-style centralization
- They are resisted, as is any interference by the administration with departmental affairs
- They are set up for general purposes, and end up pleasing nobody.
43 despite that, minimal commonality
- Every discipline has some form of more informal communication. Often this takes the form of conferences.
- Every discipline needs some formal evaluation
- peer review
- overall personal review
- This cannot be done by computer and needs human input.
44 rclis
- rclis stands for Research in Computing and Library and Information Science.
- It is pronounced "reckless".
- It is a RePEc clone.
- It is my attempt to show that the same ideas that propel RePEc can also work in that area.
45 technical innovation
- RePEc is built on attribute-value templates.
- rclis is built on a purpose-built format called the Academic Metadata Format (AMF).
- I set up this format. It is tailor-made to suit the needs of rclis and RePEc.
- There is some usage of AMF in RePEc:
- RePEc OAI interface
- ernad, the software feeding NEP
46 E-LIS
- It is the largest LIS eprint archive on this planet.
- It lives at http://eprints.rclis.org.
- It contains over 2000 papers.
- It runs in Italy but uses a system of national editors to feed in material.
47 DoIS
- DoIS is a service based on a Spanish LIS bibliography.
- It used to run at Manchester Computing but moved to http://wotan.liu.edu/dois when JISC regulations meant we had to move from there.
- It contains 13k records, 9k with free full text, but the data has many errors.
48 using already existing resources
- There is already a very large computer science bibliography called DBLP, see http://dblp.uni-trier.de
- The data has no abstracts. It has some full-text links, mainly to toll-gated sites.
- I have done work to convert parts of it to AMF.
- I am now searching whether free full-text versions of the papers exist anywhere on the Web. This is the Konz project.
49 the Konz project
- Current state
- I use the Google API to search for titles.
- I examine responses and download pages.
- I scan the pages for PDF and Word files.
- I examine the text in the file to find the title.
- Limitations
- only PDF and Word full text
- conference paper data still being processed
- significant hardware and disk problems.
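Two of the Konz steps, scanning a downloaded page for PDF and Word files and checking whether a file's text contains the searched title, can be sketched with the Python standard library. The search step is left out, since it depended on the Google API. The class and function names are hypothetical, text extraction from the PDF or Word file is assumed to have happened already, and the title test (a whole-word match after normalising case and punctuation) is an assumed heuristic, not necessarily the one Konz uses.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class FullTextLinkFinder(HTMLParser):
    """Collect links on a downloaded page that point to PDF or Word files."""

    SUFFIXES = (".pdf", ".doc")

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when looking at the file extension.
        if href and href.lower().split("?")[0].endswith(self.SUFFIXES):
            self.links.append(urljoin(self.base_url, href))


def normalise(text: str) -> str:
    """Lower-case the text and keep only runs of ASCII letters and digits."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def title_found(title: str, extracted_text: str) -> bool:
    """Whole-word check that the normalised title occurs in the file's text."""
    return f" {normalise(title)} " in f" {normalise(extracted_text)} "
```

Padding both strings with spaces keeps the match on word boundaries, so a title ending in "policy" does not match text that only contains "policymaking".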
50 Khabarovsk proposal
- There is a generic possibility of building full-text links out of bibliographic records using search engines.
- The authoritative bibliographic record can be used as a container to hold other objects that have a relationship to the paper:
- full-text instance
- display page
- comment
- CV of the author, etc.
- See http://openlib.org/home/krichel/proposals/khabarovsk.pdf
51 DoCIS
- Konz currently finds free versions of 25k papers out of the 98k searched. Not particularly exciting.
- This data is integrated with DBLP AMF data, and the result forms a new service called DoCIS.
- DoCIS lives at
- http://wotan.liu.edu/docis
52 DoCIS service
- DoCIS is implemented in mod_perl with swish and is therefore very fast.
- The web pages are written by XSLT scripts directly from the AMF data.
- The service is available to copy from the Web; I am more than happy to see it run on other sites.
- But the most interesting thing is the service principles.
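DoCIS generates its pages with XSLT. As a swapped-in illustration of the same idea, producing HTML directly from the metadata, the sketch below uses Python's xml.etree.ElementTree instead of XSLT. The input is a simplified, AMF-like record: the element names (text, title, hasauthor, person, name, file, url) are assumptions made for illustration and any real AMF namespace is omitted, so this should not be read as a description of the actual AMF schema or of the DoCIS scripts. render_page is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from html import escape


def render_page(record_xml: str) -> str:
    """Turn each <text> element of an AMF-like record into a small HTML page body."""
    root = ET.fromstring(record_xml)
    parts = ["<html><body>"]
    for text in root.iter("text"):
        title = escape(text.findtext("title", default="(untitled)"))
        # Collect every <name> below this text, i.e. the author names.
        authors = [escape(name.text or "") for name in text.iter("name")]
        parts.append(f"<h1>{title}</h1>")
        parts.append(f"<p>{', '.join(authors)}</p>")
        # One link per full-text URL.
        for url in text.iter("url"):
            href = escape(url.text or "", quote=True)
            parts.append(f'<p><a href="{href}">full text</a></p>')
    parts.append("</body></html>")
    return "\n".join(parts)
```

As with XSLT, the page is a pure function of the metadata record, so regenerating the whole site after a data update is just a matter of rerunning the transformation.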
53 construction transparency
- DoCIS is an open digital library service because it allows users to inspect exactly how the service runs
- DoCIS is built using open source software.
- There is a special interface, http://wotan.liu.edu/strip/docis/, that lets users see almost all internal files. Files that are not visible are specially documented.
- The hope is that it may be used for teaching purposes.
54 transportability
- Everything in DoCIS is built in such a way that it should be easy to move the service somewhere else and establish copies.
- The idea may not make a lot of technical sense, but it should increase the non-proprietary nature of the system.
- Note that this has not been tested.
55 usage transparency
- All usage is logged and the logs are made public.
- It is hoped that the logs can be used for digital library research.
- Ways will be found to aggregate usage across different physical installations.
56 open digital service
- DoCIS is an example of a new type of service where the source code of the library is open.
- It is an open library service.
- This contrasts favorably with the black-box approach of the commercial search engines.
57 to do list
- finish a version of Konz that recognizes HTML full text
- integrate DoCIS and DoIS
- finish conversion of DBLP to AMF
- open institutional registration for rclis
- open author registration for rclis
- open a NEP-like service for rclis
58 Am I crazy?
- Money does not make the world go round. Ideas do.
- When RMS proposed a free replacement for UNIX in the early 80s, most people dismissed the idea.
- Today it is reality!
- Similarly, when I started to work on RePEc, a totally free and improved A&I dataset, in 1993, nobody gave it a high probability of success.
- It is a reality!
59 obstacles to open libraries
- lack of imagination and entrepreneurship
- inability to form alliances
- user-centered thinking
- document-centered thinking
- technical competence required
- OAI-PMH
- XML and XML Schema
- Unicode
- the "C" word
60 http://openlib.org/home/krichel
collaboration is welcome!
- Thank you for your attention!