Applying open source ideas in digital libraries: the examples of RePEc and rclis



1
Applying open source ideas in digital libraries:
the examples of RePEc and rclis
  • Thomas Krichel
  • 2005-06-01

2
who is me?
  • I was an economist.
  • I was a leisure digital librarian.
  • NetEc since 1993
  • RePEc since 1997
  • I am "just another Perl hacker"
  • I am a visionary
  • but I'm not like St. John the Baptist

3
who is he?
4
he is "St. IGNUcius"
  • A humorous creation of Richard M. Stallman
    (RMS)
  • RMS is the father of the free software movement
  • a geek
  • a visionary
  • St. IGNUcius shows an emphasis on the moral case
    for free software, rather than the business case

5
moral case and business case
  • Other folks in the free software movement avoid
    the "f" word
  • free can mean cheap
  • cheap can mean bad
  • They stress the business case of free software
  • They use the term "open source software" (OSS)

6
RMS and us
  • Amen, I tell you we librarians need to learn
    more from the OSS movement.
  • We need to make the concepts coming out of free
    software more a part of our business.
  • Let us look at a key concept: free software.

7
free software according to RMS
  • Free software comes with four freedoms
  • The freedom to run the software, for any purpose
  • The freedom to study how the program works, and
    adapt it to your needs
  • The freedom to redistribute copies so you can
    help your neighbor
  • The freedom to improve the program, and release
    your improvements to the public, so that the
    whole community benefits

8
what has this to do with us?
  • Just replace free software with free information.
    Libraries are about free information.
  • But the analogy is not quite as simple.
  • When we talk about free information, we usually
    mean things that we can freely read (download):
    free as in zero cost.
  • We do not usually mean free information as
    information we are free to do things with: free
    as in freedom.

9
moral and business
  • There is a moral case for free information.
  • We rely on it.
  • There is a business case for free information.
  • We need to make our own.

10
we rely on the moral case
  • The citizen should be informed
  • Individuals in the organization should have free
    access
  • This is how we justify resources given to us.
  • Often, members of the community who pay get
    privileged access.

11
from moral case to business case
  • To form the business case for free information,
    think of "free information" as "freedom to do
    things" rather than zero cost.
  • Thus libraries can make a crucial business case
    for themselves as agents that transform information.
  • Recall that there are whole industries out there
    that produce free information.

12
Now for something different
  • RePEc is an example of an Open Library.
  • An Open Library is loosely defined as an application
    of the OSS principles to libraries.
  • vague
  • in the making
  • but has some history
  • Looking at RePEc will fix ideas.

13
History
  • It started with me as a research assistant in
    the Economics Department of Loughborough
    University of Technology in 1990.
  • A predecessor of the Internet allowed me to
    download free software without effort,
  • but academic papers had to be gathered in a
    painful way.

14
CoREJ
  • published by HMSO
  • Photocopied contents tables of recently
    published economics journals received at the
    Department of Trade and Industry
  • Typed list of the working papers recently
    received by the University of Warwick
    library
  • The latter was the more interesting.

15
working papers
  • early accounts of research findings
  • published by economics departments
  • in universities
  • in research centers
  • in some government offices
  • in multinational administrations
  • disseminated through exchange agreements
  • important because of the 4-year publishing delay

16
1991-1992
  • I planned to circulate the Warwick working paper
    list over listserv lists
  • I argued it would be good for them
  • increase incentives to contribute
  • increase revenue for ILL
  • After many trials, Warwick refused.
  • Towards the end of that time, I was offered a
    lectureship and decided to start working on my own
    collection.

17
1993: BibEc and WoPEc
  • Fethy Mili of Université de Montréal had a good
    collection of papers and gave me his data.
  • I put his bibliographic data on a gopher and
    called the service "BibEc"
  • I also gathered the first ever online electronic
    working papers on a gopher and called the service
    "WoPEc".

18
NetEc consortium
  • BibEc: printed papers
  • WoPEc: electronic papers
  • CodEc: software
  • WebEc: web resource listings
  • JokEc: jokes
  • HoPEc
  • a lot of Ec!

19
WoPEc to RePEc
  • WoPEc was a catalog record collection
  • WoPEc remained the largest web access point
  • but getting contributions was tough
  • In 1996 I wrote the basic architecture for RePEc.
  • ReDIF
  • Guildford Protocol

20
creation of RePEc
  • It came about when I finally got one other
    partner, the Dutch DEGREE project, a library-led
    consortium for working paper publication.
  • I also had a contact in Sweden called Sune
    Karlsson for whom I was instrumental in securing
    funding for a Swedish version of WoPEc called
    S-WoPEc.
  • I put together a protocol that would allow us to
    work together.

21
1997: RePEc principle
  • Many archives
  • archives offer metadata about digital objects
    (mainly working papers)
  • One database
  • The data from all archives forms one single
    logical database despite the fact that it is held
    on different servers.
  • Many services
  • users can access the data through many
    interfaces.
  • providers of archives offer their data to all
    interfaces at the same time. This provides for an
    optimal distribution.
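
The "many archives, one database, many services" principle can be sketched in a few lines. This is only an illustration, not the RePEc software: the record layout, the function names, the archive code `xxx` and its records are invented for the example, and the rule that an archive may only describe items under its own handle prefix is an assumption made here.

```python
from typing import Dict, List

Record = Dict[str, str]


def merge_archives(archives: Dict[str, List[Record]]) -> Dict[str, Record]:
    """Build one logical database, keyed by handle, from many archives."""
    database: Dict[str, Record] = {}
    for archive_code, records in archives.items():
        prefix = f"RePEc:{archive_code}:"
        for record in records:
            handle = record["handle"]
            # Assumption: an archive only speaks for handles under its own code.
            if handle.startswith(prefix):
                database[handle] = record
    return database


def listing_service(database: Dict[str, Record]) -> List[str]:
    """One user service: an alphabetical title listing."""
    return sorted(record["title"] for record in database.values())


def search_service(database: Dict[str, Record], word: str) -> List[str]:
    """Another user service over the same data: title keyword search."""
    word = word.lower()
    return sorted(h for h, r in database.items() if word in r["title"].lower())


# Two archives held on different servers; "xxx" and its records are hypothetical.
archives = {
    "sur": [{"handle": "RePEc:sur:surrec:9601",
             "title": "Dynamic Aspect of Growth and Fiscal Policy"}],
    "xxx": [{"handle": "RePEc:xxx:wpaper:0001", "title": "A hypothetical paper"},
            {"handle": "RePEc:sur:surrec:9999", "title": "Misfiled record"}],
}
database = merge_archives(archives)
```

Both services read the same merged data, so an archive that adds a record reaches every interface at once.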

22
RePEc is based on 440 archives
  • WoPEc
  • EconWPA
  • DEGREE
  • S-WoPEc
  • NBER
  • CEPR
  • US Fed in Print
  • IMF
  • OECD
  • MIT
  • University of Surrey
  • CO PAH

23
to form a 300k item dataset
  • 146,000 working papers
  • 154,000 journal articles
  • 1,600 software components
  • 900 book and chapter listings
  • 6,400 author contact and publication
    listings
  • 8,400 institutional contact listings

24
RePEc is used in many services
  • EconPapers
  • NEP: New Economics Papers
  • Inomics
  • RePEc author service
  • Z39.50 service by the DEGREE partners
  • IDEAS
  • RuPEc
  • EDIRC
  • LogEc
  • CitEc

My concern is NEP, a human-mediated current
awareness service for RePEc. This could be the
subject of a more academic talk.

25
describes documents
  • Template-Type: ReDIF-Paper 1.0
  • Title: Dynamic Aspect of Growth and Fiscal Policy
  • Author-Name: Thomas Krichel
  • Author-Person: RePEc:per:1965-06-05:thomas_krichel
  • Author-Email: T.Krichel_at_surrey.ac.uk
  • Author-Name: Paul Levine
  • Author-Email: P.Levine_at_surrey.ac.uk
  • Author-WorkPlace-Name: University of Surrey
  • Classification-JEL: C61 E21 E23 E62 O41
  • File-URL: ftp://www.econ.surrey.ac.uk/pub/RePEc/sur/surrec/surrec9601.pdf
  • File-Format: application/pdf
  • Creation-Date: 199603
  • Revision-Date: 199711
  • Handle: RePEc:sur:surrec:9601
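
To show how little machinery such an attribute-value template needs, here is a minimal reader. It is a sketch, not the RePEc tooling: it assumes each field is written as `Attribute: value` on its own line, treats attribute names case-insensitively, and assumes that a line starting with whitespace continues the previous value. It also flattens the record, so the grouping of each `Author-Email` with its `Author-Name` is lost.

```python
from typing import Dict, List, Optional


def parse_template(text: str) -> Dict[str, List[str]]:
    """Read one template into {lowercased attribute: [values in order]}."""
    fields: Dict[str, List[str]] = {}
    last: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and last is not None:
            # Assumed convention: an indented line continues the previous value.
            fields[last][-1] += " " + line.strip()
            continue
        name, separator, value = line.partition(":")
        if not separator:
            continue  # not an attribute line
        last = name.strip().lower()
        fields.setdefault(last, []).append(value.strip())
    return fields


SAMPLE = """Template-Type: ReDIF-Paper 1.0
Title: Dynamic Aspect of Growth
 and Fiscal Policy
Author-Name: Thomas Krichel
Author-Name: Paul Levine
File-URL: ftp://www.econ.surrey.ac.uk/pub/RePEc/sur/surrec/surrec9601.pdf
Handle: RePEc:sur:surrec:9601
"""

record = parse_template(SAMPLE)
```

Splitting on the first colon only keeps values that contain colons themselves, such as handles and URLs, intact.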

26
describes persons (RAS)
  • template-type: ReDIF-Person 1.0
  • name-full: MANKIW, N. GREGORY
  • name-last: MANKIW
  • name-first: N. GREGORY
  • handle: RePEc:per:1984-06-16:N__GREGORY_MANKIW
  • email: ngmankiw_at_harvard.edu
  • homepage: http://post.economics.harvard.edu/faculty/mankiw/mankiw.html
  • workplace-institution: RePEc:edi:deharus
  • workplace-institution: RePEc:edi:nberrus
  • Author-Article: RePEc:aea:aecrev:v:76:y:1986:i:4:p:676-91
  • Author-Article: RePEc:aea:aecrev:v:77:y:1987:i:3:p:358-74
  • Author-Article: RePEc:aea:aecrev:v:78:y:1988:i:2:p:173-77
  • ...

27
describes institutions
  • Template-Type: ReDIF-Institution 1.0
  • Primary-Name: University of Surrey
  • Primary-Location: Guildford
  • Secondary-Name: Department of Economics
  • Secondary-Phone: (01483) 259380
  • Secondary-Email: economics_at_surrey.ac.uk
  • Secondary-Fax: (01483) 259548
  • Secondary-Postal: Guildford, Surrey GU2 5XH
  • Secondary-Homepage: http://www.econ.surrey.ac.uk/
  • Handle: RePEc:edi:desuruk

28
institutional registration
  • This works through a system called EDIRC.
  • Christian Zimmermann started it as a list of
    departments that have a web site.
  • I persuaded him that his data would be more
    widely used if integrated into the RePEc
    database.
  • Now he is a crucial RePEc leader.

29
author registration
  • It started when funding allowed us to hire a
    student programmer to write an author
    registration system.
  • The system went online as "HoPEc" in late 2000.
  • It has been renamed "RePEc author service" (RAS)
  • In 2002, a grant from OSI allowed for a rewrite and
    expansion.

30
RePEc author service
  • RePEc document data has author names as strings.
  • The authors register with RAS to list contact
    details and identify the papers they wrote.
  • This is classic authority control, but done by the
    authors.
  • Currently one in three items in RePEc has at
    least one identified author
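
A figure like "one in three" follows directly from the person records, since each registered author lists the handles of the items they claim. The sketch below assumes that layout; the function name and the handle `RePEc:xxx:wpaper:0001` are invented for the example, and the tiny data set is illustrative only.

```python
from typing import Dict, Iterable, Set


def identified_share(document_handles: Iterable[str],
                     claims: Dict[str, Set[str]]) -> float:
    """Fraction of documents claimed by at least one registered author."""
    documents = list(document_handles)
    if not documents:
        return 0.0
    # Union of all handles claimed in any person record.
    claimed: Set[str] = set().union(*claims.values())
    identified = sum(1 for handle in documents if handle in claimed)
    return identified / len(documents)


# Person handle -> handles of the items that person has claimed.
claims = {
    "RePEc:per:1984-06-16:N__GREGORY_MANKIW": {
        "RePEc:aea:aecrev:v:76:y:1986:i:4:p:676-91",
    },
}
documents = [
    "RePEc:aea:aecrev:v:76:y:1986:i:4:p:676-91",
    "RePEc:sur:surrec:9601",
    "RePEc:xxx:wpaper:0001",  # hypothetical handle
]
share = identified_share(documents, claims)
```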

31
LogEc
  • It is a service by Sune Karlsson that tracks
    usage of items in the RePEc database
  • abstract views
  • downloads
  • Christian Zimmermann sends a monthly usage
    summary by mail to
  • archive maintainers
  • RAS registrants

32
authors' incentives
  • Authors perceive the registration as a way to
    achieve common advertising for their papers.
  • Author records are used to aggregate usage logs
    across RePEc user services for all papers of an
    author.
  • Stimulates an "I am bigger than you are"
    mentality. Size matters!
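
Aggregating usage across services for all papers of an author is a small join between the usage logs and the author records. The sketch below is an assumption about the shape of the data, not LogEc's actual code: each service reports a count per item handle, each person record lists claimed handles, and all handles and numbers are made up.

```python
from collections import Counter
from typing import Dict, Set


def usage_by_author(service_logs: Dict[str, Dict[str, int]],
                    claims: Dict[str, Set[str]]) -> Dict[str, int]:
    """Sum item-level usage over all services, then over each author's items."""
    per_item: Counter = Counter()
    for log in service_logs.values():
        per_item.update(log)  # adds the counts of this service to the totals
    return {person: sum(per_item[handle] for handle in handles)
            for person, handles in claims.items()}


# Downloads per item as reported by two user services (invented numbers).
service_logs = {
    "IDEAS": {"item:a": 3, "item:b": 1},
    "EconPapers": {"item:a": 2},
}
# Hypothetical person records and the items they claim.
claims = {"person:1": {"item:a", "item:b"}, "person:2": {"item:c"}}
totals = usage_by_author(service_logs, claims)
```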

33
recently
  • In 2004, Peter Jacso compared RePEc services with
    the EconLit proprietary professional database.
  • IDEAS and LogEc were Peter's pick.
  • EconLit was Peter's pan.
  • He slammed the working paper coverage of EconLit.
  • He could have slammed other things.

34
RePEc / EconLit partnership
  • RePEc now delivers all its working paper data to
    EconLit, without getting the journal data of
    EconLit in return.
  • This may seem absolutely perverse! A bunch of
    volunteers laboring for a multi-million
    concern!
  • In fact it serves RePEc well because it adds
    officialdom.

35
summary until here
  • We are talking about an open library as a
    collaboratory for the creation of large
    bibliographic aggregates.
  • Thus we are mainly about the supply of data,
    rather than of services. This is one limitation.
    I will come to this later.
  • RePEc only works for economics. This is
    another limitation. I will talk about this now.

36
scholarly communication
  • is mainly about scholars communicating
  • between themselves
  • to students, occasionally
  • thus it is essentially a community activity
  • traditionally, there have been two intermediaries
    acting as external agents.
  • libraries
  • publishers

37
when tradition ends
  • Two external shocks
  • There comes the Internet and reduces distribution
    costs to zero
  • There comes computer technology and reduces
    storage costs somewhat
  • The opportunity sets of community members and
    external agents increase
  • Proposition: the future depends much on what the
    community members decide. External agents have
    little impact.

38
discipline communities
  • Scholars of various disciplines have varying
    habits of research, publication, and evaluation
  • It is likely that the Internet will emphasize
    those differences rather than reducing them.

39
examples: disciplines with established informal
publishing
  • Preprint communities
  • Physics → arxiv.org
  • Mathematics → arxiv.org, partially
  • Working paper communities
  • Computer Science → CiteSeer
  • (working paper disappearing)
  • Economics → RePEc

40
change is tough
  • Change has to come from inside the discipline.
  • There has to come a pioneering individual who
  • is technically well versed
  • is managerially smart
  • has extraordinary forward thinking
  • is willing to take considerable risk with her
    career
  • Ginsparg, Krichel, Giles/Lawrence are rare

41
and what about libraries?
  • Libraries do it systematically wrong
  • concentrate on access
  • concentrate on readers
  • concentrate on documents
  • They need to
  • move from access to impact
  • move from the reader to the writer
  • move from documents to people

42
example: the institutional repository
  • The name is as attractive as a prison toilet
  • They have been set up in many universities but
    remain empty
  • They imply a top-down, Stalin-style
    centralization
  • They are resisted, as is any interference by
    administration in departmental affairs
  • They are set up for general purposes, and end up
    pleasing nobody.

43
despite that: minimal commonality
  • Every discipline has some form of more informal
    communication. Often this means conferences.
  • Every discipline needs some formal evaluation
  • peer-review
  • overall personal review
  • This cannot be done by computer and needs human
    input.

44
rclis
  • rclis stands for Research in Computing and
    Library and Information Science.
  • It is pronounced "reckless".
  • It is a RePEc clone.
  • It is my attempt to show that the same ideas that
    propel RePEc can also work in that area.

45
technical innovation
  • RePEc is built on attribute-value templates.
  • rclis is built on a purpose-built format called
    the Academic Metadata Format (AMF).
  • I set up this format. It is tailor-made to suit
    the needs of rclis and RePEc.
  • There is some usage of AMF in RePEc
  • RePEc OAI interface
  • ernad, the software feeding NEP

46
E-LIS
  • It is the largest LIS eprint archive on this
    planet.
  • It lives at http://eprints.rclis.org.
  • It contains over 2000 papers.
  • It runs in Italy but uses a system of national
    editors to feed in material.

47
DoIS
  • DoIS is a service based on a Spanish LIS
    bibliography.
  • It used to run at Manchester Computing but,
    because of JISC regulations, had to move to
    http://wotan.liu.edu/dois
  • It contains 13k records, 9k with free full text,
    but the data has many errors.

48
using already existing resources
  • There is already a very large computer science
    bibliography called DBLP, see http://dblp.uni-trier.de
  • The data has no abstracts. It has some full-text
    links, mainly to toll-gated sites.
  • I have done work to convert parts of it to AMF.
  • I am now checking whether free full-text versions
    of the papers exist anywhere on the Web. This is
    the Konz project.

49
the Konz project
  • Current state
  • I use the Google API to search for titles.
  • I examine responses and download pages.
  • I scan the pages for PDF and Word files.
  • I examine the text in the file to find the title.
  • Limitations
  • PDF and Word full text only
  • conference paper data still being processed
  • significant hardware and disk problems.
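
The last step of the current state, checking whether a downloaded file really is the paper, can be sketched as a normalized title match. This is an illustration only, not Konz's code: the search API call and the PDF or Word text extraction are left out, and the function names and the choice to look only at the first 2000 characters are assumptions.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse punctuation and whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def contains_title(title: str, document_text: str, head_chars: int = 2000) -> bool:
    """True if the normalized title occurs near the start of the document text."""
    needle = normalize(title)
    if not needle:
        return False
    haystack = normalize(document_text[:head_chars])
    # Pad with spaces so that only whole-word matches count.
    return f" {needle} " in f" {haystack} "


title = "Dynamic Aspect of Growth and Fiscal Policy"
extracted = "Dynamic  Aspect of Growth\nand Fiscal Policy\n\nThomas Krichel, Paul Levine"
is_match = contains_title(title, extracted)
```

Normalizing both sides makes the match tolerant of the line breaks and spacing that text extraction tends to introduce.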

50
Khabarovsk proposal
  • There is a generic possibility of building
    full-text links out of bibliographic records
    using search engines.
  • The authoritative bibliographic record can be
    used as a container to hold other objects that
    have a relationship to the paper
  • full-text instance
  • display page
  • comment
  • cv of author etc
  • See http://openlib.org/home/krichel/proposals/khabarovsk.pdf
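
The idea of the bibliographic record as a container can be pictured as a small data structure. The class names, relation labels, handle, and URLs below are invented for this sketch; they are not taken from the proposal or from AMF.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RelatedObject:
    relation: str  # e.g. "full-text", "display-page", "comment", "author-cv"
    url: str


@dataclass
class BibliographicRecord:
    """Authoritative record that holds links to related objects."""
    handle: str
    title: str
    related: List[RelatedObject] = field(default_factory=list)

    def attach(self, relation: str, url: str) -> None:
        """Add a related object unless the same link is already present."""
        candidate = RelatedObject(relation, url)
        if candidate not in self.related:
            self.related.append(candidate)

    def links(self, relation: str) -> List[str]:
        """All URLs attached under one relation type."""
        return [obj.url for obj in self.related if obj.relation == relation]


# Hypothetical record and URLs.
paper = BibliographicRecord("example:paper:1", "A hypothetical paper")
paper.attach("full-text", "http://example.org/paper1.pdf")
paper.attach("full-text", "http://example.org/paper1.pdf")  # duplicate, ignored
paper.attach("display-page", "http://example.org/paper1.html")
```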

51
DoCIS
  • Konz currently finds free versions for 25k of
    the 98k papers searched. Not particularly
    exciting.
  • This data is integrated with DBLP AMF data and
    the result forms a new service called DoCIS.
  • DoCIS lives at
  • http://wotan.liu.edu/docis

52
DoCIS service
  • DoCIS is implemented in mod_perl with swish and
    is therefore very fast.
  • The web pages are written by XSLT scripts
    directly from the AMF data.
  • The service is available to copy from the web. I
    am more than happy to run it on other sites.
  • But the most interesting things are the service
    principles.

53
construction transparency
  • DoCIS is an open digital library service because
    it allows users to inspect exactly how the
    service runs
  • DoCIS is built using open source software.
  • There is a special interface, http://wotan.liu.edu/strip/docis/,
    that lets users see almost all internal files.
    Non-visible files are specially documented.
  • The hope is that it may be used for teaching
    purposes.

54
transportability
  • Everything in DoCIS is built in such a way that
    it should be easy to move the service somewhere
    else and establish copies.
  • The idea may not make a lot of technical sense
    but it should increase the non-proprietary nature
    of the system.
  • Note that this has not been tested.

55
usage transparency
  • All usage is logged and the logs are made public.
  • It is hoped that the logs can be used for
    digital library research.
  • Ways will be found to aggregate usage on
    different physical installations.

56
open digital service
  • DoCIS is an example of a new type of service
    where the source code of the library is openly
    available.
  • It is an open library service.
  • This contrasts favorably with the black box
    approach of the commercial search engines.

57
to do list
  • finish a version of Konz that recognizes HTML
    full text
  • integrate DoCIS and DoIS
  • finish conversion of DBLP to AMF
  • open institutional registration for rclis
  • open author registration for rclis
  • open a NEP-like service for rclis

58
Am I crazy?
  • Money does not make the world go round. Ideas do.
  • When RMS proposed a free replacement for UNIX in
    the early 80s, most people dismissed the idea.
  • Today it is reality!
  • Similarly, when I started to work on RePEc, a
    totally free and improved A&I dataset, in 1993,
    nobody gave it a high probability of success.
  • It is a reality!

59
obstacles to open libraries
  • lack of imagination and entrepreneurship
  • inability to form alliances
  • user-centered thinking
  • document-centered thinking
  • technical competence required
  • OAI-PMH
  • XML and XML Schema
  • Unicode
  • the "C" word

60
http://openlib.org/home/krichel
collaboration is welcome!
  • Thank you for your attention!