Designing for the Discipline: Open Libraries and Scholarly Communication

Provided by: kric2
Learn more at: http://openlib.org

1
Designing for the Discipline: Open Libraries and
Scholarly Communication
  • Thomas Krichel
  • 2005-05-20

2
about this talk
  • Three parts:
  • normative theory
  • RePEc history
  • rclis future ideas
  • And a final plea: all of this needs help.

3
scholarly communication
  • is mainly about scholars communicating
  • between themselves
  • to students, occasionally
  • Thus it is essentially a community activity
  • Traditionally, there have been two intermediaries
    acting as external agents.
  • libraries
  • publishers

4
when tradition ends
  • Two external shocks:
  • The Internet arrives and reduces distribution
    costs to zero.
  • Computer technology arrives and reduces
    storage costs somewhat.
  • The opportunity sets of community members and
    external agents increase.
  • Proposition: the future depends much on what the
    community members decide. External agents have
    little impact.

5
discipline communities
  • Scholars of various disciplines have varying
    habits of research, publication, and evaluation
  • It is likely that the Internet will emphasize
    those differences rather than reducing them.

6
examples: disciplines with established informal
publishing
  • Preprint communities
  • Physics → arxiv.org
  • Mathematics → arxiv.org, partially
  • Working paper communities
  • Computer Science → CiteSeer
  • (working papers are disappearing)
  • Economics → RePEc

7
change is tough
  • Change has to come from inside the discipline.
  • There has to be a pioneering individual who:
  • is technically well versed
  • is managerially smart
  • has extraordinary forward thinking
  • is willing to take considerable risk with her
    career
  • People like Ginsparg, Krichel, Giles, and Lawrence are rare.

8
and what about libraries?
  • Libraries get it systematically wrong. They:
  • concentrate on access
  • concentrate on readers
  • concentrate on documents
  • They need to:
  • move from access to impact
  • move from the reader to the writer
  • move from documents to people

9
RePEc
  • RePEc is a freely available digital library
    related to Economics.
  • It provides a partial evaluative
    database.
  • It is entirely run by a virtual organization of
    volunteers.
  • I am the person who got it started in 1993.
  • I will skip over the history.

10
RePEc principle
  • Many archives
  • archives offer metadata about digital objects
    (mainly working papers)
  • One database
  • The data from all archives forms one single
    logical database despite the fact that it is held
    on different servers.
  • Many services
  • users can access the data through many
    interfaces.
  • providers of archives offer their data to all
    interfaces at the same time. This provides for an
    optimal distribution.
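
A minimal sketch of the "many archives, one database, many services" principle, assuming each archive exposes a list of metadata records keyed by a unique handle. The archive names, handles, and function names below are hypothetical and are not taken from RePEc's actual software.

```python
from collections import defaultdict

# Hypothetical archives: each one holds its own metadata on its own server.
ARCHIVES = {
    "archive_a": [{"handle": "RePEc:aaa:wpaper:001", "title": "Paper One"}],
    "archive_b": [{"handle": "RePEc:bbb:wpaper:042", "title": "Paper Two"}],
}


def build_database(archives):
    """Merge the records of all archives into one logical database keyed by handle."""
    database = {}
    for archive_id, records in archives.items():
        for record in records:
            # Remember where each record came from.
            database[record["handle"]] = {**record, "archive": archive_id}
    return database


def list_titles(database):
    """One possible service: an alphabetical title listing."""
    return sorted(record["title"] for record in database.values())


def count_by_archive(database):
    """Another possible service: item counts per contributing archive."""
    counts = defaultdict(int)
    for record in database.values():
        counts[record["archive"]] += 1
    return dict(counts)


database = build_database(ARCHIVES)
print(list_titles(database))
print(count_by_archive(database))
```

Both services read the same merged data, so an archive that contributes its records once becomes visible in every interface at the same time.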

11
RePEc is based on 460 archives
  • WoPEc
  • EconWPA
  • DEGREE
  • S-WoPEc
  • NBER
  • CEPR
  • US Fed in Print
  • IMF
  • OECD
  • MIT
  • University of Surrey
  • CO PAH

12
to form a 312k item dataset
  • 153,000 working papers
  • 157,000 journal articles
  • 1,700 software components
  • 1,000 book and chapter listings
  • and the really important stuff:
  • 7,000 author contact and publication
    listings
  • 8,700 institutional contact listings

13
RePEc is used in many services
  • EconPapers
  • NEP: New Economics Papers
  • Inomics
  • RePEc author service
  • Z39.50 service by the DEGREE partners
  • IDEAS
  • RuPEc
  • EDIRC
  • LogEc
  • CitEc

14
institutional registration
  • This works through a system called EDIRC.
  • Christian Zimmermann started it as a list of
    departments that have a web site.
  • I persuaded him that his data would be more
    widely used if integrated into the RePEc
    database.
  • Now he is a crucial RePEc leader.

15
author registration
  • It started when funding allowed us to hire a
    student programmer to write an author
    registration system.
  • The system went online as "HoPEc" in late 2000.
  • It has been renamed "RePEc author service" (RAS).
  • In 2002, a grant from OSI allowed for a rewrite and
    expansion.

16
RePEc author service
  • RePEc document data has author names as strings.
  • The authors register with RAS to list contact
    details and identify the papers they wrote.
  • This is classic authority control, but done by the
    authors.
  • Currently one in three items in RePEc has at
    least one identified author.
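
The registration idea can be sketched as follows, assuming document records that carry author names only as strings and registration records in which each author lists the items they wrote. All identifiers and data here are hypothetical; this is not the RAS data model.

```python
# Document records carry author names only as plain strings,
# so "J. Smith" and "John Smith" cannot be matched automatically.
documents = {
    "doc1": ["J. Smith", "A. Jones"],
    "doc2": ["John Smith"],
    "doc3": ["B. Lee"],
}

# Hypothetical registrations: each registered author claims the items they wrote.
registrations = {
    "author:smith": {"name": "John Smith", "claimed": {"doc1", "doc2"}},
}


def identified_items(documents, registrations):
    """Return the items that have at least one registered (identified) author."""
    claimed = set()
    for registration in registrations.values():
        # Ignore claims on items that are not in the document database.
        claimed |= documents.keys() & registration["claimed"]
    return claimed


def identified_share(documents, registrations):
    """Fraction of items with at least one identified author."""
    return len(identified_items(documents, registrations)) / len(documents)


print(sorted(identified_items(documents, registrations)))
print(identified_share(documents, registrations))
```

Because the author makes the claim, the two different spellings of the same name end up linked to one person without any string matching.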

17
LogEc
  • It is a service by Sune Karlsson that tracks
    usage of items in the RePEc database
  • abstract views
  • downloads
  • Christian Zimmermann sends a monthly usage
    summary by mail to:
  • archive maintainers
  • RAS registrants

18
authors' incentives
  • Authors perceive the registration as a way to
    achieve common advertising for their papers.
  • Author records are used to aggregate usage logs
    across RePEc user services for all papers of an
    author.
  • This stimulates an "I am bigger than you are"
    mentality. Size matters!
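
A minimal sketch of aggregating usage across services for all papers of an author, assuming log rows of the form (service, item, abstract views, downloads) and a mapping from author records to claimed items. The row layout and all names are assumptions for illustration, not LogEc's actual format.

```python
from collections import Counter

# Hypothetical monthly usage rows: (service, item, abstract views, downloads).
usage_log = [
    ("service_a", "doc1", 10, 3),
    ("service_b", "doc1", 5, 1),
    ("service_a", "doc2", 7, 2),
    ("service_b", "doc3", 4, 4),
]

# Hypothetical author records: which items each registered author has claimed.
author_items = {
    "author:smith": {"doc1", "doc2"},
    "author:lee": {"doc3"},
}


def usage_by_author(usage_log, author_items):
    """Sum abstract views and downloads over all services, per author."""
    totals = {author: Counter() for author in author_items}
    for _service, item, views, downloads in usage_log:
        for author, items in author_items.items():
            if item in items:
                totals[author]["abstract_views"] += views
                totals[author]["downloads"] += downloads
    return totals


totals = usage_by_author(usage_log, author_items)
for author, counts in totals.items():
    print(author, dict(counts))
```

Per-author totals of this kind are what a monthly usage summary could report back to registrants.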

19
summary: keys to success
  • Have a small group of volunteers.
  • Disseminate as widely as possible.
  • Collect precise usage logs.
  • Demonstrate to authors and institutions that it
    works for them:
  • institutional registration
  • author registration

20
rclis
  • rclis stands for Research in Computing and
    Library and Information Science.
  • It is pronounced "reckless".
  • It is a RePEc clone.
  • It is my attempt to show that the same ideas that
    propel RePEc can also work in that area.

21
technical innovation
  • RePEc is built on attribute value templates.
  • rclis is built on a purpose-built format called
    the Academic Metadata Format (AMF).
  • I set up this format. It is tailor-made to suit
    the needs of rclis and RePEc.
  • There is some usage of AMF in RePEc:
  • RePEc OAI interface
  • ernad, the software feeding NEP

22
E-LIS
  • It is the largest LIS eprint archive on this
    planet.
  • It lives at http://eprints.rclis.org.
  • It contains over 2,400 documents.
  • It runs in Italy but uses a system of national
    editors to feed in material.
  • I am one of the US editors.

23
DoIS
  • DoIS is a service based on a Spanish LIS
    bibliography.
  • It used to run at Manchester Computing but moved
    to http://wotan.liu.edu/dois when JISC
    regulations forced us to leave.
  • It contains 13k records, 9k with free full text,
    but the data has many errors.

24
using already existing resources
  • There is already a very large computer science
    bibliography called DBLP, see
    http://dblp.uni-trier.de
  • The data has no abstracts. It has some full-text
    links, mainly to toll-gated sites.
  • I have done work to convert parts of it to AMF.
  • I am now searching for free full-text versions of
    the papers anywhere on the Web. This is the
    Konz project.

25
the Konz project
  • Current state:
  • I use the Google API to search for titles.
  • I examine responses and download pages.
  • I scan the pages for PDF and Word files.
  • I examine the text in the file to find the title.
  • Limitations:
  • only PDF and Word full text is recognized
  • conference paper data is still being processed
  • significant hardware and disk problems
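
Two of the steps above, scanning a page for PDF and Word links and checking whether a file's text contains the title, can be sketched as below. This is an illustration under assumptions, not the Konz code: the search and download steps are left out, the file extensions checked are a guess, and all function names are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class FileLinkParser(HTMLParser):
    """Collect links that look like PDF or Word files (judged by extension only)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith((".pdf", ".doc")):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def find_file_links(html, base_url):
    """Return absolute URLs of candidate full-text files found in a page."""
    parser = FileLinkParser(base_url)
    parser.feed(html)
    return parser.links


def normalize(text):
    """Lowercase and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def contains_title(file_text, title):
    """Check whether the normalized title occurs in the normalized file text."""
    return normalize(title) in normalize(file_text)


page = '<a href="papers/p1.pdf">paper</a> <a href="/about.html">about</a>'
print(find_file_links(page, "http://example.org/home/"))
print(contains_title("An  Example-Title:\nof something", "An example title"))
```

Matching on normalized text tolerates differences in case, line breaks, and punctuation between the bibliographic title and the extracted file text, at the cost of occasional false matches on short titles.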

26
DoCIS
  • Konz currently finds free versions for 25k
    papers out of 98k searched.
    Not particularly exciting.
  • This data is integrated with DBLP AMF data and
    the result forms a new service called DoCIS.
  • DoCIS lives at http://wotan.liu.edu/docis

27
DoCIS service
  • DoCIS is implemented in mod_perl with swish and
    is therefore very fast.
  • The web pages are written by XSLT scripts
    directly from the AMF data.
  • The service is available to copy from the web; I
    am more than happy for it to run on other sites.
  • But the most interesting things are the service
    principles.

28
construction transparency
  • DoCIS is an open digital library service because
    it allows users to inspect exactly how the
    service runs.
  • DoCIS is built using open source software.
  • There is a special interface,
    http://wotan.liu.edu/strip/docis/, that allows users
    to see almost all internal files. Non-visible
    files are specially documented.
  • The hope is that it may be used for teaching
    purposes.

29
transportability
  • Everything in DoCIS is built in such a way that
    it should be easy to move the service somewhere
    else and establish copies.
  • The ideas may not make a lot of technical sense
    but they should increase the non-proprietary nature
    of the system.
  • Note that this has not been tested :-)

30
usage transparency
  • All usage is logged and the logs are made public.
  • It is hoped that the logs can be used for
    digital library research.
  • Ways will be found to aggregate usage on
    different physical installations.

31
to do list
  • finish a version of Konz that recognizes HTML
    full text
  • integrate DoCIS and DoIS
  • finish conversion of DBLP to AMF
  • open institutional registration for rclis
  • open author registration for rclis
  • open a NEP-like service for rclis

32
http://openlib.org/home/krichel
collaboration is welcome!
  • Thank you for your attention!