Title: The Contribution of the Learned Societies to Building Integrated Scientific Information Systems from the Perspective of Mathematics
1 The Contribution of the Learned Societies to Building Integrated Scientific Information Systems from the Perspective of Mathematics
- Joachim Lügger
- Konrad-Zuse-Zentrum für Informationstechnik Berlin
- META-LIB Workshop, 22-23 June 1998
- SUB Göttingen
2 Some Organisational Characteristics of the Distributed Information System for Mathematics
- Involve scientific libraries, providers of specialised information, publishers, special interest groups
- Proceed user-oriented: find out what users want
- Plan for an infrastructure for mathematics and extend it (information officers, "Informationsbeauftragte", in all departments/institutes)
- Communicate widely: look for technical co-operation
- Exchange experience with related disciplines (IuK)
- ...
- Development of a news service (short notes) and a related information store ordered by collections, classifications,...
3 Elements of the DMV Plan (1993-95) for a Distributed Information System for Mathematics
- Offer of electronic preprints and reports by authors
- Development of a model for electronic journals
- Availability of research software: sources and test data
- Distribute descriptions of research projects early
- Build an electronic mathematical museum (multimedia)
- ...
- Usage of electronic communication (e-mail, ftp, gopher)
- Organisation of a distributed information infrastructure
- Development of some mechanism for search and retrieval in distributed and heterogeneous data collections.
4 User Oriented Services
- Local: Collections of Documents
- offer of all (!) relevant mathematical information
- browsing and querying of hierarchically ordered collections
- utilisation of stringent classifications by authors (MSC, GAMS,...)
- National: Collections of Collections
- integration/centralisation of collection oriented services
- news service (distribution of "first pages")
- central store of news first-pages, which can be browsed and searched (switching different views)
5 Spectrum of Mathematical Information
- Digital publications, preprints, reports
- Electronic teaching materials (Exercises, ..., Applications)
- Electronic announcements of (local) events and talks
- Mathematical software, data collections
- Electronic information on projects and research groups
- Contact addresses (E-mail, Phone, Fax, ..., Homepages)
- Digitised collections of historical materials (Manuscripts)
- "Mathematical Museum" (Multimedia, ..., Visualisations)
- Collections of links to other relevant resources
6 Origins of Metadata
- NSF/NASA/ARPA Digital Library projects
- Maps, images, geospatial data
- Journals, books, general scientific information
- Environmental databases, agricultural data
- Videos, computer vision materials
- Governmental archives and data
- OCLC/NCSA (USA) and UKOLN (UK)
- Document delivery and supercomputing
- Library and information sciences
- Internet, WWW and search engines
- National Libraries, Museums, Cultural Heritage
- Preservation of documents
- Pictures, images (original art and digitisations)
- Natural artefacts and artificial objects
7 Metadata: Data about Data
- Improving resource discovery in networks
- Description of resources
- Automatic discovery, indexing and retrieval
- Interoperability of digital libraries
- Combination of digital resources
- Integration of heterogeneous databases
- Interoperability of information systems
- Wide accessibility of catalogue information
- Opportunities for interdisciplinary collaboration
- Geospatial activities and environmental initiatives
- Human genome project and medicine
- Electronic publishing and education
- Visual Arts and Information Sciences
Metadata is Information that makes data useful.
The focus is not on technologies or libraries but
rather on usability and utilisation.
8 Examples of Metadata Formats
- Libraries
- USMARC: US Machine-Readable Cataloging
- MAB: Maschinelles Austauschformat für Bibliotheken (machine exchange format for libraries)
- Humanities
- TEI: Text Encoding Initiative
- EAD: Encoded Archival Description
- Geospatial resources
- FGDC: Content Standard for Digital Geospatial Metadata
- Museums
- CIMI: Consortium for the Computer Interchange of Museum Information
- Government
- GILS: Government Information Locator Service
- Internet resources
- IAFA: Internet Anonymous FTP Archives
- SOIF: Summary Object Interchange Format (Harvest)
9 Do Metadata Schemes Have Something in Common?
[Diagram labels: Libraries; Internet/WWW; Museums; Astronomy; Dublin Core; Humanities; Geospatial; Environment; Government]
- Research is more and more inter- and even transdisciplinary.
- Information exchange between science and society is essential.
- Cataloguing communities are creating their own metadata methodologies according to their habits, needs and uses.
10 Basic Dublin Core Design Principles
"The discussion was further restricted to the metadata elements for the discovery of what we called document like objects, or DLOs by the workshop participants." (Weibel-95 on DC-1)
DC-1, March 1995, Dublin, Ohio: OCLC/NCSA Metadata Workshop
DC-5, October 1997, Helsinki: OCLC/Nat. Library of Finland
- Intrinsicality
- Extensibility
- Syntax Independence
- Optionality
- Repeatability
- Modifiability
- Simplicity
- Semantic Interoperability
- International Consensus
- Flexibility
"Back in 1995 we focused on providing authors with the ability to supply metadata ... . This is happening, but not as much as we expected; most metadata is being created by cataloguers, or by information professionals we wouldn't quite call cataloguers, or by other non-authorial agents." (Caplan-97)
11 The Dublin Core Metadata Element Set
http://purl.org/metadata/dublin_core_elements
Content
- 1. Title *)
- 3. Subject and Keywords
- 4. Description
- 11. Source
- 12. Language
- 13. Relation
- 14. Coverage
Intellectual Property
- 2. Author or Creator
- 5. Publisher
- 6. Other Contributor
- 15. Rights Management
Instantiation
- 7. Date
- 8. Resource Type
- 9. Format
- 10. Resource Identifier
*) Element numbers as given in the definition.
The Dublin Core is built around the library metaphor, i.e. the catalogue card.
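The grouping of the fifteen elements can be written down as a small lookup table. The following is only a sketch: the short labels (Creator, Contributor, Rights, Type, Identifier) are assumed from the dotted DC.* notation used on later slides, and the function name `group_of` is hypothetical.

```python
from typing import Dict, Optional

# The 15 elements grouped as on the slide; keys are the element numbers
# "as given in the definition". Short labels are an assumption.
DC_ELEMENTS: Dict[str, Dict[int, str]] = {
    "Content": {1: "Title", 3: "Subject", 4: "Description", 11: "Source",
                12: "Language", 13: "Relation", 14: "Coverage"},
    "Intellectual Property": {2: "Creator", 5: "Publisher",
                              6: "Contributor", 15: "Rights"},
    "Instantiation": {7: "Date", 8: "Type", 9: "Format", 10: "Identifier"},
}


def group_of(name: str) -> Optional[str]:
    """Return the group of a dotted name such as 'DC.Creator.PersonalName'."""
    parts = name.split(".")
    if len(parts) < 2 or parts[0] != "DC":
        return None
    for group, elements in DC_ELEMENTS.items():
        if parts[1] in elements.values():
            return group
    return None  # not one of the 15 core elements


print(group_of("DC.Creator.PersonalName"))
```

Only the first component after "DC." decides the group, so subelements and qualifiers fall into the group of their parent element.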
12 Collections of the Math-Net Project
- Preprints and Published Articles
- Teaching Materials
- Talks
- Mathematical Software, Data Collections
- Projects and Research Groups
- Talks, Lectures
- Personal Homepages
13 DC Element: Title
- The name given to the resource, usually by the Creator or Publisher.
- DC.Title
- SUBELEMENTS
- DC.Title.Alternative (used for any titles other than the main title, including subtitle, etc.)
The qualifier LANG may be important for the Title element.
14 1. Title
- The name given to the resource, usually by the Creator or Publisher.
- DC.Title
- Preprints: complete title of the article
- Teaching: complete title of the work
- Talks: title of the talk
- Software: name of the software/source
- Projects: name of the project/group
- Personal: n.a.
No subtitles, no special schemes or qualifiers are used.
15 2. Author or Creator
- The person or organisation primarily responsible for creating the intellectual content of the resource.
- DC.Creator: n.a.
- DC.Creator.PersonalName
- Preprints: last name, first name,... (no title)
- Teaching: last name, first name,... (?)
- Talks: last name, first name,... (?)
- Software: last name, first name,...
- Projects: name of the head of the group
- Personal: last name, first name,... (no title)
- DC.Creator .Email  Preprints: e-mail address
- .PersonalName.Address  Teaching: e-mail address
- .Address  Talks: e-mail, fax, phone, office address
- .Email  Software: e-mail address
- .PersonalName.Address  Projects: e-mail address
- .PersonalName.Address  Personal: e-mail, fax, phone, office address
16 DC Problem: A Name and its Notation (I)
- you need a scheme to write and to interpret it automatically
- Grötschel, Prof. Dr. M.
- Prof. Dr. Martin Grotschel
- Martin Gr\"otschel
- Gr&ouml;tschel, M., Prof. Dr.
- you must write correctly if you want alphabetic lists
- you need a coding convention (incl. accents, vowels, etc.)
- There is no provision of a universal coding scheme; e.g., LCNAF (LOC's Name Authority File) is community specific.
- you need subelements for proper discrimination in searches
- DC.Creator ...
- DC.Creator.PersonalName ...
- DC.Creator.CorporateName ...
- DC.Creator.PersonalName.Address ...
- DC.Creator.CorporateName.Address ...
Who is the creator in case of a digitised
manuscript from Gauß? Is it the person who
digitised it or Gauß?
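A minimal sketch of the coding-convention problem: the HTML-entity and TeX spellings of the same name are decoded into one Unicode form, and an accent-free key is derived for alphabetic lists. The function names and the tiny TeX table are assumptions for illustration; a real convention would also have to handle titles, name order, and national sorting rules (German lists often file "ö" as "oe").

```python
import html
import unicodedata

# Only the TeX accent sequences needed for this example; a real
# converter would need a much larger table.
TEX_ACCENTS = {'\\"o': "ö", '\\"O': "Ö"}


def decode(name: str) -> str:
    """Resolve HTML entities and a few TeX accents into Unicode."""
    name = html.unescape(name)
    for tex, char in TEX_ACCENTS.items():
        name = name.replace(tex, char)
    return unicodedata.normalize("NFC", name)


def sort_key(name: str) -> str:
    """Accent-free, lower-case key for alphabetic lists."""
    decomposed = unicodedata.normalize("NFKD", decode(name))
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()


for spelling in ["Gr&ouml;tschel", 'Gr\\"otschel', "Grötschel", "Grotschel"]:
    print(decode(spelling), sort_key(spelling))
```

All four spellings share the same sort key, but only the first three decode to the same name, which is why a written convention is still needed.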
17 DC Problem: A Name and its Notation (II)
If all of these problems are solved, then there remains the ...
- Tschebytscheff Problem (Hazewinkel, Osnabrück, Oct. 1997)
Chebychef Chebycheff Chebychev Chebyhev
Chebyschev Chebysev Chebyshef Chebyskev
Tchebychef Tchebycheff Tchebychev Tchebyschef
Tchebyscheff Tchebyschev Tchebyshef
Tchebysheff Tchebyshev Tchebytcheff Tschebishev
Tschebychef Tschebyscheff Tschebychev
Tschebyschef Tschebyscheff Tschebyschev
Tschebysheff Tschebyshev Tschebyshew
There are more than 600 variants of
writing Tschebytscheff correctly.
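One way to make a search tolerant of such transliterations is a pattern that accepts the spellings as variants of one name. The sketch below is an assumption for illustration, not a method from the talk: the pattern was written by hand to cover only the spellings listed on this slide, and it will miss many of the other variants.

```python
import re

# Hand-written pattern covering the spellings listed on the slide:
# initial Ch/Tch/Tsch, "eb", y or i, a consonant cluster, "e", final f/ff/v/w.
CHEBYSHEV = re.compile(
    r"t?s?ch"
    r"eb[yi]"
    r"(?:tsch|tch|sch|ch|sh|sk|s|h)"
    r"e(?:ff|f|v|w)",
    re.IGNORECASE,
)


def is_chebyshev(name: str) -> bool:
    """True if the whole name is one of the covered spellings."""
    return CHEBYSHEV.fullmatch(name.strip()) is not None


for name in ["Tschebytscheff", "Chebyhev", "Tchebysheff", "Chebotarev"]:
    print(name, is_chebyshev(name))
```

Using `fullmatch` rather than `search` keeps similar names such as Chebotarev from matching by accident.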
18 3. Subject (Preprints, Teaching, Talks, Software, Projects, Personal)
Element                  SCHEME    Content
DC.Subject               --        uncontrolled keywords, description
DC.Subject               Math-Net  Math-Net subject classification
DC.Subject.MscPrimary    msc91     primary MSC-classification
DC.Subject.MscSecondary  msc91     secondary MSC-classification
DC.Subject.Msc           msc91     union of primary and secondary MSC
DC.Subject.Topic                   "Mathematics" (if MSC-classified)
DC.Subject.Pacs          pacs      PACS-classification
DC.Subject.Topic                   "Physics" (if PACS-classified)
DC.Subject.Cr            cr        CR-classification
DC.Subject.Topic                   "Computer Science" (if CR-classified)
DC.Subject.Zdm           zdm       ZDM-classification
DC.Subject.Topic                   "Mathematics Education" (if ZDM-classified)
DC.Subject.Gams          gams      GAMS-classification
DC.Subject.Topic                   "Software" kind of software
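The rule that a Topic value follows from the classification scheme in use can be sketched as a lookup. This is an illustration under assumptions: the record layout (a dict keyed by dotted element names), the function name `derive_topics`, and the sample classification codes are hypothetical, and the GAMS row is left out because the slide's wording for it is ambiguous.

```python
from typing import Dict, List

# Subelement of DC.Subject -> Topic value, as listed on the slide.
TOPIC_BY_SUBELEMENT: Dict[str, str] = {
    "MscPrimary": "Mathematics",
    "MscSecondary": "Mathematics",
    "Msc": "Mathematics",
    "Pacs": "Physics",
    "Cr": "Computer Science",
    "Zdm": "Mathematics Education",
}


def derive_topics(record: Dict[str, str]) -> List[str]:
    """Collect the Topic values implied by the classifications present."""
    topics: List[str] = []
    for name in record:
        prefix, _, sub = name.rpartition(".")
        if prefix == "DC.Subject" and sub in TOPIC_BY_SUBELEMENT:
            topic = TOPIC_BY_SUBELEMENT[sub]
            if topic not in topics:  # keep each topic once, in first-seen order
                topics.append(topic)
    return topics


# Illustrative record; title and codes are made up for the example.
record = {
    "DC.Title": "An example preprint",
    "DC.Subject.MscPrimary": "90C10",
    "DC.Subject.MscSecondary": "90C27",
    "DC.Subject.Cr": "G.1.6",
}
print(derive_topics(record))
```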
19 Some Bibliographic Schemes
(Rebecca Guenther, Library of Congress)
- Author or Creator
- LCNAF: Library of Congress Name Authority File
- Subject and Keywords
- LCSH: Library of Congress Subject Headings
- MeSH: Medical Subject Headings
- AAT: Art and Architecture Thesaurus
- LCNAF: Library of Congress Name Authority File
- DDC: Dewey Decimal Classification
- LCC: Library of Congress Classification
- NLM: National Library of Medicine Classification
- UDC: Universal Decimal Classification
- in Germany
- PND: Personennamendatei (name authority file for persons)
- GKD: Gemeinsame Körperschaftsdatei (joint authority file for corporate bodies)
- SWD: Schlagwortnormdatei (subject headings authority file)
20 4. Description (Preprints, Teaching, Talks, Software, Projects, Personal)
A textual description of the content of the resource, including abstracts in case of document-like objects or content descriptions in case of visual resources.
Element                  SCHEME    Content
DC.Description           --        a short textual description
DC.Description           (url)     URL of a short textual description
DC.Description.Abstract  --        abstract of the resource
DC.Description.Abstract  (url)...  URLabstract (to abstract within body)
DC.Description.Notes     --        additional (technical) information
DC.Description.Notes     (url)...  URLabstract (to note within body)
An abstract, a description or a note within the body must be surrounded by special commentary texts.
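As a sketch of how a harvester could pick up an abstract that is delimited inside the document body: the slide does not give the actual marker text, so the `<!-- ABSTRACT -->` and `<!-- /ABSTRACT -->` comments below are purely hypothetical, as is the function name.

```python
import re
from typing import Optional

# Hypothetical marker comments; the real "special commentary texts"
# are not specified on the slide.
ABSTRACT = re.compile(
    r"<!--\s*ABSTRACT\s*-->(.*?)<!--\s*/ABSTRACT\s*-->",
    re.DOTALL,
)


def extract_abstract(page: str) -> Optional[str]:
    """Return the text between the marker comments, or None if absent."""
    match = ABSTRACT.search(page)
    if match is None:
        return None
    # Collapse line breaks and runs of spaces into single spaces.
    return " ".join(match.group(1).split())


page = (
    "<html><body><h1>A preprint</h1>\n"
    "<!-- ABSTRACT -->\nWe study\n   cutting planes.\n<!-- /ABSTRACT -->\n"
    "<p>Body text</p></body></html>"
)
print(extract_abstract(page))
```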
21 Date, Type, and Relation
Element          CONTENT                     Meaning
DC.Date          YYYYMMDD                    date of last modification
DC.Date.Created  YYYYMMDD                    date of the creation of the first version
DC.Type          preprint                    for preprints
                 article                     for published articles
                 software                    for published sources
                 Text.Homepage               for personal homepages
                 Text.Homepage.Organisation  for research projects/groups
DC.Relation      (SCHEME url)                URL of related document
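A short sketch of how these values might be produced in practice. Embedding the elements as HTML `meta` tags with a `scheme` attribute is an assumption here (the Dublin Core itself does not prescribe a syntax), and the helper names and the example URL are hypothetical.

```python
from datetime import date
from html import escape
from typing import Optional


def dc_date(d: date) -> str:
    """Format a date in the YYYYMMDD notation used for DC.Date."""
    return d.strftime("%Y%m%d")


def meta_tag(name: str, content: str, scheme: Optional[str] = None) -> str:
    """Render one element as an HTML meta tag (assumed embedding)."""
    scheme_attr = f' scheme="{escape(scheme)}"' if scheme else ""
    return f'<meta name="{escape(name)}"{scheme_attr} content="{escape(content)}">'


print(meta_tag("DC.Date", dc_date(date(1998, 6, 22))))
print(meta_tag("DC.Type", "preprint"))
print(meta_tag("DC.Relation", "http://example.org/paper.ps", scheme="url"))
```

Escaping the attribute values keeps titles or URLs containing quotes or ampersands from breaking the surrounding HTML.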
22 DC Problems: DATE and RELATION
- The notation of the DATE field has been fixed to ISO 8601.
- "But what are we providing access to: a digital representation of the painting, or the Webpage it is upon, or both?" (Larsgaard-Dec-97)
- 1:1 Principle
- Each resource should have a discrete metadata description, and each metadata description should include elements relating to a single resource. It is desirable to be able to link these descriptions in a coherent and consistent manner (by usage of the RELATION-field).
- Subelements of the DATE-field (as of Feb-98)
- Date.Created
- Date.Issued
- Date.Accepted
- Date.Available
- Date.Gathered
- Date.Valid
- Subelements of the RELATION-field (under development)
- Relation.Type
As agreed upon at DC-5, Helsinki, Finland, Oct-97 (The Helsinki Metadata Workshop, OCLC/Nat. Library of Finland)
23 DC Element: Relation (under development)
- An identifier of a second resource and its relationship to the present resource. This element permits links (via a SCHEME qualifier: free text by default, URL, URN, ISBN,...) between related resources and resource descriptions to be indicated.
- Inclusion Relation (e.g. collection, part of)
- DC.Relation.IsPartOf
- DC.Relation.HasPart
- Version Relation (edition, draft)
- DC.Relation.IsVersionOf
- DC.Relation.HasVersion
- Mechanical Relation (copy, format change, mirror copy)
- DC.Relation.IsFormatOf
- DC.Relation.HasFormat
- Reference Relation (citation)
- DC.Relation.References
- DC.Relation.IsReferencedBy
- Creative Relation (translation, annotation)
- DC.Relation.IsBasedOn
- DC.Relation.IsBasisFor
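Each relation type above comes with a counterpart pointing in the opposite direction, so a link stated in one description implies a reciprocal link in the other. A minimal sketch, assuming relations are held as (source, relation, target) triples; the triple layout, the function name, and the example URLs are hypothetical.

```python
from typing import Dict, Tuple

# The five pairs of relation types listed on the slide.
PAIRS = [
    ("IsPartOf", "HasPart"),
    ("IsVersionOf", "HasVersion"),
    ("IsFormatOf", "HasFormat"),
    ("References", "IsReferencedBy"),
    ("IsBasedOn", "IsBasisFor"),
]

# Build a lookup that works in both directions.
INVERSE: Dict[str, str] = {}
for forward, backward in PAIRS:
    INVERSE[forward] = backward
    INVERSE[backward] = forward


def reciprocal(source: str, relation: str, target: str) -> Tuple[str, str, str]:
    """Return the statement to record in the description of `target`."""
    return (target, INVERSE[relation], source)


print(reciprocal("http://example.org/paper.ps", "IsFormatOf",
                 "http://example.org/paper.dvi"))
```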
24 DC Element: Coverage (under development)
- The spatial and/or temporal characteristics of the intellectual content of the resource. Coverage may be modified by spatial or temporal qualifiers.
- Subelements, as determined by the Coverage WG
- DC.Coverage.PeriodName
- DC.Coverage.PlaceName
- DC.Coverage.t
- DC.Coverage.x
- DC.Coverage.y
- DC.Coverage.z
- DC.Coverage.Polygon
- DC.Coverage.Line
- DC.Coverage.3d
Spatial coverage refers to a physical region (e.g. celestial sector); use coordinates (e.g., longitude and latitude) or place names that are from a controlled list or are fully spelled out.
Temporal coverage refers to what the resource is
about rather than when it was created or made
available.
25 What is the Dublin Core?
[Diagram labels: Libraries; Internet/WWW; Museums; DC Minimalists; Astronomy; Humanities; Structuralists; Geospatial; Environment; Government]
Tom Baker's Theory of Pidgin Metadata
"Pidginization results from the need for communication among groups, who do not share a common language. Creolization is the process of complexification of a pidgin language in order to make it more adequate to the complexity of natural language expressivity." (Weibel-Oct-97)
26 What the Dublin Core is Not
- It was never an objective of the DC working group
- to design a brute force simplification of cataloguing: interoperability is a main goal.
- to reduce costs for resource descriptions: these costs basically depend on the users' needs.
- to replace existing practice in cataloguing: the DC community, however, gets much useful critique and support from the cataloguing community.
- to prescribe syntax, formats or implementation: the usage of the WWW, however, is encouraged.
The Dublin Core is a simple (and uniform) conceptual scheme for the description of resources.
27 Experience of a DC application to Images (III)
- Full cataloguing is a complex, time-consuming process. Library administrators, when they feel like being horrified, figure out how much time (and therefore money) it takes per title: around 67 per item, at least at Davidsons Library, ...
- There are many more possible methods of access where full cataloguing is used; the question is, how necessary are they? And the answer is, it depends. What are users looking for?
- The general experience in university libraries is that a brief record is sufficient, and indeed, this brief record is what normally displays in a library online catalogue.
- Only the place of publication does not appear in the Dublin Core element set.
(Larsgaard-Dec-97)
28 Dublin Core and Classification (I)
"If we could target our searches onto words which are used as significant terms, we could achieve an enormous improvement in precision. Metadata can be used to achieve this by identifying just the major concepts of the information resource." (Cathro-97)
- DC metadata attributes can be used like classification codes to restrict the search space top down (thus shrinking the context)
- Classification codes can be used within several DC attributes (e.g. Subject and Keywords, Description, Coverage) to guide navigation (browsing, context-switching/shrinking/widening)
- Certain DC attribute values can be used like classification
- The name of a scientist may guide the search for items which are specific to the scientist's area of interest.
- Some keywords are classification terms by their very nature, e.g. terms of specialist terminology in biology or medicine.
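The top-down restriction of the search space can be sketched in a few lines: a classification prefix first shrinks the context, and a keyword is then matched only inside what remains. The record layout, the function name `search`, and the sample records (titles and MSC codes) are made up for the illustration.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]

# Illustrative records only; titles and codes are invented.
RECORDS: List[Record] = [
    {"DC.Title": "Cutting planes for integer programs", "DC.Subject.MscPrimary": "90C10"},
    {"DC.Title": "Planes and lines in finite geometry", "DC.Subject.MscPrimary": "51E20"},
    {"DC.Title": "Network flows in practice", "DC.Subject.MscPrimary": "90C35"},
]


def search(records: List[Record],
           msc_prefix: Optional[str] = None,
           keyword: Optional[str] = None) -> List[Record]:
    """Restrict by classification prefix first, then match the keyword."""
    hits = records
    if msc_prefix is not None:
        hits = [r for r in hits
                if r.get("DC.Subject.MscPrimary", "").startswith(msc_prefix)]
    if keyword is not None:
        wanted = keyword.lower()
        hits = [r for r in hits if wanted in r.get("DC.Title", "").lower()]
    return hits


# "planes" alone is ambiguous; restricting to MSC section 90 removes the geometry paper.
print([r["DC.Title"] for r in search(RECORDS, keyword="planes")])
print([r["DC.Title"] for r in search(RECORDS, msc_prefix="90", keyword="planes")])
```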
29 Metadata-oriented Browsing and Searching
[Diagram labels: Math; Entry of search terms; Papers; GAMS classified; MSC; Software; People; MSC]
Hypertext systems like HyperWave with integrated
search engines allow switching between browsing,
searching and hierarchical navigation modes. Thus
they enable a number of powerful context
switching methods.
30 Support of Specialised Open Communities
[Diagram label: Search Engine]
User communities, such as mathematicians, could use metadata to form (or isolate) their own virtual collections of resources.
- Metadata are also useful to specify offline services accordingly, such as
- alerting, announcing
- profile oriented searching
- within the context of large heterogeneous collections of classified resources.
31 Dublin Core as Inter-Metadata
"Now it appears an even more common application of DC is as lingua franca, a least common denominator for indexing across heterogeneous databases. ... The simplest way to index them all with some degree of semantic consistency may be to translate them all to DC." (Caplan-97)
- Integration of heterogeneous collections of resources
- Inter- and transdisciplinary research projects are increasingly common in modern science
- The research process of today results in rapidly growing products which are separated from each other by their content and form
- If the Dublin Core is widely accepted:
- a market of search engines may also evolve on the basis of the future WWW protocol suite and the DC as a universal data structure
- users of such engines may have access to an ever growing number of heterogeneous and also well structured digital resources
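The "lingua franca" idea amounts to a crosswalk: each source format gets a field mapping into DC, and a common index is built from the translated records. The two source schemas and all field names below are hypothetical and serve only to show the mechanism; they are not real library or software-catalogue formats.

```python
from typing import Dict

# Hypothetical source schemas mapped onto DC element names.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "library": {"main_title": "DC.Title", "author": "DC.Creator", "issued": "DC.Date"},
    "software": {"name": "DC.Title", "maintainer": "DC.Creator", "released": "DC.Date"},
}


def to_dc(record: Dict[str, str], source: str) -> Dict[str, str]:
    """Translate a record to DC; fields without a mapping are dropped."""
    mapping = CROSSWALKS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


book = {"main_title": "Linear Optimization", "author": "Doe, Jane",
        "issued": "19970301", "shelf_mark": "QA 402"}
package = {"name": "lp-solver", "maintainer": "Roe, Richard",
           "released": "19980115", "platform": "unix"}

for source, record in [("library", book), ("software", package)]:
    print(to_dc(record, source))
```

The translation is deliberately lossy: `shelf_mark` and `platform` have no counterpart and disappear, which is the price of a least common denominator.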
32 Beyond Traditional Classification/Navigation
- Interactive Maps
- Virtual Tourist
- CityNet
- Geospatial coordinates
- EarthView
- Living Earth
- Icons, Images
- Blue-Skies
- CineBase (Video server)
- Chronologies, Historic Maps
- History of Mathematics (MacTutor)
- Theory, interactive navigation
- Famous Curves Index (MacTutor)
- Hypermedia navigation
- ChemWeb (XML based)
- ICM98 (HyperWave)