Title: Towards a Digital Library Theory: A Formal Digital Library Ontology
1 Towards a Digital Library Theory: A Formal Digital Library Ontology
- Marcos André Gonçalves, Layne T. Watson, and Edward A. Fox
- Virginia Tech, Blacksburg, VA 24061 USA, fox_at_vt.edu
- (For ACM SIGIR Mathematical/Formal Methods in Information Retrieval, MF/IR 2004, Sheffield, UK, Aug. 29, 2004)
2 Outline
- Background: The 5S Model
- Motivation for this Work
- Digital Library Formal Ontology
- Taxonomy of DL Services
- Applications of the Theory
- Conclusions and Future Work
3 Background: The 5S Model
- Why 5S?
- DLs are not benefiting from formal theories as have other CS fields: DB, IR, PL, etc.
- DL construction is difficult and ad hoc, lacking support for tailoring/customization
- Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development.
- Lack of specific DL models, formalisms, languages
4 Background: The 5S Model
- Informally, DLs can be defined as complex information systems that
- help satisfy info needs of users (societies)
- provide info services (scenarios)
- organize info in usable ways (structures)
- (re)present info in usable ways (spaces)
- communicate info with users (streams)
5 Background: The 5S Model
6 Background: 5S and DL formal definitions and compositions (April 2004 TOIS)
7 Background: The 5S Model
- Summary of TOIS 2004 Formal Definitions
- A digital library is a 10-tuple (Streams, Structs, Sps, Scs, St2, Coll, Cat, Rep, Serv, Soc) in which
- Streams is a set of streams, which are sequences of arbitrary types (e.g., bits, characters, pixels, frames)
- Structs is a set of structures, which are tuples $(G, \mathcal{F})$, where $G = (V, E)$ is a directed graph and $\mathcal{F}: (V \cup E) \to L$ is a labeling function
- Sps is a set of spaces, each of which can be a measurable, measure, probability, topological, metric, or vector space.
8 Background: The 5S Model
- $Scs = \{sc_1, sc_2, \ldots, sc_d\}$ is a set of scenarios where each $sc_k = \langle e_{1k}(p_{1k}), e_{2k}(p_{2k}), \ldots, e_{d_k k}(p_{d_k k}) \rangle$ is a sequence of events that also can have a number of parameters $p_{ik}$. Events represent changes in computational states; parameters represent specific locations in a state and respective values.
- St2 is a set of functions $\Psi: V \times Streams \to (\mathbb{N} \times \mathbb{N})$ that associate nodes of a structure with a pair of natural numbers $(a, b)$ corresponding to a portion (span/segment) of a stream.
- $Coll = \{C_1, C_2, \ldots, C_f\}$ is a set of DL collections where each DL collection $C_k = \{do_{1k}, do_{2k}, \ldots, do_{f_k k}\}$ is a set of digital objects. Each digital object $do_k = (h_k, Stm1_k, Stt2_k, \Psi_k)$ is a tuple where $Stm1_k \subseteq Streams$, $Stt2_k \subseteq Structs$, $\Psi_k \subseteq St2$, and $h_k$ is a handle which represents a unique identifier for the object.
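The digital object definition above can be sketched as a small data structure. This is only an illustrative sketch: the class names, the dictionary layout, and the choice to treat a span $(a, b)$ as a half-open Python slice are assumptions made for the example, not part of the 5S definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Structure:
    """A labeled directed graph: vertices, edges, and one label per vertex/edge."""
    vertices: set
    edges: set    # set of (source, target) pairs
    labels: dict  # vertex or edge -> label


@dataclass
class DigitalObject:
    handle: str       # unique identifier h_k
    streams: dict     # stream name -> sequence (a str in this sketch)
    structures: dict  # structure name -> Structure
    # Structured streams: (structure name, node, stream name) -> (a, b)
    spans: dict = field(default_factory=dict)

    def segment(self, structure, node, stream):
        """Return the portion of `stream` associated with `node`.

        Assumption: (a, b) is interpreted as the half-open slice [a, b).
        """
        a, b = self.spans[(structure, node, stream)]
        return self.streams[stream][a:b]


# Hypothetical object: one text stream and a two-node outline structure.
text = "Digital Libraries. A formal ontology."
outline = Structure(
    vertices={"doc", "title", "body"},
    edges={("doc", "title"), ("doc", "body")},
    labels={"doc": "document", "title": "title", "body": "body"},
)
do = DigitalObject(
    handle="hdl:example/1",
    streams={"text": text},
    structures={"outline": outline},
    spans={
        ("outline", "title", "text"): (0, 18),
        ("outline", "body", "text"): (19, 37),
    },
)
print(do.segment("outline", "title", "text"))
```

Keeping the spans separate from both the stream and the structure mirrors the role of St2: the same stream can be segmented differently by different structures.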
9 Background: The 5S Model
- $Cat = \{DM_{C_1}, DM_{C_2}, \ldots, DM_{C_f}\}$ is a set of metadata catalogs for Coll where each metadata catalog $DM_{C_k} = \{(h, mss_{hk})\}$, and $mss_{hk} = \{ms_{hk1}, ms_{hk2}, \ldots, ms_{hkn_{hk}}\}$ is a set of descriptive metadata specifications. Each descriptive metadata specification $ms_{hki}$ is a structure with atomic values (e.g., numbers, dates, strings) associated with nodes.
- A repository $Rep = \{(C_i, DM_{C_i})\}$ ($i = 1$ to $f$) is a set of pairs (collection, metadata catalog); it is assumed there exist operations to manipulate the family of pairs (e.g., get, store, delete).
- $Serv = \{Se_1, Se_2, \ldots, Se_s\}$ is a set of services where each service $Se_k = \{sc_{1k}, \ldots, sc_{s_k k}\}$ is described by a set of related scenarios.
- $Soc = (C, R)$ where C is a set of communities and R is a set of relationships among communities. $SM = \{sm_1, sm_2, \ldots, sm_j\}$ and $Ac = \{ac_1, ac_2, \ldots, ac_r\}$ are two such communities, where the former is a set of service managers responsible for running DL services and the latter is a set of actors that use those services.
- Being basically an electronic entity, a member $sm_k$ of SM distinguishes itself from actors by defining or implementing a set of operations $\{op_{1k}, op_{2k}, \ldots, op_{nk}\} \subset sm_k$
10 Background
11 Motivation
- Previous definitions emphasize syntactic aspects, i.e., how digital library concepts are composed or built from previously defined concepts.
- Complete a formal DL theory by
- Making explicit the implicit relationships that exist among the DL formal concepts defined in [Gonc04]
- Providing a set of axiomatic rules that precisely define and constrain the semantics of the relationships
- Categorizing and classifying DL services on the basis of the ontology
- Research questions
- How should DL services be built from the other DL components?
- Which are the fundamental and elementary DL services?
- How can services be built/composed from other DL services?
- We will explore semantic relations and rules of the DL domain by using ontologies.
12 Digital Library Formal Ontology
- An ontology is a tuple $\Omega$ = (Ontol_Concepts, Ontol_Rels) where
- Ontol_Concepts is a family of ontological concepts,
- Ontol_Rels is a family of relations.
- Relations in Ontol_Rels are operationally realized by one or more rules (e.g., first-order logic axioms) which intentionally specify or constrain which elements of a concept can participate in a relation.
- Ontol_Rules is a family of rules of a particular ontology.
13 Digital Library Formal Ontology
- Relationships
- Intra-Model
- Video contains Audio (MM)
- Metadata Catalog describes Collection (LIS)
- Probabilistic Space is_a Measure Space
- Service extends Service (reuse)
- Service Manager inherits_from Service Manager (OO)
- Inter-Model
- Event executes Operation
- Actor participates_in Scenario
- Service Manager runs Service
- Service employs/produces Streams ∪ Structures ∪ Spaces
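14 Digital Library Formal Ontology
- Concepts: Se, Sc, e. Key: Se = service; Sc = scenario; e = event.
- Relations
- $contains \subseteq Sc \times e$
- Symbolic Rule. $\forall x, y\ (x\ contains\ y \Leftrightarrow Sc(x) \wedge e(y) \wedge \exists j\ (j \in x.Dom \wedge y = x(j)))$
- $precedes \subseteq e \times e \times Sc$; $happens\_before \subseteq e \times e \times Sc$
- Symbolic Rule 1. $\forall x, y, z\ (x\ precedes_z\ y \Leftrightarrow e(x) \wedge e(y) \wedge Sc(z) \wedge \exists i, j\ (z\ contains\ x \wedge z\ contains\ y \wedge x = z(i) \wedge y = z(j) \wedge i + 1 = j))$
- Symbolic Rule 2. $\forall x, y, z\ (x\ happens\_before_z\ y \Leftrightarrow e(x) \wedge e(y) \wedge Sc(z) \wedge \exists i, j\ (z\ contains\ x \wedge z\ contains\ y \wedge x = z(i) \wedge y = z(j) \wedge i < j))$
- $includes \subseteq Se \times Se \cup Sc \times Sc$; $extends \subseteq Se \times Se \cup Sc \times Sc$
- Symbolic Rule 1. $\forall x, y\ (x\ includes\ y \Leftrightarrow Sc(x) \wedge Sc(y) \wedge (\forall z\ e(z) \wedge y\ contains\ z \Rightarrow x\ contains\ z) \wedge (\forall p, q\ e(p) \wedge e(q) \wedge p\ precedes_y\ q \Rightarrow p\ precedes_x\ q))$
- Symbolic Rule 2. $\forall x, y\ (x\ extends\ y \Leftrightarrow Sc(x) \wedge Sc(y) \wedge (\forall z\ e(z) \wedge y\ contains\ z \Rightarrow x\ contains\ z) \wedge (\forall p, q\ e(p) \wedge e(q) \wedge p\ happens\_before_y\ q \Rightarrow p\ happens\_before_x\ q))$
- Symbolic Rule 3. $\forall x, y\ (x\ extends\ y \Leftrightarrow Se(x) \wedge Se(y) \wedge y \subseteq x \vee (x \neq y \wedge \exists p, q\ Sc(p) \wedge Sc(q) \wedge p \in x \wedge q \in y \wedge p\ extends\ q))$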
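The scenario-level rules can be read as executable predicates. The sketch below assumes a scenario is represented as a tuple of event names; this representation, the function names, and the two login scenarios are hypothetical. It covers only the rules between scenarios and events, not the service-level rule.

```python
from itertools import combinations


def contains(scenario, event):
    """scenario contains event: the event occurs at some position j."""
    return event in scenario


def precedes(x, y, scenario):
    """x precedes y in scenario: x = z(i), y = z(j) with i + 1 = j."""
    return any(a == x and b == y for a, b in zip(scenario, scenario[1:]))


def happens_before(x, y, scenario):
    """x happens_before y in scenario: x = z(i), y = z(j) with i < j."""
    n = len(scenario)
    return any(
        scenario[i] == x and scenario[j] == y
        for i in range(n)
        for j in range(i + 1, n)
    )


def includes(x, y):
    """x includes y: every event of y is in x and adjacency is preserved."""
    return all(contains(x, e) for e in y) and all(
        precedes(p, q, x) for p, q in zip(y, y[1:])
    )


def extends(x, y):
    """x extends y: every event of y is in x and relative order is preserved."""
    return all(contains(x, e) for e in y) and all(
        happens_before(p, q, x) for p, q in combinations(y, 2)
    )


# Hypothetical scenarios: the second adds a logging event in the middle.
login = ("enter_credentials", "verify", "grant_access")
logged_login = ("enter_credentials", "verify", "log_attempt", "grant_access")

print(includes(logged_login, login))  # adjacency of verify/grant_access is broken
print(extends(logged_login, login))   # relative order is still preserved
```

The example shows why the two relations differ: inserting a subflow between two events of the original scenario breaks `includes` (which requires immediate succession) but not `extends` (which requires only the same ordering).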
15 Digital Library Formal Ontology
16 Digital Library Formal Ontology
- Consistency Rules
- Catalog-Collection
- A complete catalog has at least one set of metadata specifications for each digital object in the collection it describes (surjective partial function).
- In a consistent catalog, each set of metadata specifications describes (exactly) one digital object in the related collection (total function).
- Scenarios-Society
- A scenario x is consistent with regard to a set of service managers Y if each operation executed by each event in the scenario is defined in some service manager $y \in Y$.
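These consistency rules can be checked mechanically. The following is a minimal sketch, assuming a collection is a set of handles, a catalog is a dictionary from handle to a list of metadata specifications, and each event is mapped to the single operation it executes; all names and sample data are hypothetical.

```python
def is_complete(catalog, collection):
    """Every object in the collection has at least one metadata specification."""
    return all(len(catalog.get(handle, [])) > 0 for handle in collection)


def is_consistent(catalog, collection):
    """Every catalog entry describes an object that is in the collection."""
    return all(handle in collection for handle in catalog)


def scenario_is_consistent(scenario, executes, managers):
    """Every operation executed by an event is defined in some service manager.

    scenario: sequence of event names
    executes: event name -> operation name (Event executes Operation)
    managers: service manager name -> set of operations it defines
    """
    defined = set().union(*managers.values())
    return all(executes[event] in defined for event in scenario)


# Hypothetical data: h2 is not described, and h3 is not in the collection.
collection = {"h1", "h2"}
catalog = {"h1": [{"title": "A"}], "h3": [{"title": "C"}]}

managers = {
    "SearchManager": {"parse_query", "rank"},
    "LogManager": {"append_log"},
}
executes = {
    "submit_query": "parse_query",
    "show_results": "rank",
    "export": "to_pdf",
}

print(is_complete(catalog, collection))
print(is_consistent(catalog, collection))
print(scenario_is_consistent(["submit_query", "show_results"], executes, managers))
```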
17 Digital Library Formal Ontology
- Characterizing employs/produces relationships
- In the table, each service is characterized by the parameters (input, output) of the initial and final events of the scenarios that compose that service
- All other previous definitions and keys apply here.
- That set is complemented with the following definitions
18 Services: Related Definitions
- A query q is the representation of user interest or information need.
- Hyptxt is a hypertext wherein an anchor is a node.
- A log_entry is a descriptive metadata specification about an event of a scenario.
- Let $\{do_i\} = \{do_{i1}, do_{i2}, \ldots, do_{in}\}$ be a set of digital objects and $Ct = \{c_1, c_2, \ldots, c_n\}$ be a set of labels for categories. A classifier $class_{Ct}: \{do_i\} \to 2^{Ct}$ is a function that maps a digital object to a set of categories.
- A cluster $clu_k = \{do_{1k}, do_{2k}, \ldots, do_{nk}\}$ is a subset of a set of digital objects.
19
Service           | User input  | Other service input | Output
Acquiring         | do_i        | C_i                 | C_j
Browsing          | anchor      | Hyptxt_k            | do_i
Cataloging        | do_i, ms_ik | (h_i, mss_im)       | (h_i, mss_i(m+k))
Classifying       | do_i        | class_Ct            | (do_i, c_ki)
Clustering        | do_i        | X                   | clu_ki
Expanding (query) | do_i        | IC_i, q_i           | q_j
Indexing          | C_i         | none                | IC_i
Linking           | do_i        | Hyptxt_k            | Hyptxt_ik
Logging           | none        | e_i(p_i)            | log_entry_i
Rating            | do_i, ac_j  | none                | (do_i, ac_j, r_k)
Searching         | q, C_i      | IC_i                | do_k
Visualizing       | do_i        | tfr_k               | sp_ik
20 Applications: A Taxonomy of DL Services
- Infrastructure Services: dealing with basic concepts such as collections and catalogs
- Repository-Building: create collections (digital objects) and/or catalogs (metadata specifications).
- Preservational: generate instances by copying collections (digital objects) or transforming (converting/translating) objects into different formats for preservation purposes
- Add_Value: either aggregate value/information to collections (digital objects) or connect objects together.
- Information Satisfaction: dealing with higher level societal requirements
- KEY in next slide
- Fundamental: minimal set of services, or those essential to the existence of a DL
- Composite: a DL service that takes input from some other service; otherwise the service is called elementary.
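The composite/elementary distinction can be derived directly from the "other service input" column of the services table. The sketch below encodes a few rows of that table; the dictionary layout and the plain-text names chosen for inputs and outputs ("index", "hypertext", and so on) are assumptions made for illustration.

```python
# A few rows of the services table: what each service takes from
# another service (None if nothing) and what it produces.
SERVICES = {
    "Indexing": {"service_input": None, "output": "index"},
    "Rating": {"service_input": None, "output": "rating"},
    "Searching": {"service_input": "index", "output": "digital objects"},
    "Linking": {"service_input": "hypertext", "output": "hypertext"},
    "Browsing": {"service_input": "hypertext", "output": "digital objects"},
}


def kind(name):
    """A service is composite if it takes input from some other service."""
    if SERVICES[name]["service_input"] is None:
        return "elementary"
    return "composite"


def providers(name):
    """Other services whose output matches this service's input."""
    needed = SERVICES[name]["service_input"]
    if needed is None:
        return []
    return sorted(
        other
        for other, spec in SERVICES.items()
        if other != name and spec["output"] == needed
    )


for service in SERVICES:
    print(service, kind(service), providers(service))
```

Under this encoding, Searching is composite because it consumes the index produced by Indexing, while Indexing and Rating are elementary.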
21 Applications: A Taxonomy of DL Services
22 Application: A Taxonomy of DL Services
23 DL Services: I/O Behavior
- Regarding the prior figure, which shows
- Instantiations of the Services Definition model
- Inputs and outputs of examples of infrastructure and information satisfaction DL services
- Key
- C_DL: collection
- IC_DL: index for collection C_DL
- do_i: digital object
- Soc: society
24 Applications: A Taxonomy of DL Services
25 Application: Defining Quality in Digital Libraries
- Formal theory can help to define what a good digital library is by
- Formally defining metrics of quality for each formal concept (and relationships)
- Helping to define and apply numerical measures to these metrics
- Consider this in the Information Life Cycle
27 Defining Quality in Digital Libraries
28 Defining Quality in Digital Libraries
- Metadata specifications and metadata format: completeness
- Completeness of metadata specifications refers to the degree to which values are present in the description, according to a metadata standard. As far as an individual property is concerned, only two situations are possible: either a value is assigned to the property in question, or not.
- Metric
- $Completeness(ms_x) = 1 - \frac{\text{no. of missing attributes in } ms_x}{\text{total attributes of the schema to which } ms_x \text{ conforms}}$
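The completeness metric is simple enough to compute directly. In this sketch a metadata specification is assumed to be a dictionary and a schema a list of attribute names; the four-attribute schema and the sample record are hypothetical, and an attribute counts as missing when it is absent or empty.

```python
def completeness(ms, schema):
    """1 - (missing attributes in ms / total attributes of the schema)."""
    if not schema:
        raise ValueError("schema must define at least one attribute")
    missing = sum(1 for attr in schema if ms.get(attr) in (None, ""))
    return 1 - missing / len(schema)


# Hypothetical schema and record: "subject" has no value assigned.
schema = ["title", "creator", "date", "subject"]
record = {"title": "A Formal DL Ontology", "creator": "Gonçalves", "date": "2004"}

print(completeness(record, schema))  # 1 - 1/4
```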
29 Defining Quality in Digital Libraries
- Metadata specifications and metadata format: completeness
- OCLC NDLTD Union Catalog
30 Defining Quality in Digital Libraries
- Services: Extensibility and Reusability
- A service Y reuses a service X if the behavior of Y incorporates the behavior of X.
- A service Y extends a service X if it subsumes the behavior of X and potentially includes additional subflows of events.
- Metrics
- $\text{Macro-Reusability}(Serv) = \frac{\sum_{se_i \in Serv} reused(se_i)}{|Serv|}$, where $reused(se_i)$ is 1 if $\exists\, se_j$ such that $se_j$ reuses $se_i$, and 0 otherwise.
- $\text{Micro-Reusability}(Serv) = \frac{\sum LOC(sm_x) \cdot reused(se_i)}{\sum_{sm \in SM} LOC(sm)}$, where the numerator is summed over $sm_x \in SM$, $se_i \in Serv$ such that $sm_x$ runs $se_i$, and LOC corresponds to the number of lines of code of a service manager
31 Defining Quality in Digital Libraries
- Services: Extensibility and Reusability
- Macro-Reusability = 3/16 ≈ 0.187
- Micro-Reusability = 3630/11910 ≈ 0.304
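The two reusability metrics can be sketched as follows. The data is a small hypothetical example, not the system measured above, and the sketch assumes for simplicity that each service manager runs exactly one service.

```python
def reused_services(reuses):
    """Services that are reused by at least one other service.

    reuses: set of (reusing service, reused service) pairs.
    """
    return {reused for _, reused in reuses}


def macro_reusability(services, reuses):
    """Fraction of services that are reused by some other service."""
    reused = reused_services(reuses)
    return sum(1 for s in services if s in reused) / len(services)


def micro_reusability(loc, runs, reuses):
    """Fraction of service-manager code that runs a reused service.

    loc:  service manager -> lines of code
    runs: service manager -> the service it runs (assumed one per manager)
    """
    reused = reused_services(reuses)
    reused_loc = sum(n for sm, n in loc.items() if runs[sm] in reused)
    return reused_loc / sum(loc.values())


# Hypothetical DL with four services; only searching is reused.
services = ["indexing", "searching", "browsing", "recommending"]
reuses = {("browsing", "searching"), ("recommending", "searching")}
loc = {"IndexMgr": 500, "SearchMgr": 300, "BrowseMgr": 100, "RecMgr": 100}
runs = {
    "IndexMgr": "indexing",
    "SearchMgr": "searching",
    "BrowseMgr": "browsing",
    "RecMgr": "recommending",
}

print(macro_reusability(services, reuses))  # 1 of 4 services is reused
print(micro_reusability(loc, runs, reuses))  # 300 of 1000 lines
```

Macro-reusability counts services, whereas micro-reusability weights each reused service by the size of the code that implements it, so the two can diverge when reused services are unusually large or small.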
32 Application: Re-engineering a DL Specification Language
- 5SL Specification Language
- Reengineering
- Using the relationships to redefine/reorganize the semantics and organization of the XML elements within the several sections of the DL specification
33 Re-engineering a DL Specification Language
34 Re-engineering a DL Specification Language
35 5SLGen: Automatic DL Generation
36 Conclusions and Future Work
- Presented a DL formal ontology which specifies the semantics of the relationships among the DL concepts, therefore completing a theory for DLs
- Applied the resulting ontology to
- Define a taxonomy of DL services
- Create a Quality Model for DLs
- Re-engineer a DL specification language
37 Conclusions and Future Work
- Future work includes
- Including Pre- and Post-Conditions in the Service Behavior Analysis
- New Applications of the Model/Theory
- New Design and Generation Tools
- Quality tools
- Modeling Complex Heterogeneous/Integrated Systems
- Archaeology (ETANA)
- Developing theorems and proofs
- Writing books