Towards a Digital Library Theory: A Formal Digital Library Ontology

Learn more at: https://fox.cs.vt.edu

1
Towards a Digital Library Theory: A Formal
Digital Library Ontology
  • Marcos André Gonçalves, Layne T. Watson, and
    Edward A. Fox
  • Virginia Tech, Blacksburg, VA 24061 USA,
    fox_at_vt.edu
  • (For ACM SIGIR Mathematical/Formal Methods in
    Information Retrieval, MF/IR 2004, Sheffield, UK,
    Aug. 29, 2004)

2
Outline
  • Background: The 5S Model
  • Motivation for this Work
  • Digital Library Formal Ontology
  • Taxonomy of DL Services
  • Applications of the Theory
  • Conclusions and Future Work

3
Background: The 5S Model
  • Why 5S?
  • DLs are not benefiting from formal theories as
    have other CS fields: DB, IR, PL, etc.
  • DL construction difficult, ad-hoc, lacking
    support for tailoring/customization
  • Conceptual modeling, requirements analysis, and
    methodological approaches are rarely supported in
    DL development.
  • Lack of specific DL models, formalisms,
    languages

4
Background: The 5S Model
  • Informally, DLs can be defined as complex
    information systems that
  • help satisfy info needs of users (societies)
  • provide info services (scenarios)
  • organize info in usable ways (structures)
  • (re)present info in usable ways (spaces)
  • communicate info with users (streams)

5
Background: The 5S Model
6
Background: 5S and DL formal definitions and
compositions (April 2004 TOIS)
7
Background: The 5S Model
  • Summary of TOIS 2004 Formal Definitions
  • A digital library is a 10-tuple (Streams,
    Structs, Sps, Scs, St2, Coll, Cat, Rep, Serv,
    Soc) in which
  • Streams is a set of streams, which are sequences
    of arbitrary types (e.g., bits, characters,
    pixels, frames)
  • Structs is a set of structures, which are tuples
    $(G, \mathcal{F})$, where $G = (V, E)$ is a directed graph and
    $\mathcal{F}: (V \cup E) \rightarrow L$ is a labeling function
  • Sps is a set of spaces, each of which can be a
    measurable, measure, probability, topological,
    metric, or vector space.

8
Background: The 5S Model
  • $Scs = \{sc_1, sc_2, \ldots, sc_d\}$ is a set of scenarios
    where each $sc_k = \langle e_{1k}(p_{1k}), e_{2k}(p_{2k}), \ldots, e_{d_k k}(p_{d_k k}) \rangle$
    is a sequence of events that also
    can have a number of parameters $p_{ik}$. Events
    represent changes in computational states;
    parameters represent specific locations in a
    state and respective values.
  • St2 is a set of functions $\Psi: V \times Streams \rightarrow (\mathbb{N} \times \mathbb{N})$
    that associate nodes of a structure with a
    pair of natural numbers $(a, b)$ corresponding to a
    portion (span/segment) of a stream.
  • $Coll = \{C_1, C_2, \ldots, C_f\}$ is a set of DL collections
    where each DL collection $C_k = \{do_{1k}, do_{2k}, \ldots, do_{f_k k}\}$
    is a set of digital objects. Each digital
    object $do_k = (h_k, Stm1_k, Stt2_k, \Psi_k)$ is a tuple
    where $Stm1_k \subseteq Streams$, $Stt2_k \subseteq Structs$,
    $\Psi_k \subseteq St2$, and $h_k$ is a handle which represents a
    unique identifier for the object.

9
Background: The 5S Model
  • $Cat = \{DM_{C_1}, DM_{C_2}, \ldots, DM_{C_f}\}$ is a set of
    metadata catalogs for Coll where each metadata
    catalog $DM_{C_k} = \{(h, mss_{hk})\}$, and
    $mss_{hk} = \{ms_{hk1}, ms_{hk2}, \ldots, ms_{hk n_{hk}}\}$
    is a set of descriptive
    metadata specifications. Each descriptive
    metadata specification $ms_{hki}$ is a structure with
    atomic values (e.g., numbers, dates, strings)
    associated with nodes.
  • A repository $Rep = \{(C_i, DM_{C_i})\}$ ($i = 1$ to $f$) is a
    set of pairs (collection, metadata catalog); it
    is assumed there exist operations to manipulate
    the family of pairs (e.g., get, store, delete).
  • $Serv = \{Se_1, Se_2, \ldots, Se_s\}$ is a set of services
    where each service $Se_k = \{sc_{1k}, \ldots, sc_{s_k k}\}$ is
    described by a set of related scenarios.
  • $Soc = (C, R)$ where C is a set of communities and
    R is a set of relationships among communities.
    $SM = \{sm_1, sm_2, \ldots, sm_j\}$ and
    $Ac = \{ac_1, ac_2, \ldots, ac_r\}$ are two such communities, where the former
    is a set of service managers responsible for
    running DL services and the latter is a set of
    actors that use those services.
  • Being basically an electronic entity, a member
    $sm_k$ of SM distinguishes itself from actors by
    defining or implementing a set of operations
    $\{op_{1k}, op_{2k}, \ldots, op_{nk}\} \subset sm_k$
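
The tuple definitions above can be read as plain data structures. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the class names `DigitalObject` and `Repository`, the field names, the sample handle, and the choice to treat a span $(a, b)$ as a half-open slice are all illustrative. Structures are reduced to names, and the repository is keyed by a collection name so that each collection stays paired with its catalog.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    handle: str                      # h_k, the unique identifier
    streams: list                    # Stm1_k, here plain Python sequences
    structures: list                 # Stt2_k, here just structure names
    # Psi_k: maps (node, stream index) to a span (a, b) of that stream
    spans: dict = field(default_factory=dict)

    def segment(self, node, stream_index):
        """Return the portion of a stream associated with a structure node."""
        a, b = self.spans[(node, stream_index)]
        return self.streams[stream_index][a:b]   # half-open slice: an assumption


@dataclass
class Repository:
    """Rep: pairs (collection, metadata catalog), keyed here by a collection name."""
    collections: dict = field(default_factory=dict)  # name -> {handle: DigitalObject}
    catalogs: dict = field(default_factory=dict)     # name -> {handle: [metadata specs]}

    def store(self, name, obj, metadata_specs):
        self.collections.setdefault(name, {})[obj.handle] = obj
        self.catalogs.setdefault(name, {})[obj.handle] = list(metadata_specs)

    def get(self, name, handle):
        return self.collections[name][handle], self.catalogs[name][handle]

    def delete(self, name, handle):
        del self.collections[name][handle]
        del self.catalogs[name][handle]


# Hypothetical example data
thesis = DigitalObject(
    handle="hdl:example/1",
    streams=["Towards a Digital Library Theory"],
    structures=["document"],
    spans={("title-first-word", 0): (0, 7)},
)
rep = Repository()
rep.store("etds", thesis, [{"title": "Towards a Digital Library Theory"}])
print(thesis.segment("title-first-word", 0))
```

Keeping the collection and the catalog under the same key mirrors the pairing $(C_i, DM_{C_i})$; the `store`, `get`, and `delete` methods stand in for the operations the definition assumes to exist.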

10
Background
11
Motivation
  • Previous definitions emphasize syntactic aspects,
    i.e., how digital library concepts are composed
    or built from previously defined concepts.
  • Complete a formal DL theory by
  • Making explicit the implicit relationships that
    exist among the DL formal concepts defined in
    Gonc04
  • Providing set of axiomatic rules that precisely
    define and constrain the semantics of the
    relationships
  • Categorizing and classifying DL services on the
    basis of the ontology
  • Research questions
  • How should DL services be built from the other DL
    components?
  • Which are the fundamental and elementary DL
    services?
  • How can services be built/composed from other DL
    services?
  • We will explore semantic relations and rules of
    the DL domain by using ontologies.

12
Digital Library Formal Ontology
  • An ontology is a tuple (Ontol_Concepts,
    Ontol_Rels) where
  • Ontol_Concepts is a family of ontological
    concepts,
  • Ontol_Rels is a family of relations.
  • Relations in Ontol_Rels are operationally
    realized by one or more rules (e.g., first-order
    logic axioms) which intensionally specify or
    constrain which elements of a concept can
    participate in a relation.
  • Ontol_Rules is the family of rules of a particular
    ontology.

13
Digital Library Formal Ontology
  • Relationships
  • Intra-Model
  • Video contains Audio (MM)
  • Metadata Catalog describes Collection (LIS)
  • Probabilistic Space is_a Measure Space
  • Service extends Service (reuse)
  • Service Manager inherits_from Service Manager
    (OO)
  • Inter-Model
  • Event executes Operation
  • Actor participates_in Scenario
  • Service Manager runs Service
  • Service employs/produces Streams $\cup$ Structures $\cup$
    Spaces

14
Digital Library Formal Ontology
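  • Concepts: Se, Sc, e. Key: Se = service; Sc =
    scenario; e = event.
  • Relations
  • $\mathit{contains} \subseteq Sc \times e$
  • Symbolic Rule. $\forall x, y\ (x\ \mathit{contains}\ y \Leftrightarrow Sc(x) \wedge e(y) \wedge \exists j\ (j \in x.Dom \wedge y = x(j)))$
  • $\mathit{precedes} \subseteq e \times e \times Sc$; $\mathit{happens\_before} \subseteq e \times e \times Sc$
  • Symbolic Rule 1. $\forall x, y, z\ (x\ \mathit{precedes}_z\ y \Leftrightarrow e(x) \wedge e(y) \wedge Sc(z) \wedge \exists i, j\ (z\ \mathit{contains}\ x \wedge z\ \mathit{contains}\ y \wedge x = z(i) \wedge y = z(j) \wedge i + 1 = j))$
  • Symbolic Rule 2. $\forall x, y, z\ (x\ \mathit{happens\_before}_z\ y \Leftrightarrow e(x) \wedge e(y) \wedge Sc(z) \wedge \exists i, j\ (z\ \mathit{contains}\ x \wedge z\ \mathit{contains}\ y \wedge x = z(i) \wedge y = z(j) \wedge i < j))$
  • $\mathit{includes} \subseteq (Se \times Se) \cup (Sc \times Sc)$; $\mathit{extends} \subseteq (Se \times Se) \cup (Sc \times Sc)$
  • Symbolic Rule 1. $\forall x, y\ (x\ \mathit{includes}\ y \Leftrightarrow Sc(x) \wedge Sc(y) \wedge (\forall z\ (e(z) \wedge y\ \mathit{contains}\ z \Rightarrow x\ \mathit{contains}\ z)) \wedge (\forall p, q\ (e(p) \wedge e(q) \wedge p\ \mathit{precedes}_y\ q \Rightarrow p\ \mathit{precedes}_x\ q)))$
  • Symbolic Rule 2. $\forall x, y\ (x\ \mathit{extends}\ y \Leftrightarrow Sc(x) \wedge Sc(y) \wedge (\forall z\ (e(z) \wedge y\ \mathit{contains}\ z \Rightarrow x\ \mathit{contains}\ z)) \wedge (\forall p, q\ (e(p) \wedge e(q) \wedge p\ \mathit{happens\_before}_y\ q \Rightarrow p\ \mathit{happens\_before}_x\ q)))$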
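  • Symbolic Rule 3. $\forall x, y\ (x\ \mathit{extends}\ y \Leftrightarrow Se(x) \wedge Se(y) \wedge (y \subset x \vee (x \neq y \wedge \exists p, q\ (Sc(p) \wedge Sc(q) \wedge p \in x \wedge q \in y \wedge p\ \mathit{extends}\ q))))$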
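
The scenario-level rules on this slide can be sketched in Python. This is an illustrative reading rather than the authors' formalization: a scenario is represented as a tuple of event names, `precedes` is read as immediate adjacency (positions $i$ and $i + 1$), `happens_before` as any earlier position ($i < j$), and all function names and example events are assumptions. Rule 3 (services extending services) is left out.

```python
def contains(scenario, event):
    """Scenario contains event: the event occurs at some position j."""
    return event in scenario


def precedes(x, y, scenario):
    """x immediately precedes y in the scenario (positions i and i + 1)."""
    return any(a == x and b == y for a, b in zip(scenario, scenario[1:]))


def happens_before(x, y, scenario):
    """x occurs at some position i and y at some later position j (i < j)."""
    n = len(scenario)
    return any(scenario[i] == x and scenario[j] == y
               for i in range(n) for j in range(i + 1, n))


def _related_pairs(scenario, relation):
    """All ordered pairs of events of the scenario that satisfy the relation."""
    events = set(scenario)
    return {(p, q) for p in events for q in events if relation(p, q, scenario)}


def includes(x, y):
    """x includes y: x has every event of y and keeps y's immediate orderings."""
    return (set(y) <= set(x)
            and _related_pairs(y, precedes) <= _related_pairs(x, precedes))


def extends(x, y):
    """x extends y: x has every event of y and keeps y's relative orderings."""
    return (set(y) <= set(x)
            and _related_pairs(y, happens_before) <= _related_pairs(x, happens_before))


# Hypothetical scenarios
search = ("submit_query", "match", "return_results")
expanded_search = ("submit_query", "expand_query", "match", "return_results")
logged_search = ("log_request",) + search

print(extends(expanded_search, search))   # relative order is preserved
print(includes(expanded_search, search))  # an event was inserted between two steps
print(includes(logged_search, search))    # the whole subflow is kept intact
```

Under this reading, `extends` tolerates events inserted into the middle of the original flow, while `includes` requires the original flow to survive as an unbroken run.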

15
Digital Library Formal Ontology
16
Digital Library Formal Ontology
  • Consistency Rules
  • Catalog-Collection
  • A complete catalog has at least one set of
    metadata specifications for each digital object
    in the collection it describes (surjective
    partial function).
  • In a consistent catalog, each set of metadata
    specifications describes (exactly) one digital
    object in the related collection (total
    function).
  • Scenarios-Society
  • A scenario x is consistent with regard to a
    set of service managers Y if each operation
    executed by each event in the scenario is defined
    in some service manager $y \in Y$.
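
These consistency rules translate into short checks. The sketch below assumes simple representations that are not part of the original slides: a collection is a set of handles, a catalog is a dict from handle to a list of metadata specifications, a scenario is a list of (event, operation) pairs, and each service manager is a set of operation names. All function names and sample values are hypothetical.

```python
def is_complete(catalog, collection):
    """Every digital object in the collection has at least one metadata specification."""
    return all(len(catalog.get(handle, [])) >= 1 for handle in collection)


def is_consistent(catalog, collection):
    """Every set of metadata specifications describes an object of the collection."""
    return all(handle in collection for handle in catalog)


def scenario_is_consistent(scenario, service_managers):
    """Each operation executed by an event is defined in some service manager."""
    defined = set().union(*service_managers.values())
    return all(operation in defined for _event, operation in scenario)


# Hypothetical example data
collection = {"h1", "h2"}
sparse_catalog = {"h1": [{"title": "A"}]}
stale_catalog = {"h1": [{"title": "A"}], "h2": [{"title": "B"}], "h3": [{"title": "C"}]}

managers = {
    "search_manager": {"parse_query", "rank"},
    "log_manager": {"append_entry"},
}
search_scenario = [("e1", "parse_query"), ("e2", "rank")]
broken_scenario = [("e1", "translate_query")]

print(is_complete(sparse_catalog, collection), is_consistent(sparse_catalog, collection))
print(is_complete(stale_catalog, collection), is_consistent(stale_catalog, collection))
print(scenario_is_consistent(search_scenario, managers))
print(scenario_is_consistent(broken_scenario, managers))
```

Keying the catalog by handle already guarantees that each set of specifications points at exactly one object, so the consistency check only has to confirm that the object belongs to the collection.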

17
Digital Library Formal Ontology
  • Characterizing employs/produces relationships
  • In the table each service is characterized by
  • parameters (input, output)
  • of the initial and final events
  • of the scenarios that compose those services
  • All other previous definitions and keys apply
    here.
  • That set is complemented with the following
    definitions

18
Services Related Definitions
  • A query q is the representation of user interest
    or information need.
  • Hyptxt is a hypertext wherein an anchor is a
    node.
  • A log_entry is a descriptive metadata
    specification about an event of a scenario.
  • Let $\{do_i\} = \{do_{i1}, do_{i2}, \ldots, do_{in}\}$ be a set of
    digital objects and $Ct = \{c_1, c_2, \ldots, c_n\}$ be a set
    of labels for categories. A classifier
    $class_{Ct}: \{do_i\} \rightarrow 2^{Ct}$ is a function that maps a digital
    object to a set of categories.
  • A cluster $clu_k = \{do_{1k}, do_{2k}, \ldots, do_{nk}\}$ is a
    subset of a set of digital objects.

19
Service           | User input | Other service input | Output
Acquiring         | doi        | Ci                  | Cj
Browsing          | anchor     | Hyptxtk             | doi
Cataloging        | doi, msi_k | (hi, mssi_m)        | (hi, mssi_(mk))
Classifying       | doi        | classCt             | (doi, ck_i)
Clustering        | doi        | X                   | cluk_i
Expanding (query) | doi        | IC_i, qi            | qj
Indexing          | Ci         | none                | IC_i
Linking           | doi        | Hyptxtk             | Hyptxtik
Logging           | none       | ei(pi)              | log_entryi
Rating            | doi, acj   | none                | (doi, acj, rk)
Searching         | q, Ci      | IC_i                | dok
Visualizing       | doi        | tfrk                | spik
20
Applications: A Taxonomy of DL Services
  • Infrastructure Services: dealing with basic
    concepts such as collections and catalogs
  • Repository-Building: create collections (digital
    objects) and/or catalogs (metadata
    specifications).
  • Preservational: generate instances by copying
    collections (digital objects) or transforming
    (converting/translating) objects into different
    formats for preservation purposes
  • Add_Value: either aggregate value/information to
    collections (digital objects) or connect objects
    together.
  • Information Satisfaction: dealing with higher
    level societal requirements
  • Key (for the next slide)
  • Fundamental: minimal set of services essential
    to the existence of a DL
  • Composite: a DL service that takes input from some
    other service; otherwise the service is called
    elementary.

21
Applications: A Taxonomy of DL Services
22
Application: A Taxonomy of DL Services
23
DL Services I/O Behavior
  • Regarding the prior figure, which shows
  • Instantiations of the Services Definition model
  • Inputs and outputs of examples of infrastructure
    and information satisfaction DL services
  • Key
  • $C_{DL}$: collection
  • $I_{C_{DL}}$: index for collection $C_{DL}$
  • $do_i$: digital object
  • Soc: society

24
Applications: A Taxonomy of DL Services
25
Application: Defining Quality in Digital Libraries
  • Formal theory can help to define what a good
    digital library is by
  • Formally defining metrics of quality for each
    formal concept (and relationships)
  • Helping define and apply numerical measures
    to these metrics
  • Consider this in the Information Life Cycle

26
27
Defining Quality in Digital Libraries
28
Defining Quality in Digital Libraries
  • Metadata specifications and metadata format -
    completeness
  • Completeness of metadata specifications refers to
    the degree to which values are present in the
    description, according to a metadata standard. As
    far as an individual property is concerned, only
    two situations are possible: either a value is
    assigned to the property in question, or not.
  • Metric
  • $Completeness(ms_x) = 1 - (\text{no. of missing attributes in } ms_x \,/\, \text{total attributes of the schema to which } ms_x \text{ conforms})$
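
The completeness metric is simple to compute. Here is a minimal Python sketch, assuming a metadata specification is a flat dict of attribute values and the schema is a list of attribute names. The function name, the four-attribute schema, and the decision to count empty strings and empty lists as missing are assumptions made for illustration.

```python
def completeness(metadata_spec, schema_attributes):
    """1 - (missing attributes / total attributes of the schema)."""
    if not schema_attributes:
        raise ValueError("schema must define at least one attribute")
    # Assumption: absent keys, None, empty strings, and empty lists count as missing.
    missing = [name for name in schema_attributes
               if metadata_spec.get(name) in (None, "", [])]
    return 1 - len(missing) / len(schema_attributes)


# Hypothetical schema and record
schema = ["title", "creator", "date", "subject"]
record = {"title": "A Formal Digital Library Ontology",
          "creator": "Gonçalves, M. A.",
          "date": ""}

print(completeness(record, schema))  # two of four attributes are missing -> 0.5
```

Averaging this value over all records of a catalog would give a catalog-level figure, though the slides define the metric only for a single metadata specification.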

29
Defining Quality in Digital Libraries
  • Metadata specifications and metadata format -
    completeness
  • OCLC NDLTD Union Catalog

30
Defining Quality in Digital Libraries
  • Services - Extensibility and Reusability
  • A service Y reuses a service X if the behavior of
    Y incorporates the behavior of X.
  • A service Y extends a service X if it subsumes
    the behavior of X and potentially includes
    additional subflows of events.
  • Metrics
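  • $\text{Macro-Reusability}(Serv) = \left(\sum_{se_i \in Serv} reused(se_i)\right) / |Serv|$,
    where $reused(se_i)$ is 1 if $\exists\, se_j$ such that
    $se_j$ reuses $se_i$, and 0 otherwise.
  • $\text{Micro-Reusability}(Serv) = \left(\sum LOC(sm_x) \cdot reused(se_i)\right) / \sum_{sm \in SM} LOC(sm)$,
    with the numerator summed over $sm_x \in SM$, $se_i \in Serv$ such that $sm_x$ runs $se_i$,
    where LOC corresponds to
    the number of lines of code of a service manager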
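
Both reusability metrics can be computed from three pieces of data: which services reuse which, which service manager runs which service, and the size of each manager. The Python sketch below follows the formulas as stated on this slide. The data layout (sets of pairs, a dict of line counts), the function names, and the example values are assumptions, not measurements from the presentation.

```python
def reused_services(reuses):
    """Services that are reused by at least one other service."""
    return {reused for (_reuser, reused) in reuses}


def macro_reusability(services, reuses):
    """Fraction of services that are reused by some other service."""
    reused = reused_services(reuses)
    return sum(1 for s in services if s in reused) / len(services)


def micro_reusability(reuses, runs, loc):
    """Lines of code of managers running reused services, over all manager code."""
    reused = reused_services(reuses)
    reused_loc = sum(loc[manager] for (manager, service) in runs if service in reused)
    return reused_loc / sum(loc.values())


# Hypothetical example data
services = ["searching", "browsing", "indexing", "logging"]
reuses = {("searching", "indexing"), ("browsing", "indexing")}  # (reuser, reused)
runs = {("index_mgr", "indexing"), ("search_mgr", "searching"),
        ("browse_mgr", "browsing"), ("log_mgr", "logging")}      # (manager, service)
loc = {"index_mgr": 300, "search_mgr": 500, "browse_mgr": 150, "log_mgr": 50}

print(macro_reusability(services, reuses))    # 1 of 4 services is reused
print(micro_reusability(reuses, runs, loc))   # 300 of 1000 lines belong to reused code
```

As in the formula, a manager that runs several reused services contributes its line count once per such service, so the micro value is only bounded by 1 when each manager runs a single service.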

31
Defining Quality in Digital Libraries
  • Services - Extensibility and Reusability

Macro-Reusability = 3/16 ≈ 0.187
Micro-Reusability = 3630/11910 ≈ 0.304
32
Application: Re-engineering a DL Specification
Language
  • 5SL: Specification Language
  • Reengineering
  • Using the relationships to redefine/reorganize
    the semantics and organization of the XML
    elements within the several sections of the DL
    specification

33
Re-engineering a DL Specification Language
34
Re-engineering a DL Specification Language
35
5SLGen: Automatic DL Generation
36
Conclusions and Future Work
  • Presented a DL formal ontology which specifies
    the semantics of the relationships among the DL
    concepts therefore completing a theory for DLs
  • Applied the resulting ontology to
  • Define a taxonomy of DL services
  • Create a Quality Model for DLs
  • Re-engineer a DL specification language

37
Conclusions and Future Work
  • Future work includes
  • Including Pre- and Post-Conditions in the Service
    Behavior Analysis
  • New Applications of the Model/theory
  • New Design and Generation Tools
  • Quality tools
  • Modeling Complex Heterogeneous/Integrated Systems
  • Archaeology (ETANA)
  • Develop theorems and proofs
  • Writing books