Indexing Mathematical Abstracts by Metadata and Ontology: IMA Workshop, April 26-27, 2004

1
Indexing Mathematical Abstracts by Metadata and Ontology
IMA Workshop, April 26-27, 2004
  • Su-Shing Chen, University of Florida
  • suchen_at_cise.ufl.edu

2
Abstract
  • OAI extensions to federated search and other
    services for MathML-based metadata indexing and
    subject classification of mathematical abstracts.
  • Construction of ontology or conceptual maps of
    mathematics. Mathematical formulas are considered
    as elements of the ontology.
  • Ontology indexing by clustering mathematical
    abstracts or full papers into an information
    visualization interface so that users may select
    using ontology as well as metadata.

3
A DL Server with OAI Extensions: Managing the
Metadata Complexity
5
A DL Server with OAI Extensions: Managing the
Metadata Complexity
  • Built-in capabilities
  • Harvester: harvests metadata from various
    OAI-compliant data providers
  • Data provider: exposes harvested and existing
    metadata sets
  • Service provider: federated search and data
    mining capabilities on metadata sets
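
The harvester role listed above can be sketched as follows. This is a minimal illustration, assuming a standard OAI-PMH `ListRecords` request with the `oai_dc` metadata prefix. The base URL, the sample response, and the function names `list_records_url` and `parse_records` are hypothetical and do not come from the presentation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard OAI-PMH 2.0 and Dublin Core namespace URIs.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for one data provider."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent alone with the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) parsed from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS) if e.text],
            "subjects": [e.text for e in rec.iterfind(".//dc:subject", NS) if e.text],
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


# Hypothetical response from a hypothetical repository (no network access needed).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A note on harmonic maps</dc:title>
          <dc:subject>Differential geometry</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_records(SAMPLE_RESPONSE)
print(list_records_url("http://example.org/oai"))
print(records, token)
```

A real harvester would fetch each URL over HTTP and repeat the request with the returned resumption token until none is returned. That loop is omitted here so the sketch runs offline.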

6
Harvester
7
Harvester Interface
8
Harvester Interface
9
Data Provider
  • Expose single or combined metadata sets harvested
    to other harvesters
  • Reformat metadata from different data providers
    to be harvested by other service providers (e.g.,
    originally Dublin Core, reformat to MARC before
    exposing)
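
The reformatting step above (Dublin Core to MARC) amounts to a field crosswalk. The sketch below is only an illustration: the four tag assignments are an assumed subset chosen for the example, and the names `DC_TO_MARC`, `dc_to_marc`, and `format_marc` are hypothetical. A production data provider should follow an authoritative crosswalk.

```python
# Assumed, illustrative subset of a Dublin Core -> MARC 21 crosswalk:
# DC element name -> (MARC tag, subfield code).
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
}


def dc_to_marc(dc_record):
    """Convert {dc_element: [values]} into a list of (tag, subfield, value)."""
    fields = []
    for element, values in dc_record.items():
        if element not in DC_TO_MARC:
            continue  # elements without a mapping are skipped in this sketch
        tag, code = DC_TO_MARC[element]
        for value in values:
            fields.append((tag, code, value))
    # Stable sort by tag, so repeated fields keep their original order.
    return sorted(fields, key=lambda field: field[0])


def format_marc(fields):
    """Render fields as simple text lines, e.g. '245 $a Some title'."""
    return [f"{tag} ${code} {value}" for tag, code, value in fields]


example = {
    "title": ["A note on harmonic maps"],
    "subject": ["Differential geometry", "Harmonic maps"],
    "rights": ["unmapped in this sketch"],
}
for line in format_marc(dc_to_marc(example)):
    print(line)
```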

10
Service Provider: Federated Search
  • Emulating a federated search service on existing
    and combined harvested metadata sets
  • Federated search across potentially other search
    protocols
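
One way to picture the federated search described above: each metadata set, or each remote protocol, is wrapped as a search function with a common signature, and the service merges and ranks the hits. This is a sketch under stated assumptions, not the presented system. The scoring rule, the record layout, and the names `make_local_source` and `federated_search` are all hypothetical.

```python
def keyword_score(record, query):
    """Count query terms that occur (as substrings) in the title and subjects."""
    text = " ".join([record.get("title", "")] + record.get("subjects", [])).lower()
    return sum(1 for term in query.lower().split() if term in text)


def make_local_source(name, records):
    """Wrap an in-memory metadata set as a search function.

    A wrapper around another search protocol could expose the same signature.
    """
    def search(query):
        hits = []
        for rec in records:
            score = keyword_score(rec, query)
            if score > 0:
                hits.append({"id": rec["id"], "title": rec["title"],
                             "source": name, "score": score})
        return hits
    return search


def federated_search(sources, query, limit=10):
    """Query every source, keep the best hit per identifier, rank by score."""
    best = {}
    for search in sources:
        for hit in search(query):
            current = best.get(hit["id"])
            if current is None or hit["score"] > current["score"]:
                best[hit["id"]] = hit
    ranked = sorted(best.values(), key=lambda h: (-h["score"], h["id"]))
    return ranked[:limit]


set_a = [
    {"id": "oai:a:1", "title": "Harmonic maps", "subjects": ["geometry"]},
    {"id": "oai:a:2", "title": "Graph colouring", "subjects": ["combinatorics"]},
]
set_b = [
    {"id": "oai:a:1", "title": "Harmonic maps", "subjects": ["geometry", "analysis"]},
    {"id": "oai:b:7", "title": "Harmonic analysis on groups", "subjects": ["analysis"]},
]
sources = [make_local_source("A", set_a), make_local_source("B", set_b)]
results = federated_search(sources, "harmonic analysis")
print([(hit["id"], hit["source"], hit["score"]) for hit in results])
```

Deduplicating on the record identifier matters here because combined harvested sets can contain the same record more than once.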

11
Federated Search
12
Federated Search
13
Federated Search
14
Service Provider: Data Mining
  • Knowledge discovery on harvested metadata sets
  • Metadata classification using the Self-Organizing
    Map (SOM) algorithm
  • Improving retrieval effectiveness by providing
    concept browsing and search services

15
Self-Organizing Map Algorithm
  • Competitive and unsupervised learning algorithm
  • Artificial neural network algorithm for
    visualizing and interpreting complex data sets
  • Providing a mapping from a high-dimensional input
    space to a two-dimensional output space
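
The properties above can be illustrated with a small self-organizing map written in plain Python. This is a generic textbook-style sketch, not the SOM Categorizer from the presentation. The linear decay schedules, the Gaussian neighbourhood, and all parameter names and defaults are assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Competitive step: return (row, col) of the unit closest to x."""
    best, best_dist = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols map on unlabeled vectors (unsupervised)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # weights[r][c] is the prototype vector of map unit (r, c).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = t / steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood shrinks
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        # Pull the unit toward x, scaled by grid proximity.
                        w[i] += lr * h * (x[i] - w[i])
            t += 1
    return weights


demo_data = [[0.0, 0.0, 1.0], [0.1, 0.0, 0.9], [1.0, 1.0, 0.0], [0.9, 1.0, 0.1]]
demo_weights = train_som(demo_data, rows=3, cols=3, epochs=20)
for vector in demo_data:
    print(vector, "->", best_matching_unit(demo_weights, vector))
```

Each three-dimensional input is mapped to a cell of the two-dimensional grid. With real metadata, the input would be a much longer term vector, and similar records would be expected to land on the same or neighbouring cells.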

16
Data Mining Service Provider System Architecture
[Architecture diagram. Components: Browser (two instances), Concept Harvester, SOM Categorizer, Input Vector Generator, Noun Phraser, Metadata Database. Labeled flows: concept browsing request, concept search request, request, response, fetch metadata, save SOM.]
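
The Noun Phraser and Input Vector Generator components named in this architecture are not described further in the transcript. As a rough sketch of what such a pipeline might do, the code below extracts candidate phrases with a naive stopword rule and encodes each record as a binary presence vector. Both choices, the stopword list, and all function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real phraser would use
# part-of-speech information rather than this rule.
STOPWORDS = {"a", "an", "and", "the", "of", "on", "in", "for", "to", "by",
             "with", "is", "are", "we"}


def noun_phrases(text):
    """Naive stand-in for a noun phraser: maximal runs of non-stopwords."""
    phrases, run = [], []
    for token in re.findall(r"[a-z]+|[.,;:!?]", text.lower()):
        if token in STOPWORDS or not token.isalpha():
            if run:  # a stopword or punctuation mark ends the current phrase
                phrases.append(" ".join(run))
                run = []
        else:
            run.append(token)
    if run:
        phrases.append(" ".join(run))
    return phrases


def build_vocabulary(documents, size):
    """Keep the phrases that occur in the most documents."""
    counts = Counter(p for doc in documents for p in set(noun_phrases(doc)))
    # Most frequent first; ties are broken alphabetically for determinism.
    return sorted(counts, key=lambda p: (-counts[p], p))[:size]


def input_vector(text, vocabulary):
    """Binary vector: 1 if the vocabulary phrase occurs in the text, else 0."""
    present = set(noun_phrases(text))
    return [1 if phrase in present else 0 for phrase in vocabulary]


docs = [
    "Harmonic maps on Riemannian manifolds.",
    "The energy of harmonic maps",
    "Graph colouring and planar graphs",
]
vocab = build_vocabulary(docs, size=3)
print(vocab)
print([input_vector(doc, vocab) for doc in docs])
```

Vectors of this kind are what a SOM would take as input, one vector per metadata record.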
17
Concept Harvester
  • Screenshot of the SOM Categorizer

18
Construction of Two-level Concept Hierarchy
  • Constructing the SOM for each harvested metadata
    set
  • SOMs of the lower layer are added to the
    upper-layer SOM.

[Figure label: VTETD]
19
Top-level Concept Browsing
20
Bottom-level Concept Browsing
21
MEDLINE Database
  • Developed by the National Library of Medicine
    (NLM)
  • Bibliographic citations and abstracts from more
    than 4,600 biomedical journals published in the
    United States and 70 other countries.
  • Covering the fields of medicine, nursing,
    dentistry, veterinary medicine, the health care
    system, and the preclinical sciences.
  • Over 12 million citations
  • Searchable via PubMed or the NLM Gateway

22
MeSH (Medical Subject Headings)
  • MEDLINE uses MeSH as its controlled vocabulary
    for indexing database articles
  • Indexers scan an entire article and assign MeSH
    headings (or MeSH descriptors) to each article
  • MeSH descriptors are arranged in both an
    alphabetic list and a hierarchical structure.
  • Updated annually to reflect the changes in
    medicine and medical terminology

23
Our Experimentation
  • Problems
      • It is well known that searching by descriptors
        can greatly improve search precision.
      • However, it is very difficult for naïve users to
        know and use the exact MeSH descriptors in a
        search.
      • In addition, as the MEDLINE database grows,
        information overload can prevent users from
        finding information relevant to their interests.
  • Proposed Approach
      • Categorization according to MeSH terms, MeSH
        major topics, and the co-occurrence of MeSH
        descriptors
      • Clustering based on the results of the MeSH term
        categorization, run through the Knowledge Grid
      • Visualization of categories and hierarchical
        clusters
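
The categorization steps in the proposed approach can be sketched as follows. This is an illustration only: the citation records and identifiers are made up, the dictionary layout is an assumption, and the function names are hypothetical. It groups citations by MeSH major topic and counts how often pairs of descriptors are assigned to the same citation.

```python
from collections import Counter, defaultdict
from itertools import combinations


def categorize_by_major_topic(citations):
    """Map each MeSH major topic to the citations indexed under it."""
    categories = defaultdict(list)
    for citation in citations:
        for topic in citation["major_topics"]:
            categories[topic].append(citation["id"])
    return dict(categories)


def descriptor_cooccurrence(citations):
    """Count descriptor pairs assigned together to the same citation."""
    counts = Counter()
    for citation in citations:
        # Sorting makes each pair order-independent: (A, B), never (B, A).
        for pair in combinations(sorted(set(citation["descriptors"])), 2):
            counts[pair] += 1
    return counts


# Made-up citations for illustration; the ids are not real PubMed identifiers.
sample_citations = [
    {"id": "c1", "descriptors": ["Humans", "Neoplasms", "Mice"],
     "major_topics": ["Neoplasms"]},
    {"id": "c2", "descriptors": ["Humans", "Neoplasms"],
     "major_topics": ["Neoplasms"]},
    {"id": "c3", "descriptors": ["Humans", "Asthma"],
     "major_topics": ["Asthma"]},
]

by_topic = categorize_by_major_topic(sample_citations)
pairs = descriptor_cooccurrence(sample_citations)
print(by_topic)
print(pairs.most_common(3))
```

Frequently co-occurring descriptor pairs give a simple similarity signal between categories, which a clustering step could then use.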

24
Data Access Services
MeSH Major Topic Tree View
SOM Tree View
25
Knowledge Grid
  • Knowledge Grid Architecture

Courtesy of Cannataro and Talia (Knowledge Grid:
An Architecture for Distributed Knowledge
Discovery)
26
Future Directions
  • Develop a federated search service for
    OAI-compliant mathematical abstracts.
  • Develop an ontology or conceptual maps for
    mathematics.
  • Develop an ontology search service for
    mathematical abstracts and full papers.
  • Develop an interoperable architecture with other
    services, such as OCR of mathematical formulas.

27
Acknowledgement
  • Many thanks to the NSF NSDL Program.
  • Collaborators: Joe Futrelle (NCSA), Ed Fox
    (Virginia Tech)
  • Student team: Hyunki Kim, Chee Yoong Choo,
    Xiaoou Fu, Yu Chen