Title: Indexing Mathematical Abstracts by Metadata and Ontology IMA Workshop, April 26-27, 2004
- Su-Shing Chen, University of Florida
- suchen_at_cise.ufl.edu
Abstract
- OAI extensions for federated search and other services supporting MathML-based metadata indexing and subject classification of mathematical abstracts.
- Construction of an ontology, or conceptual maps, of mathematics, in which mathematical formulas are treated as elements of the ontology.
- Ontology indexing: clustering mathematical abstracts or full papers into an information visualization interface, so that users can select documents by ontology as well as by metadata.
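To make "formulas as elements of the ontology" concrete, the sketch below pulls index terms out of a Presentation MathML fragment by counting its token elements (`mi` identifiers, `mo` operators, `mn` numbers). This is an illustrative assumption about how formula indexing could start, not the method used in this work; the function name `formula_index_terms` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Presentation MathML token elements treated as index terms (an assumption).
TOKEN_TAGS = {"mi": "identifier", "mo": "operator", "mn": "number"}


def formula_index_terms(mathml):
    """Hypothetical helper: count (kind, token) pairs in a MathML fragment."""
    root = ET.fromstring(mathml)
    terms = Counter()
    for elem in root.iter():
        # Drop the "{namespace}" prefix so namespaced and bare MathML both work.
        tag = elem.tag.split("}")[-1]
        if tag in TOKEN_TAGS and elem.text and elem.text.strip():
            terms[(TOKEN_TAGS[tag], elem.text.strip())] += 1
    return terms
```

For x² + y² = r², the counts include three occurrences of the number `2` and one each of `x`, `y`, `r`, `+`, and `=`. A real formula index would also need structure (e.g., which base carries which exponent), which flat token counts discard.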
A DL Server with OAI Extensions: Managing the Metadata Complexity
- Built-in capabilities:
  - Harvester: harvests metadata from various OAI-compliant data providers
  - Data provider: exposes harvested and existing metadata sets
  - Service provider: offers federated search and data mining over the metadata sets
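The harvester role can be sketched with the OAI-PMH `ListRecords` verb and its `resumptionToken` flow control, which is how the protocol pages through large result sets. This is a minimal sketch assuming unqualified Dublin Core (`oai_dc`) and keeping only identifiers and titles; the function names are hypothetical, and `fetch` is injected so the parsing logic can be exercised without a network.

```python
import urllib.parse
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, token=None, prefix="oai_dc"):
    # The first request names the metadata format; follow-up requests carry
    # only the resumptionToken, which is an exclusive argument in OAI-PMH 2.0.
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        identifier = header.findtext("oai:identifier", default="", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    token_elem = root.find(".//oai:resumptionToken", NS)
    token = None
    if token_elem is not None and token_elem.text:
        # A missing or empty resumptionToken marks the end of the list.
        token = token_elem.text.strip() or None
    return records, token


def harvest(base_url, fetch):
    """Yield every record; `fetch` is any callable mapping a URL to response text."""
    token = None
    while True:
        records, token = parse_list_records(fetch(list_records_url(base_url, token)))
        yield from records
        if not token:
            break
```

In a live harvester, `fetch` could be `lambda url: urllib.request.urlopen(url).read()`; production code would also need to handle OAI-PMH error responses and deleted-record headers.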
Harvester
Harvester Interface
Data Provider
- Exposes single or combined harvested metadata sets to other harvesters
- Reformats metadata from different data providers so that other service providers can harvest it (e.g., metadata originally in Dublin Core is reformatted to MARC before being exposed)
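The Dublin Core-to-MARC reformatting step is essentially a crosswalk. The sketch below uses a simplified subset loosely following the Library of Congress crosswalk for unqualified Dublin Core, ignoring indicators and MARC record syntax; the mapping table and the `dc_to_marc` name are illustrative assumptions, not the system's actual converter.

```python
# Simplified subset, loosely following the Library of Congress
# Dublin Core-to-MARC crosswalk (indicators and record syntax omitted).
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "language": ("546", "a"),
    "identifier": ("856", "u"),
}


def dc_to_marc(dc_record):
    """Map {dc_element: [values]} to (tag, subfield, value) triples sorted by tag.

    Unmapped elements are returned separately so nothing is silently dropped.
    """
    fields, unmapped = [], {}
    for element, values in dc_record.items():
        target = DC_TO_MARC.get(element)
        if target is None:
            unmapped[element] = values
            continue
        tag, code = target
        fields.extend((tag, code, value) for value in values)
    fields.sort(key=lambda field: field[0])  # stable sort keeps value order per tag
    return fields, unmapped
```

Returning the unmapped elements makes lossy conversions visible, which matters when one provider's records are re-exposed in another format.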
Service Provider: Federated Search
- Emulates a federated search service over existing and combined harvested metadata sets
- Potentially extends federated search across other search protocols
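A federated search fans one query out to several metadata sets and merges the hits. This minimal sketch assumes each source is a callable returning `(identifier, score, title)` tuples with comparable scores; in practice scores from different engines usually need normalizing first. The `federated_search` name and the result layout are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, sources, limit=10):
    """Query every source in parallel and merge the hits by identifier.

    `sources` maps a source name to a callable returning a list of
    (identifier, score, title) tuples. Returns (ranked_hits, failed_sources).
    """
    merged = {}
    failed = []
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(search, query) for name, search in sources.items()}
        for name, future in futures.items():
            try:
                hits = future.result()
            except Exception:
                # One unavailable source should not sink the whole search.
                failed.append(name)
                continue
            for identifier, score, title in hits:
                best = merged.get(identifier)
                # The same record may be exposed by several providers; keep its best score.
                if best is None or score > best[0]:
                    merged[identifier] = (score, title, name)
    ranked = sorted(merged.items(), key=lambda item: item[1][0], reverse=True)
    return [(ident, score, title, name) for ident, (score, title, name) in ranked[:limit]], failed
```

Deduplicating by identifier is the key design choice here: harvested and combined metadata sets often overlap, and without it the same record would appear once per provider.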
Federated Search
Service Provider: Data Mining
- Knowledge discovery on harvested metadata sets
- Metadata classification using the Self-Organizing Map (SOM) algorithm
- Improving retrieval effectiveness by providing concept browsing and search services
Self-Organizing Map Algorithm
- Competitive, unsupervised learning algorithm
- Artificial neural network algorithm for visualizing and interpreting complex data sets
- Provides a mapping from a high-dimensional input space to a two-dimensional output space
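The three properties above can be seen in a minimal Kohonen SOM: nodes compete for each input (the best-matching unit wins), no labels are used, and high-dimensional vectors end up attached to cells of a 2-D grid. This sketch assumes a Gaussian neighborhood and linearly decaying learning rate and radius; those schedules, the grid size, and the function names are assumptions, not parameters from this work.

```python
import math
import random


def best_matching_unit(weights, x):
    """Competitive step: the grid node whose weight vector is closest to x wins."""
    return min(weights, key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=100, lr0=0.5, radius0=None, seed=0):
    """Train a rows x cols SOM on equal-length vectors in `data`."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = radius0 if radius0 is not None else max(rows, cols) / 2
    # Each grid node holds a weight vector in the input space, initialized randomly.
    weights = {(r, c): [rng.random() for _ in range(dim)] for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = data[:]
        rng.shuffle(samples)
        for x in samples:
            frac = step / total
            lr = lr0 * (1 - frac)                     # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.5)   # neighborhood shrinks to a floor
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))  # Gaussian neighborhood on the grid
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])         # pull toward the input
            step += 1
    return weights
```

After training, each input (for example, a metadata record's term vector) is assigned to the grid cell returned by `best_matching_unit`, and nearby cells tend to hold similar inputs.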
Data Mining Service Provider System Architecture
Figure: system architecture. Browsers send concept browsing and concept search requests and receive responses. Components shown: Concept Harvester, SOM Categorizer, Input Vector Generator, Noun Phraser, and Metadata Database, with flows labeled "Fetch metadata" and "Save SOM".
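Between the Noun Phraser and the SOM Categorizer, an input vector generator turns each record's phrases into a numeric vector. The sketch below assumes TF-IDF weighting with L2 normalization over already-extracted noun phrases; the phrase extraction itself is not shown, and both the weighting scheme and the `build_input_vectors` name are assumptions.

```python
import math
from collections import Counter


def build_input_vectors(doc_phrases, min_df=1):
    """Turn per-record noun-phrase lists into TF-IDF vectors over a shared vocabulary.

    `doc_phrases` is a list of phrase lists, one per metadata record.
    Returns (vocabulary, vectors).
    """
    n_docs = len(doc_phrases)
    df = Counter()
    for phrases in doc_phrases:
        df.update(set(phrases))  # document frequency counts a phrase once per record
    vocab = sorted(p for p, count in df.items() if count >= min_df)
    vectors = []
    for phrases in doc_phrases:
        tf = Counter(phrases)
        vec = [tf[p] * math.log(n_docs / df[p]) for p in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        # L2-normalize so long abstracts do not dominate the SOM distance computation.
        vectors.append([v / norm for v in vec] if norm else vec)
    return vocab, vectors
```

A phrase that occurs in every record gets an IDF of zero, so it contributes nothing to the distances the SOM relies on.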
Concept Harvester
- Screenshot of the SOM Categorizer
Construction of a Two-Level Concept Hierarchy
- Construct a SOM for each harvested metadata set.
- Add the lower-layer SOMs to the upper-layer SOM.
Figure label: VTETD
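The slides do not say exactly how lower-layer SOMs are attached to the upper layer. One plausible reading, sketched below purely as an assumption, is a two-level structure in which each upper-level cell links to the SOM built for one harvested metadata set, supporting the top-level and bottom-level browsing views. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptMap:
    """One SOM reduced to what browsing needs: a concept label per occupied cell."""
    name: str
    cells: dict  # (row, col) -> concept label


@dataclass
class ConceptHierarchy:
    """Hypothetical two-level hierarchy: upper-level cells point to lower-level SOMs."""
    top: ConceptMap
    children: dict = field(default_factory=dict)  # upper-level cell -> ConceptMap

    def add_lower(self, cell, lower_map):
        if cell not in self.top.cells:
            raise KeyError(f"no upper-level concept at {cell}")
        self.children[cell] = lower_map

    def browse_top(self):
        """Top-level browsing: list the upper-level concepts by grid cell."""
        return sorted(self.top.cells.items())

    def browse_bottom(self, cell):
        """Bottom-level browsing: list the concepts of the SOM under one upper cell."""
        lower = self.children.get(cell)
        return sorted(lower.cells.items()) if lower else []
```

Keeping each harvested set's SOM intact under one upper-level cell means a set can be re-harvested and re-clustered without retraining the others.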
Top-level Concept Browsing
Bottom-level Concept Browsing
MEDLINE Database
- Developed by the National Library of Medicine (NLM)
- Bibliographic citations and abstracts from more than 4,600 biomedical journals published in the United States and 70 other countries
- Covering the fields of medicine, nursing, dentistry, veterinary medicine, the health care system, and the preclinical sciences
- Over 12 million citations
- Searchable via PubMed or the NLM Gateway
MeSH (Medical Subject Headings)
- MEDLINE uses MeSH as its controlled vocabulary for indexing database articles.
- Indexers scan an entire article and assign MeSH headings (or MeSH descriptors) to it.
- MeSH descriptors are arranged both in an alphabetic list and in a hierarchical structure.
- MeSH is updated annually to reflect changes in medicine and medical terminology.
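The hierarchical arrangement of MeSH is encoded in tree numbers: each dot-separated segment descends one level, so a descriptor's ancestors are the prefixes of its tree number, and a descriptor can sit at several places in the hierarchy. The sketch below relies only on that convention; the tree numbers in the usage note are illustrative and should be checked against a current MeSH release, and the function names are hypothetical.

```python
def parent_tree_number(tree_number):
    """Drop the last dot-separated segment; a top-level number has no parent."""
    head, sep, _ = tree_number.rpartition(".")
    return head if sep else None


def ancestors(tree_number):
    """All ancestor tree numbers, nearest first."""
    result = []
    parent = parent_tree_number(tree_number)
    while parent is not None:
        result.append(parent)
        parent = parent_tree_number(parent)
    return result


def build_children(descriptors):
    """Index descriptor names by their parent descriptor.

    `descriptors` maps a descriptor name to its list of tree numbers;
    a descriptor with several tree numbers can appear under several parents.
    """
    by_number = {num: name for name, nums in descriptors.items() for num in nums}
    children = {}
    for num, name in by_number.items():
        parent = parent_tree_number(num)
        if parent in by_number:
            children.setdefault(by_number[parent], []).append(name)
    return children
```

For example, with illustrative tree numbers C14 (Cardiovascular Diseases), C14.280 (Heart Diseases), C14.280.647 (Myocardial Ischemia), and C14.280.647.500 (Myocardial Infarction), `ancestors("C14.280.647.500")` walks back up through C14.280.647, C14.280, and C14.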
Our Experimentation
- Problems
  - Searching by descriptors is well known to greatly improve search precision.
  - However, it is very difficult for naïve users to know and use the exact MeSH descriptors when searching.
  - In addition, as MEDLINE grows, information overload makes it harder for users to find information relevant to their interests.
- Proposed Approach
  - Categorization according to MeSH terms, MeSH major topics, and the co-occurrence of MeSH descriptors
  - Clustering of the MeSH term categorization results through the Knowledge Grid
  - Visualization of categories and hierarchical clusters
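Two of the proposed categorizations, by MeSH major topic and by descriptor co-occurrence, reduce to simple grouping and pair counting. This sketch assumes a hypothetical simplified record layout (`{"mesh": [...], "major": [...]}` per article ID), not MEDLINE's actual XML format, and the function names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations


def categorize_by_major_topic(articles):
    """Group article IDs under each MeSH major topic they are indexed with.

    `articles` maps an article ID to {"mesh": [...], "major": [...]}
    (a hypothetical simplified layout).
    """
    groups = defaultdict(list)
    for article_id, record in articles.items():
        for topic in record["major"]:
            groups[topic].append(article_id)
    return dict(groups)


def descriptor_cooccurrence(articles):
    """Count how often each unordered pair of descriptors is assigned to the same article."""
    counts = Counter()
    for record in articles.values():
        # Sorting makes (A, B) and (B, A) the same key; set() drops duplicates.
        for pair in combinations(sorted(set(record["mesh"])), 2):
            counts[pair] += 1
    return counts
```

Frequent pairs from `descriptor_cooccurrence(...).most_common()` suggest descriptor combinations worth surfacing to users who do not know the exact MeSH terms.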
Data Access Services
Figure panels: MeSH Major Topic Tree View; SOM Tree View
Knowledge Grid
- Knowledge Grid Architecture (figure courtesy of Cannataro and Talia, "Knowledge Grid: An Architecture for Distributed Knowledge Discovery")
Future Directions
- Develop a federated search service for OAI-compliant mathematical abstracts.
- Develop an ontology or conceptual maps for mathematics.
- Develop an ontology search service for mathematical abstracts and full papers.
- Develop an architecture that interoperates with other services, such as OCR of mathematical formulas.
Acknowledgements
- Many thanks to the NSF NSDL Program.
- Collaborators: Joe Futrelle (NCSA), Ed Fox (Virginia Tech)
- Student team: Hyunki Kim, Chee Yoong Choo, Xiaoou Fu, Yu Chen