Medical Digital Library to Support Scenario Specific Information Retrieval

1
Medical Digital Library to Support Scenario
Specific Information Retrieval
  • Wesley W. Chu
  • wwc_at_cs.ucla.edu
  • Computer Science Department
  • University of California
  • Los Angeles, California

2
A Project of the NIH Grant at UCLA
  • A Digital File Room for Patient Care, Education,
    and Research

3
Background
Background Hypothesis Specific Aims
Significance Approach and Innovations
Research Progress
  • Current file rooms managing patient records have
    limited functionality
  • Main goal of mapping patient ID to patient
    records
  • PACS implementations are an electronic version of
    the traditional file room

4
Background
Lack of structure makes...
  • Finding relevant information for a particular
    user is time consuming and labor intensive
  • Poorly structured and incomplete results, which
    may affect patient management
  • Current search tools are limited to general use and
    are not tailored to specific users or tasks

5
Digital File Room Requirements
  • A navigable information space providing
  • Relevant and reputable information
  • Access to similar patient records
  • Content-based cross referencing
  • Dynamically updated data repository
  • Tailored access for specific users and devices

6
Hypotheses
  • A digital file room (digital library) that
    delivers relevant and structured answers to
    specific queries can be developed from existing
    medical databases
  • Such a digital file room will increase user
    satisfaction and improve patient management

7
Specific Aims
  • SA1: Develop a system that identifies and
    provides access to reputable information sources
  • SA2: Provide users with greater query capability
    (e.g., similar-to, approximate)
  • SA3: Extract knowledge from patient data, medical
    literature, and radiology teaching files to
    support content-based cross-referencing
  • SA4: Provide access to dynamically updated
    collections based on patient data
  • SA5: Adapt information retrieval to user and
    device characteristics

8
Significance
  • Extend patient record to provide tailored and
    timely access to a broader array of reputable
    medical information

9
Approach and Innovations
  • Intelligent information registration
  • Provide access to multiple, related data sources
    through a single access point
  • Content-based navigation and matching
  • Develop similarity matching based on medical
    concepts patterns
  • Content correlation
  • User and device modeling
  • Adaptive information retrieval based on user and
    device models
  • Scenario-based information web (proxies)
  • Develop an information web linking clustered data
    sources for a given set of related tasks (i.e.,
    scenario)

10
Intelligent Information Registration
  • Registers multiple information sources to provide
    transparent access through a single point (proxy
    object).
  • Information requests are routed to appropriate
    data sources based on query characteristics
  • Data sources are hierarchically clustered
    according to a four-layer data model
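
As a rough illustration of the registration idea on this slide, the sketch below registers a few data sources behind a single proxy object and routes each request by one query characteristic. The class name `Proxy`, the source descriptions, and the use of a `content_type` field as the routing key are all assumptions made for illustration; the slides do not specify the routing rules or the four-layer data model.

```python
class Proxy:
    """Hypothetical single access point that forwards requests to sources."""

    def __init__(self):
        # Maps a content type to the handlers registered for it.
        self._sources = {}

    def register(self, content_type, handler):
        """Register a data source (a callable) under a content type."""
        self._sources.setdefault(content_type, []).append(handler)

    def query(self, request):
        """Route the request to every source registered for its content type."""
        handlers = self._sources.get(request["content_type"], [])
        return [handler(request["text"]) for handler in handlers]


proxy = Proxy()
# Stand-in sources: each one just reports what it would search for.
proxy.register("literature", lambda text: f"literature search for '{text}'")
proxy.register("patient", lambda text: f"radiology report search for '{text}'")
proxy.register("patient", lambda text: f"lab result search for '{text}'")

results = proxy.query({"content_type": "patient", "text": "lung mass"})
```

The caller only sees `proxy.query`; which sources answer is decided by the registry, which is the "transparent access through a single point" property described above.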

11
Content-Based Navigation and Matching
  • Two types of navigation
  • Navigation of the information space using proxies
    and content correlation
  • Pattern/similarity navigation using type
    abstraction hierarchies (TAHs)

12
Pattern-Based Type Abstraction Hierarchies
  • Scalable, hierarchical knowledge structures that
    facilitate similarity matching

13
Adaptive Information Retrieval
  • Tailors query processing and query results
    according to
  • Particular user
  • Characteristics of their device
  • Examples
  • Doctors prefer JAMA or Lancet while patients
    prefer Time or CNN.
  • High resolution workstations support large,
    detailed imaging studies while portable devices
    need lower-bandwidth data.
  • Allows the system to retrieve appropriate data
    for a particular query, user, and device
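
A minimal sketch of how results might be tailored to a user and a device, using the slide's own examples (doctors versus patients, workstations versus portable devices). The preference tables, the pixel limits, the `Document` fields, and the sample documents are all hypothetical; the slides do not describe the actual user or device models.

```python
from dataclasses import dataclass

# Hypothetical user model: preferred sources per user type (from the slide's examples).
PREFERRED_SOURCES = {
    "doctor": {"JAMA", "Lancet"},
    "patient": {"Time", "CNN"},
}
# Hypothetical device model: maximum image width in pixels (values are assumed).
MAX_IMAGE_WIDTH = {"workstation": 2048, "portable": 320}


@dataclass
class Document:
    title: str
    source: str
    image_width: int


def adapt(results, user_type, device):
    """Keep documents from the user's preferred sources and cap image size."""
    preferred = PREFERRED_SOURCES[user_type]
    limit = MAX_IMAGE_WIDTH[device]
    adapted = []
    for doc in results:
        if doc.source not in preferred:
            continue
        adapted.append(Document(doc.title, doc.source, min(doc.image_width, limit)))
    return adapted


# Made-up query results for illustration only.
results = [
    Document("Lung cancer staging", "JAMA", 2048),
    Document("Understanding lung cancer", "CNN", 1024),
]
for_doctor = adapt(results, "doctor", "workstation")
for_patient = adapt(results, "patient", "portable")
```

The same query result list yields a different answer for each (user, device) pair, which is the behavior the last bullet describes.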

14
Scenario-Based Proxy
  • A framework that defines, for a particular domain
    and set of tasks, the access methods to and the
    relationships between information sources.
  • intelligent information registration
  • pattern-based similarity matching
  • adaptive information retrieval
  • information web

15
Scenario-Based Information Web
  • A directed graph that defines access paths for
    navigation among proxy objects

[Figure: information web in which Patient, Literature, and Teaching File proxy objects are connected by similar-to and correlated-to links]
16
Scenario-Based Information Web
  • Similar-to links relate objects based on their
    similarity
  • patients similar by age, sex, and disease
  • Correlated-to links relate objects based on
    related content
  • disease can be correlated to relevant literature

Extends the scope of the digital file room into a
digital medical library
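
The information web can be sketched as a directed graph with typed links between proxy objects. The class name `InformationWeb` and the specific set of edges below are assumptions: the slides name the node types (Patient, Literature, Teaching File) and the link types (similar-to, correlated-to), but the exact edges of the original figure are not recoverable from the transcript.

```python
from collections import defaultdict


class InformationWeb:
    """Hypothetical directed graph of proxy objects with typed links."""

    def __init__(self):
        # source proxy -> list of (link type, target proxy)
        self._links = defaultdict(list)

    def add_link(self, source, link_type, target):
        self._links[source].append((link_type, target))

    def neighbors(self, source, link_type=None):
        """Targets reachable from source, optionally filtered by link type."""
        return [
            target
            for kind, target in self._links.get(source, [])
            if link_type is None or kind == link_type
        ]


web = InformationWeb()
# Assumed edges, chosen to match the two link types described on the slide.
web.add_link("Patient", "similar-to", "Patient")
web.add_link("Patient", "correlated-to", "Literature")
web.add_link("Patient", "correlated-to", "Teaching File")
web.add_link("Teaching File", "similar-to", "Teaching File")

correlated = web.neighbors("Patient", "correlated-to")
```

Navigation then amounts to following links of a chosen type: from a patient record to similar patients, or from a patient record to correlated literature and teaching files.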

17
Research Progress
  • Phrase Indexing
  • Phrases are generated from n-word combinations in a
    sentence.
  • Domain Specific Retrieval
  • Document Summarization
  • Content Correlation
  • Linking of relevant documents via patterns

18
Domain Specific Retrieval
  • Documents are grouped into domain-specific
    collections
  • Medical patient reports
  • Web sites are often tailored to specific subject
    areas
  • Phrases can capture content better than single
    words and thus improve retrieval performance

19
Problem With Longer Phrases
Large combinatorial problem
To process longer phrases it is necessary to
partition documents into smaller segments
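
To make the size of the combinatorial problem concrete, the sketch below counts candidate phrases under the assumption that a phrase is an unordered combination of k distinct words drawn from one text segment. The helper name `phrase_candidates` and the segment sizes (6 and 200 words) are illustrative choices, not figures from the slides.

```python
from math import comb


def phrase_candidates(num_words, max_len=4):
    """Number of k-word combinations for k = 2..max_len in one segment."""
    return {k: comb(num_words, k) for k in range(2, max_len + 1)}


sentence_counts = phrase_candidates(6)    # a short sentence after stop word removal
document_counts = phrase_candidates(200)  # an unsegmented 200-word text
```

For the 6-word segment this gives 15, 20, and 15 candidates for 2-, 3-, and 4-word phrases. For the 200-word segment it gives 19,900, 1,313,400, and 64,684,950, which is why restricting combinations to small segments matters for longer phrases.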
20
Phrase Analysis
  • A phrase is defined as any 2, 3 or 4 words
    co-occurring in a sentence (word combination)
  • Very large number of possible phrases
  • Use a stoplist to remove useless words
  • Normalize words to a common stem

[Figure: phrase analysis steps for one sentence]
  Sentence:            The right upper lobe mass is seen again.
  Case normalization:  the right upper lobe mass is seen again
  Stop word removal:   right upper lobe mass seen again
  Stemming:            right upp lob mass seen again
  Sorting:             again lob mass right seen upp
  Candidate 2-word combinations (15):
    again lob, again mass, again right, again seen, again upp,
    lob mass, lob right, lob seen, lob upp, mass right,
    mass seen, mass upp, right seen, right upp, seen upp
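
The phrase analysis steps on this slide (case normalization, stop word removal, stemming, sorting, and combination) can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the stoplist is a tiny hypothetical one, and `toy_stem` is a crude suffix stripper written only so that "upper" and "lobe" reduce to "upp" and "lob" as in the slide's example. A real system would use a full stoplist and a proper stemmer.

```python
import re
from itertools import combinations

# Hypothetical, deliberately tiny stoplist.
STOPLIST = {"the", "is", "a", "an", "of", "in"}


def toy_stem(word):
    """Crude stand-in for a real stemmer: strip 'er' or 'e' if 3+ letters remain."""
    for suffix in ("er", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def candidate_phrases(sentence, sizes=(2, 3, 4)):
    """Return candidate n-word combinations for one sentence, keyed by n."""
    words = re.findall(r"[a-z]+", sentence.lower())        # case normalization
    kept = [w for w in words if w not in STOPLIST]         # stop word removal
    stems = sorted({toy_stem(w) for w in kept})            # stemming, then sorting
    return {n: list(combinations(stems, n)) for n in sizes}


phrases = candidate_phrases("The right upper lobe mass is seen again.")
two_word = phrases[2]
```

Sorting the stems before combining means each word pair has one canonical form, so "lobe mass" and "mass ... lobe" in different sentences map to the same candidate phrase.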
21
Document Retrieval Evaluation
  • Preliminary evaluation
  • A domain specific collection of documents
  • Can phrase analysis limited to sentences improve
    retrieval effectiveness?
  • SMART system (single word terms) used as baseline
  • Data
  • Thoracic radiology patient reports
  • Dictated reports
  • Describe anatomy and abnormal findings such as
    enlarged lymph nodes and cancer masses

22
Domain Specific Document Retrieval
  • Query: right upper lobe mass

23
Automatic Text Summarization
  • Salton Method
  • Given a text file with n paragraphs
  • A paragraph can be represented by
    $D_i = (d_{i1}, d_{i2}, \ldots, d_{im})$
  • $d_{ik}$ is the weight representing the importance
    of term $T_k$ (word or phrase)
  • The pair-wise similarity of two paragraphs
  • $\mathrm{Sim}(D_i, D_j) = \sum_{k=1}^{m} d_{ik} d_{jk}$
  • Text relationship map
  • Nodes: paragraphs
  • Links: pair-wise similarity of the connected
    nodes
  • Links are created if $\mathrm{Sim}(D_i, D_j) >$ threshold

Bushiness of a node = number of links of the node.
The text summarization is derived from the bushy nodes.
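
The text relationship map and bushiness ranking described on this slide can be sketched in a few lines. The function names and the term weights below are made up for illustration; how the weights are actually computed is not specified in the slides.

```python
def similarity(d_i, d_j):
    """Pair-wise similarity: sum over terms of the product of the two weights."""
    return sum(w_i * w_j for w_i, w_j in zip(d_i, d_j))


def bushiness(paragraph_vectors, threshold):
    """Number of links per paragraph; a link exists if similarity > threshold."""
    n = len(paragraph_vectors)
    links = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if similarity(paragraph_vectors[i], paragraph_vectors[j]) > threshold:
                links[i] += 1
                links[j] += 1
    return links


def rank_by_bushiness(paragraph_vectors, threshold):
    """Paragraph indices ordered from most to least bushy (ties keep text order)."""
    links = bushiness(paragraph_vectors, threshold)
    return sorted(range(len(links)), key=lambda i: -links[i])


# Made-up term weights for 4 paragraphs over 3 terms.
vectors = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
links = bushiness(vectors, threshold=0.2)
ranking = rank_by_bushiness(vectors, threshold=0.2)
```

A summary would then be assembled from the top-ranked paragraphs. Because a link depends on a strict comparison with the threshold, changing the threshold can change the ranking.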
24
Performance Comparison of Salton's Summarization
Method Based on Phrases and Single Words
Aspirin.txt: paragraph ranking based on bushiness

             Words            2-word phrases   3-word phrases
Threshold    0.1  0.2  0.3    0.1  0.2  0.3    0.1  0.2  0.3
No.1         4    6    8      2    2    2      2    2    2
No.2         6    8    2      3    3    3      3    3    3
No.3         8    3    3      6    6    6      8    8    8
No.4         1    4    4      1    4    4      4    4    4
No.5         5    5    5      8    5    5      6    6    6
No.6         2    1    6      4    1    1      5    5    5
No.7         3    2    1      5    8    8      7    7    7
No.8         9    9    9      7    7    7      1    1    1
No.9         7    7    7      9    9    9      9    9    9
Summarization based on phrases is less sensitive to the
threshold setting than summarization based on single words.
25
N-words Distribution
26
Number Distinct Freq Words
27
Number of Valid Sentences
28
Performance Comparison
29
Comparison (cont)
30
Content Correlation
  • Given a document in one collection, content
    correlation links relevant documents in another
    document collection

31
Document Cluster By Pattern
  • A pattern is a set of unique terms that
    characterize some features in the data set
  • Patterns can be found in a collection of
    documents by data mining
  • Documents are grouped into clusters based on
    patterns via a clustering technique

32
Cluster Signature
  • Every cluster can be classified according to the
    occurrence frequency of the patterns
  • Questions to answer
  • Which set of patterns summarizes a given cluster?
  • How are the patterns related among the clusters?

[Figure: Literature and Patient Records collections]
33
Deriving Cluster Signature
  • Metrics
  • Local Cluster Certainty (LCC) measures the
    coverage of a pattern in a given cluster
    (Popularity)
  • The Global Cluster Certainty (GCC) measures the
    coverage of a pattern among clusters
    (Exclusiveness)
  • The Cluster Signature is the set of those
    patterns that have both high LCC and GCC
  • Documents from one collection (source) can be
    linked to relevant clusters in another collection
    (target)
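
The slides describe LCC and GCC only in words, so the sketch below uses stand-in definitions that are assumptions, not the authors' formulas: LCC is taken as the fraction of a cluster's documents that contain the pattern, and GCC as the fraction of all documents containing the pattern that fall in this cluster. The thresholds, the sample documents, and the linking rule in `linked_clusters` are likewise hypothetical.

```python
def lcc(pattern, cluster):
    """Assumed LCC: share of the cluster's documents containing the pattern."""
    return sum(pattern <= doc for doc in cluster) / len(cluster)


def gcc(pattern, cluster, all_clusters):
    """Assumed GCC: share of pattern-bearing documents that lie in this cluster."""
    inside = sum(pattern <= doc for doc in cluster)
    everywhere = sum(pattern <= doc for c in all_clusters for doc in c)
    return inside / everywhere if everywhere else 0.0


def cluster_signature(cluster, all_clusters, patterns, min_lcc=0.6, min_gcc=0.6):
    """Patterns that are both popular in the cluster and exclusive to it."""
    return [
        p for p in patterns
        if lcc(p, cluster) >= min_lcc and gcc(p, cluster, all_clusters) >= min_gcc
    ]


def linked_clusters(doc, signatures):
    """Link a source document to target clusters whose signature it matches."""
    return [name for name, sig in signatures.items() if any(p <= doc for p in sig)]


# Made-up documents, each a set of stemmed terms.
cluster_a = [{"laparoscop", "pediatr", "perform"}, {"laparoscop", "patient", "compl"}]
cluster_b = [{"urolog", "patient", "result"}, {"urolog", "pediatr", "result"}]
clusters = [cluster_a, cluster_b]
patterns = [
    frozenset({"laparoscop"}),
    frozenset({"patient"}),
    frozenset({"laparoscop", "pediatr"}),
    frozenset({"urolog"}),
]
signatures = {
    "A": cluster_signature(cluster_a, clusters, patterns),
    "B": cluster_signature(cluster_b, clusters, patterns),
}
links = linked_clusters({"laparoscop", "child"}, signatures)
```

In this toy data "patient" is popular in neither cluster exclusively, so it is excluded from both signatures, while "laparoscop" and "urolog" each identify one cluster.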

34
Preliminary Results
  • A collection of 69 pediatric urology literature
    abstracts taken from Medline was clustered using
    the complete-link clustering algorithm
  • 3 large clusters, each with 2 or more
    sub-clusters
  • GCC and LCC were calculated for patterns found in
    several sub-clusters
  • Data from one sub-cluster is reported here

35
LCC and GCC

LCC
Term/Phrase                    Cl
Pediatr                        1.0
Result                         1.0
Patient                        1.0
Perform                        1.0
Compl                          1.0
Laparoscop                     1.0
Urolog                         0.34
Laparoscop pediatr             1.0
Laparoscop perform             1.0
Diagnost laparoscop            0.35
Laparoscop operat              0.35
Compl rate                     0.35
Laparoscop patient             0.35
Laparoscop operat perform      0.0817
Laparoscop patient perform     0.0817

GCC
Term/Phrase                    Cg
Laparoscop                     0.1887
Compl                          0.0817
Child Laparoscop               1.0
Laparoscop patient             1.0
Compl Laparoscop               1.0
Comple techn                   1.0
<MEAS> compl                   1.0
Laparoscop perform             0.6088
Compl rate                     0.4564
Laparoscop patient perform     1.0
Laparoscop perform procedur    1.0
<MEAS> compl rate              1.0
Laparoscop pediatr perform     1.0
Compl laparoscop techn         1.0
36
Project Summary
  • A system that provides
  • relevant and reputable information,
  • access to similar patient records,
  • content-based cross referencing,
  • a dynamically updated data repository, and
  • tailored access for specific users and devices
  • will
  • augment the patient record to provide tailored
    and timely access to a broader array of reputable
    information and
  • extend the digital file room into a digital
    medical library.

37
Research Results
  • Phrase Indexing
  • Developed an efficient algorithm for extracting
    n-word features from textual documents
  • Phrase indexes provide better results than single
    word indexes in document retrieval and
    summarization
  • Content Correlation via Cluster Signature (LCC
    and GCC)
  • Preliminary results show the feasibility of using
    cluster signatures for linking relevant documents
  • Work begun on proxy for information navigation

38
Future Work
  • Develop Ontology for Intelligent Information
    Registration
  • User Model for Information Retrieval