Automating Discovery from Biomedical Texts

1
Automating Discovery from Biomedical Texts
  • Marti Hearst, Barbara Rosario
  • UC Berkeley
  • Agyinc Visit
  • August 16, 2000

2
The LINDI Project: Linking Information for New
Discoveries
Two Main Thrusts
  • UIs for building and reusing hypothesis seeking
    strategies.
  • Statistical language analysis techniques for
    extracting propositions

3
Scenario: Explore Functions of a Gene
  • Objective
      • Determine the functions of a newly sequenced
        Gene X.
  • Known facts
      • Gene X co-expresses (is activated in the same
        cell) with Genes A, B, C
      • The relationship of Genes A, B, C with certain
        types of diseases (from medical literature)
  • Question
      • What types of diseases is Gene X related to?

4
Gene Co-expression: Role in the genetic pathway
[Figure: diagram with nodes labeled Kall., PSA, PAP, g?, and h?]
Other possibilities as well
5
Make use of the literature
  • Look up what is known about the other genes.
  • Different articles in different collections
  • Look for commonalities
  • Similar topics indicated by Subject Descriptors
  • Similar words in titles and abstracts
  • adenocarcinoma, neoplasm, prostate, prostatic
    neoplasms, tumor markers, antibodies ...

6
Developing Strategies
  • Different strategies seem needed for different
    situations
  • First see what is known about Kallikrein.
  • 7341 documents. Too many
  • AND the result with disease category
  • If result is non-empty, this might be an
    interesting gene
  • Now get 803 documents

7
Explore Functions of New Gene X
[Diagram labels: Medical Literature, Query, Projection, Mapping]
Slide adapted from K. Patel
8
Developing Strategies
  • Different strategies seem needed for different
    situations
  • First see what is known about Kallikrein.
  • 7341 documents. Too many
  • AND the result with disease category
  • If result is non-empty, this might be an
    interesting gene
  • Now get 803 documents
  • AND the result with PSA
  • Get 11 documents. Better!
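
The narrowing steps above amount to repeated Boolean AND over result sets. A minimal sketch, assuming a hypothetical inverted index from term to document IDs (the IDs, terms, and the `narrow` helper are made up for illustration and are not part of LINDI):

```python
# Hypothetical inverted index: term -> set of document IDs (illustrative data only).
index = {
    "kallikrein": {1, 2, 3, 4, 5, 6},
    "disease": {2, 3, 4, 9},
    "psa": {3, 4, 7},
}

def narrow(index, terms):
    """AND the terms one at a time, recording the result size after each step."""
    result = None
    sizes = []
    for term in terms:
        docs = index.get(term, set())
        result = set(docs) if result is None else result & docs
        sizes.append((term, len(result)))
    return result, sizes

docs, sizes = narrow(index, ["kallikrein", "disease", "psa"])
print(sizes)         # result size after each AND
print(sorted(docs))  # documents that survive every AND
```

Recording the size after each step mirrors the "too many, then fewer, then better" judgment the researcher makes by hand.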

9
Explore Functions of New Gene X
[Diagram labels: Medical Literature, Query, Projection, Intersection]
10
Developing Strategies
  • Look for commonalities among these documents
  • Manual scan through 100 category labels
  • Would have been better if
  • Automatically organized
  • Intersections of important categories scanned
    for first

11
Explore Functions of New Gene X
[Diagram labels: Medical Literature, Query, Projection, Intersection, Slicing, Mapping]
Slide adapted from K. Patel
12
Try a new tack
  • Researcher uses knowledge of field to realize
    these are related to prostate cancer and
    diagnostic tests
  • New tack: intersect search on all three known
    genes
  • Hope they all talk about diagnostics and prostate
    cancer
  • Fortunately, 7 documents returned
  • Bingo! A relation to regulation of this cancer

13
Explore Functions of New Gene X
[Diagram labels: Medical Literature, Possible Function For Gene-X, Query, Query, Projection, Intersection, Slicing, Mapping]
Slide adapted from K. Patel
14
Formulate a Hypothesis
  • Hypothesis: mystery gene has to do with
    regulation of expression of genes leading to
    prostate cancer
  • New tack: do some lab tests
  • See if mystery gene is similar in molecular
    structure to the others
  • If so, it might do some of the same things they
    do

15
Strategies again
  • In hindsight, combining all three genes was a
    good strategy.
  • Store this for later
  • Might not have worked
  • Need a suite of strategies
  • Build them up via experience and a good UI

16
The System
  • Doing the same query with slightly different
    values each time is time-consuming and tedious
  • Same goes for cutting and pasting results
  • IR systems don't support varying queries like
    this very well.
  • Each situation is a bit different
  • Some automatic processing is needed in the
    background to eliminate/suggest hypotheses

17
The User Interface
  • A general search interface should support
  • History
  • Context
  • Comparison
  • Operators: Intersection, Union, Slicing
  • Operator Reuse
  • Visualization (where appropriate)
  • We have an initial implementation
  • It needs lots of work

18
Architecture of LINDI UI
  • Data Layer
  • Annotation Layer
  • User Interface Layer

19
Data Layer
  • Purpose
      • Hide different formats of text collections
  • Components
      • Data Abstractions representing records of a
        text collection
      • Operations performed on the data
  • Data
      • A set of records
      • Each record is a set of tuples with types
  • Operations
      • union, intersection, projection, mapping
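
One way to picture this data model is the sketch below. It assumes a record is a frozenset of (type, value) tuples and a data set is a Python set of records. The field types ("id", "gene", "mesh"), the sample records, and the reading of "mapping" as applying a function to every record are all assumptions for illustration, not the LINDI implementation.

```python
# A record is a frozenset of (type, value) tuples; a data set is a set of records.
# Field types and sample values below are illustrative assumptions.

def record(*pairs):
    return frozenset(pairs)

def union(a, b):
    return a | b

def intersection(a, b):
    return a & b

def projection(data, types):
    """Keep only tuples whose type is in `types`; drop records left empty."""
    projected = {frozenset(t for t in rec if t[0] in types) for rec in data}
    projected.discard(frozenset())
    return projected

def mapping(data, fn):
    """Apply fn to every record (one possible reading of 'mapping')."""
    return [fn(rec) for rec in data]

r1 = record(("id", "d1"), ("gene", "PSA"), ("mesh", "Prostatic Neoplasms"))
r2 = record(("id", "d2"), ("gene", "PAP"), ("mesh", "Prostatic Neoplasms"))
r3 = record(("id", "d3"), ("gene", "PAP"), ("mesh", "Tumor Markers"))

psa_docs = {r1}
pap_docs = {r2, r3}

# Whole records never match across genes; they overlap only after projecting onto "mesh".
shared = intersection(projection(psa_docs, {"mesh"}), projection(pap_docs, {"mesh"}))
print(shared)
```

Because records are hashable frozensets, union and intersection fall out of the built-in set operators, and projection is what makes records from different queries comparable.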

20
Annotation Layer
  • Purpose
      • Associate data sets with the operations that
        produced them (history)
      • History is a first class object
  • Advantage
      • Streamline a sequence of operations
      • Reuse operations
      • Parameterize operations
21
User Interface
  • Direct manipulation of information objects and
    access operations
  • Query
  • Intersection
  • Union
  • Mapping
  • Slicing
  • Record and reuse of past operations
  • Parameterization of operations
  • Streamlining of operations

22
Initial Palette
23
Query Structure Determined by Collection Type
24
Query Operation Results
25
Projection Operation and Subsequent Results
26
Parameterized Query: Repeat operations with
different values
[Figure labels: GA, GB, GC]
27
Intersection over Projected Attribute
28
Intersection over Projected Attribute
29
Example Interaction with UI Prototype
  1. Query on gene names
  2. Project out only MeSH headings
  3. Intersect the results
  4. Map to create a ranking
  5. Slice out the top-ranked
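
The five steps can be sketched end to end as follows. The mini-collection `DOCS`, the function names, and the scoring rule (total number of documents carrying a heading) are assumptions chosen for illustration; the slides do not say how the prototype ranks results.

```python
from collections import Counter

# Hypothetical mini-collection: gene name -> one set of MeSH headings per document.
DOCS = {
    "Kallikrein": [{"Prostatic Neoplasms", "Tumor Markers"}, {"Kidney"}],
    "PSA": [{"Prostatic Neoplasms", "Tumor Markers"},
            {"Prostatic Neoplasms", "Antibodies"}],
    "PAP": [{"Prostatic Neoplasms"}, {"Tumor Markers", "Prostatic Neoplasms"}],
}

def query(gene):                 # 1. query on a gene name
    return DOCS.get(gene, [])

def project_mesh(docs):          # 2. project out only the MeSH headings
    counts = Counter()
    for headings in docs:
        counts.update(headings)
    return counts

def intersect(counters):         # 3. keep headings that occur for every gene
    common = set(counters[0])
    for counter in counters[1:]:
        common &= set(counter)
    return common

def rank(common, counters):      # 4. map each heading to a score, then sort
    scored = [(sum(c[h] for c in counters), h) for h in common]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

def top(ranked, k):              # 5. slice out the top-ranked headings
    return ranked[:k]

counters = [project_mesh(query(g)) for g in ("Kallikrein", "PSA", "PAP")]
ranked = rank(intersect(counters), counters)
best = top(ranked, 1)
print(ranked)
print(best)
```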
30
Future Work on UI
  • As currently designed
  • Better labeling
  • Better layout
  • Intuitive
  • Scalable
  • Connection to real backend
  • User Testing
  • Does direct manipulation work?
  • What operator sequences help?
  • How to improve parameterization?
  • More advanced
  • Support for strategies
  • Incorporation of NLP

31
Language Analysis Component
  • Goals
  • Extract Propositions from Text
  • Make Inferences

32
Language Analysis Component
  • Why Extract Propositions from Text?
  • Text is how knowledge at the propositional level
    is communicated
  • Text is continually being created and updated by
    the outside world

33
Example: Statistical Semantic Grammar
  • To detect causal relationships between medical
    concepts
  • Title
  • Magnesium deficiency implicated in increased
    stress levels.
  • Interpretation
  • <nutrient><reduction> related-to
    <increase><symptom>
  • Inference
  • Increase(stress, decrease(mg))
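
To make the title-to-inference path concrete, here is a deliberately tiny, non-statistical sketch. The hand-written `LEXICON`, the single linking word, and the one hard-coded pattern are assumptions for illustration only; the approach described in these slides would learn such classes and patterns probabilistically rather than list them by hand.

```python
# Hand-written toy lexicon: word -> (semantic class, symbol used in the proposition).
LEXICON = {
    "magnesium": ("nutrient", "mg"),
    "deficiency": ("reduction", "decrease"),
    "increased": ("increase", "increase"),
    "stress": ("symptom", "stress"),
}
LINK_WORDS = {"implicated"}  # words treated as the related-to link (assumption)

def interpret(title):
    """Tag known words with semantic classes and split the title at the linking word."""
    words = [w.strip(".,").lower() for w in title.split()]
    left, right, seen_link = [], [], False
    for word in words:
        if word in LINK_WORDS:
            seen_link = True
        elif word in LEXICON:
            (right if seen_link else left).append(LEXICON[word])
    return left, right

def infer(left, right):
    """Build a nested proposition when the two sides match the one known pattern."""
    if [cls for cls, _ in left] != ["nutrient", "reduction"]:
        return None
    if [cls for cls, _ in right] != ["increase", "symptom"]:
        return None
    nutrient, reduction = left[0][1], left[1][1]
    change, symptom = right[0][1], right[1][1]
    return f"{change.capitalize()}({symptom}, {reduction}({nutrient}))"

title = "Magnesium deficiency implicated in increased stress levels."
left, right = interpret(title)
proposition = infer(left, right)
print(proposition)
```

Titles that do not fit the single pattern return None, which is exactly the brittleness of hand-built semantic grammars.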

34
Statistical Semantic Grammars
  • Empirical NLP has made great strides
  • But mainly applied to syntactic structure
  • Semantic grammars are powerful, but
  • Brittle
  • Time-consuming to construct
  • Idea
  • Use what we now know about statistical NLP to
    build up a probabilistic grammar

35
LINDI Target Components
  • Special UI for retrieving appropriate docs
  • Language analysis on docs to detect causal
    relationships between concepts
  • Probabilistic representation of concepts and
    relationships
  • UI: User Hypothesis creation