EXPERIENCES IN BUILDING AN ONTOLOGY-DRIVEN IMAGE DATABASE FOR BIOLOGISTS

Description:

University of Pennsylvania School of Medicine. Image ... (who took the original micrograph, where, when, under what conditions, for what purpose, etc. ...

Slides: 20
Provided by: drda79

Transcript and Presenter's Notes

1
EXPERIENCES IN BUILDING AN ONTOLOGY-DRIVEN IMAGE
DATABASE FOR BIOLOGISTS
2
Outline
  • Why are images important?
  • What is the BioImage database?
  • Why use a semantic web architecture?
  • Lessons and research questions

3
Why are biological images important in the
post-genomic age?
  • Images are semantic instruments for capturing
    aspects of the real world, and form a vital part
    of the scientific record, for which words are no
    substitute
  • In the post-genomic world, attention is now
    focused on the organization and integration of
    information within cells, for functional analyses
    of gene products
  • In a month a single active cell biology lab may
    generate between 10 and 100 Gbytes of
    multidimensional image data

4
Images are complex
  • An image database must be able to store original
    images in any digital format currently available
    or yet to be invented, including multi-channel 3D
    images, multi-channel videos, etc.

5
The need for image databases
  • The value of digital image information depends
    upon how easily it can be located, searched for
    relevance, and retrieved
  • Detailed descriptive metadata about the images
    are essential
  • Without them, digital image repositories become
    little more than meaningless and costly data
    graveyards
  • Despite the growth of on-line journals that
    permit the inclusion of media objects, few of
    these resources are freely available, and those
    that are free are difficult to locate and are
    not cross-searchable
  • There is thus a need for a free, publicly
    available image database with rich,
    well-structured, searchable metadata
  • The BioImage Database seeks to fulfil that need

6
This view has a growing acceptance
7
What metadata?
  • Image acquisition (who took the original
    micrograph, where, when, under what conditions,
    for what purpose, etc.)
  • The media object itself (source and derivation,
    image type, dynamic range, resolution, format,
    codec, etc.)
  • The denotation of the referent (e.g. the name,
    age and condition of the subject)
  • Connotation of the referent (the image's
    interpretation, meaning, purpose or significance,
    its relevance to its creator and others, and its
    semantic relationship to other images)
  • Field aspects of the real world that cannot
    conveniently be attached to any particular object
    (e.g. variations of illumination intensity or
    chemo-attractant concentration across the field
    of view of a light microscope image).
  • Sequences of change where there is a need to
    preserve the concept of object identity in the
    face of radical spatio-temporal changes in
    appearance.
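
One way to picture these categories is as a single structured record per image. The following is a minimal Python sketch, not the BioImage schema: every class name, field name and sample value is a hypothetical illustration chosen only to mirror the bullets above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Acquisition:
    """Who took the original micrograph, where, when, and why."""
    creator: str
    place: str
    date: str
    conditions: str = ""
    purpose: str = ""


@dataclass
class MediaObject:
    """Properties of the stored media object itself."""
    image_type: str
    file_format: str
    resolution: str = ""
    derived_from: Optional[str] = None  # source image, if this one is derived


@dataclass
class ImageRecord:
    """Hypothetical record combining the metadata categories listed above."""
    acquisition: Acquisition
    media: MediaObject
    denotation: dict = field(default_factory=dict)        # e.g. name, age, condition
    connotation: List[str] = field(default_factory=list)  # interpretation, significance
    field_aspects: dict = field(default_factory=dict)     # e.g. illumination gradient
    sequence_id: Optional[str] = None  # ties together frames of one changing object


# Made-up sample values, for illustration only.
record = ImageRecord(
    acquisition=Acquisition("A. Researcher", "Example Lab", "2003-01-01"),
    media=MediaObject("fluorescence micrograph", "TIFF"),
    denotation={"subject": "mouse fibroblast"},
)
```

Keeping connotation and field aspects as separate, optional parts reflects the point of the list: they describe the image as a whole and cannot be hung on any one depicted object.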

8
Why use a semantic web architecture?
  • Traditional relational databases don't meet our
    needs
  • Image data is complex, layered, and difficult to
    model
  • Images are searched primarily through their
    metadata
  • Metadata is time consuming and difficult to
    obtain
  • Ontologies offer the promise of better retrieval
    accuracy through linking to instances in an
    ontology, rather than attempting to process free
    text.
  • Ontologies offer the promise of easy
    inter-operability with other systems
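
The retrieval argument above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BioImage code: the tiny is-a hierarchy, the image ids and the function names are all hypothetical.

```python
# Hypothetical mini-ontology: each term maps to its parent term (is-a).
IS_A = {
    "fibroblast": "connective tissue cell",
    "connective tissue cell": "cell",
    "neuron": "neural cell",
    "neural cell": "cell",
}

# Hypothetical annotations: image id -> ontology term it is linked to.
ANNOTATIONS = {
    "img001": "fibroblast",
    "img002": "neuron",
    "img003": "neural cell",
}


def ancestors(term):
    """Return the term followed by all of its ancestors in the is-a hierarchy."""
    found = [term]
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def ontology_search(term):
    """Images annotated with the term itself or with any more specific term."""
    return sorted(img for img, t in ANNOTATIONS.items() if term in ancestors(t))


def free_text_search(words):
    """Naive substring match on the annotation text, for comparison."""
    return sorted(img for img, t in ANNOTATIONS.items() if words in t)
```

With these sample data, `ontology_search("neural cell")` also finds the image annotated as "neuron", whereas `free_text_search("neural cell")` misses it because the string does not occur in the annotation.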

9
The BioImage Ontology
10
Lessons learned: performance, scalability
  • Retrieval is slower than it would be from a
    traditional relational database
  • Scalability remains to be tested (true for all
    semantic web software)
  • Query languages such as RDQL are immature
    compared with SQL
  • Parsing RDF is hard and slow (RDF-ABBREV output
    of the Jena parser is unreliable and the
    unstriped format requires multiple passes to
    create XML that can easily be transformed to HTML)

11
A problem with ontologies?
  • The volume of data generated in the Life Sciences
    is now estimated to be doubling every month
  • Already people look less and less at the raw
    scientific data (unless they are their own
    results)
  • As this volume of data accumulates, few if any of
    us will have the time or the mental capacity to
    assimilate new data, structure them in a
    meaningful way and extract information, without
    first processing the data through an ontology or
    some other similar machine-based organisational
    aid
  • THE ONTOLOGY WILL BE WRONG! (or we should all
    pack up and go home)

12
Paradigm shifts
  • Our human understanding of an area of science is
    never static, but is constantly being revised by
    new research
  • Such revisions in understanding are either
    evolutionary (incremental), following the
    progressive discovery of more and more detail,
    interpreted according to the prevailing paradigm,
    or revolutionary, when the prevailing paradigm is
    overthrown by another
  • How do paradigm revolutions succeed?
  • "A new scientific truth does not triumph by
    convincing its opponents and making them see the
    light, but rather because its opponents
    eventually die, and a new generation grows up
    that is familiar with it" (Max Planck, 1949)

13
Factors preventing evolution
  • Ontology builders are monks (and nuns) - led by
    an abbot, a relatively senior domain expert
    likely to be committed to encapsulating the
    dominant paradigm
  • Substantial problems confront any newcomers
    wishing to contribute, since ontology building is
    time-consuming and expensive
  • Since an ontology expresses the community
    consensus, there will be massive social pressures
    against change
  • If large volumes of data have already been
    encoded using an existing ontology, this will
    make it difficult to introduce change
  • The first ontology in a domain may assume a
    monopolistic position that becomes unassailable,
    even if it has universally acknowledged
    weaknesses
  • Ontologies are unlikely to evolve in response to
    the same market forces that drive the development
    of applications software

14
Encapsulating the dominant paradigm
  • Imagine a section of an ontology describing the
    development of adult mammalian bone marrow and
    brain, constructed according to the pre-1980
    dominant paradigm that bone marrow develops from
    mesoderm, while brain develops from
    ectoderm

15
An example of paradigm evolution
  • Subsequently, adult mouse brain was found to
    contain haemopoietic stem cells
  • Bartlett (1982) hypothesised that these cells
    developed from foetal haemopoietic cells that
    entered the brain tissue before the blood-brain
    barrier was established
  • This challenge to the dominant paradigm that
    brain tissues are derived exclusively from
    ectoderm can be accommodated by extending the
    graph
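
"Extending the graph" can be read as adding statements without retracting any. A minimal Python sketch of that reading follows; the triples are simplified paraphrases of the slides, and the predicate names and the helper `is_extension` are assumptions made for illustration.

```python
# Statements as (subject, predicate, object) triples.
PRE_1980 = {
    ("bone marrow", "develops_from", "mesoderm"),
    ("brain", "develops_from", "ectoderm"),
}

# Bartlett's hypothesis adds new statements and removes none.
BARTLETT_1982 = PRE_1980 | {
    ("adult brain", "contains", "haemopoietic stem cell"),
    ("haemopoietic stem cell", "develops_from", "foetal haemopoietic cell"),
}


def is_extension(old, new):
    """True if every statement of the old graph survives in the new one."""
    return old <= new
```

Here `is_extension(PRE_1980, BARTLETT_1982)` holds, so data encoded against the older graph remain valid under the newer one. That is the sense in which this change is evolutionary.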

16
An example of paradigm revolution
  • More recently, Brazelton et al. (2000) claimed
    that haemopoietic stem cells from adult bone
    marrow can develop into neural cells in adult
    mouse brain
  • If true, this result overthrows the paradigm that
    neuronal cells can only develop from embryonic
    ectoderm, requiring a new ontology incompatible
    with the old
  • This new ontology is no longer an extension of
    the previous one, since neural cells no longer
    develop only from foetal neuroepithelium
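
The difference from the evolutionary case can be sketched by making the word "only" explicit as a constraint. This is a hypothetical Python illustration, not an OWL encoding: the constraint table `ONLY_FROM`, the predicate name and the function `violations` are assumptions.

```python
# Exclusivity constraint of the old paradigm: the permitted origins of a term.
ONLY_FROM = {"neural cell": {"foetal neuroepithelium"}}

OLD_FACTS = {
    ("neural cell", "develops_from", "foetal neuroepithelium"),
}

# The claim attributed to Brazelton et al. (2000) on the slide above.
NEW_FACT = (
    "neural cell",
    "develops_from",
    "adult bone marrow haemopoietic stem cell",
)


def violations(facts, only_from):
    """Return the facts that contradict an 'only develops from' constraint."""
    return sorted(
        (s, p, o)
        for (s, p, o) in facts
        if p == "develops_from" and s in only_from and o not in only_from[s]
    )
```

Adding `NEW_FACT` to `OLD_FACTS` produces a violation, so the new finding cannot simply be appended. The constraint itself has to be withdrawn, which is why the new ontology is not an extension of the old one.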

17
A way forward using Named Graphs in RDF (and
OWL?)
  • In response to considerable frustration and
    confusion within the RDF community about the best
    method of reifying RDF statements, Jeremy Carroll
    et al. proposed an extension to RDF
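
The core idea of Named Graphs is that a set of statements is given a name, so that other statements can be made about the set as a whole. The sketch below models this with plain Python dictionaries and sets. It is not RDF syntax and not the Jena API; the graph names (`ex:pre1980`, `ex:brazelton2000`), the `source` predicate and the helper functions are hypothetical.

```python
# Each named graph is a (name, set of triples) pair.
NAMED_GRAPHS = {
    "ex:pre1980": {
        ("neural cell", "develops_only_from", "foetal neuroepithelium"),
    },
    "ex:brazelton2000": {
        ("neural cell", "develops_from",
         "adult bone marrow haemopoietic stem cell"),
    },
}

# Statements about the graphs themselves: the graph name is the subject.
PROVENANCE = {
    ("ex:pre1980", "source", "pre-1980 dominant paradigm"),
    ("ex:brazelton2000", "source", "Brazelton et al. (2000)"),
}


def graphs_asserting(triple):
    """Names of all graphs that contain the given triple."""
    return sorted(name for name, g in NAMED_GRAPHS.items() if triple in g)


def sources_for(triple):
    """Provenance recorded for every graph that asserts the triple."""
    names = graphs_asserting(triple)
    return sorted(o for (s, p, o) in PROVENANCE if p == "source" and s in names)
```

Under this arrangement, incompatible paradigms can sit side by side in one store, and a query can select which graph, and therefore whose view, it trusts.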

18
Thanks and acknowledgements
  • David Shotton and Simon Sparks for BioImage
    developments (http://www.bioimage.org)
  • John Pybus, our computer systems manager, for
    keeping us running in spite of the problems
  • Liz Mellings for unbounded patience inputting
    data and testing
  • The European Commission for funding the BioImage
    Project (EC IST 5th Framework Contract
    2001-32688, ORIEL: Online Research Information
    Environment for the Life Sciences,
    http://www.oriel.org)

19
End