Medical image retrieval: the medGIFT project

2005 Hôpitaux Universitaires de Genève

1
Medical image retrieval: the medGIFT project
  • I2R 7.11.2005

Henning Müller, Medical Informatics Service
2
Outline
  • Geneva hospitals and medical informatics
  • Medical image retrieval
  • Why, how, what?
  • The medGIFT retrieval framework
  • MRML, system integration, ...
  • Image pre-processing
  • Needs analysis of medical image users
  • Retrieval system evaluation
  • ImageCLEF benchmarking event
  • Future projects and conclusions

3
Hospitals and medical informatics
4
Geneva University Hospitals
  • 2,200 beds, 6 hospitals
  • 900 beds in the main clinic
  • 780,000 hospital days
  • 10,000 employees
  • 1,300 MDs
  • 22,000 operations per year
  • 35,000 images per day
  • 6,000 computers
  • Budget > 1 billion/year
  • Research and teaching have high importance
  • Geneva is strong in bioinformatics, genetics,
    neurosciences
  • Service for medical informatics - management
    informatics

5
Medical Informatics Service
  • 60 employees, part of radiology
  • vs. administrative informatics, part of central
    administration
  • 10 persons in research
  • Research areas
  • Multimedia electronic patient record, Decision
    support systems
  • Telemedicine, especially with African countries
  • Knowledge representation, natural language
    processing, data mining
  • Image processing, PACS, operation planning
  • Teaching
  • Postgraduate course in medical informatics
  • Virtual campus for medical students in medical
    informatics

6
Image Retrieval
7
Image retrieval
8
Content-based image retrieval
  • Based on visual features and visual queries
  • Query by image example, query by sketch, query by
    region
  • Visual features include color histograms, texture
    descriptors, shape descriptors, etc.
  • But query formulation is difficult
  • Page zero problem for query by example
  • Now match visual features and semantics, try
    object recognition of simple objects

9
A medical example
10
Global structure of retrieval systems
11
Medical image retrieval: Why?
  • Increasing variety and amount of imaging in
    medicine (diagnostics, treatment planning,
    follow-up, ...)
  • Hard to know everything extremely well
  • Currently, images are mainly accessed by patient
    ID, used in a single context
  • Much information stored in images and connected
    text
  • Little of this knowledge is exploited
  • Case-based reasoning and evidence-based medicine
    need tools to integrate visual data as well
  • Standardized methods, less dependent on MDs'
    personal experience

12
Medical image retrieval: How?
  • Create annotated datasets for real tasks such as
    diagnostic aid (administrative burdens)
  • To model expert knowledge
  • Infrastructures and database techniques exist
  • Web-based, ...
  • Visual features and classification/retrieval
    techniques need to be optimized based on the
    problem
  • Integrate all knowledge available for a case
  • Visual (several varied images), textual (release
    letter, etc.), numerical (lab results)
  • Include real users (feedback loops)

13
Medical image retrieval: What?
  • Application for teaching
  • Help lecturers to find images
  • Help students to browse catalogs (continuing
    education)
  • Replace books? Same environment as in the
    hospital
  • Application in research
  • Optimize case selection for studies
  • Include visual features into studies
  • Visual data mining, visual knowledge management
  • Application as diagnostic aid
  • In specialized domains
  • Automation of processes
  • DICOM header correction, annotation, choice of
    settings

14
medGIFT
15
The GIFT framework
  • GIFT: GNU Image Finding Tool
  • Open source, free of charge, Linux
  • Techniques from text retrieval
  • Framework of components to avoid the
    redevelopment of large parts for every project
  • Web-based interfaces
  • MRML: Multimedia Retrieval Markup Language
  • Features can be plugged in, parameterized
  • Feedback schemes
  • Pruning methods, to allow interactive search
  • medGIFT: adds utilities and integration into
    medical applications

16
Framework overview
17
medGIFT
  • http://www.sim.hcuge.ch/medgift/ (open source)
  • Project for content-based search in medical image
    databases
  • Goals of the project
  • Better management of visual medical data
    (retrieval)
  • Visual Knowledge Management
  • Textual and visual data
  • Diagnostic aid
  • Specialized retrieval (lung CTs, fractures,
    dermatologic images)
  • Access to PACS data
  • In the short term
  • Research, Teaching

18
Interface
Query image
Diagnosis
Link to casimage
Similarity score
19
Visual features
  • Global color histogram (HSV: 18 hues, 3
    saturations, 3 values, plus 4 grey levels)
  • Color blocks at different scales and locations
  • Histogram of Gabor filter responses
  • 4 directions, 3 scales, quantized into 10 strengths
  • Gabor blocks at different scales and locations
  • 85,000 possible features, 1,000-3,000 features
    per image, distribution similar to words in text
    collections
  • Roughly Zipf distribution
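
A minimal sketch of how a pixel could be mapped to one bin of such a global color histogram. It reads the slide's numbers as 18 hues x 3 saturations x 3 values plus 4 grey levels (166 bins). The saturation threshold that decides when a pixel counts as grey, and the function names, are assumptions made for illustration and are not taken from GIFT.

```python
import colorsys

HUES, SATS, VALS, GREYS = 18, 3, 3, 4
N_COLOR_BINS = HUES * SATS * VALS      # 162 color bins
N_BINS = N_COLOR_BINS + GREYS          # plus 4 grey bins = 166

def color_bin(r, g, b, grey_threshold=0.1):
    """Map an 8-bit RGB pixel to a histogram bin index.

    grey_threshold is a hypothetical cut-off: pixels with lower
    saturation are treated as grey and binned by brightness only.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if s < grey_threshold:
        return N_COLOR_BINS + min(int(v * GREYS), GREYS - 1)
    hue_idx = min(int(h * HUES), HUES - 1)
    sat_idx = min(int(s * SATS), SATS - 1)
    val_idx = min(int(v * VALS), VALS - 1)
    return (hue_idx * SATS + sat_idx) * VALS + val_idx

def color_histogram(pixels):
    """Count how many pixels fall into each of the 166 bins."""
    hist = [0] * N_BINS
    for r, g, b in pixels:
        hist[color_bin(r, g, b)] += 1
    return hist
```

Each non-empty bin can then be treated like a word occurring in a text document, which is what makes text retrieval techniques applicable.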

20
Weighting schemes
  • Classical tf/idf
  • tf - term frequency
  • cf - collection frequency
  • j - feature number
  • Q - query with i = 1..N input images
  • k - possible result image
  • R - Relevance of an image in a query
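
The formula shown on this slide did not survive in the transcript. The sketch below is one plausible reading of the listed symbols, stated as an assumption: each feature j gets a weight that averages tf over the N query images, scaled by their relevance R, and multiplied by a squared log of the inverse collection frequency; a result image k is scored by summing its tf times these weights. Function names and data layout are hypothetical.

```python
import math

def feature_weights(query_tfs, relevances, cf):
    """Weight per feature j for a query Q with N input images.

    query_tfs:  list of dicts {feature j: tf}, one dict per query image i
    relevances: list of R_i, e.g. +1 for positive, -1 for negative feedback
    cf:         dict {feature j: collection frequency in (0, 1]}
    """
    n = len(query_tfs)
    sums = {}
    for tf_i, r_i in zip(query_tfs, relevances):
        for j, tf in tf_i.items():
            sums[j] = sums.get(j, 0.0) + tf * r_i
    # Rare features (small cf) get a large idf-like factor.
    return {j: (s / n) * math.log(1.0 / cf[j]) ** 2 for j, s in sums.items()}

def score(image_tf, weights):
    """Score of a possible result image k against the weighted query."""
    return sum(tf * weights.get(j, 0.0) for j, tf in image_tf.items())
```

A feature present in every image of the collection (cf = 1) contributes nothing, which mirrors how stop words behave in text retrieval.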

21
Combination of visual and textual features
  • EasyIR text search engine, also open source
    (EPFL)
  • Frequency-based techniques similar to GIFT
  • Stemming and stop word removal to improve
    results, also for multilingual search
  • Mapping to MeSH terms delivers few terms reliably
    but high quality results
  • Linear combination of normalized results of text
    and visual system
  • Depending on the query the optimal factors are
    varying
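
A minimal sketch of such a linear combination, assuming each system returns a dictionary of image id to score. Normalizing by the maximum score and the default weight of 0.5 are illustrative choices, not values from the talk.

```python
def normalize(scores):
    """Scale scores to [0, 1] by dividing by the highest score."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {k: 0.0 for k in scores}
    return {k: v / top for k, v in scores.items()}

def combine(visual, textual, visual_weight=0.5):
    """Fuse normalized visual and textual scores linearly.

    Images missing from one result list contribute 0 for that system.
    Returns (image id, fused score) pairs, best first.
    """
    v, t = normalize(visual), normalize(textual)
    fused = {
        i: visual_weight * v.get(i, 0.0) + (1.0 - visual_weight) * t.get(i, 0.0)
        for i in set(v) | set(t)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Since the best factor varies by query, visual_weight would have to be tuned per query type rather than fixed globally.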

22
Relevance feedback
  • One-image queries rarely lead to very good
    results
  • Mainly false positives
  • Several input images improve the query quality
    enormously
  • Negative feedback is extremely important
  • Positive feedback often only reorders the
    highest-ranked results
  • But problems with too much negative feedback in
    many systems
  • Log files of web demo to analyze user behavior
  • Learning of feature weightings as an additional
    factor
  • Long-term learning from the user interaction
  • Changes of feature sets during feedback
  • First tests promise good results

23
Long-term learning
  • Learn automatically from user interaction on
    non-classified databases
  • Log files from past interaction are used to
    improve future results
  • Images marked together by users in the same query
    step are taken into account
  • Positive, negative, neutral
  • Images marked together have something in common
  • Learning can include several levels (same user,
    same database, same domain, ...)

24
Using this as additional factor for weighting
  • Learning on a feature basis, not on an image
    basis, is the goal
  • Positive and negative feature occurrences
  • Additional factor in the frequency-based
    weighting for each feature
  • With much feedback a pure probability approach
    might be possible, as well as on an image level
  • Results are improved significantly, although web
    demo does not have reliable users
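
A rough sketch of how a per-feature factor could be derived from logged feedback. The slides do not give the exact rule, so everything here is an assumption: features shared by two images marked positive in the same query step count as positive occurrences, features shared by a positive and a negative image count as negative occurrences, and the factor is a smoothed ratio of the two.

```python
from collections import Counter
from itertools import combinations

def learn_factors(log, image_features):
    """Derive a multiplicative factor per feature from past query steps.

    log:            list of query steps, each a dict {image id: mark},
                    with mark +1 (positive), -1 (negative), 0 (neutral)
    image_features: dict {image id: set of feature numbers}
    """
    pos, neg = Counter(), Counter()
    for step in log:
        for (a, mark_a), (b, mark_b) in combinations(step.items(), 2):
            shared = image_features[a] & image_features[b]
            if mark_a > 0 and mark_b > 0:
                pos.update(shared)      # both relevant: shared features helped
            elif mark_a * mark_b < 0:
                neg.update(shared)      # one relevant, one not: misleading
    # Hypothetical smoothing: factor 1.0 when nothing is known.
    return {f: (1 + pos[f]) / (1 + neg[f]) for f in set(pos) | set(neg)}
```

The resulting factor could be multiplied into the frequency-based weight of each feature; features never seen in the logs keep an implicit factor of 1.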

25
Casimage: a radiological case database
  • Case database for teaching
  • http://www.casimage.com/, interface developed
    with the proprietary 4D software
  • >65,000 images, 9,000 images externally
    accessible, 500 added per week
  • Case descriptions (textual) available in XML
  • Highly varying quality
  • Mix of French and English
  • Interface is compatible with the MIRC (Medical
    Image Resource Center) standard of the RSNA

26
GIFT/casimage
27
GIFT integration
  • medGIFT -> casimage
  • Simple link from image to case
  • Important to get info on images
  • Casimage -> medGIFT
  • Constraint: no change to a running routine
    application of the hospital
  • Simple button under an image with a link opening
    a new browser window
  • PHP interface traces address and downloads the
    images, then executes a query

28
Image pre-treatment
29
Lung segmentation
  • Concentrate visual search on an important region
    of the image

30
Lung block analysis and classification
  • Segmentation of the lung
  • Cutting of the lung into blocks
  • Feature extraction from blocks
  • Classification of blocks into several classes (8
    in our case)
  • Learning database containing 112 annotated
    regions (1000 blocks of size 32x32)
  • Features: co-occurrence matrices, Gabor filters,
    grey level histograms, ...
  • SVMs reach 84% accuracy healthy/non-healthy, 83%
    into 8 classes
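
The block-cutting and feature-extraction steps can be sketched as follows, using only a grey level histogram as the feature. The image and mask layout (lists of rows), the coverage rule, and the number of histogram bins are assumptions for illustration; the SVM classification step is left out.

```python
def cut_into_blocks(image, mask, size=32, min_coverage=1.0):
    """Cut the segmented lung into non-overlapping size x size blocks.

    image: list of rows of grey values
    mask:  same shape, 1 inside the segmented lung, 0 outside
    A block is kept only if at least min_coverage of it lies in the lung.
    """
    blocks = []
    rows, cols = len(image), len(image[0])
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            inside = sum(
                mask[y][x]
                for y in range(top, top + size)
                for x in range(left, left + size)
            )
            if inside / (size * size) >= min_coverage:
                blocks.append([row[left:left + size] for row in image[top:top + size]])
    return blocks

def grey_histogram(block, bins=8, max_value=255):
    """Normalized grey level histogram of one block."""
    hist = [0] * bins
    for row in block:
        for px in row:
            hist[min(px * bins // (max_value + 1), bins - 1)] += 1
    total = sum(hist)
    return [h / total for h in hist]
```

Each block's feature vector would then be passed to a classifier trained on the annotated regions.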

31
Problems with the classification
  • Several classes are extremely small
  • Unbalanced dataset
  • Healthy class is extremely heterogeneous
  • Sometimes the circle around the ROI is fairly
    large
  • Healthy tissue is incorrectly classified

32
Another problem: Noise around object
Hospital logo
Text in the images
Specific problems
Large regions with no information
33
Object extraction
  • Mostly small structures with high frequencies
  • Object in the center, one large connected
    component
  • Remove certain objects specifically (logo, grey
    square)
  • Remove small structures
  • Query only on the image object
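
A simplified sketch of the "one large connected component" idea: given a binary image in which 1 marks pixels that may belong to the object, keep only the largest 4-connected component and report its bounding box. The thresholding that produces the binary image, and the specific removal of logos or grey squares, are not shown; names are hypothetical.

```python
from collections import deque

def largest_component(binary):
    """Return the pixel list (y, x) of the largest 4-connected component."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for y in range(rows):
        for x in range(cols):
            if not binary[y][x] or seen[y][x]:
                continue
            component, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    return best

def bounding_box(component):
    """Bounding box (top, left, bottom, right) of a pixel list."""
    ys = [y for y, _ in component]
    xs = [x for _, x in component]
    return min(ys), min(xs), max(ys), max(xs)
```

Small structures such as text or isolated noise end up in their own small components and are dropped, so the query can be restricted to the bounding box of the main object.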

34
Object extraction steps
35
Object extraction examples
36
User needs
37
User needs
  • How to find out what the user really needs?
  • They will not tell you by themselves
  • Future use of images in medicine
  • HON (health on the net) media search
  • Log files from the web search engine
  • Mainly patients searching for information
  • Surveys among various medical professionals
  • Students, librarians
  • Clinicians, researchers, lecturers
  • Survey at OHSU and Geneva among 33 persons
  • Practical experiences when dealing with a PACS

38
Log file analysis of HONmedia search
  • http://www.hon.ch/HONmedia
  • 2000 searches per month
  • Preliminary results (Jan 2005)
  • More French than English (2/1), mainly 1-3 words
  • Mostly diagnosis and anatomic region, sometimes
    combined
  • Leukemia, tumeur glomique, fracture, ...
  • Many general questions
  • Childbirth, medical images, medical media, ...
  • Also XXX

39
Analysis of survey: Questions
  • For which tasks are images useful for you?
  • What type of images do you use for each task?
  • Where and how do you search for images?
  • How do you define whether an image found is
    relevant or not?
  • What kind of search would be useful for you?
  • Separately for the following areas: research,
    clinics, lecturer, student, librarian
  • 18 participants in Geneva, 15 in Portland (OHSU)
  • Mainly combined researcher/clinician/lecturer roles

40
Analysis of survey: first results
  • Tasks are extremely different depending on
    department, specific work, and experience
  • Mostly diagnostics and conference presentations
  • In diagnostics mainly radiographs and much CT,
    for research and teaching CTs and illustrations
  • Most search in the PACS, but frequently in
    Google, our teaching file, and on specialized
    pages
  • Relevance is defined by experience, problems on
    the web with bad resolution/quality
  • Most wanted a search by pathology added and the
    possibility to find similar cases to a current
    patient

41
Performance Evaluation
42
Overview image retrieval benchmarks
  • Birds-I, Benchathlon
  • SPIE Electronic Imaging
  • Personal proposals
  • C. Leung (IAPR benchmark), Koskela, ...
  • TRECVID (for videos)
  • ImageEval
  • French only
  • ImageCLEF
  • Cross Language Evaluation Forum
  • Four tasks in total, two medical tasks for image
    retrieval and classification

43
CLEF and ImageCLEF
  • Located at the Cross Language Evaluation Forum
    (CLEF)
  • Goal is to evaluate the retrieval of images
    through multi-lingual information retrieval
  • And not necessarily based on image information
  • 2003: a first image retrieval task with 4
    participants
  • Queries in different languages than the English
    collection annotation, image is part of the query
  • 2004: 17 participants for two tasks (200 runs)
  • Medical task for visual image retrieval added,
    where the query topic is an image only and the
    text is English/French mixed
  • Evaluation of interactive image retrieval
  • 2005: 24 participants for four tasks, >300 runs,
    36 registrations
  • Medical retrieval and classification tasks

44
ImageCLEF 2005 examples
Show me x-ray images with fractures of the femur.
Zeige mir Röntgenbilder mit Brüchen des Oberschenkelknochens.
Montre-moi des fractures du fémur.
Show me chest CT images with emphysema.
Zeige mir Lungen CTs mit einem Emphysem.
Montre-moi des CTs pulmonaires avec un emphysème.
Show me any photograph showing malignant melanoma.
Zeige mir Bilder bösartiger Melanome.
Montre-moi des images de mélanomes malignes.
45
ImageCLEF results
  • Resources: 50,000 images for retrieval and 10,000
    images for classification
  • Annotation in English/French/German
  • Query includes text and 1-3 images
  • 3 types of queries (visual, mixed, semantic)
  • Average results are better using text than
    images, best results are text + visual
  • 130 runs submitted, mostly mixed, little feedback
  • Best result: IPAL/I2R (MAP 0.2821)
  • Best visual MAP 0.1455 (I2R, but ...), best
    textual MAP 0.2084
  • Results vary extremely over queries
  • Classification task: 87.4% best rate for 57
    classes
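
For reference, the MAP (mean average precision) figures quoted above follow the standard definition, which can be sketched as below. The data layout is an assumption for illustration; the official evaluation tooling is not shown here.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list for one query topic.

    ranked:   list of image ids, best first
    relevant: set of image ids judged relevant for the topic
    """
    hits, total = 0, 0.0
    for rank, image_id in enumerate(ranked, start=1):
        if image_id in relevant:
            hits += 1
            total += hits / rank   # precision at this relevant hit
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Mean of the average precision over all query topics.

    runs: list of (ranked list, relevant set) pairs, one per topic
    """
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Because the score is averaged over topics, a system can reach a moderate MAP while still failing badly on individual queries, which is consistent with results varying strongly across queries.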

46
ImageCLEF 2006
  • Similar medical retrieval task
  • Medical automatic annotation with more complexity
  • >100 classes or even IRMA hierarchy
  • Completely new interactive task
  • Based on Flickr database
  • New ad-hoc retrieval task
  • IAPR database (>25,000 holiday images, three
    languages)
  • Visual and textual queries
  • New automatic annotation task
  • Based on LTU database (300 objects, >300 training
    images)

47
Future and Conclusions
48
Future work in Geneva
  • FNS project on lung image texture analysis
    (Talisman)
  • Creation of large annotated dataset
  • Definition of other necessary data for
    diagnostics
  • 3D texture analysis of lung slices
  • EU project AneurIST
  • Large project on data integration for aneurysm
    detection
  • Genetics data, lab values, organ data, patient
    data, ...
  • Images are an important part of the diagnostics
  • Multimodality important for analysis and
    retrieval
  • GRID technology as basis for storage and
    computation

49
Conclusions
  • Content-based medical image retrieval can become
    important in teaching, research and diagnostics
  • To use the inherently stored knowledge of images
  • Integration of various data sources and images is
    needed
  • More is needed than a technical solution
  • Users need to be included in the development
  • Hospitals need to work with computer science
    researchers (more communication)
  • Standardized evaluation is needed to identify
    promising techniques

50
Questions?