Title: A Computer-Aided Diagnosis System For Digital Mammograms Based on Radial Basis Functions and Feature Extraction Techniques
1 A Computer-Aided Diagnosis System For Digital
Mammograms Based on Radial Basis Functions and
Feature Extraction Techniques
- Dissertation written by
- Mohammed Jirari
- Proposal for Ph.D. Candidate Examination
- July 23rd, 2003
2 Why This Project?
- Breast cancer is the most common cancer and is
the second leading cause of cancer deaths.
- Mammographic screening reduces the mortality of
breast cancer.
- But mammography has a low positive predictive
value (PPV): only 35% have malignancies.
- The goal of Computer-Aided Diagnosis (CAD) is to
provide a second reading, hence reducing the
false positive rate.
3 What is a Mammogram?
- A mammogram is an x-ray image of the breast.
Mammography is the procedure used to generate a
mammogram.
- The equipment used to obtain a mammogram,
however, is very different from that used to
perform an x-ray of the chest or bones. The breast is
composed of tissues that are similar to each
other in density. Changes or abnormalities in the
breast tissue are often very subtle. Therefore,
the mammogram machines, film, and developing
process are specially designed to take pictures
of these subtle differences.
4 Mammograms (cont.)
- In order to get a good image, the breast must
also be flattened or compressed. This may be
uncomfortable, but it will not harm the breast in
any way and is extremely important for obtaining
a clear image. Compression of the breast is also
beneficial because it results in a lower dose of
radiation.
- In a standard examination, two images of each
breast are taken: one from the top (called a
cranio-caudal or CC view) and one from the side
(called a medio-lateral oblique or MLO view).
This ensures that the images display as much
breast tissue as possible.
5 Mammogram Examples
Mammogram of a left breast, cranio-caudal (from
the top) view
Mammogram of a left breast, medio-lateral oblique
(from the side) view
6 Purpose of CAD
- Mammography is the most reliable method for early
detection of breast cancer.
- But due to the high number of mammograms to be
read, the accuracy rate tends to decrease.
- Double reading of mammograms has been proven to
increase accuracy, but at high cost.
- CAD can assist the medical staff in achieving high
efficiency and effectiveness.
- The physician/radiologist makes the call, not the CAD system.
7 Proposed Method
- The proposed method will assist the physician by
providing a second opinion on reading the
mammogram, pointing out a suspicious area (if one
exists) delimited by its center coordinates and
its radius.
- If the two readings are similar, no more work
needs to be done.
- If they are different, the radiologist will look
at the mammogram one more time to make the final diagnosis.
8 Co-occurrence Matrices
- A co-occurrence matrix gives the joint probability
of occurrence of gray levels a and b for two pixels
with a defined spatial relationship in an image.
- The spatial relationship is defined in terms of
distance d and angle θ.
- From these matrices, a variety of features may be
extracted.
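A minimal Python sketch of how such a matrix could be built for a single distance and angle. The function name `cooccurrence_matrix`, the offset convention (rows grow downward, so 45° points up and to the right), and the choice to count each pixel pair in one direction only are assumptions made for illustration, not details taken from the slides.

```python
def cooccurrence_matrix(image, d, angle_deg, levels):
    """Return the normalized co-occurrence matrix P[a][b] of a gray-level image.

    image is a list of rows of integers in range(levels). P[a][b] estimates the
    probability that a pixel with gray level a has a neighbor with gray level b
    at distance d in direction angle_deg (one of 0, 45, 90, 135).
    """
    # (row, column) offsets per angle; rows grow downward (assumed convention).
    offsets = {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}
    dr, dc = offsets[angle_deg]
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        return [[0.0] * levels for _ in range(levels)]
    return [[n / total for n in row] for row in counts]


# Toy 3 x 3 image with 3 gray levels (hypothetical data).
image = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 1],
]
P = cooccurrence_matrix(image, d=1, angle_deg=0, levels=3)
```

Here the six horizontal neighbor pairs are (0,0), (0,1), (0,1), (1,1), (2,2) and (2,1), so `P[0][1]` is 2/6 and the entries of `P` sum to 1.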
9 Co-occurrence Matrices (cont.)
- In my project, the matrices are constructed at
distances d = 1 and d = 3 and for angles θ = 0°, 45°,
90°, 135°.
- For each matrix, eight features are extracted.
- Can be formally represented as follows
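The formula on this slide did not survive extraction. One common textbook way to write the co-occurrence matrix is given here as a stand-in; the symbols I (the image), C (the pair counts), P (the normalized matrix) and the displacement (Δx, Δy) are assumed notation, not necessarily the author's.

```latex
C_{d,\theta}(a,b) = \Bigl|\bigl\{\, (x,y) : I(x,y) = a,\ I(x+\Delta x,\ y+\Delta y) = b \,\bigr\}\Bigr|,
\qquad
P_{d,\theta}(a,b) = \frac{C_{d,\theta}(a,b)}{\sum_{a'}\sum_{b'} C_{d,\theta}(a',b')}

(\Delta x, \Delta y) =
\begin{cases}
(d,\ 0)  & \theta = 0^\circ \\
(d,\ d)  & \theta = 45^\circ \\
(0,\ d)  & \theta = 90^\circ \\
(-d,\ d) & \theta = 135^\circ
\end{cases}
```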
10 Features Used
- Energy or angular second moment
- Entropy
- Maximum Probability
- Inverse Difference moment
- ?2, ?1
11 Features Used (cont.)
- Contrast
- Homogeneity
- Inertia or variance
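The formulas for these features are missing from the extracted slides. The sketch below uses widely used textbook definitions for five of them, computed from a normalized co-occurrence matrix P. These definitions, the function name `texture_features`, and the natural-log entropy are assumptions, and the author's exact variants (for example the orders used in the inverse difference moment) may differ.

```python
import math


def texture_features(P):
    """Compute a few texture features from a normalized co-occurrence matrix P."""
    energy = 0.0
    entropy = 0.0
    max_probability = 0.0
    contrast = 0.0
    inverse_difference_moment = 0.0
    for a, row in enumerate(P):
        for b, p in enumerate(row):
            energy += p * p                      # angular second moment
            if p > 0:
                entropy -= p * math.log(p)       # natural log (assumed base)
            max_probability = max(max_probability, p)
            contrast += (a - b) ** 2 * p         # weights pairs far from the diagonal
            inverse_difference_moment += p / (1 + (a - b) ** 2)
    return {
        "energy": energy,
        "entropy": entropy,
        "max_probability": max_probability,
        "contrast": contrast,
        "inverse_difference_moment": inverse_difference_moment,
    }


# Hypothetical 2-level matrix whose entries sum to 1.
P = [[0.5, 0.25],
     [0.0, 0.25]]
features = texture_features(P)
```

For this toy matrix the energy is 0.375, the contrast is 0.25 and the inverse difference moment is 0.875.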
12 Features Used (cont.)
13 Radial Basis Network Used
- Radial basis networks may require more neurons
than standard feed-forward backpropagation (FFBP)
networks.
- But they can be designed in a fraction of the time
it takes to train an FFBP network.
- They work best with many training vectors.
14 Radial Basis Network with R Inputs
15 a = radbas(n)
16 A radial basis network consists of 2 layers: a
hidden radial basis layer of S1 neurons and an
output linear layer of S2 neurons
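The two-layer structure can be sketched as a forward pass in Python. This assumes radbas(n) = exp(-n^2) and a hidden-layer bias of sqrt(ln 2)/spread (about 0.8326/spread), which is the convention of the MATLAB Neural Network Toolbox function of the same name. The slides do not state the bias rule, and the helper `rbf_forward` and all weights here are hypothetical.

```python
import math


def radbas(n):
    """Radial basis transfer function: a = exp(-n^2)."""
    return math.exp(-n * n)


def rbf_forward(p, centers, spread, out_weights, out_biases):
    """Forward pass: S1 radial basis neurons followed by S2 linear neurons.

    p           input vector with R elements
    centers     S1 weight vectors (one center per hidden neuron)
    out_weights S2 rows of S1 weights for the linear layer
    """
    # Assumed bias: a hidden neuron outputs 0.5 when the input is `spread` away.
    b = math.sqrt(math.log(2)) / spread
    hidden = []
    for w in centers:
        distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(p, w)))
        hidden.append(radbas(distance * b))
    return [
        sum(w_i * h for w_i, h in zip(w_row, hidden)) + bias
        for w_row, bias in zip(out_weights, out_biases)
    ]


# Hypothetical network: R = 2 inputs, S1 = 2 hidden neurons, S2 = 1 output.
centers = [[0.0, 0.0], [1.0, 1.0]]
out_weights = [[1.0, 0.0]]
out_biases = [0.0]
at_center = rbf_forward([0.0, 0.0], centers, 0.25, out_weights, out_biases)
at_spread = rbf_forward([0.25, 0.0], centers, 0.25, out_weights, out_biases)
```

An input that sits exactly on a center drives that hidden neuron to 1, and an input one spread away drives it to 0.5, which is why the spread controls how far each neuron's influence reaches.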
17 Data Used in my Project
- The dataset used is the Mammographic Image
Analysis Society (MIAS) mini-MIAS database,
containing Medio-Lateral Oblique (MLO) views of
each breast for 161 patients, for a total of 322
images.
- Every image is 1024 x 1024 pixels with 256 gray levels.
18 Preprocessing
- In order to improve the quality of the images and
make feature extraction more reliable, 2
techniques were used:
- Cropping: cuts the black parts of the image
(almost 50%) based on a threshold.
- Enhancement: histogram equalization, to
accentuate the features to be extracted by
increasing the dynamic range of gray levels.
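The two preprocessing steps can be sketched in Python as follows. The slides only say that cropping is threshold-based, so dropping whole columns that contain no pixel at or above the threshold is an assumed rule, and the names `crop_dark_columns` and `equalize_histogram` are hypothetical.

```python
def crop_dark_columns(image, threshold):
    """Drop columns in which every pixel is below threshold (black background)."""
    keep = [c for c in range(len(image[0]))
            if any(row[c] >= threshold for row in image)]
    return [[row[c] for c in keep] for row in image]


def equalize_histogram(image, levels=256):
    """Map gray levels through the scaled cumulative histogram."""
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    cdf, running = [], 0
    for h in hist:
        running += h
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in image]
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[v] for v in row] for row in image]


# Toy image: left half is black background (hypothetical data).
image = [
    [0, 0, 100, 110],
    [0, 0, 105, 120],
]
cropped = crop_dark_columns(image, threshold=10)
enhanced = equalize_histogram(cropped)
```

On the toy image the cropped gray levels 100 to 120 are spread over the full 0 to 255 range, which is the increase in dynamic range the slide refers to.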
19 Preprocessing result
(a) Original mammogram; (b) after cropping; (c) after
cropping and histogram equalization
20 Feature extraction
- The extraction phase is needed so that the whole
image does not have to be fed as input to the neural
network. The method applied takes the whole
cropped image and calculates the co-occurrence
matrices at distances d = 1 and d = 3. The angles
used are θ = 0°, 45°, 90°, 135°, with a fifth
matrix being the mean of the 4 directions. For
each co-occurrence matrix, the eight statistical
features mentioned earlier are computed.
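The step of building four directional matrices plus their mean can be sketched as below. The helper names `glcm` and `directional_matrices`, the one-directional pair counting, and the element-wise average used for the fifth matrix are assumptions, since the slides do not say how the mean is formed.

```python
def glcm(image, dr, dc, levels):
    """Normalized co-occurrence matrix for the pixel offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r, row in enumerate(image):
        for c, a in enumerate(row):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < len(image) and 0 <= c2 < len(row):
                counts[a][image[r2][c2]] += 1
                total += 1
    return [[n / total if total else 0.0 for n in line] for line in counts]


def directional_matrices(image, d, levels):
    """Return matrices for 0, 45, 90, 135 degrees and their element-wise mean."""
    offsets = {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}
    matrices = {angle: glcm(image, dr, dc, levels)
                for angle, (dr, dc) in offsets.items()}
    matrices["mean"] = [
        [sum(matrices[angle][a][b] for angle in offsets) / len(offsets)
         for b in range(levels)]
        for a in range(levels)
    ]
    return matrices


# Toy 4 x 4 image with 3 gray levels (hypothetical data).
image = [
    [0, 0, 1, 1],
    [0, 1, 1, 2],
    [2, 2, 1, 0],
    [1, 0, 0, 2],
]
matrices = directional_matrices(image, d=1, levels=3)
```

Calling the function once with d = 1 and once with d = 3 would give the five matrices per distance described above, each of which is then reduced to a feature vector.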
21 Training
- After normalizing the data between 0 and 1 so that
the network inputs share a common range, the training
begins.
- The first training set was made up of 212
mammograms with 81 abnormal ones, with features
calculated at distances d = 1 and d = 3.
- The second training set was made up of 163
mammograms with 81 abnormal ones, with features
calculated at distances d = 1 and d = 3.
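The normalization step could look like the following Python sketch. It assumes per-feature min-max scaling over the training set, which is one common way to map data into [0, 1]; the slides do not say which scheme was used, and `min_max_normalize` is a hypothetical name.

```python
def min_max_normalize(vectors):
    """Rescale each feature (column) to the range [0, 1] across all vectors."""
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant feature carries no information; map it to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(vec, lows, highs)]
        for vec in vectors
    ]


# Three hypothetical feature vectors with two features each.
training = [
    [2.0, 10.0],
    [4.0, 10.0],
    [6.0, 30.0],
]
normalized = min_max_normalize(training)
```

The minimum and maximum found on the training set would also have to be reused when scaling mammograms presented at test time, so that both share the same range.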
22 Example of a network used
23 Testing
- A mammogram is presented to the trained network
and the output is a suspicious area denoted by
its center's x and y coordinates and its radius.
If the mammogram is considered to be normal, then
zeros are returned for the coordinates and
radius.
- The radiologist can then review the original
assessment of the patient if some areas uncovered
by the network were not originally looked at
closely.
- The whole database is tested and the accuracy is
calculated.
- The smaller dataset performed better than the
larger one, and the results for d = 3 were also
better than those for d = 1.
24 Results
- There were 2 training datasets: 163 and 212 mammograms.
- There were 2 distance measures: 1 and 3.
- There were 3 spreads: 0.1, 0.25, and 0.05.
- There were 3 goals: 0.00003, 0.008, and 0.00005.
- For a total of 12 possible combinations.
- The NN was sensitive to the unbalanced data
collection, which had about a 70-30 split in
the larger training set. Therefore the smaller
dataset was preferred.
- Achieving a high recognition rate is not that
appealing if the true positive fraction (TPF) is small.
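The point about recognition rate versus TPF can be illustrated with a short Python sketch. It assumes the usual definitions: TPF = TP / (TP + FN), FPF = FP / (FP + TN), and recognition = (TP + TN) / total. The counts are invented for illustration and are not taken from the experiments.

```python
def detection_rates(tp, fn, fp, tn):
    """Return (TPF, FPF, recognition rate) from confusion-matrix counts."""
    tpf = tp / (tp + fn)              # abnormal cases correctly flagged
    fpf = fp / (fp + tn)              # normal cases wrongly flagged
    recognition = (tp + tn) / (tp + fn + fp + tn)
    return tpf, fpf, recognition


# Hypothetical unbalanced set: 10 abnormal and 90 normal mammograms.
# A network that calls almost everything "normal" still looks accurate.
tpf, fpf, recognition = detection_rates(tp=1, fn=9, fp=0, tn=90)
```

With these made-up counts the recognition rate is 0.91 while the TPF is only 0.1, which is the situation the last bullet warns about.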
25 Representative Preliminary Results
               Net 1     Net 2     Net 3
TPF            0.01639   0.72973   0.88043
FPF            0.5939    0.0       0.3478
Recognition    0.3323    0.9068    0.7174
# of Neurons   133       91        151
26 Future work/plans
- Use more features like standard deviation,
skewness, kurtosis, ...
- Determine which feature(s) have the most impact.
- Rank the features from best to worst (single
input to NN).
- Select the most significant feature(s) by using
the leave-one-out method.
- Determine whether the area is benign or malignant
by adding the severity of the abnormality to the
training.
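The additional first-order features mentioned in the first bullet could be computed as in this Python sketch. It assumes population (not sample) moments and non-excess kurtosis, so a Gaussian would score 3; the slides do not specify either choice, and `gray_level_statistics` is a hypothetical name.

```python
import math


def gray_level_statistics(pixels):
    """Return (standard deviation, skewness, kurtosis) of a list of gray levels."""
    n = len(pixels)
    mean = sum(pixels) / n
    m2 = sum((v - mean) ** 2 for v in pixels) / n   # variance
    m3 = sum((v - mean) ** 3 for v in pixels) / n
    m4 = sum((v - mean) ** 4 for v in pixels) / n
    std = math.sqrt(m2)
    if std == 0:
        # Constant region: higher moments are undefined, report zeros.
        return 0.0, 0.0, 0.0
    return std, m3 / std ** 3, m4 / m2 ** 2


# Hypothetical symmetric sample of gray levels.
std, skewness, kurtosis = gray_level_statistics([1, 2, 3, 4, 5])
```

For this symmetric sample the skewness is 0 and the kurtosis is 1.7.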
27 Future work/plans (cont.)
- Try to reduce false negatives on the basis of
region characteristics: size, difference in
homogeneity and entropy.
- Use a larger database to train/learn, since most
commercial CADs use 100,000s of mammograms to try
to recognize foreign samples.
- Increase the recognition rate, aiming to diagnose with
100% accuracy, since saving human lives is at
stake. Reaching an 80% rate determines the credibility
of a CAD. This may or may not be reached when tested on
foreign mammograms, but valuable ideas can be gained
as to how to improve.
28 Future work/plans (cont.)
- Use segmentation of the breast from its background, as
it may make the feature extraction more accurate.
- May experiment with a multichannel wavelet
transform and a Kalman-filtering NN, since the wavelet
transform provides an efficient multiresolution
representation. I may also experiment with a
fuzzy neural CAD using a fuzzy detection
algorithm with a sliding window. Comparing the
results will be worth investigating.
- Will use some data mining techniques to unveil
new patterns/relationships among the presented
patients.