Title: PRELIMINARY ARTIFICIAL NEURAL NETWORK ANALYSIS OF SELDI MASS SPECTROMETRY DATA FOR THE CLASSIFICATIO
1PRELIMINARY ARTIFICIAL NEURAL NETWORK ANALYSIS OF
SELDI MASS SPECTROMETRY DATA FOR THE
CLASSIFICATION OF MELANOMA TISSUE
2Melanoma(1)
- Serious form of skin cancer.
- Begins in the melanocytes (skin pigment melanin).
- Accounts for just 4% of all skin cancer cases but
causes most skin cancer-related deaths.
- Incidence is increasing: in the US, the incidence
rate per 100,000 people per year has risen from
5.7 to 14.3 since 1973.
3Melanoma(2)
- Most commonly defined in four stages:
- Stage I: Presence of mole or growth on top layer
of skin.
- Stage II: Growth deeper but not spreading.
- Stage III: Spread to neighbouring tissue.
- Stage IV: Melanoma has spread to other, more
distant areas of the body.
4Melanoma(3)
- ABCD system can help tell a normal mole from one
that could be melanoma:
- A (Asymmetry): melanoma lesions are typically
asymmetrical.
- B (Border): melanoma lesions frequently have
uneven or irregular borders.
- C (Colour): melanoma lesions often contain
multiple shades of brown or black.
- D (Diameter): early melanoma lesions are often
more than 6 mm in diameter.
http://www.melanoma.com/diagnosing/look/
5Melanoma(4)
- If detected in its early stages, melanoma is
often treatable and curable.
- Treatment usually involves surgery followed by
one of the following:
- Chemotherapy
- Radiation therapy
- Immunotherapy
- Combination of all 3
6SELDI-MS(1)
- Surface Enhanced Laser Desorption Ionisation Mass
Spectrometry.
- Involves on-chip separation of complex mixtures
together with mass spectrometry.
- Able to rapidly analyse samples containing vast
numbers of proteins.
- Generates the patterns that these proteins produce.
- Shows differences between these patterns for
proteins expressed in different tissues, or in
tissues during different disease states.
7SELDI-MS(2)
- Sample is prepared with a molar excess
(1000:1) of a matrix compound.
- Matrix absorbs light at the wavelength of the
laser.
- Sample and matrix molecules are ejected into the
gas phase, where charge transfer occurs.
[Figure label: To mass spectrometer]
8SELDI-MS(3)
- The laser desorbs the proteins on the chip,
causing them to be launched as ions.
- The time-of-flight (TOF) of the ion before
detection by an electrode is a measure of the m/z
value of the ion.
- Peptides with a larger m/z move more slowly down
the flight tube and therefore have a longer TOF.
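The link between m/z and TOF can be illustrated with the idealised linear-TOF relation t = L * sqrt(m / (2 * z * e * U)), where L is the flight-tube length and U the accelerating potential. The sketch below is only an illustration: the 1 m tube length and 20 kV potential are assumed values, not instrument settings from this study.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg

def time_of_flight(mass_da, charge=1, tube_length_m=1.0, accel_voltage_v=20_000.0):
    """Idealised linear TOF: t = L * sqrt(m / (2 * z * e * U)).

    tube_length_m and accel_voltage_v are illustrative assumptions,
    not settings taken from the study.
    """
    mass_kg = mass_da * DALTON
    return tube_length_m * math.sqrt(mass_kg / (2 * charge * E_CHARGE * accel_voltage_v))

# Heavier singly charged ions arrive later.
for mass in (2_000, 10_000, 30_000):
    print(f"{mass:>6} Da -> {time_of_flight(mass) * 1e6:.2f} microseconds")
```

Because TOF scales with the square root of m/z, a fourfold increase in mass doubles the flight time under these assumptions.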
9SELDI-MS(4)
10SELDI-MS(5)
11SELDI-MS(6)
- Each mass/intensity spectrum is the average of 5 reads.
- Data exported for ANN analysis.
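As a minimal sketch of the averaging step, assuming every read is sampled on the same m/z axis (the slides do not say how reads are aligned), a point-by-point mean could look like this. The intensity values are made up.

```python
def average_spectrum(reads):
    """Point-by-point mean of several intensity reads of equal length."""
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must share the same m/z axis")
    return [sum(values) / len(reads) for values in zip(*reads)]

# Five made-up reads over three m/z points (illustrative values only).
reads = [
    [1.0, 4.0, 0.0],
    [2.0, 4.0, 0.0],
    [3.0, 4.0, 0.0],
    [4.0, 4.0, 0.0],
    [5.0, 4.0, 5.0],
]
averaged = average_spectrum(reads)
print(averaged)
```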
12Identification of potential biomarkers/therapeutic
targets
- The 2-30 kDa mass range consists of approx. 20,000
data points.
- Identification of ions/proteins that may have
clinical relevance requires a large sample size
(n = 50 at least).
- This suggests at least 1 million data points
(20,000 x 50) would have to be screened to find
potential markers.
- Essential to implement a bioinformatic approach
to cope with this mass of data.
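The screening burden follows directly from the two figures quoted above. A short check of the arithmetic:

```python
points_per_spectrum = 20_000   # approx. data points in the 2-30 kDa range
min_samples = 50               # minimum sample size quoted above

total_points = points_per_spectrum * min_samples
print(f"{total_points:,} data points to screen")
```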
13Artificial Neural Networks
- Recent years have shown an increasing application
of techniques such as ANNs to biological
problems, in particular cancer.
- Examples include prostate, cervical, lung,
ovarian and breast cancer.
- Other uses include:
- Prediction of rehospitalisation
- Progression of glaucoma
- Classification of bacterial growth
- Plant response to ozone
14Aims
- The identification of ions important in the
correct classification of tumour grade, which may
serve as potential biomarkers representative of a
specific disease state.
- To achieve this, a multi-layer perceptron (MLP)
ANN with a back propagation (BP) algorithm and 2
hidden nodes was used to model 72 melanoma
serum samples (36 stage I, 36 stage IV).
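To make the model type concrete, here is a minimal pure-Python sketch of an MLP with 2 hidden nodes trained by back propagation on squared error. It is not the software used in the study: the sigmoid activations, learning rate, weight initialisation and the synthetic two-feature data are all assumptions chosen for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyMLP:
    """One hidden layer of sigmoid units and a single sigmoid output."""

    def __init__(self, n_inputs, n_hidden=2, seed=0):
        rng = random.Random(seed)
        # Each weight row ends with a bias term.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.w_out[-1])
        return hidden, out

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr=0.5):
        """One back-propagation update on the squared error of one sample."""
        hidden, out = self.forward(x)
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas use the output weights before they are updated.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            self.w_out[j] -= lr * delta_out * h
        self.w_out[-1] -= lr * delta_out
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x):
                row[i] -= lr * delta_hidden[j] * v
            row[-1] -= lr * delta_hidden[j]
        return (out - target) ** 2

# Synthetic data: feature 0 separates the classes, feature 1 is noise.
rng = random.Random(1)
samples = [([rng.gauss(0.2, 0.05), rng.random()], 0) for _ in range(20)]
samples += [([rng.gauss(0.8, 0.05), rng.random()], 1) for _ in range(20)]

net = TinyMLP(n_inputs=2)
initial_mse = sum((net.predict(x) - y) ** 2 for x, y in samples) / len(samples)
for epoch in range(2000):
    rng.shuffle(samples)
    for x, y in samples:
        net.train_step(x, y)
final_mse = sum((net.predict(x) - y) ** 2 for x, y in samples) / len(samples)
accuracy = sum((net.predict(x) > 0.5) == bool(y) for x, y in samples) / len(samples)
print(f"MSE {initial_mse:.3f} -> {final_mse:.3f}, training accuracy {accuracy:.2f}")
```

In the study the inputs would be ion intensities and the target the tumour stage. The toy data here only shows that the update rule reduces the error.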
15Parameterisation of models(1)
- Allows us to determine the importance of the
inputs in a model.
- Eliminates unimportant inputs, removing noisy
data and reducing model complexity.
- Data split into blocks, with each block being
used in a separate model.
- Data blocks:
- 2-5, 2.5-5.5, 3-6, 3.5-6.5, 4-7, 4.5-7.5, 5-8,
5.5-8.5, 6-9, 6.5-9.5, ..., 27-30 kDa.
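The listed blocks look like 3 kDa windows advanced in 0.5 kDa steps. Assuming that pattern holds across the elided part of the list (an inference from the blocks shown, not something the slide states), the windows could be generated as follows.

```python
def mass_blocks(start=2.0, stop=30.0, width=3.0, step=0.5):
    """Return (low, high) kDa windows of fixed width, advanced by `step`.

    The width and step defaults are inferred from the blocks listed on the slide.
    """
    n_blocks = int(round((stop - width - start) / step)) + 1
    return [(start + i * step, start + i * step + width) for i in range(n_blocks)]

blocks = mass_blocks()
print(len(blocks), blocks[:3], blocks[-1])
```

Overlapping windows mean each data point is seen by several block models, so an informative ion is less likely to be missed because of where a block boundary falls.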
16Parameterisation of models(2)
- Blocks trained over 50 random training/test/production
subsets (bootstrapping, giving a high level of
confidence).
- During training, the ANN model is optimised against
the test set, then validated against the production
(validation) set.
- Model convergence is determined by a failure of
the model to improve the mean squared error (MSE)
of the test data for 20,000 training events.
- Relative importance values for each input are
recorded and used for initial screening.
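The resampling and stopping rule can be sketched as below. The 60/20/20 split fractions are an assumption, since the slides give no proportions, and `random_split` and `has_converged` are hypothetical helper names rather than functions from the software used.

```python
import random

def random_split(sample_ids, fractions=(0.6, 0.2, 0.2), rng=None):
    """Shuffle ids and cut them into training/test/production subsets.

    The 60/20/20 fractions are an assumption; the source gives no proportions.
    """
    rng = rng or random.Random()
    ids = list(sample_ids)
    rng.shuffle(ids)
    n_train = int(len(ids) * fractions[0])
    n_test = int(len(ids) * fractions[1])
    return ids[:n_train], ids[n_train:n_train + n_test], ids[n_train + n_test:]

def has_converged(test_mse_history, patience=20_000):
    """True once the best test MSE is at least `patience` events old."""
    if not test_mse_history:
        return False
    best_index = test_mse_history.index(min(test_mse_history))
    return len(test_mse_history) - 1 - best_index >= patience

# 50 random splits of the 72 samples, as described above.
rng = random.Random(42)
splits = [random_split(range(72), rng=rng) for _ in range(50)]
train, test, production = splits[0]
print(len(train), len(test), len(production))
```

Keeping the production subset out of both weight updates and the stopping decision is what lets it act as unseen data when the model is validated.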
17Parameterisation results (2-10 kDa)
18Parameterisation results (10-20 kDa)
19Parameterisation results (20-30 kDa)
20Parameterisation of models(3)
- This relative importance analysis was used to
reduce the number of inputs in the model.
- The top 1,000 ions of greatest relative
importance were selected, and the training process
repeated.
- The same process was used to determine the top 500,
300, 200, 100, 50, 30 and finally top 20 ions from
the initial set of 20,000.
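The staged reduction can be sketched as a loop over the cut-offs listed above. Here `score_fn` is a hypothetical callback standing in for ANN training plus relative-importance extraction. The toy scorer carries no real information and only demonstrates the control flow.

```python
SCHEDULE = (1000, 500, 300, 200, 100, 50, 30, 20)

def top_k_ions(importance, k):
    """Return the k ion labels with the highest relative importance."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:k]

def reduce_inputs(ions, score_fn, schedule=SCHEDULE):
    """Retrain (via score_fn) at each stage and keep only the top-k ions.

    score_fn is a hypothetical stand-in for ANN training; it must map a
    list of ions to a dict {ion: relative importance}.
    """
    current = list(ions)
    for k in schedule:
        importance = score_fn(current)
        current = top_k_ions(importance, k)
    return current

def toy_score(ions):
    # Toy stand-in: importance simply equals the ion index (no real training).
    return {ion: float(ion) for ion in ions}

final = reduce_inputs(range(20_000), toy_score)
print(final)
```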
21Relative Importance of top 1000 ions
22Additive approach(1)
- Aim: to identify the minimum number of ions from
the top 20 capable of accurately predicting
tumour grade.
- Each ion was taken in turn and used as the single
input to a model (creation of 1-ion models).
- 100 training/test/production subsets of each
model were used so that all tumour samples were
treated as unseen a number of times.
23Additive approach(2)
- MSE was calculated and the ion with the lowest
error was selected for further training.
- Each of the remaining 19 ions was added in turn to
this ion, creating 2-ion models, which were
trained as before.
- The process was repeated to create 3-ion and 4-ion
models.
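The additive approach is a greedy forward selection. A minimal sketch follows, in which `evaluate_mse` is a hypothetical callback standing in for training an ANN on the given ions over the resampled subsets and returning its mean error. The ion names and "usefulness" values are invented for the demonstration.

```python
def additive_selection(candidates, evaluate_mse, max_ions=4):
    """Greedy forward selection: at each step add the ion giving the lowest MSE.

    evaluate_mse is a hypothetical stand-in for ANN training and scoring.
    """
    selected, history = [], []
    remaining = list(candidates)
    while remaining and len(selected) < max_ions:
        scored = [(evaluate_mse(selected + [ion]), ion) for ion in remaining]
        best_mse, best_ion = min(scored)
        selected.append(best_ion)
        remaining.remove(best_ion)
        history.append((tuple(selected), best_mse))
    return selected, history

# Invented example: each ion contributes a fixed amount of signal.
usefulness = {"ion_a": 0.1, "ion_b": 0.4, "ion_c": 0.3, "ion_d": 0.05, "ion_e": 0.2}

def toy_mse(ions):
    return 1.0 / (1.0 + sum(usefulness[i] for i in ions))

selected, history = additive_selection(usefulness, toy_mse)
for ions, mse in history:
    print(len(ions), "ion model:", ions, f"MSE={mse:.3f}")
```

Greedy selection evaluates far fewer combinations than an exhaustive search, but it can miss sets of ions that are only informative together.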
24Results Top 20 ions
25Results
26Receiver Operating Characteristic (ROC) curves
- Plots the true positive rate (sensitivity)
against the false positive rate (1 - specificity)
at different possible prediction thresholds.
- Used in this study to assess model performance.
- Area under the curve (AUC) measures model
performance.
- A perfect test has an AUC of 1.
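A ROC curve and its AUC can be computed directly from class labels and model outputs. The sketch below uses the trapezoidal rule on made-up scores and is not the analysis software used in the study.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0]              # 1 = stage IV, 0 = stage I (made-up data)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # made-up model outputs
area = auc(roc_points(labels, scores))
print(f"AUC = {area:.3f}")
```

The AUC equals the fraction of positive/negative pairs in which the positive sample receives the higher score, which is 8 of 9 pairs in this toy example.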
27ROC curve results
AUC results:
- 1-ion model: 0.574 (poor)
- 2-ion model: 0.748 (fair)
- 3-ion model: 0.809 (good)
- 4-ion model: 0.854 (good)
28Comparison of models unseen data
29Summary
- Parameterisation of models
- Identifies importance of inputs in a model
- Removes noisy data from system
- Reduces model complexity
- Top 20 ions (from initial 20,000) of importance
identified.
- Additive approach employed to create a 4-ion
model which predicts tumour grade with >80%
accuracy.
30Conclusions
- By combining ANNs and SELDI-MS, essential ions
involved in the classification of tumour grade
can be found.
- These models are currently being developed
further to deduce how many ions the optimal model
contains for this data set.
31Future Work
- Developing methods for the analysis of
interactions between ions/proteins within the
system.
- Sequencing of important proteins.
- These may have clinical relevance, important in
establishing diagnostic markers.
32Acknowledgements
- The Nottingham Trent University
- Dr. Graham Ball
- Dr. Shahid Mian
- Prof. Robert Rees
- Universitätsklinikum Mannheim
- Prof. Dirk Schadendorf
- Fifth Framework Programme