Title: Using the Permutation Test to Analyze Lung Cancer SAGE Libraries
Timothy H. W. Chan, Calum MacAulay, Wan Lam, Stephen Lam, Kim Lonergan, Steven Jones, Marco Marra, Raymond T. Ng
Department of Computer Science, University of British Columbia
The British Columbia Cancer Research Centre
BACKGROUND
- We previously analyzed publicly available breast and brain SAGE libraries using the permutation test (Ng et al., Frontiers of Cardiovascular Science 2003) and had some success (60 of the top-ranked genes for the breast SAGE data were verified to be related to the neoplastic process).
- The BC Cancer Research Centre has produced various lung cancer SAGE libraries, including 5 CIS (carcinoma in situ), 6 invasive, and 17 normal libraries.
- It would be interesting to use the permutation test to compare and contrast the stages of lung cancer and to search for small transcriptional changes (pathway regulators, checkpoints, switches).
RESULTS
Permutation Test
- 1,981 of 32,871 tags considered at 99% confidence failed the permutation test for Normal vs Invasive lung cancer, that is, they were flagged as differentially expressed.
- 1,887 of 40,476 tags considered at 99% confidence failed the permutation test for Normal vs CIS lung cancer.
- 119 of 20,077 tags considered failed the permutation test for CIS vs Invasive lung cancer.
Verification Results
99% confidence - Output
- The quality of these genes depends mostly on criteria A and B. Criteria C and D follow closely, as they concern genes that are important in the neoplastic process.
- Hypothetical genes, or genes with no known function, did not meet any of the criteria.
- Indicates that there is a duplicate (more than one tag matches the same gene).
OBJECTIVES
- To use the permutation test on SAGE libraries from normal tissue and from different stages of lung cancer (CIS and invasive) to discover candidate cancer-related genes.
- To compare and contrast these two stages of lung cancer.
- To demonstrate the advantages and power the permutation test holds over the t-test.
METHODOLOGY
Data Pre-Processing
1. Gene-to-Tag Assignment
- Some tags map to more than one gene. To deal with
this, the expression level of the tag is assigned
to each gene the tag maps to. For instance, if
tag A maps to genes 1, 2, and 3, all the genes
will be assigned the tag count of tag A.
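The assignment rule above can be illustrated with a short Python sketch. The function name, the dictionary layout, and the example tags and genes are hypothetical and chosen only for illustration; the poster does not describe its actual data structures. Records are kept per tag rather than summed per gene, on the assumption that duplicates (several tags matching one gene) stay visible.

```python
def assign_tag_counts(tag_counts, tag_to_genes):
    """Return (gene, tag, count) records, one per gene a tag maps to.

    tag_counts:   dict mapping tag -> observed count in one library
    tag_to_genes: dict mapping tag -> list of genes the tag maps to
    (Both layouts are assumptions made for this sketch.)
    """
    records = []
    for tag, count in tag_counts.items():
        for gene in tag_to_genes.get(tag, []):
            # A tag that maps to several genes gives its full count to each one.
            records.append((gene, tag, count))
    return records


# Illustrative data: tag A maps to three genes, tag B to one of the same genes.
tag_counts = {"tagA": 12, "tagB": 5}
tag_to_genes = {"tagA": ["gene1", "gene2", "gene3"], "tagB": ["gene3"]}
records = assign_tag_counts(tag_counts, tag_to_genes)
```

Here gene3 appears twice in `records`, once for each tag, which corresponds to the duplicate case noted in the verification results.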
Intersections of Top Ranked Genes Between the
Inv. vs Norm. and CIS vs Norm. Results
- The null hypothesis states that there is no difference between the means of the normal and cancer samples. If this were the case, it would make no difference if we mixed up the labels of the libraries.
- The alternative hypothesis states that it does make a difference: the means of the normal and cancer samples are different.
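The label-shuffling idea behind these hypotheses can be sketched in Python as follows. This is a minimal Monte Carlo version under stated assumptions: the test statistic is the absolute difference of group means, the function name and parameters are hypothetical, and the example counts are made up. The poster does not define how its permutation score is computed, so the p-value returned here is not necessarily that score.

```python
import random
from statistics import mean


def permutation_p_value(normal, cancer, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(mean(normal) - mean(cancer))
    pooled = list(normal) + list(cancer)
    n_normal = len(normal)
    hits = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the labels are exchangeable, so any
        # reassignment of libraries to the two groups is equally likely.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_normal]) - mean(pooled[n_normal:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up normalized counts for one tag across normal and cancer libraries.
normal_counts = [10, 12, 11, 9, 10, 11]
cancer_counts = [30, 28, 35, 31, 29]
p_value = permutation_p_value(normal_counts, cancer_counts, n_permutations=2000)
```

Under the 99% confidence level quoted in the results, a tag would be flagged when its estimated p-value falls below 0.01.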
2. Normalizing the Libraries
- The low intersections suggest that CIS and
Invasive stages of cancer are different.
- To reduce comparison errors, the tag frequencies are normalized by scaling each library to a total of 300,000 tags.
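As a rough illustration of this scaling step, the sketch below rescales one library so that its counts sum to 300,000. The function name and the dictionary layout are assumptions for this example, and the toy library is made up.

```python
def normalize_library(tag_counts, target_total=300_000):
    """Scale tag counts so the library total equals target_total."""
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    scale = target_total / total
    # Every tag is multiplied by the same factor, so relative
    # frequencies within the library are preserved.
    return {tag: count * scale for tag, count in tag_counts.items()}


# Toy library with 100 tags in total.
library = {"tagA": 30, "tagB": 50, "tagC": 20}
normalized = normalize_library(library)
```

After scaling, counts from libraries of different sequencing depth can be compared on the same footing.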
Scoring and Ranking Genes
- Higher permutation scores correspond to either a greater difference between the two samples or a more consistent difference between them.
- For each tissue, rank the significant genes by sorting their permutation scores in descending order.
Power of The Permutation Test
- With the permutation test, the number of samples required for the test to be acceptable is relatively low compared to other statistical tests (e.g., t-test, chi-square).
CONCLUSION
- The permutation test is effective at picking out genes that are related to the neoplastic process.
- It is also much better at picking out these genes than the t-test.
- The permutation test between Invasive and CIS shows that 119 tags are differentially expressed, which suggests that the two stages of cancer have different genes turned on or off. In addition, the intersection of the top-ranked genes for Normal vs Invasive and Normal vs CIS is quite low (only 25 of the top 200 tags intersect), which also suggests differences between the two stages.
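An overlap figure of this kind could be computed along the following lines: rank each comparison's tags by permutation score in descending order, keep the top k, and intersect the two lists. The function names and the scores below are hypothetical and purely illustrative; they are not the poster's data.

```python
def top_ranked(scores, k):
    """Return the k tags with the highest scores, best first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def top_k_overlap(scores_a, scores_b, k):
    """Return the tags that appear in the top k of both rankings."""
    return set(top_ranked(scores_a, k)) & set(top_ranked(scores_b, k))


# Made-up permutation scores for two comparisons.
invasive_vs_normal = {"tag1": 9.1, "tag2": 7.4, "tag3": 3.2, "tag4": 1.0}
cis_vs_normal = {"tag2": 8.8, "tag5": 6.0, "tag1": 2.5, "tag3": 0.4}
shared = top_k_overlap(invasive_vs_normal, cis_vs_normal, k=2)
```

With k set to 200, the size of the returned set would play the role of the 25 shared tags reported above.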
Literature Verification
- The top-ranked genes are investigated for their relation to cancer using the literature currently available on PubMed.
Verification Criteria
FUTURE DIRECTIONS
- Continue to use the permutation test to analyze other SAGE libraries.
- The permutation test also has the power to detect small transcriptional changes, as long as the gene has a consistent tag count across all the libraries. Further analysis of these significant genes with low tag counts (and high permutation scores) is required, as they could be vital pathway regulators, checkpoints, or switches that may have led to the onset of lung cancer.
- Validate genes further by experimentation.
- Use validated genes for early cancer detection or derive new treatments from the data.