NetRaVE: A novel method for constructing gene networks from microarray data

1
NetRaVE: A novel method for constructing gene
networks from microarray data
Bill Wilson, CSIRO Mathematics and Information
Sciences, North Ryde, Sydney
www.csiro.au
2
Microarray data analysis
  • The usual analysis!
  • Differential gene expression
  • List of Top100 genes
  • Lacks any connections or structure
  • Where to start?
  • Stop looking at favourite genes!!

3
Microarray data analysis
  • Current analysis paradigm
  • Gene expression values are treated as a response
    to a condition, not a predictor.
  • GeneRaVE analysis
  • Use gene expression as a predictor of a response.
  • Analyse gene expression values to find a small
    set of genes predictive of a response.
  • Extend this to find a local network of genes
    close to this response.

4
GeneRaVE in action
  • Properties of the algorithm
  • A supervised approach that requires no
    preselection of variables.
  • Parameter estimation and variable selection
    occur simultaneously.
  • The significance of our results is assessed by
    randomly permuting the sample class labels and
    rebuilding the model: what is the probability of
    getting the same results by chance?
  • V-fold cross validation of results is also used,
    to obtain an estimate of overall prediction
    error, and to ensure no overfitting of our model.
  • (build model on v-1 groups, predict the group
    left out)
  • Used when an independent test set is not
    available, which is usually the case.
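The two checks above (V-fold cross validation and label permutation) can be sketched in a few lines. This is only an illustration under stated assumptions: a nearest-centroid classifier stands in for the GeneRaVE model, which these slides do not specify, and the function names, toy data, and parameter values are all hypothetical.

```python
import random

def nearest_centroid_predict(train_x, train_y, test_x):
    """Assign each test sample to the class whose mean profile is closest.
    (Stand-in classifier; NOT the GeneRaVE model.)"""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(sorted(centroids), key=lambda c: dist(x, centroids[c]))
            for x in test_x]

def vfold_error(x, y, v=5, seed=0):
    """Build the model on v-1 groups, predict the group left out,
    and return the overall misclassification rate."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::v] for i in range(v)]
    wrong = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        preds = nearest_centroid_predict([x[i] for i in train],
                                         [y[i] for i in train],
                                         [x[i] for i in held_out])
        wrong += sum(p != y[i] for p, i in zip(preds, held_out))
    return wrong / len(y)

def permutation_p_value(x, y, n_perm=200, v=5, seed=0):
    """Permute the class labels, rebuild the model each time, and count how
    often the permuted error is at least as good as the observed error."""
    rng = random.Random(seed)
    observed = vfold_error(x, y, v=v, seed=seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        if vfold_error(x, shuffled, v=v, seed=seed) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy data: 20 samples per class, two well-separated "genes" (hypothetical).
toy_rng = random.Random(1)
x = ([[toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)] for _ in range(20)] +
     [[toy_rng.gauss(8, 1), toy_rng.gauss(8, 1)] for _ in range(20)])
y = [0] * 20 + [1] * 20
err, p = permutation_p_value(x, y)
print("cross-validated error:", err, "permutation p-value:", p)
```

A small cross-validated error together with a small permutation p-value suggests the classifier is picking up real structure and not a chance pattern in the labels.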

5
Network building method
  • Identify genes predictive of our response
  • Regressions (linear or trees) are carried out
    using the expression of the classifier genes as
    the response; the predictors are the expression
    levels of the remaining genes.
  • Repeat this process using the genes selected in
    the regression as the response variables in the
    next round.
  • Cross-validate to prevent overfitting. Tree
    pruning!
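The round-by-round procedure above can be sketched as follows. This is a minimal sketch under stated assumptions, not the NetRaVE implementation: a simple correlation threshold stands in for the sparse linear or tree regression, cross validation is omitted, each gene is allowed to enter the network only once, and the gene names, expression values, and the parameters `max_genes` and `min_abs_r` are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def select_predictors(expr, response, candidates, max_genes=2, min_abs_r=0.8):
    """Stand-in for a sparse regression: keep the few candidate genes whose
    expression is most strongly correlated with the response gene."""
    scored = [(abs(pearson(expr[g], expr[response])), g) for g in candidates]
    scored = [(r, g) for r, g in scored if r >= min_abs_r]
    return [g for r, g in sorted(scored, reverse=True)[:max_genes]]

def build_network(expr, seed_genes, rounds=2, **kwargs):
    """Start from the classifier genes; in each round, the genes selected as
    predictors become the response variables of the next round."""
    edges = []                      # (predictor, response) pairs
    in_network = set(seed_genes)
    responses = list(seed_genes)
    for _ in range(rounds):
        next_responses = []
        for response in responses:
            candidates = [g for g in expr if g not in in_network]
            for g in select_predictors(expr, response, candidates, **kwargs):
                edges.append((g, response))
                in_network.add(g)
                next_responses.append(g)
        responses = next_responses
    return edges

# Hypothetical expression values for four genes across six samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],   # classifier gene
    "geneB": [1.1, 1.9, 3.2, 3.8, 5.1, 6.0],
    "geneC": [1.5, 1.5, 3.5, 3.5, 5.5, 5.5],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],   # unrelated noise
}
network = build_network(expr, ["geneA"], rounds=2, max_genes=1)
print(network)
```

With these toy values, geneB is selected as the predictor of geneA in the first round, and geneC as the predictor of geneB in the second; geneD never enters the network.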

6
A gene regulatory network??
7
The Smoking Data
Two classes: Smokers (34), Never-smokers (23); 57
samples
Affymetrix U133A array: 22,283 genes
(probesets)
8
The Smoking Data - classifiers
Can classify the two classes with the
expression of 3 genes. Cross-validated error rate:
0.000!
  • CYP1B1 cytochrome P450, family 1, subfamily B,
    polypeptide 1
  • (xenobiotic metabolism)
  • CEACAM5 carcinoembryonic antigen-related cell
    adhesion molecule 5
  • ALDH3A1 aldehyde dehydrogenase 3 family,
    member A1
  • (protects against oxidative damage)

9
The Smoking Data - Local Gene Network
  • rma (quantile)
  • linear

10
The Smoking Data - Local Gene Network
11
The Smoking Data - Local Gene Network
  • Metabolism (xenobiotics / hormones)
  • CYP1B1, CYP1A1: expression activated by PAHs and
    cigarette smoke; catabolise estrogen.
  • AKR1B10: catalyses the reduction of carbonyl
    group-containing xenobiotics.
  • AKR1C1: involved in progesterone catabolism.
  • MUC5AC: respiratory mucin. Upregulated in
    response to inflammation, pathogenic factors.
  • MUC5B: salivary mucin.
  • Immediate-early genes: regulators of cell
    proliferation, differentiation, and
    transformation
  • EGR1 (interacts physically with Jun)
  • FOS, FOSB (all regulated by the JNK pathway,
    which is activated by oxidative stress)
  • TOB1: suppresses cell growth, antiproliferative.
  • ATF3: regulated by EGR1.

12
The Smoking Data - Local Gene Network
  • Immune system genes
  • HLA-DQA1: plays a central role in the immune
    system by presenting peptides derived from
    extracellular proteins.
  • CXCL1: chemokine.
  • C3: complement component.

CEACAM5: carcinoembryonic antigen-related cell
adhesion molecule 5, prognostic for colorectal cancer.
13
The Smoking Data - recent publications
Recent publications showing relationships
detected in our network analysis - a form of
validation:
Penning TM. AKR1B10: a new diagnostic marker of
non-small cell lung carcinoma in smokers. Clin
Cancer Res. 2005 Mar 1;11(5):1687-90.
Bottone FG Jr, Moon Y, Alston-Mills B, Eling TE.
Transcriptional regulation of activating
transcription factor 3 involves the early growth
response-1 gene. J Pharmacol Exp Ther. 2005
Nov;315(2):668-77.
Baginski TK, Dabbagh K, Satjawatcharaphong C,
Swinney DC. Cigarette smoke synergistically
enhances respiratory mucin induction by
pro-inflammatory stimuli. Am J Respir Cell Mol
Biol. 2006 Mar 16.
14
St Jude Leukemia Data
5 classes of leukemia (BCR_ABL, E2A_PBX1, MLL,
TEL_AML1, T_ALL) plus others
104 samples
Affymetrix U133A/B arrays: 44,000 genes (probesets)
  • Can classify the different subtypes with the
    expression of 6 genes.
  • Cross-validated error rate: 0.048
  • PBX1
  • SHCD1A
  • PCLO
  • C20ORF103
  • REDD2
  • DNAPTP6

15
St Jude Leukemia Data (500,000 probes)
5 probes identified. Cross-validated error rate:
0.039
16
[Figure: gene network for the leukemia data. Node
labels: ZBTB34, HIP1R, IGF-IImRNABP, ELK3, SLC27A2,
INTRON of SLIC1, ZFHX1B, CCT2, KCNN1, ABTB1, PLCE1,
PBX1, SCHIP2, BCL2, FLJ20313, GCSH, PCLO, Serpina6,
REDD2, ZNF258, Y, FLHSD2, HLA DRB4, ALU seq, SHCD1A,
HLA DPB1, PKC?, IGKC, C20orf103, DNAPTP6, TcRbVar,
HLA DRB3, FBXW7, DCTN4, ABCC1, HLA DRB1, PKC?,
Galectin, MRP036, HLA DQB1, AP3M1, HTLF, MRP621]
17
[Figure: the leukemia gene network repeated, with
the same node labels as on the previous slide]
18
Network edges
The expression of DCTN4 predicts the expression
we observe for DNAPTP6, and the expression of
DNAPTP6 in turn predicts the expression of the
leukemia variable.
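Read this way, each edge points from a predictor to the variable it predicts, so a chain of edges leads from a gene to the response. A minimal sketch of that reading, using only the two edges named on this slide (the function name `paths_to_response` is hypothetical):

```python
def paths_to_response(edges, start, response):
    """Return every directed path from `start` to `response`,
    following predictor -> predicted edges."""
    targets = {}
    for predictor, target in edges:
        targets.setdefault(predictor, []).append(target)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == response:
            paths.append(path)
            continue
        for nxt in targets.get(node, []):
            if nxt not in path:          # avoid cycles
                stack.append(path + [nxt])
    return paths

# The two edges described on this slide.
edges = [("DCTN4", "DNAPTP6"), ("DNAPTP6", "LEUKEMIA")]
print(paths_to_response(edges, "DCTN4", "LEUKEMIA"))
# prints [['DCTN4', 'DNAPTP6', 'LEUKEMIA']]
```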
19
[Figure: the leukemia gene network with the response
node labelled LEUKEMIA and functional groups marked:
Cell cycle, Protein degradation, Immune system]
20
[Figure: the leukemia gene network with the response
node labelled LEUKEMIA and functional groups marked:
Cell cycle, Protein degradation, Immune system]
The identity of a few interesting genes in the
analysis
21
Hypothesis to the lab!
  • PKCη: Protein Kinase C, eta. Regulates
    transcription factors ... expression is highly
    correlated with tumour progression in renal cell
    carcinoma.
  • C20orf103: unknown protein. Highly conserved in
    Human, Mouse, Rat, Fish, Chicken, C. elegans.
    Contains LAMP domain, which implies association
    with lysosome membrane. Conserved segments in
    promoter regions of Mouse and Human genes that
    potentially bind haematopoietic-specific trans
    factors. Contains potential FBXW7/CDC4 degron.
  • IGKC: immunoglobulin kappa constant region (light
    chain). Essential for immunoglobulin formation.
  • FBXW7: F-box WD-40 protein 7 (CDC4). Key regulator
    of cell cycle. Mutated in certain carcinomas.
  • LEUKEMIA: St Jude leukemia dataset (Ross M et al.,
    Blood 2003). 104 patients, 6 (ALL) leukemia
    classes: T-ALL, E2A-PBX1, BCR-ABL, TEL-AML1, MLL,
    Hyperdiploid >50. Affymetrix U133A/B chips.
22
Adding extra information to data analyses
What kind of information is out there?
  • Molecular databases: protein-protein interactions
    (DIP), metabolic pathways
  • Genomics databases: genomes (Human, Chicken, Fugu,
    Rat, Mouse, Plants, and so on) and mappings
    (transcripts, mRNA, genes, SNPs, TFBS, and more)
23
NetRaVE - analysis of yeast microarray data
  • Yeast microarray data: Gasch et al. Mol. Biol.
    Cell 11, 4241 (2000). 172 microarrays, 10
    different stress conditions.
  • Protein-protein binary interactions from DIP.
  • The NUP116 gene: NUP116 is part of the nuclear
    pore complex (the protein door that lets things in
    and out of the nucleus) and contains a motif that
    binds mRNA.
24
NetRaVE - analysis of yeast microarray data
Linear networks with the following amounts of
protein-protein interaction: 0.0, 0.2, 0.4, 0.6
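The slides do not say how the protein-protein interaction "amount" enters the model. One possible reading, shown here purely as an assumption, is a weight that blends an expression-based score with a 0/1 indicator of a known interaction. The scoring rule, the function names, the genes `geneX` and `geneY`, and the interaction pair are all hypothetical; only NUP116 and the weights 0.0 to 0.6 come from the slides.

```python
def combined_score(abs_corr, interacts, weight):
    """Blend expression evidence with interaction evidence (assumed rule):
    weight = 0.0 uses expression only; larger weights favour known partners."""
    return (1.0 - weight) * abs_corr + weight * (1.0 if interacts else 0.0)

def rank_candidates(abs_corr_by_gene, ppi_pairs, response, weight):
    """Order candidate predictor genes for `response` by the blended score."""
    scores = {
        gene: combined_score(r, frozenset((gene, response)) in ppi_pairs, weight)
        for gene, r in abs_corr_by_gene.items()
    }
    return sorted(scores, key=lambda g: (-scores[g], g))

# Hypothetical inputs: absolute correlations with NUP116 expression, and one
# made-up interaction pair standing in for a DIP entry.
abs_corr_by_gene = {"geneX": 0.70, "geneY": 0.60}
ppi_pairs = {frozenset(("geneY", "NUP116"))}

for weight in (0.0, 0.2, 0.4, 0.6):
    print(weight, rank_candidates(abs_corr_by_gene, ppi_pairs, "NUP116", weight))
```

With no interaction weight, the more strongly correlated geneX ranks first; from 0.2 upward the known interaction partner geneY overtakes it, which is the kind of shift the networks at increasing amounts would be expected to show under this assumption.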
25
[Figure: linear network with protein-protein
interaction amount 0.2]
26
[Figure: linear network with protein-protein
interaction amount 0.4]
27
[Figure: linear network with protein-protein
interaction amount 0.6]
28
NetRaVE - analysis of microarray data
  • Relationship networks from microarray data
  • GeneRaVE connects together genes that are
    predictive of each other's expression values.
  • We can successfully build sparse networks from
    gene expression data that make biological
    sense, i.e. we can make a story out of them.
    Bringing in extra data improves the network.
  • Currently working on methods to validate the
    networks, and also partnering with research
    groups that are interested in validating the
    results in the laboratory, and testing hypotheses
    from our networks.

29
BHH - CMIS
Harri Kiiveri, Aloke Phatak, David Mitchell,
Maree O'Sullivan, Rob Dunne, Glenn Stone