Systems biology: identification of regulatory regions and disease causing genes and mechanisms

Learn more at: http://homes.esat.kuleuven.be

1
Systems biology: identification of regulatory
regions and disease causing genes and mechanisms
  • PhD defense
  • Peter Van Loo
  • Promotor: P. Marynen
  • Co-promotors: B. De Moor
  • and C. De Wolf-Peeters
  • Human Genome Laboratory
  • Department of Human Genetics
  • May 23rd, 2008

2
Introduction
Gene expression

Regulatory regions
Genes
3
Introduction
Gene expression

Regulatory regions
Genes
1
4
Introduction
Gene expression

Regulatory regions
Genes
2
5
Introduction
Gene expression

Regulatory regions
3
Genes
6
Introduction
Systems biology: identification of regulatory
regions and disease causing genes and mechanisms
7
Introduction
Systems biology: identification of regulatory
regions and disease causing genes and mechanisms
[Diagram of a gene and its regulatory regions, labelled: TSS, 5' UTR, exon, intron, 3' UTR, TATA-box, transcription factor binding site, regulatory region, gene]
8
Introduction
Systems biology: identification of regulatory
regions and disease causing genes and mechanisms
  • Disease gene identification
  • Linkage analysis
  • Cytogenetics
  • Molecular cytogenetics
  • e.g. array-CGH
  • Candidate genes
  • Validation: labour intensive

→ Computational gene prioritization by data
integration
9
Introduction
Systems biology: identification of regulatory
regions and disease causing genes and mechanisms
10
Introduction
Gene expression

Regulatory regions
Genes
1
11
Computational detection of regulatory regions:
cis-regulatory modules
Wasserman and Sandelin, Nat Rev Genetics 2004
12
A classification of existing CRM detection methods
Type I
Type II
Type III

13
Computational detection of cis-regulatory
modules: method overview
[Workflow diagram. Labels: human gene and mouse gene (TSS, 5' UTR, exon 1, exon 2, CDS, 3' UTR, CNS 1) aligned with LAGAN / VISTA; CNS database; MotifScanner with Transfac and Jaspar; TFBS database; set of co-regulated genes; ModuleMiner; transcriptional regulatory model (p,v; 211 bp); ModuleScanner; new target genes]
14
The transcriptional regulatory model (TRM) -
ModuleScanner
  • Collection of position weight matrices;
    parameters
  • For a transcriptional regulatory model T, we can
    assign a score to a sequence s (formula not
    preserved in this transcript), where
  • TFBS: transcription factor binding site
  • log S(t): logarithmic score of TFBS t
  • f(p,t): limitations, depending on parameters p (0
    if invalid, 1 if all criteria are satisfied)
  • maximisation over all binding sites of all
    transcription factors in the model
  • ModuleScanner ranks all genes in the genome
  • Highest ranking genes are putative target genes
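
The ingredients listed on this slide (log scores of binding sites, a 0/1 validity term f(p,t), and a maximum over binding-site choices) can be illustrated with a small Python sketch. The slide's own formula is not available here, so everything concrete below is an assumption made for illustration: the function name trm_score, the data layout, and the use of a single maximum span (max_span) as a stand-in for the parameters p. It is not ModuleScanner's actual implementation.

```python
from itertools import product

def trm_score(sites_by_tf, max_span):
    """Best summed log-score over one binding site per transcription factor.

    sites_by_tf: {tf_name: [(position, log_score), ...]}, the predicted TFBSs
                 in the sequence s for each factor in the model (hypothetical layout).
    max_span:    hypothetical stand-in for the parameters p; a combination is
                 valid (f = 1) only if all chosen sites lie within max_span bp.
    Returns None when no valid combination exists.
    """
    if not sites_by_tf or any(len(s) == 0 for s in sites_by_tf.values()):
        return None
    best = None
    # Enumerate every choice of one site per factor (the "maximisation" step).
    for combo in product(*sites_by_tf.values()):
        positions = [pos for pos, _ in combo]
        if max(positions) - min(positions) > max_span:
            continue  # f(p, t) = 0: constraints violated
        total = sum(log_s for _, log_s in combo)
        if best is None or total > best:
            best = total
    return best

# Toy input: two factors, two predicted sites each.
sites = {"TF_A": [(10, 2.0), (400, 3.5)], "TF_B": [(60, 1.5), (900, 4.0)]}
print(trm_score(sites, max_span=211))
```

Ranking genes then amounts to sorting them by this score, best first. The exhaustive enumeration is only practical for small models; it is used here because it makes the maximisation explicit.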

15
ModuleMiner: parameterless CRM detection in sets
of co-regulated genes
  • Search space: all possible TRMs

[Figure: candidate TRMs, labelled with lengths from 43 bp to 465 bp and with the options p, v or p,v]
16
ModuleMiner: parameterless CRM detection in sets
of co-regulated genes
  • For each TRM: use ModuleScanner to score all genes

[Figure: the same candidate TRMs, labelled with lengths from 43 bp to 465 bp and with the options p, v or p,v]
17
ModuleMiner: parameterless CRM detection in sets
of co-regulated genes
  • Use order statistics to assign a score to the
    ranks of the co-regulated genes

All genes in genome, ordered by transcriptional
regulatory model score
Ranks of the set of co-regulated genes: 6, 11, 18, 29, 45
Rank ratios ri: 6.8E-4, 1.2E-3, 2.0E-3, 3.3E-3, 5.1E-3
→ Assign p-value
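
A rank ratio is a gene's rank divided by the number of ranked genes. A minimal Python sketch of that step follows; the function name and the genome size passed as n_genes are assumptions, since the slide does not state how many genes were ranked.

```python
def rank_ratios(ranks, n_genes):
    """Convert 1-based ranks into sorted rank ratios (rank / number of ranked genes)."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    if any(r < 1 or r > n_genes for r in ranks):
        raise ValueError("ranks must lie between 1 and n_genes")
    return [r / n_genes for r in sorted(ranks)]

# Ranks from the slide; n_genes is an assumed value, not given in the source.
ratios = rank_ratios([6, 11, 18, 29, 45], n_genes=8800)
print(["%.1e" % r for r in ratios])
```

Small rank ratios for all co-regulated genes mean the model places them near the top of the genome-wide ranking, which is what the order-statistics p-value then quantifies.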

18
ModuleMiner: parameterless CRM detection in sets
of co-regulated genes
  • Use order statistics to assign a score to each TRM

19
ModuleMiner: parameterless CRM detection in sets
of co-regulated genes
  • Use order statistics to assign a score to each
    TRM
  • Select the best performing TRM

[Figure: selected TRM, labelled p, 8E-9, 211 bp]
20
ModuleMiner: parameterless CRM detection in sets
of co-regulated genes
  • Use order statistics to assign a score to each
    TRM
  • Select the best performing TRM
  • Genetic algorithm based optimization

[Figure: selected TRM, labelled p, 8E-9, 211 bp]
21
In silico validation: leave-one-out
cross-validation
High quality set of 12 smooth muscle specific
genes
Leave one gene out
ModuleMiner: train a TRM on the other 11 genes
ModuleScanner: rank the full genome
Look at the position of the left-out gene
Nelander et al. (2003), Genome Res 13:1838-1854
22
In silico validation: leave-one-out
cross-validation
Do for all possible thresholds:

Sensitivity: % of left-out genes above threshold
Specificity: % of genome below threshold
Plot on ROC curve
Area Under the Curve is a measure of performance
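
The threshold sweep described on this slide can be sketched in Python. This is a simplified illustration under stated assumptions: each left-out gene contributes one rank (1 = best) in a genome of n_genes genes, "above threshold" means a rank at or better than the threshold, and the function names are hypothetical.

```python
def roc_points(left_out_ranks, n_genes):
    """(1 - specificity, sensitivity) pairs for every rank threshold."""
    points = [(0.0, 0.0)]
    for t in range(1, n_genes + 1):
        # Sensitivity: fraction of left-out genes ranked at or above threshold t.
        sens = sum(r <= t for r in left_out_ranks) / len(left_out_ranks)
        # 1 - specificity: fraction of the genome ranked at or above threshold t.
        fpr = t / n_genes
        points.append((fpr, sens))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy example: 12 left-out genes ranked in a hypothetical genome of 1000 genes.
ranks = [3, 8, 15, 20, 42, 60, 75, 110, 150, 240, 400, 700]
print(round(auc(roc_points(ranks, 1000)), 3))
```

An AUC near 1 means the left-out genes are consistently ranked near the top; an AUC near 0.5 corresponds to random ranking.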
23
In silico validation: leave-one-out
cross-validation
[Same content as slide 22; ROC figure annotated with the value 50]
24
In silico validation: leave-one-out
cross-validation
[Same content as slide 22; ROC figure annotated with the value 100]
25
In silico validation: leave-one-out
cross-validation
[Same content as slide 22; ROC figure annotated with the value 93]
26
Sensitivity to noise
[Plot: AUC (axis 0.5 to 1.0) against # of random genes / # of smooth muscle genes (axis 0 to 4); two series, each mixing smooth muscle genes with random genes]
27
Comparison to other CRM detection algorithms
[Figure legend: ModuleMiner, ModuleSearcher, CisModule, EMCMODULE, random TRMs]
28
Application of ModuleMiner to adult tissues and
embryonic development
  • 10 microarray clusters
  • Genes expressed in different adult tissues
  • 9/10 successful CRM detection
  • 5 custom-built sets
  • Embryonic development processes
  • 5/5 successful CRM detection

[Plot: distance from TSS for all conserved regions, adult tissue CRM predictions and embryonic development CRM predictions]
29
Conclusions
  • ModuleMiner
  • detects similar cis-regulatory modules in
    co-regulated genes
  • outperforms existing CRM detection algorithms on
    benchmark data
  • detects CRMs in microarray clusters of different
    adult tissues
  • Mostly close to TSS
  • detects CRMs in custom-built embryonic
    development sets
  • Mostly further from TSS

30
Introduction
Gene expression

Regulatory regions
Genes
2
31
Gene prioritization: genomic data fusion
[Diagram: data sources (gene expression, anatomical expression, protein domains, process/pathway, gene regulation, protein-protein interactions, literature, BLAST) combined into P(gene)]
32
ENDEAVOUR: the approach
[Diagram. Labels: known (training) genes, candidate genes, n data sources, n prioritizations (one per data source), overall prioritization]
33
Prioritization based on one data source
  • Vector-based
  • Literature (text-mining)
  • Microarray data
  • Cis-regulatory motifs
  • Attribute-based
  • Gene ontology
  • Protein domains
  • Pathways
  • Anatomical expression
  • Other
  • BLAST
  • Cis-regulatory modules
  • Protein-protein interactions

[Diagrams. Vector-based: vectors on axes T1, T2, T3 compared by cos(θ). Attribute-based: observed frequencies of GO IDs in the training genes against expected frequencies of GO IDs in the full genome, giving a P-value. BLAST: protein sequences of the training genes in a local BLAST database, candidate genes compared by BLASTP]
34
Genomic data fusion: order statistics
[Diagram. Labels: known (training) genes, candidate genes, n data sources, n prioritizations (one per data source), overall prioritization]
35
Cross-validation
Training genes in which one gene is left out
99 random test genes + one left-out gene
Prioritization with one data source: left-out gene at the n-th position
Sensitivity: % of left-out genes above threshold
Specificity: % of random genes below threshold
Plot on ROC curve (sensitivity against 1 - specificity)
Performance measure: Area Under the Curve (AUC)
36
Cross-validation
Training genes in which one gene is left out
99 random test genes + one left-out gene
Prioritization with all data sources: left-out gene at the n-th position
Sensitivity: % of left-out genes above threshold
Specificity: % of random genes below threshold
Plot on ROC curve (sensitivity against 1 - specificity)
Performance measure: Area Under the Curve (AUC)
37
Cross-validation on disease and pathway genes
[Chart: AUC (axis 0 to 100) per data source for three series: diseases (OMIM, 627 genes, 29 diseases), pathways (GO, 76 genes, 3 pathways) and random. Data sources: GO, EST, BIND, KEGG, BLAST, InterPro, literature, transcription motifs, microarray, cis-regulatory modules, and the overall prioritization]
38
Integrated case study: predicting expression from
sequence
AGCTTCTCCTCTGTAGACACCGAGACTCATAACTCTGATGAGATCCACAG
TTCTATTGGAGTTGTGCAATGAAATAGCAGACACTCTTGGAATCTCTTGG
GGCTCCCCCAACTTCATGAATGAATCTCTAAGTTCTGCATGCCCCATATA
AACTGATGACAAGATCTTTGAGAGCACTGTTTCCTTAGTGGGTTTCCACA
GAGAAATTTTGAATATGGGGGTCCACGAAGTGGCTTGAGCCATCTACCCC
AACAACAACATTTGGCCTTTGGTGCCTCTCTAGTATTCTCCTGATGGTTA
TGCAGATGGTGGCATACAGAAATGGAGTAAATTAGTAAACTAAAAGAATA
AATGAGGTGCCCCATTTCTCTGACTCTATTCTAGGAAAATGAGTGAGAAG
CAGGATCTCCCAGATTTCAGGAGAGATCTGGGTCACTTTTTGGAGGTTTC
TGGTATTGAAAATTATATATATATATCCTCCAGCTGTATATATATATATA
TATATATATATATATATATATATATATATAACATCTCTATATGATATACG
TATCTATCTATACCTCTATAGATATCTATAGATATCTATCTATATCTCTA
TATGATATATAGAGATATAGATATCTCCTCCAGGTAATAGACTTAATTTT
TAAGAACATGTTTCAATTCACAGAAAAATTGAGCAGATGGTACAGAGAAT
AACCCTGTGCCCAGTTTCCCCTATGATTAACATAATACATTATATGGTAC
ACGTGTAACAGTGAAACAATATCGGTACATTATTATTCACTAAAGTTCAT
CATTGATTCAGATTTGTCTAGGTTGATCTTATGTCTTTTTGTGGCCAATT
ATTCCATCTAAGATTCGACATTATATTAAGTTGTCATGTCTCCTTAGGCT
AATCCTTGCCTGTGACAATTTCTCAGACTTTCCTTGTTTCTGATGACCTT
GATGGGCTTGAGGATTACTGGTTTTTTGTAGGACGCCCCTCTACTAGAAT
TTGTCTGATGTTTTTCTTATGATTAGACTAGTATTATGAGAGCAGGACCA
CAGAGAGAAAGAACAATTTTCACCACATCCTATCAAGAGTATATACTATC
AAGATGATTTATCATTGTTGATGTTGGTCTTAATCCCCTGGCTAAGAGAG
TGTTTGTCAGGCTTCTCCTAAGCTATTTTCCCCTGACTACCTTTCCATAC
GGAATATACTCCCCGGGAAGAAGTTACTATCTATAGCCCACAATTAAAGA
GTGTGGGTTTCTGTTTCTCCTCCTTAAGGCCGGCACATGTCTATAAATTA
TTTGGAATCCCTGTGCACATCTATATAAATAAATTTGGAATTACGGGATG
TTTGTCTTTTCTCTCTGGTTTATTAATTTACTTAATAATTTATTTATAAT
AGTATGGACTCATTACTTTTTTTTTTTTTTTTTTTTTGAGAAGGAGCCTC
ACTCTGTTGCCCAGGCTGGAGTGCAGTGGCACAATCTTGGCTCACTGAAC
CTCCGCCTCCCGGGTTCAATGGATTCTCCTGTCTCAGCCTCCCGAGTAGC
TGGGATTACAGGCATACGACACCATGCCCAGCTAATTTCTGTATTTTTAG
TAGAGACAGGGTTTCACCATGTTGGCCAGGCTGGTCTTGAACTCCTGACC
TCAAGTGATCTGCCCACCTCGGCCTCCCAAAGTGCTGGGATTACAGGTGT
AAGCCACTGCACCCAGCCCTGGACTCATTAATATTTATTTTATACTTTGG
GTTATAATGTAAAACACTATTCTATTTTGTTGCTCAAATTGTTGCAGCTT
TGGCCACTGGGAGCTCTTCAAGTGGCTCCTGTGTCTTTTTGAAATATCCC
TCACCAATGTAGTTTTGTTTTTGAATAATTCCTTACTTTAAGGTGCTACA
AGATCTTTCATTCTCATTTGTGTATTTCCTGCCCCAGTTTTAGAACTCAA
CAATTTCTCCAAGAAGCCTTGGTTCCAGCTGCTGAGAAATGGCATTAAAA
CTGAGACCAGCCTGGCCAACATGGTGAAACCCTGTCTCTACTGAAAATAC
AAAAACTAGCCGGGCGTGGTGGTGCAGGGCTGTAATCCCAGCTACTCGGG
AGGCTGAGGCAAGAGAATCGCTTGAACCCGGGAGGCGGAGGTTACAGTCG
GCTGAGATCGCGCCACTGCACTCCAGCCTGGGCAACAGAGTGAGACTCTG
TGTCAAAAACAAAAACAAACACAAAAACAAAACAAAACTGAGCTCTGAGC
ACCAGGTGTGCTTGTTGCTACGACAAATATATTTCAAACCTTATATTTTT
AACACCAGCACCCACACAACTACAATACAATTGCACTATTCATAAAACAA
TTATAGATTATTAACAACATTCAATCATGGTGTCATAGGAGCCTGTGGTC
CTACACTGGATCCCACACACAAAACTTGCATATGATGGTCATCTTCTTTC
AGTCCTGTTAGGATTGAAAGAGAGATGTATAGCCTCAGTGGAGATAATAT
CAAAAGTCTAATTTTATTTATTTATTTCTTTCTTTATTTTGAGACAGGGT
CTTACTCTGTGGCCCAGGCTGGAGTGGTGCCATCATAGCTCACTGCAGCC
TCAAATTCCTGGGTTCAAGAAATCCTCCTGCCTCAGCCTCCCAAGTGGCT
AGCACTACAAGTATGTGCCATCATGCCTGGCTATTTTTTTTTTCTTCCGT
TTTTTTAGAGACAGGGTCTACGTTGCCCAAGCTGGTCTTGAATCCCTGGT
CTCAAGTGATCTTCCCACCTCAGCCTCACAAAGTATTGGGACTACAGTTG
TGAGTCATTGTGTCTGGCCCAAAAGTCCAAATTTGAGGCCTTCTTTGGAT
GTGTGGCCACAATAAATGGCTCTTGCAAGGCTGCCAACCCCTTACACTCT
TTCCATAATATGCCATAAGAAAAGCATACTGGATTTAGAAATAGGGCATG
AAAGTTCTGAATCCAGCTGTGTTAGTGTTATAGCATATATAGCAAGTGGA
TTGTGTCTGGGCCTCAATTTCCAATGATACAAAATCAGGAACATCAGATT
GGATAATGGCTAAAGGCCCTCCCAGTTCTAGCACACTATAATTTTCAACA
GACTTACACTGGGGGAATACAATTGGCTCCACTAGTCTTTGTATACAGGC
CTAATATTCCAGAAAGTCTAAACCAGTGGAGGCATGGGGGTGCGGAGGTC
GCGGCTAATAAATCAGAGTCATTTTATTATTTTTTGGGAATGCCAAGACC
TGTTAAAGGCTTTAGATAGTCTAGACAATCGGGCCTGAGAAACTTTAGAC
CTTTCTTTTTAAAGAATGAAGATCAAAAAAGTATAAAAAATATTGATGGA
AAGTATCTCTTTCATTGGTTTCATGTTCTGATAGATCAAGACTTCTTCCT
CTTTTTTTTTTTTTTCCTTTAGTAAGGGAAAACTCCTCATCTGCTTTTTC
CTCTGACTTCAAATAATTACCTTTAATGCAGTGATGGCTGAGCCACCTCT
AAGTTTCTTACCAGGAATCTCTCTCTAGGTTTTTATTTTTTTCTTTTTCT
CTTCCTTCCTTCTCTCCTCTCTGCCTCCCTCTCTTATGCTCTCCCTCCCT
CCCTTCCTTCCCTTGAATGTTATGATGTGTTTTTTACATCCATATACTAC
CCAGTGTACAAATGTCATCTCCTTCCTATCATTGATGGGGACAATTTGCA
AAAACAAATAGAAGGAAAATAAAAAAGGAAATATAGGGCAGAAAAGACAC
TTGGGAACTGTCACATTTGATTATGAATGCTGGAGATCAAAGGTGCAAGG
TCTTAGAACCTACTTCCTCCACCTCTTAACGTTTAAAATCTTCAATTGGC
TTTTGAACCCACTCAGCAAAATCCCAGACTTTGGTCTACAATTGGTTAAA
AATTGATAGAGTGAGGATTCTGGGACTGCCTTCTTTACTTAGAAGTTTAC
ATTTTAACTCCTTCCCTAGCCCCAGGTACACATATACACACAGCTCCTTT
CCACTCCTCTCGCACAGTTCTGTAAATATGTTTTGAAATGTAAAGGTACA
GAACTAAGCGCAGACCGGCATCCCTCAAATCATCGGGGCTATTCCTTCAC
ACAGCTGAGGAAACGGAGTCCTCACAAGTGGCTTTGCTCAATGTCCCATA
AAGAGTTTCAGGCACAGCTGTAATTAGAAACCAAGGGTTTGTGTGTGTGT
GCGCGCGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGCAACAAATACAGT
GTTTTTTTTTTTCTCCCTACACTGTGCCCCCGTGGAGTCACATTTGTGTG
TCTGTGTCTCTGTACGTACATAAGTTACACAGACACTGACATGTAGGAAA
CGTGCACCAAAGTGTCTGTCTTCTGACCTCAGGTAACAGTATAATGACTT
GAATTTCAGGCAGCTGAAAGGTTTCTGCCGGTGGAGGTTGAAATAAACAA
GAAAAGCCACTGTGGAGATGTGAATGGAAAAGTACCGAGCCCTCCCTCCC
TCCGCACATTCTTCCGCTCTCCAGCTCTCCCTGCCATCGAGCTGGCTTCA
GATAGGCTTCTGCATGGTCAGGTGTACAAGAGGGCGGTGGGGAGAAGAAA
AAAAAAATGCAGGCACAACACGCAAATCAAGTTTTTCCACTTCTAGCCTT
AGGTAGTAGAGACAGCTAAGTACAGCAGCCAGCAGCCCGGCAACGGCAGC
GGGTGGACCAGCCACCCTGAGTTTACAAACACTCAAGTGCTTTCCTTCCC
TCATCCCTCTCAGAGTCCAGCTGCTGCTTTCCTTCATGCTAAGGTTTCAT
AGGAAGTGAAAACTCTGCTATTCAAAACAGCGATCGAACGCAATAAACAA
ATCATTACACACCCCCTAACCCCCATCACTTCTCTATTTTAAGCTTCTGA
TATTTATTCCCATTTTAAATAAGTGAGAAAAGTGTGGAAAATTAGTGTTT
GGGGGTAAACTCTGAGCCAGGCTGAAAAGGTTTCTAAAGGAAAAAAAAAT
CTCAGAACAATAAAGGCTAAAAGCAGGCAGCATATGGATGAAAATTAAAC
ACTGATACTTCCTTTTCAGAAGGCAGTAGCTGGAAATTATACACTTTTTT
AATGTCTCAAAACTTTCTGCTCATCTTGCTATGTTAAAAACGCCTTTCTT
TCTCCAAGGATACTACAAAAAGCTTGTTTACAACAGTTCTAAATGAAGGA
TTTGAAATAAAACGAACAGGTAAAATTTAACAAGTCTGATAGATAGTGTC
TCCCAAATCTATCAAAAGCAGTGCCAAGTACTTCAATGTAGCTGAGAGGC
ATAAATAAACCCAAATGACCATCAAAACTCATCATGACTTGGAGTTCGCT
CTGAGTTTTGCAGTTTACAAAGAGACCATGGCAGCCTTGCTTCCCTCAGT
TCTACAAGGACACAAGATATACACCTACAGACTCAAGTTGTCAGATTACA
CTGATCCTCTAAAATGACAGAGGGCCAGCAAATCATGCAGACCCATTTTC
AGTTGTGTTCCTGGGGTCACACATGCTCCTAGTGAAGACCCAGCCTATAA
TCCTGAAAGAAGAAAGCCTAGAGAAGGTGATTGATTTGAAAAAGTCTTCC
CAGTTTTAAAATCTTTAGTCCTATGATGTGGTATCTTAAAGACCTACCAA
GGTGCCAGAGGTTCCTGACAGGTGAAACCAACTTCCTCTTGTGAGCCCCC
TTAGAAAGAGGACAGAACCGTGTTTATTCCAAGGATAGGTTCTTTTTCAG
CTATGACTGAATTGTGGGGAAGGTTTTGCAAGGGGGAATTGGATGTGAAG
TCTGTTCTTTTCCTCAAATAGATGTAATATTAGGACCAGGCTATTTTATT
TTGTAATAAAGCTTATATTTACCCAGCAGCAATGATCAGGGACCTATTCT
TATGCCCAGTCCATGAGGCAAAGAGGTTGGCCTGGTCCTCTACTGAGTAT
TAGCAGCCAGTAACCATTAAATCATGGGACTAGTTGAAATGTAGTGCCCC
GAAGTCTGCAGGAATTATTCATACGACCCCAGACATGGAATCACTCTTTA
GAGCTTCTTAAGGATGATTTAAAAGAATCAGAATACGTTCAAGTCAGCCC
TTTCTTTAATCCTGTAACACGGCACTGCGGGAGTGAGGGAGGCCCACATA
GTGATGCCAACTGGATACTGAGGAGAGGTCAAGAATGAAAGAAGAAATGA
CATTCTGGAAGAAATTCAACTGGTATAATATTTGACAAAGTTACTTTCCT
AGGAATTGAAAAGAGATTGAGAGGCGGGTGCACAATTTTCCTCACCATTC
ATTCAGTTCAAAGTAAAAGAGACTCACCGAAAAGTAAGTGCCTATCTTTA
GAAAATTTTCAATAATGATTTTCTCTTTCTTTCTAACTGGTCTTCTGTTC
TGGTCAATTTCTTTCAGTGTAAACACATTGATTTGGCAAAAAGCAGTAGG
AAAATGTGGCACTCTGGCACTTGGTCCCAGAAATAATATGCTGGGAAGAT
TTGAGGTCCCTGGTGATGAGGTTAATTATATATGAACCAGCCCTGGTGGG
TTCTCCCTCTAGGGGCTCACTGCAGAGAATTAAAGAGGGCTGAGGTATGA
GAAGGGTGAATTCCTTCCCAGCCCCCACTCTGGCTGGTTCTACTACTGCC
TTAGAGAGCAGATTTCCCTTTGCTCTGCAGCGCCCCCATGGGGCTAAGAG
TGGAGTGGCAAAGGGAACACAGGAGGGACAAGCTGTGTTTCAGGTTGAGG
GGGGCGGTGGATGAGGCTGAATGGCAGTTTTGACAAAGAAAAAAGTGACC
AAAAATCATAAAAATAATCTTTTGAGGGCCCAATAGTAAGGCAGAGCCAT
ACAATTCACATTCCAAACCATATAGATACGTCTGAGAAATCCTAAAGTGC
TAATTGCTCATAAAAGAAAAAATTACACATATAAACACACAAAGAAAATC
CCTTCCACAAAATCGGGGTGTCATTTTGCATCCAGCGGGATTCATTTTAA
TTTCTTTGAAAATGAGAAGGAAGGGGACTCAAATGAAAAAGCAGATAGTC
TGCCTTCTGGCAGAATAAATCTGAACTTGACAATATCATGTGTCTTTGGG
GGTAAAACGTACATTTCAACAACAGTGACAGGATTAGGCCTATGTATATT
TTTCAAAAACCGTTCACAAGACAGGCTTTTCTGCAGAGGCTGCAGTAATC
CATCTGTCAATAAGTATTAAAATATTCAGATTTCACAGGGACAGACACTT
TAACGCATATTTCCTAAGCTCCAGCCCTTGTGGAAAATAATCAACCTCTT
TGCACCTTTCTGGGTTTTAAAACCTAAAATACAGCCTTTAAAAATGTGTG
TGTGTTGTGGGGTAGGGGGGTGCATTGCCAACAACATTTTCGGTGATAGA
TGGAACTTCTTACGGGACTGTCAATGAAAGAGATTTTCCAAATATCCCAG
CAAACAGCAATCTTTCACAGCTCTGATCACTCCTCCATTATAAACCCAAA
TTTTGGGTTGAGATAGGTAGATTATTTTAGACATATCTTTATTAGAAATT
AACAAGTGACGAGATTTTGTGGAAGCTTTAAGAATTCATCTGTAATTTAA
TAAGTCGCTTGAAGGACTCTCATAGCCAAGGCTCAGAACAGCCTGACCTT
TGAAAGCTGCTTCTGGTCCAAACATTTTGGGCTAATTCTTGAGGAATCTG
AAATATTATTTTCCCCTCACACCCTTCTTTTAAGAGAGAGACATAAAAGA
AACAAGAGTCTCCCTTATTCAGGGATGAGTAGGAGGGGAAAAAACCCGAA
CCAACATTTAAATAAGGAAACTAGCAGCTCTGAACAAACAAACTAGGACC
CACAATGAAATGATTCTGCACTGCAATTGCCTTTAAAAAGAAAGTAATAG
AGAAAAAGAGAAGGAAAGAATTTCTCCTTCTTCTCTACCCCCCCCCCACC
CCACCCCCCAACTCAGCTTCAAAGCTAAGAAGACTGTGCTGCGTGTAGTG
CATTGTAGTTGTGGCAGTCTGTTCTAAATACAGGCAGTATCTGTGATACT
GGCACGGCAGGCCTTTAGAATTCCCTCCGGCTGATCTCTTAAACACAGAC
TGAAGAGATTTTTTTACAACGACCTTGAAACGAGCCTCGAAAACAAAAAT
CTCAAGACCTTAAGAGAAAACAAAACACAAACAGGTATTTGGCTCACAGA
ATTTTGTAGAAAACACACACATACCACCCCGCCACCCCCACCCTCCCCCC
CACACACACGTTTCTTGCAACAAGAAATTTCCCAAGAGTCAACAATAACA
GATTAAACCCACCACTTGCTGTCCTGGAAAGAAACAAACCAAACCAAAAC
AAATCCTTTGAACATTTCTCTGAAGTGCAGGAGAGACACACTTCAGCAAA
AGTCCAAGGGGGAAAAAGAAAATTGCACCAAAGGAAAAAAAAAAAAAAAA
AGTGGGGGCTGGGATTGTTACATATGGCCAAAAATTTAAGCTTCTTTCAA
TAGTATTAGTATTGAAATAATACATCTTTAAAACGCTTGAGGGATTAGAT
AGGGAAAGAAAAGGCACGTACAAAAAAATCCAACCGATGCCGATCCTGTG
ATTTACGTAACACCACAAACTTGCAAAAGGCAAAAAATCAGAAGCAAAAA
TCCATAAACCATCAAAATACAGAAACCAAAAATCCCAAGCCACCACACCA
GAAAGAAAAAAACCCAGAACAACAGCAAAAACCCCTGTCCTAAATAAAAA
TAAAGCAAATGAACCCACCGAAAACTGCTTGGCAAATATTTTTCTCGTGG
TGCCTAATATTCTAGTTGGAAAGAGCTGTGATGTTTATTTTATTTTATTT
TTCTCTTACTCGCCTCTCTAACCCTACTATATATATAACATACTTTTCCC
AGTGGTTCAAACCTCTCGCTCCCTTTTGTGCATTTAGCTCGATCTGCTGA
GTTTATGGGTAAGAAAGAAGGAATTAGCCCCAGACCCCGGGAAAGCAAAG
CGCACTCCCCCTCTTATGTCACCGAATAGCAAATTAGTTCTCAGAATTCC
AGAGGCCGAGCTTTGCTACAGCGAAGGCGCCGACGTCACAGAGGAGGAGC
CCACGTGATGGTGGCGGAGCAGGCCATACCATCGTCTTGGGCCCGGGGAG
GGAGAGCCACCTTCA
How is this gene expressed?
39
Predicting expression from sequence: macrophage
differentiation
[Diagram. Labels: granulocytic monocytic progenitor cell, macrophage, HL-60 cell line, TPA, differentiated HL-60 cells]
  • Using only sequence information, can we predict
    genes up- or down-regulated during macrophage
    differentiation?

40
CRM detection: prediction of new target genes
18 upregulated genes → transcriptional regulatory model → 100 new target genes
41
CRM detection: prediction of new target genes
18 upregulated genes → transcriptional regulatory model → 100 new target genes
[Plot: fold upregulation (axis 0.01 to 100) of the new target genes, top 20 marked]
42
Prioritization of new target genes
Training genes: 18 upregulated genes
Candidate genes: 100 new target genes (from the transcriptional regulatory model)
→ prioritization
43
Prioritization of new target genes - conclusions
Training genes: 18 upregulated genes
Candidate genes: 100 new target genes (from the transcriptional regulatory model)
→ prioritization
[Plots: fold upregulation, with the top 20 marked in each of two panels]
44
Conclusions
  • Endeavour prioritizes candidate genes
  • Looks for similarities with known disease/pathway
    genes
  • Integrates information from many heterogeneous
    data sources
  • Computational validation
  • Disease/pathway genes ranked on average at the
    10th position of 100 candidate genes
  • In vitro validation
  • Predicting gene expression from sequence

45
Introduction
Gene expression

Regulatory regions
3
Genes
46
Identification of disease causing mechanisms:
THRLBCL
  • T cell/histiocyte rich large B cell lymphoma
  • Similarities with nodular lymphocyte predominant
    Hodgkin's lymphoma (NLPHL)
  • Functional meaning of the THRLBCL
    microenvironment?
  • Microarray expression profiling of THRLBCL, in
    comparison with NLPHL

[Survival plot: THRLBCL vs NLPHL, survival in years (axis 0 to 15)]
47
The microarray experiment - PCA plot
[PCA plot legend: THRLBCL, NLPHL, reactive lymph node]
48
A three-gene quantitative RT-PCR classifier of
THRLBCL vs NLPHL
  • 3 most significant genes
  • One calibrator of each lymphoma type
  • Each converted to give 6 percentage scores
  • Averaged to give one NLPHL and one THRLBCL
    similarity score

Diagnosis by morphology | Classified THRLBCL by the three-gene classifier | Classified NLPHL by the three-gene classifier
NLPHL | 0 | 46
THRLBCL | 23 | 0
49
Differential expression
THRLBCL signature: FCER1G VSIG4 IDO CCL8 TLR1 TLR2 TLR4 TLR8 CD14 STAT1
CCR1 CXCL10 CXCL16 CCRL2 CD80 CD86 CD274 CSF1R CSF3R PDCD1LG2
FCGR3B FCGR1A ICAM1 IL1RN IL18BP IRAK3 CD74 S100A9 CASP5 MSR1
CD163 SOD2 IFNAR1 IFNGR2 IFIT3 IFI6 C1QA C1QC C2 C3AR1
NLPHL signature: FCRL1 CD79A CD79B CD19 CD22 MS4A1
PAX5 BCL11A FGFR1OP FCER2 BANK1
50
The model
[Model diagram. Labels: CCL8, recruitment, scavenger receptors, IDO, macrophages and dendritic cells, tumor tolerance, Toll-like receptors, innate immunity, VSIG4, activation, IFN-γ]
51
Conclusions
  • Expression profiles are in line with differences
    in microenvironment between THRLBCL and NLPHL
  • Insight into the functional significance of the
    microenvironment
  • Tolerogenic immune response
  • New targets for therapy

52
Breast cancer: clinicopathological significance
of polysomy 17
[Figure. Labels: normal and polysomy 17; tumour grade (I, II, III); Nottingham Prognostic Index (I, II, III); HER2 amplification; HER2 expression; ER status (negative, positive); PR status (negative, positive); scale categories < 1, 1 - 3, 3 - 5, 5 - 10, > 10]
53
Final conclusions
  • Development of novel systems biology methods
  • ModuleMiner: CRM detection
  • Endeavour: gene prioritization
  • Systems biology to gain more insight into
    diseases and processes
  • Predicting expression from sequence: integrated
    case study
  • A tolerogenic immune response in THRLBCL
  • Clinicopathological significance of polysomy 17
    in breast cancer

54
Perspectives
  • Systems biology methods for the identification
    of
  • Regulatory regions
  • Protein-binding microarrays: more and better PWMs
  • Disease genes
  • Three systems biology methods
  • Combination with array-CGH
  • Disease mechanisms
  • Microarrays: focused experiments
  • Data integration
  • Disease treatment
  • Insight → directed treatment

55
Systems biology: identification of regulatory
regions and disease causing genes and mechanisms
  • PhD defense
  • Peter Van Loo
  • Promotor: P. Marynen
  • Co-promotors: B. De Moor
  • and C. De Wolf-Peeters
  • Human Genome Laboratory
  • Department of Human Genetics
  • May 23rd, 2008

56
3 Conservation options
  • 1. All predicted binding sites in all human-
    mouse conserved non-coding sequences (CNSs), 10
    kb 5' of transcription start
  • 2. Same as (1), but limit to binding sites that
    occur in both the human and mouse CNS
  • 3. Same as (2), but add 100 kb of mouse sequence
    both 5' and 3' (to correct for transcription
    start annotation errors)
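
The difference between options 1 and 2 can be shown with a short Python sketch. It assumes, purely for illustration, that predicted sites are (transcription factor, position) pairs and that a human site counts as conserved when the same factor has at least one predicted site in the aligned mouse CNS; the slide does not say how "occur in both" is decided, so the matching rule and the function name are hypothetical.

```python
def conserved_sites(human_sites, mouse_sites):
    """Option 2 filter: keep human CNS sites whose factor is also predicted in the mouse CNS.

    human_sites, mouse_sites: lists of (tf_name, position) tuples (assumed layout).
    Matching is by factor name only, which is a simplifying assumption.
    """
    mouse_tfs = {tf for tf, _ in mouse_sites}
    return [(tf, pos) for tf, pos in human_sites if tf in mouse_tfs]

# Option 1 would use every predicted human site; option 2 keeps the conserved subset.
human = [("SRF", 120), ("MEF2", 310), ("GATA", 455)]
mouse = [("SRF", 118), ("GATA", 470), ("SP1", 30)]
print(conserved_sites(human, mouse))
```

Option 3 would apply the same filter after extending the mouse sequence searched by 100 kb on each side.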

[Diagrams of the three options. Labels: human gene and mouse gene (CDS, 5' UTR, exon 1, CNS 1) aligned with LAGAN / VISTA; 10 kb upstream window; human CNS and mouse CNS; 100 kb extension of the mouse sequence on either side]
57
ModuleMiner performance: TRMs and TRGMs
58
Comparison to other CRM detection algorithms -
results
59
Comparison to other CRM detection algorithms -
results
  • Using TFBS set 2 in ranking step improves
    performance of other methods

60
Comparison to other CRM detection algorithms -
results
  • TFBS set 2 does not always do best

61
Application of ModuleMiner to microarray clusters
62
Application of ModuleMiner to embryonic
development sets
Embryonic development process | TFBS set | Nr target genes after leave-one-out cross-validation (p-value) | AUC
Primary heart field [44] | 1 | 6 / 7 (p = 6.4 × 10^-6) | 0.92
Secondary heart field [44] | 1 | 6 / 9 (p = 6.4 × 10^-5) | 0.79
Neural crest cells [45] | 2 | 6 / 10 (p = 1.5 × 10^-4) | 0.86
Eye development [46] | 1 | 10 / 15 (p = 1.9 × 10^-7) | 0.79
Limb development [47] | 1 | 10 / 24 (p = 5.2 × 10^-5) | 0.77
63
Where are the CRM predictions located?
[Figure panels: TFBS set 1 and 2, TFBS set 3; microarray clusters, development sets]
64
How are genes ranked?
  • Vector-based data source
  • Microarray data
  • Candidate gene with expression similar to that of
    genes known for the disease gets a high score
  • Literature
  • Motifs
  • Attribute-based data source
  • Gene Ontology
  • Interpro protein domains
  • KEGG pathways
  • EST anatomical expression
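
For a vector-based data source, "similar to the known genes" can be illustrated with cosine similarity. The Python sketch below is an assumption-laden illustration rather than Endeavour's implementation: it averages the training genes into one profile and scores each candidate by the cosine of the angle to that profile. The function names and the choice to average are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_candidates(training_vectors, candidates):
    """Order candidate gene names by similarity to the average training profile, best first."""
    dim = len(training_vectors[0])
    profile = [sum(vec[i] for vec in training_vectors) / len(training_vectors)
               for i in range(dim)]
    scores = {name: cosine_similarity(vec, profile) for name, vec in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy expression vectors (e.g. brain, liver, kidney); all values are made up.
training = [[5.0, 0.5, 0.2], [4.0, 1.0, 0.1]]
candidates = {"geneA": [0.1, 4.0, 3.0], "geneB": [6.0, 0.8, 0.3]}
print(rank_candidates(training, candidates))
```

Attribute-based sources would need a different score, for example an over-representation p-value for each annotation term.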

[Illustration. Vector-based example: expression in brain, liver, kidney, ... Attribute-based example: cytoskeleton (GO:0005856). Known disease genes compared with low score candidates and high score candidates]
65
Order statistics
  • Given a set of n ordered rank ratios for gene i
  • (9/100 4/120 30/150 30/50 2/10 80/80)
  • → (0.09 0.03 0.2 0.6 0.2 1)
  • → (0.03 0.09 0.2 0.2 0.3 0.5 0.6 0.8)
  • What is the probability of getting these rank
    ratios or better by chance alone?
  • How many rank vectors does my vector strictly
    dominate?
  • Joint probability density function of all n order
    statistics
  • Recursive formula of complexity O(n²)
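
One recursion with this quadratic cost is sketched below in Python. The slide does not spell out the formula, so the exact form is an assumption: with the rank ratios sorted in ascending order, V_0 = 1, V_k = Σ_{i=1..k} (-1)^(i-1) · V_(k-i) / i! · r_(n-k+1)^i, and the statistic is Q = n! · V_n. Q is the probability that n uniform random rank ratios, once sorted, are all at or below the observed sorted ratios. The function name is hypothetical.

```python
from math import factorial

def order_statistic_q(rank_ratios):
    """Probability that n sorted uniform rank ratios are all <= the observed sorted ones."""
    if any(r <= 0 or r > 1 for r in rank_ratios):
        raise ValueError("rank ratios must lie in (0, 1]")
    r = sorted(rank_ratios)          # r[0] <= r[1] <= ... <= r[n-1]
    n = len(r)
    v = [1.0]                        # v[k] holds V_k, with V_0 = 1
    for k in range(1, n + 1):
        x = r[n - k]                 # r_(n-k+1) in 1-based notation
        v_k = 0.0
        for i in range(1, k + 1):
            v_k += (-1) ** (i - 1) * v[k - i] / factorial(i) * x ** i
        v.append(v_k)
    return factorial(n) * v[n]

# Rank ratios from the first conversion on the slide.
print(order_statistic_q([0.09, 0.03, 0.2, 0.6, 0.2, 1.0]))
```

For two data sources the recursion reduces to 2·r1·r2 - r1² (with r1 ≤ r2), which gives a quick sanity check. The alternating sum can lose precision for large n, so a production version would need more care than this sketch takes.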

66
Validation of the literature data source
Prioritizations of 199 random genes + the
indicated disease gene
67
Validation of genes related to complex diseases
Prioritizations of 199 random genes + the
indicated disease gene
68
Disease case study: DiGeorge syndrome
Atypical 22q11 deletion: 58 candidate genes
69
Ypel1 as a novel DGS gene: validation in zebrafish
70
A screen for genes involved in congenital heart
defects (CHD): work in progress
Array-CGH of CHD patients with a chromosomal
phenotype
Map (micro)deletions and (micro)duplications
[Array-CGH profile: chr 14, axis -1.0 to 1.0]
Known CHD gene explains phenotype
No known CHD gene in deleted/duplicated region(s)
→ Endeavour prioritization
→ Validation in zebrafish (morpholino knockdown, in situ hybridisation)
71
A screen for genes involved in congenital heart
defects (CHD): work in progress
Array-CGH of CHD patients with a chromosomal
phenotype
Map (micro)deletions and (micro)duplications
[Array-CGH profile: chr 14, axis -1.0 to 1.0]
Known CHD gene explains phenotype
No known CHD gene in deleted/duplicated region(s)
→ Endeavour prioritization
→ Validation in zebrafish (morpholino knockdown, in situ hybridisation)
[Counts shown on the flow chart: 100, 16, 7, 9]
72
CHD gene prioritization: optimizing the
performance
Extra data source: microarray data, embryonic
heart development (mouse)
Multiple training sets: validation/optimization of
each training set by leave-one-out
cross-validation
Performance gain
Training sets: primary heart field, secondary heart field,
CHD genes, neural crest cells, vascularization
Combine prioritizations using different training
sets into one prioritization
73
CHD gene prioritization: preliminary results (in
situ hybridisation)
[Array-CGH profiles: chr 14 and chr 4, each with axis -1.0 to 1.0]
74
Bias to well characterized genes
75
Endeavour
http://www.esat.kuleuven.be/endeavour
76
Differential expression: histogram of p-values
77
Is the spleen sample aberrant?
78
Quantitative RT-PCR validation
  • Fold change THRLBCL vs NLPHL
  • Genes selected for involvement in
  • Interferon pathways
  • Macrophage activation
  • Innate immune responses

Gene symbol | Description | Fold difference microarray | Fold difference quantitative RT-PCR (p-value, note 1)
IFN-γ | Interferon gamma | 4.72 | 4.4 (p = 1.0 × 10^-5)
STAT-1 | Signal transducer and activator of transcription 1 | 1.6 | 2.9 (p = 4.4 × 10^-9)
CD74 | HLA class II histocompatibility antigen gamma chain | 2.8 | 1.2 (p = 0.21)
CCL8 (MCP-2) | Monocyte chemotactic protein 2 | 143.5 | 84.8 (p = 3.9 × 10^-9)
IDO | Indoleamine 2,3-dioxygenase | 9.0 | 12.3 (p = 1.6 × 10^-8)
IFN-α1 | Interferon alpha 1 | 1.02 | 0.92 (p = 0.81)
IFN-αR2 | Interferon alpha receptor 2 | 0.92 | 1.3 (p = 5.3 × 10^-3)
STAT-2 | Signal transducer and activator of transcription 2 | 1.82 | 1.3 (p = 0.11)
TLR8 | Toll-like receptor 8 | 11.5 | 11.5 (p = 6.4 × 10^-11)
MyD88 | Myeloid differentiation primary response gene (88) | 1.82 | 2.2 (p = 6.7 × 10^-7)

Note 1: T-test, not corrected for multiple testing.
Note 2: Difference was not significant at p < 0.001
(after correction for multiple testing).
79
Sensitivity of the classifier to the choice of
reference samples