Length and evolutionary dynamics of vertebrate conserved non-coding (CNC) DNA regions

Slides: 34
Provided by: Jon183

1
Length and evolutionary dynamics of vertebrate
conserved non coding (CNC) DNA regions
Dorota Retelska and Philipp Bucher, CCG, SIB
2
Conserved regions in the genome
Alignment of the human genome with syntenic regions
of other vertebrate genomes identified a large
number of almost perfectly conserved non-coding
fragments (100 bp)
3
What are CNC?
  • Often far from genes and not transcribed
  • Not present in invertebrates
  • Non-coding RNAs
  • A role in regulation of homeobox genes has been
    demonstrated for some of them, and a significant
    proportion (45%) have enhancer functions
    (Pennacchio et al., Nature 2006, 444:499)

4
Questions
  • Why are vertebrate CNC not conserved in insects,
    in contrast to CDS?
  • Does the presence of CNC explain the apparent
    higher complexity of vertebrates?

5
Our setup - sliding window
Pairwise alignment and scoring of syntenic
sequences
Parallel analysis of pairwise sequence alignments
6
Example of annotation
Sequence identity is computed over a 60 bp
sliding window and assigned to discrete classes
(96-100% = 9, 91-95% = 8, etc.). Each bp of the genome
is annotated with its Ensembl functional annotation
and sequence identity.
Chr bp base identity class CG Annotation
Chr21 31520185 t 2 2 LINE/CR1
Chr21 31531977 a 1 4 0
Chr21 31531978 a 1 9 0
Chr21 31531979 t 1 4 0
Chr21 31531980 a 1 0 CDS
Chr21 31531981 a 1 1 CDS
-> Statistical analysis
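The windowed scoring described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, gaps are assumed to count as mismatches, and the class mapping assumes that the classes continue in 5-point steps below the two that the slide names (96-100 as class 9, 91-95 as class 8).

```python
import math


def window_identity(seq_a, seq_b, window=60):
    """Percent identity of every full window over two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where both rows carry the same base, 0 otherwise (assumption: gaps are mismatches).
    match = [int(a == b and a != "-") for a, b in zip(seq_a.upper(), seq_b.upper())]
    if len(match) < window:
        return []
    total = sum(match[:window])
    out = [100.0 * total / window]
    # Slide the window one column at a time, updating the running match count.
    for i in range(window, len(match)):
        total += match[i] - match[i - window]
        out.append(100.0 * total / window)
    return out


def identity_class(percent):
    """Assumed mapping: <=55 -> 0, 56-60 -> 1, ..., 91-95 -> 8, 96-100 -> 9."""
    return max(0, min(9, math.ceil((percent - 55.0) / 5.0)))
```

How each window score is assigned to a single base pair (for example to the window's central position) is not stated on the slide, so the sketch leaves that choice open.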
7
Distribution of conservation per sequence class
(Hs_Cf)
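CDS conservation peaks at 86-90% identity. NC,
repeats, and first/last introns (Fint/Lint) peak at
71-75% identity. A fraction of NC/Int falls in the
high conservation classes.
[Figure: distribution of identity classes (<55, 56, 61, 66, 71, 76, 81, 86, 91, 96) for Coding, NC, Repeats, First Intron, Last Intron]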
8
Alignments comparison
We compare HS-GG alignments to DM-DV, since the
distribution of conservation of coding sequences
and the genomic distances are identical in these
species pairs.
Non-coding bases above 80% identity
DM-DV  49.44
HS-GG  61.89
9
Conserved information
Classical Jukes-Cantor probability for one neutral
site to differ
d = genomic distance = μt
Generalization for non-random nucleotide
distributions
r = equilibrium random identity value,
defined experimentally from random genomic
alignments between two species (E. Beaudoing)
Probability for one site to differ given a
conservation constraint s
x = observed identity
s = coefficient weighting the conserved information in each
identity class, irrespective of genomic distance
Conserved Information = Σ (s × Nbp) over all
sequence identity classes
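The equations on this slide were images and are not in the transcript. As a hedged reference point, the sketch below gives the textbook Jukes-Cantor form and one common generalization in which the equilibrium identity r replaces the value 1/4; whether the authors used exactly this generalization is an assumption. The constraint-dependent formula that defines s is not recoverable from the transcript, so s is taken here as a given input per identity class, and all function names are hypothetical.

```python
import math


def jc_p_differ(d):
    """Textbook Jukes-Cantor: probability that one neutral site differs at distance d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def generalized_p_differ(d, r):
    """Assumed generalization: r is the equilibrium random identity (r = 0.25 recovers Jukes-Cantor)."""
    return (1.0 - r) * (1.0 - math.exp(-d / (1.0 - r)))


def conserved_information(s_by_class, nbp_by_class):
    """Sum of s * Nbp over all sequence identity classes, as stated on the slide."""
    return sum(s_by_class[c] * nbp_by_class[c] for c in nbp_by_class)
```

For example, `conserved_information({9: 1.0, 8: 0.5}, {9: 10, 8: 4})` adds 10 bases weighted 1.0 and 4 bases weighted 0.5; the weights here are made-up illustration values, not results from the presentation.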
10
Conserved Information
  • 1.61 times fewer conserved non-coding bases in
    Drosophila than in vertebrates
  • 60.3% of conserved bases are non-coding in
    vertebrates (52.1% in insects)
  • The ratio of non-coding to coding is similar in
    both phyla
  • -> How are these elements distributed in both
    genomes?

11
Persistence length
Length of a sequence region in which the
identity percentage within a sliding
window constantly exceeds a conservation
threshold
Chr bp base identity class CG Annotation
chr21 31520185 t 2 2 LINE/CR1
chr21 31531977 a 1 4 0
chr21 31531978 a 1 9 0
chr21 31531979 t 1 4 0
chr21 31531980 a 1 0 CDS
chr21 31531981 a 0 1 CDS
1
12
Persistence length
Length of a sequence region in which the
identity percentage within a sliding
window constantly exceeds a conservation
threshold
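The persistence length defined above amounts to measuring runs of consecutive positions whose windowed identity stays above the threshold. A minimal sketch, assuming a per-position list of identity percentages is already available (the function name and the strict "greater than" comparison are illustrative choices, not taken from the slides):

```python
def persistence_lengths(identities, threshold=95.0):
    """Lengths of maximal runs in which identity stays strictly above the threshold."""
    runs = []
    current = 0
    for value in identities:
        if value > threshold:
            current += 1          # still inside a conserved run
        else:
            if current:
                runs.append(current)  # the run just ended
            current = 0
    if current:
        runs.append(current)      # a run that reaches the end of the sequence
    return runs
```

Collecting these run lengths separately for coding and non-coding positions would give the kind of length distributions the following slides compare.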
13
Persistence length
Non-coding bases > 95% identity
Coding bases > 95% identity
  • Insect CNC persistence length follows the
    length distribution of coding sequences
  • Vertebrate CNC are longer

14
Persistence length summary
  • Fragments exceeding 95% identity over more than 100
    bp

15
Persistence length
  • Conserved non-coding information is organized in
    longer fragments in vertebrates
  • These results suggest that the functional length
    of CNC, and probably of cis-regulatory elements, is
    longer in vertebrates

16
CNC are selectively constrained
[Figure: "Low frequency alleles" and "Allele frequency spectrum" panels by Hs-Gg identity class; axis labels: Allele 1 occurrences, Hs-Gg Identity]
Different classes of sequence conservation are
under different levels of constraint. CNC have
been under selective pressure for the last 300 My.
17
Persistence time
  • Although vertebrate CNC are highly conserved,
    they have never been identified in invertebrates
  • To understand why, we systematically investigated
    the persistence time of coding and non-coding
    sequences in vertebrate and insect genomes

18
Persistence time measure
  • UCSC MAF (multiple alignment file)
  • a score=-24046.0
  • s hg17.chr21 9883715 37 + 46944323
    ----CTAGGGCATGACCTCTTCCTATAGCCC-TAGAAACACT
  • s oryCun1. 12855 37 - 16685
    ----CTCGGACACGGCCACCTATTTCTGTGC-CAGAGACACA
  • s mm7.chr12 6223856 37 - 117814103
    ----CGAAGACACGGCTTGTATTATTGTGC-CAGAGACACA
  • s rheMac2.chr 170225 35 - 169801366
    ----GCACGACACAGACACCTCCATGGTGGGC-TGGAG--ACT
  • s panTro1.chr16 28244396 37 + 101535987
    -------CTAGGGCATGACCTCTTCCTATAGCCCTAGAAACAC
  • s canFam2.chr8 1033559 17 - 77315194
    --------------------------TACAACAG-TAGAGACC
  • s dasNov1.scaffold 1309 35 - 4930
    -------GCATAGCACAACCAGGTGACACTGG---CAGAGAAG

For each query species in the alignment, compute
the identity to the human sequence.
Split all alignments into 6 bins based on mean
identity in the closest species (Mm and Rm).
Analyze the divergence of the most conserved
alignments in distant species.
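The three steps above could be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the helper names are hypothetical, identity is computed over the columns where the reference row carries a base, and the six bins are assumed to be equal-width between 50% and 100%, since the slides do not give the bin edges.

```python
def parse_maf_block(lines):
    """Return {source_name: aligned_text} for the 's' lines of one MAF block."""
    rows = {}
    for line in lines:
        fields = line.split()
        # MAF sequence line layout: s src start size strand srcSize text
        if len(fields) == 7 and fields[0] == "s":
            rows[fields[1]] = fields[6]
    return rows


def identity_to_reference(rows, reference):
    """Percent identity of each species to the reference, over reference non-gap columns."""
    ref = rows[reference].upper()
    result = {}
    for name, text in rows.items():
        if name == reference:
            continue
        cols = [(a, b) for a, b in zip(ref, text.upper()) if a != "-"]
        same = sum(1 for a, b in cols if a == b)
        result[name] = 100.0 * same / len(cols) if cols else 0.0
    return result


def bin_index(percent, n_bins=6, low=50.0, high=100.0):
    """Assumed equal-width binning; values outside [low, high] fall in the end bins."""
    width = (high - low) / n_bins
    return max(0, min(n_bins - 1, int((percent - low) // width)))
```

In use, each alignment block would be binned by the mean of its identities in the two closest species, and the identities in the more distant species would then be summarized per bin.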
19
Conservation of vertebrate alignments in distant
species
[Figure: three panels, y-axis: Identity (%)]
20
Conservation of insect alignments in distant
species
[Figure: species shown: Ag, Da, Dy, Dp, Dv, Ds]
A short persistence time of CNC is observed in
insects, as in vertebrates
21
Conservation of vertebrate alignments in distant
species
Conservation of CDS decreases linearly with
evolutionary time. CNC are more conserved than
coding sequences in closely related species, but much
less conserved in distant species.
[Figure: species on the axis: Hs, Rm, Mm, Md, Gg, Xt, Dr]
22
Both vertebrate and insect CNC have a short
persistence time
[Figure: NC and CDS curves; species on the axis: Hs, Rm, Mm, Md, Gg, Xt, Dr]
- Explains why vertebrate CNC have never been
found in insects
- Opens the way to predictions about CNC evolution
23
What is known about CNC evolution?
  • Separate sets of CNC are associated with
    orthologous genes in C. elegans and genes of
    similar function in Drosophila (Vavouri et al.,
    Genome Biology 2007, 8:R15)
  • Mammal conserved CNC are much less conserved in
    marsupials (Margulies et al., PNAS 2005)
  • Many morphological changes are due to changes
    in cis-regulatory regions (only changes that
    alter one of the expression patterns
    of a pleiotropic gene) (Prud'homme et al., PNAS
    2007)

24
Both vertebrate and insect CNC have a short
persistence time
[Figure: NC and CDS curves; species on the axis: Hs, Rm, Mm, Md, Gg, Xt, Dr]
- Understanding CNC evolution might help to
understand the morphological evolution of
vertebrates
25
Future (current) directions
  • Classification and description of individual CNC,
    motif discovery
  • Measure the amount of conserved information in a
    large number of species
  • Correlate evolution of non-coding regions and
    morphological evolution
  • ... and gene expression

26
Thanks
  • Philipp Bucher
  • Emmanuel Beaudoing
  • Cédric Notredame
  • Victor Jongeneel
  • Adan Villamarin
  • Christian Iseli
  • Roberto Fabbretti
  • Volker Flegel
  • Eivind Valen
  • Stylianos Antonarakis, UNIGE
  • Rasmus Nielsen, KU

27
Both Vertebrate and insect CNC have a short
persistence time
  • Both Vertebrate and insect CNC have a short
    persistence time
  • - explains why vertebrate CNC have never been
    found in insects
  • - allows predictions about the evolution
    and presence of CNC in various species

28
Vertebrate and Insect CNC have a short
persistence time
  • Both Vertebrate and insect CNC have a short
    persistence time
  • Evolution of non coding regulatory regions is
    likely to affect gene expression level, which
    might be a very important factor in phenotypic
    evolution

29
Both vertebrate and insect CNC have a short
persistence time
- Why are CNC evolving or constrained?
- 45% of CNC might be regulators of gene expression
(Pennacchio et al., 2006)
-> Evolution of CNC is likely to affect gene expression
-> Changes in gene expression regulation might be
responsible for morphological evolution
(Prud'homme et al., Nature 2006)
[Figure: species on the axis: Hs, Rm, Mm, Md, Gg, Xt, Dr]
30
Persistence length
Non-coding bases > 95% identity
Coding bases > 95% identity
Non-coding bases > 85% identity
Coding bases > 85% identity
31
Conserved information
Classical Jukes-Cantor probability for one site to
differ
d = genomic distance = μt
Generalization for non-random nucleotide
distributions
r = equilibrium random identity value, calculated
from random genomic alignments between two species
Modification of Jukes-Cantor including a
conservation coefficient s for each sequence
identity class; x is now the observed identity
s = coefficient weighting the conserved
information in the conserved bases, irrespective
of genomic distance
32
Conserved information
Conserved Information
  • Distributions of both coding and non-coding
    information are similar in both phyla

33
Persistence time measure
  • UCSC MAF (multiple alignment file)
  • a score=-24046.0
  • s hg17.chr21 9883715 37 + 46944323
    ----CTAGGGCATGACCTCTTCCTATAGCCC-TAGAAACACT
  • s oryCun1. 12855 37 - 16685
    ----CTCGGACACGGCCACCTATTTCTGTGC-CAGAGACACA
  • s mm7.chr12 6223856 37 - 117814103
    ----CGAAGACACGGCTTGTATTATTGTGC-CAGAGACACA
  • s rheMac2.chr 170225 35 - 169801366
    ----GCACGACACAGACACCTCCATGGTGGGC-TGGAG--ACT
  • s panTro1.chr16 28244396 37 + 101535987
    -------CTAGGGCATGACCTCTTCCTATAGCCCTAGAAACAC
  • s canFam2.chr8 1033559 17 - 77315194
    --------------------------TACAACAG-TAGAGACC
  • s bosTau2.scaffol 13483 35 + 26178
    ---------AGGACTCAGCCACATACTACTGTGCAAGAGACA
  • s rn3.chr6 6261962 35 - 147642806
    ---------AGGACACAGCCACATATTCCTGCAC-ATGAGATA
  • s dasNov1.scaffold 1309 35 - 4930
    -------GCATAGCACAACCAGGTGACACTGG---CAGAGAAG
  • 1. Compute the number of bases identical to the human
    sequence
    Chr   Start  AlnLen  hg17  rheMac2  mm7  monDom2  galGal2  xenTro1
    Chr1  2459   138     138   120      73   76       69       81
    Chr1  4551   278     278   212      161  166      154      0