Finding functional DNA sequences from whole genome human vs. mouse alignments despite variation in conservation

1
Finding functional DNA sequences from whole
genome human vs. mouse alignments despite
variation in conservation
  • Penn State Univ.: Ross Hardison, Webb Miller,
    Laura Elnitski, Scott Schwartz, Shan Yang, Jia
    Li, Francesca Chiaromonte
  • Univ. California at Santa Cruz: David Haussler,
    Krishna Roskin, Mark Diekens, Robert Baertsch,
    Jim Kent
  • Cambridge Univ.: Nick Goldman, Simon Whelan
  • Institute for Systems Biology: Arian Smit

2
Human and mouse genomes have been aligned
  • Human: Dec. 2001 assembly
      • About 96% coverage of euchromatic portion
  • Mouse: Arachne assembly of Feb. 2002 sequence
      • 40 million reads, 7x redundancy
      • Assembled into a few supercontigs per chromosome
      • About 96% coverage of euchromatic portion
  • Aligned with blastz (PSU: Miller, Zhang and
    Schwartz)
      • Used computer cluster at UCSC
      • 1024 CPUs
      • Job takes 2 days

3
Whole genome human vs. mouse alignments can be
obtained from PipDispenser
http://bio.cse.psu.edu
4
Alignments are tracks at the UCSC Human Genome
Browser
Tracks shown are under development:
http://hgwdev-baertsch.cse.ucsc.edu
http://genome.ucsc.edu
5
Whole genome alignments reveal variation in
conservation between and within chromosomes
6
Coverage of human DNA by alignments with mouse
Figure legend: repetitive DNA / total DNA; aligned, nonrepetitive DNA / nonrepetitive DNA; aligned DNA / total DNA; X is labeled.
Blastz, Dec. 2001 human vs. Feb. 2002 mouse
7
Alignment coverage varies inversely with number
of breaks in conserved synteny
8
Conservation varies along chromosomes
aln_nrc: fraction of nonrepetitive, noncoding
DNA that aligns, in 10 kb windows
Panels: Chr22, Chr7
9
Autocorrelation of fraction of DNA that aligns
Although the correlation between values of aln
for 10 kb windows falls off rapidly, a
substantial correlation is retained for about 400
kb. The correlation falls below significance
rapidly for rep and exon but not GC.
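
The autocorrelation described on this slide can be illustrated with a minimal sketch. It assumes the per-window values of aln are already available as a plain list, one value per consecutive 10 kb window; the function name `autocorrelation` and the toy series are illustrative and not taken from the presentation, and the estimator used in the original analysis may differ.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a series at a given lag (in windows).

    With 10 kb windows, a lag of 40 corresponds to 400 kb.
    """
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(values)")
    mean = sum(values) / n
    # Total sum of squared deviations (the lag-0 term).
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        raise ValueError("series is constant; autocorrelation undefined")
    # Sum of products of deviations for pairs separated by `lag` windows.
    cov = sum((values[i] - mean) * (values[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


# Toy series, not real data: an alternating pattern is negatively
# correlated with itself at lag 1.
toy_aln = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
lag1 = autocorrelation(toy_aln, 1)
```

Evaluating this function over a range of lags gives the kind of decay curve the slide summarizes.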
10
Variation in evolutionary rates revealed in
ancient repeats
K. Roskin, D. Haussler (UCSC)
11
Neighboring bases affect frequency of
substitutions in ancient repeats
Graph from UCSC, similar results from A. Smit on
repeats and N. Goldman on 4-fold degenerate
sites. However, this effect was not seen in
noncoding, nonrepetitive DNA (Miller).
12
p-values reflecting different divergence rates
reveal more significant alignments
Jia Li and Webb Miller: HMMs model local rate
variation; a Markov model then assigns a
p-value given that local rate.
13
What factors account for the variation in
conservation of noncoding DNA?
  • Multivariate analysis of alignments on chr22.
  • For non-overlapping 10 kb windows, measure
    fraction of DNA that aligns and other genomic
    parameters.
  • Analyze for single and multiple parameters that
    predict the variation in conservation.
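
The window measurement described above can be sketched as follows. This is a minimal illustration, assuming alignments are given as half-open `(start, end)` intervals on the human chromosome and that the denominator is every base in the window; the slides also use nonrepetitive and noncoding denominators, which would need repeat and exon annotations not modeled here. The function name and the toy intervals are hypothetical.

```python
def window_aligned_fraction(aligned_intervals, chrom_length, window=10_000):
    """Fraction of bases covered by alignments in each non-overlapping window."""
    n_windows = (chrom_length + window - 1) // window
    covered = [0] * n_windows

    # Merge overlapping intervals so no base is counted twice.
    merged = []
    for start, end in sorted(aligned_intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # Distribute each merged interval over the windows it spans.
    for start, end in merged:
        pos = max(start, 0)
        end = min(end, chrom_length)
        while pos < end:
            w = pos // window
            piece_end = min((w + 1) * window, end)
            covered[w] += piece_end - pos
            pos = piece_end

    # The last window may be shorter than `window`.
    fractions = []
    for w in range(n_windows):
        size = min(window, chrom_length - w * window)
        fractions.append(covered[w] / size)
    return fractions


# Toy intervals, not real alignments.
aln = window_aligned_fraction([(0, 5000), (4000, 6000), (9000, 12000)], 25_000)
```

The resulting list has one value per window and can be paired with GC content, repeat density, and the other per-window parameters for the analyses that follow.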

14
Fraction of sequence aligning is associated with
fewer repeats and more GC and exons
15
Negative correlation between aln and rep is
highly significant
Chromosome 22q (33.4 Mb)
16
Multivariate analysis also shows that aligning
genomic DNA is associated with fewer repeats and
more GC
Correlation with:
Parameter              aln_nr    GC        Exon
GC content              0.222
Exon density            0.278     0.268
snp density             0.000     0.174     0.125
6-mer exact matches     0.062     0.013     0.044
Repeat density         -0.327    -0.518    -0.266
aln_nr: fraction of aligning nonrepetitive sequence.
Results of multivariate analysis of 3329
non-overlapping 10 kb windows comprising chr22.
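
Pairwise correlations of this kind can be computed from per-window measurements with a short sketch like the one below. It assumes plain Pearson correlation, which the slide does not state explicitly; the helper names and the toy columns are illustrative only and do not reproduce the chr22 data.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


def correlation_table(columns):
    """Map each unordered pair of column names to its correlation."""
    return {(a, b): pearson(columns[a], columns[b])
            for a, b in combinations(columns, 2)}


# Toy per-window values, not the chr22 measurements.
toy = {
    "aln_nr": [0.10, 0.20, 0.30, 0.40],
    "gc": [0.38, 0.42, 0.46, 0.50],
    "repeat": [0.60, 0.50, 0.40, 0.30],
}
table = correlation_table(toy)
```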
17
Sliced inverse regression finds two combinations
of parameters that explain only 16% of the
variation in fraction aligning (aln)
18
Alu repeats insert randomly with respect to
fraction of a segment that aligns, but are
retained in regions of limited alignment
Panels: young, old
Chr7, 10 kb windows
19
Nucleotide level alignment scores (ASPC, id) do
not correlate well with coverage by alignments
(aln)
Blue line is the lowess fit (smoothing parameter 0.5).
aspc: alignment score per column
aln_nr: fraction of nonrepetitive DNA that aligns
20
Use measures of alignment quality to discriminate
between functional classes of DNA
  • Types of quality scores
      • Percent identity
      • Context-dependent I-score
      • Principal component analysis
      • Frequency of exactly matching hexamers
  • Datasets
      • Chr22 alignments and annotations
      • Whole genome alignments and annotations
      • 95 known regulatory regions
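
One of the listed scores, the frequency of exactly matching hexamers, can be sketched as below. The slides do not give the exact definition, so this assumes one plausible reading: the fraction of 6-column windows of a pairwise alignment in which all six columns are identical, ungapped bases. The function name and the toy sequences are hypothetical.

```python
def exact_hexamer_frequency(human, mouse, k=6):
    """Fraction of k-column alignment windows that match exactly."""
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    n_windows = len(human) - k + 1
    if n_windows <= 0:
        return 0.0
    hits = 0
    run = 0  # length of the current run of matching, ungapped columns
    for h, m in zip(human.upper(), mouse.upper()):
        run = run + 1 if (h == m and h != "-") else 0
        # A run of at least k means the window ending here matches exactly.
        if run >= k:
            hits += 1
    return hits / n_windows


# Toy alignment, not real sequence: one mismatch at the eighth column.
freq = exact_hexamer_frequency("ACGTACGTAC", "ACGTACGAAC")
```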

21
Fraction matching in alignments distinguishes
exons and regulatory regions (partially) from
neutrally evolving sequences
Panels: denominator excludes gaps; denominator includes gaps
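
The two denominators differ only in whether gap columns are counted. A minimal sketch, assuming a pairwise alignment given as two equal-length strings with `-` marking gaps (the function name and the example strings are illustrative):

```python
def fraction_matching(human, mouse):
    """Return (matches / ungapped columns, matches / all columns)."""
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    gaps = 0
    for h, m in zip(human.upper(), mouse.upper()):
        if h == "-" or m == "-":
            gaps += 1
        elif h == m:
            matches += 1
    columns = len(human)
    ungapped = columns - gaps
    excluding_gaps = matches / ungapped if ungapped else 0.0
    including_gaps = matches / columns if columns else 0.0
    return excluding_gaps, including_gaps


# Toy alignment: 8 columns, 2 of them gapped, 5 matches.
excl, incl = fraction_matching("ACGT-ACG", "ACGAAA-G")
```

Including gaps in the denominator penalizes insertions and deletions, so the second value is never larger than the first.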
22
Principal component analysis of alignment
parameters for different classes of DNA on chr22
Two principal components (PCA1 and PCA2) account
for 99% of the variability in match (M),
transition (S), transversion (V) and gap (G)
space.
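
The four input variables for such an analysis can be derived per alignment by classifying each column. The sketch below assumes gaps are counted per column rather than per gap opening, and that columns containing characters other than A, C, G, T or `-` are skipped; neither choice is stated in the slides. The resulting (M, S, V, G) vectors, one per segment, would then be passed to a PCA routine, which is not shown.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES | {"-"}


def column_fractions(human, mouse):
    """Fractions of match (M), transition (S), transversion (V), gap (G) columns."""
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    counts = {"M": 0, "S": 0, "V": 0, "G": 0}
    for h, m in zip(human.upper(), mouse.upper()):
        if h not in VALID or m not in VALID:
            continue  # skip ambiguous bases such as N
        if h == "-" or m == "-":
            counts["G"] += 1
        elif h == m:
            counts["M"] += 1
        elif {h, m} <= PURINES or {h, m} <= PYRIMIDINES:
            counts["S"] += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            counts["V"] += 1  # purine<->pyrimidine
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: value / total for key, value in counts.items()}


# Toy alignment: 3 matches, 1 transition, 1 transversion, 1 gap.
fractions = column_fractions("AAGC-T", "AGTCAT")
```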
23
Distinguishing functional segments from
nonfunctional DNA after PCA
The data are distinguished along two orthogonal
directions, D1 dominated by gaps and D2
dominated by matches. A combination of directions
is more effective than one direction.
24
Alignment scores derived from PCA do not (yet)
improve discrimination among classes of DNA
Values assigned to matches, transitions,
transversions and gaps were derived from
coefficients in directions in the PCA1-PCA2 plane
that discriminate among the classes of DNA.