RNA-Seq technology and its application to dosage compensation between the X chromosome and autosomes in mammals

Provided by: ludong

1
RNA-Seq technology and its application to dosage
compensation between the X chromosome and
autosomes in mammals
2011-12-05
2
Outline
  • RNA-Seq technologies and their methodologies
  • Application to the dosage compensation model

3
RNA-Seq technologies and their methodologies
4
Transcriptomics methods before RNA-Seq
  • Hybridization-based approaches
      • Genomic tiling microarrays
      • Fluorescently labelled cDNA with microarrays
  • Sequence-based approaches
      • Sanger sequencing of cDNA or EST libraries
      • Serial analysis of gene expression (SAGE)
      • Cap analysis of gene expression (CAGE)
      • Massively parallel signature sequencing (MPSS)

5
A typical RNA-Seq experiment
6
Sequencer used for RNA-Seq
  • Illumina IG
  • Applied Biosystems SOLiD
  • Roche 454 Life Science
  • Helicos Biosciences tSMS (not yet used in
    published RNA-Seq studies as of January 2009)

7
Direct RNA sequencing using the Helicos approach
(a) RNA that is polyadenylated and 3'-deoxy-blocked with poly(A) polymerase is captured on poly(dT)-coated surfaces. A 'fill-and-lock' step is then performed: the 'fill' step uses natural thymidine and polymerase, and the 'lock' step uses fluorescently labelled A, C and G Virtual Terminator (VT) nucleotides and polymerase. This step corrects for any misalignments that may be present in the poly(A):poly(T) duplexes and ensures that sequencing starts in the RNA template rather than in the polyadenylated tail.
(b) Imaging is performed to locate the positions of the templates. Chemical cleavage of the dye-nucleotide linker then releases the dye and prepares the templates for nucleotide incorporation.
(c) The surface is incubated with one labelled nucleotide (C-VT is shown as an example) and a polymerase mixture. Imaging then locates the templates that have incorporated the nucleotide, and chemical cleavage of the dye readies the surface and templates for the next nucleotide-addition cycle. Nucleotides are added in the order C, T, A, G for 120 cycles in total (30 additions of each nucleotide).
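The cycling scheme in (c) can be illustrated with a toy simulation. This is a minimal sketch, not Helicos software: it assumes perfect incorporation and imaging, treats each cycle as offering a single Virtual Terminator nucleotide in the stated C, T, A, G order, and allows at most one base per cycle. The function name `read_by_cyclic_addition` is hypothetical.

```python
FLOW_ORDER = "CTAG"   # nucleotide addition order stated on the slide
TOTAL_CYCLES = 120    # 120 cycles = 30 additions of each nucleotide


def read_by_cyclic_addition(synthesized: str, total_cycles: int = TOTAL_CYCLES) -> str:
    """Toy model of one template read by cyclic single-nucleotide addition.

    `synthesized` is the strand the polymerase would build (the complement of
    the RNA template). Assumptions: one Virtual Terminator nucleotide is
    offered per cycle, at most one base is incorporated per cycle (the
    terminator blocks extension until the dye is cleaved), and every matching
    base is incorporated and imaged without error.
    """
    read = []
    pos = 0  # index of the next base to incorporate
    for cycle in range(total_cycles):
        if pos == len(synthesized):
            break  # reached the end of the template
        offered = FLOW_ORDER[cycle % len(FLOW_ORDER)]
        if synthesized[pos] == offered:
            read.append(offered)  # incorporated, imaged, dye cleaved
            pos += 1
    return "".join(read)


# Check the slide's arithmetic: 120 cycles in C, T, A, G order.
additions = {n: sum(1 for c in range(TOTAL_CYCLES) if FLOW_ORDER[c % len(FLOW_ORDER)] == n)
             for n in FLOW_ORDER}
print(additions)  # {'C': 30, 'T': 30, 'A': 30, 'G': 30}
print(read_by_cyclic_addition("CTAG", total_cycles=4))  # CTAG
print(read_by_cyclic_addition("CC", total_cycles=4))    # C
```

Because each cycle adds at most one base, a repeated base such as CC costs a full four-cycle rotation, so 120 cycles yield at most 120 bases and usually fewer.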
8
Advantages of RNA-Seq compared with other
transcriptomics methods
9
Quantifying expression levels: RNA-Seq and
microarray compared
10
Challenges for RNA-Seq
  • Library construction
      • Bias in results from different library
        construction methods (RNA fragmentation vs.
        cDNA fragmentation) for large RNAs
      • Strand-specific libraries are currently
        laborious to produce
  • Bioinformatic challenges
      • Developing efficient methods to store,
        retrieve and process large amounts of data
      • Mapping reads to the genome
  • Coverage versus cost

11
DNA library preparation: RNA fragmentation and
DNA fragmentation compared
(a) Fragmentation of oligo-dT primed cDNA (blue line) is more biased towards the 3' end of the transcript. RNA fragmentation (red line) provides more even coverage along the gene body, but is relatively depleted at both the 5' and 3' ends. Note that the ratio between the maximum and minimum expression levels (the dynamic range) is 44 for microarrays and 9,560 for RNA-Seq. The tag count is the average sequencing coverage for 5,000 yeast ORFs.
(b) A specific yeast gene, SES1 (seryl-tRNA synthetase), is shown.
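A coverage profile like the one in (a) averages coverage over relative position along each transcript, so that genes of different lengths can be pooled. This is a minimal sketch, assuming per-base coverage arrays are already oriented 5' to 3'; the function name `gene_body_profile` and the default of 10 bins are hypothetical choices, not the authors' pipeline.

```python
from typing import List, Sequence


def gene_body_profile(coverages: Sequence[Sequence[float]], n_bins: int = 10) -> List[float]:
    """Mean coverage in n_bins equal slices of each transcript (5' -> 3'),
    averaged over all transcripts long enough to be split into n_bins slices."""
    totals = [0.0] * n_bins
    used = 0
    for cov in coverages:
        length = len(cov)
        if length < n_bins:
            continue  # too short to give every bin at least one base
        used += 1
        for b in range(n_bins):
            start = b * length // n_bins
            end = (b + 1) * length // n_bins
            totals[b] += sum(cov[start:end]) / (end - start)  # mean coverage in this slice
    return [t / used for t in totals] if used else totals


# Two toy transcripts, both with more reads toward the 3' end.
print(gene_body_profile([[1, 1, 2, 2], [0, 0, 4, 4]], n_bins=2))  # [0.5, 3.0]
```

A rising profile toward the last bins is the signature of the 3' bias described for oligo-dT primed cDNA.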
12
Coverage versus depth
13
Methodologies for RNA-Seq studies
  • Mapping transcription start sites
  • Strand-specific RNA-Seq
  • Characterization of alternative splicing patterns
  • Gene fusion detection
  • Targeted approaches using RNA-Seq
  • Small RNA profiling
  • Direct RNA sequencing
  • Profiling low-quantity RNA samples

14
Mapping transcription start sites (TSSs)
15
Mapping transcription start sites (TSSs)
  • Advantages
      • Requires only low quantities of input RNA
      • Paired-end sequencing allows identified TSSs
        to be assigned to specific transcripts
      • Paired-end sequencing alleviates the
        difficulty of aligning single short reads to
        repeat regions
  • Disadvantages
      • Primer dimers dominate sequencing data sets
      • Dependent on cDNA synthesis or hybridization
        steps
      • Challenging for short-lived transcripts

16
Strand-specific RNA-Seq
  • Adaptors with known orientations are ligated to
    the ends of RNAs or to first-strand cDNA
    molecules
  • Direct sequencing of the first-strand cDNA
    products
  • Selective chemical marking of the second-strand
    cDNA synthesis products or RNA
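Whichever of these approaches is used, the protocol fixes whether a read has the same sense as the RNA or the opposite sense, so the strand of the originating transcript can be read off each alignment. This is a minimal sketch; the function name and the boolean library flag are hypothetical, and which value applies to a given protocol has to be taken from that protocol's documentation.

```python
def transcript_strand(read_strand: str, read_is_antisense: bool) -> str:
    """Infer the genomic strand of the originating RNA from one read's alignment.

    read_strand: '+' or '-', the reference strand the read aligned to.
    read_is_antisense: True when the protocol yields reads that are the reverse
    complement of the RNA (e.g. when first-strand cDNA itself is sequenced);
    False when reads have the same sense as the RNA.
    """
    if read_strand not in ("+", "-"):
        raise ValueError("read_strand must be '+' or '-'")
    if not read_is_antisense:
        return read_strand
    return "-" if read_strand == "+" else "+"  # flip for antisense reads


print(transcript_strand("+", read_is_antisense=True))   # -
print(transcript_strand("-", read_is_antisense=False))  # -
```

Pooling these calls over all reads at a locus is what lets strand-specific data separate overlapping sense and antisense transcripts.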

17
Characterization of alternative splicing patterns
(a) Sequence reads are mapped to genomic DNA or to a transcriptome reference to detect alternative isoforms of an RNA transcript. Mapping is based simply on read counts for each exon and on reads that span exon boundaries. The absence of a genomic exon from the transcript is inferred when no reads map to that genomic location.
(b) Paired sequence reads provide additional information about exonic splicing events: when the first read matches one exon and the second read falls in a downstream exon, the pair links the two exons, building up a map of the transcript structure.
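The logic of panel (a), exon read counts plus boundary-spanning reads, can be sketched as follows. This is a minimal illustration under assumed inputs (reads already assigned to exon indices and to exon-exon junctions); the function name and the `min_reads` cut-off are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def infer_exon_usage(
    exon_counts: Sequence[int],
    junction_counts: Dict[Tuple[int, int], int],
    min_reads: int = 1,
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Exons are indexed 0..n-1 in 5' -> 3' order.

    exon_counts[i]: reads falling inside exon i.
    junction_counts[(i, j)]: reads spanning the boundary joining exon i to exon j.
    Returns (exons supported by reads, junctions that skip one or more exons).
    """
    supported = [i for i, c in enumerate(exon_counts) if c >= min_reads]
    skipping = sorted(
        (i, j) for (i, j), c in junction_counts.items()
        if c >= min_reads and j - i > 1  # joins non-adjacent exons
    )
    return supported, skipping


# Toy gene with three exons: no reads in exon 1, and reads joining exon 0 directly to exon 2.
print(infer_exon_usage([12, 0, 9], {(0, 2): 5, (0, 1): 0}))  # ([0, 2], [(0, 2)])
```

Here exon 1 is inferred to be absent from this isoform both because no reads land in it and because junction reads connect exon 0 straight to exon 2.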
18
Gene fusion detection
19
Targeted approaches using RNA-Seq
20
Targeted approaches using RNA-Seq
21
Small RNA profiling
22
Direct RNA sequencing
23
Profiling low-quantity RNA samples
(a) Single-molecule DNA and RNA sequencing technologies could be modified for single-cell applications. Cells can be delivered to flow cells using fluidics systems, followed by cell lysis and capture of mRNA species on the poly(dT)-coated sequencing surfaces by hybridization. Standard sequencing runs could take place on channels with a 127.5 mm² surface area, requiring 2,750 images per cycle to image the entire channel. The surface area needed to accommodate the 350,000 mRNA molecules contained in a single cell is 0.4 mm², so only eight images per cycle would be needed. Sequence analysis can be done by direct RNA sequencing (DRS) or by on-surface cDNA synthesis followed by single-molecule DNA sequencing.
(b) nCounter system workflow. Two probes are used for each target site: the capture probe (shown in red) contains a target-specific sequence and a modification that allows the molecules to be immobilized on a surface; the reporter probe contains a different target-specific sequence (shown in blue) and a fluorescent barcode (shown by a green circle) that is unique to each target being examined. After the capture and reporter probe mixture is hybridized to RNA samples in solution, excess probes are removed. The hybridized RNA duplexes are then immobilized on a surface and imaged to identify and count each transcript with the unique fluorescent signals on the capture and reporter probes.
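The final counting step in (b) amounts to tallying decoded barcodes by target. This is a minimal sketch, assuming the image analysis has already turned each immobilized molecule into a barcode string; the barcode strings, the target names and the function name are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def tally_barcodes(observed: Iterable[str], codeset: Dict[str, str]) -> Dict[str, int]:
    """Count imaged reporter barcodes per target; unknown barcodes go to 'unassigned'."""
    counts = Counter(codeset.get(code, "unassigned") for code in observed)
    return dict(counts)


# Hypothetical barcode strings mapped to hypothetical targets.
codeset = {"RGBY": "GAPDH", "YYRB": "ACTB"}
print(tally_barcodes(["RGBY", "RGBY", "YYRB", "BBBB"], codeset))
# {'GAPDH': 2, 'ACTB': 1, 'unassigned': 1}
```

Because each molecule is counted directly, no amplification or cDNA synthesis step sits between the RNA and the count.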
24
Reference
  • Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a
    revolutionary tool for transcriptomics. Nature
    Reviews Genetics 10, 57 (2009).
  • Ozsolak, F. & Milos, P. M. RNA sequencing:
    advances, challenges and opportunities. Nature
    Reviews Genetics 12, 87 (2011).
  • Martin, J. A. & Wang, Z. Next-generation
    transcriptome assembly. Nature Reviews Genetics
    12, 671 (2011).
  • Kapranov, P. et al. New class of
    gene-termini-associated human RNAs suggests a
    novel RNA copying mechanism. Nature 466, 642
    (2010).

25
Application to the dosage compensation model
26
27
28
Background
  • Ohno's hypothesis
      • X-linked genes are expressed at twice the
        level of autosomal genes per active allele to
        balance the gene dose between the X
        chromosome and autosomes.
      • Supported by microarray data (X:AA ≈ 1)

29
Abstract from Xiong et al.
  • Mammalian cells from both sexes typically contain
    one active X chromosome but two sets of
    autosomes. It has previously been hypothesized
    that X-linked genes are expressed at twice the
    level of autosomal genes per active allele to
    balance the gene dose between the X chromosome
    and autosomes (termed 'Ohno's hypothesis'). This
    hypothesis was supported by the observation that
    microarray-based gene expression levels were
    indistinguishable between one X chromosome and
    two autosomes (the X to two autosomes ratio
    (X:AA) ≈ 1). Here we show that RNA sequencing
    (RNA-Seq) is more sensitive than microarray and
    that RNA-Seq data reveal an X:AA ratio of 0.5 in
    human and mouse. In Caenorhabditis elegans
    hermaphrodites, the X:AA ratio reduces
    progressively from 1 in larvae to 0.5 in
    adults. Proteomic data are consistent with the
    RNA-Seq results and further suggest the lack of X
    upregulation at the protein level. Together, our
    findings reject Ohno's hypothesis, necessitating
    a major revision of the current model of dosage
    compensation in the evolution of sex chromosomes.

30
Expression level definition
  • Taking mouse as an example, we mapped all 25-mer
    RNA-Seq reads to the genome sequence. Only reads
    that mapped uniquely to exons were considered
    valid hits for a given gene. The expression level
    of a gene is defined as the number of valid hits
    to the gene divided by the effective length of
    the gene, which is the total number of 25-mers in
    the DNA sequences of the gene's exons that have
    no other matches anywhere in the genome. For
    comparisons between tissues or developmental
    stages, expression levels were further
    normalized by dividing by the total number of
    valid hits in the sample.
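The expression-level definition can be sketched on toy sequences. This is a minimal sketch under stated assumptions, not the authors' code: only forward-strand k-mers are counted (reverse-complement matches are ignored), k is a parameter rather than fixed at 25, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def kmer_counts(sequences: Iterable[str], k: int) -> Counter:
    """Occurrences of every k-mer across the given sequences (forward strand only)."""
    counts: Counter = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def effective_length(exon_seqs: Iterable[str], genome_counts: Counter, k: int) -> int:
    """Number of k-mers in the gene's exons that occur exactly once in the genome."""
    return sum(
        1
        for seq in exon_seqs
        for i in range(len(seq) - k + 1)
        if genome_counts[seq[i:i + k]] == 1
    )


def expression_level(valid_hits: int, eff_len: int,
                     total_valid_hits: Optional[int] = None) -> float:
    """Valid hits per effective position; optionally divided by the sample's total valid hits."""
    level = valid_hits / eff_len
    if total_valid_hits is not None:
        level /= total_valid_hits  # normalization used for between-sample comparisons
    return level


genome = ["ACGTACGTTT"]               # toy one-chromosome genome
counts = kmer_counts(genome, k=4)
eff = effective_length(["ACGTT"], counts, k=4)  # ACGT occurs twice, CGTT once
print(eff)                            # 1
print(expression_level(10, eff))      # 10.0
```

Excluding repeated k-mers from the denominator mirrors the exclusion of non-unique reads from the numerator, so genes with repetitive exons are not penalized for reads that could never be assigned to them.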

31
Comparison of gene expression measured by
microarray and RNA-Seq
Human liver is considered unless otherwise noted.
(a) Estimation variation, measured by the fold difference between microarray intensities of two probesets with the same target, or between RNA-Seq signals from the two halves of the same gene.
(b) Identical to (a), except that mouse liver is considered.
(c) Comparison of the internal consistency of RNA-Seq data and microarray data. The expression differences from one half of the nucleotides (RNA-Seq) or from one probeset (microarray) are shown for 1,000 randomly picked gene pairs, each with a twofold (± 0.01-fold) expression difference measured from the other half of the nucleotides (RNA-Seq) or from the other probeset (microarray). The central bold line shows the median, the box encompasses 50% of data points and the error bars include 90% of data points.
(d) Pearson's correlation (r) of microarray and RNA-Seq expression signals (gray) and of RNA-Seq signals from two independent experiments (black). A given fraction of genes (x axis) with the highest expression according to one of the RNA-Seq datasets is examined. Error bars show 95% confidence intervals estimated by bootstrapping.
(e) Microarray consistently underestimates expression differences between genes. The microarray expression differences of 1,000 randomly picked gene pairs, each with an x-fold RNA-Seq expression difference (x = 2 ± 0.01, 4 ± 0.02, 8 ± 0.04, 16 ± 0.08, 32 ± 0.16 and 64 ± 0.32), are shown. The central bold line shows the median, the box encompasses 50% of data points and the error bars include 90% of data points.
(f) Relative liver expression of 55 mouse genes, measured by RNA-Seq, microarray and qRT-PCR.
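Panel (d) restricts the correlation to the most highly expressed genes. A minimal sketch of that calculation, assuming the signals are compared as given (in practice they are often log-transformed first) and that genes are ranked by one RNA-Seq dataset; `r_for_top_fraction` is a hypothetical name, and the bootstrap error bars are omitted.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_for_top_fraction(ranking: Sequence[float], x: Sequence[float],
                       y: Sequence[float], fraction: float) -> float:
    """Pearson r of x vs y over the top `fraction` of genes, ranked by `ranking` (highest first)."""
    n_top = max(2, round(len(ranking) * fraction))  # need at least two genes
    top = sorted(range(len(ranking)), key=lambda i: ranking[i], reverse=True)[:n_top]
    return pearson([x[i] for i in top], [y[i] for i in top])


rnaseq_1 = [50.0, 40.0, 30.0, 20.0, 10.0]  # used for ranking
rnaseq_2 = [49.0, 41.0, 29.0, 21.0, 11.0]
microarray = [9.0, 9.5, 8.0, 7.0, 3.0]
print(r_for_top_fraction(rnaseq_1, rnaseq_1, rnaseq_2, 0.6))
print(r_for_top_fraction(rnaseq_1, rnaseq_1, microarray, 0.6))
```

Sweeping `fraction` from small to 1 reproduces the x axis of (d): a compressed microarray signal loses correlation among the top genes, where its intensities tend to saturate.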
32
Comparisons of RNA-Seq gene expression levels
between the X chromosome and autosomes in 12
human tissues and 3 mouse tissues
(a) The median expression levels of X-linked genes (closed diamonds) and autosomal genes (open circles) are compared. Median expression of autosomal genes was normalized to 1. Error bars show 95% bootstrap confidence intervals. Sex information is listed in parentheses after the tissue names (M, male; F, female; NA, unknown).
(b) X:AA ratios of median expression in the human liver when X is compared with individual autosomes. Error bars show 95% bootstrap confidence intervals.
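The X:AA ratio in (a) is a ratio of medians, with the autosomal median normalized to 1. A minimal sketch with a percentile bootstrap; the resampling scheme (genes resampled independently within each group), the 1,000 replicates and the function names are assumptions for illustration.

```python
import random
import statistics
from typing import Sequence, Tuple


def x_to_aa_ratio(x_expr: Sequence[float], auto_expr: Sequence[float]) -> float:
    """Median X-linked expression divided by median autosomal expression,
    i.e. X expression with the autosomal median normalized to 1."""
    return statistics.median(x_expr) / statistics.median(auto_expr)


def bootstrap_ci(
    x_expr: Sequence[float],
    auto_expr: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling genes independently within each group."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        xs = rng.choices(x_expr, k=len(x_expr))
        aa = rng.choices(auto_expr, k=len(auto_expr))
        aa_median = statistics.median(aa)
        if aa_median > 0:  # skip degenerate resamples
            ratios.append(statistics.median(xs) / aa_median)
    if not ratios:
        raise ValueError("autosomal median was zero in every resample")
    ratios.sort()
    lo = ratios[int(alpha / 2 * len(ratios))]
    hi = ratios[int((1 - alpha / 2) * len(ratios)) - 1]
    return lo, hi


x_genes = [1.0, 2.0, 3.0, 4.0, 5.0]
autosomal_genes = [2.0, 4.0, 6.0, 8.0, 10.0]
print(x_to_aa_ratio(x_genes, autosomal_genes))  # 0.5
print(bootstrap_ci(x_genes, autosomal_genes))
```

Resampling genes one by one treats neighbouring genes as independent; the later analysis in this deck points out that this assumption does not hold and uses block resampling instead.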
33
Testing upregulation in Ohno's hypothesis
  • Upregulation in Ohno's hypothesis
      • In Ohno's hypothesis, upregulation is needed
        only for those X-linked genes that existed in
        the genome before the emergence of the X
        chromosome; X-linked genes that originated de
        novo on the X presumably do not require
        upregulation.

34
Testing upregulation in Ohno's hypothesis
35
Comparison of RNA-Seq gene expression levels of
the X chromosome and autosomes in C. elegans
36
Caveats in this RNA-Seq analysis
  • The Illumina sequencing used here may be biased
    toward certain sequences or nucleotides.
  • Reverse transcription during cDNA library
    preparation is likely to be less efficient for
    longer transcripts.
  • GC content may affect RNA-Seq results.
  • A recent study using time-course microarray data
    excluded lowly expressed genes, which is
    inappropriate for measuring the absolute value of
    the X:AA ratio.

37
38
Main idea
  • Here we contend that the low estimate of the X:AA
    ratio by Xiong et al. stems from the
    disproportionate contribution to the X chromosome
    average of transcriptionally inactive genes,
    which are not relevant for evaluating dosage
    compensation mechanisms. We show that when only
    active genes are considered, the RNA-Seq data
    give X:AA ratios closer to 1, and the observed
    minor deviation of the X:AA ratio from 1 is
    within the range expected when
    chromosome-to-chromosome variability is taken
    into account.

39
Key notes
  • RPKM: the number of associated reads per kilobase
    of exonic sequence per million total reads
    sequenced.
  • We assert that the effect of a mechanism that
    regulates transcriptional dosage compensation
    pertains only to the expression magnitude of
    transcriptionally active genes.
  • The fraction of undetected (RPKM = 0) genes is
    substantially higher on the X chromosome than on
    autosomes, accounting for as much as 40% of all
    X-linked genes.
  • Threshold used in the analysis: RPKM > 1 with at
    least 3 reads
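The RPKM definition and the activity threshold translate directly into code. This is a minimal sketch; the function names are hypothetical, and since many pipelines normalize by million mapped reads rather than total reads sequenced, the denominator should match whatever the analysis actually used.

```python
from typing import Sequence


def rpkm(reads: int, exon_length_bp: int, total_reads: int) -> float:
    """Reads per kilobase of exonic sequence per million total reads."""
    # reads / (length / 1e3) / (total / 1e6) == reads * 1e9 / (length * total)
    return reads * 1e9 / (exon_length_bp * total_reads)


def is_active(reads: int, rpkm_value: float, min_rpkm: float = 1.0, min_reads: int = 3) -> bool:
    """Threshold from the slide: RPKM > 1 with at least 3 reads."""
    return rpkm_value > min_rpkm and reads >= min_reads


def undetected_fraction(rpkm_values: Sequence[float]) -> float:
    """Fraction of genes with RPKM = 0."""
    return sum(1 for v in rpkm_values if v == 0) / len(rpkm_values)


print(rpkm(1000, 2000, 10_000_000))   # 50.0
print(is_active(3, 1.5))              # True
print(undetected_fraction([0, 0, 1, 2, 0]))  # 0.6
```

Applying `is_active` separately to X-linked and autosomal genes before taking medians is what distinguishes this analysis from the all-genes estimate of Xiong et al.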

40
Fraction of transcriptionally inactive genes on
autosomes and X chromosome
41
The ratio of the median transcription magnitudes
of X-linked and autosomal genes
The X:AA ratio estimates are shown based on the set of genes with minimal transcription (RPKM > 1 and at least 3 associated reads). Black error bars show the 95% confidence interval (CI) based on bootstrap estimates that incorrectly assume independence of expression levels for neighboring genes (plotted for reference only, not used to make inferences). Red bars show the range around 1 into which the X:AA ratio is expected to fall (95% CI) in the presence of twofold upregulation of the X chromosome, taking into account interchromosomal variation (sampling of contiguous blocks of X-chromosome size from the autosomal portion of the genome). The observed X:AA values (black dots) in all tissues fall within this range, indicating that the observed transcriptional magnitude of X-linked genes is compatible with twofold upregulation. Blue bars show the range around 0.5 into which the X:AA ratio is expected to fall in the absence of X-chromosome upregulation (50% of the autosomal expression level). The X:AA estimates for the first five samples fall outside this range, indicating that X-linked expression magnitude is significantly higher than expected in the absence of dosage compensation. The X:AA values for the other samples lie within both the red and blue ranges, indicating that the two hypotheses (X:AA = 1 and X:AA = 0.5) cannot be clearly distinguished from these individual data sets.
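The red and blue ranges come from resampling contiguous autosomal blocks rather than individual genes, which keeps neighbouring genes together. This is a minimal sketch, assuming the autosomal genes (already filtered to active genes) are concatenated in genomic order and that a block may span a chromosome boundary; the exact procedure is in the paper's Supplementary Methods and may differ, and `block_range` is a hypothetical name.

```python
import random
import statistics
from typing import Sequence, Tuple


def block_range(
    autosomal_in_order: Sequence[float],
    block_size: int,
    scale: float = 1.0,
    n_samples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Empirical (1 - alpha) range of scale * median(block) / median(autosomes).

    block_size: number of genes on the X (the size of each sampled block).
    scale=1.0 models an X upregulated twofold (expected X:AA near 1);
    scale=0.5 models no upregulation (expected X:AA near 0.5).
    """
    n = len(autosomal_in_order)
    if not 0 < block_size <= n:
        raise ValueError("block_size must be between 1 and the number of autosomal genes")
    genome_median = statistics.median(autosomal_in_order)
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_samples):
        start = rng.randrange(n - block_size + 1)  # start of a contiguous block
        block = autosomal_in_order[start:start + block_size]
        ratios.append(scale * statistics.median(block) / genome_median)
    ratios.sort()
    return ratios[int(alpha / 2 * n_samples)], ratios[int((1 - alpha / 2) * n_samples) - 1]


rng = random.Random(1)
autosomal = [rng.lognormvariate(1.0, 1.0) for _ in range(2000)]  # toy expression values
print(block_range(autosomal, block_size=300, scale=1.0))  # "red" range
print(block_range(autosomal, block_size=300, scale=0.5))  # "blue" range
```

An observed X:AA inside the scale=1.0 range but outside the scale=0.5 range favours upregulation; one inside both cannot separate the two hypotheses.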
42
The chr. 10A and chr. 11A ratios illustrating
chromosome-to-chromosome variability
43
Mouse RNA-seq data shows a lack of dosage
compensation
44
Dependence of the X:AA estimates on the RPKM
threshold
Dependence of the X:AA estimates on the RPKM threshold. The tissue-averaged X:AA estimates are shown (black) as a function of the minimal RPKM threshold, from 0 (all genes, including those with undetected expression) to RPKM 2. Error bars correspond to the s.e.m. between tissues. The largest change in the ratio is observed after genes with undetected expression are excluded (RPKM > 0). As the RPKM threshold increases, the X:AA ratio largely stabilizes above RPKM 1. Applying an RPKM threshold increases the median expression level and can artificially shift the X:AA ratio closer to 1. The shaded gray region shows the 95% confidence envelope for a hypothetical X chromosome expressed at 50% of the autosomal level (see Supplementary Methods). For non-zero RPKM thresholds, the observed X:AA ratios lie outside this 95% confidence interval, showing that they are higher than expected from setting an RPKM threshold alone.
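The caption's warning that a threshold alone can push X:AA toward 1 can be checked on toy data. This is a minimal sketch, assuming a threshold of None stands for "all genes, including undetected ones" and a number t means keeping genes with RPKM > t; `x_aa_by_threshold` is a hypothetical name.

```python
import statistics
from typing import Dict, Optional, Sequence


def x_aa_by_threshold(
    x_rpkm: Sequence[float],
    auto_rpkm: Sequence[float],
    thresholds: Sequence[Optional[float]],
) -> Dict[Optional[float], float]:
    """X:AA ratio of medians after keeping genes with RPKM > t.

    A threshold of None keeps every gene, including undetected (RPKM = 0) ones.
    Thresholds that leave a group empty, or an autosomal median of 0, are skipped.
    """
    result: Dict[Optional[float], float] = {}
    for t in thresholds:
        xs = list(x_rpkm) if t is None else [v for v in x_rpkm if v > t]
        aa = list(auto_rpkm) if t is None else [v for v in auto_rpkm if v > t]
        if not xs or not aa:
            continue
        aa_median = statistics.median(aa)
        if aa_median > 0:
            result[t] = statistics.median(xs) / aa_median
    return result


# Toy data: a hypothetical X expressed at exactly half the autosomal level.
autosomal = [0, 1, 2, 3, 4]
x_half = [0.5 * v for v in autosomal]
print(x_aa_by_threshold(x_half, autosomal, [None, 0, 1]))
```

With this toy X the ratio is 0.5 with no threshold and with RPKM > 0, but rises to about 0.58 at RPKM > 1 even though nothing about the X changed; the gray envelope in the figure estimates the size of this effect for the real data.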
45
Discussion