Beyond the Human Genome: Transcriptomics

Dr Jen Taylor, Henry Wellcome Centre for Gene Function, Bioinformatics, Department of Statistics, taylor_at_stats.ox.ac.uk

Slides: 69
Provided by: statsOxA

1
Beyond the Human Genome: Transcriptomics
  • Dr Jen Taylor
  • Henry Wellcome Centre for Gene Function
  • Bioinformatics
  • Department of Statistics
  • taylor_at_stats.ox.ac.uk

2
Beyond the Human Genome
  • 1995 - Human Genome sequencing begins in earnest
  • Mapping the Book of Life
  • 1999 - Human Genome
  • 2000 - First Draft Human Genome
  • 2003 - Essential Completion Human Genome
  • approx. 140,000 genes
  • 30,000-40,000 genes ??
  • 24,195 genes !!!???
Commemorative stained glass window for F.C. Crick, designed by Maria McClafferty (Photograph: Paul Forster). Gonville & Caius College, Cambridge, UK.
3
Beyond the Human Genome
Gene Number ? Complexity
Gene
4
Introduction
  • The scope of transcriptomics
  • A definition of the transcriptome
Part I: Observing the transcriptome
  • Experimental methodology
  • Data analysis
Part II: Using the transcriptome
  • The regulation of the transcriptome
  • The transcriptome and the genome
  • The transcriptome and the proteome
Beyond the Human Transcriptome
5
Transcriptome
"transcriptome, the mRNAs expressed by a genome at any given time" (Abbott, 1999)
6
Central Dogma of Molecular Biology
  • mRNA: single-stranded RNA molecule
  • Complementary to DNA
  • Processed (spliced and polyadenylated) RNA
    transcript
  • Carries the sequence of a gene out of the nucleus
    into the cytoplasm where it can be translated
    into a protein structure

Image: Access Excellence, National Institutes of Health
7
Transcriptome: An evolving definition
  • (the population of) mRNAs expressed by a
    genome at any given time (Abbott, 1999)
  • The complete collection of transcribed
    elements of the genome (Affymetrix, 2004)
  • mRNAs: 35,913 transcripts (including
    alternatively spliced variants)
  • Non-coding RNAs
  • tRNAs (497 genes)
  • rRNAs (243 genes)
  • snmRNAs (small non-messenger RNAs)
  • microRNAs and siRNAs (small interfering RNAs)
  • snoRNAs (small nucleolar RNAs)
  • snRNAs (small nuclear RNAs)
  • Pseudogenes ( 2,000)

8
The human transcriptome
  • High density oligonucleotide arrays across 11
    different cell lines
  • 70% of transcripts non-coding
  • 79-88% have multiple transcripts
  • Kapranov et al., 2002
  • 90% of transcribed nucleotides outside annotated exons
  • The dimensions of the unique transcriptome?? >>>
    current 40,000 estimate
Kampa et al., Novel RNAs identified from an
in-depth analysis of the transcriptome of human
chromosomes 21 and 22. Genome Research. 2004
9
Transcriptomics
  • Scope
  • the population of functional RNA transcripts.
  • the mechanisms that regulate the production of
    RNA transcripts
  • dynamics of the transcriptome (time, cell type,
    genotype, external stimuli)
  • Definition
  • The study of characteristics and regulation of
    the functional RNA transcript population of a
    cell/s or organism at a specific time.

11
Observing the transcriptome
High-throughput friendly
Genome
Predicts Biology

Regulatory network
Transcriptome
Context dependent and dynamic
Proteome
Li et al., 2004
12
Publications: Expression Profiling vs Proteomics
Data from PubMed
13
Observing the transcriptome?
Classic Human Transcriptome Profiling Studies
Transcriptome reflects Biology
  • Golub et al., Molecular classification of cancer:
    class discovery and class prediction by gene
    expression monitoring. Science 1999.
    ALL: acute lymphoblastic leukemia. AML: acute myeloid leukemia.
  • Scherf et al., A gene expression database for the
    molecular pharmacology of cancer. Nature Genetics 2000.
    60 human cancer cell lines.
14
Observing the transcriptome
  • Focussed Experimental Approaches
  • Northern Blotting Analysis
  • Real time PCR (quantitative or semi-quantitative)
  • High-throughput Approaches
  • Closed System Profiling
  • Microarray expression profiling
  • Open System Profiling
  • Serial analysis of gene expression (SAGE)
  • Massively Parallel Signature Sequencing (MPSS)

15
Red: increase of Cy5 sample transcripts
Green: increase of Cy3 sample transcripts
Yellow: equal abundance
Limit of Detection: 1 in 30,000 transcripts, 20 transcripts/cell
16
Experimental overview
18
Platforms and Formats
  • Isotope
  • Nylon cDNA (300-900 nt)
  • Two-colour
  • Glass
  • cDNA or Oligo (80 nt)
  • 500-11,000 elements
  • Affymetrix
  • Silicon oligo (20 nt)
  • 22,000 elements
  • Tissue Arrays
  • Glass
  • Tissue Discs (20-150)

19
Affymetrix GeneChip
Limits: 1 in 100,000 transcripts, 5 transcripts/cell
20
http://www.affymetrix.com
21
Affymetrix
  • Gene Expression Arrays: Transcripts/Genes
  • Arabidopsis Genome: 24,000
  • C. elegans Genome: 22,500
  • Drosophila Genome: 18,500
  • E. coli Genome: 20,366
  • Human Genome U133 Plus: 47,000
  • Mouse Genome: 39,000
  • Yeast Genome: 5,841 (S. cerevisiae), 5,031 (S. pombe)
  • Rat Genome: 30,000
  • Zebrafish: 14,900
  • Plasmodium/Anopheles: 4,300 (P. falciparum),
    14,900 (A. gambiae)
  • Barley (25,500), Soybean (37,500 + 23,300
    pathogen), Grape (15,700)
  • Canine (21,700), Bovine (23,000)
  • B. subtilis (5,000), S. aureus (3,300 ORFs),
    Xenopus (14,400)

22
Microarray and GeneChip Approaches
  • Advantages
  • Rapid
  • Method and data analysis well described and
    supported
  • Robust
  • Convenient for directed and focussed studies
  • Disadvantages
  • Closed system approach
  • Difficult to correlate with absolute transcript
    number
  • Sensitive to alternative splicing ambiguities

23
Serial Analysis of Gene Expression (SAGE)
  • The principles
  • Velculescu et al., Science 1995
  • A transcript (new or novel) can be recognised by
    a small subset (e.g. 14) of its nucleotides:
    a tag
  • Linking tags allows for rapid sequencing.
  • Open system for transcript profiling
  • Modified SAGE methods
  • LongSAGE (21 nt)
  • SAGE-lite, micro-SAGE, mini-SAGE
  • RASL/DASL methods (5' and 3' Tags)

Figure: 14 nt tags (TAG) taken from polyadenylated transcripts (AAAAAAAAA 3') are linked together (TAG-TAG-TAG-TAG) and sequenced, e.g.
AGCTTGAACCGTGACATCATGGCCATTGGCCCCAATTGAGACAGTGAGTTCAATGC
24
SAGE
  • Advantages
  • Potential open system method: new transcripts
    can be identified
  • Accuracy of unambiguous transcript observation
  • Digital output of data
  • Quantitative and qualitative information
  • Disadvantages
  • Characterising novel transcripts is often
    computationally difficult from short tag
    sequences
  • Tag specificity (recently increased length to 21
    bp)
  • Length of tags can vary (RE enzyme activity
    variable with temperature)
  • A subset of transcripts do not contain enzyme
    recognition sequence
  • Sensitive to a subset of alternative splice
    variants
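
The tag principle and two of the disadvantages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: tags are taken as the 14 nt starting at the 3'-most CATG site (the NlaIII anchoring-enzyme site commonly used in SAGE; the slides do not name the enzyme), and the function names and toy sequences are hypothetical.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumed anchoring-enzyme recognition site (NlaIII)
TAG_LENGTH = 14       # tag length quoted on the slide


def sage_tag(transcript, anchor=ANCHOR_SITE, tag_len=TAG_LENGTH):
    """Return the tag at the 3'-most anchor site, or None if no usable site exists."""
    pos = transcript.rfind(anchor)
    if pos == -1:
        return None  # transcript lacks the recognition sequence
    tag = transcript[pos:pos + tag_len]
    return tag if len(tag) == tag_len else None


def count_tags(transcripts):
    """Digital output: how many observed transcripts yield each tag."""
    counts = Counter()
    for sequence in transcripts:
        tag = sage_tag(sequence)
        if tag is not None:
            counts[tag] += 1
    return counts


# Hypothetical toy transcripts (5' to 3'), not real sequences.
toy_transcripts = [
    "ACGTCATGAAACCCGGGTTTTTAAAAAA",
    "TTCATGAAACCCGGGTAAAAAA",
    "ACGTACGTACGTAAAAAA",  # no CATG site, so it is invisible to this sketch
]
print(count_tags(toy_transcripts))
```

The first two toy transcripts collapse onto one tag even though their sequences differ, which is the tag-specificity problem listed above; the third has no recognition site and is never counted.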

26
Biological question
Sample Attributes
Experimental design Platform Choice
16-bit TIFF Files
Microarray experiment
(Rspot, Rbkg), (Gspot, Gbkg)
Image analysis
Normalization
Statistical Analysis
Clustering
Data Mining
Pattern Discovery
Classification
Biological verification and interpretation
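
The per-spot values produced by image analysis, (Rspot, Rbkg) and (Gspot, Gbkg), are commonly reduced to one background-corrected log ratio per spot before normalization. A minimal sketch, assuming simple background subtraction and a log2 ratio of red over green; the slides do not state which correction was used, and the function name and intensities below are hypothetical.

```python
import math


def log_ratio(r_spot, r_bkg, g_spot, g_bkg):
    """Background-corrected log2(red / green) for one spot.

    Returns None when either channel is at or below background,
    because the log ratio is undefined there.
    """
    red = r_spot - r_bkg
    green = g_spot - g_bkg
    if red <= 0 or green <= 0:
        return None
    return math.log2(red / green)


# Hypothetical 16-bit intensities for three spots.
print(log_ratio(4200, 200, 1200, 200))  # red channel four times brighter
print(log_ratio(1100, 100, 1100, 100))  # equal abundance
print(log_ratio(150, 200, 900, 100))    # red below background
```

A positive value corresponds to a red spot (more Cy5 sample transcript), a negative value to a green spot, and zero to yellow.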
27
Analysis
  • Liver: 47,000 x 2 x 2 datapoints = 188,000
  • Brain: 47,000 x 2 x 2 datapoints = 188,000
  • Lymphocyte: 47,000 x 2 x 2 datapoints = 188,000
28
Analysis
  • Essential problem
  • Given a large dataset with technical and
    biological noise
  • Find
  • A) Transcript patterns (common themes or
    differences)
  • measures of robustness or some idea of
    uncertainty
  • B) Sample similarities or differences between
    samples on global/multi-gene level

29
Analysis
Brain
Liver
Lymphocytes
Which transcripts are different?
What are the patterns?
30
Biologist's Nightmare, Statistician's Playground
  • Characteristics of the expression profiling data
  • High dimensionality
  • Sample number (n) low and observation number (p)
    high
  • Non-independence of observations
  • Complex patterns: visualisation and extraction
  • Incorporation of contextual information
  • Standardisation and data sharing
  • Integration with other data types

31
Analysis Methods
  • Classical parametric and non-parametric statistical
    tests for hypothesis testing
  • Unsupervised clustering algorithms
  • Hierarchical clustering
  • K-means and Self-Organising Maps
  • Classification
  • e.g. Machine learning and Linear discriminant
    analysis
  • Dimensionality Reduction or Principal Component
    Analysis
  • e.g. Gene Shaving and Multi-dimensional Scaling
  • Probabilistic Modelling
  • Dynamic Bayesian Networks
  • Markov Models

32
Analysis Methods
  • Classical Parametric Statistical Analysis

Tools: t-test, ANOVA, Mann-Whitney U test, fold change
Liver
Brain
Lymphocyte
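
Two of the tools listed above, fold change and the t-test, can be written out for a single transcript measured in two tissues. A minimal sketch using only the Python standard library; Welch's unequal-variance form of the t statistic is an assumption (the slide does not say which variant was meant), and the replicate values are made up for illustration.

```python
import math
import statistics


def log2_fold_change(sample_a, sample_b):
    """log2 of the ratio of mean abundances (positive: higher in sample_a)."""
    return math.log2(statistics.mean(sample_a) / statistics.mean(sample_b))


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error


# Hypothetical replicate intensities for one transcript.
liver = [200.0, 220.0, 210.0]
brain = [100.0, 110.0, 105.0]
print(log2_fold_change(liver, brain))
print(welch_t(liver, brain))
```

Fold change ignores replicate variability while the t statistic scales the difference by it, which is why the two can rank transcripts differently.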
33
Analysis Methods
  • Classical Parametric Statistical Analysis

(P < 0.01) 20,000 transcripts tested: about 200 transcripts expected to pass by chance alone
  • Difficulties
  • Assumes that observations are normally
    distributed and independent
  • Statistical significance does not equal
    biological significance
  • Appropriate multiple testing corrections are
    difficult
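
The arithmetic behind the multiple testing problem, and two standard corrections, can be sketched as follows. The slides do not name a correction method; Bonferroni and Benjamini-Hochberg are shown here only as common choices, and the p-values are invented for illustration.

```python
def expected_false_positives(n_tests, alpha):
    """Number of tests expected to pass at level alpha when no effect is real."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(pvalues, fdr=0.05):
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    largest_passing_rank = 0
    for rank, index in enumerate(order, start=1):
        if pvalues[index] <= rank * fdr / m:
            largest_passing_rank = rank
    return sorted(order[:largest_passing_rank])


print(expected_false_positives(20000, 0.01))
print(bonferroni_threshold(20000, 0.01))

# Hypothetical p-values for ten transcripts.
toy_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(toy_pvalues, fdr=0.05))
```

Both corrections still assume something about the dependence between tests, which is part of why the slide calls appropriate corrections difficult for co-regulated transcripts.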

???
34
Analysis Methods
Clustering Approaches
  • Divides or groups genes/samples into groups (clusters),
    based on similarities and differences
  • Number of groups is user defined
Algorithms
  • Hierarchical clustering
  • K-means clustering
  • Self-organising maps
35
Distance Metrics
Distance between 2 expression vectors (profiles plotted against Time)
Euclidean    Pearson(r-1)
1.4          -0.90
4.2          -1.00
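
The contrast between the two metrics can be reproduced on toy data: two profiles with the same shape but different magnitude are far apart by Euclidean distance yet perfectly correlated. A minimal sketch; it reports Pearson's r itself plus 1 - r as one common way to turn it into a distance, which is an assumption and may differ from the transformation used on the slide. The profiles are hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation coefficient r between two expression vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = sum((x - mean_a) ** 2 for x in a)
    spread_b = sum((y - mean_b) ** 2 for y in b)
    return covariance / math.sqrt(spread_a * spread_b)


def correlation_distance(a, b):
    """Assumed convention: 0 for identical shape, 2 for mirror-image shape."""
    return 1.0 - pearson(a, b)


# Hypothetical time courses: same shape at twice the magnitude, and a mirror image.
profile_1 = [1.0, 2.0, 3.0, 4.0]
profile_2 = [2.0, 4.0, 6.0, 8.0]
profile_3 = [4.0, 3.0, 2.0, 1.0]
print(euclidean(profile_1, profile_2), pearson(profile_1, profile_2))
print(euclidean(profile_1, profile_3), pearson(profile_1, profile_3))
```

Which metric is appropriate depends on whether magnitude or shape of the profile is the biologically relevant similarity.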
36
Distance Metric
Transcription Factor Transcript
Target Transcript 1 Target Transcript 2
37
Hierarchical Clustering
g1 is most like g8
g4 is most like g1, g8
38
Hierarchical Tree
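
The merge order that builds such a tree ("g1 is most like g8", then "g4 is most like g1, g8") can be sketched with a small agglomerative procedure. This is an illustration only, assuming Euclidean distance and average linkage, neither of which is specified on the slides; the gene profiles are hypothetical.

```python
import math
from itertools import combinations


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise Euclidean distance between members of two clusters."""
    total = sum(
        math.dist(profiles[x], profiles[y]) for x in cluster_a for y in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def hierarchical_merges(profiles):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], profiles),
        )
        distance = average_linkage(a, b, profiles)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, distance))
    return merges


# Hypothetical expression profiles for four genes.
toy_profiles = {
    "g1": [1.0, 2.0, 3.0],
    "g8": [1.1, 2.1, 3.1],
    "g4": [1.5, 2.5, 3.5],
    "g2": [9.0, 1.0, 0.0],
}
for left, right, distance in hierarchical_merges(toy_profiles):
    print(left, right, round(distance, 3))
```

Each recorded merge is one branch point of the tree, and the merge distance gives the branch height.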
39
Clustering Case Study
  • Sorlie et al., 2001
  • Breast tissue subtypes
  • Hierarchical clustering

40
K-means clustering
Partition or centroid algorithms
Step 1: User specifies K clusters (K = 3)
Figure: Brain Expression Level vs Liver Expression Level, three positions marked x
41
K-means clustering
Step 2: Using Euclidean distance, nearest points are
assigned to clusters (k)
Step 3: New centroids calculated
Figure: K = 3, three positions marked x
42
K-means clustering
Step 4: Points re-assigned to nearest centroid
Step 5: New centroids calculated
K = 3
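
Steps 1 to 5 can be sketched as a short loop. This is a minimal illustration, assuming the initial centroids are supplied by the caller and that iteration stops once the centroids no longer move; the points are hypothetical (liver, brain) expression pairs.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assign each point to its nearest centroid (Euclidean distance).
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)), key=lambda k: math.dist(point, centroids[k])
            )
            clusters[nearest].append(point)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members
            else centroids[k]
            for k, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical (liver, brain) expression levels for nine transcripts, K = 3.
toy_points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
    (8.0, 8.0), (9.0, 9.5), (8.5, 9.0),
    (1.0, 9.0), (2.0, 8.5), (1.5, 9.5),
]
initial_centroids = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
final_centroids, final_clusters = kmeans(toy_points, initial_centroids)
print(final_centroids)
```

The result depends on the starting centroids, so in practice the procedure is usually repeated from several starting positions.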
43
Classification
Figure axes: Transcript A, Transcript B
  • K-nearest neighbour methods (KNN)
  • Linear Discriminant Analysis (LDA)
  • Machine Learning
  • Support Vector Machines
  • Neural Network Analysis
Adapted from Florian Markowetz
44
Classification
Training Set: 2/3 sample set
Test Set: 1/3 sample set
Define Classification Rule: Linear Discriminant Analysis, KNN
Figure axes: Gene A, Gene B
45
Classification: More complex classifiers
Figure axes: Gene A, Gene B
KNN: Voting scheme (k = 3). Use the three closest
points to classify
Adapted from Florian Markowetz
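
The 2/3 training and 1/3 test split and the KNN voting scheme with k = 3 can be sketched together. A minimal illustration under stated assumptions: samples are (expression vector, label) pairs, distance is Euclidean, and the two-gene values and the ALL/AML labels are made-up placeholders rather than data from any study.

```python
import math
import random
from collections import Counter


def split_samples(samples, train_fraction=2 / 3, seed=0):
    """Shuffle reproducibly, then split into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def knn_predict(training_set, query, k=3):
    """Label the query by majority vote among its k nearest training samples."""
    neighbours = sorted(training_set, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical samples: ((Gene A, Gene B), class label).
toy_samples = [
    ((1.0, 1.0), "ALL"), ((1.2, 0.8), "ALL"), ((0.9, 1.1), "ALL"),
    ((5.0, 5.0), "AML"), ((5.2, 4.9), "AML"), ((4.8, 5.1), "AML"),
]
training, test = split_samples(toy_samples)
print(len(training), len(test))
print(knn_predict(toy_samples, (1.1, 0.9)))
print(knn_predict(toy_samples, (5.1, 5.2)))
```

Keeping the test set out of rule definition is what makes the error estimate on it meaningful.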
46
Probabilistic Modelling
  • Incorporate dependencies and prior knowledge
    into the identification of patterns/clusters
  • - relationships in time between samples
  • - relationships between genes
  • Handle measures of uncertainty well
  • Conceptually simple, consideration needed on
    implementation
  • Markov modelling
  • Dynamic Bayesian networks

47
Analysis Methods
  • Classical parametric and non-parametric statistical
    tests for hypothesis testing
  • Unsupervised clustering algorithms
  • Hierarchical clustering
  • K-means and Self-Organising Maps
  • Classification
  • Machine learning and Linear discriminant
    Analysis
  • Dimensionality Reduction or Principal Component
    Analysis
  • Gene Shaving and Multi-dimensional Scaling
  • Probabilistic Modelling
  • Dynamic Bayesian Networks and Pattern
    recognition
  • Markov Models

48
Introduction
  • The scope of transcriptomics
  • A definition of the transcriptome
Part I: Observing the transcriptome
  • Experimental methodology
  • Data curation and analysis pipelines
Part II: Using the transcriptome
  • The regulation of the transcriptome
  • The transcriptome and the genome
  • The transcriptome and the proteome
Beyond the Human Transcriptome
49
... to be continued ...
51
Regulation of Gene Expression
Abundance (transcript): Rate of Transcription, Rate of Decay
Transcription
  • Protein/DNA interactions
  • cis and trans regulatory sequence motifs
  • chromatin structure
  • Methylation
Decay
  • Protein/RNA interactions
  • cis-acting regulatory motifs
  • secondary structure

52
Regulation of Transcription
Wray et al., 2003
53
Regulation of Decay
Stabilisation facilitates rapid increase in potential protein production
Destabilisation facilitates precise time and dose control of transcripts
Figure: Abundance vs Time, for a stable transcript and a decaying transcript
  • Sequence-mediated mRNA decay: AU-rich elements
    (AREs)
  • 3' UTR, 50-150 nucleotides
  • usually multiple copies (e.g. AUUUA x 5)
  • protein recruitment for destabilisation
  • size and content variation (functionally
    critical motif unknown)
  • >30% of vertebrate homologous mRNAs have highly
    conserved elements in the 3' UTR - often
    sequence position
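
How decay rate shapes transcript abundance can be made concrete with a simple kinetic sketch. It assumes constant transcription and first-order decay, a standard textbook simplification that the slides do not state explicitly; the function names, rates and half-lives are hypothetical.

```python
import math


def decay_constant(half_life_hr):
    """First-order decay constant (per hour) for a given mRNA half-life."""
    return math.log(2) / half_life_hr


def abundance_after_shutoff(initial_abundance, half_life_hr, time_hr):
    """Abundance remaining at time_hr after transcription stops."""
    return initial_abundance * math.exp(-decay_constant(half_life_hr) * time_hr)


def steady_state_abundance(transcription_rate_per_hr, half_life_hr):
    """Abundance at which synthesis and decay balance."""
    return transcription_rate_per_hr / decay_constant(half_life_hr)


# Hypothetical unstable (2 hr) and stable (10 hr) transcripts, 100 copies each.
for half_life in (2.0, 10.0):
    remaining = abundance_after_shutoff(100.0, half_life, 4.0)
    print(half_life, round(remaining, 1))
```

Under this model an unstable transcript falls quickly once transcription stops, which is the time and dose control noted above, while a stable transcript reaches a higher steady state for the same transcription rate.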

54
  • The importance of the decay process
  • BMP2 (bone morphogenetic protein 2)
    developmentally critical, highly conserved
    protein in vertebrates (Fritz et al., 2004)
  • 3' UTR conservation
  • - 73 /100 nucleotides, 450 Myr evolution
  • - 95% within mammals
  • Cancer related genes
  • C-fos, C-myc, C-jun, MMP-13, Cyclooxygenase-2,
    Cyclin D, Cyclin E, Cyclins A and B, Cdk
    inhibitors, DNA methyltransferase 1.
  • (Review: Audic and Hartley, 2004)

56
Regulation of Transcription
Diverse orientations, structure and functional
properties of regulatory modules
Wray et al., 2003
57
Regulation of the transcriptome
  • Finding regulatory elements using co-abundant
    transcripts

Assumption: shared abundance profile -> same
cluster -> shared regulatory machinery
Pennacchio and Rubin, 2001
59
The transcriptome and the genome
  • Using the genome to infer/observe the
    transcriptome
  • Construction of whole genome/transcriptome arrays
    and SAGE tags
  • Using sequence features to predict gene
    expression
  • Beer and Tavazoie. Predicting gene expression
    from sequence. Cell 2004
  • Using chromatin structure to predict regulation
    of gene expression
  • Sabo et al. Genome-wide identification of
    DNaseI hypersensitive sites. PNAS 2004
  • Quantitative trait loci mapping
  • Morley et al., Genetic analysis of genome-wide
    variation in human gene expression. Nature 2004
  • Schadt et al., Genetics of gene expression
    surveyed in maize, mouse and man. Nature
    2003

60
Transcriptome Genome
  • Beer and Tavazoie, Cell. 2004

Abundance profile
Predict potential gene expression patterns
Transcription factor binding site
61
Transcriptome Genome
  • Beer and Tavazoie, Cell. 2004

AND Logic, OR Logic
AND Logic
OR Logic, NOT Logic
Combinatorial patterns help identify groups of
transcripts predicted to show similar abundance
profiles
Solid: Actual expression. Dashed: Predicted
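
The idea of combining binding-site motifs with AND, OR and NOT logic can be shown with a toy example. This is not the model used by Beer and Tavazoie, only a hedged sketch of combinatorial rules; the motif names, gene names and promoter contents are hypothetical.

```python
def and_rule(motifs, first, second):
    """True when both motifs are present in the promoter."""
    return first in motifs and second in motifs


def or_rule(motifs, first, second):
    """True when at least one of the motifs is present."""
    return first in motifs or second in motifs


def and_not_rule(motifs, present, absent):
    """True when one motif is present and another is absent."""
    return present in motifs and absent not in motifs


def predict_group(promoters, rule):
    """Genes whose promoter motif content satisfies the rule."""
    return sorted(gene for gene, motifs in promoters.items() if rule(motifs))


# Hypothetical promoter annotations: gene -> set of motifs found upstream.
toy_promoters = {
    "gene1": {"motif_A", "motif_B"},
    "gene2": {"motif_A"},
    "gene3": {"motif_A", "motif_B", "motif_C"},
    "gene4": {"motif_B"},
}
print(predict_group(toy_promoters, lambda m: and_rule(m, "motif_A", "motif_B")))
print(predict_group(toy_promoters, lambda m: or_rule(m, "motif_A", "motif_B")))
print(predict_group(toy_promoters, lambda m: and_not_rule(m, "motif_A", "motif_C")))
```

Genes selected by the same rule are the ones such an approach would predict to share an abundance profile.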
63
The transcriptome and the proteome
  • Functional annotations of co-abundant genes
  • Yang et al., 2003. Decay rates of human mRNAs:
    correlation with functional characteristics and
    sequence attributes. Genome Research.
  • Co-ordinated patterns of decay rates within
    functional classes of transcripts
  • Transcription factor functional classes have
    fast-decaying mRNAs (<2 hr half-lives).
  • Transcripts of multi-subunit proteins have
    correlated decay patterns and rates

64
The transcriptome and the proteome
  • Do they agree?
  • Studies of direct correlation between mRNA
    abundance and protein abundances
  • ( r 0.6) (Hegde et al., 2003)
  • Biological Issues
  • Post-translational modifications
  • Protein stability and folding
  • Alternative splicing products
  • Technical Issues
  • Inter-platform variability (microarray and RT
    PCR r 0.8)
  • Protein abundance measures 2D gel
    electrophoresis

65
The transcriptome and the proteome
  • The integration of transcriptomics and proteomics

Hegde et al., 2003
Synergistic approaches to biological problems
using both transcriptomics and proteomics
66
Beyond the Human Transcriptome
  • Challenges for the Future (short and long term)
  • Integration of different datatypes
  • - sequence, exon structure, transcript
    abundance, protein abundance and function
  • Dealing with alternative splice variants
  • The regulatory processes behind any given RNA
    abundance
  • Dealing with gene ontologies in a quantitative
    manner

67
Beyond the Human Transcriptome
  • Future Directions
  • Open systems for comprehensively cataloguing
    the transcriptome
  • - between tissues/cells/developmental time
    points
  • - between individuals
  • Variation of transcriptome between individuals
  • - coding variants, epigenetic variation and
    inheritance
  • Clinical deployment of transcriptome profiling
    approaches in diagnostics and pharmacogenetics
  • Human Regulatory Network Resources for Tissues

68
Acknowledgements
Oxford Centre for Gene Function
Jotun Hein, Chris Holmes, Gerton Lunter, Lizhong Hao, Ben Holtom, Karen Lees
http://www.stats.ox.ac.uk/taylor/Presentations