The hidden layer of regulatory noncoding RNA in the genetic programming of complex organisms - PowerPoint PPT Presentation

Provided by: chrisb101


1
The hidden layer of regulatory noncoding RNA in
the genetic programming of complex organisms
2
The genetic basis of eukaryotic complexity and
variation
  • The number of protein-coding genes does not scale
    strongly or consistently with complexity

- insects have only just over twice as many
protein-coding genes (~13,500) as yeast
(~6,000) and P. aeruginosa (~5,200).
- insects have ~30% fewer protein-coding genes
than the nematode worm C. elegans (~19,000),
which has only ~1,000 cells.
- vertebrate (human, mouse, fish) protein-coding
gene numbers (~20-25,000) are only slightly
higher than that of C. elegans, and lower than
those of plants (rice ~40,000).
  • The range of protein isoforms (by alternative
    splicing, RNA editing, post-translational
    modification etc.) does appear to increase with
    complexity - but this also requires an increase
    in regulation.
  • The relative amount of noncoding DNA also scales
    with complexity.

3
The proportion of noncoding DNA broadly increases
with developmental complexity
Note: This analysis corrects for varying ploidy.
The trend is somewhat noisy because of variable
amounts of repetitive sequences (of unknown
function), but it is not negated by upward
exceptions (e.g. amphibians).
R.J. Taft and J.S. Mattick,
http://genomebiology.com/2003/5/1/P1
4
The problem of development
Unicellular → colonial, limited differentiation
10^14 positionally distinct cells, with precise
architecture and differentiated function
5
The programming of complex objects
  • Complex objects / organisms require two levels of
    programming:
    - the specification of the structural and
      functional components (including signalling
      devices / systems)
    - the design plans for the assembly of these
      components
  • Differences in complex objects (e.g. aircraft or
    mammals) occur by variation both of components
    and of their architectural assembly

- the genetic (phenotypic) impact of damage
(mutation) to these different types of
specification is quite different
  • How is assembly (differentiation and development)
    programmed?
  • The assumption is that the combinatorics
    of regulatory factors intersecting with
    environmental cues provide enough information to
    control the trajectories of differentiation
    and development

- the problem is not generating complexity per
se, but rather controlling complex trajectories
- this requires enormous amounts of information
6
The genetic basis of human complexity and
variation
  • 98% of the transcriptional output in humans is
    noncoding RNA

- 95-97% of the primary transcript of
protein-coding genes is intronic
- there are enormous numbers of noncoding RNA
genes in the mammalian genome, which are only
now beginning to be recognized, and which
appear to account for between 1/2 and 3/4 of all
transcripts
  • The majority of the human genome is transcribed

- 1.9% exonic (1.2% protein-coding) × 20
indicates that 30-40% of the genome is
transcribed just to account for protein-coding
genes
- if there is an equal number of noncoding RNA
transcripts, then >60% is transcribed
- a direct summation of known genes, mRNAs
and spliced ESTs from the UCSC database shows
that (a minimum of) 58% of the human genome is
transcribed, 24% from both strands (total 2.3 Gb).
  • Either the genome is replete with useless
    transcription or these non-protein-coding RNAs
    are fulfilling some unexpected function
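The arithmetic behind these estimates can be sketched in a few lines of Python. This is a back-of-the-envelope model, not the authors' calculation. It makes two assumptions. First, the factor of 20 reflects exons making up roughly 5% of a primary transcript, in line with the 95-97% intronic figure. Second, noncoding transcripts would add as much again. The function name is hypothetical.

```python
def transcribed_fraction(exonic_pct: float, transcript_to_exon_ratio: float) -> float:
    """Hypothetical helper: % of genome covered by primary transcripts,
    given the exonic % and the primary-transcript/exon length ratio (capped at 100)."""
    return min(100.0, exonic_pct * transcript_to_exon_ratio)


# 1.9% of the genome is exonic; a ratio of 20 assumes exons are ~5% of each primary transcript
coding_only = transcribed_fraction(1.9, 20)

# Assumption: an equal number of noncoding RNA transcripts doubles the coverage (capped at 100%)
with_ncrna = min(100.0, 2 * coding_only)

print(f"protein-coding loci alone: ~{coding_only:.0f}% of the genome")
print(f"with equal ncRNA output:   ~{with_ncrna:.0f}% of the genome")
```

With these inputs the sketch gives ~38% and ~76%. These values are consistent with the 30-40% and >60% quoted above.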

7
Central dogma and the E. coli lac paradigm
genes are synonymous with proteins
8
The central dogma states that genetic information
flows from DNA to RNA to protein
This is usually interpreted to mean that genetic
information flows from DNA to protein (via mRNA),
i.e. that genes are generally synonymous with
proteins
This is true in prokaryotes, whose genomes
consist (75-95%) of wall-to-wall protein-coding
sequences flanked by 5' and 3' regulatory
elements.
Thus proteins comprise not only the functional
and structural components of prokaryotic cells,
but are also the agents by which the system is
regulated in conjunction with cis-elements and
environmental signals.
It has been assumed that the latter also applies
in eukaryotes, with the logical extension (i)
that the increased complexity of eukaryotes is
explained by the combinatorics of regulatory
factors intersecting with more complex promoters
etc., and the corollary (ii) that the vast
tracts of non-protein-coding sequences in
eukaryotic genomes (98.8% in humans) are largely
evolutionary junk.
These assumptions are now articles of faith, but
they are not necessarily correct.
9
Eukaryotic gene structure
Mosaic protein coding sequence
intron
exon
transcription factors
Transcription
pre-mRNA
Splicing
Nucleus
?
intronic RNA
mRNA (alt)

Cytoplasm
Translation
protein (alt)
Possibility 1: intronic RNA is non-functional
Possibility 2: intronic RNA is functional
10
If one considers the reasonable (indeed more
plausible) alternative, i.e. that intronic RNA is
functional, then there are a number of logical
extensions
  • Genetic information is being expressed both as
    RNA and as proteins, and intronic RNA is
    transmitting secondary information in parallel
    with protein-coding sequences, which must be
    involved in networking of gene activity. Thus
    the genetic operating system is different between
    eukaryotes and prokaryotes.
  • Some, perhaps many, genes will have evolved only
    to express RNA signals, as higher order
    regulators in these networks.
  • These RNAs must be processed post-splicing and be
    transmitting information via RNA-DNA, RNA-RNA and
    RNA-protein interactions, presumably
    sequence-specifically. This equates to a
    quasi-digital feed-forward regulatory system that
    would, in theory, permit integration of complex
    suites of gene activity and regulatory regimes,
    and the programming of gene expression profiles
    throughout the trajectories of differentiation
    and development.

11
Prokaryotic gene → mRNA → protein
  → catalytic function, structural role, regulation
Eukaryotic gene → mRNA and/or eRNA → protein
  → catalytic function, structural role, regulation
  (hidden layer: eRNA networking functions)
12
  • Not all introns have evolved function, as each
    intron is the descendant of an independent
    insertion event and is evolving independently,
    albeit in the context of its host transcript,
    as exemplified by the intron distribution in Fugu
    rubripes.

13
  • Evidence that prokaryotic complexity is limited
    by (protein-based) regulatory overhead
    (accelerating cost)
  • Evidence that complex genetic phenomena in
    eukaryotes are mediated by RNA signalling
  • Evidence that small regulatory RNAs are derived
    from introns and control many aspects of animal
    and plant development
  • Evidence for large numbers of noncoding RNAs in
    mammals that are developmentally regulated
  • Evidence that noncoding sequences have unexpected
    blocks of conservation (including
    ultra-conservation and transposon-free regions)
    and that a large fraction of the genome may be
    functional and under selection

14
How is regulation expected to scale with function?
  • The numbers of regulators must rise as a
    non-linear (quadratic) function of the number of
    regulons (genes or co-regulated modules of genes)
    - in prokaryotes operons, in eukaryotes
    (mostly) individual genes, and splice variants
    thereof.
  • Each new regulon requires at least one new
    regulator (or regulatory combination), and an
    additional higher-order regulator, depending on
    the degree of required connectivity (i.e.
    coordination) with other genes or suites of genes
    in the network. The existing regulatory network
    of the cell also has to be expanded to integrate
    the activity of the new regulon if the system is
    not to become disconnected.

$r \propto n^2$ (r = number of regulators, n = number of regulons)
These regulatory networks are accelerating, and
thus there will be a functional (and complexity)
limit imposed by the rising cost of regulatory
overhead, until and unless the regulatory system
undergoes a physical state transition (e.g.
analog → digital).
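A toy model makes the consequence concrete. For illustration only, assume that regulators scale exactly as $r = c\,n^2$, where c is a hypothetical constant. The fraction of all genes devoted to regulation is then $r/(n + r) = cn/(1 + cn)$. This fraction climbs toward 1 as n grows and reaches one half at n = 1/c.

```python
def regulatory_fraction(n_regulons: int, c: float) -> float:
    """Fraction of all genes that are regulators if r = c * n**2 (toy model)."""
    regulators = c * n_regulons ** 2
    return regulators / (n_regulons + regulators)


C = 1e-4  # hypothetical scaling constant, chosen only for illustration

for n in (1_000, 5_000, 10_000, 50_000, 100_000):
    r = C * n ** 2
    print(f"n = {n:>7,}  regulators = {r:>11,.0f}  "
          f"regulatory fraction = {regulatory_fraction(n, C):.2f}")
```

The value of C is arbitrary; only the shape of the curve matters. Each added regulon costs progressively more regulatory overhead, which is the "accelerating" property described above.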
15
Prokaryotic complexity is limited by regulatory
overhead
$R = 0.0000163\,N^{1.96}$ ($r^2 = 0.88$; 95% confidence
limits for the exponent: 1.81-2.11)
  • Prokaryotic genomes have a maximum size of ~12 Mb
  • The numbers of regulators in prokaryotes scale
    as a square function of genome size (number of
    regulons).

Larry Croft, Martin Lercher, Michael Gagen and
John Mattick, http://au.arxiv.org/abs/q-bio.MN/0311021
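A power law like this is conventionally fitted as a straight line in log-log space, $\log R = \log a + b \log N$. The sketch below illustrates that technique; it is not the authors' analysis. It assumes N is the number of genes and R the number of regulators. It fits synthetic, noise-free points generated from the reported equation rather than real genome data, so it simply recovers a ≈ 0.0000163 and b ≈ 1.96.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on log(y) versus log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    sxx = sum((u - mean_x) ** 2 for u in lx)
    b = sxy / sxx                      # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept back-transformed = prefactor
    return a, b


# Synthetic points generated from the reported relation (NOT real genome data)
A_REPORTED, B_REPORTED = 0.0000163, 1.96
genes = [500, 1_000, 2_000, 4_000, 8_000]
regulators = [A_REPORTED * n ** B_REPORTED for n in genes]

a, b = fit_power_law(genes, regulators)
print(f"R = {a:.3g} * N^{b:.2f}")
```

On real data the points scatter, and the confidence interval on b (1.81-2.11 above) is what supports the "square function" reading.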
16
Accelerating networks
John S. Mattick and Michael J. Gagen
Institute for Molecular Bioscience, University of
Queensland, Brisbane, Qld 4072, Australia
Simple connection networks like telephone
exchanges or the internet can grow in an
unconstrained way, giving, for instance,
scale-free networks with size-independent
connectivity statistics. In contrast, networks
that must rapidly integrate and globally respond
to information from any component node require
network connectivity and regulatory overhead to
scale faster than linearly with network size.
These accelerating networks suffer decreasing
returns which inherently impose an upper limit on
network size and system complexity which can only
be surmounted by transforming the physical nature
of the connection and regulatory architecture.
This has implications for society, engineering
and biology wherein the structure and
connectivity of control networks ultimately
determine the functional complexity that
integrated systems can attain.
Science 307, 856-858 (2005)
17
A simplified biological history of the Earth
[Figure: complexity plotted against time (mya, from
-4,000 to the present): archaea and eubacteria,
then single-cell eukaryotes (protista) in the
unicellular world; sponges, fungi, plants and
animals in the multicellular world]
18
  • Complex genetic and molecular genetic phenomena
    in eukaryotes are all related to RNA signalling
    through intersecting pathways
  • - co-suppression - RNA mediated (systemic)
  • - transgene silencing - RNA mediated
  • - RNAi - RNA processing / involved in normal
    development
  • - imprinting - involves ncRNAs and methylation
  • - methylation - RNA directed
  • - transvection - probably RNA mediated
  • - transinduction, position effect variegation -
    RNA mediated
  • Other observations
  • - zinc-finger proteins, Y-box proteins, and
    other transcription factors have high
    affinity for nucleic acid structures involving
    RNA
  • - chromatin-modifying proteins (containing CD,
    SET and methyl-DNA-binding domains, and DNA
    methyltransferases) bind RNA
  • - all snoRNAs in animals and plants are
    processed from introns
  • - there are large numbers of small regulatory
    RNAs (microRNAs) that control various aspects of
    animal and plant development

19
Multiple microRNAs (miRs) are produced by
processing of introns of protein-coding
transcripts and the introns and exons of
noncoding RNAs
Lagos-Quintana et al. Science 294, 853-858 (2001);
Lau et al. Science 294, 858-862 (2001);
Lee and Ambros Science 294, 862-864 (2001);
"Glimpses of a tiny RNA world", Ruvkun
Science 294, 797-799 (2001)
20
microRNAs are located in the introns of an
imprinted noncoding RNA gene Seitz et al. (2003)
Nature Genetics 34, 261-262.
21
(No Transcript)
22
MicroRNA-Directed Cleavage of HOXB8 mRNA
Fig. 1. Genomic organization of HOX clusters.
Colored arrows indicate HOX genes representing 13
paralogous groups; black arrowheads depict miRNA
genes. Repression supported by bioinformatic
evidence only (dotted red line), cell-culture and
bioinformatic evidence (dashed line), or in vivo,
cell-culture and bioinformatic evidence (solid
line) is indicated. The vertical red line
indicates that miRNAs from any of the three loci
could repress the targets. From Yekta et al.
(2004) Science 304, 594-596
23
Validation of miRNA targets in the Notch pathway
miR-7: Notch genes (GY-box motif); miR-2:
proapoptotic genes; miR-227: metabolic pathway
miR-7 downregulates sensor genes. (C)
Expression of the miR-7 sensor transgene is shown
in green. Expression of the red fluorescent
protein miR-7 miRNA under ptcGal4 control is
shown in red. The right panel shows the miR-7
sensor alone. (D and E) Expression of the m4 3'
UTR and hairy 3' UTR sensor transgenes (green)
was downregulated by miR-7 (red). Cut protein,
shown in blue, was downregulated in miR-7-
expressing cells. The right panel shows a second
example of Cut repression. The lower panel shows
the Cut channel alone. (G) Cuticle preparations of
a wild-type adult wing and a wing expressing miR-7
under ptcGal4 control in the region between
veins 3 and 4.
Stark et al. PLoS Biol. 1, 397-409 (2003)
24
The tip of the iceberg?
miRNAs control many aspects of animal and plant
development, including developmental timing, cell
proliferation, left-right patterning, floral
development, neuronal cell fate, apoptosis,
hematopoietic differentiation, adipocyte
differentiation and insulin secretion, and have
been shown to be perturbed in cancer.
Most miRNAs have been identified
bioinformatically on the basis of a
double-stranded precursor, a match to an mRNA
target (usually the 3'-UTR) and their
evolutionary conservation.
The known miRNAs have multiple targets, which
imposes severe constraints on their evolution.
On the other hand, siRNAs and known noncoding RNAs
like Xist and H19 can and do evolve quickly.
This may contribute to the perception that
genomic sequences encoding such ncRNAs are (in
general) drifting neutrally and (therefore)
non-functional.
The level of selection pressure on such sequences
(as signaling molecules largely dependent on
primary sequence recognition) will be a function
of the number of interactions that must be
maintained rather than the precise sequence
itself. Those with one or few interacting
partners will be able to evolve relatively freely
and also explore new (and quite plastic)
connections in regulatory networks.
Thus most regulatory RNAs, including the majority
of miRNAs, may not show strong evidence of
sequence conservation over significant
evolutionary distances.
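Matching a miRNA to a 3'-UTR target is one of the bioinformatic criteria listed above. It is often approximated by searching the UTR for sites complementary to the miRNA "seed", nucleotides 2-8 from its 5' end. The slide does not say which method was used. The sketch below shows only this simple seed-match idea, using hypothetical sequences and Watson-Crick pairs only. It ignores G:U wobble, site context and conservation, all of which real target predictors take into account.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str, start: int = 2, end: int = 8) -> list[int]:
    """0-based positions in `utr` perfectly complementary to miRNA
    nucleotides start..end (1-based, inclusive; default nt 2-8)."""
    seed = mirna[start - 1:end]
    site = reverse_complement(seed)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


# Hypothetical sequences, invented for illustration
MIRNA = "UGAGCUUACGGAUCCAAUGGCA"      # seed (nt 2-8) = GAGCUUA
UTR = "AAAUAAGCUCGGGCCCUAAGCUCAA"     # contains two copies of the site UAAGCUC

print(seed_sites(MIRNA, UTR))
```

A miRNA with several matching UTRs would have several "targets". This illustrates the point above that multiple targets constrain how freely a miRNA can evolve.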
25
(No Transcript)
26
(No Transcript)
27
Emerging evidence for large numbers of noncoding
RNAs
  • Transcript profiling using Affymetrix whole
    chromosome tiling arrays has revealed that the
    human transcriptome is amazingly complex, with
    many interlacing and overlapping transcripts, at
    least half of which do not encode proteins [see
    Cawley et al., Cell (2004); Cheng et al., Science
    (2005); Kapranov et al., Genome Research (2005)].
  • Detailed molecular analysis of many loci, e.g.
    β-globin, bithorax, callipyge, Igf2, shows that
    the majority of transcripts are noncoding.
  • There are around 20,000 pseudogenes in the human
    genome [Harrison et al., Genome Res. (2002)]. At
    least one is functional as an RNA that regulates
    the expression of its homologous protein-coding
    gene [Hirotsune et al., Nature (2003)]. Many
    human genes have antisense transcripts [Yelin et
    al., Nature Biotech. (2003)].
  • There are large numbers of noncoding sequences in
    well-constructed cDNA libraries [Okazaki et al.,
    Nature (2002); Carninci et al., Science (2005)]:
    2 × 10^6 cDNAs, 20 × 10^6 transcription start
    sites (TSSs).

28
  • RIKEN (Tokyo) and collaborators have cloned and
    analysed 60,770 full-length cDNAs [Okazaki et al.,
    Nature (2002)]. (A more extensive study on
    >100,000 cDNAs has just been published.)
  • These cDNAs have been grouped into 33,409
    transcription clusters, of which 15,815 (47%) do
    not contain any substantial open reading frame
    and are probable ncRNAs. Almost 30% of these are
    spliced.
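The 47% figure follows directly from the two counts quoted above. A one-line check:

```python
clusters = 33_409   # transcription clusters, as quoted above
no_orf = 15_815     # clusters without a substantial open reading frame

share = no_orf / clusters
print(f"{share:.1%} of transcription clusters are probable ncRNAs")
```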

29
(No Transcript)
30
Hierarchical clustering of up-regulated putative
ncRNAs
31
Tissue distribution of up-regulated putative
ncRNAs
[Figure: number of differentially expressed
putative ncRNAs showing >5-fold up-regulation
relative to the day 17.5 (d17.5) embryo, by tissue
(y-axis 0-25): skin, lung, bone, brain, colon,
heart, liver, kidney, testis, uterus, muscle,
spleen, adipose, thymus, stomach, pancreas,
placenta, cerebellum, small intestine, neonatal
skin, neonatal cerebellum]
32
ncRNAs are developmentally expressed and can
exhibit multiple splice forms
33
Dynamic regulation of ncRNAs by physiological
stimuli (exposure of macrophages to bacterial
lipopolysaccharide)
34
(No Transcript)
35
(No Transcript)
36
(No Transcript)
37
Non-coding RNA database at IMB
RNAdb: a comprehensive mammalian noncoding RNA
database. Ken C. Pang, Stuart Stephen, Pär G.
Engström, Khairina Tajul-Arifin, Weisan Chen,
Claes Wahlestedt, Boris Lenhard, Yoshihide
Hayashizaki, and John S. Mattick. Nucleic Acids
Research 33 (Database Issue), D125-D130 (2005)
>20,000 sequences
http://research.imb.uq.edu.au/RNAdb
38
Rapid evolution of noncoding RNAs: lack of
conservation does not mean lack of function
(K.C. Pang, M.C. Frith and J.S. Mattick, Trends
in Genetics, in press)
39
Functions of regulatory RNA signals
  • Chromatin modification (evidence very strong)
  • Transcriptional regulation (some examples)
  • Control of alternative splicing (strongly
    predicted)
  • RNA modification and editing (snoRNAs, Alu
    elements)
  • Control of mRNA turnover (RNAi / miRNAs / siRNAs)
  • Control of translation (miRNAs)
  • Signal transduction (RNA binding proteins with
    signaling domains)

40
Conservation of nucleotide sequences around
alternative splice sites
No changes in 25 bases around indel alternative
exon in any vertebrate
No changes in 67 bases around exon-intron
junction in any mammal
41
(No Transcript)
42
(No Transcript)
43
Paper
44
Ultra-conserved elements: sequences frozen in
vertebrates
human-mouse-rat
  • Many more, and larger, when allowing small
    substitutions and indels.
  • All are intergenic or intronic (some overlap
    alternative splice sites).
  • Far more conserved than protein-coding sequences.
    Very low probability of finding even one
    ultra-conserved sequence by chance (< 10^-22).
  • Most are conserved in chickens, two-thirds (core)
    conserved in fishes.
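Finding sequences that are "frozen" across species reduces, in its simplest form, to scanning a multiple alignment for runs of columns in which every species carries the same base. The minimal sketch below assumes an already computed, equal-length alignment of hypothetical human, mouse and rat sequences, with "-" marking gaps. `min_len` is a free parameter; a widely cited human-mouse-rat screen used segments longer than 200 bp. Real analyses start from whole-genome alignments, which this sketch takes as given.

```python
def conserved_runs(aligned: list[str], min_len: int) -> list[tuple[int, int]]:
    """Half-open (start, end) column ranges of length >= min_len in which all
    aligned sequences carry the identical base; gap columns never count."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    runs, start = [], None
    for i in range(length):
        column = {seq[i] for seq in aligned}
        identical = len(column) == 1 and "-" not in column
        if identical and start is None:
            start = i                              # a conserved run begins
        elif not identical and start is not None:
            if i - start >= min_len:
                runs.append((start, i))            # close a long-enough run
            start = None
    if start is not None and length - start >= min_len:
        runs.append((start, length))               # run reaching the end
    return runs


# Hypothetical toy alignment (human, mouse, rat)
alignment = ["ACGTACGTAA", "ACGTACGTCA", "ACGTTCGTAA"]
print(conserved_runs(alignment, min_len=3))
```

Relaxing `identical` to tolerate occasional mismatches or indels would capture the "many more, and larger" elements mentioned above.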

45
Evolution Slowed for 300 M Years
[Figure: alignment of ultra-conserved elements in
progressively more distant vertebrate lineages,
including amniotes, each compared with the
genome-wide background (time axis 5-400 Mya)]
Frozen unchanged for 85 million years!
Only 5 go back past 525 million years (Ciona,
Chordata), unlike protein-coding and rRNA
sequences.
46
Many ultra-conserved elements are associated with
extended genomic regions that lack transposon
insertions
CPLX1
18 kb Transposon-free region (TFR)
47
PRDM16 PR-domain zinc finger protein 16
48
  • NR2F1 (60 kb TFR)
  • nuclear receptor subfamily 2, group F, member 1
    (COUP transcription factor 1 )
  • Transposon-free in six species
  • human
  • mouse
  • rat
  • dog
  • opossum
  • chicken

85% of human TFRs overlap with mouse TFRs
52% overlap with opossum TFRs
Most transposons have entered independently in
each lineage
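Each overlap percentage above amounts to asking what fraction of one set of intervals touches at least one interval in another. The minimal sketch below assumes both sets are already in the same (human) coordinate system, for example after mouse TFRs have been projected onto human coordinates through a whole-genome alignment. The coordinates are made up for illustration. The all-pairs check is O(n·m), which is adequate for a few thousand regions.

```python
def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_overlapping(query, reference) -> float:
    """Fraction of intervals in `query` overlapping any interval in `reference`."""
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return hits / len(query)


# Hypothetical coordinates on one human chromosome
human_tfrs = [(10_000, 25_000), (80_000, 95_000), (200_000, 212_000), (400_000, 415_000)]
mouse_tfrs_on_human = [(12_000, 30_000), (85_000, 90_000), (500_000, 520_000)]

print(f"{fraction_overlapping(human_tfrs, mouse_tfrs_on_human):.0%} of human TFRs overlap")
```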
49
Transposon-free regions in the human genome
Cas Simons, Michael Pheasant, Igor V. Makunin, and
John S. Mattick
Institute for Molecular Bioscience, University of
Queensland, Brisbane QLD 4072, Australia
Despite the presence of over 3 million transposons
separated on average by approximately 500 bp, the
human and mouse genomes each contain almost 1,000
transposon-free regions (TFRs) over 10 kb in
length. The majority of human TFRs correlate with
orthologous TFRs in the mouse, despite the fact
that most transposons are lineage specific. Many
human TFRs also overlap with orthologous TFRs in
the marsupial opossum, indicating that these
regions have remained refractory to transposon
insertion for long evolutionary periods. Most
TFRs are not associated with unusual nucleotide
composition but are significantly associated with
genes encoding developmental regulators,
suggesting that they represent extended regions
of regulatory information that are largely unable
to tolerate insertions, a conclusion difficult to
reconcile with current conceptions of gene
regulation.
submitted for publication
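The TFR definition used above is a stretch of more than 10 kb with no transposon. It can be expressed as a gap search over sorted transposon annotations. This is a sketch, not the authors' pipeline. It assumes half-open (start, end) transposon coordinates on a single chromosome, taken from some repeat annotation that the source does not name. Overlapping elements are merged, and the function reports gaps between consecutive elements that exceed `min_gap`. Chromosome ends and assembly gaps are ignored, and the example coordinates are invented.

```python
def transposon_free_regions(transposons, min_gap=10_000):
    """Half-open gaps longer than `min_gap` bp between consecutive
    (merged) transposon intervals on one chromosome."""
    merged: list[list[int]] = []
    for start, end in sorted(transposons):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)   # extend overlapping/abutting element
        else:
            merged.append([start, end])
    return [
        (left[1], right[0])
        for left, right in zip(merged, merged[1:])
        if right[0] - left[1] > min_gap
    ]


# Invented transposon coordinates on one chromosome
elements = [(0, 300), (250, 600), (900, 1_200), (15_000, 15_400),
            (16_000, 16_300), (40_000, 40_500)]
print(transposon_free_regions(elements))
```

With an average spacing of ~500 bp, gaps of this size would be rare by chance. This is why the regions are noteworthy.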
50
The percentage of the human genome under negative
selection has been estimated at 5% [Waterston et
al. (2002) Nature 420, 520-562], a figure which
has recently been revised upwards to at least 10%
[Smith et al. (2004) Genomics 84, 806-813].
Recent studies on the CFTR and SIM2 loci in many
different mammalian species have shown that
noncoding regions of vertebrate genomes are
conserved with interesting patterns that are not
obvious from pairwise comparisons alone.
That is, sequences that may not be conserved
between human and mouse are conserved, e.g.,
between human and dog or cow, and much more
than 5% of the genome is under selection, both
negative and positive (associated with phenotypic
divergence).
These patterns suggest that some regulatory
sequences are conserved over long distances,
indicating that they are relevant to some general
aspect of vertebrate, amniotic or mammalian
biology, whereas others are clade- or
species-specific, relevant to some particular
aspect of the biology of the group or species in
question.
These findings are difficult to reconcile with
protein-based models of gene regulation, but not
with the hypothesis that much of the transcribed
noncoding RNA is functional, in which case one
would predict that the majority of the genome is
under selection (strong and weak, positive and
negative), albeit evolving rapidly at many loci.
51
A large fraction of the mammalian genome is under
evolutionary selection
Actual landscape
Simulated landscape
52
Revised definition of gene and flow of genetic
information
gene (transcription unit / cluster)
transcription
primary transcript
splicing
exons introns
networking
processing
assembly
mRNA or ncRNA
snoRNAs microRNAs
protein
other functions
catalytic functions structural roles
signal transduction and regulation of gene
expression
53
Why has this system gone unnoticed?
  • Intellectually unprepared
  • Genetically subtle
  • Biochemically invisible

54
CONCLUSIONS
  • The majority of the genome of complex organisms
    likely consists of an expanded regulatory
    architecture primarily transacted via digital RNA
    signals. This is a parallel processing system, a
    hidden layer, which is the primary endogenous
    control system that directs differentiation and
    development.
  • The majority of the genome of complex organisms
    is likely to be functional and under evolutionary
    selection, both positive and negative. Variation
    in noncoding sequences almost certainly accounts
    for most of the phenotypic variation between
    species and individuals, including quantitative
    trait variation and susceptibility to complex
    diseases, and will be disturbed in aberrant
    developmental states, particularly cancer.
  • Accelerating regulatory requirements place
    complexity limits on all integrated systems.
    Understanding the genomic programming of humans
    and other complex organisms may transform
    information science and technology, including
    complexity theory, network theory, and the design
    of autopoietic (self-assembling) systems.