Comparative genomics: functional characterization of new genes and regulatory interactions using computer analysis

1
Comparative genomics: functional
characterization of new genes and regulatory
interactions using computer analysis
  • Mikhail Gelfand
  • Institute for Information Transmission
    Problems (The Kharkevich Institute), RAS
  • Workshop at the Landau Institute for Theoretical
    Physics, RAS
  • September 27-28, 2007, Moscow

2
The genome is deciphered!
3
Is it?
  • To intercept a message does not mean to
    understand it

4
Fragment of a genome (0.1% of E. coli)
A typical bacterial genome: several million
nucleotides, 600 through 9,000 genes (90% of
the genome encodes proteins)
5
Propaganda
sequences in GenBank (genes)
articles in PubMed (experiments)
6
More propaganda
  • Most genes will never be studied experimentally
  • Even in E. coli, only 20-30 new genes are
    characterized per year (hundreds are still
    uncharacterized)
  • Universally missing genes: not a single known
    gene even for 10 reactions of the central
    metabolism. No genes for >40 reactions overall.
  • Conserved hypothetical genes (5-15% of any
    bacterial genome): essential, but of unknown
    function.

7
The local goal: to characterize the genes
  • What?
      • function (rather, role)
  • When?
      • regulation (conditions)
      • gene expression
      • lifetime (mRNA, protein)
  • Where?
      • localization
      • cellular/membrane/secreted
  • How?
      • mechanism of action
      • specificity, regulation (biochemistry)

8
Propaganda-2: complete genomes
2007: >1200 bacterial genomes
9
The global goal
  • to predict the organism's properties given its
    genome
  • (plus some additional information, e.g. the
    initial state after cell division)
  • and to understand the evolution of
    genomes/organisms

10
Haemophilus influenzae, 1995
11
Vibrio cholerae, 2000
12
The metabolic map, the bird's view
13
Metabolic pathways, the eagle's view
14
A submap (metabolism of arginine and proline)
15
Approaches
  • Similarity => homology (common origin)
  • Homology => common function
  • The Pearson Principle (after Karl
    Pearson): important features are conserved
      • functional sites in proteins
      • regulatory (protein-binding) sites in DNA
      • not necessarily sequences:
          • structure of protein and RNA
          • gene localization on chromosomes
          • co-expression of genes
  • Allows one to annotate 50-75% of genes in a
    bacterial genome
  • Necessary first step, may be automated (to some
    extent)

16
but not so simple
  • Similarity ≠ homology
      • Low complexity regions, unstructured domains,
        transmembrane segments and other regions with
        non-standard amino acid composition
      • The need for correct similarity measures
      • Does homology always follow from structural
        similarity?
      • What is structural similarity? How can it be
        measured?
      • Convergent evolution of structures? Independent
        emergence of folds?
  • Homology ≠ same function
      • What is the same function?
      • Biochemical details and cellular role

17
The Fermi principle
  • (after Enrico Fermi)
  • Purely homology-based annotation: boring (nothing
    radically new)
  • It turns out, one can predict something
    completely new
  • Comparative genomics

18
Positional clustering
  • Genes that are located in immediate proximity
    tend to be involved in the same metabolic pathway
    or functional subsystem
      • caused by operon structure, but not only:
      • horizontal transfer of loci containing several
        functionally linked operons
      • compartmentalisation of products in the cytoplasm
  • very weak evidence
      • stronger if observed in many unrelated genomes
  • May be measured
      • e.g. the STRING database/server (P. Bork, EMBL)
      • and other sources
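
A minimal sketch of how positional clustering could be counted across genomes. The gene orders, the genome names, the placeholder genes (x1, x2, ...) and the window size are all invented for illustration, and this is not the scoring scheme used by STRING.

```python
from collections import Counter

# Hypothetical gene orders for three unrelated genomes (toy data).
GENOMES = {
    "genome_A": ["trpA", "trpB", "trpC", "x1", "x2", "aroE"],
    "genome_B": ["x3", "trpB", "trpA", "x4", "aroE", "trpC"],
    "genome_C": ["trpC", "x5", "x6", "trpB", "trpA", "x7"],
}

def neighbor_pairs(order, window=1):
    """Return the unordered gene pairs that lie at most `window` positions apart."""
    pairs = set()
    for i, gene in enumerate(order):
        for other in order[i + 1 : i + 1 + window]:
            pairs.add(frozenset((gene, other)))
    return pairs

def clustering_counts(genomes, window=1):
    """Count in how many genomes each gene pair is found close together."""
    counts = Counter()
    for order in genomes.values():
        counts.update(neighbor_pairs(order, window))
    return counts

counts = clustering_counts(GENOMES)
trpAB = counts[frozenset(("trpA", "trpB"))]
trpBC = counts[frozenset(("trpB", "trpC"))]
print("trpA-trpB adjacent in", trpAB, "genomes; trpB-trpC in", trpBC)
```

A pair that is adjacent in one genome is weak evidence; a pair that stays adjacent in many unrelated genomes is a much stronger candidate for a functional link.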

19
STRING: trpB positional clusters
20
Functionally dependent genes tend to cluster on
chromosomes in many different organisms
Vertical axis: number of gene pairs with
association score exceeding a threshold. Control:
same graph, random re-labeling of vertices
21
More genomes (stronger links) => highly
significant clustering
22
Fusions
  • If two (or more) proteins form a single
    multidomain protein in some organism, they all
    are likely to be tightly functionally related
  • Very useful for the analysis of eukaryotes
  • Sometimes useful for the analysis of prokaryotes

23
STRING: trpB fusions
24
Phyletic patterns
  • Functionally linked genes tend to occur together
  • Enzymes with the same function (isozymes) have
    complementary phyletic profiles

25
STRING: trpB co-occurrence (phyletic patterns)
26
Phyletic patterns in the Phe/Tyr pathway
shikimate kinase
27
Archaeal shikimate-kinase
Chorismate biosynthesis pathway (E. coli)
28
Arithmetics of phyletic patterns
Shikimate dehydrogenase (EC 1.1.1.25), AroE, COG0169: aompkzyqvdrlbcefghsnuj-i--
5-enolpyruvylshikimate 3-phosphate synthase (EC 2.5.1.19), AroA, COG0128: aompkzyqvdrlbcefghsnuj-i--
Chorismate synthase (EC 4.2.3.5), AroC, COG0082: aompkzyqvdrlbcefghsnuj-i--
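
The "arithmetic" can be sketched with set operations on patterns. The pathway pattern below is the one shown on the slide; the two kinase patterns are hypothetical, made up only to illustrate how two alternative enzymes with complementary patterns can add up to the pattern of the rest of the pathway.

```python
# Pattern shared by AroE, AroA and AroC on the slide
# (one letter per genome, '-' means the gene is absent).
PATHWAY = "aompkzyqvdrlbcefghsnuj-i--"

# Hypothetical patterns for two alternative shikimate kinases (toy data).
KINASE_BACTERIAL = "-" * 6 + "yqvdrlbcefghsnuj-i--"
KINASE_ARCHAEAL = "aompkz" + "-" * 20

def genomes(pattern):
    """Set of genome codes in which the gene is present."""
    return {code for code in pattern if code != "-"}

def union(p, q):
    """Genomes that have at least one of the two genes."""
    return genomes(p) | genomes(q)

def overlap(p, q):
    """Genomes that have both genes."""
    return genomes(p) & genomes(q)

# Genomes that have the pathway but lack the bacterial-type kinase.
missing_step = genomes(PATHWAY) - genomes(KINASE_BACTERIAL)

# The two kinases never co-occur, and together they cover the pathway.
complementary = not overlap(KINASE_BACTERIAL, KINASE_ARCHAEAL)
covers_pathway = union(KINASE_BACTERIAL, KINASE_ARCHAEAL) == genomes(PATHWAY)

print(sorted(missing_step), complementary, covers_pathway)
```

In this toy setting, the genomes in `missing_step` are where one would look for a non-homologous enzyme filling the gap.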
29
Distribution of association scores: monotonic
for subunits, bimodal for isozymes
30
Comparative analysis of regulation
  • Phylogenetic footprinting: regulatory sites are
    more conserved than non-coding regions in general
    and are often seen as conserved islands in
    alignments of gene upstream regions
  • Consistency filtering: regulons (sets of
    co-regulated genes) are conserved =>
      • true sites occur upstream of orthologous genes
      • false sites are scattered at random
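
Consistency filtering can be sketched as follows. The candidate list, the genome names, the placeholder genes (geneX, geneY) and the `min_genomes` threshold are assumptions made for illustration, not values from the presentation.

```python
from collections import defaultdict

# Hypothetical candidate sites: (genome, orthologous group of the downstream gene).
CANDIDATES = [
    ("genome_A", "ribD"), ("genome_B", "ribD"), ("genome_C", "ribD"),
    ("genome_A", "ypaA"), ("genome_C", "ypaA"),
    ("genome_A", "geneX"),  # isolated hit
    ("genome_B", "geneY"),  # isolated hit
]

def consistency_filter(candidates, min_genomes=2):
    """Keep orthologous groups with candidate sites in at least `min_genomes` genomes."""
    genomes_by_group = defaultdict(set)
    for genome, group in candidates:
        genomes_by_group[group].add(genome)
    return {
        group: sorted(found_in)
        for group, found_in in genomes_by_group.items()
        if len(found_in) >= min_genomes
    }

regulon = consistency_filter(CANDIDATES)
print(regulon)
```

Sites seen upstream of orthologs in several genomes are retained, while hits that appear in a single genome are treated as probable false positives.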

31
Riboflavin (vitamin B2) biosynthesis pathway
32
5' UTR regions of riboflavin genes from bacteria
33
Conserved secondary structure of the RFN-element
Capitals: invariant (absolutely conserved)
positions. Lower case letters: strongly
conserved positions. Dashes and stars:
obligatory and facultative base pairs.
Degenerate positions: R = A or G; Y = C or U;
K = G or U; B = not A; V = not U;
N = any nucleotide; X = any nucleotide or deletion.
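
The degenerate codes in the legend translate directly into a regular expression. This is a rough sketch: the consensus fragment `GRYXKNC` is hypothetical and is not taken from the RFN element, and the distinction between invariant and strongly conserved positions is ignored.

```python
import re

# Degenerate codes as listed in the legend (RNA alphabet).
CODES = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]",     # A or G
    "Y": "[CU]",     # C or U
    "K": "[GU]",     # G or U
    "B": "[CGU]",    # not A
    "V": "[ACG]",    # not U
    "N": "[ACGU]",   # any nucleotide
    "X": "[ACGU]?",  # any nucleotide or deletion
}

def consensus_to_regex(consensus):
    """Compile a degenerate consensus string into a regular expression."""
    return re.compile("".join(CODES[symbol] for symbol in consensus))

# Hypothetical consensus fragment, used only to exercise the mapping.
pattern = consensus_to_regex("GRYXKNC")

for candidate in ("GACUGAC", "GACGAC", "GCCUGAC"):
    print(candidate, bool(pattern.fullmatch(candidate)))
```

The second candidate matches because the X position may be deleted; the third fails at the R position.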
34
RFN: the mechanism of regulation
  • Transcription attenuation
  • Translation attenuation

35
Early observation: an uncharacterized gene (ypaA)
with an upstream RFN element
36
Phylogenetic tree of RFN-elements (regulation of
riboflavin biosynthesis)
[Tree labels: no riboflavin biosynthesis; duplications]
37
YpaA a.k.a. RibU: riboflavin transporter in
Gram-positive bacteria
  • 5 predicted transmembrane segments => a
    transporter
  • Upstream RFN element (likely co-regulation with
    riboflavin genes) => transport of riboflavin or
    a precursor
  • S. pyogenes, E. faecalis, Listeria sp.: ypaA, no
    riboflavin pathway => transport of riboflavin
  • Prediction: YpaA is a riboflavin transporter
    (Gelfand et al., 1999)
  • Validation
      • YpaA transports flavins (riboflavin, FMN, FAD):
        by genetic analysis (Kreneva et al., 2000), by
        direct measurement (Burgess et al., 2006; Vogl et
        al., 2007)
      • ypaA is regulated by riboflavin: by microarray
        expression study (Lee et al., 2001)
      • via attenuation of transcription (and to some
        extent inhibition of translation) (Winkler et
        al., 2003)

38
Conserved structures of riboswitches (circled:
X-ray)
39
Mechanisms
glmS: ribozyme, cleaves its mRNA (the Breaker
group)
THI-box in plants: inhibition of splicing
(the Breaker and Hanamoto groups)
40
Characterized riboswitches (more are predicted)
Riboswitch | Regulated processes | Ligand | Distribution
RFN | Riboflavin biosynthesis and transport | FMN (flavin mononucleotide) | Bacillus/Clostridium group, proteobacteria, actinobacteria, other bacteria
THI | Biosynthesis and transport of thiamin and related compounds | TPP (thiamin pyrophosphate) | Bacillus/Clostridium group, proteobacteria, actinobacteria, cyanobacteria, other bacteria, archaea (thermoplasmas), plants, fungi
B12 | Biosynthesis of cobalamin, transport of cobalt, cobalamin-dependent enzymes | Coenzyme B12 (adenosyl-cobalamin) | Bacillus/Clostridium group, proteobacteria, actinobacteria, cyanobacteria, spirochaetes, other bacteria
S-box, SAM-II, SAM-III | Metabolism of methionine and cysteine | SAM (S-adenosylmethionine) | S-box: Bacillus/Clostridium group and some other bacteria; SAM-II: alpha-proteobacteria; SAM-III: Streptococci
LYS | Lysine metabolism | lysine | Bacillus/Clostridium group, enterobacteria, other bacteria
G-box | Metabolism of purines | purines | Bacillus/Clostridium group and some other bacteria
glmS (ribozyme) | Synthesis of glucosamine-6-phosphate | glucosamine-6-phosphate | Bacillus/Clostridium group
gcvT (tandem) | Catabolism of glycine | glycine | Bacillus/Clostridium group
41
Properties of riboswitches
  • Direct binding of ligands
  • High conservation
      • including unpaired regions: tertiary
        interactions, ligand binding
  • Same structure, different mechanisms:
    transcription, translation, splicing, (RNA
    cleavage)
  • Distribution in all taxonomic groups
      • diverse bacteria
      • archaea: thermoplasmas
      • eukaryotes: plants and fungi
  • Correlation of the mechanism and taxonomy
      • attenuation of transcription
        (anti-anti-terminator): Bacillus/Clostridium group
      • attenuation of translation (anti-anti-sequestor
        of translation initiation): proteobacteria
      • attenuation of translation (direct sequestor of
        translation initiation): actinobacteria
  • Evolution: horizontal transfer, duplications,
    lineage-specific loss
  • Sometimes very narrow distribution: evolution
    from scratch?

42
Conserved signal upstream of nrd genes
43
Identification of the candidate regulator by the
analysis of phyletic patterns
  • COG1327: the only COG with exactly the same
    phylogenetic pattern as the signal
      • large scale: on the level of major taxa
      • small scale: within major taxa
          • absent in small parasites among alpha- and
            gamma-proteobacteria
          • absent in Desulfovibrio spp. among
            delta-proteobacteria
          • absent in Nostoc sp. among cyanobacteria
          • absent in Oenococcus and Leuconostoc among
            Firmicutes
          • present only in Treponema denticola among four
            spirochetes
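
The search for a gene family whose pattern matches that of a regulatory signal can be sketched as a ranking by the number of mismatching genomes. All data below are invented: the genome names, the signal distribution and the three COG identifiers are hypothetical placeholders, not real COG profiles.

```python
# Toy presence/absence data; every name and profile here is invented.
SIGNAL = {"g1", "g2", "g4", "g6"}  # genomes where the upstream signal is found

COG_PROFILES = {
    "COG_hypothetical_1": {"g1", "g2", "g3", "g4", "g5", "g6"},  # universal
    "COG_hypothetical_2": {"g1", "g2", "g4"},                    # misses g6
    "COG_hypothetical_3": {"g1", "g2", "g4", "g6"},              # same as the signal
}

def mismatches(profile, signal):
    """Number of genomes where exactly one of gene and signal is present."""
    return len(profile ^ signal)

# Rank the families from best to worst agreement with the signal.
ranked = sorted(COG_PROFILES, key=lambda cog: mismatches(COG_PROFILES[cog], SIGNAL))
exact = [cog for cog in ranked if mismatches(COG_PROFILES[cog], SIGNAL) == 0]

print(ranked)
print(exact)
```

A family that is present wherever the signal is present and absent wherever it is absent, on both the large and the small taxonomic scale, is a natural candidate for the regulator.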

44
COG1327: "Predicted transcriptional regulator,
consists of a Zn-ribbon and ATP-cone domains";
regulator of the riboflavin pathway (RibX)?
45
Additional evidence: co-localization
  • nrdR is sometimes clustered with nrd genes or
    with replication genes (dnaB, dnaI, polA)

46
Additional evidence: co-regulated genes
  • In some genomes, candidate NrdR-binding sites are
    found upstream of other replication-related genes
      • dNTP salvage
      • topoisomerase I, replication initiator dnaA,
        chromosome partitioning, DNA helicase II

47
Multiple sites (nrd genes): FNR, DnaA, NrdR
48
Mode of regulation
  • Repressor (overlaps with promoters)
  • Co-operative binding
      • most sites occur in tandem (> 90% of cases)
      • the distance between the copies (centers of
        palindromes) equals an integer number of DNA
        turns
          • mainly (94%) 30-33 bp, in 84% 31-32 bp: 3 turns
          • 21 bp (2 turns) in Vibrio spp.
          • 41-42 bp (4 turns) in some Firmicutes
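
The arithmetic behind "an integer number of DNA turns" can be checked in a few lines. The helical repeat of 10.5 bp per turn is an assumption (the commonly quoted value for B-DNA in solution), not a number given on the slide.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of B-DNA in solution

def turns(distance_bp):
    """Distance between site centers expressed in helical turns."""
    return distance_bp / BP_PER_TURN

def phase_offset(distance_bp):
    """How far the distance is from the nearest whole number of turns (in turns)."""
    t = turns(distance_bp)
    return abs(t - round(t))

for distance in (21, 31, 32, 41, 42):
    print(distance, "bp:", round(turns(distance), 2), "turns,",
          "offset", round(phase_offset(distance), 2))
```

All the spacings listed above fall within a tenth of a turn of 2, 3 or 4 full turns, so the two sites face the same side of the helix, which is consistent with co-operative binding.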

49
Experimental validations
50
Acknowledgements
  • Dmitry Rodionov (comparative genomics)
  • Andrei Mironov (software)
  • Alexei Vitreschak (riboswitches)
  • Funding
      • Howard Hughes Medical Institute
      • Russian Foundation for Basic Research
      • RAS, program "Molecular and Cellular Biology"
      • INTAS