Title: A polyphasic approach for identification of bacteria: the Burkholderia cepacia example
1Microbial genomics
Tom Coenye
2- Overview
- The sequencing of prokaryotic genomes
- Analysis of prokaryotic genome sequences
- Diversity of prokaryotic genome sequences
- Use of genome sequences in biodiversity studies
- Case studies
3- The sequencing of prokaryotic genomes
- Growing the organism and DNA isolation
- Approach 1: map-based cloning and sequencing
- Approach 2: shotgun sequencing
- Annotation
4- The sequencing of prokaryotic genomes
- Growing the organism and DNA isolation
- Isolating DNA is normally not a problem
- Biggest challenge: isolating DNA from unculturable organisms
  - E.g. Blochmannia floridanus: abdomens of 100 Camponotus floridanus ants were crushed, cell debris was removed by filtration, bacterial cells were collected and treated with DNase I to remove ant DNA, followed by a normal DNA preparation
5Approach 1: map-based cloning and sequencing
Order of fragments is known before the start of sequencing
6Approach 2: shotgun sequencing
Order of fragments is not known before the start of sequencing
7Approach 2: shotgun sequencing
8Approach 2: shotgun sequencing
9Approach 2: shotgun sequencing
10Approach 2: shotgun sequencing
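Because the order of the fragments is unknown in shotgun sequencing, it has to be recovered from overlaps between reads. The following is a minimal sketch of that idea, assuming error-free reads on one strand and a simple greedy merge; the function names `overlap` and `greedy_assemble` are illustrative and this is not how production assemblers work. The toy reads are taken from the first bases of the sequence shown on the "Visualisation" slide.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when no overlap of at least min_len bases exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three overlapping toy reads, given in arbitrary order.
reads = ["CGCACCTGTGGA", "AGACCGAAATTT", "AATTTACGCACC"]
print(greedy_assemble(reads))
```

Real genomes contain repeats and sequencing errors, which is why a greedy merge like this one can misassemble and why real projects need high coverage and more careful algorithms.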
11- Annotation: the search for genes
- Identification of genes is of utmost importance but not straightforward
  - rRNA genes: through similarity with known rRNA genes
  - tRNA genes: through similarity with known tRNA genes and/or use of statistical models (e.g. HMMs)
  - Protein-coding genes: computer models
13- Annotation: the search for genes
- Protein-coding genes: computer models
[Figure: gene model labelled P, SD, start and stop: ...AGGA........ATG(XXXXXX)nTAA]
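The gene model above (an SD motif upstream of a start codon, followed by codons and a stop codon) can be turned into a very naive gene finder. The sketch below is an illustration under stated assumptions, not the computer models actually used for annotation: it scans only the forward strand, accepts only ATG as start codon, and treats the presence of AGGA within a window of 20 bases upstream as the SD signal. The name `find_orfs` and its parameters are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3, sd_motif: str = "AGGA",
              sd_window: int = 20) -> list[tuple[int, int, bool]]:
    """Return (start, end, has_sd) for each ATG...stop ORF on the forward strand.

    start and end are 0-based; end is exclusive and includes the stop codon.
    has_sd is True when sd_motif occurs within sd_window bases upstream of ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        n_codons = (j - i) // 3  # codons before the stop, ATG included
                        if n_codons >= min_codons:
                            upstream = seq[max(0, i - sd_window):i]
                            orfs.append((i, j + 3, sd_motif in upstream))
                        i = j  # resume scanning after this ORF
                        break
            i += 3
    return orfs


# Made-up test sequence: SD motif, spacer, ATG, three codons, TAA.
print(find_orfs("GCAGGAGCGCGCATGAAACCCGGGTAAGC"))
```

A finder this simple reports every open reading frame above the length cut-off, which is one way over-annotation arises; statistical gene models and biological confirmation are needed to separate real genes from chance ORFs.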
14- Annotation: the search for genes
- Biological confirmation remains necessary!
- E.g. over-annotation is a common problem (Skovgaard et al., 2001)
15- Sequencing centers
- Sanger Centre (UK) has sequenced more than 2 billion bases between 4 October 1993 and now
- Equipment includes 250 ABI 3700 and MEGABACE capillary sequencers
16Evolution of sequencing
[Figure: chart with a vertical axis from 0 to 100,000,000 and horizontal axis labels 77, 95, 97 and 03]
17Evolution of sequencing
45,861,651,747 bases in 31,904,910 records
18- Analysis of prokaryotic genomes
- In silico analysis
- Microarrays
- Subtractive hybridisation
- Proteomics
- Making knock-out mutants
19- In silico analysis of prokaryotic genomes
- Comparison of genome sequences on the computer, using specialised bioinformatics tools
  - Comparison of gene content
  - Alignment of genes for phylogenetic purposes
  - Looking for gene duplications
  - ...
21- Microarrays
- DNA microarray: miniaturised version of a dot-blot hybridisation
- Immobilised DNA is hybridised with labelled probe DNA or RNA
- High throughput (up to 300,000 spots/array)
22- Microarrays
- Different technologies
  - PCR product arrays (amplification of ORFs followed by spotting, or in situ PCR)
  - Oligonucleotide arrays (immobilisation of pre-synthesised oligos or in situ synthesis)
23E.g. XeoChips
24- Microarrays
- Study of gene expression (are these ORFs expressed under certain conditions?)
- Comparative genomics
  - Comparing gene content
  - All genes or a subset (e.g. virulence genes)
  - Allows organisms to be included in comparative genomics research without a complete genome sequence being available
25- Microarrays
- E.g. identification of new genes in Klebsiella pneumoniae by means of an E. coli microarray
26Microarrays
Green: E. coli specific; red: K. pneumoniae specific; yellow: common genes
27Subtractive hybridisation
PCR-based way to isolate sequences present in genome 1 but not in genome 2
28Subtractive hybridisation
29- Proteomics
- Study of the protein complement of the genome, as expressed by specific cells at a specific moment under specific conditions
- Typical approach: separation of proteins by 2D gel electrophoresis (IEF and SDS-PAGE) and identification of differentially expressed proteins by means of mass spectrometry
30Proteomics
E.g. induction of heat-shock proteins by elevated temperature in Bradyrhizobium japonicum (Munchbach et al., 1999)
31- Knock-out mutants
- Final proof for biological function of gene product
- Time consuming
- Analysis not straightforward, for example
  - Duplicated genes
  - No phenotype
  - ...
32- Diversity of prokaryotic genomes
- Which organisms have been sequenced ?
- Genome size and organisation
- Number of genes, functional groups
- Visualisation of properties of prokaryotic genomes
33Which organisms have been sequenced?
http://igweb.integratedgenomics.com/GOLD/
34Which organisms have been sequenced?
http://www.ncbi.nlm.nih.gov/genomes/MICROBES/Complete.html
http://www.tigr.org
http://www.jgi.doe.gov/JGI_microbial/html/index.html
http://www.sanger.ac.uk/
35Which organisms have been sequenced?
           Finished   In progress   Total
Bacteria        128           391     519
Archaea          17            22      39
Total           145           413     558
36- Which organisms have been sequenced?
- Available genomes are NOT representative of total biodiversity
- Emphasis on
  - Medically important organisms (humans and animals)
  - Organisms with biotechnological applications
37- Which organisms have been sequenced?
- Example 1: Escherichia coli
  - 4 completed genomes available
  - 7 additional genomes in progress
  - Strains K12 and O157 have been sequenced twice
- Example 2: Agrobacterium tumefaciens C58 has been sequenced twice
38- Genome size and organisation
- Genome size of 580074 bp (Mycoplasma genitalium)
to 9105828 bp (Bradyrhizobium japonicum)
39- Genome size and organisation
- Genome organisation
  - 1 circular chromosome (e.g. Escherichia coli, 4.6 to 5.4 Mbp)
  - Multiple circular chromosomes (e.g. Ralstonia solanacearum, 3.7 Mbp and 2.1 Mbp; Burkholderia cenocepacia, 3.8 Mbp, 3.2 Mbp and 0.9 Mbp)
  - 1 linear chromosome (e.g. Borrelia burgdorferi, 0.9 Mbp)
  - 1 linear and 1 circular chromosome (e.g. Agrobacterium tumefaciens, 2.8 and 2.1 Mbp)
40- Genome size and organisation
- Plasmids can also be present
- Borrelia burgdorferi: 1 linear chromosome (0.91 Mbp)
- Also contains 21 linear and circular plasmids (total 0.61 Mbp)
41- Genome size and organisation
- GC content: 22.5% (Wigglesworthia brevipalpis) to 72.1% (Streptomyces coelicolor)
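As a small illustration of how a GC percentage is obtained from a sequence, the sketch below counts G and C among the unambiguous bases. The function name `gc_content` is illustrative, and the input is simply the first 25 bases of the sequence shown on the "Visualisation" slide, not a whole genome.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (counts["G"] + counts["C"]) / total


print(gc_content("AGACCGAAATTTACGCACCTGTGGA"))  # 12 of 25 bases are G or C: 48.0
```

Ambiguity codes such as N are ignored rather than counted in the denominator, which is one possible convention among several.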
42- Genome size and organisation
- Number of genes 480 (Mycoplasma genitalium) to
8317 (Bradyrhizobium japonicum)
43- Genome size and organisation
- Number of genes: 480 (Mycoplasma genitalium) to 8317 (Bradyrhizobium japonicum)
- Coding density: 75% (Rickettsia prowazekii) to 94% (Campylobacter jejuni)
- Very compact when compared to eukaryote genomes (e.g. Fugu rubripes: only 25% of the genome is coding)
44- Relationship genome size/number of genes
- (Konstantinidis & Tiedje, 2003)
- Linear relationship between genome size and number of genes ($r^2 = 0.97$)
- Linear relationship between genome size and coding density ($r^2 = 0.72$)
Larger genomes have more genes but not more junk DNA!
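A quick arithmetic check of this point can be made with the two extreme genomes quoted earlier in these slides (Mycoplasma genitalium: 580074 bp and 480 genes; Bradyrhizobium japonicum: 9105828 bp and 8317 genes). The sketch below only computes genes per Mbp for those two data points; it is not a reproduction of the regression in the cited study, and the helper name `genes_per_mbp` is illustrative.

```python
def genes_per_mbp(size_bp: int, n_genes: int) -> float:
    """Number of genes per million base pairs."""
    return n_genes / (size_bp / 1_000_000)


# Genome size (bp) and gene count as quoted on the earlier slides.
genomes = {
    "Mycoplasma genitalium": (580_074, 480),
    "Bradyrhizobium japonicum": (9_105_828, 8_317),
}

for name, (size_bp, n_genes) in genomes.items():
    print(f"{name}: {genes_per_mbp(size_bp, n_genes):.0f} genes per Mbp")
```

Although the second genome is more than 15 times larger, the gene density stays in the same range (roughly 830 versus 910 genes per Mbp), which is what a near-linear relationship between genome size and gene number implies.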
45- Relationship genome size/number of genes
- (Konstantinidis & Tiedje, 2003)
- Regulatory genes (signal transduction and transcriptional control) are overrepresented in larger genomes
- Genes involved in metabolism and transport are also overrepresented in larger genomes
- Genes involved in DNA metabolism are underrepresented in larger genomes
46Visualisation
AGACCGAAATTTACGCACCTGTGGACAATCTGGGGAGAATTTTGAACAGT
TCCGTCTTATTCCAGTAAT TCACAGGCGTCTCGAAGACGAGAGACGCCA
CTTGCGGATTGTGGAAAAACACCACCTTATTCACCCCGCG
GCTCGGCCCGTCGGACAATTCAGAGATTTGTCCCGGTTTATCAACAGGGG
GAGAAAAACAGCGTGGAGAA CAAAAAAAGCTTCTTCCATCTGCACCTGA
TTTCGGACTCGACGGGAGAGACTCTGATGTCGGCCGGCCGC
GCCGTCTCGGCGCAGTTTCATACATCCATGCCGGTGGAACATGTCTATCC
GATGATCCGCAACCAGAAGC AGCTCGCGCAGGTCATCGATCTCATCGAC
AAGGAGCCCGGCATTGTTCTTTATACAATCGTTGATCAGCA
GCTGGCGGAATTCCTGGATCTGCGCTGCCATGCGATTGGCGTGCCCTGCG
TCAACGTTCTCGAACCGATC ATCGGCATTTTCCAGACCTATCTCGGCGC
GCCGTCCAGGCGGCGGGTGGGTGCGCAACACGCGCTGAATG
CCGATTATTTCGCGCGGATCGAAGCACTCAATTTCGCCATGGACCATGAT
GACGGGCAGATGCCGGAGAC CTATGACGATGCGGATGTCGTCATCATCG
GCATCAGCCGCACGTCGAAAACACCAACCAGCATCTATCTT
GCTAACAGGGGCATAAAGACTGCCAATATCCCGGTCGTTCCCAATGTGCC
TTTGCCCGAAAGCCTATATG CCGCGACCCGGCCGTTGATCGTCGGTCTC
GTCGCGACATCGGATCGCATATCGCAGGTTCGTGAGAACAG
GGATCTGGGTACAACCGGCGGATTTGACGGTGGCCGTTACACGGATCGCG
CCACCATCATGGAAGAGCTG AAATATGCGCGTGCGCTCTGCGCCCGCAA
CAATTGGCCGCTGATCGACGTCACACGCCGTTCCATCGAGG
AAACGGCCGCGGCGATCCTTGCCCTGCGCCCGAGGACGCGATAATCCGAA
TCGCATCATCAGGAGCAGAC AGTCGATGAAACAAGAGTTGATCCTCGCC
TCATCCAGCGCATCCCGGCAGATGCTGATGCGCAATGCGGG
GCTGACATTTTCGGCAATACCCGCGGATATTGATGAGCGTGCGCTTGATG
AGCAACTGGAACGGGACGGC GCCAGCCCCGAAGAGGTTGCGCTGGAACT
TGCGCGGGCGAAGGCTCTTGCAGTCAGTGCGCTCCATCCAG
AAGCACTGGTTCTTGGCTGCGACCAGACCATGGCGCTCGGCACACGCGTT
TATCACAAGCCAAAAAACAT GGCGGAAGCCGCGACGCATCTGCTGTCGT
TGTCCGGCAAGGTCCACCGCCTGAACAGCGCGGCTGTTCTC
GTTCACAACGGAAAGGTGGTGTGGCAGACCGTTTCCAGTGCAGAGCTTGC
CGTTCGAACCTTGAGCGCTG AGTTTGTGTCCCGCCACCTGCAGCGGGTG
GGAGAAAAGGCGCTCAGCAGCGTCGGCGCTTACCAGCTTGA
GAGGGAAGGAATCCAGCTATTCACCTCCATAGAGGGGGATTATTTCACGA
TCCTCGGTTTGCCGCTTCTG CCTCTTTTATCGAAACTACGCGACATGGA
TGTCATCGATGGCTGATTCACGTGAAACATTAACTATAAAT
GCCTTCGTTGTCGGTTACCCGATCAAACATTCCCGGTCGCCGATCATCCA
TTCCTATTGGCTGAAAAAAT TCGGTATCGCCGGTTCCTATACGGCAGTT
GAGGTCTCCCCAGACGATTTCCCGAAGTTCATTGCAACGCT
GAAGGAAGGCAAGCCGGGTGCAGCGGTGGGCGGTAACGCCACCATTCCGC
ACAAGGAAGCGGCTTACCGG TTGGCCGATCATCCCGATGCCTTGGCGGA
AGAACTCGGCGCCGCCAACACCATCTGGATGGAGGAGGGTA
AACTCCACGCGACCAACACGGATGGTTACGGTTTCGTCTCGAACCTGGAC
GAGCGGCATCCGGGCTGGGA TAAGACCCAGCGCGCGGTGGTGTTCGGCG
CCGGCGGTGCAAGCCGGGCCGTCATTCAGTCGCTGCGTGAT
CGGGATGTTGCGGAAATTCACGTCGTGAACCGTACGGTCGAGCGCGCTCG
CGAACTGGCCGACCGCTTTG GCCCACGGGTCTTTTCCCATCCCCAGGCA
GCGCTTCAGGAGGTCATGCACGGCGCGGGGTTGTTCGTGAA
47- Visualisation
- Genome Atlas (http://www.cbs.dtu.dk/services/GenomeAtlas/)
- TIGR CMR (http://www.tigr.org/tigr-scripts/CMR2/choose_genome.spl)
- Artemis (http://www.sanger.ac.uk/Software/Artemis/)
- ACT (http://www.sanger.ac.uk/Software/ACT/)
- Apollo
- SeqVISTA
- EMBOSS, JEMBOSS, EMBASSY, ...
52Novel approaches in taxonomy based on whole-genome sequences
- Supertree approach
- Presence/absence of characteristics (genes/gene families/protein folds/conserved indels)
- Differences in gene content
- Gene order / synteny
- Differences in nucleotide composition
53Supertrees
- Availability of genome sequences makes it possible to determine which genes are shared between a number of organisms
- These genes can then be used for phylogenetic analysis
- Analysis of protein-coding genes can deliver important additional information compared to 16S rRNA gene sequences
- Combined alignments of conserved proteins
54Presence/absence analysis
55Presence/absence analysis
56Presence/absence analysis
- Protein folds (β-propeller, TIM-barrel, Zn-β-lactamases, ...)
- Conserved insertions and deletions (indels)
57Presence/absence analysis
glmU (UDP-N-acetylglucosamine pyrophosphorylase) shows a 17 aa insertion in Archaea and Chlamydiae
58Presence/absence analysis
59Differences in gene content and gene order
- Simple approach: consider 2 genomes as bags filled with genes and compare the content of both
- Extension: if the same genes are present, are they present in the same order?
60Differences in gene content and gene order
- Synteny is less conserved than gene content
- Synteny < shared genes < identity between shared genes
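The "bags filled with genes" comparison and its gene-order extension can be sketched with plain sets. This is an illustration only: the gene names are made up, each genome is treated as a linear list of orthologue identifiers, and counting conserved neighbouring pairs is just one simple proxy for synteny. The function names are hypothetical.

```python
def shared_genes(genome_a: list[str], genome_b: list[str]) -> set[str]:
    """Genes present in both genomes, ignoring order and copy number."""
    return set(genome_a) & set(genome_b)


def adjacent_pairs(genome: list[str]) -> set[frozenset[str]]:
    """Unordered pairs of neighbouring genes along a linear gene list."""
    return {frozenset(pair) for pair in zip(genome, genome[1:])}


def conserved_adjacencies(genome_a: list[str], genome_b: list[str]) -> set[frozenset[str]]:
    """Neighbouring gene pairs found in both genomes (a crude synteny measure)."""
    return adjacent_pairs(genome_a) & adjacent_pairs(genome_b)


# Hypothetical gene orders for two genomes.
genome_1 = ["geneA", "geneB", "geneC", "geneD", "geneE"]
genome_2 = ["geneC", "geneB", "geneA", "geneE", "geneF"]

common = shared_genes(genome_1, genome_2)
kept = conserved_adjacencies(genome_1, genome_2)
print(f"shared genes: {len(common)} of {len(set(genome_1))}")
print(f"conserved adjacencies: {len(kept)} of {len(adjacent_pairs(genome_1))}")
```

In this toy case four of five genes are shared but only two of four adjacencies survive, which mirrors the observation that gene order is less conserved than gene content.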
61Differences in nucleotide composition
- %GC (%A, %C, %G, %T)
- Relative abundance of di/tri/tetranucleotides (AA, AC, AG, AT, CA, ...)
- Mathematical
  - $\rho_{XY} = f_{XY} / (f_X f_Y)$ (X, Y = A, C, G, T; XY = AA, AC, AG, AT, ..., TT)
  - $\delta(f,g) = \frac{1}{16} \sum_{XY} |\rho_{XY}(f) - \rho_{XY}(g)|$ (within species < 20)
- Similar statistics apply to higher-order nucleotides
- Codon statistics (codon usage, GC content of synonymous 3rd position)
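The dinucleotide relative abundance and the distance between two sequences described above can be computed directly from base and dinucleotide frequencies. The following is a minimal sketch under stated assumptions: it uses the frequencies of a single strand as given (published versions of this statistic usually pool both strands), it expects sequences of at least two bases containing only A, C, G and T, and it returns the unscaled mean difference, so the within-species threshold quoted on the slide is not applied. The names `relative_abundance` and `delta` are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]


def relative_abundance(seq: str) -> dict[str, float]:
    """Observed/expected ratio f_XY / (f_X * f_Y) for all 16 dinucleotides."""
    seq = seq.upper()
    n = len(seq)
    base_freq = {b: seq.count(b) / n for b in BASES}
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    rho = {}
    for xy in DINUCLEOTIDES:
        expected = base_freq[xy[0]] * base_freq[xy[1]]
        observed = pair_counts[xy] / (n - 1)
        rho[xy] = observed / expected if expected > 0 else 0.0
    return rho


def delta(seq_f: str, seq_g: str) -> float:
    """Mean absolute difference between the relative abundances of two sequences."""
    rho_f = relative_abundance(seq_f)
    rho_g = relative_abundance(seq_g)
    return sum(abs(rho_f[xy] - rho_g[xy]) for xy in DINUCLEOTIDES) / 16


fragment_1 = "AGACCGAAATTTACGCACCTGTGGACAATCTGGGGAGAATTTTGAACAGT"
fragment_2 = "GCTCGGCCCGTCGGACAATTCAGAGATTTGTCCCGGTTTATCAACAGGGG"
print(delta(fragment_1, fragment_2))
```

The two 50-base fragments come from the sequence on the "Visualisation" slide; on fragments this short the value is dominated by sampling noise, so meaningful comparisons need much longer sequences.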
62Examples
- Genome-scale metabolic model of Helicobacter pylori 26695. Schilling et al., J. Bact. 184:4582-4593 (2002)
- Phylogenetic position of the Aquificales
- Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica. Parkhill et al., Nature Genetics (2003) (can be downloaded at http://www.sanger.ac.uk/Projects/B_pertussis/Bordetella_genomes.pdf)
63- Goal
- Thorough study of the metabolic potential of Helicobacter pylori based on its genome sequence
- Calculate the growth demands
65- Several key enzymes from glycolysis (PFK, PK) are absent: deficient in hexose metabolism
- All pentose phosphate reactions are present except G6P-DH: the oxidative branch of the pentose phosphate pathway is incomplete
- Entner-Doudoroff pathway is present: carbon can be transferred from the pentose phosphate pathway to G3P and pyruvate
- No PEP carboxylase: PEP cannot be transformed to oxaloacetate
- PEP cannot be transformed to a TCA intermediate
- TCA cycle is complete but in a slightly modified form
67- Required components: Ala, Arg, His, Ile, Leu, Met, Phe, Val, thiamine, phosphate, oxygen and sulphate/cysteine
- No requirement for purines (de novo synthesis is possible)
- Ala and Arg are important C sources; use of AA as C sources is coupled to production of ammonia
- 14 alternative C sources were identified
- Oxygen is necessary for production of NAD, NADP, CTP, UTP, dCTP and dUTP, because an e- acceptor is needed to allow oxidation of FADH to FAD
68- 9 essential AA for humans, 8 for H. pylori; 6 in common
- Strategy of metabolic design seems ideal and could be coupled to co-evolution of this organism with its host (i.e. humans)
- Elimination of genes required for energetically unfavourable AA biosynthesis (proteolysis in the stomach releases an unlimited supply of AA)
- Use of AA as C source results in production of ammonia, which helps to neutralise the acid pH in the stomach
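The overlap between the two sets of essential amino acids can be checked with a set intersection. In the sketch below the H. pylori set is the list of required amino acids given on the earlier slide; the human set is not stated in these slides and is filled in here as an assumption from standard nutrition references.

```python
# Assumption: the nine amino acids usually listed as essential for adult humans.
human_essential = {"His", "Ile", "Leu", "Lys", "Met", "Phe", "Thr", "Trp", "Val"}

# Required amino acids for H. pylori as listed on the earlier slide.
h_pylori_essential = {"Ala", "Arg", "His", "Ile", "Leu", "Met", "Phe", "Val"}

in_common = human_essential & h_pylori_essential
print(len(human_essential), len(h_pylori_essential), len(in_common))
print(sorted(in_common))
```

Under that assumption the counts come out as 9, 8 and 6, in agreement with the figures on the slide; the shared amino acids are His, Ile, Leu, Met, Phe and Val.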
69A genomic perspective on the taxonomic position of Aquifex aeolicus
- Genus Aquifex: marine, hyperthermophilic, microaerophilic, hydrogen-oxidising, recovered from marine sediments near Iceland
- Based on 16S rRNA sequence data: one of the deepest branching lineages
- But, considerable debate regarding its true taxonomic position
  - Deep (type of ribosomes, EF-G and EF-Tu sequences, ribosomal proteins, rpoB and rpoC sequences)
  - Not deep but close to ε-Proteobacteria (Helicobacter, Campylobacter, Wolinella) (cytochrome bc, four amino-acid insert in ala-tRNA-synthetase, ultrastructure)
7016S rRNA gene
71Differences in gene content
72Dinucleotide relative abundance
73The supertree
74Conclusions (?)
- Coming to a consistent picture is not straightforward
- No particular relationship between Aquifex and ε-Proteobacteria
- Aquifex can be considered as a primitive species with primitive genes