Comparative genomics and Target discovery

Provided by: maartensol

1
Comparative genomics and Target discovery
  • Maarten Sollewijn Gelpke
  • MDI, Organon

2
  • What is comparative genomics?
  • What can we learn from comparative genomics?
  • What is target discovery?
  • What are the implications of comparative genomics
    to target discovery?
  • What issues in target discovery can be addressed
    by comparative genomics?

3
Overview
  • Introduction to genomes and sequencing.
  • Comparative genomics aspects.
  • Phylogenomics concepts.
  • Examples of comparative genomics.

4
Sequence availability
  • Availability of gene and protein sequences has
    increased enormously during the last 2
    decades.
  • Current capacity of the main sequencing centers
    is >3 Gb per month per centre.
  • This will increase again dramatically with the
    development of new superfast sequencing
    techniques.

Currently >100 Gbases
5
Genomes sequenced
Figure: timeline of genome sequencing milestones, with labels for the first bacterial genomes sequenced (H.influenzae and M.genitalium), the yeast genome, E.coli K12, C.elegans, the full sequence of chr. 22 (1999), D.melanogaster, chr. 21, A.thaliana, the human draft and finished sequences, rat, chicken, Xenopus, zebrafish and chimpanzee.
6
Genome sequencing
Evolutionary relationship between metazoans
(multicellular animals) that have been sequenced
or are due for sequencing.
7
Genome sequencing
  • BAC fingerprinting → shotgun approach
  • Accurate but laborious!
  • Shotgun sequencing (WGS)

Figure: clone-based sequencing (BAC clones of 100-200 kb) compared with whole-genome shotgun sequencing (whole genomes, 30 Mb to 3 Gb); both routes end in assembly and finishing.
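The shotgun route depends entirely on computational assembly of overlapping reads. As a rough illustration only, the toy greedy assembler below repeatedly merges the two reads with the longest suffix-prefix overlap. The function names and the `min_len` threshold are hypothetical; real WGS assemblers must also handle sequencing errors, repeats and both DNA strands, which this sketch ignores.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b` (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep input order
    # Drop reads contained entirely within another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:  # no overlaps left: return the contigs
            return contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


reads = ["ATTAGACCTG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

On error-free, non-repetitive reads this recovers the original sequence. With repeats, greedy merging can collapse or misjoin regions, which is one reason finishing remains laborious.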
8
Genome sequencing
  • Current state of sequenced organisms
  • >316 Prokaryotes
  • >27 Archaea
  • >280 Eukaryotes (complete, in assembly or in
    progress)
  • >1600 Viruses and >500 mitochondria/chloroplasts
  • Some ongoing genome sequencing projects
  • Poplar, gibbon, platypus, Drosophila species,
    variety of pathogenic fungi and bacteria, etc.
  • Meta-genomic projects on environmental samples
    (soil, deep-sea, waste sites)

9
Future of genome sequencing?
  • New complete genomes.
  • New low-redundancy genomes.
  • New (low-redundancy) genome areas.
  • Meta-genomics. Sequencing of microbial
    communities.
  • Sequencing of extinct species.
  • 40,000-year-old cave bear: 26 kb of sequence, 21 genes.
  • 45,000-year-old Neanderthal: 75 kb of sequence;
    estimated divergence from the human lineage
    315,000 years ago.

10
Comparative genomics
  • Discover what lies hidden in genomic sequence by
    comparing sequence information.
  • Main areas
  • Whole genome alignment
  • Gene prediction
  • Regulatory element prediction
  • Phylogenomics
  • Pharmacogenetics
  • Affected by evolutionary aspects
  • Mutational forces (introduce random mutations)
  • Selection pressures
  • Ratio of non-synonymous to synonymous
    substitutions
  • Mutation rates lower or higher than neutral
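The ratio of non-synonymous to synonymous substitutions (often written dN/dS or Ka/Ks) can be estimated in several ways. The minimal sketch below assumes two aligned, gap-free, in-frame coding sequences and the standard genetic code, and follows the Nei-Gojobori counting method with a Jukes-Cantor correction. The function names are hypothetical; real analyses would normally use an established package such as PAML.

```python
import math
from itertools import permutations, product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon: str) -> float:
    """Nei-Gojobori synonymous sites: silent fraction of the 9 point mutations, times 3."""
    silent = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                silent += CODE[mutant] == CODE[codon]
    return silent / 3


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """(synonymous, non-synonymous) differences, averaged over the mutational
    pathways from c1 to c2 that avoid intermediate stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for step, pos in enumerate(order):
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if step < len(order) - 1 and CODE[nxt] == "*":
                valid = False  # pathway passes through a stop codon
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:  # every pathway hits a stop: count all as non-synonymous
        return 0.0, float(len(positions))
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor multiple-hit correction; infinite at saturation (p >= 0.75)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (dN, dS) for two aligned, gap-free, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0:
        raise ValueError("no synonymous sites in these sequences")
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)
```

A ratio dN/dS below 1 is usually read as purifying (negative) selection, a ratio near 1 as neutral evolution, and a ratio above 1 as positive selection. This matches the slide's notion of rates lower or higher than neutral.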

11
Comparing sequences, methods.
  • Pairwise comparison of sequences (alignments)
  • proteins or genes
  • variety of local alignment tools like BLAST,
    Smith-Waterman etc.
  • multiple sequence comparisons (ClustalW, MUSCLE
    etc.)
  • results may be dependent on alignment settings
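Smith-Waterman is the exact dynamic-programming algorithm for local alignment; BLAST approximates it heuristically for speed. The sketch below uses a linear gap penalty and arbitrary illustrative scores (`match`, `mismatch` and `gap` are assumptions, not the defaults of any tool). Changing these settings can change the reported alignment, which is the last point above.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # score matrix, first row/column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        score = H[i][j]
        if score == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTACGTAA", "GGACGTCC"))  # (8, 'ACGT', 'ACGT')
```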

12
Comparing sequences, methods.
  • Whole genome comparisons
  • Large stretches of sequence
  • Divergence up to 450Mya (fugu-human) with
    sufficient similarity remaining.
  • BLAT, BLASTZ, Phusion/BlastN
  • Seeding strategy → alignment extension → gapped
    alignments
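Whole-genome aligners such as BLAT and BLASTZ build on a seed-and-extend idea: find short exact matches first, then extend them. The simplified sketch below covers only the first two stages. It assumes exact k-mer seeds and ungapped X-drop extension, omits the final gapped-alignment step, and uses hypothetical function names and parameter values.

```python
from collections import defaultdict


def kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of `seq` to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def _extend(a, b, i, j, di, match, mismatch, xdrop):
    """Walk from (i, j) in direction di (+1 or -1) without gaps; stop once the running
    score drops `xdrop` below the best seen. Returns (best gain, steps to reach it)."""
    score = best = best_steps = steps = 0
    while 0 <= i < len(a) and 0 <= j < len(b):
        score += match if a[i] == b[j] else mismatch
        steps += 1
        if score > best:
            best, best_steps = score, steps
        elif best - score >= xdrop:
            break
        i += di
        j += di
    return best, best_steps


def seed_and_extend(a, b, k=4, match=1, mismatch=-1, xdrop=3):
    """Return ungapped hits as (score, start_in_a, start_in_b, length), best first."""
    index = kmer_index(a, k)
    hits = set()  # seeds inside the same matching segment usually give identical hits
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], []):
            gain_r, n_r = _extend(a, b, i + k, j + k, +1, match, mismatch, xdrop)
            gain_l, n_l = _extend(a, b, i - 1, j - 1, -1, match, mismatch, xdrop)
            hits.add((k * match + gain_r + gain_l, i - n_l, j - n_l, n_l + k + n_r))
    return sorted(hits, reverse=True)


print(seed_and_extend("TTTTTGATTACAGGGGG", "CCCGATTACACCC"))  # [(7, 5, 3, 7)]
```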

13
Whole genome comparison
  • Conservation of synteny!
  • Cross-reference of any genetic traits (diseases!)
    from one organism (e.g. mouse) to genes in the
    syntenic regions in the other organism (e.g.
    human).
  • Genome expansion and contraction
  • Genome duplications and segmental duplications are an
    important mechanism for generating new genes.
  • GC content, CpG islands
  • Reflect different mutational or DNA repair
    processes?
  • Repeats
  • Transposable elements are a main force in
    reshaping genomes. TEs (or remainders thereof)
    can be used to measure evolutionary forces acting
    on the genome.
  • Neutral mutation rate.
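GC content and CpG islands are simple to compute directly from sequence. The sketch below uses the classic Gardiner-Garden and Frommer thresholds (window of at least 200 bp, GC above 50%, observed/expected CpG ratio above 0.6). The function names are hypothetical and the sliding-window scan is deliberately naive.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def cpg_island_windows(seq: str, window: int = 200, step: int = 1,
                       min_gc: float = 0.5, min_oe: float = 0.6) -> list[int]:
    """Start positions of windows that pass both CpG-island thresholds."""
    return [i for i in range(0, len(seq) - window + 1, step)
            if gc_content(seq[i:i + window]) > min_gc
            and cpg_obs_exp(seq[i:i + window]) > min_oe]


toy = "CG" * 150 + "AT" * 150  # artificial 600 bp sequence: CpG-rich half, AT-rich half
hits = cpg_island_windows(toy)
```

CpG dinucleotides are depleted in most of the human genome because methylated cytosines in CpG context mutate at an elevated rate, so windows that escape this depletion stand out.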

14
Gene prediction
  • Comparing sequences has contributed enormously to
    the accuracy of gene prediction.
  • Evidence-based methods.
  • Use cDNAs, ESTs and proteins from various
    organisms.
  • Apply gene feature rules.

15
Gene prediction
  • De novo methods.
  • Alignment of genomic sequences
  • Splicing rules and other gene features
  • De novo gene prediction by comparing sequences
    attempts to model negative selection of
    mutations: regions with fewer mutations are
    conserved because mutations there were detrimental
    to the organism.
  • Prediction of similar proteins in both genomes.

Newly predicted protein in mouse and human,
similar to the disease related gene dystrophin.
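The negative-selection argument can be made concrete on a pairwise alignment: slide a window along it and flag windows whose percent identity reaches a threshold. This sketch assumes a precomputed gapped alignment of two sequences. The window size and identity threshold are illustrative, and the function names are hypothetical.

```python
def conserved_windows(aln1: str, aln2: str, window: int = 100, min_identity: float = 0.7):
    """Return (start, end, identity) for alignment windows at or above `min_identity`.
    Gapped columns count as mismatches."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    matches = [int(x == y and x != "-") for x, y in zip(aln1, aln2)]
    hits = []
    running = sum(matches[:window])  # matches in the current window
    for start in range(len(matches) - window + 1):
        if start > 0:  # slide by one column: add the new column, drop the old one
            running += matches[start + window - 1] - matches[start - 1]
        identity = running / window
        if identity >= min_identity:
            hits.append((start, start + window, identity))
    return hits


def merge_windows(hits):
    """Merge overlapping conserved windows into (start, end) segments."""
    segments = []
    for start, end, _ in hits:
        if segments and start <= segments[-1][1]:
            segments[-1][1] = max(segments[-1][1], end)
        else:
            segments.append([start, end])
    return [tuple(s) for s in segments]


a = "ACGT" * 10
b = "ACGT" * 5 + "TGCA" * 5  # identical first half, fully diverged second half
print(merge_windows(conserved_windows(a, b, window=10)))  # [(0, 23)]
```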
16
Regulatory element prediction
  • The complexity of higher eukaryotes and their
    relatively low number of genes can be explained
    partially through the importance of
    transcriptional regulation.
  • Identification of regulatory elements (REs) will
    have an extensive impact on understanding gene
    expression patterns (expression intensity, tissue
    specificity), relations between expression
    patterns, and on inferring biological systems or networks.

17
Regulatory element prediction
  • No formal models for regulatory motifs
  • Attempt to find conserved regions or motifs based
    on the global alignment of similar sequences of
    different organisms (phylogenetic footprinting).
  • Which species to compare? Evolutionary distance?
  • What regions around gene models to investigate?
    5' and 3' flanking regions, introns?
  • Take expression patterns into account?
  • How does evolution affect REs?

18
Phylogenomics
  • Comparison of genes and gene products across a
    number of species (whole genomes) to characterize
    homologues and gain insight into the evolutionary
    process itself.
  • Pharmacophylogenomics is the use of phylogenomics
    in aid of drug discovery, through improved target
    selection and validation.

19
Orthology and paralogy
Phylogenetic tree of gene X
  • Orthologs: genes in different species that arose
    from a single gene in the most recent common
    ancestor, by speciation.
  • Paralogs: genes in the same species that arose
    from a single gene in an ancestral species, by a
    process of gene duplication.

20
Target orthology
  • Species differences frequently affect progression
    of targets and compounds. Orthology maps in
    combination with expression studies may explain
    these differences.
  • Establishing orthology
  • Reciprocal highest-scoring BLAST hit.
  • Conservation of synteny.
  • Gene loss or rate of evolution issues.
  • Orthology does not guarantee common function
    (functional shift).
  • Extensive sequence divergence
  • High non-synonymous over synonymous nucleotide
    substitution ratios.
  • Comparison of regulatory regions?
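The reciprocal highest-scoring hit criterion is easy to express once all-against-all similarity scores are available. The sketch assumes the scores (for example, BLAST bit scores) have already been parsed into nested dictionaries. The function name and the gene names in the example are made up.

```python
def reciprocal_best_hits(a_vs_b: dict[str, dict[str, float]],
                         b_vs_a: dict[str, dict[str, float]]) -> set[tuple[str, str]]:
    """Pairs (a, b) where b is a's top-scoring hit and a is b's top-scoring hit."""
    best_ab = {q: max(hits, key=hits.get) for q, hits in a_vs_b.items() if hits}
    best_ba = {q: max(hits, key=hits.get) for q, hits in b_vs_a.items() if hits}
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Hypothetical scores: human HsA/HsB/HsC against mouse MmA/MmB and back.
human_vs_mouse = {"HsA": {"MmA": 900, "MmB": 400},
                  "HsB": {"MmA": 500, "MmB": 850},
                  "HsC": {"MmA": 300}}
mouse_vs_human = {"MmA": {"HsA": 910, "HsB": 480, "HsC": 290},
                  "MmB": {"HsA": 390, "HsB": 840}}
print(sorted(reciprocal_best_hits(human_vs_mouse, mouse_vs_human)))
# [('HsA', 'MmA'), ('HsB', 'MmB')]
```

Here `HsC` has no reciprocal partner, the pattern expected after gene loss in one lineage. Ties between equal scores are broken arbitrarily by `max`, which a real pipeline would need to handle explicitly.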

21
Target paralogy
  • Key insights into large pharmacologically relevant
    families (nuclear receptors (NRs), GPCRs) can be
    gained from paralogy analysis.
  • Paralogy is interrelated with several other
    gene-to-function mappings that can seriously affect
    the suitability of genes as drug targets.

Schematic representation of various mappings of
genes to functions.
22
  • Pleiotropy
  • Suggested to precede paralogy
  • Relaxed substrate or ligand specificity
  • Multiple protein domains
  • Tissue or cellular localization
  • Redundancy
  • Total or partial redundancy of function
  • Directly linked to paralogy
  • Robustness against gene knock-outs (target
    validation)
  • Examples: PPAR-δ / PPAR-α in skeletal muscle; PXR / FXR in
    bile acid signaling; dopamine transporters /
    serotonin transporters in adjacent neurons.

23
  • Heteromery
  • Formation of heteromers between paralogs
  • Known examples in major classes of drug targets
  • GPCRs: GABA-B receptors
  • NRs: formation of heterodimers with retinoid X
    receptors (RXR)
  • Ion channels
  • Crosstalk
  • Combination of pleiotropy and redundancy
  • May be regulated in time and space (expression
    and localization)
  • Action of cytokines (interleukins) on immune cell
    types.

24
  • Alternative transcription
  • Intermediate between paralogy and pleiotropy
    ("paralogy in place").
  • Increases the effective size of the genome (an
    estimated >30% of human genes show alternative
    transcription!)

25
Effects on drug discovery
  • Functional shifts, pleiotropy and redundancy
    can mean good or bad news for drug
    discovery.
  • Functional shifts
  • Misleading or unavailable animal model
  • Animal toxicity irrelevant for humans
  • Pleiotropy
  • Unintended drug effects
  • Opportunities for multiple indications
  • Redundancy
  • Disease resistant to treatment (multi-functionality)
  • Highly selective treatment for complex diseases.

26
Pharmacogenetics
  • Within species comparative genomics
  • Single Nucleotide Polymorphisms (SNPs)
  • Current focus in coding regions, expected to
    expand to sites of transcription regulation.
  • Determine the site of a SNP and its allele
    frequencies in ethnic or multi-ethnic panels of
    individuals (e.g. 100).
  • Pharmacogenetics (PGx) relates SNP information to
    efficacy and safety issues during the drug
    development process.
  • Efficacy PGx: select/predict drug responders and
    increase confidence in a drug in
    development.
  • Safety PGx: identify individuals prone to
    adverse effects of a drug.
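Allele frequencies from a genotyped panel can be computed as below, together with a check for Hardy-Weinberg equilibrium, a common quality filter for SNP data. This minimal sketch assumes diploid genotype calls written as `A/G` at a biallelic SNP. The function names are hypothetical and the panel counts in the example are invented for illustration.

```python
from collections import Counter


def allele_frequencies(genotypes: list[str]) -> dict[str, float]:
    """Allele frequencies from diploid genotype calls such as 'A/G'."""
    counts = Counter(allele for g in genotypes for allele in g.split("/"))
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def hwe_chi_square(genotypes: list[str]) -> float:
    """Chi-square statistic for departure from Hardy-Weinberg equilibrium (biallelic SNP)."""
    freqs = allele_frequencies(genotypes)
    if len(freqs) != 2:
        raise ValueError("expected a biallelic SNP")
    a1, a2 = sorted(freqs)
    p, q = freqs[a1], freqs[a2]
    n = len(genotypes)
    observed = Counter("/".join(sorted(g.split("/"))) for g in genotypes)  # 'G/A' -> 'A/G'
    expected = {f"{a1}/{a1}": p * p * n, f"{a1}/{a2}": 2 * p * q * n, f"{a2}/{a2}": q * q * n}
    return sum((observed[gt] - e) ** 2 / e for gt, e in expected.items())


panel = ["A/A"] * 36 + ["A/G"] * 48 + ["G/G"] * 16  # invented panel of 100 individuals
print(allele_frequencies(panel))         # {'A': 0.6, 'G': 0.4}
print(round(hwe_chi_square(panel), 6))   # 0.0
```

With 1 degree of freedom, a statistic above about 3.84 corresponds to p < 0.05.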

27
Examples
  • New genes and REs from yeast genomes.
  • Multi-species comparisons from targeted genomic
    regions.
  • Comparative genomics at the vertebrate extremes.
  • Pharmacogenetics in drug efficacy

28
Comparison of yeast species to identify genes and
regulatory elements. (Kellis et al, Nature 2003)
  • Saccharomyces cerevisiae and 3 related species
  • 7x coverage WGS of each species
  • Assembly of draft genome sequence
  • S.cerevisiae genome aligned to others using ORFs
    as seeds
  • Most ORFs have 1:1 matches. Considerable
    conserved synteny.
  • Most genomic rearrangements clustered in
    telomeric regions.
  • Local gene family expansion/contraction, creating
    phenotypic diversity over evolutionary time.
  • Balance between conservation and divergence
    allows for accurate gene identification and
    recognition of REs as well!

29
Identification of genes
  • Original S.cerevisiae genome (1996) 6275 ORFs
  • Re-analysis and other evidence (2002) 6062 ORFs
  • This study validates all ORFs using a reading
    frame conservation score (very sensitive).
  • 5538 ORFs validated, 20 unresolved, 504 ORFs rejected!
  • Beyond gene recognition, the comparison also greatly
    improved gene structure definitions (start, stop,
    introns).
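The sketch below is a simplified reading-frame conservation measure in the spirit of the score used by Kellis et al., not their exact procedure. It walks a pairwise alignment of an ORF region and counts aligned nucleotide pairs that occupy the same codon position in both species. Codon-sized indels keep the frame; other indels shift it, a pattern typical of spurious ORFs. The function name is hypothetical.

```python
def reading_frame_conservation(aln_ref: str, aln_other: str) -> float:
    """Fraction of aligned nucleotide pairs sitting in the same codon position (frame)
    in both sequences; both sequences are assumed to start at the ORF's first base."""
    pos_ref = pos_other = 0  # ungapped positions in each sequence
    same = aligned = 0
    for x, y in zip(aln_ref, aln_other):
        if x != "-" and y != "-":
            aligned += 1
            same += pos_ref % 3 == pos_other % 3
        if x != "-":
            pos_ref += 1
        if y != "-":
            pos_other += 1
    return same / aligned if aligned else 0.0


print(reading_frame_conservation("ATGAAACCCGGG", "ATG---CCCGGG"))  # 1.0 (codon indel)
print(reading_frame_conservation("ATGAAACCCGGGTTT", "ATGA-ACCCGGGTTT"))  # low (frameshift)
```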

30
Identification of regulatory elements
  • REs are difficult to identify
  • Short (6-15bp), sequence variation, few known
    rules
  • De novo discovery of REs directly from genomic
    sequence.
  • Develop a motif conservation score system based
    on known motifs
  • 78 motifs discovered, overlapping with 36 of 55
    known motifs
  • Putative annotation of motifs using adjacent
    genes. (GO)
  • 25 of 42 new motifs show high category annotation
    correlation
  • Discovery of combinatorial control by REs
  • Applications to the human genome?
  • Increase the number of species in the comparison to
    overcome the low signal-to-noise ratio in humans.
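The core quantity behind a motif conservation score is how often instances of a motif in the reference genome are preserved at the aligned position in the other species, relative to control motifs. The simplified sketch below assumes an ungapped multiple alignment stored as a dictionary of equal-length strings and exact (non-degenerate) motifs. It is not the statistical test of Kellis et al., and all names are hypothetical.

```python
import math


def conservation_rate(alignment: dict[str, str], ref: str, motif: str) -> tuple[int, int]:
    """(conserved, total) instances of `motif` in the reference; an instance counts as
    conserved when every aligned sequence carries the identical string at those columns."""
    ref_seq = alignment[ref]
    k = len(motif)
    total = conserved = 0
    for i in range(len(ref_seq) - k + 1):
        if ref_seq[i:i + k] == motif:
            total += 1
            conserved += all(seq[i:i + k] == motif for seq in alignment.values())
    return conserved, total


def conservation_enrichment(alignment, ref, motif, controls):
    """Motif conservation rate divided by the pooled rate of caller-supplied control
    motifs (e.g. shuffled versions of the motif); nan when a rate is undefined or zero."""
    conserved, total = conservation_rate(alignment, ref, motif)
    ctrl = [conservation_rate(alignment, ref, m) for m in controls]
    ctrl_conserved = sum(c for c, _ in ctrl)
    ctrl_total = sum(t for _, t in ctrl)
    if total == 0 or ctrl_total == 0 or ctrl_conserved == 0:
        return math.nan
    return (conserved / total) / (ctrl_conserved / ctrl_total)


toy_alignment = {"ref": "TTGATAAGCCTTGATAAGCC",   # two motif instances
                 "sp2": "TTGATAAGCCTTGATCAGCC",   # second instance mutated
                 "sp3": "TTGATAAGCATTGATAAGCC"}
print(conservation_rate(toy_alignment, "ref", "GATAAG"))  # (1, 2)
```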

31
Multi-species comparisons from targeted genomic
regions. (Thomas et al, Nature 2003)
  • Comparing targeted regions in multiple
    evolutionarily diverse vertebrates (conservation
    is less likely to occur by chance)
  • ENCODE project
  • 44 genomic regions (14 manually selected, some of
    them disease-related; 30 random) of diverse
    gene density and non-exonic conservation
  • primates, bat, alligator, elephant, cat, emu,
    leopard, salmon etc.
  • Initial analysis: 1.8 Mb on chromosome 7
    containing 10 genes, including CFTR, from 12
    species.
  • Detection of 1000 multi-species conserved
    sequences, of which >60% would not be detected by
    a 2-species comparison.
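A crude way to call multi-species conserved sequences is to mark alignment columns where the other species carry the reference base and keep sufficiently long runs. The threshold rule below is an assumption for illustration, not the statistical method of Thomas et al. It assumes a precomputed multiple alignment that includes the reference, and the function name is hypothetical.

```python
def conserved_runs(alignment: dict[str, str], ref: str,
                   min_fraction: float = 1.0, min_len: int = 5) -> list[tuple[int, int]]:
    """(start, end) column intervals where at least `min_fraction` of the non-reference
    species carry the reference base, kept only if at least `min_len` columns long."""
    ref_seq = alignment[ref]
    others = [seq for name, seq in alignment.items() if name != ref]
    flags = []
    for col, base in enumerate(ref_seq):
        if base == "-":
            flags.append(False)
            continue
        agree = sum(seq[col] == base for seq in others)
        flags.append(agree >= min_fraction * len(others))
    runs, start = [], None
    for col, ok in enumerate(flags + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = col
        elif not ok and start is not None:
            if col - start >= min_len:
                runs.append((start, col))
            start = None
    return runs


toy = {"ref": "ACGTACGTACGT", "s1": "ACGTACTTTTTT",
       "s2": "ACGTACAAAAAA", "s3": "ACGTACCCCCCC"}
print(conserved_runs(toy, "ref"))  # [(0, 6)]
```

With more species, a run of fully conserved columns is less likely to arise by chance, which is the rationale for multi-species comparison given above.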

32
Comparative genomics at the vertebrate extremes
(Boffelli et al, Nature Reviews Genetics 2004)
  • What can be learned from comparisons of genomes
    that are distant or closely related in evolution?
  • Distant comparisons reveal the most constrained
    sequence elements.
  • Most of the conserved human-fish non-coding
    sequences are found near genes with roles in
    embryonic development.
  • Mutations in these conserved elements can play an
    important role in human disease.

33
  • Human-Fugu conservation of non-coding sequence in
    the DACH gene area (development of brain, limbs,
    sensory organs).
  • Validation of identified enhancer regions by
    driving expression of a reporter in mouse embryos.

34
Comparative genomics at the vertebrate extremes
  • Intraspecies sequence comparisons allow
    identification of species-specific sequences
  • Phylogenetic shadowing
  • Requires high rate of polymorphism
  • Comparison among primates shows human-specific
    sequences
  • Analysis of regulatory sequence of ApoA (involved
    in human heart disease)

A. Mutation rate analysis of the Ciona intestinalis
5' region of the forkhead gene. B. Validation
of identified potential regulatory elements in
Ciona larvae.
35
Pharmacogenetics in drug efficacy
Efficacy PGx for an obesity drug. Compare
genotypes 1-1, 1-2 and 2-2
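Comparing response across genotypes 1-1, 1-2 and 2-2 amounts to grouping outcomes by genotype. A common additional summary is an additive model that regresses response on the number of copies of one allele. The sketch below uses invented data; the response values and function names are hypothetical.

```python
from statistics import mean


def response_by_genotype(records: list[tuple[str, float]]) -> dict[str, float]:
    """Mean response per genotype group; records are (genotype, response) pairs."""
    groups: dict[str, list[float]] = {}
    for genotype, response in records:
        groups.setdefault(genotype, []).append(response)
    return {g: mean(v) for g, v in sorted(groups.items())}


def additive_slope(records: list[tuple[str, float]], counted_allele: str = "2") -> float:
    """Least-squares slope of response on the number of copies (0, 1, 2) of `counted_allele`."""
    x = [g.split("-").count(counted_allele) for g, _ in records]
    y = [r for _, r in records]
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Invented example: weight loss (kg) for six patients.
records = [("1-1", 2.0), ("1-1", 4.0), ("1-2", 5.0),
           ("1-2", 7.0), ("2-2", 8.0), ("2-2", 10.0)]
print(response_by_genotype(records))  # {'1-1': 3.0, '1-2': 6.0, '2-2': 9.0}
print(additive_slope(records))        # 3.0
```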