The gene family play and the chromosomal theater

Provided by: labsBioUn5
Learn more at: http://labs.bio.unc.edu
Transcript and Presenter's Notes


1
The gene family play and the chromosomal theater
  • Todd Vision
  • Department of Biology
  • University of North Carolina at Chapel Hill

2
Outline
  • Large-scale duplication and loss of genes in the
    angiosperms
  • Looking into the future of plant phylogenomics
  • A case study in gene family demography
  • Duplication and functional divergence

3
(No Transcript)
4
Arabidopsis as a hub for plant comparative maps
Data from Arumuganathan & Earle (1991) Plant Mol. Biol. Rep. 9, 208-218
5
Tomato-Arabidopsis synteny
Bancroft (2001) TIG 17, 89 after Ku et al (2000)
PNAS 97, 9121
6
Duplicated genes in Arabidopsis
7
Modes of gene duplication
  • Tandem (T)
    • unequal crossing-over
    • mostly young
  • Dispersed (D)
    • transposition
    • all ages
  • Segmental (S)
    • polyploidy
    • all old

8
Paleotetraploidy?
The Arabidopsis Genome Initiative (2000) Nature 408, 796
9
Vision et al. (2000) Science 290, 2114-7.
10
Microsynteny within blocks
11
Distribution of dA (in blocks vs. not in blocks)
  • Problems
    • proteins diverge at different rates
    • high dA is difficult to estimate
  • Solution
    • average dA within blocks
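
The block-averaging step can be sketched as follows. This is a minimal illustration, assuming each duplicated gene pair has already been assigned to a block and given a dA estimate; the function name, block labels, and dA values are hypothetical and are not data from the talk.

```python
from collections import defaultdict
from statistics import mean


def mean_dA_per_block(pairs):
    """Average dA over all duplicated gene pairs that share a block.

    pairs: iterable of (block_id, dA) tuples, one per gene pair.
    """
    by_block = defaultdict(list)
    for block_id, d_a in pairs:
        by_block[block_id].append(d_a)
    return {block_id: mean(values) for block_id, values in by_block.items()}


# Hypothetical values, chosen only to show the mechanics.
example_pairs = [("block_1", 0.40), ("block_1", 0.60), ("block_2", 0.90)]
block_means = mean_dA_per_block(example_pairs)
```

Averaging within a block pools pairs that presumably arose in the same duplication event, so rate differences among individual proteins partly cancel out.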

12
discrete duplication events
13
The 2-4 complex (one ancestral segment broken up by 4 large inversions)
14
coefficient of variation 0.67
coefficient of variation 0.53
15
Rice-Arabidopsis microsynteny
Mayer et al. (2001) Genome Res. 11, 1167
16
Blanc, Hokamp, Wolfe (2003) Genome Res. 13,
137-144.
17
(No Transcript)
18
Block 37 after Asterid-Rosid split
Block 57 before monocot-dicot divergence
Raes, Vandepoele, Saeys, Simillion, Van de Peer
(2003) J. Struct. Func. Genomics 3, 117-129
19
Divergence among duplicated genes in rice
Goff et al. (2002) Science 296, 92
20
Hidden syntenies
Simillion, Vandepoele, Van Montagu, Zabeau, Van
de Peer (2002) PNAS 99, 13627
21
Interspecies comparison can reveal hidden
syntenies
Vandepoele, Simillion, Van de Peer (2002) TIG 18,
606-608
22
Comparative mapping in a phylogenetic context
23
Major plant genome datasets
Family         Genus / species                 genome / EST / map
Aizoaceae      Mesembryanthemum crystallinum   X
Brassicaceae   Arabidopsis thaliana            X X X
Brassicaceae   Brassica spp.                   X
Fabaceae       Glycine max                     X X
Fabaceae       Medicago truncatula             X X
Fabaceae       Phaseolus spp.                  X
Malvaceae      Gossypium arboreum              X X
Solanaceae     Capsicum annuum                 X
Solanaceae     Lycopersicon esculentum         X X
Solanaceae     Solanum tuberosum               X X
Poaceae        Hordeum vulgare                 X X
Poaceae        Oryza sativa                    X X X
Poaceae        Sorghum bicolor/propinquum      X X
Poaceae        Triticum aestivum               X X
Poaceae        Zea mays                        X X
Other          Beta vulgaris                   X
Other          Chlamydomonas reinhardtii       X X
Other          Pinus taeda                     X X


24
Plant unigene datasets
  • species TIGR PlantGDB
  • barley 49885 74621
  • beet na 13565
  • chlamydomonas 30296 na
  • citrus na 4266
  • coffee na 392
  • cotton 24350 27854
  • grape 49885 74621
  • iceplant 8455 8945
  • lettuce 21960 na
  • lotus 11025 na
  • maize 55063 71655
  • marchantia na 1059
  • medicago 36976 43384
  • oat na 361
  • onion 11726 na
  • pine 26882 24668
  • poplar na 20935
  • potato 24275 24839

25
Wikström et al (2001) Proc R Soc Lond B 268, 2211
26
Plant phylogenomics: Phytome
  • The goal is to integrate
    • Organismal phylogeny
    • Gene family
      • sequence
      • alignment
      • phylogeny
    • Genetic and physical maps

27
Some uses for Phytome
  • Starting with a chromosome segment
    • Identify homologous segments
    • Predict unobserved gene content (candidate QTL)
  • Starting with a gene family
    • Resolve orthology/paralogy relationships
    • Identify coevolving families
  • Starting with a species
    • Explore lineage-specific diversification
    • Guide comparative mapping wet-work

28
Current pipeline
Homolog identification
Protein sequence prediction
Unigene collections
Protein family clustering
Annotations
Multiple sequence alignment
Phytome
Phylogenetic inference
29
(No Transcript)
30
Lineage-specific diversification
[Figure: counts for Arabidopsis, Cotton, Medicago, Tomato, and Rice. Values shown: 1033, 436, 173, 334, 836, 696, 715, 919]
152 genes are single copy in all four species
31
A tale of two sisters: the ARF and the Aux/IAA
gene families
  • Modulate whole plant response to auxin
  • Interact via dimerization
  • ARFs are transcription factors
  • Aux/IAAs bind and repress ARFs in the absence of
    auxin

32
The chromosomal context
33
Diversification of ARFs
34
Diversification of the Aux/IAAs
35
(No Transcript)
36
Why the different patterns of diversification?
  • 12 (ARF) vs 40 (Aux/IAA) segmental duplications
  • Presumably reflects differential retention
  • Possible explanations
    • Dosage requirements
    • Coevolution with other interacting genes
    • Regional transcriptional regulation

37
Divergence of duplicated genes
Divergence in expression profile
Age of duplication
38
Duplicate pairs in yeast and human (Gu et al.
2002, Makova and Li 2003)
  • Approx. 50% of pairs diverge very rapidly
  • Proportion of divergent pairs increases with Ks
    and Ka
    • Plateaus at a Ka of 0.3 in human
  • In humans,
    • Immune response genes are over-represented among
      young, divergent pairs
    • Distantly related pairs with conserved expression
      tend to be either ubiquitous or very tissue
      specific

39
Retention of duplicated genes
  • Nonfunctionalization, or loss of one copy
    • The fate of most pairs
  • Neofunctionalization (NF)
    • Positive selection on a new mutation can maintain
      the pair
  • Subfunctionalization (SF)
    • Mutations that increase the specificity of
      duplicates can fix by drift, provided that,
      combined, the two copies provide the
      functionality of the ancestral gene. Once SF
      happens, both copies are indispensable and are
      retained.
    • One prediction of the model is that SF is more
      likely for tandem than for dispersed pairs (due
      to linkage)

40
Digital expression profiling
  • Massively Parallel Signature Sequencing (MPSS)
    • Count occurrence of 17-20 bp mRNA signatures
    • Cloning and sequencing are done on microbeads
    • Similar to Serial Analysis of Gene Expression
      (SAGE)
  • Bar-code counting reduces concerns of
    • cross-hybridization
    • probe affinity
    • background hybridization
  • Advantages
    • Accurate counts of low expression genes
    • Can distinguish expression profiles of duplicate
      genes

41
MPSS library construction
Brenner et al., PNAS 97, 1665-70.
GATC
42
MPSS library construction
Brenner et al., PNAS 97, 1665-70.
Sort by FACS to remove empty beads
The result of the library construction is a set
of microbeads. Each bead contains many DNA
molecules, all derived from the 3' end of a
single transcript. Beads are loaded in a
monolayer on a microscope slide for the
sequencing of 17-20 bp from the 5' end.
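
As a rough illustration of how a signature relates to its transcript, the sketch below pulls a 17-base signature that starts at the GATC site nearest the 3' end of a sequence. The assumption that signatures begin at the 3'-most GATC is ours (the signatures shown in the talk all start with GATC); the function name and the toy transcript are hypothetical.

```python
def mpss_signature(transcript, site="GATC", length=17):
    """Return the `length`-base signature starting at the 3'-most `site`.

    Returns None if the site is absent or lies too close to the 3' end
    to yield a full-length signature.
    """
    seq = transcript.upper()
    start = seq.rfind(site)  # rfind gives the last, i.e. 3'-most, occurrence
    if start == -1 or start + length > len(seq):
        return None
    return seq[start:start + length]


# Hypothetical transcript fragment built around one signature from the talk.
toy = "ATGGCCGATCTTTACCGGATCAATCGGACTTGTCAAAGG"
sig = mpss_signature(toy)
```

Because every molecule on a bead comes from one transcript, each bead yields one such signature, and identical signatures can then be counted across beads.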
43
MPSS Sequencing
Brenner et al., Nat. Biotech. 18, 630-4.
44
MPSS Sequencing
Each bead provides a signature of 17-20 bp
Tag #     Signature sequence    # of beads (frequency)
1         GATCAATCGGACTTGTC     2
2         GATCGTGCATCAGCAGT     53
3         GATCCGATACAGCTTTG     212
4         GATCTATGGGTATAGTC     349
5         GATCCATCGTTTGGTGC     417
6         GATCCCAGCAAGATAAC     561
7         GATCCTCCGTCTTCACA     672
8         GATCACTTCTCTCATTA     702
9         GATCTACCAGAACTCGG     814
. .       . .                   . .
30,285    GATCGGACCGATCGACT     2,935
Total # of tags > 1,000,000
Two sets of signatures are generated from each
sample in different reading frames staggered by
two bases
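
The counting step behind a table like this can be sketched in a few lines. The example assumes one signature read per bead and scales counts to parts per million (ppm), the unit used for expression levels later in the talk; the talk does not spell out its normalization, so the scaling, the function name, and the bead counts below are illustrative assumptions.

```python
from collections import Counter


def signature_ppm(bead_reads):
    """Count identical signatures and scale each count to parts per million."""
    counts = Counter(bead_reads)
    total = sum(counts.values())
    return {sig: n * 1_000_000 / total for sig, n in counts.items()}


# Toy input: the sequences appear on the slide, the bead counts are invented.
reads = ["GATCAATCGGACTTGTC"] * 3 + ["GATCGTGCATCAGCAGT"] * 1
ppm = signature_ppm(reads)
```

Scaling by the library total makes libraries of different sequencing depth comparable.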
45
Classifying signatures
Typical signatures
46
Core Arabidopsis MPSS libraries sequenced by Lynx
for Blake Meyers, U. of Delaware
Library   Signatures sequenced   Distinct signatures
Root      3,645,414              48,102
Shoot     2,885,229              53,396
Flower    1,791,460              37,754
Callus    1,963,474              40,903
Silique   2,018,785              38,503
TOTAL     12,304,362             133,377
47
http://www.dbi.udel.edu/mpss
  • Query by
    • Sequence
    • Arabidopsis gene identifier
    • chromosomal position
    • BAC clone ID
    • MPSS signature
    • Library comparison
  • Site includes
    • Library and tissue information
    • FAQs and help pages

48
Genome-wide MPSS profile in Arabidopsis
Of the 29,084 gene models, 17,849 match
unambiguous, expressed class 1 and/or 2 signatures
49
Dataset of duplicate pairs
  • Gene families of size two in Arabidopsis
    classified as
    • Dispersed (280)
    • Segmental (149)
    • Tandem (63)
  • For each pair
    • Measure similarity/distance in expression profile
    • Estimate Ks and Ka
50
Expression distance
51
The number of genes with >5 ppm expression in a
given number of libraries, among the 984 genes in
pairs analyzed and among all Arabidopsis genes
with MPSS profiles.
Libraries   Genes in pairs   All genes
0           153 (15.5%)      4160 (23.3%)
1           124 (12.6%)      2643 (14.8%)
2           73 (7.4%)        1727 (9.6%)
3           93 (9.5%)        1777 (10.0%)
4           109 (11.1%)      1930 (10.8%)
5           432 (43.9%)      5612 (31.4%)
52
Asymmetry in levels of expression among libraries
within pairs
Symmetry of divergence
Type of pair             A            B             C            D
Young
  Dispersed (Ks ≤ 0.5)   14 (15.7%)   61 (68.5%)    8 (9.0%)     6 (6.7%)
  Tandem (Ks ≤ 0.5)      8 (14.3%)    29 (51.8%)    10 (17.9%)   9 (16.1%)
Old
  Dispersed (Ks > 0.5)   35 (18.3%)   111 (58.1%)   24 (12.6%)   21 (11.0%)
  Segmental (all)        31 (20.8%)   104 (69.8%)   7 (4.7%)     7 (4.7%)
A: Each copy has higher expression in at least one library
53
dN = 0.48 + 0.37 × KA, p < 0.0001
54
(No Transcript)
55
Pairs with small Ks but dissimilar expression
profiles.
Ks     Ka      dup   gene pair   callus   flower   leaf   root   silique
0.03   <0.01   D     AT1G80700   71       59       11     140    94
                     AT1G80980   0        0        1      8      17
0.17   0.05    T     AT2G46280   246      210      160    308    80
                     AT2G46290   28       29       1      29     16
0.20   0.06    T     AT2G15400   4        14       5      5      34
                     AT2G15430   42       128      14     136    18
0.22   0.05    D     AT1G36280   1        3        9      13     10
                     AT4G18440   40       87       69     69     51
0.26   0.05    T     AT1G71270   88       56       44     52     107
                     AT1G71300   0        0        0      0      1
0.27   0.07    T     AT3G13290   20       22       1      1      6
                     AT3G13300   246      245      72     192    77

56
Pairs with large Ks but similar expression
profiles.
Ks     Ka     dup   gene pair   callus   flower   leaf   root   silique
0.87   0.28   T     AT3G16220   16       10       57     3      19
                    AT3G16230   21       12       35     13     13
0.89   0.13   D     AT3G03660   14       0        0      0      0
                    AT5G17810   71       0        0      0      0
0.95   0.29   D     AT2G41180   57       14       78     4      29
                    AT3G56710   75       15       39     3      14
0.97   0.28   D     AT1G31814   2        39       4      3      0
                    AT5G16320   0        55       10     19     8
0.98   0.23   D     AT5G07230   0        344      0      0      0
                    AT5G62080   0        288      0      0      0
0.99   0.26   D     AT3G22160   86       6        10     4      4
                    AT4G15120   34       2        0      0      0

57
A closing thought
  • 1965
    • The Ecological Theater and the Evolutionary Play,
      G. E. Hutchinson
  • 2004
    • The Chromosomal Theater and the Gene Family Play
  • Phylogenetics has a great deal to contribute to
    understanding the evolutionary interplay of
    genome structure and function

58
Dan Brown, Brandon Gaut, Steven Tanksley, Liqing
Zhang, Jason Phillips, Dihui Lu, David
Remington, Jason Reed, Tom Guilfoyle, Blake
Meyers, NSF