Putting gene family evolution in its chromosomal context

Learn more at: http://labs.bio.unc.edu

1
Putting gene family evolution in its chromosomal
context
  • Todd Vision
  • Department of Biology
  • University of North Carolina at Chapel Hill

2
Abstract
  • In complex genomes, the continual duplication,
    functional divergence, and loss of genes over
    time results in gene content divergence among
    related lineages. In addition to changes in
    content, the order of genes within the genome can
    be disturbed by a host of different rearrangement
    events. Changes in gene content and order are of
    interest for a number of reasons. Such mutations,
    particularly those that affect gene content, may,
    as a class, have dramatic phenotypic
    consequences; thus, they merit study from a
    functional perspective. In order to predict the
    location of genes in non-model organisms using
    comparative mapping, molecular breeders will need
    to have better models for how gene content and
    order evolve. And from an evolutionary
    perspective, it is of interest to understand how
    closely gene content and gene order are
    directly governed by selective forces, and what
    other forces are at work. Here, I describe what
    we currently know about the evolution of gene
    content and order among the flowering plants.
    This clade contains all of the world's major food
    crops, and is thus the focus of a great deal of
    comparative mapping effort. I will offer my
    thoughts on what computational biology has to
    contribute to this emerging area of inquiry.

3
Outline
  • Gene order rearrangement in plants
  • Chromosomal perspective
  • Gene family perspective
  • Gene duplication and functional divergence
  • Segmental duplications as a tool

4
(No Transcript)
5
Chromosomal perspective
  • Biological importance
  • Clustering of gene function
  • Clustering of transcriptional activity
  • Applied importance
  • Conservation of gene order (synteny)

6
Devos and Gale 2000 Plant Cell 12, 637
7
Arabidopsis as a hub for plant comparative maps
Arumuganathan and Earle 1991 Plant Mol Biol Rep
9, 208.
8
Arabidopsis paleopolyploidy
The Arabidopsis Genome Initiative 2000 Nature
408, 796
9
Non-overlapping syntenies
10
Blanc et al. 2003 Genome Res. 13, 137.
11
Blanc and Wolfe 2004 Plant Cell 16, 1667.
12
Tomato-Arabidopsis synteny
Bancroft 2001 TIG 17, 89 after Ku et al. 2000
PNAS 97, 9121.
13
Rice-Arabidopsis microsynteny
Mayer et al. 2001 Genome Res. 11, 1167.
14
(No Transcript)
15
Hidden syntenies
Simillion et al. 2002 PNAS 99, 13627.
16
Interspecies comparison can reveal hidden
syntenies
Vandepoele et al. 2002 TIG 18, 606.
17
Simillion et al. 2004 Genome Res. 14, 1095
18
From descriptive to predictive
  • Can we predict the gene content of homologous
    segments when markers are sparse?
  • Utility for QTL mapping
  • Prioritize candidate genes in a QTL region from a
    non-sequenced genome
  • Provide markers for fine-mapping

19
Hidden Markov Models (HMM)
[Figure: an HMM with hidden states 1, 2, and end;
transition probabilities t1,1, t1,2, t2,2, and t2,end;
and emission probabilities p1(a), p1(b), p2(a), and p2(b).]
Observed states: a -> b -> a
Hidden states: 1 -> 1 -> 2 -> end
Probability: p1(a) t1,1 p1(b) t1,2 p2(a) t2,end
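The product on this slide can be computed mechanically: each observed symbol contributes its emission probability from the current hidden state, followed by the transition to the next state. A minimal sketch follows; the function name and all numeric values are illustrative assumptions, not taken from the talk.

```python
def path_probability(observed, hidden, emit, trans):
    """Joint probability of an observed sequence and one hidden-state path."""
    prob = 1.0
    # Pair each observed symbol with its hidden state and the state that follows.
    for symbol, state, next_state in zip(observed, hidden, hidden[1:]):
        prob *= emit[state][symbol] * trans[(state, next_state)]
    return prob

# Hypothetical emission and transition probabilities (not from the slides).
emit = {1: {"a": 0.7, "b": 0.3}, 2: {"a": 0.4, "b": 0.6}}
trans = {(1, 1): 0.6, (1, 2): 0.4, (2, 2): 0.5, (2, "end"): 0.5}

# Observed a -> b -> a with hidden path 1 -> 1 -> 2 -> end, as on the slide.
p = path_probability("aba", [1, 1, 2, "end"], emit, trans)
print(p)
```

The loop reproduces the slide's ordering term by term: p1(a) t1,1 p1(b) t1,2 p2(a) t2,end.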
20
A gene content HMM
  • Observed states
  • a homologous gene is either observed or not
  • Hidden states
  • presence or absence of gene within a segment
  • Emission probabilities
  • A gene will be unobserved if it is not present
  • A gene may be unobserved even if it is present
  • Dependent on the density of the gene map
  • Transition probabilities
  • reflect conservation of gene content along the
    branches of a phylogeny
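
The emission rules above imply that a missing observation is only weak evidence of absence when the map is sparse. A minimal sketch of that reasoning with Bayes' rule, assuming a single detection probability stands in for map density; the function name, the parameter names, and the numbers are hypothetical.

```python
def posterior_present_given_unobserved(prior_present, detect_prob):
    """P(gene present | gene not observed), by Bayes' rule.

    Assumes an absent gene is never observed and a present gene is
    observed with probability detect_prob (a stand-in for map density).
    """
    # Present but missed because the map is incomplete.
    missed = prior_present * (1.0 - detect_prob)
    # Absent genes are always unobserved.
    absent = 1.0 - prior_present
    if missed + absent == 0.0:
        raise ValueError("an unobserved gene is impossible under these inputs")
    return missed / (missed + absent)

# Hypothetical values: the same prior under a sparse map and a dense map.
sparse = posterior_present_given_unobserved(prior_present=0.5, detect_prob=0.2)
dense = posterior_present_given_unobserved(prior_present=0.5, detect_prob=0.9)
print(sparse, dense)
```

With a dense map, failing to observe a gene pushes the posterior toward absence; with a sparse map, the posterior stays close to the prior.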

21
Transition probabilities and the segment phylogeny
22
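[Figure: state-transition diagrams for three models.
Loss (L): states P and A, with probabilities a, 1-a, and 1.
Loss-Gain (LG): states A1, P, and A2, with probabilities
a, 1-a, b, 1-b, and 1.
Multiple Loss-Gain (MLG): states A1, P, and A2, with
probabilities ai, 1-ai, b, 1-b, and 1.]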
23
Estimating model parameters
  • Segment phylogeny
  • Each set of homologous genes is missing from some
    segments
  • Estimate an averaged distance matrix
  • Build tree with neighbor-joining and midpoint
    rooting
  • HMM parameter estimation
  • Loss rate(s)
  • Gain rate
  • Number of genes present at the root

24
Do parameter estimates converge?
LG model, n = 100 genes, no missing data, a1 = 0.1,
a2 = 0.3, 1000 replicates
Initial a   a1 estimate   SE      a2 estimate   SE
0.05        0.106         0.006   0.294         0.018
0.3         0.106         0.006   0.294         0.018
25
Accuracy of hidden state assignments
5 segment phylogeny, a ? 10.1, ?20.3, ?0.1,
24 gain
26
A large multiplicon
12 segments from rice and arabidopsis 56 sets of
homologous genes
Vandepoele et al 2003 Plant Cell 15, 2192.
27
Self-validation test
28
Probability of gene presence (8 longest segments)
Segment   True    Estimate   Diff
1         0.251   0.173       0.078
2         0.225   0.166       0.059
3         0.262   0.171       0.091
4         0.149   0.175      -0.026
5         0.268   0.171       0.097
6         0.233   0.167       0.066
7         0.226   0.170       0.056
8         0.148   0.168      -0.020
Branch lengths scaled so that longest branch is
1.0. Estimate of a = 0.7
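
The Diff column is True minus Estimate, which can be checked directly from the eight rows on this slide. A minimal sketch follows; the mean absolute difference is an added summary that does not appear in the original.

```python
# True and estimated probabilities of gene presence, copied from the slide.
true_p = [0.251, 0.225, 0.262, 0.149, 0.268, 0.233, 0.226, 0.148]
est_p = [0.173, 0.166, 0.171, 0.175, 0.171, 0.167, 0.170, 0.168]
slide_diff = [0.078, 0.059, 0.091, -0.026, 0.097, 0.066, 0.056, -0.020]

# Recompute Diff = True - Estimate, rounded to the slide's precision.
diff = [round(t - e, 3) for t, e in zip(true_p, est_p)]

# Added summary (not on the slide): mean absolute difference across segments.
mean_abs_diff = sum(abs(d) for d in diff) / len(diff)
print(diff, mean_abs_diff)
```

The estimates vary much less across segments than the true values do, which is visible in the sign pattern of the differences.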
29
Summary: gene content HMM
  • Multispecies comparative maps
  • Becoming more common
  • Most species only partially characterized
  • Usefulness also compromised by sparse synteny
  • Probabilistic models will allow us to move from
    simple descriptions of the extent of synteny to
    predictive tools that can guide further
    experiments

30
Gene family perspective
  • Modes of duplication
  • Tandem (T)
  • Dispersed (D)
  • Segmental (S)

31
A tale of two sisters: the ARF and the Aux/IAA
gene families
  • Modulate whole plant response to auxin
  • Interact via dimerization
  • ARFs are transcription factors
  • Aux/IAAs bind and repress ARFs in the absence of
    auxin

32
Diversification of ARFs
Remington et al. 2004 Plant Physiol. 135, 1738
33
The chromosomal context
Remington et al. 2004 Plant Physiol. 135, 1738
34
Diversification of the Aux/IAAs
Remington et al. 2004 Plant Physiol. 135, 1738
35
Remington et al. 2004 Plant Physiol. 135, 1738
36
Why the different patterns of diversification?
  • 12 (ARF) vs 40 (Aux/IAA) segmental duplications
  • Presumably reflects differential retention
  • Possible explanations
  • Dosage requirements
  • Coevolution with other interacting genes
  • Regional transcriptional regulation

37
How typical is the Aux/IAA family?
Gene family                          Genes   S events
Proteasome alpha beta subunits       23      9
Ser/Thr phosphatase                  26      10
Ras related GTP-binding              72      19
Auxin-independent growth promoter    33      8
Major intrinsic protein              38      10
Calmodulin                           79      20
Phosphatidylcholine transferase      30      8
Cation/hydrogen exchanger            28      8
Cannon et al. 2004 BMC Plant Biology 4, 10.
38
Segmental duplication of pathways?
Blanc and Wolfe 2004 Plant Cell 16, 1679.
39
Summary: gene family perspective
  • Chromosomal context can matter
  • Gene families differ in their patterns of
    duplicate gene proliferation
  • Presumably due to differential retention
  • Polyploidy
  • Qualitatively differs from other gene duplication
    modes
  • Divergence of whole pathways possible

40
Functional divergence and chromosomal context
  • Do patterns of divergence (i.e., spatiotemporal
    expression) differ among T, D, and S duplicates?

41
Retention of duplicated genes
  • Neofunctionalization (NF)
  • Mutations lead to new divergent functions that
    are positively selected
  • Subfunctionalization (SF)
  • Mutations knock out ancestral functions and make
    both copies indispensable
  • New divergent functions evolve secondarily
  • SF more likely for tandem than dispersed pairs
    (due to linkage)
  • There are other possibilities
  • Duplicates retained when higher expression is
    favored

42
Divergence of duplicated genes
Divergence in expression profile
Age of duplication
43
Duplicate pairs in yeast and human (Gu et al.
2002, Makova and Li 2003)
  • Approx. 50% of pairs diverge very rapidly
  • Proportion of divergent pairs increases with
    synonymous substitutions (Ks)
  • Less so with replacement changes (Ka)
  • Plateaus at Ka of 0.3 in human
  • In humans, distantly related pairs with conserved
    expression tend to be either ubiquitous or very
    tissue specific

44
Digital expression profiling
  • Massively Parallel Signature Sequencing (MPSS)
  • Count occurrence of 17-20 bp mRNA signatures
  • Cloning and sequencing is done on microbeads
  • Similar to Serial Analysis of Gene Expression
    (SAGE)
  • Bar-code counting reduces concerns of
  • cross-hybridization
  • probe affinity
  • background hybridization
  • Which enables
  • Accurate counts of low expression genes
  • Distinguishing expression profiles of duplicate
    genes

45
MPSS technology
Sort by FACS and deposit in channeled monolayer
Clone 3' ends of transcripts to microbeads
Sequence 17-20 bp from 5' end by hybridization
Brenner et al. 2000 PNAS 97, 1665.
46
MPSS Data
Signature            Frequency
GATCAATCGGACTTGTC    2
GATCGTGCATCAGCAGT    53
GATCCGATACAGCTTTG    212
GATCTATGGGTATAGTC    349
GATCCATCGTTTGGTGC    417
GATCCCAGCAAGATAAC    561
GATCCTCCGTCTTCACA    672
GATCACTTCTCTCATTA    702
GATCTACCAGAACTCGG    814
. .                  . .
GATCGGACCGATCGACT    2,935
Total number of tags: >1,000,000
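
Raw signature counts are usually compared across libraries after scaling by library size. A minimal sketch of converting counts to parts per million (ppm), the unit used for the expression threshold later in the talk. The library total of 1,250,000 tags is hypothetical (the slide says only that the total exceeds 1,000,000), and the function name is illustrative.

```python
def to_ppm(counts, total_tags):
    """Scale raw signature counts to parts per million of all tags."""
    return {sig: 1_000_000 * n / total_tags for sig, n in counts.items()}

# Counts for three of the signatures listed on the slide.
counts = {
    "GATCAATCGGACTTGTC": 2,
    "GATCCATCGTTTGGTGC": 417,
    "GATCGGACCGATCGACT": 2935,
}
# Hypothetical library size; not a value from the talk.
TOTAL_TAGS = 1_250_000

ppm = to_ppm(counts, TOTAL_TAGS)
# Keep signatures above a 5 ppm threshold.
expressed = [sig for sig, value in ppm.items() if value > 5]
print(ppm, expressed)
```

Normalizing first matters because the five core libraries differ almost twofold in the number of signatures sequenced.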
47
Classifying signatures
Typical signatures
48
Core Arabidopsis MPSS libraries sequenced by Lynx
for Blake Meyers, U. of Delaware
Library   Signatures sequenced   Distinct signatures
Root      3,645,414              48,102
Shoot     2,885,229              53,396
Flower    1,791,460              37,754
Callus    1,963,474              40,903
Silique   2,018,785              38,503
TOTAL     12,304,362             133,377
49
http://www.dbi.udel.edu/mpss
  • Query by
  • Sequence
  • Arabidopsis gene identifier
  • chromosomal position
  • BAC clone ID
  • MPSS signature
  • Library comparison
  • Site includes
  • Library and tissue information
  • FAQs and help pages

50
Genome-wide MPSS profile in Arabidopsis
Of the 29,084 gene models, 17,849 match
unambiguous, expressed class 1 and/or 2 signatures
51
Dataset of duplicate pairs
  • Arabidopsis gene families of size 2 classified as
  • Dispersed (280)
  • Segmental (149)
  • Tandem (63)
  • For each pair
  • Measured similarity/distance in expression
    profile
  • Estimated silent (Ks) and replacement (Ka) changes
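
The slides do not say which expression distance was used. As an illustration only, the sketch below uses one common choice, 1 minus the Pearson correlation across libraries; the metric, the function name, and the profiles are all assumptions.

```python
from math import sqrt

def expression_distance(profile_a, profile_b):
    """1 - Pearson correlation between two expression profiles.

    This metric is an assumed stand-in; the talk does not specify one.
    """
    n = len(profile_a)
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a flat profile")
    return 1.0 - cov / sqrt(var_a * var_b)

# Hypothetical ppm values across five libraries for three gene copies.
copy_1 = [10, 20, 30, 40, 50]
copy_2 = [20, 40, 60, 80, 100]   # same pattern as copy_1, twice the level
copy_3 = [50, 40, 30, 20, 10]    # reversed pattern

d_same = expression_distance(copy_1, copy_2)
d_opposite = expression_distance(copy_1, copy_3)
print(d_same, d_opposite)
```

A correlation-based distance ignores overall expression level, so a pair that differs only in abundance scores as identical; a level-sensitive metric would behave differently.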

52
Expression distance
53
Major findings
  • Many pairs are divergent in sequence but not
    expression and vice versa
  • Pairs have atypically high expression
  • Especially slowly evolving pairs
  • Divergence increases with Ka,
  • Particularly among S duplicates!
  • Divergence tends to be highly asymmetric

54
Expression level >5 ppm in x libraries
Libraries   Genes in pairs   All genes
0           153 (15.5%)      4160 (23.3%)
1           124 (12.6%)      2643 (14.8%)
2           73 (7.4%)        1727 (9.6%)
3           93 (9.5%)        1777 (10.0%)
4           109 (11.1%)      1930 (10.8%)
5           432 (43.9%)      5612 (31.4%)
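
The figures in parentheses are column percentages. A minimal sketch that recomputes them from the counts on this slide; the variable and function names are illustrative.

```python
# Counts copied from the slide: genes expressed above 5 ppm in x libraries.
libraries = [0, 1, 2, 3, 4, 5]
genes_in_pairs = [153, 124, 73, 93, 109, 432]
all_genes = [4160, 2643, 1727, 1777, 1930, 5612]

def percentages(counts):
    """Express each count as a percentage of the column total."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]

pair_pct = percentages(genes_in_pairs)
all_pct = percentages(all_genes)
for x, p, a in zip(libraries, pair_pct, all_pct):
    print(f"{x} libraries: pairs {p:.1f}%, all genes {a:.1f}%")
```

The column totals are 984 genes in pairs (twice the 492 duplicate pairs in the dataset) and 17,849 genes overall, the same number of gene models reported as matching expressed signatures.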

55
(No Transcript)
56
dN = 0.48 + 0.37 × KA, p < 0.0001
57
Asymmetric divergence
Type of pair              A       B       C       D
Young
  Dispersed (Ks ≤ 0.5)    14      61      8       6
                          15.7%   68.5%   9.0%    6.7%
  Tandem (Ks ≤ 0.5)       8       29      10      9
                          14.3%   51.8%   17.9%   16.1%
Old
  Dispersed (Ks > 0.5)    35      111     24      21
                          18.3%   58.1%   12.6%   11.0%
  Segmental (all)         31      104     7       7
                          20.8%   69.8%   4.7%    4.7%
  • A: Each copy has higher expression in at least
    one library
  • B: One copy has higher expression in all
    libraries that differ, and at least two libraries
    differ
  • C: Copies differ in expression in only one
    library
  • D: Copies do not differ in expression in any
    libraries
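
The four categories follow mechanically once each library has been scored for which copy, if either, is more highly expressed. A minimal sketch of that classification; how a per-library difference is called (for example, a statistical test on tag counts) is not stated on the slide, so the input encoding and function name here are assumptions.

```python
def classify_pair(calls):
    """Assign a duplicate pair to category A, B, C, or D.

    calls: one entry per library: +1 if copy 1 is higher, -1 if copy 2
    is higher, 0 if the copies do not differ (an assumed encoding).
    """
    up = sum(1 for c in calls if c > 0)
    down = sum(1 for c in calls if c < 0)
    differing = up + down
    if differing == 0:
        return "D"  # no library differs
    if differing == 1:
        return "C"  # exactly one library differs
    if up > 0 and down > 0:
        return "A"  # each copy is higher in at least one library
    return "B"      # one copy is higher in every differing library

# Hypothetical calls across five libraries.
examples = {
    "A": [1, -1, 0, 0, 0],
    "B": [1, 1, 0, 0, 0],
    "C": [0, 0, -1, 0, 0],
    "D": [0, 0, 0, 0, 0],
}
results = {label: classify_pair(calls) for label, calls in examples.items()}
print(results)
```

Category B is the asymmetric case: one copy is consistently the more highly expressed wherever the two differ.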

58
Why put gene family evolution into a chromosomal
context?
  • We can begin to understand and utilize patterns
    of evolution in gene order
  • We can gain insight into the function and
    evolution of gene families that is not apparent
    from beanbag genomics

59
Thanks to Zongli Xu, David Remington, Jason Reed,
Tom Guilfoyle, Blake Meyers, NSF