Title: Putting gene family evolution in its chromosomal context
1Putting gene family evolution in its chromosomal
context
- Todd Vision
- Department of Biology
- University of North Carolina at Chapel Hill
2Abstract
- In complex genomes, the continual duplication,
functional divergence, and loss of genes over
time results in gene content divergence among
related lineages. In addition to changes in
content, the order of genes within the genome can
be disturbed by a host of different rearrangement
events. Changes in gene content and order are of
interest for a number of reasons. Such mutations,
particularly those that affect gene content, may,
as a class, have dramatic phenotypic
consequences; thus, they merit study from a
functional perspective. In order to predict the
location of genes in non-model organisms using
comparative mapping, molecular breeders will need
to have better models for how gene content and
order evolve. And from an evolutionary
perspective, it is of interest to understand how
closely gene content and gene order are
directly governed by selective forces, and what
other forces are at work. Here, I describe what
we currently know about the evolution of gene
content and order among the flowering plants.
This clade contains all of the world's major food
crops, and is thus the focus of a great deal of
comparative mapping effort. I will offer my
thoughts on what computational biology has to
contribute to this emerging area of inquiry.
3Outline
- Gene order rearrangement in plants
- Chromosomal perspective
- Gene family perspective
- Gene duplication and functional divergence
- Segmental duplications as a tool
5Chromosomal perspective
- Biological importance
- Clustering of gene function
- Clustering of transcriptional activity
- Applied importance
- Conservation of gene order (synteny)
6Devos and Gale 2000 Plant Cell 12, 637
7Arabidopsis as a hub for plant comparative maps
Arumuganathan and Earle 1991 Plant Mol Biol Rep
9, 208.
8Arabidopsis paleopolyploidy
The Arabidopsis Genome Initiative 2000 Nature
408, 796
9Non-overlapping syntenies
10Blanc et al. 2003 Genome Res. 13, 137.
11Blanc and Wolfe 2004 Plant Cell 16, 1667.
12Tomato-Arabidopsis synteny
Bancroft 2001 TIG 17, 89 after Ku et al. 2000
PNAS 97, 9121.
13Rice-Arabidopsis microsynteny
Mayer et al. 2001 Genome Res. 11, 1167.
15Hidden syntenies
Simillion et al. 2002 PNAS 99, 13627.
16Interspecies comparison can reveal hidden
syntenies
Vandepoele et al. 2002 TIG 18, 606.
17Simillion et al. 2004 Genome Res. 14, 1095
18From descriptive to predictive
- Can we predict the gene content of homologous segments when markers are sparse?
- Utility for QTL mapping
- Prioritize candidate genes in a QTL region from a non-sequenced genome
- Provide markers for fine-mapping
19Hidden Markov Models (HMM)
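- Hidden states: 1, 2, end
- Transition probabilities: $t_{1,1}$, $t_{1,2}$, $t_{2,2}$, $t_{2,\text{end}}$
- Emission probabilities: $p_1(a)$, $p_1(b)$, $p_2(a)$, $p_2(b)$
- Example
- Observed states: a -> b -> a
- Hidden states: 1 -> 1 -> 2 -> end
- Probability: $p_1(a)\, t_{1,1}\, p_1(b)\, t_{1,2}\, p_2(a)\, t_{2,\text{end}}$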
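The path probability on this slide can be checked with a few lines of Python. This is only a sketch: the numeric transition and emission values are invented for illustration (the slide gives none), and the chain is assumed to start in state 1 with probability 1, since the slide's product has no start term.

```python
# Toy two-state HMM; every number here is an illustrative assumption.
transitions = {("1", "1"): 0.6, ("1", "2"): 0.4,
               ("2", "2"): 0.7, ("2", "end"): 0.3}
emissions = {"1": {"a": 0.8, "b": 0.2},
             "2": {"a": 0.5, "b": 0.5}}

def path_probability(observed, hidden):
    """Joint probability of emitting `observed` along the path `hidden`.

    `hidden` lists one state per observed symbol plus the final 'end' state.
    """
    if len(hidden) != len(observed) + 1 or hidden[-1] != "end":
        raise ValueError("hidden path must cover every symbol and finish in 'end'")
    prob = 1.0
    for i, symbol in enumerate(observed):
        state, next_state = hidden[i], hidden[i + 1]
        # Emit the symbol from the current state, then move to the next state.
        prob *= emissions[state][symbol] * transitions.get((state, next_state), 0.0)
    return prob

# p1(a) t1,1 p1(b) t1,2 p2(a) t2,end for the observed sequence a -> b -> a
example = path_probability("aba", ["1", "1", "2", "end"])
```

A path that uses a transition absent from the diagram (for example 2 -> 1) gets probability zero under this sketch.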
20A gene content HMM
- Observed states
- a homologous gene is either observed or not
- Hidden states
- presence or absence of gene within a segment
- Emission probabilities
- A gene will be unobserved if it is not present
- A gene may be unobserved even if it is present
- Dependent on the density of the gene map
- Transition probabilities
- reflect conservation of gene content along the
branches of a phylogeny
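The emission rules above can be sketched as a small calculation. The sketch assumes a single hypothetical parameter, `detect_prob`, standing in for map density (the chance that a gene which is present is actually observed), and a hypothetical prior probability of presence; neither name nor value comes from the talk.

```python
def emission_probs(present, detect_prob):
    """Emission probabilities for one hidden state.

    An absent gene is never observed; a present gene is observed with
    probability `detect_prob` (assumed to grow with map density).
    """
    p_observed = detect_prob if present else 0.0
    return {"observed": p_observed, "unobserved": 1.0 - p_observed}

def posterior_present_given_unobserved(prior_present, detect_prob):
    """Bayes' rule: P(gene present | gene not observed on the map)."""
    miss_if_present = emission_probs(True, detect_prob)["unobserved"]
    miss_if_absent = emission_probs(False, detect_prob)["unobserved"]
    numerator = prior_present * miss_if_present
    return numerator / (numerator + (1.0 - prior_present) * miss_if_absent)

# With a sparse map, a missing marker says little about true absence.
sparse = posterior_present_given_unobserved(prior_present=0.5, detect_prob=0.2)
dense = posterior_present_given_unobserved(prior_present=0.5, detect_prob=0.9)
```

Under these assumed values, an unobserved gene remains fairly likely to be present when the map is sparse and much less likely when the map is dense, which is the intuition behind making emissions depend on map density.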
21Transition probabilities and the segment phylogeny
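22Loss (L), Loss-Gain (LG), and Multiple Loss-Gain (MLG) models
(State diagrams. L: states P and A, with transition labels a, 1-a, and 1. LG: states A1, P, and A2, with transition labels a, b, 1-a, 1-b, and 1. MLG: states A1, P, and A2, with transition labels ai, b, 1-ai, 1-b, and 1.)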
23Estimating model parameters
- Segment phylogeny
- Each set of homologous genes is missing from some segments
- Estimate an averaged distance matrix
- Build tree with neighbor-joining and midpoint rooting
- HMM parameter estimation
- Loss rate(s)
- Gain rate
- Number of genes present at the root
24Do parameter estimates converge?
LG model, n = 100 genes, no missing data, a1 = 0.1, a2 = 0.3, 1000 replicates
Initial a  Estimate of a1  SE     Estimate of a2  SE
0.05       0.106           0.006  0.294           0.018
0.3        0.106           0.006  0.294           0.018
25Accuracy of hidden state assignments
5-segment phylogeny, a1 = 0.1, a2 = 0.3, b = 0.1,
24 gain
26 A large multiplicon
12 segments from rice and arabidopsis 56 sets of
homologous genes
Vandepoele et al 2003 Plant Cell 15, 2192.
27Self-validation test
28Probability of gene presence (8 longest segments)
Segment  True   Estimate  Diff
1        0.251  0.173      0.078
2        0.225  0.166      0.059
3        0.262  0.171      0.091
4        0.149  0.175     -0.026
5        0.268  0.171      0.097
6        0.233  0.167      0.066
7        0.226  0.170      0.056
8        0.148  0.168     -0.020
Branch lengths scaled so that the longest branch is 1.0
Estimate of a = 0.7
29Summary: gene content HMM
- Multispecies comparative maps
- Becoming more common
- Most species only partially characterized
- Usefulness also compromised by sparse synteny
- Probabilistic models will allow us to move
- from simple descriptions of the extent of synteny
- to predictive tools that can guide further
experiments
30Gene family perspective
- Modes of duplication
- Tandem (T)
- Dispersed (D)
- Segmental (S)
31A tale of two sisters: the ARF and the Aux/IAA
gene families
- Modulate whole plant response to auxin
- Interact via dimerization
- ARFs are transcription factors
- Aux/IAAs bind and repress ARFs in the absence of
auxin
32Diversification of ARFs
Remington et al. 2004 Plant Physiol. 135, 1738
33The chromosomal context
Remington et al. 2004 Plant Physiol. 135, 1738
34Diversification of the Aux/IAAs
Remington et al. 2004 Plant Physiol. 135, 1738
35Remington et al. 2004 Plant Physiol. 135, 1738
36Why the different patterns of diversification?
- 12 (ARF) vs 40 (Aux/IAA) segmental duplications
- Presumably reflects differential retention
- Possible explanations
- Dosage requirements
- Coevolution with other interacting genes
- Regional transcriptional regulation
37How typical is the Aux/IAA family?
Gene family                        Genes  S events
Proteasome alpha/beta subunits     23     9
Ser/Thr phosphatase                26     10
Ras related GTP-binding            72     19
Auxin-independent growth promoter  33     8
Major intrinsic protein            38     10
Calmodulin                         79     20
Phosphatidylcholine transferase    30     8
Cation/hydrogen exchanger          28     8
Cannon et al. 2004 BMC Plant Biology 4, 10.
38Segmental duplication of pathways?
Blanc and Wolfe 2004 Plant Cell 16, 1679.
39Summary: gene family perspective
- Chromosomal context can matter
- Gene families differ in their patterns of duplicate gene proliferation
- Presumably due to differential retention
- Polyploidy
- Qualitatively differs from other gene duplication modes
- Divergence of whole pathways possible
40Functional divergence and chromosomal context
- Do patterns of divergence (i.e., spatiotemporal
expression) differ among T, D, and S duplicates?
41Retention of duplicated genes
- Neofunctionalization (NF)
- Mutations lead to new divergent functions that are positively selected
- Subfunctionalization (SF)
- Mutations knock out ancestral functions and make both copies indispensable
- New divergent functions evolve secondarily
- SF more likely for tandem than dispersed pairs (due to linkage)
- There are other possibilities
- Duplicates retained when higher expression is
favored
42Divergence of duplicated genes
Divergence in expression profile
Age of duplication
43Duplicate pairs in yeast and human (Gu et al.
2002, Makova and Li 2003)
- Approx. 50% of pairs diverge very rapidly
- Proportion of divergent pairs increases with synonymous substitutions (Ks)
- Less so with replacement changes (Ka)
- Plateaus at a Ka of 0.3 in human
- In humans, distantly related pairs with conserved
expression tend to be either ubiquitous or very
tissue specific
44Digital expression profiling
- Massively Parallel Signature Sequencing (MPSS)
- Count occurrence of 17-20 bp mRNA signatures
- Cloning and sequencing is done on microbeads
- Similar to Serial Analysis of Gene Expression (SAGE)
- Bar-code counting reduces concerns of
- cross-hybridization
- probe affinity
- background hybridization
- Which enables
- Accurate counts of low expression genes
- Distinguishing expression profiles of duplicate
genes
45MPSS technology
Clone 3' ends of transcripts to microbeads
Sort by FACS and deposit in channeled monolayer
Sequence 17-20 bp from 5' end by hybridization
Brenner et al. 2000 PNAS 97, 1665.
46 MPSS Data
Signature          Frequency
GATCAATCGGACTTGTC  2
GATCGTGCATCAGCAGT  53
GATCCGATACAGCTTTG  212
GATCTATGGGTATAGTC  349
GATCCATCGTTTGGTGC  417
GATCCCAGCAAGATAAC  561
GATCCTCCGTCTTCACA  672
GATCACTTCTCTCATTA  702
GATCTACCAGAACTCGG  814
. .
GATCGGACCGATCGACT  2,935
Total number of tags: > 1,000,000
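Raw signature counts depend on how deeply each library was sequenced, so comparisons across libraries need a normalized unit. The sketch below assumes the usual parts-per-million convention (count divided by total tags in the library, times one million); the library size of 2,000,000 tags is a made-up figure, and `to_ppm` is a hypothetical helper, not part of any MPSS software.

```python
def to_ppm(counts, total_tags=None):
    """Convert raw signature counts to parts per million (ppm).

    `total_tags` is the number of tags sequenced in the library; if it is
    omitted, the sum of the supplied counts is used instead.
    """
    total = total_tags if total_tags is not None else sum(counts.values())
    if total <= 0:
        raise ValueError("library must contain at least one tag")
    return {signature: 1_000_000 * n / total for signature, n in counts.items()}

# Three signatures from the example listing, in a hypothetical 2,000,000-tag library.
counts = {
    "GATCAATCGGACTTGTC": 2,
    "GATCGTGCATCAGCAGT": 53,
    "GATCGGACCGATCGACT": 2935,
}
ppm = to_ppm(counts, total_tags=2_000_000)
```

With this assumed library size, a signature seen twice sits at 1 ppm, which illustrates why very deep sequencing is needed to count low-expression genes reliably.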
47Classifying signatures
Typical signatures
48Core Arabidopsis MPSS libraries sequenced by Lynx for Blake Meyers, U. of Delaware
Library  Signatures sequenced  Distinct signatures
Root     3,645,414             48,102
Shoot    2,885,229             53,396
Flower   1,791,460             37,754
Callus   1,963,474             40,903
Silique  2,018,785             38,503
TOTAL    12,304,362            133,377
49http://www.dbi.udel.edu/mpss
- Query by
- Sequence
- Arabidopsis gene identifier
- chromosomal position
- BAC clone ID
- MPSS signature
- Library comparison
- Site includes
- Library and tissue information
- FAQs and help pages
50Genome-wide MPSS profile in Arabidopsis
Of the 29,084 gene models, 17,849 match
unambiguous, expressed class 1 and/or 2 signatures
51 Dataset of duplicate pairs
- Arabidopsis gene families of size 2 classified as
- Dispersed (280)
- Segmental (149)
- Tandem (63)
- For each pair
- Measured similarity/distance in expression profile
- Estimated silent (Ks) and replacement (Ka) changes
52Expression distance
53Major findings
- Many pairs are divergent in sequence but not expression and vice versa
- Pairs have atypically high expression
- Especially slowly evolving pairs
- Divergence increases with Ka
- Particularly among S duplicates!
- Divergence tends to be highly asymmetric
54Expression level > 5 ppm in x libraries
Libraries  Genes in pairs  All genes
0          153 (15.5%)     4160 (23.3%)
1          124 (12.6%)     2643 (14.8%)
2          73 (7.4%)       1727 (9.6%)
3          93 (9.5%)       1777 (10.0%)
4          109 (11.1%)     1930 (10.8%)
5          432 (43.9%)     5612 (31.4%)
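A tabulation of this kind can be sketched as follows. The five library names come from the core library list; the gene identifiers, the ppm values, and the helper names `libraries_expressed` and `breadth_table` are hypothetical and serve only to show the counting.

```python
from collections import Counter

LIBRARIES = ("root", "shoot", "flower", "callus", "silique")

def libraries_expressed(profile, threshold_ppm=5.0):
    """Number of libraries in which a gene exceeds the ppm threshold."""
    return sum(1 for lib in LIBRARIES if profile.get(lib, 0.0) > threshold_ppm)

def breadth_table(profiles, threshold_ppm=5.0):
    """Map each library count (0..5) to (number of genes, percent of genes)."""
    if not profiles:
        raise ValueError("no expression profiles supplied")
    tally = Counter(libraries_expressed(p, threshold_ppm) for p in profiles.values())
    total = len(profiles)
    return {k: (tally.get(k, 0), 100.0 * tally.get(k, 0) / total)
            for k in range(len(LIBRARIES) + 1)}

# Hypothetical genes with made-up ppm values per library.
profiles = {
    "geneA": {"root": 12.0, "shoot": 30.0, "flower": 8.0, "callus": 40.0, "silique": 6.0},
    "geneB": {"root": 0.0, "shoot": 2.0},
    "geneC": {"root": 50.0, "shoot": 1.0, "flower": 0.0, "callus": 0.0, "silique": 0.0},
    "geneD": {"root": 9.0, "shoot": 7.0, "flower": 6.5, "callus": 5.5, "silique": 11.0},
}
table = breadth_table(profiles)
```

Running the same function once on the genes in duplicate pairs and once on all expressed genes would give the two columns being compared.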
56$d_N = 0.48 + 0.37\,K_a$, $p < 0.0001$
57Asymmetric divergence
Type of pair            A            B            C           D
Young
Dispersed (Ks ≤ 0.5)    14 (15.7%)   61 (68.5%)   8 (9.0%)    6 (6.7%)
Tandem (Ks ≤ 0.5)       8 (14.3%)    29 (51.8%)   10 (17.9%)  9 (16.1%)
Old
Dispersed (Ks > 0.5)    35 (18.3%)   111 (58.1%)  24 (12.6%)  21 (11.0%)
Segmental (All)         31 (20.8%)   104 (69.8%)  7 (4.7%)    7 (4.7%)
- A: Each copy has higher expression in at least one library
- B: One copy has higher expression in all libraries that differ, and at least two libraries differ
- C: Copies differ in expression in only one library
- D: Copies do not differ in expression in any library
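The four categories partition all duplicate pairs, which is easiest to see in code. This sketch assumes a deliberately crude rule for when two expression values "differ" (an absolute gap above a hypothetical `min_diff` in ppm); the talk does not state the criterion actually used, and a statistical test on tag counts would be a natural replacement.

```python
def classify_pair(copy1, copy2, min_diff=10.0):
    """Assign a duplicate pair to category A, B, C, or D.

    `copy1` and `copy2` are expression values (ppm) for the two copies,
    listed in the same library order. Two values are treated as differing
    when their absolute gap exceeds `min_diff` (an assumed rule).
    """
    if len(copy1) != len(copy2):
        raise ValueError("both copies need one value per library")
    higher1 = higher2 = 0
    for x, y in zip(copy1, copy2):
        if abs(x - y) > min_diff:
            if x > y:
                higher1 += 1
            else:
                higher2 += 1
    differing = higher1 + higher2
    if differing == 0:
        return "D"  # no library differs
    if higher1 > 0 and higher2 > 0:
        return "A"  # each copy is higher somewhere
    if differing == 1:
        return "C"  # exactly one library differs
    return "B"      # one copy is higher in every differing library (two or more)

asymmetric = classify_pair([50, 40, 30, 0, 0], [5, 4, 3, 0, 0])
reciprocal = classify_pair([50, 0, 0, 0, 0], [0, 50, 0, 0, 0])
```

Category B is the asymmetric case: one copy is consistently the more highly expressed wherever the two differ.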
58Why put gene family evolution into a chromosomal
context?
- We can begin to understand and utilize patterns of evolution in gene order
- We can gain insight into the function and evolution of gene families that is not apparent from beanbag genomics
59Thanks to Zongli Xu, David Remington, Jason Reed, Tom Guilfoyle, Blake Meyers, NSF