The International HapMap Project: a Rich Resource of Genetic Information

Julia Krushkal, Department of Preventive Medicine, The University of Tennessee Health Science Center, jkrushka{at}utmem.edu

1
The International HapMap Project: a Rich
Resource of Genetic Information
Julia Krushkal, Department of Preventive
Medicine, The University of Tennessee Health
Science Center, jkrushka{at}utmem.edu
2
HapMap Population Samples
Project launched in 2002 to provide a public
resource for accelerating medical genetic research
270 individuals from 4 geographically diverse
populations:
  • YRI: 90 Yorubans from Ibadan, Nigeria
    (30 parent-offspring trios)
  • CEU: 90 individuals of northern and western
    European descent living in Utah, USA, from the
    Centre d'Etude du Polymorphisme Humain (CEPH)
    collection (30 parent-offspring trios)
  • CHB: 45 unrelated Han Chinese from Beijing, China
  • JPT: 45 unrelated Japanese from Tokyo, Japan
HapMap: http://www.hapmap.org/
NHGRI: http://www.genome.gov/page.cfm?pageID=10001688
3
The International HapMap Project
Determine the common patterns of DNA sequence
variation in the human genome, by characterizing
sequence variants, their frequencies, and
correlations between them, in DNA samples from
populations with ancestry from parts of Africa,
Asia and Europe.
Nature (2003)
  • Population-specific sequence variation
  • Allele frequencies
  • Linkage disequilibrium patterns
  • Haplotype information
  • Tag SNPs
  • Structural genome variation
  • Better understanding of human population dynamics
    and of the history of human populations
  • Cell lines available from Coriell Inst. for
    Medical Research
  • A rich resource for biomedical genetic analysis

4
International HapMap Project Papers
  • The Int. HapMap Consortium. A second generation
    human haplotype map of over 3.1 million SNPs.
    Nature 449:851-861. 2007.
  • The Int. HapMap Consortium. A Haplotype Map of
    the Human Genome. Nature 437:1299-1320. 2005.
  • The Int. HapMap Consortium. The International
    HapMap Project. Nature 426:789-796. 2003.
  • The Int. HapMap Consortium. Integrating Ethics
    and Science in the International HapMap Project.
    Nature Reviews Genet 5:467-475. 2004.
  • Thorisson et al. The International HapMap Project
    Web site. Genome Res 15:1591-1593. 2005.
HapMap-related papers
  • Sabeti et al. Genome-wide detection and
    characterization of positive selection in human
    populations. Nature 449:913-918. 2007.
  • Clark et al. Ascertainment bias in studies of
    human genome-wide polymorphism. Genome Res
    15:1496-1502. 2005.
  • Clayton et al. Population structure,
    differential bias and genomic control in a
    large-scale, case-control association study.
    Nature Genet 37(11):1243-1246. 2005.
  • de Bakker et al. Efficiency and power in genetic
    association studies. Nature Genet
    37(11):1217-1223. 2005.
  • Goldstein, Cavalleri. Genomics: Understanding
    human diversity. Nature 437:1241-1242. 2005.
  • Hinds et al. Whole genome patterns of common DNA
    variation in three human populations. Science
    307:1072-1079. 2005.
  • Myers et al. A fine-scale map of recombination
    rates and hotspots across the human genome.
    Science 310:321-324. 2005.
  • Nielsen et al. Genomic scans for selective
    sweeps using SNP data. Genome Res 15:1566-1575.
    2005.
  • Smith et al. Sequence features in regions of weak
    and strong linkage disequilibrium. Genome Res
    15:1519-1534. 2005.
  • Weir et al. Measures of human population
    structure show heterogeneity among genomic
    regions. Genome Res 15:1468-1476. 2005.

5
Nature (2003)
6
Human Chromosomes
  • Contain DNA
  • 22 pairs of autosomes
  • Sex chromosomes (X and Y)
  • Mitochondrial genome
  • Contain functional units (genes) and other DNA

Human genome sequence is available as a
reference, as a result of the Human Genome
Project. A significant amount of inter-individual
variation exists.
7
Some Basic Definitions
  • Locus - A site in the genome
  • The DNA in the human genome is not a static
    entity.
  • There are differences between different copies
  • Allele - a genetic variant, i.e., a form (state)
    of a locus
  • Mutation - a genetic change
  • An individual carries two copies of each locus on
    autosomes
  • Alleles are passed from parents to offspring
    (1 from each parent)
  • Genotype - A set of alleles an individual is
    carrying at a given locus

8
Chromosomes are sets of continuously linked
genetic loci
Example:
Integrated map of chromosome 5 from the
International HapMap Project, http://www.hapmap.org
9
Genetic Variation
  • Some DNA loci vary among individuals
  • Linked genetic loci are inherited
    non-independently
  • Loci may change with time (mutation, selection,
    genetic drift)
  • Some DNA changes lead to quantitative changes in
    RNA expression and to quantitative or qualitative
    changes in protein production
  • Some genetic changes, even small, may lead to
    disease
  • A large amount of natural variation occurs in
    healthy individuals, i.e., many changes are
    neutral
  • Loci genetically linked to the disease-causing
    locus can be used as genetic markers to search
    for the disease locus

There are many types of DNA variation, e.g.:
  • Sequence variation (SNP1, SNP2): AAA[C/T]GGCTA
  • Microsatellite repeats: AATG AATG AATG AATG
10
Polymorphic Site: a locus with common DNA
variation (≥ 2 alleles in a population)
Shows difference in DNA sequence among
individuals. In most definitions, the most
common allele has frequency < 99%, or minor
allele frequency (MAF) ≥ 1%, or MAF ≥ 2%,
or at least two alleles have frequencies ≥ 1%.
A locus whose rare allele occurs in < 1% of the
population is usually not considered a
polymorphic site.
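The MAF-based definition can be illustrated with a short Python sketch. The function names, the sample of 100 genotypes, and the default 1% cutoff are assumptions made for illustration; they are not part of the original presentation.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotypes such as ("A", "C")."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def minor_allele_frequency(genotypes):
    """Frequency of the least common allele (0.0 if only one allele is seen)."""
    freqs = sorted(allele_frequencies(genotypes).values())
    return freqs[0] if len(freqs) > 1 else 0.0

def is_polymorphic(genotypes, maf_threshold=0.01):
    """Apply the 'MAF >= threshold' definition; the cutoff is a parameter."""
    return minor_allele_frequency(genotypes) >= maf_threshold

# Hypothetical sample: 100 individuals typed at one biallelic SNP
sample = [("A", "A")] * 90 + [("A", "C")] * 9 + [("C", "C")] * 1
maf = minor_allele_frequency(sample)  # C allele count: 9 + 2 = 11 of 200
print(f"MAF = {maf:.3f}, polymorphic at 1%: {is_polymorphic(sample)}")
```

Changing `maf_threshold` to 0.02 or 0.05 reproduces the stricter definitions mentioned on the slide.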
11
SNP: Single Nucleotide Polymorphism
A SNP locus on the distal end of the long arm of
human chromosome 5 (data from Ensembl)
SNP locus rs6870660
http://www.ensembl.org
CAAATTCCATG[A or C]AGAAGGAAATACAT
A and C are alleles at SNP locus rs6870660
12
A SNP locus on the distal end of the long arm of
chromosome 5
SNP locus rs6870660
http://www.hapmap.org
13
Regulatory Interactions: The ENCODE Project
2003: Pilot project launched (1% of the genome)
2007: Pilot project completed; production phase
launched on the entire genome
High-throughput experimental and computational
approaches to studies of DNA regulatory sites,
regulatory interactions, and DNA modification
[Figure labels: Production Scale Effort, Pilot
Scale Effort, Data Coordination Center, Technology
Development Effort]
14
Genome SNP Variation
Size of human genome is ≈ 3.2 × 10^9 bp
99.9% identical
9-10 million SNPs may have MAF ≥ 5%
≈ 30,000 genes
HapMap SNP Density Coverage
  • Phase I (published in 2005)
  • 1,007,329 SNPs that passed quality control
  • 1 SNP / 3000 bp
  • 11,500 nsSNP
  • 10 ENCODE regions, 500 kb each
  • 17,944 SNPs
  • 1 SNP / 279 bp
  • Phase II (published in 2007)
  • > 3,806,000 SNPs
  • 1 SNP / 875 bp
  • 25-30% of all SNPs with MAF ≥ 5%

The cumulative number of non-redundant SNPs (each
mapped to a single location in the genome) is
shown as a solid line, as well as the number of
SNPs validated by genotyping (dotted line) and
double-hit status (dashed line). Years are
divided into quarters (Q1-Q4).
16
http://www.hapmap.org/
17
SNP Differences among Individuals Far Exceed
Differences among Populations
Phase I
  • Autosomes: across the 1 million SNPs genotyped,
    only 11 have fixed differences between CEU and
    YRI, 21 between CEU and CHB/JPT, and 5 between
    YRI and CHB/JPT.
  • X chromosome: 123 SNPs were completely
    differentiated between YRI and CHB/JPT, but only
    2 between CEU and YRI and 1 between CEU and
    CHB/JPT.
18
Haplotypes
A haplotype is a set of alleles at multiple loci
located on the same copy of the chromosome
Genotype calls obtained from sequencing or DNA
chip genotyping do not provide information about
which of the two chromosomal copies a particular
allele belongs to.
E.g., SNP genotypes for individual X:
  SNP      Alleles    Genotype
  SNP A    A1 A2      A T
  SNP B    B1 B2      T C
  SNP C    C1 C2      G C
Haplotypes for individual X:
  Haplotype 1:  A1 B2 C2  =  A C C
  Haplotype 2:  A2 B1 C1  =  T T G
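The phase ambiguity described above can be made concrete with a small Python sketch that lists every haplotype pair consistent with a set of unphased genotypes. The function name is hypothetical, and the input is the slide's example for individual X (SNP A = A/T, SNP B = T/C, SNP C = G/C). This is only an enumeration, not a phasing method.

```python
from itertools import product

def compatible_haplotype_pairs(genotypes):
    """List every unordered haplotype pair consistent with unphased genotypes.

    genotypes: list of (allele, allele) tuples, one per SNP, in map order.
    """
    pairs = set()
    # For each SNP, choose which allele sits on the first chromosome copy;
    # the other allele is then placed on the second copy.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)

# Individual X: heterozygous at all three SNPs
individual_x = [("A", "T"), ("T", "C"), ("G", "C")]
candidate_pairs = compatible_haplotype_pairs(individual_x)
for hap1, hap2 in candidate_pairs:
    print(hap1, hap2)
```

With three heterozygous SNPs there are four candidate pairs, and the genotype data alone cannot tell which one is real. The slide's pair (ACC and TTG) is one of the four.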
19
Recombination
  • Random event
  • Occurs during meiosis
  • The larger the distance between loci, or the
    more generations that pass, the more likely it
    is that recombination(s) will occur
[Figure: recombination (crossing-over) between
parental chromosomes A1 B1 and A2 B2.
Nonrecombinant haplotypes: A1 B1, A2 B2.
Recombinant haplotypes: A1 B2, A2 B1.]
20
Two ancestral chromosomes being scrambled through
recombination over many generations to yield
different descendant chromosomes. If an A allele
on the ancestral chromosome increases the risk of
a disease, the two individuals in the current
generation who inherit that part of the ancestral
chromosome will be at increased risk. Source:
the International HapMap Project
21
Linkage Disequilibrium
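Associations among alleles at different loci
[Figure: haplotypes A1 B1 and A2 B2 at Locus A and Locus B]
Linkage disequilibrium coefficient (coefficient of association):
$D = p_{A_1B_1} - p_{A_1} p_{B_1}$
Normalized disequilibrium coefficient:
$D' = D / D_{\max}$, with $-1 \le D' \le 1$
$D_{\max} = \min(p_{A_1} p_{B_2},\ p_{A_2} p_{B_1})$ for $D > 0$; for $D < 0$, $D_{\max} = \min(p_{A_1} p_{B_1},\ p_{A_2} p_{B_2})$
Correlation coefficient:
$r = D / \sqrt{p_{A_1} p_{A_2} p_{B_1} p_{B_2}}$
In case of no association, $D = 0$ (linkage equilibrium)
Practical implications in fine gene mapping:
search for locus B using association of
marker loci with disease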
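As a worked illustration of these definitions, the following Python sketch computes D, the normalized coefficient D', and the squared correlation r² for two biallelic loci. The function name and the example frequencies are assumptions chosen for illustration only.

```python
def ld_measures(p_a1b1, p_a1, p_b1):
    """Return (D, D', r^2) for two biallelic loci.

    p_a1b1: frequency of the A1-B1 haplotype
    p_a1, p_b1: frequencies of alleles A1 and B1
    """
    p_a2, p_b2 = 1.0 - p_a1, 1.0 - p_b1
    d = p_a1b1 - p_a1 * p_b1
    # D_max depends on the sign of D so that D' stays within [-1, 1].
    if d >= 0:
        d_max = min(p_a1 * p_b2, p_a2 * p_b1)
    else:
        d_max = min(p_a1 * p_b1, p_a2 * p_b2)
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a1 * p_a2 * p_b1 * p_b2
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Hypothetical frequencies: both alleles at 50%, A1-B1 haplotype at 40%
d, d_prime, r2 = ld_measures(p_a1b1=0.4, p_a1=0.5, p_b1=0.5)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

In this example D = 0.15, D' = 0.6 and r² = 0.36; setting `p_a1b1` to 0.25 (the product of the allele frequencies) gives D = 0, i.e. linkage equilibrium.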
22
The value of D decreases geometrically with each
generation
[Figure: loci A/a and B/b separated by recombination fraction $\theta$]
$D(t) = (1 - \theta)\, D(t-1)$
$D(t) = (1 - \theta)^{t}\, D(0)$
Unless the two loci are closely linked, the value
of D should rapidly decrease to 0. The
occurrence of association between two loci
implies that they are closely linked.
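The geometric decay can be checked numerically with a brief Python sketch. Here `theta` is an assumed name for the recombination fraction between the two loci (at most 0.5, the value for unlinked loci), and the starting value D(0) = 0.25 is hypothetical.

```python
import math

def ld_after_generations(d0, theta, generations):
    """D(t) = (1 - theta)^t * D(0) for recombination fraction theta."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return (1.0 - theta) ** generations * d0

def generations_to_fraction(theta, fraction):
    """Generations needed for D to fall to `fraction` of its initial value."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    return math.log(fraction) / math.log(1.0 - theta)

d_unlinked = ld_after_generations(0.25, theta=0.5, generations=10)
d_linked = ld_after_generations(0.25, theta=0.001, generations=10)
print(f"unlinked loci after 10 generations: D = {d_unlinked:.6f}")
print(f"tightly linked loci after 10 generations: D = {d_linked:.6f}")
print(f"generations to halve D at theta = 0.001: "
      f"{generations_to_fraction(0.001, 0.5):.0f}")
```

For unlinked loci D falls by half every generation, while for tightly linked loci it persists for hundreds of generations, which is why observed association suggests close linkage.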
23
Haplotype Maps Generated by The International
HapMap Project
3 steps of the HapMap construction:
(a) SNPs are identified in DNA samples from multiple individuals.
(b) Adjacent SNPs that are inherited together are compiled into haplotypes.
(c) "Tag" SNPs are identified within haplotypes that uniquely describe those haplotypes.
Source: The International HapMap Project
24
Haplotype Maps of the Human Genome
Helmuth 2001, Science 293:583-585
Find correlations among groups of SNPs
Haplotypes were inferred for the HapMap project
from trio data and from unrelated individuals
using PHASE (Stephens et al. 2001; Stephens and
Donnelly 2003)
25
Haplotype Maps of the Human Genome
Genome regions are decomposed into discrete
haplotype blocks, which capture similarity in
haplotype organization
Patil et al. 2001, Blocks of Limited Haplotype
Diversity Revealed by High-Resolution Scanning of
Human Chromosome 21. Science 294(5547):1719-1723
26
Haplotype Block Partition Results for Three
Populations
1,586,383 SNPs genotyped in 71 Americans of
European, African, and Asian ancestry

Population          Blocks    Average size, kb*   Required SNPs**
African-American    235,663   8.8                 570,886
European-American   109,913   20.7                275,960
Han Chinese         89,994    25.2                220,809

* Average distance spanned by segregating sites
in each block.
** Minimum number of SNPs required to distinguish
common haplotype patterns with frequencies of 5%
or higher.
Hinds et al. 2005, Science
27
Hinds et al. 2005
  • Population differences in local bin structure
  • Differences in allele and haplotype frequencies
Although analysis panels are characterized both by
different haplotype frequencies and, to some
extent, different combinations of alleles, both
common and rare haplotypes are often shared
across populations
(The Int. HapMap Consortium, Nature, 2005)
28
Tag SNP (htSNP) selection
  • Pairwise LD-based and haploblock-based tagging
    methods
  • Partition haplotypes into blocks
  • Can use haplotype-based (haploblocks) or
    genotype-based (LD-blocks) partitioning
  • Select representative htSNPs from each block
  • Latest DNA microarrays aim to capture SNPs with
    r² ≥ 0.8
Tags are the subset of variants genotyped in a
disease study. SNPs that are not typed in the
study but whose effect can be studied through LD
with a tag are termed proxies. A tag with perfect
correlation (r² = 1) to an untyped putative
causal allele is termed a perfect proxy.
(de Bakker et al., 2005)
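Pairwise LD-based tagging can be sketched as a greedy selection over a table of pairwise r² values. This is a minimal illustration under stated assumptions, not the algorithm used by the HapMap Consortium or by any particular tagging tool. The SNP names and r² values are hypothetical, and the 0.8 threshold is taken from the slide.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedily pick tags until every SNP has r^2 >= threshold with a tag.

    snps: list of SNP identifiers.
    r2:   dict mapping frozenset({snp_i, snp_j}) to pairwise r^2;
          pairs missing from the dict are treated as r^2 = 0.
    Returns {tag: sorted list of SNPs newly captured by that tag}.
    """
    def captured_by(tag):
        # A SNP always captures itself.
        return {s for s in snps
                if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold}

    uncaptured = set(snps)
    tags = {}
    while uncaptured:
        # Pick the SNP covering the most still-uncaptured SNPs;
        # ties go to the SNP listed first in `snps`.
        best = max(snps, key=lambda s: len(captured_by(s) & uncaptured))
        newly_captured = captured_by(best) & uncaptured
        tags[best] = sorted(newly_captured)
        uncaptured -= newly_captured
    return tags

# Hypothetical region with five SNPs
snps = ["snp1", "snp2", "snp3", "snp4", "snp5"]
r2 = {
    frozenset(("snp1", "snp2")): 0.95,
    frozenset(("snp1", "snp3")): 0.85,
    frozenset(("snp2", "snp3")): 0.70,
    frozenset(("snp4", "snp5")): 1.00,
}
tags = greedy_tag_snps(snps, r2)
print(tags)
```

Here two tags (snp1 and snp4) capture all five SNPs; snp2, snp3 and snp5 become proxies, and snp5 is a perfect proxy of snp4 because their r² is 1.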
30
Tag SNP, Haplotypes, and LD
The Int. HapMap Consortium, Nature, 2005
31
Use of Haplotypes in Association Analysis
  • Testing one marker at a time for associations is
    very time-consuming
  • Problem of multiple testing
  • Testing individual SNPs, we are not utilizing
    information from other markers
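To illustrate the multiple testing problem, the sketch below applies a Bonferroni correction, one common and conservative adjustment. The slides do not name a specific method, so this choice, the function names, and the example p-values are assumptions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests

def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that passes the Bonferroni-adjusted threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]

# Testing one million SNPs one at a time at a family-wise alpha of 0.05
per_snp_threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"per-SNP threshold: {per_snp_threshold:.1e}")

# Three hypothetical single-marker p-values
flags = significant_after_bonferroni([0.001, 0.02, 0.04])
print(flags)
```

Because nearby SNPs are correlated through LD, the tests are not independent and Bonferroni tends to be overly strict, which is one motivation for combining markers into haplotypes.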

Benefits of Using Haplotypes
  • Haplotypes allow us to use information from
    multiple loci simultaneously
  • LD information between loci is captured

32
Benefits of Haplotype Analysis
  • Construct a single highly informative mega-locus
    from a number of less informative but closely
    linked loci
  • Identify genotyping or data entry errors.
  • Likelihood ratio tests indicate which typings
    are more likely to be an error
  • Find boundaries of conserved haplotypes
    associated with a trait.
  • Employs recombinations from the entire history
    of a population

33
Amount of Captured Sequence Variation in HapMap
Phase II
For common variants (MAF ≥ 0.05) the mean maximum
r² of any SNP to a typed one is 0.90 in YRI, 0.96
in CEU and 0.95 in CHB/JPT.
1.09 million SNPs capture all common Phase II
SNPs with r² ≥ 0.8 in YRI.
Very common SNPs with MAF ≥ 0.25 are captured
extremely well (mean maximum r² of 0.93 in YRI to
0.97 in CEU). Rarer SNPs with MAF < 0.05 are less
well covered (mean maximum r² of 0.74 in CHB/JPT
to 0.76 in YRI).
35
Recombination Hot Spots
36
Structural Genome Variation
HapMap samples are also used as a resource for
CNV analysis
  • Large number of copy number variants (CNVs) and
    other genome rearrangements found among
    individuals
  • Some variation is assumed to be normal; other
    variation may cause disease
  • Genome databases, e.g. the Database of Genomic
    Variants at TCAG (The Hospital for Sick
    Children, Toronto) and the Copy Number Variation
    Project map at the Sanger Institute

37
  • Segmental duplications are recombination
    hotspots, causing global genome rearrangements

39
HapMap Genome Browser
44
Perlegen Genotype Browser
46
UCSC Genome Browser
http://genome.ucsc.edu/
47
DNA Chips and Resequencing: High-throughput
Analysis of Sequence Variation
  • An easy way to access genome-wide variation
  • Both Affymetrix and Illumina DNA chips contain
    representative SNP and CNV probes
  • Affymetrix GeneChip 6.0: 1.8 million markers for
    genetic variation, including 906,000 SNPs and
    946,000 copy number probes
  • Illumina 1M Bead Chip and 1M-duo Bead Chip:
    950,000 genome-spanning tag SNPs; 100,000
    additional non-HapMap SNPs; > 565,000 SNPs in and
    near coding regions, such as nsSNPs, promoter
    regions, and 3' and 5' UTRs; dense coverage in
    ADME and MHC regions; 260,000 markers located in
    novel and reported copy number polymorphic
    regions
  • Sequenom mass arrays (based on MALDI-TOF)
48
Genome-Wide Association
  • Select representative htSNPs from low-diversity
    haplotype blocks
  • Adjustment for multiple comparisons
  • LD values are highly variable: smoothing
    function needed
  • Haplotypes in a sliding window
  • OR screen for top SNPs: likely functional SNPs,
    SNPs in genes involved in pathways of interest
49
Use of Phase-Resolved Data in Association
Analysis
  • Find association with haplotypes, similar to
    analyses of individual SNP alleles
  • Need to consider multiple testing
  • Test for tendency of cases to cluster around
    groups of similar haplotypes
  • Extend log-linear approach to take haplotype
    structure into account
  • Modifications also used for ambiguous phase

50
http://www.genome.gov/26525384
As of 04/14/2008, GWAS of 150 traits posted
52
Special Thanks to
  • Ken Manly, whose presentation ideas for the
    HapMap module 2006 inspired and helped organize
    this presentation