Title: Copy Number Variations: a new type of genetic marker in whole-genome association studies
1 Copy Number Variations: a new type of genetic marker in whole-genome association studies
- Wentian Li, Ph.D
- Center for Genomics and Human Genetics
- Feinstein Institute for Medical Research
- North Shore LIJ Health System
- Oct 31, 2008
- Consumer Genomics Workshop
- Center for Genetic Medicine
- Northwestern University
2 Outline of the talk
- Overview of structural aberrations
- Using genotyping array for CNV study
- Examples of disease mapping
- Practical issues
3 Terminology
- Copy number variation (germline, inherited)
- inherited: also present in the parents' genomes
- de novo: absent from the parents' genomes
- Copy number alteration (somatic, e.g. in cancer cells)
- Copy number polymorphism (relatively common CNV, with fixed starting/ending positions)
4 - 1. Overview of structural aberration
- Different length scales
- CGH/arrayCGH/ROMA
- 2. Using genotyping array for CNV study
- 3. Examples of disease mapping
- 4. Practical issues
5 Length scales of aberrations/variations/polymorphisms
6 Structural aberration (1): whole-genome duplication
- Polyploidy is common in plants (rare in animals)
- Survival rate after WGD may be very low. Major genomic instability would follow, including massive gene losses
- In vertebrates, WGD is thought to have occurred twice, around 500 million years ago (2R hypothesis)
7 Structural variation (2): gain or loss of certain chromosomes
- Aneuploidy: monosomy (1), trisomy (3), tetrasomy (4)
- Either fatal (spontaneous abortion) or responsible for abnormal phenotypes
- Chromosome-specific aneuploidy rate? Fewer chiasmata on shorter chromosomes (ch21, ch22)
- Down syndrome: trisomy 21
9 Structural aberration (3): microscopically visible aberrations
- Breaks
- Double breaks (inversion, translocation)
- Deletions (4p, 5p, 9p, 11p/11q, 13q, 18p/18q): deletion syndromes
- Duplications (inverted 15p). Isochromosomes are inverted duplications of the whole arm.
- Balanced vs. unbalanced (deletion/loss, duplication/gain)
10 Translocation between ch7 and ch13 (balanced)
Karyotyping with each chromosome stained with a different color (Iafrate et al. 2004)
11 Extra copy on ch16, two extra copies on ch6 (unbalanced)
Fluorescence in situ hybridization (FISH). Red for test, green for control (Iafrate et al. 2004)
12 Chromosome CGH: comparative genomic hybridization (Pinkel 1992)
- advantages
- No need to prepare chromosomes, only DNA
- Simple color scheme, e.g. duplications show up as red, deletions as green, normal as yellow
- disadvantages
- Need sophisticated microscopic/image analysis
- Long time (days) in hybridization
- Time-consuming analysis of the result
13 Structural aberration (4): from microscopic to submicroscopic, from chromosome CGH to array CGH (Pinkel/Albertson 2001)
- Array can be spotted with any DNA source: BAC clone, oligonucleotide
- Dye swap in a second hybridization to remove artifacts
14 ROMA: representational oligonucleotide microarray analysis (Lucito/Wigler 2003)
- DNA is digested into smaller segments
- Segments are amplified by PCR (upper limit of 1.2kb)
- Array is spotted with 70-nt oligonucleotides
15 - 1. Overview of structural aberration
- 2. Using genotyping array for CNV study
- R-ratio and theta series
- Cumulative plots
- Hidden Markov model
- 3. Examples of disease mapping
- 4. Practical issues
17 The basic idea behind CNV detection using genotyping array
- Two-channel (two-allele) intensities (x and y)
- Normalize x, y with a reference value (based on 100 controls, provided by the company)
- Derive angle (theta) and radius (R) from x, y
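The last step can be sketched as follows. The slide does not give its formulas, so this assumes the polar-coordinate convention commonly described for Illumina arrays (theta = (2/pi) * arctan(y/x), R = x + y). The function name `polar_from_intensities` is hypothetical.

```python
import math

def polar_from_intensities(x, y):
    """Convert normalized allele intensities (x, y) to (theta, R).

    Assumed convention (not stated on the slide):
      theta = (2/pi) * atan2(y, x), in [0, 1] for non-negative x, y
      R     = x + y
    """
    theta = (2.0 / math.pi) * math.atan2(y, x)
    r = x + y
    return theta, r

# Under this convention AA, AB and BB genotypes fall near theta = 0, 0.5 and 1.
for label, (x, y) in {"AA": (1.0, 0.0), "AB": (0.5, 0.5), "BB": (0.0, 1.0)}.items():
    theta, r = polar_from_intensities(x, y)
    print(label, round(theta, 3), round(r, 3))
```

The R-ratio is then R divided by the reference R expected for that SNP, and its logarithm is what the following slides plot.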
18 Hemizygous deletion (CN=1)
Log(1/2)
No heterozygote (loss of heterozygosity)
19 Homozygous deletion (CN=0)
Log(0/2)
20 Duplication (CN=3)
Log(3/2)
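The expected log(R-ratio) values on these three slides can be computed directly. The sketch below assumes the natural logarithm; the slides do not state the base, but the -0.346 cutoff quoted on the hemizygous-deletion indicator slide is half of ln(1/2), about -0.693, which would be consistent with that choice. The function name is hypothetical.

```python
import math

def expected_log_r_ratio(copy_number, normal_copies=2):
    """Expected log(R-ratio) for a given copy number relative to 2 copies.

    Natural log is an assumption. A complete loss (CN=0) gives -infinity
    in theory, and a large negative value in real, noisy data.
    """
    if copy_number == 0:
        return float("-inf")
    return math.log(copy_number / normal_copies)

for cn in (0, 1, 2, 3):
    print(cn, expected_log_r_ratio(cn))
```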
21 Delineate CNV regions
- Eyeballing the theta and R-ratio plots (for large CNV regions)
- Cumulative plots
- Hidden Markov model
22 CNA in cancer cells: chronic lymphocytic leukemia (black: normal, blue: cancer cell), ch13
23 cumulative plot, detrended cp
Cumu Log (R-ratio)
Cumu homozygosity
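A minimal sketch of a cumulative plot and its detrended version, using a toy log(R-ratio) series. Detrending here means subtracting the straight line from zero to the final cumulative value; that is one simple choice and an assumption, not necessarily the exact procedure behind the figure. The function names are hypothetical.

```python
def cumulative(values):
    """Running sum of a series along the chromosome."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out

def detrend(cumu):
    """Subtract the straight line joining 0 to the final cumulative value.

    One simple detrending choice (an assumption here); the detrended
    curve ends at 0.
    """
    n = len(cumu)
    slope = cumu[-1] / n
    return [c - slope * (i + 1) for i, c in enumerate(cumu)]

# Toy log(R-ratio) series: probes 4-7 (0-based) sit in a deleted segment.
log_r = [0.0, 0.1, -0.1, 0.0, -0.7, -0.6, -0.8, -0.7, 0.1, 0.0, -0.1, 0.0]
cp = detrend(cumulative(log_r))
# The deleted segment shows up as a steady downhill stretch of the curve.
print([round(v, 2) for v in cp])
```

In this toy series the detrended curve peaks just before the deleted segment and bottoms out at its last probe, so the two turning points bracket the region.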
24 Combining two cumulative plots into one for hemizygous deletion
Cumu hemi-del indicator
Detrended cumu
Hemizygous deletion indicator variable: 1 if log(R-ratio) is between -2 and -0.346 AND homozygosity = 1; -1 otherwise
25 ... for homozygous deletion
Cumu homo-del indicator variable
Detrended
Homozygous deletion indicator variable: 1 if log(R-ratio) < -2; -1 otherwise
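The two indicator variables defined on the last two slides can be sketched as below, with a running sum standing in for the cumulative plot. The thresholds are those quoted on the slides. The probe values, the inclusive treatment of "between", and the function names are illustrative assumptions.

```python
def hemi_del_indicator(log_r, homozygous):
    """+1 if log(R-ratio) is between -2 and -0.346 AND the SNP is homozygous; else -1."""
    return 1 if (-2.0 <= log_r <= -0.346 and homozygous) else -1

def homo_del_indicator(log_r):
    """+1 if log(R-ratio) < -2; else -1."""
    return 1 if log_r < -2.0 else -1

def running_sum(values):
    """Cumulative sum of the +1/-1 indicator series."""
    total, out = 0, []
    for v in values:
        total += v
        out.append(total)
    return out

# Toy probes: (log R-ratio, homozygous call?)
probes = [(0.0, True), (0.1, False), (-0.7, True), (-0.6, True),
          (-0.8, True), (-2.5, True), (-3.0, True), (0.0, False)]

hemi = running_sum([hemi_del_indicator(r, h) for r, h in probes])
homo = running_sum([homo_del_indicator(r) for r, _ in probes])
print(hemi)  # rises over probes 2-4 (candidate hemizygous deletion)
print(homo)  # rises over probes 5-6 (candidate homozygous deletion)
```

Each curve climbs inside a candidate region and falls elsewhere, so a deletion appears as an uphill stretch whose ends mark the region boundaries.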
26 CLL, ch6
Li, Lee, Gregersen, BMC Bioinformatics (2009)
- Signal is mainly from LOH, not from R-ratio
- Much harder to be confident for regions of 10kb-50kb
27 Hidden Markov models
28 Some HMM-based CNV detection programs
- QuantiSNP: www.well.ox.ac.uk/QuantiSNP/
- PennCNV: www.neurogenome.org/cnv/penncnv/
- dChip: biosun1.harvard.edu/complab/dchip/copy.htm
29 Advantages and points to consider (HMM)
- Using the same set of parameters throughout the sequence implies that heterogeneity is not allowed
- The fixed parameters imply a characteristic length for CNV regions
- Uses information from the R-ratio and theta series simultaneously
- Standard algorithm
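To make these points concrete, here is a toy three-state HMM decoded with the Viterbi algorithm. It is a sketch under stated assumptions, not the model used by any of the programs listed earlier. It uses only the log(R-ratio) series, whereas those programs also use theta. The emission means (natural log of CN/2), the noise level `SD` and the stay probability `STAY` are all hypothetical values.

```python
import math

STATES = [1, 2, 3]                                     # copy-number states
MEANS = {1: math.log(0.5), 2: 0.0, 3: math.log(1.5)}   # assumed emission means
SD = 0.2                                               # assumed noise level
STAY = 0.98                                            # assumed prob. of staying in a state

def log_emission(state, x):
    """Log density of a Gaussian centred on the state's expected log(R-ratio)."""
    z = (x - MEANS[state]) / SD
    return -0.5 * z * z - math.log(SD * math.sqrt(2.0 * math.pi))

def log_transition(prev, cur):
    """Same transition parameters at every probe (no heterogeneity)."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1.0 - STAY) / (len(STATES) - 1))

def viterbi(log_r):
    """Most probable copy-number path for a log(R-ratio) series."""
    # Uniform prior over the first state.
    score = {s: -math.log(len(STATES)) + log_emission(s, log_r[0]) for s in STATES}
    back = []
    for x in log_r[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            pointers[cur] = best_prev
            new_score[cur] = (score[best_prev] + log_transition(best_prev, cur)
                              + log_emission(cur, x))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

series = [0.0, 0.05, -0.1, -0.7, -0.65, -0.75, -0.7, 0.1, 0.0, -0.05]
print(viterbi(series))  # [2, 2, 2, 1, 1, 1, 1, 2, 2, 2]
```

With a fixed `STAY`, the expected run length in any state is 1/(1 - STAY) = 50 probes, which is the characteristic length referred to above. Regions much shorter or longer than that are penalized by the same parameters everywhere along the chromosome.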
30 - 1. Overview of structural aberration
- 2. Using genotyping array for CNV study
- 3. Examples of disease mapping
- Autism
- Schizophrenia
- Crohn's disease
- 4. Practical issues
31 1972
32 Example (1): Autism
- Brain development disorder
- Age of diagnosis: 3
- Impairment in social interaction and communication; restricted interests; repetitive behavior
- Autism spectrum disorder: Pervasive Developmental Disorder - Not Otherwise Specified
- Concordance rate in MZ twins: 70%/90%; in DZ twins: 5%/10%
33 Sebat et al. Science (Apr 20, 2007)
34 Example (1): More details
- Roughly 200 patients and 200 controls (patients either have, or do not have, affected siblings)
- ROMA technology is used; resolution is 35kb
- 14 CNVs detected in 195 ASD, 2 CNVs in 196 controls (statistically significant)
- Out of 14 CNVs, 12 in sporadic cases, 2 in multiplex families
- 12 out of 15 CNVs in cases are deletions; the 2 CNVs in controls are duplications
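The "statistically significant" claim for 14 of 195 versus 2 of 196 can be checked with a Fisher exact test. The sketch below treats the slide's counts as carriers versus non-carriers, which is an assumption, and implements the two-sided test with the standard library only. The function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables (with the same margins) that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k carriers in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))

# Counts from the slide, treated here as carriers vs non-carriers (an assumption).
p = fisher_exact_two_sided(14, 195 - 14, 2, 196 - 2)
print(f"p = {p:.4f}")
```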
35 Smallest target gene (359kb)
>1Mb
Duplication in controls
36 Example (2): Schizophrenia
- Mental disorder
- Auditory hallucinations, paranoid delusions, disorganized speech and thinking.
- Age of onset: early adulthood
- Concordance rate for MZ twins: 48%; for DZ twins: 4%
37 Nature (Sep 11, 2008)
38 ISC paper: more details
- 3000 cases and 3000 controls
- Affymetrix Human SNP 5.0 and 6.0 arrays
- 6751 (>100kb) CNVs are detected: 1.14 CNVs per person in cases, 0.99 in controls
- Various attempts to increase the odds ratio (gene load, single CNV, larger-sized CNV), but it stays more or less the same
- Confirming a known risk deletion on ch22q11: 13 in cases, none in controls
- Other 271 (>500kb) deletions (161 in cases, 110 in controls). 15q13 (new). 1q21.
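As a rough worked example of the effect sizes quoted above, the sketch below computes simple case/control ratios. It assumes the rounded group sizes on the slide (3000 each) and treats each large deletion as one event per person. Both are simplifications, and the variable names are hypothetical.

```python
# Numbers quoted on the slide; group sizes are the slide's rounded 3000/3000.
n_cases, n_controls = 3000, 3000

# Genome-wide burden: CNVs per person.
burden_ratio = 1.14 / 0.99

# Large (>500kb) deletions: 161 in cases vs 110 in controls.
deletion_rate_ratio = (161 / n_cases) / (110 / n_controls)

print(f"burden ratio        = {burden_ratio:.2f}")         # 1.15
print(f"deletion rate ratio = {deletion_rate_ratio:.2f}")  # 1.46
```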
40 deCODE paper: more details
- 1433 cases, 33250 controls, followed by 3285 cases, 7951 controls
- Illumina HumanHap300, 550, Affymetrix Genome-Wide 6.0
- Only search for de novo CNVs (in 9878 parent-child transmissions): 66 are found
- Three deletions: 1q21.1, 15q11.2, 15q13.3
41 11/4718 (0.2%)
26/4718 (0.55%)
7/4718 (0.1%)
42 Example (3): Crohn's disease (McCarroll et al. Nat. Genet. 2008)
43 rs13361189
20kb deletion
44 Contributions of these new CNV studies
- Autism: a new explanation of why the concordance rate in MZ twins is high: the twins share the same deletion/duplication event.
- Schizophrenia: narrowing the 22q11.2 risk deletion region from 17-21Mb to 3Mb
- Crohn's disease: within the associated region/gene, locate the causal mutation
46 - 1. Overview of structural variations
- 2. Using genotyping array for CNV study
- 3. Examples of disease mapping
- 4. Practical issues
- common CNPs vs. de novo CNVs
47 Common vs. de novo CNV
- Present in the general population with fixed starting/ending positions
- Similar to SNPs, especially SNPs with the same frequency
- Are they already captured by SNPs?
48 Common CNV
Perfect LD between CNV and SNPs can be either good or bad.
Good: provides a new candidate for the causal mutation.
Bad: the causal mutation can be a SNP.
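Whether a common CNV is "already captured" by a SNP is usually judged by the squared correlation r^2 between the two on phased haplotypes. Here is a minimal sketch with made-up haplotypes; the data and the function name are illustrative only.

```python
def r_squared(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic loci.

    hap_a, hap_b: 0/1 alleles on the same phased haplotypes, e.g.
    hap_a = deletion present?, hap_b = SNP minor allele present?
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

deletion = [1, 1, 0, 0, 0, 0, 0, 0]
tag_snp  = [1, 1, 0, 0, 0, 0, 0, 0]   # always travels with the deletion
other    = [1, 0, 1, 0, 0, 0, 0, 0]   # only partly correlated

print(r_squared(deletion, tag_snp))   # 1.0: perfect LD, SNP fully tags the CNV
print(r_squared(deletion, other))
```

With r^2 = 1 an association test cannot distinguish the deletion from its tagging SNP, which is exactly the "good or bad" ambiguity described above.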
49 Common vs. de novo CNV
- More deleterious. Negative evolutionary selection makes de novo CNVs more relevant to diseases like autism and schizophrenia.
- De novo CNVs should be easier to detect than de novo SNPs (until we have cheaper sequencing technologies)
- Similar to cytogenetic studies (either genetic or cancer studies): smaller sample sizes, individually distinct mutations
50 De novo CNV
The size of a de novo CNV can be large, so it may cover too many genes
51 Shares similar issues with SNP-based whole-genome association studies
- Ethnic/population stratification
- Missing-genotype rates differ between case and control groups (usually the heterozygotes are most likely to be untyped)
- Multiple testing
52 Summary
- CNV as a new detectable source of variation/mutation/polymorphism should not be overlooked
- Studies on de novo CNV in autism and schizophrenia represent a promising new strategy
- CNV hot spots (e.g. segmental duplication regions)
- Relevance of common CNVs to common diseases is to be examined
53 Source materials
- Feuk et al. (2006), Structural variations in the human genome, Nat. Rev. Genet. 7:85-97.
- URL www.nslij-genetics.org/duplication/ (600 papers)
- Peiffer, Gunderson (2006), SNP-CGH technologies for genomic profiling of LOH and copy number, Clinical Laboratory International (May 06).
- Li, Lee, Gregersen (2009), Copy-number-variation and copy-number-alteration region detection by cumulative plots, BMC Bioinf., to appear.
- Sebat et al. (2007), Strong association of de novo copy number mutations with autism, Science, 316:445-449.
- Stefansson et al. (2008), Large recurrent microdeletions associated with schizophrenia, Nature, 455:232-236.
- Int. Schizophrenia Consortium (2008), Rare chromosomal deletions and duplications increase risk of schizophrenia, Nature, 455:237-241.
- McCarroll et al. (2008), Deletion polymorphism upstream of IRGM associated with altered IRGM expression and Crohn's disease, Nature Genet. 40:1107-1112.
- McCarroll (2008), Extending genome-wide association studies to copy-number variation, Hum. Mol. Genet. 17:R135-R142.
- URL www.nslij-genetics.org/cnv/ (300 papers)