Title: Factors to Consider in Selecting a Genotyping Platform
1Factors to Consider in Selecting a Genotyping
Platform
- Elizabeth Pugh
- June 22, 2007
2GWA Studies
- Genotype 300,000 to 1,000,000 SNPs
- 3 platforms, multiple products
- Affymetrix
- Illumina
- Perlegen
- How to choose?
3What I can cover
- Basics of calling genotypes
- Examples of good and bad data
- Some things to consider
4Basics of how it works
- Skipping chemistry
- Generate intensity data for 2 alleles
- Assign genotypes based on clustering
- These are phenotypes: there is measurement error
- No manual review of data: too many SNPs
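The two steps on this slide (intensity for two alleles, then a genotype assigned by cluster) can be sketched as below. This is a toy illustration, not any platform's calling algorithm: real callers fit clusters per SNP across many samples, whereas this sketch uses fixed cut-offs. The function name `call_genotype` and every threshold are assumptions made for the example.

```python
import math

def call_genotype(a_intensity, b_intensity, min_total=0.2):
    """Toy caller: assign AA/AB/BB from two allele intensities.

    All thresholds are illustrative assumptions, not platform values.
    Returns None when no call is made.
    """
    total = a_intensity + b_intensity
    if total < min_total:
        return None  # too little signal to call
    # theta runs from 0 (pure A signal) to 1 (pure B signal)
    theta = math.atan2(b_intensity, a_intensity) / (math.pi / 2)
    if theta < 0.25:
        return "AA"
    if theta > 0.75:
        return "BB"
    if 0.35 <= theta <= 0.65:
        return "AB"
    return None  # falls between clusters: no call

if __name__ == "__main__":
    for a, b in [(1.0, 0.05), (0.5, 0.5), (0.05, 1.0), (0.05, 0.05), (1.0, 0.5)]:
        print(a, b, call_genotype(a, b))
```

Because the call depends on a measured intensity, a sample near a cluster boundary can be left uncalled or called wrong, which is the sense in which genotypes behave like phenotypes with measurement error.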
5A good SNP
6Same SNP different view
7Same SNP different view
8Another Good SNP
9And Another Good SNP
10Data Quality
- Most of the data is good for all platforms
- Some samples, SNPs and genotypes fail
- Have to find them without manual review
11Ways to find bad data
- Use summary statistics across SNPs, samples
- Include investigator and control replicates
- Include control and, where possible, investigator trios
- If you use HapMap controls, you can compare (with caution) to HapMap genotypes: there are some errors in HapMap data
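A minimal sketch of how replicates and trios yield error checks, assuming genotypes are stored as two-letter strings such as "AB" with None for a no-call. The function names are hypothetical, and the trio check covers autosomal SNPs only.

```python
def replicate_discordance(calls_1, calls_2):
    """Fraction of SNPs called in both replicates whose genotypes differ.

    Returns None if no SNP was called in both replicates.
    """
    both = [(a, b) for a, b in zip(calls_1, calls_2)
            if a is not None and b is not None]
    if not both:
        return None
    return sum(a != b for a, b in both) / len(both)

def is_mendel_consistent(father, mother, child):
    """True if the child could inherit one allele from each parent (autosomes)."""
    return any(sorted(f + m) == sorted(child)
               for f in father for m in mother)

if __name__ == "__main__":
    rep1 = ["AA", "AB", None, "BB"]
    rep2 = ["AA", "BB", "AB", "BB"]
    print(replicate_discordance(rep1, rep2))
    print(is_mendel_consistent("AA", "AA", "AB"))
```

The same comparison function could be pointed at HapMap genotypes for control samples, keeping in mind that a discordance may be an error on either side.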
12Finding Bad SNPs
- Use qc checks
- Call rate
- Mendelian Inheritance
- Replicates
- HWE
- Quality score, clustering
- Note some bad SNPs will pass any qc filter
- Some good SNPs may fail qc
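Two of the checks listed above, call rate and HWE, can be computed from genotype counts alone. The sketch below assumes genotypes coded as "AA", "AB", "BB" or None, and uses the standard one-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium; the function names are hypothetical, and any pass/fail threshold is left to the reader.

```python
def snp_call_rate(calls):
    """Fraction of samples with a genotype call at this SNP."""
    return sum(c is not None for c in calls) / len(calls)

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 df) comparing observed genotype counts
    with Hardy-Weinberg expectations. Returns None if there are no calls."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return None
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

if __name__ == "__main__":
    print(snp_call_rate(["AA", "AB", None, "BB"]))
    print(hwe_chi_square(25, 50, 25))   # counts exactly at HWE
    print(hwe_chi_square(50, 0, 50))    # no heterozygotes at all
```

A SNP whose heterozygote cluster was merged into a homozygote cluster tends to show a large statistic, which is one reason HWE is useful as a clustering check.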
13Bad SNP caught by qc filter
14Bad SNP caught by qc filter
15Bad SNP caught by qc filter
16Bad SNP caught by qc filter
17Bad SNP caught by qc filter
18Bad SNP caught by qc filter
19Bad SNP caught by qc filter
20Bad SNP caught by qc filter
21Bad SNP caught by qc filter
22Yikes! Some of those are awful!
- Yes
- We can find many, hopefully most, of them, but not all
- Use the intensity data to plot your most significant SNPs
- Look at them before you publish
23Use a lab that will give you intensity data
- If you have intensity data you can
- Plot the intensities to check clustering
- Cluster with a different algorithm
- Recluster as algorithms get better
- Recluster subsets or supersets of the data
- Create your own metrics (e.g. number of samples
with no or very low intensity)
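The example metric named on this slide, the number of samples with no or very low intensity at a SNP, could be computed along these lines. The cut-off `min_total` and the (A, B) pair layout are assumptions for illustration.

```python
def count_low_intensity(intensities, min_total=0.2):
    """Count samples whose summed A+B intensity is below min_total.

    intensities: list of (a, b) pairs, one per sample, for a single SNP.
    min_total is an illustrative cut-off, not a recommended value.
    """
    return sum(1 for a, b in intensities if a + b < min_total)

if __name__ == "__main__":
    snp = [(1.0, 0.1), (0.05, 0.05), (0.0, 0.0), (0.4, 0.5)]
    print(count_low_intensity(snp))
```

A SNP where many otherwise good samples show low intensity may be worth plotting even if it passes the standard filters.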
24Finding Bad Samples
- Look at sample-level metrics, starting with call rate
- Bad samples, even water, will have some genotypes
- May want to remove possibly bad samples before clustering the data, then make final sample decisions
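A sketch of the first pass described above: compute call rate per sample and flag low ones for removal before reclustering. The 0.95 threshold and the function names are hypothetical; a suitable threshold depends on the platform and sample type.

```python
def sample_call_rates(genotypes):
    """genotypes: dict of sample_id -> list of calls (None = no call)."""
    return {sample: sum(c is not None for c in calls) / len(calls)
            for sample, calls in genotypes.items()}

def flag_low_call_rate(genotypes, threshold=0.95):
    """Sample IDs whose call rate is below threshold (assumed value)."""
    rates = sample_call_rates(genotypes)
    return sorted(s for s, r in rates.items() if r < threshold)

if __name__ == "__main__":
    data = {
        "sample_1": ["AA", "AB", "BB", "AA"],
        "water":    ["AA", None, None, None],  # even a blank yields some calls
    }
    print(sample_call_rates(data))
    print(flag_low_call_rate(data))
```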
25Sample plot: all SNPs for one sample, sample call rate 99.8%
26Sample plot: failed sample, low intensity, call freq 41%
27Failed samples tend to fall outside of clusters
for many SNPs
28Failed samples tend to fall outside of clusters
for many SNPs
29Can I use WGA samples?
- Whole Genome Amplified (WGA) DNA performance ranges from awful to very good
- Even WGA samples that work very well may perform poorly for some SNPs
- Extra attention needed for clustering decisions and for analysis
- Make sure the lab knows the sample type for each sample
30WGA clustering with other samples
31WGA: lower intensity, call freq 98%
32WGA failure: call rate 93%
33Multiple sample types in study
- Look at data by sample type (metrics and plots)
- If they are not performing equivalently, do lots of extra qc by sample type
- If you have to cluster separately, even more qc and checks are needed
- If sample type is not random, it may cause more headaches (e.g. different types for cases and controls)
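Looking at metrics by sample type can start with something as simple as grouping per-sample call rates. The sketch below assumes two dictionaries keyed by sample ID; the sample-type labels and function name are made up for the example.

```python
from collections import defaultdict
from statistics import mean

def call_rate_by_sample_type(call_rates, sample_types):
    """Mean call rate per sample type.

    call_rates:   dict of sample_id -> call rate (0..1)
    sample_types: dict of sample_id -> type label (e.g. 'blood', 'WGA')
    """
    grouped = defaultdict(list)
    for sample, rate in call_rates.items():
        grouped[sample_types[sample]].append(rate)
    return {stype: mean(rates) for stype, rates in grouped.items()}

if __name__ == "__main__":
    rates = {"s1": 0.99, "s2": 0.97, "s3": 0.93}
    types = {"s1": "blood", "s2": "blood", "s3": "WGA"}
    print(call_rate_by_sample_type(rates, types))
```

The same grouping applies to any other sample-level metric, and the result is only a starting point: per-type cluster plots are still needed for SNPs of interest.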
34Preventing Bad Data
- Discuss sample types with the lab: what is their experience? May want to test some before starting the project
- Discuss plating with the lab: may wish to place controls uniquely or arrange males and females uniquely by plate
35Preventing Bad Data
- Differences in intensity (batch effects) are not common but possible
- May only be present for a subset of SNPs
- May want to mix cases and controls across plates to minimize the impact of a plate effect if it happens
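One way to mix cases and controls across plates is to shuffle each group and deal it out round-robin, so every plate receives a near-equal share of both. This is a sketch under simplifying assumptions: plate capacity, control wells and sex-based layouts are not modeled, and `assign_plates` is a hypothetical helper.

```python
import random

def assign_plates(cases, controls, n_plates, seed=0):
    """Spread cases and controls evenly over n_plates.

    Each group is shuffled, then dealt round-robin. Dealing of the
    second group continues where the first stopped, which keeps
    plate sizes balanced.
    """
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    plates = [[] for _ in range(n_plates)]
    offset = 0
    for group in (cases, controls):
        shuffled = list(group)
        rng.shuffle(shuffled)
        for i, sample in enumerate(shuffled):
            plates[(i + offset) % n_plates].append(sample)
        offset += len(shuffled)
    return plates

if __name__ == "__main__":
    cases = [f"case_{i}" for i in range(8)]
    controls = [f"control_{i}" for i in range(8)]
    for number, plate in enumerate(assign_plates(cases, controls, 4), start=1):
        print(number, plate)
```

With a layout like this, a plate-level intensity shift affects cases and controls about equally instead of masquerading as a case-control difference.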
36Genotypes
- For good SNPs and samples, some genotypes will fail
- May not be called
- May be called with low confidence or quality score
- May be called wrong
37One genotype not called
38One wrong genotype
39Copy number
- With Affymetrix and Illumina, intensity information can be used to infer copy number
- Works very well with small numbers of samples and manual review
- Not really a high-throughput system: software is not sensitive or specific enough, yet
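The basic idea of reading copy number from intensity can be sketched as a log2 ratio of a sample's total intensity against a reference set at the same SNP. This is a simplified illustration, not either vendor's method; real tools normalize intensities and combine evidence across neighboring SNPs. The function name and the numbers are assumptions.

```python
import math
from statistics import median

def log2_ratio(sample_total, reference_totals):
    """log2 of sample intensity over the reference median at one SNP.

    Near 0 suggests the same copy number as the reference;
    near -1 suggests half as many copies (e.g. one copy versus two).
    """
    return math.log2(sample_total / median(reference_totals))

if __name__ == "__main__":
    reference = [1.0, 1.1, 0.9]        # hypothetical two-copy samples
    print(log2_ratio(1.0, reference))  # same copy number as reference
    print(log2_ratio(0.5, reference))  # half the signal
```

A single SNP is far too noisy to call on its own, which is consistent with the point above that automated calling is not yet sensitive or specific enough.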
40Genome viewer
41Female Chr X
42Male Chr X
43Known Frequent CNV chr 10
44Known Frequent CNV chr 10
45Choosing a Platform and Product: Factors to Consider
- Your study
- Population
- Study design
- Sample types
- Combining data with other studies
- Interest in CNVs
- Product
- Coverage of the genome
- How many SNPs
- Which SNPs (tagging, in or near genes)
- Quality of data
- Performance on your sample types
- Information on CNVs
46Comparing Platforms: Make sure the numbers are comparable!
- QC rates reported: denominators can differ
- Mendel errors per trio or per sample
- Replicate errors per pair or per sample
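A small worked example of why the denominator matters. The counts below are made up purely for illustration: the same number of Mendel errors gives a rate three times larger per trio than per sample, because each trio contains three samples.

```python
def error_rate(n_errors, n_units, n_snps):
    """Errors per unit per SNP, where a unit is a trio, a pair or a sample."""
    return n_errors / (n_units * n_snps)

if __name__ == "__main__":
    # Hypothetical numbers, not taken from any platform report.
    n_errors, n_trios, n_snps = 30, 10, 500_000
    per_trio = error_rate(n_errors, n_trios, n_snps)
    per_sample = error_rate(n_errors, n_trios * 3, n_snps)
    print(f"per trio:   {per_trio:.2e}")
    print(f"per sample: {per_sample:.2e}")
    print(f"ratio:      {per_trio / per_sample:.1f}")
```

The same holds for replicate errors, where a per-pair rate is twice the per-sample rate, so two reports can describe identical data with different-looking numbers.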
47Comparing Platforms: Make sure the numbers are comparable!
- SNPs on the chip are correlated with many others, often very strongly
- There are multiple
- measures of the strength of the correlation
- lists of SNPs to use as a proxy for the genome
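Two commonly used measures of correlation between SNPs are r² and D′, and they can disagree sharply for the same pair. The sketch below uses the standard definitions from haplotype and allele frequencies; the function name and the example frequencies are assumptions.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (r_squared, abs_d_prime) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1
          and allele B at SNP 2
    p_a, p_b: frequencies of those alleles (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r_squared, d_prime

if __name__ == "__main__":
    # Allele B (frequency 0.2) always occurs with allele A (frequency 0.5).
    r2, dp = ld_measures(p_ab=0.2, p_a=0.5, p_b=0.2)
    print(f"r^2 = {r2:.2f}, |D'| = {dp:.2f}")
```

In this example |D′| is at its maximum while r² is low, so a coverage figure quoted at one measure and threshold is not comparable with one quoted at another.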
48Cost?
- Hard to say
- Changing rapidly
- Generally increases with the number of SNPs on a chip
- May decrease with the number of samples in a study
- Reagents (the chips) are only part of the cost
49New Stuff!
50New GWA Arrays: Affymetrix and Illumina
- 1 million SNPs
- Enhanced copy number content
- Different strategies
- Improved coverage in YRI population
- Illumina 1M still pre-release
- Same chemistry, same software, same probe designs, same lab workflow as other Infinium products
- Affymetrix 6.0 just released
- Same chemistry and lab workflow as 5.0
- Changes in probe design and software
51More SNPs are better, right?
- Maybe not always
- There are methods that use the genotypes on samples plus HapMap data to infer ungenotyped SNPs
- Can use inferred genotypes in analysis
- Can combine data from studies that used different SNPs
- More samples on fewer genotypes may give more power
- Need enough genotypes for your population to infer SNPs
52One or Two Stage Designs
- A year ago everyone was thinking about 2-stage designs
- GWA scan on part of the sample
- Follow up a subset of significant results in the rest of the sample
- Now it may cost less to do the GWA scan on all samples
53Effect Size of 1.2 !!!!
- Recent GWA studies have found small effect sizes
- May need many, many samples to have reasonable
power
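To give a feel for "many, many samples", here is a rough sample-size sketch. It reads the 1.2 on this slide as an allelic odds ratio, which is an assumption, and uses the standard normal-approximation formula for comparing two proportions (allele frequencies in cases and controls), treating each person as two independent alleles. The control allele frequency, significance level and power are illustrative choices, not values from the talk.

```python
import math
from statistics import NormalDist

def samples_per_group(control_freq, odds_ratio, alpha=0.05, power=0.8):
    """Approximate number of cases (and of controls) for an allelic test.

    Two-sided test of equal allele frequency, normal approximation,
    equal group sizes, two alleles per person.
    """
    p0 = control_freq
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)  # case allele frequency
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    alleles = numerator / (p1 - p0) ** 2
    return math.ceil(alleles / 2)  # two alleles per person

if __name__ == "__main__":
    for alpha in (0.05, 1e-7):  # single test vs. an assumed stringent level
        n = samples_per_group(control_freq=0.3, odds_ratio=1.2, alpha=alpha)
        print(f"alpha={alpha:g}: about {n} cases and {n} controls")
```

The required sample size grows quickly as the significance level is tightened to account for hundreds of thousands of tests, and it grows further if the genotyped SNP is only a proxy for the causal variant.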
54Choosing a platform
- Must balance coverage, QC and cost per sample to design the most powerful study you can
- Costs, products, clustering, qc and analysis methods are changing rapidly
- What is best will change
55www.cidr.jhmi.edu
56The end