Factors to Consider in Selecting a Genotyping Platform

Learn more at: https://www.genome.gov
1
Factors to Consider in Selecting a Genotyping
Platform
  • Elizabeth Pugh
  • June 22, 2007

2
GWA Studies
  • Genotype 300,000 to 1,000,000 SNPs
  • 3 platforms, multiple products
    • Affymetrix
    • Illumina
    • Perlegen
  • How to choose?

3
What I can cover
  • Basics of calling genotypes
  • Examples of good and bad data
  • Some things to consider

4
Basics of how it works
  • Skipping chemistry
  • Generate intensity data for 2 alleles
  • Assign genotypes based on clustering
  • These are phenotypes: there is measurement
    error
  • No manual review of data: too many SNPs
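A minimal sketch of genotype calling from two-allele intensities. Real platforms cluster the data per SNP; here fixed thresholds on a polar transform stand in for the cluster boundaries, and every threshold value and function name is an illustrative assumption, not any vendor's algorithm.

```python
import math

def polar(x, y):
    """Convert allele A/B intensities to (theta, r).

    theta is scaled to [0, 1]: 0 means all signal from allele A,
    1 means all signal from allele B. r is the total intensity.
    """
    theta = (2 / math.pi) * math.atan2(y, x)
    return theta, x + y

def call_genotype(x, y, min_r=0.2, aa_max=0.25, bb_min=0.75,
                  ab_lo=0.4, ab_hi=0.6):
    """Return "AA", "AB", "BB", or None (no call).

    All thresholds are hypothetical; a real caller would estimate
    cluster positions from the data for each SNP.
    """
    theta, r = polar(x, y)
    if r < min_r:
        return None      # too little signal to call
    if theta <= aa_max:
        return "AA"
    if theta >= bb_min:
        return "BB"
    if ab_lo <= theta <= ab_hi:
        return "AB"
    return None          # falls between clusters: leave uncalled

# Made-up (A intensity, B intensity) pairs for five samples at one SNP
samples = [(1.0, 0.05), (0.5, 0.5), (0.04, 0.9), (0.05, 0.03), (0.8, 0.4)]
calls = [call_genotype(x, y) for x, y in samples]
```

The last two samples are left uncalled: one has almost no intensity, the other sits between the AA and AB regions.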

5
A good SNP
6
Same SNP different view
7
Same SNP different view
8
Another Good SNP
9
And Another Good SNP
10
Data Quality
  • Most of the data is good for all platforms
  • Some samples, SNPs and genotypes fail
  • Have to find them without manual review

11
Ways to find bad data
  • Use summary statistics across SNPs, samples
  • Include investigator and control replicates
  • Include control trios and, where possible,
    investigator trios
  • If you use HapMap controls, you can compare (with
    caution) to HapMap genotypes: there are some
    errors in HapMap data
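The replicate and trio checks above can be sketched as follows. This assumes autosomal SNPs and genotypes coded as two-letter strings such as "AA", "AB", "BB", with None for a no-call; the function names are hypothetical.

```python
def replicate_discordance(calls_a, calls_b):
    """Fraction of SNPs called in both replicates whose genotypes differ.

    Returns None if no SNP was called in both replicates.
    """
    both = [(a, b) for a, b in zip(calls_a, calls_b)
            if a is not None and b is not None]
    if not both:
        return None
    return sum(a != b for a, b in both) / len(both)

def mendel_consistent(father, mother, child):
    """True if the child's genotype can be formed from one allele per parent.

    Autosomal SNPs only; X-linked SNPs in males need separate handling.
    """
    possible = {"".join(sorted(f + m)) for f in father for m in mother}
    return "".join(sorted(child)) in possible

rep1 = ["AA", "AB", None, "BB"]
rep2 = ["AA", "BB", "AB", "BB"]
discordance = replicate_discordance(rep1, rep2)   # 1 of 3 shared calls differs
```

Note that the discordance here is computed per replicate pair over SNPs called in both; other denominators are possible and give different numbers.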

12
Finding Bad SNPs
  • Use QC checks
    • Call rate
    • Mendelian inheritance
    • Replicates
    • HWE
    • Quality score, clustering
  • Note some bad SNPs will pass any qc filter
  • Some good SNPs may fail qc
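Two of the SNP-level checks listed above, call rate and HWE, can be sketched in a few lines. The HWE check here is the simple one-degree-of-freedom chi-square statistic; exact tests are often preferred for rare alleles. Function names and the no-call coding (None) are assumptions.

```python
def call_rate(calls):
    """Fraction of samples with a genotype call at one SNP (None = no call)."""
    return sum(c is not None for c in calls) / len(calls)

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations. Requires at least one called genotype."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic SNPs)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

rate = call_rate(["AA", None, "AB", "BB"])   # 3 of 4 samples called
in_hwe = hwe_chi_square(25, 50, 25)          # counts match expectation
no_hets = hwe_chi_square(50, 0, 50)          # no heterozygotes at all
```

A SNP with no heterozygotes, as in the last line, gives a very large statistic and is the kind of clustering failure an HWE filter tends to catch.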

13
Bad SNP caught by qc filter
14
Bad SNP caught by qc filter
15
Bad SNP caught by qc filter
16
Bad SNP caught by qc filter
17
Bad SNP caught by qc filter
18
Bad SNP caught by qc filter
19
Bad SNP caught by qc filter
20
Bad SNP caught by qc filter
21
Bad SNP caught by qc filter
22
Yikes! Some of those are awful!
  • Yes
  • We can find many, hopefully most, of them, but:
  • Use the intensity data to plot your most
    significant SNPs
  • Look at them before you publish

23
Use a lab that will give you intensity data
  • If you have intensity data you can:
  • Plot the intensities to check clustering
  • Cluster with a different algorithm
  • Recluster as algorithms get better
  • Recluster subsets or supersets of the data
  • Create your own metrics (e.g. number of samples
    with no or very low intensity)
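As one example of a home-made metric, the count of samples with no or very low intensity at a SNP might be computed as below. The threshold of 0.2 and the use of summed A+B intensity are assumptions chosen for illustration.

```python
def low_intensity_count(intensities, threshold=0.2):
    """Count samples whose summed A+B intensity at one SNP is below threshold.

    intensities: list of (allele A intensity, allele B intensity) pairs,
    one pair per sample.
    """
    return sum((a + b) < threshold for a, b in intensities)

# Made-up intensities for three samples at one SNP
snp_intensities = [(0.5, 0.4), (0.05, 0.03), (0.0, 0.0)]
n_low = low_intensity_count(snp_intensities)
```

A SNP where many samples have low intensity may deserve a look at its cluster plot even if it passes the standard filters.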

24
Finding Bad Samples
  • Look at sample level metrics starting with call
    rate
  • Bad samples, even water, will still have some
    genotypes called
  • May want to remove possibly bad samples before
    clustering the data, then make final sample
    decisions
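A minimal sketch of a first-pass sample screen by call rate. The 0.9 cutoff is an arbitrary placeholder, and the sample names are made up; the point is only that samples set aside here would be excluded from clustering and reviewed again afterwards.

```python
def sample_call_rate(calls):
    """Fraction of SNPs with a genotype call for one sample (None = no call)."""
    return sum(c is not None for c in calls) / len(calls)

def split_by_call_rate(calls_by_sample, min_rate=0.9):
    """Split samples into (keep, review) lists using a provisional cutoff."""
    keep, review = [], []
    for sample, calls in calls_by_sample.items():
        target = keep if sample_call_rate(calls) >= min_rate else review
        target.append(sample)
    return keep, review

calls_by_sample = {
    "s1": ["AA"] * 10,                 # every SNP called
    "water": ["AA"] + [None] * 9,      # a blank still gets one call
}
keep, review = split_by_call_rate(calls_by_sample)
```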

25
Sample plot: all SNPs for one sample (sample call
rate 99.8)
26
Sample plot: failed sample, low intensity (call
freq 41)
27
Failed samples tend to fall outside of clusters
for many SNPs
28
Failed samples tend to fall outside of clusters
for many SNPs
29
Can I use WGA samples?
  • Whole Genome Amplified DNA performance ranges
    from awful to very good
  • Even WGA samples that work very well may perform
    poorly for some SNPs
  • Extra attention needed for clustering decisions
    and for analysis
  • Make sure lab knows sample type for each sample

30
WGA clustering with other samples
31
WGA: lower intensity (call freq 98)
32
WGA failure: call rate 93
33
Multiple sample types in study
  • Look at data by sample type (metrics and plots)
  • If they are not performing equivalently, do lots
    of extra QC by sample type
  • If you have to cluster separately, even more QC
    and checks are needed
  • If sample type is not random, it may cause more
    headaches (e.g. different types for cases and
    controls)

34
Preventing Bad Data
  • Discuss sample types with the lab: what is their
    experience? May want to test some before starting
    the project
  • Discuss plating with the lab: may wish to place
    controls uniquely or arrange males and females
    uniquely by plate

35
Preventing Bad Data
  • Differences in intensity (batch effects) are not
    common but possible
  • May only be present for subset of SNPs
  • May want to mix cases and controls across plates
    to minimize the impact of a plate effect if one
    occurs
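One simple way to mix cases and controls across plates is to shuffle each group and deal samples to plates in turn. This is a sketch under assumptions: 96-well plates, no wells reserved for controls or replicates, and a hypothetical function name.

```python
import math
import random

def assign_plates(cases, controls, plate_size=96, seed=0):
    """Randomly spread cases and controls evenly across plates.

    Each group is shuffled, then samples are dealt round-robin, so every
    plate gets nearly the same number of cases and of controls.
    """
    rng = random.Random(seed)          # fixed seed makes the layout reproducible
    cases, controls = list(cases), list(controls)
    rng.shuffle(cases)
    rng.shuffle(controls)
    n_plates = math.ceil((len(cases) + len(controls)) / plate_size)
    plates = [[] for _ in range(n_plates)]
    for i, sample in enumerate(cases + controls):
        plates[i % n_plates].append(sample)
    return plates

cases = [f"case{i}" for i in range(100)]
controls = [f"ctrl{i}" for i in range(100)]
plates = assign_plates(cases, controls)
cases_per_plate = [sum(s.startswith("case") for s in p) for p in plates]
```

In practice the layout would also need to reserve wells for the lab's controls and any investigator replicates.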

36
Genotypes
  • For good SNPs and samples some genotypes will
    fail
  • May not be called
  • May be called with low confidence or quality
    score
  • May be called wrong

37
1 genotype not called
38
1 wrong genotype
39
Copy number
  • With Affymetrix and Illumina intensity
    information can be used to infer copy number
  • Works very well with small numbers of samples and
    manual review
  • Not really a high-throughput system: software is
    not sensitive or specific enough yet
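The slides do not say how copy number is inferred. One common approach, assumed here purely for illustration, is to compare a sample's total intensity at each SNP with the intensity expected for two copies and look at the log2 ratio across neighboring SNPs.

```python
import math

def log2_ratio(observed_r, expected_r):
    """log2 of observed total intensity over the intensity expected
    for two copies. Roughly 0 for normal copy number."""
    return math.log2(observed_r / expected_r)

def mean_log2_ratio(observed, expected):
    """Average log2 ratio over a run of neighboring SNPs."""
    ratios = [log2_ratio(o, e) for o, e in zip(observed, expected)]
    return sum(ratios) / len(ratios)

# Idealized case: half the expected signal, as for a single copy
one_copy = log2_ratio(1.0, 2.0)

# Made-up intensities for four neighboring SNPs in one sample
region = mean_log2_ratio([1.1, 0.9, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0])
```

Real data are noisier and the ratios are compressed relative to this idealized case, which is part of why manual review is still needed.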

40
Genome viewer
41
Female Chr X
42
Male chrX
43
Known Frequent CNV chr 10
44
Known Frequent CNV chr 10
45
Choosing a Platform and Product: Factors to
Consider
  • Your study
    • Population
    • Study design
    • Sample types
    • Combining data with other studies
    • Interest in CNVs
  • Product
    • Coverage of the genome
    • How many SNPs
    • Which SNPs (tagging, in or near genes)
    • Quality of data
    • Performance on your sample types
    • Information on CNVs

46
Comparing Platforms: Make sure the numbers are
comparable!
  • QC rates reported: denominators can differ
  • Mendel errors per trio or per sample
  • Replicate errors per pair or per sample
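A small worked example of why the denominator matters. The counts are made up; the same number of Mendelian errors gives a rate three times lower when divided by samples instead of trios.

```python
def error_rate(n_errors, n_units, n_snps):
    """Errors per unit per SNP, where a unit is a trio, a pair, or a sample."""
    return n_errors / (n_units * n_snps)

# Hypothetical study: 300 Mendelian errors in 10 trios over 500,000 SNPs
n_errors, n_trios, n_snps = 300, 10, 500_000

per_trio = error_rate(n_errors, n_trios, n_snps)          # denominator: trios
per_sample = error_rate(n_errors, 3 * n_trios, n_snps)    # denominator: samples

print(f"per trio:   {per_trio:.1e}")
print(f"per sample: {per_sample:.1e}")
```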

47
Comparing Platforms: Make sure the numbers are
comparable!
  • SNPs on the chip are correlated with many others,
    often very strongly
  • There are multiple measures of the strength of
    the correlation
  • Lists of SNPs to use as proxy for Genome
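Two widely used measures of correlation between SNPs are r² and D′ (the slides do not name which measures they mean, so this pairing is an assumption). A sketch computing both from allele and haplotype frequencies shows how far apart they can be for the same pair of SNPs.

```python
def ld_measures(p_a, p_b, p_ab):
    """Return (r_squared, d_prime) for two biallelic SNPs.

    p_a, p_b: frequencies of allele A at SNP 1 and allele B at SNP 2
              (both must be strictly between 0 and 1)
    p_ab:     frequency of the haplotype carrying both A and B
    """
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r_squared, d_prime

# Equal allele frequencies, B always travels with A
r2_same, dp_same = ld_measures(0.5, 0.5, 0.5)

# Rare B allele that only occurs on A haplotypes
r2_rare, dp_rare = ld_measures(0.5, 0.1, 0.1)
```

In the second case D′ is at its maximum while r² is only about 0.11, so coverage figures based on different measures or thresholds are not directly comparable.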

48
Cost?
  • Hard to say
  • Changing rapidly
  • Generally increases with the number of SNPs on a
    chip
  • May decrease with number of samples in a study
  • Reagents (the chips) are only part of the cost

49
New Stuff!
50
New GWA Arrays: Affymetrix and Illumina
  • 1 million SNPs
  • Enhanced copy number content
  • Different strategies
  • Improved coverage in YRI population
  • Illumina 1M still pre-release
    • Same chemistry, same software, same probe
      designs, same lab workflow as other Infinium
      products
  • Affymetrix 6.0 just released
    • Same chemistry and lab workflow as 5.0
    • Changes in probe design and software

51
More SNPs are better, right?
  • Maybe not always
  • Methods that use the genotypes on samples plus
    Hapmap data to infer ungenotyped SNPs
  • Can use inferred genotypes in analysis
  • Can combine data from studies that used different
    SNPs
  • More samples on fewer genotypes may give more
    power
  • Need enough genotypes for your population to
    infer SNPs

52
One or Two Stage Designs
  • A year ago everyone was thinking about 2 stage
    designs
  • GWA scan on part of sample
  • Follow up a subset of significant results in rest
    of sample
  • Now may cost less to do GWA scan on all samples

53
Effect Size of 1.2 !!!!
  • Recent GWA studies have found small effect sizes
  • May need many, many samples to have reasonable
    power
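A rough sample-size sketch for the point above, under stated assumptions: the effect size of 1.2 is read as an allelic odds ratio, the test is a two-sided comparison of allele frequencies between cases and controls using the normal approximation, and the significance level of 5e-7 is a placeholder for a genome-wide threshold. None of these choices come from the slides.

```python
import math
from statistics import NormalDist

def case_allele_freq(p0, odds_ratio):
    """Allele frequency in cases given the control frequency and odds ratio."""
    odds = odds_ratio * p0 / (1 - p0)
    return odds / (1 + odds)

def individuals_per_group(p0, odds_ratio, alpha=5e-7, power=0.8):
    """Approximate cases (and, equally, controls) needed for an allelic test.

    Uses the standard two-proportion sample-size formula on alleles,
    then divides by 2 because each individual carries two alleles.
    """
    p1 = case_allele_freq(p0, odds_ratio)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    alleles = numerator / (p1 - p0) ** 2
    return math.ceil(alleles / 2)

n_small_effect = individuals_per_group(p0=0.3, odds_ratio=1.2)
n_larger_effect = individuals_per_group(p0=0.3, odds_ratio=1.5)
```

Under these assumptions the odds ratio of 1.2 needs several times as many individuals as an odds ratio of 1.5, which is the slide's point about needing many, many samples.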

54
Choosing a platform
  • Must balance coverage, QC and cost per sample to
    design the most powerful study you can
  • Costs, products, clustering, qc and analysis
    methods are changing rapidly
  • What is best will change

55
www.cidr.jhmi.edu
56
The end