Title: SNP Resources: Finding SNPs
Slide 1: SNP Resources: Finding SNPs, Databases and Data Extraction
Mark J. Rieder, PhD
SeattleSNPs Variation Workshop, March 20-21, 2006
Slide 2: Genotype - Phenotype Studies
Typical approach: "I have a candidate gene/region and samples ready to study. Tell me what SNPs to genotype."
Other questions:
- How do I know I have all the SNPs?
- What is the validation/quality of the SNPs that are known?
- Are these SNPs informative in my population/sample?
- What do I need to know for selecting the best SNPs?
- How do I pick the best SNPs?
What information do I need to characterize a SNP for genotyping?
Slide 3: Minimal SNP information for genotyping/characterization
- What is the SNP? Flanking sequence and alleles, in FASTA format:
  >snp_name
  ACCGAGTAGCCAG
  A/G
  ACTGGGATAGAAC
- dbSNP reference SNP (rs#)
- Where is the SNP mapped? Exon, promoter, UTR, etc. A picture of the gene with the SNPs mapped to the gene structure.
- How was it discovered? Method.
- What assurances do you have that it is real? Validated how?
- What population: African, European, etc.?
- What is the allele frequency of each SNP? Common (>10%) or rare?
- Are other SNPs associated (redundant)? Genotyping data!
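The flanking-sequence record on this slide (a `>` header, the 5' flank, the alleles, the 3' flank) can be read with a few lines of code. The following is a minimal sketch, not a standard parser: the four-line layout is taken from the slide, while the `SnpRecord` class, the `parse_snp_record` function, and the bracketed `FLANK[A/G]FLANK` output style are assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class SnpRecord:
    """One SNP with its flanking sequence (hypothetical container)."""
    name: str
    left_flank: str
    alleles: tuple
    right_flank: str

    def bracketed(self) -> str:
        # Join flanks and alleles into a single FLANK[A/G]FLANK string.
        return f"{self.left_flank}[{'/'.join(self.alleles)}]{self.right_flank}"


def parse_snp_record(text: str) -> SnpRecord:
    """Parse the four-line layout: header, 5' flank, alleles, 3' flank."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) != 4 or not lines[0].startswith(">"):
        raise ValueError("expected a '>' header, 5' flank, alleles, 3' flank")
    name = lines[0][1:]
    alleles = tuple(lines[2].upper().split("/"))
    return SnpRecord(name, lines[1].upper(), alleles, lines[3].upper())


record = parse_snp_record(""">snp_name
ACCGAGTAGCCAG
A/G
ACTGGGATAGAAC
""")
print(record.bracketed())  # ACCGAGTAGCCAG[A/G]ACTGGGATAGAAC
```

Keeping the flanks and alleles as separate fields makes it easy to check flank length or allele count before submitting a SNP to an assay design tool.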
Slide 4: Finding SNPs: Databases and Extraction
How do I find and download SNP data for analysis/genotyping?
1. Entrez Gene - dbSNP - Entrez SNP
2. HapMap Genome Browser
3. SeattleSNPs PGA Candidate gene website
4. Web applications and other tools: NIEHS, PolyPhen, ECR Browser
Slide 5: NCBI - Database Resource
IL1B
www.ncbi.nlm.nih.gov
Slide 6: Finding SNPs: Where do I start?
Slide 7: Finding SNPs: Where do I start?
NCBI - Entrez Gene (LocusLink replacement)
Slide 8: Finding SNPs: Entrez Gene
Slide 9: dbSNP Geneview
Slide 10: dbSNP Geneview
Slide 11: Finding SNPs: dbSNP validation
Slide 13: Finding SNPs: dbSNP database
Slide 14: Entrez SNP - dbSNP genotype retrieval
Slide 15: Finding SNPs - Gene Genotype Report
Slide 16: Graphic display of genotype data - Visual Genotype
Slide 17: Finding SNPs - Gene Genotype Report
Slide 18: Finding SNPs - Gene Genotype Report
Slide 19: Minimal SNP information for genotyping/characterization
dbSNP - data is there
Slide 20: Entrez Gene Entry - Entrez SNP
Slide 21: Entrez SNP - direct dbSNP querying
Slide 23: Entrez SNP - Parseable Multi-SNP reports
Slide 24: Entrez SNP - Parseable Multi-SNP reports
Slide 25: Entrez SNP - Search Limiting Capabilities
IL1B
Slide 26: Entrez SNP - Search Limits
Slide 27: Entrez SNP - Search Limiting Capabilities
Slide 29: Entrez SNP - More Limit Searching
Slide 30: Entrez SNP - More Limit Searching
Slide 31: Entrez SNP - Query Term Capabilities
Slide 32: Entrez SNP - Search Terms Fields
Slide 33: Entrez SNP - Search Terms Fields
More advanced queries:
2[CHR] AND "coding nonsynon"[FUNC]
Slide 35: Entrez SNP - Search Terms Fields
More advanced queries:
2[CHR] AND "coding nonsynonymous"[FUNC] AND "PGA-UW-FHCRC"[HANDLE]
Note: Can also use wildcard (*) characters and the AND, OR, and NOT operators.
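The field-tagged query on this slide combines a value and a bracketed field tag per term, joined with Boolean operators. A minimal sketch of assembling such a query string in code follows. The tags CHR, FUNC, and HANDLE and the example values come from the slide; the helper names `field` and `all_of` are hypothetical. The URL uses the NCBI E-utilities `esearch` endpoint with `db=snp`, and it is an assumption that the current service still accepts these 2006-era tags unchanged. Nothing is sent over the network here.

```python
from urllib.parse import urlencode


def field(value, tag):
    """Format one search term as value[TAG], quoting multi-word or hyphenated values."""
    text = str(value)
    if " " in text or "-" in text:
        text = f'"{text}"'
    return f"{text}[{tag}]"


def all_of(*terms):
    """Join terms with AND (OR and NOT could be handled the same way)."""
    return " AND ".join(terms)


query = all_of(
    field(2, "CHR"),                       # chromosome 2
    field("coding nonsynonymous", "FUNC"), # function class, as written on the slide
    field("PGA-UW-FHCRC", "HANDLE"),       # submitter handle
)
print(query)

# Build (but do not send) an E-utilities search URL for the SNP database.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
url = ESEARCH + "?" + urlencode({"db": "snp", "term": query, "retmax": 20})
print(url)
```

Building the query from small pieces keeps each limit (chromosome, function class, submitter) easy to switch on or off when exploring what a search returns.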
Slide 36: Entrez SNP - Advanced Queries
Slide 37: Minimal SNP information for genotyping/characterization
Entrez SNP - better!
Slide 38: Finding SNPs - Entrez SNP Summary
- dbSNP is useful for investigating detailed information on a small number of SNPs, and it's good for a picture of the gene
- Entrez SNP is a direct, fast database for querying SNP data
- Data from Entrez SNP can be retrieved in batches for many SNPs
- Entrez SNP data can be limited to specific subsets of SNPs and formatted in plain text for easy parsing and manipulation
- More detailed queries can be formed using specific field tags for retrieving SNP data
Slide 39: Finding SNPs: Databases and Extraction
Slide 40: www.hapmap.org
Slide 41: Finding SNPs: HapMap Browser
Slide 42: Finding SNPs: HapMap Browser
Slide 43: Finding SNPs: HapMap Genotypes
Slide 44: Finding SNPs: HapMap Browser
Slide 45: Minimal SNP information for genotyping/characterization
Slide 46: Finding SNPs: HapMap Browser
- HapMap data sets are useful because individual genotype data can be used to determine optimal genotyping strategies (tagSNPs) or perform population genetic analyses (linkage disequilibrium)
- Data are specific to what the project produced (not all of dbSNP)
- HapMap data is available in dbSNP
- HapMap data (Phase II) can be accessed as a prerelease, before it appears in dbSNP
- Easier visualization of data and direct access to SNP data, individual genotypes, and LD analysis
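To make the linkage disequilibrium point concrete, here is a minimal sketch of the pairwise LD statistics D and r² for two biallelic SNPs. It assumes phased haplotypes are already available (one allele pair per chromosome); the function name `ld_stats` and the toy haplotypes are illustrative and are not HapMap data. Choosing tagSNPs typically means keeping one SNP from each group whose pairwise r² exceeds some threshold, and the threshold is a study design choice not given on the slides.

```python
def ld_stats(haplotypes):
    """Return (D, r_squared) for two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) pairs, one pair
    per phased chromosome.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    # Use the alleles on the first haplotype as the reference alleles.
    a_ref, b_ref = haplotypes[0]
    p_a = sum(1 for a, _ in haplotypes if a == a_ref) / n
    p_b = sum(1 for _, b in haplotypes if b == b_ref) / n
    p_ab = sum(1 for a, b in haplotypes if a == a_ref and b == b_ref) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("at least one SNP is monomorphic in this sample")
    return d, d * d / denom


# Two SNPs that always travel together: one of them is redundant.
linked = [("A", "C")] * 6 + [("G", "T")] * 4
# Two SNPs whose alleles occur in every combination equally often.
unlinked = [("A", "C"), ("A", "T"), ("G", "C"), ("G", "T")] * 25

print(ld_stats(linked))
print(ld_stats(unlinked))
```

In the first toy sample r² comes out at (approximately) 1, so genotyping either SNP captures the other; in the second it is 0, so both would need to be genotyped.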
Slide 47: Finding SNPs: Databases and Extraction
Slide 48: Finding SNPs: SeattleSNPs Candidate Genes
pga.gs.washington.edu
Slide 49: Finding SNPs: SeattleSNPs Candidate Genes
Slide 50: Finding SNPs: SeattleSNPs Candidate Genes
Slide 51: HapMap Compatible
Slide 52: Finding SNPs: SeattleSNPs Candidate Genes
Slide 53: Finding SNPs: SeattleSNPs Candidate Genes
Slide 58: SNP_pos <tab> Ind_ID <tab> allele1 <tab> allele2
Repeat for all individuals
Repeat for next SNP
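The tab-delimited layout on this slide (SNP position, individual ID, allele 1, allele 2, one line per individual per SNP) is straightforward to parse. The sketch below reads such lines and computes allele frequencies per SNP. The column order is taken from the slide; the function names, the sample positions, and the individual IDs are made up for illustration, and missing-data codes are assumed to be absent.

```python
import io
from collections import Counter, defaultdict


def allele_frequencies(handle):
    """Map each SNP position to {allele: frequency} from tab-delimited lines."""
    counts = defaultdict(Counter)
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        snp_pos, _ind_id, allele1, allele2 = line.split("\t")
        counts[snp_pos].update([allele1, allele2])
    return {
        pos: {allele: n / sum(c.values()) for allele, n in c.items()}
        for pos, c in counts.items()
    }


def minor_allele_frequency(freqs):
    """Frequency of the rarer allele, or 0.0 if only one allele was seen."""
    return min(freqs.values()) if len(freqs) > 1 else 0.0


# Hypothetical genotypes for four individuals at two SNP positions.
example = io.StringIO(
    "1201\tIND01\tA\tA\n"
    "1201\tIND02\tA\tG\n"
    "1201\tIND03\tA\tA\n"
    "1201\tIND04\tG\tG\n"
    "3050\tIND01\tC\tC\n"
    "3050\tIND02\tC\tT\n"
    "3050\tIND03\tC\tC\n"
    "3050\tIND04\tC\tC\n"
)
freqs = allele_frequencies(example)
for pos, by_allele in freqs.items():
    print(pos, by_allele, minor_allele_frequency(by_allele))
```

With these made-up genotypes the minor allele frequency is 0.375 at position 1201 and 0.125 at position 3050, so both would count as common under a 10% cutoff.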
Slide 61:
SIFT - Sorting Intolerant From Tolerant: evolutionary comparison of non-synonymous SNPs
PolyPhen - Polymorphism Phenotyping: structural protein characteristics and evolutionary comparison
Slide 62: PolyPhen: Polymorphism Phenotyping - prediction of functional effect of human nsSNPs
- Physical and comparative analyses used to make predictions
- Uses SwissProt annotations to identify known domains
- Calculates a substitution probability from BLAST alignments of homologous and orthologous sequences
- Ranks substitutions on a scale of predicted functional effects from benign to probably damaging
http://tux.embl-heidelberg.de/ramensky/
Slide 63: PolyPhen: Polymorphism Phenotyping - prediction of functional effect of human nsSNPs
tux.embl-heidelberg.de/ramensky/
Slide 64: Finding SNPs: SeattleSNPs Candidate Genes
Slide 65: Finding SNPs: SeattleSNPs Candidate Genes
pga.gs.washington.edu
Slide 66: Finding SNPs: SeattleSNPs Candidate Genes
Slide 67: Finding SNPs: NIEHS SNPs Candidate Genes
egp.gs.washington.edu
Slide 69: ECR Browser: Evolutionary Conserved Regions
- Aligns sequences to Mouse, Rat, Dog, Opossum, Chicken, Fugu and Drosophila
- Gene annotations from UCSC Genome Browser
- Easy retrieval of ECR sequences and alignments
- Pre-computed transcription factor binding sites
http://ecrbrowser.dcode.org
Slide 70: ECR Browser: Evolutionary Conserved Regions
Slide 71: ECR Browser: Evolutionary Conserved Regions
Human-mouse alignment
FASTA sequences
Slide 72: ECR Browser: Evolutionary Conserved Regions
Transcription Factor Binding Sites from TRANSFAC
Slide 73: Finding SNPs: Databases and Extraction
Entrez SNP (www.ncbi.nlm.nih.gov/entrez)
- Direct access to dbSNP data
- Versatile and flexible querying
HapMap Browser (hapmap.org)
- Access to large scale genotype data
- Rapid/early access on HapMap website
- Browsers provide visualization and other analysis tools
SeattleSNPs (pga.gs.washington.edu)
- Candidate gene focused: inflammation, HLBS phenotypes
- Comprehensive SNP data from resequencing
- Early access, prior to dbSNP release
Other Resources
- NIEHS SNPs (egp.gs.washington.edu), PolyPhen, ECR (with TRANSFAC)