Title: Polymorphism discovery informatics
1. Polymorphism discovery informatics
Gabor T. Marth
Department of Biology, Boston College, Chestnut Hill, MA 02467
2. Types of sequence variations
- Substitution-type single-nucleotide polymorphisms (SNPs) are the most
abundant form of sequence variation
3. Are all substitutions SNPs?
4. What is SNP discovery?
- comparative analysis of multiple sequences from
the same region of the genome (redundant sequence
coverage)
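
The comparison can be sketched in a few lines of Python. This is a minimal illustration, not a method from the slides: it assumes the redundant sequences have already been aligned to equal length with "-" for gaps, and the function name and the `min_minor_count` threshold are hypothetical choices made for the example.

```python
from collections import Counter

def candidate_snp_columns(aligned_reads, min_minor_count=2):
    """Return (position, allele_counts) for alignment columns where at
    least two different bases are each seen often enough.

    Assumes all reads are pre-aligned to the same length; gaps ("-") and
    any non-ACGT characters are ignored.
    """
    candidates = []
    # zip(*reads) walks the alignment column by column
    for pos, column in enumerate(zip(*aligned_reads)):
        counts = Counter(b.upper() for b in column if b.upper() in "ACGT")
        ranked = counts.most_common()  # most frequent allele first
        # require a second allele supported by min_minor_count reads
        if len(ranked) >= 2 and ranked[1][1] >= min_minor_count:
            candidates.append((pos, dict(counts)))
    return candidates

reads = [
    "ACGTTACG",
    "ACGTTACG",
    "ACGCTACG",
    "ACGCTA-G",
]
print(candidate_snp_columns(reads))
```

Counting alone cannot separate true polymorphisms from sequencing errors; it only shows why redundant coverage of the same region is the prerequisite for discovery.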
5. Steps of SNP discovery
6. SNP discovery in diverse sequences
- many different types of sequences are available
for polymorphism discovery
  - genome
  - EST (expressed sequence tag)
  - WGS (whole-genome shotgun)
  - BAC (bacterial artificial chromosome)
  - BAC-end
  - restriction fragments
- early methods of SNP discovery focused on
specific sequence types
7. General SNP mining: PolyBayes
- sequence clustering simplifies to a database search against the genome
reference
- multiple alignment by anchoring fragments to the genome reference
- paralog filtering by counting mismatches weighted by quality values
- SNP detection by differentiating true polymorphisms from sequencing
errors using quality values
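
The last two steps can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not the PolyBayes implementation: it assumes Phred-scaled quality values (error probability 10^(-Q/10)), a two-allele model, equal allele frequencies, and a hypothetical prior polymorphism rate of 0.001. All function names are invented for the example.

```python
def phred_to_error(q):
    """Convert a Phred quality value to a base-call error probability."""
    return 10 ** (-q / 10)

def weighted_mismatch_rate(read, ref, quals):
    """Mismatches per aligned base, each weighted by the probability that
    the base call is correct, so low-quality mismatches count for little.
    A high rate hints that the read may come from a paralogous region."""
    weight = 0.0
    for r, g, q in zip(read, ref, quals):
        if r != g:
            weight += 1.0 - phred_to_error(q)
    return weight / len(read)

def base_likelihood(observed, true_base, err):
    """P(observed base | true base), with errors spread over 3 other bases."""
    return 1.0 - err if observed == true_base else err / 3.0

def snp_posterior(column, prior=0.001):
    """P(site is polymorphic | column) under the toy two-allele model.

    column: list of (base, phred_quality) pairs from one alignment column.
    prior:  assumed prior probability that a site is polymorphic.
    """
    counts = {}
    for base, _ in column:
        counts[base] = counts.get(base, 0) + 1
    ranked = sorted(counts, key=counts.get, reverse=True)
    if len(ranked) < 2:
        return 0.0  # shortcut: no second allele observed
    major, minor = ranked[0], ranked[1]
    like_mono = 1.0  # every read truly carries the major allele
    like_poly = 1.0  # each read carries major or minor with probability 1/2
    for base, q in column:
        err = phred_to_error(q)
        like_mono *= base_likelihood(base, major, err)
        like_poly *= 0.5 * (base_likelihood(base, major, err)
                            + base_likelihood(base, minor, err))
    evidence = prior * like_poly + (1.0 - prior) * like_mono
    return prior * like_poly / evidence

high_quality = [("A", 40)] * 3 + [("G", 40)] * 3
low_quality = [("A", 40)] * 5 + [("G", 5)]
print(snp_posterior(high_quality), snp_posterior(low_quality))
```

In this model a discrepancy backed by several high-quality base calls yields a posterior near 1, while a single low-quality discrepancy is explained away as sequencing error.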
8. SNP validation
9. Genome-scale SNP mining projects
- Overlaps of large-insert clone sequences
10. SNP genotyping
- SNP discovery: which nucleotides in the genome are polymorphic?
[Figure: example sequence fragments "aacgtttatgtgattccagtaaa" and "tacggca" with variant bases a/g and c/t marked]
- SNP genotyping: which alleles does an individual carry at a nucleotide
locus that is known to be polymorphic?
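
The genotyping question can be sketched as a small likelihood comparison in Python. This is an illustrative example, not a method from the slides: it assumes a diploid individual, a known biallelic site, sequence reads with Phred-scaled qualities as the evidence, and independent base-call errors. The function name is hypothetical.

```python
import math

def call_genotype(observations, ref, alt):
    """Return the most likely diploid genotype at a known biallelic SNP.

    observations: list of (base, phred_quality) pairs from one individual.
    ref, alt:     the two known alleles at the locus.
    """
    # For each genotype, the probability that a read samples ref or alt.
    genotypes = {
        ref + ref: (1.0, 0.0),
        ref + alt: (0.5, 0.5),
        alt + alt: (0.0, 1.0),
    }
    best_name, best_loglik = None, -math.inf
    for name, (p_ref, p_alt) in genotypes.items():
        loglik = 0.0
        for base, q in observations:
            # Cap the error at 0.75 so a quality of 0 is uninformative
            # rather than producing log(0).
            err = min(10 ** (-q / 10), 0.75)
            l_ref = 1.0 - err if base == ref else err / 3.0
            l_alt = 1.0 - err if base == alt else err / 3.0
            loglik += math.log(p_ref * l_ref + p_alt * l_alt)
        if loglik > best_loglik:
            best_name, best_loglik = name, loglik
    return best_name

reads_at_site = [("C", 30)] * 4 + [("T", 30)] * 4
print(call_genotype(reads_at_site, ref="C", alt="T"))
```

Unlike discovery, the alleles are fixed in advance here, so the only unknown is which combination of them the individual carries.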
11. Genotyping by sequence
12. Genome variation landscape
- nucleotide diversity on human chromosomes
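
Nucleotide diversity is commonly defined as the average number of differences per site between two sequences drawn from a sample. The Python sketch below computes it that way. It is an illustration only: it assumes equal-length, gap-free aligned sequences, and the function name is hypothetical.

```python
from itertools import combinations

def nucleotide_diversity(sequences):
    """Average pairwise differences per site across all sequence pairs.

    Assumes at least two aligned sequences of equal length with no gaps.
    """
    length = len(sequences[0])
    total_differences = 0
    pair_count = 0
    for a, b in combinations(sequences, 2):
        total_differences += sum(1 for x, y in zip(a, b) if x != y)
        pair_count += 1
    return total_differences / (pair_count * length)

sample = ["ACGT", "ACGT", "ACGA"]
print(nucleotide_diversity(sample))
```

Computed in windows along a chromosome, the same quantity gives a landscape of how variation changes from region to region.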