SNP and Variation - PowerPoint PPT Presentation

SNP and Variation Ka-Lok Ng Asia University – PowerPoint PPT presentation

Slides: 32
Provided by: edut1550


Transcript and Presenter's Notes

Title: SNP and Variation

SNP and Variation
  • Ka-Lok Ng
  • Asia University

  • http//
  • http//

  • Having sequenced the genome → then study the
    nature and distribution of variation between
  • Variation at the DNA level: nucleotide insertions,
    deletions, and single nucleotide polymorphisms
    (SNPs), or small nucleotide polymorphisms
  • A SNP is any site where two or more different
    nucleotides are segregating in a population.
  • A cluster of linked SNPs forms a haplotype.
  • SNPs and haplotypes are an increasingly important
    component of biological studies, which range from
    ecology and evolution to biomedicine (disease
    association studies).
  • These variations are applied to the
    characterization of population structure and
    history, and to the functional study of genes.
  • They are indispensable for recombination mapping
    (linkage analysis) and are used as positional
    markers for physical mapping.
  • SNPs are the most common genetic variations; they
    occur once every 100 to 300 bases.

The Nature of Single Nucleotide Polymorphisms
  • Classification of SNPs
  • Most common: a change from one base to another
  • This could either be transversions or transitions
  • Could also be insertions and deletions, also
    termed indels
  • Some geneticists see two-nucleotide changes and
    small insertions/deletions of a few nucleotides
    as SNPs, therefore simple-nucleotide
    polymorphism may be a better description
  • Microsatellites, longer sequence repeats, and any
    other molecular polymorphism (transposable
    element insertions, deletions, chromosome
    inversions and translocations, and aneuploidy)
    are not regarded as SNPs
  • Aneuploidy is an error in cell division that
    results in the "daughter" cells having the wrong
    number of chromosomes. In some cases a chromosome
    is missing, while in others there is an extra one.

Classification of SNPs
  • SNPs are classified by the nature of the affected
    nucleotide
  • Noncoding SNP: 5' or 3' nontranscribed region
    (NTR), 5' or 3' untranslated region (UTR),
    intron, or intergenic

3.1 (Part 1) Human promoter SNPs that affect
gene expression
  • Coding SNP: replacement polymorphisms (change
    the amino acid encoded) or synonymous
    polymorphisms (change the codon but not the amino
    acid)
  • Nonreplacement polymorphisms include both
    synonymous and noncoding polymorphisms, but
    can still affect gene function through an
    effect on transcriptional or translational
    regulation, splicing, or RNA stability.
  • This type of polymorphism is important in
    increasing genetic variation (Fig. 3.1).
  • Fig. 3.1: a collection of over 140 human
    promoter SNPs that have been associated with an
    effect on gene expression or TF binding and, in
    many cases, a clinical outcome

Fig. 3.1. Human promoter SNPs that affect gene
expression. These loci, which are dispersed
throughout the human genome, are ones for which a
SNP has been implicated in modulation of
transcript levels, either by statistical
association or by a biochemical assay in cell
lines. The figure shows where some of these
nonreplacement polymorphisms lie.
3.1 (Part 2) Human promoter SNPs that affect
gene expression

  • SNPs can also be classified as transitions or
    transversions
  • Transitions change a purine to a purine (A ↔ G)
    or a pyrimidine to a pyrimidine (C ↔ T)
  • Transversions change a purine to a pyrimidine
    or vice versa (A or G ↔ C or T)
  • Transitions occur at least as frequently as
    transversions and are actually more prevalent,
    despite transversions having twice as many
    possible changes
  • This holds broadly true for both coding and
    noncoding SNPs
  • This is in part a result of differences in the
    mechanisms by which certain types of mutations
    arise and are repaired
  • Due to the nature of the genetic code,
    transitions are less likely to affect amino acids
    than transversions.
  • This means transitions are thought to have a
    higher probability of retaining the proper coding
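
The transition/transversion rule above can be written as a small check. This is only an illustrative sketch: the function names `classify_substitution` and `ti_tv_ratio` are made up for this example and do not come from the slides. It also enumerates every possible single-base change to show why transversions have twice as many possible changes.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in BASES or alt not in BASES:
        raise ValueError("expected two different bases from A, C, G, T")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) = transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def ti_tv_ratio(snps):
    """Ratio of transitions to transversions for a list of (ref, alt) pairs."""
    kinds = [classify_substitution(r, a) for r, a in snps]
    ti = kinds.count("transition")
    tv = kinds.count("transversion")
    return ti / tv if tv else float("inf")


# Enumerate all 12 directed single-base changes.
all_changes = [(r, a) for r in "ACGT" for a in "ACGT" if r != a]
n_ti = sum(classify_substitution(r, a) == "transition" for r, a in all_changes)
n_tv = len(all_changes) - n_ti
print(n_ti, n_tv)  # 4 possible transitions, 8 possible transversions
```

If all changes were equally likely, the ratio would be 4/8 = 0.5; an observed ratio above 1 therefore reflects a real excess of transitions.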

number of transitions / number of transversions > 1
in coding regions
  • Synonymous
  • TGT → TGC results in Cys → Cys
  • Nonsynonymous: replacement
  • TGT → TGG results in Cys → Trp
  • can be conservative or nonconservative
  • Nonsynonymous: nonsense mutation, introduction of
    a stop codon
  • TGT → TGA results in Cys → stop
  • Nonsynonymous: read-through mutation
  • TAA → TTA results in stop → Leu
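
The codon examples above can be checked against the standard genetic code. The sketch below builds the standard codon table (NCBI translation table 1) and labels a codon change with the four categories used on this slide. The function name `classify_codon_change` and the label strings are illustrative choices, not part of the original slides.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change using the categories from the slide."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"       # a stop codon is introduced
    if ref_aa == "*":
        return "read-through"   # a stop codon is lost
    return "replacement"        # a different amino acid is encoded


for ref, alt in [("TGT", "TGC"), ("TGT", "TGG"), ("TGT", "TGA"), ("TAA", "TTA")]:
    print(ref, "->", alt, ":",
          CODON_TABLE[ref], "->", CODON_TABLE[alt],
          classify_codon_change(ref, alt))
```

Amino acids are printed as one-letter codes, with `*` standing for a stop codon.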

  • SNP and disease
  • Sickle-cell anaemia: a disease caused by a
    specific SNP. An A→T mutation (GTGAG → GTGTG) in
    the β-globin gene changes Glu → Val, creating a
    sticky surface on the haemoglobin molecule that
    leads to polymerization of the deoxy form
  • SNP and blood groups: A, B and O alleles
  • The A and B alleles differ by four SNP substitutions
  • They code for related enzymes that add different
    saccharide (sugar, general formula (CH2O)n) units
    to an antigen on the surface of red blood cells
  • Allele   Sequence
  • A        .gctggtgacccctt
  • B        .gctcgtcaccgcta
  • O        .cgtggt-acccctt
  • The O allele has undergone a mutation causing
    a phase shift (frameshift), and produces no enzyme.
  • The red blood cells (rbc) of type O contain
    neither the A nor the B antigen. This is why
    people with type O blood are universal donors
    in blood transfusion.
  • The loss of activity of the protein does not
    seem to carry any adverse consequences.
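
The four substitutions between the A and B allele fragments can be located by a position-by-position comparison. This minimal sketch uses the two fragments exactly as printed on the slide, without the leading dot. The helper name `substitution_sites` is illustrative, and the O allele is left out because its gap makes it an indel rather than a substitution.

```python
ALLELE_A = "gctggtgacccctt"
ALLELE_B = "gctcgtcaccgcta"


def substitution_sites(seq1: str, seq2: str):
    """Return (1-based position, base1, base2) for every mismatching site."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq1, seq2))
        if a != b
    ]


diffs = substitution_sites(ALLELE_A, ALLELE_B)
print(len(diffs))  # 4
for pos, a, b in diffs:
    print(f"position {pos}: A allele {a}, B allele {b}")
```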

The ABO antigens are terminal sugars found at the
end of long sugar chains (oligosaccharides) that
are attached to lipids on the red cell membrane.
The A and B antigens are the last sugar added to
the chain. The "O" antigen is the absence of the A
and B antigens; type O cells do, however, carry the
largest amount of the next-to-last sugar, which is
called the H antigen.
  • In classical population genetic theory, genetic
    loci are only regarded as polymorphic if the
    frequency of the most common allele is < 95%,
    that is, the other alleles together exceed 5%
  • Most SNPs are first detected in a sample of fewer
    than 10 individuals, so the frequency criterion
    is not applied; all single nucleotide changes are
    described initially as candidate SNPs.
  • NCBI dbSNP http//
  • Seattle SNP http//

  • From Fig. 3.1 → chromosome 1, FY, and do an NCBI
    search
  • NCBI → SNP → keyword → FY AND homo → refSNP ID

  • Comment: polymorphisms vs. mutations
  • Confusion arises over the distinction between
    polymorphisms and mutations, largely due to dual
    usage of the term mutation.
  • All SNPs arise as mutations, in the sense that
    the conversion of one nucleotide into another is
    a mutational event. But by the time a seq.
    variant is observed in a population, the event
    that created it is usually long past, so the
    observed SNP is no longer a mutation; it is just
    a rare seq. variant or a polymorphism.
  • Since the distinction only applies to a small
    fraction of all SNPs, the term polymorphism
    is the more general one.

Distribution of SNPs
  • The distribution of SNPs lies within the domain of
    population genetics
  • The study of the relationship between SNPs and
    phenotypic variation lies in the domain of
    quantitative genetics
  • Application of SNPs → quantitative trait loci
    (QTL), which are loci that contribute to
    polygenic phenotypic variation
  • Neutral theory of molecular evolution
  • Balance between mutation and genetic drift
  • The rate at which mutations are introduced into a
    population is balanced by the rate at which
    polymorphisms are lost
  • Most mutations, whether deleterious, advantageous
    or neutral in effect, are lost within a few
    generations
  • The effect of selection acts to reduce the
    frequency of slightly deleterious alleles, but on
    occasion tends to favor a new allele (positive
    selection) or maintain two or more polymorphisms
    (balancing selection) at some loci

  • Three key concepts are important in
    characterizing SNP variation
  • Allele frequency distribution
  • Linkage disequilibrium
  • Population stratification
  • Aspects of frequency distribution
  • Population structure: for example, a SNP can be
    more frequent in one population than in another.
    As migration is a potent source of diversity,
    isolation affects the rate at which variation is
    lost (i.e. no variation remains) due to drift.
  • Nucleotide diversity: the average fraction of
    nucleotides that differ between a pair of alleles
    chosen at random from a population
  • Homo sapiens: lower nucleotide diversity, with an
    average of one SNP every kbp between the
    chromosomes of any two individuals
  • Fly and maize: an order of magnitude greater
    polymorphism, with one SNP every 50-100 bp
  • Linkage Disequilibrium and Haplotype Maps
  • Linkage disequilibrium (LD): non-random
    association of alleles
  • LD allows mapping of disease loci in large
  • In humans, LD is commonly observed over several
    tens of kbp, and in many cases 100 kbp, on either
    side of a SNP
  • LD has an effect on haplotypes, which display a
    clustered distribution
  • Broad approximation: the genome consists of tens
    of thousands of blocks
  •     Each block: up to 100,000 bases, with
        3-5 common haplotypes
  •     Each haplotype: tens or hundreds of SNPs in
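
The definition of nucleotide diversity given above (the average fraction of nucleotides that differ between a pair of alleles chosen at random) can be computed directly from aligned sequences. The sketch below is illustrative only: the four toy sequences are invented for the example and the function name `nucleotide_diversity` is not from the slides.

```python
from itertools import combinations


def nucleotide_diversity(alleles):
    """Average per-site fraction of differences over all pairs of alleles."""
    if len(alleles) < 2:
        raise ValueError("need at least two alleles")
    length = len(alleles[0])
    if any(len(seq) != length for seq in alleles):
        raise ValueError("alleles must be aligned to the same length")
    pairs = list(combinations(alleles, 2))
    # Count mismatching sites for every unordered pair of alleles.
    total_differences = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_differences / (len(pairs) * length)


# Hypothetical sample of four aligned alleles, 10 bases each.
sample = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTACGAAC",
    "ACCTACGAAC",
]
print(f"pi = {nucleotide_diversity(sample):.4f}")
```

On this scale, the human figure quoted above (about one SNP per kbp between two chromosomes) corresponds to a diversity of roughly 0.001, and the fly and maize figure (one SNP per 50-100 bp) to roughly 0.01-0.02.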

3.2 (Part 1) Nucleotide diversity in natural
populations
  • Fig. 3.2 Nucleotide diversity in natural
    populations. (A) Observed and expected SNP
    frequencies for 874 SNPs from 75 candidate human
    hypertension loci. Rare alleles are the most
    frequent, and the number of SNPs in each
    frequency class declines as the rarer allele
    becomes more common.

In a sample of several hundred alleles, the
most common class of SNPs is singletons (which
appear only once in the sample), followed by
doubletons, tripletons, and so on. Only between
1/3 and 1/2 of all SNPs are common in the sense
that the rarer allele is present in more than
5% of the individuals.
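
The counting of singletons, doubletons and so on can be sketched as a frequency spectrum. Everything in this example is hypothetical: the six toy SNP sites (each string lists the allele seen on each of 40 sampled chromosomes) and the function names are made up for illustration. The 5% cut-off is applied here to sampled chromosomes rather than to individuals.

```python
from collections import Counter


def minor_allele_counts(sites):
    """For each biallelic site, count how often the rarer allele is seen."""
    counts = []
    for column in sites:
        tally = Counter(column)
        if len(tally) != 2:
            raise ValueError("each site must carry exactly two alleles")
        counts.append(min(tally.values()))
    return counts


def frequency_spectrum(sites):
    """Map minor-allele count (1 = singleton, 2 = doubleton, ...) to number of SNPs."""
    return Counter(minor_allele_counts(sites))


def fraction_common(sites, threshold=0.05):
    """Fraction of SNPs whose rarer allele exceeds the frequency threshold."""
    n_chromosomes = len(sites[0])
    counts = minor_allele_counts(sites)
    return sum(c / n_chromosomes > threshold for c in counts) / len(counts)


# Hypothetical data: 6 SNP sites observed on 40 chromosomes.
toy_sites = [
    "A" * 39 + "G",        # singleton
    "C" * 39 + "T",        # singleton
    "G" * 39 + "A",        # singleton
    "T" * 38 + "C" * 2,    # doubleton
    "A" * 38 + "G" * 2,    # doubleton
    "C" * 30 + "T" * 10,   # common SNP (minor allele at 25%)
]
spectrum = frequency_spectrum(toy_sites)
for count in sorted(spectrum):
    print(f"minor allele seen {count}x: {spectrum[count]} SNP(s)")
print("fraction common:", fraction_common(toy_sites))
```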
3.2 (Part 2) Nucleotide diversity in natural
populations
  • (B) LD (D) decays with time (number of
    generations) in proportion to the recombination
    rate r.
  • (C) The level of nucleotide diversity is a
    function of recombination rate, and hence
    chromosomal position, as in this example for

(B) As the number of generations ↑, SNPs segregate
more independently (no more clustering) → LD ↓
(C) As r ↑, nucleotide diversity ↑
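
Panel (B) can be illustrated with the standard textbook model of LD decay, D_t = D_0 (1 - r)^t, in which a fraction r of the remaining disequilibrium is lost each generation. The slides do not give this formula, so it is an assumption here, as are the function names and the starting value D_0 = 0.25.

```python
import math


def ld_after(d0: float, r: float, generations: int) -> float:
    """LD coefficient D after a number of generations: D_t = D_0 * (1 - r)**t."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination rate r must lie between 0 and 0.5")
    return d0 * (1.0 - r) ** generations


def generations_to_halve(r: float) -> float:
    """Number of generations needed for D to fall to half its initial value."""
    return math.log(0.5) / math.log(1.0 - r)


# Compare a tightly linked pair of SNPs (r = 0.001) with a looser pair (r = 0.01).
for r in (0.001, 0.01):
    values = [round(ld_after(0.25, r, t), 4) for t in (0, 10, 100, 1000)]
    print(f"r = {r}: D at t = 0, 10, 100, 1000 ->", values)
```

The larger the recombination rate between two SNPs, the faster their association disappears. Closely spaced SNPs therefore stay in LD for many more generations.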
  • NCBI dbSNP http//

dbSNP accepts submissions for SNPs, microsatellite
repeats, and small-scale deletions and insertions.

dbSNP summary for various species
  • dbSNP
  • Submitted data
  • The submitter HANDLE is a short tag that uniquely
    defines each submitting laboratory in the
  • Each submitted SNP record gets a unique ssSNP
    identifier, such as ss4923558 (HANDLE: YUSUKE)
  • Keyword: ss4923558 AND homo
  • The keyword ss4923558 alone returns multiple
    records: more than 11 rsSNP records
  • More than one submitter → more than one ssSNP →
    these ssSNPs are clustered into a reference SNP
    identifier → rsSNP

  • dbSNP

Alleles: A/G; Ancestral allele: G; Handle: YUSUKE
  • Gene View of SNP

  • Go to the bottom of the page
  • JBIC: sample size 1270, allele frequencies of A
    and G
  • Other populations have a smaller sample size

  • Click NCBI Assay ID → ss4923558 →
  • Japanese Millennium Genome Project
  • Measured in a group of East Asian DNA samples
  • There is no individual genotype data for ss4923558
  • Click Handle|Submitter ID →
    YUSUKE|IMS-JST082810 →
  • Allele frequency
  • G 0.8929
  • A 0.1071
  • Sample Size 1270 (number of chromosomes)
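
As a quick check on the figures above, the reported frequencies can be converted back into allele counts on the 1270 sampled chromosomes and tested against the classical criterion that the most common allele is below 95%. This is a small sketch; the function names are illustrative and not part of dbSNP.

```python
def allele_counts(frequencies, n_chromosomes):
    """Convert allele frequencies into whole-number chromosome counts."""
    return {
        allele: round(freq * n_chromosomes)
        for allele, freq in frequencies.items()
    }


def is_polymorphic(frequencies, max_major=0.95):
    """Classical criterion: the most common allele is below 95%."""
    return max(frequencies.values()) < max_major


# Frequencies and sample size as reported on the slide for ss4923558.
freqs = {"G": 0.8929, "A": 0.1071}
counts = allele_counts(freqs, 1270)
print(counts)
print("counts add up to sample size:", sum(counts.values()) == 1270)
print("polymorphic by the 95% criterion:", is_polymorphic(freqs))
```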

  • Entrez SNP search terms
  • http//

  • SNP integration in Genome Browsers
  • Ensembl http//
  • rs3737559
  • SNP rs3737559 is located in the following
  • Genotype and Allele frequencies per population

  • The local DNA seq. within 100 kb on either side
    of the SNP is shown.

The different types of SNPs are color coded as to
type (e.g. coding, intronic, flanking or other).
Deletion and insertion polymorphisms are
indicated with a triangle. The letters (K, M, R,
S, W, Y) inside the SNP squares indicate the type
of SNP using IUPAC ambiguity codes.
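
The six letters named above are the IUPAC ambiguity codes that stand for exactly two bases, so each one identifies the two alleles of a biallelic SNP. The lookup below is a minimal sketch of how such a letter can be decoded; the dictionary and function names are illustrative and not Ensembl's.

```python
# IUPAC ambiguity codes for two-base combinations.
IUPAC_TWO_BASE = {
    "R": "AG",  # purine
    "Y": "CT",  # pyrimidine
    "S": "CG",
    "W": "AT",
    "K": "GT",
    "M": "AC",
}
PURINES = {"A", "G"}


def decode_snp(code: str):
    """Return the two alleles and the substitution class for an IUPAC code."""
    first, second = IUPAC_TWO_BASE[code.upper()]
    same_class = (first in PURINES) == (second in PURINES)
    kind = "transition" if same_class else "transversion"
    return f"{first}/{second}", kind


for code in "KMRSWY":
    alleles, kind = decode_snp(code)
    print(code, alleles, kind)
```

Only R (A/G) and Y (C/T) denote transitions; the other four codes denote transversions.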
  • UCSC Genome browser http//
  • BRCA1 gene

  • NCBI Entrez Gene
  • Gene: BRCA1
  • http//

SNP GeneView
The coding SNPs in the BRCA1 gene. Those that do
not change the amino acid are colored green; those
that result in a different amino acid are colored red.

SNP association studies
  • Association studies
  • A case group of people vs. a control group of
    people
  • The case group: people who are diagnosed with some
    disease (e.g. cystic fibrosis), react to some type
    of medicine, or are even especially healthy (e.g.
    more than 100 years old)
  • The control group: people who do not exhibit
    the feature selected for the case group.
  • For case-control studies, a selection of SNPs is
    genotyped in both the case and control groups
  • Allele frequency (case group) > allele frequency
    (control group) → potential markers for the
    observed phenotype
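
One common way to judge whether an allele is really more frequent in cases than in controls is a chi-square test on the 2x2 table of allele counts. The slides do not name a specific test, so the choice of test is an assumption, and the counts below are invented purely for illustration.

```python
def allele_frequency(alt_count: int, ref_count: int) -> float:
    """Frequency of the alternative allele among all sampled chromosomes."""
    return alt_count / (alt_count + ref_count)


def chi_square_2x2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square statistic (1 degree of freedom, no continuity correction)."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical allele counts: 200 chromosomes in each group.
case_alt, case_ref = 60, 140
ctrl_alt, ctrl_ref = 30, 170

print("case frequency:   ", allele_frequency(case_alt, case_ref))
print("control frequency:", allele_frequency(ctrl_alt, ctrl_ref))
chi2 = chi_square_2x2(case_alt, case_ref, ctrl_alt, ctrl_ref)
# 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
print(f"chi-square = {chi2:.3f}, exceeds 3.841: {chi2 > 3.841}")
```

A real study genotypes many SNPs at once, so the threshold would need correcting for multiple testing before any SNP is called a marker.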

SNP and disease
  • Functional variation: a SNP may be associated
    with a nonsynonymous substitution in a coding