1 of 25 - PowerPoint PPT Presentation

About This Presentation
Title:

1 of 25


Slides: 26
Provided by: bert169
Category:
Tags: zebrafish


Transcript and Presenter's Notes


1
Sequence Variation in Ensembl
2
Outline
  • SNPs
  • SNPs in Ensembl
  • Linkage disequilibrium
  • SNPs in BioMart
  • DAS sources

3
Single nucleotide polymorphisms (SNPs)
  • Two human genomes differ by about 0.1%
  • Polymorphism: a DNA variation in which each
    possible sequence is present in at least 1% of
    people
  • Most polymorphisms (about 90%) take the form of SNPs:
    variations that involve just one nucleotide
  • About 1 out of every 300 bases in the human genome
  • About 10 million in the human genome
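
The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes a haploid human genome of roughly 3 billion base pairs; that size is a commonly quoted round number and is not stated on the slide, and the function names are made up for illustration.

```python
# Assumed haploid human genome size in base pairs (round figure, not from the slides).
GENOME_SIZE_BP = 3_000_000_000


def expected_snp_count(genome_size_bp: int, bases_per_snp: int) -> int:
    """Number of SNPs if one occurs on average every `bases_per_snp` bases."""
    return genome_size_bp // bases_per_snp


def expected_differences(genome_size_bp: int, fraction_different: float) -> int:
    """Number of differing bases between two genomes that differ by a given fraction."""
    return round(genome_size_bp * fraction_different)


snps = expected_snp_count(GENOME_SIZE_BP, 300)       # 1 SNP per 300 bases
diffs = expected_differences(GENOME_SIZE_BP, 0.001)  # 0.1% difference

print(f"SNPs in the genome: about {snps:,}")
print(f"Differences between two genomes: about {diffs:,}")
```

Under that assumption, one SNP per 300 bases gives about 10 million SNPs, which agrees with the last bullet, and a 0.1% difference corresponds to about 3 million differing bases between two genomes.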

4
Functional Consequences
  • SNPs in coding areas that alter aa sequence:
    cause of most monogenic disorders, e.g.
    Hemochromatosis (HFE), Cystic fibrosis (CFTR),
    Hemophilia (F8)
  • SNPs in coding areas that don't alter aa sequence:
    may affect splicing
  • SNPs in promoter or regulatory regions:
    may affect the level, location or timing of gene
    expression
  • SNPs in other regions:
    no direct known impact on phenotype, useful as
    markers

5
Practical Applications
  • Disease diagnosis
  • Association studies
  • Pharmacogenomics
  • Forensic testing
  • Population genetics and evolutionary studies
  • Marker-assisted selection

6
Practical Applications
7
SNPs in Ensembl
  • Most SNPs imported from dbSNP (rs IDs)
  • Imported data: alleles, flanking sequences,
    frequencies, ...
  • Calculated data: position, synonymous status,
    peptide shift, ...
  • For human also:
    • HGVbase
    • TSC
    • Affy GeneChip 100K and 500K Mapping Array
    • Affy Genome-Wide SNP array 6.0
    • Ensembl-called SNPs (from Celera reads and Jim
      Watson's and Craig Venter's genomes)
  • For mouse, rat, dog and chicken also:
    • Sanger- and Ensembl-called SNPs (other strains /
      breeds)

8
dbSNP
  • Central repository for simple genetic
    polymorphisms:
    • single-base nucleotide substitutions
    • small-scale multi-base deletions or insertions
    • retroposable element insertions and
      microsatellite repeat variations
  • http://www.ncbi.nlm.nih.gov/SNP/index.html
  • For human (dbSNP build 128):
    • 34,434,159 submissions (ss's)
    • 11,883,685 RefSNP clusters (rs's)
    • 6,262,709 validated
    • 737,679 with frequency

9
SNPs in Ensembl - Types
  • Non-synonymous: in coding sequence, resulting in
    an aa change
  • Synonymous: in coding sequence, not resulting in
    an aa change
  • Frameshift: in coding sequence, resulting in a
    frameshift
  • Stop lost: in coding sequence, resulting in the
    loss of a stop codon
  • Stop gained: in coding sequence, resulting in the
    gain of a stop codon
  • Essential splice site: in the first 2 or the last
    2 basepairs of an intron
  • Splice site: 1-3 bps into an exon or 3-8 bps into
    an intron
  • Upstream: within 5 kb upstream of the 5'-end of a
    transcript
  • Regulatory region: in a regulatory region annotated
    by Ensembl
  • 5' UTR: in 5' UTR
  • Intronic: in intron
  • 3' UTR: in 3' UTR
  • Downstream: within 5 kb downstream of the 3'-end
    of a transcript
  • Intergenic: more than 5 kb away from a transcript
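
The four single-base coding categories in this list (synonymous, non-synonymous, stop gained, stop lost) can be told apart by translating the codon before and after the change. The following is a minimal sketch of that idea using the standard genetic code. It is not Ensembl's implementation, and the function name `classify_coding_snp` is made up for illustration. Frameshifts and splice-site types depend on position rather than on a codon, so they are not covered.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base substitution by its effect on one codon.

    Both codons are given 5'->3' on the coding strand (DNA alphabet).
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    if ref_aa == "*":
        return "stop lost"
    return "non-synonymous"


# Illustrative codons: an arginine codon changed to a tryptophan codon (R -> W).
print(classify_coding_snp("CGG", "TGG"))
print(classify_coding_snp("CGG", "CGA"))
```

The first call changes arginine to tryptophan and is classified as non-synonymous; the second leaves arginine unchanged and is classified as synonymous.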

10
SNPs in Ensembl - Species
11
Caveat
  • For human, mouse and rat, Ensembl defines all SNP
    alleles relative to the strand of the genome
    assembly! (to be able to merge dbSNP data with
    Sanger resequencing data)
  • Exceptions:
    • Those cases where SNPs are shown as part of a
      sequence
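
To illustrate what reporting alleles relative to the assembly strand means: alleles described on the opposite strand have to be complemented (and reverse-complemented when an allele is longer than one base). The sketch below is a hypothetical helper, not an Ensembl API, and it assumes alleles are written as a "/"-separated string with "-" for a deletion.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def flip_alleles(allele_string: str) -> str:
    """Convert a '/'-separated allele string to the opposite strand.

    A '-' allele (deletion) has no bases and is left unchanged.
    """
    return "/".join(
        allele if allele == "-" else reverse_complement(allele)
        for allele in allele_string.split("/")
    )


# A SNP described as C/T on one strand reads as G/A on the other.
print(flip_alleles("C/T"))
```

The order in which a browser then lists the two alleles is a separate display convention and is not modelled here.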

12
  • A missense SNP, C1858T, in PTPN22
    (Tyrosine-protein phosphatase non-receptor type
    22) has been identified as a genetic risk factor
    for rheumatoid arthritis. This SNP is also
    referred to as R620W.
  • Find the SNPView page for this SNP.
  • Why are the alleles on this page given as A/G?
  • What is the minor allele of this SNP in
    Caucasians?

13
SNPs in Ensembl
GeneSNPView (1)
Screenshot labels: Transcript, InterPro domains, SNP alleles
14
SNPs in Ensembl
GeneSNPView (2)
15
SNPs in Ensembl
TranscriptSNPView (1)
  • Shows SNP alleles in different:
    • Individuals (human): Celera HuAA, HuCC, HuDD
      and HuFF, Craig Venter, Jim Watson
    • Strains (mouse, rat)
    • Breeds (chicken, dog)

16
SNPs in Ensembl
TranscriptSNPView (2)
Screenshot labels: Different individuals, Resequencing coverage, SNP alleles, Alleles in different individuals
17
SNPs in Ensembl
TranscriptSNPView (3)
18
  • Find the TranscriptSNPView page for human PTPN22.
  • Do all individuals (HuAA, HuCC, HuDD, HuFF,
    Venter and Watson) have resequencing coverage at
    the position of the C1858T (R620W) SNP?
  • Do any of the individuals have a higher risk of
    developing rheumatoid arthritis based on their
    genotype at this position?
  • Is there an individual who is heterozygous at
    this position?

19
Haplotypes and Linkage Disequilibrium
  • A haplotype is a set of SNPs on a single
    chromatid that are statistically associated
  • Linkage disequilibrium describes a situation in
    which some combinations of SNP alleles occur more
    or less frequently in a population than would be
    expected from a random formation of haplotypes
    from alleles based on their frequencies

20
Measures of LD
  • $D = P(AB) - P(A)P(B)$
    • $D$ ranges from $-0.25$ to $0.25$
    • $D = 0$ indicates linkage equilibrium
    • dependent on allele frequencies, therefore of
      little use
  • $D' = D / D_{\max}$, where $D_{\max}$ is the
    maximum possible value of $D$ for the given
    allele frequencies
    • $D' = 1$ indicates perfect LD
    • estimates of $D'$ strongly inflated in small
      samples
  • $r^2 = D^2 / \left(P(A)P(B)P(a)P(b)\right)$
    • $r^2 = 1$ indicates perfect LD
    • measure of choice
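
As a worked illustration of the three measures on this slide, the sketch below computes D, D' and r2 for two biallelic SNPs from the haplotype frequency P(AB) and the allele frequencies P(A) and P(B), taking P(a) = 1 - P(A) and P(b) = 1 - P(B). The function name is made up for illustration. The choice of the maximum possible value used to normalise D is one common convention and is an assumption here, since the slide does not spell it out. D' is returned as an absolute value.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first SNP
    p_b:  frequency of allele B at the second SNP
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic (0 < frequency < 1)")

    d = p_ab - p_a * p_b

    # Largest |D| attainable with these allele frequencies (assumed convention).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Alleles A and B always occur together: perfect LD.
print(ld_measures(p_ab=0.5, p_a=0.5, p_b=0.5))

# Haplotype frequency equals the product of allele frequencies: equilibrium.
print(ld_measures(p_ab=0.25, p_a=0.5, p_b=0.5))
```

With both allele frequencies at 0.5, the first call reaches the upper bound D = 0.25 with D' = 1 and r2 = 1, while the second gives 0 for all three measures.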

21
Linkage Disequilibrium
LDView
It is also possible to export SNP information for
upload into the HaploView software tool
22
Linkage Disequilibrium
LDTableView
23
  • Retrieve all non-synonymous SNPs for the human
    CFTR gene using BioMart and export their id,
    genomic position, alleles and peptide shift
    (hint: which dataset should you start with?).

24
DAS Sources
  • For human, data from the following DAS sources
    can be visualised on ContigView:
    • DGV and DGV loci: structural variations from
      the Database of Genomic Variants (CNVs, InDels,
      inversions etc.)
    • RedonCNV regions and RedonCNV loci: copy number
      variations from the Redon et al. paper
    • SegDup Washu: segmental duplications,
      University of Washington

25
Questions and Answers