Title: Databases
1 Databases
BI420 Introduction to Bioinformatics
Gabor T. Marth
Department of Biology, Boston College marth_at_bc.edu
2 SNP Mining in BAC Overlaps
Human Chromosome
Tiling path of BACs (finished or 5x shotgun)
Clone overlap
Candidate SNP
SNP Marker Map
100 kb
3 BAC overlap mining
inter- and intra-chromosomal duplications
known human repeats
fragmentary nature of draft data
4 Title
NH0260K08 NH0407F02
Section of base-wise alignment with marked-up candidate SNP (alignment displayed with the CONSED sequence viewer)
SNP mark-up tag produced by PolyBayes
http://genome.wustl.edu/gsc/polybayes
5 BAC overlap mining results
30,000 clones
>CloneX
ACGTTGCAACGT GTCAATGCTGCA
>CloneY
ACGTTGCAACGT GTCAATGCTGCA
25,901 clones (7,122 finished, 18,779 draft with base quality values)
21,020 clone overlaps (124,356 fragment overlaps)
ACCTAGGAGACTGAACTTACTG
ACCTAGGAGACCGAACTTACTG
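The two aligned overlap sequences above differ at a single base, which is what makes that position a candidate SNP. A minimal sketch of spotting such positions follows, assuming an ungapped alignment of equal-length strings. The function name `candidate_mismatches` is hypothetical, and this is not the PolyBayes method: it ignores base quality values and simply reports every mismatch.

```python
def candidate_mismatches(seq_a, seq_b):
    """Return (position, base_a, base_b) for every mismatch.

    Positions are 1-based. Assumes the two sequences are already
    aligned without gaps, so they must have the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# The pair of overlap sequences shown on the slide.
top = "ACCTAGGAGACTGAACTTACTG"
bottom = "ACCTAGGAGACCGAACTTACTG"
print(candidate_mismatches(top, bottom))  # [(12, 'T', 'C')]
```

In real draft data a raw mismatch may also be a sequencing error, which is why the draft clones carry base quality values.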
6 Database schema
[Schema diagram: Clone (1) NH0260K08 - Hit (1) - HSP (1) - Hit (2) - Clone (2) NH0407F02; Allele (1) C in Hit (1); Allele (2) T in Hit (2); SNP (1)]

CLONE
id  name       received  masked
1   NH0260K08  12-25-99  12-26-99
2   NH0407F02  12-28-99  01-03-00

HSP
id  sense
1   1

HIT
id  cloneID  hspID  start  end
1   1        1      1      17957
2   2        1      96912  114891

SNP
id  submitted
1   01-04-00

ALLELE
id  hitID  nucleotide
1   1      C
2   2      T

Database tables
CLONE table: Clone attributes and sequence file location
HSP table: Significant pair-wise BLAST similarity
HIT table: Region of a clone that is part of an HSP
SNP table: Candidate SNP attributes
ALLELE table: Attributes of an allele within a SNP
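To make the relationships between these tables concrete, here is a minimal sketch that loads the example records into an in-memory SQLite database and follows the links ALLELE -> HIT -> CLONE. Table and column names are taken from the slide. The column types, the REFERENCES constraints, and the helper names `build_example_db` and `alleles_by_clone` are assumptions for illustration; the slide does not say which database system was used.

```python
import sqlite3

# Table and column names follow the slide; the column types and the
# REFERENCES constraints are assumptions added for illustration.
SCHEMA = """
CREATE TABLE clone  (id INTEGER PRIMARY KEY, name TEXT,
                     received TEXT, masked TEXT);
CREATE TABLE hsp    (id INTEGER PRIMARY KEY, sense INTEGER);
CREATE TABLE hit    (id INTEGER PRIMARY KEY,
                     cloneID INTEGER REFERENCES clone(id),
                     hspID INTEGER REFERENCES hsp(id),
                     "start" INTEGER, "end" INTEGER);
CREATE TABLE snp    (id INTEGER PRIMARY KEY, submitted TEXT);
CREATE TABLE allele (id INTEGER PRIMARY KEY,
                     hitID INTEGER REFERENCES hit(id),
                     nucleotide TEXT);
"""


def build_example_db():
    """Create an in-memory database holding the records from the slide."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO clone VALUES (?, ?, ?, ?)",
        [(1, "NH0260K08", "12-25-99", "12-26-99"),
         (2, "NH0407F02", "12-28-99", "01-03-00")],
    )
    conn.execute("INSERT INTO hsp VALUES (1, 1)")
    conn.executemany(
        "INSERT INTO hit VALUES (?, ?, ?, ?, ?)",
        [(1, 1, 1, 1, 17957), (2, 2, 1, 96912, 114891)],
    )
    conn.execute("INSERT INTO snp VALUES (1, '01-04-00')")
    conn.executemany(
        "INSERT INTO allele VALUES (?, ?, ?)",
        [(1, 1, "C"), (2, 2, "T")],
    )
    conn.commit()
    return conn


def alleles_by_clone(conn):
    """List each allele with the clone and hit region it was observed in."""
    query = """
        SELECT clone.name, allele.nucleotide, hit."start", hit."end"
        FROM allele
        JOIN hit   ON allele.hitID = hit.id
        JOIN clone ON hit.cloneID = clone.id
        ORDER BY allele.id
    """
    return conn.execute(query).fetchall()


conn = build_example_db()
for row in alleles_by_clone(conn):
    print(row)
```

The query returns one row per allele: C observed in clone NH0260K08 (hit region 1 to 17957) and T in clone NH0407F02 (hit region 96912 to 114891). Both hits belong to the same HSP, which is how the schema records that the two regions overlap.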