Linear Reduction Method for Tag SNPs Selection - PowerPoint PPT Presentation

1
Linear Reduction Method for Tag SNPs Selection
  • Jingwu He
  • Alex Zelikovsky

2
Outline
  • SNPs, haplotypes and genotypes
  • Haplotype tagging problem
  • Linear reduction method for tagging
  • Maximizing tagging separability
  • Conclusions and future work

4
Human Genome and SNPs
  • Length of human genome ≈ 3 × 10^9 base pairs
  • Difference between any two people ≈ 0.1% of the genome ≈ 3 × 10^6 SNPs
  • Total number of single nucleotide polymorphisms (SNPs) ≈ 1 × 10^7
  • SNPs are mostly bi-allelic, e.g., alleles A and C
  • Minor allele frequency should be considerable, e.g., > 1%
  • Diploid: two different copies of each chromosome
  • Haplotype: description of a single copy (0, 1)
  • Genotype: description of the two copies mixed (0/0 → 0, 1/1 → 1, 0/1 → 2)

Figure: two haplotypes per individual and the genotype for the individual
  Haplotype 1: 0 1 1 1 0 0 1 1 0
  Haplotype 2: 1 1 0 1 0 0 1 0 0
  Genotype:    2 1 2 1 0 0 1 2 0

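The genotype encoding on this slide can be illustrated with a short Python sketch. It assumes the slide's coding (both copies 0 gives 0, both copies 1 gives 1, differing copies give 2); the function name `genotype` is hypothetical and not part of the presentation.

```python
def genotype(h1, h2):
    """Combine the two 0/1 haplotypes of one individual into a genotype:
    keep the allele where both copies agree, write 2 where they differ."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same SNPs")
    return [a if a == b else 2 for a, b in zip(h1, h2)]

# Example haplotypes, as read from the slide's figure.
h1 = [0, 1, 1, 1, 0, 0, 1, 1, 0]
h2 = [1, 1, 0, 1, 0, 0, 1, 0, 0]
print(genotype(h1, h2))  # [2, 1, 2, 1, 0, 0, 1, 2, 0]
```

Note that the mapping loses information: a 2 does not record which of the two copies carries the 0 and which carries the 1.
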
5
Haplotype and Disease Association
  • Haplotypes/genotypes define our individuality
  • Genetically engineered athletes might win at the Beijing Olympics (Time, 07/2004)
  • Haplotypes contribute to risk factors of complex diseases (e.g., diabetes)
  • International HapMap project: http://www.hapmap.org
  • SNPs responsible for disease are hidden among 10 million SNPs
  • Too expensive to search them all
  • HapMap tries to identify 1 million tag SNPs providing almost as much mapping information as the entire 10 million SNPs

7
Tagging Reduces Cost
  • Decrease SNP haplotyping cost
  • sequence only a small number of SNPs (tag SNPs)
  • infer the rest of (certain) SNPs from the sequenced tag SNPs
  • Cost-saving ratio = m / k, where m is the number of SNPs and k is the number of tags (infinite population)
  • Traditional tagging based on linkage disequilibrium (LD) needs too many SNPs; its cost-saving ratio is too small (about 2)
  • Proposed linear reduction method: cost-saving ratio of 20

8
Haplotype Tagging Problem
  • Given the full pattern of all SNPs for a sample
  • Find the minimum number of tag SNPs that allows reconstructing the complete haplotype of each individual
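
To make the problem statement concrete, here is a brute-force sketch in Python for a tiny sample. It assumes that "reconstructing" means every tag pattern observed in the sample identifies exactly one full haplotype; it is exponential in the number of SNPs and is not the linear reduction method. `reconstructs` and `min_tags_bruteforce` are hypothetical helper names.

```python
from itertools import combinations

def reconstructs(sample, tags):
    """True if the values at the tag SNPs determine the full haplotype
    for every haplotype in the sample."""
    seen = {}
    for h in sample:
        key = tuple(h[j] for j in tags)
        if seen.setdefault(key, h) != h:
            return False  # two different haplotypes share this tag pattern
    return True

def min_tags_bruteforce(sample):
    """Smallest set of SNP indices that reconstructs the sample (exhaustive search)."""
    m = len(sample[0])
    for k in range(m + 1):
        for tags in combinations(range(m), k):
            if reconstructs(sample, tags):
                return list(tags)
    return list(range(m))  # not reached: all SNPs always suffice

# Toy sample: one tuple per haplotype, one entry per SNP (made-up data).
sample = [(0, 0, 1, 1), (1, 0, 0, 1), (1, 1, 0, 0), (0, 0, 1, 1)]
print(min_tags_bruteforce(sample))  # [0, 1]
```
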

10
Linear Rank of Recombinations
  • Human haplotype evolution
  • Mutations introduce SNPs
  • Recombinations propagate SNPs over the entire population
  • Replace notation (0, 1) with (-1, 1)
  • Theorem: a haplotype population generated from l haplotypes with recombinations at k spots has linear rank at most (l-1)(k+2)
  • This is much less than the number of all haplotypes, l^k
  • Conclusion: use only linearly independent SNPs as tags
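
The rank theorem can be checked numerically on a toy population. This Python sketch assumes random founder haplotypes in (-1, 1) notation, fixed recombination spots, and a population containing every mosaic that copies each segment between spots from one founder. The printed bound (l-1)(k+2) is our reading of the slide's formula and should be treated as an assumption; `recombinant_population` is a hypothetical helper.

```python
import itertools
import numpy as np

def recombinant_population(founders, spots):
    """Every haplotype that copies each segment between consecutive
    recombination spots from one of the founder haplotypes."""
    l, m = founders.shape
    bounds = [0, *spots, m]
    segments = list(zip(bounds[:-1], bounds[1:]))
    rows = []
    for choice in itertools.product(range(l), repeat=len(segments)):
        rows.append(np.concatenate(
            [founders[f, a:b] for f, (a, b) in zip(choice, segments)]))
    return np.array(rows)

rng = np.random.default_rng(0)
l, m = 3, 30                                   # 3 founders, 30 SNPs
spots = [10, 20]                               # k = 2 recombination spots
k = len(spots)
founders = rng.choice([-1, 1], size=(l, m))    # (-1, 1) notation
P = recombinant_population(founders, spots)
rank = np.linalg.matrix_rank(P)
print(P.shape[0], "haplotypes")                # 27 haplotypes
print("linear rank:", rank, "bound (l-1)(k+2):", (l - 1) * (k + 2))
```

The population grows exponentially with the number of segments, while the rank grows only linearly, which is the point of using linearly independent SNPs as tags.
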

11
Tag SNPs Selection
  • Tag selecting algorithm
  • Using Gauss-Jordan elimination, find the reduced row echelon form (RREF) X of the sample matrix S
  • Extract the basis T of sample S
  • Factorize the sample: S = T · X
  • Output the set of tags T
  • Fact: in the sample, each SNP is a linear combination of the tag SNPs
  • Conjecture: in the entire population, each SNP is the same linear combination of tags as in the sample

Figure: sample S, tags T and rref X

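A minimal Python sketch of the tag-selection steps, assuming the sample matrix S has one row per sampled haplotype and one column per SNP in (-1, 1) notation. Gauss-Jordan elimination with partial pivoting yields the nonzero rows X of the RREF; the pivot columns of S are taken as the tags T, and then S = T · X. The function names and the floating-point tolerance `tol` are assumptions, not the authors' code.

```python
import numpy as np

def rref(A, tol=1e-9):
    """Gauss-Jordan elimination. Returns the nonzero rows of the reduced row
    echelon form of A and the indices of its pivot columns."""
    R = np.array(A, dtype=float)
    n_rows, n_cols = R.shape
    pivots = []
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        p = r + int(np.argmax(np.abs(R[r:, c])))  # partial pivoting
        if abs(R[p, c]) < tol:
            continue  # no pivot: column c depends on earlier pivot columns
        R[[r, p]] = R[[p, r]]          # swap rows r and p
        R[r] /= R[r, c]                # scale the pivot to 1
        for i in range(n_rows):        # clear column c in every other row
            if i != r:
                R[i] -= R[i, c] * R[r]
        pivots.append(c)
        r += 1
    return R[:r], pivots

def select_tags(S):
    """Tags are the pivot SNP columns of S; then S = T @ X."""
    X, tags = rref(S)
    T = np.asarray(S, dtype=float)[:, tags]
    return tags, T, X

# Toy sample (made up): rows are haplotypes, columns are SNPs.
S = [[ 1, -1,  1, -1],
     [ 1,  1, -1, -1],
     [-1,  1, -1,  1],
     [ 1,  1, -1, -1]]
tags, T, X = select_tags(S)
print(tags)                   # [0, 1]
print(np.allclose(T @ X, S))  # True
```

Column j of X holds the coefficients expressing SNP j as a linear combination of the tag SNPs; in this toy sample, SNP 2 is the negation of SNP 1 and SNP 3 is the negation of SNP 0.
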
12
Haplotype Reconstruction
  • Given the tags t of an unknown haplotype h
  • and the RREF X of the sample matrix S
  • Find the unknown haplotype h
  • Predict h' = t · X
  • We may have errors, since the predicted h' may not equal the unknown haplotype h; we assign -1 where the predicted values are negative and 1 otherwise (RLRP)
  • Variant: randomly reshuffle the SNPs before choosing tags (RLR)

Figure: unknown haplotype h, tags set, rref X and predicted haplotype h'
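
The prediction step with the sign rule can be sketched in Python as below, assuming t is a vector of tag values in (-1, 1) notation and X has one row per tag. `predict_haplotype` is a hypothetical name, and the example X is made up rather than taken from the presentation's data.

```python
import numpy as np

def predict_haplotype(t, X):
    """Predict a full (-1, 1) haplotype from its tag values t and the RREF X.
    Negative entries of t @ X become -1; all others, including 0, become 1."""
    raw = np.asarray(t, dtype=float) @ np.asarray(X, dtype=float)
    return np.where(raw < 0, -1, 1)

# Hypothetical RREF with 2 tags: SNP 2 = tag 0 - tag 1.
X = [[1, 0, 1],
     [0, 1, -1]]
print(predict_haplotype([1, -1], X).tolist())  # [1, -1, 1]
print(predict_haplotype([1, 1], X).tolist())   # [1, 1, 1]
```

For the RLR variant, one would permute the columns of S at random (for instance with `numpy.random.default_rng().permutation`) before the elimination, so that different SNPs can end up as pivots.
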

13
Results for Simulated Data
  • Cost-saving ratio for 2% error: 3.9 for LR and 13 for RLRP
  • P: 1000 different haplotypes
  • m: 25000 sites
  • Sample size k (number of tag SNPs): 50, 100, ..., 750

14
Results for Real Data
  • Cost-saving ratio for 5% error: 2.1 for LR and 2.8 for RLRP
  • P: 158 different haplotypes (Daly et al.)
  • m: 103 sites
  • Sample size k (number of tag SNPs): 10, 15, 20, ..., 90
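
The two quantities used in the results, prediction error and cost-saving ratio m / k, can be computed as in this Python sketch. It assumes that error means the fraction of mispredicted SNP values and that "cost-saving ratio for x% error" means m divided by the smallest tested k whose error is at most x%; both readings are assumptions, and the numbers in the example are invented.

```python
import numpy as np

def error_rate(H_true, H_pred):
    """Fraction of SNP values that were predicted incorrectly."""
    H_true, H_pred = np.asarray(H_true), np.asarray(H_pred)
    return float(np.mean(H_true != H_pred))

def cost_saving_ratio_at(m, error_by_k, max_error):
    """m / k for the smallest tag count k whose error is at most max_error,
    or None if no tested k reaches that error."""
    good = [k for k, err in sorted(error_by_k.items()) if err <= max_error]
    return m / good[0] if good else None

# Made-up numbers, for illustration only.
print(error_rate([[1, -1, 1, 1], [-1, -1, 1, -1]],
                 [[1, -1, 1, 1], [-1, 1, 1, -1]]))                   # 0.125
print(cost_saving_ratio_at(120, {10: 0.20, 20: 0.04, 30: 0.01}, 0.05))  # 6.0
```
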

16
Tag Separability
  • Correlation between the number of zeros for a SNP in RREF X and the number of errors in predicting that column
  • A greedy heuristic gives a more separable basis: for 5% error, the cost-saving ratio is 3.3 vs. 2.8 for RLRP
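
The quantity that correlates with prediction errors, the number of zeros per SNP column of X, can be counted as in this Python sketch. It only computes that count, with an assumed tolerance `tol` and a made-up X; it does not implement the greedy heuristic, whose details are not given on the slide. `zeros_per_snp` is a hypothetical name.

```python
import numpy as np

def zeros_per_snp(X, tol=1e-9):
    """Count (near-)zero coefficients in each column of the RREF X.
    Column j holds the coefficients expressing SNP j via the tags."""
    return (np.abs(np.asarray(X, dtype=float)) < tol).sum(axis=0)

# Made-up RREF with 2 tags and 5 SNPs.
X = [[1, 0, 0, -1, 0.5],
     [0, 1, -1, 0, 0.5]]
print(zeros_per_snp(X).tolist())  # [1, 1, 1, 1, 0]
```
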

17
Conclusions and Future work
  • Our contributions
  • new SNP tagging problem formulation
  • linear reduction method for SNP tagging
  • enhancement of linear reduction using a separable basis
  • Future work
  • application of tagging to genotype and haplotype disease association

18
Thank you