Single Nucleotide Polymorphisms

Instructor: Yao-Ting Huang

Bioinformatics Laboratory, Department of Computer Science and Information Engineering, National Chung Cheng University.

Genetic Variants

- We are distinguished from each other by genetic variants.
  - Single Nucleotide Polymorphism (SNP)
  - Insertion/deletion
  - Copy Number Polymorphism (CNP)
  - Inversion

Genetic Variants Over Time

[Figure: timeline from a common ancestor to the present, showing mutations accumulating over time and the resulting variants observed in a population]

SNPs and Haplotypes

- A Single Nucleotide Polymorphism (SNP), pronounced "snip", is a single DNA base variation observed in the human population.
- A haplotype is a set of linked SNPs on the same chromosome.

Single Nucleotide Polymorphism

- We only consider SNPs observed with sufficient frequency in the population.
  - SNP: the minor allele frequency is at least 5%.
  - Mutation: the minor allele frequency is less than 5%.

[Figure: two example loci]
SNP: C T T A G C T T / C T T A G T T T
Mutation: A C T T A G C T T (99.9%) / A C T T A G T T T (0.1%)
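The frequency threshold above can be sketched in a few lines of code. This is a minimal illustration, assuming the alleles observed at one locus are given as a list of single-letter strings; the function names `minor_allele_frequency` and `classify_locus` are hypothetical and do not come from the slides.

```python
from collections import Counter

def minor_allele_frequency(alleles):
    """Return the frequency of the second most common allele (0.0 if only one allele)."""
    counts = Counter(alleles).most_common()
    if len(counts) < 2:
        return 0.0
    return counts[1][1] / len(alleles)

def classify_locus(alleles, threshold=0.05):
    """Label a locus 'SNP' (MAF >= threshold), 'mutation' (0 < MAF < threshold), or 'invariant'."""
    maf = minor_allele_frequency(alleles)
    if maf == 0.0:
        return "invariant"
    return "SNP" if maf >= threshold else "mutation"

common = ["C"] * 90 + ["T"] * 10   # minor allele at 10%
rare = ["C"] * 999 + ["T"] * 1     # minor allele at 0.1%
print(classify_locus(common), classify_locus(rare))
```

The 5% cutoff is passed as a parameter so that other conventions can be tried without changing the code.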

Single Nucleotide Polymorphism

- All humans share 99.9% of the same DNA sequence.
- SNPs occur about every 200 to 600 base pairs.
- 90% of human genome variation comes from SNPs.
- The human genome contains about four million SNPs.
- Because the probability of recurrent mutation at the same locus is quite low, we usually observe only two alleles at a SNP locus.

Single Nucleotide Polymorphism

- The SNPs differ among members of the human population.

DNA sequences of 6 individuals:

Individual  Phenotype  DNA sequence
1           Black eye  GATATTCGTACGGA-T
2           Brown eye  GATGTTCGTACTGAAT
3           Black eye  GATATTCGTACGGA-T
4           Blue eye   GATATTCGTACGGAAT
5           Brown eye  GATGTTCGTACTGAAT
6           Brown eye  GATGTTCGTACTGAAT

Haplotypes: AG- (2/6), GTA (3/6), AGA (1/6)
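The haplotype frequencies on this slide can be reproduced from the six aligned sequences. The sketch below assumes the sequences are already aligned and treats any column where individuals disagree (including the gap character) as a variable site, as the slide appears to do; the variable names are illustrative.

```python
from collections import Counter

sequences = [
    "GATATTCGTACGGA-T",
    "GATGTTCGTACTGAAT",
    "GATATTCGTACGGA-T",
    "GATATTCGTACGGAAT",
    "GATGTTCGTACTGAAT",
    "GATGTTCGTACTGAAT",
]

# Columns where not all individuals agree are the variable sites.
variable_sites = [i for i, column in enumerate(zip(*sequences)) if len(set(column)) > 1]

# A haplotype is the string of alleles at the variable sites only.
haplotypes = ["".join(seq[i] for i in variable_sites) for seq in sequences]
haplotype_counts = Counter(haplotypes)

print(variable_sites)
print(haplotype_counts)
```

Only three of the sixteen columns vary, so each 16-base sequence is summarized by a 3-character haplotype.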

Discovery of SNPs

- The DNA of two individuals differs by less than 0.1%.
- Hinds et al. identified 1,586,383 Single Nucleotide Polymorphisms across three human populations (Science, 2005).

The HapMap Project

- The International HapMap Project aims to provide a map of SNPs in the human genome (269 individuals from 4 populations).
  - Phase I: 1,007,329 SNPs.
  - Phase II (ongoing): 4.6 million SNPs.

Haplotype vs. Genotype

- The collection of haplotypes has been limited because the human genome is diploid.
- In the above projects, genotypes instead of haplotypes are collected due to cost considerations.

Haplotype vs. Genotype

- Genotypes only tell us the alleles at each SNP locus.
- But we don't know the connection of alleles at different SNP loci.
- There could be several possible haplotype pairs for the same genotype.

[Figure: two alternative haplotype pairs that explain the same genotype]

We don't know which haplotype pair is real.
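To make the ambiguity concrete, the sketch below enumerates every haplotype pair that is consistent with a genotype. The encoding is an assumption made for illustration: a genotype is a list of two-character strings, one per locus, giving the two unordered alleles observed there. The function name `haplotype_pairs` is hypothetical.

```python
from itertools import product

def haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with a genotype.

    `genotype` is a list of 2-character strings, one per SNP locus,
    giving the two (unordered) alleles observed there, e.g. "AC".
    """
    pairs = set()
    # At each locus, choose which allele goes on the first chromosome.
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = "".join(g[c] for g, c in zip(genotype, choice))
        h2 = "".join(g[1 - c] for g, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

# Heterozygous at both loci: two possible explanations.
print(haplotype_pairs(["AC", "TG"]))
# Homozygous at the second locus: only one explanation.
print(haplotype_pairs(["AC", "TT"]))
```

With k heterozygous loci there are 2^(k-1) candidate pairs, which is why the genotype alone cannot identify the true pair.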

Three Possible Genotypes at Each SNP Locus

- At SNP1, it is possible to observe three genotypes, (A, C), (A, A), and (C, C), in the population.
  - (A, C): heterozygous (one major and one minor allele).
  - (A, A): homozygous wild type (two major alleles).
  - (C, C): homozygous mutant (two minor alleles).

[Figure: genotype G3 with its alleles at loci SNP1 and SNP2]

Haplotype Inference

- Inferring the haplotypes from a set of genotypes is called haplotype inference.
- Without further assumptions, this problem cannot be solved.
- Most combinatorial methods use the maximum parsimony model to solve this problem.
  - Methods based on this model search for a minimum set of haplotypes that can explain all genotypes.
  - This problem has been shown to be APX-hard (Lancia et al., 2005).

Maximum Parsimony

- Find a minimum set of haplotypes that can explain all genotypes.
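The maximum parsimony criterion can be illustrated with an exhaustive search on a toy input. This is only a sketch under stated assumptions: genotypes are encoded as lists of two-character strings (one unordered allele pair per locus), and the helper names `explaining_pairs` and `max_parsimony` are hypothetical. It runs in exponential time and is not one of the algorithms discussed in these slides.

```python
from itertools import combinations, product

def explaining_pairs(genotype):
    """All unordered haplotype pairs that could produce `genotype`
    (a list of 2-character strings, one per locus)."""
    pairs = set()
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = "".join(g[c] for g, c in zip(genotype, choice))
        h2 = "".join(g[1 - c] for g, c in zip(genotype, choice))
        pairs.add(frozenset((h1, h2)))
    return pairs

def max_parsimony(genotypes):
    """Exhaustively find a smallest haplotype set explaining every genotype.
    Exponential time: only suitable for toy inputs."""
    pairs_per_genotype = [explaining_pairs(g) for g in genotypes]
    candidates = sorted({h for pairs in pairs_per_genotype for pair in pairs for h in pair})
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            chosen = set(subset)
            # Every genotype needs at least one pair fully inside `chosen`.
            if all(any(pair <= chosen for pair in pairs) for pairs in pairs_per_genotype):
                return sorted(chosen)
    return []

genotypes = [
    ["AC", "TG"],  # heterozygous at both loci: AT+CG or AG+CT
    ["AA", "TT"],  # homozygous: only AT+AT
    ["CC", "GG"],  # homozygous: only CG+CG
]
print(max_parsimony(genotypes))
```

The two homozygous genotypes force AT and CG into the solution, and that same pair also explains the ambiguous first genotype, so no further haplotype is needed.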

Related Works

- Statistical methods
  - Niu et al. (2002) developed a PL-EM algorithm called HAPLOTYPER.
  - Stephens and Donnelly (2003) designed an MCMC algorithm based on Gibbs sampling called PHASE.
- Combinatorial methods
  - Gusfield (2003) proposed an integer linear programming formulation for this problem.
  - Wang and Xu (2003) developed a branch-and-bound algorithm called HAPAR to find the optimal solution.
  - Brown and Harrower (2004) proposed a new integer linear programming formulation for this problem.

Our Results

- Huang et al., "An approximation algorithm for haplotype inference by maximum parsimony," Journal of Computational Biology, 2005.

Yao-Ting Huang

Approximation Approaches to NP-hard problems

- Formulate the problem as an integer linear programming problem.
  - Relax it to a Linear Programming (LP) problem and solve it.
  - Gusfield and Brown formulate the haplotype inference problem as integer programming.
- Formulate the problem as an integer quadratic programming (IQP) problem.
  - Relax it to a Semi-Definite Programming (SDP) problem and solve it.
  - We formulate the haplotype inference problem as an IQP problem.

Integer Quadratic Programming

- Define $x_i$ as an integer variable with value 1 or -1.
  - $x_i = 1$ if the i-th haplotype is selected.
  - $x_i = -1$ if the i-th haplotype is not selected.
- Finding a minimum set of haplotypes means minimizing an objective function over the $x_i$.

Integer Quadratic Programming

- Each genotype must be explained by at least one pair of haplotypes.
- For genotype G1, an inequality over its candidate haplotype pairs must be satisfied.

[Figure: G1 is explained if h1 and h2 are selected, or if an alternative pair is selected]

Integer Quadratic Programming

Constraint Functions

- Maximum parsimony: find a minimum set of haplotypes which can explain all genotypes.
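The slides refer to an objective function and a per-genotype inequality without showing them in this transcript. One natural way to write them with the $\pm 1$ variables defined earlier is sketched below. This is an assumption that is consistent with the surrounding text, not necessarily the exact form used in the talk. Here $m$ is the number of candidate haplotypes and $P(G_k)$ is a hypothetical symbol for the set of haplotype pairs that can explain genotype $G_k$.

```latex
\begin{aligned}
\text{minimize}\quad & \sum_{i=1}^{m} \frac{1 + x_i}{2} \\
\text{subject to}\quad & \sum_{(h_i, h_j) \in P(G_k)} \frac{1 + x_i}{2} \cdot \frac{1 + x_j}{2} \;\ge\; 1
  \quad \text{for every genotype } G_k, \\
& x_i \in \{-1, 1\}, \quad i = 1, \ldots, m.
\end{aligned}
```

Each term $(1 + x_i)/2$ equals 1 when haplotype $h_i$ is selected and 0 otherwise, so the objective counts the selected haplotypes, and each product equals 1 exactly when both haplotypes of a pair are selected.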

An Iterative Semi-definite Programming Relaxation Algorithm

[Flowchart with stages: Integer Quadratic Programming, Vector Formulation, Semi-definite Programming, SDP Solution, Vector Solution, Integral Solution]

Relaxation

[Flowchart step: Integer Quadratic Programming -> Vector Formulation]

- We relax $x_i$ into an $(m+1)$-dimensional unit vector $y_i$.
- Replace the integer constant 1 with another unit vector $y_0 = (1, 0, \ldots, 0)$.

SDP Formulation

[Flowchart step: Vector Formulation]

- Let $Y = (y_0\ y_1\ \cdots\ y_m)^T (y_0\ y_1\ \cdots\ y_m)$.

Reformulation

Vector Formulation

Solving SDP

[Flowchart step: Semidefinite Programming]

- The SDP problem can be solved in polynomial time by algorithms such as the interior point method.
- We can obtain the SDP solution matrix $Y$.

Decomposition

[Flowchart step: SDP Solution]

- Recall that $Y = (y_0\ y_1\ \cdots\ y_m)^T (y_0\ y_1\ \cdots\ y_m)$.
- Use the incomplete Cholesky decomposition method to obtain the vector solutions $y_0, y_1, \ldots, y_m$.
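The decomposition step can be sketched with NumPy. Note the substitution: instead of the incomplete Cholesky decomposition named on the slide, this example uses a symmetric eigendecomposition (`numpy.linalg.eigh`), which also handles rank-deficient matrices. The function name `vectors_from_gram` and the toy matrix are assumptions for illustration.

```python
import numpy as np

def vectors_from_gram(Y, tol=1e-9):
    """Return a matrix V whose columns v_0..v_m satisfy V.T @ V ~= Y.

    Uses a symmetric eigendecomposition, which tolerates the zero
    eigenvalues that make plain Cholesky fail on rank-deficient Y.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(Y)
    if eigenvalues.min() < -tol:
        raise ValueError("Y is not positive semidefinite")
    eigenvalues = np.clip(eigenvalues, 0.0, None)
    # Y = Q diag(w) Q^T = (diag(sqrt(w)) Q^T)^T (diag(sqrt(w)) Q^T)
    return np.sqrt(eigenvalues)[:, None] * eigenvectors.T

# Toy Gram matrix built from three known unit vectors (columns of `original`).
original = np.array([[1.0, 0.0, -1.0],
                     [0.0, 1.0, 0.0]])
Y = original.T @ original
V = vectors_from_gram(Y)
print(np.round(V.T @ V, 6))
```

The recovered vectors are determined only up to a rotation, which is harmless here because the later rounding step depends only on the inner products between them.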

Randomized Rounding

[Flowchart step: Vector Solution -> Integral Solution]

- Randomly generate two unit vectors $z_1$ and $z_2$.
- Set $x_i = 1$ if
  - $(z_1 \cdot y_i)(z_1 \cdot y_0) > 0$, and
  - $(z_2 \cdot y_i)(z_2 \cdot y_0) > 0$.
- Set $x_i = -1$ otherwise.

We will discuss this later
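The rounding rule can be written compactly with NumPy. This is a minimal sketch, assuming the vector solutions are stored as the columns of a matrix `V` with $y_0$ in the first column; the function names are hypothetical.

```python
import numpy as np

def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere in `dim` dimensions."""
    z = rng.standard_normal(dim)
    return z / np.linalg.norm(z)

def round_vectors(V, rng):
    """Round column vectors y_0..y_m of V to x_1..x_m in {-1, +1}.

    x_i = +1 only if y_i lies on the same side as y_0 of both random
    hyperplanes (normals z1 and z2); otherwise x_i = -1.
    """
    dim = V.shape[0]
    z1 = random_unit_vector(dim, rng)
    z2 = random_unit_vector(dim, rng)
    y0, ys = V[:, 0], V[:, 1:]
    same_side_1 = (z1 @ ys) * (z1 @ y0) > 0
    same_side_2 = (z2 @ ys) * (z2 @ y0) > 0
    return np.where(same_side_1 & same_side_2, 1, -1)

rng = np.random.default_rng(0)
# Columns: y_0, then y_1 = y_0 (always selected) and y_2 = -y_0 (never selected).
V = np.array([[1.0, 1.0, -1.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
x = round_vectors(V, rng)
print(x)
```

The two extreme cases in the example show the intent of the rule: a vector aligned with $y_0$ is always rounded to 1, and a vector opposite to $y_0$ is always rounded to -1.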

Iterative Process

[Flowchart step: Integer Quadratic Programming]

- Check if all inequalities are satisfied.
  - No: repeat this algorithm only for the unsatisfied inequalities.
  - Yes: we are done.

Analysis of the SDP-relaxation Algorithm

- Recall the randomized rounding:
  - Randomly generate two unit vectors $z_1$ and $z_2$.
  - Set $x_i = 1$ if
    - $(z_1 \cdot y_i)(z_1 \cdot y_0) > 0$, and
    - $(z_2 \cdot y_i)(z_2 \cdot y_0) > 0$.
  - Set $x_i = -1$ otherwise.
- We will show that the randomized rounding outputs a solution $E[w]$ at least as good as the optimal solution.

Analysis of the SDP-relaxation Algorithm

- The randomized rounding method can output a solution $E[w]$ at least as good as the optimal solution.
  - We will show $OPT(IQP) \ge OPT(SDP) \ge E[w]$.
- The solution space of SDP includes that of IQP, so we already have $OPT(IQP) \ge OPT(SDP)$.
  - We can set $y_i = (1, 0, 0, 0, \ldots) \Leftrightarrow x_i = 1$.
  - We can set $y_i = (-1, 0, 0, 0, \ldots) \Leftrightarrow x_i = -1$.

Analysis of the SDP-relaxation Algorithm

- We still need to prove the second relation in
  - $OPT(IQP) \ge OPT(SDP) \ge E[w]$,
  that is, whether $OPT(SDP)$ is greater or less than $E[w]$.

Analysis of the SDP-relaxation Algorithm

- Recall $x_i = 1$ if
  - $(z_1 \cdot y_i)(z_1 \cdot y_0) > 0$, and
  - $(z_2 \cdot y_i)(z_2 \cdot y_0) > 0$.
- Note that $\cos\theta = v_i \cdot v_j$.
- Let the angle between vectors $y_0$ and $y_i$ be $\theta$.
- Recall that $\cos\theta > 0$ when $\theta \in [0, \pi/2)$ or $\theta \in (3\pi/2, 2\pi]$.

Analysis of the SDP-relaxation Algorithm

- Recall $x_i = 1$ if
  - $(z_1 \cdot y_i)(z_1 \cdot y_0) > 0$, and
  - $(z_2 \cdot y_i)(z_2 \cdot y_0) > 0$.
- Let the angle between vectors $y_0$ and $y_i$ be $\theta$.
- $(z_1 \cdot y_i)(z_1 \cdot y_0) > 0$ if $z_1$ falls within the region of angle $(\pi - \theta)$ or the region opposite to it.
- $(z_2 \cdot y_i)(z_2 \cdot y_0) > 0$ if $z_2$ falls within the region of angle $(\pi - \theta)$ or the region opposite to it.
- $x_i = 1$ with probability $((\pi - \theta)/\pi)^2$.
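The selection probability derived above can be checked numerically. The following Monte Carlo sketch assumes three-dimensional vectors and a fixed random seed, both chosen only for illustration; the function name `selection_probability` is hypothetical.

```python
import numpy as np

def selection_probability(theta, trials=200_000, seed=0):
    """Estimate P(x_i = 1) when the angle between y_0 and y_i is `theta`."""
    rng = np.random.default_rng(seed)
    y0 = np.array([1.0, 0.0, 0.0])
    yi = np.array([np.cos(theta), np.sin(theta), 0.0])
    # Each row is a random direction; normalising is unnecessary because
    # only the signs of the dot products matter.
    z1 = rng.standard_normal((trials, 3))
    z2 = rng.standard_normal((trials, 3))
    ok1 = (z1 @ yi) * (z1 @ y0) > 0
    ok2 = (z2 @ yi) * (z2 @ y0) > 0
    return np.mean(ok1 & ok2)

theta = np.pi / 3
estimate = selection_probability(theta)
predicted = ((np.pi - theta) / np.pi) ** 2
print(estimate, predicted)
```

The estimate should agree with the closed form up to sampling noise, because each of the two independent hyperplanes keeps $y_i$ and $y_0$ on the same side with probability $(\pi - \theta)/\pi$.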

Analysis of the SDP-relaxation Algorithm

- We now complete the proof:
  - $OPT(IQP) \ge OPT(SDP) \ge E[w]$.

Simulation Methods

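- The haplotypes are used to validate the result.
- We randomly pair two haplotypes to generate a genotype.

[Figure: simulation pipeline. Haplotype data (h1, h2, ..., hm) are randomly paired into genotype data (G1 = h1 h4, G2 = h2 hm, ..., Gn = h1 h2). SDPHapInfer, HAPAR, HAPLOTYPER, and PHASE each infer a solution (for example G1 = h1 h4, G2 = h1 h2, ..., Gn = h1 h2).]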
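The data generation step can be sketched as follows. This is an illustration under assumptions: haplotypes are strings with one allele per locus, pairing is uniform with replacement, and the names `make_genotype` and `simulate_genotypes` are hypothetical. The original experiments were implemented in MATLAB, so this is not the authors' code.

```python
import random

def make_genotype(h1, h2):
    """Combine two haplotypes into a genotype: per locus, the sorted pair
    of alleles, so the phase (which allele came from which haplotype) is lost."""
    return ["".join(sorted(a + b)) for a, b in zip(h1, h2)]

def simulate_genotypes(haplotypes, n, seed=0):
    """Randomly pair haplotypes (with replacement) to create n genotypes.
    Returns (genotypes, true_pairs) so inferred pairs can be validated."""
    rng = random.Random(seed)
    true_pairs = [(rng.choice(haplotypes), rng.choice(haplotypes)) for _ in range(n)]
    genotypes = [make_genotype(h1, h2) for h1, h2 in true_pairs]
    return genotypes, true_pairs

haplotypes = ["ATG", "CTG", "CGA", "AGA"]
genotypes, true_pairs = simulate_genotypes(haplotypes, n=5)
for g, pair in zip(genotypes, true_pairs):
    print(g, "<-", pair)
```

Keeping `true_pairs` alongside the genotypes is what allows an inferred solution to be scored against the haplotypes that actually generated the data.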

Results

- We prove that SDPHapInfer gives an O(log n)-approximation with high probability, where n is the number of genotypes.
- We implement SDPHapInfer in MATLAB.
- We compare the number of haplotypes found by different methods on simulated data sets.

Experimental Results (1/2)

[Figure: error rate versus number of genotypes, on 100 simulated data sets of 10 haplotypes with 20 SNPs]

The Challenge

- Inferring haplotypes for long genotypes is still a challenging problem.
- Existing methods are forced to
  - partition the genotypes into small segments,
  - infer haplotypes in each segment,
  - and concatenate the inferred haplotypes to construct a final solution.

The First Application of SDP on Approximation Algorithms

- A 0.878 randomized approximation algorithm for the MAXCUT problem was developed using the SDP relaxation technique.
  - The LP relaxation can only achieve a 0.5 approximation ratio.
  - An upper bound has been shown to be 0.941.
- Goemans, M. and Williamson, D., ACM STOC 1994.

The MAXCUT Problem

- Given an undirected graph with n nodes, $G = \{x_1, x_2, \ldots, x_n\}$, find a cut that maximizes the number of edges on the cut.
- Let $x_i$ be 1 if the vertex is on one side of the cut, and -1 if the vertex is on the other side of the cut.

[Figure: example graph whose vertices are labeled 1 or -1 according to their side of the cut]

Integer Quadratic Programming

- Define $a_{ij}$ to be 1 if the edge $(x_i, x_j)$ exists and 0 otherwise.

[Figure: example graph on vertices x1, x2, x3, x4]

- Relax the integer constraint on $x_i$ so that it becomes a unit-length vector in dimension m.
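The quadratic program itself is not shown in this transcript. The standard Goemans-Williamson formulation, which matches the definitions of $x_i$ and $a_{ij}$ above, is given here as a reference sketch; the vector symbols $v_i$ follow the notation used on the following slides.

```latex
\begin{aligned}
\text{IQP:}\quad & \max \sum_{i<j} a_{ij}\,\frac{1 - x_i x_j}{2}
  && \text{s.t. } x_i \in \{-1, 1\}, \\
\text{Relaxation:}\quad & \max \sum_{i<j} a_{ij}\,\frac{1 - v_i \cdot v_j}{2}
  && \text{s.t. } \lVert v_i \rVert = 1 .
\end{aligned}
```

The term $(1 - x_i x_j)/2$ equals 1 when $x_i$ and $x_j$ have opposite signs and 0 otherwise, so the sum counts the edges crossing the cut.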

Semidefinite Programming Formulation

[Figure: example graph on vertices x1, x2, x3, x4]

- Let $X = (v_1, v_2, \ldots, v_n)^T \cdot (v_1, v_2, \ldots, v_n)$.

Randomized Rounding Method

- Once X is found, perform Cholesky decomposition to obtain the vector solutions $v_1, v_2, \ldots, v_n$.
- Pick a random unit vector $r$ and
  - set $x_i = 1$ if $v_i \cdot r \ge 0$,
  - set $x_i = -1$ if $v_i \cdot r < 0$.
- Note that $\cos\theta = v_i \cdot v_j$.
- The edge $(v_i, v_j)$ is on the cut iff $(v_i \cdot r)$ and $(v_j \cdot r)$ have different signs.

[Figure: vectors v_i and v_j separated by angle θ, with random vector r]
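The rounding rule for MAXCUT can be sketched as below. No SDP solver is involved: the unit vectors in `V` are hand-picked for a 4-cycle as an assumed stand-in for a real SDP solution, and the function names `hyperplane_round` and `cut_size` are hypothetical.

```python
import numpy as np

def hyperplane_round(V, rng):
    """Assign +1/-1 to each column vector of V by the side of a random
    hyperplane (normal r) on which it falls."""
    r = rng.standard_normal(V.shape[0])
    r /= np.linalg.norm(r)
    return np.where(r @ V >= 0, 1, -1)

def cut_size(adjacency, x):
    """Count edges whose endpoints received different labels."""
    n = len(x)
    return sum(adjacency[i][j] for i in range(n) for j in range(i + 1, n) if x[i] != x[j])

# 4-cycle x1-x2-x3-x4-x1; hand-picked unit vectors with adjacent
# vertices antipodal (an assumed stand-in for a real SDP solution).
adjacency = np.array([[0, 1, 0, 1],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [1, 0, 1, 0]])
V = np.array([[1.0, -1.0, 1.0, -1.0],
              [0.0, 0.0, 0.0, 0.0]])
x = hyperplane_round(V, np.random.default_rng(0))
print(x, cut_size(adjacency, x))
```

Because adjacent vertices have antipodal vectors in this toy input, any hyperplane that does not contain them separates every edge, so the rounding recovers the maximum cut of the 4-cycle.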

Analysis

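- Denote by C the size of the cut found by the above algorithm.
- The probability that each edge $(x_i, x_j)$ is in the solution is $\theta / \pi$, where $\theta = \arccos(v_i \cdot v_j)$.

[Figure: vectors v_i and v_j separated by angle θ, with random vector r]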
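The 0.878 ratio quoted for MAXCUT comes from comparing, edge by edge, the probability of being cut ($\theta/\pi$) with that edge's contribution to the relaxed objective, $(1 - \cos\theta)/2$. The worst case of this ratio over all angles can be found numerically; the grid search below is a simple sketch, with the grid resolution chosen arbitrarily.

```python
import numpy as np

# Ratio between the probability that an edge is cut, theta/pi, and that
# edge's contribution to the relaxed objective, (1 - cos(theta))/2.
theta = np.linspace(1e-3, np.pi, 1_000_000)
ratio = (theta / np.pi) / ((1.0 - np.cos(theta)) / 2.0)

alpha = ratio.min()
theta_star = theta[ratio.argmin()]
print(round(alpha, 4), round(theta_star, 3))
```

Since every edge is cut with probability at least `alpha` times its relaxed contribution, the expected cut size is at least `alpha` times the SDP optimum, and therefore at least `alpha` times the true maximum cut.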

Analysis

- The randomized rounding partitions the nodes by a hyperplane.

[Figure: hyperplane with normal vector r separating the nodes labeled 1 from the nodes labeled -1]

Linear Algebra Background

- A symmetric $n \times n$ matrix A is positive semidefinite iff any of the following equivalent conditions holds:
  - $x^T A x \ge 0$ for every $x \in R^n$.
  - $A = B^T B$ for some $m \times n$ matrix B.
  - All the eigenvalues of A are non-negative.
- The inner product of symmetric matrices A and B is $A \bullet B = \sum_{i,j} a_{ij} b_{ij}$.
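These characterizations are easy to check numerically. The sketch below builds a positive semidefinite matrix as $B^T B$ from a random $B$ and tests the other two conditions. The matrix sizes, the seed, and the use of the entrywise (Frobenius) inner product for symmetric matrices are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 4))   # any m x n matrix
A = B.T @ B                       # n x n, positive semidefinite by construction

# 1. All eigenvalues are non-negative (up to rounding error).
eigenvalues = np.linalg.eigvalsh(A)

# 2. x^T A x >= 0 for a batch of random vectors x.
xs = rng.standard_normal((1000, 4))
quadratic_forms = np.einsum("ki,ij,kj->k", xs, A, xs)

# 3. Inner product of symmetric matrices: sum of entrywise products,
#    which equals trace(A^T C).
C = np.eye(4)
inner = np.sum(A * C)

print(eigenvalues.min() >= -1e-9,
      quadratic_forms.min() >= -1e-9,
      np.isclose(inner, np.trace(A.T @ C)))
```

Small negative eigenvalues on the order of machine precision can appear for rank-deficient matrices, which is why the checks use a tolerance instead of an exact comparison with zero.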


Copyright 2017 CrystalGraphics, Inc. — All rights Reserved. PowerShow.com is a trademark of CrystalGraphics, Inc.


The PowerPoint PPT presentation: "Single Nucleotide Polymorphisms" is the property of its rightful owner.
