
Single nucleotide polymorphisms

- Usman Roshan

SNPs

- DNA sequence variations that occur when a single nucleotide is altered.
- Must be present in at least 1% of the population to be a SNP.
- Occur every 100 to 300 bases along the 3-billion-base human genome.
- Many have no effect on cell function, but some could affect disease risk and drug response.

Toy example

SNPs on the chromosome

Perl exercise

- Determining SNPs from a pairwise genome alignment
- Can we solve this problem with a Perl script?
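
One way to approach this exercise is sketched below, in Python rather than Perl. It assumes the alignment is given as two equal-length strings with "-" marking gaps, and the function name find_snps is hypothetical. A mismatch between only two genomes is a candidate SNP; confirming it would still need population frequency data.

```python
def find_snps(seq_a, seq_b, gap="-"):
    """Return (position, base_a, base_b) for each mismatching aligned column."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    snps = []
    for pos, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        # Gap columns are insertions/deletions, not single-base substitutions.
        if a == gap or b == gap:
            continue
        if a != b:
            snps.append((pos, a, b))
    return snps


# Positions are 0-based indices into the alignment.
print(find_snps("ACGTTA-GC", "ACCTTAAGT"))  # [(2, 'G', 'C'), (8, 'C', 'T')]
```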

Bi-allelic SNPs

- Most SNPs have one of two nucleotides at a given position
- For example
- A/G denotes the varying nucleotide as either A or G. We call each of these an allele
- Most SNPs have two alleles (bi-allelic)

Perl exercise

- Determining SNP type from a multiple genome alignment.
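
A possible sketch for this exercise, again in Python rather than Perl. It assumes the multiple alignment is a list of equal-length strings with "-" for gaps, and reports the set of alleles seen in each variable column (for example "A/G"). The function name snp_types is hypothetical.

```python
def snp_types(alignment, gap="-"):
    """Map each variable column index to its alleles, e.g. {2: 'A/G'}."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    result = {}
    # zip(*alignment) walks the alignment column by column.
    for pos, column in enumerate(zip(*alignment)):
        alleles = sorted({base.upper() for base in column} - {gap})
        if len(alleles) > 1:
            result[pos] = "/".join(alleles)
    return result


rows = ["ACGTA", "ACATA", "ACGTC", "ACGTT"]
print(snp_types(rows))  # {2: 'A/G', 4: 'A/C/T'}
```

Columns with exactly two alleles correspond to bi-allelic SNPs; a column such as position 4 above has three.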

SNP genotype

- We inherit two copies of each chromosome (one from each parent)
- For a given SNP the genotype defines the type of alleles we carry
- Example: for the SNP A/G, one's genotype may be
- AA if both copies of the chromosome have A
- GG if both copies of the chromosome have G
- AG or GA if one copy has A and the other has G
- The first two cases are called homozygous and the latter two (AG, GA) are heterozygous

SNP genotyping

Perl exercise

- SNP encoding
- Convert SNP genotype from a character sequence to a numeric one
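
The slides do not say which numeric encoding is intended. A common choice, assumed here, is to count the copies of one chosen allele, so that for the SNP A/G with G counted, AA becomes 0, AG and GA become 1, and GG becomes 2. The sketch is in Python rather than Perl, and encode_genotypes is a hypothetical name.

```python
def encode_genotypes(genotypes, counted_allele):
    """Encode each two-letter genotype as the number of copies of counted_allele."""
    allele = counted_allele.upper()
    return [genotype.upper().count(allele) for genotype in genotypes]


print(encode_genotypes(["AA", "AG", "GA", "GG"], "G"))  # [0, 1, 1, 2]
```

Under this encoding the homozygous genotypes map to 0 and 2, and both heterozygous orderings map to 1.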

Real SNPs

- SNP consortium snp.cshl.org
- SNPedia www.snpedia.com

Application of SNPs: association with disease

- Experimental design to detect cancer-associated SNPs
- Pick random humans with and without cancer (say breast cancer)
- Perform SNP genotyping
- Look for associated SNPs
- Also called genome-wide association study

Case-control example

- Study of 100 people
- Case: 50 subjects with cancer
- Control: 50 subjects without cancer
- Count number of dominant and recessive alleles and form a contingency table

Perl exercise

- Contingency table
- Compute contingency table given case and control SNP genotype data
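
A rough sketch of this exercise in Python rather than Perl. The slides do not specify whether counts are per allele copy or per subject; this version counts allele copies, so each subject contributes two. Which allele is recessive is passed in by the caller, and the function name contingency_table is hypothetical.

```python
def contingency_table(case_genotypes, control_genotypes, recessive_allele):
    """Return ((a, b), (c, d)): recessive and dominant allele counts
    for cases (a, b) and controls (c, d)."""
    def count(genotypes):
        recessive = sum(g.count(recessive_allele) for g in genotypes)
        total = sum(len(g) for g in genotypes)
        # Every allele copy that is not recessive is counted as dominant.
        return recessive, total - recessive

    return count(case_genotypes), count(control_genotypes)


cases = ["GG", "AG", "AA"]      # made-up genotypes for illustration only
controls = ["AA", "AA", "AG"]
print(contingency_table(cases, controls, "G"))  # ((3, 3), (1, 5))
```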

Odds ratio

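- Here a and b are the recessive and dominant counts in cancer; c and d are the same counts in no-cancer
- Odds of recessive in cancer = a/b = e
- Odds of recessive in no-cancer = c/d = f
- Odds ratio of recessive in cancer vs no-cancer = e/f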

Risk ratio (Relative risk)

- Probability of recessive in cancer = a/(a+b) = e
- Probability of recessive in no-cancer = c/(c+d) = f
- Risk ratio of recessive in cancer vs no-cancer = e/f

Odds ratio vs Risk ratio

- Risk ratio has a natural interpretation since it is based on probabilities
- In a case-control model we cannot calculate the probability of cancer given recessive allele. Subjects are chosen based on disease status and not allele type
- Odds ratio shows up in logistic regression models

Example

- Odds of recessive in case = 15/35
- Odds of recessive in control = 2/48
- Odds ratio of recessive in case vs control = (15/35)/(2/48) ≈ 10.3
- Risk of recessive in case = 15/50
- Risk of recessive in control = 2/50
- Risk ratio of recessive in case vs control = 15/2 = 7.5
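
The arithmetic in this example can be checked with a short Python sketch. It assumes the counts are a = 15 and b = 35 (recessive and dominant in case) and c = 2 and d = 48 (recessive and dominant in control), matching the fractions above. The function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """(a/b) / (c/d): odds of recessive in case over odds in control."""
    return (a / b) / (c / d)


def risk_ratio(a, b, c, d):
    """(a/(a+b)) / (c/(c+d)): probability of recessive in case over control."""
    return (a / (a + b)) / (c / (c + d))


print(round(odds_ratio(15, 35, 2, 48), 1))  # 10.3
print(round(risk_ratio(15, 35, 2, 48), 1))  # 7.5
```

Both functions divide by b, c, or d, so a zero in any of those cells would need separate handling.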

Odds ratios in genome-wide association studies

- Higher odds ratio means stronger association
- Therefore SNPs with the highest odds ratios should be used as predictors or risk estimators of disease
- Odds ratio generally higher than risk ratio
- Both are similar when small

Statistical test of association (P-values)

- P-value: probability of the observed data (or more extreme data) under the null hypothesis
- Example
- Suppose we are given a series of coin tosses
- We feel that a biased coin produced the tosses
- We can ask the following question: what is the probability that a fair coin would produce tosses like these?
- If this probability is very small then we can say there is a small chance that a fair coin produced the observed tosses.
- In this example the null hypothesis is the fair coin and the alternative hypothesis is the biased coin
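
The coin example can be made concrete with an exact binomial tail probability. The Python sketch below assumes a one-sided test in which "more extreme" means "at least as many heads as observed". The toss counts are invented for illustration and the function name is hypothetical.

```python
from math import comb


def p_value_at_least(heads, tosses, p_heads=0.5):
    """P(number of heads >= heads) in `tosses` tosses under the null hypothesis
    that each toss lands heads with probability p_heads."""
    return sum(
        comb(tosses, k) * p_heads**k * (1 - p_heads) ** (tosses - k)
        for k in range(heads, tosses + 1)
    )


# 9 heads in 10 tosses: (C(10,9) + C(10,10)) / 2**10 = 11/1024
print(p_value_at_least(9, 10))  # about 0.0107
```

A value this small says such a run would be rare for a fair coin, which is the sense in which the data count as evidence against the null hypothesis.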