Parallel Genehunter: Implementation of a linkage analysis package for distributed memory architectures

Provided by: Csu48
Learn more at: http://www.cs.umd.edu
1
Parallel Genehunter: Implementation of a linkage
analysis package for distributed memory
architectures
  • Michael Moran
  • CMSC 838T Presentation
  • May 9, 2003

2
Introduction
  • Goals
  • Link genes to specific loci in the genome
  • Decrease time and memory requirements through
    parallelization
  • Motivation
  • Locate genes for specific phenotypes
  • Test for inherited diseases and risk factors
  • Gene therapy

3
Talk Overview
  • Introduction
  • Talk Overview
  • Genetic Linkage Problem
  • Previous Work
  • Parallel Genehunter
  • Evaluation
  • Observations

4
Genetic Linkage Problem
  • Sexual Reproduction
  • Offspring created by two haploid gametes
  • Gametes are produced from diploid/polyploid cells
    during meiosis

www.blc.arizona.edu/courses/181gh/rick/genetics1/
5
Genetic Linkage Problem
  • Recombination occurs in two ways
  • Random segregation of chromatids
  • 2 x 23 human chromosomes => $2^{23}$ possible
    haploid combinations
  • Genes on different chromosomes recombine with
    probability 1/2

www.gen.umn.edu/faculty_staff/hatch/1131/
6
Genetic Linkage Problem
  • Recombination occurs in two ways
  • Random segregation of chromatids
  • Crossover between homologous pairs of
    chromosomes
  • Genes on the same chromosome recombine with a
    probability that depends on their distance and
    location on the chromosome

7
Genetic Linkage Problem
  • Given
  • This model of recombination
  • Data for a particular pedigree (family)
  • Phenotype information for each individual
  • Genetic markers for each individual
  • Recombination frequencies for each pair of
    markers
  • Can we apply probabilistic methods to:
  • Reconstruct the inheritance patterns?
  • Link phenotypes to the markers?

8
Previous Work
  • Fisher, Haldane, Smith, Morton (1935 - 1955)
  • Methods to infer genetic maps using maximum
    likelihood estimators
  • Elston, Stewart (1971)
  • Genetic Linkage Algorithm
  • Linear in pedigree size
  • Exponential in number of markers
  • Lander, Green (1987)
  • Genetic Linkage Algorithm
  • Linear in number of markers
  • Exponential in pedigree size

9
Previous Work
  • Genehunter (2001)
  • Implementation of the Lander-Green algorithm
  • Analyzes a pedigree containing n non-founders
  • The inheritance of a gene by one non-founder
    can be summarized by two bits
  • The entire pedigree's inheritance pattern can
    be summarized by 2n bits
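
The two-bits-per-non-founder encoding can be sketched as follows. This is a minimal illustration, not Genehunter's actual data layout: the bit ordering (paternal bit at position 2k, maternal bit at 2k+1) and the function names are assumptions made for the example.

```python
def encode(meioses):
    """Pack one (paternal_bit, maternal_bit) pair per non-founder
    into a single integer inheritance vector of 2n bits.
    Bit layout is a hypothetical choice for this sketch."""
    v = 0
    for k, (pat, mat) in enumerate(meioses):
        v |= (pat & 1) << (2 * k)        # which grandparental allele came via the father
        v |= (mat & 1) << (2 * k + 1)    # which grandparental allele came via the mother
    return v


def decode(v, n):
    """Unpack an inheritance vector back into n (paternal, maternal) bit pairs."""
    return [((v >> (2 * k)) & 1, (v >> (2 * k + 1)) & 1) for k in range(n)]


def num_patterns(n):
    """Number of distinct inheritance vectors for n non-founders: 2^(2n)."""
    return 1 << (2 * n)
```

With n = 2 non-founders there are 16 possible inheritance vectors, which is why the per-marker probability vectors grow so quickly with pedigree size.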

10
Previous Work
  • 3 steps of Genehunter
  • Step 1: For each marker, calculate the
    probability of each of the possible inheritance
    patterns
  • Store probabilities in a vector of size $2^{2n}$

0 = grandfather's chromatid, 1 = grandmother's chromatid
Pr(0,0) = .5, Pr(0,1) = .5, Pr(1,0) = 0, Pr(1,1) = 0
11
Previous Work
  • 3 steps of Genehunter
  • Step 2: For each marker, calculate the
    conditional probability of each inheritance
    pattern, conditional on all of the markers to
    the left and to the right
  • For two markers' inheritance vectors, each
    disagreeing bit requires a crossover event
  • The probability of transitioning between
    inheritance vectors i, j differing in d bits is
    $\theta^{d} (1-\theta)^{2n-d}$, where $\theta$ is
    the recombination frequency between the two
    markers
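
As a rough sketch of this transition probability, assume the standard Lander-Green form: each of the d disagreeing bits costs a factor theta (the recombination frequency between the two markers) and each of the 2n - d agreeing bits a factor 1 - theta. The function names are illustrative only.

```python
def hamming_distance(i, j):
    """Number of bit positions in which inheritance vectors i and j disagree."""
    return bin(i ^ j).count("1")


def transition_probability(i, j, n, theta):
    """Probability of moving from inheritance vector i to j between two
    markers with recombination frequency theta, for n non-founders.
    Assumes independent crossovers in each of the 2n meioses."""
    d = hamming_distance(i, j)
    return theta ** d * (1.0 - theta) ** (2 * n - d)
```

For a fixed starting vector, the probabilities over all $2^{2n}$ destination vectors sum to 1, which is a quick sanity check on the formula.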

12
Previous Work
  • 3 steps of Genehunter
  • Step 2: For each marker, calculate the
    conditional probability of each inheritance
    pattern, conditional on all of the markers to
    the left and to the right
  • $M_{i,j}$: cost of transitioning between
    inheritance vectors i and j
  • $P_1$, $P_2$: probability vectors for every
    inheritance pattern given markers 1 and 2,
    respectively
  • $P_{2|1} = P_2 \cdot (M P_1)$
  • Calculate the probabilities of each marker's
    inheritance conditional on all others by Markov
    chain or FFT convolution
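
The update $P_{2|1} = P_2 \cdot (M P_1)$ can be sketched in two ways: with the dense transition matrix, and with a transform-based convolution. Several assumptions are made here and are not confirmed by the slides: the product with $P_2$ is taken element by element, the result is left unnormalized, $M_{i,j}$ depends only on the bits of `i XOR j`, and the transform used is the Walsh-Hadamard transform (the Fourier transform for XOR-indexed vectors). All function names are hypothetical.

```python
def popcount(x):
    return bin(x).count("1")


def kernel(n, theta):
    """k[d] = theta^|d| * (1 - theta)^(2n - |d|), so that M[i][j] = k[i ^ j]."""
    return [theta ** popcount(d) * (1 - theta) ** (2 * n - popcount(d))
            for d in range(1 << (2 * n))]


def condition_dense(p1, p2, n, theta):
    """P2 * (M P1) using the explicit 2^(2n) x 2^(2n) matrix-vector product."""
    k = kernel(n, theta)
    size = len(p1)
    mp1 = [sum(k[i ^ j] * p1[j] for j in range(size)) for i in range(size)]
    return [a * b for a, b in zip(p2, mp1)]


def wht(vec):
    """Unnormalized fast Walsh-Hadamard transform (length must be a power of 2)."""
    a = list(vec)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for idx in range(start, start + h):
                x, y = a[idx], a[idx + h]
                a[idx], a[idx + h] = x + y, x - y
        h *= 2
    return a


def condition_fast(p1, p2, n, theta):
    """Same result via two forward transforms, an element-by-element
    product, and one inverse transform (forward transform divided by size)."""
    size = len(p1)
    spectrum = [a * b for a, b in zip(wht(kernel(n, theta)), wht(p1))]
    mp1 = [x / size for x in wht(spectrum)]
    return [a * b for a, b in zip(p2, mp1)]
```

The dense version costs on the order of $2^{2n} \times 2^{2n}$ multiplications, while the transform version costs on the order of $2n \cdot 2^{2n}$ additions, which is the motivation for replacing the matrix-vector product.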

13
Previous Work
  • 3 steps of Genehunter
  • Step 3: For each marker, calculate the
    probability of the unknown gene being located
    at specific locations
  • Hypothesizes that the phenotype has a gene
    located at a particular location
  • By default tries 5 evenly-spaced locations
    between consecutive pairs of markers
  • Calculates $P_D$, the probabilities of each
    inheritance pattern based on this phenotype
    (as in step 1)
  • For a location $x$ between markers $i$ and
    $i+1$, $p = P_D \cdot P_{x|1 \ldots i} \cdot P_{x|i+1 \ldots m}$
  • Space Requirement
  • $O(2^{2n})$, reduced to $O(2^{2n-f})$ by
    exploiting symmetry of f founders
  • Time Requirement
  • $O(m 2^{2n})$, reduced to $O(m 2^{2n-f})$ with
    f founders
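
To give a feel for the space requirement, the sketch below estimates the size of one probability vector for a given number of bits (2n - f). It assumes 8-byte floating-point entries, which the slides do not state; the function name is illustrative.

```python
def vector_bytes(bits, bytes_per_entry=8):
    """Bytes needed for one probability vector with 2^bits entries.
    bytes_per_entry=8 assumes double precision (an assumption)."""
    return (1 << bits) * bytes_per_entry


def per_processor_bytes(bits, num_procs, bytes_per_entry=8):
    """Share of one vector held by each processor when it is split evenly."""
    return vector_bytes(bits, bytes_per_entry) // num_procs


if __name__ == "__main__":
    for bits in (19, 21, 24):
        mib = vector_bytes(bits) / 2 ** 20
        print(f"{bits} bits: {mib:.0f} MiB per vector")
```

Since each additional non-founder adds two bits, every extra individual multiplies the size of every vector by four.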

14
Parallel Genehunter
  • Approach
  • Parallelize the 3 Genehunter steps separately
  • Divides each $2^{2n}$-sized marker vector evenly
    among the P processors
  • Allows greater distribution of memory than
    assigning O(m/P) entire vectors to each processor

15
Parallel Genehunter
  • Parallelization of step 1
  • For each marker, calculate the probability of
    each of the possible inheritance patterns
  • Each processor calculates the probabilities for
    a particular set of $2^{2n} / P$ inheritance
    patterns for every marker
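
One way to picture the division of each vector is a contiguous block partition, sketched below. Whether Parallel Genehunter assigns contiguous blocks is an assumption; `local_range` is a hypothetical helper that also handles the case where P does not divide the vector length.

```python
def local_range(rank, num_procs, n):
    """Half-open index range [start, stop) of inheritance patterns owned by
    processor `rank` when a vector of length 2^(2n) is split into
    num_procs nearly equal contiguous blocks."""
    size = 1 << (2 * n)
    base, extra = divmod(size, num_procs)
    # The first `extra` ranks take one additional element each.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop
```

Each processor then evaluates the step 1 probabilities only for the patterns in its own range, for every marker, so no communication is needed in this step.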

16
Parallel Genehunter
  • Parallelization of step 2
  • For each marker, calculate the conditional
    probability of each inheritance pattern,
    conditional on all of the markers to the left
    and to the right
  • FFT convolution
  • As in serial Genehunter, the $2^{2n} \times 2^{2n}$
    matrix-vector multiplication is replaced by
    FFT-based convolution
  • 2 forward 1D FFTs on $2^{2n}$-length vectors
  • element-by-element multiplication
  • inverse FFT
  • Each 1D FFT is equivalent to a 2D FFT on a
    $P \times 2^{2n}/P$ matrix
  • There are well-known distributed algorithms for
    this FFT using all-to-all communication.
  • Dot product in $P_{2|1} = P_2 \cdot (M P_1)$
  • Trivially parallelized: each processor has the
    same portion of each vector.
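
The claim that a 1D FFT is equivalent to a 2D FFT on a P x N/P matrix can be checked with the sketch below, which uses the standard Cooley-Tukey index split with a twiddle-factor step between the two passes. It runs serially and uses a naive DFT for clarity; in a distributed setting the column pass is where all-to-all communication would occur. Names and structure are illustrative, not taken from the Parallel Genehunter code.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform, used as the reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def fft_via_2d(x, n1):
    """1D DFT of x computed as a 2D transform on an n1 x n2 matrix
    (n2 = len(x) // n1), where matrix element (r, c) = x[n2 * r + c].
    With n1 = P, each row is the block held by one processor."""
    n = len(x)
    n2 = n // n1
    # Pass 1: length-n1 DFT down each column (needs data from every row).
    cols = [dft([x[n2 * r + c] for r in range(n1)]) for c in range(n2)]
    # Twiddle factors linking the two passes.
    for c in range(n2):
        for k1 in range(n1):
            cols[c][k1] *= cmath.exp(-2j * cmath.pi * c * k1 / n)
    # Pass 2: length-n2 DFT along each row; output index is k1 + n1 * k2.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([cols[c][k1] for c in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out
```

For example, an 8-element vector treated as a 2 x 4 matrix yields the same spectrum as the direct transform, up to floating-point rounding.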

17
Parallel Genehunter
  • Parallelization of step 3
  • For each marker, calculate the probability of
    the unknown gene being located at specific
    locations
  • Computing $P_{x|1 \ldots i}$ and $P_{x|i+1 \ldots m}$
  • FFTs parallelized as in step 2
  • Final dot product $p = (P_D \cdot P_{x|1 \ldots i} \cdot P_{x|i+1 \ldots m})$
  • Parallelized as in step 2
  • Each processor holds the same portion of each
    vector
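
Because every processor holds the same index range of each vector, the final product reduces to local partial sums followed by one global reduction. The sketch below simulates this serially. It assumes that the "dot product" of the three vectors means the sum over inheritance patterns of their element-by-element product, and that P divides the vector length; the final `sum` stands in for a reduction such as an MPI all-reduce, which the slides do not name.

```python
def local_partial(pd, left, right, start, stop):
    """Partial sum over the patterns [start, stop) owned by one processor."""
    return sum(pd[v] * left[v] * right[v] for v in range(start, stop))


def parallel_triple_dot(pd, left, right, num_procs):
    """Simulate num_procs processors, each summing its own block, then
    combine the partial results (stand-in for a global reduction)."""
    size = len(pd)
    chunk = size // num_procs  # assumes num_procs divides size evenly
    partials = [local_partial(pd, left, right, r * chunk, (r + 1) * chunk)
                for r in range(num_procs)]
    return sum(partials)
```

Only a single scalar per processor crosses the network in this step, so its communication cost is negligible next to the FFTs.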


18
Evaluation
  • Experimental Environment
  • Input data sets
  • 51-family-member pedigree
  • 19-, 21-, and 24-bit data sets (bits = 2n - f)
  • Computing Facilities
  • Cplant Cluster (Sandia National Laboratories)
  • DEC Alpha EV6 processors
  • Myrinet connection

19
Evaluation
  • Runtimes for 19-, 21-, and 24-bit problems

20
Evaluation
  • Runtimes for 19-, 21-, and 24-bit problems

21
Observations
  • Pro: Performs Genehunter computation exactly
  • Pro: Effective for multipoint linkage of
    phenotypes
  • Con: Old-fashioned compared to protein-based
    methods (?)
  • Pro: Distributes memory requirements
  • Pro: More computers allow larger feasible
    inputs
  • Con: Experiments based on 1 pedigree
  • Pro: Efficient parallelization up to 32 or 64
    processors
  • Con: Allows pedigrees to grow by only 3 or 4
    individuals in equal time
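
The last point follows from the exponential cost: work grows as $2^{2n}$, so each added non-founder quadruples it. A back-of-the-envelope sketch, assuming ideal (linear) speedup and ignoring communication overhead, neither of which is guaranteed by the slides:

```python
import math


def extra_individuals(num_procs, bits_per_individual=2):
    """Additional non-founders that fit in the same wall-clock time on
    num_procs processors, assuming ideal speedup and work proportional
    to 2^(2n). Each non-founder contributes two bits."""
    return math.log2(num_procs) / bits_per_individual
```

Under these assumptions 64 processors buy 3 extra individuals and 256 processors buy 4, which matches the "3 or 4 individuals" estimate.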

22
References
  • Genetic Recombination
  • Dr. Craig Woodworth, Genetic Recombination in
    Eukaryotes, Lecture Notes,
    (www.clarkson.edu/class/by214/powerpoint)
  • Genehunter
  • K. Markianos, M.J. Daly, L. Kruglyak.
    Efficient Multipoint Linkage Analysis Through
    Reduction of Inheritance Space. American
    Journal of Human Genetics 68, 2001.
  • Parallel Genehunter
  • G. Conant, S. Plimpton, W. Old, A. Wagner, P.
    Fain, G. Heffelfinger. Parallel Genehunter:
    Implementation of a Linkage Analysis Package for
    Distributed-Memory Architectures. Proceedings of
    the First IEEE Workshop on High Performance
    Computational Biology, International Parallel and
    Distributed Processing Symposium, 2002.

23
  • Questions?