A PARALLEL BISECTION ALGORITHM (WITHOUT COMMUNICATION)

Provided by: ccam1
Learn more at: https://www.cse.psu.edu


1
A PARALLEL BISECTION ALGORITHM (WITHOUT
COMMUNICATION)
  • Rui Ralha
  • DMAT, CMAT
  • Univ. do Minho
  • Portugal
  • r_ralha_at_math.uminho.pt

2
Acknowledgements
  • CMAT
  • FCT
  • POCTI (European Union contribution)
  • Prof. B. Parlett

3
Outline
  • Counting eigenvalues of symmetric tridiagonals
  • The ScaLAPACK routine
  • A parallel algorithm without communication
  • An alternative algorithm
  • Some conclusions

4
Counting eigenvalues
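Count(x), the number of eigenvalues of a symmetric tridiagonal T that are smaller than x, is the basic tool of bisection. A standard way to evaluate it uses the signs of the pivots of T - xI = LDL^T (Sylvester's law of inertia). The following is a minimal Python sketch assuming that recurrence, not necessarily the exact variant discussed in the talk; the names `count`, `a` (diagonal) and `b` (off-diagonal) are ours. Evaluated in floating point, this count is not guaranteed to be monotonic in x.

```python
import sys


def count(a, b, x):
    """Count(x): number of eigenvalues of the symmetric tridiagonal matrix T
    (diagonal a[0..n-1], off-diagonal b[0..n-2]) that are smaller than x.

    By Sylvester's law of inertia this equals the number of negative pivots
    d_i in the factorization T - x*I = L D L^T, which satisfy
        d_0 = a_0 - x,   d_i = (a_i - x) - b_{i-1}^2 / d_{i-1}.
    """
    negatives, d = 0, 1.0
    for i in range(len(a)):
        d = (a[i] - x) - (b[i - 1] * b[i - 1] / d if i > 0 else 0.0)
        if d == 0.0:
            # Replace an exact zero pivot by a tiny negative number so that
            # the recurrence can continue without dividing by zero.
            d = -sys.float_info.min
        if d < 0.0:
            negatives += 1
    return negatives


# tridiag(-1, 2, -1) of order 3 has eigenvalues 2 - sqrt(2), 2 and 2 + sqrt(2).
print(count([2.0] * 3, [-1.0] * 2, 2.5))  # 2
```

Each evaluation costs O(n) operations, which is why bisection on a tridiagonal matrix is cheap per step.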
5
Nonmonotonicity of Count(x)
6
The ScaLAPACK implementation (1)
7
The ScaLAPACK implementation (2)
  • In [1] the authors wrote:
  • "Ideally, we would like a bracketing algorithm
    that was simultaneously parallel, load balanced,
    devoid of communication, and correct in the face
    of nonmonotonicity. We still do not know how to
    achieve this completely. In the most general
    case, when different parallel processors do not
    even possess the same floating point format, we
    do not know how to implement a correct and
    reasonably fast algorithm at all. Even when
    floating point formats are the same, we do not
    know how to avoid some global communication."
  • They also considered a bracketing algorithm to
    be correct if
  • (1) every eigenvalue is computed exactly once,
  • (2) the computed eigenvalues are correct to
    within the user-specified error tolerance, and
  • (3) the computed eigenvalues are in sorted order.
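The three criteria can be stated as a check on the output of a bracketing run. A minimal Python sketch, assuming the true eigenvalues are known (as for test matrices with a closed-form spectrum) and that results arrive as (index, value) pairs with 1-based indices; `is_correct` and its arguments are hypothetical names, not part of ScaLAPACK.

```python
def is_correct(computed, reference, tol):
    """Check the three correctness criteria quoted above.

    computed  -- (index, value) pairs as returned by all processors,
                 with 1-based eigenvalue indices
    reference -- the true eigenvalues in ascending order (assumed known)
    tol       -- user-specified absolute error tolerance
    """
    n = len(reference)
    # (1) every eigenvalue is computed exactly once
    if sorted(k for k, _ in computed) != list(range(1, n + 1)):
        return False
    # (2) every computed value is within tol of the eigenvalue it claims to be
    if any(abs(v - reference[k - 1]) > tol for k, v in computed):
        return False
    # (3) taken in index order, the computed values are sorted
    values = [v for _, v in sorted(computed)]
    return all(values[i] <= values[i + 1] for i in range(n - 1))


ref = [1.0, 1.1, 3.0]
print(is_correct([(1, 1.0), (2, 1.1), (3, 3.0)], ref, 0.2))            # True
print(is_correct([(1, 1.0), (2, 1.1), (2, 1.1), (3, 3.0)], ref, 0.2))  # False: index 2 twice
print(is_correct([(1, 1.15), (2, 1.05), (3, 3.0)], ref, 0.2))          # False: accurate but unsorted
```

The last call shows that criterion (3) is independent of (1) and (2): two close eigenvalues can both be within tolerance and still come out in the wrong order.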

8
The ScaLAPACK implementation (3)
9
The ScaLAPACK implementation (4)
10
Drawbacks of the ScaLAPACK implementation
11
A simple and incorrect parallel algorithm
(without communication)
  • Partition the initial Gerschgorin interval
    into p subintervals of equal width and assign
    to processor i the task of finding all the
    eigenvalues in the ith subinterval.
  • But even when all processors use the same
    (nonmonotonic) arithmetic, the algorithm may be
    incorrect.
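  • For example, with n = p = 3, it may happen that
    Count(x1) = 2 > Count(x2) = 1, although x1 < x2
    are the two interior partition points.
  • Therefore, the second eigenvalue will be computed
    twice (by processors 1 and 3).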
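The failure can be reproduced with a small Python sketch of the naive index assignment: processor i takes the eigenvalues with indices Count(x_{i-1}) + 1 through Count(x_i). The function names and the count values below are hypothetical; the values are chosen so that Count at a larger point is smaller, which is what nonmonotonic arithmetic can produce.

```python
def naive_assignment(count, lo, hi, n, p):
    """Eigenvalue indices each processor would compute in the naive scheme.

    [lo, hi] (the Gerschgorin interval) is split into p subintervals of equal
    width; processor i handles the indices Count(x_{i-1}) + 1 .. Count(x_i),
    where x_0 = lo, ..., x_p = hi are the partition points.
    """
    width = (hi - lo) / p
    points = [lo + i * width for i in range(p + 1)]
    # With identical arithmetic, neighbouring processors obtain the same value
    # at a shared point, so each interior point is evaluated once here.
    # Count(lo) = 0 and Count(hi) = n are known without evaluation.
    counts = [0] + [count(x) for x in points[1:-1]] + [n]
    return [list(range(counts[i] + 1, counts[i + 1] + 1)) for i in range(p)]


def nonmonotonic_count(x):
    # Hypothetical values: Count(1.0) = 2 but Count(2.0) = 1, although 1.0 < 2.0.
    return {1.0: 2, 2.0: 1}[x]


print(naive_assignment(nonmonotonic_count, 0.0, 3.0, n=3, p=3))
# [[1, 2], [], [2, 3]]: eigenvalue 2 is computed by processors 1 and 3
```

Four eigenvalues are returned for a matrix of order 3, so criterion (1) is violated even though no processor made an error relative to its own arithmetic.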

12
Parallel bisection for computing the eigenvalues
of tridiag(-1, 2, -1) with 100 processors
13
Our proposal (1)
14
Our proposal (2)
15
Our proposal (3)
16
Our proposal (4)
17
Sorting eigenvalues
  • For Wilkinson's matrix of order 21 we have
  • With single precision in Matlab we get
  • With double precision we get
  • We assume that the eigenvalues are to be gathered
    in a master processor (this is a standard feature
    of ScaLAPACK). Suppose that the master receives
      (out of order) and knows that the processor
    that computed   has better accuracy. Then it
    keeps   and, if required, corrects   to be
    smaller than  .
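The master's correction step could look like the following minimal Python sketch (Python 3.9+ for `math.nextafter`). It is a guess at the general idea under our own assumptions, not the author's procedure: two consecutive eigenvalue approximations arrive out of order, the master knows an error bound for each, keeps the more accurate one and nudges the other just past it. `fix_pair` and its arguments are hypothetical names.

```python
import math


def fix_pair(lower, upper, err_lower, err_upper):
    """Put approximations to two consecutive eigenvalues, lambda_i <= lambda_{i+1},
    back in ascending order, keeping the more accurate one unchanged.

    err_lower and err_upper are error bounds assumed known to the master
    (for instance from the precision each processor used).
    """
    if lower <= upper:
        return lower, upper  # already sorted: nothing to correct
    if err_upper <= err_lower:
        # Keep the more accurate upper value; move lower just below it.
        return math.nextafter(upper, -math.inf), upper
    # Keep the more accurate lower value; move upper just above it.
    return lower, math.nextafter(lower, math.inf)


# Less accurate lower value received out of order with an accurate upper value:
print(fix_pair(2.0000001, 2.0, 1e-6, 1e-12))  # (1.9999999999999998, 2.0)
```

Because the true eigenvalues satisfy lambda_i <= lambda_{i+1}, moving the less accurate value next to the more accurate one keeps it within its own error bound (up to one unit in the last place), provided the bounds are valid, so order is restored without losing the requested accuracy.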

18
An alternative algorithm (1)
  • Phase 1 (identical on every processor): carry
    out a (not too large) number of bisection steps
    in a breadth-first search to get a good picture
    of the spectrum. This produces a number of
    intervals (at least p, the number of processors).
  • Phase 2: distribute the intervals to processors,
    trying to achieve load balance (the same number
    of eigenvalues for each processor).
  • Phase 3: each processor computes its assigned
    eigenvalues to some prescribed accuracy.
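The three phases can be simulated sequentially in Python. This is a minimal sketch under our own assumptions, not the author's Matlab implementation: Phase 1 runs a fixed number `levels` of breadth-first bisection levels, Phase 2 assigns each interval to a processor according to how many eigenvalues lie to its left, and Phase 3 uses plain bisection. All function names are hypothetical. Because Phase 1 is identical on every processor, all processors agree on the intervals and their counts without exchanging messages.

```python
import sys


def count(a, b, x):
    """Eigenvalues of the symmetric tridiagonal T below x, from the signs
    of the pivots of T - x*I = L D L^T (a zero pivot counts as negative)."""
    negatives, d = 0, 1.0
    for i in range(len(a)):
        d = (a[i] - x) - (b[i - 1] * b[i - 1] / d if i > 0 else 0.0)
        if d == 0.0:
            d = -sys.float_info.min  # safeguard against division by zero
        if d < 0.0:
            negatives += 1
    return negatives


def gerschgorin(a, b):
    """Interval [lo, hi] that contains every eigenvalue of T."""
    n = len(a)
    r = [(abs(b[i - 1]) if i > 0 else 0.0) + (abs(b[i]) if i < n - 1 else 0.0)
         for i in range(n)]
    return min(a[i] - r[i] for i in range(n)), max(a[i] + r[i] for i in range(n))


def phase1(a, b, levels):
    """Breadth-first bisection, run identically by every processor.
    Returns ascending intervals (l, h, cl, ch) holding eigenvalues cl+1..ch."""
    lo, hi = gerschgorin(a, b)
    frontier = [(lo, hi, 0, len(a))]
    for _ in range(levels):
        nxt = []
        for (l, h, cl, ch) in frontier:
            if ch - cl <= 1:  # at most one eigenvalue: keep as is
                nxt.append((l, h, cl, ch))
                continue
            m = 0.5 * (l + h)
            # Clamp so that counts stay consistent even if Count(x) is nonmonotonic.
            cm = min(max(count(a, b, m), cl), ch)
            nxt += [(l, m, cl, cm), (m, h, cm, ch)]
        frontier = [iv for iv in nxt if iv[3] > iv[2]]  # drop empty intervals
    return frontier


def bisect_kth(a, b, k, l, h, tol):
    """Bisection for the k-th smallest eigenvalue, assuming count(l) < k <= count(h)."""
    while h - l > tol:
        m = 0.5 * (l + h)
        if m <= l or m >= h:  # interval can no longer be split in floating point
            break
        if count(a, b, m) >= k:
            h = m
        else:
            l = m
    return 0.5 * (l + h)


def parallel_bisection(a, b, p, levels=6, tol=1e-12):
    n = len(a)
    intervals = phase1(a, b, levels)                                  # Phase 1
    owner = [min(p - 1, cl * p // n) for (_, _, cl, _) in intervals]  # Phase 2
    results = []
    for rank in range(p):  # Phase 3: one loop pass per simulated processor
        mine = [iv for iv, o in zip(intervals, owner) if o == rank]
        results.append([bisect_kth(a, b, k, l, h, tol)
                        for (l, h, cl, ch) in mine for k in range(cl + 1, ch + 1)])
    return results


res = parallel_bisection([2.0] * 8, [-1.0] * 7, p=4)
print([len(r) for r in res])  # [2, 2, 2, 2]
```

For tridiag(-1, 2, -1) of order 8, six levels of Phase 1 isolate every eigenvalue, so each of the four simulated processors computes two of them. With a coarser Phase 1, an interval holding several eigenvalues cannot be split between processors, which is the load-balance concern behind the Phase 1 termination criterion.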

19
An alternative algorithm (2)

20
An alternative algorithm (3)
  • Preliminary implementation (in Matlab)
  • Phase 1 finishes when enough intervals have been
    produced such that, for each k = 1, …, p-1, an
    endpoint x of one of those intervals satisfies
  • This may affect the speedup by 10.
  • This termination criterion for Phase 1 may be
    hard to satisfy (i.e., it may take too many
    bisection steps) in some cases.

21
Parallel bisection for computing the eigenvalues
of tridiag(-1, 2, -1) of order 104
22
Conclusions
  • Parallel bracketing in ScaLAPACK requires
    global communication
  • We have proposed an algorithm that is
    communication-free and load balanced, in the
    sense that each processor computes the same
    number of eigenvalues (if p divides n)
  • In homogeneous systems, our algorithm produces
    sorted eigenvalues even when the arithmetic is
    nonmonotonic
  • In heterogeneous systems, eigenvalues may be
    unsorted (they may be sorted by the master if
    required)