1
AHPCRC SPATIAL DATA-MINING TUTORIAL on Scalable
Parallel Formulations of Spatial Auto-Regression
(SAR) Models for Mining Regular Grid Geospatial
Data
  • Shashi Shekhar, Baris M. Kazar, David J. Lilja
  • EECS Department at University of Minnesota
  • Army High Performance Computing Research Center
    (AHPCRC)
  • Minnesota Supercomputing Institute (MSI)
  • 05.14.2003

2
Outline
  • Motivation for Parallel SAR Models
  • Background on Spatial Auto-regression Model
  • Our Contributions
  • Problem Definition and Hypotheses
  • Introduction to the SAR Software
  • Experimental Design
  • Related Work
  • Conclusions and Future Work

3
Motivation for Parallel SAR Models
  • Linear regression models assume that learning
    data samples are independent and identically
    distributed (i.i.d.)
  • For spatially auto-correlated data this
    assumption fails, so prediction accuracy is low
  • The SAR model generalizes the linear regression
    model with an auto-correlation term
  • Incorporating the auto-correlation term
    results in better prediction accuracy
  • However, computational complexity increases due
    to the logarithm-determinant (a.k.a. Jacobian)
    term of the maximum likelihood estimation
    procedure
  • Parallel processing can reduce execution time

4
Our Contributions
  • This study offers the first available parallel
    SAR formulation and evaluates its scalability
  • All of the eigenvalues of any type of dense
    neighborhood (square) matrix can be computed in
    parallel
  • Scalable parallel formulations of spatial
    auto-regression (SAR) models for 1-D and 2-D
    location prediction problems for planar surface
    partitionings using the eigenvalue computation
  • Hand-parallelized EISPACK, pre-parallelized
    LAPACK-based NAG linear algebra libraries and
    shared-memory parallel programming, i.e. OpenMP
    are used

5
Background on Spatial Autoregression Model
  • Mixed Regressive Spatial Auto-regressive (SAR)
    Model
  • y = ρWy + Xβ + ε
  • where
  • y  n-by-1 vector of observations on the dependent
    variable
  • ρ  spatial autoregression parameter
    (coefficient)
  • W  n-by-n matrix of spatial weights
  • (i.e. the contiguity, or neighborhood, matrix)
  • X  n-by-k matrix of observations on the
    explanatory variables
  • β  k-by-1 vector of regression coefficients
  • I  n-by-n identity matrix
  • ε  n-by-1 vector of unobservable error terms,
    ~ N(0, σ²I)
  • σ²  common variance of the error term
  • When ρ = 0, the SAR model collapses to the
    classical regression model

6
Background on SAR Model Contd
  • If X = 0 and W2 = 0, then the first-order spatial
    autoregressive model is obtained as follows
  • y = ρW1y + ε
  • ε ~ N(0, σ²I)
  • If W2 = 0, then the mixed regressive spatial
    autoregressive model (a.k.a. the spatial lag
    model) is obtained as follows
  • y = ρW1y + Xβ + ε
  • ε ~ N(0, σ²I)
  • If W1 = 0, then the regression model with spatial
    autocorrelation in the disturbances (a.k.a. the
    spatial error model) is obtained as follows
  • y = Xβ + u
  • u = ρW2u + ε
  • ε ~ N(0, σ²I)

7
Problem Definition
  • Given
  • A sequential solution procedure: the Serial Dense
    Matrix Approach
  • Same as Prof. Bin Li's method in Fortran 77
  • Find
  • A faster serial formulation, called the Serial
    Sparse Matrix Approach
  • A new parallel formulation of the Serial Dense
    Matrix Approach, different from Prof. Bin Li's
    method in CM-Fortran
  • Constraints
  • ε ~ N(0, σ²I), i.i.d.
  • Raster data (resulting in a binary
    neighborhood/contiguity matrix W)
  • Parallel platform (HW: Origin 3800, IBM SP, IBM
    Regatta, Cray X1; SW: Fortran, OpenMP)
  • Size of W (big vs. small and dense vs. sparse)
  • Objective
  • Maximize speedup (T_serial / T_parallel)
  • Scalability: better than the formulation of
    (Li, 1996)

8
Hypotheses
  • There are a number of sequential algorithms for
    computing the SAR model, most of which are based
    on maximum likelihood estimation, which solves
    for the spatial autoregression parameter (ρ) and
    the regression coefficients (β).
  • As the problem size gets bigger, the sequential
    methods become incapable of solving the problem
    due to
  • the extensive number of computations and
  • the large memory requirement.
  • The new parallel formulation proposed in this
    study will outperform the previous parallel
    implementation in terms of
  • speedup (S),
  • scalability,
  • problem size (PS), and
  • memory requirement.

9
Serial SAR Model Solution
  • Starting with the normal density function
  • The maximum likelihood (ML) function
  • The explicit form of the maximum likelihood (ML)
    function

10
Serial SAR Model Solution - (Contd)
  • The logarithm of the maximum likelihood function
    is called the log-likelihood function
  • The ML estimates of the SAR parameters
  • The function to optimize
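(The formula images did not survive this transcript.
As a reconstruction consistent with the eigenvalue and
K-optimization steps later in the deck, not necessarily
the authors' exact rendering, the profile
log-likelihood has the standard form

  logL(ρ) = ln|I - ρW| - (n/2) ln(SSE(ρ)/n) + const
  ln|I - ρW| = Σ ln(1 - ρλ_i), where the λ_i are the
  eigenvalues of W
  SSE(ρ) = (Ky - ρKWy)ᵀ(Ky - ρKWy), K = I - X(XᵀX)⁻¹Xᵀ

The golden section search (box B) optimizes this
function over ρ; the β and σ² estimates then follow by
least squares (box C).)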

11
System Diagram
[System diagram] Pre-processing step → the symmetric
eigenvalue-equivalent neighborhood matrix →
A: compute the eigenvalues of W → calculate the ML
function → B: golden section search to find the ρ that
minimizes the ML function → C: compute β and σ², given
the best ρ estimate, using least squares
12
Introduction to SAR Software
  • The following slides will describe
  • how we implemented the SAR model solution
  • an execution trace
  • Matlab is good only for small problems
  • memory is not enough
  • execution time is too long
  • A compiled language such as Fortran is needed for
    larger problems
  • up to 80 GB of memory can be used on the
    supercomputers
  • execution time decreases considerably due to
    parallel processing

13
1. Pre-processing Step
  • It consists of four sub-steps
  • Form the epsilon (ε) vector
  • Form W, the row-normalized neighborhood matrix
  • Symmetrize W
  • Form the y and x vectors


[Flow diagram] Form ε → form W → symmetrization of W
(to box A); form the training data set y (to box B)
14
1.0 Form Epsilon (?) Vector
  • Input p4 q4 npq16
  • Output a 16-by-1 column vector ? (epsilon)
  • The ? term i.e. the 16-by-1 vector of
    unobservable error term N(0, ?2I) in the SAR
    equation
  • y ?Wy x? ?
  • The elements are IID with zero mean and unit
    standard deviation normal random numbers
  • Prof. Isaku Wadas normal random number generator
    written in C is used, which is much better than
    Matlabs nrand function
  • ? T0.9914 -1.2780 -0.4976 1.2271 0.4740 -0.0700
    0.0834 -0.8789 0.3378 -1.5901 0.0638 -0.2717
    2.3235 -0.2973 -0.1964 0.1416
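Prof. Wada's generator itself is not included in this
deck; as a minimal, hypothetical stand-in, a Box-Muller
sketch in Fortran that fills ε with i.i.d. N(0,1)
draws:

  ! Minimal Box-Muller sketch (hypothetical stand-in for
  ! Prof. Wada's C generator): fills eps with i.i.d.
  ! N(0,1) random numbers.
  subroutine form_epsilon(n, eps)
    implicit none
    integer, intent(in) :: n
    double precision, intent(out) :: eps(n)
    double precision :: u1, u2, pi
    integer :: i
    pi = 4.0d0 * atan(1.0d0)
    do i = 1, n, 2
      call random_number(u1)
      call random_number(u2)
      u1 = max(u1, 1.0d-300)   ! guard against log(0)
      eps(i) = sqrt(-2.0d0*log(u1)) * cos(2.0d0*pi*u2)
      if (i + 1 <= n) then
        eps(i+1) = sqrt(-2.0d0*log(u1)) * sin(2.0d0*pi*u2)
      end if
    end do
  end subroutine form_epsilon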

15
1.1 Form W, the Neighborhood Matrix
  • Input: p = 4, q = 4, n = pq = 16, neighborhood
    type (4-neighbors)
  • Output: the binary (non-row-normalized) 16-by-16
    C matrix; the row sums in a 16-by-1 column vector
    D_onehalf; the row-normalized 16-by-16
    neighborhood matrix W
  • The neighborhood matrix W is formed using the
    following neighborhood relationship, where (i, j)
    is the current pixel (see the sketch below)
  • The solution can handle more cases, such as
  • 1-D: 2 neighbors
  • 2-D: 8, 16, 24, etc. neighbors
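A sketch of this construction for the 2-D 4-neighbor
case on a p-by-q grid, assuming dense storage and the
row ordering r = (i-1)q + j for pixel (i, j):

  ! Sketch: build the binary 4-neighbor matrix C for a
  ! p-by-q grid, then row-normalize it to obtain W.
  subroutine form_w(p, q, c, w)
    implicit none
    integer, intent(in) :: p, q
    double precision, intent(out) :: c(p*q, p*q), w(p*q, p*q)
    integer :: i, j, r
    double precision :: rowsum
    c = 0.0d0
    w = 0.0d0
    do i = 1, p
      do j = 1, q
        r = (i-1)*q + j
        if (i > 1) c(r, r-q) = 1.0d0   ! north neighbor
        if (i < p) c(r, r+q) = 1.0d0   ! south neighbor
        if (j > 1) c(r, r-1) = 1.0d0   ! west neighbor
        if (j < q) c(r, r+1) = 1.0d0   ! east neighbor
      end do
    end do
    do r = 1, p*q
      rowsum = sum(c(r, :))
      if (rowsum > 0.0d0) w(r, :) = c(r, :) / rowsum
    end do
  end subroutine form_w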

16
1.1 Matrices for 4-by-4 Regular Grid Space
  • (a) The spatial framework, which is p-by-q
    (4-by-4), where p may or may not equal q;
  • (b) the pq-by-pq (16-by-16) non-row-normalized
    (non-stochastic) neighborhood matrix C with 4
    nearest neighbors; and
  • (c) the row-normalized version, i.e. W, which is
    also pq-by-pq (16-by-16). The product pq equals
    n (16), i.e. the problem size.

17
1.1 Neighborhood Structures
[Figures] 1-D 2-neighborhood; 2-D 4-, 8-, 16-, and
24-neighborhoods
18
1.2 Symmetrize W
  • Input: p, q, n, W, C, D_onehalf
  • Output: the 16-by-16 symmetric
    eigenvalue-equivalent matrix of W
  • Matlab programs do not need this step, since the
    eig function can find the eigenvalues of a dense
    non-symmetric matrix
  • EISPACK's subroutines find all eigenvalues of a
    symmetric dense matrix; therefore, W must be
    converted to its eigenvalue-equivalent symmetric
    form
  • The following short-cut algorithm achieves this
    task
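The short-cut itself was shown as an image; a standard
construction with the same effect, offered as a sketch,
uses the similarity transform D^(-1/2) C D^(-1/2),
which is symmetric and has the same eigenvalues as
W = D^(-1) C:

  ! Sketch: eigenvalue-preserving symmetrization.
  ! d holds the row sums of C (the diagonal of D).
  subroutine symmetrize_w(n, c, d, wsym)
    implicit none
    integer, intent(in) :: n
    double precision, intent(in)  :: c(n, n), d(n)
    double precision, intent(out) :: wsym(n, n)
    integer :: i, j
    do j = 1, n
      do i = 1, n
        wsym(i, j) = c(i, j) / sqrt(d(i) * d(j))
      end do
    end do
  end subroutine symmetrize_w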

19
1.3 Form y x Vectors
  • Input: p, q, n, ε, W
  • Output: the y and x vectors, each of which is
    16-by-1
  • They are the synthetic (training) data used to
    test the SAR model
  • Fortran programs perform this task separately in
    another program using NAG SMP library subroutines
    (a LAPACK-based sketch follows below)
  • For n = 16
  • xᵀ = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16]
  • yᵀ = [8.2439 7.9720 11.7226 16.4201 17.9623
    19.5719 22.8666 24.9425 29.9876 30.6348 35.1528
    37.4712 42.1301 42.7142 45.7171 47.8384]
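The NAG routines are not shown in the deck; as a
sketch, the same step using LAPACK's dgesv instead,
assuming the true ρ and β are given:

  ! Sketch: generate synthetic y by solving
  ! (I - rho*W) y = x*beta + eps with LAPACK's dgesv
  ! (standing in for the NAG SMP routines).
  subroutine form_y(n, rho, beta, w, x, eps, y)
    implicit none
    integer, intent(in) :: n
    double precision, intent(in) :: rho, beta
    double precision, intent(in) :: w(n, n), x(n), eps(n)
    double precision, intent(out) :: y(n)
    double precision :: a(n, n)
    integer :: ipiv(n), i, info
    a = -rho * w
    do i = 1, n
      a(i, i) = a(i, i) + 1.0d0   ! A = I - rho*W
    end do
    y = x * beta + eps            ! right-hand side
    call dgesv(n, 1, a, n, ipiv, y, n, info)
    if (info /= 0) stop 'dgesv failed'
  end subroutine form_y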

20
2. Find All of the Eigenvalues of W
  • Matlab programs use the eig function, which finds
    all of the eigenvalues of a non-symmetric dense
    matrix
  • Fortran programs use the tred2 and tql1 EISPACK
    subroutines, which are the most efficient way to
    find the eigenvalues
  • There are two sub-steps
  • Convert the dense symmetric matrix to a
    tridiagonal matrix
  • Find all eigenvalues of the tridiagonal matrix
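A minimal sketch of the two EISPACK calls named above,
using the classic EISPACK signatures (tred2 also
accumulates the orthogonal transformations in z, which
tql1 does not need):

  ! Sketch: all eigenvalues of the symmetric matrix
  ! wsym via EISPACK; tql1 overwrites d with the
  ! eigenvalues.
  subroutine eigvals_w(n, wsym, d)
    implicit none
    integer, intent(in) :: n
    double precision, intent(inout) :: wsym(n, n)
    double precision, intent(out) :: d(n)
    double precision :: e(n), z(n, n)
    integer :: ierr
    call tred2(n, n, wsym, d, e, z)  ! Householder reduction
    call tql1(n, d, e, ierr)         ! QL eigenvalue iteration
    if (ierr /= 0) stop 'tql1 failed to converge'
  end subroutine eigvals_w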

21
2.1 Convert symmetric matrix to tridiagonal matrix
  • Input: n and the symmetric eigenvalue-equivalent
    matrix of W
  • Output: the diagonal elements of the resulting
    tridiagonal matrix in the 16-by-1 column vector
    d, and the sub-diagonal elements in the 16-by-1
    column vector e
  • This is the Householder transformation, which is
    only used in the Fortran programs
  • This step is the most time-consuming part (99% of
    the total execution time)

22
2.2 Find All Eigenvalues of the Tridiagonal Matrix
  • Input: the diagonal elements of the tridiagonal
    matrix in the 16-by-1 column vector d, and the
    sub-diagonal elements in the 16-by-1 column
    vector e
  • Output: all of the eigenvalues of the
    neighborhood matrix W
  • This is the QL transformation, which is only used
    in the Fortran programs
  • This step is left serial in the Fortran programs

23
3. Fit for the SAR Parameter rho (?)
  • There are two subroutines in this step
  • Calculating the constant statistics terms
  • saves time by pre-computing the terms that do not
    change during the golden section search algorithm
  • Golden section search
  • similar to binary search

24
3.1 Calculating constant statistics terms
  • Input: x, y, n, W
  • Output: the KY and KWY column vectors
  • Fortran programs use this subroutine to perform
    K-optimization, where the constant term is
    K = I - X(XᵀX)⁻¹Xᵀ, which is 16-by-16 for n = 16
  • The second term in the log-likelihood expression,
  • yᵀ(I - ρW)ᵀ [I - X(XᵀX)⁻¹Xᵀ]ᵀ
    [I - X(XᵀX)⁻¹Xᵀ] (I - ρW) y
  • = yᵀ(I - ρW)ᵀ Kᵀ K (I - ρW) y
  • = (K(I - ρW)y)ᵀ (K(I - ρW)y)
  • = (Ky - ρKWy)ᵀ (Ky - ρKWy),
  • which saves many matrix-vector multiplications
    (see the sketch below)
  • Matlab programs directly calculate all (constant
    and non-constant) terms in the log-likelihood
    function over and over again, so they do not need
    this operation
  • Those terms are not expensive to calculate in
    Matlab, but they are expensive in Fortran
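With Ky and KWy precomputed, each search iteration
needs only vector operations. A sketch of the resulting
objective (a reconstruction of the deck's function, not
the authors' exact code), assuming the eigenvalues
lam(1:n) from step 2:

  ! Sketch: negative profile log-likelihood at rho,
  ! using precomputed ky = K*y, kwy = K*W*y and the
  ! eigenvalues lam of W.
  double precision function negloglik(n, rho, ky, kwy, lam)
    implicit none
    integer, intent(in) :: n
    double precision, intent(in) :: rho
    double precision, intent(in) :: ky(n), kwy(n), lam(n)
    double precision :: logdet, sse
    integer :: i
    logdet = 0.0d0
    do i = 1, n
      logdet = logdet + log(1.0d0 - rho * lam(i))
    end do
    sse = sum((ky - rho * kwy)**2)
    negloglik = -logdet + 0.5d0 * dble(n) * log(sse / dble(n))
  end function negloglik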

25
3.2 Golden Section Search
  • Input: x, y, n, eigenn, W, KY, KWY, ax, bx, cx,
    tol
  • Output: fgld, xmin, niter
  • The best estimate for ρ (xmin) is found
  • The function optimized is the log-likelihood
    function
  • Similar to binary search and the bisection method
  • The detailed outputs are given in the execution
    traces
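A sketch of the search itself, following the classic
golden section recipe; f would be a single-argument
wrapper around the negative log-likelihood above. This
is illustrative, not the deck's code:

  ! Sketch: golden section minimization of f over
  ! [ax, cx] with interior point bx.
  subroutine golden(f, ax, bx, cx, tol, xmin, fmin, niter)
    implicit none
    double precision, external :: f
    double precision, intent(in) :: ax, bx, cx, tol
    double precision, intent(out) :: xmin, fmin
    integer, intent(out) :: niter
    double precision, parameter :: r = 0.6180339887d0
    double precision, parameter :: c = 1.0d0 - r
    double precision :: x0, x1, x2, x3, f1, f2
    x0 = ax
    x3 = cx
    if (abs(cx - bx) > abs(bx - ax)) then
      x1 = bx
      x2 = bx + c*(cx - bx)
    else
      x2 = bx
      x1 = bx - c*(bx - ax)
    end if
    f1 = f(x1)
    f2 = f(x2)
    niter = 0
    do while (abs(x3 - x0) > tol*(abs(x1) + abs(x2)))
      niter = niter + 1
      if (f2 < f1) then
        x0 = x1; x1 = x2; x2 = r*x2 + c*x3   ! shrink left
        f1 = f2; f2 = f(x2)
      else
        x3 = x2; x2 = x1; x1 = r*x1 + c*x0   ! shrink right
        f2 = f1; f1 = f(x1)
      end if
    end do
    if (f1 < f2) then
      xmin = x1; fmin = f1
    else
      xmin = x2; fmin = f2
    end if
  end subroutine golden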

26
4. Calculate Beta (?) and Sigma (?)
  • Input: x, y, n, niter, rho_cap, W, KY, KWY
  • Output: beta_cap and sigmasqr_cap, which are both
    scalars here
  • niter (the number of iterations) and rho_cap are
    scalars
  • Calculating the best estimates of β and σ²
  • The formulas are
  • As n (i.e. the problem size) increases, the
    estimates become more accurate
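(The formula images did not survive this transcript.
The standard ML estimates consistent with the steps
above, offered as a reconstruction in the deck's own
variable names, are

  beta_cap = (XᵀX)⁻¹ Xᵀ (I - rho_cap W) y
  sigmasqr_cap = (Ky - rho_cap KWy)ᵀ (Ky - rho_cap KWy) / n.)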

27
Sample Output from Matlab Programs (n16)
  • Comparison of methods (actual ρ = 0.412,
    β = 1.91)
  • First row: dense straight log-det calculation
  • Second row: log-det calculation via exact
    eigenvalue computation
  • Third row: log-det calculation via approximate
    eigenvalue computation (Griffith)
  • Fourth row: log-det calculation via better
    approximate eigenvalue computation (Griffith)
  • A fifth row, Markov Chain Monte Carlo
    approximated SAR, is coming up
  • The columns are:

    ML value  min_log  rho_cap    niter  beta_cap  sigma_sqr
      2.6687  -1.0000   0.4014  77.0000    1.9465     0.8509
      2.6687  -1.0000   0.4014  77.0000    1.9465     0.8509
      2.6645  -1.0000   0.4018  77.0000    1.9452     0.8508
      2.6641  -1.0000   0.4019  77.0000    1.9451     0.8508

28
Sample Output from Matlab Programs (n2500)
  • Comparison of methods (actual ρ = 0.412,
    β = 1.91)
  • First row: dense straight log-det calculation
  • Second row: log-det calculation via exact
    eigenvalue computation
  • Third row: log-det calculation via approximate
    eigenvalue computation (Griffith)
  • Fourth row: log-det calculation via better
    approximate eigenvalue computation (Griffith)
  • A fifth row, Markov Chain Monte Carlo
    approximated SAR, is coming up
  • The columns are:

    ML value  min_log  rho_cap    niter  beta_cap  sigma_sqr
      7.8685  -1.0000   0.4127  77.0000    1.9079     0.9985
      7.8685  -1.0000   0.4127  77.0000    1.9079     0.9985
      7.8683  -1.0000   0.4127  77.0000    1.9079     0.9985
      7.8681  -1.0000   0.4127  77.0000    1.9079     0.9985

29
Instructions on How-To-Run Matlab Code
  • On SGI Origins
  • train_test_SAR (interactively)
  • matlab -nodisplay < train_test_SAR.m > output
  • On IBM SP
  • same
  • On IBM Regatta
  • same
  • On Cray X1
  • Not available yet

30
Instructions on How-To-Run Fortran Code
  • On SGI Origins with 4 threads (processors)
  • f77 -64 -Ofast -mp sar_exactlogdeteigvals_2DN4_250
    0.f
  • setenv OMP_NUM_THREADS 4
  • time a.out
  • On IBM SP with 4 threads (processors)
  • xlf_r -O3 -qstrict -q64 -qsmp=omp
    sar_exactlogdeteigvals_2DN4_2500.f
  • setenv OMP_NUM_THREADS 4
  • time a.out
  • On IBM Regatta with 4 threads (processors)
  • xlf_r -O3 -qstrict -q64 -qsmp=omp
    sar_exactlogdeteigvals_2DN4_2500.f
  • setenv OMP_NUM_THREADS 4
  • time a.out
  • On Cray X1
  • Coming soon

31
Illustration on a 2-by-2 Regular Grid Space
32
Summary of the SAR Software i.e. The Big Picture
33
New Parallel SAR Model Solution Formulation
  • Prof. Mark Bull's "Expert Programmer vs.
    Parallelizing Compiler" paper (Scientific
    Programming, 1996)
  • Loops 240 and 280 are the major bottlenecks; most
    of the code is parallelized, as will be shown
  • The data distribution in both loops should be
    similar to benefit from value re-use
  • Loop 280 cannot benefit from block-wise
    partitioning; it should use interleaved
    scheduling for load balance. Thus, both loops use
    interleaved scheduling
  • Parallelizing the initialization phase imitates
    the manual data distribution and the
    page-placement/page-migration utilities of the
    SGI Origin machines
  • The variable etemp enables a reduction operation
    on the variable e, which is updated by different
    processors
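A sketch of the pattern just described: a triangular
(load-imbalanced) loop with interleaved, i.e. cyclic,
scheduling, where the reduction clause plays the role
of the manual etemp accumulator. The loop body is a
stand-in, not the deck's loop 240/280 code:

  ! Sketch: interleaved scheduling plus reduction.
  subroutine tri_sum(n, a, v, e)
    implicit none
    integer, intent(in) :: n
    double precision, intent(in) :: a(n, n), v(n)
    double precision, intent(out) :: e
    integer :: i, j
    e = 0.0d0
  !$omp parallel do schedule(static, 1) private(j) reduction(+:e)
    do i = 1, n
      do j = i + 1, n
        e = e + a(i, j) * v(j)   ! rows shrink; cyclic balances load
      end do
    end do
  !$omp end parallel do
  end subroutine tri_sum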

34
Experimental Design Response Variables Factors
  • Speedup: S = T_serial / T_parallel
  • the time taken to solve a problem on a single
    processor over the time to solve the same problem
    on a parallel computer with p identical
    processors
  • Scalability of a parallel system
  • the measure of an algorithm's capacity to
    increase S in proportion to p on a particular
    parallel system
  • scalable systems have the ability to maintain
    efficiency (i.e. S / p) at a fixed value by
    simultaneously increasing p and PS
  • reflects a parallel system's ability to utilize
    increasing processing resources effectively
  • Factors and their parameter domains
  • Problem size (n): 400, 2500, 6400, 10000
  • Neighborhood structure: 2-D 4-neighbors
  • Range of ρ: [0, 1)
  • Parallelization options: static scheduling for
    boxes B and C; dynamic with chunk size 4 for
    box A
  • Number of processors: 1, ..., 16
  • Algorithm used: Householder transformation,
    followed by the QL algorithm, followed by golden
    section search

35
Related Work
  • Eigenvalue software on the web is studied in
    depth at the following web sites
  • http://www-users.cs.umn.edu/~kazar/sar/sar_presentations/eigenvalue_solvers.html
  • http://www.netlib.org/utk/people/JackDongarra/la-sw.html
  • (Griffith, 1995) computed the eigenvalues
    analytically for the 1-D case and the 2-D 4- and
    8-neighbor cases; however, this is tedious for
    other cases
  • The approximate and better approximate
    eigenvalues are from this source
  • However, there is no closed-form expression for
    many other cases
  • (Li, 1996) implemented parallel SAR using a
    constant-mean model
  • Those programs cannot run anymore (CM-Fortran)
  • Our model is more general and harder to solve
  • Our programs are the only running implementations
    in the literature

36
SAR Model Solutions are cross-classified
37
References
  • [ICP03] Baris Kazar, Shashi Shekhar, and David J.
    Lilja, "Parallel Formulation of Spatial
    Auto-Regression," submitted to ICPP 2003, under
    review
  • [IEE02] S. Shekhar, P. Schrater, R. Vatsavai, W.
    Wu, and S. Chawla, "Spatial Contextual
    Classification and Prediction Models for Mining
    Geospatial Data," IEEE Transactions on Multimedia
    (special issue on Multimedia Databases), 2002

38
Conclusions and Future Work
  • Able to solve the SAR model for any type of W
    matrix, until the memory limit is reached, by
    finding all of its eigenvalues
  • Finding eigenvalues is hard
  • The sparsity of W should be exploited if the
    eigenvalue subroutines allow it
  • Efficient partitioning of very large sparse
    matrices via ParMETIS
  • New methods will be studied
  • Markov Chain Monte Carlo estimates (parSARmcmc)
  • parSALE: parallel spatial auto-regression local
    estimation
  • the characteristic polynomial approach
  • Contributions to the Spatial Statistics package
    Version 2.0 from Prof. Kelley Pace will continue

39
Short Tutorial on OpenMP
  • Fork-join model of parallel execution
  • Parallel regions are denoted by directives in
    Fortran and pragmas in C/C++
  • Data environment: (first/last/thread-) private
    and shared variables, and their scopes across
    parallel and serial regions
  • Work-sharing constructs: parallel do, sections
    (static and dynamic scheduling, with or without
    chunks)
  • Synchronization: atomic, critical, barrier,
    flush, ordered; implicit synchronization after
    each parallel do loop
  • Run-time library functions, e.g. to determine
    which thread is executing at a given time
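A minimal example combining these pieces (a directive,
explicit scoping, scheduling, a reduction, and a
run-time library call):

  ! Minimal OpenMP Fortran example.
  program omp_demo
    use omp_lib
    implicit none
    integer :: i
    double precision :: s
    s = 0.0d0
  !$omp parallel do schedule(dynamic, 4) private(i) reduction(+:s)
    do i = 1, 1000
      s = s + dble(i)
    end do
  !$omp end parallel do
    print *, 'sum =', s, ' max threads =', omp_get_max_threads()
  end program omp_demo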

40
Short Tutorial on Eigenvalues
  • Let A be a linear transformation represented by a
    matrix. If there is a vector X different from the
    zero vector such that
  • AX = λX
  • for some scalar λ, then λ is called an eigenvalue
    of A with corresponding (right) eigenvector X.
  • Eigenvalues are also known as characteristic
    roots, proper values, or latent roots (Marcus and
    Minc 1988, p. 144).
  • (A - λI)X = 0 gives the characteristic equation
    (polynomial), and the roots of this polynomial
    are the eigenvalues.
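For example, for the 2-by-2 matrix A = [2 1; 1 2], the
characteristic polynomial is (2 - λ)² - 1 = 0, giving
eigenvalues λ = 1 and λ = 3 with eigenvectors [1, -1]ᵀ
and [1, 1]ᵀ, respectively.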

41
Hands-On Part
  • Please go to
  • http://www.cs.umn.edu/~kazar/sar/index.html
  • Find the 05.14.2003 phrase
  • Type in shashi for the username
  • Type in shashi for the password
  • To run the programs, we need to log in to one of
    the SGI Origins, the IBM SP, or the IBM Regatta
    (the Cray X1 is not ready yet)
  • All programs are run by submitting to a queue
  • Interactive runs are not recommended for the
    Fortran programs, due to the system load and the
    high number of processors needed for execution