1. AHPCRC Spatial Data-Mining Tutorial on Scalable Parallel Formulations of Spatial Auto-Regression (SAR) Models for Mining Regular Grid Geospatial Data
- Shashi Shekhar, Baris M. Kazar, David J. Lilja
- EECS Department at University of Minnesota
- Army High Performance Computing Research Center (AHPCRC)
- Minnesota Supercomputing Institute (MSI)
- 05.14.2003
2. Outline
- Motivation for Parallel SAR Models
- Background on the Spatial Auto-Regression Model
- Our Contributions
- Problem Definition and Hypotheses
- Introduction to the SAR Software
- Experimental Design
- Related Work
- Conclusions and Future Work
3. Motivation for Parallel SAR Models
- Linear regression models assume that the learning data samples are independent and identically distributed (i.i.d.)
- Spatial data violate this assumption, so low prediction accuracy results
- The SAR model generalizes the linear regression model with a spatial auto-correlation term
- Incorporating the auto-correlation term results in better prediction accuracy
- However, computational complexity increases due to the logarithm-determinant (a.k.a. Jacobian) term of the maximum likelihood estimation procedure
- Parallel processing can reduce execution time
4. Our Contributions
- This study offers the first (and only available) parallel SAR formulation and evaluates its scalability
- All of the eigenvalues of any type of dense (square) neighborhood matrix can be computed in parallel
- Scalable parallel formulations of spatial auto-regression (SAR) models for 1-D and 2-D location prediction problems on planar surface partitionings, using the eigenvalue computation
- Hand-parallelized EISPACK routines, the pre-parallelized LAPACK-based NAG linear algebra library, and shared-memory parallel programming (i.e., OpenMP) are used
5. Background on the Spatial Auto-Regression Model
- Mixed Regressive Spatial Auto-Regressive (SAR) Model:
- y = ρWy + Xβ + ε, where
- y: n-by-1 vector of observations on the dependent variable
- ρ: spatial auto-regression parameter (coefficient)
- W: n-by-n matrix of spatial weights (i.e., the contiguity or neighborhood matrix)
- X: n-by-k matrix of observations on the explanatory variables
- β: k-by-1 vector of regression coefficients
- I: n-by-n identity matrix
- ε: n-by-1 vector of unobservable error terms, ε ~ N(0, σ²I)
- σ²: common variance of the error term
- When ρ = 0, the SAR model collapses to the classical regression model
6. Background on the SAR Model (Contd.)
- If X = 0 and W2 = 0, the first-order spatial auto-regressive model is obtained:
- y = ρ W1 y + ε,  ε ~ N(0, σ²I)
- If W2 = 0, the mixed regressive spatial auto-regressive model (a.k.a. spatial lag model) is obtained:
- y = ρ W1 y + Xβ + ε,  ε ~ N(0, σ²I)
- If W1 = 0, the regression model with spatial autocorrelation in the disturbances (a.k.a. spatial error model) is obtained:
- y = Xβ + u
- u = ρ W2 u + ε,  ε ~ N(0, σ²I)
7. Problem Definition
- Given
- A sequential solution procedure: the Serial Dense Matrix Approach
- Same as Prof. Bin Li's method, in Fortran 77
- Find
- A faster serial formulation, called the Serial Sparse Matrix Approach
- A new parallel formulation of the Serial Dense Matrix Approach, different from Prof. Bin Li's method in CM-Fortran
- Constraints
- ε ~ N(0, σ²I), i.i.d.
- Raster data (resulting in a binary neighborhood/contiguity matrix W)
- Parallel platform (HW: SGI Origin 3800, IBM SP, IBM Regatta, Cray X1; SW: Fortran, OpenMP)
- Size of W (big vs. small and dense vs. sparse)
- Objective
- Maximize speedup (Tserial/Tparallel)
- Scalability: better than (Li, 1996)'s formulation
8. Hypotheses
- A number of sequential algorithms compute the SAR model, most of which are based on maximum likelihood estimation, which solves for the spatial auto-regression parameter (ρ) and the regression coefficients (β)
- As the problem size grows, the sequential methods become incapable of solving the problem due to
- the extensive number of computations, and
- the large memory requirement
- The new parallel formulation proposed in this study will outperform the previous parallel implementation in terms of
- Speedup (S),
- Scalability,
- Problem size (PS), and
- Memory requirement
9. Serial SAR Model Solution
- Starting with the normal density function
- The maximum likelihood (ML) function
- The explicit form of the maximum likelihood (ML) function
10. Serial SAR Model Solution (Contd.)
- The logarithm of the maximum likelihood function is called the log-likelihood function
- The ML estimates of the SAR parameters
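The slide's equation images did not survive extraction. For reference, the standard SAR log-likelihood from the spatial statistics literature (assumed to match what the slides showed):

    \ln L(\rho,\beta,\sigma^2) \;=\; -\frac{n}{2}\ln(2\pi\sigma^2)
      \;+\; \ln\lvert I - \rho W\rvert
      \;-\; \frac{1}{2\sigma^2}\,(y - \rho Wy - X\beta)^{\mathsf T}(y - \rho Wy - X\beta)

Given all eigenvalues \lambda_i of W, the expensive log-determinant (Jacobian) term reduces to

    \ln\lvert I - \rho W\rvert \;=\; \sum_{i=1}^{n}\ln(1 - \rho\,\lambda_i),

which is why the solution procedure computes the eigenvalues of W once and reuses them for every candidate ρ.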
11System Diagram
B Golden Section Search to find that minimizes
ML function
Pre-processing Step
C Compute and given the best estimate using
least squares
The Symmetric Eigenvalue-equivalent
Neighborhood Matrix
Eigenvalues of W
A Compute Eigenvalues
Calculate the ML function
12. Introduction to the SAR Software
- The following slides describe
- how we implemented the SAR model solution
- an execution trace
- Matlab is good only for small problem sizes
- Memory is not enough for larger problems
- Execution time is too long
- A compiled language such as Fortran is needed for larger problems
- Up to 80 GB of memory can be used on the supercomputers
- Execution time decreased considerably due to parallel processing
13. 1. Pre-processing Step
- It consists of four sub-steps:
- Form the epsilon (ε) vector
- Form W, the row-normalized neighborhood matrix
- Symmetrize W (the result feeds box A)
- Form the y and x vectors of the training data set (the result feeds box B)
14. 1.0 Form the Epsilon (ε) Vector
- Input: p = 4, q = 4, n = pq = 16
- Output: a 16-by-1 column vector ε (epsilon)
- The ε term, i.e., the 16-by-1 vector of unobservable error terms ~ N(0, σ²I) in the SAR equation y = ρWy + Xβ + ε
- The elements are i.i.d. normal random numbers with zero mean and unit standard deviation
- Prof. Isaku Wada's normal random number generator, written in C, is used; it is much better than Matlab's randn function (a sketch of one standard alternative follows)
- εᵀ = [0.9914 -1.2780 -0.4976 1.2271 0.4740 -0.0700 0.0834 -0.8789 0.3378 -1.5901 0.0638 -0.2717 2.3235 -0.2973 -0.1964 0.1416]
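Prof. Wada's generator itself is not reproduced in the tutorial; purely as a hypothetical stand-in, here is a minimal Box-Muller sketch (assuming a Fortran 90 compiler for the random_number intrinsic):

      program formeps
      integer n, i
      parameter (n = 16)
      double precision eps(n), u1, u2, pi
      parameter (pi = 3.141592653589793d0)
c     each pair of uniforms yields two independent N(0,1) draws
      do 10 i = 1, n, 2
         call random_number(u1)
         call random_number(u2)
c        guard against log(0)
         u1 = max(u1, 1.0d-12)
         eps(i) = sqrt(-2.0d0*log(u1)) * cos(2.0d0*pi*u2)
         if (i + 1 .le. n) then
            eps(i+1) = sqrt(-2.0d0*log(u1)) * sin(2.0d0*pi*u2)
         end if
 10   continue
      write (*, '(4f10.4)') (eps(i), i = 1, n)
      end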
15. 1.1 Form W, the Neighborhood Matrix
- Input: p = 4, q = 4, n = pq = 16; neighborhood type (4-neighbors)
- Output: the binary (non-row-normalized) 16-by-16 matrix C; the row sums in a 16-by-1 column vector D_onehalf; the row-normalized 16-by-16 neighborhood matrix W
- The neighborhood matrix W is formed using the 4-neighbor relationship, where (i,j) is the current pixel
- The solution can handle more cases, such as
- 1-D: 2 neighbors
- 2-D: 8, 16, 24, etc. neighbors
16. 1.1 Matrices for a 4-by-4 Regular Grid Space
- [Figure: panels (a), (b), (c)]
- (a) The spatial framework, which is p-by-q (4-by-4), where p may or may not be equal to q
- (b) The pq-by-pq (16-by-16) non-row-normalized (non-stochastic) neighborhood matrix C with 4 nearest neighbors
- (c) The row-normalized version, i.e., W, which is also pq-by-pq (16-by-16). The product pq equals n (16), i.e., the problem size
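A sketch of this construction (hypothetical code, not the tutorial's program; 2-D 4-neighbor case only):

      program formw
      integer p, q, n, i, j, k
      parameter (p = 4, q = 4, n = p*q)
      double precision C(n,n), W(n,n), rowsum
      do 10 i = 1, n
         do 10 j = 1, n
            C(i,j) = 0.0d0
 10   continue
c     pixel (i,j) of the grid maps to row k = (i-1)*q + j
      do 20 i = 1, p
         do 20 j = 1, q
            k = (i-1)*q + j
            if (i .gt. 1) C(k, k-q) = 1.0d0
            if (i .lt. p) C(k, k+q) = 1.0d0
            if (j .gt. 1) C(k, k-1) = 1.0d0
            if (j .lt. q) C(k, k+1) = 1.0d0
 20   continue
c     row-normalize: W(i,j) = C(i,j) / (row sum of C)
      do 40 i = 1, n
         rowsum = 0.0d0
         do 30 j = 1, n
            rowsum = rowsum + C(i,j)
 30      continue
         do 40 j = 1, n
            W(i,j) = C(i,j) / rowsum
 40   continue
      write (*, '(16f6.2)') ((W(i,j), j = 1, n), i = 1, n)
      end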
17. 1.1 Neighborhood Structures
- 1-D: 2-neighborhood
- 2-D: 4-neighborhood
- 2-D: 8-neighborhood
- 2-D: 16-neighborhood
- 2-D: 24-neighborhood
18. 1.2 Symmetrize W
- Input: p, q, n, W, C, D_onehalf
- Output: the 16-by-16 symmetric eigenvalue-equivalent matrix of W
- Matlab programs do not need this step, since the eig function can find the eigenvalues of a dense non-symmetric matrix
- EISPACK's subroutines find all eigenvalues of a symmetric dense matrix; therefore W must be converted to its eigenvalue-equivalent symmetric form
- A short-cut algorithm achieves this task (see the sketch below)
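A likely form of the short-cut (an assumption based on the D_onehalf variable above and the standard similarity-transform result, not taken verbatim from the slide): with D = diag(d_1, ..., d_n) holding the row sums of C, so that W = D^{-1}C,

    \widetilde{W} \;=\; D^{1/2}\,W\,D^{-1/2} \;=\; D^{-1/2}\,C\,D^{-1/2},
    \qquad
    \widetilde{W}_{ij} \;=\; \frac{C_{ij}}{\sqrt{d_i\,d_j}}

W̃ is symmetric because C is, and it shares all eigenvalues with W because it is related to W by a similarity transform.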
19. 1.3 Form the y and x Vectors
- Input: p, q, n, ε, W
- Output: the y and x vectors, each of which is 16-by-1
- They are the synthetic (training) data used to test the SAR model
- The Fortran programs perform this task separately, in another program, using NAG SMP library subroutines
- For n = 16:
- xᵀ = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16]
- yᵀ = [8.2439 7.9720 11.7226 16.4201 17.9623 19.5719 22.8666 24.9425 29.9876 30.6348 35.1528 37.4712 42.1301 42.7142 45.7171 47.8384]
20. 2. Find All of the Eigenvalues of W
- Matlab programs use the eig function, which finds all of the eigenvalues of a non-symmetric dense matrix
- Fortran programs use the tred2 and tql1 EISPACK subroutines, which are the most efficient way to find the eigenvalues
- There are two sub-steps (sketched below):
- Convert the dense symmetric matrix to a tridiagonal matrix
- Find all eigenvalues of the tridiagonal matrix
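A sketch of the two EISPACK calls named above, assuming the classic signatures TRED2(NM,N,A,D,E,Z) and TQL1(N,D,E,IERR); the matrix fill is elided:

      program eigsar
      integer n, nm, ierr
      parameter (n = 16, nm = n)
      double precision a(nm,n), z(nm,n), d(n), e(n)
c     ... fill a with the symmetric eigenvalue-equivalent form of W ...
c     Step 2.1: Householder reduction to tridiagonal form
c     (d = diagonal, e = sub-diagonal, z = accumulated transforms)
      call tred2(nm, n, a, d, e, z)
c     Step 2.2: QL iteration on the tridiagonal matrix; on return,
c     d holds all n eigenvalues in ascending order
      call tql1(n, d, e, ierr)
      if (ierr .ne. 0) write (*,*) 'tql1 failed at index ', ierr
      end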
21. 2.1 Convert the Symmetric Matrix to a Tridiagonal Matrix
- Input: n, the symmetric eigenvalue-equivalent matrix of W
- Output: the diagonal elements of the resulting tridiagonal matrix in the 16-by-1 column vector d, and the sub-diagonal elements in the 16-by-1 column vector e
- This is the Householder transformation, which is only used in the Fortran programs
- This step is the most time-consuming part (99% of the total execution time)
22. 2.2 Find All Eigenvalues of the Tridiagonal Matrix
- Input: the diagonal elements of the tridiagonal matrix in the 16-by-1 column vector d, and the sub-diagonal elements in the 16-by-1 column vector e
- Output: all of the eigenvalues of the neighborhood matrix W
- This is the QL transformation, which is only used in the Fortran programs
- This step is left serial in the Fortran programs
23. 3. Fit for the SAR Parameter rho (ρ)
- There are two subroutines in this step:
- Calculating the constant statistics terms
- Saves time by pre-computing the terms that do not change during the golden section search
- Golden section search
- Similar to binary search
24. 3.1 Calculating the Constant Statistics Terms
- Input: x, y, n, W
- Output: the KY and KWY column vectors
- The Fortran programs use this subroutine to perform the K-optimization, where the constant term K = I - x((xᵀx)⁻¹)xᵀ is 16-by-16 for n = 16
- The second term in the log-likelihood expression:
- yᵀ(I - ρW)ᵀ [I - x((xᵀx)⁻¹)xᵀ]ᵀ [I - x((xᵀx)⁻¹)xᵀ] (I - ρW) y
- = yᵀ(I - ρW)ᵀ Kᵀ K (I - ρW) y
- = (K(I - ρW)y)ᵀ (K(I - ρW)y)
- = (Ky - ρKWy)ᵀ (Ky - ρKWy)
- which saves many matrix-vector multiplications
- Matlab programs directly calculate all (constant and non-constant) terms of the log-likelihood function over and over again, so they do not need this operation
- Those terms are not expensive to calculate in Matlab, but they are expensive in Fortran
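The simplification works because K is the least-squares "annihilator" (projection) matrix, which is symmetric and idempotent; restated in equation form (a standard linear-algebra fact, not shown on the slide):

    K = I - x(x^{\mathsf T}x)^{-1}x^{\mathsf T}, \qquad
    K^{\mathsf T} = K, \qquad K^2 = K
    \;\;\Longrightarrow\;\; K^{\mathsf T}K = K

    y^{\mathsf T}(I-\rho W)^{\mathsf T} K^{\mathsf T} K (I-\rho W)\, y
      \;=\; (Ky - \rho\, KWy)^{\mathsf T}(Ky - \rho\, KWy)

Hence Ky and KWy are computed once before the search, and each candidate ρ inside the golden section search costs only a few vector operations.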
25. 3.2 Golden Section Search
- Input: x, y, n, eigenn, W, KY, KWY, ax, bx, cx, tol
- Output: fgld, xmin, niter
- The best estimate for rho (xmin) is found
- The optimization function is the log-likelihood function
- Similar to binary search and the bisection method (see the sketch below)
- The detailed outputs are given in the execution traces
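A minimal sketch of golden section search, modeled on the classic routine (hypothetical code; the argument names ax, bx, cx, tol, and xmin mirror the slide, and the function f stands in for the log-likelihood evaluation, which is not reproduced here):

      double precision function golden(ax, bx, cx, tol, f, xmin)
      double precision ax, bx, cx, tol, xmin, f
      external f
      double precision r, c, x0, x1, x2, x3, f1, f2
      parameter (r = 0.6180339887498949d0, c = 1.0d0 - r)
c     bracket [x0,x3] with interior points x1, x2 in golden ratio
      x0 = ax
      x3 = cx
      if (abs(cx - bx) .gt. abs(bx - ax)) then
         x1 = bx
         x2 = bx + c*(cx - bx)
      else
         x2 = bx
         x1 = bx - c*(bx - ax)
      end if
      f1 = f(x1)
      f2 = f(x2)
c     shrink the bracket until it is narrower than tol
 10   if (abs(x3 - x0) .gt. tol*(abs(x1) + abs(x2))) then
         if (f2 .lt. f1) then
            x0 = x1
            x1 = x2
            x2 = r*x1 + c*x3
            f1 = f2
            f2 = f(x2)
         else
            x3 = x2
            x2 = x1
            x1 = r*x2 + c*x0
            f2 = f1
            f1 = f(x1)
         end if
         go to 10
      end if
      if (f1 .lt. f2) then
         xmin = x1
         golden = f1
      else
         xmin = x2
         golden = f2
      end if
      end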
26. 4. Calculate Beta (β) and Sigma (σ)
- Input: x, y, n, niter, rho_cap, W, KY, KWY
- Output: beta_cap and sigmasqr_cap, which are both scalars
- niter (the number of iterations) and rho_cap are scalars
- Calculates the best estimates of β and σ²
- The formulas are given below
- As n (i.e., the problem size) increases, the estimates become more accurate
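The slide's formula images did not survive extraction; the standard ML estimates given the fitted ρ̂ (from the spatial statistics literature, assumed to match what the slide showed) are:

    \hat{\beta} \;=\; (x^{\mathsf T}x)^{-1}x^{\mathsf T}(I - \hat{\rho}W)\,y,
    \qquad
    \hat{\sigma}^2 \;=\; \frac{1}{n}\,
      \bigl((I - \hat{\rho}W)\,y - x\hat{\beta}\bigr)^{\mathsf T}
      \bigl((I - \hat{\rho}W)\,y - x\hat{\beta}\bigr)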
27. Sample Output from the Matlab Programs (n = 16)
- Comparison of methods:
- Row 1: dense straight log-det calculation
- Row 2: log-det calculation via exact eigenvalue computation
- Row 3: log-det calculation via approximate eigenvalue computation (Griffith)
- Row 4: log-det calculation via better approximate eigenvalue computation (Griffith)
- Row 5 (forthcoming): Markov Chain Monte Carlo approximated SAR
- Actual values: rho = 0.412, beta = 1.91

  ML value   min_log   rho_cap   niter     beta_cap   sigma_sqr
  2.6687     -1.0000   0.4014    77.0000   1.9465     0.8509
  2.6687     -1.0000   0.4014    77.0000   1.9465     0.8509
  2.6645     -1.0000   0.4018    77.0000   1.9452     0.8508
  2.6641     -1.0000   0.4019    77.0000   1.9451     0.8508
28. Sample Output from the Matlab Programs (n = 2500)
- Comparison of methods:
- Row 1: dense straight log-det calculation
- Row 2: log-det calculation via exact eigenvalue computation
- Row 3: log-det calculation via approximate eigenvalue computation (Griffith)
- Row 4: log-det calculation via better approximate eigenvalue computation (Griffith)
- Row 5 (forthcoming): Markov Chain Monte Carlo approximated SAR
- Actual values: rho = 0.412, beta = 1.91

  ML value   min_log   rho_cap   niter     beta_cap   sigma_sqr
  7.8685     -1.0000   0.4127    77.0000   1.9079     0.9985
  7.8685     -1.0000   0.4127    77.0000   1.9079     0.9985
  7.8683     -1.0000   0.4127    77.0000   1.9079     0.9985
  7.8681     -1.0000   0.4127    77.0000   1.9079     0.9985
29. Instructions on How to Run the Matlab Code
- On the SGI Origins:
- train_test_SAR (interactively)
- matlab -nodisplay < train_test_SAR.m > output
- On the IBM SP: same
- On the IBM Regatta: same
- On the Cray X1: not available yet
30. Instructions on How to Run the Fortran Code
- On the SGI Origins with 4 threads (processors):
- f77 -64 -Ofast -mp sar_exactlogdeteigvals_2DN4_2500.f
- setenv OMP_NUM_THREADS 4
- time a.out
- On the IBM SP with 4 threads (processors):
- xlf_r -O3 -qstrict -q64 -qsmp=omp sar_exactlogdeteigvals_2DN4_2500.f
- setenv OMP_NUM_THREADS 4
- time a.out
- On the IBM Regatta with 4 threads (processors):
- xlf_r -O3 -qstrict -q64 -qsmp=omp sar_exactlogdeteigvals_2DN4_2500.f
- setenv OMP_NUM_THREADS 4
- time a.out
- On the Cray X1: coming soon
31. Illustration on a 2-by-2 Regular Grid Space
32. Summary of the SAR Software, i.e., the Big Picture
33. New Parallel SAR Model Solution Formulation
- Guided by Prof. Mark Bull's "Expert Programmer vs. Parallelizing Compiler" paper (Scientific Programming, 1996)
- Loops 240 and 280 are the major bottlenecks; parallelizing them parallelizes most of the code, as will be shown
- The data distribution for both loops should be similar, to benefit from value re-use
- Loop 280 cannot benefit from block-wise partitioning; it needs interleaved scheduling for load balance. Thus, both loops use interleaved scheduling
- Parallelizing the initialization phase imitates manual data distribution via the page-placement and page-migration utilities of the SGI Origin machines
- The variable etemp enables a reduction operation on the variable e, which is updated by different processors (see the sketch below)
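A sketch of the scheduling-plus-reduction pattern described above (hypothetical code; the loop labels 240/280 and the names e and etemp come from the slide, but the loop bodies are placeholders):

      program redux
      integer n, j, k
      parameter (n = 16)
      double precision a(n,n), e, etemp
c     placeholder data so the example runs stand-alone
      do 110 j = 1, n
         do 100 k = 1, n
            a(k,j) = 1.0d0
 100     continue
 110  continue
      e = 0.0d0
c$omp parallel private(etemp, k)
      etemp = 0.0d0
c     interleaved (cyclic) distribution of iterations for load balance
c$omp do schedule(static, 1)
      do 240 j = 1, n
         do 230 k = 1, n
            etemp = etemp + a(k,j)
 230     continue
 240  continue
c$omp end do
c     fold each thread's private partial sum into the shared e
c$omp atomic
      e = e + etemp
c$omp end parallel
      write (*,*) 'e = ', e
      end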
34. Experimental Design: Response Variables and Factors
- Speedup: S = Tserial / Tparallel
- The time taken to solve a problem on a single processor, over the time to solve the same problem on a parallel computer with p identical processors
- Scalability of a parallel system:
- The measure of the algorithm's capacity to increase S in proportion to p on a particular parallel system
- Scalable systems have the ability to maintain efficiency (i.e., E = S / p) at a fixed value by simultaneously increasing p and PS
- Reflects a parallel system's ability to utilize increasing processing resources effectively
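For concreteness (hypothetical numbers, for illustration only, not measurements from this study): if Tserial = 400 s and Tparallel = 125 s on p = 4 processors, then

    S = \frac{T_{\mathrm{serial}}}{T_{\mathrm{parallel}}} = \frac{400}{125} = 3.2,
    \qquad
    E = \frac{S}{p} = \frac{3.2}{4} = 0.8

i.e., such a run would retain 80% efficiency on 4 processors.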
  Name of Factor             Parameter Domain
  Problem size (n)           400, 2500, 6400, 10000
  Neighborhood structure     2-D, 4-neighbors
  Range of rho               [0, 1)
  Parallelization options    static scheduling for boxes B and C; dynamic with chunk size 4 for box A
  Number of processors       1, ..., 16
  Algorithm used             Householder transformation, followed by the QL algorithm, followed by golden section search
35. Related Work
- Eigenvalue software on the web is surveyed in depth at the following sites:
- http://www-users.cs.umn.edu/kazar/sar/sar_presentations/eigenvalue_solvers.html
- http://www.netlib.org/utk/people/JackDongarra/la-sw.html
- (Griffith, 1995) computed the eigenvalues analytically for the 1-D case and the 2-D 4- and 8-neighbor cases; however, this is tedious for other cases
- The approximate and better approximate eigenvalues are from this source
- However, there is no closed-form expression for many other cases
- (Bin Li, 1996) implemented parallel SAR using a constant mean model
- Those programs can no longer run (CM-Fortran)
- Our model is more general and harder to solve
- Our programs are the only running implementations in the literature
36. SAR Model Solutions are Cross-Classified
37. References
- [ICP03] Baris Kazar, Shashi Shekhar, and David J. Lilja, "Parallel Formulation of Spatial Auto-Regression," submitted to ICPP 2003, under review
- [IEE02] S. Shekhar, P. Schrater, R. Vatsavai, W. Wu, and S. Chawla, "Spatial Contextual Classification and Prediction Models for Mining Geospatial Data," IEEE Transactions on Multimedia (special issue on Multimedia Databases), 2002
38. Conclusions and Future Work
- We are able to solve the SAR model for any type of W matrix, up to the memory limit, by finding all of its eigenvalues
- Finding eigenvalues is hard
- The sparsity of W should be exploited if the eigenvalue subroutines allow it
- Efficient partitioning of very large sparse matrices via ParMETIS
- New methods will be studied:
- Markov Chain Monte Carlo estimates (parSARmcmc)
- parSALE: parallel spatial auto-regression local estimation
- The characteristic polynomial approach
- Contributions to the spatial statistics package Version 2.0 of Prof. Kelley Pace will continue
39. Short Tutorial on OpenMP
- Fork-join model of parallel execution
- Parallel regions are denoted by directives in Fortran and pragmas in C/C++
- Data environment: private and shared variables (plus firstprivate, lastprivate, threadprivate) and their scopes across parallel and serial regions
- Work-sharing constructs: parallel do, sections (static or dynamic scheduling, with or without chunks)
- Synchronization: atomic, critical, barrier, flush, ordered; implicit synchronization after each parallel do loop
- Run-time library functions, e.g., to determine which thread is executing at a given time
- A minimal example follows
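A minimal example tying these constructs together (a generic illustration, not taken from the tutorial's SAR code):

      program hello
      integer omp_get_thread_num, omp_get_num_threads
      integer i, tid
      double precision s
      s = 0.0d0
c     fork: each thread gets a private tid from a run-time function
c$omp parallel private(tid)
      tid = omp_get_thread_num()
      write (*,*) 'hello from thread ', tid,
     &            ' of ', omp_get_num_threads()
c     work-sharing do with a reduction clause and dynamic scheduling
c$omp do reduction(+:s) schedule(dynamic, 4)
      do 10 i = 1, 1000
         s = s + dble(i)
 10   continue
c$omp end do
c$omp end parallel
c     join: back to serial execution
      write (*,*) 'sum = ', s
      end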
40. Short Tutorial on Eigenvalues
- Let A be a linear transformation represented by a matrix. If there is a nonzero vector X such that
- AX = λX
- for some scalar λ, then λ is called an eigenvalue of A with corresponding (right) eigenvector X
- Eigenvalues are also known as characteristic roots, proper values, or latent roots (Marcus and Minc 1988, p. 144)
- (A - λI)X = 0 is the characteristic equation; the roots of the characteristic polynomial det(A - λI) = 0 are the eigenvalues
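A small worked example (illustrative, not from the slides):

    A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},
    \qquad
    \det(A - \lambda I) = (2-\lambda)^2 - 1 = (\lambda - 1)(\lambda - 3) = 0

so the eigenvalues are λ₁ = 1 and λ₂ = 3, with right eigenvectors X₁ = (1, -1)ᵀ and X₂ = (1, 1)ᵀ, since AX₁ = X₁ and AX₂ = 3X₂.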
41. Hands-On Part
- Please go to
- http://www.cs.umn.edu/kazar/sar/index.html
- Find the 05.14.2003 entry
- Type in shashi for the username
- Type in shashi for the password
- To run the programs, we need to log in to one of the SGI Origins, the IBM SP, or the IBM Regatta (the Cray X1 is not ready yet)
- All programs are run by submitting them to a queue
- Interactive runs are not recommended for the Fortran programs, due to the system load and the high number of processors needed for execution