1. AHPCRC Spatial Data-Mining Tutorial on Scalable Parallel Formulations of Spatial Auto-Regression (SAR) Models for Mining Regular Grid Geospatial Data
- Shashi Shekhar, Baris M. Kazar, David J. Lilja
- EECS Department at University of Minnesota
- Army High Performance Computing Research Center (AHPCRC)
- Minnesota Supercomputing Institute (MSI)
- 05.14.2003
2. Outline
- Motivation for Parallel SAR Models
- Background on the Spatial Auto-Regression Model
- Our Contributions
- Problem Definition and Hypotheses
- Introduction to the SAR Software
- Experimental Design
- Related Work
- Conclusions and Future Work
3. Motivation for Parallel SAR Models
- Linear regression models assume that the learning data samples are independent and identically distributed (i.i.d.)
- Spatial data violate this assumption, so low prediction accuracy results
- The SAR model generalizes the linear regression model with a spatial auto-correlation term
- Incorporating the auto-correlation term results in better prediction accuracy
- However, computational complexity increases due to the logarithm-determinant (a.k.a. Jacobian) term of the maximum likelihood estimation procedure
- Parallel processing can reduce execution time
4. Our Contributions
- This study offers the first (and only available) parallel SAR formulation and evaluates its scalability
- All of the eigenvalues of any type of dense (square) neighborhood matrix can be computed in parallel
- Scalable parallel formulations of spatial auto-regression (SAR) models for 1-D and 2-D location prediction problems on planar surface partitionings, using the eigenvalue computation
- Hand-parallelized EISPACK routines, the pre-parallelized LAPACK-based NAG linear algebra library, and shared-memory parallel programming (i.e., OpenMP) are used
5. Background on the Spatial Auto-Regression Model
- Mixed Regressive Spatial Auto-Regressive (SAR) Model:
- y = ρWy + Xβ + ε, where
- y: n-by-1 vector of observations on the dependent variable
- ρ: spatial auto-regression parameter (coefficient)
- W: n-by-n matrix of spatial weights (i.e., the contiguity or neighborhood matrix)
- X: n-by-k matrix of observations on the explanatory variables
- β: k-by-1 vector of regression coefficients
- I: n-by-n identity matrix
- ε: n-by-1 vector of unobservable error terms, ε ~ N(0, σ²I)
- σ²: common variance of the error term
- When ρ = 0, the SAR model collapses to the classical regression model
6. Background on the SAR Model (Contd.)
- If X = 0 and W2 = 0, the first-order spatial auto-regressive model is obtained:
- y = ρ W1 y + ε,  ε ~ N(0, σ²I)
- If W2 = 0, the mixed regressive spatial auto-regressive model (a.k.a. spatial lag model) is obtained:
- y = ρ W1 y + Xβ + ε,  ε ~ N(0, σ²I)
- If W1 = 0, the regression model with spatial autocorrelation in the disturbances (a.k.a. spatial error model) is obtained:
- y = Xβ + u
- u = ρ W2 u + ε,  ε ~ N(0, σ²I)
7. Problem Definition
- Given
- A sequential solution procedure: the Serial Dense Matrix Approach
- Same as Prof. Bin Li's method, in Fortran 77
- Find
- A faster serial formulation, called the Serial Sparse Matrix Approach
- A new parallel formulation of the Serial Dense Matrix Approach, different from Prof. Bin Li's method in CM-Fortran
- Constraints
- ε ~ N(0, σ²I), i.i.d.
- Raster data (resulting in a binary neighborhood/contiguity matrix W)
- Parallel platform (HW: SGI Origin 3800, IBM SP, IBM Regatta, Cray X1; SW: Fortran, OpenMP)
- Size of W (big vs. small and dense vs. sparse)
- Objective
- Maximize speedup (Tserial/Tparallel)
- Scalability: better than (Li, 1996)'s formulation
8. Hypotheses
- A number of sequential algorithms compute the SAR model, most of which are based on maximum likelihood estimation, which solves for the spatial auto-regression parameter (ρ) and the regression coefficients (β)
- As the problem size grows, the sequential methods become incapable of solving the problem due to
- the extensive number of computations, and
- the large memory requirement
- The new parallel formulation proposed in this study will outperform the previous parallel implementation in terms of
- Speedup (S),
- Scalability,
- Problem size (PS), and
- Memory requirement
9. Serial SAR Model Solution
- Starting with the normal density function
- The maximum likelihood (ML) function
- The explicit form of the maximum likelihood (ML) function
10. Serial SAR Model Solution (Contd.)
- The logarithm of the maximum likelihood function is called the log-likelihood function
- The ML estimates of the SAR parameters
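The slide's equation images did not survive extraction. For reference, the standard SAR log-likelihood from the spatial statistics literature (assumed to match what the slides showed):

    \ln L(\rho,\beta,\sigma^2) \;=\; -\frac{n}{2}\ln(2\pi\sigma^2)
      \;+\; \ln\lvert I - \rho W\rvert
      \;-\; \frac{1}{2\sigma^2}\,(y - \rho Wy - X\beta)^{\mathsf T}(y - \rho Wy - X\beta)

Given all eigenvalues \lambda_i of W, the expensive log-determinant (Jacobian) term reduces to

    \ln\lvert I - \rho W\rvert \;=\; \sum_{i=1}^{n}\ln(1 - \rho\,\lambda_i),

which is why the solution procedure computes the eigenvalues of W once and reuses them for every candidate ρ.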
11System Diagram
B Golden Section Search to find that minimizes
ML function
Pre-processing Step
C Compute and given the best estimate using
least squares
The Symmetric Eigenvalue-equivalent
Neighborhood Matrix
Eigenvalues of W
A Compute Eigenvalues
Calculate the ML function
12. Introduction to the SAR Software
- The following slides describe
- how we implemented the SAR model solution
- an execution trace
- Matlab is good only for small problem sizes
- Memory is not enough for larger problems
- Execution time is too long
- A compiled language such as Fortran is needed for larger problems
- Up to 80 GB of memory can be used on the supercomputers
- Execution time decreased considerably due to parallel processing
13. 1. Pre-processing Step
- It consists of four sub-steps:
- Form the epsilon (ε) vector
- Form W, the row-normalized neighborhood matrix
- Symmetrize W (the result feeds box A)
- Form the y and x vectors of the training data set (the result feeds box B)
14. 1.0 Form the Epsilon (ε) Vector
- Input: p = 4, q = 4, n = pq = 16
- Output: a 16-by-1 column vector ε (epsilon)
- The ε term, i.e., the 16-by-1 vector of unobservable error terms ~ N(0, σ²I) in the SAR equation y = ρWy + Xβ + ε
- The elements are i.i.d. normal random numbers with zero mean and unit standard deviation
- Prof. Isaku Wada's normal random number generator, written in C, is used; it is much better than Matlab's randn function (a sketch of one standard alternative follows)
- εᵀ = [0.9914 -1.2780 -0.4976 1.2271 0.4740 -0.0700 0.0834 -0.8789 0.3378 -1.5901 0.0638 -0.2717 2.3235 -0.2973 -0.1964 0.1416]
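Prof. Wada's generator itself is not reproduced in the tutorial; purely as a hypothetical stand-in, here is a minimal Box-Muller sketch (assuming a Fortran 90 compiler for the random_number intrinsic):

      program formeps
      integer n, i
      parameter (n = 16)
      double precision eps(n), u1, u2, pi
      parameter (pi = 3.141592653589793d0)
c     each pair of uniforms yields two independent N(0,1) draws
      do 10 i = 1, n, 2
         call random_number(u1)
         call random_number(u2)
c        guard against log(0)
         u1 = max(u1, 1.0d-12)
         eps(i) = sqrt(-2.0d0*log(u1)) * cos(2.0d0*pi*u2)
         if (i + 1 .le. n) then
            eps(i+1) = sqrt(-2.0d0*log(u1)) * sin(2.0d0*pi*u2)
         end if
 10   continue
      write (*, '(4f10.4)') (eps(i), i = 1, n)
      end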
15. 1.1 Form W, the Neighborhood Matrix
- Input: p = 4, q = 4, n = pq = 16; neighborhood type (4-neighbors)
- Output: the binary (non-row-normalized) 16-by-16 matrix C; the row sums in a 16-by-1 column vector D_onehalf; the row-normalized 16-by-16 neighborhood matrix W
- The neighborhood matrix W is formed using the 4-neighbor relationship, where (i,j) is the current pixel
- The solution can handle more cases, such as
- 1-D: 2 neighbors
- 2-D: 8, 16, 24, etc. neighbors
16. 1.1 Matrices for a 4-by-4 Regular Grid Space
- [Figure: panels (a), (b), (c)]
- (a) The spatial framework, which is p-by-q (4-by-4), where p may or may not be equal to q
- (b) The pq-by-pq (16-by-16) non-row-normalized (non-stochastic) neighborhood matrix C with 4 nearest neighbors
- (c) The row-normalized version, i.e., W, which is also pq-by-pq (16-by-16). The product pq equals n (16), i.e., the problem size
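A sketch of this construction (hypothetical code, not the tutorial's program; 2-D 4-neighbor case only):

      program formw
      integer p, q, n, i, j, k
      parameter (p = 4, q = 4, n = p*q)
      double precision C(n,n), W(n,n), rowsum
      do 10 i = 1, n
         do 10 j = 1, n
            C(i,j) = 0.0d0
 10   continue
c     pixel (i,j) of the grid maps to row k = (i-1)*q + j
      do 20 i = 1, p
         do 20 j = 1, q
            k = (i-1)*q + j
            if (i .gt. 1) C(k, k-q) = 1.0d0
            if (i .lt. p) C(k, k+q) = 1.0d0
            if (j .gt. 1) C(k, k-1) = 1.0d0
            if (j .lt. q) C(k, k+1) = 1.0d0
 20   continue
c     row-normalize: W(i,j) = C(i,j) / (row sum of C)
      do 40 i = 1, n
         rowsum = 0.0d0
         do 30 j = 1, n
            rowsum = rowsum + C(i,j)
 30      continue
         do 40 j = 1, n
            W(i,j) = C(i,j) / rowsum
 40   continue
      write (*, '(16f6.2)') ((W(i,j), j = 1, n), i = 1, n)
      end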
17. 1.1 Neighborhood Structures
- 1-D: 2-neighborhood
- 2-D: 4-neighborhood
- 2-D: 8-neighborhood
- 2-D: 16-neighborhood
- 2-D: 24-neighborhood
18. 1.2 Symmetrize W
- Input: p, q, n, W, C, D_onehalf
- Output: the 16-by-16 symmetric eigenvalue-equivalent matrix of W
- Matlab programs do not need this step, since the eig function can find the eigenvalues of a dense non-symmetric matrix
- EISPACK's subroutines find all eigenvalues of a symmetric dense matrix; therefore W must be converted to its eigenvalue-equivalent symmetric form
- A short-cut algorithm achieves this task (see the sketch below)
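A likely form of the short-cut (an assumption based on the D_onehalf variable above and the standard similarity-transform result, not taken verbatim from the slide): with D = diag(d_1, ..., d_n) holding the row sums of C, so that W = D^{-1}C,

    \widetilde{W} \;=\; D^{1/2}\,W\,D^{-1/2} \;=\; D^{-1/2}\,C\,D^{-1/2},
    \qquad
    \widetilde{W}_{ij} \;=\; \frac{C_{ij}}{\sqrt{d_i\,d_j}}

W̃ is symmetric because C is, and it shares all eigenvalues with W because it is related to W by a similarity transform.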
19. 1.3 Form the y and x Vectors
- Input: p, q, n, ε, W
- Output: the y and x vectors, each of which is 16-by-1
- They are the synthetic (training) data used to test the SAR model
- The Fortran programs perform this task separately, in another program, using NAG SMP library subroutines
- For n = 16:
- xᵀ = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16]
- yᵀ = [8.2439 7.9720 11.7226 16.4201 17.9623 19.5719 22.8666 24.9425 29.9876 30.6348 35.1528 37.4712 42.1301 42.7142 45.7171 47.8384]
20. 2. Find All of the Eigenvalues of W
- Matlab programs use the eig function, which finds all of the eigenvalues of a non-symmetric dense matrix
- Fortran programs use the tred2 and tql1 EISPACK subroutines, which are the most efficient way to find the eigenvalues
- There are two sub-steps (sketched below):
- Convert the dense symmetric matrix to a tridiagonal matrix
- Find all eigenvalues of the tridiagonal matrix
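A sketch of the two EISPACK calls named above, assuming the classic signatures TRED2(NM,N,A,D,E,Z) and TQL1(N,D,E,IERR); the matrix fill is elided:

      program eigsar
      integer n, nm, ierr
      parameter (n = 16, nm = n)
      double precision a(nm,n), z(nm,n), d(n), e(n)
c     ... fill a with the symmetric eigenvalue-equivalent form of W ...
c     Step 2.1: Householder reduction to tridiagonal form
c     (d = diagonal, e = sub-diagonal, z = accumulated transforms)
      call tred2(nm, n, a, d, e, z)
c     Step 2.2: QL iteration on the tridiagonal matrix; on return,
c     d holds all n eigenvalues in ascending order
      call tql1(n, d, e, ierr)
      if (ierr .ne. 0) write (*,*) 'tql1 failed at index ', ierr
      end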
21. 2.1 Convert the Symmetric Matrix to a Tridiagonal Matrix
- Input: n, the symmetric eigenvalue-equivalent matrix of W
- Output: the diagonal elements of the resulting tridiagonal matrix in the 16-by-1 column vector d, and the sub-diagonal elements in the 16-by-1 column vector e
- This is the Householder transformation, which is only used in the Fortran programs
- This step is the most time-consuming part (99% of the total execution time)
22. 2.2 Find All Eigenvalues of the Tridiagonal Matrix
- Input: the diagonal elements of the tridiagonal matrix in the 16-by-1 column vector d, and the sub-diagonal elements in the 16-by-1 column vector e
- Output: all of the eigenvalues of the neighborhood matrix W
- This is the QL transformation, which is only used in the Fortran programs
- This step is left serial in the Fortran programs
23. 3. Fit for the SAR Parameter rho (ρ)
- There are two subroutines in this step:
- Calculating the constant statistics terms
- Saves time by pre-computing the terms that do not change during the golden section search
- Golden section search
- Similar to binary search
24. 3.1 Calculating the Constant Statistics Terms
- Input: x, y, n, W
- Output: the KY and KWY column vectors
- The Fortran programs use this subroutine to perform the K-optimization, where the constant term K = I - x((xᵀx)⁻¹)xᵀ is 16-by-16 for n = 16
- The second term in the log-likelihood expression:
- yᵀ(I - ρW)ᵀ [I - x((xᵀx)⁻¹)xᵀ]ᵀ [I - x((xᵀx)⁻¹)xᵀ] (I - ρW) y
- = yᵀ(I - ρW)ᵀ Kᵀ K (I - ρW) y
- = (K(I - ρW)y)ᵀ (K(I - ρW)y)
- = (Ky - ρKWy)ᵀ (Ky - ρKWy)
- which saves many matrix-vector multiplications
- Matlab programs directly calculate all (constant and non-constant) terms of the log-likelihood function over and over again, so they do not need this operation
- Those terms are not expensive to calculate in Matlab, but they are expensive in Fortran
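The simplification works because K is the least-squares "annihilator" (projection) matrix, which is symmetric and idempotent; restated in equation form (a standard linear-algebra fact, not shown on the slide):

    K = I - x(x^{\mathsf T}x)^{-1}x^{\mathsf T}, \qquad
    K^{\mathsf T} = K, \qquad K^2 = K
    \;\;\Longrightarrow\;\; K^{\mathsf T}K = K

    y^{\mathsf T}(I-\rho W)^{\mathsf T} K^{\mathsf T} K (I-\rho W)\, y
      \;=\; (Ky - \rho\, KWy)^{\mathsf T}(Ky - \rho\, KWy)

Hence Ky and KWy are computed once before the search, and each candidate ρ inside the golden section search costs only a few vector operations.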
25. 3.2 Golden Section Search
- Input: x, y, n, eigenn, W, KY, KWY, ax, bx, cx, tol
- Output: fgld, xmin, niter
- The best estimate for rho (xmin) is found
- The optimization function is the log-likelihood function
- Similar to binary search and the bisection method (see the sketch below)
- The detailed outputs are given in the execution traces
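A minimal sketch of golden section search, modeled on the classic routine (hypothetical code; the argument names ax, bx, cx, tol, and xmin mirror the slide, and the function f stands in for the log-likelihood evaluation, which is not reproduced here):

      double precision function golden(ax, bx, cx, tol, f, xmin)
      double precision ax, bx, cx, tol, xmin, f
      external f
      double precision r, c, x0, x1, x2, x3, f1, f2
      parameter (r = 0.6180339887498949d0, c = 1.0d0 - r)
c     bracket [x0,x3] with interior points x1, x2 in golden ratio
      x0 = ax
      x3 = cx
      if (abs(cx - bx) .gt. abs(bx - ax)) then
         x1 = bx
         x2 = bx + c*(cx - bx)
      else
         x2 = bx
         x1 = bx - c*(bx - ax)
      end if
      f1 = f(x1)
      f2 = f(x2)
c     shrink the bracket until it is narrower than tol
 10   if (abs(x3 - x0) .gt. tol*(abs(x1) + abs(x2))) then
         if (f2 .lt. f1) then
            x0 = x1
            x1 = x2
            x2 = r*x1 + c*x3
            f1 = f2
            f2 = f(x2)
         else
            x3 = x2
            x2 = x1
            x1 = r*x2 + c*x0
            f2 = f1
            f1 = f(x1)
         end if
         go to 10
      end if
      if (f1 .lt. f2) then
         xmin = x1
         golden = f1
      else
         xmin = x2
         golden = f2
      end if
      end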
26. 4. Calculate Beta (β) and Sigma (σ)
- Input: x, y, n, niter, rho_cap, W, KY, KWY
- Output: beta_cap and sigmasqr_cap, which are both scalars
- niter (the number of iterations) and rho_cap are scalars
- Calculates the best estimates of β and σ²
- The formulas are given below
- As n (i.e., the problem size) increases, the estimates become more accurate
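The slide's formula images did not survive extraction; the standard ML estimates given the fitted ρ̂ (from the spatial statistics literature, assumed to match what the slide showed) are:

    \hat{\beta} \;=\; (x^{\mathsf T}x)^{-1}x^{\mathsf T}(I - \hat{\rho}W)\,y,
    \qquad
    \hat{\sigma}^2 \;=\; \frac{1}{n}\,
      \bigl((I - \hat{\rho}W)\,y - x\hat{\beta}\bigr)^{\mathsf T}
      \bigl((I - \hat{\rho}W)\,y - x\hat{\beta}\bigr)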
27. Sample Output from the Matlab Programs (n = 16)
- Comparison of methods:
- Row 1: dense straight log-det calculation
- Row 2: log-det calculation via exact eigenvalue computation
- Row 3: log-det calculation via approximate eigenvalue computation (Griffith)
- Row 4: log-det calculation via better approximate eigenvalue computation (Griffith)
- Row 5 (forthcoming): Markov Chain Monte Carlo approximated SAR
- Actual values: rho = 0.412, beta = 1.91

  ML value   min_log   rho_cap   niter     beta_cap   sigma_sqr
  2.6687     -1.0000   0.4014    77.0000   1.9465     0.8509
  2.6687     -1.0000   0.4014    77.0000   1.9465     0.8509
  2.6645     -1.0000   0.4018    77.0000   1.9452     0.8508
  2.6641     -1.0000   0.4019    77.0000   1.9451     0.8508
28. Sample Output from the Matlab Programs (n = 2500)
- Comparison of methods:
- Row 1: dense straight log-det calculation
- Row 2: log-det calculation via exact eigenvalue computation
- Row 3: log-det calculation via approximate eigenvalue computation (Griffith)
- Row 4: log-det calculation via better approximate eigenvalue computation (Griffith)
- Row 5 (forthcoming): Markov Chain Monte Carlo approximated SAR
- Actual values: rho = 0.412, beta = 1.91

  ML value   min_log   rho_cap   niter     beta_cap   sigma_sqr
  7.8685     -1.0000   0.4127    77.0000   1.9079     0.9985
  7.8685     -1.0000   0.4127    77.0000   1.9079     0.9985
  7.8683     -1.0000   0.4127    77.0000   1.9079     0.9985
  7.8681     -1.0000   0.4127    77.0000   1.9079     0.9985
29. Instructions on How to Run the Matlab Code
- On the SGI Origins:
- train_test_SAR (interactively)
- matlab -nodisplay < train_test_SAR.m > output
- On the IBM SP: same
- On the IBM Regatta: same
- On the Cray X1: not available yet
30. Instructions on How to Run the Fortran Code
- On the SGI Origins with 4 threads (processors):
- f77 -64 -Ofast -mp sar_exactlogdeteigvals_2DN4_2500.f
- setenv OMP_NUM_THREADS 4
- time a.out
- On the IBM SP with 4 threads (processors):
- xlf_r -O3 -qstrict -q64 -qsmp=omp sar_exactlogdeteigvals_2DN4_2500.f
- setenv OMP_NUM_THREADS 4
- time a.out
- On the IBM Regatta with 4 threads (processors):
- xlf_r -O3 -qstrict -q64 -qsmp=omp sar_exactlogdeteigvals_2DN4_2500.f
- setenv OMP_NUM_THREADS 4
- time a.out
- On the Cray X1: coming soon
31. Illustration on a 2-by-2 Regular Grid Space
32. Summary of the SAR Software, i.e., the Big Picture
33. New Parallel SAR Model Solution Formulation
- Guided by Prof. Mark Bull's "Expert Programmer vs. Parallelizing Compiler" paper (Scientific Programming, 1996)
- Loops 240 and 280 are the major bottlenecks; parallelizing them parallelizes most of the code, as will be shown
- The data distribution for both loops should be similar, to benefit from value re-use
- Loop 280 cannot benefit from block-wise partitioning; it needs interleaved scheduling for load balance. Thus, both loops use interleaved scheduling
- Parallelizing the initialization phase imitates manual data distribution via the page-placement and page-migration utilities of the SGI Origin machines
- The variable etemp enables a reduction operation on the variable e, which is updated by different processors (see the sketch below)
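A sketch of the scheduling-plus-reduction pattern described above (hypothetical code; the loop labels 240/280 and the names e and etemp come from the slide, but the loop bodies are placeholders):

      program redux
      integer n, j, k
      parameter (n = 16)
      double precision a(n,n), e, etemp
c     placeholder data so the example runs stand-alone
      do 110 j = 1, n
         do 100 k = 1, n
            a(k,j) = 1.0d0
 100     continue
 110  continue
      e = 0.0d0
c$omp parallel private(etemp, k)
      etemp = 0.0d0
c     interleaved (cyclic) distribution of iterations for load balance
c$omp do schedule(static, 1)
      do 240 j = 1, n
         do 230 k = 1, n
            etemp = etemp + a(k,j)
 230     continue
 240  continue
c$omp end do
c     fold each thread's private partial sum into the shared e
c$omp atomic
      e = e + etemp
c$omp end parallel
      write (*,*) 'e = ', e
      end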
34. Experimental Design: Response Variables and Factors
- Speedup: S = Tserial / Tparallel
- The time taken to solve a problem on a single processor, over the time to solve the same problem on a parallel computer with p identical processors
- Scalability of a parallel system:
- The measure of the algorithm's capacity to increase S in proportion to p on a particular parallel system
- Scalable systems have the ability to maintain efficiency (i.e., E = S / p) at a fixed value by simultaneously increasing p and PS
- Reflects a parallel system's ability to utilize increasing processing resources effectively
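For concreteness (hypothetical numbers, for illustration only, not measurements from this study): if Tserial = 400 s and Tparallel = 125 s on p = 4 processors, then

    S = \frac{T_{\mathrm{serial}}}{T_{\mathrm{parallel}}} = \frac{400}{125} = 3.2,
    \qquad
    E = \frac{S}{p} = \frac{3.2}{4} = 0.8

i.e., such a run would retain 80% efficiency on 4 processors.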
  Name of Factor             Parameter Domain
  Problem size (n)           400, 2500, 6400, 10000
  Neighborhood structure     2-D, 4-neighbors
  Range of rho               [0, 1)
  Parallelization options    static scheduling for boxes B and C; dynamic with chunk size 4 for box A
  Number of processors       1, ..., 16
  Algorithm used             Householder transformation, followed by the QL algorithm, followed by golden section search
35. Related Work
- Eigenvalue software on the web is surveyed in depth at the following sites:
- http://www-users.cs.umn.edu/kazar/sar/sar_presentations/eigenvalue_solvers.html
- http://www.netlib.org/utk/people/JackDongarra/la-sw.html
- (Griffith, 1995) computed the eigenvalues analytically for the 1-D case and the 2-D 4- and 8-neighbor cases; however, this is tedious for other cases
- The approximate and better approximate eigenvalues are from this source
- However, there is no closed-form expression for many other cases
- (Bin Li, 1996) implemented parallel SAR using a constant mean model
- Those programs can no longer run (CM-Fortran)
- Our model is more general and harder to solve
- Our programs are the only running implementations in the literature
36. SAR Model Solutions are Cross-Classified
37. References
- [ICP03] Baris Kazar, Shashi Shekhar, and David J. Lilja, "Parallel Formulation of Spatial Auto-Regression," submitted to ICPP 2003, under review
- [IEE02] S. Shekhar, P. Schrater, R. Vatsavai, W. Wu, and S. Chawla, "Spatial Contextual Classification and Prediction Models for Mining Geospatial Data," IEEE Transactions on Multimedia (special issue on Multimedia Databases), 2002
38. Conclusions and Future Work
- We are able to solve the SAR model for any type of W matrix, up to the memory limit, by finding all of its eigenvalues
- Finding eigenvalues is hard
- The sparsity of W should be exploited if the eigenvalue subroutines allow it
- Efficient partitioning of very large sparse matrices via ParMETIS
- New methods will be studied:
- Markov Chain Monte Carlo estimates (parSARmcmc)
- parSALE: parallel spatial auto-regression local estimation
- The characteristic polynomial approach
- Contributions to the spatial statistics package Version 2.0 of Prof. Kelley Pace will continue
39. Short Tutorial on OpenMP
- Fork-join model of parallel execution
- Parallel regions are denoted by directives in Fortran and pragmas in C/C++
- Data environment: private and shared variables (plus firstprivate, lastprivate, threadprivate) and their scopes across parallel and serial regions
- Work-sharing constructs: parallel do, sections (static or dynamic scheduling, with or without chunks)
- Synchronization: atomic, critical, barrier, flush, ordered; implicit synchronization after each parallel do loop
- Run-time library functions, e.g., to determine which thread is executing at a given time
- A minimal example follows
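A minimal example tying these constructs together (a generic illustration, not taken from the tutorial's SAR code):

      program hello
      integer omp_get_thread_num, omp_get_num_threads
      integer i, tid
      double precision s
      s = 0.0d0
c     fork: each thread gets a private tid from a run-time function
c$omp parallel private(tid)
      tid = omp_get_thread_num()
      write (*,*) 'hello from thread ', tid,
     &            ' of ', omp_get_num_threads()
c     work-sharing do with a reduction clause and dynamic scheduling
c$omp do reduction(+:s) schedule(dynamic, 4)
      do 10 i = 1, 1000
         s = s + dble(i)
 10   continue
c$omp end do
c$omp end parallel
c     join: back to serial execution
      write (*,*) 'sum = ', s
      end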
40. Short Tutorial on Eigenvalues
- Let A be a linear transformation represented by a matrix. If there is a nonzero vector X such that
- AX = λX
- for some scalar λ, then λ is called an eigenvalue of A with corresponding (right) eigenvector X
- Eigenvalues are also known as characteristic roots, proper values, or latent roots (Marcus and Minc 1988, p. 144)
- (A - λI)X = 0 is the characteristic equation; the roots of the characteristic polynomial det(A - λI) = 0 are the eigenvalues
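A small worked example (illustrative, not from the slides):

    A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},
    \qquad
    \det(A - \lambda I) = (2-\lambda)^2 - 1 = (\lambda - 1)(\lambda - 3) = 0

so the eigenvalues are λ₁ = 1 and λ₂ = 3, with right eigenvectors X₁ = (1, -1)ᵀ and X₂ = (1, 1)ᵀ, since AX₁ = X₁ and AX₂ = 3X₂.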
41. Hands-On Part
- Please go to
- http://www.cs.umn.edu/kazar/sar/index.html
- Find the 05.14.2003 entry
- Type in shashi for the username
- Type in shashi for the password
- To run the programs, we need to log in to one of the SGI Origins, the IBM SP, or the IBM Regatta (the Cray X1 is not ready yet)
- All programs are run by submitting them to a queue
- Interactive runs are not recommended for the Fortran programs, due to the system load and the high number of processors needed for execution