Towards Personal High-Performance Geospatial Computing (HPC-G): Perspectives and a Case Study

1
Towards Personal High-Performance Geospatial Computing (HPC-G):
Perspectives and a Case Study

Jianting Zhang
Department of Computer Science, the City College of New York
jzhang_at_cs.ccny.cuny.edu

2
Outline

Introduction
Geospatial Data, GIS, Spatial Databases and HPC
  Geospatial data: what's special?
  GIS: impacts of hardware architectures
  Spatial Databases: parallel DB or MapReduce?
  HPC: many options
Personal HPC-G: A New Framework
  Why Personal HPC for geospatial data?
  GPGPU Computing: a brief introduction
  Pipelining CPU and GPU workloads for performance
  Parallel GIS prototype development strategies
A Case Study: Geographically Weighted Regression
Summary and Conclusions

3
Ecological Informatics

Computer Science
Spatial Databases
Mobile Computing
Data Mining

Geography
GIS Applications
Remote Sensing

4
Introduction: Personal Stories

Computationally intensive problems in geospatial data processing
Distributed hydrological modeling/flood simulation
Satellite image processing: clustering/classification (multi-/hyper-spectral)
Identifying and tracking storms from time-series NEXRAD images
Species distribution modeling (e.g., regression/GA-based)
History of access to HPC resources
1994: Simulating a 33-hour flood on a PC (33 MHz/4 MB) took 50 hours
2000: A Cray machine was available, but a special arrangement was required to access it while taking a course (Parallel and Distributed Processing)
2004-2007: HPC resources at SDSC were available to the SEEK project, but the project ended up only using SRB for data/metadata storage
2009-2010: An Nvidia Quadro FX 3700 GPU card (that came with a Dell workstation) gave a 23X speedup after porting a serial CPU codebase (for SSDBM'10) to the CUDA platform (ACM-GIS'10)

5
Two books that changed my research focus (as a database person)

2nd edition -> 4th edition
http://courses.engr.illinois.edu/ece498/al/
As well as a few visionary database research papers

David J. DeWitt, Jim Gray: Parallel Database Systems: The Future of High Performance Database Systems. Commun. ACM 35(6): 85-98 (1992)
Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, David A. Wood: DBMSs on a Modern Processor: Where Does Time Go? VLDB 1999: 266-277
J. Cieslewicz and K.A. Ross: Database Optimizations for Modern Hardware. Proceedings of the IEEE, 96(5), 2008

6
Introduction: PGIS in Traditional HPC Environment
A. Clematis, M. Mineter, and R. Marciano. High performance computing with geographical data. Parallel Computing, 29(10):1275-1279, 2003

Despite all these initiatives the impact of parallel GIS research has remained slight
the anticipated performance plateau became a mountain still being scaled
GIS companies found that, other than for concurrency in databases, their markets did not demand multi-processor performance.
While computing in general demands less of its users, HPC has demanded more: the barriers to use remain high and the range of options has increased
fundamental problem remains the fact that creating parallel GIS operations is non-trivial and there is a lack of parallel GIS algorithms, application libraries and toolkits.

If parallel GIS runs in a personal computing environment, to what degree will these conclusions change?

7
Introduction: PGIS in Personal Computing Environment

Every personal computer is now a parallel machine
Chip-Multiprocessors (CMP): dual-core, quad-core, six-core CPUs
Intel Xeon E5520: $379.99
4 cores/8 threads, 2.26 GHz, 80 W
4×256 KB L2 cache, 8 MB L3 cache
Max memory bandwidth: 25.6 GB/s
Massively parallel GPGPU computing: hundreds of GPU cores in a GPU card
Nvidia GTX 480: $499.99
480 cores (15×1024 threads), 700/1401 MHz, 250 W
About 1.35 TFlops
15×32768 registers, 15×64 KB shared memory/L1 cache, 768 KB L2 cache, additional constant/texture memory
1.5 GB GDDR5, 1848 MHz clock rate, 384-bit memory interface width, 177.4 GB/s memory bandwidth

If this parallel computing power is fully utilized, to what degree can a personal workstation match a traditional cluster for geospatial data processing?
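
The peak figures quoted for the GTX 480 can be cross-checked with simple arithmetic. The sketch below is only a back-of-the-envelope check, not a benchmark. It assumes one fused multiply-add (two floating-point operations) per core per shader clock, and two data transfers per cycle of the quoted 1848 MHz memory clock. Both factors, and the function names, are assumptions made for illustration.

```python
def peak_gflops(cores, shader_clock_mhz, flops_per_cycle=2):
    """Theoretical peak in GFlops.

    Assumption: each core retires one fused multiply-add
    (2 floating-point operations) per shader clock cycle.
    """
    return cores * shader_clock_mhz * flops_per_cycle / 1000.0


def memory_bandwidth_gbs(clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Theoretical memory bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumption: two transfers per cycle of the quoted memory clock.
    """
    bytes_per_transfer = bus_width_bits / 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


# Numbers taken from the slide: 480 cores at 1401 MHz, 1848 MHz memory clock, 384-bit bus.
gtx480_gflops = peak_gflops(480, 1401)
gtx480_bandwidth = memory_bandwidth_gbs(1848, 384)

print(f"GTX 480 peak: {gtx480_gflops / 1000:.2f} TFlops")
print(f"GTX 480 memory bandwidth: {gtx480_bandwidth:.1f} GB/s")
```

Under these assumptions the results agree with the roughly 1.35 TFlops and 177.4 GB/s listed on the slide. Sustained performance on real geospatial workloads will be lower.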

8
Geospatial data: what's special?

The slowest processing unit determines the
overall performance in parallel computing
Real-world data are very often skewed

Wavelet compressed raster data
Clustered Point data

9
Geospatial data: what's special?

Techniques to handle skewness
data decomposition/partition -> spatial indexing
task scheduling

The complexity of task scheduling grows fast with the number of tasks, and generic scheduling heuristics may not always produce good results
Simple equal-size partition may work well for local operations, but may not for focal, zonal and global operations, which require more sophisticated partitions to achieve load balancing
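
The effect of skew on a simple equal-size partition can be illustrated with a small experiment. This is a minimal sketch on a hypothetical synthetic dataset, not a real one: 90% of the points fall inside a small cluster, and the workload of a strip is taken to be its point count. All function names are illustrative.

```python
import random


def make_clustered_points(n, seed=0):
    """Hypothetical skewed dataset in the unit square: 90% of points in one small cluster."""
    rng = random.Random(seed)
    points = []
    for i in range(n):
        if i % 10 == 0:  # 10% background points, uniform over the unit square
            points.append((rng.random(), rng.random()))
        else:            # 90% inside a 0.1 x 0.1 cluster near the origin
            points.append((0.1 * rng.random(), 0.1 * rng.random()))
    return points


def equal_size_loads(points, k):
    """Cut [0, 1) into k vertical strips of equal width and count points per strip."""
    loads = [0] * k
    for x, _ in points:
        loads[min(int(x * k), k - 1)] += 1
    return loads


def equal_count_loads(points, k):
    """Sort by x and cut into k strips holding (almost) the same number of points."""
    xs = sorted(x for x, _ in points)
    n = len(xs)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [len(xs[bounds[i]:bounds[i + 1]]) for i in range(k)]


def imbalance(loads):
    """Busiest worker's load divided by the average load (1.0 means perfectly balanced)."""
    return max(loads) / (sum(loads) / len(loads))


points = make_clustered_points(10_000)
size_loads = equal_size_loads(points, 8)
count_loads = equal_count_loads(points, 8)
print("equal-size strips :", size_loads, "imbalance", round(imbalance(size_loads), 2))
print("equal-count strips:", count_loads, "imbalance", round(imbalance(count_loads), 2))
```

Because the slowest worker determines the overall run time, the imbalance ratio is a rough estimate of how much slower the equal-size partition would run than a balanced one when the work is proportional to the number of points.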

10
GIS: impacts of hardware architectures

GIS has been evolving along with mainstream information technologies
Major platform shift from Unix workstations to Windows PCs in the early 1990s
The marriage with Web technologies to create Web-GIS in the late 1990s
Will GIS naturally evolve from serial to parallel as computers evolve from uniprocessor to chip multiprocessor?
What can the community do to speed up the evolution?

11
GIS: impacts of hardware architectures

Three roles of GIS
data management
information visualization
modeling support
GIS-based spatial modeling, such as agent-based modeling, is naturally suitable for HPC
Computationally intensive
Adopts a raster tessellation and mostly involves local operations and/or focal operations with a small constant number of neighbors: parallelization-friendly or even embarrassingly parallel
Runs in an offline mode and uses traditional GIS for visualization
How can we make full use of hardware and support data management and information visualization more efficiently and effectively?

12
HPC: many options

The combination of architectural and organizational enhancements led to 16 years of sustained growth in performance at an annual rate of 50% from 1986 to 2002. Due to the combined power, memory and instruction-level parallelism problem, the growth rate dropped to about 20% per year from 2002 to 2006.
In 2004, Intel cancelled its high-performance uniprocessor projects and joined IBM and Sun to declare that the road to higher performance would be via multiple processors per chip (or Chip Multiprocessors, CMP) rather than via faster uniprocessors.
As a marketing strategy, Nvidia calls a personal computer equipped with one or more of its high-end GPGPU cards a personal supercomputer.
Nvidia claimed that, when compared to the latest quad-core CPU, Tesla 20-series GPU computing processors deliver equivalent performance at 1/20th of the power consumption and 1/10th of the cost.

13
HPC: many options

1. CPU Multi-cores
2. GPU Many-cores
3. CPU Multi-nodes (traditional HPC)
4. CPU+GPU Multi-nodes (2+3)

How about 1+2? -> Personal HPC
Affordable and dedicated personal computing environment
No additional cost: use it or waste it
Excellent visualization and user interaction support
Can be the last mile of a larger cyberinfrastructure
Data structures/algorithms/software are critical to its success

14
Personal HPC-G: A New Framework

Additional arguments for Personal HPC for geospatial data
While some geospatial data processing tasks are computationally intensive, many more are data intensive in nature
Distributing large data chunks incurs significant network and disk I/O overheads (50-100 MB/s)
Make full use of the high interface bandwidths between CPU cores and memory (10-30 GB/s), CPU memory and GPU memory (8 GB/s), and GPU cores and memory (100-200 GB/s)
The improved CPU+GPU performance will not only solve old problems faster but also allow many traditionally offline data processing tasks to run online in an interactive manner. The uninterrupted exploration processes are likely to facilitate novel scientific discoveries more effectively.
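
To put the quoted bandwidths side by side, the sketch below estimates how long it would take to stream a dataset once over each path. The 4 GB dataset size is a hypothetical value chosen only for illustration, 1 GB is taken as 1000 MB, and latency and protocol overheads are ignored.

```python
# Bandwidth ranges quoted on the slide, in MB/s, as (low, high).
BANDWIDTH_MBS = {
    "network/disk I/O": (50, 100),
    "CPU cores <-> memory": (10_000, 30_000),
    "CPU memory <-> GPU memory": (8_000, 8_000),
    "GPU cores <-> memory": (100_000, 200_000),
}

DATASET_MB = 4_000  # hypothetical 4 GB dataset, an assumption for illustration


def transfer_seconds(size_mb, bandwidth_mbs):
    """Time to stream size_mb megabytes at a sustained bandwidth (latency ignored)."""
    return size_mb / bandwidth_mbs


# For each path: (best case at the high bandwidth, worst case at the low bandwidth).
times = {
    name: (transfer_seconds(DATASET_MB, high), transfer_seconds(DATASET_MB, low))
    for name, (low, high) in BANDWIDTH_MBS.items()
}

for name, (best, worst) in times.items():
    print(f"{name:28s} {best:8.3f} s to {worst:8.3f} s")
```

Under these assumptions, moving the data over the network or from disk takes tens of seconds, while every in-machine path finishes in well under a second.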

15
Why Personal HPC for geospatial data?
High-level comparisons among Cluster Computing, Cloud Computing and Personal HPC

                                Cluster Computing   Cloud Computing   Personal HPC
Initial cost                    High                Low               Low
Operational cost                High                Medium            Low
End user control                Low                 High              High
Theoretical scalability         High                High              Medium
User code development           Medium              Low               High
Data management                 Low                 Medium            Medium
Numeric modeling                High                Medium            High
Interaction and visualization   Low                 Low               High

16
Spatial Databases: parallel DB or MapReduce?

Spatial databases: GIS without GUI
Learn lessons from relational databases on parallelization
The debates between Parallel DB and MapReduce
The emergence of hybrid approaches (e.g., HadoopDB)
While parallel processing of geospatial data to achieve high performance has been a research topic for quite a while, neither Parallel DB nor MapReduce has been extensively applied to practical large-scale geospatial data management
Call for pilot studies experimenting with the two approaches to provide insights for future synthesis

17
GPGPU Computing

Nvidia CUDA: Compute Unified Device Architecture
AMD/ATI: Stream Computing

18
Parallel GIS prototype development strategies

We envision that Personal HPC-G provides an
opportunity to evolve traditional GIS to parallel
GIS gradually. Community research and development
efforts are needed to speed up the evolution.
First, we propose to learn from existing parallel geospatial data processing algorithms and adapt them to CMP CPU and GPU architectures.
Second, we suggest studying existing GIS modules (e.g., ArcGIS geoprocessing tools) carefully, identifying the most frequently used ones and developing parallel code for multicore CPUs and many-core GPUs.
Third, while existing database research on CMP CPU and GPU architectures is still relatively limited, it can be the starting point for investigating how geospatial data management can be realized on the new architectures and their hybridization.
Finally, we suggest reusing existing CMP- and GPU-based software codebases developed by the computer vision and computer graphics communities.

19
GWR Case Study

A conceptual design for efficiently implementing GWR on the CUDA GPGPU computing architecture (preliminary in nature)
Being realized by a master's student at CCNY
Good C/C++ programming skills
New to GPGPU/CUDA programming
Being supported 5 hours per week through a tiny grant (an experiment on what $2000 can contribute to PGIS development)

20
GWR Case Study

GWR extends the traditional regression framework by allowing local parameters to be estimated
Given a neighborhood definition (or bandwidth) of a data item, a traditional regression can be applied to the data items that fall into the neighborhood or region.
The correlation coefficients for all the geo-referenced data items (raster cells or points) form a scalar field that can be visualized and interactively explored
By interactively changing some GWR parameters (e.g., bandwidth) and visually exploring the changes in the corresponding scalar fields, users can gain a better understanding of the distributions of GWR statistics and of the original dataset.
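
The neighborhood idea described above can be sketched for point data as follows. This is a minimal serial illustration and not the CUDA design of the case study. It assumes a single independent variable and an unweighted neighborhood in which every point within the bandwidth counts equally, whereas GWR implementations commonly apply distance-decay kernel weights. The data values and function name are hypothetical.

```python
from math import hypot


def local_regression(items, bandwidth):
    """Fit y = a + b*x separately around every data item.

    items: list of (px, py, x, y), where (px, py) is the location and
    x, y are the independent and dependent variable values.
    Returns one (a, b) per item, or None where the fit is undefined.
    """
    results = []
    for cx, cy, _, _ in items:
        # Neighborhood: all items within `bandwidth` of the current item (itself included).
        nb = [(x, y) for px, py, x, y in items if hypot(px - cx, py - cy) <= bandwidth]
        n = len(nb)
        sx = sum(x for x, _ in nb)
        sy = sum(y for _, y in nb)
        sxx = sum(x * x for x, _ in nb)
        sxy = sum(x * y for x, y in nb)
        denom = n * sxx - sx * sx
        if n < 2 or denom == 0:
            results.append(None)
            continue
        b = (n * sxy - sx * sy) / denom  # local slope
        a = (sy - b * sx) / n            # local intercept
        results.append((a, b))
    return results


# Hypothetical data: two spatial clusters with opposite x-y relationships.
items = [
    (0, 0, 1, 3), (0, 1, 2, 5), (1, 0, 3, 7), (1, 1, 4, 9),          # y = 2x + 1
    (10, 10, 1, 4), (10, 11, 2, 3), (11, 10, 3, 2), (11, 11, 4, 1),  # y = -x + 5
]
local_fits = local_regression(items, bandwidth=2.0)     # each cluster fitted separately
global_fits = local_regression(items, bandwidth=100.0)  # every neighborhood is the whole dataset
print(local_fits[0], local_fits[4], global_fits[0])
```

With the small bandwidth the two clusters yield different local parameters, while the large bandwidth collapses to a single global regression. Changing the bandwidth interactively explores exactly this difference.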

21
GWR is computationally intensive
[Figure: two example raster grids, one for the dependent variable and one for the independent variable]
Using an n×n moving window to compute correlation coefficients (n = 3). The correlation coefficient at the dotted cell is r = 0.84
Point data are usually clustered which makes
load-balancing very difficult
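
A moving-window correlation over two rasters can be sketched serially as below. This is a plain CPU illustration under stated assumptions, not the GPU implementation: edge cells whose window would extend beyond the raster are left undefined, and the small grids are hypothetical values rather than the ones shown on the slide.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient, or None if either variable is constant."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x <= 0 or var_y <= 0:
        return None
    return (n * sxy - sx * sy) / sqrt(var_x * var_y)


def moving_window_r(dep, indep, n=3):
    """Correlation of dep vs. indep in an n x n window centered on each cell.

    Cells whose window does not fit inside the raster are set to None.
    """
    rows, cols = len(dep), len(dep[0])
    half = n // 2
    out = [[None] * cols for _ in range(rows)]
    for i in range(half, rows - half):
        for j in range(half, cols - half):
            window = [(a, b)
                      for a in range(i - half, i + half + 1)
                      for b in range(j - half, j + half + 1)]
            xs = [indep[a][b] for a, b in window]
            ys = [dep[a][b] for a, b in window]
            out[i][j] = pearson_r(xs, ys)
    return out


# Hypothetical 4 x 4 rasters; the dependent raster is an exact linear function of the other.
indep = [[1, 2, 3, 4], [2, 4, 6, 8], [3, 6, 9, 12], [4, 8, 12, 16]]
dep = [[2 * v + 1 for v in row] for row in indep]
r_field = moving_window_r(dep, indep, n=3)
print(r_field[1][1], r_field[0][0])
```

Every output cell depends only on its own window, so the cells can be computed independently of one another, which is what makes the raster case a natural fit for many-core GPUs.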

22
GWR Case Study: Overall Design

23
GWR Case Study: From partial to total statistics
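Let $S_1 = n\sum x_i y_i$, $S_2 = \sum x_i$, $S_3 = \sum y_i$, $S_4 = n\sum x_i^2$ and $S_5 = n\sum y_i^2$; then $f$ can be computed from $n$ and $S_1$ through $S_5$. Assuming that data items $D_1, D_2, \ldots, D_n$ are divided into $m$ groups and each group $j$ has computed its partial statistics $n_j$, $S_{1j}$, $S_{2j}$, $S_{3j}$, $S_{4j}$ and $S_{5j}$, then $f$ can be computed from them as follows ($j = 1, \ldots, m$):

$n = \sum_j n_j$, $S_1 = n\sum_j (S_{1j}/n_j)$, $S_2 = \sum_j S_{2j}$, $S_3 = \sum_j S_{3j}$, $S_4 = n\sum_j (S_{4j}/n_j)$, $S_5 = n\sum_j (S_{5j}/n_j)$.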
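
The merge rule above can be checked numerically. The sketch below assumes that f is the Pearson correlation coefficient, computed as (S1 - S2*S3) / sqrt((S4 - S2^2) * (S5 - S3^2)); that reading of f, the function names and the sample data are all assumptions made for illustration. Each group must contain at least one item.

```python
from math import sqrt


def partial_stats(xs, ys):
    """Return (n, S1, S2, S3, S4, S5) for one group of data items."""
    n = len(xs)
    return (
        n,
        n * sum(x * y for x, y in zip(xs, ys)),  # S1 = n * sum(x*y)
        sum(xs),                                 # S2 = sum(x)
        sum(ys),                                 # S3 = sum(y)
        n * sum(x * x for x in xs),              # S4 = n * sum(x^2)
        n * sum(y * y for y in ys),              # S5 = n * sum(y^2)
    )


def combine(partials):
    """Merge per-group statistics into the statistics of the whole dataset."""
    n = sum(p[0] for p in partials)
    s1 = n * sum(p[1] / p[0] for p in partials)
    s2 = sum(p[2] for p in partials)
    s3 = sum(p[3] for p in partials)
    s4 = n * sum(p[4] / p[0] for p in partials)
    s5 = n * sum(p[5] / p[0] for p in partials)
    return (n, s1, s2, s3, s4, s5)


def correlation(stats):
    """Assumed form of f: the Pearson correlation coefficient from S1..S5."""
    _, s1, s2, s3, s4, s5 = stats
    return (s1 - s2 * s3) / sqrt((s4 - s2 * s2) * (s5 - s3 * s3))


# Hypothetical data split into two groups of unequal size.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
total = partial_stats(xs, ys)
combined = combine([partial_stats(xs[:2], ys[:2]), partial_stats(xs[2:], ys[2:])])
print(correlation(total), correlation(combined))
```

Because each group contributes only six numbers, groups can be processed by separate thread blocks or CPU cores and merged at the end in a cheap final step, regardless of how unevenly the data items are distributed among the groups.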

24
Summary and Conclusions

We aimed to introduce a new HPC framework for processing geospatial data in a personal computing environment, i.e., Personal HPC-G.
We argued that the fast-increasing hardware capacities of modern personal computers equipped with chip multiprocessor CPUs and massively parallel GPU devices have made Personal HPC-G an attractive alternative to traditional cluster computing and newly emerging cloud computing for geospatial data processing.
We used a parallel design of GWR on Nvidia CUDA-enabled GPU devices as an example to discuss how Personal HPC-G can be utilized to realize parallel GIS modules through synergistic software and hardware co-programming.