Towards Personal High-Performance Geospatial Computing (HPC-G): Perspectives and a Case Study


1
Towards Personal High-Performance Geospatial
Computing (HPC-G): Perspectives and a Case Study
  • Jianting Zhang
  • Department of Computer Science, the City College
    of New York, jzhang@cs.ccny.cuny.edu

2
Outline
  • Introduction
  • Geospatial Data, GIS, Spatial Databases and HPC
  • Geospatial data: what's special?
  • GIS: impacts of hardware architectures
  • Spatial Databases: parallel DB or MapReduce?
  • HPC: many options
  • Personal HPC-G: A New Framework
  • Why Personal HPC for geospatial data?
  • GPGPU Computing: a brief introduction
  • Pipelining CPU and GPU workloads for performance
  • Parallel GIS prototype development strategies
  • A Case Study: Geographically Weighted Regression
  • Summary and Conclusions

3
Ecological Informatics
  • Computer Science
  • Spatial Databases
  • Mobile Computing
  • Data Mining

Geography: GIS Applications, Remote Sensing
4
Introduction: Personal Stories
  • Computationally intensive problems in geospatial
    data processing
  • Distributed hydrological modeling/flood
    simulation
  • Satellite image processing:
    clustering/classification (multi-/hyper-spectral)
  • Identifying and tracking storms from time-series
    NEXRAD images
  • Species distribution modeling (e.g.,
    regression/GA-based)
  • History of accesses to HPC resources
  • 1994: Simulating a 33-hour flood on a PC
    (33 MHz/4 MB) took 50 hours
  • 2000: A Cray machine was available, but special
    arrangement was required to access it while
    taking a course (Parallel and Distributed
    Processing)
  • 2004-2007: HPC resources at SDSC were available
    to the SEEK project, but the project ended up only
    using SRB for data/metadata storage
  • 2009-2010: An Nvidia Quadro FX 3700 GPU card
    (that came with a Dell workstation) gave a 23X
    speedup after porting a serial CPU codebase (for
    SSDBM'10) to the CUDA platform (ACM-GIS'10)

5
  • Two books that changed my research focus (as a
    database person)

2nd edition → 4th edition
http://courses.engr.illinois.edu/ece498/al/
As well as a few visionary database research
papers:
  • David J. DeWitt, Jim Gray: Parallel Database
    Systems: The Future of High Performance Database
    Systems. Commun. ACM 35(6): 85-98 (1992)
  • Anastassia Ailamaki, David J. DeWitt, Mark D.
    Hill, David A. Wood: DBMSs on a Modern Processor:
    Where Does Time Go? VLDB 1999: 266-277
  • J. Cieslewicz and K. A. Ross: Database
    Optimizations for Modern Hardware. Proceedings of
    the IEEE, 96(5), 2008

6
Introduction: PGIS in a traditional HPC Environment
A. Clematis, M. Mineter, and R. Marciano. High
performance computing with geographical data.
Parallel Computing, 29(10):1275-1279, 2003
  • "Despite all these initiatives the impact of
    parallel GIS research has remained slight"
  • "the anticipated performance plateau became a
    mountain still being scaled"
  • "GIS companies found that, other than for
    concurrency in databases, their markets did not
    demand multi-processor performance."
  • "While computing in general demands less of its
    users, HPC has demanded more: the barriers to use
    remain high and the range of options has
    increased"
  • "the fundamental problem remains the fact that
    creating parallel GIS operations is non-trivial
    and there is a lack of parallel GIS algorithms,
    application libraries and toolkits."

If parallel GIS runs in a personal computing
environment, to what degree will these conclusions
change?
7
Introduction: PGIS in a Personal Computing
Environment
  • Every personal computer is now a parallel machine
  • Chip Multiprocessors (CMP): dual-core, quad-core,
    six-core CPUs
  • Intel Xeon E5520: $379.99
  • 4 cores/8 threads, 2.26 GHz, 80 W
  • 4×256 KB L2 cache, 8 MB L3 cache
  • Max memory bandwidth: 25.6 GB/s
  • Massively parallel GPGPU computing: hundreds of
    GPU cores on a GPU card
  • Nvidia GTX 480: $499.99
  • 480 cores (15×1024 threads), 700/1401 MHz, 250 W,
    ≈1.35 TFlops
  • 15×32768 registers, 15×64 KB shared memory/L1
    cache, 768 KB L2 cache, additional
    constant/texture memory
  • 1.5 GB GDDR5, 1848 MHz clock rate, 384-bit memory
    interface width, 177.4 GB/s memory bandwidth

If these parallel computing powers are fully
utilized, to what degree can a personal workstation
match a traditional cluster for geospatial
data processing?
8
Geospatial data: what's special?
  • The slowest processing unit determines the
    overall performance in parallel computing
  • Real-world data are very often skewed

Wavelet-compressed raster data
Clustered point data
9
Geospatial data: what's special?
  • Techniques to handle skewness
  • data decomposition/partitioning → spatial indexing
  • task scheduling

The complexity of task scheduling grows fast with
the number of tasks, and generic scheduling
heuristics may not always produce good results.
A simple equal-size partition may work well for
local operations, but may not for focal, zonal
and global operations, which require more
sophisticated partitions to achieve load balancing
(see the sketch below).
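One common skew-handling recipe, offered here as my own
illustrative sketch rather than anything from the talk, is to
sort points by Morton (Z-order) code on the GPU and split the
sorted array into equal-count chunks, so every task receives
the same number of points no matter how clustered the data are
(CUDA with Thrust assumed; `morton2d` and the chunking policy
are hypothetical choices):

```cuda
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <cstdint>

// Spread the low 16 bits of v so one zero bit separates each data bit.
__host__ __device__ uint32_t part1by1(uint32_t v) {
    v &= 0x0000ffff;
    v = (v | (v << 8)) & 0x00ff00ff;
    v = (v | (v << 4)) & 0x0f0f0f0f;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

// Morton (Z-order) code: interleave the bits of grid coordinates x and y.
__global__ void morton2d(const uint16_t* x, const uint16_t* y,
                         uint32_t* code, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) code[i] = (part1by1(y[i]) << 1) | part1by1(x[i]);
}

// Sort point ids by Morton code; chunk k of m then covers the equal-count
// index range [k*n/m, (k+1)*n/m), regardless of how skewed the points are.
void balanced_partition(const uint16_t* d_x, const uint16_t* d_y, int n,
                        thrust::device_vector<int>& ids,
                        thrust::device_vector<uint32_t>& codes) {
    ids.resize(n);
    codes.resize(n);
    thrust::sequence(ids.begin(), ids.end());  // ids = 0, 1, ..., n-1
    morton2d<<<(n + 255) / 256, 256>>>(
        d_x, d_y, thrust::raw_pointer_cast(codes.data()), n);
    thrust::sort_by_key(codes.begin(), codes.end(), ids.begin());
}
```

Equal-count chunks trade spatial compactness for balance; the
Z-order sort keeps each chunk spatially coherent enough for
local operations.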
10
GIS: impacts of hardware architectures
  • GIS has been evolving along with mainstream
    information technologies
  • a major platform shift from Unix workstations to
    Windows PCs in the early 1990s
  • the marriage with Web technologies to create
    Web-GIS in the late 1990s
  • Will GIS naturally evolve from serial to parallel
    as computers evolve from uniprocessor to chip
    multiprocessor?
  • What can the community do to speed up the
    evolution?

11
GIS: impacts of hardware architectures
  • Three roles of GIS
  • data management
  • information visualization
  • modeling support
  • GIS-based spatial modeling, such as agent-based
    modeling, is naturally suitable for HPC
  • Computationally intensive
  • Adopts a raster tessellation and mostly involves
    local operations and/or focal operations with
    small constant numbers of neighbors:
    parallelization-friendly or even embarrassingly
    parallel (see the sketch below)
  • Runs in an offline mode and uses traditional GIS
    for visualization
  • How to make full use of hardware and support data
    management and information visualization more
    efficiently and effectively?
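To make "parallelization-friendly" concrete, here is a minimal
sketch of a focal operation, a 3×3 mean filter over a raster,
with one CUDA thread per cell; this is my own illustration of
the pattern the slide describes, not code from the talk:

```cuda
#include <cuda_runtime.h>

// Focal (3x3 neighborhood) mean over a raster; one thread per cell.
// Border cells average only the neighbors that exist (clipped window).
__global__ void focal_mean(const float* in, float* out,
                           int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float sum = 0.0f;
    int cnt = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = x + dx, ny = y + dy;
            if (nx >= 0 && nx < width && ny >= 0 && ny < height) {
                sum += in[ny * width + nx];
                ++cnt;
            }
        }
    out[y * width + x] = sum / cnt;
}

// Launch sketch:
//   dim3 block(16, 16);
//   dim3 grid((width + 15) / 16, (height + 15) / 16);
//   focal_mean<<<grid, block>>>(d_in, d_out, width, height);
```

Local operations drop the inner loops entirely (each thread
reads a single cell), which is why they parallelize even more
trivially.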

12
HPC: many options
  • The combination of architectural and
    organizational enhancements led to 16 years of
    sustained growth in performance at an annual rate
    of 50% from 1986 to 2002; due to the combined
    power, memory and instruction-level parallelism
    problems, the growth rate has dropped to about 20%
    per year from 2002 to 2006
  • In 2004, Intel cancelled its high-performance
    uniprocessor projects and joined IBM and Sun to
    declare that the road to higher performance would
    be via multiple processors per chip (or Chip
    Multiprocessors, CMP) rather than via faster
    uniprocessors.
  • As a marketing strategy, Nvidia calls a personal
    computer equipped with one or more of its
    high-end GPGPU cards a "personal supercomputer".
    Nvidia claimed that, compared to the latest
    quad-core CPU, Tesla 20-series GPU computing
    processors deliver equivalent performance at
    1/20th of the power consumption and 1/10th of the
    cost.

13
HPC: many options
  1. CPU multi-core
  2. GPU many-core
  3. CPU multi-node (traditional HPC)
  4. CPU+GPU multi-node (2+3)
  • How about 1+2? → Personal HPC
  • Affordable and dedicated personal computing
    environment
  • No additional cost: use-it or waste-it
  • Excellent visualization and user interaction
    support
  • Can be the last mile of a larger
    cyberinfrastructure
  • Data structures/algorithms/software are critical
    to the success

14
Personal HPC-G: A New Framework
  • Additional arguments to advocate for Personal HPC
    for geospatial data
  • While some geospatial data processing tasks are
    computationally intensive, many more are data
    intensive in nature
  • Distributing large data chunks incurs significant
    network and disk I/O overheads (50-100 MB/s)
  • Make full use of the high bandwidths between CPU
    cores and memory (10-30 GB/s), CPU memory and GPU
    memory (8 GB/s), and GPU cores and GPU memory
    (100-200 GB/s); see the pipelining sketch below
  • The improved CPU+GPU performance will not only
    solve old problems faster but also allow many
    traditionally offline data processing tasks to run
    online in an interactive manner. The
    uninterrupted exploration processes are likely to
    facilitate novel scientific discoveries more
    effectively.
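The outline's "pipelining CPU and GPU workloads" idea can be
sketched with CUDA streams: while the GPU computes on chunk i,
chunk i+1 streams across the PCIe link (the 8 GB/s hop above),
hiding transfer time behind computation. A minimal
double-buffered sketch under my own assumptions;
`process_chunk` is a placeholder kernel, and the host buffers
must be pinned (cudaHostAlloc) for the copies to truly overlap:

```cuda
#include <cuda_runtime.h>
#include <algorithm>

// Placeholder per-chunk kernel; any per-element geospatial
// operator could go here instead.
__global__ void process_chunk(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Double-buffered pipeline: operations queued in one stream run in
// order, but the two streams overlap with each other, so stream 1-s
// can be copying its chunk while stream s is computing.
void pipeline(const float* h_in, float* h_out, int total, int chunk) {
    cudaStream_t streams[2];
    float* d_buf[2];
    for (int s = 0; s < 2; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&d_buf[s], chunk * sizeof(float));
    }
    int nchunks = (total + chunk - 1) / chunk;
    for (int c = 0; c < nchunks; ++c) {
        int s = c % 2;                       // alternate buffers/streams
        int off = c * chunk;
        int n = std::min(chunk, total - off);
        cudaMemcpyAsync(d_buf[s], h_in + off, n * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        process_chunk<<<(n + 255) / 256, 256, 0, streams[s]>>>(d_buf[s], n);
        cudaMemcpyAsync(h_out + off, d_buf[s], n * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    for (int s = 0; s < 2; ++s) {
        cudaStreamSynchronize(streams[s]);
        cudaFree(d_buf[s]);
        cudaStreamDestroy(streams[s]);
    }
}
```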

15
Why Personal HPC for geospatial data?
High-Level Comparisons among Cluster Computing,
Cloud Computing and Personal HPC
                              Cluster     Cloud       Personal
                              Computing   Computing   HPC
  Initial cost                High        Low         Low
  Operational cost            High        Medium      Low
  End user control            Low         High        High
  Theoretical scalability     High        High        Medium
  User code development       Medium      Low         High
  Data management             Low         Medium      Medium
  Numeric modeling            High        Medium      High
  Interaction/visualization   Low         Low         High
16
Spatial Databases: parallel DB or MapReduce?
  • Spatial databases: GIS without a GUI
  • Learn lessons on parallelization from relational
    databases
  • The debates between Parallel DB and MapReduce
  • The emergence of hybrid approaches (e.g.,
    HadoopDB)
  • While parallel processing of geospatial data to
    achieve high performance has been a research
    topic for quite a while, neither approach has been
    extensively applied to practical large-scale
    geospatial data management
  • Call for pilot studies experimenting with the two
    approaches to provide insights for future
    synthesis

17
GPGPU Computing
  • Nvidia CUDA (Compute Unified Device Architecture)
  • AMD/ATI Stream Computing
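This slide was mostly figures in the original deck; as a
minimal illustration of the CUDA model it names, the sketch
below (my own, not from the slides) launches a grid of thread
blocks in which each thread handles one array element, the
basic building block of the GPU examples elsewhere in this
talk:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Each CUDA thread scales one element: the SIMT building block
// of GPGPU code. blockIdx/blockDim/threadIdx are CUDA built-ins.
__global__ void scale(float* a, float k, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) a[i] *= k;
}

int main() {
    const int n = 1 << 20;
    float* d_a;
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMemset(d_a, 0, n * sizeof(float));
    // <<<grid, block>>> launch syntax: 4096 blocks of 256 threads.
    scale<<<(n + 255) / 256, 256>>>(d_a, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d_a);
    printf("launched %d threads\n", ((n + 255) / 256) * 256);
    return 0;
}
```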
18
Parallel GIS prototype development strategies
  • We envision that Personal HPC-G provides an
    opportunity to evolve traditional GIS into
    parallel GIS gradually. Community research and
    development efforts are needed to speed up the
    evolution.
  • First, we propose to learn from existing parallel
    geospatial data processing algorithms and adapt
    them to CMP CPU and GPU architectures.
  • Second, we suggest studying existing GIS modules
    (e.g., ArcGIS geoprocessing tools) carefully,
    identifying the most frequently used ones, and
    developing parallel code for multi-core CPUs and
    many-core GPUs
  • Third, while existing database research on CMP CPU
    and GPU architectures is still relatively
    limited, it can be the starting point to
    investigate how geospatial data management can be
    realized on the new architectures and their
    hybridization
  • Finally, reuse existing CMP- and GPU-based
    software codebases developed by the computer
    vision and computer graphics communities

19
GWR Case Study
  • A conceptual design for efficiently implementing
    GWR on the CUDA GPGPU computing architecture
    (preliminary in nature)
  • Being realized by a master's student at CCNY
  • Good C/C++ programming skills
  • New to GPGPU/CUDA programming
  • Supported 5 hours per week through a tiny grant
    (an experiment on what $2,000 can contribute to
    PGIS development)

20
GWR Case Study
  • GWR extends the traditional regression framework
    by allowing local parameters to be estimated
  • Given a neighborhood definition (or bandwidth) for
    a data item, a traditional regression can be
    applied to the data items that fall into the
    neighborhood or region.
  • The correlation coefficients for all the
    geo-referenced data items (raster cells or
    points) form a scalar field that can be
    visualized and interactively explored
  • By interactively changing some GWR parameters
    (e.g., bandwidth) and visually exploring the
    changes in the corresponding scalar fields, users
    can gain a better understanding of the
    distributions of GWR statistics and the original
    dataset.

21
GWR is computationally intensive
[Figure: two 3×3 raster windows]
  Dependent variable:   8 7 9    Independent variable:   6 8 7
                        7 8 7                            6 5 4
                        6 6 5                            5 4 3
Using an n×n moving window to compute correlation
coefficients (n=3). The correlation coefficient
at the dotted cell is r=0.84.
Point data are usually clustered, which makes
load balancing very difficult.
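A minimal sketch of the moving-window computation (my own
illustration, not the author's implementation): one thread per
raster cell accumulates the five sums over its n×n window and
combines them into the correlation coefficient, the same
S1..S5 formulation used on the partial-statistics slide below:

```cuda
#include <cuda_runtime.h>
#include <math.h>

// One thread per cell: Pearson correlation of x and y over an n x n
// window (n odd; the window is clipped at the raster borders).
__global__ void window_corr(const float* x, const float* y, float* r,
                            int width, int height, int n) {
    int cx = blockIdx.x * blockDim.x + threadIdx.x;
    int cy = blockIdx.y * blockDim.y + threadIdx.y;
    if (cx >= width || cy >= height) return;

    int h = n / 2, cnt = 0;
    float sx = 0, sy = 0, sxy = 0, sxx = 0, syy = 0;
    for (int dy = -h; dy <= h; ++dy)
        for (int dx = -h; dx <= h; ++dx) {
            int px = cx + dx, py = cy + dy;
            if (px < 0 || px >= width || py < 0 || py >= height) continue;
            float xv = x[py * width + px], yv = y[py * width + px];
            sx += xv; sy += yv; sxy += xv * yv;
            sxx += xv * xv; syy += yv * yv;
            ++cnt;
        }
    // r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2)(n*Syy - Sy^2))
    float num = cnt * sxy - sx * sy;
    float den = sqrtf((cnt * sxx - sx * sx) * (cnt * syy - sy * sy));
    r[cy * width + cx] = (den > 0) ? num / den : 0.0f;
}
```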
22
GWR Case Study: Overall Design
23
GWR Case Study: From partial to total statistics
Let S1 = n·Σ(xi·yi), S2 = Σxi, S3 = Σyi,
S4 = n·Σxi², S5 = n·Σyi²; f can be computed from
n and S1 through S5. Assuming that the data items
D1, D2, ..., Dn are divided into m groups and each
group j (j = 1, ..., m) has computed its partial
statistics S1j, ..., S5j over its own nj items,
then f can be computed as follows:
  n = Σ nj
  S1 = n·Σ (S1j/nj)
  S2 = Σ S2j
  S3 = Σ S3j
  S4 = n·Σ (S4j/nj)
  S5 = n·Σ (S5j/nj)
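A small host-side sketch of this combination step (an
illustration under my own naming, assuming each group's
partials came out of a per-group GPU reduction):

```cuda
#include <math.h>

// Per-group partial statistics over the group's nj items:
// S1j = nj*sum(x*y), S2j = sum(x), S3j = sum(y),
// S4j = nj*sum(x^2), S5j = nj*sum(y^2).
struct Partial { float s1, s2, s3, s4, s5; int nj; };

// Combine m per-group partials into the total correlation coefficient.
float combine(const Partial* p, int m) {
    float n = 0, t1 = 0, s2 = 0, s3 = 0, t4 = 0, t5 = 0;
    for (int j = 0; j < m; ++j) {
        n  += p[j].nj;
        t1 += p[j].s1 / p[j].nj;   // recovers each group's sum(x*y)
        s2 += p[j].s2;
        s3 += p[j].s3;
        t4 += p[j].s4 / p[j].nj;   // recovers each group's sum(x^2)
        t5 += p[j].s5 / p[j].nj;   // recovers each group's sum(y^2)
    }
    float s1 = n * t1, s4 = n * t4, s5 = n * t5;  // total S1, S4, S5
    // r = (n*sum(xy) - Sx*Sy) / sqrt((n*sum(x^2)-Sx^2)(n*sum(y^2)-Sy^2))
    return (s1 - s2 * s3) / sqrtf((s4 - s2 * s2) * (s5 - s3 * s3));
}
```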
24
Summary and Conclusions
  • We aimed to introduce a new HPC framework for
    processing geospatial data in a personal
    computing environment, i.e., Personal HPC-G.
  • We argued that the fast-increasing hardware
    capacities of modern personal computers equipped
    with chip multiprocessor CPUs and massively
    parallel GPU devices have made Personal HPC-G an
    attractive alternative to traditional Cluster
    computing and newly emerging Cloud computing for
    geospatial data processing.
  • We used a parallel design of GWR on an Nvidia
    CUDA-enabled GPU device as an example to discuss
    how Personal HPC-G can be utilized to realize
    parallel GIS modules by synergistic software and
    hardware co-programming.

25
Q&A
jzhang@cs.ccny.cuny.edu