GeoComputation presentation

1
GeoComputation

Gordon Green
Hunter College
Department of Geography
11/2006

2
What is GeoComputation?

"The eclectic application of computational
methods and techniques to portray spatial
properties, to explain geographic phenomena, and
to solve geographical problems" (Couclelis, 1998)
"The application of a computational science
paradigm to study a wide range of problems in
geographical and earth systems" (Openshaw, 2000)
"The universe of computational techniques
applicable to spatial problems" (Reed and Tuton,
1998)

3
What is GeoComputation?

Its champions consider it a new discipline.
Others consider it just a toolbox of
computationally intensive techniques.

4
What is GeoComputation?

Geo - emphasizes geographical and spatial
aspects. Some other approaches entail spatial
additions to methods that arose in non-spatial
fields. GeoComputational methods tend to be
inherently spatial.
Computation - equally important as the spatial
aspect. The approaches tend to come out of
computer science, rather than geography.
Traditional statistics tends to reduce and
summarize information; GeoComputation tends to
retain or generate more complexity in its
operations.

5
What is GeoComputation?

Fotheringham differentiates between Weak GC,
where the computer is used to augment existing
quantitative methods, and Strong GC, where the
computer "drives" the form of analysis.
Confirmatory versus exploratory techniques: GC
tends to fall in the latter category.
"The challenge of GC is to develop the ideas, the
methods, the models, and the paradigms able to
use the increasing computer speeds to do useful,
worthwhile, innovative and new science in a
variety of geo-contexts" (Openshaw).

6
What is GeoComputation?

Aspects of GeoComputation
Multi-Agent Modeling
Cellular Automata
Fuzzy Modeling
Genetic Programming
Neural Networks

7
Agent-Based Models

An agent-based model is one in which the basic
unit of activity is a software agent.
Usually a model will contain many agents (at
least tens, occasionally many thousands).
Its outcomes are determined by the interactions
of the agents.
An agent is an identifiable unit of computer
program code which is autonomous and
goal-directed.
(from Schelhorn, O'Sullivan, Haklay,
Thurstain-Goodwin, 1999)

8
Agent-Based Models

An early example of agent-based modeling is Boids
(Reynolds, 1987), a model that describes flocking
behavior.
Explaining flocking using functional models is
complex.
Modeling it instead as autonomous agents
following simple rules made it easier to create
an explanatory model.
Agent models typically work by maintaining a
collection of agents, and a timer that advances
the state of each agent with each tick, each
agent updating its state based on characteristics
and location of the other agents and other
entities in the landscape.
Agent models are best implemented in an
object-oriented fashion.
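The tick-based loop described above can be sketched in Python roughly as follows. This is a minimal illustration, not Reynolds' original implementation: the class names, the neighborhood radius, and the weights for the three steering rules usually associated with Boids (cohesion, alignment, separation) are assumptions chosen for demonstration. Each agent is an object holding its own state, and each tick first computes every new velocity from the same snapshot of the flock before any agent moves.

```python
import math
import random


class Boid:
    """One agent: a position and a velocity in 2-D space."""

    def __init__(self, x, y, vx, vy):
        self.x, self.y = x, y
        self.vx, self.vy = vx, vy

    def neighbors(self, flock, radius):
        """Other agents within `radius` of this one."""
        return [b for b in flock
                if b is not self and math.hypot(b.x - self.x, b.y - self.y) < radius]

    def steer(self, flock, radius=10.0, cohesion=0.01, alignment=0.05, separation=0.1):
        """Return a new velocity based on nearby agents (the agent does not move yet)."""
        near = self.neighbors(flock, radius)
        vx, vy = self.vx, self.vy
        if near:
            n = len(near)
            # Cohesion: steer toward the local center of mass.
            vx += (sum(b.x for b in near) / n - self.x) * cohesion
            vy += (sum(b.y for b in near) / n - self.y) * cohesion
            # Alignment: match the average velocity of the neighbors.
            vx += (sum(b.vx for b in near) / n - self.vx) * alignment
            vy += (sum(b.vy for b in near) / n - self.vy) * alignment
            # Separation: push away from neighbors that are too close.
            for b in near:
                d = math.hypot(b.x - self.x, b.y - self.y)
                if 0 < d < radius / 3:
                    vx -= (b.x - self.x) / d * separation
                    vy -= (b.y - self.y) / d * separation
        return vx, vy


class Model:
    """Holds the collection of agents and advances them one tick at a time."""

    def __init__(self, n_agents, size=100.0, seed=0):
        rng = random.Random(seed)
        self.ticks = 0
        self.agents = [Boid(rng.uniform(0, size), rng.uniform(0, size),
                            rng.uniform(-1, 1), rng.uniform(-1, 1))
                       for _ in range(n_agents)]

    def tick(self):
        # Phase 1: every agent reacts to the same snapshot of the world.
        velocities = [agent.steer(self.agents) for agent in self.agents]
        # Phase 2: apply the new velocities and move.
        for agent, (vx, vy) in zip(self.agents, velocities):
            agent.vx, agent.vy = vx, vy
            agent.x += vx
            agent.y += vy
        self.ticks += 1


if __name__ == "__main__":
    model = Model(50)
    for _ in range(100):
        model.tick()
    print(model.agents[0].x, model.agents[0].y)
```

The two-phase tick is a design choice: updating agents in place one by one would let later agents react to earlier agents' new positions within the same tick.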

9
Agent-Based Models

Each agent has a simple set of rules defining its
behavior. Creating many instances of the agent
and adjusting the rules made it possible to
model flocking behavior concisely.
Boids animation
More elaborate example

10
Agent-Based Models

These animations exhibit the phenomenon of
emergence, whereby complex behavior emerges from
the behavior of simple components.
Another example of agent-based modeling is
pedestrian modeling
Pedestrian model: note the formation of lanes.
Other sample pedestrian models

11
Agent-Based Models

Pedestrian model implementation (Batty, 1999)

12
Agent-Based Models

Pedestrian model implementation (Batty, 1999)

13
Agent-Based Models

Pedestrian model implementation (Batty, 1999)

14
Agent-Based Models

Pedestrian model implementation (Batty, 1999)

15
Pedestrian model design (Schelhorn, O'Sullivan,
Haklay, Thurstain-Goodwin, 1999)

16
(No Transcript)

17
Software for agent-based modeling (from Najlis,
Janssen, and Parker, 2001)

18
Agent-Based Models

When to use agents? Axtell (1999) describes three
basic cases in which agent-based models might be
used.
1. Agent models as simulations of mathematical
models: agent models can be used to implement
Monte Carlo simulations that can also be solved
using other numerical methods. Even though the
simulation can be solved in a functional manner,
an agent-based implementation can act as an
alternative implementation that can validate the
results.
Similarly, an agent-based model can be useful as
an illustration of a complex numerical model.
Even though the agent-based model may not be
necessary to come up with a solution, it may be
very helpful in illustrating the results to a
general audience.

19
Agent-Based Models

An example of this first use can be found in the
classic problem of modeling a bank-teller line.
The queuing process is commonly simulated using
the Monte Carlo method to arrive at a
distribution of waiting times.
This can be equivalently modeled using agents
that each have different arrival times and other
parameters, and running the model with many
agents to build up the same kind of
distribution function.
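A minimal sketch of the agent-style version of the bank-teller queue follows. It assumes a single teller serving customers first-come, first-served, with exponentially distributed inter-arrival and service times (the textbook M/M/1 queue); the rates, run counts, and function names are illustrative assumptions, not Axtell's model. Each customer is a small agent carrying its own arrival and service time, and repeating the run many times builds up the waiting-time distribution.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class Customer:
    """An agent in the queue, carrying its own arrival and service times."""
    arrival: float
    service: float
    wait: float = 0.0


def simulate_teller(n_customers, arrival_rate, service_rate, rng):
    """Single teller, first-come first-served; returns customers with waits filled in."""
    customers = []
    t = 0.0
    for _ in range(n_customers):
        t += rng.expovariate(arrival_rate)  # time since the previous arrival
        customers.append(Customer(arrival=t, service=rng.expovariate(service_rate)))
    teller_free_at = 0.0
    for c in customers:
        start = max(c.arrival, teller_free_at)  # wait if the teller is busy
        c.wait = start - c.arrival
        teller_free_at = start + c.service
    return customers


def waiting_time_distribution(runs=200, n_customers=100, arrival_rate=0.9,
                              service_rate=1.0, seed=42):
    """Pool the waiting times from many independent runs."""
    rng = random.Random(seed)
    waits = []
    for _ in range(runs):
        waits.extend(c.wait for c in simulate_teller(n_customers, arrival_rate,
                                                     service_rate, rng))
    return waits


if __name__ == "__main__":
    waits = waiting_time_distribution()
    print(f"mean wait: {statistics.mean(waits):.2f}")
```

Because each run starts with an empty bank, short runs will tend to show shorter waits than the long-run steady state of the same queue.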
(Axtell, 1999)

20
Agent-Based Models

2. Agent models as complementary to mathematical
models: a mathematical model may adequately
model some aspects of a problem but not others,
or may be awkward or incapable of a comprehensive
solution. Or a mathematical model may be known,
but so complex as to be practically insoluble.
An artificial stock market is an example of such
a model. Trading agents choose among
investments with varying stochastic levels of
risk and adapt their behavior based on the results
of prior trading.
Even though the basic features of such a model
may be functionally describable, an agent-based
model can evolve over time into a system that
actually replicates some of the complexity of a
real stock market. These complexities are
impractical to model using a mathematical model.
(Axtell, 1999)

21
Agent-Based Models

3. Agent models as substitutes for mathematical
models: some problems are not amenable to
mathematical modeling.
Examples include the behavior of individuals in
an animal population, or groups of pedestrians as
described earlier.

22
Agent-Based Models

Pros
An argument from economics, which could also be
applied to geography:
"The reason why large scale computable general
equilibrium problems are difficult for economists
to solve is that they are using the wrong
hardware and software. Economists should design
their computations to mimic the real economy,
using massively parallel computers and
decentralized algorithms that allow competitive
equilibria to arise as emergent
computations... The most promising way for
economists to avoid the computational burdens
associated with solving realistic large scale
general equilibrium models is to adopt an
agent-based modeling strategy where equilibrium
prices and quantities emerge endogenously from
the decentralized interactions of agents" (Rust,
1996, in Axtell).
The quality of emergent behavior doesn't
correspond to any part of a traditional
statistical analysis. Agent-based models
uniquely provide the opportunity to model a
process and see what happens when it runs
(Axtell).

23
Agent-Based Models

Cons
When dealing with agent models, we are quickly
involved in a world where everything, both the
agents and their environment, is designed, and
the reliable scientific analysis of the real
world may be compromised by the complexity of the
models (paraphrased from Couclelis, "Why I No
Longer Work with Agents," discussing
human/environment agent models).
Agent models don't have a convenient way of
measuring their accuracy, unlike statistical
models. This can only be overcome by running the
agent model many times, varying the parameters to
discover the robustness of the results. There is
a limit to how many such variations can be
executed (although this limit rises with
increases in computer power) (Axtell).

24
Cellular Automata

Cellular automata are closely related to
agent-based models.
Instead of autonomous agents being given
simple behavioral rules, the states of cells on a
surface are dictated by similarly simple rules.
Those rules use the state of surrounding cells in
a grid to determine the state of any given cell
in the next iteration of the model.

25
Cellular Automata

CAs develop in space and time.
Space and time are defined in discrete steps.
Cells are lined up in a string for
one-dimensional automata, or arranged in a two or
higher dimensional lattice for two- or higher-
dimensional automata.
The number of states of each cell is finite.
The states of each cell are discrete.
All cells are identical.
The future state of each cell depends only on the
current state of the cell and the states of the
cells in its neighborhood, as determined by rules
(from Alexander Schatten).

26
Cellular Automata

The simplest CAs are one-dimensional. For
example, here is a set of rules and the results
of a few iterations (from David R. Green).
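The rule figure from the slide is not reproduced here, so this sketch assumes Wolfram's numbering for two-state, nearest-neighbor ("elementary") rules: the rule number's eight bits give the new state for each of the eight possible (left, self, right) patterns. Cells beyond the ends of the row are assumed to be 0. Rule 90, used in the example, grows a Sierpinski-triangle pattern from a single live cell.

```python
def step(cells, rule):
    """Apply an elementary CA rule (0-255) to a row of 0/1 cells.
    Cells beyond the edges are treated as 0."""
    padded = [0] + cells + [0]
    new = []
    for i in range(1, len(padded) - 1):
        # Read (left, self, right) as a 3-bit number from 0 to 7.
        pattern = (padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1]
        # The matching bit of the rule number is the cell's next state.
        new.append((rule >> pattern) & 1)
    return new


def run(rule, width=31, generations=15):
    """Start from a single live cell in the middle and record every generation."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(generations):
        cells = step(cells, rule)
        history.append(cells)
    return history


if __name__ == "__main__":
    for row in run(90):
        print("".join("#" if c else "." for c in row))
```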

27
Cellular Automata

For geographical applications, CAs are usually
2-dimensional, often using one of these cell
neighborhoods
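The neighborhood figure is not reproduced here; the sketch below assumes the two most common choices, the von Neumann neighborhood (the four edge-adjacent cells) and the Moore neighborhood (all eight surrounding cells). The helper name and the option of wrapping the grid edges into a torus are illustrative assumptions.

```python
VON_NEUMANN = [(-1, 0), (1, 0), (0, -1), (0, 1)]
MOORE = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def neighbor_states(grid, r, c, offsets, wrap=True):
    """Return the states of the neighbors of cell (r, c).

    With wrap=True the grid is treated as a torus (edges join up);
    otherwise cells on the edge simply have fewer neighbors.
    """
    rows, cols = len(grid), len(grid[0])
    states = []
    for dr, dc in offsets:
        rr, cc = r + dr, c + dc
        if wrap:
            states.append(grid[rr % rows][cc % cols])
        elif 0 <= rr < rows and 0 <= cc < cols:
            states.append(grid[rr][cc])
    return states
```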

28
Cellular Automata

Conway's Game of Life rules
Mathematician John Conway popularized CAs with his
Game of Life, first described in a 1970
Scientific American article. (CAs themselves go
back to work by Stanislaw Ulam and John von
Neumann in the 1940s.)
A cell that is dead at time step t becomes
alive at time t+1 if exactly three of the eight
neighboring cells at time t were alive.
A cell that is alive at time t dies at time t+1
if at time t fewer than two (loneliness) or more
than three (overcrowding) of its neighbors are
alive.
(Alexander Schatten)
From these very simple rules, very complex
behaviors can emerge.
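The two rules above can be sketched in a few lines of Python. This version is one possible implementation, assuming live cells are stored as a set of (row, column) coordinates on an unbounded grid; survival with two or three live neighbors follows from the two rules as stated. The glider on the next slide is included as sample data: after four generations it reappears one row down and one column to the right.

```python
from collections import Counter


def life_step(live):
    """Advance Conway's Game of Life by one generation.

    `live` is a set of (row, col) coordinates of live cells on an unbounded grid.
    """
    # Count, for every cell, how many of its eight neighbors are alive.
    counts = Counter((r + dr, c + dc)
                     for r, c in live
                     for dr in (-1, 0, 1)
                     for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0))
    # Birth: a dead cell with exactly 3 live neighbors comes alive.
    # Survival: a live cell with 2 or 3 live neighbors stays alive; all others die.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}


GLIDER = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}

if __name__ == "__main__":
    cells = GLIDER
    for _ in range(4):
        cells = life_step(cells)
    print(sorted(cells))  # same shape, shifted one row down and one column right
```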

29
Cellular Automata

Sample game of life pattern (glider)

30
Cellular Automata

There are many other examples available on the
web, such as
http://www.ibiblio.org/lifepatterns/
These simple rules can be expanded to generate
many, many patterns of change:
http://www.collidoscope.com/modernca/

31
Cellular Automata

There are many enhancements that can generate
more complex evolutionary patterns:
Probabilistic CAs, where, instead of having
binary rules (and binary states), the changes are
described by probabilities, give an increased
level of control over how a CA develops. The
rules can express the chance of a cell changing
state, and each step of the CA involves selecting
the state of each cell based on a random number
falling within its probability range.
CAs can also be self-modifying, that is, they can
respond to changes in the states of the grid as
it develops.
Irregular grid cell sizes
Action at a distance, instead of only immediately
adjacent cells.
Incorporating agents within CA landscapes
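A probabilistic CA of the kind described in the first item above can be sketched as a toy fire-spread model. Everything here is a hypothetical illustration, not one of the calibrated wildfire models on the next slide: cells are unburnt, burning, or burnt; a burning cell burns out after one step; and each burning von Neumann neighbor independently gives an unburnt cell a chance `p_spread` of catching fire, decided by drawing a random number.

```python
import random

UNBURNT, BURNING, BURNT = 0, 1, 2


def fire_step(grid, p_spread, rng):
    """One step of a probabilistic CA for fire spread."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            state = grid[r][c]
            if state == BURNING:
                new[r][c] = BURNT  # a burning cell burns out after one step
            elif state == UNBURNT:
                burning = sum(1 for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                              if 0 <= r + dr < rows and 0 <= c + dc < cols
                              and grid[r + dr][c + dc] == BURNING)
                # Each burning neighbor gets an independent chance to ignite the cell.
                p_ignite = 1 - (1 - p_spread) ** burning
                if burning and rng.random() < p_ignite:
                    new[r][c] = BURNING
    return new


def simulate_fire(size=21, p_spread=0.6, seed=0):
    """Ignite the center cell, run until nothing is burning, return cells burnt."""
    rng = random.Random(seed)
    grid = [[UNBURNT] * size for _ in range(size)]
    grid[size // 2][size // 2] = BURNING
    while any(BURNING in row for row in grid):
        grid = fire_step(grid, p_spread, rng)
    return sum(row.count(BURNT) for row in grid)


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.7):
        print(p, simulate_fire(p_spread=p))
```

Running the model many times for each value of `p_spread` shows how the spread probability controls whether a fire dies out early or sweeps the whole grid.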

32
Cellular Automata

These more sophisticated CAs have been used to
model diverse geographic phenomena, such as
Wildfire
Formation of a Megalopolis
West Nile Virus Infection Risk
Example 2 from Paul Torrens,
http://geosimulation.org/simulating-sprawl/;
example 3 from Sean Ahearn.

33
Cellular Automata

Cons (Batty and Xie, 2005)
Most CA rules have little relationship to the
actual causes of the phenomenon being studied.
Even if the model happens to model the process
successfully, it doesn't really prove anything,
and is difficult to use as the basis of any kind
of policy decision.
They tend not to model spatial interaction well.
In practice, CA cells tend not to match units of
the phenomenon under study.

34
Fuzzy Modeling

A frequent problem in geography is how to
identify a geographic entity.
Geographic phenomena tend to be described by
vague terms.
For example, suppose a study concerns the major
cities in Europe. Each of the terms "major,"
"city," "in," and "Europe" is somewhat vague.
Fuzzy set theory is an attempt to deal with the
problems posed to traditional set and logic
theory by vagueness.
(section paraphrased from Fisher, 2000)

35
Fuzzy Modeling

Sorites paradox
If a grain of sand is placed on a surface, is
there a heap?
If a second grain of sand is placed next to it,
is there a heap?
If a third grain of sand is placed next to the
second, is there a heap?
If a ten-millionth grain of sand is placed next
to the 9,999,999th, is there a heap?
"Heap" is a vague concept. Other examples are
"tall," "near," "far," etc.
Many geographic values such as vegetation
coverage, soil types, etc, are similarly
Sorites-susceptible.

36
Fuzzy Modeling

Classical sets follow the logic described in the
familiar Venn diagram (examples drawn from the
behavior of boolean search terms).

37
Fuzzy Modeling

These can also be expressed using linear graphs

38
Fuzzy Modeling

Membership in a set is determined by some
threshold value. This value can be subject to
error and can be assigned a probability. Those
probabilities can then be used in set-based
calculations.

39
Fuzzy Modeling

Boolean set theory is used throughout most
conventional set and statistical analysis. For
example, does the result of a test disprove the
null hypothesis? We set a threshold value for the
t-test statistic and respond based on whether the
computed statistic falls above or below that
threshold. In many cases, though, the underlying
measure (the p-value) varies continuously, and
the threshold turns it into a yes/no answer.
Zadeh (1965) put forward the concept of fuzzy sets
in response to the shortcomings of boolean sets.

40
Fuzzy Modeling

Fuzzy set membership is at the core of fuzzy set
theory. Boolean sets are encoded with 0 or 1,
wherein an entity is or is not part of a set.
Fuzzy set membership is defined by any value
between (and including) 0 and 1.

41
Fuzzy Modeling

Fuzzy set membership can take varied forms
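The slide's figure of membership shapes is not reproduced here; the sketch below shows three commonly used forms (linear ramp, triangle, and a smooth bell curve), each mapping a value to a membership grade between 0 and 1. The function names and the "tall" thresholds of 160 cm and 190 cm are hypothetical, chosen only to illustrate the idea.

```python
import math


def linear_mf(x, low, high):
    """Membership rises linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def triangular_mf(x, left, peak, right):
    """Membership is 1 at `peak` and falls linearly to 0 at `left` and `right`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def gaussian_mf(x, center, width):
    """A smooth, bell-shaped membership curve equal to 1 at `center`."""
    return math.exp(-((x - center) / width) ** 2)


# Hypothetical "tall" set: fully out below 160 cm, fully in above 190 cm.
print(linear_mf(175, 160, 190))  # 0.5
```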

42
Fuzzy Modeling

These then result in boolean calculations that
take into account partial set membership

43
Fuzzy Modeling

Boolean operations become functions rather than
boolean values
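One common way to turn the boolean operations into functions is Zadeh's original set of operators, sketched below: AND takes the minimum of two membership grades, OR the maximum, and NOT the complement. Other operator families exist, so treat this as one standard choice rather than the only one. The membership values for the "major cities in Europe" example are hypothetical.

```python
def fuzzy_and(a, b):
    """Intersection: degree of membership in both sets (Zadeh's minimum)."""
    return min(a, b)


def fuzzy_or(a, b):
    """Union: degree of membership in either set (Zadeh's maximum)."""
    return max(a, b)


def fuzzy_not(a):
    """Complement: degree of non-membership."""
    return 1.0 - a


# Hypothetical memberships for one city in the "major cities in Europe" example.
major, in_europe = 0.7, 0.9
print(fuzzy_and(major, in_europe))  # 0.7
```

With memberships restricted to 0 and 1, these reduce to ordinary boolean AND, OR and NOT. With partial membership some classical laws no longer hold: fuzzy_or(0.4, fuzzy_not(0.4)) is 0.6, not 1.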

44
Fuzzy Modeling

45
Fuzzy Modeling

Example of fuzzy versus crisp classifications

46
Fuzzy Modeling

The concept of fuzziness can also be applied to
polygon boundaries.

47
Fuzzy Modeling

Key question: how do you choose the membership
function (MF)? Is it linear or curved?
The semantic import (SI) approach estimates
based on domain expertise and iterative
evaluation of the results.
Fuzzy K-means approach uses iterative
mathematical estimates of the best function.

48
Fuzzy Modeling

Fuzzy K-Means
Usually starts with random allocation of objects
into k clusters.
The center of each cluster is calculated
Objects are re-allocated based on similarity of
attributes using a similarity index, usually a
distance measurement.
This process is repeated until a stable solution
is reached.
Membership in each cluster is calculated as a
value in the range 0 to 1, instead of 0 or 1 as in
crisp k-means (Burrough and McDonnell).
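The steps above can be sketched in Python using the standard fuzzy c-means update equations (Bezdek's formulation, which is what "fuzzy k-means" usually refers to). The fuzziness exponent m = 2, the convergence tolerance, and the function name are assumptions for illustration; `math.dist` requires Python 3.8 or later. Centers are means weighted by membership raised to the power m, and each membership depends on a point's distance to its cluster center relative to its distance to every other center.

```python
import math
import random


def fuzzy_k_means(points, k, m=2.0, iterations=100, tol=1e-6, seed=0):
    """Fuzzy k-means (also known as fuzzy c-means) for a list of equal-length tuples.

    Returns (centers, memberships); memberships[i][j] is the degree, from 0 to 1,
    to which point i belongs to cluster j, and each point's memberships sum to 1.
    `m` > 1 controls fuzziness (larger values give softer clusters).
    """
    rng = random.Random(seed)
    n, dims = len(points), len(points[0])
    # Start from a random allocation: random memberships normalized per point.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(k)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iterations):
        # 1. Cluster centers are means of the points, weighted by membership ** m.
        centers = []
        for j in range(k):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / sw
                                 for d in range(dims)))
        # 2. Re-allocate: memberships depend on relative distance to every center.
        new_u = []
        for p in points:
            dist = [math.dist(p, c) for c in centers]
            if min(dist) == 0.0:
                # The point sits exactly on a center: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((dist[j] / dist[l]) ** (2 / (m - 1)) for l in range(k))
                       for j in range(k)]
            new_u.append(row)
        # 3. Repeat until the memberships stop changing.
        change = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break
    return centers, u
```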

49
Fuzzy Modeling

Fuzzy K-Means

50
Fuzzy Modeling

Fuzzy modeling is often used in conjunction with
genetic algorithms, which provide a way of
iteratively refining the results of automated
models.
The following few slides will review this before
concluding the section on fuzzy modeling.

51
Genetic Algorithms

First developed in the 70s.
The basic idea is to model a problem with many
parameters by treating those parameters like
genes.
These are selected over time by iteratively
applying a measure of fitness, selecting the
fittest elements, and applying the process again.

52
Genetic Algorithms

Steps in an evolutionary algorithm:
Create an initial population of alternatives
using random values.
Select individuals from the population, weighting
the selection towards individuals that represent
the best solutions so far.
Use them as the basis of a new set of
alternatives, by combining them (crossover) or
randomly changing them (mutation).
Continue this process until some terminating
condition is met.
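The four steps can be sketched as a minimal generational genetic algorithm over fixed-length bit strings. The specific operators are common choices assumed for illustration, not the only options: tournament selection (the fitter of two random individuals wins), single-point crossover, bit-flip mutation, and elitism (the best solution so far is carried forward unchanged). The parameter values are illustrative.

```python
import random


def evolve(fitness, n_bits, pop_size=50, generations=200, crossover_rate=0.7,
           mutation_rate=0.01, target=None, seed=0):
    """Minimal generational GA over fixed-length lists of 0/1 bits.
    `fitness` maps a bit list to a number (higher is better)."""
    rng = random.Random(seed)

    def select(pop):
        # 2. Tournament of two: the fitter individual wins, which weights
        #    selection towards the best solutions so far.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    # 1. Initial population of random alternatives.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # 4. Terminating condition: a target fitness or the generation limit.
        if target is not None and fitness(best) >= target:
            break
        children = [best[:]]  # elitism: keep the best solution so far
        while len(children) < pop_size:
            p1, p2 = select(pop), select(pop)
            # 3a. Crossover: swap the tails of the parents at a random cut point.
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)
                kids = [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
            else:
                kids = [p1[:], p2[:]]
            # 3b. Mutation: flip each bit with a small probability.
            for kid in kids:
                for i in range(n_bits):
                    if rng.random() < mutation_rate:
                        kid[i] = 1 - kid[i]
            children.extend(kids)
        pop = children[:pop_size]
        best = max(pop, key=fitness)
    return best


if __name__ == "__main__":
    # Toy "OneMax" problem: maximize the number of 1 bits.
    print(evolve(sum, 20, target=20))
```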

53
Genetic Algorithms

Sample chromosome
A contrived simple problem
Given the digits 0 through 9 and the operators
+, -, *, and /, find a sequence that will
represent a given target number. The operators
will be applied sequentially from left to right
as you read.
Encoding

54
Genetic Algorithms

Encoded solution
Crossover
Mutation: randomly changing bits
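The encoding table from the slides is not reproduced, so this sketch assumes a common scheme for this puzzle: each gene is 4 bits, 0000 to 1001 encode the digits 0 to 9, 1010 to 1101 encode +, -, * and /, and 1110 and 1111 are unused. The decoding rule (skip any gene that does not fit a digit, operator, digit, ... pattern) and all function names are hypothetical. The functions show decoding, left-to-right evaluation, crossover, and mutation; they could be plugged into a GA loop with a fitness based on the distance to the target number.

```python
import random

# Assumed 4-bit gene encoding: 0000-1001 are the digits 0-9,
# 1010-1101 are the operators below, and 1110/1111 are unused.
OPERATORS = {10: "+", 11: "-", 12: "*", 13: "/"}


def decode(bits):
    """Read 4-bit genes left to right, keeping a digit, operator, digit, ... sequence."""
    tokens, want_digit = [], True
    for i in range(0, len(bits) - 3, 4):
        gene = int(bits[i:i + 4], 2)
        if want_digit and gene <= 9:
            tokens.append(gene)
            want_digit = False
        elif not want_digit and gene in OPERATORS:
            tokens.append(OPERATORS[gene])
            want_digit = True
        # Genes that do not fit the expected pattern are skipped.
    if tokens and isinstance(tokens[-1], str):
        tokens.pop()  # drop a trailing operator with no digit after it
    return tokens


def evaluate(tokens):
    """Apply the operators strictly from left to right (no precedence)."""
    if not tokens:
        return 0
    value = tokens[0]
    for op, digit in zip(tokens[1::2], tokens[2::2]):
        if op == "+":
            value += digit
        elif op == "-":
            value -= digit
        elif op == "*":
            value *= digit
        elif digit != 0:  # "/": division by zero is simply skipped
            value /= digit
    return value


def crossover(a, b, rng):
    """Single-point crossover: swap the tails of two bit strings at a random cut."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(("1" if c == "0" else "0") if rng.random() < rate else c
                   for c in bits)


if __name__ == "__main__":
    chromosome = "0110" "1010" "0101" "1100" "0100"  # 6 + 5 * 4
    print(decode(chromosome), evaluate(decode(chromosome)))  # [6, '+', 5, '*', 4] 44
```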

55
Genetic Algorithms

While not necessarily spatial per se, a genetic
approach to model evaluation and selection can be
applied to many problems that require selecting
among many possible models.

56
Fuzzy Modeling

Using fuzzy modeling to classify coverage in
remotely sensed data involves fuzzy logic in
combination with a genetic classification process
(from TNTMips user guide)

57
Fuzzy Modeling

58
Fuzzy Modeling

The left-hand image shows the first-cut results
of an unsupervised image classification. The
right-hand image shows the extent to which each
cell differs from the center of its assigned
spectral class, so darker areas are more likely
to be in the assigned class.

59
Fuzzy Modeling

The Fuzzy classification method uses rules of
fuzzy logic, which recognize that class
boundaries may be imprecise or gradational.
It creates an initial set of prototype classes,
then determines a membership grade for each
class for every cell.
The grades are used to adjust the class
assignments and calculate new class centers, and
the process repeats until the iteration limit is
reached. (TNTMips user guide).

60
Fuzzy Modeling

Champions of fuzzy modeling consider probability
theory to be a special case of fuzzy modeling.
Others consider fuzzy modeling to be unnecessary,
and replaceable by Bayesian probability.

61
Neural Networks

The last geocomputational technique we will cover
is neural networks, or neurocomputing.
Definition: "A computational neural network (CNN)
is a parallel distributed information structure
consisting of a set of adaptive processing
(computational) elements and a set of
unidirectional data connections" (Fischer and
Abrahart).
These models have been inspired by neuroscience,
but in no way actually model biological or
neurological phenomena.

62
Neural Networks

63
Neural Networks

Typical neural network diagram

64
Neural Networks

A Neural Network consists of various layers.
Each layer can have any number of neurons in it.
The first layer of the network is called an input
layer, and it is here we apply the input.
The last layer is called the output layer, and it
is from here we take the output.
A neural network can have any number of hidden
layers, between the input and output layer.
In most neural network models, a neuron in one
layer is connected to all neurons in the next
layer (from a description by Anoop Madhusudanan).

65
Neural Networks

Here is a diagram of the simplest possible neural
network. N1 and N2 are inputs, N3 and N4 are in
a hidden layer, and N5 is the output.
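The N1 to N5 network can be written out directly as a forward pass. This is a sketch under stated assumptions: each neuron computes a weighted sum plus a bias and passes it through a sigmoid transfer function, and the weights are hypothetical values hand-picked (not trained) so that the network approximates XOR, a function no single neuron can represent. N3 acts roughly like OR, N4 like NAND, and N5 combines them like AND.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs passed through a sigmoid transfer function."""
    return sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)


def forward(n1, n2, params):
    """Inputs N1, N2 -> hidden N3, N4 -> output N5, fully connected between layers."""
    n3 = neuron([n1, n2], params["w3"], params["b3"])
    n4 = neuron([n1, n2], params["w4"], params["b4"])
    return neuron([n3, n4], params["w5"], params["b5"])


# Hypothetical hand-picked weights that make the network approximate XOR.
PARAMS = {
    "w3": [20.0, 20.0], "b3": -10.0,    # N3 behaves like OR
    "w4": [-20.0, -20.0], "b4": 30.0,   # N4 behaves like NAND
    "w5": [20.0, 20.0], "b5": -30.0,    # N5 behaves like AND(N3, N4)
}

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, round(forward(a, b, PARAMS), 3))
```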

66
Neural Networks

Within each node, or neuron, there is a
function that determines the output of the node
based on the inputs
Input data are typically specified to take values
in any range, whereas output is given in limited
ranges.

67
Neural Networks

Each input has a weight. The weighted sum of the
inputs constitutes an activity level. When the
activity level reaches a certain threshold, the
output changes.
The output may also change continuously with
changes in input, for example following a curve.
This curve represents the transfer function, by
which inputs are converted to outputs.
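The difference between a threshold output and a continuous transfer function can be shown for a single neuron. The sketch assumes a step function with its threshold at 0 and a sigmoid curve, two common choices; the weights are hypothetical, and one is negative to show an inhibitory input.

```python
import math


def activity(inputs, weights):
    """Activity level: the weighted sum of a neuron's inputs."""
    return sum(x * w for x, w in zip(inputs, weights))


def step_transfer(a, threshold=0.0):
    """Output switches from 0 to 1 once the activity reaches the threshold."""
    return 1.0 if a >= threshold else 0.0


def sigmoid_transfer(a):
    """Output changes smoothly with activity, following an S-shaped curve."""
    return 1.0 / (1.0 + math.exp(-a))


weights = [0.5, -1.0, 2.0]  # hypothetical; the negative weight is inhibitory
for inputs in ([1, 0, 0], [1, 1, 0], [0, 0, 1]):
    a = activity(inputs, weights)
    print(inputs, a, step_transfer(a), round(sigmoid_transfer(a), 3))
```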

68
Neural Networks

Neural networks must be trained. Training
consists of adjusting the weighting of inputs
depending on the value of outputs.
This can be accomplished with a feedback function,
which, for example, may take the output and feed
it back to the hidden or input layers. The
hidden layer may in turn then feed back to the
input layer.
The feedback function adjusts the weighting so
that the network produces correct results more
often with further iterations. Note that weights
can be negative, so neurons may be either
inhibitory or excitatory.
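The simplest concrete version of this weight-adjusting feedback is the classic perceptron learning rule for a single threshold neuron, sketched below. It is named plainly as a swapped-in illustration: networks with hidden layers are usually trained with backpropagation, which this does not implement. After each training case, every weight is nudged by the output error times its input, so only inputs that were active are adjusted. The learning rate and epoch limit are assumptions.

```python
def predict(weights, bias, inputs):
    """Threshold neuron: output 1 if the activity level reaches 0, else 0."""
    activity = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if activity >= 0 else 0


def train_perceptron(samples, epochs=20, rate=1.0):
    """Perceptron learning rule: nudge weights in proportion to the output error.

    `samples` is a list of (inputs, expected_output) pairs with outputs 0 or 1.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for inputs, expected in samples:
            error = expected - predict(weights, bias, inputs)  # feedback signal
            if error:
                errors += 1
                # Raise or lower each weight according to whether its input was active.
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                bias += rate * error
        if errors == 0:  # every training case is now answered correctly
            break
    return weights, bias


OR_SAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_perceptron(OR_SAMPLES)
print(weights, bias)
```

A single neuron trained this way can only learn linearly separable functions such as OR; it cannot learn XOR, which is why networks with hidden layers, and training methods for them, are needed.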

69
Neural Networks

For example, a neural network designed to detect
the pattern of the number four might have an
input node for every pixel in a grid.
It would be trained (weights adjusted) using
multiple images of the number four.

70
Neural Networks

Training may be supervised or unsupervised. Most
implementations are supervised.
In supervised training, the training data
consists of inputs together with expected
outputs.
Neural networks work best with boolean, ordinal,
or real-valued numeric data. Non-ordinal
(nominal) data types tend not to work because
they do not lend themselves to weighting.
With enough training data where the inputs and
outputs are known, the network implicitly begins
to model a function which can be applied to
unknown inputs.
The training data selected depends on user
knowledge and intuition about the problem domain.
Neural networks take into account existing
knowledge about a problem via the training
process.
(Paraphrased from StatSoft manual).

71
Neural Networks

Neural networks work best when there is a lot of
training data. For most practical problems,
hundreds or thousands of training cases are
required. The exact number depends on the nature
of the problem.
Neural networks tend to be tolerant of noisy
input, but there can be problems if the training
data does not include outliers found in the
subject data. In these cases the outliers may be
ignored.
They tend to work well with regression problems
(where the output will be a specific number) and
with classification problems (where the output
may be a boolean value or a set of values).
Pattern recognition is one common application.

72
Neural Networks

Applications in GIS
Image classification: neural networks can be
used to optimally classify remotely sensed
images.
Feature detection: the pattern-recognition
strengths of neural networks could be used to
identify features in remotely sensed data.
Transportation route selection: for example,
route and congestion data can be used as inputs
to neurons that optimize for shortest routes.
Optimal routing could be the output.

73
Neural Networks

Transportation example (from Thurston, GIS
Café.com)
A is starting point and B is destination.
Intersections are nodes in the network. The red
square is the location of an accident.

74
Neural Networks

The weighting of node inputs can be calibrated to
the traffic capacity of each intersection. The
accident causes the affected nodes to inhibit
activation, resulting in the selection of
non-affected nodes.

75
Related Topics

Parallel processing and object-oriented software
development are closely related to
GeoComputation.
The following slides give a quick overview of
these related topics.

76
Parallel Processing

What is it?
Parallel processing is using more than one CPU
to perform computational tasks more quickly.
Types of parallel processing:
SIMD: Single instruction stream, multiple data
streams.
MIMD: Multiple instruction streams, multiple
data streams.
SISD: Single instruction stream, single data
stream; this is the model in place in most current
PCs.
MISD: Multiple instruction streams, single data
stream. This is largely a theoretical construct,
because it is rarely practical to have multiple
processors manage the same data.

77
Parallel Processing

SIMD: each processor runs the same instructions on
different data (from Johnston, Introduction to
HPC).

78
Parallel Processing

MIMD: each processor runs different instructions
on different data.

79
Parallel Processing

MIMD has proven to be the most generally
applicable parallel processing model.
Splitting up tasks into chunks that can be
executed in parallel is widely applicable to
geographic problems, where similar tasks need to
be applied to different data.
This kind of processing model can be implemented
on a SISD machine, and is a useful way of
thinking about many geocomputational practices.

80
Parallel Processing

When does it make sense to use it?
Some tasks are easier than others to make
parallel. As the old cliché goes, nine women can't
produce a baby in one month, but 90 women can
create 90 babies in nine months.
Whether or not a model can be implemented in a
parallel fashion depends not only on the nature
of the problem, but also on the number of
iterations needed.
Problems that are inherently serial, or that
would be very difficult to make parallel
(parallel code tends to take several times longer
to write), are not suitable.
But most problems in geography are likely to be
parallelizable (e.g., you can usually subdivide a
map, and a problem, into sectors that can be
processed independently).
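Subdividing a map into independently processed sectors can be sketched with Python's standard `concurrent.futures` module. The raster, the reclassification task, and the threshold of 100 are hypothetical. The sketch uses `ThreadPoolExecutor` so it runs anywhere; in CPython, threads do not speed up CPU-bound pure-Python work because of the global interpreter lock, so for real speedups one would swap in `ProcessPoolExecutor`, which has the same `map` interface but needs an `if __name__ == "__main__":` guard on platforms that start processes by spawning.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(grid, n_chunks):
    """Split a raster (a list of rows) into horizontal strips of roughly equal size."""
    size = max(1, -(-len(grid) // n_chunks))  # ceiling division
    return [grid[i:i + size] for i in range(0, len(grid), size)]


def reclassify_strip(strip, threshold=100):
    """Hypothetical per-cell task: 1 where the value exceeds `threshold`, else 0.
    It needs no neighboring cells, so every strip can be processed independently."""
    return [[1 if v > threshold else 0 for v in row] for row in strip]


def parallel_reclassify(grid, workers=4):
    strips = split_rows(grid, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(reclassify_strip, strips))  # map keeps input order
    # Stitch the processed strips back into one raster.
    return [row for strip in results for row in strip]
```

Operations that do depend on neighboring cells, such as computing slope, would need strips that overlap by at least one row so that cells on the strip edges see their full neighborhood.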

81
Parallel Processing

One interesting sidebar: distributed computing,
or task farming across multiple machines.

82
Parallel Processing

For example, large distributed networks of
volunteer machines can be used to process large
parallel processing tasks such as global climate
modeling.
BOINC software for distributed volunteer
computing supports such processing.

83
Parallel Processing

BOINC Applications
Climate Prediction Model

84
Object-Oriented Programming

Traditional programming consists of functions
that roughly approximate the notion of a
mathematical function.
Problems are modeled by breaking them down into
layers of functions in a process called
structured decomposition.
Object oriented programming instead identifies
units of behavior and data that can be grouped
together to model some real-world entity.
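The contrast between the two styles can be shown with a small spatial example. The `Parcel` class, its attributes, and the land-use values are hypothetical. The free-standing function computes a polygon's area from raw coordinates using the shoelace formula (structured style), while the class bundles a parcel's data with the behaviors that belong to it (object-oriented style).

```python
from dataclasses import dataclass, field


def polygon_area(vertices):
    """Structured style: a free-standing function that works on raw coordinates
    (shoelace formula; vertices listed in order around the polygon)."""
    n = len(vertices)
    s = sum(vertices[i][0] * vertices[(i + 1) % n][1]
            - vertices[(i + 1) % n][0] * vertices[i][1]
            for i in range(n))
    return abs(s) / 2


@dataclass
class Parcel:
    """Object-oriented style: a land parcel bundles its data with its behavior."""
    parcel_id: str
    land_use: str
    vertices: list = field(default_factory=list)

    def area(self):
        return polygon_area(self.vertices)

    def is_residential(self):
        return self.land_use == "residential"


lot = Parcel("p1", "residential", [(0, 0), (4, 0), (4, 3), (0, 3)])
print(lot.area(), lot.is_residential())
```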

85
Object-Oriented Programming

86
Object-Oriented Programming

87
Object-Oriented Programming

Parallel processing and object-oriented software
development make it easy to approach
computational problems in a bottom-up fashion,
using simple components to model systems that
would otherwise be extremely complex.
An object encapsulates its state and behaviors,
making it easy to create complex systems composed
of autonomous software entities.
An OOP and parallel-programming perspective
applied to spatial modeling is a convenient
approach for most GeoComputation techniques.
