Title: Descriptive Statistics for Spatial Distributions. Review: Standard Descriptive Statistics. Centrographic Statistics for Spatial Data: Mean Center, Centroid, Standard Distance Deviation, Standard Distance Ellipse. Density Kernel Estimation, Mapping.
1. Descriptive Statistics for Spatial Distributions
- Review: Standard Descriptive Statistics
- Centrographic Statistics for Spatial Data
  - Mean Center, Centroid, Standard Distance Deviation, Standard Distance Ellipse
- Density Kernel Estimation, Mapping
2. Spatial Analysis: successive levels of sophistication
- Spatial data description: classic GIS capabilities
  - Spatial queries and measurement
  - Buffering, map layer overlay
- Exploratory Spatial Data Analysis (ESDA): searching for patterns and possible explanations
  - GeoVisualization through data graphing and mapping
  - Descriptive spatial statistics: Centrographic statistics
- Spatial statistical analysis and hypothesis testing
  - Are the data to be expected, or are they unexpected relative to some statistical model, usually of a random process?
- Spatial modeling or prediction
  - Constructing models (of processes) to predict spatial outcomes (patterns)
3. Standard Statistical Analysis
- Two parts
  1. Descriptive statistics
     - Concerned with obtaining summary measures to describe a set of data
     - For example, the mean and the standard deviation
  2. Inferential statistics
     - Concerned with making inferences from samples about a population
Similarly, we have Descriptive and Inferential Spatial Statistics.
4. Spatial Statistics
- Descriptive Spatial Statistics: Centrographic Statistics (this time)
  - Single, summary measures of a spatial distribution
  - Spatial equivalents of mean, standard deviation, etc.
- Inferential Spatial Statistics: Point Pattern Analysis (next time)
  - Analysis of point location only: no quantity or magnitude (no attribute variable)
  - Quadrat Analysis
  - Nearest Neighbor Analysis, Ripley's K function
- Spatial Autocorrelation (Weeks 5 and 6)
  - One attribute variable with different magnitudes at each location
  - The Weights Matrix
  - Global Measures of Spatial Autocorrelation (Moran's I, Geary's C, Getis/Ord Global G)
  - Local Measures of Spatial Autocorrelation (LISA and others)
- Prediction with Correlation and Regression (Week 7)
  - Two or more attribute variables
  - Standard statistical models
  - Spatial statistical models
5. Standard Statistical Analysis: A Quick Review
- 1. Descriptive statistics
  - Concerned with obtaining summary measures to describe a set of data
  - Calculate a few numbers to represent all the data
  - We begin by looking at one variable (univariate)
  - Later, we will look at two variables (bivariate)
- Three types
  - Measures of Central Tendency
  - Measures of Dispersion or Variability
  - Frequency distributions
I hope you are already familiar with these. I will quickly review the main ideas.
6. Standard Descriptive Statistics: Central Tendency
- Central tendency: a single summary measure for one variable
  1. Mean (average)
  2. Median (middle value)
     - 50% of values are larger and 50% are smaller
     - Rank order the data and select the middle number
  3. Mode (most frequently occurring value)
These may be obtained in ArcGIS by opening a table, right-clicking on a column heading, and selecting Statistics, or by going to ArcToolbox > Analysis > Statistics > Summary Statistics.
7. Calculation of mean and median
- Mean = 296.15 / 34 = 8.71
- Median = (7.69 + 7.8) / 2 = 7.75 (there are 2 middle values)
Note: data for Taiwan is included.
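As a minimal sketch of the same calculation, the snippet below uses Python's standard `statistics` module. The six values are hypothetical and are not the 34-province data from the slide; they are chosen only so that the two middle values match the ones quoted above.

```python
from statistics import mean, median

# Hypothetical illiteracy rates (percent); NOT the 34-province data set.
rates = [2.9, 5.1, 7.69, 7.8, 9.4, 14.2]

avg = mean(rates)    # sum of the values divided by how many there are
mid = median(rates)  # even count, so the two middle values are averaged

print("mean:", round(avg, 3))
print("median:", round(mid, 3))
```

With an even number of observations there is no single middle value, which is why the slide averages 7.69 and 7.8.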
8. Standard Descriptive Statistics: Variability or Dispersion
- Dispersion: measures of spread or variability
  - Variance
    - Average squared distance of observations from the mean
  - Standard Deviation (square root of variance)
    - Roughly, the average distance of observations from the mean
These may be obtained in ArcGIS by opening a table, right-clicking on a column heading, and selecting Statistics, or by going to ArcToolbox > Analysis > Statistics > Summary Statistics.
9. Calculation of Variance and Standard Deviation
- Variance from definition formula: 1361.370 / 34 = 40.04
- Variance from computation formula: [3940.924 - (296.15 × 296.15) / 34] / 34 = 40.04
- Standard deviation = √40.04 = 6.33
Note: data for Taiwan is included.
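The two variance formulas can be checked against each other with a short sketch. The function names are illustrative only; the totals (N = 34, sum of values 296.15, sum of squared values 3940.924) are the ones quoted on the slide, and the small list `sample` is hypothetical.

```python
import math

def variance_definition(values):
    """Population variance: mean squared deviation from the mean."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n

def variance_computation(values):
    """Same quantity from running totals: (sum of x^2 - (sum of x)^2 / n) / n."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    return (total_sq - total * total / n) / n

# Totals quoted on the slide for the 34 provinces.
N = 34
SUM_X = 296.15
SUM_X2 = 3940.924
slide_variance = (SUM_X2 - SUM_X * SUM_X / N) / N
slide_sd = math.sqrt(slide_variance)

# Hypothetical values, used only to show that the two formulas agree.
sample = [2.0, 4.0, 4.0, 5.0, 10.0]

print(round(slide_variance, 2), round(slide_sd, 2))
print(variance_definition(sample), variance_computation(sample))
```

The computation formula needs only the running totals, which is why it was preferred for hand calculation; both divide by N, as on the slide.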
10. Classic Descriptive Statistics: Univariate Frequency Distributions
- A count of the frequency with which values occur on a variable
- Example: US population by age group; 50 million people age 45-59 (data for 2000)
  Source: US Bureau of the Census, Statistical Abstract of the US, http://www.census.gov/compendia/statab/
- Often represented by the area under a frequency curve
  - This area represents 100% of the data
In ArcGIS, you may obtain frequency counts on a categorical variable via ArcToolbox > Analysis > Statistics > Frequency.
11. Frequency Distributions for China Province Data
Symmetric distribution
- Height of bar shows frequency
- There are 16 provinces with percent urban between 38.4 and 50.8 (mode)
- Mode = (38.1 + 50.8) / 2 = 44.5
- Mean = 48.97
- Median = 44.0
- Symmetric distribution: mean ≈ median ≈ mode
Skewed distribution (right skew)
- Height of bar shows frequency
- There are 17 provinces with illiteracy between 5.4 and 10.7 (mode)
- Mode = (5.4 + 10.7) / 2 = 8.05
- Mean = 8.7
- Median = (7.69 + 7.8) / 2 = 7.75
- Skewed distribution: mean > median
- Tail extends to the right; the mean is pulled to the right
12. Frequency Distributions for China Province Data: Variability
Symmetric distribution
- Standard deviation: a measure of the average distance of each observation from the mean
- Standard deviation = 14.8
Skewed distribution (right skew; tail extends to the right)
- Standard deviation = 6.33
- On average, illiteracy values are closer to the mean: there is less spread in these data
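13. Caution: these values are incorrect!
- Why?
  - It is incorrect to calculate a simple mean of percentages
  - Each percentage has a different base population
- Should calculate a weighted mean:
  $$\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}$$
  where $x_i$ is the percentage for province $i$ and $w_i$ is the population of each province
- A very common error in GIS because we frequently use aggregated data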
14. Correct Values!
- Unweighted mean = 8.7
- Weighted mean = 7.75
- The weighted mean is smaller. Why?
  - The largest provinces have lower illiteracy
  - The highest rates are in small provinces
15. Calculation of weighted mean
- Unweighted mean = 296.15 / 34 = 8.71
- Weighted mean = 10,445,390,141 / 1,347,382,600 = 7.75
Note: we should also calculate a weighted standard deviation.
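A minimal sketch of the weighted mean, together with the weighted standard deviation the note asks for. The function names and the three-region data are hypothetical; the weighted standard deviation shown is one common population-style definition (weighted mean squared deviation), which is an assumption since the slides do not give a formula for it.

```python
import math

def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

def weighted_std(values, weights):
    """Square root of the weighted mean squared deviation (assumed definition)."""
    m = weighted_mean(values, weights)
    var = sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / sum(weights)
    return math.sqrt(var)

# Hypothetical regions: two large ones with low rates, one small one with a high rate.
rates = [4.0, 6.0, 20.0]                         # percent illiterate
population = [90_000_000, 60_000_000, 3_000_000]

unweighted = sum(rates) / len(rates)
weighted = weighted_mean(rates, population)
spread = weighted_std(rates, population)

# The totals quoted on the slide give the same kind of ratio.
slide_weighted = 10_445_390_141 / 1_347_382_600

print(unweighted, round(weighted, 2), round(spread, 2), round(slide_weighted, 2))
```

As on the slides, the weighted mean falls below the unweighted mean because the high rate belongs to a region with few people.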
16. Centrographic Statistics: Descriptive statistics for spatial distributions
- Mean Center
- Centroid
- Standard Distance Deviation
- Standard Distance Ellipse
- Density Kernel Estimation
(Add Frequency Distributions and mapping: use GeoDA to produce)
17. Centrographic Statistics
- Measures of Centrality
  - Mean Center
  - Centroid
  - Weighted mean center
  - Center of Minimum Distance
- Measures of Dispersion
  - Standard Distance
  - Standard Deviational Ellipse
- Two-dimensional (spatial) equivalents of standard descriptive statistics for a single variable (univariate)
- Used for point data
  - May be used for polygons by first obtaining the centroid of each polygon
- Best used to compare two distributions with each other
  - 1990 with 2000
  - Males with females
(O'Sullivan and Unwin, Ch. 4, pp. 77-81)
18. Mean Center
- Simply the mean of the X and the mean of the Y coordinates for a set of points
- The sum of differences between the mean X and all other Xs is zero (same for Y)
- Minimizes the sum of squared distances between itself and all points
  - Distant points have a large effect: values for Xinjiang will have a larger effect
- Provides a single-point summary measure for the location of a set of points
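A short sketch of the mean center, with a check of the zero-sum property stated above. The function name and the four coordinate pairs are hypothetical.

```python
def mean_center(points):
    """Return (mean of X, mean of Y) for a list of (x, y) tuples."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return mean_x, mean_y

# Hypothetical point coordinates.
points = [(1.0, 1.0), (2.0, 4.0), (3.0, 1.0), (6.0, 2.0)]

cx, cy = mean_center(points)

# Deviations from the mean center sum to zero in each dimension.
dev_x = sum(x - cx for x, _ in points)
dev_y = sum(y - cy for _, y in points)

print((cx, cy), dev_x, dev_y)
```

The point at X = 6 lies far from the others and pulls the mean X to 3, which illustrates the sensitivity to distant points.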
19. Centroid
- The equivalent for polygons of the mean center for a point distribution
- The center of gravity or balancing point of a polygon
- If the polygon is composed of straight line segments between nodes, the centroid is given by the average X, average Y of the nodes (there is an example later)
- Calculation is sometimes approximated as the center of the bounding box
  - Not good
- By calculating the centroids for a set of polygons, we can apply Centrographic Statistics to polygons
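The node-average rule above is easy to compute, but its result depends on how densely the boundary is digitized. The sketch below compares it with the area-weighted (shoelace) centroid, which is not in the slides and is added here only for comparison. Function names and the test polygon are hypothetical.

```python
def vertex_average(vertices):
    """Average X and average Y of the polygon's nodes (the rule on the slide)."""
    n = len(vertices)
    return (sum(x for x, _ in vertices) / n, sum(y for _, y in vertices) / n)

def area_centroid(vertices):
    """Area-weighted centroid of a simple polygon via the shoelace formula."""
    twice_area = 0.0
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]   # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # 6 * area = 3 * twice_area
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))

# A 4 x 4 square, then the same square with extra nodes along its bottom edge.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
dense = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 4), (0, 4)]

print(vertex_average(square), area_centroid(square))
print(vertex_average(dense), area_centroid(dense))
```

For the plain square both methods agree. With the extra bottom nodes the node average drops toward the bottom edge, while the area-weighted centroid stays at the balancing point.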
20. Centroids for Provinces of China
21. Centroids for Provinces of China
22. Warning: Centroid may not be inside its polygon
- For Gansu Province, China, the centroid is within the neighboring province of Qinghai
- The problem arises with crescent-shaped polygons
23. Weighted Mean Center
- Produced by weighting each X and Y coordinate by another variable (Wi)
- Centroids derived from polygons can be weighted by any characteristic of the polygon
  - For example, the population of a province
24. Calculating the centroid of a polygon or the mean center of a set of points
(same example data as for area of polygon)
Calculating the weighted mean center. Note how it is pulled toward the high-weight point.
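A minimal sketch of the weighted mean center. The coordinates and weights are hypothetical and are not the example data from the slide; the last point is given a weight of 5 to show the pull toward a high-weight point.

```python
def weighted_mean_center(points, weights):
    """Weight each X and Y coordinate by w_i, then divide by the sum of weights."""
    total_w = sum(weights)
    wx = sum(w * x for (x, _), w in zip(points, weights)) / total_w
    wy = sum(w * y for (_, y), w in zip(points, weights)) / total_w
    return wx, wy

# Hypothetical points; the last one carries five times the weight of the others.
points = [(1.0, 1.0), (2.0, 4.0), (3.0, 1.0), (6.0, 2.0)]
weights = [1.0, 1.0, 1.0, 5.0]

unweighted = weighted_mean_center(points, [1.0] * len(points))
weighted = weighted_mean_center(points, weights)

print(unweighted, weighted)
```

With equal weights the function reduces to the ordinary mean center at (3, 2); with the heavier weight the center moves to (4.5, 2), toward the point at (6, 2).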
25. Center of Minimum Distance or Median Center
- Also called the point of minimum aggregate travel
- That point (MD) which minimizes the sum of distances between itself and all other points (i)
- No direct solution. Can only be derived by approximation
- Not a determinate solution. Multiple points may meet this criterion (see next bullet).
- Same as Median center
  - Intersection of two orthogonal lines (at right angles to each other), such that each line has half of the points to its left and half to its right
  - Because the orientation of the axes for the lines is arbitrary, multiple points may meet this criterion.
Source: Neft, 1966
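One way to carry out the approximation is an iterative re-weighting scheme commonly known as Weiszfeld's algorithm. The slides do not name a method, so this choice is an assumption. The sketch targets the "minimizes the sum of distances" definition only; the function names, tolerance, and points are hypothetical.

```python
import math

def total_distance(center, points):
    """Sum of straight-line distances from center to every point."""
    return sum(math.hypot(px - center[0], py - center[1]) for px, py in points)

def center_of_minimum_distance(points, tol=1e-9, max_iter=1000):
    """Approximate the point minimizing total distance (Weiszfeld-style iteration)."""
    # Start from the mean center.
    x = sum(px for px, _ in points) / len(points)
    y = sum(py for _, py in points) / len(points)
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < 1e-12:
                # Simplification: skip a point that coincides with the estimate
                # to avoid dividing by zero.
                continue
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        if denom == 0.0:
            break
        new_x, new_y = num_x / denom, num_y / denom
        moved = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < tol:
            break
    return x, y

# Hypothetical points: three clustered near the origin and one distant outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
mean_c = (sum(p[0] for p in points) / 4, sum(p[1] for p in points) / 4)
cmd = center_of_minimum_distance(points)

print(cmd, total_distance(cmd, points), total_distance(mean_c, points))
```

Each step re-estimates the center as a mean weighted by inverse distance, so nearby points count for more. The result is pulled toward the outlier less than the mean center is.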
26. Median and Mean Centers for US Population
- Median center: intersection of a north/south and an east/west line drawn so that half of the population lives above and half below the e/w line, and half lives to the left and half to the right of the n/s line
- Mean center: balancing point of a weightless map, if equal weights were placed on it at the residence of every person on census day
Source: US Statistical Abstract 2003
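27. Standard Distance Deviation
- Represents the standard deviation of the distance of each point from the mean center
- Is the two-dimensional equivalent of the standard deviation for a single variable
- Given by
  $$SDD = \sqrt{\frac{\sum_{i=1}^{N} d_{ic}^{2}}{N}}$$
  where $d_{ic}$ is the distance from point $i$ to the mean center $(\bar{X}, \bar{Y})$, which by Pythagoras reduces to
  $$SDD = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2 + \sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N}}$$
  essentially the average distance of points from the center
- Provides a single unit measure of the spread or dispersion of a distribution
- We can also calculate a weighted standard distance, analogous to the weighted mean center. With weights $w_i$ and weighted mean center $(\bar{X}_w, \bar{Y}_w)$:
  $$SDD_w = \sqrt{\frac{\sum_{i=1}^{N} w_i (X_i - \bar{X}_w)^2 + \sum_{i=1}^{N} w_i (Y_i - \bar{Y}_w)^2}{\sum_{i=1}^{N} w_i}}$$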
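A small sketch of the standard distance, with optional weights. The denominator used here (N, or the sum of weights) is an assumption, since some references divide by N - 2 instead. The function name and coordinates are hypothetical.

```python
import math

def standard_distance(points, weights=None):
    """Root of the (weighted) mean squared distance from the (weighted) mean center."""
    if weights is None:
        weights = [1.0] * len(points)   # unweighted case
    total_w = sum(weights)
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total_w
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total_w
    # Pythagoras: squared distance = squared X deviation + squared Y deviation.
    sum_sq = sum(w * ((x - cx) ** 2 + (y - cy) ** 2)
                 for (x, y), w in zip(points, weights))
    return math.sqrt(sum_sq / total_w)

# Hypothetical points at the corners of a 4 x 4 square.
corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]

sdd = standard_distance(corners)
sdd_w = standard_distance(corners, weights=[1.0, 1.0, 1.0, 5.0])

print(round(sdd, 3), round(sdd_w, 3))
```

Every corner is the same distance from the center, so the unweighted result equals that distance, √8. Drawing a circle of this radius around the mean center gives the usual map display.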
28. Standard Distance Deviation: Example
Circle with radius = SDD = 2.9
29. Standard Deviational Ellipse: concept
- Standard distance deviation is a good single measure of the dispersion of the points around the mean center, but it does not capture any directional bias
  - It doesn't capture the shape of the distribution.
- The standard deviational ellipse gives dispersion in two dimensions
- Defined by 3 parameters
  - Angle of rotation
  - Dispersion (spread) along major axis
  - Dispersion (spread) along minor axis
- The major axis defines the direction of maximum spread of the distribution
- The minor axis is perpendicular to it and defines the minimum spread
30. Standard Deviational Ellipse: calculation
- Formulae for calculation may be found in references such as
  - Lee and Wong, pp. 48-49
  - Levine, Chapter 4, pp. 125-128
- Basic concept is to
  - Find the axis going through maximum dispersion (thus derive the angle of rotation)
  - Calculate the standard deviation of the points along this axis (thus derive the length (radius) of the major axis)
  - Calculate the standard deviation of points along the axis perpendicular to the major axis (thus derive the length (radius) of the minor axis)
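The three steps can be sketched by finding the principal axes of the coordinate covariance matrix. This is a substitute technique, not the formulae in Lee and Wong or Levine: those references, and GIS software, may measure the angle differently (for example clockwise from north) and apply extra scale factors. All names and data are hypothetical.

```python
import math

def standard_deviational_ellipse(points):
    """Return (center, angle in degrees from the X axis, major SD, minor SD)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Variances and covariance of the coordinates (divided by n).
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Step 1: direction of maximum dispersion, counterclockwise from the X axis.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # Steps 2 and 3: standard deviations along that axis and perpendicular to it.
    half_trace = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    major = math.sqrt(half_trace + spread)
    minor = math.sqrt(max(0.0, half_trace - spread))  # guard against tiny negatives
    return (cx, cy), math.degrees(theta), major, minor

# Hypothetical points lying exactly on the line y = x.
diagonal = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
center, angle, major, minor = standard_deviational_ellipse(diagonal)

print(center, angle, round(major, 3), round(minor, 3))
```

For points on a 45-degree line, all the spread lies along the major axis and the minor axis collapses to zero, which is the extreme case of directional bias.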
31. Mean Center and Standard Deviational Ellipse: example
There appears to be no major difference between the location of the software and the telecommunications industries in North Texas.
32. Implementation in ArcGIS
In ArcToolbox
- To calculate the centroid for a set of polygons with ArcGIS
  - ArcToolbox > Data Management Tools > Features > Feature to Point (requires ArcInfo)
- To calculate using GeoDA
  - Tools > Shape > Polygons to Centroids
33. Density Kernel Estimation
- Commonly used to visually enhance a point pattern
- Is an example of exploratory spatial data analysis (ESDA)
Figure panels: Kernel = 10,000; Kernel = 5,000
34. SIMPLE kernel option (see example above)
- A neighborhood or kernel is defined around each grid cell, consisting of all grid cells with centers within the specified kernel (search) radius
- The number of points that fall within that neighborhood is totaled
- The point total is divided by the area of the neighborhood to give the grid cell's value
Density KERNEL option
- A smoothly curved surface is fitted over each point
- The surface value is highest at the location of the point, and diminishes with increasing distance from the point, reaching zero at the kernel distance from the point
- Volume under the surface equals 1 (or the population value if a population variable is used)
- Uses the quadratic kernel function described in Silverman (1986, p. 76, equation 4.5)
- The density at each output grid cell is calculated by adding the values of all the kernel surfaces where they overlay the grid cell center
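Both options can be sketched for a single output cell. This is not the ArcGIS implementation. The simple option here uses the area of a circle of the search radius, a simplification of the cell-based neighborhood, and the quadratic kernel is scaled so that the volume under each surface is 1. Function names and data are hypothetical.

```python
import math

def simple_density(cell_center, points, radius):
    """Count points within the search radius and divide by the circle's area."""
    cx, cy = cell_center
    count = sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= radius)
    return count / (math.pi * radius ** 2)

def quadratic_kernel_density(cell_center, points, radius, populations=None):
    """Sum the quadratic kernel surfaces of all points at one cell center."""
    if populations is None:
        populations = [1.0] * len(points)   # each point counts once
    cx, cy = cell_center
    scale = 3.0 / (math.pi * radius ** 2)   # makes each surface's volume equal 1
    total = 0.0
    for (x, y), pop in zip(points, populations):
        u2 = ((x - cx) ** 2 + (y - cy) ** 2) / radius ** 2
        if u2 < 1.0:                        # zero at and beyond the kernel distance
            total += pop * scale * (1.0 - u2) ** 2
    return total

# Hypothetical points, in the same units as the radius.
points = [(0.0, 0.0), (3.0, 0.0), (20.0, 0.0)]

print(simple_density((0.0, 0.0), points, radius=5.0))
print(quadratic_kernel_density((0.0, 0.0), points, radius=5.0))
```

To build a full raster, evaluate either function at every output cell center. A larger radius gives a smoother surface, as in the Kernel = 10,000 versus Kernel = 5,000 comparison.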
35. Implementation in ArcGIS
- If you specify a population field, the software calculates as if there are that number of points at that location
- The search radius
  - The size of the neighborhood or kernel which is successively defined around every cell (simple kernel) or each point (density kernel)
- Output cell size
  - Size of each raster cell
- Search radius and output cell size are based on the measurement units of the data (here it is feet)
  - It is good to round them (e.g. to 10,000 and 1,000)
36. What have we learned today?
- We have learned about descriptive spatial statistics, often called Centrographic Statistics
- Next time, we will learn about Inferential Spatial Statistics
37. Project for you
- The China data on my web site has population data for the provinces of China in 2008
- Obtain population counts for 2000, 1990 and/or any other year
- Calculate the weighted mean center of China's population for each year
- Be sure to use the same set of geographic units each time
  - For example, if you do not have data for Taiwan or Hong Kong for one year, omit these geographic units for all years
38. Texts
- O'Sullivan, David and David Unwin, 2010. Geographic Information Analysis. Hoboken, NJ: John Wiley, 2nd ed.
Other Useful Books
- Mitchell, Andy, 2005. ESRI Guide to GIS Analysis, Volume 2: Spatial Measurements and Statistics. Redlands, CA: ESRI Press.
- Allen, David W., 2009. GIS Tutorial II: Spatial Analysis Workbook. Redlands, CA: ESRI Press.
- Wong, David W.S. and Jay Lee, 2005. Statistical Analysis of Geographic Information. Hoboken, NJ: John Wiley, 2nd ed.
- Ned Levine and Associates, CrimeStat III Manual. Washington, D.C.: National Institute of Justice, 2004, with later updates.
  - http://www.icpsr.umich.edu/CrimeStat/
Density Kernel Estimation
- Silverman, B.W., 1986. Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.