Descriptive Statistics for Spatial Distributions. Review: Standard Descriptive Statistics. Centrographic Statistics for Spatial Data: Mean Center, Centroid, Standard Distance Deviation, Standard Distance Ellipse. Density Kernel Estimation, Mapping


1
Descriptive Statistics for Spatial Distributions
Review: Standard Descriptive Statistics
Centrographic Statistics for Spatial Data: Mean Center, Centroid,
Standard Distance Deviation, Standard Distance Ellipse
Density Kernel Estimation, Mapping
2
Spatial Analysis: successive levels of
sophistication
  • Spatial data description: classic GIS
    capabilities
  • Spatial queries: measurement,
    buffering, map layer overlay
  • Exploratory Spatial Data Analysis (ESDA):
    searching for patterns and possible explanations
  • GeoVisualization through data graphing and
    mapping
  • Descriptive spatial statistics: Centrographic
    statistics
  • Spatial statistical analysis and hypothesis
    testing
  • Are data to be expected, or are they
    unexpected relative to some statistical model,
    usually of a random process?
  • Spatial modeling or prediction
  • Constructing models (of processes) to predict
    spatial outcomes (patterns)

3
Standard Statistical Analysis
  • Two parts:
  • 1. Descriptive statistics
  • Concerned with obtaining summary measures to
    describe a set of data
  • For example, the mean and the standard
    deviation
  • 2. Inferential statistics
  • Concerned with making inferences from samples
    about a population

Similarly, we have Descriptive and Inferential
Spatial Statistics
4
Spatial Statistics
  • Descriptive Spatial Statistics: Centrographic
    Statistics (This time)
  • Single, summary measures of a spatial
    distribution
  • Spatial equivalents of mean, standard
    deviation, etc.
  • Inferential Spatial Statistics: Point Pattern
    Analysis (Next time)
  • Analysis of point location only: no
    quantity or magnitude (no attribute variable)
  • --Quadrat Analysis
  • --Nearest Neighbor Analysis, Ripley's K function
  • Spatial Autocorrelation (Weeks 5 and 6)
  • One attribute variable with different magnitudes
    at each location
  • The Weights Matrix
  • Global Measures of Spatial Autocorrelation
    (Moran's I, Geary's C, Getis/Ord Global G)
  • Local Measures of Spatial Autocorrelation (LISA
    and others)
  • Prediction with Correlation and Regression (Week
    7)
  • Two or more attribute variables
  • Standard statistical models
  • Spatial statistical models

5
Standard Statistical Analysis A Quick Review
  • 1. Descriptive statistics
  • Concerned with obtaining summary measures to
    describe a set of data
  • Calculate a few numbers to represent all the data
  • We begin by looking at one variable
    (univariate)
  • Later, we will look at two variables (bivariate)
  • Three types
  • Measures of Central Tendency
  • Measures of Dispersion or Variability
  • Frequency distributions

I hope you are already familiar with these. I
will quickly review the main ideas.
6
Standard Descriptive Statistics Central Tendency
  • Central Tendency: single summary measure for one
    variable
  • 1. mean (average)
  • 2. median (middle value)
  • --50% larger and 50% smaller
  • --rank order data and select middle number
  • 3. mode (most frequently occurring)

These may be obtained in ArcGIS by
--opening a table, right clicking on column heading, and
selecting Statistics
--going to ArcToolbox > Analysis > Statistics > Summary Statistics
7
Calculation of mean and median
Mean = 296.15 / 34 = 8.71
Median = (7.69 + 7.8)/2 = 7.75 (there are 2 middle values)
Note: data for Taiwan is included
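
The three measures of central tendency can be sketched with Python's standard library. The six rates below are hypothetical values chosen for illustration, not the China province data; with an even number of values the median is the average of the two middle values, as in the calculation above.

```python
from statistics import mean, median, multimode

# Hypothetical illiteracy rates (%) for six areas; NOT the China province data.
rates = [4.2, 5.1, 7.69, 7.8, 7.8, 15.0]

avg = mean(rates)         # sum of the values divided by their count
mid = median(rates)       # average of the two middle values, since n is even
modes = multimode(rates)  # list of the most frequently occurring value(s)

print(f"mean={avg:.2f} median={mid:.3f} modes={modes}")
```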
8
Standard Descriptive Statistics Variability or
Dispersion
  • Dispersion: measures of spread or variability
  • Variance
  • average squared distance of observations from
    mean
  • Standard Deviation (square root of variance)
  • roughly the average distance of observations from the mean

These may be obtained in ArcGIS by
--opening a table, right clicking on column heading, and
selecting Statistics
--going to ArcToolbox > Analysis > Statistics > Summary Statistics
9
Calculation of Variance and Standard Deviation
Variance from Definition Formula: 1361.370 / 34 = 40.04
Variance from Computation Formula: [3940.924 - (296.15 × 296.15)/34] / 34 = 40.04
Standard Deviation = √40.04 = 6.33
Note: data for Taiwan is included
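
A minimal sketch of the two variance formulas, assuming the population form (division by n) that the slide's arithmetic uses. The function names are illustrative. The totals 296.15, 3940.924 and n = 34 are the ones quoted above; the eight-value sample is hypothetical and only shows that the two formulas agree.

```python
import math

def variance_definition(values):
    """Population variance: mean squared deviation from the mean."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n

def variance_computational(sum_x, sum_x2, n):
    """Same quantity from running totals: (sum x^2 - (sum x)^2 / n) / n."""
    return (sum_x2 - sum_x * sum_x / n) / n

# Totals quoted on the slide: sum x = 296.15, sum x^2 = 3940.924, n = 34.
var_slide = variance_computational(296.15, 3940.924, 34)
sd_slide = math.sqrt(var_slide)

# Hypothetical sample, used only to show that both formulas give the same result.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
var_a = variance_definition(sample)
var_b = variance_computational(sum(sample), sum(x * x for x in sample), len(sample))

print(round(var_slide, 2), round(sd_slide, 2), var_a, var_b)
```

The computational form needs only the sum and the sum of squares, which is why it is convenient for hand calculation.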
10
Classic Descriptive Statistics:
Univariate Frequency Distributions
  • A count of the frequency with which values occur
    on a variable

US population, by age group: 50 million
people age 45-59 (data for 2000)
Source: http://www.census.gov/compendia/statab/
US Bureau of the Census, Statistical Abstract of
the US
Often represented by the area under a frequency
curve
This area represents 100% of the data
In ArcGIS, you may obtain frequency counts on a
categorical variable via ArcToolbox > Analysis >
Statistics > Frequency
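
A frequency distribution can be sketched as a count per equal-width class. The helper name `frequency_table`, the class width of 15 years and the list of ages are all assumptions made for illustration; none of them come from the census data.

```python
from collections import Counter

def frequency_table(values, lower, width, n_bins):
    """Count how many values fall in each of n_bins equal-width classes."""
    counts = Counter()
    for v in values:
        idx = int((v - lower) // width)
        idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values into the end classes
        counts[idx] += 1
    return [(lower + i * width, lower + (i + 1) * width, counts[i]) for i in range(n_bins)]

# Hypothetical ages, not census data.
ages = [3, 17, 22, 25, 31, 38, 44, 47, 52, 58, 63, 71]
table = frequency_table(ages, lower=0, width=15, n_bins=5)

# Relative frequencies: together the classes account for 100% of the data.
shares = [count / len(ages) for _, _, count in table]

for low, high, count in table:
    print(f"{low:>2}-{high:<2} {count}")
```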
11
Frequency Distributions for China Province Data
Symmetric Distribution
Height of bar shows frequency. There are 16
provinces with percent urban between 38.4 and
50.8 (mode).
Mode = (38.1 + 50.8)/2 = 44.5
Mean = 48.97
Median = 44.0
In a symmetric distribution, mean = median = mode
Skewed Distribution (right skew)
Height of bar shows frequency. There are 17
provinces with illiteracy between 5.4 and 10.7
(mode).
Mode = (5.4 + 10.7)/2 = 8.05
Mean = 8.7
Median = (7.69 + 7.8)/2 = 7.75
Skewed distribution: mean > median
Tail extends to right. Mean is pulled to the
right.
12
Frequency Distributions for China Province
Data: Variability
Symmetric Distribution
Standard deviation: a measure of the average
distance of each observation from the mean
Standard deviation = 14.8
Skewed Distribution (right skew)
Standard deviation = 6.33. On average, illiteracy
values are closer to the mean. There is less
spread in these data.
Tail extends to right
13
Caution: these values are incorrect!
  • Why?
  • Incorrect to calculate an unweighted mean for percentages
  • Each percentage has a different base population
  • Should calculate weighted mean: the sum of
    (wi × xi) divided by the sum of wi, where xi is
    the percentage and wi the population of each
    province
  • Very common error in GIS because we use
    aggregated data frequently

14
Correct Values!
  • Unweighted mean = 8.7
  • Weighted mean = 7.75
  • Weighted mean is smaller. Why?
  • The largest provinces have lower illiteracy
  • Highest rates are in small provinces

15
Calculation of weighted mean
Unweighted mean = 296.15 / 34 = 8.71
Weighted mean = 10,445,390,141 / 1,347,382,600 = 7.75
Note: we should also calculate a weighted
standard deviation
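
A minimal sketch of the weighted mean. The two totals are the ones quoted on the slide; the function name and the three-province example are hypothetical and are chosen so that the large provinces have low rates, which pulls the weighted mean below the unweighted one.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Totals quoted on the slide: sum(w_i * x_i) and sum(w_i).
slide_weighted_mean = 10_445_390_141 / 1_347_382_600

# Hypothetical three-province example (rates in %, populations in persons).
rates = [5.0, 8.0, 20.0]
populations = [90_000_000, 60_000_000, 5_000_000]
unweighted = sum(rates) / len(rates)
weighted = weighted_mean(rates, populations)

print(round(slide_weighted_mean, 2), unweighted, round(weighted, 2))
```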
16
Centrographic Statistics: Descriptive statistics
for spatial distributions
Mean Center, Centroid, Standard Distance Deviation,
Standard Distance Ellipse, Density Kernel Estimation
(Add Frequency Distributions and mapping; use GeoDA to produce)
17
Centrographic Statistics
  • Measures of Centrality: Mean Center, Centroid,
    Weighted mean center, Center of Minimum Distance
  • Measures of Dispersion: Standard Distance,
    Standard Deviational Ellipse
  • Two dimensional (spatial) equivalents of standard
    descriptive statistics for a single-variable
    (univariate).
  • Used for point data
  • May be used for polygons by first obtaining the
    centroid of each polygon
  • Best used to compare two distributions with each
    other
  • 1990 with 2000
  • males with females

(OU Ch. 4 p. 77-81)
18
Mean Center
  • Simply the mean of the X and the mean of the Y
    coordinates for a set of points
  • Sum of differences between the mean X and all
    other Xs is zero (same for Y)
  • Minimizes sum of squared distances between
    itself and all points

Distant points have a large effect: values for
Xinjiang will have a larger effect
Provides a single point summary measure for the
location of a set of points
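
The mean center and its two properties can be sketched as follows. The coordinates are hypothetical; the last point plays the role of a distant outlier, so dropping it shows how strongly one far-away point moves the center.

```python
def mean_center(points):
    """Return (mean X, mean Y) for a list of (x, y) tuples."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def sum_sq_dist(center, points):
    """Sum of squared distances from center to every point."""
    cx, cy = center
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)

# Hypothetical coordinates; the last point is a distant outlier.
pts = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0), (14.0, 10.0)]
center = mean_center(pts)
center_without_outlier = mean_center(pts[:-1])

# Deviations from the mean X cancel out (the same holds for Y).
x_deviation_sum = sum(x - center[0] for x, _ in pts)

print(center, center_without_outlier, x_deviation_sum)
```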
19
Centroid
  • The equivalent for polygons of the mean center
    for a point distribution
  • The center of gravity or balancing point of a
    polygon
  • If polygon is composed of straight line segments
    between nodes, centroid given by average X,
    average Y of nodes (there is an example later)
  • Calculation sometimes approximated as center of
    bounding box
  • Not good
  • By calculating the centroids for a set of
    polygons can apply Centrographic Statistics to
    polygons
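
A sketch comparing two ways of locating a polygon's center, under the assumption that the polygon is simple (no self-intersections) and given as an ordered ring of nodes. `vertex_average` is the average X, average Y of the nodes described above; `polygon_centroid` is the area-weighted center of gravity from the standard shoelace formula. The two coincide for a triangle but generally differ for irregular shapes, so the function names and the L-shaped test polygon are illustrative only.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula).

    vertices: (x, y) tuples in order around the ring, first node not repeated.
    """
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return (cx / (3.0 * area2), cy / (3.0 * area2))

def vertex_average(vertices):
    """Average X and average Y of the nodes."""
    n = len(vertices)
    return (sum(x for x, _ in vertices) / n, sum(y for _, y in vertices) / n)

# Hypothetical L-shaped polygon with area 7.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
area_centroid = polygon_centroid(l_shape)
node_average = vertex_average(l_shape)

print(area_centroid, node_average)
```

For this L shape both results have x > 1 and y > 1, which places them in the notch outside the polygon itself.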

20
Centroids for Provinces of China
21
Centroids for Provinces of China
22
Warning: Centroid may not be inside its polygon
  • For Gansu Province, China, centroid is within
    neighboring province of Qinghai
  • Problem arises with crescent-shaped polygons

23
Weighted Mean Center
  • Produced by weighting each X and Y coordinate by
    another variable (Wi)
  • Centroids derived from polygons can be weighted
    by any characteristic of the polygon
  • For example, the population of a province
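
A minimal sketch of the weighted mean center, assuming each polygon is represented by its centroid and weighted by a single attribute such as population. The coordinates and weights are hypothetical.

```python
def weighted_mean_center(points, weights):
    """Weighted average of the X and of the Y coordinates."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return (x, y)

# Hypothetical centroids and populations; the third area carries most of the weight.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
population = [1.0, 1.0, 8.0]

wmc = weighted_mean_center(centroids, population)
unweighted = weighted_mean_center(centroids, [1.0, 1.0, 1.0])

print(wmc, unweighted)
```

The weighted result sits close to the third centroid, while the unweighted mean center stays near the middle of the three.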

24
Calculating the centroid of a polygon or the mean
center of a set of points.
(same example data as for area of polygon)
Calculating the weighted mean center. Note how
it is pulled toward the high weight point.
25
Center of Minimum Distance or Median Center
  • Also called point of minimum aggregate travel
  • That point (MD) which minimizes the sum of distances
    between itself and all other points (i)
  • No direct solution. Can only be derived by
    approximation
  • Not a determinate solution. Multiple points may
    meet this criterion (see next bullet).
  • Same as Median center
  • Intersection of two orthogonal lines (at right
    angles to each other), such that each line has
    half of the points to its left and half to its
    right
  • Because the orientation of the axes for the lines
    is arbitrary, multiple points may meet this
    criterion.

Source: Neft, 1966
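
Because the center of minimum distance has no closed-form solution, it is found iteratively. The sketch below uses the Weiszfeld iteration, a common approximation scheme that this presentation does not name, so treat it as one possible approach rather than the method any particular GIS package uses. The function names, tolerance and sample points are assumptions, and a data point that coincides with the current estimate is simply skipped, which is a simplification.

```python
import math

def median_center(points, tol=1e-9, max_iter=1000):
    """Approximate the point minimising the sum of Euclidean distances."""
    # Start from the mean center.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < 1e-12:
                continue  # skip a coincident point to avoid division by zero
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        if denom == 0.0:
            break
        new_x, new_y = num_x / denom, num_y / denom
        moved = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < tol:
            break
    return (x, y)

def total_distance(center, points):
    """Sum of straight-line distances from center to every point."""
    return sum(math.hypot(px - center[0], py - center[1]) for px, py in points)

# Hypothetical points with one distant outlier.
pts = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (10.0, 10.0)]
md = median_center(pts)
mc = (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

print(md, total_distance(md, pts), total_distance(mc, pts))
```

Comparing `total_distance` at the two locations shows the difference in criteria: the mean center minimises squared distances, while this point minimises the distances themselves.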
26
Median and Mean Centers for US Population
Median Center: intersection of a north/south and
an east/west line drawn so half of population
lives above and half below the e/w line, and half
lives to the left and half to the right of the
n/s line
Mean Center: balancing point of a weightless map,
if equal weights placed on it at the residence of
every person on census day.
Source: US Statistical Abstract 2003
27
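Standard Distance Deviation
  • Represents the standard deviation of the
    distance of each point from the mean center
  • Is the two dimensional equivalent of standard
    deviation for a single variable
  • Given by the square root of the sum of the mean
    squared deviation of the X coordinates and the
    mean squared deviation of the Y coordinates about
    the mean center
  • which by Pythagoras reduces to the square root of
    the mean squared distance of the points from the
    mean center
  • ---essentially the average distance of points
    from the center
  • Provides a single unit measure of the spread or
    dispersion of a distribution.
  • We can also calculate a weighted standard
    distance analogous to the weighted mean center:
    each squared distance is multiplied by its weight
    and the total is divided by the sum of the weights.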
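
A minimal sketch of the standard distance, with optional weights. It divides by N (or by the sum of the weights), which matches the "average distance" reading above; some packages use a slightly different divisor, so results may not match a given tool exactly. The function name and sample points are hypothetical.

```python
import math

def standard_distance(points, weights=None):
    """Root mean squared distance of the points from their (weighted) mean center."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    ss = sum(w * ((x - cx) ** 2 + (y - cy) ** 2) for (x, y), w in zip(points, weights))
    return math.sqrt(ss / total)

# Hypothetical points: corners of a 6 x 8 rectangle, each 5 units from the center.
pts = [(0.0, 0.0), (6.0, 0.0), (6.0, 8.0), (0.0, 8.0)]
sdd = standard_distance(pts)
sdd_weighted = standard_distance(pts, weights=[3.0, 1.0, 1.0, 3.0])

print(sdd, round(sdd_weighted, 3))
```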
28
Standard Distance Deviation Example
Circle with radius = SDD = 2.9
29
Standard Deviational Ellipse: concept
  • Standard distance deviation is a good single
    measure of the dispersion of the points around
    the mean center, but it does not capture any
    directional bias
  • It doesn't capture the shape of the distribution.
  • The standard deviational ellipse gives dispersion
    in two dimensions
  • Defined by 3 parameters
  • Angle of rotation
  • Dispersion (spread) along major axis
  • Dispersion (spread) along minor axis
  • The major axis defines the direction of maximum
    spread of the distribution
  • The minor axis is perpendicular to it and defines
    the minimum spread

30
Standard Deviational Ellipse: calculation
  • Formulae for calculation may be found in
    references such as
  • Lee and Wong, pp. 48-49
  • Levine, Chapter 4, pp. 125-128
  • Basic concept is to
  • Find the axis going through maximum dispersion
    (thus derive angle of rotation)
  • Calculate standard deviation of the points along
    this axis (thus derive the length (radius) of
    major axis)
  • Calculate standard deviation of points along the
    axis perpendicular to major axis (thus derive the
    length (radius) of minor axis)
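
The three steps above can be sketched from the variances and covariance of the coordinates. This is an assumption-laden illustration, not the formulae in Lee and Wong or Levine: those references may apply different scaling constants and measure the angle differently. Here the angle runs counter-clockwise from the X axis, division is by N, and the function name is hypothetical.

```python
import math

def deviational_ellipse(points):
    """Return (angle_degrees, sd_major, sd_minor) for a list of (x, y) tuples."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Step 1: direction of maximum dispersion.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # Steps 2 and 3: variances along that axis and along the perpendicular axis.
    half_trace = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    var_major = half_trace + spread
    var_minor = max(half_trace - spread, 0.0)  # guard against tiny negative rounding
    return (math.degrees(theta), math.sqrt(var_major), math.sqrt(var_minor))

# Hypothetical points lying exactly on the line y = x.
diag = [(-2.0, -2.0), (-1.0, -1.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
angle, sd_major, sd_minor = deviational_ellipse(diag)

print(angle, sd_major, sd_minor)
```

For points on a straight line the minor axis collapses to zero, which is the extreme case of directional bias.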

31
Mean Center and Standard Deviational Ellipse:
example
There appears to be no major difference between
the locations of the software and the
telecommunications industries in North Texas.
32
Implementation in ArcGIS
In ArcToolbox
  • To calculate centroid for a set of polygons, with
    ArcGIS
  • ArcToolbox > Data Management Tools > Features > Feature
    to Point (requires ArcInfo)
  • To calculate using GeoDA
  • Tools > Shape > Polygons to Centroids

33
Density Kernel Estimation
  • Commonly used to visually enhance a point
    pattern
  • Is an example of exploratory spatial data
    analysis (ESDA)

Kernel = 10,000
Kernel = 5,000
34
  • SIMPLE Kernel option (see example above)
  • A neighborhood or kernel is defined around each
    grid cell consisting of all grid cells with
    centers within the specified kernel (search)
    radius
  • The number of points that fall within that
    neighborhood is totaled
  • The point total is divided by the area of the
    neighborhood to give the grid cell's value
  • Density KERNEL option
  • a smoothly curved surface is fitted over each
    point
  • The surface value is highest at the location of
    the point, and diminishes with increasing
    distance from the point, reaching zero at the
    kernel distance from the point.
  • Volume under the surface equals 1 (or the
    population value if a population variable is
    used)
  • Uses quadratic kernel function described in
    Silverman (1986, p. 76, equation 4.5).
  • The density at each output grid cell is
    calculated by adding the values of all the kernel
    surfaces where they overlay the grid cell center.
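
Both options can be sketched for a single output cell. Two assumptions are made here. First, the simple option treats the neighborhood as a circle of the search radius rather than a set of grid cells. Second, the kernel surface is taken to be (3 / (pi r^2)) (1 - (d/r)^2)^2 for d < r, which is my reading of the quadratic kernel in Silverman's equation 4.5 scaled to a search radius r; it has unit volume and reaches zero at the kernel distance, as described above. Function names and sample points are hypothetical, and this is not ArcGIS code.

```python
import math

def simple_density(cell, points, radius):
    """Points within radius of the cell center, divided by the circle's area."""
    count = sum(1 for p in points if math.dist(cell, p) <= radius)
    return count / (math.pi * radius ** 2)

def kernel_density(cell, points, radius, populations=None):
    """Sum of the kernel surfaces of all points, evaluated at the cell center."""
    if populations is None:
        populations = [1.0] * len(points)
    total = 0.0
    for p, pop in zip(points, populations):
        u2 = (math.dist(cell, p) / radius) ** 2
        if u2 < 1.0:  # the surface is zero at and beyond the kernel distance
            total += pop * 3.0 / (math.pi * radius ** 2) * (1.0 - u2) ** 2
    return total

# Hypothetical point pattern and one output cell at the origin.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
d_simple = simple_density((0.0, 0.0), points, radius=2.0)
d_kernel = kernel_density((0.0, 0.0), points, radius=2.0)

print(round(d_simple, 4), round(d_kernel, 4))
```

Passing a `populations` list scales each surface so that its volume equals the population value instead of 1.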

35
Implementation in ArcGIS
  • If you specify a population field, the software
    calculates as if there are that number of points
    at that location.
  • The search radius
  • The size of the neighborhood or kernel which is
    successively defined around every cell (simple
    kernel) or each point (density kernel)
  • Output cell size
  • Size of each raster cell
  • Search radius and output cell size are based on
    measurement units of the data (here it is feet)
  • It is good to round them (e.g. to 10,000 and
    1,000)

36
What have we learned today?
  • We have learned about descriptive spatial
    statistics, often called Centrographic Statistics
  • Next time, we will learn about Inferential
    Spatial Statistics

37
Project for you
  • The China data on my web site has population data
    for the provinces of China in 2008
  • Obtain population counts for 2000, 1990 and/or
    any other year
  • Calculate the weighted mean center of China's
    population for each year
  • Be sure to use the same set of geographic units
    each time
  • For example, if you do not have data for Taiwan
    or Hong Kong for one year, omit these geographic
    units for all years

38
Texts
  • O'Sullivan, David and David Unwin, 2010.
    Geographic Information Analysis. Hoboken, NJ:
    John Wiley, 2nd ed.
  • Other Useful Books
  • Mitchell, Andy, 2005. ESRI Guide to GIS Analysis,
    Volume 2: Spatial Measurements and Statistics.
    Redlands, CA: ESRI Press.
  • Allen, David W., 2009. GIS Tutorial II: Spatial
    Analysis Workbook. Redlands, CA: ESRI Press.
  • Wong, David W.S. and Jay Lee, 2005. Statistical
    Analysis of Geographic Information. Hoboken, NJ:
    John Wiley, 2nd ed.
  • Ned Levine and Associates, CrimeStat III Manual,
    Washington, D.C.: National Institute of Justice,
    2004 with later updates.
  • http://www.icpsr.umich.edu/CrimeStat/
  • Density Kernel Estimation
  • Silverman, B.W., 1986. Density Estimation for
    Statistics and Data Analysis. New York: Chapman
    and Hall.