CPSC 601.04 - PowerPoint PPT Presentation

About This Presentation
Title:

CPSC 601.04

Description:

Title: CPSC 601.82 Lecture 8 Author: marina Last modified by: marina Created Date: 2/6/2003 4:10:28 AM Document presentation format: On-screen Show (4:3) – PowerPoint PPT presentation

Slides: 38
Provided by: Mar5276

Transcript and Presenter's Notes

Title: CPSC 601.04


1
CPSC 601.04
  • Statistical Analysis in GIS
  • Dr. M. Gavrilova

2
Overview
  • Importance of correct data representation
  • Variance and covariance
  • Autocorrelation
  • Applications to pattern analysis and geometric
    modeling

3
Overuse of color and dimensionality
Four colors, three dimensions, and two plots to
visualize five data points
http://www.math.yorku.ca/SCS/Gallery/
4
Misleading data axis
5
Overcrowded data
Steven Skiena, Stony Brook, NY
http://www.cs.sunysb.edu/~skiena
6
Time increasing over time
http://www.math.yorku.ca/SCS/Gallery/
7
Scatterplot: linear or logarithmic?
Results of a poll on happiness from the World
Values Survey project, covering people throughout
the world, shown in relation to an economic
measure, GNP per capita. Many countries,
particularly those in Latin America, had higher
marks for happiness than their economic situation
would predict. This conclusion rests on the
assumption that happiness should be linearly
related to GNP.
8
GIS goals
  • An organized collection of computer hardware,
    software, geographic data, and personnel designed
    to efficiently capture, store, update,
    manipulate, analyze, and display all forms of
    geographically referenced data.

9
Spatial Analysis
  • Provides
  • an efficient and generally reliable means of
    obtaining knowledge about spatial processes,
  • a way of maximizing our knowledge of spatial
    processes with the minimum of error.

10
Spatial processes
  • Spatial Data
  • location and attribute: P_i (x, y, z)
  • Spatial Stochastic Processes
  • statistics and inference
  • Spatial is special
  • spatial autocorrelation
  • spatial non-stationarity
  • proximity

11
Examples of data analysis
The Space Shuttle Challenger exploded shortly
after take-off in January 1986. Cause: failure of
the O-ring seals used to isolate the fuel supply
from burning gases. Graph from the Report of the
Presidential Commission on the Space Shuttle
Challenger Accident, 1986. NASA staff had
analysed the data on the relation between
temperature and number of O-ring failures (out of
6), but they had excluded observations where no
O-rings failed, believing that they were
uninformative. These were mainly observations
showing no failures at warm temperatures (65-80
°F).
12
Better graph: curve fitting
Apart from the disastrous omission of the
observations with 0 failures, two changes
  1. drawing a smoothed curve to fit the points
  2. removing the background grid, which obscures
     the data
give a graph which shows excessive risks
associated with both high and low temperatures.
13
Logistic regression model
14
Challenger disaster
  • Reanalysis of the O-ring data involved fitting a
    logistic regression model. This provides a
    predicted extrapolation (black curve) of the
    probability of failure to the low (31 °F)
    temperature at the time of the launch and
    confidence bands on that extrapolation (red
    curves). See also Tappin, L. (1994). "Analyzing
    data relating to the Challenger disaster".
    Mathematics Teacher, 87, 423-426
  • There's not much data at low temperatures (the
    confidence band is quite wide), but the predicted
    probability of failure is uncomfortably high.
    Would you take a ride on Challenger when the
    weather is cold?
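
The kind of fit described above can be sketched as follows. This is a minimal illustration, not the published reanalysis: the temperatures and failure counts are made-up numbers chosen only to resemble the situation (they are NOT the Challenger flight record), the names `fit_logistic` and `failure_probability` are hypothetical, and confidence bands are not computed.

```python
import math

def fit_logistic(x, y, trials, iterations=25):
    """Fit P(failure) = 1 / (1 + exp(-(a + b*x))) by Newton's method.

    x: temperature per launch, y: number of failures out of `trials`.
    Returns the intercept a and slope b on the original temperature scale.
    """
    mean_x = sum(x) / len(x)
    xc = [xi - mean_x for xi in x]  # centre x so the Newton steps are well conditioned
    a = b = 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(xc, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            r = yi - trials * p          # observed minus expected failures
            w = trials * p * (1.0 - p)   # binomial variance weight
            g_a += r
            g_b += r * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system (X'WX) * delta = gradient
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a - b * mean_x, b

def failure_probability(a, b, temperature):
    return 1.0 / (1.0 + math.exp(-(a + b * temperature)))

# Illustrative, invented data: failures out of 6 O-rings per launch.
temps = [53, 57, 63, 66, 67, 68, 70, 72, 75, 76, 79, 81]
failures = [2, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]

a, b = fit_logistic(temps, failures, trials=6)
print("slope:", b)
print("P(failure) at 31 F:", failure_probability(a, b, 31))
print("P(failure) at 75 F:", failure_probability(a, b, 75))
```

Note that launches with zero failures are kept in the data; dropping them is exactly the omission the slides warn about. The value at 31 °F is an extrapolation far outside the observed range, so it should be read with a wide margin of uncertainty.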

15
Good examples
The French engineer, Charles Minard (1781-1870),
illustrated the disastrous result of Napoleon's
failed Russian campaign of 1812. The graph shows
the size of the army by the width of the band
across the map of the campaign on its outward and
return legs, with temperature on the retreat
shown on the line graph at the bottom. Many
consider Minard's original the best statistical
graphic ever drawn.
16
Florence Nightingale's Coxcomb diagrams
17
Escaping the 2D
18
Definitions: statistical variables
  • Samples and populations consist of individuals.
  • Values of certain attributes are called
    observations (e.g. age, income).
  • Attributes vary across individuals, and they
    are called variables.
  • Variables are described by distributions and
    their parameters (e.g. Normal, Poisson, ...).
  • A random variable X assumes its value according
    to the outcome of a chance experiment (coin,
    dice).

19
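Definitions: Variance
  • Variance is the sum of squared deviations from
    the mean divided by the number of observations n
    (population variance) or by n-1 (sample variance).

Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Population variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$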
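
A small sketch of the two definitions, written with plain Python so the arithmetic is visible. The function names and the sample data are illustrative only.

```python
def mean(values):
    return sum(values) / len(values)

def population_variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5, squared deviations sum to 32
print(population_variance(data))  # 32 / 8
print(sample_variance(data))      # 32 / 7
```

The sample version is always slightly larger, and the difference shrinks as n grows.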
20
Autocorrelation
  • Spatial autocorrelation is a measure of the
    similarity of objects within an area.
  • Jay Lee and Louis K. Marion, 2001

21
Moran's Index
  • The formula to compute Moran's index is the
    following
  • where n is the number of individual points,
  • A is the area of the bounding polygon, i.e. the
    total area of the map including all points,
  • z_i is the value of the parameter measured for
    point i (attribute).
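
For orientation, here is a sketch of the commonly used textbook form of Moran's index, I = (n / S0) * sum_ij w_ij (z_i - z_mean)(z_j - z_mean) / sum_i (z_i - z_mean)^2, where S0 is the sum of all weights. This may differ from the formula on the original slide (which refers to an area A that the textbook form does not use). The inverse-distance weights w_ij = 1 / d_ij and the function name `morans_i` are assumptions made for the example.

```python
import math

def morans_i(points, values):
    """Textbook Moran's I with assumed inverse-distance weights.

    points: list of (x, y) tuples, values: attribute z_i per point.
    Pairs with i == j are skipped; coincident points (distance 0)
    are not handled in this sketch.
    """
    n = len(points)
    z_mean = sum(values) / n
    dev = [z - z_mean for z in values]
    weighted_cross = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / math.dist(points[i], points[j])
            weight_sum += w
            weighted_cross += w * dev[i] * dev[j]
    return (n / weight_sum) * weighted_cross / sum(d * d for d in dev)

line = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = morans_i(line, [1, 1, 5, 5])    # similar values sit next to each other
alternating = morans_i(line, [1, 5, 1, 5])  # neighbours are dissimilar
print(clustered, alternating)
```

Values are usually judged against the expectation under no spatial autocorrelation, -1/(n-1): the clustered arrangement lies above it and the alternating arrangement below it.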

22
Features
  • w_ij is computed according to the following rule,
    where min(d_ij) is the smallest of all distances
    between all pairs of points
  • In this formula, the distance d_ij is computed
    according to the formulas for the Euclidean,
    supremum or Manhattan metrics. Since d_ii is
    equal to 0, w_ii would become infinite, so cases
    where i = j should be excluded. This results in
    n^2 - n pairs of points.
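
The three metrics and the pair count can be illustrated with a short sketch. The function names are illustrative, and points are assumed to be 2D (x, y) tuples.

```python
from itertools import permutations

def euclidean(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def supremum(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def ordered_pairs(n):
    """All ordered index pairs (i, j) with i != j: n^2 - n of them."""
    return list(permutations(range(n), 2))

p, q = (0, 0), (3, 4)
print(euclidean(p, q), manhattan(p, q), supremum(p, q))
print(len(ordered_pairs(5)))  # 5^2 - 5
```

Skipping i = j up front avoids ever evaluating a weight at distance zero.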

23
Selecting pairs of points
  • The sum over all i, j means that ALL ORDERED PAIRS
    of points are included in the formula, i.e. pair
    ij and pair ji are counted separately.
  • Sometimes, only pairs of sample points within a
    specific distance from each other are considered.
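
Restricting the sum to nearby points can be sketched as a simple filter on the ordered pairs. The function name `pairs_within` and the threshold value are illustrative, and Euclidean distance is assumed.

```python
import math

def pairs_within(points, max_distance):
    """Ordered pairs (i, j), i != j, whose Euclidean distance is at most max_distance."""
    n = len(points)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= max_distance
    ]

points = [(0, 0), (1, 0), (5, 0)]
print(pairs_within(points, 2.0))  # only points 0 and 1 are close enough
```

Because the pairs are ordered, each qualifying pair appears twice, once as (i, j) and once as (j, i).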

24
Application to pattern analysis
  • Example: autocorrelation on a grid.
  • Sample points are combined in one cell. The size
    and location of the cell define the
    autocorrelation parameters.
  • Consider all pairs of GRID CELLS, where XC and YC
    now denote the coordinates of the center of each
    grid cell and the attribute z for each cell is
    the sum of the attributes of all points that
    belong to this cell.
  • Result: insight into pattern analysis and
    correlation can be obtained.
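
The aggregation step can be sketched as follows, assuming a regular square grid with a given cell size and origin. The function name `aggregate_to_grid` and its parameters are illustrative, not taken from the lecture.

```python
import math
from collections import defaultdict

def aggregate_to_grid(points, values, cell_size, origin=(0.0, 0.0)):
    """Sum point attributes per square grid cell.

    Returns a list of (xc, yc, z) tuples: the cell-centre coordinates
    and the summed attribute of all points falling in that cell.
    Empty cells are not reported.
    """
    sums = defaultdict(float)
    for (x, y), z in zip(points, values):
        col = math.floor((x - origin[0]) / cell_size)
        row = math.floor((y - origin[1]) / cell_size)
        sums[(col, row)] += z
    cells = []
    for (col, row), total in sorted(sums.items()):
        xc = origin[0] + (col + 0.5) * cell_size
        yc = origin[1] + (row + 0.5) * cell_size
        cells.append((xc, yc, total))
    return cells

points = [(0.2, 0.3), (0.8, 0.9), (1.5, 0.5)]
values = [1, 2, 4]
print(aggregate_to_grid(points, values, cell_size=1.0))
```

The resulting cell centres and sums can then be fed into an autocorrelation measure in place of the raw points. Changing `cell_size` or `origin` changes which points share a cell, which is why the result depends on grid size and placement.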

25
Case study 1: Pattern Analysis
  • Analysis of instances of patients undergoing
    cardiac catheterization, and location of those
    instances, i.e. city blocks.
  • Primary question: is the spatial variation of
    heart disease a random or non-random pattern?
  • Secondary question: relationship between disease
    occurrence and social and demographic factors
    (Spatial Regression).

26
Set up
  • Analysis results are affected by grid size
  • prone to subjective choices
  • constrained by spatial resolution of data
  • Solving the problem by
  • using a non-arbitrary grid(s)
  • implementing a guided selection of the
    square unit area or grid size

27
City blocks in Calgary
28
Methodology
  • Definition of a city-block grid based on the
    main division in the city, i.e. a square grid
    centered on the intersection of Center Street
    and Center Avenue, which serve as the main axes
    of the resulting geometric plan.
  • Grid regularity decreases as distance increases
    from its center.
  • L_p norms provide the flexibility to adjust grid
    size and shape accordingly.
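
A brief sketch of how the order p of an L_p norm changes the shape of a neighbourhood: p = 1 gives the Manhattan (diamond-shaped) distance, p = 2 the Euclidean (circular) distance, and p = infinity the supremum (square) distance. The function name `lp_distance` and the test point are illustrative.

```python
def lp_distance(p, q, order):
    """L_p (Minkowski) distance between two points; order may be float('inf')."""
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if order == float("inf"):
        return max(diffs)
    return sum(d ** order for d in diffs) ** (1.0 / order)

center, point, radius = (0.0, 0.0), (0.8, 0.8), 1.0
for order in (1, 2, float("inf")):
    d = lp_distance(center, point, order)
    print(order, d, "inside" if d <= radius else "outside")
```

The same point lies outside the unit neighbourhood for p = 1 and p = 2 but inside it for p = infinity, so the choice of norm decides which locations are grouped together.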

29
Methodology
  • Application of varying L_p norms
  • Varying spatial weights for spatial
    autocorrelation
  • Autocorrelation analysis at varying scales
    (CDA, community)
  • Data: 2001/1996 census

30
Experiments
31
Observations
  • Sensitivity of Spatial Autocorrelation to
  • L_p norm
  • spatial weight
  • Proposed method useful in determining
  • best distance
  • best spatial weight
  • In the context of multivariate spatial regression,
  • best = lowest variance

32
Results
  • The Calgary Journal, regional publication,
    "Researchers link heart disease to urban
    lifestyles", on SPARCS activity profile, Oct. 26 -
    Nov. 8, 2005
  • High risk of heart attack: male, high education,
    married

33
Case study 2: Oil spill discharge
34
Summary statistics
                  Cells         Min.  Max.  Mean   St. dev.  Sum     Skew  Kurt.
Oil spill counts  44 (2,741)    0     3     0.02   0.162     53      9.85  113.6
Flight counts     2151 (2,741)  0     309   13.75  27.12     37,681  4.21  25.6
The mean and the standard deviation summarize the
central tendency and statistical dispersion of
the data, while skewness (asymmetry) and kurtosis
(from the Greek for "bulging") indicate highly
skewed distributions or lack of normality in the
data.
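
Skewness and kurtosis can be computed from central moments, as in the sketch below. It uses the plain moment definitions (skewness = m3 / m2^1.5, kurtosis = m4 / m2^2, which equals 3 for a normal distribution). Whether the table above reports these or bias-corrected or excess variants is not stated, so this choice is an assumption. The count data are invented to mimic a mostly-zero variable.

```python
def central_moment(values, k):
    m = sum(values) / len(values)
    return sum((v - m) ** k for v in values) / len(values)

def skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Moment-based (non-excess) kurtosis: 3 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
mostly_zero = [0, 0, 0, 0, 0, 0, 0, 0, 0, 3]  # invented counts, one non-zero cell
print(skewness(symmetric), kurtosis(symmetric))
print(skewness(mostly_zero), kurtosis(mostly_zero))
```

Count data dominated by zeros, like the oil spill counts, produce large positive skewness and high kurtosis, which is the signature of non-normality noted above.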
35
Data clustering
36
Statistical analysis
  • Our exploratory analyses indicate that there is a
    positive spatial autocorrelation within datasets
    for all variables.
  • An initial overview of the statistical
    distribution and normality of each of the
    variables selected for this study indicated
    absence of normality in the data.

Exploratory Spatial Analysis of Illegal Oil
Discharges Detected off Canada's Pacific
Coast. Norma Serra-Sogas, Patrick O'Hara,
Rosaline Canessa, Stefania Bertazzon and Marina
Gavrilova
37
Lecture summary
  • Proper statistical analysis is important
  • Variance and autocorrelation are two important
    vehicles for data analysis
  • Combining these measures with various metrics,
    hierarchical structures, grids, attributes and
    also data filtering/visualization methods is a
    direction of current research.