Geographical data mining: key design issues


1
Geographical data mining: key design issues
  • Stan Openshaw, Centre for Computational
    Geography, School of Geography, University of
    Leeds, United Kingdom (GeoComputation '99)

2
Introduction
  • Explosion in geographically referenced data
  • Occasioned by developments in IT, digital
    mapping, remote sensing, and the global diffusion
    of GIS
  • Data mining (DM) tools suited to a GIS context
    are needed in order to exploit the flood of
    geoinformation

3
Introduction
  • Data mining tools analyze very large commercial
    databases in order to model and predict
    customer-buying behaviour.
  • Geographical data mining should go further and
    do more in spatial pattern recognition.
  • The focus here is on developing an explicitly
    Geographical Data Mining (GDM) technology.

4
Why geographical data mining?
  • data uncertainty and errors are often spatially
    structured
  • whole map statistics are seldom helpful
  • relationships are often geographically localised
    rather than global
  • non-linearity is the norm
  • time often interacts with space
  • most GIS data layers are categorical
  • the locational element is important
  • there can be a fair proportion of junk data

5
  • many conventional statistical tools continue to
    be useful and have served quantitative geography
    well for more than three decades
  • however, the view here is that many of these
    conventional, general-purpose statistical
    techniques are not sufficiently focused on, and
    tailored to, the special needs of geographical
    analysis

6
Typical generic data mining functions
  • exploratory data analysis tools
  • linked map and graph displays
  • other visual data mining tools
  • most multivariate statistical methods
  • linear and logistic regression
  • classification
  • decision trees and regression trees
  • association rules detection
  • neural networks
  • memory based reasoning

7
Example
  • If you input some X, Y referenced data into a
    data mining package and expect it to identify
    localised clusters of excess incidence of a
    disease, then you would probably be disappointed.
  • These packages could only treat the X,Y
    co-ordinates as if they were merely two ordinary
    variables (such as age or income) and it is very
    likely that nothing useful would be achieved.

8
Geographical analysis machines
  • GIS-relevant GDM can take two approaches:
    1. a human-based exploratory graphical methods
    approach, and 2. the development of automated
    analysis machines.
  • The first prototype of an automated Geographical
    Analysis Machine (GAM) was developed in the
    mid-1980s (the original GDM).

9
The justification for automation
  • there are many possible locations of potential
    localised clusters (many millions),
  • the search should be locationally unbiased and
    geographically comprehensive, and
  • it should handle spatial data uncertainty.

10
The explosion of GIS databases
  • the need to analyse data speedily, within hours,
  • an increasing need to perform routine analysis
    for monitoring purposes,
  • few skilled spatial analysts,
  • a high imperative for geographical analysis,
  • computer hardware has become faster, and
  • automation makes it easier to develop
    user-friendly interfaces

11
  • the current GAMK version has proved in blind
    testing to be a very powerful cluster detector
  • however, it is only a descriptive tool
  • GAM uses a brute force search
  • A space-time version has recently been developed
    (GAMK-T)
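
The brute-force search mentioned above can be sketched roughly as follows. This is a minimal illustration and not the GAMK code: the function names (`poisson_tail`, `gam_search`), the regular grid of circle centres, the radii, and the significance threshold `alpha` are all assumptions made for the example. It lays circles of several radii over the study area, counts cases and population at risk inside each circle, and keeps the circles whose case count is improbably high under a Poisson model that uses the global incidence rate.

```python
import math

def poisson_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)      # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k        # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def gam_search(areas, radii, step, alpha=0.001):
    """Brute-force circle search (hypothetical helper, not GAMK itself).

    areas: list of (x, y, population, cases), one tuple per small area.
    Returns a list of (cx, cy, radius, cases, expected) for circles
    with a significant excess of cases.
    """
    total_pop = sum(a[2] for a in areas)
    total_cases = sum(a[3] for a in areas)
    rate = total_cases / total_pop  # global incidence rate
    xs = [a[0] for a in areas]
    ys = [a[1] for a in areas]
    nx = int((max(xs) - min(xs)) / step) + 1
    ny = int((max(ys) - min(ys)) / step) + 1
    hits = []
    for r in radii:
        for i in range(nx):
            for j in range(ny):
                cx = min(xs) + i * step
                cy = min(ys) + j * step
                pop = cases = 0
                for x, y, p, c in areas:
                    if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                        pop += p
                        cases += c
                if pop == 0:
                    continue        # empty circle, nothing to test
                expected = rate * pop
                if poisson_tail(cases, expected) < alpha:
                    hits.append((cx, cy, r, cases, expected))
    return hits
```

Every circle is tested independently, so the search is locationally unbiased but expensive: the cost grows with the number of centres, radii, and areas. The sketch also makes no correction for multiple testing, which a real tool would need to address.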

12
Geographical Explanations Machines
  • a traditional map-based GIS analysis tool is that
    of map overlay
  • It has long been found useful to overlay two
    coverages
  • The hope is that some of the polygons created by
    the map overlay process will define "interesting"
    results

13
  • Instead of overlaying complete maps, the data
    inside each of the GAM search circles are
    examined using permutations of coverages
  • This new method is termed the Geographical
    Explanations Machine (GEM).
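
The idea of examining the data inside one search circle against combinations of coverages can be sketched as below. This is an assumed reading of the slide, not the GEM implementation: "permutations of coverages" is interpreted here as every subset of up to `max_order` coverage layers, and the names `gem_circle`, `rank_by_excess`, and the record keys `"pop"` and `"cases"` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def gem_circle(records, layers, max_order=2):
    """Tabulate cases and population for every overlay class in one circle.

    records: list of dicts for the areas inside the circle; each dict has
             "pop", "cases", and one categorical value per layer name.
    layers:  names of the coverage layers to combine.
    Returns {(layer_names, values): (cases, pop)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for order in range(1, max_order + 1):
        for combo in combinations(layers, order):
            for rec in records:
                values = tuple(rec[name] for name in combo)
                totals[(combo, values)][0] += rec["cases"]
                totals[(combo, values)][1] += rec["pop"]
    return {key: tuple(val) for key, val in totals.items()}

def rank_by_excess(table, rate):
    """Sort overlay classes by observed / expected cases, highest first."""
    scored = [(cases / (rate * pop), key)
              for key, (cases, pop) in table.items() if pop > 0]
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

Overlay classes with a high observed-to-expected ratio are candidates for the "interesting" results that map overlay is hoped to reveal, restricted here to the local area of a single circle.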

14
GIS data characteristics
  • Openshaw (1994, 1995) pointed out that GIS
    databases consist of three broad classes of data
    types that form a complex multivariate tri-space,
    and that it is these spaces, and the interactions
    between them, that need to be analysed:
  • 1. Geographical coordinates,
  • 2. Temporal coordinate(s), and
  • 3. Multiple attributes relating to the
    geographical entities.

15
  • note that all three types of space are measured
    in different units that cannot be directly
    related to each other

16
GDM Basic functionality
  • Serve basic spatial data exploratory needs,
  • Have the potential to create new insights, ideas,
    and hypotheses from the analysis,
  • Offer artistic impressions of pattern structure
    to stimulate the imagination,
  • Spot major unusual localised database patterns
    and detect empirical location-based regularities,
    and
  • Be easy to use and meet basic GISability criteria
    (Openshaw and Fischer, 1995).

17
Trispace characteristics of a GIS database
  • Data type: nature of data
  • 1. geography data
  • 2. time data
  • 3. multiple attribute data
  • 4. geography and time data
  • 5. time and multiple attribute data
  • 6. geography and multiple attribute data
  • 7. geography, time, and multiple attribute data
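
The seven data types are simply the non-empty combinations of the three spaces, which a short sketch can enumerate. The names `SPACES`, `data_types`, and `classify` are made up for this illustration, and the order in which the two-space combinations come out differs from the slide's numbering.

```python
from itertools import combinations

SPACES = ("geography", "time", "attributes")

def data_types(spaces=SPACES):
    """List every non-empty combination of the three spaces."""
    return [combo
            for size in range(1, len(spaces) + 1)
            for combo in combinations(spaces, size)]

def classify(has_geography, has_time, has_attributes):
    """Return the spaces a data set occupies, as a tuple."""
    flags = (has_geography, has_time, has_attributes)
    return tuple(s for s, present in zip(SPACES, flags) if present)
```

For example, `classify(True, True, False)` describes a data set of type 4 (geography and time data).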

18
  • most current geographical analysis tools only
    work with data type 1
  • Time Data (data type 2) can be handled via
    well-developed time series methods, but these are
    implicitly space free
  • Attribute only data (type 3) can be handled via
    conventional multivariate statistical (and
    current data mining) methods

19
  • for geography and time data (data type 4),
    space-time clustering measures exist (e.g., the
    Knox and Mantel statistics)
  • space-time models have also been developed
    (e.g., STARIMA models).
  • Time and multiple attribute data (data type 5)
    are even harder to analyse. The best example
    would be multiple time series modelling for a
    fixed area.
  • Methods that can handle data with this level of
    complexity should have little difficulty with all
    the rest.
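
As an illustration of a space-time clustering measure, the Knox statistic counts the pairs of events that are close in both space and time. The sketch below is a minimal version with a Monte Carlo significance test; the function names, the distance thresholds, and the choice to permute the event times are assumptions of this example rather than anything specified in the presentation.

```python
import random

def knox(events, d_space, d_time):
    """Count event pairs that are close in both space and time.

    events: list of (x, y, t) tuples.
    """
    count = 0
    for i in range(len(events)):
        xi, yi, ti = events[i]
        for j in range(i + 1, len(events)):
            xj, yj, tj = events[j]
            close_space = (xi - xj) ** 2 + (yi - yj) ** 2 <= d_space ** 2
            close_time = abs(ti - tj) <= d_time
            if close_space and close_time:
                count += 1
    return count

def knox_p_value(events, d_space, d_time, n_perm=999, seed=0):
    """Monte Carlo p-value: shuffle times across locations and recount."""
    rng = random.Random(seed)
    observed = knox(events, d_space, d_time)
    times = [t for _x, _y, t in events]
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(times)
        shuffled = [(x, y, t) for (x, y, _t), t in zip(events, times)]
        if knox(shuffled, d_space, d_time) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)
```

Note that the statistic works only on coordinates and times. It says nothing about the attributes of the events, which is why it covers data type 4 and not the full tri-space.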

20
Visiting hyperspace
  • 1. Progressively re-map the hyperspace greatly
    reducing its dimensionality until it becomes
    manageable
  • 2. Develop some kind of virtual reality
    hyperspace explorer and data visualiser so that
    you can go into the database's hyperspace
  • 3. Create GDM agents that are able to visit these
    spaces for you and then report back

21
  • Human beings can only easily handle a small
    number of dimensions of multivariate space.
  • there are impossible problems in visualising
    four, five, or fifty or more dimensions
  • create intelligent agents or artificial creatures
    or robots able to search these higher dimensional
    spaces in

22
Handling time-varying population at risk data
  • disease incidence data might be monthly, but the
    best census-based population at risk data are
    updated once every 10 years

23
  • Ignore the problem on the grounds that any
    patterns created by this neglect may still be of
    considerable interest
  • Use an expected value based on historic data or
    long-term average values or estimates of a
    maximum value so as to minimise the risk of false
    patterns being uncovered and
  • Estimate population-at-risk values for each time
    period for which there are data to be analysed,
    perhaps using linear interpolation.
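
The third option, linear interpolation between census counts, might look like the following sketch. The function name `interpolate_population`, the dictionary layout, and the decision to hold the nearest census value constant outside the census range are assumptions made for the example.

```python
def interpolate_population(census, when):
    """Estimate population at risk at time `when` by linear interpolation.

    census: dict mapping census year -> population count.
    when:   year, possibly fractional (e.g. 1986.5 for mid-1986).
    Outside the census range the nearest census value is used
    (an assumption of this sketch, not an extrapolation).
    """
    years = sorted(census)
    if when <= years[0]:
        return float(census[years[0]])
    if when >= years[-1]:
        return float(census[years[-1]])
    for y0, y1 in zip(years, years[1:]):
        if y0 <= when <= y1:
            weight = (when - y0) / (y1 - y0)
            return census[y0] + weight * (census[y1] - census[y0])

def monthly_rate(cases, census, year, month):
    """Incidence rate for one month, using the mid-month population estimate."""
    when = year + (month - 0.5) / 12
    return cases / interpolate_population(census, when)
```

With census counts of 1000 in 1981 and 1200 in 1991, the estimate for 1986 is 1100. Monthly disease counts can then be divided by a population figure for the same month instead of a value that is up to ten years out of date.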

24
Understand the findings
  • 1. A summary view in geography space of any and
    all clustering that has been found,
  • 2. A list of the best search regions together
    with their coordinates in hyperspace, and
  • 3. Hypertext linkages or databases that connect
    (2) to (1) and show the contribution that (2)
    makes to (1).

25
Conclusion
  • The paper has focused on the design issues in
    building a GIS-appropriate Geographical Data
    Mining tool.
  • Preliminary results will be presented at
    GeoComputation '99 using synthetic data with
    varying levels of pattern complexity.
  • The belief is that GDM actually works
    surprisingly well, at least on synthetic data.