Noriel Christopher C. Tiglao, Dr. Eng

Description:

Time of the day that each racer finished ... The finishing places of each racer, i.e. 1st place, 2nd place, 3rd place. Ordinal ...

Slides: 104

Provided by: norielchri

Transcript and Presenter's Notes

1
Spatial Data Representation and Analysis
Module 3

Noriel Christopher C. Tiglao, Dr. Eng
24 January to 4 February 2005
Statistical Research and Training Center (SRTC)
Quezon City, Metro Manila

2
Presentation Outline

Introduction
Spatial Data and Spatial Relationships
Sampling Reality
Scales of Measurement
Data Sources and Errors
Data Abstraction
Spatial Data Structures

3
Introduction

The world is infinitely complex
The contents of a spatial database represent a
particular view of the world
User sees the real world through the medium of
the database
The measurements and samples contained in the
database must present as complete and accurate a
view of the world as possible

4
Introduction (contd.)

The contents of the database must be relevant in
terms of
themes and characteristics captured
the time period covered
the study area

5
Representing Reality

A database consists of digital representations of
discrete objects
The features shown on a map, e.g. lakes,
benchmarks, contours can be thought of as
discrete objects
The contents of a map can be captured in a
database by turning map features into database
objects

6
Representing Reality (contd.)

Many of the features shown on a map are
fictitious and do not exist in the real world
contours do not really exist, but houses and
lakes are real objects
The contents of a spatial database include
digital versions of real objects, e.g. houses
digital versions of artificial map features, e.g.
contours
artificial objects created for the purposes of
the database, e.g. pixels

7
Data

Data are facts
some facts are more important to us than others.
Some facts are important enough to warrant
keeping track of them in a formal, organized way
"Data" is a broad concept that can include things
such as pictures (binary images), programs, and
rules
Informally, data are the things you want to store
in a database

8
Spatial vs. Non-spatial Data

Spatial data includes location, shape, size, and
orientation
Spatial data includes spatial relationships
Non-spatial data (also called attribute or
characteristic data) is that information which is
independent of all geometric considerations

9
Spatial vs. Non-spatial Data (contd.)

It is possible to ignore the distinction between
spatial and non-spatial data. However, there are
fundamental differences between them
spatial data are generally multi-dimensional and
autocorrelated.
non-spatial data are generally one-dimensional
and independent

10
Spatial vs. Non-spatial Data (contd.)

These distinctions put spatial and non-spatial
data into different philosophical camps with
far-reaching implications for conceptual,
processing, and storage issues.
For example, sorting is perhaps the most common
and important non-spatial data processing
function that is performed
It is not even obvious how to sort locational
data so that all points end up near their
nearest neighbors
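
One common workaround, which the slides do not name, is to sort points along a space-filling curve such as the Z-order (Morton) curve. The sketch below assumes non-negative integer grid coordinates, and the function names are illustrative. It keeps many neighbors close together in the sorted list but cannot do so for all of them, which is the difficulty described above.

```python
def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into a single Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return key


def sort_points_z_order(points):
    """Return the points sorted along the Z-order curve."""
    return sorted(points, key=lambda p: morton_key(p[0], p[1]))


# Hypothetical grid locations.
points = [(3, 3), (0, 0), (1, 0), (0, 1), (1, 1), (2, 2)]
ordered = sort_points_z_order(points)
```

A plain sort on x and then y would place (0, 1) far from (1, 1) in a large data set; the interleaved key is one way to reduce, but not remove, that effect.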

11
Spatial Relationships

Describe the association among different features
in space
are visually obvious when data are presented in
the graphical form
however, it is difficult to build spatial
relationships into the information organization
and data structure of a database

12
Spatial Relationships (contd.)

Difficulty in capturing spatial relationships in
a database
there are numerous types of spatial relationships
possible among features
recording spatial relationships explicitly
demands considerable storage space
computing spatial relationships on-the-fly slows
down data processing particularly if relationship
information is required frequently

13
Point-Line-Area Relationship Matrix
14
Spatial Relationships (contd.)

Topological
describes the property of adjacency, connectivity
and containment of contiguous features
Proximal
describes the property of closeness of
non-contiguous features

15
(No Transcript)
16
Spatial Relationships (contd.)

Spatial relationships are very important in
geographical data processing and modeling
the objective of information organization and
data structure is to find a way that will handle
spatial relationships with the minimum storage
and computation requirements

17
Spatial Data

Phenomena in the real world can be observed in
three modes: spatial, temporal and thematic
the spatial mode deals with variation from place
to place
the temporal mode deals with variation from time
to time (one slice to another)
the thematic mode deals with variation from one
characteristic to another (one layer to another)

18
Spatial Data (contd.)

All measurable or describable properties of the
world can be considered to fall into one of these
modes - place, time and theme
An exhaustive description of all three modes is
not possible

19
Spatial Data (contd.)

When observing real-world phenomena we usually
hold one mode fixed, vary one in a controlled
manner, and measure the third (Sinton, 1978)
e.g. using a census of population we could fix a
time such as 1990, control for location using
census tracts and measure a theme such as the
percentage of persons owning automobiles

20
Spatial Data (contd.)

Holding geography fixed and varying time gives
longitudinal data
Holding time fixed and varying geography gives
cross-sectional data
The modes of information stored in a database
influence the types of problem solving that can
be accomplished
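
The fix, control and measure idea can be sketched with a small set of records. The observations, tract names and percentages below are made up for illustration; only the pattern of holding one mode fixed comes from the slides.

```python
# Hypothetical observations: (year, census_tract, pct_owning_automobiles).
observations = [
    (1980, "tract_A", 41.0),
    (1980, "tract_B", 55.0),
    (1990, "tract_A", 48.0),
    (1990, "tract_B", 61.0),
]


def cross_section(obs, year):
    """Hold time fixed, vary geography: theme value per tract for one year."""
    return {tract: value for (y, tract, value) in obs if y == year}


def longitudinal(obs, tract):
    """Hold geography fixed, vary time: theme value per year for one tract."""
    return {y: value for (y, t, value) in obs if t == tract}


by_tract_1990 = cross_section(observations, 1990)
tract_a_over_time = longitudinal(observations, "tract_A")
```

Both views come from the same records; which one can be built depends on which modes the database actually stores.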

21
Location

The spatial mode of information is generally
called location

22
Attributes

Attributes capture the thematic mode by defining
different characteristics of objects
A table showing the attributes of objects is
called an attribute table
each object corresponds to a row of the table
each characteristic or theme corresponds to a
column of the table
thus the table shows the thematic and some of the
spatial modes

23
Time

The temporal mode can be captured in several ways
by specifying the interval of time over which an
object exists
by capturing information at certain points in
time
by specifying the rates of movement of objects

24
Time (contd.)

Depending on how the temporal mode is captured,
it may be included in a single attribute table,
or be represented by series of attribute tables
on the same objects through time

25
Sampling Reality

Numerical values may be defined with respect to
nominal, ordinal, interval, or ratio scales of
measurement
It is important to recognize the scales of
measurement used in GIS data as this determines
the kinds of mathematical operations that can be
performed on the data

26
Sampling Reality (contd.)
27
Marathon Example
28
Sampling Reality (contd.)

Distinctions, though important, are not always
clearly defined
Is elevation interval or ratio? If the local base
level is 750 feet, is a mountain at 2000 feet
twice as high as one at 1000 feet when viewed
from the valley?
Many types of geographical data used in GIS
applications are nominal or ordinal
Values establish the order of classes, or their
distinct identity, but rarely intervals or ratios

29
Sampling Reality (contd.)

Thus you cannot
multiply soil type 2 by soil type 3 and get soil
type 6
divide urban area by the rank of a city to get a
meaningful number
subtract suitability class 1 from suitability
class 4 to get 3 of anything
However, you can
divide population by area (both ratio scales) and
get population density
subtract elevation at point a from elevation at
point b and get difference of elevation
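
The rules above can be expressed as a small check of which operations each scale supports. This is a sketch only: the operation names and the `is_meaningful` helper are illustrative, and the population and area figures are made up.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Scales of measurement, ordered from least to most permissive."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Minimum scale required for each kind of operation.
REQUIRED_SCALE = {
    "equals": Scale.NOMINAL,     # same class or not
    "compare": Scale.ORDINAL,    # greater than / less than
    "subtract": Scale.INTERVAL,  # differences are meaningful
    "multiply": Scale.RATIO,     # products and ratios need a true zero
    "divide": Scale.RATIO,
}


def is_meaningful(operation: str, *scales: Scale) -> bool:
    """True if every operand's scale supports the operation."""
    return all(s >= REQUIRED_SCALE[operation] for s in scales)


# Soil type codes are nominal, so multiplying them is not meaningful.
soil_ok = is_meaningful("multiply", Scale.NOMINAL, Scale.NOMINAL)
# Population and area are both ratio, so density is meaningful.
density_ok = is_meaningful("divide", Scale.RATIO, Scale.RATIO)
population, area_sq_km = 12000, 4.0      # made-up figures
density = population / area_sq_km        # persons per square km
```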

30
Multiple Representations

A data model is essential to represent
geographical data in a digital database
There are many different data models
The same phenomena may be represented in
different ways, at different scales and with
different levels of accuracy
Thus there may be multiple representations of the
same geographical phenomena

31
Multiple Representations (contd.)

It is difficult to convert from one
representation to another
e.g. from a small scale (1:250,000) to a large
scale (1:10,000)
Thus it is common to find databases with multiple
representations of the same phenomenon
this is wasteful, but techniques to avoid it are
poorly developed

32
Primary Data Sources

Some of the data in a spatial database may have
been measured directly
e.g. by field sampling or remote sensing
The density of sampling determines the resolution
of the data
e.g. samples taken every hour will capture
hour-to-hour variation, but miss shorter-term
variation
e.g. samples taken every 1 km will miss any
variation at resolutions less than 1 km
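
A small sketch of how sampling density limits resolution, assuming a hypothetical transect with one reading every 0.25 km and a narrow peak at 1.5 km. All values are made up.

```python
# Hypothetical transect: positions 0.0 .. 3.0 km, one reading every 0.25 km.
readings = {i * 0.25: 10.0 for i in range(13)}
readings[1.5] = 25.0   # narrow feature between the 1 km marks


def sample(readings, spacing_km):
    """Keep only the readings that fall on multiples of the sampling interval."""
    return {pos: val for pos, val in readings.items()
            if (pos / spacing_km).is_integer()}


coarse = sample(readings, 1.0)   # one sample every 1 km
fine = sample(readings, 0.5)     # one sample every 0.5 km
```

The 1 km sample never sees the peak at 1.5 km, while the 0.5 km sample does.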

33
Primary Data Sources (contd.)

A sample is designed to capture the variation
present in a larger universe
e.g. a sample of places should capture the
variation present at all possible places
e.g. a sample of times will be designed to
capture variation at all possible times

34
Sampling Approaches

In a random sample, every place or time is
equally likely to be chosen
Systematic samples are chosen according to a
rule, e.g. every 1 km, but the rule is expected
to create no bias in the results of analysis,
i.e. the results would have been similar if a
truly random sample had been taken

35
Sampling Approaches (contd.)

In a stratified sample, the researcher knows for
some reason that the universe contains
significantly different sub-populations, and
samples within each sub-population in order to
achieve adequate representation of each
e.g. we may know that the topography is more
rugged in one part of the area, and sample more
densely there to ensure adequate representation
if a representative sample of the entire universe
is required, then the subsamples in each
subpopulation will have to be weighted
appropriately
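
The weighting step can be sketched as follows, assuming the share of the study area covered by each sub-population is known. The elevations and area shares are made up for illustration.

```python
def stratified_mean(strata):
    """Estimate the overall mean from per-stratum samples.

    Each stratum is (area_share, sample_values); shares should sum to 1.
    Each stratum mean is weighted by the share of the universe it
    represents, not by how many samples were taken there.
    """
    return sum(share * sum(vals) / len(vals) for share, vals in strata)


# Hypothetical elevations (m): rugged terrain covers 20% of the area but is
# sampled more densely than the flat 80%.
rugged = (0.2, [900.0, 1100.0, 1000.0, 1200.0, 800.0, 1000.0])
flat = (0.8, [100.0, 120.0])

weighted = stratified_mean([rugged, flat])
all_vals = rugged[1] + flat[1]
unweighted = sum(all_vals) / len(all_vals)
```

Here the weighted estimate is about 288 m, while the unweighted mean of all eight samples is 777.5 m, because the densely sampled rugged area dominates the raw average.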

36
Secondary Data Sources

Some data may have been obtained from existing
maps, tables, or other databases
To be useful, it is important to obtain
information in addition to the data themselves
information on the procedures used to collect and
compile the data
information on coding schemes, accuracy of
instruments

37
Secondary Data Sources (contd.)

Unfortunately such information is often not
available
a user of a spatial database may not know how the
data were captured and processed prior to input
this often leads to misinterpretation, false
expectations about accuracy

38
Data Standards

Standards may be set to assure uniformity
within a single data set
across data sets
e.g. uniform information about timber types
throughout the database allows better fire
fighting methods to be used, or better control of
insect infestations
Data capture should be undertaken in standardized
ways that will assure the widest possible use of
the information

39
Sharing Data

It is not uncommon for as many as three agencies
to create databases with, ostensibly, the same
information
e.g. a planning agency may map landuse, including
a forested class
e.g. the state department of forestry also maps
forests
e.g. the wildlife division of the department of
conservation maps habitat, which includes fields
and forest

40
Sharing Data (contd.)

Each may digitize their forest class onto
different GIS systems, using different protocols,
and with different definitions for the classes of
forest cover
this is a waste of time and money
Sharing information gives it added value
Sharing basic formats with other information
providers, such as a department of
transportation, might make marketing the database
more profitable

41
Errors and Accuracy

There is a nearly universal tendency to lose
sight of errors once the data are in digital form
Errors
are implanted in databases because of errors in
the original sources (source errors)
are added during data capture and storage
(processing errors)
occur when data are extracted from the computer
arise when the various layers of data are
combined in an analytical exercise

42
Errors in Sources

Are extremely common in non-mapped source data,
such as locations of wells, or lot descriptions
Can be caused by doing inventory work from aerial
photography and misinterpreting images
Often occur because base maps are relied on too
heavily

43
Classification Errors

Are common when tabular data are rendered in map
form
Simple typing errors may be invisible until
presented graphically
More complex classification errors may be due to
the sampling strategies that produced the
original data

44
Data Capture Errors

Manual data input induces another set of errors
Eye-hand coordination varies from operator to
operator and from time to time
Data input is a tedious task - it is difficult to
maintain quality over long periods of time

45
Accuracy Standards

Many agencies may not have established accuracy
standards for geographical data
these are more often concerned with accuracy of
locations of objects than with accuracy of
attributes
Higher accuracy requires better source materials
is the added cost justified by the objectives of
the study?
Accuracy standards should be determined by
considering both the value of information and the
cost of collection

46
Data Abstraction

Capturing the essential pieces of information to
describe the spatial phenomenon
Based on a conceptual model of reality
Expressed in data models
Realized by building up of data structures (i.e.
internal representation of spatial data) using
database models

47
Reality, Conceptual Model and Database
Database (Cyber world)
Reality
Conceptual model
48
Data Abstraction Example
Locating objects
Recording attribute info.
Data (1.0, 20.3, 9.0, 12.8, 15.0, 10000.00)
49
Data Abstraction Example
Data (1.0, 20.3, 9.0, 12.8, 15.0, 10000.00)
What do you mean by these data?
Abstraction rule: describing a house for rent
(1.0, 20.3)
(9.0, 12.8)
Z = 15.0
Monthly rent: PhP 10,000.00
Conceptual model of describing a house for rent:
crucial information for data users
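
One way to make the abstraction rule explicit is to attach names to the six numbers. The field names below are assumptions: the slide shows two coordinate pairs, a Z value and a monthly rent, but does not say what each pair or Z stands for.

```python
from dataclasses import dataclass


@dataclass
class HouseForRent:
    """One possible abstraction rule for the six-number record on the slide."""
    x1: float                # first coordinate pair (x1, y1)
    y1: float
    x2: float                # second coordinate pair (x2, y2)
    y2: float
    z: float                 # Z value; its meaning (e.g. a height) is assumed
    monthly_rent_php: float  # monthly rent in Philippine pesos


raw = (1.0, 20.3, 9.0, 12.8, 15.0, 10000.00)
house = HouseForRent(*raw)
```

Without the rule, `raw` is only a list of numbers; with it, a data user can tell location from attribute information.
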
50
Levels of Data Abstraction
51
Data Model vs. Database Model

Data Model
Vector methods (feature-based)
Raster methods (field-based)
Database Model
Software implementation of data models
Network, hierarchical and object-oriented
databases

52
Vector Data Model

Method of representing geographic features by the
basic graphical elements
Points
Lines (arcs)
Polygons (area)
They can also be used to construct complex
features

53
Basic Graphical Elements
54
Vector Data Model (contd.)

Related vector data are always organized by
themes, which are also referred to as layers or
coverages
examples of themes: geodetic control, base map,
soil, vegetation cover, land use, transportation,
drainage and hydrology, political boundaries,
land parcel and others

55
Vector Data Model (contd.)

For themes covering a very large geographic area,
the data are always divided into tiles so that
they can be managed more easily
a tile is the digital equivalent of an individual
map in a map series
a tile is uniquely identified by a file name

56
Vector Data Model (contd.)

A collection of themes of vector data covering
the same geographic area and serving the common
needs of a multitude of users constitutes the
spatial component of a geographical database
Graphical data captured by imaging devices in
remote sensing and digital cartography (such as
multi-spectral scanners, digital cameras and
image scanners) are made up of a matrix of
picture elements (pixels) of very fine resolution

57
Raster Data Model

Method of representing geographic features by
pixels
A raster pixel is usually a square grid cell but
there are several variants such as
triangles and hexagons

58
Raster Data Model (contd.)

A raster pixel represents the generalized
characteristics of an area of specific size on or
near the surface of the Earth
the actual ground size depicted by a pixel is
dependent on the resolution of the data, which
may range from smaller than a square meter to
several square kilometers
Raster data are organized by themes, which are
also referred to as layers

59
Raster Data Model (contd.)

Raster data covering a large geographic area are
organized by scenes (for remote sensing images)
The raster method is based on the concept that
geographic features are represented as surfaces,
regions or segments

60
Vector Data Structure

Spaghetti
a direct line-for-line, unstructured translation
of the paper map
has very limited practical use
it is usually an interim data structure for map
digitizing
Hierarchical
a vector data structure developed to facilitate
data retrieval by separately storing points,
lines and areas in a logically hierarchical manner

61
Spaghetti Data Model and Data Structure
62
Hierarchical Data Model and Data Structure
63
Vector Data Structure (contd.)

Topological
a vector data structure that captures spatial
relationship by explicitly storing adjacency
information
the basic logical feature for line and area
coverage is a straight line segment
each individual line segment is defined by the
coordinates of its end points called nodes
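
A minimal sketch of a topological structure, assuming each line segment stores its two end nodes plus the polygons on its left and right. The node coordinates, polygon labels and the use of 0 for the outside area are illustrative choices, not taken from the slides.

```python
from collections import defaultdict

# Hypothetical nodes: node id -> (x, y). A 4 x 3 rectangle split at x = 2.
nodes = {1: (0, 0), 2: (4, 0), 3: (4, 3), 4: (0, 3), 5: (2, 0), 6: (2, 3)}

# Each segment: (from_node, to_node, left_polygon, right_polygon).
# Polygon 0 stands for the area outside the map.
segments = [
    (1, 5, "A", 0), (5, 2, "B", 0), (2, 3, "B", 0),
    (3, 6, "B", 0), (6, 4, "A", 0), (4, 1, "A", 0),
    (5, 6, "A", "B"),   # shared boundary between A and B
]


def adjacency(segments):
    """Derive polygon adjacency from the stored left/right polygon ids."""
    adj = defaultdict(set)
    for _, _, left, right in segments:
        if left != right:
            adj[left].add(right)
            adj[right].add(left)
    return adj


adj = adjacency(segments)
```

Because adjacency is stored with each segment, finding the neighbors of polygon A needs no coordinate geometry at all.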

64
Topological Data Model and Data Structure
65
Raster Data Structure

Space is subdivided into regular grids of square
grid cells or other forms of polygonal meshes
known as picture elements (pixels)
the location of each cell is defined by its row
and column numbers
the area that each cell represents defines the
spatial resolution of the data
the position of a geographic feature is only
recorded to the nearest pixel

66
Raster Data Structure (contd.)

the value stored for each cell indicates the
types of the object, phenomenon or condition that
is found in that particular location
different types of values can be coded: integers,
real numbers and alphabetic characters
integer values often act as code numbers, which
are referenced to names in an associated table
(called the look-up table) or legend
different attributes at the same cell location
are stored as separate themes or layers
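
The points above can be sketched with a small grid of integer codes and a look-up table. The land-cover classes, cell size and grid origin are assumptions made for this example.

```python
# Hypothetical 3 x 4 land-cover layer; each cell holds an integer code.
land_cover = [
    [1, 1, 2, 2],
    [1, 3, 3, 2],
    [3, 3, 3, 2],
]
# Look-up table linking code numbers to class names.
lookup = {1: "forest", 2: "water", 3: "urban"}

CELL_SIZE = 10.0                  # ground units per cell side (assumed)
ORIGIN_X, ORIGIN_Y = 0.0, 30.0    # upper-left corner of the grid (assumed)


def cell_at(x, y):
    """Map a ground coordinate to (row, column).

    Position is only kept to the nearest cell, as in any raster layer.
    """
    col = int((x - ORIGIN_X) // CELL_SIZE)
    row = int((ORIGIN_Y - y) // CELL_SIZE)
    return row, col


def class_at(x, y):
    """Return the class name stored for the cell containing (x, y)."""
    row, col = cell_at(x, y)
    return lookup[land_cover[row][col]]
```

For example, `class_at(25.0, 28.0)` falls in row 0, column 2. A second attribute, such as elevation, would be held in a separate grid of the same shape.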

67
Characteristics of Raster Data Structure
68
Raster Data Structure (contd.)

There are several variants to the regular grid
raster data structure, including
irregular tessellation (e.g. triangulated
irregular network (TIN))
hierarchical tessellation (e.g. quad tree) and
scan-line

69
Representing Fields

there are many ways of representing fields
not all are implemented in GIS
different terminologies exist in different
disciplines
Six major representations
Regular cells
Rectangular grid of points
Irregularly spaced points
Digitized contours
Polygons
Triangulated irregular networks (TINs)

70
Regular Cells

Value in each cell is an average, total, or some
other aggregate property of the field within the
cell
the representation defines a value everywhere, so
is complete
however, all within-cell variation is lost
if necessary, it must be reconstructed by some
method of intelligent guesswork
e.g. remote sensing data and other kinds of
digital imagery

71
(No Transcript)
72
Rectangular grid of points

e.g. measurements of land surface elevation in a
digital elevation model (DEM)
spacing of measurements is critical to accuracy
of representation
all variation between sample points is lost
elevations at other points must be estimated by
some method of intelligent guesswork (the
representation is incomplete)
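
One common form of that guesswork is bilinear interpolation between the four surrounding grid points. The slides do not prescribe a method, so the sketch below is only one option. It assumes a regular grid starting at the origin and a query point inside the grid; the elevations are made up.

```python
def bilinear(dem, spacing, x, y):
    """Estimate elevation at (x, y) from the four surrounding grid points.

    dem[row][col] is the elevation at x = col * spacing, y = row * spacing.
    """
    col, row = int(x // spacing), int(y // spacing)
    # Clamp so points on the last row or column still have four neighbors.
    col = min(col, len(dem[0]) - 2)
    row = min(row, len(dem) - 2)
    tx = x / spacing - col     # fractional position inside the cell, 0..1
    ty = y / spacing - row
    z00, z10 = dem[row][col], dem[row][col + 1]
    z01, z11 = dem[row + 1][col], dem[row + 1][col + 1]
    near = z00 * (1 - tx) + z10 * tx    # along the row at the lower y
    far = z01 * (1 - tx) + z11 * tx     # along the row at the higher y
    return near * (1 - ty) + far * ty


# Hypothetical 3 x 3 DEM with 30 m spacing.
dem = [
    [100.0, 110.0, 120.0],
    [105.0, 115.0, 125.0],
    [110.0, 120.0, 130.0],
]
z = bilinear(dem, 30.0, 45.0, 15.0)
```

Any real variation between the grid points, such as a narrow ridge, is still lost; the interpolation only smooths between known values.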

73
(No Transcript)
74
Irregularly spaced points

The field's value is defined at a set of sample
points scattered in the frame
values of the field at other points must be
interpolated (the representation is incomplete)
e.g. weather data, available at scattered weather
stations
accuracy depends on the density of points
it is not clear what measure best defines
accuracy - density per unit area, minimum
distance between sample points, maximum distance
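
Inverse distance weighting (IDW) is one widely used way to interpolate from scattered points. The slides do not name it, so it is shown here only as an example. The station locations and temperatures are made up.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples is a list of (sx, sy, value) tuples.
    """
    num = den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value          # exactly on a sample point
        w = 1.0 / d ** power      # nearer samples get larger weights
        num += w * value
        den += w
    return num / den


# Hypothetical temperature readings at three weather stations.
stations = [(0.0, 0.0, 20.0), (10.0, 0.0, 30.0), (0.0, 10.0, 25.0)]
estimate = idw(stations, 5.0, 0.0)
```

The estimate is only as good as the density of stations around the query point, which is the accuracy concern raised above.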

75
(No Transcript)
76
Digitized contours

The field is represented as a set of isolines,
each connecting points of constant value
representation is incomplete
The scale of measurement of the variable must be
at least ordinal
isolines cannot be defined for nominal data
Each isoline is represented as a polyline
e.g. data obtained from topographic maps
Accuracy depends on
the number of contoured values, or the contour
interval
the density of polyline points

77
(No Transcript)
78
Polygons

The frame is partitioned into irregular areas
(volumes for 3 or more dimensions)
value in each area is an average, total, or some
other aggregate property of the field within the
area
the representation is complete
all variation within areas is lost
e.g. data obtained from maps of vegetation cover
class, soil type

79
Polygons (contd.)

The boundaries of areas are continuously curved
lines
represented digitally as polylines - an ordered
sequence of points connected by straight lines
the denser the points, the more accurate the
polyline as a representation of a continuous
curve
accuracy depends both on the size of polygons and
on the density of polyline points
it is not clear what measure of polygon size -
average, minimum - best defines accuracy

80
Polygons (contd.)

Every point in the frame lies in exactly one
polygon
except for points on the boundaries
the polygons cannot overlap, must exhaust the
frame
they are said to tessellate the space, they form
an irregular tessellation

81
(No Transcript)
82
Triangulated irregular networks (TINs)

the frame is covered with a mesh of irregular
triangles
every point lies in exactly one triangle, or on a
triangle edge
the value of the field is known at every triangle
vertex
within triangles and along edges it is assumed to
vary linearly
the representation is complete
contours drawn across triangles will therefore
always be straight and parallel
across triangle edges there will be breaks of
slope, but not cliffs
contours will kink at edges
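
Linear variation within a triangle can be sketched with barycentric weights. The function name and the sample triangle are illustrative; a degenerate triangle with zero area would cause a division by zero and is not handled here.

```python
def tin_interpolate(tri, x, y):
    """Linear interpolation inside one triangle using barycentric weights.

    tri is three (x, y, z) vertices. Returns None if (x, y) is outside.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None               # point lies outside this triangle
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical triangle; its plane is z = 100 + 2x + 4y.
triangle = [(0.0, 0.0, 100.0), (10.0, 0.0, 120.0), (0.0, 10.0, 140.0)]
z_mid = tin_interpolate(triangle, 5.0, 5.0)
```

Because each triangle is a plane, the field is continuous across shared edges but its slope changes there, which is why contours kink at the edges.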

83
Triangulated irregular networks (TINs)

the scale of measurement of the variable must be
at least interval
variation within triangles cannot be defined for
nominal or ordinal variables

84
Triangulated irregular networks (TINs)

Accuracy depends on
how carefully the vertices were located on the
surface
how well the planes defined within each triangle
fit the actual surface
the sizes of triangles
but it is not clear what property of triangle
size best defines accuracy - average, smallest,
largest

85
(No Transcript)
86
Vector and Raster Data Integration

Recent advances in computer technologies allow
these two types of data to be used in the same
applications
computers are now capable of converting data from
the vector format to the raster format
(rasterization) and vice versa (vectorization)
computers are now able to display vector and
raster simultaneously
vector and raster data are largely seen as
complementary to, rather than competing against,
one another in geographic data processing
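
Rasterization can be sketched by testing whether each cell centre lies inside a polygon. This is a minimal illustration using a ray-casting test, not the method of any particular GIS package. The polygon and grid size are made up, and row 0 is placed at y = 0 to keep the arithmetic simple.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                  # edge spans the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, n_rows, n_cols, cell_size=1.0):
    """Set a cell to 1 when its centre falls inside the polygon, else 0."""
    return [
        [int(point_in_polygon((c + 0.5) * cell_size,
                              (r + 0.5) * cell_size, polygon))
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical square parcel on a 4 x 4 grid of unit cells.
parcel = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
grid = rasterize(parcel, 4, 4)
```

The result keeps the parcel's position only to the nearest cell; the exact vertex coordinates cannot be recovered from the grid.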

87
Georelational Data Structure

Was developed to handle geographic data
It allows the association between spatial
(graphical) and non-spatial (descriptive) data
It is the data structure used by many
vector-based GIS software packages
Both spatial and non-spatial data are stored in
relational tables

88
Georelational Data Structure (contd.)

Point, line and polygon data are stored in
separate feature attribute tables (FAT)
in the FAT, each entity is assigned a unique
feature identifier (FID)
topological information is explicitly stored by
employing a method similar to the topological
data structure described above
non-spatial data are stored in relational tables

89
Feature Attribute Table (FAT)
90
Georelational Data Structure (contd.)

Entities in the spatial and non-spatial
relational tables are linked by the common FIDs
of entities
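
The link through a common FID can be sketched with two small tables. The FIDs, column names and values below are made up for illustration.

```python
# Feature attribute table (FAT) for polygons: FID -> spatial record.
polygon_fat = {
    101: {"area_sq_m": 450.0, "perimeter_m": 90.0},
    102: {"area_sq_m": 1200.0, "perimeter_m": 140.0},
}
# Non-spatial table keyed by the same FIDs.
parcel_info = {
    101: {"owner": "Owner A", "land_use": "residential"},
    102: {"owner": "Owner B", "land_use": "commercial"},
}


def join_on_fid(spatial, non_spatial):
    """Inner join of the two tables on their common FID."""
    return {fid: {**spatial[fid], **non_spatial[fid]}
            for fid in spatial.keys() & non_spatial.keys()}


joined = join_on_fid(polygon_fat, parcel_info)
```

After the join, each feature carries both its geometric measures and its descriptive attributes, while the two tables can still be maintained separately.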

91
Linking spatial and non-spatial tables
92
Data Modeling

Process of defining real world phenomena or
geographic features of interest in terms of their
characteristics and their relationships with one
another
it is concerned with different phases of work
carried out to implement information organization
and data structure

93
Data Modeling (contd.)

There are three steps in the data modeling
process, resulting in a series of progressively
formalized data models as the form of the
database becomes more and more rigorously defined
conceptual data modeling - defining in broad and
generic terms the scope and requirements of a
database
logical data modeling - specifying the user's
view of the database with a clear definition of
attributes and relationships
physical data modeling - specifying internal
storage structure and file organization of the
database

94
Conceptual Data Modeling

Entity-relationship (E-R) modeling is probably
the most popular method of conceptual data
modeling
It is sometimes referred to as a method of
semantic data modeling because it uses a human
language-like vocabulary to describe information
organization

95
Conceptual Data Modeling (contd.)

It involves four aspects of work
identifying entities
an entity is defined as a person, a place, an
event, a thing, etc.
identifying attributes
determining relationships
drawing an entity-relationship diagram (E-R
diagram)

96
Sample E-R Diagram
97
Logical Data Modeling

Comprehensive process by which the conceptual
data model is consolidated and refined
the proposed database is reviewed in its entirety
in order to identify potential problems such as
irrelevant data that will not be used
omitted or missing data
inappropriate representation of entities
lack of integration between various parts of the
database
unsupported applications
potential additional cost to revise the database

98
Logical Data Modeling (contd.)

The end product of logical data modeling is a
logical schema
the logical schema is developed by mapping the
conceptual data model (such as the E-R diagram)
to a software-dependent design document
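
As a rough sketch of that mapping, a hypothetical E-R model with two entities (owner and parcel) and a one-to-many "owns" relationship can be written as relational tables. SQLite is used here only because it ships with Python; the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owner (
        owner_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE parcel (
        fid       INTEGER PRIMARY KEY,
        land_use  TEXT,
        area_sq_m REAL,
        -- the "owns" relationship becomes a foreign key
        owner_id  INTEGER REFERENCES owner(owner_id)
    );
""")
conn.execute("INSERT INTO owner VALUES (1, 'Owner A')")
conn.execute("INSERT INTO parcel VALUES (101, 'residential', 450.0, 1)")
rows = conn.execute(
    "SELECT p.fid, o.name FROM parcel p JOIN owner o ON p.owner_id = o.owner_id"
).fetchall()
```

Each entity becomes a table, each attribute a column, and each relationship a key shared between tables. The same conceptual model could map to a different schema in another database system.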

99
Logical Schema Example
100
Physical Data Modeling

Database design process by which the actual
tables that will be used to store the data are
defined in terms of
data format - the format of the data that is
specific to a database management system (DBMS)
storage requirements - the volume of the database
physical location of data - optimizing system
performance by minimizing the need to transmit
data between different storage devices or data
servers

101
Physical Data Modeling (contd.)

The end product of physical data modeling is a
physical schema
a physical schema is also variably known as data
dictionary, item definition table, data specific
table or physical database definition
it is both software- and hardware-specific
this means the physical schemas for different
systems look different from one another

102
Physical Schema Example
103
End
