Playing with Spaghetti: Vector and Raster Data Models in Depth


Slides: 180

Provided by: mississipp6


1
Playing with Spaghetti: Vector and Raster Data Models in Depth

Talbot J. Brooks
ASU Dept. of Geography

2
Tonight's topics

Why we were gone
Big picture overview: Raster vs. Vector
The details: Vector data models
The details: Raster data models

3
Review: you tell me

What is the difference between vector and raster
data?
Basic vector data types
Examples of raster data
Computer file structures
Flat
Hierarchical
Network
Relational

4
RASTER AND VECTOR FORMATS
RASTER: grid-based, simplifies reality
VECTOR: analog map, cartography
5
DATA MODEL OF RASTER AND VECTOR
[Figure: the real world shown on a numbered grid, once as a grid (raster) and once as vector features]
6
RASTER DATA MODEL

derived from a view of the real world as space
divided into elements, with objects filling those elements
the real world is represented with uniform cells
the array of cells is usually rectangular
cells may also be triangles, hexagons, or more
complex shapes
each cell records its own characteristic value
a single cell does not represent an object
an object is represented by a group of cells
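
A minimal sketch of the last two points, assuming a made-up 4 x 6 grid and made-up cell codes (0 = background, 1 = lake, 2 = river). No single cell is "the lake"; the lake is whatever group of cells carries its code.

```python
# Hypothetical raster: 0 = background, 1 = lake, 2 = river.
grid = [
    [0, 0, 0, 2, 0, 0],
    [0, 1, 1, 2, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 2, 0],
]

def cells_of(grid, code):
    """Return the (row, col) positions of every cell holding `code`."""
    return [(r, c)
            for r, row in enumerate(grid)
            for c, value in enumerate(row)
            if value == code]

# The lake "object" is simply the group of cells coded 1.
lake_cells = cells_of(grid, 1)
```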

7
Creating a Raster
[Figure: reality (hydrography: lake, river, pond); reality overlaid with a grid; resulting raster. Cell codes: 0 = no water feature, 1 = water body, 2 = river]
8
VECTOR DATA MODEL

derived from a formulation of spatial concepts
that emphasizes real-world objects
the geometric primitives of the vector data model
are point, line and polygon
objects can be built from these primitives
an object's location is determined by the
coordinates of the points that represent it
the uniqueness of the vector data model lies in
how it manages and stores the geometric
primitives
spaghetti model
topology model

9
VECTOR CHARACTERISTICS
[Figure: a point (X), a line, and a polygon]
10
RASTER TO VECTOR
RIVER CHANGED FROM RASTER TO VECTOR FORMAT
[Figure panels: ORIGINAL RIVER; RIVER THAT HAS BEEN VECTORISED]
11
PROS AND CONS OF RASTER MODEL

pros
raster data is more affordable
simple data structure
very efficient overlay operations
cons
topological relationships are difficult to implement
raster data requires large storage
not all real-world phenomena map directly onto a
raster representation
raster data is mainly obtained from satellite
images and scanning

12
PROS AND CONS OF VECTOR MODEL

pros
more efficient data storage
topological encoding is more efficient
suitable for most uses and compatible with data
good graphic presentation
cons
overlay operations are not efficient
complex data structure

13
A look behind the scenes: Vector GIS data models

Spaghetti model
Topological vector model
Cardinality (this is gonna hurt!)
Break

14
The Spaghetti Model

The spaghetti model is the simplest vector
data model
The model is a direct representation of a
graphical image
NO explicit topological information

15
Spaghetti Model

Description: direct line-for-line translation of
the paper map (often viewed as raw digital data)
Pros: easy to implement, good for fast drawing
Cons: storage and searches are sequential;
storing attribute data is awkward
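
A rough sketch of what "spaghetti" storage can look like, assuming made-up feature ids and coordinates. Each feature is an independent coordinate list, so a boundary shared by two parcels is stored twice and any lookup is a sequential scan.

```python
# Spaghetti model: every feature is an independent coordinate list.
# Feature ids and coordinates are hypothetical.
spaghetti = [
    ("parcel_A", "polygon", [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]),
    ("parcel_B", "polygon", [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)]),
    ("road_1", "line", [(0, 3), (4, 3)]),
]

def find_feature(features, feature_id):
    """Sequential search: the only lookup the bare model offers."""
    for fid, kind, coords in features:
        if fid == feature_id:
            return kind, coords
    return None

def shared_segments(coords_a, coords_b):
    """Segments stored in both coordinate lists, ignoring direction."""
    def segments(coords):
        return {frozenset(pair) for pair in zip(coords, coords[1:])}
    return segments(coords_a) & segments(coords_b)

# Adjacency is not stored; it has to be rediscovered by comparing coordinates.
duplicated = shared_segments(spaghetti[0][2], spaghetti[1][2])
```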

16
Spaghetti model
17
Topology

Branch of mathematics dealing with geometric
properties
These properties remain invariant under
transformations such as stretching or bending
Neighborhood relationships remain the same
Topology is the distinguishing basis for more
complicated vector models

18
Topological Vector Model

Topological data models are provided with
information that can help us in obtaining
solutions to common operations in advanced GIS
analytical techniques.
This is done by explicitly recording adjacency
information into the data structure, eliminating
the need to determine it for multiple operations.
Each line segment, the basic logical entity in
topological data structures, begins and ends when
it either contacts or intersects another line, or
when there is a change in direction of the line.

19
Topological Vector Model

Each line has two sets of numbers: a pair of
coordinates and an associated node number.
Each line segment has its own identification
number that is used as a pointer to indicate which
set of nodes represents its beginning and ending.

20
Topological Vector Model

Polygons also have identification codes that
relate back to the link numbers. Each link in
the polygon is now capable of looking left and
right to see which two polygons it borders. The
left and right polygon numbers are also stored
explicitly, so that even this tedious step is
eliminated.
The Topological data model more closely
approximates how we as map readers identify the
spatial relationships contained in an analog map
document.

21
Topological Vector Model
22
How do we preserve topology in a computer
database?

What are we storing?
Points, lines, polygons
What do we need to preserve?
Neighborhood relationships between these objects
Terminology
point, link, node, polygon

23
Terminology

Point: x, y coordinate identifying a geographic
location
Link (line, arc): an ordered set of points with a
node at the beginning and end of it
Node: the beginning and end of a link (often
defined where 3 or more lines connect)
Polygon: two or more links connected at the
nodes; contains a point inside to identify the
polygon's attributes
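
A minimal sketch of how these terms might be tabulated, assuming two adjacent squares A and B and made-up ids (the table layout is illustrative, not a specific GIS file format). Because each link records its left and right polygon, adjacency is read from the table instead of being computed from coordinates.

```python
# Node table: node id -> (x, y). Two adjacent squares, A (west) and B (east).
nodes = {1: (2, 0), 2: (2, 2)}

# Link table: link id -> (from_node, to_node, left_polygon, right_polygon).
# "outside" is the unbounded region around both squares.
links = {
    "a": (1, 2, "A", "B"),        # shared boundary, running south to north
    "b": (2, 1, "A", "outside"),  # rest of A's boundary, counter-clockwise
    "c": (1, 2, "B", "outside"),  # rest of B's boundary, counter-clockwise
}

def neighbors(polygon):
    """Polygons sharing a link with `polygon`, read straight from the table."""
    result = set()
    for _from, _to, left, right in links.values():
        if left == polygon:
            result.add(right)
        elif right == polygon:
            result.add(left)
    return result

def boundary_links(polygon):
    """Ids of the links that bound `polygon`."""
    return sorted(lid for lid, (_f, _t, left, right) in links.items()
                  if polygon in (left, right))
```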

24
Nevada
Utah
California
Arizona
25
Identify the polygons
26
Create the polygon attribute table (PAT)
27
Identify the nodes
28
Node table
29
Identify the links (arcs, lines)
30
Simplify this
31
Create the topology!
32
Nodes First
33
Nodes First
34
Polygons
35
Polygons
36
Identify the points
37
Link List
38
Point Coordinates
39
Putting it all together
40
Putting it all together
41
Putting it all together
42
Putting it all together
43
Putting it all together
44
Cardinality

Cardinality is the relationship between spatial
objects, attributes, or spatial objects and
attributes.
This relationship may be defined as
1:1
1:many
many:many

45
Cardinality

We can use cardinality to establish relationships
and rules among objects and attributes
This becomes the basis for modeling how data is
arranged within a GIS - especially one that uses
vector data.

46
Cardinality (contd)

Entity-entity relationships are described by
cardinality which may be
One to one. A FOREST can have only one MANAGER
and a MANAGER can have only one FOREST
Many to one. Many FACILITIES may be contained
within one FOREST
Many to many. The relationship water_supply may
have many entries and may be connected to many
entries in FACILITIES, FOREST, etc.

47
Cardinality (contd)

The same concept applies to space
A bathroom is located within a house (1:1)
Many homes are within a town (many:1)
Many people are within many homes (many:many)
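
One way to make the three cases concrete is to classify a list of observed pairs. This is only a sketch: the function name and the sample pairs are hypothetical, and the result describes the sample data, not a rule imposed on the database.

```python
from collections import defaultdict

def cardinality(pairs):
    """Classify (left, right) pairs as 1:1, 1:many, many:1 or many:many."""
    rights_of = defaultdict(set)
    lefts_of = defaultdict(set)
    for left, right in pairs:
        rights_of[left].add(right)
        lefts_of[right].add(left)
    # Several left items share one right item -> the left side is "many".
    left_is_many = any(len(s) > 1 for s in lefts_of.values())
    # One left item has several right items -> the right side is "many".
    right_is_many = any(len(s) > 1 for s in rights_of.values())
    return f"{'many' if left_is_many else '1'}:{'many' if right_is_many else '1'}"

bathroom_in_house = [("bathroom", "house")]
homes_in_town = [("home 1", "town"), ("home 2", "town")]
people_in_homes = [("Ann", "home 1"), ("Ann", "home 2"), ("Bob", "home 1")]
```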

48
Diagram Characteristics

Boxes represent entities
Ovals represent attributes
Diamonds represent relationships
Note how cardinality is depicted
Key attributes are underlined
Multi-valued attributes are in double ovals

49
Entity-Relationship (ER) Diagrams: A Conceptual
Model
50
Exercise: work in pairs, 10 minutes

Create a simple ER diagram for your neighborhood
Pick a feature that matches each geometry type
(point, line, polygon). For example:
For points, you might pick fire hydrants and lamp
posts
For lines, you might pick streets and water mains
For polygons, pick parcels or zip codes

51
Explanation of database types

a database is a collection of non-redundant data
which can be shared by different application
systems
implies separation of physical storage from use
of the data by an application program, i.e.
program/data independence
changes can be made to data without affecting
other components of the system.

52
Database types

tabular ("flat file") - data in a single table
hierarchical
network
relational

53
The ideal GIS database is one that maximizes the
uniqueness of every feature while minimizing
total data quantity
54
Hierarchical databases

Developed in the 1960s by International Business
Machines (IBM)
Somewhat resembles real-world filing systems
Tree-structured, similar to folder arrangements
in a computer directory
The database keeps track of the different record
types, their attributes, and the hierarchical
relationships between them
The attribute which assigns records to levels in
the database structure is called the key (e.g. is
record a department, part or supplier?)

55
Features of a hierarchical model

a set of record "types"
e.g. supplier record type, department record
type, part record type
a set of links connecting all record types in one
data structure diagram (tree)
at most one link between two record types, hence
links need not be named
for every record, there is only one parent record
at the next level up in the tree

56
Features (contd)

e.g. every county has exactly one state, every
part has exactly one department
no connections between occurrences of the same
record type
cannot go between records at the same level
unless they share the same parent
diagram

57
Pros and cons

data must possess a tree structure
tree structure is natural for geographical data
data access is easy via the key attribute, but
difficult for other attributes
in the business case, easy to find record given
its type (department, part or supplier)
in the geographical case, easy to find record
given its geographical level (state, county,
city, census tract), but difficult to find it
given any other attribute

58
Pros and cons (contd)

e.g. find the records with population 5,000 or
less
tree structure is inflexible
cannot define new linkages between records once
the tree is established
e.g. in the geographical case, new relationships
between objects
cannot define linkages laterally or diagonally in
the tree, only vertically

59
Pros and cons (contd)

the only geographical relationships which can be
coded easily are "is contained in" or "belongs
to"
DBMSs based on the hierarchical model (e.g.
System 2000) have often been used to store
spatial data, but have not been very successful
as bases for GIS
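
A small sketch of the trade-off described on these slides, assuming a made-up state/county/city tree with invented population figures. Access through the key hierarchy is one step per level, while a query on any other attribute has to visit every branch.

```python
# Hypothetical tree: state -> county -> city -> population (figures invented).
tree = {
    "Arizona": {
        "Maricopa": {"Tempe": 160000, "Gila Bend": 2000},
        "Pinal": {"Superior": 2800, "Casa Grande": 48000},
    },
}

def lookup(tree, state, county, city):
    """Access via the key hierarchy: one step per level."""
    return tree[state][county][city]

def cities_with_population_at_most(tree, limit):
    """Non-key query: every branch of the tree must be visited."""
    hits = []
    for state, counties in tree.items():
        for county, cities in counties.items():
            for city, population in cities.items():
                if population <= limit:
                    hits.append((state, county, city))
    return hits
```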

60
Network data model

developed in mid 1960s as part of work of CODASYL
(Conference on Data Systems Languages) which
proposed programming language COBOL (1966) and
then network model (1971)
other aspects of database systems also proposed
at this time include database administrator, data
security, audit trail
objective of network model is to separate data
structure from physical storage, eliminate
unnecessary duplication of data with associated
errors and costs

61
Networked model (contd)

uses concept of a data definition language, data
manipulation language
uses concept of m:n linkages or relationships
an owner record can have many member records
a member record can have several owners
hierarchical model allows only 1:n
example of a network database
a hospital database has three record types
patient: name, date of admission, etc.

62
Networked model (contd)

doctor: name, etc.
ward: number of beds, name of staff nurse, etc.
need to link patients to doctor, also to ward
doctor record can own many patient records
patient record can be owned by both doctor and
ward records
network DBMSs include methods for building and
redefining linkages, e.g. when patient is
assigned to ward

63
Problems with the networked model

links between records of the same type are not
allowed
while a record can be owned by several records of
different types, it cannot be owned by more than
one record of the same type (patient can have
only one doctor, only one ward)
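
The hospital example and its two restrictions can be sketched as owner/member links. This is an illustration with hypothetical class and function names, not CODASYL syntax: a member keeps at most one owner per record type, and records of the same type cannot be linked.

```python
class Record:
    def __init__(self, record_type, name):
        self.record_type = record_type
        self.name = name
        self.owners = {}    # owner record type -> owner (at most one per type)
        self.members = []   # records owned by this record

def link(owner, member):
    """Make `owner` own `member`, replacing a previous owner of the same type."""
    if owner.record_type == member.record_type:
        raise ValueError("links between records of the same type are not allowed")
    previous = member.owners.get(owner.record_type)
    if previous is not None:
        previous.members.remove(member)
    member.owners[owner.record_type] = owner
    owner.members.append(member)

doctor = Record("doctor", "Dr. A")
ward_1 = Record("ward", "Ward 1")
ward_2 = Record("ward", "Ward 2")
patient = Record("patient", "Patient X")
link(doctor, patient)
link(ward_1, patient)
link(ward_2, patient)   # reassignment: ward_1 no longer owns the patient
```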

64
Relational database model

the most popular DBMS model for GIS
Used by ArcInfo
flexible approach to linkages between records
comes closest to modeling the complexity of
spatial relationships between objects
proposed by IBM researcher E.F. Codd in 1970
more of a concept than a data structure
internal architecture varies substantially from
one RDBMS to another

65
Relational databases (contd)

each record has a set of attributes
the range of possible values (domain) is defined
for each attribute
records of each type form a table or relation
each row is a record or tuple
each column is an attribute
note the potential confusion - a "relation" is a
table of records, not a linkage between records
the degree of a relation is the number of
attributes in the table

66
Relational databases (contd)

1 attribute is a unary relation
2 attributes is a binary relation
n attributes is an n-ary relation
Examples
unary: COURSES(SUBJECT)
binary: PERSONS(NAME,ADDRESS), OWNER(PERSON_NAME,HOUSE_ADDRESS)
ternary: HOUSES(ADDRESS,PRICE,SIZE)

67
How a relational database works

a key of a relation is a subset of attributes
with the following properties
unique identification
The value of the key is unique for each tuple
nonredundancy
no attribute in the key can be discarded without
destroying the key's uniqueness
A prime attribute of a relation is an attribute
which participates in at least one key
All other attributes are non-prime

68
Relational database key example

For example, a phone number is a unique key in a
phone directory
in the normal phone directory the key attributes
are last name, first name, street address
if street address is dropped from this key, the
key is no longer unique (many Smith, Mary's)
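
The two key properties can be checked mechanically against sample rows. A sketch with invented directory entries follows; note that it only tests the rows it is given, whereas a real key is a rule that must hold for every possible row.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if no two rows agree on every attribute in `attrs`."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_key(rows, attrs):
    """Unique identification plus nonredundancy, for the given rows."""
    if not is_unique(rows, attrs):
        return False
    # Nonredundant: dropping any one attribute must break uniqueness.
    return all(not is_unique(rows, subset)
               for subset in combinations(attrs, len(attrs) - 1))

# Hypothetical phone directory entries.
directory = [
    {"last": "Smith", "first": "Mary", "street": "1 Elm St", "phone": "555-0101"},
    {"last": "Smith", "first": "Mary", "street": "9 Oak Ave", "phone": "555-0102"},
    {"last": "Jones", "first": "Mary", "street": "1 Elm St", "phone": "555-0103"},
    {"last": "Smith", "first": "John", "street": "1 Elm St", "phone": "555-0104"},
]
```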

69
Pros and cons

the most flexible of the database models
no obvious match of implementation to model -
model is the user's view, not the way the data is
organized internally
is the basis of an area of formal mathematical
theory

70
Pros and cons (contd)

most RDBMS data manipulation languages require
the user to know the contents of relations, but
allow access from one relation to another through
common attributes
Example: given two relations
PROPERTY(ADDRESS,VALUE,COUNTY_ID)
COUNTY(COUNTY_ID,NAME,TAX_RATE)
to answer the query "what are the taxes on
property x" the user would:

71
Pros and cons (contd)

retrieve the property record
link the property and county records through the
common attribute COUNTY_ID
compute the taxes by multiplying VALUE from the
property tuple with TAX_RATE from the linked
county tuple
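
These three steps map onto a single SQL join. The sketch below uses Python's built-in sqlite3 module with invented addresses, county names and tax rates; only the relation and attribute names come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE county (county_id INTEGER PRIMARY KEY, name TEXT, tax_rate REAL);
    CREATE TABLE property (address TEXT PRIMARY KEY, value REAL,
                           county_id INTEGER REFERENCES county(county_id));
    INSERT INTO county VALUES (1, 'County A', 0.01), (2, 'County B', 0.02);
    INSERT INTO property VALUES ('12 Mill Ave', 200000, 1), ('7 Oak St', 150000, 2);
""")

def taxes_on(address):
    """Retrieve the property, link to its county via COUNTY_ID, multiply."""
    row = conn.execute(
        """SELECT p.value * c.tax_rate
           FROM property AS p JOIN county AS c ON p.county_id = c.county_id
           WHERE p.address = ?""",
        (address,),
    ).fetchone()
    return row[0] if row else None
```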

72
Data Interpolation

Process of estimating the value of data at a
location where no data were collected

73
Example

Elevation data are collected at points
Carbon Dioxide data were collected at points

74
Triangulated Irregular Network

TIN is composed of
nodes
edges
triangles
hull polygons
topology

75
Nodes

The fundamental building blocks of the TIN
They originate from the points and arc vertices
contained in the input data sources.
Every node is incorporated in the TIN
triangulation.
Every node in the TIN surface model must have a z
value.

76
Creating Triangles
Delaunay criterion: if you draw a circle through
the three points of a triangle, no other point
may fall within that circle
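
The criterion can be tested with the standard in-circle determinant. A minimal sketch, assuming the triangle's points are listed in counter-clockwise order; the function names are illustrative.

```python
def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circle through a, b and c.

    a, b, c are (x, y) points and must be in counter-clockwise order.
    """
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0

def satisfies_delaunay(triangle, other_points):
    """True if no other point falls inside the triangle's circumcircle."""
    a, b, c = triangle
    return not any(in_circumcircle(a, b, c, p) for p in other_points)
```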
77
Take the points, and draw triangles
78
Edges

Every node is joined with its nearest neighbors
by edges to form triangles which satisfy the
Delaunay criterion.
Each edge has two nodes, but a node may have two
or more edges.

79
Triangles

Each facet describes the behavior of a portion of
the TIN's surface.
The x,y,z coordinate values of a triangle's three
nodes can be used to derive information about the
facet, such as slope, aspect, surface area, and
surface length.
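
As an illustration of deriving facet information from the three nodes, here is a sketch for slope and aspect using the facet's normal vector. It assumes x points east and y points north, and reports aspect as the downhill direction in degrees clockwise from north; those conventions are assumptions, not taken from the slides.

```python
import math

def facet_slope_aspect(p1, p2, p3):
    """Slope (degrees from horizontal) and aspect (degrees clockwise from
    north, downhill direction) of the plane through three (x, y, z) nodes."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # Normal of the facet (cross product of two edges), flipped to point up.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    if nz < 0:
        nx, ny, nz = -nx, -ny, -nz
    if nz == 0:
        raise ValueError("degenerate or vertical facet")
    slope = math.degrees(math.atan2(math.hypot(nx, ny), nz))
    # The horizontal part of the upward normal points downhill.
    aspect = math.degrees(math.atan2(nx, ny)) % 360 if (nx or ny) else None
    return slope, aspect
```

A flat facet has slope 0 and no defined aspect, so the sketch returns None for aspect in that case.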

80
Hulls

The hull is formed by one or more polygons
containing the entire set of data points used to
construct the TIN.
The hull polygons define the zone of
interpolation of the TIN.

81
Hull Types
Convex Hull
Hull
82
Topology

Topology is maintained with information on each
triangle's nodes, edge numbers and type, and
adjacency to other triangles.
For each triangle, TIN records
The triangle number
The numbers of each adjacent triangle
The three nodes defining the triangle
The x,y coordinates of each node
The surface z value of each node
Also records the series of nodes that make up
the hull

83
Example
84
Triangle topology
85
Coordinates and attributes
86
What can we do with TINs

Calculate slope, aspect, and elevation at any
point on the surface
Because edges have a node with a z value at each
end, it is possible to calculate a slope along
the edge from one node to the other.

87
Calculating elevation along an edge
Using IDW (inverse distance weighting), we can
estimate the elevation value at any point along an edge
New Z = (100 × 0.5) + (80 × 0.5) = 90
88
In fact, we can estimate a z value anywhere in
our TIN
(90 × 0.6) + (80 × 0.4) = 54 + 32 = 86
(86 × 0.5) + (90 × 0.5) = 43 + 45 = 88
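
A sketch of the weighting behind these numbers, for two known values. The weights are proportional to 1/distance and then normalized; the distances used in the examples (equal distances, and 2 versus 3 units) are assumptions chosen so that the weights come out to 0.5/0.5 and 0.6/0.4.

```python
def idw_two_points(z_a, z_b, dist_a, dist_b):
    """Inverse-distance-weighted estimate from two known values.

    dist_a and dist_b are the distances from the unknown location to the
    points holding z_a and z_b; both must be greater than zero.
    """
    w_a = 1.0 / dist_a
    w_b = 1.0 / dist_b
    return (z_a * w_a + z_b * w_b) / (w_a + w_b)

midpoint = idw_two_points(100, 80, 1, 1)       # weights 0.5 and 0.5
closer_to_first = idw_two_points(90, 80, 2, 3)  # weights 0.6 and 0.4
```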
89
Now, imagine doing this for each square dot
location
90
What you've created is a surface called a lattice

Lattice: rows and columns of continuous values

91
Comparison
Square Grid Tessellation
Lattice
92
Four sources

Data that are collected in a raster format (e.g.,
satellite data)
Data in vector format converted to raster format
Data in a paper map converted to raster format
Interpolating data from points

Method of converting our TIN into a Lattice
93
In class exercise
94
Network Data Model
95
(No Transcript)
96
(No Transcript)
97
Some Terms

Edges (links): streets, transmission lines, pipes,
and streams
Junctions (nodes): street intersections, fuses,
switches, service taps, and the confluence of
stream reaches
Impedance: the cost associated with traveling
along a specific link

Edges connect together at junctions
Flow passes from one edge to another edge through
junctions
Automobiles, electrons, water: each can be
transferred to another edge
Impedance can be applied to edges or junctions

99
Types of networks

Straight network (animal movement)
Branching network (stream patterns)
Circuit (street patterns)
Directed: flows can move in a single direction
Undirected: flows can move in either direction

100
Analysis of networks

Connectivity
gamma index: ratio of the number of links in
a network to the maximum possible
alpha index: ratio of the number of routes
through a network to the maximum possible
Shortest path
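
The two connectivity indices can be sketched with the usual planar-network formulas from graph theory: at most 3(v - 2) links and at most 2v - 5 independent circuits for v nodes. Assuming the slides intend these planar forms (the transcript does not show the formulas), and reading "routes" as circuits:

```python
def gamma_index(links, nodes):
    """Observed links over the maximum possible in a planar network."""
    return links / (3 * (nodes - 2))

def alpha_index(links, nodes, subgraphs=1):
    """Observed circuits over the maximum possible in a planar network."""
    return (links - nodes + subgraphs) / (2 * nodes - 5)

# Four nodes joined in a square: 4 links, one circuit.
square = (gamma_index(4, 4), alpha_index(4, 4))
# Four nodes with every planar link present: 6 links.
full = (gamma_index(6, 4), alpha_index(6, 4))
```

Both functions need at least three nodes, otherwise the denominators are zero or negative.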

101
Gamma index
102
Gamma Index
103
(No Transcript)
104
(No Transcript)
105
Algorithm Terms

Nodes
Origin node
Adjacent node
Reached node
Scanned node
Unscanned node

Cumulative Cost
Tables
Scanned table
Reached table Cumulative Cost
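
The terms above (origin, adjacent, reached and scanned nodes, cumulative cost) match a label-setting shortest-path search. The step-by-step slides did not survive in this transcript, so the sketch below uses Dijkstra's algorithm as an assumed stand-in, on a made-up network with made-up impedances.

```python
import heapq

def shortest_paths(graph, origin):
    """graph: node -> {adjacent node: impedance}.
    Returns (cumulative cost table, predecessor table)."""
    reached = {origin: 0}        # best cumulative cost found so far
    predecessor = {origin: None}
    scanned = set()              # nodes whose cumulative cost is final
    queue = [(0, origin)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node in scanned:
            continue
        scanned.add(node)
        for adjacent, impedance in graph.get(node, {}).items():
            new_cost = cost + impedance
            if adjacent not in reached or new_cost < reached[adjacent]:
                reached[adjacent] = new_cost
                predecessor[adjacent] = node
                heapq.heappush(queue, (new_cost, adjacent))
    return reached, predecessor

def route(predecessor, destination):
    """Walk the predecessor table back from destination to origin."""
    path = []
    while destination is not None:
        path.append(destination)
        destination = predecessor[destination]
    return path[::-1]

# Hypothetical directed network; numbers are impedances.
network = {"A": {"B": 4, "C": 1}, "B": {"D": 1}, "C": {"B": 2, "D": 6}, "D": {}}
costs, previous = shortest_paths(network, "A")
```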

106
(No Transcript)
107
(No Transcript)
108
Step One
109
Step Two
110
Step Three
111
Step Four
112
Step Five
113
Step Six
114
Step Seven
115
Step Eight
116
Step Nine
117
Step Ten
118
Step Eleven
119
Step Twelve
120
Step Thirteen
121
Break time!
122
Raster data
123
What type of data?

Continuous data
Examples: elevation, temperature
Square grid tessellation also called raster

124
Raster Models (tessellation)
125
Raster
Data values are stored in rows and columns
126
Two types

Scanned Map images
Digital Raster Graphic
Other maps
Tessellation Models
Square Grid Tessellation
Hexagon Tessellation

127
Scanned Maps

Scanned map as a photograph
The value of each cell represents the color on
the map; it needs to be interpreted the way a
paper/analog map is interpreted

128
Digital Raster Graphic (DRG)
There is typically another file linked with the
DRG, so that the geographic position of the
graphic is known
129
MapQuest
130
Maps or Images??
131
Summary of scanned maps

Have the characteristics of an analog map in that
the location information and the attributes are
stored as a visual product
No queries can be made based on the database

132
Tessellation Models

Location-based spatial data model: the process of
dividing an area into smaller, contiguous tiles
with no gaps between them
Types:
regular and irregular
Uses: continuous surfaces
Pros: easy to implement and manipulate
Cons: high data storage, output not cartographic
quality

133
Spatial and Attribute Data

Combined in a single file
Unlike the scanned maps, they can be searched

134
Tessellation Models

Regular

Most common
Rarely used
135
Tessellation models
Regular grid
136
Data

Rows and columns containing the attribute value
associated with each data layer
The row/column location of the data value
represents the spatial position
Exact geographic position is typically
established with header information before the
rows and columns of data
Also need knowledge of what the values represent
(e.g., elevation in meters) typically part of
the metadata
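
A sketch of how header information turns a row/column position into a geographic position. It assumes the header gives the upper-left corner of the grid as the origin and that rows are counted downward from 0; real formats differ on both points, so check the header convention before reusing this.

```python
def cell_center(row, col, x_origin, y_origin, cell_size):
    """Map coordinates of a cell's centre.

    Assumes (x_origin, y_origin) is the upper-left corner of the grid
    and rows are counted downward from 0.
    """
    x = x_origin + (col + 0.5) * cell_size
    y = y_origin - (row + 0.5) * cell_size
    return x, y

def cell_at(x, y, x_origin, y_origin, cell_size):
    """Row and column of the cell containing map position (x, y)."""
    col = int((x - x_origin) // cell_size)
    row = int((y_origin - y) // cell_size)
    return row, col
```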

137
Rows and Columns
138
Geographic Position
origin
orientation
size of each cell
139
(No Transcript)
140
(No Transcript)
141
Sample data
142
Each cell has a value
143
Data File
Origin (x,y) Ymax (x,y) Row,col Cell size
144
Tessellation models
Hexagonal mesh
Primary advantage over square grid tessellation
is distance measurements. Important in
applications that need to spread distances evenly
- e.g., spread of forest fires
145
Distance between adjacent cells?
Example: modeling the spread of a fire from one
cell to the next adjacent cell.
146
Distance measurements between adjacent cells are
the same in the hexagon model
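
A quick numerical check of this point. In a square grid the four diagonal neighbours are farther away than the four edge neighbours; in a hexagonal mesh all six neighbours are equally distant. The axial (q, r) coordinate layout used for the hexagons is an assumption for illustration.

```python
import math

def square_neighbor_distances(cell_size):
    """Centre-to-centre distances to the 8 neighbours of a square cell."""
    return [cell_size * math.hypot(dr, dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]

# The six neighbours of a hexagon in axial (q, r) coordinates.
HEX_NEIGHBORS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

def hexagon_center(q, r, spacing):
    """Centre of hexagon (q, r) when adjacent centres are `spacing` apart."""
    return spacing * (q + r / 2), spacing * (math.sqrt(3) / 2) * r

def hexagon_neighbor_distances(spacing):
    """Centre-to-centre distances to the 6 neighbours of a hexagonal cell."""
    return [math.hypot(*hexagon_center(q, r, spacing)) for q, r in HEX_NEIGHBORS]
```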
147
Where do we get raster data? Four sources

Data that are collected in a raster format (e.g.,
satellite data)
Data in vector format converted to raster format
Data in a paper map converted to raster format
DRG
Converted into a tessellation database
Interpolating data from points

148
One: satellite data

Example: Landsat Thematic Mapper (TM) data from
USGS

149
(No Transcript)
150
(No Transcript)
151
Multispectral

Multispectral meaning that each cell has more
than one value (different sections of the
electromagnetic spectrum) associated with it
(these are called bands)

152
Bands and Resolution

Fixed spatial resolution (either 30 meters or 120
meters) depending on the band

Landsats 4-5
Band    Wavelength (micrometers)    Resolution (meters)
1       0.45-0.52                   30
2       0.52-0.60                   30
3       0.63-0.69                   30
4       0.76-0.90                   30
5       1.55-1.75                   30
6       10.40-12.50                 120
7       2.08-2.35                   30
153
What can we do with the bands?

Band 1 penetrates water for bathymetric mapping
along coastal areas and is useful for
soil-vegetation differentiation and for
distinguishing forest types.
Band 2 detects green reflectance from healthy
vegetation, and
Band 3 is designed for detecting chlorophyll
absorption in vegetation.
Band 4 data is ideal for detecting near-IR
reflectance peaks in healthy green vegetation and
for detecting water-land interfaces.
The two mid-IR bands (bands 5 and 7) are
useful for vegetation and soil moisture studies
and for discriminating between rock and mineral
types.
The thermal-IR band (band 6) is designed to
assist in thermal mapping, and is used for soil
moisture and vegetation studies.

154
False color

Bands 4, 3, and 2 can be combined to make
false-color composite images where band 4 is
displayed as red, band 3 as green, and band 2 as
blue. This combination makes
vegetation appear as shades of red, brighter reds
indicating more vigorously growing vegetation.
Soils with no or sparse vegetation range from
white (sands) to greens or browns depending on
moisture and organic matter content. Water bodies
will appear blue. Deep, clear water appears dark
blue to black in color, while sediment-laden or
shallow waters appear lighter in color. Urban
areas appear blue-gray in color. Clouds and snow
appear bright white. Clouds and snow are usually
distinguishable from each other by the shadows
associated with clouds

155
False Color Example
156
False Color example
157
False Color example
158
With the same data (NDVI)
Normalized difference vegetation index
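
NDVI is computed per cell as (NIR - Red) / (NIR + Red); for Landsat TM the near-infrared band is band 4 and the red band is band 3. A minimal sketch on plain lists follows. Returning 0.0 where both bands are zero is a choice made here to avoid dividing by zero, not part of the index definition.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one cell."""
    if nir + red == 0:
        return 0.0  # assumption: treat empty cells as 0 instead of failing
    return (nir - red) / (nir + red)

def ndvi_grid(band4, band3):
    """Apply NDVI cell by cell to two rasters of the same shape."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(band4, band3)]
```

Values near +1 indicate dense green vegetation; values near zero or below indicate bare soil, water, or cloud.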
159
Where do we get raster data? Four sources

Data that are collected in a raster format (e.g.,
satellite data)
Data in vector format converted to raster format
Data in a paper map converted to raster format
DRG
Converted into a tessellation database
Interpolating data from points

160
Second source for raster data

Data that are in another format (either vector or
paper map) and need to be converted to a raster
format

161
Land use in vector format
To convert it, we need to decide what size each
cell needs to be. How do we decide? Minimum
mapping unit and spatial resolution.
162
Sort the database
163
Minimum mapping unit
164
Better
This would give us a 2 m cell size
165
Default settings
166
Resulting data
167
Resulting data
755 (default)
168
200 meters
169
100 meters
170
10 meters
171
(No Transcript)
172
(No Transcript)
173
(No Transcript)
174
Which is best?
vector
100 meter
10 meter
175
Area and database size comparisons
176
Three: conversion from a paper map

Scanning can convert to a DRG or into a square
grid or hexagon database
Same rules apply as with vector scanning: the best
approach is to trace to mylar, then scan
(my personal experience: it is easier to vector
digitize, then use software to convert to raster
format)

Note: with scanning you can create either a DRG
or a tessellation database
or a tessellation database
177
Database size can be a problem: compaction
Run length encoding
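
A minimal sketch of run length encoding for one raster row: each run of identical values is stored as a (value, count) pair. The function names and the sample row are illustrative.

```python
from itertools import groupby

def rle_encode(row):
    """Collapse each run of equal values into a (value, run length) pair."""
    return [(value, len(list(run))) for value, run in groupby(row)]

def rle_decode(pairs):
    """Expand (value, run length) pairs back into the original row."""
    return [value for value, count in pairs for _ in range(count)]

row = [0, 0, 0, 1, 1, 2, 0, 0]
encoded = rle_encode(row)
```

Rows with long runs of the same value compress well; a row where every cell differs from its neighbour gets larger, not smaller.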
178
In some cases, there is very little you can do
179
Four sources