Title: Playing with Spaghetti: Vector and Raster Data Models in Depth
1Playing with Spaghetti: Vector and Raster Data Models in Depth
- Talbot J. Brooks
- ASU Dept. of Geography
2Tonight's topics
- Why we were gone
- Big picture overview: Raster vs. Vector
- The details: Vector data models
- The details: Raster data models
3Review: you tell me
- What is the difference between vector and raster data?
- Basic vector data types
- Examples of raster data
- Computer file structures
- Flat
- Hierarchical
- Network
- Relational
4RASTER AND VECTOR FORMATS
RASTER: grid-based, simplifies reality
VECTOR: analog map, cartography
5DATA MODEL OF RASTER AND VECTOR
[Figure: the REAL WORLD shown as a GRID RASTER, with rows and columns numbered 1 to 10, and as VECTOR features]
6RASTER DATA MODEL
- derives from the formulation that the real world has spatial elements, and objects fill those elements
- the real world is represented with uniform cells
- the list of cells forms a rectangle
- cells can also be triangles, hexagons, or more complex shapes
- each cell reports its own characteristics
- a single cell does not represent an object
- an object is represented by a group of cells
7Creating a Raster
[Figure: three panels. "Reality - Hydrography" shows a lake, a river, and a pond; "Reality overlaid with a grid" shows the same features under a grid; "Resulting raster" shows the coded cell values.]
Legend: 0 = No Water Feature, 1 = Water Body, 2 = River
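A minimal Python sketch of the coded raster idea. The 4 x 6 grid is hypothetical (it is not the grid from the slide); only the codes 0, 1, and 2 come from the legend. It shows that an object such as the river is simply the group of cells sharing one code.

```python
from collections import Counter

# Codes from the slide legend.
LEGEND = {0: "No Water Feature", 1: "Water Body", 2: "River"}

# Hypothetical raster: each inner list is one row of cells.
raster = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 2, 2, 0, 0],
    [0, 0, 0, 2, 1, 0],
]

def count_cells(grid):
    """Count how many cells carry each code."""
    return Counter(value for row in grid for value in row)

def cells_with_code(grid, code):
    """Return the (row, col) positions of every cell holding the given code."""
    return [(r, c)
            for r, row in enumerate(grid)
            for c, value in enumerate(row)
            if value == code]

counts = count_cells(raster)
river_cells = cells_with_code(raster, 2)  # the "river" object as a group of cells
```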
8VECTOR DATA MODEL
- derived from a formulation of spatial concepts that emphasizes real-world objects
- the geometric primitives of the vector data model are point, line, and polygon
- objects can be built from these primitives
- an object's location is determined by the location of its representative points
- the uniqueness of the vector data model lies in how it manages and stores geometric primitives
- spaghetti model
- topology model
9VECTOR CHARACTERISTICS
POINT X LINE POLYGON
10RASTER TO VECTOR
[Figure: a river changed from raster to vector format, showing the original river and the river after it has been vectorised]
11PRO AND CONS OF RASTER MODEL
- pro
- raster data is more affordable
- simple data structure
- very efficient overlay operation
- cons
- topological relationships are difficult to implement
- raster data requires large storage
- not all real-world phenomena map directly onto a raster representation
- raster data is mainly obtained from satellite images and scanning
12PRO AND CONS OF VECTOR MODEL
- pro
- more efficient data storage
- topological encoding is more efficient
- suitable for most usage and compatible with data
- good graphic presentation
- cons
- overlay operation not efficient
- complex data structure
13A look behind the scenes: Vector GIS data models
- Spaghetti model
- Topological vector model
- Cardinality (this is gonna hurt!)
- Break
14The Spaghetti Model
- The spaghetti model is the simplest vector data model
- The model is a direct representation of a graphical image
- NO explicit topological information
15Spaghetti Model
- Description: direct line-for-line translation of the paper map (often viewed as raw digital data)
- Pros: easy to implement, good for fast drawing
- Cons: storage and searches are sequential; storing attribute data is awkward
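A minimal Python sketch of a spaghetti-style store, assuming two hypothetical unit squares named A and B. Each feature is only a coordinate list, so the shared boundary is stored twice, lookups scan the records in order, and adjacency has to be computed by comparing geometry.

```python
# Each feature is just a name and a list of (x, y) coordinates.
# Squares A and B share the edge from (1, 0) to (1, 1), but the model
# stores that edge twice and records no link between the two features.
spaghetti = [
    ("A", [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]),
    ("B", [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]),
]

def find_feature(features, name):
    """Sequential search: scan the records one by one until the name matches."""
    for feature_name, coords in features:
        if feature_name == name:
            return coords
    return None

def share_an_edge(coords_a, coords_b):
    """Adjacency is not stored, so derive it by comparing line segments."""
    def segments(coords):
        # frozenset makes a segment equal regardless of drawing direction.
        return {frozenset(pair) for pair in zip(coords, coords[1:])}
    return bool(segments(coords_a) & segments(coords_b))

a_and_b_touch = share_an_edge(find_feature(spaghetti, "A"),
                              find_feature(spaghetti, "B"))
```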
16Spaghetti model
17Topology
- Branch of mathematics dealing with geometric properties
- These properties of objects remain invariant under transformations such as stretching or bending
- Neighborhood relationships remain the same
- Topology is the distinguishing basis for more complicated vector models
18Topological Vector Model
- Topological data models are provided with information that can help us in obtaining solutions to common operations in advanced GIS analytical techniques.
- This is done by explicitly recording adjacency information into the data structure, eliminating the need to determine it for multiple operations.
- Each line segment, the basic logical entity in topological data structures, begins and ends when it either contacts or intersects another line, or when there is a change in direction of the line.
19Topological Vector Model
- Each line has two sets of numbers: a pair of coordinates and an associated node number.
- Each line segment has its own identification number that is used as a pointer to indicate which set of nodes represents its beginning and ending.
20Topological Vector Model
- Polygons also have identification codes that relate back to the link numbers. Each link in the polygon is now capable of looking left and right at the polygon numbers to see which two polygons it separates. The left and right polygons are also stored explicitly, so that even this tedious step is eliminated.
- The topological data model more closely approximates how we as map readers identify the spatial relationships contained in an analog map document.
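A minimal Python sketch of a link table with explicit left and right polygons. The two polygons A and B, the nodes n1 and n2, and the field names are hypothetical; "outside" stands for the area beyond the map. Because adjacency is stored on each link, finding neighbors is a table lookup rather than a geometric comparison.

```python
from collections import namedtuple

# One record per link: id, start node, end node, and the polygon on
# each side when travelling from the start node to the end node.
Link = namedtuple("Link", "link_id from_node to_node left_poly right_poly")

links = [
    Link(1, "n1", "n2", "outside", "A"),  # outer boundary of A
    Link(2, "n1", "n2", "A", "B"),        # boundary shared by A and B
    Link(3, "n1", "n2", "B", "outside"),  # outer boundary of B
]

def adjacent_polygons(link_table, polygon):
    """Read the neighbors of a polygon straight from the link records."""
    neighbors = set()
    for link in link_table:
        if link.left_poly == polygon:
            neighbors.add(link.right_poly)
        elif link.right_poly == polygon:
            neighbors.add(link.left_poly)
    return neighbors

def boundary_links(link_table, polygon):
    """List the ids of the links that bound a polygon."""
    return [link.link_id for link in link_table
            if polygon in (link.left_poly, link.right_poly)]
```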
21Topological Vector Model
22How do we preserve topology in a computer database?
- What are we storing?
- Points, lines, polygons
- What do we need to preserve?
- Neighborhood relationships between these objects
- Terminology
- point, link, node, polygon
23Terminology
- Point: an x, y coordinate identifying a geographic location
- Link (line, arc): an ordered set of points with a node at the beginning and end of it
- Node: the beginning and end of a link (often defined where 3 or more lines connect)
- Polygon: two or more links connected at the nodes; contains a point inside to identify the polygon's attributes
24Nevada
Utah
California
Arizona
25Identify the polygons
26Create the polygon attribute table (PAT)
27Identify the nodes
28Node table
29Identify the links (arcs, lines)
30Simplify this
31Create the topology!
32Nodes First
33Nodes First
34Polygons
35Polygons
36Identify the points
37Link List
38Point Coordinates
39Putting it all together
40Putting it all together
41Putting it all together
42Putting it all together
43Putting it all together
44Cardinality
- Cardinality is the relationship between spatial objects, between attributes, or between spatial objects and attributes.
- This relationship may be defined as
- 1:1
- 1:many
- many:many
45Cardinality
- We can use cardinality to establish relationships and rules among objects and attributes
- This becomes the basis for modeling how data is arranged within a GIS, especially one that uses vector data.
46Cardinality (contd)
- Entity-entity relationships are described by cardinality, which may be
- One to one. A FOREST can have only one MANAGER and a MANAGER can have only one FOREST
- Many to one. Many FACILITIES may be contained within one FOREST
- Many to many. The relationship water_supply may have many entries and may be connected to many entries (FACILITIES, FOREST, etc.)
47Cardinality (contd)
- The same concept applies to space
- A bathroom is located within a house (1:1)
- Many homes are within a town (many:1)
- Many people are within many homes (many:many)
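A minimal Python sketch that classifies a relationship as 1:1, 1:many, many:1, or many:many from a list of observed pairs. The function name and the sample names are hypothetical; the three cases mirror the forest, facility, and home examples.

```python
def cardinality(pairs):
    """Classify a relationship given as (left, right) pairs."""
    rights_per_left = {}
    lefts_per_right = {}
    for left, right in pairs:
        rights_per_left.setdefault(left, set()).add(right)
        lefts_per_right.setdefault(right, set()).add(left)
    # True when some left item is linked to more than one right item.
    one_left_many_rights = any(len(s) > 1 for s in rights_per_left.values())
    # True when some right item is linked to more than one left item.
    many_lefts_one_right = any(len(s) > 1 for s in lefts_per_right.values())
    if one_left_many_rights and many_lefts_one_right:
        return "many:many"
    if one_left_many_rights:
        return "1:many"
    if many_lefts_one_right:
        return "many:1"
    return "1:1"

manager_forest = [("Ann", "Pine Forest"), ("Bob", "Oak Forest")]
facility_forest = [("cabin", "Pine Forest"), ("tower", "Pine Forest")]
person_home = [("Al", "home 1"), ("Al", "home 2"), ("Bea", "home 1")]
```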
48Diagram Characteristics
- Boxes represent entities
- Ovals represent attributes
- Diamonds represent relationships
- Note how cardinality is depicted
- Key attributes are underlined
- Multi-valued attributes are in double ovals
49Entity-Relationship (ER) Diagrams: A Conceptual Model
50Exercise: work in pairs, 10 minutes
- Create a simple ER diagram for your neighborhood
- Pick a feature that matches each geometry type (point, line, polygon). For example:
- For points, you might pick fire hydrants and lamp posts
- For lines, you might pick streets and water mains
- For polygons, pick parcels or zip codes
51Explanation of database types
- a database is a collection of non-redundant data which can be shared by different application systems
- implies separation of physical storage from use of the data by an application program, i.e. program/data independence
- changes can be made to data without affecting other components of the system.
52Database types
- tabular ("flat file") - data in a single table
- hierarchical
- network
- relational
53The ideal GIS database is one that maximizes the
uniqueness of every feature while minimizing
total data quantity
54Hierarchical databases
- Developed in the 1960s by International Business Machines (IBM)
- Somewhat resembles real-world filing systems
- Tree-structured, similar to folder arrangements in a computer directory
- The database keeps track of the different record types, their attributes, and the hierarchical relationships between them
- The attribute which assigns records to levels in the database structure is called the key (e.g. is the record a department, part or supplier?)
55Features of a hierarchical model
- a set of record "types"
- e.g. supplier record type, department record type, part record type
- a set of links connecting all record types in one data structure diagram (tree)
- at most one link between two record types, hence links need not be named
- for every record, there is only one parent record at the next level up in the tree
56Features (contd)
- e.g. every county has exactly one state, every part has exactly one department
- no connections between occurrences of the same record type
- cannot go between records at the same level unless they share the same parent
- [diagram]
57Pros and cons
- data must possess a tree structure
- tree structure is natural for geographical data
- data access is easy via the key attribute, but difficult for other attributes
- in the business case, easy to find a record given its type (department, part or supplier)
- in the geographical case, easy to find a record given its geographical level (state, county, city, census tract), but difficult to find it given any other attribute
58Pros and cons (contd)
- e.g. find the records with population 5,000 or less
- tree structure is inflexible
- cannot define new linkages between records once the tree is established
- e.g. in the geographical case, new relationships between objects
- cannot define linkages laterally or diagonally in the tree, only vertically
59Pros and cons (contd)
- the only geographical relationships which can be coded easily are "is contained in" or "belongs to"
- DBMSs based on the hierarchical model (e.g. System 2000) have often been used to store spatial data, but have not been very successful as bases for GIS
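A minimal Python sketch of the hierarchical trade-off, using a hypothetical state, county, and city tree with made-up populations. Following the single parent link answers "belongs to" directly, while a query on any other attribute has to visit every record.

```python
# Hypothetical records; every record has exactly one parent (or None at the root).
records = {
    "State S":  {"level": "state",  "parent": None,       "population": 90000},
    "County X": {"level": "county", "parent": "State S",  "population": 60000},
    "County Y": {"level": "county", "parent": "State S",  "population": 30000},
    "City P":   {"level": "city",   "parent": "County X", "population": 4000},
    "City Q":   {"level": "city",   "parent": "County Y", "population": 12000},
}

def belongs_to(table, name):
    """Easy query: climb the parent links to list what a record is contained in."""
    chain = []
    parent = table[name]["parent"]
    while parent is not None:
        chain.append(parent)
        parent = table[parent]["parent"]
    return chain

def with_population_at_most(table, limit):
    """Hard query: no link helps, so every record must be examined."""
    return sorted(name for name, rec in table.items()
                  if rec["population"] <= limit)
```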
60Network data model
- developed in the mid 1960s as part of the work of CODASYL (Conference on Data Systems Languages), which proposed the programming language COBOL (1960) and then the network model (1971)
- other aspects of database systems also proposed at this time include the database administrator, data security, and the audit trail
- the objective of the network model is to separate data structure from physical storage and to eliminate unnecessary duplication of data, with its associated errors and costs
61Networked model (contd)
- uses the concepts of a data definition language and a data manipulation language
- uses the concept of m:n linkages or relationships
- an owner record can have many member records
- a member record can have several owners
- the hierarchical model allows only 1:n
- example of a network database
- a hospital database has three record types
- patient: name, date of admission, etc.
62Networked model (contd)
- doctor: name, etc.
- ward: number of beds, name of staff nurse, etc.
- need to link patients to doctor, also to ward
- doctor record can own many patient records
- patient record can be owned by both doctor and ward records
- network DBMSs include methods for building and redefining linkages, e.g. when a patient is assigned to a ward
63Problems with the networked model
- links between records of the same type are not allowed
- while a record can be owned by several records of different types, it cannot be owned by more than one record of the same type (a patient can have only one doctor, only one ward)
64Relational database model
- the most popular DBMS model for GIS
- Used by ArcInfo
- its flexible approach to linkages between records comes closest to modeling the complexity of spatial relationships between objects
- proposed by IBM researcher E.F. Codd in 1970
- more of a concept than a data structure
- internal architecture varies substantially from one RDBMS to another
65Relational databases (contd)
- each record has a set of attributes
- the range of possible values (domain) is defined for each attribute
- records of each type form a table or relation
- each row is a record or tuple
- each column is an attribute
- note the potential confusion: a "relation" is a table of records, not a linkage between records
- the degree of a relation is the number of attributes in the table
66Relational databases (contd)
- 1 attribute is a unary relation
- 2 attributes is a binary relation
- n attributes is an n-ary relation
- Examples
- unary: COURSES(SUBJECT)
- binary: PERSONS(NAME,ADDRESS), OWNER(PERSON NAME,HOUSE ADDRESS)
- ternary: HOUSES(ADDRESS,PRICE,SIZE)
67How a relational database works
- a key of a relation is a subset of attributes with the following properties
- unique identification
- The value of the key is unique for each tuple
- nonredundancy
- no attribute in the key can be discarded without destroying the key's uniqueness
- A prime attribute of a relation is an attribute which participates in at least one key
- All other attributes are non-prime
68Relational database key example
- For example, a phone number is a unique key in a phone directory
- in the normal phone directory the key attributes are last name, first name, street address
- if street address is dropped from this key, the key is no longer unique (many "Smith, Mary" entries)
69Pros and cons
- the most flexible of the database models
- no obvious match of implementation to model: the model is the user's view, not the way the data is organized internally
- is the basis of an area of formal mathematical theory
70Pros and cons (contd)
- most RDBMS data manipulation languages require the user to know the contents of relations, but allow access from one relation to another through common attributes
- Example: given two relations
- PROPERTY(ADDRESS,VALUE,COUNTY_ID)
- COUNTY(COUNTY_ID,NAME,TAX_RATE)
- to answer the query "what are the taxes on property x" the user would
71Pros and cons (contd)
- retrieve the property record
- link the property and county records through the common attribute COUNTY_ID
- compute the taxes by multiplying VALUE from the property tuple with TAX_RATE from the linked county tuple
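A minimal Python sketch of the tax query, with each relation held as a list of tuples represented by dictionaries. The addresses, county names, values, and tax rates are made up; only the relation and attribute names come from the example.

```python
# PROPERTY(ADDRESS, VALUE, COUNTY_ID) with hypothetical tuples.
PROPERTY = [
    {"ADDRESS": "12 Elm St", "VALUE": 200000, "COUNTY_ID": 1},
    {"ADDRESS": "9 Oak Ave", "VALUE": 150000, "COUNTY_ID": 2},
]

# COUNTY(COUNTY_ID, NAME, TAX_RATE) with hypothetical tuples.
COUNTY = [
    {"COUNTY_ID": 1, "NAME": "County A", "TAX_RATE": 0.01},
    {"COUNTY_ID": 2, "NAME": "County B", "TAX_RATE": 0.02},
]

def taxes_on(address):
    """Answer 'what are the taxes on property x' by linking the two relations."""
    # Step 1: retrieve the property record.
    prop = next((p for p in PROPERTY if p["ADDRESS"] == address), None)
    if prop is None:
        return None
    # Step 2: link to the county record through the common attribute COUNTY_ID.
    county = next(c for c in COUNTY if c["COUNTY_ID"] == prop["COUNTY_ID"])
    # Step 3: multiply VALUE by TAX_RATE.
    return prop["VALUE"] * county["TAX_RATE"]
```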
72Data Interpolation
- Process of estimating the value of data at a
location where no data were collected
73Example
- Elevation data are collected at points
- Carbon Dioxide data were collected at points
74Triangulated Irregular Network
- TIN is composed of
- nodes
- edges
- triangles
- hull polygons
- topology
75Nodes
- The fundamental building blocks of the TIN
- They originate from the points and arc vertices contained in the input data sources.
- Every node is incorporated in the TIN triangulation.
- Every node in the TIN surface model must have a z value.
76Creating Triangles
Delaunay criterion: if you draw a circle through the three points of a triangle, no other point may fall within the circle
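A minimal Python sketch of the circle test, assuming points are plain (x, y) tuples. The function names are hypothetical. It computes the circle through a triangle's three points and checks that no other point lies strictly inside it.

```python
def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared) of the circle through a, b, c."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("the three points are collinear")
    a2, b2, c2 = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    radius_squared = (ax - ux) ** 2 + (ay - uy) ** 2
    return ux, uy, radius_squared

def satisfies_delaunay(triangle, other_points):
    """True when no other point falls inside the triangle's circle."""
    ux, uy, radius_squared = circumcircle(*triangle)
    return all((px - ux) ** 2 + (py - uy) ** 2 >= radius_squared
               for px, py in other_points)

triangle = ((0, 0), (4, 0), (0, 4))
```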
77Take the points, and draw triangles
78Edges
- Every node is joined with its nearest neighbors by edges to form triangles which satisfy the Delaunay criterion.
- Each edge has two nodes, but a node may have two or more edges.
79Triangles
- Each facet describes the behavior of a portion of the TIN's surface.
- The x,y,z coordinate values of a triangle's three nodes can be used to derive information about the facet, such as slope, aspect, surface area, and surface length.
80Hulls
- The hull is formed by one or more polygons containing the entire set of data points used to construct the TIN.
- The hull polygons define the zone of interpolation of the TIN.
81Hull Types
Convex Hull
Hull
82Topology
- Topology is maintained with information on each triangle's nodes, edge numbers and type, and adjacency to other triangles.
- For each triangle, the TIN records
- The triangle number
- The numbers of each adjacent triangle
- The three nodes defining the triangle
- The x,y coordinates of each node
- The surface z value of each node
- Also records the series of nodes that make up the hull
83Example
84Triangle topology
85Coordinates and attributes
86What can we do with TINs
- Calculate slope, aspect, and elevation at any point on the surface
- Because edges have a node with a z value at each end, it is possible to calculate a slope along the edge from one node to the other.
87Calculating elevation along an edge
Using IDW (inverse distance weighting), we can estimate the elevation value at any point along an edge
New Z = (100 × 0.5) + (80 × 0.5) = 90
88In fact, we can estimate a z value anywhere in our TIN
(90 × 0.6) + (80 × 0.4) = 54 + 32 = 86
(86 × 0.5) + (90 × 0.5) = 43 + 45 = 88
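A minimal Python sketch of the weighting used in these calculations, assuming inverse distance weights with power 1 between two known values. The function name and the distances in the usage note are hypothetical. With only two known points, each weight reduces to the other point's share of the total distance, so the nearer point counts for more.

```python
def interpolate_z(z_a, z_b, dist_to_a, dist_to_b):
    """Estimate z between two known values A and B.

    Inverse distance weights (1/d) are normalized so they sum to 1, which
    gives weight_a = dist_to_b / total and weight_b = dist_to_a / total.
    """
    total = dist_to_a + dist_to_b
    if total == 0:
        raise ValueError("the two known points coincide")
    weight_a = dist_to_b / total
    weight_b = dist_to_a / total
    return z_a * weight_a + z_b * weight_b

# Midpoint between nodes with z = 100 and z = 80.
midpoint_z = interpolate_z(100, 80, 5, 5)
```

For example, `interpolate_z(90, 80, 4, 6)` weights the two values 0.6 and 0.4, and `interpolate_z(86, 90, 5, 5)` weights them equally.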
89Now, imagine doing this for each square dot
location
90What you've created is a surface called a lattice
- Lattice: rows and columns of continuous values
91Comparison
Square Grid Tessellation
Lattice
92Four sources
- Data that are collected in a raster format (e.g., satellite data)
- Data in vector format converted to raster format
- Data in a paper map converted to raster format
- Interpolating data from points
Method of converting our TIN into a lattice
93In class exercise
94Network Data Model
97Some Terms
- Edges (links): streets, transmission lines, pipes, and streams
- Junctions (nodes): street intersections, fuses, switches, service taps, and the confluence of stream reaches are examples of junctions
- Impedance: the cost associated with traveling along a specific link
98- Edges connect together at junctions
- Flow (automobiles, electrons, water) can be transferred from one edge to another edge through junctions
- Impedance can be applied to edges or junctions
99Types of networks
- Straight network (animal movement)
- Branching network (stream patterns)
- Circuit (street patterns)
- Directed: flows can move in a single direction
- Undirected: flows can move in either direction
100Analysis of networks
- Connectivity
- gamma index: ratio of the number of links in a network to the maximum possible
- alpha index: ratio of the number of circuits (alternative routes) through a network to the maximum possible
- Shortest path
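A minimal Python sketch of the two connectivity indices. The slide gives only the verbal definitions, so the formulas here are an assumption: they are the commonly used forms for a planar network (links cross only at nodes), where the maximum number of links is 3(v - 2) and the maximum number of circuits is 2v - 5 for v nodes.

```python
def gamma_index(links, nodes):
    """Observed links divided by the maximum possible in a planar network."""
    if nodes < 3:
        raise ValueError("need at least 3 nodes")
    return links / (3 * (nodes - 2))

def alpha_index(links, nodes, subgraphs=1):
    """Observed circuits divided by the maximum possible in a planar network."""
    if nodes < 3:
        raise ValueError("need at least 3 nodes")
    circuits = links - nodes + subgraphs
    return circuits / (2 * nodes - 5)

# Hypothetical network with 5 nodes and 6 links.
example_gamma = gamma_index(6, 5)
example_alpha = alpha_index(6, 5)
```

Both indices run from 0 (minimal connection) to 1 (every possible link or circuit is present).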
101Gamma index
102Gamma Index
105Algorithm Terms
- Nodes
- Origin node
- Adjacent node
- Reached node
- Scanned node
- Unscanned node
- Cumulative Cost
- Tables
- Scanned table
- Reached table Cumulative Cost
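A minimal Python sketch of a shortest-path search that uses these terms. The slides do not name the algorithm, so treating it as a Dijkstra-style search is an assumption, and the four-node network with its impedances is hypothetical. The reached table holds the best cumulative cost found so far; a node moves to the scanned table once its cost is final.

```python
import heapq

def shortest_path_costs(network, origin):
    """Return the cumulative cost from the origin node to every reachable node.

    network maps each node to {adjacent_node: impedance}.
    """
    reached = {origin: 0}   # reached table: best cumulative cost so far
    scanned = {}            # scanned table: nodes whose cost is final
    queue = [(0, origin)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node in scanned:
            continue        # an older, more expensive entry for this node
        scanned[node] = cost
        for adjacent, impedance in network.get(node, {}).items():
            new_cost = cost + impedance
            if adjacent not in scanned and new_cost < reached.get(adjacent, float("inf")):
                reached[adjacent] = new_cost
                heapq.heappush(queue, (new_cost, adjacent))
    return scanned

# Hypothetical undirected network; numbers are impedances.
network = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 4, "C": 1, "D": 5},
    "C": {"A": 2, "B": 1, "D": 8},
    "D": {"B": 5, "C": 8},
}
costs = shortest_path_costs(network, "A")
```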
108Step One
109Step Two
110Step Three
111Step Four
112Step Five
113Step Six
114Step Seven
115Step Eight
116Step Nine
117Step Ten
118Step Eleven
119Step Twelve
120Step Thirteen
121Break time!
122Raster data
123What type of data?
- Continuous data
- Examples: elevation, temperature
- Square grid tessellation also called raster
124Raster Models (tessellation)
125Raster
Data values are stored in rows and columns
126Two types
- Scanned Map images
- Digital Raster Graphic
- Other maps
- Tessellation Models
- Square Grid Tessellation
- Hexagon Tessellation
127Scanned Maps
- A scanned map is like a photograph
- The value of each cell represents a color on the map, which needs to be interpreted the way a paper/analog map is interpreted
128Digital Raster Graphic (DRG)
There is typically another file linked with the
DRG, so that the geographic position of the
graphic is known
129MapQuest
130Maps or Images??
131Summary of scanned maps
- Have the characteristics of an analog map in that the location information and the attributes are stored as a visual product
- No queries can be made based on the database
132Tessellation Models
- Location-based spatial data model: the process of dividing an area into smaller, contiguous tiles with no gaps between them
- Types
- regular and irregular
- Uses: continuous surfaces
- Pros: easy to implement and manipulate
- Cons: high data storage, output not cartographic quality
133Spatial and Attribute Data
- Combined in a single file
- Unlike the scanned maps, they can be searched
134Tessellation Models
Most common
Rarely used
135Tessellation models
Regular grid
136Data
- Rows and columns containing the attribute value associated with each data layer
- The row/column location of the data value represents the spatial position
- Exact geographic position is typically established with header information before the rows and columns of data
- Also need knowledge of what the values represent (e.g., elevation in meters), typically part of the metadata
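A minimal Python sketch of how header information turns a row/column position into a geographic position. It assumes a north-up grid with square cells, row 0 at the top, and a header that supplies the western edge (`x_origin`), the northern edge (`y_max`), and the cell size; these parameter names and the sample numbers are hypothetical.

```python
def cell_center(row, col, x_origin, y_max, cell_size):
    """Return the x, y coordinate of the center of cell (row, col)."""
    x = x_origin + (col + 0.5) * cell_size
    y = y_max - (row + 0.5) * cell_size   # rows count downward from the top
    return x, y

def cell_at(x, y, x_origin, y_max, cell_size):
    """Return the (row, col) of the cell that contains coordinate x, y."""
    col = int((x - x_origin) // cell_size)
    row = int((y_max - y) // cell_size)
    return row, col

# Hypothetical header: west edge 1000, north edge 2000, 30 m cells.
center = cell_center(0, 0, x_origin=1000, y_max=2000, cell_size=30)
```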
137Rows and Columns
138Geographic Position
origin
orientation
size of each cell
141Sample data
142Each cell has a value
143Data File
Origin (x,y) Ymax (x,y) Row,col Cell size
144Tessellation models
Hexagonal mesh
The primary advantage over square grid tessellation is in distance measurements. This is important in applications that need distances to spread evenly, e.g., the spread of forest fires.
145Distance between adjacent cells?
Example: modeling the spread of a fire from one cell to the next adjacent cell.
146Distance measurements between adjacent cells are the same in the hexagon model
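A minimal Python sketch comparing center-to-center distances to adjacent cells in the two tessellations. It assumes unit spacing and uses axial (q, r) coordinates for the hexagons, which is one common convention rather than anything specified in the slides. A square cell has two different neighbor distances (edge and diagonal), while a hexagon has one.

```python
import math

def square_neighbor_distances(cell_size=1.0):
    """Distances from a square cell to its 8 surrounding cells."""
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    return [math.hypot(dr, dc) * cell_size for dr, dc in offsets]

# The six neighbors of a hexagon in axial (q, r) coordinates.
HEX_NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hexagon_neighbor_distances(spacing=1.0):
    """Distances from a hexagonal cell to its 6 surrounding cells."""
    distances = []
    for q, r in HEX_NEIGHBORS:
        # Convert the axial offset to x, y before measuring.
        x = spacing * (q + r / 2)
        y = spacing * (math.sqrt(3) / 2) * r
        distances.append(math.hypot(x, y))
    return distances

square_distances = sorted(set(round(d, 3) for d in square_neighbor_distances()))
hexagon_distances = sorted(set(round(d, 3) for d in hexagon_neighbor_distances()))
```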
147Where do we get raster data? Four sources
- Data that are collected in a raster format (e.g., satellite data)
- Data in vector format converted to raster format
- Data in a paper map converted to raster format
- DRG
- Converted into a tessellation database
- Interpolating data from points
148One: satellite data
- Example: Landsat Thematic Mapper (TM) data from USGS
151Multispectral
- Multispectral means that each cell has more than one value associated with it, one for each of several sections of the electromagnetic spectrum (these are called bands)
152Bands and Resolution
- Fixed spatial resolution (either 30 meters or 120 meters) depending on the band
Landsats 4-5
Band    Wavelength (micrometers)    Resolution (meters)
1       0.45-0.52                   30
2       0.52-0.60                   30
3       0.63-0.69                   30
4       0.76-0.90                   30
5       1.55-1.75                   30
6       10.40-12.50                 120
7       2.08-2.35                   30
153What can we do with the bands?
- Band 1 penetrates water for bathymetric mapping along coastal areas and is useful for soil-vegetation differentiation and for distinguishing forest types.
- Band 2 detects green reflectance from healthy vegetation.
- Band 3 is designed for detecting chlorophyll absorption in vegetation.
- Band 4 data is ideal for detecting near-IR reflectance peaks in healthy green vegetation and for detecting water-land interfaces.
- The two mid-IR bands (bands 5 and 7) are useful for vegetation and soil moisture studies and for discriminating between rock and mineral types.
- The thermal-IR band (band 6) is designed to assist in thermal mapping, and is used for soil moisture and vegetation studies.
154False color
- Bands 4, 3, and 2 can be combined to make false-color composite images in which band 4 is displayed as red, band 3 as green, and band 2 as blue.
- This combination makes vegetation appear as shades of red, with brighter reds indicating more vigorously growing vegetation.
- Soils with no or sparse vegetation range from white (sands) to greens or browns depending on moisture and organic matter content.
- Water bodies appear blue. Deep, clear water appears dark blue to black, while sediment-laden or shallow waters appear lighter in color.
- Urban areas appear blue-gray.
- Clouds and snow appear bright white; they are usually distinguishable from each other by the shadows associated with clouds.
155False Color Example
156False Color example
157False Color example
158With the same data (NDVI)
Normalized difference vegetation index
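A minimal Python sketch of the NDVI calculation, using the standard formula NDVI = (NIR - Red) / (NIR + Red). For Landsat TM the near-IR band is band 4 and the red band is band 3. The cell values below are hypothetical, and returning None where both bands are zero is a choice made for this sketch.

```python
def ndvi(nir_band, red_band):
    """Compute NDVI cell by cell for two rasters of the same shape."""
    result = []
    for nir_row, red_row in zip(nir_band, red_band):
        row = []
        for nir, red in zip(nir_row, red_row):
            total = nir + red
            # Avoid dividing by zero where both bands are empty.
            row.append((nir - red) / total if total != 0 else None)
        result.append(row)
    return result

# Hypothetical 2 x 2 rasters of band 4 (near-IR) and band 3 (red) values.
band4 = [[50, 30], [80, 0]]
band3 = [[10, 30], [20, 0]]
index = ndvi(band4, band3)
```

Values near 1 indicate vigorous vegetation, while values near 0 or below indicate little or none.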
159Where do we get raster data? Four sources
- Data that are collected in a raster format (e.g., satellite data)
- Data in vector format converted to raster format
- Data in a paper map converted to raster format
- DRG
- Converted into a tessellation database
- Interpolating data from points
160Second source for raster data
- Data that are in another format (either vector or
paper map) and need to be converted to a raster
format
161Land use in vector format
To convert it, we need to decide what size each
cell needs to be. How do we decide? Minimum
mapping unit and spatial resolution.
162Sort the database
163Minimum mapping unit
164Better
This would give us a 2 m cell size
165Default settings
166Resulting data
167Resulting data
755 (default)
168200 meters
169100 meters
17010 meters
174Which is best?
vector
100 meter
10 meter
175Area and database size comparisons
176Three: conversion from a paper map
- Scanning can convert to a DRG or into a square grid or hexagon database
- Same rules apply as with vector scanning: the best approach is to trace to mylar, then scan
- (in my personal experience, it is easier to vector digitize, then use software to convert to raster format)
Note: with scanning you can create either a DRG or a tessellation database
177Database size can be a problem: compaction
Run-length encoding
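A minimal Python sketch of run-length encoding for one raster row, assuming each run is stored as a (value, count) pair. The function names and the sample row are hypothetical. Long runs of the same value, such as background cells, shrink to a single pair.

```python
from itertools import groupby

def rle_encode(row):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original row."""
    return [value for value, count in pairs for _ in range(count)]

row = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
encoded = rle_encode(row)   # 12 cell values become 2 pairs
```

A row with no repeated neighbors gains nothing from this scheme, since every cell becomes its own pair.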
178In some cases, there is very little you can do
179Four sources
- Data that are collected in a raster format (e.g., satellite data)
- Data in vector format converted to raster format
- Data in a paper map converted to raster format
- Interpolating data from points