Title: Playing with Spaghetti: Vector and Raster Data Models in Depth


1
Playing with Spaghetti: Vector and Raster Data
Models in Depth
  • Talbot J. Brooks
  • ASU Dept. of Geography

2
Tonight's topics
  • Why we were gone
  • Big picture overview: Raster vs. Vector
  • The details: Vector data models
  • The details: Raster data models

3
Review: you tell me
  • What is the difference between vector and raster
    data?
  • Basic vector data types
  • Examples of raster data
  • Computer file structures
  • Flat
  • Hierarchical
  • Network
  • Relational

4
RASTER AND VECTOR FORMATS
RASTER: grid-based, simplifies reality
VECTOR: analog map, cartography
5
DATA MODEL OF RASTER AND VECTOR
[Figure: the REAL WORLD shown as a GRID (RASTER), with rows and columns numbered 1-10, and as VECTOR features]
6
RASTER DATA MODEL
  • derived from a formulation of the real world as a
    set of spatial elements that objects fill
  • the real world is represented with uniform cells
  • the set of cells forms a rectangle
  • cells can also be triangles, hexagons, or
    higher-complexity shapes
  • each cell reports its own characteristics
  • a single cell does not represent an object
  • an object is represented by a group of cells

7
Creating a Raster
[Figure: three panels. Reality - Hydrography (Lake, River, Pond); Reality overlaid with a grid; Resulting raster, with cell values 0 = No Water Feature, 1 = Water Body, 2 = River]
8
VECTOR DATA MODEL
  • derived from a formulation of spatial concepts
    that emphasizes real-world objects
  • the geometric primitives of the vector data model
    are point, line, and polygon
  • objects can be built from these primitives
  • an object's location is determined by the points
    that represent it
  • the uniqueness of the vector data model lies in
    how it manages and stores the geometric
    primitives
  • spaghetti model
  • topology model

9
VECTOR CHARACTERISTICS
POINT X LINE POLYGON
10
RASTER TO VECTOR
RIVER CHANGED FROM RASTER TO VECTOR FORMAT
[Figure labels: ORIGINAL RIVER; RIVER THAT HAS BEEN VECTORISED]
11
PRO AND CONS OF RASTER MODEL
  • pros
  • raster data is more affordable
  • simple data structure
  • very efficient overlay operations
  • cons
  • topological relationships are difficult to
    implement
  • raster data requires large storage
  • not all real-world phenomena map directly onto a
    raster representation
  • raster data is mainly obtained from satellite
    images and scanning

12
PRO AND CONS OF VECTOR MODEL
  • pro
  • more efficient data storage
  • topological encoding is more efficient
  • suitable for most usage and compatible with data
  • good graphic presentation
  • cons
  • overlay operation not efficient
  • complex data structure

13
A look behind the scenes: Vector GIS data models
  • Spaghetti model
  • Topological vector model
  • Cardinality (this is gonna hurt!)
  • Break

14
The Spaghetti Model
  • The spaghetti model is the simplest vector
    data model
  • The model is a direct representation of a
    graphical image
  • NO explicit topological information

15
Spaghetti Model
  • Description: direct line-for-line translation of
    the paper map (often viewed as raw digital data)
  • Pros: easy to implement, good for fast drawing
  • Cons: storage and searches are sequential;
    storage of attribute data

16
Spaghetti model
17
Topology
  • Branch of mathematics dealing with geometric
    properties
  • Some geometric properties of objects remain
    invariant under transformations (e.g., stretching
    or bending)
  • Neighborhood relationships remain the same
  • Topology is the distinguishing basis for more
    complicated vector models

18
Topological Vector Model
  • Topological data models are provided with
    information that can help us in obtaining
    solutions to common operations in advanced GIS
    analytical techniques.
  • This is done by explicitly recording adjacency
    information into the data structure, eliminating
    the need to determine it for multiple operations.
  • Each line segment, the basic logical entity in
    topological data structures, begins and ends when
    it either contacts or intersects another line, or
    when there is a change in direction of the line.

19
Topological Vector Model
  • Each line has two sets of numbers: a pair of
    coordinates and an associated node number.
  • Each line segment has its identification number
    that is used as a pointer to indicate which set
    of nodes represent its beginning and ending.

20
Topological Vector Model
  • Polygons also have identification codes that
    relate back to the link numbers. Each link in
    the polygon can now look left and right at the
    polygon numbers to see which two polygons it
    separates. The left and right polygons are also
    stored explicitly, so that even this tedious
    step is eliminated.
  • The Topological data model more closely
    approximates how we as map readers identify the
    spatial relationships contained in an analog map
    document.

21
Topological Vector Model
22
How do we preserve topology in a computer
database?
  • What are we storing?
  • Points, lines, polygons
  • What do we need to preserve?
  • Neighborhood relationships between these objects
  • Terminology
  • point, link, node, polygon

23
Terminology
  • Point: x, y coordinate identifying a geographic
    location
  • Link (line, arc): an ordered set of points with a
    node at the beginning and end of it
  • Node: the beginning and end of a link (often
    defined where 3 or more lines connect)
  • Polygon: two or more links connected at the
    nodes; contains a point inside to identify the
    polygon's attributes
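
The terminology above can be sketched as a small arc-node structure. This is only an illustration in Python, not the storage format of any particular GIS: the `Link` class, the node and link numbers, the polygon names A and B, and all coordinates are hypothetical. Because each link stores its left and right polygons, adjacency can be read from the table without any geometric computation.

```python
from dataclasses import dataclass

@dataclass
class Link:
    link_id: int
    from_node: int    # node where the link begins
    to_node: int      # node where the link ends
    left_poly: str    # polygon on the left when travelling from_node -> to_node
    right_poly: str   # polygon on the right
    points: list      # ordered (x, y) points, including both end nodes

# Hypothetical data: polygons A and B share link 2; "OUT" is the outside area.
nodes = {1: (0.0, 0.0), 2: (0.0, 2.0)}
links = [
    Link(1, 1, 2, "OUT", "A", [(0.0, 0.0), (-2.0, 1.0), (0.0, 2.0)]),
    Link(2, 1, 2, "A", "B", [(0.0, 0.0), (0.0, 2.0)]),
    Link(3, 1, 2, "B", "OUT", [(0.0, 0.0), (2.0, 1.0), (0.0, 2.0)]),
]

def adjacent_polygons(links):
    """Pairs of polygons that share a link, read directly from the link table."""
    return {frozenset((l.left_poly, l.right_poly)) for l in links}

def links_of(poly, links):
    """Ids of the links that bound a polygon."""
    return [l.link_id for l in links if poly in (l.left_poly, l.right_poly)]
```

For example, `links_of("A", links)` lists the links bounding polygon A, and `adjacent_polygons(links)` contains the pair A and B because link 2 has A on its left and B on its right.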

24
Nevada
Utah
California
Arizona
25
Identify the polygons
26
Create the polygon attribute table (PAT)
27
Identify the nodes
28
Node table
29
Identify the links (arcs, lines)
30
Simplify this
31
Create the topology!
32
Nodes First
33
Nodes First
34
Polygons
35
Polygons
36
Identify the points
37
Link List
38
Point Coordinates
39
Putting it all together
40
Putting it all together
41
Putting it all together
42
Putting it all together
43
Putting it all together
44
Cardinality
  • Cardinality is the relationship between spatial
    objects, attributes, or spatial objects and
    attributes.
  • This relationship may be defined as:
  • 1:1
  • 1:many
  • many:many

45
Cardinality
  • We can use cardinality to establish relationships
    and rules among objects and attributes
  • This becomes the basis for modeling how data is
    arranged within a GIS - especially one that uses
    vector data.

46
Cardinality (contd)
  • Entity-entity relationships are described by
    cardinality, which may be
  • One to one. A FOREST can have only one MANAGER
    and a MANAGER can have only one FOREST
  • Many to one. Many FACILITIES may be contained
    within one FOREST
  • Many to many. The relationship water_supply may
    have many entries and may be connected to many
    entries in FACILITIES, FOREST, etc.

47
Cardinality (contd)
  • The same concept applies to space
  • A bathroom is located within a house (1:1)
  • Many homes are within a town (many:1)
  • Many people are within many homes (many:many)

48
Diagram Characteristics
  • Boxes represent entities
  • Ovals represent attributes
  • Diamonds represent relationships
  • Note how cardinality is depicted
  • Key attributes are underlined
  • Multi-valued attributes are in double ovals

49
Entity-Relationship (ER) Diagrams: A Conceptual
Model
50
Exercise: work in pairs, 10 minutes
  • Create a simple ER diagram for your neighborhood
  • Pick a feature that matches each geometry type
    (point, line, polygon). For example:
  • For points, you might pick fire hydrants and lamp
    posts
  • For lines, you might pick streets and water mains
  • For polygons, pick parcels or zip codes

51
Explanation of database types
  • a database is a collection of non-redundant data
    which can be shared by different application
    systems
  • implies separation of physical storage from use
    of the data by an application program, i.e.
    program/data independence
  • changes can be made to data without affecting
    other components of the system.

52
Database types
  • tabular ("flat file") - data in a single table
  • hierarchical
  • network
  • relational

53
The ideal GIS database is one that maximizes the
uniqueness of every feature while minimizing
total data quantity
54
Hierarchical databases
  • Developed in the 1960s by International Business
    Machines (IBM)
  • Somewhat resembles real-world filing systems
  • Tree-structured, similar to folder arrangements
    in a computer directory
  • The database keeps track of the different record
    types, their attributes, and the hierarchical
    relationships between them
  • The attribute which assigns records to levels in
    the database structure is called the key (e.g. is
    record a department, part or supplier?)

55
Features of a hierarchical model
  • a set of record "types"
  • e.g. supplier record type, department record
    type, part record type
  • a set of links connecting all record types in one
    data structure diagram (tree)
  • at most one link between two record types, hence
    links need not be named
  • for every record, there is only one parent record
    at the next level up in the tree

56
Features (contd)
  • e.g. every county has exactly one state, every
    part has exactly one department
  • no connections between occurrences of the same
    record type
  • cannot go between records at the same level
    unless they share the same parent
  • diagram

57
Pros and cons
  • data must possess a tree structure
  • tree structure is natural for geographical data
  • data access is easy via the key attribute, but
    difficult for other attributes
  • in the business case, easy to find record given
    its type (department, part or supplier)
  • in the geographical case, easy to find record
    given its geographical level (state, county,
    city, census tract), but difficult to find it
    given any other attribute

58
Pros and cons (contd)
  • e.g. find the records with population 5,000 or
    less
  • tree structure is inflexible
  • cannot define new linkages between records once
    the tree is established
  • e.g. in the geographical case, new relationships
    between objects
  • cannot define linkages laterally or diagonally in
    the tree, only vertically

59
Pros and cons (contd)
  • the only geographical relationships which can be
    coded easily are "is contained in" or "belongs
    to"
  • DBMSs based on the hierarchical model (e.g.
    System 2000) have often been used to store
    spatial data, but have not been very successful
    as bases for GIS

60
Network data model
  • developed in mid 1960s as part of work of CODASYL
    (Conference on Data Systems Languages), which
    proposed the programming language COBOL (1960) and
    then the network model (1971)
  • other aspects of database systems also proposed
    at this time include database administrator, data
    security, audit trail
  • objective of network model is to separate data
    structure from physical storage, eliminate
    unnecessary duplication of data with associated
    errors and costs

61
Networked model (contd)
  • uses concept of a data definition language, data
    manipulation language
  • uses concept of m:n linkages or relationships
  • an owner record can have many member records
  • a member record can have several owners
  • hierarchical model allows only 1:n
  • example of a network database
  • a hospital database has three record types
  • patient: name, date of admission, etc.

62
Networked model (contd)
  • doctor: name, etc.
  • ward: number of beds, name of staff nurse, etc.
  • need to link patients to doctor, also to ward
  • doctor record can own many patient records
  • patient record can be owned by both doctor and
    ward records
  • network DBMSs include methods for building and
    redefining linkages, e.g. when patient is
    assigned to ward

63
Problems with the networked model
  • links between records of the same type are not
    allowed
  • while a record can be owned by several records of
    different types, it cannot be owned by more than
    one record of the same type (patient can have
    only one doctor, only one ward)

64
Relational database model
  • the most popular DBMS model for GIS
  • Used by ArcInfo
  • flexible approach to linkages between records
    comes closest to modeling the complexity of
    spatial relationships between objects
  • proposed by IBM researcher E.F. Codd in 1970
  • more of a concept than a data structure
  • internal architecture varies substantially from
    one RDBMS to another

65
Relational databases (contd)
  • each record has a set of attributes
  • the range of possible values (domain) is defined
    for each attribute
  • records of each type form a table or relation
  • each row is a record or tuple
  • each column is an attribute
  • note the potential confusion - a "relation" is a
    table of records, not a linkage between records
  • the degree of a relation is the number of
    attributes in the table

66
Relational databases (contd)
  • 1 attribute is a unary relation
  • 2 attributes is a binary relation
  • n attributes is an n-ary relation
  • Examples
  • unary: COURSES(SUBJECT)
  • binary: PERSONS(NAME,ADDRESS), OWNER(PERSON
    NAME,HOUSE ADDRESS)
  • ternary: HOUSES(ADDRESS,PRICE,SIZE)

67
How a relational database works
  • a key of a relation is a subset of attributes
    with the following properties
  • unique identification
  • The value of the key is unique for each tuple
  • nonredundancy
  • no attribute in the key can be discarded without
    destroying the key's uniqueness
  • A prime attribute of a relation is an attribute
    which participates in at least one key
  • All other attributes are non-prime

68
Relational database key example
  • For example, a phone number is a unique key in a
    phone directory
  • in the normal phone directory the key attributes
    are last name, first name, street address
  • if street address is dropped from this key, the
    key is no longer unique (many entries for Smith,
    Mary)
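
The two key properties (unique identification and nonredundancy) can be checked mechanically. The sketch below is an illustration only; the function names and the four directory entries are made up for the example.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if the values of attrs identify every tuple (row) uniquely."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def is_key(rows, attrs):
    """True if attrs is unique and no attribute can be dropped (nonredundancy)."""
    if not is_unique(rows, attrs):
        return False
    # If any subset with one attribute removed is still unique, attrs is redundant.
    return all(not is_unique(rows, sub)
               for sub in combinations(attrs, len(attrs) - 1))

# Hypothetical phone directory entries.
directory = [
    {"last": "Smith", "first": "Mary", "street": "12 Oak St", "phone": "555-0101"},
    {"last": "Smith", "first": "Mary", "street": "9 Elm Ave", "phone": "555-0102"},
    {"last": "Smith", "first": "John", "street": "12 Oak St", "phone": "555-0103"},
    {"last": "Jones", "first": "Mary", "street": "9 Elm Ave", "phone": "555-0104"},
]
```

With this data, `("last", "first", "street")` passes `is_key`, while `("last", "first")` fails the uniqueness test because two entries share the name Smith, Mary.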

69
Pros and cons
  • the most flexible of the database models
  • no obvious match of implementation to model -
    model is the user's view, not the way the data is
    organized internally
  • is the basis of an area of formal mathematical
    theory

70
Pros and cons (contd)
  • most RDBMS data manipulation languages require
    the user to know the contents of relations, but
    allow access from one relation to another through
    common attributes
  • Example: given two relations
    PROPERTY(ADDRESS,VALUE,COUNTY_ID) and
    COUNTY(COUNTY_ID,NAME,TAX_RATE)
  • to answer the query "what are the taxes on
    property x" the user would

71
Pros and cons (contd)
  • retrieve the property record
  • link the property and county records through the
    common attribute COUNTY_ID
  • compute the taxes by multiplying VALUE from the
    property tuple with TAX_RATE from the linked
    county tuple
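
The three steps above amount to a join on the common attribute. A minimal sketch using Python's built-in `sqlite3` module follows; the relation and attribute names come from the slide, while the sample rows, the tax rate, and the `taxes_on` helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE county   (county_id INTEGER PRIMARY KEY, name TEXT, tax_rate REAL);
    CREATE TABLE property (address TEXT PRIMARY KEY, value REAL, county_id INTEGER);
    INSERT INTO county   VALUES (1, 'Example County', 0.02);
    INSERT INTO property VALUES ('10 Main St', 150000, 1);
""")

def taxes_on(address):
    """Retrieve the property, link it to its county, multiply VALUE by TAX_RATE."""
    row = conn.execute(
        "SELECT p.value * c.tax_rate "
        "FROM property AS p JOIN county AS c ON p.county_id = c.county_id "
        "WHERE p.address = ?",
        (address,),
    ).fetchone()
    return None if row is None else row[0]
```

The user only names the common attribute `county_id`; how the RDBMS finds the matching county tuple internally is not part of the model.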

72
Data Interpolation
  • Process of estimating the value of data at a
    location where no data were collected

73
Example
  • Elevation data are collected at points
  • Carbon Dioxide data were collected at points

74
Triangulated Irregular Network
  • TIN is composed of
  • nodes
  • edges
  • triangles
  • hull polygons
  • topology

75
Nodes
  • The fundamental building blocks of the TIN
  • They originate from the points and arc vertices
    contained in the input data sources.
  • Every node is incorporated in the tin
    triangulation.
  • Every node in the tin surface model must have a z
    value.

76
Creating Triangles
Delaunay criterion: if you draw a circle through
the three points of a triangle (its
circumcircle), no other point may fall within the
circle
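
One common way to test this criterion is the determinant form of the in-circle test, sketched below. This is an illustration, not necessarily how any given TIN software does it; the function names are hypothetical and points are plain (x, y) tuples.

```python
def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circle through a, b, and c."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = (
        (ax * ax + ay * ay) * (bx * cy - cx * by)
        - (bx * bx + by * by) * (ax * cy - cx * ay)
        + (cx * cx + cy * cy) * (ax * by - bx * ay)
    )
    # The sign of det flips with the winding order of a, b, c, so correct for it.
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return det * orient > 0

def satisfies_delaunay(triangle, points):
    """True if no other point falls inside the triangle's circumcircle."""
    a, b, c = triangle
    return not any(in_circumcircle(a, b, c, p)
                   for p in points if p not in triangle)
```

A triangle is kept only when `satisfies_delaunay` holds for it against all the other input points.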
77
Take the points, and draw triangles
78
Edges
  • Every node is joined with its nearest neighbors
    by edges to form triangles which satisfy the
    Delaunay criterion.
  • Each edge has two nodes, but a node may have two
    or more edges.

79
Triangles
  • Each facet describes the behavior of a portion of
    the tin's surface.
  • The x,y,z coordinate values of a triangle's three
    nodes can be used to derive information about the
    facet, such as slope, aspect, surface area, and
    surface length.
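
As a sketch of how slope and aspect can be derived from the three nodes, the function below fits the plane through them and reads off its gradient. The conventions are assumptions made for the example: x points east, y points north, and aspect is the downslope direction in degrees clockwise from north.

```python
import math

def slope_aspect(a, b, c):
    """a, b, c: (x, y, z) nodes of one triangle.
    Returns (slope_degrees, aspect_degrees); aspect is None for a flat facet."""
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    # Normal of the facet plane: cross product of two edge vectors.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz == 0:
        raise ValueError("facet is vertical or degenerate")
    dzdx, dzdy = -nx / nz, -ny / nz   # gradient of z over the facet
    slope = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    if dzdx == 0 and dzdy == 0:
        return slope, None
    # Downslope direction is opposite to the gradient.
    aspect = math.degrees(math.atan2(-dzdx, -dzdy)) % 360.0
    return slope, aspect
```

For a facet that rises toward the east at one unit of height per unit of distance, this gives a slope of about 45 degrees and a west-facing aspect of about 270 degrees.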

80
Hulls
  • The hull is formed by one or more polygons
    containing the entire set of data points used to
    construct the tin.
  • The hull polygons define the zone of
    interpolation of the tin.

81
Hull Types
Convex Hull
Hull
82
Topology
  • Topology is maintained with information on each
    triangle's nodes, edge numbers and type, and
    adjacency to other triangles.
  • For each triangle, TIN records
  • The triangle number
  • The numbers of each adjacent triangle
  • The three nodes defining the triangle
  • The x,y coordinates of each node
  • The surface z value of each node
  • Also records the series of nodes that make up
    the hull

83
Example
84
Triangle topology
85
Coordinates and attributes
86
What can we do with TINs
  • Calculate slope, aspect, and elevation at any
    point on the surface
  • Because edges have a node with a z value at each
    end, it is possible to calculate a slope along
    the edge from one node to the other.

87
Calculating elevation along an edge
Using IDW (inverse distance weighting), we can
estimate the elevation value at any point along an edge
New Z = (100 × 0.5) + (80 × 0.5) = 90
88
In fact, we can estimate a z value anywhere in
our TIN
(90 × 0.6) + (80 × 0.4) = 54 + 32 = 86
(86 × 0.5) + (90 × 0.5) = 43 + 45 = 88
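
A minimal sketch of inverse distance weighting, assuming a power of 1 and hypothetical distances (the slides show only weights and elevations): each known z value is weighted by the inverse of its distance to the new location, and the weights are normalized to sum to 1.

```python
def idw(samples, power=1.0):
    """samples: list of (distance, z) pairs. Returns the weighted z estimate."""
    for distance, z in samples:
        if distance == 0:
            return z  # the location coincides with a known node
    weights = [1.0 / distance ** power for distance, _ in samples]
    total = sum(weights)
    return sum(w * z for w, (_, z) in zip(weights, samples)) / total
```

Halfway between nodes with z = 100 and z = 80, both weights are 0.5 and the estimate is 90. With assumed distances of 2 and 3 to nodes with z = 90 and z = 80, the normalized weights are 0.6 and 0.4 and the estimate is 86.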
89
Now, imagine doing this for each square dot
location
90
What you've created is a surface called a lattice
  • Lattice: rows and columns of continuous values

91
Comparison
Square Grid Tessellation
Lattice
92
Four sources
  • Data that are collected in a raster format (e.g.,
    satellite data)
  • Data in vector format converted to raster format
  • Data in a paper map converted to raster format
  • Interpolating data from points

Method of converting our TIN into a Lattice
93
In class exercise
94
Network Data Model
95
(No Transcript)
96
(No Transcript)
97
Some Terms
  • Edges (links): streets, transmission lines, pipes,
    and streams
  • Junctions (nodes): street intersections, fuses,
    switches, service taps, and the confluences of
    stream reaches
  • Impedance: the cost associated with traveling
    along a specific link

98
  • Edges connect together at junctions
  • Flow passes from one edge to another through
    junctions
  • Whatever flows (automobiles, electrons, water)
    can be transferred to another edge
  • Impedance can be applied to edges or junctions

99
Types of networks
  • Straight network (animal movement)
  • Branching network (stream patterns)
  • Circuit (street patterns)
  • Directed: flows can move in a single direction
  • Undirected: flows can move in either direction

100
Analysis of networks
  • Connectivity
  • gamma index: ratio of the number of links in
    a network to the maximum possible
  • alpha index: ratio of the number of routes
    through a network to the maximum possible
  • Shortest path
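
The transcript does not give formulas for these indices. The sketch below uses the standard forms for a planar, connected network, which is an assumption about what the slides intended: at most 3(n - 2) links and at most 2n - 5 independent circuits for n nodes.

```python
def gamma_index(links, nodes):
    """Actual links divided by the maximum possible in a planar network."""
    if nodes < 3:
        raise ValueError("need at least 3 nodes")
    return links / (3 * (nodes - 2))

def alpha_index(links, nodes):
    """Actual circuits divided by the maximum possible (connected, planar)."""
    if nodes < 3:
        raise ValueError("need at least 3 nodes")
    return (links - nodes + 1) / (2 * nodes - 5)
```

A network with 5 nodes and 6 links has a gamma index of 6/9 and an alpha index of 2/5; a branching network with 5 nodes and 4 links has no circuits, so its alpha index is 0.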

101
Gamma index
102
Gamma Index
103
(No Transcript)
104
(No Transcript)
105
Algorithm Terms
  • Nodes
  • Origin node
  • Adjacent node
  • Reached node
  • Scanned node
  • Unscanned node
  • Cumulative Cost
  • Tables
  • Scanned table
  • Reached table Cumulative Cost
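
The step-by-step slides that follow have no transcript, so the sketch below is an assumption about the algorithm: a Dijkstra-style shortest-path search written with the terms listed above (origin node, adjacent nodes, a reached table with cumulative cost, and a scanned table). The example network and its impedances are hypothetical.

```python
def shortest_costs(graph, origin):
    """graph: {node: {adjacent_node: impedance}}.
    Returns the cumulative cost from origin to every node that can be reached."""
    reached = {origin: 0}   # reached table: node -> best cumulative cost so far
    scanned = {}            # scanned table: node -> final cumulative cost
    while reached:
        # Scan the reached (unscanned) node with the lowest cumulative cost.
        node = min(reached, key=reached.get)
        cost = reached.pop(node)
        scanned[node] = cost
        for adjacent, impedance in graph.get(node, {}).items():
            if adjacent in scanned:
                continue
            new_cost = cost + impedance
            if new_cost < reached.get(adjacent, float("inf")):
                reached[adjacent] = new_cost
    return scanned

# Hypothetical undirected street network with impedances on the links.
network = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 5},
    "C": {"A": 1, "B": 2},
    "D": {"B": 5},
}
```

Starting from A, node C is scanned first (cost 1), then B through C (cost 3), then D (cost 8).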

106
(No Transcript)
107
(No Transcript)
108
Step One
109
Step Two
110
Step Three
111
Step Four
112
Step Five
113
Step Six
114
Step Seven
115
Step Eight
116
Step Nine
117
Step Ten
118
Step Eleven
119
Step Twelve
120
Step Thirteen
121
Break time!
122
Raster data
123
What type of data?
  • Continuous data
  • Examples: elevation, temperature
  • Square grid tessellation also called raster

124
Raster Models (tessellation)
125
Raster
Data values are stored in rows and columns
126
Two types
  • Scanned Map images
  • Digital Raster Graphic
  • Other maps
  • Tessellation Models
  • Square Grid Tessellation
  • Hexagon Tessellation

127
Scanned Maps
  • Scanned map as a photograph
  • The value of each cell represents the color on
    the map, which needs to be interpreted the way a
    paper/analog map is interpreted

128
Digital Raster Graphic (DRG)
There is typically another file linked with the
DRG, so that the geographic position of the
graphic is known
129
MapQuest
130
Maps or Images??
131
Summary of scanned maps
  • Have the characteristics of an analog map in that
    the location information and the attributes are
    stored as a visual product
  • No queries can be made based on the database

132
Tessellation Models
  • Location-based spatial data model: the process of
    dividing an area into smaller, contiguous tiles
    with no gaps between them
  • Types
  • regular and irregular
  • Uses: continuous surfaces
  • Pros: easy to implement and manipulate
  • Cons: high data storage, output not cartographic
    quality

133
Spatial and Attribute Data
  • Combined in a single file
  • Unlike the scanned maps, they can be searched

134
Tessellation Models
  • Regular

Most common
Rarely used
135
Tessellation models
Regular grid
136
Data
  • Rows and columns containing the attribute value
    associated with each data layer
  • The row/column location of the data value
    represents the spatial position
  • Exact geographic position is typically
    established with header information before the
    rows and columns of data
  • Also need knowledge of what the values represent
    (e.g., elevation in meters); this is typically
    part of the metadata
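
As a sketch of how the header turns a row/column location into a geographic position, the functions below assume the origin is the upper-left corner of the grid, rows count downward, and the grid is not rotated. Real formats differ on these conventions, and the function names and coordinates are hypothetical.

```python
def cell_center(row, col, x_origin, y_origin, cell_size):
    """Geographic (x, y) of the centre of cell (row, col)."""
    x = x_origin + (col + 0.5) * cell_size
    y = y_origin - (row + 0.5) * cell_size
    return x, y

def cell_of(x, y, x_origin, y_origin, cell_size):
    """(row, col) of the cell that contains the geographic point (x, y)."""
    col = int((x - x_origin) // cell_size)
    row = int((y_origin - y) // cell_size)
    return row, col
```

With an origin of (500000, 4000000) and 30 m cells, the centre of cell (0, 0) is at (500015, 3999985).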

137
Rows and Columns
138
Geographic Position
origin
orientation
size of each cell
139
(No Transcript)
140
(No Transcript)
141
Sample data
142
Each cell has a value
143
Data File
Origin (x,y) Ymax (x,y) Row,col Cell size
144
Tessellation models
Hexagonal mesh
Primary advantage over square grid tessellation
is distance measurements. Important in
applications that need to spread distances evenly
- e.g., spread of forest fires
145
Distance between adjacent cells?
Example modeling the spread of a fire from one
cell to the next adjacent cell.
146
The distance between adjacent cells is the same
in the hexagon model
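
The difference can be checked numerically. The sketch below compares centre-to-centre distances to all neighbors of a cell in a square grid (8 neighbors) and in a hexagonal mesh (6 neighbors). The hexagon layout uses pointy-top axial coordinates, which is just one possible convention chosen for the example.

```python
import math

def square_neighbor_distances(cell_size=1.0):
    """Distances from a square cell to its 8 neighbors."""
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    return sorted(round(math.hypot(dr, dc) * cell_size, 6) for dr, dc in offsets)

def hex_neighbor_distances(size=1.0):
    """Distances from a hexagon to its 6 neighbors (size = centre to corner)."""
    axial = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]
    distances = []
    for q, r in axial:
        x = size * math.sqrt(3) * (q + r / 2)
        y = size * 1.5 * r
        distances.append(round(math.hypot(x, y), 6))
    return sorted(distances)
```

In the square grid the four diagonal neighbors are farther away than the four orthogonal ones (a factor of the square root of 2), while all six hexagon neighbors are at the same distance.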
147
Where do we get raster data? Four sources
  • Data that are collected in a raster format (e.g.,
    satellite data)
  • Data in vector format converted to raster format
  • Data in a paper map converted to raster format
  • DRG
  • Converted into a tessellation database
  • Interpolating data from points

148
One: satellite data
  • Example: Landsat Thematic Mapper (TM) data from
    USGS

149
(No Transcript)
150
(No Transcript)
151
Multispectral
  • Multispectral: each cell has more than one value
    associated with it, one for each section of the
    electromagnetic spectrum that is recorded
    (these are called bands)

152
Bands and Resolution
  • Fixed spatial resolution (either 30 meters or 120
    meters) depending on the band

Landsats 4-5
Band     Wavelength (micrometers)   Resolution (meters)
Band 1   0.45-0.52                  30
Band 2   0.52-0.60                  30
Band 3   0.63-0.69                  30
Band 4   0.76-0.90                  30
Band 5   1.55-1.75                  30
Band 6   10.40-12.50                120
Band 7   2.08-2.35                  30
153
What can we do with the bands?
  • Band 1 penetrates water for bathymetric mapping
    along coastal areas and is useful for
    soil-vegetation differentiation and for
    distinguishing forest types.
  • Band 2 detects green reflectance from healthy
    vegetation.
  • Band 3 is designed for detecting chlorophyll
    absorption in vegetation.
  • Band 4 data is ideal for detecting near-IR
    reflectance peaks in healthy green vegetation and
    for detecting water-land interfaces.
  • The two mid-IR bands (bands 5 and 7) are
    useful for vegetation and soil moisture studies
    and for discriminating between rock and mineral
    types.
  • The thermal-IR band (band 6) is designed to
    assist in thermal mapping, and is used for soil
    moisture and vegetation studies.

154
False color
  • Bands 4, 3, and 2 can be combined to make
    false-color composite images where band 4
    represents the red, band 3 represents the green,
    and band 2 represents the blue portions of the
    electromagnetic spectrum. This combination makes
    vegetation appear as shades of red, brighter reds
    indicating more vigorously growing vegetation.
    Soils with no or sparse vegetation range from
    white (sands) to greens or browns depending on
    moisture and organic matter content. Water bodies
    will appear blue. Deep, clear water appears dark
    blue to black in color, while sediment-laden or
    shallow waters appear lighter in color. Urban
    areas appear blue-gray in color. Clouds and snow
    appear bright white. Clouds and snow are usually
    distinguishable from each other by the shadows
    associated with clouds.

155
False Color Example
156
False Color example
157
False Color example
158
With the same data (NDVI)
Normalized difference vegetation index
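
NDVI is computed per cell as (NIR - Red) / (NIR + Red). For Landsat TM that is band 4 (near-IR) and band 3 (red). The sketch below works on plain nested lists; the cell values are invented, and returning 0.0 where both bands are zero is an arbitrary choice made for the example.

```python
def ndvi(red, nir):
    """red, nir: equally sized grids (lists of rows) of band values.
    Returns a grid of NDVI values between -1 and 1."""
    result = []
    for red_row, nir_row in zip(red, nir):
        out_row = []
        for r, n in zip(red_row, nir_row):
            total = n + r
            # Avoid division by zero where both bands are 0.
            out_row.append((n - r) / total if total else 0.0)
        result.append(out_row)
    return result

# Hypothetical 2 x 2 band grids.
band3_red = [[50, 100], [0, 80]]
band4_nir = [[150, 100], [0, 20]]
```

Cells with dense healthy vegetation reflect strongly in the near-IR and absorb red, so they come out close to 1; here the first cell gives 0.5 and the last gives -0.6.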
159
Where do we get raster data? Four sources
  • Data that are collected in a raster format (e.g.,
    satellite data)
  • Data in vector format converted to raster format
  • Data in a paper map converted to raster format
  • DRG
  • Converted into a tessellation database
  • Interpolating data from points

160
Second source for raster data
  • Data that are in another format (either vector or
    paper map) and need to be converted to a raster
    format

161
Land use in vector format
To convert it, we need to decide what size each
cell needs to be. How do we decide? Minimum
mapping unit and spatial resolution.
162
Sort the database
163
Minimum mapping unit
164
Better
This would give us a 2 m cell size
165
Default settings
166
Resulting data
167
Resulting data
755 (default)
168
200 meters
169
100 meters
170
10 meters
171
(No Transcript)
172
(No Transcript)
173
(No Transcript)
174
Which is best?
vector
100 meter
10 meter
175
Area and database size comparisons
176
Three: conversion from a paper map
  • Scanning: can convert to a DRG or into a square
    grid or hexagon database
  • Same rules apply as with vector scanning: the
    best approach is to trace to mylar, then scan
  • (my personal experience: it is easier to vector
    digitize, then use software to convert to raster
    format)

Note: with scanning you can create either a DRG
or a tessellation database
177
Database size can be a problem
Compaction: run length encoding
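
A minimal sketch of run length encoding for one raster row: consecutive cells with the same value are stored as a (count, value) pair. The function names and the sample row are illustrative.

```python
def rle_encode(row):
    """Compress a row of cell values into (count, value) runs."""
    runs = []
    for value in row:
        if runs and runs[-1][1] == value:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, value])   # start a new run
    return [tuple(run) for run in runs]

def rle_decode(runs):
    """Expand (count, value) runs back into the original row."""
    return [value for count, value in runs for _ in range(count)]
```

The row `[0, 0, 0, 1, 1, 2, 0, 0]` becomes `[(3, 0), (2, 1), (1, 2), (2, 0)]`. The saving depends on long runs of identical values, so rows where neighboring cells rarely repeat compress poorly.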
178
In some cases, there is very little you can do
179
Four sources
  • Data that are collected in a raster format (e.g.,
    satellite data)
  • Data in vector format converted to raster format
  • Data in a paper map converted to raster format
  • Interpolating data from points