Noriel Christopher C. Tiglao, Dr. Eng - PowerPoint PPT Presentation

1
Spatial Data Representation and Analysis
Module 3
  • Noriel Christopher C. Tiglao, Dr. Eng
  • 24 January to 4 February 2005
  • Statistical Research and Training Center (SRTC)
  • Quezon City, Metro Manila

2
Presentation Outline
  • Introduction
  • Spatial Data and Spatial Relationships
  • Sampling Reality
  • Scales of Measurement
  • Data Sources and Errors
  • Data Abstraction
  • Spatial Data Structures

3
Introduction
  • The world is infinitely complex
  • The contents of a spatial database represent a
    particular view of the world
  • The user sees the real world through the medium
    of the database
  • The measurements and samples contained in the
    database must present as complete and accurate a
    view of the world as possible

4
Introduction (contd.)
  • The contents of the database must be relevant in
    terms of
  • themes and characteristics captured
  • the time period covered
  • the study area

5
Representing Reality
  • A database consists of digital representations of
    discrete objects
  • The features shown on a map, e.g. lakes,
    benchmarks and contours, can be thought of as
    discrete objects
  • The contents of a map can be captured in a
    database by turning map features into database
    objects

6
Representing Reality (contd.)
  • Many of the features shown on a map are
    fictitious and do not exist in the real world
  • contours do not really exist, but houses and
    lakes are real objects
  • The contents of a spatial database include
  • digital versions of real objects, e.g. houses
  • digital versions of artificial map features, e.g.
    contours
  • artificial objects created for the purposes of
    the database, e.g. pixels

7
Data
  • Data are facts
  • some facts are more important to us than others.
    Some facts are important enough to warrant
    keeping track of them in a formal, organized way
  • "Data" is a broad concept that can include things
    such as pictures (binary images), programs, and
    rules
  • Informally, data are the things you want to store
    in a database

8
Spatial vs. Non-spatial Data
  • Spatial data includes location, shape, size, and
    orientation
  • Spatial data includes spatial relationships
  • Non-spatial data (also called attribute or
    characteristic data) is that information which is
    independent of all geometric considerations

9
Spatial vs. Non-spatial Data (contd.)
  • It is possible to ignore the distinction between
    spatial and non-spatial data. However, there are
    fundamental differences between them
  • spatial data are generally multi-dimensional and
    autocorrelated.
  • non-spatial data are generally one-dimensional
    and independent

10
Spatial vs. Non-spatial Data (contd.)
  • These distinctions put spatial and non-spatial
    data into different philosophical camps with
    far-reaching implications for conceptual,
    processing, and storage issues.
  • For example, sorting is perhaps the most common
    and important non-spatial data processing
    function that is performed
  • It is not obvious how to sort locational data so
    that every point ends up next to its nearest
    neighbors
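A small sketch of the sorting problem, assuming four made-up points: sorting them on a single key (the x-coordinate), as one would sort non-spatial values, places points next to each other in the list even though they are far apart on the ground.

```python
import math

# Hypothetical sample points (x, y), chosen for illustration only.
points = [(0.0, 0.0), (0.1, 10.0), (0.2, 0.1), (0.3, 10.1)]


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Sort on a single key (x), as one would sort non-spatial data.
by_x = sorted(points, key=lambda p: p[0])

# Distance between each point and the point that follows it in sorted order.
gaps = [dist(by_x[i], by_x[i + 1]) for i in range(len(by_x) - 1)]

# Distance from each point to its true nearest neighbor in two dimensions.
nearest = [min(dist(p, q) for q in points if q != p) for p in points]

print("sorted order:", by_x)
print("gaps in sorted order:", [round(g, 2) for g in gaps])
print("nearest-neighbor distances:", [round(d, 2) for d in nearest])
```

With these points every consecutive pair in the sorted list is roughly 10 units apart, while each point's nearest neighbor is only about 0.22 units away.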

11
Spatial Relationships
  • Describe the association among different features
    in space
  • are visually obvious when data are presented in
    graphical form
  • however, it is difficult to build spatial
    relationships into the information organization
    and data structure of a database

12
Spatial Relationships (contd.)
  • Difficulty in capturing spatial relationships in
    a database
  • there are numerous types of spatial relationships
    possible among features
  • recording spatial relationships explicitly
    demands considerable storage space
  • computing spatial relationships on-the-fly slows
    down data processing particularly if relationship
    information is required frequently

13
Point-Line-Area Relationship Matrix
14
Spatial Relationships (contd.)
  • Topological
  • describes the property of adjacency, connectivity
    and containment of contiguous features
  • Proximal
  • describes the property of closeness of
    non-contiguous features
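The two kinds of relationship can be contrasted in a few lines. This is a sketch under assumed data: the parcel IDs, well locations and the 10-unit distance threshold are all hypothetical. Adjacency between parcels is looked up from explicitly stored pairs, while closeness between wells is computed on the fly from coordinates.

```python
import math

# Topological relationship (adjacency): hypothetical pairs of parcel IDs
# that share a boundary, recorded explicitly.
adjacent_pairs = {("A", "B"), ("B", "C")}


def is_adjacent(p, q):
    """True if the two parcels are recorded as sharing a boundary."""
    return (p, q) in adjacent_pairs or (q, p) in adjacent_pairs


# Proximal relationship (closeness): hypothetical locations of
# non-contiguous point features such as wells.
wells = {"W1": (0.0, 0.0), "W2": (3.0, 4.0), "W3": (30.0, 40.0)}


def within(a, b, max_dist):
    """True if features a and b lie no farther apart than max_dist."""
    (x1, y1), (x2, y2) = wells[a], wells[b]
    return math.hypot(x2 - x1, y2 - y1) <= max_dist


print(is_adjacent("C", "B"), is_adjacent("A", "C"))
print(within("W1", "W2", 10.0), within("W1", "W3", 10.0))
```

The first lookup costs storage but no computation; the second costs computation every time it is asked, which mirrors the storage-versus-processing trade-off described earlier.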

16
Spatial Relationships (contd.)
  • Spatial relationships are very important in
    geographical data processing and modeling
  • the objective of information organization and
    data structure is to find a way that will handle
    spatial relationships with the minimum storage
    and computation requirements

17
Spatial Data
  • Phenomena in the real world can be observed in
    three modes: spatial, temporal and thematic
  • the spatial mode deals with variation from place
    to place
  • the temporal mode deals with variation from time
    to time (one slice to another)
  • the thematic mode deals with variation from one
    characteristic to another (one layer to another)

18
Spatial Data (contd.)
  • All measurable or describable properties of the
    world can be considered to fall into one of these
    modes - place, time and theme
  • An exhaustive description of all three modes is
    not possible

19
Spatial Data (contd.)
  • When observing real-world phenomena we usually
    hold one mode fixed, vary one in a controlled
    manner, and measure the third (Sinton, 1978)
  • e.g. using a census of population we could fix a
    time such as 1990, control for location using
    census tracts and measure a theme such as the
    percentage of persons owning automobiles

20
Spatial Data (contd.)
  • Holding geography fixed and varying time gives
    longitudinal data
  • Holding time fixed and varying geography gives
    cross-sectional data
  • The modes of information stored in a database
    influence the types of problem solving that can
    be accomplished
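A minimal sketch of the two slices, assuming a tiny table keyed by (place, year) with made-up values standing in for a theme such as the percentage of persons owning automobiles: fixing the place yields longitudinal data, fixing the year yields cross-sectional data.

```python
# Hypothetical observations keyed by (place, year); the values are made up.
observations = {
    ("Tract 1", 1980): 41.0,
    ("Tract 1", 1990): 48.5,
    ("Tract 2", 1980): 35.2,
    ("Tract 2", 1990): 39.9,
}


def longitudinal(place):
    """Hold geography fixed and vary time: one place through the years."""
    return {y: v for (p, y), v in sorted(observations.items()) if p == place}


def cross_sectional(year):
    """Hold time fixed and vary geography: every place in one year."""
    return {p: v for (p, y), v in sorted(observations.items()) if y == year}


print(longitudinal("Tract 1"))
print(cross_sectional(1990))
```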

21
Location
  • The spatial mode of information is generally
    called location

22
Attributes
  • Attributes capture the thematic mode by defining
    different characteristics of objects
  • A table showing the attributes of objects is
    called an attribute table
  • each object corresponds to a row of the table
  • each characteristic or theme corresponds to a
    column of the table
  • thus the table shows the thematic and some of the
    spatial modes

23
Time
  • The temporal mode can be captured in several ways
  • by specifying the interval of time over which an
    object exists
  • by capturing information at certain points in
    time
  • by specifying the rates of movement of objects

24
Time (contd.)
  • Depending on how the temporal mode is captured,
    it may be included in a single attribute table,
    or be represented by series of attribute tables
    on the same objects through time

25
Sampling Reality
  • Numerical values may be defined with respect to
    nominal, ordinal, interval, or ratio scales of
    measurement
  • It is important to recognize the scales of
    measurement used in GIS data as this determines
    the kinds of mathematical operations that can be
    performed on the data

26
Sampling Reality (contd.)
27
Marathon Example
28
Sampling Reality (contd.)
  • Distinctions, though important, are not always
    clearly defined
  • Is elevation interval or ratio? If the local base
    level is 750 feet, is a mountain at 2000 feet
    twice as high as one at 1000 feet when viewed
    from the valley?
  • Many types of geographical data used in GIS
    applications are nominal or ordinal
  • Values establish the order of classes, or their
    distinct identity, but rarely intervals or ratios
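Working the elevation question through, using the slide's own 750-foot base level, shows that the answer depends on where zero is placed:

```latex
\frac{2000 - 750}{1000 - 750} = \frac{1250}{250} = 5
\qquad \text{versus} \qquad
\frac{2000}{1000} = 2
```

Measured from the valley floor the first mountain is five times as high as the second; measured from sea level it is only twice as high. A ratio that changes with the choice of zero is the behavior of an interval rather than a ratio quantity.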

29
Sampling Reality (contd.)
  • Thus you cannot:
  • multiply soil type 2 by soil type 3 and get soil
    type 6
  • divide urban area by the rank of a city to get a
    meaningful number
  • subtract suitability class 1 from suitability
    class 4 to get 3 of anything
  • However, you can:
  • divide population by area (both ratio scales) and
    get population density
  • subtract elevation at point a from elevation at
    point b and get the difference in elevation
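The rules above can be encoded as a simple guard. This sketch assumes the conventional hierarchy in which each scale also permits the operations of the scales below it (nominal: equality; ordinal: ordering; interval: addition and subtraction; ratio: multiplication and division); the operation names and the population and area figures are hypothetical.

```python
# Operations treated as meaningful on each scale; every scale inherits
# the operations of the scales below it.
NOMINAL = {"equal"}
ORDINAL = NOMINAL | {"compare"}
INTERVAL = ORDINAL | {"add", "subtract"}
RATIO = INTERVAL | {"multiply", "divide"}
ALLOWED = {"nominal": NOMINAL, "ordinal": ORDINAL,
           "interval": INTERVAL, "ratio": RATIO}


def check(op, *scales):
    """Raise ValueError unless op is meaningful for every operand's scale."""
    for scale in scales:
        if op not in ALLOWED[scale]:
            raise ValueError(f"cannot {op} {scale}-scale values")


# Population / area (both ratio): density is meaningful.
# The figures are hypothetical.
check("divide", "ratio", "ratio")
density = 12000 / 4.0  # persons per square kilometre

# Elevation at b minus elevation at a (interval): a difference is meaningful.
check("subtract", "interval", "interval")
rise = 1000.0 - 750.0

# Soil type codes (nominal) and suitability classes (ordinal) are rejected.
for op, scale in [("multiply", "nominal"), ("subtract", "ordinal")]:
    try:
        check(op, scale, scale)
    except ValueError as err:
        print(err)

print(density, rise)
```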

30
Multiple Representations
  • A data model is essential to represent
    geographical data in a digital database
  • There are many different data models
  • The same phenomena may be represented in
    different ways, at different scales and with
    different levels of accuracy
  • Thus there may be multiple representations of the
    same geographical phenomena

31
Multiple Representations (contd.)
  • It is difficult to convert from one
    representation to another
  • e.g. from a small scale (1:250,000) to a large
    scale (1:10,000)
  • Thus it is common to find databases with multiple
    representations of the same phenomenon
  • this is wasteful, but techniques to avoid it are
    poorly developed

32
Primary Data Sources
  • Some of the data in a spatial database may have
    been measured directly
  • e.g. by field sampling or remote sensing
  • The density of sampling determines the resolution
    of the data
  • e.g. samples taken every hour will capture
    hour-to-hour variation, but miss shorter-term
    variation
  • e.g. samples taken every 1 km will miss any
    variation at resolutions less than 1 km
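To see how sampling density sets resolution, this sketch samples a hypothetical quantity that rises and falls once every 60 minutes. Hourly samples always land on the same point of the cycle and record no variation at all; samples every 15 minutes reveal it.

```python
import math


def signal(minutes):
    """Hypothetical quantity that rises and falls once every 60 minutes."""
    return 20.0 + 2.0 * math.sin(2 * math.pi * minutes / 60.0)


day = 24 * 60  # minutes in one day

hourly = [signal(m) for m in range(0, day, 60)]
every_15_min = [signal(m) for m in range(0, day, 15)]

# Hourly sampling hits the same phase of the cycle each time,
# so the within-hour variation is invisible in the samples.
print("hourly range:", round(max(hourly) - min(hourly), 3))
print("15-minute range:", round(max(every_15_min) - min(every_15_min), 3))
```

The same reasoning applies in space: samples every 1 km cannot record a pattern that repeats over shorter distances.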

33
Primary Data Sources (contd.)
  • A sample is designed to capture the variation
    present in a larger universe
  • e.g. a sample of places should capture the
    variation present at all possible places
  • e.g. a sample of times will be designed to
    capture variation at all possible times

34
Sampling Approaches
  • In a random sample, every place or time is
    equally likely to be chosen
  • Systematic samples are chosen according to a
    rule, e.g. every 1 km, but the rule is expected
    to create no bias in the results of analysis,
    i.e. the results would have been similar if a
    truly random sample had been taken
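The two approaches can be sketched for a hypothetical transect of 100 candidate sites spaced 100 m apart, so that taking every 10th site corresponds to the slide's "every 1 km" rule. The function names and the fixed seed are illustrative assumptions.

```python
import random


def random_sample(n_sites, k, seed=None):
    """Random sample: every site is equally likely to be chosen."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_sites), k))


def systematic_sample(n_sites, step, start=0):
    """Systematic sample: sites chosen by a fixed rule (every `step`-th site)."""
    return list(range(start, n_sites, step))


# Hypothetical transect: 100 sites, 100 m apart; step 10 is "every 1 km".
print(random_sample(100, 10, seed=42))
print(systematic_sample(100, 10))
```

A systematic sample is only safe when, as the slide says, the rule is not expected to line up with some regular pattern in the data.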

35
Sampling Approaches (contd.)
  • In a stratified sample, the researcher knows for
    some reason that the universe contains
    significantly different sub-populations, and
    samples within each sub-population in order to
    achieve adequate representation of each
  • e.g. we may know that the topography is more
    rugged in one part of the area, and sample more
    densely there to ensure adequate representation
  • if a representative sample of the entire universe
    is required, then the subsamples in each
    subpopulation will have to be weighted
    appropriately
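A worked sketch of the weighting step, assuming two hypothetical strata whose weights are their shares of the study area: the rugged stratum covers 20% of the area but supplies most of the samples, so pooling the samples naively overstates the area-wide mean.

```python
# Hypothetical slope measurements (degrees). The rugged stratum covers 20%
# of the study area but was sampled more densely than the gentle one.
strata = {
    "rugged": {"area_share": 0.2, "samples": [30.0, 34.0, 28.0, 32.0, 36.0, 32.0]},
    "gentle": {"area_share": 0.8, "samples": [4.0, 6.0]},
}


def unweighted_mean(strata):
    """Pool every sample as if it were one simple random sample."""
    values = [v for s in strata.values() for v in s["samples"]]
    return sum(values) / len(values)


def stratified_mean(strata):
    """Weight each stratum's sample mean by that stratum's share of the universe."""
    return sum(
        s["area_share"] * sum(s["samples"]) / len(s["samples"])
        for s in strata.values()
    )


print(round(unweighted_mean(strata), 2), round(stratified_mean(strata), 2))
```

Here the pooled mean is 25.25 degrees, while the weighted estimate (0.2 × 32 + 0.8 × 5) is about 10.4 degrees.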

36
Secondary Data Sources
  • Some data may have been obtained from existing
    maps, tables, or other databases
  • For such data to be useful, it is important to
    obtain information in addition to the data themselves
  • information on the procedures used to collect and
    compile the data
  • information on coding schemes, accuracy of
    instruments

37
Secondary Data Sources (contd.)
  • Unfortunately such information is often not
    available
  • a user of a spatial database may not know how the
    data were captured and processed prior to input
  • this often leads to misinterpretation and false
    expectations about accuracy

38
Data Standards
  • Standards may be set to assure uniformity
  • within a single data set
  • across data sets
  • e.g. uniform information about timber types
    throughout the database allows better fire
    fighting methods to be used, or better control of
    insect infestations
  • Data capture should be undertaken in standardized
    ways that will assure the widest possible use of
    the information

39
Sharing Data
  • It is not uncommon for as many as three agencies
    to create databases with, ostensibly, the same
    information
  • e.g. a planning agency may map land use, including
    a forested class
  • e.g. the state department of forestry also maps
    forests
  • e.g. the wildlife division of the department of
    conservation maps habitat, which includes fields
    and forest

40
Sharing Data (contd.)
  • Each may digitize their forest class onto
    different GIS systems, using different protocols,
    and with different definitions for the classes of
    forest cover
  • this is a waste of time and money
  • Sharing information gives it added value
  • Sharing basic formats with other information
    providers, such as a department of
    transportation, might make marketing the database
    more profitable

41
Errors and Accuracy
  • There is a nearly universal tendency to lose
    sight of errors once the data are in digital form
  • Errors
  • are implanted in databases because of errors in
    the original sources (source errors)
  • are added during data capture and storage
    (processing errors)
  • occur when data are extracted from the computer
  • arise when the various layers of data are
    combined in an analytical exercise

42
Errors in Sources
  • Are extremely common in non-mapped source data,
    such as locations of wells, or lot descriptions
  • Can be caused by doing inventory work from aerial
    photography and misinterpreting images
  • Often occur because base maps are relied on too
    heavily

43
Classification Errors
  • Are common when tabular data are rendered in map
    form
  • Simple typing errors may be invisible until
    presented graphically
  • More complex classification errors may be due to
    the sampling strategies that produced the
    original data

44
Data Capture Errors
  • Manual data input induces another set of errors
  • Eye-hand coordination varies from operator to
    operator and from time to time
  • Data input is a tedious task - it is difficult to
    maintain quality over long periods of time

45
Accuracy Standards
  • Many agencies may not have established accuracy
    standards for geographical data
  • where such standards exist, they are more often
    concerned with accuracy of locations of objects
    than with accuracy of attributes
  • Higher accuracy requires better source materials
  • is the added cost justified by the objectives of
    the study?
  • Accuracy standards should be determined by
    considering both the value of information and the
    cost of collection

46
Data Abstraction
  • Capturing the essential pieces of information to
    describe the spatial phenomenon
  • Based on a conceptual model of reality
  • Expressed in data models
  • Realized by building data structures (i.e. the
    internal representation of spatial data) using
    database models

47
Reality, Conceptual Model and Database
(Figure: Reality, Conceptual model, Database (Cyber world))
48
Data Abstraction Example
  • Locating objects
  • Recording attribute info.
  • Data (1.0, 20.3, 9.0, 12.8, 15.0, 10000.00)
49
Data Abstraction Example
  • Data (1.0, 20.3, 9.0, 12.8, 15.0, 10000.00)
  • What do you mean by these data?
  • Describing a house for rent: (1.0, 20.3),
    (9.0, 12.8), Z = 15.0, monthly rent PhP 10,000.00
  • Abstraction rule: the conceptual model of
    describing a house for rent
  • Crucial information for data users
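The role of the abstraction rule can be sketched in code: the same six numbers mean nothing until a rule assigns each position a meaning. The field names below (two coordinate pairs, a Z value and a monthly rent) are a hypothetical reading of the slide, not its definitive rule.

```python
from typing import NamedTuple


class HouseForRent(NamedTuple):
    """Hypothetical abstraction rule: field names and order are assumptions
    made for this sketch, not taken from the slide."""
    x1: float
    y1: float
    x2: float
    y2: float
    z: float
    monthly_rent_php: float


raw = (1.0, 20.3, 9.0, 12.8, 15.0, 10000.00)

# Without a rule the tuple is just six numbers; applying the rule names them.
house = HouseForRent(*raw)
print((house.x1, house.y1), (house.x2, house.y2), house.z, house.monthly_rent_php)
```

A user who receives `raw` without the rule cannot tell a coordinate from a rent, which is why the rule itself is crucial information for data users.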
50
Levels of Data Abstraction
51
Data Model vs. Database Model
  • Data Model
  • Vector methods (feature-based)
  • Raster methods (field-based)
  • Database Model
  • Software implementation of data models
  • Network, hierarchical and object-oriented
    databases

52
Vector Data Model
  • Method of representing geographic features by the
    basic graphical elements
  • Points
  • Lines (arcs)
  • Polygons (area)
  • They can also be used to construct complex
    features

53
Basic Graphical Elements
54
Vector Data Model (contd.)
  • Related vector data are always organized by
    themes, which are also referred to as layers or
    coverages
  • examples of themes: geodetic control, base map,
    soil, vegetation cover, land use, transportation,
    drainage and hydrology, political boundaries,
    land parcel and others

55
Vector Data Model (contd.)
  • For themes covering a very large geographic area,
    the data are always divided into tiles so that
    they can be managed more easily
  • a tile is the digital equivalent of an individual
    map in a map series
  • a tile is uniquely identified by a file name

56
Vector Data Model (contd.)
  • A collection of themes of vector data covering
    the same geographic area and serving the common
    needs of a multitude of users constitutes the
    spatial component of a geographical database
  • Graphical data captured by imaging devices in
    remote sensing and digital cartography (such as
    multi-spectral scanners, digital cameras and
    image scanners) are made up of a matrix of
    picture elements (pixels) of very fine resolution

57
Raster Data Model
  • Method of representing geographic features by
    pixels
  • A raster pixel is usually a square grid cell, but
    there are several variants such as triangles and
    hexagons

58
Raster Data Model (contd.)
  • A raster pixel represents the generalized
    characteristics of an area of specific size on or
    near the surface of the Earth
  • the actual ground size depicted by a pixel is
    dependent on the resolution of the data, which
    may range from smaller than a square meter to
    several square kilometers
  • Raster data are organized by themes, which are
    also referred to as layers

59
Raster Data Model (contd.)
  • Raster data covering a large geographic area are
    organized by scenes (for remote sensing images)
  • The raster method is based on the concept that
    geographic features are represented as surfaces,
    regions or segments

60
Vector Data Structure
  • Spaghetti
  • a direct line-for-line unstructured translation
    of the paper map; it has very limited practical use
  • it is usually an interim data structure for map
    digitizing
  • Hierarchical
  • a vector data structure developed to facilitate
    data retrieval by separately storing points,
    lines and areas in a logically hierarchical manner

61
Spaghetti Data Model and Data Structure
62
Hierarchical Data Model and Data Structure
63
Vector Data Structure (contd.)
  • Topological
  • a vector data structure that captures spatial
    relationships by explicitly storing adjacency
    information
  • the basic logical feature for line and area
    coverage is a straight line segment
  • each individual line segment is defined by the
    coordinates of its end points, called nodes
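One common way of storing adjacency explicitly (an assumption here; the slide does not prescribe it) is an arc table in which each arc records its two end nodes and the polygons on its left and right. The sketch below uses a hypothetical three-arc map and derives polygon adjacency and node connectivity from the stored records alone, without any geometric computation.

```python
from collections import defaultdict

# Hypothetical node coordinates: the end points of the line segments.
nodes = {"n1": (0.0, 0.0), "n2": (5.0, 0.0), "n3": (10.0, 0.0), "n4": (5.0, 8.0)}

# Hypothetical arc table: each arc records its end nodes and the polygons on
# its left and right ("O" stands for the area outside the map).
arcs = {
    1: {"from": "n1", "to": "n2", "left": "A", "right": "O"},
    2: {"from": "n2", "to": "n3", "left": "B", "right": "O"},
    3: {"from": "n2", "to": "n4", "left": "A", "right": "B"},
}


def arc_coordinates(arc_id):
    """Coordinates of an arc's two end nodes."""
    arc = arcs[arc_id]
    return nodes[arc["from"]], nodes[arc["to"]]


def polygon_adjacency():
    """Polygons are adjacent if they lie on opposite sides of some arc."""
    adj = defaultdict(set)
    for arc in arcs.values():
        adj[arc["left"]].add(arc["right"])
        adj[arc["right"]].add(arc["left"])
    return adj


def arcs_at_node(node):
    """Connectivity: the arcs that meet at a node."""
    return sorted(i for i, arc in arcs.items() if node in (arc["from"], arc["to"]))


print(arc_coordinates(3), sorted(polygon_adjacency()["A"]), arcs_at_node("n2"))
```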

64
Topological Data Model and Data Structure
65
Raster Data Structure
  • Space is subdivided into regular grids of square
    grid cells or other forms of polygonal meshes
    known as picture elements (pixels)
  • the location of each cell is defined by its row
    and column numbers
  • the area that each cell represents defines the
    spatial resolution of the data
  • the position of a geographic feature is only
    recorded to the nearest pixel
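A minimal sketch of how a map position is reduced to a row and column, assuming a north-up grid whose top-left corner is at a known origin, with rows counted downward. The origin, cell size and sample point are hypothetical. Any point inside a cell is recorded only as that cell, so its position is known only to the nearest pixel.

```python
import math


def to_cell(x, y, origin_x, origin_y, cell_size):
    """Return (row, col) of the cell containing map point (x, y).

    Assumes a north-up grid whose top-left corner is (origin_x, origin_y),
    with rows counted downward and columns counted to the right.
    """
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((origin_y - y) / cell_size)
    return row, col


def cell_centre(row, col, origin_x, origin_y, cell_size):
    """Map coordinates of the centre of cell (row, col)."""
    return (origin_x + (col + 0.5) * cell_size,
            origin_y - (row + 0.5) * cell_size)


# A feature at (23.7, 81.2) on a 10-unit grid anchored at (0, 100).
rc = to_cell(23.7, 81.2, 0.0, 100.0, 10.0)
print(rc, cell_centre(*rc, 0.0, 100.0, 10.0))
```

The feature is stored as cell (1, 2), whose centre is (25.0, 85.0); the exact position within the cell is lost.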

66
Raster Data Structure (contd.)
  • the value stored for each cell indicates the
    type of object, phenomenon or condition that
    is found in that particular location
  • values can be coded as integers, real numbers or
    alphabetic characters
  • integer values often act as code numbers, which
    are referenced to names in an associated table
    (called the look-up table) or legend
  • different attributes at the same cell location
    are stored as separate themes or layers
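A small sketch of integer codes and a look-up table, assuming a hypothetical 2 × 3 land-use layer and a second theme (elevation) stored as a separate layer over the same cells. The codes, class names and values are made up.

```python
# Hypothetical land-use layer: integer codes stored per cell.
land_use = [
    [1, 1, 2],
    [3, 2, 2],
]

# Look-up table (legend) referencing each code to a name.
legend = {1: "forest", 2: "agriculture", 3: "urban"}

# A different attribute at the same cell locations: a separate layer.
elevation_m = [
    [120.5, 118.0, 97.2],
    [101.3, 95.8, 90.1],
]

row, col = 1, 0
print(legend[land_use[row][col]], elevation_m[row][col])

# Count cells per class by decoding each code through the look-up table.
counts = {}
for cells in land_use:
    for code in cells:
        name = legend[code]
        counts[name] = counts.get(name, 0) + 1
print(counts)
```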

67
Characteristics of Raster Data Structure
68
Raster Data Structure (contd.)
  • There are several variants to the regular grid
    raster data structure, including
  • irregular tessellation (e.g. triangulated
    irregular network (TIN))
  • hierarchical tessellation (e.g. quad tree) and
  • scan-line

69
Representing Fields
  • there are many ways of representing fields
  • not all are implemented in GIS
  • different terminologies exist in different
    disciplines
  • Six major representations
  • Regular cells
  • Rectangular grid of points
  • Irregularly spaced points
  • Digitized contours
  • Polygons
  • Triangulated irregular networks (TINs)

70
Regular Cells
  • Value in each cell is an average, total, or some
    other aggregate property of the field within the
    cell
  • the representation defines a value everywhere, so
    is complete
  • however, all within-cell variation is lost
  • if necessary, it must be reconstructed by some
    method of intelligent guesswork
  • e.g. remote sensing data and other kinds of
    digital imagery

72
Rectangular grid of points
  • e.g. measurements of land surface elevation in a
    digital elevation model (DEM)
  • spacing of measurements is critical to accuracy
    of representation
  • all variation between sample points is lost
  • elevations at other points must be estimated by
    some method of intelligent guesswork (the
    representation is incomplete)
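As one example of the "intelligent guesswork" needed between grid points, the sketch below uses bilinear interpolation, a common choice but not the only one. The 3 × 3 DEM, its 30-unit spacing and the assumption that the grid starts at (0, 0) are hypothetical.

```python
def bilinear(grid, spacing, x, y):
    """Estimate a value at (x, y) from a regular grid of point samples.

    grid[i][j] holds the sample at x = j * spacing, y = i * spacing.
    Bilinear interpolation is only one possible method of estimation.
    """
    j, i = int(x // spacing), int(y // spacing)
    # Clamp so points on the far edge use the last grid square.
    j = min(j, len(grid[0]) - 2)
    i = min(i, len(grid) - 2)
    tx = x / spacing - j
    ty = y / spacing - i
    z00, z01 = grid[i][j], grid[i][j + 1]
    z10, z11 = grid[i + 1][j], grid[i + 1][j + 1]
    lower = z00 * (1 - tx) + z01 * tx   # along the row at smaller y
    upper = z10 * (1 - tx) + z11 * tx   # along the row at larger y
    return lower * (1 - ty) + upper * ty


# Hypothetical elevations (m) sampled every 30 units.
dem = [
    [100.0, 110.0, 120.0],
    [105.0, 115.0, 125.0],
    [110.0, 120.0, 130.0],
]
print(bilinear(dem, 30.0, 15.0, 15.0))
```

Whatever method is chosen, any feature narrower than the sample spacing between grid points is simply not in the data.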

74
Irregularly spaced points
  • The field's value is defined at a set of sample
    points scattered in the frame
  • values of the field at other points must be
    interpolated (the representation is incomplete)
  • e.g. weather data, available at scattered weather
    stations
  • accuracy depends on the density of points
  • it is not clear what measure best defines
    accuracy - density per unit area, minimum
    distance between sample points, maximum distance
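Inverse distance weighting (IDW) is one widely used way of interpolating scattered points; this sketch uses it as an illustration, with hypothetical rainfall totals at three hypothetical weather stations and a power of 2 as an assumed default.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    stations is a list of (x, y, value) samples. IDW is one common way of
    interpolating scattered points; it is not the only choice.
    """
    num = den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exactly at a station: use its measured value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical rainfall totals (mm) at three weather stations.
stations = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]
print(round(idw(stations, 5.0, 0.0), 2))
```

The estimate depends heavily on how many stations lie nearby, which is why the density and spacing of sample points matter so much for accuracy.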

76
Digitized contours
  • The field is represented as a set of isolines,
    each connecting points of constant value
  • representation is incomplete
  • The scale of measurement of the variable must be
    at least ordinal
  • isolines cannot be defined for nominal data
  • Each isoline is represented as a polyline
  • e.g. data obtained from topographic maps
  • Accuracy depends on
  • the number of contoured values, or the contour
    interval
  • the density of polyline points

78
Polygons
  • The frame is partitioned into irregular areas
    (volumes for 3 or more dimensions)
  • value in each area is an average, total, or some
    other aggregate property of the field within the
    area
  • the representation is complete
  • all variation within areas is lost
  • e.g. data obtained from maps of vegetation cover
    class, soil type

79
Polygons (contd.)
  • The boundaries of areas are continuously curved
    lines
  • represented digitally as polylines - an ordered
    sequence of points connected by straight lines
  • the denser the points, the more accurate the
    polyline as a representation of a continuous
    curve
  • accuracy depends both on the size of polygons and
    on the density of polyline points
  • it is not clear what measure of polygon size -
    average, minimum - best defines accuracy

80
Polygons (contd.)
  • Every point in the frame lies in exactly one
    polygon
  • except for points on the boundaries
  • the polygons cannot overlap and must exhaust the
    frame
  • they are said to tessellate the space, forming
    an irregular tessellation
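Because the polygons tessellate the frame, a query point should fall in exactly one of them. This sketch assumes a hypothetical 10 × 10 frame split into two soil polygons along a shared edge, and uses the standard even-odd (ray casting) test to find the containing polygon.

```python
def contains(polygon, x, y):
    """Even-odd (ray casting) test: is (x, y) inside the polygon?

    polygon is a list of (x, y) vertices. Points exactly on a boundary
    may be reported either way, matching the boundary exception.
    """
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical irregular tessellation of a 10 x 10 frame into two polygons.
polygons = {
    "soil_A": [(0, 0), (6, 0), (4, 10), (0, 10)],
    "soil_B": [(6, 0), (10, 0), (10, 10), (4, 10)],
}


def locate(x, y):
    """Names of all polygons containing (x, y)."""
    return [name for name, poly in polygons.items() if contains(poly, x, y)]


print(locate(2, 5), locate(8, 5))
```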

82
Triangulated irregular networks (TINs)
  • the frame is covered with a mesh of irregular
    triangles
  • every point lies in exactly one triangle, or on a
    triangle edge
  • the value of the field is known at every triangle
    vertex
  • within triangles and along edges it is assumed to
    vary linearly
  • the representation is complete
  • contours drawn across triangles will therefore
    always be straight and parallel
  • across triangle edges there will be breaks of
    slope, but not cliffs
  • contours will kink at edges
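Linear variation within a triangle can be sketched with barycentric weights, one standard way of evaluating the plane through a triangle's three vertices. The triangle below is hypothetical and lies on the plane z = 100 + 2x + y.

```python
def tin_value(tri, x, y):
    """Linearly interpolate within one TIN triangle.

    tri is three (x, y, z) vertices. Barycentric weights are used here as
    one standard way to evaluate the plane through the three vertices.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    # Small tolerance so points on an edge are not rejected by rounding.
    if min(w1, w2, w3) < -1e-12:
        raise ValueError("point lies outside this triangle")
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical triangle on the plane z = 100 + 2x + y.
tri = [(0.0, 0.0, 100.0), (10.0, 0.0, 120.0), (0.0, 10.0, 110.0)]
print(tin_value(tri, 2.0, 3.0))
```

Since the value inside the triangle is a single plane, every contour crossing it is a straight line, and contours of different values are parallel.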

83
Triangulated irregular networks (TINs)
  • the scale of measurement of the variable must be
    at least interval
  • variation within triangles cannot be defined for
    nominal or ordinal variables

84
Triangulated irregular networks (TINs)
  • Accuracy depends on
  • how carefully the vertices were located on the
    surface
  • how well the planes defined within each triangle
    fit the actual surface
  • the sizes of triangles
  • but it is not clear what property of triangle
    size best defines accuracy - average, smallest,
    largest

86
Vector and Raster Data Integration
  • Recent advances in computer technologies allow
    these two types of data to be used in the same
    applications
  • computers are now capable of converting data from
    the vector format to the raster format
    (rasterization) and vice versa (vectorization)
  • computers are now able to display vector and
    raster simultaneously
  • vector and raster data are largely seen as
    complementary to, rather than competing against,
    one another in geographic data processing

87
Georelational Data Structure
  • Was developed to handle geographic data
  • It allows the association between spatial
    (graphical) and non-spatial (descriptive) data
  • It is the data structure used by many
    vector-based GIS software packages
  • Both spatial and non-spatial data are stored in
    relational tables

88
Georelational Data Structure (contd.)
  • Point, line and polygon data are stored in
    separate feature attribute tables (FAT)
  • in the FAT, each entity is assigned a unique
    feature identifier (FID)
  • topological information is explicitly stored by
    employing a method similar to the topological
    data structure described above
  • non-spatial data are stored in relational tables

89
Feature Attribute Table (FAT)
90
Georelational Data Structure (contd.)
  • Entities in the spatial and non-spatial
    relational tables are linked by the common FIDs
    of entities
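A minimal sketch of the FID link using Python's built-in `sqlite3` module: a hypothetical point feature attribute table and a hypothetical descriptive table are joined on their common FID. The table names, columns and values are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Spatial side: a hypothetical point feature attribute table keyed by FID.
cur.execute("CREATE TABLE point_fat (fid INTEGER PRIMARY KEY, x REAL, y REAL)")
# Non-spatial side: a hypothetical descriptive table sharing the same FIDs.
cur.execute("CREATE TABLE well_info (fid INTEGER PRIMARY KEY, depth_m REAL, owner TEXT)")

cur.executemany("INSERT INTO point_fat VALUES (?, ?, ?)",
                [(1, 10.0, 20.0), (2, 15.0, 25.0)])
cur.executemany("INSERT INTO well_info VALUES (?, ?, ?)",
                [(1, 35.0, "Owner A"), (2, 52.5, "Owner B")])

# Link the spatial and non-spatial tables through their common FID.
rows = cur.execute(
    "SELECT p.fid, p.x, p.y, w.depth_m, w.owner "
    "FROM point_fat AS p JOIN well_info AS w ON p.fid = w.fid "
    "ORDER BY p.fid"
).fetchall()
for row in rows:
    print(row)
con.close()
```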

91
Linking spatial and non-spatial tables
92
Data Modeling
  • Process of defining real world phenomena or
    geographic features of interest in terms of their
    characteristics and their relationships with one
    another
  • it is concerned with different phases of work
    carried out to implement information organization
    and data structure

93
Data Modeling (contd.)
  • There are three steps in the data modeling
    process, resulting in a series of progressively
    formalized data models as the form of the
    database becomes more and more rigorously defined
  • conceptual data modeling - defining in broad and
    generic terms the scope and requirements of a
    database
  • logical data modeling - specifying the user's
    view of the database with a clear definition of
    attributes and relationships
  • physical data modeling - specifying internal
    storage structure and file organization of the
    database

94
Conceptual Data Modeling
  • Entity-relationship (E-R) modeling is probably
    the most popular method of conceptual data
    modeling
  • It is sometimes referred to as a method of
    semantic data modeling because it uses a human
    language-like vocabulary to describe information
    organization

95
Conceptual Data Modeling (contd.)
  • It involves four aspects of work
  • identifying entities
  • an entity is defined as a person, a place, an
    event, a thing, etc.
  • identifying attributes
  • determining relationships
  • drawing an entity-relationship diagram (E-R
    diagram)

96
Sample E-R Diagram
97
Logical Data Modeling
  • Comprehensive process by which the conceptual
    data model is consolidated and refined
  • the proposed database is reviewed in its entirety
    in order to identify potential problems such as
  • irrelevant data that will not be used
  • omitted or missing data
  • inappropriate representation of entities
  • lack of integration between various parts of the
    database
  • unsupported applications
  • potential additional cost to revise the database

98
Logical Data Modeling (contd.)
  • The end product of logical data modeling is a
    logical schema
  • the logical schema is developed by mapping the
    conceptual data model (such as the E-R diagram)
    to a software-dependent design document

99
Logical Schema Example
100
Physical Data Modeling
  • Database design process by which the actual
    tables that will be used to store the data are
    defined in terms of
  • data format - the format of the data that is
    specific to a database management system (DBMS)
  • storage requirements - the volume of the database
  • physical location of data - optimizing system
    performance by minimizing the need to transmit
    data between different storage devices or data
    servers

101
Physical Data Modeling (contd.)
  • The end product of physical data modeling is a
    physical schema
  • a physical schema is also variably known as data
    dictionary, item definition table, data specific
    table or physical database definition
  • it is both software- and hardware-specific
  • this means the physical schemas for different
    systems look different from one another

102
Physical Schema Example
103
End