Title: GeoSpatial Unstructured Data
1GeoSpatial Unstructured Data
- Dan Rickman
- GeoSpatial SG
2Agenda
- What is geospatial data
- What does structured geospatial data look like?
- General data modelling issues regarding geospatial data
- In search of the BLPU
- A brief history of OS maps: how structured are they (then and now)?
- Raster map data
- EDRM
- Geo-parsers/gazetteers/metadata
- Web-based systems
- Future directions?
3What is Geospatial Information? - 1
- Spatial data which relates to the surface of the Earth
- Geodetic reference system as base, e.g. WGS84 used for Global Positioning System (Earth as an ellipsoid), Latitude and Longitude (Earth as a sphere)
- Ordnance Survey (GB) define National Grid: projection onto flat surface. NB OS(NI) use Irish grid
- Spatial relationships defined around concept of neighbourhood: relates to two laws of geography
- Most things influence most other things in some way
- Nearby things are usually more similar than things which are far apart
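As a rough illustration of treating the Earth as a sphere, the sketch below computes a great-circle distance from latitude and longitude using the haversine formula. The mean radius of 6371 km and the function name are assumptions made for this example; a WGS84 ellipsoidal calculation would give slightly different results.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; treats the Earth as a perfect sphere


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Half-way round the equator should be half the circumference of the sphere.
half_equator = great_circle_km(0.0, 0.0, 0.0, 180.0)
```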
4What is Geospatial Information? - 2
- Unstructured "spaghetti" data
- Topology information structured as networks, polygons
- GeoSpatial information requires metadata, e.g. minimal information such as map projection used
- GeoSpatial information may also require temporal modelling, e.g. farm subsidies vary as utilisation and legislation change
- Field-based model versus object-based model of space, e.g. rainfall versus buildings on which rain falls
- GeoSpatial information requires ontology
- What is the real world, how classified
- Relates to semantics
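The difference between spaghetti data and topology can be sketched as follows: unconnected polylines go in, and a simple network of shared nodes and edges comes out. This is a minimal sketch that assumes line ends match exactly; real data would usually need a snapping tolerance, and the function name is hypothetical.

```python
from collections import defaultdict


def build_topology(lines):
    """Derive nodes, edges and adjacency from polylines with matching end points."""
    node_ids = {}   # end-point coordinate -> node id
    edges = []      # one (start node id, end node id) pair per polyline
    for coords in lines:
        ends = []
        for point in (coords[0], coords[-1]):
            if point not in node_ids:
                node_ids[point] = len(node_ids)
            ends.append(node_ids[point])
        edges.append(tuple(ends))
    adjacency = defaultdict(set)
    for start, end in edges:
        adjacency[start].add(end)
        adjacency[end].add(start)
    return node_ids, edges, dict(adjacency)


# Three "spaghetti" lines: the first two share an end point, the third is isolated.
spaghetti = [
    [(0, 0), (1, 0)],
    [(1, 0), (1, 1)],
    [(5, 5), (6, 6)],
]
nodes, edges, adjacency = build_topology(spaghetti)
```

Once the shared node is explicit, questions such as "what connects to this junction?" become lookups rather than coordinate comparisons.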
5What are GeoSpatial Systems?
- Known as Geographic Information Systems, Spatial Information Systems
- Enables capture, modelling, storage, retrieval, sharing, manipulation and analysis of geographically referenced data
- Database is at the heart, as is attribute data
- Model developing: perhaps GeoSpatial data better seen as attribute of alphanumeric business information
- Presentation does not have to be map-based in all cases
- Key element is spatial indexing: uses different techniques to alphanumeric indexing
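To show why spatial indexing differs from alphanumeric indexing, here is a minimal sketch of a fixed-grid index for points. It is one simple technique among several (production systems more often use R-trees or quadtrees); the class name, cell size and sample co-ordinates are assumptions for illustration only.

```python
from collections import defaultdict


class GridIndex:
    """Bucket points into square cells so a box query only visits nearby cells."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, key, x, y):
        self.cells[self._cell(x, y)].append((key, x, y))

    def query_box(self, xmin, ymin, xmax, ymax):
        """Return keys of points inside the box, checking only overlapping cells."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for key, x, y in self.cells.get((cx, cy), ()):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(key)
        return hits


index = GridIndex(cell_size=100)
index.insert("a", 10, 10)
index.insert("b", 950, 950)
index.insert("c", 120, 80)
nearby = sorted(index.query_box(0, 0, 200, 200))
```

A sorted alphanumeric index orders values along one dimension, whereas the grid keeps points that are close in two dimensions in the same or adjacent buckets.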
6Where used? Examples
- Central government: DEFRA, ODPM, Land Registry, ONS
- Local government: planning, highways authorities
- Utilities: physical and logical network
- Insurance: flood plains
- Health: epidemiology
- Travel: multi-modal route planning
- More widespread use: addresses, postcode-based data against regional boundaries, infrastructure (geographies used to divide country, catchment area)
- Fiat boundaries versus bona fide boundaries: what is real world, how do we structure it?
7Structured geo-database - Paradigm shift?
Relational Database (Attribute data)
Spatial Data (proprietary format)
Real Time/Engineering Systems
CRM
ERP
- Spatially extended RDBMS
- Complex data types for spatial data
- Computational geometry
- Spatial indexing
- DDL and DML extensions
8GIS
9ROMANSE - Hampshire CC
11Roadwork Information
12Geospatial data modelling
- Field-based model versus object-based model
- Geographic Information Systems are object-based in practice
- Most common field-based information, e.g. Digital Elevation Model (line of sight applications), attached to objects
- Objects rely on field-based model, i.e. spatial co-ordinates
- Initiatives such as Digital National Framework encourage organisations to structure data on references to objects, not re-capture and duplicate data
- GeoSpatial equivalent of referential integrity
- Nevertheless duplication and lack of (referential) integrity are commonplace and hard to eradicate
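The idea of referencing objects rather than duplicating their geometry can be sketched as a simple integrity check. Everything here is hypothetical: the identifiers, field names and co-ordinates are invented for illustration and do not come from any real reference dataset.

```python
# Hypothetical reference data: object identifier -> geometry held once, centrally.
reference_objects = {
    "obj-001": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "obj-002": [(20, 0), (30, 0), (30, 10), (20, 10)],
}

# Hypothetical business records that point at reference objects by identifier
# instead of re-capturing the geometry themselves.
business_records = [
    {"record": "planning-17", "object_id": "obj-001"},
    {"record": "licence-42", "object_id": "obj-002"},
    {"record": "survey-03", "object_id": "obj-999"},  # reference with no target
]


def dangling_references(records, objects):
    """Return the records whose object reference is missing from the reference data."""
    return [rec["record"] for rec in records if rec["object_id"] not in objects]


def geometry_for(record, objects):
    """Resolve a record to the shared geometry, or None if the reference is broken."""
    return objects.get(record["object_id"])


broken = dangling_references(business_records, reference_objects)
```

This is the spatial analogue of a foreign-key check: geometry lives in one place, and every other data set is validated against it.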
13In search of the BLPU
- Basic Land and Property Unit
- Holy grail of industry: no Da Vinci code produced yet!
- Example of Ordnance Survey Master Map (OSMM)
- "St Mary's football stadium, Southampton" is one object
- Typical detached house and its plot of land, likewise
- Complex entities such as "Southampton railway station" are defined in terms of multiple objects: one for the main building, several for the platforms, one more for the pedestrian bridge over the tracks. (NB See Wikipedia article on TOID)
- Defining the candidate BLPU, their lifecycles and their attribute data, and verifying that these are meaningful/practicable for the wide variety of business processes which apply to the BLPU and the aggregate entities which are created from them
- Dependencies so that data sets are based on the BLPU wherever possible: limited by business use, e.g. field use change quite different from a tenant/owner perspective
14Evolution of geographic information
1950
2010
15Raster map data
- Scanned ortho-rectified map or map-based data: metadata is co-ordinates, projection, extent
- For example Google Maps/Google Earth, Microsoft Virtual Earth
- Traditionally stored outside the database as external files; can now be stored in the database, analogous to vector data storage, e.g. Oracle 10g GeoRaster
- Data stored as BLOBs, metadata required regarding number of bytes per pixel, compression algorithms and so on
- Benefits limited as "intelligence" in map requires interpretation
- Still limited progress on map-based pattern recognition: there are semi-automated solutions from companies such as Laser-Scan
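The role of raster metadata (co-ordinates, projection, extent) can be sketched as the small amount of information needed to turn a pixel position into a map position. This is a simplified sketch that assumes a north-up image with square pixels; the class, field names and sample values are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass
class RasterMetadata:
    origin_x: float    # map x of the top-left corner of the image
    origin_y: float    # map y of the top-left corner of the image
    pixel_size: float  # ground units per pixel (square pixels assumed)
    width: int         # image width in pixels
    height: int        # image height in pixels
    projection: str    # name of the map projection the co-ordinates refer to

    def pixel_to_map(self, col, row):
        """Map co-ordinates of the centre of pixel (col, row); rows count downwards."""
        x = self.origin_x + (col + 0.5) * self.pixel_size
        y = self.origin_y - (row + 0.5) * self.pixel_size
        return x, y

    def extent(self):
        """Bounding box of the image as (xmin, ymin, xmax, ymax)."""
        return (
            self.origin_x,
            self.origin_y - self.height * self.pixel_size,
            self.origin_x + self.width * self.pixel_size,
            self.origin_y,
        )


# Illustrative values only: a 400 x 400 pixel tile at 2.5 units per pixel.
tile = RasterMetadata(400000.0, 200000.0, 2.5, 400, 400, "British National Grid")
centre_of_first_pixel = tile.pixel_to_map(0, 0)
```

Without this metadata the pixels are just an image; with it, each pixel has a location, although interpreting what the pixel shows still needs a person or pattern recognition.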
16EDRM
- Electronic document and records management
- Increased usage in local/central government due to Freedom of Information Act
- Contain potentially significant geospatial data
- Most common example is address
- Requires capture of appropriate metadata or appropriate pattern recognition to identify addresses
- Requires gazetteers to provide reference to spatial co-ordinates
- NB most familiar gazetteer: list of streets in A to Z maps
17Geo-parsers/gazetteers/metadata
- Geo-parsers identify spatial tags (geo-tags) in data
- Context sensitivity and patterns of usage required
- E.g. Jordan (country) ≠ Jordan (Katie Price)
- Can see an example at http://edina.ac.uk/projects/geoxwalk/geoparser.html
- Relies on and populates gazetteer of associated names
- Emerging standards for geo-parsing, e.g. Open GIS Consortium looking at:
- Gazetteer service
- Geo-coder service
- Web services (WMS/WFS)
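The interplay between a geo-parser, a gazetteer and context can be sketched with a deliberately naive example. The gazetteer entries, approximate co-ordinates and context cues below are all assumptions invented for illustration; a real geo-parser would rely on far richer language processing than a word list.

```python
import re

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "Jordan": (31.0, 36.0),
    "Southampton": (50.9, -1.4),
}

# Hypothetical context cues suggesting a name refers to a person, not a place.
PERSON_CUES = {
    "Jordan": {"model", "celebrity", "Katie"},
}


def geoparse(text):
    """Return (name, co-ordinates) geo-tags for gazetteer names used as places."""
    words = set(re.findall(r"[A-Za-z]+", text))
    tags = []
    for name, coords in GAZETTEER.items():
        used_as_person = bool(PERSON_CUES.get(name, set()) & words)
        if name in words and not used_as_person:
            tags.append((name, coords))
    return tags


travel = geoparse("Flights from Southampton to Jordan resume next week")
gossip = geoparse("Jordan, the model Katie Price, visited Southampton")
```

In the first sentence both names are tagged as places; in the second, the surrounding words suppress the tag for Jordan while Southampton is still resolved through the gazetteer.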
18Web-based systems - Google Earth meets Flickr
- Web-based systems (metacarta, KML, mashup)
19Web-based systems
- World wide wild west of unstructured data
- Increasing use of systems to control, coordinate and make this accessible
- Geo-enabled semantic web raises issues of ontology
- www.metacarta.com provide web-based Geographic Text Search (GTS), which can confine searches by geography, retrieve information that it detects using the keywords, and then display this information geographically on a map interface (working now with Google Earth).
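Confining a keyword search by geography can be sketched generically as below. This is not MetaCarta's implementation or API; it is a minimal illustration that assumes each document has already been assigned a latitude and longitude, that the search box does not cross the 180 degree meridian, and the class, function and sample documents are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeoDoc:
    title: str
    text: str
    lat: float  # latitude already assigned to the document, decimal degrees
    lon: float  # longitude already assigned to the document, decimal degrees


def geo_text_search(docs, keyword, south, west, north, east):
    """Return titles of documents containing the keyword inside the bounding box."""
    needle = keyword.lower()
    return [
        doc.title
        for doc in docs
        if needle in doc.text.lower()
        and south <= doc.lat <= north
        and west <= doc.lon <= east
    ]


# Invented sample documents with approximate co-ordinates.
docs = [
    GeoDoc("Roadworks notice", "Roadworks on the ring road this weekend", 50.9, -1.4),
    GeoDoc("Flood warning", "Flood risk near the river", 50.9, -1.4),
    GeoDoc("Roadworks abroad", "Roadworks close the main highway", 31.0, 36.0),
]

# Keyword "roadworks" restricted to a box around southern England.
local_hits = geo_text_search(docs, "roadworks", south=50.0, west=-2.0, north=52.0, east=0.0)
```

The titles returned could then be plotted at their co-ordinates on a map interface.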
20They know where you live
- MetaCarta(R), Inc., a leading provider of
geographic intelligence, announced today that it
had won a one-year contract with the Department
of Homeland Security which identifies and
assesses current and future threats to the
homeland, maps those threats against the nation's
vulnerabilities, issues timely warnings and takes
preventative and protective action. The product
automatically identifies geographic references
using advanced natural language processing (NLP)
from any type of unstructured content in a
customer's archives such as email, web pages,
newswires or cables. It assigns a latitude and
longitude to these references so that users can
analyze their text archives using geographic
maps, keywords and time as filters. The results
of a query are displayed on a map with icons
representing the locations found in the natural
language text of the documents and as a text
results list. Both the icons and text summaries
are hyperlinked to the documents they represent.
- (Source: http://www.prnewswire.com/cgi-bin/stories.pl?ACCT109STORY/www/story/03-14-2005/0003193909EDATE)
21The future (and summary)
- Structured environment will contain more unstructured data
- Web will continue to provide unstructured distributed data
- Success of semantic-based approach yet to be determined; experience with geospatial data indicates there are significant complexities based around our representations of the real world
- One issue is clear: increasingly less privacy; location is already accessible through mobile phones, and linking this to other data can provide significant intelligence information
- Also clear: data quality issues will persist
- They will still get it wrong!