Preservation of Digital Geospatial Data:
Challenges and Opportunities
Steve Morris, Head of Digital Library Initiatives
North Carolina State University Libraries
NARA Meeting
Dec. 14, 2005
  • Digital Geospatial Data Types
  • Risks to Digital Geospatial Data
  • Overview of NC Geospatial Data Archiving Project
  • Preservation Challenges and Possible Solutions

Geospatial data types Vector data
Geospatial data types Satellite imagery
Geospatial data types Aerial imagery
Geospatial data types Tabular data (w/vector)
Time series vector data: Parcel boundary changes,
2001-2004, North Raleigh, NC
Time series orthoimagery: Vicinity of
Raleigh-Durham International Airport, 1993-2002
Today's geospatial data as tomorrow's cultural
Risks to Digital Geospatial Data
  • Producer focus on current data
  • Time-versioned content generally not archived
  • Future support of data formats in question
  • Vast range of data formats in use, some of them complex
  • Shift to streaming data for access
  • Archives have been a by-product of providing
  • Preservation metadata requirements
  • Descriptive, administrative, technical, DRM
  • Geodatabases
  • Complex functionality

NC Geospatial Data Archiving Project
  • Partnership between university library (NCSU) and
    state agency (NCCGIA)
  • Focus on state and local geospatial content in
    North Carolina (state demonstration)
  • Tied to NC OneMap initiative, which provides for
    seamless access to data, metadata, and inventory
  • Objective: engage existing state/federal
    geospatial data infrastructures in preservation

Targeted Content
  • Resource Types
  • GIS vector (point/line/polygon) data
  • Digital orthophotography
  • Digital maps
  • Tabular data (e.g. assessment data)
  • Content Producers
  • Mostly state, local, regional agencies
  • Some university, not-for-profit, commercial
  • Selected local federal projects

Local Government GIS Archival Issues
  • Data resources are highly distributed and subject
    to frequent update
  • More detailed, current, and accurate than
    federal/state data resources
  • North Carolina local agency GIS environment
  • 100 counties, 95 with GIS
  • 85 counties with high resolution orthophotography
  • Growing number of municipal systems
  • Value: 162 million plus investment (est. in

Work Plan in a Nutshell
  • Work from existing data inventories
  • NC OneMap Data Sharing Agreements as the
    blanket, individual agreements as the quilt
  • Partnership work with existing geospatial data
    infrastructures (state and federal)
  • Technical approach
  • METS with FGDC, PREMIS?, GeoDRM?
  • DSpace now, re-ingest into a different environment
  • Web services consumption for archival development

NCGDAP Philosophy of Engagement
Provide feedback to producer organizations/inform
state geospatial infrastructure
Take the data in the manner in which it can
be obtained
Wrangle and archive data
Note the "Project" in North Carolina Geospatial
Data Archiving Project: the process, the
learning experience, and the engagement with
geospatial data infrastructures are more
important than the archive
Big Challenges
  • Format migration paths
  • Management of data versions over time
  • Preservation metadata
  • Harnessing geospatial web services
  • Preserving cartographic representation
  • Keeping content repository-agnostic
  • Preserving geodatabases
  • More

Vector Data Format Issues
  • Vector data much more complicated than image data
  • Archiving vs. Permanent access
  • An open pile of XML might make an archive, but
    if using it requires a team of programmers to do
    digital archaeology then it does not provide
    permanent access
  • Piles of XML need to be widely understood piles
  • GML needs widely accepted application schemas
    (like OSMM?)
  • The Geodatabase conundrum
  • Export feature classes, and lose topology,
    annotation, relationships, etc.
  • or use the Geodatabase as the primary archival
    platform (some are now thinking this way)

GIS Software Used by NC Local Agencies
Source: NC OneMap Data Inventory 2004
Vector Data Format Options
  • Option A: use an open format and accept a really
    unfortunate transformation and limited vendor
    support for the output object
  • Option B: use a closed format but retain the
    original content and count on short- and
    medium-term vendor support
  • Option C: do both to buy time, and look for an
    open, ASCII-based solution (watch GML activity)
  • No sweet spot, just an evolving mix of flawed
    options that are used in combination

Geography Markup Language Issues
  • GML still more useful as a transfer format than
    an archival format, support limited even for
  • Permanent access requirements
  • profiles and application schemas that are widely
    understood and supported, to avoid requiring
    digital archaeology
  • role of GML Simple Features Profile?
  • Assessing formats for preservation:
    sustainability factors, quality and functionality factors
  • Apply same approach to GML profiles and
    application schemas?

Geography Markup Language Issues
  • Plans for an environmental scan of existing GML
    profiles and application schemas
  • schema name (e.g. OSMM, top10NL, ESRI GML,
  • responsible agency; schema has official
    government status?
  • GML version; known unsupported GML components
  • schema history; known interoperation with other
  • vendor support; translator support; stability
    over time
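The scan questions above map naturally onto one record per schema. A minimal Python sketch, assuming a hypothetical `GmlSchemaScanEntry` record whose field names are our own paraphrase of the slide's list (not an established vocabulary); `None` marks a question the scan has not answered yet:

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class GmlSchemaScanEntry:
    """One row of a hypothetical scan of GML profiles and application schemas.

    Field names paraphrase the scan questions; None means "not yet answered".
    """
    schema_name: str
    responsible_agency: Optional[str] = None
    official_government_status: Optional[bool] = None
    gml_version: Optional[str] = None
    unsupported_components: Optional[List[str]] = None
    schema_history: Optional[str] = None
    known_interoperation: Optional[List[str]] = None
    vendor_support: Optional[List[str]] = None
    translator_support: Optional[List[str]] = None
    stable_over_time: Optional[bool] = None


def open_questions(entry: GmlSchemaScanEntry) -> List[str]:
    """Return the names of scan fields that are still unanswered (None)."""
    return [f.name for f in fields(entry) if getattr(entry, f.name) is None]
```

For example, `open_questions(GmlSchemaScanEntry("OSMM"))` lists every field except `schema_name`, which makes gaps in the scan easy to report across many schemas.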

Managing Time-versioned Content
  • Many local agency data layers continuously updated
  • E.g., some county cadastral data updated
    daily; older versions not generally available
  • Individual versioned datasets will wander off
    from the archive
  • How do users get current metadata/DRM/object
    from a versioned dataset found in the wild?
  • How do we certify concurrency and agreement
    between the metadata and the data?
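One way to approach the concurrency question is to record a checksum of the data in the metadata at ingest and recompute it whenever a copy turns up. A minimal sketch, assuming the archive stores a SHA-256 digest in its own metadata record (a hypothetical practice; FGDC metadata does not require one):

```python
import hashlib


def sha256_of_bytes(data: bytes) -> str:
    """Hex SHA-256 digest of an in-memory dataset."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def metadata_matches_data(recorded_digest: str, actual_digest: str) -> bool:
    """True if the digest recorded in the metadata equals the digest of the data in hand."""
    return recorded_digest.strip().lower() == actual_digest.lower()
```

A mismatch shows only that metadata and data have drifted apart; it does not say which of the two is current.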

Managing Time-versioned Content
  • Can we manage the relationship loosely using a
    persistent identifier link to a parent object?

[Diagram: Persistent ID Resolver and Parent Object Manager]
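The loose parent-object relationship proposed above can be sketched as an in-memory resolver. Everything here is hypothetical: the class names, the ID strings, and the assumption that the parent record holds the current metadata and rights statement. A production resolver would sit behind a persistent identifier service rather than a Python dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParentObject:
    """Hypothetical parent record holding current metadata/rights for a dataset series."""
    parent_id: str
    current_metadata: str
    rights_statement: str
    versions: List[str] = field(default_factory=list)


class PersistentIdResolver:
    """Minimal in-memory sketch: maps each version ID to its parent object."""

    def __init__(self) -> None:
        self._parents: Dict[str, ParentObject] = {}
        self._version_to_parent: Dict[str, str] = {}

    def register_parent(self, parent: ParentObject) -> None:
        self._parents[parent.parent_id] = parent

    def add_version(self, parent_id: str, version_id: str) -> None:
        parent = self._parents[parent_id]  # KeyError if the parent was never registered
        parent.versions.append(version_id)
        self._version_to_parent[version_id] = parent_id

    def resolve(self, version_id: str) -> ParentObject:
        """Given an ID found on a copy "in the wild", return its current parent record."""
        return self._parents[self._version_to_parent[version_id]]
```

Because versions point at the parent rather than carrying their own metadata, updating the parent once changes what every old version resolves to.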
Preservation Metadata Issues
  • FGDC Metadata
  • Many flavors, incoming metadata needs processing
  • Cross-walk elements to PREMIS, MODS?
  • Metadata wrapper/Content packaging
  • METS (Metadata Encoding and Transmission
    Standard) vs. other industry solutions
  • Need a geospatial industry solution for the
    METS-like problem
  • GeoDRM: a likely trigger/wrapper to enforce
    licensing (MPEG 21 references in OGIS Web
    Services 3)
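To make the METS option concrete, the sketch below wraps an FGDC record in a METS document with one `dmdSec`, a `fileSec` pointing at the dataset, and the required `structMap`. It uses only `xml.etree.ElementTree`; the object ID and file URL are placeholders, and a real package would also carry administrative metadata (for example PREMIS in an `amdSec`), which is omitted here.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualified METS tag name in ElementTree's {namespace}tag form."""
    return f"{{{METS_NS}}}{tag}"


def wrap_fgdc_in_mets(fgdc_xml: str, data_href: str, obj_id: str) -> str:
    """Return a minimal METS document embedding FGDC metadata and pointing at one data file."""
    root = ET.Element(m("mets"), {"OBJID": obj_id})

    # Descriptive metadata: the FGDC record embedded as-is.
    dmd = ET.SubElement(root, m("dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, m("mdWrap"), {"MDTYPE": "FGDC"})
    ET.SubElement(wrap, m("xmlData")).append(ET.fromstring(fgdc_xml))

    # File inventory: one dataset file referenced by URL.
    grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"))
    file_el = ET.SubElement(grp, m("file"), {"ID": "FILE1"})
    ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": data_href})

    # Structural map (required by METS) tying the metadata to the file.
    div = ET.SubElement(ET.SubElement(root, m("structMap")), m("div"), {"DMDID": "DMD1"})
    ET.SubElement(div, m("fptr"), {"FILEID": "FILE1"})
    return ET.tostring(root, encoding="unicode")
```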

Metadata Availability
Harnessing Geospatial Web Services
Geospatial Web Service Types
  • Image services
  • Deliver image resulting from query against
    underlying data
  • Limited opportunity for analysis
  • Feature services
  • Stream actual feature data, greater opportunity
    for data analysis
  • Other
  • Geocoding services
  • Routing
  • etc.

Geospatial Web Services Rights Issues Example
Desktop GIS-accessible ArcIMS
  • 39 of 100 NC counties have desktop GIS-accessible
    ArcIMS services
  • It is difficult to know how many of these
    counties actually expect users to either
  • A) access data through desktop GIS for viewing
    only, or
  • B) extract and download data

Harnessing Geospatial Web Services
  • Automated content identification
  • capabilities files, registries, catalogs
  • WMS (Web Map Service) for batch extraction of
    image atlases
  • last-ditch capture option
  • preserve cartographic representation
  • retain records of decision-making process
  • feature services (WFS) later
  • Rights issues in the web services space are
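Batch extraction of an image atlas from a WMS amounts to tiling an area of interest and issuing one GetMap request per tile. The sketch below only builds WMS 1.1.1 GetMap URLs (no network access); the server URL, layer name, and tile size are hypothetical, and whether a given county's service permits this kind of harvesting is exactly the rights question raised above.

```python
from typing import Iterator, Tuple
from urllib.parse import urlencode


def wms_getmap_urls(
    base_url: str,
    layer: str,
    bbox: Tuple[float, float, float, float],
    cols: int,
    rows: int,
    tile_width_px: int = 1024,
    srs: str = "EPSG:4326",
    image_format: str = "image/png",
) -> Iterator[str]:
    """Yield WMS 1.1.1 GetMap URLs tiling bbox (minx, miny, maxx, maxy) into cols x rows."""
    minx, miny, maxx, maxy = bbox
    dx = (maxx - minx) / cols
    dy = (maxy - miny) / rows
    # Keep the image aspect ratio equal to the tile's aspect ratio in map units.
    tile_height_px = max(1, round(tile_width_px * dy / dx))
    for r in range(rows):
        for c in range(cols):
            tile = (minx + c * dx, miny + r * dy, minx + (c + 1) * dx, miny + (r + 1) * dy)
            params = {
                "SERVICE": "WMS",
                "VERSION": "1.1.1",
                "REQUEST": "GetMap",
                "LAYERS": layer,
                "STYLES": "",  # empty value requests the layer's default style
                "SRS": srs,
                "BBOX": ",".join(f"{v:.6f}" for v in tile),
                "WIDTH": tile_width_px,
                "HEIGHT": tile_height_px,
                "FORMAT": image_format,
            }
            yield f"{base_url}?{urlencode(params)}"
```

Requesting with an empty `STYLES` captures the cartography the agency itself publishes, which is the point of using WMS as a capture option for cartographic representation.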

Web mash-ups and the New Mainstream Geospatial
Web Services
Preserving Cartographic Representation
  • The true counterpart of the old map is not the
    GIS dataset, but rather the cartographic
    representation that builds on that data
  • Intellectual choices about symbolization, layer
  • Data models, analysis, annotations
  • Cartographic representation typically encoded in
    proprietary files (.avl, .lyr, .apr, .mxd) that
    do not lend themselves well to migration
  • Symbologies have meaning to particular
    communities at particular points in time;
    preserving information about symbol sets and
    their meaning is a different problem

Preserving Cartographic Representation
  • Image-based approaches
  • Generate images using Map Book or similar tools
  • Harvest existing atlas images
  • Capture atlases from WMS servers
  • Export layouts or maps to image
  • Vector-based approaches
  • Store explicitly in the data format (e.g. Feature
    Class Representation in ArcGIS 9.2)
  • Archive and upward-migrate existing files .avl,
    .apr, .lyr, .mxd, etc.
  • SVG, VML or other XML approaches
  • Other?
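As an illustration of the "SVG, VML or other XML approaches" option, the sketch below writes one polygon to SVG with its symbolization (fill, stroke, line width) stated explicitly, so the cartographic choice survives independently of any .apr or .mxd project file. The colors, the label, and the simple y-axis flip (no real map projection handling) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Sequence, Tuple

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG elements without a prefix

Ring = Sequence[Tuple[float, float]]


def _num(v: float) -> str:
    """Compact number text; adding 0.0 turns -0.0 into 0.0."""
    return f"{v + 0.0:g}"


def polygon_to_svg(rings: List[Ring], fill: str, stroke: str,
                   stroke_width: float, label: str) -> str:
    """Write one polygon (outer ring plus optional holes) as SVG with explicit symbology."""
    xs = [x for ring in rings for x, _ in ring]
    ys = [y for ring in rings for _, y in ring]
    # SVG's y axis points down, so y is negated to keep north at the top.
    viewbox = [min(xs), -max(ys), max(xs) - min(xs), max(ys) - min(ys)]
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"viewBox": " ".join(_num(v) for v in viewbox)})
    d = " ".join(
        "M " + " L ".join(f"{_num(x)} {_num(-y)}" for x, y in ring) + " Z" for ring in rings
    )
    path = ET.SubElement(svg, f"{{{SVG_NS}}}path", {
        "d": d,
        "fill": fill,
        "stroke": stroke,
        "stroke-width": _num(stroke_width),
        "fill-rule": "evenodd",  # inner rings render as holes
    })
    ET.SubElement(path, f"{{{SVG_NS}}}title").text = label  # human-readable label
    return ET.tostring(svg, encoding="unicode")
```

This records how a feature was drawn, but not what the symbol meant to its community, which remains the separate problem noted above.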

Preserving Cartographic Representation
Repository Architecture Issues
  • Interest in how geospatial content interacts with
    widely available digital repository software
  • Focus on salient, domain-specific issues
  • Challenge: remain repository-agnostic
  • Avoid imprinting on repository software
  • Preservation package should not be the same as
    the ingest object of the first environment
  • Tension between exploiting repository software
    features vs. becoming software dependent
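One way to keep the preservation package distinct from the first repository's ingest object is to hold content in a neutral, self-describing layout. The sketch below follows the BagIt convention (a later IETF specification, RFC 8493, not something the project is known to have used): a `bagit.txt` declaration, payload under `data/`, and a SHA-256 manifest. It assumes payload paths use forward slashes and contain no characters that BagIt would require percent-encoding; the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def build_bag(payload: Dict[str, bytes]) -> Dict[str, bytes]:
    """Return {relative path: bytes} for a minimal BagIt-style package.

    Assumes payload paths use '/' and need no percent-encoding.
    """
    files: Dict[str, bytes] = {
        "bagit.txt": b"BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
    }
    manifest_lines = []
    for rel_path, content in sorted(payload.items()):
        bag_path = f"data/{rel_path}"
        files[bag_path] = content
        # Manifest line: checksum, whitespace, path relative to the bag root.
        manifest_lines.append(f"{hashlib.sha256(content).hexdigest()}  {bag_path}\n")
    files["manifest-sha256.txt"] = "".join(manifest_lines).encode("utf-8")
    return files


def write_bag(root: Path, files: Dict[str, bytes]) -> None:
    """Write the package to disk under root, creating directories as needed."""
    for rel_path, content in files.items():
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
```

Any repository (DSpace or a successor) can ingest such a package, and the package can be regenerated from the repository's copy without depending on either system's internal object model.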

Preserving Geodatabases
  • Spatial databases in general vs. ESRI Geodatabase
  • Not just data layers and attributes; also
    topology, annotation, relationships, behaviors
  • ESRI Geodatabase archival issues
  • XML Export, Geodatabase History, File
    Geodatabase, Geodatabase Replication
  • Some looking to Geodatabase as archival platform
    (in addition to feature class export)

Geodatabase Availability
  • Local agencies, especially municipalities, are
    increasingly turning to the ESRI Geodatabase
    format to manage geospatial data.
  • According to the 2003 Local Government GIS Data
    Inventory, 10.0% of all county framework data and
    32.7% of all municipal framework data were
    managed in that format.

Evolving Geodatabase Handling Approaches
  • Original Proposal (Nov. 2003): export feature
    classes as shapefiles; archive Geodatabases less
    than 2 GB in size
  • Finalized Work Plan (Dec. 2004): also export
    content as Geodatabase XML
  • Possible Future Work Plan Changes: explore
    maintenance of some archival content in
    Geodatabase form; explore Geodatabase replication
    as an archive development approach; archive
    Geodatabases of unlimited size
Efficient Content Replication
  • Content replication also needed for
  • Disaster preparedness
  • State and federal data improvement projects
  • Aggregation by regional geospatial web service
  • WFS, e.g. efficiency in complete content
  • Rsync-like function, plus rights management,
    inventory processes, metadata management,
    informed by data update cycles
  • Archiving delta files vs. complete replication:
    need to avoid requiring digital archaeology in
    the future
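An rsync-like step can be approximated at file level by comparing checksum manifests from successive harvests. This is a simplification (rsync itself works on blocks within files), and the function names are hypothetical. `apply_delta` is included to illustrate the digital-archaeology concern: a delta is only usable if the baseline it applies to is archived alongside it.

```python
from typing import Dict, List, NamedTuple


class Delta(NamedTuple):
    added: List[str]
    changed: List[str]
    removed: List[str]


def diff_manifests(previous: Dict[str, str], current: Dict[str, str]) -> Delta:
    """Compare path -> checksum manifests from two successive harvests."""
    added = sorted(p for p in current if p not in previous)
    removed = sorted(p for p in previous if p not in current)
    changed = sorted(p for p in current if p in previous and current[p] != previous[p])
    return Delta(added, changed, removed)


def apply_delta(baseline: Dict[str, bytes], delta: Delta,
                new_content: Dict[str, bytes]) -> Dict[str, bytes]:
    """Rebuild the current state from an archived baseline plus one delta."""
    result = {p: c for p, c in baseline.items() if p not in delta.removed}
    for p in delta.added + delta.changed:
        result[p] = new_content[p]
    return result
```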

Points of Engagement with the Open Geospatial
Consortium (OGC)
  • GML for archiving
  • GeoDRM -- Adding preservation use cases
  • Content Packaging -- Industry solution?
  • Web Services Context Documents
  • Can we save data state as well as application
  • Content Replication
  • Is this layer in the architecture?
  • Persistent Identifiers

Project Outcomes
  • Demonstration archive
  • Outreach activity: planting seeds
  • International, national, state, local, commercial
  • Learning experience, informing
  • Spatial data infrastructure
  • Commercial vendors (data/software/consulting)
  • Repository software communities
  • Metadata practice (both GIS and preservation)
  • Rights management developments
  • Data and interoperability standards

Content Identification and Selection
  • Work from NC OneMap Data Inventory
  • Combine with inventory information from various
    state agencies and from previous NCSU efforts
  • Develop methodology for selecting from among
    early, middle, and late stage products
  • Develop criteria for time series development
  • Investigate use of emerging Open Geospatial
    Consortium technologies in data identification
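Criteria for time-series development could be expressed as a rule for thinning frequently updated snapshots. A minimal sketch, assuming the hypothetical criterion "keep the earliest available snapshot in each period", with the period (calendar year by default) supplied as a function:

```python
from datetime import date
from typing import Callable, Dict, Hashable, Iterable, List


def select_snapshots(snapshot_dates: Iterable[date],
                     period: Callable[[date], Hashable] = lambda d: d.year) -> List[date]:
    """Keep the earliest snapshot in each period; return the kept dates in order."""
    chosen: Dict[Hashable, date] = {}
    for d in snapshot_dates:
        key = period(d)
        if key not in chosen or d < chosen[key]:
            chosen[key] = d
    return sorted(chosen.values())
```

Passing `period=lambda d: (d.year, (d.month - 1) // 3)` keeps one snapshot per quarter instead, which may suit layers such as daily-updated cadastral data.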

Content Acquisition
  • Work from NC OneMap Data Sharing Agreements as a
    starting point (the blanket)
  • Secure individual agreements (the quilt)
  • Investigate use of OGC technologies in capture
  • Explore use of METS as a metadata wrapper
  • Ingest FGDC metadata; crosswalk to MODS? PREMIS?
  • Maybe METS DRM short term, GeoDRM long term
  • Consider links to services version management
  • Get the geospatial community to tackle the
    content packaging problem (maybe MPEG 21?)
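A crosswalk from FGDC to MODS can start with a handful of core elements. The sketch below maps FGDC CSDGM title, originator, publication date, abstract, and bounding coordinates to MODS; it is a partial, illustrative mapping, and the text layout used for the MODS `coordinates` value is our own assumption rather than a prescribed encoding.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def _m(tag: str) -> str:
    """Qualified MODS tag name in ElementTree's {namespace}tag form."""
    return f"{{{MODS_NS}}}{tag}"


def fgdc_to_mods(fgdc_xml: str) -> str:
    """Map a few core FGDC CSDGM elements to MODS (partial, illustrative crosswalk)."""
    fgdc = ET.fromstring(fgdc_xml)  # expects an un-namespaced <metadata> root
    mods = ET.Element(_m("mods"))
    cite = "idinfo/citation/citeinfo"

    title = fgdc.findtext(f"{cite}/title")
    if title:
        ET.SubElement(ET.SubElement(mods, _m("titleInfo")), _m("title")).text = title
    for origin in fgdc.findall(f"{cite}/origin"):
        ET.SubElement(ET.SubElement(mods, _m("name")), _m("namePart")).text = origin.text
    pubdate = fgdc.findtext(f"{cite}/pubdate")
    if pubdate:
        ET.SubElement(ET.SubElement(mods, _m("originInfo")), _m("dateIssued")).text = pubdate
    abstract = fgdc.findtext("idinfo/descript/abstract")
    if abstract:
        ET.SubElement(mods, _m("abstract")).text = abstract

    bounding = fgdc.find("idinfo/spdom/bounding")
    if bounding is not None:
        w, e, n, s = (bounding.findtext(t) for t in ("westbc", "eastbc", "northbc", "southbc"))
        carto = ET.SubElement(ET.SubElement(mods, _m("subject")), _m("cartographics"))
        # Text layout of the coordinates string is an assumption, not a MODS requirement.
        ET.SubElement(carto, _m("coordinates")).text = f"W {w}, E {e}, N {n}, S {s}"
    return ET.tostring(mods, encoding="unicode")
```

Elements with no MODS counterpart (most of the FGDC data quality and spatial reference sections) would still need to travel in the original FGDC record, for example inside the METS wrapper.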

Partnership Building
  • Work within context of the NC OneMap initiative
  • State, local, federal partnership
  • State expression of the National Map
  • Defined characteristic: historic and temporal
    data will be maintained and available
  • Advisory Committee drawn from the NC Geographic
    Information Coordinating Council subcommittees
  • Seek external partners
  • National States Geographic Information Council
  • FGDC Historical Data Committee
  • more

Content Retention and Transfer
  • Ingest into DSpace
  • Explore how geospatial content interacts with
    existing digital repository software environments
  • Investigate re-ingest into a second platform
  • Challenge: keep the collection repository-agnostic
  • Start to define format migration paths
  • Special problem: geodatabases
  • Pursue long-term solution
  • Roles of data-producing agencies, state agencies,
    NC OneMap, NCSU

Project Status
  • Completing inventory analysis stage
  • Storage system and backup deployed
  • DSpace deployed to production
  • Metadata workflow finalized
  • Ingest workflow near finalization
  • Content migration workflow near finalization
  • Regional site visits planned for coming months
  • Wide range of outreach/collaboration: FGDC, ESRI,
    EDINA (JISC), USGS, OGC, TRB, etc.
  • Pilot project: georegistering digital archival
    geologic maps

Contact: Steve Morris, Head, Digital Library
Initiatives, NCSU Libraries, ph (919)