Title: Preserving State and Local Agency Digital Geospatial Data NC Geospatial Data Archiving Project (NCGDAP) North Carolina State University Libraries North Carolina Center for Geographic Information
1Preserving State and Local Agency Digital Geospatial Data
NC Geospatial Data Archiving Project (NCGDAP)
North Carolina State University Libraries
North Carolina Center for Geographic Information Analysis
Presented by Steve Morris, Head of Digital Library Initiatives, NCSU Libraries
URISA Annual Conference
August 22, 2007
2NC Geospatial Data Archiving Project
- Partnership between university library (NCSU) and NC Center for Geographic Information Analysis
- Part of the Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP)
- Focus on state and local geospatial content in North Carolina (state demonstration)
- Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventories
- Objective: engage existing state/federal geospatial data infrastructures in preservation
Serve as catalyst for discussion within industry
3Collection Focus State and Local Government
Geospatial Data
- 96 of 100 North Carolina counties have GIS systems, as do many municipalities
- Over 30 state agency data producers
- Exceptional value
- Detailed, current, accurate
- Exceptional risk
- Inconsistent or nonexistent archiving practices
- Complicated formats and complex objects
Source: NC OneMap
4NCGDAP Data Types Digital Orthophotography
- All 100 NC counties with orthos
- 1-5 flight years per county
- 30-300 GB per flight
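As a rough illustration of the storage volumes these figures imply, the sketch below multiplies them out. It assumes every county sits at the same end of both ranges, which the slide does not state, so the results are only outer bounds. The function name `ortho_volume_gb` is hypothetical, and terabytes are taken as 1,000 GB.

```python
def ortho_volume_gb(counties, flights_per_county, gb_per_flight):
    """Total orthophoto volume in GB, assuming all counties are alike."""
    return counties * flights_per_county * gb_per_flight


COUNTIES = 100  # all 100 NC counties have orthos (from the slide)

# Outer bounds: every county at the low end, or at the high end, of both ranges.
low_gb = ortho_volume_gb(COUNTIES, flights_per_county=1, gb_per_flight=30)
high_gb = ortho_volume_gb(COUNTIES, flights_per_county=5, gb_per_flight=300)

print(f"Lower bound: {low_gb / 1000:.0f} TB")
print(f"Upper bound: {high_gb / 1000:.0f} TB")
```

Under these assumptions the statewide orthophoto holdings fall somewhere between 3 TB and 150 TB; the real figure depends on the per-county mix of flight years and flight sizes.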
5NCGDAP Data Types Vector GIS
- Detailed, accurate, current
- Frequently updated
- Cadastral (tax parcels)
- Street centerlines
- Zoning
- Topographic contours
- School, sheriff, fire
- Voting precincts
- More
6NCGDAP Data Types Other (now and future)
Digital cartographic products: web services
Digital cartographic products: files
Remote sensing, e.g. LIDAR data
Place-based data
7Carrboro, NC Population 17,797 (2005 est.)
22 downloadable GIS data layers
10 web mapping applications
3 OGC WMS services (web services)
9 downloadable PDF map layers
8Digital Preservation Points of Failure
- Data is not saved, or
- can't be found, or
- media is obsolete, or
- media is corrupt, or
- format is obsolete, or
- file is corrupt, or
- meaning is lost
Solutions: Migration, Emulation, Encapsulation, XML
9Value in Older Data Cultural Heritage
Future uses of data are difficult to anticipate
(as with Sanborn Maps)
10Value in Older Data Solving Business Problems
Land use change analysis
Site location analysis
Real estate trends analysis
Disaster response
Resolution of legal challenges
Impervious surface maps
Suburban Development 1993/2002 Near
Mecklenburg-Cabarrus County border
11NC Spatial Data Infrastructure NC OneMap
- NC OneMap is a next generation mechanism to coordinate and disseminate geographic information in North Carolina and interact with the NSDI.
- Objectives
- Build a common understanding of North Carolina data resources
- Enable widespread access and distribution of geospatial data
12NC OneMap
- Objectives (cont.)
- Develop ongoing data inventory for all geospatial data holdings: RAMONA
- http://nc.gisinventory.net
- Develop content standards for key data themes
- NC Geographic Information Coordinating Council (GICC)
- One of the defined characteristics of NC OneMap is that "Historic and temporal data will be maintained and available."
13NCGDAP Leveraging Existing SDI
- NC OneMap: "Historic and temporal data will be maintained and available"
- Metadata outreach and content standards
- Data inventory (RAMONA)
- Emerging data distribution networks
- Centerline Data Distribution System
- Orthophoto sneakernet
- More
- Data sharing agreements
- Regional partnerships
14Stewardship Informing Other Infrastructure
- NC GIS Inventory
- Efficient data identification
- Adding preservation elements
Orthophoto Data Distribution System: Efficient transfer of large quantities of imagery
- NC OneMap Data Download and Viewer
- Public access
- Data visualization
Street Centerline Data Distribution System: Efficient transfer of data from 100 counties, with metadata and clarified rights
15Frequency of Capture Survey Objectives
- Targeted all 100 counties and 25 largest municipalities
- Tried to avoid focus on backup strategies (difficult)
- Constraint: Keep survey short and interesting
- Concurrent with RAMONA inventory push
- Needed to avoid adding to contact fatigue associated with various survey efforts
- Work towards best practices in archives
- Provide guidance back to local producers
- Inform efforts at State Archives
- Engage community in discussion about archiving
- Harvest use cases for older data to sell value of archives
16Question 1 (the filter)
Do you create periodic snapshots of any vector
datasets for long-term retention and archiving?
- Response
- yes: 65.3%
- no: 34.7%
- (out of a 57.6% response rate)
Respondents answering "No" automatically skip most of the remaining questions
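For a sense of scale, the percentages can be turned back into approximate respondent counts. This is a minimal sketch that assumes the response rate applies to all 125 surveyed jurisdictions (100 counties plus the 25 largest municipalities); the counts are inferred from the rounded percentages and are not stated on the slides.

```python
SURVEYED = 100 + 25      # counties + largest municipalities targeted
RESPONSE_RATE = 0.576    # 57.6% response rate
YES_SHARE = 0.653        # 65.3% answered yes to Question 1

# Counts are inferred from rounded percentages, so treat them as approximate.
respondents = round(SURVEYED * RESPONSE_RATE)
yes_count = round(respondents * YES_SHARE)
no_count = respondents - yes_count

print(f"Respondents: {respondents}")
print(f"Create periodic snapshots: {yes_count}")
print(f"Do not: {no_count}")
```

Under that assumption, roughly 72 jurisdictions responded, about 47 of which create periodic snapshots and about 25 of which do not.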
17Key Results Capture Frequency
18Key Results Formats
19Key Results Formats
20Key Results Metadata
21Key Results Storage
22Key Results Storage
23Key Results Reasons for Archiving
24Survey Observations
- Process of survey formulation and implementation helped to socialize the problem of archiving data
- Local innovation needs to be mined further to inform development of best practices
- Business drivers for archiving need more study (e.g., stated adherence to retention policy)
- Exposure to peer practice encourages archiving
- Pronounced local interest in scanning/rectifying older analog maps and imagery
25NCGDAP Next Steps
- Continuing development of demonstration repository
- Formalizing involvement of State Archives
- Further development of data distribution infrastructure (e.g. Centerline Data Distribution System)
- Increased temporal content in NC OneMap access system (data download and web services)
- Open Geospatial Consortium Data Preservation Working Group
26Questions?
Steve Morris, Head, Digital Library Initiatives
Steven_Morris_at_ncsu.edu
Web site: http://www.lib.ncsu.edu/ncgdap/
27Holding
28NC OneMap Viewer
29Coordinated Content Transfer
- Will allow one data snapshot to be accessible by multiple agencies
- Question: Capture frequency of data snapshot?
- Survey in the works to identify local government best practices, consumer agencies' needs
- Working Group for Roads and Transportation (WGRT)
- Stakeholder group working to build data depository for statewide local road data
- First serious effort to develop a plan for local-to-state data sharing on a regular basis
30Framework Data Questions
- Targeting Key, Changing Framework Layers
- Parcels
- Street centerlines
- Jurisdictional boundaries
- Zoning
- Questions
- Capture frequency
- Format of snapshot
- Format conversion involved?
- Attributes saved with the geometry?
31Other Questions
- Technical questions
- Storage environment?
- Onsite or offsite storage?
- Policy questions
- Provide access to snapshots?
- What business rules drive archive development?
- Other archives questions
- How far back do archives go?
- What other data layers saved?
- Disposition of superseded orthophotos?
- Scanning/rectification of analog maps or imagery?
32Solutions Content Exchange Infrastructure
- Volume of state/federal requests for local data (contact fatigue) spurs rethinking of archive strategy for data acquisition
- Leveraging more compelling business reasons to put the data in motion (disaster preparedness, highway construction, census, ...)
- Content exchange networks
- Minimize need to make contact
- Add technical, administrative, descriptive metadata
- Establish rights and provenance
33Signs of Hope
- Software vendors are more keenly aware of temporal data management as a customer problem
- Consulting firms increasingly see temporal data management and archiving as a business opportunity
- Innovative practices emerging at local and state level to complement and inform national level activities
Viral adoption of archiving practices vs. mandated archiving practices: which will have more effect?
34NC Frequency of Capture Survey
- Survey objective
- Document current practices for obtaining archival snapshots of county/municipal geospatial vector data layers
- Seek guidance about frequency of capture
- Survey topics
- General questions about data archiving practice
- Specific questions about parcels, street centerlines, jurisdictional boundaries, and zoning
- Survey subjects
- All 100 counties and 25 municipalities
- 58% response rate
- Survey conducted September 2006
35Survey Results Overview
- Two-thirds of responding agencies create and retain periodic snapshots
- Long-term retention more common in counties with larger populations
- Storage environments vary, with servers and CD-ROMs most common
- Offsite storage (or both onsite and offsite) is used by nearly half of the respondents
- Popularity of historic images has resulted in scanning and geo-referencing of hardcopy aerial photos among one-third of the respondents
36Toss
37Challenge Cartographic Representation
Counterpart to the map is not just the dataset
but also models, symbolization, classification,
annotation, etc.
38Challenge Vector Data Formats
- No widely-supported, open vector formats for geospatial data
- Spatial Data Transfer Standard (SDTS) not widely supported
- Geography Markup Language (GML): diversity of application schemas and profiles a challenge for permanent access
- Spatial Databases
- The whole is more than the sum of the parts, and the whole is very difficult to preserve
- Can export individual data layers for curation, but relationships and context are lost
- Some thinking of using the spatial database as the primary archival platform
39Challenge Geospatial Web Services
- How to capture records from decision-making processes?
- Possible: Atlas collections from automated image capture
- Web 2.0 impact: Emerging tiling and caching schemes (archive target?)
40Geospatial Data Risks
- Producer focus on current data
- Future support of data formats in question
- Shift to web services- and API-based access
- Inadequate or nonexistent metadata
- Increasing use of spatial databases for data
management
Many digital archiving challenges
41Different Ways to Approach Preservation
- Technical solutions: How do we archive acquired content over the long term?
- Tools
- Hardware
- Software
- Cultural/Organizational solutions: How do we make the data more preservable, and more prone to be archived, from point of production?
- Collaboration
- Education
- Feedback
42Geospatial data types Cartographic Project Files
Counterpart to the map is not just the dataset
but also models, symbolization, classification,
annotation, etc.
44Challenge Vector Data Formats
- No widely-supported, open vector formats for geospatial data
- Spatial Data Transfer Standard (SDTS) not widely supported
- Geography Markup Language (GML): diversity of application schemas and profiles threatens permanent access
- Spatial Databases
- The whole is more than the sum of the parts, and the whole is very difficult to preserve
- Can export individual data layers for curation
- Some thinking of using the spatial database as the primary archival platform
45Cultural/Organizational Approaches
- Feedback to metadata outreach program
- Feedback to coordinating bodies on adherence to content standards
- Engage existing spatial data infrastructure in archiving and preservation
- Engage software vendors and standards community
- Cross-fertilize with other national archiving efforts
Current use and data sharing requirements, not archiving needs, drive improved preservability of content and improvement of metadata
46Technical Approaches
- Receive data "as is": variety of distribution methods
- Migration of some at-risk formats
- Metadata remediation, standardization, and synchronization
- Distilling complex objects into repository ingest items (not easy)
- Using DSpace for demonstration purposes
- In development: use of METS record as dormant item "brain" within the repository
Some unsustainable activities for learning experience
47Project Surprises Engaging Standards Efforts
- Partnered with EDINA (UK) and NARA to approach the Open Geospatial Consortium (OGC) in 2005-2006
- Working Group charter approved by OGC Technical Committee plenary Dec. 2006
48General Workflow
- Receive Data from Agency
- Copy data from agency source to NCSU workstation
- Create DSpace collection space for the data
- Create administrative metadata
- Process geospatial metadata
- Scan geospatial formats and migrate to archival format
- Ingest original and archival data objects, and geospatial administrative metadata to DSpace
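The first steps of this workflow (copying agency data to a working area and creating basic administrative metadata) could be sketched roughly as follows. This is not the project's actual tooling: the function names, the `original` subfolder, the `manifest.json` layout, and the choice of SHA-256 checksums are all assumptions made for illustration, and the format migration and DSpace ingest steps are deliberately left out.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def receive_and_describe(source_dir: Path, work_dir: Path, agency: str) -> Path:
    """Copy agency data into work_dir and write a simple manifest.

    The manifest is a hypothetical stand-in for administrative metadata:
    one entry per file with its relative path, size, and checksum.
    """
    target = work_dir / "original"
    shutil.copytree(source_dir, target)  # fails if target already exists

    files = []
    for path in sorted(p for p in target.rglob("*") if p.is_file()):
        files.append({
            "path": path.relative_to(target).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })

    manifest = {
        "agency": agency,
        "received_utc": datetime.now(timezone.utc).isoformat(),
        "files": files,
    }
    manifest_path = work_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path
```

Calling `receive_and_describe(Path("incoming/county_x"), Path("work/county_x"), "County X")` (hypothetical paths) would copy the delivery untouched and record checksums that later steps can use to confirm the files have not changed.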
49Changes in the Domain Mashups, Google Earth, Map
APIs, and More
- Huge new audience for geospatial content
- Massive crossover of mainstream IT to geospatial, spurring open source activities, e.g. WMS tiling and caching
- "Good enough" approaches to data (formats, quality, standards)
50Project Surprises Handling PDF as a Geospatial
Format
- The true counterpart to the old map is not the GIS dataset but rather the finished geographic product (map, chart, etc.)
- More than data: also classification, layering, symbolization, annotation, modeling, more
51Preservation of Digital Geologic and Historic Maps
- Georeferenced over 450 maps scanned by NC Geologic Survey
- Maps are available for download at http://wfs.enr.state.nc.us/NCGeologicMaps
Figure legend (map scale classes):
1,200 24,000
15-min topo maps
1:31,680 1:430,000
1:500,000 1:2.5 M
52NCGDAP Data Types Cartographic
- GIS Software
- Software project file (.mxd, .apr, ...)
- Data layer file (.avl, .lyr, ...)
- PDF map exports
- Web Services-based representations
53Project Surprises Emerging Industry Interest in
Data Longevity
- A temporally-impaired industry begins to discover time and the value of older data
- Major vendors and consulting firms begin to see temporal data management and analysis as a customer problem
54Project Background
- North Carolina Geospatial Data Archiving Project
- Partnership with Library of Congress under the National Digital Information Infrastructure and Preservation Program
- Connected with NC OneMap effort (state/local/federal)
Issue: How frequently should county and municipal vector data layers be captured in archives? Parcels, centerlines, jurisdictions, zoning, ...
Parcel Boundary Changes 2001-2004, North Raleigh, NC
55Points of Engagement with the OGC
- GML for archiving
- Geo Rights Management: adding archive use cases
- Content packaging
- Saving data state in web services interactions
- Content replication (OGC/Open Grid Forum talks)
- Persistent identifiers
- Data versioning (metadata and catalog support)
- Cartographic representation
Cross-fertilize between library/archives and
geospatial communities
56NC Geospatial Data Archiving Project (NCGDAP)
- Partnership between NCSU Libraries and NCCGIA with Library of Congress under NDIIPP
- One of 8 NDIIPP Digital Preservation Partners projects
- Focus on state and local geospatial content in North Carolina (state demonstration)
- Tied to NC OneMap initiative objective: "Historic and temporal data will be maintained and available."
- Objective: engage existing state/federal geospatial data infrastructures in preservation
57Emerging Regional Partnerships
- Focused on development of shared infrastructure for cultivating access to data
- Becoming test beds for innovation in the area of data sharing and data management, including archiving
58Local Govt. Data Sharing
- Becoming more open, fewer agreements to sign
- Recent survey: over 20 state and federal agencies use local data
- Problem of local governments being swamped by requests
- Many requests are more compelling than archiving
- Content transfer is non-trivial: large dataset sizes, small rural staffs, technical limitations
59Key Results Capture Frequency
60Key Results Digitization Efforts
61Key Results Attributes
62Outline
- Project Background
- Targeted Geospatial Content
- Risks to Data
- Value in Older Data
- xxxx
- xxx
- Next Steps
63NCGDAP Goals
- Repository Goal
- Capture at-risk data
- Explore technical and organizational challenges
- Project End Goal
- Data Producers: Improved temporal data management practices
- Archives: More efficient means of acquiring and preserving data
- Progress towards best practices
Temporal data management vs. long-term
preservation
64Data Capture Survey Results Overview
- Two-thirds of responding agencies create and retain periodic snapshots
- Long-term retention more common in counties with larger populations
- Storage environments vary, with servers and CD-ROMs most common
- Offsite storage (or both onsite and offsite) is used by nearly half of the respondents
- Popularity of historic images has resulted in scanning and geo-referencing of hardcopy aerial photos among one-third of the respondents
65Survey Formulation and Implementation
- Survey Formulation (Community Engagement)
- Initial questions developed by NCSU Libraries, NCCGIA, and State Archives
- Process vetted by stakeholder organizations
- Initial test run by Local Government Committee Advisory Team
- Survey Implementation
- Used SurveyMonkey.com
- Total of 28 Questions
- Open Sept. 13-28, 2006
- Response rate: 57.6%
- (exceeded expectations)