Title: Archiving State and Local Agency Digital Geospatial Data: An Overview of the Problem Area Steven P. Morris Head of Digital Library Initiatives North Carolina State University Libraries
1Archiving State and Local Agency Digital
Geospatial Data: An Overview of the Problem Area
Steven P. Morris
Head of Digital Library Initiatives
North Carolina State University Libraries
GICC Archival and Long Term Access Kickoff Meeting
February 29, 2008
2Outline
- Risks to Digital Geospatial Data
- Value in Temporal/Historical Data
- Archiving Challenges
- Content Identification and Selection Issues
- Industry Engagement
- Archives Processes
- Conclusion
3NC Geospatial Data Archiving Project
- Partnership between university library (NCSU) and
state agency (NCCGIA), with Library of Congress
under the National Digital Information
Infrastructure and Preservation Program (NDIIPP)
- One of 8 initial NDIIPP collection building
partnerships
- Focus on state and local geospatial content in
North Carolina (state demonstration)
- Tied to NC OneMap initiative, which provides for
seamless access to data, metadata, and
inventories
- Objective: engage existing state/federal
geospatial data infrastructures in preservation
Serve as catalyst for discussion within industry
4NCGDAP Goals
- Repository Goal
- Capture at-risk data
- Explore technical and organizational challenges
- Project End Goal
- Data Producers: Improved temporal data management
practices
- Archives: More efficient means of acquiring and
preserving data
- Progress towards best practices
Temporal data management vs. long-term
preservation
5Risks to Geospatial Data
6How would you describe your current geospatial
archive?
7Digital Preservation Points of Failure
- Data is not saved, or
- can't be found, or
- media is obsolete, or
- media is corrupt, or
- format is obsolete, or
- file is corrupt, or
- meaning is lost
Solutions: Migration, Emulation, Encapsulation, XML
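Two of the failure points above ("file is corrupt" and "data is not saved") can be detected early with routine fixity checks. The following is a minimal sketch, not a method described in this presentation: it assumes SHA-256 checksums and a simple in-memory manifest, and the function names (sha256_of, build_manifest, verify_manifest) are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (relative POSIX path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return {name: problem} for files that are missing or have changed."""
    root = Path(root)
    problems = {}
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            problems[name] = "missing"
        elif sha256_of(target) != expected:
            problems[name] = "checksum mismatch"
    return problems
```

A manifest built at acquisition time and re-verified on a schedule would flag silent corruption or loss. It does not address format obsolescence or lost meaning, which need the migration and metadata strategies listed above.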
8Risks to Geospatial Data
- Producer focus on current data
- Data overwrite as common practice
- Future support of data formats in question
- No open, supported format for vector data
- Shift to web services-based access
- Data becoming more ephemeral
- Inadequate or nonexistent metadata
- Impedes discovery and use
- Increasing use of spatial databases for data
management
- The whole is greater than the sum of the parts
9Value in Older Geospatial Data
10Value in Older Data: Cultural Heritage
Future uses of data are difficult to anticipate
(as with Sanborn Maps)
11Value in Older Data: Solving Business Problems
Land use change analysis
Site location analysis
Real estate trends analysis
Disaster response
Resolution of legal challenges
Impervious surface maps
Suburban Development 1993/2002 Near
Mecklenburg-Cabarrus County border
12Problem: Flood and Hurricane Preparedness
13Application: Impervious Surface Change Mapping
[Figure, panels A-D: 2002 Impervious; 2004 Aerial Photography; 2004 Impervious Update; 2004 Impervious using 2002 Mask]
14Problem: Beach Erosion and Shoreline Change
15Application: Shoreline Change Mapping
16Problem: Tracking Land Use Change
17Application: Land Use Change Mapping
Input Data
Output GIS Data
Using Mecklenburg County 2002 true color
orthorectified aerial photography
18Preservation Challenges
19Challenge: Vector Data Formats
- No widely-supported, open vector formats for
geospatial data
- Spatial Data Transfer Standard (SDTS) not widely
supported
- Geography Markup Language (GML): diversity of
application schemas and profiles a challenge for
permanent access
- Spatial Databases
- The whole is more than the sum of the parts, and
the whole is very difficult to preserve
- Can export individual data layers for curation,
but relationships and context are lost
- Some thinking of using the spatial database as
the primary archival platform
20Challenge: Cartographic Representation
Counterpart to the map is not just the dataset
but also models, symbolization, classification,
annotation, etc.
21Challenge: Geospatial Web Services
- How to capture records from decision-making
processes?
22Challenge: Preservation Metadata
Results from a 2006 survey of all 100 NC counties
and 25 largest NC municipalities
23Challenge: Data Capture
2006 Frequency of Capture Survey targeting North
Carolina counties and municipalities
Response: yes 65.3%, no 34.7% (out of
57.6% response rate)
24Challenge: Digital Object Complexity
25Where is the Dataset?
26Here's One!
- Files
- Multi-file dataset
- Georeferencing
- Metadata file
- Symbolization file
- Additional documentation
- License
- Disclaimer
- More
- Metadata
- FGDC
- Acquisition metadata
- Transfer metadata
- Ingest metadata
- Archive rights
- Archive processes
- Collection metadata
- Series metadata
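The "Files" column above can be made concrete with a small inventory step that groups the pieces of a multi-file dataset and labels what each piece contributes. This is an illustrative sketch only: the extension-to-role mapping assumes an ESRI shapefile-style dataset (.shp, .shx, .dbf, .prj, .shp.xml) plus the .lyr/.avl layer files mentioned later in this talk, and group_dataset_files is a hypothetical helper, not part of any project tooling.

```python
from collections import defaultdict
from pathlib import PurePath

# Assumed mapping from file extension to the component roles on the slide.
ROLE_BY_SUFFIX = {
    ".shp": "geometry",
    ".shx": "geometry index",
    ".dbf": "attributes",
    ".prj": "georeferencing",
    ".xml": "metadata",
    ".lyr": "symbolization",
    ".avl": "symbolization",
}


def group_dataset_files(filenames):
    """Group file names by dataset stem and label each file's assumed role.

    Everything after the first dot is treated as the extension, so
    "parcels.shp.xml" is filed under dataset "parcels" as metadata.
    Names containing extra dots (e.g. dates) would need special handling.
    """
    datasets = defaultdict(dict)
    for name in filenames:
        path = PurePath(name)
        suffixes = path.suffixes
        if not suffixes:
            continue  # no extension: cannot be assigned to a dataset here
        stem = path.name[: len(path.name) - len("".join(suffixes))]
        role = ROLE_BY_SUFFIX.get(suffixes[-1].lower(), "other")
        datasets[stem][name] = role
    return dict(datasets)
```

For example, passing the six file names parcels.shp, parcels.shx, parcels.dbf, parcels.prj, parcels.shp.xml, and parcels.lyr yields a single "parcels" entry whose files are labeled by role. Licenses, disclaimers, and the archive-side metadata in the second column would still have to be associated by other means.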
27Other Challenges
- Rights management
- Data versioning
- Semantic issues
- Large scale content transfer
- Integrating older analog data
- More
28Different Ways to Approach Preservation
- Technical solutions: How do we preserve acquired
content over the long term?
- Cultural/Organizational solutions: How do we make
the data more preservable (and more prone to be
preserved) from point of production?
Current use and data sharing requirements, not
archiving needs, are most likely to drive
improved preservability of content and
improvement of metadata
29Content Identification and Selection Issues
30What do Inventories (e.g. RAMONA) Offer to
Archives?
- Data Availability Information
- Detailed information by data layer
- Contact Information
- Minimal Metadata
- Descriptive, technical, administrative
- Rights Information
- Document Technical Environment
- Software used, formats, transfer methods
- Future Data Development Plans
31Selection Issues
- Most content is already at some level of risk
- Early-Middle-Late Stage issues
- Middle stage is usually the sweet spot, e.g.
TIFF orthophotos vs. raw images or compressed
images
- Also added-value products: digital maps,
cartographic representation
- Digital maps: record or not?
- Frequency of capture
32Problem: Multiple choice for format type,
coordinate system, tiling scheme
33Geospatial Data Types: Spatial Databases
- Vector and raster data
- Relationships
- Behaviors
- Annotation
- Data Models
34Geospatial Data Types: Cartographic
- GIS Software
- Software project file (.mxd, .apr, ...)
- Data layer file (.avl, .lyr, ...)
- PDF map exports
- Web Services-based representations
35Other Geospatial Data Types: Place-based Data
Oblique Imagery
- Mobile, LBS, and social networking applications
- Long-term cultural heritage value in non-overhead
imagery: more descriptive of place and
function
Street View Images
Tax Dept. Photos
Road Videologs
36Time series vector data: Parcel Boundary Changes
2001-2004, North Raleigh, NC
Continuously updated data: Frequency of
snapshots? Different for various framework
layers?
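One way to reason about snapshot frequency is to measure how much a layer actually changes between two captures. The sketch below is an assumption-laden illustration, not a method from the survey or the project: it treats each snapshot as a dictionary keyed by a hypothetical stable parcel ID, with the geometry stored as text (for example WKT), and compares values by simple string equality.

```python
def diff_snapshots(older, newer):
    """Compare two snapshots given as {parcel_id: geometry_text} dicts."""
    added = sorted(newer.keys() - older.keys())
    removed = sorted(older.keys() - newer.keys())
    changed = sorted(
        key for key in older.keys() & newer.keys() if older[key] != newer[key]
    )
    return {"added": added, "removed": removed, "changed": changed}


def change_rate(older, newer):
    """Fraction of the older snapshot's size that was added, removed, or changed."""
    diff = diff_snapshots(older, newer)
    touched = len(diff["added"]) + len(diff["removed"]) + len(diff["changed"])
    return touched / max(len(older), 1)
```

A layer whose change rate between captures is consistently low might tolerate less frequent snapshots than a fast-changing layer such as parcels in a growing area. Real comparisons would need geometry-aware equality and a plan for parcel IDs that are reassigned after splits and merges.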
37Sept. 2006 Frequency of Capture Survey
- Survey objective
- Document current practices for obtaining archival
snapshots of county/municipal geospatial vector
data layers
- Seek guidance about frequency of capture
- Survey topics
- General questions about data archiving practice
- Specific questions about parcels, street
centerlines, jurisdictional boundaries, and
zoning
- Survey subjects
- All 100 counties and 25 municipalities
- 58% response rate
- Survey conducted September 2006
38Data Capture Survey Results Overview
- Two-thirds of responding agencies create and
retain periodic snapshots
- Long-term retention more common in counties with
larger populations
- Storage environments vary, with servers and
CD-ROMs most common
- Offsite storage (or both onsite and offsite) is
used by nearly half of the respondents
- Popularity of historic images has resulted in
scanning and geo-referencing of hardcopy aerial
photos among one-third of the respondents
39Survey Observations
- Process of survey formulation and implementation
helped to socialize the problem of archiving data
- Local innovation needs to be mined further to
inform development of best practices
- Business drivers for archiving need more study
(e.g., stated adherence to retention policy)
- Exposure to peer practice encourages archiving
- Pronounced local interest in scanning/rectifying
older analog maps and imagery
40Engaging Industry
41Points of Engagement with Spatial Data
Infrastructure (e.g. NC OneMap)
- Framework data communities
- Snapshot frequency, naming schemes,
classification, GML application schemas, format
strategies
- Metadata standards and outreach
- Persistent identifiers, versioning, feedback on
metadata quality
- Content exchange networks/content replication
- For data improvement projects, disaster
preparedness, aggregation by regional service
providers, and archives
- Where do archiving and preservation fit in?
42Content Exchange Infrastructure
- High volume of state/federal requests for local
data
- Solving the present-day problems of data sharing
is a prerequisite to solving the problem of
long-term access
- Leveraging more compelling business reasons to
put the data in motion (disaster preparedness,
business continuity, highway construction,
census, ...)
- Content exchange networks
- Minimize need to make contact
- Add technical, administrative, descriptive
metadata
- Establish rights and provenance
43Archives Processes
44Maine GeoArchives Project Components
- Retention schedules
- Geospatial data
- Administrative records
- Record accessioning
- Appraisal system
- System documentation
- Archival data and metadata standards
- Rules for disposition of local government records
45Maine GeoArchives Functional Requirements
Adopted set of functional requirements for
recordkeeping systems to ensure permanent
retention of data layers
- Compliance
- Responsible
- Credibility
- Completeness
- Authenticity
- Soundness
- Auditability
- Availability
- Exportable
- Renderable
- Redactable
46Conclusion
47Key issues
- What are the points of intersection between
archive needs and business continuity/disaster
preparedness and other business needs?
- How to best stimulate and learn from innovation
at the state/regional/local level?
- How to make data more preservable from point of
production and on through data transfer?
- How to most effectively move data in an
efficient, well-documented manner with clarified
rights?
- How to best make State Archives a part of spatial
data infrastructure?
- Defining the record: data vs. derivative
components
48Cultural: Changing Industry Thinking
- Is the geospatial industry temporally-impaired?
- Lack of access to older data
- Lack of tool/model support for temporal analysis
- Metadata: poor support for changing data
- Education: building class projects around
available data (i.e., not temporal)
- Increased interest now in temporal applications?
- Increased demand for temporal data?
- Improved tool support: ArcGIS 9.2 animation
tools, Geodatabase History, etc.
49Questions?
Contact: Steve Morris
Head, Digital Library Initiatives
NCSU Libraries
ph: (919) 515-1361
Steven_Morris_at_ncsu.edu
http://www.lib.ncsu.edu/ncgdap