Title: New Paradigms for Community Statistical Systems
Slide 1: New Paradigms for Community Statistical Systems
Session 13, Friday, March 12th, 3:30-5 p.m.
Panel Presenter: Brand Niemann, US EPA
- The 2004 National Community Indicators Conference
- Advances in the Science and Practice of Community Indicators
- Eldorado Hotel
- Reno, Nevada
- March 10-13, 2004
Slide 2: Preface
- Express appreciation for the opportunity to participate in this conference to:
- Andy Reamer, Community Statistical System Coordinator
- Joe Ferreira, Massachusetts Institute of Technology
- Bob Stroh, University of Florida and
- Joe Sirgy, Conference Co-Chair.
Slide 3: Abstract
- The use of eXtensible Markup Language (XML) and Semantic Web Services is suggested and illustrated as a new paradigm for community statistical systems. The goal is to make them interoperable so that reuse and integration within and across domains of interest and communities of practice are more likely to occur.
- Scalable Vector Graphics (SVG), Geography Markup Language (GML), and LandView 6, a viewer for spatial (administrative) databases, are also introduced for the same purpose.
Slide 4: Overview
- 1. Introduction
- 2. eXtensible Markup Language (XML)
- 3. XML Web Services
- 4. Semantic Web Services
- 5. Scalable Vector Graphics and Geography Markup Language
- 6. LandView 6
- 7. Contact Information
Slide 5: 1. Introduction
- Even Broader Perspective Based on Conference Presentations and Literature
- Process from Federal and Local Experience
- Collaboration Opportunity: define domain broadly enough so it is a win-win for all (see example at http://componenttechnology.org in next slide)
- Services to be provided: IT architecture and technology (selected Service-Oriented Architecture with Semantic Web Services) and
- Sources of Assistance: Experts in CoPs, Knowledge Management, Statistics and Data Mining, Emerging Technologies (e.g. Semantic Web), etc.
Slide 6: 1. Introduction
A very successful public-private
collaboration/community of practice to
be featured at FOSE, March 23-25, 2004,
Washington DC Convention Center!
Slide 7: 1. Introduction
Replace labels in previous Venn diagram with these:
Community Indicators
Decision Makers
Collaboration Opportunity
Collaboration Zone
Data Providers
Sponsors
Slide 8: 1. Introduction
- Period of Rapid Change
- eGov Act of 2002
- Interagency Committee on Government Information
- E.g. Standards for large documents on the Web.
- Data and Information Reference Model Intergovernmental Data Integration Pilots (5).
- Semantic Web Standards from W3C
- Collaborative Web, Semantic Web, and Web of Trust.
- Work on new standards so that data access policies can travel with the XML data (confidentiality: personal information, and privacy: aggregation of information).
- CIO Council Committees: Highest Priority (Pilots)
- Architecture and Infrastructure Committee
- FEA Data and Information Reference Model (DRM).
- Best Practices: Knowledge Management
- Semantic Interoperability Community of Practice.
Slide 9: 1. Introduction
- Presentation
- Highlights of this presentation and
- Last part of Data and Information Reference Model (DRM) Pilot
- Uses the Statistical Abstract of the United States, 2003
- Awarded a free copy at the Census Bureau Symposium last week.
- Obvious first pilot to begin to demonstrate Semantic Web standards and technologies to the Community Indicators Conference.
Slide 10: 1. Introduction
- Indicators are usually expressed in graphical form based on data tables. Those tables should have metadata (data about the data), which are usually embedded in documents. The documents need to become Semantic Web Services that can be searched (find what you expect to find), discovered (find what you don't expect to find), and exploited (reused).
Slide 11: 1. Introduction
- Indicator data needs to be interoperable to facilitate
- (1) reuse (e.g. in statistics we should ask "compared to what?") and
- (2) integration within and across domains of interest and communities of practice.
- eXtensible Markup Language (XML), XML Web Services, and Semantic Web Services help accomplish those needs, as will be illustrated in this presentation.
Slide 12: 2. eXtensible Markup Language (XML)
- Markup adds structure and metadata to the data and separates presentation (with HTML) from the data (in XML) to facilitate different presentations and reuse of the data.
- An EPA Toxics Release Inventory (TRI) Indicator example:
    <?xml version="1.0" encoding="UTF-8" ?>
    <!-- File Name: tri99table1.xml -->
    <TRI99Table1>
      <Table1>
        <State>Nevada</State>
        <Rank>1</Rank>
        <FugitiveAir>1529022</FugitiveAir>
        <StackAir>1868475</StackAir>
        <SurfaceWater>136431</SurfaceWater>
        <UndgInjection>2797</UndgInjection>
        <LandReleases>1.1647E09</LandReleases>
        <OnSiteReleases>1.1682E09</OnSiteReleases>
        <OffSiteReleases>212998</OffSiteReleases>
        <TotalReleases>1.1684E09</TotalReleases>
      </Table1>
      Etc.
- EXPLANATION: <STARTTAG>CONTENT</ENDTAG>
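A minimal sketch of how the TRI record above might be reused by a program, assuming Python's standard `xml.etree.ElementTree` module. The closing `</TRI99Table1>` tag, the function name `read_tri_rows`, and the "on-site share" calculation are illustrative assumptions and are not part of the original slide.

```python
import xml.etree.ElementTree as ET

# Record copied from the slide. The closing </TRI99Table1> tag is an
# assumption added so that the fragment is well-formed.
TRI_XML = b"""<?xml version="1.0" encoding="UTF-8" ?>
<TRI99Table1>
  <Table1>
    <State>Nevada</State>
    <Rank>1</Rank>
    <FugitiveAir>1529022</FugitiveAir>
    <StackAir>1868475</StackAir>
    <SurfaceWater>136431</SurfaceWater>
    <UndgInjection>2797</UndgInjection>
    <LandReleases>1.1647E09</LandReleases>
    <OnSiteReleases>1.1682E09</OnSiteReleases>
    <OffSiteReleases>212998</OffSiteReleases>
    <TotalReleases>1.1684E09</TotalReleases>
  </Table1>
</TRI99Table1>"""


def read_tri_rows(xml_bytes):
    """Return one dict per <Table1> record, keyed by element name."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for record in root.findall("Table1"):
        row = {}
        for field in record:
            text = (field.text or "").strip()
            # State is a name; every other field in this sample is numeric.
            row[field.tag] = text if field.tag == "State" else float(text)
        rows.append(row)
    return rows


rows = read_tri_rows(TRI_XML)
nevada = rows[0]

# Hypothetical reuse of the data: share of total releases that are on-site.
on_site_share = nevada["OnSiteReleases"] / nevada["TotalReleases"]
print(nevada["State"], f"{on_site_share:.4f}")
```

Because the element names carry the metadata, the same function can read any further `<Table1>` records without knowing their column order.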
Slide 13: 2. eXtensible Markup Language (XML)
Display with HTML Markup: http://web-services.gov/tri99table1.htm
Structure and Metadata with XML Markup: http://web-services.gov/tri99table1.xml
Note: The HTML file contains only the mapping of the table fields to the XML database, not the actual XML database itself!
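The separation of display from data can be sketched as follows. This is not the mechanism used by the page cited on the slide (which is not shown here); it is a hypothetical Python illustration in which the presentation code holds only a list of field names and the values stay in the XML. The shortened sample record and the function name `render_html_table` are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Shortened sample record (assumption: only three of the slide's fields).
DATA = b"""<TRI99Table1>
  <Table1>
    <State>Nevada</State>
    <Rank>1</Rank>
    <TotalReleases>1.1684E09</TotalReleases>
  </Table1>
</TRI99Table1>"""


def render_html_table(xml_bytes, columns):
    """Build an HTML table; `columns` maps the display to XML field names."""
    root = ET.fromstring(xml_bytes)
    header = "".join(f"<th>{escape(name)}</th>" for name in columns)
    body_rows = []
    for record in root:
        cells = "".join(
            f"<td>{escape(record.findtext(name, default=''))}</td>"
            for name in columns
        )
        body_rows.append(f"<tr>{cells}</tr>")
    return f"<table><tr>{header}</tr>{''.join(body_rows)}</table>"


html_page = render_html_table(DATA, ["State", "Rank", "TotalReleases"])
print(html_page)
```

A different presentation (fewer columns, a different order) needs only a different `columns` list; the XML data is untouched.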
Slide 14: 2. eXtensible Markup Language (XML)
Grid View in XML Spy Showing Spreadsheet-like Structure
Note: XML files can be used in spreadsheets, statistical analysis packages, etc.
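As a rough illustration of the spreadsheet-like structure, the sketch below flattens record-oriented XML into CSV text that a spreadsheet or statistics package could open. It assumes every record has the same fields as the first one; the shortened sample and the function name `xml_to_csv` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened sample record (assumption: only three of the slide's fields).
SAMPLE = b"""<TRI99Table1>
  <Table1>
    <State>Nevada</State>
    <Rank>1</Rank>
    <TotalReleases>1.1684E09</TotalReleases>
  </Table1>
</TRI99Table1>"""


def xml_to_csv(xml_bytes):
    """Turn each child record of the root element into one CSV row."""
    records = list(ET.fromstring(xml_bytes))
    # Column names come from the element names of the first record.
    fieldnames = [field.tag for field in records[0]]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({field.tag: field.text for field in record})
    return buffer.getvalue()


csv_text = xml_to_csv(SAMPLE)
print(csv_text)
```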
Slide 15: 3. XML Web Services
- Markup adds structure and metadata to documents that contain mixed content (text, tables, graphics, references, etc.)
- An EPA Indicator Report example:
- America's Children and the Environment Report, 2003
- The report consists of 171 pages of text, tables, graphics, references, etc. and exists in two basic forms:
- A 2 MB PDF file and
- A new HTML version on the Web.
- This document was converted to XML by several tools, but the automated conversion was practically worthless from a semantic point of view.
- This single document covers so much information that it will benefit immensely from semantic dissection, linking, and augmentation (explosion of single PDF file to multiple XML files stored in an XML database for reuse).
- As a result this report consists of the following:
- Images (2.64 MB): 33 .jpeg and 33 .rdf (metadata format to be explained later).
- XML: 102 files/1.65 MB for 12 sections.
Slide 16: 3. XML Web Services
Document Structure and Links
Document Search for "Cancer"
See EPA Report on America's Children and the Environment, 2003, at http://www.sdi.gov
Slide 17: 4. Semantic Web Services
- We want to move simultaneously from static to dynamic Web Services and from interoperable syntax (XML) to interoperable semantics (RDF and OWL). (See next three slides.)
- We want to virtually centralize the content instead of physically centralizing the content (e.g. leave it distributed and connect it over the Internet).
- We want to build a fully distributed indicator management system for the Council on Environmental Quality based on the work for the Interagency Working Group on Sustainable Development Indicators at http://www.sdi.gov.
- A good example is our work with the Annual Statistical Abstract, an XML content collection converted from about 50 PDF files and 1500 Excel files with their metadata. It could be distributed back to the 200-some statistical programs that contribute to it each year, eliminating most of the manual aggregation that the Census Bureau does each year and making it a living and reusable document. Another example is the EPA Report on the Environment, 2003. (See next slide.)
Slide 18: 4. Semantic Web Services
Annual Statistical Abstract of the U.S.
EPA Report on the Environment, 2003
See http://www.sdi.gov for many structured documents on state of the environment, indicators, etc.
Slide 19: 4. Semantic Web Services
Enterprise Ontology and Web Services Registry
[Diagram labels: Static Resources, Dynamic Resources, Interoperable Syntax, Interoperable Semantics; WWW, Web Services, Semantic Web, Semantic Web Services]
Source: Derived in part from two separate presentations at the Web Services One Conference 2002 by Dieter Fensel and Dragan Sretenovic.
Slide 20: 4. Semantic Web Services
- The Resource Description Framework (RDF) is an XML-based language to describe resources and is designed to create metadata about the resource as a standalone entity. The RDF model is often called a "triple" because it has three parts: (1) a resource, (2) a resource's properties, and (3) the property values.
- The knowledge representation community uses the grammatical parts of a sentence: (1) subject, (2) predicate, and (3) object.
- RDF Schema is a language layer on top of RDF in what is called the Semantic Web Stack. Above RDF Schema are ontologies, and above that is the third and final web in Tim Berners-Lee's three-part vision (collaborative web, Semantic Web, web of trust).
- XML Topic Maps are popular implementations of taxonomies and have complementary characteristics to RDF.
- Three excellent resources are:
- Practical RDF: Solving Problems with the Resource Description Framework, Shelley Powers, O'Reilly, July 2003.
- The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management, Wiley Technology Publishing, June 2003 and
- XML Topic Maps: Creating and Using Topic Maps for the Web, Addison Wesley, July 2002.
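The triple model described on this slide can be sketched without any RDF library. The example below is a plain-Python illustration only, not an RDF implementation: the `Triple` and `TripleStore` names and the `http://example.org/` URIs are hypothetical, and the statements simply echo the presentation's own subject matter.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the property of that resource
    obj: str        # the property value (a literal or another resource)


class TripleStore:
    """A tiny in-memory collection of triples with wildcard matching."""

    def __init__(self) -> None:
        self._triples: List[Triple] = []

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.append(Triple(subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        # None acts as a wildcard for that position.
        return [
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        ]


# Hypothetical identifiers, used only for illustration.
REPORT = "http://example.org/report/tri99table1"
TITLE = "http://example.org/terms/title"
COVERS = "http://example.org/terms/coversState"

store = TripleStore()
store.add(REPORT, TITLE, "TRI 1999 Table 1")
store.add(REPORT, COVERS, "Nevada")

about_report = store.match(subject=REPORT)
covering_nevada = store.match(predicate=COVERS, obj="Nevada")
print(len(about_report), covering_nevada[0].subject)
```

Matching on any combination of the three positions is what lets metadata about one resource be found from another resource's point of view.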
Slide 21: 4. Semantic Web Services
[Figure with two panels. RDF Triple Components: the sentence "The company sells batteries." labeled Subject, Predicate, Object, alongside the Resource Description Framework terms Resource (URI), Predicate, Literal. Key Ontology Components: Person (birthdate: date, Gender: char), Leader, Organization, and Image, connected by a Property or Association (knows, depiction, published, works for, is-A, leads).]
Source: The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management, Wiley Technology Publishing, June 2003.
Slide 22: 5. Scalable Vector Graphics and Geography Markup Language
- Scalable Vector Graphics (SVG)
- An XML vector graphics standard (W3C) that enables graphics to be processed efficiently, robustly, and in an automated fashion, and enables scaling, panning, highlighting, etc.
- Graphical applications that are currently realized using bitmap graphics will start using SVG. The scope of SVG use will expand and it will displace the use of bitmap graphics in many areas, prime examples of which include mapping and GIS applications.
- Geography Markup Language (GML)
- An XML-based common encoding for spatial features.
- Makes it possible to render legacy and third-party data and services interoperable, minimizing the coupling between components.
- Enables multi-source, multi-sensor fusion.
- Can be converted to SVG on-the-fly.
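A minimal sketch of a GML-to-SVG conversion, assuming a GML 2 style `gml:LineString` with a `gml:coordinates` element holding space-separated `x,y` pairs. The sample geometry and the function name `gml_linestring_to_svg` are made up for illustration; real data would usually need a map projection and styling as well.

```python
import xml.etree.ElementTree as ET

GML_NS = {"gml": "http://www.opengis.net/gml"}

# Hypothetical three-point line, not taken from any real data set.
SAMPLE_GML = """<gml:LineString xmlns:gml="http://www.opengis.net/gml">
  <gml:coordinates>0,0 10,5 20,0</gml:coordinates>
</gml:LineString>"""


def gml_linestring_to_svg(gml_text):
    """Convert one GML LineString into an SVG document with a polyline."""
    root = ET.fromstring(gml_text)
    coords = root.find(".//gml:coordinates", GML_NS)
    if coords is None or not coords.text:
        raise ValueError("no gml:coordinates element found")
    points = [tuple(float(v) for v in pair.split(","))
              for pair in coords.text.split()]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    # SVG's y axis points down, so flip y relative to the largest value.
    svg_points = " ".join(f"{x:g},{max_y - y:g}" for x, y in points)
    view_box = f"{min_x:g} 0 {max_x - min_x:g} {max_y - min_y:g}"
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="{view_box}">'
        f'<polyline points="{svg_points}" fill="none" stroke="black"/>'
        "</svg>"
    )


svg_text = gml_linestring_to_svg(SAMPLE_GML)
print(svg_text)
```

Because both formats are XML, the conversion is a structural mapping from one vocabulary to the other rather than a rasterization step.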
Slide 23: 5. Scalable Vector Graphics
http://www.sdi.gov/Svg/120MonthLoan_Advanced.htm
Slide 24: 5. Geography Markup Language
Galdos Viewer for U.S. Census Data at http://web-services.gov/tigerviewer.zip (unzip and run display.html, then load the sample TIGER/GML for Manhattan)
Slide 25: 6. LandView 6
- A Viewer for the Environmental Protection Agency, U.S. Census Bureau, and U.S. Geological Survey Data and Maps
- http://landview.census.gov
- The LandView 6 Demo system provides the complete LandView software, but with a small extract of data and maps, from Prince William County, Virginia. The demo can be used to evaluate whether this product is suitable for your needs. The LandView 6 Tutorial uses this demo system in its examples.
- http://www.census.gov/geo/landview/lv6/lv6demo.html
Slide 26: 6. LandView 6
- What is LandView 6?
- LandView has its roots in the CAMEO software (Computer-Aided Management of Emergency Operations). CAMEO was developed by the EPA and NOAA to facilitate the implementation of the Emergency Planning and Community Right-to-Know Act. This far-reaching law requires communities to develop emergency response plans addressing chemical hazards and to make available to the public information on chemical hazards in the community.
- This product contains both database management software and mapping software used in the CAMEO system to create a simple computer mapping system involving two programs: MARPLOT and LandView.
- The MARPLOT mapping program allows users to map Census 2000 legal and statistical areas, EPA EnviroFacts sites, and USGS Geographic Names Information System (GNIS) features.
Slide 27: 6. LandView 6
- What is LandView 6? (continued)
- The LandView database system allows users to retrieve Census 2000 demographic and housing data, EPA EnviroFacts data, and USGS GNIS information. The GNIS contains over 1.2 million records which show the official federally recognized geographic names for all known places, features, and areas in the United States that are identified by a proper name.
- The LandView 6 and MARPLOT software included on the DVDs were created by agencies of the U.S. Government and are in the public domain. They can be copied, used, and distributed freely without the requirement for royalty payments or further permissions. However, the Census Bureau cannot provide technical support for products created by others using LandView.
- LandView 6 is available on a set of 2 DVDs that cover the entire nation. A national version is also available for installation on a network server or individual computer hard drives.
Slide 28: 7. Contact Information
- Brand Niemann
- U.S. Environmental Protection Agency, Office of Environmental Information (Office of the Chief Information Officer)
- Enterprise Architecture Team.
- Computer Scientist and Semantic XML Web Services Specialist.
- 202-566-1657, niemann.brand_at_epa.gov
- Interagency Working Group on Sustainable Development Indicators
- http://www.sdi.gov
- CIO Council's Emerging Technology Subcommittee
- http://web-services.gov
- http://componenttechnology.org
- CIO Council's Semantic (Web Services) Interoperability Community of Practice
- http://km.gov and http://web-services.gov