Title: Presentations and Demonstrations to the Business Intelligence and Analytics (BIA) Users Group Meeting
Slide 1: Presentations and Demonstrations to the Business Intelligence and Analytics (BIA) Users Group Meeting
- Brand Niemann
- U.S. EPA, OTOP/ITPPD/ITSPB
- April 13, 2004
Slide 2: Overview
- 1. LandView 6 Population Estimator on DVD and the Web
  - Calculates Census 2000 demographic and housing characteristics for user-defined radii.
- 2. GeoResponse - Voice GIS Multimodal Notification
  - Commercialization of EPA's CIO Council Showcase of Excellence Award-winning XML and VoiceXML Web Service.
- 3. Data Mining and Repository
  - US EPA Toxics Release Inventory.
- 4. Application of Semantic Web Technologies to EPA Data and Information
  - Semantic Web Service Portals and Networks.
Slide 3: Background
- The OEI Portal Project is considering the following portal components:
  - E-Alert
  - Geospatial
  - Data Analysts
  - AQS Data Mart
  - AQS Web Service
- All of these match the XML Web Services pilot projects that have been conducted!
Slide 4: 1. LandView 6 Population Estimator on DVD and the Web
- What is LandView 6?
  - LandView has its roots in the CAMEO software (Computer-Aided Management of Emergency Operations). CAMEO was developed by the EPA and NOAA to facilitate the implementation of the Emergency Planning and Community Right-to-Know Act. This far-reaching law requires communities to develop emergency response plans addressing chemical hazards and to make information on chemical hazards in the community available to the public.
  - This product contains both the database management software and the mapping software used in the CAMEO system, forming a simple computer mapping system built from two programs: MARPLOT and LandView.
  - The MARPLOT mapping program allows users to map Census 2000 legal and statistical areas, EPA EnviroFacts sites, and USGS Geographic Names Information System (GNIS) features.
http://www.census.gov/geo/landview/
Slide 5: 1. LandView 6 Population Estimator on DVD and the Web
- What is LandView 6? (continued)
  - The LandView database system allows users to retrieve Census 2000 demographic and housing data, EPA EnviroFacts data, and USGS GNIS information. The GNIS contains over 1.2 million records giving the official, federally recognized geographic names for all known places, features, and areas in the United States that are identified by a proper name.
  - The LandView 6 and MARPLOT software included on the DVDs were created by agencies of the U.S. Government and are in the public domain. They can be copied, used, and distributed freely without royalty payments or further permissions. However, the Census Bureau cannot provide technical support for products created by others using LandView.
  - LandView 6 is available on a set of 2 DVDs that cover the entire nation. A national version is also available for installation on a network server or individual computer hard drives.
http://www.census.gov/geo/landview/
Slide 6: 1. LandView 6 Population Estimator on DVD and the Web
[Screenshot: LandView Home Screen, with a callout labeled "Population Estimator"]
http://www.census.gov/geo/landview/
Slide 7: 1. LandView 6 Population Estimator on DVD and the Web
- The Population Estimator can be opened from the "Estimate Population within a Radius" button on the LandView 6 Home page. Normally, the population search tracks the location of the Focus Point in MARPLOT, so MARPLOT should be running before the Population Estimator is used. In MARPLOT, once the Focus Point has been set to the desired search point, the Population Estimator is available from the MARPLOT menu bar, at Sharing/LandView/LandView Census 2000 Population Estimator. Either pathway opens the Estimator shown in the next slide.
http://www.census.gov/geo/landview/lv6help/pop_estimate.html
Slide 8: 1. LandView 6 Population Estimator on DVD and the Web
http://www.census.gov/geo/landview/lv6help/pop_estimate.html
Slide 9: 1. LandView 6 Population Estimator on DVD and the Web
- To share the rich demographics available in LandView 6, LandView has turned to the Census Block Group. The demographic profile in the Estimator relies on capturing the centroids of Census Block Groups that fall within the radius of a circle. For example, one Census Block Group might have 55% of its area inside the capture circle, and its data would be included in full. Another might have 49% of its area inside the circle, and it would be excluded entirely. In sum, the capture and loss of individual Block Groups should average out, but not to the degree of precision attached to a population search by Census Blocks. It is this dichotomy of data that needs to be borne in mind when interpreting Population Estimator results.
http://www.census.gov/geo/landview/lv6help/pop_estimate.html
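The centroid-capture rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not LandView's actual algorithm: the haversine formula, the Earth radius constant, the function names, and the three sample block groups (ids, coordinates, populations) are all hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance between two latitude/longitude points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def estimate_population(focus, radius_miles, block_groups):
    """Sum the population of every block group whose centroid is inside the circle.

    A block group is counted in full or not at all, which is why the result
    is an estimate rather than an exact count.
    """
    lat0, lon0 = focus
    captured = [
        bg for bg in block_groups
        if great_circle_miles(lat0, lon0, bg["lat"], bg["lon"]) <= radius_miles
    ]
    return sum(bg["population"] for bg in captured), [bg["id"] for bg in captured]


# Hypothetical block groups: made-up ids, centroids, and populations.
block_groups = [
    {"id": "BG-A", "lat": 38.90, "lon": -77.03, "population": 1200},
    {"id": "BG-B", "lat": 38.91, "lon": -77.04, "population": 900},
    {"id": "BG-C", "lat": 39.50, "lon": -77.03, "population": 1500},
]

total, ids = estimate_population((38.90, -77.03), 2.0, block_groups)
print(total, ids)
```

With a 2-mile radius, the two nearby centroids are captured and the distant one is dropped, regardless of how much of each block group's area actually overlaps the circle.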
Slide 10: 1. LandView 6 Population Estimator on DVD and the Web
- Notes
  - (1) There are 8.2 million census tabulation blocks for the geographic areas shown in LandView. 2.7 million blocks are excluded because they contain zero population and zero housing units.
  - (2) There are 211,267 block groups for the 50 states, the District of Columbia, and Puerto Rico. On average, each block group contains about 39 blocks.
  - (3) The LandView program's algorithm for determining which block internal points fall within the radius takes the curvature of the earth into account. The MARPLOT mapping engine, in mapping the radius, does not. Consequently, for larger radii, users might note differences in the block point counts between MARPLOT and the Population Estimator.
  - (4) An alternate population search that is frequently of interest is Finding the Population Within an Irregular Polygon. An example of such a search is given in Lesson 5 of the LandView 6 Tutorial.
http://www.census.gov/geo/landview/lv6help/pop_estimate.html
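Note (3) can be illustrated by comparing a curvature-aware distance with a flat-map distance as the search radius grows. The sketch below is an assumption-laden illustration: the haversine formula stands in for a curvature-aware calculation, and a simple "scale degrees to miles at the starting latitude" rule stands in for a flat-map one. Neither is confirmed to be what LandView or MARPLOT actually uses.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Curvature-aware distance (haversine), in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def flat_map_miles(lat1, lon1, lat2, lon2):
    """Flat-map distance: degrees scaled to miles at the starting latitude."""
    miles_per_degree = 2 * math.pi * EARTH_RADIUS_MILES / 360
    dy = (lat2 - lat1) * miles_per_degree
    dx = (lon2 - lon1) * miles_per_degree * math.cos(math.radians(lat1))
    return math.hypot(dx, dy)


def disagreement_miles(lat, lon, offset_deg):
    """How far apart the two distance measures are for a given offset."""
    lat2, lon2 = lat + offset_deg, lon + offset_deg
    return abs(great_circle_miles(lat, lon, lat2, lon2) - flat_map_miles(lat, lon, lat2, lon2))


small = disagreement_miles(40.0, -100.0, 0.05)  # a few miles away
large = disagreement_miles(40.0, -100.0, 5.0)   # hundreds of miles away
print(small, large)
```

For short distances the two measures agree closely; for long ones they diverge by miles, so a point near the edge of a large circle can be counted by one method and missed by the other.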
Slide 11: 2. GeoResponse - Voice GIS Multimodal Notification
- GeoResponse is a distributed XML Web Service, operated across two companies, that provides:
  - The power of advanced notification, analysis, and response capabilities without the hefty start-up costs of such a system.
  - The flexibility of various plans depending on your needs. A simple monthly or annual fee provides you with the calling, report, and decision support capabilities you want without the need to buy, maintain, and hire additional resources. A service model provides the flexibility that you like.
http://www.georesponse.com
Slide 12: 2. GeoResponse - Voice GIS Multimodal Notification
[Diagram: GeoResponse Server Architecture. Labels: Dial, deliver messages; ASR; Spatial Database; Location-based Info.; Phone Voice Interface; Public Switched Telephone Network; GIS Interface; TTS; Navigation (structured results); Map Engine; Report, real-time logging; Internet; User responses; Extensible to other channels (SMS, IM, e-mail); XML data exch.; Call trees; Database inter.; GeoResponse Applications; Web Services]
http://www.georesponse.com
Slide 13: 2. GeoResponse - Voice GIS Multimodal Notification
- Moving Beyond Traditional Emergency Response Notification
  - Bringing together the best in speech technology with the power of mapping through GIS to provide Emergency Response Managers with the speed and flexibility to meet even the most challenging notification scenarios.
  - GeoResponse is developed using XML technology, which provides an open standard that the Emergency Response community has been seeking.
  - The easy-to-use application can be accessed by multiple agents from any location through a simple Internet browser. With a set of personalized passwords, Responders can customize a set of standard response forms, choose the targeted area for the notification using a simple GIS interface, and initiate the call. Recipients can hear the message spoken.
  - The system can conduct dialog, map and collect critical response information, and trigger alerts and other dispatch.
  - Easy-to-read reports help you rate and understand your notification performance to identify which calls were answered, which ones never reached their intended recipients, and which recipients need help.
http://www.georesponse.com
Slide 14: 2. GeoResponse - Voice GIS Multimodal Notification
- Enable Informed Decision-Making and Communication when it matters most
  - Log Incidents and Identify Evacuation Zones using Interactive Mapping.
  - Type or speak custom messages using a combination of Text-To-Speech technology and Recorded Speech or Audio.
  - Collect real-time critical responses from call recipients using Voice Recognition technology.
  - Collect and Respond to critical responses (e.g., "I am immobile and need help evacuating") using Real-time GIS Mapping and rapid Multi-Modal Notifications (messages go to LAN phone, text message, pager, and cell). Notifications in Multiple Languages.
  - Communicate and Integrate with any system through the power of XML.
  - Use comprehensive Web-based Reports to rate your performance in notifications.
http://www.georesponse.com
Slide 15: 2. GeoResponse - Voice GIS Multimodal Notification
- Features
  - Identify locations accurately using a web-based GIS interface.
  - Customize the notification message from any location by typing in a simple Internet browser.
  - Dynamic integration with voice-based notification so all recipients hear instructions.
  - Interactive dialog with the recipient using voice recognition to identify who needs help.
  - Real-time data collection and notification to key response personnel.
  - Trigger Inbound and Outbound calls.
  - Personalization of notification messages.
  - Flexible options to search for recipient responses.
  - Trigger dispatch, alerts, and other activities.
http://www.georesponse.com
Slide 16: 2. GeoResponse - Voice GIS Multimodal Notification
- Functions (see next slides)
  - Report an Event (Log)
  - Geocode the Event (View)
  - Define the Call List (Create)
  - Customize your Message
  - Make the Call
  - Track and Map Responses
  - Trigger another Process
http://www.georesponse.com
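As a rough picture of how the functions listed above could hang together around a single XML message, here is a minimal Python sketch. Everything in it is hypothetical: the element and attribute names (notification, callList, recipient, status), the helper functions, and the sample recipients are invented for illustration and are not GeoResponse's actual schema or API.

```python
import xml.etree.ElementTree as ET


def build_notification(event_id, message, recipients):
    """Create the call list and message for one logged event (hypothetical schema)."""
    root = ET.Element("notification", {"event": event_id})
    ET.SubElement(root, "message").text = message
    call_list = ET.SubElement(root, "callList")
    for person in recipients:
        ET.SubElement(call_list, "recipient", {"phone": person["phone"]}).text = person["name"]
    return root


def record_response(root, phone, status):
    """Track a recipient's response by tagging the matching entry."""
    for recipient in root.iter("recipient"):
        if recipient.get("phone") == phone:
            recipient.set("status", status)
            return True
    return False


def needing_help(root):
    """List recipients whose response should trigger another process."""
    return [r.text for r in root.iter("recipient") if r.get("status") == "needs-help"]


# Made-up event and recipients for illustration only.
note = build_notification(
    "EVT-1",
    "Evacuate the area north of Main Street.",
    [
        {"name": "A. Example", "phone": "555-0100"},
        {"name": "B. Example", "phone": "555-0101"},
    ],
)
record_response(note, "555-0100", "acknowledged")
record_response(note, "555-0101", "needs-help")
print(ET.tostring(note, encoding="unicode"))
print(needing_help(note))
```

Because the message, call list, and responses live in one XML document, any other system that can read XML could pick up the same record.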
Slide 17: 2. GeoResponse - Voice GIS Multimodal Notification
Log An Incident
http://www.georesponse.com
Slide 18: 2. GeoResponse - Voice GIS Multimodal Notification
View Incidents
http://www.georesponse.com
Slide 19: 2. GeoResponse - Voice GIS Multimodal Notification
Create a Call List
http://www.georesponse.com
Slide 20: 2. GeoResponse - Voice GIS Multimodal Notification
Customize Message
http://www.georesponse.com
Slide 21: 2. GeoResponse - Voice GIS Multimodal Notification
Initiate Call
http://www.georesponse.com
Slide 22: 2. GeoResponse - Voice GIS Multimodal Notification
- The partnership
  - Broadstrokes is a software development and consulting company specializing in middleware solutions applied to communication technologies. The company has developed solutions based on XML open standards that significantly reduce the time and cost of developing voice and telecommunication applications. This technology makes it possible to move persisted and real-time information between devices (phone, cell, PDA, laptop), enabling a host of more reliable and collaborative applications.
http://www.georesponse.com
Slide 23: 2. GeoResponse - Voice GIS Multimodal Notification
- The partnership (continued)
  - Intelligent Decisions Systems Inc. (IDSi) delivers high-quality GIS and Information System solutions for clients seeking effective, value-added services and management tools. Blending the latest Internet technology with powerful GIS mapping capabilities, IDSi continues to develop innovative solutions for data collection, analysis, and notifications for government and private sector clients. IDSi's professionals include award-winning developers and project managers with years of experience in GIS and Internet technology development and implementation.
  - IDSi is an ESRI partner and has worked extensively with ESRI ArcGIS, ArcIMS, and ArcSDE.
http://www.georesponse.com
Slide 24: 3. Data Mining and Repository
- Abstract of Presentation
  - The eXtensible Markup Language (XML) promotes information sharing and reuse and enables enterprise integration in XML repositories. The Toxics Release Inventory (TRI), published by the U.S. EPA, is a valuable source of information regarding toxic chemicals that are being used, manufactured, treated, transported, or released into the environment. The TRI database is about 8 GB and requires industrial-strength tools and analyses for data mining, indexing, conversion to XML, and storage and retrieval with XML Web Services. This pilot demonstrated that large EPA databases can be data mined and repurposed into XML repositories.
February 25-26, 2003, Data Mining Technology for Military and Government Applications Conference: "XML Web Services for Data Mining and Repository: US EPA Toxics Release Inventory," Brand Niemann, US EPA, and "Data Mining Technology," Jim Walters, Insightful Corporation. See http://web-services.gov
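The "conversion to XML" step named in the abstract can be sketched with Python's standard library. This is a minimal illustration, not the pilot's actual tooling: the column names, element names, and sample rows are hypothetical, and a real TRI extract would have many more fields and far more records.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical delimited extract; the columns and values are made up.
SAMPLE = """facility_id,chemical,medium,pounds
F001,Toluene,air,1200
F001,Toluene,water,30
F002,Lead,land,75
"""


def rows_to_xml(text):
    """Turn each delimited row into a <release> element under one root."""
    root = ET.Element("releases")
    for row in csv.DictReader(io.StringIO(text)):
        release = ET.SubElement(root, "release", {"facility": row["facility_id"]})
        for field in ("chemical", "medium", "pounds"):
            ET.SubElement(release, field).text = row[field]
    return root


tri_xml = rows_to_xml(SAMPLE)
print(ET.tostring(tri_xml, encoding="unicode"))
```

Each record becomes a self-describing fragment, which is what makes the data reusable by tools that know nothing about the original column layout.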
Slide 25: 3. Data Mining and Repository
- Six years ago, on February 10, 1998, the World Wide Web Consortium (W3C) published XML 1.0 as a Recommendation.
- The eXtensible Markup Language (XML) has become pervasive nearly everywhere that information is managed and has changed not only the way people publish documents on the Web but also the way people manage information internal to their enterprise.
- XML has emerged as the standard platform for convergence of information.
See http://www.w3.org/2003/02/xml-at-5.html
Slide 26: 3. Data Mining and Repository
- CD-ROM Raw DAT (excerpt)
  - Internal: 30 files / 4.4 GB
  - TRI_CHEM_ACTIVITY: 35.2 MB
  - tri_chem_info: 75 KB
  - TRI_CODE_DESC: 570 KB
  - TRI_COUNTY: 1.5 MB
  - TRI_ENERGY_RECOVERY: 16.0 MB
  - TRI_FACILITY: 31.6 MB
  - TRI_FACILITY_DB: 1.7 MB
  - TRI_FACILITY_DB_HISTORY: 6.5 MB
  - TRI_FACILITY_HISTORY: 124 MB
  - TRI_FACILITY_NPDES: 1.1 MB
  - TRI_FACILITY_NPDES_HISTORY: 4.2 MB
  - TRI_FACILITY_RCRA: 1.8 MB
  - TRI_FACILITY_RCRA_HISTORY: 7.1 MB
  - TRI_FACILITY_SIC: 1.6 MB
  - TRI_FACILITY_SIC_HISTORY: 6.8 MB
Total of 74 files with 8 GB!
Slide 27: 3. Data Mining and Repository
http://www.insightful.com/
Slide 28: 3. Data Mining and Repository
- Insightful Miner Components
  - Full life-cycle data mining workbench, from data access through deployment.
  - Scalable, extensible, and affordable toolset that enables both new data miners and skilled modelers to solve their toughest analytic challenges with a best-of-breed approach.
  - Advanced pipeline architecture and analytics are built to scale to the data size problems of today and far into the future.
  - Embedded data analysis language that allows it to adapt to the changing business needs of its users.
Slide 29: 3. Data Mining and Repository
I-Miner on TRI 2000 Public Release Data: Pipeline Architecture and Visual Workflow
Slide 30: 3. Data Mining and Repository
I-Miner on TRI 2000 Public Release Data: Histogram Plots
Rationale: Toxic chemical releases to different media should be correlated; outliers suggest a need to follow up with reporting facilities.
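The rationale above (correlated releases, with outliers flagged for follow-up) can be sketched as a straight-line fit followed by a residual check. This is an assumed, simplified stand-in and not the analysis performed in Insightful Miner: the least-squares fit, the two-times-spread threshold, the function names, and the eight sample facilities are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_outliers(records, threshold=2.0):
    """Return facilities whose water release is far from what air release predicts."""
    xs = [r["air"] for r in records]
    ys = [r["water"] for r in records]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = (sum(e * e for e in residuals) / len(residuals)) ** 0.5
    return [
        r["facility"]
        for r, e in zip(records, residuals)
        if spread > 0 and abs(e) > threshold * spread
    ]


# Made-up releases in pounds; facility F4 reports unusually high water releases.
records = [
    {"facility": "F1", "air": 100, "water": 10},
    {"facility": "F2", "air": 200, "water": 20},
    {"facility": "F3", "air": 300, "water": 30},
    {"facility": "F4", "air": 400, "water": 200},
    {"facility": "F5", "air": 500, "water": 50},
    {"facility": "F6", "air": 600, "water": 60},
    {"facility": "F7", "air": 700, "water": 70},
    {"facility": "F8", "air": 800, "water": 80},
]

flagged = flag_outliers(records)
print(flagged)
```

A flagged facility is not necessarily in error; the check only identifies reports that break the expected pattern and so merit a follow-up question.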
Slide 31: 3. Data Mining and Repository
- Update: Data Fusion with XML
  - Integrated data analysis across content types is enabled by XML, but is still a very new area for vendors and researchers.
  - See Native XML Database Technologies in the next slides.
  - Also see DataSpaceWeb.Org in subsequent slides.
  - Insightful Miner Server released in Fall 2003; it provides all the Insightful Miner Desktop capabilities in a multi-user, distributed environment using industry-standard Terminal Services or X-Windows protocols.
  - Insightful Miner selected as a finalist in the Business Management Analytics Reporting category in eWeek's Excellence Awards Program, April 5, 2004.
Slide 32: 3. Data Mining and Repository
- Why store XML?
  - Single source publishing.
  - Effective searching.
  - eBusiness messages.
  - XML-driven Web sites.
  - Web Services.
  - Mobile communication devices.
  - Office 2003.
  - Longer message life-cycles: moving from simple service invocations to long-running stateful interactions, and the need for message management.
Source: Mike Champion and Steve Hamby, Native XML Database Applications Development, XML 2002 Conference Tutorial, December 9th.
Slide 33: 3. Data Mining and Repository
- Software AG's Tamino XML Server 4.1.4
  - Native XML storage.
  - Store for any type of data.
  - Extensible by definition.
  - Consolidates data from various sources in one place.
  - Find-Engine for fast retrieval of XML-based content.
  - Built-in full-text retrieval at no extra cost.
  - Multi-channel output formatting capabilities.
  - Server extensions for custom functionality and application integration.
Slide 34: 3. Data Mining and Repository
http://www.softwareag.com/tamino/
Slide 35: 3. Data Mining and Repository
Store and Query TRI Data in XML Format
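To give a feel for what querying TRI data held as XML might look like, here is a small Python sketch. It uses the limited XPath subset in Python's standard xml.etree.ElementTree module as a stand-in; it does not use Tamino or its query language, and the document structure, element names, and values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored document; structure and values are made up.
DOC = """<releases>
  <release facility="F001"><chemical>Toluene</chemical><medium>air</medium><pounds>1200</pounds></release>
  <release facility="F001"><chemical>Toluene</chemical><medium>water</medium><pounds>30</pounds></release>
  <release facility="F002"><chemical>Lead</chemical><medium>land</medium><pounds>75</pounds></release>
</releases>"""

store = ET.fromstring(DOC)


def total_pounds(root, facility):
    """Sum all releases reported by one facility (attribute predicate)."""
    matches = root.findall(f"release[@facility='{facility}']")
    return sum(float(r.findtext("pounds")) for r in matches)


def chemicals_released_to(root, medium):
    """List the distinct chemicals released to one medium (child-text predicate)."""
    matches = root.findall(f"release[medium='{medium}']")
    return sorted({r.findtext("chemical") for r in matches})


print(total_pounds(store, "F001"))
print(chemicals_released_to(store, "land"))
```

The queries address the data by element and attribute name rather than by column position, which is the practical benefit of keeping the records in XML.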
Slide 36: 3. Data Mining and Repository
- DataSpace is a Web services-based infrastructure for exploring, analyzing, and mining remote and distributed data. Its Web site describes DataSpace protocols, DataSpace applications, and open source DataSpace servers and clients.
- DataSpace is supported by grants from the NSF. DataSpace is built around standards developed by the Data Mining Group and W3C.
- The Data Mining Group (DMG) is an independent, vendor-led group which develops data mining standards, such as the Predictive Model Markup Language (PMML).
- DataSpace applications employ a protocol for working with remote and distributed data called the DataSpace Transfer Protocol, or DSTP. DSTP simplifies working with data by providing direct support for common operations, such as working with attributes, keys, and metadata.
- DSTP can be layered over specialized high-performance transport protocols such as SABUL. Using protocols such as SABUL, DataSpace applications can work effectively on wide-area, high-performance OC-3, OC-12, and Gbps networks. SABUL currently holds the land speed record for connecting two distributed clusters, a record set at iGrid 2002.
http://www.dataspaceweb.org/
Slide 37: 4. Application of Semantic Web Technologies to EPA Data and Information
- Overview
  - Repurposing of large documents with mixed content (text, tables, graphics, etc.) into XML content collections began with The Statistical Abstract of the United States (1999 Edition) as part of the FedStats.Net project to build a distributed network of statistical data and information using new XML standards and technology.
  - The Statistical Abstract of the United States was considered to be one of the best examples of "manual aggregation of government information" (from some 200 programs across about 70 agencies) that would benefit from a distributed XML-based content network that would leave the content in the hands of its originators and create a more "living document".
  - This work was recognized by OMB Associate Director for Information Technology and E-Government, Mark Forman, and the Quad Council with a Special Award for Innovation in the 2002 CIO Showcase of Excellence for the use of XML in a distributed content network (renamed FedGov) and use of VoiceXML in providing universal access to emergency response information.
Slide 38: 4. Application of Semantic Web Technologies to EPA Data and Information
- Overview (continued)
  - More recently, the eGov Act of 2002's provisions for an Intergovernmental Committee on Government Information (ICGI) and Data Integration Pilots, the Federal Enterprise Architecture's Data and Information Reference Model (DRM) and its Data Management Strategy, and the focus in the CIO Council's Architecture and Infrastructure Committee on Intergovernmental Data Exchange have all been tied together in a new pilot that simultaneously accomplishes multiple objectives (see next slide).
  - This "Smart Data Enterprise" approach came from the Semantic Technologies for eGov Conference, September 8, 2003, at the White House Conference Center (in which the EPA CIO and her staff participated), and continues in the new CIO Council's Semantic Interoperability (Web Services) Community of Practice (SICoP) (see subsequent slide).
Slide 39: 4. Application of Semantic Web Technologies to EPA Data and Information
- Overview (continued)
  - (1) Repurpose government data and information into structured documents using new XML-based standards and technologies that facilitate reuse and exchange.
  - (2) Repurpose the data and information so that it can be readily decomposed into XML fragments (for text and tables) and RDF metadata (for graphics) that can be stored and referenced in a database and can in turn be repurposed into new documents that provide additional user-defined views of the data and information.
  - (3) Organize and categorize the repurposed data and information using taxonomies and even ontologies in semantic registries and repositories.
  - (4) Use "XML data islands", RDF, and OWL to add metadata, interoperability, and semantic meaning to data and information to be reused and exchanged.
  - (5) Standardize the data element and XML tag names in a DRM registry and repository.
  - (6) Share these results with others who are working on Semantic Web and Technology Applications for eGovernment.
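Objective (2) mentions RDF metadata for graphics. A minimal Python sketch of what such a record might contain follows. The RDF and Dublin Core namespace URIs are the standard ones, but the choice of Dublin Core properties, the file name, and the title and subject text are assumptions made for illustration; the pilot's actual metadata vocabulary is not shown in the slides.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def describe_image(uri, title, subject):
    """Build an RDF/XML description of one graphic using Dublin Core properties."""
    rdf = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(rdf, f"{{{RDF}}}Description", {f"{{{RDF}}}about": uri})
    ET.SubElement(description, f"{{{DC}}}title").text = title
    ET.SubElement(description, f"{{{DC}}}subject").text = subject
    ET.SubElement(description, f"{{{DC}}}format").text = "image/jpeg"
    return rdf


# Hypothetical file name and descriptive text.
record = describe_image("figure-01.jpeg", "Example chart title", "Example subject")
rdf_xml = ET.tostring(record, encoding="unicode")
print(rdf_xml)
```

Pairing each image with a small metadata file like this gives a search tool something to index, since the image itself carries no searchable text.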
Slide 40: 4. Application of Semantic Web Technologies to EPA Data and Information
[Organization chart. Labels: Industry Advisory Council (IAC); U.S. CIO Council; OMB - FEAPMO; Enterprise Architecture Special Interest Group; Architecture and Infrastructure Committee; IT Workforce Connections; Best Practices Committee; WGs and CoPs; Subcommittees: Governance, Components, Emerging Technologies; Semantic Interoperability Community of Practice; Chief Architects Forum]
Slide 41: The Smart Data Enterprise
Figure 2. Developer's Perspective on Data: To the application developer, the data evolution timeline is viewed through the correlation of programming paradigms with the relation of data and code. From "Designing the Smart-Data Enterprise: Get prepared for the 10 ways that semantic computing will impact enterprise IT," by Michael C. Daconta, posted November 28, 2003, Enterprise Architect Magazine.
Slide 42: The Smart Data Enterprise
Figure 3. The Smart Data Continuum: Data has progressed through four stages of increasing intelligence. (Reprinted with permission from The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management, John Wiley & Sons, 2003.) From "Designing the Smart-Data Enterprise: Get prepared for the 10 ways that semantic computing will impact enterprise IT," by Michael C. Daconta, posted November 28, 2003, Enterprise Architect Magazine.
Slide 43: 4. Application of Semantic Web Technologies to EPA Data and Information
- Overview (continued)
  - The methodology for repurposing large documents into structured XML content collections was presented previously:
    - November 18-19, 2003, Website Content Management for Government Conference, Invited Presentation on November 19th on "Repurposing Documents Into Semantic Web Services and Networks" (EPA Enterprise Integration Portal/Data Exchange Network Pilot), Doubletree Hotel, Arlington, VA. Also see Folio-to-XML Conversion and Webinar.
  - See Past Meetings and Presentations at http://web-services.gov
Slide 44: 4. Application of Semantic Web Technologies to EPA Data and Information
- Markup adds structure and metadata to documents that contain mixed content (text, tables, graphics, references, etc.).
- An EPA Indicator Report example:
  - America's Children and the Environment Report, 2003.
  - The report consists of 171 pages of text, tables, graphics, references, etc. and exists in two basic forms:
    - A 2 MB PDF file, and
    - A new HTML version on the Web.
  - This document was converted to XML by several tools, but the automated conversion was practically worthless from a semantic point of view.
  - This single document covers so much information that it will benefit immensely from semantic dissection, linking, and augmentation (explosion of a single PDF file into multiple XML files stored in an XML database for reuse).
  - As a result, this report consists of the following:
    - Images (2.64 MB): 33 .jpeg and 33 .rdf (metadata format to be explained later).
    - XML: 102 files / 1.65 MB for 12 sections.
Slide 45: 4. Application of Semantic Web Technologies to EPA Data and Information
Initial Taxonomy/Ontology for Structuring Additional Data and Information.
Slide 46: 4. Application of Semantic Web Technologies to EPA Data and Information
Detailed Table of Contents for Each Section.
Slide 47: 4. Application of Semantic Web Technologies to EPA Data and Information
Graphics can have RDF metadata.
Slide 48: 4. Application of Semantic Web Technologies to EPA Data and Information
Tables are structured data (copy to Excel) and available in XML.
Slide 49: 4. Application of Semantic Web Technologies to EPA Data and Information
Table copied to Excel from Browser.
Slide 50: 4. Application of Semantic Web Technologies to EPA Data and Information
Search within just one chapter of the entire document.
Slide 51: 4. Application of Semantic Web Technologies to EPA Data and Information
Better search than from conventional Internet search engines.
Slide 52: 4. Application of Semantic Web Technologies to EPA Data and Information
Appendix on Data (Data Quality) and Methods!
Slide 53: 4. Application of Semantic Web Technologies to EPA Data and Information
- Extensions
  - The same process has been applied to many other EPA and non-EPA documents so that a collection of structured documents on the same server constitutes a portal.
    - E.g., EPA Report on the Environment and the Heinz Center Report on the Nation's Ecosystems.
  - And a distributed collection of structured documents across multiple servers constitutes a network!
    - E.g., EPA and USGS Digital Library.
  - These portals and networks can be searched and their content reused in XML!
Slide 54: 4. Application of Semantic Web Technologies to EPA Data and Information
Portal and Network Nodes
Search Across All for Watershed
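Searching across all nodes for a term such as "Watershed" can be pictured with the short Python sketch below. It is only an illustration under stated assumptions: the two node names, the report and section structure, and the text are made up, and an in-memory dictionary stands in for structured documents held on separate servers.

```python
import xml.etree.ElementTree as ET

# Hypothetical nodes, each holding one structured report.
NODES = {
    "node-a": (
        "<report title='Report A'>"
        "<section title='Water'><p>Watershed health indicators.</p></section>"
        "<section title='Air'><p>Ozone trends.</p></section>"
        "</report>"
    ),
    "node-b": (
        "<report title='Report B'>"
        "<section title='Land'><p>Land cover within each watershed.</p></section>"
        "</report>"
    ),
}


def search(nodes, term):
    """Return (node, report title, section title) for every section mentioning the term."""
    needle = term.lower()
    hits = []
    for name, xml_text in nodes.items():
        root = ET.fromstring(xml_text)
        for section in root.iter("section"):
            text = " ".join(section.itertext()).lower()
            if needle in text:
                hits.append((name, root.get("title"), section.get("title")))
    return hits


watershed_hits = search(NODES, "Watershed")
print(watershed_hits)
```

Because the documents are structured, each hit can name the exact section it came from, rather than only the document as a whole.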