Presentations and Demonstrations to the Business Intelligence and Analytics (BIA) Users Group Meeting



1
Presentations and Demonstrations to the Business Intelligence and Analytics (BIA) Users Group Meeting
  • Brand Niemann
  • U.S. EPA, OTOP/ITPPD/ITSPB
  • April 13, 2004

2
Overview
  • 1. LandView 6 Population Estimator on DVD and the
    Web
  • Calculates Census 2000 demographic and housing
    characteristics for user-defined radii.
  • 2. GeoResponse - Voice GIS Multimodal
    Notification
  • Commercialization of EPA's CIO Council Showcase
    of Excellence award-winning XML and VoiceXML Web
    Service.
  • 3. Data Mining and Repository
  • US EPA Toxics Release Inventory.
  • 4. Application of Semantic Web Technologies to
    EPA Data and Information
  • Semantic Web Service Portals and Networks.

3
Background
  • The OEI Portal Project is considering the
    following portal components:
  • E-Alert
  • Geospatial
  • Data Analysts
  • AQS Data Mart
  • AQS Web Service
  • All of these match the XML Web Services pilot
    projects that have been conducted!

4
1. LandView 6 Population Estimator on DVD and the
Web
  • What is LandView 6?
  • LandView has its roots in the CAMEO software
    (Computer-Aided Management of Emergency
    Operations). CAMEO was developed by the EPA and
    the NOAA to facilitate the implementation of the
    Emergency Planning and Community Right-to-Know
    Act. This far-reaching law requires communities
    to develop emergency response plans addressing
    chemical hazards and to make available to the
    public information on chemical hazards in the
    community.
  • This product contains both database management
    software and mapping software used in the CAMEO
    system to create a simple computer mapping system
    involving two programs - MARPLOT and LandView.
  • The MARPLOT mapping program allows users to map
    Census 2000 legal and statistical areas, EPA
    EnviroFacts sites, and USGS Geographic Names
    Information System (GNIS) features.

http://www.census.gov/geo/landview/
5
1. LandView 6 Population Estimator on DVD and the
Web
  • What is LandView 6? (continued)
  • The LandView database system allows users to
    retrieve Census 2000 demographic and housing
    data, EPA EnviroFacts data and USGS GNIS
    information. The GNIS contains over 1.2 million
    records which show the official federally
    recognized geographic names for all known places,
    features, and areas in the United States that are
    identified by a proper name.
  • The LandView 6 and MARPLOT software included on
    the DVDs were created by agencies of the U.S.
    Government and are in the public domain. They can
    be copied, used and distributed freely without
    the requirement for royalty payments or further
    permissions. However, the Census Bureau cannot
    provide technical support for products created by
    others using LandView.
  • LandView 6 is available on a set of 2 DVDs that
    cover the entire nation. A national version is
    also available for installation on a network
    server or individual computer hard drives.

6
1. LandView 6 Population Estimator on DVD and the
Web
Screenshot: LandView Home Screen (callout: Population Estimator)
7
1. LandView 6 Population Estimator on DVD and the
Web
  • The Population Estimator can be opened from the
    Estimate Population within a Radius 3 button on
    the LandView 6 Home page. Normally, the
    population search is tracking the location of the
    Focus Point in MARPLOT, and the MARPLOT
    application should be running before using the
    Population Estimator. In MARPLOT, once the Focus
    Point has been set to the desired search point,
    the Population Estimator is available from the
    MARPLOT MenuBar, at Sharing/LandView/LandView
    Census 2000 Population Estimator. Either pathway
    opens the Estimator shown in the next slide.

http://www.census.gov/geo/landview/lv6help/pop_estimate.html
8
1. LandView 6 Population Estimator on DVD and the
Web
9
1. LandView 6 Population Estimator on DVD and the
Web
  • To share the rich demographics available in
    LandView 6, LandView has turned to the Census
    Block Group. The demographic profile in the
    Estimator relies on capturing the centroids of
    Census Block Groups that fall within the radius
    of a circle. For example, one Census Block Group
    might have 55% of its area inside the capture
    circle, and its data would be included. Another
    Group might have 49% of its area inside the
    circle, and it would be excluded. In sum, the
    capture and loss of individual Block Groups
    should average out, but not to the degree of
    precision attached to a population search by
    Census Blocks. This difference between Block
    Group and Block data needs to be borne in mind
    when interpreting Population Estimator results.
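The centroid-capture rule described above can be sketched in Python. This is a minimal illustration, not LandView's code: the block-group records and their keys (`id`, `centroid`, `population`), the 3,958.8-mile mean Earth radius, and the use of the haversine formula for the great-circle distance are all assumptions made for the example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def capture_block_groups(center, radius_miles, block_groups):
    """Include a block group only if its centroid lies within the circle.

    The share of the group's area inside the circle is never consulted, so a
    group straddling the edge is either counted in full or dropped in full.
    Records are hypothetical dicts with 'id', 'centroid' (lat, lon), 'population'.
    """
    lat0, lon0 = center
    captured = [
        bg for bg in block_groups
        if great_circle_miles(lat0, lon0, *bg["centroid"]) <= radius_miles
    ]
    return [bg["id"] for bg in captured], sum(bg["population"] for bg in captured)

# Hypothetical block groups around a search point (made-up values).
groups = [
    {"id": "BG-1", "centroid": (38.905, -77.0), "population": 1200},  # about 0.35 mi away
    {"id": "BG-2", "centroid": (38.930, -77.0), "population": 900},   # about 2.1 mi away
]
ids, total = capture_block_groups((38.9, -77.0), 1.0, groups)
print(ids, total)
```

Because only the centroid is tested, the whole population of BG-1 is counted and none of BG-2's, no matter how their boundaries overlap the circle; that all-or-nothing behavior is the source of the imprecision the slide warns about.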

10
1. LandView 6 Population Estimator on DVD and the
Web
  • Notes
  • (1) There are 8.2 million census tabulation
    blocks for the geographic areas shown in
    LandView. 2.7 million blocks are excluded because
    they contain zero population and zero housing
    units.
  • (2) There are 211,267 block groups for the 50
    states, District of Columbia and Puerto Rico. On
    the average, each block group contains about 39
    blocks.
  • (3) The LandView program's algorithm for
    determining which block internal points fall
    within the radius takes the curvature of the
    earth into account; the MARPLOT mapping engine
    does not when it draws the radius. Consequently,
    for larger radii, users might notice differences
    in the block point counts between MARPLOT and
    the Population Estimator.
  • (4) An alternate population search that is
    frequently of interest is Finding the Population
    Within an Irregular Polygon. An example of such a
    search is given in Lesson 5 of the LandView 6
    Tutorial.
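Note (3) can be illustrated by comparing a great-circle (haversine) distance with a flat-map distance that treats latitude and longitude as a planar grid. The flat formula is an assumption for illustration only; the notes do not say how MARPLOT projects the radius. For points due east of the center, the flat distance runs along the parallel and overstates the great-circle distance, and the gap widens as the radius grows.

```python
import math

R_MILES = 3958.8  # assumed mean Earth radius
MILES_PER_DEGREE = math.pi * R_MILES / 180

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere (accounts for curvature)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * R_MILES * math.asin(math.sqrt(a))

def flat_miles(lat1, lon1, lat2, lon2):
    """Planar distance, scaling longitude by cos(latitude) at the first point only."""
    dx = (lon2 - lon1) * MILES_PER_DEGREE * math.cos(math.radians(lat1))
    dy = (lat2 - lat1) * MILES_PER_DEGREE
    return math.hypot(dx, dy)

# Compare the two measures for points due east of a center at 45 degrees north.
for degrees_east in (0.1, 1.0, 5.0):
    curved = haversine_miles(45.0, -100.0, 45.0, -100.0 + degrees_east)
    flat = flat_miles(45.0, -100.0, 45.0, -100.0 + degrees_east)
    print(f"{degrees_east:>4} deg east: great-circle {curved:8.2f} mi, flat {flat:8.2f} mi")
```

For small search radii the two agree closely, which is why the discrepancy matters mainly for larger radii.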

11
2. GeoResponse - Voice GIS Multimodal
Notification
  • GeoResponse is an XML Web Service distributed
    across two companies that provides:
  • The power of advanced notification, analysis and
    response capabilities without the hefty start-up
    costs of such a system.
  • The flexibility to choose among plans depending
    on your needs. A simple monthly or annual fee
    provides the calling, reporting, and decision
    support capabilities you want without the need
    to buy, maintain, or hire additional resources.
    A service model gives you that flexibility.

http://www.georesponse.com
12
2. GeoResponse - Voice GIS Multimodal
Notification
Figure: GeoResponse Server Architecture (diagram labels: Phone Voice Interface; Public Switched Telephone Network; dial/deliver messages; ASR (automatic speech recognition); TTS (text-to-speech); user responses; GIS Interface; Map Engine; Spatial Database; location-based info; navigation (structured results); Internet; call trees; XML data exchange; database integration; report/real-time logging; GeoResponse Applications; Web Services; extensible to other channels (SMS, IM, e-mail))
13
2. GeoResponse - Voice GIS Multimodal
Notification
  • Moving Beyond Traditional Emergency Response
    Notification
  • Bringing together the best in speech technology
    with the power of mapping through GIS to provide
    Emergency Response Managers with the speed and
    flexibility to meet even the most challenging
    notification scenarios.
  • GeoResponse is developed using XML technology,
    which provides the open standard that the
    Emergency Response community has been seeking.
  • The easy-to-use application can be accessed by
    multiple agents from any location through a
    simple Internet browser. With a set of
    personalized passwords, Responders can customize
    a set of standard response forms, choose the
    targeted area for the notification using a simple
    GIS interface, and initiate the call. Recipients
    can hear the message spoken.
  • The system can conduct a dialog, map and collect
    critical response information, and trigger alerts
    and other dispatch actions.
  • Easy-to-read reports help you rate and understand
    your notification performance to identify which
    calls were answered, which ones never reached
    their intended recipients, and which recipients
    need help.

14
2. GeoResponse - Voice GIS Multimodal
Notification
  • Enable Informed Decision-Making and Communication
    when it matters most
  • Log Incidents and Identify Evacuation Zones using
    Interactive Mapping.
  • Type or speak custom messages using a combination
    of Text-To-Speech technology and Recorded Speech
    or Audio.
  • Collect real-time critical responses from call
    recipients using Voice Recognition technology.
  • Collect and Respond to critical responses (e.g.,
    "I am immobile and need help evacuating") using
    Real-time GIS Mapping and rapid Multi-Modal
    Notifications (messages go to LAN phone, text
    message, pager, and cell). Notifications in
    Multiple Languages.
  • Communicate and Integrate with any system through
    the power of XML.
  • Use comprehensive Web-based Reports to rate your
    performance in notifications.

15
2. GeoResponse - Voice GIS Multimodal
Notification
  • Features
  • Identify locations accurately using a web-based
    GIS interface.
  • Customize notification messages from any
    location by typing into a simple Internet browser.
  • Dynamic integration with voice based notification
    so all recipients hear instructions.
  • Interactive dialog with recipient using voice
    recognition to identify who needs help.
  • Real time data collection and notification to key
    response personnel.
  • Trigger Inbound and Outbound calls.
  • Personalization of notification messages.
  • Flexible options to search for recipient
    responses.
  • Trigger dispatch, alerts, and other activities.

16
2. GeoResponse - Voice GIS Multimodal
Notification
  • Functions (see next slides)
  • Report an Event (Log)
  • Geocode the Event (View)
  • Define the Call List (Create)
  • Customize your Message
  • Make the Call
  • Track and Map Responses
  • Trigger another Process
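As a rough illustration of the "Define the Call List" and "Customize your Message" steps, the sketch below selects recipients inside an evacuation radius around a geocoded incident and writes a personalized call request in XML. Everything here is hypothetical: the element names (`notification`, `call`, `message`), the recipient fields, and the flat-earth distance approximation (reasonable only for small zones) are assumptions, not GeoResponse's actual schema or API.

```python
import math
import xml.etree.ElementTree as ET

MILES_PER_DEGREE = 69.09  # approximate miles per degree of latitude

def distance_miles(a, b):
    """Flat-earth approximation; adequate only for small evacuation zones."""
    dy = (b["lat"] - a["lat"]) * MILES_PER_DEGREE
    dx = (b["lon"] - a["lon"]) * MILES_PER_DEGREE * math.cos(math.radians(a["lat"]))
    return math.hypot(dx, dy)

def build_notification(incident, recipients, radius_miles, template):
    """Build a hypothetical XML call request for everyone inside the zone."""
    root = ET.Element("notification", incident=incident["id"])
    for person in recipients:
        if distance_miles(incident, person) <= radius_miles:   # define the call list
            call = ET.SubElement(root, "call", phone=person["phone"])
            # Customize the message for each recipient.
            ET.SubElement(call, "message").text = template.format(name=person["name"])
    return root

incident = {"id": "INC-001", "lat": 40.0, "lon": -75.0}
people = [
    {"name": "Ann", "phone": "555-0101", "lat": 40.01, "lon": -75.0},  # about 0.7 mi away
    {"name": "Bob", "phone": "555-0102", "lat": 40.10, "lon": -75.0},  # about 6.9 mi away
]
request = build_notification(
    incident, people, 2.0,
    "{name}, please evacuate now. Say 'help' if you need assistance.")
print(ET.tostring(request, encoding="unicode"))
```

Emitting the call list as XML rather than as an internal data structure is what would let another system (a dialer, a mapping layer, a dispatch tool) consume it, which is the integration point the slides attribute to XML.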

17
2. GeoResponse - Voice GIS Multimodal
Notification
Log An Incident
18
2. GeoResponse - Voice GIS Multimodal
Notification
View Incidents
19
2. GeoResponse - Voice GIS Multimodal
Notification
Create a Call List
20
2. GeoResponse - Voice GIS Multimodal
Notification
Customize Message
21
2. GeoResponse - Voice GIS Multimodal
Notification
Initiate Call
22
2. GeoResponse - Voice GIS Multimodal
Notification
  • The partnership
  • Broadstrokes is a software development and
    consulting company specializing in middleware
    solutions applied to communication technologies.
    The company has developed solutions based on XML
    open-standards that significantly reduce the time
    and cost for developing voice and
    telecommunication applications. This technology
    makes it possible to move persisted and real-time
    information between devices (phone, cell, PDA,
    laptop), enabling a host of more reliable and
    collaborative applications.

23
2. GeoResponse - Voice GIS Multimodal
Notification
  • The partnership (continued)
  • Intelligent Decisions Systems Inc. (IDSi)
    delivers high-quality GIS and Information System
    solutions for clients seeking effective,
    value-added services and management tools.
    Blending the latest Internet technology with
    powerful GIS mapping capabilities, IDSi continues
    to develop innovative solutions for data
    collection, analysis, and notifications for
    government and private sector clients. IDSi's
    professionals include award-winning developers
    and project managers with years of experience in
    GIS and Internet technology development and
    implementation.
  • IDSi is an ESRI partner and has worked
    extensively with ESRI ArcGIS, ArcIMS, and ArcSDE.

24
3. Data Mining and Repository
  • Abstract of Presentation
  • The eXtensible Markup Language (XML) promotes
    information sharing and reuse and enables
    enterprise integration in XML repositories. The
    Toxics Release Inventory (TRI), published by the
    U.S. EPA, is a valuable source of information
    regarding toxic chemicals that are being used,
    manufactured, treated, transported, or released
    into the environment. The TRI database is about 8
    GB and requires industrial-strength tools and
    analyses for data mining, indexing, conversion to
    XML, and storage and retrieval with XML Web
    Services. This pilot demonstrated that large EPA
    databases can be data mined and repurposed into
    XML repositories.
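The conversion-to-XML step can be sketched with Python's standard library. The tab delimiter, the column names, and the element names below are assumptions for illustration; the actual layout of the TRI .DAT files is not described in these slides. Column headers become element names, so they must already be valid XML names.

```python
import csv
import io
import xml.etree.ElementTree as ET

def delimited_to_xml(text, root_tag, row_tag, delimiter="\t"):
    """Turn a delimited file with a header row into one XML element per record."""
    root = ET.Element(root_tag)
    for record in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        row = ET.SubElement(root, row_tag)
        for column, value in record.items():
            # Header names become lowercase tags; values keep their original text.
            ET.SubElement(row, column.lower()).text = (value or "").strip()
    return root

# Two made-up records in a hypothetical tab-delimited layout.
sample = (
    "TRI_FACILITY_ID\tFACILITY_NAME\tST\n"
    "00001XMPLE1MAIN\tExample Plant One\tVA\n"
    "00002XMPLE2SIDE\tExample Plant Two\tMD\n"
)
facilities = delimited_to_xml(sample, "TRI_FACILITY", "facility")
print(ET.tostring(facilities, encoding="unicode"))
```

For a database the size of TRI (about 8 GB), records would need to be streamed to disk rather than held in one in-memory tree; this sketch keeps everything in memory for brevity.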

February 25-26, 2003, Data Mining Technology for
Military and Government Applications Conference,
"XML Web Services for Data Mining and Repository:
US EPA Toxics Release Inventory," Brand Niemann,
US EPA, and "Data Mining Technology," Jim Walters,
Insightful Corporation. See http://web-services.gov
25
3. Data Mining and Repository
  • Six years ago, the World Wide Web Consortium
    (W3C) published XML 1.0 as a Recommendation on
    February 10, 1998.
  • The eXtensible Markup Language (XML) has become
    pervasive nearly everywhere that information is
    managed and has changed not only the way people
    publish documents on the Web but also the way
    people manage information internal to their
    enterprise.
  • XML has emerged as the standard platform for
    convergence of information.

See http://www.w3.org/2003/02/xml-at-5.html
26
3. Data Mining and Repository
  • CD-ROM Raw DAT (excerpt)
  • Internal 30 files/4.4 GB
  • TRI_CHEM_ACTIVITY 35.2 MB
  • tri_chem_info 75 KB
  • TRI_CODE_DESC 570 KB
  • TRI_COUNTY 1.5 MB
  • TRI_ENERGY_RECOVERY 16.0 MB
  • TRI_FACILITY 31.6 MB
  • TRI_FACILITY_DB 1.7 MB
  • TRI_FACILITY_DB_HISTORY 6.5 MB
  • TRI_FACILITY_HISTORY 124 MB
  • TRI_FACILITY_NPDES 1.1 MB
  • TRI_FACILITY_NPDES_HISTORY 4.2 MB
  • TRI_FACILITY_RCRA 1.8 MB
  • TRI_FACILITY_RCRA_HISTORY 7.1 MB
  • TRI_FACILITY_SIC 1.6 MB
  • TRI_FACILITY_SIC_HISTORY 6.8 MB

Total of 74 files with 8 GB!
27
3. Data Mining and Repository
http://www.insightful.com/
28
3. Data Mining and Repository
  • Insightful Miner Components
  • Full life-cycle, from data access through
    deployment, data mining workbench.
  • Scalable, extensible and affordable toolset that
    enables both new data miners and skilled modelers
    to solve their toughest analytic challenges with
    a best-of-breed approach.
  • Advanced pipeline architecture and analytics are
    built to scale to the data size problems of today
    and far into the future.
  • Embedded data analysis language that allows it to
    adapt to the changing business needs of its users.

29
3. Data Mining and Repository
I-Miner on TRI 2000 Public Release Data
Pipeline Architecture and Visual Workflow
30
3. Data Mining and Repository
I-Miner on TRI 2000 Public Release Data
Histogram Plots
Rationale: Toxic chemical releases to different
media should be correlated; outliers suggest a
need to follow up with the reporting facilities.
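The rationale above (releases to different media should move together, and facilities that break the pattern deserve a follow-up) can be sketched as a simple regression screen. This is not Insightful Miner's method: the log1p transform, the ordinary least-squares fit of water releases on air releases, the 2-standard-deviation cutoff, and the release amounts in the example are all assumptions.

```python
import math

def flag_outliers(air, water, z_cut=2.0):
    """Return indexes of facilities whose water release departs from the air/water trend."""
    if len(air) != len(water) or len(air) < 3:
        raise ValueError("need at least three paired observations")
    x = [math.log1p(v) for v in air]    # log scale tames the heavy skew in release amounts
    y = [math.log1p(v) for v in water]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("air releases are all identical; no trend to fit")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # residual standard error
    if sd < 1e-12:
        return []  # essentially a perfect fit: nothing stands out
    return [i for i, r in enumerate(residuals) if abs(r) > z_cut * sd]

# Made-up release amounts in pounds; facility 4 reports far more to water
# than its air release would suggest.
air   = [10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
water = [10, 20, 50, 100, 200000, 500, 1000, 2000, 5000, 10000]
print(flag_outliers(air, water))
```

A flagged index is only a prompt to check the facility's report; a data-entry error and a genuine unusual release look the same to this screen.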
31
3. Data Mining and Repository
  • Update: Data Fusion with XML
  • Integrated data analysis across content types is
    enabled by XML, but is still a very new area for
    vendors and researchers.
  • See Native XML Database Technologies in the next
    slides.
  • Also see DataSpaceWeb.Org in subsequent slides.
  • Insightful Miner Server Released That Provides
    All the Insightful Miner Desktop Capabilities in
    a Multi-User, Distributed Environment Using
    Industry Standard Terminal Services or X-Windows
    Protocols, Fall 2003.
  • Insightful Miner Selected as Finalist in the
    Business Management Analytics Reporting
    Category in eWeek's Excellence Awards Program,
    April 5, 2004.

32
3. Data Mining and Repository
  • Why store XML?
  • Single source publishing.
  • Effective searching.
  • eBusiness messages.
  • XML-driven Web sites.
  • Web Services.
  • Mobile Communication devices.
  • Office 2003
  • Longer message life-cycles moving from simple
    service invocations to long-running stateful
    interactions and the need for message management.
  • Source: Mike Champion and Steve Hamby, Native XML
    Database Applications Development, XML 2002
    Conference Tutorial, December 9th.

33
3. Data Mining and Repository
  • Software AG's Tamino XML Server 4.1.4
  • Native XML storage.
  • Store for any type of data.
  • Extensible by definition.
  • Consolidates data from various sources in one
    place.
  • Find-Engine for fast retrieval of XML-based
    content.
  • Built-in full-text retrieval at no extra cost.
  • Multi-channel output formatting capabilities.
  • Server extensions for custom functionality and
    application integration.

34
3. Data Mining and Repository
http://www.softwareag.com/tamino/
35
3. Data Mining and Repository
Store and Query TRI Data in XML Format
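Tamino is queried through its own XML query facilities, which are not shown here. As a stand-in, the sketch below uses the limited XPath support in Python's `xml.etree.ElementTree` to show the kind of question an XML store of TRI records can answer. The element names and values are made up for the example and are not actual TRI data.

```python
import xml.etree.ElementTree as ET

# Made-up records in a hypothetical XML layout.
TRI_SAMPLE = """<releases>
  <facility id="F1"><name>Plant A</name><state>VA</state><chemical>Toluene</chemical><air_lbs>1200</air_lbs></facility>
  <facility id="F2"><name>Plant B</name><state>MD</state><chemical>Xylene</chemical><air_lbs>300</air_lbs></facility>
  <facility id="F3"><name>Plant C</name><state>VA</state><chemical>Xylene</chemical><air_lbs>50</air_lbs></facility>
</releases>"""

def facilities_in_state(xml_text, state):
    """IDs of facilities whose <state> child equals the given code."""
    root = ET.fromstring(xml_text)
    return [f.get("id") for f in root.iterfind(f".//facility[state='{state}']")]

def total_air_releases(xml_text, chemical):
    """Sum the <air_lbs> values for every facility reporting the chemical."""
    root = ET.fromstring(xml_text)
    return sum(float(f.findtext("air_lbs", "0"))
               for f in root.iterfind(f".//facility[chemical='{chemical}']"))

print(facilities_in_state(TRI_SAMPLE, "VA"))
print(total_air_releases(TRI_SAMPLE, "Xylene"))
```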
36
3. Data Mining and Repository
  • DataSpace is a web services based infrastructure
    for exploring, analyzing, and mining remote and
    distributed data. Their Web site describes
    DataSpace protocols, DataSpace applications, and
    open source DataSpace servers and clients.
  • DataSpace is supported by grants from the NSF.
    DataSpace is built around standards developed by
    the Data Mining Group and W3C.
  • The Data Mining Group (DMG) is an independent,
    vendor led group which develops data mining
    standards, such as the Predictive Model Markup
    Language (PMML).
  • DataSpace applications employ a protocol for
    working with remote and distributed data called
    the DataSpace Transfer Protocol or DSTP. DSTP
    simplifies working with data by providing direct
    support for common operations, such as working
    with attributes, keys and metadata.
  • The DSTP protocol can be layered over specialized
    high performance transport protocols such as
    SABUL. Using protocols such as SABUL, DataSpace
    applications can effectively work on wide area
    high performance OC-3, OC-12 and Gbps networks.
    SABUL currently holds the land speed record for
    connecting two distributed clusters, a record set
    at iGrid 02.

http://www.dataspaceweb.org/
37
4. Application of Semantic Web Technologies to
EPA Data and Information
  • Overview
  • Repurposing of large documents with mixed content
    (text, tables, graphics, etc.) into XML content
    collections began with The Statistical Abstract
    of the United States (1999 Edition) as part of
    the FedStats.Net project to build a distributed
    network of statistical data and information using
    new XML standards and technology.
  • The Statistical Abstract of the United States was
    considered to be one of the best examples of
    "manual aggregation of government information"
    (from some 200 programs across about 70 agencies)
    that would benefit from a distributed XML-based
    content network that would leave the content in
    the hands of its originators and create a more
    "living document".
  • This work was recognized by OMB Associate
    Director for Information Technology and
    E-Government, Mark Forman, and the Quad Council
    with a Special Award for Innovation in the 2002
    CIO Showcase of Excellence for the use of XML in
    a distributed content network (renamed FedGov)
    and use of VoiceXML in providing universal access
    to emergency response information.

38
4. Application of Semantic Web Technologies to
EPA Data and Information
  • Overview (continued)
  • More recently, the eGov Act of 2002's provisions
    for an Intergovernmental Committee on Government
    Information (ICGI) and Data Integration Pilots,
    the Federal Enterprise Architecture's Data and
    Information Reference Model (DRM) and its Data
    Management Strategy and the focus in the CIO
    Council's Architecture and Infrastructure
    Committee on Intergovernmental Data Exchange,
    have all been tied together in a new pilot that
    simultaneously accomplishes multiple objectives
    (see next slide).
  • This Smart Data Enterprise approach came from
    the Semantic Technologies for eGov Conference,
    September 8, 2003, at the White House Conference
    Center (in which the EPA CIO and her staff
    participated), and continues in the new CIO
    Council's Semantic Interoperability (Web
    Services) Community of Practice (SICoP) (see
    subsequent slide).

39
4. Application of Semantic Web Technologies to
EPA Data and Information
  • Overview (continued)
  • (1) Repurpose government data and information
    into structured documents using new XML-based
    standards and technologies that facilitate reuse
    and exchange.
  • (2) Repurpose the data and information so that it
    can be readily decomposed into XML fragments (for
    text and tables) and RDF metadata (for graphics)
    that can be stored and referenced in a database
    and can be in turn repurposed into new documents
    that provide additional user-defined views of the
    data and information.
  • (3) Organize and categorize the repurposed data
    and information using taxonomies and even
    ontologies in semantic registries and
    repositories.
  • (4) Use "XML data islands", and RDF and OWL to
    add metadata, interoperability and semantic
    meaning to data and information to be reused and
    exchanged.
  • (5) Standardize the data element and XML tag
    names in a DRM registry and repository.
  • (6) Share these results with others that are
    working on Semantic Web and Technology
    Applications for eGovernment.
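Steps (1) and (2) can be sketched as "explode, then reassemble": each `section` or `table` that carries an `id` is stored as a standalone fragment, and a new user-defined view is assembled from whichever fragments are selected. The element names, the dictionary standing in for an XML database, and the `view` wrapper element are assumptions for illustration.

```python
import copy
import xml.etree.ElementTree as ET

def explode(doc_xml, tags=("section", "table")):
    """Store each identified section/table as its own XML fragment, keyed by id."""
    store = {}
    for element in ET.fromstring(doc_xml).iter():
        if element.tag in tags and element.get("id"):
            fragment = copy.deepcopy(element)
            fragment.tail = None  # drop trailing text so the fragment parses on its own
            store[element.get("id")] = ET.tostring(fragment, encoding="unicode")
    return store

def assemble_view(store, fragment_ids, title):
    """Build a new, user-defined document from previously stored fragments."""
    view = ET.Element("view", title=title)
    for fragment_id in fragment_ids:
        view.append(ET.fromstring(store[fragment_id]))
    return view

report = (
    "<report>"
    "<section id='s1'><p>Introduction</p>"
    "<table id='t1'><row><cell>1999</cell><cell>12</cell></row></table></section>"
    "<section id='s2'><p>Methods</p></section>"
    "</report>"
)
store = explode(report)
custom = assemble_view(store, ["t1", "s2"], "Tables and methods only")
print(ET.tostring(custom, encoding="unicode"))
```

Storing fragments by identifier is what makes reuse possible: the same table can appear in the original report and in any number of derived views without being copied by hand.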

40
4. Application of Semantic Web Technologies to
EPA Data and Information
Organization chart (labels: Industry Advisory Council (IAC); U.S. CIO Council; OMB FEAPMO; Enterprise Architecture Special Interest Group; Architecture and Infrastructure Committee; IT Workforce Connections; Best Practices Committee; WGs and CoPs; Subcommittees: Governance, Components, Emerging Technologies; Semantic Interoperability Community of Practice; Chief Architects Forum)
41
The Smart Data Enterprise
Figure 2. Developer's Perspective on Data. To the
application developer, the data evolution
timeline is viewed through the correlation of
programming paradigms with the relation of data
and code. From "Designing the Smart-Data
Enterprise: Get prepared for the 10 ways that
semantic computing will impact enterprise IT," by
Michael C. Daconta, posted November 28, 2003,
Enterprise Architect Magazine.
42
The Smart Data Enterprise
Figure 3. The Smart Data Continuum. Data has
progressed through four stages of increasing
intelligence. (Reprinted with permission from The
Semantic Web: A Guide to the Future of XML, Web
Services, and Knowledge Management, John Wiley &
Sons, 2003.) From "Designing the Smart-Data
Enterprise: Get prepared for the 10 ways that
semantic computing will impact enterprise IT," by
Michael C. Daconta, posted November 28, 2003,
Enterprise Architect Magazine.
43
4. Application of Semantic Web Technologies to
EPA Data and Information
  • Overview (continued)
  • The methodology for repurposing large documents
    into structured XML content collections was
    presented previously:
  • November 18-19, 2003, Website Content Management
    for Government Conference, Invited Presentation
    on November 19th on "Repurposing Documents Into
    Semantic Web Services and Networks" (EPA
    Enterprise Integration Portal/Data Exchange
    Network Pilot), Doubletree Hotel, Arlington, VA.
    Also see Folio-to-XML Conversion and Webinar.
  • See Past Meetings and Presentations at
    http://web-services.gov

44
4. Application of Semantic Web Technologies to
EPA Data and Information
  • Markup adds structure and metadata to documents
    that contain mixed content (text, tables,
    graphics, references, etc.)
  • An EPA Indicator Report example
  • America's Children and the Environment Report,
    2003
  • The report consists of 171 pages of text, tables,
    graphics, references, etc. and exists in two
    basic forms:
  • A 2 MB PDF file and
  • A new HTML version on the Web.
  • This document was converted to XML by several
    tools, but the automated conversion was
    practically worthless from a semantic point of
    view.
  • This single document covers so much information
    that it will benefit immensely from semantic
    dissection, linking, and augmentation (explosion
    of single PDF file to multiple XML files stored
    in an XML database for reuse).
  • As a result, this report consists of the
    following:
  • Images (2.64 MB): 33 .jpeg and 33 .rdf files (the
    .rdf metadata format is explained later).
  • XML: 102 files (1.65 MB) for 12 sections.

45
4. Application of Semantic Web Technologies to
EPA Data and Information
Initial Taxonomy/Ontology for Structuring
Additional Data and Information.
46
4. Application of Semantic Web Technologies to
EPA Data and Information
Detailed Table of Contents for Each Section.
47
4. Application of Semantic Web Technologies to
EPA Data and Information
Graphics can have RDF metadata.
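A minimal sketch of the kind of RDF metadata that can accompany each graphic, using Dublin Core element names. The RDF and Dublin Core namespace URIs are the standard ones; the image file name, the choice of properties, and their values are assumptions, since the slides do not show the contents of the actual .rdf files.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)

def graphic_metadata(image_uri, title, creator, subject, media_type="image/jpeg"):
    """Describe one image with Dublin Core properties inside an rdf:RDF wrapper."""
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(rdf, f"{{{RDF_NS}}}Description",
                                {f"{{{RDF_NS}}}about": image_uri})
    for prop, value in (("title", title), ("creator", creator),
                        ("subject", subject), ("format", media_type)):
        ET.SubElement(description, f"{{{DC_NS}}}{prop}").text = value
    return ET.tostring(rdf, encoding="unicode")

# Hypothetical file name and values.
print(graphic_metadata("figure01.jpg", "Example figure title",
                       "U.S. EPA", "children's environmental health"))
```

Keeping the metadata in a separate .rdf file next to each .jpeg means the image itself is untouched while search tools can still read its title and subject.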
48
4. Application of Semantic Web Technologies to
EPA Data and Information
Tables are structured data (copy to Excel) and
available in XML
49
4. Application of Semantic Web Technologies to
EPA Data and Information
Table copied to Excel from Browser
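Because the tables are stored as XML rather than as page layout, turning one into spreadsheet-ready CSV is a short traversal. The `table`/`row`/`cell` element names and the sample values are assumptions; the report's actual markup is not shown in the slides.

```python
import csv
import io
import xml.etree.ElementTree as ET

def table_to_csv(table_xml):
    """Flatten a row/cell XML table into CSV text that Excel can open."""
    table = ET.fromstring(table_xml)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in table.iter("row"):
        writer.writerow([(cell.text or "").strip() for cell in row.findall("cell")])
    return buffer.getvalue()

# Made-up table contents.
example = ("<table>"
           "<row><cell>Year</cell><cell>Percent</cell></row>"
           "<row><cell>1999</cell><cell>12</cell></row>"
           "</table>")
print(table_to_csv(example))
```

The `csv` module handles quoting, so cell text containing commas survives the round trip into a spreadsheet.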
50
4. Application of Semantic Web Technologies to
EPA Data and Information
Search within just one chapter of the entire
document.
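Scoping a search to a single chapter is straightforward once chapters are XML elements rather than page ranges. A sketch, assuming `section` elements with `id` attributes and paragraph text in `p` elements; the match is a plain case-insensitive substring test, not the search engine used in the demonstration, and the sample text is made up.

```python
import xml.etree.ElementTree as ET

def search(doc_xml, term, section_id=None):
    """Return matching paragraphs, optionally restricted to one section."""
    scope = ET.fromstring(doc_xml)
    if section_id is not None:
        scope = scope.find(f".//section[@id='{section_id}']")
        if scope is None:
            return []
    needle = term.lower()
    return [p.text for p in scope.iter("p") if p.text and needle in p.text.lower()]

doc = ("<report>"
       "<section id='ch1'><p>Lead in drinking water.</p></section>"
       "<section id='ch2'><p>Lead in blood.</p><p>Asthma rates.</p></section>"
       "</report>")
print(search(doc, "lead"))                    # whole document
print(search(doc, "lead", section_id="ch2"))  # one chapter only
```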
51
4. Application of Semantic Web Technologies to
EPA Data and Information
Better search than from conventional Internet
search engines.
52
4. Application of Semantic Web Technologies to
EPA Data and Information
Appendix on Data (Data Quality) and Methods!
53
4. Application of Semantic Web Technologies to
EPA Data and Information
  • Extensions
  • The same process has been applied to many other
    EPA and non-EPA documents so that a collection of
    structured documents on the same server
    constitutes a portal.
  • E.g. EPA Report on the Environment and the Heinz
    Center Report on the Nation's Ecosystems.
  • And a distributed collection of structured
    documents across multiple servers constitutes a
    network!
  • E.g. EPA and USGS Digital Library.
  • These portals and networks can be searched and
    their content reused in XML!
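The portal/network distinction can be sketched in a few lines: a portal is a set of structured documents on one server, and a network search runs the same query against every node and merges the hits. Here each node is an in-memory dictionary of XML documents; in practice each would be a remote server queried over the Web, and the node names, document names, and text below are hypothetical.

```python
import xml.etree.ElementTree as ET

def search_node(documents, term):
    """Search one node (portal): yield (document name, paragraph) pairs that match."""
    needle = term.lower()
    for name, xml_text in documents.items():
        for p in ET.fromstring(xml_text).iter("p"):
            if p.text and needle in p.text.lower():
                yield name, p.text

def search_network(nodes, term):
    """Run the same query on every node and tag each hit with its node."""
    return [(node, doc, text)
            for node, documents in nodes.items()
            for doc, text in search_node(documents, term)]

nodes = {
    "node-a": {"report-1": "<doc><p>Watershed condition indicators.</p></doc>"},
    "node-b": {"report-2": "<doc><p>Stream flow data.</p><p>Watershed boundaries.</p></doc>"},
}
for hit in search_network(nodes, "watershed"):
    print(hit)
```

Because every node exposes the same XML structure, the network search needs no per-source adapters; that shared structure is what turns a collection of portals into a searchable network.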

54
4. Application of Semantic Web Technologies to
EPA Data and Information
Portal and Network Nodes
Search Across All for Watershed