Title: Compilation and Design of a Functioning Distributed Database of North American Electric Generating Emissions
Demonstration of a Distributed Emissions
Inventory using Web Technologies
- Stefan Falke, Center for Air Pollution Impact and Trend Analysis (CAPITA), Washington University in St. Louis
- Gregory Stella, Alpine Geophysics, LLC
- Terry Keating, US EPA Office of Air and Radiation
2. Background
- Air pollutant emission inventories for the US, Canada, and Mexico are compiled, stored, and disseminated using different methods.
- A single, comprehensive, and accurate emissions inventory is essential for coordinated reporting, policy development, transport analyses, and socio-economic studies, and it creates an environment for collaboration among international researchers, policy-makers, and the interested public.
- In support of this longer-term goal, the Commission for Environmental Cooperation (CEC) and the US EPA have initiated a project to develop a prototype web tool that enables uniform access to distributed emissions data from North American electricity generating power plants.
3. Distributed Data and Management Networks
Advances in information science and technology are driving the trend toward distributed networks and virtual communities for science and management.
- Cyberinfrastructure: NSF's initiative to apply new information technology to create new ways of conducting collaborative research (http://www.communitytechnology.org/nsf_ci_report/)
- Earth Observation Summit: international effort to build comprehensive, coordinated, and sustained Earth observation systems (http://www.earthobservationsummit.gov)
- Ecoinformatics: EPA's vision for national and international cooperation in data and technology development (http://oaspub.epa.gov/sor/user_conference.startup)
- Integrated Ocean Observing System: international network for ocean-related monitoring, assessment, and communication (http://www.ocean.us/)
- Linked Environments for Atmospheric Discovery: network of high-performance computers and software to gain new insights into weather (http://lead.ou.edu/)
- Virtual Observatory: network for astronomical data sharing and distributed analysis (http://www.us-vo.org/)
4. Emissions Community Collaborative Activities
- NIF data standards
  - Standard format and submission
  - NEI XML schema
- Environmental Information Exchange Network
  - Network linking EPA, states, and other partners through the Internet and standardized data formats
- Facility Registry System
  - Standard facility codes and locations
- Data sharing efforts
  - States, tribes, local agencies, RPOs
  - North America
5. NEISGEI
The Networked Environmental Information System for Global Emissions Inventories (NEISGEI) is both a conceptual framework and an implementation effort for the development of a fully integrated, distributed air emissions inventory, and the foundation for an all-media environmental information network. Its aims are to:
- Tie together data at all spatial and temporal scales using emerging distributed database technologies
- Provide shared, online tools for processing and analysis
- Provide for the seamless merging, manipulation, and analysis of Internet-accessible, air-quality-relevant data through the development of emerging Internet-oriented technologies
- Make use of existing resources: link and partner with other efforts
- Build a broad-based air quality user community: scientists, regulators, policy analysts, and the public
- Create the network and toolkit piece-wise through multiple, connected projects
Ongoing efforts:
- NSF-EPA Digital Government funded project: the California Air Resources Network
- Fire Air Quality Data and Tools Network
Future effort:
- EPA OAR RFA on Distributed Air Quality Data in Support of NEISGEI
6. CAREN: The California Air Resources Network
Eduard Hovy, Jose-Luis Ambite, Andrew Philpot
USC Information Sciences Institute
- Environmental data sharing among international, national, state, and local governments, the public, and academic and other non-governmental research organizations is a difficult challenge.
- Barrier: technological incompatibilities
- Barrier: data format incompatibilities
- Barrier: financial (staff time) limitations
- The solution strategy (first step): automate the integration of heterogeneous databases
Use semi-automated information integration methods to generate translation protocols between related information sources, e.g., AQMD and CARB.
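As a rough illustration of what "semi-automated" can mean at the level of field names, the sketch below proposes matches between two field lists by string similarity (Python's standard `difflib`) and leaves low-scoring fields for a person to resolve. This is not CAREN's method, which the slide does not detail; the function names, the 0.5 threshold, and the example field lists are assumptions.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case a field name and drop non-alphanumerics, so 'SO2_Ann' -> 'so2ann'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def propose_mappings(source_fields, target_fields, threshold=0.5):
    """Suggest the most similar target field for each source field.

    Returns {source_field: (target_field, score)} for pairs whose similarity
    ratio reaches the threshold; fields below it are left for manual review.
    """
    proposals = {}
    for src in source_fields:
        best_field, best_score = None, 0.0
        for tgt in target_fields:
            score = SequenceMatcher(None, normalize(src), normalize(tgt)).ratio()
            if score > best_score:
                best_field, best_score = tgt, score
        if best_field is not None and best_score >= threshold:
            proposals[src] = (best_field, round(best_score, 2))
    return proposals


if __name__ == "__main__":
    # Hypothetical field lists from two sources.
    print(propose_mappings(["SO2_Ann", "Sulfur Dioxide"], ["SO2Yr", "NOXYr"]))
```

String similarity catches spelling variants such as `SO2_Ann` and `SO2Yr` but not synonyms such as `Sulfur Dioxide`, which falls below the threshold here; that gap is why a human review step stays in the loop.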
7. Fire and Air Quality Data Network
[Figure: architecture diagram. Components shown: data sources (BLM Fire History, FS Coarse Spatial Data, NEI Fire Emissions, HMS Fire Detection, Wildland Fire Assessment System); access formats (ftp, text tables, RDBMS); data wrappers at CAPITA; data catalog; spatial interpolation service; views.]
- Data wrappers are used to translate data sets into a uniform format.
- The data are either stored on the CAPITA database server or dynamically accessed from their original source.
- The datasets are registered with metadata in the data catalog.
- GIS-type interfaces provide users with ways to view, analyze, and export the data.
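A minimal sketch of the wrapper idea, assuming a simple uniform record layout (source, site, year, pollutant, value) that is invented here for illustration; the actual CAPITA wrappers and their output format are not described in the slides. One wrapper reads a delimited text table, the other adapts rows already returned by a database query.

```python
import csv
import io

# Uniform record layout that every wrapper emits (an assumption for this sketch).
UNIFORM_FIELDS = ("source", "site_id", "year", "pollutant", "value")


def wrap_text_table(source, text, site_col, year_col, pollutant_cols):
    """Wrap a comma-delimited text table with one row per site-year and one column per pollutant.

    pollutant_cols maps a shared pollutant name to the table's own column name.
    """
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for pollutant, col in pollutant_cols.items():
            if row.get(col):  # skip blank or missing cells
                records.append({
                    "source": source,
                    "site_id": row[site_col],
                    "year": int(row[year_col]),
                    "pollutant": pollutant,
                    "value": float(row[col]),
                })
    return records


def wrap_db_rows(source, rows):
    """Wrap query results already shaped as (site_id, year, pollutant, value) tuples."""
    return [dict(zip(UNIFORM_FIELDS, (source, *row))) for row in rows]
```

Because every wrapper emits the same record shape, downstream catalog, query, and view code never needs to know which format a dataset arrived in.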
8. Envisioned Emissions Community Resource of Data, Tools, Users, and Projects
[Figure: diagram of the envisioned community resource. Components shown: data catalogs (Geospatial One-Stop, Emissions Inventory Catalog), wrappers, XML, RDBMS, web tools/services, GIS, emissions inventories, activity data, emissions factors, estimation methods, surrogates, report generation, data analysis, comparison of emissions methods, spatial allocation, model development, and transport models.]
9. Current Project Objectives
- Recommend and demonstrate to the CEC approaches for the comparability of techniques and methodologies for data gathering and analysis, data management, and electronic data communications, to promote access to publicly available electric utility emissions data
- Identify, collect, and review existing sources of electric generating utility (EGU) emissions and activity databases, and provide a summary of the state of the science
- Build a prototype web browser tool to query, retrieve, and explore emissions data from heterogeneous databases
- Demonstrate the utility of new information technologies in creating an integrated network of distributed data and tools
- Dynamically link multiple existing data sources without requiring substantial modification on the provider's end, and provide interfaces that make the links transparent to the end user
- Add value to the linked data through the application of data analysis and processing tools
The project's focus is on criteria pollutants and toxics because of their availability and accessibility.
10. Design Objectives
- Distributed
- Non-intrusive to data provider
- Transparent to end user
- Flexible
- Extendable
- Light on user requirements
11. Process of Building the Demonstration
- Identify and access relevant data (build wrappers)
- Build a relational database to temporarily store the data that are not accessible in a distributed manner
- Acquire authorization and access to the data that are dynamically accessible through Internet interfaces
- Create field name mappings among datasets
- Identify available web technologies for building a distributed emissions tool
- Develop new components necessary for the prototype
- Build a web tool prototype demonstrating the feasibility of exploring emissions data
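The temporary relational store could look something like the following, with an in-memory SQLite database standing in for the RDBMS on the CAPITA server (the slides do not name the product). The table layout and function names are assumptions, and the records are expected in a simple uniform layout of source, site, year, pollutant, and value.

```python
import sqlite3


def build_store(records):
    """Load emissions records (dicts) into an in-memory SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        """CREATE TABLE emissions (
               source TEXT, site_id TEXT, year INTEGER,
               pollutant TEXT, value REAL)"""
    )
    # Named placeholders are filled from each record's keys.
    con.executemany(
        "INSERT INTO emissions VALUES (:source, :site_id, :year, :pollutant, :value)",
        records,
    )
    con.commit()
    return con


def total_by_source(con, pollutant, year):
    """Sum one pollutant for one year, per source database."""
    cur = con.execute(
        """SELECT source, SUM(value) FROM emissions
           WHERE pollutant = ? AND year = ?
           GROUP BY source ORDER BY source""",
        (pollutant, year),
    )
    return cur.fetchall()
```

Parameterized queries (`?` and `:name`) keep user-supplied query values out of the SQL text, which matters once the store sits behind a web interface.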
12. Available Internet-Accessible Emissions Data
Data Source | Time Coverage | Pollutants | Reporting Level
NEI (US) | 1985-1999 (criteria), 1996-1999 (HAPs) | NOx, SO2, CO, PM, VOC, HAPs | Boiler
eGrid (US) | 1996-2000 | NOx, SO2, CO2, mercury | Boiler, generator
Clean Air Markets (US) | 1980, 1985, 1988-1999 | NOx, SO2, CO2 | Generator
NPRI (Canada) | 1994-2001 | HAPs (criteria starting in 2002) | Facility
These are publicly available, online-accessible emissions data. Other data resources exist, but at this time only in hard copy, and therefore cannot be used to demonstrate distributed database concepts.
- NEI, NPRI, and eGrid data were downloaded and stored in relational databases on the CAPITA server.
- BRAVO Mexican emissions data were obtained in electronic format and imported to the CAPITA server.
- Clean Air Markets was identified as the source most suitable for demonstrating distributed access.
13. Emissions Data Characteristics
- Web browser query; web map server; recent database structure upgrade
- Web browser query; remotely accessible using SecuRemote; not yet publicly accessible
- Web browser query; remotely accessible using SecuRemote; Oracle database
- Downloadable Excel spreadsheets; plans for a dynamic web system were shelved
14. Database Field Mapping
Emissions inventories are based on different underlying data models, and each inventory uses its own set of field names. However, many of these field names (or their contents) are similar to fields in another country's inventory. Some of the key relationships among the inventories have been captured by developing a mapping among fields. These mappings provide a set of connections that can subsequently be applied to automated query and integration of data from multiple inventories.
Example: the sulfur dioxide field appears as SO2, SO2_Ann, SO2, Sulfur Dioxide, and SO2Yr across the inventories.
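One way to hold such a mapping is a per-inventory dictionary from local field name to a shared name. The SO2 variants come from the slide, but which inventory uses which name is not stated, so the inventory keys below are hypothetical, as are the function names.

```python
# Hypothetical assignment of the SO2 field-name variants to inventories.
FIELD_MAP = {
    "inventory_a": {"SO2_Ann": "SO2"},
    "inventory_b": {"Sulfur Dioxide": "SO2"},
    "inventory_c": {"SO2Yr": "SO2"},
}


def to_common_fields(inventory, row):
    """Rename an inventory's local field names to shared names; unmapped fields pass through."""
    mapping = FIELD_MAP.get(inventory, {})
    return {mapping.get(field, field): value for field, value in row.items()}


def query_all(shared_field, rows_by_inventory):
    """Collect one shared field from every inventory as (inventory, value) pairs."""
    results = []
    for inventory, rows in rows_by_inventory.items():
        for row in rows:
            common = to_common_fields(inventory, row)
            if shared_field in common:
                results.append((inventory, common[shared_field]))
    return results
```

Inventories that already use `SO2` need no entry, because unmapped names pass through unchanged; only the differing names have to be recorded.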
15. Leveraging Multiple Projects
- The challenges in distributed information systems should be addressed by collaborative efforts across governments, agencies, researchers, and disciplines.
- Common underlying goals among projects provide opportunities to design systems that naturally interoperate.
- This avoids one-time, stand-alone solutions that cannot be reused.
16. DataFed.net
The Aerosol Data and Services Federation (http://www.datafed.net) is a network of providers and users for sharing atmospheric data and processing services. DataFed includes a community of participants who share and use data and processing services, a mediator software component to homogenize data access, and peer-to-peer computing for composing web applications.
R. Husar, CAPITA
17. Catalog of Air Quality Related Data
The emissions data are registered in the DataFed.net catalog (select data domain: aerosols, emissions, fire). Metadata for each dataset are registered in the catalog, allowing users to browse available datasets and determine which datasets to use for their particular application. The catalog entries include data access instructions.
http://capita.wustl.edu/dvoy_2.0.0/dvoy_services/datafed.aspx?
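A toy metadata catalog shows how registered entries plus access instructions let a user, or a view, find datasets. The entry fields, class names, and the `hypothetical://` access strings are assumptions; DataFed's actual catalog schema is not described here. The year ranges in the test follow the coverage table given earlier for NEI, eGrid, and NPRI.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata for one registered dataset (fields are illustrative assumptions)."""
    name: str
    domain: str          # e.g. "aerosols", "emissions", "fire"
    years: range         # temporal extent
    access: str          # data access instructions, e.g. a URL or connection string
    parameters: list = field(default_factory=list)


class Catalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry: CatalogEntry):
        self._entries[entry.name] = entry

    def find(self, domain=None, year=None, parameter=None):
        """Return sorted names of datasets matching every criterion given."""
        hits = []
        for e in self._entries.values():
            if domain and e.domain != domain:
                continue
            if year is not None and year not in e.years:
                continue
            if parameter and parameter not in e.parameters:
                continue
            hits.append(e.name)
        return sorted(hits)

    def access_instructions(self, name):
        """Look up how to reach a dataset once the user has chosen it."""
        return self._entries[name].access
```

Keeping access instructions in the catalog, rather than in each application, is what lets a view be assembled dynamically from whatever datasets the catalog returns.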
18. Preview Data
19. Modular Applications for Working with Online Data
Emissions data are multidimensional (plant, year, pollutant, fuel type, boiler capacity, etc.), so they require multiple views in a variety of end-use applications. Data views, including maps, time series, and tables, can be created in the DataFed.net framework. Each view is independently linked to its data sources and described by its geographic and temporal extents. Using the data access instructions registered in the catalog, a view can be dynamically assembled. The modularly designed views can be embedded and controlled in web pages using JavaScript, ASP, or other web application programming languages.
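The map, time-series, and table views can be thought of as different slices of the same multidimensional records. The sketch below works under that assumption, with records reduced to (plant, year, pollutant, fuel, tons) tuples; the function names are hypothetical, and DataFed's own view components are not shown in the slides.

```python
from collections import defaultdict


def map_view(records, pollutant, year):
    """Map slice: one total per plant for a fixed pollutant and year."""
    totals = defaultdict(float)
    for plant, yr, pol, _fuel, tons in records:
        if pol == pollutant and yr == year:
            totals[plant] += tons
    return dict(sorted(totals.items()))


def time_series_view(records, plant, pollutant):
    """Time-series slice: one total per year for a fixed plant and pollutant."""
    series = defaultdict(float)
    for p, yr, pol, _fuel, tons in records:
        if p == plant and pol == pollutant:
            series[yr] += tons
    return sorted(series.items())


def table_view(records, year):
    """Table slice: (plant, pollutant, tons) rows for one year, summed over fuel type."""
    cells = defaultdict(float)
    for plant, yr, pol, _fuel, tons in records:
        if yr == year:
            cells[(plant, pol)] += tons
    return [(plant, pol, tons) for (plant, pol), tons in sorted(cells.items())]
```

Each view fixes some dimensions and sums over the ones it does not display (here, fuel type), which is why the same underlying data can feed all three.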
20. Web Services
Substantial progress has been achieved in data interoperability. One of the next advances required is interoperable data analysis and processing tools. Web services are applications that are used over the Web. Because they are self-contained and use XML-based standards (SOAP, WSDL, UDDI) to describe themselves and communicate with other web resources, they can be reused in a variety of independent applications. Many of the analysis and processing tools used by the air quality management community could benefit from web service technology: not only can the data be shared, but the heterogeneous, distributed tools that operate on that data can be shared as well.
A longer-term vision for web services is to be able to "orchestrate" or "chain" multiple services from multiple providers so that new third-party applications can be constructed.
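To make "chaining" concrete, the sketch below composes services that each take and return a dictionary of settings and data. Plain in-process Python functions stand in for remote SOAP services, and the service names, parameters, and data are hypothetical; only the composition pattern is the point.

```python
from typing import Callable

Service = Callable[[dict], dict]

# Placeholder data standing in for a remote data source.
_DEMO_DATA = {("demo", "SO2"): [("P1", 9.0), ("P2", 4.0)]}


def access_service(params: dict) -> dict:
    """Stand-in for a data access service: attach points for the requested dataset."""
    points = _DEMO_DATA.get((params["dataset"], params["parameter"]), [])
    return {**params, "points": points}


def filter_service(params: dict) -> dict:
    """Keep only points at or above params['min_value'] (default 0)."""
    floor = params.get("min_value", 0)
    return {**params, "points": [p for p in params["points"] if p[1] >= floor]}


def render_service(params: dict) -> dict:
    """Stand-in renderer: list the labels that would be drawn."""
    labels = [f"{name}:{value:g}" for name, value in params["points"]]
    return {**params, "rendered": labels}


def chain(*services: Service) -> Service:
    """Compose services left to right; each receives the previous one's output."""
    def run(params: dict) -> dict:
        for service in services:
            params = service(params)
        return params
    return run
```

For example, `chain(access_service, filter_service, render_service)` builds a new application from three independent pieces, and dropping `filter_service` from the call yields a different application without changing any service.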
21. Example Web Services Application
Emissions data from multiple databases are displayed on maps, time series, and tables, with tools for browsing and querying the data. In this example, the user can change the pollutant, date, and map zoom, turn data layers on and off, and adjust the map and time scales.
22. North American Emissions Demonstration Data Flow
[Figure: chain of web services; each box shows the service name and its settings.
- N.Am. Borders: MapImageAccess (DataSet: N.Am. Borders) -> MapImageRender (Color: Maroon, Size: 2)
- NEI: MapPointAccess (DataSet: NEI, Year: 1999, Parameter: SO2) -> MapPointRender (Color: Blue, Symbol: Bar, Width: 8)
- eGrid: MapPointAccess (DataSet: eGrid, Year: 1999, Parameter: SO2) -> MapPointRender (Color: Red, Symbol: Bar, Width: 8)
- NPRI: MapPointAccess (DataSet: NPRI, Year: 1999, Parameter: SO2) -> MapPointRender (Color: Yellow, Symbol: Bar, Width: 8)
- MapImageOverlay (Layer Order: N.Am., NEI, eGrid, NPRI) combines the rendered layers.
The Data Catalog and Wrappers are also shown.]
The settings of each web service can be changed by the user, creating a dynamic application.
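The data flow above can be written down as configuration. This sketch assumes each access/render pair is one settings object; the setting names and values are taken from the diagram, while the classes, `set_year`, and treating the layer order as a plain list are assumptions for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PointLayer:
    """Settings for one MapPointAccess -> MapPointRender pair."""
    dataset: str
    year: int
    parameter: str
    color: str
    symbol: str = "Bar"
    width: int = 8


@dataclass(frozen=True)
class ImageLayer:
    """Settings for the MapImageAccess -> MapImageRender pair (background borders)."""
    dataset: str
    color: str
    size: int


def overlay_order(layers):
    """Stand-in for MapImageOverlay: dataset names in the order they are combined."""
    return [layer.dataset for layer in layers]


def set_year(layers, year):
    """Simulate a user changing the Year setting; the frozen originals stay untouched."""
    return [replace(layer, year=year) if isinstance(layer, PointLayer) else layer
            for layer in layers]


# Settings as shown in the data-flow diagram.
LAYERS = [
    ImageLayer("N.Am. Borders", color="Maroon", size=2),
    PointLayer("NEI", 1999, "SO2", color="Blue"),
    PointLayer("eGrid", 1999, "SO2", color="Red"),
    PointLayer("NPRI", 1999, "SO2", color="Yellow"),
]
```

Treating settings as immutable values means a user change produces a new configuration to re-run, rather than mutating services that other pages might share.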
23. Embedded Images and Controllers in Web Page
[Figure: map image with a parameter controller, a date controller, and a query controller.]
The controllers and map image view can be linked and assembled in a web page. Changing the settings of a controller changes the URL of the map image and updates the web page. The web page can be constructed using standard web application programming languages, such as JavaScript and ASP.
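In a page like this, a controller change amounts to rebuilding the image URL from the current settings. The page itself would do this in JavaScript or ASP; the Python sketch below only illustrates the idea, and the endpoint and query parameter names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint; the real service address is not given in the slides.
MAP_SERVICE = "http://example.org/mapimage"


def map_image_url(parameter, date, bbox):
    """Build the map image URL from the current controller settings."""
    west, south, east, north = bbox
    query = urlencode({
        "parameter": parameter,
        "date": date,
        "bbox": f"{west},{south},{east},{north}",
    })
    return f"{MAP_SERVICE}?{query}"


def settings_from_url(url):
    """Recover the controller settings encoded in a map image URL."""
    query = parse_qs(urlparse(url).query)
    return {key: values[0] for key, values in query.items()}
```

Because the complete view state lives in the URL, a given map can be bookmarked or embedded elsewhere simply by copying its image address.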
24. Project Results
- The technology has been demonstrated to be at the point where we can begin to apply some distributed database concepts to real situations.
- Emissions data present unique challenges due to their complex relational dimensionality.
- Collaborative efforts in the near future could generate a distributed North American emissions inventory.
- The initial versions of the inventory would help clarify the issues related to handling complex queries.
- Building and using a distributed emissions tool will assist in creating consensus data naming conventions.
25. Current Project Challenges
- Dynamic access to data
  - Technical snafus
  - Security issues
  - Slow performance
- Complexity of emissions data
- Quickly evolving technology
26. What's Missing
- Metadata
  - More complete metadata would help in relating heterogeneous databases
- More complete access to distributed datasets
  - A process for creating trusted provider-user agreements would help address issues of security and data misuse
- More comprehensive content
  - Networked data and tools that spark additional interest in the technology's potential
- Actual implementations
  - FASTNet will demonstrate the use of many of these technologies in the real-time monitoring of major aerosol events this summer
27. Next Steps
- Advance the prototype tool to make it more representative of what an emissions inventory would ultimately use
- EPA's RFA on Distributed Air Quality Data in Support of NEISGEI
- Link to other web services (Geospatial One-Stop, TerraServer) and other relevant data sources, including Web Map Servers
- Establish collaborative partnerships with other researchers and agencies developing related networks and tools (ReVa, Earth Science Federation (NASA), NSF Cyberinfrastructure efforts)
- Build tools that add value to the distributed data and provide incentives for data providers to join networks
- Clarify the handling of distributed data for optimal system performance
28. Mediators
- The flow of data from provider to user passes through brokers, or mediators.
- The data continue to be maintained by the original providers.
- Contracts are used to retain a constant link to the original data.
- Data are still not centrally stored but are cached in a format that allows efficient queries and analyses.
- Mediators provide an interface between the user and the data that enhances effective information exchange between the two sides.
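A minimal sketch of a caching mediator, assuming the provider exposes some query function and that a fixed maximum cache age is an acceptable stand-in for the "contracts" that keep the link to the original data. The class and parameter names are hypothetical.

```python
import time


class Mediator:
    """Sits between users and a provider: forwards queries and caches results for reuse."""

    def __init__(self, fetch, max_age_seconds=3600.0, clock=time.monotonic):
        self._fetch = fetch              # callable(query) -> result; data stays with the provider
        self._max_age = max_age_seconds
        self._clock = clock              # injectable so expiry can be tested
        self._cache = {}                 # query -> (timestamp, result)

    def query(self, query):
        now = self._clock()
        hit = self._cache.get(query)
        if hit is not None and now - hit[0] < self._max_age:
            return hit[1]                # fresh cached copy, no trip to the provider
        result = self._fetch(query)      # go back to the original provider
        self._cache[query] = (now, result)
        return result
```

The provider remains the authoritative copy; the mediator only holds results long enough to make repeated queries and analyses cheap.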
29. Adding Value to Data through Services
[Figure: data flow from data sources (server side) through mediator services, such as analytical web services and multidimensional data cubes, to data users (client side).]
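A multidimensional data cube service typically supports roll-up, which sums over the dimensions a user does not need. This small sketch assumes cells keyed by (country, plant, year, pollutant); the dimension list and function name are hypothetical, and it is not the DataFed implementation.

```python
from collections import defaultdict

# Assumed cube dimensions, in key order.
DIMENSIONS = ("country", "plant", "year", "pollutant")


def roll_up(cells, keep):
    """Aggregate a cube to the dimensions named in `keep`, summing over the rest."""
    idx = [DIMENSIONS.index(d) for d in keep]
    out = defaultdict(float)
    for coords, value in cells.items():
        out[tuple(coords[i] for i in idx)] += value
    return dict(out)
```

For example, `roll_up(cells, ("country", "year"))` turns plant-level cells into national annual totals, a typical value-added product that a mediator can serve without the provider changing its data.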
30. Demo Links
http://capita.wustl.edu/NAMEN/DemoFeb27.htm