Compilation and Design of a Functioning Distributed Database of North American Electric Generating Emissions

1
Compilation and Design of a Functioning
Distributed Database of North American Electric
Generating Emissions
Demonstration of a Distributed Emissions
Inventory using Web Technologies
  • Stefan Falke
  • Center for Air Pollution Impact and Trend
    Analysis
  • Washington University in St. Louis
  • Gregory Stella
  • Alpine Geophysics, LLC
  • Terry Keating
  • US EPA Office of Air and Radiation

2
Background
Air pollutant emission inventories for the US,
Canada, and Mexico are compiled, stored, and
disseminated using different methods.
The development of a single comprehensive and
accurate emissions inventory is essential for the
coordinated reporting, policy development,
transport analyses, and socio-economic studies
that create an environment for collaboration
among international researchers, policy-makers,
and the interested public.
In support of this longer-term goal, the
Commission for Environmental Cooperation (CEC) and
the US EPA have initiated a project to develop a
prototype web tool for enabling uniform access to
distributed emissions data from North American
electricity generating power plants.
3
Distributed Data and Management Networks
Advances in information science and technology
are driving the trend toward distributed networks
and virtual communities for science and
management.
  • Cyberinfrastructure: NSF's initiative to apply
    new IT to building new ways of conducting
    collaborative research
    http://www.communitytechnology.org/nsf_ci_report/
  • Earth Observation Summit: international effort
    to build comprehensive, coordinated, and
    sustained Earth observation systems
    http://www.earthobservationsummit.gov
  • Ecoinformatics: EPA's vision for national and
    international cooperation in data and technology
    development
    http://oaspub.epa.gov/sor/user_conference.startup
  • Integrated Ocean Observing System: international
    network of ocean-related monitoring, assessment,
    and communication http://www.ocean.us/
  • Linked Environments for Atmospheric Discovery:
    network of high-performance computers and
    software to gain new insights into weather
    http://lead.ou.edu/
  • Virtual Observatory: network for astronomical
    data sharing and distributed analysis
    http://www.us-vo.org/
4
Emissions Community Collaborative Activities
  • NIF data standards: standard format and
    submission; NEI XML schema
  • Environmental Information Exchange Network:
    network linking EPA, States, and other partners
    through the Internet and standardized data
    formats
  • Facility Registry System: standard facility
    codes and locations
  • Data sharing efforts: States, Tribes, Local
    agencies, RPOs; North America

5
NEISGEI
Networked Environmental Information System for
Global Emissions Inventories
  • is both a conceptual framework and
    implementation effort for the development of a
    fully integrated, distributed air emissions
    inventory and the foundation for an all-media
    environmental information network
  • Tie together data at all spatial and temporal
    scales using emerging distributed database
    technologies
  • Provide shared, online tools for processing and
    analysis
  • Provide for the seamless merging, manipulation
    and analysis of Internet accessible air
    quality-relevant data through the development of
    emerging Internet-oriented technologies
  • Make use of existing resources: link and partner
    with other efforts
  • Build a broad-based air quality user community:
    scientists, regulators, policy analysts, and the
    public
  • Create the network and toolkit piece-wise through
    multiple, connected projects

Ongoing Efforts: NSF-EPA Digital Government
Funded Projects
  • The California Air Resources Network
  • Fire and Air Quality Data and Tools Network
Future Effort: EPA OAR RFA on Distributed Air
Quality Data in Support of NEISGEI
6
CAREN: The California Air Resources Network
Eduard Hovy, Jose-Luis Ambite, Andrew Philpot
USC Information Sciences Institute
  • Environmental data sharing among international,
    national, state and local governments, the public
    and academic and other non-governmental research
    organizations is a difficult challenge.
  • Barrier: Technological incompatibilities
  • Barrier: Data format incompatibilities
  • Barrier: Financial (staff time) limitations

  • The Solution Strategy (First Step): automate the
    integration of heterogeneous databases

Use semi-automated information integration
methods to generate translation protocols between
related information sources, e.g., AQMD and CARB.
7
Fire and Air Quality Data Network
Diagram labels: Data Sources (BLM Fire History, FS
Coarse Spatial Data, NEI Fire Emissions, HMS Fire
Detection, Wildland Fire Assessment System, VIEWS);
access forms (ftp, text tables, RDBMS); Data
Wrappers; CAPITA; Data Catalog; Spatial
Interpolation Service
  • Data wrappers are used to translate the format
    of data sets into a uniform format.
  • The data are either stored on the CAPITA
    database server or dynamically accessed from its
    original source
  • The datasets are registered with metadata in the
    data catalog
  • GIS-type interfaces provide users with ways to
    view, analyze, and export the data
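
The wrapper and catalog ideas above can be sketched in a few lines of Python. This is an illustrative sketch only, not the CAPITA implementation: the `Record` schema, the two wrapper functions, the `register` helper, and the sample rows are all hypothetical, chosen to show two differently shaped sources being translated into one uniform format and registered with metadata.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Uniform record format that every wrapper emits (hypothetical schema)."""
    source: str
    lat: float
    lon: float
    date: str
    value: float


def wrap_text_table(source: str, text: str) -> list[Record]:
    """Wrap a comma-separated text table with columns lat, lon, date, value."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Record(source, float(r["lat"]), float(r["lon"]), r["date"], float(r["value"]))
        for r in reader
    ]


def wrap_db_rows(source: str, rows: list[tuple]) -> list[Record]:
    """Wrap database-style tuples laid out as (date, lon, lat, value)."""
    return [
        Record(source, float(lat), float(lon), date, float(value))
        for date, lon, lat, value in rows
    ]


# Data catalog: dataset name -> metadata plus a loader that applies the wrapper.
catalog: dict[str, dict] = {}


def register(name, description, wrapper, raw):
    """Register a dataset with its metadata and the wrapper that reads it."""
    catalog[name] = {"description": description, "load": lambda: wrapper(name, raw)}


# Made-up sample data, for illustration only.
register("fire_text", "Fire detections from a text table",
         wrap_text_table, "lat,lon,date,value\n38.6,-90.2,2003-06-01,1.5\n")
register("fire_db", "Fire history rows from a database",
         wrap_db_rows, [("2003-06-02", -120.5, 37.1, 2.0)])

# Every dataset now arrives in the same shape, whatever its original format.
all_records = [rec for entry in catalog.values() for rec in entry["load"]()]
```

Because the loader is stored as a callable, the same catalog entry could point either at locally stored data or at a function that fetches from the original source on demand.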

8
Envisioned Emissions Community Resource of Data
and Tools
Diagram labels: Users and Projects; Data; Data
Catalogs; Geospatial One-Stop; Wrappers; XML;
Emissions Inventory Catalog; Report Generation;
RDBMS; Data Analysis; Emissions Inventories; Web
Tools/Services; Activity Data; Comparison of
Emissions Methods; Spatial Allocation; GIS;
Emissions Factors; Model Development; Estimation
Methods; Transport Models; Surrogates
9
Current Project Objectives
  • Recommend and demonstrate to the CEC approaches
    for the comparability of techniques and
    methodologies for data gathering and analysis,
    data management, and electronic data
    communications for promoting access to publicly
    available electric utility emissions
  • Identify, collect, and review existing sources
    of electric generating utility (EGU) emissions
    and activity databases, and provide a summary of
    the state-of-science
  • Build a prototype web browser tool to query,
    retrieve, and explore emissions data from
    heterogeneous databases
  • Demonstrate the utility of new information
    technologies in creating an integrated network of
    distributed data and tools
  • Dynamically link multiple existing datasets
    without requiring substantial modification on
    the provider's end and provide interfaces that
    make the links transparent to the end user.
  • Add value to the linked data through the
    application of data analysis and processing tools

The project's focus is on criteria pollutants and
toxics because of their availability and
accessibility.
10
Design Objectives
  • Distributed
  • Non-intrusive to data provider
  • Transparent to end user
  • Flexible
  • Extendable
  • Light on user requirements

11
Process of Building Demonstration
  • Identify and access relevant data (build
    wrappers)
  • Build relational database to temporarily store
    the data that are not accessible in a distributed
    manner
  • Acquire authorization and access to those data
    that are dynamically accessible through internet
    interfaces
  • Create field name mappings among datasets
  • Identify available web technologies for building
    a distributed emissions tool
  • Develop new components necessary for the
    prototype
  • Build web tool prototype for demonstrating the
    feasibility of exploring emissions data
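
As a rough illustration of the temporary relational store described in the steps above, the sketch below uses Python's built-in `sqlite3` module. The table layout, facility names, and emission values are invented for the example and do not come from any of the inventories; the project itself stored data on the CAPITA server, whose schema is not described here.

```python
import sqlite3

# In-memory database standing in for the temporary relational store.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE emissions (
        inventory TEXT, facility TEXT, year INTEGER,
        pollutant TEXT, tons REAL
    )
""")

# Made-up rows for illustration; not real inventory values.
rows = [
    ("NEI", "Plant A", 1999, "SO2", 1200.0),
    ("eGrid", "Plant A", 1999, "SO2", 1150.0),
    ("NPRI", "Plant B", 1999, "SO2", 800.0),
    ("NEI", "Plant A", 1999, "NOx", 300.0),
]
conn.executemany("INSERT INTO emissions VALUES (?, ?, ?, ?, ?)", rows)


def query(pollutant, year):
    """Return (inventory, facility, tons) for one pollutant and year."""
    cur = conn.execute(
        "SELECT inventory, facility, tons FROM emissions "
        "WHERE pollutant = ? AND year = ? ORDER BY inventory, facility",
        (pollutant, year),
    )
    return cur.fetchall()


so2_1999 = query("SO2", 1999)
```

Holding the downloaded inventories in one table with an `inventory` column lets a single parameterized query span all of them, which is the behavior the distributed tool is meant to offer once sources are accessed remotely.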

12
Available Internet Accessible Emissions Data
Data Source | Time Coverage | Pollutants | Reporting Level
NEI (US) | 1985-1999 (criteria), 1996-1999 (HAPs) | NOx, SO2, CO, PM, VOC, HAPs | Boiler
eGrid (US) | 1996-2000 | NOx, SO2, CO2, Mercury | Boiler, Generator
Clean Air Markets (US) | 1980, 1985, 1988-1999 | NOx, SO2, CO2 | Generator
NPRI (Canada) | 1994-2001 | HAPs (Criteria starting in 2002) | Facility
These are publicly available, on-line accessible
emissions data. Other data resources are
available, but at this time only in hard copy
form and therefore not usable in demonstrating
distributed database concepts.
  • NEI, NPRI, and eGrid data were downloaded and
    stored in relational databases on the CAPITA
    server
  • BRAVO Mexican emissions data were obtained in
    electronic format and imported to the CAPITA
    server
  • Clean Air Markets was identified as the source
    most suitable for demonstrating distributed access

13
Emissions Data Characteristics
  • Web browser query; web map server; recent
    database structure upgrade
  • Web browser query; remotely accessible using
    SecuRemote; not yet publicly accessible
  • Web browser query; remotely accessible using
    SecuRemote; Oracle database
  • Downloadable Excel spreadsheets; plans for a
    dynamic web system were shelved
14
Database Fields Mapping
Emissions inventories are based on different
underlying data models. Each inventory uses a
uniquely defined set of field names. However,
many of these field names are similar to (or
their content is similar to) fields in another
country's inventory. Some of the key
relationships among the inventories have been
captured by developing a mapping among fields.
These mappings provide a set of connections
that can subsequently be applied to automated
query and integration of data from multiple
inventories.
Example field names for sulfur dioxide across
inventories: SO2, SO2_Ann, SO2, Sulfur Dioxide,
SO2Yr
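
A field mapping of this kind can be represented as a simple lookup table. The sketch below is a minimal illustration under stated assumptions: the field names are the ones listed on the slide, but the slide does not say which inventory uses which name, so the assignment in `FIELD_MAP` is assumed for the example, as are the canonical name `so2_annual` and the sample values.

```python
# Canonical field name -> native field name in each inventory.
# The native names come from the slide; which inventory uses which
# name is an assumption made for this illustration.
FIELD_MAP = {
    "so2_annual": {
        "NEI": "SO2",
        "eGrid": "SO2_Ann",
        "CleanAirMarkets": "SO2",
        "NPRI": "Sulfur Dioxide",
        "BRAVO": "SO2Yr",
    },
}


def to_canonical(inventory, record):
    """Rename one inventory's native fields to the canonical names."""
    out = {}
    for canonical, per_inventory in FIELD_MAP.items():
        native = per_inventory.get(inventory)
        if native is not None and native in record:
            out[canonical] = record[native]
    return out


def integrate(records_by_inventory):
    """Merge records from several inventories into one uniform list."""
    merged = []
    for inventory, records in records_by_inventory.items():
        for record in records:
            row = to_canonical(inventory, record)
            row["inventory"] = inventory
            merged.append(row)
    return merged


# Made-up values for illustration only.
sample = {
    "eGrid": [{"SO2_Ann": 1150.0}],
    "NPRI": [{"Sulfur Dioxide": 800.0}],
}
merged = integrate(sample)
```

Keeping the mapping as data rather than code means a new inventory can be added by extending the table, without touching the query logic.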
15
Leveraging Multiple Projects
The challenges in distributed information systems
should be addressed by collaborative efforts
across governments, agencies, researchers, and
disciplines.
Common underlying goals among projects provide
opportunities to naturally design systems to
interoperate.
This avoids one-time, stand-alone solutions that
cannot be reused.
16
DataFed.Net
The Aerosol Data and Services Federation
(http://www.datafed.net) is a network of
providers and users for sharing atmospheric data
and processing services. DataFed includes:
  • a community of participants who share and use
    data and processing services
  • a mediator software component to homogenize
    data access
  • peer-to-peer computing for composing web
    applications
R. Husar, CAPITA
17
Catalog of Air Quality Related Data
The emissions data are registered in the
DataFed.net catalog
Select data domain: aerosols, emissions, fire
Metadata for each dataset are registered in a
catalog, allowing users to browse available
datasets and determine which datasets to use for
their particular application. The catalog
entries include data access instructions.
http://capita.wustl.edu/dvoy_2.0.0/dvoy_services/datafed.aspx?
18
Preview Data
19
Modular Applications for Working with Online Data
Emissions data are multi-dimensional (plant, year,
pollutant, fuel type, boiler capacity, etc.). This
multidimensionality requires multiple views of
the data in a variety of end-use applications.
Data views can be created in the DataFed.net
framework, including maps, time series, and
tables. Each view is independently linked to its
data sources and described (e.g., by its geographic
and temporal extents). Using the data access
instructions registered in the catalog, a view
can be dynamically assembled. The modularly
designed views can be embedded and controlled in
web pages using JavaScript, ASP, or other web
application programming languages.
20
Web Services
Substantial progress has been achieved in data
interoperability. One of the next advances required
is interoperable data analysis/processing
tools. Web services are applications that are
used over the Web. Because they are
self-contained and use XML-based standards (SOAP,
WSDL, UDDI) for describing themselves and
communicating with other web resources, they can
be reused in a variety of independent
applications. Many of the analysis and
processing tools used by the air quality
management community could benefit from web
service technology. Not only can their data be
shared but their heterogeneous, distributed tools
that operate on that data can be shared as well.
A longer term vision for web services is to be
able to orchestrate or chain multiple
services from multiple providers so that new
third party applications can be constructed.
21
Example Web Services Application
Emissions data from multiple databases are
displayed on maps, time series, and tables. Tools
are included for browsing and querying the data.
In this example, the user can change the
pollutant, date, map zoom, data on/off, and
map/time scales.
22
North American Emissions Demonstration Data Flow
Each map layer is produced by a chain of web
services (service name, followed by its settings):
  • NPRI layer: MapPointAccess (DataSet: NPRI,
    Year: 1999, Parameter: SO2) -> MapPointRender
    (Color: Yellow, Symbol: Bar, Width: 8)
  • eGrid layer: MapPointAccess (DataSet: eGrid,
    Year: 1999, Parameter: SO2) -> MapPointRender
    (Color: Red, Symbol: Bar, Width: 8)
  • NEI layer: MapPointAccess (DataSet: NEI,
    Year: 1999, Parameter: SO2) -> MapPointRender
    (Color: Blue, Symbol: Bar, Width: 8)
  • Borders layer: MapImageAccess (DataSet: N.Am.
    Borders) -> MapImageRender (Color: Maroon,
    Size: 2)
  • MapImageOverlay (Layer Order: N.Am., NEI, eGrid,
    NPRI)
Supporting components: Data Catalog, Wrappers
The settings of each web service can be changed
by the user, creating a dynamic application.
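
The chaining pattern in this data flow can be sketched with plain Python functions. The service names below are taken from the slide, but their signatures, return values, and the sample points are hypothetical stand-ins; the real services are remote web services whose interfaces are not given here. The borders layer is omitted to keep the sketch short.

```python
from dataclasses import dataclass

# Made-up sample points: dataset -> list of (lon, lat, SO2 value).
SAMPLE_DATA = {
    "NPRI": [(-79.4, 43.7, 800.0)],
    "eGrid": [(-90.2, 38.6, 1150.0)],
    "NEI": [(-90.2, 38.6, 1200.0)],
}


@dataclass
class Layer:
    name: str
    color: str
    symbols: list


def map_point_access(dataset, year, parameter):
    """Stand-in for MapPointAccess: return the points for one query."""
    # year and parameter are accepted but ignored by this toy data store.
    return SAMPLE_DATA[dataset]


def map_point_render(name, points, color, symbol="Bar", width=8):
    """Stand-in for MapPointRender: turn points into drawable symbols."""
    symbols = [
        {"lon": lon, "lat": lat, "symbol": symbol, "width": width, "height": value}
        for lon, lat, value in points
    ]
    return Layer(name, color, symbols)


def map_image_overlay(layers, order):
    """Stand-in for MapImageOverlay: stack layers in the requested order."""
    by_name = {layer.name: layer for layer in layers}
    return [by_name[name] for name in order if name in by_name]


# Per-layer settings as shown on the slide (dataset and color).
settings = [("NPRI", "Yellow"), ("eGrid", "Red"), ("NEI", "Blue")]
layers = [
    map_point_render(ds, map_point_access(ds, 1999, "SO2"), color)
    for ds, color in settings
]
stack = map_image_overlay(layers, order=["NEI", "eGrid", "NPRI"])
```

Each function consumes only the output of the previous one, so changing a setting such as the color or the layer order affects a single step and leaves the rest of the chain untouched.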
23
Embedded Images and Controllers in Web Page
Parameter Controller
Date Controller
Query Controller
The controllers and map image view can be linked
and assembled in a web page. Changing the
settings of a controller changes the URL of the
map image and updates the web page. The web
page can be constructed using standard web
application programming languages, such as
JavaScript and ASP.
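
The controller-to-URL link can be illustrated as follows. The slide names JavaScript and ASP for the actual page; this sketch uses Python only to show the logic. The base URL and the query parameter names are hypothetical and do not reflect the real DataFed service interface.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real map image service URL is not given here.
BASE_URL = "http://example.org/map_image"


def map_image_url(settings):
    """Build the map image URL from the current controller settings."""
    return BASE_URL + "?" + urlencode(sorted(settings.items()))


def on_controller_change(settings, name, value):
    """Apply one controller change and return the new settings and URL."""
    updated = dict(settings, **{name: value})
    return updated, map_image_url(updated)


settings = {"dataset": "NEI", "parameter": "SO2", "year": 1999}
# The user picks a different pollutant in the parameter controller.
settings, url = on_controller_change(settings, "parameter", "NOx")
```

In a web page, the returned URL would be assigned to the map image's source attribute, which makes the browser request the redrawn map.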
24
Project Results
  • Technology has been demonstrated to be at the
    point where we can begin to apply some of the
    distributed database concepts to real
    situations
  • Emissions data present unique challenges due to
    their complex relational dimensionality
  • Collaborative efforts in the near future could
    generate a distributed North American emissions
    inventory
  • The initial versions of the inventory would help
    clarify the issues related to handling complex
    queries
  • Building and using a distributed emissions tool
    will assist in creating consensus data naming
    conventions

25
Current Project Challenges
  • Dynamic Access to Data: technical snafus,
    security issues, slow performance
  • Complexity of Emissions Data
  • Quickly evolving technology

26
What's Missing
  • Metadata
  • More complete metadata would help in relating
    heterogeneous databases
  • More complete access to distributed datasets
  • A process for creating trusted provider-user
    agreements would help address issues of security
    and data misuse
  • More comprehensive content
  • Networked data and tools that spark additional
    interest in the technology's potential
  • Actual Implementations
  • FASTNet will demonstrate the use of many of these
    technologies in the real time monitoring of major
    aerosol events this summer

27
Next Steps
  • Advance the prototype tool to make it more
    representative of what an emissions inventory
    would ultimately use
  • EPA's RFA on Distributed Air Quality Data in
    Support of NEISGEI
  • Link to other web services (Geospatial One-Stop,
    TerraServer) and other relevant data sources,
    including Web Map Servers
  • Establish collaborative partnerships with other
    researchers and agencies developing related
    networks and tools (ReVa, Earth Science
    Federation (NASA), NSF Cyberinfrastructure
    efforts)
  • Build tools that add value to the distributed
    data and provide incentives for data providers to
    join networks
  • Clarify the handling of distributed data for
    optimal system performance

28
Mediators
  • The flow of data from provider to user passes
    through brokers, or mediators.
  • The data continues to be maintained by original
    providers
  • Contracts are used to retain a constant link to
    the original data
  • Data is still not centrally stored but is
    cached in a format that allows efficient
    queries and analyses
  • Mediators provide an interface between the user
    and the data that enhances the effective
    information exchange between the two sides
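
A mediator that caches provider data while leaving the provider as the source of record might look like the sketch below. The `Mediator` class, its method names, and the `fake_provider` function are hypothetical and only illustrate the pattern described above; no cache expiry or refresh policy is shown.

```python
class Mediator:
    """Broker between users and providers that caches query results."""

    def __init__(self):
        self._providers = {}
        self._cache = {}
        self.provider_calls = 0  # counts how often a provider is contacted

    def register(self, name, fetch):
        """Register a provider by name with the function that queries it."""
        self._providers[name] = fetch

    def query(self, name, **params):
        """Answer from the cache if possible, otherwise ask the provider."""
        key = (name, tuple(sorted(params.items())))
        if key not in self._cache:
            self.provider_calls += 1
            self._cache[key] = self._providers[name](**params)
        return self._cache[key]


def fake_provider(year, pollutant):
    """Stand-in for a remote provider; returns made-up data."""
    return [{"facility": "Plant A", "year": year,
             "pollutant": pollutant, "tons": 1200.0}]


mediator = Mediator()
mediator.register("NEI", fake_provider)
first = mediator.query("NEI", year=1999, pollutant="SO2")
second = mediator.query("NEI", year=1999, pollutant="SO2")  # served from cache
```

The provider is contacted once per distinct query, so repeated requests are fast while the original data remain with the provider.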

29
Adding Value to data through Services
Mediator Services
analytical web services
multidimensional data cubes
Server
Client
Data Sources
Data Users
30
Demo Links
http://capita.wustl.edu/NAMEN/DemoFeb27.htm