XML Web Services: Toxics Release Inventory

1
XML Web Services: Toxics Release Inventory
  • Brand Niemann
  • XML Web Services Evangelist
  • Data Standards Branch
  • January 12, 2002

Disclaimer: Any reference to or depiction of the
commercial product of any vendor is for
illustrative purposes only and does not
constitute an endorsement by EPA or the trainer.
2
Overview
  • 1. Background
  • 2. National Database to FileMaker XML
  • 3. Web Pages and PDF to XML Documents
  • 4. Data Tables to XML Data Islands
  • 5. Some Future Steps
  • 6. Questions and Answers

3
1. Background
  • The Toxics Release Inventory (TRI), published by
    the U.S. EPA, is a valuable source of information
    regarding toxic chemicals that are being used,
    manufactured, treated, transported, or released
    into the environment.
  • Two statutes, Section 313 of the Emergency
    Planning and Community Right-To-Know Act (EPCRA)
    and section 6607 of the Pollution Prevention Act
    (PPA), mandate that a publicly accessible toxic
    chemical database be developed and maintained by
    US EPA. This database, known as the Toxics
    Release Inventory (TRI), contains information
    concerning waste management activities and the
    release of toxic chemicals by facilities that
    manufacture, process, or otherwise use said
    materials. Using this information, citizens,
    businesses, and governments can work together to
    protect the quality of their land, air and water.

4
2. National Database to FileMaker XML
  • 2.1 FileMaker 5.5
  • http://www.filemaker.com
  • 2.2 Steps
  • Download National.exe (16.7 MB) and extract.
  • http://epa.gov/tri/tri99/data/
  • Import each of 4 files into FileMaker 5.5 (164
    MB).
  • Make the 4 files sharable on the Web.
  • Use the FileMaker URL syntax for XML output.
  • 2.3 Interface Customization Possibilities.
  • http://www.filemaker.com/products/fmu_home.html
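
The "FileMaker URL syntax for XML output" step can be sketched as follows. This is only a minimal illustration, assuming the FileMaker 5.x Web Companion query form (-db, -format, -findall). The host and the database file name tri_type1.fp5 are hypothetical placeholders and do not come from the presentation.

```python
def filemaker_xml_url(host, database, fmt="-dso_xml", action="-findall"):
    """Build a query URL that asks a shared FileMaker file for XML output.

    Assumed syntax: http://<host>/FMPro?-db=<file>&-format=<format>&<action>
    """
    return f"http://{host}/FMPro?-db={database}&-format={fmt}&{action}"


# Hypothetical host and file name, used only for illustration.
url = filemaker_xml_url("localhost", "tri_type1.fp5")
print(url)
```

Pasting a URL of this shape into a browser is one way to check that a shared file answers with XML before building anything on top of it.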

5
2.1 FileMaker 5.5
  • FileMaker is a subsidiary of Apple Computer; the
    product offers powerful desktop database
    functionality that supports multiple platforms,
    including the Web.
  • The workgroup database of choice within
    organizations: more than 65% of the 1.2 million
    units shipped in 2000-2001 were volume license
    sales, second to Microsoft Access.
  • Third-party developer resources:
  • Macromedia Dreamweaver
  • Adobe GoLive
  • Allaire ColdFusion

6
2.1 FileMaker 5.5 Database-to-XML
7
2.2 1999 Toxics Release Inventory (TRI) Data Files
  • File Type 1: Facility, Chemical, Releases and
    Other Waste Management Summary Information. This
    file contains facility information (Part I on
    Form R and Form A) as well as most chemical
    information (Part II on Form R and Form A). Data
    elements are reported individually. The
    information is also disaggregated based on Waste
    Management code (i.e., "M" code), and aggregated
    up to On-site Releases, Off-site Releases, Other
    On-site Waste Management, and Transfers Off-site
    for Further Waste Management categories. (84,079
    records)
  • File Type 2: Detailed Waste Management and Source
    Reduction Activities. This file contains
    facility information (Part I on Form R and Form
    A) as well as the detailed information regarding
    source reduction and recycling activities (Part
    II, Section 8 on Form R) and on-site waste
    treatment methods (Part II, Section 7 on Form R).
    (84,079 records)
  • File Type 3A: Details of Transfers Off-site. This
    file contains facility information (Part I on
    Form R and Form A) as well as details of
    individual transfers off-site (Part II, Section
    6.2 on Form R). (100,033 records)
  • File Type 3B: Details of Transfers to Publicly
    Owned Treatment Works (POTW). This file contains
    facility information (Part I on Form R and Form
    A) as well as a list of POTWs (Part II, Section
    6.1.B on Form R). (84,079 records)

8
2.2 TRI National File Type 1 in FileMaker 5.5
9
2.2 TRI National File Type 1 in Web Browser
10
2.2 TRI National File Type 1 in Web Browser
11
2.2 TRI National File Type 1 in IE 6 (XML)
12
2.3 Interface Customization Possibilities
  • Replace default.htm with your own page.
  • Use your own stylesheet (XSL); this requires the
    Developer version.
  • Use HTML and Java to build a Web application or
    portal.
  • Local Emergency Planning Committee database
  • http://www.epa.gov/ceppo/lepclist.htm
  • List of Lists database
  • http://130.11.53.73/lol/
  • Population Estimation from Year 2000 Census
    Blocks
  • http://198.246.85.108:591/population

13
3. PDF and Web Pages to XML Documents
  • 3.1 Content Re-design and Re-publishing.
  • 3.2 Repurposing PDF to Excel.
  • 3.3 Repurposing PDF to XML.
  • 3.4 Repurposing PDF to Folio Views.
  • 3.5 NextPage Folio Views, LivePublish, and NXT 3.
  • 3.6 Comments.

14
3.1 Content Re-design and Re-publishing
  • Background
  • backgrd_factors.pdf
  • Database
  • National.exe
  • Press
  • 40 PDF files at
    http://epa.gov/tri/tri99/press/press.htm
  • Tri99press.xsl (34 tables)
  • Previous
  • Tri97.nfo and tri97.xls
  • Questions and Answers
  • Qa.pdf (file error)
  • Report
  • 1999pdr.pdf, completereport.pdf,
    sfs_introduction.pdf (Tri99.xsl - 23 tables).

15
3.1 Content Re-design and Re-publishing
16
3.2 Repurposing PDF to Excel
  • See Adobe Acrobat Help, pages 103-109 and 82-84.
  • See next two slides for background.
  • Do Edit, Preferences, Text/Formatted Text
    Preferences, Default Selection Type: Table, OK.
  • Select the Table/Formatted Text Select Tool and
    draw a box around the table to be converted.
  • Do Edit, Copy (or Ctrl+C).
  • In a blank Excel worksheet do Edit, Paste
    (or Ctrl+V).
  • Results: tri99.xls and tri99press.xls.
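
The copy-and-paste route above can also be scripted once the table text is on hand. The sketch below assumes the copied table arrives as tab-delimited text with one row per line, which is how a spreadsheet paste is usually laid out; that layout is an assumption, not something the slides confirm. The sample column names and values are made up.

```python
import csv
import io


def clipboard_table_to_rows(text):
    """Split tab-delimited text into a list of rows of trimmed cells."""
    rows = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines between rows
            rows.append([cell.strip() for cell in line.split("\t")])
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample standing in for a table copied out of a PDF.
sample = "ColumnA\tColumnB\nx\t1\n\ny\t2\n"
rows = clipboard_table_to_rows(sample)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Going through the csv module rather than joining with commas keeps cells that contain commas or quotes intact.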

17
Acrobat 5.0 Repurposing and Extracting
  • Acrobat 5.0 gives you powerful commands for
    repurposing or extracting text and graphics in
    PDF files. You can use the Save As command to
    save all text in a PDF file in Rich Text Format
    (RTF) for import into your favorite authoring
    application.
  • If your PDF files use tagged Adobe PDF, you can
    extract the text without losing the formatting.
    For example, you can save pages of tables from a
    PDF file for import into an application such as
    Adobe FrameMaker or Microsoft Word and the table
    formatting will be preserved. Both PDFMaker and
    Acrobat Web Capture create tagged Adobe PDF
    automatically. (See About the different types of
    Adobe PDF documents on next slide.)
  • You can also use the Save As command to save each
    page in a PDF file to an image format. You can
    use the Export command to export all images in a
    PDF file; each image is saved in a separate file.
  • In addition, Acrobat provides several tools (the
    text select tool, the column select tool, the
    table/formatted text select tool, and the
    graphics select tool) for copying and pasting
    small amounts of text and graphics from a PDF
    file to your clipboard. You can also paste text
    from a PDF document into a comment or bookmark
    name.
  • While in a PDF document, you select the text or
    graphic and copy it onto the clipboard. Once the
    text or graphic is on the clipboard, you can
    launch the other application and paste the text
    or graphic into a file.

18
About the different types of Adobe PDF documents
  • There are three types of Adobe PDF documents:
    unstructured, structured, and tagged. These
    document types differ in what they contain and
    how their contents can be repurposed. In general,
    the more structural information the Adobe PDF
    document contains, the more options you have for
    repurposing its contents.
  • 1. Unstructured Adobe PDF: You can save
    unstructured Adobe PDF files to other formats
    such as RTF with good results. An unstructured
    Adobe PDF file saved to RTF recognizes
    paragraphs, but not basic text formatting, lists,
    or tables. You can't reflow unstructured Adobe
    PDF files into different-sized devices, such as
    eBook reading devices. Unstructured Adobe PDF
    files aren't reliably accessible using a screen
    reader for Windows.
  • 2. Structured Adobe PDF: You can save structured
    Adobe PDF files to other formats such as RTF with
    results that are better than unstructured Adobe
    PDF files but not as good as tagged Adobe PDF
    files. Structured Adobe PDF files saved to RTF
    recognize paragraphs and basic text formatting,
    but not lists or tables. You can't reflow
    structured Adobe PDF files into different-sized
    devices. Structured Adobe PDF files can be
    accessed using a screen reader for Windows, but
    without the reliability of tagged Adobe PDF
    files.
  • 3. Tagged Adobe PDF: You can save tagged Adobe
    PDF files to other formats such as RTF with the
    best results, including the recognition of
    paragraphs, basic text formatting, lists, and
    tables. You can reflow tagged Adobe PDF files so
    that they're readable in different-sized
    devices. Tagged Adobe PDF files have been
    optimized for accessibility, so they can be
    accessed reliably using a screen reader for
    Windows.
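
The three document types described above can be summarized as a small lookup table. The sketch below only restates the slide text as data; the class and field names are illustrative choices and are not part of any Adobe API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfType:
    name: str
    rtf_recognizes: tuple  # structures that survive a save to RTF
    reflowable: bool       # can be reflowed for different-sized devices
    screen_reader: str     # accessibility as described on the slide


PDF_TYPES = (
    PdfType("unstructured", ("paragraphs",), False, "not reliable"),
    PdfType("structured", ("paragraphs", "basic text formatting"),
            False, "usable, less reliable than tagged"),
    PdfType("tagged", ("paragraphs", "basic text formatting", "lists", "tables"),
            True, "reliable"),
)


def types_preserving(feature):
    """Return the names of PDF types whose RTF export keeps the given structure."""
    return [t.name for t in PDF_TYPES if feature in t.rtf_recognizes]


print(types_preserving("tables"))
```

Read this way, only tagged PDF keeps tables through an RTF export, which is why the later repurposing steps call for tagged files.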

19
3.2 Repurposing PDF to Excel
20
3.2 Repurposing PDF to Excel
21
3.3 Repurposing PDF to XML
  • Adobe PDF Document as HTML
  • http://access.adobe.com/simple_form.html
  • Save As XML Plug-In for Windows (B2)
  • http://www.adobe.com/support/downloads/detail.jsp?hexID=89a2
  • Install and do Help and About Adobe Acrobat
    Plugins and select SaveasXML.
  • Do File, Save as, XML-1.00 without styling
    (.xml) or XHTML-1.00 with CSS-1.00 (.htm).
    (Note: Must be a tagged Acrobat PDF.)
  • See SaveAsXML Developer Information for Creating
    and Modifying Mapping Tables (DeveloperInfo.pdf).
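
Once a tagged PDF has been saved as XHTML, its tables can be pulled out programmatically. The following is a minimal sketch, assuming the saved file is well-formed XHTML that uses ordinary table, tr, td, and th elements. The actual markup produced by the plug-in is not shown in the slides, so the sample document is invented. Files that use HTML entities such as &nbsp; would need extra handling before this parser accepts them.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://www.w3.org/1999/xhtml}'."""
    return tag.rsplit("}", 1)[-1]


def extract_tables(xhtml_text):
    """Return every table in the document as a list of rows of cell text."""
    root = ET.fromstring(xhtml_text)
    tables = []
    for element in root.iter():
        if local_name(element.tag) != "table":
            continue
        rows = []
        for tr in element.iter():
            if local_name(tr.tag) != "tr":
                continue
            cells = ["".join(cell.itertext()).strip()
                     for cell in tr
                     if local_name(cell.tag) in ("td", "th")]
            rows.append(cells)
        tables.append(rows)
    return tables


# Invented sample standing in for a file saved as XHTML.
sample = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<table><tr><th>Item</th><th>Value</th></tr>
<tr><td>a</td><td>1</td></tr></table>
</body></html>"""
tables = extract_tables(sample)
print(tables)
```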

22
3.3 Repurposing PDF to XML
23
3.3 Repurposing PDF to XML
24
3.3 Repurposing PDF to XML
25
3.3 Repurposing PDF to XML
26
3.4 Repurposing PDF to Folio Views
  • Imports major word processing and Web formats.
  • Use Adobe Acrobat 5.0.5.
  • Not the free Acrobat Reader.
  • Do File, Open as Adobe PDF, then File, Save as,
    RTF.
  • Use Folio Views 4.2
  • Do File, New and give it a name, Open or File,
    Import, select RTF, Open.
  • Also do File, Import URL for Web formats.
  • Apply structure, links, formatting, etc. using
    the GUI.

27
3.4 Repurposing PDF to Folio Views
28
3.4 Repurposing PDF to Folio Views
29
3.5 NextPage Folio Views, LivePublish, and NXT 3
  • NextPage: http://www.nextpage.com
  • Folio Views: SGML-like markup (pre-XML) in a
    GUI.
  • CD-ROM distribution.
  • Web Server (Markup-to-HTML on the fly).
  • LivePublish: Basic XML support (uses a DTD; see
    next slide).
  • Site Administrator.
  • Personal Edition (Desktop and CD-ROM).
  • Web Server (Markup-to-HTML on the fly).
  • NXT 3: Advanced support for XML (LivePublish
    plus XSL, SOAP, etc.; see later slide).
  • Content Network Manager.
  • Content Network Server.
  • Content Network Manager.
  • Content Network Server.

30
3.5 NextPage LivePublish
  • Uses of XML (see separate handout)
  • Serve up native XML.
  • Convert XML to HTML using a CSS or XSL at run
    time using the Display Filter API.
  • Convert XML to HTML at build time.
  • Uses an XML-based file to define site look and
    feel.
  • The build Makefiles are XML files that define the
    structure and contents of the information
    collections.
  • XML-based legacy conversion tools simplify the
    conversion of existing content into HTML.
  • Indexsheets (XIL) define and control the indexing
    of content, just as stylesheets (XSL) define and
    control the formatting (see separate handout).

31
3.5 NextPage Folio Views
32
3.5 NextPage LivePublish Site Administrator
33
3.5 NextPage LivePublish Personal Edition
34
3.5 NextPage LivePublish Personal Edition
35
3.5 NextPage LivePublish Web Server
36
3.5 NextPage NXT 3 Content Network
  • NextPage Web Services White Paper
  • NXT 3 has been delivering XML Web Services since
    July 2000, based on early SOAP recommendations
    from before SOAP became a standard.
  • NextPage is developing full support for the SOAP,
    WSDL, and UDDI standards and for conforming Web
    service frameworks such as .NET and Sun ONE
    (Java).
  • Basic XML Web services provide low-level
    communication, and NXT 3 provides high-level data
    coordination when intelligent evaluation of
    distributed content and collaborative
    capabilities in the context of business processes
    are needed (just released: Matrix).

37
3.5 NextPage NXT 3 Content Network Manager
38
3.5 NextPage NXT 3 Content Network Web Server
39
3.6 Comments
  • Previous work
  • 1995 Folio Views Infobase and Excel files.
  • TRI 1997 CD-ROM Users Guide Infobase.
  • Could add Year 2000 easily to Year 1999.
  • Organized files by folders for indexing with the
    NXT 3 File Service (recall section 3.1 screen
    capture and see next slide).
  • Can/should create tagged PDF files when you use
    Acrobat PDFMaker 5.0 to create PDF files from
    within Microsoft Office 2000 applications.

40
3.6 Discussion
41
4. Excel Data Tables to XML Data Islands
  • 4.1 Excel-to-XML and XML-to-Excel Round-tripping.
  • 4.2 XML Spy 4.2.
  • 4.3 Application of XML Step by Step, Second
    Edition, Data Binding.
  • 4.4 Comments.

42
4.1 Excel-to-HTML(XML) and HTML(XML)-to-Excel
Round-tripping
  • In Excel do File, Save as Web Page, select
    Republish Sheet, Publish, Open in Browser,
    Publish.
  • In IE 5 or 6 do View Source and explore the
    XML-like markup.
  • In Excel do File, Open, Files of type: Web Pages.

43
4.2 Data Tables to XML Data Islands
  • XML Spy 4.2 (see Tutorial)
  • Copying XML data to and from third party
    products
  • XML Spy allows you to easily copy data to and
    from third party products. The copied data can be
    used within XML Spy as well as third-party
    products, enabling you to transfer XML data to
    spreadsheet-like applications (e.g. Microsoft
    Excel).
  • The "Copy as Structured Text" command copies
    elements to the clipboard as they appear on
    screen. This command is useful for copying
    table-like data from the Enhanced Grid View as
    well as the integrated Database/Table View.

44
4.3 Application of XML Step by Step, Second
Edition, Data Binding
  • Re-format Excel worksheet with appropriate field
    names (Upper Camel Case). (See next slide)
  • Import to FileMaker 5.5 using field names.
  • Query FileMaker on the Web for XML output:
  • http://localhost/FMPro?-db=tri99table1.fp5&-format=-dso_xml&-findall
  • Add the XML output as a data island in the HTML
    file and display in IE5-6.
  • See tri1999table1.xml and tri1999table1.htm
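
The data island step above can be sketched as a small page generator. This is only an illustration, assuming the data-binding markup that Internet Explorer 5 and 6 supported (an xml element as the island, with datasrc and datafld attributes on the bound table). The field names State and TotalReleases are hypothetical stand-ins for the Upper Camel Case names mentioned in the first step, and the XML file is assumed to hold one repeated record element with one child per field.

```python
from html import escape


def data_island_page(xml_src, island_id, fields):
    """Return HTML that binds a table to an external XML data island."""
    header = "".join(f"<th>{escape(name)}</th>" for name in fields)
    cells = "".join(
        f'<td><span datafld="{escape(name)}"></span></td>' for name in fields
    )
    return (
        "<html>\n<body>\n"
        # The island: an <xml> element pointing at the XML file.
        f'<xml id="{escape(island_id)}" src="{escape(xml_src)}"></xml>\n'
        # The bound table: the browser repeats the template row per record.
        f'<table datasrc="#{escape(island_id)}" border="1">\n'
        f"<thead><tr>{header}</tr></thead>\n"
        f"<tr>{cells}</tr>\n"
        "</table>\n</body>\n</html>\n"
    )


# Field names are hypothetical; the file name comes from the slide.
page = data_island_page("tri1999table1.xml", "tri", ["State", "TotalReleases"])
print(page)
```

Data islands are specific to those Internet Explorer versions, so a page built this way should not be expected to render in other browsers.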

45
4.3 Application of XML Step by Step, Second
Edition, Data Binding
46
4.3 Application of XML Step by Step, Second
Edition, Data Binding
47
4.3 Application of XML Step by Step, Second
Edition, Data Binding
  • Nevada
  • 1
  • 1529022
  • 1868475
  • 136431
  • 2797
  • 1.1647E09
  • 1.1682E09
  • 212998
  • 1.1684E09
  • ..
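
Values in a record like the one above arrive as text, with the larger figures in scientific notation. The sketch below shows one way to turn such strings into numbers. It uses the values listed on the slide but assigns no field names, since the slide does not give them.

```python
def parse_value(text):
    """Return an int or float when the text is numeric, else the original string."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return text


raw = ["Nevada", "1", "1529022", "1868475", "136431", "2797",
       "1.1647E09", "1.1682E09", "212998", "1.1684E09"]
parsed = [parse_value(item) for item in raw]
print(parsed)
```

The values written in scientific notation carry only five significant digits, so they are best treated as rounded figures rather than exact totals.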

48
4.3 Application of XML Step by Step, Second
Edition, Data Binding
49
5. Some Future Steps
  • 5.1 Microsoft Excel 2002 lets you open or save
    workbooks in XML format.
  • 5.2 Access 2002 allows you to create a database
    table by importing an XML document or to export a
    database table or other object to an XML
    document.

50
5.1 Microsoft Excel 2002
  • Source: Chapter 15, Publishing Information on the
    Web, Step by Step Microsoft Excel 2002
  • Previously, Excel 2000 workbooks and worksheets
    could be saved as Web files, and queries could
    bring Web data into workbooks.
  • Excel 2002 extends those capabilities by
    providing live-links from Excel to Web files and
    by providing import and export of XML and Smart
    Tags (e.g. have Excel look for known stock
    symbols and connect to a Web site that has
    information related to that symbol).

51
5.1 Microsoft Excel 2002
  • Working with Structured Data
  • XML can identify rows and cells within the
    spreadsheet and allow spreadsheet data to move
    freely to other applications.
  • Do File, Save As, Save as type, select XML
    Spreadsheet (.xml), and click Save. Click Yes
    when the message box appears.
  • Open the XML file in XML Spy to examine its
    structure and content (PivotXML.xml).
  • Open the XML file in Excel 2002 to see it
    re-display.
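
A file saved as XML Spreadsheet can also be read outside Excel. The following is a minimal sketch, assuming the Workbook, Worksheet, Table, Row, Cell, and Data layout of that format under the urn:schemas-microsoft-com:office:spreadsheet namespace. The sample workbook is invented and is far smaller than a real export such as PivotXML.xml.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def read_xml_spreadsheet(xml_text):
    """Return {worksheet name: list of rows}, converting Number cells to float."""
    root = ET.fromstring(xml_text)
    sheets = {}
    for sheet in root.findall("ss:Worksheet", NS):
        name = sheet.get(f"{{{SS}}}Name")
        rows = []
        for row in sheet.findall("ss:Table/ss:Row", NS):
            values = []
            for data in row.findall("ss:Cell/ss:Data", NS):
                text = data.text or ""
                if data.get(f"{{{SS}}}Type") == "Number":
                    values.append(float(text))
                else:
                    values.append(text)
            rows.append(values)
        sheets[name] = rows
    return sheets


# Invented sample workbook with one sheet and one row.
sample = """<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Sheet1">
  <Table>
   <Row>
    <Cell><Data ss:Type="String">Label</Data></Cell>
    <Cell><Data ss:Type="Number">42</Data></Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook>"""
sheets = read_xml_spreadsheet(sample)
print(sheets)
```

The sketch ignores skipped cells and formatting, so column positions are only dependable when every row is fully populated.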

52
5.1 Microsoft Excel 2002
53
5.2 Access 2002
  • Source: Chapter 3, Getting Information Into and
    Out of a Database, Step by Step Microsoft Access
    2002
  • Best practices
  • Link to other databases rather than import, so
    that you can view and edit in both systems.
  • Share databases by exporting to XML (universal
    format).
  • http://office.microsoft.com/assistance/2002/articles/acExOfScenariosUsingXML.aspx
  • Import
  • Open Access 2002 database.
  • File, Get External Data, Import, Files of type,
    XML Documents, Import both XML and XSD, select
    file to be imported, Import, Import XML, Options,
    Structure and Data, Okay.
  • Open and view database tables to confirm data was
    imported.

54
5.2 Access 2002
  • Exporting to other applications
  • Works for Table, Query, Form, and Report.
  • Open Access 2002 database and select a table.
  • File, Export, select XML Documents, Save as type,
    Export, Export XML, select both Data (XML) and
    Schema (XSD) of the data, Okay.
  • See screen captures on next pages.
  • See Advanced, Schema tab and select appropriate
    option.
  • Look at XML and XSD files (see examples below) in
    XML Spy 4.2
  • Orders.xml, Order Details.xml, and Order
    Details.xsd.
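
An exported data file such as Orders.xml can be read back into records with a few lines of code. The sketch below assumes the export wraps the rows in a single root element (shown here as dataroot) with one child element per row named after the table and one grandchild per field. That layout, the field names, and the values are assumptions for illustration, not taken from the presentation's files.

```python
import xml.etree.ElementTree as ET


def read_access_export(xml_text, table_name):
    """Return one dict per exported row, mapping field names to text values."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall(table_name):
        records.append({field.tag: (field.text or "") for field in record})
    return records


# Invented sample; field names and values are hypothetical.
sample = """<dataroot xmlns:od="urn:schemas-microsoft-com:officedata">
  <Orders><OrderID>1</OrderID><ShipCity>Example</ShipCity></Orders>
  <Orders><OrderID>2</OrderID><ShipCity>Sample</ShipCity></Orders>
</dataroot>"""
records = read_access_export(sample, "Orders")
print(records)
```

All values come back as strings here; the accompanying XSD file is where the column types are recorded, so a fuller reader would consult it before converting them.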

55
5.2 Access 2002
56
5.2 Access 2002
57
5.2 Access 2002
58
6. Questions and Answers
  • Brand Niemann, Ph.D.
  • USEPA Headquarters, EPA West, Room 6143D
  • Office of Environmental Information, MC 2822T
  • 1200 Pennsylvania Avenue, NW, Washington, DC
    20460
  • 202-566-1657
  • niemann.brand_at_epa.gov
  • EPA: http://161.80.70.167
  • Outside EPA: http://130.11.44.140