Title: XML Working Group, Emerging Technology Committee, U.S. Federal CIO Council
XML Working Group, Emerging Technology Committee, U.S. Federal CIO Council
XML, XFML, and the Other Xs: From Data Aggregation to Faceted Search-and-Discovery to Business Value
Joint presentation
Washington, DC, July 23, 2003
Brett Stein, Solutions Director, XAware, Inc.
Iqbal Talib, Co-Founder & CEO, i411, Inc.
Presentation Outline (Road Map)
1. Objectives
2. Rationale
3. Context
4. Impetus
5-13. Data aggregation
14-16. Context for Demos 1 & 2
18. Demo 1
19-30. Faceted search & discovery
31. Demo 2
32-33. Conclusions, Q&A
Presenters: Brett, Iqbal
1. Objectives (what? bottom line?)
- Demystify some of the Xs, e.g.
- - XML, XFML, XTM, and XQuery
- - XSLT, ebXML, DASL
- Show the power of creating real-time read/write access through unified XML views of disparate, dispersed, and (un)structured data sources among Federal agencies
- - Demo 1: Convert OMB data to XML
- Show human-friendly, category-based navigation via real-time, faceted search-and-discovery technology
- - Demo 2: Conduct faceted search of OMB data
- Illustrate how others will benefit, e.g., unlock the full social, economic, and intelligence value of information assets of a government agency
2. Rationale (basis for joint presentation?)
- Vital need among Federal agencies for data integration, e.g., treat multiple data repositories as one logical source and create a single view
- Urgency to access, search, retrieve, share, and exchange critical information across sister agencies, often in real time
- Leverage innovative but proven "better, faster mousetraps" to attain technology and business goals, including ROI
- XML underpins the common format for data exchange and is the basis upon which faceted searching can be enabled
- XFML allows the sharing of hierarchical faceted metadata and indexing efforts
- Relevance to the xmlWG & CIO Council, and education and outreach to government
- "Show me" demos suggested by the xmlWG (OMB Exhibit 53)
3. Context (big picture? external forces at play?)
[Diagram: the Federal government (civilian, defense, intelligence, and security agencies) and ten numbered external forces. Recoverable labels: Citizens/CRM; New world; New technologies; Congress; GPRA/NPR 1993, ITMRA 1996; ERM, GPEA 1998; FEA 2001; 9/11, USA PATRIOT Act; e-Gov/FirstGov 2002; U.S. DHS 2003]
4. Impetus (example issues?)
- Government agencies face significant challenges and opportunities with tagging, sharing, searching, and exchanging information, e.g.
- Of the 28 lines of business found in the Federal government, 19 Executive Departments and agencies (on average) are performing the same line of business (E-Government Strategy, OMB)
- Each agency typically has invested in traditional approaches, regardless of other departments' redundant efforts (E-Government Strategy, OMB)
- Agencies may be wasting at least 20% of the approx. $60 billion (FY 2004) allocation for IT on redundant systems and services
- 40% of all application development effort is spent on accessing existing data (IDC)
- Myriad issues, trends, and mandates
- - typhoons of data
- - silos, stovepipes
- - interoperability, interconnectivity
- - net-centricity
- - horizontal fusion
- - EA, EAI, ERM
5. XML and the Other Xs (brief definitions?)
- XML (eXtensible Markup Language): A standard, simple, self-describing way of encoding both text and data so that content can be processed with relatively little human intervention and exchanged across diverse hardware, operating systems, and applications.
- XFML (eXchangeable Faceted Metadata Language): An XML model to express topics, organized in hierarchies or trees within mutually exclusive containers called facets. It also allows the expression of indexing efforts (metadata assigned to pages of data).
- XTM (XML Topic Maps): An XML specification that provides a model and grammar for representing the structure of information resources used to define topics, and the associations (relationships) between topics.
- XQuery (XML Query Language): A query language that uses the structure of XML to express queries across many kinds of data, whether physically stored in XML or viewed as XML via middleware.
- XSLT (eXtensible Stylesheet Language Transformations): A language for transforming XML documents into other XML documents or other formats (e.g., HTML).
- ebXML (Electronic Business using eXtensible Markup Language): A modular suite of specifications that enables enterprises of any size and in any geographical location to conduct business over the Internet.
- XFDL (eXtensible Forms Description Language): The purpose of XFDL is to solve the body of problems associated with digitally representing complex forms such as those found in business and government.
6. Standards-Driven Enterprise Info. Integration
- Virtual views into multiple data sources (Source: Giga Information Group)
- A view represents a business entity in a metadata-based description
- - a customer
- - a sales pipeline
- - the performance of a manufacturer's production floor
- Applications access a view as if the data were physically located in a single database, even though individual data may reside in a different source system
- When an application accesses a view, the EII platform transparently handles connectivity with back-end databases and applications, along with related functions, e.g., security, data integrity, and query optimization
- XML is the ideal conduit for enabling such on-demand views
- XML views facilitate the continued utilization of current IT investments, and the eventual intelligent migration to "new world" technologies according to business rules
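The idea of a virtual view on this slide can be illustrated with a minimal Python sketch. It is not XAware's implementation: the two in-memory sources, the field names, and the customer_view function are all hypothetical stand-ins, chosen only to show an application requesting one XML view while the data stays in separate source systems.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical source 1: rows as they might come from a relational database.
CRM_ROWS = [{"customer_id": "C1", "name": "Acme Corp"}]

# Hypothetical source 2: a CSV export from a separate order system.
ORDERS_CSV = "customer_id,order_id,amount\nC1,O-100,250.00\nC1,O-101,75.50\n"


def customer_view(customer_id):
    """Assemble one XML 'customer' view on demand from both sources."""
    root = ET.Element("customer", id=customer_id)
    # Pull the name from the first source.
    for row in CRM_ROWS:
        if row["customer_id"] == customer_id:
            ET.SubElement(root, "name").text = row["name"]
    # Pull the matching orders from the second source.
    orders = ET.SubElement(root, "orders")
    for row in csv.DictReader(io.StringIO(ORDERS_CSV)):
        if row["customer_id"] == customer_id:
            ET.SubElement(orders, "order", id=row["order_id"], amount=row["amount"])
    return root


view = customer_view("C1")
print(ET.tostring(view, encoding="unicode"))
```

The caller only sees the customer element; which system supplied the name and which supplied the orders is hidden behind the view function. A real EII platform would add the connectivity, security, and query optimization the slide mentions.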
7. Single View of Information
Across all levels and functions of government
8. Solving the Integration Challenge
9. Information On-Demand
- Aggregation
- Data chaining
- Inbound XML
- Decomposition
- Unique standards/XML-based approach
10. Using XML for Federal Enterprise Architecture
- BRM (Business Reference Model) is a function-driven framework for describing the business operations of the Federal Government independent of the agencies that perform them
- XAware uses XML to bi-directionally supply data to applications independent of the physical agency location of the data source(s) in a many-to-many relationship
- DRM (Data Reference Model) will describe the data and information that support program and business line operations (Cross-Agency Exchange)
- XAware utilizes a standards-based approach (XML) to accomplish cross-agency information exchanges in real time, leveraging current IT investments
11. Using XML for FEA (cont'd)
- The massive duplication of efforts that the FEA is intended to resolve is indicative of the many individual agency data sources leveraged to accomplish line of business activities
- XAware allows many data sources to appear as a single logical source, with an XML interface
12. Integrated Information Sharing
Single XML views across all levels and functions
of government
13. XAware Example Applications
- Civilian government
- - U.S. DOI, Bureau of Land Management
- - U.S. Department of State
- - Justice Information System - Alaska
- State & local government
- - Nebraska Department of Environment
- - Colorado Department of Health
- - California EPA
- Defense (DoD)
- - Raytheon
- - Northrop Grumman
- - Mitre
14. Context for Demos 1 & 2: OMB and the Federal Budget
Report on Information Technology (IT) Spending
for the Federal Government for Fiscal Years 2002,
2003, and 2004
15. Context for Demos 1 & 2: The Federal Budget Process
- Year-long, multi-step process with multiple reviews, refinements, and touch points
- Every office in every agency has to develop its budget each FY
- Plans, programs, and budgets start at the lowest level, then trickle up to the highest levels, e.g., Department Secretary
- Proposed budget is then sent to OMB, Executive Office of the President
- The President submits the final budget request to Congress (FY 2004: > $2 trillion, 20% of U.S. GDP)
- OMB Exhibit 300 is a superset of OMB Exhibit 53
- As such, as soon as budget data are accumulated via the Exhibit 300 form, they become a candidate for a more complex implementation of the functionality to be described as Demo 1
16. Demo 1 & Demo 2: Flowchart of Steps
17. Demo 1: Conversion of OMB Exhibit 53 to XFML
- OMB Exhibit 53 is represented as an Excel spreadsheet containing IT Investment Details for every project in every agency of the Federal government, with total, development, and steady-state costs for past, current, and next fiscal year
- This demonstration will use XAware's XA-Suite to convert the Exhibit 53 spreadsheet to an XFML representation
- - Each row in the spreadsheet is represented by a "page" in the XFML file
- - Each "page" has a title which uniquely identifies a row in the spreadsheet
- - Each "page" in the XFML file contains "occurrences", where each "occurrence" indicates the existence of a topic on the page
- - Each topic belongs to a facet, defining the organizational structure of the XFML file
- Five facets have been defined for this demonstration: Department, Investment Type, Budget Entry Year, Project Type, and Investment
18. Demo 1: Conversion of OMB Exhibit 53 to XFML
- Topics have been defined for each of the facets
- Department
- - Each agency name is a topic
- Investment Type: Topics are
- - 01 Projects by Mission Area
- - 02 Office Automation and Infrastructure
- - 03 Enterprise Architecture and Planning
- - 04 Grants Management
- - 05 Intramural / Grants to States
- Budget Entry Year: Topics are
- - 2000, 2001, 2002, and 2003
- - 24 (representing one of the 24 E-Gov initiatives)
- Project Type: Topics are
- - Major and Minor
- Investment: Topics have been established for
- - Dollar amount ranges for each of Total, Development, and Steady-State expenditures
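The row-to-page mapping described for Demo 1 can be sketched in a few lines of Python. This is not the XA-Suite conversion: the sample rows are invented, the page URLs are placeholders, and the element and attribute names (xfml, facet, topic, name, page, title, occurrence, facetid, topicid) follow one reading of XFML Core and should be checked against the specification before reuse.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows standing in for the Exhibit 53 spreadsheet (values invented).
ROWS = [
    {"title": "Project A", "Department": "Department of Example", "Project Type": "Major"},
    {"title": "Project B", "Department": "Department of Example", "Project Type": "Minor"},
]
FACETS = ["Department", "Project Type"]


def slug(text):
    """Turn a label into a simple identifier (an assumption, not an XFML rule)."""
    return "".join(c.lower() if c.isalnum() else "_" for c in text)


def rows_to_xfml(rows, facets):
    root = ET.Element("xfml", version="1.0")
    # One facet element per facet.
    for facet in facets:
        ET.SubElement(root, "facet", id=slug(facet)).text = facet
    # One topic per distinct value found in a facet's column.
    topic_ids = {}
    for facet in facets:
        for row in rows:
            key = (facet, row[facet])
            if key not in topic_ids:
                topic_ids[key] = slug(facet) + "." + slug(row[facet])
                topic = ET.SubElement(root, "topic", id=topic_ids[key], facetid=slug(facet))
                ET.SubElement(topic, "name").text = row[facet]
    # One page per row, with one occurrence per topic found on that row.
    for n, row in enumerate(rows, start=1):
        page = ET.SubElement(root, "page", url="row-%d" % n)  # placeholder URL
        ET.SubElement(page, "title").text = row["title"]
        for facet in facets:
            ET.SubElement(page, "occurrence", topicid=topic_ids[(facet, row[facet])])
    return root


doc = rows_to_xfml(ROWS, FACETS)
print(ET.tostring(doc, encoding="unicode"))
```

Only two of the five facets are shown to keep the sketch short; the Investment facet would additionally need a rule that maps each dollar amount to its range topic.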
19. The Big Issue with Search and Discovery (S&D)
- How do you make sense of the information contained in a very large data repository (or repositories)? By having the ability to
- - Get an aerial view of all the information, neatly organized by categories and distributed by counts of the documents found
- - See all of this information along hierarchical categories and as different views (facets)
- - Conduct search and browse in tandem, and by categories within any facet
- - Slice all or any part of the repository along any combination of category axes, similarly organized and distributed
- - Bring back the searched result subset(s) in real time, similarly organized and distributed
20. Evolution of Apps./User Interfaces and Data Relns.
[Figure: evolution from the 1960s to now of (a) applications and user interfaces and (b) data relationships]
21. Reqs. of Data Stakeholders that are Driving Standards
22. Specific Problems with S&D w.r.t. Stakeholders
- Stakeholder 1: End-user
- - Search around (Google type)
- - Browse one or more taxonomies
- - Find what s/he knows exists, even if s/he may not know how to ask for it
- - Discover, serendipitously and deductively
- Stakeholder 2: Author
- - Expect target audience to have access
- - Possess the means of controlling discovery by end-users
- - Have the confidence that a user looking for a document will find it
- Stakeholder 3: Database mgr.
- - Ensure minimum intrusion
- - Use help in identifying errors in their databases
- Stakeholder 4: Repository mgr.
- - Enable a single point of search
- - Provide simultaneous, real-time update of all new info
- - Synchronize data across all classes of users
- - Win consensus of DB managers
- - Address security concerns
23. Specific Problems with S&D w.r.t. Business Value
Economic value, social value, intelligence value
Better, faster interaction between end-users and textual information
- Pillar 1: Virtual aggregation
- Pillar 2: Faceted search and browse
- Pillar 3: Greater visibility
- Pillar 4: Virtual syndication
- Pillar 5: Real-time customer response
24. Solving the S&D Problems: What Do We Need?
- XML to help define as much as possible of the common structure between disparate data sources without disrupting the existing maintenance infrastructure and processes
- XFML to help metatag the facets and headings to resources (documents)
- - From attributes (fields) in a document (e.g., source, date, author)
- - Manual effort
- - Automated heuristic classifiers (e.g., Applied Semantics, Entrevo, Inxight, NStein)
- - WebDAV for collaborative tagging
- Other technologies, e.g.
- - XTM (topic maps), and XTM vs. XFML
Better, faster interaction between end-users and textual information
25. Toward a Better Mousetrap for Search and Discovery
- What must the faceted S&D technology do?
- - Bin-sort, with counts, any searched subset of the data repository by one or more underlying taxonomies (and facets), in real time.
- What's the big deal?
- - The big deal is the performance. Retrieving and sorting result sets that may contain hundreds of thousands or millions of documents in real time is a huge performance challenge.
- So, how could it be done?
- - By employing certain indexing techniques, search algorithms, and parallel processing.
- And how could we achieve search/research interoperability across dispersed, disparate databases?
- - With search API and hardware that sit independently of the database hardware and its maintenance infrastructure.
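The "bin-sort, with counts" requirement can be illustrated with a small Python sketch. It is a toy, not the i411 engine: the documents are invented, and the inverted index used here is only one possible example of the "certain indexing techniques" the slide alludes to, with none of the parallelism a real engine would need.

```python
from collections import defaultdict

# Hypothetical documents, each tagged with one topic per facet.
DOCS = [
    {"id": 1, "Department": "Agency A", "Project Type": "Major"},
    {"id": 2, "Department": "Agency A", "Project Type": "Minor"},
    {"id": 3, "Department": "Agency B", "Project Type": "Major"},
]
FACETS = ["Department", "Project Type"]


def build_index(docs, facets):
    """Inverted index: (facet, topic) -> set of document ids, built once."""
    index = defaultdict(set)
    for doc in docs:
        for facet in facets:
            index[(facet, doc[facet])].add(doc["id"])
    return index


def bin_counts(index, subset):
    """Count, per facet and topic, how many documents of a searched subset fall in each bin."""
    counts = defaultdict(dict)
    for (facet, topic), ids in index.items():
        hits = len(ids & subset)
        if hits:
            counts[facet][topic] = hits
    return dict(counts)


index = build_index(DOCS, FACETS)
whole_repository = bin_counts(index, {1, 2, 3})
searched_subset = bin_counts(index, {1, 3})
print(whole_repository)
print(searched_subset)
```

Because the index is built ahead of time, counting the bins for any subset is a matter of set intersections rather than a rescan of every document, which is the kind of shortcut that makes real-time counts plausible at larger scale.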
26. Search, Browse, Slice, Dice, Discover by Facets
Visual example of a user-controlled search logic with the faceted S&D technology
View 1: Get aerial layout of the entire database
View 2: See subset along Axis 1
View 3: See subset along Axis 2
View 4: Search by keyword
27. Conventional vs. Faceted Search & Discovery
28. Search via Google
- Search only Web-based information, not RDBMS data.
- Results ranked by "relevance", which may not be relevant to the user.
- Long lists not organized by categories; therefore, valuable information may not be visible to the user.
- No opportunity for user to refine search results, perform drill-downs, and back-track without starting over.
29. Search via Faceted Search & Discovery
- Search results always in context and presented in structured (sub)categories.
- All returned items visible (with counts of the documents found).
- User can view results in any category tree, and switch views from one tree to another at any time.
- User controls searching, browsing, and back-tracking rapidly and interactively.
- Documents achieve maximum exposure and visibility by unlocking, organizing, and amplifying critical/relevant data across multiple data repositories through a single point of search.
[Screenshot callouts: Relevant views of the data; Simultaneous free-text Boolean search; Categories of selected views shown at all times]
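The drill-down and back-tracking behavior contrasted on these two slides can be sketched as follows. This is an illustrative Python toy with invented documents and a hypothetical search function, not the i411 Discovery Engine: the user's path is kept as a stack of facet selections, so stepping back means dropping the last selection rather than starting over.

```python
# Hypothetical documents with facet values and free text.
DOCS = [
    {"id": 1, "Department": "Agency A", "Project Type": "Major", "text": "Network upgrade"},
    {"id": 2, "Department": "Agency A", "Project Type": "Minor", "text": "Grants portal"},
    {"id": 3, "Department": "Agency B", "Project Type": "Major", "text": "Network security"},
]


def search(docs, selections, keyword=""):
    """Return ids of documents matching every (facet, topic) selection and the keyword."""
    return [
        d["id"]
        for d in docs
        if all(d[facet] == topic for facet, topic in selections)
        and keyword.lower() in d["text"].lower()
    ]


path = []                                  # stack of (facet, topic) selections
path.append(("Department", "Agency A"))    # drill down into one category
step1 = search(DOCS, path)
path.append(("Project Type", "Major"))     # drill down further, plus a keyword
step2 = search(DOCS, path, keyword="network")
path.pop()                                 # back-track one step
step3 = search(DOCS, path)
print(step1, step2, step3)
```

Keyword search and category selection are combined in one call, which mirrors the "search and browse in tandem" idea; a fuller version would also return per-category counts for each step.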
30. i411 Discovery Engine Example Applications
Government applications
- CRISP Grants Database: Office of Extramural Research, National Institutes of Health, U.S. Department of Health and Human Services. http://crisp.i411.com/
- AIDS Projects Query System (with Altum, Inc.): Office of AIDS Research, National Institutes of Health, U.S. Department of Health and Human Services. http://demo.altum.com
- Trade and Economic Archives/KM: STAT-USA, Economics and Statistics Administration, U.S. Department of Commerce. http://statusa.i411.com
31. Demo 2: Multifaceted S&D of OMB Exhibit 53
Demo 2 URL: http://demoweb01.i411.com/budget/index.html
32. Conclusions and Q&A
- W.r.t. the databases of Federal government agencies, there are many acute, urgent issues that have to do with
- - The underlying data (and documents)
- - Bringing data from different sources
- - Ensuring the integrity of data
- - Generating value for the end-user at the front line
- But these problems present huge opportunities that can be tapped with new, human-friendly technologies for data aggregation, search, and discovery
- Creating real-time read/write access to disparate, dispersed, and (un)structured data sources among Federal agencies through unified XML views is very powerful
- Unlocking the full social, economic, and intelligence value of information assets within an agency is a long-term pursuit, rapidly aided by the simplification of data exchange and integration (realize benefits quickly)
- The creation of information on-demand through XML views is an FEA-compliant solution to many significant IT challenges in government
33. Contact Information
i411, Inc., 13655 Dulles Technology Drive, Suite 250, Herndon, Virginia 80920, www.i411.com
- Iqbal Talib, Co-Founder & CEO. Email: italib_at_i411.com. Phone: 703.793.3270 x105
- Amin Hassam, V. President, Gov. Solutions. Email: ahassam_at_i411.com. Phone: 703.793.3270 x140
XAware, Inc., 2060 Briargate Parkway, Suite 150, Colorado Springs, Colorado 80920, www.xaware.com
- Brett Stein, Solutions Director. Email: bstein_at_xaware.com. Phone: 719.884.5420
- Steve Horneman, Director of Marketing. Email: shorneman_at_xaware.com. Phone: 719.884.5424