Title: XML-based Web Publishing and Content Management at Seattle University School of Law
Slide 1: XML-based Web Publishing and Content Management at Seattle University School of Law
- James Cooper
- Director of Technology Media Services
- jcooper_at_seattleu.edu
- Evan Lenz
- Content Management Architect
- lenze_at_seattleu.edu
Slide 2: Contents
- Web site requirements and architecture
- Web site management with Cocoon
- URI design discussion
- Redhawk CMS
- An acronym you should know: XSLT
- Q&A
Slide 3: 1. Web site requirements and architecture
Slide 4: SU Law Web site requirements (summer 2002)
- Must include a Flash-enhanced version
- Must include an HTML-based version that approximates the look-and-feel and navigational structure of the Flash-enhanced version
- Must include a version of the site that is designed for accessibility
- Must employ the separation of presentation and content through the use of XML technologies. Multiple published versions of the same content must originate in an automatic way from the same source.
- The publishing framework must employ a single point of control over navigational structure, e.g. using an XML configuration file.
Slide 5: Web site requirements, cont.
- Must allow an average Web developer to easily author new content, edit existing content, etc.
- Must accommodate the continued use of existing tools for authoring content, e.g. Dreamweaver.
- Particular kinds of content that have predictable, repeating structure should be converted into custom XML vocabularies to increase their flexibility and ease of management.
- The Web site must include search functionality integrated into all versions of the site.
Slide 6: Web content strategy today
- Static pages were converted to, and are stored as, style-free XHTML (in VSS, with the latest versions shadowed on the staging server).
- Apache Ant is invoked on the staging server to incrementally build all versions (Flash, Standard, Text-only, and crawler) of each static page, using the page source as well as the global navigation and sidebar configuration files as input.
- Cocoon powers the core functionality of the site, including setting the user's version preferences and serving dynamic content. All static pages and files are served directly by Apache.
- Dynamic content pieces are identified by URI in the Cocoon sitemap, which is configured to assemble the corresponding pages on the fly. Dynamic content examples include:
  - Specialized content in our home-grown CMS, Redhawk, which provides end-user WYSIWYG editing of certain kinds of content
  - Google search results
  - Legacy ASP pages
- Traditional Web content management, e.g. WYSIWYG editing of all pages, is being considered but is not sorely missed at this time.
Slide 7: Benefits of using XML
- Separation of presentation from content
  - Ensures consistency of presentation across all pages (eliminates layout errors)
  - Enables publication to multiple channels
  - Content re-use
- Many commercial and open-source tools available for processing/creating XML
- Integration between disparate systems (including legacy ASP pages, Google, Redhawk, etc.)
- Great for configuration files
Slide 8: Primary tools used in our Web site
- Run-time
  - Apache Cocoon (Java-based)
  - Apache Web server on Linux
  - mod_rewrite (for rewriting incoming URLs, e.g. path?mode=flash to /flash-html/path.html)
  - Google Appliance (for integrated search inside our site template)
  - IIS/ASP (legacy database access scripts, e-mail forms, etc.)
  - 4Suite, for exporting content from the Redhawk CMS (based on 4Suite)
- Build-time
  - MS Visual SourceSafe (for versioning of static content)
  - Samba (for mounting a VSS shadow folder on the Linux staging server)
  - Dreamweaver MX (includes XHTML support and VSS integration)
  - Apache Ant (for building the bulk of the site statically)
  - 4Suite, for end-user content management of specialized document types, aka Redhawk
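The URL rewriting that mod_rewrite performs at run-time can be illustrated outside Apache. The following is a minimal Python sketch of the mapping only, not the site's actual rewrite rule. It assumes the query parameter is named `mode` and covers only the `flash` value mentioned in the slide; the function name `rewrite_url` is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def rewrite_url(url: str) -> str:
    """Map /path?mode=flash to /flash-html/path.html; leave other URLs alone.

    Assumption: only the "flash" mode is handled here, because it is the
    only mapping the slide spells out.
    """
    parts = urlsplit(url)
    mode = parse_qs(parts.query).get("mode", [None])[0]
    if mode != "flash":
        return url
    path = parts.path.strip("/")
    return f"/flash-html/{path}.html"
```

For example, `rewrite_url("/history?mode=flash")` yields `/flash-html/history.html`, which Apache can then serve directly as a prebuilt static file.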
Slide 9: 2. Web site management with Cocoon
Slide 10: Introduction to Cocoon
- Cocoon is an open-source, Java-based XML Web publishing framework
- Recently gained status as a top-level Apache project, at http://cocoon.apache.org
- Designed to enable the separation of concerns between content, logic, and style
Slide 11: The Cocoon sitemap
- A SAX-based pipeline mechanism allows XML content to go through a series of transformations, configurable by the sitemap, Cocoon's central point of configuration
- Each pipeline consists of:
  - Exactly one generator
    - Produces XML content using any number of mechanisms: reading a file, submitting an HTTP request, calling a database, invoking a server page script, etc.
  - Followed by zero or more transformers
    - Each processes the XML, e.g. with XSLT or XInclude, for subsequent handling by either another transformer or the serializer
  - Followed by exactly one serializer
    - Serializes into a particular format, e.g. well-formed XML, browser-compatible XHTML, SVG, PDF (via XSL-FO and FOP), rasterized images (via SVG and Batik), etc.
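The generator, transformer, serializer shape described above can be mimicked in a few lines of Python. This is only a conceptual sketch: Cocoon streams SAX events between Java components, whereas this version passes in-memory ElementTree documents between plain functions. All function names and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Sequence

Generator = Callable[[], ET.Element]
Transformer = Callable[[ET.Element], ET.Element]
Serializer = Callable[[ET.Element], str]


def run_pipeline(generate: Generator,
                 transformers: Sequence[Transformer],
                 serialize: Serializer) -> str:
    """Exactly one generator, zero or more transformers, one serializer."""
    doc = generate()
    for transform in transformers:
        doc = transform(doc)
    return serialize(doc)


def generate_cases() -> ET.Element:
    # Stand-in generator; a real one might read a file or issue an HTTP request.
    return ET.fromstring("<cases><case>Alpha</case><case>Beta</case></cases>")


def cases_to_list(doc: ET.Element) -> ET.Element:
    # Stand-in transformer playing the role an XSLT stylesheet would play.
    ul = ET.Element("ul")
    for case in doc.iter("case"):
        li = ET.SubElement(ul, "li")
        li.text = case.text
    return ul


def serialize_xml(doc: ET.Element) -> str:
    return ET.tostring(doc, encoding="unicode")


html = run_pipeline(generate_cases, [cases_to_list], serialize_xml)
```

Passing an empty transformer list is valid and simply serializes whatever the generator produced, which mirrors the "zero or more transformers" rule.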
Slide 12: Simplified Cocoon sitemap excerpt
    <map:match pattern="accesstojustice/hague/cases">
      <map:generate src="http://redhawk/?xslt=getCases.xsl"/>
      <map:transform src="stylesheets/case2html.xsl"/>
      <map:serialize type="xhtml"/>
    </map:match>
Slide 13: Another sitemap excerpt
    <map:resource name="front-door">
      <map:select type="request-parameter">
        <map:parameter name="parameter-name" value="set-version"/>
        <map:when test="flash">
          <map:call resource="check-flash"/>
        </map:when>
        <map:when test="flash-confirmed">
          <map:call resource="set-preference-to-flash"/>
        </map:when>
        <map:when test="standard">
          <map:call resource="set-preference-to-standard"/>
        </map:when>
        <map:when test="simple">
          <map:call resource="set-preference-to-simple"/>
        </map:when>
        <map:otherwise>
          <!-- more logic -->
        </map:otherwise>
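The selection logic in this excerpt amounts to a lookup keyed on the `set-version` request parameter. A rough Python equivalent, offered as an illustration rather than as how Cocoon implements selectors, might look like this. The resource names come from the excerpt; the function name `select_resource` and the `"otherwise"` placeholder return value are assumptions.

```python
from urllib.parse import parse_qs

# Values of the set-version parameter mapped to the resource each one calls.
RESOURCES = {
    "flash": "check-flash",
    "flash-confirmed": "set-preference-to-flash",
    "standard": "set-preference-to-standard",
    "simple": "set-preference-to-simple",
}


def select_resource(query_string: str) -> str:
    """Return the resource to call for a query string.

    Falls through to a placeholder "otherwise" branch, whose logic the
    excerpt elides, when the parameter is missing or unrecognized.
    """
    value = parse_qs(query_string).get("set-version", [""])[0]
    return RESOURCES.get(value, "otherwise")
```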
Slide 15: URI design considerations
- The URI design of the SU Law Web site was inspired by Tim Berners-Lee's 1998 essay "Cool URIs don't change" (http://www.w3.org/Provider/Style/URI.html)
- Aims to follow two of the essay's suggestions:
  - Leave out file extensions
  - Leave out topic/classification by subject
Slide 16: Leave out file extensions
- Cocoon makes it easy to map external URIs to internal filenames or other content generators
- In the SU Law Web site, no HTML page URL includes a file extension
- Other types of content use standard file extensions, e.g. JPG, GIF, Flash, Word, etc.
Slide 17: Leave out topic/classification by subject
- Difficult problem
- Design URIs such that they are meaningfully mnemonic and will never change, even though the corresponding pages may be classified into different topics later
- Berners-Lee: "Because the relationships between subjects are web-like rather than tree-like, even...people who agree on a web may pick a different tree representation."
Slide 18: Decouple navigational structure from URI structure
- URI structure is, of necessity, hierarchical
- Site navigation tends to be hierarchical, classifying pages into topics or subjects
- To help in following the original suggestion, we formulated the following mandate:
  - Decouple navigational structure from URI structure.
- We met this goal through the use of a custom XML configuration file (navigation.xml) that maps between the two independent hierarchies (navigation and URI structure)
Slide 19: Excerpt from navigation.xml
    <navigation xmlns="http://law.seattleu.edu">
      <menu display="Welcome" sectionId="welcome">
        <link href="/" display="SU Law Home"/>
        <link display="Contact Information" href="/contactus"/>
        <link display="Directions" href="/directions"/>
        <link href="/welcome" display="From the Dean"/>
        <link href="/history" display="History"/>
        <link href="/calendar" display="Master Calendar"/>
        <link href="/mission" display="Mission"/>
        <link href="/search" display="Search"/>
        <link href="/sitemap" display="Site Map"/>
        <link href="http://www.seattleu.edu"
              display="Seattle University Home"/>
        <hidden href="/news" display="News"/>
        <hidden pattern="/news"/>
        <hidden href="/privacy" display="Privacy Statement"/>
      </menu>
      <menu display="Students" sectionId="students">
        <menu display="Academics">
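To illustrate how a file like navigation.xml lets the navigation section be looked up from a URI instead of being encoded in the URI itself, here is a small Python sketch. It uses a shortened copy of the excerpt as sample data. The function name `section_for` is hypothetical, and the site's real lookup (presumably done in XSLT) may differ, for instance in how `pattern` attributes on `hidden` elements are matched, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

NS = {"nav": "http://law.seattleu.edu"}

# Shortened sample based on the excerpt; not the full configuration file.
NAVIGATION_XML = """
<navigation xmlns="http://law.seattleu.edu">
  <menu display="Welcome" sectionId="welcome">
    <link href="/" display="SU Law Home"/>
    <link href="/history" display="History"/>
    <hidden href="/privacy" display="Privacy Statement"/>
  </menu>
</navigation>
"""


def section_for(uri, root):
    """Return the sectionId of the top-level menu that lists the given URI."""
    for menu in root.findall("nav:menu", NS):
        # iter() also walks nested menus, so deeper entries are found too.
        for entry in menu.iter():
            if entry.get("href") == uri:
                return menu.get("sectionId")
    return None


root = ET.fromstring(NAVIGATION_XML)
```

Moving `/history` to another section would then mean moving one `link` element in the configuration file; the URI and the file on disk stay where they are.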
Slide 20: The benefits of URI-navigation independence
- Pages can be moved from one section of the site to another by simply editing one file (navigation.xml)
- Navigation structure can change without needing to update any links or change any URIs (thereby rendering them uncool)
- Files do not need to be moved around just because corresponding pages move around the site
Slide 21: XML-based configuration of the Web site sidebar
    <sidebar xmlns="http://law.seattleu.edu">
      <allButtons>
        <promotion id="laptop" img="laptoppurchase.gif"
                   alt="Student Laptop Purchase Program (Dell)"
                   href="/technology/purchase"/>
        <profile id="cmhall" alt="Christian Halliburton Video"
                 movie="cmhall.rm"/>
        <quote id="cumbow" img="cumbow.gif" alt="Cumbow Quote"/>
        ...
      </allButtons>
      ...
      <section id="faculty">
        <profile idref="cmhall"/>
        <quote idref="cumbow"/>
        <promotion idref="giving"/>
        <promotion idref="newfaculty"/>
        <promotion idref="laptop"/>
      </section>
      ...
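The sidebar file defines each button once under `allButtons` and then refers to it by `idref` from each section. A minimal Python sketch of resolving those references follows. The sample data is a shortened version of the excerpt, and the function name `buttons_for` is hypothetical; the site itself would most likely do this resolution in XSLT.

```python
import xml.etree.ElementTree as ET

NS = {"sb": "http://law.seattleu.edu"}

# Shortened sample based on the excerpt above.
SIDEBAR_XML = """
<sidebar xmlns="http://law.seattleu.edu">
  <allButtons>
    <promotion id="laptop" img="laptoppurchase.gif"
               alt="Student Laptop Purchase Program (Dell)"
               href="/technology/purchase"/>
    <quote id="cumbow" img="cumbow.gif" alt="Cumbow Quote"/>
  </allButtons>
  <section id="faculty">
    <quote idref="cumbow"/>
    <promotion idref="laptop"/>
  </section>
</sidebar>
"""


def buttons_for(section_id, root):
    """Return the alt text of each button in a section, in display order."""
    # Index every button definition by its id.
    definitions = {b.get("id"): b for b in root.find("sb:allButtons", NS)}
    section = root.find(f"sb:section[@id='{section_id}']", NS)
    if section is None:
        return []
    # Follow each idref back to its definition.
    return [definitions[ref.get("idref")].get("alt") for ref in section]


root = ET.fromstring(SIDEBAR_XML)
```

Because every section only stores references, changing a button's image or link in `allButtons` updates it in every section that uses it.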
Slide 22: 3. Redhawk CMS
Slide 23: Redhawk, home-grown CMS
- Redhawk is a specialized XML content management system, based on 4Suite, an open-source platform for XML and RDF processing
- Named after the SU mascot
- Basic unit of storage is an XML document
- Supports development of custom Redhawk "document classes", which correspond to XML document types (or schemas)
- Provides basic CRUD (Create, Read, Update, Delete) and role-based workflow functionality
- Two types of users for each document class: Author and Editor
- Any Create, Update, or Delete request by an Author must be approved by an Editor before taking effect
- Pluggable WYSIWYG editing environments; so far we have developed support for Altova's free browser-based XML editor, Authentic 5
- Future plans to support Microsoft InfoPath and Word 2003
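The Author/Editor rule can be pictured as a queue of pending requests that change nothing until an Editor approves them. The Python sketch below is purely illustrative and is not Redhawk's implementation (Redhawk is built on 4Suite and XSLT); the class, method, and field names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Request:
    action: str                    # "create", "update", or "delete"
    doc_id: str
    content: Optional[str] = None  # XML text for create/update


@dataclass
class DocumentClass:
    documents: Dict[str, str] = field(default_factory=dict)
    pending: List[Request] = field(default_factory=list)

    def submit(self, request: Request) -> None:
        """Author side: queue a request without touching stored documents."""
        if request.action not in {"create", "update", "delete"}:
            raise ValueError(f"unsupported action: {request.action}")
        self.pending.append(request)

    def approve(self, request: Request) -> None:
        """Editor side: apply a pending request so it takes effect."""
        self.pending.remove(request)
        if request.action == "delete":
            self.documents.pop(request.doc_id, None)
        else:
            self.documents[request.doc_id] = request.content
```

A submitted request leaves `documents` unchanged until `approve` is called, which is the behavior the slide describes for Create, Update, and Delete.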
Slide 24: Create New Announcement form
Slide 25: Current Redhawk applications
- Announcements and events for the Docket (migration from a custom production application in process)
- Access to Justice Institute's Hague Project, for managing Hague Convention-related case information (in production)
Slide 26: 4. An acronym you should know: XSLT
Slide 27: The common denominator: XSLT (Extensible Stylesheet Language Transformations)
- Used in Cocoon to assemble all pages (XSLT is the default type of "Transformer")
- Used in our site build process, via Ant's <xslt> task for collectively applying transformations over multiple files
- Built in to 4Suite and used throughout Redhawk to assemble pages, create documents, and implement the core CMS logic (with the help of extensions)
- Used in the Google Appliance to style the output of search results
- Used in Redhawk in the browser to apply supplemental "clean-up" transformations to the XML resulting from Authentic editing
- Growing abundance of conformant XSLT processors, including IE6 and Mozilla support, as well as a growing number of powerful tools
- And XSLT is reaching mainstream technology status: Microsoft Office 2003 will pervasively employ XSLT for the development of custom XML solutions, particularly in Word, Excel, Access, and InfoPath.
Slide 28: References
- http://cocoon.apache.org
- http://4suite.org
- http://ant.apache.org
- Cool URIs don't change: http://www.w3.org/Provider/Style/URI.html
- Cocoon and 4Suite for Content Management: The Best of Both Worlds at Seattle University School of Law: http://www.xmlportfolio.com/xmleurope2003/
Slide 29: Questions?