Title: Office Automation
Office Automation Intranets
Lecture 12: Advanced Systems: Dynamic Generation of Web Pages, Embedded Servers
Agenda (1)
- we consider some advanced web systems which are either available, or possible and useful
- all use as a basis some form of the Common Gateway Interface (CGI), a capability available in all web servers
- some of these advanced web systems are currently under development in the Dept. of Business Systems
Agenda (2)
- the following topics are discussed:
- dynamic page generation (also known as dynamic documents), in order to introduce basic CGI concepts
- embedded web servers, which can be very useful for certain applications; the example provided is a genre prototyping tool
Agenda (3)
- intelligent hypertext systems that can provide users with previews of documents prior to jumping to them
- dynamic site structure: we have described the utility of generating pages on the fly using CGI (in this lecture) and Server Side Includes (discussed elsewhere), but it is also possible and useful to have the site structure change dynamically
Dynamic Page Generation
Dynamic Page Generation: Definition
- another name for this is dynamic documents
- documents can be generated on the fly from information that is
  - constantly updated,
  - generated algorithmically, or
  - the result of a search
Dynamic Page Generation: Utility
- widely used on the web as gateways to other information systems and applications
- used to process the input from forms and image maps
- within a web resource, links may be created to virtual documents which, when requested, are generated before being served
Dynamic Page Generation: Gateways and Forms
- not all information services fit the Web authoring mold, in which static files are placed in directories
- sometimes the information must be generated dynamically from a database
- gateways solve these problems by providing an extension mechanism for the Web server
Dynamic Page Generation: Common Gateway Interface Specification
- the interface between the server and the programs that generate dynamic documents is defined by the Common Gateway Interface (CGI) specification
- a related mechanism for generating dynamic information (described previously) is the Server Side Include
Dynamic Page Generation: Gateways and Forms
- gateways take an information source that doesn't fit the web authoring mold and make it look to the browser like a file on the Web server
- in practice, the gateway is just a script or program invoked by your web server; it accepts user input data through the Web server and can output HTML
Dynamic Page Generation: Gateways and Forms
- forms are used on the WWW as a way of collecting input to a script or program on your server
- form scripts are closely related to gateways: these scripts pass data to and from the Web server
- the Common Gateway Interface (CGI) is a mechanism for communicating between a gateway and a web server
Dynamic Page Generation: Common Gateway Interface Specification
- the CGI specification describes how HTTP servers interact with external gateway programs
- these external gateway programs, called CGI scripts, can be written in almost any language, including Perl, AWK, C and Visual Basic
Dynamic Page Generation: CGI Operation
- information is passed to CGI scripts via environment variables and via the standard input stream
- CGI scripts are written to output an HTML document preceded by MIME headers
- the server sends the output to the client's browser, and the MIME headers tell the browser how to display the document
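Here is a minimal sketch of this cycle in Python (the lecture does not prescribe a language). The function takes the CGI environment variables as a dictionary and returns the whole response: a MIME header, a blank line, then an HTML document. The environment is passed in as a parameter rather than read from os.environ inside the function, so the logic can be tried without a web server. The page content is purely illustrative.

```python
import html
import os


def cgi_response(environ):
    """Build a CGI response: MIME header, blank line, then an HTML body."""
    # REQUEST_METHOD and QUERY_STRING are standard CGI environment variables.
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    body = (
        "<HTML><HEAD><TITLE>CGI test</TITLE></HEAD><BODY>\n"
        f"<P>Method: {html.escape(method)}</P>\n"
        f"<P>Query string: {html.escape(query)}</P>\n"
        "</BODY></HTML>\n"
    )
    # The Content-type header tells the browser how to display the document;
    # the empty line separates the headers from the body.
    return "Content-type: text/html\n\n" + body


if __name__ == "__main__":
    # When run by a web server, the CGI variables arrive in os.environ
    # and the response is written to standard output.
    print(cgi_response(os.environ), end="")
```

Under a web server, the script would run as a CGI program, with the server supplying REQUEST_METHOD, QUERY_STRING and the other variables.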
Dynamic Page Generation: Environment Variables (Ford 1995, 145-146)
Dynamic Page Generation: CGI Security Concerns
- CGI scripts can open up potential security loopholes in web applications
- this is because they have the ability to access information from outside the usual web directory hierarchy
- CGI scripts must therefore be considered potentially untrustworthy
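One common precaution against the directory-hierarchy problem is sketched below, assuming a hypothetical document root DOC_ROOT. Any file name taken from user input is resolved to an absolute path and refused if it lands outside that root. This is one illustrative check, not a complete security policy.

```python
from pathlib import Path

# Hypothetical directory that the script is allowed to read from.
DOC_ROOT = Path("/var/www/docs")


def safe_path(requested, root=DOC_ROOT):
    """Resolve a user-supplied relative path, refusing to leave `root`."""
    root = root.resolve()
    candidate = (root / requested).resolve()
    # A request such as "../../etc/passwd" (or an absolute path) resolves
    # outside the root and is rejected instead of being opened.
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes document root: {requested!r}")
    return candidate
```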
Dynamic Page Generation: Example Tutorial Preference System
Dynamic Page Generation: Verify Tutorial Preferences Screen
Dynamic Page Generation: HTML Tags <FORM>...</FORM>
- requests are generally gathered using a <FORM> tag, which includes the attributes:
- ACTION: the URL to which the contents of the form are to be sent, generally a CGI script
- METHOD: the HTTP request method used to send the data, e.g. GET or POST
- ENCTYPE: the format of the submitted data (if necessary)
Dynamic Page Generation: Verify Tutorial Preferences <FORM>
    <CENTER><H2>FACULTY OF COMMERCE<BR>TUTORIAL PREFERENCE SYSTEM<P>Verify Tutorial Preference</H2></CENTER>
    <FORM METHOD="POST" ACTION="/cgi-bin/tps-cgi">
    <INPUT TYPE="hidden" NAME="state" VALUE="13">
    <INPUT TYPE="hidden" NAME="chksum" VALUE="91wlyrn">
    <P><BR><UL>
    <B>PLEASE ENTER YOUR STUDENT NUMBER</B>
    <INPUT TYPE="text" NAME="studnum" MAXLENGTH="7" SIZE="7">
    <B>PLEASE ENTER YOUR BIRTHDATE (ddmmyy)</B>
    <INPUT TYPE="password" NAME="dob" MAXLENGTH="6" SIZE="6">
    </UL>
    <CENTER>
    <B>EXIT your web browser after viewing your tutorial preferences,<BR>
    otherwise anyone using this computer after you will be able to access your information</B>
    <P>
    <INPUT TYPE="submit" VALUE="Click Here to Continue">
    <INPUT TYPE="reset" VALUE="Click Here to Clear Entries"><P>
    </CENTER>
    </FORM>
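The following sketch shows checks that a script such as tps-cgi might apply to this form before consulting any records. The field names studnum and dob come from the form above. The validation rules are assumptions inferred from the MAXLENGTH attributes: exactly seven digits for a student number, and a real calendar date in ddmmyy form for the birthdate. The hidden state and chksum fields are not checked because their meaning is not described here.

```python
import re
from datetime import datetime


def validate_preferences_form(fields):
    """Check the studnum and dob fields from the Verify Tutorial Preferences form.

    Returns a list of error messages; an empty list means the input looks valid.
    Assumes (hypothetically) that a student number is exactly seven digits.
    """
    errors = []
    studnum = fields.get("studnum", "").strip()
    dob = fields.get("dob", "").strip()
    if not re.fullmatch(r"\d{7}", studnum):
        errors.append("Student number must be 7 digits.")
    try:
        # Require six digits first: strptime alone would accept shorter
        # strings such as "15478" because %d and %m allow single digits.
        if not re.fullmatch(r"\d{6}", dob):
            raise ValueError
        datetime.strptime(dob, "%d%m%y")  # rejects e.g. 31 February
    except ValueError:
        errors.append("Birthdate must be a valid date in ddmmyy form.")
    return errors
```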
Dynamic Page Generation: Information Flow
- when the user enters text on a form and presses the return key (or clicks the submit button), the web browser sends the data entered by the user to the web server (for example, the NCSA web server is called the HTTP daemon, or httpd)
- the web server accepts the input, starts up the gateway and hands the input to the gateway via CGI
Dynamic Page Generation: Information Flow
- the user's input is passed to the gateway either via
  - an environment variable (QUERY_STRING), called the GET method, or
  - standard input, called the POST method
- the gateway then parses the input and processes it (e.g. sends a retrieval command to a database)
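The two delivery methods can be handled in one place. This Python sketch returns the decoded fields for either method. The environ and stdin arguments are passed in so the function can be exercised without a server; in a real CGI script they would be os.environ and sys.stdin. It assumes the default application/x-www-form-urlencoded ENCTYPE, whose data is ASCII-only, so character and byte counts agree.

```python
from urllib.parse import parse_qs


def read_form_input(environ, stdin):
    """Return form fields as a dict of lists, for either GET or POST."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the encoded data arrives on standard input, and
        # CONTENT_LENGTH says how much of it to read.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # GET: the encoded data arrives in the QUERY_STRING variable.
        raw = environ.get("QUERY_STRING", "")
    # parse_qs decodes "+" and %xx escapes and groups repeated names.
    return parse_qs(raw, keep_blank_values=True)
```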
Dynamic Page Generation: Information Flow
- the gateway may generate HTML output (via a template)
- the output may then be
  - returned to the web server to pass on to the client,
  - saved in a file or database, or
  - sent to someone via email
Dynamic Page Generation: Gateway Scripts
- may be scripts or programs written in C/C++, Perl, Tcl, the C shell or the Bourne shell
- Perl
- Tcl stands for Tool Command Language and is pronounced "tickle"
- the C shell and Bourne shell are interactive command interpreters and command programming languages for UNIX
Dynamic Page Generation: CGI Gateway Scripts, HTML Output
- CGI gateways that generate HTML output are required to preface the HTML output to stdout with the following line:
    Content-type: text/html
- this line must be followed by a blank line before the first <HTML> tag is sent
Dynamic Page Generation: CGI Gateway Scripts, Non-HTML Output
- the gateway need not generate HTML
- it could return the URL of another file, indicating to the browser that it should get that file; this is called URL redirection
- CGI gateways using URL redirection write the following line to stdout (again followed by a blank line):
    Location: URL
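Here is a minimal sketch of a redirecting gateway's output in Python. Rejecting line breaks is an added precaution: it stops a URL built from user input from smuggling extra header lines into the response. Under the CGI 1.1 specification (RFC 3875), a full URL produces a redirect sent to the browser, whereas a local path may instead be served internally by the server.

```python
def redirect_response(url):
    """Return a CGI response that redirects the browser to `url`."""
    # Reject embedded line breaks so user data cannot inject extra headers.
    if "\r" in url or "\n" in url:
        raise ValueError("URL must not contain line breaks")
    # A Location header terminated by a blank line, with no document body.
    return f"Location: {url}\n\n"
```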
Dynamic Page Generation: Information Flow
Embedded Web Servers
Embedded Web Servers: Traditional Client-Server Model
- so far our discussions in this course have centred on a traditional client-server model for web systems
- a client sends a request to a remote server, and eventually
- a response is returned to the client based on the operation of the remote server
Embedded Web Servers: Client-side Servers
- so useful is this arrangement that we rarely question it, but it is not necessary to have a server running remotely
- rather, it may be useful to have one or many temporary web servers instantiated and executed client-side
Embedded Web Servers: CASE Tool for Genre Analysis (GASP)
- genre can be applied to analysing the structure of workpractices (Clarke 2000)
- to speed up the description of workpractices, a CASE tool is being developed in the Dept. of Business Systems
- the system, called the Genre and Action Sequence Processor (GASP), uses a client-side or embedded web server
Embedded Web Servers: CASE Tool for Genre Analysis (GASP)
- users and analysts jointly build up genre sequences consisting of a set of nodes and links
- the nodes are web pages which may contain textual descriptions, forms etc. describing a stage in a workpractice
- alternatively, nodes may contain video clips of action collected in the field
Embedded Web Servers: Operation of GASP
- the nodes and links are dynamically created client-side using an embedded server
- users make requests to the embedded server for a page
- but the page is generated only when it is needed, from a directed graph of the genre which is stored in a name space
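The idea of generating each page only when it is requested can be sketched with Python's built-in http.server module. The genre digraph GENRE, its node names and the port number are all hypothetical stand-ins for GASP's name space. The server binds only to 127.0.0.1 because it runs on the client for a single user.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical genre digraph: each node maps to the nodes reachable from it.
GENRE = {
    "start": ["enquiry"],
    "enquiry": ["quote", "end"],
    "quote": ["order", "end"],
    "order": ["end"],
    "end": [],
}


def render_node(node):
    """Generate the page for one node only at the moment it is requested."""
    if node not in GENRE:
        return None
    links = "".join(
        f'<LI><A HREF="/{html.escape(n)}">{html.escape(n)}</A></LI>' for n in GENRE[node]
    )
    return f"<HTML><BODY><H2>{html.escape(node)}</H2><UL>{links}</UL></BODY></HTML>"


class GenreHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore any query string; the path names the requested node.
        page = render_node(urlsplit(self.path).path.strip("/") or "start")
        if page is None:
            self.send_error(404, "No such node")
            return
        data = page.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    # Bind to localhost only: the server runs client-side for one user.
    HTTPServer(("127.0.0.1", port), GenreHandler).serve_forever()
```

To try it, call serve() and point a browser at http://127.0.0.1:8080/ to see the start node. Each link asks the embedded server for the next node, which is only then rendered.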
Embedded Web Servers: Operation of GASP
- state information about the user's traversal of the genre digraph is
  - stored in the URL, and
  - ultimately written to a database when the user reaches the end-of-sequence symbol for the digraph, or when the system times out
- stored traversals are the basis for usability analysis!
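Carrying the traversal state in the URL, and writing it out once the end-of-sequence node is reached, might look like the following sketch. The query parameter name trail, the node name end and the traversals table are hypothetical. An in-memory SQLite database stands in for GASP's real database, and the time-out case is not shown.

```python
import sqlite3
from urllib.parse import parse_qs, urlencode

END = "end"  # hypothetical end-of-sequence node name


def next_url(current_query, node):
    """Append `node` to the traversal trail carried in the URL query string."""
    trail = parse_qs(current_query).get("trail", [""])[0]
    trail = f"{trail},{node}" if trail else node
    return f"/{node}?" + urlencode({"trail": trail})


def record_if_finished(db, query, node):
    """Store the completed traversal once the end node is reached."""
    if node != END:
        return False
    trail = parse_qs(query).get("trail", [""])[0]
    db.execute("INSERT INTO traversals (trail) VALUES (?)", (trail,))
    db.commit()
    return True
```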
Embedded Web Servers: GASP as a CASE Environment
- GASP can form part of a CASE environment using distributed web technologies
- by configuring GASP to echo its activity to a proxy server, a project manager would be able to see exactly what the users and analysts are doing in real time
Embedded Web Servers: GASP as a CASE Environment
Intelligent Hypertext System
Intelligent Hypertext System: Need to Understand Textual Resources
- in Lecture 10, we discussed issues relating to text resources
- re-purposing texts as hypertexts can disrupt the communicative utility of the original texts (see also T909-10.DOC)
- throughout this course we have shown how understanding texts could help us create widgets enabling users to traverse large hyper-documents while simultaneously using less screen real estate
Intelligent Hypertext System: Inability to Preview prior to Jumping
- one problematic aspect of the WWW is that users are not able to preview textual resources prior to jumping to them
- this promotes superficial and inefficient reading practices: skim, scroll and peck (Clarke 1995)
- it increases the number of hits on the server, increases user frustration, etc.
Intelligent Hypertext System: Preview prior to Jump Feature
- the ability to preview a resource prior to jumping to it would give users much greater control over what they retrieve
- its absence is frustrating because this capability has been available on early microcomputer-based hypertext systems (e.g. HyperCard and SuperCard on the Apple Macintosh)
Intelligent Hypertext System: Thematic Information Resources of Texts
- the texture-forming resource that readers need in order to predict what will occur next in a text is referred to as theme
- usually associated with theme is the texture-forming resource called information, by which new meanings can be created in a text from previously accumulated given meanings
Intelligent Hypertext System: Preview Features Emphasise Theme
- each intranet text for which previews are required would need to have encoded in it
  - various Themes at the level of the clause,
  - various hyper-Themes at the level of the paragraph, and
  - the so-called macro-Theme at the level of an overall abstract for the text
Intelligent Hypertext System: Determining Thematic Resources
- it is unlikely that we will ever be able to completely automate the analysis of texts for thematic resources
- tools are available to support a linguist in conducting this kind of textual analysis (see Michael O'Toole's SFL WWW site), but
- the analysis only needs to be done once for each hyper-document, and then only needs to be repeated each time the document is amended
Intelligent Hypertext System: Encoding Thematic Information (1)
- the results of thematic analysis need to be encoded in the hyper-document
- the thematic analysis must
  - move with the document,
  - be copied whenever the document is duplicated within the originating web site, and
  - not interfere with the rendering or processing of the document on other sites
Intelligent Hypertext System: Encoding Thematic Information (2)
- the standard HTML way of adding user-defined content to a document is the META tag
- there are conventional uses of this tag (e.g. Description, Keywords), but no explicit standardisation limits what can be encoded into a hyper-document using this tag
- META information is not displayed in the browser: users don't know it exists unless they select View Page Source
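For illustration, thematic analysis results could be emitted as META tags with a small helper like the one below. The META names (e.g. Abstract) are hypothetical, since how thematic information should be encoded is still an open design question. The helper only ensures that the encoded text cannot break the tag syntax.

```python
import html


def thematic_meta_tags(themes):
    """Render thematic analysis results as HTML META tags for a document HEAD.

    `themes` maps a META name (hypothetical, e.g. "Abstract") to its text.
    """
    lines = []
    for name, content in themes.items():
        # Escape quotes and angle brackets so the text cannot break the tag.
        lines.append(
            f'<META NAME="{html.escape(name)}" CONTENT="{html.escape(content)}">'
        )
    return "\n".join(lines)
```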
Intelligent Hypertext System: Encoding Thematic Information (3)
- thematic resources are organised into chains which flow through a text
- each text will have a pattern of themes called a thematic progression (examples include simple, multiple and zig-zag progressions)
- how to efficiently encode these chains into META tags will form the basis of a Masters project (anyone interested?)
Intelligent Hypertext System: Encoding Thematic Information (4)
- once the chain encoding is developed, it is likely to be applicable to other resources as well, including information
- each text only needs to know about its own thematic structure
- the thematic preview of a document referenced by, or reachable from, the current one is provided by a server-side process
Intelligent Hypertext System: Encoding Thematic Information (5)
- it is assumed that all documents in the intranet have encoded in them the required thematic and associated information resources
- this involves some additional preparation work during re-purposing or document creation, but the resulting increase in functionality would be well worth it
Intelligent Hypertext System: Encoding Thematic Information (6)
Intelligent Hypertext System: Suggested Architecture (1)
- if the user rolls their mouse over a link to an encoded document, then
  - as is normally the case, clicking the left mouse button enables the user to jump immediately to that document, but
  - clicking the right mouse button opens up the usual menu of choices (Edit Linked Item, View Linked Item etc.) and also displays at the top of this list an option called About this Item
Intelligent Hypertext System: Suggested Architecture (2)
- the About this Item option is not a standard option on this menu; it is included by the intranet operators
- adding menu items is a relatively straightforward configuration detail when using the Netscape browser
- it is something that is easily set up by intranet developers
Intelligent Hypertext System: Suggested Architecture (3)
- clicking on the About this Item option sends a GET request to run the Theme CGI program on the web server, passing along the document pointed to by the link
- as part of the operation of the Theme CGI program, a META tag parser is run on that document
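The request sent by the About this Item option could be built as follows. The CGI path /cgi-bin/theme and the parameter name doc are hypothetical. The point of the sketch is that the linked document's URL must be percent-encoded so that its own ?, & and / characters survive as a single parameter value.

```python
from urllib.parse import urlencode


def theme_request_url(linked_doc, cgi="/cgi-bin/theme"):
    """Build the GET URL asking the (hypothetical) Theme CGI about a linked document."""
    # urlencode percent-encodes "/", "?", "=" and "&" inside the value.
    return cgi + "?" + urlencode({"doc": linked_doc})
```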
Intelligent Hypertext System: Suggested Architecture (4)
- the Theme CGI program sends back a response in the form of a dynamically generated hyper-document consisting of the output of the META tag parser
- a dependent window is opened in the user's browser to display the hyper-document; options can be selected for pulling up the required information
Intelligent Hypertext System: Suggested Architecture (5)
- users could select from a range of available options, depending on what text resources were encoded in the document's META tags, including:
  - Abstract: encoded macro-Theme
  - Topics: encoded hyper-Themes
  - Information: encoded hyper-New
  - Summary: encoded macro-New
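Here is a sketch of the Theme CGI program's core in Python. It uses the standard library's html.parser to collect META tags and returns a preview page listing whichever of the four options are present. It assumes the META NAME values are exactly Abstract, Topics, Information and Summary. Fetching the linked document and the CGI plumbing are left out.

```python
import html
from html.parser import HTMLParser

# Assumed META names for the four preview options.
PREVIEW_FIELDS = ("Abstract", "Topics", "Information", "Summary")


class MetaThemeParser(HTMLParser):
    """Collect NAME/CONTENT pairs from <META> tags in a document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names (not values).
        if tag == "meta":
            a = dict(attrs)
            if a.get("name") and a.get("content") is not None:
                self.meta[a["name"]] = a["content"]


def about_this_item(document_html):
    """Return the preview page the Theme CGI might send back."""
    parser = MetaThemeParser()
    parser.feed(document_html)
    parser.close()
    items = [
        f"<DT>{name}</DT><DD>{html.escape(parser.meta[name])}</DD>"
        for name in PREVIEW_FIELDS
        if name in parser.meta
    ]
    return "<HTML><BODY><DL>" + "".join(items) + "</DL></BODY></HTML>"
```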
Intelligent Hypertext System
Scalable/Dynamic Site Structure
Scalable Site Structure: Scalable Web Sites
- some web hosting service companies understand that corporate sites must be scalable, that is, the entire site can change its scale or size
- small companies may initially need only a small web presence...
- ...but over several years they may need to add more extensive e-commerce and extranet capabilities
Scalable Site Structure: Scalable Web Sites
- we will give an overview of the technology being developed by a new company called Loudcloud
- founded by some senior former employees of Netscape, including its co-founder Marc Andreessen...
- ...and former Netscape/AOL executives Ben Horowitz, Tim Howes and In Sik Rhee (www.loudcloud.com/company/index.html)
Scalable Site Structure: Scalable Web Sites
- Loudcloud uses a technology it has developed, called Opsware automation technology, to enable sites to be scaled in response to planned or unexpected massive increases in demand
- Opsware supports the allocation of additional Capacity On Demand within minutes of a request!
Scalable Site Structure: Scalable Web Sites
- a customer can dynamically add or delete services as required; Loudcloud refers to these as Smart Cloud technologies
- Smart Clouds are predefined components
- Loudcloud can do all this because it controls the construction and hosting of each web site it supports
- each web site is heavily instrumented and centrally controlled
Scalable Site Structure: Loudcloud's Scalable Web Sites
- a company's Internet applications are built on top of Loudcloud's architecture
- each Internet service, referred to as a Smart Cloud, is built on top of Opsware automation technology
- Opsware technology automates manual tasks, including
  - capacity scaling
  - system configuration provisioning
  - site versioning
- a range of hardware and systems software can be used to support Loudcloud's environment
Scalable Site Structure
- scale to fit load requests (hits)
Dynamic Site Structure: Rationale (1)
- there is a related but perhaps even more radical possibility than having scalable web sites, one which has a great deal of commercial promise
- we can extend the idea of dynamic web pages to that of dynamic web sites: sites that change their structure to accommodate use
Dynamic Site Structure: Rationale (2)
- experience with developing web sites should suggest to you that content determines the structure of web sites
- yet there is an assumption that web sites should have a static structure, one which does not change over time, or alternatively
- that it is either not useful, or too much bother, to change the site structure
Dynamic Site Structure
- this is the case even when we use
  - Dynamic HTML (DHTML) and JavaScript to produce pages which appear to be changing, or
  - pages generated on demand, that is, dynamically generated pages resulting from searches, database queries etc.
Dynamic Site Structure
- in Lecture 9, we discussed the installation of a web server, focussing on the NCSA httpd server; this was tested by installing an Apache server for Windows/NT on zathros at the Department of Business Systems
- recall that we will also need to configure and manage the server, and analyse the server log files, which grow as a consequence of accesses made by users of the web server
Dynamic Site Structure
- if we study what users are accessing on the web server, then we can re-organise the site structure to assist users in their requests for pages and resources
- this would result in requests for resources being served more quickly or more easily
- this may also permit more users to be served by a web site
Dynamic Site Structure
- re-organising web sites to promote functionality, based on analysing the requests made by users, is currently being done by some consultants in Australia
- usually this leads to an improved but nonetheless static site structure, which enables users to access resources more easily
Dynamic Site Structure: Necessity for Structural Change
- changing the site structure can be useful:
  - over the long term, site structures do and should evolve as their uses are further refined or redefined (form should follow function)
  - in the short term, usage patterns may change diurnally, for example, intensive web-database queries by local workers during the daytime, and FTP requests by overseas workers during the evening
Dynamic Site Structure: Technical Feasibility
- it is technically feasible to generate a redirect page which either
  - informs users that a page has moved, with a reminder to bookmark the new location (you have probably encountered this already), or
  - automatically redirects a user to the new location, sometimes without the user being aware that this has occurred
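Both kinds of redirect page can come from one generator, sketched below using a META refresh. A delay of a few seconds shows the notice and bookmark reminder before forwarding. A delay of 0 forwards immediately, so the user may not notice the move. The function name and default delay are illustrative choices.

```python
import html


def moved_page(new_url, delay_seconds=5):
    """Build a 'page has moved' notice that also forwards the browser.

    delay_seconds=0 forwards immediately, so the user may not notice the move.
    """
    url = html.escape(new_url)
    return (
        "<HTML><HEAD>"
        f'<META HTTP-EQUIV="Refresh" CONTENT="{int(delay_seconds)}; URL={url}">'
        "</HEAD><BODY>"
        f'<P>This page has moved to <A HREF="{url}">{url}</A>. '
        "Please update your bookmarks.</P>"
        "</BODY></HTML>"
    )
```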
Dynamic Site Structure: Approach (1)
- although there are no technical impediments to changing site structure, and some justification for doing it, the question becomes how best to change the site structure
- as mentioned earlier, it requires access to the server log file, and a willingness to set up this file to record as much as possible about user requests
Dynamic Site Structure: Approach (2)
- the page hits for each user session must be recorded in the site log
- these records of user sessions can identify the parts of the site structure hierarchy that are being intensively used
- the analysis would likely proceed by creating a weighted tree of usage across the site topology
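A first step towards the weighted tree is sketched below. The function counts successful hits from the server log and credits each hit to every node on the path from the site root down to the requested page. It assumes the log is in the Common Log Format written by NCSA httpd and Apache. It weights nodes by hits only and does not yet reconstruct individual user sessions.

```python
import re
from collections import Counter

# Common Log Format: host ident user [date] "METHOD path PROTOCOL" status bytes
CLF = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)[^"]*" (\d{3}) \S+')


def usage_weights(log_lines):
    """Weight every node of the site hierarchy by the successful hits beneath it."""
    weights = Counter()
    for line in log_lines:
        m = CLF.match(line)
        if not m or not m.group(2).startswith("2"):
            continue  # skip malformed lines and unsuccessful requests
        path = m.group(1).split("?", 1)[0]
        parts = [p for p in path.split("/") if p]
        # Credit the hit to the root and to each node on the way down,
        # so intermediate nodes carry the total weight of their subtree.
        weights["/"] += 1
        for i in range(1, len(parts) + 1):
            weights["/" + "/".join(parts[:i])] += 1
    return weights
```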
Dynamic Site Structure: Approach (3)
- just as with dynamic web pages on primarily static sites, not all sections or weblets in a web site will need a dynamic site structure
- the key to implementing an efficient dynamic site structure is to isolate those parts of a site topology that will need to be changed
Dynamic Site Structure: Approach (4)
- for those parts of the site structure that require routine change, server-side system programs may need to
  - rearrange web pages at the terminal nodes, or leaves, of the weighted tree
  - provide additional intermediate nodes in the form of additional pages
  - change, verify and manage internal links between pages to remove the possibility of bad links
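The link-checking task in particular lends itself to automation (compare Schwartz 1999). The sketch below takes a hypothetical in-memory map of page paths to HTML. It resolves each relative link against its page and reports internal links whose target page no longer exists; external links are skipped.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Gather the HREF of every <A> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def broken_internal_links(site):
    """Report (page, href) pairs whose target is missing from `site`.

    `site` maps page paths (e.g. "/lectures/index.html") to their HTML.
    """
    broken = []
    for page, text in site.items():
        collector = LinkCollector()
        collector.feed(text)
        collector.close()
        for href in collector.hrefs:
            target = urlsplit(urljoin(page, href))
            if target.scheme or target.netloc:
                continue  # external link: outside the restructured site
            if target.path not in site:
                broken.append((page, href))
    return broken
```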
Dynamic Site Structure: Pattern Recognition, Neural Nets
- the required usability analysis could be implemented by applying pattern recognition techniques
- this is suggestive of neural network technologies, in which web site usage is learnt by example; there are research opportunities in this work!
- there is a great deal of money to be made, as this information on user behaviour is just a form of consumer profiling
Dynamic Site Structure: Analogous Approach found in HCI
- this process is analogous to applying so-called usability analysis, developed in HCI for improving user interfaces
- this involves analysing session transcripts for evidence of repeating sequences of keystrokes during the operation of a system
- the most frequently occurring runs are identified, then reduced or removed while providing the same functionality
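Finding the most frequent runs can be sketched as counting fixed-length sequences of consecutive requests (n-grams) across sessions. Here the runs are page requests rather than keystrokes, and the run length and cut-off are arbitrary parameters. The technique is simple counting, not the neural network approach mentioned earlier.

```python
from collections import Counter


def frequent_runs(sessions, length=3, top=5):
    """Count the most common runs of `length` consecutive requests across sessions.

    `sessions` is a list of request sequences, one per user session.
    """
    runs = Counter()
    for requests in sessions:
        for i in range(len(requests) - length + 1):
            runs[tuple(requests[i:i + length])] += 1
    return runs.most_common(top)
```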
Dynamic Site Structure: Outsourcing to Portal Sites
- to create a dynamic site structure there must be tight integration between the site logs and the web site itself
- if organisations want this feature but don't want to implement it themselves, they will need to outsource their entire web site operation and maintenance to a portal site
Dynamic Site Structure: Relevant Example ...
- a dynamic site structure could in principle be useful for an educational web site like the one being developed to support BUSS909
- at the moment, students must click on a Lectures link to get a list of all lecture files, then select the week or topic in order to choose one file from a maximum of 14 files
Dynamic Site Structure: Relevant Example ...
- students could click on a link called Current Lecture, which retrieves the file relevant to the current week
- previous lectures could be accessed via a section page called Previous Lectures
- future lectures could be accessed via a section page called Future Lectures
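Working out which lecture is current can be sketched as a date calculation. The session start date, the file naming scheme lecture01.html to lecture14.html and the one-lecture-per-week assumption are all hypothetical. The figure of 14 is the maximum number of lecture files mentioned above.

```python
from datetime import date

# Hypothetical session start (first day of week 1) and number of lectures.
SESSION_START = date(2000, 2, 28)
LAST_WEEK = 14


def lecture_week(today, start=SESSION_START, last=LAST_WEEK):
    """Return the teaching week for `today`, clamped to 1..last."""
    week = (today - start).days // 7 + 1
    return max(1, min(week, last))


def lecture_sections(today):
    """Split the (hypothetically named) lecture files into current, previous and future."""
    current = lecture_week(today)
    name = "lecture{:02d}.html".format
    return {
        "Current Lecture": name(current),
        "Previous Lectures": [name(w) for w in range(1, current)],
        "Future Lectures": [name(w) for w in range(current + 1, LAST_WEEK + 1)],
    }
```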
Dynamic Site Structure: Relevant Example ...
- we might implement this if
  - the site log files indicated that students were accessing the same file repeatedly, an indication that they did not know which was current, and
  - under the assumption that all lecture files were to be made available prior to delivery (not the case at the moment!)
Dynamic Site Structure
Figure: static site structure (Lectures, Past Lectures, Future Lectures) compared with a dynamic site structure across Weeks 2 to 4, shaded by user request activity (high, medium, low); the dynamic structure gives fewer incorrect hits, but the site structure would need to be revised weekly.
Acknowledgements
- the author gratefully acknowledges discussions with Joshua Fan, Department of Business Systems, who suggested the web architecture necessary for implementing the Intelligent Hypertext System, and who also alerted the author to the possibility of portal services providing organisations with web hosting services that would allow for the restructuring of web sites
- the architecture of GASP is being jointly developed as part of an ongoing project between Rodney Clarke and Tony McGrath, Wandering Albatross Consulting
References
- Clarke, R. J. (1995) 'WWW Page Metaphor Considered Harmful', Proceedings of OZCHI'95, University of Wollongong.
- Clarke, R. J. (2000) An Information System in its Organisational Contexts: A Systemic Semiotic Longitudinal Case Study, Unpublished PhD Dissertation, Department of Information Systems, University of Wollongong.
- Fan, J. (1999) Personal Communication.
- Ford, A. (1995) Spinning the Web: How to Provide Information on the Internet, London: International Thomson Computer Press.
- McGrath, T. (1999) Personal Communication.
- Schwartz, R. L. (1999) 'Programming with Perl: Step-by-Step Link Verification', Web Techniques 4(3), March 1999, 30-34.