Title: Assuming Accurate Layout Information for Web Documents is Available, What Now?
1 Assuming Accurate Layout Information for Web
Documents is Available, What Now?
Hassan Alam, Rachmat Hartono, Aman Kumar, Fuad Rahman,
Yuliya Tarnikova and Che Wilcox
Human Computer Interaction Group
BCL Technologies Inc., Santa Clara, CA 95050
www.bcltechnologies.com
fuad_at_bcltechnologies.com
2 Overview of the talk
- Web pages vs. document layout
- Why do we need layout information?
- Web page summarization for handheld devices
- The future: Marrying Ontology with XML
- Conclusion and Future Work
3 Related Work
4 Web Page Summarization for Handheld Devices
Web Page Data Structure
Content Processing for Re-authoring
Content Analysis
Representing the Complete Web page
Node Merging
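The slide names node merging as a step but gives no details, so the following is only a minimal sketch of one plausible interpretation: adjacent nodes with the same role are merged while the accumulated text is still short. The `Node` class, the `role` labels, and the `min_chars` threshold are all assumptions made for illustration, not the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass
class Node:
    role: str  # hypothetical content label, e.g. "link" or "story"
    text: str


def merge_nodes(nodes, min_chars=40):
    """Merge a node into its predecessor when both share a role and the
    predecessor's text is still shorter than min_chars (assumed heuristic)."""
    merged = []
    for node in nodes:
        prev = merged[-1] if merged else None
        if prev is not None and prev.role == node.role and len(prev.text) < min_chars:
            merged[-1] = Node(prev.role, prev.text + " " + node.text)
        else:
            merged.append(Node(node.role, node.text))
    return merged


page_nodes = [
    Node("link", "Home"),
    Node("link", "Sports"),
    Node("link", "Weather"),
    Node("story", "Markets rallied on Tuesday."),
]
merged_nodes = merge_nodes(page_nodes)
```

Here the three short link nodes collapse into one navigation-like node, while the story node stays separate.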
5 Web Page Summarization for Handheld Devices
6 The Future: Marrying Ontology with XML
- We assume that we have layout information for a
web page
- What do we do then?
- How do we use this information?
- How does that information help us in getting better
re-authoring solutions?
We define an XML format to encode that information
We then define an ontology for that domain!
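As a rough illustration of coding layout information in XML, the sketch below builds a small XML tree with Python's standard `xml.etree.ElementTree`. The element and attribute names (`page`, `block`, `role`, `x`, `y`, `width`, `height`) are hypothetical; the talk does not specify the actual schema.

```python
import xml.etree.ElementTree as ET


def encode_layout(blocks):
    """Encode layout blocks as XML using a hypothetical schema."""
    page = ET.Element("page")
    for b in blocks:
        node = ET.SubElement(page, "block", {
            "id": b["id"],
            "role": b["role"],  # assumed content label
            "x": str(b["x"]),
            "y": str(b["y"]),
            "width": str(b["width"]),
            "height": str(b["height"]),
        })
        node.text = b["text"]
    return page


blocks = [
    {"id": "b1", "role": "heading", "x": 0, "y": 0,
     "width": 800, "height": 60, "text": "Daily News"},
    {"id": "b2", "role": "navigation", "x": 0, "y": 60,
     "width": 160, "height": 400, "text": "Home Sports Weather"},
    {"id": "b3", "role": "story", "x": 160, "y": 60,
     "width": 640, "height": 400, "text": "Markets rallied on Tuesday."},
]
layout_xml = encode_layout(blocks)
print(ET.tostring(layout_xml, encoding="unicode"))
```

Once the layout is in this form, later stages can query it by role or position, for example `layout_xml.find("block[@role='story']")`.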
7 What is an Ontology and How do We Define it?
Ontology establishes a joint terminology between
members of a community of interest.
These members can be human or automated agents.
To define an ontology for the domain of web pages, we need:
- A list of elements
- Concept hierarchy
- Concept association
- Rules or axioms
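The four components listed on this slide can be sketched as a small data structure. The concept names, the `describes` and `introduces` relations, and the `keep_together` rule below are illustrative assumptions, not the elements actually defined in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    elements: set = field(default_factory=set)      # list of elements
    parent: dict = field(default_factory=dict)      # concept hierarchy: child -> parent
    associations: set = field(default_factory=set)  # (subject, relation, object) triples
    rules: list = field(default_factory=list)       # rules or axioms, as callables

    def is_a(self, concept, ancestor):
        """Walk up the hierarchy to test whether concept is a kind of ancestor."""
        while concept is not None:
            if concept == ancestor:
                return True
            concept = self.parent.get(concept)
        return False


def keep_together(onto, a, b):
    """Hypothetical rule: two concepts stay together if an association links them."""
    return any({s, o} == {a, b} for s, _, o in onto.associations)


web = Ontology(
    elements={"block", "heading", "story", "image", "caption"},
    parent={"heading": "block", "story": "block",
            "image": "block", "caption": "block"},
    associations={("caption", "describes", "image"),
                  ("heading", "introduces", "story")},
    rules=[keep_together],
)
```

With this toy ontology, `web.is_a("caption", "block")` holds through the hierarchy, and `keep_together(web, "image", "caption")` holds through the association.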
8 A List of Elements in the Web Domain
9 Concept Hierarchy
and so on
10 Concept Association
and so on
11 Rules or Axioms
and so on
12 Web Page Summarization for Handheld Devices
using Ontology
Web Page Data Structure
Content Processing for Re-authoring
Content Analysis
Representing the Complete Web page
XML Structure Derived
Node Merging
Use Ontology to re-format the web page
Device Specific Display
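The last two steps of this pipeline, using the ontology to re-format the page and producing a device-specific display, might look roughly like the sketch below. The role priorities, the caption-to-image association, and the `max_blocks` display budget are assumptions chosen for illustration; the talk does not state how the ontology drives re-formatting.

```python
# Hypothetical ontology fragments: lower number means more important.
PRIORITY = {"heading": 0, "story": 1, "image": 2, "caption": 2,
            "navigation": 3, "advertisement": 4}
# Hypothetical association: a caption belongs to the image just before it.
ATTACHED_TO = {"caption": "image"}


def group_blocks(blocks):
    """Group each block with the preceding block it is associated with."""
    groups = []
    for role, text in blocks:
        target = ATTACHED_TO.get(role)
        if target and groups and groups[-1][-1][0] == target:
            groups[-1].append((role, text))
        else:
            groups.append([(role, text)])
    return groups


def reformat_for_device(blocks, max_blocks):
    """Order groups by priority and keep whole groups that fit the budget."""
    groups = sorted(group_blocks(blocks),
                    key=lambda g: PRIORITY.get(g[0][0], len(PRIORITY)))
    out = []
    for group in groups:
        if len(out) + len(group) > max_blocks:
            break  # never split an associated group across the cut
        out.extend(group)
    return out


page = [
    ("navigation", "Home Sports Weather"),
    ("heading", "Daily News"),
    ("advertisement", "Buy now"),
    ("story", "Markets rallied on Tuesday."),
    ("image", "chart.png"),
    ("caption", "Index levels this week"),
]
small_screen = reformat_for_device(page, max_blocks=4)
```

With a budget of four blocks, the heading, story, image, and caption are kept and the navigation and advertisement are dropped; with a budget of three, the image and caption are dropped together rather than separated.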
13 What is the Advantage of using Ontology?
- It improves the quality of the output in many
ways.
- It becomes possible to capture the contextual
relationship among various components within the
document.
- It leads to better understanding of the
information contained within the document.
- This additional information can be used in other
processes, such as document categorization and
contextual search.
14 Future Work
- It is assumed that the future of mobile browsing
lies in the adoption of semantic web technology.
- Until that is realized, the proposed approach
offers a workable compromise for generating
high-fidelity re-authored web pages.
- This is an exploratory paper offering a specific
pathway to the future of web page re-authoring,
provided accurate layout information is
available.
- Currently, it is beyond the capability of any
algorithm to achieve this level of accuracy.
However, approximations to that accuracy are
attainable and even practical. It will be
interesting to discuss other possibilities in
this space.
15 Conclusions
- We presented some ideas about how to produce better
web page re-authoring solutions by using linguistic
knowledge and ontology, assuming accurate layout
information for web pages is available.
- It is shown that such an approach will produce
high quality intelligent summaries of web pages,
allowing fast and efficient web browsing on small
display handheld devices.