Automatische Inhaltsübersicht für XML-Kollektionen (Automatic Content Overview for XML Collections)

Provided by: isInforma

1
Automatische Inhaltsübersicht für XML-Kollektionen
  • Diploma thesis introduction

2
Goals
  • To obtain custom summaries of specific XML
    collections.
  • Develop or extend methods for XML summarization.
  • Provide a prototype implementation for such XML
    summarization methods.

3
Structure
  • Theoretical part
    • Summarization
    • XML summarization
    • Template method
    • Data types
    • Summarization modules
  • Practical part
    • A summarization tool

4
Summarization [06]
  • The process of distilling the most important
    information from a source (or sources) to produce
    an abridged version for a particular user (or
    users) and task (or tasks).
  • Extract vs. abstract.
  • Summary function: indicative, informative and
    evaluative summaries.
  • Approaches: shallower and deeper approaches.

5
XML Summarization
  • Previous work:
    • SqueezeX compressor [01], which does not perform
      XML summarization as such but lossless and lossy
      compression.
    • XML summaries based on query statistics [03].
    • Text summarization with text tagged as XML [02a,
      02b].
  • None of the previous work can handle documents
    with specific characteristics.
  • Each XML collection has its own characteristics.
  • The template method allows characterization.

6
Template method
  • Describes a pseudo-structure of the original
    document.
  • This structure may include explicit data type
    information.
  • Adapts different summarization efforts to
    specific parts of an XML document through
    summarization modules.
  • Semi-automatic, because a template has to be
    designed to obtain an optimal summarization of an
    XML collection with particular characteristics.
    The explicit design is what handles those
    characteristics.

7
INPUT: Bibliography
  • entry: title b2, isbn xx-yy, autor XXX, date 7/3/1971
  • entry: title k3, isbn x5-6, autor JTK, date 1/3/1987
  • entry: title 123, isbn x2-y3, autor XXX, date 1/1/1971
  • entry: title DB, isbn 7891, autor Führ, date 1/1/2004
  • entry: title k1, isbn xx-yy, autor JTK, date 1/1/1999
TEMPLATE: Bibliography
  • entry
    • 1) author → merge with other entry siblings
    • 2) date → interval of year
    • 3) else → remove
OUTPUT: Bibliography
  • entry: autor JTK, year 1987-1999
  • entry: autor XXX, year 1971
  • entry: autor Führ, year 2004
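
The three template rules of this example can be sketched in a few lines of Python. This is only an illustration, not the thesis implementation: the function name `summarize_bibliography` is hypothetical, the element names are taken from the slide, and the dates are assumed to be day/month/year strings.

```python
import xml.etree.ElementTree as ET


def summarize_bibliography(root):
    """Apply the three template rules to every <entry> under root."""
    years_by_author = {}  # keeps authors in order of first appearance
    for entry in root.findall("entry"):
        author = entry.findtext("autor")
        # Assumption: dates look like day/month/year, as on the slide.
        year = int(entry.findtext("date").rsplit("/", 1)[1])
        years_by_author.setdefault(author, []).append(year)

    summary = ET.Element(root.tag)
    for author, years in years_by_author.items():
        entry = ET.SubElement(summary, "entry")
        # Rule 1: entries of the same author are merged into one entry.
        ET.SubElement(entry, "autor").text = author
        # Rule 2: the dates collapse into a single year or a year interval.
        low, high = min(years), max(years)
        year_text = str(low) if low == high else f"{low}-{high}"
        ET.SubElement(entry, "year").text = year_text
        # Rule 3: everything else (title, isbn) is not copied.
    return summary


SAMPLE = """<Bibliography>
<entry><title>b2</title><isbn>xx-yy</isbn><autor>XXX</autor><date>7/3/1971</date></entry>
<entry><title>k3</title><isbn>x5-6</isbn><autor>JTK</autor><date>1/3/1987</date></entry>
<entry><title>123</title><isbn>x2-y3</isbn><autor>XXX</autor><date>1/1/1971</date></entry>
<entry><title>DB</title><isbn>7891</isbn><autor>Führ</autor><date>1/1/2004</date></entry>
<entry><title>k1</title><isbn>xx-yy</isbn><autor>JTK</autor><date>1/1/1999</date></entry>
</Bibliography>"""

result = summarize_bibliography(ET.fromstring(SAMPLE))
pairs = [(e.findtext("autor"), e.findtext("year")) for e in result]
print(pairs)
```

The sketch emits the merged entries in order of first appearance, so the entry order may differ from the one drawn on the slide.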
8
INPUT
  • a
    • b: c 3, d t1
    • b: c 9, d t2
TEMPLATE
  • a
    • b
      • 1) any integer of b → aggregation sum function
      • 2) any string of b → merge all → text summarize
OUTPUT
  • a
    • b: c 12, d t3
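
A minimal Python sketch of these two rules follows. It assumes that the children of b are called c (integers) and d (strings), which is one reading of the slide, and it replaces the text summarization step with a placeholder that only joins the merged strings. The names `summarize_b` and `summarize_text` are hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_text(texts):
    """Placeholder for a real text summarizer: just joins the merged strings."""
    return " ".join(texts)


def is_integer(text):
    try:
        int(text)
        return True
    except (TypeError, ValueError):
        return False


def summarize_b(root):
    """Collapse all <b> children of root into a single <b>."""
    sums, strings = {}, {}
    for b in root.findall("b"):
        for child in b:
            if is_integer(child.text):
                # Rule 1: integers are aggregated with a sum.
                sums[child.tag] = sums.get(child.tag, 0) + int(child.text)
            else:
                # Rule 2: strings are collected for merging.
                strings.setdefault(child.tag, []).append(child.text or "")

    out = ET.Element(root.tag)
    merged = ET.SubElement(out, "b")
    for tag, total in sums.items():
        ET.SubElement(merged, tag).text = str(total)
    for tag, texts in strings.items():
        ET.SubElement(merged, tag).text = summarize_text(texts)
    return out


doc = ET.fromstring("<a><b><c>3</c><d>t1</d></b><b><c>9</c><d>t2</d></b></a>")
summary = summarize_b(doc)
print(ET.tostring(summary, encoding="unicode"))
```

With a real summarizer in place of `summarize_text`, the merged strings t1 and t2 would become the shorter text that the slide calls t3.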
9
Template workflow
  • Template design process:
    • Summarization rules creation (if needed).
    • Data types creation (if needed).
    • Template design (reuses rules and data types).
    • Template validation.
    • Template project file saved.
  • Matching of the XML collection against the
    existing templates.
  • Once the optimal template is found, summarize
    according to the template rules and definitions.

10
Template Matching
[Flowchart. Node labels: Look for a matching template; Matched documents; Summarize; Analyze next document. Branch labels: Match!; No match.]
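
The slides do not say how a document is compared with a template, so the following Python sketch makes up a simple criterion for illustration: a document matches if its filename fits the optional pattern mentioned under "General Steps for User (II)", its root element has the expected name, and all of its top-level children are known to the template. The `Template` class and both functions are hypothetical.

```python
import fnmatch
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    root_tag: str
    child_tags: frozenset        # element names allowed directly under the root
    filename_pattern: str = "*"  # optional hint for the matching step


def matches(template, filename, root):
    """Assumed matching criterion: filename pattern, root name, child names."""
    if not fnmatch.fnmatch(filename, template.filename_pattern):
        return False
    if root.tag != template.root_tag:
        return False
    return {child.tag for child in root} <= template.child_tags


def match_collection(templates, documents):
    """Analyze each document in turn and sort it into matched or unmatched."""
    matched, unmatched = [], []
    for filename, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        hit = next((t for t in templates if matches(t, filename, root)), None)
        if hit is None:
            unmatched.append(filename)            # "No match."
        else:
            matched.append((filename, hit.name))  # "Match!"
    return matched, unmatched


templates = [Template("bibliography", "Bibliography", frozenset({"entry"}), "*.xml")]
documents = {
    "books.xml": "<Bibliography><entry/><entry/></Bibliography>",
    "notes.xml": "<Notes><note/></Notes>",
}
matched, unmatched = match_collection(templates, documents)
print(matched, unmatched)
```

Only the matched documents would then be handed to the summarization step.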
11
Data types
  • Abstraction levels for data organization.
  • In [102], three levels of data organization are
    distinguished: the physical level, the conceptual
    level and the external level.
  • Framework for dealing with data types in the XML
    context.
  • Efforts towards an XSD data type framework.

12
Data type hierarchy example
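[Figure: data type hierarchy, drawn between two nodes labelled "D-". Node labels in reading order: string, integer, date, text, name, journal title, English, German, person-name, inst-name, English JT, German JT.]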
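
A hierarchy like this can be stored as a child-to-parent map, which makes subtype checks and the search for a common supertype straightforward. In the Python sketch below the parent links are an assumption: the transcript only preserves the node labels, and the grouping (for example German JT under journal title under string) is guessed from their order. All function names are hypothetical.

```python
# Child -> parent links. These links are ASSUMED; the transcript lists only
# the node labels. None stands for the top element of the hierarchy.
PARENT = {
    "string": None,
    "integer": None,
    "date": None,
    "text": "string",
    "name": "string",
    "journal title": "string",
    "English": "text",
    "German": "text",
    "person-name": "name",
    "inst-name": "name",
    "English JT": "journal title",
    "German JT": "journal title",
}


def ancestors(dtype):
    """Return dtype followed by all of its supertypes, most specific first."""
    chain = []
    while dtype is not None:
        chain.append(dtype)
        dtype = PARENT[dtype]
    return chain


def is_subtype(dtype, other):
    """True if dtype equals other or lies below it in the hierarchy."""
    return other in ancestors(dtype)


def common_supertype(a, b):
    """Most specific type above both a and b, or None if only the top is shared."""
    above_b = set(ancestors(b))
    for candidate in ancestors(a):
        if candidate in above_b:
            return candidate
    return None


print(is_subtype("German JT", "string"))
print(common_supertype("person-name", "inst-name"))
```

A template designer could use such a check to decide whether a rule written for a general type, such as string, also applies to a node typed with a more specific one.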
13
Summarization Modules
  • A summarization module is the concept used to
    extend the tool and implement future summarization
    methods. Some summarization modules will be
    available by default. A module consists of two
    parts:
    • An XSD complex type definition.
    • An implemented AbstractSummarizationModule.
  • The AbstractSummarizationModule is an abstract
    class that provides an interface for implementing
    specific summarization methods. A module may take
    a subtree as input and give a summarized subtree
    as output.
  • Examples:
    • A module that implements text summarization using
      segmentation and lexical chains, aided by a
      thesaurus such as WordNet.
    • A module that reuses an existing text summarizer
      such as Open Text Summarizer or Classifier4J.
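
The subtree-in, subtree-out contract of such a module could look as follows in Python. Only the class name AbstractSummarizationModule comes from the slides; the method name `summarize`, the toy subclass `FirstSentenceModule` and its first-sentence heuristic are assumptions made for illustration, and the XSD complex type half of a module is not covered.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class AbstractSummarizationModule(ABC):
    """Interface: take a subtree, return a summarized subtree."""

    @abstractmethod
    def summarize(self, subtree: ET.Element) -> ET.Element:
        """Return a new, summarized copy of subtree."""


class FirstSentenceModule(AbstractSummarizationModule):
    """Toy extractive module: keeps the first sentence of every text node."""

    def summarize(self, subtree):
        out = ET.Element(subtree.tag, dict(subtree.attrib))
        text = (subtree.text or "").strip()
        if text:
            # Naive sentence split on ". "; a real module would do better.
            out.text = text.split(". ")[0].rstrip(".") + "."
        for child in subtree:
            out.append(self.summarize(child))
        return out


node = ET.fromstring(
    "<abstract lang='en'>XML is verbose. Summaries help. More text.</abstract>"
)
short = FirstSentenceModule().summarize(node)
print(ET.tostring(short, encoding="unicode"))
```

A module that wraps an existing summarizer would implement the same method and delegate the text handling to that library.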

14
A Summarization Tool
  • The main implementation consists of two parts:
    • XML Summarization-Template Designer Suite
      (XSTDS), where the template is designed.
    • AX Summarizer (Automatic XML Summarizer, or AXS),
      which runs the automatic summarization workflow.
  • XSTDS has three main components:
    • Rules Creator
    • Data Type Designer
    • Template Designer

15
Template relation
16
General Steps for User (I)
  • The user wants to summarize an XML collection
    with certain characteristics. These
    characteristics may be reflected in a DTD or XSD
    file. In general, the characteristics of a
    collection help in creating a template.
  • The XSTDS program is loaded. A new project is
    created and a template editing GUI appears. It
    resembles a right-sided tree.
  • (A wizard/assistant could be added in future
    development.)
  • The user can first create a draft template from
    one or many XML samples. With many samples,
    duplicate nodes under the same root are merged,
    and the tool tries to determine data types and
    assign default behavior. With a single sample,
    the merge still applies to duplicated nodes.
  • The user can add or remove nodes, apply
    summarization rules, assign data types and modify
    the default behavior. There is a pool of
    previously defined summarization rules (with the
    available summarization modules) and a pool of
    data types. The user has access to both pools,
    which are categorized by the project in which
    they were created or by date (or by any other
    attribute that may be a useful indicator). The
    summarization modules are hard-coded (although
    dynamic loading of summarization modules may
    become possible).
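
The draft-template step (merging duplicate nodes and guessing data types from samples) could be sketched as below. The nested-dictionary layout of the draft, the three guessed types and all function names are assumptions; the slides do not describe the actual XSTDS data structures.

```python
import re
import xml.etree.ElementTree as ET


def guess_type(text):
    """Very rough data type guess for a leaf value (assumed heuristics)."""
    text = (text or "").strip()
    if re.fullmatch(r"-?\d+", text):
        return "integer"
    if re.fullmatch(r"\d{1,2}/\d{1,2}/\d{4}", text):
        return "date"
    return "string"


def merge_into(children, element):
    """Merge element's children into the draft, one node per element name."""
    for child in element:
        node = children.setdefault(child.tag, {"types": set(), "children": {}})
        if len(child) == 0:  # leaf: record the guessed type of its value
            node["types"].add(guess_type(child.text))
        merge_into(node["children"], child)


def draft_template(samples):
    """Build one draft template from one or many XML sample strings."""
    roots = [ET.fromstring(sample) for sample in samples]
    if len({root.tag for root in roots}) != 1:
        raise ValueError("samples must share the same root element")
    draft = {"root": roots[0].tag, "children": {}}
    for root in roots:
        merge_into(draft["children"], root)
    return draft


def resolved_type(node):
    """Single data type for a leaf; falls back to string on conflicting guesses."""
    types = node["types"]
    if not types:
        return None  # inner node, carries no data type
    return next(iter(types)) if len(types) == 1 else "string"


samples = [
    "<Bibliography>"
    "<entry><title>b2</title><date>7/3/1971</date></entry>"
    "<entry><title>123</title><date>1/1/1971</date></entry>"
    "</Bibliography>",
    "<Bibliography><entry><title>DB</title><pages>250</pages></entry></Bibliography>",
]
draft = draft_template(samples)
entry = draft["children"]["entry"]
print({tag: resolved_type(node) for tag, node in entry["children"].items()})
```

The title "123" is guessed as an integer while the other titles are strings, so the draft falls back to string for title. This is the kind of default that the user would review and correct in the template editor.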

17
General Steps for User (II)
  • The user can specify a pattern for the input XML
    filenames to help the later template matching
    process.
  • When the user finishes the design, the template
    is validated for consistency.
  • The user saves the project and closes the XSTDS
    application.
  • The user loads the AX Summarizer application. The
    AXS knows the available templates (projects?)
    because of the default template folder.
  • The AXS shows a GUI where the user can specify
    the source/input (a file, a folder or a recursive
    folder), the output folder and the template
    options. Under template options the user can
    choose default matching or select a specific
    template. It may be possible to customize certain
    options, such as very general output
    characteristics.
  • The user can save the settings.
  • The user can now start summarizing by pressing
    the Summarize button. It should be possible to
    cancel the process.
  • A sound and a popup window indicate success or
    failure. On failure, an informative log helps the
    user with the problems that were encountered.
    Both fatal and simple errors are reported. Some
    statistics of the whole process are shown.

18
The End
  • Questions and discussions.

21
Bibliography
  • [01] Cannataro, Comito, Pugliese (2002).
    SqueezeX: Synthesis and Compression of XML Data.
    Proceedings of the International Conference on
    Information Technology: Coding and Computing
    (ITCC'02).
  • [02a] Kenneth C. Litkowski (2003). Text
    Summarization Using XML-Tagged Documents. CL
    Research. DUC 2003.
  • [02b] Kenneth C. Litkowski (2004). Summarization
    Experiments in DUC 2004. CL Research. DUC 2004.
  • [03] Comai, Marrara and Tanca (2004). XML
    Document Summarization: Using XQuery for Synopsis
    Creation. Proceedings of the 15th International
    Workshop on Database and Expert Systems
    Applications (DEXA'04).
  • [06] T. Maybury, I. Mani (2001). Automatic
    Summarization. American/European Conference on
    Computational Linguistics (ACL/EACL 2001),
    Toulouse, France, 8 July 2001.
  • [102] N. Fuhr (1999). Towards Data Abstraction
    in Networked Information Retrieval Systems.
    Information Processing and Management 35(2).