engenda - PowerPoint PPT Presentation

1
engenda
Gavin Thomas Nicol, CTO, Red Bridge Interactive, Inc.
2
Agenda
3
Presentation Agenda
  • Motivation
  • LMNL Syntax
  • LMNL Data Model
  • Reified LMNL Data Model
  • Relationship to XML
  • Conclusions

4
Motivation
5
Which came first, text or markup?
  • A text is some form of written communication
  • Markup is added to the text to:
      • Correct or annotate the text.
      • Add semantics to the text.
      • Aid in formatting.
  • Why is it that formal models of documents model
    the markup?

6
Is a document really a tree?
  • Simplistically, yes. Practically, no.
    (DeRose/Renear et al).
  • Documents have:
      • Annotations
      • Overlapping structures
      • Cross references
      • Etc.
  • Why is it that most document models are trees?
  • Doesn't that complicate things?

7
Summary
  • Most current models of documents:
      • Are limiting and inaccurate.
      • Fail to deal with text well (they are data-centric).
  • Documents are not always inherently trees.
      • Though it is often very convenient to look at
        them as such.
  • Documents do not necessarily have a canonical
    format.
  • Implied semantics are as important as defined
    semantics.
  • LMNL helps resolve some of the issues.

8
LMNL Syntax
9
LMNL Syntax
  • Borrows things from XML.
  • Any sequence of characters is a LMNL document if:
      • It uses UTF-16 or UTF-8 as the encoding.
      • It does not use reserved markup characters
        without escaping.
  • Predefined LMNL entities:
      • amp (&), lsqb ([), rsqb (]), lcub ({), rcub (}),
        apos ('), quot (")
  • LMNL declaration:
      • Encoding and version.

Examples (markup delimiters were not preserved in this transcript):
  I am a LMNL document
  Hey, both you amp me!
  !lmnl version0.2
  !lmnl version0.2 encodingISO-8859-1
10
LMNL Syntax: Ranges and Tags
  • A range is an identified sequence of characters
    in the text.
  • Ranges can overlap.
  • A range can be delineated using tags.
  • There are start and end tags.
  • End tags may be empty.
  • Ranges delineated by tags, but without names
    (start and end tags are empty) are called
    anonymous ranges.
  • Overlapping ranges can use tag identifiers for
    disambiguation.
  • Tags can be empty.
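
The idea that ranges can overlap can be illustrated independently of the tag syntax. The following is a minimal sketch, assuming a range is just a name plus a start offset and a length over a string; the class name `Range`, the function `overlaps`, and the sample text are hypothetical and not taken from the LMNL specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Hypothetical stand-in for a LMNL range over a string of characters."""
    name: str    # an empty string models an anonymous range
    start: int   # offset of the first character covered
    length: int  # number of characters covered

    @property
    def end(self) -> int:
        return self.start + self.length


def overlaps(a: Range, b: Range) -> bool:
    """True when a and b share characters but neither contains the other."""
    shared = max(a.start, b.start) < min(a.end, b.end)
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return shared and not (a_in_b or b_in_a)


text = "The quick brown fox"
sentence = Range("s", 0, 15)      # covers "The quick brown"
phrase = Range("phrase", 10, 9)   # covers "brown fox"
word = Range("w", 4, 5)           # covers "quick"

print(overlaps(sentence, phrase))  # the two ranges cross each other
print(overlaps(sentence, word))    # plain nesting, as a tree would allow
```

A tree-shaped model can express the nested pair directly, but not the crossing pair, which is the case the slide is concerned with.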

11
LMNL Syntax: Annotations
  • Tags (start or end) may have associated
    annotations.
  • Unlike XML, multiple annotations of the same name
    are allowed.
  • Unlike XML, annotations can be structured.
  • Annotations give anonymous ranges something to do.

12
LMNL Syntax: Additional Bits
  • Comments allowed anywhere
      • !-- A comment --
  • Numeric character references just like XML
      • xA0
  • Entities can be declared anywhere
      • !entity nbspxA0
      • !entities hreffoo.lmnl
  • Namespaces - sane namespaces
      • !ns bibhttp://www.example.com/bib
  • Layers allow structural groupings of ranges
    (more later)
      • !layer namefoo basetext
      • tagfoo...

13
LMNL Data Model
14
LMNL Data Model: The Basics
  • A LMNL document is made up of:
      • One or more layers.
      • Layers may contain characters or ranges.
      • Ranges may have annotations.
  • LMNL is ultimately defined in terms of the model,
    not the syntax.
  • The data model is based on, and closely aligned
    with, Attributed Range Algebra (ARA).

15
LMNL Data Model: Layers
  • A sequence (from ARA) is a completely ordered
    finite set of items.
  • A layer is a sequence of characters or ranges. A
    layer containing characters is called a text
    layer.
  • A layer has these properties:
      • base: The layer over which the ranges in this
        layer span.
      • content: The actual items.
      • overlays: A set of layers that range over this
        layer.

Sequence of characters (file/string)
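
The three layer properties listed above can be sketched as a small data structure. This is only an illustration under assumptions: the class name `Layer`, the helper `add_overlay`, and the use of `(start, length)` tuples for ranges are hypothetical and are not defined by the LMNL specification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison avoids recursing through base/overlays
class Layer:
    """Hypothetical layer with the properties named on the slide."""
    content: list                     # the actual items: characters or ranges
    base: Optional["Layer"] = None    # layer that this layer's ranges span
    overlays: list = field(default_factory=list)  # layers ranging over this one

    def add_overlay(self, content: list) -> "Layer":
        """Create a layer whose ranges span this layer and register it."""
        overlay = Layer(content=content, base=self)
        self.overlays.append(overlay)
        return overlay

    def is_text_layer(self) -> bool:
        """A text layer contains only single characters."""
        return all(isinstance(item, str) and len(item) == 1
                   for item in self.content)


text_layer = Layer(content=list("abc def"))
# Ranges are modelled here as (start, length) tuples over the base layer.
word_layer = text_layer.add_overlay([(0, 3), (4, 3)])

print(text_layer.is_text_layer())
print(word_layer.base is text_layer)
```

Keeping both `base` and `overlays` makes the structure navigable in either direction, at the cost of a reference cycle that the sketch sidesteps by comparing layers by identity.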
16
LMNL Data Model: Ranges
  • A range is a span over a sequence of items in a
    layer.
  • A range has these properties:
      • owner layer: The layer this range is part of.
      • name: The (possibly empty) name of the range.
      • start: The starting offset for the range.
      • length: The length of the range. Length 0 is
        acceptable.
      • annotations: A (possibly empty) sequence of
        annotations.

Example: 0, 3, owner-layer lexical, name null, annotations null
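
The range properties can be written down as a record. The sketch below assumes that the two leading numbers in the slide's example are the start offset and the length, in that order; that reading, the class name `Range`, the `items` helper, and the sample text are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Range:
    """Hypothetical record holding the range properties from the slide."""
    owner_layer: str              # name of the layer this range is part of
    name: Optional[str]           # None models an unnamed range
    start: int                    # starting offset in the base sequence
    length: int                   # number of items covered; 0 is allowed
    annotations: list = field(default_factory=list)

    def items(self, base):
        """Return the slice of the base sequence that this range spans."""
        return base[self.start:self.start + self.length]


base_text = "LMNL rocks"  # assumed sample text, not from the slides

# Assumed reading of the slide's example: start 0, length 3.
example = Range(owner_layer="lexical", name=None, start=0, length=3)
print(example.items(base_text))  # LMN

# A zero-length range marks a position without covering any items.
marker = Range(owner_layer="lexical", name="marker", start=4, length=0)
print(repr(marker.items(base_text)))  # ''
```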

17
LMNL Data Model: Annotations
  • Essentially a (name, value) pair.
  • An annotation has these properties:
      • owner: A range or annotation that this
        annotation is attached to (yes, annotations can
        be annotated).
      • name: The annotation name.
      • value: A text layer containing the value of the
        annotation.
      • annotations: A (possibly empty) sequence of
        annotations.
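
Two consequences of these properties are that several annotations may share a name and that an annotation may carry annotations of its own. A minimal sketch follows, assuming a plain string stands in for the text layer that holds the value and leaving out the `owner` back-reference for brevity; the class name `Annotation`, the helper `find`, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical annotation: a name, a value, and nested annotations."""
    name: str
    value: str  # stands in for the text layer holding the value
    annotations: list = field(default_factory=list)


def find(annotations, name):
    """Return every annotation with the given name, in document order."""
    return [a for a in annotations if a.name == name]


# Two annotations with the same name, which an XML attribute list would reject.
first = Annotation("author", "Ada")
second = Annotation("author", "Grace",
                    annotations=[Annotation("role", "editor")])
on_range = [first, second]

authors = find(on_range, "author")
print([a.value for a in authors])        # ['Ada', 'Grace']
print(authors[1].annotations[0].value)   # editor
```

Because annotations form a sequence, a lookup by name returns a list and not a single value.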

18
LMNL Data Model: Layers and Ranges
  • A layer can contain characters or ranges.
  • Ranges span items in a layer.
  • There can be any number of layers.
  • Ranges can therefore span sequences of ranges,
    thereby identifying higher-level structures.
  • There can be an arbitrary number of layers
    supporting an arbitrary number of views of the
    content (JITT).
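
To make the stacking of layers concrete, the sketch below puts a layer of word ranges over a text and a layer of phrase ranges over the word layer, then resolves a phrase back to characters. The `(start, length)` tuple representation, the function `resolve`, and the sample text are assumptions for illustration only.

```python
text = "to be or not to be"

# Layer 1: word ranges over the characters, as (start, length) pairs.
words = [(0, 2), (3, 2), (6, 2), (9, 3), (13, 2), (16, 2)]

# Layer 2: phrase ranges over the word layer, as (start, length) pairs
# whose offsets count words, not characters.
phrases = [(0, 2), (3, 3)]


def resolve(phrase, word_ranges, base_text):
    """Map a range over the word layer back to the characters it covers."""
    start, length = phrase
    covered = word_ranges[start:start + length]
    if not covered:
        return ""
    first_start, _ = covered[0]
    last_start, last_length = covered[-1]
    return base_text[first_start:last_start + last_length]


for phrase in phrases:
    print(resolve(phrase, words, text))
# to be
# not to be
```

Each additional layer is just another sequence of ranges over the one beneath it, so different views of the same text can coexist without altering the characters.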

19
LMNL Data Model: Syntax
20
LMNL Reified Data Model
  • Extra layer that exposes the syntax to the
    application as well

(Figure: sample markup using link, img, and src with the value foo.jpg; its parts are labeled Start tag, End tag, and Text.)
21
Relationship To XML
  • LMNL borrows good ideas from XML, and has a
    similar feel
  • LMNL syntax goes beyond XML:
      • Overlapping markup
      • Structured attributes
  • LMNL is really a data model.
  • A LMNL processor could well parse XML syntax; the
    application would be syntax-independent.
  • XML can be converted to LMNL syntax, but LMNL
    syntax requires hacks to be converted to XML.
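
One commonly used workaround for carrying overlapping ranges in XML is to replace each range with a pair of empty "milestone" elements. The sketch below shows that approach as an example of the kind of hack involved; it is not presented as the method the slides have in mind. The element names `doc`, `start`, and `end`, the `rid` attribute, and the function name are hypothetical, and ranges are assumed to lie within the text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def to_milestone_xml(text, ranges):
    """Serialize (name, start, length) ranges as empty marker elements."""
    starts, ends = {}, {}
    for rid, (name, start, length) in enumerate(ranges):
        open_tag = f'<start name={quoteattr(name)} rid="{rid}"/>'
        close_tag = f'<end rid="{rid}"/>'
        if length == 0:
            # A zero-length range opens and closes at the same position.
            starts.setdefault(start, []).append(open_tag + close_tag)
        else:
            starts.setdefault(start, []).append(open_tag)
            ends.setdefault(start + length, []).append(close_tag)

    out = ["<doc>"]
    for offset in range(len(text) + 1):
        out.extend(ends.get(offset, []))    # close ranges ending here
        out.extend(starts.get(offset, []))  # then open ranges starting here
        if offset < len(text):
            out.append(escape(text[offset]))
    out.append("</doc>")
    return "".join(out)


text = "The quick brown fox"
ranges = [("s", 0, 15), ("phrase", 10, 9)]  # these two ranges overlap
xml = to_milestone_xml(text, ranges)
print(xml)

# The result is well-formed XML, and the character data is unchanged.
root = ET.fromstring(xml)
print("".join(root.itertext()) == text)
```

The output parses as XML, but the range structure is no longer visible to XML tools as elements with content; an application has to pair up the markers itself, which is why such conversions are described as hacks.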

22
Conclusions
23
Conclusions
  • LMNL is a data model and syntax that provides
    additional functionality over XML
  • LMNL is based on ARA, an evolving formalism of
    markup and parsing
  • Current status:
      • Basic specifications for syntax, data model, and
        reified layer complete
      • Basic parser (LMNOP), like SAX
  • Future work:
      • RAQUEL query language
      • Concurrent forest regular expressions for
        validation
      • Tools: formatter, translator, etc.

For more information: http://www.lmnl.org/