Title: engenda
1engenda
Gavin Thomas Nicol, CTO, Red Bridge Interactive,
Inc.
2Agenda
3Presentation Agenda
- Motivation
- LMNL Syntax
- LMNL Data Model
- Reified LMNL Data Model
- Relationship to XML
- Conclusions
4Motivation
5Which came first, text or markup?
- A text is some form of written communication
- Markup is added to the text to
- Correct or annotate the text.
- Add semantics to the text.
- Aid in formatting.
- Why is it that formal models of documents model
the markup?
6Is a document really a tree?
- Simplistically, yes. Practically, no.
(DeRose/Renear et al.)
- Documents have
- Annotations
- Overlapping structures
- Cross references
- Etc.
- Why is it that most document models are trees?
- Doesn't that complicate things?
7Summary
- Most current models of documents are
- Limiting and inaccurate
- Unable to deal with text well (data centric)
- Documents are not always inherently trees
- Though it is often very convenient to look at
them as such.
- Documents do not necessarily have a canonical
format.
- Implied semantics are as important as defined
semantics.
- LMNL helps resolve some of the issues.
8LMNL Syntax
9LMNL Syntax
- Borrows things from XML
- Any sequence of characters is a LMNL document if
- It uses UTF-16 or UTF-8 as the encoding.
- It does not use reserved markup characters without escaping.
- Predefined LMNL Entities
- amp(&) lsqb([) rsqb(]) lcub({) rcub(})
apos(') quot(")
- LMNL Declaration
- Encoding and version.
I am a LMNL document
Hey, both you amp me!
!lmnl version0.2
!lmnl version0.2 encodingISO-8859-1
10LMNL Syntax Ranges and Tags
- A range is an identified sequence of characters
in the text.
- Ranges can overlap.
- A range can be delineated using tags.
- There are start and end tags.
- End tags may be empty.
- Ranges delineated by tags, but without names
(start and end tags are empty) are called
anonymous ranges.
- Overlapping ranges can use tag identifiers for
disambiguation.
- Tags can be empty.
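The overlap rule above can be made concrete with a small sketch. It is an illustration only, not LMNL syntax and not the API of any LMNL tool: the Span class, the overlaps function, and the sample text are hypothetical names chosen for this example. A range is treated as a (start, length) pair over a string, and two ranges overlap when they share characters without either one containing the other, which is the case a single tree cannot express.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a named range over a text."""
    name: str
    start: int
    length: int

    @property
    def end(self) -> int:
        return self.start + self.length


def overlaps(a: Span, b: Span) -> bool:
    """True when a and b share characters but neither contains the other."""
    shares = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return shares and not (a_in_b or b_in_a)


text = "one two three four"
first = Span("first", 0, 7)     # covers "one two"
second = Span("second", 4, 9)   # covers "two three"
nested = Span("nested", 4, 3)   # covers "two", fully inside `first`

print(overlaps(first, second))  # shared "two", neither contains the other
print(overlaps(first, nested))  # plain nesting, which a tree can express
```

Nesting is deliberately excluded from the overlap test, since nested ranges are the one arrangement that tree-shaped markup already handles.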
11LMNL Syntax Annotations
- Tags (start or end) may have associated
annotations.
- Unlike XML, multiple annotations of the same name
are allowed.
- Unlike XML, annotations can be structured.
- Annotations give anonymous ranges something to do.
12LMNL Syntax Additional Bits
- Comments allowed anywhere
- !-- A comment --
- Numeric character references just like XML
- &#xA0;
- Entities can be declared anywhere
- !entity nbspxA0
- !entities hreffoo.lmnl
- Namespaces - sane namespaces
- !ns bibhttp://www.example.com/bib
- Layers allow structural groupings of ranges
(more later)
- !layer namefoo basetext
- tagfoo...
13LMNL Data Model
14LMNL Data Model The Basics
- A LMNL document is made up of
- One or more layers.
- Layers may contain characters or ranges
- Ranges may have annotations
- LMNL is ultimately defined in terms of the model,
not the syntax
- The data model is based on, and closely aligned with,
Attributed Range Algebra
15LMNL Data Model Layers
- A sequence (from ARA) is a completely ordered
finite set of items.
- A layer is a sequence of characters or ranges. A
layer containing characters is called a text
layer.
- A layer has these properties
- base: The layer over which the ranges in this
layer span.
- content: The actual items.
- overlays: A set of layers that range over this
layer.
Sequence of characters (file/string)
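The three layer properties listed above can be sketched as a small Python class. This is a minimal illustration under stated assumptions, not a LMNL implementation: the Layer class, the label field, and the add_overlay helper are hypothetical, and plain (start, length) tuples stand in for real range objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare by identity, since layers reference each other
class Layer:
    label: str                      # hypothetical label, not a listed property
    content: List[object]           # the actual items: characters or ranges
    base: Optional["Layer"] = None  # layer that the ranges in this layer span
    overlays: List["Layer"] = field(default_factory=list)

    def is_text_layer(self) -> bool:
        """A layer whose items are all single characters is a text layer."""
        return all(isinstance(c, str) and len(c) == 1 for c in self.content)


def add_overlay(base: Layer, label: str, content: List[object]) -> Layer:
    """Create a layer over `base` and record it in base.overlays."""
    layer = Layer(label=label, content=content, base=base)
    base.overlays.append(layer)
    return layer


text = Layer("text", list("Hello world"))
# Placeholder (start, length) tuples stand in for real range objects.
words = add_overlay(text, "words", [(0, 5), (6, 5)])

print(text.is_text_layer(), words.is_text_layer())
print(words.base.label, [o.label for o in text.overlays])
```

Keeping base and overlays as mutual references makes it possible to walk from the text up to every view of it, and from any view back down to the characters.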
16LMNL Data Model Ranges
- A range is a span over a sequence of items in a
layer.
- A range has these properties
- owner layer: The layer this range is part of.
- name: The (possibly empty) name of the range.
- start: The starting offset for the range.
- length: The length of the range. Length 0 is
acceptable.
- annotations: A (possibly empty) sequence of
annotations.
0, 3, owner-layer=lexical, name=null, annotations=null
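The range properties can be sketched the same way. The Range class below is a hypothetical illustration, not part of any LMNL tool: it stores the owner layer as a plain name for brevity, and the items method and sample text are assumptions added for the example. The first instance mirrors the example values shown above (start 0, length 3, owner layer "lexical", no name, no annotations).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class Range:
    owner_layer: str             # name of the owning layer (simplified)
    start: int                   # starting offset in the base sequence
    length: int                  # number of items covered; 0 is acceptable
    name: Optional[str] = None   # None models an anonymous range
    annotations: List[object] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.start < 0 or self.length < 0:
            raise ValueError("start and length must be non-negative")

    def items(self, base: Sequence) -> Sequence:
        """Return the items of `base` that this range spans."""
        return base[self.start:self.start + self.length]


base_text = "The quick fox"
first = Range(owner_layer="lexical", start=0, length=3)
point = Range(owner_layer="lexical", start=4, length=0)

print(repr(first.items(base_text)))  # the three characters at offset 0
print(repr(point.items(base_text)))  # a zero-length range spans nothing
```

A zero-length range still has a position, so it can mark a point in the text and carry annotations there.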
17LMNL Data Model Annotations
- Essentially a (name, value) pair.
- An annotation has these properties
- owner: A range or annotation that this
annotation is attached to (yes, annotations can
be annotated).
- name: The annotation name.
- value: A text layer containing the value of the
attribute.
- annotations: A (possibly empty) sequence of
annotations.
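A short sketch shows how annotations differ from XML attributes: names may repeat, and an annotation can itself be annotated. The Annotation class, its annotate method, the named helper, and all sample values are hypothetical and chosen only for illustration; a plain string stands in for the text layer that holds the value.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare by identity, since owner links point back up
class Annotation:
    name: str
    value: str                      # a plain string stands in for a text layer
    owner: Optional[object] = None  # the range or annotation this is attached to
    annotations: List["Annotation"] = field(default_factory=list)

    def annotate(self, name: str, value: str) -> "Annotation":
        """Attach a child annotation and return it."""
        child = Annotation(name, value, owner=self)
        self.annotations.append(child)
        return child


def named(annotations: List[Annotation], name: str) -> List[Annotation]:
    """All annotations with the given name, in order (names may repeat)."""
    return [a for a in annotations if a.name == name]


title = Annotation("title", "Example Title")
first = title.annotate("author", "First Author")
second = title.annotate("author", "Second Author")
role = first.annotate("role", "editor")   # an annotation on an annotation

print([a.value for a in named(title.annotations, "author")])
print(role.owner.value)
```

Because annotations are kept in a sequence rather than a name-keyed map, repeated names and their order are both preserved.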
18LMNL Data Model Layers and Ranges
- Layers can contain characters or ranges.
- Ranges span items in a layer.
- There can be any number of layers.
- Ranges can therefore span sequences of ranges,
thereby identifying higher-level structures.
- There can be an arbitrary number of layers
supporting an arbitrary number of views of the
content (JITT).
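The idea of ranges spanning ranges can be sketched with three stacked layers. Everything here is a hypothetical illustration rather than LMNL itself: each layer is a plain list of (start, length) pairs, the resolve function is an assumed helper, and the sample sentence is invented. A phrase is a span over the word layer, and resolving it yields the span of text it ultimately covers.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, length) over the layer below

text = "the cat sat down"
# Word layer: ranges over the characters of `text`.
words: List[Span] = [(0, 3), (4, 3), (8, 3), (12, 4)]
# Phrase layer: ranges over the items of the word layer.
phrases: List[Span] = [(0, 2), (2, 2)]


def resolve(span: Span, lower: List[Span]) -> Span:
    """Map a span over `lower` to one span over the layer beneath `lower`."""
    start, length = span
    covered = lower[start:start + length]
    if not covered:
        raise ValueError("cannot resolve a span that covers no items")
    first_start = covered[0][0]
    last_start, last_length = covered[-1]
    return (first_start, last_start + last_length - first_start)


for phrase in phrases:
    s, n = resolve(phrase, words)
    print(text[s:s + n])
```

Each additional layer is just another list of spans over the one beneath it, so different views of the same content can coexist without altering the text.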
19LMNL Data Model Syntax
20LMNL Reified Data Model
- Extra layer that exposes the syntax to the
application as well
linkimg srcfoo.jpgsrclink img
srcfoo.jpgsrclink
Start tag
End tag
Text
21Relationship To XML
- LMNL borrows good ideas from XML, and has a
similar feel
- LMNL syntax goes beyond XML
- Overlapping markup
- Structured attributes
- LMNL is really a data model
- A LMNL processor could well parse XML syntax; the
application would be syntax independent
- XML can be converted to LMNL syntax, but LMNL
syntax requires hacks to be converted to XML
22Conclusions
23Conclusions
- LMNL is a data model and syntax that provide
additional functionality over XML
- LMNL is based on ARA, an evolving formalism of
markup and parsing
- Current status
- Basic specifications for syntax, data model and
reified layer complete
- Basic parser (LMNOP) like SAX
- Future work
- RAQUEL Query language
- Concurrent forest regular expressions for
validation
- Tools: formatter/translator/etc.
For more information: http://www.lmnl.org/