Title: Constraint Propagation vs Syntactical Analysis for the Logical Structure of Library References
1 Constraint Propagation vs Syntactical Analysis for the Logical Structure of Library References
A. Belaïd, LORIA-CNRS, Nancy, France
Y. Chenevoy, CRID, Univ. Bourgogne, Dijon, France
- Outline
- Structure Modeling
- Syntactical Analysis
- Constraint Propagation
- Results
- Conclusion
2 Model: generic structure
3 Model: Attribute Grammar
- Object: Constructor, subordinate objects
- Constructor: sequence, aggregate, choice
- Qualifier: required, optional, repetitive
- Separator: space, graphic line / punctuation
- Attributes: Physical, Logical
- Typographical, position, lexicon, typeface
- Weights: Attributes, Sub-objects
- Imp / Reco.
- Imp / Hyp. Ambig.
4 Syntactical Analysis: the approach
- Top-down: model driven
- Bottom-up: data driven
- Mixed
- - Anchor points extraction (o)
- - Bottom-up: choice of a rule $A \to \alpha_o\, o\, \beta_o$
- - Top-down: verification of the left context $\alpha_o$ and of the right context $\beta_o$
- - Add A to the anchor points
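The mixed strategy can be illustrated with a small sketch. It assumes that each rule is stored as a head, a left context, an anchor and a right context, and that a reduced head is put back into the sequence so it can serve as context for later rules. The `Rule` type, the function names and the toy symbols (`name`, `comma`, `initial`, `period`, `title`) are hypothetical and not taken from the slides.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    head: str          # A
    left: List[str]    # left context of the anchor
    anchor: str        # o
    right: List[str]   # right context of the anchor


def reduce_once(symbols: List[str], rules: List[Rule]) -> Optional[List[str]]:
    """Apply the first rule whose anchor and both contexts match."""
    for i, symbol in enumerate(symbols):
        for rule in rules:
            if rule.anchor != symbol:
                continue  # bottom-up step: the rule is chosen from the anchor
            start = i - len(rule.left)
            end = i + 1 + len(rule.right)
            if start < 0 or end > len(symbols):
                continue
            # top-down step: verify the left and right contexts
            if symbols[start:i] == rule.left and symbols[i + 1:end] == rule.right:
                return symbols[:start] + [rule.head] + symbols[end:]
    return None


def analyse(symbols: List[str], rules: List[Rule]) -> List[str]:
    """Reduce until no rule applies."""
    # Assumption: every rule has some context, so each reduction shortens
    # the sequence and the loop terminates.
    assert all(rule.left or rule.right for rule in rules)
    while True:
        reduced = reduce_once(symbols, rules)
        if reduced is None:
            return symbols
        symbols = reduced


rules = [
    Rule("Author", ["name"], "comma", ["initial"]),
    Rule("Reference", ["Author"], "period", ["title"]),
]
parsed = analyse(["name", "comma", "initial", "period", "title"], rules)
```

Here the comma is reduced first to `Author`, which then acts as the left context of the period in the second rule.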
5 Syntactical Analysis: Left context verification
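6 Initials and Finals
Model: $G = (V_n, V_t, P, S)$
- Finals
- - $O = \mathrm{Cho}\ A\ B\ C \Rightarrow F(O) = \{A, B, C\}$
- - $O = \mathrm{Seq}\ A\ B\ C \Rightarrow F(O) = \{C\}$
- - $O = \mathrm{Seq}\ A\ B\ C? \Rightarrow F(O) = \{B, C\}$
- - $O \in V_t \Rightarrow F(O) = \{O\}$
- - $O \in V_n \Rightarrow F(O) = F(O) \cup \left(\bigcup_{i \in F(O)} F(i)\right)$
- Initials
- - $O = \mathrm{Cho}\ A\ B\ C \Rightarrow I(O) = \{A, B, C\}$
- - $O = \mathrm{Seq}\ A\ B\ C \Rightarrow I(O) = \{A\}$
- - $O = \mathrm{Seq}\ A?\ B\ C \Rightarrow I(O) = \{A, B\}$
- - $O \in V_t \Rightarrow I(O) = \{O\}$
- - $O \in V_n \Rightarrow I(O) = I(O) \cup \left(\bigcup_{i \in I(O)} I(i)\right)$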
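A minimal sketch of how initials and finals could be computed from these definitions. It assumes that each non-terminal has a single rule, stored as a constructor (`"Seq"` or `"Cho"`) plus a list of `(symbol, is_optional)` children, and that any symbol without a rule is a terminal. The data layout, the function names and the toy grammar are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

Child = Tuple[str, bool]          # (symbol, is_optional)
Rule = Tuple[str, List[Child]]    # ("Seq" | "Cho", children)


def direct_initials(rule: Rule) -> Set[str]:
    """Symbols that can start the object, one level deep."""
    constructor, children = rule
    if constructor == "Cho":
        return {symbol for symbol, _ in children}
    result: Set[str] = set()
    for symbol, optional in children:   # Seq: stop at the first required child
        result.add(symbol)
        if not optional:
            break
    return result


def direct_finals(rule: Rule) -> Set[str]:
    """Symbols that can end the object: initials of the reversed rule."""
    constructor, children = rule
    return direct_initials((constructor, children[::-1]))


def closure(symbol: str, rules: Dict[str, Rule],
            direct: Callable[[Rule], Set[str]]) -> Set[str]:
    """Transitive expansion: I(O) or F(O) united with the sets of its members."""
    if symbol not in rules:             # terminal: the set is the symbol itself
        return {symbol}
    result: Set[str] = set()
    visited, stack = {symbol}, [symbol]
    while stack:
        for child in direct(rules[stack.pop()]):
            result.add(child)
            if child in rules and child not in visited:
                visited.add(child)
                stack.append(child)
    return result


# Hypothetical toy model of a reference.
rules: Dict[str, Rule] = {
    "Reference": ("Seq", [("Authors", False), ("Title", False), ("Date", True)]),
    "Authors": ("Seq", [("Initial", True), ("Name", False)]),
    "Title": ("Cho", [("Word", False), ("Quoted", False)]),
}
initials = closure("Reference", rules, direct_initials)
finals = closure("Reference", rules, direct_finals)
```

With this toy model, the initials of `Reference` are `Authors`, `Initial` and `Name`, and its finals are `Date`, `Title`, `Word` and `Quoted`, because the optional `Date` lets `Title` end the reference.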
7 Indices Extraction without OCR
[Table: specific problems, with two "Corr. with" columns; row labels were lost in extraction. Values: 4.7, 16.1, 76.7, 43.3, 37.5, 91.0, 55.5, 31.5, 37.5, 61]
8 Indices Extraction: the approaches
- ( ): Masks
- _ -: Bounding Box, Baseline
- Profile Projection
- . ,: Bounding Box, Baseline
- Particular words: Sound Lines
- Text style (bold, italic, underlined), spaced text, small text: Projection, Spacing, Bounding Box
9 Constraint Propagation
10 Neighbors (Example)
11 Propagation Results
Frag.   Possible labels   After Cons. Prop.
1       2                 1
2       23                1
3       23                2
4       23                3
5       2                 1
6       7                 1
7       10                1
8       7                 1
9       3                 1
...
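The reduction in the number of labels per fragment can be illustrated with a local propagation over neighboring fragments. This is a sketch under stated assumptions, not the authors' implementation: each fragment carries a set of candidate labels, `compatible` holds the pairs `(a, b)` where label `a` may stand immediately to the left of label `b`, and only adjacent fragments constrain each other. The labels and the compatibility pairs below are hypothetical.

```python
from typing import List, Set, Tuple


def propagate(domains: List[Set[str]],
              compatible: Set[Tuple[str, str]]) -> List[Set[str]]:
    """Drop labels without a compatible neighbor until nothing changes."""
    domains = [set(d) for d in domains]      # work on a copy
    changed = True
    while changed:
        changed = False
        for i in range(len(domains) - 1):
            left, right = domains[i], domains[i + 1]
            # keep a left label only if some right label may follow it
            keep_left = {a for a in left
                         if any((a, b) in compatible for b in right)}
            # keep a right label only if some remaining left label may precede it
            keep_right = {b for b in right
                          if any((a, b) in compatible for a in keep_left)}
            if keep_left != left or keep_right != right:
                domains[i], domains[i + 1] = keep_left, keep_right
                changed = True
    return domains


compatible = {
    ("Author", "Author"), ("Author", "Title"),
    ("Title", "Publisher"), ("Title", "Date"), ("Publisher", "Date"),
}
fragments = [
    {"Author"},
    {"Author", "Title", "Publisher", "Date"},
    {"Title", "Date"},
    {"Date"},
]
labels = propagate(fragments, compatible)
```

In this toy chain every fragment ends with a single label. An empty set after propagation would signal an inconsistent chain.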
12 Model Compilation
- Pre-processing of the model
- Find initials, finals and neighbors
- Let $LN_{a,p}$ be the set of possible neighbors at the left of $a$ in the rule $p \to \alpha\, a\, \beta$, with $\alpha \in (V_t \cup V_n)^*$ and $\beta \in ((V_t \cup V_n) - a)^*$
- if a ? ? then $LN_{a,p} = F(\alpha)$, else $LN_{a,p} = F(\alpha)$ ? ? LNa ?
- By extension, $ln_{a,p} = \bigcup_{l \in LN_{a,p}} F(l)$
- and $LN_a = \bigcup_{p \in P_a} ln_{a,p}$ is the left neighborhood of $a$ in the model
- A is left compatible with B if $B \in LN_A$ or $A \in RN_B$ or (A ? B) ? PA ? PA and ? PB ? PB / PA ? PB
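A simplified sketch of this pre-processing, computed at the terminal level. It assumes a sequence-only grammar with no optional elements, so the special cases on the slide are not covered; in this setting testing `B in LN[A]` is equivalent to testing `A in RN[B]`, so only left neighborhoods are built. The grammar layout, the function names and the toy model are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set

# non-terminal -> alternatives, each a sequence of symbols;
# symbols without an entry are terminals.
Grammar = Dict[str, List[List[str]]]


def edge_terminals(grammar: Grammar, symbol: str, last: bool,
                   seen: FrozenSet[str] = frozenset()) -> Set[str]:
    """Terminals that can end (last=True) or begin (last=False) `symbol`."""
    if symbol not in grammar:
        return {symbol}
    if symbol in seen:                  # recursive rule: already being expanded
        return set()
    seen = seen | {symbol}
    result: Set[str] = set()
    for alternative in grammar[symbol]:
        edge = alternative[-1] if last else alternative[0]
        result |= edge_terminals(grammar, edge, last, seen)
    return result


def left_neighbors(grammar: Grammar) -> Dict[str, Set[str]]:
    """LN[a]: terminals that may stand immediately to the left of terminal a."""
    ln: Dict[str, Set[str]] = defaultdict(set)
    for alternatives in grammar.values():
        for alternative in alternatives:
            for previous, current in zip(alternative, alternative[1:]):
                finals = edge_terminals(grammar, previous, last=True)
                for a in edge_terminals(grammar, current, last=False):
                    ln[a] |= finals
    return ln


def left_compatible(ln: Dict[str, Set[str]], a: str, b: str) -> bool:
    """True if b belongs to the left neighborhood of a."""
    return b in ln.get(a, set())


grammar: Grammar = {
    "Reference": [["Authors", "Title", "Imprint"]],
    "Authors": [["Name"], ["Name", "Authors"]],
    "Imprint": [["Publisher", "Year"], ["Year"]],
}
ln = left_neighbors(grammar)
```

Because the neighborhoods are compiled once from the model, the compatibility test used during propagation reduces to a set lookup.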
13 Results
Group Vedette
Area Title
Principal Title
Crossing Title
End of the title
Cros. Formulae
Area Address / Date
Crossing Title
Address
Date
Area Collection
200 references: 75%
Group Cote
14 Results: scientific references
400 references: 99.8%
15 Results
Yua 95 J. Juan, Y. Y. Tang, and C. Y. Suen.
Four Directional Adjacency Graphs (fdag) and
their Application in Locating fields in Forms.
In Third International Conference on Document
Analysis and Recognition (ICDAR95), pages 752-755.
IEEE Computer Society Press, Aug. 1995.
Author (3): J. Juan, Y. Y. Tang, and C. Y. Suen
Title: Four Directional Adjacency Graphs (fdag) and their Application in Locating fields in Forms
Editor (0):
Month: Aug
Year: 1995
Volume:
Number:
Publisher: IEEE Computer Society Press
Address:
Pages: 752-755
Organization:
Booktitle: Third International Conference on Document Analysis and Recognition (ICDAR95)
Series:
Note:
16 Conclusion
- Weak points
- 25% lead to an inconsistent chain
- Feasibility study without OCR
- Weakness of the indices extraction algorithm
- Local context handling
- Strong points or improvements
- Fast analysis
- Structure well recognized for the others
- The method can be applied with OCR, with better results
- Global context can be applied (path consistency) at the cost of CPU time
- Good for ambiguous models
- Limits the number of hypotheses during the analysis
- Limits the number of backtracks