Constraint Propagation vs Syntactical Analysis for the Logical Structure of Library References

1
Constraint Propagation vs Syntactical Analysis
for the Logical Structure of Library References
A. Belaïd, LORIA-CNRS, Nancy, France
Y. Chenevoy, CRID, Univ. Bourgogne, Dijon, France
  • Outline
  • Structure Modeling
  • Syntactical Analysis
  • Constraint Propagation
  • Results and Conclusion

2
Model: generic structure
3
Model: Attribute Grammar
Object: Constructor, subordinate objects, qualifier
Constructor: sequence, aggregate, choice
Qualifier: required, optional, repetitive
Separator: space, graphic line / punctuation
Attributes: Physical (typographical, position, typeface), Logical (lexicon)
Weights: Attributes (Imp / Reco.), Sub-objects (Imp / Hyp. Ambig.)
4
Syntactical Analysis: the approach
  • Top-down: model driven
  • Bottom-up: data driven
  • Mixed
    - Anchor points extraction ($o$)
    - Bottom-up: choice of a rule $A \to \alpha_o\, o\, \beta_o$
    - Top-down: verification of the left context $\alpha_o$ and the right context $\beta_o$
    - Add $A$ to the anchor points
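
The mixed strategy can be illustrated with a minimal Python sketch. It is an assumption-laden simplification, not the authors' implementation: fragments carry sets of candidate labels, an anchor point is a fragment with exactly one label, and rules are flat sequences. The names `context_ok`, `analyse`, and the sample labels are hypothetical.

```python
def context_ok(candidates, start, context):
    """True if every context symbol is a possible label at its position."""
    if start < 0 or start + len(context) > len(candidates):
        return False
    return all(sym in candidates[start + j] for j, sym in enumerate(context))


def analyse(candidates, anchors, rules):
    """Return sorted (head, first, last) spans recognised around anchor points."""
    found = set()
    for pos in anchors:
        (o,) = candidates[pos]  # an anchor point carries exactly one label
        for head, rhs in rules:  # bottom-up: pick rules that contain the anchor
            for k, sym in enumerate(rhs):
                if sym != o:
                    continue
                left, right = rhs[:k], rhs[k + 1:]  # left and right contexts
                # top-down: verify both contexts against the fragments
                if context_ok(candidates, pos - k, left) and \
                        context_ok(candidates, pos + 1, right):
                    found.add((head, pos - k, pos + len(right)))
    return sorted(found)


# Hypothetical fragments: the first and last are anchors, the middle is ambiguous.
candidates = [{"Author"}, {"Title", "Publisher"}, {"Year"}]
rules = [("Reference", ("Author", "Title", "Year")),
         ("Imprint", ("Publisher", "Year"))]
spans = analyse(candidates, anchors=[0, 2], rules=rules)
print(spans)
```

In this sketch the returned spans play the role of the new anchor points; a fuller version would feed them back into the anchor set and repeat until no rule applies.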

5
Syntactical Analysis: Left context verification
6
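Initials and Finals
Model: $G = (V_n, V_t, P, S)$
  • Finals
  • $O = \mathrm{Cho}\ A\ B\ C$: $F(O) = \{A, B, C\}$
  • $O = \mathrm{Seq}\ A\ B\ C$: $F(O) = \{C\}$
  • $O = \mathrm{Seq}\ A\ B\ C?$: $F(O) = \{B, C\}$
  • $O \in V_t$: $F(O) = \{O\}$
  • $O \in V_n$: $F(O) = F(O) \cup \bigcup_{i \in F(O)} F(i)$
  • Initials
  • $O = \mathrm{Cho}\ A\ B\ C$: $I(O) = \{A, B, C\}$
  • $O = \mathrm{Seq}\ A\ B\ C$: $I(O) = \{A\}$
  • $O = \mathrm{Seq}\ A?\ B\ C$: $I(O) = \{A, B\}$
  • $O \in V_t$: $I(O) = \{O\}$
  • $O \in V_n$: $I(O) = I(O) \cup \bigcup_{i \in I(O)} I(i)$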
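
The rules above can be sketched in Python as follows. This is an illustrative reading, not the authors' code: the `Rule` class, the `(symbol, optional)` encoding, and the sample grammar are assumptions made for the example. A choice contributes all of its members, a sequence contributes its first (or last) member plus any optional members it has to skip, and the sets of nonterminals are then closed transitively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    kind: str     # "seq" or "cho"
    parts: tuple  # tuple of (symbol, optional) pairs


def direct_initials(rule):
    """Symbols that can start the object, before transitive expansion."""
    if rule.kind == "cho":
        return {sym for sym, _ in rule.parts}
    out = set()
    for sym, optional in rule.parts:
        out.add(sym)
        if not optional:  # a required member ends the scan
            break
    return out


def direct_finals(rule):
    """Symbols that can end the object, before transitive expansion."""
    if rule.kind == "cho":
        return {sym for sym, _ in rule.parts}
    out = set()
    for sym, optional in reversed(rule.parts):
        out.add(sym)
        if not optional:
            break
    return out


def closure(grammar, direct):
    """Expand X(O) with X(i) for every i in X(O) until nothing changes."""
    sets = {o: direct(r) for o, r in grammar.items()}
    changed = True
    while changed:
        changed = False
        for members in sets.values():
            extra = set()
            for i in members:
                extra |= sets.get(i, {i})  # terminal: X(i) = {i}
            if not extra <= members:
                members |= extra
                changed = True
    return sets


# Hypothetical model: Date and Subtitle are optional members.
grammar = {
    "Reference": Rule("seq", (("Authors", False), ("Title", False), ("Date", True))),
    "Authors": Rule("cho", (("Person", False), ("Organization", False))),
    "Title": Rule("seq", (("Subtitle", True), ("MainTitle", False))),
}
initials = closure(grammar, direct_initials)
finals = closure(grammar, direct_finals)
print(initials["Reference"], finals["Reference"])
```

Symbols without a rule in `grammar` are treated as terminals, which matches the case where the set of a terminal is the terminal itself.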

7
Indices Extraction without OCR
Specific problems
[Table residue: two "Corr. with" columns; row labels were lost in extraction. Values in order: 4.7, 16.1, 76.7, 43.3, 37.5, 91.0, 55.5, 31.5, 37.5, 61]
8
Indices Extraction: the approaches
( ): Masks
_ -: Bounding Box, Baseline, Profile Projection
. ,: Bounding Box, Baseline
Particular words: Sound Lines
Text style (bold, italic, underlined; spaced text; small text): Projection, Spacing, Bounding Box
9
Constraint Propagation
10
Neighbors (Example)
11
Propagation Results
Frag.   Possible labels   After Cons. Prop.
1       2                 1
2       23                1
3       23                2
4       23                3
5       2                 1
6       7                 1
7       10                1
8       7                 1
9       3                 1
...
12
Model Compilation
  • Pre-processing of the model
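  • Find initials, finals and neighbors. Let $LN_{a,p}$ be
    the set of possible neighbors at the left of $a$ in
    the rule $p$
  • $p \to \alpha\, a\, \beta$, $\alpha \in (V_t \cup V_n)^*$, $\beta \in ((V_t \cup V_n) - \{a\})^*$:
    if $a \notin \alpha$ then $LN_{a,p} = F_\alpha$, else $LN_{a,p} = F_\alpha \cup LN_{a,\alpha}$
  • By extension, $ln_{a,p} = \bigcup_{l \in LN_{a,p}} F_l$
  • and $LN_a = \bigcup_{p \in P_a} ln_{a,p}$ is the left neighborhood
    of $a$ in the model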
  • $A$ is left compatible with $B$ if $B \in LN_A$ or $A \in RN_B$,
    or if $A \neq B$ and $\exists p_A \in P_A$, $\exists p_B \in P_B$
    such that $p_A \neq p_B$
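
As a rough illustration of how a compiled left neighborhood can drive the propagation, here is a minimal Python sketch. It assumes flat sequence rules and a chain of fragments, each with a set of candidate labels, and it uses only the left-neighborhood test, not the full compatibility definition. The function names, the optional `initials` and `finals` tables, and the sample labels are hypothetical.

```python
def left_neighbors(rules, initials=None, finals=None):
    """LN[a]: labels allowed immediately to the left of a, over all rules."""
    initials, finals = initials or {}, finals or {}
    ln = {}
    for rhs in rules.values():
        for prev, nxt in zip(rhs, rhs[1:]):
            # by extension: finals of the left symbol precede initials of the right one
            for a in initials.get(nxt, {nxt}):
                ln.setdefault(a, set()).update(finals.get(prev, {prev}))
    return ln


def propagate(labels, ln):
    """Prune candidate labels until each one has a compatible neighbor."""
    labels = [set(s) for s in labels]
    changed = True
    while changed:
        changed = False
        for i in range(len(labels) - 1):
            left, right = labels[i], labels[i + 1]
            # keep b on the right only if some label on the left may precede it
            keep_right = {b for b in right if ln.get(b, set()) & left}
            # keep a on the left only if it may precede a surviving right label
            keep_left = {a for a in left
                         if any(a in ln.get(b, set()) for b in keep_right)}
            if keep_left != left or keep_right != right:
                labels[i], labels[i + 1] = keep_left, keep_right
                changed = True
    return labels


# Hypothetical model and fragment chain.
rules = {"Reference": ("Author", "Title", "Year")}
ln = left_neighbors(rules)
result = propagate([{"Author"}, {"Title", "Year"}, {"Year", "Author"}], ln)
print(result)
```

If no consistent labelling exists, some fragments end up with an empty label set, which corresponds to an inconsistent chain.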

13
Results
Group Vedette
Area Title
Principal Title
Crossing Title
End of the title
Cros. Formulae
Area Address / Date
Crossing Title
Address
Date
Area Collection
200 references: 75%
Group Cote
14
Results: scientific references
400 references: 99.8%
15
Results
Yua 95 J. Juan, Y. Y. Tang, and C. Y. Suen.
Four Directional Adjacency Graphs (fdag) and
their Application in Locating fields in Forms.
In Third International Conference on Document
Analysis and Recognition (ICDAR95), pages 752-755.
IEEE Computer Society Press, Aug. 1995.

Author (3): J. Juan, Y. Y. Tang, and C. Y. Suen
Title: Four Directional Adjacency Graphs (fdag) and their Application in Locating fields in Forms
Editor (0):
Month: Aug
Year: 1995
Volume:
Number:
Publisher: IEEE Computer Society Press
Address:
Pages: 752-755
Organization:
Booktitle: Third International Conference on Document Analysis and Recognition (ICDAR95)
Series:
Note:

16
Conclusion
  • Weak points
  • 25% lead to an inconsistent chain
  • Feasibility study without OCR
  • Weakness of the indices extraction algorithm
  • Local context handling
  • Strong points or improvements
  • Fast analysis
  • Structure well recognized for the other references
  • The method can be applied with OCR for
    better results
  • Global context can be applied (path
    consistency) at the cost of CPU time
  • Good for ambiguous models
  • Limits the number of hypotheses during the
    analysis
  • Limits the amount of backtracking