Groundtruthing for Performance Evaluation of Document Image Analysis Systems: a primer

M. Delalandre. Groundtruthing for Performance Evaluation of Document Image Analysis Systems: a primer. Digidoc Meeting, Tours, France, 6th of January 2012.

1
Groundtruthing for Performance Evaluation of Document Image Analysis Systems: a primer
  • Mathieu Delalandre
  • mathieu.delalandre_at_univ-tours.fr
  • Pattern Recognition and Image Analysis Group
  • Laboratory of Computer Science
  • François Rabelais University, Tours, France
  • Digidoc meeting, 6th of January 2012

2
Introduction
Performance evaluation is a cross-disciplinary research field found in a variety of domains. Its purpose is to develop frameworks that evaluate and compare a set of methods in order to select the one best suited to a given application.
Groundtruth must be reliable (i.e. 100% recognition rate) and exhaustive (label, localization, geometric transforms, noise estimation, metadata, etc.).
In the document image analysis field (apart from graphics), five main approaches exist.
Approach                          Speed  Reliability  Type             Constraint
GUI-based groundtruthing          -      -            Real (any)       None
Semi-automatic transcription                          Real (any)       None
Electronic document mapping                           Real (modern)    Electronic document
Transcript mapping                                    Real (old)       Transcription
Generation of synthetic document                      Synthetic (any)  None
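To make the role of the groundtruth concrete, here is a minimal sketch of how a recognition rate could be computed once labels and localizations are available. The `Item` record, the `(x0, y0, x1, y1)` box convention, and the 0.5 overlap threshold are assumptions chosen for illustration; they do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One groundtruth or detected element (hypothetical record layout)."""
    label: str
    box: tuple  # (x0, y0, x1, y1), assumed convention


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def recognition_rate(groundtruth, detections, min_iou=0.5):
    """Fraction of groundtruth items matched by a detection that has the
    same label and enough overlap. Each detection is used at most once."""
    unused = list(detections)
    hits = 0
    for gt in groundtruth:
        for det in unused:
            if det.label == gt.label and iou(gt.box, det.box) >= min_iou:
                unused.remove(det)
                hits += 1
                break
    return hits / len(groundtruth) if groundtruth else 0.0
```

A system evaluated this way can only be trusted as far as the groundtruth itself, which is why reliability and exhaustiveness matter.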
3
GUI-based groundtruthing
  • Principles: a GUI is plugged into a DIA system, and the groundtruth is obtained by user correction, e.g. TrueViz [Kan01], Xmillum [Hitz00], PinkPanther [Yanikoglu01], PerfectDoc [Yacoub05], etc.
  • Pros: discussion about groundtruth formalism.
  • Cons: time consuming because of the user correction; a specific DIA chain must be designed for every application; the groundtruth is still not reliable.
4
Semi-automatic transcription
  • Principles: exploit the context and user interaction to make the recognition process more robust. Transcription is achieved at the metadata level, without considering the images, e.g. [Bal08], [Lebourgeois01].
  • Algorithms: binarization and connected component labeling, shape context, image distance, clustering, etc.
  • Pros: interesting idea; 5% of labeling could result in 95% correct transcription.
  • Cons: the robustness of the approach is not proved yet. Is the transcription complete? What is the impact of the segmentation?
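The idea that a small amount of labeling can transcribe most of a document can be illustrated with a rough sketch: shapes are grouped by similarity, the user labels one representative per group, and the label is propagated to the rest. The feature tuples, the greedy "leader" clustering, the Euclidean distance, and the `ask_user` callback below are all assumptions made for illustration; the cited systems use shape context and other image distances.

```python
def distance(a, b):
    """Euclidean distance between two feature tuples (stand-in for an image distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cluster(features, threshold):
    """Greedy leader clustering: each item joins the first cluster whose
    leader lies within `threshold`, otherwise it starts a new cluster."""
    leaders = []      # index of the leader item of each cluster
    assignment = []   # cluster id of each item
    for i, feat in enumerate(features):
        for cid, lead in enumerate(leaders):
            if distance(feat, features[lead]) <= threshold:
                assignment.append(cid)
                break
        else:
            leaders.append(i)
            assignment.append(len(leaders) - 1)
    return leaders, assignment


def transcribe(features, ask_user, threshold=1.0):
    """Ask the user for one label per cluster and propagate it.
    Returns the labels for all items and the number of user interactions."""
    leaders, assignment = cluster(features, threshold)
    cluster_labels = [ask_user(i) for i in leaders]
    return [cluster_labels[c] for c in assignment], len(leaders)
```

In this sketch a segmentation error (two characters merged into one component) would simply create an extra cluster or pollute an existing one, which is the robustness concern raised in the cons above.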
5
Electronic document mapping
  • Principles: a registration algorithm estimates the global geometric transformation and then performs a robust local bitmap match to register an ideal document image to its corresponding scanned version, e.g. [Kan96], [Hobby98], [Beusekom08], [Kim02].
  • Algorithms: registration for transformation estimation, RAST (Recognition using Adaptive Subdivision of Transformation space), branch-and-bound algorithm.
  • Pros: the strongest approach in the literature.
  • Cons: cannot be applied to old documents, as an electronic version is mandatory.
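As a much simplified illustration of registration, the sketch below estimates a pure translation between an ideal bitmap and its scanned version by exhaustive search, then transfers groundtruth boxes from the ideal image to the scan. This is not RAST and not a branch-and-bound search: restricting the transformation to a translation, representing images as sets of black-pixel coordinates, and the `max_shift` bound are assumptions made to keep the example short.

```python
def register_translation(ideal, scanned, max_shift=5):
    """Return the (dx, dy) that maximizes the number of ideal black pixels
    landing on scanned black pixels, together with that overlap score."""
    best, best_score = (0, 0), -1
    for dx in range(-max_shift, max_shift + 1):
        for dy in range(-max_shift, max_shift + 1):
            score = sum((x + dx, y + dy) in scanned for x, y in ideal)
            if score > best_score:
                best, best_score = (dx, dy), score
    return best, best_score


def transfer_boxes(boxes, shift):
    """Move groundtruth boxes known on the ideal image onto the scanned image."""
    dx, dy = shift
    return {
        label: (x0 + dx, y0 + dy, x1 + dx, y1 + dy)
        for label, (x0, y0, x1, y1) in boxes.items()
    }
```

Once the transformation is known, every annotation of the electronic document becomes groundtruth for the scan, which is what makes this family of approaches attractive when an electronic version exists.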
6
Transcript mapping
  • Principles: transcript mapping eases the construction of document image segmentation groundtruth that includes text-image alignment, e.g. [Stamatopoulos10], [Zinger09], [Jawahar07], etc.
  • Algorithms: HMM, DTW.
  • Pros: when no electronic document exists, it is certainly the only valid way to obtain a groundtruth at the graphical level.
  • Cons: depends on the quality of the transcriptions; producing transcriptions is time consuming; the approach is more sensitive to segmentation errors.
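A minimal sketch of the DTW side of transcript mapping follows: a sequence of segmented word images is aligned with the words of the transcription by comparing one scalar feature per element. Using the word width in pixels as the feature, and predicting it from the character count of the transcript word, are assumptions for illustration only; the cited works use richer features.

```python
def dtw_align(image_feats, text_feats):
    """Dynamic time warping between two sequences of scalar features.
    Returns the total alignment cost and the list of matched index pairs."""
    n, m = len(image_feats), len(text_feats)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(image_feats[i - 1] - text_feats[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end to recover which image segment maps to which word.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path
```

For example, measured widths `[52, 28, 81]` against expected widths `[50, 30, 80]` (assuming about 10 pixels per character) align one to one. Because DTW allows several image segments to map to the same word, an over-segmented word still gets aligned, but a poor segmentation raises the cost and can shift the whole path.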
7
Generation of synthetic document
  • Principles: in such a system, the test documents are generated by an automatic system which combines pre-defined models of document components in a pseudo-random way. As the documents are synthetically generated, the groundtruth becomes automatically available, e.g. [Heroux07], [Zi05], etc.
  • Pros: no previous data is required; an efficient and exhaustive groundtruth is generated automatically.
  • Cons: synthetic is not real, and proving the similarity between synthetic and real data is not simple.
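The principle can be sketched in a few lines: component models are placed pseudo-randomly on a page, and the groundtruth is recorded as a by-product of the generation. The model dictionary (label mapped to a width and height), the binary page raster, and the page size below are hypothetical choices for illustration; real generators use richer component models and add degradations.

```python
import random


def generate_document(models, n_items, page_size=(200, 100), seed=0):
    """Place `n_items` randomly chosen components on a blank page.
    `models` maps a label to the (width, height) of its filled rectangle.
    Returns the page raster and the groundtruth (label and bounding box)."""
    rng = random.Random(seed)
    width, height = page_size
    page = [[0] * width for _ in range(height)]
    groundtruth = []
    for _ in range(n_items):
        label = rng.choice(sorted(models))
        w, h = models[label]
        x = rng.randint(0, width - w)
        y = rng.randint(0, height - h)
        # Draw the component; overlaps are allowed in this simplified sketch.
        for row in range(y, y + h):
            for col in range(x, x + w):
                page[row][col] = 1
        groundtruth.append({"label": label, "box": (x, y, x + w, y + h)})
    return page, groundtruth
```

Fixing the seed makes a test set reproducible, while the question of how close such pages are to real scans remains open, as noted in the cons above.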