Title: Digitization Workflow Management System for Massive Digitization Projects
The 2nd International Conference on Universal Digital Library 2006 (ICUDL 2006)
Mohamed Yakout, Noha Adly, Magdy Nagi
mohamed.yakout_at_bibalex.org, noha.adly_at_bibalex.org, magdy.nagi_at_bibalex.org
- Bibliotheca Alexandrina
- November 19, 2006
Goals
- Automate, track, and manage the digitization workflow.
- Flexibility in defining digitization workflow Phases.
- Support dynamic evolution and deviations, with history tracking.
- Flexible integration with the LIS and the Library Digital Repository.
- Accept external, partially digitized Jobs and start them in the proper Phase of the digitization workflow.
- Simultaneous management of multiple projects with a diversity of materials (books, journals, manuscripts, audio, video, slides, etc.).
Related Work
- Manual workflow management using several software packages (MS Excel, MS SharePoint, MS Project)
- Simple workflow tracking systems with limited capabilities
- Several digitization activities (digital capturing, image processing, OCRing, etc.) integrated in one software package
  - DOCWorks from CCS.
  - BookRestorer from i2s.
  - OUPS
- Limitations
  - Tightly coupled to certain tools; other tools cannot easily be integrated.
  - No resource management (e.g. workstations and users).
  - Lack of project and collection management.
  - Manual file handling between the storage server and clients.
  - No handling of workflow exceptions, dynamic evolution, or deviations except through manual intervention.
System Data Model
- Job: the object being digitized
  - A book by Naguib Mahfouz
  - Photos of an event
  - A map of Alexandria
  - Sheet music by Omar Khayrat
- Job Type: all types of materials in the system
  - Book, Manuscripts, Map, Journals, Audio, Video
- Phase: a task that should be applied within the digitization process
  - Scanning, Processing, OCRing, Encoding, Publishing, Zipping for archiving
- User: the system users, with several roles
  - Digital lab operators
  - Shift operators
  - Administrator
- Project: a logical grouping of Jobs
  - Nasser
  - AlexMed
  - AMEEL
- Workstation: the computer used to perform the Phase
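The entities on these slides can be sketched as plain Java types. This is only an illustration: the type names follow terms used elsewhere in the talk (Job, Job Type, Phase, users, projects, workstations), while every field name and the `moveTo` helper are assumptions, since the slides do not list attributes.

```java
// Hypothetical field names; the slides describe the entities but not their attributes.
enum Role { DIGITAL_LAB_OPERATOR, SHIFT_OPERATOR, ADMINISTRATOR }

record JobType(String name) {}            // e.g. Book, Map, Audio
record Project(String name) {}            // logical grouping of Jobs
record Phase(String name) {}              // a task such as Scanning or OCRing
record User(String login, Role role) {}
record Workstation(String hostname) {}    // computer on which a Phase is performed

// A Job ties the other entities together: what is being digitized, under
// which Project, and which Phase it is currently in.
record Job(String title, JobType type, Project project, Phase currentPhase) {
    // Returns a copy of this Job placed in another Phase.
    Job moveTo(Phase next) {
        return new Job(title, type, project, next);
    }
}

class DataModelDemo {
    public static void main(String[] args) {
        // The pairing of this Job with this Project is illustrative only.
        Job job = new Job("Map for Alexandria", new JobType("Map"),
                new Project("AlexMed"), new Phase("Scanning"));
        Job processed = job.moveTo(new Phase("Processing"));
        System.out.println(processed.currentPhase().name()); // prints "Processing"
    }
}
```

Keeping the Job immutable and returning a new value on each move is a design choice of this sketch; it makes it easy to keep earlier states around for history tracking.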
System Architecture
System Handlers
<Phase Name="Book Arabic OCR">
  <PrePhase>
    <Physical Mode="UnRestricted">
      <Folder Name="OTIFF" Create="false" ToDestination="false" NewName="OTIFF" Mode="Restricted">
        <File Name="OriginalFiles" Type="tif" Count="" ToDestination="false" Compare=""/>
      </Folder>
      . .
    </Physical>
  </PrePhase>
  <PostPhase>
    <Physical Mode="UnRestricted">
      <Folder Name="TXT" Create="false" ToDestination="true" NewName="TXT" Mode="Restricted">
        <File Name="" Type="frf" Count="1" ToDestination="true" Compare=""/>
        <File Name="" Type="art" Count="1" ToDestination="true" Compare=""/>
      </Folder>
    </Physical>
    <Database>
      <Field Name="Font" DisplayName="Font Family "/>
      <Field Name="LrnPage" DisplayName="Learn Page "/>
      . .
    </Database>
    <ReflectionCall Method="packageName.doSomething"/>
  </PostPhase>
</Phase>
- XML Phases Definition Handler
  - Pre-Phase and Post-Phase
  - Physical section
  - Database section
  - Reflection Call
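To make the handler idea concrete, here is a minimal sketch of how a Phase definition could be read and its Reflection Call executed, using only the standard JDK DOM parser and `java.lang.reflect`. It is not the DWMS implementation: the classes `PhaseDefinitionReader` and `IngestHook`, and the assumption that `Method` names a public static no-argument method as `ClassName.method`, are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class PhaseDefinitionReader {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid phase definition", e);
        }
    }

    /** Returns the Type attribute of every File listed under PostPhase. */
    static List<String> postPhaseFileTypes(Document doc) {
        List<String> types = new ArrayList<>();
        Element post = (Element) doc.getElementsByTagName("PostPhase").item(0);
        NodeList files = post.getElementsByTagName("File");
        for (int i = 0; i < files.getLength(); i++) {
            types.add(((Element) files.item(i)).getAttribute("Type"));
        }
        return types;
    }

    /** Invokes the static no-argument method named by the ReflectionCall element. */
    static Object runReflectionCall(Document doc) {
        Element call = (Element) doc.getElementsByTagName("ReflectionCall").item(0);
        String target = call.getAttribute("Method");   // assumed form: "ClassName.method"
        int dot = target.lastIndexOf('.');
        try {
            Class<?> cls = Class.forName(target.substring(0, dot));
            Method method = cls.getMethod(target.substring(dot + 1));
            return method.invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Reflection call failed: " + target, e);
        }
    }
}

// Hypothetical hook standing in for packageName.doSomething.
class IngestHook {
    public static String ingest() { return "ingested"; }
}

class PhaseDefinitionDemo {
    public static void main(String[] args) {
        String xml = "<Phase Name=\"Book Arabic OCR\"><PostPhase>"
                + "<Physical Mode=\"UnRestricted\"><Folder Name=\"TXT\">"
                + "<File Name=\"\" Type=\"frf\" Count=\"1\"/>"
                + "<File Name=\"\" Type=\"art\" Count=\"1\"/>"
                + "</Folder></Physical>"
                + "<ReflectionCall Method=\"" + IngestHook.class.getName() + ".ingest\"/>"
                + "</PostPhase></Phase>";
        Document doc = PhaseDefinitionReader.parse(xml);
        System.out.println(PhaseDefinitionReader.postPhaseFileTypes(doc)); // [frf, art]
        System.out.println(PhaseDefinitionReader.runReflectionCall(doc));  // ingested
    }
}
```

Resolving the hook by name at run time is what would let a new Phase be wired to new code by editing the XML alone, without recompiling the workflow engine.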
System Modules
- Check-In
  - Plug-in based, for integration.
  - Creates the Job in the system.
  - Assigns the Job to any Phase.
- Check-Out
  - Uses the Java Reflection Call section of the XML Phases Definition.
  - Ingests the Job's digital objects into the repository.
System Modules
- Phases Manager
  - Request a new Job.
  - Download the Job's folders and files.
  - Submit the Job back to the system to continue with other Phases.
  - Reject a Job, recommending another Phase and specifying the reasons.
  - Redirect a Job away from the default Phase Sequence.
  - Provide file-level information to help solve problems.
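The submit, reject, and redirect operations can be sketched as transitions on a Job that records each move in its history. This is a simplified illustration under assumptions: the class `TrackedJob`, its method names, the history format, and the Phase order used in the demo are hypothetical and not taken from DWMS.

```java
import java.util.ArrayList;
import java.util.List;

class TrackedJob {
    private final List<String> defaultSequence;          // default Phase Sequence
    private final List<String> history = new ArrayList<>();
    private String currentPhase;

    TrackedJob(List<String> defaultSequence) {
        this.defaultSequence = List.copyOf(defaultSequence);
        this.currentPhase = defaultSequence.get(0);
    }

    /** Normal completion: continue with the next Phase of the default sequence. */
    void submit() {
        int i = defaultSequence.indexOf(currentPhase);
        if (i < 0 || i + 1 >= defaultSequence.size()) {
            throw new IllegalStateException("No default successor for " + currentPhase);
        }
        move("submit", defaultSequence.get(i + 1), "");
    }

    /** Rejection: send the Job to a recommended Phase and record the reason. */
    void reject(String recommendedPhase, String reason) {
        move("reject", recommendedPhase, reason);
    }

    /** Deviation: send the Job to any Phase, inside or outside the sequence. */
    void redirect(String phase) {
        move("redirect", phase, "");
    }

    private void move(String action, String to, String note) {
        history.add(action + ": " + currentPhase + " -> " + to
                + (note.isEmpty() ? "" : " (" + note + ")"));
        currentPhase = to;
    }

    String currentPhase() { return currentPhase; }
    List<String> history() { return history; }
}

class PhasesManagerDemo {
    public static void main(String[] args) {
        // Phase names come from the slides; their order here is assumed.
        TrackedJob job = new TrackedJob(List.of("Scanning", "Processing", "OCRing", "QA"));
        job.submit();                               // Scanning -> Processing
        job.reject("Scanning", "blurred pages");    // Processing -> Scanning
        job.submit();                               // Scanning -> Processing
        job.redirect("QA");                         // Processing -> QA, skipping OCRing
        job.history().forEach(System.out::println);
        System.out.println(job.currentPhase());     // prints "QA"
    }
}
```

Because every transition goes through one private `move` method, the history cannot get out of step with the Job's actual Phase.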
System Modules (Contd)
- Reporting
  - Workflow Tracking
  - Pending Items
  - Late Jobs
  - Operator rates
  - Build Customized Report
- Archiving
  - On different media of different sizes, and on online storage
- Administration
BA Digitization Workflow
Quality Assurance
- Supported at two different stages:
  - QA information is maintained at the file level as files move from one Phase to another.
  - A QA Phase is defined in the Digitization Phase Sequence as the last Phase before Archiving.
Achieving Flexibility Using DWMS
- The defined Phase Sequence for a Job Type is a guide rather than a prescription.
- A Phase may or may not be part of the Phase Sequence; the operator can assign the Job to any of the defined Phases.
- Jobs can be forwarded dynamically to another Phase in the Phase Sequence.
- Changes in the Phase Sequence affect both current and new Jobs in the system, leading to natural process evolution.
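One way to obtain the behavior described in the last point is to store the Phase Sequence once per Job Type and resolve the next Phase only when a Job moves, so that Jobs already in progress see later edits. The sketch below illustrates that idea; `PhaseSequenceRegistry` and its methods are hypothetical names, and the result is only a suggestion that the operator may override.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class PhaseSequenceRegistry {
    private final Map<String, List<String>> byJobType = new HashMap<>();

    /** Defines or replaces the Phase Sequence of a Job Type. */
    void define(String jobType, List<String> phases) {
        byJobType.put(jobType, new ArrayList<>(phases));
    }

    /** Suggested next Phase, resolved against the sequence as currently defined. */
    Optional<String> suggestNext(String jobType, String currentPhase) {
        List<String> sequence = byJobType.getOrDefault(jobType, List.of());
        int i = sequence.indexOf(currentPhase);
        return (i >= 0 && i + 1 < sequence.size())
                ? Optional.of(sequence.get(i + 1))
                : Optional.empty();
    }
}

class PhaseSequenceDemo {
    public static void main(String[] args) {
        PhaseSequenceRegistry registry = new PhaseSequenceRegistry();
        registry.define("Book", List.of("Scanning", "Processing", "QA"));

        // A Job of type Book is currently in Processing.
        System.out.println(registry.suggestNext("Book", "Processing").orElse("none")); // QA

        // The sequence evolves while that Job is still in progress.
        registry.define("Book", List.of("Scanning", "Processing", "OCRing", "QA"));
        System.out.println(registry.suggestNext("Book", "Processing").orElse("none")); // OCRing
    }
}
```

Returning an empty `Optional` for a Phase outside the sequence leaves the decision to the operator, which matches the idea of the sequence being a guide.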
Job Life Cycle
Future Work
- Check-out plug-in for Fedora.
- Check-in plug-ins to support various metadata standard formats (MODS, DC, VAR, etc.).
- Enhance the software interface with graphical tools to help design and follow the digitization process.
Thank You
- mohamed.yakout_at_bibalex.org