Expected category of contribution: OCR for Indian Languages

About This Presentation

Slides: 8

Provided by: bks7

Transcript and Presenter's Notes

1
OPTICAL CHARACTER RECOGNITION SYSTEM FOR INDIAN LANGUAGES
Proposer: CDAC Noida
Name of the company: CDAC Noida
Language/Language pair: Hindi, Punjabi, Marathi, Tamil, Telugu, Malayalam, Bangla, Oriya
Expected category of contribution: OCR for Indian Languages
2

Strengths of CDAC Noida
Technical capabilities: NLP Lab equipped with the necessary software and 50 trained engineers
Previous collaboration with universities/R&D institutions:
ABBYY Software Ltd, Moscow
W3C
Thapar Institute of Engineering, Patiala
University of Hyderabad
Jamia Millia
IIT Guwahati
CDAC Trivandrum
Utkal Univ
CDAC Kolkata
ISI Kolkata
IISc Bangalore
CSIO Chandigarh
IIT Kanpur
IIT Roorkee
ELDA, France
DRDO, Delhi
CSTT, Delhi
BITS Pilani
Banasthali Vidyapeeth
COCOSDA, Japan
Kumaon Univ, Nainital
MGAHV, Wardha
Delhi Press Prakashan
Pustak Mahal
Kendriya Hindi Sansthan

3
Previous work done in this or similar areas

A beta version of a product named Swarnakriti
(an integration of Indian-language OCRs and Hindi
TTS with a Unicode word processor) has been
released and made available for public-domain
use through the ILDC portal. The product has been
appreciated by users.
Chitraksharika: OCR for Devanagari script.

4
Proposed Approach and Architecture
A component-based approach will be used to develop the OCR system.
The major components to be developed for the OCRs are:
1. Page Layout Analysis Engine (Page Segmentation)
2. Visual Component Extraction Engine
3. Visual Component Recognizer Engine (Template Based)
4. Post Processor Engine (Error Correction and Detection Module)
5. Testing Data Annotation Tools
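
The recognizer and post-processor stages listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not CDAC Noida's implementation: the 3x3 glyph bitmaps, the template set, the lexicon, and every function name are hypothetical, Latin labels stand in for script characters, and the matcher simply counts agreeing pixels.

```python
from difflib import get_close_matches

# Hypothetical 3x3 binary glyph templates (1 = ink). Real templates would be
# normalized images of script characters; Latin labels are used for brevity.
TEMPLATES = {
    "I": ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
    "T": ((1, 1, 1), (0, 1, 0), (0, 1, 0)),
}

# Hypothetical lexicon consulted by the post-processor.
LEXICON = ["TILT", "LIT"]


def match_score(glyph, template):
    """Count the pixels on which the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize_glyph(glyph):
    """Template-based recognizer: return the label of the best-scoring template."""
    return max(TEMPLATES, key=lambda label: match_score(glyph, TEMPLATES[label]))


def post_process(word, lexicon=LEXICON):
    """Error detection and correction: a word missing from the lexicon is
    replaced by its closest lexicon entry, if one is similar enough."""
    if word in lexicon:
        return word
    candidates = get_close_matches(word, lexicon, n=1, cutoff=0.6)
    return candidates[0] if candidates else word


def recognize_word(glyphs):
    """Run extracted glyphs through the recognizer, then the post-processor."""
    raw = "".join(recognize_glyph(glyph) for glyph in glyphs)
    return post_process(raw)
```

In this sketch a glyph with one flipped pixel still maps to its nearest template, and a word that the recognizer gets slightly wrong is pulled back to a lexicon entry; a word with no sufficiently close entry is passed through unchanged.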
5
Architecture
The proposed architecture of the system for Phase I
and Phase II is as follows.
Phase I
In Phase I, we will use the FineReader Engine 7.1
SDK from ABBYY International, Moscow, as the
Document/Page Analysis Engine. All remaining
components will be developed by CDAC Noida.
Phase II
In Phase II, we will use our own
Document/Page Analysis Engine.
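
One way to read the two-phase plan is that the Document/Page Analysis Engine sits behind a fixed interface, so the SDK-backed engine of Phase I can be swapped for an in-house engine in Phase II without changing the other components. The sketch below assumes that reading. All class and function names are hypothetical, the Phase I adapter is a stub that makes no real ABBYY SDK calls, and the Phase II engine is a toy row-band segmenter.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A rectangular text region on a page (hypothetical representation)."""
    x: int
    y: int
    width: int
    height: int


class PageAnalysisEngine(ABC):
    """Interface that the rest of the OCR system depends on."""

    @abstractmethod
    def analyze(self, page):
        """Return a list of Regions for a page given as rows of 0/1 pixels."""


class VendorSdkEngine(PageAnalysisEngine):
    """Phase I placeholder: would delegate to a third-party SDK.
    No real SDK is called; a caller-supplied function stands in for it."""

    def __init__(self, sdk_call):
        self._sdk_call = sdk_call

    def analyze(self, page):
        return [Region(*box) for box in self._sdk_call(page)]


class InHouseEngine(PageAnalysisEngine):
    """Phase II placeholder: a toy segmenter that groups consecutive
    rows containing ink into horizontal bands."""

    def analyze(self, page):
        regions, start = [], None
        width = len(page[0]) if page else 0
        for y, row in enumerate(page):
            has_ink = any(row)
            if has_ink and start is None:
                start = y
            elif not has_ink and start is not None:
                regions.append(Region(0, start, width, y - start))
                start = None
        if start is not None:
            regions.append(Region(0, start, width, len(page) - start))
        return regions


def run_layout_analysis(page, engine):
    """Downstream code sees only the interface, not the concrete engine."""
    return engine.analyze(page)
```

With this arrangement, moving from Phase I to Phase II amounts to passing a different engine object to `run_layout_analysis`.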
6
Phase I
User-Friendly Graphical User Interface
Devanagari Script
Gurmukhi Script
Bangla Script
Tamil Script
Telugu Script
OCRs
ABBYY Software Ltd FineReader Engine 7.1 SDK:
Document Layout Analysis, Layout Retention, API Engine
Respective modules will be developed
7
Phase II
