Expected category of contribution: OCR for Indian Languages

Slides: 8
Provided by: bks7

Transcript and Presenter's Notes

1
OPTICAL CHARACTER RECOGNITION SYSTEM FOR INDIAN
LANGUAGES
Proposer: CDAC Noida
Name of the company: CDAC Noida
Language/Language pair: Hindi, Punjabi, Marathi,
Tamil, Telugu, Malayalam, Bangla, Oriya
Expected category of contribution: OCR for Indian
Languages
2
  • Strength of CDAC Noida
  • Technical capabilities: NLP Lab equipped with the
    necessary software and 50 trained engineers
  • Previous collaboration with universities/R&D
    institutions:
  • ABBYY Software Ltd, Moscow; W3C
  • Thapar Institute of Engineering, Patiala
  • University of Hyderabad; Jamia Millia
  • IIT Guwahati; CDAC Trivandrum
  • Utkal Univ; CDAC Kolkata
  • ISI Kolkata; IISc Bangalore
  • CSIO Chandigarh; IIT Kanpur
  • IIT Roorkee; ELDA, France
  • DRDO, Delhi; CSTT, Delhi
  • BITS Pilani; Banasthali Vidyapeeth
  • COCOSDA, Japan; Kumaon Univ, Nainital
  • MGAHV, Wardha; Delhi Press Prakashan
  • Pustak Mahal; Kendriya Hindi Sansthan

3
Previous work done in this or similar areas
  • A beta version of the product Swarnakriti
    (integration of Indian-language OCRs and Hindi
    TTS with a Unicode word processor) has been
    released and made publicly available through
    the ILDC portal. The product has been
    appreciated by users.
  • Chitraksharika: OCR for Devanagari script.

4
Proposed Approach and Architecture
A component-based approach will be used for
development of the OCR system. The major components
to be developed for the OCRs are:
1. Page Layout Analysis Engine (Page Segmentation)
2. Visual Component Extraction Engine
3. Visual Component Recognizer Engine (Template Based)
4. Post Processor Engine (Error Correction and
   Detection Module)
5. Testing Data Annotation Tools
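
To make the template-based Visual Component Recognizer Engine more concrete, here is a minimal sketch of nearest-template matching. It is an illustration only, not the proposed engine: the bitmap representation, the Hamming-distance measure, the toy 3x3 templates, and the names `hamming_distance`, `recognize`, and `TEMPLATES` are all assumptions introduced for this example.

```python
from typing import Dict, List, Tuple

# Assumption: each visual component is a small binary bitmap, stored as
# rows of '0'/'1' characters, and all bitmaps share the same dimensions.
Bitmap = List[str]


def hamming_distance(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels at which two equally sized bitmaps differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have identical dimensions")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(component: Bitmap, templates: Dict[str, Bitmap]) -> Tuple[str, int]:
    """Return the label of the closest template and its pixel distance."""
    if not templates:
        raise ValueError("at least one template is required")
    label = min(templates, key=lambda k: hamming_distance(component, templates[k]))
    return label, hamming_distance(component, templates[label])


# Hypothetical toy templates; a real system would hold one or more
# templates per glyph or glyph fragment of the target script.
TEMPLATES: Dict[str, Bitmap] = {
    "bar": ["010", "010", "010"],
    "dash": ["000", "111", "000"],
}

# A vertical bar with one noisy pixel in the bottom-right corner.
noisy = ["010", "010", "011"]
print(recognize(noisy, TEMPLATES))  # ('bar', 1)
```

The distance returned alongside the label could be handed to the Post Processor Engine as a rough confidence signal, so that weak matches are flagged for error detection.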
5
Architecture
The proposed architecture of the system for Phase I
and Phase II is as follows:
Phase I
In Phase I, we will use the ABBYY International,
Moscow Fine Reader Engine 7.1 SDK as the
Document/Page Analysis Engine. All remaining
components will be developed by CDAC Noida.
Phase II
In Phase II, we will use our own Document/Page
Analysis Engine, developed in-house.
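
One way to read the two-phase plan is that the Document/Page Analysis Engine sits behind a fixed interface, so the Phase I engine can later be replaced without touching the other components. The sketch below illustrates that idea under stated assumptions: every class and method name is hypothetical, the Phase I class is only a placeholder that makes no real SDK call, and the Phase II class uses a deliberately naive blank-row split rather than any actual segmentation method from the proposal.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List

Page = List[List[int]]  # assumption: binary image, 1 = ink, 0 = background


@dataclass
class Region:
    """A rectangular block found on a page (hypothetical structure)."""
    x: int
    y: int
    width: int
    height: int


class LayoutEngine(ABC):
    """Common interface that both phases' page-analysis engines implement."""

    @abstractmethod
    def segment(self, page: Page) -> List[Region]:
        ...


class ThirdPartyLayoutEngine(LayoutEngine):
    """Phase I placeholder: stands in for a licensed SDK.

    No vendor API is called; the whole page is returned as one region.
    """

    def segment(self, page: Page) -> List[Region]:
        width = len(page[0]) if page else 0
        return [Region(0, 0, width, len(page))]


class InHouseLayoutEngine(LayoutEngine):
    """Phase II placeholder: splits the page into bands at blank rows."""

    def segment(self, page: Page) -> List[Region]:
        regions: List[Region] = []
        width = len(page[0]) if page else 0
        start = None  # first row of the band currently being scanned
        for y, row in enumerate(page):
            has_ink = any(row)
            if has_ink and start is None:
                start = y
            elif not has_ink and start is not None:
                regions.append(Region(0, start, width, y - start))
                start = None
        if start is not None:
            regions.append(Region(0, start, width, len(page) - start))
        return regions


class OcrPipeline:
    """Downstream components depend only on the LayoutEngine interface."""

    def __init__(self, layout_engine: LayoutEngine) -> None:
        self.layout_engine = layout_engine

    def run(self, page: Page) -> List[Region]:
        # Extraction, recognition and post-processing would follow here.
        return self.layout_engine.segment(page)


PAGE: Page = [[0, 0], [1, 0], [0, 0], [1, 1], [0, 1]]
print(OcrPipeline(ThirdPartyLayoutEngine()).run(PAGE))
print(OcrPipeline(InHouseLayoutEngine()).run(PAGE))
```

Moving from Phase I to Phase II then amounts to passing a different engine to `OcrPipeline`; the extraction, recognition and post-processing stages are unaffected.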
6
Phase I
[Architecture diagram; recoverable labels:]
  • User-Friendly Graphical User Interface
  • OCRs: Devanagari Script, Gurmukhi Script, Bangla
    Script, Tamil Script, Telugu Script
  • ABBYY Software Ltd Fine Reader Engine 7.1 SDK:
    Document Layout Analysis, Layout Retention API
    Engine
  • Respective modules will be developed
7
Phase II