Part of Speech Tagging of Indian languages using Hidden Markov Model

1
Part of Speech Tagging of Indian languages using Hidden Markov Model
Ph.D. Seminar Report by Manish Shrivastava
Roll no. 03405002
Under the guidance of Dr. Pushpak Bhattacharyya
2
Presentation Outline
  • Part of Speech Tagging
  • Motivation
  • Existing Taggers
  • Need for Part of Speech Taggers for Indian
    languages
  • Part of Speech Tagging of Indian languages
  • The Morphological Perspective
  • Morphological Advantages
  • Hidden Markov Model
  • Conclusions
  • Future work

3
Part of Speech Tagging
  • The task of assigning POS tags to words
  • Requires selecting among multiple tags that may apply to a word
  • Can be used for further NLP tasks
    • Information extraction, Question Answering, etc.

4
Example of POS tagging
5
Motivation
  • Lack of significant tools for Indian languages
  • Dependence of other NLP activities on PoS tagging
  • Failure of existing techniques on Indian Languages

6
Existing Taggers
  • Techniques used for foreign languages
    • Rule Based Tagging
    • Stochastic Tagging

7
Overview of PoS tagging
8
Existing Taggers
  • Rule Based Taggers
    • Brill tagger
  • Stochastic Taggers
    • CLAWS tagger
    • Tree tagger

9
Need for New Taggers for Hindi
  • The existing taggers fail on Indian languages
  • The grammatical structure differs
  • Hindi has free word order
  • Stochastic taggers alone cannot give good performance
  • Morphological information is not taken into account

10
Example of free word order
11
Part of Speech tagging of Indian Languages
  • To make efficient taggers
    • Get morphological information
    • Use heuristics to exploit morphological information

12
Morphological Perspective
  • Three kinds of word morphology
    • Verb
    • Noun
    • Adjective

13
Morphological Perspective
  • Noun Morphology
    • Depicting possession
      • लड़का ("boy"), possession: लड़के का
    • Depicting number
      • लड़का ("boy"), plural: लड़के

14
Morphological Perspective
  • Verb Morphology
    • Tense (verb खेल, "play")
      • लड़के खेल रहे हैं। ("The boys are playing.")
      • लड़के खेलते थे। ("The boys used to play.")
      • लड़के खेलना चाहते हैं। ("The boys want to play.")

15
Morphological Advantage
  • POS tag heuristic
    • Noun
      • लड़कों: suffix -oM (ों)
      • सहेलियों: suffix -iyoN (ियों)
    • Verb
      • पढ़ूंगा: suffix -UMgA (ूंगा)
      • पढ़ता: suffix -wA (ता)
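
The suffix heuristic on this slide can be sketched as a small lookup table. This is only an illustration, assuming the four suffixes listed above; the Devanagari spellings are a reconstruction of the slide's font-encoded examples, and the tag names (`NOUN`, `VERB`, `UNK`) and the function `guess_tag` are hypothetical, not taken from the presentation.

```python
# Hypothetical suffix table: only these four suffixes appear on the slide.
SUFFIX_RULES = [
    ("ियों", "NOUN"),  # as in सहेलियों
    ("ों", "NOUN"),    # as in लड़कों
    ("ूंगा", "VERB"),  # as in पढ़ूंगा
    ("ता", "VERB"),    # as in पढ़ता
]

# Try longer suffixes first so that "ियों" is matched before "ों".
_ORDERED_RULES = sorted(SUFFIX_RULES, key=lambda rule: -len(rule[0]))


def guess_tag(word, default="UNK"):
    """Return a coarse POS guess from the word's suffix, or `default`."""
    for suffix, tag in _ORDERED_RULES:
        # Require a non-empty stem in front of the suffix.
        if len(word) > len(suffix) and word.endswith(suffix):
            return tag
    return default


guesses = {w: guess_tag(w) for w in ["लड़कों", "सहेलियों", "पढ़ूंगा", "पढ़ता", "घर"]}
```

Suffix rules of this kind over-generate: the noun पिता ("father") also ends in ता and would be guessed as a verb. That is one reason to treat such guesses as evidence for a statistical tagger rather than as final decisions.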

16
Morphological Advantages
  • Morphological strength of Hindi helps in
    efficient tagging
  • The morphological information can be used for
    further tasks

17
The Tool: Hidden Markov Model
  • Why HMM?
    • Underlying (hidden) events probabilistically generate the observed
      surface events
    • The models can be trained using the Expectation Maximization
      algorithm
    • Easy to port to other languages
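
For tagging, the underlying events are the POS tags and the surface events are the words, so tagging amounts to recovering the most probable hidden tag sequence. The slides do not name a decoding algorithm; the sketch below assumes Viterbi decoding, the usual choice, over a state-emission HMM. All probabilities, the tag set, and the informally romanized words are made-up toy values.

```python
import math


def viterbi(words, tags, pi, a, b, floor=1e-12):
    """Most probable tag sequence for a non-empty word list (log space)."""
    def logp(p):
        # Floor zero probabilities so that log() is always defined.
        return math.log(max(p, floor))

    # best[t][j]: best log-probability of any path ending in tag j at t.
    best = [{j: logp(pi.get(j, 0.0)) + logp(b.get(j, {}).get(words[0], 0.0))
             for j in tags}]
    back = []  # back[t-1][j]: best predecessor of tag j at position t
    for word in words[1:]:
        scores, pointers = {}, {}
        for j in tags:
            def score_from(i, j=j):
                return best[-1][i] + logp(a.get(i, {}).get(j, 0.0))
            i_best = max(tags, key=score_from)
            scores[j] = score_from(i_best) + logp(b.get(j, {}).get(word, 0.0))
            pointers[j] = i_best
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda j: best[-1][j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy parameters (hypothetical): "khel" can be a noun or a verb.
tags = ["NOUN", "VERB", "AUX"]
pi = {"NOUN": 0.8, "VERB": 0.1, "AUX": 0.1}
a = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.6, "AUX": 0.1},
    "VERB": {"NOUN": 0.2, "VERB": 0.1, "AUX": 0.7},
    "AUX": {"NOUN": 0.6, "VERB": 0.2, "AUX": 0.2},
}
b = {
    "NOUN": {"ladke": 0.5, "khel": 0.5},
    "VERB": {"khel": 0.4, "khelte": 0.6},
    "AUX": {"hain": 1.0},
}

tagged = viterbi(["ladke", "khel", "khelte", "hain"], tags, pi, a, b)
print(tagged)  # ['NOUN', 'NOUN', 'VERB', 'AUX']
```

With these toy numbers, VERB is the better local guess for "khel" after the first two words, but the following "khelte" makes NOUN the better choice for the whole sentence. This is the ambiguity resolution that a sequence model provides over word-by-word tagging.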

18
Hidden Markov Model
Example of a Hidden Markov Model
19
Hidden Markov Model
  • The Parameters
    • π_i: initial state probabilities
    • a_ij: state transition probabilities
    • b_ijk: probability of emitting the kth symbol in the
      transition from i to j
  • Estimation
    • Initial estimation done with training data
    • Re-estimation done using the Baum-Welch algorithm
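
The initial estimation step can be sketched as relative-frequency counting over a hand-tagged corpus. This is a minimal illustration under stated assumptions: it uses the common state-emission simplification (a symbol is emitted by a state) rather than the arc-emission form on the slide, it does no smoothing, and the toy corpus, tag names, and function name `estimate_hmm` are hypothetical. Baum-Welch re-estimation would start from values like these and is not shown.

```python
from collections import Counter, defaultdict


def estimate_hmm(tagged_sentences):
    """Relative-frequency estimates (pi, a, b) from (word, tag) sentences."""
    init = Counter()              # counts of sentence-initial tags
    trans = defaultdict(Counter)  # trans[i][j]: tag i followed by tag j
    emit = defaultdict(Counter)   # emit[j][w]: tag j emitting word w
    for sentence in tagged_sentences:
        prev = None
        for word, tag in sentence:
            if prev is None:
                init[tag] += 1
            else:
                trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag

    def normalize(counter):
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    pi = normalize(init)
    a = {i: normalize(counts) for i, counts in trans.items()}
    b = {j: normalize(counts) for j, counts in emit.items()}
    return pi, a, b


# Toy tagged corpus in informal romanization (hypothetical data).
corpus = [
    [("ladke", "NOUN"), ("khelte", "VERB"), ("the", "AUX")],
    [("ladke", "NOUN"), ("padhte", "VERB"), ("the", "AUX")],
    [("khel", "NOUN"), ("accha", "ADJ"), ("hai", "AUX")],
]
pi, a, b = estimate_hmm(corpus)
```

Events never seen in the training data get probability zero here, so a practical tagger would add smoothing or rely on re-estimation from untagged text.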

20
Conclusions
  • Part of Speech taggers for Hindi should use
    morphological information
  • To make efficient taggers, we must allow the use of
    heuristics
  • Hidden Markov Models can be used for portable
    taggers