Title: Building a feature-rich POS tagger for morphologically rich languages: Experiences in Hindi
1 Building a feature-rich POS tagger for morphologically rich languages: Experiences in Hindi
- Aniket Dalal
- Kumar Nagaraj
- Uma Sawant
- Sandeep Shelke
- Pushpak Bhattacharyya
2 Motivation
- POS tagging: preparation for higher-level NLP tasks
  - Parsing
  - Named Entity Recognition
  - Translation
- Challenges in Hindi POS tagging
  - Morphologically rich
  - Free word order language
  - Long-distance dependencies
3 Outline
- Maximum Entropy Markov Model (MEMM)
- System Architecture
- Feature Functions
- Experimental Setup
- Results and Performance Analysis
- Conclusion and Future Work
4 Maximum Entropy Markov Model
- MEMM: a feature-based exponential probabilistic model
- Feature function: captures a relevant aspect of the language
- Fix the best feature set
- Training: assign to the features the weights that maximize the entropy of the model
- Deployment: choose the most probable tag sequence for a sentence
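As a concrete illustration of this model, here is a minimal sketch with toy, hand-picked features and weights (hypothetical, not the paper's trained model), scoring each tag as an exponential of a weighted feature sum and decoding greedily:

```python
import math

# Toy tag set and hand-set feature weights (illustrative assumptions only).
TAGS = ["N", "VM", "ADJ"]
WEIGHTS = {
    ("suffix=ing", "VM"): 2.0,
    ("prev_tag=N", "VM"): 0.5,
    ("word=mango", "N"): 3.0,
    ("prev_tag=<s>", "N"): 0.3,
}

def features(word, prev_tag):
    """Active features for one position (word identity, previous tag, suffix)."""
    feats = [f"word={word}", f"prev_tag={prev_tag}"]
    if word.endswith("ing"):
        feats.append("suffix=ing")
    return feats

def tag_probs(word, prev_tag):
    """p(tag | history) under the exponential model, normalised over tags."""
    scores = {
        t: math.exp(sum(WEIGHTS.get((f, t), 0.0) for f in features(word, prev_tag)))
        for t in TAGS
    }
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}

def greedy_tag(sentence):
    """Greedy left-to-right decoding; a full system would use Viterbi-style search."""
    prev, out = "<s>", []
    for w in sentence:
        probs = tag_probs(w, prev)
        prev = max(probs, key=probs.get)
        out.append(prev)
    return out

print(greedy_tag(["mango", "rotting"]))  # ['N', 'VM']
```

The greedy decoder is a simplification; the deployment step described above searches for the globally most probable tag sequence.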
5 System Architecture
6 Feature Functions
- Contextual
- Morphological
- Categorical
- Compound
- Lexical
7 Contextual Features
- Sense disambiguation
- Trade-off between large and small context windows
- Example
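A sketch of context-window feature extraction, assuming a fixed window of words around the current position (the window size and padding tokens here are illustrative, not the paper's exact configuration):

```python
def context_features(words, i, window=2):
    """Words at relative offsets -window..+window around position i,
    padded with sentence-boundary markers."""
    padded = ["<s>"] * window + list(words) + ["</s>"] * window
    center = i + window
    return [
        f"w[{off}]={padded[center + off]}"
        for off in range(-window, window + 1)
    ]

# Romanised Hindi example sentence, for illustration only.
print(context_features(["raam", "aam", "khaata", "hai"], 1))
```

A larger window captures long-distance cues but makes features sparser; this trade-off is the point of the bullet above.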
8 Morphological Features
- Suffix list
- Useful for tagging unseen words
- Example (suffix)
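A sketch of a suffix feature. The suffix list below is a hypothetical, romanised stand-in; the paper builds a Hindi suffix list with language-specific resources:

```python
# Hypothetical romanised suffix entries, for illustration only.
SUFFIXES = ["taa", "naa", "on", "en"]

def suffix_feature(word):
    """Feature for the longest listed suffix of the word, or None.
    Useful for unseen words, whose identity feature never fired in training."""
    matches = [s for s in SUFFIXES if word.endswith(s)]
    return f"suffix={max(matches, key=len)}" if matches else None

print(suffix_feature("khaataa"))  # suffix=taa
```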
9 Categorical Features
- List of POS tags associated with a word
- Exactly one POS tag
- Example
  - आम - noun (mango)
  - आम - adj (common)
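A sketch of a categorical (ambiguity-class) feature, using a small hypothetical lexicon; the feature value is the set of POS tags a dictionary lists for the word:

```python
# Hypothetical lexicon entries, romanised for illustration.
LEXICON = {
    "aam": {"N", "ADJ"},   # 'mango' (noun) / 'common' (adjective)
    "raam": {"PPN"},       # unambiguous: exactly one POS tag
}

def categorical_feature(word):
    """Sorted tag set of the word, or UNKNOWN for out-of-lexicon words."""
    tags = sorted(LEXICON.get(word, set()))
    return "tags=" + ("|".join(tags) if tags else "UNKNOWN")

print(categorical_feature("aam"))   # tags=ADJ|N
print(categorical_feature("raam"))  # tags=PPN
```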
10 Compound Features
- Combine information from lexicon and dictionary
- Condition-based features
- Example
  - If the word is present in the lexicon as PPN:
    - Is the word PPN according to the dictionary, OR
    - Is the word unknown
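The condition above can be sketched as a feature that combines both resources. The lexicon and dictionary contents below are hypothetical placeholders:

```python
# Hypothetical resources, for illustration only.
TRAIN_LEXICON = {"isro": "PPN"}
DICTIONARY = {"aam": {"N", "ADJ"}}   # 'isro' absent, i.e. unknown

def compound_ppn_feature(word):
    """Fire when the lexicon says PPN and the dictionary either agrees
    or does not know the word (the OR condition on the slide)."""
    if TRAIN_LEXICON.get(word) != "PPN":
        return None
    dict_tags = DICTIONARY.get(word)
    if dict_tags is None or "PPN" in dict_tags:
        return "lex=PPN&dict=PPN_or_unknown"
    return None

print(compound_ppn_feature("isro"))  # lex=PPN&dict=PPN_or_unknown
```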
11 Lexical Features
- English letters
- Numerals
- Special characters
- Example
  - ISRO, IIT, IIIT
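A sketch of lexical (orthographic) features flagging tokens such as acronyms and numbers in otherwise Devanagari text; the feature names are illustrative:

```python
import re

def lexical_features(word):
    """Orthographic cues: Latin letters, numerals, special characters."""
    feats = []
    if re.fullmatch(r"[A-Za-z]+", word):
        feats.append("latin_letters")      # e.g. ISRO, IIT, IIIT
    if re.search(r"[0-9०-९]", word):       # ASCII or Devanagari digits
        feats.append("has_numeral")
    if re.search(r"[^\w\s]", word):
        feats.append("special_char")
    return feats

print(lexical_features("ISRO"))    # ['latin_letters']
print(lexical_features("2006"))    # ['has_numeral']
```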
12 Experimental Setup
- Maxent package
- BBC Hindi news corpus
- 4 data sets, manually tagged at IIT Bombay
- 15,562 words
- 27 POS tags
13 Results: Different Context Windows
14 Results: Introduction of Features
15 Results: Cross Validation
- 19% of the test data consisted of unseen words
16 Results: Per-Tag Accuracy
17 Good Performance: CM, CONJ, PNG, ORD and NEG
- All of these are closed-list tags
18 Good Performance: Number
19 Good Performance: PPN and N
- Compound features
20 Poor Performance: ADV, QUAN and INTEN
- Sparse occurrence
21 Poor Performance: VM and VCOP
- Semantic-level ambiguity
22 Performance Analysis
- Good performance
  - Closed lists: CM, NEG, PNG, CONJ, ORD
  - Numbers
  - Compound features: N, PPN
- Poor performance
  - Sparse occurrence: ADV, QUAN, INTEN
  - Semantic-level ambiguity: VCOP and VM
23 Conclusion and Future Work
- Contextual, morphological, categorical and lexical features together deliver high performance
- Average accuracy 94.38%, best accuracy 94.89%
- Can be extended to other Indo-Aryan languages by building language-specific resources such as a stemmer and a dictionary
- Enriching the dictionary
24 References
- Adwait Ratnaparkhi. 1996. A maximum entropy model for part-of-speech tagging. In Erich Brill and Kenneth Church, editors, Proceedings of the Conference on Empirical Methods in NLP, pages 133-142. ACL, Somerset, New Jersey.
- Adwait Ratnaparkhi. 1997. A simple introduction to maximum entropy models for natural language processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.
- Adam L. Berger, Vincent J. Della Pietra, and Stephen A. Della Pietra. 1996. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39-71.
25 References
- Jan Hajič. 2000. Morphological tagging: Data vs. dictionaries. In Proceedings of the 6th Applied Natural Language Processing Conference and the 1st NAACL Conference, pages 94-101.
- Manish Shrivastava, N. Agrawal, S. Singh, and P. Bhattacharya. 2005. Harnessing morphological analysis in POS tagging task. In Proceedings of ICON 05, December.
- Smriti Singh, Kuhoo Gupta, Manish Shrivastava, and Pushpak Bhattacharyya. 2006. Morphological richness offsets resource poverty - an experience in building a POS tagger for Hindi. In Proceedings of COLING/ACL 2006, Sydney, Australia, July.
26 References
- P. R. Ray, V. Harish, A. Basu, and S. Sarkar. 2003. Part of speech tagging and local word grouping techniques for natural language parsing in Hindi. In Proceedings of ICON 2003, Mysore.
- http://maxent.sourceforge.net
27 Thank you! Questions?
28 Maximum Entropy Markov Model
- Maximum entropy principle
  - The least biased model which considers all known information is the one which maximizes entropy
29 Maximum Entropy Markov Model
- Maximize entropy, under constraints on the feature expectations
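The slide's equations did not survive extraction; a reconstruction of the standard maximum entropy formulation it describes (consistent with the Berger et al. reference), where \(h\) is the history, \(t\) the tag, \(\tilde{p}\) the empirical distribution, and \(f_i\) the feature functions:

```latex
% Maximize conditional entropy
\max_{p}\; H(p) = -\sum_{h,t} \tilde{p}(h)\, p(t \mid h)\, \log p(t \mid h)
% subject to the feature-expectation constraints
\text{s.t.}\quad
\sum_{h,t} \tilde{p}(h)\, p(t \mid h)\, f_i(h,t)
  \;=\; \sum_{h,t} \tilde{p}(h,t)\, f_i(h,t),
\qquad i = 1,\dots,k
```

That is, the model's expectation of each feature must match the feature's empirical count in the training data.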
30 Maximum Entropy Markov Model
- The distribution with maximum entropy is equivalent to an exponential (log-linear) model
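The formula on this slide was lost in extraction; the standard result it states is that the maximum entropy solution takes the exponential form, with \(\lambda_i\) the feature weights learned in training and \(Z(h)\) a normalising constant:

```latex
p(t \mid h) = \frac{1}{Z(h)}\,
  \exp\!\Big(\sum_{i} \lambda_i\, f_i(h, t)\Big),
\qquad
Z(h) = \sum_{t'} \exp\!\Big(\sum_{i} \lambda_i\, f_i(h, t')\Big)
```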