Integration of morphological features and contextual weightage using monotonic chunk attention for part of speech tagging

dc.contributor.author Mundotiya, Rajesh Kumar
dc.contributor.author Mehta, Arpit
dc.contributor.author Baruah, Rupjyoti
dc.contributor.author Singh, Anil Kumar
dc.date.accessioned 2023-04-19T05:21:34Z
dc.date.available 2023-04-19T05:21:34Z
dc.date.issued 2022-10
dc.identifier.issn 13191578
dc.identifier.uri http://localhost:8080/xmlui/handle/123456789/2102
dc.description This paper was submitted by an author affiliated with IIT (BHU), Varanasi, India en_US
dc.description.abstract Part-of-Speech (POS) tagging is a fundamental sequence labeling problem in Natural Language Processing. Recent deep learning sequential models combine forward and backward word information for POS tagging. The information that contextual words carry about the current word plays a vital role in capturing non-continuous relationships. We propose Monotonic chunk-wise attention with CNN-GRU-Softmax (MCCGS), a deep learning architecture that captures this essential information. The architecture has three core components: an Input Encoder (IE), which encodes word- and character-level information; a Contextual Encoder (CE), which assigns weightage to adjacent words; and a Disambiguator (D), which resolves intra-label dependencies. Moreover, different morphological features have been integrated into the core components of the MCCGS architecture, yielding the variants MCCGS-IE, MCCGS-CE and MCCGS-D. The MCCGS architecture is validated on 21 languages from the Universal Dependency (UD) treebank. The state-of-the-art models Type constraints, Retrofitting, Distant Supervision from Disparate Sources and Position-aware Self Attention, along with MCCGS and its variants MCCGS-IE, MCCGS-CE and MCCGS-D, obtain mean accuracies of 83.65%, 81.29%, 84.10%, 90.18%, 90.40%, 91.40%, 90.90% and 92.30%, respectively. The proposed model architecture provides state-of-the-art accuracy on low-resource languages such as Marathi (93.58%), Tamil (87.50%), Telugu (96.69%) and Sanskrit (97.28%) from the UD treebank, and Hindi (95.64%) and Urdu (87.47%) from the Hindi-Urdu multi-representational treebank. en_US
dc.description.sponsorship Science and Engineering Research Board, IIT (BHU), Varanasi, India en_US
dc.language.iso en_US en_US
dc.publisher King Saud bin Abdulaziz University en_US
dc.relation.ispartofseries Journal of King Saud University - Computer and Information Sciences;Volume 34, Issue 9, Pages 7324 - 7334
dc.subject Morphological features en_US
dc.subject Part of Speech tagging en_US
dc.subject Convolutional neural network en_US
dc.subject Attention mechanism en_US
dc.title Integration of morphological features and contextual weightage using monotonic chunk attention for part of speech tagging en_US
dc.type Article en_US

