Hierarchical self attention based sequential labelling model for Bhojpuri, Maithili and Magahi languages

Mundotiya, Rajesh Kumar; Mishra, Swasti; Singh, Anil Kumar

dc.contributor.author	Mundotiya, Rajesh Kumar
dc.contributor.author	Mishra, Swasti
dc.contributor.author	Singh, Anil Kumar
dc.date.accessioned	2023-04-18T07:50:09Z
dc.date.available	2023-04-18T07:50:09Z
dc.date.issued	2022-10
dc.identifier.issn	13191578
dc.identifier.uri	http://localhost:8080/xmlui/handle/123456789/2081
dc.description	This paper was submitted by an author from IIT (BHU), Varanasi	en_US
dc.description.abstract	Sequential labelling plays a vital role in numerous Natural Language Processing (NLP) applications, such as Machine Translation and Information Extraction. Two such tasks are Part-of-Speech (POS) tagging, which assigns a sequence of grammatical categories to the words of a given sentence, and Chunking, which groups those words into ‘chunks’, or what can be called minimal phrases. Bhojpuri, Maithili and Magahi are low-resource languages of the Indo-Aryan language family, widely spoken in central north-eastern India. The creation of an annotated corpus for POS tagging and Chunking, followed by an initial automatic tool for these problems, is the first attempt towards building language technology tools for these languages. The annotated corpus is used to develop POS taggers and chunkers based on various machine learning algorithms (TnT, CRF, MEMM and Structured SVM) and the state-of-the-art LSTM-CNN-CRF model, and their results are then compared with those of two newly proposed deep learning-based models: Self-Attention Hierarchical Bi-LSTM CRF (SAHBiLC) and a fine-tuned version of it, Fine-SAHBiLC. The SAHBiLC and Fine-SAHBiLC models outperform the other models on Bhojpuri (Accuracy for POS and Chunking is 0.86% and 0.94%, respectively), Maithili (Accuracy for POS and Chunking is 0.86% and 0.95%, respectively) and Magahi (Accuracy for POS is 0.86%).	en_US
dc.language.iso	en	en_US
dc.publisher	King Saud bin Abdulaziz University	en_US
dc.relation.ispartofseries	Journal of King Saud University - Computer and Information Sciences;Volume 34, Issue 10, Pages 8739 - 8749
dc.subject	Chunking	en_US
dc.subject	Datasets	en_US
dc.subject	Machine learning	en_US
dc.subject	Neural network	en_US
dc.subject	POS tagging	en_US
dc.subject	Transfer learning	en_US
dc.title	Hierarchical self attention based sequential labelling model for Bhojpuri, Maithili and Magahi languages	en_US
dc.type	Article	en_US