Detecting Paraphrases in the Marathi Language

Shruti Srivastava; Sharvari Govilkar

doi:10.54646/bijscit.2020.01

How to Cite

Srivastava, S., & Govilkar, S. (2020). Detecting Paraphrases in the Marathi Language. BOHR International Journal of Smart Computing and Information Technology, 1(1), 1–12. https://doi.org/10.54646/bijscit.2020.01

Published: Feb 5, 2020

Updated: 2020-02-05

Keywords:

Paraphrase, Marathi Language, Statistical, Semantic, Sumo metric, Universal Networking Language (UNL).

Authors

Shruti Srivastava

Department of Computer Engineering, PCE, University of Mumbai, India

Sharvari Govilkar

Department of Computer Engineering, PCE, University of Mumbai, India

Abstract

A paraphrase is a piece of writing that differs from the original in its wording or in the arrangement of its words but conveys the same meaning. Identifying paraphrases is important in many real-life applications, such as information retrieval, plagiarism detection, text summarization, and question answering. A large amount of work on paraphrase detection has been done for English and for many Indian languages. However, there is no existing system for identifying paraphrases in Marathi; this work is the first such endeavor for the Marathi language.

Paraphrases are structured differently from their source sentences, and since Marathi is a semantically rich language, the system is designed to check both the statistical and the semantic similarity of Marathi sentences. The statistical similarity measure needs no prior knowledge, as it is based only on the factual data of the sentences. This factual data is calculated from the degree of closeness between the sentences' word sets, word order, word vectors, and word distances. Universal Networking Language (UNL) represents the semantic content of a sentence independently of its syntactic details. Hence, the semantic similarity calculated from the UNL graphs generated for two Marathi sentences indicates whether the two sentences are semantically equivalent. The total paraphrase score is calculated by combining the statistical and semantic similarity scores, and it yields a judgement on whether the Marathi sentences in question are a paraphrase or a non-paraphrase.
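
As a rough illustration of the scoring pipeline described above, the following Python sketch computes two of the four statistical measures and combines a statistical score with a semantic score. The abstract does not give the formulas, so everything here is an assumption made for illustration: Jaccard overlap stands in for word-set closeness, cosine similarity of term counts stands in for word-vector closeness, and the equal weighting and the 0.7 threshold are hypothetical. The word-order and word-distance measures are omitted, and the semantic score is treated as a given input because UNL graph generation is not reproduced here.

```python
import math
from collections import Counter


def word_set_similarity(a, b):
    """Jaccard overlap of the two token sets (assumed stand-in for word-set closeness)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty sentences are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def word_vector_similarity(a, b):
    """Cosine similarity of term-count vectors (assumed stand-in for word-vector closeness)."""
    counts_a, counts_b = Counter(a), Counter(b)
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def statistical_similarity(a, b):
    """Plain average of the two measures above (hypothetical combination rule)."""
    return (word_set_similarity(a, b) + word_vector_similarity(a, b)) / 2


def paraphrase_score(statistical, semantic, weight=0.5):
    """Weighted combination of the two scores; the weight is a hypothetical parameter."""
    return weight * statistical + (1 - weight) * semantic


def is_paraphrase(score, threshold=0.7):
    """Threshold decision; the threshold value is a hypothetical choice."""
    return score >= threshold


if __name__ == "__main__":
    # Two Marathi sentences with the same words in a different order,
    # tokenized naively on whitespace.
    s1 = "राम आंबा खातो".split()
    s2 = "आंबा राम खातो".split()
    stat = statistical_similarity(s1, s2)
    # The semantic score would come from comparing UNL graphs; 0.9 is a placeholder.
    total = paraphrase_score(stat, semantic=0.9)
    print(stat, total, is_paraphrase(total))
```

Because both measures in this sketch ignore word order, the two reordered sentences score as statistically identical; a word-order measure of the kind the abstract mentions would be what separates them.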

Issue

Vol. 1 No. 1 (2020): BOHR International Journal of Smart Computing and Information Technology (BIJSCIT)

Section

Methods
