An attempt to preserve the Marathi language. A lightweight, ad-free English-to-Marathi thesaurus.

Overview

मराठी शब्द

I have started this open source project to help save the Marathi language.

In my view, our language is slowly, and without anyone noticing, heading toward becoming a dead language. Everyone who considers this a serious issue and feels that something needs to be done about it is welcome to join this effort.

To put it very simply, look at the following examples -

1. Excessive and unnecessary use of English words in Marathi sentences.

  • Not OK - "फार bore झालंय. चला एखादा picture बघूया."
  • OK - "फार कंटाळा आलाय. चला एखादा चित्रपट बघूया."

2. Typing/writing Marathi using Latin letters instead of Devanagari.

  • Not OK - "me tujhya sobat marathit bolat ahe."
  • OK - "मी तुझ्या सोबत मराठीत बोलत आहे."

For more information, read the detailed description below. Even if you are not a software engineer, you can still contribute.

How to contribute

1. Create an account on GitHub.

2. Discuss your ideas, comments, etc. on the "Discussions" page.

Marathi shabd

About

This project is being developed as part of an effort to help save the Marathi language from its gradual and unnoticed decline into a dying language.

Goal


(This is the goal of the overall initiative, not just this project.)

Revive the use of the Marathi language in its original, unadulterated form in day-to-day life, in both spoken and written form.

How to do it?


  1. Make people realise that these problems exist
  2. Motivate them to work towards fixing them
  3. Provide them with resources (this project is part of this step)
  4. Encourage them to actually put this into practice in their daily lives

This will be done with a combination of videos, blogs and software tools such as this one. (Contributions to all of these are welcome.)

Overview of this project

The idea is to have a static website (ad-free, bloat-free and fast) where people looking to improve their Marathi vocabulary can search for an English word or phrase and quickly find its Marathi equivalent, along with a usage example wherever possible.

Words can also be categorised into topics (tags), so that words used in the same context can be found together, helping users improve their vocabulary in those particular topics. More features can be added in the future, if necessary.

So basically, it will be an ad-free and fast English-to-Marathi thesaurus for everyday words, with some additional features.
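As a minimal sketch of the topic grouping described above, the Python function below reads the database as CSV text and maps each topic to the English words tagged with it. It assumes an `en` column and a `tags` column holding semicolon-separated topics, following the conventions mentioned in the issues further down; the name `group_by_topic` and the sample rows are hypothetical and not part of the project's code.

```python
import csv
import io
from collections import defaultdict


def group_by_topic(csv_text, tags_column="tags", separator=";"):
    """Map each topic (tag) to the sorted, de-duplicated English words tagged with it.

    Assumes a header row containing an "en" column and a tags column whose
    cells may hold several topics separated by semicolons, e.g. "science;physics".
    """
    topics = defaultdict(set)  # a set drops duplicate words per topic
    for row in csv.DictReader(io.StringIO(csv_text)):
        english = (row.get("en") or "").strip()
        for tag in (row.get(tags_column) or "").split(separator):
            tag = tag.strip()
            if english and tag:
                topics[tag].add(english)
    # Sort topics and words so that generated pages are stable between runs.
    return {tag: sorted(words) for tag, words in sorted(topics.items())}


sample = "en,mr,tags\natom,अणू,science;physics\ncell,पेशी,science\nbore,कंटाळा,feelings\n"
print(group_by_topic(sample))
# → {'feelings': ['bore'], 'physics': ['atom'], 'science': ['atom', 'cell']}
```

The sorted keys of the result could also double as the list of topics, for example when generating a topics list page.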

Development and contribution

It is currently at a very early stage: I am conceptualising it and looking for contributors (developers as well as people well versed in the Marathi language).

Some areas where you can contribute

  • Database update - adding English words with their Marathi equivalents
  • Static website creation - basically, parsing the database and creating an output markdown file with all the content. This file will be used on the github.io static website.
    • Note - I would particularly like help in this area, as it is new to me as well.
  • Adding/correcting Marathi-language content in this project's documentation (README, website pages etc.)

(This is the current plan and can be improved.)
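For the static website creation task, here is a rough Python sketch that turns database rows into one markdown file. The actual word-block format is defined by the example in the project's template folder, which is not reproduced here, so the `## English word` / Marathi-line layout, the `(missing)` placeholder and the name `csv_to_markdown` are all assumptions. The optional `title` argument writes a `# topic` heading, as one of the issues below requests for topic files.

```python
import csv
import io


def csv_to_markdown(csv_text, title=None):
    """Render each database row as a markdown word block (hypothetical layout).

    csv.DictReader consumes the first row as the header, so the "en"/"mr"
    header row is never written into the output.
    """
    lines = []
    if title:
        # Topic pages get a heading named after the file, e.g. "# science".
        lines += [f"# {title}", ""]
    for row in csv.DictReader(io.StringIO(csv_text)):
        english = (row.get("en") or "").strip()
        marathi = (row.get("mr") or "").strip()
        if not english:
            continue  # skip rows without an English word
        # Assumed block layout: English word as a sub-heading, Marathi below it.
        lines += [f"## {english}", "", marathi or "(missing)", ""]
    return "\n".join(lines)


sample = "en,mr\nbore,कंटाळा\npicture,चित्रपट\n"
print(csv_to_markdown(sample))
```

To build a real page, one would read db.csv with `open("db.csv", encoding="utf-8", newline="")` and write the result with `encoding="utf-8"`, so that the Devanagari text is preserved on every platform.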

Please suggest your ideas, comments etc. in the "Discussions" page.

I also have quite a few other ideas in mind for creating resources in the Marathi language, which I plan to start once this project's website is ready at a usable level.

What is the need to do this?

As I see it, there are 2 main problems, which are explained below -

  1. Excessive use of English words in Marathi sentences.

Simply stated, this means using a lot of English words in our sentences where we could easily use Marathi words. Example -

  • Not OK - "फार bore झालंय. चला एखादा picture बघूया."
  • OK - "फार कंटाळा आलाय. चला एखादा चित्रपट बघूया."

The direct consequence of this is that we are losing our grip on Marathi vocabulary. The problem keeps growing like a snowball and needs an external force and motivation to reverse it. It exists in both spoken and written form. And while it is particularly serious among the urban population, it may also spread to rural areas as the reach of English-medium schools and the internet widens.

This project currently addresses only the above problem.

  2. Typing/writing Marathi using the Latin alphabet instead of Devanagari.

This basically means typing Marathi like this -

  • Not OK - "me tujhya sobat marathit bolat ahe."
  • OK - "मी तुझ्या सोबत मराठीत बोलत आहे."

I feel this problem should not exist today, since we now have good keyboards for typing Marathi in Devanagari on every platform, be it mobile or desktop. However, it persists because people find it easier to type in the Latin alphabet on a QWERTY keyboard.

Comments
  • database to markdown file

    This script is needed to create a single markdown file containing all the words from the database CSV file.

    The output format for each word and its Marathi equivalent is given by the example in the template folder.

    opened by sanketgarade 21
  • Added a way to parse tags

    Added a TopicsParser class. Its gen_topics function first finds the column index of the tags, then searches each row at that index and returns the tags in a sorted array.

    opened by masonwoodford 11
  • db to markdown file script

    • The input file will be db.csv.
    • The output will be a markdown file, which will be used on the GitHub Pages website. For now, it will be the home page of the site.
    • For now, users will have to search manually for a word of interest (or use the browser's search function).
    opened by sanketgarade 11
  • CSV Database filter script

    A program that takes a database (currently in CSV format) as input, keeps only the necessary data (according to a filter criterion, which is another input), and outputs this data in the same format as the input database.

    opened by sanketgarade 10
  • extract topics list in topics-list.md

    The topics list file is currently created manually.

    It should be created by a script that uses the topicsParser script.

    Please use the develop branch for development.

    enhancement, good first issue, help wanted
    opened by sanketgarade 7
  • topics parser

    • Parse the tags column of the database file db.csv and extract the individual tags (topics) from it.
    • If a cell contains multiple tags (separated by semicolons), those also need to be extracted separately.
    • Duplicates should be removed.
    • The output of this parser should be a list of tags in a single string, separated by a delimiter (a comma for now).
    • This string can later be used in the main.py script to generate output files by topic (which are currently hardcoded).
    enhancement, good first issue, help wanted
    opened by sanketgarade 7
  • Shouldn't this be a web application with a database instead of a plain CSV file?

    A web app, so that lay users can search for the words they need, and so that adding new words could be simplified.

    Possible problem: every time a new English word has to be added, we have to change the CSV file and open a PR. Even if a github.io page is added in the future, it will still be difficult to work with a single CSV file.

    Proposed solution: a web app with its own database, from which words can be fetched through a GitHub page (frontend). This app can also provide a simple UI for adding new English words and their Marathi alternatives.

    enhancement 
    opened by the-kaustubh 6
  • Created a prototype infographics card

    I have created an HTML file that generates a card similar to the one you created with Keynote. If it is approved, I'll write a Python script to automate the generation of such HTML files.

    opened by zarbod 5
  • remove python cache folder __pycache__

    A cache folder gets created every time a Python script is run. Is there a way to prevent it from being created? If not, can it be deleted after the scripts are executed? If that is not possible either, it can probably be added to .gitignore.

    enhancement 
    opened by sanketgarade 5
  • each browse md file must have its file name in its heading

    • Currently, the generated topic files contain only the word blocks for the words under that topic.
    • We need to add a title/heading to each topic file, so that it is easy to tell which topic the file is about.
    • This will also be helpful when combining (concatenating) multiple topic files, whenever such a feature is needed in the future.
    • The title should be the same as the name of the file.

    E.g., the title for the topic file science.md will look like

    science

    which is written as

    # science
    
    ... followed by the individual word blocks
    
    enhancement, good first issue, help wanted
    opened by sanketgarade 3
  • remove header row from csv when used for markdown file generation

    Currently, all md files generated by the Python scripts contain the CSV header row ("en" and "mr"). This header row should be removed in the top-level function, where the CSV file is read for the first time.

    bug 
    opened by sanketgarade 3
  • Word info graphic creator

    This is a feature to create card-like images that can be shared on social media to raise awareness.

    In addition to the search functionality, users can get an infographic (like the one shown below) when they search for a specific word. It can be downloaded as an image (or, in the worst case, saved as a screenshot) and shared further.

    If anyone has an idea of how this can be done and wants to take it up, please let me know here so that we can discuss it further.

    [Image: example word infographic card]

    enhancement 
    opened by sanketgarade 3
  • Form for users to submit missing Marathi words for EXISTING English words

    • Some words in the database have the English word but not the Marathi word.
    • For such missing Marathi words, we can have a page that lists all these English words, each with a text box beside it (to enter the missing Marathi word), and a submit button at the top or bottom of the page to submit the words to the database.
    • These user-submitted words will be collected in a separate database and will enter the main database after review.
    enhancement, help wanted
    opened by sanketgarade 11
  • Create a form for users to submit NEW words

    • Users should be able to submit new words
    • Submission should happen on a dedicated page/form
    • The submitted words should get added to a database of "user suggested words" and not the main database. From there they will be reviewed by the maintainer and added to the main database.
    • The main database is currently a simple CSV file, but any other suitable type of database for this feature (for the user-suggested words) is OK.
    enhancement, good first issue, help wanted
    opened by sanketgarade 3
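For the "CSV Database filter script" issue above, one possible shape is a function that takes the CSV text plus a filter criterion and returns CSV with the same columns as the input. This is only a sketch under assumptions: the criterion is modelled as a Python function, and `filter_csv`, `has_tag` and the `tags` column name are hypothetical rather than part of the project.

```python
import csv
import io


def filter_csv(csv_text, keep):
    """Keep the header plus every data row for which keep(row) returns True.

    `keep` is the filter criterion: a function receiving each row as a dict
    keyed by the header names. The output uses the same columns as the input.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None:
        return ""  # empty input: nothing to filter
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if keep(row):
            writer.writerow(row)
    return out.getvalue()


def has_tag(tag):
    """Hypothetical criterion: the row's semicolon-separated tags include `tag`."""
    return lambda row: tag in [t.strip() for t in (row.get("tags") or "").split(";")]


db = "en,mr,tags\natom,अणू,science;physics\nbore,कंटाळा,feelings\n"
print(filter_csv(db, has_tag("science")), end="")
```

Passing the criterion as a function keeps the script independent of any particular filter format; a command-line wrapper could build such a function from a tag name or a column/value pair.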
Owner
मुक्त स्त्रोत