A text file containing 479k English words for all your dictionary/word-based projects, e.g. auto-completion / autosuggestion.

Overview

List Of English Words

A text file containing over 466k English words.

While searching for a list of English words (for an auto-complete tutorial) I found http://stackoverflow.com/questions/2213607/how-to-get-english-language-word-database which refers to http://www.infochimps.com/datasets/word-list-350000-simple-english-words-excel-readable (archived).

No idea why infochimps put the word list inside an Excel (.xls) file.

I extracted the words into a simple newline-delimited text file, which is more useful when building apps or importing into databases.

Copyright still belongs to them.

Files you may be interested in:

  • words.txt contains all words.
  • words_alpha.txt contains only [[:alpha:]] words (words that only have letters, no numbers or symbols). If you want a quick solution choose this.
  • words_dictionary.json contains all the words from words_alpha.txt in JSON format. If you are using Python, you can load this file and use it as a dictionary for faster lookups. Every word is assigned the value 1 in the dictionary. See read_english_dictionary.py for example usage.
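
A minimal sketch of loading these files in Python, assuming words_alpha.txt and words_dictionary.json have been downloaded into the working directory. The function names load_word_set and load_word_dict are illustrative and are not taken from read_english_dictionary.py.

```python
import json
from pathlib import Path


def load_word_set(path):
    """Read a newline-delimited word list into a set for fast membership tests."""
    with open(path, encoding="utf-8") as fh:
        # Strip surrounding whitespace and skip blank lines.
        return {line.strip() for line in fh if line.strip()}


def load_word_dict(path):
    """Read a JSON object of the form {"word": 1, ...} into a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    # Assumed file location; adjust to wherever the file was saved.
    if Path("words_alpha.txt").exists():
        words = load_word_set("words_alpha.txt")
        print("fate" in words)
```

Both a set and a dict give hash-based lookups, so either avoids scanning the whole list for each word.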
Comments
  • Slow loading of words_alpha.txt in Python

    While working with words_alpha.txt in Python 3.6, I ran into slow loading and slow searching. I think there should be a JSON-formatted dictionary for better search performance across all words.

    opened by arsho 13
  • Clarify why this repo exists! :-)

    The word list was originally used in a proof-of-concept auto-completion mini-project: https://github.com/nelsonic/ac but it has since been used by quite a few people. Please give examples of how you are using the list in your project(s) so we can add them as suggestions in the readme. Thanks!

    enhancement help wanted question 
    opened by nelsonic 13
  • Created scripts to make things easier

    I created some scripts to make things easier. Just run bash scripts/create.sh and the words added to words.txt will be sorted, words_alpha.txt will be updated, words_dictionary.json is updated, and all the .zip files are updated. This makes life a lot easier, I think. I also added dino to the words to fix #37.
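
The regeneration steps described above (sort the words, derive the letters-only list, rebuild the JSON dictionary) can be sketched in Python as follows. This is not the content of scripts/create.sh, which is not shown here. The names build_derived and to_text are hypothetical, and the sketch assumes that [[:alpha:]] means ASCII letters and that plain code-point sorting is acceptable. The .zip step is left out.

```python
import json
import re

# Assumption: [[:alpha:]] is taken to mean ASCII letters only.
ALPHA_ONLY = re.compile(r"[A-Za-z]+")


def build_derived(words):
    """Return (sorted_words, alpha_words, word_dict) from an iterable of words."""
    # De-duplicate, drop blanks, and sort (plain code-point order, an assumption).
    sorted_words = sorted({w.strip() for w in words if w.strip()})
    # Keep only words made up entirely of letters.
    alpha_words = [w for w in sorted_words if ALPHA_ONLY.fullmatch(w)]
    # Mirror the {"word": 1} layout described for words_dictionary.json.
    word_dict = {w: 1 for w in alpha_words}
    return sorted_words, alpha_words, word_dict


def to_text(words):
    """Serialise a word list as newline-delimited text."""
    return "\n".join(words) + "\n"


sorted_words, alpha_words, word_dict = build_derived(
    ["dino", "price-cut", "apple", "dino"]
)
print(to_text(alpha_words), end="")
print(json.dumps(word_dict))
```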

    awaiting-review 
    opened by InnovativeInventor 7
  • "price" is missing from words.txt

    Subject says it all. Several compound words are found, but not "price" itself.

    New to GitHub, not sure how to contribute yet. Filing an issue to track it.

    % grep price words.txt
    Caprice
    caprices
    counter-price
    cut-price
    half-price
    Haltemprice
    high-priced
    low-priced
    miniprice
    miniprices
    misprice
    one-price
    outprice
    outpriced
    outprices
    overprice
    overpriced
    overprices
    popular-priced
    preprice
    prepriced
    priceable
    priceably
    price-cut
    price-cutter
    price-cutting
    priced
    price-deciding
    price-enhancing
    pricefixing
    price-fixing
    pricey
    priceite
    priceless
    pricelessly
    pricelessness
    price-lowering
    pricemaker
    pricer
    price-raising
    price-reducing
    pricers
    price-ruling
    prices
    price-stabilizing
    reprice
    repriced
    reprices
    underprice
    underpriced
    underprices
    uneven-priced
    unpriceably
    unpriced
    well-priced
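
The grep above matches substrings, which is why only compounds and derivatives show up. A whole-line check makes the missing entry explicit. The following is a small Python sketch using an inline sample rather than the real words.txt. The function names are hypothetical.

```python
def contains_substring(lines, word):
    """True if `word` appears anywhere in any line (like plain grep -F)."""
    return any(word in line for line in lines)


def contains_exact(lines, word):
    """True only if some line equals `word` exactly (like grep -x -F)."""
    return any(line.strip() == word for line in lines)


# Inline sample standing in for a few lines of words.txt.
sample = ["Caprice\n", "priceless\n", "prices\n"]
print(contains_substring(sample, "price"))
print(contains_exact(sample, "price"))
```

To run the same check against the real file, pass an open file object (for example `open("words.txt", encoding="utf-8")`) instead of the sample list.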

    opened by miketashcorpnet 2
  • Missing word: cryptocurrency

    The word "cryptocurrency" exists in the English language (according to the Merriam-Webster dictionary) but is not in words_alpha.txt. It should be added, as it is now a common word.

    opened by dianezhou96 2
  • Should words_dictionary.json be a JSON array?

    Is there any reason why the content of words_dictionary.json isn't a JSON array?

    For example, instead of being { "aalii": 1, "aaliis": 1, "aals": 1 } it could be ["aalii", "aaliis", "aals"]
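
To illustrate the trade-off behind this question, here is a short Python sketch using the three sample words from the issue as inline data. A JSON object loads as a dict with hash-based lookup, while a JSON array loads as a list that needs one extra conversion to get the same lookup behaviour. The helper object_to_array is hypothetical.

```python
import json

as_object = '{"aalii": 1, "aaliis": 1, "aals": 1}'
as_array = '["aalii", "aaliis", "aals"]'

word_dict = json.loads(as_object)  # dict: hash lookup by key
word_list = json.loads(as_array)   # list: membership test scans linearly
word_set = set(word_list)          # one extra step restores hash lookup


def object_to_array(obj_text):
    """Convert the {"word": 1} form into a sorted JSON array string."""
    return json.dumps(sorted(json.loads(obj_text)))


print("aals" in word_dict, "aals" in word_set)
print(object_to_array(as_object))
```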

    opened by sushantsahu1987 2
  • words_alpha.txt contains non-English words

    I did not review the entire file but found a number of words like avestruz and mudar (Portuguese), biznagas (Spanish) and bizzarro (Italian) in the words_alpha.txt file. I don't think these exist in the English language, so what gives? I'm sure there are more of them...

    opened by jcnmsg 1
  • read_english_dictionary.py not working

    I have copied read_english_dictionary.py and words_alpha.txt. When I run the code it just returns 'false'. If I add print(english_words) it prints what looks like the file contents, but not a recognisable list of English words. Am I missing something here? Mick

    question technical user-feedback 
    opened by micksulley 1
  • ease of use?

    Changed the example load_words function to return immediately. Instead of only checking against 'fate', the script now lets the end user check words in an input loop until they interrupt the script.

    opened by busterbeam 0
  • Chore: Review all the open Pull Requests and Attempt to merge them

    At present there are 20 open Pull Requests: https://github.com/dwyl/english-words/pulls

    This is both a great and a really bad sign.

    On the positive side, it's awesome that people want to contribute to the project [not just take the words and give nothing back...] and it continually renews my faith in Open Source!

    But the downside of having so many PRs open is that the efforts of several people are wasted ... We need to do a much better job of communicating how to contribute.

    https://github.com/dwyl/english-words/pull/130 is an example of one that clearly took some effort and will add value to people! So we need to make every effort to ensure that the effort was not wasted!

    Note: this is priority-3 ("nice to have") because while we definitely want to maintain and improve this repo, we have extremely limited time. And since we are not currently using it in our App ... we cannot justify the time allocation/investment right now. That's why I've "ignored" the open PRs. Only features that relate to the App are given higher priority. It's possible that we could use English Words in the App e.g. for auto-completion/suggestion. In which case the priority of updating/maintaining this repo/project would be raised.

    Todo

    • [ ] Review the PR:

      • [ ] Assign it to yourself and add the label: in-review
      • [ ] View the code/words updates and leave comments
      • [ ] If you feel it's immediately mergeable, approve it and assign it to me.
      • [ ] If it's not mergeable, please leave a comment for the author (including an apology for the delay and link to this issue)
    • [ ] Criteria that qualify for merging:

      • [ ] Minor code change where CI passes, e.g: https://github.com/dwyl/english-words/pull/153
      • [ ] Word change that references an issue where it was previously discussed.
    • [ ] Instantly reject PRs that update Zip files and explain that Zip files are commonly used for viruses so we cannot afford to trust anyone with updating them. "It's not you, it's me/us" we "Trust No One" https://en.wikipedia.org/wiki/Trust_no_one_(Internet_security)

    @LuchoTurtle you mentioned this repo to me verbally in our catch up yesterday. If you want to make a stab at reviewing the open PRs in a Pomodoro Break, go for it! Please just leave a comment on this issue first and link to the PR you're picking off.

    enhancement help wanted T4h chore priority-3 technical discuss 
    opened by nelsonic 0
  • Epic: English-Words Project Roadmap

    Just Want a List of Words?

    If you only care about the list of words in this repo, that's great; use them and have an awesome day!

    Want More?

    For the minuscule minority of people who want more, this issue is for you!

    Brief History / Context

    A few years ago I needed a list of English Words for a work project. Went searching and didn't find a ready-made list of English Words ...

    But found this StackOverflow Question and Answer: https://stackoverflow.com/questions/2213607/how-to-get-english-language-word-database

    Extracted the words from the Excel file that was on InfoChimps (now 404) and dumped them in a .txt file. Put it on GitHub and linked to it in a comment on SO and didn't give it any more thought.

    Sadly, the work project that used the words was closed source for a company that got acquired and the App was shut down. The folly of working on closed source things is that you often have nothing to show for years of your life!

    Meanwhile many thousands of people have downloaded the word list and the repo has 8.3k stars.

    The mini [Open Source] demo project I created, nelsonic/autocomplete → wordsy.herokuapp.com, will soon be taken offline by Heroku's bean-counters.

    I outlined what I wanted to do in autocomplete#tasks but it's very incomplete ... so this issue will give a muuuuch better roadmap of what we're doing.

    What challenge are we solving?

    The original purpose of this repo will 100% be maintained. What we are doing is enhancing the repo with a showcase App that allows people to:

    With that in mind, this is the plan:

    1. High quality list of English words in an easy to extract file/format e.g. .txt, .json and .zip
    2. Instructions for how to use the words in various programming languages; code examples.
      • [ ] JavaScript/TypeScript
      • [ ] Python
      • [ ] Elixir
      • [ ] Dart
      • [ ] Rust
      • [ ] Invite contributions from the community for code examples from more programming languages [but NOT frameworks]. Make it clear that we really don't want a React sample because we don't want to encourage anyone to use it.
    3. Clarity on the process for updating the word list: adding, correcting and removing [invalid] words.
    4. Automate the creation of the .zip file so that we don't have people attempting to submit Pull Requests with Zip Files.

    We're never going to accept a PR with a zip file. It's an easy attack vector for a malicious auto-executable. Read more: https://github.com/snyk/zip-slip-vulnerability It's not that we don't "trust" people ... but we know that not everyone on GitHub has good intentions. Crime pays otherwise there wouldn't be any crims ... And cyber-crime pays big BTCs! So let's just avoid it.

    5. Allow anyone to look up words with auto-completion and to make suggestions via Web App/UI. That will invite way more people, including non-technical people who don't know how to use GitHub, to help maintain+improve the list of words.
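
As an illustration of the word lookup with auto-completion mentioned in the plan, here is a minimal prefix-search sketch in Python over a sorted word list. It is not the planned Phoenix app and not the code from nelsonic/autocomplete. The function name complete and its limit parameter are hypothetical.

```python
from bisect import bisect_left


def complete(sorted_words, prefix, limit=10):
    """Return up to `limit` words from `sorted_words` that start with `prefix`."""
    # Binary search for the first word that is >= prefix.
    start = bisect_left(sorted_words, prefix)
    matches = []
    for i in range(start, len(sorted_words)):
        word = sorted_words[i]
        # In a sorted list, matching words are contiguous, so stop at the first miss.
        if not word.startswith(prefix) or len(matches) == limit:
            break
        matches.append(word)
    return matches


# Inline sample; the input must be sorted for the binary search to be valid.
words = sorted(["price", "priced", "priceless", "prices", "pride", "apple"])
print(complete(words, "price"))
```

Because the list is sorted once up front, each lookup costs a binary search plus the number of suggestions returned, rather than a scan of the whole list.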

    Todo

    • [ ] Review the existing/open PRs and try to merge them: https://github.com/dwyl/english-words/issues/155
    • [ ] Create Phoenix App ... Note: waiting for Phoenix v1.7 to do this to minimise time wasted with updates ...
    • [ ] Re-create basic features from nelsonic/autocomplete:
      • [ ] Use PostgreSQL for simplicity.
      • [ ] If we notice too much query latency, we can switch to SQLite or ETS for speed.
    • [ ] Load the current English Words List into the DB
    • [ ] Determine/decide what other metadata we want to store for each word.
    • [ ] Discuss any other features we want to have. (please comment!)
    enhancement help wanted T1d chore epic technical priority-2 discuss 
    opened by nelsonic 1
Owner
dwyl
Start here: https://github.com/dwyl/start-here
Yomichad - a Japanese pop-up dictionary that can display readings and English definitions of Japanese words

Yomichad is a Japanese pop-up dictionary that can display readings and English definitions of Japanese words, kanji, and optionally named entities. It is similar to yomichan, 10ten, and rikaikun in spirit, but targets qutebrowser.

Jonas Belouadi 7 Nov 7, 2022
Accurately generate all possible forms of an English word e.g "election" --> "elect", "electoral", "electorate" etc.

Accurately generate all possible forms of an English word Word forms can accurately generate all possible forms of an English word. It can conjugate v

Dibya Chakravorty 570 Dec 31, 2022
This program translates English words to Portuguese

Python-Dictionary This program is used to translate English words to Portuguese. Web-Scraping This program uses BeautifulSoup for web scraping, so

João Assalim 1 Oct 10, 2022
The projects lets you extract glossary words and their definitions from a given piece of text automatically using NLP techniques

Unsupervised technique to Glossary and Definition Extraction Code Files GPT2-DefinitionModel.ipynb - GPT-2 model for definition generation. Data_Gener

Prakhar Mishra 28 May 25, 2021
Pytorch-NLU, a Chinese text classification and sequence annotation toolkit, supports multi-class and multi-label classification tasks for long and short Chinese text, and supports sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging and word segmentation.

Pytorch-NLU, a Chinese text classification and sequence annotation toolkit, supports multi-class and multi-label classification tasks for long and short Chinese text, and supports sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging and word segmentation.

null 186 Dec 24, 2022
Count the frequency of letters or words in a text file and show a graph.

Word Counter By EBUS Coding Club Count the frequency of letters or words in a text file and show a graph. Requirements Python 3.9 or higher matplotlib

EBUS Coding Club 0 Apr 9, 2022
This simple Python program calculates a love score based on your and your crush's full names in English

This simple Python program calculates a love score based on your and your crush's full names in English. There is no logic or reason in the calculation behind the love score. The calculation could have been anything different from what's shown in this code.

p.katekomol 1 Jan 24, 2022
Auto translate textbox from Japanese to English or Indonesia

priconne-auto-translate Auto translate textbox from Japanese to English or Indonesia How to use Install python first, Anaconda is recommended Install

Aji Priyo Wibowo 5 Aug 25, 2022
Helps you discover excellent English projects without interference from other spoken languages

GitHub English Top Charts 「Help you discover excellent English projects and get

GrowingGit 544 Jan 9, 2023
Text editor in Python to convert English text to Malayalam (romanization/transliteration).

Manglish Text Editor This is a simple transliteration (romanization) program which is used to convert Manglish to Malayalam (converts njaan to ഞാൻ).

Merin Rose Tom 1 May 11, 2022
Text editor in Python tkinter to convert English text to other languages with the help of polyglot.

Transliterator Text Editor This is a simple transliteration program which is used to convert English word to phonetically matching word in another lan

Merin Rose Tom 1 Jan 16, 2022
Auto-researching tool generating word documents.

About ResearchTE automates researching by generating document with answers to given questions. Supports getting results from: Google DuckDuckGo (with

null 1 Feb 14, 2022
File-based TF-IDF: Calculates keywords in a document, using a word corpus.

File-based TF-IDF Calculates keywords in a document, using a word corpus. Why? Because I found myself with hundreds of plain text files, with no way t

Jakob Lindskog 1 Feb 11, 2022
The aim of this task is to predict someone's English proficiency based on a text input.

English_proficiency_prediction_NLP The aim of this task is to predict someone's English proficiency based on a text input. Using the NICT JLE Corp

null 1 Dec 13, 2021
C.J. Hutto 3.8k Dec 30, 2022
C.J. Hutto 2.8k Feb 18, 2021