How to check if a word is an English word with Python

Question

I want to check in a Python program if a word is in the English dictionary.

I believe the nltk wordnet interface might be the way to go, but I have no clue how to use it for such a simple task.

def is_english_word(word):
    pass  # how do I implement is_english_word?

is_english_word(token.lower())

In the future, I might want to check if the singular form of a word is in the dictionary (e.g., properties -> property -> English word). How would I achieve that?

User · Answer

Use a set to store the word list, because membership lookups in a set are faster than searching a list:

with open("english_words.txt") as word_file:
    english_words = set(word.strip().lower() for word in word_file)

def is_english_word(word):
    return word.lower() in english_words

print(is_english_word("ham"))  # should be True if you have a good english_words.txt

To answer the second part of the question, the plurals would already be in a good word list, but if you wanted to specifically exclude those from the list for some reason, you could indeed write a function to handle it. But English pluralization rules are tricky enough that I'd just include the plurals in the word list to begin with.
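If you do want a singular-form fallback, here is a minimal sketch of the idea. It is a naive suffix heuristic, not a real pluralization algorithm: the function names and the tiny `sample_words` set are made up for illustration, and irregular plurals (mice, children) are not handled at all.

```python
def singular_candidates(word):
    """Yield the word itself, then naive guesses at its singular form."""
    word = word.lower()
    yield word
    if word.endswith("ies") and len(word) > 3:
        yield word[:-3] + "y"      # properties -> property
    if word.endswith("es") and len(word) > 2:
        yield word[:-2]            # boxes -> box
    if word.endswith("s") and len(word) > 1:
        yield word[:-1]            # cats -> cat


def is_english_word_or_plural(word, english_words):
    """True if the word, or one of its guessed singular forms, is in the set."""
    return any(candidate in english_words
               for candidate in singular_candidates(word))


# Tiny stand-in for a real word list, for illustration only.
sample_words = {"property", "box", "cat", "bus"}

print(is_english_word_or_plural("properties", sample_words))  # True
print(is_english_word_or_plural("boxes", sample_words))       # True
print(is_english_word_or_plural("xyzzys", sample_words))      # False
```

Because the rules are this crude, the fallback will also accept some non-words (anything that turns into a listed word once an "s" is stripped), which is one more reason to prefer a word list that already contains the plurals.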

As to where to find English word lists, I found several just by Googling "English word list". Here is one: http://www.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt You could Google for British or American English if you want specifically one of those dialects.

User · Answer

With pyEnchant.checker SpellChecker:

from enchant.checker import SpellChecker

def is_in_english(quote):
    d = SpellChecker("en_US")
    d.set_text(quote)
    errors = [err.word for err in d]
    return False if ((len(errors) > 4) or len(quote.split()) < 3) else True

print(is_in_english("… Q V2166384296 …"))
print(is_in_english("Two things are infinite: the universe and human stupidity; and I'm not sure about the universe."))

> False
> True
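The same thresholds (more than 4 unknown words, or fewer than 3 words, means "not English") can be tried without PyEnchant. The sketch below swaps the spellchecker for a plain set of known words. `looks_english`, its parameters, and the tiny `vocabulary` are assumptions made for illustration, and the word count here is taken from the extracted tokens rather than from `quote.split()`.

```python
import re


def looks_english(quote, vocabulary, max_errors=4, min_words=3):
    """Return True if the quote is long enough and has few unknown words."""
    # Keep runs of ASCII letters and apostrophes; everything else separates tokens.
    tokens = re.findall(r"[A-Za-z']+", quote.lower())
    errors = [token for token in tokens if token not in vocabulary]
    return len(tokens) >= min_words and len(errors) <= max_errors


# Hypothetical mini vocabulary; a real one would come from a word list file.
vocabulary = {
    "two", "things", "are", "infinite", "the", "universe", "and",
    "human", "stupidity", "i'm", "not", "sure", "about",
}

quote = ("Two things are infinite: the universe and human stupidity; "
         "and I'm not sure about the universe.")
print(looks_english(quote, vocabulary))                # True
print(looks_english("zxq vbn", vocabulary))            # False (too short)
print(looks_english("qq ww ee rr tt yy", vocabulary))  # False (too many unknown words)
```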

User · Answer

It won't work well with WordNet, because WordNet does not contain all English words. Another possibility based on NLTK, without enchant, is NLTK's words corpus:

>>> from nltk.corpus import words
>>> "would" in words.words()
True
>>> "could" in words.words()
True
>>> "should" in words.words()
True
>>> "I" in words.words()
True
>>> "you" in words.words()
True

User · Answer

For all Linux/Unix users

If your OS is Linux or another Unix-like system, there is usually a simple way to get all the words from the English/American dictionary. In the directory /usr/share/dict you have a words file. There are also more specific american-english and british-english files. These contain all of the words in that specific language. You can access this file from any programming language, which is why I thought you might want to know about it.

Now, for Python-specific users, the Python code below should assign the list words to have the value of every single word:

import re

# Read the whole file and split it into words on non-word characters
with open("/usr/share/dict/words", "r") as file:
    words = re.sub(r"[^\w]", " ", file.read()).split()

def is_word(word):
    return word.lower() in words

is_word("tarts")  ## Returns True
is_word("jwiefjiojrfiorj")  ## Returns False

Hope this helps!
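A slightly more defensive variant of the same idea, as a sketch: it reads the file line by line into a lowercase set and returns an empty set when the file is missing (some systems only have it after installing a word list package). The `load_word_set` name and its default path argument are assumptions for illustration.

```python
from pathlib import Path


def load_word_set(path="/usr/share/dict/words"):
    """Read one word per line into a lowercase set; empty set if the file is absent."""
    word_file = Path(path)
    if not word_file.is_file():
        return set()
    with word_file.open(encoding="utf-8", errors="ignore") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def is_word(word, words):
    """Case-insensitive membership test against the loaded set."""
    return word.lower() in words


words = load_word_set()
print(len(words), "words loaded")
print(is_word("tarts", words))
```

Lowercasing on load means capitalized entries such as proper nouns still match after `word.lower()`, which is not the case when the file contents are kept as-is.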

User · Answer

For a faster NLTK-based solution you could hash the set of words to avoid a linear search:

from nltk.corpus import words as nltk_words

# Build this dictionary once, outside the function,
# because you only need to do it once.
dictionary = dict.fromkeys(nltk_words.words(), None)

def is_english_word(word):
    try:
        x = dictionary[word]
        return True
    except KeyError:
        return False
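To see the difference between a linear search and a hashed lookup without installing anything, here is a small timing sketch. The generated `word_list` is a stand-in vocabulary invented for illustration (a real one would come from NLTK or a file), and the printed timings will vary by machine.

```python
import timeit

# Hypothetical stand-in vocabulary of 50,000 fake "words".
word_list = ["word%d" % i for i in range(50_000)]
word_set = set(word_list)  # built once, reused for every lookup


def is_english_word(word, vocabulary=word_set):
    """Hash-based membership test; average cost does not grow with the list size."""
    return word in vocabulary


# Look up the last element: the worst case for the list scan.
list_time = timeit.timeit(lambda: "word49999" in word_list, number=100)
set_time = timeit.timeit(lambda: "word49999" in word_set, number=100)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

A plain `set` gives the same hashing benefit as the `dict.fromkeys` approach and lets the function body shrink to a single `in` test.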

User · Answer

Using NLTK:

from nltk.corpus import wordnet

if not wordnet.synsets(word_to_test):
    print("Not an English word")
else:
    print("English word")

You should refer to this article if you have trouble installing wordnet or want to try other approaches.

User · Answer

For (much) more power and flexibility, use a dedicated spellchecking library like PyEnchant. There's a tutorial, or you could just dive straight in:

>>> import enchant
>>> d = enchant.Dict("en_US")
>>> d.check("Hello")
True
>>> d.check("Helo")
False
>>> d.suggest("Helo")
['He lo', 'He-lo', 'Hello', 'Helot', 'Help', 'Halo', 'Hell', 'Held', 'Helm', 'Hero', "He'll"]
>>>

PyEnchant comes with a few dictionaries (en_GB, en_US, de_DE, fr_FR), but can use any of the OpenOffice ones if you want more languages.

There appears to be a pluralisation library called inflect, but I've no idea whether it's any good.

User · Answer

I find that there are 3 package-based solutions to the problem: pyenchant, wordnet, and a corpus (self-defined or from nltk).

Pyenchant couldn't be installed easily on win64 with py3. Wordnet doesn't work very well because its corpus isn't complete. So I chose the solution answered by @Sadik, and use set(words.words()) to speed it up.

First:

pip3 install nltk
python3

import nltk
nltk.download('words')

Then:

from nltk.corpus import words
setofwords = set(words.words())

print("hello" in setofwords)
>>True

User · Answer

For a semantic web approach, you could run a SPARQL query against WordNet in RDF format. Basically, just use the urllib module to issue a GET request and return the results in JSON format, then parse them using Python's json module. If it's not an English word, you'll get no results.

As another idea, you could query Wiktionary's API.
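A rough sketch of the Wiktionary idea, using the standard MediaWiki query API with only urllib and json. The function names and the User-Agent string are assumptions made for illustration. Note also that English Wiktionary has entries for words of many languages, so an existing page only suggests that the string is a word somewhere, not that it is English.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wiktionary.org/w/api.php"


def build_query_url(word):
    """Build a MediaWiki 'query' URL asking about one page title."""
    params = {"action": "query", "titles": word, "format": "json"}
    return API_URL + "?" + urllib.parse.urlencode(params)


def page_exists(response_text):
    """Return True if the JSON response lists a page that is not missing or invalid."""
    pages = json.loads(response_text).get("query", {}).get("pages", {})
    return any("missing" not in page and "invalid" not in page
               for page in pages.values())


def has_wiktionary_entry(word, timeout=10):
    """Issue the GET request (needs network access) and check the result."""
    request = urllib.request.Request(
        build_query_url(word),
        # Placeholder User-Agent; replace with something identifying your script.
        headers={"User-Agent": "word-check-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return page_exists(response.read().decode("utf-8"))
```

Usage would be a call such as `has_wiktionary_entry("hello")`. Since each check is a network round trip, this is far slower than a local set lookup and is better suited to occasional checks than to filtering large texts.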
