[python] Count frequency of words in a list and sort by frequency

I am using Python 3.3

I need to create two lists, one for the unique words and the other for the frequencies of the word.

I have to sort the unique word list based on the frequencies list so that the word with the highest frequency is first in the list.

I have the design in text but am uncertain how to implement it in Python.

The methods I have found so far use either Counter or dictionaries which we have not learned. I have already created the list from the file containing all the words but do not know how to find the frequency of each word in the list. I know I will need a loop to do this but cannot figure it out.

Here's the basic design:

 original list = ["the", "car",....]
 newlst = []
 frequency = []
 for word in the original list
       if word not in newlst:
           newlst.append(word)
           set frequency = 1
       else
           increase the frequency
 sort newlst based on frequency list 

This question is related to python python-3.x list frequency word

The answer is


the best thing to do is :

def wordListToFreqDict(wordlist):
    wordfreq = [wordlist.count(p) for p in wordlist]
    return dict(zip(wordlist, wordfreq))

then try to : wordListToFreqDict(originallist)


One way would be to make a list of lists, with each sub-list in the new list containing a word and a count:

list1 = []    #this is your original list of words
list2 = []    #this is a new list

for word in list1:
    if word in list2:
        list2.index(word)[1] += 1
    else:
        list2.append([word,0])

Or, more efficiently:

for word in list1:
    try:
        list2.index(word)[1] += 1
    except:
        list2.append([word,0])

This would be less efficient than using a dictionary, but it uses more basic concepts.


You can use

from collections import Counter

It supports Python 2.7,read more information here

1.

>>>c = Counter('abracadabra')
>>>c.most_common(3)
[('a', 5), ('r', 2), ('b', 2)]

use dict

>>>d={1:'one', 2:'one', 3:'two'}
>>>c = Counter(d.values())
[('one', 2), ('two', 1)]

But, You have to read the file first, and converted to dict.

2. it's the python docs example,use re and Counter

# Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
 ('you', 554),  ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]

Using Counter would be the best way, but if you don't want to do that, you can implement it yourself this way.

# The list you already have
word_list = ['words', ..., 'other', 'words']
# Get a set of unique words from the list
word_set = set(word_list)
# create your frequency dictionary
freq = {}
# iterate through them, once per unique word.
for word in word_set:
    freq[word] = word_list.count(word) / float(len(word_list))

freq will end up with the frequency of each word in the list you already have.

You need float in there to convert one of the integers to a float, so the resulting value will be a float.

Edit:

If you can't use a dict or set, here is another less efficient way:

# The list you already have
word_list = ['words', ..., 'other', 'words']
unique_words = []
for word in word_list:
    if word not in unique_words:
        unique_words += [word]
word_frequencies = []
for word in unique_words:
    word_frequencies += [float(word_list.count(word)) / len(word_list)]
for i in range(len(unique_words)):
    print(unique_words[i] + ": " + word_frequencies[i])

The indicies of unique_words and word_frequencies will match.


use this

from collections import Counter
list1=['apple','egg','apple','banana','egg','apple']
counts = Counter(list1)
print(counts)
# Counter({'apple': 3, 'egg': 2, 'banana': 1})

Yet another solution with another algorithm without using collections:

def countWords(A):
   dic={}
   for x in A:
       if not x in  dic:        #Python 2.7: if not dic.has_key(x):
          dic[x] = A.count(x)
   return dic

dic = countWords(['apple','egg','apple','banana','egg','apple'])
sorted_items=sorted(dic.items())   # if you want it sorted

Pandas answer:

import pandas as pd
original_list = ["the", "car", "is", "red", "red", "red", "yes", "it", "is", "is", "is"]
pd.Series(original_list).value_counts()

If you wanted it in ascending order instead, it is as simple as:

pd.Series(original_list).value_counts().sort_values(ascending=True)

The ideal way is to use a dictionary that maps a word to it's count. But if you can't use that, you might want to use 2 lists - 1 storing the words, and the other one storing counts of words. Note that order of words and counts matters here. Implementing this would be hard and not very efficient.


You can use reduce() - A functional way.

words = "apple banana apple strawberry banana lemon"
reduce( lambda d, c: d.update([(c, d.get(c,0)+1)]) or d, words.split(), {})

returns:

{'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2}

Try this:

words = []
freqs = []

for line in sorted(original list): #takes all the lines in a text and sorts them
    line = line.rstrip() #strips them of their spaces
    if line not in words: #checks to see if line is in words
        words.append(line) #if not it adds it to the end words
        freqs.append(1) #and adds 1 to the end of freqs
    else:
        index = words.index(line) #if it is it will find where in words
        freqs[index] += 1 #and use the to change add 1 to the matching index in freqs

Here is code support your question is_char() check for validate string count those strings alone, Hashmap is dictionary in python

def is_word(word):
   cnt =0
   for c in word:

      if 'a' <= c <='z' or 'A' <= c <= 'Z' or '0' <= c <= '9' or c == '$':
          cnt +=1
   if cnt==len(word):
      return True
  return False

def words_freq(s):
  d={}
  for i in s.split():
    if is_word(i):
        if i in d:
            d[i] +=1
        else:
            d[i] = 1
   return d

 print(words_freq('the the sky$ is blue not green'))

words = file("test.txt", "r").read().split() #read the words into a list.
uniqWords = sorted(set(words)) #remove duplicate words and sort
for word in uniqWords:
    print words.count(word), word

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to python-3.x

Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation Replace specific text with a redacted version using Python Upgrade to python 3.8 using conda "Permission Denied" trying to run Python on Windows 10 Python: 'ModuleNotFoundError' when trying to import module from imported package What is the meaning of "Failed building wheel for X" in pip install? How to downgrade python from 3.7 to 3.6 I can't install pyaudio on Windows? How to solve "error: Microsoft Visual C++ 14.0 is required."? Iterating over arrays in Python 3 How to upgrade Python version to 3.7?

Examples related to list

Convert List to Pandas Dataframe Column Python find elements in one list that are not in the other Sorting a list with stream.sorted() in Java Python Loop: List Index Out of Range How to combine two lists in R How do I multiply each element in a list by a number? Save a list to a .txt file The most efficient way to remove first N elements in a list? TypeError: list indices must be integers or slices, not str Parse JSON String into List<string>

Examples related to frequency

How to count how many values per level in a given factor? Relative frequencies / proportions with dplyr Count the frequency that a value occurs in a dataframe column Count frequency of words in a list and sort by frequency Frequency table for a single variable Getting the count of unique values in a column in bash How to find most common elements of a list? How to count the frequency of the elements in an unordered list? Item frequency count in Python

Examples related to word

Does VBA contain a comment block syntax? Count frequency of words in a list and sort by frequency Regular expression to match a word or its prefix