[python] Calculate cosine similarity given 2 sentence strings

Thanks @vpekar for your implementation. It helped a lot. I just found that it misses the tf-idf weight while calculating the cosine similarity. The Counter(word) returns a dictionary which has the list of words along with their occurence.

cos(q, d) = sim(q, d) = (q ยท d)/(|q||d|) = (sum(qi, di)/(sqrt(sum(qi2)))*(sqrt(sum(vi2))) where i = 1 to v)

  • qi is the tf-idf weight of term i in the query.
  • di is the tf-idf
  • weight of term i in the document. |q| and |d| are the lengths of q and d.
  • This is the cosine similarity of q and d . . . . . . or, equivalently, the cosine of the angle between q and d.

Please feel free to view my code here. But first you will have to download the anaconda package. It will automatically set you python path in Windows. Add this python interpreter in Eclipse.

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to nlp

Replace specific text with a redacted version using Python How to return history of validation loss in Keras How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit learn? Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) Stopword removal with NLTK How to get rid of punctuation using NLTK tokenizer? Calculate cosine similarity given 2 sentence strings How do I tokenize a string sentence in NLTK? How to compute the similarity between two text documents? How do I do word Stemming or Lemmatization?

Examples related to similarity

What's the fastest way in Python to calculate cosine similarity given sparse matrix data? Find the similarity metric between two strings Calculate cosine similarity given 2 sentence strings Checking images for similarity with OpenCV

Examples related to cosine-similarity

Cosine Similarity between 2 Number Lists What's the fastest way in Python to calculate cosine similarity given sparse matrix data? Calculate cosine similarity given 2 sentence strings Can someone give an example of cosine similarity, in a very simple, graphical way?