[python] Calculate cosine similarity given 2 sentence strings

I have similar solution but might be useful for pandas

import math
import re
from collections import Counter
import pandas as pd

WORD = re.compile(r"\w+")


def get_cosine(vec1, vec2):
    intersection = set(vec1.keys()) & set(vec2.keys())
    numerator = sum([vec1[x] * vec2[x] for x in intersection])

    sum1 = sum([vec1[x] ** 2 for x in list(vec1.keys())])
    sum2 = sum([vec2[x] ** 2 for x in list(vec2.keys())])
    denominator = math.sqrt(sum1) * math.sqrt(sum2)

    if not denominator:
        return 0.0
    else:
        return float(numerator) / denominator


def text_to_vector(text):
    words = WORD.findall(text)
    return Counter(words)

df=pd.read_csv('/content/drive/article.csv')
df['vector1']=df['headline'].apply(lambda x: text_to_vector(x)) 
df['vector2']=df['snippet'].apply(lambda x: text_to_vector(x)) 
df['simscore']=df.apply(lambda x: get_cosine(x['vector1'],x['vector2']),axis=1)

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to nlp

Replace specific text with a redacted version using Python How to return history of validation loss in Keras How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit learn? Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) Stopword removal with NLTK How to get rid of punctuation using NLTK tokenizer? Calculate cosine similarity given 2 sentence strings How do I tokenize a string sentence in NLTK? How to compute the similarity between two text documents? How do I do word Stemming or Lemmatization?

Examples related to similarity

What's the fastest way in Python to calculate cosine similarity given sparse matrix data? Find the similarity metric between two strings Calculate cosine similarity given 2 sentence strings Checking images for similarity with OpenCV

Examples related to cosine-similarity

Cosine Similarity between 2 Number Lists What's the fastest way in Python to calculate cosine similarity given sparse matrix data? Calculate cosine similarity given 2 sentence strings Can someone give an example of cosine similarity, in a very simple, graphical way?