[python] How to calculate the sentence similarity using word2vec model of gensim with python

Gensim implements a model called Doc2Vec for paragraph embedding.

There are different tutorials presented as IPython notebooks:

Another method would rely on Word2Vec and Word Mover's Distance (WMD), as shown in this tutorial:

An alternative solution would be to rely on average vectors:

from gensim.models import KeyedVectors
from gensim.utils import simple_preprocess    

def tidy_sentence(sentence, vocabulary):
    return [word for word in simple_preprocess(sentence) if word in vocabulary]    

def compute_sentence_similarity(sentence_1, sentence_2, model_wv):
    vocabulary = set(model_wv.index2word)    
    tokens_1 = tidy_sentence(sentence_1, vocabulary)    
    tokens_2 = tidy_sentence(sentence_2, vocabulary)    
    return model_wv.n_similarity(tokens_1, tokens_2)

wv = KeyedVectors.load('model.wv', mmap='r')
sim = compute_sentence_similarity('this is a sentence', 'this is also a sentence', wv)
print(sim)

Finally, if you can run Tensorflow, you may try: https://tfhub.dev/google/universal-sentence-encoder/2