[nlp] How to compute the similarity between two text documents?

If you are more interested in measuring semantic similarity of two pieces of text, I suggest take a look at this gitlab project. You can run it as a server, there is also a pre-built model which you can use easily to measure the similarity of two pieces of text; even though it is mostly trained for measuring the similarity of two sentences, you can still use it in your case.It is written in java but you can run it as a RESTful service.

Another option also is DKPro Similarity which is a library with various algorithm to measure the similarity of texts. However, it is also written in java.

code example:

// this similarity measure is defined in the dkpro.similarity.algorithms.lexical-asl package
// you need to add that to your .pom to make that example work
// there are some examples that should work out of the box in dkpro.similarity.example-gpl 
TextSimilarityMeasure measure = new WordNGramJaccardMeasure(3);    // Use word trigrams

String[] tokens1 = "This is a short example text .".split(" ");   
String[] tokens2 = "A short example text could look like that .".split(" ");

double score = measure.getSimilarity(tokens1, tokens2);

System.out.println("Similarity: " + score);