[algorithm] Getting the closest string match

Here is one more similarity measure, which I once implemented in our system and which gave satisfactory results:

Use Case

There is a user query which needs to be matched against a set of documents.

Algorithm

  1. Extract keywords from the user query (keep only the relevant POS tags: noun, proper noun).
  2. Calculate a score, using the formula below, that measures the similarity between the user query and a given document.

For every keyword extracted from the user query:

  • Search the document for the keyword, and for every subsequent occurrence of that word in the document, decrease the points awarded.

For example, if the first keyword appears 4 times in the document, its score is calculated as follows:

  • The first occurrence earns 1 point.
  • The second occurrence adds 1/2 to the score.
  • The third occurrence adds 1/3.
  • The fourth occurrence adds 1/4.

Score for this keyword = 1 + 1/2 + 1/3 + 1/4 ≈ 2.083
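
Written generally, the rule above gives a keyword that occurs n times a score equal to the n-th harmonic number. The symbols s and n are my own notation, not part of the original formula:

```latex
s(n) = \sum_{k=1}^{n} \frac{1}{k}, \qquad s(4) = 1 + \tfrac{1}{2} + \tfrac{1}{3} + \tfrac{1}{4} = \tfrac{25}{12} \approx 2.083
```

Since this sum grows only about as fast as ln n, a keyword repeated many times raises the score far less than the first few matches do.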

The score is calculated the same way for each of the other keywords in the user query.

Finally, the total score represents the extent of similarity between the user query and the given document.
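
The scoring step can be sketched in Python roughly as follows. This is a minimal illustration, not the code from our system: the function names `keyword_score` and `similarity` are hypothetical, the keywords are assumed to have been extracted already (the POS-tagging step is not shown), matching is assumed to be case-insensitive on whole words, and the total is assumed to be the plain sum of the per-keyword scores.

```python
import re
from collections import Counter


def keyword_score(occurrences: int) -> float:
    """Score for one keyword: 1 + 1/2 + ... + 1/occurrences (0 if absent)."""
    return sum(1.0 / k for k in range(1, occurrences + 1))


def similarity(keywords, document: str) -> float:
    """Sum the per-keyword scores over all distinct query keywords.

    Assumptions: whole-word, case-insensitive matching, and a keyword
    listed twice in the query is only counted once.
    """
    # Split the document into lowercase word tokens and count them.
    counts = Counter(re.findall(r"\w+", document.lower()))
    distinct_keywords = {kw.lower() for kw in keywords}
    return sum(keyword_score(counts[kw]) for kw in distinct_keywords)


if __name__ == "__main__":
    doc = "The cat chased another cat while the dog slept."
    # "cat" occurs twice (1 + 1/2), "dog" once (1), "bird" never (0).
    print(similarity(["cat", "dog", "bird"], doc))
```

To rank a set of documents, one would compute `similarity` for each document and sort by the result in descending order.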
