Check if a string contains an element from a list of strings

Question

For the following block of code:

For I = 0 To listOfStrings.Count - 1
    If myString.Contains(listOfStrings.Item(I)) Then
        Return True
    End If
Next
Return False

The output is:

Case 1:
myString: C:\Files\myfile.doc
listOfStrings: C:\Files\, C:\Files2\
Result: True

Case 2:
myString: C:\Files3\myfile.doc
listOfStrings: C:\Files\, C:\Files2\
Result: False

The list (listOfStrings) may contain several items (minimum 20), and it has to be checked against thousands of strings (like myString).

Is there a better (more efficient) way to write this code?

User · Answer

Based on your patterns, one improvement would be to use StartsWith instead of Contains. StartsWith only has to compare from the beginning of each string and can stop at the first mismatching character, whereas Contains has to retry the comparison at every character position in the string.
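A minimal C# sketch of the StartsWith idea, assuming every entry in listOfStrings is a directory prefix as in the question's examples. The class and method names (PrefixCheck, StartsWithAny) are hypothetical; StringComparison.Ordinal is chosen so that, like the original Contains(string), the comparison is case-sensitive and culture-independent.

```csharp
using System;
using System.Collections.Generic;

static class PrefixCheck
{
    // Hypothetical helper: true if text begins with any of the given prefixes.
    // Ordinal comparison avoids culture-specific rules and is the cheapest option.
    public static bool StartsWithAny(string text, IEnumerable<string> prefixes)
    {
        foreach (string prefix in prefixes)
        {
            if (text.StartsWith(prefix, StringComparison.Ordinal))
                return true; // stop at the first matching prefix
        }
        return false;
    }
}
```

With the question's data, PrefixCheck.StartsWithAny(@"C:\Files\myfile.doc", new[] { @"C:\Files\", @"C:\Files2\" }) is true and the same call with @"C:\Files3\myfile.doc" is false. Note that this only gives the same answers as Contains when the list entries can only occur at the start of myString.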

Also, based on your patterns, it looks like you may be able to extract the first part of the path for myString, then reverse the comparison -- looking for the starting path of myString in the list of strings rather than the other way around.

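// Requires using System.IO; Path.DirectorySeparatorChar is '\' on Windows.
string[] pathComponents = myString.Split( Path.DirectorySeparatorChar );
// @"C:\Files\myfile.doc" splits into "C:", "Files", "myfile.doc", so the
// first two components are needed to rebuild the prefix @"C:\Files\".
string startPath = pathComponents[0] + Path.DirectorySeparatorChar
                 + pathComponents[1] + Path.DirectorySeparatorChar;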

return listOfStrings.Contains( startPath );

EDIT: This would be even faster using the HashSet idea @Marc Gravell mentions: if the paths were stored in a HashSet<string>, the same Contains call would be an O(1) lookup on average instead of the O(N) scan a List performs. You would have to make sure that the paths match exactly. Note that, unlike @Marc Gravell's answer, this is not a general solution; it is tailored to your examples.
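A sketch of the HashSet variant that does not depend on how many components the prefixes have. It assumes every list entry is a complete directory prefix ending in a backslash (as in the examples) and then looks up each ancestor directory of myString in the set. DirectoryPrefixSet and HasListedAncestor are hypothetical names, and the ordinal comparer is an assumption; StringComparer.OrdinalIgnoreCase may suit Windows paths better.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: assumes each entry in the list is a directory prefix
// ending with '\' (for example @"C:\Files\"), as in the question's examples.
sealed class DirectoryPrefixSet
{
    private readonly HashSet<string> _prefixes;

    public DirectoryPrefixSet(IEnumerable<string> prefixes)
    {
        // Built once; each lookup afterwards is O(1) on average.
        _prefixes = new HashSet<string>(prefixes, StringComparer.Ordinal);
    }

    public bool HasListedAncestor(string path)
    {
        // Try every ancestor directory of the path: "C:\", "C:\Files\", ...
        for (int i = path.IndexOf('\\'); i >= 0; i = path.IndexOf('\\', i + 1))
        {
            if (_prefixes.Contains(path.Substring(0, i + 1)))
                return true;
        }
        return false;
    }
}
```

The set is built once and reused for all of the thousands of strings; each check then costs one hash lookup per backslash in the path rather than one comparison per list entry.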

Sorry for the C# example. I haven't had enough coffee to translate to VB.

User · Answer

With LINQ, and using C# (I don't know VB much these days):

bool b = listOfStrings.Any(s => myString.Contains(s));

or (shorter and more efficient, but arguably less clear):

bool b = listOfStrings.Any(myString.Contains);

If you were testing equality, it would be worth looking at HashSet etc., but this won't help with partial matches unless you split it into fragments and add an order of complexity.

Update: if you really mean "StartsWith", then you could sort the list and place it into an array, then use Array.BinarySearch to find each item; check by lookup to see if it is a full or partial match.
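The update above only outlines the sorted-array idea. Below is a sketch of one way to implement it under two assumptions not stated in the answer: all comparisons are ordinal, and entries that start with another entry are discarded first. Without that second step, checking only the neighbouring entry can miss matches (with the entries C:\A and C:\AB, the text C:\AC sorts after C:\AB, which is not its prefix). SortedPrefixMatcher is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Sketch of "sort the list, then Array.BinarySearch" for StartsWith checks.
// The class name is illustrative; all comparisons are ordinal.
sealed class SortedPrefixMatcher
{
    private readonly string[] _prefixes;

    public SortedPrefixMatcher(IEnumerable<string> prefixes)
    {
        var sorted = new List<string>(prefixes);
        sorted.Sort(StringComparer.Ordinal);

        // Keep the set prefix-free: an entry that starts with an already kept
        // entry is redundant, and removing it guarantees that the only possible
        // match is the nearest entry sorting before the text.
        var kept = new List<string>();
        foreach (string p in sorted)
        {
            if (kept.Count == 0 || !p.StartsWith(kept[kept.Count - 1], StringComparison.Ordinal))
                kept.Add(p);
        }
        _prefixes = kept.ToArray();
    }

    public bool StartsWithAny(string text)
    {
        int index = Array.BinarySearch(_prefixes, text, StringComparer.Ordinal);
        if (index >= 0)
            return true; // full match: text equals one of the entries

        // Not found: ~index is the insertion point, so the entry just before it
        // is the largest one that sorts below text. Check it for a partial match.
        int candidate = ~index - 1;
        return candidate >= 0 && text.StartsWith(_prefixes[candidate], StringComparison.Ordinal);
    }
}
```

Each check is then one O(log N) binary search plus a single StartsWith, instead of N StartsWith calls.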

User · Answer

myList.Any(myString.Contains);

User · Answer

Have you tested the speed, i.e. have you created a sample set of data and profiled it? It may not be as bad as you think.

This might also be something you could spawn off into a separate thread to give the illusion of speed!
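If you want to profile before optimizing, a minimal harness could look like this. The data generator is a made-up assumption shaped like the question (directory prefixes, thousands of paths, about half of them matching), not real data; only System.Diagnostics.Stopwatch and LINQ's Count and Any are real APIs, and ContainsBenchmark is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Minimal profiling harness; the sample data below is synthetic, not real data.
static class ContainsBenchmark
{
    // prefixCount prefixes like @"C:\Files7\", and paths spread over twice as
    // many directories, so about half of the paths match (an arbitrary choice).
    public static (List<string> Prefixes, List<string> Paths) MakeSampleData(int prefixCount, int pathCount)
    {
        List<string> prefixes = Enumerable.Range(0, prefixCount)
            .Select(i => $@"C:\Files{i}\")
            .ToList();
        List<string> paths = Enumerable.Range(0, pathCount)
            .Select(i => $@"C:\Files{i % (prefixCount * 2)}\file{i}.doc")
            .ToList();
        return (prefixes, paths);
    }

    // Times the straightforward Any + Contains check over every path.
    public static int CountMatches(List<string> prefixes, List<string> paths, out TimeSpan elapsed)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        int matches = paths.Count(path => prefixes.Any(prefix => path.Contains(prefix)));
        stopwatch.Stop();
        elapsed = stopwatch.Elapsed;
        return matches;
    }
}
```

For example, call MakeSampleData(20, 5000) and pass the result to CountMatches, then print the elapsed time. Run it in a Release build and more than once, since the first call also pays for JIT compilation.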

User · Answer

Old question, but since VB.NET was the original requirement, here it is using the same values as the accepted answer:

listOfStrings.Any(Function(s) myString.Contains(s))

User · Answer

If speed is critical, you might want to look for the Aho-Corasick algorithm for sets of patterns.

It's a trie with failure links, that is, complexity is O(n + m + k), where n is the length of the input text, m the cumulative length of the patterns and k the number of matches. You just have to modify the algorithm to terminate after the first match is found.
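A compact C# sketch of Aho-Corasick with the early-exit modification described above. It is an illustrative implementation, not a library API (AhoCorasick and ContainsAny are hypothetical names), and it uses a Dictionary per trie node for simplicity rather than speed.

```csharp
using System.Collections.Generic;

// Minimal Aho-Corasick sketch that stops at the first match.
// Class and member names are illustrative, not from any library.
sealed class AhoCorasick
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Next = new Dictionary<char, Node>();
        public Node Fail;
        public bool IsMatch; // a pattern ends here, directly or via a suffix
    }

    private readonly Node _root = new Node();

    public AhoCorasick(IEnumerable<string> patterns)
    {
        // 1. Insert every pattern into a trie: O(m) for total pattern length m.
        foreach (string pattern in patterns)
        {
            Node node = _root;
            foreach (char c in pattern)
            {
                if (!node.Next.TryGetValue(c, out Node child))
                {
                    child = new Node();
                    node.Next[c] = child;
                }
                node = child;
            }
            node.IsMatch = true;
        }

        // 2. Breadth-first pass: each node's failure link points to the node
        //    for the longest proper suffix of its path that is also in the trie.
        var queue = new Queue<Node>();
        _root.Fail = _root;
        foreach (Node child in _root.Next.Values)
        {
            child.Fail = _root;
            queue.Enqueue(child);
        }
        while (queue.Count > 0)
        {
            Node node = queue.Dequeue();
            foreach (KeyValuePair<char, Node> edge in node.Next)
            {
                Node f = node.Fail;
                while (f != _root && !f.Next.ContainsKey(edge.Key))
                    f = f.Fail;
                edge.Value.Fail = f.Next.TryGetValue(edge.Key, out Node target) ? target : _root;
                // If a shorter pattern is a suffix of this path, this node also matches.
                edge.Value.IsMatch |= edge.Value.Fail.IsMatch;
                queue.Enqueue(edge.Value);
            }
        }
    }

    // Scans text once and returns as soon as any pattern has been seen.
    public bool ContainsAny(string text)
    {
        if (_root.IsMatch) return true; // an empty pattern occurs in every string
        Node node = _root;
        foreach (char c in text)
        {
            // Follow failure links until some node can be extended by c.
            while (node != _root && !node.Next.ContainsKey(c))
                node = node.Fail;
            if (node.Next.TryGetValue(c, out Node next))
                node = next;
            if (node.IsMatch)
                return true; // early exit instead of collecting all k matches
        }
        return false;
    }
}
```

The automaton is built once from listOfStrings and then reused: each myString is scanned in a single left-to-right pass, no matter how many patterns there are.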

User · Answer

When you construct your strings inline, it should look like this:

bool inact = new string[] { "SUSPENDARE", "DIZOLVARE" }.Any(s => stare.Contains(s));

User · Answer

I liked Marc's answer, but needed the Contains matching to be CaSe InSenSiTiVe. This was the solution:

bool b = listOfStrings.Any(s => myString.IndexOf(s, StringComparison.OrdinalIgnoreCase) >= 0);

User · Answer

There were a number of suggestions from an earlier similar question, "Best way to test for existing string against a large list of comparables".

Regex might be sufficient for your requirement. The expression would be a concatenation of all the candidate substrings, with an OR ("|") operator between them. Of course, you'll have to watch out for unescaped characters when building the expression, or for a failure to compile it because of complexity or size limitations.

Another way to do this would be to construct a trie data structure to represent all the candidate substrings (this may somewhat duplicate what the regex matcher is doing). As you step through each character in the test string, you would create a new pointer to the root of the trie and advance the existing pointers to the appropriate child (if any), dropping pointers that have no matching child. You get a match when any pointer reaches a node where a candidate substring ends (a leaf, unless one candidate is a prefix of another).
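Both suggestions can be sketched in C# as follows. Regex.Escape handles the unescaped-character problem mentioned above; the "(?!)" fallback for an empty list, the class name SubstringSetMatch and its members are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Sketches of the two suggestions; names are illustrative only.
static class SubstringSetMatch
{
    // Suggestion 1: one alternation "a|b|c". Regex.Escape neutralises
    // metacharacters such as '\' and '.' that are common in paths.
    public static Regex BuildRegex(IEnumerable<string> candidates)
    {
        string[] escaped = candidates.Select(s => Regex.Escape(s)).ToArray();
        // An empty alternation would match every string, so fall back to "(?!)",
        // a lookahead that can never succeed.
        string pattern = escaped.Length == 0 ? "(?!)" : string.Join("|", escaped);
        return new Regex(pattern);
    }

    // Suggestion 2: a plain trie walked with one pointer per start position.
    public sealed class TrieNode
    {
        public readonly Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
        public bool EndsCandidate; // true where a candidate ends (not always a leaf)
    }

    public static TrieNode BuildTrie(IEnumerable<string> candidates)
    {
        var root = new TrieNode();
        foreach (string s in candidates)
        {
            TrieNode node = root;
            foreach (char c in s)
            {
                if (!node.Children.TryGetValue(c, out TrieNode child))
                {
                    child = new TrieNode();
                    node.Children[c] = child;
                }
                node = child;
            }
            node.EndsCandidate = true;
        }
        return root;
    }

    public static bool TrieContainsAny(TrieNode root, string text)
    {
        if (root.EndsCandidate) return true; // an empty candidate matches anything
        var active = new List<TrieNode>();
        foreach (char c in text)
        {
            active.Add(root); // start a new pointer at this character
            var advanced = new List<TrieNode>();
            foreach (TrieNode node in active)
            {
                // Pointers without a child for c are simply dropped.
                if (node.Children.TryGetValue(c, out TrieNode next))
                {
                    if (next.EndsCandidate) return true;
                    advanced.Add(next);
                }
            }
            active = advanced;
        }
        return false;
    }
}
```

In both cases the Regex or trie should be built once and reused for every test string. The pointer-based trie can hold up to one live pointer per character of the longest candidate; the failure links of Aho-Corasick exist to avoid keeping several pointers alive.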

User · Answer

As I needed to check if there are items from a list in a (long) string, I ended up with this one:

listOfStrings.Any(x => myString.ToUpper().Contains(x.ToUpper()));

Or in VB.NET:

listOfStrings.Any(Function(x) myString.ToUpper().Contains(x.ToUpper()))

User · Answer

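The drawback of the Contains(string) method is that it doesn't let you specify the comparison type, which is often important when comparing strings: it always performs an ordinal, case-sensitive comparison. So I think the answer of WhoIsRich is valuable. I just want to show a simpler alternative for the case where you need whole-string matches rather than substring matches:

listOfStrings.Any(s => s.Equals(myString, StringComparison.OrdinalIgnoreCase))

For a case-insensitive substring check on .NET Core 2.1 or later, String.Contains also has an overload that takes a StringComparison: listOfStrings.Any(s => myString.Contains(s, StringComparison.OrdinalIgnoreCase)).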
