[python] Best way to handle list.index(might-not-exist) in python?

I have code which looks something like this:

thing_index = thing_list.index(thing)
otherfunction(thing_list, thing_index)

ok so that's simplified but you get the idea. Now thing might not actually be in the list, in which case I want to pass -1 as thing_index. In other languages this is what you'd expect index() to return if it couldn't find the element. In fact it throws a ValueError.

I could do this:

try:
    thing_index = thing_list.index(thing)
except ValueError:
    thing_index = -1
otherfunction(thing_list, thing_index)

But this feels dirty, plus I don't know if ValueError could be raised for some other reason. I came up with the following solution based on generator functions, but it seems a little complex:

thing_index = ( [(i for i in xrange(len(thing_list)) if thing_list[i]==thing)] or [-1] )[0]

Is there a cleaner way to achieve the same thing? Let's assume the list isn't sorted.

This question is related to python list find indexing

The answer is


I have the same issue with the ".index()" method on lists. I have no issue with the fact that it throws an exception but I strongly disagree with the fact that it's a non-descriptive ValueError. I could understand if it would've been an IndexError, though.

I can see why returning "-1" would be an issue too because it's a valid index in Python. But realistically, I never expect a ".index()" method to return a negative number.

Here goes a one liner (ok, it's a rather long line ...), goes through the list exactly once and returns "None" if the item isn't found. It would be trivial to rewrite it to return -1, should you so desire.

indexOf = lambda list, thing: \
            reduce(lambda acc, (idx, elem): \
                   idx if (acc is None) and elem == thing else acc, list, None)

How to use:

>>> indexOf([1,2,3], 4)
>>>
>>> indexOf([1,2,3], 1)
0
>>>

There is nothing wrong with your code that uses ValueError. Here's yet another one-liner if you'd like to avoid exceptions:

thing_index = next((i for i, x in enumerate(thing_list) if x == thing), -1)

The dict type has a get function, where if the key doesn't exist in the dictionary, the 2nd argument to get is the value that it should return. Similarly there is setdefault, which returns the value in the dict if the key exists, otherwise it sets the value according to your default parameter and then returns your default parameter.

You could extend the list type to have a getindexdefault method.

class SuperDuperList(list):
    def getindexdefault(self, elem, default):
        try:
            thing_index = self.index(elem)
            return thing_index
        except ValueError:
            return default

Which could then be used like:

mylist = SuperDuperList([0,1,2])
index = mylist.getindexdefault( 'asdf', -1 )

This issue is one of language philosophy. In Java for example there has always been a tradition that exceptions should really only be used in "exceptional circumstances" that is when errors have happened, rather than for flow control. In the beginning this was for performance reasons as Java exceptions were slow but now this has become the accepted style.

In contrast Python has always used exceptions to indicate normal program flow, like raising a ValueError as we are discussing here. There is nothing "dirty" about this in Python style and there are many more where that came from. An even more common example is StopIteration exception which is raised by an iterator‘s next() method to signal that there are no further values.


What about like this:

temp_inx = (L + [x]).index(x) 
inx = temp_inx if temp_inx < len(L) else -1

I don't know why you should think it is dirty... because of the exception? if you want a oneliner, here it is:

thing_index = thing_list.index(elem) if thing_list.count(elem) else -1

but i would advise against using it; I think Ross Rogers solution is the best, use an object to encapsulate your desiderd behaviour, don't try pushing the language to its limits at the cost of readability.


Comparison of implementations

Simple comparison on Python 3.8

TL;DR maybeidx2 is faster in general except for arrays (n<100) with lots of misses

def maybeidx1(l, v):
    return l.index(v) if v in l else None

def maybeidx2(l, v):
    try:
        return l.index(v)
    except ValueError:
        return None

Test cases:

a = [*range(100_000)]
# Case 1: index in list
maybeidx1(a, 50_000)
Out[20]: 50000
maybeidx2(a, 50_000)
Out[21]: 50000
# Case 2: index not in list
maybeidx1(a, 100_000) is None
Out[23]: True
maybeidx2(a, 100_000) is None
Out[24]: True

Timing case 1

%timeit maybeidx1(a, 50_000)
1.06 ms ± 15.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit maybeidx2(a, 50_000)
530 µs ± 8.47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Timing case 2

%timeit maybeidx1(a, 100_000)
1.07 ms ± 21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit maybeidx2(a, 100_000)
1.07 ms ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Result

Use the maybeidx2 method for larger arrays. This is faster due to maybeidx1 having two scans of the array in search of the value - this is still O(n) time but has a constant multiplier of 2 and thus slower in practice. This holds in the case of the value being present in the list. When the value is not present, these times will be roughly equal; they both have to scan the full array exactly once, then return None. The overhead of the try-except is negligible, even with an array size of 10 - unless case two occurs. Then the try-except overhead is noticeable. Example:

a = [*range(10)]
%timeit maybeidx1(a, 10)
191 ns ± 2.61 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit maybeidx2(a, 10)
566 ns ± 5.93 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

This overhead becomes negligible (on my machine) when a has above 100 elements.


thing_index = thing_list.index(elem) if elem in thing_list else -1

One line. Simple. No exceptions.


What about this :

li = [1,2,3,4,5] # create list 

li = dict(zip(li,range(len(li)))) # convert List To Dict 
print( li ) # {1: 0, 2: 1, 3: 2, 4:3 , 5: 4}
li.get(20) # None 
li.get(1)  # 0 

There is nothing "dirty" about using try-except clause. This is the pythonic way. ValueError will be raised by the .index method only, because it's the only code you have there!

To answer the comment:
In Python, easier to ask forgiveness than to get permission philosophy is well established, and no index will not raise this type of error for any other issues. Not that I can think of any.


If you are doing this often then it is better to stove it away in a helper function:

def index_of(val, in_list):
    try:
        return in_list.index(val)
    except ValueError:
        return -1 

I'd suggest:

if thing in thing_list:
  list_index = -1
else:
  list_index = thing_list.index(thing)

It's been quite some time but it's a core part of the stdlib and has dozens of potential methods so I think it's useful to have some benchmarks for the different suggestions and include the numpy method which can be by far the fastest.

import random
from timeit import timeit
import numpy as np

l = [random.random() for i in range(10**4)]
l[10**4 - 100] = 5

# method 1
def fun1(l:list, x:int, e = -1) -> int:
    return [[i for i,elem in enumerate(l) if elem == x] or [e]][0]

# method 2
def fun2(l:list, x:int, e = -1) -> int:
    for i,elem in enumerate(l):
        if elem == x:
            return i
    else:
        return e

# method 3
def fun3(l:list, x:int, e = -1) -> int:
    try:
        idx = l.index(x)
    except ValueError:
        idx = e
    return idx

# method 4
def fun4(l:list, x:int, e = -1) -> int:
    return l.index(x) if x in l else e

l2 = np.array(l)
# method 5
def fun5(l:list or np.ndarray, x:int, e = -1) -> int:
    res = np.where(np.equal(l, x))
    if res[0].any():
        return res[0][0]
    else:        
        return e


if __name__ == "__main__":
    print("Method 1:")
    print(timeit(stmt = "fun1(l, 5)", number = 1000, globals = globals()))
    print("")
    print("Method 2:")
    print(timeit(stmt = "fun2(l, 5)", number = 1000, globals = globals()))
    print("")
    print("Method 3:")
    print(timeit(stmt = "fun3(l, 5)", number = 1000, globals = globals()))
    print("")
    print("Method 4:")
    print(timeit(stmt = "fun4(l, 5)", number = 1000, globals = globals()))
    print("")
    print("Method 5, numpy given list:")
    print(timeit(stmt = "fun5(l, 5)", number = 1000, globals = globals()))
    print("")
    print("Method 6, numpy given np.ndarray:")
    print(timeit(stmt = "fun5(l2, 5)", number = 1000, globals = globals()))
    print("")

When run as main, this results in the following printout on my machine indicating time in seconds to complete 1000 trials of each function:

Method 1: 0.7502102799990098

Method 2: 0.7291318440002215

Method 3: 0.24142152300009911

Method 4: 0.5253471979995084

Method 5, numpy given list: 0.5045417560013448

Method 6, numpy given np.ndarray: 0.011147511999297421

Of course the question asks specifically about lists so the best solution is to use the try-except method, however the speed improvements (at least 20x here compared to try-except) offered by using the numpy data structures and operators instead of python data structures is significant and if building something on many arrays of data that is performance critical then the author should try to use numpy throughout to take advantage of the superfast C bindings. (CPython interpreter, other interpreter performances may vary)

Btw, the reason Method 5 is much slower than Method 6 is because numpy first has to convert the given list to it's own numpy array, so giving it a list doesn't break it it just doesn't fully utilise the speed possible.


What about this:

otherfunction(thing_collection, thing)

Rather than expose something so implementation-dependent like a list index in a function interface, pass the collection and the thing and let otherfunction deal with the "test for membership" issues. If otherfunction is written to be collection-type-agnostic, then it would probably start with:

if thing in thing_collection:
    ... proceed with operation on thing

which will work if thing_collection is a list, tuple, set, or dict.

This is possibly clearer than:

if thing_index != MAGIC_VALUE_INDICATING_NOT_A_MEMBER:

which is the code you already have in otherfunction.


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to list

Convert List to Pandas Dataframe Column Python find elements in one list that are not in the other Sorting a list with stream.sorted() in Java Python Loop: List Index Out of Range How to combine two lists in R How do I multiply each element in a list by a number? Save a list to a .txt file The most efficient way to remove first N elements in a list? TypeError: list indices must be integers or slices, not str Parse JSON String into List<string>

Examples related to find

Find a file by name in Visual Studio Code Explaining the 'find -mtime' command find files by extension, *.html under a folder in nodejs MongoDB Show all contents from all collections How can I find a file/directory that could be anywhere on linux command line? Get all files modified in last 30 days in a directory FileNotFoundError: [Errno 2] No such file or directory Linux find and grep command together find . -type f -exec chmod 644 {} ; Find all stored procedures that reference a specific column in some table

Examples related to indexing

numpy array TypeError: only integer scalar arrays can be converted to a scalar index How to print a specific row of a pandas DataFrame? What does 'index 0 is out of bounds for axis 0 with size 0' mean? How does String.Index work in Swift Pandas KeyError: value not in index Update row values where certain condition is met in pandas Pandas split DataFrame by column value Rebuild all indexes in a Database How are iloc and loc different? pandas loc vs. iloc vs. at vs. iat?