[python] Python: Select subset from list based on index set

I have several lists having all the same number of entries (each specifying an object property):

property_a = [545., 656., 5.4, 33.]
property_b = [ 1.2,  1.3, 2.3, 0.3]
...

and list with flags of the same length

good_objects = [True, False, False, True]

(which could easily be substituted with an equivalent index list:

good_indices = [0, 3]

What is the easiest way to generate new lists property_asel, property_bsel, ... which contain only the values indicated either by the True entries or the indices?

property_asel = [545., 33.]
property_bsel = [ 1.2, 0.3]

This question is related to python list

The answer is


You could just use list comprehension:

property_asel = [val for is_good, val in zip(good_objects, property_a) if is_good]

or

property_asel = [property_a[i] for i in good_indices]

The latter one is faster because there are fewer good_indices than the length of property_a, assuming good_indices are precomputed instead of generated on-the-fly.


Edit: The first option is equivalent to itertools.compress available since Python 2.7/3.1. See @Gary Kerr's answer.

property_asel = list(itertools.compress(property_a, good_objects))

Matlab and Scilab languages offer a simpler and more elegant syntax than Python for the question you're asking, so I think the best you can do is to mimic Matlab/Scilab by using the Numpy package in Python. By doing this the solution to your problem is very concise and elegant:

from numpy import *
property_a = array([545., 656., 5.4, 33.])
property_b = array([ 1.2,  1.3, 2.3, 0.3])
good_objects = [True, False, False, True]
good_indices = [0, 3]
property_asel = property_a[good_objects]
property_bsel = property_b[good_indices]

Numpy tries to mimic Matlab/Scilab but it comes at a cost: you need to declare every list with the keyword "array", something which will overload your script (this problem doesn't exist with Matlab/Scilab). Note that this solution is restricted to arrays of number, which is the case in your example.


Use the built in function zip

property_asel = [a for (a, truth) in zip(property_a, good_objects) if truth]

EDIT

Just looking at the new features of 2.7. There is now a function in the itertools module which is similar to the above code.

http://docs.python.org/library/itertools.html#itertools.compress

itertools.compress('ABCDEF', [1,0,1,0,1,1]) =>
  A, C, E, F

Assuming you only have the list of items and a list of true/required indices, this should be the fastest:

property_asel = [ property_a[index] for index in good_indices ]

This means the property selection will only do as many rounds as there are true/required indices. If you have a lot of property lists that follow the rules of a single tags (true/false) list you can create an indices list using the same list comprehension principles:

good_indices = [ index for index, item in enumerate(good_objects) if item ]

This iterates through each item in good_objects (while remembering its index with enumerate) and returns only the indices where the item is true.


For anyone not getting the list comprehension, here is an English prose version with the code highlighted in bold:

list the index for every group of index, item that exists in an enumeration of good objects, if (where) the item is True


I see 2 options.

  1. Using numpy:

    property_a = numpy.array([545., 656., 5.4, 33.])
    property_b = numpy.array([ 1.2,  1.3, 2.3, 0.3])
    good_objects = [True, False, False, True]
    good_indices = [0, 3]
    property_asel = property_a[good_objects]
    property_bsel = property_b[good_indices]
    
  2. Using a list comprehension and zip it:

    property_a = [545., 656., 5.4, 33.]
    property_b = [ 1.2,  1.3, 2.3, 0.3]
    good_objects = [True, False, False, True]
    good_indices = [0, 3]
    property_asel = [x for x, y in zip(property_a, good_objects) if y]
    property_bsel = [property_b[i] for i in good_indices]