[python] List comprehension vs. lambda + filter

I happened to find myself having a basic filtering need: I have a list and I have to filter it by an attribute of the items.

My code looked like this:

my_list = [x for x in my_list if x.attribute == value]

But then I thought, wouldn't it be better to write it like this?

my_list = filter(lambda x: x.attribute == value, my_list)

It's more readable, and if needed for performance the lambda could be taken out to gain something.

Question is: are there any caveats in using the second way? Any performance difference? Am I missing the Pythonic Way™ entirely and should do it in yet another way (such as using itemgetter instead of the lambda)?

This question is related to python list functional-programming filter lambda

The answer is


An important difference is that list comprehension will return a list while the filter returns a filter, which you cannot manipulate like a list (ie: call len on it, which does not work with the return of filter).

My own self-learning brought me to some similar issue.

That being said, if there is a way to have the resulting list from a filter, a bit like you would do in .NET when you do lst.Where(i => i.something()).ToList(), I am curious to know it.

EDIT: This is the case for Python 3, not 2 (see discussion in comments).


Here's a short piece I use when I need to filter on something after the list comprehension. Just a combination of filter, lambda, and lists (otherwise known as the loyalty of a cat and the cleanliness of a dog).

In this case I'm reading a file, stripping out blank lines, commented out lines, and anything after a comment on a line:

# Throw out blank lines and comments
with open('file.txt', 'r') as lines:        
    # From the inside out:
    #    [s.partition('#')[0].strip() for s in lines]... Throws out comments
    #   filter(lambda x: x!= '', [s.part... Filters out blank lines
    #  y for y in filter... Converts filter object to list
    file_contents = [y for y in filter(lambda x: x != '', [s.partition('#')[0].strip() for s in lines])]

Filter is just that. It filters out the elements of a list. You can see the definition mentions the same(in the official docs link I mentioned before). Whereas, list comprehension is something that produces a new list after acting upon something on the previous list.(Both filter and list comprehension creates new list and not perform operation in place of the older list. A new list here is something like a list with, say, an entirely new data type. Like converting integers to string ,etc)

In your example, it is better to use filter than list comprehension, as per the definition. However, if you want, say other_attribute from the list elements, in your example is to be retrieved as a new list, then you can use list comprehension.

return [item.other_attribute for item in my_list if item.attribute==value]

This is how I actually remember about filter and list comprehension. Remove a few things within a list and keep the other elements intact, use filter. Use some logic on your own at the elements and create a watered down list suitable for some purpose, use list comprehension.


It took me some time to get familiarized with the higher order functions filter and map. So i got used to them and i actually liked filter as it was explicit that it filters by keeping whatever is truthy and I've felt cool that I knew some functional programming terms.

Then I read this passage (Fluent Python Book):

The map and filter functions are still builtins in Python 3, but since the introduction of list comprehensions and generator ex- pressions, they are not as important. A listcomp or a genexp does the job of map and filter combined, but is more readable.

And now I think, why bother with the concept of filter / map if you can achieve it with already widely spread idioms like list comprehensions. Furthermore maps and filters are kind of functions. In this case I prefer using Anonymous functions lambdas.

Finally, just for the sake of having it tested, I've timed both methods (map and listComp) and I didn't see any relevant speed difference that would justify making arguments about it.

from timeit import Timer

timeMap = Timer(lambda: list(map(lambda x: x*x, range(10**7))))
print(timeMap.timeit(number=100))

timeListComp = Timer(lambda:[(lambda x: x*x) for x in range(10**7)])
print(timeListComp.timeit(number=100))

#Map:                 166.95695265199174
#List Comprehension   177.97208347299602

generally filter is slightly faster if using a builtin function.

I would expect the list comprehension to be slightly faster in your case


My take

def filter_list(list, key, value, limit=None):
    return [i for i in list if i[key] == value][:limit]

I find the second way more readable. It tells you exactly what the intention is: filter the list.
PS: do not use 'list' as a variable name


Since any speed difference is bound to be miniscule, whether to use filters or list comprehensions comes down to a matter of taste. In general I'm inclined to use comprehensions (which seems to agree with most other answers here), but there is one case where I prefer filter.

A very frequent use case is pulling out the values of some iterable X subject to a predicate P(x):

[x for x in X if P(x)]

but sometimes you want to apply some function to the values first:

[f(x) for x in X if P(f(x))]


As a specific example, consider

primes_cubed = [x*x*x for x in range(1000) if prime(x)]

I think this looks slightly better than using filter. But now consider

prime_cubes = [x*x*x for x in range(1000) if prime(x*x*x)]

In this case we want to filter against the post-computed value. Besides the issue of computing the cube twice (imagine a more expensive calculation), there is the issue of writing the expression twice, violating the DRY aesthetic. In this case I'd be apt to use

prime_cubes = filter(prime, [x*x*x for x in range(1000)])

I thought I'd just add that in python 3, filter() is actually an iterator object, so you'd have to pass your filter method call to list() in order to build the filtered list. So in python 2:

lst_a = range(25) #arbitrary list
lst_b = [num for num in lst_a if num % 2 == 0]
lst_c = filter(lambda num: num % 2 == 0, lst_a)

lists b and c have the same values, and were completed in about the same time as filter() was equivalent [x for x in y if z]. However, in 3, this same code would leave list c containing a filter object, not a filtered list. To produce the same values in 3:

lst_a = range(25) #arbitrary list
lst_b = [num for num in lst_a if num % 2 == 0]
lst_c = list(filter(lambda num: num %2 == 0, lst_a))

The problem is that list() takes an iterable as it's argument, and creates a new list from that argument. The result is that using filter in this way in python 3 takes up to twice as long as the [x for x in y if z] method because you have to iterate over the output from filter() as well as the original list.


This is a somewhat religious issue in Python. Even though Guido considered removing map, filter and reduce from Python 3, there was enough of a backlash that in the end only reduce was moved from built-ins to functools.reduce.

Personally I find list comprehensions easier to read. It is more explicit what is happening from the expression [i for i in list if i.attribute == value] as all the behaviour is on the surface not inside the filter function.

I would not worry too much about the performance difference between the two approaches as it is marginal. I would really only optimise this if it proved to be the bottleneck in your application which is unlikely.

Also since the BDFL wanted filter gone from the language then surely that automatically makes list comprehensions more Pythonic ;-)


In addition to the accepted answer, there is a corner case when you should use filter instead of a list comprehension. If the list is unhashable you cannot directly process it with a list comprehension. A real world example is if you use pyodbc to read results from a database. The fetchAll() results from cursor is an unhashable list. In this situation, to directly manipulating on the returned results, filter should be used:

cursor.execute("SELECT * FROM TABLE1;")
data_from_db = cursor.fetchall()
processed_data = filter(lambda s: 'abc' in s.field1 or s.StartTime >= start_date_time, data_from_db) 

If you use list comprehension here you will get the error:

TypeError: unhashable type: 'list'


Summarizing other answers

Looking through the answers, we have seen a lot of back and forth, whether or not list comprehension or filter may be faster or if it is even important or pythonic to care about such an issue. In the end, the answer is as most times: it depends.

I just stumbled across this question while optimizing code where this exact question (albeit combined with an in expression, not ==) is very relevant - the filter + lambda expression is taking up a third of my computation time (of multiple minutes).

My case

In my case, the list comprehension is much faster (twice the speed). But I suspect that this varies strongly based on the filter expression as well as the Python interpreter used.

Test it for yourself

Here is a simple code snippet that should be easy to adapt. If you profile it (most IDEs can do that easily), you will be able to easily decide for your specific case which is the better option:

whitelist = set(range(0, 100000000, 27))

input_list = list(range(0, 100000000))

proximal_list = list(filter(
        lambda x: x in whitelist,
        input_list
    ))

proximal_list2 = [x for x in input_list if x in whitelist]

print(len(proximal_list))
print(len(proximal_list2))

If you do not have an IDE that lets you profile easily, try this instead (extracted from my codebase, so a bit more complicated). This code snippet will create a profile for you that you can easily visualize using e.g. snakeviz:

import cProfile
from time import time


class BlockProfile:
    def __init__(self, profile_path):
        self.profile_path = profile_path
        self.profiler = None
        self.start_time = None

    def __enter__(self):
        self.profiler = cProfile.Profile()
        self.start_time = time()
        self.profiler.enable()

    def __exit__(self, *args):
        self.profiler.disable()
        exec_time = int((time() - self.start_time) * 1000)
        self.profiler.dump_stats(self.profile_path)


whitelist = set(range(0, 100000000, 27))
input_list = list(range(0, 100000000))

with BlockProfile("/path/to/create/profile/in/profile.pstat"):
    proximal_list = list(filter(
            lambda x: x in whitelist,
            input_list
        ))

    proximal_list2 = [x for x in input_list if x in whitelist]

print(len(proximal_list))
print(len(proximal_list2))

Although filter may be the "faster way", the "Pythonic way" would be not to care about such things unless performance is absolutely critical (in which case you wouldn't be using Python!).


Curiously on Python 3, I see filter performing faster than list comprehensions.

I always thought that the list comprehensions would be more performant. Something like: [name for name in brand_names_db if name is not None] The bytecode generated is a bit better.

>>> def f1(seq):
...     return list(filter(None, seq))
>>> def f2(seq):
...     return [i for i in seq if i is not None]
>>> disassemble(f1.__code__)
2         0 LOAD_GLOBAL              0 (list)
          2 LOAD_GLOBAL              1 (filter)
          4 LOAD_CONST               0 (None)
          6 LOAD_FAST                0 (seq)
          8 CALL_FUNCTION            2
         10 CALL_FUNCTION            1
         12 RETURN_VALUE
>>> disassemble(f2.__code__)
2           0 LOAD_CONST               1 (<code object <listcomp> at 0x10cfcaa50, file "<stdin>", line 2>)
          2 LOAD_CONST               2 ('f2.<locals>.<listcomp>')
          4 MAKE_FUNCTION            0
          6 LOAD_FAST                0 (seq)
          8 GET_ITER
         10 CALL_FUNCTION            1
         12 RETURN_VALUE

But they are actually slower:

   >>> timeit(stmt="f1(range(1000))", setup="from __main__ import f1,f2")
   21.177661532000116
   >>> timeit(stmt="f2(range(1000))", setup="from __main__ import f1,f2")
   42.233950221000214

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to list

Convert List to Pandas Dataframe Column Python find elements in one list that are not in the other Sorting a list with stream.sorted() in Java Python Loop: List Index Out of Range How to combine two lists in R How do I multiply each element in a list by a number? Save a list to a .txt file The most efficient way to remove first N elements in a list? TypeError: list indices must be integers or slices, not str Parse JSON String into List<string>

Examples related to functional-programming

Dart: mapping a list (list.map) Index inside map() function functional way to iterate over range (ES6/7) How can I count occurrences with groupBy? How do I use the includes method in lodash to check if an object is in the collection? Does Java SE 8 have Pairs or Tuples? Functional style of Java 8's Optional.ifPresent and if-not-Present? What is difference between functional and imperative programming languages? How does functools partial do what it does? map function for objects (instead of arrays)

Examples related to filter

Monitoring the Full Disclosure mailinglist Pyspark: Filter dataframe based on multiple conditions How Spring Security Filter Chain works Copy filtered data to another sheet using VBA Filter object properties by key in ES6 How do I filter date range in DataTables? How do I filter an array with TypeScript in Angular 2? Filtering array of objects with lodash based on property value How to filter an array from all elements of another array How to specify "does not contain" in dplyr filter

Examples related to lambda

Java 8 optional: ifPresent return object orElseThrow exception How to properly apply a lambda function into a pandas data frame column What are functional interfaces used for in Java 8? Java 8 lambda get and remove element from list Variable used in lambda expression should be final or effectively final Filter values only if not null using lambda in Java8 forEach loop Java 8 for Map entry set Java 8 Lambda Stream forEach with multiple statements Java 8 stream map to list of keys sorted by values Task.Run with Parameter(s)?