[python] Removing multiple keys from a dictionary safely

I know how to remove an entry 'key' from my dictionary d safely. You do:

if 'key' in d:
    del d['key']

However, I need to remove multiple entries from a dictionary safely. I was thinking of defining the entries in a tuple as I will need to do this more than once.

entities_to_remove = ('a', 'b', 'c')
for x in entities_to_remove:
    if x in d:
        del d[x]

However, I was wondering if there is a smarter way to do this?

This question is related to python dictionary

The answer is


Inline, using map and functools.partial:

import functools

# note that key 'c' is not in d
d = {"a": "avalue", "b": "bvalue", "d": "dvalue"}

entities_to_remove = ('a', 'b', 'c')

# python 2
map(lambda x: functools.partial(d.pop, x, None)(), entities_to_remove)

# python 3 (map is lazy, so wrap it in list() to force evaluation)

list(map(lambda x: functools.partial(d.pop, x, None)(), entities_to_remove))

print(d)
# output: {'d': 'dvalue'}

I found a solution with pop and map:

d = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'b', 'c']
list(map(d.pop, keys))
print(d)

The output of this:

{'d': 'valueD'}

I'm answering this late only because I think it will help anyone who searches for the same thing in the future.

Update

The above code will raise a KeyError if a key does not exist in the dict.

DICTIONARY = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'l', 'c']

def remove_key(key):
    try:
        DICTIONARY.pop(key, None)
    except KeyError:
        pass  # or take any other action

list(map(remove_key, keys))
print(DICTIONARY)

output:

{'b': 'valueB', 'd': 'valueD'}

It would be nice to have full support for set methods for dictionaries (and not the unholy mess we're getting with Python 3.9) so that you could simply "remove" a set of keys. However, as long as that's not the case, and you have a large dictionary with potentially a large number of keys to remove, you might want to know about the performance. So, I've created some code that creates something large enough for meaningful comparisons: a 100,000 x 1,000 matrix, so roughly 100 million items in total.

from itertools import product
from time import perf_counter

# make a complete worksheet 100000 * 1000
start = perf_counter()
prod = product(range(1, 100000), range(1, 1000))
cells = {(x,y):x for x,y in prod}
print(len(cells))

print(f"Create time {perf_counter()-start:.2f}s")
clock = perf_counter()
# remove the cells in rows 50,000 and above (columns 1-99)

keys = product(range(50000, 100000), range(1, 100))

# for x,y in keys:
#     del cells[x, y]

for n in map(cells.pop, keys):
    pass

print(len(cells))
stop = perf_counter()
print(f"Removal time {stop-clock:.2f}s")

10 million items or more is not unusual in some settings. Comparing the two methods on my local machine, I see a slight improvement when using map and pop, presumably because of fewer function calls, but both take around 2.5s. That pales in comparison to the time required to create the dictionary in the first place (55s), or to the cost of membership checks inside the loop. If missing keys are likely, it's best to create a set that is the intersection of the dictionary's keys and your filter:

keys = cells.keys() & keys
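
For illustration, a minimal sketch of that prefilter-then-delete pattern (variable names follow the snippet above; it assumes keys is a fresh iterable of candidate coordinates, since the benchmark has already consumed the generator):

# keep only the candidate keys that actually exist in the dictionary,
# then delete them without any per-key existence checks
existing = cells.keys() & keys
for key in existing:
    del cells[key]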

In summary: del is already heavily optimised, so don't worry about using it.


Another map() way to remove a list of keys from a dictionary while avoiding a KeyError exception:

dic = {
    'key1': 1,
    'key2': 2,
    'key3': 3,
    'key4': 4,
    'key5': 5,
}

keys_to_remove = ['key_not_exist', 'key1', 'key2', 'key3']
k = list(map(dic.pop, keys_to_remove, keys_to_remove))

print('k=', k)
print('dic after =  \n', dic)

This will produce the output:

k= ['key_not_exist', 1, 2, 3]
dic after =  {'key4': 4, 'key5': 5}

Passing keys_to_remove twice looks artificial; the second argument only supplies the default values for dict.pop(), so that missing keys don't raise. You can pass any sequence here as long as its length equals len(keys_to_remove).


For example

dic = {
    'key1': 1,
    'key2': 2,
    'key3': 3,
    'key4': 4,
    'key5': 5,
}

import numpy as np

keys_to_remove = ['key_not_exist', 'key1', 'key2', 'key3']
k = list(map(dic.pop, keys_to_remove, np.zeros(len(keys_to_remove))))

print('k=', k)
print('dic after = ', dic)

This will produce the output:

k= [0.0, 1, 2, 3]
dic after =  {'key4': 4, 'key5': 5}

Why not:

entriestoremove = (2, 5, 1)
for e in entriestoremove:
    if e in d:
        del d[e]

I don't know what you mean by "smarter way". Surely there are other ways, maybe with dictionary comprehensions:

entriestoremove = (2, 5, 1)
newdict = {x: d[x] for x in d if x not in entriestoremove}

I think using the fact that the keys can be treated as a set is the nicest way if you're on python 3:

def remove_keys(d, keys):
    to_remove = set(keys)
    filtered_keys = d.keys() - to_remove
    filtered_values = map(d.get, filtered_keys)
    return dict(zip(filtered_keys, filtered_values))

Example:

>>> remove_keys({'k1': 1, 'k3': 3}, ['k1', 'k2'])
{'k3': 3}

Using Dict Comprehensions

final_dict = {key: t[key] for key in t if key not in [key1, key2]}

where key1 and key2 are to be removed.

In the example below, keys "b" and "c" are to be removed, and they are kept in a keys list.

>>> a
{'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> keys = ["b", "c"]
>>> print({key: a[key] for key in a if key not in keys})
{'a': 1, 'd': 4}
>>>

A solution is to use the map and filter functions:

python 2

d={"a":1,"b":2,"c":3}
l=("a","b","d")
map(d.__delitem__, filter(d.__contains__,l))
print(d)

python 3

d={"a":1,"b":2,"c":3}
l=("a","b","d")
list(map(d.__delitem__, filter(d.__contains__,l)))
print(d)

You get:

{'c': 3}

If you also need to retrieve the values for the keys you are removing, this would be a pretty good way to do it:

values_removed = [d.pop(k, None) for k in entities_to_remove]

You could of course still do this just for the removal of the keys from d, but you would be unnecessarily creating the list of values with the list comprehension. It is also a little unclear to use a list comprehension just for the function's side effect.
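
For example (the dictionary and keys here are illustrative):

d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
entities_to_remove = ('a', 'b', 'x')

# pop() returns the removed value, or None when the key is absent
values_removed = [d.pop(k, None) for k in entities_to_remove]

print(values_removed)  # [1, 2, None]
print(d)               # {'c': 3, 'd': 4}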


I'm late to this discussion, but for anyone else: one solution is to create a list of keys, like so.

k = ['a','b','c','d']

Then use pop() in a list comprehension (or a for loop) to iterate over the keys and pop them one at a time:

removed_values = [dictionary.pop(x, 'n/a') for x in k]  # note: this is a list of the popped values, not a dict

The 'n/a' is the default value returned in case a key does not exist.
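
For example (the dictionary here is made up to show the default in action):

dictionary = {'a': 1, 'b': 2, 'e': 5}
k = ['a', 'b', 'c', 'd']

removed_values = [dictionary.pop(x, 'n/a') for x in k]

print(removed_values)  # [1, 2, 'n/a', 'n/a']
print(dictionary)      # {'e': 5}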


I have no problem with any of the existing answers, but I was surprised to not find this solution:

keys_to_remove = ['a', 'b', 'c']
my_dict = {k: v for k, v in zip("a b c d e f g".split(' '), [0, 1, 2, 3, 4, 5, 6])}

for k in keys_to_remove:
    try:
        del my_dict[k]
    except KeyError:
        pass

assert my_dict == {'d': 3, 'e': 4, 'f': 5, 'g': 6}

Note: I stumbled across this question coming from here. And my answer is related to this answer.


Using dict.pop:

d = {'some': 'data'}
entries_to_remove = ('any', 'iterable')
for k in entries_to_remove:
    d.pop(k, None)

I have tested the performance of three methods:

# Method 1: `del`
for key in remove_keys:
    if key in d:
        del d[key]

# Method 2: `pop()`
for key in remove_keys:
    d.pop(key, None)

# Method 3: comprehension
{key: v for key, v in d.items() if key not in remove_keys}

Here are the results of 1M iterations:

  1. del: 2.03s 2.0 ns/iter (100%)
  2. pop(): 2.38s 2.4 ns/iter (117%)
  3. comprehension: 4.11s 4.1 ns/iter (202%)

So del and pop() are the fastest; comprehensions are about 2x slower. But anyway, we are talking nanoseconds here :) Dicts in Python are ridiculously fast.
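
The exact figures depend on the dictionary size, the share of missing keys, and the Python version; the following is only a rough sketch of how such a comparison could be reproduced with timeit (the dictionary size, key set, and repeat count are made up for illustration):

import timeit

setup = """
d_template = {i: i for i in range(1000)}
remove_keys = set(range(0, 1000, 10))
"""

stmts = {
    "del": (
        "d = dict(d_template)\n"
        "for key in remove_keys:\n"
        "    if key in d:\n"
        "        del d[key]"
    ),
    "pop()": (
        "d = dict(d_template)\n"
        "for key in remove_keys:\n"
        "    d.pop(key, None)"
    ),
    "comprehension": (
        "d = dict(d_template)\n"
        "d = {k: v for k, v in d.items() if k not in remove_keys}"
    ),
}

for name, stmt in stmts.items():
    # each statement rebuilds the dict so every run starts from the same data
    seconds = timeit.timeit(stmt, setup=setup, number=10_000)
    print(f"{name}: {seconds:.3f}s for 10,000 runs")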


Some timing tests on CPython 3 show that a simple for loop is the fastest way, and it's quite readable. Wrapping it in a function doesn't add much overhead either:

timeit results (10k iterations):

  • all(x.pop(v) for v in r) # 0.85
  • all(map(x.pop, r)) # 0.60
  • list(map(x.pop, r)) # 0.70
  • all(map(x.__delitem__, r)) # 0.44
  • del_all(x, r) # 0.40
  • <inline for loop>(x, r) # 0.35
def del_all(mapping, to_remove):
      """Remove list of elements from mapping."""
      for key in to_remove:
          del mapping[key]

For small numbers of iterations, doing it 'inline' was a bit faster because of the overhead of the function call. But del_all is lint-safe, reusable, and faster than all the Python comprehension and mapping constructs.
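
A small usage sketch of del_all as defined above (the dictionary and key list are made up; note that del raises a KeyError if any key is missing, so to_remove must only contain existing keys):

x = {'a': 1, 'b': 2, 'c': 3}
r = ['a', 'b']  # every key here must already exist in x

del_all(x, r)
print(x)  # {'c': 3}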