[python] How to fix 'Object arrays cannot be loaded when allow_pickle=False' for imdb.load_data() function?

I'm trying to implement the binary classification example using the IMDb dataset in Google Colab. I have implemented this model before, but when I tried again a few days later, the load_data() function raised a ValueError: 'Object arrays cannot be loaded when allow_pickle=False'.

I have already tried solving this, referring to an existing answer for a similar problem: How to fix 'Object arrays cannot be loaded when allow_pickle=False' in the sketch_rnn algorithm. But it turns out that just adding an allow_pickle argument isn't sufficient.

My code:

from keras.datasets import imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

The error:

ValueError                                Traceback (most recent call last)
<ipython-input-1-2ab3902db485> in <module>()
      1 from keras.datasets import imdb
----> 2 (train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

2 frames
/usr/local/lib/python3.6/dist-packages/keras/datasets/imdb.py in load_data(path, num_words, skip_top, maxlen, seed, start_char, oov_char, index_from, **kwargs)
     57                     file_hash='599dadb1135973df5b59232a0e9a887c')
     58     with np.load(path) as f:
---> 59         x_train, labels_train = f['x_train'], f['y_train']
     60         x_test, labels_test = f['x_test'], f['y_test']
     61 

/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py in __getitem__(self, key)
    260                 return format.read_array(bytes,
    261                                          allow_pickle=self.allow_pickle,
--> 262                                          pickle_kwargs=self.pickle_kwargs)
    263             else:
    264                 return self.zip.read(key)

/usr/local/lib/python3.6/dist-packages/numpy/lib/format.py in read_array(fp, allow_pickle, pickle_kwargs)
    690         # The array contained Python objects. We need to unpickle the data.
    691         if not allow_pickle:
--> 692             raise ValueError("Object arrays cannot be loaded when "
    693                              "allow_pickle=False")
    694         if pickle_kwargs is None:

ValueError: Object arrays cannot be loaded when allow_pickle=False

This question is related to python numpy keras

The answer is


Here's a trick to force imdb.load_data to allow pickle. In your notebook, replace this line:

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

with this:

import numpy as np
# save np.load
np_load_old = np.load

# modify the default parameters of np.load
np.load = lambda *a,**k: np_load_old(*a, allow_pickle=True, **k)

# call load_data with allow_pickle implicitly set to true
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

# restore np.load for future normal usage
np.load = np_load_old
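If an exception is raised between patching and restoring, np.load stays patched. One way around that is a small context manager; the sketch below is only an illustration, and the names allow_pickle_load and patched_load are made up here, not part of numpy or Keras. It uses setdefault so that a caller who passes allow_pickle explicitly does not trigger a duplicate-keyword error. The demo saves a tiny object array to a temporary file instead of downloading the IMDb data.

```python
import contextlib
import os
import tempfile

import numpy as np


@contextlib.contextmanager
def allow_pickle_load():
    """Temporarily make np.load default to allow_pickle=True (hypothetical helper)."""
    original_load = np.load

    def patched_load(*args, **kwargs):
        # Only fill in the flag if the caller did not pass it explicitly.
        kwargs.setdefault("allow_pickle", True)
        return original_load(*args, **kwargs)

    np.load = patched_load
    try:
        yield
    finally:
        # Runs even if the body raises, so np.load is always restored.
        np.load = original_load


# Demo with a small object array standing in for the IMDb file.
ragged = np.empty(2, dtype=object)
ragged[0] = [1, 2, 3]
ragged[1] = [4, 5]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.npy")
np.save(demo_path, ragged, allow_pickle=True)

load_before = np.load
with allow_pickle_load():
    loaded = np.load(demo_path)  # imdb.load_data(num_words=10000) would go here
restored = np.load is load_before
```

With Keras installed, the call to imdb.load_data would simply go inside the with block.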

This issue is still open on the Keras GitHub repository. I hope it gets solved as soon as possible. Until then, try downgrading your numpy version to 1.16.1. It seems to solve the problem.

!pip install numpy==1.16.1
import numpy as np

This version of numpy has the default value of allow_pickle as True.
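If you are not sure which default your installed numpy uses, a quick check along these lines may help. It is a sketch that assumes np.load is a plain Python function whose signature inspect can read; the variable names are only illustrative.

```python
import inspect

import numpy as np

# Version string of the numpy that this interpreter actually imports.
numpy_version = np.__version__

# Default value of the allow_pickle parameter in this numpy's np.load.
default_allow_pickle = inspect.signature(np.load).parameters["allow_pickle"].default

print("numpy", numpy_version, "-> np.load default allow_pickle =", default_allow_pickle)
```

If the printed default is False, loading the IMDb file through an unpatched np.load(path) will raise the ValueError from the question.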


Following this issue on GitHub, the official solution is to edit the imdb.py file. This fix worked well for me without the need to downgrade numpy. Find the imdb.py file at tensorflow/python/keras/datasets/imdb.py (full path for me was: C:\Anaconda\Lib\site-packages\tensorflow\python\keras\datasets\imdb.py - other installs will be different) and change line 85 as per the diff:

-  with np.load(path) as f:
+  with np.load(path, allow_pickle=True) as f:

NumPy changed the default for security reasons: it prevents the Python equivalent of an SQL injection through a pickled file. The change above will ONLY affect the imdb data, so you retain the security elsewhere (by not downgrading numpy).
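To see why unpickling untrusted data is risky, here is a harmless sketch using only the standard pickle module. The class name CallsOnLoad is made up for the demo. Its __reduce__ method tells pickle which callable to run at load time; here it is the built-in sorted, but a malicious file could name any importable function.

```python
import pickle


class CallsOnLoad:
    """Hypothetical demo class: unpickling it calls a function of our choosing."""

    def __reduce__(self):
        # pickle will call sorted([3, 1, 2]) when the payload is loaded.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(CallsOnLoad())

# Loading does not rebuild a CallsOnLoad instance; it runs the callable instead.
result = pickle.loads(payload)
print(result)
```

The loaded value is whatever the chosen callable returns, which is why np.load refuses pickled object arrays unless you opt in for a file you trust.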


I just used allow_pickle = True as an argument to np.load() and it worked for me.

np.load(path, allow_pickle=True)


In my case, it worked with:

np.load(path, allow_pickle=True)

I think the answer from cheez (https://stackoverflow.com/users/122933/cheez) is the easiest and most effective one. I'd elaborate on it a little so that the numpy function is not modified for the whole session.

My suggestion is below. I'm using it to download the Reuters dataset from Keras, which shows the same kind of error:

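import numpy as np

# keep a reference to the original np.load
old = np.load
# temporarily force allow_pickle=True
np.load = lambda *a,**k: old(*a,**k,allow_pickle=True)

from keras.datasets import reuters
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)

# restore the original np.load
np.load = old
del old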

You can try changing the flag's value

np.load(training_image_names_array,allow_pickle=True)

None of the solutions listed above worked for me. I run Anaconda with Python 3.7.3. What worked for me was:

  • run "conda install numpy==1.16.1" from Anaconda PowerShell

  • close and reopen the notebook


I landed here, tried the approaches above, and could not get them to work.

I was actually working with existing code where

pickle.load(path)

was used, so I replaced it with

np.load(path, allow_pickle=True)

In a Jupyter notebook, using

np_load_old = np.load

# modify the default parameters of np.load
np.load = lambda *a,**k: np_load_old(*a, allow_pickle=True, **k)

worked fine, but a problem appears when you use this method in Spyder: you have to restart the kernel every time, or you will get an error like:

TypeError: <lambda>() got multiple values for keyword argument 'allow_pickle'
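A likely cause is that re-running the cell wraps an already patched np.load a second time, so allow_pickle ends up being passed twice. The sketch below reproduces this with a stand-in function (fake_load, not a real numpy call) so that it runs without any data file. The helper names naive_patch and safe_patch are hypothetical.

```python
def fake_load(path, allow_pickle=False):
    """Stand-in for np.load: just reports the flag it received."""
    return allow_pickle


def naive_patch(load_fn):
    # Same shape as the lambda above: always passes allow_pickle=True.
    return lambda *a, **k: load_fn(*a, allow_pickle=True, **k)


def safe_patch(load_fn):
    def wrapper(*a, **k):
        # setdefault does nothing if the flag is already present,
        # so wrapping twice cannot pass the keyword twice.
        k.setdefault("allow_pickle", True)
        return load_fn(*a, **k)

    return wrapper


# Patching twice with the naive wrapper reproduces the TypeError.
patched_twice = naive_patch(naive_patch(fake_load))
try:
    patched_twice("data.npy")
    double_patch_error = None
except TypeError as exc:
    double_patch_error = str(exc)

# The setdefault variant survives being applied twice.
safe_twice = safe_patch(safe_patch(fake_load))
safe_result = safe_twice("data.npy")
```

Restoring np.load after use, or patching with setdefault as in safe_patch, avoids the need to restart the kernel.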

I solved this issue using the solution here:


Find the path to imdb.py, then add the allow_pickle flag to the np.load(path, ...) call:

    def load_data(.......):
    .......................................
    .......................................
    - with np.load(path) as f:
    + with np.load(path,allow_pickle=True) as f:

This worked for me:

import numpy as np
from keras.datasets import reuters

np_load_old = np.load
np.load = lambda *a: np_load_old(*a, allow_pickle=True)
(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=None, test_split=0.2)
np.load = np_load_old

What I have found is that TensorFlow 2.0 (I am using 2.0.0-alpha0) is not compatible with the latest version of Numpy i.e. v1.17.0 (and possibly v1.16.5+). As soon as TF2 is imported, it throws a huge list of FutureWarning, that looks something like this:

FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/anaconda3/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/anaconda3/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/anaconda3/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.

This also resulted in the allow_pickle error when I tried to load the imdb dataset from keras.

I tried the following solution, which worked just fine, but I had to apply it in every single project where I was importing TF2 or tf.keras.

np_load_old = np.load
np.load = lambda *a,**k: np_load_old(*a, allow_pickle=True, **k)

The easiest solution I found was to either install numpy 1.16.1 globally, or use compatible versions of tensorflow and numpy in a virtual environment.

My goal with this answer is to point out that it's not just a problem with imdb.load_data, but a larger problem caused by the incompatibility of TF2 and Numpy versions, which may result in many other hidden bugs or issues.


Use this

 from tensorflow.keras.datasets import imdb

instead of this

 from keras.datasets import imdb

This error occurs when you have an older version of torch, such as 1.6.0 with torchvision==0.7.0. You can check your torch version with this command:

import torch
print(torch.__version__)

This error is already resolved in newer versions of torch.

You can get rid of this error by making the following change to the np.load() call:

np.load(somepath, allow_pickle=True)

Setting allow_pickle=True will solve it.


The error can also occur if you try to save a Python list of numpy arrays with np.save and load it with np.load. I mention it only so that people arriving from a search can rule this out as their issue. Using allow_pickle=True also fixed the issue, if a list is indeed what you meant to save and load.
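If a list of arrays really is what you want to store, one alternative that avoids pickle altogether is np.savez, which writes each array as its own entry in a .npz archive. This is a different technique from the np.save approach described above. The file name and array contents below are made up for the demo.

```python
import os
import tempfile

import numpy as np

# Arrays of different shapes, which np.save could only store as a pickled object array.
arrays = [np.arange(3), np.arange(5), np.zeros((2, 2))]

path = os.path.join(tempfile.mkdtemp(), "arrays.npz")

# Positional arrays are stored under the names arr_0, arr_1, ...
np.savez(path, *arrays)

# No allow_pickle needed: every entry is a plain numeric array.
with np.load(path) as archive:
    restored = [archive[f"arr_{i}"] for i in range(len(archive.files))]
```

The restored list has the same order and shapes as the original, and the load works with the default allow_pickle setting.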


Yes, installing a previous version of numpy solved the problem.

For those who use the PyCharm IDE:

In my IDE (PyCharm), under File->Settings->Project Interpreter, I found my numpy to be 1.16.3, so I reverted to 1.16.1. Click +, type numpy in the search box, tick "Specify version": 1.16.1, and choose Install Package.


Tensorflow has a fix in tf-nightly version.

!pip install tf-nightly

The current version is '2.0.0-dev20190511'.


Instead of

from keras.datasets import imdb

use

from tensorflow.keras.datasets import imdb

top_words = 10000
((x_train, y_train), (x_test, y_test)) = imdb.load_data(num_words=top_words, seed=21)

I don't usually post to these things but this was super annoying. The confusion comes from the fact that some of the Keras imdb.py files have already been updated from:

with np.load(path) as f:

to the version with allow_pickle=True. Make sure to check the imdb.py file to see whether this change has already been made. If it has, the following works fine:

from keras.datasets import imdb
(train_text, train_labels), (test_text, test_labels) = imdb.load_data(num_words=10000)
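Rather than hunting for imdb.py by hand, you can ask Python where a function lives and whether its source mentions the flag. The helper names below (source_file, source_mentions) are made up for this sketch. The runnable demo uses json.dumps from the standard library; the Keras lines are left as comments, and they assume imdb.load_data is an ordinary Python function whose source inspect can read, which may not hold for every Keras build.

```python
import inspect
import json


def source_file(func):
    """Return the path of the .py file that defines func."""
    return inspect.getsourcefile(func)


def source_mentions(func, needle):
    """Return True if the source code of func contains the text needle."""
    return needle in inspect.getsource(func)


# Demo on a standard-library function so the sketch runs without Keras.
demo_file = source_file(json.dumps)
demo_hit = source_mentions(json.dumps, "allow_nan")
demo_miss = source_mentions(json.dumps, "allow_pickle")

# With Keras installed, the same check would look like this:
# from keras.datasets import imdb
# print(source_file(imdb.load_data))
# print(source_mentions(imdb.load_data, "allow_pickle=True"))
```

If the second Keras check prints True, your imdb.py already passes allow_pickle=True and needs no edit.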

The easiest way is to edit imdb.py and pass allow_pickle=True to np.load at the line where imdb.py throws the error.


I was facing the same issue. Here is the relevant line from the error:

File "/usr/lib/python3/dist-packages/numpy/lib/npyio.py", line 260, in __getitem__

So I solved the issue by updating the npyio.py file. Line 196 of npyio.py assigns a value to allow_pickle, so I changed that line to:

self.allow_pickle = True

The answer from @cheez sometimes doesn't work and ends up calling the function recursively again and again. To solve this problem, you should make a separate copy of the function. You can do this with functools.partial, so the final code is:

import numpy as np
from functools import partial

# save np.load
np_load_old = partial(np.load)

# modify the default parameters of np.load
np.load = lambda *a,**k: np_load_old(*a, allow_pickle=True, **k)

# call load_data with allow_pickle implicitly set to true
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

# restore np.load for future normal usage
np.load = np_load_old
