In another question, other users offered some help if I could supply the array I was having trouble with. However, I even fail at a basic I/O task, such as writing an array to a file.
Can anyone explain what kind of loop I would need to write a 4x11x14 numpy array to file?
This array consists of four 11 x 14 arrays, so I should format it with a newline between them to make the file easier for others to read.
Edit: So I've tried the numpy.savetxt function. Strangely, it gives the following error:
TypeError: float argument required, not numpy.ndarray
I assume that this is because the function doesn't work with multidimensional arrays? Are there any solutions, as I would like everything within one file?
ndarray.tofile() should also work, e.g. if your array is called a:
a.tofile('yourfile.txt', sep=" ", format="%s")
Not sure how to get newline formatting though.
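For reading such a file back, here is a minimal sketch, assuming the shape is known in advance. The test array and the temporary path are hypothetical stand-ins; np.fromfile with a non-empty sep reads the text form that tofile() wrote.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the 4x11x14 array from the question.
a = np.arange(4 * 11 * 14, dtype=float).reshape(4, 11, 14)

# Temporary location chosen for illustration only.
path = os.path.join(tempfile.mkdtemp(), "yourfile.txt")

# A non-empty sep makes tofile() write formatted text instead of raw bytes.
a.tofile(path, sep=" ", format="%s")

# The file is one flat run of numbers, so the shape must be supplied again.
b = np.fromfile(path, dtype=float, sep=" ").reshape(a.shape)

print(np.array_equal(a, b))
```

Note that the file itself carries no shape or dtype information, so both have to be tracked separately.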
Edit (credit Kevin J. Black's comment here):
Since version 1.5.0, np.savetxt() takes an optional parameter newline='\n' to allow multi-line output. https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.savetxt.html
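Since np.savetxt only accepts 1-D and 2-D arrays, one way to keep everything in a single readable file is to write the 3-D array one 2-D slice at a time. The following is a sketch under assumptions: the test data, the temporary path, and the comment-line separators are illustrative choices, not part of the answer above.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the 4x11x14 array from the question.
data = np.arange(4 * 11 * 14, dtype=float).reshape(4, 11, 14)
path = os.path.join(tempfile.mkdtemp(), "slices.txt")

with open(path, "w") as outfile:
    # Record the shape in a comment line; loadtxt skips lines starting with '#'.
    outfile.write("# Array shape: {0}\n".format(data.shape))
    for data_slice in data:
        # Each 11x14 slice is 2-D, which savetxt accepts.
        np.savetxt(outfile, data_slice, fmt="%.4f")
        # Mark the boundary between slices for human readers.
        outfile.write("# New slice\n")

# loadtxt returns a (44, 14) array; the 3-D shape is restored by hand.
restored = np.loadtxt(path).reshape(4, 11, 14)
```

The reader still needs to know the original shape, here taken from the code rather than parsed from the header comment.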
If you don't need a human-readable output, another option you could try is to save the array as a MATLAB .mat file, which is a structured array. I despise MATLAB, but the fact that I can both read and write a .mat in very few lines is convenient.
Unlike Joe Kington's answer, the benefit of this is that you don't need to know the original shape of the data in the .mat file, i.e. no need to reshape upon reading in. And, unlike using pickle, a .mat file can be read by MATLAB, and probably some other programs/languages as well.
Here is an example:
import numpy as np
import scipy.io
# Some test data
x = np.arange(200).reshape((4,5,10))
# Specify the filename of the .mat file
matfile = 'test_mat.mat'
# Write the array to the mat file. For this to work, the array must be the value
# corresponding to a key name of your choice in a dictionary
scipy.io.savemat(matfile, mdict={'out': x}, oned_as='row')
# For the above line, I specified the kwarg oned_as since python (2.7 with
# numpy 1.6.1) throws a FutureWarning. Here, this isn't really necessary
# since oned_as is a kwarg for dealing with 1-D arrays.
# Now load in the data from the .mat that was just saved
matdata = scipy.io.loadmat(matfile)
# And just to check if the data is the same:
assert np.all(x == matdata['out'])
If you forget the key under which the array is stored in the .mat file, you can always do:
print(matdata.keys())
And of course you can store many arrays using many more keys.
So yes – it won't be readable with your eyes, but only takes 2 lines to write and read the data, which I think is a fair trade-off.
Take a look at the docs for scipy.io.savemat and scipy.io.loadmat and also this tutorial page: scipy.io File IO Tutorial
Write to a file with Python's print():
import numpy as np
import sys
stdout_sys = sys.stdout
np.set_printoptions(precision=8) # Sets number of digits of precision.
np.set_printoptions(suppress=True) # Suppress scientific notations.
np.set_printoptions(threshold=sys.maxsize) # Prints the whole arrays.
with open('myfile.txt', 'w') as f:
    sys.stdout = f
    print(nparr)
    sys.stdout = stdout_sys
Use set_printoptions() to customize how the objects are displayed.
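A variant of the same idea that avoids reassigning sys.stdout is sketched below. It assumes NumPy 1.15 or later for the np.printoptions context manager; the array nparr and the temporary path are hypothetical stand-ins.

```python
import os
import sys
import tempfile

import numpy as np

# Hypothetical stand-in for the array being printed.
nparr = np.arange(4 * 11 * 14, dtype=float).reshape(4, 11, 14)
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

# np.printoptions restores the previous print settings when the block exits.
with np.printoptions(precision=8, suppress=True, threshold=sys.maxsize):
    with open(path, "w") as f:
        # print() can write straight to the file via its file argument.
        print(nparr, file=f)

# Read the text back to inspect what was written.
with open(path) as f:
    text = f.read()
```

Because the print settings are scoped to the with block, later output in the same session keeps its usual formatting.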
Pickle is best for these cases. Suppose you have an ndarray named x_train. You can dump it into a file and load it back using the following code:
import pickle
# Dump to file
with open("myfile.pkl", "wb") as f:
    pickle.dump(x_train, f)
# Load back from file
with open("myfile.pkl", "rb") as f:
    x_temp = pickle.load(f)
I am not certain this meets your requirements, since I think you want the file to be readable by people, but if that's not a primary concern, just pickle it.
To save it:
import pickle
my_data = {'a': [1, 2.0, 3, 4+6j],
           'b': ('string', u'Unicode string'),
           'c': None}
output = open('data.pkl', 'wb')
pickle.dump(my_data, output)
output.close()
To read it back:
import pprint, pickle
pkl_file = open('data.pkl', 'rb')
data1 = pickle.load(pkl_file)
pprint.pprint(data1)
pkl_file.close()
There are special libraries for exactly this (plus wrappers for Python).
netCDF4 Python interface: http://www.unidata.ucar.edu/software/netcdf/software.html#Python
hope this helps
You can simply traverse the array in three nested loops and write their values to your file. For reading, you simply use the same exact loop construction. You will get the values in exactly the right order to fill your arrays correctly again.
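A minimal sketch of that nested-loop approach, assuming a 4x11x14 float array and a whitespace-separated layout with a blank line between blocks (the array, path, and layout are illustrative assumptions):

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the 4x11x14 array from the question.
arr = np.arange(4 * 11 * 14, dtype=float).reshape(4, 11, 14)
path = os.path.join(tempfile.mkdtemp(), "loops.txt")

with open(path, "w") as f:
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            for k in range(arr.shape[2]):
                # repr() of a Python float round-trips exactly through float().
                f.write(repr(float(arr[i, j, k])) + " ")
            f.write("\n")  # end of one row
        f.write("\n")      # blank line between the 11x14 blocks

# Reading uses the same loop order, consuming values one token at a time.
restored = np.empty((4, 11, 14))
with open(path) as f:
    tokens = iter(f.read().split())
for i in range(4):
    for j in range(11):
        for k in range(14):
            restored[i, j, k] = float(next(tokens))
```

This is slow for large arrays compared with the vectorized NumPy routines, but it makes the file layout fully explicit.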
I have a way to do it using a simple filename.write() operation. It works fine for me, but I'm only dealing with arrays of about 1500 data elements.
I basically just have for loops to iterate through the file and write it to the output destination line-by-line in a csv style output.
import numpy as np
trial = np.genfromtxt("/extension/file.txt", dtype=str, delimiter=",")
# Assumes the file parsed as a 2-D array of strings.
num_of_columns = trial.shape[1]
with open("/extension/file.txt", "w") as f:
    for x in range(trial.shape[0]):
        for y in range(num_of_columns):
            if y < num_of_columns - 1:
                # Every column except the last is followed by a comma.
                f.write(trial[x][y] + ",")
            elif y == num_of_columns - 1:
                f.write(trial[x][y])
        f.write("\n")
The if and elif statements are used to add commas between the data elements. For whatever reason, these get stripped out when reading the file in as an ndarray. My goal was to output the file as a csv, so this method helps to handle that.
Hope this helps!
Use the json module for multidimensional arrays, e.g.
import json
with open(filename, 'w') as f:
    json.dump(myndarray.tolist(), f)
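Reading the array back could look like the sketch below. The test array and the temporary file name are hypothetical; the nested lists stored in the JSON preserve the shape, so no reshape is needed.

```python
import json
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the 4x11x14 array from the question.
myndarray = np.arange(4 * 11 * 14, dtype=float).reshape(4, 11, 14)
filename = os.path.join(tempfile.mkdtemp(), "array.json")

# Nested lists keep the 3-D structure in the JSON text.
with open(filename, "w") as f:
    json.dump(myndarray.tolist(), f)

# np.array rebuilds an ndarray from the nested lists.
with open(filename) as f:
    restored = np.array(json.load(f))

print(restored.shape)
```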
Source: Stackoverflow.com