[python] How to read HDF5 files in Python

I am trying to read data from an HDF5 file in Python. I can open the file using h5py, but I cannot figure out how to access the data within it.

My code

import h5py    
import numpy as np    
f1 = h5py.File(file_name,'r+')    

This works and the file is read. But how can I access data inside the file object f1?


The answer is


Reading the file

import h5py

f = h5py.File(file_name, 'r')  # 'r' opens the file read-only

Studying the structure of the file by printing what HDF5 groups are present

for key in f.keys():
    print(key)  # names of the top-level groups (or datasets) in the HDF5 file

Extracting the data

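# Get an HDF5 group (here: the last key printed above)
group = f[key]

# Check which keys are inside that group
for key in group.keys():
    print(key)

# Read the dataset into a NumPy array.
# (Dataset.value was removed in h5py 3.0; index with [()] instead.)
data = group[some_key_inside_the_group][()]
# Do whatever you want with data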

#After you are done
f.close()
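
The steps above can be put together into one runnable sketch. The file name, the group name "measurements" and the dataset name "temperature" are made up for illustration, and the file is written first only so the example works on its own. It uses a with-block instead of an explicit f.close(), and [()] indexing to read the dataset, which works in current h5py releases.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and names, created here only so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(path, "w") as f:
    grp = f.create_group("measurements")
    grp.create_dataset("temperature", data=np.array([20.5, 21.0, 19.8]))

# Open read-only; the with-block closes the file automatically.
with h5py.File(path, "r") as f:
    top_level = list(f.keys())        # names of the top-level groups/datasets
    group = f["measurements"]
    inside = list(group.keys())       # names inside that group
    data = group["temperature"][()]   # read the whole dataset into a NumPy array

print(top_level, inside, data)
```

Because the data is copied into a NumPy array before the file is closed, data stays usable after the with-block ends.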

Data in an HDF5 file is stored in datasets. The h5py quickstart guide shows that datasets are created through the file object with f.create_dataset; once a dataset exists, you can read it back by indexing the file object with the dataset's name. This is explained in the docs.
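
As a rough sketch of that write-then-read round trip, assuming a throwaway file and a dataset called "mydataset" (both names are invented for this example):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "created.h5")  # hypothetical location

# Write: create a dataset through the file object.
with h5py.File(path, "w") as f:
    f.create_dataset("mydataset", data=np.arange(10))

# Read: index the file object by name, then slice it like a NumPy array.
with h5py.File(path, "r") as f:
    dset = f["mydataset"]
    shape = dset.shape       # metadata is available without loading the data
    first_three = dset[:3]   # only this slice is read from disk
    everything = dset[()]    # the full array
```

Slicing the dataset object rather than loading everything can matter for files that are larger than memory.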


If the HDF5 file contains named datasets, you can use the following code to read them and convert them into NumPy arrays:

import h5py
import numpy as np

file = h5py.File('filename.h5', 'r')

xdata = file.get('xdata')  # returns None if 'xdata' does not exist
xdata = np.array(xdata)

If your file is in a different directory, you can add the path in front of 'filename.h5'.
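
A small sketch of both points, building the full path with os.path.join and checking the result of get before converting. The directory, the file and the dataset name "ydata" are assumptions for illustration; get returns None for a name that is not in the file.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical directory and file, written here so the example runs on its own.
data_dir = tempfile.mkdtemp()
path = os.path.join(data_dir, "filename.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("xdata", data=np.array([1.0, 2.0, 3.0]))

with h5py.File(path, "r") as f:
    dset = f.get("xdata")
    missing = f.get("ydata")   # no such dataset, so this is None
    # Convert to a NumPy array only if the dataset was found.
    xdata = dset[()] if dset is not None else None
```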


Use the code below to read the data and convert it into NumPy arrays:

import h5py
import numpy as np

f1 = h5py.File('data_1.h5', 'r')
print(list(f1.keys()))
X1 = f1['x']
y1 = f1['y']
# Dataset.value was removed in h5py 3.0; [()] reads the full dataset
df1 = X1[()]
dfy1 = y1[()]
print(df1.shape)
print(dfy1.shape)

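You can also use pandas. Note that pd.read_hdf is meant for files in the PyTables format that pandas itself produces (for example with DataFrame.to_hdf), not for arbitrary HDF5 files.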

import pandas as pd
pd.read_hdf(filename,key)

If the .h5 file is a saved Keras model, you can load it with load_model:

from keras.models import load_model

h = load_model('FILE_NAME.h5')

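To read the raw contents of an .hdf5 file as an array, you can do something like the following. Note that np.fromfile does not parse the HDF5 structure, so the result includes header and metadata bytes rather than a clean dataset; use h5py to get the stored arrays themselves.

import numpy as np
# Interprets the file's raw bytes as floats; this is not HDF5-aware
myarray = np.fromfile('file.hdf5', dtype=float)
print(myarray)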

Using bits of answers from this question and the latest doc, I was able to extract my numerical arrays using

import h5py
with h5py.File(filename, 'r') as h5f:
    h5x = h5f[list(h5f.keys())[0]]['x'][()]

Where 'x' is simply the X coordinate in my case.
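
If you do not know where a dataset such as 'x' sits in the hierarchy, one option is to walk the file with visititems and collect every dataset with its shape and dtype. This is only a sketch; the helper name list_datasets and the group name "run1" are invented for the example.

```python
import os
import tempfile

import h5py
import numpy as np


def list_datasets(path):
    """Return {dataset path: (shape, dtype)} for every dataset in the file."""
    found = {}

    def visitor(name, obj):
        # visititems calls this for every group and dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, obj.dtype)

    with h5py.File(path, "r") as f:
        f.visititems(visitor)
    return found


# Hypothetical file used only to demonstrate the helper.
path = os.path.join(tempfile.mkdtemp(), "nested.h5")
with h5py.File(path, "w") as f:
    run = f.create_group("run1")
    run.create_dataset("x", data=np.arange(4.0))
    run.create_dataset("y", data=np.zeros((2, 3)))

datasets = list_datasets(path)
for name, (shape, dtype) in datasets.items():
    print(name, shape, dtype)
```

The names reported are paths relative to the file root, so each one can be used directly as an index into the file object.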


Here's a simple function I wrote that reads a .hdf5 file generated by the save_weights function in Keras and returns a dict mapping layer names to weights:

import h5py

def read_hdf5(path):

    weights = {}

    keys = []
    with h5py.File(path, 'r') as f: # open file
        f.visit(keys.append) # append all keys to list
        for key in keys:
            if ':' in key: # contains data if ':' in key
                print(f[key].name)
                weights[f[key].name] = f[key][()] # .value was removed in h5py 3.0
    return weights

https://gist.github.com/Attila94/fb917e03b04035f3737cc8860d9e9f9b.

I haven't tested it thoroughly, but it does the job for me.