Reference documentation/manual

The HDFArchive class offers a convenient interface between the python objects and the HDF5 files, similar to a dictionary (or a shelve).

The module contains two classes, HDFArchive and HDFArchiveGroup.

Typically, one constructs an HDFArchive explicitly, while an HDFArchiveGroup is created during operations, e.g.:

from h5 import HDFArchive

h = HDFArchive("myfile.h5", 'r')
g = h['subgroup1']  # g is an HDFArchiveGroup

Apart from the root path and the constructor, the two classes are the same (in fact, HDFArchive is an HDFArchiveGroup). Let us first document HDFArchive.

Warning

HDFArchive and HDFArchiveGroup do NOT handle parallelism. See, however, HDFArchiveInert below.

HDFArchive

class h5.HDFArchive(descriptor=None, open_flag='a', key_as_string_only=True, reconstruct_python_object=True, init={})

HDFArchiveGroup

class HDFArchiveGroup

There is no explicit constructor for the user of the class.

The HDFArchiveGroup supports most of the operations supported by dictionaries. In the following, H is an HDFArchiveGroup.

len(H)

Return the number of items in the HDFArchiveGroup H.

H[key]

Return the item of H with key key, retrieved from the file. Raises a KeyError if key is not in the HDFArchiveGroup.

get_raw(key)

Returns the subgroup key, without any reconstruction, ignoring the HDF5_data_scheme.

H[key] = value

Set H[key] to value.

del H[key]

Remove H[key] from H. Raises a KeyError if key is not in the HDFArchiveGroup.

key in H

Return True if H has a key key, else False.

key not in H

Equivalent to not key in H.

iter(H)

Return an iterator over the keys of the group. This is a shortcut for keys().

items()

Generator returning the pairs (key, value) in the group.

Warning

Note that in all these iterators, the objects will only be retrieved from the file and loaded into memory one by one.

keys()

Generator returning the keys of the group.

update(d)

Add to the archive the content of any mapping d: keys -> values, with hdf-compliant values.

values()

Generator returning the values in the group.
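The one-by-one retrieval noted for these iterators can be pictured with a small stand-in. The sketch below does not use HDFArchiveGroup itself: LazyGroup, its loads counter and the in-memory dictionary playing the role of the file are all hypothetical names introduced for illustration.

```python
class LazyGroup:
    """Hypothetical stand-in for a group whose values live in a file."""

    def __init__(self, stored):
        self._stored = dict(stored)  # plays the role of the file content
        self.loads = 0               # counts simulated reads from the file

    def _load(self, key):
        self.loads += 1
        return self._stored[key]

    def keys(self):
        # Keys are produced without loading any value.
        yield from self._stored

    def items(self):
        # Each value is loaded only when the generator reaches it.
        for key in self._stored:
            yield key, self._load(key)

    def values(self):
        for key in self._stored:
            yield self._load(key)


group = LazyGroup({"a": 1, "b": 2, "c": 3})
it = group.items()
first = next(it)                 # only the first value has been read so far
loads_after_first = group.loads
remaining = list(it)             # the other values are read as they are consumed
```

With this pattern, iterating over a large group never holds more than one value in memory unless the caller keeps references to them.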

create_group(K)

Create a new subgroup named K under the root path of the group. Raises an exception if the subgroup already exists.

is_group(K)

Return True if and only if K is a subgroup.

is_data(K)

Return True if and only if K is a leaf.

read_attr(AttributeName)

Return the attribute AttributeName of the root path of the group. If there is no attribute, return None.

root_path()

Return the root path of the group.

apply_on_leaves(f)

Calls f(name, value) for each named leaf (name, value) of the tree.

f should return:

  • None : no action is taken

  • an empty tuple () : the leaf is removed from the tree

  • an hdf-compliant value : the leaf is replaced by the value
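The three return conventions can be illustrated on an ordinary nested dictionary. This is a minimal sketch of the semantics only, not the library's implementation: the free function apply_on_leaves below, the use of dicts as subgroups, and the choice of passing the bare key as name are assumptions made for the example.

```python
def apply_on_leaves(tree, f):
    """Apply f(name, value) to every leaf of a nested dict, in place."""
    for name in list(tree):          # copy the keys: the dict may change below
        value = tree[name]
        if isinstance(value, dict):  # a dict plays the role of a subgroup
            apply_on_leaves(value, f)
            continue
        result = f(name, value)
        if result is None:           # None: no action is taken
            continue
        if isinstance(result, tuple) and len(result) == 0:
            del tree[name]           # empty tuple: the leaf is removed
        else:
            tree[name] = result      # anything else: the leaf is replaced


def rule(name, value):
    if name == "tmp":
        return ()                    # drop temporary leaves
    if isinstance(value, int):
        return value * 2             # replace integer leaves
    return None                      # leave everything else untouched


tree = {"n": 2, "label": "run", "tmp": 0, "sub": {"m": 5, "tmp": 1}}
apply_on_leaves(tree, rule)
```

After the call, the integer leaves are doubled, the leaves named "tmp" are gone at every depth, and "label" is unchanged.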

HDFArchiveInert

class HDFArchiveInert

HDFArchive and HDFArchiveGroup do NOT handle parallelism. In general, it is good practice to write/read only on the master node. Reading from all nodes on a cluster may lead to communication problems.

To simplify the writing of code, the simple HDFArchiveInert class may be useful. It is basically inert but does not fail.

H[key]

Return H and never raise an exception. E.g. H['a']['b'] never raises an exception.

H[key] = value

Does nothing.

Usage in an MPI code, e.g.:

R = HDFArchive("Results.h5", 'w') if mpi.is_master_node() else HDFArchiveInert()
a = mpi.bcast(R['a'])      # broadcast R['a'] from the master to the other nodes.
R['b'] = X                 # sets R['b'] in the file on the master only, does nothing on the other nodes.
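The behaviour described for HDFArchiveInert amounts to a null-object pattern, which can be sketched without MPI or HDF5. In the sketch below, InertArchive and open_results are hypothetical names, and a plain dictionary stands in for the archive opened on the master; it mimics the documented behaviour and is not the class's actual source.

```python
class InertArchive:
    """Hypothetical stand-in mimicking the behaviour described for HDFArchiveInert."""

    def __getitem__(self, key):
        return self   # any lookup returns the object itself, so chains never fail

    def __setitem__(self, key, value):
        pass          # assignments are silently dropped


def open_results(is_master, store):
    """Return the real mapping on the master and an inert object elsewhere."""
    return store if is_master else InertArchive()


store = {}                             # plays the role of the file on the master
master = open_results(True, store)
worker = open_results(False, store)

for archive in (master, worker):
    archive["b"] = 42                  # the same line runs on every node

nested = worker["a"]["b"]              # chained lookups stay inert
```

The point of the pattern is that the calling code needs no per-node branching after the archive is opened.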

Hdf-compliant objects

By definition, hdf-compliant objects are those which can be stored/retrieved in an HDFArchive. In order to be hdf-compliant, a class must:

  • have a HDF5_data_scheme tag properly registered.

  • implement one of the two protocols described below.

HDF5 data scheme

To each hdf-compliant object, we associate a data scheme which describes how the data is stored in the hdf5 tree, i.e. the tree structure with the name of the nodes and their contents. This data scheme is added in the attribute HDF5_data_scheme at the node corresponding to the object in the file.

For a given class Cls, the HDF5_data_scheme is Cls._hdf5_data_scheme_ if it exists, or the name of the class, Cls.__name__, otherwise. The HDF5_data_scheme of a class must be registered in order for HDFArchive to properly reconstruct the object when rereading. The class is registered using the formats module:

class myclass:
    pass  # ...

from h5.formats import register_class
register_class(myclass)

The function is

register_class(cls[, doc = None])
Parameters:
  • cls – the class to be registered.

  • doc – a doc directory

Register the class for HDFArchive use.

The name of data scheme will be myclass._hdf5_data_scheme_ if it is defined, or the name of the class otherwise.
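The naming rule can be written out in a few lines of plain Python. This is only a sketch of the rule as stated here: data_scheme_name, register and the registry dictionary are hypothetical and do not reflect how h5.formats stores its registrations.

```python
def data_scheme_name(cls):
    """Return cls._hdf5_data_scheme_ if defined, else the class name."""
    return getattr(cls, "_hdf5_data_scheme_", cls.__name__)


registry = {}  # hypothetical: maps data scheme names to classes


def register(cls):
    """Record cls under its data scheme name."""
    registry[data_scheme_name(cls)] = cls


class Plain:
    pass


class Tagged:
    _hdf5_data_scheme_ = "MyScheme"


register(Plain)
register(Tagged)
```

Here Plain is recorded under "Plain" and Tagged under "MyScheme". Note that getattr also finds a tag inherited from a parent class; whether the library treats subclasses the same way is not stated in this section.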

How does a class become hdf-compliant?