HDF5 for Python Documentation
Groups are the container mechanism by which HDF5 files are organized. From a Python perspective, they operate somewhat like dictionaries: the "keys" are the names of group members, and the "values" are the members themselves (Group and Dataset objects).
Group objects also contain most of the machinery which makes HDF5 useful. The File object does double duty as the HDF5 root group, and serves as your entry point into the file:
>>> import h5py
>>> f = h5py.File('foo.hdf5', 'w')
>>> f.name
'/'
>>> list(f.keys())
[]
New groups are easy to create:
>>> grp = f.create_group("bar")
>>> grp.name
'/bar'
>>> subgrp = grp.create_group("baz")
>>> subgrp.name
'/bar/baz'
Datasets are also created by a Group method:
>>> dset = subgrp.create_dataset("MyDS", (100,100), dtype='i')
>>> dset.name
'/bar/baz/MyDS'
Groups implement a subset of the Python dictionary convention. They provide methods such as keys() and values(), and they support iteration. Most importantly, they support the indexing syntax and raise the standard exceptions:
>>> myds = subgrp["MyDS"]
>>> missing = subgrp["missing"]
KeyError: "Name doesn't exist (Symbol table: Object not found)"
Objects can be deleted from the file using the standard syntax:
>>> del subgrp["MyDS"]
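The dictionary-style behaviour described above can be sketched in one short script. This is a minimal illustration, not part of the original reference. The file name is hypothetical, and the in-memory "core" driver is an assumption made so that nothing is written to disk.

```python
import h5py

# Assumption: an in-memory file (core driver, no backing store) keeps the
# example self-contained; "demo_mapping.h5" is a hypothetical name.
with h5py.File("demo_mapping.h5", "w", driver="core", backing_store=False) as f:
    grp = f.create_group("bar")
    grp.create_dataset("MyDS", (100, 100), dtype="i")
    grp.create_group("baz")

    # Iterating over a group yields member names, as with a dict.
    names = sorted(grp)

    # A membership test avoids the KeyError raised by indexing.
    has_missing = "missing" in grp

    try:
        grp["missing"]
        raised = False
    except KeyError:
        raised = True

    # del removes the named member from the group.
    del grp["MyDS"]
    remaining = list(grp.keys())
```

Testing with `in` before indexing is usually cleaner than catching KeyError when a missing member is an expected case.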
Group objects implement the following subset of the Python "mapping" interface:
Add the given object to the group.
The action taken depends on the type of object assigned:
If a group member of the same name already exists, the assignment will fail.
Create a new HDF5 group.
Fails with ValueError if the group already exists.
Open the specified HDF5 group, creating it if it doesn't exist.
Fails with TypeError if an incompatible object (dataset or named type) already exists.
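The difference between creating and requiring a group can be sketched as follows. The method names create_group and require_group are taken from current h5py, since the extracted text above does not show them. The file and member names are hypothetical.

```python
import h5py

# Assumption: in-memory file so the sketch leaves nothing on disk.
with h5py.File("demo_groups.h5", "w", driver="core", backing_store=False) as f:
    first = f.create_group("bar")

    # Creating a group whose name is already taken is rejected.
    try:
        f.create_group("bar")
        create_failed = False
    except ValueError:
        create_failed = True

    # require_group opens the existing group instead of failing...
    again = f.require_group("bar")
    reopened_name = again.name

    # ...and creates the group when it does not exist yet.
    new_name = f.require_group("qux").name

    # A dataset occupying the name counts as an incompatible object.
    f.create_dataset("data", (4,), dtype="i")
    try:
        f.require_group("data")
        require_failed = False
    except TypeError:
        require_failed = True
```

require_group is convenient in code that may run against either a fresh or a previously populated file.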
Create a new dataset. There are two logical ways to specify the dataset:
- Give the shape, and optionally the dtype. If the dtype is not given, single-precision floating point ('=f4') will be assumed.
- Give a NumPy array (or anything that can be converted to a NumPy array) via the "data" argument. The shape and dtype of this array will be used, and the dataset will be initialized to its contents.
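The two ways of specifying a dataset can be sketched as below. The dataset names are hypothetical, and the in-memory file is an assumption made only to keep the example self-contained.

```python
import h5py
import numpy as np

# Assumption: in-memory file; "demo_create.h5" is a hypothetical name.
with h5py.File("demo_create.h5", "w", driver="core", backing_store=False) as f:
    # 1. Shape only: the dtype falls back to single-precision float.
    by_shape = f.create_dataset("by_shape", (10, 20))
    default_dtype = by_shape.dtype

    # 1b. Shape plus an explicit dtype.
    typed = f.create_dataset("typed", (10, 20), dtype="i8")
    typed_dtype = typed.dtype

    # 2. From existing data: shape and dtype are taken from the array,
    #    and the dataset is initialized with its contents.
    arr = np.arange(6, dtype="f8").reshape(2, 3)
    by_data = f.create_dataset("by_data", data=arr)
    data_shape = by_data.shape
    data_dtype = by_data.dtype
    round_trip = by_data[...]
```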
Additional keyword parameters control the details of how the dataset is stored.
Keywords (see also Dataset Special features):
Setting for compression filter; legal values for each filter type are:
Filter | Legal setting
"gzip" | Integer 0-9
"lzf" | (none allowed)
"szip" | 2-tuple ('ec'|'nn', even integer 0-32)
See the filters module docstring for a more detailed description of these filters.
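As an illustration of the filter settings, the sketch below creates a gzip-compressed dataset. The keyword names compression and compression_opts are taken from current h5py, because the extracted text above does not show them. The file and dataset names are hypothetical.

```python
import h5py
import numpy as np

# Assumption: in-memory file so the sketch leaves nothing on disk.
with h5py.File("demo_filters.h5", "w", driver="core", backing_store=False) as f:
    data = np.zeros((100, 100), dtype="f4")

    # gzip accepts an integer level from 0 to 9 as its setting.
    dset = f.create_dataset(
        "zeros", data=data, compression="gzip", compression_opts=9
    )

    filter_name = dset.compression
    filter_level = dset.compression_opts
    # Compression requires chunked storage, so a chunk shape is chosen
    # automatically when none is given.
    chunk_shape = dset.chunks
    restored = dset[...]
```

Compression is transparent on read: slicing the dataset returns the decompressed values.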
Open a new dataset, creating one if it doesn't exist.
This method operates exactly like create_dataset(), except that if a dataset with compatible shape and dtype already exists, it is opened instead. The additional keyword arguments are only honored when actually creating a dataset; they are ignored for the comparison.
If an incompatible object (Group or Datatype) already exists with the given name, fails with ValueError.
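The open-or-create behaviour can be sketched as follows, using the require_dataset name referenced above. The dataset name and values are hypothetical. Only the compatible case is shown, since the exception raised for a mismatch may differ between h5py versions.

```python
import h5py
import numpy as np

# Assumption: in-memory file so the sketch leaves nothing on disk.
with h5py.File("demo_require.h5", "w", driver="core", backing_store=False) as f:
    # First call: no dataset named "counts" exists, so one is created.
    first = f.require_dataset("counts", shape=(5,), dtype="i4")
    first[...] = np.arange(5)

    # Second call: shape and dtype are compatible, so the existing
    # dataset is opened and its contents are left untouched.
    second = f.require_dataset("counts", shape=(5,), dtype="i4")
    preserved = second[...].tolist()
    member_count = len(f)
```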
Only available with HDF5 1.8
Recursively copy an object from one location to another, or between files.
Copies the given object, and (if it is a group) all objects below it in the hierarchy. The destination need not be in the same file.
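A minimal sketch of both cases, copying within one file and copying into a second file, is given below. It assumes the copy(source, dest) calling convention of current h5py. All file and member names are hypothetical.

```python
import h5py
import numpy as np

# Assumption: two in-memory files stand in for a source and a destination.
with h5py.File("demo_src.h5", "w", driver="core", backing_store=False) as src:
    with h5py.File("demo_dst.h5", "w", driver="core", backing_store=False) as dst:
        grp = src.create_group("bar")
        grp.create_group("baz").create_dataset("MyDS", data=np.arange(4))

        # Copy within the same file: the whole subtree is duplicated.
        src.copy("bar", "bar_copy")
        in_same_file = "bar_copy/baz/MyDS" in src

        # Copy into another file: the group keeps its own name there.
        src.copy("bar", dst)
        in_other_file = "bar/baz/MyDS" in dst
        copied_values = dst["bar/baz/MyDS"][...].tolist()
```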
Only available with HDF5 1.8
Recursively iterate a callable over objects in this group.
You supply a callable (function, method or callable object); it will be called exactly once for each link in this group and every group below it. Your callable must conform to the signature:
func(<member name>) -> <None or return value>
Returning None continues iteration; returning anything else stops iteration and immediately returns that value from the visit method. No particular order of iteration within groups is guaranteed.
Example:
>>> # List the entire contents of the file
>>> f = h5py.File("foo.hdf5", "r")
>>> list_of_names = []
>>> f.visit(list_of_names.append)
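The early-exit rule can be sketched with a callable that returns a value once it finds what it is looking for. The helper find_myds and the file layout are hypothetical, chosen to mirror the names used earlier on this page.

```python
import h5py

# Assumption: in-memory file so the sketch leaves nothing on disk.
with h5py.File("demo_visit.h5", "w", driver="core", backing_store=False) as f:
    f.create_group("bar").create_group("baz").create_dataset(
        "MyDS", (2,), dtype="i"
    )
    f.create_group("other")

    # list.append returns None, so the traversal never stops early
    # and visit itself returns None.
    all_names = []
    full_result = f.visit(all_names.append)

    def find_myds(name):
        """Hypothetical helper: stop at the first name ending in 'MyDS'."""
        if name.endswith("MyDS"):
            return name  # any non-None value ends the traversal
        return None

    found = f.visit(find_myds)
```

Because iteration order is not guaranteed, all_names is best sorted before being compared with an expected list.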
Only available with HDF5 1.8
Recursively visit names and objects in this group and subgroups.
You supply a callable (function, method or callable object); it will be called exactly once for each link in this group and every group below it. Your callable must conform to the signature:
func(<member name>, <object>) -> <None or return value>
Returning None continues iteration; returning anything else stops iteration and immediately returns that value from the visititems method. No particular order of iteration within groups is guaranteed.
Example:
>>> # Get a list of all datasets in the file
>>> mylist = []
>>> def func(name, obj):
...     if isinstance(obj, h5py.Dataset):
...         mylist.append(name)
...
>>> f = h5py.File('foo.hdf5', 'r')
>>> f.visititems(func)
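Since the callable also receives the object itself, it can record properties as well as names. The sketch below builds a mapping from dataset path to shape. The helper record_shape and the file layout are hypothetical.

```python
import h5py

# Assumption: in-memory file so the sketch leaves nothing on disk.
with h5py.File("demo_items.h5", "w", driver="core", backing_store=False) as f:
    f.create_group("bar").create_group("baz").create_dataset(
        "MyDS", (100, 100), dtype="i"
    )
    f.create_dataset("top", (3,), dtype="f4")

    shapes = {}

    def record_shape(name, obj):
        """Hypothetical helper: remember the shape of every dataset."""
        if isinstance(obj, h5py.Dataset):
            shapes[name] = obj.shape
        # Falling through returns None, so the traversal continues.

    result = f.visititems(record_shape)
```

Groups are passed to the callable as well, which is why the isinstance check is needed to skip them.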