Python-caterva documentation¶
Python-caterva is a Python wrapper for Caterva, an open-source C library designed for large multidimensional, chunked, compressed datasets.
Getting Started
New to python-caterva? Check out the getting started guides. They contain an introduction to python-caterva's main concepts and an installation tutorial.
API Reference
The reference guide contains a detailed description of the python-caterva API. The reference describes how the functions work and which parameters can be used.
Development
Saw a typo in the documentation? Want to improve existing functionality? The contributing guidelines will guide you through the process of improving python-caterva.
Release Notes
Want to see what’s new in the latest release? Check out the release notes to find out!
Getting Started¶
Installation¶
Pip¶
python -m pip install caterva
Source code¶
git clone --recurse-submodules https://github.com/Blosc/python-caterva
cd python-caterva
python -m pip install .
Tutorial¶
import caterva as cat
cat.__version__
'0.6.0'
Creating an array¶
c = cat.zeros((10000, 10000), itemsize=4, chunks=(1000, 1000), blocks=(100, 100))
c
<caterva.ndarray.NDArray at 0x7f0bc0552150>
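To make the two-level partitioning concrete, the following sketch counts how many chunks the array above is split into and how many blocks each chunk holds. It is plain Python arithmetic rather than a python-caterva call, and the helper name `grid` is hypothetical.

```python
import math

def grid(shape, part):
    """Number of partitions of size `part` needed to cover `shape`, per dimension."""
    return tuple(math.ceil(s / p) for s, p in zip(shape, part))

shape, chunks, blocks, itemsize = (10000, 10000), (1000, 1000), (100, 100), 4

chunk_grid = grid(shape, chunks)    # chunks per dimension
block_grid = grid(chunks, blocks)   # blocks per dimension inside one chunk
n_chunks = math.prod(chunk_grid)
n_blocks_per_chunk = math.prod(block_grid)
uncompressed_bytes = math.prod(shape) * itemsize

print(chunk_grid, n_chunks)            # (10, 10) 100
print(block_grid, n_blocks_per_chunk)  # (10, 10) 100
print(uncompressed_bytes)              # 400000000
```

The array would occupy 400 MB uncompressed, which is why it is stored as 100 compressed chunks instead of one contiguous buffer.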
Reading and writing data¶
import struct
import numpy as np
dtype = np.int32
c[0, :] = np.arange(10000, dtype=dtype)
c[:, 0] = np.arange(10000, dtype=dtype)
c[0, 0]
<caterva.ndarray.NDArray at 0x7f0bb00bf050>
np.array(c[0, 0]).view(dtype)
array(0, dtype=int32)
np.array(c[0, -1]).view(dtype)
array(9999, dtype=int32)
np.array(c[0, :]).view(dtype)
array([ 0, 1, 2, ..., 9997, 9998, 9999], dtype=int32)
np.array(c[:, 0]).view(dtype)
array([ 0, 1, 2, ..., 9997, 9998, 9999], dtype=int32)
np.array(c[:]).view(dtype)
array([[   0,    1,    2, ..., 9997, 9998, 9999],
       [   1,    0,    0, ...,    0,    0,    0],
       [   2,    0,    0, ...,    0,    0,    0],
       ...,
       [9997,    0,    0, ...,    0,    0,    0],
       [9998,    0,    0, ...,    0,    0,    0],
       [9999,    0,    0, ...,    0,    0,    0]], dtype=int32)
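The `.view(dtype)` calls are needed because the array is created with an `itemsize` only, so each item is an opaque group of bytes until the reader assigns it a type. The sketch below illustrates the same idea with the standard library alone, decoding 4-byte items with `struct`. It does not use python-caterva, and the little-endian `"<i"` format is an assumption that matches `np.int32` on common platforms.

```python
import struct

itemsize = 4
values = [0, 1, 2, 9999]

# Pack the integers into a raw buffer, as an untyped container would hold them.
raw = b"".join(struct.pack("<i", v) for v in values)
assert len(raw) == itemsize * len(values)

# Reinterpret the buffer as int32 items, analogous to `.view(np.int32)`.
decoded = [item[0] for item in struct.iter_unpack("<i", raw)]
print(decoded)  # [0, 1, 2, 9999]
```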
Persistent data¶
c1 = cat.full((1000, 1000), fill_value=b"pepe", chunks=(100, 100), blocks=(50, 50),
              urlpath="cat_tutorial.caterva")
c2 = cat.open("cat_tutorial.caterva")
c2.info
Type | NDArray (Blosc) |
---|---|
Itemsize | 4 |
Shape | (1000, 1000) |
Chunks | (100, 100) |
Blocks | (50, 50) |
Comp. codec | LZ4 |
Comp. level | 5 |
Comp. filters | [SHUFFLE] |
Comp. ratio | 588.24 |
np.array(c2[0, 20:30]).view("S4")
array([b'pepe', b'pepe', b'pepe', b'pepe', b'pepe', b'pepe', b'pepe',
b'pepe', b'pepe', b'pepe'], dtype='|S4')
import os
if os.path.exists("cat_tutorial.caterva"):
    cat.remove("cat_tutorial.caterva")
Compression params¶
b = np.arange(1000000).tobytes()
c1 = cat.from_buffer(b, shape=(1000, 1000), itemsize=8, chunks=(500, 10), blocks=(50, 10))
c1.info
Type | NDArray (Blosc) |
---|---|
Itemsize | 8 |
Shape | (1000, 1000) |
Chunks | (500, 10) |
Blocks | (50, 10) |
Comp. codec | LZ4 |
Comp. level | 5 |
Comp. filters | [SHUFFLE] |
Comp. ratio | 6.64 |
c2 = c1.copy(chunks=(500, 10), blocks=(50, 10),
             codec=cat.Codec.ZSTD, clevel=9, filters=[cat.Filter.BITSHUFFLE])
c2.info
Type | NDArray (Blosc) |
---|---|
Itemsize | 8 |
Shape | (1000, 1000) |
Chunks | (500, 10) |
Blocks | (50, 10) |
Comp. codec | ZSTD |
Comp. level | 9 |
Comp. filters | [BITSHUFFLE] |
Comp. ratio | 20.83 |
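The compression ratio is the uncompressed size divided by the compressed size, and a filter such as SHUFFLE rearranges bytes before the codec runs so that similar bytes sit next to each other. The sketch below illustrates that effect with the standard library only. It rests on several assumptions: `zlib` stands in for LZ4 or ZSTD, the `shuffle` and `unshuffle` helpers are hypothetical and operate on the whole buffer instead of block by block, and the input mimics `np.arange(n).tobytes()` with 64-bit little-endian integers. The ratios it prints are therefore not comparable to the ones reported by `info`.

```python
import struct
import zlib

def shuffle(buf, itemsize):
    """Group byte 0 of every item, then byte 1 of every item, and so on."""
    return b"".join(buf[i::itemsize] for i in range(itemsize))

def unshuffle(buf, itemsize):
    """Inverse of shuffle(): interleave the byte planes back into items."""
    n_items = len(buf) // itemsize
    out = bytearray(len(buf))
    for i in range(itemsize):
        out[i::itemsize] = buf[i * n_items:(i + 1) * n_items]
    return bytes(out)

itemsize = 8
n_items = 100_000
raw = struct.pack(f"<{n_items}q", *range(n_items))  # 64-bit integers 0..n-1

plain = zlib.compress(raw, 5)
shuffled = zlib.compress(shuffle(raw, itemsize), 5)

# The filter is lossless: undoing it restores the original bytes.
assert unshuffle(shuffle(raw, itemsize), itemsize) == raw

print(f"ratio without shuffle: {len(raw) / len(plain):.2f}")
print(f"ratio with shuffle:    {len(raw) / len(shuffled):.2f}")
```

For slowly varying integers like these, most high-order bytes are zero, so grouping them produces long runs that any codec compresses well.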
Metalayers¶
from msgpack import packb, unpackb
meta = {
    "dtype": packb("i8"),
    "coords": packb([5.14, 23.])
}
c = cat.zeros((1000, 1000), 5, chunks=(100, 100), blocks=(50, 50), meta=meta)
len(c.meta)
3
c.meta.keys()
['caterva', 'dtype', 'coords']
for key in c.meta:
    print(f"{key} -> {unpackb(c.meta[key])}")
caterva -> [0, 2, [1000, 1000], [100, 100], [50, 50]]
dtype -> i8
coords -> [5.14, 23.0]
c.meta["coords"] = packb([0., 23.])
for key in c.meta:
    print(f"{key} -> {unpackb(c.meta[key])}")
caterva -> [0, 2, [1000, 1000], [100, 100], [50, 50]]
dtype -> i8
coords -> [0.0, 23.0]
Example of use¶
from PIL import Image
im = Image.open("../_static/blosc-logo_128.png")
im

meta = {"dtype": b"|u1"}
c = cat.asarray(np.array(im), chunks=(50, 50, 4), blocks=(10, 10, 4), meta=meta)
c.info
Type | NDArray (Blosc) |
---|---|
Itemsize | 1 |
Shape | (70, 128, 4) |
Chunks | (50, 50, 4) |
Blocks | (10, 10, 4) |
Comp. codec | LZ4 |
Comp. level | 5 |
Comp. filters | [SHUFFLE] |
Comp. ratio | 2.68 |
im2 = c[15:55, 10:35] # Letter B
Image.fromarray(np.array(im2).view(c.meta["dtype"]))
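In a chunked layout, a slice such as the one above overlaps only some of the chunks. The sketch below works out which ones, using the shape and chunk shape reported by `c.info`. It is an illustration in plain Python, not a description of how Caterva implements slicing internally, and the helper name `chunks_touched` is hypothetical.

```python
import itertools
import math

def chunks_touched(ranges, chunks):
    """Return the grid indices of every chunk overlapping the half-open ranges."""
    per_dim = [
        range(start // size, (stop - 1) // size + 1)
        for (start, stop), size in zip(ranges, chunks)
    ]
    return list(itertools.product(*per_dim))

shape = (70, 128, 4)
chunks = (50, 50, 4)
# c[15:55, 10:35] with the full last axis, written as (start, stop) pairs.
selection = [(15, 55), (10, 35), (0, 4)]

total = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
touched = chunks_touched(selection, chunks)
print(total)    # 6
print(touched)  # [(0, 0, 0), (1, 0, 0)]
```

Here the selection overlaps two of the six chunks.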

API Reference¶
Constructors¶
NDArray¶
The multidimensional data array class.
Attributes¶
itemsize | The itemsize of this container. |
ndim | The number of dimensions of this container. |
shape | The shape of this container. |
chunks | The chunk shape of this container. |
blocks | The block shape of this container. |
Metalayers¶
class caterva.meta.Meta(ndarray)¶
Class providing access to user meta on an NDArray. It will be available via the .meta property of an array.
Methods¶
__getitem__(item) | Return the item metalayer. |
__setitem__(key, value) | Update the key metalayer with value. |
get(key[, default]) | Return the value for key if key is in the dictionary, else default. |
keys() | Return the metalayer keys. |
__iter__() | Iterate over the keys of the metalayers. |
__contains__(key) | Check whether the key metalayer exists. |
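The dict-like behaviour described above (item access, update, `get`, `keys`, iteration and membership tests) can be sketched with a small stand-in class. `ToyMeta` is hypothetical and is not the `caterva.meta.Meta` implementation. In particular, the rule that an update must target an existing metalayer and keep its byte length is an assumption, inferred from the Roadmap listing variable-length metalayers as a future feature.

```python
class ToyMeta:
    """Hypothetical stand-in for a metalayer mapping; not caterva.meta.Meta."""

    def __init__(self, layers):
        # Values are stored as raw bytes, keyed by metalayer name.
        self._layers = {name: bytes(content) for name, content in layers.items()}

    def __getitem__(self, key):
        return self._layers[key]

    def __setitem__(self, key, value):
        # Assumption: only existing metalayers can be updated, and the new
        # content must keep the original length (fixed-size metalayers).
        if key not in self._layers:
            raise KeyError(key)
        if len(value) != len(self._layers[key]):
            raise ValueError("metalayer content must keep its original length")
        self._layers[key] = bytes(value)

    def get(self, key, default=None):
        return self._layers.get(key, default)

    def keys(self):
        return list(self._layers)

    def __iter__(self):
        return iter(self._layers)

    def __contains__(self, key):
        return key in self._layers

    def __len__(self):
        return len(self._layers)


meta = ToyMeta({"dtype": b"i8", "coords": b"\x00" * 16})
meta["dtype"] = b"f8"                              # same length: accepted
print(meta.keys())                                 # ['dtype', 'coords']
print("coords" in meta, meta.get("missing", b""))  # True b''
```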
Development¶
Contributing to python-caterva¶
python-caterva is a community-maintained project. We want to make contributing to this project as easy and transparent as possible.
Asking for help¶
If you have a question about how to use python-caterva, please post it on Stack Overflow using the “caterva” tag.
Bug reports¶
We use GitHub issues to track public bugs. Please ensure your description is clear and has sufficient instructions to be able to reproduce the issue. The ideal report should contain the following:
1. Summarize the problem: Include details about your goal, describe expected and actual results and include any error messages.
2. Describe what you’ve tried: Show what you’ve tried, tell us what you found and why it didn’t meet your needs.
3. Minimum reproducible example: Share the minimum amount of code needed to reproduce your issue. You can format the code nicely using markdown:
```python
import caterva as cat
...
```
4. Determine the environment: Indicate the python-caterva version and the operating system the code is running on.
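The environment details for the last item can be collected with a few lines of standard-library Python. This is only a suggested sketch: the helper name `environment_report` is hypothetical, and it assumes the package is installed under the distribution name `caterva`, as in the pip instructions.

```python
import platform
import sys
from importlib import metadata

def environment_report(package="caterva"):
    """Collect the version details a bug report should mention."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return {
        "package": package,
        "version": version,
        "python": sys.version.split()[0],
        "os": platform.platform(),
    }

for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Paste the printed lines into the issue alongside the minimum reproducible example.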
Contributing to code¶
We actively welcome your code contributions. By contributing to python-caterva, you agree that your contributions will be licensed under the LICENSE file of the project.
Fork the repo¶
Make a fork of the python-caterva repository and clone it:
git clone https://github.com/<your-github-username>/python-caterva
Create your branch¶
Before you do any new work or submit a pull request, please open an issue on GitHub to report the bug or propose the feature you’d like to add.
Then create a new, separate branch for each piece of work you want to do.
Update docstrings¶
If you’ve changed APIs, update the affected docstrings using the doxygen format.
Run the test suite¶
If you have added code that needs to be tested, add the necessary tests and verify that all tests pass successfully.
Roadmap¶
This document lists the main goals for the upcoming python-caterva releases.
Features¶
Support for variable-length metalayers. This would give users much more flexibility to define their own metadata.
Resize array dimensions. This feature would allow Caterva to grow or shrink any dimension of an array.
Interoperability¶
Third-party integration. Caterva needs better integration with libraries like:
xarray (labeled arrays)
dask (computation)
napari (visualization)
Release notes¶
Changes from 0.5.3 to 0.6.0¶
Provide wheels on PyPI.
Update caterva submodule to 0.5.0.
Changes from 0.5.1 to 0.5.3¶
Fix dependencies installation issue.
Changes from 0.5.0 to 0.5.1¶
Update setup.py and add pyproject.toml.
Changes from 0.4.2 to 0.5.0¶
Big C-core refactoring that improves slicing performance.
Implement the __setitem__ method so that array values can be updated.
Use Blosc special-constructors to initialize the arrays.
Improve the buffer and array protocols.
Remove the data type support in order to simplify the library.
Changes from 0.4.1 to 0.4.2¶
Add files in MANIFEST.in.
Changes from 0.4.0 to 0.4.1¶
Fix invalid values for classifiers defined in setup.py.
Changes from 0.3.0 to 0.4.0¶
Compile the package using scikit-build.
Introduce a second level of multidimensional chunking.
Complete API renaming.
Support the buffer protocol and the numpy array protocol.
Generalize the slicing.
Make cat4py independent of numpy.
Changes from 0.2.3 to 0.3.0¶
Set the development status to alpha.
Add instructions about installing cat4py from pip.
getitem and setitem are now special methods in ext.Container.
Add a new class, NPArray, built from numpy arrays.
Support for serializing/deserializing Containers to/from serialized frames (bytes).
The pshape is calculated automatically if it is None.
Add a .sframe attribute for the serialized frame.
Big refactor for more consistent inheritance among classes.
The from_numpy() function always returns an NPArray now.
Changes from 0.2.2 to 0.2.3¶
Rename MANINFEST.in to MANIFEST.in.
Fix the list of available cnames.
Changes from 0.2.1 to 0.2.2¶
Added a MANIFEST.in for including all C-Blosc2 and Caterva sources in package.
Changes from 0.1.1 to 0.2.1¶
Docstrings have been added. In addition, the documentation can be found at: https://cat4py.readthedocs.io.
Add a copy parameter to from_file().
complib has been renamed to cname for compatibility with blosc-powered packages.
An itemsize that is not a power of 2 is allowed now.