Tutorial¶
Caterva functions let users perform different operations on Caterva arrays, such as setting, copying or slicing them. In this section, we are going to see how to create and manipulate a Caterva array in a simple way.
import caterva as cat
cat.__version__
'0.7.2'
Creating an array¶
First, we create a 10000 x 10000 array of 4-byte items filled with zeros, partitioned into chunks of 1000 x 1000 items and blocks of 100 x 100 items.
c = cat.zeros((10000, 10000), itemsize=4, chunks=(1000, 1000), blocks=(100, 100))
c
<caterva.ndarray.NDArray at 0x7f35bdc0c410>
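To get a feel for what the `chunks` and `blocks` arguments imply, the following sketch counts the partitions by hand. It assumes that chunks tile the array and blocks tile each chunk, with partial partitions at the edges; `grid_counts` is a hypothetical helper written for this illustration, not part of the Caterva API.

```python
import math


def grid_counts(shape, parts):
    """Partitions needed along each axis (hypothetical helper, not Caterva API)."""
    return tuple(math.ceil(s / p) for s, p in zip(shape, parts))


shape, chunks, blocks = (10000, 10000), (1000, 1000), (100, 100)

chunks_per_axis = grid_counts(shape, chunks)    # chunks needed to cover the array
blocks_per_chunk = grid_counts(chunks, blocks)  # blocks inside one full chunk

n_chunks = math.prod(chunks_per_axis)
n_blocks_per_chunk = math.prod(blocks_per_chunk)

print(chunks_per_axis, n_chunks)
print(blocks_per_chunk, n_blocks_per_chunk)
```

Under these assumptions the array above is covered by 10 x 10 = 100 chunks, each holding 10 x 10 = 100 blocks. When a shape is not a multiple of the chunk shape, the ceiling division accounts for the partial chunks at the edges.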
Reading and writing data¶
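We can read and write Caterva arrays using NumPy-style indexing. The array stores an item size rather than a dtype, so reads return raw 4-byte items that we reinterpret with `view`.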
import struct
import numpy as np
dtype = np.int32
c[0, :] = np.arange(10000, dtype=dtype)
c[:, 0] = np.arange(10000, dtype=dtype)
c[0, 0]
array(b'', dtype='|S4')
np.array(c[0, 0]).view(dtype)
array(0, dtype=int32)
np.array(c[0, -1]).view(dtype)
array(9999, dtype=int32)
np.array(c[0, :]).view(dtype)
array([ 0, 1, 2, ..., 9997, 9998, 9999], dtype=int32)
np.array(c[:, 0]).view(dtype)
array([ 0, 1, 2, ..., 9997, 9998, 9999], dtype=int32)
np.array(c[:]).view(dtype)
array([[   0,    1,    2, ..., 9997, 9998, 9999],
       [   1,    0,    0, ...,    0,    0,    0],
       [   2,    0,    0, ...,    0,    0,    0],
       ...,
       [9997,    0,    0, ...,    0,    0,    0],
       [9998,    0,    0, ...,    0,    0,    0],
       [9999,    0,    0, ...,    0,    0,    0]], dtype=int32)
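The `view` pattern used above can be reproduced with NumPy alone. This minimal sketch does not touch Caterva: it treats a bytes buffer as opaque 4-byte items (dtype `S4`), which is how the items appear when read back, and then reinterprets the same bytes as `int32`.

```python
import numpy as np

values = np.arange(5, dtype=np.int32)
raw = values.tobytes()                   # 20 untyped bytes

items = np.frombuffer(raw, dtype="S4")   # five opaque 4-byte items
decoded = items.view(np.int32)           # same bytes, reinterpreted as int32

print(items[0])
print(decoded)
```

NumPy drops trailing null bytes when it returns an `S4` item, which is why an item holding the integer 0 shows up as `b''`. No data is copied or converted by `view`; only the interpretation of the bytes changes.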
Persistent data¶
When we create a Caterva array, we can specify where it will be stored. We can then open this array whenever we want, and it will still contain all its data because it is stored persistently.
c1 = cat.full((1000, 1000), fill_value=b"pepe", chunks=(100, 100), blocks=(50, 50),
              urlpath="cat_tutorial.caterva")
c2 = cat.open("cat_tutorial.caterva")
c2.info
Type | NDArray |
---|---|
Itemsize | 4 |
Shape | (1000, 1000) |
Chunks | (100, 100) |
Blocks | (50, 50) |
Comp. codec | LZ4 |
Comp. level | 5 |
Comp. filters | [SHUFFLE] |
Comp. ratio | 588.24 |
np.array(c2[0, 20:30]).view("S4")
array([b'pepe', b'pepe', b'pepe', b'pepe', b'pepe', b'pepe', b'pepe',
       b'pepe', b'pepe', b'pepe'], dtype='|S4')
import os
if os.path.exists("cat_tutorial.caterva"):
    cat.remove("cat_tutorial.caterva")
Compression params¶
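When we copy a Caterva array, we can give the copy different compression parameters. Here, the original uses LZ4 at level 5 with the SHUFFLE filter, while the copy uses ZSTD at level 9 with BITSHUFFLE, which raises the compression ratio from 6.64 to 20.83.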
b = np.arange(1000000).tobytes()
c1 = cat.from_buffer(b, shape=(1000, 1000), itemsize=8, chunks=(500, 10), blocks=(50, 10))
c1.info
Type | NDArray |
---|---|
Itemsize | 8 |
Shape | (1000, 1000) |
Chunks | (500, 10) |
Blocks | (50, 10) |
Comp. codec | LZ4 |
Comp. level | 5 |
Comp. filters | [SHUFFLE] |
Comp. ratio | 6.64 |
c2 = c1.copy(chunks=(500, 10), blocks=(50, 10),
             codec=cat.Codec.ZSTD, clevel=9, filters=[cat.Filter.BITSHUFFLE])
c2.info
Type | NDArray |
---|---|
Itemsize | 8 |
Shape | (1000, 1000) |
Chunks | (500, 10) |
Blocks | (50, 10) |
Comp. codec | ZSTD |
Comp. level | 9 |
Comp. filters | [BITSHUFFLE] |
Comp. ratio | 20.83 |
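To see why a shuffle filter can matter so much, here is a rough sketch of the byte-shuffle idea using only NumPy and the standard library. It is an illustration under stated assumptions, not Caterva's or Blosc's implementation: `zlib` stands in for the LZ4 and ZSTD codecs, and the shuffle is a plain transpose of the item bytes.

```python
import zlib

import numpy as np

data = np.arange(100_000, dtype=np.int64)
itemsize = data.dtype.itemsize

plain = data.tobytes()
# Byte shuffle: store byte 0 of every item, then byte 1 of every item, and so on.
shuffled = data.view(np.uint8).reshape(-1, itemsize).T.tobytes()

# zlib is a stand-in codec here; Caterva itself uses codecs such as LZ4 or ZSTD.
plain_ratio = len(plain) / len(zlib.compress(plain, 5))
shuffled_ratio = len(shuffled) / len(zlib.compress(shuffled, 5))

# Undo the shuffle to confirm that no information was lost.
restored = (np.frombuffer(shuffled, dtype=np.uint8)
            .reshape(itemsize, -1).T.copy().view(np.int64).ravel())

print(f"plain: {plain_ratio:.1f}, shuffled: {shuffled_ratio:.1f}")
```

Grouping bytes of equal significance puts the mostly-zero high bytes of these small integers next to each other, which gives the codec long repetitive runs to work with. The exact ratios depend on the codec and data, so they will not match the figures reported by `info` above.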
Metalayers¶
Metalayers are small pieces of metadata that describe the data stored in a container. The metalayers of a Caterva array are easy for users to read and update.
from msgpack import packb, unpackb
meta = {
    "dtype": packb("i8"),
    "coords": packb([5.14, 23.])
}
c = cat.zeros((1000, 1000), 5, chunks=(100, 100), blocks=(50, 50), meta=meta)
len(c.meta)
3
c.meta.keys()
['caterva', 'dtype', 'coords']
for key in c.meta:
    print(f"{key} -> {unpackb(c.meta[key])}")
caterva -> [0, 2, [1000, 1000], [100, 100], [50, 50]]
dtype -> i8
coords -> [5.14, 23.0]
c.meta["coords"] = packb([0., 23.])
for key in c.meta:
    print(f"{key} -> {unpackb(c.meta[key])}")
caterva -> [0, 2, [1000, 1000], [100, 100], [50, 50]]
dtype -> i8
coords -> [0.0, 23.0]
Small tutorial¶
This example shows how easy it is to create a Caterva array from an image and how users can manipulate it with Caterva and PIL `Image` functions.
from PIL import Image
im = Image.open("../_static/blosc-logo_128.png")
im
meta = {"dtype": b"|u1"}
c = cat.asarray(np.array(im), chunks=(50, 50, 4), blocks=(10, 10, 4), meta=meta)
c.info
Type | NDArray |
---|---|
Itemsize | 1 |
Shape | (70, 128, 4) |
Chunks | (50, 50, 4) |
Blocks | (10, 10, 4) |
Comp. codec | LZ4 |
Comp. level | 5 |
Comp. filters | [SHUFFLE] |
Comp. ratio | 4.31 |
im2 = c[15:55, 10:35] # Letter B
Image.fromarray(np.array(im2).view(c.meta["dtype"]))
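The last line relies on the `dtype` metalayer to turn the untyped slice back into pixel values. The sketch below mimics that step with NumPy alone, for readers without the logo file at hand. Everything in it is a stand-in chosen for illustration: a synthetic RGBA array replaces the image, a plain dict replaces `c.meta`, and an `S1` array replaces the untyped items.

```python
import numpy as np

# Synthetic 70 x 128 RGBA image standing in for the logo file.
pixels = np.zeros((70, 128, 4), dtype=np.uint8)
pixels[15:55, 10:35] = (255, 0, 0, 255)   # opaque red patch

meta = {"dtype": b"|u1"}                  # plain dict standing in for c.meta

# Untyped 1-byte items, similar to what slicing returns in the tutorial above.
raw = np.frombuffer(pixels.tobytes(), dtype="S1").reshape(pixels.shape)

patch = raw[15:55, 10:35]                        # same region as the slice above
typed = patch.view(meta["dtype"].decode())       # bytes -> unsigned 8-bit pixels

print(typed.shape, typed.dtype)
```

Keeping the dtype string alongside the data means the reader of the array does not have to know in advance how to interpret its bytes.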