Releases · Blosc/python-blosc2

17 Jun 10:42

v4.5.1

1504963

Release 4.5.1 Latest

Changes from 4.5.0 to 4.5.1

This follow-up release builds the b2view terminal viewer into a richer
data-exploration tool — a scatter plot, a searchable column picker, a
one-shot demo download, refreshed chrome, and several interaction fixes — and
upgrades the bundled C-Blosc2 to 3.1.4. WASM/Pyodide is now a fully
supported platform, and CTable.info reports per-column compressed sizes.

b2view: richer exploration

Scatter plots: from a column plot, press s to scatter the current column
(X) against another column (Y) chosen from a list, over the current (zoomed)
row range; h then opens a high-resolution matplotlib scatter.
High-res for 1-D series is now an envelope plot (matching the in-terminal
view), and a new r key toggles between the min/max envelope and the raw
values (strided-sampled when the range is wide).
Searchable column picker: the c go-to-column key now opens a searchable,
selectable list (type to filter, ↑/↓, Enter) for CTables, instead of a text
field; N-D arrays still go by numeric index.
Show/hide columns: / opens a searchable multi-select to pick which CTable
columns are displayed.
Demo download: b2view --download fetches a demo bundle
(chicago-taxi-flat.b2z by default) into the current directory if it is not
already there, then opens it.
Refreshed chrome: a branded header, a left-docked filename label in the
title, and clearer status chips.

b2view: interaction fixes

Go-to-row/column pre-fill is now pre-selected, so the first keystroke
replaces the current index instead of appending to it (typing a column name no
longer produces e.g. 0payment.fare).
Escape keeps its layered exit while a panel is maximized: with the data
panel maximized, escape now unlocks a plot's locked row window (and clears
filters) as documented, instead of being hijacked into restoring the panel —
use r to restore (ESCAPE_TO_MINIMIZE = False).
Test-suite robustness fixes (a timing flake and a Windows rendering glitch).

Other

C-Blosc2 upgraded to 3.1.4.
WASM/Pyodide is now a fully supported platform, with more frequent CI runs.
CTable.info shows per-column compressed sizes (cbytes and cratio),
and print_versions() uses clearer Python-Blosc2 / C-Blosc2 labels.

15 Jun 12:03

FrancescAlted

v4.5.0

82faed2

Release 4.5.0

Changes from 4.4.5 to 4.5.0

This release teaches the b2view terminal viewer to plot — peak-preserving
envelope line plots of any series, with zoom, a row-window lock, and an optional
high-resolution matplotlib view — and gives CTable a pandas-like display and
CSV experience. It also publishes WASM/Pyodide wheels to PyPI and adds
faster strided reads for NDArray and Column.

b2view: plotting and data inspection

In-terminal plots: press p on a numeric series (a CTable column or an
array row) to draw a braille line plot. Plots are peak-preserving min/max
envelopes by default, so no spike or trough is hidden however large the
series is; large local series stream their envelope exactly in bounded
spans (only remote c2arrays fall back to a labeled strided sample).
Zoom and row-window lock: zoom the plot into a row range and pan it; press
v to lock the data grid to the plotted range so paging stays inside it
(escape unlocks). The plot and high-res views honor the locked window.
High-resolution view: h opens a high-res matplotlib image of the
plotted range (new optional hires extra: matplotlib + textual-image).
On-demand cell decode: enter decodes a single skipped/expensive CTable
cell, and SChunk nodes now preview as a paged hex dump.
Fixes and polish: row paging re-aligns to the page grid after dim-mode
single-row scrolls; the data panel now focuses correctly with
--path ... --panel data; status chips are branded yellow.
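
As an illustration of why a min/max envelope is peak-preserving while a strided sample is not, here is a minimal pure-Python sketch. It is not b2view's implementation; the function names `minmax_envelope` and `strided_sample` and the bucket count are assumptions made for this example.

```python
def minmax_envelope(values, nbuckets):
    """Reduce a series to at most nbuckets (min, max) pairs, one per bucket."""
    n = len(values)
    if n == 0 or nbuckets <= 0:
        return []
    nbuckets = min(nbuckets, n)
    envelope = []
    for b in range(nbuckets):
        # Integer bucket bounds cover every element exactly once.
        start = b * n // nbuckets
        stop = (b + 1) * n // nbuckets
        bucket = values[start:stop]
        envelope.append((min(bucket), max(bucket)))
    return envelope


def strided_sample(values, nbuckets):
    """Keep every k-th value; cheap, but extremes between samples are lost."""
    step = max(1, len(values) // nbuckets)
    return values[::step]


series = [0.0] * 1000
series[437] = 9.5   # a single spike
series[902] = -3.0  # a single trough

env = minmax_envelope(series, 50)
sample = strided_sample(series, 50)

print(max(hi for _, hi in env), min(lo for lo, _ in env))
print(max(sample), min(sample))
```

The envelope keeps both extremes because every element falls in exactly one bucket, whereas the strided sample only sees the elements it lands on.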

CTable display

CTable.to_string() now renders the whole table by default (every row and
every column), like pandas' DataFrame.to_string(). New max_rows and
max_width parameters truncate on demand. Behaviour change: previously
to_string() returned the truncated view; code that relied on that should
pass max_rows=/max_width= (or use str()).
The [N rows x M columns] dimensions footer now follows pandas: omitted by
to_string() (pass show_dimensions=True to force it), and shown by
str/repr/print only when the view is actually truncated. Previously it
was always appended.
repr(ctable) now shows the same truncated table as str(ctable)
(pandas/polars convention), instead of the one-line CTable<…> summary. The
compact summary remains available via ctable.info.
New display options in set_printoptions: display_width controls the
column-fitting width budget (None = auto-detect terminal, -1 = show all
columns, positive int = fixed budget), and display_rows now accepts -1 to
show all rows (0 still shows none).
New blosc2.printoptions(...) context manager temporarily sets the display
options and restores them on exit, e.g.
with blosc2.printoptions(display_rows=-1, display_width=-1): print(t).
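
The set-then-restore behaviour described for the context manager can be sketched with the standard library alone. This is a generic illustration of the pattern, not blosc2's code: the `_options` store and its default values are assumptions, and only the option names `display_rows` and `display_width` come from the notes above.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a module-level options store (defaults assumed).
_options = {"display_rows": 10, "display_width": None}


def set_printoptions(**kwargs):
    """Update the global options, rejecting unknown names."""
    unknown = set(kwargs) - set(_options)
    if unknown:
        raise TypeError(f"unknown print options: {sorted(unknown)}")
    _options.update(kwargs)


def get_printoptions():
    """Return a copy so callers cannot mutate the store by accident."""
    return dict(_options)


@contextmanager
def printoptions(**kwargs):
    """Temporarily apply options, restoring the previous ones on exit."""
    saved = get_printoptions()
    set_printoptions(**kwargs)
    try:
        yield get_printoptions()
    finally:
        # Restore even if the body of the with-block raises.
        _options.clear()
        _options.update(saved)


with printoptions(display_rows=-1, display_width=-1):
    inside = get_printoptions()
after = get_printoptions()
```

Saving a copy before applying the overrides is what makes nested `with` blocks unwind in the right order.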

CTable I/O

CTable.to_csv() now accepts no path, returning the CSV as a string like
pandas' DataFrame.to_csv(). Passing a path still writes the file (and
returns None); the returned string is byte-for-byte the same as the file.

Performance

Faster strided reads: NDArray.__getitem__ gains a sparse-gather fast
path for large strides, and Column.__getitem__ short-circuits when the
logical positions equal the physical ones.
Fix: a negative step in Column.__getitem__ could return []; it now
returns the reversed selection.

Indexing

Fix: a sidecar-handle cache collision could return the wrong SUMMARY
index for a compact-store column.
Cross-column index pruning is now enabled for compact CTable queries, so
more predicates prune blocks before any data is materialized. The docs also
note when summary indexes are not created automatically.

Packaging

WASM/Pyodide wheels on PyPI: the main wheel build now also produces
pyemscripten wheels for CPython 3.13 (2025 ABI) and 3.14 (2026 ABI) and
uploads them to PyPI, so blosc2 is micropip-installable in Pyodide, and
b2view prints a clear message instead of crashing when run under WASM.
Known limitation: slicing an in-memory SChunk loaded from a frame fails on
the Pyodide 0.29.x Emscripten toolchain (cp313); it works on Pyodide 314
(cp314) and natively. See issue #664.
cibuildwheel updated to 4.1.

12 Jun 15:25

FrancescAlted

v4.4.5

d4d8080

Release 4.4.5

Changes from 4.4.3 to 4.4.5

Note: 4.4.4 was skipped due to a failure during the release process.

This release promotes the b2view terminal viewer to a core feature —
installed by default, with new interactive row and column filtering — and
makes BatchArray block layouts (and hence compression ratios) reproducible
across CPUs.

b2view, the terminal data viewer

Installed by default: textual and rich are now regular
dependencies, so the b2view CLI works out of the box (the [tui] extra
is gone). A getting-started walkthrough was added to the docs, and the
README now lists the CLI tools.
Row filtering: pressing f on a CTable node opens a modal that takes
the same string expressions as CTable.where() (dotted nested names,
and/or) and pages through the matching view. Filters are remembered
per node for the session, the data header shows the active filter plus
the unfiltered total, and escape (or an empty expression) clears it.
Column filtering: / narrows the visible columns by case-insensitive
substring; column paging and the c goto-column modal then operate on
that subset. Combines freely with the row filter; escape clears one
layer per press (rows first, then columns).
Mouse handling: the terminal owns the mouse by default, so native
text selection/copy works like in any CLI program; --mouse lets b2view
capture it instead (click-to-focus, wheel scrolling by half a page,
paging at the edges).
Navigation: ? opens a help screen listing all keys; c jumps to a
column by index, exact name or unique name prefix; s/e jump to the
first/last column window; row paging and jumps keep the cursor on its
column; dim-mode index/viewport movements clamp at the boundaries instead
of wrapping around.
Rendering: column windows are fitted from measured rendered widths
(and re-fitted on terminal resize and panel maximize/restore), and float
columns use a uniform number of decimals so decimal points align down
the column.
Test suite: first automated tests for the TUI — Pilot-driven keyboard
journeys against a deterministic generated store (marker tui), plus
render unit tests. Skipped on wasm, where Textual apps cannot start
(no termios).

BatchArray

Reproducible block layouts: automatic variable-length block sizing
now uses fixed byte budgets (1 MiB for clevel 1-3, 8 MiB for 4-6, 16 MiB
for 7-8) instead of the CPU cache sizes, so the layout — and hence the
compression ratio — no longer depends on the machine that created the
array.
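
A rough sketch of how a fixed byte budget makes a layout machine-independent follows. Only the three budget tiers come from the notes above; the greedy packing in `plan_blocks`, both function names, and the choice to reject other compression levels are assumptions for illustration, not the library's algorithm.

```python
MIB = 1024 * 1024


def block_budget_bytes(clevel):
    """Byte budget per block for the tiers listed in the release notes."""
    if 1 <= clevel <= 3:
        return 1 * MIB
    if 4 <= clevel <= 6:
        return 8 * MIB
    if 7 <= clevel <= 8:
        return 16 * MIB
    # The notes do not say what happens for other levels.
    raise ValueError(f"no budget listed for clevel={clevel}")


def plan_blocks(item_sizes, clevel):
    """Greedily group item indices so each block stays within the budget.

    A single item larger than the budget gets a block of its own.
    """
    budget = block_budget_bytes(clevel)
    blocks, current, used = [], [], 0
    for i, size in enumerate(item_sizes):
        if current and used + size > budget:
            blocks.append(current)
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        blocks.append(current)
    return blocks


sizes = [400_000, 400_000, 400_000, 2_000_000, 100_000]
layout_fast = plan_blocks(sizes, clevel=1)
layout_mid = plan_blocks(sizes, clevel=5)
```

Because the plan depends only on the item sizes and the compression level, two machines with different cache sizes would compute the same layout.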

Build and docs

Installing test dependencies: the docs now use
pip install . --group test (a PEP 735 dependency group); the stale
[test] extra syntax was removed.
cibuildwheel updated to 4.0.

10 Jun 17:44

FrancescAlted

v4.4.3

3864a2d

Release 4.4.3

Changes from 4.4.2 to 4.4.3

This is a maintenance release focused on faster CTable cold-start, printing
and groupby performance, a lighter import blosc2, new raw-storage access
for columns, and support for the new J2K/HTJ2K codec plugins.

CTable performance

Lazy column opening in views: select() (and other view-producing
operations) no longer open every projected column up front. A column is
only opened from storage when the view actually reads it, so selecting and
then touching a subset of columns — or aggregating a single one — skips
the cold-start cost of the rest.
Lazy index opening in queries: query planning no longer opens every
SUMMARY-indexed column on a wide persistent table; only indexes for
columns actually referenced by the predicate are loaded.
Faster table printing: repr()/to_string() now memoise per-column
sparse gathers for the duration of a render and combine the head and tail
rows into a single sparse read per column. Each column is read from
storage once instead of ~6 times (precision detection, width sizing and
row rendering all hit the cache).
Groupby with integral float keys: float key columns whose values are
integral and fit a compact non-negative range (e.g. float32 id/second
columns) now take the dense single-key fast path instead of the markedly
slower generic float-hash path. Fractional or non-finite keys fall back
automatically.
No tempdir in read mode: opening a .b2z/.b2d store in 'r' mode
no longer creates a temporary working directory, since nothing is ever
written.
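
The integral-float-key idea can be illustrated as follows: if every key is a finite, non-negative whole number within a small range, the key can index directly into an array of bins instead of going through a hash table. This is a pure-Python sketch under stated assumptions; `dense_key_range`, `group_count`, and the `max_range` cutoff are hypothetical and do not reflect the actual Cython code.

```python
import math


def dense_key_range(keys, max_range=1_000_000):
    """Return the number of bins if keys qualify for dense grouping, else None."""
    top = 0
    for k in keys:
        # Fractional, negative, or non-finite keys disqualify the fast path.
        if not math.isfinite(k) or k < 0 or k != math.floor(k):
            return None
        top = max(top, int(k))
    return top + 1 if top < max_range else None


def group_count(keys):
    """Count rows per key, using dense bins when the keys allow it."""
    nbins = dense_key_range(keys)
    if nbins is not None:
        counts = [0] * nbins
        for k in keys:
            counts[int(k)] += 1  # the key itself is the bin index
        return {float(i): c for i, c in enumerate(counts) if c}
    # Generic fallback: hash every float key.
    counts = {}
    for k in keys:
        counts[k] = counts.get(k, 0) + 1
    return counts


dense = group_count([2.0, 0.0, 2.0, 5.0])
generic = group_count([1.5, 2.0, 1.5])
```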

Lighter imports and prefetcher rework

asyncio dependency dropped: the on-disk chunk prefetcher used by the
UDF and numexpr fallback engines now uses plain concurrent.futures
instead of an asyncio event loop. import blosc2 no longer pulls in
~30 asyncio modules, saving ~3 MB of memory footprint at import time.
Prefetcher deadlock fixed: an exception during evaluation could leave
the generator finalizer blocked forever in thread.join() while the
reader thread was stuck on a full prefetch queue. A stop event now makes
the producer bail out when its consumer goes away.
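
The deadlock and its fix can be reproduced in miniature with a bounded queue and a stop event. The sketch below is a generic producer/consumer pattern written for this note, not the library's prefetcher; `prefetch`, the queue size, and the timeout are assumptions.

```python
import queue
import threading


def prefetch(chunks, maxsize=2):
    """Yield chunks read by a background thread through a bounded queue."""
    q = queue.Queue(maxsize=maxsize)
    stop = threading.Event()
    done = object()  # sentinel marking the end of the stream

    def put(item):
        # Retry with a timeout so a full queue can never block forever.
        while not stop.is_set():
            try:
                q.put(item, timeout=0.05)
                return True
            except queue.Full:
                continue
        return False

    def producer():
        try:
            for chunk in chunks:
                if not put(chunk):
                    return  # the consumer went away
        finally:
            put(done)  # no-op once stop is set

    thread = threading.Thread(target=producer, daemon=True)
    thread.start()
    try:
        while True:
            item = q.get()
            if item is done:
                break
            yield item
    finally:
        # Runs on exhaustion, on an error in the consumer, and on close():
        # signal the producer first, then join it.
        stop.set()
        thread.join()


gen = prefetch(range(1000))
first = [next(gen), next(gen), next(gen)]
gen.close()  # abandon the stream early; must not hang
everything = list(prefetch([1, 2, 3]))
```

Without the stop event, `gen.close()` would call `thread.join()` while the producer sat in a blocking `put()` on a full queue, which is the hang described above.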

New features

Column.raw accessor: returns the underlying storage container of a
column (NDArray, ListArray, DictionaryColumn, …) directly. Unlike
Column.__getitem__, which always materializes NumPy arrays, this is the
column as a blosc2-native compressed object — usable as a lazy-expression
operand without decompressing, and exposing storage details like schunk,
chunks or cparams. Note that this is a physical view: fixed-width
containers are over-allocated to chunk capacity, so slice to len(table)
to get just the live rows, and no validity-mask or null-sentinel
processing is applied. Raises AttributeError for computed columns,
which have no backing storage.
J2K and HTJ2K codec IDs: blosc2.Codec.J2K and blosc2.Codec.HTJ2K
expose the IDs for the new JPEG 2000 codec plugins (installable with
pip install blosc2-j2k and pip install blosc2-htj2k).

Fixes

--float-trunc-prec and nested columns: the precision-truncation
filter of the parquet_to_blosc2 CLI now propagates to float fields
inside nested (struct/list) columns too.
Guard for unsupported computed-column expressions: expressions that
would serialize to an empty or non-round-trippable string are now rejected
with an early, actionable ValueError at add_computed_column() time,
instead of silently breaking on reload.

Build

C-Blosc2 updated to 3.1.3.

04 Jun 16:10

FrancescAlted

v4.4.2

4b95cdb

Release 4.4.2

Changes from 4.4.1 to 4.4.2

This is a feature and maintenance release that promotes DSL kernels to
first-class CTable computed columns, adds a new CTable.__setitem__
assignment idiom, optimises bulk NDArray writes, and fixes several
correctness issues.

DSL kernels as first-class CTable columns

add_computed_column() accepts DSL kernels: @blosc2.dsl_kernel-decorated
functions can now back virtual computed columns directly, in addition to
the existing string-expression form. The column survives save/open
round-trips via persisted dsl_source.
add_generated_column() accepts DSL kernels: stored generated columns
(written during append/extend and on refresh_generated_column())
now support DSL kernels as their transformer.
CTable.where() accepts UDF/DSL kernels: filter predicates are no
longer limited to expression strings — any DSL kernel can be passed directly.
dtype inference for DSL kernels: when dtype is omitted,
lazyudf() infers the output dtype via NumPy type promotion of the input
column dtypes. Pass dtype explicitly for type-changing kernels
(comparisons, casts).
kernel_from_source() utility: new dsl_kernel.kernel_from_source()
reconstructs a DSLKernel from its stored source text, shared by the
CTable DSL-column loaders and the persisted LazyUDF decoder.
Security note: .b2d files from untrusted sources that contain DSL
computed columns execute stored Python source on open. A warning is now
included in the documentation.

New `CTable.__setitem__` column-assignment API

t["col"] = arr: new shorthand equivalent to t["col"][:] = arr.
Accepts any array-like including blosc2.NDArray. Raises KeyError for
unknown columns and ValueError for views or read-only tables.

Chunked NDArray writes in `extend()` and `Column.__setitem__`

extend({"col": ndarray}) decompresses chunk-by-chunk: when a
blosc2.NDArray is passed as a column value to extend(), it is now
written in chunks instead of being fully decompressed upfront. Pass
validate=False to avoid a transient full decompression during constraint
checking.
col[:] = blosc2_ndarray fast path: a new no-holes fast path in
Column.__setitem__ skips the O(n) validity-mask gather and writes the
NDArray one chunk at a time using contiguous slice writes. Works for both
scalar and fixed-shape ndarray columns. Falls back to a chunked fancy-index
path when deleted rows are present.

`BLOSC_ME_JIT` environment variable override

Full CLI override: BLOSC_ME_JIT now takes unconditional priority over
both the jit= and jit_backend= keyword arguments, making it easy to
switch JIT backends from the command line without modifying code.

Correctness fixes

View corruption in Column.__setitem__: a None == None guard
evaluation on view-backed columns could fire the NDArray fast path,
bypassing physical-position remapping and silently corrupting rows. Fixed
by explicitly checking base is None before activating the fast path.
CTable.__setitem__ view guard: the new t["col"] = arr API now
raises ValueError on views, matching the contract of all other mutating
CTable methods.
Fast path enabled for disk-opened tables: the fast path previously
remained dormant for tables opened from disk because _last_pos starts as
None. The guard now calls _resolve_last_pos() to lazily initialise it.
DSL column jit_backend preserved in _empty_copy: the jit_backend
setting was silently dropped during internal table copies; it is now
retained.
lazyexpr Column unwrapping: convert_inputs() now automatically
unwraps CTable.Column objects to their backing NDArray so that shape and
identity checks work correctly.

Documentation and examples

Parquet-to-blosc2 walkthrough: new step-by-step tutorial added to the
getting-started section. Thanks to @SyedIshmumAhnaf.
CTable performance tips: new section in the overview covering when to
prefer computed vs. generated columns, chunk sizing, and query optimisation.
Simplified docstring examples: examples throughout ndarray.py and
ctable.py now use blosc2.array(), blosc2.arange(), and
blosc2.linspace() directly instead of two-step numpy-then-asarray
patterns.
udf-computed-col.py example: new end-to-end example demonstrating DSL
kernel computed and generated columns.

Contributors

SyedIshmumAhnaf

03 Jun 05:12

FrancescAlted

v4.4.1

241704b

Release 4.4.1

Changes from 4.3.3 to 4.4.1

This is a feature release focused on a new interactive data viewer, automatic
SUMMARY indexes for fast WHERE queries, chunk-aligned Arrow/Parquet imports,
expanded where() acceleration via miniexpr, and a range of CTable ergonomics
and performance improvements. Python 3.10 support has been dropped; Python
3.11 is now the minimum.

b2view: interactive Text User Interface data viewer

New b2view command: a terminal-based interactive viewer for all
blosc2 containers — NDArray, CTable, SChunk, BatchArray, and more.
Launch it with b2view <file> or as blosc2.b2view() from Python.
Full 1-D and 2-D browsing: arrays with more than two dimensions are
sliceable along any axis; 1-D arrays are shown as a single-column table.
CTable navigation: scroll through rows with keyboard shortcuts; t/b
jump to the top/bottom; --panel jumps straight to a named panel on launch.
CTable.vlmeta panel: variable-length metadata is exposed in a dedicated
panel.
New dim mode: navigate along all dimensions freely for N-D arrays.

SUMMARY indexes for fast WHERE queries

Automatic SUMMARY index creation: when a CTable is closed after a
write session, SUMMARY indexes (per-block min/max) are built by default for
all eligible scalar columns with no extra configuration needed.
Incremental build during writes: indexes are accumulated block-by-block
during extend() and Arrow import, so closing the table costs almost nothing
beyond the write already done.
Block-skip prefilter: the miniexpr prefilter uses SUMMARY bitmaps to skip
entire blocks whose min/max range cannot satisfy the WHERE predicate, reducing
decompression work for selective queries.
Conjunction support: per-column SUMMARY block masks are combined with
bitwise AND so multi-column conjunctions prune blocks efficiently.
Cost gate: a cost model guards index use; the SUMMARY path is skipped when
block skipping is unlikely to help (e.g. very low selectivity).
--no-summary-index: new CLI flag for parquet-to-blosc2 to disable
automatic index creation on import.
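
The block-skip and conjunction ideas above can be sketched in a few lines of pure Python. This is a conceptual illustration only: the function names, the list-of-tuples index layout, and the inclusive range predicates are assumptions, and the real index stores bitmaps and is consulted by the miniexpr prefilter.

```python
def build_summary(values, block_size):
    """Per-block (min, max) pairs, a toy stand-in for a SUMMARY index."""
    return [
        (min(values[i:i + block_size]), max(values[i:i + block_size]))
        for i in range(0, len(values), block_size)
    ]


def blocks_for_range(summary, lo, hi):
    """Mask of blocks that may hold a value v with lo <= v <= hi."""
    return [bmin <= hi and bmax >= lo for bmin, bmax in summary]


def query(columns, summaries, predicates, block_size):
    """Rows where every column satisfies its (lo, hi) range predicate."""
    nrows = len(next(iter(columns.values())))
    nblocks = len(next(iter(summaries.values())))
    mask = [True] * nblocks
    for name, (lo, hi) in predicates.items():
        col_mask = blocks_for_range(summaries[name], lo, hi)
        mask = [a and b for a, b in zip(mask, col_mask)]  # conjunction = AND
    rows, scanned = [], 0
    for b, keep in enumerate(mask):
        if not keep:
            continue  # the whole block is skipped without reading its data
        scanned += 1
        for r in range(b * block_size, min((b + 1) * block_size, nrows)):
            if all(lo <= columns[n][r] <= hi for n, (lo, hi) in predicates.items()):
                rows.append(r)
    return rows, scanned


block_size = 4
columns = {"a": list(range(16)), "b": [r % 8 for r in range(16)]}
summaries = {n: build_summary(v, block_size) for n, v in columns.items()}
rows, scanned = query(columns, summaries, {"a": (8, 15), "b": (0, 3)}, block_size)
```

Here each predicate alone leaves two candidate blocks, but their AND leaves one, so only a quarter of the data is scanned.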

CTable column grid alignment

Shared chunk/block grid for scalar columns: fixed-size columns are now
written on a shared chunk/block grid derived from the numeric column widths,
so all columns have identical chunk boundaries. This makes multi-column
SUMMARY scans and chunk-parallel reads significantly faster.
Chunk-aligned Arrow import: incoming Arrow/Parquet batches are buffered
and flushed in exact chunk-sized blocks, so each chunk is compressed exactly
once instead of being split across batch boundaries.
Vectorized dictionary-column import: dictionary codes are now written in
bulk at full chunk capacity rather than element by element.
Small fixed strings on the grid: fixed-length string columns narrow enough
to share the numeric grid are admitted to it, reducing the number of distinct
chunk sizes.
--reduce-mem: new CLI option for parquet-to-blosc2 to cap the Arrow
read-batch size on nested list<struct> imports, keeping peak RSS low at a
modest speed cost.
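
Chunk-aligned buffering, as described for the Arrow import, amounts to re-slicing a stream of arbitrarily sized batches into fixed-size pieces. A minimal sketch, with plain lists standing in for Arrow batches and `rechunk` as a hypothetical name:

```python
def rechunk(batches, chunk_len):
    """Yield lists of exactly chunk_len items, plus one shorter final chunk."""
    buffer = []
    for batch in batches:
        buffer.extend(batch)
        # Flush as many full chunks as the buffer currently holds.
        while len(buffer) >= chunk_len:
            yield buffer[:chunk_len]
            del buffer[:chunk_len]
    if buffer:
        yield buffer  # trailing partial chunk


batches = [[0, 1, 2], [3, 4, 5, 6, 7], [8], [9, 10]]
chunks = list(rechunk(batches, 4))
```

Each full chunk is emitted once and never revisited, which is what lets the writer compress every chunk a single time.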

CTable.copy() enhancements

C-level bulk copy for ListArray and BatchArray: a new chunk_copy()
method transfers pre-compressed chunks directly at the C level, bypassing
Python-level serialization and recompression. CTable.copy() uses this path
automatically.
chunks= / blocks= overrides in CTable.copy(): callers can now
specify target chunk and block sizes for the output copy.
cparams and blocks overrides: CTable.copy() accepts cparams and
blocks to recompress the copy with different settings.
--chunks / --blocks added to the parquet-to-blosc2 CLI.

Take/gather APIs

Added NDArray.take() following Array API take shape semantics, including
axis=None flattening and N-dimensional integer indices. One-dimensional
gathers use a new sparse C-level path (b2nd_get_sparse_cbuffer) internally.
Extended top-level blosc2.take() to dispatch to NDArray.take(),
CTable.take(), and Column.take() while preserving the input container
type.
Added CTable.take() and Column.take() for logical row/value gathers that
preserve order and duplicate indices, unlike mask-based views.
For ndim > 1 axis-based take, orthogonal selection is used internally for
better performance.
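
The difference between an index-based gather and a mask-based selection, mentioned above, is easy to see on a plain list. The helpers below are illustrative stand-ins written for this note, not the blosc2 API.

```python
def take(values, indices):
    """Gather by position: output follows the order of indices, repeats kept."""
    return [values[i] for i in indices]


def mask_select(values, mask):
    """Select by boolean mask: storage order, each row at most once."""
    return [v for v, keep in zip(values, mask) if keep]


values = ["a", "b", "c", "d"]
gathered = take(values, [3, 0, 3])

# The closest mask can only say which rows are present.
mask = [i in {3, 0} for i in range(len(values))]
masked = mask_select(values, mask)
```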

where() and miniexpr acceleration

where(cond, x) via miniexpr: the form with a single value argument (the
fill-with-zero variant) is now handled directly by the miniexpr engine when
the condition is a boolean array, avoiding a numexpr round-trip.
where(cond, x, y) via miniexpr: the form with two value arguments is likewise
dispatched to miniexpr for element-wise conditional selection.
Sparse boolean mask fast path: when a boolean indexing result is very
sparse (high selectivity), auto-detection switches to a fast gather path
instead of a full-array scan.
Early boolean key check: NDArray.__getitem__ with a boolean array key
now detects it before the general process_key / nonzero path, avoiding
wasted work.
Compressed transient masks: temporary boolean masks created during
queries are now stored as LZ4-compressed blosc2 arrays, reducing memory
pressure without measurable speed regression.
BLOSC_ME_JIT / BLOSC_ME_JIT_TRACE: new environment variables to
control and trace the miniexpr JIT backend at runtime.

CTable views and lazy sorting

sort_by() on a view is now lazy: calling sort_by() on a filtered view
returns a position-reordered view without materializing data; the sort
positions are cached and used directly on column access.
Lazy column materialization in filtered views: select() on a view no
longer materializes unneeded columns eagerly; columns are resolved only when
accessed.

NestedColumn and .info improvements

NestedColumn public class: the previously internal
_NestedColumnNamespace has been renamed and promoted to NestedColumn,
providing aggregate metadata (col_names, nrows, nbytes, cbytes,
cratio) and a structured .info report over a group of dotted columns.
Uniform .info across containers: Column.info, CTable.info,
NestedColumn.info, and related classes now follow a consistent field order
(identity → shape/grid → sizes → content → compression params).

Context manager support for blosc2.open()

All objects returned by blosc2.open() — NDArray, SChunk, CTable,
BatchArray, ListArray, and stores — now support the with statement.
The __exit__ method flushes and closes the underlying storage.

Performance improvements and fixes

CTable.nrows stored persistently: row counts are written to metadata on
close and read back on open, avoiding a full column scan at startup.
Index sidecar loading from .b2z: SUMMARY/BUCKET sidecars inside .b2z
archives are now read in-place rather than extracted to a temporary directory,
cutting open latency for indexed tables.
Compressed query cache: the hot query-result cache is now stored
LZ4-compressed, reducing its memory footprint with negligible overhead.
Query cache consistency fixes: on-disk query cache side effects and a
miniexpr chunk-cache race condition on Apple Silicon have been resolved.
macOS L2 floor for chunk sizing: on macOS the full L2 cache is used as a
floor for automatic chunk sizing, giving better compression/speed trade-offs.
Better Apple Silicon L3 handling: missing L3 cache on Apple Silicon is
handled more gracefully in the cache-size heuristic.
Table capacity management: large CTables grow more conservatively, and
capacity is trimmed on close and after Arrow import to reclaim over-allocated
space.
Faster iteration with iterchunks_info(): several hot loops switched to
iterchunks_info() for lower overhead per chunk.
Cost-model index refinement threshold: the previously hardcoded threshold
for switching between index and scan has been replaced with a data-driven cost
model.
Index prefetch reuse: data already prefetched during an index lookup is
reused in the refinement phase, avoiding redundant I/O.
Simplified index sidecar filenames in _indexes/{col}/ directories.
DictStore embed disabled by default: embedding a store inside a dict
store is now opt-in (it was error-prone as the default).
Fixed wasm32 issue: a 32-bit platform arithmetic fix for reduce operations.
Chunks never exceed array dimension: compute_chunks_blocks now
guarantees chunk dimensions are capped at the array shape dimension.
max_rows robust to older PyArrow: truncation logic no longer depends on
PyArrow APIs that are absent in older releases.
cratio display: compression ratio is now shown with an explicit x
suffix (e.g. 2.47x) throughout .info output.
Updated bundled C-Blosc2 to the latest release.

Dropped Python 3.10 support

Python 3.11 is now the minimum supported version.

21 May 12:10

FrancescAlted

v4.3.3

8c0e3ad

Release 4.3.3

Changes from 4.3.1 to 4.3.3

Note: 4.3.2 was an internal pre-release that was not published to PyPI.

This is a maintenance release focused on CTable display ergonomics, indexed-query
correctness, and query-planner performance.

CTable display and print options

Pandas-like CTable display by default: str(table) / print(table) now use
a compact, pandas/DuckDB-style table representation, including a displayed
logical row index, numeric alignment, compact spacing, and a trailing footer
such as [726017 rows x 5 columns].
Configurable display options: added blosc2.set_printoptions() and
blosc2.get_printoptions() for CTable rendering. The supported options are
display_index, display_rows, display_precision, and fancy.
CTable.to_string(): added a one-off formatting API for producing CTable
string representations without changing global print options.
Compact truncation for large tables: when a table exceeds the configured
display_rows threshold, only the first five and last five rows are shown,
with an ellipsis row in between.
Float display refinements: compact mode uses pandas-like fixed precision
for floating-point columns, and integer-valued float columns are displayed
with a single decimal place.
Fancy display preserved: set_printoptions(fancy=True) restores the more
decorated display with dtype rows, separator rules, and hidden row/column
counts.

Indexed queries and sorting

Cross-column exact index refinement: multi-column conjunctions can now use
exact positions from a selective indexed column (FULL, PARTIAL, or OPSI)
as a compact pre-filter, then refine the remaining predicates on those
positions instead of scanning the full table.
NaN-safe index boundary navigation: fixed sorted-boundary navigation for
floating-point indexes containing NaN values, so indexed results match scan
results for bucket/full index lookups.
Better index-planner heuristics: the planner now avoids low-value indexed
paths when segment pruning is unlikely to help, and avoids expensive scalar
specialization for non-scalar arrays.
Faster filtered sorting: small filtered views can be materialized and
sorted directly, avoiding an extra gather of sort keys.

Performance and fixes

Avoid full materialization of valid_rows in several CTable code paths.
Keep row counts lazy for views and avoid unnecessary nrows calls in the
query planner.
Reduced overhead in root-column iteration and small query-planner operations.
Fixed dictionary-column capacity handling during Arrow import and a regression
affecting dictionary columns.
Marked additional long-running tests as heavy to reduce default test-suite
runtime.

Documentation

Updated the containers tutorial with dedicated ListArray and CTable
sections, including CTable's columnar storage model and support for columns
backed by NDArray, BatchArray, ObjectArray, ListArray, and related
containers.

19 May 17:38

FrancescAlted

v4.3.1

29a2da1

Release 4.3.1

Changes from 4.3.0 to 4.3.1

This is a maintenance release focused on CTable nested-column ergonomics,
grouped reductions, and API/documentation polish.

CTable nested columns and grouped reductions

Nested column names in group_by() results: grouped output columns can now
preserve dotted/nested names such as trip.sec instead of requiring valid
Python identifiers.
Column-object selectors: CTable.group_by() and CTable.sort_by() now
accept Column objects as well as string names, enabling idioms such as
t.group_by(t.trip.sec) and t.sort_by(t.trip.sec).
Grouped arg reductions: CTableGroupBy now supports argmin() and
argmax(), plus agg({"col": "argmin"}) / agg({"col": "argmax"}).
Results are logical row positions in the grouped table or view; groups with no
non-null values return -1.
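
The semantics described for grouped arg reductions (row positions as results, -1 for groups with no non-null values) can be sketched in pure Python. Representing nulls as `None`, breaking ties in favour of the first row, and the name `grouped_argmin` are assumptions for this illustration.

```python
def grouped_argmin(keys, values):
    """Map each key to the row position of its smallest non-null value, or -1."""
    best = {}
    for pos, (key, value) in enumerate(zip(keys, values)):
        if key not in best:
            best[key] = -1  # group seen, but no non-null value yet
        if value is None:
            continue
        current = best[key]
        if current == -1 or value < values[current]:
            best[key] = pos
    return best


keys = ["x", "y", "x", "z", "y", "z"]
values = [3.0, None, 1.0, None, 2.0, None]
result = grouped_argmin(keys, values)
```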

NDArray constructor ergonomics

blosc2.array(): added a NumPy-like constructor for NDArrays. It mirrors
blosc2.asarray() but defaults to copy=True, so passing an existing
NDArray creates a copy unless copy=False or copy=None is requested.

Documentation

Expanded the CTable reference with RowTransformer, Column.row_transformer,
and CTableGroupBy.argmin / argmax documentation.
Added blosc2.ndarray(), blosc2.dictionary(), and related public schema
factory functions to the Schema Specs reference.
Moved blosc2.group_reduce() into the Reduction Functions reference and
updated its example to use Blosc2 NDArrays.

18 May 16:55

FrancescAlted

v4.3.0

a682951

Release 4.3.0

Changes from 4.2.0 to 4.3.0

CTable: N-dimensional (ndarray) columns

Multidimensional columns: CTable columns can now hold NDArray-backed cells, allowing
each row of a column to contain a full n-dimensional compressed array. This enables
use cases such as embedding vectors, image patches, time-series windows, or any other
multidimensional per-row payload.
CSV and DataFrame import/export: Multidimensional column data can be imported and
exported via CSV and pandas DataFrames, with automatic detection of array-valued cells.
Nullable ndarray columns: Multidimensional columns fully support the nullable
semantics (null_count, sentinel handling, null_policy) already available for scalar
columns.
from_pandas() improvements: CTable.from_pandas() now creates the correct
specialized backing storage for DictionarySpec, ListSpec, VLStringSpec,
VLBytesSpec, and other variable-length scalar specifications.
Improved schema coverage: New CTable timestamp schema type and extended
Column.info output with shape, chunks, and blocks descriptors.
Arg reductions: Added argmin() and argmax() for scalar and ndarray
CTable columns, plus row-transformer support for generated columns such as
per-row peak-hour or dominant-embedding-dimension features.

CTable: Group-by and filtered aggregation

CTable.group_by(): The primary group-by interface. Call
t.group_by("city", sort=True).agg({"qty": "mean"}) to produce a new
CTable with aggregated results. Single-key and multi-key groupings are
supported, along with convenience methods such as .size(), .count(),
.sum(), .mean(), .min() and .max():

```
by_city = t.group_by("city", sort=True)
by_city.size()  # COUNT(*)
by_city.sum("sales")  # SUM(sales) per city
by_city.agg({"sales": ["sum", "mean"]})  # SUM(sales), AVG(sales) per city
```
Performance accelerators: Dedicated Cython fast paths deliver significant speedups:
~25× for float32/64 group-by keys, ~8× for integer and dictionary-code keys, and a
general-purpose hash table for arbitrary float keys.
Filtered aggregate pushdown: The where= parameter is now accepted in aggregation
methods, pushing the filter into the compute engine so that only matching rows are
read and reduced.
Persistent grouped output: Group-by results can be saved directly to persistent
storage via the urlpath= parameter.
blosc2.group_reduce(): New public function that performs group-by reduction over
NDArray instances and CTable columns, with Cython-accelerated backends for common
key/reduction combinations.
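The semantics of a grouped reduction with a pushed-down filter can be sketched in plain Python. This is an illustration only: `group_reduce_sketch` is a hypothetical name, not the signature of blosc2.group_reduce(), and the real implementation works on compressed chunks rather than Python lists.

```python
from collections import defaultdict


def group_reduce_sketch(keys, values, where=None):
    """Group `values` by `keys` and return {key: (count, sum, mean)}.

    `where` is an optional per-row predicate applied before accumulation,
    so filtered-out rows never reach the reduction.
    """
    acc = defaultdict(lambda: [0, 0.0])  # key -> [count, running sum]
    for key, value in zip(keys, values):
        if where is not None and not where(key, value):
            continue
        slot = acc[key]
        slot[0] += 1
        slot[1] += value
    # Sorting the keys mirrors the sort=True option mentioned above.
    return {k: (n, s, s / n) for k, (n, s) in sorted(acc.items())}


city = ["Oslo", "Bergen", "Oslo", "Bergen", "Oslo"]
sales = [10.0, 4.0, 6.0, 8.0, 2.0]

unfiltered = group_reduce_sketch(city, sales)
print(unfiltered)  # {'Bergen': (2, 12.0, 6.0), 'Oslo': (3, 18.0, 6.0)}

filtered = group_reduce_sketch(city, sales, where=lambda k, v: v > 5)
print(filtered)  # {'Bergen': (1, 8.0, 8.0), 'Oslo': (2, 16.0, 8.0)}
```

Applying the predicate inside the accumulation loop is what "pushdown" means here: no intermediate filtered table is built before reducing.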

CTable: Dictionary / categorical columns

DictionarySpec column type: Introduced a new dictionary-encoded (categorical)
column type that stores string or integer codes mapped to a shared dictionary, providing
compact storage and accelerated equality and membership queries.
Dictionary types in where clauses: Dictionary columns can be queried with the same
where= expression syntax as other column types, including nested dotted-name access.
Improved display: CTable printing now adapts to the terminal width, and dictionary
values are shown in their decoded form. Column.info has been extended with type
details, shape, chunks, and blocks.
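As a rough illustration of why dictionary encoding speeds up equality and membership queries, the sketch below encodes a string column into integer codes and queries the codes instead of the strings. The function name and the data are assumptions for the example; it does not reflect how DictionarySpec is implemented internally.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): unique values in first-seen order,
    plus one integer code per row."""
    dictionary, index, codes = [], {}, []
    for v in values:
        code = index.get(v)
        if code is None:
            code = index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(code)
    return dictionary, codes


payment = ["cash", "card", "card", "cash", "voucher", "card"]
dictionary, codes = dictionary_encode(payment)
print(dictionary)  # ['cash', 'card', 'voucher']
print(codes)       # [0, 1, 1, 0, 2, 1]

# Equality query: look the literal up once, then compare small integers.
target = dictionary.index("card")
eq_rows = [i for i, c in enumerate(codes) if c == target]
print(eq_rows)  # [1, 2, 5]

# Membership query: translate the literals into a set of codes first.
wanted = {dictionary.index(v) for v in ("cash", "voucher")}
in_rows = [i for i, c in enumerate(codes) if c in wanted]
print(in_rows)  # [0, 3, 4]

# Display: decode codes back to their values.
decoded = [dictionary[c] for c in codes]
```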

CTable: Nested columns and field-name escaping

Dotted nested column access: Columns whose names contain literal .
(e.g., "root.nested") are now fully addressable via the dotted accessor syntax in
where expressions, __getitem__, and the public API.
Hierarchical _cols storage paths: The internal column storage layout now preserves
a hierarchical structure that mirrors the logical nesting, improving introspection
and interop.
Nested-field pipeline: A new flattened-storage pipeline with logical mapping
preserves nested schema structure (field names, types, and hierarchy) through
Arrow and Parquet import/export. For unnamed top-level list<struct<...>> Parquet
files, the logical schema round-trips faithfully, though the original physical row
grouping is intentionally not preserved.
Field-name escaping: Special characters (. and /) in column names are
automatically escaped during schema construction and metadata round-trips.
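The notes do not say which escaping scheme is used, so the following is only a sketch of one reversible possibility: percent-style escaping of ".", "/" and the escape character itself. All function names here are hypothetical and the actual on-disk encoding in python-blosc2 may differ.

```python
def escape_field(name):
    """Escape characters that would clash with path or dotted-name separators."""
    # Escape "%" first so the encoding stays reversible.
    return name.replace("%", "%25").replace(".", "%2E").replace("/", "%2F")


def unescape_field(escaped):
    """Inverse of escape_field()."""
    return escaped.replace("%2F", "/").replace("%2E", ".").replace("%25", "%")


def storage_path(parts):
    """Join logical field names into a hierarchical storage path."""
    return "/".join(escape_field(p) for p in parts)


def logical_parts(path):
    """Recover the logical field names from a storage path."""
    return [unescape_field(p) for p in path.split("/")]


parts = ["root", "a.b", "x/y"]
path = storage_path(parts)
print(path)                 # root/a%2Eb/x%2Fy
print(logical_parts(path))  # ['root', 'a.b', 'x/y']
```

Because separators never appear unescaped inside a single field name, splitting the path on "/" recovers the hierarchy unambiguously.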

Parquet import/export improvements

Arrow serializer by default: CTable.from_parquet() now defaults to the Arrow
serializer, providing better schema fidelity and nested-type support.
Progress reporting: A --progress flag and an ETA estimator have been added to
the parquet-to-blosc2 CLI for long-running imports.
--max-rows parameter: CTable.from_parquet() and the CLI now accept max_rows
to limit the number of imported rows.
--timestamp-unit: New CLI option to control timestamp unit conversion on import.
--float-trunc-prec: New CLI option to truncate floating-point precision on import.
Separated nested columns enabled by default: The separate_nested_cols flag is now
True by default for both the Python API and the CLI, ensuring nested Arrow structs
are always expanded into flat columns.
list_serializer parameter: New option to control how list-type columns are
serialized, with sensible defaults for different list layouts.
Validation optimizations: Arrow datetime values are validated only during import,
reducing runtime overhead on subsequent operations.
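The notes do not spell out what --float-trunc-prec does internally. Assuming it keeps a fixed number of mantissa bits and zeroes the rest (the usual precision-truncation approach for making floating-point data more compressible), the effect on a single float64 can be sketched as follows. The helper name and the `keep_bits` parameter are illustrative, not the CLI's interface.

```python
import struct


def truncate_float64(x, keep_bits):
    """Zero all but the `keep_bits` most significant mantissa bits of a float64."""
    if not 0 <= keep_bits <= 52:
        raise ValueError("keep_bits must be in [0, 52]")
    # Reinterpret the float's 8 bytes as an unsigned 64-bit integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    drop = 52 - keep_bits  # float64 has 52 explicit mantissa bits
    mask = ~((1 << drop) - 1) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack("<d", struct.pack("<Q", bits & mask))[0]


print(truncate_float64(3.141592653589793, 10))  # 3.140625
print(truncate_float64(1.75, 1))                # 1.5
print(truncate_float64(1.5, 52))                # 1.5 (nothing dropped)
```

The truncated values end in long runs of zero bits, which is what lets a compressor shrink them further at the cost of precision.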

TreeStore: Inline CTable support

CTables inside TreeStore: CTable objects can now be stored inline as items
inside a TreeStore, enabling hierarchical storage that mixes arrays and tables in a
single persistent container.
Cache hardening: TreeStore cache assignments now use defensive copies and cache
effective object roots to avoid aliasing and stale-cache errors.
Examples and tutorials: New tutorials and docstring examples demonstrate how to
store, retrieve, and query CTables within a TreeStore.

Performance and usability enhancements

Faster open and import: blosc2.open() and store constructors now assume valid
file extensions and defer column metainfo loading, making CTable.open() and
package import noticeably faster.
CTable.nrows is now lazy: The row count is computed on demand rather than eagerly,
speeding up open and schema-inspection workflows.
Accelerated scalar and small-slice access: The batch/list path for reading scalar
values or small column slices has been overhauled, eliminating internal placeholder
materialization and yielding lower latency.
Late-import optimizations: Heavy optional dependencies are imported lazily at the
blosc2 package level, reducing the baseline import blosc2 overhead.
iter_arrow_batches() optimization: Avoids full Python object materialization of
batches during iteration, reducing memory pressure.
NDArray-to-list conversion: Small optimization when converting NDArray objects
to Python lists.
_last_pos invalidation skipped: Mid-table deletes no longer eagerly invalidate
cached positional state, improving delete latency.
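The lazy row count follows a common pattern: defer an expensive computation until first use, then cache it. A minimal sketch with a toy class is shown below; `LazyTable` and its liveness mask are assumptions for illustration and not the CTable implementation.

```python
from functools import cached_property


class LazyTable:
    """Toy table: opening it is cheap, counting live rows is deferred."""

    def __init__(self, valid_mask):
        self._valid_mask = valid_mask  # hypothetical per-row liveness flags
        self.count_calls = 0           # instrumentation for the example

    @cached_property
    def nrows(self):
        # Runs only on first access; the result is cached on the instance.
        self.count_calls += 1
        return sum(self._valid_mask)


t = LazyTable([True, False, True, True])
print(t.count_calls)  # 0: nothing has been counted at open time
print(t.nrows)        # 3
print(t.nrows)        # 3, served from the cache
print(t.count_calls)  # 1
```

A real table would also need to invalidate the cached count after appends or deletes, which this sketch omits.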

Documentation, examples and benchmarks

API reference expanded: blosc2.group_reduce() has been added to the Sphinx
reference, along with updated CTable, Column, and TreeStore pages.
New tutorials and examples: Added sections on CTable–TreeStore integration,
nested fields, dictionary columns, aggregates, grouping and querying with where=.
New benchmarks: Graph benchmarks for CTable insert time, column count, memory usage,
and where= queries, plus dedicated group-by, nested-filter, and Parquet round-trip
benchmarks.

Fixes and compatibility

Null and NaN handling: NumPy scalar null sentinels are now normalized to plain Python
scalars, and floating-point NaN sentinels are treated consistently with Python
float('nan').
Empty aggregate results: Filtered aggregations that produce no rows now handle the
empty result gracefully.
Generated column safety: Accessing a stalled (unfillable) generated column now raises
a clear exception instead of producing undefined results.
Miniexpr bundling: Miniexpr’s bundled libtcc and related runtime files are now
kept inside the blosc2 package, avoiding conflicts with other TCC installations.
Test improvements: Torch-dependent tests are marked as heavy, PyArrow-optional
tests are skipped when the library is absent, and parametrization matrices have been
trimmed to reduce CI time.
Missing Cython validation: Added validation guards for several Cython extension
functions that previously lacked explicit error checking.
C-Blosc2 update: Bundled C-Blosc2 has been updated to version 3.0.3.
blosc2.open() default mode: The default mode has changed from 'a' to 'r', and the
FutureWarning that announced this transition has been removed.
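The NaN sentinel fix addresses a classic pitfall: NaN never compares equal to itself, so a naive equality test against a NaN sentinel finds no nulls. The sketch below shows one way to treat NaN sentinels consistently; `is_null` and `null_count` are hypothetical helpers, not blosc2 functions.

```python
import math


def is_null(value, sentinel):
    """True when `value` matches the null sentinel, treating NaN as equal to NaN."""
    if isinstance(sentinel, float) and math.isnan(sentinel):
        return isinstance(value, float) and math.isnan(value)
    return value == sentinel


def null_count(values, sentinel):
    return sum(is_null(v, sentinel) for v in values)


nan = float("nan")
readings = [1.0, nan, 2.5, nan]

naive = sum(v == nan for v in readings)
print(naive)                        # 0: NaN != NaN, so plain equality misses nulls
print(null_count(readings, nan))    # 2
print(null_count([3, -1, 7], -1))   # 1: ordinary sentinels still use equality
```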

07 May 11:38

FrancescAlted

v4.2.0

81a9c09

Release 4.2.0

Changes from 4.1.2 to 4.2.0

CTable: columnar compressed tables

Introduced blosc2.CTable, a new columnar table container for compressed, typed columns. CTables support dataclass- and schema-based construction, row iteration, column access, table views, head() / tail() / sample(), sorting, selection and compact where expressions.
Added persistent CTables backed by TreeStore, with support for blosc2.open(), CTable.open(), CTable.load(), CTable.save(), CTable.to_b2d() and CTable.to_b2z(). CTable views can be saved too, and .b2z/.b2d path handling has been tightened.
Added mutation operations for CTables, including append(), extend(), delete(), compact(), add_column(), drop_column(), rename_column() and related schema validation.
Added computed columns, including virtual computed columns backed by lazy expressions, materialized computed columns and automatic filling of materialized computed columns during inserts.
Added CTable indexing support, including persistent indexes, direct expression indexes, ordered index reuse, boolean LazyExpr/NDArray masks in CTable.__getitem__, iter_sorted() and indexing support for .b2z tables.
Added nullable schema support and null policies for CTable scalar columns, preserving nullable scalar Parquet round-trips.
Added variable-length CTable column support via ListArray / ObjectArray, including vlstring and vlbytes schema specs, fixed-length string/bytes import support and list/struct Arrow/Parquet round-trips.
Added Arrow, Parquet and CSV interoperability for CTables, including batch-wise Arrow/Parquet import/export, Arrow schema metadata preservation, CTable.from_arrow_batches() improvements and a new parquet-to-blosc2 CLI utility.
Added CTable documentation, tutorials, examples and benchmarks covering schema definition, persistence, querying, indexing, mutations, nullable columns, computed columns and variable-length columns.
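The difference between virtual and materialized computed columns can be sketched with a toy columnar table. This is a conceptual illustration only: `ToyTable` and its methods are invented for the example and do not mirror the CTable API. Computed columns are assumed to be declared while the table is still empty.

```python
class ToyTable:
    """Columnar toy table with stored, materialized and virtual columns."""

    def __init__(self, *names):
        self.stored = {n: [] for n in names}  # column name -> list of values
        self.materialized = {}                # column name -> row function
        self.virtual = {}                     # column name -> row function

    def add_materialized(self, name, fn):
        self.materialized[name] = fn
        self.stored[name] = []                # values are physically stored

    def add_virtual(self, name, fn):
        self.virtual[name] = fn               # nothing is stored

    def append(self, **row):
        for name, fn in self.materialized.items():
            row[name] = fn(row)               # filled automatically on insert
        for name, value in row.items():
            self.stored[name].append(value)

    def column(self, name):
        if name in self.virtual:              # evaluated lazily on each read
            fn = self.virtual[name]
            nrows = len(next(iter(self.stored.values())))
            rows = ({k: v[i] for k, v in self.stored.items()} for i in range(nrows))
            return [fn(r) for r in rows]
        return self.stored[name]


t = ToyTable("price", "qty")
t.add_materialized("total", lambda r: r["price"] * r["qty"])
t.add_virtual("expensive", lambda r: r["total"] > 20)
t.append(price=2.5, qty=4)
t.append(price=10.0, qty=3)
print(t.column("total"))      # [10.0, 30.0]
print(t.column("expensive"))  # [False, True]
```

The trade-off is the usual one: materialized columns cost storage but read quickly, while virtual columns cost nothing to store and are recomputed on access.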

Indexing and ordering

Added a new indexing subsystem for NDArrays and CTables, including full, partial/bucket, light/medium and OPSI-style index kinds, out-of-core index builders and sidecar storage.
Added blosc2.Index as the unified public index handle, plus APIs such as create_index(), compact_index(), iter_sorted(), will_use_index() and related query explanation support.
Added materialized expression indexes for NDArrays and direct expression indexes for CTables.
Added persistent query-result caching for indexed lookups, with FIFO pruning and cache accounting.
Added blosc2.argsort() and refactored indexing APIs around explicit index enums and sorting helpers.
Improved indexed query performance with Cython accelerators, threaded chunk batching, zero-copy/cached mmap reads, chunk-aware and reduced-order layouts and faster scattered row gathering.
Reduced memory usage during index creation and lookup by avoiding full sidecar materialization, replacing memmap staging with Blosc2 scratch arrays and adding tmpdir support for full out-of-core indexes.
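FIFO pruning with cache accounting can be illustrated with a small in-memory cache. The class below is a sketch under stated assumptions: the name, the byte budget and the in-memory storage are all hypothetical, whereas the cache described above is persistent.

```python
from collections import OrderedDict


class FifoResultCache:
    """Bounded cache that evicts the oldest inserted entry first."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0            # running accounting of stored sizes
        self._entries = OrderedDict()  # query key -> (result, size)

    def get(self, key):
        entry = self._entries.get(key)
        # FIFO, unlike LRU, does not reorder entries on a hit.
        return None if entry is None else entry[0]

    def put(self, key, result, size):
        if key in self._entries:       # replacing: release the old size first
            self.used_bytes -= self._entries.pop(key)[1]
        self._entries[key] = (result, size)
        self.used_bytes += size
        while self.used_bytes > self.max_bytes and self._entries:
            _, (_, old_size) = self._entries.popitem(last=False)
            self.used_bytes -= old_size


cache = FifoResultCache(max_bytes=10)
cache.put("a > 5", [1, 4], size=4)
cache.put("b == 2", [0], size=4)
cache.get("a > 5")                     # a hit does not protect the entry
cache.put("c < 0", [3, 7, 9], size=4)  # over budget: oldest entry is pruned
print(cache.get("a > 5"))   # None
print(cache.get("b == 2"))  # [0]
print(cache.used_bytes)     # 8
```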

Persistence, stores and serialization

Added structured Blosc2 serialization based on b2object carriers, including persisted C2Array, LazyExpr and DSL LazyUDF objects.
Added blosc2.Ref for serializing external references, plus examples for b2object bundles and persisted expressions/UDFs.
Added blosc2.load() as a convenience loader.
Added vlmeta support to LazyArray objects.
Improved store handling by preserving lazy b2object carriers in DictStore, allowing reopened proxies to refill caches after read-only opens, relaxing DictStore/TreeStore suffix requirements and adding DictStore.to_b2d().
Accelerated blosc2.open() by trying standard opens first and warning on implicit append mode.

Arrays, computation and containers

Added ObjectArray for fully general object data and renamed the earlier VLArray work accordingly; added ListArray docstrings and Arrow integration improvements.
Added schema helpers including numeric specs, blosc2.struct() and blosc2.object() for nested/fully general column declarations.
Improved fromiter() with direct chunked construction and substantially lower peak memory use.
Improved asarray() behavior for NDArray inputs when copy-inducing keyword arguments are supplied.
Added SChunk.reorder_offsets().
Improved BatchArray defaults and documentation; the default compression level is now tuned for faster lookup/scan behavior.
Continued matmul/linalg optimization work and shared-thread-pool integration.

CLI, docs and examples

Added the parquet-to-blosc2 command with options such as --max-rows, --parquet-batch-size, --blosc2-items-per-block and --use-dict.
Added new CTable, ObjectArray, BatchArray, containers, indexing and serialization tutorials and examples.
Reorganized and expanded the API reference for CTable, Column, schema specs, Index, save/load helpers and miscellaneous APIs.
Updated benchmark suites for CTables, indexing, Parquet import/export, BatchArray and NDArray construction/indexing.

Fixes and compatibility

Updated bundled C-Blosc2 to v3.0.2 and require C-Blosc2 >= 3.0.0 when building against a system library.
Updated bundled C-Blosc2 and miniexpr sources multiple times.
Restored compatibility with NumPy < 2.
Fixed Windows and mmap/file-locking issues in index creation, rebuilds and temporary file cleanup.
Fixed full-index query failures for large CTable columns and full out-of-core merge failures on systems with small /tmp.
Fixed stale sidecar/cache reuse and targeted cache invalidation when persistent sidecars are replaced.
Fixed .b2z double-open corruption caused by GC-triggered repacking and made temporary .b2z unpacking default to the source file directory.
Fixed a regression when reopening persisted proxies in read-only mode.
Fixed GC-induced thread hangs on macOS with Python 3.14 and hardened async chunk reading/cache cleanup paths.
Fixed lazy-chunk source-size handling in decode/getitem callers.
Fixed nullable validation, dictionary extend validation, CTable close propagation, print alignment and NumPy mask support.
Fixed arange() regressions and several pre-existing set_slice error-handling issues.
Clamped indexing/thread defaults for wasm32.
