Releases: dfa1/vortex-java

v0.11.0

@github-actions github-actions released this 28 Jun 12:27
Immutable release. Only release title and notes can be modified.

SQL over Vortex grows a compute layer: a Calcite WHERE-filtered SUM/COUNT/MIN/MAX is now answered from the zone-map statistics — folding the chunks the predicate fully selects without decoding them and decoding only the one or two chunks its range cuts through (ADR 0013 §6 / ADR 0018 boundary tier). On a 100-chunk file a SELECT SUM(x) WHERE id BETWEEN … answers ~12× faster than a full scan when the range is wide. Plus the security hardening of the untrusted-input parse paths (ADR 0003) and a vortex.zstd binding bump.

Added

  • VortexCalcite.connect(schemaName, tables) opens a Calcite JDBC connection with the VortexSchema registered in one call, folding away the DriverManager / unwrap / getRootSchema().add(...) boilerplate. It wires the Babel SQL parser, so columns whose names are reserved words (close, open, value, year, …) are queryable unquoted (select close, open from vtx.ohlc), which columnar files routinely need; only the typed-literal keywords (date, time, timestamp, interval) still require back-tick quoting. (24b64b32)
  • The Calcite aggregate push-down rule now auto-registers over a bare jdbc:calcite: connection: a VortexTable translates to a VortexTableScan whose register() installs the rules when the planner first sees it, so SELECT MIN/MAX/COUNT/SUM over a VortexSchema is answered from zone-map statistics with no caller wiring (previously the rule had to be attached to the planner by hand). AVG joins them — AggregateReduceFunctionsRule reduces it to SUM/COUNT, both of which push down, so a whole-table AVG also decodes no data segment. Column projection and WHERE chunk-skip push-down are unchanged. (24b64b32)
  • The Calcite aggregate push-down rule now rewrites a whole-table SUM(col) to a single-row Values computed from the zone-map table (via VortexTable.zoneSum), so SELECT SUM(col) answers metadata-only with no data segment decoded — joining the existing MIN/MAX/COUNT push-down. SUM over an all-null (or empty) column answers the SQL NULL, and the rule abandons to a normal scan when a zone carries no usable sum (no zone map, or an overflowed zone). (24b64b32)
  • A Calcite WHERE-filtered SUM/COUNT/MIN/MAX is now answered from zone-map statistics when the predicate partitions the chunks cleanly — each chunk wholly matches or wholly fails it — folding only the kept chunks' stats into a scan-free result. A predicate that cuts through a chunk (the typical selective range) still falls back to the zone-map-pruned scan. (32cc4a29)
  • A WHERE-filtered aggregate whose range cuts through a chunk no longer falls back to a full scan: the interior chunks still fold from the zone-map stats, and each straddling boundary chunk is decoded on its own and reduced under a row-level filter, so SELECT SUM(volume) WHERE close BETWEEN 100 AND 200 decodes only the one or two boundary chunks instead of every surviving chunk. The push-down still abandons to the (correct) scan for an unsigned or floating-point filter column, a non-numeric SUM, or a missing zone map. (f89b5b69)
  • VortexReader.decodeChunk(chunkIndex, columns) and chunkCount() decode a single chunk for a chosen subset of columns in isolation, rather than streaming the whole file — the returned Chunk owns its memory and is valid until closed. (084a0133)
  • ScanIterator.columnZoneStats(column) surfaces per-zone min/max/sum/null-count from a column's vortex.stats zone-map table without decoding any data segment — the read side of aggregate push-down (ADR 0013 §6). ArrayStats gains a sum component, decoded from the zone-map table (where the Rust reference stores it too), so the Calcite adapter now answers SUM/AVG metadata-only when every zone carries a sum, falling back to a streaming scan only for columns without a zone map. (05dd9204)
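The boundary-tier idea in the entries above can be sketched independently of the library. The following is a minimal illustration, not the adapter's code: ZoneSumSketch, Zone, and ChunkDecoder are hypothetical names, the filter column is assumed to be a signed integer column without nulls, and overflow of the running total is ignored.

```java
/** Sketch of a WHERE-filtered SUM answered from per-chunk zone statistics. */
final class ZoneSumSketch {

    /** Hypothetical per-chunk stats: min/max of the filter column, sum of the summed column. */
    record Zone(long filterMin, long filterMax, long valueSum) {}

    /** Hypothetical callback that decodes one chunk and sums rows with lo <= filter <= hi. */
    interface ChunkDecoder {
        long filteredSum(int chunkIndex, long lo, long hi);
    }

    static long sumBetween(Zone[] zones, long lo, long hi, ChunkDecoder decoder) {
        long total = 0;
        for (int i = 0; i < zones.length; i++) {
            Zone z = zones[i];
            if (z.filterMax() < lo || z.filterMin() > hi) {
                continue;                                  // chunk wholly fails the predicate: skip it
            }
            if (z.filterMin() >= lo && z.filterMax() <= hi) {
                total += z.valueSum();                     // chunk wholly matches: fold the stat, no decode
            } else {
                total += decoder.filteredSum(i, lo, hi);   // range cuts through: decode this chunk only
            }
        }
        return total;
    }
}
```

With chunks sorted on the filter column, at most the two chunks containing the range endpoints reach the decoder; every interior chunk contributes its precomputed sum.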

Changed

  • Bumped io.github.dfa1.zstd (the vortex.zstd FFM bindings, pinned by the BOM) 0.3 → 0.6, which ships smaller jars (native debug symbols stripped). (677c2cf7, 6dcdbe94, fec0a0d3)
  • Bumped Apache Calcite (the SQL adapter's engine) 1.40 → 1.42. (2f9f02c6)

Security

  • DType-tree and array-node decoding are now depth-capped (64, matching the layout-tree guard): a crafted or self-referential FlatBuffer surfaces as a VortexException instead of a StackOverflowError — which, being an Error, previously escaped sanitization and leaked the reader's memory-mapped Arena. (93f8d5f4, 428026d3)
  • The HTTP reader validates footer segmentSpecs against the file size before any Range request is built from them, matching the local-file path. (1d8ddebc)
  • vortex.zstd decode bounds-checks each frame's declared uncompressed size and overflow-checks the total before allocating, and range-checks VarBin length prefixes — a crafted payload can no longer under-allocate or read out of bounds. (2df4e3a7, adc445e8)
  • The HTTP reader parses the server-controlled Content-Range header and slices the tail buffer defensively, so a malformed response yields a VortexException rather than a raw NumberFormatException/IndexOutOfBoundsException. (feac99b7)
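The depth cap from the first Security entry can be illustrated with a small recursive walk. This is a sketch under assumptions: DepthCapSketch, Node, and DecodeException are hypothetical stand-ins for the decoded tree and for VortexException; only the cap value of 64 comes from the note above.

```java
/** Sketch of depth-capped recursive decoding over an untrusted tree. */
final class DepthCapSketch {
    static final int MAX_DEPTH = 64;   // cap value quoted in the release note

    /** Hypothetical domain exception, standing in for the library's own type. */
    static final class DecodeException extends RuntimeException {
        DecodeException(String message) {
            super(message);
        }
    }

    /** Hypothetical tree node as it might come out of an untrusted buffer. */
    record Node(Node[] children) {}

    static int countNodes(Node node, int depth) {
        if (depth > MAX_DEPTH) {
            // Fail with an ordinary exception long before the JVM stack runs out.
            throw new DecodeException("nesting deeper than " + MAX_DEPTH);
        }
        int count = 1;
        for (Node child : node.children()) {
            count += countNodes(child, depth + 1);
        }
        return count;
    }
}
```

The point of the cap is that a catch block written for Exception never sees a StackOverflowError, so cleanup tied to that handler would be bypassed; a checked depth turns the same input into a catchable failure.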

v0.10.0

@github-actions github-actions released this 26 Jun 18:33
Immutable release. Only release title and notes can be modified.

A vortex.zstd overhaul: compression now runs through FFM bindings to the native libzstd, gaining framed (sliceable) payloads, nullable-column support, and shared-dictionary decode. Alongside it, zone-map pruning is fixed to compare in the column's type domain.

Added

  • DType.isUnsigned(): true for the unsigned integer primitives (U8 through U64), false otherwise. (#159)
  • The vortex.zstd encoder now writes nullable columns (primitive and utf8/binary): null positions are stripped before compression and validity is emitted as a Bool child, matching the Rust reference layout. When vortex.zstd is the configured encoder, nullable primitive columns route to it directly instead of being wrapped in vortex.masked.
  • new ZstdEncodingEncoder(valuesPerFrame) splits the payload into independently compressed frames of valuesPerFrame values each (one ZstdFrameMetadata per frame), letting a slice scan decompress only the frames overlapping its row range. The no-arg constructor still emits a single frame. (#170)
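To see why fixed-size frames make a slice scan cheap, the frames overlapping a row range can be computed with two integer divisions. The sketch below is illustrative only: FrameRangeSketch and framesFor are hypothetical names, and it assumes every frame except possibly the last holds exactly valuesPerFrame values.

```java
/** Sketch: which fixed-size frames overlap the half-open row range [start, end)? */
final class FrameRangeSketch {

    /** Returns {firstFrame, lastFrameInclusive}. */
    static int[] framesFor(long start, long end, int valuesPerFrame) {
        if (valuesPerFrame <= 0 || start < 0 || end <= start) {
            throw new IllegalArgumentException("invalid range or frame size");
        }
        int first = Math.toIntExact(start / valuesPerFrame);
        int last = Math.toIntExact((end - 1) / valuesPerFrame);  // end is exclusive
        return new int[] {first, last};
    }
}
```

For example, with 1000 values per frame, rows 2500 up to (but not including) 3100 touch only frames 2 and 3, so the remaining frames never need to be decompressed.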

Changed

  • The vortex.zstd encoding now compresses and decompresses through io.github.dfa1.zstd:zstd (FFM bindings to the native libzstd) instead of io.airlift:aircompressor-v3. Consumers of vortex.zstd declare a single dependency, io.github.dfa1.zstd:zstd-platform, which transitively brings the zstd binding plus the native libzstd for every supported platform (replacing the former per-platform zstd-native-<platform> artifacts).

Fixed

  • vortex.zstd segments compressed with a shared (trained) dictionary now decode, via the native libzstd dictionary support, instead of being rejected. The upstream zstd.vortex compatibility fixture is read end-to-end and matches the Rust reference. (#104)
  • Writing a nullable Utf8/Binary column no longer throws NullPointerException (or silently drops nulls): nullable string columns now carry their validity like nullable primitives and round-trip through vortex.masked. As a result they decode as MaskedArray (validity + values child) rather than a bare VarBinArray. (#168)
  • CSV export now handles nullable columns (MaskedArray): null rows export as an empty field instead of failing with "unsupported array type for CSV export". (#168)
  • Zone-map pruning now compares filter values in the column's type domain rather than by the boxed value's type. A predicate whose value is boxed at a different width (e.g. Integer on an I64 column) — or any value on a U64 column — previously pruned nothing and silently degraded to a full scan; it now prunes correctly (unsigned columns by unsigned order). As part of this, a filter value genuinely incomparable to its column (e.g. a String against a numeric column) now raises VortexException during the scan instead of silently disabling pruning — a behaviour change for callers that relied on the previous silent full scan. (#159)
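The type-domain comparison described in the last entry can be shown in a few lines. This is a sketch, not the reader's implementation: UnsignedPruneSketch and its method are hypothetical, the column is assumed to be 64 bits wide, and the filter value is assumed to be non-negative.

```java
/** Sketch: compare a zone's max against a filter value in the column's own domain. */
final class UnsignedPruneSketch {

    /** True when a zone with the given max may hold rows matching "column > filterValue". */
    static boolean zoneMayMatchGreaterThan(long zoneMax, Number filterValue, boolean unsignedColumn) {
        // Widen Integer/Short/... to the column width instead of comparing boxed types.
        long bound = filterValue.longValue();
        int cmp = unsignedColumn
                ? Long.compareUnsigned(zoneMax, bound)   // U64: bit pattern read as unsigned
                : Long.compare(zoneMax, bound);          // I64: ordinary signed order
        return cmp > 0;
    }
}
```

A U64 maximum of 2^64 - 1 is stored as the bit pattern of -1L; signed order would place it below 10 and prune the zone wrongly, while unsigned order keeps it. Widening first also matters because Integer.valueOf(10).equals(Long.valueOf(10)) is false in Java.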

v0.9.0

@github-actions github-actions released this 24 Jun 19:41
Immutable release. Only release title and notes can be modified.

Two import-only breaking changes — the vortex-core types moved under io.github.dfa1.vortex.core.*, and the no-arg DType factories became constants. In return, Vortex now ships with no FlatBuffers or Protobuf runtime dependency: the .fbs/.proto schemas compile in-house to MemorySegment-native Java, dropping com.google.flatbuffers:flatbuffers-java — the last automatic-module dependency — so a named JPMS module-info is viable, and the generated wire classes are prefixed so they no longer collide on your classpath (ADR 0017).

Added

  • Canonical non-nullable DType constants: DType.I8 through I64, U8 through U64, F16/F32/F64, plus BOOL, UTF8, BINARY, NULL, VARIANT; build a nullable column with DType.I64.asNullable(). (f4b22e42)

Changed

  • Breaking (imports): every vortex-core type moved under io.github.dfa1.vortex.core.*: core.model (DType, PType, TimeUnit, EncodingId, ExtensionId, Time*Dtype), core.io (IoBounds, PTypeIO, VortexFormat), core.error (VortexException), core.compute (FastLanes, PrimitiveArrays), core.fbs/core.proto (wire codecs). E.g. io.github.dfa1.vortex.core.DType → io.github.dfa1.vortex.core.model.DType. (52f30c16)
  • Dropped the com.google.flatbuffers:flatbuffers-java runtime dependency; the .fbs/.proto schemas compile in-house to MemorySegment-native Java, and the generated wire classes are prefixed Fbs*/Proto* so the generic names (Array, Buffer, DType, …) no longer collide on your classpath (ADR 0017). (5907302e)

Removed

  • Breaking (imports): the no-arg DType factories (DType.i64(), DType.utf8(), …). Use the constants above instead (DType.i64() → DType.I64). DType.decimal(..)/DType.structBuilder() and the record constructors are unchanged. (f4b22e42)

v0.8.3

@github-actions github-actions released this 23 Jun 07:01
Immutable release. Only release title and notes can be modified.

A Sonar-driven refactoring release: no new file-format capability, but a focused pass using SonarCloud findings to drive cleanups — dead code removed, duplication factored out, and one hot-loop micro-optimisation. Each finding was triaged (lead, not verdict) so the changes preserve behaviour and the JIT vectorisation of the hot decode loops. The interpretation framework behind this is now documented in docs/testing.md.

Performance

  • FastLanes.transposeIndex / iterateIndex: replaced the per-element % and / arithmetic plus ORDER[] indirection with permutation tables built once in a static initialiser. Faster address generation keeps more outstanding scatter misses in flight; measured 1.4×–3.4× on the transpose/undelta kernels (Apple M5, L1→DRAM working sets). The per-element decode loops stay specialised per width to preserve C2 superword vectorisation. (089b6e36, e683a634)
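The table-driven approach can be sketched as follows. This is not the project's code: TransposeTableSketch is a hypothetical name, and the index formula (16 lanes, 8 orders, 8 rows over a 1024-element block, with the order array shown) is an assumption based on the published FastLanes transposed layout rather than something stated in this note.

```java
/** Sketch: replace per-element % and / index arithmetic with a table built once. */
final class TransposeTableSketch {
    // Assumed FastLanes ordering constants; not taken from this release note.
    private static final int[] ORDER = {0, 4, 2, 6, 1, 5, 3, 7};

    static final int[] TABLE = new int[1024];

    static {
        // Pay for the divisions and the ORDER[] lookup once, at class initialisation.
        for (int i = 0; i < TABLE.length; i++) {
            TABLE[i] = compute(i);
        }
    }

    /** The per-element formula the table replaces. */
    static int compute(int i) {
        int lane = i % 16;
        int order = (i / 16) % 8;
        int row = i / 128;
        return lane * 64 + ORDER[order] * 8 + row;
    }

    /** Hot-path lookup: one array load instead of three divisions and an indirection. */
    static int transposeIndex(int i) {
        return TABLE[i];
    }
}
```

Because the mapping is a fixed permutation of 0..1023, the table costs 4 KiB and stays cache-resident, which is what makes the lookup cheaper than recomputing the address for every element.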

Removed

  • Breaking (read SPI): removed EncodingDecoder.accepts(DType). It was a residual of the ADR-0001 read/write split — encode-selection semantics copied onto the decoder side, where the reader dispatches purely by EncodingId and never called it (dead since the split). EncodingEncoder.accepts is unchanged. Downstream custom EncodingDecoder implementations should delete their accepts override. (7516a544)

Changed

  • Internal dedup driven by Sonar duplication findings: extracted the shared FastLanes layout + PType.bits and PrimitiveArrays.toLongs/fromLongs into core, hoisted the Materialized* array boilerplate into a shared base, factored the four BitpackedEncodingDecoder unpack loops onto one precomputed per-row schedule, added PType.isUnsigned (dropping three private copies), and deduplicated the CLI inspect plumbing and formatBytes. (ec6b9631, a74263c0, 7af0af2a, 8362a353, 87c77cc9, d8f84088, b557e573, d52e8c0c)
  • Dropped dead PType switch arms in the writer's readPrimitiveElement, primitiveArrayLen, and buildTypedUniqueArray — unreachable branches flagged as uncovered. (4c6ab149, 94d2fa49, f89072a6)

Fixed

  • Cleared two SonarCloud-reported bugs in the writer's SUM zone-map stat plumbing. (33798ab9)
  • Suppressed java:S1172 on AbstractMaterializedArray.materialize with a reason — the arena parameter is contractual (implements Array#materialize(SegmentAllocator) for the leaf classes), not a removable unused parameter. (9b226f73)

Tests

  • Filled coverage gaps surfaced by Sonar: the Materialized* materialize defaults, every SchemaCommand.formatDType arm, and the writer's global-dict cardinality fallback with U16 utf8 codes. (8741dad3, 77fad504, c2918eaa)

Docs

  • docs/testing.md: new section on reading Sonar/PIT as data — the uncovered-line triage (missing-test / dead-code / defensive-by-contract), why mutation testing splits what coverage cannot, and when duplication is the deliberate price of the hot-loop rule. (8999661b)

v0.8.2

@github-actions github-actions released this 22 Jun 19:03
Immutable release. Only release title and notes can be modified.

The headline is writer-side zone-map statistics: the writer now emits vortex.stats (zoned) layouts carrying per-chunk MIN/MAX, NULL_COUNT, and SUM — matching the Rust reference — so zone-map chunk pruning and aggregate push-down work on Java-written files (previously the reader could decode these stats but the writer never produced them). The release also continues the test-hardening track: the lowest-covered encoder/decoder paths are filled in, SonarCloud new-code coverage is back to 100% with the quality gate green (overall ~83%, all ratings A, zero bugs/vulnerabilities), and the build toolchain is refreshed across eight dependency bumps.

Added

  • Writer: vortex.stats (zoned) layout emission, toggled by WriteOptions.enableZoneMaps. Each column is wrapped with a per-zone (one zone per chunk) statistics table; the stat set follows the Rust reference exactly. (838dba82, f2d74351)
  • Writer: per-zone MIN/MAX for primitive columns including F16, extension columns (over their storage primitive), Utf8 columns (full string bounds), and dictionary-encoded columns (computed on the logical values, independent of the dict encoding). (838dba82, fb5d096a, 38ab5c51, c1198253, e51da936)
  • Writer: per-zone NULL_COUNT for every column type. (135c9b37, c52d4b83, ab233b86)
  • Writer: per-zone SUM for numeric primitive columns (signed → i64, unsigned → u64, float → f64; integer overflow records a null sum). Matches Rust, which sums numeric primitives and decimals but not Utf8/extension columns. (9661f554)
  • Reader: RowFilter.isNull / RowFilter.isNotNull predicates with zone-map chunk pruning — IS NULL skips chunks with zero nulls, IS NOT NULL skips all-null chunks — via the per-chunk null_count. (2749b6ca)
  • Reader: columnStats() aggregates null_count across a column's chunks (reported only when every chunk carries one). (cb844f23)
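The "integer overflow records a null sum" rule for signed columns can be sketched with Math.addExact. ZoneSumOverflowSketch is a hypothetical name and this is not the writer's code; note that the sketch also reports null when only an intermediate partial sum overflows, which may or may not match the library's behaviour.

```java
/** Sketch: per-zone i64 SUM that records "no usable sum" instead of wrapping around. */
final class ZoneSumOverflowSketch {

    /** Returns the exact sum of the zone, or null if the running total overflows a long. */
    static Long zoneSum(long[] values) {
        long sum = 0;
        try {
            for (long v : values) {
                sum = Math.addExact(sum, v);   // throws ArithmeticException on overflow
            }
        } catch (ArithmeticException overflow) {
            return null;                       // a null stat tells readers to fall back to a scan
        }
        return sum;
    }
}
```

Recording null rather than a wrapped value is what lets a reader trust every non-null zone sum when folding them into an aggregate.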

Changed

  • Reader: the shared default HttpClient behind VortexHttpReader.open(URI, ReadRegistry) is now a package-private non-final field used purely as a unit-test seam, so the default-client overload is driven to a normal return by a mocked client instead of a live network call. Production never reassigns it. (12e46270)

Tests

  • Coverage for the ten lowest-coverage encode/decode classes — ZigZagEncodingDecoder/Encoder, SequenceEncodingEncoder, VariantEncodingDecoder.dtypeFromProto (every proto→core DType arm), TimeExtensionEncoder, VarBinViewEncodingDecoder, VarBinEncodingDecoder, AlpEncodingDecoder, DateTimePartsEncodingDecoder, and DeltaEncodingDecoder — exercising guards, broadcast/constant paths, and ptype arms. (a3012d4a, c9386eda, 6c9682b8, bbb9d669, 7742ecd3)
  • Writer: property-based and mutation-driven round-trips for the Delta and AlpRd encoders. (d3d245a6)
  • Reader: HTTP fixtures bumped to v0.75.0 with a smoke test across all encodings; the open(URI, ReadRegistry) overload is now covered via the default-client seam. (8a1b5db2, 12e46270)
  • Reader: decoder tests allocate via Arena.ofAuto() instead of the never-freed Arena.global(). (59ec2e2a)

Build

  • Dependency refresh: jacoco-maven-plugin 0.8.13→0.8.15, pitest-maven 1.20.0→1.25.5, checkstyle 13.5.0→13.6.0, byte-buddy-agent 1.17.7→1.18.10, central-publishing-maven-plugin 0.10.0→0.11.0, maven-jar-plugin 3.4.1→3.5.0, maven-dependency-plugin 3.7.0→3.11.0, and actions/checkout 6→7. (dab876b7, 7b7c3580, 46659a73, 46a30be1, c6723832, 3e5fa349, c943f81b, af009116)

v0.8.1

@github-actions github-actions released this 20 Jun 19:35
Immutable release. Only release title and notes can be modified.

A hardening release: no new file-format capability, but a large step up in verification rigour. Mutation testing (PIT) now guards the security-critical bounds/parse paths in core, reader, and writer at 99–100% kill rate; the build fails on any javac warning (-Xlint:all -Werror); and property-based round-trips exercise every lossless encoding plus the full cascade-selection pipeline against seeded-random inputs. The one functional addition is boxed-nullable array input on the map writeChunk path.

Added

  • Writer: the map-based writeChunk path accepts boxed nullable arrays (Integer[], Long[], Double[], …) alongside primitive arrays, so columns with nulls can be written without manual validity bookkeeping. (4d18939a)
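The "manual validity bookkeeping" that boxed input removes looks roughly like the split below. This is a sketch only: BoxedNullableSketch and Column are hypothetical names, and the boolean[] validity is a simplification of whatever representation the writer uses internally.

```java
/** Sketch: split a boxed nullable array into primitive values plus a validity mask. */
final class BoxedNullableSketch {

    /** Hypothetical result holder: values[i] is meaningful only where valid[i] is true. */
    record Column(int[] values, boolean[] valid) {}

    static Column split(Integer[] boxed) {
        int[] values = new int[boxed.length];
        boolean[] valid = new boolean[boxed.length];
        for (int i = 0; i < boxed.length; i++) {
            if (boxed[i] != null) {
                values[i] = boxed[i];   // auto-unboxing is safe: null was excluded above
                valid[i] = true;
            }                           // null slots keep the default 0 / false
        }
        return new Column(values, valid);
    }
}
```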

Changed

  • Breaking — ExtensionEncoder.encodeAll is now abstract. The default body threw VortexException; every implementation already overrides it, so the contract now fails at compile time rather than at runtime. (2dcd69ce)
  • Breaking — Estimate is now an enum { SKIP, ALWAYS_USE, COMPLETE }. The sealed interface with empty Skip/AlwaysUse records, the skip()/alwaysUse() factories, and the null "no verdict" sentinel are gone; COMPLETE is the explicit defer-to-sample-encode verdict. (c355a4bf)
  • Reader cleanups: dropped a dead length < 0 blob check and a redundant offset > fileSize bounds clause, reused the shared PTypeIO little-endian layouts, and removed redundant numeric casts flagged by static analysis. (5d5fcc45, 36328285, 04cab707)

Fixed

  • Writer: I8/I16 columns are excluded from the global dictionary — the reader cannot decode a narrow-int dict, so dict-encoding them produced unreadable files. (473256b1)
  • Writer: WriteRegistry now iterates encoders in a deterministic order and accepts() reports honestly, fixing a non-deterministic encoder selection that broke the Windows build. (9c4ebb18)
  • Reader: Pco decode now guards preDeltaN against int overflow before clamping — the subtraction is widened to long, restoring the overflow-safe path. (b7346e7c)
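The widening fix in the last entry follows a general pattern: do the subtraction in 64 bits, then clamp. The sketch below is illustrative, with hypothetical names, and assumes max is non-negative; it is not the Pco decoder's code.

```java
/** Sketch: an int subtraction that can wrap, and the long-widened version that cannot. */
final class WidenedClampSketch {

    /** Overflow-safe: the difference of two ints always fits in a long. */
    static int clampedDiff(int a, int b, int max) {
        long diff = (long) a - b;
        return (int) Math.max(0L, Math.min(diff, (long) max));
    }

    /** Naive version: a - b wraps around before the clamp ever sees it. */
    static int naiveClampedDiff(int a, int b, int max) {
        int diff = a - b;
        return Math.max(0, Math.min(diff, max));
    }
}
```

With a = Integer.MAX_VALUE and b = -1, the int subtraction wraps to Integer.MIN_VALUE and the naive clamp returns 0, whereas the widened version sees 2^31 and clamps to max.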

Build

  • Zero-warning rule: -Xlint:all -Werror across all modules. The classfile lint (which only flags missing annotation class files inside third-party Arrow bytecode) is scoped off in the two Arrow-using modules only. (dab467e5, 43f6f840)
  • Mutation testing (PIT): opt-in pitest profiles in core, reader, and writer, scoped to the bounds/parse classes (IoBounds, PTypeIO, WriteRegistry, ChunkImpl, …), with common config hoisted into the parent POM. (46904b24, ed8c98a1, 1200c76b, 840cc46a)
  • SonarCloud: generated fbs/ and proto/ sources excluded from analysis (machine output, not hand-maintained); the deliberate per-width SIMD-loop duplication is documented in ADR 0005 rather than refactored away. Code smells dropped 857→394; coverage ~81%, all ratings A, zero bugs/vulnerabilities. (6c591293)

Tests

  • Property-based lossless round-trips added for ALP (f32/f64), Delta/FoR/ZigZag/AlpRd, a bitpacked bit-width sweep, the full CascadingCompressor (every codec × cascade depth 0–3), and a Pco seeded-random distribution sweep. (dbe44aaa, a2cf3443, aede11d7, 115dd6fd, a426c1de)
  • Mutation-driven test hardening lifted core/reader/writer bounds and registry classes to 99–100% kill rate. (2235499a, c9243f9a, 912fcaff)
  • Integration: added Java↔Rust round-trips for vortex.patched, fastlanes.delta, and masked encodings. (13702764)
  • CLI: terminal smoke tests now force class initialization so the FFM libc/kernel32 symbol resolution is actually exercised. (3f741ef7)

v0.8.0

@github-actions github-actions released this 20 Jun 11:41

Read and write Vortex Variant (semi-structured, JSON-shaped) columns from Java. Internally, transform encodings now decode lazily, trimming per-decode allocation. This release also hardens the reader's bounds handling on untrusted input (ADR 0003 Phase E), fixes CSV-import memory blow-ups on large files, and lifts test coverage to 80% with all Sonar ratings at A.

Added

  • Writer: vortex.variant encoder. Encodes a variant column as the canonical vortex.variant container over core_storage — an all-equal column becomes a single vortex.constant, a row-varying column a vortex.chunked of per-run constants — with an optional row-aligned typed shredded child recorded in VariantMetadata.shredded_dtype. Input is VariantData(List<Scalar>) with .constant(n, v) / .shredded(...) factories. Java↔Rust (JNI) round-trip verified for constant, row-varying, and shredded columns. Scalar values only — arbitrary nested objects need vortex.parquet.variant (deferred, ADR 0014). (35da529d, e4e44980, 4566dca0)
  • Reader: variant columns now decode Java-side. ConstantEncodingDecoder and ChunkedEncodingDecoder handle DType.Variant (materialising the inner-typed array); VariantEncodingDecoder wraps the result as VariantArray, exposing coreStorage() and shredded(). (76e4c741, 4566dca0)

Security

  • Reader bounds hardening (ADR 0003 Phase E): untrusted offsets/lengths from file metadata now flow through a typed IoBounds helper that throws VortexException instead of a raw IndexOutOfBoundsException, and hand-rolled index guards were replaced with Objects.checkIndex. A crafted flat-segment file can no longer trip an unchecked array access during decode. (e9af80d6, 3bcd9881, a5ce8380)
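A typed bounds helper of the kind described above might look like the sketch below. BoundsSketch and FormatException are hypothetical stand-ins for IoBounds and VortexException; the actual method names and signatures of the library are not given in this note, and size is assumed to be non-negative.

```java
/** Sketch: validate an untrusted (offset, length) pair against a known size. */
final class BoundsSketch {

    /** Hypothetical domain exception, standing in for the library's own type. */
    static final class FormatException extends RuntimeException {
        FormatException(String message) {
            super(message);
        }
    }

    /** Throws FormatException unless [offset, offset + length) lies within [0, size). */
    static void checkRange(long offset, long length, long size) {
        // "offset > size - length" avoids computing offset + length, which could overflow.
        if (offset < 0 || length < 0 || offset > size - length) {
            throw new FormatException(
                    "range [" + offset + ", +" + length + ") outside segment of size " + size);
        }
    }
}
```

Routing every untrusted offset through one helper means callers only ever see the domain exception, and the overflow-safe comparison is written once instead of at each call site.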

Fixed

  • CSV import: large files no longer OOM. The importer now streams rows in a single pass (buffering only the first chunk for schema inference) and disables the global-dictionary pass by default, which previously accumulated every distinct value in memory. (d5280ae2, 0b6784b5, 62863616)
  • CLI: IoWorker.runAndAwait decremented its in-flight counter after signaling completion, so a caller reading pending() right after it returned could still see the task counted; the counter is now decremented before the await returns. The view/tui commands also close the opened VortexHandle on every error path (openOnWorker returns Optional). (95c06b1a, 27446d81)
  • Reader: BoolArray.materialize now masks the accumulator byte before the bit-set OR, removing a sign-promotion footgun in the packed-bitmap write. (bc8e9d4e)
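The sign-promotion hazard in the BoolArray entry comes from Java promoting byte to int with sign extension, so a byte with its top bit set becomes a negative int. The sketch below shows the masking idiom on a packed bitmap; PackedBitsSketch is a hypothetical name, and the least-significant-bit-first order is an assumption, not a statement about the library's layout.

```java
/** Sketch: read-modify-write on a packed bitmap with the byte masked to 0..255 first. */
final class PackedBitsSketch {

    static void setBit(byte[] bitmap, int i) {
        int acc = bitmap[i >>> 3] & 0xFF;                   // mask away the sign extension
        bitmap[i >>> 3] = (byte) (acc | (1 << (i & 7)));    // assumed LSB-first bit order
    }

    static boolean getBit(byte[] bitmap, int i) {
        return (((bitmap[i >>> 3] & 0xFF) >>> (i & 7)) & 1) == 1;
    }
}
```

Without the mask, (byte) 0x80 enters the expression as -128, which is harmless for a plain OR followed by a cast but gives wrong results as soon as the value is shifted with >>> or compared against an unsigned constant.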

Changed

  • Decode shape: transform encodings now decode lazy-only. The eager Materialized*Array fallbacks were removed from vortex.zigzag (all PTypes + broadcast, cd59fefa), fastlanes.for (all integer PTypes, d7953e1f), vortex.alp (broadcast-without-patches, deab8067), vortex.constant (Decimal → LazyConstantDecimalArray, a6a9611e), vortex.runend (Bool → LazyRunEndBoolArray, 0bbcb81f), vortex.sparse (Bool → LazySparseBoolArray, db2e955b), and fastlanes.rle (validity → OffsetBoolArray, empty → LazyConstantXxxArray, 5e83a5c3). Decompression encodings (bitpacked, pco, zstd, fsst, delta, patched), the primitive base, the vortex.dict encoding-level path, and the vortex.alp patches path stay Materialized by design. See ADR 0015.
  • Breaking — sealed Array permits changed. DecimalArray is now a non-sealed family interface (decimal arrays moved from implements Array to implements DecimalArray), so decimal joins the per-dtype family layer. Downstream exhaustive switch over Array must add a case DecimalArray. (a6a9611e)
  • Breaking — Array API. Array.truncate(rows) renamed to Array.limited(rows) and made an abstract operation implemented by every array (composites slice their children); raw-segment access moved off the ArraySegments utility onto Array.materialize(SegmentAllocator) and Array.segmentIfPresent(). (87ab65e2, 4d9ac1f8, 332b067e, 32a35e03)
  • CSV import reports progress every 10K rows instead of per-chunk. (07a056e7)

Removed

  • Breaking — EmptyArray removed from the sealed Array permits. It was never emitted by the reader (empties are zero-length typed arrays in their own family) and broke the dtype→family invariant (EmptyArray(I64) was not a LongArray). Represent an empty column as a zero-length array of the appropriate family. (3a4dcdfa)

Documentation

  • ADR 0016: captures vortex-arrow bridge interop options (separate module / Arrow C-Data / none); deferred until a concrete downstream need. (a6126f29)

Tests

  • Test coverage raised from ~74% to 80% — the lazy/chunked/dict/run-end/sparse array families, ChunkImpl, and several decoders (DecimalEncodingDecoder, DictEncodingDecoder, ParquetImporter) reached full line + branch coverage. SonarCloud quality gate green: reliability, security, and maintainability all at A, zero bugs and vulnerabilities.

v0.7.3

@dfa1 dfa1 released this 17 Jun 18:30

Parquet ZSTD support, vortex.patched encoder, constant-encoding selection fix, Windows TUI raw-mode fix.

Added

  • Parquet: ZSTD-compressed Parquet import. zstd-jni was an optional dep in hardwood and had to be declared explicitly. NYC Yellow Taxi 2024-01 (47.6 MB Parquet, 2.96 M rows × 19 cols) imports to 40.7 MB Vortex, 14% smaller than the Rust JNI reference (47 MB) thanks to the global-dict encoder catching low-cardinality F64 columns.
  • Writer: vortex.patched encoder — identifies outlier values that exceed the optimal bit width, zeros them in the inner array (exposed as an open cascade child for further bitpacking), and stores their within-chunk U16 indices and raw values separately.
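The patching step described in the last entry can be sketched as a split into an inner array plus a patch list. This is an illustration with hypothetical names, not the encoder itself: it takes the bit width as a given instead of choosing the optimal one, assumes non-negative values and 0 < bitWidth < 63, and uses int indices where the encoder stores U16.

```java
import java.util.Arrays;

/** Sketch: move values too wide for bitWidth into a side list and zero them in place. */
final class PatchedSketch {

    /** Hypothetical holder: inner fits in bitWidth bits; patches restore the outliers. */
    record Patched(long[] inner, int[] indices, long[] patchValues) {}

    static Patched patch(long[] values, int bitWidth) {
        long limit = 1L << bitWidth;              // first value that no longer fits
        long[] inner = values.clone();
        int[] indices = new int[values.length];
        long[] patchValues = new long[values.length];
        int n = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] >= limit) {
                indices[n] = i;
                patchValues[n] = values[i];
                inner[i] = 0;                     // zeroed slot now bitpacks at bitWidth
                n++;
            }
        }
        return new Patched(inner, Arrays.copyOf(indices, n), Arrays.copyOf(patchValues, n));
    }

    static long[] unpatch(Patched p) {
        long[] out = p.inner().clone();
        for (int k = 0; k < p.indices().length; k++) {
            out[p.indices()[k]] = p.patchValues()[k];
        }
        return out;
    }
}
```

The trade is a few stored (index, value) pairs in exchange for packing every other element at the narrow width, which pays off when outliers are rare.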

Fixed

  • CLI: Windows TUI raw mode. readKey now calls ReadFile directly on the kernel handle obtained via GetStdHandle instead of reading from System.in. Java's System.in goes through JVM-internal CRT wrappers that ignore SetConsoleMode, so every keypress previously required Enter before the TUI reacted.
  • Writer: constant encoding skipped for single-distinct-value columns. isDictCandidate returned true for distinctCount == 1, routing all-same-value columns through the global-dict path instead of vortex.constant.

Changed

  • CLI: polling loop in Terminal.readKey(Duration) extracted to KeyDecoder.nextWithTimeout(InputStream, Duration) — eliminates duplication between PosixTerminal and WindowsTerminal.

Tests

  • Integration: TaxiParquetOracleVsJavaIntegrationTest, in which hardwood reads the taxi Parquet to a CSV (oracle); ParquetImporter → CsvExporter produces a second CSV (SUT); line-by-line diff must be zero. Proves the importer loses no data across 2.96 M rows × 19 columns.

Full changelog: https://github.com/dfa1/vortex-java/blob/main/CHANGELOG.md#0.7.3

v0.7.2

@dfa1 dfa1 released this 16 Jun 19:56

CLI usability + reader robustness on real-world files (NYC Yellow Taxi).

Added

  • CLI view <file> — scrollable Excel-like grid TUI. Streams rows on demand via a new LazyGridSource (one live chunk at a time, formats only the visible window). Title bar shows chunk K/N. Quit with q / Esc.
  • CLI export writes to a derived <name>.csv next to the input file by default, with a stderr progress bar mirroring the import flow. Use export <file.vortex> - to stream to stdout, or export <file.vortex> out.csv for an explicit path.
  • Reader ScanIterator.chunkRowCounts() — returns per-chunk row counts by walking the layout tree, no value decode. Used by the view TUI to plan navigation.
  • Lazy vortex.decimal decode via new LazyDecimalArray record — zero-copy mmap slice + per-row BigDecimal materialisation. Replaces the GenericArray wrapper.
  • 7 Offset*Array records (Long / Int / Short / Byte / Double / Float / Bool) + VarBinArray.SlicedMode for offset-based slicing of pre-decoded shared arrays.

Fixed

  • Per-column chunking alignment. Files where one column has 1 mega-flat and another has N small flats (e.g. NYC Yellow Taxi 2024-01: 2.96M-row VendorID next to 23 × 131072-row datetime flats) now decode the wide column once into a sharedArena and slice it per chunk via Offset*Array. Previously the scan iterator emitted a single chunk whose datetime columns were the first 131072 rows only — silently dropping 95.6 % of the file.
  • FrameOfReferenceEncodingDecoder now uses the arena variant of ArraySegments.of, so lazy children (e.g. LazyRunEndLongArray) materialise instead of throwing "no primary segment".

Docs

  • Compatibility table refreshed: constant, varbinview, alprd, datetimeparts, decimal_byte_parts, decimal now reflect their shipped Lazy shape; container encodings (list / listview / fixed_size_list) marked Lazy (inherit child shape); patched pinned Materialized with reasoning.
  • New ADR 0013 — policy for dropping Materialized fallbacks once Lazy ships.

Maven Central: io.github.dfa1.vortex:vortex-reader:0.7.2 (and vortex-writer, vortex-cli, etc.).

v0.7.1

@dfa1 dfa1 released this 16 Jun 17:41

Cleanup release on top of 0.7.0 — one more lazy encoding, a Windows TUI usability fix, and a fresh round of read benchmarks.

Added

  • vortex.constant lazy decode — seven metadata-only LazyConstantXxxArray records (Long / Int / Double / Float / Short / Byte / Bool) replace the one-element broadcast buffer; the per-element broadcast-modulo path is gone (3edf6e8c)
  • Top-N read benchmarks (N=10, 100) + README table, refreshed 80M-row numbers (c00fdf7f, 33714d7b, a6fd92fc)
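The metadata-only shape of the vortex.constant lazy decode listed above can be sketched as a record that stores one value and a length. LazyConstantSketch, LongColumn, and ConstantLongs are hypothetical names chosen for illustration; the library's LazyConstantXxxArray records may differ in fields and interface.

```java
import java.util.Objects;

/** Sketch: a constant column held as (value, length) with no backing buffer. */
final class LazyConstantSketch {

    /** Hypothetical minimal column interface. */
    interface LongColumn {
        long length();

        long get(long index);
    }

    /** Every in-range index returns the same value; nothing is allocated per row. */
    record ConstantLongs(long value, long length) implements LongColumn {
        @Override
        public long get(long index) {
            Objects.checkIndex(index, length);   // still reject out-of-range reads
            return value;
        }
    }
}
```

A billion-row constant column costs two longs in this form, and reads involve a bounds check and a field load rather than a modulo into a one-element buffer.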

Changed

  • CLI: schema prints per-row column listing (9b3fe4b5)
  • CLI: Terminal.readKey takes Duration instead of long ms (2942a4da)
  • Reader: extract TimeDtype + TimestampDtype shared metadata helpers (8f1b9feb)

Fixed

  • CLI: actionable error on Git Bash / MinTTY — GetConsoleMode failure now points users at winpty / Windows Terminal / PowerShell instead of dead-ending on the raw error (6ec42288)
  • Reader: ArraySegments.of(arr) typed-accessor fallback for lazy arrays (74ec207b)

CI

Full changelog: v0.7.0...v0.7.1