Releases: dfa1/vortex-java
v0.11.0
SQL over Vortex grows a compute layer: a Calcite WHERE-filtered SUM/COUNT/MIN/MAX is now answered from the zone-map statistics — folding the chunks the predicate fully selects without decoding them and decoding only the one or two chunks its range cuts through (ADR 0013 §6 / ADR 0018 boundary tier). On a 100-chunk file a SELECT SUM(x) WHERE id BETWEEN … answers ~12× faster than a full scan when the range is wide. Plus the security hardening of the untrusted-input parse paths (ADR 0003) and a vortex.zstd binding bump.
Added
- VortexCalcite.connect(schemaName, tables) opens a Calcite JDBC connection with the VortexSchema registered in one call, folding away the DriverManager/unwrap/getRootSchema().add(...) boilerplate. It wires the Babel SQL parser, so columns whose names are reserved words (close, open, value, year, …) are queryable unquoted — select close, open from vtx.ohlc — which columnar files routinely need; only the typed-literal keywords (date, time, timestamp, interval) still require back-tick quoting. (24b64b32)
- The Calcite aggregate push-down rule now auto-registers over a bare jdbc:calcite: connection: a VortexTable translates to a VortexTableScan whose register() installs the rules when the planner first sees it, so SELECT MIN/MAX/COUNT/SUM over a VortexSchema is answered from zone-map statistics with no caller wiring (previously the rule had to be attached to the planner by hand). AVG joins them — AggregateReduceFunctionsRule reduces it to SUM/COUNT, both of which push down, so a whole-table AVG also decodes no data segment. Column projection and WHERE chunk-skip push-down are unchanged. (24b64b32)
- The Calcite aggregate push-down rule now rewrites a whole-table SUM(col) to a single-row Values computed from the zone-map table (via VortexTable.zoneSum), so SELECT SUM(col) answers metadata-only with no data segment decoded — joining the existing MIN/MAX/COUNT push-down. SUM over an all-null (or empty) column answers the SQL NULL, and the rule abandons to a normal scan when a zone carries no usable sum (no zone map, or an overflowed zone). (24b64b32)
- A Calcite WHERE-filtered SUM/COUNT/MIN/MAX is now answered from zone-map statistics when the predicate partitions the chunks cleanly — each chunk wholly matches or wholly fails it — folding only the kept chunks' stats into a scan-free result. A predicate that cuts through a chunk (the typical selective range) still falls back to the zone-map-pruned scan. (32cc4a29)
- A WHERE-filtered aggregate whose range cuts through a chunk no longer falls back to a full scan: the interior chunks still fold from the zone-map stats, and each straddling boundary chunk is decoded on its own and reduced under a row-level filter, so SELECT SUM(volume) WHERE close BETWEEN 100 AND 200 decodes only the one or two boundary chunks instead of every surviving chunk. The push-down still abandons to the (correct) scan for an unsigned or floating-point filter column, a non-numeric SUM, or a missing zone map. (f89b5b69)
- VortexReader.decodeChunk(chunkIndex, columns) and chunkCount() decode a single chunk for a chosen subset of columns in isolation, rather than streaming the whole file — the returned Chunk owns its memory and is valid until closed. (084a0133)
- ScanIterator.columnZoneStats(column) surfaces per-zone min/max/sum/null-count from a column's vortex.stats zone-map table without decoding any data segment — the read side of aggregate push-down (ADR 0013 §6). ArrayStats gains a sum component, decoded from the zone-map table (where the Rust reference stores it too), so the Calcite adapter now answers SUM/AVG metadata-only when every zone carries a sum, falling back to a streaming scan only for columns without a zone map. (05dd9204)
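The interior/boundary split described in the filtered-aggregate entries above can be sketched roughly as follows. This is an illustration only, not the adapter's code: the Chunk and Result records and sumVolumeWhereCloseBetween are hypothetical names, the columns are plain long arrays, and sum overflow is ignored.

```java
import java.util.List;

final class ZoneFoldSketch {
    /** Hypothetical chunk: zone stats plus the rows a decode would yield. */
    record Chunk(long closeMin, long closeMax, long volumeSum, long[] close, long[] volume) {
        static Chunk of(long[] close, long[] volume) {
            long min = Long.MAX_VALUE, max = Long.MIN_VALUE, sum = 0;
            for (int i = 0; i < close.length; i++) {
                min = Math.min(min, close[i]);
                max = Math.max(max, close[i]);
                sum += volume[i];
            }
            return new Chunk(min, max, sum, close, volume);
        }
    }

    /** Aggregate plus the number of chunks whose rows had to be read. */
    record Result(long sum, int decodedChunks) {}

    static Result sumVolumeWhereCloseBetween(List<Chunk> chunks, long lo, long hi) {
        long total = 0;
        int decoded = 0;
        for (Chunk c : chunks) {
            if (c.closeMax() < lo || c.closeMin() > hi) {
                continue; // wholly fails the predicate: pruned
            }
            if (c.closeMin() >= lo && c.closeMax() <= hi) {
                total += c.volumeSum(); // wholly matches: fold the zone stat
                continue;
            }
            decoded++; // straddles a boundary: decode and filter row by row
            for (int i = 0; i < c.close().length; i++) {
                if (c.close()[i] >= lo && c.close()[i] <= hi) {
                    total += c.volume()[i];
                }
            }
        }
        return new Result(total, decoded);
    }
}
```

For a range that covers many chunks, only the chunks at the two ends of the range take the row-level path; every other surviving chunk contributes its precomputed sum.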
Changed
- Bumped io.github.dfa1.zstd (the vortex.zstd FFM bindings, pinned by the BOM) 0.3 → 0.6, which ships smaller jars (native debug symbols stripped). (677c2cf7, 6dcdbe94, fec0a0d3)
- Bumped Apache Calcite (the SQL adapter's engine) 1.40 → 1.42. (2f9f02c6)
Security
- DType-tree and array-node decoding are now depth-capped (64, matching the layout-tree guard): a crafted or self-referential FlatBuffer surfaces as a VortexException instead of a StackOverflowError — which, being an Error, previously escaped sanitization and leaked the reader's memory-mapped Arena. (93f8d5f4, 428026d3)
- The HTTP reader validates footer segmentSpecs against the file size before any Range request is built from them, matching the local-file path. (1d8ddebc)
- vortex.zstd decode bounds-checks each frame's declared uncompressed size and overflow-checks the total before allocating, and range-checks VarBin length prefixes — a crafted payload can no longer under-allocate or read out of bounds. (2df4e3a7, adc445e8)
- The HTTP reader parses the server-controlled Content-Range header and slices the tail buffer defensively, so a malformed response yields a VortexException rather than a raw NumberFormatException/IndexOutOfBoundsException. (feac99b7)
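The depth cap in the first entry can be illustrated with a small recursive walk. This is a sketch under assumptions: Node and MalformedInputException are hypothetical stand-ins (the library raises VortexException), and whether the real guard rejects at depth 64 or just past it is not stated in these notes.

```java
import java.util.ArrayList;
import java.util.List;

final class DepthCapSketch {
    static final int MAX_DEPTH = 64; // the cap quoted in the notes

    /** Hypothetical stand-in for the library's VortexException. */
    static final class MalformedInputException extends RuntimeException {
        MalformedInputException(String message) {
            super(message);
        }
    }

    /** Hypothetical tree node; a node listed as its own child models a self-referential buffer. */
    static final class Node {
        final List<Node> children = new ArrayList<>();
    }

    /** Counts nodes, failing with a checked-for exception long before the stack runs out. */
    static int countNodes(Node node, int depth) {
        if (depth > MAX_DEPTH) {
            throw new MalformedInputException("nesting deeper than " + MAX_DEPTH);
        }
        int count = 1;
        for (Node child : node.children) {
            count += countNodes(child, depth + 1);
        }
        return count;
    }
}
```

Because the failure is an ordinary exception rather than a StackOverflowError, a caller's catch and cleanup paths still run, which is the property the entry describes.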
v0.10.0
A vortex.zstd overhaul: compression now runs through FFM bindings to the native libzstd, gaining framed (sliceable) payloads, nullable-column support, and shared-dictionary decode. Alongside it, zone-map pruning is fixed to compare in the column's type domain.
Added
- DType.isUnsigned() — true for the unsigned integer primitives (U8–U64), false otherwise. (#159)
- The vortex.zstd encoder now writes nullable columns (primitive and utf8/binary): null positions are stripped before compression and validity is emitted as a Bool child, matching the Rust reference layout. When vortex.zstd is the configured encoder, nullable primitive columns route to it directly instead of being wrapped in vortex.masked.
- new ZstdEncodingEncoder(valuesPerFrame) splits the payload into independently compressed frames of valuesPerFrame values each (one ZstdFrameMetadata per frame), letting a slice scan decompress only the frames overlapping its row range. The no-arg constructor still emits a single frame. (#170)
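To see why framing makes a payload sliceable, here is the frame arithmetic for a row range. This is a minimal sketch, assuming a non-nullable column where row positions map directly to value positions; framesOverlapping is a hypothetical helper, not part of the encoder's API.

```java
final class FrameRangeSketch {
    /** Returns {firstFrame, lastFrameInclusive} for the half-open row range [rowStart, rowEnd). */
    static int[] framesOverlapping(long rowStart, long rowEnd, int valuesPerFrame) {
        if (valuesPerFrame <= 0 || rowStart < 0 || rowEnd <= rowStart) {
            throw new IllegalArgumentException("empty or invalid range");
        }
        int first = Math.toIntExact(rowStart / valuesPerFrame);
        int last = Math.toIntExact((rowEnd - 1) / valuesPerFrame);
        return new int[] {first, last};
    }
}
```

With 1000 values per frame, rows 2500 to 4099 touch frames 2 through 4, so only three frames need decompressing regardless of how many the column holds.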
Changed
- The vortex.zstd encoding now compresses and decompresses through io.github.dfa1.zstd:zstd (FFM bindings to the native libzstd) instead of io.airlift:aircompressor-v3. Consumers of vortex.zstd declare a single dependency, io.github.dfa1.zstd:zstd-platform, which transitively brings the zstd binding plus the native libzstd for every supported platform (replacing the former per-platform zstd-native-<platform> artifacts).
Fixed
- vortex.zstd segments compressed with a shared (trained) dictionary now decode, via the native libzstd dictionary support, instead of being rejected. The upstream zstd.vortex compatibility fixture is read end-to-end and matches the Rust reference. (#104)
- Writing a nullable Utf8/Binary column no longer throws NullPointerException (or silently drops nulls): nullable string columns now carry their validity like nullable primitives and round-trip through vortex.masked. As a result they decode as MaskedArray (validity + values child) rather than a bare VarBinArray. (#168)
- CSV export now handles nullable columns (MaskedArray): null rows export as an empty field instead of failing with "unsupported array type for CSV export". (#168)
- Zone-map pruning now compares filter values in the column's type domain rather than by the boxed value's type. A predicate whose value is boxed at a different width (e.g. Integer on an I64 column) — or any value on a U64 column — previously pruned nothing and silently degraded to a full scan; it now prunes correctly (unsigned columns by unsigned order). As part of this, a filter value genuinely incomparable to its column (e.g. a String against a numeric column) now raises VortexException during the scan instead of silently disabling pruning — a behaviour change for callers that relied on the previous silent full scan. (#159)
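The zone-map fix above comes down to comparing in the column's domain instead of relying on the boxed types lining up. The sketch below is illustrative only: Domain and canSkipGreaterThan are hypothetical names, only 64-bit integer columns are modelled, and a negative Integer against an unsigned column is not handled.

```java
final class TypeDomainCompareSketch {
    /** Hypothetical column domains; the real reader covers every primitive type. */
    enum Domain { SIGNED_64, UNSIGNED_64 }

    /** Widens both sides to the column's 64-bit domain before comparing. */
    static int compare(Number zoneBound, Number filterValue, Domain domain) {
        long a = zoneBound.longValue();
        long b = filterValue.longValue();
        return domain == Domain.UNSIGNED_64 ? Long.compareUnsigned(a, b) : Long.compare(a, b);
    }

    /** A chunk can be skipped for "col > value" when its zone max is not above the value. */
    static boolean canSkipGreaterThan(Number zoneMax, Number filterValue, Domain domain) {
        return compare(zoneMax, filterValue, domain) <= 0;
    }
}
```

Two things go wrong without this: Long.valueOf(5).equals(Integer.valueOf(5)) is false in Java, so a comparison keyed on the boxed type never matches, and a U64 maximum stored with the top bit set looks negative under signed ordering.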
v0.9.0
Two import-only breaking changes — the vortex-core types moved under io.github.dfa1.vortex.core.*, and the no-arg DType factories became constants. In return, Vortex now ships with no FlatBuffers or Protobuf runtime dependency: the .fbs/.proto schemas compile in-house to MemorySegment-native Java, dropping com.google.flatbuffers:flatbuffers-java — the last automatic-module dependency — so a named JPMS module-info is viable, and the generated wire classes are prefixed so they no longer collide on your classpath (ADR 0017).
Added
- Canonical non-nullable DType constants: DType.I8…I64, U8…U64, F16/F32/F64, plus BOOL, UTF8, BINARY, NULL, VARIANT; build a nullable column with DType.I64.asNullable(). (f4b22e42)
Changed
- Breaking (imports): every vortex-core type moved under io.github.dfa1.vortex.core.* — core.model (DType, PType, TimeUnit, EncodingId, ExtensionId, Time*Dtype), core.io (IoBounds, PTypeIO, VortexFormat), core.error (VortexException), core.compute (FastLanes, PrimitiveArrays), core.fbs/core.proto (wire codecs). E.g. io.github.dfa1.vortex.core.DType → io.github.dfa1.vortex.core.model.DType. (52f30c16)
- Dropped the com.google.flatbuffers:flatbuffers-java runtime dependency; the .fbs/.proto schemas compile in-house to MemorySegment-native Java, and the generated wire classes are prefixed Fbs*/Proto* so the generic names (Array, Buffer, DType, …) no longer collide on your classpath (ADR 0017). (5907302e)
Removed
- Breaking (imports): the no-arg DType factories (DType.i64(), DType.utf8(), …) — use the constants above (DType.i64() → DType.I64). DType.decimal(..)/DType.structBuilder() and the record constructors are unchanged. (f4b22e42)
v0.8.3
A Sonar-driven refactoring release: no new file-format capability, but a focused pass using SonarCloud findings to drive cleanups — dead code removed, duplication factored out, and one hot-loop micro-optimisation. Each finding was triaged (lead, not verdict) so the changes preserve behaviour and the JIT vectorisation of the hot decode loops. The interpretation framework behind this is now documented in docs/testing.md.
Performance
- FastLanes.transposeIndex/iterateIndex: replaced the per-element % and / arithmetic plus ORDER[] indirection with permutation tables built once in a static initialiser. Faster address generation keeps more outstanding scatter misses in flight; measured 1.4×–3.4× on the transpose/undelta kernels (Apple M5, L1→DRAM working sets). The per-element decode loops stay specialised per width to preserve C2 superword vectorisation. (089b6e36, e683a634)
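The table-versus-arithmetic trade can be sketched as below. The index formula is an assumption modelled on the published FastLanes 1024-element layout, not copied from this project; the class and method names are hypothetical.

```java
final class TransposeTableSketch {
    // Assumed FastLanes row order for the 1024-element block layout.
    private static final int[] ORDER = {0, 4, 2, 6, 1, 5, 3, 7};

    // Permutation table filled once at class initialisation.
    private static final int[] TABLE = new int[1024];

    static {
        for (int i = 0; i < TABLE.length; i++) {
            TABLE[i] = computed(i);
        }
    }

    /** Per-element form: modulo, divide and an ORDER[] load on every call. */
    static int computed(int idx) {
        int lane = idx % 16;
        int order = (idx / 16) % 8;
        int row = idx / 128;
        return lane * 64 + ORDER[order] * 8 + row;
    }

    /** Table form: a single array load per element. */
    static int lookup(int idx) {
        return TABLE[idx];
    }
}
```

Both forms return the same index; the table simply moves the arithmetic out of the hot loop, which is the effect the entry attributes the speed-up to.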
Removed
- Breaking (read SPI): removed EncodingDecoder.accepts(DType). It was a residual of the ADR-0001 read/write split — encode-selection semantics copied onto the decoder side, where the reader dispatches purely by EncodingId and never called it (dead since the split). EncodingEncoder.accepts is unchanged. Downstream custom EncodingDecoder implementations should delete their accepts override. (7516a544)
Changed
- Internal dedup driven by Sonar duplication findings: extracted the shared FastLanes layout + PType.bits and PrimitiveArrays.toLongs/fromLongs into core, hoisted the Materialized* array boilerplate into a shared base, factored the four BitpackedEncodingDecoder unpack loops onto one precomputed per-row schedule, added PType.isUnsigned (dropping three private copies), and deduplicated the CLI inspect plumbing and formatBytes. (ec6b9631, a74263c0, 7af0af2a, 8362a353, 87c77cc9, d8f84088, b557e573, d52e8c0c)
- Dropped dead PType switch arms in the writer's readPrimitiveElement, primitiveArrayLen, and buildTypedUniqueArray — unreachable branches flagged as uncovered. (4c6ab149, 94d2fa49, f89072a6)
Fixed
- Cleared two SonarCloud-reported bugs in the writer's SUM zone-map stat plumbing. (33798ab9)
- Suppressed java:S1172 on AbstractMaterializedArray.materialize with a reason — the arena parameter is contractual (implements Array#materialize(SegmentAllocator) for the leaf classes), not a removable unused parameter. (9b226f73)
Tests
- Filled coverage gaps surfaced by Sonar: the Materialized* materialize defaults, every SchemaCommand.formatDType arm, and the writer's global-dict cardinality fallback with U16 utf8 codes. (8741dad3, 77fad504, c2918eaa)
Docs
- docs/testing.md: new section on reading Sonar/PIT as data — the uncovered-line triage (missing-test / dead-code / defensive-by-contract), why mutation testing splits what coverage cannot, and when duplication is the deliberate price of the hot-loop rule. (8999661b)
v0.8.2
The headline is writer-side zone-map statistics: the writer now emits vortex.stats (zoned) layouts carrying per-chunk MIN/MAX, NULL_COUNT, and SUM — matching the Rust reference — so zone-map chunk pruning and aggregate push-down work on Java-written files (previously the reader could decode these stats but the writer never produced them). The release also continues the test-hardening track: the lowest-covered encoder/decoder paths are filled in, SonarCloud new-code coverage is back to 100% with the quality gate green (overall ~83%, all ratings A, zero bugs/vulnerabilities), and the build toolchain is refreshed across eight dependency bumps.
Added
- Writer: vortex.stats (zoned) layout emission, toggled by WriteOptions.enableZoneMaps. Each column is wrapped with a per-zone (one zone per chunk) statistics table; the stat set follows the Rust reference exactly. (838dba82, f2d74351)
- Writer: per-zone MIN/MAX for primitive columns including F16, extension columns (over their storage primitive), Utf8 columns (full string bounds), and dictionary-encoded columns (computed on the logical values, independent of the dict encoding). (838dba82, fb5d096a, 38ab5c51, c1198253, e51da936)
- Writer: per-zone NULL_COUNT for every column type. (135c9b37, c52d4b83, ab233b86)
- Writer: per-zone SUM for numeric primitive columns (signed → i64, unsigned → u64, float → f64; integer overflow records a null sum). Matches Rust, which sums numeric primitives and decimals but not Utf8/extension columns. (9661f554)
- Reader: RowFilter.isNull/RowFilter.isNotNull predicates with zone-map chunk pruning — IS NULL skips chunks with zero nulls, IS NOT NULL skips all-null chunks — via the per-chunk null_count. (2749b6ca)
- Reader: columnStats() aggregates null_count across a column's chunks (reported only when every chunk carries one). (cb844f23)
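The "integer overflow records a null sum" rule for signed columns can be sketched with exact arithmetic. This is an illustration with hypothetical names (zoneSum, foldZones), using a boxed Long where null means "no usable sum"; it is not the writer's implementation.

```java
import java.util.List;

final class ZoneSumSketch {
    /** Sum of one zone, or null when the i64 accumulator would overflow. */
    static Long zoneSum(long[] values) {
        long sum = 0;
        try {
            for (long v : values) {
                sum = Math.addExact(sum, v);
            }
        } catch (ArithmeticException overflow) {
            return null; // recorded as a null sum instead of a wrapped value
        }
        return sum;
    }

    /** Column total from per-zone sums; null when any zone lacks a usable sum. */
    static Long foldZones(List<Long> zoneSums) {
        long total = 0;
        for (Long s : zoneSums) {
            if (s == null) {
                return null;
            }
            try {
                total = Math.addExact(total, s);
            } catch (ArithmeticException overflow) {
                return null;
            }
        }
        return total;
    }
}
```

A reader that folds zone sums can treat a null result as the signal to fall back to scanning the data, so an overflowed zone costs speed but never correctness.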
Changed
- Reader: the shared default HttpClient behind VortexHttpReader.open(URI, ReadRegistry) is now a package-private non-final field used purely as a unit-test seam, so the default-client overload is driven to a normal return by a mocked client instead of a live network call. Production never reassigns it. (12e46270)
Tests
- Coverage for the ten lowest-coverage encode/decode classes — ZigZagEncodingDecoder/Encoder, SequenceEncodingEncoder, VariantEncodingDecoder.dtypeFromProto (every proto→core DType arm), TimeExtensionEncoder, VarBinViewEncodingDecoder, VarBinEncodingDecoder, AlpEncodingDecoder, DateTimePartsEncodingDecoder, and DeltaEncodingDecoder — exercising guards, broadcast/constant paths, and ptype arms. (a3012d4a, c9386eda, 6c9682b8, bbb9d669, 7742ecd3)
- Writer: property-based and mutation-driven round-trips for the Delta and AlpRd encoders. (d3d245a6)
- Reader: HTTP fixtures bumped to v0.75.0 with a smoke test across all encodings; the open(URI, ReadRegistry) overload is now covered via the default-client seam. (8a1b5db2, 12e46270)
- Reader: decoder tests allocate via Arena.ofAuto() instead of the never-freed Arena.global(). (59ec2e2a)
Build
- Dependency refresh: jacoco-maven-plugin 0.8.13→0.8.15, pitest-maven 1.20.0→1.25.5, checkstyle 13.5.0→13.6.0, byte-buddy-agent 1.17.7→1.18.10, central-publishing-maven-plugin 0.10.0→0.11.0, maven-jar-plugin 3.4.1→3.5.0, maven-dependency-plugin 3.7.0→3.11.0, and actions/checkout 6→7. (dab876b7, 7b7c3580, 46659a73, 46a30be1, c6723832, 3e5fa349, c943f81b, af009116)
v0.8.1
A hardening release: no new file-format capability, but a large step up in verification rigour. Mutation testing (PIT) now guards the security-critical bounds/parse paths in core, reader, and writer at 99–100% kill rate; the build fails on any javac warning (-Xlint:all -Werror); and property-based round-trips exercise every lossless encoding plus the full cascade-selection pipeline against seeded-random inputs. The one functional addition is boxed-nullable array input on the map writeChunk path.
Added
- Writer: the map-based writeChunk path accepts boxed nullable arrays (Integer[], Long[], Double[], …) alongside primitive arrays, so columns with nulls can be written without manual validity bookkeeping. (4d18939a)
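The "manual validity bookkeeping" that boxed input removes looks roughly like this. The Column record and fromBoxed are hypothetical; the writer's internal representation is not described in these notes.

```java
final class BoxedNullableSketch {
    /** Hypothetical split form: raw values plus a per-row validity flag. */
    record Column(long[] values, boolean[] valid) {}

    /** Splits a boxed array into values and validity; null rows get a zero placeholder. */
    static Column fromBoxed(Long[] boxed) {
        long[] values = new long[boxed.length];
        boolean[] valid = new boolean[boxed.length];
        for (int i = 0; i < boxed.length; i++) {
            if (boxed[i] != null) {
                values[i] = boxed[i];
                valid[i] = true;
            }
        }
        return new Column(values, valid);
    }
}
```

Passing new Long[] {1L, null, 3L} yields values {1, 0, 3} with validity {true, false, true}, which is the pairing a caller previously had to maintain by hand.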
Changed
- Breaking — ExtensionEncoder.encodeAll is now abstract. The default body threw VortexException; every implementation already overrides it, so the contract now fails at compile time rather than at runtime. (2dcd69ce)
- Breaking — Estimate is now an enum { SKIP, ALWAYS_USE, COMPLETE }. The sealed interface with empty Skip/AlwaysUse records, the skip()/alwaysUse() factories, and the null "no verdict" sentinel are gone; COMPLETE is the explicit defer-to-sample-encode verdict. (c355a4bf)
- Reader cleanups: dropped a dead length < 0 blob check and a redundant offset > fileSize bounds clause, reused the shared PTypeIO little-endian layouts, and removed redundant numeric casts flagged by static analysis. (5d5fcc45, 36328285, 04cab707)
Fixed
- Writer: I8/I16 columns are excluded from the global dictionary — the reader cannot decode a narrow-int dict, so dict-encoding them produced unreadable files. (473256b1)
- Writer: WriteRegistry now iterates encoders in a deterministic order and accepts() reports honestly, fixing a non-deterministic encoder selection that broke the Windows build. (9c4ebb18)
- Reader: Pco decode now guards preDeltaN against int overflow before clamping — the subtraction is widened to long, restoring the overflow-safe path. (b7346e7c)
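The Pco fix follows a general pattern: widen to long before subtracting, then clamp. The sketch below shows the pattern with hypothetical names and values; it is not the decoder's code.

```java
final class WidenedClampSketch {
    /** 32-bit subtraction: can wrap around before the clamp ever sees the value. */
    static int naiveClampedDifference(int a, int b, int max) {
        return Math.max(0, Math.min(a - b, max));
    }

    /** 64-bit subtraction: the difference of two ints always fits in a long. */
    static int clampedDifference(int a, int b, int max) {
        long diff = (long) a - (long) b;
        return (int) Math.max(0L, Math.min(diff, (long) max));
    }
}
```

With a = 10, b = Integer.MIN_VALUE and max = 100, the 32-bit version wraps to a negative number and clamps to 0, while the widened version clamps the true difference to 100.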
Build
- Zero-warning rule: -Xlint:all -Werror across all modules. The classfile lint (which only flags missing annotation class files inside third-party Arrow bytecode) is scoped off in the two Arrow-using modules only. (dab467e5, 43f6f840)
- Mutation testing (PIT): opt-in pitest profiles in core, reader, and writer, scoped to the bounds/parse classes (IoBounds, PTypeIO, WriteRegistry, ChunkImpl, …), with common config hoisted into the parent POM. (46904b24, ed8c98a1, 1200c76b, 840cc46a)
- SonarCloud: generated fbs/ and proto/ sources excluded from analysis (machine output, not hand-maintained); the deliberate per-width SIMD-loop duplication is documented in ADR 0005 rather than refactored away. Code smells dropped 857→394; coverage ~81%, all ratings A, zero bugs/vulnerabilities. (6c591293)
Tests
- Property-based lossless round-trips added for ALP (f32/f64), Delta/FoR/ZigZag/AlpRd, a bitpacked bit-width sweep, the full CascadingCompressor (every codec × cascade depth 0–3), and a Pco seeded-random distribution sweep. (dbe44aaa, a2cf3443, aede11d7, 115dd6fd, a426c1de)
- Mutation-driven test hardening lifted core/reader/writer bounds and registry classes to 99–100% kill rate. (2235499a, c9243f9a, 912fcaff)
- Integration: added Java↔Rust round-trips for vortex.patched, fastlanes.delta, and masked encodings. (13702764)
- CLI: terminal smoke tests now force class initialization so the FFM libc/kernel32 symbol resolution is actually exercised. (3f741ef7)
v0.8.0
Read and write Vortex Variant (semi-structured, JSON-shaped) columns from Java. Internally, transform encodings now decode lazily, trimming per-decode allocation. This release also hardens the reader's bounds handling on untrusted input (ADR 0003 Phase E), fixes CSV-import memory blow-ups on large files, and lifts test coverage to 80% with all Sonar ratings at A.
Added
- Writer: vortex.variant encoder. Encodes a variant column as the canonical vortex.variant container over core_storage — an all-equal column becomes a single vortex.constant, a row-varying column a vortex.chunked of per-run constants — with an optional row-aligned typed shredded child recorded in VariantMetadata.shredded_dtype. Input is VariantData(List<Scalar>) with .constant(n, v)/.shredded(...) factories. Java↔Rust (JNI) round-trip verified for constant, row-varying, and shredded columns. Scalar values only — arbitrary nested objects need vortex.parquet.variant (deferred, ADR 0014). (35da529d, e4e44980, 4566dca0)
- Reader: variant columns now decode Java-side. ConstantEncodingDecoder and ChunkedEncodingDecoder handle DType.Variant (materialising the inner-typed array); VariantEncodingDecoder wraps the result as VariantArray, exposing coreStorage() and shredded(). (76e4c741, 4566dca0)
Security
- Reader bounds hardening (ADR 0003 Phase E): untrusted offsets/lengths from file metadata now flow through a typed IoBounds helper that throws VortexException instead of a raw IndexOutOfBoundsException, and hand-rolled index guards were replaced with Objects.checkIndex. A crafted flat-segment file can no longer trip an unchecked array access during decode. (e9af80d6, 3bcd9881, a5ce8380)
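A bounds helper of the kind described might look like the sketch below. It is not the library's IoBounds: checkRange and MalformedFileException are hypothetical, and the only real API used is java.util.Objects.checkFromIndexSize (the long overload, available since Java 16).

```java
import java.util.Objects;

final class BoundsSketch {
    /** Hypothetical stand-in for the library's VortexException. */
    static final class MalformedFileException extends RuntimeException {
        MalformedFileException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Validates that [offset, offset + length) lies inside a file of fileSize bytes. */
    static void checkRange(long offset, long length, long fileSize) {
        try {
            // Rejects negative inputs and avoids overflow when adding offset + length.
            Objects.checkFromIndexSize(offset, length, fileSize);
        } catch (IndexOutOfBoundsException e) {
            throw new MalformedFileException(
                "segment [" + offset + ", +" + length + ") outside file of " + fileSize + " bytes", e);
        }
    }
}
```

Routing every untrusted offset through one helper keeps the failure type uniform, so callers only need to handle the domain exception.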
Fixed
- CSV import: large files no longer OOM. The importer now streams rows in a single pass (buffering only the first chunk for schema inference) and disables the global-dictionary pass by default, which previously accumulated every distinct value in memory. (d5280ae2, 0b6784b5, 62863616)
- CLI: IoWorker.runAndAwait decremented its in-flight counter after signaling completion, so a caller reading pending() right after it returned could still see the task counted; the counter is now decremented before the await returns. The view/tui commands also close the opened VortexHandle on every error path (openOnWorker returns Optional). (95c06b1a, 27446d81)
- Reader: BoolArray.materialize now masks the accumulator byte before the bit-set OR, removing a sign-promotion footgun in the packed-bitmap write. (bc8e9d4e)
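The sign-promotion hazard in the last entry is a general Java trait: a byte is widened to int with sign extension before any bitwise operator applies. The two hypothetical helpers below show the difference a mask makes; they are not taken from BoolArray.

```java
final class SignPromotionSketch {
    /** The byte is sign-extended first, so a set top bit floods bits 8..31. */
    static int orUnmasked(byte acc, int bits) {
        return acc | bits;
    }

    /** Masking with 0xFF keeps only the byte's own eight bits before the OR. */
    static int orMasked(byte acc, int bits) {
        return (acc & 0xFF) | bits;
    }
}
```

For acc = (byte) 0x80 and bits = 0x100, the unmasked form yields 0xFFFFFF80 and the masked form yields 0x180.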
Changed
- Decode shape: transform encodings now decode lazy-only. The eager Materialized*Array fallbacks were removed from vortex.zigzag (all PTypes + broadcast, cd59fefa), fastlanes.for (all integer PTypes, d7953e1f), vortex.alp (broadcast-without-patches, deab8067), vortex.constant (Decimal → LazyConstantDecimalArray, a6a9611e), vortex.runend (Bool → LazyRunEndBoolArray, 0bbcb81f), vortex.sparse (Bool → LazySparseBoolArray, db2e955b), and fastlanes.rle (validity → OffsetBoolArray, empty → LazyConstantXxxArray, 5e83a5c3). Decompression encodings (bitpacked, pco, zstd, fsst, delta, patched), the primitive base, the vortex.dict encoding-level path, and the vortex.alp patches path stay Materialized by design. See ADR 0015.
- Breaking — sealed Array permits changed. DecimalArray is now a non-sealed family interface (decimal arrays moved from implements Array to implements DecimalArray), so decimal joins the per-dtype family layer. Downstream exhaustive switch over Array must add a case DecimalArray. (a6a9611e)
- Breaking — Array API. Array.truncate(rows) renamed to Array.limited(rows) and made an abstract operation implemented by every array (composites slice their children); raw-segment access moved off the ArraySegments utility onto Array.materialize(SegmentAllocator) and Array.segmentIfPresent(). (87ab65e2, 4d9ac1f8, 332b067e, 32a35e03)
- CSV import reports progress every 10K rows instead of per-chunk. (07a056e7)
Removed
- Breaking — EmptyArray removed from the sealed Array permits. It was never emitted by the reader (empties are zero-length typed arrays in their own family) and broke the dtype→family invariant (EmptyArray(I64) was not a LongArray). Represent an empty column as a zero-length array of the appropriate family. (3a4dcdfa)
Documentation
- ADR 0016: captures vortex-arrow bridge interop options (separate module / Arrow C-Data / none); deferred until a concrete downstream need. (a6126f29)
Tests
- Test coverage raised from ~74% to 80% — the lazy/chunked/dict/run-end/sparse array families, ChunkImpl, and several decoders (DecimalEncodingDecoder, DictEncodingDecoder, ParquetImporter) reached full line + branch coverage. SonarCloud quality gate green: reliability, security, and maintainability all at A, zero bugs and vulnerabilities.
v0.7.3
Parquet ZSTD support, vortex.patched encoder, constant-encoding selection fix, Windows TUI raw-mode fix.
Added
- Parquet: ZSTD-compressed Parquet import — zstd-jni was an optional dep in hardwood and had to be declared explicitly. NYC Yellow Taxi 2024-01 (47.6 MB Parquet, 2.96 M rows × 19 cols) imports to 40.7 MB Vortex — 14% smaller than the Rust JNI reference (47 MB) thanks to the global-dict encoder catching low-cardinality F64 columns.
- Writer: vortex.patched encoder — identifies outlier values that exceed the optimal bit width, zeros them in the inner array (exposed as an open cascade child for further bitpacking), and stores their within-chunk U16 indices and raw values separately.
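The patching idea can be sketched as a split and a restore. This is an illustration, not the encoder: the bit width is passed in rather than chosen, indices are plain ints rather than U16, negative values are simply treated as outliers, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PatchedSketch {
    /** Hypothetical result: zeroed inner values plus the outliers stored on the side. */
    record Patched(long[] inner, int[] patchIndices, long[] patchValues) {}

    /** Moves every value that does not fit in bitWidth bits out of the inner array. */
    static Patched split(long[] values, int bitWidth) {
        if (bitWidth < 1 || bitWidth > 62) {
            throw new IllegalArgumentException("bitWidth out of range");
        }
        long limit = 1L << bitWidth;
        long[] inner = values.clone();
        List<Integer> outliers = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            if (values[i] < 0 || values[i] >= limit) {
                outliers.add(i);
                inner[i] = 0; // now packs cleanly at bitWidth bits
            }
        }
        int[] indices = new int[outliers.size()];
        long[] patches = new long[outliers.size()];
        for (int k = 0; k < indices.length; k++) {
            indices[k] = outliers.get(k);
            patches[k] = values[indices[k]];
        }
        return new Patched(inner, indices, patches);
    }

    /** Writes the stored outliers back over their zeroed slots. */
    static long[] restore(Patched p) {
        long[] out = p.inner().clone();
        for (int k = 0; k < p.patchIndices().length; k++) {
            out[p.patchIndices()[k]] = p.patchValues()[k];
        }
        return out;
    }
}
```

Splitting {1, 2, 1000, 3} at 4 bits leaves an inner array of {1, 2, 0, 3} and a single patch (index 2, value 1000), so one outlier no longer forces the whole chunk to a wide bit width.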
Fixed
- CLI: Windows TUI raw-mode — readKey now calls ReadFile directly on the kernel handle obtained via GetStdHandle instead of reading from System.in. Java's System.in goes through JVM-internal CRT wrappers that ignore SetConsoleMode, so every keypress previously required Enter before the TUI reacted.
- Writer: constant encoding skipped for single-distinct-value columns — isDictCandidate returned true for distinctCount == 1, routing all-same-value columns through the global-dict path instead of vortex.constant.
Changed
- CLI: polling loop in Terminal.readKey(Duration) extracted to KeyDecoder.nextWithTimeout(InputStream, Duration) — eliminates duplication between PosixTerminal and WindowsTerminal.
Tests
- Integration: TaxiParquetOracleVsJavaIntegrationTest — hardwood reads the taxi Parquet to a CSV (oracle); ParquetImporter → CsvExporter produces a second CSV (SUT); line-by-line diff must be zero. Proves the importer loses no data across 2.96 M rows × 19 columns.
Full changelog: https://github.com/dfa1/vortex-java/blob/main/CHANGELOG.md#0.7.3
v0.7.2
CLI usability + reader robustness on real-world files (NYC Yellow Taxi).
Added
- CLI view <file> — scrollable Excel-like grid TUI. Streams rows on demand via a new LazyGridSource (one live chunk at a time, formats only the visible window). Title bar shows chunk K/N. Quit with q/Esc.
- CLI export writes to a derived <name>.csv next to the input file by default, with a stderr progress bar mirroring the import flow. Use export <file.vortex> - to stream to stdout, or export <file.vortex> out.csv for an explicit path.
- Reader ScanIterator.chunkRowCounts() — returns per-chunk row counts by walking the layout tree, no value decode. Used by the view TUI to plan navigation.
- Lazy vortex.decimal decode via new LazyDecimalArray record — zero-copy mmap slice + per-row BigDecimal materialisation. Replaces the GenericArray wrapper.
- 7 Offset*Array records (Long / Int / Short / Byte / Double / Float / Bool) + VarBinArray.SlicedMode for offset-based slicing of pre-decoded shared arrays.
Fixed
- Per-column chunking alignment. Files where one column has 1 mega-flat and another has N small flats (e.g. NYC Yellow Taxi 2024-01: 2.96M-row VendorID next to 23 × 131072-row datetime flats) now decode the wide column once into a shared Arena and slice it per chunk via Offset*Array. Previously the scan iterator emitted a single chunk whose datetime columns were the first 131072 rows only — silently dropping 95.6 % of the file.
- FrameOfReferenceEncodingDecoder now uses the arena variant of ArraySegments.of, so lazy children (e.g. LazyRunEndLongArray) materialise instead of throwing "no primary segment".
Docs
- Compatibility table refreshed: constant, varbinview, alprd, datetimeparts, decimal_byte_parts, decimal now reflect their shipped Lazy shape; container encodings (list/listview/fixed_size_list) marked Lazy (inherit child shape); patched pinned Materialized with reasoning.
- New ADR 0013 — policy for dropping Materialized fallbacks once Lazy ships.
Maven Central: io.github.dfa1.vortex:vortex-reader:0.7.2 (and vortex-writer, vortex-cli, etc.).
v0.7.1
Cleanup release on top of 0.7.0 — one more lazy encoding, a Windows TUI usability fix, and a fresh round of read benchmarks.
Added
- vortex.constant lazy decode — seven metadata-only LazyConstantXxxArray records (Long / Int / Double / Float / Short / Byte / Bool) replace the one-element broadcast buffer; the per-element broadcast-modulo path is gone (3edf6e8c)
- Top-N read benchmarks (N=10, 100) + README table, refreshed 80M-row numbers (c00fdf7f, 33714d7b, a6fd92fc)
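A metadata-only constant array can be as small as a value and a length. The record below is a hypothetical illustration of that shape, not one of the library's LazyConstantXxxArray types.

```java
import java.util.Objects;

final class LazyConstantSketch {
    /** Hypothetical constant column: no buffer, every valid index returns the same value. */
    record LazyConstantLong(long value, long length) {
        long get(long index) {
            Objects.checkIndex(index, length); // long overload, Java 16+
            return value;
        }
    }
}
```

A billion-row constant column costs two longs in this form, and reading a row needs no modulo against a one-element buffer.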
Changed
- CLI: schema prints per-row column listing (9b3fe4b5)
- CLI: Terminal.readKey takes Duration instead of long ms (2942a4da)
- Reader: extract TimeDtype + TimestampDtype shared metadata helpers (8f1b9feb)
Fixed
- CLI: actionable error on Git Bash / MinTTY — GetConsoleMode failure now points users at winpty / Windows Terminal / PowerShell instead of dead-ending on the raw error (6ec42288)
- Reader: ArraySegments.of(arr) typed-accessor fallback for lazy arrays (74ec207b)
CI
- Drop sonar.cpd.exclusions (cde845bf)
Full changelog: v0.7.0...v0.7.1