[WIP][POC] Pfor encoding #579

Draft
prtkgaur wants to merge 2 commits into apache:master from prtkgaur:pforEncoding

prtkgaur commented Jun 3, 2026

Rationale for this change

What changes are included in this PR?

Do these changes have PoC implementations?

PFOR is an integer compression encoding (encoding number 11) for INT32
and INT64 columns. It subtracts the minimum value from every value (frame
of reference, FOR), selects an optimal bit width via a histogram-based cost
model, bit-packs the resulting deltas, and stores outlier values that do not
fit in the chosen width as exceptions, together with their positions.
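The pipeline described above can be sketched in Python as follows. This is only an illustration of the general PFOR idea, not the byte layout or cost model defined in Encodings.md: the fixed per-exception cost (assumed to be a 32-bit position plus a 64-bit value), the little-endian bit packing, and the helper names `choose_bit_width`, `pack_bits`, `unpack_bits`, `pfor_encode`, and `pfor_decode` are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical cost of one exception: a 32-bit position plus a 64-bit value.
# The actual PFOR spec may account for exceptions differently.
EXCEPTION_COST_BITS = 96


def choose_bit_width(deltas: List[int], exception_cost_bits: int) -> int:
    """Pick the width b minimizing n*b + (#deltas wider than b) * exception cost."""
    max_bits = max(d.bit_length() for d in deltas)
    # histogram[k] = number of deltas that need exactly k bits
    histogram = [0] * (max_bits + 1)
    for d in deltas:
        histogram[d.bit_length()] += 1

    n = len(deltas)
    best_width, best_cost = max_bits, n * max_bits
    exceptions = 0  # number of deltas needing more than b bits at the current b
    for b in range(max_bits, -1, -1):
        cost = n * b + exceptions * exception_cost_bits
        if cost < best_cost:
            best_width, best_cost = b, cost
        exceptions += histogram[b]  # these no longer fit once the width drops below b
    return best_width


def pack_bits(deltas: List[int], width: int) -> bytes:
    """Pack each delta into `width` bits, least significant bits first."""
    acc = 0
    for i, d in enumerate(deltas):
        acc |= d << (i * width)
    return acc.to_bytes((len(deltas) * width + 7) // 8, "little")


def unpack_bits(data: bytes, width: int, count: int) -> List[int]:
    """Inverse of pack_bits: read `count` fields of `width` bits each."""
    acc = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


def pfor_encode(
    values: List[int], exception_cost_bits: int = EXCEPTION_COST_BITS
) -> Tuple[int, int, bytes, List[Tuple[int, int]]]:
    """Return (base, bit width, packed deltas, [(position, value), ...])."""
    if not values:
        return 0, 0, b"", []
    base = min(values)  # frame of reference
    deltas = [v - base for v in values]  # all deltas are >= 0
    width = choose_bit_width(deltas, exception_cost_bits)
    limit = 1 << width  # deltas >= limit do not fit in `width` bits
    packed: List[int] = []
    exceptions: List[Tuple[int, int]] = []
    for pos, d in enumerate(deltas):
        if d < limit:
            packed.append(d)
        else:
            exceptions.append((pos, values[pos]))  # keep the original value
            packed.append(0)  # placeholder slot, patched on decode
    return base, width, pack_bits(packed, width), exceptions


def pfor_decode(
    base: int,
    width: int,
    packed: bytes,
    exceptions: List[Tuple[int, int]],
    count: int,
) -> List[int]:
    """Rebuild the values, then patch the exceptions back in."""
    values = [base + d for d in unpack_bits(packed, width, count)]
    for pos, value in exceptions:
        values[pos] = value
    return values
```

For instance, encoding 64 values cycling through 100 to 103 followed by a single 1,000,000 yields base 100, a 2-bit width, and one exception at position 64, instead of widening every slot to the 20 bits the outlier alone would need. The histogram lets every candidate width be costed in a single pass rather than rescanning the deltas once per width.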

Adds the full encoding specification in Encodings.md and the PFOR = 11
enum entry in parquet.thrift.
Remove stray colon from "Patched Frame of Reference: (PFOR = 11)"
to be consistent with other headings like "Delta Encoding (DELTA_BINARY_PACKED = 5)".

alamb (Contributor) commented Jun 5, 2026

What are your thoughts about using FastLanes (https://15721.courses.cs.cmu.edu/spring2024/papers/03-data2/p2132-afroozeh.pdf)?

It includes frame of reference along with several other schemes, and is reported to have very good performance.

