Crunchy

Crunchy is a fast, versioned, backwards-compatible data serialization library for C (>= C11). More specifically, Crunchy is a compiler that turns a DSL text file containing type definitions into C code for de/serializing those types.

Why Use Crunchy

The main selling point of Crunchy is that the structs you use in your program for state representation are exactly what gets serialized. There are no accessor functions, no getters and setters. There is no need for a separate struct that represents the serialization format, with conversion code going back and forth. You define a struct, and crunchy generates the code to serialize and deserialize that struct. The serialization function takes a pointer to that struct and produces the binary representation. The deserialization function takes the binary representation and writes to a pointer to your struct. That's it.

The second reason to use Crunchy is that it has a much better implementation of backwards compatibility than the other popular options. The Crunchy compiler produces a lock file, which is a text representation of the current state of your binary data definition. Any time you edit your struct definitions, the lock file is checked and compilation fails if you made a change that would break compatibility. Other formats like Protobuf and Cap'n Proto require developer discipline and careful edits to never introduce breaking changes. Their compilers do not help you in this regard.

Crunchy also has an enormous advantage over other popular serialization compilers: it is designed to be "vendored" (copied into your project). Crunchy is a single header file with an optional C file for a CLI interface. This means that your serialization format cannot change out from under you. You are in complete control of something that needs to be absolutely stable for your application to work correctly. You don't have to worry about vendoring an absolutely enormous C++ DSL compiler with its own specific build system dependencies. The crunchy header file is about 7000 lines and the compilation for the CLI interface is cc crunchy.c.

Crunchy is designed for programs that load serialized data once. Application state and network messages are good examples. Crunchy is not the best choice for workloads that read a small subset of fields from a large number of records and then discard them (e.g., scanning a large log file for a single field). Accessor-based formats like FlatBuffers are better suited for that pattern.

Crunchy is designed with three main goals in mind:

The binary format should be flexible enough to use for saved application state, network communication, and IPC.
- A non-goal of this project is to make it flexible enough to design something like a high performance database file format. You'd want to use something completely custom for that purpose anyway, not a general purpose tool like Crunchy.
The integration into a project should be as painless as possible.
- To that end, the result of deserialization is a plain struct
- No accessor functions
- The data you serialize is the exact same struct type as your normal application state struct
- No copying the data into a separate message type first
The binary format should be backwards compatible.
- It should be impossible to introduce backwards incompatible or data-corrupting changes without intentionally doing so.

Example

Here's how it works:

You write a "crunchy file", a text file containing all of the type definitions of your program that you want to use in serialization.
Crunchy will then write a header file containing the C representations of these types. This way the crunchy file becomes the single source of truth and you don't need to keep two definitions in sync.
Crunchy will also write a source file containing the functions necessary for serialization and deserialization.
Finally, a lock file will be produced. This file is used by subsequent invocations of crunchy to make sure that your edits to the crunchy file did not introduce any breaking changes.
Include the crunchy file and crunchy lock file in your source control.
Include both the header and source files in your C build. Use the types as you normally would. They are just normal C structs.

That's pretty much it for most setups.

A crunchy file can look something like the following:

struct Vehicle
{
    ROOT;
    VERSION = 1;
    SIGNATURE = "MY_APP_VEHICLE";

    V(1) u32 vehicle_make_id;
    V(1) u32 vehicle_model_id;
    V(1) u16 vehicle_year;
    // This field was added for version 2, vehicles saved in version
    // one will not have this field but they can still be loaded by
    // version 2 code.
    V(2) u32 odometer_reading;
}

When you run the compiler you will get a header file and a C source file with the following definitions

#include <stdint.h>

struct Vehicle
{
  uint32_t vehicle_make_id;
  uint32_t vehicle_model_id;
  uint16_t vehicle_year;
  uint32_t odometer_reading;
};

void crunchy_serialize_vehicle(JSLOutputSink sink, const Vehicle* data, void* context);

bool crunchy_deserialize_vehicle(JSLAllocatorInterface allocator, JSLImmutableMemory binary_data, Vehicle* out_data, void* context);

Let's look at a more complicated example:

enum ColorTheme
{
    COLOR_THEME_LIGHT = 1,
    COLOR_THEME_DARK = 2,
    COLOR_THEME_HIGH_CONTRAST = 3
}

custom MyStringType
{
    BASE_TYPE = u8;
    SERIALIZER = serialize_my_string;
    DESERIALIZER = deserialize_my_string;
}

struct UserSettings
{
    VERSION = 3;

    V(1) ColorTheme color_theme;
    // This field was added for version 2
    V(2) i32 spaces_per_indent;
    V(2) bool uses_tabs;
    // This field was added for version 1 and was last written in
    // version 2. Version three will not save this field at all
    // but will skip it on deserialization
    V(1, 2) bool auto_save_before_exit;
}

struct User
{
    ROOT;
    VERSION = 1;
    SIGNATURE = "MY_USER";

    V(1) UserSettings settings;
    V(1) MyStringType name;
}

Crunchy File Change Safety

Crunchy's lock file makes it impossible to introduce data-corrupting changes without intentionally doing so. Every time you run the crunchy compiler, it compares your current crunchy file against the lock file produced by the previous compilation. If any change would break the ability to read previously serialized data, the compiler refuses to generate code and reports the exact problem.

What the lock file checks

Type-level checks (all declaration kinds)

Root type removal — You cannot remove a ROOT struct that existed in the lock file. Root structs are serialization entry points, so dropping one would leave any previously serialized data with no way to be read back.
Non-root type removal — Non-root structs, enums, and custom types may be removed, but only once nothing else in the schema still references them. The compiler enforces this implicitly: if any other type still uses the removed type, semantic analysis fails before the lock file check even runs.
Declaration kind change — A type cannot change from one kind to another (e.g., a struct cannot become an enum).

Struct checks:

VERSION cannot decrease — The struct's version number must never go down.
VERSION cannot jump by more than 1 — Incrementing by more than one suggests missing intermediate lock file updates that may have introduced fields tied to skipped versions.
SIGNATURE cannot be removed or changed — If a struct had a signature, it must keep the exact same signature.
Fields cannot be deleted unless the field has an end version set and that end version is at or below the struct's MINIMUM_VERSION. This ensures the field is no longer present in any data the struct claims to support.
Fields cannot be deleted without MINIMUM_VERSION — If a struct has no MINIMUM_VERSION, no fields can be removed at all.
New fields cannot have a useless end version — A new field whose end version is already below MINIMUM_VERSION would never appear in any supported data and is flagged as an error.
Field version_start cannot change — The version at which a field was introduced is permanent.
Field version_end rules — Once a field's end version has "taken effect" (meaning the struct's VERSION has advanced past it, so data exists without the field), the end version cannot be changed or removed. If the end version has not yet taken effect, it can be adjusted, but not to a value below the locked VERSION.
Static array length cannot change — A field that was a static array must remain a static array of the same length. A scalar field cannot become an array, and an array field cannot become a scalar.
SKIP cannot be added or removed — The SKIP attribute changes serialization behavior and is permanent once set.
field_id cannot change — Each field's unique ID is assigned once and locked forever.
Field type cannot change — A field's type (whether builtin or user-defined) is permanent. You cannot change an i32 to an i64, or swap one struct type for another.
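
A few of the struct rules above can be written down as small predicates. The following is only a sketch of the rules as stated in this list, not the compiler's actual implementation; the function names and parameters are hypothetical and do not appear in crunchy.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of crunchy.h): true when moving a struct
   from the locked VERSION to a new VERSION obeys the two VERSION rules. */
bool version_change_allowed(uint32_t locked_version, uint32_t new_version)
{
    if (new_version < locked_version) {
        return false; /* VERSION cannot decrease */
    }
    if (new_version - locked_version > 1) {
        return false; /* VERSION cannot jump by more than 1 */
    }
    return true;
}

/* Hypothetical helper: true when a field may be deleted from the schema.
   has_min_version is false when the struct has no MINIMUM_VERSION. */
bool field_deletion_allowed(bool has_end_version, uint32_t end_version,
                            bool has_min_version, uint32_t min_version)
{
    if (!has_min_version) {
        return false; /* no MINIMUM_VERSION: nothing can be removed */
    }
    if (!has_end_version) {
        return false; /* the field must have an end version set */
    }
    /* the end version must be at or below MINIMUM_VERSION */
    return end_version <= min_version;
}
```

For example, a struct locked at VERSION 2 may stay at 2 or move to 3, but moving to 4 or back to 1 would be rejected under these rules.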

Enum checks:

Items cannot be removed — Enum items are permanent. If you need fewer items, create a new enum type and migrate.
Item values cannot change — The integer value associated with each enum item is locked.

Custom type checks:

BASE_TYPE cannot change — The struct type that backs the custom container is permanent.
SERIALIZER cannot change — The serialization function name is locked.
DESERIALIZER cannot change — The deserialization function name is locked.

Reversion Safety

TL;DR: git reverts are inherently dangerous and there's nothing Crunchy can do about that.

One scenario the lock file cannot protect against is git reverts of lock file changes when data has already been written with the reverted schema.

Consider this sequence:

You add a field score to a struct, giving it field_id=5 and type_id for its type. The lock file is updated and committed.
Your application serializes data using this schema — binary files now contain field headers with type_id + field_id=5 encoding score values.
You revert the commit, restoring the old lock file that has no knowledge of score.
You add a different field rating to the same struct. Because the lock file no longer knows about field_id=5, crunchy assigns field_id=5 to rating.

Now rating and the old score share the same field_id, and crunchy has no way to detect this — the lock file looks internally consistent.

However, the wire format provides significant protection against silent data corruption in this scenario. Every field in the binary stream is tagged with both a type_id (identifying the data type) and a field_id (identifying the field). If score was an i32 (type_id=6) and rating is an f64 (type_id=11), the deserializer will encounter a type_id that does not match what it expects for field_id=5. Because the generated deserialization code reads fields by matching the field_id and then consuming a fixed number of bytes determined by the type_id, a type mismatch will cause the deserializer to consume the wrong number of bytes, which will cascade into subsequent field header reads producing invalid type_id or field_id values. This will almost certainly cause deserialization to fail rather than silently produce corrupt data.

The one case where this protection does not help is when the old and new fields happen to share both the same field_id and the same type_id — for example, replacing one i32 field with a different i32 field. In that case, the deserializer has no way to distinguish the old field's bytes from the new field's bytes, and will silently interpret the old value as the new field. This is a narrow case, but it is worth understanding: the lock file is only as reliable as its git history, and reverting lock file changes after data has been written is an inherently unsafe operation that crunchy cannot fully guard against.
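
To make the two cases concrete, here is a minimal sketch of a field header check. It assumes the 4-byte header layout described in the wire format section (type_id, then field_id, both little-endian U16). The function name is hypothetical, and the generated code may detect the mismatch differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian U16 without assuming the host byte order. */
static uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical check: does the header at buf carry the type_id and
   field_id that the current schema expects for this field? */
bool header_matches(const uint8_t *buf, size_t len,
                    uint16_t expected_type_id, uint16_t expected_field_id)
{
    if (len < 4) {
        return false; /* not enough bytes for a field header */
    }
    return read_u16_le(buf) == expected_type_id
        && read_u16_le(buf + 2) == expected_field_id;
}
```

With old data containing score as an i32 (type_id=6, field_id=5), a schema that expects rating as an f64 (type_id=11, field_id=5) fails this check. A schema that expects rating as an i32 passes it, which is the silent case described above.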

Current Performance

As of writing, serialization runs at about 3.4-3.8 GB/second and deserialization at about 5.6-6.2 GB/second. These numbers are measured purely with in-memory operations. File loading and integrity checks are not included.

You can double check this on your machine with the benchmark/run_bench.sh script.

Given Crunchy's feature set, these numbers are within the bounds of acceptable and expected performance. Crunchy has several features which trade off speed for other considerations:

Crunchy allows field reordering. Removing this one feature would speed up serialization and deserialization by about a factor of 2.5, because instead of needing a loop + switch for deserialization you can just do a series of reads in a row. It is included anyway because Crunchy produces your struct code, so you need to be able to reorder the fields however you want for bit packing and byte padding reasons.
Crunchy reads and writes type ids and field ids. That's a four byte overhead for every single field. So a simple struct like a vec2 would have a type id + field id for the struct and then again for the two fields, meaning a bookkeeping overhead of 150%. Crunchy does this despite the downsides because:
- The field ID allows the field reordering mentioned above
- The type id + field id are your main defense against accidentally loading a divergent version of a serialized file. See the section "Reversion Safety" for more details
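
The 150% figure can be reproduced with a little arithmetic. This sketch assumes the vec2 holds two 4-byte components (for example two f32 values) and counts only the 4-byte field headers as bookkeeping, as the text does. The function name is hypothetical.

```c
#include <stddef.h>

/* Each field header is a U16 type_id plus a U16 field_id. */
#define FIELD_HEADER_BYTES 4u

/* Hypothetical helper: header bytes as a percentage of payload bytes. */
size_t bookkeeping_overhead_percent(size_t header_count, size_t payload_bytes)
{
    return header_count * FIELD_HEADER_BYTES * 100u / payload_bytes;
}
```

A vec2 stored as a nested field has three headers (one for the struct field, one per component) and 8 payload bytes, so bookkeeping_overhead_percent(3, 8) gives 12 * 100 / 8 = 150.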

Crunchy File Formal Grammar

Program              ::= { CommentOrWS IncludeDir } { CommentOrWS Decl } EOF

IncludeDir           ::= "#include" IncludePath
IncludePath          ::= String | "<" AnglePath ">"
AnglePath            ::= /[^>\n]+/

Decl                 ::= { CommentOrWS } (StructDecl | EnumDecl | CustomDecl | UnionDecl)

StructDecl           ::= "struct" Identifier StructBody
StructBody           ::= "{" StructDirectives StructMembers "}"

StructDirectives     ::= { CommentOrWS | StructDirective }
StructDirective      ::= NoEmitDirective | RootDirective | VersionDirective | SignatureDirective | MinVersionDirective

StructMembers        ::= { CommentOrWS StructMember }
StructMember         ::= VersionedField ";"

VersionedField       ::= { CommentOrWS } [ "SKIP" ] VersionTag TypeSpec Identifier [ ArraySuffix ]
VersionTag           ::= "V" "(" Number [ "," Number ] ")"    // inclusive start[,end]
TypeSpec             ::= BuiltinType | Identifier             // user type or builtin
ArraySuffix          ::= "[" Number "]"                       // [N] fixed

EnumDecl             ::= "enum" Identifier EnumBody
EnumBody             ::= "{" EnumDirectives EnumItems "}"

EnumDirectives       ::= { CommentOrWS | EnumDirective }
EnumDirective        ::= NoEmitDirective

EnumItems            ::= { CommentOrWS EnumItem ","? }
EnumItem             ::= { CommentOrWS } Identifier "=" Number 

CustomDecl           ::= "custom" Identifier CustomBody
CustomBody           ::= "{" CustomDirectives "}"
CustomDirectives     ::= { CommentOrWS CustomDirective }
CustomDirective      ::= CustomBaseType | CustomSerializer | CustomDeserializer
CustomBaseType       ::= "BASE_TYPE" Whitespace "=" Whitespace Identifier ";"
CustomSerializer     ::= "SERIALIZER" Whitespace "=" Whitespace Identifier ";"
CustomDeserializer   ::= "DESERIALIZER" Whitespace "=" Whitespace Identifier ";"

UnionDecl            ::= "union" Identifier UnionBody
UnionBody            ::= "{" UnionDirectives UnionVariants "}"

UnionDirectives      ::= { CommentOrWS | UnionDirective }
UnionDirective       ::= VersionDirective | MinVersionDirective | NoEmitDirective | TagDirective

UnionVariants        ::= { CommentOrWS UnionVariant }
UnionVariant         ::= { CommentOrWS } VersionTag TagReference TypeSpec Identifier ";"

TagDirective         ::= "TAG" Whitespace "=" Whitespace Identifier ";"
TagReference         ::= "TAG" "(" Identifier ")"

NoEmitDirective      ::= "NO_EMIT;"
RootDirective        ::= "ROOT;"
VersionDirective     ::= "VERSION" Whitespace "=" Whitespace Number Whitespace ";"
SignatureDirective   ::= "SIGNATURE" Whitespace "=" Whitespace String Whitespace ";"
MinVersionDirective  ::= "MINIMUM_VERSION" Whitespace "=" Whitespace Number Whitespace ";"

BuiltinType          ::= Identifier
Identifier           ::= /[A-Za-z_][A-Za-z0-9_]*/
Number               ::= /[0-9]+/
String               ::= /"[^"\n]*"/

CommentOrWS          ::= Whitespace | LineComment | BlockComment
Whitespace           ::= /[ \t\r\n]+/
LineComment          ::= "//" /[^\n]*/
DocLineComment       ::= "///" /[^\n]*/    // doc-intent; treated same as Comment in grammar
BlockComment         ::= "/*" (not "*/")* "*/"
DocBlockComment      ::= "/**" (not "*/")* "*/"

Lock File Formal Grammar

The lock file is a line-oriented text format. Each line begins with a keyword that identifies the record. Declarations are separated by blank lines. Lines starting with // are comments and are ignored by the parser.

The parser is not order-sensitive within a declaration block — attribute lines may appear in any order. The grammar below reflects that flexibility.

LockFile         ::= { Comment | BlankLine | Decl }

Decl             ::= StructDecl | EnumDecl | CustomDecl | UnionDecl

StructDecl       ::= StructHeader { StructAttr } { StructField }
StructHeader     ::= "struct" Identifier Newline
StructAttr       ::= "type_id"      Integer Newline
                   | "version"      Integer Newline
                   | "min_version"  Integer Newline
                   | "next_field_id" Integer Newline
                   | "signature"    Identifier Newline
                   | "root"         Newline
                   | "no_emit"      Newline
StructField      ::= "field" Identifier TypeName "id=" Integer "v=" VersionRange
                     [ "array=" Integer ] [ "skip" ] Newline

EnumDecl         ::= EnumHeader { EnumAttr } { EnumMember }
EnumHeader       ::= "enum" Identifier Newline
EnumAttr         ::= "type_id" Integer Newline
                   | "no_emit" Newline
EnumMember       ::= "member" Identifier Integer Newline

CustomDecl       ::= CustomHeader { CustomAttr }
CustomHeader     ::= "custom" Identifier Newline
CustomAttr       ::= "type_id"      Integer Newline
                   | "base_type"    Identifier Newline
                   | "serializer"   Identifier Newline
                   | "deserializer" Identifier Newline

UnionDecl        ::= UnionHeader { UnionAttr } { UnionVariant }
UnionHeader      ::= "union" Identifier Newline
UnionAttr        ::= "type_id"     Integer Newline
                   | "version"     Integer Newline
                   | "min_version" Integer Newline
                   | "tag"         Identifier Newline
                   | "no_emit"     Newline
UnionVariant     ::= "variant" Identifier "type=" Identifier "tag_value=" Integer
                     "v=" VersionRange Newline

VersionRange     ::= Integer [ "," Integer ]    // start[,end] — inclusive
TypeName         ::= Identifier

Identifier       ::= /[A-Za-z_][A-Za-z0-9_]*/
Integer          ::= "-"? /[0-9]+/
Comment          ::= "//" /[^\n]*/ Newline
BlankLine        ::= Newline
Newline          ::= "\n"

Tokens on a line are separated by single spaces. The lock file is written by the compiler and is not intended to be edited by hand.
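
As an illustration of the StructField rule, the sketch below parses one field line with sscanf. The sample lines used with it are constructed from the grammar above rather than copied from a real lock file, and the LockField type and parse_lock_field function are hypothetical. The optional array= and skip tokens are left out to keep the example short.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical in-memory form of one "field" line. */
typedef struct {
    char name[64];
    char type_name[64];
    int  field_id;
    int  version_start;
    int  version_end;      /* only meaningful when has_version_end is true */
    bool has_version_end;
} LockField;

/* Parses: field <name> <type> id=<int> v=<start>[,<end>]
   Returns false if the line is not a well-formed field line. */
bool parse_lock_field(const char *line, LockField *out)
{
    int end = 0;
    int matched = sscanf(line, "field %63s %63s id=%d v=%d,%d",
                         out->name, out->type_name,
                         &out->field_id, &out->version_start, &end);
    if (matched < 4) {
        return false; /* keyword, name, type, id, or start version missing */
    }
    out->has_version_end = (matched == 5);
    out->version_end = out->has_version_end ? end : 0;
    return true;
}
```

Under these assumptions, the line "field auto_save_before_exit bool id=4 v=1,2" would parse to field_id 4 with an inclusive version range of 1 through 2.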

Wire Format Specification

Crunchy's binary wire format is a little-endian, fixed-width format with no alignment padding. All multi-byte integers and floats are stored in the host's native byte order (assumed little-endian). There are no length-prefixed containers at the top level — the format is a flat sequence of typed values whose layout is determined entirely by the schema.

Notation

In the diagrams below, each named block represents a contiguous run of bytes in the output stream. Sizes are in bytes. U16, U32, I32, I64, etc. refer to fixed-width little-endian integers. F32 and F64 are IEEE 754 floats stored in their native representation.

Struct Encoding

A struct is encoded as an optional signature, a version word, and then its fields in declaration order:

[signature bytes]    — raw bytes, only if SIGNATURE is set on the struct
version  : U32       — the struct's VERSION value
field_0              — first active field (see "Field Encoding" below)
field_1              — second active field
...
field_N              — last active field

Fields whose version range has expired (i.e. version_end < VERSION) are not written during serialization. Fields marked SKIP are also omitted. All other fields are written unconditionally — version guards on the deserializer side handle forward/backward compatibility.

Field Encoding

Every field is preceded by a 4-byte header that identifies the field's type and its unique field ID:

type_id  : U16       — identifies the data type (see "Type IDs" below)
field_id : U16       — unique, stable identifier for this field within its struct
<payload>            — the field's data, format depends on the type

The field_id is assigned when a field is first created and never changes, even if fields are reordered in the schema. This enables future order-independent deserialization.

The payload format depends on the field kind:

Builtin Scalar

A single value of the field's primitive type, written at its natural width:

Type	Payload Size (bytes)
`bool`	1 (U8)
`i8`	1
`u8`	1
`i16`	2
`u16`	2
`i32`	4
`u32`	4
`i64`	8
`u64`	8
`f32`	4
`f64`	8

Example — an i32 field with field_id=3 and value 42:

06 00        type_id = 6 (i32)
03 00        field_id = 3
2a 00 00 00  value = 42
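
The same eight bytes can be produced by hand with a few lines of C. This is a sketch of the layout described above, not the generated serializer; the helper names are hypothetical. Bytes are written one at a time so the result is little-endian regardless of the host.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a U16 in little-endian order; returns the number of bytes written. */
static size_t put_u16_le(uint8_t *out, uint16_t v)
{
    out[0] = (uint8_t)(v & 0xffu);
    out[1] = (uint8_t)(v >> 8);
    return 2;
}

/* Write a U32 in little-endian order; returns the number of bytes written. */
static size_t put_u32_le(uint8_t *out, uint32_t v)
{
    for (size_t i = 0; i < 4; i++) {
        out[i] = (uint8_t)((v >> (8 * i)) & 0xffu);
    }
    return 4;
}

/* Hypothetical encoder for one i32 field: header (type_id, field_id)
   followed by the 4-byte payload. The caller provides 8 bytes. */
size_t encode_i32_field(uint8_t out[8], uint16_t field_id, int32_t value)
{
    size_t n = 0;
    n += put_u16_le(out + n, 6);              /* type_id 6 = i32 */
    n += put_u16_le(out + n, field_id);
    n += put_u32_le(out + n, (uint32_t)value);
    return n;
}
```

Calling encode_i32_field(buf, 3, 42) fills buf with 06 00 03 00 2a 00 00 00, matching the example.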

Builtin Static Array

The field header is written once, followed by N consecutive values where N is the compile-time array length from the schema:

type_id  : U16
field_id : U16
element_0            — first element at natural width
element_1
...
element_{N-1}

Enum

Enum fields are serialized as I32 values:

type_id  : U16       — the enum's assigned type_id
field_id : U16
value    : I32       — the enum's integer value

Enum static arrays follow the same pattern as builtin static arrays: one header followed by N consecutive I32 values.

Nested Struct

The field header is written, then the referenced struct is serialized inline using the struct encoding described above (signature + version + fields):

type_id  : U16       — the nested struct's assigned type_id
field_id : U16
<inline struct encoding>

For static arrays of structs, the field header is written once, then N consecutive inline struct encodings follow.

Custom Type

Custom types use a user-provided serializer that converts the custom data into an array of a base struct type. The wire format is:

type_id  : U16       — the custom type's assigned type_id
field_id : U16
array_len : I64      — number of base type elements
element_0            — first element, encoded as an inline struct
element_1
...
element_{array_len-1}

During serialization, crunchy calls the user's serializer function to obtain a pointer to an array of the base type and its length, then writes the length followed by each element serialized as an inline struct.

During deserialization, the array length is read, memory is allocated for the array, each element is deserialized as an inline struct, and then the user's deserializer function is called to convert the array back into the custom type.

Type IDs

Every type in a crunchy schema is assigned a uint16_t type ID. These IDs are written into the binary stream as part of each field header and are persisted in the lock file to ensure stability across schema changes.

Builtin types have fixed, well-known IDs that match the CrunchyFieldTypeId enum in crunchy.h:

Type	ID
`bool`	1
`i8`	2
`u8`	3
`i16`	4
`u16`	5
`i32`	6
`u32`	7
`i64`	8
`u64`	9
`f32`	10
`f64`	11

IDs 12 through 16,383 are reserved for future builtin types.

User-defined types — structs, enums, and custom types — are assigned IDs starting at 16,384 (CRUNCHY_FIELD_TYPE_USER_DEFINED_START). IDs are assigned sequentially in declaration order on first compilation and are persisted in the lock file. New types added in later compilations receive the next available ID. Once assigned, a type's ID never changes.

Version Guards (Deserialization)

During deserialization, each field is wrapped in a version guard based on its V(start[, end]) annotation:

V(start) → if (read_version >= start) { ... }
V(start, end) → if (read_version >= start && read_version <= end) { ... }

If a struct has MINIMUM_VERSION set and a field's start is at or below that minimum, the start check is elided (the field is always present in any valid data).

This means that when deserializing data written by an older version of the schema, fields not yet present are simply skipped (left at their zero-initialized value), and fields that have since been removed are still read from the stream and stored in the struct.
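
The guard logic can be expressed as a single predicate. This is a sketch of the two conditions listed above, with a hypothetical function name; the generated code emits the checks inline rather than calling a helper like this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: is a field annotated V(start) or V(start, end)
   present in data that was written at read_version? Pass has_end = false
   for the single-argument form V(start). */
bool field_is_present(uint32_t read_version, uint32_t start,
                      bool has_end, uint32_t end)
{
    if (read_version < start) {
        return false; /* data predates the field */
    }
    if (has_end && read_version > end) {
        return false; /* data was written after the field was retired */
    }
    return true;
}
```

For the Vehicle example, odometer_reading is V(2), so data written at version 1 leaves it at its zero-initialized value, while data written at version 2 fills it in.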

Signature Validation

If a struct has a SIGNATURE directive, the signature bytes are written as raw bytes (no length prefix) at the start of the struct encoding. During deserialization, the same number of bytes are read and compared. If they don't match, deserialization returns false immediately.

Signatures are useful for detecting data corruption or ensuring you're reading the expected type from an untyped byte stream.
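
A signature check along these lines can be sketched with memcmp. The function name is hypothetical, and the sketch returns false on short input instead of asserting, which may differ from how the generated code handles truncated data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check: does data begin with the raw signature bytes?
   The signature is compared without its terminating NUL, since the
   wire format stores it with no length prefix and no terminator. */
bool signature_matches(const uint8_t *data, size_t data_len,
                       const char *signature)
{
    size_t sig_len = strlen(signature);
    if (data_len < sig_len) {
        return false; /* input too short to contain the signature */
    }
    return memcmp(data, signature, sig_len) == 0;
}
```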

Complete Example

Given this schema:

struct Inner
{
    VERSION = 1;
    SIGNATURE = "INNER";

    V(1) i32 x;
    V(1) i32 y;
}

struct Outer
{
    ROOT;
    VERSION = 1;
    SIGNATURE = "OUTER";

    V(1) i32 value;
    V(1) Inner nested;
}

Serializing Outer { value = 42, nested = { x = 10, y = 20 } } produces:

Offset  Bytes                   Meaning
------  -----                   -------
 0      4f 55 54 45 52          "OUTER" signature (5 bytes)
 5      01 00 00 00             Outer version = 1 (U32)
 9      06 00                   type_id = 6 (i32)
11      01 00                   field_id = 1 (value)
13      2a 00 00 00             value = 42 (I32)
17      00 40                   type_id = 16384 (Inner struct)
19      02 00                   field_id = 2 (nested)
21      49 4e 4e 45 52          "INNER" signature (5 bytes)
26      01 00 00 00             Inner version = 1 (U32)
30      06 00                   type_id = 6 (i32)
32      01 00                   field_id = 1 (x)
34      0a 00 00 00             x = 10 (I32)
38      06 00                   type_id = 6 (i32)
40      02 00                   field_id = 2 (y)
42      14 00 00 00             y = 20 (I32)

Total: 46 bytes
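
To check the table by hand, the sketch below walks the 46 bytes in the order shown and recovers the three values. It is hard-coded to this one example and is not the generated deserializer: the Cursor type and all function names are hypothetical, and the struct types here are local stand-ins for what crunchy would generate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { const uint8_t *data; size_t len; size_t pos; bool ok; } Cursor;

/* Read a little-endian unsigned value of `width` bytes (1 to 4). */
static uint32_t take_le(Cursor *c, size_t width)
{
    uint32_t v = 0;
    if (!c->ok || c->len - c->pos < width) { c->ok = false; return 0; }
    for (size_t i = 0; i < width; i++) {
        v |= (uint32_t)c->data[c->pos + i] << (8 * i);
    }
    c->pos += width;
    return v;
}

/* Consume the raw signature bytes, failing on any mismatch. */
static void expect_signature(Cursor *c, const char *sig)
{
    size_t n = strlen(sig);
    if (!c->ok || c->len - c->pos < n
        || memcmp(c->data + c->pos, sig, n) != 0) {
        c->ok = false;
        return;
    }
    c->pos += n;
}

/* Consume one i32 field: header (type_id 6, field_id) then the payload. */
static int32_t take_i32_field(Cursor *c, uint16_t field_id)
{
    if (take_le(c, 2) != 6) c->ok = false;
    if (take_le(c, 2) != field_id) c->ok = false;
    return (int32_t)take_le(c, 4);
}

typedef struct { int32_t x, y; } Inner;
typedef struct { int32_t value; Inner nested; } Outer;

/* Returns true only if every header matched and all bytes were consumed. */
bool decode_outer_example(const uint8_t *data, size_t len, Outer *out)
{
    Cursor c = { data, len, 0, true };
    expect_signature(&c, "OUTER");
    if (take_le(&c, 4) != 1) c.ok = false;      /* Outer version */
    out->value = take_i32_field(&c, 1);
    if (take_le(&c, 2) != 16384) c.ok = false;  /* Inner's type_id */
    if (take_le(&c, 2) != 2) c.ok = false;      /* field_id of nested */
    expect_signature(&c, "INNER");
    if (take_le(&c, 4) != 1) c.ok = false;      /* Inner version */
    out->nested.x = take_i32_field(&c, 1);
    out->nested.y = take_i32_field(&c, 2);
    return c.ok && c.pos == len;
}
```

Feeding it the 46 bytes from the table yields value = 42, x = 10, and y = 20. Dropping the final byte makes it return false, because the last payload read runs out of input.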

Security

The binary data produced by Crunchy has no built-in encryption, authentication, or integrity checking. If your application needs any of these properties, you add them yourself.

Integrity

Even if you are only deserializing data that you yourself produced (saved game state, local configuration, etc.), it is a good idea to store a checksum alongside the binary data (CRC-32 is fine for this) and verify it. This catches garden-variety data corruption from disk errors, truncated writes, etc.
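
If you do not already have a checksum routine available, a bitwise CRC-32 is short enough to write inline. The sketch below uses the common reflected polynomial 0xEDB88320 (the variant used by zlib and PNG). It is not part of Crunchy, and the function name is made up for this example. A table-driven version would be faster but longer.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
   final XOR of 0xFFFFFFFF). */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                crc = (crc >> 1) ^ 0xEDB88320u;
            } else {
                crc >>= 1;
            }
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

One way to use it is to compute the checksum over the serialized bytes, store it next to them, and recompute and compare before calling the deserialization function. The standard check value for the ASCII string "123456789" is 0xCBF43926, which is a quick way to confirm the routine is correct.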

Authentication and Untrusted Data

If you are deserializing data that comes from an untrusted source (network messages, files uploaded by users, IPC with a less-privileged process, etc.), you need a checksum + authentication. Either the transport already provides authentication (TLS with mutual or server authentication, for example) or you must add your own with a message authentication code such as HMAC-SHA256 applied to the serialized bytes before deserialization.

Without authentication, an attacker can craft binary payloads that, while not exploiting crunchy itself, can set any field of your structs to any value the type allows. Whether that is dangerous depends entirely on what your application does with those values.

Do Not Disable C Runtime Assertions

The generated deserialization code depends on JSL's JSL_ASSERT macro for bounds checking during deserialization. By default this maps to assert.

Compiling with NDEBUG defined disables all of these assertions. If you do this, a truncated or corrupted payload will silently read past the end of the input buffer.

Using NDEBUG is just a bad idea in general. Manually removing assertions in specific hot spots in your code makes sense. Disabling assertions across your entire program is asking for data breaches.

FAQ

Why don't you just use JSON?

JSON is incredibly slow and bloated for most use cases. If you don't need human readable, hierarchical, or schema-less data you really should not use JSON.

Every number must be parsed from its decimal string representation. Every string is UTF-8 encoded with escape sequences that must be interpreted. Every field name is repeated in every record. A 4-byte i32 with value 1000000 takes 7 bytes as the text "1000000", plus the field name, plus quotes, plus a colon, plus structural characters. The parsing cost scales with the size of the text, not the size of the data. Changing all of the key names to two-character codes with a lookup table in the app becomes a significant optimization on larger documents.

Crunchy's binary format stores that same i32 as 4 bytes, always, preceded by a fixed 4-byte field header (type ID + field ID). There are no field names in the output — just compact numeric identifiers. Each field deserialization is a sequence of fixed-width memory reads with no parsing or decoding loops. JSON deserialization is orders of magnitude slower for any non-trivial data.

JSON also has no schema. Field types, required fields, and versioning are all conventions enforced by application code. Crunchy's schema file is the single source of truth, and the lock file prevents breaking changes between versions.

Why don't you just use Protobuf?

Protobuf does have two major advantages over Crunchy: it supports forwards compatibility and it has broad cross-language support.

For backwards compatibility, Protobuf relies on manually numbered fields and developer discipline. Each field has a number (int32 name = 1;) and the wire format uses that number as the field identifier. Adding new fields with new numbers is safe — old readers skip unknown field numbers and new readers use default values for missing fields. But protoc only compiles one version of a schema at a time. It does not compare your current .proto file against any previous version, so it cannot catch you changing a field's type, reusing a retired field number across schema versions, or removing a field that existing data still contains.

Protobuf does have a reserved keyword that prevents reuse of specific field numbers or names within a single message definition, and protoc enforces it. But someone has to remember to add the reservation when deleting a field — there is no automatic tracking. If you forget, a teammate can reuse that number and protoc will not complain. Third-party tools like Buf exist to fill this gap, but they are not part of the standard Protobuf toolchain.

Crunchy's lock file handles all of this automatically. Every invocation compares the current schema against the locked version and rejects incompatible changes like changing types and removing fields that are still in use by the current version. There is no manual field tracking.

Protobuf is mainly designed for passing messages across a network. The C++ implementation generates message classes with getter/setter methods, not plain structs — fields are private and accessed through methods like name(), set_name(), has_name(). This means you typically maintain two representations of your data: the Protobuf message type and your application's own types, with conversion code between them. Crunchy structs are your application structs. Once deserialization is done you have a normal C struct ready to use.

Protobuf does not officially support C. There are third-party C implementations (protobuf-c, nanopb) that generate actual C structs, but they are community maintained and require the C++ protoc compiler to be installed for code generation.

Why don't you just use FlatBuffers?

FlatBuffers uses an accessor-based API: you never get plain C structs. Instead, every field access goes through generated functions that chase offsets into a serialized buffer. This means FlatBuffers has zero deserialization cost upfront — you just get a pointer into the buffer and start reading.

The tradeoff is that every field access pays for that indirection. Reading a scalar field in FlatBuffers involves reading a vtable pointer, reading the field offset from the vtable, and then reading the actual value — three dependent memory loads. In a tight loop accessing the same struct repeatedly, this adds up.

Crunchy takes the opposite approach: you pay an upfront deserialization cost where each field is read as a fixed-width value into a plain C struct — no varint decoding, no offset chasing. After deserialization you get real C structs with direct field access (a single load at a known offset).
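The difference between the two access patterns can be sketched in C. The buffer layout below is a deliberately simplified stand-in for a vtable-based format, not the real FlatBuffers encoding, and the `Player` struct and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned-safe loads from a byte buffer (native byte order, for brevity). */
uint16_t load_u16(const uint8_t *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }
uint32_t load_u32(const uint8_t *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }

/* Accessor-style read: each load depends on the result of the previous one. */
uint32_t table_get_u32(const uint8_t *buf, uint32_t table_pos, unsigned field_index)
{
    assert(buf != NULL);
    uint32_t vtable_pos = load_u32(buf + table_pos);                     /* load 1 */
    uint16_t field_off  = load_u16(buf + vtable_pos + 2u * field_index); /* load 2 */
    return load_u32(buf + table_pos + field_off);                        /* load 3 */
}

/* Plain-struct read: one load at an offset the compiler knows statically. */
typedef struct {
    uint32_t id;
    uint32_t score;
} Player;

uint32_t player_score(const Player *p) { return p->score; }

/* Build a 16-byte example buffer: vtable at byte 0, table at byte 4. */
void build_example(uint8_t buf[16], uint32_t id, uint32_t score)
{
    const uint16_t field_offsets[2] = { 4, 8 }; /* relative to table start */
    const uint32_t vtable_pos = 0;
    memcpy(buf + 0, field_offsets, sizeof field_offsets);
    memcpy(buf + 4, &vtable_pos, sizeof vtable_pos);
    memcpy(buf + 8, &id, sizeof id);
    memcpy(buf + 12, &score, sizeof score);
}
```

With a buffer built by `build_example`, `table_get_u32(buf, 4, 1)` and `player_score(&p)` return the same value. The first gets there through two intermediate lookups, the second through a single fixed-offset load.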

For types that contain custom containers (hash maps, trees, etc.), Crunchy does require an upfront deserialization step. But after that step, you have real data structures with their native performance characteristics: average-case O(1) hash map lookups instead of the O(log n) binary search into a sorted buffer that FlatBuffers would use to represent the same data.

The other difference is that Crunchy gives you plain C structs. You can pass them to existing functions, take their address, memcpy them, and use them like any other struct. FlatBuffers requires your entire codebase to interact with data through its generated accessor API.

Why don't you just use Cap'n Proto?

Cap'n Proto's big idea is that the in-memory representation is the wire format — there is no encode/decode step. You write bytes straight to disk or the network and read them back without parsing. For message-passing between services in the same language this is genuinely fast.

The cost is the same accessor problem as FlatBuffers. Cap'n Proto generates Builder and Reader wrapper types; every field read or write goes through methods that perform pointer arithmetic and bounds checking. You never get a plain C struct you can pass to existing code, memcpy, or inspect in a debugger. Your application either adopts Cap'n Proto types everywhere or maintains a conversion layer between its own types and the message types.

Cap'n Proto also uses a pointer-based encoding (struct pointers, list pointers, far pointers for cross-segment references) that is substantially more complex than a flat sequential format. Implementing a correct reader from scratch is non-trivial, which matters if you value being able to vendor a dependency and understand it end to end.

On the C front, Cap'n Proto is a C++ project. The third-party C library, c-capnproto, is unmaintained. Even using the C library requires installing the C++ capnp schema compiler, since the C code generator is a plugin to it.

For backwards compatibility, Cap'n Proto relies on convention and careful ordering. Fields are numbered with annotations (@0, @1, @2, ...) and new fields must be added at the end of the struct with the next sequential number. A reader that encounters a struct written with a newer schema simply ignores the extra data section at the tail. But nothing in the tooling prevents you from reordering fields, changing a field's type, or reusing a retired field number — all of which silently corrupt data. The schema compiler checks syntax, not compatibility between schema versions. It is up to the developer to never make those mistakes.
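A small C sketch shows why this kind of mistake is silent. It models a record as two 32-bit slots at fixed offsets, which is a simplification and not Cap'n Proto's actual encoding. The offsets, enum names, and functions are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Original schema (hypothetical): id @0, age @1, one 32-bit slot each. */
enum { V1_ID_OFFSET = 0, V1_AGE_OFFSET = 4 };

/* The same schema after a careless edit that swaps the two ordinals. */
enum { SWAPPED_AGE_OFFSET = 0, SWAPPED_ID_OFFSET = 4 };

enum { RECORD_SIZE = 8 };

/* Writer built against the original schema. */
void write_v1(uint8_t buf[RECORD_SIZE], uint32_t id, uint32_t age)
{
    memcpy(buf + V1_ID_OFFSET, &id, sizeof id);
    memcpy(buf + V1_AGE_OFFSET, &age, sizeof age);
}

/* Bounds-checked 32-bit read at a byte offset. */
uint32_t read_u32_at(const uint8_t *buf, size_t len, size_t offset)
{
    uint32_t v;
    assert(offset + sizeof v <= len);
    memcpy(&v, buf + offset, sizeof v);
    return v;
}
```

If `write_v1(buf, 1001, 42)` produces the bytes, a reader compiled against the swapped schema that calls `read_u32_at(buf, RECORD_SIZE, SWAPPED_AGE_OFFSET)` gets 1001 as the age. The read is in bounds and the types match, so nothing fails. The value is simply wrong.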

Crunchy enforces this with the lock file. Every time you run the compiler it compares your current schema against the locked version and rejects backwards-incompatible changes: reordering fields, changing types, removing fields that are still in use by a live version. When you do need to remove a field, you mark it with skip and bump the version — Crunchy handles the rest and the old field ID is never accidentally reused. You don't need to remember which numbers are retired or trust that no one on the team will make an edit that looks harmless but breaks the wire format.

Crunchy deserializes into plain C structs in a single upfront pass, has no C++ dependency, and the lock file catches backwards-incompatible schema changes that Cap'n Proto leaves to developer discipline.

Why is it called Crunchy?

Because cereal is crunchy. Get it? Cereal? Serial? No?
