Crunchy is a fast, versioned, backwards-compatible data serialization library for C (>= C11). More specifically, Crunchy is a compiler that turns a DSL text file containing type definitions into C code for serializing and deserializing those types.
- Why Use Crunchy
- Example
- Crunchy File Change Safety
- Current Performance
- Crunchy File Formal Grammar
- Lock File Formal Grammar
- Wire Format Specification
- Security
- FAQ
The main selling point of Crunchy is that the structs you use in your program for state representation are exactly what gets serialized. There are no accessor functions, no getters and setters. There is no separate struct representing the serialization format that you have to convert to and from. You define a struct, and Crunchy generates the code to serialize and deserialize that struct. The serialization function takes a pointer to that struct and produces the binary representation. The deserialization function takes the binary representation and writes to a pointer to your struct. That's it.
The second reason to use Crunchy is that it has a much better implementation of backwards compatibility than the other popular options. The Crunchy compiler produces a lock file, which is a text representation of the current state of your binary data definition. Any time you make an edit to your struct definitions, the lock file is checked and compilation fails if you made a change which would break compatibility. Other formats like Protobuf and Cap'n Proto require developer discipline and careful edits to never introduce breaking changes. Their compilers do not help you in this regard.
Crunchy also has an enormous advantage over other popular serialization compilers: it is designed
to be "vendored" (copied into your project). Crunchy is a single header file with an optional C
file for the CLI. This means that your serialization format cannot change out from under
you. You are in complete control of something that needs to be stable for your application
to work correctly. You don't have to worry about vendoring an enormous C++
DSL compiler with its own specific build system dependencies. The Crunchy header file is about
7000 lines, and compiling the CLI is a single command: cc crunchy.c.
Crunchy is designed for programs that load serialized data once. Application state and network messages are good examples. Crunchy is not the best choice for workloads that read a small subset of fields from a large number of records and then discard them (e.g., scanning a large log file for a single field). Accessor-based formats like FlatBuffers are better suited for that pattern.
Crunchy is designed with three main goals in mind:
- The binary format should be flexible enough to use for saved application state, network communication, and IPC.
  - A non-goal of this project is to make it flexible enough to design something like a high performance database file format. You'd want to use something completely custom for that purpose anyway, not a general purpose tool like Crunchy.
- The integration into a project should be as painless as possible.
  - To that end, the result of deserialization is a plain struct.
  - No accessor functions.
  - The data you serialize is the exact same struct type as your normal application state struct.
  - No copying the data into a separate message type first.
- The binary format should be backwards compatible.
  - It should be impossible to introduce backwards incompatible or data-corrupting changes without intentionally doing so.
Here's how it works:
- You write a "crunchy file", a text file containing all of the type definitions of your program that you want to use in serialization.
- Crunchy will then write a header file containing the C representations of these types. This way the crunchy file becomes the single source of truth and you don't need to keep two definitions in sync.
- Crunchy will also write a source file containing the functions necessary for serialization and deserialization.
- Finally, a lock file will be produced. This file is used by subsequent invocations of crunchy to make sure that your edits to the crunchy file did not introduce any breaking changes.
- Include the crunchy file and crunchy lock file in your source control.
- Include both the header and source files in your C build. Use the types as you normally would. They are just normal C structs.
That's pretty much it for most setups.
A crunchy file can look something like the following:
struct Vehicle
{
    ROOT;
    VERSION = 1;
    SIGNATURE = "MY_APP_VEHICLE";
    V(1) u32 vehicle_make_id;
    V(1) u32 vehicle_model_id;
    V(1) u16 vehicle_year;
    // This field was added for version 2, vehicles saved in version
    // one will not have this field but they can still be loaded by
    // version 2 code.
    V(2) u32 odometer_reading;
}
When you run the compiler you will get a header file and a C source file with the following definitions:
#include <stdbool.h>
#include <stdint.h>
struct Vehicle
{
    uint32_t vehicle_make_id;
    uint32_t vehicle_model_id;
    uint16_t vehicle_year;
    uint32_t odometer_reading;
};
void crunchy_serialize_vehicle(JSLOutputSink sink, const Vehicle* data, void* context);
bool crunchy_deserialize_vehicle(JSLAllocatorInterface allocator, JSLImmutableMemory binary_data, Vehicle* out_data, void* context);
Let's look at a more complicated example:
enum ColorTheme
{
    COLOR_THEME_LIGHT = 1,
    COLOR_THEME_DARK = 2,
    COLOR_THEME_HIGH_CONTRAST = 3
}
custom MyStringType
{
    BASE_TYPE = u8;
    SERIALIZER = serialize_my_string;
    DESERIALIZER = deserialize_my_string;
}
struct UserSettings
{
    VERSION = 3;
    V(1) ColorTheme color_theme;
    // These fields were added for version 2
    V(2) i32 spaces_per_indent;
    V(2) bool uses_tabs;
    // This field was added for version 1 and was last present in
    // version 2 (the range is inclusive). Version 3 will not save
    // this field at all, but still handles it when reading
    // version 1 or version 2 data.
    V(1, 2) bool auto_save_before_exit;
}
struct User
{
    ROOT;
    VERSION = 1;
    SIGNATURE = "MY_USER";
    V(1) UserSettings settings;
    V(1) MyStringType name;
}
Crunchy's lock file makes it impossible to introduce data-corrupting changes without intentionally doing so. Every time you run the crunchy compiler, it compares your current crunchy file against the lock file produced by the previous compilation. If any change would break the ability to read previously serialized data, the compiler refuses to generate code and reports the exact problem.
- Root type removal: You cannot remove a ROOT struct that existed in the lock file. Root structs are serialization entry points, so dropping one would leave any previously serialized data with no way to be read back.
- Non-root type removal: Non-root structs, enums, and custom types may be removed, but only once nothing else in the schema still references them. The compiler enforces this implicitly: if any other type still uses the removed type, semantic analysis fails before the lock file check even runs.
- Declaration kind change: A type cannot change from one kind to another (e.g., a struct cannot become an enum).
- VERSION cannot decrease: The struct's version number must never go down.
- VERSION cannot jump by more than 1: Incrementing by more than one suggests missing intermediate lock file updates that may have introduced fields tied to skipped versions.
- SIGNATURE cannot be removed or changed: If a struct had a signature, it must keep the exact same signature.
- Fields cannot be deleted unless the field has an end version set and that end version is at or below the struct's MINIMUM_VERSION. This ensures the field is no longer present in any data the struct claims to support.
- Fields cannot be deleted without MINIMUM_VERSION: If a struct has no MINIMUM_VERSION, no fields can be removed at all.
- New fields cannot have a useless end version: A new field whose end version is already below MINIMUM_VERSION would never appear in any supported data and is flagged as an error.
- Field version_start cannot change: The version at which a field was introduced is permanent.
- Field version_end rules: Once a field's end version has "taken effect" (meaning the struct's VERSION has advanced past it, so data exists without the field), the end version cannot be changed or removed. If the end version has not yet taken effect, it can be adjusted, but not to a value below the locked VERSION.
- Static array length cannot change: A field that was a static array must remain a static array of the same length. A scalar field cannot become an array, and an array field cannot become a scalar.
- SKIP cannot be added or removed: The SKIP attribute changes serialization behavior and is permanent once set.
- field_id cannot change: Each field's unique ID is assigned once and locked forever.
- Field type cannot change: A field's type (whether builtin or user-defined) is permanent. You cannot change an i32 to an i64, or swap one struct type for another.
- Items cannot be removed: Enum items are permanent. If you need fewer items, create a new enum type and migrate.
- Item values cannot change: The integer value associated with each enum item is locked.
- BASE_TYPE cannot change: The struct type that backs the custom container is permanent.
- SERIALIZER cannot change: The serialization function name is locked.
- DESERIALIZER cannot change: The deserialization function name is locked.
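The two VERSION rules in the list above are simple enough to sketch in a few lines. This is only an illustration of the rule as described, not Crunchy's actual checker: the enum and function names are hypothetical and do not come from crunchy.h.

```c
#include <stdint.h>

/* Hypothetical result codes, invented for this sketch. */
typedef enum {
    VERSION_CHECK_OK = 0,
    VERSION_CHECK_DECREASED,   /* VERSION went down */
    VERSION_CHECK_SKIPPED      /* VERSION jumped by more than 1 */
} VersionCheckResult;

/* Compare the VERSION stored in the lock file with the one in the
 * edited crunchy file. Staying the same or going up by exactly one
 * is accepted; anything else is rejected. */
VersionCheckResult check_version_bump(uint32_t locked_version,
                                      uint32_t new_version)
{
    if (new_version < locked_version) {
        return VERSION_CHECK_DECREASED;
    }
    if (new_version - locked_version > 1) {
        return VERSION_CHECK_SKIPPED;
    }
    return VERSION_CHECK_OK;
}
```

With a locked VERSION of 2, moving to 3 passes, while moving to 1 or to 4 is rejected.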
TL;DR: git reverts are inherently dangerous and there's nothing Crunchy can do about that.
One scenario the lock file cannot protect against is git reverts of lock file changes when data has already been written with the reverted schema.
Consider this sequence:
- You add a field score to a struct. It is assigned field_id=5 and carries the type_id of its type. The lock file is updated and committed.
- Your application serializes data using this schema, so binary files now contain field headers with type_id + field_id=5 encoding score values.
- You revert the commit, restoring the old lock file that has no knowledge of score.
- You add a different field rating to the same struct. Because the lock file no longer knows about field_id=5, crunchy assigns field_id=5 to rating.
Now rating and the old score share the same field_id, and crunchy has no
way to detect this — the lock file looks internally consistent.
However, the wire format provides significant protection against silent data
corruption in this scenario. Every field in the binary stream is tagged with both
a type_id (identifying the data type) and a field_id (identifying the field).
If score was an i32 (type_id=6) and rating is an f64 (type_id=11), the
deserializer will encounter a type_id that does not match what it expects for
field_id=5. Because the generated deserialization code reads fields by matching
the field_id and then consuming a fixed number of bytes determined by the type_id,
a type mismatch will cause the deserializer to consume the wrong number of bytes,
which will cascade into subsequent field header reads producing invalid type_id or
field_id values. This will almost certainly cause deserialization to fail rather
than silently produce corrupt data.
The one case where this protection does not help is when the old and new fields
happen to share both the same field_id and the same type_id — for example,
replacing one i32 field with a different i32 field. In that case, the
deserializer has no way to distinguish the old field's bytes from the new field's
bytes, and will silently interpret the old value as the new field. This is a narrow
case, but it is worth understanding: the lock file is only as reliable as its
git history, and reverting lock file changes after data has been written is an
inherently unsafe operation that crunchy cannot fully guard against.
As of writing, serialization runs at about 3.4-3.8 GB/second and deserialization at about 5.6-6.2 GB/second. These numbers are measured purely with in-memory operations. File loading and integrity checks are not included.
You can double check this on your machine with the benchmark/run_bench.sh
script.
Given Crunchy's feature set, these numbers are within the bounds of acceptable and expected performance. Crunchy has several features which trade off speed for other considerations:
- Crunchy allows field reordering. Removing this one feature would speed up serialization and deserialization by about 2.5x, because instead of needing a loop + switch for deserialization you can just do a series of reads in a row. This is included anyway specifically because Crunchy produces your struct code, so you need to be able to reorder the fields however you want for bit packing and byte padding reasons.
- Crunchy reads and writes type ids and field ids. That's a four byte overhead for every single field. So a simple struct like a vec2 would have a type id + field id for the struct and then again for the two fields, meaning a bookkeeping overhead of 150%. Crunchy does this despite the downsides because:
  - The field ID allows the field reordering mentioned above.
  - The type id + field id are your main defense against accidentally loading a divergent version of a serialized file. See the section "Reversion Safety" for more details.
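The 150% figure for a vec2 can be reproduced with a little arithmetic. The sketch below assumes the vec2 holds two f32 components (8 payload bytes) and counts only the 4-byte type id + field id headers, as the text above does. The function names are made up for this example.

```c
#include <stddef.h>

/* type_id (U16) + field_id (U16) written in front of every field */
enum { FIELD_HEADER_BYTES = 4 };

/* Bookkeeping bytes expressed as a percentage of payload bytes. */
unsigned header_overhead_percent(size_t header_count, size_t payload_bytes)
{
    return (unsigned)((header_count * FIELD_HEADER_BYTES * 100u) / payload_bytes);
}

/* Assumed layout: a vec2 field made of two f32 values.
 * Headers: one for the vec2 field itself, one each for x and y.
 * Payload: 2 * 4 bytes. */
unsigned vec2_overhead_percent(void)
{
    return header_overhead_percent(3, 2 * 4);
}
```

Three headers cost 12 bytes against 8 bytes of actual data, which is where the 150% comes from.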
Program ::= { CommentOrWS IncludeDir } { CommentOrWS Decl } EOF
IncludeDir ::= "#include" IncludePath
IncludePath ::= String | "<" AnglePath ">"
AnglePath ::= /[^>\n]+/
Decl ::= { CommentOrWS } (StructDecl | EnumDecl | CustomDecl | UnionDecl)
StructDecl ::= "struct" Identifier StructBody
StructBody ::= "{" StructDirectives StructMembers "}"
StructDirectives ::= { CommentOrWS | StructDirective }
StructDirective ::= NoEmitDirective | RootDirective | VersionDirective | SignatureDirective | MinVersionDirective
StructMembers ::= { CommentOrWS StructMember }
StructMember ::= VersionedField ";"
VersionedField ::= { CommentOrWS } [ "SKIP" ] VersionTag TypeSpec Identifier [ ArraySuffix ]
VersionTag ::= "V" "(" Number [ "," Number ] ")" // inclusive start[,end]
TypeSpec ::= BuiltinType | Identifier // user type or builtin
ArraySuffix ::= "[" Number "]" // [N] fixed
EnumDecl ::= "enum" Identifier EnumBody
EnumBody ::= "{" EnumDirectives EnumItems "}"
EnumDirectives ::= { CommentOrWS | EnumDirective }
EnumDirective ::= NoEmitDirective
EnumItems ::= { CommentOrWS EnumItem ","? }
EnumItem ::= { CommentOrWS } Identifier "=" Number
CustomDecl ::= "custom" Identifier CustomBody
CustomBody ::= "{" CustomDirectives "}"
CustomDirectives ::= { CommentOrWS CustomDirective }
CustomDirective ::= CustomBaseType | CustomSerializer | CustomDeserializer
CustomBaseType ::= "BASE_TYPE" Whitespace "=" Whitespace Identifier ";"
CustomSerializer ::= "SERIALIZER" Whitespace "=" Whitespace Identifier ";"
CustomDeserializer ::= "DESERIALIZER" Whitespace "=" Whitespace Identifier ";"
UnionDecl ::= "union" Identifier UnionBody
UnionBody ::= "{" UnionDirectives UnionVariants "}"
UnionDirectives ::= { CommentOrWS | UnionDirective }
UnionDirective ::= VersionDirective | MinVersionDirective | NoEmitDirective | TagDirective
UnionVariants ::= { CommentOrWS UnionVariant }
UnionVariant ::= { CommentOrWS } VersionTag TagReference TypeSpec Identifier ";"
TagDirective ::= "TAG" Whitespace "=" Whitespace Identifier ";"
TagReference ::= "TAG" "(" Identifier ")"
NoEmitDirective ::= "NO_EMIT;"
RootDirective ::= "ROOT;"
VersionDirective ::= "VERSION" Whitespace "=" Whitespace Number Whitespace ";"
SignatureDirective ::= "SIGNATURE" Whitespace "=" Whitespace String Whitespace ";"
MinVersionDirective ::= "MINIMUM_VERSION" Whitespace "=" Whitespace Number Whitespace ";"
BuiltinType ::= Identifier
Identifier ::= /[A-Za-z_][A-Za-z0-9_]*/
Number ::= /[0-9]+/
String ::= /"[^"\n]*"/
CommentOrWS ::= Whitespace | LineComment | BlockComment
Whitespace ::= /[ \t\r\n]+/
LineComment ::= "//" /[^\n]*/
DocLineComment ::= "///" /[^\n]*/ // doc-intent; treated same as Comment in grammar
BlockComment ::= "/*" (not "*/")* "*/"
DocBlockComment ::= "/**" (not "*/")* "*/"
The lock file is a line-oriented text format. Each line begins with a keyword
that identifies the record. Declarations are separated by blank lines.
Lines starting with // are comments and are ignored by the parser.
The parser is not order-sensitive within a declaration block — attribute lines may appear in any order. The grammar below reflects that flexibility.
LockFile ::= { Comment | BlankLine | Decl }
Decl ::= StructDecl | EnumDecl | CustomDecl | UnionDecl
StructDecl ::= StructHeader { StructAttr } { StructField }
StructHeader ::= "struct" Identifier Newline
StructAttr ::= "type_id" Integer Newline
             | "version" Integer Newline
             | "min_version" Integer Newline
             | "next_field_id" Integer Newline
             | "signature" Identifier Newline
             | "root" Newline
             | "no_emit" Newline
StructField ::= "field" Identifier TypeName "id=" Integer "v=" VersionRange
                [ "array=" Integer ] [ "skip" ] Newline
EnumDecl ::= EnumHeader { EnumAttr } { EnumMember }
EnumHeader ::= "enum" Identifier Newline
EnumAttr ::= "type_id" Integer Newline
           | "no_emit" Newline
EnumMember ::= "member" Identifier Integer Newline
CustomDecl ::= CustomHeader { CustomAttr }
CustomHeader ::= "custom" Identifier Newline
CustomAttr ::= "type_id" Integer Newline
             | "base_type" Identifier Newline
             | "serializer" Identifier Newline
             | "deserializer" Identifier Newline
UnionDecl ::= UnionHeader { UnionAttr } { UnionVariant }
UnionHeader ::= "union" Identifier Newline
UnionAttr ::= "type_id" Integer Newline
            | "version" Integer Newline
            | "min_version" Integer Newline
            | "tag" Identifier Newline
            | "no_emit" Newline
UnionVariant ::= "variant" Identifier "type=" Identifier "tag_value=" Integer
                 "v=" VersionRange Newline
VersionRange ::= Integer [ "," Integer ] // start[,end] — inclusive
TypeName ::= Identifier
Identifier ::= /[A-Za-z_][A-Za-z0-9_]*/
Integer ::= "-"? /[0-9]+/
Comment ::= "//" /[^\n]*/ Newline
BlankLine ::= Newline
Newline ::= "\n"
Tokens on a line are separated by single spaces. The lock file is written by the compiler and is not intended to be edited by hand.
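To make the StructField rule concrete, here is a rough sketch of parsing one field line with sscanf. It assumes that "id=", "v=", and "array=" are written directly against their integers (for example id=4), which the grammar suggests but does not state outright. The LockField struct and the function name are hypothetical, and this is not how crunchy's own parser is written.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one "field" record. */
typedef struct {
    char name[64];
    char type_name[64];
    int  field_id;
    int  version_start;
    int  version_end;   /* -1 when the range has no end */
    int  array_len;     /* 0 when the field is not a static array */
    bool skip;
} LockField;

/* Parse: field <name> <type> id=<n> v=<start>[,<end>] [array=<n>] [skip] */
bool parse_lock_field_line(const char *line, LockField *out)
{
    int consumed = 0;
    int n = 0;
    const char *rest;

    memset(out, 0, sizeof *out);
    out->version_end = -1;

    /* Mandatory part of the record. */
    if (sscanf(line, "field %63s %63s id=%d v=%d%n",
               out->name, out->type_name,
               &out->field_id, &out->version_start, &consumed) != 4) {
        return false;
    }
    rest = line + consumed;

    /* Optional ",end" of the version range. */
    if (sscanf(rest, ",%d%n", &out->version_end, &n) == 1) {
        rest += n;
    }
    /* Optional static array length. */
    n = 0;
    if (sscanf(rest, " array=%d%n", &out->array_len, &n) == 1) {
        rest += n;
    }
    /* Optional skip flag. */
    if (strncmp(rest, " skip", 5) == 0) {
        out->skip = true;
        rest += 5;
    }
    /* Nothing but the line terminator may remain. */
    return *rest == '\0' || *rest == '\n';
}
```

A line such as "field odometer_reading u32 id=4 v=2" (an invented example, not real compiler output) would parse to field_id 4 with an open-ended version range starting at 2.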
Crunchy's binary wire format is a little-endian, fixed-width format with no alignment padding. All multi-byte integers and floats are stored in the host's native byte order (assumed little-endian). There are no length-prefixed containers at the top level — the format is a flat sequence of typed values whose layout is determined entirely by the schema.
In the diagrams below, each named block represents a contiguous run of bytes
in the output stream. Sizes are in bytes. U16, U32, I32, I64, etc.
refer to fixed-width little-endian integers. F32 and F64 are IEEE 754
floats stored in their native representation.
A struct is encoded as an optional signature, a version word, and then its fields in declaration order:
[signature bytes] — raw bytes, only if SIGNATURE is set on the struct
version : U32 — the struct's VERSION value
field_0 — first active field (see "Field Encoding" below)
field_1 — second active field
...
field_N — last active field
Fields whose version range has expired (i.e. version_end < VERSION) are
not written during serialization. Fields marked SKIP are also omitted.
All other fields are written unconditionally — version guards on the
deserializer side handle forward/backward compatibility.
Every field is preceded by a 4-byte header that identifies the field's type and its unique field ID:
type_id : U16 — identifies the data type (see "Type IDs" below)
field_id : U16 — unique, stable identifier for this field within its struct
<payload> — the field's data, format depends on the type
The field_id is assigned when a field is first created and never changes,
even if fields are reordered in the schema. This enables future
order-independent deserialization.
The payload format depends on the field kind:
A single value of the field's primitive type, written at its natural width:
| Type | Payload Size (bytes) |
|---|---|
| bool | 1 (U8) |
| i8 | 1 |
| u8 | 1 |
| i16 | 2 |
| u16 | 2 |
| i32 | 4 |
| u32 | 4 |
| i64 | 8 |
| u64 | 8 |
| f32 | 4 |
| f64 | 8 |
Example — an i32 field with field_id=3 and value 42:
06 00 type_id = 6 (i32)
03 00 field_id = 3
2a 00 00 00 value = 42
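The eight bytes in the example above can be produced by hand with a couple of little-endian helpers. This is a sketch of the layout only. The helper names are invented here and the generated code may write bytes quite differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a 16-bit value in little-endian order; returns bytes written. */
size_t put_u16_le(uint8_t *dst, uint16_t v)
{
    dst[0] = (uint8_t)(v & 0xff);
    dst[1] = (uint8_t)(v >> 8);
    return 2;
}

/* Write a 32-bit value in little-endian order; returns bytes written. */
size_t put_u32_le(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v & 0xff);
    dst[1] = (uint8_t)((v >> 8) & 0xff);
    dst[2] = (uint8_t)((v >> 16) & 0xff);
    dst[3] = (uint8_t)((v >> 24) & 0xff);
    return 4;
}

/* Encode one i32 field: type_id (6 for i32), field_id, then the value.
 * dst must have room for at least 8 bytes. */
size_t encode_i32_field(uint8_t *dst, uint16_t field_id, int32_t value)
{
    size_t n = 0;
    n += put_u16_le(dst + n, 6);
    n += put_u16_le(dst + n, field_id);
    n += put_u32_le(dst + n, (uint32_t)value);
    return n;
}
```

Calling encode_i32_field(buf, 3, 42) fills buf with 06 00 03 00 2a 00 00 00, matching the example.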
The field header is written once, followed by N consecutive values where
N is the compile-time array length from the schema:
type_id : U16
field_id : U16
element_0 — first element at natural width
element_1
...
element_{N-1}
Enum fields are serialized as I32 values:
type_id : U16 — the enum's assigned type_id
field_id : U16
value : I32 — the enum's integer value
Enum static arrays follow the same pattern as builtin static arrays: one
header followed by N consecutive I32 values.
The field header is written, then the referenced struct is serialized inline using the struct encoding described above (signature + version + fields):
type_id : U16 — the nested struct's assigned type_id
field_id : U16
<inline struct encoding>
For static arrays of structs, the field header is written once, then N
consecutive inline struct encodings follow.
Custom types use a user-provided serializer that converts the custom data into an array of a base struct type. The wire format is:
type_id : U16 — the custom type's assigned type_id
field_id : U16
array_len : I64 — number of base type elements
element_0 — first element, encoded as an inline struct
element_1
...
element_{array_len-1}
During serialization, crunchy calls the user's serializer function to obtain a pointer to an array of the base type and its length, then writes the length followed by each element serialized as an inline struct.
During deserialization, the array length is read, memory is allocated for the array, each element is deserialized as an inline struct, and then the user's deserializer function is called to convert the array back into the custom type.
Every type in a crunchy schema is assigned a uint16_t type ID. These IDs are
written into the binary stream as part of each field header and are persisted
in the lock file to ensure stability across schema changes.
Builtin types have fixed, well-known IDs that match the CrunchyFieldTypeId
enum in crunchy.h:
| Type | ID |
|---|---|
| bool | 1 |
| i8 | 2 |
| u8 | 3 |
| i16 | 4 |
| u16 | 5 |
| i32 | 6 |
| u32 | 7 |
| i64 | 8 |
| u64 | 9 |
| f32 | 10 |
| f64 | 11 |
IDs 12 through 16,383 are reserved for future builtin types.
User-defined types — structs, enums, and custom types — are assigned IDs
starting at 16,384 (CRUNCHY_FIELD_TYPE_USER_DEFINED_START). IDs are
assigned sequentially in declaration order on first compilation and are
persisted in the lock file. New types added in later compilations receive the
next available ID. Once assigned, a type's ID never changes.
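Put together with the payload size table from earlier, the ID ranges can be expressed as a small lookup. The names below are placeholders for this sketch. The real constants live in the CrunchyFieldTypeId enum in crunchy.h, whose member names are not listed in this document apart from CRUNCHY_FIELD_TYPE_USER_DEFINED_START.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the documented value of CRUNCHY_FIELD_TYPE_USER_DEFINED_START. */
enum { EXAMPLE_USER_DEFINED_START = 16384 };

/* True for structs, enums, and custom types; false for builtin and
 * reserved IDs. */
bool type_id_is_user_defined(uint16_t type_id)
{
    return type_id >= EXAMPLE_USER_DEFINED_START;
}

/* Payload width in bytes for a builtin type ID, or 0 if the ID is not
 * one of the eleven builtin types. */
size_t builtin_payload_size(uint16_t type_id)
{
    switch (type_id) {
    case 1:  /* bool */
    case 2:  /* i8   */
    case 3:  /* u8   */
        return 1;
    case 4:  /* i16  */
    case 5:  /* u16  */
        return 2;
    case 6:  /* i32  */
    case 7:  /* u32  */
    case 10: /* f32  */
        return 4;
    case 8:  /* i64  */
    case 9:  /* u64  */
    case 11: /* f64  */
        return 8;
    default:
        return 0;
    }
}
```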
During deserialization, each field is wrapped in a version guard based on its
V(start[, end]) annotation:
- V(start) → if (read_version >= start) { ... }
- V(start, end) → if (read_version >= start && read_version <= end) { ... }
If a struct has MINIMUM_VERSION set and a field's start is at or below
that minimum, the start check is elided (the field is always present in any
valid data).
This means that when deserializing data written by an older version of the schema, fields not yet present are simply skipped (left at their zero-initialized value), and fields that have since been removed are still read from the stream and stored in the struct.
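The guard logic can be written as a single predicate. This is a sketch rather than the generated code: it assumes versions start at 1 so that 0 can stand in for "no end version", and the names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sentinel for a field declared as V(start) with no end version.
 * Assumes real version numbers are never 0. */
#define NO_END_VERSION 0u

/* Should a field declared V(start[, end]) be read from data that was
 * written at read_version? Both bounds are inclusive. */
bool field_present_in(uint32_t read_version, uint32_t start, uint32_t end)
{
    if (read_version < start) {
        return false;   /* field did not exist yet */
    }
    if (end != NO_END_VERSION && read_version > end) {
        return false;   /* field had already been removed */
    }
    return true;
}
```

For the UserSettings example, spaces_per_indent (V(2)) is absent from version 1 data, and auto_save_before_exit (V(1, 2)) is present in version 1 and 2 data but not in version 3 data.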
If a struct has a SIGNATURE directive, the signature bytes are written as
raw bytes (no length prefix) at the start of the struct encoding. During
deserialization, the same number of bytes are read and compared. If they
don't match, deserialization returns false immediately.
Signatures are useful for detecting data corruption or ensuring you're reading the expected type from an untyped byte stream.
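A signature comparison along those lines might look like the following. It is a minimal sketch with a hypothetical function name, and unlike the generated code it takes an explicit buffer length instead of going through JSL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Check that data begins with the raw signature bytes. There is no
 * length prefix and no terminator in the stream, so the expected
 * length comes from the schema's signature string itself. */
bool signature_matches(const unsigned char *data, size_t data_len,
                       const char *signature)
{
    size_t sig_len = strlen(signature);
    if (data_len < sig_len) {
        return false;   /* input too short to contain the signature */
    }
    return memcmp(data, signature, sig_len) == 0;
}
```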
Given this schema:
struct Inner
{
    VERSION = 1;
    SIGNATURE = "INNER";
    V(1) i32 x;
    V(1) i32 y;
}
struct Outer
{
    ROOT;
    VERSION = 1;
    SIGNATURE = "OUTER";
    V(1) i32 value;
    V(1) Inner nested;
}
Serializing Outer { value = 42, nested = { x = 10, y = 20 } } produces:
Offset  Bytes           Meaning
------  -----           -------
0       4f 55 54 45 52  "OUTER" signature (5 bytes)
5       01 00 00 00     Outer version = 1 (U32)
9       06 00           type_id = 6 (i32)
11      01 00           field_id = 1 (value)
13      2a 00 00 00     value = 42 (I32)
17      00 40           type_id = 16384 (Inner struct)
19      02 00           field_id = 2 (nested)
21      49 4e 4e 45 52  "INNER" signature (5 bytes)
26      01 00 00 00     Inner version = 1 (U32)
30      06 00           type_id = 6 (i32)
32      01 00           field_id = 1 (x)
34      0a 00 00 00     x = 10 (I32)
38      06 00           type_id = 6 (i32)
40      02 00           field_id = 2 (y)
42      14 00 00 00     y = 20 (I32)
Total: 46 bytes
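To check your understanding of the layout, here is a hand-written reader for exactly this 46-byte example. It is a sketch, not the code Crunchy generates: all names are invented, it only accepts version 1 data, and it returns false on bad input instead of asserting. This document does not say how the generated code detects the end of a struct, so the sketch simply reads the two fields that each version 1 struct declares, in whatever order they arrive.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { int32_t x, y; } Inner;
typedef struct { int32_t value; Inner nested; } Outer;
typedef struct { const uint8_t *p; size_t left; } Reader;

bool read_u16(Reader *r, uint16_t *out)
{
    if (r->left < 2) return false;
    *out = (uint16_t)(r->p[0] | (r->p[1] << 8));
    r->p += 2; r->left -= 2;
    return true;
}

bool read_u32(Reader *r, uint32_t *out)
{
    if (r->left < 4) return false;
    *out = (uint32_t)r->p[0] | ((uint32_t)r->p[1] << 8) |
           ((uint32_t)r->p[2] << 16) | ((uint32_t)r->p[3] << 24);
    r->p += 4; r->left -= 4;
    return true;
}

/* Consume the raw signature bytes, failing on any mismatch. */
bool read_sig(Reader *r, const char *sig)
{
    size_t n = strlen(sig);
    if (r->left < n || memcmp(r->p, sig, n) != 0) return false;
    r->p += n; r->left -= n;
    return true;
}

/* Read an i32 payload after checking the header's type_id (6 = i32). */
bool read_i32_field(Reader *r, uint16_t type_id, int32_t *out)
{
    uint32_t raw;
    if (type_id != 6 || !read_u32(r, &raw)) return false;
    *out = (int32_t)raw;
    return true;
}

bool read_inner(Reader *r, Inner *out)
{
    uint32_t version;
    if (!read_sig(r, "INNER") || !read_u32(r, &version) || version != 1) return false;
    for (int i = 0; i < 2; i++) {        /* loop + switch on field_id */
        uint16_t type_id, field_id;
        if (!read_u16(r, &type_id) || !read_u16(r, &field_id)) return false;
        switch (field_id) {
        case 1: if (!read_i32_field(r, type_id, &out->x)) return false; break;
        case 2: if (!read_i32_field(r, type_id, &out->y)) return false; break;
        default: return false;
        }
    }
    return true;
}

bool read_outer(const uint8_t *data, size_t len, Outer *out)
{
    Reader r = { data, len };
    uint32_t version;
    if (!read_sig(&r, "OUTER") || !read_u32(&r, &version) || version != 1) return false;
    for (int i = 0; i < 2; i++) {
        uint16_t type_id, field_id;
        if (!read_u16(&r, &type_id) || !read_u16(&r, &field_id)) return false;
        switch (field_id) {
        case 1: if (!read_i32_field(&r, type_id, &out->value)) return false; break;
        case 2: /* 16384 is Inner's type_id in this example */
            if (type_id != 16384 || !read_inner(&r, &out->nested)) return false;
            break;
        default: return false;
        }
    }
    return r.left == 0;   /* every byte must be accounted for */
}
```

Feeding the 46 bytes from the table into read_outer yields value = 42, nested.x = 10, and nested.y = 20. Dropping the final byte makes it return false.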
The binary data produced by Crunchy has no built-in encryption, authentication, or integrity checking. If your application needs any of these properties, you must add them yourself.
Even if you are only deserializing data that you yourself produced (saved game state, local configuration, etc.), it is a good idea to store a checksum alongside the binary data (CRC-32 is fine for this) and verify it. This catches garden-variety data corruption from disk errors, truncated writes, etc.
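If you don't already have a CRC-32 routine available, a bitwise version of the common CRC-32 (reflected polynomial 0xEDB88320, as used by zlib and PNG) is short enough to vendor. This is a plain sketch, slower than a table-driven version, and none of it is part of Crunchy. How you store the checksum next to the payload is up to you.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32: initial value 0xFFFFFFFF, reflected polynomial
 * 0xEDB88320, final XOR with 0xFFFFFFFF. */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                crc = (crc >> 1) ^ 0xEDB88320u;
            } else {
                crc >>= 1;
            }
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Verify a payload against the checksum stored alongside it. Call this
 * before handing the bytes to the deserializer. */
bool payload_is_intact(const uint8_t *data, size_t len, uint32_t stored_crc)
{
    return crc32_ieee(data, len) == stored_crc;
}
```

The standard check value for this CRC is 0xCBF43926 for the ASCII string "123456789", which is a handy self-test.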
If you are deserializing data that comes from an untrusted source (network messages, files uploaded by users, IPC with a less-privileged process, etc.), you need a checksum + authentication. Either the transport already provides authentication (TLS with mutual or server authentication, for example) or you must add your own with a message authentication code such as HMAC-SHA256 applied to the serialized bytes before deserialization.
Without authentication, an attacker can craft binary payloads that, while not exploiting crunchy itself, can set any field of your structs to any value the type allows. Whether that is dangerous depends entirely on what your application does with those values.
The generated deserialization code depends on JSL's JSL_ASSERT macro for bounds
checking during deserialization. By default this maps to assert.
Compiling with NDEBUG defined disables all of these assertions. If you do this, a
truncated or corrupted payload will silently read past the end of the input
buffer.
Defining NDEBUG is a bad idea in general. Manually removing assertions in specific
hot spots in your code makes sense. Disabling assertions across your entire program
is asking for data breaches.
JSON is incredibly slow and bloated for most use cases. If you don't need human-readable, hierarchical, or schema-less data you really should not use JSON.
Every number must be parsed from its decimal string representation. Every string is UTF-8 encoded with escape sequences that must be interpreted. Every field name is repeated in every record. A 4-byte i32 with value 1000000 takes 7 bytes as the text "1000000", plus the field name, plus quotes, plus a colon, plus structural characters. The parsing cost scales with the size of the text, not the size of the data. Changing all of the key names to two character codes with a lookup table in the app becomes a significant optimization on larger documents.
Crunchy's binary format stores that same i32 as 4 bytes, always, preceded by a fixed 4-byte field header (type ID + field ID). There are no field names in the output — just compact numeric identifiers. Each field deserialization is a sequence of fixed-width memory reads with no parsing or decoding loops. JSON deserialization is orders of magnitude slower for any non-trivial data.
JSON also has no schema. Field types, required fields, and versioning are all conventions enforced by application code. Crunchy's schema file is the single source of truth, and the lock file prevents breaking changes between versions.
Protobuf does have two major advantages over Crunchy: it supports forwards compatibility and it has broad cross-language support.
For backwards compatibility, Protobuf relies on manually numbered fields and developer
discipline. Each field has a number (int32 name = 1;) and the wire format uses that number as
the field identifier. Adding new fields with new numbers is safe — old readers skip unknown
field numbers and new readers use default values for missing fields. But protoc only compiles
one version of a schema at a time. It does not compare your current .proto file against any
previous version, so it cannot catch you changing a field's type, reusing a retired field number
across schema versions, or removing a field that existing data still contains.
Protobuf does have a reserved keyword that prevents reuse of specific field numbers or names
within a single message definition, and protoc enforces it. But someone has to remember to add
the reservation when deleting a field — there is no automatic tracking. If you forget, a
teammate can reuse that number and protoc will not complain. Third-party tools like Buf exist
to fill this gap, but they are not part of the standard Protobuf toolchain.
Crunchy's lock file handles all of this automatically. Every invocation compares the current schema against the locked version and rejects incompatible changes like changing types and removing fields that are still in use by the current version. There is no manual field tracking.
Protobuf is mainly designed for passing messages across a network. The C++ implementation
generates message classes with getter/setter methods, not plain structs — fields are private and
accessed through methods like name(), set_name(), has_name(). This means you typically
maintain two representations of your data: the Protobuf message type and your application's own
types, with conversion code between them. Crunchy structs are your application structs. Once
deserialization is done you have a normal C struct ready to use.
Protobuf does not officially support C. There are third-party C implementations (protobuf-c,
nanopb) that generate actual C structs, but they are community maintained and require the C++
protoc compiler to be installed for code generation.
FlatBuffers uses an accessor-based API: you never get plain C structs. Instead, every field access goes through generated functions that chase offsets into a serialized buffer. This means FlatBuffers has zero deserialization cost upfront — you just get a pointer into the buffer and start reading.
The tradeoff is that every field access pays for that indirection. Reading a scalar field in FlatBuffers involves reading a vtable pointer, reading the field offset from the vtable, and then reading the actual value — three dependent memory loads. In a tight loop accessing the same struct repeatedly, this adds up.
Crunchy takes the opposite approach: you pay an upfront deserialization cost where each field is read as a fixed-width value into a plain C struct — no varint decoding, no offset chasing. After deserialization you get real C structs with direct field access (a single load at a known offset).
For types that contain custom containers (hash maps, trees, etc.), crunchy does require an upfront deserialization step. But after that step, you have real data structures with their native performance characteristics — O(1) hash map lookups instead of O(log n) binary search into a sorted buffer, which is how FlatBuffers would represent the same data.
The other difference is that crunchy gives you plain C structs. You can pass them to existing functions, take their address, memcpy them, and use them like any other struct. FlatBuffers requires your entire codebase to interact with data through its generated accessor API.
Cap'n Proto's big idea is that the in-memory representation is the wire format — there is no encode/decode step. You write bytes straight to disk or the network and read them back without parsing. For message-passing between services in the same language this is genuinely fast.
The cost is the same accessor problem as FlatBuffers. Cap'n Proto generates Builder and Reader wrapper types; every field read or write goes through methods that perform pointer arithmetic and bounds checking. You never get a plain C struct you can pass to existing code, memcpy, or inspect in a debugger. Your application either adopts Cap'n Proto types everywhere or maintains a conversion layer between its own types and the message types.
Cap'n Proto also uses a pointer-based encoding (struct pointers, list pointers, far pointers for cross-segment references) that is substantially more complex than a flat sequential format. Implementing a correct reader from scratch is non-trivial, which matters if you value being able to vendor a dependency and understand it end to end.
On the C front, Cap'n Proto is a C++ project. The third-party C library, c-capnproto, is unmaintained.
Even using the C library requires installing the C++ capnp schema compiler, since the C code
generator is a plugin to it.
For backwards compatibility, Cap'n Proto relies on convention and careful ordering. Fields are
numbered with annotations (@0, @1, @2, ...) and new fields must be added at the end of
the struct with the next sequential number. A reader that encounters a struct written with a
newer schema simply ignores the extra data section at the tail. But nothing in the tooling
prevents you from reordering fields, changing a field's type, or reusing a retired field number
— all of which silently corrupt data. The schema compiler checks syntax, not compatibility
between schema versions. It is up to the developer to never make those mistakes.
Crunchy enforces this with the lock file. Every time you run the compiler it compares your
current schema against the locked version and rejects backwards-incompatible changes such as
changing a field's type or ID, or removing fields that are still in use by a live version. When you do
need to remove a field, you give it an end version and bump the struct's VERSION. Crunchy handles the rest
and the old field ID is never accidentally reused. You don't need to remember which numbers are
retired or trust that no one on the team will make an edit that looks harmless but breaks the
wire format.
Crunchy deserializes into plain C structs in a single upfront pass, has no C++ dependency, and the lock file catches backwards-incompatible schema changes that Cap'n Proto leaves to developer discipline.
Because cereal is crunchy. Get it? Cereal? Serial? No?