The smallest and fastest TypeScript web standards-compliant value serialization library in the wild. 3.64 KiB gzipped; 0 dependencies. Strongly-typed and still works fine in plain JS. Efficiently encode and decode your values to and from ArrayBuffers. Integrates very well with WebSockets.
import { array, boolean, float32, object, uint8, varuint } from 'crunches'
const playerSchema = object({
  position: array({
    element: float32(),
    length: 3,
  }),
  health: varuint(),
  jumping: boolean(),
  attributes: object({
    str: uint8(),
    agi: uint8(),
    int: uint8(),
  }),
})

On the server:
const player = {
  position: [-540.2378623, 343.183749, 1201.23897468],
  health: 4000,
  jumping: false,
  attributes: {str: 87, agi: 42, int: 22},
}
// encode the value to a new `DataView`
const view = playerSchema.encode(player)
// use some socket library to send the binary data...
socket.emit('player-data', view)

On the client:
// use some socket library to receive the binary data...
socket.on('player-data', (buffer) => {
  const player = playerSchema.decode(buffer)
})

In this example, the size of the payload is only 18 bytes. The same value run through JSON.stringify would consume 124 bytes.
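The two figures above can be reproduced by hand. The following is a rough sketch that does not use crunches itself: the helper `varuintWidth` is hypothetical and assumes a varuint stores 7 bits of payload per byte, which matches the 2 bytes needed for a health of 4000.

```typescript
// Hypothetical helper: number of bytes a varuint needs,
// assuming 7 payload bits per byte.
const varuintWidth = (value: number): number => {
  let width = 1
  while (value >= 0x80) {
    value = Math.floor(value / 128)
    width += 1
  }
  return width
}

const player = {
  position: [-540.2378623, 343.183749, 1201.23897468],
  health: 4000,
  jumping: false,
  attributes: {str: 87, agi: 42, int: 22},
}

const binarySize =
  player.position.length * 4 + // three float32 values, fixed length so no prefix
  varuintWidth(player.health) + // 4000 needs 2 bytes
  1 + // a lone boolean still occupies a whole byte
  3 // three uint8 attributes

// measure the JSON text as UTF-8 bytes
const jsonSize = new TextEncoder().encode(JSON.stringify(player)).byteLength

console.log(binarySize, jsonSize) // 18 124
```

Most of the JSON overhead comes from repeating the key names and spelling the numbers out as text, neither of which the schema-driven encoding has to do.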
There is a convenience method which will allocate a view over a buffer sized to hold your value.
// create a view for our value
const view = playerSchema.allocate(player)
// pass the view to the encoder
playerSchema.encodeInto(player, view, 0)

It can be useful for performance reasons to reuse your buffers.
This is sugar over the following:
// get the schema size
const size = playerSchema.size(player)
// allocate a buffer
const buffer = new ArrayBuffer(size)
// create a view over the buffer
const view = new DataView(buffer)
// pass the view to the encoder
playerSchema.encodeInto(player, view, 0)

You may encodeInto a view over any existing ArrayBuffer provided that it's large enough to contain the encoded payload.
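To illustrate why reuse helps, here is a minimal sketch of the pattern using only the standard DataView API. The function `encodeVec3Into` is a hypothetical stand-in for a schema's encodeInto, and the 64-byte capacity is an arbitrary assumption.

```typescript
// Hypothetical stand-in for a schema's encodeInto: writes three float32
// values at byteOffset and returns the number of bytes written.
const encodeVec3Into = (value: number[], view: DataView, byteOffset: number): number => {
  for (let i = 0; i < 3; ++i) {
    view.setFloat32(byteOffset + i * 4, value[i], true)
  }
  return 12
}

// allocate once, large enough for the biggest payload we expect
const shared = new DataView(new ArrayBuffer(64))

const positions = [[1, 2, 3], [4, 5, 6]]
let written = 0
for (const position of positions) {
  // each pass overwrites the same bytes instead of allocating a new buffer
  written = encodeVec3Into(position, shared, 0)
}

// only the bytes that were just written are meaningful
const payload = new DataView(shared.buffer, 0, written)
```

The payload view shares memory with the reusable buffer, so it should be sent or copied before the next value is encoded into the same bytes.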
| Type Name | Bytes | Range of Values |
|---|---|---|
| boolean | 1 (worst case, see boolean coalescence) | Truthy values are coerced to true; falsy values to false |
| int8 | 1 | -128 to 127 |
| uint8 | 1 | 0 to 255 |
| int16 | 2 | -32,768 to 32,767 |
| uint16 | 2 | 0 to 65,535 |
| int32 | 4 | -2,147,483,648 to 2,147,483,647 |
| uint32 | 4 | 0 to 4,294,967,295 |
| int64 | 8 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. NOTE: Only accepts and decodes to BigInts. |
| uint64 | 8 | 0 to 18,446,744,073,709,551,615. NOTE: Only accepts and decodes to BigInts. |
| float32 | 4 | 3.4E +/- 38 (7 digits) |
| float64 | 8 | 1.7E +/- 308 (15 digits) |
| string | Prefix followed by the encoded string bytes | Any string |
| buffer | Prefix followed by the bytes of the buffer | Any ArrayBuffer. NOTE: Decodes to a DataView. See: buffers and arrays. |
| varuint | Variable | 0 to 4,294,967,295 |
| varint | Variable | -2,147,483,648 to 2,147,483,647 |
| date | Same as string above after calling toISOString | Value is coerced to Date, e.g. new Date(value).toISOString() |
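The variable-width rows can be illustrated with a common varuint layout. This is a sketch under an assumption, not crunches' confirmed wire format: each byte carries 7 bits of the value, least-significant group first, with the high bit set on every byte except the last. `encodeVaruint` and `decodeVaruint` are hypothetical names.

```typescript
// Assumed layout: 7 payload bits per byte, low group first,
// continuation bit (0x80) on all but the final byte.
const encodeVaruint = (value: number): Uint8Array => {
  const bytes: number[] = []
  let rest = value >>> 0 // treat the input as an unsigned 32-bit integer
  while (rest >= 0x80) {
    bytes.push((rest & 0x7f) | 0x80)
    rest >>>= 7
  }
  bytes.push(rest)
  return Uint8Array.from(bytes)
}

const decodeVaruint = (bytes: Uint8Array): number => {
  let value = 0
  let scale = 1
  for (let i = 0; i < bytes.length; ++i) {
    // multiply instead of shifting so values above 2^31 stay positive
    value += (bytes[i] & 0x7f) * scale
    scale *= 128
    if (0 === (bytes[i] & 0x80)) break
  }
  return value
}

console.log(encodeVaruint(4000).length) // 2
console.log(encodeVaruint(4294967295).length) // 5
```

Under this layout small values cost a single byte, while the top of the uint32 range costs 5 bytes, one more than a fixed-width uint32.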
The object type requires a properties object. It supports optional fields, and its boolean fields are coalesced.
Example:
const schema = object({
  foo: uint32(),
  bar: string().optional(),
})
// 14 = uint32 (4) + optional flag (1) + string prefix (4) + 'hello' (5)
expect(schema.size({foo: 32, bar: 'hello'})).to.equal(14)
// 5 = uint32 (4) + optional flag (1)
expect(schema.size({foo: 32})).to.equal(5)

The array type requires an element key to define the structure of the array elements. It encodes a 32-bit prefix followed by the contents of the array.
const schema = array({
  element: uint32(),
})
// 16 = array prefix (4) + uint32 (4) + uint32 (4) + uint32 (4)
expect(schema.size([1, 2, 3])).to.equal(16)

Arrays of number types decode to the corresponding TypedArray.
Arrays may be specified as fixed-length through the length key.
const schema = array({
  element: uint32(),
  length: 3,
})
// 12 = uint32 (4) + uint32 (4) + uint32 (4)
expect(schema.size([1, 2, 3])).to.equal(12)

No prefix is written, saving 4 bytes!
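The difference between the two layouts can be sketched with a plain DataView. `encodeUint32Array` is a hypothetical helper, and the assumption that the prefix stores the element count (not the byte length) is mine.

```typescript
// Hypothetical helper: writes uint32 elements, preceded by a 32-bit
// prefix unless the schema fixes the length ahead of time.
const encodeUint32Array = (values: number[], fixedLength?: number): DataView => {
  const prefixSize = undefined === fixedLength ? 4 : 0
  const view = new DataView(new ArrayBuffer(prefixSize + values.length * 4))
  let byteOffset = 0
  if (undefined === fixedLength) {
    // assumption: the prefix holds the number of elements
    view.setUint32(byteOffset, values.length, true)
    byteOffset += 4
  }
  for (const value of values) {
    view.setUint32(byteOffset, value, true)
    byteOffset += 4
  }
  return view
}

console.log(encodeUint32Array([1, 2, 3]).byteLength) // 16
console.log(encodeUint32Array([1, 2, 3], 3).byteLength) // 12
```

With a fixed length, both sides already know how many elements to read, so the prefix carries no information and can be dropped.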
The map type requires key and value keys to define the structure of the map. Any iterable will be coerced as entries. It is encoded as an array of entries and decodes to a native Map object.
const schema = map({
  key: int32(),
  value: string(),
})
const value = new Map<number, string>()
value.set(32, 'sup')
value.set(64, 'hi')
// 25 = array prefix (4) + int32 (4) + string prefix (4) + 'sup' (3) + int32 (4) + string prefix (4) + 'hi' (2)
expect(schema.size(value)).to.equal(25)
// same, with coercion
expect(schema.size([[32, 'sup'], [64, 'hi']])).to.equal(25)

The set type requires an element key to define the structure of the set. Any iterable will be coerced. It is encoded as an array and decodes to a native Set object.
const schema = set({
  element: string(),
})
const value = new Set<string>()
value.add('foo')
value.add('bar')
// 18 = array prefix (4) + string prefix (4) + 'foo' (3) + string prefix (4) + 'bar' (3)
expect(schema.size(value)).to.equal(18)
// same, with coercion
expect(schema.size(['foo', 'bar'])).to.equal(18)

Any code monkey worth their salt secretly wonders whether their boolean type actually takes a single bit of space. The crunches answer is: ideally, yes!
It's not an unequivocal "yes" because there is no bit-width primitive when dealing with DataViews in JavaScript; the smallest unit that can be written is a byte. However, boolean fields are packed together as tightly as possible.
In other words, if you have an object with 2 boolean fields, the object itself will encode to 1 byte! This is the case all the way up to 8 boolean fields. If you add another, the object encodes to 2 bytes, up until you have more than 16 boolean fields!
More concretely, packing boolean fields takes Math.ceil(numberOfBooleanFields / 8) bytes of space.
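The packing can be sketched with a few bit operations. This is an illustration of the idea only: `packBooleans` and `unpackBooleans` are hypothetical helpers, and the bit order (first field in the lowest bit) is an assumption.

```typescript
// Pack one boolean per bit; field i lands in byte i >> 3, bit i & 7.
const packBooleans = (flags: boolean[]): Uint8Array => {
  const packed = new Uint8Array(Math.ceil(flags.length / 8))
  flags.forEach((flag, i) => {
    if (flag) packed[i >> 3] |= 1 << (i & 7)
  })
  return packed
}

// The field count is not stored; it is known from the schema.
const unpackBooleans = (packed: Uint8Array, count: number): boolean[] => {
  const flags: boolean[] = []
  for (let i = 0; i < count; ++i) {
    flags.push(0 !== (packed[i >> 3] & (1 << (i & 7))))
  }
  return flags
}

console.log(packBooleans(new Array(8).fill(true)).length) // 1
console.log(packBooleans(new Array(9).fill(true)).length) // 2
```

Because both sides share the schema, the decoder knows how many boolean fields to expect and no per-field framing is needed.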
Object properties may be marked optional by calling the optional method on their type. If the value is undefined upon encoding, the field is encoded as not present. Upon decoding, the presence flag is checked; if the value is not present, value decoding is skipped and undefined is returned as the decoded value.
This is a great fit for sending partial updates to a monolithic state. Without optional fields, a separate schema would have to be defined for every discrete slice of state that could update.
Using the original example with optional fields:
const stateSchema = object({
  position: array({
    element: float32(),
    length: 3,
  }).optional(),
  health: varuint().optional(),
  jumping: boolean().optional(),
  attributes: object({
    str: uint8(),
    agi: uint8(),
    int: uint8(),
  }).optional(),
})

If we were to check the size of a completely blank update:
expect(stateSchema.size({})).to.equal(1)

We will see that the size is 1 byte! It literally doesn't get better than that. How is it only one byte when we have 4 optional fields?
The same packing as for booleans occurs when encoding the presence of optional fields on an object. Each optional field ideally takes a single bit to encode its presence value. In other words, if you have an object with up to 8 optional fields, the presence encoding only takes 1 byte!
More concretely, packing optional flags takes Math.ceil(numberOfOptionalFields / 8) bytes of space.
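The presence encoding can be sketched for a small object whose fields are all optional uint8 values. The type `Attributes`, the helpers `encodeUpdate` and `decodeUpdate`, and the layout (presence bits first, then the present values in schema order) are all assumptions made for illustration.

```typescript
type Attributes = {str?: number, agi?: number, int?: number}
const keys = ['str', 'agi', 'int'] as const
// one presence bit per optional field, rounded up to whole bytes
const flagBytes = Math.ceil(keys.length / 8)

const encodeUpdate = (update: Attributes): Uint8Array => {
  const present = keys.filter((key) => undefined !== update[key])
  const bytes = new Uint8Array(flagBytes + present.length)
  let byteOffset = flagBytes
  keys.forEach((key, i) => {
    const value = update[key]
    if (undefined !== value) {
      bytes[i >> 3] |= 1 << (i & 7) // mark the field as present
      bytes[byteOffset++] = value // then write its uint8 value
    }
  })
  return bytes
}

const decodeUpdate = (bytes: Uint8Array): Attributes => {
  const update: Attributes = {}
  let byteOffset = flagBytes
  keys.forEach((key, i) => {
    // absent fields are skipped entirely and stay undefined
    if (0 !== (bytes[i >> 3] & (1 << (i & 7)))) {
      update[key] = bytes[byteOffset++]
    }
  })
  return update
}

console.log(encodeUpdate({}).length) // 1
console.log(encodeUpdate({agi: 42}).length) // 2
```

An empty update costs only the flag byte, and each present field adds just its own encoded size on top of that.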
crunches defaults to little-endian byte ordering to align with the majority of architectures' implementation of TypedArray. This may be overridden on any crunches type:
const stateSchema = object({
  health: varuint(), // by default, properties inherit the endianness of their parent
  strength: varuint(), // so, these properties are big endian
  accumulator: uint32().littleEndian(), // but children may override their endianness
}).bigEndian(); // the object is big endian

You may define your own codecs:
import { CrunchesString, CrunchesType, object, string, Target } from 'crunches'
type CoercibleToDate = Date | string | number
export class MySuperCustomDate
  // extend CrunchesType<OUTPUT_TYPE, INPUT_TYPE> to create your codec!
  //
  // this means our codec outputs `Date`s and accepts `Date`s, `string`s and `number`s.
  extends CrunchesType<Date, CoercibleToDate>
{
  // we're delegating to the string codec
  private readonly $$string: CrunchesString
  constructor() {
    super()
    this.$$string = new CrunchesString()
  }
  // propagate endianness to any "child" codecs
  bigEndian(): this {
    // only propagate if the child hasn't overridden its endianness
    if (undefined === this.$$string.isLittleEndian) {
      this.$$string.bigEndian()
    }
    return super.bigEndian()
  }
  decodeFrom(view: DataView, target: Target): Date {
    return new Date(this.$$string.decodeFrom(view, target))
  }
  encodeInto(value: CoercibleToDate, view: DataView, byteOffset: number): number {
    return this.$$string.encodeInto(new Date(value).toISOString(), view, byteOffset)
  }
  // propagate endianness to any "child" codecs
  littleEndian(): this {
    // only propagate if the child hasn't overridden its endianness
    if (undefined === this.$$string.isLittleEndian) {
      this.$$string.littleEndian()
    }
    return super.littleEndian()
  }
  sizeOf(value: CoercibleToDate): number {
    return this.$$string.sizeOf(new Date(value).toISOString())
  }
}
// export a small helper function to make things smooth for your consumers!
// using e.g. `string()` instead of `new CrunchesString()` is a nicer experience
export const mySuperCustomDate = () => new MySuperCustomDate()

This class is using CrunchesString to delegate encoding/decoding strings to/from the wire. All crunches codecs are available to import directly.
Our codec delegates the work of these methods to the CrunchesString codec, but each deserves a brief description:
- decodeFrom(view, target): Decode and return a value from the DataView, starting at target.byteOffset. You must increment target.byteOffset by the number of bytes you consume from the DataView when decoding.
- encodeInto(value, view, byteOffset): Encode value into the DataView, starting at byteOffset. You must return the number of bytes written to the DataView.
- sizeOf(value): Return the computed size of value in bytes.
We could use the codec we just defined like so:
const schema = object({
  name: string(),
  when: mySuperCustomDate(),
})
const encoded = schema.encode({
  name: 'John Doe',
  when: 1234567890123,
})
expect(schema.decode(encoded)).to.deep.equal({
  name: 'John Doe',
  when: new Date('2009-02-13T23:31:30.123Z') // above timestamp equivalent as UTC date
})

SchemaPack (huge respect to it, and an inspiration for this library! ❤️) is great for packing objects into Node buffers. Over time, this approach has become outdated in favor of modern standards like ArrayBuffer. I also took inspiration for fluent API design from Zod. Great library!
It is also frequently desirable to preallocate and reuse buffers for performance reasons. SchemaPack always allocates new buffers when encoding. The performance hit is generally less than the naive case since Node is good about buffer pooling, but performance degrades in the browser (and doesn't exist on any other platform). Buffer reuse is the Correct Way™. We also apply even more optimizations for buffers and arrays.
I also wanted an implementation that does amazing things like boolean coalescence and optional fields (also with coalescence), as well as supporting even more types like Maps, Sets, Dates, etc.
When defining arrays, the elements are all the same type. There is no mixing of types. If you need this, you might consider using an array of objects (which themselves may contain arrays).
SchemaPack uses varuint prefixes for arrays, buffers, and strings. For speed, crunches uses 32-bit prefixes by default. A varuint prefix may be used for buffers and strings by providing a varuint key in the schema blueprint:
const schema = string({
  varuint: true,
})
// 6 = varuint prefix (1) + 'hello' (5)
expect(schema.size('hello')).to.equal(6)

NOTE: Strings may use one more byte than necessary to encode the prefix. This is because string.length * 3 is used to calculate the width of the varuint prefix. This expression will most likely overestimate the space required to store the string. One byte of space in certain cases is a better tradeoff than the space/time complexity required to calculate the true size in a performant way.
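The overestimate can be seen by comparing the upper bound with the exact UTF-8 length. This sketch is standalone: `varuintWidth`, `estimatedPrefixWidth`, and `exactPrefixWidth` are hypothetical helpers, and the 7-bits-per-byte varuint width is an assumption. The factor of 3 is safe because one UTF-16 code unit never expands to more than 3 UTF-8 bytes.

```typescript
// Hypothetical helper: bytes needed by a varuint at 7 payload bits per byte.
const varuintWidth = (value: number): number => {
  let width = 1
  while (value >= 0x80) {
    value = Math.floor(value / 128)
    width += 1
  }
  return width
}

// cheap upper bound: no need to walk the string
const estimatedPrefixWidth = (value: string): number =>
  varuintWidth(value.length * 3)

// exact answer: requires encoding (or scanning) the whole string first
const exactPrefixWidth = (value: string): number =>
  varuintWidth(new TextEncoder().encode(value).byteLength)

const short = 'hello'
const medium = 'a'.repeat(50)

console.log(estimatedPrefixWidth(short), exactPrefixWidth(short)) // 1 1
console.log(estimatedPrefixWidth(medium), exactPrefixWidth(medium)) // 2 1
```

The 50-character ASCII string is the kind of case the note describes: its 50 bytes would fit a 1-byte prefix, but the bound of 150 pushes the prefix to 2 bytes.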
Arrays always use a 32-bit prefix and may not specify a varuint prefix. This is because any iterable may be coerced into an array. It is technically possible to implement varuint prefixes in a performant way, but only for actual arrays (or Sets) which can be coerced to TypedArrays. That might be confusing, since the option would have to be ignored in some cases even though the user could specify it, and it would introduce more implementation complexity.
A massive performance gain is achieved by copy-free buffer decoding. In other words, a buffer value is not copied out of the binary from which it is decoded; a DataView is created over the encoded binary and the DataView is returned. Decoding a 1024-byte buffer is 10x faster on the machine used to benchmark. The gains increase even more as the size of the buffer increases.
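The difference between copying and viewing can be shown with the standard ArrayBuffer and DataView APIs alone. The two function names are hypothetical and only contrast the approaches; they are not crunches' internals.

```typescript
// Copying: slice allocates a new ArrayBuffer and copies the bytes into it.
const decodeBufferByCopy = (encoded: ArrayBuffer, byteOffset: number, length: number): DataView =>
  new DataView(encoded.slice(byteOffset, byteOffset + length))

// Copy-free: the DataView only records an offset and a length into the
// existing ArrayBuffer, so the cost does not grow with the payload size.
const decodeBufferByView = (encoded: ArrayBuffer, byteOffset: number, length: number): DataView =>
  new DataView(encoded, byteOffset, length)

// pretend bytes 4..7 hold the payload of an encoded buffer value
const encoded = new ArrayBuffer(8)
new Uint8Array(encoded).set([1, 2, 3, 4], 4)

const copied = decodeBufferByCopy(encoded, 4, 4)
const viewed = decodeBufferByView(encoded, 4, 4)

// overwrite the underlying binary, as a reused receive buffer would
new Uint8Array(encoded)[4] = 99

console.log(copied.getUint8(0), viewed.getUint8(0)) // 1 99
```

The flip side of the speedup is visible in the last line: a copy-free result aliases the encoded binary, so it should be consumed (or copied) before that binary is overwritten.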
A similar optimization is applied to arrays. The fast path is used for arrays of the following types:
- int8 (Int8Array)
- uint8 (Uint8Array)
- int16 (Int16Array)
- uint16 (Uint16Array)
- int32 (Int32Array)
- uint32 (Uint32Array)
- int64 (BigInt64Array)
- uint64 (BigUint64Array)
- float32 (Float32Array)
- float64 (Float64Array)
Instead of copying the data from the buffer, a TypedArray is created over the encoded binary and returned instead. The same optimization is applied for encoding. This is roughly 1.5x faster for encoding and 50x faster for decoding a 1024-byte array on the machine used to benchmark. The gains increase even more as the size of the array increases.
NOTE: TypedArrays are padded with extra bytes if necessary to satisfy the required alignment.
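The padding is needed because a TypedArray can only be created over an ArrayBuffer at a byte offset that is a multiple of its element size. The sketch below shows the idea with standard APIs; `alignedOffset` is a hypothetical helper and the exact padding scheme crunches uses is not shown here.

```typescript
// Round a byte offset up to the next multiple of the element size.
const alignedOffset = (byteOffset: number, bytesPerElement: number): number =>
  Math.ceil(byteOffset / bytesPerElement) * bytesPerElement

const buffer = new ArrayBuffer(24)
const view = new DataView(buffer)

// a single uint8 field leaves the cursor at an odd offset
let byteOffset = 0
view.setUint8(byteOffset, 7)
byteOffset += 1

// skip 3 padding bytes so the floats start on a 4-byte boundary
const start = alignedOffset(byteOffset, Float32Array.BYTES_PER_ELEMENT)

// copy-free: the Float32Array reads and writes the buffer directly
const floats = new Float32Array(buffer, start, 3)
floats.set([1.5, 2.5, 3.5])

console.log(start, floats.byteOffset, floats.length) // 4 4 3
```

Without the padding, constructing the Float32Array at offset 1 would throw a RangeError, which is why a few extra bytes are traded for the copy-free fast path.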
Q: Why did you call it crunches?
A: 'cuz you gotta crunch those flabby AB(ArrayBuffer)s! 😋
For entertainment purposes only.
> npx tsx benchmark/run.ts
encoding x 10000
SchemaPack 342.09 ms
crunches (encodeInto) 154.93 ms
crunches (encode) 250.44 ms
decoding x 10000
SchemaPack 221.75 ms
crunches 118.27 ms