crunches 💪

The smallest and fastest TypeScript web standards-compliant value serialization library in the wild. 3.64 KiB gzipped; 0 dependencies. Strongly-typed and still works fine in plain JS. Efficiently encode and decode your values to and from ArrayBuffers. Integrates very well with WebSockets.

Example

import { array, boolean, float32, object, uint8, varuint } from 'crunches'

const playerSchema = object({
  position: array({
    element: float32(),
    length: 3,
  }),
  health: varuint(),
  jumping: boolean(),
  attributes: object({
    str: uint8(),
    agi: uint8(),
    int: uint8(),
  }),
})

On the server:

const player = {
  position: [-540.2378623, 343.183749, 1201.23897468],
  health: 4000,
  jumping: false,
  attributes: {str: 87, agi: 42, int: 22},
}

// encode the value to a new `DataView`
const view = playerSchema.encode(player)
// use some socket library to send the binary data...
socket.emit('player-data', view)

On the client:

// use some socket library to receive the binary data...
socket.on('player-data', (buffer) => {
  const player = playerSchema.decode(buffer)
})

In this example, the size of payload is only 18 bytes. JSON.stringify would consume 124 bytes.

Allocating a buffer and view

There is a convenience method which will allocate a view over a buffer sized to hold your value.

// create a view for our value
const view = playerSchema.allocate(player)
// pass the view to the encoder
playerSchema.encodeInto(player, view, 0)

It can be useful for performance reasons to reuse your buffers.

This is sugar over the following:

// get the schema size
const size = playerSchema.size(player)
// allocate a buffer
const buffer = new ArrayBuffer(size)
// create a view over the buffer
const view = new DataView(buffer)
// pass the view to the encoder
playerSchema.encodeInto(player, view, 0)

You may encodeInto a view over any existing ArrayBuffer provided that it's large enough to contain the encoded payload.

Primitive types

Type Name Bytes Range of Values

boolean 1 (worst case, see boolean coalescence) Truthy values are coerced to true; falsy values to false

int8 1 -128 to 127

uint8 1 0 to 255

int16 2 -32,768 to 32,767

uint16 2 0 to 65,535

int32 4 -2,147,483,648 to 2,147,483,647

uint32 4 0 to 4,294,967,295

int64 8 -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

NOTE: Only accepts and decodes to BigInts.

uint64 8 0 to 18,446,744,073,709,551,615

NOTE: Only accepts and decodes to BigInts.

float32 4 3.4E +/- 38 (7 digits)

float64 8 1.7E +/- 308 (15 digits)

string Prefix followed by the encoded string bytes Any string

buffer Prefix followed by the bytes of the buffer Any ArrayBuffer

NOTE: Decodes to a DataView.

See: buffers and arrays.

varuint

size	min	max
1	0	127
2	128	16,383
3	16,384	2,097,151
4	2,097,152	268,435,455
5	268,435,456	4,294,967,295

0 to 4,294,967,295

varint

size	min	max
1	-64	63
2	-8,192	8,191
3	-1,048,576	1,048,575
4	-134,217,728	134,217,727
5	-2,147,483,648	2,147,483,647

-2,147,483,648 to 2,147,483,647

date Same as string above after calling toIsoString Value is coerced to Date e.g. new Date(value).toIsoString()

Aggregate types

`object`

Requires a properties object. Supports optional fields. Booleans are coalesced.

Example:

const schema = object({
  foo: uint32(),
  bar: string().optional(),
})
// 14 = uint32 (4) + optional flag (1) + string prefix (4) + 'hello' (5)
expect(schema.size({foo: 32, bar: 'hello'})).to.equal(14)
// 5 = uint32 (4) + optional flag (1)
expect(schema.size({foo: 32})).to.equal(5)

`array`

Requires an element key to define the structure of the array elements. Encodes a 32-bit prefix followed by the contents of the array.

const schema = array({
  element: uint32(),
})
// 16 = array prefix (4) + uint32 (4) + uint32 (4) + uint32 (4)
expect(schema.size([1, 2, 3])).to.equal(16)

Arrays of number types decode to the corresponding TypedArray.

Fixed-length arrays

Arrays may be specified as fixed-length through the length key.

const schema = array({
  element: uint32(),
  length: 3,
})
// 12 = uint32 (4) + uint32 (4) + uint32 (4)
expect(schema.size([1, 2, 3])).to.equal(12)

No prefix is written, saving 4 bytes!

`map`

Requires a key and value key to define the structure of the map. Any iterable will be coerced as entries. Encoded as an array of entries. Decodes to a native Map object.

const schema = map({
  key: int32(),
  value: string(),
})
const value = new Map<number, string>()
value.set(32, 'sup')
value.set(64, 'hi')
// 25 = array prefix (4) + int32 (4) + string prefix (4) + 'sup' (3) + int32 (4) + string prefix (4) + 'hi' (2)
expect(schema.size(value)).to.equal(25)
// same, with coercion
expect(schema.size([[32, 'sup'], [64, 'hi']])).to.equal(25)

`set`

Requires an element key to define the structure of the map. Any iterable will be coerced. Encoded as an array. Decodes to a native Set object.

const schema = set({
  element: string(),
})
const value = new Set<string>()
value.add('foo')
value.add('bar')
// 18 = array prefix (4) + string prefix (4) + 'foo' (3) + string prefix (4) + 'bar' (3)
expect(schema.size(value)).to.equal(18)
// same, with coercion
expect(schema.size(['foo', 'bar'])).to.equal(18)

🔥 features

Boolean coalescence

Any code monkey worth their salt secretly wonders whether their boolean type actually takes a single bit of space. The crunches answer is: ideally, yes!

The reason it's not an unequivocal "yes" is because there is no actual bit-width primitive when dealing with DataViews in JavaScript. However, boolean fields are packed as much as possible.

In other words, if you have an object with 2 boolean fields, the object itself will encode to 1 byte! This is the case all the way up to 8 boolean fields. If you add another, the object encodes to 2 bytes, up until you have more than 16 boolean fields!

More concretely, packing boolean fields takes

Math.ceil(numberOfBooleanFields / 8)

bytes of space.

Optional fields

Object properties may call an optional method. If the value is undefined upon encoding, the field will be encoded as not present. Upon decoding, the presence flag is checked and if the value is not present, the value decoding will be skipped and undefined will be returned as the decoded value.

This is a great alternative for rolling updates to a monolithic state, which would otherwise have to be individually defined for every discrete slice of state that could update.

Using the original example with optional fields:

const stateSchema = object({
  position: array({
    element: float32(),
    length: 3,
  }).optional(),
  health: varuint().optional(),
  jumping: boolean().optional(),
  attributes: object({
    str: uint8(),
    agi: uint8(),
    int: uint8(),
  }).optional(),
})

if we were to check the size of a completely blank update:

expect(stateSchema.size({})).to.equal(1)

We will see that the size is 1 byte! It literally doesn't get better than that. How is it only one byte when we have 4 optional fields? Well,

Optional field coalescence

The same packing as for booleans occurs when encoding the presence of optional fields on an object. Each optional field ideally takes a single bit to encode its presence value. In other words, if you have an object with up to 8 optional fields, the presence encoding only takes 1 byte!

More concretely, packing optional flags takes

Math.ceil(numberOfOptionalFields / 8)

bytes of space.

Endianness

crunches defaults to little-endian byte ordering to align with the majority of architectures' implementation of TypedArray. This may be overridden on any crunches type:

const stateSchema = object({
  health: varuint(), // by default, properties inherit the endianness of their parent
  strength: varuint(), // so, these properties are big endian
  accumulator: uint32().littleEndian(), // but children may override their endianness
}).bigEndian(); // the object is big endian

Extensible

You may define your own codecs:

import { CrunchesString, CrunchesType, object, string, Target } from 'crunches'

type CoercibleToDate = Date | string | number

export class MySuperCustomDate

  // extend CrunchesType<OUTPUT_TYPE, INPUT_TYPE> to create your codec!
  //
  // this means our codec outputs `Date`s and accepts `Date`s, `string`s and `number`s.
  extends CrunchesType<Date, CoercibleToDate>
{
  // we're delegating to the string codec
  private readonly $$string: CrunchesString

  constructor() {
    super()
    this.$$string = new CrunchesString()
  }

  // propagate endianness to any "child" codecs
  bigEndian(): this {
    // only propagate if the child hasn't overridden its endianness
    if (undefined === this.$$string.isLittleEndian) {
      this.$$string.bigEndian()
    }
    return super.bigEndian()
  }

  decodeFrom(view: DataView, target: Target): Date {
    return new Date(this.$$string.decodeFrom(view, target))
  }

  encodeInto(value: CoercibleToDate, view: DataView, byteOffset: number): number {
    return this.$$string.encodeInto(new Date(value).toISOString(), view, byteOffset)
  }

  // propagate endianness to any "child" codecs
  littleEndian(): this {
    // only propagate if the child hasn't overridden its endianness
    if (undefined === this.$$string.isLittleEndian) {
      this.$$string.littleEndian()
    }
    return super.littleEndian()
  }

  sizeOf(value: CoercibleToDate): number {
    return this.$$string.sizeOf(new Date(value).toISOString())
  }
}

// export a small helper function to make things smooth for your consumers!
// using e.g. `string()` instead of `new CrunchesString()` is a nicer experience
export const mySuperCustomDate = () => new MySuperCustomDate()

This class is using CrunchesString to delegate encoding/decoding strings to/from the wire. All crunches codecs are available to import directly.

We're delegating to the CrunchesString codec for the methods, but we'll discuss them briefly.

decodeFrom

Decode and return a value from the DataView, starting at target.byteOffset. You must increment target.byteOffset by the number of bytes you consume from the DataView when decoding.
encodeInto

Encode value into the DataView, starting at byteOffset. You must return the number of bytes written to the DataView.
sizeOf

Return the computed size of value in bytes.

Using what we wrote

We could use the codec we just defined like so:

const schema = object({
  name: string(),
  when: mySuperCustomDate(),
})

const encoded = schema.encode({
  name: 'John Doe',
  when: 1234567890123,
})

expect(schema.decode(encoded)).to.deep.equal({
  name: 'John Doe',
  when: new Date('2009-02-13T23:31:30.123Z') // above timestamp equivalent as UTC date
})

Motivation

SchemaPack (huge respect from and inspiration for this library! ❤️) is great for packing objects into Node buffers. Over time, this approach has become outdated in favor of modern standards like ArrayBuffer. I also took inspiration for fluent API design from Zod. Great library!

It is also frequently desirable to preallocate and reuse buffers for performance reasons. SchemaPack always allocates new buffers when encoding. The performance hit is generally less than the naive case since Node is good about buffer pooling, but performance degrades in the browser (and doesn't exist on any other platform). Buffer reuse is the Correct Way™. We also apply even more optimizations of buffers and arrays.

I also wanted an implementation that does amazing things like boolean coalescence and optional fields (also with coalescence) as well as supporting more even more types like Maps, Sets, Dates, etc.

Notable differences from SchemaPack

Monomorphic arrays

When defining arrays, the elements are all the same type. There is no mixing of types. If you need this, you might consider using an array of objects (which themselves maybe contain arrays).

Prefixes

SchemaPack uses varuint prefixes for arrays, buffers, and strings. For speed, crunches uses 32-bit prefixes by default. A varuint prefix may be used for buffers and strings by providing a varuint key in the schema blueprint:

const schema = string({
  varuint: true,
})
// 6 = varuint prefix (1) + 'hello' (5)
expect(schema.size('hello')).to.equal(6)

NOTE: Strings may use one extra byte to encode the prefix than necessary. This is because string.length * 3 is used to calculate the width of the varuint prefix. This expression will most likely overestimate the space required to store the string. One byte of space in certain cases is a better tradeoff than the space/time complexity required to calculate the true size in a performant way.

Arrays always use a 32-bit prefix and may not specify a varuint prefix. This is because any iterable may be coerced into an array. It is technically possible to implement varuint prefixes in a performant way only for actual arrays (or Sets) which can be coered to TypedArrays, however it might be confusing as it would need to be ignored in cases even when it could be specified by the user and would introduce more implementation complexity.

Buffers and arrays

A massive performance gain is achieved by copy-free buffer decoding. In other words, a buffer value is not copied out of the binary from which it is decoded; a DataView is created over the encoded binary and the DataView is returned. Decoding a 1024-byte buffer is 10x faster on the machine used to benchmark. The gains increase even more as the size of the buffer increases.

A similar performance gain is also used for arrays. The fast path is used for arrays of the following types:

int8 (Int8Array)
uint8 (Uint8Array)
int16 (Int16Array)
uint16 (Uint16Array)
int32 (Int32Array)
uint32 (Uint32Array)
int64 (BigInt64Array)
uint64 (BigUint64Array)
float32 (Float32Array)
float64 (Float64Array)

Instead of copying the data from the buffer, a TypedArray is created over the encoded binary and returned instead. The same optimization is applied for encoding. This is roughly 1.5x faster for encoding and 50x faster for decoding a 1024-byte array on the machine used to benchmark. The gains increase even more as the size of the array increases.

NOTE: TypedArrays are padded with extra bytes if necessary to satisfy the required alignment.

Q/A

Q: Why did you call it crunches?
A: 'cuz you gotta crunch those flabby AB(ArrayBuffer)s! 😋

Benchmark

For entertainment purposes only.

> npx tsx benchmark/run.ts

encoding x 10000
  SchemaPack             342.09 ms
  crunches (encodeInto)	 154.93 ms
  crunches (encode)	     250.44 ms
decoding x 10000
  SchemaPack             221.75 ms
  crunches               118.27 ms

Name		Name	Last commit message	Last commit date
Latest commit History 78 Commits
.github/workflows		.github/workflows
benchmark		benchmark
src		src
.gitignore		.gitignore
.release-please-manifest.json		.release-please-manifest.json
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
release-please-config.json		release-please-config.json
tsconfig.build.json		tsconfig.build.json
tsconfig.json		tsconfig.json
vite.config.ts		vite.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

crunches 💪

Example

Allocating a buffer and view

Primitive types

Aggregate types

`object`

`array`

Fixed-length arrays

`map`

`set`

🔥 features

Boolean coalescence

Optional fields

Optional field coalescence

Endianness

Extensible

`decodeFrom`

`encodeInto`

`sizeOf`

Using what we wrote

Motivation

Notable differences from SchemaPack

Monomorphic arrays

Prefixes

Buffers and arrays

Q/A

Benchmark

About

Uh oh!

Releases 17

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

crunches 💪

Example

Allocating a buffer and view

Primitive types

Aggregate types

object

array

Fixed-length arrays

map

set

🔥 features

Boolean coalescence

Optional fields

Optional field coalescence

Endianness

Extensible

decodeFrom

encodeInto

sizeOf

Using what we wrote

Motivation

Notable differences from SchemaPack

Monomorphic arrays

Prefixes

Buffers and arrays

Q/A

Benchmark

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 17

Uh oh!

Contributors

Uh oh!

Languages

`object`

`array`

`map`

`set`

`decodeFrom`

`encodeInto`

`sizeOf`