GitHub - johnw42/overflowing_int: Wrapper around `num-bigint` to provide better performance for small integers

A wrapper around num-bigint that provides faster, more memory-efficient representations for integers that fit in primitive types. It can serve as a drop-in replacement for num-bigint in most cases. The intended use case is to choose one of the implementations and import it using renaming, such as use overflowing_int::ArcInt128 as BigInt;.

Caveats

Certain methods necessarily have different signatures. In particular, from_bigint, to_bigint and magnitude have different signatures, and this crate has its own version of TryFromBigIntError.

As nice as it would be for the wrapper types and num-bigint's types to implement a common trait, it isn't feasible because so much of the functionality is provided by traits where the necessary trait bounds can't be expressed as a where clause on the trait itself.

Choosing an Implementation

If you're working with numbers that are always, or almost always, big, this crate is not for you. Use BigInt and BigUint from num-bigint instead, because they offer better performance for big values, and the overhead of this crate's wrapper types is not worth it in that case.

In general, my recommendation is to use the Arc-based wrapper types because they provide a good balance of performance and convenience, and they have the ability to share big values between instances, which is often relevant because many formulas include the same value more than once. In particular, the 64-bit Arc wrapper types are the smallest, making them an ideal choice for cases where large values are rare, such as an interpreter that needs to support arbitrary precision integers.

The Cow-based wrapper types are a good choice if you want to avoid depending on code that uses unsafe, or if you want to be able to borrow small values from a big value without cloning. However, they are less convenient to use because they have lifetime parameters that must be managed.

The Overflowing wrapper types are the most naive implementation. Only use them if you don't care about sharing bignum values and you want to avoid depending on code that uses unsafe.

The following table summarizes the differences between the wrapper types, assuming a 64-bit target architecture. The "Max Small Bits" column indicates the maximum number of bits that can be stored inline.

Type	size_of	Max Small Bits	Uses `unsafe`?
BigInt	32	0	no
BigUint	24	0	no
ArcInt64	8	63	yes
ArcUint64	8	63	yes
ArcInt128	16	127	yes
ArcUint128	16	127	yes
CowInt64	32	64	no
CowUint64	24	64	no
CowInt128	32	128	no
CowUint128	32	128	no
OverflowingI64	32	64	no
OverflowingU64	24	64	no
OverflowingI128	32	128	no
OverflowingU128	32	128	no

Safety

All of the wrapper types in this crate are safe to use, and all of their methods are safe to call. However, if you're concerned about depending code that uses unsafe, avoid using the Arc-based wrapper types, because they use unsafe internally and assume pointers never have 1 in their LSB.

Performance

This crate has been heavily benchmarked to ensure optimal performance and to highlight the performance characteristics of the different encoding types.

For small numbers, the 64-bit wrapper types outperform num-bigint by a factor of roughly 10x, and the 128-bit wrapper types outperform num-bigint by a factor of roughly 3x.

For big numbers, the overhead of the wrapper types is generally around a factor of 2.

There are no Rc-based wrapper types because benchmarking has shown that they don't provide any meaningful performance benefits over the Arc-based wrapper types. In the case of small numbers, no reference counting is done, and in the case of big numbers, the overhead of reference counting is negligible compared to the overhead of performing arithmetic on big numbers.

Benchmark Results

The Pi benchmark clearly shows the relative performance of of the different implementations using a somewhat realistic workload. The chart below shows the time taken to compute different numbers of digits of pi. The line labeled Control shows the result of using BigInt directly, and the line labeled Identity shows an trivial encoding that always encodes its value as a BigInt; it is meant to gague the overhead of using an encoding when no optimizations apply for small numbers.

The chart shows that ArcInt64 is clearly the fastest for a workload involving mostly small integers (less than 63 bits), but it quickly ceases to be helpful once large integers are in play. The next fastest implementations. The 128-bit encodings show performance improvements for calculating up to 30-40 digits of pi, with CowInt128 and OverflowingI128 being the clear performance winners over ArcInt128. This is to be expected, as the benchmark does little that would take advantage of sharing BigInt values between encoding instances. The lack of sharing also explains why CowInt128 shows no advantage over OverflowingI128.

The performance gap between all implementations starts to close as the number of digits approaches 1000, showing that in this case, the small integer optimizations are not helpful, but the cost of performing arithmetic on increasingly large integers begins to dwarf the overhead of the useless optimizations.

An alternate way to look at the same performance data is to look at throughput, measured as the number of digits of pi computed per second. Aside for perhaps being slighly easier to read, this chart shows that every type, including BigInt itself, has a "sweet spot" for throughput.

Additions to the `BigInt` and `BigUint` APIs

The wrapper types provide a few methods that are not present on BigInt and BigUint. The into_static method, useful only for Cow-based types, uses cloning if necessary to coerce a value into having a 'static lifetime. The reencode_from method allows a value to be re-encoded from one encoding type to another, which is useful when mixing encodings (though I don't necessarily recommend mixing encodings in a single codebase).

Testing

This crate include a thorough set of unit tests, with a large number of property-based tests using the quickcheck crate. The property-based tests are designed to test the consistency of the wrapper types with BigInt and BigUint. They include tests for all overloaded operators and other methods implemented through traits.

The tests assume the num-bigint types are themselves correct, so they are not designed to catch bugs in num-bigint's implementation. Instead, they are designed to catch bugs in the wrapper types' implementation, and to ensure that the wrapper types are consistent with num-bigint's types.

Implementation Notes

The implementation of this crate is in layers. The lowest layer is the BigNumber trait, which exposes the APIs of BigInt and BigUint through a trait that includes all their overloaded operators in a single trait.

The next layer is the Encoding trait, which defines how to encode small values and big values, and how to convert between them. This layer is where the different encoding strategies are implemented.

The top layer is the wrapper types, which are the types that users of this crate will interact with. They are implemented in terms of the Encoding trait, and they provide roughly the same APIs as BigInt and BigUint, with its same trait implementations, but with different performance characteristics depending on the encoding used.

Overloaded operators, implemented in the num_ops module, have their own set of layers. The top level is a set of traits for different categories of operators: arithmetic, bitwise, shift, and the power operator. These trait provide the logic to dispatch different argument types to the most specific overloads provided by BigInt and BigUint. These operators come in many combinations of argument types based which primitive type they accept and whether the arguments are consumed or borrowed.

In the next layer, the operator traits are implemented for specific operators, and finally, the operators themselves are implemented on the wrapper types by delegating to the operator trait implementations.

This crate makes heavy use of macros to avoid code duplication. The duplicate crate used heavily, and the paste crate is indespensible for building method names systematically.

Name		Name	Last commit message	Last commit date
Latest commit History 154 Commits
benches		benches
examples		examples
src		src
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md
benchmark.py		benchmark.py
regroup_bounds.py		regroup_bounds.py
throughput.svg		throughput.svg
time.svg		time.svg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Caveats

Choosing an Implementation

Safety

Performance

Benchmark Results

Additions to the `BigInt` and `BigUint` APIs

Testing

Implementation Notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Caveats

Choosing an Implementation

Safety

Performance

Benchmark Results

Additions to the BigInt and BigUint APIs

Testing

Implementation Notes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Additions to the `BigInt` and `BigUint` APIs

Packages