Skip to content

johnw42/overflowing_int

Repository files navigation

A wrapper around num-bigint that provides faster, more memory-efficient representations for integers that fit in primitive types. It can serve as a drop-in replacement for num-bigint in most cases. The intended use case is to choose one of the implementations and import it using renaming, such as use overflowing_int::ArcInt128 as BigInt;.

Caveats

Certain methods necessarily have different signatures. In particular, from_bigint, to_bigint and magnitude have different signatures, and this crate has its own version of TryFromBigIntError.

As nice as it would be for the wrapper types and num-bigint's types to implement a common trait, it isn't feasible because so much of the functionality is provided by traits where the necessary trait bounds can't be expressed as a where clause on the trait itself.

Choosing an Implementation

If you're working with numbers that are always, or almost always, big, this crate is not for you. Use BigInt and BigUint from num-bigint instead, because they offer better performance for big values, and the overhead of this crate's wrapper types is not worth it in that case.

In general, my recommendation is to use the Arc-based wrapper types because they provide a good balance of performance and convenience, and they have the ability to share big values between instances, which is often relevant because many formulas include the same value more than once. In particular, the 64-bit Arc wrapper types are the smallest, making them an ideal choice for cases where large values are rare, such as an interpreter that needs to support arbitrary precision integers.

The Cow-based wrapper types are a good choice if you want to avoid depending on code that uses unsafe, or if you want to be able to borrow small values from a big value without cloning. However, they are less convenient to use because they have lifetime parameters that must be managed.

The Overflowing wrapper types are the most naive implementation. Only use them if you don't care about sharing bignum values and you want to avoid depending on code that uses unsafe.

The following table summarizes the differences between the wrapper types, assuming a 64-bit target architecture. The "Max Small Bits" column indicates the maximum number of bits that can be stored inline.

Type size_of Max Small Bits Uses unsafe?
BigInt 32 0 no
BigUint 24 0 no
ArcInt64 8 63 yes
ArcUint64 8 63 yes
ArcInt128 16 127 yes
ArcUint128 16 127 yes
CowInt64 32 64 no
CowUint64 24 64 no
CowInt128 32 128 no
CowUint128 32 128 no
OverflowingI64 32 64 no
OverflowingU64 24 64 no
OverflowingI128 32 128 no
OverflowingU128 32 128 no

Safety

All of the wrapper types in this crate are safe to use, and all of their methods are safe to call. However, if you're concerned about depending code that uses unsafe, avoid using the Arc-based wrapper types, because they use unsafe internally and assume pointers never have 1 in their LSB.

Performance

This crate has been heavily benchmarked to ensure optimal performance and to highlight the performance characteristics of the different encoding types.

For small numbers, the 64-bit wrapper types outperform num-bigint by a factor of roughly 10x, and the 128-bit wrapper types outperform num-bigint by a factor of roughly 3x.

For big numbers, the overhead of the wrapper types is generally around a factor of 2.

There are no Rc-based wrapper types because benchmarking has shown that they don't provide any meaningful performance benefits over the Arc-based wrapper types. In the case of small numbers, no reference counting is done, and in the case of big numbers, the overhead of reference counting is negligible compared to the overhead of performing arithmetic on big numbers.

Benchmark Results

The Pi benchmark clearly shows the relative performance of of the different implementations using a somewhat realistic workload. The chart below shows the time taken to compute different numbers of digits of pi. The line labeled Control shows the result of using BigInt directly, and the line labeled Identity shows an trivial encoding that always encodes its value as a BigInt; it is meant to gague the overhead of using an encoding when no optimizations apply for small numbers.

The chart shows that ArcInt64 is clearly the fastest for a workload involving mostly small integers (less than 63 bits), but it quickly ceases to be helpful once large integers are in play. The next fastest implementations. The 128-bit encodings show performance improvements for calculating up to 30-40 digits of pi, with CowInt128 and OverflowingI128 being the clear performance winners over ArcInt128. This is to be expected, as the benchmark does little that would take advantage of sharing BigInt values between encoding instances. The lack of sharing also explains why CowInt128 shows no advantage over OverflowingI128.

The performance gap between all implementations starts to close as the number of digits approaches 1000, showing that in this case, the small integer optimizations are not helpful, but the cost of performing arithmetic on increasingly large integers begins to dwarf the overhead of the useless optimizations.

Performance comparison: time

An alternate way to look at the same performance data is to look at throughput, measured as the number of digits of pi computed per second. Aside for perhaps being slighly easier to read, this chart shows that every type, including BigInt itself, has a "sweet spot" for throughput.

Performance comparison: throughput

Additions to the BigInt and BigUint APIs

The wrapper types provide a few methods that are not present on BigInt and BigUint. The into_static method, useful only for Cow-based types, uses cloning if necessary to coerce a value into having a 'static lifetime. The reencode_from method allows a value to be re-encoded from one encoding type to another, which is useful when mixing encodings (though I don't necessarily recommend mixing encodings in a single codebase).

Testing

This crate include a thorough set of unit tests, with a large number of property-based tests using the quickcheck crate. The property-based tests are designed to test the consistency of the wrapper types with BigInt and BigUint. They include tests for all overloaded operators and other methods implemented through traits.

The tests assume the num-bigint types are themselves correct, so they are not designed to catch bugs in num-bigint's implementation. Instead, they are designed to catch bugs in the wrapper types' implementation, and to ensure that the wrapper types are consistent with num-bigint's types.

Implementation Notes

The implementation of this crate is in layers. The lowest layer is the BigNumber trait, which exposes the APIs of BigInt and BigUint through a trait that includes all their overloaded operators in a single trait.

The next layer is the Encoding trait, which defines how to encode small values and big values, and how to convert between them. This layer is where the different encoding strategies are implemented.

The top layer is the wrapper types, which are the types that users of this crate will interact with. They are implemented in terms of the Encoding trait, and they provide roughly the same APIs as BigInt and BigUint, with its same trait implementations, but with different performance characteristics depending on the encoding used.

Overloaded operators, implemented in the num_ops module, have their own set of layers. The top level is a set of traits for different categories of operators: arithmetic, bitwise, shift, and the power operator. These trait provide the logic to dispatch different argument types to the most specific overloads provided by BigInt and BigUint. These operators come in many combinations of argument types based which primitive type they accept and whether the arguments are consumed or borrowed.

In the next layer, the operator traits are implemented for specific operators, and finally, the operators themselves are implemented on the wrapper types by delegating to the operator trait implementations.

This crate makes heavy use of macros to avoid code duplication. The duplicate crate used heavily, and the paste crate is indespensible for building method names systematically.

About

Wrapper around `num-bigint` to provide better performance for small integers

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors