Stateless random number generation for parallel and GPU workloads. Zero dependencies.
PhiloxRNG.jl implements the Philox4x32 counter-based RNG as pure, inlineable functions with no global state. Each call maps a (counter, key) pair directly to random output — making it trivially parallel across threads, tasks, or GPU lanes. Includes built-in uniform and normal distributions.
While the raw integer outputs of `philox4x32_10` are identical on all devices, the floating-point distribution outputs may differ slightly between devices due to fast-math approximations.
Ported from the C++ Random123 library.

```julia
using Pkg
Pkg.add("PhiloxRNG")
```

Every function takes three `UInt64` arguments: `(ctr0, ctr1, key)`.

- `ctr0`, `ctr1`: a 128-bit counter. Each unique `(ctr0, ctr1)` pair produces independent output.
- `key`: a seed. Different keys give independent streams.

```julia
using PhiloxRNG

key = UInt64(42)

# 4 normally distributed Float32 values
randn_f32(UInt64(0), UInt64(0), key)

# 2 normally distributed Float64 values
randn_f64(UInt64(0), UInt64(0), key)

# 4 uniform Float32 values in (0, 1]
randu01_f32(UInt64(0), UInt64(0), key)
```

All take `(ctr0::UInt64, ctr1::UInt64, key::UInt64)`:

| Function | Returns |
|---|---|
| `randn_f32` | `NTuple{4, Float32}`, normal |
| `randn_f64` | `NTuple{2, Float64}`, normal |
| `randu01_f32` | `NTuple{4, Float32}`, uniform (0, 1] |
| `randu01_f64` | `NTuple{2, Float64}`, uniform (0, 1] |
| `randuneg11_f32` | `NTuple{4, Float32}`, uniform [-1, 1] |
| `randuneg11_f64` | `NTuple{2, Float64}`, uniform [-1, 1] |
| `philox4x32_10` | `NTuple{4, UInt32}`, raw RNG output |

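
Because the output depends only on `(ctr0, ctr1, key)`, an array can be filled in parallel by deriving the counter from the element position. The following is a minimal sketch of that idea, not code from the package: `fill_randn!` is a hypothetical helper, and using the block index as `ctr0` with `ctr1` fixed at zero is just one possible counter layout.

```julia
using PhiloxRNG

# Hypothetical helper (not part of PhiloxRNG): fill `out` with normal Float32
# samples. Each block of 4 consecutive elements uses its own counter value.
function fill_randn!(out::Vector{Float32}, key::UInt64)
    nblocks = cld(length(out), 4)
    Threads.@threads for b in 1:nblocks
        # Assumed layout: zero-based block index in ctr0, ctr1 fixed at 0.
        vals = randn_f32(UInt64(b - 1), UInt64(0), key)
        base = 4 * (b - 1)
        for j in 1:4
            i = base + j
            if i <= length(out)      # the last block may be partial
                out[i] = vals[j]
            end
        end
    end
    return out
end

a = fill_randn!(Vector{Float32}(undef, 10), UInt64(42))
b = fill_randn!(Vector{Float32}(undef, 10), UInt64(42))
```

No state is shared between iterations, so `a` and `b` should hold the same values regardless of how many threads Julia was started with or how the loop is scheduled.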
Lower-level public helpers (access via `PhiloxRNG.u01`, etc.):

| Function | Description |
|---|---|
| `u01(F, x::Unsigned)` | Convert unsigned int to `F` in (0, 1] |
| `uneg11(F, x::Unsigned)` | Convert unsigned int to `F` in [-1, 1] |
| `boxmuller(F, u1, u2)` | Box-Muller transform: 2 unsigned ints to 2 normal floats |

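
The helpers can be combined with the raw generator when a custom mix of outputs is needed from a single counter. The sketch below is based only on the signatures listed above. It assumes that the `Float32` variants accept the `UInt32` words returned by `philox4x32_10` and that `boxmuller` returns its two results as an iterable pair, and it calls everything module-qualified so it does not depend on what is exported.

```julia
using PhiloxRNG

key = UInt64(42)

# Four raw 32-bit words for counter (0, 0)
r = PhiloxRNG.philox4x32_10(UInt64(0), UInt64(0), key)

# One raw word to a Float32 in (0, 1]
u = PhiloxRNG.u01(Float32, r[1])

# One raw word to a Float32 in [-1, 1]
s = PhiloxRNG.uneg11(Float32, r[2])

# Two raw words to two normal Float32 values
# (assumption: the result destructures into two values)
z1, z2 = PhiloxRNG.boxmuller(Float32, r[3], r[4])
```

This draws one uniform in (0, 1], one uniform in [-1, 1], and two normals from the same four raw words.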
Julia 1.12.5, AMD Ryzen 7 9800X3D, NVIDIA GeForce RTX 3080.

| Function | PhiloxRNG | Random stdlib |
|---|---|---|
| rand F32 | 0.679 | 0.528 |
| rand F64 | 1.371 | 1.074 |
| randn F32 | 0.898 | 2.103 |
| randn F64 | 2.009 | 1.801 |

| Function | PhiloxRNG | CUDA.jl |
|---|---|---|
| rand F32 | 0.006 | 0.006 |
| randn F32 | 0.007 | 0.032 |

See benchmarks/ for the full benchmark scripts.
- Random123.jl provides an `AbstractRNG` interface for multiple counter-based RNG families (Philox, Threefry, ARS, AESNI). Use it when you need a drop-in replacement for Julia's standard `rand(rng, ...)` API.
- PhiloxRNG.jl exposes bare functions with no `AbstractRNG` wrapper, no dependencies, and built-in fast distributions. Use it in GPU kernels or hot loops where the function-call interface is a better fit than a mutable RNG object.

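
To illustrate the function-call style in a hot loop, here is a small Monte Carlo sketch that estimates π without creating any RNG object. `estimate_pi` is a hypothetical name used only for this example, and using the loop index as the counter is an assumption about how you might assign counters, not a package requirement.

```julia
using PhiloxRNG

# Hypothetical example function: estimate pi from `n` counter values.
# Each counter yields 4 uniforms in (0, 1], i.e. 2 points in the unit square.
function estimate_pi(n::Integer, key::UInt64)
    hits = 0
    for i in 0:n-1
        x1, y1, x2, y2 = randu01_f32(UInt64(i), UInt64(0), key)
        # Count points that fall inside the quarter circle of radius 1
        hits += (x1 * x1 + y1 * y1 <= 1f0)
        hits += (x2 * x2 + y2 * y2 <= 1f0)
    end
    return 4 * hits / (2 * n)
end

estimate_pi(100_000, UInt64(42))
```

The loop body carries no RNG state from one iteration to the next, so the same pattern could be split across threads or GPU lanes by giving each worker a disjoint range of counters.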
- Salmon, J. K., Moraes, M. A., Dror, R. O., & Shaw, D. E. (2011). Parallel random numbers. Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, 1–12. https://doi.org/10.1145/2063384.2063405