Skip to content

Transcendental functions for Saloon#62

Merged
sigilante merged 20 commits into
mainfrom
sigilante/chebyshev-transcendentals
Jun 11, 2026
Merged

Transcendental functions for Saloon#62
sigilante merged 20 commits into
mainfrom
sigilante/chebyshev-transcendentals

Conversation

@sigilante

Copy link
Copy Markdown
Collaborator

Implements the plan from #18 (which this supersedes): replace the naive
Taylor/AGM transcendentals in /lib/math with range-reduced minimax
algorithms that are faithful (≤1 ULP) and bit-exact across both Vere
variants
— vere64 (~tex) and vere32 (~hex) — so they can be jetted
bit-for-bit against SoftFloat.

All internals are forced to round-nearest-even (a correctly-rounded
transcendental has no rounding-mode axis; the future SoftFloat C jet will
match this). Reference/oracle: libmath/tools/cheb_check.py — a strict
f64/f32 implementation of each algorithm (bit-identical to SoftFloat for the
primitives used) with mpmath as independent truth for the ULP bound.

Done (both @rs and @rd, swapped into math.hoon, tests green on both variants)

  • exp — Cody-Waite reduction + minimax poly; full range guards (NaN, ±inf,
    overflow→inf, underflow→0, correctly-rounded subnormals).
  • log2^e·m reduction + atanh series (fdlibm f - s·(f-R) form);
    guards incl. log(±0)=-inf, log(x<0)=NaN, subnormal inputs.
  • sin/cosq·π/2 reduction + fdlibm kernels with two-part reduced
    argument; @rd 2-part π/2, @rs 3-part π/2.
  • atan — fdlibm breakpoint reduction + minimax; atan2 derives from it.
  • sqrt@rs delegates to the SoftFloat f32 root; @rd = stdlib seed + one
    Markstein FMA correction → correctly-rounded (matches the jet).
  • cbrt — now sign·exp(log|x|/3), defined for all reals (was pow x .333
    and crashed on negatives).
  • log-2 / log-10 / pow / pow-n — derive from exp/log, auto-improved + locked.

Test suites: math-exp (34), math-log (26), math-trig (33), math-sqrt
(18), math-derived (10), math-atan (22).

Remaining (in progress)

  • asin/acos — reference proven faithful in cheb_check.py (all four ≤1 ULP);
    Hoon swap next.
  • tan — currently the sin/cos ratio (~1 ULP); a dedicated kernel later.
  • rh/rq doors still naive (Phase 2), then the batched C-jet pass.

Plan/design discussion: #18.

🤖 Generated with Claude Code

sigilante and others added 6 commits June 9, 2026 18:50
Replace the naive Taylor exp in the @rs and @rd math doors with a
Cody-Waite range reduction (x = k*ln2 + r) plus a minimax polynomial
(degree 6 for @rs, 11 for @rd), faithful to <=1 ULP.  Internals are
forced to round-nearest-even -- a correctly-rounded transcendental has
no rounding-mode axis, and this is what the forthcoming SoftFloat C jet
will match bit-for-bit (the goal of PR #18).

+scale2 is a correctly-rounded ldexp (2^k reconstruction) that is exact
across the normal range and rounds exactly once into the overflow and
subnormal tails.  Full range guards: NaN, +-inf, overflow->inf,
underflow->0, plus correctly-rounded subnormal results.

The reference/oracle is libmath/tools/cheb_check.py: a strict-f64/f32
Python implementation of the identical algorithm, bit-identical to
SoftFloat for the primitives used (+ - * /, round-to-int, ldexp), with
mpmath as independent truth for coefficient design and the ULP bound.

Verified bit-exact on both Vere variants -- vere64 (~tex) and vere32
(~hex) -- across the normal range and every tail; @rd shows no
cross-variant divergence.  New suite: tests/lib/math-exp (34 cases).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the naive (unreduced) Taylor log in the @rs and @rd math doors
with x = 2^e * m reduction (m in [sqrt(1/2), sqrt(2))) plus the atanh
series in the fdlibm-style form log(1+f) = f - s*(f - 2z*P2(z)), which
keeps f as an exact leading term.  Degree 4 (@rs) / 9 (@rd), faithful to
<=1 ULP.  Round-nearest-even internally to match the future SoftFloat
jet bit-for-bit (PR #18).

Adds the guards the old log lacked entirely: NaN, +inf, log(+-0)->-inf,
log(x<0)->NaN, and subnormal inputs (normalised by 2^p before the
bit-extraction reduction).

Reference/oracle extended in libmath/tools/cheb_check.py (log @rd/@rs).
Verified bit-exact on both Vere variants -- vere64 (~tex) and vere32
(~hex) -- over the normal range, both tails, and subnormal inputs.
New suite: tests/lib/math-log (26 cases).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the naive mod-tau Taylor sin/cos in the @rs and @rd doors with a
q*(pi/2) argument reduction + fdlibm __kernel_sin/__kernel_cos, selected
by q&3 with sign symmetry.  The reduced argument is carried as a two-part
(rhi+rlo) value so the kernel's y-correction holds accuracy near zeros;
@rd uses a 2-part pi/2, @rs a 3-part pi/2 (f32 needs the extra bits).

Faithful to <=1 ULP for |x| <~ 500 (@rs) / 2^22 (@rd); guards NaN,
+-inf -> NaN, and +-0 -> +-0 (sign preserved).  sin/cos are thin
wrappers over a shared per-door +rs-trig / +rd-trig engine; +tan keeps
deriving as sin/cos and is now accurate via the new kernels.

Round-nearest-even internally to match the future SoftFloat jet.
Oracle extended in cheb_check.py (sin/cos @rd/@rs).  Verified bit-exact
on vere64 (~tex) and vere32 (~hex).  New suite: tests/lib/math-trig (33).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
sqrt:
  - @rs delegates to the stdlib (SoftFloat) f32 root, which is correctly
    rounded.
  - @rd: the stdlib f64 root is only faithful (up to 1 ULP off, e.g.
    sqrt 2 -> ...bcc not ...bcd), so seed with it and apply one Markstein
    FMA correction (r = x - g*g via fma; g + (0.5/g)*r).  Verified
    correctly-rounded over 600k inputs incl. +-1-ULP seeds -- so it
    matches the SoftFloat sqrt jet bit-for-bit (correct rounding is
    unique).  Full guards: NaN, +inf, +-0, x<0 -> NaN.  (Old code was a
    Newton iteration that stopped at the door rtol ~ 1e-5.)

cbrt: was `pow x .0.3333` and asserted x>=0 (crashed on negatives).  Now
  sign(x) * exp(log|x| / 3) using the accurate exp/log, defined for all
  reals, with NaN / +-0 guards.

log-2 (= log/ln2), log-10 (= log/ln10), pow (= exp(n*log x)), pow-n all
already derived from exp/log, so they improved automatically; add
tests/lib/math-derived to lock their (deterministic, jet-reproducible)
outputs.  New suites: math-sqrt (18), math-derived (10).  Bit-exact on
vere64 (~tex) and vere32 (~hex); eig/complex regressions green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the slow AGM-iteration atan in the @rs and @rd doors with the
fdlibm s_atan algorithm: reduce |x| against the breakpoints 7/16, 11/16,
19/16, 39/16 to a small argument near atan(.5)/atan(1)/atan(1.5)/
atan(inf), then a minimax polynomial (degree 4 @rs / 10 @rd) with odd
symmetry.  Faithful to <=1 ULP; guards NaN, +-inf -> +-pi/2, +-0 -> +-0.
No huge-x threshold needed -- the id=3 (xr = -1/x) path handles
arbitrarily large x (-> +-pi/2) on its own.

atan is a thin wrapper over a shared per-door +rs-atan / +rd-atan |%
engine (coeffs + reduction + kernel).  atan2 already derives from atan
via quadrant logic, so it improved automatically.

Round-nearest-even internally to match the future SoftFloat jet.  Oracle
extended in cheb_check.py (atan @rd/@rs).  Verified bit-exact on vere64
(~tex) and vere32 (~hex).  New suite: tests/lib/math-atan (22).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add asin/acos to libmath/tools/cheb_check.py, all faithful to <=1 ULP
over [-1,1]: asin @rd 0.84, acos @rd 0.89, asin @rs 0.85, acos @rs 0.98.
fdlibm rational P/Q kernel (shared by asin/acos); the [0.5,1) range uses
a sqrt head/tail split to recover low bits (needed at @rs too, not just
@rd, to stay under 1 ULP).  Small-x acos shortcut uses a 2^-26/2^-57
threshold (a too-large threshold returns pi/2 without subtracting x).
Hoon swap into math.hoon is the next step.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
sigilante and others added 3 commits June 10, 2026 09:30
Replace the atan(x/sqrt(1-x^2)) asin/acos (inaccurate near +-1, and
crashed out of domain) with the fdlibm rational P/Q kernel shared by
both: |x|<0.5 uses x + x*R(x^2); [0.5,1) reduces via t=(1-|x|)/2,
s=sqrt(t), with a sqrt head/tail split to recover low bits (needed at
@rs too).  acos recombines per-branch (small-x, x<=-0.5, x>=0.5).
Faithful to <=1 ULP; |x|>1 -> NaN (no crash).  The @rd kernel uses the
door's now-correctly-rounded +sqt; @rs uses the SoftFloat f32 root.

asin/acos are thin wrappers over a shared per-door +rs-ainv / +rd-ainv
|% engine.  Round-nearest-even internally to match the future jet.
Oracle in cheb_check.py.  Bit-exact on vere64 (~tex) and vere32 (~hex).
New suite: tests/lib/math-ainv (26).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The constants were already correctly-rounded f64/f32 ln2/ln10, but
log(x)/ln divides the whole result (losing the exactness of the integer
exponent): log2 was 1.68 ULP, log10 2.39.  Reuse the log reduction via a
new per-door +lr helper returning [e (as float), log(mantissa)], then
combine as e + log(m)/ln2 (log2) and e*log10(2) + log(m)/ln10 (log10),
so the integer part carries no division rounding.  Worst case drops to
1.36 (log2) / 1.61 (log10) ULP; log2(8) and log10(1000) at @rd are now
exact.  Guards: NaN, +inf, log(±0)=-inf, log(x<0)=NaN.  Bit-exact on
both variants; math-derived locks updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@rd tan now uses fdlibm __kernel_tan over the q*pi/2 reduction (odd q
takes the -cot path with a head/tail 1/(x+r)): 0.76 ULP, vs ~1.9 for the
sin/cos ratio.  Thin wrapper over a new +rd-tan |% engine (redq + ktan).
Guards: NaN, +-inf -> NaN, +-0 -> +-0.

@rs tan stays as sin/cos (~1.2 ULP): a pure-f32 tan kernel is far worse
near the poles (~9 ULP, since the cot reciprocal needs more than f32) --
FreeBSD's k_tanf computes in double for the same reason.

Oracle extended in cheb_check.py (tan @rd/@rs).  Bit-exact on vere64
(~tex) and vere32 (~hex).  New suite: tests/lib/math-tan (11).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@sigilante sigilante changed the title Faithful, jet-ready transcendentals for @rs/@rd (implements #18) Transcendental functions for Saloon Jun 10, 2026
sigilante and others added 11 commits June 10, 2026 10:50
f16's 10-bit significand is ~2^13 coarser than f32, so computing a
transcendental in the (faithful) @rs door and rounding the result to f16
is correctly-rounded for f16 -- any <=few-ULP(f32) error is far below an
f16 ULP.  This reuses every @rs kernel instead of re-deriving 15 f16
algorithms.

Adds two @rh-door helpers: +widen-hs (f16->f32, exact; subnormals via
m*2^-24) and +narrow-sh (f32->f16, correctly-rounded round-nearest-even,
with carry, overflow->inf, and subnormal handling).  stdlib bit:rh
truncates rather than rounds, so it cannot be used.  Each @rh
transcendental (exp, log, log-2, log-10, sin, cos, tan, asin, acos, atan,
atan2, pow, pow-n, sqt, cbt) is now `narrow-sh (rs-fn (widen-hs x))`.

Conversions verified against numpy float16 (28 cases); results verified
correctly-rounded f16 vs numpy/mpmath (math-rh, 20 cases).  Bit-exact on
vere64 (~tex) and vere32 (~hex).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add the @rq foundation to cheb_check.py: numpy has no true IEEE quad
(its float128 is x87 80-bit ext), so model f128 with mpmath rounded to
113-bit RNE per op (= SoftFloat f128 for + - * / ldexp), plus 128-bit
IEEE-quad encoding (qhex).  exp @rq vertical slice: Cody-Waite + degree
-24 minimax, 1.05 ULP over [-20,20] (the f128 analog of @rd's 0.99 --
Horner evaluation accumulation, essentially faithful).

@rq SoftFloat ops confirmed jetted on-ship.  The native rq Hoon (all
functions, 128-bit constants) is the next step; rq must be native since
f128 is the highest precision available (no higher door to round from,
unlike @rh which routes through @rs).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per the plan to base the f128 work on the arithmetic the jets actually
use, add libmath/tools/rq_check.c: the @rq transcendentals implemented in
Berkeley SoftFloat (the f128M pointer API vendored in vere/ext/softblas)
with MPFR as the correctly-rounded truth.  This C code IS the jet body
(modulo the u3 ABI), and its output is the algorithm-of-record the Hoon
@rq must match bit-for-bit.

exp @rq vertical slice: Cody-Waite + degree-24 minimax, 0.998 ULP over
[-20,20] vs MPFR; output matches the mpmath f128 model (cheb_check.py
exp-rq) bit-for-bit, cross-validating both.  numpy has no true IEEE quad
(its float128 is x87 80-bit ext), so SoftFloat + MPFR is the only sound
f128 reference.  Build recipe in the file header.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
rq_check.c now covers exp (0.998 ULP) and log (0.813 ULP) over the test
range, both in SoftFloat f128 (jet basis) verified vs MPFR truth.  log
uses the same 2^e*m + atanh (fdlibm f - s*(f-R)) reduction as @rd at
degree 22.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
rq_check.c now covers exp 0.998, log 0.813, sin 0.732, cos 0.709, sqrt
0.500 ULP -- all in SoftFloat f128 (jet basis), verified vs MPFR truth.
sin/cos use the @rd q*pi/2 reduction + fdlibm kernels (two-part reduced
arg) at degree ~16; sqrt is SoftFloat f128_sqrt directly (correctly
rounded).  Remaining for the C reference: tan, atan, asin, acos.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the naive Taylor exp in the @rq door with the faithful native
f128 algorithm (Cody-Waite reduction + degree-24 minimax, bias 16383,
112-bit mantissa, S=112 subnormal scale), ported from @rd.  Round
-nearest-even internally; bit-exact against the SoftFloat-f128 reference
tools/rq_check.c (the jet basis) on vere64 (~tex) and vere32 (~hex).
New suite tests/lib/math-rq (9 exp cases incl. NaN/+-inf/+-0).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Port the @rd log to @rq: 2^e*m reduction + atanh series (fdlibm
f - s*(f-R)) at degree 22, f128 (bias 16383, 112-bit mantissa, subnormal
pre-scale 2^120).  Guards: NaN, +inf, log(+-0)=-inf, log(x<0)=NaN.
Bit-exact vs tools/rq_check.c on both variants.  math-rq +4 log cases.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
sqrt: stdlib f128 seed + one Markstein FMA -> correctly-rounded (matches
SoftFloat f128_sqrt).  sin/cos: q*pi/2 reduction + fdlibm kernels over a
shared +rq-trig engine (two-part reduced arg), ported from @rd at f128.
All bit-exact vs tools/rq_check.c on vere64 (~tex) and vere32 (~hex).
math-rq now 21 cases (exp/log/sin/cos/sqrt).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Port the @rd derived transcendentals to @rq (f128): +lr (log reduction
-> [e, log m]), log-2 = e + log(m)/ln2, log-10 = e*log10(2) + log(m)/ln10,
pow = exp(n*log x) (integer fast path via pow-n), pow-n = repeated mul,
cbrt = sign*exp(log|x|/3).  All bit-exact on vere32 (~hex) against the
f128 model; math-rq now 30 cases.

NOTE: validated on vere32 only -- vere64's f128 SUBTRACTION jet is broken
(mis-dispatches to a 256-bit complex @cq path: 1.0-0.877 -> [0.875|0.125],
254-bit), so any @rq fn using sub diverges there until that runtime jet is
fixed.  exp/sqrt (no cancellation-sensitive sub) and pow-n match on both.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
atan: fdlibm breakpoint reduction + degree-30 minimax (f128), 0.725 ULP;
thin wrapper over a +rq-atan engine, ported from @rd.  atan2 derives via
quadrant logic with f128 pi.  Bit-exact on vere32; math-rq +6 atan cases.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
asin/acos: fdlibm rational-form kernel (degree-30 poly R, shared) + sqrt
head/tail split on [0.5,1); f128.  asin 0.959, acos 0.820 ULP.  Replaces
the old via-atan asin/acos (which crashed out of domain and hit ~10 ULP
for acos via pi/2-asin).  Over a shared +rq-ainv engine, ported from @rd.

This completes the @rq transcendental door: exp, log, log-2, log-10,
sin, cos, tan (sin/cos ratio ~2 ULP, as @rs), asin, acos, atan, atan2,
sqrt, cbrt, pow, pow-n -- all bit-exact on vere32 against the SoftFloat
f128 reference (math-rq, 45 cases).  The full @rs/@rd/@rh/@rq
transcendental surface is now faithful in Hoon.

NOTE: vere64 still diverges on @rq fns using f128 sub (its sub jet
mis-dispatches to a 256-bit complex path) -- a runtime bug to fix next.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@sigilante

Copy link
Copy Markdown
Collaborator Author

vere64 @rq sub jet bug — fixed & verified (runtime side)

The one remaining cross-variant blocker noted in the Chebyshev jetting plan — ~(sub rq …) mis-evaluating on the 64-bit build — is resolved. Root cause was not a @cq mis-dispatch (the earlier hypothesis); it was a one-token typo in the f128 jet glue.

Root cause: pkg/noun/jets/e/rq.c u3qeq_sub wrote u3i_words(4, e.c) while every other f128 op uses the width constant n. On the 64-bit build c3_w is uint64_t, so a float128 is n=2 words; writing 4 emitted a ~256-bit atom whose high half is stack garbage read past the alloca(16) buffer — the observed 254-bit [operand | difference] result. On 32-bit builds n=4, so the literal was accidentally correct, which is why only vere64 (~tex) broke while vere32 (~hex) + the SoftFloat reference always agreed.

Verification (fresh ship on the rebuilt 64-bit binary):

query old binary new binary
(met 0 (sub:rq .~~~1.0 .~~~0.877)) 254 126
value [0.877 | 0.123] (double-width) .~~~0.123…

add/mul were always 126 bits; sub now matches. Every @rq function that routes through sub (e.g. cos) is therefore correct on the 64-bit build.

Runtime commits (urbit/vere):

  • urbit/vere#1022 (neal/lagoon-softblas-64ml/64): the fix — 0ab2cd99ac.
  • urbit/vere#1021 (neal/lagoon-jet-fixesdevelop): parity normalization introducing the n constant on the 32-bit jet — 22b09f21a0.

Tracking branch for this jet work: sigilante/transcendental-jets.

@sigilante sigilante merged commit 043a437 into main Jun 11, 2026
@sigilante sigilante deleted the sigilante/chebyshev-transcendentals branch June 11, 2026 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant