Add convergence criterion, full-rank and IAF approximations, ADVI stability notebooks by christiaanjs · Pull Request #108 · christiaanjs/treeflow

christiaanjs · 2026-06-16T09:30:39Z

Summary

RelativeLossNotDecreasing convergence criterion — tracks an EWMA of the per-step ELBO decrease normalised by |ELBO|, with a min_consecutive threshold to avoid spurious early stops from single-step dips in the convergence rate
Full-rank variational approximation (treeflow/model/approximation/full_rank.py) — multivariate Normal with full lower-triangular covariance in joint unconstrained space (~6k parameters for YFV vs 154 for mean-field); achieves ~2700–3700 nat ELBO improvement over mean-field on YFV
IAF approximation improvements (iaf.py) — trainable affine base (loc_var, log_scale_var) so the IAF starts as a mean-field at network init; DeferredTensor fix so log_scale_var receives gradients; small kernel init (stddev=0.01) to prevent NaN warm-up losses from extreme initial samples; surrogate warm-up against a fitted mean-field target before ELBO optimisation
advi_stability.ipynb — experiment notebook covering mean-field stability across seeds, full-rank comparison, and posterior geometry analysis of the clock rate × root height non-identifiability
advi_iaf.ipynb — self-contained IAF experiment notebook with MF/FR reference runs and the surrogate warm-up strategy; IAF achieves ~−6010 ELBO vs full-rank ~−6110
Tests — 20 tests for RelativeLossNotDecreasing; phylo likelihood test parametrized over all three unroll modes (unrolled, tensorarray, while_loop)
vi/util.py — VIResults namedtuple and default_vi_trace_fn to capture convergence criterion state in traces

Test plan

pytest test/vi/test_relative_loss_not_decreasing.py — all 20 convergence criterion tests pass
pytest test/traversal/test_phylo_likelihood.py — parametrized over unroll modes
Run advi_stability.ipynb end-to-end (MF + FR cells)
Run advi_iaf.ipynb end-to-end (verifies warm-up + ELBO convergence)

🤖 Generated with Claude Code

Implements a custom TFP ConvergenceCriterion that tracks an EWMA of the per-step loss decrease normalised by |ELBO|, making the threshold invariant to dataset scale and starting conditions. A min_consecutive parameter (default 10) requires the condition to hold for N consecutive steps, preventing spurious early stopping from transient single-step dips in rel_rate.

Also adds:
- VIResults.convergence_criterion_state field (backward-compat default None) so criterion state (ewma, rel_rate, consecutive_below) is available in traces
- 20 unit tests covering EWMA update, NaN handling, and convergence logic
- experiments/advi_stability.ipynb: 5-run YFV ADVI stability study; conclusion notes that clock_rate / root_height variation reflects the mean-field approximation's inability to capture the clock-rate × root-height ridge

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
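For readers who want the stopping rule in isolation, here is a minimal pure-Python sketch of the logic described in this commit message. It is not the TFP ConvergenceCriterion subclass from this PR: the class name, the `threshold` and `decay` values, the exact EWMA update, and the NaN handling are all assumptions made for illustration. Only `min_consecutive=10` and the state names (ewma, rel_rate, consecutive_below) come from the text above.

```python
import math


class RelativeLossNotDecreasingSketch:
    """Illustrative stand-in; threshold and decay defaults are assumed, not the PR's."""

    def __init__(self, threshold=1e-4, decay=0.9, min_consecutive=10):
        self.threshold = threshold
        self.decay = decay
        self.min_consecutive = min_consecutive
        self.prev_loss = None
        self.ewma = 0.0            # EWMA of the per-step loss decrease
        self.rel_rate = math.inf   # ewma normalised by |loss|
        self.consecutive_below = 0

    def update(self, loss):
        """Feed one loss value; return True once the stopping condition holds."""
        if math.isnan(loss):
            # Assumed handling: a NaN step resets the streak and never converges.
            self.consecutive_below = 0
            return False
        if self.prev_loss is not None:
            decrease = self.prev_loss - loss
            self.ewma = self.decay * self.ewma + (1.0 - self.decay) * decrease
            self.rel_rate = self.ewma / max(abs(loss), 1e-12)
            if self.rel_rate < self.threshold:
                self.consecutive_below += 1
            else:
                self.consecutive_below = 0  # a single step above resets the streak
        self.prev_loss = loss
        return self.consecutive_below >= self.min_consecutive


def first_stop_step(scale, num_steps=1000):
    """Run the sketch on a synthetic, exponentially flattening loss curve."""
    criterion = RelativeLossNotDecreasingSketch()
    for step in range(num_steps):
        loss = scale * (6000.0 + 3000.0 * math.exp(-0.05 * step))
        if criterion.update(loss):
            return step
    return None
```

Because both the EWMA and the normaliser scale with the loss, `first_stop_step(1.0)` and `first_stop_step(1024.0)` stop at the same step, which is the scale invariance the commit message refers to.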

Adds cells to run NUM_IAF_RUNS independent IAF fits alongside the existing mean-field runs, then compares:
- Within-IAF stability (inter_run/post_sd table)
- Loss traces: mean-field (dashed) vs IAF (solid) on shared axes
- Pooled posterior marginals: MF (blue) vs IAF (orange) histograms
- Side-by-side MF vs IAF summary table

Conclusion has a placeholder IAF findings section with the key questions to fill in once the cells have been run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements get_fixed_topology_full_rank_approximation: a multivariate Normal in the joint unconstrained parameter space, capable of capturing correlations (e.g. the clock rate x root height ridge that degrades mean-field stability).

Key design:
- _FullRankAffineBijector stores loc (D,) and raw_scale (D,D) as tf.Variables
- lower-triangular extraction via band_part, diagonal positivity via softplus
- Composed with the existing split/restructure/event-space bijector chain (same pattern as the IAF approximation)
- Initialised to loc=prior-medians (unconstrained), scale_tril ≈ identity

Updates advi_stability.ipynb to run 3 full-rank fits alongside the 5 mean-field runs, with loss-trace comparison, within-FR stability summary, and pooled MF vs FR posterior marginal histograms.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
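The scale construction described above can be sketched without TensorFlow. The snippet below is a plain-Python illustration, not the _FullRankAffineBijector code: the helper names are hypothetical, and D = 77 is an inference from the "154 for mean-field" figure in the summary, assuming one loc and one scale per dimension.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def scale_tril_from_raw(raw_scale):
    """Keep the lower triangle of a D x D matrix and force a positive diagonal."""
    d = len(raw_scale)
    tril = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            value = raw_scale[i][j]
            tril[i][j] = softplus(value) if i == j else value
    return tril


def affine_forward(loc, scale_tril, z):
    """x = loc + L z: maps standard-normal noise z into the unconstrained space."""
    return [
        loc[i] + sum(scale_tril[i][j] * z[j] for j in range(i + 1))
        for i in range(len(loc))
    ]


def parameter_counts(d):
    mean_field = 2 * d      # assumed: one loc and one scale per dimension
    full_rank = d + d * d   # loc (D,) plus the full raw_scale (D, D) variable
    return mean_field, full_rank


# Identity-like initialisation: softplus(log(e - 1)) == 1 on the diagonal.
identity_raw_diag = math.log(math.expm1(1.0))

print(parameter_counts(77))  # (154, 6006)
```

Under that assumption the count lines up with the "~6k parameters" quoted in the summary, since the whole D x D raw_scale variable is stored even though only its lower triangle is used.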

Full-rank ELBO ~2700-3700 nats better than mean-field (FR: -6099 to -6123 vs MF: -8845 to -9790), confirming mean-field loses significant posterior mass on the strict-clock model. Root height posterior std expands 2.5x (86 -> 213) and pop_size 1.4x wider under full-rank, reflecting correct representation of the clock rate x root height ridge. Inter-run stability for clock_rate is similar between approximations (0.60 vs 0.62), showing the degeneracy is inherent in the posterior geometry, not an artifact of the approximation family. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Quantify the clock-rate × root-height ridge: Pearson corr = -0.48, log-log slope = -0.75 (partial non-identifiability), quadratic coeff = 0.08
- Test IAF as a more flexible approximation family: all three IAF runs fail (run 1 stalls at ELBO ≈ -14700; runs 2-3 crash with NaN gradients), showing optimisation difficulty is the limiting factor, not expressiveness
- Update conclusion with geometry diagnostics and IAF findings
- Refresh native_vs_tf_vi_validation.ipynb outputs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
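As a rough illustration of how ridge diagnostics of this kind can be computed from posterior samples, the sketch below uses synthetic draws, not the YFV results. The sample generator, the choice to regress log clock rate on log root height, and all function names are assumptions. A log-log slope of exactly -1 would mean clock rate × root height is constant along the ridge.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def synthetic_ridge_samples(n, slope, seed=0):
    """Fake 'posterior' draws with log(rate) = slope * log(height) + noise."""
    rng = random.Random(seed)
    log_height = [rng.gauss(0.0, 0.3) for _ in range(n)]
    log_rate = [slope * h + rng.gauss(0.0, 0.1) for h in log_height]
    return [math.exp(h) for h in log_height], [math.exp(r) for r in log_rate]


root_height, clock_rate = synthetic_ridge_samples(5000, slope=-0.75)
corr = pearson(root_height, clock_rate)
loglog_slope = ols_slope(
    [math.log(h) for h in root_height],
    [math.log(r) for r in clock_rate],
)
```

Here `loglog_slope` recovers a value close to the -0.75 used to generate the samples, and `corr` is negative.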

- Move IAF cells from advi_stability.ipynb to new advi_iaf.ipynb, which includes self-contained MF/FR reference runs for comparison
- advi_stability.ipynb now covers mean-field and full-rank only
- iaf.py: add trainable affine base (loc_var + log_scale_var) so the IAF is Normal(init_loc, 1) at network init rather than a random scramble
- iaf.py: use DeferredTensor for softplus(log_scale_var) so gradients flow back to log_scale_var during training
- iaf.py: use TruncatedNormal(stddev=0.01) kernel init for the autoregressive network so IAF bijectors start near-identity, preventing NaN warm-up losses from extreme initial samples

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
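To show why a small kernel init keeps the flow close to the identity, here is a toy pure-Python sketch. It is not the iaf.py implementation: it uses a single linear masked layer instead of the real autoregressive network, assumes the affine base is sampled first and then passed through the flow, and every function name is hypothetical.

```python
import math
import random


def truncated_normal(rng, stddev):
    """Draw from N(0, stddev^2), resampling anything beyond two standard deviations."""
    while True:
        value = rng.gauss(0.0, stddev)
        if abs(value) <= 2.0 * stddev:
            return value


def init_masked_weights(rng, dim, stddev):
    """Strictly lower-triangular weights, so output i depends only on inputs j < i."""
    return [
        [truncated_normal(rng, stddev) if j < i else 0.0 for j in range(dim)]
        for i in range(dim)
    ]


def iaf_layer(x, shift_w, log_scale_w):
    """One affine autoregressive step: y_i = x_i * exp(s_i(x_<i)) + m_i(x_<i)."""
    y = []
    for i, x_i in enumerate(x):
        shift = sum(shift_w[i][j] * x[j] for j in range(i))
        log_scale = sum(log_scale_w[i][j] * x[j] for j in range(i))
        y.append(x_i * math.exp(log_scale) + shift)
    return y


def softplus(v):
    # Numerically stable log(1 + exp(v)).
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def affine_base(eps, loc, raw_scale):
    """Trainable base: loc + softplus(raw_scale) * eps, elementwise."""
    return [l + softplus(r) * e for l, r, e in zip(loc, raw_scale, eps)]


def max_deviation(stddev, x, seed=0):
    """Largest |y_i - x_i| after one freshly initialised layer."""
    rng = random.Random(seed)
    dim = len(x)
    y = iaf_layer(
        x,
        init_masked_weights(rng, dim, stddev),
        init_masked_weights(rng, dim, stddev),
    )
    return max(abs(a - b) for a, b in zip(x, y))
```

With weights truncated at two standard deviations of 0.01, each shift and log-scale is bounded by 0.02 times the sum of the preceding |x_j|, so samples from the base pass through almost unchanged. With exactly zero weights the layer is the identity and the model reduces to the affine base.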

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

christiaanjs and others added 7 commits June 16, 2026 19:34

Parametrize phylo likelihood test over all three unroll modes

e4c9765

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

christiaanjs changed the title ~~Add native C++ op for the node-height ratio transform~~ Add convergence criterion, full-rank and IAF approximations, ADVI stability notebooks Jun 16, 2026

christiaanjs and others added 4 commits June 16, 2026 22:01

Fix IAF variable count assertion for new affine base variables

92ad751

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Simplify build and traversal gradient tests

5772b56

Don't require bitwise equality for blocked operations

6bdc822

Some naming tweaks

02388bc

christiaanjs changed the base branch from master to claude/cpp-tensorflow-height-ratio-op-39g890 June 22, 2026 00:26

christiaanjs added 2 commits June 22, 2026 20:03

Run some notebooks

172689a

Parameter convergence criterion

88a542d

christiaanjs wants to merge 13 commits into claude/cpp-tensorflow-height-ratio-op-39g890 from convergence
