
fix(examples): self-bootstrap dirs, propagate npy-write failures #166

Closed
LeoBuron wants to merge 12 commits into examples-ecg-ae from fix-examples-io-bootstrap

@LeoBuron
Summary

Both examples/har_classifier/train_c.c and examples/ecg_anomaly_ae/train_c.c:

  • Self-bootstrap logs/ and outputs/ dirs via a small ensureDir helper called at the top of main(). Those dirs are gitignored and were implicitly created by train_pytorch.py's Path.mkdir. Running C-only (fresh checkout, git clean -fdx, or CI without the Python step) used to fail with a cryptic fopen ENOENT.
  • Propagate npyWrite* failures to the program's exit code. The previous code printed the error to stderr and then executed return 0;, swallowing the failure. Now an int status accumulator is set whenever a write returns rc != 0, and status is returned at the end.
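
For readers who have not opened the diff, the directory bootstrap described above can be sketched roughly as follows. This is a minimal illustration assuming a POSIX system, not the code from this PR: the 0755 mode, the -1 error return, and the stderr message are assumptions, and only the name ensureDir and the "EEXIST counts as success" rule come from the description.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Sketch of an ensureDir-style helper (assumed shape, not the PR's code).
 * Returns 0 if mkdir succeeded or the path already exists, -1 otherwise. */
static int ensureDir(const char *path) {
    if (mkdir(path, 0755) == 0) {
        return 0; /* directory was created */
    }
    if (errno == EEXIST) {
        /* Already present: treated as success. Note that mkdir also reports
         * EEXIST when the path is a regular file; this sketch does not
         * distinguish that case. */
        return 0;
    }
    fprintf(stderr, "ensureDir: mkdir(%s) failed: %s\n", path, strerror(errno));
    return -1;
}
```

Called once per directory at the top of main() (for example ensureDir("logs") and ensureDir("outputs")), a helper of this shape makes a later fopen into those directories independent of whether the Python step ran first. It creates a single level only; a missing parent directory still yields an error.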

Why stacked on examples-ecg-ae

The ECG train_c.c doesn't exist on develop yet (it's introduced in PR #163), so this fix has to land in the same stack. Once #163 merges, this PR can be re-targeted to develop (or just merged through the stack).

Test plan

  • cmake --build --preset examples --target train_c_har_classifier --target train_c_ecg_anomaly_ae → builds cleanly
  • ECG: removed logs/ and outputs/ dirs, ran binary → dirs auto-created, exit 0
  • ECG compare.py parity PASS
  • HAR happy path → exit 0, identical metrics (test_acc 0.9063)
  • HAR compare.py parity PASS
  • HAR + ECG failure-path test (chmod 444 on output file) → exit 1 with stderr message (was exit 0 before)
  • Full CI: alloc-locality, format-check, unit_test (gcc) 42/42, unit_test_asan 42/42, pytest 21 passed
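
The failure-path check in the list above depends on write errors reaching the exit code. A minimal sketch of the accumulator pattern is shown here under stated assumptions: writeBlob is a hypothetical stand-in for the npyWrite* calls (whose real signatures are not shown in this PR), and writeAllOutputs is an illustrative wrapper, not a function from the examples.

```c
#include <stdio.h>

/* Hypothetical stand-in for an npyWrite* call: 0 on success, nonzero on failure. */
static int writeBlob(const char *path, const void *data, size_t size) {
    FILE *f = fopen(path, "wb");
    if (f == NULL) {
        perror(path); /* e.g. read-only file or missing directory */
        return 1;
    }
    size_t written = fwrite(data, 1, size, f);
    int closeRc = fclose(f); /* fclose can also report a deferred write error */
    return (written == size && closeRc == 0) ? 0 : 1;
}

/* Attempts every write and remembers whether any of them failed,
 * instead of reporting the error and then returning 0. */
static int writeAllOutputs(const char *const *paths, size_t count) {
    static const float sample[4] = {0.0f, 1.0f, 2.0f, 3.0f};
    int status = 0;
    for (size_t i = 0; i < count; i++) {
        int rc = writeBlob(paths[i], sample, sizeof sample);
        if (rc != 0) {
            fprintf(stderr, "write failed: %s\n", paths[i]);
            status = 1; /* keep going, but remember the failure */
        }
    }
    return status;
}
```

If main() ends with return status; (or returns the result of a function like writeAllOutputs), a single failed write turns the exit code nonzero while the remaining outputs are still attempted, which matches the "exit 1 with stderr message" behavior reported in the test plan.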

Follow-up

A subagent investigation of the npy-write error path also surfaced a latent buffer-bounds issue in examples/_shared/npy_writer.c for high-rank shapes — filed as #165, separate fix.

🤖 Generated with Claude Code

LeoBuron added 12 commits May 10, 2026 17:37
EPOCHS=200 (vs spec §4.2's 50): the K=2 stride-2 decoder substitution
slows convergence enough that 50 epochs is insufficient. Empirical
trajectory at LR=0.005, batch=32 escapes a dead-ReLU plateau around
epoch 18 and crosses test_mse 0.10 around epoch 100; 200 epochs
provides a safety margin past 0.05.
…te failures

Both har_classifier and ecg_anomaly_ae train_c.c mains:

1. Added ensureDir helper that mkdir(2)'s the logs/ and outputs/
   directories at startup, treating EEXIST as success. Those dirs are
   gitignored and were implicitly created by train_pytorch.py via
   Path.mkdir; running C-only (fresh checkout, git clean -fdx, or CI
   without the Python step) used to fail with a cryptic fopen ENOENT.

2. Replaced 'return 0' after the npyWrite* error branches with a
   'status' accumulator so write failures propagate to the program's
   exit code instead of being swallowed.

Verified by removing logs/ and outputs/ for ECG, re-running the
binary (exit 0, dirs recreated), and re-running compare.py for both
examples (parity PASS).
@LeoBuron LeoBuron closed this May 11, 2026
LeoBuron added a commit that referenced this pull request May 11, 2026
PyTorch + C parity demo for a 1D-CNN reconstruction autoencoder on the
UCR ECG5000 dataset. Training is filtered to class-1 normals only; at
eval, reconstruction MSE acts as an anomaly score against the multi-
class test set, with the threshold derived from training-set normals.
First example to exercise Conv1dTransposed.

Adds:
- examples/ecg_anomaly_ae/ — prepare_data.py (download + parse ECG5000
  into [N,1,140] .npy), train_pytorch.py (1D-CNN encoder/decoder),
  train_c.c (11-layer model wired through Conv1d/Conv1dTransposed,
  JSON log writer, post-training reconstruction writes), compare.py
  (MSE + AUC parity + plot emission), README, CMakeLists.
- examples/_shared/plotting.py — plot_reconstructions and
  plot_anomaly_score_hist (with degenerate-bin guard).
- examples/README.md marks Stage 2 done; examples/CMakeLists.txt wires
  the ecg_anomaly_ae subdir.

The K=2 stride-2 decoder substitution (forced by Conv1dTransposed
supporting only paddingType=VALID + outputPadding) slows convergence
enough that spec section 4.2's 50 epochs is insufficient; EPOCHS=200
provides a safety margin past the expected test_mse around 0.05.

Also folds in the I/O hardening (was PR #166) for both examples'
train_c.c mains:
1. Self-bootstrap logs/ and outputs/ via an ensureDir helper that
   mkdir(2)'s with EEXIST treated as success. Previously those dirs
   were gitignored and implicitly created only by train_pytorch.py via
   Path.mkdir; running C-only used to fail with a cryptic fopen ENOENT.
2. Propagate npyWrite* failures to the program's exit code via a
   status accumulator; previously the error was printed to stderr
   then return 0 silently swallowed it.

Verified end-to-end: ECG with logs/ and outputs/ removed re-creates
both dirs and exits 0; both examples' compare.py parity PASS.