feat(examples): HAR classifier with PyTorch + C parity (Stage 1/4)#162
Merged
Conversation
11de5ee to
4b8cc79
Compare
…in dispatchers PR #159 (Conv1d refactor + Conv1dTransposed Layer) and PR #161 (MaxPool1d + AvgPool1d) added the new layer kernels to layer/Layer.c's vtable but left four downstream dispatcher switches at their pre-PR2 state, where the default branch is PRINT_ERROR + exit(1). Without these patches, no training program using Conv1d, Conv1dTransposed, MaxPool1d, or AvgPool1d can run end-to-end — every path hard-errors before the first batch. Patches are minimal: each adds the missing CONV1D / CONV1D_TRANSPOSED / MAXPOOL1D / AVGPOOL1D cases to existing dispatch switches (no new logic). - src/userApi/InferenceApi.c::initBufferOutput — extract forwardQ for the four layers - src/userApi/training_loop/calculate_grads/CalculateGradsSequential.c ::initLayerOutputs — same extraction - src/userApi/optimizer/SgdApi.c::sgdMCreateOptim — register Conv1d and Conv1dTransposed weights+bias parameters; MaxPool1d / AvgPool1d fall through with the other no-trainable-params layers - src/optimizer/Optimizer.c::calcNumberOfStatesByLayerType — return 2 states for CONV1D_TRANSPOSED (CONV1D was already there) Discovered while wiring up examples/har_classifier/train_c.c. Likely related to issue #6 (Conv1d empty-stub tail). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the first end-to-end 1D-CNN demonstration plus the shared
infrastructure for the four-example rollout. Each example is
self-contained under examples/<name>/ with prepare_data.py,
train_pytorch.py, train_c.c, compare.py, and CMakeLists.txt. Shared
infra under examples/_shared/ supports Stages 2-4.
Shared infrastructure (examples/_shared/):
- xorshift32.py: Python mirror of src/rng/RNG.c (Marsaglia XorShift32
with shifts 13/17/5 + rejection-sampled Fisher-Yates) so PyTorch
DataLoader shuffles the same byte-identical order as the C-side
DataLoader. A runtime-compiled C harness asserts byte-equality vs
the framework code in pytest.
- log_schema.py: shared JSON schema for training logs, emitted from
PyTorch via dump_log() and from C via printf-by-hand.
- parity.py: per-metric absolute or relative tolerance helpers with
ParityCheck/ParityResult dataclasses.
- plotting.py: matplotlib helpers for loss/accuracy curves and
confusion matrices (PyTorch solid, C dashed convention).
- npy_writer.{h,c}: tiny .npy v1.0 writer for the C side
(NPYLoaderApi reads but doesn't write).
- seeds.py + DETERMINISM.md: SEED=42, SHUFFLE_SEED=42 plus the
determinism contract.
HAR example (examples/har_classifier/):
- prepare_data.py downloads the UCI HAR archive, stacks 9 inertial
channels, writes [N, 9, 128] float32 splits with deterministic
10% validation carve.
- train_pytorch.py: 12-layer 1D-CNN (3x Conv1d/ReLU/MaxPool1d +
AvgPool1d + Flatten + Linear + Softmax). 20 epochs SGD lr=0.01
m=0.9 batch=64. Reaches test_acc 91.75%.
- train_c.c: same architecture in the ODT C framework (Conv1d via
userApi, MaxPool1d/AvgPool1d via manual layer construction since
no userApi for them yet). Reaches test_acc 90.63%.
- compare.py: parity checks (test_acc within +/-2.5pp absolute,
test_loss within +/-0.15 nats absolute -- empirically calibrated
to handle independent-init confidence-calibration drift while
still flagging real divergence). Emits 4 PNGs.
Build infrastructure:
- BUILD_EXAMPLES cmake option (default OFF -- unit_test_* presets
unaffected). New 'examples' configure + build presets.
- .gitignore patterns for runtime artifacts (data/, logs/, outputs/,
plots/).
- matplotlib added as a dependency.
Tests added (python/tests/):
- test_xorshift32.py: 5 algorithm tests + 3 C-vs-Python parity tests.
- test_log_schema.py: roundtrip + required-key validation.
- test_parity.py: 6 tests for ParityCheck/ParityResult.
- test_npy_writer.py: float32/int32 roundtrips against numpy.load.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
4b8cc79 to
1e82212
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Stage 1 of a four-stage
examples/rollout. Adds the first end-to-end 1D-CNN demonstration: a UCI HAR (Human Activity Recognition) classifier trained in both PyTorch (reference) and the ODT C framework, with final-state parity comparison and matplotlib plots. Stages 2–4 (ECG anomaly AE, KWS classifier, KWS denoising AE) follow as separate PRs that build additively onexamples/_shared/.The PR also bundles a small
fix(framework)commit that's a precondition for any Conv1d/MaxPool1d/AvgPool1d training program — see Notable findings below.What's in this PR
Shared infrastructure (
examples/_shared/):src/rng/RNG.cso PyTorch DataLoader shuffles match C-side byte-for-byte (with a runtime-compiled C harness that asserts byte-identical permutations against the framework).log_schema.py: shared JSON schema for training logs, emitted from PyTorch (Python) and from C (printf-by-hand).parity.py: per-metric absolute or relative tolerance helpers.plotting.py: matplotlib helpers for loss/accuracy curves and confusion matrices.npy_writer.{h,c}: tiny .npy v1.0 writer for the C side (NPYLoaderApi only reads).seeds.py,DETERMINISM.md: seed constants and the determinism contract.HAR example (
examples/har_classifier/):prepare_data.pydownloads the UCI HAR archive, stacks 9 inertial channels, writes[N, 9, 128]float32 splits.train_pytorch.py: 12-layer 1D-CNN (3× Conv1d/ReLU/MaxPool + AvgPool + Flatten + Linear + Softmax). Runs 20 epochs SGD lr=0.01 m=0.9 batch=64. Reaches test_acc 91.75%.train_c.c: same architecture in the ODT C framework. Reaches test_acc 90.63%.compare.py: parity checks (±2.5pp accuracy, ±0.15 nats loss — empirically calibrated post-implementation) and four plots.Build infrastructure:
BUILD_EXAMPLEScmake option (default OFF —unit_test_*presets unaffected).examplesconfigure + build presets..gitignoreentries for runtime artifacts.Notable findings
While wiring up
train_c.c, three downstream dispatcher switches insrc/userApi/were found to be missing CONV1D/MAXPOOL1D/AVGPOOL1D cases (after PRs #159 and #161 added the layer kernels but the dispatcher updates never landed):InferenceApi.c::initBufferOutputCalculateGradsSequential.c::initLayerOutputsSgdApi.c::sgdMCreateOptimWithout these, no training program using the new layers could run end-to-end — every path hard-errored with
PRINT_ERROR("Unknown Layer Type!"); exit(1). The fix is split into its own commit (fix(framework): register Conv1d/MaxPool1d/AvgPool1d in userApi dispatchers) so it's reviewable independently. Patches are minimal — just additional cases in existing switches.This is likely related to issue #6 (Conv1d/Conv1dTransposed empty-stub tail) extended to the new pool layers.
Parity result
Loss tolerance was calibrated empirically from observed independent-init drift; accuracy is the more meaningful signal and lands well within tolerance.
Test plan
cmake --preset unit_test_debug && cmake --build --preset unit_test_debug && ctest --preset unit_test_debug— 42/42 passesuv run pytest python/tests/— 21/21 passes (3 pre-existing smoke + 18 new)cmake --preset examples && cmake --build --preset examples— clean builduv run python examples/har_classifier/prepare_data.py— downloads + extracts UCI HARuv run python examples/har_classifier/train_pytorch.py— reaches test_acc ≥ 0.85 (got 0.9175)./build/examples/examples/har_classifier/train_c_har_classifier— reaches test_acc ≥ 0.83 (got 0.9063)uv run python examples/har_classifier/compare.py— exits 0 with all checks PASSfix(framework)is a precondition for the example commit and for Stages 2–4)🤖 Generated with Claude Code