feat(examples): HAR classifier with PyTorch + C parity (Stage 1/4) by LeoBuron · Pull Request #162 · es-ude/OnDeviceTraining

LeoBuron · 2026-05-10T13:19:25Z

Summary

Stage 1 of a four-stage examples/ rollout. Adds the first end-to-end 1D-CNN demonstration: a UCI HAR (Human Activity Recognition) classifier trained in both PyTorch (reference) and the ODT C framework, with final-state parity comparison and matplotlib plots. Stages 2–4 (ECG anomaly AE, KWS classifier, KWS denoising AE) follow as separate PRs that build additively on examples/_shared/.

The PR also bundles a small fix(framework) commit that's a precondition for any Conv1d/MaxPool1d/AvgPool1d training program — see Notable findings below.

What's in this PR

Shared infrastructure (examples/_shared/):

XorShift32 Python mirror of src/rng/RNG.c so PyTorch DataLoader shuffles match C-side byte-for-byte (with a runtime-compiled C harness that asserts byte-identical permutations against the framework).
log_schema.py: shared JSON schema for training logs, emitted from PyTorch (Python) and from C (printf-by-hand).
parity.py: per-metric absolute or relative tolerance helpers.
plotting.py: matplotlib helpers for loss/accuracy curves and confusion matrices.
npy_writer.{h,c}: tiny .npy v1.0 writer for the C side (NPYLoaderApi only reads).
seeds.py, DETERMINISM.md: seed constants and the determinism contract.
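For reviewers unfamiliar with the .npy v1.0 layout that npy_writer.{h,c} targets, here is a minimal sketch of the format in Python, using only the standard library. It is an illustration, not a port of the C writer: the function name `npy_v1_bytes` and its signature are hypothetical, and it handles only little-endian float32 data.

```python
import struct

MAGIC = b"\x93NUMPY"  # 6-byte magic string that opens every .npy file


def npy_v1_bytes(values, shape):
    """Serialize a flat list of floats as a little-endian float32 .npy v1.0 blob.

    Hypothetical helper for illustration; not the API of npy_writer.{h,c}.
    """
    count = 1
    for dim in shape:
        count *= dim
    if count != len(values):
        raise ValueError("shape does not match number of values")

    # The shape is written as a Python tuple literal; 1-D shapes need a trailing comma.
    dims = ", ".join(str(d) for d in shape)
    shape_repr = "(" + dims + ("," if len(shape) == 1 else "") + ")"
    header = "{'descr': '<f4', 'fortran_order': False, 'shape': %s, }" % shape_repr

    # Pad with spaces and end with a newline so that
    # magic (6) + version (2) + header length (2) + header is a multiple of 64.
    unpadded = len(MAGIC) + 2 + 2 + len(header) + 1
    header += " " * (-unpadded % 64) + "\n"

    blob = MAGIC + bytes([1, 0])                 # format version 1.0
    blob += struct.pack("<H", len(header))       # uint16 little-endian header length
    blob += header.encode("latin1")
    blob += struct.pack("<%df" % count, *values)  # raw C-order float32 payload
    return blob
```

Writing the returned bytes to a file should give something numpy.load can read back as a float32 array of the given shape, which is the kind of roundtrip the PR's tests are described as checking for the C writer.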

HAR example (examples/har_classifier/):

prepare_data.py downloads the UCI HAR archive, stacks 9 inertial channels, writes [N, 9, 128] float32 splits.
train_pytorch.py: 12-layer 1D-CNN (3× Conv1d/ReLU/MaxPool + AvgPool + Flatten + Linear + Softmax). Runs 20 epochs SGD lr=0.01 m=0.9 batch=64. Reaches test_acc 91.75%.
train_c.c: same architecture in the ODT C framework. Reaches test_acc 90.63%.
compare.py: parity checks (±2.5pp accuracy, ±0.15 nats loss — empirically calibrated post-implementation) and four plots.

Build infrastructure:

BUILD_EXAMPLES cmake option (default OFF — unit_test_* presets unaffected).
examples configure + build presets.
.gitignore entries for runtime artifacts.

Notable findings

While wiring up train_c.c, three downstream dispatcher switches in src/userApi/ turned out to be missing CONV1D/MAXPOOL1D/AVGPOOL1D cases: PRs #159 and #161 added the layer kernels, but the matching dispatcher updates never landed. The affected functions are:

InferenceApi.c::initBufferOutput
CalculateGradsSequential.c::initLayerOutputs
SgdApi.c::sgdMCreateOptim

Without these, no training program using the new layers could run end-to-end — every path hard-errored with PRINT_ERROR("Unknown Layer Type!"); exit(1). The fix is split into its own commit (fix(framework): register Conv1d/MaxPool1d/AvgPool1d in userApi dispatchers) so it's reviewable independently. Patches are minimal — just additional cases in existing switches.

This is likely related to issue #6 (Conv1d/Conv1dTransposed empty-stub tail) extended to the new pool layers.

Parity result

| Metric | PyTorch | C | Diff | Tolerance | Pass |
|---|---|---|---|---|---|
| test_acc | 91.75% | 90.63% | 1.12 pp | ±2.5 pp | ✅ |
| test_loss | 0.227 | 0.352 | 0.125 | ±0.15 | ✅ |

Loss tolerance was calibrated empirically from observed independent-init drift; accuracy is the more meaningful signal and lands well within tolerance.
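To make the pass/fail logic concrete, the following is a minimal sketch of an absolute-or-relative tolerance check applied to the numbers reported above. The class names ParityCheck and ParityResult come from this PR's commit message, but the fields and the `check_parity` function are assumptions for illustration and may not match what parity.py actually defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParityCheck:
    metric: str
    tolerance: float
    relative: bool = False  # assumed flag: scale the tolerance by |reference|


@dataclass(frozen=True)
class ParityResult:
    metric: str
    diff: float
    limit: float
    passed: bool


def check_parity(check, reference, candidate):
    """Compare a candidate metric (C) against a reference metric (PyTorch)."""
    diff = abs(reference - candidate)
    limit = check.tolerance * abs(reference) if check.relative else check.tolerance
    return ParityResult(check.metric, diff, limit, diff <= limit)


# Values from the parity result in this PR, with accuracies as fractions.
results = [
    check_parity(ParityCheck("test_acc", 0.025), 0.9175, 0.9063),
    check_parity(ParityCheck("test_loss", 0.15), 0.227, 0.352),
]

for r in results:
    status = "PASS" if r.passed else "FAIL"
    print(f"{r.metric}: diff={r.diff:.4f} limit={r.limit:.4f} {status}")
```

Both checks pass under these tolerances. The loss difference of 0.125 sits much closer to its 0.15 limit than the accuracy difference does to its own, which is consistent with treating accuracy as the more meaningful signal.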

Test plan

cmake --preset unit_test_debug && cmake --build --preset unit_test_debug && ctest --preset unit_test_debug — 42/42 passes
uv run pytest python/tests/ — 21/21 passes (3 pre-existing smoke + 18 new)
cmake --preset examples && cmake --build --preset examples — clean build
uv run python examples/har_classifier/prepare_data.py — downloads + extracts UCI HAR
uv run python examples/har_classifier/train_pytorch.py — reaches test_acc ≥ 0.85 (got 0.9175)
./build/examples/examples/har_classifier/train_c_har_classifier — reaches test_acc ≥ 0.83 (got 0.9063)
uv run python examples/har_classifier/compare.py — exits 0 with all checks PASS
Reviewer eyes on the framework split commit (fix(framework) is a precondition for the example commit and for Stages 2–4)

🤖 Generated with Claude Code

…in dispatchers

PR #159 (Conv1d refactor + Conv1dTransposed Layer) and PR #161 (MaxPool1d + AvgPool1d) added the new layer kernels to layer/Layer.c's vtable but left four downstream dispatcher switches at their pre-PR2 state, where the default branch is PRINT_ERROR + exit(1). Without these patches, no training program using Conv1d, Conv1dTransposed, MaxPool1d, or AvgPool1d can run end-to-end: every path hard-errors before the first batch.

Patches are minimal: each adds the missing CONV1D / CONV1D_TRANSPOSED / MAXPOOL1D / AVGPOOL1D cases to existing dispatch switches (no new logic).

- src/userApi/InferenceApi.c::initBufferOutput: extract forwardQ for the four layers
- src/userApi/training_loop/calculate_grads/CalculateGradsSequential.c::initLayerOutputs: same extraction
- src/userApi/optimizer/SgdApi.c::sgdMCreateOptim: register Conv1d and Conv1dTransposed weights+bias parameters; MaxPool1d / AvgPool1d fall through with the other no-trainable-params layers
- src/optimizer/Optimizer.c::calcNumberOfStatesByLayerType: return 2 states for CONV1D_TRANSPOSED (CONV1D was already there)

Discovered while wiring up examples/har_classifier/train_c.c. Likely related to issue #6 (Conv1d empty-stub tail).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds the first end-to-end 1D-CNN demonstration plus the shared infrastructure for the four-example rollout. Each example is self-contained under examples/<name>/ with prepare_data.py, train_pytorch.py, train_c.c, compare.py, and CMakeLists.txt. Shared infra under examples/_shared/ supports Stages 2-4.

Shared infrastructure (examples/_shared/):

- xorshift32.py: Python mirror of src/rng/RNG.c (Marsaglia XorShift32 with shifts 13/17/5 + rejection-sampled Fisher-Yates) so the PyTorch DataLoader shuffles in the same byte-identical order as the C-side DataLoader. A runtime-compiled C harness asserts byte-equality vs the framework code in pytest.
- log_schema.py: shared JSON schema for training logs, emitted from PyTorch via dump_log() and from C via printf-by-hand.
- parity.py: per-metric absolute or relative tolerance helpers with ParityCheck/ParityResult dataclasses.
- plotting.py: matplotlib helpers for loss/accuracy curves and confusion matrices (PyTorch solid, C dashed convention).
- npy_writer.{h,c}: tiny .npy v1.0 writer for the C side (NPYLoaderApi reads but doesn't write).
- seeds.py + DETERMINISM.md: SEED=42, SHUFFLE_SEED=42 plus the determinism contract.

HAR example (examples/har_classifier/):

- prepare_data.py downloads the UCI HAR archive, stacks 9 inertial channels, writes [N, 9, 128] float32 splits with deterministic 10% validation carve.
- train_pytorch.py: 12-layer 1D-CNN (3x Conv1d/ReLU/MaxPool1d + AvgPool1d + Flatten + Linear + Softmax). 20 epochs SGD lr=0.01 m=0.9 batch=64. Reaches test_acc 91.75%.
- train_c.c: same architecture in the ODT C framework (Conv1d via userApi, MaxPool1d/AvgPool1d via manual layer construction since no userApi for them yet). Reaches test_acc 90.63%.
- compare.py: parity checks (test_acc within +/-2.5pp absolute, test_loss within +/-0.15 nats absolute -- empirically calibrated to handle independent-init confidence-calibration drift while still flagging real divergence). Emits 4 PNGs.

Build infrastructure:

- BUILD_EXAMPLES cmake option (default OFF -- unit_test_* presets unaffected). New 'examples' configure + build presets.
- .gitignore patterns for runtime artifacts (data/, logs/, outputs/, plots/).
- matplotlib added as a dependency.

Tests added (python/tests/):

- test_xorshift32.py: 5 algorithm tests + 3 C-vs-Python parity tests.
- test_log_schema.py: roundtrip + required-key validation.
- test_parity.py: 6 tests for ParityCheck/ParityResult.
- test_npy_writer.py: float32/int32 roundtrips against numpy.load.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
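The shuffle mirror described in this commit message can be sketched as follows. The 13/17/5 shift triple and the use of rejection sampling with Fisher-Yates come from the message itself. The function names, the exact rejection rule, and the loop direction are assumptions: they would have to match src/rng/RNG.c exactly to reproduce its permutations, so treat this as a picture of the technique and not of the framework's code.

```python
MASK32 = 0xFFFFFFFF


def xorshift32(state):
    """One Marsaglia XorShift32 step with shifts 13/17/5; returns the new 32-bit state."""
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state & MASK32


def bounded(state, bound):
    """Draw an unbiased integer in [0, bound) by rejecting the biased tail.

    Assumed rejection rule: discard draws at or above the largest multiple
    of `bound` that fits in 2**32, then reduce modulo `bound`.
    """
    limit = (1 << 32) - ((1 << 32) % bound)
    while True:
        state = xorshift32(state)
        if state < limit:
            return state, state % bound


def shuffled_indices(n, seed):
    """Fisher-Yates shuffle of range(n) driven by XorShift32 (assumed high-to-low loop)."""
    state = seed & MASK32
    if state == 0:
        raise ValueError("XorShift32 state must be non-zero")
    order = list(range(n))
    for i in range(n - 1, 0, -1):
        state, j = bounded(state, i + 1)
        order[i], order[j] = order[j], order[i]
    return order


# The same seed always yields the same permutation.
print(shuffled_indices(8, 42))
```

Masking after each left shift keeps Python's unbounded integers in step with 32-bit unsigned arithmetic in C, which is the detail most likely to cause a silent mismatch between the two sides.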

LeoBuron force-pushed the examples-har-classifier branch from 11de5ee to 4b8cc79 May 10, 2026 13:24

LeoBuron and others added 2 commits May 10, 2026 15:33

LeoBuron force-pushed the examples-har-classifier branch from 4b8cc79 to 1e82212 May 10, 2026 13:33

LeoBuron merged commit 1e82212 into develop May 10, 2026
5 checks passed

LeoBuron deleted the examples-har-classifier branch May 10, 2026 13:41
