From e88e4a109108408e59eeeb15142afa4d80ac5d95 Mon Sep 17 00:00:00 2001
From: Leo Buron
Date: Fri, 15 May 2026 22:12:01 +0200
Subject: [PATCH 1/4] =?UTF-8?q?feat(userApi):=20layerLoadWeights=20CONV1D?=
 =?UTF-8?q?=5FTRANSPOSED=20dispatch=20=E2=80=94=20weight=20+=20bias=20memc?=
 =?UTF-8?q?py?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Mirrors the CONV1D case but routes through conv1dTransposedConfig_t.
Caller-side weight buffer shape is [inChannels, outChannels/groups,
kernelSize] — the SWAP relative to Conv1d is documented in the
Conv1dTransposedApi header.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1

refactor(userApi): rename Softmax factories to *Legacy for new API coexistence

Frees the canonical softmaxLayerInit / freeSoftmaxLayer names. Legacy
bodies are functionally unchanged except for the explicit
softmaxConfig->ownsQuantizations = false (defensive — matches calloc
default).

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md

feat(userApi): declare conv1dInit_t and new Conv1d factory signatures

Adds the per-layer init struct, Borrowing/Owning factory decls, and
freeConv1dLayer decl. _Static_assert guards that paddingType_t::VALID
remains enum value 0 so .padding zero-init defaults to VALID. No impl
yet — that follows in the next two commits.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 3.4, 4

test(userApi): failing tests for new conv1dLayerInit Borrowing variant

Four tests: shape correctness with explicit fields, BIAS_DEFAULT
resolution, BIAS_FALSE leaves bias NULL, padding/stride/dilation/groups
zero-init defaults (VALID/1/1/1). Fails at link until impl lands.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 4, 5.1

feat(userApi): declare conv1dTransposedInit_t and Conv1dTransposed factory signatures

New Conv1dTransposedApi.h header with conv1dTransposedInit_t struct,
Borrowing/Owning factory decls, freeConv1dTransposedLayer decl, and
_Static_assert(VALID == 0). Stub .c file registered in CMake.
Implementation in subsequent commits.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 3.4, 4

test(userApi): failing tests for new conv1dTransposedLayerInit Borrowing

Three tests: shape correctness (with inChannels/outChannels weight shape
SWAP relative to Conv1d), BIAS_FALSE leaves bias NULL, outputPadding
propagates to internal config. Fails at link until impl lands.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 4

feat(userApi): declare Pool1d factory signatures (Max + Avg)

Splits the spec's shared pool1dInit_t into maxPool1dInit_t (with
inputChannels + inputLength for argmax pre-allocation) and
avgPool1dInit_t (no input geometry, no dilation). Both factory pairs
declared in one header; impl stubbed in Pool1dApi.c for the CMake graph.
Per-layer init code follows.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3 (with documented split), 3.4, 4

test(userApi): failing tests for maxPool1dLayerInit Borrowing + Owning

Four tests: kernel + argmax shape correctness, stride defaulting to
kernelSize (PyTorch pool convention), Owning deep-copy of forwardMath
and backwardMath into the two pool config slots (forwardQ + propLossQ),
and a leak-check loop. Fails at link until impl lands.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3 (split), 4

test(userApi): failing tests for avgPool1dLayerInit Borrowing + Owning

Three tests: kernel correctness, stride defaulting to kernelSize, and
Owning deep-copy. AvgPool has no dilation (struct field omitted) and no
argmax tensor (no input geometry required).

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 4

test(userApi): failing tests for layerLoadWeights CONV1D case

Two tests: weight + bias memcpy, and no-bias accepts NULL biasData.
Fails because the current CONV1D dispatch is the PR 1 PRINT_ERROR stub.
Implementation in next commit. Also adds TensorApi include + MORE_LIBS
entry (provides freeQuantization, which the conv1d tests need but the
plan omitted).

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1

feat(userApi): layerLoadWeights CONV1D dispatch — weight + bias memcpy

Replaces PR 1's TODO stub. Same shape as the LINEAR case: memcpy from
the caller-provided float* buffer into the factory-allocated tensor
data, with bias presence/absence enforcement matching the bool resolved
from conv1dInit_t::bias. Also links Conv1d into LayerWeightsApi CMake
target and adds TensorApi to the test's MORE_LIBS (provides
freeQuantization, omitted from spec).

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1

test(userApi): failing test for layerLoadWeights CONV1D_TRANSPOSED case

Verifies weight + bias memcpy into the factory-allocated
Conv1dTransposed parameter tensors. Fails because the current dispatch
is a PRINT_ERROR + exit stub.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1

test(userApi): failing tests for shared deepCopyQuantization in LayerQuant

Three tests cover the null-input shortcut, FLOAT32 (no qConfig), and
SYM_INT32 (qConfig bytes duplicated). Fails at link until the impl lands
in LayerQuant.c.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 5.2, 5.3

refactor(userApi): extract deepCopyQuantization to shared LayerQuant utility

PR 1 inlined deepCopyQuantization into LinearApi.c and an equivalent
reluDeepCopyQuantization into ReluApi.c. Hoists both into a single
externally-linked function in LayerQuant.c. LinearApi and ReluApi now
share the same code path; the new Conv1d / Conv1dTransposed / Pool1d /
Softmax Owning factories that follow in this PR use it directly.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 5.2, 5.3

refactor(userApi): rename Conv1d factories to *Legacy for new API coexistence

Frees the canonical conv1dLayerInit / freeConv1dLayer names for the new
conv1dInit_t-based factories landing in this PR. Legacy bodies are
functionally unchanged; only adds
'conv1dConfig->ownsQuantizations = false' (no behavior change since the
new field defaults to false via calloc anyway, but explicit is clearer).

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md
---
 examples/ecg_anomaly_ae/train_c.c | 7 +-
 examples/har_classifier/train_c.c | 14 +-
 src/userApi/CMakeLists.txt | 4 +
 src/userApi/LayerQuant.c | 47 ++++
 src/userApi/LayerWeightsApi.c | 57 ++++-
 src/userApi/include/LayerQuant.h | 14 ++
 src/userApi/layer/CMakeLists.txt | 35 +++
 src/userApi/layer/Conv1dApi.c | 9 +-
 src/userApi/layer/Conv1dTransposedApi.c | 8 +
 src/userApi/layer/LinearApi.c | 52 -----
 src/userApi/layer/Pool1dApi.c | 5 +
 src/userApi/layer/ReluApi.c | 46 +---
 src/userApi/layer/SoftmaxApi.c | 5 +-
 src/userApi/layer/include/Conv1dApi.h | 60 ++++-
 .../layer/include/Conv1dTransposedApi.h | 51 ++++
 src/userApi/layer/include/Pool1dApi.h | 67 ++++++
 src/userApi/layer/include/SoftmaxApi.h | 6 +-
 test/unit/layer/UnitTestConv1d.c | 10 +-
 test/unit/layer/UnitTestSoftmax.c | 20 +-
 .../loss_functions/UnitTestCrossEntropy.c | 8 +-
 test/unit/serial/UnitTestDeserialize.c | 8 +-
 test/unit/userAPI/CMakeLists.txt | 51 ++++
 test/unit/userAPI/UnitTestConv1dApi.c | 147 ++++++++++++
 .../userAPI/UnitTestConv1dTransposedApi.c | 127 ++++++++++
 .../unit/userAPI/UnitTestFlattenIntegration.c | 4 +-
 test/unit/userAPI/UnitTestLayerQuant.c | 39 ++++
 test/unit/userAPI/UnitTestLayerWeightsApi.c | 104
+++++++++ test/unit/userAPI/UnitTestMnistSmoke.c | 6 +- .../unit/userAPI/UnitTestMultiLayerTraining.c | 12 +- test/unit/userAPI/UnitTestPool1dApi.c | 219 ++++++++++++++++++ 30 files changed, 1081 insertions(+), 161 deletions(-) create mode 100644 src/userApi/layer/Conv1dTransposedApi.c create mode 100644 src/userApi/layer/Pool1dApi.c create mode 100644 src/userApi/layer/include/Conv1dTransposedApi.h create mode 100644 src/userApi/layer/include/Pool1dApi.h create mode 100644 test/unit/userAPI/UnitTestConv1dApi.c create mode 100644 test/unit/userAPI/UnitTestConv1dTransposedApi.c create mode 100644 test/unit/userAPI/UnitTestPool1dApi.c diff --git a/examples/ecg_anomaly_ae/train_c.c b/examples/ecg_anomaly_ae/train_c.c index 5f85dbf..0170690 100644 --- a/examples/ecg_anomaly_ae/train_c.c +++ b/examples/ecg_anomaly_ae/train_c.c @@ -271,7 +271,7 @@ static void buildModel(layer_t **model) { parameter_t *e1_w = buildParam(XAVIER_UNIFORM, e1_w_data, e1_w_dims, 3, IN_CHANNELS * E1_K, E1_OUT * E1_K); parameter_t *e1_b = buildParam(ZEROS, e1_b_data, e1_b_dims, 1, 1, E1_OUT); - model[0] = conv1dLayerInit(e1_w, e1_b, e1k, q, q, q, q); + model[0] = conv1dLayerInitLegacy(e1_w, e1_b, e1k, q, q, q, q); model[1] = reluLayerInitLegacy(quantizationInitFloat(), quantizationInitFloat()); /* Block P1: MaxPool1d(K=2, S=2). 70 → 35. */ @@ -283,8 +283,9 @@ static void buildModel(layer_t **model) { parameter_t *e2_w = buildParam(XAVIER_UNIFORM, e2_w_data, e2_w_dims, 3, E1_OUT * E2_K, E2_OUT * E2_K); parameter_t *e2_b = buildParam(ZEROS, e2_b_data, e2_b_dims, 1, 1, E2_OUT); - model[3] = conv1dLayerInit(e2_w, e2_b, e2k, quantizationInitFloat(), quantizationInitFloat(), - quantizationInitFloat(), quantizationInitFloat()); + model[3] = + conv1dLayerInitLegacy(e2_w, e2_b, e2k, quantizationInitFloat(), quantizationInitFloat(), + quantizationInitFloat(), quantizationInitFloat()); model[4] = reluLayerInitLegacy(quantizationInitFloat(), quantizationInitFloat()); /* Block P2: AvgPool1d(K=5, S=5). 
35 → 7 (bottleneck). */ diff --git a/examples/har_classifier/train_c.c b/examples/har_classifier/train_c.c index 2ec319c..1f78dfc 100644 --- a/examples/har_classifier/train_c.c +++ b/examples/har_classifier/train_c.c @@ -264,7 +264,7 @@ static void buildModel(layer_t **model) { parameter_t *c1_w = buildParam(XAVIER_UNIFORM, c1_w_data, c1_w_dims, 3, IN_CHANNELS * C1_K, C1_OUT * C1_K); parameter_t *c1_b = buildParam(ZEROS, c1_b_data, c1_b_dims, 1, 1, C1_OUT); - model[0] = conv1dLayerInit(c1_w, c1_b, k1, q1, q2, q3, q4); + model[0] = conv1dLayerInitLegacy(c1_w, c1_b, k1, q1, q2, q3, q4); model[1] = reluLayerInitLegacy(quantizationInitFloat(), quantizationInitFloat()); model[2] = buildMaxPool1dLayer(2, 2, C1_OUT, LEN_INPUT / 2); @@ -274,8 +274,9 @@ static void buildModel(layer_t **model) { parameter_t *c2_w = buildParam(XAVIER_UNIFORM, c2_w_data, c2_w_dims, 3, C1_OUT * C2_K, C2_OUT * C2_K); parameter_t *c2_b = buildParam(ZEROS, c2_b_data, c2_b_dims, 1, 1, C2_OUT); - model[3] = conv1dLayerInit(c2_w, c2_b, k2, quantizationInitFloat(), quantizationInitFloat(), - quantizationInitFloat(), quantizationInitFloat()); + model[3] = + conv1dLayerInitLegacy(c2_w, c2_b, k2, quantizationInitFloat(), quantizationInitFloat(), + quantizationInitFloat(), quantizationInitFloat()); model[4] = reluLayerInitLegacy(quantizationInitFloat(), quantizationInitFloat()); model[5] = buildMaxPool1dLayer(2, 2, C2_OUT, LEN_INPUT / 4); @@ -285,8 +286,9 @@ static void buildModel(layer_t **model) { parameter_t *c3_w = buildParam(XAVIER_UNIFORM, c3_w_data, c3_w_dims, 3, C2_OUT * C3_K, C3_OUT * C3_K); parameter_t *c3_b = buildParam(ZEROS, c3_b_data, c3_b_dims, 1, 1, C3_OUT); - model[6] = conv1dLayerInit(c3_w, c3_b, k3, quantizationInitFloat(), quantizationInitFloat(), - quantizationInitFloat(), quantizationInitFloat()); + model[6] = + conv1dLayerInitLegacy(c3_w, c3_b, k3, quantizationInitFloat(), quantizationInitFloat(), + quantizationInitFloat(), quantizationInitFloat()); model[7] = 
reluLayerInitLegacy(quantizationInitFloat(), quantizationInitFloat()); model[8] = buildAvgPool1dLayer(LEN_INPUT / 4, LEN_INPUT / 4); @@ -296,7 +298,7 @@ static void buildModel(layer_t **model) { parameter_t *fc_b = buildParam(ZEROS, fc_b_data, fc_b_dims, 2, 1, NUM_CLASSES); model[10] = linearLayerInitLegacy(fc_w, fc_b, quantizationInitFloat(), quantizationInitFloat(), quantizationInitFloat(), quantizationInitFloat()); - model[11] = softmaxLayerInit(quantizationInitFloat(), quantizationInitFloat()); + model[11] = softmaxLayerInitLegacy(quantizationInitFloat(), quantizationInitFloat()); } /* ------------------------------------------------------------------------- */ diff --git a/src/userApi/CMakeLists.txt b/src/userApi/CMakeLists.txt index 74da8cf..cd2c781 100644 --- a/src/userApi/CMakeLists.txt +++ b/src/userApi/CMakeLists.txt @@ -35,14 +35,18 @@ target_link_libraries(StorageApi PRIVATE add_library(LayerQuant LayerQuant.c) target_include_directories(LayerQuant PUBLIC include) target_link_libraries(LayerQuant PRIVATE + Common Quantization Rounding + StorageApi ) add_library(LayerWeightsApi LayerWeightsApi.c) target_include_directories(LayerWeightsApi PUBLIC include) target_link_libraries(LayerWeightsApi PRIVATE Common + Conv1d + Conv1dTransposed Layer Linear Rounding diff --git a/src/userApi/LayerQuant.c b/src/userApi/LayerQuant.c index df06957..5ac004c 100644 --- a/src/userApi/LayerQuant.c +++ b/src/userApi/LayerQuant.c @@ -1,6 +1,11 @@ #define SOURCE_FILE "LAYER_QUANT" +#include +#include + +#include "Common.h" #include "LayerQuant.h" +#include "StorageApi.h" void layerQuantInitUniform(layerQuant_t *lq, quantization_t *q) { lq->forwardMath = q; @@ -8,3 +13,45 @@ void layerQuantInitUniform(layerQuant_t *lq, quantization_t *q) { lq->weightStorage = q; lq->biasStorage = q; } + +quantization_t *deepCopyQuantization(quantization_t *src) { + if (src == NULL) { + return NULL; + } + + quantization_t *dst = reserveMemory(sizeof(quantization_t)); + dst->type = src->type; + 
+ size_t cfgSize = 0; + switch (src->type) { + case FLOAT32: + cfgSize = 0; + break; + case INT32: + cfgSize = 0; + break; + case BOOL: + cfgSize = 0; + break; + case SYM_INT32: + cfgSize = sizeof(symInt32QConfig_t); + break; + case SYM: + cfgSize = sizeof(symQConfig_t); + break; + case ASYM: + cfgSize = sizeof(asymQConfig_t); + break; + default: + PRINT_ERROR("deepCopyQuantization: unknown quantization type %d", (int)src->type); + exit(1); + } + + if (cfgSize == 0) { + dst->qConfig = NULL; + } else { + dst->qConfig = reserveMemory(cfgSize); + memcpy(dst->qConfig, src->qConfig, cfgSize); + } + return dst; +} diff --git a/src/userApi/LayerWeightsApi.c b/src/userApi/LayerWeightsApi.c index 432d7e9..9c043d2 100644 --- a/src/userApi/LayerWeightsApi.c +++ b/src/userApi/LayerWeightsApi.c @@ -2,6 +2,8 @@ #include "LayerWeightsApi.h" #include "Common.h" +#include "Conv1d.h" +#include "Conv1dTransposed.h" #include "Linear.h" #include "Tensor.h" #include @@ -42,6 +44,30 @@ void layerLoadWeights(layer_t *layer, float *weightData, float *biasData) { } break; } + case CONV1D: { + conv1dConfig_t *cfg = layer->config->conv1d; + if (cfg->weights == NULL) { + PRINT_ERROR("layerLoadWeights CONV1D: layer has no weight parameter"); + exit(1); + } + tensor_t *weightTensor = cfg->weights->param; + size_t numWeightElements = calcNumberOfElementsByTensor(weightTensor); + memcpy(weightTensor->data, weightData, numWeightElements * sizeof(float)); + + if (cfg->bias != NULL) { + if (biasData == NULL) { + PRINT_ERROR("layerLoadWeights CONV1D: layer has bias but biasData is NULL"); + exit(1); + } + tensor_t *biasTensor = cfg->bias->param; + size_t numBiasElements = calcNumberOfElementsByTensor(biasTensor); + memcpy(biasTensor->data, biasData, numBiasElements * sizeof(float)); + } else if (biasData != NULL) { + PRINT_ERROR("layerLoadWeights CONV1D: layer has no bias but biasData is non-NULL"); + exit(1); + } + break; + } case RELU: case SOFTMAX: case FLATTEN: @@ -49,11 +75,32 @@ void 
layerLoadWeights(layer_t *layer, float *weightData, float *biasData) { case AVGPOOL1D: PRINT_ERROR("layerLoadWeights: layer type %d has no parameters to load", (int)layer->type); exit(1); - case CONV1D: - case CONV1D_TRANSPOSED: - PRINT_ERROR("layerLoadWeights: layer type %d dispatch not implemented (TODO PR 2)", - (int)layer->type); - exit(1); + case CONV1D_TRANSPOSED: { + conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed; + if (cfg->weights == NULL) { + PRINT_ERROR("layerLoadWeights CONV1D_TRANSPOSED: layer has no weight parameter"); + exit(1); + } + tensor_t *weightTensor = cfg->weights->param; + size_t numWeightElements = calcNumberOfElementsByTensor(weightTensor); + memcpy(weightTensor->data, weightData, numWeightElements * sizeof(float)); + + if (cfg->bias != NULL) { + if (biasData == NULL) { + PRINT_ERROR("layerLoadWeights CONV1D_TRANSPOSED: layer has bias but biasData " + "is NULL"); + exit(1); + } + tensor_t *biasTensor = cfg->bias->param; + size_t numBiasElements = calcNumberOfElementsByTensor(biasTensor); + memcpy(biasTensor->data, biasData, numBiasElements * sizeof(float)); + } else if (biasData != NULL) { + PRINT_ERROR("layerLoadWeights CONV1D_TRANSPOSED: layer has no bias but biasData " + "is non-NULL"); + exit(1); + } + break; + } default: PRINT_ERROR("layerLoadWeights: dispatch not implemented for layer type %d", (int)layer->type); diff --git a/src/userApi/include/LayerQuant.h b/src/userApi/include/LayerQuant.h index 220e13f..b456634 100644 --- a/src/userApi/include/LayerQuant.h +++ b/src/userApi/include/LayerQuant.h @@ -22,4 +22,18 @@ typedef struct layerQuant { * common all-same-quantization case. Caller retains ownership of `q`. */ void layerQuantInitUniform(layerQuant_t *lq, quantization_t *q); +/*! Deep-copy a `quantization_t` and its `qConfig`. Returns NULL if `src` is NULL. + * + * Caller owns the returned allocation. 
Free via: + * freeReservedMemory(result->qConfig); + * freeReservedMemory(result); + * + * The `qConfig` size is dispatched by `src->type`; BOOL/INT32/FLOAT32 have + * no qConfig (result->qConfig == NULL). Unknown types fire PRINT_ERROR + + * exit(1). + * + * Used by every `*LayerInitOwning` factory to materialize per-layer copies + * of the four math quantizations referenced by `layerQuant_t`. */ +quantization_t *deepCopyQuantization(quantization_t *src); + #endif /* LAYER_QUANT_H */ diff --git a/src/userApi/layer/CMakeLists.txt b/src/userApi/layer/CMakeLists.txt index 805d5b0..c24e468 100644 --- a/src/userApi/layer/CMakeLists.txt +++ b/src/userApi/layer/CMakeLists.txt @@ -57,4 +57,39 @@ target_link_libraries(FlattenApi PRIVATE Layer Common StorageApi +) + +add_library(Conv1dTransposedApi Conv1dTransposedApi.c) +target_include_directories(Conv1dTransposedApi PUBLIC include) +target_link_libraries(Conv1dTransposedApi PRIVATE + Common + Conv1dTransposed + Distributions + Kernel + Layer + LayerCommon + LayerQuant + Quantization + QuantizationApi + Rounding + StorageApi + Tensor + TensorApi +) + +add_library(Pool1dApi Pool1dApi.c) +target_include_directories(Pool1dApi PUBLIC include) +target_link_libraries(Pool1dApi PRIVATE + AvgPool1d + Common + Kernel + Layer + LayerQuant + MaxPool1d + Quantization + QuantizationApi + Rounding + StorageApi + Tensor + TensorApi ) \ No newline at end of file diff --git a/src/userApi/layer/Conv1dApi.c b/src/userApi/layer/Conv1dApi.c index b17ddf6..22a3d3a 100644 --- a/src/userApi/layer/Conv1dApi.c +++ b/src/userApi/layer/Conv1dApi.c @@ -8,15 +8,16 @@ #include -layer_t *conv1dLayerInit(parameter_t *weights, parameter_t *bias, kernel_t *kernel, - quantization_t *forwardQ, quantization_t *weightGradQ, - quantization_t *biasGradQ, quantization_t *propLossQ) { +layer_t *conv1dLayerInitLegacy(parameter_t *weights, parameter_t *bias, kernel_t *kernel, + quantization_t *forwardQ, quantization_t *weightGradQ, + quantization_t *biasGradQ, 
quantization_t *propLossQ) { layer_t *conv1dLayer = reserveMemory(sizeof(layer_t)); layerConfig_t *layerConfig = reserveMemory(sizeof(layerConfig_t)); conv1dConfig_t *conv1dConfig = reserveMemory(sizeof(conv1dConfig_t)); initConv1dConfigWithWeightsAndBias(conv1dConfig, kernel, weights, bias, 1u, forwardQ, weightGradQ, biasGradQ, propLossQ); + conv1dConfig->ownsQuantizations = false; conv1dLayer->type = CONV1D; layerConfig->conv1d = conv1dConfig; @@ -25,7 +26,7 @@ layer_t *conv1dLayerInit(parameter_t *weights, parameter_t *bias, kernel_t *kern return conv1dLayer; } -void freeConv1dLayer(layer_t *conv1dLayer) { +void freeConv1dLayerLegacy(layer_t *conv1dLayer) { conv1dConfig_t *conv1dConfig = conv1dLayer->config->conv1d; freeParameter(conv1dConfig->weights); diff --git a/src/userApi/layer/Conv1dTransposedApi.c b/src/userApi/layer/Conv1dTransposedApi.c new file mode 100644 index 0000000..589351c --- /dev/null +++ b/src/userApi/layer/Conv1dTransposedApi.c @@ -0,0 +1,8 @@ +#define SOURCE_FILE "CONV1D_TRANSPOSED_API" + +/* Stub. Full implementation lands in Task 12. This file exists so + * Conv1dTransposedApi compiles as a library target for the CMake graph + * to discover; the headers above declare the functions but they will + * link-fail until Task 12 fills them in. */ + +#include "Conv1dTransposedApi.h" diff --git a/src/userApi/layer/LinearApi.c b/src/userApi/layer/LinearApi.c index 70c6b15..2b8bd15 100644 --- a/src/userApi/layer/LinearApi.c +++ b/src/userApi/layer/LinearApi.c @@ -2,7 +2,6 @@ #include #include -#include #include "Common.h" #include "Distributions.h" @@ -203,57 +202,6 @@ layer_t *linearLayerInit(linearInit_t *init, layerQuant_t *lq) { return layer; } -/*! Deep-copies a quantization_t and its qConfig. - * - * Returns NULL if `src` is NULL. Caller owns the returned allocation; free via: - * freeReservedMemory(result->qConfig); - * freeReservedMemory(result); - * - * qConfig size is dispatched by `src->type`. 
BOOL has no qConfig (aligned with - * the BOOL dtype added per the BOOL tensor spec). */ -static quantization_t *deepCopyQuantization(quantization_t *src) { - if (src == NULL) { - return NULL; - } - - quantization_t *dst = reserveMemory(sizeof(quantization_t)); - dst->type = src->type; - - size_t cfgSize = 0; - switch (src->type) { - case FLOAT32: - cfgSize = 0; - break; /* no qConfig */ - case INT32: - cfgSize = 0; - break; - case BOOL: - cfgSize = 0; - break; /* BOOL has no qConfig */ - case SYM_INT32: - cfgSize = sizeof(symInt32QConfig_t); - break; - case SYM: - cfgSize = sizeof(symQConfig_t); - break; - case ASYM: - cfgSize = sizeof(asymQConfig_t); - break; - default: - PRINT_ERROR("linearLayerInitOwning: cannot deep-copy quantization with unknown type %d", - (int)src->type); - exit(1); - } - - if (cfgSize == 0) { - dst->qConfig = NULL; - } else { - dst->qConfig = reserveMemory(cfgSize); - memcpy(dst->qConfig, src->qConfig, cfgSize); - } - return dst; -} - layer_t *linearLayerInitOwning(linearInit_t *init, layerQuant_t *lq) { validateLinearInit(init); bool hasBias = resolveLinearBias(init->bias); diff --git a/src/userApi/layer/Pool1dApi.c b/src/userApi/layer/Pool1dApi.c new file mode 100644 index 0000000..d743e54 --- /dev/null +++ b/src/userApi/layer/Pool1dApi.c @@ -0,0 +1,5 @@ +#define SOURCE_FILE "POOL1D_API" + +/* Stub. Full implementation lands in Tasks 15 and 16. */ + +#include "Pool1dApi.h" diff --git a/src/userApi/layer/ReluApi.c b/src/userApi/layer/ReluApi.c index 2deaccd..ace1232 100644 --- a/src/userApi/layer/ReluApi.c +++ b/src/userApi/layer/ReluApi.c @@ -2,7 +2,6 @@ #include #include /* exit */ -#include /* memcpy */ #include "Common.h" /* PRINT_ERROR */ #include "LayerQuant.h" @@ -38,47 +37,6 @@ void freeReluLayerLegacy(layer_t *reluLayer) { * New factory API — layerQuant_t profile (PR 1). 
* ========================================================================== */ -static quantization_t *reluDeepCopyQuantization(quantization_t *src) { - if (src == NULL) { - return NULL; - } - - quantization_t *dst = reserveMemory(sizeof(quantization_t)); - dst->type = src->type; - - size_t cfgSize = 0; - switch (src->type) { - case FLOAT32: - cfgSize = 0; - break; - case INT32: - cfgSize = 0; - break; - case BOOL: - cfgSize = 0; - break; - case SYM_INT32: - cfgSize = sizeof(symInt32QConfig_t); - break; - case SYM: - cfgSize = sizeof(symQConfig_t); - break; - case ASYM: - cfgSize = sizeof(asymQConfig_t); - break; - default: - PRINT_ERROR("reluLayerInitOwning: unknown quantization type %d", (int)src->type); - exit(1); - } - if (cfgSize == 0) { - dst->qConfig = NULL; - } else { - dst->qConfig = reserveMemory(cfgSize); - memcpy(dst->qConfig, src->qConfig, cfgSize); - } - return dst; -} - static void validateLayerQuantForRelu(layerQuant_t *lq) { if (lq == NULL) { PRINT_ERROR("reluLayerInit: lq pointer is NULL"); @@ -123,8 +81,8 @@ layer_t *reluLayerInitOwning(layerQuant_t *lq) { layerCfg->relu = cfg; layer->config = layerCfg; - cfg->forwardQ = reluDeepCopyQuantization(lq->forwardMath); - cfg->backwardQ = reluDeepCopyQuantization(lq->backwardMath); + cfg->forwardQ = deepCopyQuantization(lq->forwardMath); + cfg->backwardQ = deepCopyQuantization(lq->backwardMath); cfg->ownsQuantizations = true; return layer; diff --git a/src/userApi/layer/SoftmaxApi.c b/src/userApi/layer/SoftmaxApi.c index 2b95013..df4fe8e 100644 --- a/src/userApi/layer/SoftmaxApi.c +++ b/src/userApi/layer/SoftmaxApi.c @@ -4,7 +4,7 @@ #include "Softmax.h" #include "StorageApi.h" -layer_t *softmaxLayerInit(quantization_t *forwardQ, quantization_t *backwardQ) { +layer_t *softmaxLayerInitLegacy(quantization_t *forwardQ, quantization_t *backwardQ) { layer_t *softmaxLayer = reserveMemory(sizeof(layer_t)); softmaxLayer->type = SOFTMAX; @@ -15,12 +15,13 @@ layer_t *softmaxLayerInit(quantization_t *forwardQ, 
quantization_t *backwardQ) { softmaxConfig->forwardQ = forwardQ; softmaxConfig->backwardQ = backwardQ; + softmaxConfig->ownsQuantizations = false; softmaxLayer->config = layerConfig; return softmaxLayer; } -void freeSoftmaxLayer(layer_t *softmaxLayer) { +void freeSoftmaxLayerLegacy(layer_t *softmaxLayer) { freeReservedMemory(softmaxLayer->config->softmax); freeReservedMemory(softmaxLayer->config); freeReservedMemory(softmaxLayer); diff --git a/src/userApi/layer/include/Conv1dApi.h b/src/userApi/layer/include/Conv1dApi.h index 1048366..79abb33 100644 --- a/src/userApi/layer/include/Conv1dApi.h +++ b/src/userApi/layer/include/Conv1dApi.h @@ -4,7 +4,10 @@ #include "Kernel.h" #include "Layer.h" -/*! Initializes a 1D convolution layer with given parameters. +/* Legacy (pre-2026-05-15 factory API) — retained during PR 1/2 coexistence window. + * New code should use the conv1dInit_t-based factories declared in PR 2. */ + +/*! Legacy Conv1d factory. * * @param weights Weights with gradients * @param bias Optional bias parameter with gradients @@ -14,16 +17,57 @@ * @param biasGradQ Quantization for bias gradient calculation * @param propLossQ Quantization for prop loss calculation * - * @returns Pointer to initializes layer_t + * @returns Pointer to initialized layer_t */ -layer_t *conv1dLayerInit(parameter_t *weights, parameter_t *bias, kernel_t *kernel, - quantization_t *forwardQ, quantization_t *weightGradQ, - quantization_t *biasGradQ, quantization_t *propLossQ); +layer_t *conv1dLayerInitLegacy(parameter_t *weights, parameter_t *bias, kernel_t *kernel, + quantization_t *forwardQ, quantization_t *weightGradQ, + quantization_t *biasGradQ, quantization_t *propLossQ); + +/*! Frees a Conv1d layer built via the legacy factory. */ +void freeConv1dLayerLegacy(layer_t *conv1dLayer); + +#include "LayerCommon.h" +#include "LayerQuant.h" -/*! 
Frees 1D convolutional layer and all contained data structures recursively +_Static_assert(VALID == 0, + "paddingType_t::VALID must be enum value 0 so .padding zero-init defaults to VALID"); + +/*! Conv1d factory configuration. Build via designated initializer: * - * @param conv1dLayer Pointer to layer_t - */ + * conv1dLayerInit(&(conv1dInit_t){ + * .inChannels = 3, .outChannels = 16, .kernelSize = 5, + * .padding = SAME, .stride = 1, + * }, lq); + * + * REQUIRED fields fire PRINT_ERROR + exit(1) if zero. Defaults below are + * applied when the field is zero-init (compound-literal omission). */ +typedef struct conv1dInit { + /* REQUIRED */ + size_t inChannels; + size_t outChannels; + size_t kernelSize; + /* OPTIONAL — zero-init defaults */ + size_t stride; /* 0 → 1 */ + paddingType_t padding; /* 0 → VALID (enum value 0) */ + size_t dilation; /* 0 → 1 */ + size_t groups; /* 0 → 1 */ + bias_t bias; /* BIAS_DEFAULT (0) → resolves to true (PyTorch parity) */ +} conv1dInit_t; + +/*! Borrowing variant — factory allocates weights/bias/kernel internally + * and stores the four math `quantization_t*` from `lq` verbatim. Caller + * retains ownership of `lq` and the quantizations; `lq` may be a + * compound literal. */ +layer_t *conv1dLayerInit(conv1dInit_t *init, layerQuant_t *lq); + +/*! Owning variant — same as `conv1dLayerInit`, but additionally + * `deepCopyQuantization`s each of the four math quantizations. Caller + * can drop `lq` and the quantization_t's immediately. */ +layer_t *conv1dLayerInitOwning(conv1dInit_t *init, layerQuant_t *lq); + +/*! Tears down everything the factory allocated. Reads + * `config->ownsQuantizations` to decide whether to also free the four + * math quantizations and their qConfigs. 
*/ void freeConv1dLayer(layer_t *conv1dLayer); #endif // CONV1DAPI_H diff --git a/src/userApi/layer/include/Conv1dTransposedApi.h b/src/userApi/layer/include/Conv1dTransposedApi.h new file mode 100644 index 0000000..a24e89b --- /dev/null +++ b/src/userApi/layer/include/Conv1dTransposedApi.h @@ -0,0 +1,51 @@ +#ifndef CONV1D_TRANSPOSED_API_H +#define CONV1D_TRANSPOSED_API_H + +#include + +#include "Kernel.h" +#include "Layer.h" +#include "LayerCommon.h" +#include "LayerQuant.h" + +_Static_assert(VALID == 0, + "paddingType_t::VALID must be enum value 0 so .padding zero-init defaults to VALID"); + +/*! Conv1dTransposed factory configuration. Mirrors conv1dInit_t plus PyTorch's + * outputPadding parameter. Build via designated initializer: + * + * conv1dTransposedLayerInit(&(conv1dTransposedInit_t){ + * .inChannels = 16, .outChannels = 8, .kernelSize = 5, .stride = 5, + * }, lq); + * + * REQUIRED fields fire PRINT_ERROR + exit(1) if zero. Phase-1 contract: + * only VALID padding is supported (initConv1dTransposedConfigWithWeightsAndBias + * aborts on SAME). */ +typedef struct conv1dTransposedInit { + /* REQUIRED */ + size_t inChannels; + size_t outChannels; + size_t kernelSize; + /* OPTIONAL */ + size_t stride; /* 0 → 1 */ + paddingType_t padding; /* 0 → VALID. SAME is rejected by the internal layer in Phase 1. */ + size_t dilation; /* 0 → 1 */ + size_t groups; /* 0 → 1 */ + size_t outputPadding; /* PyTorch parity; default 0; must be < max(stride, dilation) */ + bias_t bias; /* BIAS_DEFAULT (0) → resolves to true */ +} conv1dTransposedInit_t; + +/*! Borrowing variant — allocates kernel, weights, bias; stores the four + * lq math quantizations verbatim. Caller retains ownership of lq. */ +layer_t *conv1dTransposedLayerInit(conv1dTransposedInit_t *init, layerQuant_t *lq); + +/*! Owning variant — additionally deep-copies the four math quantizations + * via deepCopyQuantization. 
*/ +layer_t *conv1dTransposedLayerInitOwning(conv1dTransposedInit_t *init, layerQuant_t *lq); + +/*! Tears down everything the factory allocated. Reads + * config->ownsQuantizations to decide whether to also free the four + * math quantizations and their qConfigs. */ +void freeConv1dTransposedLayer(layer_t *layer); + +#endif /* CONV1D_TRANSPOSED_API_H */ diff --git a/src/userApi/layer/include/Pool1dApi.h b/src/userApi/layer/include/Pool1dApi.h new file mode 100644 index 0000000..a1f77bd --- /dev/null +++ b/src/userApi/layer/include/Pool1dApi.h @@ -0,0 +1,67 @@ +#ifndef POOL1D_API_H +#define POOL1D_API_H + +#include + +#include "Kernel.h" +#include "Layer.h" +#include "LayerQuant.h" + +_Static_assert(VALID == 0, + "paddingType_t::VALID must be enum value 0 so .padding zero-init defaults to VALID"); + +/*! MaxPool1d factory configuration. + * + * Requires input geometry (inputChannels, inputLength) because the + * factory pre-allocates an argmaxIndices INT32 tensor sized for the + * layer's output shape. Batch size is hardcoded to 1 (the training + * loop iterates microbatch-by-microbatch in this framework). + * + * Usage: + * + * maxPool1dLayerInit(&(maxPool1dInit_t){ + * .kernelSize = 2, .stride = 2, + * .inputChannels = 16, .inputLength = 64, + * }, lq); + */ +typedef struct maxPool1dInit { + /* REQUIRED */ + size_t kernelSize; + size_t inputChannels; + size_t inputLength; + /* OPTIONAL — zero-init defaults */ + size_t stride; /* 0 → kernelSize (PyTorch pool convention) */ + paddingType_t padding; /* 0 → VALID */ + size_t dilation; /* 0 → 1 */ +} maxPool1dInit_t; + +/*! AvgPool1d factory configuration. No argmax tensor needed, hence no + * input geometry. Note: dilation field omitted because AvgPool1d + * arithmetic kernel does not support dilation. */ +typedef struct avgPool1dInit { + /* REQUIRED */ + size_t kernelSize; + /* OPTIONAL */ + size_t stride; /* 0 → kernelSize */ + paddingType_t padding; /* 0 → VALID */ +} avgPool1dInit_t; + +/*! 
Borrowing variant — allocates kernel and (for MaxPool) the argmax + * tensor; stores lq->forwardMath in forwardQ and lq->backwardMath in + * propLossQ. */ +layer_t *maxPool1dLayerInit(maxPool1dInit_t *init, layerQuant_t *lq); +layer_t *avgPool1dLayerInit(avgPool1dInit_t *init, layerQuant_t *lq); + +/*! Owning variant — additionally deep-copies forwardMath and + * backwardMath via deepCopyQuantization. */ +layer_t *maxPool1dLayerInitOwning(maxPool1dInit_t *init, layerQuant_t *lq); +layer_t *avgPool1dLayerInitOwning(avgPool1dInit_t *init, layerQuant_t *lq); + +/*! Tears down everything the factory allocated. For MaxPool, this + * includes the argmax tensor. Reads config->ownsQuantizations to + * decide whether to also free the two math quantizations and their + * qConfigs. */ +void freeMaxPool1dLayer(layer_t *layer); +void freeAvgPool1dLayer(layer_t *layer); + +#endif /* POOL1D_API_H */ diff --git a/src/userApi/layer/include/SoftmaxApi.h b/src/userApi/layer/include/SoftmaxApi.h index 5ea4d7c..beaa289 100644 --- a/src/userApi/layer/include/SoftmaxApi.h +++ b/src/userApi/layer/include/SoftmaxApi.h @@ -4,8 +4,8 @@ #include "Layer.h" #include "Tensor.h" -layer_t *softmaxLayerInit(quantization_t *forwardQ, quantization_t *backwardQ); - -void freeSoftmaxLayer(layer_t *softmaxLayer); +/* Legacy (pre-2026-05-15 factory API) — retained during PR 1/2 coexistence window. 
*/ +layer_t *softmaxLayerInitLegacy(quantization_t *forwardQ, quantization_t *backwardQ); +void freeSoftmaxLayerLegacy(layer_t *softmaxLayer); #endif // SOFTMAXAPI_H diff --git a/test/unit/layer/UnitTestConv1d.c b/test/unit/layer/UnitTestConv1d.c index f09556f..93252bb 100644 --- a/test/unit/layer/UnitTestConv1d.c +++ b/test/unit/layer/UnitTestConv1d.c @@ -66,7 +66,7 @@ static conv1dRunResult_t conv1dRunForward(conv1dFixtureSetup_t s, float *outputB r.q = quantizationInitFloat(); if (s.groups == 1) { - r.layer = conv1dLayerInit(r.weights, r.bias, &kernelStore, r.q, r.q, r.q, r.q); + r.layer = conv1dLayerInitLegacy(r.weights, r.bias, &kernelStore, r.q, r.q, r.q, r.q); } else { // Phase-2 will expose groups via UserAPI; here we go around the UserAPI. // All statics so their addresses remain valid after this function returns. @@ -104,7 +104,7 @@ void testConv1dForwardMultiChannelWithBias() { kernel_t kernel; initKernel(&kernel, 3, VALID, 1, 1); quantization_t *q = quantizationInitFloat(); - layer_t *conv1d = conv1dLayerInit(weights, bias, &kernel, q, q, q, q); + layer_t *conv1d = conv1dLayerInitLegacy(weights, bias, &kernel, q, q, q, q); size_t inputDims[] = {1, 3, 5}; tensor_t *input = @@ -131,7 +131,7 @@ void testConv1dForwardSingleChannelSingleBatch() { initKernel(&kernel, 2, VALID, 1, 1); quantization_t *q = quantizationInitFloat(); - layer_t *conv1d = conv1dLayerInit(weights, NULL, &kernel, q, q, q, q); + layer_t *conv1d = conv1dLayerInitLegacy(weights, NULL, &kernel, q, q, q, q); size_t inputDims[] = {1, 1, 4}; tensor_t *input = @@ -165,7 +165,7 @@ void testConv1dBackwardSingleChannelWithBias() { kernel_t kernel; initKernel(&kernel, 2, VALID, 1, 1); quantization_t *q = quantizationInitFloat(); - layer_t *conv1d = conv1dLayerInit(weights, bias, &kernel, q, q, q, q); + layer_t *conv1d = conv1dLayerInitLegacy(weights, bias, &kernel, q, q, q, q); size_t inputDims[] = {1, 1, 4}; tensor_t *input = @@ -210,7 +210,7 @@ void testConv1dBackwardSamePaddingSymmetric() { 
kernel_t kernel; initKernel(&kernel, 3, SAME, 1, 1); quantization_t *q = quantizationInitFloat(); - layer_t *conv1d = conv1dLayerInit(weights, NULL, &kernel, q, q, q, q); + layer_t *conv1d = conv1dLayerInitLegacy(weights, NULL, &kernel, q, q, q, q); size_t inputDims[] = {1, 1, 5}; tensor_t *input = diff --git a/test/unit/layer/UnitTestSoftmax.c b/test/unit/layer/UnitTestSoftmax.c index 8a25b33..087116b 100644 --- a/test/unit/layer/UnitTestSoftmax.c +++ b/test/unit/layer/UnitTestSoftmax.c @@ -34,7 +34,7 @@ void unitTestSoftmaxForwardFloat() { /* 3. Build the layer with shared float quantization. */ quantization_t *floatQ = quantizationInitFloat(); - layer_t *softmaxLayer = softmaxLayerInit(floatQ, floatQ); + layer_t *softmaxLayer = softmaxLayerInitLegacy(floatQ, floatQ); layerFunctions_t softmaxFns = layerFunctions[SOFTMAX]; softmaxFns.forward(softmaxLayer, input, output); @@ -45,7 +45,7 @@ void unitTestSoftmaxForwardFloat() { } /* 5. FREE. */ - freeSoftmaxLayer(softmaxLayer); + freeSoftmaxLayerLegacy(softmaxLayer); freeTensor(output); freeTensor(input); freeQuantization(floatQ); @@ -84,7 +84,7 @@ void unitTestSoftmaxForwardSymInt32() { /* 3. Shared SymInt32 quantization for the layer. */ quantization_t *symIntQ = quantizationInitSymInt32(HTE); - layer_t *softmaxLayer = softmaxLayerInit(symIntQ, symIntQ); + layer_t *softmaxLayer = softmaxLayerInitLegacy(symIntQ, symIntQ); layerFunctions_t softmaxFns = layerFunctions[SOFTMAX]; softmaxFns.forward(softmaxLayer, input, output); @@ -107,7 +107,7 @@ void unitTestSoftmaxForwardSymInt32() { /* 6. FREE. */ freeTensor(outputFloat); - freeSoftmaxLayer(softmaxLayer); + freeSoftmaxLayerLegacy(softmaxLayer); freeTensor(output); freeTensor(input); freeQuantization(symIntQ); @@ -159,7 +159,7 @@ void unitTestSoftmaxBackwardFloat() { /* 4. Build layer. 
*/ quantization_t *floatQ = quantizationInitFloat(); - layer_t *softmaxLayer = softmaxLayerInit(floatQ, floatQ); + layer_t *softmaxLayer = softmaxLayerInitLegacy(floatQ, floatQ); layerFunctions_t softmaxFns = layerFunctions[SOFTMAX]; softmaxFns.backward(softmaxLayer, input, loss, propLoss); @@ -170,7 +170,7 @@ void unitTestSoftmaxBackwardFloat() { } /* 6. FREE. */ - freeSoftmaxLayer(softmaxLayer); + freeSoftmaxLayerLegacy(softmaxLayer); freeTensor(propLoss); freeTensor(loss); freeTensor(input); @@ -223,7 +223,7 @@ void unitTestSoftmaxBackwardSymInt32() { /* 4. Build layer. */ quantization_t *symIntQ = quantizationInitSymInt32(HTE); - layer_t *softmaxLayer = softmaxLayerInit(symIntQ, symIntQ); + layer_t *softmaxLayer = softmaxLayerInitLegacy(symIntQ, symIntQ); layerFunctions_t softmaxFns = layerFunctions[SOFTMAX]; softmaxFns.backward(softmaxLayer, input, loss, propLoss); @@ -246,7 +246,7 @@ void unitTestSoftmaxBackwardSymInt32() { /* 7. FREE. */ freeTensor(propLossFloat); - freeSoftmaxLayer(softmaxLayer); + freeSoftmaxLayerLegacy(softmaxLayer); freeTensor(propLoss); freeTensor(loss); freeTensor(input); @@ -267,13 +267,13 @@ void testSoftmaxLayerInitAndFreeRoundTrip(void) { * sweep — this test asserts only that the round-trip completes * without a crash and that the layer was wired correctly. */ quantization_t *floatQ = quantizationInitFloat(); - layer_t *softmaxLayer = softmaxLayerInit(floatQ, floatQ); + layer_t *softmaxLayer = softmaxLayerInitLegacy(floatQ, floatQ); TEST_ASSERT_NOT_NULL(softmaxLayer); TEST_ASSERT_EQUAL_INT(SOFTMAX, softmaxLayer->type); TEST_ASSERT_NOT_NULL(softmaxLayer->config); TEST_ASSERT_NOT_NULL(softmaxLayer->config->softmax); - freeSoftmaxLayer(softmaxLayer); + freeSoftmaxLayerLegacy(softmaxLayer); /* floatQ is owned by the test; freeSoftmaxLayer must not have freed * it (quantization configs are externally owned and shared). 
*/ diff --git a/test/unit/loss_functions/UnitTestCrossEntropy.c b/test/unit/loss_functions/UnitTestCrossEntropy.c index 8cad28e..065dbb5 100644 --- a/test/unit/loss_functions/UnitTestCrossEntropy.c +++ b/test/unit/loss_functions/UnitTestCrossEntropy.c @@ -33,7 +33,7 @@ void unitTestCrossEntropySoftmaxBackward() { &softmaxOutputQ, NULL); quantization_t *floatQ = quantizationInitFloat(); - layer_t *softmaxLayer = softmaxLayerInit(floatQ, floatQ); + layer_t *softmaxLayer = softmaxLayerInitLegacy(floatQ, floatQ); layerFunctions_t softmaxFns = layerFunctions[SOFTMAX]; softmaxFns.forward(softmaxLayer, &logits, &softmaxOutput); @@ -70,7 +70,7 @@ void unitTestCrossEntropySoftmaxBackward() { } /* FREE. */ - freeSoftmaxLayer(softmaxLayer); + freeSoftmaxLayerLegacy(softmaxLayer); freeQuantization(floatQ); /* ASSERT: raw per-element gradient (p-y), no batch divisor. */ @@ -138,7 +138,7 @@ void testCrossEntropyForward_SumReturnsRawSum() { setTensorValues(&softmaxOutput, (uint8_t *)outputData, &outputShape, &outputQ, NULL); quantization_t *floatQ = quantizationInitFloat(); - layer_t *softmaxLayer = softmaxLayerInit(floatQ, floatQ); + layer_t *softmaxLayer = softmaxLayerInitLegacy(floatQ, floatQ); layerFunctions_t softmaxFns = layerFunctions[SOFTMAX]; softmaxFns.forward(softmaxLayer, &logits, &softmaxOutput); @@ -154,7 +154,7 @@ void testCrossEntropyForward_SumReturnsRawSum() { float capturedActual = crossEntropyForwardFloat(&softmaxOutput, &distribution, REDUCTION_SUM); - freeSoftmaxLayer(softmaxLayer); + freeSoftmaxLayerLegacy(softmaxLayer); freeQuantization(floatQ); /* SUM: same as the pre-existing forward value (raw -log probability sum). 
*/ diff --git a/test/unit/serial/UnitTestDeserialize.c b/test/unit/serial/UnitTestDeserialize.c index 34d749b..4641fe0 100644 --- a/test/unit/serial/UnitTestDeserialize.c +++ b/test/unit/serial/UnitTestDeserialize.c @@ -134,7 +134,7 @@ void testSerializeAndDeserializeModel() { layer_t *serialLinear1 = linearLayerInitLegacy(serialWeight1, serialBias1, serialLayerQ, serialLayerQ, serialLayerQ, serialLayerQ); - layer_t *serialSoftmax = softmaxLayerInit(serialLayerQ, serialLayerQ); + layer_t *serialSoftmax = softmaxLayerInitLegacy(serialLayerQ, serialLayerQ); layer_t *serialModel[] = {serialLinear0, serialRelu, serialLinear1, serialSoftmax}; size_t sizeModel = 4; @@ -168,7 +168,7 @@ void testSerializeAndDeserializeModel() { layer_t *deserialLinear1 = linearLayerInitLegacy(deserialWeight1, deserialBias1, deserialLayerQ, deserialLayerQ, deserialLayerQ, deserialLayerQ); - layer_t *deserialSoftmax = softmaxLayerInit(deserialLayerQ, deserialLayerQ); + layer_t *deserialSoftmax = softmaxLayerInitLegacy(deserialLayerQ, deserialLayerQ); layer_t *deserialModel[] = {deserialLinear0, deserialRelu, deserialLinear1, deserialSoftmax}; @@ -246,7 +246,7 @@ void testSerializeAndDeserializeModel() { /* FREE in reverse-init order. Layer free-functions release only the * wrapper; parameters and the shared layerQ are caller-managed (per * docs/CONVENTIONS.md "Test memory discipline"). 
*/ - freeSoftmaxLayer(deserialSoftmax); + freeSoftmaxLayerLegacy(deserialSoftmax); freeLinearLayerLegacy(deserialLinear1); freeParameter(deserialBias1); freeParameter(deserialWeight1); @@ -256,7 +256,7 @@ void testSerializeAndDeserializeModel() { freeParameter(deserialWeight0); freeQuantization(deserialLayerQ); - freeSoftmaxLayer(serialSoftmax); + freeSoftmaxLayerLegacy(serialSoftmax); freeLinearLayerLegacy(serialLinear1); freeParameter(serialBias1); freeParameter(serialWeight1); diff --git a/test/unit/userAPI/CMakeLists.txt b/test/unit/userAPI/CMakeLists.txt index 3153a87..4f0558b 100644 --- a/test/unit/userAPI/CMakeLists.txt +++ b/test/unit/userAPI/CMakeLists.txt @@ -19,13 +19,19 @@ add_elastic_ai_unit_test( LayerWeightsApi MORE_LIBS LinearApi + Conv1dApi + Conv1dTransposedApi QuantizationApi LayerQuant LayerCommon Quantization Rounding Linear + Conv1d + Conv1dTransposed + Kernel Tensor + TensorApi ) @@ -81,6 +87,51 @@ add_elastic_ai_unit_test( StorageApi ) +add_elastic_ai_unit_test( + LIB_UNDER_TEST + Conv1dApi + MORE_LIBS + LayerQuant + LayerCommon + QuantizationApi + Quantization + Rounding + Conv1d + Kernel + Tensor + TensorApi +) + +add_elastic_ai_unit_test( + LIB_UNDER_TEST + Conv1dTransposedApi + MORE_LIBS + LayerQuant + LayerCommon + QuantizationApi + Quantization + Rounding + Conv1dTransposed + Kernel + Tensor + TensorApi +) + +add_elastic_ai_unit_test( + LIB_UNDER_TEST + Pool1dApi + MORE_LIBS + LayerQuant + QuantizationApi + Quantization + Rounding + MaxPool1d + AvgPool1d + Kernel + Tensor + TensorApi +) + add_executable(UnitTestMultiLayerTraining UnitTestMultiLayerTraining.c) target_link_libraries(UnitTestMultiLayerTraining PRIVATE unity diff --git a/test/unit/userAPI/UnitTestConv1dApi.c b/test/unit/userAPI/UnitTestConv1dApi.c new file mode 100644 index 0000000..a605a9b --- /dev/null +++ b/test/unit/userAPI/UnitTestConv1dApi.c @@ -0,0 +1,147 @@ +#define SOURCE_FILE "UNIT_TEST_CONV1D_API" + +#include "Conv1d.h" +#include "Conv1dApi.h" +#include 
"Kernel.h" +#include "Layer.h" +#include "LayerCommon.h" +#include "LayerQuant.h" +#include "QuantizationApi.h" +#include "Tensor.h" +#include "unity.h" + +void setUp() {} +void tearDown() {} + +void testConv1dLayerInitBorrowingBuildsLayerWithCorrectShape(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = conv1dLayerInit( + &(conv1dInit_t){ + .inChannels = 3, + .outChannels = 4, + .kernelSize = 5, + .padding = VALID, + .stride = 1, + .dilation = 1, + .groups = 1, + .bias = BIAS_TRUE, + }, + &lq); + + TEST_ASSERT_NOT_NULL(layer); + TEST_ASSERT_EQUAL_INT(CONV1D, layer->type); + + conv1dConfig_t *cfg = layer->config->conv1d; + TEST_ASSERT_NOT_NULL(cfg); + TEST_ASSERT_FALSE(cfg->ownsQuantizations); + + /* Borrowing variant stores pointers verbatim */ + TEST_ASSERT_EQUAL_PTR(q, cfg->forwardQ); + TEST_ASSERT_EQUAL_PTR(q, cfg->weightGradQ); + TEST_ASSERT_EQUAL_PTR(q, cfg->biasGradQ); + TEST_ASSERT_EQUAL_PTR(q, cfg->propLossQ); + + /* Weights allocated with shape [outChannels, inChannels/groups, kernelSize] */ + TEST_ASSERT_NOT_NULL(cfg->weights); + tensor_t *weightTensor = cfg->weights->param; + TEST_ASSERT_NOT_NULL(weightTensor); + TEST_ASSERT_EQUAL_UINT(3, weightTensor->shape->numberOfDimensions); + TEST_ASSERT_EQUAL_UINT(4, weightTensor->shape->dimensions[0]); /* outChannels */ + TEST_ASSERT_EQUAL_UINT(3, weightTensor->shape->dimensions[1]); /* inChannels / groups */ + TEST_ASSERT_EQUAL_UINT(5, weightTensor->shape->dimensions[2]); /* kernelSize */ + + /* Bias allocated with shape [outChannels] */ + TEST_ASSERT_NOT_NULL(cfg->bias); + tensor_t *biasTensor = cfg->bias->param; + TEST_ASSERT_NOT_NULL(biasTensor); + TEST_ASSERT_EQUAL_UINT(1, biasTensor->shape->numberOfDimensions); + TEST_ASSERT_EQUAL_UINT(4, biasTensor->shape->dimensions[0]); + + /* Kernel populated from init struct */ + TEST_ASSERT_NOT_NULL(cfg->kernel); + TEST_ASSERT_EQUAL_UINT(5, cfg->kernel->size); + TEST_ASSERT_EQUAL_INT(VALID, 
cfg->kernel->paddingType); + TEST_ASSERT_EQUAL_UINT(1, cfg->kernel->stride); + TEST_ASSERT_EQUAL_UINT(1, cfg->kernel->dilation); + + /* groups defaulted to 1 explicitly via init */ + TEST_ASSERT_EQUAL_UINT(1, cfg->groups); + + freeConv1dLayer(layer); +} + +void testConv1dLayerInitBorrowingBiasDefaultResolvesToTrue(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = conv1dLayerInit( + &(conv1dInit_t){ + .inChannels = 1, + .outChannels = 2, + .kernelSize = 3, + /* .bias omitted → BIAS_DEFAULT (0) → resolves to true */ + }, + &lq); + + conv1dConfig_t *cfg = layer->config->conv1d; + TEST_ASSERT_NOT_NULL(cfg->bias); + + freeConv1dLayer(layer); +} + +void testConv1dLayerInitBorrowingBiasFalseLeavesBiasNull(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = conv1dLayerInit( + &(conv1dInit_t){ + .inChannels = 1, + .outChannels = 2, + .kernelSize = 3, + .bias = BIAS_FALSE, + }, + &lq); + + conv1dConfig_t *cfg = layer->config->conv1d; + TEST_ASSERT_NULL(cfg->bias); + + freeConv1dLayer(layer); +} + +void testConv1dLayerInitBorrowingPaddingDefaultIsValid(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = conv1dLayerInit( + &(conv1dInit_t){ + .inChannels = 1, + .outChannels = 1, + .kernelSize = 3, + /* .padding omitted → VALID (enum value 0) */ + /* .stride, .dilation, .groups omitted → 1 (resolved from 0) */ + }, + &lq); + + conv1dConfig_t *cfg = layer->config->conv1d; + TEST_ASSERT_EQUAL_INT(VALID, cfg->kernel->paddingType); + TEST_ASSERT_EQUAL_UINT(1, cfg->kernel->stride); + TEST_ASSERT_EQUAL_UINT(1, cfg->kernel->dilation); + TEST_ASSERT_EQUAL_UINT(1, cfg->groups); + + freeConv1dLayer(layer); +} + +int main(void) { + UNITY_BEGIN(); + RUN_TEST(testConv1dLayerInitBorrowingBuildsLayerWithCorrectShape); + 
RUN_TEST(testConv1dLayerInitBorrowingBiasDefaultResolvesToTrue); + RUN_TEST(testConv1dLayerInitBorrowingBiasFalseLeavesBiasNull); + RUN_TEST(testConv1dLayerInitBorrowingPaddingDefaultIsValid); + return UNITY_END(); +} diff --git a/test/unit/userAPI/UnitTestConv1dTransposedApi.c b/test/unit/userAPI/UnitTestConv1dTransposedApi.c new file mode 100644 index 0000000..4725fd0 --- /dev/null +++ b/test/unit/userAPI/UnitTestConv1dTransposedApi.c @@ -0,0 +1,127 @@ +#define SOURCE_FILE "UNIT_TEST_CONV1D_TRANSPOSED_API" + +#include "Conv1dTransposed.h" +#include "Conv1dTransposedApi.h" +#include "Kernel.h" +#include "Layer.h" +#include "LayerCommon.h" +#include "LayerQuant.h" +#include "QuantizationApi.h" +#include "Tensor.h" +#include "TensorApi.h" +#include "unity.h" + +void setUp() {} +void tearDown() {} + +void testConv1dTransposedLayerInitBorrowingBuildsLayerWithCorrectShape(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = conv1dTransposedLayerInit( + &(conv1dTransposedInit_t){ + .inChannels = 16, + .outChannels = 8, + .kernelSize = 5, + .stride = 5, + .padding = VALID, + .bias = BIAS_TRUE, + }, + &lq); + + TEST_ASSERT_NOT_NULL(layer); + TEST_ASSERT_EQUAL_INT(CONV1D_TRANSPOSED, layer->type); + + conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed; + TEST_ASSERT_NOT_NULL(cfg); + TEST_ASSERT_FALSE(cfg->ownsQuantizations); + + /* Borrowing variant stores pointers verbatim */ + TEST_ASSERT_EQUAL_PTR(q, cfg->forwardQ); + TEST_ASSERT_EQUAL_PTR(q, cfg->weightGradQ); + TEST_ASSERT_EQUAL_PTR(q, cfg->biasGradQ); + TEST_ASSERT_EQUAL_PTR(q, cfg->propLossQ); + + /* Weight shape: [inChannels, outChannels/groups, kernelSize] per Conv1dTransposed.h:12. + * Note SWAP from Conv1d. 
*/ + TEST_ASSERT_NOT_NULL(cfg->weights); + tensor_t *weightTensor = cfg->weights->param; + TEST_ASSERT_NOT_NULL(weightTensor); + TEST_ASSERT_EQUAL_UINT(3, weightTensor->shape->numberOfDimensions); + TEST_ASSERT_EQUAL_UINT(16, weightTensor->shape->dimensions[0]); /* inChannels */ + TEST_ASSERT_EQUAL_UINT(8, weightTensor->shape->dimensions[1]); /* outChannels / groups */ + TEST_ASSERT_EQUAL_UINT(5, weightTensor->shape->dimensions[2]); /* kernelSize */ + + /* Bias shape: [outChannels] */ + TEST_ASSERT_NOT_NULL(cfg->bias); + tensor_t *biasTensor = cfg->bias->param; + TEST_ASSERT_EQUAL_UINT(1, biasTensor->shape->numberOfDimensions); + TEST_ASSERT_EQUAL_UINT(8, biasTensor->shape->dimensions[0]); + + /* Kernel populated from init struct */ + TEST_ASSERT_NOT_NULL(cfg->kernel); + TEST_ASSERT_EQUAL_UINT(5, cfg->kernel->size); + TEST_ASSERT_EQUAL_INT(VALID, cfg->kernel->paddingType); + TEST_ASSERT_EQUAL_UINT(5, cfg->kernel->stride); + + /* groups + outputPadding defaulted to 1 / 0 */ + TEST_ASSERT_EQUAL_UINT(1, cfg->groups); + TEST_ASSERT_EQUAL_UINT(0, cfg->outputPadding); + + freeConv1dTransposedLayer(layer); + freeQuantization(q); +} + +void testConv1dTransposedLayerInitBorrowingBiasFalseLeavesBiasNull(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = conv1dTransposedLayerInit( + &(conv1dTransposedInit_t){ + .inChannels = 4, + .outChannels = 2, + .kernelSize = 3, + .bias = BIAS_FALSE, + }, + &lq); + + conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed; + TEST_ASSERT_NULL(cfg->bias); + + freeConv1dTransposedLayer(layer); + freeQuantization(q); +} + +void testConv1dTransposedLayerInitBorrowingOutputPaddingPropagatesToConfig(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = conv1dTransposedLayerInit( + &(conv1dTransposedInit_t){ + .inChannels = 4, + .outChannels = 2, + .kernelSize = 3, + .stride = 2, + 
.outputPadding = 1, + .bias = BIAS_TRUE, + }, + &lq); + + conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed; + TEST_ASSERT_EQUAL_UINT(1, cfg->outputPadding); + TEST_ASSERT_EQUAL_UINT(2, cfg->kernel->stride); + + freeConv1dTransposedLayer(layer); + freeQuantization(q); +} + +int main(void) { + UNITY_BEGIN(); + RUN_TEST(testConv1dTransposedLayerInitBorrowingBuildsLayerWithCorrectShape); + RUN_TEST(testConv1dTransposedLayerInitBorrowingBiasFalseLeavesBiasNull); + RUN_TEST(testConv1dTransposedLayerInitBorrowingOutputPaddingPropagatesToConfig); + return UNITY_END(); +} diff --git a/test/unit/userAPI/UnitTestFlattenIntegration.c b/test/unit/userAPI/UnitTestFlattenIntegration.c index 0203831..d4124c1 100644 --- a/test/unit/userAPI/UnitTestFlattenIntegration.c +++ b/test/unit/userAPI/UnitTestFlattenIntegration.c @@ -51,7 +51,7 @@ void testCalculateGradsSequential_WithFlattenFirst_DoesNotCrash(void) { layer_t *flatten = flattenLayerInit(); layer_t *linear = linearLayerInitLegacy(w0, b0, q, q, q, q); - layer_t *softmax = softmaxLayerInit(q, q); + layer_t *softmax = softmaxLayerInitLegacy(q, q); layer_t *model[3] = {flatten, linear, softmax}; /* Input [1, 2, 3] = 6 elements. 
*/ @@ -93,7 +93,7 @@ void testCalculateGradsSequential_WithFlattenFirst_DoesNotCrash(void) { freeTrainingStats(stats); freeTensor(label); freeTensor(input); - freeSoftmaxLayer(softmax); + freeSoftmaxLayerLegacy(softmax); freeLinearLayerLegacy(linear); freeFlattenLayer(flatten); freeParameter(b0); diff --git a/test/unit/userAPI/UnitTestLayerQuant.c b/test/unit/userAPI/UnitTestLayerQuant.c index ff3f919..8c37b10 100644 --- a/test/unit/userAPI/UnitTestLayerQuant.c +++ b/test/unit/userAPI/UnitTestLayerQuant.c @@ -2,6 +2,7 @@ #include "LayerQuant.h" #include "QuantizationApi.h" +#include "StorageApi.h" #include "unity.h" void setUp() {} @@ -31,9 +32,47 @@ void testLayerQuantInitUniformDoesNotMutateTheQuantization(void) { TEST_ASSERT_EQUAL_PTR(configBefore, q->qConfig); } +void testDeepCopyQuantizationReturnsNullForNullInput(void) { + TEST_ASSERT_NULL(deepCopyQuantization(NULL)); +} + +void testDeepCopyQuantizationFloat32ReturnsFreshAllocationWithNullQConfig(void) { + quantization_t *src = quantizationInitFloat(); + quantization_t *dst = deepCopyQuantization(src); + + TEST_ASSERT_NOT_NULL(dst); + TEST_ASSERT_NOT_EQUAL(src, dst); /* fresh allocation */ + TEST_ASSERT_EQUAL_INT(FLOAT32, dst->type); + TEST_ASSERT_NULL(dst->qConfig); + + freeReservedMemory(dst->qConfig); + freeReservedMemory(dst); +} + +void testDeepCopyQuantizationSymInt32DuplicatesQConfigBytes(void) { + quantization_t *src = quantizationInitSymInt32(HTE); + quantization_t *dst = deepCopyQuantization(src); + + TEST_ASSERT_NOT_NULL(dst); + TEST_ASSERT_NOT_EQUAL(src, dst); + TEST_ASSERT_EQUAL_INT(SYM_INT32, dst->type); + TEST_ASSERT_NOT_NULL(dst->qConfig); + TEST_ASSERT_NOT_EQUAL(src->qConfig, dst->qConfig); + + symInt32QConfig_t *srcCfg = (symInt32QConfig_t *)src->qConfig; + symInt32QConfig_t *dstCfg = (symInt32QConfig_t *)dst->qConfig; + TEST_ASSERT_EQUAL_MEMORY(srcCfg, dstCfg, sizeof(symInt32QConfig_t)); + + freeReservedMemory(dst->qConfig); + freeReservedMemory(dst); +} + int main(void) { UNITY_BEGIN(); 
RUN_TEST(testLayerQuantInitUniformSetsAllFourSlotsToTheSamePointer); RUN_TEST(testLayerQuantInitUniformDoesNotMutateTheQuantization); + RUN_TEST(testDeepCopyQuantizationReturnsNullForNullInput); + RUN_TEST(testDeepCopyQuantizationFloat32ReturnsFreshAllocationWithNullQConfig); + RUN_TEST(testDeepCopyQuantizationSymInt32DuplicatesQConfigBytes); return UNITY_END(); } diff --git a/test/unit/userAPI/UnitTestLayerWeightsApi.c b/test/unit/userAPI/UnitTestLayerWeightsApi.c index d6b3318..b8330d7a 100644 --- a/test/unit/userAPI/UnitTestLayerWeightsApi.c +++ b/test/unit/userAPI/UnitTestLayerWeightsApi.c @@ -1,5 +1,9 @@ #define SOURCE_FILE "UNIT_TEST_LAYER_WEIGHTS_API" +#include "Conv1d.h" +#include "Conv1dApi.h" +#include "Conv1dTransposed.h" +#include "Conv1dTransposedApi.h" #include "LayerCommon.h" #include "LayerQuant.h" #include "LayerWeightsApi.h" @@ -7,6 +11,7 @@ #include "LinearApi.h" #include "QuantizationApi.h" #include "Tensor.h" +#include "TensorApi.h" #include "unity.h" void setUp() {} @@ -64,9 +69,108 @@ void testLayerLoadWeightsLinearNoBiasAcceptsNullBiasData(void) { freeLinearLayer(layer); } +void testLayerLoadWeightsConv1dOverwritesWeightAndBiasTensors(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = conv1dLayerInit( + &(conv1dInit_t){ + .inChannels = 2, + .outChannels = 3, + .kernelSize = 4, + .bias = BIAS_TRUE, + }, + &lq); + + /* Weight tensor: [outChannels=3, inChannels/groups=2, K=4] → 24 elems + * Bias tensor: [outChannels=3] → 3 elems */ + float weightData[24] = { + 1.f, 2.f, 3.f, 4.f, 5.f, 6.f, 7.f, 8.f, 9.f, 10.f, 11.f, 12.f, + 13.f, 14.f, 15.f, 16.f, 17.f, 18.f, 19.f, 20.f, 21.f, 22.f, 23.f, 24.f, + }; + float biasData[3] = {-1.f, -2.f, -3.f}; + + layerLoadWeights(layer, weightData, biasData); + + conv1dConfig_t *cfg = layer->config->conv1d; + float *loadedWeights = (float *)cfg->weights->param->data; + float *loadedBias = (float *)cfg->bias->param->data; + + 
TEST_ASSERT_EQUAL_FLOAT_ARRAY(weightData, loadedWeights, 24); + TEST_ASSERT_EQUAL_FLOAT_ARRAY(biasData, loadedBias, 3); + + freeConv1dLayer(layer); + freeQuantization(q); +} + +void testLayerLoadWeightsConv1dNoBiasAcceptsNullBiasData(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = conv1dLayerInit( + &(conv1dInit_t){ + .inChannels = 1, + .outChannels = 1, + .kernelSize = 3, + .bias = BIAS_FALSE, + }, + &lq); + + float weightData[3] = {0.5f, 0.25f, 0.125f}; + layerLoadWeights(layer, weightData, NULL); + + conv1dConfig_t *cfg = layer->config->conv1d; + float *loadedWeights = (float *)cfg->weights->param->data; + TEST_ASSERT_EQUAL_FLOAT_ARRAY(weightData, loadedWeights, 3); + TEST_ASSERT_NULL(cfg->bias); + + freeConv1dLayer(layer); + freeQuantization(q); +} + +void testLayerLoadWeightsConv1dTransposedOverwritesWeightAndBiasTensors(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = conv1dTransposedLayerInit( + &(conv1dTransposedInit_t){ + .inChannels = 4, + .outChannels = 2, + .kernelSize = 3, + .bias = BIAS_TRUE, + }, + &lq); + + /* Weight tensor: [inChannels=4, outChannels/groups=2, K=3] → 24 elems. + * NOTE the SWAP relative to Conv1d. 
*/ + float weightData[24] = {0}; + for (size_t i = 0; i < 24; i++) { + weightData[i] = (float)(i + 100); + } + float biasData[2] = {-10.f, -20.f}; + + layerLoadWeights(layer, weightData, biasData); + + conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed; + float *loadedWeights = (float *)cfg->weights->param->data; + float *loadedBias = (float *)cfg->bias->param->data; + + TEST_ASSERT_EQUAL_FLOAT_ARRAY(weightData, loadedWeights, 24); + TEST_ASSERT_EQUAL_FLOAT_ARRAY(biasData, loadedBias, 2); + + freeConv1dTransposedLayer(layer); + freeQuantization(q); +} + int main(void) { UNITY_BEGIN(); RUN_TEST(testLayerLoadWeightsLinearOverwritesWeightAndBiasTensors); RUN_TEST(testLayerLoadWeightsLinearNoBiasAcceptsNullBiasData); + RUN_TEST(testLayerLoadWeightsConv1dOverwritesWeightAndBiasTensors); + RUN_TEST(testLayerLoadWeightsConv1dNoBiasAcceptsNullBiasData); + RUN_TEST(testLayerLoadWeightsConv1dTransposedOverwritesWeightAndBiasTensors); return UNITY_END(); } diff --git a/test/unit/userAPI/UnitTestMnistSmoke.c b/test/unit/userAPI/UnitTestMnistSmoke.c index 3b312fb..65c4dbc 100644 --- a/test/unit/userAPI/UnitTestMnistSmoke.c +++ b/test/unit/userAPI/UnitTestMnistSmoke.c @@ -162,7 +162,7 @@ static void buildModel(layer_t **model, quantization_t **q_out) { parameter_t *b1 = parameterInit(b1Param, b1Grad); model[2] = linearLayerInitLegacy(w1, b1, q, q, q, q); - model[3] = softmaxLayerInit(q, q); + model[3] = softmaxLayerInitLegacy(q, q); } static size_t cbInvocations; @@ -215,7 +215,7 @@ void testMnistSmoke_FullTrainingPipelineReducesLoss() { * NOTE: freeOptimSgdM cascades to all model parameters via freeParameter. * Do NOT also call freeParameter on w0/b0/w1/b1 — would be a double-free. 
*/ freeOptimSgdM(sgd); - freeSoftmaxLayer(model[3]); + freeSoftmaxLayerLegacy(model[3]); freeLinearLayerLegacy(model[2]); freeReluLayerLegacy(model[1]); freeLinearLayerLegacy(model[0]); @@ -275,7 +275,7 @@ void testMnistSmoke_SnprintfGmtimeRBetweenSetupAndTrainingRun_NoSilentExit() { char capturedFirstChar = buf[0]; freeOptimSgdM(sgd); - freeSoftmaxLayer(model[3]); + freeSoftmaxLayerLegacy(model[3]); freeLinearLayerLegacy(model[2]); freeReluLayerLegacy(model[1]); freeLinearLayerLegacy(model[0]); diff --git a/test/unit/userAPI/UnitTestMultiLayerTraining.c b/test/unit/userAPI/UnitTestMultiLayerTraining.c index 9aec230..30f7460 100644 --- a/test/unit/userAPI/UnitTestMultiLayerTraining.c +++ b/test/unit/userAPI/UnitTestMultiLayerTraining.c @@ -86,7 +86,7 @@ void testMultiLayerBackward_WithCrossEntropy_DoesNotCrash() { parameter_t *b1 = parameterInit(b1Param, b1Grad); layer_t *linear1 = linearLayerInitLegacy(w1, b1, q, q, q, q); - layer_t *softmax = softmaxLayerInit(q, q); + layer_t *softmax = softmaxLayerInitLegacy(q, q); layer_t *model[] = {linear0, relu, linear1, softmax}; size_t sizeModel = 4; @@ -126,7 +126,7 @@ void testMultiLayerBackward_WithCrossEntropy_DoesNotCrash() { freeTrainingStats(stats); freeTensor(label); freeTensor(input); - freeSoftmaxLayer(softmax); + freeSoftmaxLayerLegacy(softmax); freeLinearLayerLegacy(linear1); freeParameter(b1); freeParameter(w1); @@ -205,7 +205,7 @@ void testMultiLayerBackward_WithManualInit_DoesNotCrash() { parameter_t *b1 = parameterInit(b1Param, b1Grad); layer_t *linear1 = linearLayerInitLegacy(w1, b1, q, q, q, q); - layer_t *softmax = softmaxLayerInit(q, q); + layer_t *softmax = softmaxLayerInitLegacy(q, q); layer_t *model[] = {linear0, relu, linear1, softmax}; size_t sizeModel = 4; @@ -257,7 +257,7 @@ void testMultiLayerBackward_WithManualInit_DoesNotCrash() { freeTrainingStats(stats); freeTensor(label); freeTensor(input); - freeSoftmaxLayer(softmax); + freeSoftmaxLayerLegacy(softmax); freeLinearLayerLegacy(linear1); 
freeParameter(b1); freeParameter(w1); @@ -334,7 +334,7 @@ void testMultiLayerTraining_MultipleSteps_GradsAccumulate() { parameter_t *b1 = parameterInit(b1Param, b1Grad); layer_t *linear1 = linearLayerInitLegacy(w1, b1, q, q, q, q); - layer_t *softmax = softmaxLayerInit(q, q); + layer_t *softmax = softmaxLayerInitLegacy(q, q); layer_t *model[] = {linear0, relu, linear1, softmax}; size_t sizeModel = 4; @@ -389,7 +389,7 @@ void testMultiLayerTraining_MultipleSteps_GradsAccumulate() { freeTensor(label); freeTensor(input); freeOptimSgdM(sgd); - freeSoftmaxLayer(softmax); + freeSoftmaxLayerLegacy(softmax); freeLinearLayerLegacy(linear1); freeReluLayerLegacy(relu); freeLinearLayerLegacy(linear0); diff --git a/test/unit/userAPI/UnitTestPool1dApi.c b/test/unit/userAPI/UnitTestPool1dApi.c new file mode 100644 index 0000000..c9a8e03 --- /dev/null +++ b/test/unit/userAPI/UnitTestPool1dApi.c @@ -0,0 +1,219 @@ +#define SOURCE_FILE "UNIT_TEST_POOL1D_API" + +#include "AvgPool1d.h" +#include "Kernel.h" +#include "Layer.h" +#include "LayerQuant.h" +#include "MaxPool1d.h" +#include "Pool1dApi.h" +#include "QuantizationApi.h" +#include "Tensor.h" +#include "TensorApi.h" +#include "unity.h" + +void setUp() {} +void tearDown() {} + +/* ============================================================================ + * MaxPool1d + * ========================================================================== */ + +void testMaxPool1dLayerInitBorrowingBuildsLayerWithKernelAndArgmax(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + /* For K=2 S=2 VALID on inputLength=64, outputLength = (64 - 2)/2 + 1 = 32 */ + layer_t *layer = maxPool1dLayerInit( + &(maxPool1dInit_t){ + .kernelSize = 2, + .stride = 2, + .inputChannels = 16, + .inputLength = 64, + }, + &lq); + + TEST_ASSERT_NOT_NULL(layer); + TEST_ASSERT_EQUAL_INT(MAXPOOL1D, layer->type); + + maxPool1dConfig_t *cfg = layer->config->maxPool1d; + TEST_ASSERT_NOT_NULL(cfg); + 
TEST_ASSERT_FALSE(cfg->ownsQuantizations); + + TEST_ASSERT_EQUAL_PTR(q, cfg->forwardQ); + TEST_ASSERT_EQUAL_PTR(q, cfg->propLossQ); + + /* Kernel correctness */ + TEST_ASSERT_NOT_NULL(cfg->kernel); + TEST_ASSERT_EQUAL_UINT(2, cfg->kernel->size); + TEST_ASSERT_EQUAL_INT(VALID, cfg->kernel->paddingType); + TEST_ASSERT_EQUAL_UINT(2, cfg->kernel->stride); + TEST_ASSERT_EQUAL_UINT(1, cfg->kernel->dilation); + + /* Argmax tensor shape: [1, inputChannels, outputLength] = [1, 16, 32] */ + TEST_ASSERT_NOT_NULL(cfg->argmaxIndices); + TEST_ASSERT_EQUAL_UINT(3, cfg->argmaxIndices->shape->numberOfDimensions); + TEST_ASSERT_EQUAL_UINT(1, cfg->argmaxIndices->shape->dimensions[0]); + TEST_ASSERT_EQUAL_UINT(16, cfg->argmaxIndices->shape->dimensions[1]); + TEST_ASSERT_EQUAL_UINT(32, cfg->argmaxIndices->shape->dimensions[2]); + TEST_ASSERT_EQUAL_INT(INT32, cfg->argmaxIndices->quantization->type); + + freeMaxPool1dLayer(layer); + freeQuantization(q); +} + +void testMaxPool1dLayerInitBorrowingStrideDefaultsToKernelSize(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + /* stride omitted → defaults to kernelSize per PyTorch convention */ + layer_t *layer = maxPool1dLayerInit( + &(maxPool1dInit_t){ + .kernelSize = 4, + .inputChannels = 1, + .inputLength = 16, + }, + &lq); + + maxPool1dConfig_t *cfg = layer->config->maxPool1d; + TEST_ASSERT_EQUAL_UINT(4, cfg->kernel->size); + TEST_ASSERT_EQUAL_UINT(4, cfg->kernel->stride); + /* outputLength = (16 - 4)/4 + 1 = 4 */ + TEST_ASSERT_EQUAL_UINT(4, cfg->argmaxIndices->shape->dimensions[2]); + + freeMaxPool1dLayer(layer); + freeQuantization(q); +} + +void testMaxPool1dLayerInitOwningDeepCopiesTwoQuantizations(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = maxPool1dLayerInitOwning( + &(maxPool1dInit_t){ + .kernelSize = 2, + .stride = 2, + .inputChannels = 4, + .inputLength = 8, + }, + &lq); + + 
maxPool1dConfig_t *cfg = layer->config->maxPool1d; + TEST_ASSERT_NOT_EQUAL(q, cfg->forwardQ); + TEST_ASSERT_NOT_EQUAL(q, cfg->propLossQ); + TEST_ASSERT_EQUAL_INT(q->type, cfg->forwardQ->type); + TEST_ASSERT_TRUE(cfg->ownsQuantizations); + + freeMaxPool1dLayer(layer); + freeQuantization(q); +} + +void testMaxPool1dLayerInitOwningRepeatedBuildFreeNoLeak(void) { + for (int i = 0; i < 5; i++) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = maxPool1dLayerInitOwning( + &(maxPool1dInit_t){ + .kernelSize = 2, + .stride = 2, + .inputChannels = 4, + .inputLength = 8, + }, + &lq); + + freeMaxPool1dLayer(layer); + freeQuantization(q); + } + TEST_PASS(); +} + +/* ============================================================================ + * AvgPool1d + * ========================================================================== */ + +void testAvgPool1dLayerInitBorrowingBuildsLayerWithKernel(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = avgPool1dLayerInit( + &(avgPool1dInit_t){ + .kernelSize = 5, + .stride = 5, + }, + &lq); + + TEST_ASSERT_NOT_NULL(layer); + TEST_ASSERT_EQUAL_INT(AVGPOOL1D, layer->type); + + avgPool1dConfig_t *cfg = layer->config->avgPool1d; + TEST_ASSERT_NOT_NULL(cfg); + TEST_ASSERT_FALSE(cfg->ownsQuantizations); + + TEST_ASSERT_EQUAL_PTR(q, cfg->forwardQ); + TEST_ASSERT_EQUAL_PTR(q, cfg->propLossQ); + + TEST_ASSERT_NOT_NULL(cfg->kernel); + TEST_ASSERT_EQUAL_UINT(5, cfg->kernel->size); + TEST_ASSERT_EQUAL_INT(VALID, cfg->kernel->paddingType); + TEST_ASSERT_EQUAL_UINT(5, cfg->kernel->stride); + + freeAvgPool1dLayer(layer); + freeQuantization(q); +} + +void testAvgPool1dLayerInitBorrowingStrideDefaultsToKernelSize(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = avgPool1dLayerInit( + &(avgPool1dInit_t){ + .kernelSize = 3, + /* 
stride omitted → kernelSize=3 */ + }, + &lq); + + avgPool1dConfig_t *cfg = layer->config->avgPool1d; + TEST_ASSERT_EQUAL_UINT(3, cfg->kernel->stride); + + freeAvgPool1dLayer(layer); + freeQuantization(q); +} + +void testAvgPool1dLayerInitOwningDeepCopiesTwoQuantizations(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = avgPool1dLayerInitOwning( + &(avgPool1dInit_t){ + .kernelSize = 2, + .stride = 2, + }, + &lq); + + avgPool1dConfig_t *cfg = layer->config->avgPool1d; + TEST_ASSERT_NOT_EQUAL(q, cfg->forwardQ); + TEST_ASSERT_NOT_EQUAL(q, cfg->propLossQ); + TEST_ASSERT_TRUE(cfg->ownsQuantizations); + + freeAvgPool1dLayer(layer); + freeQuantization(q); +} + +int main(void) { + UNITY_BEGIN(); + RUN_TEST(testMaxPool1dLayerInitBorrowingBuildsLayerWithKernelAndArgmax); + RUN_TEST(testMaxPool1dLayerInitBorrowingStrideDefaultsToKernelSize); + RUN_TEST(testMaxPool1dLayerInitOwningDeepCopiesTwoQuantizations); + RUN_TEST(testMaxPool1dLayerInitOwningRepeatedBuildFreeNoLeak); + RUN_TEST(testAvgPool1dLayerInitBorrowingBuildsLayerWithKernel); + RUN_TEST(testAvgPool1dLayerInitBorrowingStrideDefaultsToKernelSize); + RUN_TEST(testAvgPool1dLayerInitOwningDeepCopiesTwoQuantizations); + return UNITY_END(); +} From 1bbf2b467e806c2e76e169c6da84ff06a6b6ef4b Mon Sep 17 00:00:00 2001 From: Leo Buron Date: Fri, 15 May 2026 22:03:42 +0200 Subject: [PATCH 2/4] feat(layer): implement softmaxLayerInit Borrowing + Owning + new freeSoftmaxLayer MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Factory takes layerQuant_t* and stores .forwardMath / .backwardMath as the layer's forward/backward quantizations. Owning deep-copies both via deepCopyQuantization. freeSoftmaxLayer reads ownsQuantizations to decide whether to also tear down the two quantizations and qConfigs. 
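A minimal sketch of the borrow-versus-own teardown described above. It uses stand-in types rather than the project's real quantization_t and softmaxConfig_t; demoQuant_t, demoSoftmaxConfig_t, demoCopyQuant, demoSoftmaxInit, and demoSoftmaxFree are hypothetical names for illustration only, and plain calloc/free stand in for the project's own allocation helpers.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real quantization_t and softmaxConfig_t differ. */
typedef struct demoQuant {
    int type;
} demoQuant_t;

typedef struct demoSoftmaxConfig {
    demoQuant_t *forwardQ;
    demoQuant_t *backwardQ;
    bool ownsQuantizations;
} demoSoftmaxConfig_t;

/* Allocates a private copy of one quantization descriptor. */
static demoQuant_t *demoCopyQuant(const demoQuant_t *src) {
    demoQuant_t *copy = calloc(1, sizeof(*copy));
    if (copy == NULL) {
        exit(1);
    }
    copy->type = src->type;
    return copy;
}

/* owning == false: store the caller's pointers verbatim (Borrowing).
 * owning == true:  store private copies and record that teardown must free them. */
static demoSoftmaxConfig_t *demoSoftmaxInit(demoQuant_t *fwd, demoQuant_t *bwd, bool owning) {
    demoSoftmaxConfig_t *cfg = calloc(1, sizeof(*cfg)); /* flag starts out false */
    if (cfg == NULL) {
        exit(1);
    }
    cfg->forwardQ = owning ? demoCopyQuant(fwd) : fwd;
    cfg->backwardQ = owning ? demoCopyQuant(bwd) : bwd;
    cfg->ownsQuantizations = owning;
    return cfg;
}

/* Frees the quantizations only when the config owns them. */
static void demoSoftmaxFree(demoSoftmaxConfig_t *cfg) {
    if (cfg == NULL) {
        return;
    }
    if (cfg->ownsQuantizations) {
        free(cfg->forwardQ);
        free(cfg->backwardQ);
    }
    free(cfg);
}
```

In the Borrowing case the caller remains responsible for freeing its own quantizations after the layer is gone; in the Owning case the caller may release them as soon as the factory returns.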
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5

feat(layer): add ownsQuantizations flag to five internal layer configs

Mirrors the field PR 1 added to linearConfig_t and reluConfig_t. Foundation for the new factory API in subsequent commits: each new *LayerInitOwning sets the flag to true and the canonical free*Layer branches on it. Calloc-backed allocation makes the default false, which preserves the existing borrowing semantics for legacy callers.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.3

feat(layer): implement conv1dLayerInit Borrowing variant + freeConv1dLayer

Factory allocates kernel, weight, and bias internally. KAIMING_UNIFORM weights / ZEROS bias (calloc-implicit). Fills the four math quantization slots from lq verbatim (forwardQ from forwardMath; weightGradQ, biasGradQ, and propLossQ from backwardMath); sets ownsQuantizations=false. freeConv1dLayer tears down parameters + kernel unconditionally and the quantizations only when ownsQuantizations=true (defensive dedup against pointer aliasing). Fixes the pre-existing layer->config leak that the legacy free* path still has.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5.1, 5.3

test(layer): failing tests for conv1dLayerInitOwning

Verifies deep copy of all four quantization_t into fresh allocations, ownsQuantizations=true, and clean teardown without leaks.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2

feat(layer): implement conv1dLayerInitOwning (deep-copy variant)

Factory deep-copies lq's math quantizations into the four config slots via the shared deepCopyQuantization helper (forwardMath once, backwardMath three times). Always four separate copies (no aliasing), keeping freeConv1dLayer simple. Caller can drop lq + all four quantizations immediately after the call.
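The "defensive dedup against pointer aliasing" can be illustrated with a small sketch. It releases each distinct pointer among a set of slots exactly once, which matters when several slots were filled from the same source pointer. demoFreeDistinct is a hypothetical helper, not part of the patch, and it uses plain free rather than the project's freeReservedMemory. Unlike the chained comparisons in freeConv1dLayer, this sketch clears duplicates before freeing anything.

```c
#include <stddef.h>
#include <stdlib.h>

/* Frees every distinct non-NULL pointer in slots[0..count) exactly once,
 * sets all slots to NULL, and returns how many frees were performed.
 * Hypothetical helper for illustration only. */
static size_t demoFreeDistinct(void **slots, size_t count) {
    /* Pass 1: clear later duplicates while every pointer is still valid. */
    for (size_t i = 0; i < count; i++) {
        for (size_t j = i + 1; j < count; j++) {
            if (slots[j] == slots[i]) {
                slots[j] = NULL;
            }
        }
    }
    /* Pass 2: free whatever remains; each survivor is unique. */
    size_t freed = 0;
    for (size_t i = 0; i < count; i++) {
        if (slots[i] != NULL) {
            free(slots[i]);
            slots[i] = NULL;
            freed++;
        }
    }
    return freed;
}
```

Clearing duplicates first means no comparison ever reads a pointer that has already been freed. With four separate copies, as the Owning factory produces, the first pass finds nothing to clear and all four slots are freed.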
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2

feat(layer): implement conv1dTransposedLayerInit Borrowing variant + freeConv1dTransposedLayer

Allocates kernel, weights ([inChannels, outChannels/groups, kernelSize]), optional bias. KAIMING_UNIFORM weight init / ZEROS bias. Fills the four math quantization slots from lq verbatim (forwardQ from forwardMath; weightGradQ, biasGradQ, and propLossQ from backwardMath). freeConv1dTransposedLayer tears down parameters + kernel unconditionally and quantizations only when ownsQuantizations=true.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5.1, 5.3

test(layer): failing tests for conv1dTransposedLayerInitOwning

Verifies deep-copy of the four quantization_t into fresh allocations, ownsQuantizations=true, and clean teardown.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2

feat(layer): implement conv1dTransposedLayerInitOwning (deep-copy variant)

Deep-copies lq's math quantizations into the four config slots via deepCopyQuantization (forwardMath once, backwardMath three times). Always four separate copies (no aliasing). Caller can drop lq + all four quantizations immediately after the call.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2

feat(layer): implement maxPool1dLayerInit Borrowing + Owning + freeMaxPool1dLayer

Factory pre-allocates kernel + argmaxIndices INT32 tensor (shape [1, inputChannels, outputLength]). outputLength derived via computePool1dOutputLength replicating the geometry rule from windowGeometry1dCalc. Stride defaults to kernelSize (PyTorch convention). Owning deep-copies forwardMath + backwardMath into the config's forwardQ + propLossQ slots.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3 (split), 4, 5

feat(layer): implement avgPool1dLayerInit Borrowing + Owning + freeAvgPool1dLayer

Factory pre-allocates kernel only (no argmax). Stride defaults to kernelSize. Owning deep-copies forwardMath + backwardMath into the forwardQ + propLossQ slots.
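The output-length rule that the maxPool1d factory uses to size argmaxIndices can be sketched as a standalone function. This is an illustrative rewrite under stated assumptions, not the patch's computePool1dOutputLength: demoPool1dOutputLength is a hypothetical name, it takes a bool instead of paddingType_t, it folds in the stride-defaults-to-kernelSize and dilation-defaults-to-1 behaviour, and it returns false where the real helper prints an error and exits.

```c
#include <stdbool.h>
#include <stddef.h>

/* VALID: outLength = (inputLength - dilation*(kernelSize - 1) - 1) / stride + 1
 * SAME:  outLength = ceil(inputLength / stride)
 * Returns false for a zero kernelSize or when the dilated kernel does not fit
 * into the input (VALID only). Hypothetical helper for illustration only. */
static bool demoPool1dOutputLength(bool samePadding, size_t inputLength, size_t kernelSize,
                                   size_t dilation, size_t stride, size_t *outLength) {
    if (kernelSize == 0) {
        return false;
    }
    if (stride == 0) {
        stride = kernelSize; /* pool convention: omitted stride means kernelSize */
    }
    if (dilation == 0) {
        dilation = 1; /* assumed zero-init default */
    }
    if (samePadding) {
        *outLength = (inputLength + stride - 1) / stride; /* integer ceil */
        return true;
    }
    size_t effectiveKernel = dilation * (kernelSize - 1) + 1;
    if (effectiveKernel > inputLength) {
        return false;
    }
    *outLength = (inputLength - effectiveKernel) / stride + 1;
    return true;
}
```

For the geometry used in the unit tests, kernelSize=2 and stride=2 on inputLength=64 gives (64 - 2)/2 + 1 = 32, and kernelSize=4 with omitted stride on inputLength=16 gives (16 - 4)/4 + 1 = 4.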
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5 test(layer): failing tests for new softmaxLayerInit Borrowing + Owning Two tests: Borrowing stores lq pointers verbatim with ownsQuantizations=false; Owning deep-copies them with ownsQuantizations=true. Fails at link until impl lands. Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 4 --- src/layer/include/AvgPool1d.h | 2 + src/layer/include/Conv1d.h | 2 + src/layer/include/Conv1dTransposed.h | 2 + src/layer/include/MaxPool1d.h | 2 + src/layer/include/Softmax.h | 3 + src/userApi/layer/CMakeLists.txt | 21 +- src/userApi/layer/Conv1dApi.c | 257 ++++++++++++++++- src/userApi/layer/Conv1dTransposedApi.c | 232 ++++++++++++++- src/userApi/layer/Pool1dApi.c | 269 +++++++++++++++++- src/userApi/layer/SoftmaxApi.c | 83 +++++- src/userApi/layer/include/SoftmaxApi.h | 13 + test/unit/layer/CMakeLists.txt | 3 + test/unit/layer/UnitTestSoftmax.c | 52 ++++ test/unit/loss_functions/CMakeLists.txt | 1 + test/unit/userAPI/UnitTestConv1dApi.c | 61 ++++ .../userAPI/UnitTestConv1dTransposedApi.c | 50 ++++ 16 files changed, 1039 insertions(+), 14 deletions(-) diff --git a/src/layer/include/AvgPool1d.h b/src/layer/include/AvgPool1d.h index d1f2dc0..2f7c160 100644 --- a/src/layer/include/AvgPool1d.h +++ b/src/layer/include/AvgPool1d.h @@ -1,6 +1,7 @@ #ifndef ODT_AVG_POOL_1D_H #define ODT_AVG_POOL_1D_H +#include #include #include "Kernel.h" @@ -11,6 +12,7 @@ typedef struct avgPool1dConfig { kernel_t *kernel; quantization_t *forwardQ; quantization_t *propLossQ; + bool ownsQuantizations; } avgPool1dConfig_t; void initAvgPool1dConfig(avgPool1dConfig_t *cfg, kernel_t *kernel, quantization_t *forwardQ, diff --git a/src/layer/include/Conv1d.h b/src/layer/include/Conv1d.h index 693e3b0..6b3c19a 100644 --- a/src/layer/include/Conv1d.h +++ b/src/layer/include/Conv1d.h @@ -1,6 +1,7 @@ #ifndef ODT_CONV1D_H #define ODT_CONV1D_H +#include #include #include "Kernel.h" @@ -16,6 +17,7 @@ 
typedef struct conv1dConfig { quantization_t *weightGradQ; quantization_t *biasGradQ; quantization_t *propLossQ; + bool ownsQuantizations; } conv1dConfig_t; void initConv1dConfigWithWeightsAndBias(conv1dConfig_t *conv1dConfig, kernel_t *kernel, diff --git a/src/layer/include/Conv1dTransposed.h b/src/layer/include/Conv1dTransposed.h index d9ad100..6ab9219 100644 --- a/src/layer/include/Conv1dTransposed.h +++ b/src/layer/include/Conv1dTransposed.h @@ -1,6 +1,7 @@ #ifndef ODT_CONV1D_TRANSPOSED_H #define ODT_CONV1D_TRANSPOSED_H +#include #include #include "Kernel.h" @@ -17,6 +18,7 @@ typedef struct conv1dTransposedConfig { quantization_t *weightGradQ; quantization_t *biasGradQ; quantization_t *propLossQ; + bool ownsQuantizations; } conv1dTransposedConfig_t; void initConv1dTransposedConfigWithWeightsAndBias( diff --git a/src/layer/include/MaxPool1d.h b/src/layer/include/MaxPool1d.h index df7dfa2..fb52550 100644 --- a/src/layer/include/MaxPool1d.h +++ b/src/layer/include/MaxPool1d.h @@ -1,6 +1,7 @@ #ifndef ODT_MAX_POOL_1D_H #define ODT_MAX_POOL_1D_H +#include #include #include "Kernel.h" @@ -12,6 +13,7 @@ typedef struct maxPool1dConfig { tensor_t *argmaxIndices; // INT32, shape == output shape; pre-allocated by caller quantization_t *forwardQ; quantization_t *propLossQ; + bool ownsQuantizations; } maxPool1dConfig_t; void initMaxPool1dConfig(maxPool1dConfig_t *cfg, kernel_t *kernel, tensor_t *argmaxIndices, diff --git a/src/layer/include/Softmax.h b/src/layer/include/Softmax.h index d3b2dcf..1254c59 100644 --- a/src/layer/include/Softmax.h +++ b/src/layer/include/Softmax.h @@ -1,11 +1,14 @@ #ifndef ENV5_RUNTIME_SOFTMAX_H #define ENV5_RUNTIME_SOFTMAX_H +#include + #include "Layer.h" typedef struct softmaxConfig { quantization_t *forwardQ; quantization_t *backwardQ; + bool ownsQuantizations; } softmaxConfig_t; void softmaxInitConfig(softmaxConfig_t *softmaxConfig, quantization_t *forwardQ, diff --git a/src/userApi/layer/CMakeLists.txt b/src/userApi/layer/CMakeLists.txt 
index c24e468..c4367c3 100644 --- a/src/userApi/layer/CMakeLists.txt +++ b/src/userApi/layer/CMakeLists.txt @@ -1,12 +1,18 @@ add_library(Conv1dApi Conv1dApi.c) target_include_directories(Conv1dApi PUBLIC include) target_link_libraries(Conv1dApi PRIVATE - Tensor - Rounding - Layer - Conv1d Common + Conv1d + Distributions + Kernel + Layer + LayerCommon + LayerQuant + Quantization + QuantizationApi + Rounding StorageApi + Tensor TensorApi ) @@ -42,11 +48,14 @@ target_link_libraries(ReluApi PRIVATE add_library(SoftmaxApi SoftmaxApi.c) target_include_directories(SoftmaxApi PUBLIC include) target_link_libraries(SoftmaxApi PRIVATE + Common Layer - Tensor + LayerQuant + Quantization Rounding - Common + Softmax StorageApi + Tensor ) add_library(FlattenApi FlattenApi.c) diff --git a/src/userApi/layer/Conv1dApi.c b/src/userApi/layer/Conv1dApi.c index 22a3d3a..e6c23ba 100644 --- a/src/userApi/layer/Conv1dApi.c +++ b/src/userApi/layer/Conv1dApi.c @@ -1,12 +1,24 @@ #define SOURCE_FILE "CONV1D_API" -#include "Conv1dApi.h" +#include +#include + +#include "Common.h" #include "Conv1d.h" +#include "Conv1dApi.h" +#include "Distributions.h" +#include "Kernel.h" #include "Layer.h" +#include "LayerCommon.h" +#include "LayerQuant.h" +#include "QuantizationApi.h" #include "StorageApi.h" +#include "Tensor.h" #include "TensorApi.h" -#include +/* ============================================================================ + * Legacy factory (renamed in Task 4). + * ========================================================================== */ layer_t *conv1dLayerInitLegacy(parameter_t *weights, parameter_t *bias, kernel_t *kernel, quantization_t *forwardQ, quantization_t *weightGradQ, @@ -41,3 +53,244 @@ void freeConv1dLayerLegacy(layer_t *conv1dLayer) { freeReservedMemory(conv1dConfig); freeReservedMemory(conv1dLayer); } + +/* ============================================================================ + * New factory API — conv1dInit_t struct + layerQuant_t profile (PR 2). 
+ * ========================================================================== */ + +static bool resolveConv1dBias(bias_t b) { + switch (b) { + case BIAS_DEFAULT: + return true; /* PyTorch parity for Conv1d */ + case BIAS_TRUE: + return true; + case BIAS_FALSE: + return false; + default: + PRINT_ERROR("conv1dLayerInit: invalid bias value (got %d)", (int)b); + exit(1); + } +} + +/*! Build a heap-owned shape_t with the given dims; the tensor that this shape + * is passed to takes ownership and freeTensor cascades into freeShape. */ +static shape_t *buildOwnedShape(const size_t *srcDims, size_t numberOfDims) { + size_t *dims = reserveMemory(numberOfDims * sizeof(size_t)); + for (size_t i = 0; i < numberOfDims; i++) { + dims[i] = srcDims[i]; + } + size_t *order = reserveMemory(numberOfDims * sizeof(size_t)); + setOrderOfDimsForNewTensor(numberOfDims, order); + shape_t *shape = reserveMemory(sizeof(shape_t)); + setShape(shape, dims, numberOfDims, order); + return shape; +} + +static parameter_t *allocateConv1dWeights(size_t outChannels, size_t inChannels, size_t groups, + size_t kernelSize, quantization_t *storageQ) { + /* Conv1d weight shape: [outChannels, inChannels/groups, kernelSize]. + * Per Conv1d.h:11. */ + if (inChannels % groups != 0) { + PRINT_ERROR("conv1dLayerInit: inChannels (%zu) must be divisible by groups (%zu)", + inChannels, groups); + exit(1); + } + if (outChannels % groups != 0) { + PRINT_ERROR("conv1dLayerInit: outChannels (%zu) must be divisible by groups (%zu)", + outChannels, groups); + exit(1); + } + size_t inPerGroup = inChannels / groups; + + shape_t *shape = buildOwnedShape((size_t[]){outChannels, inPerGroup, kernelSize}, 3); + tensor_t *paramTensor = initTensor(shape, getQLike(storageQ), NULL); + + /* PyTorch-aligned default: Kaiming uniform with fan_in mode. + * Note: PyTorch's actual default uses a=sqrt(5); bit-identical parity + * requires Issue C (distribution parametrization). 
*/ + if (storageQ->type != FLOAT32) { + PRINT_ERROR("conv1dLayerInit: KAIMING_UNIFORM init currently requires FLOAT32 " + "weight storage (Issue C will lift this limit)"); + exit(1); + } + distribution_t dist = { + .type = KAIMING_UNIFORM, + .params.kaiming = {.gain = 1.4142135623730951f /* sqrtf(2.0f) */, + .fanMode = inPerGroup * kernelSize}, + }; + initDistribution(paramTensor, &dist); + + tensor_t *gradTensor = gradInitFloat(paramTensor, NULL); + return parameterInit(paramTensor, gradTensor); +} + +static parameter_t *allocateConv1dBias(size_t outChannels, quantization_t *storageQ) { + /* Bias tensor: shape [outChannels]. Zero-initialized via calloc (reserveMemory). */ + shape_t *shape = buildOwnedShape((size_t[]){outChannels}, 1); + tensor_t *paramTensor = initTensor(shape, getQLike(storageQ), NULL); + /* No initDistribution(ZEROS) — calloc already gave us zeros. */ + + tensor_t *gradTensor = gradInitFloat(paramTensor, NULL); + return parameterInit(paramTensor, gradTensor); +} + +static void validateConv1dInit(conv1dInit_t *init) { + if (init == NULL) { + PRINT_ERROR("conv1dLayerInit: init pointer is NULL"); + exit(1); + } + if (init->inChannels == 0) { + PRINT_ERROR("conv1dLayerInit: inChannels must be > 0"); + exit(1); + } + if (init->outChannels == 0) { + PRINT_ERROR("conv1dLayerInit: outChannels must be > 0"); + exit(1); + } + if (init->kernelSize == 0) { + PRINT_ERROR("conv1dLayerInit: kernelSize must be > 0"); + exit(1); + } +} + +static void validateLayerQuantForConv1d(layerQuant_t *lq, bool hasBias) { + if (lq == NULL) { + PRINT_ERROR("conv1dLayerInit: lq pointer is NULL"); + exit(1); + } + if (lq->forwardMath == NULL) { + PRINT_ERROR("conv1dLayerInit: layerQuant.forwardMath must be set"); + exit(1); + } + if (lq->backwardMath == NULL) { + PRINT_ERROR("conv1dLayerInit: layerQuant.backwardMath must be set"); + exit(1); + } + if (lq->weightStorage == NULL) { + PRINT_ERROR("conv1dLayerInit: layerQuant.weightStorage must be set"); + exit(1); + } + if 
(hasBias && lq->biasStorage == NULL) { + PRINT_ERROR("conv1dLayerInit: layerQuant.biasStorage must be set when bias is enabled"); + exit(1); + } +} + +/*! Build a heap-owned kernel_t from the conv1dInit_t fields, applying + * zero-init defaults (stride=1, dilation=1, padding=VALID). */ +static kernel_t *buildConv1dKernel(conv1dInit_t *init) { + kernel_t *kernel = reserveMemory(sizeof(kernel_t)); + size_t stride = init->stride == 0 ? 1 : init->stride; + size_t dilation = init->dilation == 0 ? 1 : init->dilation; + initKernel(kernel, init->kernelSize, init->padding, dilation, stride); + return kernel; +} + +layer_t *conv1dLayerInit(conv1dInit_t *init, layerQuant_t *lq) { + validateConv1dInit(init); + bool hasBias = resolveConv1dBias(init->bias); + validateLayerQuantForConv1d(lq, hasBias); + + size_t groups = init->groups == 0 ? 1 : init->groups; + + layer_t *layer = reserveMemory(sizeof(layer_t)); + layer->type = CONV1D; + + layerConfig_t *layerCfg = reserveMemory(sizeof(layerConfig_t)); + conv1dConfig_t *cfg = reserveMemory(sizeof(conv1dConfig_t)); + layerCfg->conv1d = cfg; + layer->config = layerCfg; + + cfg->kernel = buildConv1dKernel(init); + cfg->weights = allocateConv1dWeights(init->outChannels, init->inChannels, groups, + init->kernelSize, lq->weightStorage); + cfg->bias = hasBias ? allocateConv1dBias(init->outChannels, lq->biasStorage) : NULL; + cfg->groups = groups; + cfg->forwardQ = lq->forwardMath; + cfg->weightGradQ = lq->backwardMath; + cfg->biasGradQ = lq->backwardMath; + cfg->propLossQ = lq->backwardMath; + cfg->ownsQuantizations = false; + + return layer; +} + +layer_t *conv1dLayerInitOwning(conv1dInit_t *init, layerQuant_t *lq) { + validateConv1dInit(init); + bool hasBias = resolveConv1dBias(init->bias); + validateLayerQuantForConv1d(lq, hasBias); + + size_t groups = init->groups == 0 ? 
1 : init->groups; + + layer_t *layer = reserveMemory(sizeof(layer_t)); + layer->type = CONV1D; + + layerConfig_t *layerCfg = reserveMemory(sizeof(layerConfig_t)); + conv1dConfig_t *cfg = reserveMemory(sizeof(conv1dConfig_t)); + layerCfg->conv1d = cfg; + layer->config = layerCfg; + + cfg->kernel = buildConv1dKernel(init); + /* allocateConv1dWeights / allocateConv1dBias internally clone via getQLike, + * so the parameter tensors own their quantization_t — caller can drop + * lq->weightStorage / lq->biasStorage immediately. */ + cfg->weights = allocateConv1dWeights(init->outChannels, init->inChannels, groups, + init->kernelSize, lq->weightStorage); + cfg->bias = hasBias ? allocateConv1dBias(init->outChannels, lq->biasStorage) : NULL; + cfg->groups = groups; + + /* Owning: deep-copy each of the four math quantizations. Always four + * separate copies (no aliasing), keeping freeConv1dLayer simple. */ + cfg->forwardQ = deepCopyQuantization(lq->forwardMath); + cfg->weightGradQ = deepCopyQuantization(lq->backwardMath); + cfg->biasGradQ = deepCopyQuantization(lq->backwardMath); + cfg->propLossQ = deepCopyQuantization(lq->backwardMath); + cfg->ownsQuantizations = true; + + return layer; +} + +void freeConv1dLayer(layer_t *conv1dLayer) { + if (conv1dLayer == NULL) { + return; + } + conv1dConfig_t *cfg = conv1dLayer->config->conv1d; + + /* Always factory-owned: parameters + kernel. */ + if (cfg->weights != NULL) { + freeParameter(cfg->weights); + } + if (cfg->bias != NULL) { + freeParameter(cfg->bias); + } + freeReservedMemory(cfg->kernel); + + /* Conditionally factory-owned: quantizations (Owning variant only). + * Defensive dedup: the Owning factory in Task 9 allocates four + * separate copies (no aliasing), so the dedup is a no-op there but + * protects against future aliasing. 
*/ + if (cfg->ownsQuantizations) { + if (cfg->forwardQ != NULL) { + freeReservedMemory(cfg->forwardQ->qConfig); + freeReservedMemory(cfg->forwardQ); + } + if (cfg->weightGradQ != NULL && cfg->weightGradQ != cfg->forwardQ) { + freeReservedMemory(cfg->weightGradQ->qConfig); + freeReservedMemory(cfg->weightGradQ); + } + if (cfg->biasGradQ != NULL && cfg->biasGradQ != cfg->forwardQ && + cfg->biasGradQ != cfg->weightGradQ) { + freeReservedMemory(cfg->biasGradQ->qConfig); + freeReservedMemory(cfg->biasGradQ); + } + if (cfg->propLossQ != NULL && cfg->propLossQ != cfg->forwardQ && + cfg->propLossQ != cfg->weightGradQ && cfg->propLossQ != cfg->biasGradQ) { + freeReservedMemory(cfg->propLossQ->qConfig); + freeReservedMemory(cfg->propLossQ); + } + } + + freeReservedMemory(cfg); + freeReservedMemory(conv1dLayer->config); + freeReservedMemory(conv1dLayer); +} diff --git a/src/userApi/layer/Conv1dTransposedApi.c b/src/userApi/layer/Conv1dTransposedApi.c index 589351c..2a8b869 100644 --- a/src/userApi/layer/Conv1dTransposedApi.c +++ b/src/userApi/layer/Conv1dTransposedApi.c @@ -1,8 +1,232 @@ #define SOURCE_FILE "CONV1D_TRANSPOSED_API" -/* Stub. Full implementation lands in Task 12. This file exists so - * Conv1dTransposedApi compiles as a library target for the CMake graph - * to discover; the headers above declare the functions but they will - * link-fail until Task 12 fills them in. 
*/ +#include +#include +#include "Common.h" +#include "Conv1dTransposed.h" #include "Conv1dTransposedApi.h" +#include "Distributions.h" +#include "Kernel.h" +#include "Layer.h" +#include "LayerCommon.h" +#include "LayerQuant.h" +#include "QuantizationApi.h" +#include "StorageApi.h" +#include "Tensor.h" +#include "TensorApi.h" + +static bool resolveConv1dTransposedBias(bias_t b) { + switch (b) { + case BIAS_DEFAULT: + return true; + case BIAS_TRUE: + return true; + case BIAS_FALSE: + return false; + default: + PRINT_ERROR("conv1dTransposedLayerInit: invalid bias value (got %d)", (int)b); + exit(1); + } +} + +static shape_t *buildOwnedShape(const size_t *srcDims, size_t numberOfDims) { + size_t *dims = reserveMemory(numberOfDims * sizeof(size_t)); + for (size_t i = 0; i < numberOfDims; i++) { + dims[i] = srcDims[i]; + } + size_t *order = reserveMemory(numberOfDims * sizeof(size_t)); + setOrderOfDimsForNewTensor(numberOfDims, order); + shape_t *shape = reserveMemory(sizeof(shape_t)); + setShape(shape, dims, numberOfDims, order); + return shape; +} + +static parameter_t *allocateConv1dTransposedWeights(size_t inChannels, size_t outChannels, + size_t groups, size_t kernelSize, + quantization_t *storageQ) { + /* Conv1dTransposed weight shape: [inChannels, outChannels/groups, kernelSize]. + * Note SWAP relative to Conv1d. Per Conv1dTransposed.h:12. 
*/ + if (outChannels % groups != 0) { + PRINT_ERROR("conv1dTransposedLayerInit: outChannels (%zu) must be divisible by " + "groups (%zu)", + outChannels, groups); + exit(1); + } + if (inChannels % groups != 0) { + PRINT_ERROR("conv1dTransposedLayerInit: inChannels (%zu) must be divisible by " + "groups (%zu)", + inChannels, groups); + exit(1); + } + size_t outPerGroup = outChannels / groups; + + shape_t *shape = buildOwnedShape((size_t[]){inChannels, outPerGroup, kernelSize}, 3); + tensor_t *paramTensor = initTensor(shape, getQLike(storageQ), NULL); + + if (storageQ->type != FLOAT32) { + PRINT_ERROR("conv1dTransposedLayerInit: KAIMING_UNIFORM init currently requires FLOAT32 " + "weight storage (Issue C will lift this limit)"); + exit(1); + } + distribution_t dist = { + .type = KAIMING_UNIFORM, + .params.kaiming = {.gain = 1.4142135623730951f, .fanMode = outPerGroup * kernelSize}, + }; + initDistribution(paramTensor, &dist); + + tensor_t *gradTensor = gradInitFloat(paramTensor, NULL); + return parameterInit(paramTensor, gradTensor); +} + +static parameter_t *allocateConv1dTransposedBias(size_t outChannels, quantization_t *storageQ) { + shape_t *shape = buildOwnedShape((size_t[]){outChannels}, 1); + tensor_t *paramTensor = initTensor(shape, getQLike(storageQ), NULL); + tensor_t *gradTensor = gradInitFloat(paramTensor, NULL); + return parameterInit(paramTensor, gradTensor); +} + +static void validateConv1dTransposedInit(conv1dTransposedInit_t *init) { + if (init == NULL) { + PRINT_ERROR("conv1dTransposedLayerInit: init pointer is NULL"); + exit(1); + } + if (init->inChannels == 0) { + PRINT_ERROR("conv1dTransposedLayerInit: inChannels must be > 0"); + exit(1); + } + if (init->outChannels == 0) { + PRINT_ERROR("conv1dTransposedLayerInit: outChannels must be > 0"); + exit(1); + } + if (init->kernelSize == 0) { + PRINT_ERROR("conv1dTransposedLayerInit: kernelSize must be > 0"); + exit(1); + } +} + +static void validateLayerQuantForConv1dTransposed(layerQuant_t *lq, bool 
hasBias) { + if (lq == NULL) { + PRINT_ERROR("conv1dTransposedLayerInit: lq pointer is NULL"); + exit(1); + } + if (lq->forwardMath == NULL) { + PRINT_ERROR("conv1dTransposedLayerInit: layerQuant.forwardMath must be set"); + exit(1); + } + if (lq->backwardMath == NULL) { + PRINT_ERROR("conv1dTransposedLayerInit: layerQuant.backwardMath must be set"); + exit(1); + } + if (lq->weightStorage == NULL) { + PRINT_ERROR("conv1dTransposedLayerInit: layerQuant.weightStorage must be set"); + exit(1); + } + if (hasBias && lq->biasStorage == NULL) { + PRINT_ERROR("conv1dTransposedLayerInit: layerQuant.biasStorage must be set when bias " + "is enabled"); + exit(1); + } +} + +static kernel_t *buildConv1dTransposedKernel(conv1dTransposedInit_t *init) { + kernel_t *kernel = reserveMemory(sizeof(kernel_t)); + size_t stride = init->stride == 0 ? 1 : init->stride; + size_t dilation = init->dilation == 0 ? 1 : init->dilation; + initKernel(kernel, init->kernelSize, init->padding, dilation, stride); + return kernel; +} + +static layer_t *buildConv1dTransposedLayerSkeleton(conv1dTransposedInit_t *init, layerQuant_t *lq, + bool hasBias, size_t groups) { + layer_t *layer = reserveMemory(sizeof(layer_t)); + layer->type = CONV1D_TRANSPOSED; + + layerConfig_t *layerCfg = reserveMemory(sizeof(layerConfig_t)); + conv1dTransposedConfig_t *cfg = reserveMemory(sizeof(conv1dTransposedConfig_t)); + layerCfg->conv1dTransposed = cfg; + layer->config = layerCfg; + + cfg->kernel = buildConv1dTransposedKernel(init); + cfg->weights = allocateConv1dTransposedWeights(init->inChannels, init->outChannels, groups, + init->kernelSize, lq->weightStorage); + cfg->bias = hasBias ? 
allocateConv1dTransposedBias(init->outChannels, lq->biasStorage) : NULL; + cfg->groups = groups; + cfg->outputPadding = init->outputPadding; + return layer; +} + +layer_t *conv1dTransposedLayerInit(conv1dTransposedInit_t *init, layerQuant_t *lq) { + validateConv1dTransposedInit(init); + bool hasBias = resolveConv1dTransposedBias(init->bias); + validateLayerQuantForConv1dTransposed(lq, hasBias); + + size_t groups = init->groups == 0 ? 1 : init->groups; + + layer_t *layer = buildConv1dTransposedLayerSkeleton(init, lq, hasBias, groups); + conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed; + cfg->forwardQ = lq->forwardMath; + cfg->weightGradQ = lq->backwardMath; + cfg->biasGradQ = lq->backwardMath; + cfg->propLossQ = lq->backwardMath; + cfg->ownsQuantizations = false; + return layer; +} + +layer_t *conv1dTransposedLayerInitOwning(conv1dTransposedInit_t *init, layerQuant_t *lq) { + validateConv1dTransposedInit(init); + bool hasBias = resolveConv1dTransposedBias(init->bias); + validateLayerQuantForConv1dTransposed(lq, hasBias); + + size_t groups = init->groups == 0 ? 
1 : init->groups; + + layer_t *layer = buildConv1dTransposedLayerSkeleton(init, lq, hasBias, groups); + conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed; + + cfg->forwardQ = deepCopyQuantization(lq->forwardMath); + cfg->weightGradQ = deepCopyQuantization(lq->backwardMath); + cfg->biasGradQ = deepCopyQuantization(lq->backwardMath); + cfg->propLossQ = deepCopyQuantization(lq->backwardMath); + cfg->ownsQuantizations = true; + return layer; +} + +void freeConv1dTransposedLayer(layer_t *layer) { + if (layer == NULL) { + return; + } + conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed; + + if (cfg->weights != NULL) { + freeParameter(cfg->weights); + } + if (cfg->bias != NULL) { + freeParameter(cfg->bias); + } + freeReservedMemory(cfg->kernel); + + if (cfg->ownsQuantizations) { + if (cfg->forwardQ != NULL) { + freeReservedMemory(cfg->forwardQ->qConfig); + freeReservedMemory(cfg->forwardQ); + } + if (cfg->weightGradQ != NULL && cfg->weightGradQ != cfg->forwardQ) { + freeReservedMemory(cfg->weightGradQ->qConfig); + freeReservedMemory(cfg->weightGradQ); + } + if (cfg->biasGradQ != NULL && cfg->biasGradQ != cfg->forwardQ && + cfg->biasGradQ != cfg->weightGradQ) { + freeReservedMemory(cfg->biasGradQ->qConfig); + freeReservedMemory(cfg->biasGradQ); + } + if (cfg->propLossQ != NULL && cfg->propLossQ != cfg->forwardQ && + cfg->propLossQ != cfg->weightGradQ && cfg->propLossQ != cfg->biasGradQ) { + freeReservedMemory(cfg->propLossQ->qConfig); + freeReservedMemory(cfg->propLossQ); + } + } + + freeReservedMemory(cfg); + freeReservedMemory(layer->config); + freeReservedMemory(layer); +} diff --git a/src/userApi/layer/Pool1dApi.c b/src/userApi/layer/Pool1dApi.c index d743e54..0fc811c 100644 --- a/src/userApi/layer/Pool1dApi.c +++ b/src/userApi/layer/Pool1dApi.c @@ -1,5 +1,272 @@ #define SOURCE_FILE "POOL1D_API" -/* Stub. Full implementation lands in Tasks 15 and 16. 
*/ +#include +#include +#include "AvgPool1d.h" +#include "Common.h" +#include "Kernel.h" +#include "Layer.h" +#include "LayerQuant.h" +#include "MaxPool1d.h" #include "Pool1dApi.h" +#include "QuantizationApi.h" +#include "StorageApi.h" +#include "Tensor.h" +#include "TensorApi.h" + +/* ============================================================================ + * Shared helpers + * ========================================================================== */ + +/*! Compute output length per the geometry rule used by the internal + * windowGeometry1dCalc, replicated here for factory pre-allocation + * without bringing in SlidingWindow1d. + * + * VALID: outputLength = (inputLength - dilation*(kernelSize - 1) - 1) / stride + 1 + * SAME: outputLength = ceil(inputLength / stride) + * + * Matches the runtime windowGeometry1dCalc result used by both pool + * layers' forward paths. */ +static size_t computePool1dOutputLength(paddingType_t padding, size_t inputLength, + size_t kernelSize, size_t dilation, size_t stride) { + if (padding == SAME) { + return (inputLength + stride - 1) / stride; + } + /* VALID */ + size_t effectiveK = dilation * (kernelSize - 1) + 1; + if (effectiveK > inputLength) { + PRINT_ERROR("Pool1d: effective kernel %zu exceeds inputLength %zu", effectiveK, + inputLength); + exit(1); + } + return (inputLength - effectiveK) / stride + 1; +} + +/* ============================================================================ + * MaxPool1d + * ========================================================================== */ + +static void validateMaxPool1dInit(maxPool1dInit_t *init) { + if (init == NULL) { + PRINT_ERROR("maxPool1dLayerInit: init pointer is NULL"); + exit(1); + } + if (init->kernelSize == 0) { + PRINT_ERROR("maxPool1dLayerInit: kernelSize must be > 0"); + exit(1); + } + if (init->inputChannels == 0) { + PRINT_ERROR("maxPool1dLayerInit: inputChannels must be > 0"); + exit(1); + } + if (init->inputLength == 0) { + PRINT_ERROR("maxPool1dLayerInit: 
inputLength must be > 0"); + exit(1); + } +} + +static void validateLayerQuantForMaxPool1d(layerQuant_t *lq) { + if (lq == NULL) { + PRINT_ERROR("maxPool1dLayerInit: lq pointer is NULL"); + exit(1); + } + if (lq->forwardMath == NULL) { + PRINT_ERROR("maxPool1dLayerInit: layerQuant.forwardMath must be set"); + exit(1); + } + if (lq->backwardMath == NULL) { + PRINT_ERROR("maxPool1dLayerInit: layerQuant.backwardMath must be set"); + exit(1); + } +} + +static shape_t *buildOwnedShape(const size_t *srcDims, size_t numberOfDims) { + size_t *dims = reserveMemory(numberOfDims * sizeof(size_t)); + for (size_t i = 0; i < numberOfDims; i++) { + dims[i] = srcDims[i]; + } + size_t *order = reserveMemory(numberOfDims * sizeof(size_t)); + setOrderOfDimsForNewTensor(numberOfDims, order); + shape_t *shape = reserveMemory(sizeof(shape_t)); + setShape(shape, dims, numberOfDims, order); + return shape; +} + +static tensor_t *buildMaxPool1dArgmax(size_t inputChannels, size_t outputLength) { + /* Argmax buffer is sized for batch=1 (training_batch iterates microbatch- + * by-microbatch in this framework). Shape: [1, inputChannels, outputLength]. */ + shape_t *shape = buildOwnedShape((size_t[]){1, inputChannels, outputLength}, 3); + quantization_t *q = quantizationInitInt32(); + return initTensor(shape, q, NULL); +} + +static layer_t *buildMaxPool1dLayerSkeleton(maxPool1dInit_t *init) { + size_t stride = init->stride == 0 ? init->kernelSize : init->stride; + size_t dilation = init->dilation == 0 ? 
1 : init->dilation; + + kernel_t *kernel = reserveMemory(sizeof(kernel_t)); + initKernel(kernel, init->kernelSize, init->padding, dilation, stride); + + size_t outputLength = computePool1dOutputLength(init->padding, init->inputLength, + init->kernelSize, dilation, stride); + tensor_t *argmax = buildMaxPool1dArgmax(init->inputChannels, outputLength); + + layer_t *layer = reserveMemory(sizeof(layer_t)); + layer->type = MAXPOOL1D; + layerConfig_t *layerCfg = reserveMemory(sizeof(layerConfig_t)); + maxPool1dConfig_t *cfg = reserveMemory(sizeof(maxPool1dConfig_t)); + layerCfg->maxPool1d = cfg; + layer->config = layerCfg; + + cfg->kernel = kernel; + cfg->argmaxIndices = argmax; + return layer; +} + +layer_t *maxPool1dLayerInit(maxPool1dInit_t *init, layerQuant_t *lq) { + validateMaxPool1dInit(init); + validateLayerQuantForMaxPool1d(lq); + + layer_t *layer = buildMaxPool1dLayerSkeleton(init); + maxPool1dConfig_t *cfg = layer->config->maxPool1d; + cfg->forwardQ = lq->forwardMath; + cfg->propLossQ = lq->backwardMath; + cfg->ownsQuantizations = false; + return layer; +} + +layer_t *maxPool1dLayerInitOwning(maxPool1dInit_t *init, layerQuant_t *lq) { + validateMaxPool1dInit(init); + validateLayerQuantForMaxPool1d(lq); + + layer_t *layer = buildMaxPool1dLayerSkeleton(init); + maxPool1dConfig_t *cfg = layer->config->maxPool1d; + cfg->forwardQ = deepCopyQuantization(lq->forwardMath); + cfg->propLossQ = deepCopyQuantization(lq->backwardMath); + cfg->ownsQuantizations = true; + return layer; +} + +void freeMaxPool1dLayer(layer_t *layer) { + if (layer == NULL) { + return; + } + maxPool1dConfig_t *cfg = layer->config->maxPool1d; + + freeReservedMemory(cfg->kernel); + if (cfg->argmaxIndices != NULL) { + freeTensor(cfg->argmaxIndices); + } + + if (cfg->ownsQuantizations) { + if (cfg->forwardQ != NULL) { + freeReservedMemory(cfg->forwardQ->qConfig); + freeReservedMemory(cfg->forwardQ); + } + if (cfg->propLossQ != NULL && cfg->propLossQ != cfg->forwardQ) { + 
freeReservedMemory(cfg->propLossQ->qConfig); + freeReservedMemory(cfg->propLossQ); + } + } + + freeReservedMemory(cfg); + freeReservedMemory(layer->config); + freeReservedMemory(layer); +} + +/* ============================================================================ + * AvgPool1d + * ========================================================================== */ + +static void validateAvgPool1dInit(avgPool1dInit_t *init) { + if (init == NULL) { + PRINT_ERROR("avgPool1dLayerInit: init pointer is NULL"); + exit(1); + } + if (init->kernelSize == 0) { + PRINT_ERROR("avgPool1dLayerInit: kernelSize must be > 0"); + exit(1); + } +} + +static void validateLayerQuantForAvgPool1d(layerQuant_t *lq) { + if (lq == NULL) { + PRINT_ERROR("avgPool1dLayerInit: lq pointer is NULL"); + exit(1); + } + if (lq->forwardMath == NULL) { + PRINT_ERROR("avgPool1dLayerInit: layerQuant.forwardMath must be set"); + exit(1); + } + if (lq->backwardMath == NULL) { + PRINT_ERROR("avgPool1dLayerInit: layerQuant.backwardMath must be set"); + exit(1); + } +} + +static layer_t *buildAvgPool1dLayerSkeleton(avgPool1dInit_t *init) { + size_t stride = init->stride == 0 ? init->kernelSize : init->stride; + + kernel_t *kernel = reserveMemory(sizeof(kernel_t)); + /* AvgPool1d has no dilation (kernel doesn't support it); pass 1. 
*/ + initKernel(kernel, init->kernelSize, init->padding, /*dilation*/ 1, stride); + + layer_t *layer = reserveMemory(sizeof(layer_t)); + layer->type = AVGPOOL1D; + layerConfig_t *layerCfg = reserveMemory(sizeof(layerConfig_t)); + avgPool1dConfig_t *cfg = reserveMemory(sizeof(avgPool1dConfig_t)); + layerCfg->avgPool1d = cfg; + layer->config = layerCfg; + + cfg->kernel = kernel; + return layer; +} + +layer_t *avgPool1dLayerInit(avgPool1dInit_t *init, layerQuant_t *lq) { + validateAvgPool1dInit(init); + validateLayerQuantForAvgPool1d(lq); + + layer_t *layer = buildAvgPool1dLayerSkeleton(init); + avgPool1dConfig_t *cfg = layer->config->avgPool1d; + cfg->forwardQ = lq->forwardMath; + cfg->propLossQ = lq->backwardMath; + cfg->ownsQuantizations = false; + return layer; +} + +layer_t *avgPool1dLayerInitOwning(avgPool1dInit_t *init, layerQuant_t *lq) { + validateAvgPool1dInit(init); + validateLayerQuantForAvgPool1d(lq); + + layer_t *layer = buildAvgPool1dLayerSkeleton(init); + avgPool1dConfig_t *cfg = layer->config->avgPool1d; + cfg->forwardQ = deepCopyQuantization(lq->forwardMath); + cfg->propLossQ = deepCopyQuantization(lq->backwardMath); + cfg->ownsQuantizations = true; + return layer; +} + +void freeAvgPool1dLayer(layer_t *layer) { + if (layer == NULL) { + return; + } + avgPool1dConfig_t *cfg = layer->config->avgPool1d; + + freeReservedMemory(cfg->kernel); + + if (cfg->ownsQuantizations) { + if (cfg->forwardQ != NULL) { + freeReservedMemory(cfg->forwardQ->qConfig); + freeReservedMemory(cfg->forwardQ); + } + if (cfg->propLossQ != NULL && cfg->propLossQ != cfg->forwardQ) { + freeReservedMemory(cfg->propLossQ->qConfig); + freeReservedMemory(cfg->propLossQ); + } + } + + freeReservedMemory(cfg); + freeReservedMemory(layer->config); + freeReservedMemory(layer); +} diff --git a/src/userApi/layer/SoftmaxApi.c b/src/userApi/layer/SoftmaxApi.c index df4fe8e..0cae60c 100644 --- a/src/userApi/layer/SoftmaxApi.c +++ b/src/userApi/layer/SoftmaxApi.c @@ -1,7 +1,12 @@ #define 
SOURCE_FILE "SOFTMAX_API" -#include "SoftmaxApi.h" +#include +#include + +#include "Common.h" +#include "LayerQuant.h" #include "Softmax.h" +#include "SoftmaxApi.h" #include "StorageApi.h" layer_t *softmaxLayerInitLegacy(quantization_t *forwardQ, quantization_t *backwardQ) { @@ -26,3 +31,79 @@ void freeSoftmaxLayerLegacy(layer_t *softmaxLayer) { freeReservedMemory(softmaxLayer->config); freeReservedMemory(softmaxLayer); } + +/* ============================================================================ + * New factory API — layerQuant_t profile (PR 2). + * ========================================================================== */ + +static void validateLayerQuantForSoftmax(layerQuant_t *lq) { + if (lq == NULL) { + PRINT_ERROR("softmaxLayerInit: lq pointer is NULL"); + exit(1); + } + if (lq->forwardMath == NULL) { + PRINT_ERROR("softmaxLayerInit: layerQuant.forwardMath must be set"); + exit(1); + } + if (lq->backwardMath == NULL) { + PRINT_ERROR("softmaxLayerInit: layerQuant.backwardMath must be set"); + exit(1); + } +} + +layer_t *softmaxLayerInit(layerQuant_t *lq) { + validateLayerQuantForSoftmax(lq); + + layer_t *layer = reserveMemory(sizeof(layer_t)); + layer->type = SOFTMAX; + + layerConfig_t *layerCfg = reserveMemory(sizeof(layerConfig_t)); + softmaxConfig_t *cfg = reserveMemory(sizeof(softmaxConfig_t)); + layerCfg->softmax = cfg; + layer->config = layerCfg; + + cfg->forwardQ = lq->forwardMath; + cfg->backwardQ = lq->backwardMath; + cfg->ownsQuantizations = false; + + return layer; +} + +layer_t *softmaxLayerInitOwning(layerQuant_t *lq) { + validateLayerQuantForSoftmax(lq); + + layer_t *layer = reserveMemory(sizeof(layer_t)); + layer->type = SOFTMAX; + + layerConfig_t *layerCfg = reserveMemory(sizeof(layerConfig_t)); + softmaxConfig_t *cfg = reserveMemory(sizeof(softmaxConfig_t)); + layerCfg->softmax = cfg; + layer->config = layerCfg; + + cfg->forwardQ = deepCopyQuantization(lq->forwardMath); + cfg->backwardQ = deepCopyQuantization(lq->backwardMath); + 
cfg->ownsQuantizations = true; + + return layer; +} + +void freeSoftmaxLayer(layer_t *softmaxLayer) { + if (softmaxLayer == NULL) { + return; + } + softmaxConfig_t *cfg = softmaxLayer->config->softmax; + + if (cfg->ownsQuantizations) { + if (cfg->forwardQ != NULL) { + freeReservedMemory(cfg->forwardQ->qConfig); + freeReservedMemory(cfg->forwardQ); + } + if (cfg->backwardQ != NULL && cfg->backwardQ != cfg->forwardQ) { + freeReservedMemory(cfg->backwardQ->qConfig); + freeReservedMemory(cfg->backwardQ); + } + } + freeReservedMemory(cfg); + freeReservedMemory(softmaxLayer->config); + freeReservedMemory(softmaxLayer); +} diff --git a/src/userApi/layer/include/SoftmaxApi.h b/src/userApi/layer/include/SoftmaxApi.h index beaa289..0498359 100644 --- a/src/userApi/layer/include/SoftmaxApi.h +++ b/src/userApi/layer/include/SoftmaxApi.h @@ -2,10 +2,23 @@ #define SOFTMAXAPI_H #include "Layer.h" +#include "LayerQuant.h" #include "Tensor.h" /* Legacy (pre-2026-05-15 factory API) — retained during PR 1/2 coexistence window. */ layer_t *softmaxLayerInitLegacy(quantization_t *forwardQ, quantization_t *backwardQ); void freeSoftmaxLayerLegacy(layer_t *softmaxLayer); +/*! Borrowing variant — stores lq->forwardMath in forwardQ and + * lq->backwardMath in backwardQ verbatim. */ +layer_t *softmaxLayerInit(layerQuant_t *lq); + +/*! Owning variant — deep-copies forwardMath + backwardMath via + * deepCopyQuantization. */ +layer_t *softmaxLayerInitOwning(layerQuant_t *lq); + +/*! Tears down the layer. Reads config->ownsQuantizations to decide + * whether to also free the two quantization_t and their qConfigs. 
*/ +void freeSoftmaxLayer(layer_t *softmaxLayer); + #endif // SOFTMAXAPI_H diff --git a/test/unit/layer/CMakeLists.txt b/test/unit/layer/CMakeLists.txt index 5216d1c..367107c 100644 --- a/test/unit/layer/CMakeLists.txt +++ b/test/unit/layer/CMakeLists.txt @@ -99,6 +99,9 @@ add_elastic_ai_unit_test( TensorApi SoftmaxApi QuantizationApi + LayerQuant + Quantization + Rounding StorageApi ) diff --git a/test/unit/layer/UnitTestSoftmax.c b/test/unit/layer/UnitTestSoftmax.c index 087116b..ba880eb 100644 --- a/test/unit/layer/UnitTestSoftmax.c +++ b/test/unit/layer/UnitTestSoftmax.c @@ -1,6 +1,8 @@ #include +#include "LayerQuant.h" #include "QuantizationApi.h" +#include "Softmax.h" #include "SoftmaxApi.h" #include "StorageApi.h" #include "Tensor.h" @@ -280,6 +282,54 @@ void testSoftmaxLayerInitAndFreeRoundTrip(void) { freeQuantization(floatQ); } +/* ============================================================================ + * Tests for the new layerQuant_t-based Softmax factory (PR 2). + * ========================================================================== */ + +void testSoftmaxLayerInitBorrowingStoresLqPointers(void) { + quantization_t *qFwd = quantizationInitFloat(); + quantization_t *qBwd = quantizationInitFloat(); + layerQuant_t lq = { + .forwardMath = qFwd, + .backwardMath = qBwd, + }; + + layer_t *layer = softmaxLayerInit(&lq); + + TEST_ASSERT_NOT_NULL(layer); + TEST_ASSERT_EQUAL_INT(SOFTMAX, layer->type); + + softmaxConfig_t *cfg = layer->config->softmax; + TEST_ASSERT_EQUAL_PTR(qFwd, cfg->forwardQ); + TEST_ASSERT_EQUAL_PTR(qBwd, cfg->backwardQ); + TEST_ASSERT_FALSE(cfg->ownsQuantizations); + + freeSoftmaxLayer(layer); + freeQuantization(qFwd); + freeQuantization(qBwd); +} + +void testSoftmaxLayerInitOwningDeepCopiesLqPointers(void) { + quantization_t *qFwd = quantizationInitFloat(); + quantization_t *qBwd = quantizationInitFloat(); + layerQuant_t lq = { + .forwardMath = qFwd, + .backwardMath = qBwd, + }; + + layer_t *layer = softmaxLayerInitOwning(&lq); + 
+ softmaxConfig_t *cfg = layer->config->softmax; + TEST_ASSERT_NOT_EQUAL(qFwd, cfg->forwardQ); + TEST_ASSERT_NOT_EQUAL(qBwd, cfg->backwardQ); + TEST_ASSERT_EQUAL_INT(qFwd->type, cfg->forwardQ->type); + TEST_ASSERT_TRUE(cfg->ownsQuantizations); + + freeSoftmaxLayer(layer); + freeQuantization(qFwd); + freeQuantization(qBwd); +} + void setUp() {} void tearDown() {} @@ -292,5 +342,7 @@ int main() { RUN_TEST(unitTestSoftmaxBackwardSymInt32); RUN_TEST(testSoftmaxLayerInitAndFreeRoundTrip); + RUN_TEST(testSoftmaxLayerInitBorrowingStoresLqPointers); + RUN_TEST(testSoftmaxLayerInitOwningDeepCopiesLqPointers); return UNITY_END(); } diff --git a/test/unit/loss_functions/CMakeLists.txt b/test/unit/loss_functions/CMakeLists.txt index 0aca768..c91b357 100644 --- a/test/unit/loss_functions/CMakeLists.txt +++ b/test/unit/loss_functions/CMakeLists.txt @@ -6,6 +6,7 @@ add_elastic_ai_unit_test( Log SoftmaxApi QuantizationApi + LayerQuant TensorApi ) diff --git a/test/unit/userAPI/UnitTestConv1dApi.c b/test/unit/userAPI/UnitTestConv1dApi.c index a605a9b..072aee7 100644 --- a/test/unit/userAPI/UnitTestConv1dApi.c +++ b/test/unit/userAPI/UnitTestConv1dApi.c @@ -8,6 +8,7 @@ #include "LayerQuant.h" #include "QuantizationApi.h" #include "Tensor.h" +#include "TensorApi.h" #include "unity.h" void setUp() {} @@ -71,6 +72,7 @@ void testConv1dLayerInitBorrowingBuildsLayerWithCorrectShape(void) { TEST_ASSERT_EQUAL_UINT(1, cfg->groups); freeConv1dLayer(layer); + freeQuantization(q); } void testConv1dLayerInitBorrowingBiasDefaultResolvesToTrue(void) { @@ -91,6 +93,7 @@ void testConv1dLayerInitBorrowingBiasDefaultResolvesToTrue(void) { TEST_ASSERT_NOT_NULL(cfg->bias); freeConv1dLayer(layer); + freeQuantization(q); } void testConv1dLayerInitBorrowingBiasFalseLeavesBiasNull(void) { @@ -111,6 +114,7 @@ void testConv1dLayerInitBorrowingBiasFalseLeavesBiasNull(void) { TEST_ASSERT_NULL(cfg->bias); freeConv1dLayer(layer); + freeQuantization(q); } void 
testConv1dLayerInitBorrowingPaddingDefaultIsValid(void) { @@ -135,6 +139,61 @@ void testConv1dLayerInitBorrowingPaddingDefaultIsValid(void) { TEST_ASSERT_EQUAL_UINT(1, cfg->groups); freeConv1dLayer(layer); + freeQuantization(q); +} + +void testConv1dLayerInitOwningDeepCopiesQuantizations(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = conv1dLayerInitOwning( + &(conv1dInit_t){ + .inChannels = 3, + .outChannels = 4, + .kernelSize = 5, + .bias = BIAS_TRUE, + }, + &lq); + + conv1dConfig_t *cfg = layer->config->conv1d; + + /* Owning variant: cfg->forwardQ is a fresh allocation, NOT the original q */ + TEST_ASSERT_NOT_EQUAL(q, cfg->forwardQ); + TEST_ASSERT_NOT_EQUAL(q, cfg->weightGradQ); + TEST_ASSERT_NOT_EQUAL(q, cfg->biasGradQ); + TEST_ASSERT_NOT_EQUAL(q, cfg->propLossQ); + + /* But the copy has equal type to the original */ + TEST_ASSERT_EQUAL_INT(q->type, cfg->forwardQ->type); + + /* ownsQuantizations flag is set */ + TEST_ASSERT_TRUE(cfg->ownsQuantizations); + + freeConv1dLayer(layer); + freeQuantization(q); +} + +void testConv1dLayerInitOwningFreesAllAllocationsWithoutLeak(void) { + /* Build + free 5 layers — if anything leaks, LSan catches it in CI. 
*/ + for (int i = 0; i < 5; i++) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = conv1dLayerInitOwning( + &(conv1dInit_t){ + .inChannels = 8, + .outChannels = 4, + .kernelSize = 3, + .bias = BIAS_TRUE, + }, + &lq); + + freeConv1dLayer(layer); + freeQuantization(q); + } + TEST_PASS(); } int main(void) { @@ -143,5 +202,7 @@ int main(void) { RUN_TEST(testConv1dLayerInitBorrowingBiasDefaultResolvesToTrue); RUN_TEST(testConv1dLayerInitBorrowingBiasFalseLeavesBiasNull); RUN_TEST(testConv1dLayerInitBorrowingPaddingDefaultIsValid); + RUN_TEST(testConv1dLayerInitOwningDeepCopiesQuantizations); + RUN_TEST(testConv1dLayerInitOwningFreesAllAllocationsWithoutLeak); return UNITY_END(); } diff --git a/test/unit/userAPI/UnitTestConv1dTransposedApi.c b/test/unit/userAPI/UnitTestConv1dTransposedApi.c index 4725fd0..d1a55af 100644 --- a/test/unit/userAPI/UnitTestConv1dTransposedApi.c +++ b/test/unit/userAPI/UnitTestConv1dTransposedApi.c @@ -118,10 +118,60 @@ void testConv1dTransposedLayerInitBorrowingOutputPaddingPropagatesToConfig(void) freeQuantization(q); } +void testConv1dTransposedLayerInitOwningDeepCopiesQuantizations(void) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = conv1dTransposedLayerInitOwning( + &(conv1dTransposedInit_t){ + .inChannels = 8, + .outChannels = 4, + .kernelSize = 3, + .bias = BIAS_TRUE, + }, + &lq); + + conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed; + + TEST_ASSERT_NOT_EQUAL(q, cfg->forwardQ); + TEST_ASSERT_NOT_EQUAL(q, cfg->weightGradQ); + TEST_ASSERT_NOT_EQUAL(q, cfg->biasGradQ); + TEST_ASSERT_NOT_EQUAL(q, cfg->propLossQ); + TEST_ASSERT_EQUAL_INT(q->type, cfg->forwardQ->type); + TEST_ASSERT_TRUE(cfg->ownsQuantizations); + + freeConv1dTransposedLayer(layer); + freeQuantization(q); +} + +void testConv1dTransposedLayerInitOwningFreesAllAllocationsWithoutLeak(void) { + for (int i = 0; i < 5; 
i++) { + quantization_t *q = quantizationInitFloat(); + layerQuant_t lq; + layerQuantInitUniform(&lq, q); + + layer_t *layer = conv1dTransposedLayerInitOwning( + &(conv1dTransposedInit_t){ + .inChannels = 4, + .outChannels = 2, + .kernelSize = 3, + .bias = BIAS_TRUE, + }, + &lq); + + freeConv1dTransposedLayer(layer); + freeQuantization(q); + } + TEST_PASS(); +} + int main(void) { UNITY_BEGIN(); RUN_TEST(testConv1dTransposedLayerInitBorrowingBuildsLayerWithCorrectShape); RUN_TEST(testConv1dTransposedLayerInitBorrowingBiasFalseLeavesBiasNull); RUN_TEST(testConv1dTransposedLayerInitBorrowingOutputPaddingPropagatesToConfig); + RUN_TEST(testConv1dTransposedLayerInitOwningDeepCopiesQuantizations); + RUN_TEST(testConv1dTransposedLayerInitOwningFreesAllAllocationsWithoutLeak); return UNITY_END(); } From 96f0639a4f7bb967dfbd539acfa369cc1b88ccd8 Mon Sep 17 00:00:00 2001 From: Leo Buron Date: Fri, 15 May 2026 22:21:02 +0200 Subject: [PATCH 3/4] feat(examples): emit per-layer state_dict .npy files post-training (HAR + ECG) train_pytorch.py for both examples now writes the trained model's per-layer weight + bias tensors to examples/{har_classifier,ecg_anomaly_ae}/weights/. This is the input to the v2 binary's BIT_PARITY mode introduced in PR 2 Tasks 20/21, which loads these via modelLoadStateDict and runs inference for the CI bit-parity gate. Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10 feat(examples): add har_classifier_v2 using new factory API Same architecture as the legacy har_classifier binary (Conv->ReLU->Pool x3 + Flatten + Linear + Softmax) but constructed via conv1dLayerInit, reluLayerInit, maxPool1dLayerInit, avgPool1dLayerInit, flattenLayerInit, linearLayerInit, softmaxLayerInit. Shares the legacy data directory. Outputs to examples/har_classifier_v2/{logs,outputs}/. Supports BIT_PARITY env-var mode (used by the bit-parity CI step) which loads PyTorch state_dict via modelLoadStateDict and skips training. 
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10 (coexistence strategy Z) feat(examples): add ecg_anomaly_ae_v2 using new factory API Encoder/decoder AE same as legacy ecg_anomaly_ae but built via conv1dLayerInit / reluLayerInit / maxPool1dLayerInit / avgPool1dLayerInit / conv1dTransposedLayerInit. Supports BIT_PARITY env-var mode using modelLoadStateDict. Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10 --- examples/CMakeLists.txt | 2 + examples/ecg_anomaly_ae/train_pytorch.py | 27 ++ examples/ecg_anomaly_ae_v2/CMakeLists.txt | 56 ++++ examples/ecg_anomaly_ae_v2/train_c.c | 356 ++++++++++++++++++++ examples/har_classifier/train_pytorch.py | 25 ++ examples/har_classifier_v2/CMakeLists.txt | 62 ++++ examples/har_classifier_v2/train_c.c | 387 ++++++++++++++++++++++ 7 files changed, 915 insertions(+) create mode 100644 examples/ecg_anomaly_ae_v2/CMakeLists.txt create mode 100644 examples/ecg_anomaly_ae_v2/train_c.c create mode 100644 examples/har_classifier_v2/CMakeLists.txt create mode 100644 examples/har_classifier_v2/train_c.c diff --git a/examples/CMakeLists.txt b/examples/CMakeLists.txt index 7872c09..2f2fbaa 100644 --- a/examples/CMakeLists.txt +++ b/examples/CMakeLists.txt @@ -1,3 +1,5 @@ add_subdirectory(_shared) add_subdirectory(har_classifier) +add_subdirectory(har_classifier_v2) add_subdirectory(ecg_anomaly_ae) +add_subdirectory(ecg_anomaly_ae_v2) diff --git a/examples/ecg_anomaly_ae/train_pytorch.py b/examples/ecg_anomaly_ae/train_pytorch.py index 81b5f99..7ef0f64 100644 --- a/examples/ecg_anomaly_ae/train_pytorch.py +++ b/examples/ecg_anomaly_ae/train_pytorch.py @@ -188,6 +188,33 @@ def main() -> None: np.save(OUTPUTS / "pytorch_train_recons.npy", pt_train_recons.astype(np.float32)) print(f"FINAL test_loss={test_loss:.6f}", flush=True) + # Save per-layer weights for the C-side BIT_PARITY mode. 
+ # C-side expects: examples/ecg_anomaly_ae/weights/{name}.{weight,bias}.npy + # Where {name} in {e1, e2, d1, d2, d3} matches the order in v2's buildModel. + import os + + weights_dir = HERE / "weights" + os.makedirs(weights_dir, exist_ok=True) + + # Keys match C-side loadStateDictFromDir() names; values are actual PyTorch attrs. + layer_map = { + "e1": model.enc1, # Conv1d(1->8, K=7, S=2) + "e2": model.enc2, # Conv1d(8->16, K=5) + "d1": model.dec1, # ConvTranspose1d(16->8, K=5, S=5) + "d2": model.dec2, # ConvTranspose1d(8->4, K=2, S=2) + "d3": model.dec3, # ConvTranspose1d(4->1, K=2, S=2) + } + + print("Saving per-layer weights:", flush=True) + for name, layer in layer_map.items(): + w = layer.weight.detach().cpu().numpy().astype(np.float32) + np.save(weights_dir / f"{name}.weight.npy", w) + if layer.bias is not None: + b = layer.bias.detach().cpu().numpy().astype(np.float32) + np.save(weights_dir / f"{name}.bias.npy", b) + has_bias = f" + {name}.bias.npy" if layer.bias is not None else "" + print(f" wrote {name}.weight.npy shape={w.shape}{has_bias}", flush=True) + if __name__ == "__main__": main() diff --git a/examples/ecg_anomaly_ae_v2/CMakeLists.txt b/examples/ecg_anomaly_ae_v2/CMakeLists.txt new file mode 100644 index 0000000..d9a9c07 --- /dev/null +++ b/examples/ecg_anomaly_ae_v2/CMakeLists.txt @@ -0,0 +1,56 @@ +add_executable(train_c_ecg_anomaly_ae_v2 train_c.c) + +target_link_libraries(train_c_ecg_anomaly_ae_v2 PRIVATE + DataLoaderApi + DataLoader + NPYLoaderApi + NPYLoader + + Layer + + Conv1dApi + Conv1d + + Conv1dTransposedApi + Conv1dTransposed + + ReluApi + Relu + + Pool1dApi + MaxPool1d + AvgPool1d + + QuantizationApi + Quantization + + TensorApi + Tensor + Rounding + + TrainingLoopApi + CalculateGradsSequential + TrainingBatchDefault + TrainingEpochDefault + Optimizer + + LossFunction + MSE + + Sgd + SgdApi + + InferenceApi + + StateDictApi + LayerWeightsApi + LayerQuant + LayerCommon + Distributions + + Common + StorageApi + RNG + + examples_shared +) diff --git 
a/examples/ecg_anomaly_ae_v2/train_c.c b/examples/ecg_anomaly_ae_v2/train_c.c new file mode 100644 index 0000000..06c3a32 --- /dev/null +++ b/examples/ecg_anomaly_ae_v2/train_c.c @@ -0,0 +1,356 @@ +#define SOURCE_FILE "ecg_anomaly_ae_v2_train_c" + +#include +#include +#include +#include +#include +#include +#include + +#include "CalculateGradsSequential.h" +#include "Common.h" +#include "Conv1dApi.h" +#include "Conv1dTransposedApi.h" +#include "DataLoader.h" +#include "DataLoaderApi.h" +#include "InferenceApi.h" +#include "Layer.h" +#include "LayerCommon.h" +#include "LayerQuant.h" +#include "LossFunction.h" +#include "NPYLoaderApi.h" +#include "Pool1dApi.h" +#include "Quantization.h" +#include "QuantizationApi.h" +#include "ReluApi.h" +#include "SgdApi.h" +#include "StateDictApi.h" +#include "StorageApi.h" +#include "Tensor.h" +#include "TensorApi.h" +#include "TrainingLoopApi.h" + +#include "npy_writer.h" + +#define EPOCHS 200 +#define BATCH 32 +#define LR 0.005f +#define MOMENTUM 0.9f +#define SEED 42 +#define SHUFFLE_SEED 42 + +#define IN_CHANNELS 1 +#define LEN_INPUT 140 + +#define E1_OUT 8 +#define E1_K 7 +#define E1_S 2 +#define E2_OUT 16 +#define E2_K 5 + +#define D1_OUT 8 +#define D1_K 5 +#define D1_S 5 +#define D2_OUT 4 +#define D2_K 2 +#define D2_S 2 +#define D3_OUT 1 +#define D3_K 2 +#define D3_S 2 + +#define MODEL_SIZE 11 + +static dataset_t g_trainDataset; +static dataset_t g_valDataset; +static dataset_t g_testDataset; + +static void reshapeItemsAddBatchDim(tensorArray_t *items) { + for (size_t i = 0; i < items->size; ++i) { + tensor_t *t = items->array[i]; + size_t oldRank = t->shape->numberOfDimensions; + size_t newRank = oldRank + 1; + + size_t *newDims = reserveMemory(newRank * sizeof(size_t)); + size_t *newOrder = reserveMemory(newRank * sizeof(size_t)); + newDims[0] = 1; + for (size_t d = 0; d < oldRank; ++d) { + newDims[d + 1] = t->shape->dimensions[d]; + } + for (size_t d = 0; d < newRank; ++d) { + newOrder[d] = d; + } + + 
freeReservedMemory(t->shape->dimensions); + freeReservedMemory(t->shape->orderOfDimensions); + t->shape->dimensions = newDims; + t->shape->orderOfDimensions = newOrder; + t->shape->numberOfDimensions = newRank; + } +} + +static void initDataSets(void) { + tensorArray_t *trainItems = npyLoad("examples/ecg_anomaly_ae/data/train_x.npy"); + tensorArray_t *trainLabels = npyLoad("examples/ecg_anomaly_ae/data/train_x.npy"); + reshapeItemsAddBatchDim(trainItems); + reshapeItemsAddBatchDim(trainLabels); + g_trainDataset.items = trainItems; + g_trainDataset.labels = trainLabels; + + tensorArray_t *valItems = npyLoad("examples/ecg_anomaly_ae/data/val_x.npy"); + tensorArray_t *valLabels = npyLoad("examples/ecg_anomaly_ae/data/val_x.npy"); + reshapeItemsAddBatchDim(valItems); + reshapeItemsAddBatchDim(valLabels); + g_valDataset.items = valItems; + g_valDataset.labels = valLabels; + + tensorArray_t *testItems = npyLoad("examples/ecg_anomaly_ae/data/test_x.npy"); + tensorArray_t *testLabels = npyLoad("examples/ecg_anomaly_ae/data/test_x.npy"); + reshapeItemsAddBatchDim(testItems); + reshapeItemsAddBatchDim(testLabels); + g_testDataset.items = testItems; + g_testDataset.labels = testLabels; +} + +static sample_t *getTrainSample(size_t id) { + return npyGetSample(&g_trainDataset, id); +} +static sample_t *getValSample(size_t id) { + return npyGetSample(&g_valDataset, id); +} +static sample_t *getTestSample(size_t id) { + return npyGetSample(&g_testDataset, id); +} +static size_t getTrainSize(void) { + return g_trainDataset.items->size; +} +static size_t getValSize(void) { + return g_valDataset.items->size; +} +static size_t getTestSize(void) { + return g_testDataset.items->size; +} + +static void buildModel(layer_t **model, layerQuant_t *lq) { + /* Encoder */ + model[0] = conv1dLayerInit(&(conv1dInit_t){.inChannels = IN_CHANNELS, + .outChannels = E1_OUT, + .kernelSize = E1_K, + .stride = E1_S, + .padding = SAME}, + lq); + model[1] = reluLayerInit(lq); + model[2] = 
maxPool1dLayerInit( + &(maxPool1dInit_t){ + .kernelSize = 2, .stride = 2, .inputChannels = E1_OUT, .inputLength = LEN_INPUT / E1_S}, + lq); + + model[3] = conv1dLayerInit( + &(conv1dInit_t){ + .inChannels = E1_OUT, .outChannels = E2_OUT, .kernelSize = E2_K, .padding = SAME}, + lq); + model[4] = reluLayerInit(lq); + model[5] = avgPool1dLayerInit(&(avgPool1dInit_t){.kernelSize = 5, .stride = 5}, lq); + + /* Decoder */ + model[6] = conv1dTransposedLayerInit( + &(conv1dTransposedInit_t){ + .inChannels = E2_OUT, .outChannels = D1_OUT, .kernelSize = D1_K, .stride = D1_S}, + lq); + model[7] = reluLayerInit(lq); + + model[8] = conv1dTransposedLayerInit( + &(conv1dTransposedInit_t){ + .inChannels = D1_OUT, .outChannels = D2_OUT, .kernelSize = D2_K, .stride = D2_S}, + lq); + model[9] = reluLayerInit(lq); + + model[10] = conv1dTransposedLayerInit( + &(conv1dTransposedInit_t){ + .inChannels = D2_OUT, .outChannels = D3_OUT, .kernelSize = D3_K, .stride = D3_S}, + lq); +} + +static int loadStateDictFromDir(layer_t **model, const char *weightsDir) { + /* Param layer order in model[]: e1 (0), e2 (3), d1 (6), d2 (8), d3 (10). 5 entries. 
*/ + char wPath[256], bPath[256]; + const char *names[5] = {"e1", "e2", "d1", "d2", "d3"}; + tensor_t *w[5] = {0}; + tensor_t *b[5] = {0}; + + for (int i = 0; i < 5; i++) { + snprintf(wPath, sizeof(wPath), "%s/%s.weight.npy", weightsDir, names[i]); + snprintf(bPath, sizeof(bPath), "%s/%s.bias.npy", weightsDir, names[i]); + tensorArray_t *wArr = npyLoad(wPath); + tensorArray_t *bArr = npyLoad(bPath); + if (wArr == NULL || bArr == NULL) { + fprintf(stderr, "loadStateDictFromDir: missing %s or %s\n", wPath, bPath); + return 1; + } + w[i] = wArr->array[0]; + b[i] = bArr->array[0]; + } + + modelLoadStateDict( + model, MODEL_SIZE, + (stateDictEntry_t[]){ + {.name = names[0], .weightData = (float *)w[0]->data, .biasData = (float *)b[0]->data}, + {.name = names[1], .weightData = (float *)w[1]->data, .biasData = (float *)b[1]->data}, + {.name = names[2], .weightData = (float *)w[2]->data, .biasData = (float *)b[2]->data}, + {.name = names[3], .weightData = (float *)w[3]->data, .biasData = (float *)b[3]->data}, + {.name = names[4], .weightData = (float *)w[4]->data, .biasData = (float *)b[4]->data}, + }, + 5); + return 0; +} + +static FILE *g_log_file = NULL; +static int g_first_epoch = 1; +static struct timespec g_epoch_t0; + +static void epochCallback(size_t epoch, float trainLoss, epochStats_t evalStats) { + struct timespec t1; + clock_gettime(CLOCK_MONOTONIC, &t1); + double wall_s = + (double)(t1.tv_sec - g_epoch_t0.tv_sec) + (double)(t1.tv_nsec - g_epoch_t0.tv_nsec) * 1e-9; + + if (!g_first_epoch) { + fprintf(g_log_file, ",\n"); + } + fprintf(g_log_file, + " {\"epoch\": %zu, \"step_losses\": [], \"train_loss\": %.6f, " + "\"val_loss\": %.6f, \"val_acc\": null, \"wall_s\": %.4f}", + epoch, (double)trainLoss, (double)evalStats.loss, wall_s); + fflush(g_log_file); + g_first_epoch = 0; + + fprintf(stdout, "epoch %zu: train_loss=%.6f val_loss=%.6f wall_s=%.2f\n", epoch, + (double)trainLoss, (double)evalStats.loss, wall_s); + fflush(stdout); + + clock_gettime(CLOCK_MONOTONIC, 
&g_epoch_t0); +} + +static int writeAllReconstructions(layer_t **model, size_t modelSize, + sample_t *(*getSample)(size_t), size_t n, const char *outPath) { + size_t totalElems = n * IN_CHANNELS * LEN_INPUT; + float *buf = malloc(totalElems * sizeof(float)); + if (!buf) { + fprintf(stderr, "OOM allocating reconstruction buffer (n=%zu)\n", n); + return 1; + } + + for (size_t i = 0; i < n; ++i) { + sample_t *s = getSample(i); + tensor_t *out = inference(model, modelSize, s->item); + const float *recon = (const float *)out->data; + memcpy(buf + i * IN_CHANNELS * LEN_INPUT, recon, IN_CHANNELS * LEN_INPUT * sizeof(float)); + freeTensor(out); + freeSample(s); + } + + size_t outShape[3] = {n, IN_CHANNELS, LEN_INPUT}; + int rc = npyWriteFloat32(outPath, buf, outShape, 3); + free(buf); + return rc; +} + +static int ensureDir(const char *p) { + if (mkdir(p, S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH) == 0) { + return 0; + } + if (errno == EEXIST) { + return 0; + } + fprintf(stderr, "ERROR: cannot create %s: %s\n", p, strerror(errno)); + return 1; +} + +int main(void) { + if (ensureDir("examples/ecg_anomaly_ae_v2/logs") != 0) { + return 1; + } + if (ensureDir("examples/ecg_anomaly_ae_v2/outputs") != 0) { + return 1; + } + + initDataSets(); + + dataLoader_t *testLoader = dataLoaderInit(getTestSample, getTestSize, 1, NULL, NULL, + /*shuffle*/ false, /*shuffleSeed*/ 0, + /*dropLast*/ true); + + layerQuant_t lq; + layerQuantInitUniform(&lq, quantizationInitFloat()); + + layer_t *model[MODEL_SIZE]; + buildModel(model, &lq); + + const char *bitParity = getenv("BIT_PARITY"); + if (bitParity != NULL && bitParity[0] != '\0') { + const char *wDir = "examples/ecg_anomaly_ae/weights"; + if (loadStateDictFromDir(model, wDir) != 0) { + fprintf(stderr, "BIT_PARITY: state_dict load failed\n"); + return 1; + } + fprintf(stdout, "BIT_PARITY: loaded state_dict from %s\n", wDir); + } else { + dataLoader_t *trainLoader = dataLoaderInit(getTrainSample, getTrainSize, BATCH, NULL, NULL, + /*shuffle*/ 
true, /*shuffleSeed*/ SHUFFLE_SEED, + /*dropLast*/ true); + dataLoader_t *valLoader = dataLoaderInit(getValSample, getValSize, 1, NULL, NULL, + /*shuffle*/ false, /*shuffleSeed*/ 0, + /*dropLast*/ true); + + optimizer_t *sgd = + sgdMCreateOptim(LR, MOMENTUM, /*weightDecay*/ 0.0f, model, MODEL_SIZE, FLOAT32); + + g_log_file = fopen("examples/ecg_anomaly_ae_v2/logs/c.json", "w"); + if (!g_log_file) { + fprintf(stderr, "ERROR: cannot open log file for writing\n"); + return 1; + } + fprintf(g_log_file, + "{\n" + " \"impl\": \"c_v2\",\n" + " \"example\": \"ecg_anomaly_ae\",\n" + " \"config\": {\"epochs\": %d, \"batch\": %d, \"lr\": %.6f, " + "\"momentum\": %.6f, \"seed\": %d, \"shuffle_seed\": %d},\n" + " \"epochs\": [\n", + EPOCHS, BATCH, (double)LR, (double)MOMENTUM, SEED, SHUFFLE_SEED); + fflush(g_log_file); + + clock_gettime(CLOCK_MONOTONIC, &g_epoch_t0); + + trainingRunResult_t result = trainingRun( + model, MODEL_SIZE, + (lossConfig_t){ + .funcType = MSE, .backwardReduction = REDUCTION_MEAN, .classWeights = NULL}, + trainLoader, valLoader, sgd, EPOCHS, calculateGradsSequential, inferenceWithLoss, + epochCallback); + (void)result; + + float testLoss = + evaluationEpoch(model, MODEL_SIZE, MSE, testLoader, inferenceWithLoss, REDUCTION_MEAN); + + fprintf(g_log_file, + "\n ],\n" + " \"final\": {\"test_loss\": %.6f, \"test_acc\": null, " + "\"test_auc\": null}\n" + "}\n", + (double)testLoss); + fclose(g_log_file); + + fprintf(stdout, "FINAL test_loss=%.6f\n", (double)testLoss); + } + + int status = 0; + int rc = writeAllReconstructions(model, MODEL_SIZE, getTestSample, getTestSize(), + "examples/ecg_anomaly_ae_v2/outputs/c_reconstructions.npy"); + if (rc != 0) { + fprintf(stderr, "ERROR: c_reconstructions.npy write failed (rc=%d)\n", rc); + status = 1; + } + + return status; +} diff --git a/examples/har_classifier/train_pytorch.py b/examples/har_classifier/train_pytorch.py index 84df12f..07149a7 100644 --- a/examples/har_classifier/train_pytorch.py +++ 
b/examples/har_classifier/train_pytorch.py @@ -154,6 +154,31 @@ def main() -> None: np.save(OUTPUTS / "pytorch_predictions.npy", preds) print(f"FINAL test_loss={test_loss:.4f} test_acc={test_acc:.4f}", flush=True) + # Save per-layer weights for the C-side BIT_PARITY mode. + # C-side expects: examples/har_classifier/weights/<name>.{weight,bias}.npy + # Where <name> in {conv1, conv2, conv3, fc} matches the order in v2's buildModel. + import os + + weights_dir = HERE / "weights" + os.makedirs(weights_dir, exist_ok=True) + + layer_map = { + "conv1": model.conv1, + "conv2": model.conv2, + "conv3": model.conv3, + "fc": model.fc, + } + + print("Saving per-layer weights:", flush=True) + for name, layer in layer_map.items(): + w = layer.weight.detach().cpu().numpy().astype(np.float32) + np.save(weights_dir / f"{name}.weight.npy", w) + if layer.bias is not None: + b = layer.bias.detach().cpu().numpy().astype(np.float32) + np.save(weights_dir / f"{name}.bias.npy", b) + has_bias = f" + {name}.bias.npy" if layer.bias is not None else "" + print(f" wrote {name}.weight.npy shape={w.shape}{has_bias}", flush=True) + if __name__ == "__main__": main() diff --git a/examples/har_classifier_v2/CMakeLists.txt b/examples/har_classifier_v2/CMakeLists.txt new file mode 100644 index 0000000..ad72f40 --- /dev/null +++ b/examples/har_classifier_v2/CMakeLists.txt @@ -0,0 +1,62 @@ +add_executable(train_c_har_classifier_v2 train_c.c) + +target_link_libraries(train_c_har_classifier_v2 PRIVATE + DataLoaderApi + DataLoader + NPYLoaderApi + NPYLoader + + Layer + + Conv1dApi + Conv1d + + LinearApi + Linear + + ReluApi + Relu + + FlattenApi + Flatten + + Pool1dApi + MaxPool1d + AvgPool1d + + QuantizationApi + Quantization + + TensorApi + Tensor + Rounding + + TrainingLoopApi + CalculateGradsSequential + TrainingBatchDefault + TrainingEpochDefault + Optimizer + + LossFunction + CrossEntropy + + SoftmaxApi + Softmax + + Sgd + SgdApi + + InferenceApi + + StateDictApi + LayerWeightsApi + LayerQuant + LayerCommon + 
Distributions + + Common + StorageApi + RNG + + examples_shared +) diff --git a/examples/har_classifier_v2/train_c.c b/examples/har_classifier_v2/train_c.c new file mode 100644 index 0000000..0171a80 --- /dev/null +++ b/examples/har_classifier_v2/train_c.c @@ -0,0 +1,387 @@ +#define SOURCE_FILE "har_classifier_v2_train_c" + +#include +#include +#include +#include +#include +#include +#include + +#include "CalculateGradsSequential.h" +#include "Common.h" +#include "Conv1dApi.h" +#include "DataLoader.h" +#include "DataLoaderApi.h" +#include "FlattenApi.h" +#include "InferenceApi.h" +#include "Layer.h" +#include "LayerCommon.h" +#include "LayerQuant.h" +#include "LinearApi.h" +#include "LossFunction.h" +#include "NPYLoaderApi.h" +#include "Pool1dApi.h" +#include "Quantization.h" +#include "QuantizationApi.h" +#include "ReluApi.h" +#include "SgdApi.h" +#include "SoftmaxApi.h" +#include "StateDictApi.h" +#include "StorageApi.h" +#include "Tensor.h" +#include "TensorApi.h" +#include "TrainingLoopApi.h" + +#include "npy_writer.h" + +#define EPOCHS 20 +#define BATCH 64 +#define LR 0.01f +#define MOMENTUM 0.9f +#define SEED 42 +#define SHUFFLE_SEED 42 +#define NUM_CLASSES 6 + +#define IN_CHANNELS 9 +#define LEN_INPUT 128 + +#define C1_OUT 16 +#define C1_K 7 +#define C2_OUT 32 +#define C2_K 5 +#define C3_OUT 64 +#define C3_K 3 + +/* 3 x (Conv1d + ReLU + Pool) + Flatten + Linear + Softmax = 12 layers */ +#define MODEL_SIZE 12 + +static dataset_t g_trainDataset; +static dataset_t g_valDataset; +static dataset_t g_testDataset; + +static void reshapeItemsAddBatchDim(tensorArray_t *items) { + for (size_t i = 0; i < items->size; ++i) { + tensor_t *t = items->array[i]; + size_t oldRank = t->shape->numberOfDimensions; + size_t newRank = oldRank + 1; + + size_t *newDims = reserveMemory(newRank * sizeof(size_t)); + size_t *newOrder = reserveMemory(newRank * sizeof(size_t)); + newDims[0] = 1; + for (size_t d = 0; d < oldRank; ++d) { + newDims[d + 1] = t->shape->dimensions[d]; + } + for 
(size_t d = 0; d < newRank; ++d) { + newOrder[d] = d; + } + + freeReservedMemory(t->shape->dimensions); + freeReservedMemory(t->shape->orderOfDimensions); + t->shape->dimensions = newDims; + t->shape->orderOfDimensions = newOrder; + t->shape->numberOfDimensions = newRank; + } +} + +static tensorArray_t *buildOneHotLabels(tensorArray_t *intLabels) { + tensorArray_t *out = reserveMemory(sizeof(tensorArray_t)); + tensor_t **arr = reserveMemory(intLabels->size * sizeof(tensor_t *)); + out->array = arr; + out->size = intLabels->size; + + for (size_t i = 0; i < intLabels->size; ++i) { + size_t *dims = reserveMemory(1 * sizeof(size_t)); + size_t *order = reserveMemory(1 * sizeof(size_t)); + dims[0] = NUM_CLASSES; + order[0] = 0; + shape_t *shape = reserveMemory(sizeof(shape_t)); + shape->dimensions = dims; + shape->orderOfDimensions = order; + shape->numberOfDimensions = 1; + + quantization_t *q = quantizationInitFloat(); + tensor_t *t = initTensor(shape, q, NULL); + + int32_t cls = ((int32_t *)intLabels->array[i]->data)[0]; + float *data = (float *)t->data; + for (size_t c = 0; c < NUM_CLASSES; ++c) { + data[c] = (c == (size_t)cls) ? 1.0f : 0.0f; + } + arr[i] = t; + } + return out; +} + +static void initDataSets(void) { + /* Data path: reuse legacy directory; v2 doesn't duplicate the data. 
*/ + tensorArray_t *trainItems = npyLoad("examples/har_classifier/data/train_x.npy"); + tensorArray_t *trainLabelsRaw = npyLoad("examples/har_classifier/data/train_y.npy"); + reshapeItemsAddBatchDim(trainItems); + g_trainDataset.items = trainItems; + g_trainDataset.labels = buildOneHotLabels(trainLabelsRaw); + + tensorArray_t *valItems = npyLoad("examples/har_classifier/data/val_x.npy"); + tensorArray_t *valLabelsRaw = npyLoad("examples/har_classifier/data/val_y.npy"); + reshapeItemsAddBatchDim(valItems); + g_valDataset.items = valItems; + g_valDataset.labels = buildOneHotLabels(valLabelsRaw); + + tensorArray_t *testItems = npyLoad("examples/har_classifier/data/test_x.npy"); + tensorArray_t *testLabelsRaw = npyLoad("examples/har_classifier/data/test_y.npy"); + reshapeItemsAddBatchDim(testItems); + g_testDataset.items = testItems; + g_testDataset.labels = buildOneHotLabels(testLabelsRaw); +} + +static sample_t *getTrainSample(size_t id) { + return npyGetSample(&g_trainDataset, id); +} +static sample_t *getValSample(size_t id) { + return npyGetSample(&g_valDataset, id); +} +static sample_t *getTestSample(size_t id) { + return npyGetSample(&g_testDataset, id); +} +static size_t getTrainSize(void) { + return g_trainDataset.items->size; +} +static size_t getValSize(void) { + return g_valDataset.items->size; +} +static size_t getTestSize(void) { + return g_testDataset.items->size; +} + +static void buildModel(layer_t **model, layerQuant_t *lq) { + /* Block 1: Conv1d(9->16, K=7, padding=SAME), ReLU, MaxPool(K=2, S=2). 
*/ + model[0] = conv1dLayerInit( + &(conv1dInit_t){ + .inChannels = IN_CHANNELS, .outChannels = C1_OUT, .kernelSize = C1_K, .padding = SAME}, + lq); + model[1] = reluLayerInit(lq); + model[2] = maxPool1dLayerInit( + &(maxPool1dInit_t){ + .kernelSize = 2, .stride = 2, .inputChannels = C1_OUT, .inputLength = LEN_INPUT}, + lq); + + /* Block 2 */ + model[3] = conv1dLayerInit( + &(conv1dInit_t){ + .inChannels = C1_OUT, .outChannels = C2_OUT, .kernelSize = C2_K, .padding = SAME}, + lq); + model[4] = reluLayerInit(lq); + model[5] = maxPool1dLayerInit( + &(maxPool1dInit_t){ + .kernelSize = 2, .stride = 2, .inputChannels = C2_OUT, .inputLength = LEN_INPUT / 2}, + lq); + + /* Block 3 */ + model[6] = conv1dLayerInit( + &(conv1dInit_t){ + .inChannels = C2_OUT, .outChannels = C3_OUT, .kernelSize = C3_K, .padding = SAME}, + lq); + model[7] = reluLayerInit(lq); + model[8] = avgPool1dLayerInit( + &(avgPool1dInit_t){.kernelSize = LEN_INPUT / 4, .stride = LEN_INPUT / 4}, lq); + + /* Head */ + model[9] = flattenLayerInit(); + model[10] = + linearLayerInit(&(linearInit_t){.inFeatures = C3_OUT, .outFeatures = NUM_CLASSES}, lq); + model[11] = softmaxLayerInit(lq); +} + +/* Load PyTorch state_dict from per-layer .npy files written by + * examples/har_classifier/train_pytorch.py (saved on every run). + * + * Returns 0 on success, non-zero on first missing file. */ +static int loadStateDictFromDir(layer_t **model, const char *weightsDir) { + /* Param layer order in model[]: model[0] conv1, model[3] conv2, + * model[6] conv3, model[10] fc. 4 entries. 
*/ + char wPath[256], bPath[256]; + const char *names[4] = {"conv1", "conv2", "conv3", "fc"}; + tensor_t *w[4] = {0}; + tensor_t *b[4] = {0}; + + for (int i = 0; i < 4; i++) { + snprintf(wPath, sizeof(wPath), "%s/%s.weight.npy", weightsDir, names[i]); + snprintf(bPath, sizeof(bPath), "%s/%s.bias.npy", weightsDir, names[i]); + tensorArray_t *wArr = npyLoad(wPath); + tensorArray_t *bArr = npyLoad(bPath); + if (wArr == NULL || bArr == NULL) { + fprintf(stderr, "loadStateDictFromDir: missing %s or %s\n", wPath, bPath); + return 1; + } + w[i] = wArr->array[0]; + b[i] = bArr->array[0]; + } + + modelLoadStateDict( + model, MODEL_SIZE, + (stateDictEntry_t[]){ + {.name = names[0], .weightData = (float *)w[0]->data, .biasData = (float *)b[0]->data}, + {.name = names[1], .weightData = (float *)w[1]->data, .biasData = (float *)b[1]->data}, + {.name = names[2], .weightData = (float *)w[2]->data, .biasData = (float *)b[2]->data}, + {.name = names[3], .weightData = (float *)w[3]->data, .biasData = (float *)b[3]->data}, + }, + 4); + return 0; +} + +static FILE *g_log_file = NULL; +static int g_first_epoch = 1; +static struct timespec g_epoch_t0; + +static void epochCallback(size_t epoch, float trainLoss, epochStats_t evalStats) { + struct timespec t1; + clock_gettime(CLOCK_MONOTONIC, &t1); + double wall_s = + (double)(t1.tv_sec - g_epoch_t0.tv_sec) + (double)(t1.tv_nsec - g_epoch_t0.tv_nsec) * 1e-9; + + if (!g_first_epoch) { + fprintf(g_log_file, ",\n"); + } + fprintf(g_log_file, + " {\"epoch\": %zu, \"step_losses\": [], \"train_loss\": %.6f, " + "\"val_loss\": %.6f, \"val_acc\": %.6f, \"wall_s\": %.4f}", + epoch, (double)trainLoss, (double)evalStats.loss, (double)evalStats.accuracy, wall_s); + fflush(g_log_file); + g_first_epoch = 0; + + fprintf(stdout, "epoch %zu: train_loss=%.4f val_loss=%.4f val_acc=%.4f wall_s=%.2f\n", epoch, + (double)trainLoss, (double)evalStats.loss, (double)evalStats.accuracy, wall_s); + fflush(stdout); + + clock_gettime(CLOCK_MONOTONIC, &g_epoch_t0); +} 
+ +static int ensureDir(const char *p) { + if (mkdir(p, S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH) == 0) { + return 0; + } + if (errno == EEXIST) { + return 0; + } + fprintf(stderr, "ERROR: cannot create %s: %s\n", p, strerror(errno)); + return 1; +} + +int main(void) { + if (ensureDir("examples/har_classifier_v2/logs") != 0) { + return 1; + } + if (ensureDir("examples/har_classifier_v2/outputs") != 0) { + return 1; + } + + initDataSets(); + + dataLoader_t *testLoader = dataLoaderInit(getTestSample, getTestSize, 1, NULL, NULL, + /*shuffle*/ false, /*shuffleSeed*/ 0, + /*dropLast*/ true); + + layerQuant_t lq; + layerQuantInitUniform(&lq, quantizationInitFloat()); + + layer_t *model[MODEL_SIZE]; + buildModel(model, &lq); + + const char *bitParity = getenv("BIT_PARITY"); + if (bitParity != NULL && bitParity[0] != '\0') { + /* Bit-parity mode: load PyTorch state_dict, skip training, run inference. */ + const char *wDir = "examples/har_classifier/weights"; + if (loadStateDictFromDir(model, wDir) != 0) { + fprintf(stderr, "BIT_PARITY: state_dict load failed\n"); + return 1; + } + fprintf(stdout, "BIT_PARITY: loaded state_dict from %s\n", wDir); + } else { + dataLoader_t *trainLoader = dataLoaderInit(getTrainSample, getTrainSize, BATCH, NULL, NULL, + /*shuffle*/ true, /*shuffleSeed*/ SHUFFLE_SEED, + /*dropLast*/ true); + dataLoader_t *valLoader = dataLoaderInit(getValSample, getValSize, 1, NULL, NULL, + /*shuffle*/ false, /*shuffleSeed*/ 0, + /*dropLast*/ true); + + optimizer_t *sgd = + sgdMCreateOptim(LR, MOMENTUM, /*weightDecay*/ 0.0f, model, MODEL_SIZE, FLOAT32); + + g_log_file = fopen("examples/har_classifier_v2/logs/c.json", "w"); + if (!g_log_file) { + fprintf(stderr, "ERROR: cannot open log file for writing\n"); + return 1; + } + fprintf(g_log_file, + "{\n" + " \"impl\": \"c_v2\",\n" + " \"example\": \"har_classifier\",\n" + " \"config\": {\"epochs\": %d, \"batch\": %d, \"lr\": %.6f, " + "\"momentum\": %.6f, \"seed\": %d, \"shuffle_seed\": %d},\n" + " \"epochs\": [\n", 
+ EPOCHS, BATCH, (double)LR, (double)MOMENTUM, SEED, SHUFFLE_SEED); + fflush(g_log_file); + + clock_gettime(CLOCK_MONOTONIC, &g_epoch_t0); + + trainingRunResult_t result = + trainingRun(model, MODEL_SIZE, + (lossConfig_t){.funcType = CROSS_ENTROPY, + .backwardReduction = REDUCTION_MEAN, + .classWeights = NULL}, + trainLoader, valLoader, sgd, EPOCHS, calculateGradsSequential, + inferenceWithLoss, epochCallback); + (void)result; + + epochStats_t testStats = evaluationEpochWithMetrics( + model, MODEL_SIZE, CROSS_ENTROPY, testLoader, inferenceWithLoss, REDUCTION_MEAN); + + fprintf(g_log_file, + "\n ],\n" + " \"final\": {\"test_loss\": %.6f, \"test_acc\": %.6f, " + "\"test_auc\": null}\n" + "}\n", + (double)testStats.loss, (double)testStats.accuracy); + fclose(g_log_file); + + fprintf(stdout, "FINAL test_loss=%.4f test_acc=%.4f\n", (double)testStats.loss, + (double)testStats.accuracy); + } + + /* Predictions on test set (both modes). */ + size_t numTest = getTestSize(); + int32_t *predictions = malloc(numTest * sizeof(int32_t)); + if (!predictions) { + fprintf(stderr, "OOM allocating predictions\n"); + return 1; + } + + for (size_t i = 0; i < numTest; ++i) { + sample_t *s = getTestSample(i); + tensor_t *out = inference(model, MODEL_SIZE, s->item); + float *probs = (float *)out->data; + size_t argmax = 0; + float best = probs[0]; + for (size_t c = 1; c < NUM_CLASSES; ++c) { + if (probs[c] > best) { + best = probs[c]; + argmax = c; + } + } + predictions[i] = (int32_t)argmax; + freeTensor(out); + freeSample(s); + } + + size_t outShape[] = {numTest}; + int status = 0; + int rc = npyWriteInt32("examples/har_classifier_v2/outputs/c_predictions.npy", predictions, + outShape, 1); + if (rc != 0) { + fprintf(stderr, "ERROR: npyWriteInt32 failed (rc=%d)\n", rc); + status = 1; + } + free(predictions); + + return status; +} From 979826067311739fac60d0d17f77c311f80806bb Mon Sep 17 00:00:00 2001 From: Leo Buron Date: Fri, 15 May 2026 22:23:31 +0200 Subject: [PATCH 4/4] 
=?UTF-8?q?ci:=20add=20bit-parity=20job=20=E2=80=94=20?= =?UTF-8?q?HAR=20and=20ECG=20v2=20binaries=20diff'd=20against=20PyTorch=20?= =?UTF-8?q?reference?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit New job c-bit-parity runs in parallel with c-build-and-test. Steps: 1. PyTorch trains HAR + ECG (emits pytorch_*.npy + per-layer weights) 2. Builds the two v2 binaries via cmake --preset examples 3. Runs both v2 binaries with BIT_PARITY=1 (loads state_dict via modelLoadStateDict, skips training, writes inference outputs) 4. uv-run examples/_shared/compare_predictions.py per example — exact match for HAR int32, allclose (rtol=1e-4, atol=1e-5) for ECG float32 The job fails the CI if the new factories produce different inference outputs than PyTorch with the same weights — catches factory-wiring regressions immediately. Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10 --- .github/workflows/ci.yml | 60 +++++++++++++++++++++++ examples/_shared/compare_predictions.py | 63 +++++++++++++++++++++++++ 2 files changed, 123 insertions(+) create mode 100644 examples/_shared/compare_predictions.py diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 4f42cb7..ec50d4c 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -103,6 +103,66 @@ jobs: - name: Test run: ctest --preset unit_test_asan + c-bit-parity: + runs-on: ubuntu-latest + + steps: + - uses: actions/checkout@v4 + + - name: Install dependencies + run: sudo apt-get update && sudo apt-get install -y cmake ninja-build gcc + + - name: Install uv + uses: astral-sh/setup-uv@v6 + + - name: Set up Python + run: uv python install 3.12 + + - name: Sync Python deps + run: uv sync + + - name: Prepare HAR data + run: uv run examples/har_classifier/prepare_data.py + + - name: Prepare ECG data + run: uv run examples/ecg_anomaly_ae/prepare_data.py + + - name: Train PyTorch HAR (produces reference predictions + weights) + run: 
uv run examples/har_classifier/train_pytorch.py + + - name: Train PyTorch ECG (produces reference reconstructions + weights) + run: uv run examples/ecg_anomaly_ae/train_pytorch.py + + - name: Configure + run: cmake --preset examples + + - name: Build v2 binaries + run: | + cmake --build --preset examples --target train_c_har_classifier_v2 + cmake --build --preset examples --target train_c_ecg_anomaly_ae_v2 + + - name: Run HAR v2 in BIT_PARITY mode + run: BIT_PARITY=1 build/examples/examples/har_classifier_v2/train_c_har_classifier_v2 + + - name: Run ECG v2 in BIT_PARITY mode + run: BIT_PARITY=1 build/examples/examples/ecg_anomaly_ae_v2/train_c_ecg_anomaly_ae_v2 + + - name: Diff HAR predictions (int32, exact match required) + run: | + uv run examples/_shared/compare_predictions.py \ + --pytorch examples/har_classifier/outputs/pytorch_predictions.npy \ + --c examples/har_classifier_v2/outputs/c_predictions.npy \ + --dtype int32 + + - name: Diff ECG reconstructions (float32, allclose) + run: | + uv run examples/_shared/compare_predictions.py \ + --pytorch examples/ecg_anomaly_ae/outputs/pytorch_reconstructions.npy \ + --c examples/ecg_anomaly_ae_v2/outputs/c_reconstructions.npy \ + --dtype float32 \ + --rtol 1e-4 \ + --atol 1e-5 + python-test: runs-on: ubuntu-latest diff --git a/examples/_shared/compare_predictions.py b/examples/_shared/compare_predictions.py new file mode 100644 index 0000000..50e797e --- /dev/null +++ b/examples/_shared/compare_predictions.py @@ -0,0 +1,63 @@ +"""Compare C-side predictions/reconstructions against PyTorch reference outputs. + +Used by the bit-parity CI step. Exits 0 on match, 1 on mismatch. 
+ +Usage: + uv run examples/_shared/compare_predictions.py \\ + --pytorch \\ + --c \\ + --dtype {int32,float32} \\ + [--rtol 1e-4] [--atol 1e-5] +""" + +import argparse +import sys +import numpy as np + + +def main() -> int: + parser = argparse.ArgumentParser() + parser.add_argument("--pytorch", required=True, help="PyTorch reference .npy") + parser.add_argument("--c", required=True, help="C-side .npy") + parser.add_argument("--dtype", required=True, choices=["int32", "float32"]) + parser.add_argument("--rtol", type=float, default=1e-4) + parser.add_argument("--atol", type=float, default=1e-5) + args = parser.parse_args() + + py = np.load(args.pytorch) + c = np.load(args.c) + + if py.shape != c.shape: + print(f"FAIL: shape mismatch — pytorch={py.shape}, c={c.shape}", file=sys.stderr) + return 1 + + if args.dtype == "int32": + if not np.array_equal(py, c): + mismatches = np.flatnonzero(py != c) + print(f"FAIL: int32 mismatch at {mismatches.size}/{py.size} positions", + file=sys.stderr) + for idx in mismatches[:5]: + print(f" idx={idx}: pytorch={py.flat[idx]}, c={c.flat[idx]}", file=sys.stderr) + return 1 + print(f"PASS: int32 arrays bit-identical ({py.size} elements)") + return 0 + + # float32 + if not np.allclose(py, c, rtol=args.rtol, atol=args.atol): + diffs = np.abs(py - c) + max_diff = diffs.max() + rel_diffs = diffs / (np.abs(py) + args.atol) + max_rel = rel_diffs.max() + print(f"FAIL: float32 mismatch — max_abs={max_diff:.6e}, " + f"max_rel={max_rel:.6e}, rtol={args.rtol}, atol={args.atol}", file=sys.stderr) + worst = np.argmax(diffs) + print(f" worst idx={worst}: pytorch={py.flat[worst]:.6e}, c={c.flat[worst]:.6e}", + file=sys.stderr) + return 1 + print(f"PASS: float32 arrays close (rtol={args.rtol}, atol={args.atol}, " + f"{py.size} elements)") + return 0 + + +if __name__ == "__main__": + sys.exit(main())
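Note on the two comparison modes used by the bit-parity job: the HAR check requires element-wise equality of the int32 class indices, while the ECG check accepts float32 values within a combined relative and absolute tolerance. The sketch below restates both rules in plain Python for illustration. It is not part of the patch, and the helper names `exact_match` and `within_tolerance` are hypothetical. The tolerance rule follows the documented `np.allclose(a, b)` criterion, `|a - b| <= atol + rtol * |b|`, where the script passes the PyTorch array first and the C-side array second.

```python
def exact_match(reference, candidate):
    """Hypothetical helper: True only if both sequences are identical element-wise."""
    return list(reference) == list(candidate)


def within_tolerance(reference, candidate, rtol=1e-4, atol=1e-5):
    """Hypothetical helper mirroring the np.allclose criterion.

    Each pair must satisfy |reference - candidate| <= atol + rtol * |candidate|,
    so the allowed error scales with the magnitude of the candidate value.
    """
    if len(reference) != len(candidate):
        return False
    return all(
        abs(r - c) <= atol + rtol * abs(c)
        for r, c in zip(reference, candidate)
    )


if __name__ == "__main__":
    # Class indices: a single differing label fails the check.
    print(exact_match([0, 3, 5], [0, 3, 5]))
    print(exact_match([0, 3, 5], [0, 3, 4]))
    # Reconstructions: a 5e-5 deviation near 1.0 passes, a 1e-3 deviation does not.
    print(within_tolerance([1.0, 2.0], [1.00005, 2.0]))
    print(within_tolerance([1.0], [1.001]))
```

Near zero the `atol` term dominates, so very small reconstructed values are compared almost purely on absolute error; for larger magnitudes the `rtol` term takes over.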