From e88e4a109108408e59eeeb15142afa4d80ac5d95 Mon Sep 17 00:00:00 2001
From: Leo Buron <leo.buron@uni-due.de>
Date: Fri, 15 May 2026 22:12:01 +0200
Subject: [PATCH 1/4] =?UTF-8?q?feat(userApi):=20layerLoadWeights=20CONV1D?=
 =?UTF-8?q?=5FTRANSPOSED=20dispatch=20=E2=80=94=20weight=20+=20bias=20memc?=
 =?UTF-8?q?py?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Mirrors the CONV1D case but routes through conv1dTransposedConfig_t.
The caller-side weight buffer has shape [inChannels, outChannels/groups,
kernelSize]. The first two dimensions are swapped relative to Conv1d;
this is documented in the Conv1dTransposedApi header.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1
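
To make the swap concrete, here is a minimal sketch of the flat row-major
offsets into each caller-side buffer. The helper names are hypothetical
(not part of this patch), and the sketch assumes both inChannels and
outChannels are divisible by groups.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not part of this patch: flat row-major offsets
 * into the caller-side weight buffers handed to layerLoadWeights. */

/* Conv1d layout: [outChannels, inChannels / groups, kernelSize].
 * o = output channel, i = input channel within its group, k = tap. */
static size_t conv1dWeightOffset(size_t o, size_t i, size_t k, size_t inChannels,
                                 size_t groups, size_t kernelSize) {
    size_t inPerGroup = inChannels / groups;
    return (o * inPerGroup + i) * kernelSize + k;
}

/* Conv1dTransposed layout: [inChannels, outChannels / groups, kernelSize].
 * i = input channel, o = output channel within its group, k = tap. */
static size_t conv1dTransposedWeightOffset(size_t i, size_t o, size_t k, size_t outChannels,
                                           size_t groups, size_t kernelSize) {
    size_t outPerGroup = outChannels / groups;
    return (i * outPerGroup + o) * kernelSize + k;
}

/* Number of floats the caller supplies; the same for both layouts when
 * inChannels and outChannels are multiples of groups. */
static size_t weightElementCount(size_t inChannels, size_t outChannels, size_t groups,
                                 size_t kernelSize) {
    return inChannels * (outChannels / groups) * kernelSize;
}
```

Only the order of the leading two axes differs; the element count is
unchanged, so the memcpy size in layerLoadWeights is the same formula.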

refactor(userApi): rename Softmax factories to *Legacy for new API coexistence

Frees the canonical softmaxLayerInit / freeSoftmaxLayer names. Legacy
bodies are functionally unchanged except for an explicit
softmaxConfig->ownsQuantizations = false (defensive; it matches the
calloc zero default).

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md

feat(userApi): declare conv1dInit_t and new Conv1d factory signatures

Adds the per-layer init struct, the Borrowing/Owning factory decls, and
the freeConv1dLayer decl. A _Static_assert guarantees that the VALID
enumerator of paddingType_t stays 0, so a zero-initialized .padding
defaults to VALID. No impl yet; that follows in the next two commits.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 3.4, 4
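
A standalone sketch of the zero-init default pattern, assuming a stand-in
paddingType_t whose VALID enumerator is 0 (mirroring the _Static_assert).
sketchConv1dInit_t, orDefault and resolveDefaults are hypothetical names;
the real resolution code lands in the following commits.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real paddingType_t (LayerCommon.h); only VALID == 0 is
 * taken from the patch, the rest is assumed. */
typedef enum { VALID = 0, SAME } paddingType_t;

_Static_assert(VALID == 0, "VALID must be 0 so an omitted .padding means VALID");

/* Hypothetical mirror of conv1dInit_t without the bias field. */
typedef struct {
    size_t inChannels, outChannels, kernelSize; /* required */
    size_t stride;                              /* 0 -> 1 */
    paddingType_t padding;                      /* 0 -> VALID */
    size_t dilation;                            /* 0 -> 1 */
    size_t groups;                              /* 0 -> 1 */
} sketchConv1dInit_t;

/* Treat 0 as "field omitted from the compound literal". */
static size_t orDefault(size_t value, size_t fallback) {
    return value == 0 ? fallback : value;
}

static sketchConv1dInit_t resolveDefaults(const sketchConv1dInit_t *in) {
    sketchConv1dInit_t out = *in;
    out.stride = orDefault(in->stride, 1);
    out.dilation = orDefault(in->dilation, 1);
    out.groups = orDefault(in->groups, 1);
    /* padding needs no fix-up: zero-init already yields VALID */
    return out;
}
```

The static assert is what makes the padding case free: if someone
reordered the enum, zero-init would silently select a different mode.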

test(userApi): failing tests for new conv1dLayerInit Borrowing variant

Four tests: shape correctness with explicit fields, BIAS_DEFAULT
resolution, BIAS_FALSE leaves bias NULL, padding/stride/dilation/groups
zero-init defaults (VALID/1/1/1). Fails at link until impl lands.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 4, 5.1
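
One plausible shape for the BIAS_DEFAULT resolution these tests exercise,
sketched against a stand-in enum. BIAS_TRUE and the enumerator order
beyond BIAS_DEFAULT == 0 are assumptions, and resolveBias is a
hypothetical name (the patch's own resolver for Linear is
resolveLinearBias, whose body is not shown here).

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for bias_t. BIAS_DEFAULT == 0 matches the header comment;
 * BIAS_TRUE and the order of the other enumerators are assumptions. */
typedef enum { BIAS_DEFAULT = 0, BIAS_TRUE, BIAS_FALSE } sketchBias_t;

/* Hypothetical resolver: BIAS_DEFAULT resolves to true (PyTorch's
 * bias=True default), so an omitted .bias field allocates a bias tensor
 * and only an explicit BIAS_FALSE leaves it NULL. */
static bool resolveBias(sketchBias_t b) {
    switch (b) {
    case BIAS_FALSE:
        return false;
    case BIAS_TRUE:
    case BIAS_DEFAULT:
    default:
        return true;
    }
}
```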

feat(userApi): declare conv1dTransposedInit_t and Conv1dTransposed factory signatures

New Conv1dTransposedApi.h header with conv1dTransposedInit_t struct,
Borrowing/Owning factory decls, freeConv1dTransposedLayer decl, and
_Static_assert(VALID == 0). Stub .c file registered in CMake.
Implementation in subsequent commits.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 3.4, 4
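
For reference, the output length these parameters imply, assuming the
factory follows PyTorch's ConvTranspose1d formula with zero padding
(VALID). The helper names are hypothetical. For example, a length-7
input with kernelSize = 5 and stride = 5 yields 35 outputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Output length of a 1D transposed convolution with zero padding, per
 * PyTorch's ConvTranspose1d formula:
 *   (L_in - 1) * stride + dilation * (kernelSize - 1) + outputPadding + 1
 * Assumes inputLength >= 1 and stride/dilation already resolved (>= 1). */
static size_t conv1dTransposedOutputLength(size_t inputLength, size_t kernelSize, size_t stride,
                                           size_t dilation, size_t outputPadding) {
    return (inputLength - 1) * stride + dilation * (kernelSize - 1) + outputPadding + 1;
}

/* The constraint documented on conv1dTransposedInit_t::outputPadding. */
static bool outputPaddingIsValid(size_t outputPadding, size_t stride, size_t dilation) {
    size_t limit = stride > dilation ? stride : dilation;
    return outputPadding < limit;
}
```

outputPadding exists because several input lengths can map to the same
strided output length; it lets the caller pick the longer one.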

test(userApi): failing tests for new conv1dTransposedLayerInit Borrowing variant

Three tests: shape correctness (the weight shape has inChannels and
outChannels swapped relative to Conv1d), BIAS_FALSE leaves bias NULL,
and outputPadding propagates to the internal config. Fails at link
until impl lands.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 4

feat(userApi): declare Pool1d factory signatures (Max + Avg)

Splits the spec's shared pool1dInit_t into maxPool1dInit_t (with
inputChannels + inputLength for argmax pre-allocation) and
avgPool1dInit_t (no input geometry, no dilation). Both factory pairs
declared in one header; impl stubbed in Pool1dApi.c for the CMake
graph. Per-layer init code follows.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3 (with documented split), 3.4, 4
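
A sketch of why the max pool needs input geometry up front: the argmax
buffer size depends on the output length. It assumes an unpadded window
and one stored index per output element per channel; the function names
are hypothetical. With the ECG example's MaxPool1d(K=2, S=2), a length-70
input gives 35 outputs per channel.

```c
#include <assert.h>
#include <stddef.h>

/* Output length of an unpadded 1D max pool (PyTorch MaxPool1d formula
 * with padding = 0). Assumes stride/dilation are already resolved and the
 * dilated window fits: dilation * (kernelSize - 1) + 1 <= inputLength. */
static size_t maxPool1dOutputLength(size_t inputLength, size_t kernelSize, size_t stride,
                                    size_t dilation) {
    size_t window = dilation * (kernelSize - 1) + 1;
    return (inputLength - window) / stride + 1;
}

/* Hypothetical argmax sizing: one index per output element per channel.
 * AvgPool stores nothing per element, hence no geometry in its init. */
static size_t maxPool1dArgmaxCount(size_t inputChannels, size_t inputLength, size_t kernelSize,
                                   size_t stride, size_t dilation) {
    return inputChannels * maxPool1dOutputLength(inputLength, kernelSize, stride, dilation);
}
```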

test(userApi): failing tests for maxPool1dLayerInit Borrowing + Owning

Four tests: kernel + argmax shape correctness, stride defaulting to
kernelSize (PyTorch pool convention), Owning deep-copy of forwardMath
and backwardMath into the two pool config slots (forwardQ + propLossQ),
and a leak-check loop. Fails at link until impl lands.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3 (split), 4
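
The stride default tested here reduces to a one-line rule, sketched below
with a hypothetical helper name. Note that it differs from Conv1d, where a
zero stride resolves to 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: a zero-init .stride on a pool init struct means
 * "same as kernelSize" (PyTorch's stride=None default for pooling), which
 * gives non-overlapping windows. */
static size_t resolvePoolStride(size_t stride, size_t kernelSize) {
    return stride == 0 ? kernelSize : stride;
}
```

With the default, AvgPool1d(K=5) over 35 samples yields (35 - 5) / 5 + 1
= 7 windows, matching the ECG example's bottleneck.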

test(userApi): failing tests for avgPool1dLayerInit Borrowing + Owning

Three tests: kernel correctness, stride defaulting to kernelSize, and
Owning deep-copy. AvgPool has no dilation (struct field omitted) and
no argmax tensor (no input geometry required).

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 4
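
A minimal single-channel reference for the AvgPool1d forward pass,
assuming no padding and no dilation. avgPool1dForward is a hypothetical
name, not the library's kernel; it shows that only kernelSize and stride
are needed and nothing is allocated per input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-channel 1D average pool (unpadded, undilated).
 * Writes one mean per window into output and returns the output length,
 * or 0 if the window does not fit. */
static size_t avgPool1dForward(const float *input, size_t inputLength, size_t kernelSize,
                               size_t stride, float *output) {
    if (stride == 0) {
        stride = kernelSize; /* zero-init default, as in the pool factories */
    }
    if (kernelSize == 0 || kernelSize > inputLength) {
        return 0;
    }
    size_t outputLength = (inputLength - kernelSize) / stride + 1;
    for (size_t o = 0; o < outputLength; ++o) {
        float sum = 0.0f;
        for (size_t k = 0; k < kernelSize; ++k) {
            sum += input[o * stride + k];
        }
        output[o] = sum / (float)kernelSize;
    }
    return outputLength;
}
```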

test(userApi): failing tests for layerLoadWeights CONV1D case

Two tests: weight + bias memcpy, and no-bias accepts NULL biasData.
Fails because the current CONV1D dispatch is the PR 1 PRINT_ERROR
stub. Implementation in next commit.

Also adds TensorApi include + MORE_LIBS entry (provides freeQuantization,
which the conv1d tests need but the plan omitted).

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1

feat(userApi): layerLoadWeights CONV1D dispatch — weight + bias memcpy

Replaces PR 1's TODO stub. Same structure as the LINEAR case: memcpy from
the caller-provided float* buffer into the factory-allocated tensor
data. Bias presence is enforced against the bool resolved from
conv1dInit_t::bias: a layer with a bias requires non-NULL biasData, and
a bias-free layer rejects non-NULL biasData.

Also links Conv1d into LayerWeightsApi CMake target and adds TensorApi
to the test's MORE_LIBS (provides freeQuantization, omitted from spec).

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1
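
A self-contained sketch of the dispatch contract on a simplified
parameter struct. sketchParams_t and sketchLoadWeights are hypothetical.
Two deliberate differences from the patch: it returns -1 instead of
calling PRINT_ERROR + exit(1), and it checks the bias contract before
copying anything, so a rejected call leaves the weights untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parameterised layer: flat weight storage
 * plus an optional bias (bias == NULL when bias resolved to false). */
typedef struct {
    float *weights;
    size_t numWeights;
    float *bias;
    size_t numBias;
} sketchParams_t;

/* Copy weights and, if present, bias. biasData must be non-NULL exactly
 * when the layer has a bias; any mismatch returns -1 before copying. */
static int sketchLoadWeights(sketchParams_t *p, const float *weightData, const float *biasData) {
    if (p->weights == NULL || weightData == NULL) {
        return -1;
    }
    if ((p->bias != NULL) != (biasData != NULL)) {
        return -1; /* bias presence mismatch */
    }
    memcpy(p->weights, weightData, p->numWeights * sizeof(float));
    if (p->bias != NULL) {
        memcpy(p->bias, biasData, p->numBias * sizeof(float));
    }
    return 0;
}
```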

test(userApi): failing test for layerLoadWeights CONV1D_TRANSPOSED case

Verifies weight + bias memcpy into the factory-allocated Conv1dTransposed
parameter tensors. Fails because the current dispatch is a
PRINT_ERROR + exit stub.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1

test(userApi): failing tests for shared deepCopyQuantization in LayerQuant

Three tests cover the null-input shortcut, FLOAT32 (no qConfig), and
SYM_INT32 (qConfig bytes duplicated). Fails at link until the impl
lands in LayerQuant.c.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 5.2, 5.3

refactor(userApi): extract deepCopyQuantization to shared LayerQuant utility

PR 1 defined deepCopyQuantization as a file-static helper in LinearApi.c
and an equivalent reluDeepCopyQuantization in ReluApi.c. This commit
hoists both into a single externally linked function in LayerQuant.c.
LinearApi and ReluApi now share the same code path; the new Conv1d /
Conv1dTransposed / Pool1d / Softmax Owning factories that follow in this
PR use it directly.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 5.2, 5.3
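
The shared helper's logic, restated as a standalone sketch with stand-in
types and malloc in place of reserveMemory. The enum, the config structs
and their fields are invented for illustration. Unlike LayerQuant.c, the
sketch returns NULL for an unknown type instead of exiting, and it skips
the memcpy when src->qConfig is NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins: the real type enum and *QConfig_t layouts live in the
 * Quantization headers; these fields are invented for illustration. */
typedef enum { Q_FLOAT32, Q_INT32, Q_BOOL, Q_SYM_INT32, Q_SYM, Q_ASYM } sketchQType_t;
typedef struct { float scale; } sketchSymInt32Cfg_t;
typedef struct { float scale; int bits; } sketchSymCfg_t;
typedef struct { float scale; int zeroPoint; int bits; } sketchAsymCfg_t;

typedef struct {
    sketchQType_t type;
    void *qConfig; /* NULL for types without a configuration */
} sketchQuant_t;

/* qConfig size dispatched on type. Returns -1 for an unknown type
 * (the patch calls PRINT_ERROR + exit(1) there). */
static int qConfigSize(sketchQType_t type, size_t *size) {
    switch (type) {
    case Q_FLOAT32:
    case Q_INT32:
    case Q_BOOL:
        *size = 0;
        return 0;
    case Q_SYM_INT32:
        *size = sizeof(sketchSymInt32Cfg_t);
        return 0;
    case Q_SYM:
        *size = sizeof(sketchSymCfg_t);
        return 0;
    case Q_ASYM:
        *size = sizeof(sketchAsymCfg_t);
        return 0;
    }
    return -1;
}

/* New struct plus a private copy of qConfig, so the copy stays valid
 * after the caller frees the source. */
static sketchQuant_t *sketchDeepCopy(const sketchQuant_t *src) {
    size_t n = 0;
    if (src == NULL || qConfigSize(src->type, &n) != 0) {
        return NULL;
    }
    sketchQuant_t *dst = malloc(sizeof *dst);
    if (dst == NULL) {
        return NULL;
    }
    dst->type = src->type;
    dst->qConfig = NULL;
    if (n > 0 && src->qConfig != NULL) {
        dst->qConfig = malloc(n);
        if (dst->qConfig == NULL) {
            free(dst);
            return NULL;
        }
        memcpy(dst->qConfig, src->qConfig, n);
    }
    return dst;
}
```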

refactor(userApi): rename Conv1d factories to *Legacy for new API coexistence

Frees the canonical conv1dLayerInit / freeConv1dLayer names for the new
conv1dInit_t-based factories landing in this PR. Legacy bodies are
functionally unchanged; the only addition is
'conv1dConfig->ownsQuantizations = false'. That changes no behavior,
since the new field already defaults to false via calloc, but the
explicit assignment is clearer.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md
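
How ownsQuantizations is meant to steer teardown, sketched with stand-in
types (only the flag name comes from the patch; sketchLayerCfg_t,
freeSketchLayerCfg and the rest are hypothetical). Legacy and Borrowing
factories store the caller's pointers and leave the flag false; Owning
factories hold private deep copies and set it to true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a quantization and a layer config. */
typedef struct { void *qConfig; } sketchQ_t;

typedef struct {
    sketchQ_t *forwardQ;
    sketchQ_t *backwardQ;
    bool ownsQuantizations;
} sketchLayerCfg_t;

static void freeSketchQ(sketchQ_t *q) {
    if (q == NULL) {
        return;
    }
    free(q->qConfig);
    free(q);
}

/* Frees the config; frees the quantizations only if the layer owns them.
 * Borrowed pointers may alias each other or outlive the layer, so they are
 * left alone. Returns the number of quantizations released (for testing). */
static int freeSketchLayerCfg(sketchLayerCfg_t *cfg) {
    int released = 0;
    if (cfg->ownsQuantizations) {
        if (cfg->forwardQ != NULL) {
            freeSketchQ(cfg->forwardQ);
            released++;
        }
        if (cfg->backwardQ != NULL) {
            freeSketchQ(cfg->backwardQ);
            released++;
        }
    }
    free(cfg);
    return released;
}
```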
---
 examples/ecg_anomaly_ae/train_c.c             |   7 +-
 examples/har_classifier/train_c.c             |  14 +-
 src/userApi/CMakeLists.txt                    |   4 +
 src/userApi/LayerQuant.c                      |  47 ++++
 src/userApi/LayerWeightsApi.c                 |  57 ++++-
 src/userApi/include/LayerQuant.h              |  14 ++
 src/userApi/layer/CMakeLists.txt              |  35 +++
 src/userApi/layer/Conv1dApi.c                 |   9 +-
 src/userApi/layer/Conv1dTransposedApi.c       |   8 +
 src/userApi/layer/LinearApi.c                 |  52 -----
 src/userApi/layer/Pool1dApi.c                 |   5 +
 src/userApi/layer/ReluApi.c                   |  46 +---
 src/userApi/layer/SoftmaxApi.c                |   5 +-
 src/userApi/layer/include/Conv1dApi.h         |  60 ++++-
 .../layer/include/Conv1dTransposedApi.h       |  51 ++++
 src/userApi/layer/include/Pool1dApi.h         |  67 ++++++
 src/userApi/layer/include/SoftmaxApi.h        |   6 +-
 test/unit/layer/UnitTestConv1d.c              |  10 +-
 test/unit/layer/UnitTestSoftmax.c             |  20 +-
 .../loss_functions/UnitTestCrossEntropy.c     |   8 +-
 test/unit/serial/UnitTestDeserialize.c        |   8 +-
 test/unit/userAPI/CMakeLists.txt              |  51 ++++
 test/unit/userAPI/UnitTestConv1dApi.c         | 147 ++++++++++++
 .../userAPI/UnitTestConv1dTransposedApi.c     | 127 ++++++++++
 .../unit/userAPI/UnitTestFlattenIntegration.c |   4 +-
 test/unit/userAPI/UnitTestLayerQuant.c        |  39 ++++
 test/unit/userAPI/UnitTestLayerWeightsApi.c   | 104 +++++++++
 test/unit/userAPI/UnitTestMnistSmoke.c        |   6 +-
 .../unit/userAPI/UnitTestMultiLayerTraining.c |  12 +-
 test/unit/userAPI/UnitTestPool1dApi.c         | 219 ++++++++++++++++++
 30 files changed, 1081 insertions(+), 161 deletions(-)
 create mode 100644 src/userApi/layer/Conv1dTransposedApi.c
 create mode 100644 src/userApi/layer/Pool1dApi.c
 create mode 100644 src/userApi/layer/include/Conv1dTransposedApi.h
 create mode 100644 src/userApi/layer/include/Pool1dApi.h
 create mode 100644 test/unit/userAPI/UnitTestConv1dApi.c
 create mode 100644 test/unit/userAPI/UnitTestConv1dTransposedApi.c
 create mode 100644 test/unit/userAPI/UnitTestPool1dApi.c

diff --git a/examples/ecg_anomaly_ae/train_c.c b/examples/ecg_anomaly_ae/train_c.c
index 5f85dbf..0170690 100644
--- a/examples/ecg_anomaly_ae/train_c.c
+++ b/examples/ecg_anomaly_ae/train_c.c
@@ -271,7 +271,7 @@ static void buildModel(layer_t **model) {
     parameter_t *e1_w =
         buildParam(XAVIER_UNIFORM, e1_w_data, e1_w_dims, 3, IN_CHANNELS * E1_K, E1_OUT * E1_K);
     parameter_t *e1_b = buildParam(ZEROS, e1_b_data, e1_b_dims, 1, 1, E1_OUT);
-    model[0] = conv1dLayerInit(e1_w, e1_b, e1k, q, q, q, q);
+    model[0] = conv1dLayerInitLegacy(e1_w, e1_b, e1k, q, q, q, q);
     model[1] = reluLayerInitLegacy(quantizationInitFloat(), quantizationInitFloat());
 
     /* Block P1: MaxPool1d(K=2, S=2). 70 → 35. */
@@ -283,8 +283,9 @@ static void buildModel(layer_t **model) {
     parameter_t *e2_w =
         buildParam(XAVIER_UNIFORM, e2_w_data, e2_w_dims, 3, E1_OUT * E2_K, E2_OUT * E2_K);
     parameter_t *e2_b = buildParam(ZEROS, e2_b_data, e2_b_dims, 1, 1, E2_OUT);
-    model[3] = conv1dLayerInit(e2_w, e2_b, e2k, quantizationInitFloat(), quantizationInitFloat(),
-                               quantizationInitFloat(), quantizationInitFloat());
+    model[3] =
+        conv1dLayerInitLegacy(e2_w, e2_b, e2k, quantizationInitFloat(), quantizationInitFloat(),
+                              quantizationInitFloat(), quantizationInitFloat());
     model[4] = reluLayerInitLegacy(quantizationInitFloat(), quantizationInitFloat());
 
     /* Block P2: AvgPool1d(K=5, S=5). 35 → 7 (bottleneck). */
diff --git a/examples/har_classifier/train_c.c b/examples/har_classifier/train_c.c
index 2ec319c..1f78dfc 100644
--- a/examples/har_classifier/train_c.c
+++ b/examples/har_classifier/train_c.c
@@ -264,7 +264,7 @@ static void buildModel(layer_t **model) {
     parameter_t *c1_w =
         buildParam(XAVIER_UNIFORM, c1_w_data, c1_w_dims, 3, IN_CHANNELS * C1_K, C1_OUT * C1_K);
     parameter_t *c1_b = buildParam(ZEROS, c1_b_data, c1_b_dims, 1, 1, C1_OUT);
-    model[0] = conv1dLayerInit(c1_w, c1_b, k1, q1, q2, q3, q4);
+    model[0] = conv1dLayerInitLegacy(c1_w, c1_b, k1, q1, q2, q3, q4);
     model[1] = reluLayerInitLegacy(quantizationInitFloat(), quantizationInitFloat());
     model[2] = buildMaxPool1dLayer(2, 2, C1_OUT, LEN_INPUT / 2);
 
@@ -274,8 +274,9 @@ static void buildModel(layer_t **model) {
     parameter_t *c2_w =
         buildParam(XAVIER_UNIFORM, c2_w_data, c2_w_dims, 3, C1_OUT * C2_K, C2_OUT * C2_K);
     parameter_t *c2_b = buildParam(ZEROS, c2_b_data, c2_b_dims, 1, 1, C2_OUT);
-    model[3] = conv1dLayerInit(c2_w, c2_b, k2, quantizationInitFloat(), quantizationInitFloat(),
-                               quantizationInitFloat(), quantizationInitFloat());
+    model[3] =
+        conv1dLayerInitLegacy(c2_w, c2_b, k2, quantizationInitFloat(), quantizationInitFloat(),
+                              quantizationInitFloat(), quantizationInitFloat());
     model[4] = reluLayerInitLegacy(quantizationInitFloat(), quantizationInitFloat());
     model[5] = buildMaxPool1dLayer(2, 2, C2_OUT, LEN_INPUT / 4);
 
@@ -285,8 +286,9 @@ static void buildModel(layer_t **model) {
     parameter_t *c3_w =
         buildParam(XAVIER_UNIFORM, c3_w_data, c3_w_dims, 3, C2_OUT * C3_K, C3_OUT * C3_K);
     parameter_t *c3_b = buildParam(ZEROS, c3_b_data, c3_b_dims, 1, 1, C3_OUT);
-    model[6] = conv1dLayerInit(c3_w, c3_b, k3, quantizationInitFloat(), quantizationInitFloat(),
-                               quantizationInitFloat(), quantizationInitFloat());
+    model[6] =
+        conv1dLayerInitLegacy(c3_w, c3_b, k3, quantizationInitFloat(), quantizationInitFloat(),
+                              quantizationInitFloat(), quantizationInitFloat());
     model[7] = reluLayerInitLegacy(quantizationInitFloat(), quantizationInitFloat());
     model[8] = buildAvgPool1dLayer(LEN_INPUT / 4, LEN_INPUT / 4);
 
@@ -296,7 +298,7 @@ static void buildModel(layer_t **model) {
     parameter_t *fc_b = buildParam(ZEROS, fc_b_data, fc_b_dims, 2, 1, NUM_CLASSES);
     model[10] = linearLayerInitLegacy(fc_w, fc_b, quantizationInitFloat(), quantizationInitFloat(),
                                       quantizationInitFloat(), quantizationInitFloat());
-    model[11] = softmaxLayerInit(quantizationInitFloat(), quantizationInitFloat());
+    model[11] = softmaxLayerInitLegacy(quantizationInitFloat(), quantizationInitFloat());
 }
 
 /* ------------------------------------------------------------------------- */
diff --git a/src/userApi/CMakeLists.txt b/src/userApi/CMakeLists.txt
index 74da8cf..cd2c781 100644
--- a/src/userApi/CMakeLists.txt
+++ b/src/userApi/CMakeLists.txt
@@ -35,14 +35,18 @@ target_link_libraries(StorageApi PRIVATE
 add_library(LayerQuant LayerQuant.c)
 target_include_directories(LayerQuant PUBLIC include)
 target_link_libraries(LayerQuant PRIVATE
+        Common
         Quantization
         Rounding
+        StorageApi
 )
 
 add_library(LayerWeightsApi LayerWeightsApi.c)
 target_include_directories(LayerWeightsApi PUBLIC include)
 target_link_libraries(LayerWeightsApi PRIVATE
         Common
+        Conv1d
+        Conv1dTransposed
         Layer
         Linear
         Rounding
diff --git a/src/userApi/LayerQuant.c b/src/userApi/LayerQuant.c
index df06957..5ac004c 100644
--- a/src/userApi/LayerQuant.c
+++ b/src/userApi/LayerQuant.c
@@ -1,6 +1,11 @@
 #define SOURCE_FILE "LAYER_QUANT"
 
+#include <stdlib.h>
+#include <string.h>
+
+#include "Common.h"
 #include "LayerQuant.h"
+#include "StorageApi.h"
 
 void layerQuantInitUniform(layerQuant_t *lq, quantization_t *q) {
     lq->forwardMath = q;
@@ -8,3 +13,45 @@ void layerQuantInitUniform(layerQuant_t *lq, quantization_t *q) {
     lq->weightStorage = q;
     lq->biasStorage = q;
 }
+
+quantization_t *deepCopyQuantization(quantization_t *src) {
+    if (src == NULL) {
+        return NULL;
+    }
+
+    quantization_t *dst = reserveMemory(sizeof(quantization_t));
+    dst->type = src->type;
+
+    size_t cfgSize = 0;
+    switch (src->type) {
+    case FLOAT32:
+        cfgSize = 0;
+        break;
+    case INT32:
+        cfgSize = 0;
+        break;
+    case BOOL:
+        cfgSize = 0;
+        break;
+    case SYM_INT32:
+        cfgSize = sizeof(symInt32QConfig_t);
+        break;
+    case SYM:
+        cfgSize = sizeof(symQConfig_t);
+        break;
+    case ASYM:
+        cfgSize = sizeof(asymQConfig_t);
+        break;
+    default:
+        PRINT_ERROR("deepCopyQuantization: unknown quantization type %d", (int)src->type);
+        exit(1);
+    }
+
+    if (cfgSize == 0) {
+        dst->qConfig = NULL;
+    } else {
+        dst->qConfig = reserveMemory(cfgSize);
+        memcpy(dst->qConfig, src->qConfig, cfgSize);
+    }
+    return dst;
+}
diff --git a/src/userApi/LayerWeightsApi.c b/src/userApi/LayerWeightsApi.c
index 432d7e9..9c043d2 100644
--- a/src/userApi/LayerWeightsApi.c
+++ b/src/userApi/LayerWeightsApi.c
@@ -2,6 +2,8 @@
 
 #include "LayerWeightsApi.h"
 #include "Common.h"
+#include "Conv1d.h"
+#include "Conv1dTransposed.h"
 #include "Linear.h"
 #include "Tensor.h"
 #include <stdlib.h>
@@ -42,6 +44,30 @@ void layerLoadWeights(layer_t *layer, float *weightData, float *biasData) {
         }
         break;
     }
+    case CONV1D: {
+        conv1dConfig_t *cfg = layer->config->conv1d;
+        if (cfg->weights == NULL) {
+            PRINT_ERROR("layerLoadWeights CONV1D: layer has no weight parameter");
+            exit(1);
+        }
+        tensor_t *weightTensor = cfg->weights->param;
+        size_t numWeightElements = calcNumberOfElementsByTensor(weightTensor);
+        memcpy(weightTensor->data, weightData, numWeightElements * sizeof(float));
+
+        if (cfg->bias != NULL) {
+            if (biasData == NULL) {
+                PRINT_ERROR("layerLoadWeights CONV1D: layer has bias but biasData is NULL");
+                exit(1);
+            }
+            tensor_t *biasTensor = cfg->bias->param;
+            size_t numBiasElements = calcNumberOfElementsByTensor(biasTensor);
+            memcpy(biasTensor->data, biasData, numBiasElements * sizeof(float));
+        } else if (biasData != NULL) {
+            PRINT_ERROR("layerLoadWeights CONV1D: layer has no bias but biasData is non-NULL");
+            exit(1);
+        }
+        break;
+    }
     case RELU:
     case SOFTMAX:
     case FLATTEN:
@@ -49,11 +75,32 @@ void layerLoadWeights(layer_t *layer, float *weightData, float *biasData) {
     case AVGPOOL1D:
         PRINT_ERROR("layerLoadWeights: layer type %d has no parameters to load", (int)layer->type);
         exit(1);
-    case CONV1D:
-    case CONV1D_TRANSPOSED:
-        PRINT_ERROR("layerLoadWeights: layer type %d dispatch not implemented (TODO PR 2)",
-                    (int)layer->type);
-        exit(1);
+    case CONV1D_TRANSPOSED: {
+        conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed;
+        if (cfg->weights == NULL) {
+            PRINT_ERROR("layerLoadWeights CONV1D_TRANSPOSED: layer has no weight parameter");
+            exit(1);
+        }
+        tensor_t *weightTensor = cfg->weights->param;
+        size_t numWeightElements = calcNumberOfElementsByTensor(weightTensor);
+        memcpy(weightTensor->data, weightData, numWeightElements * sizeof(float));
+
+        if (cfg->bias != NULL) {
+            if (biasData == NULL) {
+                PRINT_ERROR("layerLoadWeights CONV1D_TRANSPOSED: layer has bias but biasData "
+                            "is NULL");
+                exit(1);
+            }
+            tensor_t *biasTensor = cfg->bias->param;
+            size_t numBiasElements = calcNumberOfElementsByTensor(biasTensor);
+            memcpy(biasTensor->data, biasData, numBiasElements * sizeof(float));
+        } else if (biasData != NULL) {
+            PRINT_ERROR("layerLoadWeights CONV1D_TRANSPOSED: layer has no bias but biasData "
+                        "is non-NULL");
+            exit(1);
+        }
+        break;
+    }
     default:
         PRINT_ERROR("layerLoadWeights: dispatch not implemented for layer type %d",
                     (int)layer->type);
diff --git a/src/userApi/include/LayerQuant.h b/src/userApi/include/LayerQuant.h
index 220e13f..b456634 100644
--- a/src/userApi/include/LayerQuant.h
+++ b/src/userApi/include/LayerQuant.h
@@ -22,4 +22,18 @@ typedef struct layerQuant {
  *  common all-same-quantization case.  Caller retains ownership of `q`. */
 void layerQuantInitUniform(layerQuant_t *lq, quantization_t *q);
 
+/*! Deep-copy a `quantization_t` and its `qConfig`. Returns NULL if `src` is NULL.
+ *
+ *  Caller owns the returned allocation. Free via:
+ *      freeReservedMemory(result->qConfig);
+ *      freeReservedMemory(result);
+ *
+ *  The `qConfig` size is dispatched by `src->type`; BOOL/INT32/FLOAT32 have
+ *  no qConfig (result->qConfig == NULL). Unknown types fire PRINT_ERROR +
+ *  exit(1).
+ *
+ *  Used by every `*LayerInitOwning` factory to materialize per-layer copies
+ *  of the four math quantizations referenced by `layerQuant_t`. */
+quantization_t *deepCopyQuantization(quantization_t *src);
+
 #endif /* LAYER_QUANT_H */
diff --git a/src/userApi/layer/CMakeLists.txt b/src/userApi/layer/CMakeLists.txt
index 805d5b0..c24e468 100644
--- a/src/userApi/layer/CMakeLists.txt
+++ b/src/userApi/layer/CMakeLists.txt
@@ -57,4 +57,39 @@ target_link_libraries(FlattenApi PRIVATE
         Layer
         Common
         StorageApi
+)
+
+add_library(Conv1dTransposedApi Conv1dTransposedApi.c)
+target_include_directories(Conv1dTransposedApi PUBLIC include)
+target_link_libraries(Conv1dTransposedApi PRIVATE
+        Common
+        Conv1dTransposed
+        Distributions
+        Kernel
+        Layer
+        LayerCommon
+        LayerQuant
+        Quantization
+        QuantizationApi
+        Rounding
+        StorageApi
+        Tensor
+        TensorApi
+)
+
+add_library(Pool1dApi Pool1dApi.c)
+target_include_directories(Pool1dApi PUBLIC include)
+target_link_libraries(Pool1dApi PRIVATE
+        AvgPool1d
+        Common
+        Kernel
+        Layer
+        LayerQuant
+        MaxPool1d
+        Quantization
+        QuantizationApi
+        Rounding
+        StorageApi
+        Tensor
+        TensorApi
 )
\ No newline at end of file
diff --git a/src/userApi/layer/Conv1dApi.c b/src/userApi/layer/Conv1dApi.c
index b17ddf6..22a3d3a 100644
--- a/src/userApi/layer/Conv1dApi.c
+++ b/src/userApi/layer/Conv1dApi.c
@@ -8,15 +8,16 @@
 
 #include <stdio.h>
 
-layer_t *conv1dLayerInit(parameter_t *weights, parameter_t *bias, kernel_t *kernel,
-                         quantization_t *forwardQ, quantization_t *weightGradQ,
-                         quantization_t *biasGradQ, quantization_t *propLossQ) {
+layer_t *conv1dLayerInitLegacy(parameter_t *weights, parameter_t *bias, kernel_t *kernel,
+                               quantization_t *forwardQ, quantization_t *weightGradQ,
+                               quantization_t *biasGradQ, quantization_t *propLossQ) {
     layer_t *conv1dLayer = reserveMemory(sizeof(layer_t));
     layerConfig_t *layerConfig = reserveMemory(sizeof(layerConfig_t));
     conv1dConfig_t *conv1dConfig = reserveMemory(sizeof(conv1dConfig_t));
 
     initConv1dConfigWithWeightsAndBias(conv1dConfig, kernel, weights, bias, 1u, forwardQ,
                                        weightGradQ, biasGradQ, propLossQ);
+    conv1dConfig->ownsQuantizations = false;
 
     conv1dLayer->type = CONV1D;
     layerConfig->conv1d = conv1dConfig;
@@ -25,7 +26,7 @@ layer_t *conv1dLayerInit(parameter_t *weights, parameter_t *bias, kernel_t *kern
     return conv1dLayer;
 }
 
-void freeConv1dLayer(layer_t *conv1dLayer) {
+void freeConv1dLayerLegacy(layer_t *conv1dLayer) {
     conv1dConfig_t *conv1dConfig = conv1dLayer->config->conv1d;
 
     freeParameter(conv1dConfig->weights);
diff --git a/src/userApi/layer/Conv1dTransposedApi.c b/src/userApi/layer/Conv1dTransposedApi.c
new file mode 100644
index 0000000..589351c
--- /dev/null
+++ b/src/userApi/layer/Conv1dTransposedApi.c
@@ -0,0 +1,8 @@
+#define SOURCE_FILE "CONV1D_TRANSPOSED_API"
+
+/* Stub. Full implementation lands in Task 12. This file exists so
+ * Conv1dTransposedApi compiles as a library target for the CMake graph
+ * to discover; the header included below declares the functions, but
+ * they will link-fail until Task 12 fills them in. */
+
+#include "Conv1dTransposedApi.h"
diff --git a/src/userApi/layer/LinearApi.c b/src/userApi/layer/LinearApi.c
index 70c6b15..2b8bd15 100644
--- a/src/userApi/layer/LinearApi.c
+++ b/src/userApi/layer/LinearApi.c
@@ -2,7 +2,6 @@
 
 #include <stdbool.h>
 #include <stdlib.h>
-#include <string.h>
 
 #include "Common.h"
 #include "Distributions.h"
@@ -203,57 +202,6 @@ layer_t *linearLayerInit(linearInit_t *init, layerQuant_t *lq) {
     return layer;
 }
 
-/*! Deep-copies a quantization_t and its qConfig.
- *
- *  Returns NULL if `src` is NULL.  Caller owns the returned allocation; free via:
- *      freeReservedMemory(result->qConfig);
- *      freeReservedMemory(result);
- *
- *  qConfig size is dispatched by `src->type`.  BOOL has no qConfig (aligned with
- *  the BOOL dtype added per the BOOL tensor spec). */
-static quantization_t *deepCopyQuantization(quantization_t *src) {
-    if (src == NULL) {
-        return NULL;
-    }
-
-    quantization_t *dst = reserveMemory(sizeof(quantization_t));
-    dst->type = src->type;
-
-    size_t cfgSize = 0;
-    switch (src->type) {
-    case FLOAT32:
-        cfgSize = 0;
-        break; /* no qConfig */
-    case INT32:
-        cfgSize = 0;
-        break;
-    case BOOL:
-        cfgSize = 0;
-        break; /* BOOL has no qConfig */
-    case SYM_INT32:
-        cfgSize = sizeof(symInt32QConfig_t);
-        break;
-    case SYM:
-        cfgSize = sizeof(symQConfig_t);
-        break;
-    case ASYM:
-        cfgSize = sizeof(asymQConfig_t);
-        break;
-    default:
-        PRINT_ERROR("linearLayerInitOwning: cannot deep-copy quantization with unknown type %d",
-                    (int)src->type);
-        exit(1);
-    }
-
-    if (cfgSize == 0) {
-        dst->qConfig = NULL;
-    } else {
-        dst->qConfig = reserveMemory(cfgSize);
-        memcpy(dst->qConfig, src->qConfig, cfgSize);
-    }
-    return dst;
-}
-
 layer_t *linearLayerInitOwning(linearInit_t *init, layerQuant_t *lq) {
     validateLinearInit(init);
     bool hasBias = resolveLinearBias(init->bias);
diff --git a/src/userApi/layer/Pool1dApi.c b/src/userApi/layer/Pool1dApi.c
new file mode 100644
index 0000000..d743e54
--- /dev/null
+++ b/src/userApi/layer/Pool1dApi.c
@@ -0,0 +1,5 @@
+#define SOURCE_FILE "POOL1D_API"
+
+/* Stub. Full implementation lands in Tasks 15 and 16. */
+
+#include "Pool1dApi.h"
diff --git a/src/userApi/layer/ReluApi.c b/src/userApi/layer/ReluApi.c
index 2deaccd..ace1232 100644
--- a/src/userApi/layer/ReluApi.c
+++ b/src/userApi/layer/ReluApi.c
@@ -2,7 +2,6 @@
 
 #include <stdbool.h>
 #include <stdlib.h> /* exit */
-#include <string.h> /* memcpy */
 
 #include "Common.h" /* PRINT_ERROR */
 #include "LayerQuant.h"
@@ -38,47 +37,6 @@ void freeReluLayerLegacy(layer_t *reluLayer) {
  * New factory API — layerQuant_t profile (PR 1).
  * ========================================================================== */
 
-static quantization_t *reluDeepCopyQuantization(quantization_t *src) {
-    if (src == NULL) {
-        return NULL;
-    }
-
-    quantization_t *dst = reserveMemory(sizeof(quantization_t));
-    dst->type = src->type;
-
-    size_t cfgSize = 0;
-    switch (src->type) {
-    case FLOAT32:
-        cfgSize = 0;
-        break;
-    case INT32:
-        cfgSize = 0;
-        break;
-    case BOOL:
-        cfgSize = 0;
-        break;
-    case SYM_INT32:
-        cfgSize = sizeof(symInt32QConfig_t);
-        break;
-    case SYM:
-        cfgSize = sizeof(symQConfig_t);
-        break;
-    case ASYM:
-        cfgSize = sizeof(asymQConfig_t);
-        break;
-    default:
-        PRINT_ERROR("reluLayerInitOwning: unknown quantization type %d", (int)src->type);
-        exit(1);
-    }
-    if (cfgSize == 0) {
-        dst->qConfig = NULL;
-    } else {
-        dst->qConfig = reserveMemory(cfgSize);
-        memcpy(dst->qConfig, src->qConfig, cfgSize);
-    }
-    return dst;
-}
-
 static void validateLayerQuantForRelu(layerQuant_t *lq) {
     if (lq == NULL) {
         PRINT_ERROR("reluLayerInit: lq pointer is NULL");
@@ -123,8 +81,8 @@ layer_t *reluLayerInitOwning(layerQuant_t *lq) {
     layerCfg->relu = cfg;
     layer->config = layerCfg;
 
-    cfg->forwardQ = reluDeepCopyQuantization(lq->forwardMath);
-    cfg->backwardQ = reluDeepCopyQuantization(lq->backwardMath);
+    cfg->forwardQ = deepCopyQuantization(lq->forwardMath);
+    cfg->backwardQ = deepCopyQuantization(lq->backwardMath);
     cfg->ownsQuantizations = true;
 
     return layer;
diff --git a/src/userApi/layer/SoftmaxApi.c b/src/userApi/layer/SoftmaxApi.c
index 2b95013..df4fe8e 100644
--- a/src/userApi/layer/SoftmaxApi.c
+++ b/src/userApi/layer/SoftmaxApi.c
@@ -4,7 +4,7 @@
 #include "Softmax.h"
 #include "StorageApi.h"
 
-layer_t *softmaxLayerInit(quantization_t *forwardQ, quantization_t *backwardQ) {
+layer_t *softmaxLayerInitLegacy(quantization_t *forwardQ, quantization_t *backwardQ) {
     layer_t *softmaxLayer = reserveMemory(sizeof(layer_t));
 
     softmaxLayer->type = SOFTMAX;
@@ -15,12 +15,13 @@ layer_t *softmaxLayerInit(quantization_t *forwardQ, quantization_t *backwardQ) {
 
     softmaxConfig->forwardQ = forwardQ;
     softmaxConfig->backwardQ = backwardQ;
+    softmaxConfig->ownsQuantizations = false;
     softmaxLayer->config = layerConfig;
 
     return softmaxLayer;
 }
 
-void freeSoftmaxLayer(layer_t *softmaxLayer) {
+void freeSoftmaxLayerLegacy(layer_t *softmaxLayer) {
     freeReservedMemory(softmaxLayer->config->softmax);
     freeReservedMemory(softmaxLayer->config);
     freeReservedMemory(softmaxLayer);
diff --git a/src/userApi/layer/include/Conv1dApi.h b/src/userApi/layer/include/Conv1dApi.h
index 1048366..79abb33 100644
--- a/src/userApi/layer/include/Conv1dApi.h
+++ b/src/userApi/layer/include/Conv1dApi.h
@@ -4,7 +4,10 @@
 #include "Kernel.h"
 #include "Layer.h"
 
-/*! Initializes a 1D convolution layer with given parameters.
+/* Legacy (pre-2026-05-15 factory API) — retained during PR 1/2 coexistence window.
+ * New code should use the conv1dInit_t-based factories declared in PR 2. */
+
+/*! Legacy Conv1d factory.
  *
  * @param weights Weights with gradients
  * @param bias Optional bias parameter with gradients
@@ -14,16 +17,57 @@
  * @param biasGradQ Quantization for bias gradient calculation
  * @param propLossQ Quantization for prop loss calculation
  *
- * @returns Pointer to initializes layer_t
+ * @returns Pointer to initialized layer_t
  */
-layer_t *conv1dLayerInit(parameter_t *weights, parameter_t *bias, kernel_t *kernel,
-                         quantization_t *forwardQ, quantization_t *weightGradQ,
-                         quantization_t *biasGradQ, quantization_t *propLossQ);
+layer_t *conv1dLayerInitLegacy(parameter_t *weights, parameter_t *bias, kernel_t *kernel,
+                               quantization_t *forwardQ, quantization_t *weightGradQ,
+                               quantization_t *biasGradQ, quantization_t *propLossQ);
+
+/*! Frees a Conv1d layer built via the legacy factory. */
+void freeConv1dLayerLegacy(layer_t *conv1dLayer);
+
+#include "LayerCommon.h"
+#include "LayerQuant.h"
 
-/*! Frees 1D convolutional layer and all contained data structures recursively
+_Static_assert(VALID == 0,
+               "paddingType_t::VALID must be enum value 0 so .padding zero-init defaults to VALID");
+
+/*! Conv1d factory configuration. Build via designated initializer:
  *
- * @param conv1dLayer Pointer to layer_t
- */
+ *      conv1dLayerInit(&(conv1dInit_t){
+ *          .inChannels = 3, .outChannels = 16, .kernelSize = 5,
+ *          .padding = SAME, .stride = 1,
+ *      }, lq);
+ *
+ *  REQUIRED fields fire PRINT_ERROR + exit(1) if zero. Defaults below are
+ *  applied when the field is zero-init (compound-literal omission). */
+typedef struct conv1dInit {
+    /* REQUIRED */
+    size_t inChannels;
+    size_t outChannels;
+    size_t kernelSize;
+    /* OPTIONAL — zero-init defaults */
+    size_t stride;         /* 0 → 1 */
+    paddingType_t padding; /* 0 → VALID (enum value 0) */
+    size_t dilation;       /* 0 → 1 */
+    size_t groups;         /* 0 → 1 */
+    bias_t bias;           /* BIAS_DEFAULT (0) → resolves to true (PyTorch parity) */
+} conv1dInit_t;
+
+/*! Borrowing variant — factory allocates weights/bias/kernel internally
+ *  and stores the four math `quantization_t*` from `lq` verbatim. Caller
+ *  retains ownership of `lq` and the quantizations; `lq` may be a
+ *  compound literal. */
+layer_t *conv1dLayerInit(conv1dInit_t *init, layerQuant_t *lq);
+
+/*! Owning variant — same as `conv1dLayerInit`, but additionally
+ *  `deepCopyQuantization`s each of the four math quantizations. Caller
+ *  can drop `lq` and the quantization_t's immediately. */
+layer_t *conv1dLayerInitOwning(conv1dInit_t *init, layerQuant_t *lq);
+
+/*! Tears down everything the factory allocated. Reads
+ *  `config->ownsQuantizations` to decide whether to also free the four
+ *  math quantizations and their qConfigs. */
 void freeConv1dLayer(layer_t *conv1dLayer);
 
 #endif // CONV1DAPI_H
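A minimal, standalone sketch of how the zero-init defaults documented for `conv1dInit_t` resolve. The `*Sketch_t` types, `resolveConv1dInit` and `conv1dWeightElementCount` are hypothetical stand-ins, not the real `conv1dLayerInit` internals. Only `VALID == 0` and `BIAS_DEFAULT == 0` come from the header; the other enumerator values are assumptions, as is the groups-divisibility check (borrowed from PyTorch's Conv1d).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins. Only VALID == 0 and BIAS_DEFAULT == 0 are documented. */
typedef enum { VALID = 0, SAME = 1 } paddingTypeSketch_t;
typedef enum { BIAS_DEFAULT = 0, BIAS_TRUE = 1, BIAS_FALSE = 2 } biasSketch_t;

typedef struct {
    size_t inChannels, outChannels, kernelSize; /* REQUIRED */
    size_t stride;                              /* 0 -> 1 */
    paddingTypeSketch_t padding;                /* 0 -> VALID */
    size_t dilation;                            /* 0 -> 1 */
    size_t groups;                              /* 0 -> 1 */
    biasSketch_t bias;                          /* BIAS_DEFAULT -> true */
} conv1dInitSketch_t;

typedef struct {
    size_t inChannels, outChannels, kernelSize, stride, dilation, groups;
    paddingTypeSketch_t padding;
    bool useBias;
} conv1dResolvedSketch_t;

/* Applies the documented zero-init defaults; mimics PRINT_ERROR + exit(1)
 * when a REQUIRED field is zero. */
static conv1dResolvedSketch_t resolveConv1dInit(const conv1dInitSketch_t *init) {
    assert(init != NULL);
    if (init->inChannels == 0 || init->outChannels == 0 || init->kernelSize == 0) {
        fprintf(stderr, "conv1dInit: inChannels, outChannels and kernelSize are required\n");
        exit(1);
    }
    conv1dResolvedSketch_t r = {
        .inChannels = init->inChannels,
        .outChannels = init->outChannels,
        .kernelSize = init->kernelSize,
        .stride = init->stride ? init->stride : 1,
        .dilation = init->dilation ? init->dilation : 1,
        .groups = init->groups ? init->groups : 1,
        .padding = init->padding,            /* zero-init is already VALID */
        .useBias = init->bias != BIAS_FALSE, /* BIAS_DEFAULT and BIAS_TRUE -> true */
    };
    /* Assumed check (PyTorch requires it): channels must split evenly into groups. */
    if (r.inChannels % r.groups != 0 || r.outChannels % r.groups != 0) {
        fprintf(stderr, "conv1dInit: channels not divisible by groups\n");
        exit(1);
    }
    return r;
}

/* Element count of a weight tensor shaped [outChannels, inChannels / groups, kernelSize]. */
static size_t conv1dWeightElementCount(const conv1dResolvedSketch_t *r) {
    return r->outChannels * (r->inChannels / r->groups) * r->kernelSize;
}
```

With `{.inChannels = 2, .outChannels = 3, .kernelSize = 4}` every optional field falls back to its default and the weight tensor holds 3 * 2 * 4 = 24 elements, the same buffer size the Conv1d `layerLoadWeights` test uses.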
diff --git a/src/userApi/layer/include/Conv1dTransposedApi.h b/src/userApi/layer/include/Conv1dTransposedApi.h
new file mode 100644
index 0000000..a24e89b
--- /dev/null
+++ b/src/userApi/layer/include/Conv1dTransposedApi.h
@@ -0,0 +1,51 @@
+#ifndef CONV1D_TRANSPOSED_API_H
+#define CONV1D_TRANSPOSED_API_H
+
+#include <stddef.h>
+
+#include "Kernel.h"
+#include "Layer.h"
+#include "LayerCommon.h"
+#include "LayerQuant.h"
+
+_Static_assert(VALID == 0,
+               "paddingType_t::VALID must be enum value 0 so .padding zero-init defaults to VALID");
+
+/*! Conv1dTransposed factory configuration. Mirrors conv1dInit_t plus PyTorch's
+ *  outputPadding parameter. Build via designated initializer:
+ *
+ *      conv1dTransposedLayerInit(&(conv1dTransposedInit_t){
+ *          .inChannels = 16, .outChannels = 8, .kernelSize = 5, .stride = 5,
+ *      }, lq);
+ *
+ *  A REQUIRED field left at zero triggers PRINT_ERROR followed by exit(1).
+ *  Phase-1 contract: only VALID padding is supported
+ *  (initConv1dTransposedConfigWithWeightsAndBias aborts on SAME). */
+typedef struct conv1dTransposedInit {
+    /* REQUIRED */
+    size_t inChannels;
+    size_t outChannels;
+    size_t kernelSize;
+    /* OPTIONAL */
+    size_t stride;         /* 0 → 1 */
+    paddingType_t padding; /* 0 → VALID. SAME is rejected by the internal layer in Phase 1. */
+    size_t dilation;       /* 0 → 1 */
+    size_t groups;         /* 0 → 1 */
+    size_t outputPadding;  /* PyTorch parity; default 0; must be < max(stride, dilation) */
+    bias_t bias;           /* BIAS_DEFAULT (0) → resolves to true */
+} conv1dTransposedInit_t;
+
+/*! Borrowing variant — allocates kernel, weights, bias; stores the four
+ *  lq math quantizations verbatim. Caller retains ownership of lq. */
+layer_t *conv1dTransposedLayerInit(conv1dTransposedInit_t *init, layerQuant_t *lq);
+
+/*! Owning variant — additionally deep-copies the four math quantizations
+ *  via deepCopyQuantization. */
+layer_t *conv1dTransposedLayerInitOwning(conv1dTransposedInit_t *init, layerQuant_t *lq);
+
+/*! Tears down everything the factory allocated. Reads
+ *  config->ownsQuantizations to decide whether to also free the four
+ *  math quantizations and their qConfigs. */
+void freeConv1dTransposedLayer(layer_t *layer);
+
+#endif /* CONV1D_TRANSPOSED_API_H */
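Since the header promises PyTorch parity for `outputPadding`, the sketch below applies PyTorch's `ConvTranspose1d` output-length rule with padding fixed at zero, the only mode Phase 1 accepts. Both helpers are hypothetical and not part of `Conv1dTransposedApi.h`; whether the framework's kernel produces exactly this length is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: PyTorch's rule that output_padding must be
 * strictly smaller than max(stride, dilation). */
static bool outputPaddingIsValid(size_t outputPadding, size_t stride, size_t dilation) {
    size_t limit = stride > dilation ? stride : dilation;
    return outputPadding < limit;
}

/* Hypothetical helper: ConvTranspose1d output length with padding = 0 (VALID):
 * (L_in - 1) * stride + dilation * (kernelSize - 1) + outputPadding + 1. */
static size_t conv1dTransposedOutputLength(size_t inputLength, size_t kernelSize, size_t stride,
                                           size_t dilation, size_t outputPadding) {
    assert(inputLength > 0 && kernelSize > 0 && stride > 0 && dilation > 0);
    assert(outputPaddingIsValid(outputPadding, stride, dilation));
    return (inputLength - 1) * stride + dilation * (kernelSize - 1) + outputPadding + 1;
}
```

For the header's example (`kernelSize = 5`, `stride = 5`), an input of length 4 expands to 20 samples: with stride equal to kernel size, each input position writes its own non-overlapping block of 5 outputs.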
diff --git a/src/userApi/layer/include/Pool1dApi.h b/src/userApi/layer/include/Pool1dApi.h
new file mode 100644
index 0000000..a1f77bd
--- /dev/null
+++ b/src/userApi/layer/include/Pool1dApi.h
@@ -0,0 +1,67 @@
+#ifndef POOL1D_API_H
+#define POOL1D_API_H
+
+#include <stddef.h>
+
+#include "Kernel.h"
+#include "Layer.h"
+#include "LayerQuant.h"
+
+_Static_assert(VALID == 0,
+               "paddingType_t::VALID must be enum value 0 so .padding zero-init defaults to VALID");
+
+/*! MaxPool1d factory configuration.
+ *
+ *  Requires input geometry (inputChannels, inputLength) because the
+ *  factory pre-allocates an argmaxIndices INT32 tensor sized for the
+ *  layer's output shape. Batch size is hardcoded to 1 (the training
+ *  loop iterates microbatch-by-microbatch in this framework).
+ *
+ *  Usage:
+ *
+ *      maxPool1dLayerInit(&(maxPool1dInit_t){
+ *          .kernelSize = 2, .stride = 2,
+ *          .inputChannels = 16, .inputLength = 64,
+ *      }, lq);
+ */
+typedef struct maxPool1dInit {
+    /* REQUIRED */
+    size_t kernelSize;
+    size_t inputChannels;
+    size_t inputLength;
+    /* OPTIONAL — zero-init defaults */
+    size_t stride;         /* 0 → kernelSize (PyTorch pool convention) */
+    paddingType_t padding; /* 0 → VALID */
+    size_t dilation;       /* 0 → 1 */
+} maxPool1dInit_t;
+
+/*! AvgPool1d factory configuration. AvgPool1d needs no argmax tensor, so no
+ *  input geometry is required. There is no dilation field because the
+ *  AvgPool1d arithmetic kernel does not support dilation. */
+typedef struct avgPool1dInit {
+    /* REQUIRED */
+    size_t kernelSize;
+    /* OPTIONAL */
+    size_t stride;         /* 0 → kernelSize */
+    paddingType_t padding; /* 0 → VALID */
+} avgPool1dInit_t;
+
+/*! Borrowing variant — allocates kernel and (for MaxPool) the argmax
+ *  tensor; stores lq->forwardMath in forwardQ and lq->backwardMath in
+ *  propLossQ. */
+layer_t *maxPool1dLayerInit(maxPool1dInit_t *init, layerQuant_t *lq);
+layer_t *avgPool1dLayerInit(avgPool1dInit_t *init, layerQuant_t *lq);
+
+/*! Owning variant — additionally deep-copies forwardMath and
+ *  backwardMath via deepCopyQuantization. */
+layer_t *maxPool1dLayerInitOwning(maxPool1dInit_t *init, layerQuant_t *lq);
+layer_t *avgPool1dLayerInitOwning(avgPool1dInit_t *init, layerQuant_t *lq);
+
+/*! Tears down everything the factory allocated. For MaxPool, this
+ *  includes the argmax tensor. Reads config->ownsQuantizations to
+ *  decide whether to also free the two math quantizations and their
+ *  qConfigs. */
+void freeMaxPool1dLayer(layer_t *layer);
+void freeAvgPool1dLayer(layer_t *layer);
+
+#endif /* POOL1D_API_H */
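The argmax tensor is pre-allocated from the input geometry, so its size follows from the pooled output length. A sketch, assuming the framework follows PyTorch's `MaxPool1d` length rule for padding 0 and `ceil_mode=False`; `maxPool1dGeometrySketch_t` and both helpers are hypothetical, and the leading factor of 1 reflects the hardcoded batch size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the geometry fields of maxPool1dInit_t (VALID only). */
typedef struct {
    size_t kernelSize;
    size_t inputChannels;
    size_t inputLength;
    size_t stride;   /* 0 -> kernelSize */
    size_t dilation; /* 0 -> 1 */
} maxPool1dGeometrySketch_t;

/* Pooled length for padding 0: floor((L - dilation * (k - 1) - 1) / stride) + 1. */
static size_t maxPool1dOutputLength(const maxPool1dGeometrySketch_t *g) {
    assert(g->kernelSize > 0);
    size_t stride = g->stride ? g->stride : g->kernelSize;
    size_t dilation = g->dilation ? g->dilation : 1;
    size_t window = dilation * (g->kernelSize - 1) + 1; /* span covered by one window */
    assert(g->inputLength >= window);
    return (g->inputLength - window) / stride + 1;
}

/* INT32 indices needed for an argmax tensor shaped [1, inputChannels, outputLength]. */
static size_t maxPool1dArgmaxElementCount(const maxPool1dGeometrySketch_t *g) {
    const size_t batchSize = 1; /* hardcoded, per the header comment */
    return batchSize * g->inputChannels * maxPool1dOutputLength(g);
}
```

For the header's usage example (`kernelSize = 2`, `stride = 2`, 16 channels, length 64) this gives an output length of 32 and an argmax tensor of 16 * 32 = 512 indices.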
diff --git a/src/userApi/layer/include/SoftmaxApi.h b/src/userApi/layer/include/SoftmaxApi.h
index 5ea4d7c..beaa289 100644
--- a/src/userApi/layer/include/SoftmaxApi.h
+++ b/src/userApi/layer/include/SoftmaxApi.h
@@ -4,8 +4,8 @@
 #include "Layer.h"
 #include "Tensor.h"
 
-layer_t *softmaxLayerInit(quantization_t *forwardQ, quantization_t *backwardQ);
-
-void freeSoftmaxLayer(layer_t *softmaxLayer);
+/* Legacy (pre-2026-05-15 factory API) — retained during PR 1/2 coexistence window. */
+layer_t *softmaxLayerInitLegacy(quantization_t *forwardQ, quantization_t *backwardQ);
+void freeSoftmaxLayerLegacy(layer_t *softmaxLayer);
 
 #endif // SOFTMAXAPI_H
diff --git a/test/unit/layer/UnitTestConv1d.c b/test/unit/layer/UnitTestConv1d.c
index f09556f..93252bb 100644
--- a/test/unit/layer/UnitTestConv1d.c
+++ b/test/unit/layer/UnitTestConv1d.c
@@ -66,7 +66,7 @@ static conv1dRunResult_t conv1dRunForward(conv1dFixtureSetup_t s, float *outputB
     r.q = quantizationInitFloat();
 
     if (s.groups == 1) {
-        r.layer = conv1dLayerInit(r.weights, r.bias, &kernelStore, r.q, r.q, r.q, r.q);
+        r.layer = conv1dLayerInitLegacy(r.weights, r.bias, &kernelStore, r.q, r.q, r.q, r.q);
     } else {
         // Phase-2 will expose groups via UserAPI; here we go around the UserAPI.
         // All statics so their addresses remain valid after this function returns.
@@ -104,7 +104,7 @@ void testConv1dForwardMultiChannelWithBias() {
     kernel_t kernel;
     initKernel(&kernel, 3, VALID, 1, 1);
     quantization_t *q = quantizationInitFloat();
-    layer_t *conv1d = conv1dLayerInit(weights, bias, &kernel, q, q, q, q);
+    layer_t *conv1d = conv1dLayerInitLegacy(weights, bias, &kernel, q, q, q, q);
 
     size_t inputDims[] = {1, 3, 5};
     tensor_t *input =
@@ -131,7 +131,7 @@ void testConv1dForwardSingleChannelSingleBatch() {
     initKernel(&kernel, 2, VALID, 1, 1);
 
     quantization_t *q = quantizationInitFloat();
-    layer_t *conv1d = conv1dLayerInit(weights, NULL, &kernel, q, q, q, q);
+    layer_t *conv1d = conv1dLayerInitLegacy(weights, NULL, &kernel, q, q, q, q);
 
     size_t inputDims[] = {1, 1, 4};
     tensor_t *input =
@@ -165,7 +165,7 @@ void testConv1dBackwardSingleChannelWithBias() {
     kernel_t kernel;
     initKernel(&kernel, 2, VALID, 1, 1);
     quantization_t *q = quantizationInitFloat();
-    layer_t *conv1d = conv1dLayerInit(weights, bias, &kernel, q, q, q, q);
+    layer_t *conv1d = conv1dLayerInitLegacy(weights, bias, &kernel, q, q, q, q);
 
     size_t inputDims[] = {1, 1, 4};
     tensor_t *input =
@@ -210,7 +210,7 @@ void testConv1dBackwardSamePaddingSymmetric() {
     kernel_t kernel;
     initKernel(&kernel, 3, SAME, 1, 1);
     quantization_t *q = quantizationInitFloat();
-    layer_t *conv1d = conv1dLayerInit(weights, NULL, &kernel, q, q, q, q);
+    layer_t *conv1d = conv1dLayerInitLegacy(weights, NULL, &kernel, q, q, q, q);
 
     size_t inputDims[] = {1, 1, 5};
     tensor_t *input =
diff --git a/test/unit/layer/UnitTestSoftmax.c b/test/unit/layer/UnitTestSoftmax.c
index 8a25b33..087116b 100644
--- a/test/unit/layer/UnitTestSoftmax.c
+++ b/test/unit/layer/UnitTestSoftmax.c
@@ -34,7 +34,7 @@ void unitTestSoftmaxForwardFloat() {
 
     /* 3. Build the layer with shared float quantization. */
     quantization_t *floatQ = quantizationInitFloat();
-    layer_t *softmaxLayer = softmaxLayerInit(floatQ, floatQ);
+    layer_t *softmaxLayer = softmaxLayerInitLegacy(floatQ, floatQ);
     layerFunctions_t softmaxFns = layerFunctions[SOFTMAX];
     softmaxFns.forward(softmaxLayer, input, output);
 
@@ -45,7 +45,7 @@ void unitTestSoftmaxForwardFloat() {
     }
 
     /* 5. FREE. */
-    freeSoftmaxLayer(softmaxLayer);
+    freeSoftmaxLayerLegacy(softmaxLayer);
     freeTensor(output);
     freeTensor(input);
     freeQuantization(floatQ);
@@ -84,7 +84,7 @@ void unitTestSoftmaxForwardSymInt32() {
 
     /* 3. Shared SymInt32 quantization for the layer. */
     quantization_t *symIntQ = quantizationInitSymInt32(HTE);
-    layer_t *softmaxLayer = softmaxLayerInit(symIntQ, symIntQ);
+    layer_t *softmaxLayer = softmaxLayerInitLegacy(symIntQ, symIntQ);
     layerFunctions_t softmaxFns = layerFunctions[SOFTMAX];
     softmaxFns.forward(softmaxLayer, input, output);
 
@@ -107,7 +107,7 @@ void unitTestSoftmaxForwardSymInt32() {
 
     /* 6. FREE. */
     freeTensor(outputFloat);
-    freeSoftmaxLayer(softmaxLayer);
+    freeSoftmaxLayerLegacy(softmaxLayer);
     freeTensor(output);
     freeTensor(input);
     freeQuantization(symIntQ);
@@ -159,7 +159,7 @@ void unitTestSoftmaxBackwardFloat() {
 
     /* 4. Build layer. */
     quantization_t *floatQ = quantizationInitFloat();
-    layer_t *softmaxLayer = softmaxLayerInit(floatQ, floatQ);
+    layer_t *softmaxLayer = softmaxLayerInitLegacy(floatQ, floatQ);
     layerFunctions_t softmaxFns = layerFunctions[SOFTMAX];
     softmaxFns.backward(softmaxLayer, input, loss, propLoss);
 
@@ -170,7 +170,7 @@ void unitTestSoftmaxBackwardFloat() {
     }
 
     /* 6. FREE. */
-    freeSoftmaxLayer(softmaxLayer);
+    freeSoftmaxLayerLegacy(softmaxLayer);
     freeTensor(propLoss);
     freeTensor(loss);
     freeTensor(input);
@@ -223,7 +223,7 @@ void unitTestSoftmaxBackwardSymInt32() {
 
     /* 4. Build layer. */
     quantization_t *symIntQ = quantizationInitSymInt32(HTE);
-    layer_t *softmaxLayer = softmaxLayerInit(symIntQ, symIntQ);
+    layer_t *softmaxLayer = softmaxLayerInitLegacy(symIntQ, symIntQ);
     layerFunctions_t softmaxFns = layerFunctions[SOFTMAX];
     softmaxFns.backward(softmaxLayer, input, loss, propLoss);
 
@@ -246,7 +246,7 @@ void unitTestSoftmaxBackwardSymInt32() {
 
     /* 7. FREE. */
     freeTensor(propLossFloat);
-    freeSoftmaxLayer(softmaxLayer);
+    freeSoftmaxLayerLegacy(softmaxLayer);
     freeTensor(propLoss);
     freeTensor(loss);
     freeTensor(input);
@@ -267,13 +267,13 @@ void testSoftmaxLayerInitAndFreeRoundTrip(void) {
      * sweep — this test asserts only that the round-trip completes
      * without a crash and that the layer was wired correctly. */
     quantization_t *floatQ = quantizationInitFloat();
-    layer_t *softmaxLayer = softmaxLayerInit(floatQ, floatQ);
+    layer_t *softmaxLayer = softmaxLayerInitLegacy(floatQ, floatQ);
     TEST_ASSERT_NOT_NULL(softmaxLayer);
     TEST_ASSERT_EQUAL_INT(SOFTMAX, softmaxLayer->type);
     TEST_ASSERT_NOT_NULL(softmaxLayer->config);
     TEST_ASSERT_NOT_NULL(softmaxLayer->config->softmax);
 
-    freeSoftmaxLayer(softmaxLayer);
+    freeSoftmaxLayerLegacy(softmaxLayer);
 
     /* floatQ is owned by the test; freeSoftmaxLayer must not have freed
      * it (quantization configs are externally owned and shared). */
diff --git a/test/unit/loss_functions/UnitTestCrossEntropy.c b/test/unit/loss_functions/UnitTestCrossEntropy.c
index 8cad28e..065dbb5 100644
--- a/test/unit/loss_functions/UnitTestCrossEntropy.c
+++ b/test/unit/loss_functions/UnitTestCrossEntropy.c
@@ -33,7 +33,7 @@ void unitTestCrossEntropySoftmaxBackward() {
                     &softmaxOutputQ, NULL);
 
     quantization_t *floatQ = quantizationInitFloat();
-    layer_t *softmaxLayer = softmaxLayerInit(floatQ, floatQ);
+    layer_t *softmaxLayer = softmaxLayerInitLegacy(floatQ, floatQ);
     layerFunctions_t softmaxFns = layerFunctions[SOFTMAX];
     softmaxFns.forward(softmaxLayer, &logits, &softmaxOutput);
 
@@ -70,7 +70,7 @@ void unitTestCrossEntropySoftmaxBackward() {
     }
 
     /* FREE. */
-    freeSoftmaxLayer(softmaxLayer);
+    freeSoftmaxLayerLegacy(softmaxLayer);
     freeQuantization(floatQ);
 
     /* ASSERT: raw per-element gradient (p-y), no batch divisor. */
@@ -138,7 +138,7 @@ void testCrossEntropyForward_SumReturnsRawSum() {
     setTensorValues(&softmaxOutput, (uint8_t *)outputData, &outputShape, &outputQ, NULL);
 
     quantization_t *floatQ = quantizationInitFloat();
-    layer_t *softmaxLayer = softmaxLayerInit(floatQ, floatQ);
+    layer_t *softmaxLayer = softmaxLayerInitLegacy(floatQ, floatQ);
     layerFunctions_t softmaxFns = layerFunctions[SOFTMAX];
     softmaxFns.forward(softmaxLayer, &logits, &softmaxOutput);
 
@@ -154,7 +154,7 @@ void testCrossEntropyForward_SumReturnsRawSum() {
 
     float capturedActual = crossEntropyForwardFloat(&softmaxOutput, &distribution, REDUCTION_SUM);
 
-    freeSoftmaxLayer(softmaxLayer);
+    freeSoftmaxLayerLegacy(softmaxLayer);
     freeQuantization(floatQ);
 
     /* SUM: same as the pre-existing forward value (raw -log probability sum). */
diff --git a/test/unit/serial/UnitTestDeserialize.c b/test/unit/serial/UnitTestDeserialize.c
index 34d749b..4641fe0 100644
--- a/test/unit/serial/UnitTestDeserialize.c
+++ b/test/unit/serial/UnitTestDeserialize.c
@@ -134,7 +134,7 @@ void testSerializeAndDeserializeModel() {
 
     layer_t *serialLinear1 = linearLayerInitLegacy(serialWeight1, serialBias1, serialLayerQ,
                                                    serialLayerQ, serialLayerQ, serialLayerQ);
-    layer_t *serialSoftmax = softmaxLayerInit(serialLayerQ, serialLayerQ);
+    layer_t *serialSoftmax = softmaxLayerInitLegacy(serialLayerQ, serialLayerQ);
 
     layer_t *serialModel[] = {serialLinear0, serialRelu, serialLinear1, serialSoftmax};
     size_t sizeModel = 4;
@@ -168,7 +168,7 @@ void testSerializeAndDeserializeModel() {
     layer_t *deserialLinear1 =
         linearLayerInitLegacy(deserialWeight1, deserialBias1, deserialLayerQ, deserialLayerQ,
                               deserialLayerQ, deserialLayerQ);
-    layer_t *deserialSoftmax = softmaxLayerInit(deserialLayerQ, deserialLayerQ);
+    layer_t *deserialSoftmax = softmaxLayerInitLegacy(deserialLayerQ, deserialLayerQ);
 
     layer_t *deserialModel[] = {deserialLinear0, deserialRelu, deserialLinear1, deserialSoftmax};
 
@@ -246,7 +246,7 @@ void testSerializeAndDeserializeModel() {
     /* FREE in reverse-init order. Layer free-functions release only the
      * wrapper; parameters and the shared layerQ are caller-managed (per
      * docs/CONVENTIONS.md "Test memory discipline"). */
-    freeSoftmaxLayer(deserialSoftmax);
+    freeSoftmaxLayerLegacy(deserialSoftmax);
     freeLinearLayerLegacy(deserialLinear1);
     freeParameter(deserialBias1);
     freeParameter(deserialWeight1);
@@ -256,7 +256,7 @@ void testSerializeAndDeserializeModel() {
     freeParameter(deserialWeight0);
     freeQuantization(deserialLayerQ);
 
-    freeSoftmaxLayer(serialSoftmax);
+    freeSoftmaxLayerLegacy(serialSoftmax);
     freeLinearLayerLegacy(serialLinear1);
     freeParameter(serialBias1);
     freeParameter(serialWeight1);
diff --git a/test/unit/userAPI/CMakeLists.txt b/test/unit/userAPI/CMakeLists.txt
index 3153a87..4f0558b 100644
--- a/test/unit/userAPI/CMakeLists.txt
+++ b/test/unit/userAPI/CMakeLists.txt
@@ -19,13 +19,19 @@ add_elastic_ai_unit_test(
         LayerWeightsApi
         MORE_LIBS
         LinearApi
+        Conv1dApi
+        Conv1dTransposedApi
         QuantizationApi
         LayerQuant
         LayerCommon
         Quantization
         Rounding
         Linear
+        Conv1d
+        Conv1dTransposed
+        Kernel
         Tensor
+        TensorApi
 )
 
 
@@ -81,6 +87,51 @@ add_elastic_ai_unit_test(
         StorageApi
 )
 
+add_elastic_ai_unit_test(
+        LIB_UNDER_TEST
+        Conv1dApi
+        MORE_LIBS
+        LayerQuant
+        LayerCommon
+        QuantizationApi
+        Quantization
+        Rounding
+        Conv1d
+        Kernel
+        Tensor
+        TensorApi
+)
+
+add_elastic_ai_unit_test(
+        LIB_UNDER_TEST
+        Conv1dTransposedApi
+        MORE_LIBS
+        LayerQuant
+        LayerCommon
+        QuantizationApi
+        Quantization
+        Rounding
+        Conv1dTransposed
+        Kernel
+        Tensor
+        TensorApi
+)
+
+add_elastic_ai_unit_test(
+        LIB_UNDER_TEST
+        Pool1dApi
+        MORE_LIBS
+        LayerQuant
+        QuantizationApi
+        Quantization
+        Rounding
+        MaxPool1d
+        AvgPool1d
+        Kernel
+        Tensor
+        TensorApi
+)
+
 add_executable(UnitTestMultiLayerTraining UnitTestMultiLayerTraining.c)
 target_link_libraries(UnitTestMultiLayerTraining PRIVATE
         unity
diff --git a/test/unit/userAPI/UnitTestConv1dApi.c b/test/unit/userAPI/UnitTestConv1dApi.c
new file mode 100644
index 0000000..a605a9b
--- /dev/null
+++ b/test/unit/userAPI/UnitTestConv1dApi.c
@@ -0,0 +1,147 @@
+#define SOURCE_FILE "UNIT_TEST_CONV1D_API"
+
+#include "Conv1d.h"
+#include "Conv1dApi.h"
+#include "Kernel.h"
+#include "Layer.h"
+#include "LayerCommon.h"
+#include "LayerQuant.h"
+#include "QuantizationApi.h"
+#include "Tensor.h"
+#include "unity.h"
+
+void setUp() {}
+void tearDown() {}
+
+void testConv1dLayerInitBorrowingBuildsLayerWithCorrectShape(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    layer_t *layer = conv1dLayerInit(
+        &(conv1dInit_t){
+            .inChannels = 3,
+            .outChannels = 4,
+            .kernelSize = 5,
+            .padding = VALID,
+            .stride = 1,
+            .dilation = 1,
+            .groups = 1,
+            .bias = BIAS_TRUE,
+        },
+        &lq);
+
+    TEST_ASSERT_NOT_NULL(layer);
+    TEST_ASSERT_EQUAL_INT(CONV1D, layer->type);
+
+    conv1dConfig_t *cfg = layer->config->conv1d;
+    TEST_ASSERT_NOT_NULL(cfg);
+    TEST_ASSERT_FALSE(cfg->ownsQuantizations);
+
+    /* Borrowing variant stores pointers verbatim */
+    TEST_ASSERT_EQUAL_PTR(q, cfg->forwardQ);
+    TEST_ASSERT_EQUAL_PTR(q, cfg->weightGradQ);
+    TEST_ASSERT_EQUAL_PTR(q, cfg->biasGradQ);
+    TEST_ASSERT_EQUAL_PTR(q, cfg->propLossQ);
+
+    /* Weights allocated with shape [outChannels, inChannels/groups, kernelSize] */
+    TEST_ASSERT_NOT_NULL(cfg->weights);
+    tensor_t *weightTensor = cfg->weights->param;
+    TEST_ASSERT_NOT_NULL(weightTensor);
+    TEST_ASSERT_EQUAL_UINT(3, weightTensor->shape->numberOfDimensions);
+    TEST_ASSERT_EQUAL_UINT(4, weightTensor->shape->dimensions[0]); /* outChannels */
+    TEST_ASSERT_EQUAL_UINT(3, weightTensor->shape->dimensions[1]); /* inChannels / groups */
+    TEST_ASSERT_EQUAL_UINT(5, weightTensor->shape->dimensions[2]); /* kernelSize */
+
+    /* Bias allocated with shape [outChannels] */
+    TEST_ASSERT_NOT_NULL(cfg->bias);
+    tensor_t *biasTensor = cfg->bias->param;
+    TEST_ASSERT_NOT_NULL(biasTensor);
+    TEST_ASSERT_EQUAL_UINT(1, biasTensor->shape->numberOfDimensions);
+    TEST_ASSERT_EQUAL_UINT(4, biasTensor->shape->dimensions[0]);
+
+    /* Kernel populated from init struct */
+    TEST_ASSERT_NOT_NULL(cfg->kernel);
+    TEST_ASSERT_EQUAL_UINT(5, cfg->kernel->size);
+    TEST_ASSERT_EQUAL_INT(VALID, cfg->kernel->paddingType);
+    TEST_ASSERT_EQUAL_UINT(1, cfg->kernel->stride);
+    TEST_ASSERT_EQUAL_UINT(1, cfg->kernel->dilation);
+
+    /* groups set explicitly to 1 via the init struct */
+    TEST_ASSERT_EQUAL_UINT(1, cfg->groups);
+
+    freeConv1dLayer(layer);
+}
+
+void testConv1dLayerInitBorrowingBiasDefaultResolvesToTrue(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    layer_t *layer = conv1dLayerInit(
+        &(conv1dInit_t){
+            .inChannels = 1,
+            .outChannels = 2,
+            .kernelSize = 3,
+            /* .bias omitted → BIAS_DEFAULT (0) → resolves to true */
+        },
+        &lq);
+
+    conv1dConfig_t *cfg = layer->config->conv1d;
+    TEST_ASSERT_NOT_NULL(cfg->bias);
+
+    freeConv1dLayer(layer);
+}
+
+void testConv1dLayerInitBorrowingBiasFalseLeavesBiasNull(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    layer_t *layer = conv1dLayerInit(
+        &(conv1dInit_t){
+            .inChannels = 1,
+            .outChannels = 2,
+            .kernelSize = 3,
+            .bias = BIAS_FALSE,
+        },
+        &lq);
+
+    conv1dConfig_t *cfg = layer->config->conv1d;
+    TEST_ASSERT_NULL(cfg->bias);
+
+    freeConv1dLayer(layer);
+}
+
+void testConv1dLayerInitBorrowingPaddingDefaultIsValid(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    layer_t *layer = conv1dLayerInit(
+        &(conv1dInit_t){
+            .inChannels = 1,
+            .outChannels = 1,
+            .kernelSize = 3,
+            /* .padding omitted → VALID (enum value 0) */
+            /* .stride, .dilation, .groups omitted → 1 (resolved from 0) */
+        },
+        &lq);
+
+    conv1dConfig_t *cfg = layer->config->conv1d;
+    TEST_ASSERT_EQUAL_INT(VALID, cfg->kernel->paddingType);
+    TEST_ASSERT_EQUAL_UINT(1, cfg->kernel->stride);
+    TEST_ASSERT_EQUAL_UINT(1, cfg->kernel->dilation);
+    TEST_ASSERT_EQUAL_UINT(1, cfg->groups);
+
+    freeConv1dLayer(layer);
+}
+
+int main(void) {
+    UNITY_BEGIN();
+    RUN_TEST(testConv1dLayerInitBorrowingBuildsLayerWithCorrectShape);
+    RUN_TEST(testConv1dLayerInitBorrowingBiasDefaultResolvesToTrue);
+    RUN_TEST(testConv1dLayerInitBorrowingBiasFalseLeavesBiasNull);
+    RUN_TEST(testConv1dLayerInitBorrowingPaddingDefaultIsValid);
+    return UNITY_END();
+}
diff --git a/test/unit/userAPI/UnitTestConv1dTransposedApi.c b/test/unit/userAPI/UnitTestConv1dTransposedApi.c
new file mode 100644
index 0000000..4725fd0
--- /dev/null
+++ b/test/unit/userAPI/UnitTestConv1dTransposedApi.c
@@ -0,0 +1,127 @@
+#define SOURCE_FILE "UNIT_TEST_CONV1D_TRANSPOSED_API"
+
+#include "Conv1dTransposed.h"
+#include "Conv1dTransposedApi.h"
+#include "Kernel.h"
+#include "Layer.h"
+#include "LayerCommon.h"
+#include "LayerQuant.h"
+#include "QuantizationApi.h"
+#include "Tensor.h"
+#include "TensorApi.h"
+#include "unity.h"
+
+void setUp() {}
+void tearDown() {}
+
+void testConv1dTransposedLayerInitBorrowingBuildsLayerWithCorrectShape(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    layer_t *layer = conv1dTransposedLayerInit(
+        &(conv1dTransposedInit_t){
+            .inChannels = 16,
+            .outChannels = 8,
+            .kernelSize = 5,
+            .stride = 5,
+            .padding = VALID,
+            .bias = BIAS_TRUE,
+        },
+        &lq);
+
+    TEST_ASSERT_NOT_NULL(layer);
+    TEST_ASSERT_EQUAL_INT(CONV1D_TRANSPOSED, layer->type);
+
+    conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed;
+    TEST_ASSERT_NOT_NULL(cfg);
+    TEST_ASSERT_FALSE(cfg->ownsQuantizations);
+
+    /* Borrowing variant stores pointers verbatim */
+    TEST_ASSERT_EQUAL_PTR(q, cfg->forwardQ);
+    TEST_ASSERT_EQUAL_PTR(q, cfg->weightGradQ);
+    TEST_ASSERT_EQUAL_PTR(q, cfg->biasGradQ);
+    TEST_ASSERT_EQUAL_PTR(q, cfg->propLossQ);
+
+    /* Weight shape: [inChannels, outChannels/groups, kernelSize] per Conv1dTransposed.h:12.
+     * Note the SWAP relative to Conv1d. */
+    TEST_ASSERT_NOT_NULL(cfg->weights);
+    tensor_t *weightTensor = cfg->weights->param;
+    TEST_ASSERT_NOT_NULL(weightTensor);
+    TEST_ASSERT_EQUAL_UINT(3, weightTensor->shape->numberOfDimensions);
+    TEST_ASSERT_EQUAL_UINT(16, weightTensor->shape->dimensions[0]); /* inChannels */
+    TEST_ASSERT_EQUAL_UINT(8, weightTensor->shape->dimensions[1]);  /* outChannels / groups */
+    TEST_ASSERT_EQUAL_UINT(5, weightTensor->shape->dimensions[2]);  /* kernelSize */
+
+    /* Bias shape: [outChannels] */
+    TEST_ASSERT_NOT_NULL(cfg->bias);
+    tensor_t *biasTensor = cfg->bias->param;
+    TEST_ASSERT_EQUAL_UINT(1, biasTensor->shape->numberOfDimensions);
+    TEST_ASSERT_EQUAL_UINT(8, biasTensor->shape->dimensions[0]);
+
+    /* Kernel populated from init struct */
+    TEST_ASSERT_NOT_NULL(cfg->kernel);
+    TEST_ASSERT_EQUAL_UINT(5, cfg->kernel->size);
+    TEST_ASSERT_EQUAL_INT(VALID, cfg->kernel->paddingType);
+    TEST_ASSERT_EQUAL_UINT(5, cfg->kernel->stride);
+
+    /* groups and outputPadding omitted from the init struct: default to 1 and 0 */
+    TEST_ASSERT_EQUAL_UINT(1, cfg->groups);
+    TEST_ASSERT_EQUAL_UINT(0, cfg->outputPadding);
+
+    freeConv1dTransposedLayer(layer);
+    freeQuantization(q);
+}
+
+void testConv1dTransposedLayerInitBorrowingBiasFalseLeavesBiasNull(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    layer_t *layer = conv1dTransposedLayerInit(
+        &(conv1dTransposedInit_t){
+            .inChannels = 4,
+            .outChannels = 2,
+            .kernelSize = 3,
+            .bias = BIAS_FALSE,
+        },
+        &lq);
+
+    conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed;
+    TEST_ASSERT_NULL(cfg->bias);
+
+    freeConv1dTransposedLayer(layer);
+    freeQuantization(q);
+}
+
+void testConv1dTransposedLayerInitBorrowingOutputPaddingPropagatesToConfig(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    layer_t *layer = conv1dTransposedLayerInit(
+        &(conv1dTransposedInit_t){
+            .inChannels = 4,
+            .outChannels = 2,
+            .kernelSize = 3,
+            .stride = 2,
+            .outputPadding = 1,
+            .bias = BIAS_TRUE,
+        },
+        &lq);
+
+    conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed;
+    TEST_ASSERT_EQUAL_UINT(1, cfg->outputPadding);
+    TEST_ASSERT_EQUAL_UINT(2, cfg->kernel->stride);
+
+    freeConv1dTransposedLayer(layer);
+    freeQuantization(q);
+}
+
+int main(void) {
+    UNITY_BEGIN();
+    RUN_TEST(testConv1dTransposedLayerInitBorrowingBuildsLayerWithCorrectShape);
+    RUN_TEST(testConv1dTransposedLayerInitBorrowingBiasFalseLeavesBiasNull);
+    RUN_TEST(testConv1dTransposedLayerInitBorrowingOutputPaddingPropagatesToConfig);
+    return UNITY_END();
+}
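To make the weight-layout SWAP concrete, the helpers below compute where element `w[oc][ic][k]` (Conv1d) or `w[ic][oc][k]` (Conv1dTransposed) sits in the flat buffer passed to `layerLoadWeights`. This assumes contiguous row-major storage; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Flat offset of w[oc][ic][k] in a Conv1d weight buffer shaped
 * [outChannels, inChannels / groups, kernelSize] (row-major assumed). */
static size_t conv1dWeightOffset(size_t oc, size_t ic, size_t k,
                                 size_t inChannelsPerGroup, size_t kernelSize) {
    assert(ic < inChannelsPerGroup && k < kernelSize);
    return (oc * inChannelsPerGroup + ic) * kernelSize + k;
}

/* Flat offset of w[ic][oc][k] in a Conv1dTransposed weight buffer shaped
 * [inChannels, outChannels / groups, kernelSize]: the first two axes are swapped. */
static size_t conv1dTransposedWeightOffset(size_t ic, size_t oc, size_t k,
                                           size_t outChannelsPerGroup, size_t kernelSize) {
    assert(oc < outChannelsPerGroup && k < kernelSize);
    return (ic * outChannelsPerGroup + oc) * kernelSize + k;
}
```

In PyTorch, `ConvTranspose1d.weight` also has shape `(in_channels, out_channels / groups, kernel_size)`, so a contiguous export should already be in this order, whereas `Conv1d.weight` is `(out_channels, in_channels / groups, kernel_size)`.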
diff --git a/test/unit/userAPI/UnitTestFlattenIntegration.c b/test/unit/userAPI/UnitTestFlattenIntegration.c
index 0203831..d4124c1 100644
--- a/test/unit/userAPI/UnitTestFlattenIntegration.c
+++ b/test/unit/userAPI/UnitTestFlattenIntegration.c
@@ -51,7 +51,7 @@ void testCalculateGradsSequential_WithFlattenFirst_DoesNotCrash(void) {
 
     layer_t *flatten = flattenLayerInit();
     layer_t *linear = linearLayerInitLegacy(w0, b0, q, q, q, q);
-    layer_t *softmax = softmaxLayerInit(q, q);
+    layer_t *softmax = softmaxLayerInitLegacy(q, q);
     layer_t *model[3] = {flatten, linear, softmax};
 
     /* Input [1, 2, 3] = 6 elements. */
@@ -93,7 +93,7 @@ void testCalculateGradsSequential_WithFlattenFirst_DoesNotCrash(void) {
     freeTrainingStats(stats);
     freeTensor(label);
     freeTensor(input);
-    freeSoftmaxLayer(softmax);
+    freeSoftmaxLayerLegacy(softmax);
     freeLinearLayerLegacy(linear);
     freeFlattenLayer(flatten);
     freeParameter(b0);
diff --git a/test/unit/userAPI/UnitTestLayerQuant.c b/test/unit/userAPI/UnitTestLayerQuant.c
index ff3f919..8c37b10 100644
--- a/test/unit/userAPI/UnitTestLayerQuant.c
+++ b/test/unit/userAPI/UnitTestLayerQuant.c
@@ -2,6 +2,7 @@
 
 #include "LayerQuant.h"
 #include "QuantizationApi.h"
+#include "StorageApi.h"
 #include "unity.h"
 
 void setUp() {}
@@ -31,9 +32,47 @@ void testLayerQuantInitUniformDoesNotMutateTheQuantization(void) {
     TEST_ASSERT_EQUAL_PTR(configBefore, q->qConfig);
 }
 
+void testDeepCopyQuantizationReturnsNullForNullInput(void) {
+    TEST_ASSERT_NULL(deepCopyQuantization(NULL));
+}
+
+void testDeepCopyQuantizationFloat32ReturnsFreshAllocationWithNullQConfig(void) {
+    quantization_t *src = quantizationInitFloat();
+    quantization_t *dst = deepCopyQuantization(src);
+
+    TEST_ASSERT_NOT_NULL(dst);
+    TEST_ASSERT_NOT_EQUAL(src, dst); /* fresh allocation */
+    TEST_ASSERT_EQUAL_INT(FLOAT32, dst->type);
+    TEST_ASSERT_NULL(dst->qConfig);
+
+    freeReservedMemory(dst->qConfig);
+    freeReservedMemory(dst);
+}
+
+void testDeepCopyQuantizationSymInt32DuplicatesQConfigBytes(void) {
+    quantization_t *src = quantizationInitSymInt32(HTE);
+    quantization_t *dst = deepCopyQuantization(src);
+
+    TEST_ASSERT_NOT_NULL(dst);
+    TEST_ASSERT_NOT_EQUAL(src, dst);
+    TEST_ASSERT_EQUAL_INT(SYM_INT32, dst->type);
+    TEST_ASSERT_NOT_NULL(dst->qConfig);
+    TEST_ASSERT_NOT_EQUAL(src->qConfig, dst->qConfig);
+
+    symInt32QConfig_t *srcCfg = (symInt32QConfig_t *)src->qConfig;
+    symInt32QConfig_t *dstCfg = (symInt32QConfig_t *)dst->qConfig;
+    TEST_ASSERT_EQUAL_MEMORY(srcCfg, dstCfg, sizeof(symInt32QConfig_t));
+
+    freeReservedMemory(dst->qConfig);
+    freeReservedMemory(dst);
+}
+
 int main(void) {
     UNITY_BEGIN();
     RUN_TEST(testLayerQuantInitUniformSetsAllFourSlotsToTheSamePointer);
     RUN_TEST(testLayerQuantInitUniformDoesNotMutateTheQuantization);
+    RUN_TEST(testDeepCopyQuantizationReturnsNullForNullInput);
+    RUN_TEST(testDeepCopyQuantizationFloat32ReturnsFreshAllocationWithNullQConfig);
+    RUN_TEST(testDeepCopyQuantizationSymInt32DuplicatesQConfigBytes);
     return UNITY_END();
 }
diff --git a/test/unit/userAPI/UnitTestLayerWeightsApi.c b/test/unit/userAPI/UnitTestLayerWeightsApi.c
index d6b3318..b8330d7a 100644
--- a/test/unit/userAPI/UnitTestLayerWeightsApi.c
+++ b/test/unit/userAPI/UnitTestLayerWeightsApi.c
@@ -1,5 +1,9 @@
 #define SOURCE_FILE "UNIT_TEST_LAYER_WEIGHTS_API"
 
+#include "Conv1d.h"
+#include "Conv1dApi.h"
+#include "Conv1dTransposed.h"
+#include "Conv1dTransposedApi.h"
 #include "LayerCommon.h"
 #include "LayerQuant.h"
 #include "LayerWeightsApi.h"
@@ -7,6 +11,7 @@
 #include "LinearApi.h"
 #include "QuantizationApi.h"
 #include "Tensor.h"
+#include "TensorApi.h"
 #include "unity.h"
 
 void setUp() {}
@@ -64,9 +69,108 @@ void testLayerLoadWeightsLinearNoBiasAcceptsNullBiasData(void) {
     freeLinearLayer(layer);
 }
 
+void testLayerLoadWeightsConv1dOverwritesWeightAndBiasTensors(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    layer_t *layer = conv1dLayerInit(
+        &(conv1dInit_t){
+            .inChannels = 2,
+            .outChannels = 3,
+            .kernelSize = 4,
+            .bias = BIAS_TRUE,
+        },
+        &lq);
+
+    /* Weight tensor: [outChannels=3, inChannels/groups=2, K=4] → 24 elems
+     * Bias tensor:   [outChannels=3] → 3 elems */
+    float weightData[24] = {
+        1.f,  2.f,  3.f,  4.f,  5.f,  6.f,  7.f,  8.f,  9.f,  10.f, 11.f, 12.f,
+        13.f, 14.f, 15.f, 16.f, 17.f, 18.f, 19.f, 20.f, 21.f, 22.f, 23.f, 24.f,
+    };
+    float biasData[3] = {-1.f, -2.f, -3.f};
+
+    layerLoadWeights(layer, weightData, biasData);
+
+    conv1dConfig_t *cfg = layer->config->conv1d;
+    float *loadedWeights = (float *)cfg->weights->param->data;
+    float *loadedBias = (float *)cfg->bias->param->data;
+
+    TEST_ASSERT_EQUAL_FLOAT_ARRAY(weightData, loadedWeights, 24);
+    TEST_ASSERT_EQUAL_FLOAT_ARRAY(biasData, loadedBias, 3);
+
+    freeConv1dLayer(layer);
+    freeQuantization(q);
+}
+
+void testLayerLoadWeightsConv1dNoBiasAcceptsNullBiasData(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    layer_t *layer = conv1dLayerInit(
+        &(conv1dInit_t){
+            .inChannels = 1,
+            .outChannels = 1,
+            .kernelSize = 3,
+            .bias = BIAS_FALSE,
+        },
+        &lq);
+
+    float weightData[3] = {0.5f, 0.25f, 0.125f};
+    layerLoadWeights(layer, weightData, NULL);
+
+    conv1dConfig_t *cfg = layer->config->conv1d;
+    float *loadedWeights = (float *)cfg->weights->param->data;
+    TEST_ASSERT_EQUAL_FLOAT_ARRAY(weightData, loadedWeights, 3);
+    TEST_ASSERT_NULL(cfg->bias);
+
+    freeConv1dLayer(layer);
+    freeQuantization(q);
+}
+
+void testLayerLoadWeightsConv1dTransposedOverwritesWeightAndBiasTensors(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    layer_t *layer = conv1dTransposedLayerInit(
+        &(conv1dTransposedInit_t){
+            .inChannels = 4,
+            .outChannels = 2,
+            .kernelSize = 3,
+            .bias = BIAS_TRUE,
+        },
+        &lq);
+
+    /* Weight tensor: [inChannels=4, outChannels/groups=2, K=3] → 24 elems.
+     * NOTE the SWAP relative to Conv1d. */
+    float weightData[24] = {0};
+    for (size_t i = 0; i < 24; i++) {
+        weightData[i] = (float)(i + 100);
+    }
+    float biasData[2] = {-10.f, -20.f};
+
+    layerLoadWeights(layer, weightData, biasData);
+
+    conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed;
+    float *loadedWeights = (float *)cfg->weights->param->data;
+    float *loadedBias = (float *)cfg->bias->param->data;
+
+    TEST_ASSERT_EQUAL_FLOAT_ARRAY(weightData, loadedWeights, 24);
+    TEST_ASSERT_EQUAL_FLOAT_ARRAY(biasData, loadedBias, 2);
+
+    freeConv1dTransposedLayer(layer);
+    freeQuantization(q);
+}
+
 int main(void) {
     UNITY_BEGIN();
     RUN_TEST(testLayerLoadWeightsLinearOverwritesWeightAndBiasTensors);
     RUN_TEST(testLayerLoadWeightsLinearNoBiasAcceptsNullBiasData);
+    RUN_TEST(testLayerLoadWeightsConv1dOverwritesWeightAndBiasTensors);
+    RUN_TEST(testLayerLoadWeightsConv1dNoBiasAcceptsNullBiasData);
+    RUN_TEST(testLayerLoadWeightsConv1dTransposedOverwritesWeightAndBiasTensors);
     return UNITY_END();
 }
diff --git a/test/unit/userAPI/UnitTestMnistSmoke.c b/test/unit/userAPI/UnitTestMnistSmoke.c
index 3b312fb..65c4dbc 100644
--- a/test/unit/userAPI/UnitTestMnistSmoke.c
+++ b/test/unit/userAPI/UnitTestMnistSmoke.c
@@ -162,7 +162,7 @@ static void buildModel(layer_t **model, quantization_t **q_out) {
     parameter_t *b1 = parameterInit(b1Param, b1Grad);
 
     model[2] = linearLayerInitLegacy(w1, b1, q, q, q, q);
-    model[3] = softmaxLayerInit(q, q);
+    model[3] = softmaxLayerInitLegacy(q, q);
 }
 
 static size_t cbInvocations;
@@ -215,7 +215,7 @@ void testMnistSmoke_FullTrainingPipelineReducesLoss() {
      * NOTE: freeOptimSgdM cascades to all model parameters via freeParameter.
      * Do NOT also call freeParameter on w0/b0/w1/b1 — would be a double-free. */
     freeOptimSgdM(sgd);
-    freeSoftmaxLayer(model[3]);
+    freeSoftmaxLayerLegacy(model[3]);
     freeLinearLayerLegacy(model[2]);
     freeReluLayerLegacy(model[1]);
     freeLinearLayerLegacy(model[0]);
@@ -275,7 +275,7 @@ void testMnistSmoke_SnprintfGmtimeRBetweenSetupAndTrainingRun_NoSilentExit() {
     char capturedFirstChar = buf[0];
 
     freeOptimSgdM(sgd);
-    freeSoftmaxLayer(model[3]);
+    freeSoftmaxLayerLegacy(model[3]);
     freeLinearLayerLegacy(model[2]);
     freeReluLayerLegacy(model[1]);
     freeLinearLayerLegacy(model[0]);
diff --git a/test/unit/userAPI/UnitTestMultiLayerTraining.c b/test/unit/userAPI/UnitTestMultiLayerTraining.c
index 9aec230..30f7460 100644
--- a/test/unit/userAPI/UnitTestMultiLayerTraining.c
+++ b/test/unit/userAPI/UnitTestMultiLayerTraining.c
@@ -86,7 +86,7 @@ void testMultiLayerBackward_WithCrossEntropy_DoesNotCrash() {
     parameter_t *b1 = parameterInit(b1Param, b1Grad);
 
     layer_t *linear1 = linearLayerInitLegacy(w1, b1, q, q, q, q);
-    layer_t *softmax = softmaxLayerInit(q, q);
+    layer_t *softmax = softmaxLayerInitLegacy(q, q);
 
     layer_t *model[] = {linear0, relu, linear1, softmax};
     size_t sizeModel = 4;
@@ -126,7 +126,7 @@ void testMultiLayerBackward_WithCrossEntropy_DoesNotCrash() {
     freeTrainingStats(stats);
     freeTensor(label);
     freeTensor(input);
-    freeSoftmaxLayer(softmax);
+    freeSoftmaxLayerLegacy(softmax);
     freeLinearLayerLegacy(linear1);
     freeParameter(b1);
     freeParameter(w1);
@@ -205,7 +205,7 @@ void testMultiLayerBackward_WithManualInit_DoesNotCrash() {
     parameter_t *b1 = parameterInit(b1Param, b1Grad);
 
     layer_t *linear1 = linearLayerInitLegacy(w1, b1, q, q, q, q);
-    layer_t *softmax = softmaxLayerInit(q, q);
+    layer_t *softmax = softmaxLayerInitLegacy(q, q);
 
     layer_t *model[] = {linear0, relu, linear1, softmax};
     size_t sizeModel = 4;
@@ -257,7 +257,7 @@ void testMultiLayerBackward_WithManualInit_DoesNotCrash() {
     freeTrainingStats(stats);
     freeTensor(label);
     freeTensor(input);
-    freeSoftmaxLayer(softmax);
+    freeSoftmaxLayerLegacy(softmax);
     freeLinearLayerLegacy(linear1);
     freeParameter(b1);
     freeParameter(w1);
@@ -334,7 +334,7 @@ void testMultiLayerTraining_MultipleSteps_GradsAccumulate() {
     parameter_t *b1 = parameterInit(b1Param, b1Grad);
 
     layer_t *linear1 = linearLayerInitLegacy(w1, b1, q, q, q, q);
-    layer_t *softmax = softmaxLayerInit(q, q);
+    layer_t *softmax = softmaxLayerInitLegacy(q, q);
 
     layer_t *model[] = {linear0, relu, linear1, softmax};
     size_t sizeModel = 4;
@@ -389,7 +389,7 @@ void testMultiLayerTraining_MultipleSteps_GradsAccumulate() {
     freeTensor(label);
     freeTensor(input);
     freeOptimSgdM(sgd);
-    freeSoftmaxLayer(softmax);
+    freeSoftmaxLayerLegacy(softmax);
     freeLinearLayerLegacy(linear1);
     freeReluLayerLegacy(relu);
     freeLinearLayerLegacy(linear0);
diff --git a/test/unit/userAPI/UnitTestPool1dApi.c b/test/unit/userAPI/UnitTestPool1dApi.c
new file mode 100644
index 0000000..c9a8e03
--- /dev/null
+++ b/test/unit/userAPI/UnitTestPool1dApi.c
@@ -0,0 +1,219 @@
+#define SOURCE_FILE "UNIT_TEST_POOL1D_API"
+
+#include "AvgPool1d.h"
+#include "Kernel.h"
+#include "Layer.h"
+#include "LayerQuant.h"
+#include "MaxPool1d.h"
+#include "Pool1dApi.h"
+#include "QuantizationApi.h"
+#include "Tensor.h"
+#include "TensorApi.h"
+#include "unity.h"
+
+void setUp() {}
+void tearDown() {}
+
+/* ============================================================================
+ * MaxPool1d
+ * ========================================================================== */
+
+void testMaxPool1dLayerInitBorrowingBuildsLayerWithKernelAndArgmax(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    /* For K=2 S=2 VALID on inputLength=64, outputLength = (64 - 2)/2 + 1 = 32 */
+    layer_t *layer = maxPool1dLayerInit(
+        &(maxPool1dInit_t){
+            .kernelSize = 2,
+            .stride = 2,
+            .inputChannels = 16,
+            .inputLength = 64,
+        },
+        &lq);
+
+    TEST_ASSERT_NOT_NULL(layer);
+    TEST_ASSERT_EQUAL_INT(MAXPOOL1D, layer->type);
+
+    maxPool1dConfig_t *cfg = layer->config->maxPool1d;
+    TEST_ASSERT_NOT_NULL(cfg);
+    TEST_ASSERT_FALSE(cfg->ownsQuantizations);
+
+    TEST_ASSERT_EQUAL_PTR(q, cfg->forwardQ);
+    TEST_ASSERT_EQUAL_PTR(q, cfg->propLossQ);
+
+    /* Kernel correctness */
+    TEST_ASSERT_NOT_NULL(cfg->kernel);
+    TEST_ASSERT_EQUAL_UINT(2, cfg->kernel->size);
+    TEST_ASSERT_EQUAL_INT(VALID, cfg->kernel->paddingType);
+    TEST_ASSERT_EQUAL_UINT(2, cfg->kernel->stride);
+    TEST_ASSERT_EQUAL_UINT(1, cfg->kernel->dilation);
+
+    /* Argmax tensor shape: [1, inputChannels, outputLength] = [1, 16, 32] */
+    TEST_ASSERT_NOT_NULL(cfg->argmaxIndices);
+    TEST_ASSERT_EQUAL_UINT(3, cfg->argmaxIndices->shape->numberOfDimensions);
+    TEST_ASSERT_EQUAL_UINT(1, cfg->argmaxIndices->shape->dimensions[0]);
+    TEST_ASSERT_EQUAL_UINT(16, cfg->argmaxIndices->shape->dimensions[1]);
+    TEST_ASSERT_EQUAL_UINT(32, cfg->argmaxIndices->shape->dimensions[2]);
+    TEST_ASSERT_EQUAL_INT(INT32, cfg->argmaxIndices->quantization->type);
+
+    freeMaxPool1dLayer(layer);
+    freeQuantization(q);
+}
+
+void testMaxPool1dLayerInitBorrowingStrideDefaultsToKernelSize(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    /* stride omitted → defaults to kernelSize per PyTorch convention */
+    layer_t *layer = maxPool1dLayerInit(
+        &(maxPool1dInit_t){
+            .kernelSize = 4,
+            .inputChannels = 1,
+            .inputLength = 16,
+        },
+        &lq);
+
+    maxPool1dConfig_t *cfg = layer->config->maxPool1d;
+    TEST_ASSERT_EQUAL_UINT(4, cfg->kernel->size);
+    TEST_ASSERT_EQUAL_UINT(4, cfg->kernel->stride);
+    /* outputLength = (16 - 4)/4 + 1 = 4 */
+    TEST_ASSERT_EQUAL_UINT(4, cfg->argmaxIndices->shape->dimensions[2]);
+
+    freeMaxPool1dLayer(layer);
+    freeQuantization(q);
+}
+
+void testMaxPool1dLayerInitOwningDeepCopiesTwoQuantizations(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    layer_t *layer = maxPool1dLayerInitOwning(
+        &(maxPool1dInit_t){
+            .kernelSize = 2,
+            .stride = 2,
+            .inputChannels = 4,
+            .inputLength = 8,
+        },
+        &lq);
+
+    maxPool1dConfig_t *cfg = layer->config->maxPool1d;
+    TEST_ASSERT_NOT_EQUAL(q, cfg->forwardQ);
+    TEST_ASSERT_NOT_EQUAL(q, cfg->propLossQ);
+    TEST_ASSERT_EQUAL_INT(q->type, cfg->forwardQ->type);
+    TEST_ASSERT_TRUE(cfg->ownsQuantizations);
+
+    freeMaxPool1dLayer(layer);
+    freeQuantization(q);
+}
+
+void testMaxPool1dLayerInitOwningRepeatedBuildFreeNoLeak(void) {
+    for (int i = 0; i < 5; i++) {
+        quantization_t *q = quantizationInitFloat();
+        layerQuant_t lq;
+        layerQuantInitUniform(&lq, q);
+
+        layer_t *layer = maxPool1dLayerInitOwning(
+            &(maxPool1dInit_t){
+                .kernelSize = 2,
+                .stride = 2,
+                .inputChannels = 4,
+                .inputLength = 8,
+            },
+            &lq);
+
+        freeMaxPool1dLayer(layer);
+        freeQuantization(q);
+    }
+    TEST_PASS();
+}
+
+/* ============================================================================
+ * AvgPool1d
+ * ========================================================================== */
+
+void testAvgPool1dLayerInitBorrowingBuildsLayerWithKernel(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    layer_t *layer = avgPool1dLayerInit(
+        &(avgPool1dInit_t){
+            .kernelSize = 5,
+            .stride = 5,
+        },
+        &lq);
+
+    TEST_ASSERT_NOT_NULL(layer);
+    TEST_ASSERT_EQUAL_INT(AVGPOOL1D, layer->type);
+
+    avgPool1dConfig_t *cfg = layer->config->avgPool1d;
+    TEST_ASSERT_NOT_NULL(cfg);
+    TEST_ASSERT_FALSE(cfg->ownsQuantizations);
+
+    TEST_ASSERT_EQUAL_PTR(q, cfg->forwardQ);
+    TEST_ASSERT_EQUAL_PTR(q, cfg->propLossQ);
+
+    TEST_ASSERT_NOT_NULL(cfg->kernel);
+    TEST_ASSERT_EQUAL_UINT(5, cfg->kernel->size);
+    TEST_ASSERT_EQUAL_INT(VALID, cfg->kernel->paddingType);
+    TEST_ASSERT_EQUAL_UINT(5, cfg->kernel->stride);
+
+    freeAvgPool1dLayer(layer);
+    freeQuantization(q);
+}
+
+void testAvgPool1dLayerInitBorrowingStrideDefaultsToKernelSize(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    layer_t *layer = avgPool1dLayerInit(
+        &(avgPool1dInit_t){
+            .kernelSize = 3,
+            /* stride omitted → kernelSize=3 */
+        },
+        &lq);
+
+    avgPool1dConfig_t *cfg = layer->config->avgPool1d;
+    TEST_ASSERT_EQUAL_UINT(3, cfg->kernel->stride);
+
+    freeAvgPool1dLayer(layer);
+    freeQuantization(q);
+}
+
+void testAvgPool1dLayerInitOwningDeepCopiesTwoQuantizations(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    layer_t *layer = avgPool1dLayerInitOwning(
+        &(avgPool1dInit_t){
+            .kernelSize = 2,
+            .stride = 2,
+        },
+        &lq);
+
+    avgPool1dConfig_t *cfg = layer->config->avgPool1d;
+    TEST_ASSERT_NOT_EQUAL(q, cfg->forwardQ);
+    TEST_ASSERT_NOT_EQUAL(q, cfg->propLossQ);
+    TEST_ASSERT_TRUE(cfg->ownsQuantizations);
+
+    freeAvgPool1dLayer(layer);
+    freeQuantization(q);
+}
+
+int main(void) {
+    UNITY_BEGIN();
+    RUN_TEST(testMaxPool1dLayerInitBorrowingBuildsLayerWithKernelAndArgmax);
+    RUN_TEST(testMaxPool1dLayerInitBorrowingStrideDefaultsToKernelSize);
+    RUN_TEST(testMaxPool1dLayerInitOwningDeepCopiesTwoQuantizations);
+    RUN_TEST(testMaxPool1dLayerInitOwningRepeatedBuildFreeNoLeak);
+    RUN_TEST(testAvgPool1dLayerInitBorrowingBuildsLayerWithKernel);
+    RUN_TEST(testAvgPool1dLayerInitBorrowingStrideDefaultsToKernelSize);
+    RUN_TEST(testAvgPool1dLayerInitOwningDeepCopiesTwoQuantizations);
+    return UNITY_END();
+}

From 1bbf2b467e806c2e76e169c6da84ff06a6b6ef4b Mon Sep 17 00:00:00 2001
From: Leo Buron <leo.buron@uni-due.de>
Date: Fri, 15 May 2026 22:03:42 +0200
Subject: [PATCH 2/4] feat(layer): implement softmaxLayerInit Borrowing +
 Owning + new freeSoftmaxLayer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Factory takes layerQuant_t* and stores .forwardMath / .backwardMath as
the layer's forward/backward quantizations. Owning deep-copies both
via deepCopyQuantization. freeSoftmaxLayer reads ownsQuantizations to
decide whether to also tear down the two quantizations and qConfigs.
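
A minimal sketch of this Borrowing/Owning split. It uses hypothetical stand-ins in place of the real quantization_t, softmaxConfig_t, and deepCopyQuantization: the types quant_t and softmaxCfg_t, and a helper deepCopyQuant. It also assumes a quantization owns exactly one heap-allocated qConfig.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for quantization_t: owns one heap-allocated qConfig. */
typedef struct {
    int type;
    int *qConfig;
} quant_t;

/* Hypothetical stand-in for softmaxConfig_t. */
typedef struct {
    quant_t *forwardQ;
    quant_t *backwardQ;
    bool ownsQuantizations;
} softmaxCfg_t;

/* Deep copy: fresh struct plus fresh qConfig (stand-in for deepCopyQuantization). */
static quant_t *deepCopyQuant(const quant_t *src) {
    quant_t *dst = malloc(sizeof *dst);
    assert(dst != NULL);
    dst->type = src->type;
    dst->qConfig = malloc(sizeof *dst->qConfig);
    assert(dst->qConfig != NULL);
    *dst->qConfig = *src->qConfig;
    return dst;
}

/* Borrowing: store the caller's pointers; calloc leaves ownsQuantizations false. */
static softmaxCfg_t *softmaxInitBorrowing(quant_t *fwd, quant_t *bwd) {
    softmaxCfg_t *cfg = calloc(1, sizeof *cfg);
    assert(cfg != NULL);
    cfg->forwardQ = fwd;
    cfg->backwardQ = bwd;
    return cfg;
}

/* Owning: deep-copy both quantizations and record ownership. */
static softmaxCfg_t *softmaxInitOwning(const quant_t *fwd, const quant_t *bwd) {
    softmaxCfg_t *cfg = calloc(1, sizeof *cfg);
    assert(cfg != NULL);
    cfg->forwardQ = deepCopyQuant(fwd);
    cfg->backwardQ = deepCopyQuant(bwd);
    cfg->ownsQuantizations = true;
    return cfg;
}

/* Tear down the quantizations (and their qConfigs) only if the config owns them. */
static void freeSoftmaxCfg(softmaxCfg_t *cfg) {
    if (cfg->ownsQuantizations) {
        free(cfg->forwardQ->qConfig);
        free(cfg->forwardQ);
        free(cfg->backwardQ->qConfig);
        free(cfg->backwardQ);
    }
    free(cfg);
}
```

With Borrowing, the caller stays responsible for the quantizations. With Owning, the caller can free its originals right after the factory returns.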

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5

feat(layer): add ownsQuantizations flag to five internal layer configs

Mirrors the field PR 1 added to linearConfig_t and reluConfig_t.
Foundation for the new factory API in subsequent commits — each new
*LayerInitOwning sets the flag to true and the canonical free*Layer
branches on it. Because configs are allocated with calloc, the flag
defaults to false, which preserves the existing borrowing semantics for
legacy callers.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.3

feat(layer): implement conv1dLayerInit Borrowing variant + freeConv1dLayer

Factory allocates kernel, weights, and the optional bias internally:
KAIMING_UNIFORM weights, ZEROS bias (implicit via calloc). Stores
lq->forwardMath in forwardQ and lq->backwardMath in weightGradQ,
biasGradQ, and propLossQ verbatim; sets ownsQuantizations=false.
freeConv1dLayer tears down parameters + kernel unconditionally, and the
quantizations only when ownsQuantizations=true, freeing each distinct
pointer once as a defensive guard against aliasing. Also fixes the
pre-existing layer->config leak that the legacy free* path still has.
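
A sketch of the defensive dedup, using hypothetical names: quant_t, newQuant, and freeDistinctQuants. The pairwise checks in freeConv1dLayer compare slots against pointers that were already freed. This sketch instead collects the distinct pointers first and only then frees them, because reading a freed pointer's value is technically undefined behavior in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for quantization_t: owns one heap-allocated qConfig. */
typedef struct {
    int *qConfig;
} quant_t;

static size_t freedCount; /* number of distinct quant_t released */

static quant_t *newQuant(void) {
    quant_t *q = malloc(sizeof *q);
    assert(q != NULL);
    q->qConfig = malloc(sizeof *q->qConfig);
    assert(q->qConfig != NULL);
    return q;
}

static void releaseQuant(quant_t *q) {
    free(q->qConfig);
    free(q);
    freedCount++;
}

/* Free every distinct non-NULL pointer in slots[0..n) exactly once, even when
 * slots alias each other. All comparisons happen before any free. */
static void freeDistinctQuants(quant_t *slots[], size_t n) {
    quant_t *unique[8];
    size_t uniqueCount = 0;
    assert(n <= 8);
    for (size_t i = 0; i < n; i++) {
        if (slots[i] == NULL) {
            continue;
        }
        size_t j = 0;
        while (j < uniqueCount && unique[j] != slots[i]) {
            j++;
        }
        if (j == uniqueCount) { /* first occurrence of this pointer */
            unique[uniqueCount++] = slots[i];
        }
    }
    for (size_t k = 0; k < uniqueCount; k++) {
        releaseQuant(unique[k]);
    }
}
```

With the Borrowing aliasing pattern (one forward quantization plus one backward quantization shared by three slots), only two objects are released. With four separate Owning copies, all four are released.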

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5.1, 5.3

test(layer): failing tests for conv1dLayerInitOwning

Verifies deep copy of all four quantization_t into fresh allocations,
ownsQuantizations=true, and clean teardown without leaks.
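
"No leaks" can be made assertable without an external tool by using a counting allocator. A sketch with hypothetical names (countingCalloc, countingFree, owningObj_t), assuming the project allocator (reserveMemory / freeReservedMemory) could be routed through such wrappers in a test build:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counting wrappers around calloc/free. */
static long liveAllocations;

static void *countingCalloc(size_t size) {
    void *p = calloc(1, size);
    assert(p != NULL);
    liveAllocations++;
    return p;
}

static void countingFree(void *p) {
    if (p != NULL) {
        liveAllocations--;
        free(p);
    }
}

/* Toy owning object: a struct plus two owned buffers. */
typedef struct {
    int *a;
    int *b;
} owningObj_t;

static owningObj_t *buildObj(void) {
    owningObj_t *o = countingCalloc(sizeof *o);
    o->a = countingCalloc(sizeof *o->a);
    o->b = countingCalloc(sizeof *o->b);
    return o;
}

/* Release the owned buffers before the struct that points to them. */
static void freeObj(owningObj_t *o) {
    countingFree(o->a);
    countingFree(o->b);
    countingFree(o);
}
```

A build/free loop, like the one in testMaxPool1dLayerInitOwningRepeatedBuildFreeNoLeak, can then end with an assertion that the live count is back to zero.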

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2

feat(layer): implement conv1dLayerInitOwning (deep-copy variant)

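Factory deep-copies lq->forwardMath into forwardQ and lq->backwardMath
into each of weightGradQ, biasGradQ, and propLossQ via the shared
deepCopyQuantization helper. Always four separate copies (no aliasing),
keeping freeConv1dLayer simple. Weight and bias storage are already
cloned via getQLike, so the caller can drop lq and all of its
quantizations immediately after the call.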

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2

feat(layer): implement conv1dTransposedLayerInit Borrowing variant + freeConv1dTransposedLayer

Allocates kernel, weights ([inChannels, outChannels/groups, kernelSize]),
optional bias. KAIMING_UNIFORM weight init / ZEROS bias.
Stores lq->forwardMath / lq->backwardMath verbatim in the four
quantization slots. freeConv1dTransposedLayer tears down parameters +
kernel unconditionally and quantizations only when
ownsQuantizations=true.
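
A sketch of the weight-shape rule for both layers. It includes the groups-divisibility checks that allocateConv1dWeights performs, which are assumed to apply to the transposed variant as well. conv1dWeightShape and weightShape_t are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t dims[3];
} weightShape_t;

/* Conv1d:           [outChannels, inChannels / groups, kernelSize]
 * Conv1dTransposed: [inChannels, outChannels / groups, kernelSize]
 * Returns false if groups is 0 or does not divide both channel counts. */
static bool conv1dWeightShape(size_t inC, size_t outC, size_t groups, size_t k,
                              bool transposed, weightShape_t *out) {
    if (groups == 0 || inC % groups != 0 || outC % groups != 0) {
        return false;
    }
    if (transposed) {
        out->dims[0] = inC;
        out->dims[1] = outC / groups;
    } else {
        out->dims[0] = outC;
        out->dims[1] = inC / groups;
    }
    out->dims[2] = k;
    return true;
}

static size_t shapeElements(const weightShape_t *s) {
    return s->dims[0] * s->dims[1] * s->dims[2];
}
```

With the configurations used in UnitTestLayerWeightsApi.c, Conv1d (2 -> 3, K=4) and Conv1dTransposed (4 -> 2, K=3) both yield 24 weight elements. The leading dimension differs between the two.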

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5.1, 5.3

test(layer): failing tests for conv1dTransposedLayerInitOwning

Verifies deep-copy of the four quantization_t into fresh allocations,
ownsQuantizations=true, and clean teardown.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2

feat(layer): implement conv1dTransposedLayerInitOwning (deep-copy variant)

Deep-copies lq->forwardMath once and lq->backwardMath three times via
deepCopyQuantization, one copy per quantization slot. Always four
separate copies (no aliasing). Caller can drop lq and all of its
quantizations immediately after the call.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2

feat(layer): implement maxPool1dLayerInit Borrowing + Owning + freeMaxPool1dLayer

Factory pre-allocates kernel + argmaxIndices INT32 tensor (shape
[1, inputChannels, outputLength]). outputLength derived via
computePool1dOutputLength replicating the geometry rule from
windowGeometry1dCalc. Stride defaults to kernelSize (PyTorch
convention). Owning deep-copies forwardMath + backwardMath into the
config's forwardQ + propLossQ slots.
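
A sketch of the VALID-padding output-length rule, assuming the usual floor formula with dilation. pool1dOutputLengthValid is a hypothetical name. The real computePool1dOutputLength / windowGeometry1dCalc may handle other padding types, or reject too-short inputs, differently.

```c
#include <assert.h>
#include <stddef.h>

/* Output length of a 1-D pooling window under VALID (no) padding:
 *   floor((L - dilation * (K - 1) - 1) / stride) + 1
 * stride == 0 means "not set" and falls back to kernelSize;
 * dilation == 0 falls back to 1.
 * Returns 0 when the dilated window does not fit in the input. */
static size_t pool1dOutputLengthValid(size_t inputLength, size_t kernelSize,
                                      size_t stride, size_t dilation) {
    assert(kernelSize > 0);
    if (stride == 0) {
        stride = kernelSize; /* pooling default: non-overlapping windows */
    }
    if (dilation == 0) {
        dilation = 1;
    }
    size_t span = dilation * (kernelSize - 1) + 1; /* receptive field */
    if (inputLength < span) {
        return 0;
    }
    return (inputLength - span) / stride + 1;
}
```

This reproduces the lengths checked in UnitTestPool1dApi.c: 64 -> 32 for K=2, S=2, and 16 -> 4 for K=4 with the stride omitted.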

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3 (split), 4, 5

feat(layer): implement avgPool1dLayerInit Borrowing + Owning + freeAvgPool1dLayer

Factory pre-allocates kernel only (no argmax). Stride defaults to
kernelSize. Owning deep-copies forwardMath + backwardMath into the
forwardQ + propLossQ slots.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5

test(layer): failing tests for new softmaxLayerInit Borrowing + Owning

Two tests: Borrowing stores lq pointers verbatim with
ownsQuantizations=false; Owning deep-copies them with
ownsQuantizations=true. Fails at link until impl lands.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 4
---
 src/layer/include/AvgPool1d.h                 |   2 +
 src/layer/include/Conv1d.h                    |   2 +
 src/layer/include/Conv1dTransposed.h          |   2 +
 src/layer/include/MaxPool1d.h                 |   2 +
 src/layer/include/Softmax.h                   |   3 +
 src/userApi/layer/CMakeLists.txt              |  21 +-
 src/userApi/layer/Conv1dApi.c                 | 257 ++++++++++++++++-
 src/userApi/layer/Conv1dTransposedApi.c       | 232 ++++++++++++++-
 src/userApi/layer/Pool1dApi.c                 | 269 +++++++++++++++++-
 src/userApi/layer/SoftmaxApi.c                |  83 +++++-
 src/userApi/layer/include/SoftmaxApi.h        |  13 +
 test/unit/layer/CMakeLists.txt                |   3 +
 test/unit/layer/UnitTestSoftmax.c             |  52 ++++
 test/unit/loss_functions/CMakeLists.txt       |   1 +
 test/unit/userAPI/UnitTestConv1dApi.c         |  61 ++++
 .../userAPI/UnitTestConv1dTransposedApi.c     |  50 ++++
 16 files changed, 1039 insertions(+), 14 deletions(-)

diff --git a/src/layer/include/AvgPool1d.h b/src/layer/include/AvgPool1d.h
index d1f2dc0..2f7c160 100644
--- a/src/layer/include/AvgPool1d.h
+++ b/src/layer/include/AvgPool1d.h
@@ -1,6 +1,7 @@
 #ifndef ODT_AVG_POOL_1D_H
 #define ODT_AVG_POOL_1D_H
 
+#include <stdbool.h>
 #include <stdlib.h>
 
 #include "Kernel.h"
@@ -11,6 +12,7 @@ typedef struct avgPool1dConfig {
     kernel_t *kernel;
     quantization_t *forwardQ;
     quantization_t *propLossQ;
+    bool ownsQuantizations;
 } avgPool1dConfig_t;
 
 void initAvgPool1dConfig(avgPool1dConfig_t *cfg, kernel_t *kernel, quantization_t *forwardQ,
diff --git a/src/layer/include/Conv1d.h b/src/layer/include/Conv1d.h
index 693e3b0..6b3c19a 100644
--- a/src/layer/include/Conv1d.h
+++ b/src/layer/include/Conv1d.h
@@ -1,6 +1,7 @@
 #ifndef ODT_CONV1D_H
 #define ODT_CONV1D_H
 
+#include <stdbool.h>
 #include <stdlib.h>
 
 #include "Kernel.h"
@@ -16,6 +17,7 @@ typedef struct conv1dConfig {
     quantization_t *weightGradQ;
     quantization_t *biasGradQ;
     quantization_t *propLossQ;
+    bool ownsQuantizations;
 } conv1dConfig_t;
 
 void initConv1dConfigWithWeightsAndBias(conv1dConfig_t *conv1dConfig, kernel_t *kernel,
diff --git a/src/layer/include/Conv1dTransposed.h b/src/layer/include/Conv1dTransposed.h
index d9ad100..6ab9219 100644
--- a/src/layer/include/Conv1dTransposed.h
+++ b/src/layer/include/Conv1dTransposed.h
@@ -1,6 +1,7 @@
 #ifndef ODT_CONV1D_TRANSPOSED_H
 #define ODT_CONV1D_TRANSPOSED_H
 
+#include <stdbool.h>
 #include <stdlib.h>
 
 #include "Kernel.h"
@@ -17,6 +18,7 @@ typedef struct conv1dTransposedConfig {
     quantization_t *weightGradQ;
     quantization_t *biasGradQ;
     quantization_t *propLossQ;
+    bool ownsQuantizations;
 } conv1dTransposedConfig_t;
 
 void initConv1dTransposedConfigWithWeightsAndBias(
diff --git a/src/layer/include/MaxPool1d.h b/src/layer/include/MaxPool1d.h
index df7dfa2..fb52550 100644
--- a/src/layer/include/MaxPool1d.h
+++ b/src/layer/include/MaxPool1d.h
@@ -1,6 +1,7 @@
 #ifndef ODT_MAX_POOL_1D_H
 #define ODT_MAX_POOL_1D_H
 
+#include <stdbool.h>
 #include <stdlib.h>
 
 #include "Kernel.h"
@@ -12,6 +13,7 @@ typedef struct maxPool1dConfig {
     tensor_t *argmaxIndices; // INT32, shape == output shape; pre-allocated by caller
     quantization_t *forwardQ;
     quantization_t *propLossQ;
+    bool ownsQuantizations;
 } maxPool1dConfig_t;
 
 void initMaxPool1dConfig(maxPool1dConfig_t *cfg, kernel_t *kernel, tensor_t *argmaxIndices,
diff --git a/src/layer/include/Softmax.h b/src/layer/include/Softmax.h
index d3b2dcf..1254c59 100644
--- a/src/layer/include/Softmax.h
+++ b/src/layer/include/Softmax.h
@@ -1,11 +1,14 @@
 #ifndef ENV5_RUNTIME_SOFTMAX_H
 #define ENV5_RUNTIME_SOFTMAX_H
 
+#include <stdbool.h>
+
 #include "Layer.h"
 
 typedef struct softmaxConfig {
     quantization_t *forwardQ;
     quantization_t *backwardQ;
+    bool ownsQuantizations;
 } softmaxConfig_t;
 
 void softmaxInitConfig(softmaxConfig_t *softmaxConfig, quantization_t *forwardQ,
diff --git a/src/userApi/layer/CMakeLists.txt b/src/userApi/layer/CMakeLists.txt
index c24e468..c4367c3 100644
--- a/src/userApi/layer/CMakeLists.txt
+++ b/src/userApi/layer/CMakeLists.txt
@@ -1,12 +1,18 @@
 add_library(Conv1dApi Conv1dApi.c)
 target_include_directories(Conv1dApi PUBLIC include)
 target_link_libraries(Conv1dApi PRIVATE
-        Tensor
-        Rounding
-        Layer
-        Conv1d
         Common
+        Conv1d
+        Distributions
+        Kernel
+        Layer
+        LayerCommon
+        LayerQuant
+        Quantization
+        QuantizationApi
+        Rounding
         StorageApi
+        Tensor
         TensorApi
 )
 
@@ -42,11 +48,14 @@ target_link_libraries(ReluApi PRIVATE
 add_library(SoftmaxApi SoftmaxApi.c)
 target_include_directories(SoftmaxApi PUBLIC include)
 target_link_libraries(SoftmaxApi PRIVATE
+        Common
         Layer
-        Tensor
+        LayerQuant
+        Quantization
         Rounding
-        Common
+        Softmax
         StorageApi
+        Tensor
 )
 
 add_library(FlattenApi FlattenApi.c)
diff --git a/src/userApi/layer/Conv1dApi.c b/src/userApi/layer/Conv1dApi.c
index 22a3d3a..e6c23ba 100644
--- a/src/userApi/layer/Conv1dApi.c
+++ b/src/userApi/layer/Conv1dApi.c
@@ -1,12 +1,24 @@
 #define SOURCE_FILE "CONV1D_API"
 
-#include "Conv1dApi.h"
+#include <stdbool.h>
+#include <stdlib.h>
+
+#include "Common.h"
 #include "Conv1d.h"
+#include "Conv1dApi.h"
+#include "Distributions.h"
+#include "Kernel.h"
 #include "Layer.h"
+#include "LayerCommon.h"
+#include "LayerQuant.h"
+#include "QuantizationApi.h"
 #include "StorageApi.h"
+#include "Tensor.h"
 #include "TensorApi.h"
 
-#include <stdio.h>
+/* ============================================================================
+ * Legacy factory (renamed to *Legacy to coexist with the new API).
+ * ========================================================================== */
 
 layer_t *conv1dLayerInitLegacy(parameter_t *weights, parameter_t *bias, kernel_t *kernel,
                                quantization_t *forwardQ, quantization_t *weightGradQ,
@@ -41,3 +53,244 @@ void freeConv1dLayerLegacy(layer_t *conv1dLayer) {
     freeReservedMemory(conv1dConfig);
     freeReservedMemory(conv1dLayer);
 }
+
+/* ============================================================================
+ * New factory API — conv1dInit_t struct + layerQuant_t profile (PR 2).
+ * ========================================================================== */
+
+static bool resolveConv1dBias(bias_t b) {
+    switch (b) {
+    case BIAS_DEFAULT:
+        return true; /* PyTorch parity for Conv1d */
+    case BIAS_TRUE:
+        return true;
+    case BIAS_FALSE:
+        return false;
+    default:
+        PRINT_ERROR("conv1dLayerInit: invalid bias value (got %d)", (int)b);
+        exit(1);
+    }
+}
+
+/*! Build a heap-owned shape_t with the given dims; the tensor that this shape
+ *  is passed to takes ownership and freeTensor cascades into freeShape. */
+static shape_t *buildOwnedShape(const size_t *srcDims, size_t numberOfDims) {
+    size_t *dims = reserveMemory(numberOfDims * sizeof(size_t));
+    for (size_t i = 0; i < numberOfDims; i++) {
+        dims[i] = srcDims[i];
+    }
+    size_t *order = reserveMemory(numberOfDims * sizeof(size_t));
+    setOrderOfDimsForNewTensor(numberOfDims, order);
+    shape_t *shape = reserveMemory(sizeof(shape_t));
+    setShape(shape, dims, numberOfDims, order);
+    return shape;
+}
+
+static parameter_t *allocateConv1dWeights(size_t outChannels, size_t inChannels, size_t groups,
+                                          size_t kernelSize, quantization_t *storageQ) {
+    /* Conv1d weight shape: [outChannels, inChannels/groups, kernelSize].
+     * Per Conv1d.h:11. */
+    if (inChannels % groups != 0) {
+        PRINT_ERROR("conv1dLayerInit: inChannels (%zu) must be divisible by groups (%zu)",
+                    inChannels, groups);
+        exit(1);
+    }
+    if (outChannels % groups != 0) {
+        PRINT_ERROR("conv1dLayerInit: outChannels (%zu) must be divisible by groups (%zu)",
+                    outChannels, groups);
+        exit(1);
+    }
+    size_t inPerGroup = inChannels / groups;
+
+    shape_t *shape = buildOwnedShape((size_t[]){outChannels, inPerGroup, kernelSize}, 3);
+    tensor_t *paramTensor = initTensor(shape, getQLike(storageQ), NULL);
+
+    /* PyTorch-aligned default: Kaiming uniform with fan_in mode.
+     * Note: PyTorch's actual default uses a=sqrt(5); bit-identical parity
+     * requires Issue C (distribution parametrization). */
+    if (storageQ->type != FLOAT32) {
+        PRINT_ERROR("conv1dLayerInit: KAIMING_UNIFORM init currently requires FLOAT32 "
+                    "weight storage (Issue C will lift this limit)");
+        exit(1);
+    }
+    distribution_t dist = {
+        .type = KAIMING_UNIFORM,
+        .params.kaiming = {.gain = 1.4142135623730951f /* sqrtf(2.0f) */,
+                           .fanMode = inPerGroup * kernelSize},
+    };
+    initDistribution(paramTensor, &dist);
+
+    tensor_t *gradTensor = gradInitFloat(paramTensor, NULL);
+    return parameterInit(paramTensor, gradTensor);
+}
+
+static parameter_t *allocateConv1dBias(size_t outChannels, quantization_t *storageQ) {
+    /* Bias tensor: shape [outChannels]. Zero-initialized via calloc (reserveMemory). */
+    shape_t *shape = buildOwnedShape((size_t[]){outChannels}, 1);
+    tensor_t *paramTensor = initTensor(shape, getQLike(storageQ), NULL);
+    /* No initDistribution(ZEROS) — calloc already gave us zeros. */
+
+    tensor_t *gradTensor = gradInitFloat(paramTensor, NULL);
+    return parameterInit(paramTensor, gradTensor);
+}
+
+static void validateConv1dInit(conv1dInit_t *init) {
+    if (init == NULL) {
+        PRINT_ERROR("conv1dLayerInit: init pointer is NULL");
+        exit(1);
+    }
+    if (init->inChannels == 0) {
+        PRINT_ERROR("conv1dLayerInit: inChannels must be > 0");
+        exit(1);
+    }
+    if (init->outChannels == 0) {
+        PRINT_ERROR("conv1dLayerInit: outChannels must be > 0");
+        exit(1);
+    }
+    if (init->kernelSize == 0) {
+        PRINT_ERROR("conv1dLayerInit: kernelSize must be > 0");
+        exit(1);
+    }
+}
+
+static void validateLayerQuantForConv1d(layerQuant_t *lq, bool hasBias) {
+    if (lq == NULL) {
+        PRINT_ERROR("conv1dLayerInit: lq pointer is NULL");
+        exit(1);
+    }
+    if (lq->forwardMath == NULL) {
+        PRINT_ERROR("conv1dLayerInit: layerQuant.forwardMath must be set");
+        exit(1);
+    }
+    if (lq->backwardMath == NULL) {
+        PRINT_ERROR("conv1dLayerInit: layerQuant.backwardMath must be set");
+        exit(1);
+    }
+    if (lq->weightStorage == NULL) {
+        PRINT_ERROR("conv1dLayerInit: layerQuant.weightStorage must be set");
+        exit(1);
+    }
+    if (hasBias && lq->biasStorage == NULL) {
+        PRINT_ERROR("conv1dLayerInit: layerQuant.biasStorage must be set when bias is enabled");
+        exit(1);
+    }
+}
+
+/*! Build a heap-owned kernel_t from the conv1dInit_t fields, applying
+ *  zero-init defaults (stride=1, dilation=1, padding=VALID). */
+static kernel_t *buildConv1dKernel(conv1dInit_t *init) {
+    kernel_t *kernel = reserveMemory(sizeof(kernel_t));
+    size_t stride = init->stride == 0 ? 1 : init->stride;
+    size_t dilation = init->dilation == 0 ? 1 : init->dilation;
+    initKernel(kernel, init->kernelSize, init->padding, dilation, stride);
+    return kernel;
+}
+
+layer_t *conv1dLayerInit(conv1dInit_t *init, layerQuant_t *lq) {
+    validateConv1dInit(init);
+    bool hasBias = resolveConv1dBias(init->bias);
+    validateLayerQuantForConv1d(lq, hasBias);
+
+    size_t groups = init->groups == 0 ? 1 : init->groups;
+
+    layer_t *layer = reserveMemory(sizeof(layer_t));
+    layer->type = CONV1D;
+
+    layerConfig_t *layerCfg = reserveMemory(sizeof(layerConfig_t));
+    conv1dConfig_t *cfg = reserveMemory(sizeof(conv1dConfig_t));
+    layerCfg->conv1d = cfg;
+    layer->config = layerCfg;
+
+    cfg->kernel = buildConv1dKernel(init);
+    cfg->weights = allocateConv1dWeights(init->outChannels, init->inChannels, groups,
+                                         init->kernelSize, lq->weightStorage);
+    cfg->bias = hasBias ? allocateConv1dBias(init->outChannels, lq->biasStorage) : NULL;
+    cfg->groups = groups;
+    cfg->forwardQ = lq->forwardMath;
+    cfg->weightGradQ = lq->backwardMath;
+    cfg->biasGradQ = lq->backwardMath;
+    cfg->propLossQ = lq->backwardMath;
+    cfg->ownsQuantizations = false;
+
+    return layer;
+}
+
+layer_t *conv1dLayerInitOwning(conv1dInit_t *init, layerQuant_t *lq) {
+    validateConv1dInit(init);
+    bool hasBias = resolveConv1dBias(init->bias);
+    validateLayerQuantForConv1d(lq, hasBias);
+
+    size_t groups = init->groups == 0 ? 1 : init->groups;
+
+    layer_t *layer = reserveMemory(sizeof(layer_t));
+    layer->type = CONV1D;
+
+    layerConfig_t *layerCfg = reserveMemory(sizeof(layerConfig_t));
+    conv1dConfig_t *cfg = reserveMemory(sizeof(conv1dConfig_t));
+    layerCfg->conv1d = cfg;
+    layer->config = layerCfg;
+
+    cfg->kernel = buildConv1dKernel(init);
+    /* allocateConv1dWeights / allocateConv1dBias internally clone via getQLike,
+     * so the parameter tensors own their quantization_t — caller can drop
+     * lq->weightStorage / lq->biasStorage immediately. */
+    cfg->weights = allocateConv1dWeights(init->outChannels, init->inChannels, groups,
+                                         init->kernelSize, lq->weightStorage);
+    cfg->bias = hasBias ? allocateConv1dBias(init->outChannels, lq->biasStorage) : NULL;
+    cfg->groups = groups;
+
+    /* Owning: deep-copy each of the four math quantizations. Always four
+     * separate copies (no aliasing), keeping freeConv1dLayer simple. */
+    cfg->forwardQ = deepCopyQuantization(lq->forwardMath);
+    cfg->weightGradQ = deepCopyQuantization(lq->backwardMath);
+    cfg->biasGradQ = deepCopyQuantization(lq->backwardMath);
+    cfg->propLossQ = deepCopyQuantization(lq->backwardMath);
+    cfg->ownsQuantizations = true;
+
+    return layer;
+}
+
+void freeConv1dLayer(layer_t *conv1dLayer) {
+    if (conv1dLayer == NULL) {
+        return;
+    }
+    conv1dConfig_t *cfg = conv1dLayer->config->conv1d;
+
+    /* Always factory-owned: parameters + kernel. */
+    if (cfg->weights != NULL) {
+        freeParameter(cfg->weights);
+    }
+    if (cfg->bias != NULL) {
+        freeParameter(cfg->bias);
+    }
+    freeReservedMemory(cfg->kernel);
+
+    /* Conditionally factory-owned: quantizations (Owning variant only).
+     * Defensive dedup: conv1dLayerInitOwning allocates four separate
+     * copies (no aliasing), so the dedup is a no-op today but guards
+     * against a double free if the copies are ever shared. */
+    if (cfg->ownsQuantizations) {
+        if (cfg->forwardQ != NULL) {
+            freeReservedMemory(cfg->forwardQ->qConfig);
+            freeReservedMemory(cfg->forwardQ);
+        }
+        if (cfg->weightGradQ != NULL && cfg->weightGradQ != cfg->forwardQ) {
+            freeReservedMemory(cfg->weightGradQ->qConfig);
+            freeReservedMemory(cfg->weightGradQ);
+        }
+        if (cfg->biasGradQ != NULL && cfg->biasGradQ != cfg->forwardQ &&
+            cfg->biasGradQ != cfg->weightGradQ) {
+            freeReservedMemory(cfg->biasGradQ->qConfig);
+            freeReservedMemory(cfg->biasGradQ);
+        }
+        if (cfg->propLossQ != NULL && cfg->propLossQ != cfg->forwardQ &&
+            cfg->propLossQ != cfg->weightGradQ && cfg->propLossQ != cfg->biasGradQ) {
+            freeReservedMemory(cfg->propLossQ->qConfig);
+            freeReservedMemory(cfg->propLossQ);
+        }
+    }
+
+    freeReservedMemory(cfg);
+    freeReservedMemory(conv1dLayer->config);
+    freeReservedMemory(conv1dLayer);
+}
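Every `free*Layer` in this patch follows the same contract: parameters and kernel are always freed, while the math quantizations are freed only when `ownsQuantizations` is set, and each distinct pointer exactly once. The sketch below isolates that dedup step outside the library under a simplified model. `quant_t`, `mathSlots_t`, `newQuant` and `freeOwnedSlots` are hypothetical stand-ins for `quantization_t`, the four `*Q` fields of `conv1dConfig_t` and the teardown code, and plain `malloc`/`free` replace `reserveMemory`/`freeReservedMemory`.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for quantization_t: a header plus a separately
 * allocated qConfig, so each owned quantization needs two frees. */
typedef struct {
    int type;
    void *qConfig;
} quant_t;

/* Hypothetical stand-in for the four math slots of conv1dConfig_t. */
typedef struct {
    quant_t *slots[4]; /* forwardQ, weightGradQ, biasGradQ, propLossQ */
    bool ownsQuantizations;
} mathSlots_t;

/* Allocates a quant_t with a dummy qConfig (allocation failures ignored). */
static quant_t *newQuant(int type) {
    quant_t *q = malloc(sizeof(quant_t));
    q->type = type;
    q->qConfig = malloc(16);
    return q;
}

/* Frees every distinct, non-NULL owned slot exactly once and returns the
 * number of quantizations freed. Borrowed slots are left to the caller. */
static int freeOwnedSlots(mathSlots_t *m) {
    if (!m->ownsQuantizations) {
        return 0;
    }
    /* Dedup first, so no pointer comparison happens after a free. */
    quant_t *distinct[4];
    int count = 0;
    for (int i = 0; i < 4; i++) {
        quant_t *q = m->slots[i];
        bool skip = (q == NULL);
        for (int j = 0; j < count && !skip; j++) {
            skip = (distinct[j] == q);
        }
        if (!skip) {
            distinct[count++] = q;
        }
    }
    for (int i = 0; i < count; i++) {
        free(distinct[i]->qConfig);
        free(distinct[i]);
    }
    return count;
}
```

Unlike the chained `!=` checks in `freeConv1dLayer`, the sketch collects the distinct pointers before freeing anything, so no comparison reads a pointer value that has already been passed to `free` (a value the C standard treats as indeterminate).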
diff --git a/src/userApi/layer/Conv1dTransposedApi.c b/src/userApi/layer/Conv1dTransposedApi.c
index 589351c..2a8b869 100644
--- a/src/userApi/layer/Conv1dTransposedApi.c
+++ b/src/userApi/layer/Conv1dTransposedApi.c
@@ -1,8 +1,232 @@
 #define SOURCE_FILE "CONV1D_TRANSPOSED_API"
 
-/* Stub. Full implementation lands in Task 12. This file exists so
- * Conv1dTransposedApi compiles as a library target for the CMake graph
- * to discover; the headers above declare the functions but they will
- * link-fail until Task 12 fills them in. */
+#include <stdbool.h>
+#include <stdlib.h>
 
+#include "Common.h"
+#include "Conv1dTransposed.h"
 #include "Conv1dTransposedApi.h"
+#include "Distributions.h"
+#include "Kernel.h"
+#include "Layer.h"
+#include "LayerCommon.h"
+#include "LayerQuant.h"
+#include "QuantizationApi.h"
+#include "StorageApi.h"
+#include "Tensor.h"
+#include "TensorApi.h"
+
+static bool resolveConv1dTransposedBias(bias_t b) {
+    switch (b) {
+    case BIAS_DEFAULT:
+        return true;
+    case BIAS_TRUE:
+        return true;
+    case BIAS_FALSE:
+        return false;
+    default:
+        PRINT_ERROR("conv1dTransposedLayerInit: invalid bias value (got %d)", (int)b);
+        exit(1);
+    }
+}
+
+static shape_t *buildOwnedShape(const size_t *srcDims, size_t numberOfDims) {
+    size_t *dims = reserveMemory(numberOfDims * sizeof(size_t));
+    for (size_t i = 0; i < numberOfDims; i++) {
+        dims[i] = srcDims[i];
+    }
+    size_t *order = reserveMemory(numberOfDims * sizeof(size_t));
+    setOrderOfDimsForNewTensor(numberOfDims, order);
+    shape_t *shape = reserveMemory(sizeof(shape_t));
+    setShape(shape, dims, numberOfDims, order);
+    return shape;
+}
+
+static parameter_t *allocateConv1dTransposedWeights(size_t inChannels, size_t outChannels,
+                                                    size_t groups, size_t kernelSize,
+                                                    quantization_t *storageQ) {
+    /* Conv1dTransposed weight shape: [inChannels, outChannels/groups, kernelSize].
+     * Note SWAP relative to Conv1d. Per Conv1dTransposed.h:12. */
+    if (outChannels % groups != 0) {
+        PRINT_ERROR("conv1dTransposedLayerInit: outChannels (%zu) must be divisible by "
+                    "groups (%zu)",
+                    outChannels, groups);
+        exit(1);
+    }
+    if (inChannels % groups != 0) {
+        PRINT_ERROR("conv1dTransposedLayerInit: inChannels (%zu) must be divisible by "
+                    "groups (%zu)",
+                    inChannels, groups);
+        exit(1);
+    }
+    size_t outPerGroup = outChannels / groups;
+
+    shape_t *shape = buildOwnedShape((size_t[]){inChannels, outPerGroup, kernelSize}, 3);
+    tensor_t *paramTensor = initTensor(shape, getQLike(storageQ), NULL);
+
+    if (storageQ->type != FLOAT32) {
+        PRINT_ERROR("conv1dTransposedLayerInit: KAIMING_UNIFORM init currently requires FLOAT32 "
+                    "weight storage (Issue C will lift this limit)");
+        exit(1);
+    }
+    distribution_t dist = {
+        .type = KAIMING_UNIFORM,
+        .params.kaiming = {.gain = 1.4142135623730951f, .fanMode = outPerGroup * kernelSize},
+    };
+    initDistribution(paramTensor, &dist);
+
+    tensor_t *gradTensor = gradInitFloat(paramTensor, NULL);
+    return parameterInit(paramTensor, gradTensor);
+}
+
+static parameter_t *allocateConv1dTransposedBias(size_t outChannels, quantization_t *storageQ) {
+    shape_t *shape = buildOwnedShape((size_t[]){outChannels}, 1);
+    tensor_t *paramTensor = initTensor(shape, getQLike(storageQ), NULL);
+    tensor_t *gradTensor = gradInitFloat(paramTensor, NULL);
+    return parameterInit(paramTensor, gradTensor);
+}
+
+static void validateConv1dTransposedInit(conv1dTransposedInit_t *init) {
+    if (init == NULL) {
+        PRINT_ERROR("conv1dTransposedLayerInit: init pointer is NULL");
+        exit(1);
+    }
+    if (init->inChannels == 0) {
+        PRINT_ERROR("conv1dTransposedLayerInit: inChannels must be > 0");
+        exit(1);
+    }
+    if (init->outChannels == 0) {
+        PRINT_ERROR("conv1dTransposedLayerInit: outChannels must be > 0");
+        exit(1);
+    }
+    if (init->kernelSize == 0) {
+        PRINT_ERROR("conv1dTransposedLayerInit: kernelSize must be > 0");
+        exit(1);
+    }
+}
+
+static void validateLayerQuantForConv1dTransposed(layerQuant_t *lq, bool hasBias) {
+    if (lq == NULL) {
+        PRINT_ERROR("conv1dTransposedLayerInit: lq pointer is NULL");
+        exit(1);
+    }
+    if (lq->forwardMath == NULL) {
+        PRINT_ERROR("conv1dTransposedLayerInit: layerQuant.forwardMath must be set");
+        exit(1);
+    }
+    if (lq->backwardMath == NULL) {
+        PRINT_ERROR("conv1dTransposedLayerInit: layerQuant.backwardMath must be set");
+        exit(1);
+    }
+    if (lq->weightStorage == NULL) {
+        PRINT_ERROR("conv1dTransposedLayerInit: layerQuant.weightStorage must be set");
+        exit(1);
+    }
+    if (hasBias && lq->biasStorage == NULL) {
+        PRINT_ERROR("conv1dTransposedLayerInit: layerQuant.biasStorage must be set when bias "
+                    "is enabled");
+        exit(1);
+    }
+}
+
+static kernel_t *buildConv1dTransposedKernel(conv1dTransposedInit_t *init) {
+    kernel_t *kernel = reserveMemory(sizeof(kernel_t));
+    size_t stride = init->stride == 0 ? 1 : init->stride;
+    size_t dilation = init->dilation == 0 ? 1 : init->dilation;
+    initKernel(kernel, init->kernelSize, init->padding, dilation, stride);
+    return kernel;
+}
+
+static layer_t *buildConv1dTransposedLayerSkeleton(conv1dTransposedInit_t *init, layerQuant_t *lq,
+                                                   bool hasBias, size_t groups) {
+    layer_t *layer = reserveMemory(sizeof(layer_t));
+    layer->type = CONV1D_TRANSPOSED;
+
+    layerConfig_t *layerCfg = reserveMemory(sizeof(layerConfig_t));
+    conv1dTransposedConfig_t *cfg = reserveMemory(sizeof(conv1dTransposedConfig_t));
+    layerCfg->conv1dTransposed = cfg;
+    layer->config = layerCfg;
+
+    cfg->kernel = buildConv1dTransposedKernel(init);
+    cfg->weights = allocateConv1dTransposedWeights(init->inChannels, init->outChannels, groups,
+                                                   init->kernelSize, lq->weightStorage);
+    cfg->bias = hasBias ? allocateConv1dTransposedBias(init->outChannels, lq->biasStorage) : NULL;
+    cfg->groups = groups;
+    cfg->outputPadding = init->outputPadding;
+    return layer;
+}
+
+layer_t *conv1dTransposedLayerInit(conv1dTransposedInit_t *init, layerQuant_t *lq) {
+    validateConv1dTransposedInit(init);
+    bool hasBias = resolveConv1dTransposedBias(init->bias);
+    validateLayerQuantForConv1dTransposed(lq, hasBias);
+
+    size_t groups = init->groups == 0 ? 1 : init->groups;
+
+    layer_t *layer = buildConv1dTransposedLayerSkeleton(init, lq, hasBias, groups);
+    conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed;
+    cfg->forwardQ = lq->forwardMath;
+    cfg->weightGradQ = lq->backwardMath;
+    cfg->biasGradQ = lq->backwardMath;
+    cfg->propLossQ = lq->backwardMath;
+    cfg->ownsQuantizations = false;
+    return layer;
+}
+
+layer_t *conv1dTransposedLayerInitOwning(conv1dTransposedInit_t *init, layerQuant_t *lq) {
+    validateConv1dTransposedInit(init);
+    bool hasBias = resolveConv1dTransposedBias(init->bias);
+    validateLayerQuantForConv1dTransposed(lq, hasBias);
+
+    size_t groups = init->groups == 0 ? 1 : init->groups;
+
+    layer_t *layer = buildConv1dTransposedLayerSkeleton(init, lq, hasBias, groups);
+    conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed;
+
+    cfg->forwardQ = deepCopyQuantization(lq->forwardMath);
+    cfg->weightGradQ = deepCopyQuantization(lq->backwardMath);
+    cfg->biasGradQ = deepCopyQuantization(lq->backwardMath);
+    cfg->propLossQ = deepCopyQuantization(lq->backwardMath);
+    cfg->ownsQuantizations = true;
+    return layer;
+}
+
+void freeConv1dTransposedLayer(layer_t *layer) {
+    if (layer == NULL) {
+        return;
+    }
+    conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed;
+
+    if (cfg->weights != NULL) {
+        freeParameter(cfg->weights);
+    }
+    if (cfg->bias != NULL) {
+        freeParameter(cfg->bias);
+    }
+    freeReservedMemory(cfg->kernel);
+
+    if (cfg->ownsQuantizations) {
+        if (cfg->forwardQ != NULL) {
+            freeReservedMemory(cfg->forwardQ->qConfig);
+            freeReservedMemory(cfg->forwardQ);
+        }
+        if (cfg->weightGradQ != NULL && cfg->weightGradQ != cfg->forwardQ) {
+            freeReservedMemory(cfg->weightGradQ->qConfig);
+            freeReservedMemory(cfg->weightGradQ);
+        }
+        if (cfg->biasGradQ != NULL && cfg->biasGradQ != cfg->forwardQ &&
+            cfg->biasGradQ != cfg->weightGradQ) {
+            freeReservedMemory(cfg->biasGradQ->qConfig);
+            freeReservedMemory(cfg->biasGradQ);
+        }
+        if (cfg->propLossQ != NULL && cfg->propLossQ != cfg->forwardQ &&
+            cfg->propLossQ != cfg->weightGradQ && cfg->propLossQ != cfg->biasGradQ) {
+            freeReservedMemory(cfg->propLossQ->qConfig);
+            freeReservedMemory(cfg->propLossQ);
+        }
+    }
+
+    freeReservedMemory(cfg);
+    freeReservedMemory(layer->config);
+    freeReservedMemory(layer);
+}
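The weight layout and the Kaiming fan in `allocateConv1dTransposedWeights` are easy to get backwards, so the sketch below computes both for either layer type. It assumes Conv1d stores `[outChannels, inChannels/groups, kernelSize]` (the argument order of `allocateConv1dWeights` suggests this, but its body is not part of this hunk). `weightLayout_t` and `conv1dWeightLayout` are hypothetical. The fan is `dims[1] * dims[2]` in both cases, which for the transposed layer is the `outPerGroup * kernelSize` value passed as `.fanMode`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper type: a 3-D weight shape plus the fan handed to Kaiming init. */
typedef struct {
    size_t dims[3];
    size_t fan; /* dims[1] * dims[2] */
    bool ok;    /* false if a channel count is not divisible by groups */
} weightLayout_t;

/* Conv1d (assumed):  [outChannels, inChannels / groups, kernelSize]
 * Conv1dTransposed:  [inChannels, outChannels / groups, kernelSize] (dims 0/1 swapped) */
static weightLayout_t conv1dWeightLayout(bool transposed, size_t inChannels,
                                         size_t outChannels, size_t groups,
                                         size_t kernelSize) {
    weightLayout_t w = {{0, 0, 0}, 0, false};
    if (groups == 0) {
        groups = 1; /* zero-init default, as in the factories */
    }
    /* Reject channel counts that groups does not divide (the transposed factory exits here). */
    if (inChannels % groups != 0 || outChannels % groups != 0) {
        return w;
    }
    w.dims[0] = transposed ? inChannels : outChannels;
    w.dims[1] = (transposed ? outChannels : inChannels) / groups;
    w.dims[2] = kernelSize;
    /* For the transposed layer this equals outPerGroup * kernelSize. */
    w.fan = w.dims[1] * w.dims[2];
    w.ok = true;
    return w;
}
```

For the shapes used in the Owning tests, `conv1dWeightLayout(true, 8, 4, 1, 3)` yields `[8, 4, 3]` with fan 12, while `conv1dWeightLayout(false, 3, 4, 1, 5)` yields `[4, 3, 5]` with fan 15.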
diff --git a/src/userApi/layer/Pool1dApi.c b/src/userApi/layer/Pool1dApi.c
index d743e54..0fc811c 100644
--- a/src/userApi/layer/Pool1dApi.c
+++ b/src/userApi/layer/Pool1dApi.c
@@ -1,5 +1,272 @@
 #define SOURCE_FILE "POOL1D_API"
 
-/* Stub. Full implementation lands in Tasks 15 and 16. */
+#include <stdbool.h>
+#include <stdlib.h>
 
+#include "AvgPool1d.h"
+#include "Common.h"
+#include "Kernel.h"
+#include "Layer.h"
+#include "LayerQuant.h"
+#include "MaxPool1d.h"
 #include "Pool1dApi.h"
+#include "QuantizationApi.h"
+#include "StorageApi.h"
+#include "Tensor.h"
+#include "TensorApi.h"
+
+/* ============================================================================
+ * Shared helpers
+ * ========================================================================== */
+
+/*! Compute output length per the geometry rule used by the internal
+ *  windowGeometry1dCalc, replicated here for factory pre-allocation
+ *  without bringing in SlidingWindow1d.
+ *
+ *  VALID: outputLength = (inputLength - dilation*(kernelSize - 1) - 1) / stride + 1
+ *  SAME:  outputLength = ceil(inputLength / stride)
+ *
+ *  Matches the runtime windowGeometry1dCalc result used by both pool
+ *  layers' forward paths. */
+static size_t computePool1dOutputLength(paddingType_t padding, size_t inputLength,
+                                        size_t kernelSize, size_t dilation, size_t stride) {
+    if (padding == SAME) {
+        return (inputLength + stride - 1) / stride;
+    }
+    /* VALID */
+    size_t effectiveK = dilation * (kernelSize - 1) + 1;
+    if (effectiveK > inputLength) {
+        PRINT_ERROR("Pool1d: effective kernel %zu exceeds inputLength %zu", effectiveK,
+                    inputLength);
+        exit(1);
+    }
+    return (inputLength - effectiveK) / stride + 1;
+}
+
+/* ============================================================================
+ * MaxPool1d
+ * ========================================================================== */
+
+static void validateMaxPool1dInit(maxPool1dInit_t *init) {
+    if (init == NULL) {
+        PRINT_ERROR("maxPool1dLayerInit: init pointer is NULL");
+        exit(1);
+    }
+    if (init->kernelSize == 0) {
+        PRINT_ERROR("maxPool1dLayerInit: kernelSize must be > 0");
+        exit(1);
+    }
+    if (init->inputChannels == 0) {
+        PRINT_ERROR("maxPool1dLayerInit: inputChannels must be > 0");
+        exit(1);
+    }
+    if (init->inputLength == 0) {
+        PRINT_ERROR("maxPool1dLayerInit: inputLength must be > 0");
+        exit(1);
+    }
+}
+
+static void validateLayerQuantForMaxPool1d(layerQuant_t *lq) {
+    if (lq == NULL) {
+        PRINT_ERROR("maxPool1dLayerInit: lq pointer is NULL");
+        exit(1);
+    }
+    if (lq->forwardMath == NULL) {
+        PRINT_ERROR("maxPool1dLayerInit: layerQuant.forwardMath must be set");
+        exit(1);
+    }
+    if (lq->backwardMath == NULL) {
+        PRINT_ERROR("maxPool1dLayerInit: layerQuant.backwardMath must be set");
+        exit(1);
+    }
+}
+
+static shape_t *buildOwnedShape(const size_t *srcDims, size_t numberOfDims) {
+    size_t *dims = reserveMemory(numberOfDims * sizeof(size_t));
+    for (size_t i = 0; i < numberOfDims; i++) {
+        dims[i] = srcDims[i];
+    }
+    size_t *order = reserveMemory(numberOfDims * sizeof(size_t));
+    setOrderOfDimsForNewTensor(numberOfDims, order);
+    shape_t *shape = reserveMemory(sizeof(shape_t));
+    setShape(shape, dims, numberOfDims, order);
+    return shape;
+}
+
+static tensor_t *buildMaxPool1dArgmax(size_t inputChannels, size_t outputLength) {
+    /* Argmax buffer is sized for batch=1 (training_batch iterates microbatch-
+     * by-microbatch in this framework). Shape: [1, inputChannels, outputLength]. */
+    shape_t *shape = buildOwnedShape((size_t[]){1, inputChannels, outputLength}, 3);
+    quantization_t *q = quantizationInitInt32();
+    return initTensor(shape, q, NULL);
+}
+
+static layer_t *buildMaxPool1dLayerSkeleton(maxPool1dInit_t *init) {
+    size_t stride = init->stride == 0 ? init->kernelSize : init->stride;
+    size_t dilation = init->dilation == 0 ? 1 : init->dilation;
+
+    kernel_t *kernel = reserveMemory(sizeof(kernel_t));
+    initKernel(kernel, init->kernelSize, init->padding, dilation, stride);
+
+    size_t outputLength = computePool1dOutputLength(init->padding, init->inputLength,
+                                                    init->kernelSize, dilation, stride);
+    tensor_t *argmax = buildMaxPool1dArgmax(init->inputChannels, outputLength);
+
+    layer_t *layer = reserveMemory(sizeof(layer_t));
+    layer->type = MAXPOOL1D;
+    layerConfig_t *layerCfg = reserveMemory(sizeof(layerConfig_t));
+    maxPool1dConfig_t *cfg = reserveMemory(sizeof(maxPool1dConfig_t));
+    layerCfg->maxPool1d = cfg;
+    layer->config = layerCfg;
+
+    cfg->kernel = kernel;
+    cfg->argmaxIndices = argmax;
+    return layer;
+}
+
+layer_t *maxPool1dLayerInit(maxPool1dInit_t *init, layerQuant_t *lq) {
+    validateMaxPool1dInit(init);
+    validateLayerQuantForMaxPool1d(lq);
+
+    layer_t *layer = buildMaxPool1dLayerSkeleton(init);
+    maxPool1dConfig_t *cfg = layer->config->maxPool1d;
+    cfg->forwardQ = lq->forwardMath;
+    cfg->propLossQ = lq->backwardMath;
+    cfg->ownsQuantizations = false;
+    return layer;
+}
+
+layer_t *maxPool1dLayerInitOwning(maxPool1dInit_t *init, layerQuant_t *lq) {
+    validateMaxPool1dInit(init);
+    validateLayerQuantForMaxPool1d(lq);
+
+    layer_t *layer = buildMaxPool1dLayerSkeleton(init);
+    maxPool1dConfig_t *cfg = layer->config->maxPool1d;
+    cfg->forwardQ = deepCopyQuantization(lq->forwardMath);
+    cfg->propLossQ = deepCopyQuantization(lq->backwardMath);
+    cfg->ownsQuantizations = true;
+    return layer;
+}
+
+void freeMaxPool1dLayer(layer_t *layer) {
+    if (layer == NULL) {
+        return;
+    }
+    maxPool1dConfig_t *cfg = layer->config->maxPool1d;
+
+    freeReservedMemory(cfg->kernel);
+    if (cfg->argmaxIndices != NULL) {
+        freeTensor(cfg->argmaxIndices);
+    }
+
+    if (cfg->ownsQuantizations) {
+        if (cfg->forwardQ != NULL) {
+            freeReservedMemory(cfg->forwardQ->qConfig);
+            freeReservedMemory(cfg->forwardQ);
+        }
+        if (cfg->propLossQ != NULL && cfg->propLossQ != cfg->forwardQ) {
+            freeReservedMemory(cfg->propLossQ->qConfig);
+            freeReservedMemory(cfg->propLossQ);
+        }
+    }
+
+    freeReservedMemory(cfg);
+    freeReservedMemory(layer->config);
+    freeReservedMemory(layer);
+}
+
+/* ============================================================================
+ * AvgPool1d
+ * ========================================================================== */
+
+static void validateAvgPool1dInit(avgPool1dInit_t *init) {
+    if (init == NULL) {
+        PRINT_ERROR("avgPool1dLayerInit: init pointer is NULL");
+        exit(1);
+    }
+    if (init->kernelSize == 0) {
+        PRINT_ERROR("avgPool1dLayerInit: kernelSize must be > 0");
+        exit(1);
+    }
+}
+
+static void validateLayerQuantForAvgPool1d(layerQuant_t *lq) {
+    if (lq == NULL) {
+        PRINT_ERROR("avgPool1dLayerInit: lq pointer is NULL");
+        exit(1);
+    }
+    if (lq->forwardMath == NULL) {
+        PRINT_ERROR("avgPool1dLayerInit: layerQuant.forwardMath must be set");
+        exit(1);
+    }
+    if (lq->backwardMath == NULL) {
+        PRINT_ERROR("avgPool1dLayerInit: layerQuant.backwardMath must be set");
+        exit(1);
+    }
+}
+
+static layer_t *buildAvgPool1dLayerSkeleton(avgPool1dInit_t *init) {
+    size_t stride = init->stride == 0 ? init->kernelSize : init->stride;
+
+    kernel_t *kernel = reserveMemory(sizeof(kernel_t));
+    /* AvgPool1d does not support dilation; always pass a dilation of 1. */
+    initKernel(kernel, init->kernelSize, init->padding, /*dilation*/ 1, stride);
+
+    layer_t *layer = reserveMemory(sizeof(layer_t));
+    layer->type = AVGPOOL1D;
+    layerConfig_t *layerCfg = reserveMemory(sizeof(layerConfig_t));
+    avgPool1dConfig_t *cfg = reserveMemory(sizeof(avgPool1dConfig_t));
+    layerCfg->avgPool1d = cfg;
+    layer->config = layerCfg;
+
+    cfg->kernel = kernel;
+    return layer;
+}
+
+layer_t *avgPool1dLayerInit(avgPool1dInit_t *init, layerQuant_t *lq) {
+    validateAvgPool1dInit(init);
+    validateLayerQuantForAvgPool1d(lq);
+
+    layer_t *layer = buildAvgPool1dLayerSkeleton(init);
+    avgPool1dConfig_t *cfg = layer->config->avgPool1d;
+    cfg->forwardQ = lq->forwardMath;
+    cfg->propLossQ = lq->backwardMath;
+    cfg->ownsQuantizations = false;
+    return layer;
+}
+
+layer_t *avgPool1dLayerInitOwning(avgPool1dInit_t *init, layerQuant_t *lq) {
+    validateAvgPool1dInit(init);
+    validateLayerQuantForAvgPool1d(lq);
+
+    layer_t *layer = buildAvgPool1dLayerSkeleton(init);
+    avgPool1dConfig_t *cfg = layer->config->avgPool1d;
+    cfg->forwardQ = deepCopyQuantization(lq->forwardMath);
+    cfg->propLossQ = deepCopyQuantization(lq->backwardMath);
+    cfg->ownsQuantizations = true;
+    return layer;
+}
+
+void freeAvgPool1dLayer(layer_t *layer) {
+    if (layer == NULL) {
+        return;
+    }
+    avgPool1dConfig_t *cfg = layer->config->avgPool1d;
+
+    freeReservedMemory(cfg->kernel);
+
+    if (cfg->ownsQuantizations) {
+        if (cfg->forwardQ != NULL) {
+            freeReservedMemory(cfg->forwardQ->qConfig);
+            freeReservedMemory(cfg->forwardQ);
+        }
+        if (cfg->propLossQ != NULL && cfg->propLossQ != cfg->forwardQ) {
+            freeReservedMemory(cfg->propLossQ->qConfig);
+            freeReservedMemory(cfg->propLossQ);
+        }
+    }
+
+    freeReservedMemory(cfg);
+    freeReservedMemory(layer->config);
+    freeReservedMemory(layer);
+}
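`computePool1dOutputLength` sizes the MaxPool1d `argmaxIndices` buffer before any forward pass, so its arithmetic must agree with `windowGeometry1dCalc` at runtime. The sketch below restates the rule together with the defaults the pool factories apply (`stride == 0` becomes `kernelSize`, `dilation == 0` becomes 1). `pool1dOutLen` and `pad_t` are hypothetical stand-ins for the static helper and `paddingType_t`, and the sketch returns 0 where the factory would print an error and exit.

```c
#include <stddef.h>

/* Hypothetical mirror of paddingType_t; VALID is 0 as in the real enum. */
typedef enum { PAD_VALID = 0, PAD_SAME } pad_t;

/* Pooled output length, following the rule in computePool1dOutputLength:
 *   VALID: (inputLength - (dilation * (kernelSize - 1) + 1)) / stride + 1
 *   SAME:  ceil(inputLength / stride)
 * stride == 0 defaults to kernelSize and dilation == 0 defaults to 1, as in
 * the pool factories. Returns 0 where the factory would PRINT_ERROR and exit. */
static size_t pool1dOutLen(pad_t padding, size_t inputLength, size_t kernelSize,
                           size_t dilation, size_t stride) {
    if (kernelSize == 0 || inputLength == 0) {
        return 0; /* rejected by validateMaxPool1dInit */
    }
    if (stride == 0) {
        stride = kernelSize;
    }
    if (dilation == 0) {
        dilation = 1;
    }
    if (padding == PAD_SAME) {
        return (inputLength + stride - 1) / stride; /* integer ceil division */
    }
    size_t effectiveK = dilation * (kernelSize - 1) + 1;
    if (effectiveK > inputLength) {
        return 0; /* dilated window does not fit */
    }
    return (inputLength - effectiveK) / stride + 1;
}
```

For example, `kernelSize = 2` with default stride and dilation over `inputLength = 10` gives 5 windows under VALID padding.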
diff --git a/src/userApi/layer/SoftmaxApi.c b/src/userApi/layer/SoftmaxApi.c
index df4fe8e..0cae60c 100644
--- a/src/userApi/layer/SoftmaxApi.c
+++ b/src/userApi/layer/SoftmaxApi.c
@@ -1,7 +1,12 @@
 #define SOURCE_FILE "SOFTMAX_API"
 
-#include "SoftmaxApi.h"
+#include <stdbool.h>
+#include <stdlib.h>
+
+#include "Common.h"
+#include "LayerQuant.h"
 #include "Softmax.h"
+#include "SoftmaxApi.h"
 #include "StorageApi.h"
 
 layer_t *softmaxLayerInitLegacy(quantization_t *forwardQ, quantization_t *backwardQ) {
@@ -26,3 +31,79 @@ void freeSoftmaxLayerLegacy(layer_t *softmaxLayer) {
     freeReservedMemory(softmaxLayer->config);
     freeReservedMemory(softmaxLayer);
 }
+
+/* ============================================================================
+ * New factory API — layerQuant_t profile (PR 2).
+ * ========================================================================== */
+
+static void validateLayerQuantForSoftmax(layerQuant_t *lq) {
+    if (lq == NULL) {
+        PRINT_ERROR("softmaxLayerInit: lq pointer is NULL");
+        exit(1);
+    }
+    if (lq->forwardMath == NULL) {
+        PRINT_ERROR("softmaxLayerInit: layerQuant.forwardMath must be set");
+        exit(1);
+    }
+    if (lq->backwardMath == NULL) {
+        PRINT_ERROR("softmaxLayerInit: layerQuant.backwardMath must be set");
+        exit(1);
+    }
+}
+
+layer_t *softmaxLayerInit(layerQuant_t *lq) {
+    validateLayerQuantForSoftmax(lq);
+
+    layer_t *layer = reserveMemory(sizeof(layer_t));
+    layer->type = SOFTMAX;
+
+    layerConfig_t *layerCfg = reserveMemory(sizeof(layerConfig_t));
+    softmaxConfig_t *cfg = reserveMemory(sizeof(softmaxConfig_t));
+    layerCfg->softmax = cfg;
+    layer->config = layerCfg;
+
+    cfg->forwardQ = lq->forwardMath;
+    cfg->backwardQ = lq->backwardMath;
+    cfg->ownsQuantizations = false;
+
+    return layer;
+}
+
+layer_t *softmaxLayerInitOwning(layerQuant_t *lq) {
+    validateLayerQuantForSoftmax(lq);
+
+    layer_t *layer = reserveMemory(sizeof(layer_t));
+    layer->type = SOFTMAX;
+
+    layerConfig_t *layerCfg = reserveMemory(sizeof(layerConfig_t));
+    softmaxConfig_t *cfg = reserveMemory(sizeof(softmaxConfig_t));
+    layerCfg->softmax = cfg;
+    layer->config = layerCfg;
+
+    cfg->forwardQ = deepCopyQuantization(lq->forwardMath);
+    cfg->backwardQ = deepCopyQuantization(lq->backwardMath);
+    cfg->ownsQuantizations = true;
+
+    return layer;
+}
+
+void freeSoftmaxLayer(layer_t *softmaxLayer) {
+    if (softmaxLayer == NULL) {
+        return;
+    }
+    softmaxConfig_t *cfg = softmaxLayer->config->softmax;
+
+    if (cfg->ownsQuantizations) {
+        if (cfg->forwardQ != NULL) {
+            freeReservedMemory(cfg->forwardQ->qConfig);
+            freeReservedMemory(cfg->forwardQ);
+        }
+        if (cfg->backwardQ != NULL && cfg->backwardQ != cfg->forwardQ) {
+            freeReservedMemory(cfg->backwardQ->qConfig);
+            freeReservedMemory(cfg->backwardQ);
+        }
+    }
+    freeReservedMemory(cfg);
+    freeReservedMemory(softmaxLayer->config);
+    freeReservedMemory(softmaxLayer);
+}
diff --git a/src/userApi/layer/include/SoftmaxApi.h b/src/userApi/layer/include/SoftmaxApi.h
index beaa289..0498359 100644
--- a/src/userApi/layer/include/SoftmaxApi.h
+++ b/src/userApi/layer/include/SoftmaxApi.h
@@ -2,10 +2,23 @@
 #define SOFTMAXAPI_H
 
 #include "Layer.h"
+#include "LayerQuant.h"
 #include "Tensor.h"
 
 /* Legacy (pre-2026-05-15 factory API) — retained during PR 1/2 coexistence window. */
 layer_t *softmaxLayerInitLegacy(quantization_t *forwardQ, quantization_t *backwardQ);
 void freeSoftmaxLayerLegacy(layer_t *softmaxLayer);
 
+/*! Borrowing variant: stores lq->forwardMath (forwardQ) and lq->backwardMath
+ *  (backwardQ) as-is; the caller owns both and must keep them valid until freeSoftmaxLayer. */
+layer_t *softmaxLayerInit(layerQuant_t *lq);
+
+/*! Owning variant — deep-copies forwardMath + backwardMath via
+ *  deepCopyQuantization. */
+layer_t *softmaxLayerInitOwning(layerQuant_t *lq);
+
+/*! Tears down the layer. Reads config->ownsQuantizations to decide
+ *  whether to also free the two quantization_t and their qConfigs. */
+void freeSoftmaxLayer(layer_t *softmaxLayer);
+
 #endif // SOFTMAXAPI_H
diff --git a/test/unit/layer/CMakeLists.txt b/test/unit/layer/CMakeLists.txt
index 5216d1c..367107c 100644
--- a/test/unit/layer/CMakeLists.txt
+++ b/test/unit/layer/CMakeLists.txt
@@ -99,6 +99,9 @@ add_elastic_ai_unit_test(
         TensorApi
         SoftmaxApi
         QuantizationApi
+        LayerQuant
+        Quantization
+        Rounding
         StorageApi
 )
 
diff --git a/test/unit/layer/UnitTestSoftmax.c b/test/unit/layer/UnitTestSoftmax.c
index 087116b..ba880eb 100644
--- a/test/unit/layer/UnitTestSoftmax.c
+++ b/test/unit/layer/UnitTestSoftmax.c
@@ -1,6 +1,8 @@
 #include <stdlib.h>
 
+#include "LayerQuant.h"
 #include "QuantizationApi.h"
+#include "Softmax.h"
 #include "SoftmaxApi.h"
 #include "StorageApi.h"
 #include "Tensor.h"
@@ -280,6 +282,54 @@ void testSoftmaxLayerInitAndFreeRoundTrip(void) {
     freeQuantization(floatQ);
 }
 
+/* ============================================================================
+ * Tests for the new layerQuant_t-based Softmax factory (PR 2).
+ * ========================================================================== */
+
+void testSoftmaxLayerInitBorrowingStoresLqPointers(void) {
+    quantization_t *qFwd = quantizationInitFloat();
+    quantization_t *qBwd = quantizationInitFloat();
+    layerQuant_t lq = {
+        .forwardMath = qFwd,
+        .backwardMath = qBwd,
+    };
+
+    layer_t *layer = softmaxLayerInit(&lq);
+
+    TEST_ASSERT_NOT_NULL(layer);
+    TEST_ASSERT_EQUAL_INT(SOFTMAX, layer->type);
+
+    softmaxConfig_t *cfg = layer->config->softmax;
+    TEST_ASSERT_EQUAL_PTR(qFwd, cfg->forwardQ);
+    TEST_ASSERT_EQUAL_PTR(qBwd, cfg->backwardQ);
+    TEST_ASSERT_FALSE(cfg->ownsQuantizations);
+
+    freeSoftmaxLayer(layer);
+    freeQuantization(qFwd);
+    freeQuantization(qBwd);
+}
+
+void testSoftmaxLayerInitOwningDeepCopiesLqPointers(void) {
+    quantization_t *qFwd = quantizationInitFloat();
+    quantization_t *qBwd = quantizationInitFloat();
+    layerQuant_t lq = {
+        .forwardMath = qFwd,
+        .backwardMath = qBwd,
+    };
+
+    layer_t *layer = softmaxLayerInitOwning(&lq);
+
+    softmaxConfig_t *cfg = layer->config->softmax;
+    TEST_ASSERT_NOT_EQUAL(qFwd, cfg->forwardQ);
+    TEST_ASSERT_NOT_EQUAL(qBwd, cfg->backwardQ);
+    TEST_ASSERT_EQUAL_INT(qFwd->type, cfg->forwardQ->type);
+    TEST_ASSERT_TRUE(cfg->ownsQuantizations);
+
+    freeSoftmaxLayer(layer);
+    freeQuantization(qFwd);
+    freeQuantization(qBwd);
+}
+
 void setUp() {}
 void tearDown() {}
 
@@ -292,5 +342,7 @@ int main() {
     RUN_TEST(unitTestSoftmaxBackwardSymInt32);
 
     RUN_TEST(testSoftmaxLayerInitAndFreeRoundTrip);
+    RUN_TEST(testSoftmaxLayerInitBorrowingStoresLqPointers);
+    RUN_TEST(testSoftmaxLayerInitOwningDeepCopiesLqPointers);
     return UNITY_END();
 }
diff --git a/test/unit/loss_functions/CMakeLists.txt b/test/unit/loss_functions/CMakeLists.txt
index 0aca768..c91b357 100644
--- a/test/unit/loss_functions/CMakeLists.txt
+++ b/test/unit/loss_functions/CMakeLists.txt
@@ -6,6 +6,7 @@ add_elastic_ai_unit_test(
         Log
         SoftmaxApi
         QuantizationApi
+        LayerQuant
         TensorApi
 )
 
diff --git a/test/unit/userAPI/UnitTestConv1dApi.c b/test/unit/userAPI/UnitTestConv1dApi.c
index a605a9b..072aee7 100644
--- a/test/unit/userAPI/UnitTestConv1dApi.c
+++ b/test/unit/userAPI/UnitTestConv1dApi.c
@@ -8,6 +8,7 @@
 #include "LayerQuant.h"
 #include "QuantizationApi.h"
 #include "Tensor.h"
+#include "TensorApi.h"
 #include "unity.h"
 
 void setUp() {}
@@ -71,6 +72,7 @@ void testConv1dLayerInitBorrowingBuildsLayerWithCorrectShape(void) {
     TEST_ASSERT_EQUAL_UINT(1, cfg->groups);
 
     freeConv1dLayer(layer);
+    freeQuantization(q);
 }
 
 void testConv1dLayerInitBorrowingBiasDefaultResolvesToTrue(void) {
@@ -91,6 +93,7 @@ void testConv1dLayerInitBorrowingBiasDefaultResolvesToTrue(void) {
     TEST_ASSERT_NOT_NULL(cfg->bias);
 
     freeConv1dLayer(layer);
+    freeQuantization(q);
 }
 
 void testConv1dLayerInitBorrowingBiasFalseLeavesBiasNull(void) {
@@ -111,6 +114,7 @@ void testConv1dLayerInitBorrowingBiasFalseLeavesBiasNull(void) {
     TEST_ASSERT_NULL(cfg->bias);
 
     freeConv1dLayer(layer);
+    freeQuantization(q);
 }
 
 void testConv1dLayerInitBorrowingPaddingDefaultIsValid(void) {
@@ -135,6 +139,61 @@ void testConv1dLayerInitBorrowingPaddingDefaultIsValid(void) {
     TEST_ASSERT_EQUAL_UINT(1, cfg->groups);
 
     freeConv1dLayer(layer);
+    freeQuantization(q);
+}
+
+void testConv1dLayerInitOwningDeepCopiesQuantizations(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    layer_t *layer = conv1dLayerInitOwning(
+        &(conv1dInit_t){
+            .inChannels = 3,
+            .outChannels = 4,
+            .kernelSize = 5,
+            .bias = BIAS_TRUE,
+        },
+        &lq);
+
+    conv1dConfig_t *cfg = layer->config->conv1d;
+
+    /* Owning variant: cfg->forwardQ is a fresh allocation, NOT the original q */
+    TEST_ASSERT_NOT_EQUAL(q, cfg->forwardQ);
+    TEST_ASSERT_NOT_EQUAL(q, cfg->weightGradQ);
+    TEST_ASSERT_NOT_EQUAL(q, cfg->biasGradQ);
+    TEST_ASSERT_NOT_EQUAL(q, cfg->propLossQ);
+
+    /* But the copy has equal type to the original */
+    TEST_ASSERT_EQUAL_INT(q->type, cfg->forwardQ->type);
+
+    /* ownsQuantizations flag is set */
+    TEST_ASSERT_TRUE(cfg->ownsQuantizations);
+
+    freeConv1dLayer(layer);
+    freeQuantization(q);
+}
+
+void testConv1dLayerInitOwningFreesAllAllocationsWithoutLeak(void) {
+    /* Build + free 5 layers — if anything leaks, LSan catches it in CI. */
+    for (int i = 0; i < 5; i++) {
+        quantization_t *q = quantizationInitFloat();
+        layerQuant_t lq;
+        layerQuantInitUniform(&lq, q);
+
+        layer_t *layer = conv1dLayerInitOwning(
+            &(conv1dInit_t){
+                .inChannels = 8,
+                .outChannels = 4,
+                .kernelSize = 3,
+                .bias = BIAS_TRUE,
+            },
+            &lq);
+
+        freeConv1dLayer(layer);
+        freeQuantization(q);
+    }
+    TEST_PASS();
 }
 
 int main(void) {
@@ -143,5 +202,7 @@ int main(void) {
     RUN_TEST(testConv1dLayerInitBorrowingBiasDefaultResolvesToTrue);
     RUN_TEST(testConv1dLayerInitBorrowingBiasFalseLeavesBiasNull);
     RUN_TEST(testConv1dLayerInitBorrowingPaddingDefaultIsValid);
+    RUN_TEST(testConv1dLayerInitOwningDeepCopiesQuantizations);
+    RUN_TEST(testConv1dLayerInitOwningFreesAllAllocationsWithoutLeak);
     return UNITY_END();
 }
diff --git a/test/unit/userAPI/UnitTestConv1dTransposedApi.c b/test/unit/userAPI/UnitTestConv1dTransposedApi.c
index 4725fd0..d1a55af 100644
--- a/test/unit/userAPI/UnitTestConv1dTransposedApi.c
+++ b/test/unit/userAPI/UnitTestConv1dTransposedApi.c
@@ -118,10 +118,60 @@ void testConv1dTransposedLayerInitBorrowingOutputPaddingPropagatesToConfig(void)
     freeQuantization(q);
 }
 
+void testConv1dTransposedLayerInitOwningDeepCopiesQuantizations(void) {
+    quantization_t *q = quantizationInitFloat();
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, q);
+
+    layer_t *layer = conv1dTransposedLayerInitOwning(
+        &(conv1dTransposedInit_t){
+            .inChannels = 8,
+            .outChannels = 4,
+            .kernelSize = 3,
+            .bias = BIAS_TRUE,
+        },
+        &lq);
+
+    conv1dTransposedConfig_t *cfg = layer->config->conv1dTransposed;
+
+    TEST_ASSERT_NOT_EQUAL(q, cfg->forwardQ);
+    TEST_ASSERT_NOT_EQUAL(q, cfg->weightGradQ);
+    TEST_ASSERT_NOT_EQUAL(q, cfg->biasGradQ);
+    TEST_ASSERT_NOT_EQUAL(q, cfg->propLossQ);
+    TEST_ASSERT_EQUAL_INT(q->type, cfg->forwardQ->type);
+    TEST_ASSERT_TRUE(cfg->ownsQuantizations);
+
+    freeConv1dTransposedLayer(layer);
+    freeQuantization(q);
+}
+
+void testConv1dTransposedLayerInitOwningFreesAllAllocationsWithoutLeak(void) {
+    for (int i = 0; i < 5; i++) {
+        quantization_t *q = quantizationInitFloat();
+        layerQuant_t lq;
+        layerQuantInitUniform(&lq, q);
+
+        layer_t *layer = conv1dTransposedLayerInitOwning(
+            &(conv1dTransposedInit_t){
+                .inChannels = 4,
+                .outChannels = 2,
+                .kernelSize = 3,
+                .bias = BIAS_TRUE,
+            },
+            &lq);
+
+        freeConv1dTransposedLayer(layer);
+        freeQuantization(q);
+    }
+    TEST_PASS();
+}
+
 int main(void) {
     UNITY_BEGIN();
     RUN_TEST(testConv1dTransposedLayerInitBorrowingBuildsLayerWithCorrectShape);
     RUN_TEST(testConv1dTransposedLayerInitBorrowingBiasFalseLeavesBiasNull);
     RUN_TEST(testConv1dTransposedLayerInitBorrowingOutputPaddingPropagatesToConfig);
+    RUN_TEST(testConv1dTransposedLayerInitOwningDeepCopiesQuantizations);
+    RUN_TEST(testConv1dTransposedLayerInitOwningFreesAllAllocationsWithoutLeak);
     return UNITY_END();
 }

From 96f0639a4f7bb967dfbd539acfa369cc1b88ccd8 Mon Sep 17 00:00:00 2001
From: Leo Buron <leo.buron@uni-due.de>
Date: Fri, 15 May 2026 22:21:02 +0200
Subject: [PATCH 3/4] feat(examples): emit per-layer state_dict .npy files
 post-training (HAR + ECG)

train_pytorch.py for both examples now writes the trained model's
per-layer weight and bias tensors to examples/<dir>/weights/ as
<name>.weight.npy / <name>.bias.npy. These files are the input to the
v2 binaries' BIT_PARITY mode (PR 2 Tasks 20/21), which loads them via
modelLoadStateDict and runs inference for the CI bit-parity gate.
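
For reference, PyTorch stores Conv1d weights as (out_channels,
in_channels/groups, kernel_size) and ConvTranspose1d weights as
(in_channels, out_channels/groups, kernel_size), so the exported
files do not all share one axis order. The stdlib-only sketch below
computes the shape each ECG weight file should have. It assumes
groups=1 and takes the layer list from the layer_map comments in
train_pytorch.py. expected_weight_shape and ECG_LAYERS are
hypothetical names and are not part of this patch.

```python
def expected_weight_shape(kind, in_ch, out_ch, kernel, groups=1):
    """PyTorch weight shape of a 1-D convolution layer.

    Conv1d stores (out, in/groups, K); ConvTranspose1d swaps the
    first two axes and stores (in, out/groups, K).
    """
    if kind == "conv":
        return (out_ch, in_ch // groups, kernel)
    if kind == "convT":
        return (in_ch, out_ch // groups, kernel)
    raise ValueError(f"unknown layer kind: {kind!r}")


ECG_LAYERS = {  # hypothetical: (kind, in_ch, out_ch, K) per exported name
    "e1": ("conv", 1, 8, 7),
    "e2": ("conv", 8, 16, 5),
    "d1": ("convT", 16, 8, 5),
    "d2": ("convT", 8, 4, 2),
    "d3": ("convT", 4, 1, 2),
}

if __name__ == "__main__":
    for name, (kind, cin, cout, k) in ECG_LAYERS.items():
        w = expected_weight_shape(kind, cin, cout, k)
        print(f"{name}.weight.npy {w}  {name}.bias.npy {(cout,)}")
```

For example, d1.weight.npy should be (16, 8, 5) even though d1 maps
16 -> 8 channels. That is the same shape as e2.weight.npy for the
8 -> 16 Conv1d.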

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10

feat(examples): add har_classifier_v2 using new factory API

Same architecture as the legacy har_classifier binary (Conv->ReLU->Pool
x3 + Flatten + Linear + Softmax) but constructed via conv1dLayerInit,
reluLayerInit, maxPool1dLayerInit, avgPool1dLayerInit, flattenLayerInit,
linearLayerInit, softmaxLayerInit. Shares the legacy data directory.
Outputs to examples/har_classifier_v2/{logs,outputs}/. Supports the
BIT_PARITY env-var mode (used by the bit-parity CI step), which loads
the PyTorch state_dict via modelLoadStateDict and skips training.
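
One way to check that the v2 buildModel() wiring is consistent is to
walk the sequence length through the stack with PyTorch's
Conv1d/pooling length formula,
L_out = floor((L + 2p - d(K-1) - 1) / s) + 1. The sketch below makes
two assumptions. First, SAME padding at stride 1 with odd K behaves
like symmetric padding of (K-1)/2. Second, the pooling layers use no
padding. conv_len and har_feature_count are hypothetical helpers.

```python
def conv_len(length, kernel, stride=1, padding=0, dilation=1):
    """Conv1d / MaxPool1d / AvgPool1d output length (floor mode)."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def har_feature_count(length=128, kernels=(7, 5, 3), final_channels=64):
    """Walk the HAR v2 stack and return the flattened feature count."""
    avg_kernel = length // 4  # block 3: AvgPool1d(K = S = LEN_INPUT / 4)
    for block, k in enumerate(kernels):
        length = conv_len(length, k, padding=(k - 1) // 2)  # SAME, stride 1, odd K
        if block < 2:
            length = conv_len(length, 2, stride=2)  # MaxPool1d(K=2, S=2)
        else:
            length = conv_len(length, avg_kernel, stride=avg_kernel)
    return final_channels * length  # Flatten: channels x remaining length
```

With the defaults (LEN_INPUT = 128, C3_OUT = 64), the length goes
128 -> 64 -> 32 -> 1, and the function returns 64. That matches the
inFeatures value the Linear head is built with.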

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10 (coexistence strategy Z)

feat(examples): add ecg_anomaly_ae_v2 using new factory API

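Same encoder/decoder autoencoder as the legacy ecg_anomaly_ae, but built
via conv1dLayerInit, reluLayerInit, maxPool1dLayerInit,
avgPool1dLayerInit and conv1dTransposedLayerInit. Shares the legacy data
directory and outputs to examples/ecg_anomaly_ae_v2/{logs,outputs}/.
Supports the BIT_PARITY env-var mode, which loads the PyTorch state_dict
via modelLoadStateDict and skips training.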
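
The autoencoder only reconstructs a 140-sample signal if the decoder
exactly undoes the encoder's downsampling. The sketch below traces the
length through every resampling layer. It uses PyTorch's
Conv1d/pooling formula and its ConvTranspose1d formula,
L_out = (L - 1)s - 2p + d(K-1) + output_padding + 1.

It makes two assumptions. First, the strided SAME convolution in e1
behaves like symmetric padding of K//2 = 3; how this library resolves
SAME at stride 2 is not shown here. Second, the ReLU layers leave the
length unchanged. ecg_lengths and its helpers are hypothetical.

```python
def conv_len(length, kernel, stride=1, padding=0, dilation=1):
    """Conv1d / MaxPool1d / AvgPool1d output length (floor mode)."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv_t_len(length, kernel, stride=1, padding=0, output_padding=0, dilation=1):
    """ConvTranspose1d output length."""
    return ((length - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


def ecg_lengths(length=140):
    """Sequence length after each resampling layer of the ECG AE."""
    steps = [
        lambda n: conv_len(n, 7, stride=2, padding=3),  # e1, assumed padding=3
        lambda n: conv_len(n, 2, stride=2),             # MaxPool1d(K=2, S=2)
        lambda n: conv_len(n, 5, padding=2),            # e2, SAME at stride 1
        lambda n: conv_len(n, 5, stride=5),             # AvgPool1d(K=5, S=5)
        lambda n: conv_t_len(n, 5, stride=5),           # d1
        lambda n: conv_t_len(n, 2, stride=2),           # d2
        lambda n: conv_t_len(n, 2, stride=2),           # d3
    ]
    trace = [length]
    for step in steps:
        trace.append(step(trace[-1]))
    return trace
```

The trace comes back as [140, 70, 35, 35, 7, 35, 70, 140], so the
decoder output length matches LEN_INPUT. writeAllReconstructions()
relies on this when it copies IN_CHANNELS * LEN_INPUT floats per
sample.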

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10
---
 examples/CMakeLists.txt                   |   2 +
 examples/ecg_anomaly_ae/train_pytorch.py  |  27 ++
 examples/ecg_anomaly_ae_v2/CMakeLists.txt |  56 ++++
 examples/ecg_anomaly_ae_v2/train_c.c      | 356 ++++++++++++++++++++
 examples/har_classifier/train_pytorch.py  |  25 ++
 examples/har_classifier_v2/CMakeLists.txt |  62 ++++
 examples/har_classifier_v2/train_c.c      | 387 ++++++++++++++++++++++
 7 files changed, 915 insertions(+)
 create mode 100644 examples/ecg_anomaly_ae_v2/CMakeLists.txt
 create mode 100644 examples/ecg_anomaly_ae_v2/train_c.c
 create mode 100644 examples/har_classifier_v2/CMakeLists.txt
 create mode 100644 examples/har_classifier_v2/train_c.c

diff --git a/examples/CMakeLists.txt b/examples/CMakeLists.txt
index 7872c09..2f2fbaa 100644
--- a/examples/CMakeLists.txt
+++ b/examples/CMakeLists.txt
@@ -1,3 +1,5 @@
 add_subdirectory(_shared)
 add_subdirectory(har_classifier)
+add_subdirectory(har_classifier_v2)
 add_subdirectory(ecg_anomaly_ae)
+add_subdirectory(ecg_anomaly_ae_v2)
diff --git a/examples/ecg_anomaly_ae/train_pytorch.py b/examples/ecg_anomaly_ae/train_pytorch.py
index 81b5f99..7ef0f64 100644
--- a/examples/ecg_anomaly_ae/train_pytorch.py
+++ b/examples/ecg_anomaly_ae/train_pytorch.py
@@ -188,6 +188,33 @@ def main() -> None:
     np.save(OUTPUTS / "pytorch_train_recons.npy", pt_train_recons.astype(np.float32))
     print(f"FINAL test_loss={test_loss:.6f}", flush=True)
 
+    # Save per-layer weights for the C-side BIT_PARITY mode.
+    # The C side expects examples/ecg_anomaly_ae/weights/<name>.{weight,bias}.npy,
+    # where <name> is one of {e1, e2, d1, d2, d3}, in v2's buildModel() order.
+    import os
+
+    weights_dir = HERE / "weights"
+    os.makedirs(weights_dir, exist_ok=True)
+
+    # Keys match C-side loadStateDictFromDir() names; values are actual PyTorch attrs.
+    layer_map = {
+        "e1": model.enc1,   # Conv1d(1->8, K=7, S=2)
+        "e2": model.enc2,   # Conv1d(8->16, K=5)
+        "d1": model.dec1,   # ConvTranspose1d(16->8, K=5, S=5)
+        "d2": model.dec2,   # ConvTranspose1d(8->4, K=2, S=2)
+        "d3": model.dec3,   # ConvTranspose1d(4->1, K=2, S=2)
+    }
+
+    print("Saving per-layer weights:", flush=True)
+    for name, layer in layer_map.items():
+        w = layer.weight.detach().cpu().numpy().astype(np.float32)
+        np.save(weights_dir / f"{name}.weight.npy", w)
+        if layer.bias is not None:
+            b = layer.bias.detach().cpu().numpy().astype(np.float32)
+            np.save(weights_dir / f"{name}.bias.npy", b)
+        has_bias = f" + {name}.bias.npy" if layer.bias is not None else ""
+        print(f"  wrote {name}.weight.npy shape={w.shape}{has_bias}", flush=True)
+
 
 if __name__ == "__main__":
     main()
diff --git a/examples/ecg_anomaly_ae_v2/CMakeLists.txt b/examples/ecg_anomaly_ae_v2/CMakeLists.txt
new file mode 100644
index 0000000..d9a9c07
--- /dev/null
+++ b/examples/ecg_anomaly_ae_v2/CMakeLists.txt
@@ -0,0 +1,56 @@
+add_executable(train_c_ecg_anomaly_ae_v2 train_c.c)
+
+target_link_libraries(train_c_ecg_anomaly_ae_v2 PRIVATE
+        DataLoaderApi
+        DataLoader
+        NPYLoaderApi
+        NPYLoader
+
+        Layer
+
+        Conv1dApi
+        Conv1d
+
+        Conv1dTransposedApi
+        Conv1dTransposed
+
+        ReluApi
+        Relu
+
+        Pool1dApi
+        MaxPool1d
+        AvgPool1d
+
+        QuantizationApi
+        Quantization
+
+        TensorApi
+        Tensor
+        Rounding
+
+        TrainingLoopApi
+        CalculateGradsSequential
+        TrainingBatchDefault
+        TrainingEpochDefault
+        Optimizer
+
+        LossFunction
+        MSE
+
+        Sgd
+        SgdApi
+
+        InferenceApi
+
+        StateDictApi
+        LayerWeightsApi
+        LayerQuant
+        LayerCommon
+        Distributions
+
+        Common
+        StorageApi
+        RNG
+
+        examples_shared
+)
diff --git a/examples/ecg_anomaly_ae_v2/train_c.c b/examples/ecg_anomaly_ae_v2/train_c.c
new file mode 100644
index 0000000..06c3a32
--- /dev/null
+++ b/examples/ecg_anomaly_ae_v2/train_c.c
@@ -0,0 +1,356 @@
+#define SOURCE_FILE "ecg_anomaly_ae_v2_train_c"
+
+#include <errno.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <time.h>
+
+#include "CalculateGradsSequential.h"
+#include "Common.h"
+#include "Conv1dApi.h"
+#include "Conv1dTransposedApi.h"
+#include "DataLoader.h"
+#include "DataLoaderApi.h"
+#include "InferenceApi.h"
+#include "Layer.h"
+#include "LayerCommon.h"
+#include "LayerQuant.h"
+#include "LossFunction.h"
+#include "NPYLoaderApi.h"
+#include "Pool1dApi.h"
+#include "Quantization.h"
+#include "QuantizationApi.h"
+#include "ReluApi.h"
+#include "SgdApi.h"
+#include "StateDictApi.h"
+#include "StorageApi.h"
+#include "Tensor.h"
+#include "TensorApi.h"
+#include "TrainingLoopApi.h"
+
+#include "npy_writer.h"
+
+#define EPOCHS 200
+#define BATCH 32
+#define LR 0.005f
+#define MOMENTUM 0.9f
+#define SEED 42
+#define SHUFFLE_SEED 42
+
+#define IN_CHANNELS 1
+#define LEN_INPUT 140
+
+#define E1_OUT 8
+#define E1_K 7
+#define E1_S 2
+#define E2_OUT 16
+#define E2_K 5
+
+#define D1_OUT 8
+#define D1_K 5
+#define D1_S 5
+#define D2_OUT 4
+#define D2_K 2
+#define D2_S 2
+#define D3_OUT 1
+#define D3_K 2
+#define D3_S 2
+
+#define MODEL_SIZE 11
+
+static dataset_t g_trainDataset;
+static dataset_t g_valDataset;
+static dataset_t g_testDataset;
+
+static void reshapeItemsAddBatchDim(tensorArray_t *items) {
+    for (size_t i = 0; i < items->size; ++i) {
+        tensor_t *t = items->array[i];
+        size_t oldRank = t->shape->numberOfDimensions;
+        size_t newRank = oldRank + 1;
+
+        size_t *newDims = reserveMemory(newRank * sizeof(size_t));
+        size_t *newOrder = reserveMemory(newRank * sizeof(size_t));
+        newDims[0] = 1;
+        for (size_t d = 0; d < oldRank; ++d) {
+            newDims[d + 1] = t->shape->dimensions[d];
+        }
+        for (size_t d = 0; d < newRank; ++d) {
+            newOrder[d] = d;
+        }
+
+        freeReservedMemory(t->shape->dimensions);
+        freeReservedMemory(t->shape->orderOfDimensions);
+        t->shape->dimensions = newDims;
+        t->shape->orderOfDimensions = newOrder;
+        t->shape->numberOfDimensions = newRank;
+    }
+}
+
+static void initDataSets(void) {
+    tensorArray_t *trainItems = npyLoad("examples/ecg_anomaly_ae/data/train_x.npy");
+    tensorArray_t *trainLabels = npyLoad("examples/ecg_anomaly_ae/data/train_x.npy");
+    reshapeItemsAddBatchDim(trainItems);
+    reshapeItemsAddBatchDim(trainLabels);
+    g_trainDataset.items = trainItems;
+    g_trainDataset.labels = trainLabels;
+
+    tensorArray_t *valItems = npyLoad("examples/ecg_anomaly_ae/data/val_x.npy");
+    tensorArray_t *valLabels = npyLoad("examples/ecg_anomaly_ae/data/val_x.npy");
+    reshapeItemsAddBatchDim(valItems);
+    reshapeItemsAddBatchDim(valLabels);
+    g_valDataset.items = valItems;
+    g_valDataset.labels = valLabels;
+
+    tensorArray_t *testItems = npyLoad("examples/ecg_anomaly_ae/data/test_x.npy");
+    tensorArray_t *testLabels = npyLoad("examples/ecg_anomaly_ae/data/test_x.npy");
+    reshapeItemsAddBatchDim(testItems);
+    reshapeItemsAddBatchDim(testLabels);
+    g_testDataset.items = testItems;
+    g_testDataset.labels = testLabels;
+}
+
+static sample_t *getTrainSample(size_t id) {
+    return npyGetSample(&g_trainDataset, id);
+}
+static sample_t *getValSample(size_t id) {
+    return npyGetSample(&g_valDataset, id);
+}
+static sample_t *getTestSample(size_t id) {
+    return npyGetSample(&g_testDataset, id);
+}
+static size_t getTrainSize(void) {
+    return g_trainDataset.items->size;
+}
+static size_t getValSize(void) {
+    return g_valDataset.items->size;
+}
+static size_t getTestSize(void) {
+    return g_testDataset.items->size;
+}
+
+static void buildModel(layer_t **model, layerQuant_t *lq) {
+    /* Encoder */
+    model[0] = conv1dLayerInit(&(conv1dInit_t){.inChannels = IN_CHANNELS,
+                                               .outChannels = E1_OUT,
+                                               .kernelSize = E1_K,
+                                               .stride = E1_S,
+                                               .padding = SAME},
+                               lq);
+    model[1] = reluLayerInit(lq);
+    model[2] = maxPool1dLayerInit(
+        &(maxPool1dInit_t){
+            .kernelSize = 2, .stride = 2, .inputChannels = E1_OUT, .inputLength = LEN_INPUT / E1_S},
+        lq);
+
+    model[3] = conv1dLayerInit(
+        &(conv1dInit_t){
+            .inChannels = E1_OUT, .outChannels = E2_OUT, .kernelSize = E2_K, .padding = SAME},
+        lq);
+    model[4] = reluLayerInit(lq);
+    model[5] = avgPool1dLayerInit(&(avgPool1dInit_t){.kernelSize = 5, .stride = 5}, lq);
+
+    /* Decoder */
+    model[6] = conv1dTransposedLayerInit(
+        &(conv1dTransposedInit_t){
+            .inChannels = E2_OUT, .outChannels = D1_OUT, .kernelSize = D1_K, .stride = D1_S},
+        lq);
+    model[7] = reluLayerInit(lq);
+
+    model[8] = conv1dTransposedLayerInit(
+        &(conv1dTransposedInit_t){
+            .inChannels = D1_OUT, .outChannels = D2_OUT, .kernelSize = D2_K, .stride = D2_S},
+        lq);
+    model[9] = reluLayerInit(lq);
+
+    model[10] = conv1dTransposedLayerInit(
+        &(conv1dTransposedInit_t){
+            .inChannels = D2_OUT, .outChannels = D3_OUT, .kernelSize = D3_K, .stride = D3_S},
+        lq);
+}
+
+static int loadStateDictFromDir(layer_t **model, const char *weightsDir) {
+    /* Param layer order in model[]: e1 (0), e2 (3), d1 (6), d2 (8), d3 (10). 5 entries. */
+    char wPath[256], bPath[256];
+    const char *names[5] = {"e1", "e2", "d1", "d2", "d3"};
+    tensor_t *w[5] = {0};
+    tensor_t *b[5] = {0};
+
+    for (int i = 0; i < 5; i++) {
+        snprintf(wPath, sizeof(wPath), "%s/%s.weight.npy", weightsDir, names[i]);
+        snprintf(bPath, sizeof(bPath), "%s/%s.bias.npy", weightsDir, names[i]);
+        tensorArray_t *wArr = npyLoad(wPath);
+        tensorArray_t *bArr = npyLoad(bPath);
+        if (wArr == NULL || bArr == NULL) {
+            fprintf(stderr, "loadStateDictFromDir: missing %s or %s\n", wPath, bPath);
+            return 1;
+        }
+        w[i] = wArr->array[0];
+        b[i] = bArr->array[0];
+    }
+
+    modelLoadStateDict(
+        model, MODEL_SIZE,
+        (stateDictEntry_t[]){
+            {.name = names[0], .weightData = (float *)w[0]->data, .biasData = (float *)b[0]->data},
+            {.name = names[1], .weightData = (float *)w[1]->data, .biasData = (float *)b[1]->data},
+            {.name = names[2], .weightData = (float *)w[2]->data, .biasData = (float *)b[2]->data},
+            {.name = names[3], .weightData = (float *)w[3]->data, .biasData = (float *)b[3]->data},
+            {.name = names[4], .weightData = (float *)w[4]->data, .biasData = (float *)b[4]->data},
+        },
+        5);
+    return 0;
+}
+
+static FILE *g_log_file = NULL;
+static int g_first_epoch = 1;
+static struct timespec g_epoch_t0;
+
+static void epochCallback(size_t epoch, float trainLoss, epochStats_t evalStats) {
+    struct timespec t1;
+    clock_gettime(CLOCK_MONOTONIC, &t1);
+    double wall_s =
+        (double)(t1.tv_sec - g_epoch_t0.tv_sec) + (double)(t1.tv_nsec - g_epoch_t0.tv_nsec) * 1e-9;
+
+    if (!g_first_epoch) {
+        fprintf(g_log_file, ",\n");
+    }
+    fprintf(g_log_file,
+            "    {\"epoch\": %zu, \"step_losses\": [], \"train_loss\": %.6f, "
+            "\"val_loss\": %.6f, \"val_acc\": null, \"wall_s\": %.4f}",
+            epoch, (double)trainLoss, (double)evalStats.loss, wall_s);
+    fflush(g_log_file);
+    g_first_epoch = 0;
+
+    fprintf(stdout, "epoch %zu: train_loss=%.6f val_loss=%.6f wall_s=%.2f\n", epoch,
+            (double)trainLoss, (double)evalStats.loss, wall_s);
+    fflush(stdout);
+
+    clock_gettime(CLOCK_MONOTONIC, &g_epoch_t0);
+}
+
+static int writeAllReconstructions(layer_t **model, size_t modelSize,
+                                   sample_t *(*getSample)(size_t), size_t n, const char *outPath) {
+    size_t totalElems = n * IN_CHANNELS * LEN_INPUT;
+    float *buf = malloc(totalElems * sizeof(float));
+    if (!buf) {
+        fprintf(stderr, "OOM allocating reconstruction buffer (n=%zu)\n", n);
+        return 1;
+    }
+
+    for (size_t i = 0; i < n; ++i) {
+        sample_t *s = getSample(i);
+        tensor_t *out = inference(model, modelSize, s->item);
+        const float *recon = (const float *)out->data;
+        memcpy(buf + i * IN_CHANNELS * LEN_INPUT, recon, IN_CHANNELS * LEN_INPUT * sizeof(float));
+        freeTensor(out);
+        freeSample(s);
+    }
+
+    size_t outShape[3] = {n, IN_CHANNELS, LEN_INPUT};
+    int rc = npyWriteFloat32(outPath, buf, outShape, 3);
+    free(buf);
+    return rc;
+}
+
+static int ensureDir(const char *p) {
+    if (mkdir(p, S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH) == 0) {
+        return 0;
+    }
+    if (errno == EEXIST) {
+        return 0;
+    }
+    fprintf(stderr, "ERROR: cannot create %s: %s\n", p, strerror(errno));
+    return 1;
+}
+
+int main(void) {
+    if (ensureDir("examples/ecg_anomaly_ae_v2/logs") != 0) {
+        return 1;
+    }
+    if (ensureDir("examples/ecg_anomaly_ae_v2/outputs") != 0) {
+        return 1;
+    }
+
+    initDataSets();
+
+    dataLoader_t *testLoader = dataLoaderInit(getTestSample, getTestSize, 1, NULL, NULL,
+                                              /*shuffle*/ false, /*shuffleSeed*/ 0,
+                                              /*dropLast*/ true);
+
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, quantizationInitFloat());
+
+    layer_t *model[MODEL_SIZE];
+    buildModel(model, &lq);
+
+    const char *bitParity = getenv("BIT_PARITY");
+    if (bitParity != NULL && bitParity[0] != '\0') {
+        const char *wDir = "examples/ecg_anomaly_ae/weights";
+        if (loadStateDictFromDir(model, wDir) != 0) {
+            fprintf(stderr, "BIT_PARITY: state_dict load failed\n");
+            return 1;
+        }
+        fprintf(stdout, "BIT_PARITY: loaded state_dict from %s\n", wDir);
+    } else {
+        dataLoader_t *trainLoader = dataLoaderInit(getTrainSample, getTrainSize, BATCH, NULL, NULL,
+                                                   /*shuffle*/ true, /*shuffleSeed*/ SHUFFLE_SEED,
+                                                   /*dropLast*/ true);
+        dataLoader_t *valLoader = dataLoaderInit(getValSample, getValSize, 1, NULL, NULL,
+                                                 /*shuffle*/ false, /*shuffleSeed*/ 0,
+                                                 /*dropLast*/ true);
+
+        optimizer_t *sgd =
+            sgdMCreateOptim(LR, MOMENTUM, /*weightDecay*/ 0.0f, model, MODEL_SIZE, FLOAT32);
+
+        g_log_file = fopen("examples/ecg_anomaly_ae_v2/logs/c.json", "w");
+        if (!g_log_file) {
+            fprintf(stderr, "ERROR: cannot open log file for writing\n");
+            return 1;
+        }
+        fprintf(g_log_file,
+                "{\n"
+                "  \"impl\": \"c_v2\",\n"
+                "  \"example\": \"ecg_anomaly_ae\",\n"
+                "  \"config\": {\"epochs\": %d, \"batch\": %d, \"lr\": %.6f, "
+                "\"momentum\": %.6f, \"seed\": %d, \"shuffle_seed\": %d},\n"
+                "  \"epochs\": [\n",
+                EPOCHS, BATCH, (double)LR, (double)MOMENTUM, SEED, SHUFFLE_SEED);
+        fflush(g_log_file);
+
+        clock_gettime(CLOCK_MONOTONIC, &g_epoch_t0);
+
+        trainingRunResult_t result = trainingRun(
+            model, MODEL_SIZE,
+            (lossConfig_t){
+                .funcType = MSE, .backwardReduction = REDUCTION_MEAN, .classWeights = NULL},
+            trainLoader, valLoader, sgd, EPOCHS, calculateGradsSequential, inferenceWithLoss,
+            epochCallback);
+        (void)result;
+
+        float testLoss =
+            evaluationEpoch(model, MODEL_SIZE, MSE, testLoader, inferenceWithLoss, REDUCTION_MEAN);
+
+        fprintf(g_log_file,
+                "\n  ],\n"
+                "  \"final\": {\"test_loss\": %.6f, \"test_acc\": null, "
+                "\"test_auc\": null}\n"
+                "}\n",
+                (double)testLoss);
+        fclose(g_log_file);
+
+        fprintf(stdout, "FINAL test_loss=%.6f\n", (double)testLoss);
+    }
+
+    int status = 0;
+    int rc = writeAllReconstructions(model, MODEL_SIZE, getTestSample, getTestSize(),
+                                     "examples/ecg_anomaly_ae_v2/outputs/c_reconstructions.npy");
+    if (rc != 0) {
+        fprintf(stderr, "ERROR: c_reconstructions.npy write failed (rc=%d)\n", rc);
+        status = 1;
+    }
+
+    return status;
+}
diff --git a/examples/har_classifier/train_pytorch.py b/examples/har_classifier/train_pytorch.py
index 84df12f..07149a7 100644
--- a/examples/har_classifier/train_pytorch.py
+++ b/examples/har_classifier/train_pytorch.py
@@ -154,6 +154,31 @@ def main() -> None:
     np.save(OUTPUTS / "pytorch_predictions.npy", preds)
     print(f"FINAL test_loss={test_loss:.4f} test_acc={test_acc:.4f}", flush=True)
 
+    # Save per-layer weights for the C-side BIT_PARITY mode.
+    # The C side expects examples/har_classifier/weights/<name>.{weight,bias}.npy,
+    # where <name> is one of {conv1, conv2, conv3, fc}, in v2's buildModel() order.
+    import os
+
+    weights_dir = HERE / "weights"
+    os.makedirs(weights_dir, exist_ok=True)
+
+    layer_map = {
+        "conv1": model.conv1,
+        "conv2": model.conv2,
+        "conv3": model.conv3,
+        "fc": model.fc,
+    }
+
+    print("Saving per-layer weights:", flush=True)
+    for name, layer in layer_map.items():
+        w = layer.weight.detach().cpu().numpy().astype(np.float32)
+        np.save(weights_dir / f"{name}.weight.npy", w)
+        if layer.bias is not None:
+            b = layer.bias.detach().cpu().numpy().astype(np.float32)
+            np.save(weights_dir / f"{name}.bias.npy", b)
+        has_bias = f" + {name}.bias.npy" if layer.bias is not None else ""
+        print(f"  wrote {name}.weight.npy shape={w.shape}{has_bias}", flush=True)
+
 
 if __name__ == "__main__":
     main()
diff --git a/examples/har_classifier_v2/CMakeLists.txt b/examples/har_classifier_v2/CMakeLists.txt
new file mode 100644
index 0000000..ad72f40
--- /dev/null
+++ b/examples/har_classifier_v2/CMakeLists.txt
@@ -0,0 +1,62 @@
+add_executable(train_c_har_classifier_v2 train_c.c)
+
+target_link_libraries(train_c_har_classifier_v2 PRIVATE
+        DataLoaderApi
+        DataLoader
+        NPYLoaderApi
+        NPYLoader
+
+        Layer
+
+        Conv1dApi
+        Conv1d
+
+        LinearApi
+        Linear
+
+        ReluApi
+        Relu
+
+        FlattenApi
+        Flatten
+
+        Pool1dApi
+        MaxPool1d
+        AvgPool1d
+
+        QuantizationApi
+        Quantization
+
+        TensorApi
+        Tensor
+        Rounding
+
+        TrainingLoopApi
+        CalculateGradsSequential
+        TrainingBatchDefault
+        TrainingEpochDefault
+        Optimizer
+
+        LossFunction
+        CrossEntropy
+
+        SoftmaxApi
+        Softmax
+
+        Sgd
+        SgdApi
+
+        InferenceApi
+
+        StateDictApi
+        LayerWeightsApi
+        LayerQuant
+        LayerCommon
+        Distributions
+
+        Common
+        StorageApi
+        RNG
+
+        examples_shared
+)
diff --git a/examples/har_classifier_v2/train_c.c b/examples/har_classifier_v2/train_c.c
new file mode 100644
index 0000000..0171a80
--- /dev/null
+++ b/examples/har_classifier_v2/train_c.c
@@ -0,0 +1,387 @@
+#define SOURCE_FILE "har_classifier_v2_train_c"
+
+#include <errno.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <time.h>
+
+#include "CalculateGradsSequential.h"
+#include "Common.h"
+#include "Conv1dApi.h"
+#include "DataLoader.h"
+#include "DataLoaderApi.h"
+#include "FlattenApi.h"
+#include "InferenceApi.h"
+#include "Layer.h"
+#include "LayerCommon.h"
+#include "LayerQuant.h"
+#include "LinearApi.h"
+#include "LossFunction.h"
+#include "NPYLoaderApi.h"
+#include "Pool1dApi.h"
+#include "Quantization.h"
+#include "QuantizationApi.h"
+#include "ReluApi.h"
+#include "SgdApi.h"
+#include "SoftmaxApi.h"
+#include "StateDictApi.h"
+#include "StorageApi.h"
+#include "Tensor.h"
+#include "TensorApi.h"
+#include "TrainingLoopApi.h"
+
+#include "npy_writer.h"
+
+#define EPOCHS 20
+#define BATCH 64
+#define LR 0.01f
+#define MOMENTUM 0.9f
+#define SEED 42
+#define SHUFFLE_SEED 42
+#define NUM_CLASSES 6
+
+#define IN_CHANNELS 9
+#define LEN_INPUT 128
+
+#define C1_OUT 16
+#define C1_K 7
+#define C2_OUT 32
+#define C2_K 5
+#define C3_OUT 64
+#define C3_K 3
+
+/* 3 x (Conv1d + ReLU + Pool) + Flatten + Linear + Softmax = 12 layers */
+#define MODEL_SIZE 12
+
+static dataset_t g_trainDataset;
+static dataset_t g_valDataset;
+static dataset_t g_testDataset;
+
+static void reshapeItemsAddBatchDim(tensorArray_t *items) {
+    for (size_t i = 0; i < items->size; ++i) {
+        tensor_t *t = items->array[i];
+        size_t oldRank = t->shape->numberOfDimensions;
+        size_t newRank = oldRank + 1;
+
+        size_t *newDims = reserveMemory(newRank * sizeof(size_t));
+        size_t *newOrder = reserveMemory(newRank * sizeof(size_t));
+        newDims[0] = 1;
+        for (size_t d = 0; d < oldRank; ++d) {
+            newDims[d + 1] = t->shape->dimensions[d];
+        }
+        for (size_t d = 0; d < newRank; ++d) {
+            newOrder[d] = d;
+        }
+
+        freeReservedMemory(t->shape->dimensions);
+        freeReservedMemory(t->shape->orderOfDimensions);
+        t->shape->dimensions = newDims;
+        t->shape->orderOfDimensions = newOrder;
+        t->shape->numberOfDimensions = newRank;
+    }
+}
+
+static tensorArray_t *buildOneHotLabels(tensorArray_t *intLabels) {
+    tensorArray_t *out = reserveMemory(sizeof(tensorArray_t));
+    tensor_t **arr = reserveMemory(intLabels->size * sizeof(tensor_t *));
+    out->array = arr;
+    out->size = intLabels->size;
+
+    for (size_t i = 0; i < intLabels->size; ++i) {
+        size_t *dims = reserveMemory(1 * sizeof(size_t));
+        size_t *order = reserveMemory(1 * sizeof(size_t));
+        dims[0] = NUM_CLASSES;
+        order[0] = 0;
+        shape_t *shape = reserveMemory(sizeof(shape_t));
+        shape->dimensions = dims;
+        shape->orderOfDimensions = order;
+        shape->numberOfDimensions = 1;
+
+        quantization_t *q = quantizationInitFloat();
+        tensor_t *t = initTensor(shape, q, NULL);
+
+        int32_t cls = ((int32_t *)intLabels->array[i]->data)[0];
+        float *data = (float *)t->data;
+        for (size_t c = 0; c < NUM_CLASSES; ++c) {
+            data[c] = (c == (size_t)cls) ? 1.0f : 0.0f;
+        }
+        arr[i] = t;
+    }
+    return out;
+}
+
+static void initDataSets(void) {
+    /* Data path: reuse legacy directory; v2 doesn't duplicate the data. */
+    tensorArray_t *trainItems = npyLoad("examples/har_classifier/data/train_x.npy");
+    tensorArray_t *trainLabelsRaw = npyLoad("examples/har_classifier/data/train_y.npy");
+    reshapeItemsAddBatchDim(trainItems);
+    g_trainDataset.items = trainItems;
+    g_trainDataset.labels = buildOneHotLabels(trainLabelsRaw);
+
+    tensorArray_t *valItems = npyLoad("examples/har_classifier/data/val_x.npy");
+    tensorArray_t *valLabelsRaw = npyLoad("examples/har_classifier/data/val_y.npy");
+    reshapeItemsAddBatchDim(valItems);
+    g_valDataset.items = valItems;
+    g_valDataset.labels = buildOneHotLabels(valLabelsRaw);
+
+    tensorArray_t *testItems = npyLoad("examples/har_classifier/data/test_x.npy");
+    tensorArray_t *testLabelsRaw = npyLoad("examples/har_classifier/data/test_y.npy");
+    reshapeItemsAddBatchDim(testItems);
+    g_testDataset.items = testItems;
+    g_testDataset.labels = buildOneHotLabels(testLabelsRaw);
+}
+
+static sample_t *getTrainSample(size_t id) {
+    return npyGetSample(&g_trainDataset, id);
+}
+static sample_t *getValSample(size_t id) {
+    return npyGetSample(&g_valDataset, id);
+}
+static sample_t *getTestSample(size_t id) {
+    return npyGetSample(&g_testDataset, id);
+}
+static size_t getTrainSize(void) {
+    return g_trainDataset.items->size;
+}
+static size_t getValSize(void) {
+    return g_valDataset.items->size;
+}
+static size_t getTestSize(void) {
+    return g_testDataset.items->size;
+}
+
+static void buildModel(layer_t **model, layerQuant_t *lq) {
+    /* Block 1: Conv1d(9->16, K=7, padding=SAME), ReLU, MaxPool(K=2, S=2). */
+    model[0] = conv1dLayerInit(
+        &(conv1dInit_t){
+            .inChannels = IN_CHANNELS, .outChannels = C1_OUT, .kernelSize = C1_K, .padding = SAME},
+        lq);
+    model[1] = reluLayerInit(lq);
+    model[2] = maxPool1dLayerInit(
+        &(maxPool1dInit_t){
+            .kernelSize = 2, .stride = 2, .inputChannels = C1_OUT, .inputLength = LEN_INPUT},
+        lq);
+
+    /* Block 2 */
+    model[3] = conv1dLayerInit(
+        &(conv1dInit_t){
+            .inChannels = C1_OUT, .outChannels = C2_OUT, .kernelSize = C2_K, .padding = SAME},
+        lq);
+    model[4] = reluLayerInit(lq);
+    model[5] = maxPool1dLayerInit(
+        &(maxPool1dInit_t){
+            .kernelSize = 2, .stride = 2, .inputChannels = C2_OUT, .inputLength = LEN_INPUT / 2},
+        lq);
+
+    /* Block 3 */
+    model[6] = conv1dLayerInit(
+        &(conv1dInit_t){
+            .inChannels = C2_OUT, .outChannels = C3_OUT, .kernelSize = C3_K, .padding = SAME},
+        lq);
+    model[7] = reluLayerInit(lq);
+    model[8] = avgPool1dLayerInit(
+        &(avgPool1dInit_t){.kernelSize = LEN_INPUT / 4, .stride = LEN_INPUT / 4}, lq);
+
+    /* Head */
+    model[9] = flattenLayerInit();
+    model[10] =
+        linearLayerInit(&(linearInit_t){.inFeatures = C3_OUT, .outFeatures = NUM_CLASSES}, lq);
+    model[11] = softmaxLayerInit(lq);
+}
+
+/* Load PyTorch state_dict from per-layer .npy files written by
+ * examples/har_classifier/train_pytorch.py at the end of training.
+ *
+ * Returns 0 on success, non-zero on first missing file. */
+static int loadStateDictFromDir(layer_t **model, const char *weightsDir) {
+    /* Param layer order in model[]: model[0] conv1, model[3] conv2,
+     * model[6] conv3, model[10] fc. 4 entries. */
+    char wPath[256], bPath[256];
+    const char *names[4] = {"conv1", "conv2", "conv3", "fc"};
+    tensor_t *w[4] = {0};
+    tensor_t *b[4] = {0};
+
+    for (int i = 0; i < 4; i++) {
+        snprintf(wPath, sizeof(wPath), "%s/%s.weight.npy", weightsDir, names[i]);
+        snprintf(bPath, sizeof(bPath), "%s/%s.bias.npy", weightsDir, names[i]);
+        tensorArray_t *wArr = npyLoad(wPath);
+        tensorArray_t *bArr = npyLoad(bPath);
+        if (wArr == NULL || bArr == NULL) {
+            fprintf(stderr, "loadStateDictFromDir: missing %s or %s\n", wPath, bPath);
+            return 1;
+        }
+        w[i] = wArr->array[0];
+        b[i] = bArr->array[0];
+    }
+
+    modelLoadStateDict(
+        model, MODEL_SIZE,
+        (stateDictEntry_t[]){
+            {.name = names[0], .weightData = (float *)w[0]->data, .biasData = (float *)b[0]->data},
+            {.name = names[1], .weightData = (float *)w[1]->data, .biasData = (float *)b[1]->data},
+            {.name = names[2], .weightData = (float *)w[2]->data, .biasData = (float *)b[2]->data},
+            {.name = names[3], .weightData = (float *)w[3]->data, .biasData = (float *)b[3]->data},
+        },
+        4);
+    return 0;
+}
+
+static FILE *g_log_file = NULL;
+static int g_first_epoch = 1;
+static struct timespec g_epoch_t0;
+
+static void epochCallback(size_t epoch, float trainLoss, epochStats_t evalStats) {
+    struct timespec t1;
+    clock_gettime(CLOCK_MONOTONIC, &t1);
+    double wall_s =
+        (double)(t1.tv_sec - g_epoch_t0.tv_sec) + (double)(t1.tv_nsec - g_epoch_t0.tv_nsec) * 1e-9;
+
+    if (!g_first_epoch) {
+        fprintf(g_log_file, ",\n");
+    }
+    fprintf(g_log_file,
+            "    {\"epoch\": %zu, \"step_losses\": [], \"train_loss\": %.6f, "
+            "\"val_loss\": %.6f, \"val_acc\": %.6f, \"wall_s\": %.4f}",
+            epoch, (double)trainLoss, (double)evalStats.loss, (double)evalStats.accuracy, wall_s);
+    fflush(g_log_file);
+    g_first_epoch = 0;
+
+    fprintf(stdout, "epoch %zu: train_loss=%.4f val_loss=%.4f val_acc=%.4f wall_s=%.2f\n", epoch,
+            (double)trainLoss, (double)evalStats.loss, (double)evalStats.accuracy, wall_s);
+    fflush(stdout);
+
+    clock_gettime(CLOCK_MONOTONIC, &g_epoch_t0);
+}
+
+static int ensureDir(const char *p) {
+    if (mkdir(p, S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH) == 0) {
+        return 0;
+    }
+    if (errno == EEXIST) {
+        return 0;
+    }
+    fprintf(stderr, "ERROR: cannot create %s: %s\n", p, strerror(errno));
+    return 1;
+}
+
+int main(void) {
+    if (ensureDir("examples/har_classifier_v2/logs") != 0) {
+        return 1;
+    }
+    if (ensureDir("examples/har_classifier_v2/outputs") != 0) {
+        return 1;
+    }
+
+    initDataSets();
+
+    dataLoader_t *testLoader = dataLoaderInit(getTestSample, getTestSize, 1, NULL, NULL,
+                                              /*shuffle*/ false, /*shuffleSeed*/ 0,
+                                              /*dropLast*/ true);
+
+    layerQuant_t lq;
+    layerQuantInitUniform(&lq, quantizationInitFloat());
+
+    layer_t *model[MODEL_SIZE];
+    buildModel(model, &lq);
+
+    const char *bitParity = getenv("BIT_PARITY");
+    if (bitParity != NULL && bitParity[0] != '\0') {
+        /* Bit-parity mode: load PyTorch state_dict, skip training, run inference. */
+        const char *wDir = "examples/har_classifier/weights";
+        if (loadStateDictFromDir(model, wDir) != 0) {
+            fprintf(stderr, "BIT_PARITY: state_dict load failed\n");
+            return 1;
+        }
+        fprintf(stdout, "BIT_PARITY: loaded state_dict from %s\n", wDir);
+    } else {
+        dataLoader_t *trainLoader = dataLoaderInit(getTrainSample, getTrainSize, BATCH, NULL, NULL,
+                                                   /*shuffle*/ true, /*shuffleSeed*/ SHUFFLE_SEED,
+                                                   /*dropLast*/ true);
+        dataLoader_t *valLoader = dataLoaderInit(getValSample, getValSize, 1, NULL, NULL,
+                                                 /*shuffle*/ false, /*shuffleSeed*/ 0,
+                                                 /*dropLast*/ true);
+
+        optimizer_t *sgd =
+            sgdMCreateOptim(LR, MOMENTUM, /*weightDecay*/ 0.0f, model, MODEL_SIZE, FLOAT32);
+
+        g_log_file = fopen("examples/har_classifier_v2/logs/c.json", "w");
+        if (!g_log_file) {
+            fprintf(stderr, "ERROR: cannot open log file for writing\n");
+            return 1;
+        }
+        fprintf(g_log_file,
+                "{\n"
+                "  \"impl\": \"c_v2\",\n"
+                "  \"example\": \"har_classifier\",\n"
+                "  \"config\": {\"epochs\": %d, \"batch\": %d, \"lr\": %.6f, "
+                "\"momentum\": %.6f, \"seed\": %d, \"shuffle_seed\": %d},\n"
+                "  \"epochs\": [\n",
+                EPOCHS, BATCH, (double)LR, (double)MOMENTUM, SEED, SHUFFLE_SEED);
+        fflush(g_log_file);
+
+        clock_gettime(CLOCK_MONOTONIC, &g_epoch_t0);
+
+        trainingRunResult_t result =
+            trainingRun(model, MODEL_SIZE,
+                        (lossConfig_t){.funcType = CROSS_ENTROPY,
+                                       .backwardReduction = REDUCTION_MEAN,
+                                       .classWeights = NULL},
+                        trainLoader, valLoader, sgd, EPOCHS, calculateGradsSequential,
+                        inferenceWithLoss, epochCallback);
+        (void)result;
+
+        epochStats_t testStats = evaluationEpochWithMetrics(
+            model, MODEL_SIZE, CROSS_ENTROPY, testLoader, inferenceWithLoss, REDUCTION_MEAN);
+
+        fprintf(g_log_file,
+                "\n  ],\n"
+                "  \"final\": {\"test_loss\": %.6f, \"test_acc\": %.6f, "
+                "\"test_auc\": null}\n"
+                "}\n",
+                (double)testStats.loss, (double)testStats.accuracy);
+        fclose(g_log_file);
+
+        fprintf(stdout, "FINAL test_loss=%.4f test_acc=%.4f\n", (double)testStats.loss,
+                (double)testStats.accuracy);
+    }
+
+    /* Predictions on test set (both modes). */
+    size_t numTest = getTestSize();
+    int32_t *predictions = malloc(numTest * sizeof(int32_t));
+    if (!predictions) {
+            fprintf(stderr, "ERROR: out of memory allocating predictions\n");
+        return 1;
+    }
+
+    for (size_t i = 0; i < numTest; ++i) {
+        sample_t *s = getTestSample(i);
+        tensor_t *out = inference(model, MODEL_SIZE, s->item);
+        float *probs = (float *)out->data;
+        size_t argmax = 0;
+        float best = probs[0];
+        for (size_t c = 1; c < NUM_CLASSES; ++c) {
+            if (probs[c] > best) {
+                best = probs[c];
+                argmax = c;
+            }
+        }
+        predictions[i] = (int32_t)argmax;
+        freeTensor(out);
+        freeSample(s);
+    }
+
+    size_t outShape[] = {numTest};
+    int status = 0;
+    int rc = npyWriteInt32("examples/har_classifier_v2/outputs/c_predictions.npy", predictions,
+                           outShape, 1);
+    if (rc != 0) {
+        fprintf(stderr, "ERROR: npyWriteInt32 failed (rc=%d)\n", rc);
+        status = 1;
+    }
+    free(predictions);
+
+    return status;
+}

From 979826067311739fac60d0d17f77c311f80806bb Mon Sep 17 00:00:00 2001
From: Leo Buron <leo.buron@uni-due.de>
Date: Fri, 15 May 2026 22:23:31 +0200
Subject: [PATCH 4/4] =?UTF-8?q?ci:=20add=20bit-parity=20job=20=E2=80=94=20?=
 =?UTF-8?q?HAR=20and=20ECG=20v2=20binaries=20diff'd=20against=20PyTorch=20?=
 =?UTF-8?q?reference?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

New job c-bit-parity runs in parallel with c-build-and-test. Steps:
  1. Prepares the HAR and ECG datasets and trains both PyTorch
     references (emits pytorch_*.npy outputs plus per-layer weights)
  2. Builds the two v2 binaries via cmake --preset examples
  3. Runs both v2 binaries with BIT_PARITY=1 (loads state_dict via
     modelLoadStateDict, skips training, writes inference outputs)
  4. Runs examples/_shared/compare_predictions.py via uv for each
     example: exact match for the HAR int32 predictions, allclose
     (rtol=1e-4, atol=1e-5) for the ECG float32 reconstructions

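For reference, the float32 criterion is NumPy's allclose rule applied
elementwise: |actual - reference| <= atol + rtol * |reference|, where
"reference" is whichever array is passed second. The sketch below uses
a hypothetical helper, within_tolerance (not part of
compare_predictions.py), and made-up values to spell the rule out. It
also shows that the test is asymmetric, so which array is treated as
the reference can matter near the boundary.

```python
import numpy as np


def within_tolerance(actual: np.ndarray, reference: np.ndarray,
                     rtol: float = 1e-4, atol: float = 1e-5) -> np.ndarray:
    # Elementwise form of the np.allclose test; the tolerance scales with |reference|.
    return np.abs(actual - reference) <= atol + rtol * np.abs(reference)


# Made-up values, not real ECG reconstructions.
reference = np.array([1.0, 100.0, 0.0], dtype=np.float32)
actual = np.array([1.00005, 100.009, 2e-5], dtype=np.float32)

mask = within_tolerance(actual, reference)
print(mask.tolist())  # last element fails: 2e-5 exceeds atol where reference == 0
print(np.allclose(actual, reference, rtol=1e-4, atol=1e-5))  # False: one element fails

# Asymmetry: only the second argument scales the tolerance.
print(np.allclose(1.0, 1.105, rtol=0.1, atol=0.0))  # 0.105 <= 0.1 * 1.105
print(np.allclose(1.105, 1.0, rtol=0.1, atol=0.0))  # 0.105 >  0.1 * 1.0
```
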
The job fails CI if, given the same weights, the new factories produce
inference outputs that differ from PyTorch's (any mismatch for HAR,
beyond tolerance for ECG), so factory-wiring regressions are caught
immediately.

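The pass/fail contract can be condensed into an in-memory sketch.
parity_exit_code is a hypothetical helper (not part of
compare_predictions.py) that follows the same shape, exact-match and
tolerance checks and returns the same 0/1 codes, here with the PyTorch
reference passed as np.allclose's second argument. The arrays are
made-up examples, not real HAR or ECG outputs: a single flipped HAR
prediction is enough to fail, while tiny float drift in an ECG
reconstruction passes.

```python
import numpy as np


def parity_exit_code(reference: np.ndarray, candidate: np.ndarray, dtype: str,
                     rtol: float = 1e-4, atol: float = 1e-5) -> int:
    """Hypothetical in-memory mirror of the CI comparison: 0 = pass, 1 = fail."""
    if reference.shape != candidate.shape:
        return 1
    if dtype == "int32":
        # Class predictions must match exactly.
        return 0 if np.array_equal(reference, candidate) else 1
    # Float outputs must agree within tolerance, scaled by the reference values.
    return 0 if np.allclose(candidate, reference, rtol=rtol, atol=atol) else 1


har_ref = np.array([0, 3, 5, 1], dtype=np.int32)
har_regressed = har_ref.copy()
har_regressed[2] = 4  # one flipped prediction

ecg_ref = np.linspace(-1.0, 1.0, 8, dtype=np.float32)
ecg_drift = ecg_ref + np.float32(1e-6)  # drift well inside atol

print(parity_exit_code(har_ref, har_ref, "int32"))
print(parity_exit_code(har_ref, har_regressed, "int32"))
print(parity_exit_code(ecg_ref, ecg_drift, "float32"))
```
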
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10
---
 .github/workflows/ci.yml                | 60 +++++++++++++++++++++++
 examples/_shared/compare_predictions.py | 63 +++++++++++++++++++++++++
 2 files changed, 123 insertions(+)
 create mode 100644 examples/_shared/compare_predictions.py

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 4f42cb7..ec50d4c 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -103,6 +103,66 @@ jobs:
       - name: Test
         run: ctest --preset unit_test_asan
 
+  c-bit-parity:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install dependencies
+        run: sudo apt-get update && sudo apt-get install -y cmake ninja-build gcc
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+
+      - name: Set up Python
+        run: uv python install 3.12
+
+      - name: Sync Python deps
+        run: uv sync
+
+      - name: Prepare HAR data
+        run: uv run examples/har_classifier/prepare_data.py
+
+      - name: Prepare ECG data
+        run: uv run examples/ecg_anomaly_ae/prepare_data.py
+
+      - name: Train PyTorch HAR (produces reference predictions + weights)
+        run: uv run examples/har_classifier/train_pytorch.py
+
+      - name: Train PyTorch ECG (produces reference reconstructions + weights)
+        run: uv run examples/ecg_anomaly_ae/train_pytorch.py
+
+      - name: Configure
+        run: cmake --preset examples
+
+      - name: Build v2 binaries
+        run: |
+          cmake --build --preset examples --target train_c_har_classifier_v2
+          cmake --build --preset examples --target train_c_ecg_anomaly_ae_v2
+
+      - name: Run HAR v2 in BIT_PARITY mode
+        run: BIT_PARITY=1 build/examples/examples/har_classifier_v2/train_c_har_classifier_v2
+
+      - name: Run ECG v2 in BIT_PARITY mode
+        run: BIT_PARITY=1 build/examples/examples/ecg_anomaly_ae_v2/train_c_ecg_anomaly_ae_v2
+
+      - name: Diff HAR predictions (int32, exact match required)
+        run: |
+          uv run examples/_shared/compare_predictions.py \
+            --pytorch examples/har_classifier/outputs/pytorch_predictions.npy \
+            --c examples/har_classifier_v2/outputs/c_predictions.npy \
+            --dtype int32
+
+      - name: Diff ECG reconstructions (float32, allclose)
+        run: |
+          uv run examples/_shared/compare_predictions.py \
+            --pytorch examples/ecg_anomaly_ae/outputs/pytorch_reconstructions.npy \
+            --c examples/ecg_anomaly_ae_v2/outputs/c_reconstructions.npy \
+            --dtype float32 \
+            --rtol 1e-4 \
+            --atol 1e-5
+
   python-test:
     runs-on: ubuntu-latest
 
diff --git a/examples/_shared/compare_predictions.py b/examples/_shared/compare_predictions.py
new file mode 100644
index 0000000..50e797e
--- /dev/null
+++ b/examples/_shared/compare_predictions.py
@@ -0,0 +1,63 @@
+"""Compare C-side predictions/reconstructions against PyTorch reference outputs.
+
+Used by the bit-parity CI step. Exits 0 on match, 1 on mismatch.
+
+Usage:
+    uv run examples/_shared/compare_predictions.py \\
+        --pytorch <path-to-pytorch.npy> \\
+        --c <path-to-c.npy> \\
+        --dtype {int32,float32} \\
+        [--rtol 1e-4] [--atol 1e-5]
+"""
+
+import argparse
+import sys
+import numpy as np
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--pytorch", required=True, help="PyTorch reference .npy")
+    parser.add_argument("--c", required=True, help="C-side .npy")
+    parser.add_argument("--dtype", required=True, choices=["int32", "float32"])
+    parser.add_argument("--rtol", type=float, default=1e-4)
+    parser.add_argument("--atol", type=float, default=1e-5)
+    args = parser.parse_args()
+
+    py = np.load(args.pytorch)
+    c = np.load(args.c)
+
+    if py.shape != c.shape:
+        print(f"FAIL: shape mismatch — pytorch={py.shape}, c={c.shape}", file=sys.stderr)
+        return 1
+
+    if args.dtype == "int32":
+        if not np.array_equal(py, c):
+            mismatches = np.flatnonzero(py != c)
+            print(f"FAIL: int32 mismatch at {mismatches.size}/{py.size} positions",
+                  file=sys.stderr)
+            for idx in mismatches[:5]:
+                print(f"  idx={idx}: pytorch={py.flat[idx]}, c={c.flat[idx]}", file=sys.stderr)
+            return 1
+        print(f"PASS: int32 arrays match exactly ({py.size} elements)")
+        return 0
+
+    # float32
+    if not np.allclose(c, py, rtol=args.rtol, atol=args.atol):  # tolerance scales with |py|
+        diffs = np.abs(py - c)
+        max_diff = diffs.max()
+        rel_diffs = diffs / (np.abs(py) + args.atol)
+        max_rel = rel_diffs.max()
+        print(f"FAIL: float32 mismatch — max_abs={max_diff:.6e}, "
+              f"max_rel={max_rel:.6e}, rtol={args.rtol}, atol={args.atol}", file=sys.stderr)
+        worst = np.argmax(diffs)
+        print(f"  worst idx={worst}: pytorch={py.flat[worst]:.6e}, c={c.flat[worst]:.6e}",
+              file=sys.stderr)
+        return 1
+    print(f"PASS: float32 arrays close (rtol={args.rtol}, atol={args.atol}, "
+          f"{py.size} elements)")
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())