Pipeline hardening: silent-failure findings from PR #2 review (cleaning/imputation phase)

Findings from the pre-merge review of #2 (silent-failure audit + code review). These are the hardening tasks for the cleaning/imputation phase. Key framing: in a {targets} pipeline, a warning during a successful build is shown once and never again — the target caches as "up to date" — so warn-and-continue is effectively silent.

**Critical**
1. `load_study_data()` warn-and-skips missing cycle files with no count reconciliation — 10 of 11 cycles loading builds a "successful" `study_data` on a truncated period range. Add a loaded-vs-configured check that stops on shortfall.
2. No verification that `rec_with_table()` delivered the expected columns per cycle; `bind_rows()` fills silently-skipped derived variables with NA. This is the exact upstream failure class found during v3 integration (feeder gaps → all-NA). Add a per-cycle expected-columns check and an all-NA-by-cycle audit for critical-role variables.
3. `prepare_for_mice()`'s claim that tagged NA(a)/NA(c) are protected is false for numeric variables: `tagged_na()` is `NA` to `mice()`, so structural missingness gets imputed (e.g. quit times invented for never-smokers). Build an explicit `where` matrix (TRUE only for NA(b) cells), restore tags on write-back, and drop `"NA(a)"`/`"NA(c)"` factor levels from the modelling frame. Also check `mice_result$loggedEvents` (silently dropped/collinear variables) — currently suppressed by `printFlag = FALSE`.
4. APC builders and `fit_binomial_apc()` accept zero-event numerators — an all-NA `age_first_cigarette` yields a converged model encoding "nobody ever started smoking". Stop on empty numerator and `sum(d) == 0`.

**Important**
5. `validate_cycle_coverage()` defaults to warn-only for critical gaps (`strict_validation: false` in all profiles), downgrades to a warning when `cfg$cchs_cycles` is NULL, and exempts DerivedVar/empty-`variableStart` rows (exactly the variables `rec_with_table` silently drops). Make strict the default; surface skipped variables; consume the gap report somewhere.
6. `interior_knots()` silently drops out-of-range knots, and `fit_apc_model()` stores the *configured* (not used) knots in `attr(fit, "knots")` — Stage 9 basis reconstruction will mismatch the coefficients. Store actually-used knots; message on drops.
7. No `fit$converged` / `fit$boundary` check after `glm()` despite the docstring anticipating boundary failures.
8. Age restriction is skipped without any message when the age column is absent (split the condition: config-NULL = skip, missing column = stop).
9. Sex subsetting via `data[[sex_col]] == 1` injects all-NA phantom rows when sex is NA and gives empty strata on coding mismatch; use `which()` and assert non-empty strata.
10. Assign `imputation-predictor` roles in `cshm-variables.csv` — no variable currently carries the role, so Stage 5 selects zero variables and silently no-ops.
11. Smaller: `expand_denominator()` drops NA-cohort persons via bare `next` (count them); `check_skewness()` makes all-NA variables vanish from the summary; `select_vars_by_role(...)[1]` yields NA when a role is unassigned (assert); type-conversion loop in `load_study_data()` doesn't guard `as.numeric` coercion NAs.

Also from review, design notes: Table 1 code selects by `predictor`/`model-stratifier` roles while CLAUDE.md documents `table1`/`table1-stratifier` as the wiring — reconcile code or docs. Initiation numerator periods are not clamped to `[period_min, period_max]` while the denominator is — confirm against the intended Holford construction.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pipeline hardening: silent-failure findings from PR #2 review (cleaning/imputation phase) #3

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Pipeline hardening: silent-failure findings from PR #2 review (cleaning/imputation phase) #3

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions