From the PR #6 silent-failure review: variables absent from entire cycles arrive as plain untagged NA (bind_rows fills missing columns), not NA(c). Consequences:
- Table 1a's NA-type rows cannot see them — the cycle-absent mass appears in no category row and no NA row while still inflating denominators (the "percents may not sum to 100" footnote masks the symptom). With the new auxiliaries this is now material: ALCDTTM, GEN_02B, CCC_280 are absent from 2001, and the 2022 smoking block is absent entirely.
- The tagged-NA convention ("every NA carries a/b/c") is aspirational rather than enforced, so downstream code cannot distinguish "not asked" from data corruption.
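
The symptom can be reproduced on toy data. This is a minimal sketch, assuming dplyr and haven are installed; the two small frames and their values are invented for illustration, and only the variable name GEN_02B and its absence from 2001 come from the review.

```r
library(dplyr)
library(haven)

# Toy data, for illustration only: GEN_02B exists in the 2003 frame
# and is missing from the 2001 frame.
cycle_2001 <- tibble(cycle = "2001", age = c(34, 51))
cycle_2003 <- tibble(
  cycle = "2003",
  age = c(29, 62),
  GEN_02B = c(1, tagged_na("a"))
)

# bind_rows() fills the column that 2001 lacks with plain NA.
combined <- bind_rows(cycle_2001, cycle_2003)

# is.na() cannot tell the two kinds of missingness apart.
is.na(combined$GEN_02B)

# Only the value tagged at source registers as a tagged NA.
is_tagged_na(combined$GEN_02B)

# The 2001 rows carry no tag, so they land in none of NA(a/b/c).
na_tag(combined$GEN_02B)
```

Any table that builds its NA rows from `na_tag()` would count the 2003 NA(a) but silently skip both 2001 rows.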
Proposed fix (DemPoRT's `fix_na_c()` pattern, worksheet-driven): after bind_rows in `load_study_data()`, for each variable × cycle where the variable's `databaseStart` excludes that cycle, convert plain NA to `haven::tagged_na("c")` (numeric) / the `"NA(c)"` level (factor). Then add the row-accounting invariant to the descriptive engine: category n's + NA(a/b/c) n's = stratum n, per variable.
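
One way the conversion and the invariant could look is sketched below. This is not DemPoRT's implementation: `fix_na_c_sketch()` and `rows_accounted_for()` are hypothetical names, and the sketch assumes a worksheet with a `variable` column and a comma-separated `databaseStart` column whose labels match a `cycle` column in the data. It handles plain double, integer, and factor columns only.

```r
library(haven)

# Hypothetical helper: tag plain NA as NA(c) wherever the worksheet says
# the variable was not collected in that row's cycle.
fix_na_c_sketch <- function(data, worksheet, cycle_col = "cycle") {
  for (i in seq_len(nrow(worksheet))) {
    var <- worksheet$variable[i]
    if (!var %in% names(data)) next

    # Cycles in which the variable exists (assumed comma-separated).
    covered <- trimws(strsplit(worksheet$databaseStart[i], ",", fixed = TRUE)[[1]])
    absent <- !(data[[cycle_col]] %in% covered)

    x <- data[[var]]
    if (is.factor(x)) {
      idx <- absent & is.na(x)
      if (any(idx)) {
        levels(x) <- union(levels(x), "NA(c)")
        x[idx] <- "NA(c)"
      }
    } else if (is.numeric(x)) {
      # Tagged NAs only exist for doubles; keep attributes while converting.
      if (!is.double(x)) storage.mode(x) <- "double"
      idx <- absent & is.na(x) & !is_tagged_na(x)
      x[idx] <- tagged_na("c")
    }
    data[[var]] <- x
  }
  data
}

# Hypothetical invariant check for one variable within one stratum:
# category n's + NA(a/b/c) n's must equal the stratum n.
rows_accounted_for <- function(x) {
  if (is.factor(x)) {
    # Categories and "NA(a/b/c)" levels are all non-NA factor values.
    n_accounted <- sum(!is.na(x))
  } else {
    tags <- na_tag(as.double(x))
    n_accounted <- sum(!is.na(x)) + sum(tags %in% c("a", "b", "c"))
  }
  n_accounted == length(x)
}

# Usage on toy data (values invented for illustration).
worksheet <- data.frame(
  variable = "GEN_02B",
  databaseStart = "2003, 2005",
  stringsAsFactors = FALSE
)
toy <- data.frame(cycle = c("2001", "2001", "2003"), GEN_02B = c(NA, NA, 1))
fixed <- fix_na_c_sketch(toy, worksheet)

na_tag(fixed$GEN_02B)
rows_accounted_for(toy$GEN_02B)
rows_accounted_for(fixed$GEN_02B)
```

A plain NA in a cycle that the worksheet does cover is deliberately left untagged, so the invariant still fails for genuinely unexplained missingness.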
Related: `prepare_for_mice()` currently treats plain NA as where=FALSE (not imputed), which is the correct behaviour but only by accident. After fix_na_c the convention would be enforced rather than incidental, and a strict mode could stop on any untagged NA that remains.
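
A strict mode along those lines might be as small as the check below. This is a sketch under assumptions: `assert_no_untagged_na()` is a hypothetical name, it is not part of `prepare_for_mice()`, and it treats any plain NA in a non-double column (for example a factor without an "NA(c)" level) as untagged.

```r
library(haven)

# Hypothetical strict-mode guard: stop if any column still holds an
# untagged NA after the NA(c) conversion has run.
assert_no_untagged_na <- function(data) {
  untagged <- vapply(data, function(x) {
    if (is.double(x)) {
      sum(is.na(x) & !is_tagged_na(x))
    } else {
      sum(is.na(x))
    }
  }, integer(1))

  bad <- untagged[untagged > 0]
  if (length(bad) > 0) {
    stop(
      "Untagged NA found in: ",
      paste0(names(bad), " (n = ", bad, ")", collapse = ", "),
      call. = FALSE
    )
  }
  invisible(data)
}

# Usage on toy data (values invented for illustration).
ok <- data.frame(x = c(1, tagged_na("c")), g = factor(c("yes", "NA(c)")))
assert_no_untagged_na(ok)

broken <- data.frame(x = c(1, NA))
result <- tryCatch(
  assert_no_untagged_na(broken),
  error = function(e) conditionMessage(e)
)
result
```

Reporting the per-variable count in the error message makes it easier to tell a missed worksheet entry from a handful of corrupted rows.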