From the pre-merge test-coverage review of #2. The suite (43 passing) covers derive_survey_year exemplarily and descriptive tables well, but the highest-risk machinery is untested. Top three, in order:
- Golden-record test for
build_initiation_data() / build_cessation_data() — 3-5 hand-crafted respondents with exact expected output rows (numerator age/period/event, denominator person-year windows, category scope: SMKDSTY 3 and 5 excluded from cessation). Catches off-by-one in at-risk windows, broken period = cohort + age identity, and the category-misclassification class that bit us twice in cchsflow v3. The existing cessation-scope test admits in its own body that it doesn't test the contract its name promises.
validate_cycle_coverage() unit tests — synthetic worksheet fixtures: critical-role gap warns / strict stops; declared-but-undeliverable cycle lands in declared; comma-separated multi-role strings recognized as critical (the parsing path whose regression would silently disable the guard).
- Tagged-NA preservation tests for
truncate_continuous() and prepare_for_mice() — vector with tagged_na("a"/"b"/"c") + over/under-cap values; assert tags survive truncation (haven::na_tag() round-trip) and only the "b" tag becomes plain NA for MICE (numeric and factor branches). The project's missing-data convention rests on these two functions.
Nice-to-have: fit_binomial_apc() cell-aggregation fixture; survey_var/survey_bound unit tests (every test depends on them); check_skewness threshold cases; factor-typed smoking_status/sex in helper-apc.R to match load_study_data() output types (tests currently exercise numeric, production produces factors). Test-quality: fix the tautological third assertion in select_vars_by_role returns correct variables (compares output to a subset defined by the output); pin config::get(config = "default") in tests so R_CONFIG_ACTIVE in a collaborator's environment doesn't change test bounds.
From the pre-merge test-coverage review of #2. The suite (43 passing) covers
derive_survey_yearexemplarily and descriptive tables well, but the highest-risk machinery is untested. Top three, in order:build_initiation_data()/build_cessation_data()— 3-5 hand-crafted respondents with exact expected output rows (numerator age/period/event, denominator person-year windows, category scope: SMKDSTY 3 and 5 excluded from cessation). Catches off-by-one in at-risk windows, brokenperiod = cohort + ageidentity, and the category-misclassification class that bit us twice in cchsflow v3. The existing cessation-scope test admits in its own body that it doesn't test the contract its name promises.validate_cycle_coverage()unit tests — synthetic worksheet fixtures: critical-role gap warns / strict stops; declared-but-undeliverable cycle lands indeclared; comma-separated multi-role strings recognized as critical (the parsing path whose regression would silently disable the guard).truncate_continuous()andprepare_for_mice()— vector withtagged_na("a"/"b"/"c")+ over/under-cap values; assert tags survive truncation (haven::na_tag()round-trip) and only the "b" tag becomes plain NA for MICE (numeric and factor branches). The project's missing-data convention rests on these two functions.Nice-to-have:
fit_binomial_apc()cell-aggregation fixture;survey_var/survey_boundunit tests (every test depends on them);check_skewnessthreshold cases; factor-typedsmoking_status/sexinhelper-apc.Rto matchload_study_data()output types (tests currently exercise numeric, production produces factors). Test-quality: fix the tautological third assertion inselect_vars_by_role returns correct variables(compares output to a subset defined by the output); pinconfig::get(config = "default")in tests soR_CONFIG_ACTIVEin a collaborator's environment doesn't change test bounds.