bus_fare_spending dropped between LCFS imputation and the published enhanced dataset #430

@vahid-ahmadi

Symptom

bus_fare_spending is imputed (#428) but is absent from the published enhanced dataset (e.g. enhanced_frs_2023_24.h5, release 1.56.3). Only bus_subsidy_spending is present. Every other consumption output (transport_consumption, petrol_spending, diesel_spending, all 12 COICOP divisions) is present.
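The symptom can be restated as a set difference between the outputs expected from the imputation and the variables actually present in the published file. The sketch below is illustrative only: `missing_outputs` is a hypothetical helper, `published_keys` is a hand-written list mirroring the description above rather than a real read of enhanced_frs_2023_24.h5, and the 12 COICOP divisions are left out because their exact variable names are not given here.

```python
from __future__ import annotations

from typing import Iterable

# Variable names taken from this issue; the COICOP division names are omitted
# because they are not spelled out in the issue text.
EXPECTED_OUTPUTS = [
    "transport_consumption",
    "petrol_spending",
    "diesel_spending",
    "bus_subsidy_spending",
    "bus_fare_spending",
]


def missing_outputs(
    present: Iterable[str], expected: Iterable[str] = EXPECTED_OUTPUTS
) -> list[str]:
    """Return the expected variable names that are not in `present`, in order."""
    present_set = set(present)
    return [name for name in expected if name not in present_set]


# Hypothetical key listing that mirrors the reported symptom (not read from disk).
published_keys = [
    "transport_consumption",
    "petrol_spending",
    "diesel_spending",
    "bus_subsidy_spending",
]

print(missing_outputs(published_keys))  # ['bus_fare_spending']
```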

What's been ruled out

  • Version: the 1.56.3 build log shows policyengine-uk==2.89.1 installed, the version that defines the bus_fare_spending input variable (#1780). The version pin is therefore not the cause: #429 ("Bump policyengine-uk to >=2.89.1 so bus_fare_spending lands in the dataset") was applied.
  • Model cache: the consumption QRF model is not downloaded as a prerequisite and is retrained from IMPUTATIONS (which includes bus_fare_spending).
  • generate_lcfs_table: unit-tested (test_lcfs_consumption_ingestion) — it correctly computes the bus_fare_spending column from the COICOP 7.3.2 codes. ✅

Where it must be

Because the training table has the column but the final dataset does not, the drop is downstream of generate_lcfs_table: either in the QRF train/predict step or in the enhanced-dataset assembly/save (note that the dataset is cloned and calibrated; clone_index is present). No test currently checks that the column survives into the saved dataset.
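One way to narrow this down is to record the column names after each stage and report the first stage whose output lacks the column. The following is a minimal sketch under stated assumptions: `first_stage_missing` is a hypothetical helper, the stage labels other than generate_lcfs_table are invented names for the steps mentioned above, and the snapshot contents are made up for illustration. They do not indicate where the drop actually occurs.

```python
from __future__ import annotations

from typing import Iterable, Optional


def first_stage_missing(
    column: str, snapshots: Iterable[tuple[str, Iterable[str]]]
) -> Optional[str]:
    """Return the name of the first stage whose columns lack `column`.

    `snapshots` is an ordered sequence of (stage_name, column_names) pairs.
    Returns None if the column is present at every stage.
    """
    for stage, columns in snapshots:
        if column not in set(columns):
            return stage
    return None


# Illustrative data only: stage labels and column lists are assumptions.
snapshots = [
    ("generate_lcfs_table", ["bus_fare_spending", "petrol_spending"]),
    ("qrf_train_predict", ["bus_fare_spending", "petrol_spending"]),
    ("assemble_and_save", ["petrol_spending"]),
]

print(first_stage_missing("bus_fare_spending", snapshots))
```

In practice the snapshots would come from logging the column names at each real pipeline boundary, which keeps the check independent of how each stage is implemented.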

Plan

  1. Add an end-to-end regression test asserting the enhanced dataset contains bus_fare_spending (xfail until fixed). ← this PR
  2. Trace the exact downstream stage and fix it.
  3. Flip the xfail to a hard assertion.
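Step 1 could look roughly like the sketch below. It assumes pytest is the test runner. `load_enhanced_dataset_variables` is a hypothetical stand-in that returns a hard-coded set mirroring the symptom, where a real test would load the built enhanced dataset. The test name is also an assumption.

```python
from __future__ import annotations

import pytest


def load_enhanced_dataset_variables() -> set[str]:
    """Hypothetical stand-in for reading variable names from the built dataset.

    Hard-coded here to mirror the reported symptom; a real test would load
    the enhanced dataset and return its variable names instead.
    """
    return {
        "transport_consumption",
        "petrol_spending",
        "diesel_spending",
        "bus_subsidy_spending",
    }


@pytest.mark.xfail(
    reason="bus_fare_spending is dropped downstream of generate_lcfs_table (#430)",
    strict=True,  # an unexpected pass fails the run, prompting step 3
)
def test_enhanced_dataset_contains_bus_fare_spending():
    assert "bus_fare_spending" in load_enhanced_dataset_variables()
```

With `strict=True`, the test is reported as a failure once the fix lands and the assertion starts passing, which is the signal to remove the marker and keep the plain assertion.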
