
Add end-to-end regression test for bus_fare_spending in the dataset #431

Open
vahid-ahmadi wants to merge 3 commits into main from add-bus-fare-dataset-regression-test

@vahid-ahmadi

Collaborator

What

Adds an end-to-end regression test asserting the enhanced dataset actually contains a populated bus_fare_spending column. Marked xfail (see #430).

Why

bus_fare_spending is imputed (#428) but missing from the published dataset (1.56.3), even though the build installs policyengine-uk==2.89.1 (the version that defines the variable) and every other consumption output lands. The existing unit test only covers generate_lcfs_table, which computes the column correctly; nothing checked that the column survives the QRF predict step and the enhanced-dataset assembly/save. This PR adds that missing coverage.

How it pins the stage

  • test_lcfs_consumption_ingestion (existing) passes → the column is built correctly in generate_lcfs_table.
  • This test xfails → the column is absent from the final dataset.

Together they localize the bug to a stage downstream of generate_lcfs_table (QRF train/predict or the clone/calibrate/save assembly).
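The bracketing argument can be sketched as a search for the first pipeline stage whose output no longer carries the column. This is only an illustration: the stage labels after generate_lcfs_table, the `other_consumption` placeholder, and the column lists are all invented for the example and say nothing about where the real drop occurs.

```python
from typing import Iterable, Optional, Sequence, Tuple


def first_stage_without(
    column: str, stages: Sequence[Tuple[str, Iterable[str]]]
) -> Optional[str]:
    """Return the name of the first stage whose output lacks `column`.

    Returns None when every stage still carries the column.
    """
    for name, columns in stages:
        if column not in set(columns):
            return name
    return None


# Made-up stage outputs, in pipeline order (hypothetical labels and columns).
stages = [
    ("generate_lcfs_table", ["other_consumption", "bus_fare_spending"]),
    ("qrf_predict", ["other_consumption"]),
    ("assemble_and_save", ["other_consumption"]),
]

print(first_stage_without("bus_fare_spending", stages))
```

With one check passing at the first stage and one failing at the last, the follow-up work is to add checks at the stages in between until the first failing one is found.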

Mergeable by design

The test is marked xfail(strict=False), so CI stays green and the PR is mergeable. It documents the known gap and will XPASS once the pipeline is fixed, which is the signal to remove the marker and make it a hard assertion. Locally it skips when the dataset isn't present; in CI's build job it runs against the freshly built dataset.
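A minimal sketch of how such a test might be shaped, not the test added in this PR. It assumes a CSV stand-in for the dataset, because the real file format and path are not stated here. `DATASET_PATH`, the `ENHANCED_DATASET_CSV` environment variable, `read_column`, and `is_populated` are hypothetical names, and "populated" is assumed to mean at least one non-zero, non-NaN value.

```python
import csv
import math
import os
from pathlib import Path

import pytest

# Hypothetical location of the built dataset (assumption, not from the PR).
DATASET_PATH = Path(os.environ.get("ENHANCED_DATASET_CSV", "enhanced_dataset.csv"))


def read_column(path, column):
    """Return `column` as a list of floats, or None if the column is absent."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or column not in reader.fieldnames:
            return None
        # Skip empty cells; everything else must parse as a number.
        return [float(row[column]) for row in reader if row[column]]


def is_populated(values):
    """Assumed definition: at least one value that is neither zero nor NaN."""
    return any(not math.isnan(v) and v != 0 for v in values)


# skipif handles the "dataset not built locally" case before xfail applies.
@pytest.mark.skipif(not DATASET_PATH.exists(), reason="dataset not built locally")
@pytest.mark.xfail(
    strict=False,
    reason="bus_fare_spending missing from the built dataset (see #430)",
)
def test_bus_fare_spending_in_dataset():
    values = read_column(DATASET_PATH, "bus_fare_spending")
    assert values is not None, "bus_fare_spending column is absent"
    assert is_populated(values), "bus_fare_spending is empty or all zero"
```

With strict=False, an unexpected pass is reported as XPASS without failing the run; switching to strict=True, or removing the marker, would turn the check into a hard assertion once the pipeline is fixed.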

Tracks #430. Follow-up: find and fix the downstream drop, then convert this to a strict assertion.

🤖 Generated with Claude Code

vahid-ahmadi and others added 3 commits June 17, 2026 16:43
generate_lcfs_table is unit-tested to compute bus_fare_spending, but nothing
checked it survives the QRF predict + enhanced-dataset assembly/save into the
published dataset — and it currently doesn't (issue #430): every other
consumption output lands, bus_fare_spending is dropped downstream.

Add an end-to-end test asserting the enhanced dataset carries a populated
bus_fare_spending column. Marked xfail so it is mergeable and documents the
gap; it will XPASS once the pipeline is fixed.

Refs #430.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tic, to revert)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>