Skip to content

ci: fix e2e-docker and playwright-e2e workflows#204

Open
Gill87 wants to merge 21 commits into
benevolentbandwidth:mainfrom
Gill87:ci/fix-checks
Open

ci: fix e2e-docker and playwright-e2e workflows#204
Gill87 wants to merge 21 commits into
benevolentbandwidth:mainfrom
Gill87:ci/fix-checks

Conversation

@Gill87

@Gill87 Gill87 commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes both CI workflows that exercise the full Docker stack (e2e-docker.yml) and the browser-driven flow (playwright-e2e.yml), neither of which had ever completed successfully. Both are now verified green end-to-end via real workflow_dispatch runs, not just local inspection.

Depends on #193 (feat/session-storage) — that PR must merge first. This branch was cut from feat/session-storage and only contains the CI-fix commits on top of it; this description intentionally omits feat/session-storage's own changes.

Motivation

e2e-docker.yml died at the CMake configure step on every run (missing PostgreSQL dev headers for Drogon's Postgres feature), so the full Docker E2E suite had literally never executed past configure. playwright-e2e.yml failed on a locator.fill timeout because the edit page rendered the same address card twice (desktop + mobile), both always mounted and only CSS-hidden, which confused Playwright's strict-mode locator matching. Fixing the infra surfaced more pre-existing, previously-unexercised bugs once each workflow could finally run far enough to hit them, including a test that only verified an optimize job was accepted rather than that it actually completed and returned a real result.

Changes

Infrastructure

  • Add libpq-dev to the e2e-docker.yml apt package cache so Drogon's Postgres support configures correctly.

Frontend

  • Remove the duplicate mobile/desktop AddressCard mapping in AddressSection.tsx so only one set of address rows is ever mounted, eliminating the hidden-DOM collision that broke Playwright's locator.
  • Scope the results-page Playwright assertion (optimize-flow.spec.ts) to the aside element, since the same always-mounted-both-mounted pattern exists between the desktop Sidebar and mobile ResultsBottomSheet, both of which render identical "N routes with M total stops" text.
  • Minor formatting cleanup in formStyles.v2.ts.

Backend tests

  • Rewrite the 3 stale e2e-docker API scripts (optimize_valid_query_returns_summary, optimize_rejects_negative_deliveries, optimize_rejects_excessive_deliveries), which targeted a GET /optimize?deliveries=N&vehicles=N shape that no longer exists. They now POST real JSON payloads against POST /api/v1/optimization-jobs (the always-registered async API) and assert the current validation contract (400 + {"field": "jobs[0].demand"} / {"field": "jobs"}, 202 + job_id).
  • Update tests/CMakeLists.txt to register the rewritten "valid query" script as a Python-assisted test, since it now needs to parse JSON out of the response.
  • optimize_valid_query_returns_summary_e2e_test.sh initially only asserted 202 + job_id on submission, which would stay green even if the async worker never completed the job or the result payload regressed. It now polls /api/v1/optimization-jobs/{id} to succeeded and asserts the result contains status/summary/routes/unassigned, matching what the test's name promises.

Validation

  • gh workflow run "E2E Docker (weekly + manual)" --repo Gill87/b2deliveryoptimizer --ref ci/fix-checks — 100% tests passed, 0 failed out of 8 (1 disabled smoke test skipped as expected), re-run after the polling fix to confirm OptimizeValidQueryReturnsSummary still passes end-to-end
  • gh workflow run "Playwright E2E (weekly + manual)" --repo Gill87/b2deliveryoptimizer --ref ci/fix-checks — 1 passed

CI workflows

  • Confirmed e2e-docker.yml Configure/Build steps pass for the first time (previously failed every run at CMake configure)
  • Confirmed playwright-e2e.yml clears the original edit-page locator.fill timeout and the newly-discovered results-page locator collision
  • Confirmed OptimizeValidQueryReturnsSummary exercises the full submit → poll → result path (12.3s runtime, not an instant submit-only pass)

Both workflows are workflow_dispatch/schedule-only by design (not wired to pull_request), so they were validated by dispatching real runs against this branch on the fork rather than via pr-checks.

Risk

The 3 rewritten e2e-docker scripts now depend on the always-on /api/v1/optimization-jobs API and Postgres-backed job store being healthy in the stack; if that store becomes misconfigured these tests will fail at submission (503) rather than validation, which is an acceptable and already-covered failure mode (see OptimizationJobValidPayloadReturnsRoutingResponse). The Playwright locator scoping to aside is coupled to the current Sidebar/ResultsBottomSheet markup — if either component's root tag changes, the locator will need to be revisited. OptimizeValidQueryReturnsSummary now duplicates most of OptimizationJobValidPayloadReturnsRoutingResponse's submit/poll/result mechanics; that overlap is intentional so each test can fail independently and point at what actually broke, rather than collapsing coverage into one test.

Rollout and Recovery

No runtime/deploy changes — this is CI and test-only. Both workflows remain workflow_dispatch/schedule-triggered, so a bad merge only affects the weekly scheduled run or manual dispatches, not pull_request checks. Revert is a plain git revert of the relevant commits if either workflow regresses.

Gill87 and others added 19 commits June 13, 2026 12:05
…ressing session storage issues in results page
… can be loaded to the react state of the vehicles to be exported through the save point
Stabilize setRoutes via a routes ref so it no longer breaks memoization
for every callback that depends on it, document why optimizeResults is
intentionally left in sessionStorage for useSyncExternalStore, dedupe
the new-session handler in welcome/page.tsx, and extract the geocode
cache-hit check into a testable helper with a regression test for the
imported-address (null state) case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…desktop to prevent hidden DOM's from being picked up by the playwright CI check
…te responses for excessive and negative deliveries
…ize tests

The legacy sync /api/v1/deliveries/optimize endpoint is disabled by
default everywhere (dev, prod, and the e2e stack env all set
DELIVERYOPTIMIZER_ENABLE_SYNC_OPTIMIZE=0), so the previous rewrite of
these 3 scripts 404'd against it. Point them at
POST /api/v1/optimization-jobs instead, which is always registered and
shares the same ParseAndValidateOptimizeRequest validation contract.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OptimizeValidQueryReturnsSummary only asserted HTTP 202 + job_id on
submission, so it could pass even if the async worker never completed
the job or the result payload regressed. Poll the job status to
succeeded and assert the result contains status/summary/routes/
unassigned, matching what the test's name promises.
@markboenigk

Copy link
Copy Markdown
Collaborator

Great work — the CI fixes are clearly motivated and well-executed, and the test rewrites are a genuine upgrade (they now actually test what the test names promise). A few things to flag before merging:

ADDRESS_ROW_STEPPER_CONTAINER_NARROW_ERROR in formStyles.v2.ts

The new version expands the class list into an explicit string, which means it no longer tracks changes to ADDRESS_ROW_STEPPER_CONTAINER or ADDRESS_ROW_STEPPER_CONTAINER_NARROW. If those base classes are ever updated, the error variant will silently diverge. Was there a specific reason the template literal extension couldn't stay? (Tailwind class conflict, purge issue?) A short comment explaining the intent would help future readers.

Module-level mutable cache in results/page.tsx

let cachedRouteLoadKey = "";
let cachedRouteLoadResult: RouteLoadResult = EMPTY_ROUTE_LOAD_RESULT;

The pattern is correct for useSyncExternalStore (the snapshot function must be stable and return the same reference when the data hasn't changed), but this isn't obvious at a glance. The cache also persists across Next.js hot module reloads in dev. A short comment on why this is module-level rather than inside the function would save the next person from reaching for a linter suppression.

Empty state vs. non-empty state in AddressSection.tsx

The non-empty path now uses ADDRESS_LIST_RESPONSIVE_CONTAINER / ADDRESS_LIST_RESPONSIVE_INNER / ADDRESS_LIST_DESKTOP_DIVIDER, while the addressesCount === 0 path still uses the old ADDRESS_LIST_CONTAINER / ADDRESS_LIST_CONTAINER_INNER / ADDRESS_LIST_DIVIDER. This works today, but the two code paths will drift visually if either set of classes is updated independently. Not a blocker, but worth a follow-up cleanup.

e2e_init arg count in the rewritten summary test

e2e_init "$1" "$2" "$3" "$4"
python3_bin="$5"

Just want to confirm: e2e_init takes exactly 4 args and doesn't need the python path, correct? Worth a quick check that the usage: comment above this matches what e2e_helpers.sh actually expects.


Everything else looks solid — the isHydrated guard preventing the auto-save from stomping a fresh import, the useSyncExternalStore + draftRoutes split for local edits, handleNewSession clearing both storage keys, and the hasCachedLocationWithState predicate with its explanation of the null-state invariant are all well thought through.

Gill87 and others added 2 commits June 27, 2026 19:47
Co-authored-by: Cursor <cursoragent@cursor.com>
@Gill87

Gill87 commented Jun 28, 2026

Copy link
Copy Markdown
Contributor Author

Thank you Mark for your feedback! I have addressed the issues you mentioned.

I refactored the stepper error token so it shares the same base layout as the normal stepper, added a comment explaining the module-level useSyncExternalStore cache, aligned the desktop empty address-list path with the responsive list tokens, and clarified in the E2E script that e2e_init takes the shared Docker stack args while Python is handled separately for JSON parsing.

@markboenigk

Copy link
Copy Markdown
Collaborator

Thanks for the follow-up commits — the ADDRESS_LIST_DESKTOP_CHROME extraction, the cache comment in results/page.tsx, and the e2e_init clarification all look good. A few things still outstanding:

results/page.tsxdraftRoutes not cleared when storage updates

routes = draftRoutes ?? routeLoadResult.routes means that once any local edit sets draftRoutes non-null (pin move, stop note, duplicate), it permanently shadows routeLoadResult. If optimize-results-updated fires while the user has local edits (e.g. a second tab finishes a new optimize run), the new results are silently discarded and the user keeps seeing the stale edited routes. A useEffect watching routeLoadResult that resets draftRoutes to null would fix this:

useEffect(() => {
  setDraftRoutes(null);
}, [routeLoadResult]);

formStyles.v2.ts — three dead exports after 39a096d4

ADDRESS_LIST_CONTAINER, ADDRESS_LIST_CONTAINER_INNER, and ADDRESS_LIST_DIVIDER are now unused anywhere (the June 28 commit removed their last uses in AddressSection.tsx) but remain defined and exported. Worth removing to avoid confusing future readers about which constants are active.

formStyles.v2.tsNARROW_ERROR still re-specifies w-[72px]

The ADDRESS_ROW_STEPPER_CONTAINER_BASE extraction removed the layout duplication, but NARROW_ERROR still appends w-[72px] independently rather than extending NARROW. If the narrow width ever changes, NARROW_ERROR will silently diverge. One-liner fix:

export const ADDRESS_ROW_STEPPER_CONTAINER_NARROW_ERROR =
  `${ADDRESS_ROW_STEPPER_CONTAINER_NARROW} border-[var(--edit-error-border)]`;

editPageDraft.tssaveEditPageDraft manually enumerates VehicleRow fields

Lines 22–33 list every field explicitly. A new field on VehicleRow silently won't persist unless this list is also updated — no compile-time warning catches the omission. A spread with a type assertion ({ ...v } as DraftVehicle) would stay in sync automatically while the DraftVehicle type still enforces the shape contract.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants