Skip to content

feat: use route duration accurate for weather-aware optimization#180

Merged
markboenigk merged 11 commits into
benevolentbandwidth:mainfrom
hir-al-14:weather_duration_accuracy
Jun 29, 2026
Merged

feat: use route duration accurate for weather-aware optimization#180
markboenigk merged 11 commits into
benevolentbandwidth:mainfrom
hir-al-14:weather_duration_accuracy

Conversation

@hir-al-14

Copy link
Copy Markdown
Contributor

Summary

  • Improves the weather-aware optimize flow so weather is checked against the actual optimized route duration instead of only a rough service-time estimate.
  • Adds forecast timing metadata to the optimize response, including planned start time, estimated finish time, baseline route duration, weather-adjusted duration, and weather delay.

Motivation

  • Weather impact is more useful when it is based on the real route VROOM produced.
  • This keeps small weather changes from changing the route, while still allowing bigger weather delays to trigger a second weather-adjusted solve before routes are assigned.

Changes

  • Backend:
    • Reads VROOM summary.duration from the first optimized result.
    • Reads the earliest vehicle time window as the planned route start time.
    • Uses planned start time plus route duration when selecting OpenWeather hourly forecast data.
    • Adds forecast metadata fields to successful optimize responses.
    • Runs VROOM once normally, then reruns with weather-adjusted service times only when the weather delay crosses the configured threshold.
    • Adds focused tests for VROOM duration parsing, planned route start time, weather forecast hour selection, and forecast metadata.

Validation

Frontend

  • npm --prefix app/ui run lint
  • npm --prefix app/ui run format:check
  • npm --prefix app/ui run typecheck
  • npm --prefix app/ui run test
  • npm --prefix app/ui run build
  • npm --prefix app/mobile run lint
  • npm --prefix app/mobile run typecheck

Backend

  • cmake --preset dev
  • .github/scripts/check-backend-static.sh build/dev
  • cmake --build --preset conan-release --parallel
  • ctest --preset conan-release -R WeatherForecastOptimizerTest --output-on-failure
  • npx.cmd -y clang-format@1.8.0 --dry-run --Werror app/api/include/deliveryoptimizer/api/forecast_optimizer.hpp app/api/src/endpoints/deliveries_optimize_endpoint.cpp app/api/src/forecast_optimizer.cpp app/api/src/optimization_job_runtime.cpp tests/api/forecast_optimizer/weather_forecast_optimizer_test.cpp
  • git diff --check
  • cmake --build --preset dev --parallel
  • ctest --preset dev --output-on-failure --no-tests=error -LE 'e2e|docker'
  • docker compose -f deploy/compose/docker-compose.arm64.yml --env-file deploy/env/http-server.arm64.env config

Risk

  • Low to medium. The route solve can run twice when weather crosses the threshold, so large requests may take longer in bad weather cases.
  • If OpenWeather is unavailable or not configured, the backend falls back to the existing fixed-delay/disabled behavior.

Rollout and Recovery

  • Roll out with weather forecasting disabled or with conservative thresholds first.
  • If there are issues, disable weather handling with DELIVERYOPTIMIZER_WEATHER_FORECAST_ENABLED=0.
  • If OpenWeather causes problems, remove OPENWEATHER_API_KEY and the route still optimizes without live weather data.

High-Signal PR Checklist

  • The summary states the user-visible or operational outcome, not just file names.
  • The motivation explains why this change is needed now.
  • The change list separates frontend, backend, infrastructure, and documentation work where applicable.
  • Validation includes exact commands run, relevant output, and any checks intentionally skipped.
  • Risks, migrations, feature flags, and rollback steps are called out when relevant.
  • Screenshots or request/response examples are included for UI and API behavior changes.

@markboenigk

Copy link
Copy Markdown
Collaborator

Hey Hiral — the algorithmic upgrade here is exactly right. Using the actual VROOM summary.duration and the earliest vehicle time window to pick the correct forecast hour is a much more grounded approach than the rough service-time estimate from #179, and the jump from 3 to 8 focused tests gives real coverage of the new logic. ReadVroomDuration, ReadRouteStartTime, and IsRouteHour are all clean and the fallback paths (no matched hours → scan first 6; missing duration → baseline 0) are sensible. Good work on this one.

A few things need attention before this is ready to merge:

Must fix

  1. Second VROOM call blocks the event loop. In the sync endpoint's completion lambda, when should_reoptimize is true, you call runner->Run(...) directly on the Drogon event loop thread. ProcessVroomRunner::Run is a blocking subprocess call, so the first VROOM solve is async (via the coordinator) but the second is fully synchronous on the I/O loop. Every concurrent request stalls for the full vroom subprocess duration. This needs to be offloaded to a worker — drogon::async_run or submitting the re-run back through the coordinator are both workable paths.

  2. FetchOpenWeatherDelayEstimate still blocks the event loop (carry-over from feat: added weather-aware route optimization #179). The future.wait_for() inside the completion lambda halts whichever thread calls it. In the sync endpoint, that's the event loop thread. These two blocking calls compound — weather fetch stalls it first, then the VROOM re-run if the threshold is crossed.

  3. ResolveWeatherForecastOptionsFromEnv() per job (carry-over from feat: added weather-aware route optimization #179). WorkerLoop re-reads env vars on every solve iteration. Resolve once at OptimizationJobRuntime construction and store as a member.

  4. Thunder delay double-counts rain (carry-over from feat: added weather-aware route optimization #179). DelayFromHourlyForecast adds 240 s for IDs 200–299 and separately 90 s when hour["rain"].isObject(). OpenWeather thunderstorm responses almost always include a rain object, so a thunderstorm charges 330 s instead of 240 s. The two should be mutually exclusive.

Worth fixing before merge

  1. baseline_duration_seconds and baseline_route_duration_seconds are always identical. Both get set to normalized_baseline_seconds in EstimateWeatherImpact. Likewise, predicted_duration_seconds always equals weather_adjusted_duration_seconds. The forecast response will have two pairs of keys that are always equal — clean up the duplicates before consumers depend on the shape.

  2. ReadRouteStartTime called twice in EstimateRouteWeatherImpact. SetRouteTimes calls it internally, and the function also passes it explicitly to FetchOpenWeatherDelayEstimate. Hoist the result and share it.

  3. loadSessionFromText reports the wrong error on double-parse failure. When both Zod parses miss, the rethrown message comes from the outer catch (session-save schema), so a malformed raw-request file reports a misleading "savedAt" field path. Use the inner catch's error for the rethrow.

  4. Manifest purpose: "maskable" on a full-bleed SVG. Maskable icons clip to the inner 80% safe zone on Android — the current SVG's content touches the edges and will be cut. Use purpose: "any" or provide a padded variant.

  5. window.setTimeout(0) on the summary page is a race. A zero-delay timeout doesn't guarantee the driver page's useEffect persist has completed; it only defers to the next task. Pass the route via URL state or router.push with query params instead.

  6. Summary page warning button has no onClick. The WarningIcon button renders with aria-label="View remaining deliveries" but does nothing when tapped.

Frontend carry-overs from #179 (still present)

  1. StopCard.tsx Call button calls onNavigate instead of opening tel:${stop.phoneNumber}. Tapping "Call" launches Maps.
  2. upload-route/page.tsx JSX still uses className="ur-root" while the CSS now defines .upload-root — the root container gets no styles.
  3. handleContinue has no error handling — the removed try/catch leaves an unhandled promise rejection if file.text() throws or sessionStorage is unavailable.
  4. const [isProcessing] = useState(false) has no setter — the button's disabled state never changes.

One design gap to track separately

The driver PWA imports the session file in its pre-VROOM input order, not the optimized sequence. transformSessionToDriverRoute maps input.deliveries as-is, so drivers navigate stops in the original unoptimized order, defeating the route optimization. This predates this PR but is worth a dedicated issue.

The blocking event loop (items 1 and 2) is the blocker here — especially with the re-run compounding the issue. Everything else is tractable. Happy to look at a follow-up once those are cleared.

@hir-al-14 hir-al-14 force-pushed the weather_duration_accuracy branch from 20dee1d to 3e9ec73 Compare May 25, 2026 19:36
@hir-al-14

Copy link
Copy Markdown
Contributor Author

made all the fixes from the comment

still left to fix-

@markboenigk

Copy link
Copy Markdown
Collaborator

Hey Hiral — all four blockers and every worth-fixing item from the first round are resolved. The two-pass solve through the coordinator is clean, the sync path correctly skips the live weather fetch, and the useSyncExternalStore approach on the summary page is a proper fix for the setTimeout race. Good work getting through all of it.

Two small things worth a quick fix before merge:

  1. EstimateRouteWeatherImpact passes options instead of effective_options to FetchOpenWeatherDelayEstimate. They're equal at that callsite today, so it's not a bug — but if someone later modifies effective_options before the fetch, the fetch would silently see stale settings. Should be effective_options for consistency.

  2. The API-key-clearing pattern needs a comment. sync_weather_options.openweather_api_key.clear() is the mechanism for keeping the sync path off the event loop, but it reads like an accident to anyone who hasn't traced the call chain. A one-liner like // clear key so FetchOpenWeatherDelayEstimate short-circuits — sync path must not block the event loop makes the intent obvious.

Tracking:

Frontend lint/typecheck/test/build are still unchecked in the validation list — those should run before this merges.

@markboenigk

Copy link
Copy Markdown
Collaborator

CI failure — ui check

ESLint is failing with --max-warnings=0 on one unused import:

app/ui/src/app/driver_assist/summary/page.tsx:7
'DriverRoute' is defined but never used  @typescript-eslint/no-unused-vars

The useSyncExternalStore refactor removed the useState<DriverRoute | null> that needed the type, but the import wasn't cleaned up. Remove import type { DriverRoute } on line 7 to fix it.


Minor — RecalculateWeatherImpact has an implicit contract

In deliveries_optimize_endpoint.cpp, the API key is cleared on sync_weather_options before passing it to RecalculateWeatherImpact:

WeatherForecastOptions sync_weather_options = weather_options;
sync_weather_options.openweather_api_key.clear();
const WeatherImpactEstimate impact = RecalculateWeatherImpact(
    sync_weather_options, *optimize_request_ptr, *result.output);

RecalculateWeatherImpact doesn't document or enforce this — if called with a live key it will make a second OpenWeather request during recalculation. Worth a short comment at the call site or on the function to make the intent explicit, since optimization_job_runtime.cpp calls the same function and passes weather_options_ directly (no key clearing), which may be intentional but reads inconsistently.

@hir-al-14 hir-al-14 force-pushed the weather_duration_accuracy branch from 447773e to 652a45b Compare June 27, 2026 12:30
@markboenigk markboenigk merged commit ba561cd into benevolentbandwidth:main Jun 29, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants