fix: Resolve ValueError in distance matrix pivot & correct speed grid filtering #2
Open
pdml422 wants to merge 2 commits into
This PR addresses two bugs encountered when evaluating a Multi-modal GTFS dataset (Bus + Metro/Subway with constant speeds) in Hanoi.
Bug 1: Constant-speed routes being skipped in headway calculation

Problem: `if gtfs_length_i == gtfs_length: continue` acts as a premature optimization. If a route has a perfectly constant speed (e.g., a Metro line running at exactly 35 km/h), it passes the `speed_grid` thresholds without reducing `gtfs_length`. Consequently, the loop skips computation for these speeds, and constant-speed modes are incorrectly assigned the lowest speed bin (e.g., 5 km/h). Furthermore, overwriting `gtfs_selection` inside the loop breaks the filtering logic for subsequent iterations.

Fix: Removed the `gtfs_length_i == gtfs_length` check so that computation runs for all valid thresholds. Introduced `gtfs_selection_i` to preserve the original DataFrame scope during filtering.

Bug 2: `ValueError: Index contains duplicate entries, cannot reshape` during `.pivot()`
Problem: When modes carry different weights (e.g., `mode_factor` for Bus=0.8, Metro=1.0), rounding collisions occur. Two stops from different modes can end up with exactly the same `stop_quality_grid` value (e.g., 0.47) but slightly different final `quality_grid` values after distance multiplication. The previous `drop_duplicates` logic included `'quality_grid'` in its subset, so these overlapping pairs were not dropped, which ultimately crashed `.pivot()`.

Fix: Removed `'quality_grid'` from the subset in `drop_duplicates()`. Chained `.sort_values('quality_grid')` ahead of it and used `keep='last'` so that, in the event of a rounding collision, the highest accessibility score is retained for that grid intersection.

Testing: Tested successfully on a custom Hanoi GTFS feed containing both street-level buses and grade-separated Metro lines. The fix prevents the `ValueError` and colors Metro lines according to their true speed.
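Both fixes can be illustrated with a small pandas sketch. This is not the project's code: `SPEED_GRID`, `assign_speed_bins`, the `speed` column, the `skip_unchanged` switch, the `pairs` frame and the `distance` pivot key are hypothetical names chosen to reproduce the symptoms described above. Only `gtfs_selection`, `gtfs_selection_i`, `gtfs_length`, `gtfs_length_i`, `stop_quality_grid` and `quality_grid` come from this PR, and the exact shape of the real loop and pivot is assumed.

```python
import pandas as pd

# ---- Bug 1: speed-bin loop (hypothetical names and thresholds) ----
SPEED_GRID = [5, 10, 20, 30]  # assumed ascending thresholds in km/h


def assign_speed_bins(gtfs_selection, speed_grid, skip_unchanged):
    """Give each row the highest threshold its 'speed' column reaches.

    skip_unchanged=True mimics the old shortcut; False is the fixed behaviour.
    """
    gtfs_length = len(gtfs_selection)
    # Every row starts in the lowest bin
    speed_bin = pd.Series(speed_grid[0], index=gtfs_selection.index)
    for threshold in speed_grid:
        # Filter into a new variable so gtfs_selection keeps its full scope
        gtfs_selection_i = gtfs_selection[gtfs_selection["speed"] >= threshold]
        gtfs_length_i = len(gtfs_selection_i)
        if skip_unchanged and gtfs_length_i == gtfs_length:
            continue  # old shortcut: nothing was filtered out, so the threshold is skipped
        speed_bin.loc[gtfs_selection_i.index] = threshold
    return speed_bin


# A constant-speed Metro line: every segment runs at 35 km/h
metro = pd.DataFrame({"route_id": ["M1"] * 3, "speed": [35.0] * 3})
print(assign_speed_bins(metro, SPEED_GRID, skip_unchanged=True).tolist())   # [5, 5, 5]
print(assign_speed_bins(metro, SPEED_GRID, skip_unchanged=False).tolist())  # [30, 30, 30]

# ---- Bug 2: collision-safe dedup before pivot (hypothetical column names) ----
pairs = pd.DataFrame({
    "distance": [100, 100, 200],
    "stop_quality_grid": [0.47, 0.47, 0.47],  # first two rows collide on the same key
    "quality_grid": [0.41, 0.44, 0.30],
})

# Old subset includes 'quality_grid', so both colliding rows survive
old = pairs.drop_duplicates(subset=["distance", "stop_quality_grid", "quality_grid"])
try:
    old.pivot(index="distance", columns="stop_quality_grid", values="quality_grid")
    pivot_failed = False
except ValueError:
    pivot_failed = True  # duplicate (distance, stop_quality_grid) pairs cannot be reshaped

# Fix: sort ascending, dedupe on the pivot keys only, keep the last (highest) score
new = pairs.sort_values("quality_grid").drop_duplicates(
    subset=["distance", "stop_quality_grid"], keep="last"
)
matrix = new.pivot(index="distance", columns="stop_quality_grid", values="quality_grid")
print(pivot_failed)  # True
print(matrix)
```

With `skip_unchanged=True`, no threshold removes any of the constant-speed Metro rows, so every iteration is skipped and the rows stay in the 5 km/h bin. In the second half, sorting by `quality_grid` before `drop_duplicates(keep='last')` is what guarantees the surviving row is the highest score rather than whichever row happened to come last.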