fix: correct success detection logic in dashboard#28
Conversation
- Update `is_event_success` to recognize successes from archive CSV data by checking the `flag_found_last_attempt_flag` column. - Refactor `get_monster_info` to remove redundant `.keys()` call in dictionary iteration. - Ensure dashboard metrics and sector statuses accurately reflect captured flags in 'Code City Apocalypse' scenario. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
There was a problem hiding this comment.
Code Review
This pull request updates the is_event_success function in airtbench/frontend.py to include a check for the flag_found_last_attempt_flag field and simplifies dictionary key iteration in get_monster_info. Feedback suggests improving the robustness of the new success condition by ensuring the flag is a non-empty string to avoid false positives from empty data fields.
| event_type == "beacon_found" | ||
| or (isinstance(error, str) and "found_flag" in error) | ||
| or (isinstance(result, str) and "found_flag" in result) | ||
| or pd.notna(row.get("flag_found_last_attempt_flag")) |
There was a problem hiding this comment.
The check for flag_found_last_attempt_flag should verify that the value is a non-empty string. Using pd.notna() alone would return True for an empty string, which could lead to false positives if the archive data contains empty fields for failed attempts. Using isinstance(..., str) with a truthiness check is more robust and maintains consistency with the logic used in the preceding lines (178-179).
| or pd.notna(row.get("flag_found_last_attempt_flag")) | |
| or (isinstance(flag := row.get("flag_found_last_attempt_flag"), str) and flag) |
This PR fixes a bug in the AIRTBench frontend where successful flag captures from archive CSV data were not being correctly identified as successes. This caused the "Code City Apocalypse" dashboard to show zero successes even when flags were present.
Changes:
is_event_successinairtbench/frontend.pyto include a check for theflag_found_last_attempt_flagcolumn, which is used in the archive datasets.get_monster_infoto follow Python best practices (SIM118).Verification:
verify_success_logic.pywhich confirmed that 100% of the archive successes are now correctly detected (previously 0%).ruffandmypyon the modified file.PR created automatically by Jules for task 11690051173509139369 started by @BleakNarratives