Skip to content

fix: correct success detection logic in dashboard#28

Open
BleakNarratives wants to merge 1 commit into
mainfrom
fix/success-detection-logic-11690051173509139369
Open

fix: correct success detection logic in dashboard#28
BleakNarratives wants to merge 1 commit into
mainfrom
fix/success-detection-logic-11690051173509139369

Conversation

@BleakNarratives
Copy link
Copy Markdown
Owner

This PR fixes a bug in the AIRTBench frontend where successful flag captures from archive CSV data were not being correctly identified as successes. This caused the "Code City Apocalypse" dashboard to show zero successes even when flags were present.

Changes:

  • Modified is_event_success in airtbench/frontend.py to include a check for the flag_found_last_attempt_flag column, which is used in the archive datasets.
  • Applied a minor linting fix to get_monster_info to follow Python best practices (SIM118).

Verification:

  • Created and ran a script verify_success_logic.py which confirmed that 100% of the archive successes are now correctly detected (previously 0%).
  • Successfully ran ruff and mypy on the modified file.
  • Manually verified the dashboard UI using Playwright screenshots, confirming that sector statuses (SECURED) and global metrics (Sectors Compromised, Total Casualties) now correctly reflect the archive data.

PR created automatically by Jules for task 11690051173509139369 started by @BleakNarratives

- Update `is_event_success` to recognize successes from archive CSV data by checking the `flag_found_last_attempt_flag` column.
- Refactor `get_monster_info` to remove redundant `.keys()` call in dictionary iteration.
- Ensure dashboard metrics and sector statuses accurately reflect captured flags in 'Code City Apocalypse' scenario.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@google-labs-jules
Copy link
Copy Markdown

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the is_event_success function in airtbench/frontend.py to include a check for the flag_found_last_attempt_flag field and simplifies dictionary key iteration in get_monster_info. Feedback suggests improving the robustness of the new success condition by ensuring the flag is a non-empty string to avoid false positives from empty data fields.

Comment thread airtbench/frontend.py
event_type == "beacon_found"
or (isinstance(error, str) and "found_flag" in error)
or (isinstance(result, str) and "found_flag" in result)
or pd.notna(row.get("flag_found_last_attempt_flag"))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The check for flag_found_last_attempt_flag should verify that the value is a non-empty string. Using pd.notna() alone would return True for an empty string, which could lead to false positives if the archive data contains empty fields for failed attempts. Using isinstance(..., str) with a truthiness check is more robust and maintains consistency with the logic used in the preceding lines (178-179).

Suggested change
or pd.notna(row.get("flag_found_last_attempt_flag"))
or (isinstance(flag := row.get("flag_found_last_attempt_flag"), str) and flag)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant