fix: correct success detection logic in dashboard by BleakNarratives · Pull Request #28 · BleakNarratives/AIRTBench-Code

BleakNarratives · 2026-05-17T21:58:48Z

This PR fixes a bug in the AIRTBench frontend where successful flag captures from archive CSV data were not being correctly identified as successes. This caused the "Code City Apocalypse" dashboard to show zero successes even when flags were present.

Changes:

Modified is_event_success in airtbench/frontend.py to include a check for the flag_found_last_attempt_flag column, which is used in the archive datasets.
Applied a minor linting fix to get_monster_info to follow Python best practices (SIM118).
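For context on the SIM118 change, here is a minimal sketch of what the rule flags. The `monsters` dictionary is a hypothetical stand-in and is not taken from `get_monster_info`; the point is only that iterating a dict directly yields the same keys as iterating `.keys()`.

```python
# Hypothetical data; the real structure used by get_monster_info is not shown in this PR.
monsters = {"goblin": 3, "troll": 7}

# Flagged by SIM118: the .keys() call is redundant when iterating.
keys_via_method = [name for name in monsters.keys()]

# Preferred form: iterating a dict yields its keys directly, in insertion order.
keys_direct = [name for name in monsters]

print(keys_via_method == keys_direct)
```

The change is behavior-preserving, so it should not affect the success detection fix itself.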

Verification:

Created and ran a script verify_success_logic.py which confirmed that 100% of the archive successes are now correctly detected (previously 0%).
Successfully ran ruff and mypy on the modified file.
Manually verified the dashboard UI using Playwright screenshots, confirming that sector statuses (SECURED) and global metrics (Sectors Compromised, Total Casualties) now correctly reflect the archive data.

PR created automatically by Jules for task 11690051173509139369 started by @BleakNarratives

- Update `is_event_success` to recognize successes from archive CSV data by checking the `flag_found_last_attempt_flag` column.
- Refactor `get_monster_info` to remove redundant `.keys()` call in dictionary iteration.
- Ensure dashboard metrics and sector statuses accurately reflect captured flags in 'Code City Apocalypse' scenario.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

google-labs-jules · 2026-05-17T21:58:49Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

gemini-code-assist

Code Review

This pull request updates the is_event_success function in airtbench/frontend.py to include a check for the flag_found_last_attempt_flag field and simplifies dictionary key iteration in get_monster_info. Feedback suggests improving the robustness of the new success condition by ensuring the flag is a non-empty string to avoid false positives from empty data fields.

gemini-code-assist · 2026-05-17T22:01:17Z

        event_type == "beacon_found"
        or (isinstance(error, str) and "found_flag" in error)
        or (isinstance(result, str) and "found_flag" in result)
+        or pd.notna(row.get("flag_found_last_attempt_flag"))


The check for flag_found_last_attempt_flag should verify that the value is a non-empty string. Using pd.notna() alone would return True for an empty string, which could lead to false positives if the archive data contains empty fields for failed attempts. Using isinstance(..., str) with a truthiness check is more robust and maintains consistency with the logic used in the preceding lines (178-179).

Suggested change

-        or pd.notna(row.get("flag_found_last_attempt_flag"))
+        or (isinstance(flag := row.get("flag_found_last_attempt_flag"), str) and flag)
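The difference between the two conditions can be sketched without pandas. In the snippet below, `not_missing` is a simplified stand-in for `pd.notna` on scalar values (it only handles `None` and float NaN), `has_flag` mirrors the reviewer's suggested check, and the row dictionaries and flag value are invented for illustration rather than taken from the archive data.

```python
import math


def not_missing(value):
    """Simplified stand-in for pd.notna on scalars: False only for None or NaN."""
    return value is not None and not (isinstance(value, float) and math.isnan(value))


def has_flag(row):
    """Reviewer's suggested condition: the field must be a non-empty string."""
    flag = row.get("flag_found_last_attempt_flag")
    return isinstance(flag, str) and bool(flag)


# Hypothetical rows: a captured flag, an empty field, a NaN field, a missing column.
rows = [
    {"flag_found_last_attempt_flag": "gAAAAA-example"},
    {"flag_found_last_attempt_flag": ""},
    {"flag_found_last_attempt_flag": float("nan")},
    {},
]

naive = [not_missing(r.get("flag_found_last_attempt_flag")) for r in rows]
strict = [has_flag(r) for r in rows]

print(naive)   # the empty string counts as "not missing"
print(strict)  # only the non-empty string counts as a success
```

Whether empty strings actually appear depends on how the archive CSV is loaded, so the stricter check is best read as a defensive measure. Note that it would still accept a whitespace-only string.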

BleakNarratives wants to merge 1 commit into main from fix/success-detection-logic-11690051173509139369
