Skip to content

feat: enhance Code City Apocalypse dashboard with logs and logic fixes#32

Open
BleakNarratives wants to merge 1 commit into
mainfrom
feat/improve-code-city-dashboard-16745960992002796971
Open

feat: enhance Code City Apocalypse dashboard with logs and logic fixes#32
BleakNarratives wants to merge 1 commit into
mainfrom
feat/improve-code-city-dashboard-16745960992002796971

Conversation

@BleakNarratives
Copy link
Copy Markdown
Owner

This PR significantly enhances the 'Code City Apocalypse' Streamlit dashboard.

Key changes include:

  1. System Logs Tab: A new interactive tab that displays live and archived system telemetry in a terminal-themed view, complete with timestamp parsing and model identification.
  2. Leaderboard Overhaul: Success rates are now calculated based on unique challenge IDs secured by each model, ensuring accurate representation of progress. All active models are listed, regardless of success count.
  3. Metric Correction: The 'Sectors Compromised' metric has been updated to count unique challenges with at least one success, aligning with the 'Secured Sectors' theme.
  4. Visual Polish:
    • Updated CSS for .lore-card to provide a more immersive apocalyptic feel.
    • Added a neon-green glow specifically for 'The Beacon of Hope' success event in the Digital Bestiary.
    • Standardized role labels in chat histories (🤖 AGENT, 👤 OPERATOR, ⚙️ SYSTEM).
  5. Robustness & Performance:
    • Added checks to prevent crashes when data is filtered down to zero events.
    • Used .copy() on dataframe slices to eliminate pandas warnings.
    • Integrated standard library datetime and html for safer log rendering.

Verified with ruff, mypy, and automated Playwright UI testing.


PR created automatically by Jules for task 16745960992002796971 started by @BleakNarratives

- Added 'System Logs' tab for terminal-style telemetry feed.
- Refactored leaderboard to correctly calculate unique successes per model.
- Fixed 'Sectors Compromised' logic to reflect secured sectors accurately.
- Improved 'Digital Bestiary' visuals with neon-green success highlights.
- Added safety guards for empty data states in monster manifests.
- Standardized use of .copy() to avoid SettingWithCopyWarning.
- Ensured all active models are shown in leaderboard, even with 0 successes.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@google-labs-jules
Copy link
Copy Markdown

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the airtbench frontend by adding a System Logs tab, refining the Digital Bestiary UI, and improving the accuracy of apocalypse metrics. It also includes logic refinements for success detection and data handling. Review feedback identifies performance bottlenecks in the new logging function related to string concatenation and redundant date parsing, and recommends hoisting constant list derivations to the global scope to avoid unnecessary re-computation.

Comment thread airtbench/frontend.py
Comment on lines +1006 to +1046
log_html = '<div class="log-container">'
# Display newest first
for _, row in failure_df.iloc[::-1].iterrows():
# Handle timestamp: use existing col if present, else fallback
timestamp = "LIVE"
if "timestamp" in row and pd.notna(row["timestamp"]):
try:
# If numeric, assume unix
ts_val = row["timestamp"]
if isinstance(ts_val, (int, float)):
timestamp = datetime.datetime.fromtimestamp(
ts_val,
tz=datetime.timezone.utc,
).strftime("%H:%M:%S")
else:
# Try ISO string
timestamp = pd.to_datetime(ts_val).strftime("%H:%M:%S")
except (ValueError, OSError):
timestamp = "LIVE"

model = html.escape(str(row.get("model_display", "UNKNOWN")))
challenge = html.escape(str(row.get("challenge", "UNKNOWN")))

if row.get("is_success"):
status_class = "log-status-success"
msg = "BEACON SECURED"
else:
status_class = "log-status-fail"
monster_info = get_monster_info(row)
msg = f"OVERWHELMED BY {monster_info['monster']}"

log_html += (
f'<div class="log-entry">'
f'<span class="log-time">[{timestamp}]</span> '
f'<span class="log-model">{model}</span> @ '
f'<span class="log-challenge">{challenge}</span>: '
f'<span class="{status_class}">{msg}</span>'
f"</div>"
)

log_html += "</div>"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The render_system_logs function has significant performance and scalability issues:

  1. Scalability: It attempts to render the entire failure_df as a single HTML block. As the dataset grows, this will result in a massive HTML payload, causing the Streamlit app to lag and potentially crashing the user's browser.
  2. Efficiency: Building a large string using += in a loop is an $O(n^2)$ operation in Python. Using "".join() with a list is much more efficient.
  3. Redundant Parsing: pd.to_datetime is called inside the loop for every row, which is a heavy operation.

I recommend limiting the display to the most recent entries (e.g., the last 100) and optimizing the string construction.

References
  1. Follow standard Python performance practices regarding string immutability and pandas iteration. (link)
  2. Avoid iterrows() and string concatenation in loops for large DataFrames to maintain performance.

Comment thread airtbench/frontend.py
else row.get("error")
)
error_keys = [k for k in ERROR_MAP.keys() if k != "found_flag"]
error_keys = [k for k in ERROR_MAP if k != "found_flag"]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error_keys list is re-created every time get_monster_info is called. Since ERROR_MAP is a constant, this list (and the sorted version used later in the function) should be defined as a global constant to avoid redundant computations, especially when this function is called within loops (e.g., during log rendering).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant