feat: enhance Code City Apocalypse dashboard with logs and logic fixes#32
feat: enhance Code City Apocalypse dashboard with logs and logic fixes#32BleakNarratives wants to merge 1 commit into
Conversation
- Added 'System Logs' tab for terminal-style telemetry feed. - Refactored leaderboard to correctly calculate unique successes per model. - Fixed 'Sectors Compromised' logic to reflect secured sectors accurately. - Improved 'Digital Bestiary' visuals with neon-green success highlights. - Added safety guards for empty data states in monster manifests. - Standardized use of .copy() to avoid SettingWithCopyWarning. - Ensured all active models are shown in leaderboard, even with 0 successes. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
There was a problem hiding this comment.
Code Review
This pull request updates the airtbench frontend by adding a System Logs tab, refining the Digital Bestiary UI, and improving the accuracy of apocalypse metrics. It also includes logic refinements for success detection and data handling. Review feedback identifies performance bottlenecks in the new logging function related to string concatenation and redundant date parsing, and recommends hoisting constant list derivations to the global scope to avoid unnecessary re-computation.
| log_html = '<div class="log-container">' | ||
| # Display newest first | ||
| for _, row in failure_df.iloc[::-1].iterrows(): | ||
| # Handle timestamp: use existing col if present, else fallback | ||
| timestamp = "LIVE" | ||
| if "timestamp" in row and pd.notna(row["timestamp"]): | ||
| try: | ||
| # If numeric, assume unix | ||
| ts_val = row["timestamp"] | ||
| if isinstance(ts_val, (int, float)): | ||
| timestamp = datetime.datetime.fromtimestamp( | ||
| ts_val, | ||
| tz=datetime.timezone.utc, | ||
| ).strftime("%H:%M:%S") | ||
| else: | ||
| # Try ISO string | ||
| timestamp = pd.to_datetime(ts_val).strftime("%H:%M:%S") | ||
| except (ValueError, OSError): | ||
| timestamp = "LIVE" | ||
|
|
||
| model = html.escape(str(row.get("model_display", "UNKNOWN"))) | ||
| challenge = html.escape(str(row.get("challenge", "UNKNOWN"))) | ||
|
|
||
| if row.get("is_success"): | ||
| status_class = "log-status-success" | ||
| msg = "BEACON SECURED" | ||
| else: | ||
| status_class = "log-status-fail" | ||
| monster_info = get_monster_info(row) | ||
| msg = f"OVERWHELMED BY {monster_info['monster']}" | ||
|
|
||
| log_html += ( | ||
| f'<div class="log-entry">' | ||
| f'<span class="log-time">[{timestamp}]</span> ' | ||
| f'<span class="log-model">{model}</span> @ ' | ||
| f'<span class="log-challenge">{challenge}</span>: ' | ||
| f'<span class="{status_class}">{msg}</span>' | ||
| f"</div>" | ||
| ) | ||
|
|
||
| log_html += "</div>" |
There was a problem hiding this comment.
The render_system_logs function has significant performance and scalability issues:
-
Scalability: It attempts to render the entire
failure_dfas a single HTML block. As the dataset grows, this will result in a massive HTML payload, causing the Streamlit app to lag and potentially crashing the user's browser. -
Efficiency: Building a large string using
+=in a loop is an$O(n^2)$ operation in Python. Using"".join()with a list is much more efficient. -
Redundant Parsing:
pd.to_datetimeis called inside the loop for every row, which is a heavy operation.
I recommend limiting the display to the most recent entries (e.g., the last 100) and optimizing the string construction.
References
- Follow standard Python performance practices regarding string immutability and pandas iteration. (link)
- Avoid iterrows() and string concatenation in loops for large DataFrames to maintain performance.
| else row.get("error") | ||
| ) | ||
| error_keys = [k for k in ERROR_MAP.keys() if k != "found_flag"] | ||
| error_keys = [k for k in ERROR_MAP if k != "found_flag"] |
There was a problem hiding this comment.
The error_keys list is re-created every time get_monster_info is called. Since ERROR_MAP is a constant, this list (and the sorted version used later in the function) should be defined as a global constant to avoid redundant computations, especially when this function is called within loops (e.g., during log rendering).
This PR significantly enhances the 'Code City Apocalypse' Streamlit dashboard.
Key changes include:
.lore-cardto provide a more immersive apocalyptic feel..copy()on dataframe slices to eliminate pandas warnings.datetimeandhtmlfor safer log rendering.Verified with
ruff,mypy, and automated Playwright UI testing.PR created automatically by Jules for task 16745960992002796971 started by @BleakNarratives