Enhance Code City Apocalypse Dashboard by BleakNarratives · Pull Request #34 · BleakNarratives/AIRTBench-Code

BleakNarratives · 2026-05-29T21:43:59Z

This PR provides a comprehensive upgrade to the 'Code City Apocalypse' frontend dashboard.

Key changes include:

- System Logs Tab: A new tab featuring a terminal-style view of recent system events, providing real-time operational awareness.
- Leaderboard Refinement: The 'Model Success Rates' chart now correctly calculates unique successes per challenge, ensuring accuracy in model evaluation. It also ensures all active models are represented, even those without a recorded success.
- Thematic UI Enhancements:
  - Added specific CSS for success states (.lore-card--success).
  - Improved the 'Digital Bestiary' with color-coded danger levels and explicit neon-green styling for non-adversarial events.
  - Updated global metric labels (e.g., 'Sectors Secured') to better align with mission objectives.
- Robust Data Handling: Improved is_event_success to handle archive-specific flag columns and added safety checks for empty datasets in visualizations.
- Code Quality: Verified with Ruff and Mypy; addressed code review feedback by ensuring standard library imports are used and avoiding unnecessary dependency bloat.
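As a rough illustration of the leaderboard rule described above (count each challenge once per model, and keep models with zero successes), here is a minimal standard-library sketch. The function name `success_counts`, the event keys `model`, `challenge`, and `is_success`, and the sample data are assumptions made for illustration; the PR's actual implementation in airtbench/frontend.py is not shown here and may differ.

```python
from collections import defaultdict


def success_counts(events, active_models):
    """Count distinct solved challenges per model (hypothetical helper)."""
    solved = defaultdict(set)
    for event in events:
        if event.get("is_success"):
            # A set ignores repeat successes on the same challenge.
            solved[event["model"]].add(event["challenge"])
    # Iterate over the full roster so models without a success still appear.
    return {model: len(solved.get(model, ())) for model in active_models}


# Illustrative data only: model-a solves c1 twice and c2 once.
events = [
    {"model": "model-a", "challenge": "c1", "is_success": True},
    {"model": "model-a", "challenge": "c1", "is_success": True},
    {"model": "model-a", "challenge": "c2", "is_success": True},
    {"model": "model-b", "challenge": "c1", "is_success": False},
]
counts = success_counts(events, ["model-a", "model-b", "model-c"])
print(counts)  # {'model-a': 2, 'model-b': 0, 'model-c': 0}
```

Building the result from the roster rather than from the success records is what keeps `model-b` and `model-c` on the chart with a count of zero.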

PR created automatically by Jules for task 17746370473832010499 started by @BleakNarratives

- Add 'System Logs' tab with terminal-style telemetry
- Fix 'Model Success Rates' leaderboard to count unique successes
- Include all active models in leaderboard (even with 0 successes)
- Update global metrics labels and 'is_event_success' logic
- Enhance 'Digital Bestiary' with success styling and danger colors
- Add safety checks for empty data in manifests
- Refine UI theme with neon-green success states

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

google-labs-jules · 2026-05-29T21:44:01Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

gemini-code-assist

Code Review

This pull request enhances the dashboard in airtbench/frontend.py by adding a new "System Logs" tab with terminal telemetry, styling improvements for the digital bestiary, and better handling of success metrics and empty data. The review feedback identifies a critical issue in the render_system_logs function where splitting HTML tags across multiple st.markdown calls breaks the layout, alongside potential crashes from NaN timestamps and missing HTML escaping. Additionally, it is recommended to ensure that flag_found_last_attempt_flag is not an empty string when determining event success.
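The empty-string recommendation for flag_found_last_attempt_flag could look something like the sketch below. It assumes each event is a dict-like row and that a missing key, None, NaN, or a blank string should all count as "no flag found". The helper name `flag_is_present` is hypothetical and stands in for only one part of whatever is_event_success actually checks.

```python
import math


def flag_is_present(row, column="flag_found_last_attempt_flag"):
    """Return True only when the flag column holds a non-blank value."""
    value = row.get(column)
    if value is None:
        return False
    # Missing cells read from a DataFrame usually surface as float NaN.
    if isinstance(value, float) and math.isnan(value):
        return False
    # Treat empty or whitespace-only strings as "no flag".
    return str(value).strip() != ""


print(flag_is_present({"flag_found_last_attempt_flag": "gAAAA-example"}))  # True
print(flag_is_present({"flag_found_last_attempt_flag": ""}))  # False
```

A plain truthiness test on the column would already reject the empty string, but it would accept NaN (which is truthy) and whitespace-only values, so the explicit checks are the safer reading of the recommendation.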

gemini-code-assist · 2026-05-29T21:45:47Z

+def render_system_logs(failure_df: pd.DataFrame) -> None:
+    st.header("Terminal Telemetry")
+    if failure_df.empty:
+        st.info("The system logs are empty. No signals detected.")
+        return
+
+    # Terminal-style container
+    st.markdown(
+        """
+        <div style="background-color: #000; padding: 20px; border-radius: 5px; border: 1px solid #333; font-family: 'Fira Code', monospace; height: 600px; overflow-y: scroll;">
+        """,
+        unsafe_allow_html=True,
+    )
+
+    # Prioritize 'timestamp' or 'time' column if it exists
+    time_col = next((c for c in ["timestamp", "time", "created_at"] if c in failure_df.columns), None)
+
+    for _, row in failure_df.tail(100).iloc[::-1].iterrows():
+        if time_col:
+            raw_time = row[time_col]
+            if isinstance(raw_time, (int, float)):
+                ts = datetime.datetime.fromtimestamp(raw_time, tz=datetime.timezone.utc).strftime("%H:%M:%S")
+            else:
+                ts = str(raw_time)[:19]
+        else:
+            ts = "LIVE"
+
+        model = row.get("model_display", "UNKNOWN")
+        challenge = row.get("challenge", "UNKNOWN")
+        is_success = row.get("is_success")
+
+        color = "#00ff41" if is_success else "#ff4b4b"
+        status = "SUCCESS" if is_success else "FAILURE"
+
+        msg = f"[{ts}] {model} @ {challenge} >> {status}"
+        if not is_success:
+            monster = get_monster_info(row)["monster"]
+            msg += f" (ENTITY: {monster})"
+
+        st.markdown(f'<p style="color: {color}; margin: 0; line-height: 1.2;">{msg}</p>', unsafe_allow_html=True)
+
+    st.markdown("</div>", unsafe_allow_html=True)


Issues Identified:

Streamlit HTML Rendering Bug: In Streamlit, each st.markdown call renders as an independent block in the DOM. Opening a <div> in one call, rendering <p> tags in subsequent calls inside a loop, and closing the <div> in a final call will cause the browser to auto-close the <div> immediately, breaking the terminal container layout.

NaN Timestamp Crash: NaN values in pandas are of type float. Therefore, isinstance(raw_time, (int, float)) evaluates to True for NaN, and datetime.datetime.fromtimestamp(raw_time) then raises a ValueError (for example, "cannot convert float NaN to integer"; the exact message depends on the Python implementation).
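The NaN behaviour is easy to reproduce with the standard library alone. The sketch below first shows that NaN passes the isinstance check and makes fromtimestamp raise, then shows one possible guard. The helper name `format_timestamp` and its `fallback` parameter are assumptions for illustration, not part of the PR.

```python
import datetime
import math

nan = float("nan")
passes_type_check = isinstance(nan, (int, float))  # True: NaN is a float

try:
    datetime.datetime.fromtimestamp(nan, tz=datetime.timezone.utc)
    raised_value_error = False
except ValueError:
    raised_value_error = True


def format_timestamp(raw_time, fallback="LIVE"):
    """Format a log timestamp, falling back when the value is missing."""
    if raw_time is None:
        return fallback
    if isinstance(raw_time, float) and math.isnan(raw_time):
        return fallback
    if isinstance(raw_time, (int, float)):
        moment = datetime.datetime.fromtimestamp(raw_time, tz=datetime.timezone.utc)
        return moment.strftime("%H:%M:%S")
    # Non-numeric values (e.g. ISO strings) are truncated to seconds.
    return str(raw_time)[:19]


print(passes_type_check, raised_value_error)
print(format_timestamp(nan), format_timestamp(0))
```

Inside a pandas-based function, `pd.notna(raw_time)` performs the same missing-value test and also covers other pandas missing markers.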

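Missing HTML Escaping: model, challenge, and monster are interpolated into HTML that is rendered with unsafe_allow_html=True. Without escaping, special characters in these values can break the layout or open the page to XSS.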
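To make the escaping concern concrete, here is a small sketch that builds a single log line with html.escape from the standard library. The helper name `build_log_line` and the sample values are hypothetical; only the colors and the paragraph style are taken from the reviewed code.

```python
import html


def build_log_line(ts, model, challenge, is_success):
    """Return one escaped <p> element for the terminal view."""
    color = "#00ff41" if is_success else "#ff4b4b"
    status = "SUCCESS" if is_success else "FAILURE"
    # Escape every value that originates from data before embedding it.
    safe_ts = html.escape(str(ts))
    safe_model = html.escape(str(model))
    safe_challenge = html.escape(str(challenge))
    msg = f"[{safe_ts}] {safe_model} @ {safe_challenge} >> {status}"
    return f'<p style="color: {color}; margin: 0; line-height: 1.2;">{msg}</p>'


line = build_log_line("21:45:47", "<script>alert(1)</script>", "c1 & c2", False)
print(line)
```

With escaping in place, the injected tag is emitted as the text `&lt;script&gt;alert(1)&lt;/script&gt;` and the ampersand as `&amp;`, so the browser displays them instead of interpreting them.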

Solution:

Accumulate the HTML lines in a list, check pd.notna(raw_time) before processing, escape the variables, and render everything in a single st.markdown call.

def render_system_logs(failure_df: pd.DataFrame) -> None:
    st.header("Terminal Telemetry")
    if failure_df.empty:
        st.info("The system logs are empty. No signals detected.")
        return

    # Prioritize 'timestamp' or 'time' column if it exists
    time_col = next((c for c in ["timestamp", "time", "created_at"] if c in failure_df.columns), None)

    log_lines = []
    for _, row in failure_df.tail(100).iloc[::-1].iterrows():
        if time_col and pd.notna(row[time_col]):
            raw_time = row[time_col]
            if isinstance(raw_time, (int, float)):
                ts = datetime.datetime.fromtimestamp(raw_time, tz=datetime.timezone.utc).strftime("%H:%M:%S")
            else:
                ts = str(raw_time)[:19]
        else:
            ts = "LIVE"

        model = html.escape(str(row.get("model_display", "UNKNOWN")))
        challenge = html.escape(str(row.get("challenge", "UNKNOWN")))
        is_success = bool(row.get("is_success"))

        color = "#00ff41" if is_success else "#ff4b4b"
        status = "SUCCESS" if is_success else "FAILURE"

        msg = f"[{ts}] {model} @ {challenge} >> {status}"
        if not is_success:
            monster = html.escape(get_monster_info(row)["monster"])
            msg += f" (ENTITY: {monster})"

        log_lines.append(f'<p style="color: {color}; margin: 0; line-height: 1.2;">{msg}</p>')

    log_html = f"""
    <div style="background-color: #000; padding: 20px; border-radius: 5px; border: 1px solid #333; font-family: 'Fira Code', monospace; height: 600px; overflow-y: scroll;">
    {"".join(log_lines)}
    </div>
    """
    st.markdown(log_html, unsafe_allow_html=True)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: BleakNarratives <bleaknarratives@gmail.com>

gemini-code-assist Bot reviewed May 29, 2026


Update airtbench/frontend.py

6a4fc29

BleakNarratives wants to merge 2 commits into main from feat/frontend-apocalypse-v2-17746370473832010499
