Enhance Code City Apocalypse Dashboard #34

Open
BleakNarratives wants to merge 2 commits into main from feat/frontend-apocalypse-v2-17746370473832010499


@BleakNarratives
Owner

This PR upgrades the 'Code City Apocalypse' frontend dashboard in airtbench/frontend.py.

Key changes include:

  1. System Logs Tab: A new tab with a terminal-style view of the 100 most recent system events, listed newest first.
  2. Leaderboard Refinement: The 'Model Success Rates' chart now counts unique successes per challenge, so a model that solves the same challenge more than once is credited only once. It also includes every active model, showing zero for models with no recorded success.
  3. Thematic UI Enhancements:
    • Added specific CSS for success states (.lore-card--success).
    • Improved the 'Digital Bestiary' with color-coded danger levels and explicit neon-green styling for non-adversarial events.
    • Updated global metric labels (e.g., 'Sectors Secured') to better align with mission objectives.
  4. Robust Data Handling: Improved is_event_success to handle archive-specific flag columns and added safety checks for empty datasets in visualizations.
  5. Code Quality: Checked with Ruff and Mypy; addressed code review feedback by using standard-library imports and avoiding unnecessary dependencies.
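The unique-success rule in item 2 can be illustrated with a minimal, stdlib-only sketch. It is not the PR's implementation: the function name unique_successes_per_model and the active_models argument are hypothetical, and the event keys model_display, challenge and is_success are borrowed from render_system_logs further down, assuming is_success holds a plain bool.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def unique_successes_per_model(
    events: Iterable[Mapping[str, Any]],
    active_models: Iterable[str],
) -> dict[str, int]:
    """Count distinct solved challenges per model (hypothetical helper)."""
    # Seed every active model so models without a success still appear with 0.
    solved: dict[str, set[str]] = {model: set() for model in active_models}
    for event in events:
        # Assumes is_success is a plain bool; a NaN here would count as truthy.
        if event.get("is_success"):
            # A set ignores repeat successes on the same challenge.
            solved.setdefault(event["model_display"], set()).add(event["challenge"])
    return {model: len(challenges) for model, challenges in solved.items()}


events = [
    {"model_display": "model-a", "challenge": "c1", "is_success": True},
    {"model_display": "model-a", "challenge": "c1", "is_success": True},  # repeat solve
    {"model_display": "model-a", "challenge": "c2", "is_success": False},
    {"model_display": "model-b", "challenge": "c1", "is_success": False},
]
print(unique_successes_per_model(events, ["model-a", "model-b", "model-c"]))
# {'model-a': 1, 'model-b': 0, 'model-c': 0}
```

Collecting challenges into a set per model is what makes repeated solves count once; seeding the dict from the active-model list is what keeps zero-success models on the chart. A model that succeeds but is missing from active_models is still added by setdefault.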

PR created automatically by Jules for task 17746370473832010499 started by @BleakNarratives

- Add 'System Logs' tab with terminal-style telemetry
- Fix 'Model Success Rates' leaderboard to count unique successes
- Include all active models in leaderboard (even with 0 successes)
- Update global metrics labels and 'is_event_success' logic
- Enhance 'Digital Bestiary' with success styling and danger colors
- Add safety checks for empty data in manifests
- Refine UI theme with neon-green success states

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@google-labs-jules

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.


@gemini-code-assist bot left a comment


Code Review

This pull request enhances the dashboard in airtbench/frontend.py by adding a new "System Logs" tab with terminal telemetry, styling improvements for the digital bestiary, and better handling of success metrics and empty data. The review feedback identifies a critical issue in the render_system_logs function where splitting HTML tags across multiple st.markdown calls breaks the layout, alongside potential crashes from NaN timestamps and missing HTML escaping. Additionally, it is recommended to ensure that flag_found_last_attempt_flag is not an empty string when determining event success.
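A minimal sketch of the empty-string check recommended above, assuming success is signalled by the flag_found_last_attempt_flag column. The helper names are hypothetical, the real is_event_success also handles other archive-specific flag columns that this page does not show, and treating whitespace-only strings as empty is an extra assumption. It accepts a dict or a pandas row, since both provide .get.

```python
import math
from collections.abc import Mapping
from typing import Any


def _has_flag(value: Any) -> bool:
    """Return True only for a present, non-empty flag value (hypothetical helper)."""
    if value is None:
        return False
    # Missing cells in a pandas float column arrive as NaN, which is truthy.
    if isinstance(value, float) and math.isnan(value):
        return False
    if isinstance(value, str):
        # An empty (or whitespace-only) string is not a captured flag.
        return value.strip() != ""
    return bool(value)


def is_event_success(row: Mapping[str, Any]) -> bool:
    """Sketch: success means flag_found_last_attempt_flag holds a real value."""
    return _has_flag(row.get("flag_found_last_attempt_flag"))


print(is_event_success({"flag_found_last_attempt_flag": "example-flag"}))  # True
print(is_event_success({"flag_found_last_attempt_flag": ""}))              # False
print(is_event_success({"flag_found_last_attempt_flag": float("nan")}))    # False
```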

Comment thread airtbench/frontend.py
Comment on lines +1010 to +1051
def render_system_logs(failure_df: pd.DataFrame) -> None:
    st.header("Terminal Telemetry")
    if failure_df.empty:
        st.info("The system logs are empty. No signals detected.")
        return

    # Terminal-style container
    st.markdown(
        """
        <div style="background-color: #000; padding: 20px; border-radius: 5px; border: 1px solid #333; font-family: 'Fira Code', monospace; height: 600px; overflow-y: scroll;">
        """,
        unsafe_allow_html=True,
    )

    # Prioritize 'timestamp' or 'time' column if it exists
    time_col = next((c for c in ["timestamp", "time", "created_at"] if c in failure_df.columns), None)

    for _, row in failure_df.tail(100).iloc[::-1].iterrows():
        if time_col:
            raw_time = row[time_col]
            if isinstance(raw_time, (int, float)):
                ts = datetime.datetime.fromtimestamp(raw_time, tz=datetime.timezone.utc).strftime("%H:%M:%S")
            else:
                ts = str(raw_time)[:19]
        else:
            ts = "LIVE"

        model = row.get("model_display", "UNKNOWN")
        challenge = row.get("challenge", "UNKNOWN")
        is_success = row.get("is_success")

        color = "#00ff41" if is_success else "#ff4b4b"
        status = "SUCCESS" if is_success else "FAILURE"

        msg = f"[{ts}] {model} @ {challenge} >> {status}"
        if not is_success:
            monster = get_monster_info(row)["monster"]
            msg += f" (ENTITY: {monster})"

        st.markdown(f'<p style="color: {color}; margin: 0; line-height: 1.2;">{msg}</p>', unsafe_allow_html=True)

    st.markdown("</div>", unsafe_allow_html=True)

Priority: high

Issues Identified:

  1. Streamlit HTML Rendering Bug: In Streamlit, each st.markdown call renders as an independent block in the DOM. Opening a <div> in one call, rendering <p> tags in later calls inside the loop, and closing the <div> in a final call does not nest the lines: the browser auto-closes the first <div> within its own block, so the <p> lines render outside the terminal container and the layout breaks.
  2. NaN Timestamp Crash: pandas stores missing values as NaN, which is a float. isinstance(raw_time, (int, float)) therefore evaluates to True for NaN, and datetime.datetime.fromtimestamp(raw_time, ...) then raises a ValueError instead of falling back to a placeholder.
  3. HTML Escaping: model, challenge, and monster are interpolated into the HTML unescaped, so values containing characters such as < or & can break the layout or, because unsafe_allow_html=True is set, enable XSS.
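Issue 2 can be reproduced with the standard library alone. This is a minimal sketch; fromtimestamp_rejects is a hypothetical helper, and only the exception type is checked because the exact ValueError message differs between the C and pure-Python datetime implementations.

```python
import datetime


def fromtimestamp_rejects(raw_time: float) -> bool:
    """Return True if datetime refuses to convert raw_time to a UTC datetime."""
    try:
        datetime.datetime.fromtimestamp(raw_time, tz=datetime.timezone.utc)
    except ValueError:
        return True
    return False


nan = float("nan")  # pandas stores a missing value in a float column as NaN
print(isinstance(nan, (int, float)))  # True: NaN passes the original type check
print(fromtimestamp_rejects(nan))     # True: but the conversion raises ValueError
print(fromtimestamp_rejects(0.0))     # False: a real epoch value converts fine
```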

Solution:

Accumulate the HTML lines in a list, check pd.notna(raw_time) before processing, escape the variables, and render everything in a single st.markdown call.

# Needs module-level imports: datetime, html, pandas as pd, streamlit as st
def render_system_logs(failure_df: pd.DataFrame) -> None:
    st.header("Terminal Telemetry")
    if failure_df.empty:
        st.info("The system logs are empty. No signals detected.")
        return

    # Prioritize 'timestamp' or 'time' column if it exists
    time_col = next((c for c in ["timestamp", "time", "created_at"] if c in failure_df.columns), None)

    log_lines = []
    for _, row in failure_df.tail(100).iloc[::-1].iterrows():
        if time_col and pd.notna(row[time_col]):
            raw_time = row[time_col]
            if isinstance(raw_time, (int, float)):
                ts = datetime.datetime.fromtimestamp(raw_time, tz=datetime.timezone.utc).strftime("%H:%M:%S")
            else:
                ts = str(raw_time)[:19]
        else:
            ts = "LIVE"

        model = html.escape(str(row.get("model_display", "UNKNOWN")))
        challenge = html.escape(str(row.get("challenge", "UNKNOWN")))
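        # bool(NaN) is True, so treat a missing success value as a failure explicitly
        raw_success = row.get("is_success")
        is_success = bool(raw_success) if pd.notna(raw_success) else False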

        color = "#00ff41" if is_success else "#ff4b4b"
        status = "SUCCESS" if is_success else "FAILURE"

        msg = f"[{ts}] {model} @ {challenge} >> {status}"
        if not is_success:
            monster = html.escape(str(get_monster_info(row)["monster"]))
            msg += f" (ENTITY: {monster})"

        log_lines.append(f'<p style="color: {color}; margin: 0; line-height: 1.2;">{msg}</p>')

    log_html = f"""
    <div style="background-color: #000; padding: 20px; border-radius: 5px; border: 1px solid #333; font-family: 'Fira Code', monospace; height: 600px; overflow-y: scroll;">
        {"".join(log_lines)}
    </div>
    """
    st.markdown(log_html, unsafe_allow_html=True)
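One way to make the per-row formatting testable without a running Streamlit app is to pull it into a pure function. This is an illustrative sketch, not part of the suggestion: format_timestamp and format_log_line are hypothetical names, and the sketch uses math.isnan instead of pd.notna so it runs with the standard library alone (pd.notna also catches pd.NaT, so it remains the better check inside the dashboard).

```python
import datetime
import html
import math
from typing import Any, Optional


def format_timestamp(raw_time: Any) -> str:
    """Render a timestamp cell as HH:MM:SS (epoch) or a 19-character string prefix."""
    # Missing values fall back to the same "LIVE" label the dashboard uses.
    if raw_time is None or (isinstance(raw_time, float) and math.isnan(raw_time)):
        return "LIVE"
    if isinstance(raw_time, (int, float)):
        return datetime.datetime.fromtimestamp(raw_time, tz=datetime.timezone.utc).strftime("%H:%M:%S")
    return str(raw_time)[:19]


def format_log_line(
    raw_time: Any, model: Any, challenge: Any, is_success: bool, monster: Optional[str] = None
) -> str:
    """Build one escaped <p> line for the terminal container."""
    color = "#00ff41" if is_success else "#ff4b4b"
    status = "SUCCESS" if is_success else "FAILURE"
    # Escape every interpolated value (timestamp included) before it reaches unsafe_allow_html.
    ts = html.escape(format_timestamp(raw_time))
    msg = f"[{ts}] {html.escape(str(model))} @ {html.escape(str(challenge))} >> {status}"
    if not is_success and monster is not None:
        msg += f" (ENTITY: {html.escape(str(monster))})"
    return f'<p style="color: {color}; margin: 0; line-height: 1.2;">{msg}</p>'


lines = [
    format_log_line(0, "<b>model</b>", "c&d", False, "example-entity"),
    format_log_line(None, "model", "c1", True),
]
print("".join(lines))
```

With a helper like this, the loop body only appends format_log_line(...) to log_lines, and the single st.markdown call stays unchanged.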

Comment thread airtbench/frontend.py (outdated)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: BleakNarratives <bleaknarratives@gmail.com>