Enhance Code City Apocalypse Dashboard #34
- Add 'System Logs' tab with terminal-style telemetry
- Fix 'Model Success Rates' leaderboard to count unique successes
- Include all active models in leaderboard (even with 0 successes)
- Update global metrics labels and 'is_event_success' logic
- Enhance 'Digital Bestiary' with success styling and danger colors
- Add safety checks for empty data in manifests
- Refine UI theme with neon-green success states

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Code Review
This pull request enhances the dashboard in airtbench/frontend.py by adding a new "System Logs" tab with terminal telemetry, styling improvements for the digital bestiary, and better handling of success metrics and empty data. The review feedback identifies a critical issue in the render_system_logs function where splitting HTML tags across multiple st.markdown calls breaks the layout, alongside potential crashes from NaN timestamps and missing HTML escaping. Additionally, it is recommended to ensure that flag_found_last_attempt_flag is not an empty string when determining event success.
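The empty-string concern in the last sentence can be illustrated with a minimal sketch. The PR's actual is_event_success implementation is not shown here, so `has_flag` is a hypothetical helper, and treating whitespace-only values as missing is an assumption rather than the project's confirmed behavior.

```python
import math


def has_flag(value) -> bool:
    """Return True only when a flag value is actually present.

    Hypothetical helper: the real column handling in the PR may differ.
    """
    if value is None:
        return False
    # NaN is how pandas represents a missing cell; treat it as "no flag".
    if isinstance(value, float) and math.isnan(value):
        return False
    # Empty and whitespace-only strings do not count as a found flag.
    return str(value).strip() != ""
```

A check along these lines could be applied to flag_found_last_attempt_flag before an event is counted as a success, so that a present-but-empty cell is not mistaken for a captured flag.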
def render_system_logs(failure_df: pd.DataFrame) -> None:
    st.header("Terminal Telemetry")
    if failure_df.empty:
        st.info("The system logs are empty. No signals detected.")
        return

    # Terminal-style container
    st.markdown(
        """
        <div style="background-color: #000; padding: 20px; border-radius: 5px; border: 1px solid #333; font-family: 'Fira Code', monospace; height: 600px; overflow-y: scroll;">
        """,
        unsafe_allow_html=True,
    )

    # Prioritize 'timestamp' or 'time' column if it exists
    time_col = next((c for c in ["timestamp", "time", "created_at"] if c in failure_df.columns), None)

    for _, row in failure_df.tail(100).iloc[::-1].iterrows():
        if time_col:
            raw_time = row[time_col]
            if isinstance(raw_time, (int, float)):
                ts = datetime.datetime.fromtimestamp(raw_time, tz=datetime.timezone.utc).strftime("%H:%M:%S")
            else:
                ts = str(raw_time)[:19]
        else:
            ts = "LIVE"

        model = row.get("model_display", "UNKNOWN")
        challenge = row.get("challenge", "UNKNOWN")
        is_success = row.get("is_success")

        color = "#00ff41" if is_success else "#ff4b4b"
        status = "SUCCESS" if is_success else "FAILURE"

        msg = f"[{ts}] {model} @ {challenge} >> {status}"
        if not is_success:
            monster = get_monster_info(row)["monster"]
            msg += f" (ENTITY: {monster})"

        st.markdown(f'<p style="color: {color}; margin: 0; line-height: 1.2;">{msg}</p>', unsafe_allow_html=True)

    st.markdown("</div>", unsafe_allow_html=True)
Issues Identified:
- Streamlit HTML Rendering Bug: In Streamlit, each st.markdown call renders as an independent block in the DOM. Opening a <div> in one call, rendering <p> tags in subsequent calls inside a loop, and closing the <div> in a final call causes the browser to auto-close the <div> immediately, breaking the terminal container layout.
- NaN Timestamp Crash: NaN values in pandas are of type float. Therefore, isinstance(raw_time, (int, float)) evaluates to True for NaN, and the subsequent call to datetime.datetime.fromtimestamp(raw_time) raises a ValueError.
- HTML Escaping: Escaping model, challenge, and monster prevents layout breakage or XSS if they contain special characters.
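The NaN point can be reproduced without pandas. The following is a minimal sketch, assuming a standalone helper named `format_ts` (hypothetical, not part of the PR) and using the standard library's math.isnan in place of pd.notna.

```python
import datetime
import math


def format_ts(raw_time) -> str:
    """Format a log timestamp, falling back to "LIVE" when it is missing."""
    if raw_time is None:
        return "LIVE"
    if isinstance(raw_time, (int, float)):
        # float("nan") passes the isinstance check above, so guard explicitly
        # before handing the value to fromtimestamp.
        if math.isnan(raw_time):
            return "LIVE"
        return datetime.datetime.fromtimestamp(
            raw_time, tz=datetime.timezone.utc
        ).strftime("%H:%M:%S")
    # Non-numeric values (e.g. ISO strings) are truncated to second precision.
    return str(raw_time)[:19]
```

Because the guard runs before the conversion, a missing timestamp degrades to the same "LIVE" placeholder the dashboard already uses when no time column exists.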
Solution:
Accumulate the HTML lines in a list, check pd.notna(raw_time) before processing, escape the variables, and render everything in a single st.markdown call.
def render_system_logs(failure_df: pd.DataFrame) -> None:
    st.header("Terminal Telemetry")
    if failure_df.empty:
        st.info("The system logs are empty. No signals detected.")
        return

    # Prioritize 'timestamp' or 'time' column if it exists
    time_col = next((c for c in ["timestamp", "time", "created_at"] if c in failure_df.columns), None)

    log_lines = []
    for _, row in failure_df.tail(100).iloc[::-1].iterrows():
        if time_col and pd.notna(row[time_col]):
            raw_time = row[time_col]
            if isinstance(raw_time, (int, float)):
                ts = datetime.datetime.fromtimestamp(raw_time, tz=datetime.timezone.utc).strftime("%H:%M:%S")
            else:
                ts = str(raw_time)[:19]
        else:
            ts = "LIVE"

        model = html.escape(str(row.get("model_display", "UNKNOWN")))
        challenge = html.escape(str(row.get("challenge", "UNKNOWN")))
        is_success = bool(row.get("is_success"))

        color = "#00ff41" if is_success else "#ff4b4b"
        status = "SUCCESS" if is_success else "FAILURE"

        msg = f"[{ts}] {model} @ {challenge} >> {status}"
        if not is_success:
            monster = html.escape(get_monster_info(row)["monster"])
            msg += f" (ENTITY: {monster})"

        log_lines.append(f'<p style="color: {color}; margin: 0; line-height: 1.2;">{msg}</p>')

    log_html = f"""
    <div style="background-color: #000; padding: 20px; border-radius: 5px; border: 1px solid #333; font-family: 'Fira Code', monospace; height: 600px; overflow-y: scroll;">
    {"".join(log_lines)}
    </div>
    """
    st.markdown(log_html, unsafe_allow_html=True)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: BleakNarratives <bleaknarratives@gmail.com>
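The accumulate-then-render pattern can also be checked in isolation from Streamlit. This is a rough sketch under stated assumptions: `build_log_html` is a hypothetical function, the input is simplified to (message, is_success) pairs, and the styling is trimmed for brevity.

```python
import html


def build_log_html(entries) -> str:
    """Build one HTML string from a list of (message, is_success) pairs."""
    lines = []
    for message, is_success in entries:
        color = "#00ff41" if is_success else "#ff4b4b"
        # Escape untrusted text so "<", ">" and "&" render literally.
        safe = html.escape(str(message))
        lines.append(f'<p style="color: {color}; margin: 0;">{safe}</p>')
    # The opening and closing <div> live in the same string, so a single
    # st.markdown(..., unsafe_allow_html=True) call would receive balanced markup.
    return '<div style="background-color: #000;">' + "".join(lines) + "</div>"
```

Keeping the string construction in a pure function like this makes the escaping and the balanced container straightforward to unit test, with the single st.markdown call left as a thin wrapper.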
This PR provides a comprehensive upgrade to the 'Code City Apocalypse' frontend dashboard.
Key changes include:
- … .lore-card--success).
- … is_event_success to handle archive-specific flag columns and added safety checks for empty datasets in visualizations.

PR created automatically by Jules for task 17746370473832010499 started by @BleakNarratives