feat: Enhance Code City Apocalypse dashboard and metrics #35

Open
BleakNarratives wants to merge 4 commits into main from feat/code-city-dashboard-enhancements-15041334110763897136

@BleakNarratives
Owner

The dashboard for AIRTBench (Code City Apocalypse) has been significantly upgraded. Key improvements include a new terminal-style telemetry feed in the "System Logs" tab, accurate success tracking for both live and archived agent runs, and a more robust model leaderboard that doesn't skip agents with zero successes. Visual styling was also refined to better reflect the "digital restoration" theme when sectors are secured.


PR created automatically by Jules for task 15041334110763897136 started by @BleakNarratives

- Add a new "System Logs" tab with a live telemetry feed and timestamp parsing.
- Update success detection logic to include archive data via `flag_found_last_attempt_flag`.
- Ensure all active models (including those with 0 successes) are displayed in the leaderboard.
- Optimize 'Sectors Secured' metric to count unique successful challenges rather than total attempts.
- Add visual enhancements: neon-green success glow for secured beacons and normalized difficulty mapping.
- Add guard clauses for empty monster data to prevent dashboard crashes.
- Verified with Playwright and passed ruff/mypy linting.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
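
The leaderboard and "Sectors Secured" changes listed in the commit message can be illustrated with a minimal sketch. This is not the code from `airtbench/frontend.py`; the `summarize_runs` helper and the run fields `model`, `challenge`, and `success` are hypothetical names chosen for illustration, and the success rate is assumed to be successful runs divided by total runs.

```python
from collections import defaultdict


def summarize_runs(runs):
    """Return (leaderboard, sectors_secured) for a list of run dicts.

    Assumed (hypothetical) run shape:
    {"model": str, "challenge": str, "success": bool}
    """
    attempts = defaultdict(int)    # model -> number of runs
    successes = defaultdict(int)   # model -> number of successful runs
    cleared = defaultdict(set)     # model -> unique challenges solved

    for run in runs:
        model = run["model"]
        attempts[model] += 1       # every model with a run gets a row
        if run["success"] is True:
            successes[model] += 1
            cleared[model].add(run["challenge"])

    leaderboard = sorted(
        (
            {
                "model": model,
                "attempts": count,
                "successes": successes[model],
                "success_rate": successes[model] / count,
            }
            for model, count in attempts.items()
        ),
        # Most successes first, then alphabetical for a stable order.
        key=lambda row: (-row["successes"], row["model"]),
    )

    # Count each solved challenge once, however many runs solved it.
    secured = set().union(*cleared.values())
    return leaderboard, len(secured)
```

Building the leaderboard from `attempts` rather than from the successful runs is what keeps zero-success models visible, and the set union is what stops repeated wins on the same challenge from inflating the sector count.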
@google-labs-jules

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces several enhancements to the Streamlit frontend, including a new 'System Telemetry Feed' tab, improved leaderboard success rate calculations that correctly handle models with zero cleared sectors, and visual updates like success-themed lore cards. Additionally, Playwright has been added as a development dependency. The review feedback highlights three important issues: a potential logic bug where boolean `False` values in `flag_found_last_attempt_flag` could be incorrectly classified as successes, potential null values appearing in the leaderboard, and a Python 3.10 compatibility issue when parsing ISO timestamps containing the 'Z' suffix.
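
Two of the issues raised in the review can be sketched in isolation. The helpers below are hypothetical and are not taken from `airtbench/frontend.py`. `flag_found` assumes plain Python values (a NumPy or pandas boolean would need an explicit conversion first), and `parse_iso_timestamp` assumes the only incompatibility is the trailing `Z`, which `datetime.fromisoformat` accepts only from Python 3.11 onward.

```python
from datetime import datetime, timezone


def flag_found(value):
    """Count a run as a success only when the flag value is literally True.

    A "not missing" check would also accept False, which is the kind of
    misclassification the review describes.
    """
    return value is True


def parse_iso_timestamp(text):
    """Parse an ISO 8601 timestamp, tolerating a trailing 'Z' on Python 3.10."""
    if text.endswith("Z"):
        # Rewrite the UTC designator as an explicit offset, which
        # fromisoformat accepts on older interpreters as well.
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


if __name__ == "__main__":
    print(flag_found(False))
    print(parse_iso_timestamp("2026-06-04T10:14:00Z").astimezone(timezone.utc))
```

The string rewrite keeps the parser on the standard library; a third-party date parser would also work but adds a dependency for a one-character fix.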

Comment thread airtbench/frontend.py Outdated
Comment thread airtbench/frontend.py Outdated
Comment thread airtbench/frontend.py
BleakNarratives and others added 3 commits June 4, 2026 10:14
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: BleakNarratives <bleaknarratives@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: BleakNarratives <bleaknarratives@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: BleakNarratives <bleaknarratives@gmail.com>