feat: Enhance Code City Apocalypse dashboard and metrics #35
BleakNarratives wants to merge 4 commits into
- Add a new "System Logs" tab with a live telemetry feed and timestamp parsing.
- Update success detection logic to include archive data via `flag_found_last_attempt_flag`.
- Ensure all active models (including those with 0 successes) are displayed in the leaderboard.
- Optimize the 'Sectors Secured' metric to count unique successful challenges rather than total attempts.
- Add visual enhancements: neon-green success glow for secured beacons and normalized difficulty mapping.
- Add guard clauses for empty monster data to prevent dashboard crashes.
- Verified with Playwright and passed ruff/mypy linting.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
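The 'Sectors Secured' and leaderboard items above could be sketched roughly as follows. This is a minimal illustration rather than the PR's actual code: the record fields `model`, `challenge`, and `success`, and both function names, are assumptions made for the example.

```python
from collections import defaultdict


def sectors_secured(runs):
    """Count unique challenges with at least one successful attempt."""
    # A set collapses repeated successes on the same challenge.
    # An empty run list simply yields 0, so no special guard is needed.
    return len({r["challenge"] for r in runs if r["success"] is True})


def leaderboard(runs, active_models):
    """Return (model, unique successes, attempts) rows for every active model."""
    solved = defaultdict(set)
    attempts = defaultdict(int)
    for r in runs:
        attempts[r["model"]] += 1
        if r["success"] is True:
            solved[r["model"]].add(r["challenge"])
    # Iterate over active_models, not over solved, so that models
    # with zero successes (or zero attempts) still get a row.
    rows = [(m, len(solved[m]), attempts[m]) for m in active_models]
    # Most successes first; model name breaks ties for a stable order.
    return sorted(rows, key=lambda row: (-row[1], row[0]))
```

Building the rows from the list of active models, instead of from the successful runs, is what keeps zero-success models visible.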
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
Code Review
This pull request introduces several enhancements to the Streamlit frontend, including a new 'System Telemetry Feed' tab, improved leaderboard success rate calculations that correctly handle models with zero cleared sectors, and visual updates like success-themed lore cards. Additionally, Playwright has been added as a development dependency. The review feedback highlights three important issues: a potential logic bug where boolean False values in flag_found_last_attempt_flag could be incorrectly classified as successes, potential null values appearing in the leaderboard, and a Python 3.10 compatibility issue when parsing ISO timestamps containing the 'Z' suffix.
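Two of the issues raised in this review can be illustrated with a short sketch. It is not the code from the PR: the helper names `parse_iso_timestamp` and `is_success` and the `flag_found` field are assumptions, and only `flag_found_last_attempt_flag` comes from the PR description. The sketch also assumes plain Python dict records; values read from a DataFrame may be NumPy booleans, which would need a different comparison.

```python
from datetime import datetime


def parse_iso_timestamp(value):
    """Parse an ISO 8601 timestamp, tolerating a trailing 'Z'."""
    if not value:
        return None
    # datetime.fromisoformat() accepts the 'Z' suffix only from
    # Python 3.11 onward, so rewrite it as an explicit UTC offset.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        # Unparseable input is reported as None instead of raising.
        return None


def is_success(record):
    """Count a run as successful only if a flag field is exactly True."""
    # Checking "is True" avoids treating False (or any merely non-null
    # value) as a success, which a presence-only check would do.
    return (
        record.get("flag_found") is True
        or record.get("flag_found_last_attempt_flag") is True
    )
```

Returning `None` for unparseable timestamps lets the caller decide how to render the log line instead of crashing the tab.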
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: BleakNarratives <bleaknarratives@gmail.com>
The dashboard for AIRTBench (Code City Apocalypse) has been significantly upgraded. Key improvements include a new terminal-style telemetry feed in the "System Logs" tab, accurate success tracking for both live and archived agent runs, and a more robust model leaderboard that doesn't skip agents with zero successes. Visual styling was also refined to better reflect the "digital restoration" theme when sectors are secured.
PR created automatically by Jules for task 15041334110763897136 started by @BleakNarratives