fix: run Codex PR tests in CI#131
Conversation
…le /v1 URL injection Agent-Logs-Url: https://github.com/Daltonganger/opencode-bar/sessions/2672ff4d-23d5-4e27-9d06-5c3ec83b03b4 Co-authored-by: Daltonganger <17501732+Daltonganger@users.noreply.github.com>
Keep the safer OpenCode OpenAI auth decoding from main while preserving the PR's API-key and JSONC config coverage for Codex usage.
Avoid treating numeric 28-31 day windows as monthly so Codex usage labels remain exact across February and other month lengths. Also run real XCTest coverage in CI and align merge-related TokenManager call sites with the updated models.
…b-usage # Conflicts: # CopilotMonitor/CopilotMonitor/Providers/CodexProvider.swift # CopilotMonitor/CopilotMonitorTests/CodexProviderTests.swift
✅ AI Code Review CompletedReview finished. Check the PR for inline comments. 📋 View Logs | 🤖 Model: |
There was a problem hiding this comment.
Summary
Solid little fix PR — three things that all needed doing:
- Wires up the missing test files (
CodexProviderTests,OpenCodeAuthDecodingTests,OpenCodeZenProviderTests) into the Xcode test target. They were sitting in theCopilotMonitorTestsdirectory but never compiled or executed. ngl this is the kind of silent CI gap that bites you 6 months later when someone "ran the tests" and they all "passed" because nothing ran. nice catch. - Strict decoding in
buildStandardPayload— instead of silently fabricating aRateLimitWindow(used_percent: 0, ...)when no window can be resolved, it now throwsdecodingError("Missing rate-limit window"). This is the right call: showing fake0% usedis worse than showing an error, because the user thinks they have full quota. - Module name fix in
OpenCodeAuthDecodingTests(@testable import CopilotMonitor→@testable import OpenCode_Bar). The product isOpenCode Bar, so the module isOpenCode_Bar. Without this, the file simply wouldn't compile once added to the target.
The two test fixtures that previously had only secondary_window got a primary_window added — necessary because resolvedWindows would otherwise treat the lone secondary as the short window, breaking the "30d secondary" assertion.
What's good
- The
buildStandardPayloadthrow is the right semantic:resolvedWindowsalready has four fallback strategies (limit_window_seconds→reset_after_secondsheuristic →primary_secondary_fallback→key_order_fallback), so reaching nil means the response really has zero usable windows. Throwing here is honest. try XCTUnwrap(payload.details.dailyUsage)instead ofpayload.details.dailyUsage == 25.0— proper handling of the optional, no force-unwraps. clean.xcbeautifyinstall before the build/test steps is fine; macos-14 runners don't have it pre-installed.- The pbxproj diff is clean: PBXFileReference, PBXBuildFile, PBXGroup, and PBXSourcesBuildPhase all updated symmetrically. No orphaned references.
Things to consider (non-blocking)
- No test for the new throw path. This PR introduces a new failure mode (
ProviderError.decodingError("Missing rate-limit window")) but there's noXCTAssertThrowsErrortest exercising arate_limit: {}or spark-only response. Worth adding so a future refactor doesn't silently revert the strict behavior. See inline. - Behavior change for fresh-account edge case. If the Codex API ever returns an empty
rate_limit: {}for a brand-new OAuth account, the user now sees an error instead of "0% used". Per the AGENTS.mdError Status Display Instead of Loadingreflection, the caller should surface this asErrorstatus and not get stuck in loading — please confirm this works end-to-end on a fresh account, or that the upstream caller already handlesdecodingErrorgracefully. - Pre-existing inconsistency, not introduced here:
AntigravityProviderVarintTests.swiftstill uses@testable import CopilotMonitor. It's not in the test target so it doesn't break the build, but it's a follow-up worth doing while the import drift is fresh in your head.
Verdict
Approve. This is the kind of housekeeping fix that quietly raises the floor — three real test files are now actually executed, and decoding fails loudly instead of lying. Ship it.
Re-review tips: This bot will automatically re-review on every push to this branch. You can also mention @op-gg-ai-devops in a PR comment to trigger another review with optional direction, e.g.:
@op-gg-ai-devops review@op-gg-ai-devops review the test coverage only@op-gg-ai-devops review focusing on security
✅ AI Code Review CompletedReview finished. Check the PR for inline comments. 📋 View Logs | 🤖 Model: |
There was a problem hiding this comment.
Summary
Tiny follow-up commit on top of an already-approved PR — adds a "Service" label for the Codex provider when it's routed through an external usage endpoint (e.g. https://codex.2631.eu/v1) instead of plain ChatGPT. ngl this is exactly the right fix: when the OAuth credential's email belongs to a personal OpenAI account but requests are actually being load-balanced through some third-party Codex proxy, showing Codex (alice@gmail.com) was straight-up misleading. now you see Codex (Codex.2631.eu) and Service: Codex.2631.eu in the submenu, which is honest about what's happening.
The implementation is clean:
CodexEndpointConfiguration.externalServiceDisplayNameis gated on bothusesOpenAIProviderBaseURL == trueANDmode == .external, so the explicitusageURLoverride path correctly returnsnil(verified by the newXCTAssertNilassertion). This means users who configure their ownusageURLkeep seeing email labels, which is the right call.- The
Codexcapitalization forcodex.<rest>hosts is a nice touch —Codex.2631.eureads way better thancodex.2631.euin a menu. - Hoisting
codexServiceDisplayNameoutside the per-accountforloop inStatusBarControlleravoids redundantgetCodexEndpointConfiguration()calls during multi-account rendering.
What's good
- The label-vs-email branch ordering in
StatusBarControlleris correct: external-service check comes first, then per-provider email fallbacks. No regression for.claudeor for Codex setups without a base URL override. - Test coverage for both the positive case (
Codex.2631.eu) and the explicit-override negative case (XCTAssertNil) — that's exactly what should be tested when adding a computed property like this. ProviderMenuBuilderswaps fromEmail:toService:with a sensible icon (server.rack) — visually it's clear this row represents an endpoint, not a person.
Things to consider (non-blocking)
- Multi-account email visibility loss. When two+ Codex accounts coexist with
usesOpenAIProviderBaseURL=true, both rows now show the sameCodex #N (Codex.2631.eu)label, and the submenu hides the email entirely (Service:row replacesEmail:row). The#Nindex still differentiates them, but if a user has e.g. work + personal OAuth credentials both routed through the same proxy, they can no longer eyeball which row corresponds to which credential. Worth thinking about whether the submenu should show bothService:andEmail:when multiple accounts share an external endpoint. Not a blocker — the current behavior is defensible (the email may not even represent the actual proxy user). displayName(forExternalHost:)unit-test gap. The static helper is only tested indirectly through one fixture (codex.2631.eu). Adding a couple of direct unit tests would lock down the contract:displayName(forExternalHost: "proxy.example.com")returnsproxy.example.comas-is,displayName(forExternalHost: "Codex.example.com")still returnsCodex.example.com, etc. Cheap insurance against future "let's make non-codex hosts look pretty too" refactors that might change behavior silently.- Prior review's
XCTAssertThrowsErrorfollow-up still pending. The previous bot review asked for a test covering theMissing rate-limit windowthrow path inbuildStandardPayload. This commit didn't address it. Not in scope for this commit, just flagging that the carry-over is still open.
Verdict
Approve. Small, focused UX fix that makes the menu more honest about external Codex routing. The test additions match the new behavior, the configuration gating is correct, and there's no regression risk for the default ChatGPT path. Ship it.
✅ AI Code Review CompletedReview finished. Check the PR for inline comments. 📋 View Logs | 🤖 Model: |
There was a problem hiding this comment.
Summary
Third pass on this PR — the new 7f634a2 commit on top of the already-approved branch. Core changes since the last review:
StatusBarControllerconsolidation: when Codex routes through an external proxy (usesOpenAIProviderBaseURL+.externalmode), only the most-used account row is rendered.OpenCodeZenProvider.parseModelCostsrewrite: handles the current multi-lineoc statstable format (model name on one row,Cost $X.XXon a later row).- Service label plumbing: new
externalServiceDisplayNameonCodexEndpointConfigurationplus a "Service:" submenu row inProviderMenuBuilder. - Test wiring: previously-orphan test files now actually compile and run.
honestly this is solid follow-up work — the parser refactor is much more readable than the old single-regex blob, the Codex.2631.eu hostname capitalization is a nice touch, and displayName(forExternalHost:) being pure-static makes it trivially testable. shipping the missing test files into the Xcode target and adding xcbeautify to CI is exactly the kind of unglamorous cleanup that earns trust.
Two non-blocking notes inline — both about edge cases that are unlikely today but worth pinning down before someone else trips over them. Approving.
What I checked
- All 11 changed files end-to-end against the diff.
- Compared the latest commit (
7f634a2) against the two prior bot approvals (d3e1b08,0f00ca3) to focus on what's new. - Verified
OpenCodeAuthDecodingTests.swiftmodule rename matchesPRODUCT_NAME = "OpenCode Bar"(Swift module =OpenCode_Bar).AntigravityProviderVarintTests.swiftstill uses the oldCopilotMonitorimport — pre-existing, not in this PR's scope. - Cross-checked CI:
xcbeautifyis referenced in three later steps oftest.yml, so the install step is genuinely required, not vestigial. - Confirmed
displayAccountsconsolidation is scoped only to the top-level menu loop (lines ~1860–1944); subscription presets, prediction logic, and submenus still iterate the fullaccountslist.
Smoke test note
This review env is Linux, so I didn't re-run xcodebuild. The PR description shows 135/135 passing locally on Debug, and CI's Build & Test is currently in progress on this commit.
Coding convention check
AGENTS.md→ "Default review language: English" — locked English, all good.- No Korean strings introduced. SwiftLint passing on HEAD.
✅ AI Code Review CompletedReview finished. Check the PR for inline comments. 📋 View Logs | 🤖 Model: |
There was a problem hiding this comment.
Summary
Fourth pass on this PR — edee0cd is a focused follow-up that closes out every non-blocking nit from the prior three reviews. honestly this is textbook how to address review feedback without scope-creep.
What's new since the last approval (7f634a2):
- Throw test added —
testDecodeUsagePayloadThrowsWhenRateLimitHasNoBaseWindownow pins down the strict-decoding behavior inCodexProvider.swift:613. Anyone who later tries to revert this with a silent?? RateLimitWindow(used_percent: 0, ...)fallback will eat a CI failure. exactly the kind of test that pays for itself. displayName(forExternalHost:)direct unit tests added inTokenManagerTests.swift:100covering codex-prefixed hosts, non-codex hosts, and the"codex"no-dot edge case. clean contract lock-in.parseModelCostsis now section-aware —MODEL USAGE/TOOL USAGE/OVERVIEW/COST & TOKENSare detected as section headers and onlyMODEL USAGErows contribute tomodelCosts. TheTOOL USAGEcost-row leak is now impossible. NewtestParseModelCostsIgnoresCostRowsOutsideModelUsageSectionlocks this down.- Multi-account consolidation got smarter — the
accounts.max(by:)tiebreak is now deterministic (smallestaccountIndexwins on equal usage), and the consolidated label gained a, N accountssuffix so users at least know more accounts exist behind the single row. mitigates the earlier "extra accounts disappear" concern. - Throw error message now identifies the account —
"Missing rate-limit window for {email|accountId|unknown account} from {authSource}". Way more useful than a bare "Missing rate-limit window" when you're staring at logs at 2am.
What's good
- Every non-blocking suggestion from the previous three reviews got addressed in this commit. that's a clean signal.
- The
parseModelCostsstate machine reads naturally now: section header → toggleisInModelUsageSection→ only model/cost lines inside the section count. the separation ofisStatsSectionHeaderfromisStatsMetricLineis the right factoring. - Deterministic tiebreak on
accountIndexis a small thing but it eliminates flaky menu ordering on equal-usage accounts. nice. - All test additions have positive AND negative cases (Codex.example.com vs proxy.example.com, MODEL USAGE vs TOOL USAGE, missing-window vs valid-fixture).
Things to consider (non-blocking)
- One minor leftover noted inline about how
decodeUsagePayload's outercatchre-wraps the freshly thrownProviderError.decodingError— the carefully formatted "Missing rate-limit window for X from Y" message ends up double-prefixed whenerrorDescriptionis rendered to the user. pre-existing behavior, but this PR makes the path live for the first time.
Verdict
Approve. Final commit is exactly the right shape: addresses the prior review tail, adds tests for the new behavior, doesn't touch anything unrelated. ship it.
Re-review tips: This bot will automatically re-review on every push to this branch. You can also mention @op-gg-ai-devops in a PR comment to trigger another review with optional direction, e.g.:
@op-gg-ai-devops review@op-gg-ai-devops review the test coverage only@op-gg-ai-devops review focusing on security
Summary
xcbeautifybefore CI uses it.Validation
xcodebuild test -project CopilotMonitor/CopilotMonitor.xcodeproj -scheme CopilotMonitor -configuration Debug -derivedDataPath /tmp/opencode-bar-pr-merge-derived-data CODE_SIGNING_ALLOWED=NO— 135 tests, 9 skipped, 0 failures.xcodebuild build -project CopilotMonitor/CopilotMonitor.xcodeproj -scheme CopilotMonitor -configuration Debug -derivedDataPath /tmp/opencode-bar-pr-final-debug-derived-data CODE_SIGNING_ALLOWED=NO— succeeded./tmp/provider_debug.logcompleted provider fetch/menu update.Note: the local Release build path was intentionally stopped because this machine hangs in Swift Release optimization; the GitHub workflow still runs Release validation.