Skip to content

fix: run Codex PR tests in CI#131

Open
Daltonganger wants to merge 11 commits intoopgginc:mainfrom
Daltonganger:fix/minimal-codex-lb-usage
Open

fix: run Codex PR tests in CI#131
Daltonganger wants to merge 11 commits intoopgginc:mainfrom
Daltonganger:fix/minimal-codex-lb-usage

Conversation

@Daltonganger
Copy link
Copy Markdown
Contributor

Summary

  • Add the missing PR test files to the Xcode test target so CI actually runs them.
  • Keep standard Codex usage decoding strict when no usable rate-limit window is present.
  • Fix Codex/OpenCode auth test compile issues and install xcbeautify before CI uses it.

Validation

  • xcodebuild test -project CopilotMonitor/CopilotMonitor.xcodeproj -scheme CopilotMonitor -configuration Debug -derivedDataPath /tmp/opencode-bar-pr-merge-derived-data CODE_SIGNING_ALLOWED=NO — 135 tests, 9 skipped, 0 failures.
  • xcodebuild build -project CopilotMonitor/CopilotMonitor.xcodeproj -scheme CopilotMonitor -configuration Debug -derivedDataPath /tmp/opencode-bar-pr-final-debug-derived-data CODE_SIGNING_ALLOWED=NO — succeeded.
  • Launched the Debug app and confirmed /tmp/provider_debug.log completed provider fetch/menu update.

Note: the local Release build path was intentionally stopped because this machine hangs in Swift Release optimization; the GitHub workflow still runs Release validation.

Daltonganger and others added 8 commits April 2, 2026 11:12
…le /v1 URL injection

Agent-Logs-Url: https://github.com/Daltonganger/opencode-bar/sessions/2672ff4d-23d5-4e27-9d06-5c3ec83b03b4

Co-authored-by: Daltonganger <17501732+Daltonganger@users.noreply.github.com>
Keep the safer OpenCode OpenAI auth decoding from main while preserving the PR's API-key and JSONC config coverage for Codex usage.
Avoid treating numeric 28-31 day windows as monthly so Codex usage labels remain exact across February and other month lengths. Also run real XCTest coverage in CI and align merge-related TokenManager call sites with the updated models.
…b-usage

# Conflicts:
#	CopilotMonitor/CopilotMonitor/Providers/CodexProvider.swift
#	CopilotMonitor/CopilotMonitorTests/CodexProviderTests.swift
@op-gg-ai-devops
Copy link
Copy Markdown
Contributor

op-gg-ai-devops Bot commented Apr 29, 2026

✅ AI Code Review Completed

Review finished. Check the PR for inline comments.


📋 View Logs | 🤖 Model: anthropic/claude-opus-4-7|high

Copy link
Copy Markdown
Contributor

@op-gg-ai-devops op-gg-ai-devops Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Solid little fix PR — three things that all needed doing:

  1. Wires up the missing test files (CodexProviderTests, OpenCodeAuthDecodingTests, OpenCodeZenProviderTests) into the Xcode test target. They were sitting in the CopilotMonitorTests directory but never compiled or executed. ngl this is the kind of silent CI gap that bites you 6 months later when someone "ran the tests" and they all "passed" because nothing ran. nice catch.
  2. Strict decoding in buildStandardPayload — instead of silently fabricating a RateLimitWindow(used_percent: 0, ...) when no window can be resolved, it now throws decodingError("Missing rate-limit window"). This is the right call: showing fake 0% used is worse than showing an error, because the user thinks they have full quota.
  3. Module name fix in OpenCodeAuthDecodingTests (@testable import CopilotMonitor@testable import OpenCode_Bar). The product is OpenCode Bar, so the module is OpenCode_Bar. Without this, the file simply wouldn't compile once added to the target.

The two test fixtures that previously had only secondary_window got a primary_window added — necessary because resolvedWindows would otherwise treat the lone secondary as the short window, breaking the "30d secondary" assertion.

What's good

  • The buildStandardPayload throw is the right semantic: resolvedWindows already has four fallback strategies (limit_window_secondsreset_after_seconds heuristic → primary_secondary_fallbackkey_order_fallback), so reaching nil means the response really has zero usable windows. Throwing here is honest.
  • try XCTUnwrap(payload.details.dailyUsage) instead of payload.details.dailyUsage == 25.0 — proper handling of the optional, no force-unwraps. clean.
  • xcbeautify install before the build/test steps is fine; macos-14 runners don't have it pre-installed.
  • The pbxproj diff is clean: PBXFileReference, PBXBuildFile, PBXGroup, and PBXSourcesBuildPhase all updated symmetrically. No orphaned references.

Things to consider (non-blocking)

  • No test for the new throw path. This PR introduces a new failure mode (ProviderError.decodingError("Missing rate-limit window")) but there's no XCTAssertThrowsError test exercising a rate_limit: {} or spark-only response. Worth adding so a future refactor doesn't silently revert the strict behavior. See inline.
  • Behavior change for fresh-account edge case. If the Codex API ever returns an empty rate_limit: {} for a brand-new OAuth account, the user now sees an error instead of "0% used". Per the AGENTS.md Error Status Display Instead of Loading reflection, the caller should surface this as Error status and not get stuck in loading — please confirm this works end-to-end on a fresh account, or that the upstream caller already handles decodingError gracefully.
  • Pre-existing inconsistency, not introduced here: AntigravityProviderVarintTests.swift still uses @testable import CopilotMonitor. It's not in the test target so it doesn't break the build, but it's a follow-up worth doing while the import drift is fresh in your head.

Verdict

Approve. This is the kind of housekeeping fix that quietly raises the floor — three real test files are now actually executed, and decoding fails loudly instead of lying. Ship it.


Re-review tips: This bot will automatically re-review on every push to this branch. You can also mention @op-gg-ai-devops in a PR comment to trigger another review with optional direction, e.g.:

  • @op-gg-ai-devops review
  • @op-gg-ai-devops review the test coverage only
  • @op-gg-ai-devops review focusing on security

Comment thread CopilotMonitor/CopilotMonitor/Providers/CodexProvider.swift Outdated
@op-gg-ai-devops
Copy link
Copy Markdown
Contributor

op-gg-ai-devops Bot commented Apr 29, 2026

✅ AI Code Review Completed

Review finished. Check the PR for inline comments.


📋 View Logs | 🤖 Model: anthropic/claude-opus-4-7|high

Copy link
Copy Markdown
Contributor

@op-gg-ai-devops op-gg-ai-devops Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Tiny follow-up commit on top of an already-approved PR — adds a "Service" label for the Codex provider when it's routed through an external usage endpoint (e.g. https://codex.2631.eu/v1) instead of plain ChatGPT. ngl this is exactly the right fix: when the OAuth credential's email belongs to a personal OpenAI account but requests are actually being load-balanced through some third-party Codex proxy, showing Codex (alice@gmail.com) was straight-up misleading. now you see Codex (Codex.2631.eu) and Service: Codex.2631.eu in the submenu, which is honest about what's happening.

The implementation is clean:

  • CodexEndpointConfiguration.externalServiceDisplayName is gated on both usesOpenAIProviderBaseURL == true AND mode == .external, so the explicit usageURL override path correctly returns nil (verified by the new XCTAssertNil assertion). This means users who configure their own usageURL keep seeing email labels, which is the right call.
  • The Codex capitalization for codex.<rest> hosts is a nice touch — Codex.2631.eu reads way better than codex.2631.eu in a menu.
  • Hoisting codexServiceDisplayName outside the per-account for loop in StatusBarController avoids redundant getCodexEndpointConfiguration() calls during multi-account rendering.

What's good

  • The label-vs-email branch ordering in StatusBarController is correct: external-service check comes first, then per-provider email fallbacks. No regression for .claude or for Codex setups without a base URL override.
  • Test coverage for both the positive case (Codex.2631.eu) and the explicit-override negative case (XCTAssertNil) — that's exactly what should be tested when adding a computed property like this.
  • ProviderMenuBuilder swaps from Email: to Service: with a sensible icon (server.rack) — visually it's clear this row represents an endpoint, not a person.

Things to consider (non-blocking)

  • Multi-account email visibility loss. When two+ Codex accounts coexist with usesOpenAIProviderBaseURL=true, both rows now show the same Codex #N (Codex.2631.eu) label, and the submenu hides the email entirely (Service: row replaces Email: row). The #N index still differentiates them, but if a user has e.g. work + personal OAuth credentials both routed through the same proxy, they can no longer eyeball which row corresponds to which credential. Worth thinking about whether the submenu should show both Service: and Email: when multiple accounts share an external endpoint. Not a blocker — the current behavior is defensible (the email may not even represent the actual proxy user).
  • displayName(forExternalHost:) unit-test gap. The static helper is only tested indirectly through one fixture (codex.2631.eu). Adding a couple of direct unit tests would lock down the contract: displayName(forExternalHost: "proxy.example.com") returns proxy.example.com as-is, displayName(forExternalHost: "Codex.example.com") still returns Codex.example.com, etc. Cheap insurance against future "let's make non-codex hosts look pretty too" refactors that might change behavior silently.
  • Prior review's XCTAssertThrowsError follow-up still pending. The previous bot review asked for a test covering the Missing rate-limit window throw path in buildStandardPayload. This commit didn't address it. Not in scope for this commit, just flagging that the carry-over is still open.

Verdict

Approve. Small, focused UX fix that makes the menu more honest about external Codex routing. The test additions match the new behavior, the configuration gating is correct, and there's no regression risk for the default ChatGPT path. Ship it.

Comment thread CopilotMonitor/CopilotMonitor/App/StatusBarController.swift Outdated
Comment thread CopilotMonitor/CopilotMonitor/Services/TokenManager.swift
@op-gg-ai-devops
Copy link
Copy Markdown
Contributor

op-gg-ai-devops Bot commented Apr 29, 2026

✅ AI Code Review Completed

Review finished. Check the PR for inline comments.


📋 View Logs | 🤖 Model: anthropic/claude-opus-4-7|high

Copy link
Copy Markdown
Contributor

@op-gg-ai-devops op-gg-ai-devops Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Third pass on this PR — the new 7f634a2 commit on top of the already-approved branch. Core changes since the last review:

  • StatusBarController consolidation: when Codex routes through an external proxy (usesOpenAIProviderBaseURL + .external mode), only the most-used account row is rendered.
  • OpenCodeZenProvider.parseModelCosts rewrite: handles the current multi-line oc stats table format (model name on one row, Cost $X.XX on a later row).
  • Service label plumbing: new externalServiceDisplayName on CodexEndpointConfiguration plus a "Service:" submenu row in ProviderMenuBuilder.
  • Test wiring: previously-orphan test files now actually compile and run.

honestly this is solid follow-up work — the parser refactor is much more readable than the old single-regex blob, the Codex.2631.eu hostname capitalization is a nice touch, and displayName(forExternalHost:) being pure-static makes it trivially testable. shipping the missing test files into the Xcode target and adding xcbeautify to CI is exactly the kind of unglamorous cleanup that earns trust.

Two non-blocking notes inline — both about edge cases that are unlikely today but worth pinning down before someone else trips over them. Approving.

What I checked

  • All 11 changed files end-to-end against the diff.
  • Compared the latest commit (7f634a2) against the two prior bot approvals (d3e1b08, 0f00ca3) to focus on what's new.
  • Verified OpenCodeAuthDecodingTests.swift module rename matches PRODUCT_NAME = "OpenCode Bar" (Swift module = OpenCode_Bar). AntigravityProviderVarintTests.swift still uses the old CopilotMonitor import — pre-existing, not in this PR's scope.
  • Cross-checked CI: xcbeautify is referenced in three later steps of test.yml, so the install step is genuinely required, not vestigial.
  • Confirmed displayAccounts consolidation is scoped only to the top-level menu loop (lines ~1860–1944); subscription presets, prediction logic, and submenus still iterate the full accounts list.

Smoke test note

This review env is Linux, so I didn't re-run xcodebuild. The PR description shows 135/135 passing locally on Debug, and CI's Build & Test is currently in progress on this commit.

Coding convention check

  • AGENTS.md → "Default review language: English" — locked English, all good.
  • No Korean strings introduced. SwiftLint passing on HEAD.

Comment thread CopilotMonitor/CopilotMonitor/Providers/OpenCodeZenProvider.swift
Comment thread CopilotMonitor/CopilotMonitor/App/StatusBarController.swift
@op-gg-ai-devops
Copy link
Copy Markdown
Contributor

op-gg-ai-devops Bot commented Apr 29, 2026

✅ AI Code Review Completed

Review finished. Check the PR for inline comments.


📋 View Logs | 🤖 Model: anthropic/claude-opus-4-7|high

Copy link
Copy Markdown
Contributor

@op-gg-ai-devops op-gg-ai-devops Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Fourth pass on this PR — edee0cd is a focused follow-up that closes out every non-blocking nit from the prior three reviews. honestly this is textbook how to address review feedback without scope-creep.

What's new since the last approval (7f634a2):

  1. Throw test addedtestDecodeUsagePayloadThrowsWhenRateLimitHasNoBaseWindow now pins down the strict-decoding behavior in CodexProvider.swift:613. Anyone who later tries to revert this with a silent ?? RateLimitWindow(used_percent: 0, ...) fallback will eat a CI failure. exactly the kind of test that pays for itself.
  2. displayName(forExternalHost:) direct unit tests added in TokenManagerTests.swift:100 covering codex-prefixed hosts, non-codex hosts, and the "codex" no-dot edge case. clean contract lock-in.
  3. parseModelCosts is now section-awareMODEL USAGE / TOOL USAGE / OVERVIEW / COST & TOKENS are detected as section headers and only MODEL USAGE rows contribute to modelCosts. The TOOL USAGE cost-row leak is now impossible. New testParseModelCostsIgnoresCostRowsOutsideModelUsageSection locks this down.
  4. Multi-account consolidation got smarter — the accounts.max(by:) tiebreak is now deterministic (smallest accountIndex wins on equal usage), and the consolidated label gained a , N accounts suffix so users at least know more accounts exist behind the single row. mitigates the earlier "extra accounts disappear" concern.
  5. Throw error message now identifies the account"Missing rate-limit window for {email|accountId|unknown account} from {authSource}". Way more useful than a bare "Missing rate-limit window" when you're staring at logs at 2am.

What's good

  • Every non-blocking suggestion from the previous three reviews got addressed in this commit. that's a clean signal.
  • The parseModelCosts state machine reads naturally now: section header → toggle isInModelUsageSection → only model/cost lines inside the section count. the separation of isStatsSectionHeader from isStatsMetricLine is the right factoring.
  • Deterministic tiebreak on accountIndex is a small thing but it eliminates flaky menu ordering on equal-usage accounts. nice.
  • All test additions have positive AND negative cases (Codex.example.com vs proxy.example.com, MODEL USAGE vs TOOL USAGE, missing-window vs valid-fixture).

Things to consider (non-blocking)

  • One minor leftover noted inline about how decodeUsagePayload's outer catch re-wraps the freshly thrown ProviderError.decodingError — the carefully formatted "Missing rate-limit window for X from Y" message ends up double-prefixed when errorDescription is rendered to the user. pre-existing behavior, but this PR makes the path live for the first time.

Verdict

Approve. Final commit is exactly the right shape: addresses the prior review tail, adds tests for the new behavior, doesn't touch anything unrelated. ship it.


Re-review tips: This bot will automatically re-review on every push to this branch. You can also mention @op-gg-ai-devops in a PR comment to trigger another review with optional direction, e.g.:

  • @op-gg-ai-devops review
  • @op-gg-ai-devops review the test coverage only
  • @op-gg-ai-devops review focusing on security

Comment thread CopilotMonitor/CopilotMonitor/Providers/CodexProvider.swift
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants