RFC0055 Identity-Aware Routing#535
Conversation
46b4007 to
23b4a0c
Compare
Latest Update: RFC-Compliant Post-Selection AuthorizationImplemented breaking change to replace pre-selection authorization with strict post-selection enforcement per RFC lines 475-517. Key Changes (commit cbf0695)Architecture:
Implementation:
Test Coverage:
RFC Compliance✅ Intermittent 403s - Expected for shared routes across scope boundaries (RFC-compliant) Breaking ChangeDeprecated:
Integration Test ResultsAll integration tests compile successfully. Shared route scenarios validate:
Ready for full integration test run and review. |
Refactoring: AuthError for Future ExtensibilityCommit: 4ff64b9 Renamed Changes
Benefits
No functional changes - pure refactoring for extensibility. |
1f9b804 to
79271b7
Compare
5cc4170 to
b875867
Compare
|
Short update for the people following this PR. After debugging Amelia's environment a bit together, the current working theory is an older version of Diego, without generic route options support is causing the observed behavior. |
|
Things work much better with an up-to-date diego :) |
|
Issue: Inconsistent Access Log Fields The new access log fields are conditionally present instead of always appearing with "-" when empty. Compare these logs: First log has no caller_* or auth* fields, second has them. Standard pattern is all fields should always present. |
|
Issue: fields and names in access logs here are the new fields I want to make it clear that these are only for this type of route. I suggest the following renames, but am open to suggestions. caller_app --> caller_cf_app I don't know if these two values are needed and I would advocate for removing them |
|
Issue: per request logs in gorouter.stdout.log these logs are being logged per request in gorouter.stdout.log. This duplicates access log information and creates log volume amplification risk. I suggest removing these logs (and any others that are per request). |
Good point. This is addressed by commit c058620 which added |
- Rename caller_app/space/org → caller_cf_app/space/org for clarity - Remove auth, auth_rule, auth_denied_reason fields (not needed) - Always emit tls_sni and caller_cf_* fields with "-" when empty - Removes conditional emission that caused inconsistent log output
Fixed — the identity-aware routing access log fields are now always emitted with "-" when empty (matching the standard pattern for all other log fields). The fields have been renamed: |
Per-request denial log statements (mtls-route-policies-denied, mtls-pre-auth-denied, mtls-scope-auth-denied, post-selection-auth-denied) now log at DEBUG level to avoid log volume amplification in production. The access log already captures all denial information via caller_cf_* fields and HTTP status codes. These DEBUG logs remain available for local debugging when operators enable debug-level logging.
Downgraded all 7 per-request denial log statements from INFO to DEBUG level (1c7f1c5). This eliminates the log volume amplification concern in production while keeping them available for operators who enable debug logging during troubleshooting. The access log already captures denial information via If you'd prefer these statements be removed entirely rather than downgraded, happy to do that instead — let me know. |
I didn't say auth_rule should be deleted! I like that one. I suggested a rename to route_policy. And when there is no matching routing policy I suggested using a "-" instead of "route:no_route_policies" |
I vote for removing them entirely since they duplicate the access log. |
| return "" | ||
| } | ||
|
|
||
| return p.endpoints[0].endpoint.ApplicationId |
There was a problem hiding this comment.
There is a chance that routes are stale (by design) and these route policies are out of date.
I think it would be better to store route policies at the pool level instead of on the endpoint object. This would also reduce mutex contention by avoiding the need to acquire the pool lock twice per request.
…ation tests - Update router.client_cert_validation description to note that router.domains enforce mTLS independently - Update router.domains description to clarify relationship with router.client_cert_validation - Add rspec tests for all ERB template validation branches: non-array input, non-hash entry, missing/empty name, missing/empty ca_certs, invalid forwarded_client_cert mode, and invalid xfcc_format value Addresses PR #535 review threads 1-8.
- Rename identityHandler to cfIdentityHandler / NewCfIdentity to clarify it is specific to CF app instance identity certificates (thread 9) - Guard identity extraction: only run when (1) TLS was used and (2) the host is a configured mTLS domain, preventing XFCC header spoofing on non-mTLS routes (thread 10) - Move MtlsPreAuth handler above ClientCert in the proxy chain so a 421 response skips unnecessary certificate processing (thread 11) - Use configured xfcc_format from domain config instead of auto-detecting format at runtime; reject if format doesn't match (thread 12) All 386 handler tests and 179 proxy tests passing.
| if !strings.HasSuffix(hostname, suffix) { | ||
| return false | ||
| } |
There was a problem hiding this comment.
I think this is case sensitive, but URLs are case insensitive.
RFC0055: Identity-Aware mTLS Routing
Implements Phase 1 (1a + 1b) of RFC0055: App-to-App mTLS Routing.
Tracking: cloudfoundry/community#1481
Acceptance Testing Guide: https://gist.github.com/rkoster/5b252b0edca606f10be2dbdcb81a796f
What This Does
Enables GoRouter to enforce mutual TLS and identity-based authorization on a per-domain basis. Apps calling routes on configured mTLS domains must present a valid Diego instance identity certificate. GoRouter extracts the caller's app/space/org identity and checks it against route policies before forwarding the request.
Phase 1a: mTLS Infrastructure
GetConfigForClientcallback (SNI-based)raw: base64-encoded full certificate (~1.5KB)envoy: compactHash=...;Subject="..."format (~250 bytes)router.domainsPhase 1b: Authorization
scopeandallowed_sources) against selected endpointKey Design Decisions
router.domainsis configured in the BOSH manifest and a shared domain with--enforce-access-rulesis created.Testing
go fmt,go vet,staticcheck, ginkgo with--raceConfiguration Example
Related PRs
Merge Ordering
All PRs are independently safe to merge — the feature is dormant without the ops-file and domain creation. No strict ordering required. Recommend merging around the same time once all are approved.
AI Disclosure
This PR was developed with AI assistance. All code has been read and verified manually. Each error path, branch, and edge case has corresponding test coverage.