feat: pin database image to prevent silent changes on CP upgrade #403

Open
tsivaprasad wants to merge 1 commit into PLAT-599-allow-custom-image-override-per-database-or-node from PLAT-600-pin-database-image-to-prevent-silent-changes-on-cp-upgrade

tsivaprasad commented Jun 10, 2026

Summary

This PR pins each database instance's container image at creation time by persisting it in ResolvedImage within etcd. Subsequent reconciles use the stored ResolvedImage directly rather than re-resolving the image from the manifest, ensuring that Control Plane upgrades cannot inadvertently change the image of an existing running database.

Changes

  • Updated ReconcileInstanceSpec in orchestrator/swarm/orchestrator.go to preserve the existing ResolvedImage when PgEdgeVersion remains unchanged. This allows resolveInstanceImages() to take the fast path and avoid manifest lookups during normal reconciles. When the version changes, the stale image pin is cleared and re-resolved from the manifest.
  • Added resolveServiceImage() and ReconcileServiceInstanceSpec() in orchestrator/swarm/orchestrator.go, extending image pinning behavior to MCP, RAG, and PostgREST service instances using the same pattern as PostgreSQL instances.
  • Added ReconcileServiceInstanceSpec() to the Orchestrator interface in database/orchestrator.go.
  • Updated database/service.go to invoke s.orchestrator.ReconcileServiceInstanceSpec() before persisting service instance specifications.
  • Updated database/reconcile_versions.go so that when the instance monitor detects a version change and updates PgEdgeVersion, it also clears ResolvedImage. This ensures the next reconcile derives the correct image for the new version and prevents no-op updates from reverting externally upgraded instances back to an older image.
  • Added a no-op implementation of ReconcileServiceInstanceSpec() in orchestrator/systemd/orchestrator.go to maintain interface compatibility.
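
The carry-forward rule described for ReconcileInstanceSpec can be sketched as follows. This is a minimal illustration, not the PR's code: the `InstanceSpec` and `SwarmOpts` types are simplified stand-ins, `PgEdgeVersion` is modeled as a plain string, and `carryForwardPin` is a hypothetical helper name.

```go
package main

import "fmt"

// SwarmOpts and InstanceSpec are simplified, hypothetical stand-ins for the
// real spec types; only the fields named in the PR description are modeled.
type SwarmOpts struct {
	Image         string // user-supplied override, if any
	ResolvedImage string // image pinned at creation time
}

type InstanceSpec struct {
	PgEdgeVersion string // assumption: the real type may be richer than a string
	Swarm         SwarmOpts
}

// carryForwardPin keeps the stored pin when the version is unchanged and
// clears it when the version changed, so that a later resolution step falls
// through to the manifest.
func carryForwardPin(old, updated *InstanceSpec) {
	if old == nil {
		return // first creation: there is no pin to carry forward
	}
	if old.PgEdgeVersion == updated.PgEdgeVersion {
		updated.Swarm.ResolvedImage = old.Swarm.ResolvedImage
		return
	}
	updated.Swarm.ResolvedImage = "" // stale pin: force re-resolution
}

func main() {
	old := &InstanceSpec{
		PgEdgeVersion: "17.9",
		Swarm:         SwarmOpts{ResolvedImage: "ghcr.io/pgedge/pgedge-postgres:17.9-spock5.0.6-standard-2"},
	}

	noop := &InstanceSpec{PgEdgeVersion: "17.9"}
	carryForwardPin(old, noop)
	fmt.Println("no-op update keeps pin:", noop.Swarm.ResolvedImage)

	upgrade := &InstanceSpec{
		PgEdgeVersion: "17.10",
		Swarm:         SwarmOpts{ResolvedImage: old.Swarm.ResolvedImage},
	}
	carryForwardPin(old, upgrade)
	fmt.Printf("version change clears pin: %q\n", upgrade.Swarm.ResolvedImage)
}
```

Keeping this step separate from image resolution means the resolver only has to check whether a pin is present, which is what lets normal reconciles skip the manifest lookup.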

Testing

Verification:

Test Scenarios

1. No Image Override

  • Create a database without specifying an image override (create_db_with_no_image.json).
  • Verify the manifest image (17.9-spock5.0.6-standard-2) is selected and stored in ResolvedImage.
  • Perform a no-op update and confirm the image remains unchanged.

2. User Image Override

  • Create a database with a custom image, for example my-custom-image (create_db.json).
  • Verify the custom image is deployed directly.
  • Confirm validation warnings are returned.
  • Verify ResolvedImage is not persisted.

3. External Upgrade Followed by No-Op Update

  • Perform an external image upgrade (for example, from 17.9 to 17.10).

$ docker service update \
    --image ghcr.io/pgedge/pgedge-postgres:17.10-spock5.0.8-standard-1 \
    --no-healthcheck \
    postgres-storefront-no-image-n1-689qacsi
postgres-storefront-no-image-n1-689qacsi
overall progress: 1 out of 1 tasks
1/1: running   [==================================================>]
verify: Service postgres-storefront-no-image-n1-689qacsi converged

$ docker service ls
ID             NAME                                       MODE         REPLICAS   IMAGE                                                        PORTS
znrg8hhoo28c   postgres-storefront-no-image-n1-9ptayhma   replicated   1/1        ghcr.io/pgedge/pgedge-postgres:17.9-spock5.0.6-standard-2
m7jks7yktzta   postgres-storefront-no-image-n1-689qacsi   replicated   1/1        ghcr.io/pgedge/pgedge-postgres:17.10-spock5.0.8-standard-1
l7i7y5saauzs   postgres-storefront-no-image-n1-ant97dj4   replicated   1/1        ghcr.io/pgedge/pgedge-postgres:17.9-spock5.0.6-standard-2

  • Trigger a no-op update through the Control Plane.
  • Verify the instance remains on 17.10 and is not reverted to the previous image.

$ docker service ls --filter label=pgedge.database.id=storefront-no-image \
    --format '{{.Name}} {{.Image}}'
postgres-storefront-no-image-n1-9ptayhma ghcr.io/pgedge/pgedge-postgres:17.9-spock5.0.6-standard-2
postgres-storefront-no-image-n1-689qacsi ghcr.io/pgedge/pgedge-postgres:17.10-spock5.0.8-standard-1
postgres-storefront-no-image-n1-ant97dj4 ghcr.io/pgedge/pgedge-postgres:17.9-spock5.0.6-standard-2
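
Scenario 3 depends on the monitor-side change in database/reconcile_versions.go: when an observed version differs from the stored one, the stale pin is dropped along with the version update. A rough sketch of that step is shown below. The types are simplified stand-ins, versions are modeled as plain strings, and `applyObservedVersion` is a hypothetical name rather than a function from the PR.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the stored instance spec.
type SwarmOpts struct {
	ResolvedImage string
}

type InstanceSpec struct {
	PgEdgeVersion string // assumption: modeled as a plain string here
	Swarm         SwarmOpts
}

// applyObservedVersion updates the stored version to the one the monitor
// observed on the running instance. When the version actually changed, it
// also clears the pinned image so the next reconcile derives the image for
// the new version instead of reverting to the old pin. It reports whether
// the spec was modified.
func applyObservedVersion(spec *InstanceSpec, observed string) bool {
	if observed == "" || observed == spec.PgEdgeVersion {
		return false
	}
	spec.PgEdgeVersion = observed
	spec.Swarm.ResolvedImage = ""
	return true
}

func main() {
	spec := &InstanceSpec{
		PgEdgeVersion: "17.9",
		Swarm:         SwarmOpts{ResolvedImage: "ghcr.io/pgedge/pgedge-postgres:17.9-spock5.0.6-standard-2"},
	}

	// The instance was upgraded externally, so the monitor now reports 17.10.
	changed := applyObservedVersion(spec, "17.10")
	fmt.Printf("changed=%v version=%s pin=%q\n", changed, spec.PgEdgeVersion, spec.Swarm.ResolvedImage)
}
```

Without the clearing step, a later no-op update would find the old 17.9 pin and could roll the externally upgraded service back to it.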

Checklist

  • [x] Tests added

PLAT-600: https://pgedge.atlassian.net/browse/PLAT-600

coderabbitai Bot commented Jun 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a6a08c13-9bcf-4036-823c-ae171417197f

📥 Commits

Reviewing files that changed from the base of the PR and between 09a808a and 82c5ac8.

📒 Files selected for processing (7)
  • server/internal/database/orchestrator.go
  • server/internal/database/reconcile_versions.go
  • server/internal/database/service.go
  • server/internal/orchestrator/swarm/orchestrator.go
  • server/internal/orchestrator/swarm/reconcile_instance_spec_test.go
  • server/internal/orchestrator/swarm/resolve_service_image_test.go
  • server/internal/orchestrator/systemd/orchestrator.go

Walkthrough

This PR implements container image reconciliation and pinning for service instances. The orchestrator interface gains a new ReconcileServiceInstanceSpec hook that runs during service instance reconciliation to resolve container images from manifests or cached pins. The Swarm orchestrator carries forward pinned images when service versions are unchanged and re-derives them from manifests when versions change; service image resolution uses precedence: user override, stored pin, then manifest lookup with lazy backfill.
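
The resolution precedence named in the walkthrough (user override, then stored pin, then manifest lookup with lazy backfill) might look roughly like this. It is a sketch under assumptions: the manifest is modeled as a plain map keyed by service type and version, the image names are placeholders, and `resolveServiceImageSketch` is a hypothetical name chosen to avoid implying it matches the real resolveServiceImage.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified, hypothetical stand-ins for the service instance spec.
type ServiceSwarmOpts struct {
	Image         string // user override
	ResolvedImage string // stored pin
}

type ServiceInstanceSpec struct {
	ServiceType string
	Version     string
	Swarm       ServiceSwarmOpts
}

// manifest is an assumed lookup table from "type:version" to an image.
type manifest map[string]string

var errNoManifestEntry = errors.New("no manifest entry for service version")

// resolveServiceImageSketch applies the precedence: user override, stored
// pin, then manifest lookup. A manifest hit is written back into the spec
// (lazy backfill) so later calls take the pinned fast path.
func resolveServiceImageSketch(spec *ServiceInstanceSpec, m manifest) (string, error) {
	if spec.Swarm.Image != "" {
		return spec.Swarm.Image, nil // overrides are used as-is and not pinned
	}
	if spec.Swarm.ResolvedImage != "" {
		return spec.Swarm.ResolvedImage, nil // fast path: no manifest lookup
	}
	key := spec.ServiceType + ":" + spec.Version
	img, ok := m[key]
	if !ok {
		return "", fmt.Errorf("%w: %s", errNoManifestEntry, key)
	}
	spec.Swarm.ResolvedImage = img
	return img, nil
}

func main() {
	// Placeholder image names, not real registry entries.
	m := manifest{"mcp:1.0": "registry.example/mcp:1.0-build1"}
	spec := &ServiceInstanceSpec{ServiceType: "mcp", Version: "1.0"}

	img, err := resolveServiceImageSketch(spec, m)
	fmt.Println(img, err, "pinned:", spec.Swarm.ResolvedImage)

	// A changed manifest no longer affects the already pinned spec.
	m["mcp:1.0"] = "registry.example/mcp:1.0-build2"
	img, err = resolveServiceImageSketch(spec, m)
	fmt.Println(img, err)
}
```

The second call in `main` shows the property the PR is after: once a pin exists, a manifest change (such as one shipped with a Control Plane upgrade) does not alter the image of an existing instance.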

Changes

Service Instance Spec Reconciliation with Image Pinning

| Layer | File(s) | Summary |
|---|---|---|
| Interface Contract and Database Service Integration | server/internal/database/orchestrator.go, server/internal/database/service.go, server/internal/database/reconcile_versions.go | Adds ReconcileServiceInstanceSpec(old, new *ServiceInstanceSpec) error to the Orchestrator interface; integrates the orchestrator call into Service.ReconcileServiceInstanceSpec with error handling; clears stale Swarm.ResolvedImage when database instances are updated with new PgEdge versions. |
| Instance Spec Reconciliation (Swarm): Implementation and Tests | server/internal/orchestrator/swarm/orchestrator.go, server/internal/orchestrator/swarm/reconcile_instance_spec_test.go | ReconcileInstanceSpec now preserves Swarm.ResolvedImage when PgEdge version is unchanged (avoiding re-derivation from manifest) and clears it when the version changes (forcing fresh manifest lookup). Tests validate first-creation pinning, version-stability carry-forward, missing-pin backfill, version-change refresh, and user image override preservation. |
| Service Image Resolution and Reconciliation (Swarm): Implementation and Tests | server/internal/orchestrator/swarm/orchestrator.go, server/internal/orchestrator/swarm/resolve_service_image_test.go | resolveServiceImage implements precedence: user override (Swarm.Image), stored pin (Swarm.ResolvedImage), then manifest lookup with lazy backfill. ReconcileServiceInstanceSpec carries forward or clears pinned service images based on version changes. MCP and RAG resource generators switch to use resolveServiceImage for fast-path pinned-image access. Tests cover override precedence, lazy backfill, version stability, manifest refresh on version change, and error handling. |
| Systemd Orchestrator Stub Implementation | server/internal/orchestrator/systemd/orchestrator.go | Adds no-op ReconcileServiceInstanceSpec method for systemd orchestrator compliance with the interface contract. |
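
To illustrate how the interface hook, the systemd stub, and the call before persistence fit together, here is a small self-contained sketch. Only the method name and its signature shape come from the summary above; the spec type, the in-memory store, `pinningOrchestrator`, and `persistServiceInstanceSpec` are hypothetical, and the second parameter is named `updated` instead of `new` to avoid shadowing the Go builtin.

```go
package main

import "fmt"

// ServiceInstanceSpec is a simplified, hypothetical stand-in.
type ServiceInstanceSpec struct {
	Version       string
	ResolvedImage string
}

// Orchestrator models only the hook added in this PR.
type Orchestrator interface {
	ReconcileServiceInstanceSpec(old, updated *ServiceInstanceSpec) error
}

// systemdOrchestrator satisfies the interface with a no-op, since image
// pinning only applies to container-based deployments.
type systemdOrchestrator struct{}

func (systemdOrchestrator) ReconcileServiceInstanceSpec(old, updated *ServiceInstanceSpec) error {
	return nil
}

// pinningOrchestrator is an assumed Swarm-like implementation that carries
// the pin forward on an unchanged version and clears it otherwise.
type pinningOrchestrator struct{}

func (pinningOrchestrator) ReconcileServiceInstanceSpec(old, updated *ServiceInstanceSpec) error {
	if old == nil {
		return nil
	}
	if old.Version == updated.Version {
		updated.ResolvedImage = old.ResolvedImage
	} else {
		updated.ResolvedImage = ""
	}
	return nil
}

// persistServiceInstanceSpec runs the orchestrator hook before storing the
// spec, so whatever is persisted already reflects the pinning decision.
func persistServiceInstanceSpec(o Orchestrator, store map[string]*ServiceInstanceSpec, id string, updated *ServiceInstanceSpec) error {
	if err := o.ReconcileServiceInstanceSpec(store[id], updated); err != nil {
		return fmt.Errorf("reconcile service instance spec: %w", err)
	}
	store[id] = updated
	return nil
}

func main() {
	store := map[string]*ServiceInstanceSpec{
		"svc-1": {Version: "1.0", ResolvedImage: "registry.example/mcp:1.0"},
	}
	err := persistServiceInstanceSpec(pinningOrchestrator{}, store, "svc-1", &ServiceInstanceSpec{Version: "1.0"})
	fmt.Println(err, store["svc-1"].ResolvedImage)

	err = persistServiceInstanceSpec(systemdOrchestrator{}, store, "svc-2", &ServiceInstanceSpec{Version: "1.0"})
	fmt.Printf("%v %q\n", err, store["svc-2"].ResolvedImage)
}
```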

Poem

🐰 In the meadow of manifests, images now stay,
Pinned for the stable, refreshed when they change,
One swift override, or cache found to save the day—
Service specs reconciled, no needless exchange!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: pinning database images to prevent unintended changes during Control Plane upgrades. |
| Description check | ✅ Passed | The description includes all required sections: Summary, Changes, Testing, and Checklist. All core information is present with clear explanations and test scenarios. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@codacy-production

Up to standards ✅

🟢 Issues: 2 medium (2 new issues)

| Category | Results |
|---|---|
| Complexity | 2 medium |

🟢 Metrics: 28 complexity · 0 duplication

| Metric | Results |
|---|---|
| Complexity | 28 |
| Duplication | 0 |

@tsivaprasad

@coderabbitai review

coderabbitai Bot commented Jun 10, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.
