Skip to content

fix: add pg8000 graceful shutdown on Cloud Run scale-down#1669

Open
panish16 wants to merge 4 commits into
bcgov:mainfrom
panish16:fix/pg8000-graceful-shutdown
Open

fix: add pg8000 graceful shutdown on Cloud Run scale-down#1669
panish16 wants to merge 4 commits into
bcgov:mainfrom
panish16:fix/pg8000-graceful-shutdown

Conversation

@panish16

Copy link
Copy Markdown
Contributor

Summary

  • Registers a SQLAlchemy connect event listener on app startup that wraps each pg8000 connection's close() to suppress pg8000.exceptions.InterfaceError during Cloud Run scale-down
  • Fixes intermittent pg8000.exceptions.InterfaceError: network error logged when Cloud Run terminates idle instances and SQLAlchemy's connection pool teardown tries to close already-severed sockets
  • strr-email and strr-pay connect via Unix socket (no Cloud SQL Connector dependency), so the listener is implemented inline using the same logic

Services fixed

  • queue_services/strr-email/src/strr_email/__init__.py
  • queue_services/strr-pay/src/strr_pay/__init__.py

Test plan

  • Confirm no new pg8000.exceptions.InterfaceError entries in GCP Log Explorer for bcrbk9-prod project after deployment
  • Existing unit and integration test suites pass

@panish16 panish16 changed the title fix: add pg8000 graceful shutdown on Cloud Run scale-down [DRAFT] - fix: add pg8000 graceful shutdown on Cloud Run scale-down May 27, 2026
@panish16 panish16 changed the title [DRAFT] - fix: add pg8000 graceful shutdown on Cloud Run scale-down fix: add pg8000 graceful shutdown on Cloud Run scale-down Jun 8, 2026
@panish16 panish16 force-pushed the fix/pg8000-graceful-shutdown branch from 09921be to 2b071bb Compare June 8, 2026 23:06
@panish16 panish16 force-pushed the fix/pg8000-graceful-shutdown branch from 2b071bb to 3780bf5 Compare June 10, 2026 21:38
@panish16 panish16 requested a review from jimmypalelil as a code owner June 10, 2026 21:38
@sonarqubecloud

Copy link
Copy Markdown

@Jacky-Pham Jacky-Pham left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread queue_services/strr-email/poetry.lock Outdated
@@ -1,4 +1,186 @@
# This file is automatically @generated by Poetry 2.3.4 and should not be changed by hand.
# This file is automatically @generated by Poetry 2.3.2 and should not be changed by hand.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

poetry version is mismatched. is that intended?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not intentional — the lockfile was regenerated with my local Poetry 2.3.2 when I added the cloud-sql-connector dependency. The 2.3.4 header was from the previous lockfile. The lock content (packages and hashes) is the same; only the generator version comment in line 1 differs. I can upgrade Poetry locally to 2.3.4 and regenerate if you'd prefer consistency.

Comment thread queue_services/strr-pay/poetry.lock Outdated
@@ -1,4 +1,186 @@
# This file is automatically @generated by Poetry 2.3.4 and should not be changed by hand.
# This file is automatically @generated by Poetry 2.3.2 and should not be changed by hand.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment here as above about the poetry version

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above — same unintentional regeneration with local Poetry 2.3.2. Content is correct.

Comment thread .sonarcloud.properties
# Path to sources
#sonar.sources=strr-api/src/**/*
sonar.exclusions=strr-api/migrations/**/*,strr-api/devops/**/*,strr-api/tests/**/*,testing/**/*
sonar.exclusions=strr-api/migrations/**/*,strr-api/devops/**/*,strr-api/tests/**/*,testing/**/*,queue_services/**/tests/**/*

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats the reason for adding this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SonarCloud was picking up the new test files added under queue_services/ as source files and reporting 0% coverage on them (the main SonarCloud scan doesn't run the queue_services test suites separately). Without the exclusion, the quality gate would flag those test files as unmeasured source. This follows the same pattern as the existing testing/**/* exclusion already in the file.

@jimmypalelil

Copy link
Copy Markdown
Collaborator

hey @panish16 thanks for integrating the cloud-sql-connector lib. Its only applied to couple python proejcts. Should it not be applied to all that depend on the DB?

assert stored.meta_data["email_type"] == "STRATA_HOTEL_REGISTRATION_ACTIVE"


def test_email_no_cloud_event_data(client, queue_envelope, simple_cloud_event):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are these unit & integration tests added for in this project?

wasnt it already at 100% coverage?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests cover the new setup_pg8000_close_event_listener(db.engine) call added to create_app() in __init__.py. Without them, the 3 new lines in that function would be uncovered, dropping coverage below 100% and failing the SonarCloud gate.

@panish16

panish16 commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

@jimmypalelil Good question! To clarify why the fix is scoped to just strr-email and strr-pay:

  • strr-api — uses psycopg2 in production (the base Config class sets postgresql+psycopg2://...). Only the Migration config class uses pg8000, and migrations are short-lived (run to completion, no idle pool). So strr-api is not affected by the Cloud Run idle scale-down / pg8000 InterfaceError issue.
  • jobs/* — batch jobs that run to completion and exit cleanly. Cloud Run terminates them normally, not via idle scale-down, so the connection pool teardown issue does not apply.
  • queue_services/batch-permit-validator — has pg8000 in its deps transitively but does not initialize a SQLAlchemy engine in create_app(); no pool to attach the listener to.

strr-email and strr-pay are the only long-running Cloud Run services that hold a pg8000 connection pool while idling between queue messages, which is exactly where the Cloud Run scale-down kills the socket before SQLAlchemy can close it cleanly.

Fixes version mismatch flagged in PR review — lockfiles were previously
regenerated with local Poetry 2.3.2 instead of the expected newer version.
@sonarqubecloud

sonarqubecloud Bot commented Jul 1, 2026

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants