Add PostgreSQL HA cluster guide#8919
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughUpdates documentation: clarifies the deployment guide excludes database-level HA/disaster recovery and adds a new PostgreSQL HA cluster admin guide (repmgr, HAProxy, Keepalived, pgchk) plus a TOC entry and a brief high-availability subsection. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Admin
participant Keepalived
participant HAProxy
participant pgchk as pgchk (health service)
participant repmgr
participant Postgres1 as Primary
participant Postgres2 as StandbyA
participant Postgres3 as StandbyB
Admin->>Keepalived: configure VIP & priorities
Admin->>HAProxy: configure TCP frontends (5000 write, 5001 read)
Admin->>pgchk: deploy health check service
HAProxy->>pgchk: HTTP health check (8008)
pgchk->>repmgr: query node role/state
repmgr->>Postgres1: monitor primary status
repmgr->>Postgres2: monitor standbys
Note over repmgr,Postgres1: On primary failure
repmgr->>Postgres2: promote to primary
repmgr->>Keepalived: notify / changes reflected via health checks
Keepalived->>HAProxy: VIP moves to new primary host
HAProxy->>Postgres2: route write traffic to promoted node
Admin->>repmgr: re-register old primary as standby (post-repair)
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
source/administration-guide/scale/high-availability-cluster-based-deployment.rst (1)
33-35: Clarify HA vs DR scope in this redirect sentence.This reads a bit awkwardly and mixes concepts; make the scope explicit as “database-level HA/DR design is out of scope here” before linking to the PostgreSQL HA page.
Suggested minimal wording update
-Set up and maintain a high availability cluster-based deployment on your Mattermost servers. This document doesn't cover the configuration of databases in terms of -disaster recovery. For self-hosted deployments requiring database-level HA, +Set up and maintain a high availability cluster-based deployment on your Mattermost servers. This document doesn't cover database high availability or disaster recovery design. +For self-hosted deployments requiring database-level HA, see :doc:`PostgreSQL high availability cluster </administration-guide/scale/postgres-ha-cluster>`.As per coding guidelines, "Flag awkward phrasing, punctuation mistakes, tense shifts, and terminology inconsistencies in documentation."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/administration-guide/scale/high-availability-cluster-based-deployment.rst` around lines 33 - 35, Revise the redirect sentence that begins "For self-hosted deployments requiring database-level HA," to explicitly state that database-level HA and disaster recovery (DR) design are out of scope for this document, then follow with the existing link to the PostgreSQL page (the :doc:`PostgreSQL high availability cluster </administration-guide/scale/postgres-ha-cluster>` reference) so readers are clearly directed for DB-level HA/DR details; replace the awkward phrasing with a concise sentence such as "Database-level HA and DR design are out of scope for this document; for PostgreSQL-specific guidance, see :doc:`PostgreSQL high availability cluster </administration-guide/scale/postgres-ha-cluster>`."source/administration-guide/scale/postgres-ha-cluster.rst (1)
331-334: Tighten replication user privileges (or justify superuser explicitly).
createuser --superuser repmgris broader than necessary for many environments. Prefer least privilege by default, or add a strong warning explaining why elevated rights are required in this design.As per coding guidelines, "When reviewing ... through the lens of Veteran Vince ... flag content that is ... security-unsafe."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/administration-guide/scale/postgres-ha-cluster.rst` around lines 331 - 334, The current step creates a replication user with --superuser which is broader than needed; replace it with a least-privilege creation and explicit grants (create the user with REPLICATION and LOGIN, set a password, and grant only the specific DB ownership/SEARCH_PATH needed) and update the ALTER USER repmgr SET search_path line to match the non-superuser account; if you must keep --superuser for this design, add a prominent security justification/warning explaining why elevated rights are required and the risks. Refer to the existing commands (createuser --superuser repmgr and ALTER USER repmgr SET search_path TO repmgr, public) when making the change so the docs show the least-privilege variant and the optional justified-superuser warning.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@source/administration-guide/scale/postgres-ha-cluster.rst`:
- Around line 295-305: Change the documented pg_hba entries so that
``scram-sha-256`` is shown as the default production authentication method
instead of ``trust`` (update the example host lines and note text that currently
reference ``trust``), and move the ``trust`` entries into a clearly labeled
"lab/testing only" fallback section; also wrap the lab-only fallback in a
warning admonition to highlight the security risk (refer to the example host
entries for repmgr and replication and the existing note block that mentions
``.pgpass`` to update placement and wording accordingly).
- Around line 520-534: The Keepalived snippet hardcodes "interface eth0" which
will fail on systems with predictable NIC names; update the docs around creating
/etc/keepalived/keepalived.conf and the vrrp_instance VI_1 block to require
selecting the correct network interface (do not assume eth0), add a brief step
telling the user to identify the active interface (e.g., via ip link or ip addr)
and substitute that interface name into the "interface" field of the
vrrp_instance configuration before applying Keepalived, and include a short
example note instructing readers to replace <CLUSTER_VIP> and the placeholder
interface with their actual values.
- Around line 483-490: The doc currently references a non-existent repo for
pgchk.py; replace that broken external link by either embedding the full
pgchk.py script directly in this document (with a short explanation of its
purpose and usage) and instructing readers to save it to /usr/local/bin/pgchk.py
and chmod +x, or point to a verified alternate source and include a pinned
commit/tag and SHA256 checksum plus a one-line curl/wget + sha256sum
verification step; update the text around the symbol pgchk.py to include the
chosen solution and add verification instructions so admins can validate the
file before placing it at /usr/local/bin/pgchk.py.
---
Nitpick comments:
In
`@source/administration-guide/scale/high-availability-cluster-based-deployment.rst`:
- Around line 33-35: Revise the redirect sentence that begins "For self-hosted
deployments requiring database-level HA," to explicitly state that
database-level HA and disaster recovery (DR) design are out of scope for this
document, then follow with the existing link to the PostgreSQL page (the
:doc:`PostgreSQL high availability cluster
</administration-guide/scale/postgres-ha-cluster>` reference) so readers are
clearly directed for DB-level HA/DR details; replace the awkward phrasing with a
concise sentence such as "Database-level HA and DR design are out of scope for
this document; for PostgreSQL-specific guidance, see :doc:`PostgreSQL high
availability cluster </administration-guide/scale/postgres-ha-cluster>`."
In `@source/administration-guide/scale/postgres-ha-cluster.rst`:
- Around line 331-334: The current step creates a replication user with
--superuser which is broader than needed; replace it with a least-privilege
creation and explicit grants (create the user with REPLICATION and LOGIN, set a
password, and grant only the specific DB ownership/SEARCH_PATH needed) and
update the ALTER USER repmgr SET search_path line to match the non-superuser
account; if you must keep --superuser for this design, add a prominent security
justification/warning explaining why elevated rights are required and the risks.
Refer to the existing commands (createuser --superuser repmgr and ALTER USER
repmgr SET search_path TO repmgr, public) when making the change so the docs
show the least-privilege variant and the optional justified-superuser warning.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4e4d9ba5-82a6-486e-8263-05d6d2475d02
📒 Files selected for processing (3)
source/administration-guide/scale/high-availability-cluster-based-deployment.rstsource/administration-guide/scale/postgres-ha-cluster.rstsource/administration-guide/scale/scaling-for-enterprise.rst
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0f144d7 to
1e7d060
Compare
- pg_hba.conf: default to scram-sha-256; trust moved to lab-only warning - pgchk.py: embed full script inline instead of linking to external repo - Keepalived: add ip link step to identify interface before hardcoding - repmgr createuser: add note explaining why superuser is required - high-availability-cluster-based-deployment.rst: clarify HA vs DR scope Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@esethna @wiersgallak - Would one of you mind adding the |
|
Newest code from sadohert has been published to preview environment for Git SHA faa3b49 |
|
Done @sadohert |
Summary
source/administration-guide/scale/postgres-ha-cluster.rst— a new guide for deploying a 3-node PostgreSQL HA cluster using repmgr, HAProxy, and Keepalivedscaling-for-enterprise.rsttoctree and adds a prose entryhigh-availability-cluster-based-deployment.rstto cross-link to the new pageWhy
The existing HA doc covers Mattermost app-layer clustering but explicitly does not cover database HA. Many self-hosted customers on bare-metal or VMs need a database-level HA guide. This fills that gap.
Page structure
The guide follows Mattermost documentation conventions:
Validation
All setup steps and checkpoint commands have been validated on Ubuntu 24.04 LTS, PostgreSQL 17, repmgr 5.5, HAProxy 2.8.
Related
🤖 Generated with Claude Code