expose Memgraph snapshot interval/retention/on-exit #4

Merged: nimisha-gj merged 1 commit into main from feat/configurable-snapshot-interval on Jun 25, 2026

@nimisha-gj nimisha-gj commented Jun 25, 2026

Problem

Memgraph takes a periodic snapshot every 300s by default — this comes from the image's /etc/memgraph/memgraph.conf (--storage-snapshot-interval-sec=300), entirely inside the Memgraph
process. The operator builds the Memgraph command line in buildMemgraphArgs but passes no snapshot-related flags, so there was no way to tune or disable this behavior through the
MemgraphCluster CRD.

In production this caused an outage: a replica accumulated ~9.3 GB of snapshots on a 10 GB volume, hit 100% disk, and Memgraph then crash-looped on startup because RocksDB couldn't
initialize:

  RocksDB couldn't be initialized inside /var/lib/memgraph/settings -- IO error: No space left on device

Note that the existing SnapshotSpec (snapshot.enabled/schedule/retentionCount) only controls the operator's CronJob (CREATE SNAPSHOT) and the preStop hook. It does not touch Memgraph's
internal periodic snapshots, which are what actually filled the disk, and there was no knob for them.

Fix

Expose Memgraph's snapshot flags through MemgraphConfig so they can be tuned (or disabled) per cluster:

  • Fields are *int32/*bool pointers: nil means "leave Memgraph's default", so the change is fully backward compatible — existing clusters are unaffected until they set a value.
    Pointers also let 0/false be distinguished from unset (important, since 0 is the "disable" value).
  • buildMemgraphArgs appends each flag only when the field is set.
  • Regenerated CRD (config/crd/bases/...) and deepcopy via make generate manifests.
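
As a rough illustration of the "append only when set" behavior described above, here is a minimal, self-contained sketch. It is not the operator's actual code: the struct is a trimmed stand-in for MemgraphConfig, snapshotArgs is a hypothetical helper, SnapshotRetentionCount is an assumed field name, and the retention and on-exit flag names are assumed to be Memgraph's --storage-snapshot-retention-count and --storage-snapshot-on-exit (only the interval flag is named in this PR).

```go
package main

import (
	"fmt"
	"strconv"
)

// MemgraphConfig is an illustrative stand-in for the CRD type.
// SnapshotRetentionCount is an assumed field name, not taken from the PR.
type MemgraphConfig struct {
	SnapshotIntervalSec    *int32
	SnapshotRetentionCount *int32
	SnapshotOnExit         *bool
}

// snapshotArgs (hypothetical helper) emits a flag only for fields that are
// non-nil, so a nil field leaves Memgraph's own default in place while an
// explicit 0 or false is still passed through.
func snapshotArgs(cfg MemgraphConfig) []string {
	var args []string
	if cfg.SnapshotIntervalSec != nil {
		args = append(args, fmt.Sprintf("--storage-snapshot-interval-sec=%d", *cfg.SnapshotIntervalSec))
	}
	if cfg.SnapshotRetentionCount != nil {
		// Flag name assumed from Memgraph's configuration reference.
		args = append(args, fmt.Sprintf("--storage-snapshot-retention-count=%d", *cfg.SnapshotRetentionCount))
	}
	if cfg.SnapshotOnExit != nil {
		// Flag name assumed from Memgraph's configuration reference.
		args = append(args, "--storage-snapshot-on-exit="+strconv.FormatBool(*cfg.SnapshotOnExit))
	}
	return args
}

func main() {
	zero := int32(0)
	fmt.Println(snapshotArgs(MemgraphConfig{}))                           // unset: no flags
	fmt.Println(snapshotArgs(MemgraphConfig{SnapshotIntervalSec: &zero})) // explicit 0: flag emitted
}
```

The nil check, rather than a zero-value check, is what keeps an explicit 0 from being mistaken for "unset".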

Usage

  spec:
    config:
      snapshotIntervalSec: 0      # disable periodic snapshots
      # or e.g. 3600 to snapshot hourly instead of every 5 min
      snapshotOnExit: false       # optional: also skip the shutdown snapshot

Testing

  • go build ./... clean; make generate manifests produces no further diff.
  • Added unit test TestBuildMemgraphArgsSnapshotFlags: verifies no snapshot flags are emitted when unset, interval=0 emits --storage-snapshot-interval-sec=0, and all three flags render
    when set. Existing TestBuildMemgraphArgs unaffected.

Notes / follow-ups

  • Disabling snapshots entirely (snapshotIntervalSec: 0 + snapshotOnExit: false) leaves only WAL for durability — recovery from an unclean crash is limited. Prefer a non-zero interval
    (e.g. 3600) unless you have a specific reason to fully disable.
  • The operator does not reconcile config changes onto an existing StatefulSet (only replicas/image), so applying a new value still requires a StatefulSet roll.
  • Consumers (e.g. Helm charts) must pass this value through with a hasKey/explicit check, not | default — Go templates treat 0/false as empty and would drop the disable value.
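
The templating pitfall in the last note can be reproduced with Go's standard text/template package alone. This is a sketch rather than chart code: Helm's default and hasKey come from its template function library, so hasKey here is a locally defined stand-in, and the template strings and the render helper are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// hasKey is a local stand-in for the Helm/Sprig function of the same name.
func hasKey(m map[string]any, key string) bool {
	_, ok := m[key]
	return ok
}

// truthy relies on template truthiness, which treats 0 as empty.
const truthy = `{{ if .snapshotIntervalSec }}interval={{ .snapshotIntervalSec }}{{ end }}`

// explicit checks for key presence instead, so 0 is preserved.
const explicit = `{{ if hasKey . "snapshotIntervalSec" }}interval={{ .snapshotIntervalSec }}{{ end }}`

// render executes a template string against the given values.
func render(text string, values map[string]any) string {
	t := template.Must(template.New("t").Funcs(template.FuncMap{"hasKey": hasKey}).Parse(text))
	var sb strings.Builder
	if err := t.Execute(&sb, values); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	values := map[string]any{"snapshotIntervalSec": 0}
	fmt.Printf("truthiness check: %q\n", render(truthy, values))   // ""
	fmt.Printf("explicit check:   %q\n", render(explicit, values)) // "interval=0"
}
```

The truthiness-based template silently drops the disable value, while the presence check passes it through.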

@nimisha-gj nimisha-gj self-assigned this Jun 25, 2026
@nimisha-gj nimisha-gj requested a review from irfn June 25, 2026 09:12
@nimisha-gj nimisha-gj merged commit 6c27d98 into main Jun 25, 2026
6 checks passed
2 participants