expose Memgraph snapshot interval/retention/on-exit #4
Merged
Problem
Memgraph takes a periodic snapshot every 300s by default. The interval comes from the image's /etc/memgraph/memgraph.conf (--storage-snapshot-interval-sec=300) and is applied entirely
inside the Memgraph process. The operator builds the Memgraph command line in buildMemgraphArgs but passes no snapshot-related flags, so there was no way to tune or disable this
behavior through the MemgraphCluster CRD.
In production this caused an outage: a replica accumulated ~9.3 GB of snapshots on a 10 GB volume, hit 100% disk, and Memgraph then crash-looped on startup because RocksDB couldn't
initialize:
RocksDB couldn't be initialized inside /var/lib/memgraph/settings -- IO error: No space left on device
Note the existing SnapshotSpec (snapshot.enabled/schedule/retentionCount) only controls the operator's CronJob (CREATE SNAPSHOT) and the preStop hook — it does not touch Memgraph's
internal periodic snapshots, which are the real disk-filler. There was simply no knob for them.
Fix
Expose Memgraph's snapshot flags through MemgraphConfig so they can be tuned (or disabled) per cluster.
The new fields are pointers, which lets 0/false be distinguished from unset (important, since 0 is the "disable" value).
Usage
Testing
when set. Existing TestBuildMemgraphArgs unaffected.
Notes / follow-ups
(e.g. 3600) unless you have a specific reason to fully disable.