Add msgvault backup: init, create, list, verify, and restore commands (#339) by wesm · Pull Request #428 · kenn-io/msgvault

wesm · 2026-07-04T12:10:13Z

Implements the backup and restore side of #339: msgvault backup init / create / list / verify / restore against an app-owned backup repository. Restore is included and proves itself before reporting success — see "Restore" below.

Docs: usage guide · format reference

How this maps to the #339 proposal

Kept as proposed:

msgvault owns the correctness boundary; retention, remote storage, and scheduling stay with restic/rclone/cron. The repository is a plain directory of write-once, immutable files — designed to be what those tools sync.
--include-config / --include-tokens flags, off by default. Added on top: capturing either into an unencrypted repository requires an explicit --allow-plaintext-secrets, so credentials can't leak by default.
Rebuildable derived state (vector db, analytics, caches) is excluded and the exclusion list is recorded in the manifest.
The manifest is written last, after all content is durable. A crash at any point leaves either a complete snapshot or no snapshot.
A missing or corrupt attachment fails the backup loudly: every attachment is re-read and re-hashed from disk at capture time, read from the storage path the database records for it (importers may namespace paths beyond the loose <aa>/<hash> layout).
The manifest records the msgvault version, backup format version, creation timestamp, capture options, attachment counts (content hashes live in dedicated list objects), message/conversation/account/label counts, and the exclusion list.

Where the design goes further than the proposal:

Snapshot repository instead of one backup directory. A repository holds many snapshots with content-addressed dedup: the database is captured at 4 KB page granularity (only changed pages stored after the first snapshot), and unchanged attachments cost nothing. Nightly backups scale with churn, not archive size, and you keep history rather than one restore point.
Consistency without VACUUM INTO. Instead of a full-copy snapshot, the backup checkpoints the WAL and pins a read transaction on its own connection, coordinating with a running daemon through a short freeze window (loopback API, watchdog-protected). No double-disk requirement, no downtime; the daemon resumes writes while pages are scanned. backup create refuses to run unfrozen against a live daemon.
Verification is stronger than existence + hash checks. verify (full mode) re-derives every referenced blob's SHA-256 from its bytes, proves the page map covers the database exactly, and cross-checks manifest counts. Failures name the damaged blob and the pack file holding it. --quick does structural checks; --all covers every snapshot, reading shared content once.
Versioning for the long haul. Compatibility is enforced at three levels (repository min_reader_version, per-object magic+version on every binary format, per-snapshot min_reader_version), so an older msgvault refuses a newer repository or snapshot with an explicit upgrade message instead of misreading it. Manifests are deterministic JSON; snapshot IDs are content-derived from the manifest bytes and re-checked on load, so a renamed or forged manifest is rejected.
Attachments are packed. Content goes into ~32 MB sealed pack files with per-blob zstd (skipped when content doesn't compress), so a 200k-file attachment tree syncs off-site as a few hundred immutable files. This is backup-repository-internal only — the live archive layout is unchanged.

Restore

backup restore --target DIR [SNAPSHOT] [--overwrite] materializes any snapshot into a usable archive home and does not trust itself:

Every database page is checked against the snapshot's page-hash map as it is written; every blob read re-derives its SHA-256 identity.
After materialization, the restored database must pass PRAGMA integrity_check and reproduce the manifest's recorded stats through exactly the queries capture ran inside the freeze; any mismatch fails the restore.
The end-to-end test asserts the restored database is byte-identical to the live file at capture time, for the latest snapshot and for a parent restored from an incremental chain.
Attachments land at the storage paths the restored database records; extras (deletions manifests, config, tokens) land at their captured relative paths and file modes, with traversal-free path validation.
A non-empty target is refused without --overwrite; overwrite merges but always removes the database and stale -wal/-shm sidecars first. Restoring into a running daemon's archive home is refused outright. Old backups go through normal schema migration on first open.

Performance

Page scanning hashes on all CPUs while reads stay strictly sequential (safe for NAS/spinning disks with no flag). Verify, restore, attachment capture, and changed-page packing fan out workers with a --jobs flag; --jobs 1 serializes reads for spinning-disk repositories. Progress renders as in-place bars on a TTY and plain lines through pipes, with per-stage elapsed time that keeps counting through long quiet stages such as the restore proof's integrity_check.

Changes

New packages: internal/pack (sealed pack container) and internal/backup (repository, locking, incremental page capture, manifests, verify, restore).
New CLI: msgvault backup init | create | list | verify | restore; [backup] config section (repo, zstd_level); --jobs on create/verify/restore; --progress rendering.
Daemon: loopback freeze endpoints with a 60s watchdog; the CLI proxy admits only backup create from the backup group — verify and restore always run locally.
Docs: user guide (docs/usage/backup.md) and on-disk format reference (docs/architecture/backup-format.md), including the roadmap for verify --restore-check, encryption, and retention.
TUI: launching against an archive with no usable analytics cache (a freshly restored one, for example) prints a notice and shows it while aggregate views load, instead of a silent slow first screen.

Sequenced later (format hooks in place)

Repository encryption (age-wrapped repo key, per-object AEAD) and forget/prune retention. Until prune ships, content purged from the archive persists in earlier snapshots — documented, with the privacy implications (including SQLite free-page remnants), in the usage guide.
verify --restore-check: the full restore materialization proof against scratch space without writing a target.

🤖 Generated with Claude Code

roborev-ci · 2026-07-04T12:31:34Z

roborev: Combined Review (`89ece82`)

Medium confidence: the PR has two medium-severity issues that should be fixed before merge.

Medium

internal/backup/manifest.go:123
LoadManifest trusts the JSON body without recomputing the content-derived snapshot ID or checking it against the requested filename. Corrupted manifest metadata can therefore be accepted by list, latest, and verify.
Fix: After unmarshal, parse created_at, recompute ComputeSnapshotID, and reject if it differs from the requested ID or embedded snapshot_id.
internal/backup/frozen.go:187
MAX(size) can be NULL because attachments.size is nullable. Scanning it into int64 can make backup creation fail for otherwise valid attachment rows that have content but missing size metadata.
Fix: Use COALESCE(MAX(size), -1) or scan into sql.NullInt64 and let capture fill the actual size from the file.

Reviewers: 2 done | Synthesis: codex, 11s | Total: 15m33s

roborev-ci · 2026-07-04T13:21:13Z

roborev: Combined Review (`d86b7ad`)

Medium severity findings:

internal/backup/attachments.go:109: Attachment capture reconstructs paths as <hash[:2]>/<hash> instead of using the DB storage_path / thumbnail_path. Existing SyncTech MMS imports store files under paths like synctech-sms/<prefix>/<hash>, so backup create can fail to read otherwise valid attachments.
Fix: Carry the relative storage path from AttachmentRefs into capture, read that path after validation, and keep the hash-derived path only as a legacy fallback. Add a test with a prefixed storage_path.
internal/backup/fsutil.go:49: If the final directory sync fails after os.Rename, the object is left visible even though writeFileAtomic returns an error. Later operations trust visible manifests and indexes, so a retry can build on a file already reported as not durable; a subsequent crash can leave a child manifest or dedup reference pointing at an index/manifest that disappears.
Fix: On post-rename SyncDir failure, remove the published final file best-effort and sync the directory again, or otherwise mark successfully published objects so later loads ignore files from failed publishes.
internal/backup/manifest.go:117: LoadManifest trusts the manifest JSON loaded from snapshots/<id>.mvmanifest without verifying that the file name, snapshot_id, and recomputed content-derived ID still match. A tampered backup repository could edit an existing manifest to point to older valid blobs while preserving the filename and snapshot_id, causing backup verify <snapshot> to verify the attacker-selected graph and report success for a snapshot ID that no longer represents its original contents.
Fix: On manifest load, validate the requested ID equals m.SnapshotID, recompute ComputeSnapshotID from m.CreatedAt and the loaded manifest, and reject mismatches before verification or parent-chain traversal. Consider checking a longer digest than the current 8 hex chars for stronger tamper resistance.
internal/backup/pagehash.go:181: ApplyHashDelta allocates d.PageCount * pageHashSize bytes based on an untrusted page count from repository metadata after only checking integer overflow. A malicious repository can provide a tiny hash-delta object with a huge PageCount and no changed pages, causing backup verify or backup create to allocate enough memory to OOM.
Fix: Validate delta growth before allocation. Reject deltas where PageCount grows beyond base.PageCount + len(d.Pages) or appended pages are not explicitly present, and impose a sane maximum supported database/hash-map size.

Reviewers: 2 done | Synthesis: codex, 15s | Total: 19m10s

Two roborev findings on PR #428, both medium: LoadManifest trusted the JSON body as-is, so corrupted, hand-edited, or renamed manifest files were accepted by list, latest, and verify. It now recomputes the content-derived snapshot ID and rejects any manifest whose computed ID does not match both the requested filename and the embedded snapshot_id. A side effect is that forged parent cycles can no longer load at all (a cycle would need SHA-256 fixed-point IDs), so the mapChain cycle tests are replaced by rejected-at-load coverage and the depth-limit chain is built with genuine WriteManifest IDs; the verify geometry test now models a buggy writer with a self-consistent ID, the only way wrong geometry can still reach full verify. AttachmentRefs scanned MAX(size) into int64, but attachments.size is nullable: a content-bearing hash whose rows all lack size metadata made backup create fail. COALESCE to the same -1 sentinel thumbnails already use; capture resolves the real size from the file. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

roborev-ci · 2026-07-04T13:34:12Z

roborev: Combined Review (`7dc29e4`)

Summary: Two medium-severity correctness/concurrency issues need attention; no high or critical findings were reported.

Medium

internal/backup/verify.go:336
Full verification validates the hash-map chain and page-map chain independently, then checks only that page-map blobs exist and page ranges cover the manifest count. It does not validate pm.PageSize against the manifest or confirm that each mapped page’s bytes hash to the materialized page-hash-map entry. A checksum-valid but semantically incorrect page map could pass backup verify while restoring incorrect database bytes.
Fix: Return the materialized hash map from checkHashMapChain, validate pm.PageSize == m.DB.PageSize, and in full mode walk page-map runs to bounds-check blob offsets and compare each mapped page’s PageHash against the hash map.
internal/backup/lock.go:132
Stale lock reaping performs Stat and later Remove by path without serializing or rechecking ownership. If two processes recover the same stale exclusive.json, one can remove the other’s freshly planted lock between stale check and remove, allowing both operations to proceed as exclusive holders.
Fix: Serialize lock-directory mutation with a guard/advisory lock, or atomically revalidate the specific lock identity immediately before unlinking and planting the replacement.

Reviewers: 2 done | Synthesis: codex, 11s | Total: 14m29s

roborev-ci · 2026-07-04T14:17:31Z

roborev: Combined Review (`620a2e5`)

High-level verdict: changes need fixes before merge due to attachment backup correctness and snapshot consistency risks.

High

internal/backup/attachments.go:252
Attachment capture hardcodes <hash[:2]>/<hash> instead of using attachments.storage_path. Existing SyncTech SMS MMS imports store files under synctech-sms/<hash[:2]>/<hash>, so backups for those archives will fail to read valid local attachments.
Fix: Include storage_path/thumbnail_path in the frozen attachment refs and read from the recorded path, validating the hash from the bytes read.

Medium

internal/backup/frozen.go:62
The daemon operation gate is released immediately after pinning the SQLite snapshot, but attachment files are read much later. A concurrent gated purge such as remove-account can delete files still referenced by the pinned DB snapshot, causing the backup to fail or no longer represent the frozen point.
Fix: Keep a backup file-read guard active through attachment capture, or make attachment deletion defer/skip while a backup session is active.

Reviewers: 2 done | Synthesis: codex, 8s | Total: 27m42s

roborev-ci · 2026-07-04T16:25:30Z

roborev: Combined Review (`e78858e`)

High-level verdict: the PR has one High-risk restore correctness issue and three Medium issues around backup consistency, WAL pressure, and extras path safety.

High

internal/backup/restore.go:151
--overwrite accepts a non-empty target without clearing SQLite sidecars, so an existing msgvault.db-wal or msgvault.db-shm can remain next to the restored msgvault.db. The restore proof opens the DB with immutable=1, which ignores those sidecars, so restore can report success for a target that a normal msgvault/SQLite open may read differently.

Fix: Remove msgvault.db-wal and msgvault.db-shm before proofing/reporting success, or restore into a clean temp directory and replace the target contents.

Medium

internal/backup/frozen.go:62
The daemon operation gate is released immediately after pinning the SQLite snapshot, but attachment files are captured later from refs in that snapshot. A concurrent purge/remove-account can delete a file that is still referenced by the frozen DB image before CaptureAttachments reads it, causing backups of otherwise valid snapshots to fail.

Fix: Keep a daemon-side guard that prevents attachment deletion until capture finishes, or make purge paths defer/remain blocked while a backup is using frozen attachment refs.
internal/backup/create.go:84
The pinned SQLite read transaction is deferred until the end of Create, so it stays open through attachment capture, extras capture, pack sealing, index writing, and manifest writing. During long backups this prevents WAL checkpointing and can let the live WAL grow until the whole backup completes.

Fix: Close the frozen session as soon as ScanPages and storePageBlobs finish reading the database bytes; use a separate guard for attachment deletion if needed.
internal/backup/restore.go:536
restoreExtrasEntry allows any local relative extras path, and Restore writes extras after restoring the verified msgvault.db and attachments/ tree. An attacker who can tamper with an off-site/synced backup repository can add a valid manifest/index/pack whose extras tree names msgvault.db or attachments/<hash[:2]>/<hash>. Restore then overwrites previously verified files and can still report success, since the final DB proof only checks SQLite integrity/stats and attachments are not re-hashed after extras are written.

Fix: Restrict extras restore paths to the capture-produced roots (deletions/, config.toml, tokens/, client_secret*.json) and explicitly reject msgvault.db, attachments/, duplicates, and overlapping paths. Alternatively, restore extras before core files and verify the final completed tree.

Reviewers: 2 done | Synthesis: codex, 15s | Total: 16m54s

Squashed branch commits: - fix: verify progress bar tracks blobs, not snapshots - feat: backup restore with layered self-proof - fix: address roborev findings on delta safety, storage paths, and progress Generated with Codex (GPT-5) Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

Restore now places attachment blobs at database-recorded storage paths, but version-1 readers (earlier builds of this branch) placed every blob at the canonical <aa>/<hash> derivation. A snapshot whose database records namespaced paths (synctech-sms style) would restore "successfully" under such a reader while the database points at files that do not exist. Manifests now record format/min_reader version 2 when any attachment row's storage or thumbnail path deviates from the canonical derivation, so old readers refuse them explicitly via the existing LoadManifest gate. The check inspects every row rather than the captured refs, because MIN(storage_path) can select a canonical path while another row records the same hash at a namespaced one. All-canonical snapshots keep version 1 and remain readable by older code; SupportedReaderVersion rises to 2. Addresses roborev review 27340. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

roborev-ci · 2026-07-04T17:07:41Z

roborev: Combined Review (`060a73c`)

Medium-risk issues found; no Critical or High findings reported.

Medium

internal/backup/attachments.go:168: Attachment capture defaults to one worker per CPU, and each worker later reads and retains an entire attachment file while hashing/compressing it. A library with several large videos can hold many whole files in memory at once and OOM during a default backup create.

Fix: Add a byte-weighted memory limit or stream hashing/compression, and consider a much smaller default parallelism for attachment capture.
internal/backup/restore.go:594: restoreExtrasEntry only checks that an extras path is local before writing it under the restore target. A tampered backup repository can include extras such as attachments/<aa>/<hash> or msgvault.db and overwrite already-restored archive files, allowing restore to report success while materializing attacker-controlled content.

Fix: Reject extras paths that overlap managed restore outputs, including msgvault.db, SQLite sidecars, and anything under attachments/, or restore extras before DB/attachments while still reserving those paths explicitly.
internal/backup/manifest.go:91: Snapshot IDs use only the first 4 bytes of the manifest SHA-256. A repository writer can feasibly brute-force a 32-bit target, replace a known <snapshot-id>.mvmanifest, and have list, verify, or restore accept the forged manifest under the same ID.

Fix: Include a substantially longer digest in the snapshot ID, at least 128 bits, or add a signed/MACed manifest if repository tamper resistance is intended.

Reviewers: 2 done | Synthesis: codex, 12s | Total: 17m19s

Addresses roborev reviews 27340 and 27341 (five Medium findings): - Attachment capture admits work through a byte gate (1 GiB budget) in addition to the worker-count window: a per-CPU pool holding whole files could OOM a default backup of a video-heavy library. Oversized files are admitted alone, and the gate unblocks on failure so an aborted capture cannot deadlock the dispatcher. - restoreExtrasEntry rejects entries naming the restored database, its SQLite sidecars, or the attachments tree (case-insensitively, for macOS filesystems): a tampered extras tree could otherwise overwrite already-proven restore outputs after the proof ran on them. - Snapshot IDs now embed 128 bits of the manifest SHA-256 instead of 32: LoadManifest's recompute check is what rejects forged manifests, and a 32-bit suffix is brute-forceable. Full tamper resistance arrives with repository encryption; the longer digest closes the cheap forgery in the meantime. Pre-release snapshots with short IDs are no longer readable — recreate dev repositories. - Full verify checks the materialized page map's page size against the manifest, mirroring the hash-map geometry check; the mismatch was previously only caught at restore time. - writeRun's blob bounds check is subtraction-based: a decoded page-map run with a huge BlobOffset could wrap the addition-based comparison and panic on the slice instead of failing the restore. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

roborev-ci · 2026-07-04T17:46:33Z

roborev: Combined Review (`b8bbeda`)

High-level verdict: Changes need attention before merge due to plaintext backup exposure risk.

High

internal/backup/create.go:133
Create builds the pack appender with a nil crypter while repositories are initialized with encryption = "none" and encrypted repos are refused. This means the new backup path writes message database pages and attachments into readable backup packs, while the docs recommend syncing that repository off-site. Anyone with read access to the synced repository, such as a cloud-sync provider or compromised backup destination, could restore or parse the backup and recover archived messages and attachments.

Suggested remediation: implement and require authenticated repository encryption before recommending untrusted/off-site sync, or clearly gate/document plaintext repositories as trusted-storage-only. At minimum, avoid describing plaintext repositories as safe for off-site sync without a confidentiality warning.

Reviewers: 2 done | Synthesis: codex, 7s | Total: 17m42s

On a first backup every page is dirty, so create spends most of its wall clock compressing and writing ~all page content into packs — a phase that had no progress stage of its own (the scan bar sat frozen at its last tick for minutes, looking hung) and ran zstd on a single core (observed 85% CPU / 383 MB/s against an 8-core machine on a 66 GB archive). storePageBlobs now reports a dedicated "pack" stage (skipped entirely when nothing is dirty) and fans read+hash+compress out to --jobs workers with a strictly in-order collector, so pack contents and the page-map delta are byte-identical to a serial run. --jobs 1 keeps reads strictly serial for archives on spinning disks or NAS shares, matching the attachment-capture contract. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The restore proof runs PRAGMA integrity_check, which reads the entire restored database inside SQLite with no events to report — on a 66 GB archive the proof bar sat at 0% for minutes looking hung. Stage lines also gave no sense of how long each phase took. Every progress line (TTY and plain) now carries the stage's elapsed wall-clock time once it passes one second, and final lines record the stage's total duration. An idle ticker re-renders an open TTY line when no event has arrived for a throttle interval, so silent stages visibly count up instead of freezing. The proof stage also reports a sub-step: integrity_check done (1/2), stats comparison done (2/2). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

roborev-ci · 2026-07-04T18:40:23Z

roborev: Combined Review (`940d5fe`)

No issues found.

Reviewers: 2 done | Synthesis: codex | Total: 20m3s

Launching the TUI against an archive with no analytics cache (e.g. a freshly restored backup) left the screen blank for minutes with only a small spinner: in the default engine="auto" mode the daemon silently falls back to live SQL, and the first aggregate query over a large database can take a long time. Nothing told the user why. Now the tui command asks the daemon for cache stats before launching (best-effort, 5s timeout) and, when the cache is missing, prints a notice recommending 'msgvault build-cache' and shows the same message on the TUI info line while aggregate views load, in both email and texts modes.

…sfires Review findings from jobs 27346, 27350, and 27352: - restoreExtrasEntry validated the raw extras path but wrote through filepath.Join, which cleans ".." away — a tampered tree entry like safe/../msgvault.db passed the reserved-name check yet landed on the restored database. Paths are now cleaned before validation. - The progress renderer's throttle discarded events before recording them, so the idle ticker kept repainting stale counts (a quick proof 1/2 inside the throttle window stayed rendered as 0/2 through the long stats comparison). lastEvent is now recorded before throttling. - The TUI analytics-cache notice asked the daemon's /cli/cache-stats, which runs DuckDB scans over the whole cache when one exists (slowing every launch) and reports missing Parquet files for PostgreSQL backends where build-cache does not even apply. The notice now mirrors the daemon's own engine decision from shared local config at stat/glob cost and stays silent for postgres, engine="sql", and configured remotes.

…uide The TUI guide claimed the daemon builds or updates the Parquet cache "when aggregate views need it", but aggregate views never trigger a build: the engine is chosen once at daemon startup, stale caches are refreshed after scheduled syncs, and startup builds happen only under engine="duckdb". This is exactly the surprise behind the blank first view on a freshly restored archive, so document the actual triggers and the new launch notice pointing at 'msgvault build-cache'.

Review findings from jobs 27353 and 27354, plus the nix CI failure: - The TUI cache notice inferred live-SQL fallback from local disk state, but the daemon chooses its engine once at startup: after a cache build the notice went silent while the running daemon stayed on live SQL, and the staleness probe opened the SQLite archive directly (checkpointing the WAL on close) on every launch. The daemon now records its startup choice (duckdb, sql, sql-fallback, postgres) and reports it as analytics_engine on /health; the notice keys off that authoritative O(1) signal, works for configured remotes, and tells the user a daemon restart is needed after build-cache. - docs: auto_build_cache only controls the startup build under engine="duckdb" — scheduled-sync refreshes ignore it. Corrected the tui.md sample comment and the storage.md claim that the daemon builds the cache "when aggregate views need it". - nix: update vendorHash for this branch's dependency changes (verified with a local nix build).

roborev-ci · 2026-07-04T19:19:30Z

roborev: Combined Review (`f848d0d`)

Summary verdict: one medium-severity restore-safety issue remains.

Medium

internal/backup/verify.go:620 — Full backup verify only checks that page-map blob IDs exist and can be read; it does not validate each page-map run’s BlobOffset/length bounds or that the mapped page bytes match the materialized page-hash map. A self-consistent bad snapshot can verify cleanly but later fail restore in writeRun.
- Fix: In full verify, validate page-map runs against the raw page blobs and hash map, reusing or extracting the restore-side bounds and page-hash checks without writing files.

Reviewers: 2 done | Synthesis: codex, 7s | Total: 21m27s

The Windows CI lane failed across internal/backup and internal/config: - restoredDBDSN passed the OS path straight into a file: URI, so a Windows drive-letter path full of backslashes was percent-encoded and read by SQLite's URI parser as an authority ("invalid uri authority: C:%5CUsers..."), failing every restore proof. The path is now slash-converted and slash-rooted (file:///C:/dir/msgvault.db). - Extras "not a regular file" errors printed the OS-separator relative path while tree entries and tests use the canonical slash form. - TestRestoreTargetPathWithURISyntax used '?' in a directory name, which Windows filenames cannot contain; the Windows variant keeps '#' and space, and a new unit test pins the drive-letter DSN shape. - TestLoadWithBackupConfig hardcoded "/mnt/...", which is not absolute on Windows and was resolved relative to the home directory; the fixture now substitutes a platform-absolute path.

roborev-ci · 2026-07-04T19:51:16Z

roborev: Combined Review (`8d96010`)

Restore has one medium-severity path handling bug that should be fixed before merge.

Medium

internal/backup/restore.go:472: Relative restore targets are converted into root-relative SQLite URIs. For --target restore, restoreDBDSN("restore/msgvault.db") becomes file:///restore/msgvault.db?..., so the proof/attachment path query opens the wrong database and the restore fails.
- Fix: Resolve non-drive-letter relative paths with filepath.Abs before building the file: URI, or make TargetDir absolute at the start of Restore; add a relative-target restore test.

Reviewers: 2 done | Synthesis: codex, 6s | Total: 16m4s

…s lane Review 27358 (Medium): restoredDBDSN's Windows fix slash-rooted every non-rooted path, so a relative --target like "restore-out" had its database proven at /restore-out/msgvault.db instead of the materialized file. The path is now made absolute before URI construction, with a relative-target restore test and DSN unit tests (the drive-letter shape check is Windows-only since filepath.Abs treats drive paths as relative elsewhere). Also the last Windows CI failure: the extras mode-preservation assert expects the captured 0640 to round-trip, but Windows has no POSIX permission bits and Stat reports 0666 for any writable file, so the exact-mode check is now POSIX-only.

roborev-ci · 2026-07-04T20:10:49Z

roborev: Combined Review (`92e30d2`)

Medium-risk issue found; no Critical or High findings.

Medium

cmd/msgvault/cmd/backup.go:364 — backup create uses urlFromDaemonRuntime(rt) for the daemon freeze API, but freeze handlers reject non-loopback RemoteAddr. If the daemon is bound to a specific non-loopback address like 192.168.x.x, the subprocess calls the freeze endpoint through that address and receives 404, preventing backups for an otherwise valid remote/NAS bind configuration.
Suggested fix: Use a reachable loopback listener for freeze control, authenticate/relax the freeze endpoint for same-daemon subprocess calls, or reject and clearly handle specific non-loopback bind addresses for backup create.

Reviewers: 2 done | Synthesis: codex, 7s | Total: 22m33s

Combined review of 92e30d2 (Medium): backup create dials the daemon at its configured bind address, but the freeze handlers required a loopback RemoteAddr — so a daemon bound to a specific LAN IP (a NAS bound to 192.168.x.x) answered its own freeze subprocess with 404 and backups were impossible on an otherwise valid configuration. The freeze gate now accepts same-host callers: loopback, or a source address assigned to one of this machine's own interfaces, which is exactly how a same-host connection to a LAN bind arrives. The check still reads r.RemoteAddr (never forwarded-for headers) and API-key auth is still required, including for the new same-host case.

roborev-ci · 2026-07-04T20:35:54Z

roborev: Combined Review (`68a3c4b`)

Medium findings:

internal/backup/create.go:122 - Create accepts a context, but large scan, changed-page packing, and attachment capture paths do not observe cancellation. Canceling backup create during these phases can leave work running until completion or daemon termination, keeping the repo lock fresh until stale-lock timeout. Thread ctx through ScanPages, storePageBlobs, and CaptureAttachments; check cancellation before dispatch/read work and return ctx.Err() after draining workers so cleanup defers run.
internal/backup/repo.go:156 - CleanStaging trusts <repo>/staging as a real directory. If another principal can replace staging with a symlink, the next cleanup can remove entries in the symlink target as the msgvault user. Validate repo structural directories with os.Lstat before use and refuse or replace symlinks; for staging cleanup, remove and recreate the staging path itself or otherwise operate only on a verified real directory under the repo root.

Reviewers: 2 done | Synthesis: codex, 9s | Total: 25m7s

…aging Combined review of 68a3c4b (two Mediums): - Canceling backup create during the scan, changed-page packing, or attachment capture phases left the phase running to completion, and the lock heartbeat kept the repo locked meanwhile. ScanPages, storePageBlobs, and CaptureAttachments now take ctx: dispatchers stop on ctx.Done, collectors fail fast and drain their workers, and a post-drain ctx check prevents an early dispatcher exit from returning partial work as success. - CleanStaging trusted <repo>/staging blindly: a "staging" symlink planted by another principal would have its target's entries RemoveAll'd as the msgvault user on the next cleanup. The path is now Lstat'd and anything that is not a real directory is refused.

github-advanced-security AI found potential problems Jul 4, 2026

View reviewed changes

Comment thread internal/pack/crypter.go Fixed

wesm mentioned this pull request Jul 4, 2026

Add first-class backup and restore support for Msgvault state #339

Open

wesm force-pushed the backup-restore-design branch from 7dc29e4 to 620a2e5 Compare July 4, 2026 13:53

wesm changed the title ~~Add backup repositories: incremental, verifiable snapshots (part 1 of #339)~~ Add backup and restore: incremental snapshots with a self-proving restore (part 1 of #339) Jul 4, 2026

wesm force-pushed the backup-restore-design branch from 17be3ed to f5473cc Compare July 4, 2026 16:44

wesm and others added 2 commits July 4, 2026 13:09

wesm added 3 commits July 4, 2026 13:44

wesm changed the title ~~Add backup and restore: incremental snapshots with a self-proving restore (part 1 of #339)~~ Add msgvault backup: init, create, list, verify, and restore commands (part 1 of #339) Jul 4, 2026

wesm changed the title ~~Add msgvault backup: init, create, list, verify, and restore commands (part 1 of #339)~~ Add msgvault backup: init, create, list, verify, and restore commands (#339) Jul 4, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add msgvault backup: init, create, list, verify, and restore commands (#339)#428

Add msgvault backup: init, create, list, verify, and restore commands (#339)#428
wesm wants to merge 13 commits into
mainfrom
backup-restore-design

wesm commented Jul 4, 2026 •

edited

Loading

Uh oh!

Uh oh!

roborev-ci Bot commented Jul 4, 2026

Uh oh!

roborev-ci Bot commented Jul 4, 2026

Uh oh!

roborev-ci Bot commented Jul 4, 2026

Uh oh!

roborev-ci Bot commented Jul 4, 2026

Uh oh!

roborev-ci Bot commented Jul 4, 2026

Uh oh!

roborev-ci Bot commented Jul 4, 2026

Uh oh!

roborev-ci Bot commented Jul 4, 2026

Uh oh!

roborev-ci Bot commented Jul 4, 2026

Uh oh!

roborev-ci Bot commented Jul 4, 2026

Uh oh!

roborev-ci Bot commented Jul 4, 2026

Uh oh!

roborev-ci Bot commented Jul 4, 2026

Uh oh!

roborev-ci Bot commented Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

wesm commented Jul 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How this maps to the #339 proposal

Restore

Performance

Changes

Sequenced later (format hooks in place)

Uh oh!

Uh oh!

roborev-ci Bot commented Jul 4, 2026

roborev: Combined Review (89ece82)

Medium

Uh oh!

roborev-ci Bot commented Jul 4, 2026

roborev: Combined Review (d86b7ad)

Uh oh!

roborev-ci Bot commented Jul 4, 2026

roborev: Combined Review (7dc29e4)

Medium

Uh oh!

roborev-ci Bot commented Jul 4, 2026

roborev: Combined Review (620a2e5)

High

Medium

Uh oh!

roborev-ci Bot commented Jul 4, 2026

roborev: Combined Review (e78858e)

High

Medium

Uh oh!

roborev-ci Bot commented Jul 4, 2026

roborev: Combined Review (060a73c)

Medium

Uh oh!

roborev-ci Bot commented Jul 4, 2026

roborev: Combined Review (b8bbeda)

High

Uh oh!

roborev-ci Bot commented Jul 4, 2026

roborev: Combined Review (940d5fe)

Uh oh!

roborev-ci Bot commented Jul 4, 2026

roborev: Combined Review (f848d0d)

Medium

Uh oh!

roborev-ci Bot commented Jul 4, 2026

roborev: Combined Review (8d96010)

Medium

Uh oh!

roborev-ci Bot commented Jul 4, 2026

roborev: Combined Review (92e30d2)

Medium

Uh oh!

roborev-ci Bot commented Jul 4, 2026

roborev: Combined Review (68a3c4b)

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

wesm commented Jul 4, 2026 •

edited

Loading

roborev: Combined Review (`89ece82`)

roborev: Combined Review (`d86b7ad`)

roborev: Combined Review (`7dc29e4`)

roborev: Combined Review (`620a2e5`)

roborev: Combined Review (`e78858e`)

roborev: Combined Review (`060a73c`)

roborev: Combined Review (`b8bbeda`)

roborev: Combined Review (`940d5fe`)

roborev: Combined Review (`f848d0d`)

roborev: Combined Review (`8d96010`)

roborev: Combined Review (`92e30d2`)

roborev: Combined Review (`68a3c4b`)