Add msgvault backup: init, create, list, verify, and restore commands (#339)#428
Add msgvault backup: init, create, list, verify, and restore commands (#339)#428wesm wants to merge 13 commits into
Conversation
roborev: Combined Review (
|
roborev: Combined Review (
|
Two roborev findings on PR #428, both medium: LoadManifest trusted the JSON body as-is, so corrupted, hand-edited, or renamed manifest files were accepted by list, latest, and verify. It now recomputes the content-derived snapshot ID and rejects any manifest whose computed ID does not match both the requested filename and the embedded snapshot_id. A side effect is that forged parent cycles can no longer load at all (a cycle would need SHA-256 fixed-point IDs), so the mapChain cycle tests are replaced by rejected-at-load coverage and the depth-limit chain is built with genuine WriteManifest IDs; the verify geometry test now models a buggy writer with a self-consistent ID, the only way wrong geometry can still reach full verify. AttachmentRefs scanned MAX(size) into int64, but attachments.size is nullable: a content-bearing hash whose rows all lack size metadata made backup create fail. COALESCE to the same -1 sentinel thumbnails already use; capture resolves the real size from the file. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
roborev: Combined Review (
|
7dc29e4 to
620a2e5
Compare
roborev: Combined Review (
|
roborev: Combined Review (
|
Squashed branch commits: - fix: verify progress bar tracks blobs, not snapshots - feat: backup restore with layered self-proof - fix: address roborev findings on delta safety, storage paths, and progress Generated with Codex (GPT-5) Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
17be3ed to
f5473cc
Compare
Restore now places attachment blobs at database-recorded storage paths, but version-1 readers (earlier builds of this branch) placed every blob at the canonical <aa>/<hash> derivation. A snapshot whose database records namespaced paths (synctech-sms style) would restore "successfully" under such a reader while the database points at files that do not exist. Manifests now record format/min_reader version 2 when any attachment row's storage or thumbnail path deviates from the canonical derivation, so old readers refuse them explicitly via the existing LoadManifest gate. The check inspects every row rather than the captured refs, because MIN(storage_path) can select a canonical path while another row records the same hash at a namespaced one. All-canonical snapshots keep version 1 and remain readable by older code; SupportedReaderVersion rises to 2. Addresses roborev review 27340. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
roborev: Combined Review (
|
Addresses roborev reviews 27340 and 27341 (five Medium findings): - Attachment capture admits work through a byte gate (1 GiB budget) in addition to the worker-count window: a per-CPU pool holding whole files could OOM a default backup of a video-heavy library. Oversized files are admitted alone, and the gate unblocks on failure so an aborted capture cannot deadlock the dispatcher. - restoreExtrasEntry rejects entries naming the restored database, its SQLite sidecars, or the attachments tree (case-insensitively, for macOS filesystems): a tampered extras tree could otherwise overwrite already-proven restore outputs after the proof ran on them. - Snapshot IDs now embed 128 bits of the manifest SHA-256 instead of 32: LoadManifest's recompute check is what rejects forged manifests, and a 32-bit suffix is brute-forceable. Full tamper resistance arrives with repository encryption; the longer digest closes the cheap forgery in the meantime. Pre-release snapshots with short IDs are no longer readable — recreate dev repositories. - Full verify checks the materialized page map's page size against the manifest, mirroring the hash-map geometry check; the mismatch was previously only caught at restore time. - writeRun's blob bounds check is subtraction-based: a decoded page-map run with a huge BlobOffset could wrap the addition-based comparison and panic on the slice instead of failing the restore. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
roborev: Combined Review (
|
On a first backup every page is dirty, so create spends most of its wall clock compressing and writing ~all page content into packs — a phase that had no progress stage of its own (the scan bar sat frozen at its last tick for minutes, looking hung) and ran zstd on a single core (observed 85% CPU / 383 MB/s against an 8-core machine on a 66 GB archive). storePageBlobs now reports a dedicated "pack" stage (skipped entirely when nothing is dirty) and fans read+hash+compress out to --jobs workers with a strictly in-order collector, so pack contents and the page-map delta are byte-identical to a serial run. --jobs 1 keeps reads strictly serial for archives on spinning disks or NAS shares, matching the attachment-capture contract. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The restore proof runs PRAGMA integrity_check, which reads the entire restored database inside SQLite with no events to report — on a 66 GB archive the proof bar sat at 0% for minutes looking hung. Stage lines also gave no sense of how long each phase took. Every progress line (TTY and plain) now carries the stage's elapsed wall-clock time once it passes one second, and final lines record the stage's total duration. An idle ticker re-renders an open TTY line when no event has arrived for a throttle interval, so silent stages visibly count up instead of freezing. The proof stage also reports a sub-step: integrity_check done (1/2), stats comparison done (2/2). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
roborev: Combined Review (
|
Launching the TUI against an archive with no analytics cache (e.g. a freshly restored backup) left the screen blank for minutes with only a small spinner: in the default engine="auto" mode the daemon silently falls back to live SQL, and the first aggregate query over a large database can take a long time. Nothing told the user why. Now the tui command asks the daemon for cache stats before launching (best-effort, 5s timeout) and, when the cache is missing, prints a notice recommending 'msgvault build-cache' and shows the same message on the TUI info line while aggregate views load, in both email and texts modes.
…sfires Review findings from jobs 27346, 27350, and 27352: - restoreExtrasEntry validated the raw extras path but wrote through filepath.Join, which cleans ".." away — a tampered tree entry like safe/../msgvault.db passed the reserved-name check yet landed on the restored database. Paths are now cleaned before validation. - The progress renderer's throttle discarded events before recording them, so the idle ticker kept repainting stale counts (a quick proof 1/2 inside the throttle window stayed rendered as 0/2 through the long stats comparison). lastEvent is now recorded before throttling. - The TUI analytics-cache notice asked the daemon's /cli/cache-stats, which runs DuckDB scans over the whole cache when one exists (slowing every launch) and reports missing Parquet files for PostgreSQL backends where build-cache does not even apply. The notice now mirrors the daemon's own engine decision from shared local config at stat/glob cost and stays silent for postgres, engine="sql", and configured remotes.
…uide The TUI guide claimed the daemon builds or updates the Parquet cache "when aggregate views need it", but aggregate views never trigger a build: the engine is chosen once at daemon startup, stale caches are refreshed after scheduled syncs, and startup builds happen only under engine="duckdb". This is exactly the surprise behind the blank first view on a freshly restored archive, so document the actual triggers and the new launch notice pointing at 'msgvault build-cache'.
Review findings from jobs 27353 and 27354, plus the nix CI failure: - The TUI cache notice inferred live-SQL fallback from local disk state, but the daemon chooses its engine once at startup: after a cache build the notice went silent while the running daemon stayed on live SQL, and the staleness probe opened the SQLite archive directly (checkpointing the WAL on close) on every launch. The daemon now records its startup choice (duckdb, sql, sql-fallback, postgres) and reports it as analytics_engine on /health; the notice keys off that authoritative O(1) signal, works for configured remotes, and tells the user a daemon restart is needed after build-cache. - docs: auto_build_cache only controls the startup build under engine="duckdb" — scheduled-sync refreshes ignore it. Corrected the tui.md sample comment and the storage.md claim that the daemon builds the cache "when aggregate views need it". - nix: update vendorHash for this branch's dependency changes (verified with a local nix build).
roborev: Combined Review (
|
The Windows CI lane failed across internal/backup and internal/config:
- restoredDBDSN passed the OS path straight into a file: URI, so a
Windows drive-letter path full of backslashes was percent-encoded
and read by SQLite's URI parser as an authority ("invalid uri
authority: C:%5CUsers..."), failing every restore proof. The path is
now slash-converted and slash-rooted (file:///C:/dir/msgvault.db).
- Extras "not a regular file" errors printed the OS-separator relative
path while tree entries and tests use the canonical slash form.
- TestRestoreTargetPathWithURISyntax used '?' in a directory name,
which Windows filenames cannot contain; the Windows variant keeps
'#' and space, and a new unit test pins the drive-letter DSN shape.
- TestLoadWithBackupConfig hardcoded "/mnt/...", which is not absolute
on Windows and was resolved relative to the home directory; the
fixture now substitutes a platform-absolute path.
roborev: Combined Review (
|
…s lane Review 27358 (Medium): restoredDBDSN's Windows fix slash-rooted every non-rooted path, so a relative --target like "restore-out" had its database proven at /restore-out/msgvault.db instead of the materialized file. The path is now made absolute before URI construction, with a relative-target restore test and DSN unit tests (the drive-letter shape check is Windows-only since filepath.Abs treats drive paths as relative elsewhere). Also the last Windows CI failure: the extras mode-preservation assert expects the captured 0640 to round-trip, but Windows has no POSIX permission bits and Stat reports 0666 for any writable file, so the exact-mode check is now POSIX-only.
roborev: Combined Review (
|
Combined review of 92e30d2 (Medium): backup create dials the daemon at its configured bind address, but the freeze handlers required a loopback RemoteAddr — so a daemon bound to a specific LAN IP (a NAS bound to 192.168.x.x) answered its own freeze subprocess with 404 and backups were impossible on an otherwise valid configuration. The freeze gate now accepts same-host callers: loopback, or a source address assigned to one of this machine's own interfaces, which is exactly how a same-host connection to a LAN bind arrives. The check still reads r.RemoteAddr (never forwarded-for headers) and API-key auth is still required, including for the new same-host case.
roborev: Combined Review (
|
…aging Combined review of 68a3c4b (two Mediums): - Canceling backup create during the scan, changed-page packing, or attachment capture phases left the phase running to completion, and the lock heartbeat kept the repo locked meanwhile. ScanPages, storePageBlobs, and CaptureAttachments now take ctx: dispatchers stop on ctx.Done, collectors fail fast and drain their workers, and a post-drain ctx check prevents an early dispatcher exit from returning partial work as success. - CleanStaging trusted <repo>/staging blindly: a "staging" symlink planted by another principal would have its target's entries RemoveAll'd as the msgvault user on the next cleanup. The path is now Lstat'd and anything that is not a real directory is refused.
Implements the backup and restore side of #339:
msgvault backup init / create / list / verify / restoreagainst an app-owned backup repository. Restore is included and proves itself before reporting success — see "Restore" below.Docs: usage guide · format reference
How this maps to the #339 proposal
Kept as proposed:
--include-config/--include-tokensflags, off by default. Added on top: capturing either into an unencrypted repository requires an explicit--allow-plaintext-secrets, so credentials can't leak by default.<aa>/<hash>layout).Where the design goes further than the proposal:
VACUUM INTO. Instead of a full-copy snapshot, the backup checkpoints the WAL and pins a read transaction on its own connection, coordinating with a running daemon through a short freeze window (loopback API, watchdog-protected). No double-disk requirement, no downtime; the daemon resumes writes while pages are scanned.backup createrefuses to run unfrozen against a live daemon.verify(full mode) re-derives every referenced blob's SHA-256 from its bytes, proves the page map covers the database exactly, and cross-checks manifest counts. Failures name the damaged blob and the pack file holding it.--quickdoes structural checks;--allcovers every snapshot, reading shared content once.min_reader_version, per-object magic+version on every binary format, per-snapshotmin_reader_version), so an older msgvault refuses a newer repository or snapshot with an explicit upgrade message instead of misreading it. Manifests are deterministic JSON; snapshot IDs are content-derived from the manifest bytes and re-checked on load, so a renamed or forged manifest is rejected.Restore
backup restore --target DIR [SNAPSHOT] [--overwrite]materializes any snapshot into a usable archive home and does not trust itself:PRAGMA integrity_checkand reproduce the manifest's recorded stats through exactly the queries capture ran inside the freeze; any mismatch fails the restore.--overwrite; overwrite merges but always removes the database and stale-wal/-shmsidecars first. Restoring into a running daemon's archive home is refused outright. Old backups go through normal schema migration on first open.Performance
Page scanning hashes on all CPUs while reads stay strictly sequential (safe for NAS/spinning disks with no flag). Verify, restore, attachment capture, and changed-page packing fan out workers with a
--jobsflag;--jobs 1serializes reads for spinning-disk repositories. Progress renders as in-place bars on a TTY and plain lines through pipes, with per-stage elapsed time that keeps counting through long quiet stages such as the restore proof'sintegrity_check.Changes
internal/pack(sealed pack container) andinternal/backup(repository, locking, incremental page capture, manifests, verify, restore).msgvault backup init | create | list | verify | restore;[backup]config section (repo,zstd_level);--jobson create/verify/restore;--progressrendering.backup createfrom the backup group — verify and restore always run locally.docs/usage/backup.md) and on-disk format reference (docs/architecture/backup-format.md), including the roadmap forverify --restore-check, encryption, and retention.Sequenced later (format hooks in place)
forget/pruneretention. Until prune ships, content purged from the archive persists in earlier snapshots — documented, with the privacy implications (including SQLite free-page remnants), in the usage guide.verify --restore-check: the full restore materialization proof against scratch space without writing a target.🤖 Generated with Claude Code