chore(release): promote rc-2026.5.5 #106
Merged
On STG-01 (ant-node 0.11.6-rc.3) the single forced `--merkle` client failed
~100% of uploads while every no-merkle/auto client was healthy. The failures
were all at the quote stage ("insufficient peers: Got N quotes, need 7 ...
Timeout waiting for quote"), and were NOT region- or node-related (equally
distant no-merkle clients had zero quote timeouts).
Root cause is in the merkle preflight. `plan_merkle_upload` checks every chunk
for existing storage before paying, via `chunk_already_stored_for_merkle` ->
`get_store_quotes`, which requires a full CLOSE_GROUP_SIZE (7) quote quorum. The
preflight stream did `let is_already_stored = result?;`, so the FIRST chunk
whose quote collection hit a transient timeout aborted the entire plan. Forced
`--merkle` (PaymentMode::Merkle) has no wave-batch fallback — those only fire in
PaymentMode::Auto — so the whole upload failed. Via the multiplicative per-chunk
effect, P(all N chunks reach quorum) = (1-p)^N → ~0 for a large file, hence the
~100% failure. The wave path tolerates per-chunk failures (retry / partial
success), which is why no-merkle clients were fine.
The preflight is only an optimisation — "is this chunk already on the network so
we can skip paying for it?" It must never be able to abort an upload. Route the
quote result through a new `preflight_stored_status` helper that degrades
transient failures (timeout / insufficient peers / transport, via the existing
`classify_error`) to "not known to be stored" -> queue the chunk for upload.
Re-storing an existing chunk is idempotent (the node returns AlreadyExists) and
the real store step has its own per-chunk quorum and retries. Genuine
application errors still propagate.
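A minimal sketch of the degrade-on-transient-failure idea follows. It is not the code from this change: `QuoteError`, `QuoteOutcome`, `is_transient` and `preflight_stored_status_sketch` are hypothetical stand-ins, and the real helper delegates classification to the existing `classify_error`, whose signature is not shown here.

```rust
/// Hypothetical stand-in for the client's error type.
#[derive(Debug, PartialEq)]
pub enum QuoteError {
    Timeout,
    InsufficientPeers { got: usize, need: usize },
    Transport(String),
    Application(String),
}

/// Hypothetical stand-in for the non-error outcomes of a quote request.
#[derive(Debug, PartialEq)]
pub enum QuoteOutcome {
    /// Quotes were gathered, so the chunk is not yet on the network.
    QuotesGathered,
    /// The close group reported that the chunk already exists.
    AlreadyStored,
}

/// Assumed classification: timeouts, quorum shortfalls and transport
/// errors are treated as transient; everything else is an application error.
fn is_transient(err: &QuoteError) -> bool {
    matches!(
        err,
        QuoteError::Timeout | QuoteError::InsufficientPeers { .. } | QuoteError::Transport(_)
    )
}

/// Returns Ok(true) only when the chunk is known to be stored.
/// Transient failures degrade to Ok(false), i.e. "queue it for upload".
pub fn preflight_stored_status_sketch(
    result: Result<QuoteOutcome, QuoteError>,
) -> Result<bool, QuoteError> {
    match result {
        Ok(QuoteOutcome::AlreadyStored) => Ok(true),
        Ok(QuoteOutcome::QuotesGathered) => Ok(false),
        Err(e) if is_transient(&e) => Ok(false),
        Err(e) => Err(e),
    }
}
```

The design choice is that a wrong "not stored" answer only costs a redundant store attempt, whereas a propagated transient error costs the whole upload.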
Adds unit tests covering: quotes-gathered -> not stored, AlreadyStored ->
stored, transient failures (incl. the exact "Got N quotes, need 7") -> not
stored rather than error, and application errors still propagating.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Clippy, Documentation, and E2E CI jobs (everything built with
`--all-features`, which enables the `devnet` feature and the e2e tests) failed
to compile with:
error[E0308]: mismatched types
--> ant-core/src/node/devnet.rs:78
expected `ant_protocol::transport::MultiAddr`,
found `ant_node::core::MultiAddr`
note: there are multiple different versions of crate `saorsa_core` in the
dependency graph
`ant-core` pinned `ant-protocol` to the rc-2026.5.5 git branch (saorsa-core
0.24.5-rc.1, git) but pulled `ant-node` from crates.io 0.11.5, which depends on
the older crates.io ant-protocol 2.1.1 / saorsa-core 0.24.4. Two saorsa-core
copies meant `ant_node`'s and `ant_protocol`'s re-exported `MultiAddr` were
distinct, incompatible types where `devnet.rs` uses them together. This was
pre-existing on the base branch (same jobs red before this branch existed); the
Build and Unit Test jobs, which don't enable those features, stayed green and
hid it.
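Why two copies of one crate yield incompatible types can be shown without any dependencies. In the sketch below, two modules stand in for the two resolved `saorsa-core` versions; the module names and the `same_type` helper are made up for illustration. Structurally identical structs with the same name are still distinct types to the compiler.

```rust
use std::any::TypeId;

/// Stand-in for the `MultiAddr` from the older crates.io copy (illustrative).
pub mod saorsa_core_v0_24_4 {
    #[derive(Debug, Clone, PartialEq)]
    pub struct MultiAddr(pub String);
}

/// Stand-in for the `MultiAddr` from the git rc copy (illustrative).
pub mod saorsa_core_v0_24_5_rc1 {
    #[derive(Debug, Clone, PartialEq)]
    pub struct MultiAddr(pub String);
}

/// True only when `A` and `B` are the very same type.
pub fn same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}
```

Passing one of these `MultiAddr` values where the other is expected fails with the same kind of E0308 mismatch as above, which is why the fix has to happen in the dependency graph rather than in `devnet.rs`.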
Pin the optional runtime `ant-node` dep, the dev-dependency `ant-node`, and the
dev-dependency `saorsa-core` to the same rc-2026.5.5 git branches as
`ant-protocol`. The graph now resolves a single saorsa-core (0.24.5-rc.1, git)
and a single ant-node (0.11.6-rc.3, git), matching the honour-the-one-pin
note already in the manifest.
Verified: `cargo clippy --all-targets --all-features -- -D warnings`,
`RUSTDOCFLAGS=-D warnings cargo doc --all-features`, and
`cargo test --all-features --no-run` (e2e_* targets now build) all pass;
`cargo test -p ant-core --lib` → 327 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ent-quotes fix(merkle): don't abort the whole upload when a preflight quote fails
The CLI `ant file upload` direct merkle path (`upload_waves_merkle`) aborted the entire file the instant one chunk's store returned `Err`, with no retry and no convergence wait. A chunk that was only transiently short of quorum (a few close-group peers' routing tables briefly disagreeing, or momentarily full) would kill a multi-hundred-chunk upload. This was observed in the PROD-UL-01 failures, where a single unstoreable chunk aborted uploads at 29/367 and 163/479.

The external-signer path already handled this correctly via `merkle_store_with_retry`. This change makes the CLI path reuse that same machinery so transient quorum shortfalls are retried instead of aborting.

- Make `merkle_store_with_retry` and the `MERKLE_STORE_MAX_ATTEMPTS` / `MERKLE_RETRY_BACKOFF` constants `pub(crate)` so the CLI path can call them
- Add `failed_addresses` to `MerkleStoreOutcome` and carry each chunk's last quorum-shortfall message through retry rounds, so the CLI path can build a faithful `PartialUpload`
- Rewrite `upload_waves_merkle` to drive each wave through `merkle_store_with_retry` (per-wave, preserving the streaming memory bound), aggregate failures across waves, and return `PartialUpload` only after retries are exhausted, never silently succeeding with missing chunks
- Pre-partition addresses by proof so chunks left without a proof by a partial `pay_for_merkle_multi_batch` result (a later sub-batch's payment failed) are reported as failed via `PartialUpload` rather than aborting the whole file with a generic error, preserving the resumable partial-upload contract and every already-stored chunk
- Convert any residual non-quorum store error at the `upload_waves_merkle` boundary into `PartialUpload` so earlier waves' stored chunks survive for resume
- Non-`InsufficientPeers` errors stay non-retryable, unchanged

Tests: `cargo test -p ant-core --lib` for the new merkle + file unit tests (failed_addresses contract; partition-by-proof boundary), `cargo clippy --all-targets --all-features -- -D warnings` clean, `cargo fmt --all -- --check` clean.

Closes V2-413

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
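The retry-and-aggregate behaviour described in this commit can be sketched as below. This is a simplified, synchronous illustration rather than the real `merkle_store_with_retry`: `StoreError`, `StoreOutcome`, `MAX_ATTEMPTS` and `store_with_retry` are hypothetical names, addresses are plain `u32`s, and the backoff between rounds is omitted.

```rust
use std::collections::BTreeMap;

/// Hypothetical attempt limit, standing in for `MERKLE_STORE_MAX_ATTEMPTS`.
pub const MAX_ATTEMPTS: usize = 3;

/// Hypothetical error type: only quorum shortfalls are retryable.
#[derive(Debug, Clone, PartialEq)]
pub enum StoreError {
    InsufficientPeers(String),
    Other(String),
}

/// Hypothetical outcome: what was stored, plus the last shortfall
/// message for every chunk that never reached quorum.
#[derive(Debug, Default, PartialEq)]
pub struct StoreOutcome {
    pub stored: Vec<u32>,
    pub failed_addresses: BTreeMap<u32, String>,
}

/// Stores each address, retrying only quorum-short chunks in later rounds.
/// `store` receives the address and the 1-based attempt number.
pub fn store_with_retry<F>(addresses: &[u32], mut store: F) -> Result<StoreOutcome, StoreError>
where
    F: FnMut(u32, usize) -> Result<(), StoreError>,
{
    let mut outcome = StoreOutcome::default();
    let mut pending: Vec<u32> = addresses.to_vec();
    for attempt in 1..=MAX_ATTEMPTS {
        let mut still_failing = Vec::new();
        for addr in pending {
            match store(addr, attempt) {
                Ok(()) => {
                    // A later success clears any earlier shortfall record.
                    outcome.failed_addresses.remove(&addr);
                    outcome.stored.push(addr);
                }
                Err(StoreError::InsufficientPeers(msg)) => {
                    // Keep only the most recent shortfall message.
                    outcome.failed_addresses.insert(addr, msg);
                    still_failing.push(addr);
                }
                // Anything else is non-retryable and propagates immediately.
                Err(other) => return Err(other),
            }
        }
        pending = still_failing;
        if pending.is_empty() {
            break;
        }
    }
    Ok(outcome)
}
```

A caller in the position of `upload_waves_merkle` would run this once per wave and report a partial upload only if `failed_addresses` is non-empty after the final round.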
fix: retry quorum-short chunks in cli merkle upload path
…eased versions

The helper's rewrite map only covers runtime deps; the optional devnet ant-node dep and the ant-node/saorsa-core dev-deps were still git-pinned to the rc-2026.5.5 branch. Pin them to the released crates.io versions (ant-node 0.11.6, saorsa-core 0.24.5) so main doesn't freeze on the rc branch, and refresh the comments that justified the git pins; the released versions are now aligned.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Promotes `rc-2026.5.5` to release version(s): 0.2.6, 0.2.7.
`-rc.*` from `[package].version`
`Cargo.lock`
Once merged, the release tag will be pushed to fire the publish workflow.