chore(release): promote rc-2026.5.5#106

Merged: jacderida merged 10 commits into main from rc-2026.5.5 on Jun 3, 2026

@jacderida
Contributor

Promotes rc-2026.5.5 to release versions 0.2.6 and 0.2.7.

  • strips -rc.* from [package].version
  • rewrites internal git+branch deps to crates.io version pins
  • regenerates Cargo.lock

Once merged, the release tag will be pushed to fire the publish workflow.

jacderida and others added 10 commits May 29, 2026 13:59
On STG-01 (ant-node 0.11.6-rc.3) the single forced `--merkle` client failed
~100% of uploads while every no-merkle/auto client was healthy. The failures
were all at the quote stage ("insufficient peers: Got N quotes, need 7 ...
Timeout waiting for quote"), and were NOT region- or node-related (equally
distant no-merkle clients had zero quote timeouts).

Root cause is in the merkle preflight. `plan_merkle_upload` checks every chunk
for existing storage before paying, via `chunk_already_stored_for_merkle` ->
`get_store_quotes`, which requires a full CLOSE_GROUP_SIZE (7) quote quorum. The
preflight stream did `let is_already_stored = result?;`, so the FIRST chunk
whose quote collection hit a transient timeout aborted the entire plan. Forced
`--merkle` (PaymentMode::Merkle) has no wave-batch fallback (those only fire in
PaymentMode::Auto), so the whole upload failed. The effect compounds per chunk:
if each chunk independently misses quorum with probability p, then
P(all N chunks reach quorum) = (1-p)^N → ~0 for a large file, hence the ~100%
failure. The wave path tolerates per-chunk failures (retry / partial success),
which is why no-merkle clients were fine.
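
To make the (1-p)^N estimate concrete, here is a minimal sketch. It takes p to
be the per-chunk chance of a transient quote failure and assumes chunks fail
independently. The function name and the example values are illustrative only;
they are not taken from ant-core or from measurements.

```rust
/// Probability that every one of `n` chunks clears the preflight when each
/// chunk independently hits a transient quote failure with probability `p`.
/// Independence between chunks is an assumption of this sketch.
fn preflight_success_probability(p: f64, n: u32) -> f64 {
    (1.0 - p).powi(n as i32)
}
```

With a hypothetical p = 0.01 and N = 500, the result is below 1%, so an
all-or-nothing preflight fails almost every large upload even when individual
quote timeouts are rare.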

The preflight is only an optimisation — "is this chunk already on the network so
we can skip paying for it?" It must never be able to abort an upload. Route the
quote result through a new `preflight_stored_status` helper that degrades
transient failures (timeout / insufficient peers / transport, via the existing
`classify_error`) to "not known to be stored" -> queue the chunk for upload.
Re-storing an existing chunk is idempotent (the node returns AlreadyExists) and
the real store step has its own per-chunk quorum and retries. Genuine
application errors still propagate.
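
The degradation rule described above can be sketched roughly as follows. Only
the name `preflight_stored_status` comes from this commit. The `QuoteError`
enum, its variants, and `is_transient` are hypothetical stand-ins for the real
error type and the existing `classify_error`, and treating "already stored" as
an error variant is an assumption.

```rust
/// Hypothetical error type for this sketch; the real ant-core error differs.
#[derive(Debug, PartialEq)]
enum QuoteError {
    AlreadyStored,
    Timeout,
    InsufficientPeers { got: usize, need: usize },
    Transport(String),
    Application(String),
}

/// Stand-in for `classify_error`: true for failures worth degrading.
fn is_transient(err: &QuoteError) -> bool {
    matches!(
        err,
        QuoteError::Timeout | QuoteError::InsufficientPeers { .. } | QuoteError::Transport(_)
    )
}

/// Maps a quote result to "is this chunk already stored?".
/// Transient failures become `Ok(false)` so the chunk is queued for upload
/// instead of aborting the plan; application errors still propagate.
fn preflight_stored_status<Q>(result: Result<Q, QuoteError>) -> Result<bool, QuoteError> {
    match result {
        // Quotes were gathered, so the chunk is not known to be stored.
        Ok(_) => Ok(false),
        Err(QuoteError::AlreadyStored) => Ok(true),
        Err(e) if is_transient(&e) => Ok(false),
        Err(e) => Err(e),
    }
}
```

The design choice is that a wrong "not stored" answer only costs a redundant
store attempt, while a propagated transient error costs the whole upload.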

Adds unit tests covering: quotes-gathered -> not stored, AlreadyStored ->
stored, transient failures (incl. the exact "Got N quotes, need 7") -> not
stored rather than error, and application errors still propagating.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Clippy, Documentation, and E2E CI jobs (everything built with
`--all-features`, which enables the `devnet` feature and the e2e tests) failed
to compile with:

    error[E0308]: mismatched types
      --> ant-core/src/node/devnet.rs:78
      expected `ant_protocol::transport::MultiAddr`,
      found `ant_node::core::MultiAddr`
    note: there are multiple different versions of crate `saorsa_core` in the
          dependency graph

`ant-core` pinned `ant-protocol` to the rc-2026.5.5 git branch (saorsa-core
0.24.5-rc.1, git) but pulled `ant-node` from crates.io 0.11.5, which depends on
the older crates.io ant-protocol 2.1.1 / saorsa-core 0.24.4. Two saorsa-core
copies meant `ant_node`'s and `ant_protocol`'s re-exported `MultiAddr` were
distinct, incompatible types where `devnet.rs` uses them together. This was
pre-existing on the base branch (same jobs red before this branch existed); the
Build and Unit Test jobs, which don't enable those features, stayed green and
hid it.

Pin the optional runtime `ant-node` dep, the dev-dependency `ant-node`, and the
dev-dependency `saorsa-core` to the same rc-2026.5.5 git branches as
`ant-protocol`. The graph now resolves a single saorsa-core (0.24.5-rc.1, git)
and a single ant-node (0.11.6-rc.3, git), matching the "honour the one pin"
note already in the manifest.

Verified: `cargo clippy --all-targets --all-features -- -D warnings`,
`RUSTDOCFLAGS=-D warnings cargo doc --all-features`, and
`cargo test --all-features --no-run` (e2e_* targets now build) all pass;
`cargo test -p ant-core --lib` → 327 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ent-quotes

fix(merkle): don't abort the whole upload when a preflight quote fails
The CLI `ant file upload` direct merkle path (`upload_waves_merkle`) aborted the
entire file the instant one chunk's store returned `Err`, with no retry and no
convergence wait. A chunk that was only transiently short of quorum (a few
close-group peers' routing tables briefly disagreeing, or momentarily full)
would kill a multi-hundred-chunk upload — observed in the PROD-UL-01 failures
where a single unstoreable chunk aborted uploads at 29/367 and 163/479.

The external-signer path already handled this correctly via
`merkle_store_with_retry`. This change makes the CLI path reuse that same
machinery so transient quorum shortfalls are retried instead of aborting.

- Make `merkle_store_with_retry` and the `MERKLE_STORE_MAX_ATTEMPTS` /
  `MERKLE_RETRY_BACKOFF` constants `pub(crate)` so the CLI path can call them
- Add `failed_addresses` to `MerkleStoreOutcome` and carry each chunk's last
  quorum-shortfall message through retry rounds, so the CLI path can build a
  faithful `PartialUpload`
- Rewrite `upload_waves_merkle` to drive each wave through
  `merkle_store_with_retry` (per-wave, preserving the streaming memory bound),
  aggregate failures across waves, and return `PartialUpload` only after retries
  are exhausted — never silently succeeding with missing chunks
- Pre-partition addresses by proof so chunks left without a proof by a partial
  `pay_for_merkle_multi_batch` result (a later sub-batch's payment failed) are
  reported as failed via `PartialUpload` rather than aborting the whole file with
  a generic error — preserving the resumable partial-upload contract and every
  already-stored chunk
- Convert any residual non-quorum store error at the `upload_waves_merkle`
  boundary into `PartialUpload` so earlier waves' stored chunks survive for resume
- Non-`InsufficientPeers` errors stay non-retryable, unchanged
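
The per-wave retry behaviour in the list above can be sketched roughly as
follows. This is not the real `merkle_store_with_retry`: the `u32` addresses,
the `StoreError` and `StoreOutcome` types, and the closure-based `store`
parameter are assumptions made so the example is self-contained. Only the
`failed_addresses` idea and the retryable / non-retryable split come from the
commit.

```rust
use std::collections::BTreeMap;
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error type: only quorum shortfalls are retryable.
#[derive(Debug)]
enum StoreError {
    InsufficientPeers(String),
    Other(String),
}

/// Hypothetical outcome: stored chunks plus each failed chunk's last message.
#[derive(Debug, Default)]
struct StoreOutcome {
    stored: Vec<u32>,
    failed_addresses: BTreeMap<u32, String>,
}

/// Stores one wave, retrying quorum-short chunks up to `max_attempts` rounds.
fn store_with_retry<F>(
    addresses: &[u32],
    max_attempts: usize,
    backoff: Duration,
    mut store: F,
) -> Result<StoreOutcome, StoreError>
where
    F: FnMut(u32) -> Result<(), StoreError>,
{
    let mut outcome = StoreOutcome::default();
    let mut pending: Vec<u32> = addresses.to_vec();
    for attempt in 1..=max_attempts {
        let mut still_failing = Vec::new();
        for addr in pending {
            match store(addr) {
                Ok(()) => {
                    // Clear any shortfall message left by an earlier round.
                    outcome.failed_addresses.remove(&addr);
                    outcome.stored.push(addr);
                }
                Err(StoreError::InsufficientPeers(msg)) => {
                    // Keep the latest shortfall message for this chunk.
                    outcome.failed_addresses.insert(addr, msg);
                    still_failing.push(addr);
                }
                // Non-quorum errors are not retried here.
                Err(other) => return Err(other),
            }
        }
        pending = still_failing;
        if pending.is_empty() {
            break;
        }
        if attempt < max_attempts {
            sleep(backoff);
        }
    }
    Ok(outcome)
}
```

A caller in the spirit of `upload_waves_merkle` would run this once per wave,
merge the `failed_addresses` maps across waves, and report a partial upload
only if the merged map is non-empty after all waves finish.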

Tests: `cargo test -p ant-core --lib` for the new merkle + file unit tests
(failed_addresses contract; partition-by-proof boundary), `cargo clippy
--all-targets --all-features -- -D warnings` clean, `cargo fmt --all -- --check`
clean.
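
The partition-by-proof step mentioned above can be illustrated with a minimal
sketch. The `u32` addresses, the `HashMap` of proof bytes, and the function
name are hypothetical; the real code works on ant-core's own address and proof
types.

```rust
use std::collections::HashMap;

/// Splits addresses into (has a proof, has no proof), preserving input order.
/// Chunks in the second list would be reported as failed in the partial-upload
/// result instead of aborting the whole file.
fn partition_by_proof(
    addresses: &[u32],
    proofs: &HashMap<u32, Vec<u8>>,
) -> (Vec<u32>, Vec<u32>) {
    addresses
        .iter()
        .copied()
        .partition(|addr| proofs.contains_key(addr))
}
```

Only the first list is handed to the store step, so a payment failure in a
later sub-batch leaves the already-paid chunks free to proceed.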

Closes V2-413

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fix: retry quorum-short chunks in cli merkle upload path
…eased versions

The helper's rewrite map only covers runtime deps; the optional
devnet ant-node dep and the ant-node/saorsa-core dev-deps were still
git-pinned to the rc-2026.5.5 branch. Pin them to the released
crates.io versions (ant-node 0.11.6, saorsa-core 0.24.5) so main
doesn't freeze on the rc branch, and refresh the comments that
justified the git pins — the released versions are now aligned.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jacderida jacderida merged commit 00da3ab into main Jun 3, 2026
24 checks passed