---
swip: 191 & 192
title: Efficient multiple-version SOC exhaustive lookup; Censorship-resistant decentralised commenting on Swarm
author: lat-murmeldjur
status: Draft
type: Standards Track
category: Primitives
created: 2026-04-16
---


# SWIP 191 - Efficient multiple-version SOC exhaustive lookup
# SWIP 192 - Censorship-resistant decentralised commenting on Swarm

## Problem statement

Single Owner Chunks complement content-addressed chunks by allowing a fixed address, derived from an owner and a topic, to point to arbitrary dynamic content. This makes it possible to create a deterministically addressed root chunk whose address can be calculated by anyone who knows the SOC owner and topic. Sequential feeds are built on top of this primitive: feed updates are consecutive SOCs whose addresses can be calculated by anyone who knows the feed owner and the feed topic, with the topic for each update derived from the feed topic and the update index itself.
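The derivation chain described above can be sketched as follows. This is a hedged illustration only: `sha3_256` stands in for Swarm's keccak256, and the 8-byte big-endian index encoding and concatenation order are assumptions, not the normative wire format.

```python
import hashlib

def feed_update_identifier(feed_topic: bytes, index: int) -> bytes:
    # Topic of the index-th update, derived from the feed topic and the
    # update index (assumed encoding: 8-byte big-endian index).
    return hashlib.sha3_256(feed_topic + index.to_bytes(8, "big")).digest()

def soc_address(identifier: bytes, owner: bytes) -> bytes:
    # SOC address as a hash of the identifier and the owner address;
    # sha3_256 stands in for Swarm's keccak256 here.
    return hashlib.sha3_256(identifier + owner).digest()

# Anyone who knows the owner and the feed topic can compute the address
# of any update without fetching anything first.
owner = bytes.fromhex("aa" * 20)  # hypothetical 20-byte owner address
topic = hashlib.sha3_256(b"example-thread-topic").digest()
addr0 = soc_address(feed_update_identifier(topic, 0), owner)
```

The point of the sketch is the determinism: the same `(owner, topic, index)` triple always yields the same address, so readers and writers agree on where each update lives.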

The anythread design shows that this mechanism can be repurposed into an open commenting and threading system for arbitrary subjects. A thread is defined by a topic string. The shared feed owner is derived from that topic by using it as a publicly known private-key seed, so that anyone can derive the key and write with it; the feed topic is likewise derived from the same topic string. This creates a feed associated with any subject, such as a Swarm reference, a web2 URI, or a label or statement, and that feed can be written by anyone.

To keep comments attributable to the actual writer, anythread introduces one level of indirection. The shared feed does not primarily store comment bodies. Instead, when a participant has a comment for a topic, they publish their own private feed owner identity on the shared anythread feed. The actual comment then resides on a separate feed for the same topic, but owned by that participant alone. This keeps authorship clear while avoiding the need for the shared feed itself to be trusted as the final storage location of the comment content.

This construction also exposes an emergent property of SOCs. If multiple actors use a common SOC owner, they may upload different contents under the same SOC address, including concurrently. If this is not handled explicitly, different parts of a neighbourhood in the network may observe different contents for the same SOC address.

The Graffiti Single/Several Owner Chunk (GSOC) primitive addresses this by making distinct versions of the same SOC visible within the neighbourhood through pull sync, even when they were uploaded with different postage stamps. This avoids the simple partitioning scenario, but it introduces a new requirement: once multiple valid contents may exist for the same SOC address, applications need a practical and reliable way to retrieve all versions of that SOC.

The purpose of SWIP 191 is to provide the retrieval-side mechanism required for efficient exhaustive lookup of multiple versions of a single SOC.
The purpose of SWIP 192 is to make anythread significantly more resistant to censorship and operational failure modes.

## SWIP 191 - Efficient multiple-version SOC exhaustive lookup

### Motivation

If multiple valid versions of the same SOC may exist, a plain retrieval request is no longer enough for an application that needs completeness. A node may return one valid version, but the caller still does not know whether other valid versions exist. What is required is a way to constrain retrieval requests so that the caller can systematically enumerate the entire set of wrapped content addresses associated with a given SOC.

The goal is not to change the integrity model of Swarm. Returned chunks must still validate against the requested SOC. The goal is to extend retrieval so that an application can ask for a version matching additional conditions and, by repeating such requests deterministically, discover every version that exists.

### Retrieve protocol extensions

Two additional query fields are sufficient for this purpose.

1. `prefix`

This field is conceptually a bit prefix. At the protobuf level it can be represented as bytes accompanied by an explicit bit length, so that the query can specify a precise prefix length in bits rather than only in whole bytes. The condition is:

Return a version of SOC `R` whose wrapped content address begins with the bits specified by `prefix`.

2. `other_than`

This field is a 32-byte wrapped content address. The condition is:

Return a version of SOC `R` whose wrapped content address is different from the address specified in `other_than`.

If both fields are present, both conditions must hold. In other words, the request means:

Return a version of SOC `R` whose wrapped content address matches `prefix` and is different from `other_than`.

These additions imply only a small set of changes at the node level:

- the two fields must be added to the retrieve protocol;
- response validation must verify that the returned chunk satisfies the requested conditions if the returned chunk is an SOC;
- chunk-store lookup logic must be able to select a matching version when several versions of the same SOC exist.

This is sufficient to support efficient exhaustive lookup without changing the basic validity rules for SOC retrieval.
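A minimal sketch of the response-validation condition follows, with the prefix written as a bit string for clarity (the protobuf encoding would carry bytes plus a bit length; the function names are illustrative):

```python
from typing import Optional

def bit_prefix_matches(address: bytes, prefix_bits: str) -> bool:
    # Compare the leading bits of a 32-byte wrapped content address
    # against a prefix expressed as a string of '0'/'1' characters.
    address_bits = "".join(f"{b:08b}" for b in address)
    return address_bits.startswith(prefix_bits)

def satisfies_query(wrapped: bytes, prefix: str = "",
                    other_than: Optional[bytes] = None) -> bool:
    # When both fields are present, both conditions must hold.
    if not bit_prefix_matches(wrapped, prefix):
        return False
    if other_than is not None and wrapped == other_than:
        return False
    return True
```

A node validating a response would apply this check to the wrapped content address of a returned SOC before accepting it as an answer to the constrained request.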

### Exhaustive lookup at an application level

An external application (for example a dapp, or the anythread library) can use the two fields above to enumerate all versions of a target SOC with a reasonably low number of requests. The key observation is that wrapped content addresses can be explored as ranges of binary prefixes. Each request either discovers a new address in a range or proves that no additional address exists in that range under the given exclusion condition.

A compact example makes the idea easier to follow.

Assume the target SOC address is `R`.

1. Retrieve `R` without any additional condition.
- Suppose the response contains wrapped content address `A`, and `A` begins with `0000...`.

2. Retrieve `R` again with `other_than = A`.
- If this returns nothing, then `A` is the only known version and the lookup is complete.
- Otherwise suppose the response contains a second address `B`, and `B` begins with `0001...`.

3. At this point `A` and `B` share the prefix `000`. This tells us three things:
- there may still be further versions under the `0000...` branch besides `A`;
- there may still be further versions under the `0001...` branch besides `B`;
- there may also be versions in the neighbouring ranges `001...`, `01...`, and `1...`, which are not yet covered by either known address.

The next requests are therefore:

- `prefix = 0000`, `other_than = A`
- `prefix = 0001`, `other_than = B`
- `prefix = 001`
- `prefix = 01`
- `prefix = 1`

Each request has one of two outcomes. It either returns a new wrapped content address, or it proves that the queried range contains no additional version satisfying the request. By repeating this process only for unexplored ranges, the application can cover the full address space without issuing redundant requests.

The opposite case is also illustrative. If the first two retrieved addresses were `C = 1...` and `D = 0...`, then they would already diverge at the first bit. In that case the next requests would be:

- `prefix = 1`, `other_than = C`
- `prefix = 0`, `other_than = D`

No additional gap queries would be needed at that stage, because the two branches `0...` and `1...` already cover the full space.

### General form of the algorithm

The logic can be expressed recursively over prefix ranges.

For a given starting prefix, the application first tries to learn one address in that range. If it finds one, it then asks for a different address in the same range. If no second address exists, the range is complete. If a second address does exist, the application uses the first point where the two addresses diverge to split the range into smaller subranges and recurses only into those subranges that still need to be explored.

In pseudocode:

```text
Lookup(prefix, known_address = None):

if known_address is None:
A = retrieve(prefix = prefix)
if A does not exist:
return {}

else:
A = known_address

B = retrieve(prefix = prefix, other_than = A)

if B does not exist:
return {A}

let n be the length of the common prefix of A and B,
measured from the beginning of the full wrapped content address

results = {A, B}

results += Lookup(prefix = first n + 1 bits of A, known_address = A)
results += Lookup(prefix = first n + 1 bits of B, known_address = B)

for i from length(prefix) to n - 1:
G = first i bits of A, followed by the next bit flipped
results += Lookup(prefix = G)

return results
```

The initial call is:

```text
Lookup(prefix = "", known_address = None)
```

This returns the set of all versions of the target SOC.

Another way to describe the same algorithm is as a traversal of a binary prefix tree over wrapped content addresses. A plain request finds one leaf in a subtree. A request with `other_than` determines whether there is a second distinct leaf in the same subtree. If there is, the subtree is split at the point of divergence and explored further. If there is not, the subtree is complete.
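The traversal can be made concrete with a runnable sketch against a simulated store that stands in for the retrieve protocol. The store, its `retrieve` signature, and the bit-string representation of addresses are illustrative assumptions, not part of the proposal:

```python
from typing import Optional, Set

def bits(address: bytes) -> str:
    # Bit-string view of an address, so prefixes can be compared textually.
    return "".join(f"{b:08b}" for b in address)

class SimulatedStore:
    # Stand-in for the retrieve protocol: holds every wrapped content
    # address that exists for one target SOC, and counts requests.
    def __init__(self, versions: Set[bytes]):
        self.versions = set(versions)
        self.requests = 0

    def retrieve(self, prefix: str = "",
                 other_than: Optional[bytes] = None) -> Optional[bytes]:
        self.requests += 1
        for v in self.versions:
            if bits(v).startswith(prefix) and v != other_than:
                return v
        return None

def lookup(store: SimulatedStore, prefix: str = "",
           known_address: Optional[bytes] = None) -> Set[bytes]:
    a = known_address if known_address is not None else store.retrieve(prefix=prefix)
    if a is None:
        return set()
    b = store.retrieve(prefix=prefix, other_than=a)
    if b is None:
        return {a}
    a_bits, b_bits = bits(a), bits(b)
    # n = length of the common prefix of A and B, from the address start.
    n = next(i for i in range(len(a_bits)) if a_bits[i] != b_bits[i])
    results = {a, b}
    results |= lookup(store, prefix=a_bits[:n + 1], known_address=a)
    results |= lookup(store, prefix=b_bits[:n + 1], known_address=b)
    # Gap ranges between the query prefix and the divergence point.
    for i in range(len(prefix), n):
        gap = a_bits[:i] + ("1" if a_bits[i] == "0" else "0")
        results |= lookup(store, prefix=gap)
    return results
```

Calling `lookup(store)` with an empty prefix returns the full version set, mirroring the initial call in the pseudocode.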

### Expected number of requests

Let `N` be the number of existing versions of the target SOC.

A practical way to think about the request count is:

- roughly two requests are needed per discovered version to prove uniqueness within its explored range;
- additional requests are needed only for prefix ranges that must be checked and turn out to be empty.

This makes the expected request count close to `2N` when the wrapped content addresses are well distributed across the address space. When versions are clustered, more prefix-only gap checks are needed.

For example:

- if all `N` versions share 16 leading bits, the request count is approximately `2N + 16`;
- if there are `M` clusters, each containing `N` versions that share 16 leading bits within the cluster, a conservative estimate is `2MN + 16M`, noting that in practice shorter-prefix gaps may overlap and reduce the total.

A property of the algorithm is that it avoids dispatching multiple requests for the same address-space range. It only explores branches that are known to matter and only tests gaps that are not already covered.
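The estimates above can be written down directly. The helper below is only the arithmetic from the examples, not a measured result, and it is a conservative upper estimate:

```python
def estimated_requests(clusters: int, versions_per_cluster: int,
                       shared_bits: int) -> int:
    # ~2 requests per discovered version, plus one empty gap check per
    # shared leading bit per cluster; in practice overlapping
    # shorter-prefix gaps can reduce the real total.
    return 2 * clusters * versions_per_cluster + shared_bits * clusters
```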

## SWIP 192 - Censorship-resistant decentralised commenting on Swarm

### What SWIP 191 solves for anythread

SWIP 191 provides a deterministic way to discover all versions of a given SOC with reasonable efficiency. For anythread this is directly useful. If an attacker attempts to censor a specific feed index by publishing additional versions of the same SOC, retrieval no longer has to rely on chance. The application can enumerate all versions of that index and recover the intended one.

This does not solve spam or flooding. Anythread is meant to be censorship-resistant, and content moderation is outside the scope of this proposal. The purpose here is narrower: to make it harder for a participant to hide legitimate thread updates by exploiting the fact that a shared feed owner allows many writers to target the same SOC address.

### Flooding as a means of censorship

A stronger attack is to keep generating more and more versions of the same SOC in an attempt to bury a specific target version behind an ever-growing set of alternatives. This is not a simple partitioning problem; it is a computational flooding attack against exhaustive lookup.

Even in this case, the attacker can only interfere with discovery of the commenter identity on the shared anythread feed. Once that identity has been found, the reader can follow it to the participant's private feed for the same topic, and the comments on that feed can no longer be modified or replaced by anyone else.

Such attacks require extra effort compared with ordinary use and are most relevant for sensitive topics. For that reason, the revised anythread design can expose a tunable security parameter so that applications can choose how costly they want it to be to produce a version that the application will even consider relevant.

### Security parameter for relevant versions

The security parameter works by narrowing the accepted part of the wrapped-content address space.

At security parameter `p = 0`, the application discovers any valid version of the SOC.

At security parameter `p > 0`, the application only browses and writes versions whose wrapped content address matches a designated `p`-bit prefix. A natural choice is to require the wrapped content address to match the first `p` bits of the SOC address itself. Writers can satisfy this condition by introducing a salt after the commenting-identity portion of the SOC payload and varying that salt until the wrapped content address falls into the accepted prefix range.

This has two useful effects:

- it raises the amount of work required to generate a version that the application will treat as relevant;
- it gives the reader a narrower starting range for exhaustive lookup.

Operationally, this means that the exhaustive lookup procedure from SWIP 191 does not begin from the empty prefix. It begins from the selected application-level prefix instead.

As the reader narrows the search further during exhaustive lookup, the attacker must generate alternative versions that match progressively longer prefixes. In other words, once the application enforces a `p`-bit relevance condition and the reader is already exploring a branch of length `p + n`, any on-the-fly flooding attempt must satisfy that full prefix length as well. This makes adaptive flooding increasingly difficult as lookup approaches the target version.
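The writer side of the security parameter can be sketched as a salt search; `sha3_256` stands in here for the chunk hash that yields the wrapped content address, and the payload layout is an assumption:

```python
import hashlib
from itertools import count

def leading_bits(data: bytes, n: int) -> str:
    # First n bits of a byte string, as a '0'/'1' string.
    return "".join(f"{b:08b}" for b in data)[:n]

def mine_relevant_payload(soc_address: bytes, identity: bytes, p: int) -> bytes:
    # Vary a salt appended after the commenting-identity portion of the
    # payload until the wrapped content address matches the first p bits
    # of the SOC address. Expected work grows as 2**p.
    target = leading_bits(soc_address, p)
    for salt in count():
        payload = identity + salt.to_bytes(8, "big")
        wrapped = hashlib.sha3_256(payload).digest()
        if leading_bits(wrapped, p) == target:
            return payload
```

With a small `p` this completes almost instantly; each additional bit roughly doubles the expected mining effort for writers and attackers alike.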

### Time-based erosion of feed indexes

Flooding is not the only remaining issue for anythread. Because anyone can write the shared feed, different feed indexes may be backed by postage stamps with different expiry times. Over time, garbage collection can therefore create arbitrary gaps in a section of the feed that was once contiguous.

This creates two distinct problems.

1. Exhaustive look-ahead becomes inefficient.

If index `n` does not exist, the reader cannot conclude that higher indexes do not exist either. A long thread may therefore require many individual index checks simply to discover where later updates resume.

2. Polling for new updates becomes unreliable.

A reader may watch the first currently empty index and assume that the next update will appear there. Meanwhile, a lower index may be garbage collected, creating a new gap, and a writer may publish into that lower gap. In that case the reader can miss the new update entirely.

SWIP 192 proposes two application-level conventions to address these problems: a pager feed for efficient look-ahead and a short-lived new-updates feed for reliable polling.

### Pager feed for efficient look-ahead

The pager feed is a second feed associated with the same thread and topic. Its purpose is not to replace the base feed, but to summarise occupancy over larger index ranges.

The pager feed works as follows:

- a pager size is chosen at application level, for example `32`;
- when a writer publishes base-feed index `N`, they also publish the same content at pager-feed index `floor(N / pager_size)`;
- each pager-feed index therefore corresponds to a `pager_size` long section of the base-feed indexes;
- because many base indexes map to the same pager index, a single pager index may legitimately accumulate multiple versions;
- the writer is expected to publish the base entry and the pager entry with the same postage stamp, which normally makes the two entries expire together.

This gives the reader an efficient look-ahead mechanism. If the pager size is `32`, then checking the next `16` pager indexes tells the reader whether any updates exist in the next `512` base-feed indexes. Since SWIP 191 already provides a method to enumerate all versions of a pager index, the application can recover the pager information efficiently even though each pager index may hold multiple versions.
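The index mapping is simple enough to state as code; `PAGER_SIZE = 32` matches the example above, and the function names are illustrative:

```python
PAGER_SIZE = 32  # chosen at application level

def pager_index(base_index: int) -> int:
    # Each pager index summarises one PAGER_SIZE-long section of the base feed.
    return base_index // PAGER_SIZE

def base_indexes_covered(p_index: int) -> range:
    # Base-feed indexes whose updates are mirrored at this pager index.
    return range(p_index * PAGER_SIZE, (p_index + 1) * PAGER_SIZE)
```

Checking the next 16 pager indexes then covers the next `16 * PAGER_SIZE = 512` base-feed indexes.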

The base feed and the pager feed serve different purposes and are both useful:

- the base feed spreads writes across more neighbourhoods, which makes censorship by local manipulation more difficult;
- the strong correspondence between base indexes and pager indexes helps writers locate where to write in the pager feed without concentrating too many writes onto a single overloaded SOC by accident;
- this mitigates the risk that the application itself accidentally creates a problematically large number of versions for a single pager index as a side effect;
- the pager feed provides efficient long-range look-ahead across gaps created by garbage collection.

For these reasons, maintaining both feeds is valuable, at the cost of using an extra stamp.

### New-updates feed for efficient polling

Applications that need low-latency polling can attach a third feed to the same thread: the new-updates feed.

The new-updates feed is designed to be short-lived.

- each new update on the base thread is also written to the current new-updates feed;
- the feed changes with epochs, for example one epoch every five minutes;
- the feed identity is derived from the anythread owner, the topic, and the current epoch boundary;
- when the epoch changes, a new new-updates feed begins.

Because each such feed exists only for a short interval, it is much less likely to develop large garbage-collected gaps. Even if a small gap appears, looking ahead a few indexes remains cheap. A reader that wants to poll for fresh activity can therefore monitor the current epoch's new-updates feed instead of relying on the first missing index of the long-lived base feed.
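A sketch of the epoch-bound derivation, assuming the five-minute epochs mentioned above, with `sha3_256` as a stand-in hash and an illustrative concatenation order:

```python
import hashlib

EPOCH_SECONDS = 300  # one epoch every five minutes, as in the example above

def epoch_boundary(unix_time: int) -> int:
    # Start of the epoch containing unix_time.
    return unix_time - (unix_time % EPOCH_SECONDS)

def new_updates_feed_topic(owner: bytes, topic: bytes, unix_time: int) -> bytes:
    # Feed identity derived from the anythread owner, the thread topic
    # and the current epoch boundary.
    boundary = epoch_boundary(unix_time).to_bytes(8, "big")
    return hashlib.sha3_256(owner + topic + boundary).digest()
```

All writers and readers within the same epoch derive the same short-lived feed, and the derivation rolls over automatically at the next boundary.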

### Conclusion

Taken together, these pieces form a more robust anythread design on Swarm.

- The base anythread feed is open to all writers because the feed owner is intentionally derivable from the thread topic.
- Commenters publish their own feed-owner identities on that shared feed, while the actual comment content resides on their private feeds for the same topic.
- SWIP 191 adds retrieve-side conditions that make efficient exhaustive lookup of all SOC versions possible.
- A configurable security parameter can make relevant writes more expensive and make adaptive flooding harder.
- A pager feed makes it practical to look ahead across large gaps created by garbage collection.
- A short-lived new-updates feed makes polling for fresh activity more reliable.

Combined, these conventions provide a substantially more censorship-resistant decentralised commenting system that can operate entirely on Swarm. Besides commenting, the anythread engine can serve a number of social purposes, such as creating searchable user registries where anyone can sign up, sending friend requests to discovered users without a previously established communication channel, and publishing and discovering new content.