connmgr: Limit max connections per host. by davecgh · Pull Request #3698 · decred/dcrd

davecgh · 2026-05-21T22:25:15Z

This requires #3697.

Similar to the recent total normal connection limiting, the current per-host connection limits are enforced by the server. For similar reasons, it is much more robust and natural to perform the limiting early in the connection manager.

With that in mind, this implements the per-host connection limiting in the connection manager and removes the relevant current limiting for it from the server. The limiting is applied to inbound, outbound, and persistent connections.

The new limiting is handled early in both the inbound and outbound paths now which allows it to take advantage of fast connection shedding for inbound connections and to preemptively prevent all outbound attempts that would exceed the limit regardless of source. It also provides the flexibility to apply independent special permissions in the future.

This also slightly changes the semantics to exempt whitelisted addresses for both inbound and outbound connections as opposed to only inbound connections.

It also adds tests to ensure that the new max connections per host limiting properly enforces the limit including automatic outbound, manual outbound, inbound, and persistent connections and also that whitelisted addresses are exempt.

This adds a new context-aware semaphore type with Acquire and Release methods for use in upcoming changes that aim to simplify connection limiting by making use of semaphores for blocking until permits become available.

This adds tests for the new context-aware semaphore to ensure the acquire, release, and context cancel semantics work as expected.

The existing connection manager code was written well before contexts were introduced. Further, due to the old async model that has now been converted to a synchronous model, it is based around connection requests that have their state atomically updated asynchronously as various things happen. While it has undoubtedly worked well enough for over a decade, it has always been a challenge to add new functionality to it and requires the use of a lot of less than ideal and highly outdated techniques such as polling for state changes. It is also rather brittle in terms of requiring output connections to be manually disconnected in the connection manager after they've been closed to avoid things like leaking goroutines and failing to update target outbound counts. Moreover, it only tracks outgoing connections which ultimately forces a lot of connection-related tasks to be split across different layers instead of residing in the connection manager itself where they more naturally belong. Notably, that split, for all intents and purposes, prevents implementing some desirable more advanced features such as immediate connection shedding, different connection types, and listeners tied to specific network types. With the primary goal of addressing all of the aforementioned points and providing a solid base to work on for adding new features moving forward, this significantly reworks the connection manager to completely get rid of the notion of exposed connection requests in favor of a new custom connection type that wraps the underlying net.Conn. The new wrapped connections automatically handle cleanup when closed and have an associated connection type enum that allows easily distinguishing inbound, outbound, and manual connections as well as supporting new connection types in the future. Another nice feature of the new wrapped connections is they provide efficient access to concrete parsed address types which paves the way for avoiding a lot of constant reparsing, repeated host/port splitting and joining, and generally much more ergonomic immutable address types. Since changing to wrapped connections basically required a rather significant rewrite of large portions of the connection manager anyway, this also takes the opportunity to improve several other aspects of the connection manager in the process such as implementing full context support, full tracking of all connection types by the manager itself, much more robust semaphore-based automatic connection limiting, cleaner persistent connection handling with independent limits, prevention of multiple connections of any type to the same address:port, more useful debug logging, and cleanly closing all connections during shutdown. It is also important to note that the following overall semantics have intentionally been changed versus the existing connection manager: - A maximum of 8 persistent connections is now imposed and they no longer count toward the configured target number of automatic outbound peers to maintain - Duplicate addresses (host:port) are now rejected by the connection manager for all types (inbound, outbound, manual, persistent) - Note that inbound conns from the same IP will necessarily have different ports, so the same max IP limits apply in that case - RPC 'node connect' for all connection attempts now: - Supports the RPC connection and server contexts - Properly handles duplicate address rejection including pending attempts - RPC 'node connect' for non-persistent conn attempts now: - Waits for the connection attempt result before returning - Returns an error if the connection attempt fails - Cancels the connection attempt if the RPC connection is closed before it succeeds - RPC 'node remove' now supports removing a pending connection by its persistent connection ID (since no peer ID exists before a valid connection is established) - It is no longer possible for state transitions to allow things like duplicate addresses or failed cancellation

The max retry duration is currently an unexported global variable that the tests override at init time. At least one of the tests also additionally overrides it for that specified test too. While this works, it is somewhat brittle and prevents the tests from being run in parallel. This improves the situation by making the max retry duration a field on the connection manager instead of a global variable and adding a test helper for creating a new connection manager that overrides it by default. Then any tests that need a different value can simply override it on their local instance. It also makes the tests parallel since they can no longer clobber one another.

This updates the test for checking the connection manager cleanly shuts down with failed conns to actualy test what it is intended to. Manual connections do not automatically retry, only persistent connections.

This adds tests to ensure closing a connection multiple times works as intended.

This adds tests to ensure duplication connections are rejected for all possible states.

This adds tests to ensure attempts to add more than the maximum allowed number of persistent are rejected.

This adds tests to ensure the Disconnect method properly disconnects pending and established connections for both non-persistent and persistent connections.

This adds tests to ensure the Remove method properly disconnects and removes pending and established connections for both non-persistent and persistent connections.

This updates the connmgr package README.md to match the new design and capabilities.

This adds a couple of test helpers for asserting the internal state of the connection manager updates all tests to call the new helpers throughout. The first one asserts the internal maps are all coherent and do not violate any preconditions. The second one asserts clean shutdown.

Currently the whitelisting logic happens in the server which makes it inaccessible to the connection manager. In order to pave the way for supporting various connection-related logic that currently happens in the server, but ideally should be happening in the connection manager, this adds basic support for whitelisting CIDR prefixes to the connection manager. The connection manager config struct now accepts a slice of prefixes and a new method named IsWhitelisted is added. Note that this only adds support . It does not update anything to use the new functionality yet.

This adds tests to ensure the new whitelist detection method works as expected.

This modifies the server to pass in the parsed whitelist entries to the connection manager config and the relevant code to make use of the new method it exposes. Finally, it removes the no longer used local isWhitelisted method.

This adds a new TryAcquire method to the context-aware semaphore. As the name implies, the method supports conditionally acquiring the semaphore only when resources are immediately available. In other words, it will not block when there are no resources immediately available.

This adds tests for the new TryAcquire method on the context-aware semaphore to ensure the semantics work as expected.

The current overall total connection limits are enforced by the server rather than the connection manager. This is not ideal for many reasons, but one of the most important consequences is that it makes DoS attacks easier. Another example of some less than ideal behavior that it allows is that some rare combinations of events can lead to temporary extra connection churn. It is much more robust and natural to perform the limiting in the connection manager itself via semaphores. That approach not only significantly hardens the server against DoS attacks and solves various edge cases present in the current code, it also paves the way for even more advanced features such as traffic shaping in the future. To that end, this adds semaphore-based limiting for the total overall number of normal connections to the connection manager and removes the relevant current limiting for it from the server. Normal connections are the automatic outbound, manual outbound, and inbound connections. Persistent connections, on the other hand, are not subject to the limit since they have their own limiting. This is consistent with them not being subject to the automatic target outbound limit either.

This adds tests to ensure that the new max normal connection limiting properly enforces the limit including automatic outbound, manual outbound, and inbound connections. It also ensures that it not applied to persistent connections.

jholdstock · 2026-06-04T00:53:04Z

+//
+// This function MUST be called with the connection mutex held (writes).
+func (cm *ConnManager) decrementPerHostCount(hostKey string) {
+	cm.perHostCounts[hostKey]--


If this func is called with a non-existing key, that key will be inserted into the map with underflow value 4294967295. https://go.dev/play/p/p8xfciT_vQc

Perhaps check for existence before decrementing?

It would underflow, but it's never called without a valid key and that is a precondition. All callers already check existence. There are test assertions that prove that as well.

To confirm that, it's only called 3 places removePendingInfo, removeActiveConn, and removePersistentEntry. All of which happen under the conn mutex and themselves check existence of the connection in the relevant map first. The associated addPendingInfo, addActiveConn, and addPersistentEntry are the only places the entries are added to the map which also all under happen under the same conn mutex.

So, I don't really think it's worth adding an additional check here due to all of the above.

jholdstock · 2026-06-04T01:09:01Z

+		Listeners:       []net.Listener{listener},
+		MaxNormalConns:  30, // High enough to not interfere with per-host tests.
+		MaxConnsPerHost: maxConnsPerHost,
+		TargetOutbound:  maxConnsPerHost,


Shouldn't TargetOutbound be a number greater than maxConnsPerHost, similar to MaxNormalConns, to ensure the number of conns is definitely being limited by MaxConnsPerHost and not by TargetOutbound?

I don't believe so. It is intentionally limiting the target number of outbounds it tries to automatically make to that limit so the test can ensure they are all allowed. Then the tests attempt to add more to the same host (both whitelisted and not, manual, pending, and persistent) in order to ensure it's actually being limited for those cases.

If it were to try to make more automatic outbound connections, they'd end up repeatedly being rejected (because the address func is only returning addresses for the same host) which would mess up the rest of the tests due to the pending connections and inability to to effectively pause the connections on demand via the trick that forces the addrs returned to be errors forcing the auto outbounds into the retry backoff.

I'm open to suggestions there, but, I was unable to find a better way to effectively pause it without needing to put extra instrumentation in the production code that only exists for the sole purpose of allowing tests to control it.

Similar to the recent total normal connection limiting, the current per-host connection limits are enforced by the server. For similar reasons, it is much more robust and natural to perform the limiting early in the connection manager. With that in mind, this implements the per-host connection limiting in the connection manager and removes the relevant current limiting for it from the server. The limiting is applied to inbound, outbound, and persistent connections. The new limiting is handled early in both the inbound and outbound paths now which allows it to take advantage of fast connection shedding for inbound connections and to preemptively prevent all outbound attempts that would exceed the limit regardless of source. It also provides the flexibility to apply independent special permissions in the future. This also slightly changes the semantics to exempt whitelisted addresses for both inbound and outbound connections as opposed to only inbound connections.

This adds tests to ensure that the new max connections per host limiting properly enforces the limit including automatic outbound, manual outbound, inbound, and persistent connections. It also tests whitelisted addresses are exempt.

davecgh added this to the 2.2.0 milestone May 21, 2026

davecgh changed the title ~~Connmgr maxsamehost~~ connmgr: Limit max connections per host. May 21, 2026

davecgh force-pushed the connmgr_maxsamehost branch from 5434681 to 57a78df Compare May 21, 2026 22:29

davecgh added 2 commits May 21, 2026 23:47

connmgr: Add context-aware semaphore.

a1dcaea

This adds a new context-aware semaphore type with Acquire and Release methods for use in upcoming changes that aim to simplify connection limiting by making use of semaphores for blocking until permits become available.

connmgr: Add semaphore tests.

704eb85

This adds tests for the new context-aware semaphore to ensure the acquire, release, and context cancel semantics work as expected.

davecgh force-pushed the connmgr_maxsamehost branch 2 times, most recently from 1d1f2bf to 8524bd9 Compare May 23, 2026 05:11

davecgh mentioned this pull request May 23, 2026

connmgr: Misc test cleanup and modernization. #3699

Open

davecgh added 11 commits May 24, 2026 17:39

connmgr: Correct shutdown failed conns test.

61122e4

This updates the test for checking the connection manager cleanly shuts down with failed conns to actualy test what it is intended to. Manual connections do not automatically retry, only persistent connections.

connmgr: Add double close tests.

1ef516b

This adds tests to ensure closing a connection multiple times works as intended.

connmgr: Add duplicate conn rejection tests.

a3f4690

This adds tests to ensure duplication connections are rejected for all possible states.

connmgr: Add max persistent conns test.

9e6fe37

This adds tests to ensure attempts to add more than the maximum allowed number of persistent are rejected.

connmgr: Add disconnect by id tests.

354adde

This adds tests to ensure the Disconnect method properly disconnects pending and established connections for both non-persistent and persistent connections.

connmgr: Add remove by id tests.

bd5aaf2

This adds tests to ensure the Remove method properly disconnects and removes pending and established connections for both non-persistent and persistent connections.

connmgr: Update README.md.

3b78fbb

This updates the connmgr package README.md to match the new design and capabilities.

davecgh force-pushed the connmgr_maxsamehost branch from 8524bd9 to 92268d2 Compare May 24, 2026 22:47

davecgh added 8 commits May 25, 2026 03:00

connmgr: Add whitelist detection tests.

78e734f

This adds tests to ensure the new whitelist detection method works as expected.

server: Integrate connmgr whitelisting.

b2f6007

This modifies the server to pass in the parsed whitelist entries to the connection manager config and the relevant code to make use of the new method it exposes. Finally, it removes the no longer used local isWhitelisted method.

connmgr: Update README.md for whitelist support.

dc4cd88

connmgr: Add semaphore try acquire tests.

0b85a38

This adds tests for the new TryAcquire method on the context-aware semaphore to ensure the semantics work as expected.

connmgr: Add total max normal conns tests.

9b30eee

This adds tests to ensure that the new max normal connection limiting properly enforces the limit including automatic outbound, manual outbound, and inbound connections. It also ensures that it not applied to persistent connections.

connmgr: Update README.md for total conn limits.

aedda39

davecgh force-pushed the connmgr_maxsamehost branch from 92268d2 to a295a18 Compare May 30, 2026 07:32

jholdstock reviewed Jun 4, 2026

View reviewed changes

davecgh added 3 commits June 3, 2026 20:36

connmgr: Add max per-host conn tests.

8f9ff58

This adds tests to ensure that the new max connections per host limiting properly enforces the limit including automatic outbound, manual outbound, inbound, and persistent connections. It also tests whitelisted addresses are exempt.

connmgr: Update README.md for per-host conn limits.

a09f8f5

davecgh force-pushed the connmgr_maxsamehost branch from a295a18 to a09f8f5 Compare June 4, 2026 01:50

jholdstock approved these changes Jun 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

connmgr: Limit max connections per host.#3698

connmgr: Limit max connections per host.#3698
davecgh wants to merge 24 commits into
decred:masterfrom
davecgh:connmgr_maxsamehost

davecgh commented May 21, 2026

Uh oh!

Uh oh!

jholdstock Jun 4, 2026

Uh oh!

davecgh Jun 4, 2026 •

edited

Loading

Uh oh!

jholdstock Jun 4, 2026

Uh oh!

davecgh Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

davecgh commented May 21, 2026

Uh oh!

Uh oh!

jholdstock Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

davecgh Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jholdstock Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

davecgh Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

davecgh Jun 4, 2026 •

edited

Loading