docs: audit discovery service page #501

Open
Iheanacho-ai wants to merge 1 commit into siderolabs:main from Iheanacho-ai:discovery

@Iheanacho-ai
Member

fixes #439

Signed-off-by: Amarachi Iheanacho <amarachi.iheanacho@siderolabs.com>

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-project-automation github-project-automation Bot moved this to To Do in Planning Apr 20, 2026
@talos-bot talos-bot moved this from To Do to In Review in Planning Apr 20, 2026

> Note: Talos can operate with the Discovery Service disabled, but some features then rely on Kubernetes API availability to discover
> controlplane endpoints, so if a failure occurs, having the Discovery Service disabled makes troubleshooting much harder.
Each node submits its own data plus the endpoints it observes from other peers. The discovery service aggregates this, deduplicates endpoints, and distributes updates to all connected peers. Peers decrypt the data locally and use it to drive cluster discovery and [KubeSpan](../../networking/kubespan).
Member

I would add KubePrism too.
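
To make the aggregation step in the hunk above concrete, here is a minimal Python sketch of "submit, aggregate, deduplicate". It is a toy model under stated assumptions, not the discovery service's actual implementation: the function name `aggregate_endpoints`, the node IDs, the addresses, and the plaintext data shapes are all hypothetical, and the sketch ignores the client-side encryption the page describes.

```python
from collections import defaultdict


def aggregate_endpoints(submissions):
    """Merge per-node submissions into one deduplicated endpoint list per affiliate.

    `submissions` (hypothetical shape) maps a submitting node ID to a dict of
    {affiliate_id: [endpoint, ...]} covering the node itself and the peers it observes.
    """
    merged = defaultdict(list)
    for observed in submissions.values():
        for affiliate_id, endpoints in observed.items():
            for endpoint in endpoints:
                # Deduplicate while keeping first-seen order.
                if endpoint not in merged[affiliate_id]:
                    merged[affiliate_id].append(endpoint)
    return dict(merged)


# Illustrative data only: node-a reports itself and what it sees of node-b;
# node-b reports itself with one extra endpoint.
submissions = {
    "node-a": {"node-a": ["10.0.0.1:51820"], "node-b": ["10.0.0.2:51820"]},
    "node-b": {"node-b": ["10.0.0.2:51820", "192.0.2.2:51820"]},
}
snapshot = aggregate_endpoints(submissions)
```

In this model, `snapshot` is what every connected peer would receive: one entry per affiliate, with the endpoint reported by both nodes listed only once.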

The discovery service doesn’t see actual node information – it only stores and updates encrypted blobs.
Discovery data is encrypted/decrypted by the clients – the cluster members.
The discovery service does not have the encryption key.
- [KubeSpan](../../networking/kubespan) and KubePrism require discovery and do not function correctly without it.
Member

They do keep functioning from the local cache for a period of time. I think it's 20 minutes. Spencer's team would know more details. This is needed because of downtime and upgrades.

If a node reboots while the discovery service is unavailable, it loses all in-memory state and cannot publish its information or retrieve peer data until the service becomes available again.

If the outage exceeds the TTL, all discovery records expire. When the discovery service comes back online, it may return an empty dataset. Nodes receiving this update drop their existing peer information, which can temporarily disrupt KubeSpan connectivity.
If the outage exceeds the TTL, all discovery records expire. When the service comes back online, it may return an empty dataset. Nodes receiving this update drop their existing peer information, which can temporarily disrupt KubeSpan connectivity. Recovery is automatic: nodes republish their data, peer information is rebuilt, and connectivity is restored without manual intervention.
Member

We need to state whether the TTL is hard-coded or configurable.
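
The expiry behavior described in the hunk above can be sketched as a toy registry whose records lapse unless refreshed. This is an illustration only, not how the discovery service is implemented: the class `TTLRegistry`, its methods, and the 600-second TTL are hypothetical, and the sketch takes no position on whether the real TTL is hard-coded or configurable.

```python
class TTLRegistry:
    """Toy registry whose records expire unless republished within `ttl_seconds`."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._expires_at = {}  # affiliate_id -> expiry timestamp (seconds)

    def publish(self, affiliate_id, now):
        # Publishing (or republishing) resets the record's expiry.
        self._expires_at[affiliate_id] = now + self.ttl_seconds

    def list_affiliates(self, now):
        # Only records that have not yet expired are returned.
        return sorted(a for a, exp in self._expires_at.items() if exp > now)


registry = TTLRegistry(ttl_seconds=600)  # hypothetical TTL, for illustration only
registry.publish("node-a", now=0)
registry.publish("node-b", now=0)

before_expiry = registry.list_affiliates(now=300)    # within the TTL
after_outage = registry.list_affiliates(now=900)     # nobody republished in time
registry.publish("node-a", now=900)                  # node-a republishes after the outage
after_republish = registry.list_affiliates(now=901)
```

Here `after_outage` is empty, which models the "empty dataset" case, and `after_republish` contains only the node that has republished so far, which models the gradual automatic recovery.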

### Affiliates

#### Affiliates
An affiliate is a proposed cluster member, a node that shares the same cluster ID and secret. Use this resource to see what nodes the discovery registries are aware of:
Member

We should mention how the cluster ID is derived. I know it's generated from the cluster PKI (not exactly sure how), which is why it's important to have a unique PKI for each cluster.


Talos Linux includes node-discovery capabilities that depend on a discovery registry.
This allows you to see the members of your cluster, and the associated IP addresses of the nodes.
The Talos Linux discovery service enables nodes in a cluster to find and identify each other automatically. Without discovery, nodes have no built-in way to learn about other cluster members, their IP addresses, or their connection endpoints. With discovery enabled, this information is shared and kept up to date across all nodes, which is what allows Talos to form a cluster and, when enabled, establish encrypted [KubeSpan](../../networking/kubespan) tunnels between nodes.
Member

I would point out earlier on this page that we run it as a service and that users can self-host it with a license. I'd also mention at the top that it's important for KubePrism too.

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

audit and rewrite discovery service page

3 participants