EV-6666: Surface Alertmanager alerts on the manager Alerts page #4879

Draft
rene-dekker wants to merge 5 commits into tigera:master from rene-dekker:EV-6666

@rene-dekker

Member

Wires Prometheus/Alertmanager alerts through to the manager Alerts page, with a toggle to enable/disable the integration.

  • Allow Alertmanager ingress to Linseed + RBAC to create Linseed events
  • Point the Alertmanager webhook at Linseed's /api/v1/events/alertmanager
  • Add Monitor.spec.uiAlertsIntegration (Enabled|Disabled, default Enabled). When disabled, the rendered Alertmanager config routes to a null receiver. The operator regenerates the config secret when it owns it, so toggling takes effect at runtime.
  • Annotate the Alertmanager pod with a hash of its config so config changes roll the pod and reload the new config.
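
For readers skimming the toggle semantics, here is a minimal sketch of how "default Enabled" could be evaluated. It is not the code in this PR: `MonitorSpec` is a trimmed stand-in, and modelling the field as a pointer (so that "unset" is distinguishable from an explicit value) is an assumption. Only the enum values and the `UIAlertsEnabled` name come from the change itself.

```go
package main

import "fmt"

// UIAlertsIntegrationStatusType mirrors the enum name used in this PR.
type UIAlertsIntegrationStatusType string

const (
	UIAlertsIntegrationEnabled  UIAlertsIntegrationStatusType = "Enabled"
	UIAlertsIntegrationDisabled UIAlertsIntegrationStatusType = "Disabled"
)

// MonitorSpec is a trimmed, hypothetical stand-in for the real spec type.
type MonitorSpec struct {
	// Assumption: a pointer, so an unset field can be told apart from a value.
	UIAlertsIntegration *UIAlertsIntegrationStatusType
}

// UIAlertsEnabled reports whether alerts should be forwarded to Linseed.
// An unset field falls back to the documented default, Enabled.
func (s MonitorSpec) UIAlertsEnabled() bool {
	return s.UIAlertsIntegration == nil || *s.UIAlertsIntegration == UIAlertsIntegrationEnabled
}

func main() {
	disabled := UIAlertsIntegrationDisabled
	fmt.Println(MonitorSpec{}.UIAlertsEnabled())                               // true
	fmt.Println(MonitorSpec{UIAlertsIntegration: &disabled}.UIAlertsEnabled()) // false
}
```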

Companion PRs: calico-private (Linseed ingest + dedup), ui-modules (Alerts page toggle + prometheus_alert rendering).

Added Monitor.spec.uiAlertsIntegration to enable/disable surfacing Prometheus alerts on the Calico Enterprise manager Alerts page.

🤖 Generated with Claude Code

@CLAassistant

CLAassistant commented Jun 2, 2026

CLA assistant check
All committers have signed the CLA.

rene-dekker and others added 4 commits June 9, 2026 14:49

Add a Linseed network policy ingress rule permitting traffic from the
Alertmanager pods in the tigera-prometheus namespace, so Alertmanager can
push Prometheus alerts to Linseed as events. The Alertmanager egress policy
already allows all TCP egress, so only the Linseed ingress side was missing.

Exports monitor.AlertmanagerSourceEntityRule as the single source of truth
for the Alertmanager pod selector.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a ClusterRole granting create on events (linseed.tigera.io), bound to
the prometheus service account that Alertmanager runs as. Linseed authorizes
writes via SubjectAccessReview, so this lets Alertmanager push Prometheus
alerts to Linseed as events using its existing service account token. The
role/binding are rendered only when Alertmanager is enabled and removed
otherwise.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the placeholder Alertmanager webhook receiver with one that posts to
Linseed's /api/v1/events/alertmanager endpoint, so Prometheus alerts surface on
the Alerts UI page. Linseed requires mTLS plus a bearer token, so the
Alertmanager spec now mounts the prometheus client TLS key pair and the trusted
CA bundle, and the webhook http_config references them along with the
service account token.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a UIAlertsIntegration (Enabled|Disabled) field to the Monitor spec that
controls whether Prometheus/Alertmanager alerts are forwarded to Linseed and
surfaced on the manager Alerts page (defaults to Enabled).

When disabled, the operator renders an Alertmanager config that routes to a
null receiver instead of the Linseed webhook. The config secret is regenerated
to the selected variant when the operator owns it, so the toggle takes effect
at runtime. A hash of the Alertmanager config is added as a pod annotation so
that config changes roll the Alertmanager pod and reload the new config.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
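
The config-hash annotation described in the commit above can be illustrated with a small stdlib-only sketch. This is an assumption-laden example, not the operator's implementation: the annotation key, `configHash`, and `withConfigHash` are hypothetical names, and SHA-256 with length-prefixed inputs is just one reasonable choice.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Hypothetical annotation key, for illustration only.
const configHashAnnotation = "example.tigera.io/alertmanager-config-hash"

// configHash returns a stable SHA-256 digest over the given inputs.
// Each part is length-prefixed so that ("ab", "c") and ("a", "bc") differ.
func configHash(parts ...[]byte) string {
	h := sha256.New()
	for _, p := range parts {
		fmt.Fprintf(h, "%d:", len(p))
		h.Write(p)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// withConfigHash returns a copy of the annotations with the hash set,
// leaving the caller's map untouched.
func withConfigHash(annotations map[string]string, parts ...[]byte) map[string]string {
	out := make(map[string]string, len(annotations)+1)
	for k, v := range annotations {
		out[k] = v
	}
	out[configHashAnnotation] = configHash(parts...)
	return out
}

func main() {
	before := withConfigHash(nil, []byte("receiver: linseed"))
	after := withConfigHash(nil, []byte("receiver: null"))
	// A changed annotation alters the pod template, which triggers a rollout.
	fmt.Println(before[configHashAnnotation] != after[configHashAnnotation]) // true
}
```

Any input that should roll the pod has to be passed to the hash. Inputs left out only take effect if something else, such as a config reloader, picks them up.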
@rene-dekker force-pushed the EV-6666 branch 2 times, most recently from 099c002 to 044b64c on June 10, 2026 18:53

Replace the raw alertmanager.yaml config secret with an AlertmanagerConfig custom
resource referenced by Alertmanager.spec.alertmanagerConfiguration:

- If the user supplies an AlertmanagerConfig named calico-node-alertmanager in the
  tigera-operator namespace, the operator renders a copy of it in tigera-prometheus.
  Otherwise it renders a default: the Linseed webhook receiver when the UI alerts
  integration is enabled, or a null receiver when disabled.
- The webhook authenticates to Linseed with the Linseed-issued bearer token secret
  for the prometheus service account (prometheus-tigera-linseed-token) and the
  client cert / CA bundle, all referenced from the CR; the prometheus-operator
  mounts them into the Alertmanager pod, so the explicit Secrets/ConfigMaps mounts
  are removed.
- The pod is annotated with a hash of the AlertmanagerConfig spec, client cert and
  CA bundle so any config change rolls the pod.
- The legacy alertmanager-calico-node-alertmanager config secret is now deleted.

This also fixes the upgrade gap where a pre-existing (stock) config secret was left
untouched because it matched neither operator default, so the integration never wired up.
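
The precedence described in this commit (user-supplied config first, otherwise a default chosen by the toggle) can be sketched as below. The struct is a simplified stand-in for the prometheus-operator CR, and `resolveConfig` and the receiver names `linseed-webhook` and `null` are hypothetical. Only the resource name and the two namespaces are taken from the commit message.

```go
package main

import "fmt"

const (
	configName        = "calico-node-alertmanager"
	operatorNamespace = "tigera-operator"
	renderNamespace   = "tigera-prometheus"
)

// AlertmanagerConfig is a simplified stand-in for the real custom resource.
type AlertmanagerConfig struct {
	Name      string
	Namespace string
	Receiver  string // hypothetical single-receiver model
}

// resolveConfig copies a user-supplied config into the render namespace, or
// falls back to an operator default selected by the UI alerts toggle.
func resolveConfig(user *AlertmanagerConfig, uiAlertsEnabled bool) AlertmanagerConfig {
	if user != nil {
		c := *user // copy, so the user's object is not mutated
		c.Namespace = renderNamespace
		return c
	}
	receiver := "null"
	if uiAlertsEnabled {
		receiver = "linseed-webhook"
	}
	return AlertmanagerConfig{Name: configName, Namespace: renderNamespace, Receiver: receiver}
}

func main() {
	user := &AlertmanagerConfig{Name: configName, Namespace: operatorNamespace, Receiver: "pagerduty"}
	fmt.Println(resolveConfig(user, false).Receiver) // pagerduty
	fmt.Println(resolveConfig(nil, true).Receiver)   // linseed-webhook
	fmt.Println(resolveConfig(nil, false).Receiver)  // null
}
```

Note that in this sketch the toggle has no effect once a user config is present, which matches the "copy verbatim" behaviour described above.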

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 044b64c)

@electricjesus electricjesus left a comment

Member

Drive-by review, courtesy of a quest from Tigera Town 🤣. Mostly looks good. One thing I think blocks merge, plus a few mediums, left inline.

Cross-PR ordering: these two have to ship together, and this PR can't stand alone. The operator's own RBAC for alertmanagerconfigs lives in the calico-private charts in tigera/calico-private#12184, so if this vendors into the operator ahead of that, the monitor controller can't create the AlertmanagerConfig and goes degraded. Same on the receiving end: without #12184 the /api/v1/events/alertmanager endpoint 404s and Linseed rejects the prometheus_alert type. Worth pinning both to the same release and noting the dependency on each PR while they're still draft.

One nice-to-have I noticed but won't block on: the config-hash annotation that rolls the pod doesn't include the token secret data, so the pod won't roll when Kubernetes first populates the token. It relies on the config-reloader watching the mounted secret. Probably fine, worth a sanity check.

// monitor.AlertmanagerConfigName), the operator renders a copy of it in tigera-prometheus.
// Otherwise it renders the operator's default config (the Linseed webhook receiver when the UI
// alerts integration is enabled, or a null receiver when disabled).
func (r *ReconcileMonitor) readAlertmanagerConfig(ctx context.Context, uiAlertsEnabled bool) (*monitoringv1alpha1.AlertmanagerConfig, error) {

Member

This drops existing customer Alertmanager config on upgrade, and I don't think we can ship it that way.

Today the customization path is the raw alertmanager-calico-node-alertmanager secret. It's documented, and the old readAlertmanagerConfigSecret carried all that owner-ref logic precisely to leave a user-modified secret alone. This PR deletes that secret and only reads config from an AlertmanagerConfig CR the customer has never created. So on upgrade, anyone who set their own receivers (PagerDuty, Slack, email) loses them and falls back to the default Linseed webhook. Their external paging stops and the alerts quietly reroute to the manager UI instead.

Options, in order of how much I'd trust them:

  • Migrate: if the legacy secret exists and differs from the old default, parse it and seed the AlertmanagerConfig before deleting the secret.
  • Failing that, detect a non-default legacy secret and SetDegraded with a clear message instead of silently replacing it, so the upgrade isn't invisible.

Either way the release note has to call this out as a breaking change. Right now it only describes the new feature.
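
To make the two options concrete, here is a rough sketch of the decision as a pure function. Everything in it is hypothetical (`planLegacySecret`, the action names, the `parse` callback), and it assumes the old default config bytes are still available to compare against. It only shows the shape of the branch, not how migration itself would work.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

type legacyAction string

const (
	actionNothing legacyAction = "nothing" // no legacy secret present
	actionDelete  legacyAction = "delete"  // stock config, safe to drop
	actionMigrate legacyAction = "migrate" // seed an AlertmanagerConfig, then delete
	actionDegrade legacyAction = "degrade" // surface the problem, do not replace silently
)

var errUnparseable = errors.New("legacy config could not be parsed")

// planLegacySecret decides what to do with the legacy config secret.
// legacy is nil when the secret does not exist; parse is a caller-supplied
// check that the legacy config can be converted.
func planLegacySecret(legacy, oldDefault []byte, parse func([]byte) error) legacyAction {
	switch {
	case legacy == nil:
		return actionNothing
	case bytes.Equal(legacy, oldDefault):
		return actionDelete
	case parse(legacy) == nil:
		return actionMigrate
	default:
		return actionDegrade
	}
}

func main() {
	oldDefault := []byte("receiver: placeholder")
	custom := []byte("receiver: pagerduty")
	parseOK := func([]byte) error { return nil }
	parseFail := func([]byte) error { return errUnparseable }

	fmt.Println(planLegacySecret(nil, oldDefault, parseOK))        // nothing
	fmt.Println(planLegacySecret(oldDefault, oldDefault, parseOK)) // delete
	fmt.Println(planLegacySecret(custom, oldDefault, parseOK))     // migrate
	fmt.Println(planLegacySecret(custom, oldDefault, parseFail))   // degrade
}
```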


// The Linseed bearer-token secret is only needed when Alertmanager is running and forwarding
// alerts to Linseed (the UI alerts integration is enabled); otherwise remove it.
if mc.alertmanagerReplicas() > 0 && mc.cfg.Monitor.UIAlertsEnabled() {

Member

Two things about the disable toggle when a user brings their own AlertmanagerConfig.

The toggle only swaps the default. If a user has their own AlertmanagerConfig in the operator namespace, uiAlertsIntegration: Disabled does nothing, since we copy their spec verbatim. The field doc says it "controls whether alerts are forwarded to Linseed," which won't hold for those users. Worth documenting the precedence, or deciding whether disable should win regardless.

Separately, the Linseed token secret and the tigera-alertmanager-linseed ClusterRole/Binding get created whenever Alertmanager runs with the integration enabled, even if the user's own config never talks to Linseed. That leaves a token secret and an event-create grant nothing uses. Not harmful, but it's a dangling credential. Could gate those on the default-config path rather than on UIAlertsEnabled alone.
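
The gating suggested in the last paragraph might look roughly like the predicate below. It is a sketch under assumptions: `needsLinseedCredentials` and the `userSuppliedConfig` flag are hypothetical, and the first two conditions simply mirror the existing check on replicas and `UIAlertsEnabled()`.

```go
package main

import "fmt"

// needsLinseedCredentials reports whether the Linseed token secret and the
// event-create ClusterRole/Binding should be rendered. Compared to the current
// check, it also requires that the operator's default config is in use.
func needsLinseedCredentials(replicas int, uiAlertsEnabled, userSuppliedConfig bool) bool {
	return replicas > 0 && uiAlertsEnabled && !userSuppliedConfig
}

func main() {
	fmt.Println(needsLinseedCredentials(3, true, false))  // true: default config, integration on
	fmt.Println(needsLinseedCredentials(3, true, true))   // false: user config, nothing would use the token
	fmt.Println(needsLinseedCredentials(3, false, false)) // false: integration off
	fmt.Println(needsLinseedCredentials(0, true, false))  // false: Alertmanager not running
}
```

The trade-off is that a user config that does post to Linseed would then need its own credentials, so the precedence would have to be documented either way.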

Comment thread api/v1/monitor_types.go
}

// +kubebuilder:validation:Enum=Enabled;Disabled
type UIAlertsIntegrationStatusType string

Member

UIAlertsIntegrationStatusType reads like a status field, but this is a spec enum. It's public API and awkward to rename after release, so I'd fix it now. UIAlertsIntegrationType or UIAlertsIntegrationMode matches what it actually is.


5 participants