Replace fluentd with fluent-bit in the operator #4910
Draft
hjiawei wants to merge 2 commits into
Render the calico-fluent-bit DaemonSet (and its Windows variant) in place of fluentd, migrating the resource identity and wiring up a working fluent-bit configuration.

- Namespace tigera-fluentd -> calico-system; DaemonSet/ServiceAccount fluentd-node -> calico-fluent-bit; TLS secret -> calico-fluent-bit-tls; image ComponentFluentd -> ComponentFluentBit; metrics port 9081 -> 2020 (fluent-bit's built-in HTTP server).
- Config is rendered in fluent-bit's YAML schema into per-OS ConfigMaps (calico-fluent-bit-conf and -windows; a shared name would make the two renders overwrite each other on mixed clusters), subPath-mounted on Linux and directory-mounted on Windows (which cannot mount single files), and started with `-c`. It loads the Go plugins via plugins_file, defines parsers inline, applies the record_transformer Lua filter, and inlines user-provided fluent-bit YAML filter lists. A hash of the rendered config on the pod template rolls the DaemonSet on config-only changes.
- Tail inputs use the producing components' real paths (waf/, runtime-security/report.log, audit/tsee-audit.log, ids/events.log, the compliance.*.reports.log glob, policy/policy_activity.log) with SQLite offsets and filesystem buffering under /var/log/calico/calico-fluent-bit. The pos-migrator init container (Linux and Windows) seeds offsets from the fluentd .pos files and pre-creates the tailed directories so glob inputs don't error while a feature's log dir is absent. Windows tails the fluentd-windows types (flows, audit.tsee, audit.kube) against the C:\fluent-bit image layout.
- The linseed output matches only Linseed-bound tags (match_regex; IDS events and compliance reports are not Linseed-bound), posts with ca_file/cert_file/key_file (Go proxy plugins reject the native tls.* namespace) and the in-cluster ServiceAccount token, and retries without limit against the bounded filesystem buffer.
- S3, Splunk and Syslog outputs mirror fluentd's per-type fan-out: standard AWS credential env vars, endpoint scheme honored, and syslog packs the whole record as JSON via a per-output Lua processor with TLS actually enabled (mode alone only selects framing) and the trusted-bundle CA when a user syslog certificate is configured.
- NonClusterHost renders the :9880 http input with client-certificate verification (voltron presents its internal certificate, matching fluentd's client_cert_auth), and the input Service is cleaned up when the resource is removed.
- eks-log-forwarder runs the fluent-bit image with a rendered in_eks -> linseed pipeline and health probes; the fluentd-era startup init container is gone (the plugin resolves its resume point from Linseed) and FetchInterval maps to EKS_CLOUDWATCH_POLL_INTERVAL.
- Health probes hit :2020/api/v1/health (health_check on). The ServiceMonitor scrapes plain HTTP (fluent-bit's monitoring server has no TLS, unlike fluentd's mTLS exporter) with access restricted by the component NetworkPolicy; legacy fluentd monitors are removed.
- The LogCollector controller no longer creates or owns the calico-system namespace (deleting the LogCollector must not garbage-collect it), the deprecated fluentdDaemonSet override is honored as an alias with container-name translation, deepcopy and the embedded LogCollector CRD are regenerated, and the legacy tigera-fluentd resources (the namespace last) are cleaned up idempotently on every reconcile.
- API: CalicoFluentBitDaemonSet added (FluentdDaemonSet deprecated); golden policy fixtures and enterprise_versions.yml updated.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
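The config-hash rollout described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: the annotation key `hash.example.invalid/fluent-bit-config` and the helper names `configHash` and `annotatePodTemplate` are hypothetical, and SHA-256 is an assumed choice of digest.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// configHashAnnotation is a hypothetical annotation key used only in this sketch.
const configHashAnnotation = "hash.example.invalid/fluent-bit-config"

// configHash returns the hex-encoded SHA-256 digest of the rendered config.
func configHash(rendered string) string {
	sum := sha256.Sum256([]byte(rendered))
	return hex.EncodeToString(sum[:])
}

// annotatePodTemplate returns a copy of the pod-template annotations with the
// config hash set. Because the hash lives on the pod template, a change to the
// rendered config changes the template and triggers a rolling update.
func annotatePodTemplate(annotations map[string]string, rendered string) map[string]string {
	out := make(map[string]string, len(annotations)+1)
	for k, v := range annotations {
		out[k] = v
	}
	out[configHashAnnotation] = configHash(rendered)
	return out
}

func main() {
	before := annotatePodTemplate(nil, "service:\n  http_port: 2020\n")
	after := annotatePodTemplate(before, "service:\n  http_port: 2021\n")
	// A config-only change yields a different annotation value.
	fmt.Println(before[configHashAnnotation] != after[configHashAnnotation])
}
```

Without such an annotation, editing only the ConfigMap would leave the pod template unchanged, so running pods would keep the configuration they started with.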
b8edebb to 98ab545
Replace the out_linseed Go proxy output with one built-in http output block per Linseed-bound tag, for the Linux and Windows daemonsets and the eks-log-forwarder.

The http output is C compiled into fluent-bit: `format json_lines` with the date key disabled produces exactly the NDJSON body Linseed's bulk APIs expect, native tls.* carries the mTLS client keypair (with hostname verification enabled, since fluent-bit's tls.verify alone only checks the chain), and bearer_token_file (re-read per request) carries the ServiceAccount or managed-cluster token. Multi-tenant clusters send the x-tenant-id header.

The Windows image runs no Go code at all, so the Windows config no longer loads a plugins file; the Linux configs keep it only for the in_eks EKS CloudWatch input, which stays a Go plugin and now feeds the http output instead of out_linseed.

The optional EksCloudwatchLog streamPrefix/fetchInterval settings are omitted from the environment when unset as defense in depth (the logcollector controller defaults them before render, but an empty prefix or zero interval reaching the plugin would override its own defaults with settings that match every stream / disable polling).

Per-tag filesystem retry backlogs replace the single shared cap: flows keeps 500M, low-volume tags get 100M each.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
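The "omit when unset" behaviour for the optional EKS settings can be sketched as below. This is an illustration under assumptions rather than the rendered code: `envVar` and `eksEnv` are hypothetical names, the variable name `EKS_CLOUDWATCH_STREAM_PREFIX` is invented for the example (only `EKS_CLOUDWATCH_POLL_INTERVAL` appears in this PR), and the interval is assumed to be passed through as a plain integer.

```go
package main

import (
	"fmt"
	"strconv"
)

// envVar is a minimal stand-in for a container environment variable.
type envVar struct {
	Name  string
	Value string
}

// eksEnv builds the optional environment for the EKS log forwarder.
// Unset values (empty prefix, zero or negative interval) are left out entirely
// so the plugin falls back to its own defaults instead of receiving a value
// that would match every stream or disable polling.
func eksEnv(streamPrefix string, fetchInterval int32) []envVar {
	var env []envVar
	if streamPrefix != "" {
		// Hypothetical variable name, used only in this sketch.
		env = append(env, envVar{Name: "EKS_CLOUDWATCH_STREAM_PREFIX", Value: streamPrefix})
	}
	if fetchInterval > 0 {
		env = append(env, envVar{
			Name:  "EKS_CLOUDWATCH_POLL_INTERVAL",
			Value: strconv.Itoa(int(fetchInterval)),
		})
	}
	return env
}

func main() {
	fmt.Println(len(eksEnv("", 0)), len(eksEnv("kube-apiserver-audit-", 60)))
}
```

Emitting nothing for an unset field, rather than an empty string, keeps the two states distinguishable at the plugin even if controller-side defaulting is ever bypassed.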
e121ac4 to c81ad06
Description
New feature (Calico Enterprise): the LogCollector controller now deploys fluent-bit (`calico-fluent-bit` in `calico-system`) in place of fluentd, completing the log-collector migration on the operator side.

- Namespace `tigera-fluentd` → `calico-system`; DaemonSet/ServiceAccount `fluentd-node` → `calico-fluent-bit`; TLS secret → `calico-fluent-bit-tls`; metrics port 9081 → 2020 (fluent-bit's built-in HTTP server).
- Config is rendered into per-OS ConfigMaps (`calico-fluent-bit-conf` / `-windows`), subPath-mounted on Linux and directory-mounted on Windows, started with `-c`. The render loads the Go plugins via `plugins_file`, defines parsers inline, applies the record-transform Lua filter, and inlines user-provided fluent-bit YAML filter lists. A rendered-config hash annotation rolls the pods on config-only changes.
- Tail inputs use SQLite offsets and filesystem buffering under `/var/log/calico/calico-fluent-bit`; a `pos-migrator` init container (Linux and Windows) seeds offsets from the legacy fluentd `.pos` files and pre-creates the tailed directories. Windows tails the same log types the fluentd Windows variant shipped.
- NonClusterHost renders the `:9880` HTTP input with client-certificate verification; the input Service is cleaned up when the resource is removed.
- `eks-log-forwarder` runs the fluent-bit image with a rendered `in_eks` → `linseed` pipeline and health probes (no startup init container; the input plugin resolves its resume point from Linseed).
- Health probes hit `:2020/api/v1/health`; the ServiceMonitor scrapes plain HTTP (fluent-bit's monitoring server has no TLS) with access restricted by the component NetworkPolicy, and legacy fluentd monitors are deleted.
- The LogCollector controller no longer creates or owns the `calico-system` namespace (deleting the LogCollector must not garbage-collect it); the deprecated `fluentdDaemonSet` override is honored as an alias of the new `calicoFluentBitDaemonSet` field (with container-name translation); legacy `tigera-fluentd` resources (the namespace last) are cleaned up idempotently.

Testing: render, controller, and monitor unit suites updated/extended (ConfigMap-content assertions replacing the env-var assertions); the rendered configuration was validated against the real fluent-bit binary; the full migration was validated end-to-end on a test cluster: all log types flowing to Linseed/Elasticsearch, fluentd resources fully removed, tail-offset handover without re-shipping, NonClusterHost ingestion with client-certificate enforcement, and EKS/Windows render shapes verified.
Release Note
For PR author

- `make gen-files`
- `make gen-versions`

For PR reviewers

A note for code reviewers - all pull requests must have the following:

- `kind/bug` if this is a bugfix.
- `kind/enhancement` if this is a new feature.
- `enterprise` if this PR applies to Calico Enterprise only.