feat(new transform): add a drain log clustering transform #25619

Draft
srstrickland wants to merge 1 commit into vectordotdev:master from srstrickland:feat/drain-log-processor

@srstrickland

Contributor

Summary

Adds a new drain transform that runs the Drain log parsing algorithm on each event, derives a template string (e.g. user <*> logged in from <*>), and writes it to a configurable field. Mirrors the OpenTelemetry Collector drain processor, including the seed_templates / seed_logs / warmup_min_clusters knobs for stable templates across deployments.
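To illustrate how a template such as user <*> logged in from <*> can arise, here is a minimal sketch of Drain's token-merge step. It assumes whitespace tokenization and only merges lines with the same token count. It omits the fixed-depth prefix tree and the similarity threshold that the full algorithm uses to pick a cluster. All function names are hypothetical and are not the lib/drain-log API.

```rust
/// Wildcard token used in Drain templates.
const WILDCARD: &str = "<*>";

/// Split a log line on whitespace (the simplest possible tokenizer).
fn tokenize(line: &str) -> Vec<&str> {
    line.split_whitespace().collect()
}

/// Fraction of positions where template and line carry the same token.
/// Returns 0.0 when the token counts differ.
fn similarity(template: &[String], tokens: &[&str]) -> f64 {
    if template.len() != tokens.len() || template.is_empty() {
        return 0.0;
    }
    let same = template
        .iter()
        .zip(tokens)
        .filter(|(t, tok)| t.as_str() == **tok)
        .count();
    same as f64 / template.len() as f64
}

/// Merge a line into a template: positions that differ become wildcards.
fn merge_template(template: &[String], tokens: &[&str]) -> Vec<String> {
    template
        .iter()
        .zip(tokens)
        .map(|(t, tok)| {
            if t.as_str() == *tok {
                t.clone()
            } else {
                WILDCARD.to_string()
            }
        })
        .collect()
}

/// Fold several lines into one template, skipping lines whose token
/// count differs from the first line (a simplification for this sketch).
fn derive_template(lines: &[&str]) -> String {
    let mut template: Vec<String> = Vec::new();
    for line in lines {
        let tokens = tokenize(line);
        if template.is_empty() {
            template = tokens.iter().map(|t| t.to_string()).collect();
        } else if template.len() == tokens.len() {
            template = merge_template(&template, &tokens);
        }
    }
    template.join(" ")
}
```

Feeding the two lines "user alice logged in from 10.0.0.1" and "user bob logged in from 10.0.0.2" to derive_template yields the template quoted in the summary.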

The transform is built on a new internal crate at lib/drain-log, vendored from the drain3 crate (Apache-2.0, akshatagarwl/drain3) with one substantive addition: true LRU eviction when max_clusters is hit. The upstream crate refuses new clusters at the cap; the OpenTelemetry processor (and the original logpai/Drain3 Python implementation) instead evicts the least-recently-used cluster so the matcher keeps adapting to drifting log vocabularies on long-running streams. The OpenObserve XDrain post calls this out as essential for production streams ("LRU is mandatory"), so the vendored copy implements it via an intrusive doubly-linked list threaded through Cluster — O(1) touch on match, O(1) eviction, no interior mutability, freed cluster ids recycled so the slot vector stays bounded.
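The eviction scheme described above can be sketched roughly as follows. This is an illustration under assumptions, not the vendored code: the type and method names (LruClusters, touch, insert, get) are hypothetical, and the list is threaded through a slot vector using indices rather than pointers, which is one common way to get an intrusive doubly-linked list without interior mutability.

```rust
/// Sentinel index meaning "no neighbour".
const NIL: usize = usize::MAX;

/// One cluster slot; `prev`/`next` thread the recency list through the slots.
struct Cluster {
    template: String,
    prev: usize,
    next: usize,
    live: bool,
}

/// Bounded cluster store with LRU eviction (hypothetical sketch).
struct LruClusters {
    slots: Vec<Cluster>,
    free: Vec<usize>,    // ids of evicted slots, available for reuse
    head: usize,         // most recently used
    tail: usize,         // least recently used
    len: usize,          // number of live clusters
    max_clusters: usize, // 0 means unlimited
}

impl LruClusters {
    fn new(max_clusters: usize) -> Self {
        Self { slots: Vec::new(), free: Vec::new(), head: NIL, tail: NIL, len: 0, max_clusters }
    }

    /// Detach `id` from the recency list in O(1).
    fn unlink(&mut self, id: usize) {
        let (prev, next) = (self.slots[id].prev, self.slots[id].next);
        if prev != NIL { self.slots[prev].next = next; } else { self.head = next; }
        if next != NIL { self.slots[next].prev = prev; } else { self.tail = prev; }
    }

    /// Attach `id` at the most-recently-used end in O(1).
    fn push_front(&mut self, id: usize) {
        self.slots[id].prev = NIL;
        self.slots[id].next = self.head;
        if self.head != NIL { self.slots[self.head].prev = id; } else { self.tail = id; }
        self.head = id;
    }

    /// Mark a cluster as just matched.
    fn touch(&mut self, id: usize) {
        if self.head != id {
            self.unlink(id);
            self.push_front(id);
        }
    }

    /// Add a cluster, evicting the least recently used one at the cap.
    /// Returns the new id and the evicted id, if any.
    fn insert(&mut self, template: &str) -> (usize, Option<usize>) {
        let mut evicted = None;
        if self.max_clusters != 0 && self.len == self.max_clusters {
            let victim = self.tail;
            self.unlink(victim);
            self.slots[victim].live = false;
            self.free.push(victim);
            self.len -= 1;
            evicted = Some(victim);
        }
        let cluster = Cluster { template: template.to_string(), prev: NIL, next: NIL, live: true };
        // Reuse a freed id when possible so the slot vector stays bounded.
        let id = match self.free.pop() {
            Some(id) => { self.slots[id] = cluster; id }
            None => { self.slots.push(cluster); self.slots.len() - 1 }
        };
        self.push_front(id);
        self.len += 1;
        (id, evicted)
    }

    /// Template of a live cluster, if `id` refers to one.
    fn get(&self, id: usize) -> Option<&str> {
        self.slots.get(id).filter(|c| c.live).map(|c| c.template.as_str())
    }
}
```

With max_clusters set to 2, inserting "a" and "b", touching "a", and then inserting "c" evicts "b" and hands its id to "c", so the slot vector never grows past two entries.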

Notable design choices:

  • The transform is task-style (stateful, single-threaded per instance). The matcher is Send, suitable for the per-transform task model.
  • template_field defaults to drain_template rather than the OTel log.record.template name, since dots are nested paths in Vector; the OTel-style flat name is reachable via the quoted path "log.record.template" and called out in the field docs.
  • Persistence (snapshot to a storage extension) is deliberately out of scope for v1 — Vector has no equivalent of OTel's storage extension abstraction, and seeding plus warmup cover most of the cross-restart stability the OTel processor's snapshot mode targets.

Tests: 6 LRU tests in drain-log covering eviction policy, id reuse, DLL ordering invariants under interleaved touches, and a hot-clusters-survive-churn stress test; 5 transform-level tests covering annotation, warmup suppression, missing-field pass-through, a custom template field, and generate_config.

Vector configuration

sources:
  generator:
    type: demo_logs
    format: syslog
    interval: 0.2

transforms:
  cluster:
    type: drain
    inputs: [generator]
    # All other knobs default — see field docs in
    # src/transforms/drain.rs for the full surface.
    max_clusters: 5000

sinks:
  console:
    type: console
    inputs: [cluster]
    encoding:
      codec: json

Each emitted event carries an additional drain_template field with the
Drain-derived template string (e.g. <*> <*> <*> systemd[<*>]: Started <*>).
Watch the templates stabilise across a few seconds of input as the matcher
converges.

How did you test this PR?

Unit tests (cargo test, run on Linux x86_64):

  • lib/drain-log — 6 tests covering the LRU eviction policy: live cluster count, oldest-on-cap eviction, freed-id
    recycling, ordering invariants under interleaved touches, hot-clusters-survive-churn stress, and the unlimited
    (max_clusters = 0) path.
  • src/transforms/drain — 5 tests covering basic annotation, warmup_min_clusters suppression, missing-field passthrough,
    a custom template_field, and generate_config.

All tests pass. cargo clippy --no-default-features --features transforms-drain -- -D warnings is clean.

Manual run against demo_logs (syslog + apache_common + apache_error formats simultaneously) for several minutes,
observing template convergence on stdout via the console sink. Verified that:

  • Templates stabilise within a few seconds (e.g. <*> - - - "GET <*> HTTP/1.0" <*> <*> for Apache common).
  • warmup_min_clusters correctly suppresses annotation until the configured number of distinct clusters is observed.
  • Setting template_field to the quoted path "log.record.template" writes the OpenTelemetry-style flat dotted attribute
    name.
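The warmup behaviour checked above can be pictured with a small sketch. It makes two assumptions that the PR text does not confirm: that annotation starts once the live cluster count reaches warmup_min_clusters (a >= comparison), and that an event is a flat string map. The flat map sidesteps Vector's nested-path handling, so a dotted name is simply a key here. The Event alias and the annotate function are hypothetical.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a log event: a flat map of string fields.
type Event = BTreeMap<String, String>;

/// Write `template` into `template_field` unless the matcher is still
/// warming up. Returns true when the event was annotated.
fn annotate(
    event: &mut Event,
    template: &str,
    live_clusters: usize,
    warmup_min_clusters: usize,
    template_field: &str,
) -> bool {
    // Assumed rule: suppress annotation until enough clusters exist.
    if live_clusters < warmup_min_clusters {
        return false;
    }
    event.insert(template_field.to_string(), template.to_string());
    true
}
```

An event that arrives while only two of three required clusters exist passes through unchanged; once the third cluster appears, later events gain the template field.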

Build: cargo build --bin vector succeeds with the new feature in the default transforms-logs set.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

github-actions (bot) added the "domain: transforms" label (Anything related to Vector's transform components) on Jun 12, 2026.