test(trace-normalization): add microbenchmarks for tag/metric/truncate normalization by yannham · Pull Request #2124 · DataDog/libdatadog

yannham · 2026-06-16T08:57:50Z

What does this PR do?

Extends the existing criterion bench (libdd-trace-normalization/benches/normalization_utils.rs) to cover the normalization functions that were previously unmeasured, all of which run on every ingested span:

normalize_tag: the heaviest function (per-codepoint UTF-8 scan + char-class state machine). Benched on ASCII fast-path, mixed/illegal-char, unicode (codepoint slow path), and over-length (> MAX_TAG_LEN) inputs.
normalize_metric_name: similar complexity with one-byte lookahead. Clean, separator-collapsing, and over-length cases.
truncate_utf8: over-length ASCII plus a multi-byte (3-byte) input where the limit lands mid-codepoint and forces the boundary walk-back.
normalize_span_start_duration: clean vs needs-clock cases to quantify the SystemTime read on the pre-year-2000 path.
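
For readers unfamiliar with the boundary walk-back that the multi-byte truncate_utf8 case exercises, here is a minimal sketch of the idea. It is not the crate's implementation: `truncate_utf8_sketch`, its signature, and its return type are assumptions made for illustration only.

```rust
/// Hypothetical helper, not the crate's `truncate_utf8`: returns the longest
/// prefix of `s` that is at most `limit` bytes long and ends on a char boundary.
fn truncate_utf8_sketch(s: &str, limit: usize) -> &str {
    if s.len() <= limit {
        return s;
    }
    // `limit` may land inside a multi-byte codepoint. Walk back one byte at a
    // time until we reach a boundary. Index 0 is always a boundary, so the
    // loop terminates.
    let mut end = limit;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

With a 3-byte character such as `€`, `truncate_utf8_sketch("€€€", 4)` lands one byte into the second character, walks back to byte 3, and keeps a single `€`. Pure ASCII input never enters the loop body, which is why the two benched inputs are expected to behave differently.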

A bench-internals cargo feature is added (mirroring libdd-sampling) to expose the otherwise-private benched functions, without changing the shipped public API. The [[bench]] now requires this feature.
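
One possible shape for such a feature-gated visibility split is sketched below. This is an assumption about the pattern, not a copy of the PR's code: the module name `normalize_utils_sketch`, the helper `trim_tag`, and the wrapper `normalize` are all hypothetical.

```rust
// Private module: its items are `pub` inside the crate, but nothing outside
// the crate can reach them unless they are re-exported below.
mod normalize_utils_sketch {
    /// Stand-in for an otherwise-private helper (hypothetical body).
    pub fn trim_tag(s: &str) -> &str {
        s.trim_matches('_')
    }
}

// Compiled only when the `bench-internals` feature is enabled, for example
// with `cargo bench --features bench-internals`. The default build never
// sees this re-export, so the shipped public API stays the same.
#[cfg(feature = "bench-internals")]
pub use normalize_utils_sketch::trim_tag;

/// Always-available public entry point that uses the helper internally.
pub fn normalize(s: &str) -> &str {
    normalize_utils_sketch::trim_tag(s)
}
```

The benefit of gating only the re-export is that the function body is compiled once and identically in both configurations, so the benchmark measures the same code that ships.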

Motivation

Normalization runs on every span, but the existing bench skipped the expensive per-char UTF-8 state machines. These are the functions most likely to show up as a per-span tax, so they are worth tracking.

Additional Notes

Follows the existing bench idioms (same group naming, iter_batched_ref over 1000 owned copies, Throughput(Elements), black_box on inputs).
Net change is small (~190 LoC).
Default (shipped) build is unaffected: the visibility split is gated behind bench-internals.

How to test the change?

cargo check -p libdd-trace-normalization
cargo bench -p libdd-trace-normalization --features bench-internals --no-run
cargo bench -p libdd-trace-normalization --features bench-internals -- --warm-up-time 1 --measurement-time 1 --sample-size 10

🤖 (Partly) Generated with Claude Code

github-actions · 2026-06-16T08:59:33Z

📚 Documentation Check Results

⚠️ 129 documentation warning(s) found

📦 `libdd-trace-normalization` - 129 warning(s)

Updated: 2026-06-16 15:45:06 UTC | Commit: 9bd6659 | missing-docs job results

github-actions · 2026-06-16T09:01:15Z

Clippy Allow Annotation Report

Comparing clippy allow annotations between branches:

Base Branch: origin/main
PR Branch: origin/yannham/microbenches-normalization

Summary by Rule

Rule	Base Branch	PR Branch	Change
unwrap_used	1	1	No change (0%)
Total	1	1	No change (0%)

Annotation Counts by File

File	Base Branch	PR Branch	Change
`libdd-trace-normalization/src/normalize_utils.rs`	1	1	No change (0%)

Annotation Stats by Crate

Crate	Base Branch	PR Branch	Change
`clippy-annotation-reporter`	5	5	No change (0%)
`datadog-ffe-ffi`	1	1	No change (0%)
`datadog-ipc`	21	21	No change (0%)
`datadog-live-debugger`	4	4	No change (0%)
`datadog-live-debugger-ffi`	10	10	No change (0%)
`datadog-profiling-replayer`	4	4	No change (0%)
`datadog-sidecar`	46	46	No change (0%)
`libdd-common`	13	13	No change (0%)
`libdd-common-ffi`	12	12	No change (0%)
`libdd-data-pipeline`	5	5	No change (0%)
`libdd-ddsketch`	2	2	No change (0%)
`libdd-dogstatsd-client`	1	1	No change (0%)
`libdd-profiling`	13	13	No change (0%)
`libdd-remote-config`	3	3	No change (0%)
`libdd-telemetry`	20	20	No change (0%)
`libdd-tinybytes`	4	4	No change (0%)
`libdd-trace-normalization`	2	2	No change (0%)
`libdd-trace-obfuscation`	3	3	No change (0%)
`libdd-trace-stats`	1	1	No change (0%)
`libdd-trace-utils`	12	12	No change (0%)
Total	182	182	No change (0%)

About This Report

This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.

github-actions · 2026-06-16T09:01:39Z

🔒 Cargo Deny Results

⚠️ 1 issue(s) found, showing only errors (advisories, bans, sources)

📦 `libdd-trace-normalization` - 1 error(s)

Show output

error[unsound]: Rand is unsound with a custom logger using `rand::rng()`
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:51:1
   │
51 │ rand 0.8.5 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unsound advisory detected
   │
   ├ ID: RUSTSEC-2026-0097
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0097
   ├ It has been reported (by @lopopolo) that the `rand` library is [unsound](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library) (i.e. that safe code using the public API can cause Undefined Behaviour) when all the following conditions are met:
     
     - The `log` and `thread_rng` features are enabled
     - A [custom logger](https://docs.rs/log/latest/log/#implementing-a-logger) is defined
     - The custom logger accesses `rand::rng()` (previously `rand::thread_rng()`) and calls any `TryRng` (previously `RngCore`) methods on `ThreadRng`
     - The `ThreadRng` (attempts to) reseed while called from the custom logger (this happens every 64 kB of generated data)
     - Trace-level logging is enabled or warn-level logging is enabled and the random source (the `getrandom` crate) is unable to provide a new seed
     
     `TryRng` (previously `RngCore`) methods for `ThreadRng` use `unsafe` code to cast `*mut BlockRng<ReseedingCore>` to `&mut BlockRng<ReseedingCore>`. When all the above conditions are met this results in an aliased mutable reference, violating the Stacked Borrows rules. Miri is able to detect this violation in sample code. Since construction of [aliased mutable references is Undefined Behaviour](https://doc.rust-lang.org/stable/nomicon/references.html), the behaviour of optimized builds is hard to predict.
   ├ Announcement: https://github.com/rust-random/rand/pull/1763
   ├ Solution: Upgrade to >=0.10.1 OR <0.10.0, >=0.9.3 OR <0.9.0, >=0.8.6 (try `cargo update -p rand`)
   ├ rand v0.8.5
     └── (dev) libdd-trace-normalization v2.0.0

advisories FAILED, bans ok, sources ok

Updated: 2026-06-16 15:47:12 UTC | Commit: 9bd6659 | dependency-check job results

datadog-official · 2026-06-16T09:11:52Z

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
• Patch Coverage: 100.00%
• Overall Coverage: 73.41% (+0.34%)

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 302189a

dd-octo-sts · 2026-06-16T09:30:38Z

Artifact Size Benchmark Report

aarch64-alpine-linux-musl

Artifact	Baseline	Commit	Change
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.so	7.70 MB	7.70 MB	0% (0 B) 👌
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.a	83.67 MB	83.67 MB	0% (0 B) 👌

aarch64-unknown-linux-gnu

Artifact	Baseline	Commit	Change
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a	94.77 MB	94.77 MB	0% (0 B) 👌
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so	10.34 MB	10.34 MB	0% (0 B) 👌

libdatadog-x64-windows

Artifact	Baseline	Commit	Change
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll	24.83 MB	24.83 MB	0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib	87.33 KB	87.33 KB	0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb	180.86 MB	180.85 MB	-0% (-8.00 KB) 👌
/libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib	925.02 MB	925.02 MB	+0% (+1.03 KB) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll	8.09 MB	8.09 MB	0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib	87.33 KB	87.33 KB	0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb	23.94 MB	23.94 MB	0% (0 B) 👌
/libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib	47.78 MB	47.78 MB	0% (0 B) 👌

libdatadog-x86-windows

Artifact	Baseline	Commit	Change
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll	21.52 MB	21.52 MB	0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib	88.71 KB	88.71 KB	0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb	184.87 MB	184.87 MB	0% (0 B) 👌
/libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib	917.97 MB	917.98 MB	+0% (+1.03 KB) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll	6.24 MB	6.24 MB	0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib	88.71 KB	88.71 KB	0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb	25.66 MB	25.66 MB	0% (0 B) 👌
/libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib	45.41 MB	45.41 MB	0% (0 B) 👌

x86_64-alpine-linux-musl

Artifact	Baseline	Commit	Change
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.a	74.59 MB	74.59 MB	0% (0 B) 👌
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.so	8.58 MB	8.58 MB	0% (0 B) 👌

x86_64-unknown-linux-gnu

Artifact	Baseline	Commit	Change
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a	90.01 MB	90.01 MB	0% (0 B) 👌
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so	10.44 MB	10.44 MB	0% (0 B) 👌

…e normalization

Extend the existing criterion bench to cover the per-char UTF-8 state machines that run on every ingested span but were previously unmeasured: `normalize_tag` (ASCII / mixed-unicode / over-length), `normalize_metric_name`, `truncate_utf8` (UTF-8 boundary walk-back), and `normalize_span_start_duration` (quantifying the SystemTime read on the year-2000 path). Adds a `bench-internals` feature, mirroring `libdd-sampling`, to expose the otherwise-private `normalize_metric_name`/`truncate_utf8` without changing the shipped public API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Convert the bench from a single-call `b.iter` to the batched `iter_batched_ref` + 1000-element inner loop used by the other benches in this file. The previous form set `throughput(Elements(1000))` and `SamplingMode::Flat` but measured one call per iteration, so the throughput number was meaningless and the ns-scale "clean" path was swamped by timer overhead. The batch is rebuilt in untimed setup because the function mutates its inputs in place: on the year-2000 path the first call rewrites `start` to a recent timestamp, which would make a second call on the same value skip the clock branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
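
To illustrate why the batch has to be rebuilt outside the timed region, here is a std-only sketch of the same measurement pattern. It deliberately does not use criterion. `normalize_start_sketch`, `bench_batched`, the `CUTOFF` constant in Unix seconds, and the fixed `now` value are all hypothetical stand-ins, not the crate's functions or units.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// 2000-01-01T00:00:00Z in Unix seconds (assumed unit, for illustration).
const CUTOFF: i64 = 946_684_800;

/// Hypothetical in-place normalizer: a start value before the cutoff is
/// rewritten to `now` (the "needs-clock" path); anything else is untouched.
/// Returns true when the rewrite happened.
fn normalize_start_sketch(start: &mut i64, now: i64) -> bool {
    if *start < CUTOFF {
        *start = now;
        true
    } else {
        false
    }
}

/// Runs `rounds` timed batches of `batch_size` calls each and returns the
/// total timed duration plus how many calls took the rewrite path.
fn bench_batched(rounds: usize, batch_size: usize) -> (Duration, usize) {
    let now = 1_800_000_000;
    let mut total = Duration::ZERO;
    let mut rewrites = 0;
    for _ in 0..rounds {
        // Untimed setup: fresh inputs, so every call really hits the rewrite path.
        let mut batch = vec![0_i64; batch_size];
        let t0 = Instant::now();
        for start in batch.iter_mut() {
            // black_box keeps the optimizer from specializing on the known input.
            if normalize_start_sketch(black_box(start), now) {
                rewrites += 1;
            }
        }
        total += t0.elapsed();
    }
    (total, rewrites)
}
```

If the same `batch` were reused across rounds, only the first round would take the rewrite path, because every value would already have been replaced by `now`; later rounds would silently measure the cheap path instead.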

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7834e6f902

…rk script

The normalization_utils bench target uses required-features = ["bench-internals"], so Cargo silently skips it unless that feature is explicitly activated. Add libdd-trace-normalization/bench-internals to the --features list in run_benchmarks_ci.sh so the new (and existing) normalization benchmarks appear in CI results.

yannham force-pushed the yannham/microbenches-normalization branch from ffb711c to f8e4abb Compare June 16, 2026 09:28

yannham force-pushed the yannham/microbenches-normalization branch from f8e4abb to bd85696 Compare June 16, 2026 12:41

yannham and others added 3 commits June 16, 2026 15:25

doc: reword Cargo.toml comment

ed81de7

test: add black_box to avoid undue inlining

7834e6f

yannham marked this pull request as ready for review June 16, 2026 13:53

yannham requested review from a team as code owners June 16, 2026 13:53

chatgpt-codex-connector Bot reviewed Jun 16, 2026

Comment thread libdd-trace-normalization/Cargo.toml

yannham added 2 commits June 16, 2026 17:40

fix: clippy warning (remove useless black box)

302189a

yannham requested a review from a team as a code owner June 16, 2026 15:43

yannham wants to merge 6 commits into main from yannham/microbenches-normalization
