test(trace-normalization): add microbenchmarks for tag/metric/truncate normalization#2124

Open
yannham wants to merge 6 commits into main from yannham/microbenches-normalization

yannham commented Jun 16, 2026

Contributor

What does this PR do?

Extends the existing criterion bench (libdd-trace-normalization/benches/normalization_utils.rs) to cover the normalization functions that were previously unmeasured, all of which run on every ingested span:

  • normalize_tag: the heaviest function (per-codepoint UTF-8 scan + char-class state machine). Benchmarked on ASCII fast-path, mixed/illegal-char, Unicode (codepoint slow path), and over-length (> MAX_TAG_LEN) inputs.
  • normalize_metric_name: similar complexity, with one-byte lookahead. Benchmarked on clean, separator-collapsing, and over-length inputs.
  • truncate_utf8: benchmarked on over-length ASCII plus a multi-byte (3-byte) input where the limit lands mid-codepoint and forces the boundary walk-back.
  • normalize_span_start_duration: benchmarked on clean vs. needs-clock inputs to quantify the cost of the SystemTime read on the pre-year-2000 path.
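
The boundary walk-back that the truncate_utf8 case exercises can be sketched as follows. This is a minimal illustration, not the crate's code: the name `truncate_utf8_sketch` and its signature are assumptions, and the real function may take a different type or truncate in place.

```rust
/// Hypothetical stand-in for the crate's `truncate_utf8` (name and signature
/// are assumptions). Returns the longest prefix of `s` that is at most
/// `limit` bytes long and ends on a UTF-8 codepoint boundary.
pub fn truncate_utf8_sketch(s: &str, limit: usize) -> &str {
    if s.len() <= limit {
        // Already short enough: nothing to cut.
        return s;
    }
    // Walk back from the byte limit until we land on a codepoint boundary.
    // Index 0 is always a boundary, so the loop terminates; it takes at most
    // 3 steps because a UTF-8 codepoint is at most 4 bytes long.
    let mut end = limit;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

With a 9-byte input of three `€` characters (3 bytes each) and a limit of 4, the limit falls inside the second character, so the sketch steps back to byte 3 and returns a single `€`. ASCII input never enters the loop, which is why the two cases are worth benchmarking separately.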

A bench-internals cargo feature is added (mirroring libdd-sampling) to expose the otherwise-private benched functions without changing the shipped public API. The [[bench]] target now lists this feature under required-features, so the bench only builds when bench-internals is enabled.

Motivation

Normalization runs on every span, but the existing bench skipped the expensive per-char UTF-8 state machines. These are the functions most likely to show up as a per-span tax, so they are worth tracking.

Additional Notes

  • Follows the existing bench idioms (same group naming, iter_batched_ref over 1000 owned copies, Throughput(Elements), black_box on inputs).
  • Net change is small (~190 LoC).
  • Default (shipped) build is unaffected: the visibility split is gated behind bench-internals.
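
The batching idiom mentioned in the first note can be approximated with the standard library alone. The sketch below does not use Criterion: `normalize_in_place_sketch` and `time_batch_sketch` are hypothetical names, and lowercasing merely stands in for real normalization work. It only illustrates the shape of the measurement: owned inputs are built before the clock starts, and one timing covers the whole batch.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical workload standing in for a normalization function that
/// mutates its input in place.
fn normalize_in_place_sketch(tag: &mut String) {
    tag.make_ascii_lowercase();
}

/// Times `batch_size` calls in a single measurement and returns the elapsed
/// time together with the processed batch.
pub fn time_batch_sketch(template: &str, batch_size: usize) -> (Duration, Vec<String>) {
    // Untimed setup: one owned copy per call, so cloning stays out of the
    // timed region.
    let mut batch: Vec<String> = (0..batch_size).map(|_| template.to_owned()).collect();

    let start = Instant::now();
    for tag in batch.iter_mut() {
        // black_box keeps the optimizer from reasoning about the input.
        normalize_in_place_sketch(black_box(tag));
    }
    let elapsed = start.elapsed();

    (elapsed, batch)
}
```

Dividing the elapsed time by the batch size gives a per-call estimate that is far less sensitive to timer overhead than timing a single nanosecond-scale call.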

How to test the change?

cargo check -p libdd-trace-normalization
cargo bench -p libdd-trace-normalization --features bench-internals --no-run
cargo bench -p libdd-trace-normalization --features bench-internals -- --warm-up-time 1 --measurement-time 1 --sample-size 10

🤖 (Partly) Generated with Claude Code

github-actions Bot commented Jun 16, 2026

Contributor

📚 Documentation Check Results

⚠️ 129 documentation warning(s) found

📦 libdd-trace-normalization - 129 warning(s)


Updated: 2026-06-16 15:45:06 UTC | Commit: 9bd6659 | missing-docs job results

@github-actions

Contributor

Clippy Allow Annotation Report

Comparing clippy allow annotations between branches:

  • Base Branch: origin/main
  • PR Branch: origin/yannham/microbenches-normalization

Summary by Rule

Rule | Base Branch | PR Branch | Change
unwrap_used | 1 | 1 | No change (0%)
Total | 1 | 1 | No change (0%)

Annotation Counts by File

File | Base Branch | PR Branch | Change
libdd-trace-normalization/src/normalize_utils.rs | 1 | 1 | No change (0%)

Annotation Stats by Crate

Crate | Base Branch | PR Branch | Change
clippy-annotation-reporter | 5 | 5 | No change (0%)
datadog-ffe-ffi | 1 | 1 | No change (0%)
datadog-ipc | 21 | 21 | No change (0%)
datadog-live-debugger | 4 | 4 | No change (0%)
datadog-live-debugger-ffi | 10 | 10 | No change (0%)
datadog-profiling-replayer | 4 | 4 | No change (0%)
datadog-sidecar | 46 | 46 | No change (0%)
libdd-common | 13 | 13 | No change (0%)
libdd-common-ffi | 12 | 12 | No change (0%)
libdd-data-pipeline | 5 | 5 | No change (0%)
libdd-ddsketch | 2 | 2 | No change (0%)
libdd-dogstatsd-client | 1 | 1 | No change (0%)
libdd-profiling | 13 | 13 | No change (0%)
libdd-remote-config | 3 | 3 | No change (0%)
libdd-telemetry | 20 | 20 | No change (0%)
libdd-tinybytes | 4 | 4 | No change (0%)
libdd-trace-normalization | 2 | 2 | No change (0%)
libdd-trace-obfuscation | 3 | 3 | No change (0%)
libdd-trace-stats | 1 | 1 | No change (0%)
libdd-trace-utils | 12 | 12 | No change (0%)
Total | 182 | 182 | No change (0%)

About This Report

This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.

github-actions Bot commented Jun 16, 2026

Contributor

🔒 Cargo Deny Results

⚠️ 1 issue(s) found, showing only errors (advisories, bans, sources)

📦 libdd-trace-normalization - 1 error(s)

error[unsound]: Rand is unsound with a custom logger using `rand::rng()`
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:51:1
   │
51 │ rand 0.8.5 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unsound advisory detected
   │
   ├ ID: RUSTSEC-2026-0097
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0097
   ├ It has been reported (by @lopopolo) that the `rand` library is [unsound](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library) (i.e. that safe code using the public API can cause Undefined Behaviour) when all the following conditions are met:
     
     - The `log` and `thread_rng` features are enabled
     - A [custom logger](https://docs.rs/log/latest/log/#implementing-a-logger) is defined
     - The custom logger accesses `rand::rng()` (previously `rand::thread_rng()`) and calls any `TryRng` (previously `RngCore`) methods on `ThreadRng`
     - The `ThreadRng` (attempts to) reseed while called from the custom logger (this happens every 64 kB of generated data)
     - Trace-level logging is enabled or warn-level logging is enabled and the random source (the `getrandom` crate) is unable to provide a new seed
     
     `TryRng` (previously `RngCore`) methods for `ThreadRng` use `unsafe` code to cast `*mut BlockRng<ReseedingCore>` to `&mut BlockRng<ReseedingCore>`. When all the above conditions are met this results in an aliased mutable reference, violating the Stacked Borrows rules. Miri is able to detect this violation in sample code. Since construction of [aliased mutable references is Undefined Behaviour](https://doc.rust-lang.org/stable/nomicon/references.html), the behaviour of optimized builds is hard to predict.
   ├ Announcement: https://github.com/rust-random/rand/pull/1763
   ├ Solution: Upgrade to >=0.10.1 OR <0.10.0, >=0.9.3 OR <0.9.0, >=0.8.6 (try `cargo update -p rand`)
   ├ rand v0.8.5
     └── (dev) libdd-trace-normalization v2.0.0

advisories FAILED, bans ok, sources ok

Updated: 2026-06-16 15:47:12 UTC | Commit: 9bd6659 | dependency-check job results

datadog-official Bot commented Jun 16, 2026

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 73.41% (+0.34%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 302189a

yannham force-pushed the yannham/microbenches-normalization branch from ffb711c to f8e4abb on June 16, 2026 09:28
dd-octo-sts Bot commented Jun 16, 2026

Contributor

Artifact Size Benchmark Report

aarch64-alpine-linux-musl
Artifact | Baseline | Commit | Change
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.so | 7.70 MB | 7.70 MB | 0% (0 B) 👌
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.a | 83.67 MB | 83.67 MB | 0% (0 B) 👌

aarch64-unknown-linux-gnu
Artifact | Baseline | Commit | Change
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a | 94.77 MB | 94.77 MB | 0% (0 B) 👌
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so | 10.34 MB | 10.34 MB | 0% (0 B) 👌

libdatadog-x64-windows
Artifact | Baseline | Commit | Change
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll | 24.83 MB | 24.83 MB | 0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib | 87.33 KB | 87.33 KB | 0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb | 180.86 MB | 180.85 MB | -0% (-8.00 KB) 👌
/libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib | 925.02 MB | 925.02 MB | +0% (+1.03 KB) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll | 8.09 MB | 8.09 MB | 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib | 87.33 KB | 87.33 KB | 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb | 23.94 MB | 23.94 MB | 0% (0 B) 👌
/libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib | 47.78 MB | 47.78 MB | 0% (0 B) 👌

libdatadog-x86-windows
Artifact | Baseline | Commit | Change
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll | 21.52 MB | 21.52 MB | 0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib | 88.71 KB | 88.71 KB | 0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb | 184.87 MB | 184.87 MB | 0% (0 B) 👌
/libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib | 917.97 MB | 917.98 MB | +0% (+1.03 KB) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll | 6.24 MB | 6.24 MB | 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib | 88.71 KB | 88.71 KB | 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb | 25.66 MB | 25.66 MB | 0% (0 B) 👌
/libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib | 45.41 MB | 45.41 MB | 0% (0 B) 👌

x86_64-alpine-linux-musl
Artifact | Baseline | Commit | Change
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.a | 74.59 MB | 74.59 MB | 0% (0 B) 👌
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.so | 8.58 MB | 8.58 MB | 0% (0 B) 👌

x86_64-unknown-linux-gnu
Artifact | Baseline | Commit | Change
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a | 90.01 MB | 90.01 MB | 0% (0 B) 👌
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so | 10.44 MB | 10.44 MB | 0% (0 B) 👌

…e normalization

Extend the existing criterion bench to cover the per-char UTF-8 state
machines that run on every ingested span but were previously unmeasured:
`normalize_tag` (ASCII / mixed-unicode / over-length), `normalize_metric_name`,
`truncate_utf8` (UTF-8 boundary walk-back), and `normalize_span_start_duration`
(quantifying the SystemTime read on the year-2000 path).

Adds a `bench-internals` feature, mirroring `libdd-sampling`, to expose the
otherwise-private `normalize_metric_name`/`truncate_utf8` without changing the
shipped public API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
yannham force-pushed the yannham/microbenches-normalization branch from f8e4abb to bd85696 on June 16, 2026 12:41
yannham and others added 3 commits June 16, 2026 15:25
Convert the bench from a single-call `b.iter` to the batched
`iter_batched_ref` + 1000-element inner loop used by the other benches in
this file. The previous form set `throughput(Elements(1000))` and
`SamplingMode::Flat` but measured one call per iteration, so the throughput
number was meaningless and the ns-scale "clean" path was swamped by timer
overhead.

The batch is rebuilt in untimed setup because the function mutates its
inputs in place: on the year-2000 path the first call rewrites `start` to a
recent timestamp, which would make a second call on the same value skip the
clock branch.
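
The reuse problem described above can be shown with a small sketch. `normalize_start_sketch` is a hypothetical stand-in, not the crate's `normalize_span_start_duration`: the cutoff constant, the nanosecond unit, and the injected `now_nanos` parameter (used here instead of a real SystemTime read, to keep the example deterministic) are all assumptions.

```rust
/// Illustrative cutoff: 2000-01-01T00:00:00Z in nanoseconds since the Unix
/// epoch. The real crate's constant and unit are assumptions here.
const YEAR_2000_NANOS: i64 = 946_684_800_000_000_000;

/// Hypothetical stand-in for the start-time normalization. `now_nanos`
/// replaces the clock read. Returns true when the clock branch was taken.
pub fn normalize_start_sketch(start: &mut i64, now_nanos: i64) -> bool {
    if *start < YEAR_2000_NANOS {
        // Clock branch: the implausible start is rewritten in place.
        *start = now_nanos;
        true
    } else {
        // Clean path: the value is left untouched.
        false
    }
}
```

Calling the sketch twice on the same value takes the clock branch only the first time, because the first call leaves a recent timestamp behind. A benchmark that reused one input across iterations would therefore mostly measure the clean path, which is why each batch needs fresh inputs.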

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
yannham marked this pull request as ready for review June 16, 2026 13:53
yannham requested review from a team as code owners June 16, 2026 13:53

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7834e6f902

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread libdd-trace-normalization/Cargo.toml
yannham added 2 commits June 16, 2026 17:40
…rk script

The normalization_utils bench target uses required-features = ["bench-internals"],
so Cargo silently skips it unless that feature is explicitly activated. Add
libdd-trace-normalization/bench-internals to the --features list in
run_benchmarks_ci.sh so the new (and existing) normalization benchmarks
appear in CI results.
yannham requested a review from a team as a code owner June 16, 2026 15:43