refactor(change-buffer)!: replace slot index with span_id, fix segment isolation#2105
Conversation
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: 953dc2e | Docs | Datadog PR Page | Give us feedback! |
Clippy Allow Annotation ReportComparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This ReportThis report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 29b0d7d5bd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
| let (mut span, segment_id) = self | ||
| .spans | ||
| .remove(span_id) | ||
| .ok_or(ChangeBufferError::SpanNotFound(*span_id))?; |
There was a problem hiding this comment.
If this fails, we stop removing the other spans. Is that what we want?
| // Segment 1 — span_id=1: origin → "origin-A" (string id 0) | ||
| w.op(OP_CREATE, 1); // span_id=1 in header | ||
| w.u128(TRACE_ID); | ||
| w.u64(1); // segment_id=1 | ||
| w.u64(0); // parent_id | ||
| w.op(OP_SET_TRACE_ORIGIN, 1); | ||
| w.u32(0); // string_id → "origin-A" |
There was a problem hiding this comment.
Should this have a helper function?
| deferred_meta_at(&mut self.deferred_meta, slot)?.insert(key_id, val_id); | ||
| if !cache.deferred_meta.is_null() { | ||
| // SAFETY: pointer into self.deferred_meta, valid per function contract. | ||
| unsafe { &mut *cache.deferred_meta }.insert(key_id, val_id); |
There was a problem hiding this comment.
Are we guaranteed to enforce stacked-borrows here?
There was a problem hiding this comment.
Yes, because nothing else is touching self.spans in any way (creating mutable or immutable references) while we materialize a mutable reference to some span. Basically we could use HashMap::get_mut for the lifetime of the materialized mutable span reference instead, but of course the whole point of the pointer is to avoid having to make a hashmap fetch.
a5fe934 to
d278eae
Compare
… isolation Segment isolation bug When service A calls B which calls back to A, both of A's segments share the same trace_id. The segment-level state map (origin, meta, metrics) was keyed by trace_id, so SetTraceOrigin/SetTraceMetaAttr/ SetTraceMetricsAttr written for segment 2 would overwrite segment 1's values before it was flushed. Fix: add segment_id (u64) to the Create op so each segment gets its own isolated entry in SmallSegmentMap. Slot index removal Operations previously carried a u32 slot_index — a JS-managed array index into parallel Vecs (spans, deferred_meta, deferred_metrics, segment_ids). JS had to allocate and free slot numbers, silently reusing a slot would drop a live span, and ensure_slot had to grow all four Vecs in lockstep on every Create. Replaced with span_id (u64) as the natural key into HashMaps, removing the slot allocation contract entirely. Opcode size The opcode was encoded as u64 because JS was writing via setBigUint64. With only 17 opcodes this wastes 6 bytes per op. Reduced to u16. SpanHeader removal materialize_header was a direct-write path where JS wrote span fields into a repr(C) struct in WASM memory, bypassing the change buffer protocol. It had no segment_id field and would have reproduced the isolation bug. It was not in use, so deleted along with span_header.rs. Rename Trace→Segment SmallTraceMap/Trace renamed to SmallSegmentMap/Segment since the map holds per-segment state, not per-trace state. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…buffer corruption Co-locate segment_id with its span in spans: HashMap<u64, (Span<T>, u64)>, removing the separate segment_ids: HashMap<u64, u64>. This eliminates a second HashMap lookup on every op that needs the segment_id. Group the 5 cache state variables in flush_change_buffer into a SpanCache<T> struct, fixing the too_many_arguments clippy lint on interpret_operation_cached. Fix a buffer-corruption bug (P1) where BatchSetMeta/BatchSetMetric in the cached path read the count but skipped the payload loop when cached_deferred_meta/cached_deferred_metrics was null (e.g. after materialize_span was called between flushes). The unread bytes were then decoded as the next op's opcode, corrupting all subsequent ops in the buffer. The fix moves the null-check inside the loop so bytes are always consumed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
f18ba51 to
e85b3fd
Compare
yannham
left a comment
There was a problem hiding this comment.
Reviewed with Roch, but since I also rebased/amended the PR, it'd better to have another pair of eyes (and a stamp).
paullegranddc
left a comment
There was a problem hiding this comment.
No objection on principle, but did you benchmark this on dd-trace-js vs the previous implementation?
The previous implementation was incorrect, so it doesn't matter if it was faster. Also, we shouldn't be relying on people benchmarking on their machines, they should be automated in this repo. |
Hey, this is precisely what I'm trying to do this week 🙂 |
Summary
trace_id, trace-level state (SetTraceOrigin,SetTraceMetaAttr,SetTraceMetricsAttr) was keyed bytrace_id, so segment 2's writes overwrote segment 1's values before it was flushed. Fixed by addingsegment_id(u64) to Create ops and keyingSmallSegmentMapby segment_id instead.u32 slot_index— a JS-managed array index into four parallelVecs. JS had to allocate/free slot numbers; reusing a slot prematurely would silently drop a live span;ensure_slothad to grow all four Vecs in lockstep. Replaced withspan_id(u64) as the natural key intoHashMaps, removing the slot allocation contract entirely.u64because JS wrote viasetBigUint64, wasting 6 bytes per op with only 17 opcodes. Reduced tou16.materialize_headerwas an unused direct-write path where JS wrote span fields into a#[repr(C)]struct in WASM memory, bypassing the change buffer protocol. It had nosegment_idfield and would have reproduced the isolation bug. Deleted along withspan_header.rs.Trace→Segment:SmallTraceMap/Tracerenamed toSmallSegmentMap/Segmentsince the map holds per-segment state, not per-trace state.Test plan
cargo test -p libdd-trace-utils --lib --features change-buffer -- change_bufferpasses (16 tests including 3 new segment isolation tests)cargo check -p libdd-trace-utils --features change-bufferclean