Skip to content

refactor(otel-thread-ctx): inline thread-local resolution#2129

Open
cataphract wants to merge 3 commits into
mainfrom
glopes/otel-thread-ctx-res
Open

refactor(otel-thread-ctx): inline thread-local resolution#2129
cataphract wants to merge 3 commits into
mainfrom
glopes/otel-thread-ctx-res

Conversation

@cataphract

Copy link
Copy Markdown
Contributor

What does this PR do?

The inlining of the resolution of the address of otel_thread_ctx_v1 depended on an alignment of toolchain version/options that's not trivial to accomplish. When it's not inlined and wrapped in a regular C function, some of the performance advantages of TLSDESC are negated, like its special calling convention.

How to test the change?

By building DataDog/dd-trace-php#3970 and checking if the call is inlined.

@cataphract cataphract requested review from a team as code owners June 16, 2026 15:28
@cataphract cataphract requested a review from yannham June 16, 2026 15:29
@yannham

yannham commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Haven't looked at it yet, but stupid question: if we're going the route of arhitecture-specific code, and TLSDESC is the default on arch, couldn't we just have a native Rust thread-local in this case and thus avoid entirely messing with assembly in the arm64 case? Offering of course the same API in both cases

@cataphract

cataphract commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

Haven't looked at it yet, but stupid question: if we're going the route of arhitecture-specific code, and TLSDESC is the default on arch, couldn't we just have a native Rust thread-local in this case and thus avoid entirely messing with assembly in the arm64 case? Offering of course the same API in both cases

@yannham We could if the feature was stable. See https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=8030d0cb93bd361efe963e035d8005d5

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0cc8540401

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread libdd-otel-thread-ctx/src/lib.rs Outdated
@yannham

yannham commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

@cataphract sorry, I don't understand what you mean. Which feature are you thinking of? On my side, I meant using standard Rust thread-locals, without any extern "C" involved in the arm64/aarch64 case. Since TLSDESC is the only dialect available there, there should be nothing special to do for that architecture. And we would keep the scary assembly for x86 only.

@cataphract

cataphract commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

@cataphract sorry, I don't understand what you mean. Which feature are you thinking of? On my side, I meant using standard Rust thread-locals, without any extern "C" involved in the arm64/aarch64 case. Since TLSDESC is the only dialect available there, there should be nothing special to do for that architecture. And we would keep the scary assembly for x86 only.

@yannham but afaik there is no way to produce an exported unmangled symbol with thread_local!. The macro expands to something like (for a const initializer):

pub const SLOT: LocalKey<Cell<*mut u8>> = {
    const __RUST_STD_INTERNAL_INIT: Cell<*mut u8> = Cell::new(std::ptr::null_mut());
    unsafe {
        LocalKey::new(const {
            if std::mem::needs_drop::<Cell<*mut u8>>() {
                |_| {
                    #[thread_local]
                    static __RUST_STD_INTERNAL_VAL: EagerStorage<Cell<*mut u8>> =
                        EagerStorage::new(__RUST_STD_INTERNAL_INIT);
                    __RUST_STD_INTERNAL_VAL.get()
                }
            } else {
                |_| {
                    #[thread_local]
                    static __RUST_STD_INTERNAL_VAL: Cell<*mut u8> = __RUST_STD_INTERNAL_INIT;
                    &__RUST_STD_INTERNAL_VAL
                }
            }
        })
    }
};

Because your type is a pointer we take the else branch, which makes it very close to what we want. But we still need the thread local symbol to be named exactly otel_thread_ctx_v1, and that's not what we get here.

@yannham

yannham commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Ah, I didn't think of the symbol name indeed... Too bad, but makes sense, thanks 👍

@cataphract cataphract force-pushed the glopes/otel-thread-ctx-res branch from 0cc8540 to e515ba8 Compare June 16, 2026 16:29
@datadog-prod-us1-6

datadog-prod-us1-6 Bot commented Jun 16, 2026

Copy link
Copy Markdown

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 73.35% (-0.01%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: d71783d | Docs | Datadog PR Page | Give us feedback!

@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

📚 Documentation Check Results

⚠️ 284 documentation warning(s) found

📦 libdd-otel-thread-ctx-ffi - 282 warning(s)

📦 libdd-otel-thread-ctx - 2 warning(s)


Updated: 2026-06-17 19:09:44 UTC | Commit: 9cd81b6 | missing-docs job results

@github-actions

Copy link
Copy Markdown
Contributor

Clippy Allow Annotation Report

Comparing clippy allow annotations between branches:

  • Base Branch: origin/main
  • PR Branch: origin/glopes/otel-thread-ctx-res

Summary by Rule

Rule Base Branch PR Branch Change

Annotation Counts by File

File Base Branch PR Branch Change

Annotation Stats by Crate

Crate Base Branch PR Branch Change
clippy-annotation-reporter 5 5 No change (0%)
datadog-ffe-ffi 1 1 No change (0%)
datadog-ipc 21 21 No change (0%)
datadog-live-debugger 4 4 No change (0%)
datadog-live-debugger-ffi 10 10 No change (0%)
datadog-profiling-replayer 4 4 No change (0%)
datadog-sidecar 46 46 No change (0%)
libdd-common 13 13 No change (0%)
libdd-common-ffi 12 12 No change (0%)
libdd-data-pipeline 5 5 No change (0%)
libdd-ddsketch 2 2 No change (0%)
libdd-dogstatsd-client 1 1 No change (0%)
libdd-profiling 13 13 No change (0%)
libdd-remote-config 3 3 No change (0%)
libdd-telemetry 20 20 No change (0%)
libdd-tinybytes 4 4 No change (0%)
libdd-trace-normalization 2 2 No change (0%)
libdd-trace-obfuscation 3 3 No change (0%)
libdd-trace-stats 1 1 No change (0%)
libdd-trace-utils 12 12 No change (0%)
Total 182 182 No change (0%)

About This Report

This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.

@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

🔒 Cargo Deny Results

⚠️ 4 issue(s) found, showing only errors (advisories, bans, sources)

📦 libdd-otel-thread-ctx-ffi - 4 error(s)

Show output
error[unsound]: Rand is unsound with a custom logger using `rand::rng()`
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:104:1
    │
104 │ rand 0.8.5 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unsound advisory detected
    │
    ├ ID: RUSTSEC-2026-0097
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0097
    ├ It has been reported (by @lopopolo) that the `rand` library is [unsound](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library) (i.e. that safe code using the public API can cause Undefined Behaviour) when all the following conditions are met:
      
      - The `log` and `thread_rng` features are enabled
      - A [custom logger](https://docs.rs/log/latest/log/#implementing-a-logger) is defined
      - The custom logger accesses `rand::rng()` (previously `rand::thread_rng()`) and calls any `TryRng` (previously `RngCore`) methods on `ThreadRng`
      - The `ThreadRng` (attempts to) reseed while called from the custom logger (this happens every 64 kB of generated data)
      - Trace-level logging is enabled or warn-level logging is enabled and the random source (the `getrandom` crate) is unable to provide a new seed
      
      `TryRng` (previously `RngCore`) methods for `ThreadRng` use `unsafe` code to cast `*mut BlockRng<ReseedingCore>` to `&mut BlockRng<ReseedingCore>`. When all the above conditions are met this results in an aliased mutable reference, violating the Stacked Borrows rules. Miri is able to detect this violation in sample code. Since construction of [aliased mutable references is Undefined Behaviour](https://doc.rust-lang.org/stable/nomicon/references.html), the behaviour of optimized builds is hard to predict.
    ├ Announcement: https://github.com/rust-random/rand/pull/1763
    ├ Solution: Upgrade to >=0.10.1 OR <0.10.0, >=0.9.3 OR <0.9.0, >=0.8.6 (try `cargo update -p rand`)
    ├ rand v0.8.5
      └── (dev) libdd-common v4.2.0
          └── libdd-common-ffi v35.0.0
              └── libdd-otel-thread-ctx-ffi v1.0.0

error[vulnerability]: Name constraints for URI names were incorrectly accepted
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:119:1
    │
119 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0098
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0098
    ├ Name constraints for URI names were ignored and therefore accepted.
      
      Note this library does not provide an API for asserting URI names, and URI name constraints are otherwise not implemented.  URI name constraints are now rejected unconditionally.
      
      Since name constraints are restrictions on otherwise properly-issued certificates, this bug is reachable only after signature verification and requires misissuance to exploit.
      
      This vulnerability is identified as [GHSA-965h-392x-2mh5](https://github.com/rustls/webpki/security/advisories/GHSA-965h-392x-2mh5). Thank you to @1seal for the report.
    ├ Solution: Upgrade to >=0.103.12, <0.104.0-alpha.1 OR >=0.104.0-alpha.6 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      └── rustls v0.23.37
          ├── hyper-rustls v0.27.7
          │   └── libdd-common v4.2.0
          │       └── libdd-common-ffi v35.0.0
          │           └── libdd-otel-thread-ctx-ffi v1.0.0
          ├── libdd-common v4.2.0 (*)
          └── tokio-rustls v0.26.0
              ├── hyper-rustls v0.27.7 (*)
              └── libdd-common v4.2.0 (*)

error[vulnerability]: Name constraints were accepted for certificates asserting a wildcard name
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:119:1
    │
119 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0099
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0099
    ├ Permitted subtree name constraints for DNS names were accepted for certificates asserting a wildcard name.
      
      This was incorrect because, given a name constraint of `accept.example.com`, `*.example.com` could feasibly allow a name of `reject.example.com` which is outside the constraint.
      This is very similar to [CVE-2025-61727](https://go.dev/issue/76442).
      
      Since name constraints are restrictions on otherwise properly-issued certificates, this bug is reachable only after signature verification and requires misissuance to exploit.
      
      This vulnerability is identified as [GHSA-xgp8-3hg3-c2mh](https://github.com/rustls/webpki/security/advisories/GHSA-xgp8-3hg3-c2mh). Thank you to @1seal for the report.
    ├ Solution: Upgrade to >=0.103.12, <0.104.0-alpha.1 OR >=0.104.0-alpha.6 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      └── rustls v0.23.37
          ├── hyper-rustls v0.27.7
          │   └── libdd-common v4.2.0
          │       └── libdd-common-ffi v35.0.0
          │           └── libdd-otel-thread-ctx-ffi v1.0.0
          ├── libdd-common v4.2.0 (*)
          └── tokio-rustls v0.26.0
              ├── hyper-rustls v0.27.7 (*)
              └── libdd-common v4.2.0 (*)

error[vulnerability]: Reachable panic in certificate revocation list parsing
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:119:1
    │
119 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0104
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0104
    ├ A panic was reachable when parsing certificate revocation lists via [`BorrowedCertRevocationList::from_der`]
      or [`OwnedCertRevocationList::from_der`].  This was the result of mishandling a syntactically valid empty
      `BIT STRING` appearing in the `onlySomeReasons` element of a `IssuingDistributionPoint` CRL extension.
      
      This panic is reachable prior to a CRL's signature being verified.
      
      Applications that do not use CRLs are not affected.
      
      Thank you to @tynus3 for the report.
    ├ Solution: Upgrade to >=0.103.13, <0.104.0-alpha.1 OR >=0.104.0-alpha.7 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      └── rustls v0.23.37
          ├── hyper-rustls v0.27.7
          │   └── libdd-common v4.2.0
          │       └── libdd-common-ffi v35.0.0
          │           └── libdd-otel-thread-ctx-ffi v1.0.0
          ├── libdd-common v4.2.0 (*)
          └── tokio-rustls v0.26.0
              ├── hyper-rustls v0.27.7 (*)
              └── libdd-common v4.2.0 (*)

advisories FAILED, bans ok, sources ok

📦 libdd-otel-thread-ctx - ✅ No issues


Updated: 2026-06-17 19:11:27 UTC | Commit: 9cd81b6 | dependency-check job results

@cataphract cataphract changed the title More robust way to inline thread local var resolution refactor(otel-thread-ctx): inline thread-local resolution Jun 16, 2026
@cataphract cataphract force-pushed the glopes/otel-thread-ctx-res branch from e515ba8 to ee8bb2e Compare June 16, 2026 16:41
@dd-octo-sts

dd-octo-sts Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Artifact Size Benchmark Report

aarch64-alpine-linux-musl
Artifact Baseline Commit Change
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.so 7.76 MB 7.76 MB 0% (0 B) 👌
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.a 84.02 MB 84.02 MB 0% (0 B) 👌
aarch64-unknown-linux-gnu
Artifact Baseline Commit Change
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a 95.13 MB 95.13 MB 0% (0 B) 👌
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so 10.36 MB 10.36 MB 0% (0 B) 👌
libdatadog-x64-windows
Artifact Baseline Commit Change
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll 24.93 MB 24.93 MB 0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib 87.33 KB 87.33 KB 0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb 181.51 MB 181.50 MB -0% (-16.00 KB) 👌
/libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib 928.21 MB 928.21 MB 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll 8.12 MB 8.12 MB 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib 87.33 KB 87.33 KB 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb 24.03 MB 24.03 MB 0% (0 B) 👌
/libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib 47.96 MB 47.96 MB 0% (0 B) 👌
libdatadog-x86-windows
Artifact Baseline Commit Change
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll 21.62 MB 21.62 MB 0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib 88.71 KB 88.71 KB 0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb 185.58 MB 185.58 MB 0% (0 B) 👌
/libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib 921.15 MB 921.15 MB 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll 6.27 MB 6.27 MB 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib 88.71 KB 88.71 KB 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb 25.76 MB 25.76 MB 0% (0 B) 👌
/libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib 45.59 MB 45.59 MB 0% (0 B) 👌
x86_64-alpine-linux-musl
Artifact Baseline Commit Change
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.a 74.91 MB 74.91 MB 0% (0 B) 👌
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.so 8.61 MB 8.61 MB 0% (0 B) 👌
x86_64-unknown-linux-gnu
Artifact Baseline Commit Change
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a 90.33 MB 90.33 MB 0% (0 B) 👌
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so 10.48 MB 10.48 MB 0% (0 B) 👌

@yannham yannham left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, putting inline ASM is rather scary and fragile. However, I can see how it makes everything simpler on the build side, and remove a lot of code and scripts, which is appealing. It also gives guaranteed inlining without any toolchain reauirement. Since TLSDESC sequences are very simple and are likely to be very very stable (since the linker, the dynamic loader and the compiler must all agree), I suppose it's a reasonable approach.

I've tested the dynamic build and a static build with TLS relaxation on x86 (I did cross-compile for aarch64 as well and tested both, but had to take some detour there; the CI will test the shared build at least anyway). The ASM sequences look canonical, to the best of my knowledge.

I'm requiring changes only for the potentially missing align for otel_thread_ctx_v1 symbol, but otherwise, looks fine!

Comment thread libdd-otel-thread-ctx/src/lib.rs
#[inline(always)]
unsafe fn tls_slot() -> *mut *mut ThreadContextRecord {
let ptr: usize;
core::arch::asm!(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we should probably put a big WARNING scary comment around the TLS asm sequence here and lower. I've played with some harmless variations and linker relaxation is actually really sensitive to the precise instructions used. So, basically, tell people to just never touch this or they might break TLS relaxation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that, but what kind of relaxation do you expect?

Is this library supposed to be linked statically? If so, the linker will relax accesses to local-exec. We need to tell the linker to export otel_thread_ctx_v1 explicitly. Even then, in the test I made, this generated a relocation for a simple offset in the GOT (R_X86_64_TPOFF64, for an initial-exec like access, I imagine), so no TLSDESC.

If it's supposed to be linked against the .so, then I don't see what relaxation we could have.

@yannham yannham Jun 17, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this library supposed to be linked statically

Yes, in the case of a Rust tracer/profiler typically (dd-trace-rs). The OTel spec mentions TLSDESC but it's been clarified to mean "the TLSDESC dialect as implemented by the compiler": in the statically linked case, this will be relaxed to local exec which is fine and supported. We could want the relaxation for improved perf here (even if keeping the slower generic TLSDESC sequence would still be correct).

But I believe there's worse than the perf question: I've played a bit with the assembly here (to avoid offsetting xo two times), and Claude affirms (I haven't gone as far as testing it though) that my innocent-looking refactoring completely broke the relaxation, though the effect/semantics were exactly the same. It made some instructions not visible to the linker anymore (no associated relocation). Since the linker can't really consider such cases and still patch the instructions that are associated with the relocation, this results in a completely broken partial relaxation which loads a gibberish memory address during the sequence.

In general having inline ASM pretending it's a compiler-generated TLS access sequence is already quite fragile and dangereous. I feel like the current one is ok after thorough review, but I would rather not have anyone touch this until they understand very, very well what they're doing (if the comment can also scare off LLMs that's probably a good thing). Many things could break in many different ways down the line.

@yannham yannham Jun 17, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and, I must add, because I've tried very hard some time ago when I initially misunderstood the spec: it's just impossible right now to prevent LLD from performing TLS relaxation, so there's no way to say "hey I know this looks strange but please keep my TLSDESC intact at link time". There's just no knob, so we can't prevent linker relaxation in a reasonable setup to happen anway, which is why the spec accepts relaxed accesses like local exec, or being spec compliant with a statically linked executable would just be a nightmare)

@cataphract cataphract Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the spec is confusing by mixing the concepts of exporting a symbol and its access model/relocation.

  1. It does make sense, if this rust cdylib is consumed as shared library, to both say that 1) it should export the symbol and 2) use the TLSDESC model, because only TLSDESC (and general dynamic) allow for the symbol to accessed when the library is dlopen()ed (with a minor exception that's only applicable to glibc anyway).

  2. If it is consumed as a static library by an executable, it's not even clear why there should be a relocation. It's possible to export a symbol and not have any relocation against it, because the linker knows where it lies inside the executable. And like, you said, it seems impossible to have the linker not relax the TLSDESC in the static library. The behavior I saw was:
    a) without exporting the symbol, the accesses follow the local-exec model,
    b) if the symbol is exported, strangely, a relocation of type R_X86_64_TPOFF64 is produced, and accesses inside the executable change to initial-exec, even though the symbol doesn't seem preemptible (e.g. with LD_PRELOAD). I was only able to prevent the relocation and force local-exec behavior again using a hidden alias.

This means that, if we want to support executables linking against the static library, we need to be able to find the symbol regardless of the type of relocation, regardless of even whethe the relocation exists. Whether TLSDESC is used or not becomes immaterial. Resolving the symbol likely involves getting its offset from dynsym and the module id from libc internals and doing the equivalent of __tls_get_addr

@yannham yannham Jun 17, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does make sense, if this rust cdylib is consumed as shared library, to both say that 1) it should export the symbol and 2) use the TLSDESC model, because only TLSDESC (and general dynamic) allow for the symbol to accessed when the library is dlopen()ed (with a minor exception that's only applicable to glibc anyway).

Yup, agreed.

If it is consumed as a static library by an executable, it's not even clear why there should be a relocation

Putting the symbol in the dynamic table is just a hack to have the reader be able to locate it. Otherwise, it's a local access and the linker has no reason in a statically linked executable to expose the symbol to the outside world in any circumstance. However, it doesn't have to come with an associated relocation, it can just be here in the table. There's a way to do so with --dynamic-list and the linker.

I have a simple example crate that uses libdd-otel-thread-ctxt statically, has a local exec relaxed access, has the symbol in the dynamic symbol table but no relocation associated (and particularly no TLS relocation). I can share it if you'd like to give it a look?

Also, I think we're getting a bit off track here discussing the spec. It hasn't been merged yet, so it's good to have feedback and proposals, but it probably doesn't belong here. Happy to chat about all those things on another channel!

I believe my point still stands and is independent from everything that has been discussed so far:

  1. we generate a TLSDESC access at compile-time with our inline assembly, as would do a compiler.
  2. the linker will attempt to relax the access for a statically link executable at link time, and there's no way to prevent that.
  3. this relaxation is quite "dumb" and really fragile with respect the exact sequence of assembly emitted, because it requires some specific instructions in the sequence to be associated with a relocation. This is fine because it's usually emitted by the compiler and thus true. We're doing it manually so we should match that carefully.
  4. messing with the assembly can cause broken relaxation which will link into a broken executable, that will for example segfault, because it's just not a case the linker relaxation code was written for.

Thus we should warn very loudly people to not mess with this assembly. Do you disagree with any step of this reasoning? That sounds cheap and safe to do, IMHO.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment and a test to verify the relaxation does happen.

As for the rest, I think we're in agreement. I'll take my comments on the spec to it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! As for the added test, would that be possible to rebase this PR on the top of #2095 (that we can get reviewed and merged quickly if needed)? The latter changed the elf_properties test (basically using a programmatic test instead of using external tools like readelf and whatnot), which became simpler. I feel like doing it in the other direction, I wouldn't be entirely confident that the result would be the right one...

@yannham yannham requested review from ivoanjo and scottgerring June 17, 2026 09:36
@scottgerring

Copy link
Copy Markdown
Member

Hey @cataphract, haven't had a chance to look at this properly yet, but some more details around what went wrong exactly what probably not go astray - swapping this out for inline asm feels like a big change so it would be nice to convince ourselves its the right move

@ivoanjo

ivoanjo commented Jun 17, 2026

Copy link
Copy Markdown
Member

From my side, I think this overall looks like a win. So my thoughts are "how can we validate this?", and also, the cousin question of "What could go wrong, even if unlikely, and is there something we could do to avoid it?"

Any thoughts on these @cataphract ?

@cataphract

cataphract commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Hey @cataphract, haven't had a chance to look at this properly yet, but some more details around what went wrong exactly what probably not go astray - swapping this out for inline asm feels like a big change so it would be nice to convince ourselves its the right move

@scottgerring I noticed the inlining wasn't happening in dd-trace-php, but I confess I didn't get to the bottom of it. In general, we shouldn't rely on such toolchain quirks, as we can't always control it (e.g. clients can and do build dd-trace-php themselves).

cataphract and others added 2 commits June 17, 2026 16:48
@cataphract cataphract force-pushed the glopes/otel-thread-ctx-res branch 2 times, most recently from a263aeb to 4acf961 Compare June 17, 2026 16:20
@ivoanjo

ivoanjo commented Jun 17, 2026

Copy link
Copy Markdown
Member

One thing that occurred to me is that we could add a CI test that the inline assembly matches the compilation of tls_shim.c. We would have an option to skip this test (e.g. if someone's local compiler is different), but this way we could ensure that our assembly exactly matches the output of a known-good compiler, vs a "when we looked at it, it was fine" kinda affair.

@cataphract cataphract force-pushed the glopes/otel-thread-ctx-res branch from 4acf961 to d71783d Compare June 17, 2026 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants