Fix: Trace mode performance regressions #245

Open
griffinmilsap wants to merge 6 commits into dev from fix/trace-perf

@griffinmilsap
Collaborator

Summary

Reduces profiling trace overhead on the hot path, especially in dashboard-style trace mode, by cutting Python-level bookkeeping on the publisher and subscriber message paths.

What changed

  • Kept ProfilingTraceSample at the public API boundary, but removed per-sample dataclass creation from the publisher/subscriber hot path
  • Switched internal trace sample storage to compact tuple records in deque buffers
  • Added a sample_mod == 1 fast path to skip unnecessary sampling counter/modulus work in the common dashboard case
  • Simplified subscriber trace bookkeeping:
    • compute per-message trace state once
    • reduce repeated enablement/sample checks
    • inline lease-time and user-span trace appends into the active receive/profile paths
  • Simplified publisher trace bookkeeping:
    • inline publish trace append logic directly into broadcast()
    • avoid extra helper call overhead on the traced publish path
  • Preserved trace behavior and public interfaces:
    • same metrics
    • same timestamps
    • same sequencing
    • same exported ProfilingTraceSample shape
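The storage and sampling changes above can be illustrated with a minimal sketch, assuming a simplified buffer. `TraceBuffer`, `TraceSample`, `on_message`, and the `(timestamp, metric, value)` record layout are hypothetical names for illustration only; they are not ezmsg's internals, and the real `ProfilingTraceSample` fields are not shown in this PR. The sketch shows tuple records in a bounded `deque`, a `sample_mod == 1` fast path, one sampling decision reused for every append of a message, and dataclass construction deferred to export.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


# Hypothetical public record type; field names are illustrative assumptions,
# not the actual ProfilingTraceSample definition.
@dataclass(frozen=True)
class TraceSample:
    timestamp: float
    metric: str
    value: float


# Compact internal record stored on the hot path: (timestamp, metric, value).
Record = Tuple[float, str, float]


class TraceBuffer:
    """Hypothetical trace buffer: tuple records in a bounded deque."""

    def __init__(self, sample_mod: int = 1, maxlen: int = 1024) -> None:
        self.sample_mod = sample_mod
        self._counter = 0
        self._records: Deque[Record] = deque(maxlen=maxlen)

    def should_sample(self) -> bool:
        # Fast path: sample_mod == 1 traces every message, so the counter
        # increment and modulus are skipped entirely.
        if self.sample_mod == 1:
            return True
        self._counter += 1
        return self._counter % self.sample_mod == 0

    def on_message(self, t_recv: float, lease_time: float, span_time: float) -> None:
        # Make one sampling decision per message and reuse it for every append.
        if not self.should_sample():
            return
        append = self._records.append  # bind once to avoid repeated attribute lookups
        append((t_recv, "lease_time", lease_time))
        append((t_recv, "user_span", span_time))

    def export(self) -> List[TraceSample]:
        # Dataclass instances are only created here, at the public API boundary.
        return [TraceSample(*rec) for rec in self._records]


every_msg = TraceBuffer(sample_mod=1)
every_other = TraceBuffer(sample_mod=2)
for i in range(4):
    every_msg.on_message(float(i), 0.1, 0.2)
    every_other.on_message(float(i), 0.1, 0.2)
print(len(every_msg.export()), len(every_other.export()))  # 8 4
```

Keeping tuples internally means the hot path pays only for a tuple allocation and a `deque.append`, while callers still receive typed objects when they read the trace.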

Performance notes

  • These changes target the runtime cost of trace-enabled message handling rather than dashboard/UI-side processing.
  • The biggest wins came from reducing helper layering and per-message bookkeeping on the subscriber/publisher hot paths.
  • The local traced hotpath improved consistently; SHM/TCP cases showed smaller, mixed changes (modest gains at payload=64, noise-level regressions at payload=4096).
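A rough CPython microbenchmark helps show why per-sample dataclass construction and extra helper layers are worth removing from a per-message path. This is an illustrative sketch only: `Sample` and `_append_helper` are hypothetical stand-ins, absolute timings depend on interpreter and machine, and it does not reproduce `ezmsg perf` numbers.

```python
import timeit
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical per-sample dataclass standing in for a hot-path record object.
    timestamp: float
    metric: str
    value: float


def _append_helper(buf: deque, t: float, m: str, v: float) -> None:
    # Extra helper layer, mimicking a trace-append function called per message.
    buf.append((t, m, v))


def run(n: int) -> dict:
    # Values come from variables so the tuple is built at runtime, not constant-folded.
    t, m, v = 1.0, "publish", 2.0
    buf: deque = deque(maxlen=1024)
    return {
        "dataclass append": timeit.timeit(lambda: buf.append(Sample(t, m, v)), number=n),
        "tuple via helper": timeit.timeit(lambda: _append_helper(buf, t, m, v), number=n),
        "inline tuple append": timeit.timeit(lambda: buf.append((t, m, v)), number=n),
    }


N = 200_000
for name, secs in run(N).items():
    print(f"{name:>20}: {secs / N * 1e9:.0f} ns/op")
```

On typical CPython builds the inline tuple append tends to be the cheapest of the three, but the gap is machine-dependent, so measure on the target setup.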

Validation

  • Focused regression coverage:
    • tests/test_profiling_api.py::test_subscriber_trace_sampling_uses_one_decision_per_message
    • tests/test_perf_hotpath.py
    • tests/test_perf_ab.py
  • Additional verification via ezmsg perf ab --trace --force-shared-env against dev/baseline runs showed consistent improvements on the local traced hotpath cases, with mixed results on SHM/TCP.

Benchmark summary

  • ezmsg perf ab --trace --force-shared-env comparing dev vs fix/trace-perf showed the clearest gains on the local traced hotpath:
    • async/local/payload=4096: 6.05 -> 5.43 us/msg (-10.36%, 6/6 wins)
    • async/local/payload=64: 6.10 -> 5.34 us/msg (-12.19%, 6/6 wins)
  • SHM/TCP traced results were mixed and much smaller in magnitude:
    • modest improvement for async/shm/payload=64 and async/tcp/payload=64
    • slight regressions/noise for async/shm/payload=4096 and async/tcp/payload=4096
  • On the final branch state, direct ezmsg perf hotpath runs show that trace overhead is most visible on the local path, though at a lower absolute level than the earlier baseline:
    • no trace local: 4.89-4.92 us/msg
    • trace local: 5.21-5.26 us/msg
  • Final branch-state traced results:
    • async/local/payload=4096: 5.26 us/msg
    • async/local/payload=64: 5.21 us/msg
    • async/shm/payload=4096: 76.65 us/msg
    • async/shm/payload=64: 74.44 us/msg
    • async/tcp/payload=4096: 74.20 us/msg
    • async/tcp/payload=64: 72.14 us/msg
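The residual trace cost on the local path can be bounded from the no-trace and trace ranges reported above. This is a quick calculation on those figures, not an additional measurement:

```python
# Direct `ezmsg perf hotpath` ranges from the final branch state (us/msg).
no_trace = (4.89, 4.92)
trace = (5.21, 5.26)

# Bound the overhead by pairing the extremes of the two ranges.
overhead_min = trace[0] - no_trace[1]  # 5.21 - 4.92
overhead_max = trace[1] - no_trace[0]  # 5.26 - 4.89

# Express each bound relative to the matching no-trace value.
rel_min = overhead_min / no_trace[1]
rel_max = overhead_max / no_trace[0]

print(f"{overhead_min:.2f}-{overhead_max:.2f} us/msg ({rel_min:.1%}-{rel_max:.1%})")
# prints: 0.29-0.37 us/msg (5.9%-7.6%)
```

That is roughly 0.3 us/msg, or about 6-8% over the untraced local path.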
