TSC Clocks + Benchmark Methodology Rework #26
Open
MoonFlowww wants to merge 4 commits into
Summary
Adds three RDTSC-family clock backends behind compile-time macros, with a runtime calibration cascade; chrono remains the default when no TSC macro is set. Reworks the benchmark to remove self-measurement bias and `-O3` collapsing of the no-track baseline. A few smaller cleanups in the stats path.

Major changes
1. TSC clock backends (`ctrack.hpp`)

New `Clock_RDTSC` / `Clock_RDTSCP` / `Clock_RDTSCP_LFENCE` structs, selected via `CTRACK_CLOCK_RDTSC*` macros. The selected backend is aliased to `ActiveClock`; chrono is the default when no macro is set. Hard `#error` on non-x86_64 if a TSC macro is defined.

All `std::chrono::high_resolution_clock::time_point` and `duration_cast<nanoseconds>` calls in `Event`, `Simple_Event`, `EventGroup`, `store`, `ctrack_result*`, and `EventHandler` are replaced with `ActiveClock::time_point` and `ActiveClock::duration_ns()`. The interface contract (`NOW()`, `duration_ns(s,e)`, `to_string(tp)`) is uniform across all four clocks.

Calibration cascade in
`calibrate_tsc()`:

- CPUID leaf `0x15`
- CPUID leaf `0x16`
- `/sys/.../cpu0/cpufreq/base_frequency`
- `HKLM\...\CentralProcessor\0\~MHz`
- `__rdtsc` vs `steady_clock`, median

Calibration runs once via a function-local `static const bool _ = (calibrate_tsc(), true);` inside `EventHandler`'s ctor. Anchors `tsc_anchor_cycles` + `tsc_anchor_system` for `to_string` conversion.

If every source returns 0 the library calls `std::abort()` with a message pointing at the macro to remove. No silent fallback to chrono: wrong-by-frequency-ratio numbers are worse than an abort.

2. Benchmark methodology (`ctrack_benchmark.cpp`)

Three fixes that together remove the bias the old harness had:
- `BENCHMARK_NOINLINE` on `busy_wait_ns` and every `*_no_track` helper. The tracked variants are naturally barriered: each `EventHandler` ctor/dtor mutates thread-local event state, so the compiler cannot reorder or fuse adjacent calls. The `*_no_track` variants have no such barrier. At `-O3` the entire call tree gets inlined into the worker loop as a flat sequence of `busy_wait_ns` calls, which the compiler's instruction scheduler can then reorder asymmetrically vs the tracked path. NOINLINE on the no-track side restores call-site symmetry with the tracked side, so the delta reflects CTRACK overhead and not asymmetric optimization.
- `raw_clock_ns()`: `CLOCK_MONOTONIC_RAW` on POSIX, `QueryPerformanceCounter` on Windows. Replaces `std::chrono::high_resolution_clock` as the outer timer in `measure_overhead()`. Measuring a solution with itself adds bias; the outer timer needs a path independent of whatever ctrack uses internally.
- `measure_overhead()` restructure:
  - Trial order alternates (`no_track` first vs `track` first per parity) so any first-trial effect cancels across pairs.
  - `ctrack::result_as_string()` moved outside the timed window: pre-clear of accumulated state happens before `t0`, post-clear after `t1`. The old harness called it inside the timed window, charging stats-flush cost to overhead.
  - Negative `raw_diff` clamped to 0 with a verbose-mode note (noise floor).

Outer clock measures the `track − no_track` delta.

3. Bench results
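As context for reading the results, the paired, alternating-order measurement described under Benchmark methodology can be sketched roughly as follows. This is not the code from `ctrack_benchmark.cpp`: `sketch_clock_ns`, `time_once`, and `sketch_measure_overhead_ns` are hypothetical names, `std::chrono::steady_clock` stands in for `raw_clock_ns()` only to keep the sketch portable, and aggregating the per-pair deltas with a median is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Outer timer for the sketch. The PR uses CLOCK_MONOTONIC_RAW or
// QueryPerformanceCounter; steady_clock is an assumption made for portability.
inline std::int64_t sketch_clock_ns() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
}

// Times a single call. Any state clearing or stats flush belongs outside
// this function so that it is not charged to the measured interval.
template <class F>
std::int64_t time_once(F&& f) {
    const std::int64_t t0 = sketch_clock_ns();
    f();
    const std::int64_t t1 = sketch_clock_ns();
    return t1 - t0;
}

// Hypothetical helper: runs `pairs` paired trials, alternating which variant
// goes first, and returns the median (track - no_track) delta clamped at 0.
template <class Tracked, class Untracked>
std::int64_t sketch_measure_overhead_ns(Tracked&& tracked, Untracked&& untracked, int pairs) {
    assert(pairs > 0);
    std::vector<std::int64_t> diffs;
    diffs.reserve(static_cast<std::size_t>(pairs));
    for (int i = 0; i < pairs; ++i) {
        std::int64_t t_track = 0;
        std::int64_t t_plain = 0;
        if (i % 2 == 0) {            // even pair: no_track first
            t_plain = time_once(untracked);
            t_track = time_once(tracked);
        } else {                     // odd pair: track first
            t_track = time_once(tracked);
            t_plain = time_once(untracked);
        }
        diffs.push_back(t_track - t_plain);
    }
    // Upper median of the per-pair deltas (assumed aggregation).
    const auto mid = diffs.begin() + static_cast<std::ptrdiff_t>(diffs.size() / 2);
    std::nth_element(diffs.begin(), mid, diffs.end());
    const std::int64_t raw_diff = *mid;
    // A negative delta is below the noise floor, so report 0 instead.
    return std::max<std::int64_t>(raw_diff, 0);
}
```

It would be called as `sketch_measure_overhead_ns(run_tracked, run_untracked, 100)`, where both callables are hypothetical stand-ins for the benchmark's tracked and `*_no_track` workloads.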
Minor changes
- `load_child_events_simple`: `parent_event` lookup hoisted out of the inner child loop (was redundantly fetched per child).
- `EventHandler` ctor signature dropped the defaulted `start_time` parameter; `start_time` is now captured after `register_event()` and after the `write_events_locked` spin, so the spin no longer counts toward the measured interval.
- `EventHandler` dtor: removed manual `capacity() - size() < 1` check before `emplace_back` (let `vector` handle it). Sub-events still use an explicit `reserve(max(4, cap*4))` growth pattern.
- `BeautifulTable::table_time` unit string `"mcs"` → `"us"`. `parse_function_timing` in the benchmark updated to match. `us` is the standard ASCII fallback for `µs` in low-latency perf tooling; `mcs` is non-standard.
- `table_timepoint` rewritten to dispatch through `ActiveClock::to_string`.
- `result_print` / `result_as_string` print `TSC frequency: X GHz` when a TSC backend is active.
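The clock contract that `table_timepoint` and the stats path now dispatch through (`NOW()`, `duration_ns(s,e)`, `to_string(tp)` behind an `ActiveClock` alias) could look roughly like the sketch below. It is not the code from `ctrack.hpp`: `SketchChronoClock`, `SketchTickClock`, `ticks_per_ns()`, and the `SKETCH_USE_TICK_CLOCK` macro are hypothetical names, the tick source is a plain counter standing in for a TSC read, and the 3.0 ticks/ns value is an arbitrary stand-in for a calibrated frequency.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>

// Hypothetical chrono-backed clock exposing the three-function contract.
struct SketchChronoClock {
    using time_point = std::chrono::steady_clock::time_point;

    static time_point NOW() { return std::chrono::steady_clock::now(); }

    static std::int64_t duration_ns(time_point s, time_point e) {
        return std::chrono::duration_cast<std::chrono::nanoseconds>(e - s).count();
    }

    static std::string to_string(time_point tp) {
        const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
            tp.time_since_epoch()).count();
        return std::to_string(ns) + " ns since clock epoch";
    }
};

// Hypothetical tick-backed clock: time points are raw tick counts, and
// durations are converted to nanoseconds with a calibrated frequency.
struct SketchTickClock {
    using time_point = std::uint64_t;

    // Assumed calibration result in ticks per nanosecond (numerically GHz).
    static double& ticks_per_ns() {
        static double f = 3.0;  // arbitrary illustrative value
        return f;
    }

    // Stand-in tick source; a real backend would read the TSC here.
    // Not thread-safe, which is acceptable only because this is a sketch.
    static time_point NOW() {
        static std::uint64_t fake_ticks = 0;
        fake_ticks += 3000;
        return fake_ticks;
    }

    static std::int64_t duration_ns(time_point s, time_point e) {
        assert(ticks_per_ns() > 0.0);  // an uncalibrated clock must not be used
        return static_cast<std::int64_t>(static_cast<double>(e - s) / ticks_per_ns());
    }

    static std::string to_string(time_point tp) {
        return std::to_string(tp) + " ticks";
    }
};

// Compile-time selection; chrono is the default when the macro is absent.
#if defined(SKETCH_USE_TICK_CLOCK)
using ActiveClock = SketchTickClock;
#else
using ActiveClock = SketchChronoClock;
#endif
```

Because every backend exposes the same static functions and a `time_point` alias, call sites can be written once against `ActiveClock` and need no per-backend branches.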