Add ISOSDacInterface17 for StressLog enumeration via cDAC#129272
Add ISOSDacInterface17 for StressLog enumeration via cDAC#129272max-charlamb wants to merge 12 commits into
Conversation
1c1c89a to
714265b
Compare
Add a new ISOSDacInterface17 COM interface that exposes the cDAC's IStressLog contract to SOS and clrmd consumers. This enables reading stress logs from both CoreCLR and NativeAOT without hardcoded struct offsets. Changes: - Add StartTime (wall-clock FILETIME) field to cDAC data descriptors, Data/StressLog.cs, StressLogData record, and contract implementation - Add StressLog to ContractRegistry - Add Address field to ThreadStressLogData for thread identification - Define ISOSDacInterface17 with IsStressLogAvailable, GetStressLogData, GetStressLogThreadEnumerator, GetStressLogMessageEnumerator, and GetStressLogMemoryRanges - Define ISOSStressLogThreadEnum, ISOSStressLogMsgEnum, and ISOSStressLogMemoryEnum enumerator interfaces - Implement SOSDacImpl bridge delegating to _target.Contracts.StressLog - Add IDL definitions in sospriv.idl (native DAC does not implement this interface; QueryInterface for the IID will fail on the legacy DAC) - Add StressLog debuggee and StressLogDumpTests for dump-based validation - Add design document explaining the architecture and migration path Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
714265b to
b31ec46
Compare
There was a problem hiding this comment.
Pull request overview
This PR extends the cDAC stress log support by (1) enriching the StressLog contract data model (notably adding StartTime and thread-log address identity) and (2) introducing a new ISOSDacInterface17 COM surface (IDL + managed declarations + SOSDacImpl bridge) intended to let SOS/clrmd enumerate stress logs without hardcoded layout offsets. It also adds a new dump-test debuggee plus contract-level dump tests and a detailed design document.
Changes:
- Add
StartTimeto StressLog data descriptors (CoreCLR + NativeAOT) and flow it through the managed contract types. - Add
ISOSDacInterface17+ related StressLog enumerator interfaces/structs and implement the bridge inSOSDacImpl. - Add StressLog dump debuggee and dump-based integration tests for the StressLog contract, plus a design doc.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/native/managed/cdac/tests/DumpTests/StressLogDumpTests.cs | Adds dump-based integration tests for the StressLog contract (availability, header data, basic enumeration). |
| src/native/managed/cdac/tests/DumpTests/Debuggees/StressLog/StressLog.csproj | New debuggee project enabling stress logging via environment variables. |
| src/native/managed/cdac/tests/DumpTests/Debuggees/StressLog/Program.cs | New debuggee that generates allocations/GC activity then FailFasts to produce a dump. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.cs | Implements ISOSDacInterface17 and new StressLog enumerator implementations bridging to the cDAC IStressLog contract. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/ISOSDacInterface.cs | Adds public COM interface/struct declarations for ISOSDacInterface17 and StressLog enumerators. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/StressLog.cs | Adds [Field] StartTime to the StressLog data model. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StressLog.cs | Plumbs StartTime into StressLogData and includes thread-log address in ThreadStressLogData. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IStressLog.cs | Extends StressLogData with StartTime and ThreadStressLogData with Address. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/ContractRegistry.cs | Exposes StressLog via Target.Contracts.StressLog. |
| src/coreclr/vm/datadescriptor/datadescriptor.inc | Exports StressLog.StartTime for CoreCLR via data descriptors. |
| src/coreclr/nativeaot/Runtime/datadescriptor/datadescriptor.inc | Exports StressLog.StartTime for NativeAOT via data descriptors. |
| src/coreclr/inc/sospriv.idl | Adds IDL definitions for ISOSDacInterface17 and associated StressLog structs/enumerator interfaces. |
| docs/design/datacontracts/stresslog-isosinterface-design.md | Design rationale and planned consumer migration strategy for exposing StressLog via SOS DAC interface. |
|
Is there a specific reason that regular CoreCLR and NativeAOT stress logs must be different? |
- Add [GeneratedComClass] to all three enumerator classes - Use ToTargetPointer for ClrDataAddress comparison - Shrink _lastBatchRaw to actual fetched size - Return ex.HResult instead of E_FAIL in IsStressLogAvailable Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
We should be able to delete https://github.com/max-charlamb/runtime/blob/bd3de83c6195bdbda612e67672fb1248a1014d28/src/coreclr/inc/stresslog.h#L225 once this is in place. (It does not have to be done in this PR.) |
- Remove DacpStressLogMemoryRange, ISOSStressLogMemoryEnum, and GetStressLogMemoryRanges from IDL, managed interface, and impl since no SOS consumer uses them. - Refactor SOSStressLogMsgEnum to eagerly materialize messages into an array, enabling Reset and GetCount support consistent with other ISOSEnum implementations. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rename DacpStressLogData -> SOSStressLogData, DacpThreadStressLogData -> SOSThreadStressLogData, DacpStressMsgData -> SOSStressMsgData - Define structs inline in sospriv.idl with typedef struct and cpp_quote guards, following the SOSMethodData/SOSMemoryRegion convention used by newer interfaces - Remove Dacp forward declarations and dacprivate.h definitions - Put ISOSDacInterface16, ISOSDacInterface17 on same line in class decl Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use 'written < count' pattern for S_FALSE (standard COM IEnumXxx convention matching SOSMethodEnum/SOSMemoryEnum) - Return E_POINTER directly for null output in GetStressLogData Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
StressMsgHeaderSize, ChunkSize, MaxMessageSize, and PointerSize are implementation details for raw memory walking that the message enumerator now abstracts away. Consumers no longer need them. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Callers can just call GetStressLogData/GetStressLogThreadEnumerator directly -- they return S_FALSE when stress log is not enabled. No need for a separate availability check method. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IStressLog.cs:28
StressLogDataandThreadStressLogDataarepublic record structs. Adding new positional parameters (StartTime,Address) changes the synthesized primary constructor andDeconstruct, which is source/binary breaking for any consumers. If this assembly has any external consumers, consider keeping the existing primary constructor stable and adding the new data as additional properties (or introducing new types) instead.
public record struct StressLogData(
uint LoggedFacilities,
uint Level,
uint MaxSizePerThread,
uint MaxSizeTotal,
int TotalChunks,
ulong TickFrequency,
ulong StartTimestamp,
ulong StartTime,
TargetPointer Logs);
public record struct ThreadStressLogData(
TargetPointer Address,
TargetPointer NextPointer,
ulong ThreadId,
bool WriteHasWrapped,
TargetPointer CurrentPointer,
TargetPointer ChunkListHead,
TargetPointer ChunkListTail,
TargetPointer CurrentWriteChunk);
- Initialize *data = default early in GetStressLogData for clean output on all failure paths - Wrap enumerator Next methods in try/catch to avoid leaking exceptions across the COM boundary - Add ISOSDacInterface17 dump tests exercising GetStressLogData, GetStressLogThreadEnumerator, GetStressLogMessageEnumerator, and GetArguments through the SOSDacImpl COM bridge Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
d76e2bd to
248326c
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IStressLog.cs:28
- StressLogData and ThreadStressLogData are public positional record structs. Adding new parameters (StartTime, Address) is a source+binary breaking change for any downstream consumers (constructor signature + synthesized members change). If these are consumed outside the repo, consider versioning the contract shape (e.g., new record types or additive properties) rather than changing existing positional parameters, or ensure the package versioning/compat story accounts for this break.
public record struct StressLogData(
uint LoggedFacilities,
uint Level,
uint MaxSizePerThread,
uint MaxSizeTotal,
int TotalChunks,
ulong TickFrequency,
ulong StartTimestamp,
ulong StartTime,
TargetPointer Logs);
public record struct ThreadStressLogData(
TargetPointer Address,
TargetPointer NextPointer,
ulong ThreadId,
bool WriteHasWrapped,
TargetPointer CurrentPointer,
TargetPointer ChunkListHead,
TargetPointer ChunkListTail,
TargetPointer CurrentWriteChunk);
| int ISOSEnum.Reset() | ||
| { | ||
| _index = 0; | ||
| return HResults.S_OK; | ||
| } |
| uint threadCount; | ||
| ppThreadEnum.Interface!.GetCount(&threadCount); | ||
|
|
||
| SOSThreadStressLogData[] threads = new SOSThreadStressLogData[threadCount]; | ||
| uint fetched; | ||
| ppThreadEnum.Interface!.Next(threadCount, threads, &fetched); |
CoreCLR datadescriptor.inc declared TotalChunks as T_UINT32 but the actual C++ field is Volatile<LONG> (signed int32). NativeAOT already had the correct T_INT32. Fix CoreCLR to match. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| Contracts.IStressLog stressLogContract = _target.Contracts.StressLog; | ||
| if (!stressLogContract.HasStressLog()) | ||
| return HResults.S_FALSE; |
| Contracts.IStressLog stressLogContract = _target.Contracts.StressLog; | ||
| if (!stressLogContract.HasStressLog()) | ||
| return HResults.S_FALSE; |
| uint threadCount; | ||
| threadEnum.GetCount(&threadCount); | ||
|
|
||
| SOSThreadStressLogData[] threads = new SOSThreadStressLogData[threadCount]; | ||
| uint fetched; | ||
| threadEnum.Next(threadCount, threads, &fetched); | ||
|
|
- SOSStressLogMsgEnum.Reset() now clears _lastBatchStart and _lastBatchCount to prevent stale GetArguments after Reset - Assert HRESULT return values from GetCount/Next in the message enumerator test for actionable failure messages Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| unsigned int Level; | ||
| unsigned int MaxSizePerThread; | ||
| unsigned int MaxSizeTotal; | ||
| int TotalChunks; |
There was a problem hiding this comment.
Does SOS need fields like 'TotalChunks' or 'Logs'? Those seem like internal implementation details of the data structure and not info we'd need to include in the interface.
There was a problem hiding this comment.
Removed everything that isn't used by the SOS implementation: dotnet/diagnostics#5873
Logs is not used, but TotalChunks is used to print data in the !dumplog header.
| CLRDATA_ADDRESS CurrentPointer; | ||
| CLRDATA_ADDRESS ChunkListHead; | ||
| CLRDATA_ADDRESS ChunkListTail; | ||
| CLRDATA_ADDRESS CurrentWriteChunk; |
There was a problem hiding this comment.
Other than ThreadId, does SOS need anything else in this structure? It looks like implementation details of the data structure that SOS wouldn't need to know. If we need something to act as an opaque identifier when refering to a specific ThreadStressLog the address field also seems fine.
| CLRDATA_ADDRESS FormatString; | ||
| UINT64 Timestamp; | ||
| unsigned int ArgumentCount; | ||
| CLRDATA_ADDRESS MessageAddress; |
There was a problem hiding this comment.
Do we need this MessageAddress field in the interface for something?
Remove fields not consumed by the SOS DumpLog PR: - SOSStressLogData: Logs (callers use enumerator, not raw pointer) - SOSThreadStressLogData: WriteHasWrapped, CurrentPointer, ChunkListHead, ChunkListTail, CurrentWriteChunk (raw chunk-walking details abstracted by the message enumerator) - SOSStressMsgData: MessageAddress (was always 0) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IStressLog.cs:28
- These are
publicpositionalrecord structs. Adding new positional parameters (StartTime/Address) is a breaking change for any consumer that constructs, deconstructs, or pattern-matches these records. If this API is intended to be compatible across versions, consider keeping the existing primary constructor shape and adding new members as additional properties (e.g.,public ulong StartTime { get; init; }) instead.
public record struct StressLogData(
uint LoggedFacilities,
uint Level,
uint MaxSizePerThread,
uint MaxSizeTotal,
int TotalChunks,
ulong TickFrequency,
ulong StartTimestamp,
ulong StartTime,
TargetPointer Logs);
public record struct ThreadStressLogData(
TargetPointer Address,
TargetPointer NextPointer,
ulong ThreadId,
bool WriteHasWrapped,
TargetPointer CurrentPointer,
TargetPointer ChunkListHead,
TargetPointer ChunkListTail,
TargetPointer CurrentWriteChunk);
| if (pFetched is null || values is null) | ||
| throw new NullReferenceException(); | ||
|
|
||
| count = Math.Min(count, (uint)values.Length); |
| int ISOSEnum.Skip(uint count) | ||
| { | ||
| _index = Math.Min(_index + count, (uint)_threads.Length); | ||
| return HResults.S_OK; | ||
| } |
| if (pFetched is null || values is null) | ||
| throw new NullReferenceException(); | ||
|
|
||
| count = Math.Min(count, (uint)values.Length); |
| int ISOSStressLogMsgEnum.GetArguments(uint messageIndex, uint argCount, ClrDataAddress[] args, uint* pFetched) | ||
| { | ||
| if (pFetched is null || args is null) | ||
| return HResults.E_POINTER; | ||
|
|
||
| if (messageIndex >= _lastBatchCount) | ||
| return HResults.E_INVALIDARG; |
| int ISOSEnum.Skip(uint count) | ||
| { | ||
| _index = Math.Min(_index + count, (uint)_messages.Length); | ||
| return HResults.S_OK; | ||
| } |
| // StressLog data structures for ISOSDacInterface17 | ||
|
|
||
| public struct SOSStressLogData | ||
| { | ||
| public uint LoggedFacilities; | ||
| public uint Level; | ||
| public uint MaxSizePerThread; | ||
| public uint MaxSizeTotal; | ||
| public int TotalChunks; | ||
| public ulong TickFrequency; | ||
| public ulong StartTimestamp; | ||
| public ulong StartTime; | ||
| } | ||
|
|
||
| public struct SOSThreadStressLogData | ||
| { | ||
| public ClrDataAddress ThreadLogAddress; | ||
| public ulong ThreadId; | ||
| } | ||
|
|
||
| public struct SOSStressMsgData | ||
| { | ||
| public uint Facility; | ||
| public ClrDataAddress FormatString; | ||
| public ulong Timestamp; | ||
| public uint ArgumentCount; | ||
| } | ||
|
|
||
| [GeneratedComInterface] | ||
| [Guid("94a2bd3d-ab3d-43bf-81d8-3ae96b8e33cd")] | ||
| public unsafe partial interface ISOSStressLogThreadEnum : ISOSEnum | ||
| { | ||
| [PreserveSig] | ||
| int Next(uint count, | ||
| [In, Out, MarshalUsing(CountElementName = nameof(count))] | ||
| SOSThreadStressLogData[] values, | ||
| uint* pFetched); | ||
| } | ||
|
|
||
| [GeneratedComInterface] | ||
| [Guid("437cb033-afe7-4c0f-a4a7-82c891bc049e")] | ||
| public unsafe partial interface ISOSStressLogMsgEnum : ISOSEnum | ||
| { | ||
| [PreserveSig] | ||
| int Next(uint count, | ||
| [In, Out, MarshalUsing(CountElementName = nameof(count))] | ||
| SOSStressMsgData[] values, | ||
| uint* pFetched); | ||
|
|
||
| [PreserveSig] | ||
| int GetArguments(uint messageIndex, | ||
| uint argCount, | ||
| [In, Out, MarshalUsing(CountElementName = nameof(argCount))] | ||
| ClrDataAddress[] args, | ||
| uint* pFetched); | ||
| } | ||
|
|
||
| [GeneratedComInterface] | ||
| [Guid("2f4bb585-ed50-479e-bbe0-10a95a5da3bb")] | ||
| public unsafe partial interface ISOSDacInterface17 | ||
| { | ||
| [PreserveSig] | ||
| int GetStressLogData(SOSStressLogData* data); | ||
|
|
||
| [PreserveSig] | ||
| int GetStressLogThreadEnumerator( | ||
| DacComNullableByRef<ISOSStressLogThreadEnum> ppEnum); | ||
|
|
||
| [PreserveSig] | ||
| int GetStressLogMessageEnumerator( | ||
| ClrDataAddress threadStressLogAddress, | ||
| DacComNullableByRef<ISOSStressLogMsgEnum> ppEnum); | ||
|
|
||
| } |
Note
This PR was created with assistance from GitHub Copilot.
Summary
Add a new
ISOSDacInterface17COM interface that exposes the cDAC'sIStressLogcontract to SOS and clrmd consumers. This enables reading stress logs from both CoreCLR and NativeAOT processes without hardcoded struct offsets.Motivation
SOS (
!DumpLog) and clrmd currently read stress logs via raw memory reads with hardcoded struct offsets matching the CoreCLR layout. NativeAOT'sStressLogstruct has a different layout (different lock type, no module table, no padding sentinel), so these tools cannot read NativeAOT stress logs.The cDAC already has a working
IStressLogcontract (StressLog_1/StressLog_2) that reads stress logs from both runtimes usingdatadescriptor.incfield offsets. This PR bridges that contract to the COM interface layer so SOS and clrmd can consume it.Changes
cDAC contract updates
StartTime(wall-clock FILETIME) to data descriptors (CoreCLR + NativeAOT),Data/StressLog.cs,StressLogDatarecord, and contract implementationAddressfield toThreadStressLogDatafor thread identification across the COM boundaryStressLogtoContractRegistryISOSDacInterface17 definition
SOSStressLogData,SOSThreadStressLogData,SOSStressMsgData(defined inline in IDL following modern convention)ISOSStressLogThreadEnum,ISOSStressLogMsgEnumISOSDacInterface17:GetStressLogData,GetStressLogThreadEnumerator,GetStressLogMessageEnumeratorS_FALSEwhen stress log is not enabled (no separate availability check needed)SOSDacImpl bridge
Reset/GetCountsupport andGetArgumentsfor per-batch arg retrievalQueryInterfacefor the IID will fail on the legacy DAC, and only the cDAC handles itIDL + testing
sospriv.idlTest Results
All dump tests pass:
StressLogIsAvailable-- PASSEDStressLogDataIsValid-- PASSEDCanEnumerateThreadsAndMessages-- PASSEDRelated PRs
!DumpLogupdated to useISOSDacInterface17with fallback