Skip to content

Add ISOSDacInterface17 for StressLog enumeration via cDAC#129272

Open
max-charlamb wants to merge 12 commits into
dotnet:mainfrom
max-charlamb:maxcharlamb/stresslog-cdac-bridge
Open

Add ISOSDacInterface17 for StressLog enumeration via cDAC#129272
max-charlamb wants to merge 12 commits into
dotnet:mainfrom
max-charlamb:maxcharlamb/stresslog-cdac-bridge

Conversation

@max-charlamb

@max-charlamb max-charlamb commented Jun 11, 2026

Copy link
Copy Markdown
Member

Note

This PR was created with assistance from GitHub Copilot.

Summary

Add a new ISOSDacInterface17 COM interface that exposes the cDAC's IStressLog contract to SOS and clrmd consumers. This enables reading stress logs from both CoreCLR and NativeAOT processes without hardcoded struct offsets.

Motivation

SOS (!DumpLog) and clrmd currently read stress logs via raw memory reads with hardcoded struct offsets matching the CoreCLR layout. NativeAOT's StressLog struct has a different layout (different lock type, no module table, no padding sentinel), so these tools cannot read NativeAOT stress logs.

The cDAC already has a working IStressLog contract (StressLog_1/StressLog_2) that reads stress logs from both runtimes using datadescriptor.inc field offsets. This PR bridges that contract to the COM interface layer so SOS and clrmd can consume it.

Changes

cDAC contract updates

  • Add StartTime (wall-clock FILETIME) to data descriptors (CoreCLR + NativeAOT), Data/StressLog.cs, StressLogData record, and contract implementation
  • Add Address field to ThreadStressLogData for thread identification across the COM boundary
  • Add StressLog to ContractRegistry

ISOSDacInterface17 definition

  • Data structs: SOSStressLogData, SOSThreadStressLogData, SOSStressMsgData (defined inline in IDL following modern convention)
  • Enumerator interfaces: ISOSStressLogThreadEnum, ISOSStressLogMsgEnum
  • ISOSDacInterface17: GetStressLogData, GetStressLogThreadEnumerator, GetStressLogMessageEnumerator
  • All APIs return S_FALSE when stress log is not enabled (no separate availability check needed)

SOSDacImpl bridge

  • Thread enumerator: materialized array with try/catch COM boundary protection
  • Message enumerator: eagerly materialized with full Reset/GetCount support and GetArguments for per-batch arg retrieval
  • The native DAC does not implement this interface -- QueryInterface for the IID will fail on the legacy DAC, and only the cDAC handles it

IDL + testing

  • Full IDL definitions in sospriv.idl
  • StressLog debuggee + dump-based integration tests validating the contract and the COM interface layer

Test Results

All dump tests pass:

  • StressLogIsAvailable -- PASSED
  • StressLogDataIsValid -- PASSED
  • CanEnumerateThreadsAndMessages -- PASSED

Related PRs

Add a new ISOSDacInterface17 COM interface that exposes the cDAC's
IStressLog contract to SOS and clrmd consumers. This enables reading
stress logs from both CoreCLR and NativeAOT without hardcoded struct
offsets.

Changes:
- Add StartTime (wall-clock FILETIME) field to cDAC data descriptors,
  Data/StressLog.cs, StressLogData record, and contract implementation
- Add StressLog to ContractRegistry
- Add Address field to ThreadStressLogData for thread identification
- Define ISOSDacInterface17 with IsStressLogAvailable, GetStressLogData,
  GetStressLogThreadEnumerator, GetStressLogMessageEnumerator, and
  GetStressLogMemoryRanges
- Define ISOSStressLogThreadEnum, ISOSStressLogMsgEnum, and
  ISOSStressLogMemoryEnum enumerator interfaces
- Implement SOSDacImpl bridge delegating to _target.Contracts.StressLog
- Add IDL definitions in sospriv.idl (native DAC does not implement
  this interface; QueryInterface for the IID will fail on the legacy DAC)
- Add StressLog debuggee and StressLogDumpTests for dump-based validation
- Add design document explaining the architecture and migration path

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb force-pushed the maxcharlamb/stresslog-cdac-bridge branch from 714265b to b31ec46 Compare June 11, 2026 02:34

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the cDAC stress log support by (1) enriching the StressLog contract data model (notably adding StartTime and thread-log address identity) and (2) introducing a new ISOSDacInterface17 COM surface (IDL + managed declarations + SOSDacImpl bridge) intended to let SOS/clrmd enumerate stress logs without hardcoded layout offsets. It also adds a new dump-test debuggee plus contract-level dump tests and a detailed design document.

Changes:

  • Add StartTime to StressLog data descriptors (CoreCLR + NativeAOT) and flow it through the managed contract types.
  • Add ISOSDacInterface17 + related StressLog enumerator interfaces/structs and implement the bridge in SOSDacImpl.
  • Add StressLog dump debuggee and dump-based integration tests for the StressLog contract, plus a design doc.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
src/native/managed/cdac/tests/DumpTests/StressLogDumpTests.cs Adds dump-based integration tests for the StressLog contract (availability, header data, basic enumeration).
src/native/managed/cdac/tests/DumpTests/Debuggees/StressLog/StressLog.csproj New debuggee project enabling stress logging via environment variables.
src/native/managed/cdac/tests/DumpTests/Debuggees/StressLog/Program.cs New debuggee that generates allocations/GC activity then FailFasts to produce a dump.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.cs Implements ISOSDacInterface17 and new StressLog enumerator implementations bridging to the cDAC IStressLog contract.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/ISOSDacInterface.cs Adds public COM interface/struct declarations for ISOSDacInterface17 and StressLog enumerators.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/StressLog.cs Adds [Field] StartTime to the StressLog data model.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StressLog.cs Plumbs StartTime into StressLogData and includes thread-log address in ThreadStressLogData.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IStressLog.cs Extends StressLogData with StartTime and ThreadStressLogData with Address.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/ContractRegistry.cs Exposes StressLog via Target.Contracts.StressLog.
src/coreclr/vm/datadescriptor/datadescriptor.inc Exports StressLog.StartTime for CoreCLR via data descriptors.
src/coreclr/nativeaot/Runtime/datadescriptor/datadescriptor.inc Exports StressLog.StartTime for NativeAOT via data descriptors.
src/coreclr/inc/sospriv.idl Adds IDL definitions for ISOSDacInterface17 and associated StressLog structs/enumerator interfaces.
docs/design/datacontracts/stresslog-isosinterface-design.md Design rationale and planned consumer migration strategy for exposing StressLog via SOS DAC interface.

@jkotas

jkotas commented Jun 11, 2026

Copy link
Copy Markdown
Member

Is there a specific reason that regular CoreCLR and NativeAOT stress logs must be different?

- Add [GeneratedComClass] to all three enumerator classes
- Use ToTargetPointer for ClrDataAddress comparison
- Shrink _lastBatchRaw to actual fetched size
- Return ex.HResult instead of E_FAIL in IsStressLogAvailable

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 11, 2026 15:21

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

@jkotas

jkotas commented Jun 11, 2026

Copy link
Copy Markdown
Member

We should be able to delete https://github.com/max-charlamb/runtime/blob/bd3de83c6195bdbda612e67672fb1248a1014d28/src/coreclr/inc/stresslog.h#L225 once this is in place. (It does not have to be done in this PR.)

Max Charlamb and others added 3 commits June 16, 2026 11:50
- Remove DacpStressLogMemoryRange, ISOSStressLogMemoryEnum, and
  GetStressLogMemoryRanges from IDL, managed interface, and impl
  since no SOS consumer uses them.
- Refactor SOSStressLogMsgEnum to eagerly materialize messages into
  an array, enabling Reset and GetCount support consistent with
  other ISOSEnum implementations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rename DacpStressLogData -> SOSStressLogData,
  DacpThreadStressLogData -> SOSThreadStressLogData,
  DacpStressMsgData -> SOSStressMsgData
- Define structs inline in sospriv.idl with typedef struct and
  cpp_quote guards, following the SOSMethodData/SOSMemoryRegion
  convention used by newer interfaces
- Remove Dacp forward declarations and dacprivate.h definitions
- Put ISOSDacInterface16, ISOSDacInterface17 on same line in class decl

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use 'written < count' pattern for S_FALSE (standard COM IEnumXxx
  convention matching SOSMethodEnum/SOSMemoryEnum)
- Return E_POINTER directly for null output in GetStressLogData

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 17, 2026 13:50

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Max Charlamb and others added 2 commits June 17, 2026 10:13
StressMsgHeaderSize, ChunkSize, MaxMessageSize, and PointerSize are
implementation details for raw memory walking that the message
enumerator now abstracts away. Consumers no longer need them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Callers can just call GetStressLogData/GetStressLogThreadEnumerator
directly -- they return S_FALSE when stress log is not enabled.
No need for a separate availability check method.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 17, 2026 14:18

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IStressLog.cs:28

  • StressLogData and ThreadStressLogData are public record structs. Adding new positional parameters (StartTime, Address) changes the synthesized primary constructor and Deconstruct, which is source/binary breaking for any consumers. If this assembly has any external consumers, consider keeping the existing primary constructor stable and adding the new data as additional properties (or introducing new types) instead.
public record struct StressLogData(
    uint LoggedFacilities,
    uint Level,
    uint MaxSizePerThread,
    uint MaxSizeTotal,
    int TotalChunks,
    ulong TickFrequency,
    ulong StartTimestamp,
    ulong StartTime,
    TargetPointer Logs);

public record struct ThreadStressLogData(
    TargetPointer Address,
    TargetPointer NextPointer,
    ulong ThreadId,
    bool WriteHasWrapped,
    TargetPointer CurrentPointer,
    TargetPointer ChunkListHead,
    TargetPointer ChunkListTail,
    TargetPointer CurrentWriteChunk);

Comment thread src/coreclr/inc/sospriv.idl
- Initialize *data = default early in GetStressLogData for clean
  output on all failure paths
- Wrap enumerator Next methods in try/catch to avoid leaking
  exceptions across the COM boundary
- Add ISOSDacInterface17 dump tests exercising GetStressLogData,
  GetStressLogThreadEnumerator, GetStressLogMessageEnumerator,
  and GetArguments through the SOSDacImpl COM bridge

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 17, 2026 16:49
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb force-pushed the maxcharlamb/stresslog-cdac-bridge branch from d76e2bd to 248326c Compare June 17, 2026 16:51

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IStressLog.cs:28

  • StressLogData and ThreadStressLogData are public positional record structs. Adding new parameters (StartTime, Address) is a source+binary breaking change for any downstream consumers (constructor signature + synthesized members change). If these are consumed outside the repo, consider versioning the contract shape (e.g., new record types or additive properties) rather than changing existing positional parameters, or ensure the package versioning/compat story accounts for this break.
public record struct StressLogData(
    uint LoggedFacilities,
    uint Level,
    uint MaxSizePerThread,
    uint MaxSizeTotal,
    int TotalChunks,
    ulong TickFrequency,
    ulong StartTimestamp,
    ulong StartTime,
    TargetPointer Logs);

public record struct ThreadStressLogData(
    TargetPointer Address,
    TargetPointer NextPointer,
    ulong ThreadId,
    bool WriteHasWrapped,
    TargetPointer CurrentPointer,
    TargetPointer ChunkListHead,
    TargetPointer ChunkListTail,
    TargetPointer CurrentWriteChunk);

Comment on lines +7252 to +7256
int ISOSEnum.Reset()
{
_index = 0;
return HResults.S_OK;
}
Comment on lines +123 to +128
uint threadCount;
ppThreadEnum.Interface!.GetCount(&threadCount);

SOSThreadStressLogData[] threads = new SOSThreadStressLogData[threadCount];
uint fetched;
ppThreadEnum.Interface!.Next(threadCount, threads, &fetched);
CoreCLR datadescriptor.inc declared TotalChunks as T_UINT32 but the
actual C++ field is Volatile<LONG> (signed int32). NativeAOT already
had the correct T_INT32. Fix CoreCLR to match.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 17, 2026 19:52

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

Comment on lines +7304 to +7306
Contracts.IStressLog stressLogContract = _target.Contracts.StressLog;
if (!stressLogContract.HasStressLog())
return HResults.S_FALSE;
Comment on lines +7339 to +7341
Contracts.IStressLog stressLogContract = _target.Contracts.StressLog;
if (!stressLogContract.HasStressLog())
return HResults.S_FALSE;
Comment on lines +127 to +133
uint threadCount;
threadEnum.GetCount(&threadCount);

SOSThreadStressLogData[] threads = new SOSThreadStressLogData[threadCount];
uint fetched;
threadEnum.Next(threadCount, threads, &fetched);

- SOSStressLogMsgEnum.Reset() now clears _lastBatchStart and
  _lastBatchCount to prevent stale GetArguments after Reset
- Assert HRESULT return values from GetCount/Next in the message
  enumerator test for actionable failure messages

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
unsigned int Level;
unsigned int MaxSizePerThread;
unsigned int MaxSizeTotal;
int TotalChunks;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does SOS need fields like 'TotalChunks' or 'Logs'? Those seem like internal implementation details of the data structure and not info we'd need to include in the interface.

@max-charlamb max-charlamb Jun 17, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed everything that isn't used by the SOS implementation: dotnet/diagnostics#5873

Logs is not used, but TotalChunks is used to print data in the !dumplog header.

Comment thread src/coreclr/inc/sospriv.idl Outdated
CLRDATA_ADDRESS CurrentPointer;
CLRDATA_ADDRESS ChunkListHead;
CLRDATA_ADDRESS ChunkListTail;
CLRDATA_ADDRESS CurrentWriteChunk;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than ThreadId, does SOS need anything else in this structure? It looks like implementation details of the data structure that SOS wouldn't need to know. If we need something to act as an opaque identifier when refering to a specific ThreadStressLog the address field also seems fine.

Comment thread src/coreclr/inc/sospriv.idl Outdated
CLRDATA_ADDRESS FormatString;
UINT64 Timestamp;
unsigned int ArgumentCount;
CLRDATA_ADDRESS MessageAddress;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this MessageAddress field in the interface for something?

Remove fields not consumed by the SOS DumpLog PR:
- SOSStressLogData: Logs (callers use enumerator, not raw pointer)
- SOSThreadStressLogData: WriteHasWrapped, CurrentPointer,
  ChunkListHead, ChunkListTail, CurrentWriteChunk (raw chunk-walking
  details abstracted by the message enumerator)
- SOSStressMsgData: MessageAddress (was always 0)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 17, 2026 21:51

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IStressLog.cs:28

  • These are public positional record structs. Adding new positional parameters (StartTime / Address) is a breaking change for any consumer that constructs, deconstructs, or pattern-matches these records. If this API is intended to be compatible across versions, consider keeping the existing primary constructor shape and adding new members as additional properties (e.g., public ulong StartTime { get; init; }) instead.
public record struct StressLogData(
    uint LoggedFacilities,
    uint Level,
    uint MaxSizePerThread,
    uint MaxSizeTotal,
    int TotalChunks,
    ulong TickFrequency,
    ulong StartTimestamp,
    ulong StartTime,
    TargetPointer Logs);

public record struct ThreadStressLogData(
    TargetPointer Address,
    TargetPointer NextPointer,
    ulong ThreadId,
    bool WriteHasWrapped,
    TargetPointer CurrentPointer,
    TargetPointer ChunkListHead,
    TargetPointer ChunkListTail,
    TargetPointer CurrentWriteChunk);

Comment on lines +7135 to +7138
if (pFetched is null || values is null)
throw new NullReferenceException();

count = Math.Min(count, (uint)values.Length);
Comment on lines +7154 to +7158
int ISOSEnum.Skip(uint count)
{
_index = Math.Min(_index + count, (uint)_threads.Length);
return HResults.S_OK;
}
Comment on lines +7194 to +7197
if (pFetched is null || values is null)
throw new NullReferenceException();

count = Math.Min(count, (uint)values.Length);
Comment on lines +7226 to +7232
int ISOSStressLogMsgEnum.GetArguments(uint messageIndex, uint argCount, ClrDataAddress[] args, uint* pFetched)
{
if (pFetched is null || args is null)
return HResults.E_POINTER;

if (messageIndex >= _lastBatchCount)
return HResults.E_INVALIDARG;
Comment on lines +7245 to +7249
int ISOSEnum.Skip(uint count)
{
_index = Math.Min(_index + count, (uint)_messages.Length);
return HResults.S_OK;
}
Comment on lines +1195 to +1268
// StressLog data structures for ISOSDacInterface17

public struct SOSStressLogData
{
public uint LoggedFacilities;
public uint Level;
public uint MaxSizePerThread;
public uint MaxSizeTotal;
public int TotalChunks;
public ulong TickFrequency;
public ulong StartTimestamp;
public ulong StartTime;
}

public struct SOSThreadStressLogData
{
public ClrDataAddress ThreadLogAddress;
public ulong ThreadId;
}

public struct SOSStressMsgData
{
public uint Facility;
public ClrDataAddress FormatString;
public ulong Timestamp;
public uint ArgumentCount;
}

[GeneratedComInterface]
[Guid("94a2bd3d-ab3d-43bf-81d8-3ae96b8e33cd")]
public unsafe partial interface ISOSStressLogThreadEnum : ISOSEnum
{
[PreserveSig]
int Next(uint count,
[In, Out, MarshalUsing(CountElementName = nameof(count))]
SOSThreadStressLogData[] values,
uint* pFetched);
}

[GeneratedComInterface]
[Guid("437cb033-afe7-4c0f-a4a7-82c891bc049e")]
public unsafe partial interface ISOSStressLogMsgEnum : ISOSEnum
{
[PreserveSig]
int Next(uint count,
[In, Out, MarshalUsing(CountElementName = nameof(count))]
SOSStressMsgData[] values,
uint* pFetched);

[PreserveSig]
int GetArguments(uint messageIndex,
uint argCount,
[In, Out, MarshalUsing(CountElementName = nameof(argCount))]
ClrDataAddress[] args,
uint* pFetched);
}

[GeneratedComInterface]
[Guid("2f4bb585-ed50-479e-bbe0-10a95a5da3bb")]
public unsafe partial interface ISOSDacInterface17
{
[PreserveSig]
int GetStressLogData(SOSStressLogData* data);

[PreserveSig]
int GetStressLogThreadEnumerator(
DacComNullableByRef<ISOSStressLogThreadEnum> ppEnum);

[PreserveSig]
int GetStressLogMessageEnumerator(
ClrDataAddress threadStressLogAddress,
DacComNullableByRef<ISOSStressLogMsgEnum> ppEnum);

}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants