feat(dist): add structured logging to dist backend (Phase A.1)#108
Merged
feat(dist): add structured logging to dist backend (Phase A.1)#108
Conversation
Introduce `WithDistLogger(*slog.Logger)` option for `DistMemory`. When supplied, the dist backend emits structured `slog` records for all operational events across its background loops and error surfaces: - HTTP listener bind success and failure - Serve-goroutine exits (previously silent in `serveErr` only) - Peer state transitions: suspect-on-timeout, suspect-on-probe-failure, and dead-peer pruning - Merkle sync fetch failures - Rebalance migration forward failures - Hint dropped after replay error All records are pre-bound with `component=dist_memory` and `node_id=<id>` at construction so operators can grep/filter without each call site re-attaching the node identity. Default behaviour is unchanged: the zero-value path installs a `slog.DiscardHandler` so library code never writes to stderr unless the caller opts in. Add `tests/dist_logging_test.go` with two contract tests: - `TestDistMemory_LoggerEmitsListenerStart` — asserts the listener-start record is emitted with the expected level and attributes. - `TestDistMemory_LoggerDefaultIsSilent` — asserts no-option construction still succeeds without any side effects. Update `.mdl_style.rb` to allow `MD024` (`allow_different_nesting`) so the Keep-a-Changelog heading structure passes linting. Update `CHANGELOG.md`: add `[Unreleased]` entry for this work; correct version numbers from `2.0.x` to `0.5.0`/`0.4.3`; fix comparison links to match actual tags.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR introduces opt-in structured logging for the distributed backend (DistMemory) via a new WithDistLogger(*slog.Logger) option, keeping default behavior silent while surfacing operational events (listener lifecycle, peer liveness transitions, merkle sync errors, rebalance forwarding failures, hint replay drops) when a logger is provided.
Changes:
- Add
WithDistLoggeroption and wire a pre-bound*slog.LoggerthroughDistMemoryand its internal HTTP server to emit structured operational logs. - Add contract tests validating that listener-start logging is emitted when configured and that default construction remains silent.
- Update changelog and markdownlint style to reflect the new feature and accommodate Keep-a-Changelog heading structure.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
pkg/backend/dist_memory.go |
Adds logger field/option, binds default logger, and emits structured log events across dist operations. |
pkg/backend/dist_http_server.go |
Adds inherited logger to surface serve-goroutine exit errors via structured logs. |
pkg/backend/dist_memory_test_helpers.go |
Propagates the dist logger into the HTTP server in test helpers. |
tests/dist_logging_test.go |
Adds tests + a custom slog.Handler to assert emitted structured records. |
CHANGELOG.md |
Documents the feature and adjusts release versioning/links (but currently breaks reference links). |
.mdl_style.rb |
Configures markdownlint to allow repeated nested headings (MD024) for Keep-a-Changelog format. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+710
to
+712
| dm.logger = dm.logger.With( | ||
| slog.String("component", "dist_memory"), | ||
| slog.String("node_id", dm.nodeID), |
Comment on lines
+201
to
+202
| Unreleased: <https://github.com/hyp3rd/hypercache/compare/v0.5.0...HEAD> | ||
| Released: [0.5.0](https://github.com/hyp3rd/hypercache/releases/tag/v0.5.0) |
Introduces per-operation spans on the DistMemory backend as the second
phase of the production-readiness work.
Public API:
- WithDistTracerProvider(trace.TracerProvider) option enables tracing;
nil-safe, no-op by default so existing callers are unaffected.
- ConsistencyLevel.String() renders consistency levels as ONE /
QUORUM / ALL for use in log and span attributes.
Spans emitted:
- dist.get — attributes: cache.key.length, dist.consistency, cache.hit
- dist.set — attributes: cache.key.length, dist.consistency,
dist.owners.count, dist.acks; errors recorded on span
- dist.remove — attribute: dist.keys.count; errors recorded on span
- dist.replicate.set / dist.replicate.remove — child spans per peer
during fan-out carrying peer.id
Design notes:
- Cache key values are intentionally omitted from all spans (keys can
be PII such as user IDs or session tokens); only cache.key.length
is recorded.
- Business logic extracted into getImpl / setImpl / removeImpl so the
public wrappers remain thin tracing + latency shells.
- replicateSetWithSpan / replicateRemoveWithSpan helpers ensure the
trace-tree shape is identical regardless of which path drives fan-out.
- Library default installs noop.NewTracerProvider so the hot path
incurs zero allocation overhead when tracing is not opted into.
Deps: adds go.opentelemetry.io/otel/sdk v1.43.0 and
go.uber.org/goleak v1.3.0 for goroutine-leak detection in tests.
Tests: new tests/dist_tracing_test.go covers Set/Get span emission,
hit-vs-miss cache.hit attribute, replication child-span shape,
and the default no-op behaviour.
Introduce `WithDistMeterProvider(metric.MeterProvider)` option for the DistMemory backend. When set, NewDistMemory registers 40+ observable instruments covering every field on DistMetrics: - ObservableCounters for cumulative totals: write attempts/acks/quorum failures, forwarded ops, replica fan-out, hinted-handoff lifecycle (queued/replayed/expired/dropped), Merkle syncs/keys pulled, heartbeat probes, membership transitions, versioning, rebalance keys and batches. - ObservableGauges for current state: alive/suspect/dead member counts, active tombstones, hinted-handoff queue bytes, last-operation latencies (merkle build/diff/fetch, rebalance, auto-sync) in nanoseconds. A single registered callback drives all instruments: on each collection cycle it takes one Metrics() snapshot and observes every instrument, keeping per-operation overhead to zero beyond the existing atomics. Instrument names use the `dist.` prefix so a Prometheus exporter renders them under a single subsystem. Stop() unregisters the callback so the OTel SDK does not invoke it against a half-stopped backend. Library default remains a no-op meter; metrics cost nothing unless the caller opts in. Refactor: extract `installTelemetryDefaults()` from NewDistMemory to wire no-op logger/tracer/meter fallbacks and register the OTel callback, keeping the constructor under the function-length lint cap. Also fix README version references from v2.0.0 → v0.5.0 in the sharded eviction, transport hardening, and security sections. Tests: add dist_metrics_test.go (Phase A.3 contract tests): - TestDistMemory_MetricsExportsObservableCounters: 3 Sets yield dist.write.attempts=3 via a ManualReader. - TestDistMemory_MetricsExportsObservableGauges: dist.members.alive=1 for a standalone node. - TestDistMemory_MetricsDefaultIsNoop: no meter provider → cache still functional, no panic.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Introduce
WithDistLogger(*slog.Logger)option forDistMemory. When supplied, the dist backend emits structuredslogrecords for all operational events across its background loops and error surfaces:serveErronly)All records are pre-bound with
component=dist_memoryandnode_id=<id>at construction so operators can grep/filter without each call site re-attaching the node identity.Default behaviour is unchanged: the zero-value path installs a
slog.DiscardHandlerso library code never writes to stderr unless the caller opts in.Add
tests/dist_logging_test.gowith two contract tests:TestDistMemory_LoggerEmitsListenerStart— asserts the listener-start record is emitted with the expected level and attributes.TestDistMemory_LoggerDefaultIsSilent— asserts no-option construction still succeeds without any side effects.Update
.mdl_style.rbto allowMD024(allow_different_nesting) so the Keep-a-Changelog heading structure passes linting.Update
CHANGELOG.md: add[Unreleased]entry for this work; correct version numbers from2.0.xto0.5.0/0.4.3; fix comparison links to match actual tags.