Add semantic index observability before performance tuning #81

@Suknna

Summary

Semantic index performance problems are difficult to diagnose today. The existing status/progress pipeline reports the current stage and coarse counts, but not the timings, byte estimates, or clone/cache costs needed to explain high CPU/RSS during startup or refresh.

Before proposing larger performance changes, I would like to add a small observability-focused PR that reuses the existing semantic status/logging infrastructure where possible.

Existing observability already present

From a read-only scan of main:

  • SemanticIndexEvent::Progress already carries semantic build progress fields:
    • stage
    • files
    • entries_done
    • entries_total
    • source: crates/aft/src/context.rs
  • configure already emits progress for semantic stages such as:
    • initializing_embedding_model
    • refreshing_stale_files
    • embedding_stale_symbols
    • scanned_project_files
    • extracting_symbols
    • embedding_symbols
    • persisting_index
    • source: crates/aft/src/commands/configure.rs
  • The status command already exposes semantic index state:
    • status/stage/files/entries_done/entries_total
    • entries/dimension/backend/model
    • project-scoped disk.semantic_disk_bytes
    • source: crates/aft/src/commands/status.rs
  • The Pi status UI already renders semantic index status, entries, dimension, backend/model, and disk size:
    • source: packages/pi-plugin/src/shared/status.ts
    • source: packages/pi-plugin/src/dialogs/status-dialog.ts
  • Session-scoped logging already exists through slog_info!, slog_warn!, slog_debug!:
    • source: crates/aft/src/log_ctx.rs

Gap

The existing hooks are useful for user-facing progress, but they do not expose enough data to understand CPU/RSS spikes:

  • no per-stage elapsed time;
  • no file scan / chunk collection duration;
  • no chunk count before embedding;
  • no approximate embed_text / snippet bytes;
  • no vector byte estimate (entries * dimension * sizeof(f32));
  • no cache read/write duration or semantic.bin byte size in the build log;
  • no explicit log around SemanticIndex::clone() used for the refresh worker or corpus refresh;
  • no query-time observability for full linear cosine scan + full sort;
  • no refresh batch stats for watcher-driven invalidations.

This makes it hard to decide which follow-up optimization should come first: cancellation, thread limits, streaming chunk collection, vector layout changes, cache format changes, or query top-k optimization.

Proposed first PR: observability only, no behavior change

Add a small instrumentation PR that reuses the existing mechanisms rather than introducing a new metrics subsystem.

Suggested shape:

  1. Keep the existing SemanticIndexEvent::Progress / status path for coarse progress.
  2. Add structured slog_info! / slog_debug! lines around semantic build stages with:
    • stage name;
    • elapsed ms;
    • file count;
    • chunk count where available;
    • entries count;
    • dimension;
    • approximate vector bytes;
    • approximate snippet/embed text bytes where cheap to compute;
    • semantic cache file size and read/write elapsed ms.
  3. Avoid changing search behavior, cache format, or plugin UI in the first PR.
  4. Add focused tests only for any new pure helper functions used to compute stats/byte estimates.
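As a rough illustration of item 4, the byte estimates could live in small pure helpers that are trivial to unit test. This is only a sketch: the names estimate_vector_bytes and total_text_bytes are hypothetical and do not exist in the codebase today, and the vector estimate assumes embeddings are stored as f32, as the formula in the Gap section implies.

```rust
use std::mem::size_of;

/// Approximate bytes held by embedding vectors:
/// entries * dimension * size_of::<f32>().
/// Saturates instead of overflowing so a bad input never panics in a log path.
/// (Hypothetical helper, not an existing function in the crate.)
pub fn estimate_vector_bytes(entries: usize, dimension: usize) -> usize {
    entries
        .saturating_mul(dimension)
        .saturating_mul(size_of::<f32>())
}

/// Sum of the UTF-8 byte lengths of a set of strings, e.g. embed texts or snippets.
/// This only reads lengths, so it is cheap compared with embedding.
/// (Hypothetical helper, not an existing function in the crate.)
pub fn total_text_bytes<'a, I>(texts: I) -> usize
where
    I: IntoIterator<Item = &'a str>,
{
    texts
        .into_iter()
        .fold(0usize, |acc, text| acc.saturating_add(text.len()))
}
```

Keeping these free of any index or logging types would let the tests stay independent of the semantic build itself.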

Example log lines could be similar to:

semantic index: stage=collect_chunks files=842 chunks=5310 embed_text_bytes=1843920 snippet_bytes=612004 elapsed_ms=731
semantic index: stage=embed chunks=5310 entries=5310 dimension=384 vector_bytes=8156160 elapsed_ms=8821
semantic index: stage=persist semantic_bin_bytes=10485760 elapsed_ms=208
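One possible way to produce lines in that shape is a tiny stage timer built on std::time::Instant. The sketch below is an assumption, not existing code: StageTimer and format_stage_line are hypothetical names, and the function returns a String so the caller can pass it to slog_info! or slog_debug! in whatever form those macros accept.

```rust
use std::time::{Duration, Instant};

/// Measures wall-clock time for one semantic build stage.
/// (Hypothetical type, shown only to illustrate the idea.)
pub struct StageTimer {
    stage: &'static str,
    started: Instant,
}

impl StageTimer {
    /// Start timing a named stage.
    pub fn start(stage: &'static str) -> Self {
        Self {
            stage,
            started: Instant::now(),
        }
    }

    /// Stop timing and render a key=value log line with the given counters.
    pub fn finish(self, fields: &[(&str, u64)]) -> String {
        format_stage_line(self.stage, fields, self.started.elapsed())
    }
}

/// Pure formatting step, kept separate from the clock so it can be tested
/// with a fixed duration.
pub fn format_stage_line(stage: &str, fields: &[(&str, u64)], elapsed: Duration) -> String {
    let mut line = format!("semantic index: stage={stage}");
    for (key, value) in fields {
        line.push_str(&format!(" {key}={value}"));
    }
    line.push_str(&format!(" elapsed_ms={}", elapsed.as_millis()));
    line
}
```

A call site would start the timer before a stage, run the stage unchanged, and then log the string returned by finish, so the instrumentation would not alter build behavior.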

Follow-up PRs after the data is visible

Once the instrumentation exists, small follow-up PRs can be proposed with evidence:

  1. cancel stale semantic builds on reconfigure instead of letting detached builds keep consuming CPU;
  2. limit semantic parsing / embedding concurrency;
  3. optimize query top-k selection and avoid full result sorting;
  4. batch watcher invalidations to avoid repeated full entries.retain(...) scans;
  5. reduce peak memory by streaming chunk collection and cache serialization;
  6. reduce full-index clones used by the refresh worker / corpus refresh path.
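To make item 3 concrete, here is a minimal sketch of top-k selection that avoids sorting every scored entry. It assumes query results are available as (entry index, cosine score) pairs, which is an illustration rather than the actual result type, and top_k_by_score is a hypothetical name. It uses the standard library's select_nth_unstable_by to partition in O(n) on average, then sorts only the k survivors.

```rust
/// Return the k highest-scoring pairs in descending score order.
/// (Hypothetical helper; the real query path may use a different result type.)
pub fn top_k_by_score(mut scored: Vec<(usize, f32)>, k: usize) -> Vec<(usize, f32)> {
    if k == 0 {
        return Vec::new();
    }

    // Descending order by score; total_cmp gives a total order even for NaN.
    let by_score_desc = |a: &(usize, f32), b: &(usize, f32)| b.1.total_cmp(&a.1);

    if k < scored.len() {
        // Partition so the first k slots hold the k best scores (in any order),
        // then drop the rest without ever sorting them.
        scored.select_nth_unstable_by(k - 1, by_score_desc);
        scored.truncate(k);
    }

    // Only the surviving k (or fewer) entries are fully sorted.
    scored.sort_unstable_by(by_score_desc);
    scored
}
```

Whether this is worth doing would depend on the query timings the instrumentation reports, since the linear cosine scan may dominate the sort for typical index sizes.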

Why start with observability

The semantic index performance work may require multiple small PRs. Starting with instrumentation should make the rest of the discussion easier for maintainers because each follow-up can show before/after timings and byte estimates instead of relying on anecdotal CPU/RSS observations.

Would the maintainers prefer this first PR to be log-only, or should the timing/byte fields also be added to the status JSON under semantic_index?
