Add semantic index observability before performance tuning

## Summary

Semantic index performance problems are hard to diagnose today. The existing status/progress pipeline reports the current stage and coarse counts, but not the timings, byte estimates, or clone/cache costs needed to explain high CPU/RSS during startup or refresh.

Before proposing larger performance changes, I would like to add a small observability-focused PR that reuses the existing semantic status/logging infrastructure where possible.

## Existing observability already present

From a read-only scan of `main`:

- `SemanticIndexEvent::Progress` already carries semantic build progress fields:
  - `stage`
  - `files`
  - `entries_done`
  - `entries_total`
  - source: `crates/aft/src/context.rs`
- `configure` already emits progress for semantic stages such as:
  - `initializing_embedding_model`
  - `refreshing_stale_files`
  - `embedding_stale_symbols`
  - `scanned_project_files`
  - `extracting_symbols`
  - `embedding_symbols`
  - `persisting_index`
  - source: `crates/aft/src/commands/configure.rs`
- The `status` command already exposes semantic index state:
  - status/stage/files/entries_done/entries_total
  - entries/dimension/backend/model
  - project-scoped `disk.semantic_disk_bytes`
  - source: `crates/aft/src/commands/status.rs`
- The Pi status UI already renders semantic index status, entries, dimension, backend/model, and disk size:
  - source: `packages/pi-plugin/src/shared/status.ts`
  - source: `packages/pi-plugin/src/dialogs/status-dialog.ts`
- Session-scoped logging already exists through `slog_info!`, `slog_warn!`, `slog_debug!`:
  - source: `crates/aft/src/log_ctx.rs`

## Gap

The existing hooks are useful for user-facing progress, but they do not expose enough data to understand CPU/RSS spikes:

- no per-stage elapsed time;
- no file scan / chunk collection duration;
- no chunk count before embedding;
- no approximate `embed_text` / `snippet` bytes;
- no vector byte estimate (`entries * dimension * sizeof(f32)`);
- no cache read/write duration or `semantic.bin` byte size in the build log;
- no explicit log around `SemanticIndex::clone()` used for the refresh worker or corpus refresh;
- no query-time observability for the full linear cosine scan and full sort;
- no refresh batch stats for watcher-driven invalidations.

This makes it hard to decide which follow-up optimization should come first: cancellation, thread limits, streaming chunk collection, vector layout changes, cache format changes, or query top-k optimization.

## Proposed first PR: observability only, no behavior change

Add a small instrumentation PR that reuses the existing mechanisms rather than introducing a new metrics subsystem.

Suggested shape:

1. Keep the existing `SemanticIndexEvent::Progress` / `status` path for coarse progress.
2. Add structured `slog_info!` / `slog_debug!` lines around semantic build stages with:
   - stage name;
   - elapsed ms;
   - file count;
   - chunk count where available;
   - entries count;
   - dimension;
   - approximate vector bytes;
   - approximate snippet/embed text bytes where cheap to compute;
   - semantic cache file size and read/write elapsed ms.
3. Avoid changing search behavior, cache format, or plugin UI in the first PR.
4. Add focused tests only for any new pure helper functions used to compute stats/byte estimates.
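As a rough illustration of the pure helpers that item 4 has in mind, the byte estimates could be computed along these lines. This is a minimal sketch, not code from the repository: `SemanticBuildStats`, `estimate_vector_bytes`, and `sum_text_bytes` are hypothetical names, and the only assumption carried over from the issue is that vectors are stored as `f32`.

```rust
use std::mem::size_of;

/// Hypothetical stats record; field names mirror the proposed log keys.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct SemanticBuildStats {
    pub entries: usize,
    pub dimension: usize,
    pub embed_text_bytes: usize,
    pub snippet_bytes: usize,
}

impl SemanticBuildStats {
    /// Approximate size of the raw vector payload for this build.
    pub fn vector_bytes(&self) -> usize {
        estimate_vector_bytes(self.entries, self.dimension)
    }
}

/// entries * dimension * sizeof(f32), saturating instead of overflowing.
pub fn estimate_vector_bytes(entries: usize, dimension: usize) -> usize {
    entries
        .saturating_mul(dimension)
        .saturating_mul(size_of::<f32>())
}

/// Sum of UTF-8 byte lengths; cheap because `str::len` is O(1).
pub fn sum_text_bytes<'a, I>(texts: I) -> usize
where
    I: IntoIterator<Item = &'a str>,
{
    texts
        .into_iter()
        .map(str::len)
        .fold(0usize, usize::saturating_add)
}
```

Keeping these as free functions with no I/O means they can be unit-tested without building an index, which fits the "focused tests only" constraint.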

Example log lines could be similar to:

```text
semantic index: stage=collect_chunks files=842 chunks=5310 embed_text_bytes=1843920 snippet_bytes=612004 elapsed_ms=731
semantic index: stage=embed chunks=5310 entries=5310 dimension=384 vector_bytes=8156160 elapsed_ms=8821
semantic index: stage=persist semantic_bin_bytes=10485760 elapsed_ms=208
```
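One way to produce lines in that shape is a small timing wrapper plus a formatter. The sketch below uses only the standard library; `timed` and `format_stage_line` are hypothetical helpers, and the resulting string is assumed to be handed to the existing `slog_info!` / `slog_debug!` macros, whose exact signatures are not shown here.

```rust
use std::time::{Duration, Instant};

/// Run a closure and return its result together with the elapsed wall time.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Build a `key=value` log line for one semantic build stage.
/// `fields` are emitted in the order given, followed by `elapsed_ms`.
pub fn format_stage_line(stage: &str, fields: &[(&str, u64)], elapsed: Duration) -> String {
    let mut line = format!("semantic index: stage={stage}");
    for (key, value) in fields {
        line.push_str(&format!(" {key}={value}"));
    }
    line.push_str(&format!(" elapsed_ms={}", elapsed.as_millis()));
    line
}
```

A call site might wrap an existing stage as `let (chunks, elapsed) = timed(|| collect());` (where `collect` stands in for whatever the stage already does) and then log `format_stage_line("collect_chunks", &fields, elapsed)`, leaving the stage's behavior unchanged.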

## Follow-up PRs after the data is visible

Once the instrumentation exists, small follow-up PRs can be proposed with evidence:

1. cancel stale semantic builds on reconfigure instead of letting detached builds keep consuming CPU;
2. limit semantic parsing / embedding concurrency;
3. optimize query top-k selection and avoid full result sorting;
4. batch watcher invalidations to avoid repeated full `entries.retain(...)` scans;
5. reduce peak memory by streaming chunk collection and cache serialization;
6. reduce full-index clones used by the refresh worker / corpus refresh path.
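To make item 3 concrete, the sketch below shows one possible way to pick the top `k` scored results without sorting the whole candidate list. It is illustrative only and does not describe the current query code: `top_k` and the `(entry_index, score)` tuple layout are assumptions. It relies on the standard library's `select_nth_unstable_by`, which partitions in average linear time, so only the `k` kept results are sorted.

```rust
use std::cmp::Ordering;

/// Order by score, highest first. `total_cmp` gives a total order even with NaN.
fn by_score_desc(a: &(usize, f32), b: &(usize, f32)) -> Ordering {
    b.1.total_cmp(&a.1)
}

/// Return the `k` highest-scoring `(entry_index, score)` pairs, best first.
/// Hypothetical helper: partitions around the k-th element, then sorts only
/// the retained prefix instead of the full candidate list.
pub fn top_k(mut scored: Vec<(usize, f32)>, k: usize) -> Vec<(usize, f32)> {
    if k == 0 {
        return Vec::new();
    }
    if scored.len() > k {
        // After this call, indices 0..k hold the k best scores in arbitrary order.
        scored.select_nth_unstable_by(k - 1, by_score_desc);
        scored.truncate(k);
    }
    scored.sort_unstable_by(by_score_desc);
    scored
}
```

Whether this is worth doing is exactly what the query-time timings from the first PR would show; for small indexes the full sort may not be measurable.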

## Why start with observability

The semantic index performance work may require multiple small PRs. Starting with instrumentation should make the rest of the discussion easier for maintainers because each follow-up can show before/after timings and byte estimates instead of relying on anecdotal CPU/RSS observations.

Would the maintainers prefer this first PR to be log-only, or should the timing/byte fields also be added to the `status` JSON under `semantic_index`?

