
feat(semantic): [alpha build] provider-aware typed embeddings, reranking, diagnostics, and eval harness#87

Open
Zireael wants to merge 38 commits into
cortexkit:mainfrom
Zireael:semantic-search-enhancement

Conversation

@Zireael
Contributor

@Zireael Zireael commented Jun 2, 2026

Summary

Semantic search in AFT moves from a minimal embedding-and-cosine prototype to a provider-capability-aware retrieval subsystem with typed vectors, optional reranking, background lifecycle management, diagnostics, and evaluation tooling. This is a public preview: the feature is functional and tested (~93 new tests), but expect iteration based on real-world feedback.

What changed

The upgrade touches the full semantic pipeline — config, indexing, retrieval, diagnostics, and observability — without breaking the default fastembed experience.

Typed vector representations

Vectors are no longer opaque f32 blobs. Every stored vector carries explicit type metadata (DenseF32, Int8SourceDecoded, BinaryPacked) and is paired with its source kind so the correct distance metric is selected automatically. Binary packed vectors use Hamming search (native bitwise XOR + popcount) instead of cosine, which is both faster and semantically correct for quantized embeddings. This unlocks Perplexity's base64_binary and base64_int8 output modes alongside standard dense providers.
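As a rough illustration of the XOR + popcount idea, here is a minimal sketch of Hamming distance over bit-packed vectors. The function names and the MSB-first sign-bit packing are assumptions made for this example; they are not taken from the PR's vector_store.rs.

```rust
/// Hamming distance between two bit-packed vectors.
/// Returns None when the byte lengths differ (dimension mismatch).
pub fn hamming_distance(a: &[u8], b: &[u8]) -> Option<u32> {
    if a.len() != b.len() {
        return None;
    }
    // XOR leaves a 1 bit wherever the inputs disagree; count_ones is the popcount.
    Some(a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum())
}

/// Illustrative packing (an assumption): one bit per dimension, MSB first,
/// set when the value is strictly positive.
pub fn pack_sign_bits(values: &[f32]) -> Vec<u8> {
    let mut out = vec![0u8; (values.len() + 7) / 8];
    for (i, v) in values.iter().enumerate() {
        if *v > 0.0 {
            out[i / 8] |= 0x80 >> (i % 8);
        }
    }
    out
}
```

A smaller distance means a closer match, so a ranking built on this sorts ascending, the opposite of cosine similarity.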

Provider capability profiles

Each embedding backend (fastembed, OpenAI-compatible, Ollama, Perplexity) declares what it supports: output encoding, distance metric, dimension range, max batch size. The config layer validates combinations at configure time — you cannot accidentally request binary vectors through a cosine-only provider. Profiles also carry fingerprint fields so switching providers triggers a clean index rebuild rather than silent corruption.
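The configure-time check described above could look roughly like the following sketch. The struct, its fields, and the rule that binary output pairs only with Hamming are simplifying assumptions for illustration; the PR's actual EmbeddingModelProfile carries more fields and more variants.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputEncoding {
    Float,
    Base64Int8,
    Base64Binary,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DistanceMetric {
    Cosine,
    Hamming,
}

/// Hypothetical capability profile; field names are illustrative.
pub struct ProviderProfile {
    pub name: &'static str,
    pub encodings: &'static [OutputEncoding],
    pub metrics: &'static [DistanceMetric],
    pub max_batch_size: usize,
}

/// Rejects combinations the provider does not declare support for.
pub fn validate_request(
    profile: &ProviderProfile,
    encoding: OutputEncoding,
    metric: DistanceMetric,
    batch_size: usize,
) -> Result<(), String> {
    if !profile.encodings.contains(&encoding) {
        return Err(format!("{} does not support {:?} output", profile.name, encoding));
    }
    if !profile.metrics.contains(&metric) {
        return Err(format!("{} does not support the {:?} metric", profile.name, metric));
    }
    // Assumption for this sketch: binary output is searched with Hamming, and only binary is.
    if (encoding == OutputEncoding::Base64Binary) != (metric == DistanceMetric::Hamming) {
        return Err(format!("{:?} output cannot be searched with {:?}", encoding, metric));
    }
    if batch_size == 0 || batch_size > profile.max_batch_size {
        return Err(format!("batch size {} outside 1..={}", batch_size, profile.max_batch_size));
    }
    Ok(())
}
```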

Fingerprint-driven index lifecycle

A SemanticIndexFingerprint captures every dimension that affects index correctness: backend, model, base_url, dimension, chunking_version, output_encoding, storage_strategy, vector kinds, normalization, and prompt hashes. diff() classifies changes as Rebuild (structural — re-embed everything), ClearQueryCache (query prompts changed — invalidate cached results only), or None. This replaces the previous "delete and hope" invalidation with precise, explainable rebuild decisions.
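The three-way classification can be sketched as below. This uses a reduced, hypothetical Fingerprint with only a handful of the fields listed above; the real SemanticIndexFingerprint has more, and its field names may differ.

```rust
/// Reduced stand-in for the index fingerprint (illustrative fields only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Fingerprint {
    pub backend: String,
    pub model: String,
    pub dimension: usize,
    pub chunking_version: u32,
    pub query_prompt_hash: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum FingerprintChange {
    Rebuild,
    ClearQueryCache,
    None,
}

/// Structural changes win over query-prompt changes.
pub fn diff(old: &Fingerprint, new: &Fingerprint) -> FingerprintChange {
    let structural = old.backend != new.backend
        || old.model != new.model
        || old.dimension != new.dimension
        || old.chunking_version != new.chunking_version;
    if structural {
        // Stored vectors are no longer comparable: re-embed everything.
        FingerprintChange::Rebuild
    } else if old.query_prompt_hash != new.query_prompt_hash {
        // Stored vectors are still valid; only cached query results are stale.
        FingerprintChange::ClearQueryCache
    } else {
        FingerprintChange::None
    }
}
```

Checking the structural fields first matches the "rebuild wins" behavior described in the fingerprint tests later in this PR.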

Non-blocking cold start

Index builds run in a background thread with cooperative cancellation (SemanticCancellationToken via AtomicU64 generation counter). The build checks the generation before each embedding batch and exits early when a reconfigure arrives. Priority ordering ensures high-value files (recently edited, high PageRank) get embedded first. Exponential backoff handles transient provider failures without blocking the session.
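A generation-counter token of the kind described here might be sketched as follows. The type and method names are hypothetical, and the embed callback stands in for a real provider call; only the idea of comparing generations before each batch comes from the description above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Cooperative cancellation via a shared generation counter (illustrative).
#[derive(Clone)]
pub struct CancellationToken {
    generation: Arc<AtomicU64>,
}

impl CancellationToken {
    pub fn new() -> Self {
        Self { generation: Arc::new(AtomicU64::new(0)) }
    }

    /// Generation a build should remember when it starts.
    pub fn current(&self) -> u64 {
        self.generation.load(Ordering::SeqCst)
    }

    /// Invalidates every build started under an older generation.
    pub fn cancel(&self) -> u64 {
        self.generation.fetch_add(1, Ordering::SeqCst) + 1
    }

    pub fn is_current(&self, observed: u64) -> bool {
        self.current() == observed
    }
}

/// Embeds batches until done or until the generation moves on.
/// Returns how many batches were embedded.
pub fn embed_batches(
    token: &CancellationToken,
    started_at: u64,
    batches: &[Vec<String>],
    mut embed: impl FnMut(&[String]),
) -> usize {
    let mut done = 0;
    for batch in batches {
        // Check before each batch so a reconfigure stops work promptly.
        if !token.is_current(started_at) {
            break;
        }
        embed(batch);
        done += 1;
    }
    done
}
```

Because the token is only a counter, a reconfigure never has to wait on the build thread; the old build notices the mismatch at its next check and exits on its own.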

Stale-vector pruning

When files are edited, deleted, moved, excluded, or re-included, the index tracks which vectors are stale and prunes them during the next refresh cycle. Every vector record carries file/chunk ownership metadata (file path, version, chunk hash, index fingerprint) so pruning is traceable and deterministic.
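Under the assumption that each vector record stores its owning path and a content hash, pruning reduces to a single retain pass. The record layout below is a hypothetical simplification of the ownership metadata described above.

```rust
use std::collections::HashMap;

/// Simplified ownership record (illustrative; the real record carries more fields).
pub struct VectorRecord {
    pub file_path: String,
    pub content_hash: u64,
    pub chunk_index: usize,
}

/// Drops records whose file is gone or whose content hash no longer matches.
/// `current_files` maps each indexable path to its current content hash.
/// Returns the number of pruned records.
pub fn prune_stale(
    records: &mut Vec<VectorRecord>,
    current_files: &HashMap<String, u64>,
) -> usize {
    let before = records.len();
    records.retain(|r| current_files.get(&r.file_path) == Some(&r.content_hash));
    before - records.len()
}
```

Deleted, moved, and excluded files all show up the same way in this model: the path is simply absent from `current_files`.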

File policy and docs chunking

A configurable file policy controls which files enter the index (include globs, exclude globs, max file size, max chunk count). The docs chunker splits Markdown and documentation files into semantic sections before embedding, improving recall for documentation-shaped queries.
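Heading-based splitting of a Markdown file can be sketched as below. This is a simplified illustration, not the PR's collect_docs_chunks: it treats any line of the form "# Title" as a section boundary and does not special-case fenced code blocks.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DocChunk {
    pub heading: String,
    pub body: String,
}

/// Splits Markdown into one chunk per ATX heading.
/// Text before the first heading becomes a chunk with an empty heading.
pub fn split_markdown_by_heading(text: &str) -> Vec<DocChunk> {
    let mut chunks = Vec::new();
    let mut heading = String::new();
    let mut body = String::new();
    for line in text.lines() {
        // A heading is one or more '#' followed by a space.
        let is_heading =
            line.starts_with('#') && line.trim_start_matches('#').starts_with(' ');
        if is_heading {
            if !heading.is_empty() || !body.trim().is_empty() {
                chunks.push(DocChunk {
                    heading: heading.clone(),
                    body: body.trim().to_string(),
                });
            }
            heading = line.trim_start_matches('#').trim().to_string();
            body.clear();
        } else {
            body.push_str(line);
            body.push('\n');
        }
    }
    if !heading.is_empty() || !body.trim().is_empty() {
        chunks.push(DocChunk { heading, body: body.trim().to_string() });
    }
    chunks
}
```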

Reranking pipeline

Optional reranking via any OpenAI-compatible /v1/rerank or chat-completion endpoint. The pipeline sends initial retrieval candidates to a reranker, parses the response (supporting multiple JSON shapes), and reorders results with safe fallback — if the reranker fails, the original cosine-similarity order is returned unchanged. Config fields: rerank.enabled, rerank.model, rerank.base_url, rerank.api_key_env, rerank.max_candidates.
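The "reorder with safe fallback" step can be sketched independently of any HTTP client. In this illustration the reranker's answer is modeled as a Result holding candidate indices; the function name and the error type are assumptions, while skipping duplicate indices and appending unmentioned candidates in original order follow the behavior described in this PR's commits.

```rust
/// Applies a reranker's index order to `candidates`.
/// - On Err, the original order is returned unchanged.
/// - Out-of-range and duplicate indices are skipped.
/// - Candidates the reranker did not mention are appended in original order.
pub fn apply_rerank<T: Clone>(
    candidates: &[T],
    reranked: Result<Vec<usize>, String>,
) -> Vec<T> {
    let order = match reranked {
        Ok(order) => order,
        Err(_) => return candidates.to_vec(),
    };
    let mut seen = vec![false; candidates.len()];
    let mut out = Vec::with_capacity(candidates.len());
    for idx in order {
        if idx < candidates.len() && !seen[idx] {
            seen[idx] = true;
            out.push(candidates[idx].clone());
        }
    }
    for (idx, item) in candidates.iter().enumerate() {
        if !seen[idx] {
            out.push(item.clone());
        }
    }
    out
}
```

The output always contains every input candidate exactly once, so a misbehaving reranker can change the order but cannot drop or duplicate results.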

Search pipeline metrics and diagnostics

Every aft_search call records timing, cache hits/misses, result counts, and reranker fallback events. Metrics are exposed through the status command and through JSONL diagnostic logs for offline analysis. The DiagnosticsOutputMode config controls verbosity in tool output (off | minimal | verbose).

Semantic doctor

semantic_doctor is a health-check command that reports config summary, index summary, metrics summary, provider summary, and actionable suggestions. Use it to verify that the index is healthy, the provider is reachable, and the configuration is consistent.

Semantic eval harness

semantic_eval runs a JSONL-defined evaluation suite against the semantic index. Each case specifies a query, expected paths, expected symbols, and top-k. The harness computes recall@k and MRR (Mean Reciprocal Rank) for quantifying retrieval quality across config changes.
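For reference, the two metrics can be computed as in the sketch below. It uses exact string equality between expected and returned paths, which is a simplification; the harness in this PR is described as using suffix-based path matching. Function names are illustrative.

```rust
/// Fraction of expected paths that appear in the top `k` ranked results.
pub fn recall_at_k(expected: &[&str], ranked: &[&str], k: usize) -> f64 {
    if expected.is_empty() {
        return 0.0;
    }
    let top = &ranked[..ranked.len().min(k)];
    let hits = expected.iter().filter(|e| top.contains(*e)).count();
    hits as f64 / expected.len() as f64
}

/// 1 / rank of the first expected path within the top `k`, or 0.0 if none is found.
pub fn reciprocal_rank(expected: &[&str], ranked: &[&str], k: usize) -> f64 {
    ranked
        .iter()
        .take(k)
        .position(|r| expected.contains(r))
        .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
}

/// Mean of per-case scores; MRR is the mean of the reciprocal ranks.
pub fn mean(values: &[f64]) -> f64 {
    if values.is_empty() {
        0.0
    } else {
        values.iter().sum::<f64>() / values.len() as f64
    }
}
```

Recall@k rewards finding all expected files anywhere in the top k, while MRR rewards putting the first correct file near the top, so the two can move in different directions across config changes.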

Status integration

The status command now includes semantic health metrics: lifecycle state, entry count, dimension, total queries, cache hit ratio, average query time, and provider info. The OpenCode TUI sidebar surfaces these alongside the existing index state.

Config trust boundary

backend, base_url, and api_key_env are user-only fields — project-level aft.jsonc cannot inject these. A hostile repository cannot redirect embeddings at an attacker-controlled endpoint or exfiltrate API keys. The plugin logs a warning when it strips a project-level setting.
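The stripping rule can be illustrated with a small merge function. This sketch is in Rust for consistency with the rest of the PR, although the enforcement described here lives in the plugin; the flat string map, the function name, and the assumption that other project-level keys override user-level ones are all illustrative, and the real user-only list is longer than these three fields.

```rust
use std::collections::BTreeMap;

/// Fields that only user-level config may set (subset named in the description).
const USER_ONLY_FIELDS: [&str; 3] = ["backend", "base_url", "api_key_env"];

/// Merges project-level settings over user-level ones, except user-only keys.
/// Returns the merged config and the project keys that were stripped,
/// so the caller can log a warning for each.
pub fn merge_semantic_config(
    user: &BTreeMap<String, String>,
    project: &BTreeMap<String, String>,
) -> (BTreeMap<String, String>, Vec<String>) {
    let mut merged = user.clone();
    let mut stripped = Vec::new();
    for (key, value) in project {
        if USER_ONLY_FIELDS.contains(&key.as_str()) {
            // A repository must not be able to redirect embeddings or name a secret.
            stripped.push(key.clone());
        } else {
            merged.insert(key.clone(), value.clone());
        }
    }
    (merged, stripped)
}
```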

Contextualized document-chunk embedding (partial)

Initial support for Perplexity-style document/chunk grouped embedding — chunks from the same source document are batched together rather than flattened. Oversized document handling and retry logic are still in progress (see roadmap).

How to test

Default fastembed (zero-config)

# Enable semantic search in your AFT config
# ~/.config/opencode/aft.jsonc or ~/.pi/agent/aft.jsonc:
{ "semantic_search": true }

# Start a session — index builds in background
# Run aft_search with a concept query:
aft_search({ "query": "authentication middleware" })

Verify: results appear with source: semantic or source: hybrid tags. Status shows [index: ready] after build completes.

Provider switching

// Switch to OpenAI-compatible
{
  "semantic_search": true,
  "semantic": {
    "backend": "openai_compatible",
    "model": "text-embedding-3-small",
    "base_url": "https://api.openai.com/v1",
    "api_key_env": "OPENAI_API_KEY"
  }
}

Verify: index rebuilds automatically on next session start. Status shows new provider/model.

Reranking

{
  "semantic_search": true,
  "semantic": {
    "backend": "openai_compatible",
    "model": "text-embedding-3-small",
    "base_url": "https://api.openai.com/v1",
    "api_key_env": "OPENAI_API_KEY"
  },
  "rerank": {
    "enabled": true,
    "model": "rerank-english-v3.0",
    "base_url": "https://api.cohere.com",
    "api_key_env": "COHERE_API_KEY"
  }
}

Verify: search results show reranker-sorted order. Disable reranker — results fall back to cosine order.

Semantic doctor

aft_search({ "query": "test" })  # trigger index build if cold
# Then check health via status command or semantic_doctor

Verify: health report shows ConfigSummary, IndexSummary, MetricsSummary, ProviderSummary.

Eval harness

// Create eval-cases.jsonl:
{"query": "authentication handler", "expected_paths": ["src/auth/middleware.ts"], "expected_symbols": ["authMiddleware"], "top_k": 10}
{"query": "database connection", "expected_paths": ["src/db/pool.ts"], "expected_symbols": ["createPool"], "top_k": 10}

Verify: returns recall@k and MRR scores.

Test coverage

~93 tests across 8 test sub-tasks covering:

  • Config parsing and backward compatibility
  • Fingerprint diff matrix (all field combinations → Rebuild/ClearQueryCache/None)
  • File policy, docs chunking, and manifest handling
  • VectorStore trait with DenseF32 and BinaryPacked implementations
  • Binary packed-vector storage and Hamming search
  • Lifecycle states, snapshots, and stale-vector pruning
  • Search pipeline metrics, diagnostics, and DiagnosticsOutputMode
  • Concurrency, race conditions, and cancellation token behavior
  • Security trust boundary enforcement (project config stripping)
  • Semantic doctor health report
  • Semantic eval harness (JSONL parsing, scoring, recall/MRR)
  • Reranking pipeline (parse multiple JSON shapes, fallback on failure)

Roadmap

Still in progress or planned for follow-up:

  • aft-t6p.23: Complete contextualized document-chunk embedding (oversized docs, retry logic) — partially implemented
  • aft-t6p.2.2: Configurable snippet truncation in reranking (currently hardcoded at 200 chars)
  • aft-t6p.18: End-to-end verification across all backends
  • aft-t6p.5: Configuration and operations documentation
  • Performance benchmarking suite
  • Migration tooling for index format upgrades

Architecture notes

Key new modules:

  • crates/aft/src/semantic_rerank.rs — reranking pipeline with safe fallback
  • crates/aft/src/semantic_diagnostics.rs — JSONL diagnostic logging
  • crates/aft/src/semantic_doctor.rs — health-check report generation
  • crates/aft/src/semantic_eval.rs — evaluation harness (JSONL parser, scoring)
  • crates/aft/src/vector_store.rs — VectorStore trait with DenseF32 and BinaryPacked implementations
  • crates/aft/src/commands/semantic_doctor.rs — doctor command handler
  • crates/aft/src/commands/semantic_eval.rs — eval command handler

Modified significantly:

  • crates/aft/src/semantic_index.rs — lifecycle management, fingerprint-driven invalidation, non-blocking build, stale pruning, typed vectors
  • crates/aft/src/config.rs — provider profiles, rerank config, trust boundary fields
  • crates/aft/src/commands/status.rs — semantic health metrics
  • crates/aft/src/commands/semantic_search.rs — reranking integration, diagnostics output mode



Summary by cubic

Upgrades semantic search to a provider-aware pipeline with typed vectors, optional reranking, partial-index querying, and built-in diagnostics/eval. Adds Perplexity support and binary/int8 vectors with Hamming, and hardens reranking and config schema alignment for a smoother setup.

  • New Features

    • Provider profiles and typed vectors (f32, int8, binary packed) with auto metric selection; enables Perplexity base64 binary/int8.
    • Fingerprint-driven lifecycle with background/partial builds and precise stale-vector pruning.
    • Optional reranking via OpenAI-compatible endpoints with safe fallback to original order.
    • Metrics and diagnostics with JSONL logs and configurable verbosity; status surfaces semantic health.
    • Tools: semantic_doctor for health checks and semantic_eval for recall@k/MRR.
    • File policy and docs chunking for better recall; trust boundary blocks project configs from setting backend/base_url/api_key.
    • VectorStore abstraction decouples storage/search; preview contextualized document-chunk embedding.
    • Dev/repo: Docker-based Rust validation scripts; restore upstream .alfonso/ and expand .gitignore to ignore local agent/tooling dirs.
    • Plugin schema now matches Rust enums and adds perplexity (output encoding, storage strategy, input mode, distance metric); project-level stripping extended with tests.
    • Reranker robustness: strip markdown fences in responses and prevent duplicate indices in reranked results.
  • Migration

    • No changes needed for default fastembed.
    • Switching providers auto-triggers a clean index rebuild via fingerprints.
    • To use reranking, add a rerank block (model, base_url, api key env, limits).
    • Move backend/base_url/api_key to user-level config; project-level values are ignored.

Written for commit 95ea25c. Summary will update on new commits.


Greptile Summary

This PR replaces the minimal embedding prototype with a provider-capability-aware retrieval subsystem: typed vectors (f32, int8, binary-packed), fingerprint-driven index lifecycle with background builds and cooperative cancellation, optional LLM-based reranking, JSONL diagnostics, semantic_doctor, and semantic_eval.

  • Typed vector pipeline: TypedVector / StoredVector / VectorStore trait decouple encoding, storage, and search; Hamming distance is now used correctly for binary-packed vectors.
  • Fingerprint-driven lifecycle: SemanticIndexFingerprint::diff() classifies changes as Rebuild, ClearQueryCache, or None, replacing the prior "delete and hope" invalidation.
  • Reranking, diagnostics, and eval: optional reranking via any OpenAI-compatible endpoint with safe fallback; JSONL diagnostic logging with configurable retention; semantic_doctor and semantic_eval commands.
  • Trust boundary hardened: 10 additional semantic config fields are now blocked from project-level aft.jsonc.

Confidence Score: 3/5

Safe to merge for alpha/preview use, but the reranking pipeline does not function as intended — the overfetch from the vector index is discarded before the reranker can see it.

The reranking feature fetches max(top_k, 50) candidates from the vector index specifically to give the reranker a wider pool, but fuse_hybrid_results immediately truncates back to top_k before calling rerank_candidates. For the default top_k=10, the reranker always operates on 10 items — the same result set a non-reranking search returns — so it can only reorder but cannot surface better matches from the wider pool.

crates/aft/src/commands/semantic_search.rs — the interaction between index.search(...clamp(50, MAX_TOP_K)) and fuse_hybrid_results(..., params.top_k) needs to be reworked so the wider pool reaches the reranker before truncation.
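One possible shape for the rework is to keep the fused list untruncated, hand the wider pool to the reranker, and truncate to top_k only at the end. The sketch below is hypothetical: the function name, the closure-based reranker, and the pool-size rule are assumptions for illustration and do not reflect the actual signatures in semantic_search.rs.

```rust
/// Reranks a wider pool and truncates to `top_k` afterwards.
/// `fused` is the fused candidate list, best first, NOT yet cut to `top_k`.
/// `rerank` returns an index order over the pool, or None on failure.
pub fn finalize_results<T: Clone>(
    fused: &[T],
    top_k: usize,
    max_candidates: usize,
    rerank: impl Fn(&[T]) -> Option<Vec<usize>>,
) -> Vec<T> {
    // The pool is at least top_k wide so a failed rerank can still fill the page.
    let pool_len = fused.len().min(max_candidates.max(top_k));
    let pool = &fused[..pool_len];
    let mut out: Vec<T> = match rerank(pool) {
        Some(order) => {
            let mut seen = vec![false; pool.len()];
            let mut picked = Vec::with_capacity(pool.len());
            for i in order {
                if i < pool.len() && !seen[i] {
                    seen[i] = true;
                    picked.push(pool[i].clone());
                }
            }
            // Unmentioned candidates keep their original relative order.
            for (i, item) in pool.iter().enumerate() {
                if !seen[i] {
                    picked.push(item.clone());
                }
            }
            picked
        }
        // Fallback: original order.
        None => pool.to_vec(),
    };
    // Truncate only after the reranker has seen the wider pool.
    out.truncate(top_k);
    out
}
```

With this ordering, a candidate that fusion placed below the top_k cutoff can still reach the final results if the reranker promotes it.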

Important Files Changed

Filename | Overview
--- | ---
crates/aft/src/commands/semantic_search.rs | Reranking pipeline has a logic bug: vector search overfetches (clamp to 50) but fuse_hybrid_results truncates back to top_k before rerank_candidates is called, so the reranker never sees more candidates than top_k.
crates/aft/src/semantic_rerank.rs | New reranking pipeline with proper fallback, markdown-fence stripping, and multiple JSON shape support; prompt template has a minor double-expansion edge case.
crates/aft/src/config.rs | New typed enums and extensive SemanticBackendConfig fields; serde names align with TypeScript schema.
packages/opencode-plugin/src/config.ts | Trust boundary enforcement expanded to cover 10 new semantic fields; enum values now match Rust serde; reranking and diagnostics fields still absent from SemanticConfigSchema.
crates/aft/src/semantic_index.rs | Massive expansion: typed vector support, fingerprint-driven lifecycle, background build with cancellation token, stale pruning, partial index querying; new V7/V8 index versions.
crates/aft/src/vector_store.rs | New VectorStore trait with FlatF32 (cosine) and FlatBinaryHamming implementations; decouples storage/search from lifecycle management.

Sequence Diagram

sequenceDiagram
    participant Agent
    participant SemanticSearch
    participant VectorIndex
    participant LexicalIndex
    participant Reranker

    Agent->>SemanticSearch: "aft_search({ query, top_k })"
    SemanticSearch->>VectorIndex: search(query_vec, clamp(top_k, 50, 100))
    VectorIndex-->>SemanticSearch: semantic_results up to 50
    SemanticSearch->>LexicalIndex: lexical_rank(trigrams, 50)
    LexicalIndex-->>SemanticSearch: lexical_files up to 50
    SemanticSearch->>SemanticSearch: fuse_hybrid_results truncates to top_k
    Note over SemanticSearch: results.len() at most top_k default 10
    alt rerank_enabled
        SemanticSearch->>Reranker: POST chat/completions min(max_candidates,top_k) snippets
        Reranker-->>SemanticSearch: JSON indices
        SemanticSearch->>SemanticSearch: apply reorder + append missing
    end
    SemanticSearch-->>Agent: status results diagnostics

Comments Outside Diff (1)

  1. packages/opencode-plugin/src/config.ts, line 37-54 (link)

    P1 TypeScript enum values don't match the Rust serde strings — config will fail to deserialize

    Several new enum schemas use values that don't align with the Rust serde representation:

    • SemanticOutputEncodingEnum allows "binary", "ubinary", "int8", "uint8" but Rust OutputEncoding deserializes from "base64_binary" and "base64_int8".
    • SemanticStorageStrategyEnum allows "flat" and "binary_pack" but Rust StorageStrategy expects "native_f32" and "binary_packed".
    • SemanticInputModeEnum includes "chunk_extracts" and "contextualized" but Rust InputMode only has "flat_texts" and "document_chunks".
    • SemanticDistanceMetricEnum uses "dot" but Rust DistanceMetric expects "dot_product".
    • SemanticBackendEnum is missing the new "perplexity" variant added to Rust.

    A user who follows the TypeScript autocomplete and picks output_encoding: "int8" will pass TypeScript validation but receive a deserialization error (or silent fallback to default) from the Rust binary at runtime.

Reviews (3): Last reviewed commit: "fix: address greptile and qubic review c..."

Zireael and others added 30 commits May 24, 2026 11:10
Add scripts, docs, Dockerfile, and package.json scripts for Docker-based
Rust validation (fmt/check/clippy/test) so Windows users without MSVC
Build Tools can still validate Rust code.

- scripts/docker-rust.ps1: PowerShell script supporting fmt/check/clippy/
  test/validate/shell tasks with persistent Docker volumes
- Dockerfile.rust: minimal Rust image with rustfmt + clippy pre-installed
- docs/docker-rust-validation.md: full usage and design documentation
- package.json: 6 new docker:rust:* convenience scripts

Design: Linux-target validation via rust:1-bookworm, persistent cargo
volumes for caching, fail-fast sequential validation.
- SemanticFilePolicy config struct with include_code/include_docs/
  include_configs/binary_detection/generated_file_detection/globs
- parse_semantic_files_config handler in configure.rs
- File policy evaluation: should_index_file(), is_generated_file(),
  is_config_file(), is_docs_file()
- Docs chunker: collect_docs_chunks() with heading-based splitting
  for markdown, splitting by file for other doc types
- collect_chunks routes doc files through docs chunker, skips
  binary/generated/config files per policy
- SemanticIndexFingerprint extended with file_policy_hash and
  docs_chunker_version; diff() triggers rebuild on policy change
- build_with_progress/refresh_stale_files accept &SemanticFilePolicy
- compute_file_policy_hash() deterministic hash of policy fields
- Re-export SemanticFilePolicy from semantic_index module
- All test callers updated with &SemanticFilePolicy::default()
…iority ordering, backoff

- CancellationToken (Arc<AtomicU64> generation counter) for cooperative build cancellation on reconfigure
- Cancel old semantic index builds instead of detaching when config changes
- Priority file ordering: README/docs first, then core source, then tests, then rest
- Embedding backoff: exponential retry with jitter for remote provider rate limits
- SemanticIndexStatus::Partial variant with completeness percentage for partial builds
- Search reports partial index state during cold start
- Phase-boundary cancellation checks between model init, disk read, incremental refresh, and full rebuild
Add Perplexity backend with InputMode::DocumentChunks support for
contextualized embedding where chunks carry document-level context.

- SemanticBackend::Perplexity variant with config, profile, engine
- DocumentChunks/PerDocumentChunks/DocumentEmbeddings structs
- embed_document_chunks() routes Perplexity to grouped embedding API
- build_with_progress_contextualized() groups chunks by document
- Wire configure.rs to branch on input_mode: DocumentChunks
- SemanticEmbeddingModel::input_mode() public accessor
- EmbeddingModelProfile with contextualized_supported guard
- Response validation: index continuity, missing documents, dimension
…to trait-backed module

Bead: aft-t6p.12

Extracts Vec<EmbeddingEntry> storage and search from SemanticIndexSnapshot
into a VectorStore trait with FlatF32VectorStore implementation. This
decouples the storage layer from the lifecycle logic and prepares for
alternative backends (binary Hamming, approximate ANN).

Key changes:
- vector_store.rs: VectorStore trait + ScoredChunk/PruneStats types
- FlatF32VectorStore: flat scan with cosine similarity (preserves existing
  behaviour exactly)
- FlatBinaryHammingVectorStore: forward-looking Hamming-search impl
- SemanticIndexSnapshot delegates search/len/prune/entries to store
- Fixed dimension-sync bug where set_dimension updated the snapshot
  dimension but not the store dimension, causing search to return 0
- EmbeddingEntry and IndexedFileMetadata made pub for trait compatibility
On Windows, use copyFileSync for the binary replacement (which overwrites
the target — renameSync fails with EEXIST). If it fails, the original
binary at binaryPath is preserved.

The temp file cleanup is now wrapped in its own try/catch so a cleanup
failure does NOT propagate as a download failure — the binary was already
successfully placed at binaryPath.

Addresses PR cortexkit#69 cubic review finding P2.
Implement bead aft-t6p.24: file identity manifest + vector ownership records.

Changes:
- **FileRecord struct**: identity record with content_hash, size_bytes, mtime,
  language, document_kind, inclusion_policy_hash, indexed_at
- **file_manifest on SemanticIndexSnapshot**: HashMap<PathBuf, FileRecord>
  tracking which files produced which vectors, enabling precise stale-vector
  pruning when files are edited, deleted, or excluded
- **V8 serialization format**: extends V7 with per-entry chunk_hash (after
  each vector) and file manifest block (after all entry vectors). Full
  backward compatibility with V1-V7 reads.
- **chunk_hash on EmbeddingEntry**: deterministic hash of chunk content fields
  for tracing which version of a chunk produced a stored vector
- **compute_chunk_hash**: blake3-based deterministic hash
- **build_manifest_from_store helper**: populates file_manifest from store's
  file_metadata, called in all builder functions (build_from_chunks,
  build_with_progress_contextualized, refresh_stale_files) and from_bytes
  for V1-V7 cache migration
- **next_chunk_id, fingerprint_string**: forward-looking fields on snapshot
  for future unique ID assignment and fingerprint tracking
…rmalization, and model profiles

Adds aft-t6p.20 (Typed embedding vector representation +
storage-strategy resolution):

- TypedVector (source-side) and StoredVector (persisted) enums
  with DenseF32, DenseInt8, BinaryPacked, and Quantized variants
- StorageStrategy (NativeF32, DecodeNormalizeF32, BinaryPacked)
- VectorKind enum for runtime type tagging
- DistanceMetric (Cosine, DotProduct, Euclidean, Hamming)
- NormalizationPolicy (AlreadyNormalized, NormalizeOnInsertQuery,
  NotApplicable)
- EmbeddingModelProfile fields: source_vector_kind, stored_vector_kind,
  metric, normalization
- convert_vector() / validate_compatible() on EmbeddingModelProfile
- blake3 dependency for chunk hashing
… + dummy base_url for Perplexity profile test

Two fixes for `fingerprint_invalidation_tests`:
- Mock HTTP server now lowercases header names before matching
  Content-Length (reqwest/hyper sends lowercase `content-length:`).
- `base64_int8_profile_from_config_selects_correctly` test provides a
  dummy `base_url` for the Perplexity backend (required by `from_config`).

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
- Add StorageStrategy::BinaryPacked variant for packed-bit vector storage
- Add EmbeddingModelProfile::perplexity_binary() with BinaryPacked → Hamming path
- Wire from_config to select perplexity_binary profile when Base64Binary encoding
- Implement parse_embedding_value for Base64Binary (decode → 0.0/1.0 f32 vec)
- Implement into_stored for TypedVector::BinaryPacked (requires BinaryPacked strategy)
- Update validate_config and validate_compatible to accept Base64Binary+BinaryPacked
- Replace old "not yet supported" test with parse_embedding_value_base64_binary_succeeds
- 886/893 tests pass (7 pre-existing Docker failures)

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
Add semantic_diagnostics module with SearchDiagnostics, SearchPipelineType,
SearchWarning, SearchMetricsCollector, PhaseTimer, score_statistics,
top1_margin. Instrument handle_semantic_search with per-phase timing
and warning collection. Wire SearchMetricsCollector into AppContext.
17 new tests, 902/910 lib tests pass (8 pre-existing Docker failures).

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
- Add SemanticDiagnosticsLogger with file append, rotation (50 MB), and
  retention cleanup (file-deletion based on mtime)
- Add SearchDiagnosticsEvent struct for JSONL serialization with
  raw_query redaction (opt-in via include_raw_queries) and snippet
  placeholder (include_snippets)
- Add config fields: jsonl_logging, jsonl_path, include_raw_queries,
  include_snippets, retention_days to SemanticBackendConfig
- Add lazy-init diagnostics_logger on AppContext with
  resolve_diagnostics_log_path helper (env var → project root → ~/.cache)
- Wire JSONL record into handle_semantic_search diagnostics block
- 4 new tests: raw query redaction, raw query inclusion, disk write
  verification, missing-file recovery
- 907/914 lib tests pass (7 pre-existing Docker failures)

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
…rch output

Add DiagnosticsOutputMode enum (Off/Minimal/Verbose) and output_mode field
to SemanticBackendConfig. Implement format_diagnostics_prefix() for
Minimal (warnings only) and Verbose (scores + latency + warnings)
output modes. Wire into handle_semantic_search response text.
4 new tests, 25 diagnostics tests total. 910/918 lib tests pass
(8 pre-existing Docker failures).

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
Add optional reranking via OpenAI-compatible chat endpoint. When
enabled, aft_search overfetches candidates, sends them to a reranker
model, and re-sorts by relevance. Falls back gracefully on any error.

- Add RerankConfig fields to SemanticBackendConfig (rerank_enabled,
  rerank_model, rerank_base_url, rerank_api_key_env, rerank_timeout_ms,
  rerank_max_candidates)
- Create semantic_rerank.rs with RerankerClient, RerankOutcome enum,
  and rerank_candidates function
- Add RerankerFailure warning variant to SearchWarning
- Wire reranking into handle_semantic_search (overfetch → rerank → re-sort)
- Add rerank_latency_ms to SearchDiagnostics and SearchDiagnosticsEvent
- Include rerank latency in verbose diagnostics output
- 6 unit tests for reranker parsing, skip conditions, and failure handling

All 25 diagnostics + 6 reranker tests pass. 917/924 total tests pass
(7 pre-existing Docker infrastructure failures).
Add 40+ unit tests to fingerprint_invalidation_tests covering:
- SemanticBackendConfig deserialization (minimal, all-fields, defaults)
- EmbeddingModelProfile validation for all encoding types
- TypedVector conversion and StoredVector roundtrip
- convert_vector and validate_compatible rejection paths
- Distance metric auto-resolution for f32/int8/binary
- base64_int8 signed int8 decode correctness
- Template hashing, enum roundtrips, resolve helpers

Minor: add #[derive(Debug)] to StoredVector for test ergonomics.

Closes aft-t6p.6.1
Add 6 new tests to fingerprint_invalidation_tests covering:
- file_policy_hash mismatch triggers rebuild
- docs_chunker_version mismatch triggers rebuild
- multi-field changes still trigger rebuild
- rebuild+query_prompt: rebuild wins
- only query_prompt change: ClearQueryCache
- non-fingerprint field changes: NoChange

Total: 22 fingerprint tests. Closes aft-t6p.6.2
Add 29 tests covering:
- is_generated_file: protobuf, minified, dist, build, generated, dart
- is_doc_extension and is_config_extension validation
- classify_semantic_file for code/doc/config
- collect_docs_chunks markdown heading splitting
- SemanticFilePolicy defaults and builtin globs
- FileRecord field population
- build_manifest_from_store construction and cleanup

Closes aft-t6p.6.3
… tests

Add 23 tests covering:
- FlatF32VectorStore: search, empty, dimension mismatch, CRUD, prune, stats
- FlatBinaryHammingVectorStore: search, ranking, prune, delete, stats
- hamming_distance and popcount64 correctness
- Binary decode: byte-aligned, non-byte-aligned, padding, error

Closes aft-t6p.6.4
Add 8 tests covering:
- SemanticIndexLifecycle: cold start, set/get, failed+error, all variants
- SemanticIndexSnapshot: search ranking, immutability after clone
- VectorStore: prune_stale_vectors, prune_orphans

Closes aft-t6p.6.5
Add 10 tests covering:
- HybridRerank pipeline type display
- Metrics collector: window size 1, cache hit rate, zero result rate,
  low confidence rate, latency percentiles
- Diagnostics output mode defaults
- Warning formatting: minimal (all variants, verifies suppressed),
  verbose (all 9 variants)
- SearchWarning serde roundtrip for all 8 variants

Closes aft-t6p.6.6
Add 4 tests covering:
- Concurrent snapshot clones produce independent results
- Concurrent read threads see identical data via Arc
- Mutex contention across 10 threads does not deadlock
- Arc strong_count tracks clone/drop correctly

Closes aft-t6p.6.7
Add 6 tests covering:
- Trust file atomic write (no tmp files left behind)
- Multiple projects trusted independently
- Untrust is idempotent
- Trust state survives reload (serde roundtrip)
- Nonexistent project path is untrusted (fail-closed)

Closes aft-t6p.6.8
The validate_compatible_rejects_binary_stored_with_cosine_metric test
was missing source_vector_kind: BinaryPacked, causing the first match
block to fail with 'unsupported source→stored vector conversion' instead
of reaching the metric compatibility check.
Zireael added 4 commits June 1, 2026 09:24
Add local retrieval evaluation harness for measuring semantic search quality.

New files:
- crates/aft/src/semantic_eval.rs — pure-logic module with:
  - EvalCase, EvalResult, EvalSummary structs
  - JSONL parser (tolerates blank lines and comments)
  - path_matches() — cross-platform suffix matching
  - symbol_matches() — Rust/other-language symbol normalization
  - score_case() — per-case recall@k and MRR scoring
  - score_suite() — aggregate metrics across a suite
- crates/aft/src/commands/semantic_eval.rs — handler wiring:
  - Reads .aft/semantic-eval.jsonl, returns EvalSummary as JSON
  - Supports top_k override and include_per_case toggle
  - Returns tri-state response per AFT honest reporting convention

Wiring:
- crates/aft/src/lib.rs: pub mod semantic_eval
- crates/aft/src/commands/mod.rs: pub mod semantic_eval
- crates/aft/src/main.rs: dispatch semantic_eval command

Tests: 44 tests passing (parser, matcher, scorer, handler)
Add semantic_doctor command that produces a SemanticHealthReport gathering:
- Config summary (backend, model, dimensions, metric, prompts, rerank)
- Index state (lifecycle, entry count, dimension, fingerprint freshness)
- Search quality metrics (p50/p95 latency, zero-result/low-confidence rates)
- Provider connectivity (optional probe)
- Active warnings and actionable suggestions

New files:
- crates/aft/src/semantic_doctor.rs — HealthStatus, ConfigSummary,
  IndexSummary, MetricsSummary, ProviderSummary, Suggestion,
  SemanticHealthReport structs with Serialize and Display impls
- crates/aft/src/commands/semantic_doctor.rs — command handler with
  optional probe_provider param, suggestion generation for disabled/
  building/failed/ready states, 7 handler tests + 6 model tests

Wiring:
- crates/aft/src/lib.rs: pub mod semantic_doctor
- crates/aft/src/commands/mod.rs: pub mod semantic_doctor
- crates/aft/src/main.rs: dispatch "semantic_doctor" command

Also: fix semantic_eval temp directory race condition (atomic counter).

Tests: 14 semantic_doctor + 44 semantic_eval passing, check+clippy+fmt clean.
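
The atomic-counter fix for the temp directory race is not shown on this page. One common way to get collision-free temp paths across parallel tests looks roughly like the sketch below; the function name, prefix, and naming scheme are illustrative assumptions, not taken from the PR.

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Process-wide counter; every call to `unique_temp_dir` takes a distinct value.
static TEMP_DIR_COUNTER: AtomicUsize = AtomicUsize::new(0);

/// Builds a temp directory path that is unique per call within this process.
/// The process id keeps concurrent test binaries apart as well.
pub fn unique_temp_dir(prefix: &str) -> PathBuf {
    let n = TEMP_DIR_COUNTER.fetch_add(1, Ordering::Relaxed);
    std::env::temp_dir().join(format!("{prefix}-{}-{n}", std::process::id()))
}
```

Because `fetch_add` is atomic, two tests running on different threads cannot receive the same counter value, which is what a timestamp-based name cannot guarantee.
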
Extend the semantic_index_info section of the status command to include:
- Search quality metrics (total_queries, p50/p95 latency, zero_result_rate,
  low_confidence_rate, embedding_failure_rate, lexical_failure_rate)
- Rerank status (rerank_enabled, rerank_model)
- Diagnostics state (diagnostics_enabled, prompt_active)

The TUI/status surfaces can now show pipeline health without a separate
semantic_doctor call. Metrics are zero when no queries have been recorded.

Tests: status + semantic_doctor tests passing, check+clippy+fmt clean.
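
For readers unfamiliar with how p50/p95 latency and the failure rates might be derived, the following sketch uses the nearest-rank percentile method. This is an assumption for illustration; the PR's metrics code is not shown here and may use a different estimator. Both helpers return zero when no queries have been recorded, matching the behavior stated above.

```rust
/// Nearest-rank percentile over latency samples in milliseconds.
/// Returns 0 when there are no samples.
pub fn percentile_ms(samples: &[u64], pct: f64) -> u64 {
    if samples.is_empty() {
        return 0;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest rank: the smallest 1-based rank covering `pct` percent of samples.
    let rank = ((pct / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}

/// Share of queries that hit some event (zero results, low confidence, ...).
/// Returns 0.0 when no queries have been recorded.
pub fn rate(events: u64, total_queries: u64) -> f64 {
    if total_queries == 0 {
        0.0
    } else {
        events as f64 / total_queries as f64
    }
}
```
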
- Add 3 new tests: markdown-fence parsing, snippet truncation, max_candidates limit
- Fix missing-ID append: semantic_search now appends missing indices in original order
- Add max_candidate_chars config field (default 2500) to SemanticBackendConfig
- Use config.rerank_max_candidate_chars instead of hardcoded 200 in reranker
- Update all test configs with new field

Bead: aft-t6p.2.1
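
A sketch of what snippet truncation against the candidate character limit could look like. Whether the PR counts characters or bytes is not stated on this page; this version assumes Unicode scalar values so that the cut never lands inside a multi-byte character. The function name is hypothetical.

```rust
/// Truncates a candidate snippet to at most `max_chars` characters
/// (Unicode scalar values), always cutting on a character boundary.
pub fn truncate_snippet(snippet: &str, max_chars: usize) -> &str {
    match snippet.char_indices().nth(max_chars) {
        // `byte_idx` is where the first character past the limit starts.
        Some((byte_idx, _)) => &snippet[..byte_idx],
        // The snippet already fits within the limit.
        None => snippet,
    }
}
```
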

@cubic-dev-ai cubic-dev-ai Bot left a comment

4 issues found across 107 files

Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.

Comment thread .beads/README.md Outdated
Comment thread .beads/config.yaml Outdated
Comment thread .claude/settings.json Outdated
Comment thread .qartez/acks/5813b13fa433d553 Outdated
@Zireael Zireael changed the title feat(semantic): provider-aware typed embeddings, reranking, diagnostics, and eval harness feat(semantic): [alpha build] provider-aware typed embeddings, reranking, diagnostics, and eval harness Jun 2, 2026
Remove .beads/, .qartez/, .claude/, .omo/, .kiro/, .lean-ctx/ from
the branch. These are local agent working directories that should not
be distributed. Add them to .gitignore to prevent future accidents.

Addresses cubic review comments on PR cortexkit#87.

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 69 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".gitignore">

<violation number="1" location=".gitignore:95">
P2: Inconsistent .gitignore pattern: `omo/` should likely be `.omo/` to match the hidden tooling directory convention used by all other entries in this block.</violation>
</file>

Comment thread .gitignore
.beads/
.qartez/
.claude/
omo/
P2: Inconsistent .gitignore pattern: omo/ should likely be .omo/ to match the hidden tooling directory convention used by all other entries in this block.

<file context>
@@ -87,3 +87,11 @@ benchmarks/aft-search/.bench/
+.beads/
+.qartez/
+.claude/
+omo/
+.kiro/
+.lean-ctx/
</file context>

Comment thread crates/aft/src/commands/semantic_search.rs
Comment on lines +132 to +165
// Parse response — try "choices[0].message.content" JSON first.
let content: String = match serde_json::from_str::<serde_json::Value>(&text) {
    Ok(v) => v
        .get("choices")
        .and_then(|c| c.as_array())
        .and_then(|c| c.first())
        .and_then(|c| c.get("message"))
        .and_then(|m| m.get("content"))
        .and_then(|c| c.as_str())
        .map(|s| s.to_string())
        .unwrap_or(text.clone()),
    Err(_) => text.clone(),
};

// Parse the content as a JSON array of indices.
let indices = serde_json::from_str::<Vec<usize>>(&content)
    .or_else(|_| {
        // Try extracting from a JSON object with an "indices" field.
        serde_json::from_str::<serde_json::Value>(&content)
            .ok()
            .and_then(|v| {
                v.get("indices")
                    .or_else(|| v.get("rank"))
                    .or_else(|| v.get("order"))
                    .and_then(|a| serde_json::from_value::<Vec<usize>>(a.clone()).ok())
            })
            .ok_or(())
    })
    .map_err(|_| {
        format!(
            "reranker response did not contain a JSON array of indices: {}",
            content.chars().take(100).collect::<String>()
        )
    });
P1 Markdown-fenced JSON response never stripped in production path

The test rerank_parses_markdown_fenced_json demonstrates the need but strips the fences inline in the test body — the actual rerank_candidates function passes the LLM response straight to serde_json::from_str without stripping ```json … ``` wrappers. Many chat models consistently return JSON inside code fences regardless of response_format: json_object. When they do, both the Vec<usize> parse and every indices/rank/order field lookup fail, rerank_candidates returns RerankOutcome::Failed, and the reranker silently becomes a no-op for those models. The fix is to strip leading ```json / ``` and trailing ``` from content before the two parse attempts.
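
The fix suggested above could be sketched as follows. The helper name `strip_markdown_fences` matches the one the author later mentions, but the body is an assumption written for illustration, not the code that landed in the PR. It handles an optional `json` language tag and leaves unfenced content untouched apart from trimming whitespace.

```rust
/// Removes a leading ```json or ``` fence and a trailing ``` fence, if present.
/// Content without an opening fence is returned trimmed but otherwise unchanged.
pub fn strip_markdown_fences(content: &str) -> &str {
    let trimmed = content.trim();
    let Some(body) = trimmed.strip_prefix("```") else {
        return trimmed;
    };
    // Drop the optional language tag on the opening fence.
    let body = body.strip_prefix("json").unwrap_or(body);
    // Drop the closing fence when the model emitted one.
    let body = body.strip_suffix("```").unwrap_or(body);
    body.trim()
}
```

Applying this to `content` before the two parse attempts would let both the plain-array path and the `indices`/`rank`/`order` object path see clean JSON.
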

Zireael added 2 commits June 2, 2026 20:43
Remove .alfonso/, agents.md, beads-data-*.jsonl, magic-context-*.md,
biome.json_ from the branch. Add them to .gitignore to prevent future
inclusion in PRs.
Restore .alfonso/ from main (it exists upstream). Keep agents.md,
beads-data-*.jsonl, magic-context-*.md, biome.json_ removed and
gitignored since they don't exist on main.
Comment on lines 51 to 78
const SemanticConfigSchema = z.object({
  /** Semantic backend type: local fastembed, OpenAI-compatible API, or Ollama. */
  backend: SemanticBackendEnum.optional(),
  /** Model identifier passed to the selected semantic backend. */
  model: z.string().trim().min(1).optional(),
  /** Base URL of the backend API endpoint. */
  base_url: z.string().trim().min(1).optional(),
  /** Environment variable that contains the API key used by external backends. */
  api_key_env: z.string().trim().min(1).optional(),
  /** Backend request timeout in milliseconds. */
  timeout_ms: z.number().int().positive().optional(),
  /** Maximum batch size used by the semantic pipeline. */
  max_batch_size: z.number().int().positive().optional(),
  /** Output encoding for embedding vectors: "float" (default), "binary", "ubinary", "int8", or "uint8". */
  output_encoding: SemanticOutputEncodingEnum.optional(),
  /** Storage strategy: "flat" (default) or "binary_pack". */
  storage_strategy: SemanticStorageStrategyEnum.optional(),
  /** Input mode for document processing: "flat_texts" (default), "chunk_extracts", or "contextualized". */
  input_mode: SemanticInputModeEnum.optional(),
  /** Embedding dimension count (for providers that support variable dimensions). */
  dimensions: z.number().int().positive().optional(),
  /** Distance metric: "cosine" (default), "dot", or "hamming". */
  distance_metric: SemanticDistanceMetricEnum.optional(),
  /** Optional query prompt template (applied before embedding queries). */
  query_prompt_template: z.string().optional(),
  /** Optional document prompt template (applied before embedding documents). */
  document_prompt_template: z.string().optional(),
});
P1 Reranking config fields not declared in SemanticConfigSchema — silently stripped by Zod

SemanticBackendConfig on the Rust side gained 10+ new fields for reranking and diagnostics (rerank_enabled, rerank_model, rerank_base_url, rerank_api_key_env, rerank_timeout_ms, rerank_max_candidates, rerank_max_candidate_chars, diagnostics_enabled, output_mode, jsonl_logging, etc.), but none of them appear in the TypeScript SemanticConfigSchema. Zod's z.object({...}) strips unknown keys by default, so any user who adds rerank_enabled: true (or any other new field) to their semantic block in aft.jsonc will have it silently removed during parsing. The value never reaches Rust, reranking never activates, and there is no warning. This renders the reranking feature completely unconfigurable through the standard plugin config path.

// model had reasonable matches the agent could have judged. Surface every
// hit with its score so the caller can decide.

*ctx.semantic_index_status().borrow_mut() = SemanticIndexStatus::Ready;
P1 Successful search unconditionally overwrites Partial index status with Ready

Every successful search ends with *ctx.semantic_index_status().borrow_mut() = SemanticIndexStatus::Ready, even when the background build is still in progress. When a user queries while the index is Partial (e.g., 40% built), the search falls through (allowed by design), then this line overwrites the status — dropping the completeness percentage and the entry counts the build thread wrote. The background thread will overwrite it again on its next tick, but until then the status command and TUI sidebar show "ready" for an incomplete index.
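
One possible guard for this, sketched under stated assumptions: the `IndexStatus` enum below is a simplified, hypothetical stand-in for `SemanticIndexStatus` (whose real variants and fields are not shown on this page), and the reviewer does not prescribe this particular fix. The idea is to promote the status to ready only when the build thread is not currently reporting partial progress.

```rust
/// Simplified, hypothetical stand-in for the PR's SemanticIndexStatus.
#[derive(Debug, Clone, PartialEq)]
pub enum IndexStatus {
    NotStarted,
    /// Written by the background build thread while indexing is in progress.
    Partial { percent: u8, entries: usize },
    Ready,
}

/// Status to store after a successful search. Build progress is preserved
/// instead of being overwritten with `Ready`.
pub fn status_after_search(current: &IndexStatus) -> IndexStatus {
    match current {
        IndexStatus::Partial { .. } => current.clone(),
        _ => IndexStatus::Ready,
    }
}
```
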

@Zireael
Contributor Author

Zireael commented Jun 2, 2026

Source code for the semantic search functionality, published as a public preview.
The feature skeleton is in place; it still needs finishing up, polishing of the static tests, and functional testing.
One more thing worth adding is model2vec 'Potion Code 16M' support. If it performs well in tests, I think it could become a fast, cheap, and performant default semantic model.

Here are the implementation plans for the sprints under this epic (in gastown beads format):
aft-semantic-search-upgrade.json

1. Fix duplicate entries in reranked output (greptile P1)
   - Add !used[i] check in filter_map to prevent duplicate indices
   - File: crates/aft/src/commands/semantic_search.rs

2. Strip markdown fences from LLM reranker responses (greptile P1)
   - Many chat models wrap JSON in code fences
   - Add strip_markdown_fences() helper applied before parsing
   - File: crates/aft/src/semantic_rerank.rs

3. Align TypeScript enum values with Rust serde (qubic P1)
   - SemanticBackendEnum: add perplexity variant
   - SemanticOutputEncodingEnum: float, base64_int8, base64_binary
   - SemanticStorageStrategyEnum: native_f32, decode_normalize_f32, binary_packed
   - SemanticInputModeEnum: flat_texts, document_chunks
   - SemanticDistanceMetricEnum: auto, cosine, dot_product, euclidean, hamming
   - File: packages/opencode-plugin/src/config.ts
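
Item 1 above, combined with the earlier missing-index append fix, amounts to merging the reranker's output with the original candidate order. A minimal sketch of that merge follows; the function name and signature are assumptions for illustration, and the real code in `semantic_search.rs` operates on search hits rather than bare indices.

```rust
/// Merges reranker output with the original candidate order:
/// duplicates and out-of-range indices are dropped, and any candidate
/// the reranker omitted is appended in its original position order.
pub fn apply_rerank_order(candidate_count: usize, reranked: &[usize]) -> Vec<usize> {
    let mut used = vec![false; candidate_count];
    let mut order: Vec<usize> = reranked
        .iter()
        .copied()
        .filter(|&i| {
            // The `!used[i]` check is what prevents duplicate entries.
            if i < candidate_count && !used[i] {
                used[i] = true;
                true
            } else {
                false
            }
        })
        .collect();
    // Append candidates the reranker left out, in their original order.
    order.extend((0..candidate_count).filter(|&i| !used[i]));
    order
}
```
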

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 4 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode-plugin/src/config.ts">

<violation number="1" location="packages/opencode-plugin/src/config.ts:40">
P2: Semantic enum literals were renamed without backward-compatibility aliases or migration, breaking existing configs that use old values.</violation>
</file>

const SemanticBackendEnum = z.enum(["fastembed", "openai_compatible", "ollama", "perplexity"]);

/** Output encoding mode for embeddings. */
const SemanticOutputEncodingEnum = z.enum(["float", "base64_int8", "base64_binary"]);
P2: Semantic enum literals were renamed without backward-compatibility aliases or migration, breaking existing configs that use old values.

<file context>
@@ -34,19 +34,19 @@ const CheckerEnum = z.enum([
 
 /** Output encoding mode for embeddings. */
-const SemanticOutputEncodingEnum = z.enum(["float", "binary", "ubinary", "int8", "uint8"]);
+const SemanticOutputEncodingEnum = z.enum(["float", "base64_int8", "base64_binary"]);
 
 /** Storage strategy for embedding vectors. */
</file context>
