feat(semantic): [alpha build] provider-aware typed embeddings, reranking, diagnostics, and eval harness #87
Add scripts, docs, Dockerfile, and package.json scripts for Docker-based Rust validation (fmt/check/clippy/test) so Windows users without MSVC Build Tools can still validate Rust code.
- scripts/docker-rust.ps1: PowerShell script supporting fmt/check/clippy/test/validate/shell tasks with persistent Docker volumes
- Dockerfile.rust: minimal Rust image with rustfmt + clippy pre-installed
- docs/docker-rust-validation.md: full usage and design documentation
- package.json: 6 new docker:rust:* convenience scripts
Design: Linux-target validation via rust:1-bookworm, persistent cargo volumes for caching, fail-fast sequential validation.
…rough, fingerprint upgrade
…or pruning, write-lock sync
…pgrade, invalidation tests
- SemanticFilePolicy config struct with include_code/include_docs/include_configs/binary_detection/generated_file_detection/globs
- parse_semantic_files_config handler in configure.rs
- File policy evaluation: should_index_file(), is_generated_file(), is_config_file(), is_docs_file()
- Docs chunker: collect_docs_chunks() with heading-based splitting for markdown, splitting by file for other doc types
- collect_chunks routes doc files through docs chunker, skips binary/generated/config files per policy
- SemanticIndexFingerprint extended with file_policy_hash and docs_chunker_version; diff() triggers rebuild on policy change
- build_with_progress/refresh_stale_files accept &SemanticFilePolicy
- compute_file_policy_hash() deterministic hash of policy fields
- Re-export SemanticFilePolicy from semantic_index module
- All test callers updated with &SemanticFilePolicy::default()
…iority ordering, backoff
- CancellationToken (Arc<AtomicU64> generation counter) for cooperative build cancellation on reconfigure
- Cancel old semantic index builds instead of detaching when config changes
- Priority file ordering: README/docs first, then core source, then tests, then rest
- Embedding backoff: exponential retry with jitter for remote provider rate limits
- SemanticIndexStatus::Partial variant with completeness percentage for partial builds
- Search reports partial index state during cold start
- Phase-boundary cancellation checks between model init, disk read, incremental refresh, and full rebuild
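The generation-counter cancellation named in this commit can be illustrated with a minimal sketch. Only the `Arc<AtomicU64>` idea comes from the commit message; the method names (`begin_build`, `cancel`, `is_cancelled`) and the `embed_batches` loop are hypothetical and are not the PR's actual API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical token: a shared generation counter. A build remembers the
/// generation it started under; any later bump means "stop".
#[derive(Clone, Default)]
pub struct CancellationToken {
    generation: Arc<AtomicU64>,
}

impl CancellationToken {
    /// Snapshot the generation a new build runs under.
    pub fn begin_build(&self) -> u64 {
        self.generation.load(Ordering::SeqCst)
    }

    /// Invalidate every build started under an earlier generation.
    pub fn cancel(&self) {
        self.generation.fetch_add(1, Ordering::SeqCst);
    }

    /// True when the counter has moved since `started_at` was taken.
    pub fn is_cancelled(&self, started_at: u64) -> bool {
        self.generation.load(Ordering::SeqCst) != started_at
    }
}

/// Stand-in for an embedding loop: checks the token before each batch and
/// returns how many batches were processed before cancellation.
pub fn embed_batches(token: &CancellationToken, started_at: u64, batches: &[Vec<String>]) -> usize {
    let mut done = 0;
    for _batch in batches {
        if token.is_cancelled(started_at) {
            break; // a reconfigure arrived; exit cooperatively
        }
        done += 1; // a real build would call the embedding provider here
    }
    done
}
```

Because the token is cheap to clone, a reconfigure handler can hold one clone and call `cancel()` while the build thread polls its own clone between batches.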
Add Perplexity backend with InputMode::DocumentChunks support for contextualized embedding where chunks carry document-level context.
- SemanticBackend::Perplexity variant with config, profile, engine
- DocumentChunks/PerDocumentChunks/DocumentEmbeddings structs
- embed_document_chunks() routes Perplexity to grouped embedding API
- build_with_progress_contextualized() groups chunks by document
- Wire configure.rs to branch on input_mode: DocumentChunks
- SemanticEmbeddingModel::input_mode() public accessor
- EmbeddingModelProfile with contextualized_supported guard
- Response validation: index continuity, missing documents, dimension
…to trait-backed module
Bead: aft-t6p.12
Extracts Vec<EmbeddingEntry> storage and search from SemanticIndexSnapshot into a VectorStore trait with FlatF32VectorStore implementation. This decouples the storage layer from the lifecycle logic and prepares for alternative backends (binary Hamming, approximate ANN).
Key changes:
- vector_store.rs: VectorStore trait + ScoredChunk/PruneStats types
- FlatF32VectorStore: flat scan with cosine similarity (preserves existing behaviour exactly)
- FlatBinaryHammingVectorStore: forward-looking Hamming-search impl
- SemanticIndexSnapshot delegates search/len/prune/entries to store
- Fixed dimension-sync bug where set_dimension updated the snapshot dimension but not the store dimension, causing search to return 0
- EmbeddingEntry and IndexedFileMetadata made pub for trait compatibility
On Windows, use copyFileSync for the binary replacement (which overwrites the target — renameSync fails with EEXIST). If it fails, the original binary at binaryPath is preserved. The temp file cleanup is now wrapped in its own try/catch so a cleanup failure does NOT propagate as a download failure — the binary was already successfully placed at binaryPath. Addresses PR cortexkit#69 cubic review finding P2.
Implement bead aft-t6p.24: file identity manifest + vector ownership records.
Changes:
- **FileRecord struct**: identity record with content_hash, size_bytes, mtime, language, document_kind, inclusion_policy_hash, indexed_at
- **file_manifest on SemanticIndexSnapshot**: HashMap<PathBuf, FileRecord> tracking which files produced which vectors, enabling precise stale-vector pruning when files are edited, deleted, or excluded
- **V8 serialization format**: extends V7 with per-entry chunk_hash (after each vector) and file manifest block (after all entry vectors). Full backward compatibility with V1-V7 reads.
- **chunk_hash on EmbeddingEntry**: deterministic hash of chunk content fields for tracing which version of a chunk produced a stored vector
- **compute_chunk_hash**: blake3-based deterministic hash
- **build_manifest_from_store helper**: populates file_manifest from store's file_metadata, called in all builder functions (build_from_chunks, build_with_progress_contextualized, refresh_stale_files) and from_bytes for V1-V7 cache migration
- **next_chunk_id, fingerprint_string**: forward-looking fields on snapshot for future unique ID assignment and fingerprint tracking
…rmalization, and model profiles
Adds aft-t6p.20 (Typed embedding vector representation + storage-strategy resolution):
- TypedVector (source-side) and StoredVector (persisted) enums with DenseF32, DenseInt8, BinaryPacked, and Quantized variants
- StorageStrategy (NativeF32, DecodeNormalizeF32, BinaryPacked)
- VectorKind enum for runtime type tagging
- DistanceMetric (Cosine, DotProduct, Euclidean, Hamming)
- NormalizationPolicy (AlreadyNormalized, NormalizeOnInsertQuery, NotApplicable)
- EmbeddingModelProfile fields: source_vector_kind, stored_vector_kind, metric, normalization
- convert_vector() / validate_compatible() on EmbeddingModelProfile
- blake3 dependency for chunk hashing
… + dummy base_url for Perplexity profile test
Two fixes for `fingerprint_invalidation_tests`:
- Mock HTTP server now lowercases header names before matching Content-Length (reqwest/hyper sends lowercase `content-length:`).
- `base64_int8_profile_from_config_selects_correctly` test provides a dummy `base_url` for the Perplexity backend (required by `from_config`).
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
- Add StorageStrategy::BinaryPacked variant for packed-bit vector storage
- Add EmbeddingModelProfile::perplexity_binary() with BinaryPacked → Hamming path
- Wire from_config to select perplexity_binary profile when Base64Binary encoding
- Implement parse_embedding_value for Base64Binary (decode → 0.0/1.0 f32 vec)
- Implement into_stored for TypedVector::BinaryPacked (requires BinaryPacked strategy)
- Update validate_config and validate_compatible to accept Base64Binary+BinaryPacked
- Replace old "not yet supported" test with parse_embedding_value_base64_binary_succeeds
- 886/893 tests pass (7 pre-existing Docker failures)
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
Add semantic_diagnostics module with SearchDiagnostics, SearchPipelineType, SearchWarning, SearchMetricsCollector, PhaseTimer, score_statistics, top1_margin. Instrument handle_semantic_search with per-phase timing and warning collection. Wire SearchMetricsCollector into AppContext. 17 new tests, 902/910 lib tests pass (8 pre-existing Docker failures).
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
- Add SemanticDiagnosticsLogger with file append, rotation (50 MB), and retention cleanup (file-deletion based on mtime)
- Add SearchDiagnosticsEvent struct for JSONL serialization with raw_query redaction (opt-in via include_raw_queries) and snippet placeholder (include_snippets)
- Add config fields: jsonl_logging, jsonl_path, include_raw_queries, include_snippets, retention_days to SemanticBackendConfig
- Add lazy-init diagnostics_logger on AppContext with resolve_diagnostics_log_path helper (env var → project root → ~/.cache)
- Wire JSONL record into handle_semantic_search diagnostics block
- 4 new tests: raw query redaction, raw query inclusion, disk write verification, missing-file recovery
- 907/914 lib tests pass (7 pre-existing Docker failures)
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
…rch output
Add DiagnosticsOutputMode enum (Off/Minimal/Verbose) and output_mode field to SemanticBackendConfig. Implement format_diagnostics_prefix() for Minimal (warnings only) and Verbose (scores + latency + warnings) output modes. Wire into handle_semantic_search response text. 4 new tests, 25 diagnostics tests total. 910/918 lib tests pass (8 pre-existing Docker failures).
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
Add optional reranking via OpenAI-compatible chat endpoint. When enabled, aft_search overfetches candidates, sends them to a reranker model, and re-sorts by relevance. Falls back gracefully on any error.
- Add RerankConfig fields to SemanticBackendConfig (rerank_enabled, rerank_model, rerank_base_url, rerank_api_key_env, rerank_timeout_ms, rerank_max_candidates)
- Create semantic_rerank.rs with RerankerClient, RerankOutcome enum, and rerank_candidates function
- Add RerankerFailure warning variant to SearchWarning
- Wire reranking into handle_semantic_search (overfetch → rerank → re-sort)
- Add rerank_latency_ms to SearchDiagnostics and SearchDiagnosticsEvent
- Include rerank latency in verbose diagnostics output
- 6 unit tests for reranker parsing, skip conditions, and failure handling
All 25 diagnostics + 6 reranker tests pass. 917/924 total tests pass (7 pre-existing Docker infrastructure failures).
Add 40+ unit tests to fingerprint_invalidation_tests covering:
- SemanticBackendConfig deserialization (minimal, all-fields, defaults)
- EmbeddingModelProfile validation for all encoding types
- TypedVector conversion and StoredVector roundtrip
- convert_vector and validate_compatible rejection paths
- Distance metric auto-resolution for f32/int8/binary
- base64_int8 signed int8 decode correctness
- Template hashing, enum roundtrips, resolve helpers
Minor: add #[derive(Debug)] to StoredVector for test ergonomics.
Closes aft-t6p.6.1
Add 6 new tests to fingerprint_invalidation_tests covering:
- file_policy_hash mismatch triggers rebuild
- docs_chunker_version mismatch triggers rebuild
- multi-field changes still trigger rebuild
- rebuild+query_prompt: rebuild wins
- only query_prompt change: ClearQueryCache
- non-fingerprint field changes: NoChange
Total: 22 fingerprint tests.
Closes aft-t6p.6.2
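The precedence rules these tests list (any structural mismatch means rebuild, rebuild wins over a prompt change, a prompt-only change clears the query cache) can be sketched as follows. This is an illustration under assumptions: the `Fingerprint` struct below carries only a handful of hypothetical fields, and the real `SemanticIndexFingerprint::diff()` may be shaped differently.

```rust
/// Hypothetical, reduced fingerprint; the real one tracks more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Fingerprint {
    pub backend: String,
    pub model: String,
    pub dimension: usize,
    pub file_policy_hash: u64,
    pub query_prompt_hash: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum FingerprintDiff {
    NoChange,
    ClearQueryCache,
    Rebuild,
}

/// Classify the change between two fingerprints.
pub fn diff(old: &Fingerprint, new: &Fingerprint) -> FingerprintDiff {
    // Structural fields change what the stored vectors mean, so any
    // mismatch here forces a full re-embed.
    let structural = old.backend != new.backend
        || old.model != new.model
        || old.dimension != new.dimension
        || old.file_policy_hash != new.file_policy_hash;

    if structural {
        FingerprintDiff::Rebuild // wins even if the query prompt also changed
    } else if old.query_prompt_hash != new.query_prompt_hash {
        FingerprintDiff::ClearQueryCache // stored vectors are still valid
    } else {
        FingerprintDiff::NoChange
    }
}
```

Checking the structural fields first is what makes "rebuild wins" fall out naturally: the prompt hash is only consulted when the stored vectors are known to be reusable.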
Add 29 tests covering:
- is_generated_file: protobuf, minified, dist, build, generated, dart
- is_doc_extension and is_config_extension validation
- classify_semantic_file for code/doc/config
- collect_docs_chunks markdown heading splitting
- SemanticFilePolicy defaults and builtin globs
- FileRecord field population
- build_manifest_from_store construction and cleanup
Closes aft-t6p.6.3
… tests
Add 23 tests covering:
- FlatF32VectorStore: search, empty, dimension mismatch, CRUD, prune, stats
- FlatBinaryHammingVectorStore: search, ranking, prune, delete, stats
- hamming_distance and popcount64 correctness
- Binary decode: byte-aligned, non-byte-aligned, padding, error
Closes aft-t6p.6.4
Add 8 tests covering:
- SemanticIndexLifecycle: cold start, set/get, failed+error, all variants
- SemanticIndexSnapshot: search ranking, immutability after clone
- VectorStore: prune_stale_vectors, prune_orphans
Closes aft-t6p.6.5
Add 10 tests covering:
- HybridRerank pipeline type display
- Metrics collector: window size 1, cache hit rate, zero result rate, low confidence rate, latency percentiles
- Diagnostics output mode defaults
- Warning formatting: minimal (all variants, verifies suppressed), verbose (all 9 variants)
- SearchWarning serde roundtrip for all 8 variants
Closes aft-t6p.6.6
Add 4 tests covering:
- Concurrent snapshot clones produce independent results
- Concurrent read threads see identical data via Arc
- Mutex contention across 10 threads does not deadlock
- Arc strong_count tracks clone/drop correctly
Closes aft-t6p.6.7
Add 6 tests covering:
- Trust file atomic write (no tmp files left behind)
- Multiple projects trusted independently
- Untrust is idempotent
- Trust state survives reload (serde roundtrip)
- Nonexistent project path is untrusted (fail-closed)
Closes aft-t6p.6.8
The validate_compatible_rejects_binary_stored_with_cosine_metric test was missing source_vector_kind: BinaryPacked, causing the first match block to fail with 'unsupported source→stored vector conversion' instead of reaching the metric compatibility check.
Add local retrieval evaluation harness for measuring semantic search quality.
New files:
- crates/aft/src/semantic_eval.rs: pure-logic module with:
  - EvalCase, EvalResult, EvalSummary structs
  - JSONL parser (tolerates blank lines and comments)
  - path_matches(): cross-platform suffix matching
  - symbol_matches(): Rust/other-language symbol normalization
  - score_case(): per-case recall@k and MRR scoring
  - score_suite(): aggregate metrics across a suite
- crates/aft/src/commands/semantic_eval.rs: handler wiring:
  - Reads .aft/semantic-eval.jsonl, returns EvalSummary as JSON
  - Supports top_k override and include_per_case toggle
  - Returns tri-state response per AFT honest reporting convention
Wiring:
- crates/aft/src/lib.rs: pub mod semantic_eval
- crates/aft/src/commands/mod.rs: pub mod semantic_eval
- crates/aft/src/main.rs: dispatch semantic_eval command
Tests: 44 tests passing (parser, matcher, scorer, handler)
Add semantic_doctor command that produces a SemanticHealthReport gathering:
- Config summary (backend, model, dimensions, metric, prompts, rerank)
- Index state (lifecycle, entry count, dimension, fingerprint freshness)
- Search quality metrics (p50/p95 latency, zero-result/low-confidence rates)
- Provider connectivity (optional probe)
- Active warnings and actionable suggestions
New files:
- crates/aft/src/semantic_doctor.rs: HealthStatus, ConfigSummary, IndexSummary, MetricsSummary, ProviderSummary, Suggestion, SemanticHealthReport structs with Serialize and Display impls
- crates/aft/src/commands/semantic_doctor.rs: command handler with optional probe_provider param, suggestion generation for disabled/building/failed/ready states, 7 handler tests + 6 model tests
Wiring:
- crates/aft/src/lib.rs: pub mod semantic_doctor
- crates/aft/src/commands/mod.rs: pub mod semantic_doctor
- crates/aft/src/main.rs: dispatch "semantic_doctor" command
Also: fix semantic_eval temp directory race condition (atomic counter).
Tests: 14 semantic_doctor + 44 semantic_eval passing, check+clippy+fmt clean.
Extend the semantic_index_info section of the status command to include:
- Search quality metrics (total_queries, p50/p95 latency, zero_result_rate, low_confidence_rate, embedding_failure_rate, lexical_failure_rate)
- Rerank status (rerank_enabled, rerank_model)
- Diagnostics state (diagnostics_enabled, prompt_active)
The TUI/status surfaces can now show pipeline health without a separate semantic_doctor call. Metrics are zero when no queries have been recorded.
Tests: status + semantic_doctor tests passing, check+clippy+fmt clean.
- Add 3 new tests: markdown-fence parsing, snippet truncation, max_candidates limit
- Fix missing-ID append: semantic_search now appends missing indices in original order
- Add max_candidate_chars config field (default 2500) to SemanticBackendConfig
- Use config.rerank_max_candidate_chars instead of hardcoded 200 in reranker
- Update all test configs with new field
Bead: aft-t6p.2.1
4 issues found across 107 files
Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.
Remove .beads/, .qartez/, .claude/, .omo/, .kiro/, .lean-ctx/ from the branch. These are local agent working directories that should not be distributed. Add them to .gitignore to prevent future accidents. Addresses cubic review comments on PR cortexkit#87.
1 issue found across 69 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name=".gitignore">
<violation number="1" location=".gitignore:95">
P2: Inconsistent .gitignore pattern: `omo/` should likely be `.omo/` to match the hidden tooling directory convention used by all other entries in this block.</violation>
</file>
.beads/
.qartez/
.claude/
omo/
P2: Inconsistent .gitignore pattern: omo/ should likely be .omo/ to match the hidden tooling directory convention used by all other entries in this block.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .gitignore, line 95:
<comment>Inconsistent .gitignore pattern: `omo/` should likely be `.omo/` to match the hidden tooling directory convention used by all other entries in this block.</comment>
<file context>
@@ -87,3 +87,11 @@ benchmarks/aft-search/.bench/
+.beads/
+.qartez/
+.claude/
+omo/
+.kiro/
+.lean-ctx/
</file context>
// Parse response — try "choices[0].message.content" JSON first.
let content: String = match serde_json::from_str::<serde_json::Value>(&text) {
    Ok(v) => v
        .get("choices")
        .and_then(|c| c.as_array())
        .and_then(|c| c.first())
        .and_then(|c| c.get("message"))
        .and_then(|m| m.get("content"))
        .and_then(|c| c.as_str())
        .map(|s| s.to_string())
        .unwrap_or(text.clone()),
    Err(_) => text.clone(),
};

// Parse the content as a JSON array of indices.
let indices = serde_json::from_str::<Vec<usize>>(&content)
    .or_else(|_| {
        // Try extracting from a JSON object with an "indices" field.
        serde_json::from_str::<serde_json::Value>(&content)
            .ok()
            .and_then(|v| {
                v.get("indices")
                    .or_else(|| v.get("rank"))
                    .or_else(|| v.get("order"))
                    .and_then(|a| serde_json::from_value::<Vec<usize>>(a.clone()).ok())
            })
            .ok_or(())
    })
    .map_err(|_| {
        format!(
            "reranker response did not contain a JSON array of indices: {}",
            content.chars().take(100).collect::<String>()
        )
    });
Markdown-fenced JSON response never stripped in production path
The test rerank_parses_markdown_fenced_json demonstrates the need but strips the fences inline in the test body — the actual rerank_candidates function passes the LLM response straight to serde_json::from_str without stripping ```json … ``` wrappers. Many chat models consistently return JSON inside code fences regardless of response_format: json_object. When they do, both the Vec<usize> parse and every indices/rank/order field lookup fail, rerank_candidates returns RerankOutcome::Failed, and the reranker silently becomes a no-op for those models. The fix is to strip leading ```json / ``` and trailing ``` from content before the two parse attempts.
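A minimal sketch of the fence-stripping step this comment asks for, using only the standard library. The function name `strip_markdown_fences` is borrowed from a later commit message in this PR, but the body below is an assumption, not the code that was merged; it also assumes the opening fence line holds nothing but an optional language tag.

```rust
const FENCE: &str = "```";

/// Hypothetical helper: return `content` without a surrounding Markdown
/// code fence. Unfenced input is returned trimmed and otherwise unchanged.
pub fn strip_markdown_fences(content: &str) -> &str {
    let trimmed = content.trim();

    // Only act when the payload actually starts with a fence.
    let Some(rest) = trimmed.strip_prefix(FENCE) else {
        return trimmed;
    };

    // Drop the opening fence line, which may carry a language tag ("json").
    // For a single-line fence there is no newline, so drop the tag directly.
    let body = match rest.split_once('\n') {
        Some((_language_tag, body)) => body,
        None => rest.strip_prefix("json").unwrap_or(rest),
    };

    // Drop the closing fence if present, then any leftover whitespace.
    let body = body.trim_end();
    body.strip_suffix(FENCE).unwrap_or(body).trim()
}
```

Applied to `content` before both parse attempts, this would let the existing `Vec<usize>` and `indices`/`rank`/`order` paths run unchanged on fenced and unfenced responses alike.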
Remove .alfonso/, agents.md, beads-data-*.jsonl, magic-context-*.md, biome.json_ from the branch. Add them to .gitignore to prevent future inclusion in PRs.
Restore .alfonso/ from main (it exists upstream). Keep agents.md, beads-data-*.jsonl, magic-context-*.md, biome.json_ removed and gitignored since they don't exist on main.
const SemanticConfigSchema = z.object({
  /** Semantic backend type: local fastembed, OpenAI-compatible API, or Ollama. */
  backend: SemanticBackendEnum.optional(),
  /** Model identifier passed to the selected semantic backend. */
  model: z.string().trim().min(1).optional(),
  /** Base URL of the backend API endpoint. */
  base_url: z.string().trim().min(1).optional(),
  /** Environment variable that contains the API key used by external backends. */
  api_key_env: z.string().trim().min(1).optional(),
  /** Backend request timeout in milliseconds. */
  timeout_ms: z.number().int().positive().optional(),
  /** Maximum batch size used by the semantic pipeline. */
  max_batch_size: z.number().int().positive().optional(),
  /** Output encoding for embedding vectors: "float" (default), "binary", "ubinary", "int8", or "uint8". */
  output_encoding: SemanticOutputEncodingEnum.optional(),
  /** Storage strategy: "flat" (default) or "binary_pack". */
  storage_strategy: SemanticStorageStrategyEnum.optional(),
  /** Input mode for document processing: "flat_texts" (default), "chunk_extracts", or "contextualized". */
  input_mode: SemanticInputModeEnum.optional(),
  /** Embedding dimension count (for providers that support variable dimensions). */
  dimensions: z.number().int().positive().optional(),
  /** Distance metric: "cosine" (default), "dot", or "hamming". */
  distance_metric: SemanticDistanceMetricEnum.optional(),
  /** Optional query prompt template (applied before embedding queries). */
  query_prompt_template: z.string().optional(),
  /** Optional document prompt template (applied before embedding documents). */
  document_prompt_template: z.string().optional(),
});
Reranking config fields not declared in SemanticConfigSchema: silently stripped by Zod
SemanticBackendConfig on the Rust side gained 10+ new fields for reranking and diagnostics (rerank_enabled, rerank_model, rerank_base_url, rerank_api_key_env, rerank_timeout_ms, rerank_max_candidates, rerank_max_candidate_chars, diagnostics_enabled, output_mode, jsonl_logging, etc.), but none of them appear in the TypeScript SemanticConfigSchema. Zod's z.object({...}) strips unknown keys by default, so any user who adds rerank_enabled: true (or any other new field) to their semantic block in aft.jsonc will have it silently removed during parsing. The value never reaches Rust, reranking never activates, and there is no warning. This renders the reranking feature completely unconfigurable through the standard plugin config path.
// model had reasonable matches the agent could have judged. Surface every
// hit with its score so the caller can decide.

*ctx.semantic_index_status().borrow_mut() = SemanticIndexStatus::Ready;
Successful search unconditionally overwrites Partial index status with Ready
Every successful search ends with *ctx.semantic_index_status().borrow_mut() = SemanticIndexStatus::Ready, even when the background build is still in progress. When a user queries while the index is Partial (e.g., 40% built), the search falls through (allowed by design), then this line overwrites the status — dropping the completeness percentage and the entry counts the build thread wrote. The background thread will overwrite it again on its next tick, but until then the status command and TUI sidebar show "ready" for an incomplete index.
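One way to address this, sketched under assumptions: guard the assignment so a `Partial` status written by the build thread is left alone. The enum below is a reduced stand-in (the field names and the `Failed` variant are hypothetical), and `mark_ready_after_search` is an illustrative helper, not a function from the PR.

```rust
use std::cell::RefCell;

/// Reduced stand-in for the real status enum; field names are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum SemanticIndexStatus {
    Failed(String),
    Partial { completeness_pct: u8, entries: usize },
    Ready,
}

/// Promote the status to `Ready` after a successful search, unless a
/// background build is still reporting partial progress.
pub fn mark_ready_after_search(status: &RefCell<SemanticIndexStatus>) {
    let mut current = status.borrow_mut();
    if !matches!(*current, SemanticIndexStatus::Partial { .. }) {
        *current = SemanticIndexStatus::Ready;
    }
    // When Partial, the build thread stays the only writer of the
    // completeness percentage and entry count.
}
```

With this guard, the status command and TUI keep showing the build thread's last reported completeness instead of flipping to "ready" between ticks.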
Source code for semantic search functionality for public preview. Here are the implementation plans for sprints under this epic (in gastown beads format):
1. Fix duplicate entries in reranked output (greptile P1)
   - Add !used[i] check in filter_map to prevent duplicate indices
   - File: crates/aft/src/commands/semantic_search.rs
2. Strip markdown fences from LLM reranker responses (greptile P1)
   - Many chat models wrap JSON in code fences
   - Add strip_markdown_fences() helper applied before parsing
   - File: crates/aft/src/semantic_rerank.rs
3. Align TypeScript enum values with Rust serde (cubic P1)
   - SemanticBackendEnum: add perplexity variant
   - SemanticOutputEncodingEnum: float, base64_int8, base64_binary
   - SemanticStorageStrategyEnum: native_f32, decode_normalize_f32, binary_packed
   - SemanticInputModeEnum: flat_texts, document_chunks
   - SemanticDistanceMetricEnum: auto, cosine, dot_product, euclidean, hamming
   - File: packages/opencode-plugin/src/config.ts
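The duplicate-index guard from item 1 can be sketched together with the "append missing indices in original order" behaviour mentioned in an earlier commit of this PR. The function name `apply_rerank_order` and its signature are hypothetical; the real code lives in a `filter_map` inside `semantic_search.rs` and may differ.

```rust
/// Reorder `candidates` by the reranker's index list.
/// Duplicate and out-of-range indices are ignored, and any candidate the
/// reranker left out is appended afterwards in its original order.
pub fn apply_rerank_order<T: Clone>(candidates: &[T], ranked: &[usize]) -> Vec<T> {
    let mut used = vec![false; candidates.len()];
    let mut out = Vec::with_capacity(candidates.len());

    for &i in ranked {
        // The `!used[i]` check is what prevents duplicate entries.
        if i < candidates.len() && !used[i] {
            used[i] = true;
            out.push(candidates[i].clone());
        }
    }

    // Nothing is dropped: omitted candidates keep their retrieval order.
    for (i, candidate) in candidates.iter().enumerate() {
        if !used[i] {
            out.push(candidate.clone());
        }
    }
    out
}
```

Under these rules the output always has exactly one entry per input candidate, whatever the reranker returns.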
1 issue found across 4 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode-plugin/src/config.ts">
<violation number="1" location="packages/opencode-plugin/src/config.ts:40">
P2: Semantic enum literals were renamed without backward-compatibility aliases or migration, breaking existing configs that use old values.</violation>
</file>
const SemanticBackendEnum = z.enum(["fastembed", "openai_compatible", "ollama", "perplexity"]);

/** Output encoding mode for embeddings. */
const SemanticOutputEncodingEnum = z.enum(["float", "base64_int8", "base64_binary"]);
P2: Semantic enum literals were renamed without backward-compatibility aliases or migration, breaking existing configs that use old values.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode-plugin/src/config.ts, line 40:
<comment>Semantic enum literals were renamed without backward-compatibility aliases or migration, breaking existing configs that use old values.</comment>
<file context>
@@ -34,19 +34,19 @@ const CheckerEnum = z.enum([
/** Output encoding mode for embeddings. */
-const SemanticOutputEncodingEnum = z.enum(["float", "binary", "ubinary", "int8", "uint8"]);
+const SemanticOutputEncodingEnum = z.enum(["float", "base64_int8", "base64_binary"]);
/** Storage strategy for embedding vectors. */
</file context>
Summary
Semantic search in AFT moves from a minimal embedding-and-cosine prototype to a provider-capability-aware retrieval subsystem with typed vectors, optional reranking, background lifecycle management, diagnostics, and evaluation tooling. This is a public preview — the feature is functional and tested (~93 new tests) but expects iteration based on real-world feedback.
What changed
The upgrade touches the full semantic pipeline (config, indexing, retrieval, diagnostics, and observability) without breaking the default `fastembed` experience.
Typed vector representations
Vectors are no longer opaque f32 blobs. Every stored vector carries explicit type metadata (`DenseF32`, `Int8SourceDecoded`, `BinaryPacked`) and is paired with its source kind so the correct distance metric is selected automatically. Binary packed vectors use Hamming search (native bitwise XOR + popcount) instead of cosine, which is both faster and semantically correct for quantized embeddings. This unlocks Perplexity's `base64_binary` and `base64_int8` output modes alongside standard dense providers.
Provider capability profiles
Each embedding backend (fastembed, OpenAI-compatible, Ollama, Perplexity) declares what it supports: output encoding, distance metric, dimension range, max batch size. The config layer validates combinations at configure time — you cannot accidentally request binary vectors through a cosine-only provider. Profiles also carry fingerprint fields so switching providers triggers a clean index rebuild rather than silent corruption.
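To make the configure-time check concrete, here is a minimal sketch of a capability profile that rejects unsupported vector kinds and picks the matching metric, plus the XOR-and-popcount Hamming distance used for packed binary vectors. All type and function names here (`ProviderProfile`, `resolve_metric`, `hamming_distance`) are illustrative assumptions, and the two-variant enums are deliberately smaller than the real ones.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VectorKind {
    DenseF32,
    BinaryPacked,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DistanceMetric {
    Cosine,
    Hamming,
}

/// Hypothetical capability profile: which vector kinds a backend can emit.
pub struct ProviderProfile {
    pub name: &'static str,
    pub supported_kinds: &'static [VectorKind],
}

/// Validate a (provider, vector kind, requested metric) combination and
/// return the metric to use. `None` means "pick automatically".
pub fn resolve_metric(
    profile: &ProviderProfile,
    kind: VectorKind,
    requested: Option<DistanceMetric>,
) -> Result<DistanceMetric, String> {
    if !profile.supported_kinds.contains(&kind) {
        return Err(format!("{} does not support {:?} vectors", profile.name, kind));
    }
    // Simplification: each kind has exactly one valid metric in this sketch.
    let natural = match kind {
        VectorKind::DenseF32 => DistanceMetric::Cosine,
        VectorKind::BinaryPacked => DistanceMetric::Hamming,
    };
    match requested {
        None => Ok(natural),
        Some(metric) if metric == natural => Ok(metric),
        Some(metric) => Err(format!("{:?} is not valid for {:?} vectors", metric, kind)),
    }
}

/// Hamming distance over bit-packed vectors: XOR each word, count set bits.
pub fn hamming_distance(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

Failing at configure time, rather than at search time, is what keeps an invalid combination from ever producing a half-built index.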
Fingerprint-driven index lifecycle
A `SemanticIndexFingerprint` captures every dimension that affects index correctness: backend, model, base_url, dimension, chunking_version, output_encoding, storage_strategy, vector kinds, normalization, and prompt hashes. `diff()` classifies changes as `Rebuild` (structural: re-embed everything), `ClearQueryCache` (query prompts changed: invalidate cached results only), or `None`. This replaces the previous "delete and hope" invalidation with precise, explainable rebuild decisions.
Non-blocking cold start
Index builds run in a background thread with cooperative cancellation (`SemanticCancellationToken` via `AtomicU64` generation counter). The build checks the generation before each embedding batch and exits early when a reconfigure arrives. Priority ordering ensures high-value files (recently edited, high PageRank) get embedded first. Exponential backoff handles transient provider failures without blocking the session.
Stale-vector pruning
When files are edited, deleted, moved, excluded, or re-included, the index tracks which vectors are stale and prunes them during the next refresh cycle. Every vector record carries file/chunk ownership metadata (file path, version, chunk hash, index fingerprint) so pruning is traceable and deterministic.
File policy and docs chunking
A configurable file policy controls which files enter the index (include globs, exclude globs, max file size, max chunk count). The docs chunker splits Markdown and documentation files into semantic sections before embedding, improving recall for documentation-shaped queries.
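Heading-based splitting of Markdown can be sketched as below. This is a simplified illustration, not the PR's `collect_docs_chunks()`: the `DocSection` type and function name are hypothetical, only ATX headings (`#` followed by a space) are recognised, and a `#` line inside a fenced code block would be misread as a heading.

```rust
#[derive(Debug, PartialEq)]
pub struct DocSection {
    /// Heading text without the leading '#' characters; empty for a preamble.
    pub heading: String,
    pub body: String,
}

/// Split Markdown into one section per heading. Text before the first
/// heading becomes a section with an empty heading.
pub fn split_markdown_by_heading(source: &str) -> Vec<DocSection> {
    let mut sections = Vec::new();
    let mut heading = String::new();
    let mut body = String::new();

    for line in source.lines() {
        // "# Title" is a heading; "#hashtag" (no space) is not.
        let is_heading = line.starts_with('#') && line.trim_start_matches('#').starts_with(' ');
        if is_heading {
            flush(&mut sections, &heading, &body);
            heading = line.trim_start_matches('#').trim().to_string();
            body.clear();
        } else {
            body.push_str(line);
            body.push('\n');
        }
    }
    flush(&mut sections, &heading, &body);
    sections
}

/// Push the accumulated section unless it is completely empty.
fn flush(sections: &mut Vec<DocSection>, heading: &str, body: &str) {
    let body = body.trim();
    if !heading.is_empty() || !body.is_empty() {
        sections.push(DocSection {
            heading: heading.to_string(),
            body: body.to_string(),
        });
    }
}
```

Each resulting section can then be embedded as its own chunk, so a query about one topic matches the relevant section rather than the whole file.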
Reranking pipeline
Optional reranking via any OpenAI-compatible `/v1/rerank` or chat-completion endpoint. The pipeline sends initial retrieval candidates to a reranker, parses the response (supporting multiple JSON shapes), and reorders results with safe fallback: if the reranker fails, the original cosine-similarity order is returned unchanged. Config fields: `rerank.enabled`, `rerank.model`, `rerank.base_url`, `rerank.api_key_env`, `rerank.max_candidates`.
Search pipeline metrics and diagnostics
Every `aft_search` call records timing, cache hits/misses, result counts, and reranker fallback events. Metrics are exposed through the `status` command and through JSONL diagnostic logs for offline analysis. The `DiagnosticsOutputMode` config controls verbosity in tool output (`compact` | `verbose` | `off`).
Semantic doctor
`semantic_doctor` is a health-check command that reports config summary, index summary, metrics summary, provider summary, and actionable suggestions. Use it to verify that the index is healthy, the provider is reachable, and the configuration is consistent.
Semantic eval harness
`semantic_eval` runs a JSONL-defined evaluation suite against the semantic index. Each case specifies a query, expected paths, expected symbols, and top-k. The harness computes recall@k and MRR (Mean Reciprocal Rank) for quantifying retrieval quality across config changes.
Status integration
The `status` command now includes semantic health metrics: lifecycle state, entry count, dimension, total queries, cache hit ratio, average query time, and provider info. The OpenCode TUI sidebar surfaces these alongside the existing index state.
Config trust boundary
`backend`, `base_url`, and `api_key_env` are user-only fields: project-level `aft.jsonc` cannot inject these. A hostile repository cannot redirect embeddings at an attacker-controlled endpoint or exfiltrate API keys. The plugin logs a warning when it strips a project-level setting.
Contextualized document-chunk embedding (partial)
Initial support for Perplexity-style document/chunk grouped embedding — chunks from the same source document are batched together rather than flattened. Oversized document handling and retry logic are still in progress (see roadmap).
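The grouping step (chunks from one document batched together instead of flattened) can be sketched as follows. `Chunk`, `DocumentGroup`, and `group_chunks_by_document` are hypothetical names for illustration; the PR's own structs and request shape may differ.

```rust
use std::collections::HashMap;

pub struct Chunk {
    pub doc_path: String,
    pub text: String,
}

/// All chunk texts belonging to one source document, in encounter order.
#[derive(Debug, PartialEq)]
pub struct DocumentGroup {
    pub doc_path: String,
    pub chunks: Vec<String>,
}

/// Group a flat chunk list by source document. Documents appear in
/// first-seen order, and chunk order inside a document is preserved.
pub fn group_chunks_by_document(chunks: &[Chunk]) -> Vec<DocumentGroup> {
    let mut groups: Vec<DocumentGroup> = Vec::new();
    let mut index_of: HashMap<&str, usize> = HashMap::new();

    for chunk in chunks {
        // Look up the group for this document, creating it on first sight.
        let idx = *index_of.entry(chunk.doc_path.as_str()).or_insert_with(|| {
            groups.push(DocumentGroup {
                doc_path: chunk.doc_path.clone(),
                chunks: Vec::new(),
            });
            groups.len() - 1
        });
        groups[idx].chunks.push(chunk.text.clone());
    }
    groups
}
```

Preserving order matters here because a grouped embedding response would need to be mapped back to chunks by position within each document.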
How to test
Default fastembed (zero-config)
Verify: results appear with `source: semantic` or `source: hybrid` tags. Status shows `[index: ready]` after build completes.
Provider switching
Verify: index rebuilds automatically on next session start. Status shows new provider/model.
Reranking
    {
      "semantic_search": true,
      "semantic": {
        "backend": "openai_compatible",
        "model": "text-embedding-3-small",
        "base_url": "https://api.openai.com/v1",
        "api_key_env": "OPENAI_API_KEY"
      },
      "rerank": {
        "enabled": true,
        "model": "rerank-english-v3.0",
        "base_url": "https://api.cohere.com",
        "api_key_env": "COHERE_API_KEY"
      }
    }
Verify: search results show reranker-sorted order. Disable the reranker: results fall back to cosine order.
Semantic doctor
aft_search({ "query": "test" })  # trigger index build if cold
# Then check health via status command or semantic_doctor
Verify: health report shows ConfigSummary, IndexSummary, MetricsSummary, ProviderSummary.
Eval harness
Verify: returns recall@k and MRR scores.
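For reference, the two scores can be computed as in the sketch below. It uses the common definitions: recall@k is the fraction of expected paths found in the top k results, and the reciprocal rank is 1/rank of the first expected path, or 0 if none appears. Whether the harness uses exactly these definitions, or also scores expected symbols the same way, is an assumption, and the function names are illustrative.

```rust
/// Fraction of `expected` paths that appear in the first `k` ranked results.
pub fn recall_at_k(expected: &[String], ranked: &[String], k: usize) -> f64 {
    if expected.is_empty() {
        return 0.0;
    }
    let top = &ranked[..k.min(ranked.len())];
    let hits = expected.iter().filter(|e| top.contains(*e)).count();
    hits as f64 / expected.len() as f64
}

/// 1 / rank of the first expected path within the top `k`, or 0.0 if absent.
pub fn reciprocal_rank(expected: &[String], ranked: &[String], k: usize) -> f64 {
    ranked
        .iter()
        .take(k)
        .position(|r| expected.contains(r))
        .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
}

/// Mean over per-case scores; averaging reciprocal ranks gives MRR.
pub fn mean(values: &[f64]) -> f64 {
    if values.is_empty() {
        0.0
    } else {
        values.iter().sum::<f64>() / values.len() as f64
    }
}
```

Averaging `reciprocal_rank` over all cases with `mean` gives the suite-level MRR; averaging `recall_at_k` gives mean recall@k.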
Test coverage
~93 tests across 8 test sub-tasks covering:
Roadmap
Still in progress or planned for follow-up:
Architecture notes
Key new modules:
- crates/aft/src/semantic_rerank.rs: reranking pipeline with safe fallback
- crates/aft/src/semantic_diagnostics.rs: JSONL diagnostic logging
- crates/aft/src/semantic_doctor.rs: health-check report generation
- crates/aft/src/semantic_eval.rs: evaluation harness (JSONL parser, scoring)
- crates/aft/src/vector_store.rs: VectorStore trait with DenseF32 and BinaryPacked implementations
- crates/aft/src/commands/semantic_doctor.rs: doctor command handler
- crates/aft/src/commands/semantic_eval.rs: eval command handler
Modified significantly:
- crates/aft/src/semantic_index.rs: lifecycle management, fingerprint-driven invalidation, non-blocking build, stale pruning, typed vectors
- crates/aft/src/config.rs: provider profiles, rerank config, trust boundary fields
- crates/aft/src/commands/status.rs: semantic health metrics
- crates/aft/src/commands/semantic_search.rs: reranking integration, diagnostics output mode
Summary by cubic
Upgrades semantic search to a provider-aware pipeline with typed vectors, optional reranking, partial-index querying, and built-in diagnostics/eval. Adds Perplexity support and binary/int8 vectors with Hamming, and hardens reranking and config schema alignment for a smoother setup.
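To make the Hamming comparison concrete, the sketch below packs a float vector into bits and counts differing bits between two packed vectors. The sign-threshold packing and the MSB-first bit order are assumptions for illustration; the PR's BinaryPacked layout may differ, and `pack_bits` and `hamming_distance` are hypothetical names.

```rust
/// Pack a float vector into bits: 1 if the component is positive, else 0.
/// Bits are stored MSB-first within each byte (an assumed layout).
pub fn pack_bits(values: &[f32]) -> Vec<u8> {
    let mut packed = vec![0u8; (values.len() + 7) / 8];
    for (i, v) in values.iter().enumerate() {
        if *v > 0.0 {
            packed[i / 8] |= 1u8 << (7 - (i % 8));
        }
    }
    packed
}

/// Number of differing bits between two packed vectors.
/// Returns None when the lengths differ, since the result would be meaningless.
pub fn hamming_distance(a: &[u8], b: &[u8]) -> Option<u32> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum())
}
```

A smaller distance means more similar vectors, so a ranking by Hamming distance sorts ascending, the opposite of a cosine-similarity ranking.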
New Features
status surfaces semantic health.
semantic_doctor for health checks and semantic_eval for recall@k/MRR.
.alfonso/ and expand .gitignore to ignore local agent/tooling dirs.
perplexity (output encoding, storage strategy, input mode, distance metric); project-level stripping extended with tests.
Migration
fastembed.
rerank block (model, base_url, api key env, limits).
Written for commit 95ea25c. Summary will update on new commits.
Greptile Summary
This PR replaces the minimal embedding prototype with a provider-capability-aware retrieval subsystem: typed vectors (f32, int8, binary-packed), fingerprint-driven index lifecycle with background builds and cooperative cancellation, optional LLM-based reranking, JSONL diagnostics, semantic_doctor, and semantic_eval.
- TypedVector/StoredVector/VectorStore trait decouple encoding, storage, and search; Hamming distance is now used correctly for binary-packed vectors.
- SemanticIndexFingerprint::diff() classifies changes as Rebuild, ClearQueryCache, or None, replacing the prior "delete and hope" invalidation.
- semantic_doctor and semantic_eval commands.
- aft.jsonc.
Confidence Score: 3/5
Safe to merge for alpha/preview use, but the reranking pipeline does not function as intended — the overfetch from the vector index is discarded before the reranker can see it.
The reranking feature fetches max(top_k, 50) candidates from the vector index specifically to give the reranker a wider pool, but fuse_hybrid_results immediately truncates back to top_k before calling rerank_candidates. For the default top_k=10, the reranker always operates on 10 items — the same result set a non-reranking search returns — so it can only reorder but cannot surface better matches from the wider pool.
crates/aft/src/commands/semantic_search.rs — the interaction between index.search(...clamp(50, MAX_TOP_K)) and fuse_hybrid_results(..., params.top_k) needs to be reworked so the wider pool reaches the reranker before truncation.
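One way to restructure this is to keep the full overfetched pool until after reranking and truncate to top_k only at the end. The sketch below is illustrative: `Hit`, `RERANK_POOL`, and `search_with_rerank` are hypothetical names, the reranker is modeled as a closure returning preferred indices, and the "apply reorder, then append missing" step follows the sequence diagram rather than the actual implementation.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub path: String,
    pub score: f32,
}

/// Mirrors the max(top_k, 50) overfetch described above (assumed constant).
const RERANK_POOL: usize = 50;

/// `fused` holds hybrid results in fused order, NOT yet truncated to top_k.
/// `rerank` returns indices into the pool in preferred order.
pub fn search_with_rerank<F>(fused: Vec<Hit>, top_k: usize, rerank: Option<F>) -> Vec<Hit>
where
    F: Fn(&[Hit]) -> Vec<usize>,
{
    // Keep the wide pool so the reranker can promote lower-ranked candidates.
    let pool_size = top_k.max(RERANK_POOL);
    let mut pool: Vec<Hit> = fused.into_iter().take(pool_size).collect();

    if let Some(rerank) = rerank {
        let order = rerank(&pool);
        let mut seen = vec![false; pool.len()];
        let mut reordered = Vec::with_capacity(pool.len());
        // Apply the reranker's order, ignoring invalid or duplicate indices.
        for idx in order {
            if idx < pool.len() && !seen[idx] {
                seen[idx] = true;
                reordered.push(pool[idx].clone());
            }
        }
        // Append anything the reranker omitted, in the original fused order.
        for (idx, hit) in pool.iter().enumerate() {
            if !seen[idx] {
                reordered.push(hit.clone());
            }
        }
        pool = reordered;
    }

    // Truncate only after reranking.
    pool.truncate(top_k);
    pool
}
```

With this ordering, a candidate ranked 12th by fusion can still reach the final top 3 if the reranker prefers it, which is the behavior the overfetch was meant to enable.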
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant Agent
    participant SemanticSearch
    participant VectorIndex
    participant LexicalIndex
    participant Reranker
    Agent->>SemanticSearch: "aft_search({ query, top_k })"
    SemanticSearch->>VectorIndex: search(query_vec, clamp(top_k, 50, 100))
    VectorIndex-->>SemanticSearch: semantic_results up to 50
    SemanticSearch->>LexicalIndex: lexical_rank(trigrams, 50)
    LexicalIndex-->>SemanticSearch: lexical_files up to 50
    SemanticSearch->>SemanticSearch: fuse_hybrid_results truncates to top_k
    Note over SemanticSearch: results.len() at most top_k default 10
    alt rerank_enabled
        SemanticSearch->>Reranker: POST chat/completions min(max_candidates,top_k) snippets
        Reranker-->>SemanticSearch: JSON indices
        SemanticSearch->>SemanticSearch: apply reorder + append missing
    end
    SemanticSearch-->>Agent: status results diagnostics
Comments Outside Diff (1)
packages/opencode-plugin/src/config.ts, lines 37-54
Several new enum schemas use values that don't align with the Rust serde representation:
- SemanticOutputEncodingEnum allows "binary", "ubinary", "int8", "uint8" but Rust OutputEncoding deserializes from "base64_binary" and "base64_int8".
- SemanticStorageStrategyEnum allows "flat" and "binary_pack" but Rust StorageStrategy expects "native_f32" and "binary_packed".
- SemanticInputModeEnum includes "chunk_extracts" and "contextualized" but Rust InputMode only has "flat_texts" and "document_chunks".
- SemanticDistanceMetricEnum uses "dot" but Rust DistanceMetric expects "dot_product".
- SemanticBackendEnum is missing the new "perplexity" variant added to Rust.
A user who follows the TypeScript autocomplete and picks output_encoding: "int8" will pass TypeScript validation but receive a deserialization error (or silent fallback to default) from the Rust binary at runtime.
Reviews (3): Last reviewed commit: "fix: address greptile and qubic review c..."