fix(chunker): add Solidity language support #76
Merged
Solidity (`.sol`) files were detected as an unknown language and indexed as plain text via the sliding window: no symbols, no contract/function navigation, no `cix def` / `cix refs`. Repos cloned by the server-side `repoindexer` skipped `.sol` files entirely because their language resolved to "".

Fix on three sides:

- `server/internal/langdetect`: map `.sol` → "solidity"
- `server/internal/chunker`: register `grammars.SolidityLanguage` with node kinds for contract / library (class), interface / struct / enum / event (type), and function / modifier / constructor / fallback-receive (function). The tree-sitter grammar was already shipped by gotreesitter.
- `cli/internal/discovery`: mirror `.sol` → "solidity" in the CLI's own extension map so locally discovered files reach the server with the right language tag instead of "".

Tests:

- new `TestChunkFile_Solidity` verifies contract / function symbols are extracted with correct kind and parent linkage
- `TestRegistry_NodeNamesMatchAST` fixture extended with a Solidity snippet so node names stay in sync with the grammar
- langdetect test gets a `Token.sol → solidity` case

Verified on a sample contract: 10 symbols extracted (contract, library, interface, struct, enum, event, function, modifier, method) where before everything came out as a single `block` chunk with no symbol name.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PR review feedback:

1. `doc/LANGUAGES.md` was missed in the original PR. Bump the default set header from (30) -> (31) and add the `solidity` row to the table.
2. `TestRegistry_NodeNamesMatchAST`'s `matched := false` logic passes if any *one* of a language's configured kinds appears in the AST -- so a future grammar rename of `modifier_definition` (or any of the other 8 Solidity kinds beyond contract/function) would slip through. The reviewer flagged this as a low-priority shared limitation; rather than reshape the generic test for every language, add a focused `TestRegistry_SolidityAllNodeKindsPresent` that parses a contract exercising all 10 advertised node kinds and fails if any is missing. Also widen the Solidity fixture in the generic test for parity.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
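The difference between the "any one kind matches" check and the stricter "every kind must be present" check described in item 2 can be sketched as below. This is a minimal illustration, not the repository's test code: `anyKindPresent`, `missingKinds`, and the renamed kind `modifier_decl` are hypothetical names, and the set of seen kinds is hard-coded rather than collected from a real AST walk.

```go
package main

import (
	"fmt"
	"sort"
)

// anyKindPresent mirrors the weaker style of check: it succeeds as soon as
// a single configured node kind shows up among the kinds seen in the AST.
// (Hypothetical helper, not the repository's code.)
func anyKindPresent(configured []string, seen map[string]bool) bool {
	for _, k := range configured {
		if seen[k] {
			return true
		}
	}
	return false
}

// missingKinds returns every configured node kind that was NOT seen in the
// AST, sorted for stable output. An empty result means the configuration
// and the grammar agree. (Hypothetical helper.)
func missingKinds(configured []string, seen map[string]bool) []string {
	var missing []string
	for _, k := range configured {
		if !seen[k] {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	configured := []string{
		"contract_declaration",
		"function_definition",
		"modifier_definition",
	}
	// Assumption for illustration: a grammar bump renamed
	// modifier_definition to the made-up name modifier_decl.
	seen := map[string]bool{
		"contract_declaration": true,
		"function_definition":  true,
		"modifier_decl":        true,
	}

	// The weak check still passes; the strict check reports the rename.
	fmt.Println("any-match passes:", anyKindPresent(configured, seen))
	fmt.Println("missing kinds:", missingKinds(configured, seen))
}
```

A focused per-language test in this style would fail whenever the list of missing kinds is non-empty, which is what makes a single renamed node type visible.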
This was referenced Jun 5, 2026
What
Add full Solidity (`.sol`) support to the indexer: language detection on both server and CLI, plus a `chunker` registry entry that wires the tree-sitter Solidity grammar (already shipped by `gotreesitter`) to contract / interface / library / struct / enum / event / function / modifier / constructor / fallback-receive node kinds.
Why
Solidity files were detected as an unknown language and indexed as plain text via the sliding-window fallback: no symbols, no contract / function navigation, no `cix def` / `cix refs`. Worse, on the server-side `repoindexer` path (GitHub-repo clones), `.sol` files were skipped entirely because language resolved to `""`.

After this fix `.sol` files behave like every other first-class language: structural chunks, symbol names, references, semantic search.
How
Three coordinated changes. Language detection lives in two places by design (the CLI sends `language=` on file payloads, and the server falls back to its own detection when it is missing), so both need the mapping:

- `server/internal/langdetect/langdetect.go`: `.sol` → `"solidity"`
- `server/internal/chunker/chunker.go`: new registry entry mapping `grammars.SolidityLanguage` to:
  - `class`: `contract_declaration`, `library_declaration`
  - `type`: `interface_declaration`, `struct_declaration`, `enum_declaration`, `event_definition`
  - `function`: `function_definition`, `modifier_definition`, `constructor_definition`, `fallback_receive_definition`
- `cli/internal/discovery/language.go`: same `.sol` → `"solidity"` in the CLI's own extension map
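To make the two-sided detection concrete, here is a minimal sketch of an extension map with a server-side fallback. It assumes detection is a plain extension lookup; `extToLanguage`, `detect`, and `resolveLanguage` are hypothetical names, and the lowercasing of extensions is an assumption rather than something this PR states.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// extToLanguage is a hypothetical stand-in for the extension maps kept by
// the CLI and the server. Only the ".sol" entry comes from this PR.
var extToLanguage = map[string]string{
	".go":  "go",
	".sol": "solidity",
}

// detect returns the language for a path, or "" when the extension is
// unknown. Lowercasing the extension is an assumption of this sketch.
func detect(path string) string {
	return extToLanguage[strings.ToLower(filepath.Ext(path))]
}

// resolveLanguage models the server side: trust the language tag sent by
// the client when present, otherwise fall back to local detection.
func resolveLanguage(path, clientLanguage string) string {
	if clientLanguage != "" {
		return clientLanguage
	}
	return detect(path)
}

func main() {
	// The client sent no tag, so the server-side map decides.
	fmt.Println(resolveLanguage("contracts/Token.sol", ""))
	// The client sent a tag, so it is used as-is.
	fmt.Println(resolveLanguage("contracts/Token.sol", "solidity"))
	// Unknown extension and no tag resolve to the empty string.
	fmt.Printf("%q\n", resolveLanguage("notes.xyz", ""))
}
```

Under this model, a missing `.sol` entry on either side yields `""` somewhere in the pipeline, which is why both maps need the same mapping.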
Node kinds were verified by parsing a real contract against the
gotreesitter Solidity grammar and reading the actual AST node types
(not guessed from the language reference).
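The node-kind mapping listed in this section can be pictured as a lookup table from AST node type to chunk kind. The sketch below uses the ten node type names given in this description, but the `chunkKind` type, the `solidityNodeKinds` table, and `classify` are hypothetical; the actual registry entry in `chunker.go` may be shaped differently.

```go
package main

import "fmt"

// chunkKind is a hypothetical type for the three chunk categories named in
// this PR description.
type chunkKind string

const (
	kindClass    chunkKind = "class"
	kindType     chunkKind = "type"
	kindFunction chunkKind = "function"
)

// solidityNodeKinds maps tree-sitter node types to chunk kinds. The node
// type names are the ones listed in this PR; the table shape is assumed.
var solidityNodeKinds = map[string]chunkKind{
	"contract_declaration":        kindClass,
	"library_declaration":         kindClass,
	"interface_declaration":       kindType,
	"struct_declaration":          kindType,
	"enum_declaration":            kindType,
	"event_definition":            kindType,
	"function_definition":         kindFunction,
	"modifier_definition":         kindFunction,
	"constructor_definition":      kindFunction,
	"fallback_receive_definition": kindFunction,
}

// classify reports the chunk kind for a node type, and false when the node
// type is not a symbol-bearing node in this table.
func classify(nodeType string) (chunkKind, bool) {
	k, ok := solidityNodeKinds[nodeType]
	return k, ok
}

func main() {
	for _, nodeType := range []string{"contract_declaration", "event_definition", "comment"} {
		if k, ok := classify(nodeType); ok {
			fmt.Printf("%s -> %s\n", nodeType, k)
		} else {
			fmt.Printf("%s -> (no symbol chunk)\n", nodeType)
		}
	}
}
```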
Per CLAUDE.md the CLI and server release on independent tag streams,
but a shared contract change like this lands in one PR so neither side
drifts.
Tests
- `TestChunkFile_Solidity` verifies that a sample contract yields a `class` chunk for `Token`, a `function` chunk for `transfer`, etc., with correct parent linkage.
- `TestRegistry_NodeNamesMatchAST` fixture extended with a Solidity snippet so the configured node names stay in sync with the grammar (and any future grammar bump fails loudly).
- `TestDetect` (langdetect) gets a `Token.sol → solidity` case.

Manual verification on a sample contract: 10 symbols extracted (contract, library, interface, struct, enum, event, function, modifier, method) where before everything came out as a single `block` chunk with no symbol name.
Type of change
Checklist
- `go test ./internal/chunker/ ./internal/langdetect/`: green
- `go build ./...`: green on both server and CLI modules
- `gofmt` clean on all edited files

🤖 Generated with Claude Code