fix(chunker): add Solidity language support #76

Merged
dvcdsys merged 2 commits into develop from fix/solidity-language-support on Jun 5, 2026

@dvcdsys dvcdsys commented Jun 5, 2026

What

Add full Solidity (.sol) support to the indexer: language detection on
both server and CLI, plus a chunker registry entry that wires the
tree-sitter Solidity grammar (already shipped by gotreesitter) to
contract / interface / library / struct / enum / event / function /
modifier / constructor node kinds.

Why

Solidity files were detected as unknown language and indexed as plain
text via the sliding-window fallback — no symbols, no contract /
function navigation, no cix def / cix refs. Worse, on the
server-side repoindexer path (GitHub-repo clones), .sol files were
skipped entirely because language resolved to "".

After this fix, .sol files behave like every other first-class
language: structural chunks, symbol names, references, and semantic search.

How

Three coordinated changes — language detection lives in two places by
design (CLI sends language= on file payloads, server falls back to its
own detection when missing), so both need the mapping:

  • server/internal/langdetect/langdetect.go: .sol → "solidity"
  • server/internal/chunker/chunker.go: new registry entry mapping
    grammars.SolidityLanguage to:
    • class: contract_declaration, library_declaration
    • type: interface_declaration, struct_declaration, enum_declaration, event_definition
    • function: function_definition, modifier_definition, constructor_definition, fallback_receive_definition
  • cli/internal/discovery/language.go: same .sol → "solidity"
    in the CLI's own extension map
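
For readers unfamiliar with the two-sided detection, here is a minimal sketch of the idea, not the code from this PR. The names `extToLang`, `detectLanguage`, and `resolveLanguage` are hypothetical, and the table holds only a few illustrative entries. The real maps live in the files listed above.

```go
package main

import "path/filepath"

// extToLang is a hypothetical extension table. The actual tables are in
// server/internal/langdetect and cli/internal/discovery.
var extToLang = map[string]string{
	".go":  "go",
	".py":  "python",
	".sol": "solidity", // the mapping this PR adds on both sides
}

// detectLanguage returns the language for a path, or "" when the
// extension is not in the table.
func detectLanguage(path string) string {
	return extToLang[filepath.Ext(path)]
}

// resolveLanguage prefers the language tag sent by the client and falls
// back to server-side detection when the tag is empty.
func resolveLanguage(clientTag, path string) string {
	if clientTag != "" {
		return clientTag
	}
	return detectLanguage(path)
}
```

With a table like this, `resolveLanguage("", "contracts/Token.sol")` yields `"solidity"` instead of `""`. An empty result is what caused `.sol` files to be skipped or chunked as plain text.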

Node kinds were verified by parsing a real contract against the
gotreesitter Solidity grammar and reading the actual AST node types
(not guessed from the language reference).
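
The node-kind mapping above can be pictured as a small lookup from AST node type to chunk kind. The sketch below is only an illustration: the `nodeKinds` type and `chunkKind` method are hypothetical and do not reflect the actual registry structure in chunker.go. Only the node names are taken from this PR.

```go
package main

// nodeKinds is a hypothetical shape for one registry entry: the AST node
// types that should become class, type, and function chunks.
type nodeKinds struct {
	Class    []string
	Type     []string
	Function []string
}

// solidityKinds lists the ten node types named in this PR.
var solidityKinds = nodeKinds{
	Class: []string{"contract_declaration", "library_declaration"},
	Type: []string{
		"interface_declaration", "struct_declaration",
		"enum_declaration", "event_definition",
	},
	Function: []string{
		"function_definition", "modifier_definition",
		"constructor_definition", "fallback_receive_definition",
	},
}

// contains reports whether s is in list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// chunkKind returns "class", "type", or "function" for a configured node
// type, and "" for any node type the entry does not list.
func (k nodeKinds) chunkKind(nodeType string) string {
	switch {
	case contains(k.Class, nodeType):
		return "class"
	case contains(k.Type, nodeType):
		return "type"
	case contains(k.Function, nodeType):
		return "function"
	}
	return ""
}
```

Node types that return `""` here would simply not start a structural chunk of their own in this sketch.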

Per CLAUDE.md the CLI and server release on independent tag streams,
but a shared contract change like this lands in one PR so neither side
drifts.

Tests

  • New TestChunkFile_Solidity verifies that a sample contract yields a
    class chunk for Token, a function chunk for transfer, etc.,
    with correct parent linkage.
  • TestRegistry_NodeNamesMatchAST fixture extended with a Solidity
    snippet so the configured node names stay in sync with the grammar
    (and any future grammar bump fails loudly).
  • TestDetect (langdetect) gets a Token.sol → solidity case.

Manual verification on a sample contract: 10 symbols extracted
(contract, library, interface, struct, enum, event, function, modifier,
method) where before everything came out as a single block chunk with
no symbol name.

Type of change

  • Bug fix
  • New feature

Checklist

  • go test ./internal/chunker/ ./internal/langdetect/ — green
  • go build ./... — green on both server and CLI modules
  • gofmt clean on all edited files
  • No secrets or API keys committed

🤖 Generated with Claude Code

dvcdsys and others added 2 commits June 5, 2026 17:31
Solidity (`.sol`) files were detected as unknown language and indexed
as plain text via sliding window — no symbols, no contract/function
navigation, no `cix def` / `cix refs`. Repos cloned by the server-side
`repoindexer` skipped `.sol` files entirely because their language
resolved to "".

Fix on three sides:

- `server/internal/langdetect`: map `.sol` → "solidity"
- `server/internal/chunker`: register `grammars.SolidityLanguage` with
  node kinds for contract / library (class), interface / struct / enum /
  event (type), and function / modifier / constructor / fallback-receive
  (function). Tree-sitter grammar was already shipped by gotreesitter.
- `cli/internal/discovery`: mirror `.sol` → "solidity" in the CLI's own
  extension map so locally-discovered files reach the server with the
  right language tag instead of "".

Tests:
- new `TestChunkFile_Solidity` verifies contract / function symbols are
  extracted with correct kind and parent linkage
- `TestRegistry_NodeNamesMatchAST` fixture extended with a Solidity
  snippet so node names stay in sync with the grammar
- langdetect test gets a `Token.sol → solidity` case

Verified on a sample contract: 10 symbols extracted (contract, library,
interface, struct, enum, event, function, modifier, method) where
before everything came out as a single `block` chunk with no symbol
name.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

PR review feedback:

1. doc/LANGUAGES.md was missed in the original PR. Bump default set
   header from (30) -> (31) and add the `solidity` row to the table.

2. TestRegistry_NodeNamesMatchAST's `matched := false` logic passes if
   any *one* of a language's configured kinds appears in the AST -- so
   a future grammar rename of `modifier_definition` (or any of the
   other 8 Solidity kinds beyond contract/function) would slip through.
   Reviewer flagged this as a low-priority shared limitation; rather
   than reshape the generic test for every language, add a focused
   `TestRegistry_SolidityAllNodeKindsPresent` that parses a contract
   exercising all 10 advertised node kinds and fails if any is missing.
   Also widen the Solidity fixture in the generic test for parity.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
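
The gap between the generic "any kind matches" check and the focused "all kinds present" check from item 2 of the review feedback can be sketched as follows. The helper names `anyKindPresent` and `missingKinds` are hypothetical, and the set of seen node types is assumed to have been collected by walking a parsed AST. The real tests work against the gotreesitter parse tree.

```go
package main

import "sort"

// anyKindPresent reports whether at least one configured kind was seen.
// This mirrors the weaker check: it still passes after a single rename.
func anyKindPresent(configured []string, seen map[string]bool) bool {
	for _, k := range configured {
		if seen[k] {
			return true
		}
	}
	return false
}

// missingKinds returns every configured kind that was not seen, sorted.
// A non-empty result is what the focused test would treat as a failure.
func missingKinds(configured []string, seen map[string]bool) []string {
	var missing []string
	for _, k := range configured {
		if !seen[k] {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}
```

If a grammar bump renamed `modifier_definition`, `anyKindPresent` would still return true as long as `contract_declaration` appeared, while `missingKinds` would report the renamed kind.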
@dvcdsys dvcdsys merged commit 525a942 into develop Jun 5, 2026
3 checks passed
@dvcdsys dvcdsys deleted the fix/solidity-language-support branch June 5, 2026 16:43