Fix tokenizer over-consuming character after ->> operator #83

Closed
RagingKore wants to merge 1 commit into TylerBrinks:main from RagingKore:fix/longarrow-json-extraction

RagingKore commented Apr 23, 2026

Reproduction (from the issue)

var parser = new SqlQueryParser();
const string sql = """
                   select
                   category_seq as seq,
                   data.name as name,
                   meta->>'description' as description
                   from category
                   order by seq;
                   """;

var x = parser.Parse(sql, new DuckDbDialect());

Throws:

SqlParser.TokenizeException: Unterminated string literal. Expected ' after Line: 4, Col: 20
   at SqlParser.Tokenizer.TokenizeQuotedString(TokenizeQuotedStringSettings settings)
   at SqlParser.Tokenizer.TokenizeSingleQuotedString(Char quoteStyle, Boolean backslashEscape)
   ...

Root cause

TokenizeLongArrow called _state.Next() after confirming the second >, then handed off to ConsumeForBinOp, which calls _state.Next() again. The character immediately following ->> was therefore silently consumed. When that character was the opening ' of a string literal, subsequent tokenization walked past the closing quote and raised "Unterminated string literal".

This only surfaced when ->> was written without whitespace before the following token (e.g. meta->>'description'). Every existing test uses meta ->> 'description', where the eaten character was a harmless space, which is why the bug slipped through.

The fix drops the redundant _state.Next(), aligning ->> with the already-correct #>> pattern in TokenizeHash.
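
For readers who want to see the double advance in isolation, here is a minimal sketch. It is not the library's code: Cursor, LongArrowSketch, Scan and the redundantNext flag are hypothetical names invented for illustration, and the real tokenizer's structure is assumed rather than copied. Only the behavior described above is modeled: one helper that advances once, plus an optional extra advance before it.

```csharp
using System;

public static class LongArrowSketch
{
    // Hypothetical stand-in for the tokenizer's cursor; not the library's type.
    private sealed class Cursor
    {
        private readonly string _text;
        private int _pos;

        public Cursor(string text) { _text = text; }

        public char Peek() { return _pos < _text.Length ? _text[_pos] : '\0'; }
        public void Next() { if (_pos < _text.Length) _pos++; }
        public string Rest() { return _text.Substring(_pos); }
    }

    // Modeled on the helper described above: advances once, then yields the
    // operator text. The real ConsumeForBinOp signature is an assumption.
    private static string ConsumeForBinOp(Cursor state, string op)
    {
        state.Next();
        return op;
    }

    // Scans input that starts with "->>" and returns the operator token
    // together with the text left over for the next token.
    public static (string Token, string Rest) Scan(string input, bool redundantNext)
    {
        if (!input.StartsWith("->>", StringComparison.Ordinal))
            throw new ArgumentException("input must start with ->>", nameof(input));

        var state = new Cursor(input);
        state.Next(); // consume '-'
        state.Next(); // consume the first '>'; Peek() now sees the second '>'

        if (redundantNext)
            state.Next(); // the bug: the second '>' is consumed too early

        // Without the extra advance this consumes the second '>'.
        // With it, this consumes whatever character follows the operator.
        var token = ConsumeForBinOp(state, "->>");
        return (token, state.Rest());
    }
}
```

Calling LongArrowSketch.Scan("->>'description'", redundantNext: true) leaves description' as the remaining text, so the opening quote is lost, while redundantNext: false leaves 'description' intact. With "->> 'description'" the extra advance only swallows the space, which matches why the spaced tests kept passing.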

Tests

Added LongArrowJsonExtractionTests covering tokenization, the parser-level AST, and the exact multi-line SQL from the bug report.

The new tests are deliberately scoped to the dialects that genuinely support ->> as a JSON extraction operator: PostgreSQL, DuckDB, MySQL (since 5.7.13), SQLite (since 3.38), Redshift, and Generic. ->> is a PostgreSQL-originated extension and is not part of ANSI/ISO SQL. Dialects that use other JSON extraction mechanisms (Snowflake's col:path, BigQuery's JSON_VALUE, MS SQL's JSON_VALUE, Hive/Databricks' get_json_object, Oracle, ANSI) are intentionally excluded.

…s#70)

TokenizeLongArrow called _state.Next() after confirming the second '>'
and then delegated to ConsumeForBinOp, which calls _state.Next() again.
The result was that the character immediately following `->>` was
silently eaten. When that character was the opening single quote of a
string literal, subsequent tokenization walked past the closing quote
and raised "Unterminated string literal".

This only surfaced when `->>` was written without whitespace before the
following token (e.g. `meta->>'x'`). The existing tests all used
`meta ->> 'x'`, where the swallowed character happened to be the
harmless space.

The fix drops the redundant _state.Next(), aligning `->>` with the
already-correct pattern used by `#>>` in TokenizeHash.

Regression tests are added against the dialects that genuinely support
`->>` (PostgreSQL, DuckDB, MySQL, SQLite, Redshift, Generic). `->>` is
a PostgreSQL-originated extension and is not part of ANSI/ISO SQL, so
dialects that use other JSON extraction mechanisms (Snowflake, BigQuery,
MS SQL Server, Hive, Databricks, Oracle, ANSI) are intentionally
excluded.
@RagingKore

Superseded by #84 — renamed branch to bugfix/longarrow-json-extraction to match the CI workflow's branch naming convention.

RagingKore closed this Apr 23, 2026
RagingKore deleted the fix/longarrow-json-extraction branch April 23, 2026 14:00