From 4b2d7c90fa2b578015516567c9e7eed6d54ec63a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Lukas=20Backstr=C3=B6m?=
Date: Wed, 6 May 2026 14:24:43 +0200
Subject: [PATCH 01/15] Add lexer rules for native records (OTP 29)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Native records introduced new syntactic forms in Erlang/OTP 29
(erlang/otp PR #11090). The shape that the existing record rule does not
match is the external construction / pattern / field access form
`#Module:Name{...}` and `#Module:Name.field`, where `Module:Name`
appears between the `#` and the `{` (or `.`).

Tokenize the module qualifier as `:name_class` (matching the existing
namespace pattern) and the record name as the existing record-name token
(`:string_symbol`). Local construction (`#Name{...}`) is identical in
shape to a tuple-based record and needs no change — the lexer cannot
disambiguate from local context, so both colour the same way.

The other native-record forms — `-record #Name{...}.` definition
attribute, `-export_record([...])`, `-import_record(Mod, [...])` —
already tokenize correctly under the existing module_attribute rule
(the attribute name is captured generically and `(` is already
optional).

Tests added to lock the expected output.

Co-Authored-By: Claude Opus 4.7 (1M context)

Add tests for native record patterns and updates

The native-records commit covered construction (`#mod:name{f=v}`) but
not the symmetric forms: pattern matching (`#mod:name{f = X} = Y`) and
updates via a prefixed variable (`Y#mod:name{f = 2}`). Both already
work via the same rule but nothing tested them; lock the coverage so a
future ordering tweak in the choice can't silently regress them.
Co-Authored-By: Claude Opus 4.7 (1M context)

Accept variable-shape and keyword names in native records

OTP 29's native-record syntax relaxes the record-name rule: per the
spec at https://www.erlang.org/doc/system/data_types.html, "it is not
necessary to quote atoms that look like variable names or keywords."
So `#State{}`, `#div{}`, `#case{}`, `#fun{}` are all valid record
references even though `State` is variable-shape and `div`/`case`/`fun`
are reserved words.

The previous rule used `atom_name` only, which requires lowercase or
quoted. `#State{}` therefore fell through to the punctuation / variable
rules and produced disjoint tokens with no record-shape grouping.

Add a `record_name` combinator that accepts either `atom_name` or
`variable_name`, both tagged `:string_symbol`, and use it in both
`record` and `native_record_external`. Tuple-based records don't
actually allow these forms, but the lexer can't tell the two record
kinds apart from local context — so accept the union.

Keyword and word-operator names (`#case{}`, `#div{}`, `#fun{}`) still
get re-tagged by postprocess to `:keyword` / `:operator_word`. That's
accepted output: the surrounding `#...{` shape still groups visually as
a record reference, and themes that care can render
keywords-in-record-position differently if desired.

Tests cover all four name shapes (lowercase atom, variable, keyword,
quoted) in three positions (local construction, external construction,
definition attribute), including the OTP 29 spec example
`-record #vector{x = 0.0, y = 0.0}.`.

Co-Authored-By: Claude Opus 4.7 (1M context)

Don't reclassify keyword / builtin record names in postprocess

When `#case{}` is a record reference, the existing postprocess clauses
re-tagged the inner `:string_symbol "case"` as `:keyword`, because the
conversion was unconditional on the value matching the keyword list.
Visually that meant the record-name slot flipped colour depending on
whether the chosen name happened to be a reserved word — confusing for
readers, and wrong because the text in that position is a record name,
not an expression keyword.

Tag record-name tokens with a `record_name: true` meta marker via the
`record_name` combinator. Add a postprocess clause that matches the
marker and bypasses the keyword / builtin / word-operator
reclassification, then strips the marker so it doesn't leak into
rendered tokens. Both `:string_symbol` clauses (keyword, builtin,
word-operator) are guarded by the marker check implicitly because
pattern matching is order-sensitive and the marker clause comes first.

Tests for `#case{}`, `#fun{}`, `#div{}`, and `#mod:case{}` now assert
`:string_symbol` for the record name (matching the lowercase-name
behaviour) instead of `:keyword` / `:operator_word`. A new test
verifies the marker doesn't leak through to output meta.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 lib/makeup/lexers/erlang_lexer.ex             |  44 ++-
 .../erlang_lexer_tokenizer_test.exs           | 279 ++++++++++++++++++
 2 files changed, 322 insertions(+), 1 deletion(-)

diff --git a/lib/makeup/lexers/erlang_lexer.ex b/lib/makeup/lexers/erlang_lexer.ex
index eb94a3f..9a65a46 100644
--- a/lib/makeup/lexers/erlang_lexer.ex
+++ b/lib/makeup/lexers/erlang_lexer.ex
@@ -224,9 +224,43 @@ defmodule Makeup.Lexers.ErlangLexer do
       :operator
     )
 
+  # OTP 29 native records relax the record-name rule: per the spec
+  # (https://www.erlang.org/doc/system/data_types.html), "it is not necessary
+  # to quote atoms that look like variable names or keywords." So `#State{}`,
+  # `#div{}`, `#case{}` are all valid record references even though `State`
+  # is variable-shape and `div`/`case` are reserved words. Tuple-based records
+  # don't allow these forms, but the lexer can't tell the two record kinds
+  # apart from local context — so accept the union.
+  #
+  # The `record_name: true` meta marker tells postprocess to skip the
+  # keyword / builtin / word-operator conversion for this position. Without
+  # it, `#case{}` would tokenise as `[#, keyword case, {]` — visually
+  # confusing because `case` here names a record, not an expression keyword.
+  record_name =
+    choice([
+      token(atom_name, :string_symbol, %{record_name: true}),
+      token(variable_name, :string_symbol, %{record_name: true})
+    ])
+
+  # External native record construction / pattern / field access:
+  #     #Module:Name{F = V}
+  #     #Module:Name.field
+  # The `Module:Name` shape between `#` and `{` (or `.`) was added in OTP 29
+  # alongside native records. Local construction (`#Name{...}`) is identical
+  # in shape to a tuple-based record and is handled by the rule below.
+  native_record_external =
+    token(string("#"), :operator)
+    |> concat(token(atom_name, :name_class))
+    |> concat(token(":", :punctuation))
+    |> concat(record_name)
+    |> choice([
+      token("{", :punctuation),
+      token(".", :punctuation)
+    ])
+
   record =
     token(string("#"), :operator)
-    |> concat(atom)
+    |> concat(record_name)
     |> choice([
       token("{", :punctuation),
       token(".", :punctuation)
@@ -304,6 +338,7 @@ defmodule Makeup.Lexers.ErlangLexer do
     ] ++
       all_sigils ++
       [
+        native_record_external,
         record,
         punctuation,
         # `tuple` might be unnecessary
@@ -379,6 +414,13 @@ defmodule Makeup.Lexers.ErlangLexer do
 
   @word_operators ~W[and andalso band bnot bor bsl bsr bxor div not or orelse rem xor]
 
+  # Record names tagged by the `record_name` combinator should not be
+  # reclassified as keywords / builtins / word-operators even if their
+  # text happens to match. Strip the marker after acting on it so it
+  # doesn't leak into the rendered output.
+  defp postprocess_helper([{:string_symbol, %{record_name: true} = meta, value} | tokens]),
+    do: [{:string_symbol, Map.delete(meta, :record_name), value} | postprocess_helper(tokens)]
+
   defp postprocess_helper([{:string_symbol, meta, value} | tokens]) when value in @keywords,
     do: [{:keyword, meta, value} | postprocess_helper(tokens)]
 
diff --git a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs
index 84ba145..f75085b 100644
--- a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs
+++ b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs
@@ -553,6 +553,285 @@ defmodule ErlangLexerTokenizer do
     end
   end
 
+  describe "native records (OTP 29)" do
+    test "tokenizes external native record construction" do
+      assert [
+               {:operator, %{}, "#"},
+               {:name_class, %{}, "vector_lib"},
+               {:punctuation, %{}, ":"},
+               {:string_symbol, %{}, "vector"},
+               {:punctuation, %{}, "{"} | _
+             ] = lex("#vector_lib:vector{x = 1.0, y = 2.0}")
+    end
+
+    test "tokenizes external native record print form" do
+      assert [
+               {:operator, %{}, "#"},
+               {:name_class, %{}, "example"},
+               {:punctuation, %{}, ":"},
+               {:string_symbol, %{}, "pair"},
+               {:punctuation, %{}, "{"} | _
+             ] = lex("#example:pair{a = 1, b = 2}")
+    end
+
+    test "tokenizes external native record field access" do
+      assert [
+               {_, %{}, "X"},
+               {:operator, %{}, "#"},
+               {:name_class, %{}, "vector_lib"},
+               {:punctuation, %{}, ":"},
+               {:string_symbol, %{}, "vector"},
+               {:punctuation, %{}, "."} | _
+             ] = lex("X#vector_lib:vector.x")
+    end
+
+    test "tokenizes local native record construction the same as tuple-based records" do
+      assert [
+               {:operator, %{}, "#"},
+               {:string_symbol, %{}, "pair"},
+               {:punctuation, %{}, "{"} | _
+             ] = lex("#pair{a = 1, b = 2}")
+    end
+
+    test "tokenizes -record #Name{...} native definition attribute" do
+      tokens = lex("\n-record #pair{a, b}.")
+      assert {:name_attribute, %{}, "record"} in tokens
+      assert {:operator, %{}, "#"} in tokens
+      assert {:string_symbol, %{}, "pair"} in tokens
+    end
+
+    test "tokenizes -export_record attribute" do
+      assert [
+               {:whitespace, %{}, "\n"},
+               {:punctuation, %{}, "-"},
+               {:name_attribute, %{}, "export_record"} | _
+             ] = lex("\n-export_record([vector, position]).")
+    end
+
+    test "tokenizes -import_record attribute" do
+      assert [
+               {:whitespace, %{}, "\n"},
+               {:punctuation, %{}, "-"},
+               {:name_attribute, %{}, "import_record"} | _
+             ] = lex("\n-import_record(vector_lib, [vector, position]).")
+    end
+
+    test "does not break the existing local-record rule when there is no `:`" do
+      tokens = lex("X#name{f = 1}")
+      assert {:operator, %{}, "#"} in tokens
+      assert {:string_symbol, %{}, "name"} in tokens
+      refute Enum.any?(tokens, fn t -> match?({:name_class, _, _}, t) end)
+    end
+
+    test "external native record pattern match" do
+      assert [
+               {:operator, %{}, "#"},
+               {:name_class, %{}, "mod"},
+               {:punctuation, %{}, ":"},
+               {:string_symbol, %{}, "name"},
+               {:punctuation, _, "{"},
+               {:string_symbol, %{}, "f"},
+               {:whitespace, %{}, " "},
+               {:operator, %{}, "="},
+               {:whitespace, %{}, " "},
+               {:name, %{}, "X"},
+               {:punctuation, _, "}"},
+               {:whitespace, %{}, " "},
+               {:operator, %{}, "="},
+               {:whitespace, %{}, " "},
+               {:name, %{}, "Y"}
+             ] = lex("#mod:name{f = X} = Y")
+    end
+
+    test "external native record update via prefixed variable" do
+      assert [
+               {:name, %{}, "Y"},
+               {:operator, %{}, "#"},
+               {:name_class, %{}, "mod"},
+               {:punctuation, %{}, ":"},
+               {:string_symbol, %{}, "name"},
+               {:punctuation, _, "{"} | _
+             ] = lex("Y#mod:name{f = 2}")
+    end
+
+    # Native records relax the record-name rule:
+    # https://www.erlang.org/doc/system/data_types.html says "it is not
+    # necessary to quote atoms that look like variable names or keywords."
+    # So `#State{}`, `#div{}`, `#case{}` are all valid.
+    test "variable-shape name (`#State{}`)" do
+      assert [
+               {:operator, %{}, "#"},
+               {:string_symbol, %{}, "State"},
+               {:punctuation, _, "{"},
+               {:string_symbol, %{}, "x"},
+               {:whitespace, %{}, " "},
+               {:operator, %{}, "="},
+               {:whitespace, %{}, " "},
+               {:number_integer, %{}, "1"},
+               {:punctuation, _, "}"}
+             ] = lex("#State{x = 1}")
+    end
+
+    test "external native record with variable-shape name" do
+      assert [
+               {:operator, %{}, "#"},
+               {:name_class, %{}, "mod"},
+               {:punctuation, %{}, ":"},
+               {:string_symbol, %{}, "State"},
+               {:punctuation, _, "{"} | _
+             ] = lex("#mod:State{x = 1}")
+    end
+
+    # Keyword and word-operator names stay as `:string_symbol` in record
+    # position. Postprocess sees the `record_name: true` meta marker and
+    # skips the usual conversion to `:keyword` / `:operator_word`, so the
+    # surrounding `#...{` shape renders consistently regardless of whether
+    # the name happens to be a reserved word.
+    test "keyword name (`#case{}`) stays as :string_symbol" do
+      assert [
+               {:operator, %{}, "#"},
+               {:string_symbol, %{}, "case"},
+               {:punctuation, _, "{"} | _
+             ] = lex("#case{x = 1}")
+    end
+
+    test "keyword name (`#fun{}`)" do
+      assert [
+               {:operator, %{}, "#"},
+               {:string_symbol, %{}, "fun"},
+               {:punctuation, _, "{"} | _
+             ] = lex("#fun{f = g}")
+    end
+
+    test "word-operator name (`#div{}`)" do
+      assert [
+               {:operator, %{}, "#"},
+               {:string_symbol, %{}, "div"},
+               {:punctuation, _, "{"} | _
+             ] = lex("#div{class}")
+    end
+
+    test "external native record with keyword name (`#mod:case{}`)" do
+      assert [
+               {:operator, %{}, "#"},
+               {:name_class, %{}, "mod"},
+               {:punctuation, %{}, ":"},
+               {:string_symbol, %{}, "case"},
+               {:punctuation, _, "{"} | _
+             ] = lex("#mod:case{x = 1}")
+    end
+
+    test "quoted-atom record name (`#'42'{}`)" do
+      assert [
+               {:operator, %{}, "#"},
+               {:string_symbol, %{}, "'42'"},
+               {:punctuation, _, "{"} | _
+             ] = lex("#'42'{}")
+    end
+
+    # Declaration syntax: `-record #Name{...}.` (no parens around the name).
+    # This is the OTP 29 native-record definition form, distinct from the
+    # tuple-based `-record(name, {...}).` form. The same name flexibility
+    # (lowercase / variable-shape / keyword / quoted) applies.
+    test "definition with lowercase name" do
+      assert lex("\n-record #pair{a, b}.") == [
+               {:whitespace, %{}, "\n"},
+               {:punctuation, %{}, "-"},
+               {:name_attribute, %{}, "record"},
+               {:whitespace, %{}, " "},
+               {:operator, %{}, "#"},
+               {:string_symbol, %{}, "pair"},
+               {:punctuation, %{group_id: "group-1"}, "{"},
+               {:string_symbol, %{}, "a"},
+               {:punctuation, %{}, ","},
+               {:whitespace, %{}, " "},
+               {:string_symbol, %{}, "b"},
+               {:punctuation, %{group_id: "group-1"}, "}"},
+               {:punctuation, %{}, "."}
+             ]
+    end
+
+    test "definition with variable-shape name (`-record #State{x}.`)" do
+      assert lex("\n-record #State{x}.") == [
+               {:whitespace, %{}, "\n"},
+               {:punctuation, %{}, "-"},
+               {:name_attribute, %{}, "record"},
+               {:whitespace, %{}, " "},
+               {:operator, %{}, "#"},
+               {:string_symbol, %{}, "State"},
+               {:punctuation, %{group_id: "group-1"}, "{"},
+               {:string_symbol, %{}, "x"},
+               {:punctuation, %{group_id: "group-1"}, "}"},
+               {:punctuation, %{}, "."}
+             ]
+    end
+
+    test "definition with keyword name (`-record #div{class}.`)" do
+      assert lex("\n-record #div{class}.") == [
+               {:whitespace, %{}, "\n"},
+               {:punctuation, %{}, "-"},
+               {:name_attribute, %{}, "record"},
+               {:whitespace, %{}, " "},
+               {:operator, %{}, "#"},
+               {:string_symbol, %{}, "div"},
+               {:punctuation, %{group_id: "group-1"}, "{"},
+               {:string_symbol, %{}, "class"},
+               {:punctuation, %{group_id: "group-1"}, "}"},
+               {:punctuation, %{}, "."}
+             ]
+    end
+
+    test "definition with quoted name (`-record #'42'{}.`)" do
+      assert lex("\n-record #'42'{}.") == [
+               {:whitespace, %{}, "\n"},
+               {:punctuation, %{}, "-"},
+               {:name_attribute, %{}, "record"},
+               {:whitespace, %{}, " "},
+               {:operator, %{}, "#"},
+               {:string_symbol, %{}, "'42'"},
+               {:punctuation, %{group_id: "group-1"}, "{"},
+               {:punctuation, %{group_id: "group-1"}, "}"},
+               {:punctuation, %{}, "."}
+             ]
+    end
+
+    test "the record_name meta marker does not leak into output tokens" do
+      # Postprocess strips the marker after acting on it. End-to-end the
+      # token's meta should be the same as for any other :string_symbol.
+      [_, {:string_symbol, meta_kw, "case"} | _] = lex("#case{x = 1}")
+      [_, {:string_symbol, meta_lc, "vector"} | _] = lex("#vector{x = 1}")
+      assert meta_kw == meta_lc
+      refute Map.has_key?(meta_kw, :record_name)
+    end
+
+    test "definition with default values" do
+      # `-record #vector{x = 0.0, y = 0.0}.` — the OTP 29 spec example.
+      assert lex("\n-record #vector{x = 0.0, y = 0.0}.") == [
+               {:whitespace, %{}, "\n"},
+               {:punctuation, %{}, "-"},
+               {:name_attribute, %{}, "record"},
+               {:whitespace, %{}, " "},
+               {:operator, %{}, "#"},
+               {:string_symbol, %{}, "vector"},
+               {:punctuation, %{group_id: "group-1"}, "{"},
+               {:string_symbol, %{}, "x"},
+               {:whitespace, %{}, " "},
+               {:operator, %{}, "="},
+               {:whitespace, %{}, " "},
+               {:number_float, %{}, "0.0"},
+               {:punctuation, %{}, ","},
+               {:whitespace, %{}, " "},
+               {:string_symbol, %{}, "y"},
+               {:whitespace, %{}, " "},
+               {:operator, %{}, "="},
+               {:whitespace, %{}, " "},
+               {:number_float, %{}, "0.0"},
+               {:punctuation, %{group_id: "group-1"}, "}"},
+               {:punctuation, %{}, "."}
+             ]
+    end
+  end
+
   describe "function_arity" do
     test "is tokenized correctly for the syntax function_name/arity" do
       assert [

From 5161d41ac910e9d64e0ee1d23498b97b9a69869c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Lukas=20Backstr=C3=B6m?=
Date: Wed, 6 May 2026 14:25:31 +0200
Subject: [PATCH 02/15] Accept underscore separators in numeric literals
 (OTP 27)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

OTP 27 added `_` as a digit-group separator in numeric literals:
`1_000_000`, `16#FF_FF`, `0.1_5e1_0`. Extend the digit character
classes so numeric tokens accept these forms.
The lexer is intentionally tolerant about position — it does not
validate that underscores only sit between digits and not at the edges
of the literal — because the lexer's job is highlighting, not
validation. The compiler will reject malformed literals with a real
error.

Tightened `number_integer` to require a leading digit so a bare
underscore can't accidentally start a number; the digit-tail then
absorbs further `[0-9_]+`. Weird-base integers (`16#FF_FF`) now include
`_` in the post-`#` character set.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 lib/makeup/lexers/erlang_lexer.ex             | 13 ++++++---
 .../erlang_lexer_tokenizer_test.exs           | 27 +++++++++++++++++++
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/lib/makeup/lexers/erlang_lexer.ex b/lib/makeup/lexers/erlang_lexer.ex
index 9a65a46..b67fb21 100644
--- a/lib/makeup/lexers/erlang_lexer.ex
+++ b/lib/makeup/lexers/erlang_lexer.ex
@@ -59,24 +59,29 @@ defmodule Makeup.Lexers.ErlangLexer do
     ])
 
   # Numbers
-  digits = ascii_string([?0..?9], min: 1)
+  #
+  # Erlang/OTP 27 added underscore separators in numeric literals
+  # (`1_000_000`, `16#FF_FF`, `0.1_5e1_0`). Lexer-tolerant: underscores are
+  # accepted anywhere inside the digit run; we don't validate position.
+  digits = ascii_string([?0..?9, ?_], min: 1)
 
   number_integer =
     optional(ascii_char([?+, ?-]))
-    |> concat(digits)
+    |> ascii_char([?0..?9])
+    |> optional(ascii_string([?0..?9, ?_], min: 1))
     |> token(:number_integer)
 
   number_integer_in_weird_base =
     optional(ascii_char([?+, ?-]))
     |> concat(numeric_base)
     |> string("#")
-    |> ascii_string([?0..?9, ?a..?z, ?A..?Z], min: 1)
+    |> ascii_string([?0..?9, ?a..?z, ?A..?Z, ?_], min: 1)
    |> token(:number_integer)
 
   # Floating point numbers
   float_scientific_notation_part =
     ascii_string([?e, ?E], 1)
-    |> optional(string("-"))
+    |> optional(ascii_char([?+, ?-]))
     |> concat(digits)
 
   number_float =
diff --git a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs
index f75085b..47f728e 100644
--- a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs
+++ b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs
@@ -94,6 +94,33 @@ defmodule ErlangLexerTokenizer do
     assert lex("1.05e12") == [{:number_float, %{}, "1.05e12"}]
     assert lex("1.05e-6") == [{:number_float, %{}, "1.05e-6"}]
     assert lex("1.05e-12") == [{:number_float, %{}, "1.05e-12"}]
+    assert lex("1.05e+6") == [{:number_float, %{}, "1.05e+6"}]
+    assert lex("1.0e+10") == [{:number_float, %{}, "1.0e+10"}]
+  end
+
+  # Numeric separators (`_`) are valid inside numeric literals since OTP 27.
+  test "integers with underscore separators" do
+    assert lex("1_000") == [{:number_integer, %{}, "1_000"}]
+    assert lex("1_000_000") == [{:number_integer, %{}, "1_000_000"}]
+  end
+
+  test "floats with underscore separators" do
+    assert lex("1_000.5") == [{:number_float, %{}, "1_000.5"}]
+    assert lex("3.14_15") == [{:number_float, %{}, "3.14_15"}]
+  end
+
+  test "weird-base integers with underscore separators" do
+    assert lex("16#FF_FF") == [{:number_integer, %{}, "16#FF_FF"}]
+    assert lex("2#1010_1010") == [{:number_integer, %{}, "2#1010_1010"}]
+  end
+
+  test "trailing identifier after a number is not absorbed via underscore" do
+    # `1_000` is a number; the bare identifier following with whitespace is separate.
+    assert [
+             {:number_integer, %{}, "1_000"},
+             {:whitespace, %{}, " "},
+             {:name, %{}, "X"}
+           ] = lex("1_000 X")
+  end
 end

From e5ed1f36d9228b7122e62ffba745d44c580afb94 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Lukas=20Backstr=C3=B6m?=
Date: Wed, 6 May 2026 14:33:54 +0200
Subject: [PATCH 03/15] Tokenize `?=` as a single operator
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

`?=` is the maybe-expression match operator added in OTP 25 (stable in
OTP 27). Without it in `syntax_operators`, `X ?= Y` was lexed as two
operator tokens (`?` and `=`), which is wrong both visually and
semantically — it broke inside `maybe ... end` blocks.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 lib/makeup/lexers/erlang_lexer.ex             |  2 +-
 .../erlang_lexer_tokenizer_test.exs           | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/lib/makeup/lexers/erlang_lexer.ex b/lib/makeup/lexers/erlang_lexer.ex
index b67fb21..fa66bc9 100644
--- a/lib/makeup/lexers/erlang_lexer.ex
+++ b/lib/makeup/lexers/erlang_lexer.ex
@@ -225,7 +225,7 @@ defmodule Makeup.Lexers.ErlangLexer do
   syntax_operators =
     word_from_list(
-      ~W[+ - +? ++ = == -- * / < > /= =:= =/= =< >= ==? <- <:- <= <:= ! ? ?!],
+      ~W[+ - +? ++ = == -- * / < > /= =:= =/= =< >= ==? <- <:- <= <:= ! ? ?! ?=],
       :operator
     )
 
diff --git a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs
index 47f728e..cbc2b3a 100644
--- a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs
+++ b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs
@@ -431,6 +431,7 @@ defmodule ErlangLexerTokenizer do
     assert lex("<:-") == [{:operator, %{}, "<:-"}]
     assert lex("<=") == [{:operator, %{}, "<="}]
     assert lex("<:=") == [{:operator, %{}, "<:="}]
+    assert lex("?=") == [{:operator, %{}, "?="}]
   end
 
   test "word operators are tokenized as operator" do
@@ -580,6 +581,23 @@ defmodule ErlangLexerTokenizer do
     end
   end
 
+  describe "maybe expression" do
+    # `?=` is the maybe-expression match operator added in OTP 25.
+    test "tokenizes ?= as a single operator inside a maybe block" do
+      assert lex("maybe X ?= ok end") == [
+               {:keyword, %{}, "maybe"},
+               {:whitespace, %{}, " "},
+               {:name, %{}, "X"},
+               {:whitespace, %{}, " "},
+               {:operator, %{}, "?="},
+               {:whitespace, %{}, " "},
+               {:string_symbol, %{}, "ok"},
+               {:whitespace, %{}, " "},
+               {:keyword, %{}, "end"}
+             ]
+    end
+  end
+
   describe "native records (OTP 29)" do
     test "tokenizes external native record construction" do
       assert [

From 315529933964a178a6006b47fb9e23f4ad59678d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Lukas=20Backstr=C3=B6m?=
Date: Wed, 6 May 2026 14:37:12 +0200
Subject: [PATCH 04/15] Tokenize multi-character escape sequences in
 `$\\...` chars
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The `character` rule tokenized `$\\` followed by a single byte — fine
for simple `$\\n`, `$\\t`, but wrong for hex (`$\\xFF`, `$\\x{1F600}`),
octal (`$\\077`) and control (`$\\^A`) forms. Those were splitting into
a partial char-token plus stray name or integer tokens, which rendered
as broken syntax in the docs.
Add a dedicated `character_escape` rule that, after consuming the
leading backslash, tries the structured escapes (hex with its two
`x`-prefixed forms, octal, control) before falling back to any single
char. The order matters: `escape_hex` and `escape_octal` must precede
the single-char fallback so the multi-character forms are consumed
whole.

Co-Authored-By: Claude Opus 4.7 (1M context)

Emit `:string_escape` sub-tokens inside double-quoted strings

The `triple_quoted_string` rule already emitted `:string_escape`
sub-tokens for each escape sequence inside the string body. Plain
double-quoted strings did not — they used a literal `\"` recogniser
that only stopped the closing-quote logic from triggering early,
without producing a distinct token for the escape itself. Themes that
wanted to colour escapes differently from the surrounding string body
had no token to hook on.

Replace the special-purpose `escape_double_quote` with the generic
`escaped_char`, which itself was extended to consume structured escapes
(`\\xFF`, `\\x{...}`, `\\077`, `\\^A`) whole rather than truncating
after the leading byte. `string_like` now sees the same sub-token
vocabulary in `"..."` strings and `"""..."""` triple-quoted strings.

Existing string-escape tests updated to match the new (richer) output.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 lib/makeup/lexers/erlang_lexer.ex             | 31 ++++++--
 .../erlang_lexer_tokenizer_test.exs           | 77 +++++++++++++++++--
 2 files changed, 98 insertions(+), 10 deletions(-)

diff --git a/lib/makeup/lexers/erlang_lexer.ex b/lib/makeup/lexers/erlang_lexer.ex
index fa66bc9..db3e7f4 100644
--- a/lib/makeup/lexers/erlang_lexer.ex
+++ b/lib/makeup/lexers/erlang_lexer.ex
@@ -157,10 +157,23 @@ defmodule Makeup.Lexers.ErlangLexer do
     |> optional(string(".") |> concat(atom_name))
     |> token(:name_label)
 
+  # `$\xFF`, `$\x{1F600}`, `$\077`, `$\^A`, plus simple `$\n` / `$\t` / `$\\` /
+  # `$\"` / `$\'` etc.
+  # The structured escapes (octal, hex, ctrl) must be tried
+  # before the single-char fallback so multi-character sequences are consumed
+  # whole.
+  character_escape =
+    string("\\")
+    |> choice([
+      escape_hex,
+      escape_octal,
+      escape_ctrl,
+      utf8_char([])
+    ])
+
   character =
     string("$")
     |> choice([
-      string("\\") |> utf8_char([]),
+      character_escape,
       utf8_char(not: ?\\)
     ])
     |> token(:string_char)
@@ -171,14 +184,22 @@ defmodule Makeup.Lexers.ErlangLexer do
     |> ascii_char(to_charlist("~#+BPWXb-ginpswx"))
     |> token(:string_interpol)
 
-  escape_double_quote = string(~s/\\"/)
-  erlang_string = string_like(~s/"/, ~s/"/, [escape_double_quote, string_interpol], :string)
-
+  # Sub-token emitted inside string literals for escape sequences. Mirrors
+  # the `character_escape` shape so multi-character escapes (`\xFF`,
+  # `\x{1F600}`, `\077`, `\^A`) are consumed whole instead of getting
+  # cut at the first byte. Themes can render these distinctly.
   escaped_char =
     string("\\")
-    |> utf8_string([], 1)
+    |> choice([
+      escape_hex,
+      escape_octal,
+      escape_ctrl,
+      utf8_char([])
+    ])
     |> token(:string_escape)
 
+  erlang_string = string_like(~s/"/, ~s/"/, [escaped_char, string_interpol], :string)
+
   triple_quoted_string =
     lookahead_string(string(~s/"""\n/), string(~s/\n"""/), [escaped_char, string_interpol])
 
diff --git a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs
index cbc2b3a..0bcc194 100644
--- a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs
+++ b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs
@@ -20,6 +20,36 @@ defmodule ErlangLexerTokenizer do
     assert lex("$🫂") == [{:string_char, %{}, "$🫂"}]
   end
 
+  describe "character escape sequences" do
+    test "named escapes" do
+      assert lex("$\\n") == [{:string_char, %{}, "$\\n"}]
+      assert lex("$\\t") == [{:string_char, %{}, "$\\t"}]
+      assert lex("$\\\\") == [{:string_char, %{}, "$\\\\"}]
+      assert lex("$\\\"") == [{:string_char, %{}, "$\\\""}]
+    end
+
+    test "octal escape" do
+      assert lex("$\\7") == [{:string_char, %{}, "$\\7"}]
+      assert lex("$\\07") == [{:string_char, %{}, "$\\07"}]
+      assert lex("$\\077") == [{:string_char, %{}, "$\\077"}]
+    end
+
+    test "hex escape (two-digit form)" do
+      assert lex("$\\xFF") == [{:string_char, %{}, "$\\xFF"}]
+      assert lex("$\\x4a") == [{:string_char, %{}, "$\\x4a"}]
+    end
+
+    test "hex escape (braced form)" do
+      assert lex("$\\x{1F600}") == [{:string_char, %{}, "$\\x{1F600}"}]
+      assert lex("$\\x{0}") == [{:string_char, %{}, "$\\x{0}"}]
+    end
+
+    test "control escape" do
+      assert lex("$\\^A") == [{:string_char, %{}, "$\\^A"}]
+      assert lex("$\\^z") == [{:string_char, %{}, "$\\^z"}]
+    end
+  end
+
   test "comment" do
     assert lex("%abc") == [{:comment_single, %{}, "%abc"}]
     assert lex("% abc") == [{:comment_single, %{}, "% abc"}]
@@ -148,16 +178,53 @@ defmodule ErlangLexerTokenizer do
     end
 
     test "tokenizes escape of double quotes correctly" do
-      assert [{:string, %{}, ~s/"escape \\"double quote\\""/}] ==
-               lex(~s/"escape \\"double quote\\""/)
+      # Strings now produce :string_escape sub-tokens for each escape
+      # sequence (mirroring the triple-quoted-string behaviour and
+      # `makeup_elixir`). Themes can render escapes distinctly from the
+      # surrounding string body.
+      assert [
+               {:string, %{}, ~s/"escape /},
+               {:string_escape, %{}, ~s/\\"/},
+               {:string, %{}, "double quote"},
+               {:string_escape, %{}, ~s/\\"/},
+               {:string, %{}, "\""}
+             ] = lex(~s/"escape \\"double quote\\""/)
 
-      assert [{:string, %{}, ~s/"\\"quote\\""/}] == lex(~s/"\\"quote\\""/)
       assert {:string, %{}, ~s/"invalid string\\"/} not in lex(~s/"invalid string\\"/)
     end
 
     test "tokenizes literal escaped characters correctly" do
-      assert [{:string, %{}, ~s/"\\b"/}] == lex(~s/"\\b"/)
-      assert [{:string, %{}, ~s/"\\\\b"/}] == lex(~s/"\\\\b"/)
+      assert [
+               {:string, %{}, "\""},
+               {:string_escape, %{}, "\\b"},
+               {:string, %{}, "\""}
+             ] = lex(~s/"\\b"/)
+
+      assert [
+               {:string, %{}, "\""},
+               {:string_escape, %{}, "\\\\"},
+               {:string, %{}, "b\""}
+             ] = lex(~s/"\\\\b"/)
+    end
+
+    test "tokenizes hex / octal / control escapes inside strings" do
+      assert [
+               {:string, %{}, ~s/"a/},
+               {:string_escape, %{}, ~s/\\xFF/},
+               {:string, %{}, "b\""}
+             ] = lex(~s/"a\\xFFb"/)
+
+      assert [
+               {:string, %{}, ~s/"a/},
+               {:string_escape, %{}, "\\077"},
+               {:string, %{}, "b\""}
+             ] = lex(~s/"a\\077b"/)
+
+      assert [
+               {:string, %{}, ~s/"a/},
+               {:string_escape, %{}, "\\^A"},
+               {:string, %{}, "b\""}
+             ] = lex(~s/"a\\^Ab"/)
+    end
   end

From 7d236a658bf786a3ad8217cdbf34629f1dd85c09 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Lukas=20Backstr=C3=B6m?=
Date: Wed, 6 May 2026 14:38:08 +0200
Subject: [PATCH 05/15] Recover keywords misclassified as function names
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The `function` rule eagerly matches any atom-shaped token followed by
`(` and tags it `:name_function`. For reserved words written adjacent
to `(` — most commonly `fun(X) -> ... end` — that loses the keyword
classification, because the postprocess pass only checked
`:string_symbol` tokens against the keyword list.

Add a postprocess clause that converts `:name_function` tokens whose
value is in the keyword list back to `:keyword`.
Reserved words can't legally be defined as function names in Erlang, so this is unambiguous. Co-Authored-By: Claude Opus 4.7 (1M context) --- lib/makeup/lexers/erlang_lexer.ex | 8 +++++++ .../erlang_lexer_tokenizer_test.exs | 23 +++++++++++++++++++ 2 files changed, 31 insertions(+) diff --git a/lib/makeup/lexers/erlang_lexer.ex b/lib/makeup/lexers/erlang_lexer.ex index db3e7f4..b57daad 100644 --- a/lib/makeup/lexers/erlang_lexer.ex +++ b/lib/makeup/lexers/erlang_lexer.ex @@ -450,6 +450,14 @@ defmodule Makeup.Lexers.ErlangLexer do defp postprocess_helper([{:string_symbol, meta, value} | tokens]) when value in @keywords, do: [{:keyword, meta, value} | postprocess_helper(tokens)] + # Keywords followed by `(` are first matched by the `function` rule and + # tagged `:name_function`. Recover them here. The most common case is + # `fun(X) -> ... end`; the rule also covers any other keyword that gets + # written next to `(` (e.g. `if(X)` in a teaching example of invalid + # syntax). + defp postprocess_helper([{:name_function, meta, value} | tokens]) when value in @keywords, + do: [{:keyword, meta, value} | postprocess_helper(tokens)] + defp postprocess_helper([{:string_symbol, meta, value} | tokens]) when value in @builtins, do: [{:name_builtin, meta, value} | postprocess_helper(tokens)] diff --git a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs index 0bcc194..01dffc0 100644 --- a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs +++ b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs @@ -665,6 +665,29 @@ defmodule ErlangLexerTokenizer do end end + describe "fun keyword vs function call" do + test "fun(X) -> ... 
end tokenizes `fun` as keyword, not function name" do + assert [ + {:keyword, %{}, "fun"}, + {:punctuation, _, "("}, + {:name, %{}, "X"}, + {:punctuation, _, ")"} | _ + ] = lex("fun(X) -> X end") + end + + test "fun mod:func/2 still tokenizes correctly" do + assert [ + {:keyword, %{}, "fun"}, + {:whitespace, %{}, " "}, + {:name_class, %{}, "mod"}, + {:punctuation, %{}, ":"}, + {:string_symbol, %{}, "func"}, + {:punctuation, %{}, "/"}, + {:number_integer, %{}, "2"} + ] = lex("fun mod:func/2") + end + end + describe "native records (OTP 29)" do test "tokenizes external native record construction" do assert [ From 65258a802b3c1315d7136f8d4f898967a374849f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Lukas=20Backstr=C3=B6m?= Date: Wed, 6 May 2026 14:52:17 +0200 Subject: [PATCH 06/15] Generate the BIF list at compile time from `erl_internal:bif/2` MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The static `@builtins` list had bit-rotted: it was missing post-OTP-19 BIFs (`map_get/2`, `is_map_key/2`, `binary_part/2,3`, `floor/1`, `ceil/1`, `min/2`, `max/2`, `unique_integer/{0,1}`, `monotonic_time/{0,1}`, etc.) and contained at least one typo (`resume_processround` — no such BIF, presumably a merge of `resume_process` and `round`). Replace it with a compile-time-generated list sourced from `erl_internal:bif/2` — the same predicate the Erlang compiler uses to decide what's auto-imported. Every rebuild of `makeup_erlang` re-syncs the list with the OTP version we compile against. 122 BIFs vs the previous ~85. Also add a postprocess clause that converts `:name_function` tokens whose value is a BIF back to `:name_builtin` (analogous to the keyword-recovery clause). Closes makeup_erlang #13: `length(L)` and similar BIF calls now render as builtins instead of plain function calls. The pre-existing string-symbol → name_builtin clause was unchanged and still applies in non-`(`-followed positions (e.g. 
`length` standalone in documentation prose). Both clauses share the same `@builtins` list. Co-Authored-By: Claude Opus 4.7 (1M context) --- lib/makeup/lexers/erlang_lexer.ex | 38 ++++++++----------- .../erlang_lexer_tokenizer_test.exs | 32 +++++++++++++++- 2 files changed, 46 insertions(+), 24 deletions(-) diff --git a/lib/makeup/lexers/erlang_lexer.ex b/lib/makeup/lexers/erlang_lexer.ex index b57daad..fe5c9e6 100644 --- a/lib/makeup/lexers/erlang_lexer.ex +++ b/lib/makeup/lexers/erlang_lexer.ex @@ -414,29 +414,15 @@ defmodule Makeup.Lexers.ErlangLexer do @keywords ~W[after begin case catch cond end fun if let of query receive try when maybe else] - @builtins ~W[ - abs append_element apply atom_to_list binary_to_list bitstring_to_list - binary_to_term bit_size bump_reductions byte_size cancel_timer - check_process_code delete_module demonitor disconnect_node display - element erase exit float float_to_list fun_info fun_to_list - function_exported garbage_collect get get_keys group_leader hash - hd integer_to_list iolist_to_binary iolist_size is_atom is_binary - is_bitstring is_boolean is_builtin is_float is_function is_integer - is_list is_number is_pid is_port is_process_alive is_record is_reference - is_tuple length link list_to_atom list_to_binary list_to_bitstring - list_to_existing_atom list_to_float list_to_integer list_to_pid - list_to_tuple load_module localtime_to_universaltime make_tuple - md5 md5_final md5_update memory module_loaded monitor monitor_node - node nodes open_port phash phash2 pid_to_list port_close port_command - port_connect port_control port_call port_info port_to_list - process_display process_flag process_info purge_module put read_timer - ref_to_list register resume_processround send send_after send_nosuspend - set_cookie setelement size spawn spawn_link spawn_monitor spawn_opt - split_binary start_timer statistics suspend_process system_flag - system_info system_monitor system_profile term_to_binary tl trace - trace_delivered
trace_info trace_pattern trunc tuple_size tuple_to_list - universaltime_to_localtime unlink unregister whereis - ] + # Auto-imported BIFs, sourced at compile time from `erl_internal:bif/2` — + # the same predicate the Erlang compiler uses to decide what's auto-imported. + # Refreshed every time `makeup_erlang` is rebuilt, so the list stays in sync + # with the OTP version we compile against and never bit-rots. + @builtins :erlang.module_info(:exports) + |> Enum.filter(fn {name, arity} -> :erl_internal.bif(name, arity) end) + |> Enum.map(fn {name, _arity} -> Atom.to_string(name) end) + |> Enum.uniq() + |> Enum.sort() @word_operators ~W[and andalso band bnot bor bsl bsr bxor div not or orelse rem xor] @@ -461,6 +447,12 @@ defmodule Makeup.Lexers.ErlangLexer do defp postprocess_helper([{:string_symbol, meta, value} | tokens]) when value in @builtins, do: [{:name_builtin, meta, value} | postprocess_helper(tokens)] + # Same recovery for builtins: when a BIF is called as `length(L)` it is + # first matched by the `function` rule and tagged `:name_function`. Closes + # makeup_erlang #13. + defp postprocess_helper([{:name_function, meta, value} | tokens]) when value in @builtins, + do: [{:name_builtin, meta, value} | postprocess_helper(tokens)] + defp postprocess_helper([{:string_symbol, meta, value} | tokens]) when value in @word_operators, do: [{:operator_word, meta, value} | postprocess_helper(tokens)] diff --git a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs index 01dffc0..b1ca0bf 100644 --- a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs +++ b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs @@ -665,6 +665,36 @@ defmodule ErlangLexerTokenizer do end end + describe "builtin (BIF) recognition" do + # The @builtins list is generated at compile time from `erl_internal:bif/2`. 
+ test "atoms that are auto-imported BIFs render as :name_builtin" do + assert [{:name_builtin, %{}, "length"}] = lex("length") + assert [{:name_builtin, %{}, "tuple_size"}] = lex("tuple_size") + end + + test "BIF calls (`name(...)`) render as :name_builtin not :name_function" do + # makeup_erlang #13. Before this fix, `length(L)` rendered as a regular + # function call instead of a builtin. + assert [{:name_builtin, %{}, "length"} | _] = lex("length(L)") + assert [{:name_builtin, %{}, "is_atom"} | _] = lex("is_atom(X)") + assert [{:name_builtin, %{}, "tuple_size"} | _] = lex("tuple_size(T)") + end + + test "post-OTP-19 BIFs are recognised (proves the static list is gone)" do + assert [{:name_builtin, %{}, "map_get"} | _] = lex("map_get(K, M)") + assert [{:name_builtin, %{}, "is_map_key"} | _] = lex("is_map_key(K, M)") + assert [{:name_builtin, %{}, "binary_part"} | _] = lex("binary_part(B, 0, 4)") + assert [{:name_builtin, %{}, "floor"} | _] = lex("floor(X)") + assert [{:name_builtin, %{}, "ceil"} | _] = lex("ceil(X)") + end + + test "module_info and nif_error are not classified as BIFs" do + # Both are exported from `erlang` but neither is auto-imported. + refute Enum.any?(lex("module_info"), &match?({:name_builtin, _, "module_info"}, &1)) + refute Enum.any?(lex("nif_error"), &match?({:name_builtin, _, "nif_error"}, &1)) + end + end + describe "fun keyword vs function call" do test "fun(X) -> ... 
end tokenizes `fun` as keyword, not function name" do assert [ @@ -1106,7 +1136,7 @@ defmodule ErlangLexerTokenizer do *** argument 1: not an iolist term """) == [ {:generic_prompt, %{selectable: false}, "1> "}, - {:name_function, %{}, "list_to_binary"}, + {:name_builtin, %{}, "list_to_binary"}, {:punctuation, %{group_id: "group-1"}, "("}, {:punctuation, %{group_id: "group-2"}, "<<"}, {:punctuation, %{group_id: "group-2"}, ">>"}, From 31b82cbd5a3b10b77bdb8288cd1f2fd1936776f9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Lukas=20Backstr=C3=B6m?= Date: Wed, 6 May 2026 15:08:52 +0200 Subject: [PATCH 07/15] Detect prompts after multi-line whitespace blocks The `erl_prompt` rule used to require a literal `\n` immediately before the prompt body. When the generic `whitespace` rule earlier in the choice consumed a multi-character whitespace block ending in `\n` (e.g. `"\n \n1> ok."`), no `\n` remained at the prompt rule's expected position and the prompt was lexed as plain `[number_integer, operator]` instead of `:generic_prompt`. See makeup_elixir #28 for the same-shape bug. Match any leading whitespace block that contains at least one `\n`, which keeps the rule anchored to a line boundary while tolerating preceding spaces / tabs / further newlines. False-positives on `1 > 2` and `x. 1> a.` are still rejected because neither contains a `\n` between the operand and `>`. Co-Authored-By: Claude Opus 4.7 (1M context) --- lib/makeup/lexers/erlang_lexer.ex | 13 +++++++++++-- .../erlang_lexer/erlang_lexer_tokenizer_test.exs | 13 +++++++++++++ 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/lib/makeup/lexers/erlang_lexer.ex b/lib/makeup/lexers/erlang_lexer.ex index fe5c9e6..fc305e8 100644 --- a/lib/makeup/lexers/erlang_lexer.ex +++ b/lib/makeup/lexers/erlang_lexer.ex @@ -309,10 +309,19 @@ defmodule Makeup.Lexers.ErlangLexer do |> concat(token("/", :punctuation)) |> concat(number_integer) - # Erlang prompt + # Erlang prompt. 
Anchored to a line boundary by requiring the leading + # whitespace to contain at least one `\n`. The original rule required + # the `\n` immediately before the prompt body, which broke when the + # generic `whitespace` rule had already consumed the trailing `\n` of + # a multi-character whitespace block (see makeup_elixir #28). Allowing + # any leading non-newline whitespace before the `\n` and any further + # whitespace after lets the rule match in those cases without + # false-positiving on `1 > 2` or `x. 1> a.` (neither contains a `\n` + # in the relevant position). erl_prompt = - ascii_string([?\s, ?\r, ?\t], min: 0) + ascii_string([?\s, ?\f, ?\r, ?\t], min: 0) |> string("\n") + |> optional(ascii_string([?\s, ?\f, ?\r, ?\n, ?\t], min: 1)) |> token(:whitespace) |> concat( optional(string("(") |> concat(atom_name) |> string(")")) diff --git a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs index b1ca0bf..310fd3a 100644 --- a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs +++ b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs @@ -1059,6 +1059,19 @@ defmodule ErlangLexerTokenizer do ] end + # makeup_elixir #28 analogue. The whitespace rule used to consume + # multi-line whitespace blocks greedily, leaving no `\n` for the prompt + # rule to anchor against. The prompt rule now matches any leading + # whitespace block that contains a `\n`. + test "is detected after a multi-line whitespace block" do + assert [ + {:whitespace, %{}, "\n \n"}, + {:generic_prompt, %{selectable: false}, "1> "}, + {:string_symbol, %{}, "ok"}, + {:punctuation, %{}, "."} + ] = lex("\n \n1> ok.") + end + test "with newlines" do assert lex("x. 
1> a.") == [ {:string_symbol, %{}, "x"}, From 13133427f5adafee0390e7ede756067a482a250f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Lukas=20Backstr=C3=B6m?= Date: Wed, 6 May 2026 15:09:35 +0200 Subject: [PATCH 08/15] Lex `_5` / `_X` / `_unused` as a single variable Erlang's grammar treats any identifier starting with `_` followed by identifier characters as a variable (typically a "don't bother to warn me about this" hint). The lexer was tokenising `_5` as `[punctuation "_", number_integer 5]` because `_` appears in the generic punctuation list and was matched before the variable rule. Add a dedicated `underscore_identifier` rule that matches `_` followed by at least one identifier character and emits `:name`, placed before `punctuation` in the choice. Bare `_` (the wildcard pattern) remains a punctuation token so themes can render the two distinctly. Co-Authored-By: Claude Opus 4.7 (1M context) --- lib/makeup/lexers/erlang_lexer.ex | 11 +++++++ .../erlang_lexer_tokenizer_test.exs | 30 +++++++++++++++++++ 2 files changed, 41 insertions(+) diff --git a/lib/makeup/lexers/erlang_lexer.ex b/lib/makeup/lexers/erlang_lexer.ex index fc305e8..b56ce45 100644 --- a/lib/makeup/lexers/erlang_lexer.ex +++ b/lib/makeup/lexers/erlang_lexer.ex @@ -96,6 +96,16 @@ defmodule Makeup.Lexers.ErlangLexer do ascii_string([?A..?Z, ?_], 1) |> optional(ascii_string([?a..?z, ?_, ?0..?9, ?A..?Z], min: 1)) + # An underscore followed by at least one identifier character (`_5`, + # `_X`, `_unused`). Bare `_` stays as a punctuation token (the wildcard + # pattern), but underscore-prefixed identifiers like `_5` are variables + # in Erlang's grammar and should render as `:name`. Without this rule the + # `_` is matched first by the `punctuation` rule and the rest of the + # identifier falls through.
+ underscore_identifier = + string("_") + |> ascii_string([?a..?z, ?_, ?0..?9, ?A..?Z], min: 1) + |> token(:name) + simple_atom_name = ascii_string([?a..?z], 1) |> optional(ascii_string([?a..?z, ?_, ?@, ?0..?9, ?A..?Z], min: 1)) @@ -375,6 +385,7 @@ defmodule Makeup.Lexers.ErlangLexer do [ native_record_external, record, + underscore_identifier, punctuation, # `tuple` might be unnecessary tuple, diff --git a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs index 310fd3a..c26a977 100644 --- a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs +++ b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs @@ -80,6 +80,36 @@ defmodule ErlangLexerTokenizer do assert lex("A_b1") == [{:name, %{}, "A_b1"}] end + describe "underscore-prefixed variables" do + test "underscore + digit lexes as a single variable" do + assert lex("_5") == [{:name, %{}, "_5"}] + end + + test "underscore + lowercase lexes as a single variable" do + assert lex("_unused") == [{:name, %{}, "_unused"}] + end + + test "underscore + uppercase lexes as a single variable" do + assert lex("_X") == [{:name, %{}, "_X"}] + end + + test "bare underscore (wildcard) stays as punctuation" do + # Pattern wildcard. Treat as punctuation so themes can render it + # distinctly from a variable name. 
+ assert [ + {:keyword, %{}, "case"}, + {:whitespace, %{}, " "}, + {:name, %{}, "X"}, + {:whitespace, %{}, " "}, + {:keyword, %{}, "of"}, + {:whitespace, %{}, " "}, + {:punctuation, %{}, "_"}, + {:whitespace, %{}, " "}, + {:punctuation, %{}, "->"} | _ + ] = lex("case X of _ -> ok end") + end + end + test "function call" do assert lex("f(") == [ {:name_function, %{}, "f"}, From d9e00e1b2c915783b6127c142be2de51b5f87339 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Lukas=20Backstr=C3=B6m?= Date: Wed, 6 May 2026 15:10:15 +0200 Subject: [PATCH 09/15] Lock the current OTP module-attribute set with tests The `module_attribute` rule already accepts any atom-shaped name as the attribute, so all current and future OTP attributes work without lexer changes. Add an explicit list of every current attribute (`-callback`, `-optional_callbacks`, `-on_load`, `-nifs`, `-deprecated`, `-removed`, `-feature`, `-export_type`, `-export_record` and `-import_record` from the native-records work, plus the historically-supported set) and assert each one tokenises as `:name_attribute`. Catches accidental regressions if anyone ever narrows the rule. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../erlang_lexer_tokenizer_test.exs | 34 +++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs index c26a977..11175bd 100644 --- a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs +++ b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs @@ -748,6 +748,40 @@ defmodule ErlangLexerTokenizer do end end + describe "OTP-current module attribute coverage" do + # The generic `module_attribute` rule accepts any `atom_name`, which + # means new attributes ship without lexer changes. Lock the current + # OTP-supported set with an explicit assertion list so the rule + # keeps covering them. 
@known_attributes ~w[module export import behaviour behavior callback + optional_callbacks on_load nifs deprecated removed + feature compile export_type record export_record + import_record spec type opaque doc moduledoc define + ifdef ifndef else endif if elif vsn] + + test "every current OTP module attribute lexes as :name_attribute" do + for attr <- @known_attributes do + # Use `(Body)` so the body is one well-known token. The point of + # the test is the attribute name, not the body shape. + expected = [ + {:whitespace, %{}, "\n"}, + {:punctuation, %{}, "-"}, + {:name_attribute, %{}, attr}, + {:punctuation, %{group_id: "group-1"}, "("}, + {:name, %{}, "Body"}, + {:punctuation, %{group_id: "group-1"}, ")"} + ] + + actual = lex("\n-" <> attr <> "(Body)") + + assert actual == expected, + "expected -#{attr} to lex as :name_attribute\n" <> + "expected: #{inspect(expected)}\n" <> + "actual: #{inspect(actual)}" + end + end + end + describe "native records (OTP 29)" do test "tokenizes external native record construction" do assert [ From 27f46947fb176666a22aaeda634af9391dfc1e91 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Lukas=20Backstr=C3=B6m?= Date: Wed, 6 May 2026 15:12:43 +0200 Subject: [PATCH 10/15] Distinguish parameterised macros from parameterless ones MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `makeup_elixir` splits `@foo` and `@foo(...)` into different tokens. The Erlang equivalent — `?FOO` vs `?FOO(args)` — used to collapse both into `:name_constant`, and worse, the `?` operator in `syntax_operators` was tried first in the choice and ate the leading `?` of any macro reference, leaving `?FOO` to lex as `[operator "?", name "FOO"]`. Add a separate `macro_call` rule that matches `?Name(...)`-style references and emits `:name_function`, keep the existing `macro` rule (now `:name_constant`) for parameterless references, and move both ahead of `syntax_operators` in the choice so the operator rule no longer captures the `?`.
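The intended classification can be sketched as a standalone regex toy (the real lexer uses NimbleParsec combinators; this only illustrates the split between a bare macro head and one followed by `(`):

```elixir
defmodule MacroShapeSketch do
  # Regex stand-in for the real rules: a macro head is `?` followed by an
  # atom- or variable-shaped name.
  @head ~r/^\?[a-zA-Z_][a-zA-Z0-9_@]*/

  def classify(src) do
    case Regex.run(@head, src) do
      nil ->
        :not_a_macro

      [head] ->
        rest = src |> String.replace_prefix(head, "") |> String.trim_leading()
        # A following `(` makes it a parameterised reference.
        if String.starts_with?(rest, "("),
          do: {:name_function, head},
          else: {:name_constant, head}
    end
  end
end

MacroShapeSketch.classify("?FOO(X)")  # => {:name_function, "?FOO"}
MacroShapeSketch.classify("?FOO, X")  # => {:name_constant, "?FOO"}
```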
Co-Authored-By: Claude Opus 4.7 (1M context) --- lib/makeup/lexers/erlang_lexer.ex | 20 +++++++++++++++- .../erlang_lexer_tokenizer_test.exs | 24 +++++++++++++++++++ 2 files changed, 43 insertions(+), 1 deletion(-) diff --git a/lib/makeup/lexers/erlang_lexer.ex b/lib/makeup/lexers/erlang_lexer.ex index b56ce45..8612cb6 100644 --- a/lib/makeup/lexers/erlang_lexer.ex +++ b/lib/makeup/lexers/erlang_lexer.ex @@ -156,6 +156,20 @@ defmodule Makeup.Lexers.ErlangLexer do macro_name = choice([variable_name, atom_name]) + # Parameterised macro reference: `?FOO(arg1, arg2)`. Tokenised + # separately from the parameterless form so themes can render the two + # distinctly (matches `makeup_elixir`'s split between `@foo` and + # `@foo(...)`). The macro head emits as `:name_function`; the trailing + # `(` opens the standard punctuation group so paren matching still + # works. + macro_call = + string("?") + |> concat(macro_name) + |> token(:name_function) + |> concat(optional(whitespace)) + |> concat(token("(", :punctuation)) + + # Parameterless macro: `?FOO`. Constants by convention. macro = string("?") |> concat(macro_name) @@ -386,6 +400,11 @@ defmodule Makeup.Lexers.ErlangLexer do native_record_external, record, underscore_identifier, + # Macros must be tried before `syntax_operators`, since the + # operator list contains `?` and `?=` and would otherwise eat the + # leading `?` of `?FOO` / `?FOO(X)`. 
+ macro_call, + macro, punctuation, # `tuple` might be unnecessary tuple, @@ -400,7 +419,6 @@ defmodule Makeup.Lexers.ErlangLexer do function_arity, function, atom, - macro, character, label, # If we can't parse any of the above, we highlight the next character as an error diff --git a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs index 11175bd..4f28d94 100644 --- a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs +++ b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs @@ -725,6 +725,30 @@ defmodule ErlangLexerTokenizer do end end + describe "macros" do + test "parameterless macro tokenizes as :name_constant" do + assert lex("?FOO") == [{:name_constant, %{}, "?FOO"}] + assert lex("?bar") == [{:name_constant, %{}, "?bar"}] + end + + test "parameterised macro head tokenizes as :name_function" do + assert [ + {:name_function, %{}, "?FOO"}, + {:punctuation, _, "("}, + {:name, %{}, "X"}, + {:punctuation, _, ")"} + ] = lex("?FOO(X)") + end + + test "parameterless macro followed by punctuation stays as constant" do + # `?FOO,` shouldn't be lured into the parameterised form. + assert [ + {:name_constant, %{}, "?FOO"}, + {:punctuation, %{}, ","} | _ + ] = lex("?FOO, X") + end + end + describe "fun keyword vs function call" do test "fun(X) -> ... end tokenizes `fun` as keyword, not function name" do assert [ From a62069ab8ed9a6dbb4a4e13e8f4a2fdf988e70d0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Lukas=20Backstr=C3=B6m?= Date: Wed, 6 May 2026 15:17:08 +0200 Subject: [PATCH 11/15] Support quadruple- and quintuple-quoted strings (OTP 27+) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit OTP 27's triple-quoted-string spec extends to N quotes (N >= 3): an opening run of N quotes on its own line opens the string and a matching run of N quotes on its own line closes it. 
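For concreteness, the two widths written as escaped Elixir string literals (the 4-quote opener exists precisely so the body can hold a literal 3-quote run):

```elixir
# Source text of a triple- and a quadruple-quoted Erlang string:
triple = "\"\"\"\nplain body\n\"\"\""
quad   = "\"\"\"\"\nbody with a literal \"\"\" inside\n\"\"\"\""
```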
The lexer only recognised N=3, so any string using a 4-quote opener (which is the canonical way to embed a literal `"""` in the body) was lexed as multiple unrelated tokens. Add explicit `quadruple_quoted_string` and `quintuple_quoted_string` rules — NimbleParsec doesn't support dynamic delimiter lengths, so each width needs its own rule. Place the longer-quote variants ahead of the triple-quote rule in the choice so the longest matching opener wins. Also extend `sigil_delimiters` with `""""\n` / `\n""""` and the quintuple analogue (plus the matching `''''` / `'''''` variants), so sigil-prefixed multi-quoted strings (`~b""""..."""" `, `~B""""..."""" `, etc.) get the same coverage. The sub-token vocabulary inside the body — `:string_escape` for escape sequences, `:string_interpol` for `~p` / `~b` etc. — is identical across all widths, since they all share the same element list. Co-Authored-By: Claude Opus 4.7 (1M context) Lock OTP 27 sigil-delimiter spec coverage with tests The spec at https://www.erlang.org/doc/system/data_types.html#sigil defines the allowed sigil delimiters as: * pair forms: `()` `[]` `{}` `<>` * symmetric forms: `/` `|` `'` `"` `` ` `` `#` * triple-quote forms: `"""` `'''` (with quad/quint extensions for bodies that need to contain a literal `"""` / `""""`) The current `sigil_delimiters` list already covers every entry, but nothing locked the coverage. Add per-delimiter tests so a future narrowing of the list trips a test rather than silently dropping a valid sigil form. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- lib/makeup/lexers/erlang_lexer.ex | 31 +++++++ .../erlang_lexer_tokenizer_test.exs | 86 +++++++++++++++++++ 2 files changed, 117 insertions(+) diff --git a/lib/makeup/lexers/erlang_lexer.ex b/lib/makeup/lexers/erlang_lexer.ex index 8612cb6..4a05f97 100644 --- a/lib/makeup/lexers/erlang_lexer.ex +++ b/lib/makeup/lexers/erlang_lexer.ex @@ -224,10 +224,39 @@ defmodule Makeup.Lexers.ErlangLexer do erlang_string = string_like(~s/"/, ~s/"/, [escaped_char, string_interpol], :string) + # Multi-quoted strings (OTP 27+). The opening run of `"""` (or more) on + # its own line opens the string; a matching run on its own line closes + # it. Use a quadruple/quintuple opener when the body needs to contain + # `"""` literally. Each variant is a separate rule because NimbleParsec + # doesn't support dynamic delimiter lengths; longer-quote variants must + # be tried first so the triple-quote rule doesn't claim them prematurely. + quintuple_quoted_string = + lookahead_string( + string(~s/"""""\n/), + string(~s/\n"""""/), + [escaped_char, string_interpol] + ) + + quadruple_quoted_string = + lookahead_string( + string(~s/""""\n/), + string(~s/\n""""/), + [escaped_char, string_interpol] + ) + triple_quoted_string = lookahead_string(string(~s/"""\n/), string(~s/\n"""/), [escaped_char, string_interpol]) + # Longer-quote variants must come first so the longest matching delimiter + # wins for sigils like `~"""""..."""""` (quintuple) or `~""""..."""" ` + # (quadruple) — these are needed when the sigil body has to contain + # `"""` or `""""` literally, mirroring the rule for plain multi-quoted + # strings above. 
sigil_delimiters = [ + {~s["""""\n], ~s[\n"""""]}, + {"'''''\n", "\n'''''"}, + {~s[""""\n], ~s[\n""""]}, + {"''''\n", "\n''''"}, {~s["""\n], ~s[\n"""]}, {"'''\n", "\n'''"}, {"\"", "\""}, @@ -392,6 +421,8 @@ defmodule Makeup.Lexers.ErlangLexer do hashbang, whitespace, comment, + quintuple_quoted_string, + quadruple_quoted_string, triple_quoted_string, erlang_string ] ++ diff --git a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs index 4f28d94..9533765 100644 --- a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs +++ b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs @@ -772,6 +772,92 @@ defmodule ErlangLexerTokenizer do end end + # https://www.erlang.org/doc/system/data_types.html#sigil + describe "sigil delimiters (OTP 27 spec coverage)" do + # Pair delimiters: () [] {} <> + test "pair delimiters" do + for {open, close} <- [{"(", ")"}, {"[", "]"}, {"{", "}"}, {"<", ">"}] do + src = "~b" <> open <> "hi" <> close + + assert [{:string, %{}, ^src}] = lex(src), + "expected ~b#{open}hi#{close} to lex as a single :string" + end + end + + # Symmetric delimiters: / | ' " ` # + test "symmetric delimiters" do + for delim <- ["/", "|", "'", "\"", "`", "#"] do + src = "~b" <> delim <> "hi" <> delim + + assert [{:string, %{}, ^src}] = lex(src), + "expected ~b#{delim}hi#{delim} to lex as a single :string" + end + end + + test "triple-quote and triple-single-quote" do + assert [{:string, %{}, "~b\"\"\"\nhi\n\"\"\""}] = + lex("~b\"\"\"\nhi\n\"\"\"") + + assert [{:string, %{}, "~b'''\nhi\n'''"}] = + lex("~b'''\nhi\n'''") + end + + test "all sigil prefix kinds (~ ~b ~B ~s ~S) work with the same delimiters" do + for prefix <- ["~", "~b", "~B", "~s", "~S"] do + src = prefix <> "/hi/" + + assert [{:string, %{}, ^src}] = lex(src), + "expected #{prefix}/hi/ to lex as a single :string" + end + end + end + + describe "multi-quoted strings (OTP 27+)" do + test "triple-quoted string lexes as a single :string" 
do + assert [{:string, %{}, "\"\"\"\nfoo\n\"\"\""}] = lex("\"\"\"\nfoo\n\"\"\"") + end + + test "quadruple-quoted string lexes as a single :string" do + assert [{:string, %{}, "\"\"\"\"\nfoo\n\"\"\"\""}] = + lex("\"\"\"\"\nfoo\n\"\"\"\"") + end + + test "quadruple-quoted string can contain triple quotes in its body" do + # The whole point of using a quadruple opener: lets the body include + # `"""` literally without ending the string. + assert [{:string, %{}, body}] = + lex("\"\"\"\"\nhello \"\"\" inside\n\"\"\"\"") + + assert body =~ "\"\"\"" + end + + test "quintuple-quoted string can contain quadruple quotes in its body" do + assert [{:string, %{}, body}] = + lex("\"\"\"\"\"\nhi \"\"\"\" foo\n\"\"\"\"\"") + + assert body =~ "\"\"\"\"" + end + + test "escape sub-tokens still emitted inside quadruple-quoted strings" do + assert [ + {:string, %{}, "\"\"\"\"\nhi "}, + {:string_escape, %{}, "\\xFF"}, + {:string, %{}, " there\n\"\"\"\""} + ] = lex("\"\"\"\"\nhi \\xFF there\n\"\"\"\"") + end + + test "sigil prefixes work with quadruple-quoted strings" do + assert [{:string, %{}, "~b\"\"\"\"\nfoo\n\"\"\"\""}] = + lex("~b\"\"\"\"\nfoo\n\"\"\"\"") + + assert [{:string, %{}, "~B\"\"\"\"\nhello \"\"\" inside\n\"\"\"\""}] = + lex("~B\"\"\"\"\nhello \"\"\" inside\n\"\"\"\"") + + assert [{:string, %{}, "~\"\"\"\"\nhi\n\"\"\"\""}] = + lex("~\"\"\"\"\nhi\n\"\"\"\"") + end + end + describe "OTP-current module attribute coverage" do # The generic `module_attribute` rule accepts any `atom_name`, which # means new attributes ship without lexer changes. Lock the current From 0cf6946f449a5a89edb63f988d9fba98e5ea615e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Lukas=20Backstr=C3=B6m?= Date: Wed, 6 May 2026 15:29:56 +0200 Subject: [PATCH 12/15] Add tests for -doc / -moduledoc attributes Doc attributes are nearly universal in OTP 27+ modules and the canonical use case for triple-quoted strings. 
Lock coverage of the common shapes: triple-quoted body, single-line string body, and a `-doc """..."""` attribute followed by a function clause (which exercises the boundary between the doc string close `"""` and the function head). Co-Authored-By: Claude Opus 4.7 (1M context) --- .../erlang_lexer_tokenizer_test.exs | 55 +++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs index 9533765..ed0bbd3 100644 --- a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs +++ b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs @@ -858,6 +858,61 @@ defmodule ErlangLexerTokenizer do end end + describe "doc / moduledoc attributes (OTP 27+)" do + test "moduledoc with triple-quoted body" do + src = "-moduledoc \"\"\"\nThis module does X.\n\"\"\"" + + assert [ + {:punctuation, %{}, "-"}, + {:name_attribute, %{}, "moduledoc"}, + {:whitespace, %{}, " "}, + {:string, %{}, "\"\"\"\nThis module does X.\n\"\"\""} + ] = lex(src) + end + + test "doc attribute followed by a function definition" do + src = "-doc \"\"\"\nReturns true if X is positive.\n\"\"\".\nis_pos(X) when X > 0 -> true." 
+ + assert lex(src) == [ + {:punctuation, %{}, "-"}, + {:name_attribute, %{}, "doc"}, + {:whitespace, %{}, " "}, + {:string, %{}, "\"\"\"\nReturns true if X is positive.\n\"\"\""}, + {:punctuation, %{}, "."}, + {:whitespace, %{}, "\n"}, + {:name_function, %{}, "is_pos"}, + {:punctuation, %{group_id: "group-1"}, "("}, + {:name, %{}, "X"}, + {:punctuation, %{group_id: "group-1"}, ")"}, + {:whitespace, %{}, " "}, + {:keyword, %{}, "when"}, + {:whitespace, %{}, " "}, + {:name, %{}, "X"}, + {:whitespace, %{}, " "}, + {:operator, %{}, ">"}, + {:whitespace, %{}, " "}, + {:number_integer, %{}, "0"}, + {:whitespace, %{}, " "}, + {:punctuation, %{}, "->"}, + {:whitespace, %{}, " "}, + {:string_symbol, %{}, "true"}, + {:punctuation, %{}, "."} + ] + end + + test "doc with single-line string body still works" do + src = "-doc \"short\"." + + assert [ + {:punctuation, %{}, "-"}, + {:name_attribute, %{}, "doc"}, + {:whitespace, %{}, " "}, + {:string, %{}, "\"short\""}, + {:punctuation, %{}, "."} + ] = lex(src) + end + end + describe "OTP-current module attribute coverage" do # The generic `module_attribute` rule accepts any `atom_name`, which # means new attributes ship without lexer changes. Lock the current From bcad191ec5307c90d072f144990515603cd0c76a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Lukas=20Backstr=C3=B6m?= Date: Wed, 6 May 2026 15:30:26 +0200 Subject: [PATCH 13/15] Add tests for function clauses with guards Function-head guards exercise the interaction between several rule families: keyword recognition (`when`), word operators (`andalso`, `orelse`), comparison operators (`>`, `<`, `=/=`), BIF recognition (`is_integer`, `is_atom`), and the comma/semicolon guard separator. Lock the common shapes so a regression in any one of those would surface as a guard test failure. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../erlang_lexer_tokenizer_test.exs | 85 +++++++++++++++++++ 1 file changed, 85 insertions(+) diff --git a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs index ed0bbd3..12871a7 100644 --- a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs +++ b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs @@ -749,6 +749,91 @@ defmodule ErlangLexerTokenizer do end end + describe "function clauses with guards" do + test "guard with operator and BIF" do + assert [ + {:name_function, %{}, "f"}, + {:punctuation, _, "("}, + {:name, %{}, "X"}, + {:punctuation, _, ")"}, + {:whitespace, %{}, " "}, + {:keyword, %{}, "when"}, + {:whitespace, %{}, " "}, + {:name, %{}, "X"}, + {:whitespace, %{}, " "}, + {:operator, %{}, ">"}, + {:whitespace, %{}, " "}, + {:number_integer, %{}, "0"}, + {:punctuation, %{}, ","}, + {:whitespace, %{}, " "}, + {:name_builtin, %{}, "is_integer"}, + {:punctuation, _, "("}, + {:name, %{}, "X"}, + {:punctuation, _, ")"}, + {:whitespace, %{}, " "}, + {:punctuation, %{}, "->"} | _ + ] = lex("f(X) when X > 0, is_integer(X) -> X * 2.") + end + + test "guard sequence with `;` (alternative guards)" do + assert lex("f(X) when X < 0; X > 100 -> out_of_range.") == [ + {:name_function, %{}, "f"}, + {:punctuation, %{group_id: "group-1"}, "("}, + {:name, %{}, "X"}, + {:punctuation, %{group_id: "group-1"}, ")"}, + {:whitespace, %{}, " "}, + {:keyword, %{}, "when"}, + {:whitespace, %{}, " "}, + {:name, %{}, "X"}, + {:whitespace, %{}, " "}, + {:operator, %{}, "<"}, + {:whitespace, %{}, " "}, + {:number_integer, %{}, "0"}, + {:punctuation, %{}, ";"}, + {:whitespace, %{}, " "}, + {:name, %{}, "X"}, + {:whitespace, %{}, " "}, + {:operator, %{}, ">"}, + {:whitespace, %{}, " "}, + {:number_integer, %{}, "100"}, + {:whitespace, %{}, " "}, + {:punctuation, %{}, "->"}, + {:whitespace, %{}, " "}, + {:string_symbol, %{}, "out_of_range"}, + {:punctuation, %{}, 
"."} + ] + end + + test "guard with word operators (`andalso`, `orelse`)" do + assert lex("f(X) when is_atom(X) andalso X =/= undefined -> ok.") == [ + {:name_function, %{}, "f"}, + {:punctuation, %{group_id: "group-1"}, "("}, + {:name, %{}, "X"}, + {:punctuation, %{group_id: "group-1"}, ")"}, + {:whitespace, %{}, " "}, + {:keyword, %{}, "when"}, + {:whitespace, %{}, " "}, + {:name_builtin, %{}, "is_atom"}, + {:punctuation, %{group_id: "group-2"}, "("}, + {:name, %{}, "X"}, + {:punctuation, %{group_id: "group-2"}, ")"}, + {:whitespace, %{}, " "}, + {:operator_word, %{}, "andalso"}, + {:whitespace, %{}, " "}, + {:name, %{}, "X"}, + {:whitespace, %{}, " "}, + {:operator, %{}, "=/="}, + {:whitespace, %{}, " "}, + {:string_symbol, %{}, "undefined"}, + {:whitespace, %{}, " "}, + {:punctuation, %{}, "->"}, + {:whitespace, %{}, " "}, + {:string_symbol, %{}, "ok"}, + {:punctuation, %{}, "."} + ] + end + end + describe "fun keyword vs function call" do test "fun(X) -> ... end tokenizes `fun` as keyword, not function name" do assert [ From 55660ff4f755124edc47616bb62e2020f6ba24be Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Lukas=20Backstr=C3=B6m?= Date: Wed, 6 May 2026 15:31:12 +0200 Subject: [PATCH 14/15] Add tests for map and bitstring comprehensions Map comprehensions (OTP 26) and bitstring comprehensions (pre-existing, but tests scarce) exercise the interactions between several operator and punctuation tokens that the lexer hasn't explicitly tested in combination: `=>` and `:=` next to `||`, `<-`, `<=`, and the `\#{...}` map-open punctuation. Also lock strict-generator `<:-` (OTP 27) coverage with an explicit positive test rather than the operator-list catch-all. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../erlang_lexer_tokenizer_test.exs | 77 +++++++++++++++++++ 1 file changed, 77 insertions(+) diff --git a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs index 12871a7..a8b0d97 100644 --- a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs +++ b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs @@ -834,6 +834,83 @@ defmodule ErlangLexerTokenizer do end end + describe "newer comprehensions (OTP 26 / 27)" do + test "list comprehension with strict generator (OTP 27)" do + assert lex("[X || X <:- L]") == [ + {:punctuation, %{group_id: "group-1"}, "["}, + {:name, %{}, "X"}, + {:whitespace, %{}, " "}, + {:punctuation, %{}, "||"}, + {:whitespace, %{}, " "}, + {:name, %{}, "X"}, + {:whitespace, %{}, " "}, + {:operator, %{}, "<:-"}, + {:whitespace, %{}, " "}, + {:name, %{}, "L"}, + {:punctuation, %{group_id: "group-1"}, "]"} + ] + end + + test "map comprehension (OTP 26)" do + # `#{K => V * 2 || K := V <- M}` exercises map-open `\#{`, + # map arrow `=>`, comprehension separator `||`, map match + # operator `:=`, and the list-generator operator `<-`. 
+ assert lex("\#{K => V * 2 || K := V <- M}") == [ + {:punctuation, %{group_id: "group-1"}, "\#{"}, + {:name, %{}, "K"}, + {:whitespace, %{}, " "}, + {:punctuation, %{}, "=>"}, + {:whitespace, %{}, " "}, + {:name, %{}, "V"}, + {:whitespace, %{}, " "}, + {:operator, %{}, "*"}, + {:whitespace, %{}, " "}, + {:number_integer, %{}, "2"}, + {:whitespace, %{}, " "}, + {:punctuation, %{}, "||"}, + {:whitespace, %{}, " "}, + {:name, %{}, "K"}, + {:whitespace, %{}, " "}, + {:punctuation, %{}, ":="}, + {:whitespace, %{}, " "}, + {:name, %{}, "V"}, + {:whitespace, %{}, " "}, + {:operator, %{}, "<-"}, + {:whitespace, %{}, " "}, + {:name, %{}, "M"}, + {:punctuation, %{group_id: "group-1"}, "}"} + ] + end + + test "bitstring comprehension with `<=` generator" do + # `<<>>` brackets, the bitstring-generator operator `<=`, and + # nested `<<X:8>>` segment patterns inside. + assert lex("<< <<X:8>> || <<X:8>> <= Bin >>") == [ + {:punctuation, %{group_id: "group-1"}, "<<"}, + {:whitespace, %{}, " "}, + {:punctuation, %{group_id: "group-2"}, "<<"}, + {:name, %{}, "X"}, + {:punctuation, %{}, ":"}, + {:number_integer, %{}, "8"}, + {:punctuation, %{group_id: "group-2"}, ">>"}, + {:whitespace, %{}, " "}, + {:punctuation, %{}, "||"}, + {:whitespace, %{}, " "}, + {:punctuation, %{group_id: "group-3"}, "<<"}, + {:name, %{}, "X"}, + {:punctuation, %{}, ":"}, + {:number_integer, %{}, "8"}, + {:punctuation, %{group_id: "group-3"}, ">>"}, + {:whitespace, %{}, " "}, + {:operator, %{}, "<="}, + {:whitespace, %{}, " "}, + {:name, %{}, "Bin"}, + {:whitespace, %{}, " "}, + {:punctuation, %{group_id: "group-1"}, ">>"} + ] + end + end + + describe "fun keyword vs function call" do + test "fun(X) -> ... 
end tokenizes `fun` as keyword, not function name" do assert [ From 3ec73ecb152cd53bd4a32388e65ea3c0d48ea7c9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Lukas=20Backstr=C3=B6m?= Date: Wed, 6 May 2026 15:32:16 +0200 Subject: [PATCH 15/15] Add integration test exercising a small real Erlang module Most lexer tests are minimal isolated inputs that pin one rule's output. The richer interaction-shaped failures (a rule's order in the choice perturbing how a sibling rule fires) need a test that threads many features through one input. Add a small module fragment that combines: * `-module` / `-export` attributes * a `-doc """..."""` doc attribute with multi-line body * a function head with a `when` guard and BIF call * a map comprehension (`#{K => V || K := V <- M, ...}`) * a body with comparison operator and number If a future change breaks any of those rules' interactions, this test catches it whereas the per-feature tests would still pass. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../erlang_lexer_tokenizer_test.exs | 97 +++++++++++++++++++ 1 file changed, 97 insertions(+) diff --git a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs index a8b0d97..c4baa6b 100644 --- a/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs +++ b/test/makeup/erlang_lexer/erlang_lexer_tokenizer_test.exs @@ -911,6 +911,103 @@ defmodule ErlangLexerTokenizer do end end + describe "real-world module fragment (integration)" do + # Exercises module attribute, doc string, function head with guard, + # body with a map comprehension and BIF calls. If any rule's choice + # order gets perturbed, this is the test most likely to catch it. + test "small module with -doc, guard, map, and BIF call" do + src = """ + + -module(positives). + -export([keep/1]). + + -doc \"\"\" + Keep map entries whose values are positive integers. + \"\"\". + keep(M) when is_map(M) -> + \#{K => V || K := V <- M, is_integer(V), V > 0}. 
+ """ + + assert lex(src) == [ + {:whitespace, %{}, "\n"}, + {:punctuation, %{}, "-"}, + {:name_attribute, %{}, "module"}, + {:punctuation, %{group_id: "group-1"}, "("}, + {:string_symbol, %{}, "positives"}, + {:punctuation, %{group_id: "group-1"}, ")"}, + {:punctuation, %{}, "."}, + {:whitespace, %{}, "\n"}, + {:punctuation, %{}, "-"}, + {:name_attribute, %{}, "export"}, + {:punctuation, %{group_id: "group-2"}, "("}, + {:punctuation, %{group_id: "group-3"}, "["}, + {:string_symbol, %{}, "keep"}, + {:punctuation, %{}, "/"}, + {:number_integer, %{}, "1"}, + {:punctuation, %{group_id: "group-3"}, "]"}, + {:punctuation, %{group_id: "group-2"}, ")"}, + {:punctuation, %{}, "."}, + {:whitespace, %{}, "\n"}, + {:whitespace, %{}, "\n"}, + {:punctuation, %{}, "-"}, + {:name_attribute, %{}, "doc"}, + {:whitespace, %{}, " "}, + {:string, %{}, + "\"\"\"\nKeep map entries whose values are positive integers.\n\"\"\""}, + {:punctuation, %{}, "."}, + {:whitespace, %{}, "\n"}, + {:name_function, %{}, "keep"}, + {:punctuation, %{group_id: "group-4"}, "("}, + {:name, %{}, "M"}, + {:punctuation, %{group_id: "group-4"}, ")"}, + {:whitespace, %{}, " "}, + {:keyword, %{}, "when"}, + {:whitespace, %{}, " "}, + {:name_builtin, %{}, "is_map"}, + {:punctuation, %{group_id: "group-5"}, "("}, + {:name, %{}, "M"}, + {:punctuation, %{group_id: "group-5"}, ")"}, + {:whitespace, %{}, " "}, + {:punctuation, %{}, "->"}, + {:whitespace, %{}, "\n "}, + {:punctuation, %{group_id: "group-6"}, "\#{"}, + {:name, %{}, "K"}, + {:whitespace, %{}, " "}, + {:punctuation, %{}, "=>"}, + {:whitespace, %{}, " "}, + {:name, %{}, "V"}, + {:whitespace, %{}, " "}, + {:punctuation, %{}, "||"}, + {:whitespace, %{}, " "}, + {:name, %{}, "K"}, + {:whitespace, %{}, " "}, + {:punctuation, %{}, ":="}, + {:whitespace, %{}, " "}, + {:name, %{}, "V"}, + {:whitespace, %{}, " "}, + {:operator, %{}, "<-"}, + {:whitespace, %{}, " "}, + {:name, %{}, "M"}, + {:punctuation, %{}, ","}, + {:whitespace, %{}, " "}, + {:name_builtin, %{}, 
"is_integer"}, + {:punctuation, %{group_id: "group-7"}, "("}, + {:name, %{}, "V"}, + {:punctuation, %{group_id: "group-7"}, ")"}, + {:punctuation, %{}, ","}, + {:whitespace, %{}, " "}, + {:name, %{}, "V"}, + {:whitespace, %{}, " "}, + {:operator, %{}, ">"}, + {:whitespace, %{}, " "}, + {:number_integer, %{}, "0"}, + {:punctuation, %{group_id: "group-6"}, "}"}, + {:punctuation, %{}, "."}, + {:whitespace, %{}, "\n"} + ] + end + end + describe "fun keyword vs function call" do test "fun(X) -> ... end tokenizes `fun` as keyword, not function name" do assert [