Skip to content

390 ignore linemarkers#498

Open
hiker wants to merge 5 commits intomasterfrom
390_ignore_linemarkers
Open

390 ignore linemarkers#498
hiker wants to merge 5 commits intomasterfrom
390_ignore_linemarkers

Conversation

@hiker
Copy link
Copy Markdown
Collaborator

@hiker hiker commented Apr 13, 2026

Fixes #390 by supporting linemarkers (# 123 "test.f90" - indicating line number and file).

I have used the existing preprocessor directive support, since linemarkers looks pretty much like a preprocessor directive (starting with #). But I was unable to get it to verify properly out of the box - I feel like the existing implementation assumes that the value is constant (e.g. #ifdef), while in the case of a linemarker we don't have this (since it would need the line number and filename to verify correctness, and I couldn't work out how to do this). While it worked this way, it was unable to detect invalid linemarkers (# abc "xx.f90" or # 123 'no_single_quote_allowed.f90'). So, instead I added a simple regex check at the beginning, and only if that passes, it will use the established path.

I am happy to reimplement (I know there is pattern_tools.digit_string and file_name ... I just couldn't figure out how to plug this all together.

@hiker
Copy link
Copy Markdown
Collaborator Author

hiker commented Apr 13, 2026

I could also not find an official syntax for a linemaker, but looked at https://git.cels.anl.gov/ksalapenades/llvm-project/-/blob/develop/clang/test/Preprocessor/line-directive.c (and did some compiler tests to verify what gets rejected)

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 92.85714% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 92.21%. Comparing base (a197a85) to head (8eb3954).

Files with missing lines Patch % Lines
src/fparser/two/C99Preprocessor.py 90.90% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #498   +/-   ##
=======================================
  Coverage   92.21%   92.21%           
=======================================
  Files          87       87           
  Lines       13832    13845   +13     
=======================================
+ Hits        12755    12767   +12     
- Misses       1077     1078    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@hiker
Copy link
Copy Markdown
Collaborator Author

hiker commented Apr 13, 2026

Checking the git log, it appears this code was contributed by @reuterbal - I wonder if you would have time to provide some feedback? Thanks!

Copy link
Copy Markdown

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to provide my 2ct here. I agree that linemarkers should be handled like preprocessor directives and the implementation itself looks good to me.

As for the need for the additional regex validation: I believe you only need to encode the full linemarker specification as a regex pattern to use in WORDClsBase.match - either manually or using the building blocks in pattern_tools - e.g., like here:

part_ref = name + ~((r"[(]" + name + r"[)]"))

Without testing it, I would expect this would maybe need to look like something like this:

_pattern = r"^\s*#\s+" +  digit_string + "\s+" + file_name

Comment on lines +457 to +466
"line_ref",
[
('# 123 "file"', '# 123 "file"'),
(' # 123 "file" ', '# 123 "file"'),
('# 123 "file" 1 3', '# 123 "file" 1 3'),
],
)
def test_linemarker(line_ref):
"""Test that #line is recognized"""
line, ref = line_ref
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: you can directly specify the parametrization as multiple variables:

Suggested change
"line_ref",
[
('# 123 "file"', '# 123 "file"'),
(' # 123 "file" ', '# 123 "file"'),
('# 123 "file" 1 3', '# 123 "file" 1 3'),
],
)
def test_linemarker(line_ref):
"""Test that #line is recognized"""
line, ref = line_ref
"line, ref",
[
('# 123 "file"', '# 123 "file"'),
(' # 123 "file" ', '# 123 "file"'),
('# 123 "file" 1 3', '# 123 "file" 1 3'),
],
)
def test_linemarker(line, ref):
"""Test that #line is recognized"""

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, I totally forgot about that. Thanks!

Comment thread src/fparser/two/C99Preprocessor.py Outdated

# The match method will check that it is a valid linemarker, i.e.
# it has a line number, and file name in double quotes.
_pattern = pattern.Pattern("<linemarker>", r"^\s*#", value="#")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that most Fortran compilers don't accept white space in front of the # for preprocessor directives. Are they more forgiving for line markers?
But since fparser allows for white space in front of directives, it's probably wise to allow them for marker as well.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually remember that I was wondering about the \s at the beginning (since I did some testing originally, and knew it's not accepted). But since all regex here started with #\s* .. I just followed the trend :) And I'll agree to leave it in for consistency (and because I am afraid of breaking something).

Comment thread src/fparser/two/C99Preprocessor.py Outdated
"""
Linemarker

linemarker-stmt is # digit-sequence [ "s-char-sequence" ] [digit ...]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This suggests the filename is optional - if that is the case, then the test parameterisation should be expanded by such a case.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-checked it - it's hard to find the official syntax - there is https://gcc.gnu.org/onlinedocs/gcc-3.0.2/cpp_9.html, but it's not really the proper grammar, e.g. it does not specify the quotes, and as such it's not clear if it must be double quotes, or can be single quotes. FWIW, I tested it, and it appears it must be double quote.

Supporting an optional filename is a gnu extension. And as such, we shouldn't support it. I've updated the comment accordingly.

@hiker
Copy link
Copy Markdown
Collaborator Author

hiker commented Apr 16, 2026

Hi Balthasar,

thanks a lot for the quick review, really appreciated.

Happy to provide my 2ct here. I agree that linemarkers should be handled like preprocessor directives and the implementation itself looks good to me.

I am glad that I made the right decision here :)

As for the need for the additional regex validation: I believe you only need to encode the full linemarker specification as a regex pattern to use in WORDClsBase.match - either manually or using the building blocks in pattern_tools - e.g., like here:

part_ref = name + ~((r"[(]" + name + r"[)]"))

Without testing it, I would expect this would maybe need to look like something like this:

_pattern = r"^\s*#\s+" +  digit_string + "\s+" + file_name

I tried that, and thanks a lot for the example. Looking at the +digit_string..., made me realise that Pattern overwrites __add_-, but that doesn't work here since the first expression is a string. But I could use digit_string.pattern etc. But then file_name.pattern started with ^, so I couldn't use this. And since digit_string.pattern was pretty long with no apparent advantage over \d+, I actually just removed it and used a simple regex instead.

But then I am stuck with my original problem: I am using:

    _pattern = pattern.Pattern("<linemarker>", r"^\s*#\s+\d+\s+\".*\"")

But if I do this, the matching in WORDClsBase uses _pattern.value ... which is None. _pattern.value is a string, so I can set it to # (which I did in the state that you reviewed it), but then I don't get the rest of the match: The match function basically does:

           my_match = keyword.match(string)
            if my_match is None:
                return None
            line = string[len(my_match.group()) :]
            pattern_value = keyword.value
...

Basically, the rest of the match is discarded (line number and filename) :(

I tried to fix that using:

           my_match = keyword.match(string)
            if my_match is None:
                return None
            line = string[len(my_match.group()) :]
            if keyword.value:
                pattern_value = keyword.value
            else:
                pattern_value = my_match.group()

So, I provide value=None for the linemarker class, and that then triggers returning the fully matched string, including line numbers and filename.

This passed nearly all tests, except it would not remove spaces between "#" and line number and line number and file name. Therefore, I adjusted one linemarker test (but the test still checks that leading spaces before the # are removed, which is consistent with other code in C99_preprocessor). Small changes to __str__ were required.

I am not sure if this is the right solution tbh :(

Maybe it should not be based on WORDClsBase, but instead use the full grammar? But that also feels like an total overkill.

@reuterbal , @arporter , @sergisiso - feedback appreciated! The modification of the match function seems to have no other side effect, all tests were passing. But I am not keen on changing something that might be quite fundamental ;)

Copy link
Copy Markdown

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, the details of the keyword/value returns were lost on me and your proposal seems sensible to me.
Overall I think that's indeed much better now, no further comments!

Copy link
Copy Markdown
Member

@arporter arporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me Joerg, thanks. I could do with a bit of help understanding some of the changes - probably because I know pretty much nothing about the Pattern class.
I think it would also be good to test the new functionality in context - i.e. provide some Fortran containing some linemarkers and check that things work OK.


"""C99 Preprocessor Syntax Rules."""
"""C99 Preprocessor Syntax Rules. It also supports linemarker statements
(which are technically not preprocessor directives, but are very close
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


class Cpp_Linemarker_Stmt(WORDClsBase): # Linemarker
"""
Linemarker
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit more detail would be good here. i.e. explain what a Linemarker is.


:param str string: the string to match with as a line statement.

:return: a tuple of size 1 with the right hand side as a string, \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No line-continuation and please update to use type hints.

@staticmethod
def match(string):
"""Implements the matching for a linemarker.
The right hand side of the directive is not matched any further
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the "right hand side of the directive" mean?

require_cls=False,
)

def tostr(self):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typehints pls.

Returns the line marker as string. Note that fparser accepts
spaces before the `#`, but it should remove the spaces, hence
we lstrip the result
:return: this linemarker as a string.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blank line before :return: please.

Comment thread src/fparser/two/utils.py
return None
line = string[len(my_match.group()) :]
pattern_value = keyword.value
# If no constant return value is defined,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need a bit of help here: what does "no constant return value" mean?

@arporter arporter added ready for review reviewed with actions PR has been reviewed and is back with developer and removed under review ready for review labels Apr 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement reviewed with actions PR has been reviewed and is back with developer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Handle line markers

3 participants