Retaining significant whitespace within `MacroExpression`s within `MacroVerbatim` #15307

Blacksmoke16 · 2024-12-21T19:54:58Z

Discussion

Background

Currently if you have code like:

{%
  10

  # Foo

  20
  30

  # Bar

  40
%}
{%
  50
  60
%}

The parser is able to properly attribute location information to each number, taking into account the newlines and comments:

{value: "10", line_number: 2}
{value: "20", line_number: 6}
{value: "30", line_number: 7}
{value: "40", line_number: 11}
{value: "50", line_number: 14}
{value: "60", line_number: 15}

However, when used as part of a MacroVerbatim node like:

macro finished
  {% verbatim do %}
    {%
      10

      # Foo

      20
      30

      # Bar

      40
    %}
    {%
      50
      60
    %}
  {% end %}
end

NOTE: The macro finished isn't really needed to reproduce this problem, but is included for demonstration purposes to make generating the output easier.

The results are now not totally accurate:

{value: "10", line_number: 2}
{value: "20", line_number: 3}
{value: "30", line_number: 4}
{value: "40", line_number: 5}
{value: "50", line_number: 7}
{value: "60", line_number: 8}

This is because the line numbers for each node are now based on the to_s representation of the MacroExpressions within the MacroVerbatim versus the actual source code. E.g.

# There's a newline here
    {% 10
20
30
40
 %}
    {% 50
60
 %}

The Problem

#15305 makes things a little bit more accurate by retaining the newline after the {% is present. E.g.

{value: "10", line_number: 3}
{value: "20", line_number: 4}
{value: "30", line_number: 5}
{value: "40", line_number: 6}
{value: "50", line_number: 9}
{value: "60", line_number: 10}

# There's a newline here
    {%
  10
  20
  30
  40
%}
    {%
  50
  60
%}

However things are still not quite 100% correct. We can make things a bit better by summing the node's location with the location of the MacroVerbatim node itself (removing the 0 || on like 592 in interpreter.cr in the appendix diff):

{value: "10", line_number: 4}
{value: "20", line_number: 5}
{value: "30", line_number: 6}
{value: "40", line_number: 7}
{value: "50", line_number: 10}
{value: "60", line_number: 11}

The number 10 now at least has the proper location, but all the others are still not correct. The gist of why the other numbers are off is because notice in the stringified output, all the whitespace and comments in the macro expression have been stripped. Thus they are not taken into consideration when the code is re-parsed as part of the macro expansion logic.

Not having the nodes have the proper location information in this context makes #14880 much harder as it relies on having proper line numbers to map to the source code when generating the coverage report.

Proposed Solution

So far the most robust solution I can think of is to have the parser include MacroLiteral nodes when parsing macro expressions that represent comments and extra newlines. This would ensure that the stringified macro expression includes these newlines so the re-parsed code is able to map to the line numbers of the source.

WDYT?

EDIT: Is this maybe something we could leverage location pragmas for?

Appendix

Diff used to generate the outputs

diff --git a/src/compiler/crystal/macros/interpreter.cr b/src/compiler/crystal/macros/interpreter.cr
index 978c57470..d7541a8af 100644
--- a/src/compiler/crystal/macros/interpreter.cr
+++ b/src/compiler/crystal/macros/interpreter.cr
@@ -82,6 +82,10 @@ module Crystal
       @last = Nop.new
     end
 
+    def is_test_file?
+      @location.try &.original_filename.as?(String).try &.ends_with? "test.cr"
+    end
+
     def define_var(name : String, value : ASTNode) : Nil
       @vars[name] = value
     end
@@ -584,6 +588,10 @@ module Crystal
     end
 
     def visit(node : Nop | NilLiteral | BoolLiteral | NumberLiteral | CharLiteral | StringLiteral | SymbolLiteral | RangeLiteral | RegexLiteral | MacroId | TypeNode | Def)
+      if self.is_test_file?
+        pp({value: node.to_s, line_number: (node.location.try(&.line_number) || 0) + (0 || node.location.try(&.macro_location.try(&.line_number)) || 0)})
+      end
+
       @last = node.clone_without_location
       false
     end
diff --git a/src/compiler/crystal/macros/macros.cr b/src/compiler/crystal/macros/macros.cr
index a5d5714f1..1b68e3b53 100644
--- a/src/compiler/crystal/macros/macros.cr
+++ b/src/compiler/crystal/macros/macros.cr
@@ -27,7 +27,18 @@ class Crystal::Program
     interpreter = MacroInterpreter.new self, scope, path_lookup || scope, node.location, def: a_def, in_macro: false
     interpreter.free_vars = free_vars
     node.accept interpreter
-    {interpreter.to_s, interpreter.macro_expansion_pragmas}
+    output_str = interpreter.to_s
+
+    if interpreter.is_test_file?
+      puts
+      puts
+
+      puts output_str
+
+      puts
+      puts
+    end
+    {output_str, interpreter.macro_expansion_pragmas}
   end
 
   def parse_macro_source(generated_source, macro_expansion_pragmas, the_macro, node, vars, current_def = nil, inside_type = false, inside_exp = false, mode : Parser::ParseMode = :normal, visibility : Visibility = :public)

The text was updated successfully, but these errors were encountered:

Blacksmoke16 added status:discussion topic:compiler:parser topic:lang:macros labels Dec 21, 2024

Blacksmoke16 linked a pull request Dec 23, 2024 that will close this issue

Handle properly stringifying multiline macro expressions #15305

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Retaining significant whitespace within `MacroExpression`s within `MacroVerbatim` #15307

Retaining significant whitespace within `MacroExpression`s within `MacroVerbatim` #15307

Blacksmoke16 commented Dec 21, 2024 •

edited

Loading

Retaining significant whitespace within MacroExpressions within MacroVerbatim #15307

Retaining significant whitespace within MacroExpressions within MacroVerbatim #15307

Comments

Blacksmoke16 commented Dec 21, 2024 • edited Loading

Discussion

Background

The Problem

Proposed Solution

Appendix

Retaining significant whitespace within `MacroExpression`s within `MacroVerbatim` #15307

Retaining significant whitespace within `MacroExpression`s within `MacroVerbatim` #15307

Blacksmoke16 commented Dec 21, 2024 •

edited

Loading