Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Fix global strategy highlighting #132

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Conversation

HiPhish
Copy link
Owner

@HiPhish HiPhish commented Aug 14, 2024

I am opening this PR for the sake of transparency. This PR is my attempt at fixing the broken highlighting caused by Neovim 0.10, it involves a rewrite of the core highlighting algorithm to use iter_matches rather than iter_captures now that the bug that make iter_matches unworkable has been fixed in Neovim.

As of the time of writing this all tests pass, but that does not mean that I don't have any blind spots in my tests. Now is the time to actually try it out. Rewriting the highlighting has revealed a number of bugs, so of which are not even visible, in the old code, so there have been quite a few commits to master as well. Some patterns had to be dropped unfortunately because they never worked correctly to begin with (usually because nested parentheses would be on the same level in the tree).

I have also found the previous highlighting code to be lacking in structure and hard to understand. I was making new data structure types up as I went along and I was abusing stacks as ad-hoc sets. Thus I used this opportunity to introduce a proper set type and properly define what a match and what a match tree are. These definitions will probably change before I merge this branch into master though.

@Danielkonge Thank you for the time you have invested and the PRs you have opened, but I think this one obsoletes them.

@Danielkonge
Copy link
Contributor

Danielkonge commented Aug 17, 2024

The new global strategy looks good to me, I played around with something similar and it generally worked fine, but I wanted to play around with making a blocks query for python and couldn't do that without something more complicated.

I have a few comments for problems I have noted:

  • on_changedtree sometimes give us bad (as in not precise enough) changes on undo, when using snippets and when pasting something copied, and that can mess with highlighting. (I don't have a good solution for this.)
  • The check that parent_lang ~= lang should really only be relevant for rust from what I can tell, and the main problem stems from the way neovim uses injections of a language inside the language itself. In general, I think we would have much nicer code here, if neovim didn't do this, and I am not really sure why they have made that choice.
  • I played around with updating highlights on screenupdates and during that I noticed a small optimization we might want to add. Instead of doing string comparisons, we can compare the ids of container, delimiter and sentinel instead, which should be faster. I haven't tested if this actually makes any measurable difference, but the code change is quite easy, if you want to do this. Something like this before the main part of the code:
-- we get id's in the order they show up in the query file, which
-- will usually be
local delimiter_id, sentinel_id, container_id = 1, 2, 3
-- this will fix any mistaken order
for id, name in ipairs(query.captures) do
    -- captures should only contain at most these three
    if name == 'delimiter' then
        delimiter_id = id
    elseif name == 'sentinel' then
        sentinel_id = id
    elseif name == 'container' then
        container_id = id
    end
end

And then compare e.g. id == delimiter_id instead of capture == 'delimiter' (after local capture = query.captures[id]).

@HiPhish HiPhish force-pushed the fix-highlighting branch 2 times, most recently from 5def45d to 2b96c6e Compare August 30, 2024 23:19
@HiPhish
Copy link
Owner Author

HiPhish commented Aug 31, 2024

I think the problem with Python is that the body of a statement is not sandwiched between two parent delimiters. Let's take the following piece of Python:

if True:
    print([{'a': {'b', 'c'}}])

The full tree is

(if_statement ; [0, 0] - [1, 30]
  "if" ; [0, 0] - [0, 2]
  condition: (true) ; [0, 3] - [0, 7]
  ":" ; [0, 7] - [0, 8]
  consequence: (block ; [1, 4] - [1, 30]
    (expression_statement ; [1, 4] - [1, 30]
      (call ; [1, 4] - [1, 30]
        function: (identifier) ; [1, 4] - [1, 9]
        arguments: (argument_list ; [1, 9] - [1, 30]
          "(" ; [1, 9] - [1, 10]
          (list ; [1, 10] - [1, 29]
            "[" ; [1, 10] - [1, 11]
            (dictionary ; [1, 11] - [1, 28]
              "{" ; [1, 11] - [1, 12]
              (pair ; [1, 12] - [1, 27]
                key: (string ; [1, 12] - [1, 15]
                  (string_start) ; [1, 12] - [1, 13]
                  (string_content) ; [1, 13] - [1, 14]
                  (string_end)) ; [1, 14] - [1, 15]
                ":" ; [1, 15] - [1, 16]
                value: (set ; [1, 17] - [1, 27]
                  "{" ; [1, 17] - [1, 18]
                  (string ; [1, 18] - [1, 21]
                    (string_start) ; [1, 18] - [1, 19]
                    (string_content) ; [1, 19] - [1, 20]
                    (string_end)) ; [1, 20] - [1, 21]
                  "," ; [1, 21] - [1, 22]
                  (string ; [1, 23] - [1, 26]
                    (string_start) ; [1, 23] - [1, 24]
                    (string_content) ; [1, 24] - [1, 25]
                    (string_end)) ; [1, 25] - [1, 26]
                  "}")) ; [1, 26] - [1, 27]
              "}") ; [1, 27] - [1, 28]
            "]") ; [1, 28] - [1, 29]
          ")")))))

I would like to highlight the if as level 1, and the parentheses from print(...) as level two. The query for the if statement is

(if_statement
  "if" @delimiter) @container

There is no matching delimiter to sandwich consequence between, so the parentheses get highlighted as level one, even though the nesting of containers is correct. We do not have this problem in Lua where there is always a closing end delimiter.

The problem is definitely lack of a closing delimiter. If I make both the for and in of a for-loop into delimiters and there is a parenthesized expression between them it will get highlighted properly.

This is definitely something I would like to get working. I want rainbow blocks in Python as well, but until now that was impossible because of the lack of a sentinel node. But now that we no longer need sentinels it should be doable.

@HiPhish
Copy link
Owner Author

HiPhish commented Sep 1, 2024

I now know why sandwiching is required: the algorithm to determine nesting assumes that matches are returned in a certain order which is outlined in the HACKING file (when exiting a node according to a depth-first visitor pattern). This assumption only holds true if child matches are sandwiched between delimiters. Otherwise the match will be returned upon entering the node.

This means I need to modify the nesting algorithm to account for the reverse case. I have updated the HACKING file with the current state of knowledge so far.

@HiPhish
Copy link
Owner Author

HiPhish commented Sep 1, 2024

OK, I give up on Python for the time being. Trying to get rainbow-blocks working for Python has given me some deeper insight though, so I do think that it is doable in the future, but the priority is getting this fix out of the door. A my findings are documented in the HACKING file.

The problem is that right now I am assuming that for two matches m1 and m2 if m1 is not an ancestor of m2 they must be cousins. This works fine as long as the body is sandwiched between two delimiters. For Python I need a third possibility: if m1 is not an ancestor of m2, m2 might be an ancestor of m2. This can happen with pairwise delimiters as well. Consider this Awk code:

if (some_var) {
    # ...
}

Right now we highlight the parentheses and braces the same, which makes sense because visually they are all siblings. However, in the syntax tree the block delimited by the braces is a child of the whole if-statement, and so it would get highlighted differently from the parentheses. I could define a @body capture in the pattern and then say that a match is only considered a child if and only if it contained inside the @body

(if_statement
  "if"
  "(" @delimiter
  condition: _ @body
  ")" @delimiter @sentinel) @container

Now only what is inside the parentheses is considered a child, but the block with braces is a sibling (or more generally a cousin). The same for Python:

(for_statement
  "for"
  left: _ @body
  "in"
  body: _ @body) @container

Here we have two bodies: whatever is between for and in, and the body of the loop. Let's look at an example:

for (k, v) in {'a': 1, 'b': 2}:
    print(k, v)

Here for, in and the dictionary are considered siblings, while (k, v) and print(k, v) are children because both are inside a @body of the pattern.

@HiPhish
Copy link
Owner Author

HiPhish commented Sep 1, 2024

@Danielkonge

on_changedtree sometimes give us bad (as in not precise enough) changes on undo, when using snippets and when pasting something copied, and that can mess with highlighting. (I don't have a good solution for this.)

Can you please provide some examples so I can replicate it?

@HiPhish
Copy link
Owner Author

HiPhish commented Sep 3, 2024

I have found one broken edge case: when pasting multiple lines with delimiters only the last line will have the proper highlighting, the other lines will be off by one per how many lines they are above the last line. Take the following example:

print {
    {a = 1, b = 2},
}

Now type ggjyy5p to past the second line five times and you will see what I mean. Oddly enough this works fine when pasting just one line or when pasting inside an embedded language (such as this code block in this Markdown buffer).

I have added a test case. I anyone finds more edge cases please post them here.

@HiPhish HiPhish force-pushed the fix-highlighting branch 2 times, most recently from baac83c to efda387 Compare September 5, 2024 19:00
@HiPhish
Copy link
Owner Author

HiPhish commented Sep 5, 2024

Pasting multiple lines should be fixed now.

Previously I was abusing stacks as sets, but this was hard to read
because it was unclear whether the stacks were actually used for their
stack properties or just as a makeshift set.  In addition to that stacks
do not have the idempotence of sets, which as not an issue in practice,
but it could have caused bugs.
There is nothing I can do until this issue is resolved:

  elm-tooling/tree-sitter-elm#159

The problem is that because the parenthesized pattern does not have a
container of its own the nesting cannot be determined properly.
There are now two global strategies and the correct one gets picked at
runtime based on the version of Neovim.
@Danielkonge
Copy link
Contributor

Sorry for the late reply, I have been on vacation and then after that I was quite busy for some time. I am not sure exactly what you have looked at since your last messages. Is there any of the problems I mentioned I should still try to find a way to reproduce?

Also just a note: To get around the issue with not having sentinels (e.g. in Python), I used the end (line,column) of the node for queries with no sentinel provided, and that worked fine. I could generally get good highlighting in Python (and other languages) with that, I just used a really hacky solution to get around the problems you mentioned in Python, so that should definitely be considered a bit more thoroughly at some point, if we would want something like that to work.

@HiPhish
Copy link
Owner Author

HiPhish commented Oct 13, 2024

I am not sure exactly what you have looked at since your last messages. Is there any of the problems I mentioned I should still try to find a way to reproduce?

The best thing to do would be just to use the current state as it is and report if you find some issue. I have already fixed one issue (outlined above), but I have noticed a different one. Take this Lua code:

print {
    {
    }
    {
    }
}

The execute %s/\v\{\n\s+/{ to remove line breaks after the opening brace. The highlighting will be messed up. To be honest, I consider this to be a very marginal edge case and I would leave it as is for the time being until the PR is merged. I can look into it afterwards. There is already a failing test for it.

Also just a note: To get around the issue with not having sentinels (e.g. in Python), I used the end (line,column) of the node for queries with no sentinel provided, and that worked fine. I could generally get good highlighting in Python (and other languages) with that, I just used a really hacky solution to get around the problems you mentioned in Python, so that should definitely be considered a bit more thoroughly at some point, if we would want something like that to work.

I don't want to spend time on it until after this PR. I don't think we can make it work without an explicit @body capture which tells us where a child tree can be, but I don't mind being proven wrong.. Maybe it will be possible to omit the @body when we don't need it, but it is not a big deal to add it to all queries.

The first priority is finishing this PR. The global strategy is in a good enough shape, the next step is porting over the local strategy and that should be it then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants