Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix promises at EOF #4097

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Fix promises at EOF #4097

wants to merge 4 commits into from

Conversation

b4n
Copy link
Member

@b4n b4n commented Oct 8, 2024

getInputLineOffset() was giving very wrong results at EOF, leading to overlong promises, leading to crashes in building the narrowed input stream (see e.g. newly introduced test case, and geany/geany#3939).

I'm unsure if this is a good way to fix the issue, but I couldn't find a better one; but admittedly I didn't poke around this code for a long time, and it got a fair bit more complicated with its newer features, so I might have missed something.

The previous computation could result in invalid values, which could
lead to invalid promise ranges and subsequent crashes.
Don't include (part of) the PHP open tag in the HTML input.
Copy link

codecov bot commented Oct 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.85%. Comparing base (ecf0c4a) to head (039e589).
Report is 20 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4097      +/-   ##
==========================================
+ Coverage   85.72%   85.85%   +0.13%     
==========================================
  Files         239      239              
  Lines       56961    58633    +1672     
==========================================
+ Hits        48827    50338    +1511     
- Misses       8134     8295     +161     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@masatake
Copy link
Member

masatake commented Oct 9, 2024

Thank you.

it got a fair bit more complicated with its newer features

I agree with you. Some issues I have opened these days are related to this area.
Eventually, I must revise the whole area after merging this pull request.

I will review this.

body input.php /^<div id="body">$/;" I line:26 language:HTML roles:def
content input.php /^ <main id="content">$/;" I line:27 language:HTML roles:def
post input.php /^ <article class="post" id="post-<?p/;" c line:31 language:HTML roles:attribute
post-<?p input.php /^ <article class="post" id="post-<?p/;" I line:31 language:HTML roles:def
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a follow-up to fix the HTML promise range from the PHP lexer to fix this spurious <?p, but it currently depends on these changes so I'll post it after this is sorted out (no pressure, just saying that indeed, this tag is not exactly as it should be).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you paste this comment into the commit log for this commit?
So people can understand this commit from its log only with 'git log'.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want, but I'm not sure its so important to specify in the commit, it's just a note for review that it's not to be worried about, but I don't think it undermines the patch much.
Another option could be removing this line from the test case and add a new one for the other patch, I merely kept it like this because it's a "real" file and made me notice the issue.

Anyway, I can do either just fine, as you prefer :)

@masatake
Copy link
Member

I'm very sorry to be late to respond.
I cannot find enough time to work on this item.

Though I introduced the original bug, let me report a case where this pull request doesn't work this time.

input.html

<%=

makes enter an infinite loop:

~/bin/ctags --options=NONE --verbose -o - --extras='*' input.html

JSP tags shouldn't trigger the PHP parser.  This used to happen and
trigger an infinite loop of a ping-pong effect with the HTML parser
asking the PHP parser to parse the JSP block, and the PHP parser, not
recognizing the input, asking the HTML one to deal with it.  Rinse and
repeat.
@b4n
Copy link
Member Author

b4n commented Nov 11, 2024

I'm very sorry to be late to respond.
I cannot find enough time to work on this item.

Don't worry, I understand only too well :)

Though I introduced the original bug, let me report a case where this pull request doesn't work this time. […]

I don't think it's a bug in this PR, but rather an existing issue in the HTML parser, and in the fact that the HTML and PHP parser both make promises for each other. I think if it's new with these changes, they might have merely uncovered it by fixing ranges.
Anyway, I've added a commit that fixes this issue, but the issue is a bit deeper: with both parser triggering each other, there is a risk that such an infinite loop arises if both parsers don't agree on what's theirs to deal with. Here, the issue was that the HTML parser fed the PHP one JSP/ASP tags it doesn't accept, and so the PHP parser gave them back to the HTML parser, so on and so forth.
Maybe a more robust solution would be for a parser never to trigger the triggering parser at offset 0, because that's bound to create this problem. E.g. there might need to be a safeguard at some point that say "if I got called by parser X, I can't ask it to parse this from offset 0". This might need to be even generalized as "if I got called by a parser, I must handle at least one character before triggering another parser" to avoid more complex loops where e.g. parser A triggers parser B that triggers parser C that triggers A again.

However, as this is likely a pretty rare issue limited to possibly the PHP/HTML couple, maybe a more localized fix would be fine -- if there's a way to know the input is from a promise, it might be easy. Or at the promise layer, refuse 2 promises starting at the same offset?

@b4n
Copy link
Member Author

b4n commented Nov 12, 2024

Maybe even something like this?

diff --git a/main/promise.c b/main/promise.c
index 01e84f5ca..428742c73 100644
--- a/main/promise.c
+++ b/main/promise.c
@@ -96,6 +96,17 @@ int  makePromise   (const char *parser,
 			return -1;
 	}
 
+	for (int i = 0; i < promise_count; i++)
+	{
+		p = promises + i;
+		if (p->startLine == startLine && p->startCharOffset == startCharOffset)
+		{
+			error (WARNING, "Ignoring promise at line %lu offset %lu because there is already an existing one at this location\n",
+			       startLine, startCharOffset);
+			return -1;
+		}
+	}
+
 	if ( promise_count == promise_allocated)
 	{
 		size_t c = promise_allocated? (promise_allocated * 2): 8;

beware that I know very little about this and didn't dig deep, it might drop legitimate promises e.g. if the offset is relative to the reduced input rather than the input file itself. But this illustrates a potential solution anyway.

@masatake
Copy link
Member

masatake commented Nov 12, 2024

(I have not read the comments yet. I'm currently running ctags on the entire source code in Fedora now. So I have many chances to find bugs.)

input.php

<?=

makes enter an infinite loop:

~/bin/ctags --options=NONE --verbose -o - --extras=g input.php

(Edited: This bug is reproduced with ctags 6.0.0. So this has nothing to do with this pull request, as @b4n wrote.)

@masatake
Copy link
Member

Instead of fcab90a, the following change also passes the test cases (parser-php.r/wp-guest.d)

commit d3ea3a8c17861e2e7695ffd4d057533362993318
Author: Masatake YAMATO <[email protected]>
Date:   Tue Nov 12 18:42:44 2024 +0900

    TEMP

diff --git a/main/read.c b/main/read.c
index 67586051d..f975ba7d7 100644
--- a/main/read.c
+++ b/main/read.c
@@ -160,12 +160,12 @@ extern int getInputLineOffset (void)
         * So the way to calculate the offset at the end of file is tricky.
         */
        ret = (mio_tell (File.mio) - (File.bomFound? 3: 0))
-           - getInputFileOffsetForLine(File.input.lineNumber);
+           - getInputFileOffsetForLine(File.input.lineNumber) - File.ungetchIdx;
    }
    else
    {
        /* At the first line of file. */
-       ret = mio_tell (File.mio) - (File.bomFound? 3: 0);
+       ret = mio_tell (File.mio) - (File.bomFound? 3: 0) - File.ungetchIdx;
    }
 
    return ret >= 0 ? ret : 0;

When PEG-based parsers run as guests, they don't work at all.
However, it has nothing to do with the bug initially pointed out by this pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants