Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-126469: fix an error in lexer.c. #126473

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

qqwqqw689
Copy link
Contributor

@qqwqqw689 qqwqqw689 commented Nov 6, 2024

In file lexer.c we have

    Py_ssize_t invalid = _PyUnicode_ScanIdentifier(s);
    if (invalid < 0) {
        Py_DECREF(s);
        tok->done = E_ERROR;
        return 0;
    }

link

But for function _PyUnicode_ScanIdentifier in unicodeobject.c file

Py_ssize_t
_PyUnicode_ScanIdentifier(PyObject *self)
{
    Py_ssize_t i;
    Py_ssize_t len = PyUnicode_GET_LENGTH(self);
    if (len == 0) {
        /* an empty string is not a valid identifier */
        return 0;
    }

    int kind = PyUnicode_KIND(self);
    const void *data = PyUnicode_DATA(self);
    Py_UCS4 ch = PyUnicode_READ(kind, data, 0);
    /* PEP 3131 says that the first character must be in
       XID_Start and subsequent characters in XID_Continue,
       and for the ASCII range, the 2.x rules apply (i.e
       start with letters and underscore, continue with
       letters, digits, underscore). However, given the current
       definition of XID_Start and XID_Continue, it is sufficient
       to check just for these, except that _ must be allowed
       as starting an identifier.  */
    if (!_PyUnicode_IsXidStart(ch) && ch != 0x5F /* LOW LINE */) {
        return 0;
    }

    for (i = 1; i < len; i++) {
        ch = PyUnicode_READ(kind, data, i);
        if (!_PyUnicode_IsXidContinue(ch)) {
            return i;
        }
    }
    return i;
}

link
This function will never return a invalid valiable <0, so

    if (invalid < 0) {
        Py_DECREF(s);
        tok->done = E_ERROR;
        return 0;
    }

will never be executed.

I make this pr to fix it.

@qqwqqw689 qqwqqw689 changed the title gh-126469: fix a error in lexer.c. gh-126469: fix an error in lexer.c. Nov 6, 2024
Parser/lexer/lexer.c Outdated Show resolved Hide resolved
@skirpichev
Copy link
Member

Unfortunately, fix seems to be wrong: pr has CI failures, related to it.

@diegorusso
Copy link
Contributor

diegorusso commented Nov 6, 2024

It is true that the if statement is not executed (at least in the tests) but the fix is not right as tests are failing. Maybe we should really focus on writing a test that exercise that code path and if we cannot, we could refactor the logic there.

@qqwqqw689
Copy link
Contributor Author

qqwqqw689 commented Nov 6, 2024

Actually I'm a rookie, I'd like to wait for some advices.😁

@tomasr8
Copy link
Member

tomasr8 commented Nov 6, 2024

The tests seem to be failing because the error that is being raised is unknown parsing error while the tests expect either invalid character or invalid non-printable character so it doesn't seem to be entirely wrong. I think you can simply remove the if block and the tests should pass:

-    if (invalid < 0) {
-        Py_DECREF(s);
-        tok->done = E_ERROR;
-        return 0;
-    }

@qqwqqw689
Copy link
Contributor Author

The tests seem to be failing because the error that is being raised is unknown parsing error while the tests expect either invalid character or invalid non-printable character so it doesn't seem to be entirely wrong. I think you can simply remove the if block and the tests should pass:

-    if (invalid < 0) {
-        Py_DECREF(s);
-        tok->done = E_ERROR;
-        return 0;
-    }

I willl try.

tok->done = E_ERROR;
return 0;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to replace removed code with: assert(invalid >= 0);.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants