Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsafe identifier renames #8

Closed
j4k0xb opened this issue Dec 12, 2023 · 11 comments
Closed

Unsafe identifier renames #8

j4k0xb opened this issue Dec 12, 2023 · 11 comments

Comments

@j4k0xb
Copy link
Contributor

j4k0xb commented Dec 12, 2023

Minifiers only rename variable/function/parameter bindings and not any other identifier like properties because it's almost impossible to guarantee the program still works then.

Example input file:

class A {
  get() {
    return document.querySelector("#foo");
  }
}

humanify output:

class MyClass {
  getElementById() {
    return dom.findElement("#foo");
  }
}

I think thats the only change necessary but it needs some testing:

- Identifier: (path) => { 
+ BindingIdentifier: (path) => { 

Or how about renaming all variables to something unique so it doesn't mix up minified ones and then supplying a list of binding names it is allowed to change in the prompt?

Also it can happen that a variable gets renamed but another one with the same name already exists in the scope, or the code gets split up in such a way that not all references are renamed.
Maybe use babel's scope-aware rename that also handles these conflicts (have to find a way of getting rid of the _ prefix): https://astexplorer.net/#/gist/7283141e13dab314521744603a95e9b7/05b11370db8d5ef257550b2916b87d6e72e4eb1d

@0xdevalias
Copy link

Or how about renaming all variables to something unique so it doesn't mix up minified ones and then supplying a list of binding names it is allowed to change in the prompt?

I've laid out some of my thoughts on this concept in general, in the following issues/comments:

Also it can happen that a variable gets renamed but another one with the same name already exists in the scope, or the code gets split up in such a way that not all references are renamed.

Off the top of my head, if it's not easily solved via tweaking the prompt, you could potentially have it do a 'refinement/feedback round' where you provide feedback to the LLM + get it to correct the issues (which could be repeated up to X times) eg.

The following variables were already defined in the scope, so we can't use the names you chose. Please choose a new name for them so that they don't clash:

  • X
  • Y
  • Z

@j4k0xb
Copy link
Contributor Author

j4k0xb commented Feb 28, 2024

copilot now has a similar feature: https://code.visualstudio.com/updates/v1_87#_rename-suggestions
worth looking into how they've done it

@0xdevalias
Copy link

0xdevalias commented Feb 29, 2024

Release detailed here:

Couldn't see any overly relevant commits in that range, but did find the following searching the issue manually:

Which lead me to this label:

And these issues, which sound like there are 'rename providers' used by the feature:

More docs about rename providers here:

Based on the above, and the release notes explicitly mentioning copilot, I suspect the implementation will be in the Copilot extension itself (which isn't open-source):

⇒ file GitHub.copilot-1.168.741.vsix

GitHub.copilot-1.168.741.vsix: Zip archive data, at least v2.0 to extract, compression method=deflate

Though unzipping that and searching for provideRename didn't seem to turn up anything useful unfortunately; so not sure if that means it's not implemented in the copilot extension, it doesn't use VSCode's 'rename provider' API, or I just wasn't looking right.

@isidorn
Copy link

isidorn commented Mar 4, 2024

Hi VS Code PM here,

A team member pointed me to this issue because it links to one of our VS Code issues.

I wanted to share that reverse engineering GitHub copilot is violating the general terms of the use agreement https://github.com/customer-terms/general-terms

Feel free to reach out to me if I can help with the general terms clarification [email protected]

Thank you
Isidor

@0xdevalias
Copy link

0xdevalias commented Mar 5, 2024

@isidorn It's less about "reverse engineering GitHub copilot" and more about "trying to figure out where the 'rename suggestions' change mentioned in the VSCode release notes was actually implemented; and what mechanism 'integrates' it into VSCode'".

The above is assumptions + an attempt to figure that out; but if you're able to point me to the actual issue/commit on the VSCode side (assuming it was implemented there), or confirm whether it's implemented on the closed source GitHub Copilot extension side of things (if it was implemented there), that would be really helpful.

If it was implemented on the GitHub Copilot extension side of things, then confirming whether the VSCode extension 'rename provider' is the right part of the VSCode extension API to look at to implement a similar feature would be awesome.

@ulugbekna
Copy link

Thank you for taking interest in this API. The rename suggestions feature is powered by a proposed API defined here. Extensions provide the suggestions, while the vscode shows them in the rename widget.

@0xdevalias
Copy link

0xdevalias commented Mar 6, 2024

@ulugbekna Awesome; thanks for pointing that out! Shall have a closer read :)

Edit: See also:

@sagarika-fynd
Copy link

sagarika-fynd commented Mar 29, 2024

I renamed all variables to unique names, and then ran the unminify code, most of the stuff was unminified, but i still had some random code, which wasnts, so i added a check to only rename variables which are in the pattern of my naming convention, and instructed the same to GPT, had to run this a few times, but it made sure sancity of previous naming convention was maintained

@0xdevalias
Copy link

0xdevalias commented Aug 29, 2024

Also it can happen that a variable gets renamed but another one with the same name already exists in the scope, or the code gets split up in such a way that not all references are renamed.

See also:


Semi-related:

@0xdevalias
Copy link

Also it can happen that a variable gets renamed but another one with the same name already exists in the scope, or the code gets split up in such a way that not all references are renamed.

Off the top of my head, if it's not easily solved via tweaking the prompt, you could potentially have it do a 'refinement/feedback round' where you provide feedback to the LLM + get it to correct the issues (which could be repeated up to X times) eg.

The following variables were already defined in the scope, so we can't use the names you chose. Please choose a new name for them so that they don't clash:

  • X
  • Y
  • Z

Just made a similar suggestion for this pattern while debugging another issue, referencing it here for continuity:

The proper fix would be to use toIdentifier from @babel/types

@j4k0xb @jehna From a 2sec search I couldn't find rendered docs, but here's the relevant source for each:

toIdentifier definitely seems like a more robust approach than the current 'prefix with _' approach for sure.

Though I wonder if a 'proper fix' should also involve tweaking how we prompt for/filter the suggestions coming back from the LLM itself as well. Like instead of just forcing an invalid suggestion to be valid (with toIdentifier), we could detect that it's invalid (with isValidIdentifier) and then provide that feedback to the LLM, asking it to give a new suggestion; probably with some max retry limit; after which we could fall back to using the invalid suggestion run through toIdentifier, or log a warning and leave it un-renamed or similar.

Originally posted by @0xdevalias in #117 (comment)

@jehna
Copy link
Owner

jehna commented Oct 18, 2024

Using BindingIdentifier instead of Identifier now, thanks for reporting this! Closing for now, should be fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants