Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Break code into chunks #90

Open
Yiannis128 opened this issue Oct 13, 2023 · 2 comments
Open

Break code into chunks #90

Yiannis128 opened this issue Oct 13, 2023 · 2 comments
Labels
enhancement New feature or request research This issue isn't being worked on, it is being considered, and researched.
Milestone

Comments

@Yiannis128
Copy link
Collaborator

All LLMs have issues with medium-large code samples in fixing and optimizing the code. A strategy that could be done to improve this is to break the input code into chunks and incrementally give it to the LLM, so that it doesn't have access to the entire code. So for example, a 2000 line code file could be broken into chunks segmented by functions.

@Yiannis128 Yiannis128 added enhancement New feature or request research This issue isn't being worked on, it is being considered, and researched. labels Oct 13, 2023
@juancolonna
Copy link

Hi Yiannis, thanks for opening this issue. That's right, from the last table you presented today it is clear that with smaller codes the method works better, so if we find a way to reduce larger code into small verifiable snippets, we will have a win-win strategy. My next concern is: can ESBMC check code snippets?

@Yiannis128
Copy link
Collaborator Author

Hi Yiannis, thanks for opening this issue. That's right, from the last table you presented today it is clear that with smaller codes the method works better, so if we find a way to reduce larger code into small verifiable snippets, we will have a win-win strategy.

While it is worth running the experiment, it is worth mentioning that it could in some cases affect the code in a bad way. For example, let's say we provide a single function every time, then if that function calls another function from the same or a different source file, then the LLM may not know what the function is doing. Perhaps we could avoid this by including additional metadata with each function call describing textually what each function does, please see this example:

int x() {
  int res = y();
}

Could be transformed into:

int x() {
  int res = y(); // Call function y which is supposed to do .........
}

My next concern is: can ESBMC check code snippets?

I am not sure about ESBMC processing code snippets, I personally think that as long as you contain in the source file everything such that the code compiles into an object file, then yes, it would be able to. To rephrase it, if you remove all irrelevant functions and variables, then yes, it would work. I don't think that you could go lower than that to breaking down functions, but perhaps @lucasccordeiro can answer that question?

@Yiannis128 Yiannis128 added this to the Version 0.4.0 milestone Feb 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request research This issue isn't being worked on, it is being considered, and researched.
Projects
None yet
Development

No branches or pull requests

2 participants