[Bug]: Incorrectly interpreting a Python error as a success message #5637
Comments
Does it continue from there? This error only means that the LLM asked for a replace without giving the text to replace. The LLM will then receive this error message in the next step and hopefully correct itself. What LLM are you using?
2024-12-17.00-02-47.msedge.mp4
Sorry, this looks like normal, intended behavior. The model will make mistakes. When it makes a mistake, like trying to execute a command without a required parameter (here, 'old_str'), then in the next step it will receive that information (the error message) and is expected to figure out how to fix the problem. That's what it says it will do: "let's try again, by specifying the correct old_str parameter". This kind of error is caused by the model, and the model will be helped, and expected, to solve it in the next steps.
I see, then why is the icon next to the command result a ✅ instead of ❌?
Aha! You are correct, it appears as a pretty green checkbox, and the rest of the application interprets it as any other message of the model doing its thing. Maybe we could fix this. @openhands-agent the ipython observation with an "ERROR" string is displayed in the frontend as a success message, with a green checkbox. Can we detect it was an error and display it with a red x?
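The fix being described amounts to inspecting the observation text before choosing a status icon. A minimal sketch of that idea, assuming a hypothetical `classify_observation` helper (names are illustrative, not the actual OpenHands API):

```python
# Hypothetical sketch: classify an agent observation as "error" or
# "success" so the UI can render a red ❌ instead of a green ✅.
# The marker strings are assumptions based on the logs in this thread.

def classify_observation(content: str) -> str:
    """Return 'error' if the observation text looks like a failure."""
    error_markers = (
        "ERROR",                              # seen in the ipython observation here
        "Traceback (most recent call last)",  # generic Python failure
    )
    for marker in error_markers:
        if marker in content:
            return "error"
    return "success"
```

A substring check like this is crude (user output could legitimately contain the word "ERROR"); a more robust version would carry an explicit error flag on the observation object instead of sniffing text.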
A potential fix has been generated and a draft PR #5644 has been created. Please review the changes.
Same issue here. I'm using:
I have this issue as well regarding edit loops: either the model forgets some of the parameters, or worse, the strings do not match the file, which makes substitutions harder. #5310
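Both failure modes mentioned above (a missing parameter, and a string that doesn't match the file) can be caught by validating before editing. A minimal sketch, assuming a hypothetical `apply_str_replace` helper rather than the actual OpenHands editor code:

```python
# Hypothetical sketch of a str_replace edit with up-front validation for
# the two common agent mistakes discussed in this thread.

def apply_str_replace(text: str, old_str: str, new_str: str) -> str:
    """Replace old_str with new_str, failing loudly on bad input."""
    if not old_str:
        # the missing-parameter case from the original bug report
        raise ValueError("Parameter `old_str` is required for command: str_replace")
    count = text.count(old_str)
    if count == 0:
        # the strings-don't-match-the-file case
        raise ValueError("old_str was not found in the file")
    if count > 1:
        # ambiguous match: refuse rather than guess which occurrence to edit
        raise ValueError(f"old_str matched {count} times; it must be unique")
    return text.replace(old_str, new_str)
```

Returning these as error messages (rather than silently succeeding) is what gives the LLM the feedback it needs to correct itself on the next step.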
I don't think we need the ability to instruct the agent to edit lines manually; we might as well click the "Open VS Code" and edit the file manually |
@avi12 it's like telling the agent to learn how to parse the file correctly, regardless of whether it's Python or another language. Yet this is very tricky when models like Qwen use
I get what you're saying, but from my POV I just want the agent to figure out stuff on its own, using the LLM, whether that's keeping a record of the file structure, reading many files to get better context, etc.
@avi12 yeah you are right, it would be much better if it had self-instruction built in so it could correct itself and "learn" how to edit (e.g. section checks, line checks, function checks). I hate how poor compatibility is for some models that are good at theory and algorithms but bad at writing and comprehension.
I mean I can't really complain, it's a FOSS project |
I used an unofficial Phi "4" (not the official one from Microsoft) and it had fewer issues getting stuck in a loop, but it still happened, so I think it's also something related to OpenHands.
Getting stuck in a loop is a problem for many LLMs. In fact, to my knowledge only Anthropic appears to have solved it somehow, especially for Opus, but for Sonnet too, and I think even Haiku. Just about every other LLM gets weighed down by its own history at some point and gets stuck in a loop every once in a while. The next tokens end up the same every time: the model lands on one particular answer and responds with it over and over. This happens with coding agents, but it also happens with multi-user dialogue, Discord experiments, attempts at analysis of text, etc. I've seen it everywhere there's a larger or more complex history than just a simple back and forth, basically.

We try to hand over to the user the ability to prompt it again, in the hope that it will be able to continue. Many LLMs might not be able to anyway, or not without a lot of prompting. Sorry for elaborating; I'm sure we can do more, but this isn't really on topic here. We should probably make a new issue if you wish to discuss improvements to how we handle the LLM getting stuck in a loop; it really gets lost in unrelated issues.

Back to the topic here: we can fix this icon. I haven't verified the
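The loop described above (the model emitting the same response over and over) can at least be detected cheaply before handing control back to the user. A minimal sketch, assuming a hypothetical `is_stuck` heuristic (not the actual OpenHands stuck-detection code):

```python
# Hypothetical sketch: flag a conversation as "stuck" when the last few
# agent actions are identical, the repetition pattern described above.

def is_stuck(history: list[str], window: int = 3) -> bool:
    """Return True if the last `window` actions in history are identical."""
    if len(history) < window:
        return False
    tail = history[-window:]
    return all(action == tail[0] for action in tail[1:])
```

Real detectors would compare normalized actions (e.g. the same tool call with the same arguments) rather than raw strings, and might also catch alternating two-step cycles.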
@enyst have you seen the DRY sampler as a replacement for repetition penalty yet? It's not in OpenRouter yet, but it got a lot of good press. Made by @p-e-w: oobabooga/text-generation-webui#5677
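For context on the suggestion: DRY penalizes a candidate token exponentially in the length of the repeated sequence it would extend, rather than penalizing every previously seen token equally. A rough sketch from the description in the linked PR; the actual text-generation-webui implementation differs (sequence breakers, logit-level application, etc.):

```python
# Rough sketch of the DRY repetition-penalty idea: if appending
# `candidate` would continue a token run that already occurred earlier
# in `tokens`, penalize it exponentially in the length of that run.
# Parameter names mirror the commonly cited DRY settings; values here
# are illustrative defaults, not the library's.

def dry_penalty(tokens: list, candidate,
                multiplier: float = 0.8,
                base: float = 1.75,
                allowed_length: int = 2) -> float:
    longest = 0
    for i, tok in enumerate(tokens):
        if tok != candidate:
            continue
        # how long a context suffix also precedes this earlier occurrence?
        n = 0
        while (n < i and n < len(tokens)
               and tokens[i - 1 - n] == tokens[len(tokens) - 1 - n]):
            n += 1
        longest = max(longest, n)
    if longest < allowed_length:
        return 0.0  # short repeats are allowed unpenalized
    return multiplier * base ** (longest - allowed_length)
```

The exponential growth is the key difference from classic repetition penalty: brief, legitimate repeats (code keywords, common phrases) stay cheap, while long verbatim loops quickly become prohibitively expensive.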
Is there an existing issue for the same bug?
Describe the bug and reproduction steps
2024-12-16.23-33-04.msedge.mp4
OpenHands Installation
Docker command in README
OpenHands Version
0.15.2, from main
Operating System
WSL on Windows
Logs, Errors, Screenshots, and Additional Context
The error message is