[Bug]: Incorrectly interpreting a Python error as a success message #5637

Open · avi12 opened this issue Dec 16, 2024 · 17 comments · May be fixed by #5644
Labels: bug (Something isn't working)

Comments

avi12 commented Dec 16, 2024

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Describe the bug and reproduction steps

(video attachment: 2024-12-16.23-33-04.msedge.mp4)

OpenHands Installation

Docker command in README

OpenHands Version

0.15.2, from main

Operating System

WSL on Windows

Logs, Errors, Screenshots, and Additional Context

The error message is:

Parameter `old_str` is required for command: str_replace.
avi12 added the bug label on Dec 16, 2024
enyst (Collaborator) commented Dec 16, 2024

Does it continue from there? This error only means that the LLM asked for a replace without giving the text to replace. The LLM will then receive this error message as its next observation and hopefully correct itself.
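
For illustration, a minimal sketch of the kind of tool-call validation that produces this message; the names and structure here are assumptions, not the actual OpenHands code:

```python
# Illustrative only: the kind of tool-call validation that yields
# "Parameter `old_str` is required for command: str_replace."
# Names and structure are assumptions, not the actual OpenHands code.

def validate_str_replace(args: dict) -> None:
    """Reject an LLM-issued str_replace call that lacks required parameters."""
    for required in ("path", "old_str"):
        if required not in args:
            raise ValueError(
                f"Parameter `{required}` is required for command: str_replace."
            )

# A call missing old_str trips the check; the error text is then fed back
# to the LLM as its next observation so it can retry with a corrected call.
try:
    validate_str_replace({"path": "app.py", "new_str": "print('fixed')"})
except ValueError as e:
    print(e)  # Parameter `old_str` is required for command: str_replace.
```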

What LLM are you using?

avi12 (Author) commented Dec 16, 2024

(video attachment: 2024-12-17.00-02-47.msedge.mp4)

enyst (Collaborator) commented Dec 16, 2024

Sorry, this looks like normal, intended behavior. The model will make mistakes. When it makes a mistake, like trying to execute a command without a required parameter (this `old_str`), then on the next step it will receive that information (the error message) and is expected to figure out how to fix the problem.

That's what it says it will do: "let's try again, by specifying the correct old_str parameter". This kind of error is caused by the model, and the model will be helped - and expected - to solve it in the next steps.

avi12 (Author) commented Dec 17, 2024

I see, then why is the icon next to the command result a ✅ instead of ❌?

enyst (Collaborator) commented Dec 17, 2024

Aha! You are correct: it appears as a pretty green checkbox, and the rest of the application treats it like any other message of the model doing its thing.

Maybe we could fix this. @openhands-agent the ipython observation with an "ERROR" string is displayed in the frontend as a success message, with a green checkbox. Can we detect that it was an error and display it with a red X?
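
In spirit, the requested check might look something like this. This is a hedged sketch in Python of just the classification logic; the actual OpenHands frontend is TypeScript/React and may detect errors differently:

```python
# Hedged sketch of the requested check: flag ipython observations whose
# content carries an "ERROR" marker so the UI can render a red X instead
# of a green check. Illustrative only, not the actual OpenHands code.

def is_error_observation(content: str) -> bool:
    """Heuristic: treat output containing an ERROR marker as a failure."""
    return "ERROR" in content

def status_icon(content: str) -> str:
    return "❌" if is_error_observation(content) else "✅"

print(status_icon("ERROR: Parameter `old_str` is required for command: str_replace."))  # ❌
print(status_icon("File /workspace/app.py edited successfully."))  # ✅
```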

openhands-agent (Contributor)

OpenHands started fixing the issue! You can monitor the progress here.

openhands-agent (Contributor)

A potential fix has been generated and a draft PR #5644 has been created. Please review the changes.

CitroenGames commented Dec 17, 2024

Same issue here. I'm using Qwen 2.5 Coder 14B with LM Studio as the backend.

BradKML commented Dec 18, 2024

I have this issue as well regarding edit loops: either the model forgets some of the parameters, or worse, the strings don't match the file, which makes substitutions harder. #5310
Prompting them to just delete and rewrite the file is just as difficult. Imagine if they could substitute by line numbers or by function/class/method. But then the issue is making sure the substitution can handle varying line counts between the old and new text, and that intended function changes are handled well (a toy sketch follows below). Refactoring would also be a pain if there are name changes.
@CitroenGames have you tried other models like Phi-3/Phi-4 and DeepSeek Coder v2.5?
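
As a toy illustration of that line-based substitution idea (purely hypothetical, not an OpenHands feature), the replacement block simply doesn't need to match the length of the range it replaces:

```python
# Hypothetical sketch of line-range substitution (not an OpenHands
# feature). Lines are 1-indexed and inclusive; the replacement block may
# have more or fewer lines than the range it replaces.

def replace_lines(text: str, start: int, end: int, new_block: str) -> str:
    lines = text.splitlines()
    return "\n".join(lines[: start - 1] + new_block.splitlines() + lines[end:])

src = "def f():\n    return 1\n\nprint(f())"
# Replace the single line 2 with two lines; the surrounding code is untouched.
print(replace_lines(src, 2, 2, "    x = 2\n    return x"))
```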

avi12 (Author) commented Dec 18, 2024

I don't think we need the ability to instruct the agent to edit lines manually; we might as well click "Open VS Code" and edit the file ourselves.
The goal is rather a proper fix so the agent parses the result of the Python code correctly.

BradKML commented Dec 18, 2024

@avi12 it's like telling the agent to learn how to parse the file correctly, whether it's Python or another language. Yet this is very tricky when models like Qwen use ... to skip code, the way a human writing a tutorial would. It's partly a prompting issue and partly a tool-use issue.

avi12 (Author) commented Dec 18, 2024

I get what you're saying, but from my POV I just want the agent to figure things out on its own using the LLM, whether that's keeping a record of the file structure, reading many files to get better context, etc.
If I have to manually instruct the model what to do, then it kind of loses the point IMHO.

BradKML commented Dec 18, 2024

@avi12 yeah, you're right, it would be much better if it had self-instruction built in so it could correct itself and "learn" how to edit (e.g. section checks, line checks, function checks). I hate how low compatibility is for some models that are good at theory and algorithms but bad at writing and comprehension.

avi12 (Author) commented Dec 18, 2024

I mean, I can't really complain; it's a FOSS project.
Cheaper than paying $500 for Devin.

CitroenGames commented Dec 18, 2024

> I have this issue as well regarding edit loops: either the model forgets some of the parameters, or worse, the strings don't match the file, which makes substitutions harder. #5310
> Prompting them to just delete and rewrite the file is just as difficult. Imagine if they could substitute by line numbers or by function/class/method. But then the issue is making sure the substitution can handle varying line counts between the old and new text, and that intended function changes are handled well. Refactoring would also be a pain if there are name changes.
> @CitroenGames have you tried other models like Phi-3/Phi-4 and DeepSeek Coder v2.5?

I used a phi "4" that isn't official from Microsoft, and it had fewer issues getting stuck in a loop, but it still happened, so I think it's also something related to OpenHands.

enyst (Collaborator) commented Dec 18, 2024

Getting stuck in a loop is a problem for many LLMs. In fact, to my knowledge only Anthropic appears to have solved it somehow, especially for Opus, but Sonnet too, and I think even Haiku. Just about every other LLM gets weighed down by its own history at some point and gets stuck in a loop every once in a while. The next tokens end up the same every time: the model settles on one particular answer and responds with it over and over.

This happens with coding agents, but it also happens with multi-user dialogue, Discord experiments, attempts at analysis of text, etc. I've seen it everywhere there's a larger or more complex history than a simple back-and-forth, basically.

What OpenHands currently does about it is stop it (instead of going on and on until the maximum allowed steps!). A naive sketch of such a stop check follows below.

And we try to hand the user the ability to prompt it again, in the hope that it will be able to continue. Many LLMs might not be able to anyway, or not without a lot of prompting.

Sorry for elaborating; I'm sure we can do more, but this isn't really on topic here. We should probably open a new issue if you wish to discuss improvements to how we handle the LLM getting stuck in a loop. It really gets lost in unrelated issues.
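
A naive version of that stop check might look like this; this is illustrative only, and the real OpenHands stuck detector is more involved:

```python
# Naive sketch of stuck-in-a-loop detection: halt when the last few
# actions are identical repeats. Illustrative only; the actual OpenHands
# stuck detector is more involved.

def is_stuck(action_history: list[str], window: int = 3) -> bool:
    if len(action_history) < window:
        return False
    tail = action_history[-window:]
    return all(action == tail[0] for action in tail)

history = ["str_replace app.py", "str_replace app.py", "str_replace app.py"]
assert is_stuck(history)  # the controller stops here instead of burning steps
```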

Back to the topic here: we can fix this icon. I haven't verified the openhands-agent's fix yet; I will get to that. Please note that it's just about the web frontend display; it makes no difference to the agent.

BradKML commented Dec 18, 2024

@enyst have you seen the DRY sampler as a replacement for repetition penalty yet? It's not in OpenRouter yet, but it got a lot of good press; it was made by @p-e-w oobabooga/text-generation-webui#5677
Besides that, I would highly recommend checking out Dynamic Temperature and min_p by @kalomaze (see here #5310 (comment)), but the issue is that sampler hacks require upstream support, which is a tall order for proprietary models.
A more proper approach, beyond relying on open-weights hacking, is to create a mechanism that forces the LLM to "get creative" and break out of established loops. Temperature in OpenHands defaults to 0; I tried other params and need to run a larger set of tests to see whether alternative param pairs matter. Raising the temperature based on an attempt counter would be helpful (a sketch follows below).
But beyond the parameter side, we should also get the LLM to observe previous attempts and do some introspection on how to avoid redoing bad solutions, and learn from them a little. Exposing errors from tool usage, and throwing in some prompt engineering around debugging procedures, would help with this.
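
A hedged sketch of that attempt-counter idea; `llm_complete` and `looks_stuck` are placeholders for a completion call and a loop-detecting predicate, not real OpenHands APIs:

```python
# Hypothetical sketch of raising temperature with an attempt counter so a
# looping model gets a chance to break out of its repeated answer.
# `llm_complete` and `looks_stuck` are placeholders, not OpenHands APIs.

def complete_with_escalation(prompt, llm_complete, looks_stuck, max_attempts=4):
    last = None
    for attempt in range(max_attempts):
        temperature = min(0.3 * attempt, 1.0)  # 0.0, 0.3, 0.6, 0.9
        last = llm_complete(prompt, temperature=temperature)
        if not looks_stuck(last):
            return last
    return last  # give up and hand control back to the user
```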
