refactor: Replace pexpect with libtmux in BashSession #4881

xingyaoww · 2024-11-10T19:39:10Z

- Simplified implementation using libtmux instead of pexpect
- Added proper handling of command errors, interactive commands, and timeouts
- Added test suite to verify behavior
- Improved output handling and error detection


Collaborated with OpenHands:

Write tests for PS1 parsing https://www.all-hands.dev/share?share_id=41932d06ced712b1112e880ad3d3ed757308205438e0a6c6fd86048f22a5511d


To run this PR locally, use the following command:

docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --add-host host.docker.internal:host-gateway \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:cf98287-nikolaik \
  --name openhands-app-cf98287 \
  docker.all-hands.dev/all-hands-ai/openhands:cf98287


- Add tests for missing fields in PS1 metadata
- Add tests for malformed values in numeric fields
- Add tests for boolean values in numeric fields
- Fix JSON parsing in test_ps1_metadata_json_structure
- Fix handling of malformed values in from_ps1_match
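
To make the cases in this list concrete, here is a minimal sketch of tolerant numeric parsing for PS1 metadata. It is not the code in this PR: `coerce_int`, `parse_metadata`, the `-1` default, and the choice to reject booleans are all assumptions made for illustration, and the PR's actual policy may differ.

```python
from typing import Any


def coerce_int(value: Any, default: int = -1) -> int:
    """Best-effort conversion of a metadata value to int (hypothetical helper)."""
    # bool is a subclass of int in Python, so reject it explicitly (assumed policy).
    if isinstance(value, bool):
        return default
    try:
        # Going through float lets strings such as '2.0' parse as 2.
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        # None, non-numeric strings, NaN, and infinity all fall back to the default.
        return default


def parse_metadata(raw: dict) -> dict:
    """Coerce the numeric fields and keep every other field untouched."""
    parsed = dict(raw)
    for key in ('exit_code', 'pid'):  # assumed names of the numeric fields
        parsed[key] = coerce_int(raw.get(key))
    return parsed


result = parse_metadata({'exit_code': 'oops', 'username': 'root'})
```

In this sketch a malformed `exit_code` does not discard the valid string field `username`, and a missing `pid` falls back to the default.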


tofarr

🍰

enyst · 2024-12-20T20:31:39Z

We got around 41% on SWE-Bench Lite (well, in my first run); then I fixed one minor issue and retried, and it got us around 40%.

Running SWE-Bench verify to validate 🏃

I wonder what the empty generations were? I notice you used the proxy with the general model, and it's possible it ran into a weird "not found" error we've seen before. I think I've seen that result in empty patches. 🤔

xingyaoww · 2024-12-20T21:28:46Z

@enyst I think that should be exposed as an .error attribute, right? I didn't really see something like that 🤔

ryanhoangt · 2024-12-21T05:38:06Z

I wonder what the empty generations were? I notice you used the proxy with the general model, and it's possible it ran into a weird "not found" error we've seen before. I think I've seen that result in empty patches. 🤔

I think another possibility is that the agent had trouble reproducing the error and poked around with the reproduce script, and that script was in /workspace instead of /workspace/xxx, which is where we collect the diff.


enyst · 2024-12-25T02:58:19Z

openhands/events/serialization/event.py

@@ -82,7 +90,7 @@ def event_to_dict(event: 'Event') -> dict:
            d['timeout'] = event.timeout
    elif 'observation' in d:
        d['content'] = props.pop('content', '')
-        d['extras'] = props
+        d['extras'] = {k: _convert_pydantic_to_dict(v) for k, v in props.items()}


Suggested change

-        d['extras'] = {k: _convert_pydantic_to_dict(v) for k, v in props.items()}
+        # props is a dict whose values can include a complex object like an instance of a BaseModel subclass
+        # such as CmdOutputMetadata
+        # we serialize it along with the rest
+        d['extras'] = {k: _convert_pydantic_to_dict(v) for k, v in props.items()}
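
For readers unfamiliar with the pattern, the following sketch shows what a helper in the spirit of `_convert_pydantic_to_dict` might do. The body is an assumption, not the PR's implementation: it duck-types on `model_dump()` (the pydantic v2 method) so the example runs without pydantic installed, and `FakeMetadata` is a hypothetical stand-in for a `BaseModel` subclass such as `CmdOutputMetadata`.

```python
import json
from typing import Any


def convert_model_to_dict(value: Any) -> Any:
    """Recursively replace model-like objects with plain dicts (hypothetical helper)."""
    if hasattr(value, 'model_dump'):
        # pydantic v2 models expose model_dump(); any object with that method is accepted here.
        return value.model_dump()
    if isinstance(value, dict):
        return {k: convert_model_to_dict(v) for k, v in value.items()}
    if isinstance(value, list):
        return [convert_model_to_dict(v) for v in value]
    return value


class FakeMetadata:
    """Stand-in for a BaseModel subclass; the fields are illustrative only."""

    def __init__(self, exit_code: int, pid: int) -> None:
        self.exit_code = exit_code
        self.pid = pid

    def model_dump(self) -> dict:
        return {'exit_code': self.exit_code, 'pid': self.pid}


props = {'command': 'ls', 'metadata': FakeMetadata(exit_code=0, pid=123)}
extras = {k: convert_model_to_dict(v) for k, v in props.items()}
serialized = json.dumps(extras)  # would raise TypeError without the conversion
```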

enyst · 2024-12-25T03:07:55Z

openhands/events/action/commands.py

+        self.confirmation_state = confirmation_state
+        self.security_risk = security_risk
+        logger.debug(
+            f'CmdRunAction ignored kwargs (likely due to legacy serialization): {kwargs}'


Is compatibility the reason for adding this init? In other cases we've done it outside the constructor, so the dataclass can keep generating its init under the hood. It's only for keep_prompt, if I see this right?
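
A rough sketch of the alternative raised here, handling legacy keys at deserialization time so the dataclass keeps its generated `__init__`. `CmdRunActionSketch` and `action_from_dict` are hypothetical names and do not reflect the project's actual serialization code.

```python
import logging
from dataclasses import dataclass, fields

logger = logging.getLogger(__name__)


@dataclass
class CmdRunActionSketch:
    """Hypothetical stand-in for CmdRunAction; the fields are illustrative."""

    command: str
    thought: str = ''


def action_from_dict(data: dict) -> CmdRunActionSketch:
    """Build the action from a serialized dict, dropping keys the dataclass no longer has."""
    known = {f.name for f in fields(CmdRunActionSketch)}
    ignored = {k: v for k, v in data.items() if k not in known}
    if ignored:
        # Legacy payloads may still carry removed fields such as keep_prompt.
        logger.debug('Ignored legacy kwargs: %s', ignored)
    return CmdRunActionSketch(**{k: v for k, v in data.items() if k in known})


legacy_payload = {'command': 'ls -la', 'keep_prompt': True}
action = action_from_dict(legacy_payload)
```

The trade-off is that the filtering lives in one place in the deserializer, rather than in a hand-written constructor on each event class.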

enyst · 2024-12-25T03:11:59Z

openhands/events/observation/commands.py

+            '--END AGENT OBSERVATION--'
+        )
+
+    def to_agent_observation(self) -> str:


Suggested change

-    def to_agent_observation(self) -> str:
+    def _to_agent_observation(self) -> str:

This method seems to be used only here? Maybe we can make it private.

It confused me a bit, I thought it was used in the agent... 😅

enyst · 2024-12-25T03:32:18Z

openhands/runtime/utils/bash.py

+        metadata = CmdOutputMetadata.from_ps1_match(ps1_matches[-1])
+
+        # Special case where the previous command output is truncated due to history limit
+        # We should content BEFORE the last PS1 prompt


Suggested change

-        # We should content BEFORE the last PS1 prompt
+        # We should get the content BEFORE the last PS1 prompt
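
As an illustration of the special case discussed in this hunk, the sketch below takes everything before the last PS1 block as the command output and parses that block as JSON metadata. The marker strings, the regex, and `split_last_prompt` are assumptions for the example and may not match the constants or logic in `bash.py`.

```python
import json
import re
from typing import Optional, Tuple

# Assumed marker strings; the real constants in the PR may differ.
PS1_BEGIN = '###PS1JSON###'
PS1_END = '###PS1END###'

PS1_PATTERN = re.compile(
    rf'^{re.escape(PS1_BEGIN)}\n(.*?)\n{re.escape(PS1_END)}$',
    re.DOTALL | re.MULTILINE,
)


def split_last_prompt(captured: str) -> Tuple[str, Optional[dict]]:
    """Return (content before the last PS1 block, parsed metadata or None)."""
    matches = list(PS1_PATTERN.finditer(captured))
    if not matches:
        return captured, None
    last = matches[-1]
    metadata = json.loads(last.group(1))
    # Everything before the last prompt is treated as the command output,
    # even if an earlier prompt has scrolled out of the captured history.
    return captured[: last.start()].rstrip('\n'), metadata


sample = 'file_a\nfile_b\n###PS1JSON###\n{"exit_code": 0, "pid": 42}\n###PS1END###\n'
content, meta = split_last_prompt(sample)
```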

enyst · 2024-12-25T03:44:37Z

openhands/events/observation/commands.py

    command: str
-    exit_code: int = 0


I think the stuck equivalence check _eq_no_pid may need a small update to account for the move of exit_code.

With these changes, what should it mean for a CmdOutputObservation to be "the same" as another?
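
One possible answer to that question, sketched with hypothetical stand-in classes: compare the command, the content, and the exit code now stored on the metadata, while ignoring process-specific fields such as the pid. This is an assumption for discussion, not the behavior the PR settled on.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataSketch:
    """Hypothetical stand-in for CmdOutputMetadata."""

    exit_code: int = -1
    pid: int = -1


@dataclass
class CmdOutputSketch:
    """Hypothetical stand-in for CmdOutputObservation."""

    content: str
    command: str
    metadata: MetadataSketch = field(default_factory=MetadataSketch)


def eq_no_pid(a: CmdOutputSketch, b: CmdOutputSketch) -> bool:
    """Treat two observations as the same if only process-specific details differ."""
    return (
        a.command == b.command
        and a.content == b.content
        and a.metadata.exit_code == b.metadata.exit_code
    )


first = CmdOutputSketch('out', 'ls', MetadataSketch(exit_code=0, pid=100))
second = CmdOutputSketch('out', 'ls', MetadataSketch(exit_code=0, pid=200))
failed = CmdOutputSketch('out', 'ls', MetadataSketch(exit_code=1, pid=100))
```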

enyst · 2024-12-25T03:46:28Z

openhands/events/observation/commands.py

+        # Handle legacy attribute
+        if 'exit_code' in kwargs:
+            self.metadata.exit_code = kwargs['exit_code']
+        if 'command_id' in kwargs:


Ditto, if we need to handle legacy attrs, we can do it as before, maybe we don't have to write an init (yet)?

enyst · 2024-12-25T03:53:36Z

@enyst I think that should be exposed as an .error attribute, right? I didn't really see something like that 🤔

I've seen it not reported as error in a recent eval... I'll have to verify what's up, it might be just the summary script that missed it as error.
The completions of the eval, any of those with empty patch, would show that error.

openhands-agent and others added 30 commits November 10, 2024 18:12

refactor: Replace pexpect with libtmux in BashSession

7b86e33

- Simplified implementation using libtmux instead of pexpect
- Added proper handling of command errors, interactive commands, and timeouts
- Added test suite to verify behavior
- Improved output handling and error detection

update poetry and implement pwd

9e5653c

add CmdOutputMetadata to get a lot of info from ps1

522eb53

handle and test on multiple PS1 block

d60065b

greatly simplify command to not accepting blocking/keep_prompt, we should do it in the CmdOutputObservation end for keep_prompt

e304973

fix PS1 so PS1JSON works

c743833

support pid

49eae72

preliminary impl of bash

363b379

slight refactor of cmd

ffa0676

add blocking back

554e03a

add blocking back

2b252e0

Refactor error handling in CmdOutputMetadata

1935483

- Move error handling from from_ps1_match to from_ps1
- Let from_ps1_match raise exceptions for invalid data
- Update tests to match new error handling behavior

Improve CmdOutputMetadata handling of malformed values and line endings

23ddbe4

- Add support for float values in numeric fields
- Fix regex pattern to handle different line endings
- Add more test cases for edge cases
- Keep valid string fields when numeric fields fail to parse

Refactor PS1 metadata regex pattern

744938c

- Use re.escape() to properly escape special characters in markers
- Use constants to avoid duplication
- Update tests to use constants and handle newlines consistently

update test bash session to be compatible with latest interface

f5518bf

make sure ps1 end begin with newline

aaed596

add newline suffix for test

71e4ec5

add test

3a2443f

add testcase & tweak for ps1

ceb0d32

remove re escape

145f141

tweak ps1

56e4df5

fix bash session arg

9e43ee5

make action execution server compatible

ed382c6

remove command id from agent obs & make other places compatible with the new change

8e4180f

fix typo

3c68c7d

fix typo

d82c420

do not wrap lines in tmux captured output

010e453

use pwd to get working_dir

4aeb681

use PROMPT_COMMAND to make sure PS1 changes

28615cd

xingyaoww and others added 4 commits December 20, 2024 11:08

replace while true with while should_continue

b4ed2dc

rename pwd to cwd

be8914b

move bash init logic to a separate init function

8f2e9a9

update resource factor

178e029

tofarr approved these changes Dec 20, 2024


xingyaoww added 12 commits December 23, 2024 15:25

Merge commit 'd62cf7e7319850ce8c0dc47a3ddab0f4151d2af6' into feature/tmux-shell

7498fe4

add initialized for bash session

fa78313

make sure legacy CmdOutputObservation is still serializable

8040497

fix missing init

5ff8998

re-order thought

c5ca25f

fix serialization of action

b34beaa

fix obs serialization

73f379e

fix serialization

c593295

try fix test

bf34c7e

fix test again

68ffd0c

Merge commit 'ecff5c67fb7f1995556f0f36f5050f33dc0953d2' into feature/tmux-shell

cf98287

pretty print file write action

bb9c19b

enyst reviewed Dec 25, 2024


xingyaoww added 3 commits December 26, 2024 15:36

improve util script for swebench

c89677d

print actual visualization file path of the diff

165ee7a

fix grab test_output logic

9bc721b
