refactor: Replace pexpect with libtmux in BashSession #4881
base: main
- Simplified implementation using libtmux instead of pexpect
- Added proper handling of command errors, interactive commands, and timeouts
- Added test suite to verify behavior
- Improved output handling and error detection
…ould do it in the CmdOutputObservation end for keep_prompt
- Add tests for missing fields in PS1 metadata
- Add tests for malformed values in numeric fields
- Add tests for boolean values in numeric fields
- Fix JSON parsing in test_ps1_metadata_json_structure
- Fix handling of malformed values in from_ps1_match

- Move error handling from from_ps1_match to from_ps1
- Let from_ps1_match raise exceptions for invalid data
- Update tests to match new error handling behavior

- Add support for float values in numeric fields
- Fix regex pattern to handle different line endings
- Add more test cases for edge cases
- Keep valid string fields when numeric fields fail to parse

- Use re.escape() to properly escape special characters in markers
- Use constants to avoid duplication
- Update tests to use constants and handle newlines consistently
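The last commit messages mention escaping the marker strings with re.escape(), reusing constants, and tolerating different line endings. A minimal sketch of that idea follows. The marker values, the constant names, and the parse_ps1_metadata helper are assumptions made for illustration and are not taken from the PR's code.

```python
import json
import re

# Hypothetical marker constants; the values used in the PR may differ.
PS1_BEGIN = '###PS1JSON###'
PS1_END = '###PS1END###'

# re.escape() keeps any regex-special characters in the markers literal.
# \r?\n tolerates both Unix and Windows line endings around the JSON payload.
PS1_PATTERN = re.compile(
    rf'{re.escape(PS1_BEGIN)}\r?\n(.*?)\r?\n{re.escape(PS1_END)}',
    re.DOTALL,
)


def parse_ps1_metadata(terminal_output: str) -> list[dict]:
    """Return the JSON payload of every well-formed PS1 block in the output."""
    results = []
    for match in PS1_PATTERN.finditer(terminal_output):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip blocks whose JSON is malformed
        if isinstance(payload, dict):
            results.append(payload)
    return results
```

Defining the markers once and building the pattern from them means the tests can import the same constants instead of repeating the literal strings.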
🍰
I wonder what the empty generations were. I notice you used the proxy with the general model, and it's possible it fell into a weird "not found" error we've seen before. I think I've seen that result in empty patches. 🤔
@enyst I think that should be exposed as an
I think another possibility is when the agent faced issues reproducing the error and poked around with the reproduce script, and that script was in the
@@ -82,7 +90,7 @@ def event_to_dict(event: 'Event') -> dict:
 d['timeout'] = event.timeout
 elif 'observation' in d:
 d['content'] = props.pop('content', '')
-d['extras'] = props
+d['extras'] = {k: _convert_pydantic_to_dict(v) for k, v in props.items()}
-d['extras'] = {k: _convert_pydantic_to_dict(v) for k, v in props.items()}
+# props is a dict whose values can include a complex object like an instance of a BaseModel subclass
+# such as CmdOutputMetadata
+# we serialize it along with the rest
+d['extras'] = {k: _convert_pydantic_to_dict(v) for k, v in props.items()}
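For readers unfamiliar with the helper, here is a rough sketch of what a converter like _convert_pydantic_to_dict might do. The body of the real helper is not shown in this thread, so everything below is an assumption: the function name convert_model_to_dict, the duck-typing on a pydantic-v2-style model_dump() method, and the FakeMetadata stand-in (used so the sketch runs without pydantic installed) are all hypothetical.

```python
from typing import Any


def convert_model_to_dict(value: Any) -> Any:
    """Recursively turn model-like objects into plain, JSON-friendly values.

    Assumption: a "model" is anything exposing a pydantic-v2-style
    model_dump() method; every other value is returned unchanged.
    """
    if hasattr(value, 'model_dump') and callable(value.model_dump):
        return convert_model_to_dict(value.model_dump())
    if isinstance(value, dict):
        return {k: convert_model_to_dict(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [convert_model_to_dict(v) for v in value]
    return value


class FakeMetadata:
    """Hypothetical stand-in for a BaseModel subclass such as CmdOutputMetadata."""

    def __init__(self, exit_code: int, pid: int) -> None:
        self.exit_code = exit_code
        self.pid = pid

    def model_dump(self) -> dict:
        return {'exit_code': self.exit_code, 'pid': self.pid}


# Mirrors the dict comprehension in the diff: each value is converted on its own.
props = {'command': 'ls', 'metadata': FakeMetadata(exit_code=0, pid=42)}
extras = {k: convert_model_to_dict(v) for k, v in props.items()}
```

Plain values such as the command string pass through untouched, while the nested metadata object becomes an ordinary dict that a JSON encoder can handle.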
self.confirmation_state = confirmation_state
self.security_risk = security_risk
logger.debug(
f'CmdRunAction ignored kwargs (likely due to legacy serialization): {kwargs}'
Is compatibility the reason for adding this init? In other cases we've handled it outside the constructor, so that the dataclass can keep generating its init under the hood. It's only keep_prompt, if I see this right?
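One way to read the suggestion above is to strip legacy keys during deserialization, before the dataclass constructor is called. The sketch below illustrates that under stated assumptions: CmdRunActionSketch, its fields, LEGACY_KEYS, and action_from_dict are hypothetical names and do not reproduce the project's actual deserialization code.

```python
from dataclasses import dataclass

# Hypothetical set of keys that older serialized events may still carry.
LEGACY_KEYS = {'keep_prompt'}


@dataclass
class CmdRunActionSketch:
    """Simplified stand-in; the dataclass still generates __init__ itself."""

    command: str
    hidden: bool = False


def action_from_dict(args: dict) -> CmdRunActionSketch:
    """Build the action after dropping known legacy keys from the input."""
    cleaned = {k: v for k, v in args.items() if k not in LEGACY_KEYS}
    # Keys that are neither fields nor legacy still raise TypeError here,
    # so genuine mistakes are not silently swallowed.
    return CmdRunActionSketch(**cleaned)


action = action_from_dict({'command': 'ls', 'keep_prompt': False})
```

Keeping the filtering in the factory leaves the class free of a handwritten constructor, at the cost of maintaining the list of legacy keys in one more place.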
'--END AGENT OBSERVATION--'
)

def to_agent_observation(self) -> str:
-def to_agent_observation(self) -> str:
+def _to_agent_observation(self) -> str:
This method seems to be used only here? Maybe we can make it private.
It confused me a bit, I thought it was used in the agent... 😅
metadata = CmdOutputMetadata.from_ps1_match(ps1_matches[-1])

# Special case where the previous command output is truncated due to history limit
# We should content BEFORE the last PS1 prompt
-# We should content BEFORE the last PS1 prompt
+# We should get the content BEFORE the last PS1 prompt
command: str
exit_code: int = 0
I think the stuck equivalence check _eq_no_pid may need a bit of an update, to account for the move of exit_code.
With these changes, what should it mean for a CmdOutputObservation to be "the same" as another?
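To make the question concrete, here is one possible reading, sketched under assumptions: two observations count as "the same" when their command and exit code match, with the exit code now read from the metadata object and the pid ignored. The class names, the fields, and same_ignoring_pid are hypothetical stand-ins. The actual _eq_no_pid logic is not shown in this thread and may compare different fields.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataSketch:
    """Hypothetical stand-in for CmdOutputMetadata."""

    exit_code: int = -1
    pid: int = -1


@dataclass
class ObservationSketch:
    """Hypothetical stand-in for CmdOutputObservation."""

    content: str
    command: str
    metadata: MetadataSketch = field(default_factory=MetadataSketch)


def same_ignoring_pid(a: ObservationSketch, b: ObservationSketch) -> bool:
    """Compare command and exit code only; pid and content are ignored here."""
    return (
        a.command == b.command
        and a.metadata.exit_code == b.metadata.exit_code
    )


first = ObservationSketch('out', 'ls', MetadataSketch(exit_code=0, pid=100))
second = ObservationSketch('out', 'ls', MetadataSketch(exit_code=0, pid=200))
third = ObservationSketch('out', 'ls', MetadataSketch(exit_code=1, pid=100))
```

Whether content should also take part in the comparison is exactly the open design question; the sketch leaves it out only to keep the example small.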
# Handle legacy attribute
if 'exit_code' in kwargs:
self.metadata.exit_code = kwargs['exit_code']
if 'command_id' in kwargs:
Ditto, if we need to handle legacy attrs, we can do it as before, maybe we don't have to write an init (yet)?
I've seen it not reported as an error in a recent eval... I'll have to verify what's up; it might just be the summary script that failed to count it as an error.
Collaborated with OpenHands:
Link of any specific issues this addresses
To run this PR locally, use the following command: