Add action to support file upload. #564

Open
madalinabuzau opened this issue Aug 10, 2024 · 2 comments

Comments

@madalinabuzau

I have been trying to use the agent to upload a file on a website and unfortunately it doesn't seem to have that action.

@adeprez
Contributor

adeprez commented Aug 15, 2024

The agent tends to click on file input elements, but since Selenium cannot interact with the operating system's native file picker dialog, this action fails. We have a method in our SeleniumDriver that can upload a file using send_keys.
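For readers unfamiliar with the send_keys approach, here is a minimal sketch of the general Selenium technique, not the project's actual SeleniumDriver method. The helper name `upload_file` and its parameters are hypothetical. It assumes the target is an `<input type="file">` element reachable by XPath, and it passes the locator strategy as the plain string `"xpath"` (the value of Selenium's `By.XPATH`) so the snippet does not need Selenium imported to be read or tested.

```python
from pathlib import Path


def upload_file(driver, xpath: str, file_path: str) -> str:
    """Hypothetical helper: set a file on an <input type="file"> element.

    `driver` is assumed to be a Selenium WebDriver (or any object exposing
    the same `find_element(by, value)` method).
    """
    # Browsers expect an absolute path, so resolve it before sending.
    path = Path(file_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No such file to upload: {path}")

    # "xpath" is the string value of selenium.webdriver.common.by.By.XPATH.
    element = driver.find_element("xpath", xpath)

    # Typing the path into the input sets the file directly,
    # so the native file picker never opens and no click is needed.
    element.send_keys(str(path))
    return str(path)
```

With a real driver, a call might look like `upload_file(driver, "//input[@type='file']", "report.pdf")`, where both the XPath and the file name are placeholders.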

To ensure the action succeeds, we should guide the LLM to use the set_value method through its prompt instead of attempting to click on the element. Additionally, we need to ensure that the World Model correctly passes the file path to be uploaded.

The necessary code is already in place, but the prompts need to be adjusted accordingly. Would you like to contribute to this feature?

Relates to #406

@madalinabuzau
Author

Thanks Alexis. I did modify the prompt, but the agent still clicks to upload the file rather than using send_keys. I think I need to dig deeper into the codebase to sort this out. Happy to contribute on this!
Btw, the costs are insane. I think we need much cheaper multimodal models to make this approach feasible.
