[Feat] A competitive Web Browsing agent #1856
Example logs:
LGTM! This mainly adds a browsing agent to the agent hub and tweaks the browser env a little. I think we can approve it to unblock the BrowserGym integration.
EDIT: I also tested locally and confirmed the sample command works on my end!
PS: Once we figure out a way to do task decomposition, CodeAct can eventually delegate complex web browsing tasks to this BrowsingAgent!
Left some nits. Mostly LGTM. I would appreciate it if you could add more comments or elaborate on your design and some of the parameter settings, so that other people can build on your codebase. I don't want to block our integration progress, so I'll approve it. I can help with follow-up refactors or nits if you don't have time.
Thanks, @yufansong! I added comments for the parts that were unclear. Hope it's good for now. Since I changed BrowserOutputObservation a bit, some integration tests are failing; would you mind taking a look at how to fix them? EDIT: Never mind, I just fixed those, so it should be ready to go.
Sad, our project's test coverage dropped by 5.87%... Let me see if there's anything we can do to test this.
I've made some progress on an integration test for this agent! Will create a PR in a day.
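One common way to recover coverage for an LLM-driven agent without network access or API keys is to replace the model with recorded responses, so each step is deterministic. The sketch below is only an illustration of that idea: `BrowsingAgentSketch`, its `step` method and the `llm.completion` call are hypothetical stand-ins, not OpenDevin's actual classes or test harness, and the action strings are illustrative.

```python
from unittest import mock


class BrowsingAgentSketch:
    """Hypothetical, much-simplified agent used only to illustrate the test idea."""

    def __init__(self, llm) -> None:
        self.llm = llm

    def step(self, observation: str) -> str:
        # Ask the (hypothetical) LLM wrapper for the next browser action.
        return self.llm.completion(f"Page:\n{observation}\nNext action:").strip()


def test_agent_replays_recorded_actions() -> None:
    llm = mock.MagicMock()
    # Recorded responses replace real model calls: no API key, same result every run.
    llm.completion.side_effect = [
        'goto("https://www.google.com")\n',
        'fill("12", "usa president")',
    ]
    agent = BrowsingAgentSketch(llm)
    assert agent.step("about:blank") == 'goto("https://www.google.com")'
    assert agent.step("[12] combobox 'Search'") == 'fill("12", "usa president")'
    assert llm.completion.call_count == 2


test_agent_replays_recorded_actions()
print("ok")
```

Because `side_effect` is a list, each call to the mock returns the next recorded response, which mirrors replaying a saved conversation step by step.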
class SystemPrompt(PromptElement):
@frankxu2004 this prompt (along with many other prompts in this file) seems unused. Is that intentional?
Yes, basically this whole prompt.py file is not currently used. The current agent is a simplified version for ease of understanding. I included the file with the intention of building a more complex agent that uses more comprehensive information as a next step. It is still useful because it gives others building blocks for prompts and shows what information could be included as context for LLMs.
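To make the "building blocks" idea concrete, here is a minimal sketch of how prompt elements can be composed and toggled. It assumes a hypothetical `PromptElement` base class with a `visible` flag and a `prompt` property; the real classes in prompt.py may look different, and the prompt wording below is invented for illustration.

```python
class PromptElement:
    """Hypothetical base class: one reusable chunk of an LLM prompt."""

    def __init__(self, visible: bool = True) -> None:
        # Elements can be switched off so a simpler agent can skip them.
        self.visible = visible

    def _text(self) -> str:
        raise NotImplementedError

    @property
    def prompt(self) -> str:
        # Hidden elements contribute an empty string.
        return self._text() if self.visible else ""


class SystemPrompt(PromptElement):
    def _text(self) -> str:
        return "You are an agent trying to solve a web task."


class GoalPrompt(PromptElement):
    def __init__(self, goal: str, visible: bool = True) -> None:
        super().__init__(visible)
        self.goal = goal

    def _text(self) -> str:
        return f"# Goal:\n{self.goal}"


class AXTreePrompt(PromptElement):
    def __init__(self, axtree: str, visible: bool = True) -> None:
        super().__init__(visible)
        self.axtree = axtree

    def _text(self) -> str:
        return f"# Current page (accessibility tree):\n{self.axtree}"


def build_prompt(elements: list) -> str:
    # Keep only non-empty pieces and separate them with blank lines.
    return "\n\n".join(e.prompt for e in elements if e.prompt)


prompt = build_prompt([
    SystemPrompt(),
    GoalPrompt("tell me the usa's president using google search"),
    AXTreePrompt("[12] combobox 'Search'", visible=False),  # hidden, so dropped
])
print(prompt)
```

With this design, a simplified agent and a richer one can share the same elements and differ only in which ones are visible.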
These PRs are mostly for chasing the NeurIPS paper deadline, so not all features are implemented yet.
I see, sounds fair. I'm just having a bit of trouble reproducing `poetry run python ./opendevin/core/main.py -i 5 -t "tell me the usa's president using google search" -c BrowsingAgent -m gpt-4o-2024-05-13`... I tried about 5 times and it only succeeded once.
That's a bit weird. What error are you seeing? Do you have logs?
Currently the agent does not return an AgentFinishAction, so from the framework's point of view, every run ends in an error. Maybe I should add this finish action.
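The failure mode described here can be illustrated with a small sketch: if the agent never emits a finish action, the loop simply runs out of iterations and the run is reported as an error, even when the answer was found. Everything below is hypothetical: `FinishAction` stands in for AgentFinishAction (its real signature is not reproduced), the `finish("...")` output convention is an assumption, and the browser command strings are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class BrowseAction:
    # Hypothetical: a browser command string to forward to the environment.
    command: str


@dataclass
class FinishAction:
    # Hypothetical stand-in for AgentFinishAction.
    answer: str


Action = Union[BrowseAction, FinishAction]

# Assumed convention: the model writes finish("...") when the task is done.
FINISH_RE = re.compile(r'^finish\("(.*)"\)$', re.DOTALL)


def parse_llm_output(text: str) -> Action:
    match = FINISH_RE.match(text.strip())
    if match:
        return FinishAction(answer=match.group(1))
    return BrowseAction(command=text.strip())


def run(llm: Callable[[int], str], max_iterations: int = 5) -> str:
    """Return 'finished' if the agent emits a finish action, else 'error'."""
    for step in range(max_iterations):
        action = parse_llm_output(llm(step))
        if isinstance(action, FinishAction):
            return "finished"
        # A real agent would execute action.command in the browser here.
    # No finish action: the loop hits max_iterations, which is reported as an error.
    return "error"


scripted = ['goto("https://www.google.com")', 'finish("done")']
print(run(lambda i: scripted[min(i, len(scripted) - 1)]))  # finished
print(run(lambda i: "scroll(0, 200)"))  # error
```

Mapping the model's "I'm done" output to an explicit finish action is what lets the controller distinguish a successful run from one that merely ran out of steps.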
Yeah, sometimes it's like this. I improved the agent a bit and fixed some issues in #1993.
* initial attempt at a browsing only agent
* add browsing agent
* update
* implement agent
* update
* fix comments
* remove unnecessary things from memory extras
* update image processing

---------

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
This PR aims to enable a competitive browsing agent for #1470.
I have now transplanted the simplified demo agent used in WebArena into our agent hub.
When testing, note that it works best with GPT-4-class LLMs such as GPT-4o.