Large Language Models (LLMs) have taken center stage in many intelligent-agent tasks thanks to their cognitive abilities and quick responses. Even so, existing models often fall short when navigating the complexities of real webpages: the variety of possible actions, the limits on how much HTML text a model can process, and the difficulty of making decisions on the fly all remain significant roadblocks.
A team of researchers has proposed a solution with AutoWebGLM, an automated web navigation agent built on ChatGLM3-6B that is reported to outperform GPT-4 on real-world web browsing tasks. AutoWebGLM introduces several key developments that tackle the challenges above.
The first is an HTML simplification algorithm that condenses a webpage so that the information important to the task is presented compactly. This format, modeled on how humans skim pages, lets AutoWebGLM comprehend webpage content without exceeding its text-processing limits.
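The article does not spell out the algorithm, but a minimal sketch of what such a simplification pass might look like is shown below, assuming a Python/BeautifulSoup setup; the kept tags, length cap, and numbered output format are illustrative assumptions, not AutoWebGLM's actual procedure.

```python
# Illustrative sketch of an HTML simplification pass (not the paper's exact algorithm):
# keep interactive and text-bearing elements, drop scripts/styles/embeds,
# and emit a compact, numbered listing the agent can reference in its actions.
from bs4 import BeautifulSoup

KEEP_TAGS = ["a", "button", "input", "select", "textarea", "h1", "h2", "h3", "p", "label"]

def simplify_html(raw_html: str, max_text_len: int = 60) -> str:
    """Condense a raw HTML page into a short list of salient elements."""
    soup = BeautifulSoup(raw_html, "html.parser")

    # Drop content that carries no browsing-relevant information.
    for tag in soup(["script", "style", "noscript", "svg", "iframe"]):
        tag.decompose()

    lines, element_id = [], 0
    for node in soup.find_all(KEEP_TAGS):
        text = " ".join(node.get_text(" ", strip=True).split())[:max_text_len]
        hint = node.get("placeholder") or node.get("aria-label") or ""
        if not text and not hint:
            continue  # skip empty, purely decorative elements
        lines.append(f"[{element_id}] <{node.name}> {text or hint}")
        element_id += 1

    return "\n".join(lines)
```

Feeding the model this condensed, numbered listing rather than raw HTML keeps each observation short while still exposing the elements an action could target.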
Second, high-quality web browsing data has been curated to train AutoWebGLM, using a hybrid approach that combines human experience with model assistance. This practically grounded data helps the model learn, adapt, and improve its performance over time.
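As a rough illustration of such a hybrid pipeline, the sketch below merges manually collected browsing traces with model-generated ones that pass an automatic success check; the Trace fields and the filtering rule are assumptions for illustration, not the paper's exact data construction.

```python
# Hypothetical hybrid data pipeline: trust human demonstrations as-is, and let
# model-generated traces contribute only when they are judged successful.
from dataclasses import dataclass

@dataclass
class Trace:
    task: str            # natural-language instruction, e.g. "find today's top headline"
    actions: list[str]   # sequence of browser operations taken
    success: bool        # whether the task was judged to be completed
    source: str          # "human" or "model"

def build_training_set(human_traces: list[Trace], model_traces: list[Trace]) -> list[Trace]:
    return list(human_traces) + [t for t in model_traces if t.success]
```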
To further the agent's learning, reinforcement learning and rejection sampling fine-tuning have been applied, sharpening the model's understanding of webpages, guiding its browser actions, and helping it decompose tasks autonomously.
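A hedged sketch of how rejection sampling fine-tuning might look for a browsing agent is given below; the Agent and BrowsingEnv interfaces and the number of samples per task are placeholders, not AutoWebGLM's actual training loop.

```python
# Minimal sketch of rejection sampling fine-tuning for a browsing agent:
# sample several trajectories per task, keep the successful ones, fine-tune on them.
from typing import Protocol

class Agent(Protocol):
    def finetune(self, examples: list[tuple[str, list[str]]]) -> None: ...

class BrowsingEnv(Protocol):
    def rollout(self, agent: Agent, task: str) -> list[str]: ...
    def is_successful(self, task: str, trajectory: list[str]) -> bool: ...

def rejection_sampling_finetune(agent: Agent, env: BrowsingEnv,
                                tasks: list[str], num_samples: int = 8) -> Agent:
    accepted: list[tuple[str, list[str]]] = []
    for task in tasks:
        for _ in range(num_samples):
            trajectory = env.rollout(agent, task)    # let the agent act in the browser
            if env.is_successful(task, trajectory):  # reject trajectories that fail the task
                accepted.append((task, trajectory))
    agent.finetune(accepted)                         # train on the agent's own successes
    return agent
```

Repeating this loop lets the agent bootstrap from its own successful behavior, which is the flavor of self-sampling described in the contributions.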
The team has also developed a multilingual benchmark, AutoWebBench, to assess AutoWebGLM's performance across various real-world web browsing tasks. While the results are encouraging, the researchers acknowledge remaining issues that must be addressed before navigation in the wild is fully reliable.
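Benchmarks of this kind are often scored by step-level accuracy of the predicted browser actions; the small helper below is one such illustrative metric and does not necessarily match AutoWebBench's exact scoring or action format.

```python
def step_success_rate(predicted: list[str], reference: list[str]) -> float:
    """Fraction of steps where the predicted browser action matches the reference action."""
    if not reference:
        return 0.0
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)

# Example: 3 of 4 reference steps matched -> 0.75
print(step_success_rate(["click[5]", "type[2] laptop", "click[9]", "scroll down"],
                        ["click[5]", "type[2] laptop", "click[9]", "click[11]"]))
```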
Summarizing their core contributions, the researchers first point to the design and deployment of AutoWebGLM itself, trained with curriculum learning, self-sampling reinforcement learning, and rejection sampling fine-tuning (RFT).
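Curriculum learning here means ordering training from easier to harder material; a minimal sketch, reusing the Trace records and Agent interface from the earlier sketches and assuming trajectory length as a stand-in for difficulty, is shown below.

```python
def curriculum_stages(traces: list[Trace],
                      boundaries: tuple[int, int] = (3, 8)) -> list[list[Trace]]:
    """Bucket traces into easy / medium / hard stages by trajectory length."""
    easy   = [t for t in traces if len(t.actions) <= boundaries[0]]
    medium = [t for t in traces if boundaries[0] < len(t.actions) <= boundaries[1]]
    hard   = [t for t in traces if len(t.actions) > boundaries[1]]
    return [easy, medium, hard]

def train_with_curriculum(agent: Agent, traces: list[Trace]) -> Agent:
    # Fine-tune in order of increasing difficulty so early stages establish
    # basic page understanding before longer, harder tasks are introduced.
    for stage in curriculum_stages(traces):
        if stage:
            agent.finetune([(t.task, t.actions) for t in stage])
    return agent
```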
Second, real-world web browsing data, gathered through manual collection and model-assisted techniques, has produced a training dataset of over 10,000 entries. Third, AutoWebBench, a multilingual web browsing benchmark with both English and Chinese tasks, supports evaluation across differing linguistic environments.
Test results show that AutoWebGLM, with only 6 billion parameters, performs competitively with the latest LLM-based agents. Crucially, it proves genuinely usable for real-world web tasks, marking a significant milestone in addressing the challenges of web navigation.
The AutoWebGLM research paper and GitHub repository present the team's findings in full; all credit goes to the researchers behind the project.