ZeroSearch: Alibaba's Innovative AI Training Method for Enhanced Search Capabilities

Alibaba's Tongyi Research Laboratory has unveiled ZeroSearch, a novel training method for large language models aimed at enhancing search capabilities without relying on actual internet search queries. To provide accurate responses, particularly when their pre-existing knowledge falls short, chatbots must learn to retrieve information on the fly. Current methodologies often use reinforcement learning (RL) and depend on real search engines like Google during training. However, Alibaba's team argues that this approach is costly, difficult to oversee, and hard to scale.

ZeroSearch adopts a different strategy: rather than employing real web searches during training, it simulates the search process through a secondary language model. This model generates brief texts in response to search queries, offering either relevant or intentionally irrelevant information, thereby mimicking actual search results under the full control of researchers.
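As a rough illustration, the simulation can be thought of as a prompt template plus any text-generation callable. The prompt wording and function names below are assumptions for the sketch, not the authors' actual code; a single flag flips between relevant and deliberately noisy documents.

```python
# Hypothetical sketch of ZeroSearch-style search simulation; the exact
# prompts used by the authors are not reproduced here.

def build_simulation_prompt(query: str, useful: bool) -> str:
    """Build the prompt asking a secondary LLM to fake search results."""
    style = ("relevant and correct" if useful
             else "plausible-looking but irrelevant or misleading")
    return (
        f"You are a search engine. For the query below, write five short "
        f"documents that are {style}.\n"
        f"Query: {query}\n"
        f"Documents:"
    )

def simulated_search(query: str, useful: bool, generate) -> str:
    """`generate` is any text-completion callable (e.g. a local LLM)."""
    return generate(build_simulation_prompt(query, useful))
```

Because the "search engine" is just a prompted model, researchers retain full control over what the policy model sees at every step.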

The core language model, Qwen-2.5, undergoes a structured training process. In each training iteration, it determines whether further information is necessary. If so, it crafts a query and forwards it to the simulation model. The model then reviews the generated documents and formulates a response, with accuracy evaluated and feedback provided through RL.
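The loop described above can be sketched as a single RL rollout. Everything here is an illustrative assumption (the tag format, the turn limit, the exact-match reward); the paper's actual RL setup is more involved.

```python
# Hypothetical sketch of one rollout: the policy alternates between
# issuing search queries and committing to a final answer.

def rollout(policy, sim_search, question, max_turns=4):
    """`policy(context)` returns ("search", query) or ("answer", text);
    `sim_search(query)` returns a string of simulated documents."""
    context = question
    for _ in range(max_turns):
        action, text = policy(context)
        if action == "search":
            docs = sim_search(text)            # query the simulation LLM
            context += f"\n<search>{text}</search>\n<docs>{docs}</docs>"
        else:
            return text, context               # final answer, scored by RL
    return None, context                       # ran out of turns

def reward(answer, gold):
    """Toy exact-match reward used only for this sketch."""
    return 1.0 if answer is not None and gold.lower() in answer.lower() else 0.0
```

The reward signal then updates the policy so that it learns both when to search and how to use what it retrieved.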

During the initial training phase, the simulated search outcomes are deliberately helpful. Over time, their quality is gradually degraded, following a curriculum-style training schedule. This technique teaches the model to draw sensible inferences even from ambiguous or contradictory information, much as in real-life internet searching.
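One plausible way to implement such a schedule (an assumption for illustration; the paper's exact curriculum may differ) is to raise the probability of serving unhelpful documents as training progresses:

```python
# Illustrative noise schedule: the share of deliberately unhelpful
# search results grows from `start` to `end` over the training run.

def noise_probability(step: int, total_steps: int,
                      start: float = 0.0, end: float = 0.5) -> float:
    """Linearly interpolate the noisy-document probability."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)
```

At each rollout, this probability would decide whether the simulation model is prompted to produce helpful or misleading documents.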

The simulation model is fine-tuned in advance to generate both "helpful" and "unhelpful" search results. This distinction is governed by subtle modifications to the prompts provided to the model.

Testing has demonstrated that the model can navigate complex multi-step search processes. In one instance, the question posed was, "Who is the spouse of the person who voices Smokey Bear?" Initially, the simulated search identified Sam Elliott as the voice actor. The model then conducted a second simulated search for Sam Elliott's spouse, uncovering Katharine Ross. It successfully connected the two pieces of information to provide an accurate answer.

The ability to decompose questions into sub-questions and build on the intermediate results is a central training objective of ZeroSearch.
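The two-hop pattern from the Smokey Bear example reduces to chaining sub-queries, where the answer to the first becomes part of the second. The tiny "index" below is invented stub data for illustration, not the model's actual retrieval:

```python
# Toy two-hop lookup over a stub search index; the facts here mirror the
# article's example but the structure is purely illustrative.

FAKE_INDEX = {
    "voice of Smokey Bear": "Sam Elliott",
    "spouse of Sam Elliott": "Katharine Ross",
}

def answer_two_hop(first_query: str, second_query_template: str) -> str:
    """Resolve a question by chaining two sub-queries."""
    hop1 = FAKE_INDEX[first_query]                        # intermediate result
    hop2 = FAKE_INDEX[second_query_template.format(hop1)]  # final answer
    return hop2
```

In ZeroSearch, of course, the policy model itself learns to formulate each hop as a search query rather than reading from a fixed table.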

Simulating the search process not only removes reliance on external search services but also significantly reduces costs. In experimental scenarios, executing 64,000 searches against Google via SerpAPI incurred approximately $586 in API fees. In contrast, using the simulation model on four rented A100 GPUs cost only about $71 in compute time.
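The per-query gap implied by those figures is easy to check:

```python
# Per-query cost implied by the reported totals.
api_total, sim_total, queries = 586.0, 71.0, 64_000

api_per_query = api_total / queries   # roughly $0.009 per search
sim_per_query = sim_total / queries   # roughly $0.001 per search
savings = 1 - sim_total / api_total   # roughly 88% cheaper
```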

Another advantage is that simulated searches are always available, provide answers in a consistent format, and can be adjusted in complexity as needed. According to the team, this makes the training process more predictable and reliable.

The team evaluated ZeroSearch against seven well-known question-and-answer benchmarks, including Natural Questions, TriviaQA, and HotpotQA. The model either matched or surpassed results from approaches trained on actual Google searches, particularly when utilizing a large simulation model with 14 billion parameters.

Smaller models with 7 billion parameters also exhibited commendable performance. A critical factor was not only size but whether the simulation model had been fine-tuned specifically for the task; models guided solely by prompts performed significantly worse. Alibaba has released some of its models on Hugging Face, and more information along with the source code can be found on GitHub.
