AI Generating AI: ASI-ARCH Discovers 106 New SOTA Architectures Through Autonomous Research

ASI-ARCH is an experimental demonstration of artificial superintelligence applied to AI research: a system capable of autonomously conducting scientific work aimed at discovering new neural network architectures.

The system autonomously formulates hypotheses, implements them as executable code, and trains and validates them in practice. Over the course of this process, it ran 1,773 independent experiments, consumed over 20,000 GPU hours, and discovered 106 new state-of-the-art (SOTA) architectures built on linear attention mechanisms.
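To make this loop concrete, here is a minimal runnable sketch of such a propose, implement, train, analyze cycle. Everything in it is an illustrative stand-in (random numbers instead of real training, made-up function names), not actual ASI-ARCH code:

```python
import random
from dataclasses import dataclass

@dataclass
class Experiment:
    hypothesis: str
    score: float

def propose_architecture(history):
    """Stand-in for the idea-generation step: mutate the best prior design."""
    if history:
        best = max(history, key=lambda e: e.score)
        return f"variant of ({best.hypothesis})"
    return "DeltaNet-style linear attention"

def train_and_evaluate(hypothesis):
    """Stand-in for real training and validation: returns a noisy score."""
    return random.gauss(47.0, 1.0)

def research_loop(n_experiments, baseline_score):
    history, candidates = [], []
    for _ in range(n_experiments):
        hyp = propose_architecture(history)      # 1. formulate a hypothesis
        score = train_and_evaluate(hyp)          # 2-3. implement, train, validate
        history.append(Experiment(hyp, score))   # 4. learn from the outcome
        if score > baseline_score:
            candidates.append(hyp)               # keep designs that beat baseline
    return candidates

if __name__ == "__main__":
    winners = research_loop(100, baseline_score=48.0)
    print(f"{len(winners)} of 100 experiments beat the baseline")
```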

During the initial exploration phase, the system focused on small models of approximately 20 million parameters, training each on 1 billion tokens. This stage comprised 1,773 experiments and took about 10,000 GPU hours. From these, 1,350 promising candidates were selected, all of which outperformed the baseline DeltaNet architecture on both loss and benchmark metrics.

The second phase involved verification. The candidates from the first phase were scaled up to 340 million parameters to match DeltaNet's configuration. After filtering out architectures with excessive complexity or parameter counts, roughly 400 models remained. Training them on 1 billion tokens required another 10,000 GPU hours and yielded the 106 architectures that achieved SOTA status.

For the final validation, the researchers narrowed the pool to the five top models, which were trained on 15 billion tokens and compared against Mamba2, Gated DeltaNet, and DeltaNet.
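Put together, the selection works as a progressively stricter funnel. The short script below simply recaps the figures reported above; the stage labels are descriptive, not the paper's official terminology:

```python
# Selection funnel using the numbers quoted in this article.
stages = [
    ("exploration (20M params, 1B tokens)",   1773, 1350),
    ("complexity / parameter-count filter",   1350,  400),
    ("verification (340M params, 1B tokens)",  400,  106),
    ("final validation (15B tokens)",          106,    5),
]

for name, n_in, n_out in stages:
    kept = 100 * n_out / n_in
    print(f"{name:<42} {n_in:>5} -> {n_out:>4}  ({kept:.1f}% kept)")
```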

ASI-ARCH noticeably favors established components such as gating and convolution. Most notably, the component distribution across the 106 top models has a far less pronounced long tail than that of the other 1,667 generated architectures.

This indicates that the system succeeds not through random exploration of exotic ideas, but via iterative enhancement of proven techniques, closely resembling the methodologies used by human scientists.
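One way to quantify that observation is to count how often each component appears across a set of designs and measure how much of the total mass the few most common components capture. The sketch below uses made-up component lists purely to illustrate the calculation:

```python
from collections import Counter

def concentration(designs, top_k=3):
    """Share of all component mentions captured by the top_k components."""
    counts = Counter(c for design in designs for c in design)
    total = sum(counts.values())
    return sum(n for _, n in counts.most_common(top_k)) / total

# Hypothetical component lists, for illustration only.
sota_designs = [["gating", "convolution"], ["gating", "routing"],
                ["gating", "convolution", "routing"]]
all_designs = sota_designs + [["fourier-mixing"], ["hypernetwork"],
                              ["random-walk-attention"], ["gating"]]

# A higher value for the SOTA set means a shorter tail: the winners reuse
# a small core of proven components rather than exotic one-offs.
print(f"top-3 share, SOTA: {concentration(sota_designs):.2f}")
print(f"top-3 share, all:  {concentration(all_designs):.2f}")
```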

Among the top architectures identified, PathGateFusionNet achieved an average score of 48.51 across all benchmarks. In comparison, Mamba2 scored 47.84, while the human-designed Gated DeltaNet scored 47.32. Another generated architecture, ContentSharpRouter, reached 48.34.

Looking at individual tests, PathGateFusionNet scored 60.58 on BoolQ versus Gated DeltaNet's 60.12. AdaptiveEntropyRouter, at 340 million parameters, achieved 44.31 across the tests, 2.21 points above Gated DeltaNet's 42.10.
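For convenience, here are the reported average scores gathered in one place; only the figures quoted above are used, nothing else from the paper's tables is assumed:

```python
# Average benchmark scores as quoted in this article.
avg_scores = {
    "PathGateFusionNet (ASI-ARCH)":    48.51,
    "ContentSharpRouter (ASI-ARCH)":   48.34,
    "Mamba2":                          47.84,
    "Gated DeltaNet (human-designed)": 47.32,
}

baseline = avg_scores["Gated DeltaNet (human-designed)"]
for model, score in sorted(avg_scores.items(), key=lambda kv: -kv[1]):
    print(f"{model:<34} {score:5.2f}  ({score - baseline:+.2f} vs Gated DeltaNet)")
```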

Improvements were observed across a wide range of tasks.

The researchers also traced where each architectural idea originated, distinguishing three sources:

— Cognition: knowledge extracted from the scientific literature;
— Analysis: conclusions drawn from the system's own previous experiments;
— Originality: entirely new ideas.

For the 1,773 generated architectures, the distribution of sources was as follows:

— 51.7% of ideas were derived from human literature;
— 38.2% came from the system’s analysis;
— 10.1% were original concepts.

However, when examining solely the 106 SOTA outcomes, the picture changes. The proportion of ideas based on Analysis increases from 38.2% to 44.8%, while the share from Cognition slightly declines to 48.6%.
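In code, the shift looks like this. Note that the 6.6% Originality share for the SOTA subset is inferred as the remainder, on the assumption that the three categories are exhaustive (as they are for the full set):

```python
# Idea-source percentages quoted in this article.
all_archs = {"Cognition": 51.7, "Analysis": 38.2, "Originality": 10.1}
sota      = {"Cognition": 48.6, "Analysis": 44.8}
# Assumption: the three categories sum to 100%, so Originality is the remainder.
sota["Originality"] = round(100 - sum(sota.values()), 1)

for source in all_archs:
    print(f"{source:<12} {all_archs[source]:5.1f}% -> {sota[source]:5.1f}% "
          f"({sota[source] - all_archs[source]:+.1f} pp among SOTA)")
```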

Thus, for significant achievements, AI must do more than merely imitate and combine human contributions. It needs to analyze its experiences, learn from its own successes and failures, and synthesize more refined solutions.

Project Page
arXiv
GitHub
