Revolutionizing AI: Energy-Based Transformer Architecture Enhances System 2 Thinking Capabilities

A newly introduced architecture, the Energy-Based Transformer (EBT), aims to teach AI models analytical, step-by-step problem solving.

Most contemporary AI models operate on what Daniel Kahneman calls "System 1 thinking": they are fast, intuitive, and excel at pattern recognition. However, a study by researchers from the University of Virginia, the University of Illinois Urbana-Champaign, Stanford, Harvard, and Amazon GenAI shows that these models often struggle with tasks requiring the slower, analytical "System 2 thinking," such as complex logical reasoning or advanced mathematics.

In the paper "Energy-Based Transformers are Scalable Learners and Thinkers," the researchers explore whether such reasoning abilities can emerge purely through unsupervised learning. They propose a novel architecture: the Energy-Based Transformer (EBT).

The EBT approach treats thinking as an iterative optimization process. Instead of producing a final answer in one step, the model starts from a random candidate solution and scores it by computing an "energy" value.

A lower energy indicates a better fit between the prediction and the context. Through repeated gradient-descent updates, the answer is refined until the energy reaches a minimum. This lets the model allocate more computation to harder problems.
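In outline, the refinement loop looks like the following minimal sketch. This is an illustration, not the authors' code: the quadratic energy_fn stands in for the learned transformer energy, and the function names, dimensions, and step counts are all hypothetical.

```python
import torch

# Toy stand-in for the learned energy: squared distance between a
# candidate prediction and a context embedding (lower = better fit).
# In the real EBT this scalar would be produced by a transformer.
def energy_fn(context, pred):
    return ((pred - context) ** 2).sum()

def think(energy_fn, context, steps=20, lr=0.1):
    # Start from a random candidate answer.
    pred = torch.randn_like(context, requires_grad=True)
    opt = torch.optim.SGD([pred], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        energy = energy_fn(context, pred)  # score the current candidate
        energy.backward()                  # gradient of energy w.r.t. pred
        opt.step()                         # refine the candidate downhill
    return pred.detach()

context = torch.randn(8)
answer = think(energy_fn, context)
```

Each pass through the loop spends additional compute to lower the energy, which is what allows a harder input to receive more "thinking" before the answer is read out.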

The idea of approaching this process from an energy perspective is not new; leading AI researcher Yann LeCun of Meta* and other scholars have discussed "energy-based models" for many years.

In their experiments, the researchers compared EBT against a strengthened Transformer baseline (Transformer++). Their findings suggest that EBT scales more efficiently, with up to a 35% higher scaling rate with respect to data, parameter count, and compute.

The real strength, however, lies in what the authors call "thinking scalability": the ability to improve performance through additional computation at inference time. On language tasks, this thinking improved EBT's performance by 29%, particularly on inputs that differ substantially from the training data.
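Continuing the hypothetical sketch above, thinking scalability amounts to varying how many refinement steps each input receives at inference time:

```python
# More gradient-descent steps = more inference-time "thinking".
quick_answer = think(energy_fn, context, steps=5)         # fast, System-1-like
deliberate_answer = think(energy_fn, context, steps=100)  # slow, System-2-like
```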

In image denoising tests, EBT outperformed Diffusion Transformers (DiT) while using 99% fewer forward passes. The study also found that EBT learned image representations whose classification accuracy on ImageNet-1k was nearly ten times better, suggesting a deeper understanding of the content.

Despite these promising results, open issues remain. The main one is computational cost: the study notes that training EBT requires 3.3 to 6.6 times more compute (FLOPs) than standard transformers, an overhead that could rule out many practical applications. Moreover, the research evaluates "System 2 thinking" mainly through perplexity improvements rather than on actual logical reasoning benchmarks, and no comparison with modern reasoning models was made due to limited compute budgets.

All scalability forecasts are based on experiments with models containing up to 800 million parameters, significantly fewer than those found in today’s largest AI systems. It remains unclear whether the advantages of EBT will hold up at larger scales.


*Meta and its products (Instagram, Facebook) are banned in the Russian Federation.
