Unlocking Claude's Thought Processes: A Glimpse Through Anthropic's AI Microscope

Anthropic's new "AI microscope" provides a limited glimpse into the inner workings of its language model, Claude 3.5 Haiku, revealing how it processes information and reasons when tackling complex tasks.

One of the primary findings, according to Anthropic, is that Claude appears to use a language-agnostic internal representation, described by researchers as a "universal language of thought." For instance, when the model is asked to generate the antonym of "small" in various languages, it first activates a general concept of the opposite before producing the translated answer in the target language.

Anthropic reports that larger models like Claude 3.5 show greater conceptual alignment across languages than smaller models. Researchers believe this abstract representation could facilitate more consistent multilingual reasoning.

The study additionally examined Claude's responses to questions that require multi-step reasoning, such as, "What is the capital of the state where Dallas is located?" According to Anthropic, the model first activates a representation for "Dallas is in Texas" and then connects it to "The capital of Texas is Austin." This sequence suggests that Claude is not merely recalling facts but engaging in multi-step reasoning.

Researchers also discovered that Claude plans several words ahead when composing poetry. Instead of generating line by line, it begins by selecting appropriate rhyming words and then constructs each line to align with these objectives. If the target words are changed, the model produces an entirely different poem, indicating intentional planning rather than simple word-by-word prediction.

For mathematical problems, Claude employs parallel processing paths: one for rough approximation and another for precise calculation. However, when asked to explain its reasoning process, Claude describes a method different from the one it actually used, suggesting it mimics human explanations rather than accurately conveying its internal logic. Researchers also noted that when a prompt contains an incorrect hint, Claude frequently generates a coherent but logically flawed explanation built around it.

A Google study follows a similar line of inquiry. A recent investigation published in Nature Human Behaviour analyzed the similarities between AI language models and human brain activity during conversation. The Google team found that internal representations from OpenAI's Whisper model are closely aligned with patterns of neural activity recorded in humans. In both cases, the systems appear to predict upcoming words before they are spoken.

Despite these similarities, researchers highlight fundamental differences between the two systems. Unlike Transformer models, which can process hundreds or thousands of tokens simultaneously, the human brain handles language sequentially, word by word, over time, and in repeated cycles. Google states, "While human brains and Transformer-based LLMs share core computational principles of natural language processing, their underlying neural architectures differ significantly."
