AI Struggles to Read Clocks: New Research Reveals Challenges in Time Interpretation

Researchers from the University of Edinburgh have evaluated how well seven multimodal large language models (LLMs) answer questions about time and dates from images of clocks and calendars. The study concluded that these models struggle with these fundamental tasks.

The ability to interpret time and reason about it based on visual inputs is crucial for many real-world applications, ranging from event planning to autonomous systems, the authors point out.

Despite advancements in multimodal LLMs, much of the research has primarily focused on object detection and text recognition in images, leaving time-related inference relatively underexplored, the researchers continue.

The team tested OpenAI’s GPT-4o and o1, Google DeepMind’s Gemini 2.0, Anthropic’s Claude 3.5 Sonnet, Meta’s Llama 3.2-11B-Vision-Instruct, Alibaba’s Qwen2-VL-7B-Instruct, and ModelBest’s MiniCPM-V-2.6.

Researchers supplied the models with various images of analog clocks, including those with Roman numerals, in different colors, and without a second hand. They also uploaded calendar images spanning a decade.

The scientists posed a variety of questions about time and dates, such as the date of New Year’s Day or which date falls on the 153rd day of the year.

Reading analog clocks and understanding calendars involve complex cognitive steps, including detailed visual recognition (the positions of the clock hands, the arrangement of calendar days) and non-trivial numerical reasoning (accounting for day shifts in leap years), the research team notes.
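The leap-year shift mentioned above is easy to illustrate. The following sketch (not code from the study; function name and structure are my own) shows how the calendar date of the 153rd day of the year moves by one day depending on whether February has 29 days:

```python
from datetime import date, timedelta

def day_of_year_to_date(year: int, n: int) -> date:
    """Return the calendar date of the n-th day of the given year."""
    return date(year, 1, 1) + timedelta(days=n - 1)

# The 153rd day shifts by one day in leap years:
print(day_of_year_to_date(2025, 153))  # 2025-06-02 (non-leap year)
print(day_of_year_to_date(2024, 153))  # 2024-06-01 (leap year)
```

This kind of one-day offset is exactly the sort of numerical detail the researchers say the models must reason about when answering day-of-year questions.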

Overall, the AI models accurately read the time on analog clocks in less than 25% of cases. The LLMs struggled with clocks that had Roman numerals or stylized hands, as well as with images lacking a second hand. This issue may stem from difficulty detecting the hands and interpreting their angles on the clock face, the researchers explain.

Gemini 2.0 achieved the highest accuracy on the clock task, while o1 performed best on the calendar tasks, though even this model still made mistakes approximately 20% of the time.

The current study highlights a significant gap in AI’s ability to perform basic human-like activities, notes co-author Rohit Saxena, a graduate student at the School of Informatics at the University of Edinburgh. He emphasizes that such shortcomings must be addressed for the successful deployment of AI systems in time-sensitive applications.

The preprint of the study, titled "Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs," was published on February 7, 2025, on arxiv.org (arXiv:2502.05092 [cs.CV]).
