Study Reveals Timeline for AI's Full-Time Work Capabilities

Published: March 27, 2025

A recent analysis by METR Evaluations indicates that the ability of artificial intelligence systems to carry out long tasks is improving rapidly. The length of tasks that AI can complete independently is doubling every seven months, and by 2027 systems are expected to handle tasks equivalent to an eight-hour workday with a 50% success rate.
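The doubling trend can be sketched as a simple extrapolation. The snippet below is a minimal illustration, not METR's methodology: the one-hour starting horizon is an assumed baseline that does not appear in the article, and the function names are hypothetical.

```python
import math

DOUBLING_MONTHS = 7.0  # doubling period reported in the article


def months_until(target_hours, baseline_hours, doubling_months=DOUBLING_MONTHS):
    """Months needed for the task horizon to grow from baseline to target."""
    doublings = math.log2(target_hours / baseline_hours)
    return doublings * doubling_months


def horizon_after(months, baseline_hours, doubling_months=DOUBLING_MONTHS):
    """Task horizon (in hours) after the given number of months."""
    return baseline_hours * 2 ** (months / doubling_months)


if __name__ == "__main__":
    # ASSUMPTION: a one-hour horizon at 50% success today (illustrative only).
    baseline = 1.0
    print(months_until(8.0, baseline))   # three doublings of seven months each
    print(horizon_after(21.0, baseline))
```

Under that assumed baseline, an eight-hour horizon is three doublings away, or about 21 months, which is broadly in line with the 2027 estimate. A different baseline shifts the date accordingly.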

METR analysts have established clear criteria for testing how well AI performs on long tasks. Each task's instructions are straightforward and provide only the minimal additional context needed for comprehension. Each task is also accompanied by a simple, algorithmically defined evaluation function.

In contrast, most tasks handled by software engineers or machine learning engineers require extensive references to prior context and are not clearly defined. Consequently, METR's tests reflect conditions that may not always hold in real-world scenarios.

Moreover, a 50% success rate is not particularly high when compared to human performance.

Nevertheless, when a user on X visualized METR's data, plotting success rates of 80%, 95%, and 99% on a logarithmic scale, the results supported the analysts' findings. The chart shows that AI is clearing these thresholds on ever longer tasks, while near-perfect performance (99%) follows a much more gradual curve. This highlights how hard it is to achieve high reliability in AI outputs. While an 80% success rate on four-hour tasks may be achievable by 2028, reaching 99% will require far greater effort.

Even a fast, inexpensive system with only a 50% success rate could be a game changer, provided that a human can quickly verify its output. However, the cost of such oversight could make deploying AI economically unviable.

An 80% success rate, on the other hand, appears more realistic for practical applications. Suppose, for example, that each task consumes 1 million tokens at an approximate cost of $10 and is followed by a 15-minute human review. If the task is completed incorrectly, a human specialist needs about four hours to correct it, at a typical labor cost of $100 per hour. Done entirely by hand, a thousand such tasks would take 4,000 person-hours and cost around $400,000. With AI at an 80% success rate, the same thousand tasks would cost about $10,000 in tokens, $25,000 for 250 hours of review, and $80,000 for 800 hours of corrections on the 200 failed tasks, roughly $115,000 in total. Delegating the tasks to AI and checking the results afterward would therefore likely be more cost-effective.
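The cost comparison in this example can be reproduced with a few lines of Python. This is a rough sketch under stated assumptions: every AI result is reviewed, the review always catches a failure, reviewers are billed at the same $100 hourly rate, and a failed task is fixed in the same four hours that doing it by hand would take. The function and parameter names are illustrative.

```python
def manual_cost(tasks, hours_per_task=4.0, hourly_rate=100.0):
    """Cost of having humans do every task from scratch."""
    return tasks * hours_per_task * hourly_rate


def ai_assisted_cost(tasks, success_rate, token_cost=10.0,
                     review_hours=0.25, fix_hours=4.0, hourly_rate=100.0):
    """Cost of AI attempts plus human review and human fixes for failures."""
    tokens = tasks * token_cost                   # every task is attempted by AI
    review = tasks * review_hours * hourly_rate   # every result is reviewed
    failed = tasks * (1.0 - success_rate)         # expected number of failures
    fixes = failed * fix_hours * hourly_rate      # ASSUMPTION: all failures are caught
    return tokens + review + fixes


if __name__ == "__main__":
    n = 1000
    print(f"manual:          ${manual_cost(n):,.0f}")
    for rate in (0.5, 0.8):
        print(f"AI at {rate:.0%} success: ${ai_assisted_cost(n, rate):,.0f}")
```

With these figures the AI-assisted route remains cheaper than manual work even at a 50% success rate, although the margin narrows sharply. The conclusion depends on the review being fast and reliable, as the article notes.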

Leaders in the AI sector are already noting that programmers will need to acquire new skills. OpenAI's CEO, Sam Altman, believes that the ability to use AI tools effectively will become a key competency for new programmers. Earlier, Anthropic's CEO, Dario Amodei, asserted that AI would be writing essentially all of the code for programmers within a year. In January, Meta CEO Mark Zuckerberg told Joe Rogan that his company is developing a new AI capable of writing "most of the code" for its applications.

*Meta Platforms has been designated as an extremist organization, and its activities are banned in Russia.