R&D Amplifier
February 09, 2026

Jean Kaddour, Srijan Patel, Gbètondji Dovonon, Leo Richter, Pasquale Minervini, Matt J. Kusner

This article is AI-generated from a scientific publication. We recommend verifying information in the original source.

Agentic Uncertainty Reveals Agentic Overconfidence

In Brief

This research explores whether AI agents can accurately judge their own chances of succeeding at a task. "Agentic uncertainty" refers to how well an AI can predict its own success—measuring if it overestimates or underestimates its odds. The study finds that even when AI agents fail most of the time, they often believe they’ll succeed, a flaw called "overconfidence."

The Problem

AI agents are increasingly used to make decisions, from writing code to solving complex problems. But if they don’t know when they’re likely to fail, they can make dangerous or costly mistakes. This lack of self-awareness—called agentic uncertainty—can lead to overconfidence, where an AI thinks it’s doing well even when it’s not. Knowing when an AI is wrong is crucial for safety, especially in high-stakes areas like healthcare, finance, or autonomous systems. Without reliable self-assessment, we can’t trust AI to act independently.

The Solution

Researchers tested how well three major AI models—GPT, Gemini, and Claude—could predict their own success across three stages: before starting a task (pre-execution), during the process (mid-execution), and after finishing (post-execution). They also tried a new method: adversarial prompting, where the AI is asked to act as a "bug finder" and review its own work. This reframing led to more honest predictions. The process is visualized in Figure 2, which shows how the AI's assessment evolves across three phases of task progress: pre-execution (0–25%), mid-execution (25–75%), and the final outcome at 100%.

Figure 2
The figure illustrates a three-stage process where the outcome at 100% depends on the path taken through the mid-execution phase.

The path to success or failure depends on decisions made mid-process, and the researchers measured how well the AI’s confidence matched actual results.
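To make the adversarial reframing concrete, here is a minimal sketch of how the two prompt styles might differ. The exact wording is a hypothetical illustration, not the authors' actual prompts:

```python
def build_self_assessment_prompt(task: str, attempt: str, adversarial: bool) -> str:
    """Build a self-assessment prompt for an AI agent.

    The phrasing below is illustrative only; the paper's prompts may differ.
    """
    if adversarial:
        # Adversarial reframing: cast the model as a critic hunting for
        # flaws in its own work before it estimates its success odds.
        return (
            f"You are a bug finder. Task: {task}\n"
            f"Proposed solution: {attempt}\n"
            "List every flaw you can find in this solution, then estimate "
            "the probability (0-100%) that it actually succeeds."
        )
    # Standard self-review: ask directly for a confidence estimate.
    return (
        f"Task: {task}\n"
        f"Proposed solution: {attempt}\n"
        "Estimate the probability (0-100%) that this solution succeeds."
    )
```

The intuition is that asking the model to enumerate flaws first shifts it out of a "defend my answer" frame, which the study found produced better-calibrated estimates.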

Key Findings

  • Some AI agents that succeeded only 22% of the time predicted a 77% chance of success—showing strong overconfidence across all models.
  • Surprisingly, pre-execution assessments (with less information) were better at distinguishing success from failure than post-execution reviews, though differences were not always statistically significant.
  • The adversarial prompting method—asking the AI to find flaws in its own reasoning—produced the best-calibrated predictions, outperforming standard self-reviews. This is clearly shown in Figure 1, where the "Adversarial" bar for each model shows lower overconfidence than the pre- or post-execution checks.
Figure 1
The chart compares the overconfidence levels of three AI models across different conditions, with GPT showing the highest overconfidence in the Pre-Exec category.
  • The calibration analysis reveals that while predictions of success were generally well-calibrated (higher predicted success rates matched higher actual success rates), failure predictions were more uncertain, suggesting AI struggles to recognize when it is truly failing.

Why It Matters

Better self-assessment in AI could prevent dangerous overreliance on flawed systems. For example, if a medical AI suggests a treatment but is actually wrong 75% of the time, it should know it’s uncertain—not confident. By using adversarial prompting, we might train AI to be more honest about its limits, improving safety in applications like self-driving cars, financial advice, or automated coding. This could help humans trust AI more, only using it when it’s truly reliable.

Limitations

  • The study only tested three AI models, so results may not apply to all systems.
  • The researchers report that differences between pre- and post-execution assessments were not always significant, suggesting some results may be inconsistent across tasks.
  • The study focuses on language models, so findings may not directly transfer to other AI types like vision or robotics systems.