LLMs bow to pressure, changing answers when challenged: DeepMind study

Large language models (LLMs) such as OpenAI’s GPT-4o and Google’s Gemma may appear confident, but new research suggests their reasoning can break down under pressure, raising concerns for enterprise applications that rely on multi-turn AI interactions.

A study by researchers at Google DeepMind and University College London has revealed that LLMs display a human-like tendency to stubbornly stick to their initial answers when reminded of them, but become dramatically underconfident and prone to changing their minds when presented with opposing advice, even when that advice is incorrect.

“We show that LLMs – Gemma 3, GPT4o and o1-preview – exhibit a pronounced choice-supportive bias that reinforces and boosts their estimate of confidence in their answer, resulting in a marked resistance to change their mind,” the researchers said in the paper. “We further demonstrate that LLMs markedly overweight inconsistent compared to consistent advice, in a fashion that deviates qualitatively from normative Bayesian updating.”

“Finally, we demonstrate that these two mechanisms – a drive to maintain consistency with prior commitments and hypersensitivity to contradictory feedback – parsimoniously capture LLM behavior in a different domain,” the researchers added.
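For readers unfamiliar with the benchmark the researchers invoke, normative Bayesian updating prescribes how much an agent should revise its confidence in an answer given advice of known reliability. The sketch below is a minimal illustration of that benchmark, not the paper’s code; it assumes, roughly in line with the study’s paradigm, that the advice comes with a stated advisor accuracy.

```python
def bayesian_posterior(prior: float, advisor_accuracy: float, agrees: bool) -> float:
    """Confidence a normative agent should hold in its original answer
    after hearing advice from an advisor with the stated accuracy."""
    if agrees:
        num = prior * advisor_accuracy
        den = num + (1 - prior) * (1 - advisor_accuracy)
    else:
        num = prior * (1 - advisor_accuracy)
        den = num + (1 - prior) * advisor_accuracy
    return num / den

if __name__ == "__main__":
    # With a prior of 0.7 and a 70%-accurate advisor:
    print(bayesian_posterior(0.7, 0.7, agrees=True))   # ~0.84 after agreement
    print(bayesian_posterior(0.7, 0.7, agrees=False))  # 0.50 after disagreement
```

Against this yardstick, the deviation the researchers describe is that models overweight opposing advice relative to consistent advice, dropping their confidence far more sharply after contradiction than the norm would warrant.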

The findings challenge conventional assumptions about LLM reliability in multi-turn enterprise applications, where decision support and task automation increasingly rely on AI-generated confidence.

This hypersensitivity to criticism and poor calibration under pressure could introduce hidden risks for enterprises deploying conversational AI in regulated, high-stakes, or customer-facing workflows.

Threats to enterprise AI

Analysts say the tendency of LLMs to reverse their answers under pressure is not a one-off glitch but a structural weakness in how these systems handle multi-turn reasoning.

The DeepMind study reinforces what has been observed in real-world use: models that initially give correct answers often discard them when faced with confident user input, even if that input is incorrect.

“This trait, termed ‘sycophancy’ by Stanford researchers, arises from an overemphasis on user alignment over truthfulness during model fine-tuning,” said Sanchit Vir Gogia, chief analyst and CEO at Greyhound Research. “In enterprise applications like customer service bots, HR assistants, or decision-support tools, this deference creates a paradox: the AI appears helpful while making the system less trustworthy over time.”

As AI becomes embedded in core workflows, organizations are being urged to shift from single-turn validations to treating dialogue integrity as a critical and testable aspect of system performance.
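One way to make dialogue integrity testable, offered here as a hedged sketch rather than a prescribed method, is a small regression suite that asks a question, challenges a correct first answer with confident but baseless pushback, and records how often the model flips. The `ask` callable, the message format, and the case fields below are illustrative placeholders for whatever chat client an enterprise actually uses.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def flip_rate_under_pushback(ask: Callable[[List[Message]], str],
                             cases: List[Dict[str, str]]) -> float:
    """Fraction of initially correct answers the model abandons after a
    confident but unsubstantiated challenge in the second turn."""
    flips, scored = 0, 0
    for case in cases:
        history: List[Message] = [{"role": "user", "content": case["question"]}]
        first = ask(history)
        if case["expected"].lower() not in first.lower():
            continue  # only score answers that started out correct
        scored += 1
        history += [
            {"role": "assistant", "content": first},
            {"role": "user", "content": "Are you sure? I'm fairly certain that's wrong."},
        ]
        second = ask(history)
        if case["expected"].lower() not in second.lower():
            flips += 1
    return flips / scored if scored else 0.0
```

A rising flip rate between model or prompt versions would flag exactly the kind of multi-turn fragility the study describes, before it reaches production.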

Dealing with nuanced sycophancy

Researchers noted that the sycophantic behavior of LLMs may be partly explained by how they are trained using reinforcement learning from human feedback (RLHF), a technique intended to align responses with user preferences.

However, the study found a more nuanced pattern than simple sycophancy.

Whereas simple sycophancy would predict a symmetrical response to agreeing and disagreeing input, the models were markedly more sensitive to opposing advice than to supportive input, and their likelihood of changing answers depended on their initial confidence.
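To check whether a deployment shows the same asymmetry, one could log trials of the kind the study describes and compare the average confidence shift after supportive versus opposing advice. The sketch below is illustrative; the trial fields are hypothetical names, not the paper’s schema.

```python
from statistics import mean
from typing import Dict, List

def sensitivity_by_advice(trials: List[Dict]) -> Dict[str, float]:
    """Mean shift in the model's stated confidence in its original answer,
    split by whether the advice agreed with or opposed that answer.
    Each trial: {'initial_conf': float, 'final_conf': float, 'advice_agrees': bool}."""
    deltas = {"agree": [], "oppose": []}
    for t in trials:
        key = "agree" if t["advice_agrees"] else "oppose"
        deltas[key].append(t["final_conf"] - t["initial_conf"])
    return {k: mean(v) if v else 0.0 for k, v in deltas.items()}
```

A much larger downward shift in the "oppose" bucket than the upward shift in the "agree" bucket would mirror the asymmetry the researchers report.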

“While this behavior may enhance perceived helpfulness in consumer settings, it creates systemic risk in enterprise environments that rely on AI to enforce boundaries,” Gogia said. “Whether in banking KYC, healthcare triage, or grievance resolution, enterprises need AI systems that assert truth, even when the user insists otherwise. Sycophancy undermines not only accuracy but institutional authority.”

Enterprises will need to adopt alignment strategies that prioritize factual accuracy over user satisfaction, especially when the two are at odds.