Researchers Find Chatbots Struggle With Truth vs. Belief

A new study published in Nature Machine Intelligence warns that leading artificial intelligence chatbots, including ChatGPT, Claude, DeepSeek, and Gemini, continue to struggle to distinguish factual information from beliefs. Conducted by Stanford University researchers, the study evaluated 24 large language models using 13,000 questions designed to test whether they could separate fact from fiction and accurately identify false beliefs.

The findings suggest that while AI has rapidly improved, many models still rely on pattern recognition rather than a deeper understanding of truth. According to the paper, “most models lack a robust understanding of the factive nature of knowledge,” raising concerns as AI tools are increasingly adopted across high-stakes industries.

Major Risks in Law, Medicine, and Public Information

Researchers warned that AI confusion between fact and opinion could have significant consequences in areas such as healthcare, law, and journalism. Misjudgments, they say, could lead to incorrect diagnoses, distorted legal decisions, or widespread misinformation.

More recent models performed better, with systems released after May 2024 — including GPT-4o — achieving accuracy rates above 91% on factual tests. Older models scored between 71.5% and 84.8%, highlighting progress but underscoring the continued risk of misinformation.

The paper also cited real-world examples of AI errors, including a viral case in which a chatbot misidentified former UK prime ministers and fabricated dates. Researchers and experts, including Pablo Haya Coll of the Autonomous University of Madrid, say AI systems may need more cautious response strategies, even if doing so limits creativity.

Because 77% of U.S. ChatGPT users treat AI like a search engine, researchers argue that improving accuracy and safety is critical as more people rely on AI for information.