News summaries from AI chatbots have major accuracy problems
A study from the BBC and EBU found that 45% of responses had significant issues.
When it comes to aggregating the news, ChatGPT and its ilk could use some copy editors.
A far-reaching study from the BBC and the European Broadcasting Union found that current affairs-related summaries from four major AI chatbots were rife with errors. Nearly half (45%) of more than 3,000 responses contained at least one significant issue, including sourcing problems (31%), factual inaccuracy (20%), or a lack of context (14%).
The research spanned 22 public service media organizations across 18 countries and 14 languages. Journalists at each of these orgs prompted ChatGPT, Perplexity, Google Gemini, and Microsoft Copilot for news, then analyzed the responses.
The authors used the findings to create a comprehensive toolkit aimed at boosting AI news integrity for tech companies, media orgs, researchers, and the public.
As more people turn to chatbots for their informational needs, the news outlets wanted to understand what that means for the media ecosystem, as well as the reputational risks they face if their content is being distorted.
“This research conclusively shows that these failings are not isolated incidents,” EBU Media Director and Deputy Director General Jean Philip De Tender said in a statement. “They are systemic, cross-border, and multilingual, and we believe this endangers public trust. When people don’t know what to trust, they end up trusting nothing at all, and that can deter democratic participation.”
“Not isolated incidents”: The new report builds on a similar but smaller undertaking by the BBC in February, which found that 51% of AI answers to questions about news contained errors.
Accuracy levels varied across different chatbots. Gemini performed by far the worst, with 76% of responses containing significant issues—double the error rate of the next highest, Copilot, at 37%. ChatGPT clocked in at 36% and Perplexity at 30%. The bulk of Gemini’s failures seemed to be sourcing issues: 72% of responses had some form of misattribution (including a lack of any sources at all), compared to 24% from ChatGPT and 15% from both Perplexity and Copilot.
What’s wrong: Chatbots sometimes struggled to properly cite laws; Perplexity claimed that surrogacy is illegal in the Czech Republic, for instance, when it is in fact unregulated. And information was often outdated; chatbots struggled to keep up with the latest European leaders and the new pope.
AI assistants had an especially hard time with breaking or complex news stories, even after those stories had stopped developing.
“The data suggests that assistants particularly struggle with fast-moving stories with rapidly changing or updating information (Trump trade war, Myanmar); intricate timelines involving multiple actors (Yemen); detailed information (China exports, Trump trade war, Trump tariffs); or topics that require clear distinction between facts and opinions and proper attribution of claims (Orbán, climate change),” the authors wrote.
Hold the news: In addition to creating misinformation and posing brand risks to news outlets, these accuracy problems may be hindering the adoption of AI chatbots for news consumption. A Pew survey earlier this month found that only 2% of Americans regularly get news from chatbots (7% said “sometimes”). Those numbers stand in contrast to the growing share of Americans who use ChatGPT in general (34% as of this June).
Respondents also seem keenly aware of the factual issues at play. A third of Americans who use chatbots for news said they find it difficult to determine what’s accurate, Pew reported, and about half said they come across news they think is inaccurate.