Stop guessing, start measuring: USA Today on AI in the newsroom
Jessica Davis, Vice President of AI Product at USA Today, came with a practical framework and a case study from her own newsroom.
Her argument is straightforward: evaluations – not goodwill or a journalist’s thumbs up – are the foundation any newsroom needs before it can trust AI at scale.
Davis was speaking at our ongoing World News Media Congress in Marseille, where she previewed her Capstone research at the City University of New York (CUNY).
What she brought to the stage was less a polished product announcement than a frank, practical account of what it actually takes to move AI from experimentation into production.
The ceiling on human-in-the-loop
Most newsrooms today are working with what Davis calls assistive AI: tools that talk back, generate text and surface information. You prompt the tool, you get an output, you decide what to do with it. It’s useful, but it puts all the cognitive weight on the journalist.
The direction of travel is somewhere more consequential: autonomous, agentic AI that doesn’t wait to be prompted, but takes action on a goal.
“Assistive AI is like the mouth – it can talk to you, but you have to copy and paste and take the action yourself,” Davis said. “Agentic AI is more like the hands. You give it a goal, and it can take action for you.”
The problem is that the current model of a human reviewing every output doesn’t scale. Without a smarter approach to oversight, the human in the loop becomes a bottleneck and eventually burns out.
“We have trusted products. Trust is a key asset to our organisation. When we’re working with AI systems, they can be wrong, and they can be confidently wrong. And sometimes it’s subtle when they fail.”
The public records lesson
The most instructive part of Davis’s talk was a case study from USA Today’s own newsroom. The organisation built an agent to help journalists navigate public records requests, a workflow riddled with complexity, since laws vary across all 50 states and officials can reject requests that cite the wrong statute.
The team spent months building and testing. “The agent kept hallucinating. It kept getting things slightly wrong. And slightly wrong, in a public records context, means the request fails,” Davis noted.
Then they introduced evaluations – a structured method for defining exactly what success looked like and measuring against it.
“We moved from months to being able to ship to production within a week. And from there, we shipped multiple features within days,” she added.
What good evaluation actually looks like
Davis was careful to distinguish between what’s possible at scale – automated AI scoring AI – and what’s accessible to most newsrooms right now.
“You don’t need a data science team to start. You need a definition of success that’s specific enough to measure,” she pointed out.
The instinct in many newsrooms is to ask journalists for feedback via a thumbs up or thumbs down. Davis is clear that this doesn’t work. At the other extreme, detailed feedback forms are a non-starter too – journalists are too busy for that.
The solution was structured criteria developed with journalists.
“So, we brought a group of journalists into the process early, worked through what a successful public records request actually required, and built those requirements into the evaluation framework,” said Davis.
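A minimal sketch of what such structured criteria could look like in code follows. It is not USA Today’s implementation: the criteria names, the crude string checks and the placeholder statute strings are all assumptions made for illustration, and real checks would be defined with the journalists who file the requests.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical lookup of the citation each state's request must contain.
# The state codes and statute strings are placeholders, not real law.
EXPECTED_STATUTE: Dict[str, str] = {
    "AA": "Example Records Act 1-101",
    "BB": "Example Open Records Code 2-202",
}


@dataclass
class Draft:
    """One agent-generated public records request."""
    state: str
    agency: str
    text: str


def cites_correct_statute(draft: Draft) -> bool:
    statute = EXPECTED_STATUTE.get(draft.state)
    return statute is not None and statute in draft.text


def names_agency(draft: Draft) -> bool:
    return draft.agency.lower() in draft.text.lower()


def states_what_is_requested(draft: Draft) -> bool:
    # Crude stand-in for a real check of the request's substance.
    return "i request" in draft.text.lower()


# Each criterion is a named yes/no question about a draft.
CRITERIA: Dict[str, Callable[[Draft], bool]] = {
    "cites_correct_statute": cites_correct_statute,
    "names_agency": names_agency,
    "states_what_is_requested": states_what_is_requested,
}


def evaluate(draft: Draft) -> Dict[str, bool]:
    """Score one draft against every criterion."""
    return {name: check(draft) for name, check in CRITERIA.items()}


def pass_rates(drafts: List[Draft]) -> Dict[str, float]:
    """Fraction of drafts passing each criterion across a test set."""
    if not drafts:
        return {}
    results = [evaluate(d) for d in drafts]
    return {
        name: sum(r[name] for r in results) / len(results)
        for name in CRITERIA
    }


good = Draft("AA", "City Clerk",
             "To the City Clerk: under Example Records Act 1-101, "
             "I request copies of all 2023 vendor contracts.")
bad = Draft("BB", "City Clerk",
            "To the City Clerk: under Example Records Act 1-101, "
            "I request copies of all 2023 vendor contracts.")

print(evaluate(bad))
print(pass_rates([good, bad]))
```

The second draft cites the wrong state’s statute, so it fails one criterion while passing the others. That is the kind of "slightly wrong" output a thumbs up or thumbs down would not pin down.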
(L-R) Pundi S. Sriram (Chief Product Officer & Business Head, STEP from The Hindu Group, India), Jessica Davis (Vice President, AI Product, USA TODAY Co., USA), Jan Helin (Chief Product Officer, Bonnier News, Sweden), and Kevin Anderson (Director of the Digital Revenue Network, WAN-IFRA, UK)
A new governance model
For Davis, evaluations are also the new governance model – dynamic, requiring continuous monitoring, and still something she is figuring out at scale with her data science team.
“This takes a black box system and helps journalists and product managers see: this is how it works, this is when it fails, this is what it needs to do to be successful in my workflow,” she said.
That, she said, creates three things: trust, speed, and clarity about where AI belongs in a workflow and where it doesn’t.
“We can be strategic about when we need to intervene and when, through the data, we can actually trust the system to do the job we’ve given it,” she added.
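One way to picture "through the data" is a simple rule that routes work to a human whenever recent evaluation results fall below an agreed bar. The sketch below is an illustration only: the function name, the 95% threshold and the 20-sample minimum are assumptions, not figures Davis gave.

```python
from typing import List


def needs_human_review(recent_results: List[bool],
                       min_pass_rate: float = 0.95,
                       min_samples: int = 20) -> bool:
    """Decide whether a human should keep reviewing this task's outputs.

    recent_results holds pass/fail outcomes from recent evaluations.
    The thresholds are hypothetical defaults a newsroom would set itself.
    """
    # Too little evidence: keep the human in the loop.
    if len(recent_results) < min_samples:
        return True
    pass_rate = sum(recent_results) / len(recent_results)
    return pass_rate < min_pass_rate


print(needs_human_review([True] * 5))                 # too few samples
print(needs_human_review([True] * 20))                # consistently passing
print(needs_human_review([True] * 18 + [False] * 2))  # 90% pass rate
```

Defaulting to human review when data is thin keeps the rule conservative: the system has to earn reduced oversight rather than receive it by default.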
