Sanoma tried to build an AI tool. It ended up rebuilding its workflow

At Sanoma Media Finland, efforts to introduce AI into editorial workflows ran into a more basic constraint: how journalists handled phone interviews.

At our recent Frankfurt AI Forum, Pauliina Toivanen, Development Manager, opened with an object that captured the starting point — a USB cable. In some parts of the newsroom, it remained the primary way to transfer recordings from devices to computers, informally known as the “miracle wire” because of how often journalists relied on it.

The friction mattered at scale. Sanoma has around 450 journalists, with roughly 200 working in news and producing hundreds of interviews each week. “Imagine using that tool,” Toivanen said.

The initial ambition was straightforward: turn a phone call into a draft article that journalists could begin working on. The newsroom was already using AI transcription tools, and extending that into a broader workflow seemed like a natural progression.

Early testing in Sanoma’s daily Helsingin Sanomat, however, surfaced a different problem. There was no single way to conduct or record a phone interview. Some journalists used recorders and manual transfers, others used tools such as Elisa Ring, and some did not record at all.

“This was not something that the journalists were doing wrong,” Toivanen said. “It was just something that has evolved, prone to errors and inconsistent in quality.”

The lack of consistency made the system difficult to scale. “Our biggest problem was not the AI. It was everything that happened before it. If there is no standard input, there is no scalable AI.”

That realisation shifted the focus from automation to process.

“You cannot build reliable automation on top of variation,” Toivanen said.

Rather than introducing a new tool, the team expanded one already in use: Elisa Ring, a mobile service that records phone interviews and automatically delivers the audio to the journalist’s email. The aim was to remove extra steps — no separate devices, no manual transfers — and create a consistent entry point.


The change was deliberately simple. Journalists start the recording, and the system handles the rest.

Adoption followed. “The journalists were really pleased to have a simple tool,” Toivanen said, noting that many journalists were “fed up with their old system.”

Defining “good” proved harder than building AI

The team then moved to the AI layer, developed as part of the WAN-IFRA GAMI incubator programme with technology partner Limecraft and a pilot group at Helsingin Sanomat.

The system operates as a pipeline: transcription, summarisation, and draft article generation, each stage feeding into the next and guided by prompts and editorial rules.
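The staged shape Toivanen describes can be sketched in a few lines. This is an illustrative outline only, not Sanoma's or Limecraft's actual system: the function names, the stub implementations, and the editorial-rules dictionary are all assumptions made to show how each stage feeds the next.

```python
# Illustrative sketch of a transcription -> summarisation -> draft pipeline,
# with editorial rules passed into each generation stage. All names and the
# rules format are hypothetical; real stages would call speech-to-text and
# LLM services.

def transcribe(audio_path: str) -> str:
    # Stand-in for a speech-to-text call; returns a placeholder transcript.
    return f"transcript of {audio_path}"

def summarise(transcript: str, rules: dict) -> str:
    # Stand-in for a prompted summarisation step guided by editorial rules.
    return f"summary ({rules['style']}): {transcript}"

def draft_article(summary: str, rules: dict) -> str:
    # Stand-in for draft generation; the journalist edits whatever comes out.
    return f"DRAFT [max {rules['max_words']} words]: {summary}"

def run_pipeline(audio_path: str, rules: dict) -> str:
    """Each stage consumes the previous stage's output."""
    transcript = transcribe(audio_path)
    summary = summarise(transcript, rules)
    return draft_article(summary, rules)

print(run_pipeline("interview.mp3", {"style": "news", "max_words": 400}))
```

The point of the shape is that a failure or quality problem at any stage propagates downstream, which is why the standardised recording input mattered so much.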

But before building, the team had to answer a more difficult question: what constitutes a “good” output.

For transcription, this meant deciding what level of accuracy was acceptable — whether “good” depended on speaker identification, word error rate, or linguistic precision. In Finnish, where small errors can alter meaning, these distinctions were not trivial.
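Word error rate, one of the candidate measures of "good" mentioned above, has a standard definition: the minimum number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. A minimal sketch (not tied to any particular transcription tool):

```python
# Word error rate (WER) via Levenshtein edit distance over words:
# WER = (substitutions + deletions + insertions) / reference length.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quick brwn fox"))  # 0.25
```

A single wrong word in a four-word reference already gives a 25% error rate, which illustrates why the team still had to decide what *level* of WER was acceptable, and why WER alone says nothing about speaker identification or meaning-altering errors in Finnish.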

“Defining good was actually even harder than building the AI tool,” Toivanen said.

The same applied to summaries and draft articles. In the pilot, transcripts and summaries proved useful as starting points, but the generated drafts often required correction. The system could select the wrong quotes, misorder information, or include inaccuracies. Hallucinations were also present.

“The AI can sometimes pick the wrong information or prioritise the information in the wrong order,” she said. “Over time, the definition of ‘good’ became tied to practical use. What defines the good is the amount of work journalists need to do after the AI step.”


Where the journalist decides

The final layer of the workflow remains editorial.

Journalists decide what to record, including whether sensitive conversations — such as those involving mental health or other private matters — should be captured at all. They determine what to transcribe, how to interpret outputs, and what ultimately gets published.

“There is always a journalist who decides,” Toivanen said. “They are the ones who know what matters in the story.”

The system is designed to support, not replace, that role. In the pilot, journalists described the outputs as useful when they could be trusted, particularly as a starting point rather than a finished product.

“If they trust the transcript and if they trust the draft article, it helps them,” Toivanen said. “But it’s a good starting point.”

Lessons from the pilot

The project remains in the pilot phase, but several patterns are clear.

AI did not resolve inefficiencies on its own.

“We did not improve our work by adding AI,” Toivanen said. “We improved our workflow so AI could actually do something useful.”

Standardisation came before automation. “You cannot automate the variation.”

Input quality defined output quality. Defining what “good” looks like proved as difficult as building the technical system.

These lessons extend beyond phone interviews. “If there is not a standard process, everything will be chaos,” Toivanen said.
