
New York Times Uses AI for Investigative Journalism

The New York Times has formed a dedicated AI team to help journalists analyze massive datasets, leading to major investigative stories without using AI to write articles.

By Grace O'Malley

Grace O'Malley is a media and technology correspondent for Neurozzio, reporting on how digital transformation and artificial intelligence are reshaping the news and publishing industries.


The New York Times has established a specialized team to integrate artificial intelligence into its newsroom, focusing on analyzing large and complex datasets for investigative reporting. This initiative, led by Editorial Director of A.I. Initiatives Zach Seward, aims to equip journalists with tools to tackle stories that were previously too data-intensive to pursue.

Key Takeaways

  • The New York Times has an eight-person team dedicated to developing AI tools for its newsroom.
  • AI is primarily used for research and analysis of large datasets, not for writing articles.
  • The team built an internal tool called "Cheat Sheet" to help reporters process complex information.
  • Extensive training is provided to the newsroom, with a strong emphasis on caution and fact-verification.

A Dedicated Team for AI Initiatives

In December 2023, The New York Times created the role of editorial director of A.I. initiatives, appointing Zach Seward to lead the effort. This move reflects a broader trend among media companies to explore how AI can provide a competitive edge to reporters.

Seward's team consists of eight professionals, including four engineers, a product designer, and two editors. Their primary mission is to develop and implement AI solutions that can assist journalists with complex research tasks.

Speaking at the Digiday Publishing Summit, Seward stated that using AI for research and investigations is "by far the biggest use of our resources and I think the biggest opportunity right now when it comes to AI in media."

Processing Massive Datasets for Investigations

The core function of the AI team is to help reporters manage and extract insights from enormous volumes of information. The team often works on a specific project with a journalist and then uses that experience to create a repeatable process or tool for the entire newsroom.

What is Semantic Search?

Unlike a standard keyword search (like Ctrl+F), semantic search understands the context and concepts within a text. It allows users to search for ideas or topics, even if the exact keywords are not present. Seward described it as "vibes-based searching," which is highly effective for analyzing transcripts and documents where subjects are discussed indirectly.
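
Seward did not detail the underlying technology, but the idea can be illustrated with off-the-shelf tools. The sketch below uses the open-source sentence-transformers library (an assumption for illustration, not the Times' actual toolchain) to rank passages by conceptual similarity to a query rather than by keyword overlap:

```python
# A minimal sketch of semantic search using the open-source
# sentence-transformers library -- an assumption for illustration,
# not the Times' actual toolchain.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# None of these passages contains the word "secret", but one of them
# clearly discusses the concept.
passages = [
    "We should move this discussion to a channel with fewer people.",
    "The quarterly budget review is scheduled for Friday.",
    "Make sure nothing we talk about on this call leaves the room.",
]
query = "attempts to keep conversations secret"

passage_emb = model.encode(passages, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Cosine similarity between embeddings ranks passages by meaning,
# which a Ctrl+F search for "secret" would miss entirely.
scores = util.cos_sim(query_emb, passage_emb)[0]
for score, text in sorted(zip(scores.tolist(), passages), reverse=True):
    print(f"{score:.3f}  {text}")
```

A keyword search for "secret" would return nothing here, while the embedding search should surface the third passage first; that is the "vibes-based" behavior Seward describes.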

Case Study: Election Interference Recordings

One of the team's most significant projects involved a reporter who obtained 500 hours of leaked Zoom recordings from an election interference group. Manually reviewing that much audio ahead of the election deadline was impossible.

The AI team used tools to transcribe the recordings, which amounted to approximately 5 million words. They then applied semantic search technology to help the reporter identify relevant conversations and concepts within the massive text corpus.
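
The Times has not named its transcription tools. As a rough sketch of that first step, a newsroom could batch-transcribe a folder of recordings (here a hypothetical "recordings" directory) with the open-source Whisper model, keeping timestamps so a later search can point back to the exact moment in the audio:

```python
# A rough sketch of the transcription step using the open-source
# Whisper model; the Times has not named the tools it actually used.
import pathlib

import whisper  # pip install openai-whisper

model = whisper.load_model("base")

corpus = []
for audio_file in sorted(pathlib.Path("recordings").glob("*.mp3")):
    result = model.transcribe(str(audio_file))
    # Keep per-segment timestamps so a later semantic search can point
    # the reporter to the exact moment in a specific recording.
    for seg in result["segments"]:
        corpus.append({
            "file": audio_file.name,
            "start": seg["start"],
            "text": seg["text"],
        })

total_words = sum(len(item["text"].split()) for item in corpus)
print(f"Transcribed {total_words:,} words across {len(corpus)} segments")
```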

"Where AI becomes useful… where you’re looking for topics, concepts, things that are similar. And that’s hugely useful when looking through enormous corpuses of text," Seward explained.

This AI-assisted analysis enabled the publication of a major story before the presidential election, demonstrating the technology's potential for high-impact journalism.

Case Study: Puerto Rico Tax Registrations

In another instance, a reporter had an unstructured list of 10,000 names of individuals who had registered for a tax cut in Puerto Rico. Verifying each name individually would have been impractical.

Seward's team used AI to automate online searches for these names. The system then analyzed the search results for specific markers the reporter was interested in, narrowing the list to the most promising leads for further investigation.

While the AI's analysis was not perfectly accurate, it served as a powerful filtering tool, allowing the reporter to focus their efforts on the most relevant individuals.
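
The article does not describe how this pipeline was built. A plausible sketch of the pattern, assuming a hypothetical search_web() helper, made-up marker criteria, and an arbitrary commercial model, is a loop that classifies each name's search results against the reporter's markers:

```python
# Illustrative only: search_web() is a hypothetical placeholder, the
# markers are invented, and the model choice is arbitrary -- the Times
# has not described its implementation.
from openai import OpenAI

client = OpenAI()

def search_web(name: str) -> str:
    """Placeholder: return concatenated search-result snippets for a name."""
    raise NotImplementedError("plug in a real search API here")

def looks_promising(name: str, markers: str) -> bool:
    """Ask an LLM whether a name's search results mention the markers."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # arbitrary model choice for the sketch
        messages=[{
            "role": "user",
            "content": (
                f"Do these search results about {name} mention any of the "
                f"following: {markers}? Answer YES or NO.\n\n{search_web(name)}"
            ),
        }],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

names = ["Jane Doe", "John Roe"]  # stand-ins for the 10,000-name list
markers = "recent relocation to Puerto Rico"  # hypothetical criteria
leads = [n for n in names if looks_promising(n, markers)]
```

The output of such a loop is a shortlist for human verification, not a finding, which matches the filtering role described above.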

Developing In-House Tools for Reporters

The experiences from these projects led to the development of an internal, spreadsheet-based AI tool named "Cheat Sheet." This tool allows reporters, with guidance from the AI team, to apply various large language models (LLMs) to their own datasets.

According to Seward, Cheat Sheet is now in use by several dozen reporters across the newsroom. While he did not specify all the external technologies used, he confirmed The Times utilizes a mix of commercial AI providers and open-source models.
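
Cheat Sheet's design has not been published, but the spreadsheet framing suggests a familiar pattern: apply one prompt to every row of a dataset and write the answers into a new column. A minimal sketch of that pattern, assuming a hypothetical documents.csv input with a "text" column and an arbitrary model:

```python
# In the spirit of a spreadsheet-plus-LLM workflow; Cheat Sheet itself
# is internal to the Times and its design has not been published.
import pandas as pd
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Send one prompt to an LLM and return the text of the reply."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # arbitrary model choice for the sketch
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# documents.csv is a hypothetical input: one row per document,
# with the raw text in a "text" column.
df = pd.read_csv("documents.csv")

# Apply one prompt template to every row and write the answers into
# a new column, like filling a formula down a spreadsheet.
df["summary"] = df["text"].apply(
    lambda t: ask(f"In one sentence, what is this document about?\n\n{t}")
)
df.to_csv("documents_annotated.csv", index=False)
```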

The Team's Approach

Seward described his team's strategy as taking on individual reporting challenges involving "knotty, huge, messy data sets" with an "immediate deadline," but "always with an eye toward building up tooling that will make that repeatable in the future."

Training and Guidelines for the Newsroom

A key part of the initiative is education. Seward's team maintains constant communication with journalists to understand their needs and provide training on how to use AI responsibly.

To date, the team has conducted training sessions for 1,700 of the 2,000 people in The New York Times newsroom. An open Slack channel also serves as a forum for reporters to ask questions and share ideas on using AI technology.

Seward noted that AI is not used to write articles. However, reporters are permitted to use it for drafting ancillary copy, such as headlines or SEO text, based on already published articles.

A Cautious Approach to AI Adoption

Despite the potential benefits, Seward and his team promote a message of caution. He consistently reminds editorial staff to be skeptical of AI-generated information.

"Never trust output from an LLM. Treat it… with the same suspicion you would a source you just met and you don’t know if you could trust," Seward advises his colleagues.

The team actively addresses skepticism among journalists by acknowledging their concerns about the technology's risks, including legal and editorial implications. Their goal is not to be "AI boosters" but to demonstrate how these tools can provide a tangible competitive advantage in reporting.

Seward's biggest concern is a potential error in a story that could be attributed to AI. He clarified the organization's stance on accountability: "We would never attribute an error to AI, meaning it’s always on us." He added, "I would 100% feel responsible" if such an incident occurred.