The Hindu: How AI Powers Data Journalism & Investigations at Scale

In recent months, The Hindu, a leading Indian newspaper, has significantly expanded its data journalism capabilities through the integration of large language models (LLMs). This isn’t about automating writing, but about accelerating investigations, processing vast datasets and building interactive tools with greater efficiency, according to Srinivasan Ramani, Deputy National Editor and Senior Associate Editor at The Hindu.
One major undertaking involved analyzing data from India’s Special Intensive Revision (SIR) of voter rolls. Authorities released records detailing voter deletions and the stated reasons. The team processed approximately 22 million records across three states – Bihar, Tamil Nadu, and West Bengal – which were initially provided as image-based PDFs in Hindi.
The workflow involved using optical character recognition (OCR) to convert the images into machine-readable text, translating the text into English, and storing the data in databases. Ramani’s team then utilized LLMs to generate SQL queries using natural language prompts, eliminating the need for manual database coding. As reported by WAN-IFRA, this process revealed patterns, such as a disproportionate number of women being deleted from voter rolls in Bihar despite higher male out-migration, and inconsistencies in the reasons cited for deletions.
These findings were discussed in Parliament and prompted some corrections to voter rolls in Bihar following public scrutiny and ground reporting.
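The query-generation step described above can be illustrated with a minimal sketch. It assumes a hypothetical table named deletions with state, gender, and reason columns; the article does not describe The Hindu's actual schema, database engine, or prompts, and the rows below are made-up placeholders rather than real voter data. The SQL is the kind of statement an LLM might return for a prompt such as "count deleted voters by gender for one state".

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical 'deletions' table.

    The schema and the rows are illustrative assumptions, not real SIR data.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE deletions (state TEXT, gender TEXT, reason TEXT)")
    placeholder_rows = [
        ("Bihar", "F", "shifted"),
        ("Bihar", "F", "deceased"),
        ("Bihar", "M", "shifted"),
        ("Tamil Nadu", "M", "duplicate"),
    ]
    conn.executemany("INSERT INTO deletions VALUES (?, ?, ?)", placeholder_rows)
    return conn


def deletions_by_gender(conn, state):
    """Return a {gender: count} mapping of deleted voters for one state."""
    rows = conn.execute(
        "SELECT gender, COUNT(*) FROM deletions "
        "WHERE state = ? GROUP BY gender ORDER BY gender",
        (state,),  # parameter binding keeps the state name out of the SQL text
    ).fetchall()
    return dict(rows)
```

Because the output of a query like this can be checked against a hand count on a small sample, it is the kind of structured task where a generated statement can be verified before it is run on the full dataset.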
The Hindu also employed LLMs to build interactive maps for the 2019 and 2024 general elections, allowing users to filter results by region, state, and other criteria. Remarkably, Ramani stated that he did not write a single line of code for these applications. According to Archyde, the entire application was built over two weeks using prompts in ChatGPT, Gemini, and Claude.
The team broke down the interface into components and used the models to generate annotated code for each, enabling verification. This significantly reduced the time required compared to previous methods that relied on in-house engineers or volunteers.
Beyond digital analysis, The Hindu used AI-assisted guidance to assemble low-cost Arduino-based devices to measure heat stress experienced by workers in Chennai. These devices recorded temperature and humidity every 10 seconds, providing data for a cook, a fisherman, an industrial worker, and an autorickshaw driver. The results revealed significant variations in heat index exposure, peaking at 69°C (156.2°F) in one instance.
Following publication of these findings, the Tamil Nadu government announced a heat management plan and explored using similar devices for further study.
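For readers curious how a heat index is derived from temperature and humidity readings like those the devices logged, here is a minimal sketch. It assumes the widely used Rothfusz regression published by the US National Weather Service; the article does not say which formula The Hindu applied, so this is an illustration and not the team's method. The function names are hypothetical, and the sketch omits the low- and high-humidity adjustments the NWS applies at the edges of the formula's range.

```python
def c_to_f(temp_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


def heat_index_f(temp_f, rel_humidity):
    """Rothfusz regression for heat index in degrees Fahrenheit.

    temp_f is air temperature in Fahrenheit and rel_humidity is relative
    humidity in percent (0-100). Assumed formula, intended for conditions
    where the resulting heat index is roughly 80 F or higher.
    """
    t, rh = temp_f, rel_humidity
    return (
        -42.379
        + 2.04901523 * t
        + 10.14333127 * rh
        - 0.22475541 * t * rh
        - 0.00683783 * t * t
        - 0.05481717 * rh * rh
        + 0.00122874 * t * t * rh
        + 0.00085282 * t * rh * rh
        - 0.00000199 * t * t * rh * rh
    )


def heat_index_c(temp_c, rel_humidity):
    """Heat index in Celsius for a reading taken in Celsius."""
    return f_to_c(heat_index_f(c_to_f(temp_c), rel_humidity))
```

The same conversion confirms the figure quoted in the article: c_to_f(69) gives 156.2, so a heat index of 69°C corresponds to 156.2°F.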
Ramani emphasizes that AI tools are integrated into an established data journalism pipeline, assisting with tasks like web scraping, document processing, query generation, and front-end development. However, he stresses that human oversight remains crucial. He describes AI as ‘a very sophisticated intern,’ capable of executing tasks precisely but requiring human direction and control.
He cautions against relying on AI for editorial conclusions and argues that the risk of hallucination is lower in structured tasks, such as code or query generation, where outputs can be directly tested.
The Hindu’s data journalism efforts have evolved over the past decade, from visual add-ons to traditional reporting to a dedicated function with data journalists, designers, and editorial coders. A notable past project included an excess deaths analysis during the COVID-19 pandemic, which estimated that official death counts were significantly underreported.
Ramani notes that data-driven reporting is now integrated across all operations, leading to increased subscriptions and engagement. He believes that AI expands the scale at which journalistic judgment can operate, ultimately contributing to a more informed audience.
