Automated journalism is real and happening now. Bill Snitzer's EQBot reports on natural disasters faster than any person could, and the financial stories you read from the AP and Forbes were probably generated automatically; robots write thousands of their reports every quarter.
Still, a range of issues holds automated journalism back, including the fear that any step toward automation will lead to replacing journalists and devaluing content. Frankly, we should be afraid; many recently established technologies have had exactly this effect. But there is nothing "inevitable" (to use a popular phrase) about the way technology takes shape, and I believe we can create a future where automated journalism extends the abilities of journalists rather than replaces them.
The projects above output stories that look a lot like the products of traditional journalism, but they do so at lightning speed and, importantly, they do so relentlessly. This relentlessness brings up a fundamental difference that we should be aware of when building autonomous bots.
There is a feedback loop between journalists and readers that is difficult and dangerous to build into autonomous systems. A journalist can think about how their audience's knowledge of a subject has evolved over time, or about how expectations for an upcoming financial report have been influenced by leaked information. Bots writing wire stories cannot. Of course, when they are being set up, they can be designed to produce the most useful stories possible. But newsrooms should not treat bots as solutions they can set and forget. Part of integrating automated journalism should be a commitment to regularly revisit decisions and incorporate feedback.
Some automated journalism projects will attempt to automate this feedback process. Unfortunately, the things these systems measure and track are invariably imperfect proxies for reality. For the same reason that it is fundamentally difficult to quantify the influence of a given news story, it is difficult to build systems that wisely incorporate feedback. Data about page views and mouse clicks alone will never give you enough context to make informed decisions about what news should be covered. There are huge cultural forces at play that are extremely difficult to capture in software models.
An alternative setup, in which any editorial decisions made by bots are explicitly codified and remain static unless changed by journalists, is laborious and will still be error-prone. But it also presents advantages over traditional journalism. Algorithms have bias, to be sure, but their bias can be studied, and they can be adjusted. For this reason newsrooms should reject proprietary technology, insist on controlling the algorithms they use, and build systems on foundations of open source software.
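To make the idea of explicitly codified, static editorial decisions concrete, here is a minimal sketch of what such a rule set might look like. Nothing here comes from any real newsroom system; the field names and thresholds are invented for illustration. The point is that every decision is a named, human-readable rule that journalists can inspect, debate, and change, rather than a weight buried in an engagement-optimizing model:

```python
# Hypothetical editorial rules for an earthquake-reporting bot.
# Kept in version control, so every change is visible and attributable.
RULES = {
    "min_magnitude": 3.0,       # quakes below this are not considered newsworthy
    "max_depth_km": 100.0,      # very deep quakes rarely cause surface damage
    "require_populated_area": True,
}

def should_publish(quake: dict, rules: dict = RULES) -> bool:
    """Return True if the event passes every codified editorial rule.

    The logic is deliberately static: it never adjusts itself based on
    clicks or page views. Only an edit to `rules` changes its behavior.
    """
    if quake["magnitude"] < rules["min_magnitude"]:
        return False
    if quake["depth_km"] > rules["max_depth_km"]:
        return False
    if rules["require_populated_area"] and not quake["near_population"]:
        return False
    return True

if __name__ == "__main__":
    event = {"magnitude": 4.2, "depth_km": 12.0, "near_population": True}
    print(should_publish(event))
```

Because the rules are plain data rather than learned parameters, their biases can be audited line by line, which is exactly the advantage the open-source argument above depends on.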
This preference for open source systems and algorithms also helps with another guiding principle I'd like to articulate: we should strive to make our tools as useful to individuals and small newsrooms as to big ones. Technological progress can have the unfortunate side effect of consolidating decision making into the hands of a few people with similar perspectives. We should remember that there is strength in diversity, and that data will be more or less valuable in different contexts. News about a spike in political campaign contributions from an individual in a small town might not register nationally, but it could be very important to the residents of that town.
Finally, our automated systems should surface things that are meaningful, not merely measurable. When a metric is easy to track, it can take on significance beyond its actual importance. Because automation is already used to generate things like AP wire stories that are redistributed widely, journalists have to be especially mindful that small changes in what data is presented can have a large impact on our collective perception of what's important to think about.