Three months ago, NPR staged a little experiment in digital journalism, pitting the media company's White House correspondent, "wicked fast reporter" Scott Horsley, against a machine called Wordsmith to see who could write a news story faster. Horsley took seven minutes to compose an article about a Denny's earnings report. Wordsmith needed only two. Nobody seemed all that surprised.
It's true that robots can write the news — and, over time, can even learn to mimic human linguistic flourishes. But unprompted — without being given the discrete, specific task of digesting a press release or an earnings report — can machines learn to suss out good stories? Can they grasp what humans find interesting?
Travis Hoppe, a redditor with a Ph.D. in physics, was fascinated by these questions and their implications for the nature of intelligence and the limits of automation. So for 90 days this summer, Hoppe ran a Turing Test of sorts. A longtime admirer of the subreddit Today I Learned (better known as TIL, an eclectic listing of bizarre bar trivia and factoids of historical significance), he suspected he could create an algorithm that would surface content for new, popular TIL posts.
To teach the bot what redditors found compelling, Hoppe first downloaded Wikipedia in its entirety. He then examined successful, highly upvoted Today I Learned posts that referenced the online encyclopedia. Hoppe trained his machine to recognize the words and phrasings of Wikipedia articles whose content was found in popular TIL posts — as it turns out, largely stories about Nazis, racism, and unsolved grisly murders, according to Hoppe. He then unleashed the machine on all of Wikipedia's content, scouring the database for undiscovered articles it thought would do well on the subreddit, after which he wrote a headline and posted the stories — the only part of the process that wasn't automated.
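Hoppe hasn't published his code here, but the approach described above, learning which words and phrasings distinguish Wikipedia articles that spawned hit TIL posts from ordinary articles, then scoring the rest of the encyclopedia, can be sketched as a simple text classifier. The version below uses smoothed word-level log-odds between a "popular" corpus and a background corpus; the training data, function names, and candidate articles are all invented for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Crude tokenizer for the sketch: lowercase, strip basic punctuation.
    return [w.strip(".,").lower() for w in text.split()]

def train(popular_docs, background_docs):
    """Weight each word by its smoothed log-odds of appearing in
    popular (TIL-worthy) articles versus background articles."""
    pop = Counter(w for d in popular_docs for w in tokenize(d))
    bg = Counter(w for d in background_docs for w in tokenize(d))
    vocab = set(pop) | set(bg)
    n_pop, n_bg = sum(pop.values()), sum(bg.values())
    return {w: math.log((pop[w] + 1) / (n_pop + len(vocab)))
             - math.log((bg[w] + 1) / (n_bg + len(vocab)))
            for w in vocab}

def score(weights, text):
    # Average log-odds over the tokens; unseen words contribute nothing.
    toks = tokenize(text)
    return sum(weights.get(w, 0.0) for w in toks) / max(len(toks), 1)

# Toy corpora (invented, not Hoppe's data): snippets from articles that
# made popular TIL posts vs. random encyclopedia filler.
popular = ["unsolved murder mystery shocked the town",
           "prisoner escaped by faking a grave illness"]
background = ["the village has a population of 312",
              "the river flows north into the lake"]
w = train(popular, background)

# Rank hypothetical unseen articles by predicted TIL appeal.
candidates = ["grisly unsolved murder mystery in the town",
              "the lake has an average depth of 4 metres"]
ranked = sorted(candidates, key=lambda c: score(w, c), reverse=True)
```

At Hoppe's scale the same idea would run over millions of Wikipedia articles with richer features, but the core is just this kind of pattern matching between past hits and fresh candidates.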
Hoppe's machine was shockingly successful. Of the roughly 50 stories it suggested and Hoppe then posted, four made it to the top of TIL, an all-but-insane batting average on so popular a forum. Three of them made it to the front page of Reddit itself. The algorithm's most popular post was about outlandish remarks regarding McDonald's and the people of Japan, attributed to a Japanese businessperson; today it has more than 1,600 comments. The second-most upvoted post told the story of an imprisoned American who won a pardon by secretly eating soap and faking serious illness.
But only Hoppe knew that the savvy redditor wasn't a person at all, but a bot. Once Hoppe revealed that the Reddit account was run by a machine, he was banned from Today I Learned. But, he said, the termination of his experiment was a kind of validation: by generating posts that resonated with the Reddit community, Hoppe's machine passed for human. Until he pulled back the curtain.
"I wanted to see if these posts would be vetted by the community, and if it could pass muster like the content that's on there, like the human redditors," he told BuzzFeed News. "I feel like it's passed the Turing Test."
"I could have posted for the next year and they would have never known," he added, with clear delight.
Hoppe — who also organizes a DC Meetup event called Hack and Tell, where hackers share their stories of data wizardry — sees in his experiment big (if far-off) implications for machine learning within newsrooms and beyond. "All I did with this project is an advanced form of pattern recognition," he said. "Which is cool. But every time you do a Google search, that is essentially some form of pattern recognition." Looking outside of Reddit and Wikipedia and to other data sets, Hoppe added, "There's no reason why you couldn't train this, if you had larger computer power, and more manpower, let this loose on the real world and find new things to write about, or things that aren't new but you could make it new news."
The idea that creativity simply can't be aped by computers is one that's held tightly by humans who fear becoming obsolete. But it's hard to deny technology's impact on news judgment. A precise knowledge of news traffic, lists of most read and most emailed articles, search engine optimization, Twitter numbers, and the Facebook feed all signal to readers and writers what interesting stories look like. By suggesting or digging up what might be compelling information, machines may recognize other invisible patterns, exaggerating existing tendencies or augmenting editorial discretion.
For his part, Hoppe, the man behind the machine, is optimistic about the future of bot-infused newsrooms. "There's lots of people who are doing awesome things with machines and we are going to see a lot more of this," Hoppe said, "not just in Reddit but larger pattern recognition problems that are going to pretty much blow our minds."
Hamza Shaban is a technology policy reporter for BuzzFeed News and is based in Washington, DC.