"Can A Computer Algorithm Do The Job Of A Historian?" is the second article in a BuzzFeed series written with help from Columbia University's History Lab. This team of historians and data scientists is developing a "Declassification Engine" that turns documents into data and mines it for insights about the history and future of official secrecy. The stories draw from the lab's searchable database of over 2 million declassified government documents.
How do we decide what counts as history? Well, there's the first draft, journalism — the stories the media tells about the events of the day. And then there are the endless subsequent iterations, mined from primary sources and dusted off and polished by historians into arguments and narratives that shape our understanding of the world.
Then there's a third option, one that is made possible by the deluge of electronic records kept in the second half of the 20th century, and tools of modern data science: automatic event detection. That's the idea that software can read historical data to try to pick out patterns — discrete events that stick out from an ocean of data as significant.
In the early 1970s, the State Department began keeping electronic records of the thousands of cables its employees sent about American interests throughout the world. Researchers at Columbia's Declassification Engine project believe it's possible to automatically distinguish periods of increased activity in these cables that correspond to historically important events.
Three Columbia University statisticians — Rahul Mazumder, Yuanjun Gao, and Jonathan Goetz — developed an advanced statistical model that allowed them to sift through 1.7 million diplomatic cables from the years 1973–1977, including 330,000-odd cables in which only the metadata has been declassified. The model, with the help of the 2,600 cores in Columbia's High Performance Computer Cluster, isolated 500 "bursts" — periods of heightened activity where more cables were being sent. And from those 500, the team investigated the top 10, what you might call the most active areas of American diplomacy in a four-year span that included the end of the Vietnam War, roiling conflict in the Middle East, and the OPEC oil embargo.
1. January 1977: Jimmy Carter takes office
2. November 1977: Anwar Sadat visits Israel
3. May 1977: Unknown
The third most pronounced burst, from February to June 1977, doesn't obviously correspond to any major world event, and indeed, many of the cables in this burst (more than 400,000) are tagged "US," suggesting that the peak has to do with the way domestic policies influence foreign relations. The peak day of the burst, for example, came on the day the government announced a policy curtailing weapons sales to foreign nations, with a notable exemption: Israel.
4. April 1975: The fall of South Vietnam
5. May 1977: Vietnamese refugee crisis
The fall of South Vietnam precipitated the flight of hundreds of thousands (and eventually millions) of Vietnamese from communist rule. These so-called "Vietnamese boat people" were likely the subject of many of the cables in this burst, which corresponds to a massive uptick in the number of refugees.
6. September 1977: U.S. relinquishes control of the Panama Canal
7. October 1973: The Yom Kippur War
8. July 1974: Turkey invades Cyprus
9. November 1975: Portugal leaves Angola
10. November 1977: U.N. imposes an arms embargo against South Africa
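For readers curious what "burst" detection can look like in practice, here is a minimal sketch in Python. It is not the Columbia team's model, which the article does not describe in detail. It assumes a much simpler rule: a day counts as unusually busy when its cable count sits well above the median for the whole period, consecutive busy days are merged into one burst, and bursts are ranked by how many cables they contain beyond the usual level. The function name `find_bursts`, the `threshold` parameter, and the sample numbers are all made up for illustration.

```python
from statistics import median


def find_bursts(daily_counts, threshold=3.0):
    """Return (start_day, end_day, excess) tuples, largest excess first.

    Illustrative rule only (an assumption, not the History Lab's method):
    a day is "busy" if its count exceeds the median by more than
    `threshold` times the median absolute deviation (MAD).
    """
    base = median(daily_counts)
    deviations = [abs(count - base) for count in daily_counts]
    # Fall back to 1.0 so a perfectly flat series does not give a zero scale.
    mad = median(deviations) or 1.0
    cutoff = base + threshold * mad

    # Merge runs of consecutive busy days into (start, end) spans.
    spans = []
    start = None
    for day, count in enumerate(daily_counts):
        if count > cutoff:
            if start is None:
                start = day
        elif start is not None:
            spans.append((start, day - 1))
            start = None
    if start is not None:
        spans.append((start, len(daily_counts) - 1))

    # Score each burst by the number of cables above the usual daily level.
    scored = [
        (s, e, sum(daily_counts[s:e + 1]) - base * (e - s + 1))
        for s, e in spans
    ]
    return sorted(scored, key=lambda burst: burst[2], reverse=True)


# Toy data: made-up daily cable counts, not real State Department figures.
sample_counts = [10, 12, 11, 10, 40, 45, 12, 11, 10, 30, 11, 10]
top_bursts = find_bursts(sample_counts)
```

On the toy data, the two-day spike ranks ahead of the one-day spike because it carries more cables above the baseline. A real analysis of 1.7 million cables would also need to handle long-term growth in traffic, weekly rhythms, and topic tags, none of which this sketch attempts.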
So how did the Declassification Engine do? The head of the History Lab team, Matthew Connelly, asked another historian — Daniel Sargent, who recently published a history of American foreign policy in the '70s — to rank his own 10 most important events from 1973–1977. While there was broad agreement between the computer and the historian on the importance of Middle Eastern politics in the period, Sargent ranked highly some events, including China's post-Mao transition, that were hardly picked up by the Declassification Engine, and became obviously significant only in hindsight.
In the History Lab blog, Sargent writes, "Comparing and contrasting my top ten with the results that the History Lab generated, I feel a certain relief. For all the differences, which are substantive, our conclusions are not so far removed as I'd feared." You can read the rest of his list there.
Joe Bernstein is a senior technology reporter for BuzzFeed News and is based in New York. Bernstein reports on and writes about the gaming industry and web culture.