
Can An Algorithm Do The Job Of A Historian?

The Declassification Engine attempts to determine the most important events in American foreign policy.


"Can A Computer Algorithm Do The Job Of A Historian?" is the second article in a BuzzFeed series written with help from Columbia University's History Lab. This team of historians and data scientists is developing a "Declassification Engine" that turns documents into data and mines it for insights about the history and future of official secrecy. The stories draw from the lab's searchable database of over 2 million declassified government documents.

How do we decide what counts as history? Well, there's the first draft, journalism — the stories the media tells about the events of the day. And then there are the endless subsequent iterations, mined from primary sources and dusted off and polished by historians into arguments and narratives that shape our understanding of the world.

Then there's a third option, one made possible by the deluge of electronic records kept in the second half of the 20th century and by the tools of modern data science: automatic event detection. That's the idea that software can read historical data and pick out patterns, meaning discrete events that stand out from an ocean of data as significant.

In the early 1970s, the State Department began keeping electronic records of the thousands of cables its employees sent about American interests throughout the world. Researchers at Columbia's Declassification Engine project believe it's possible to automatically distinguish periods of increased activity in these cables that correspond to historically important events.

Three Columbia University statisticians — Rahul Mazumder, Yuanjun Gao, and Jonathan Goetz — developed an advanced statistical model that allowed them to sift through 1.7 million diplomatic cables from the years 1973–1977, including 330,000-odd cables in which only the metadata has been declassified. The model, with the help of the 2,600 cores in Columbia's High Performance Computer Cluster, isolated 500 "bursts" — periods of heightened activity where more cables were being sent. And from those 500, the team investigated the top 10, what you might call the most active areas of American diplomacy in a four-year span that included the end of the Vietnam War, roiling conflict in the Middle East, and the OPEC oil embargo.
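To make the idea of a "burst" concrete, here is a minimal sketch of one simple way to flag periods of heightened activity in a series of daily cable counts. It is not the Columbia team's statistical model, which the article does not describe in detail. The function name `find_bursts`, the trailing-window baseline, the threshold of two standard deviations, and the sample counts are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def find_bursts(daily_counts, window=7, threshold=2.0):
    """Return bursts as (start_index, end_index, excess), largest excess first.

    Illustrative only: a day counts as "hot" when its cable count exceeds the
    mean of the previous `window` days by more than `threshold` standard
    deviations. Consecutive hot days are merged into a single burst.
    """
    bursts = []
    current = None  # [start, end, excess] for the burst being built
    for i in range(window, len(daily_counts)):
        baseline = daily_counts[i - window:i]
        mu = mean(baseline)
        sigma = pstdev(baseline) or 1.0  # avoid a zero-width band on flat data
        if daily_counts[i] > mu + threshold * sigma:
            excess = daily_counts[i] - mu  # cables above the recent average
            if current is None:
                current = [i, i, excess]
            else:
                current[1] = i
                current[2] += excess
        elif current is not None:
            bursts.append(tuple(current))
            current = None
    if current is not None:
        bursts.append(tuple(current))
    # Rank bursts by how much extra traffic they carried.
    return sorted(bursts, key=lambda b: b[2], reverse=True)


# Made-up counts: a quiet stretch, a two-day spike, then quiet again.
sample_counts = [10] * 10 + [50, 60] + [10] * 10
bursts = find_bursts(sample_counts)
```

On the made-up series above, the two spike days are merged into one burst. Ranking by excess volume is only one possible way to produce a "top 10"; the real project may have ranked its 500 bursts differently.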

1. January 1977: Jimmy Carter takes office

United States Air Force

The most pronounced burst of the period under examination starts just as Jimmy Carter takes office, in January 1977. Carter was elected on a foreign policy platform that placed human rights at the forefront. In his Jan. 20 inaugural address, Carter famously said, "Because we are free, we can never be indifferent to the fate of freedom elsewhere. Our moral sense dictates a clear-cut preference for those societies which share with us an abiding respect for individual human rights." The increased amount of activity presumably relates to foreign policy changes stemming from Carter's mandate.

2. November 1977: Anwar Sadat visits Israel

Ya'acov Sa'ar, Government Press Office / Via archives.gov.il

The second burst corresponds to the Egyptian president's visit to Israel, the first official visit ever by an Arab head of state. The visit led eventually to the Camp David Accords in 1978 and to the peace treaty between the two states in 1979.

3. May 1977: Unknown

The third most pronounced burst, from February to June 1977, doesn't obviously correspond to any major world event, and indeed, many of the cables in this burst (more than 400,000) are tagged "US," suggesting that the peak has to do with the way domestic policies influence foreign relations. The peak day of the burst, for example, came on the day the government announced a policy curtailing weapons sales to foreign nations, with a notable exemption: Israel.

5. May 1977: Vietnamese refugee crisis

The fall of South Vietnam precipitated the flight of hundreds of thousands (and eventually millions) of Vietnamese from communist rule. These so-called "Vietnamese boat people" were likely the subject of many of the cables in this burst, which corresponds to a massive uptick in the number of refugees.

6. September 1977: U.S. relinquishes control of the Panama Canal

White House

This burst peaks five days before Jimmy Carter and Omar Torrijos signed the treaty that would give Panama full control of the canal in 1999; the cables composing it concern which foreign governments would be present at the signing ceremony.

7. October 1973: The Yom Kippur War

"Bridge Crossing" by Unknown - Flickr: ciagov Licensed under Public Domain via Wikimedia Commons - commons.wikimedia.org

This war, started by a coordinated surprise attack by Egypt and Syria against Israel, saw the American government airlift tens of thousands of tons of arms and supplies to the Israelis; the burst corresponding to this war peaks as these airlifts begin.

9. November 1975: Portugal Leaves Angola


The ninth-largest burst corresponds to the end of the Portuguese colonial presence in Angola. Portugal was the last of the European powers to leave its African colonies.

10. November 1977: U.N. imposes an arms embargo against South Africa


United Nations Security Council Resolution 418 imposed an arms embargo against the apartheid government of South Africa, led by Prime Minister B.J. Vorster.

So how did the Declassification Engine do? The head of the History Lab team, Matthew Connelly, asked another historian — Daniel Sargent, who recently published a history of American foreign policy in the '70s — to rank his own 10 most important events from 1973–1977. While there was broad agreement between the computer and the historian on the importance of Middle Eastern politics in the period, Sargent ranked highly some events, including China's post-Mao transition, that were hardly picked up by the Declassification Engine, and became obviously significant only in hindsight.

In the History Lab blog, Sargent writes, "Comparing and contrasting my top ten with the results that the History Lab generated, I feel a certain relief. For all the differences, which are substantive, our conclusions are not so far removed as I'd feared." You can read the rest of his list there.

Joe Bernstein is a senior technology reporter for BuzzFeed News and is based in New York. Bernstein reports on and writes about the gaming industry and web culture.

Contact Joseph Bernstein at joe.bernstein@buzzfeed.com.
