Today, BuzzFeed News is sharing an enormous dataset — one that sheds light on four decades of the United States’ federal payroll.
The dataset contains hundreds of millions of rows and stretches all the way back to 1973. It provides salary, title, and demographic details about millions of U.S. government employees, as well as their migrations into, out of, and through the federal bureaucracy. In many cases, the data also contains employees’ names.
We obtained the information — more than 30 gigabytes of it — from the U.S. Office of Personnel Management, via the Freedom of Information Act (FOIA). Now, we’re sharing it with the public. You can download it for free on the Internet Archive.
This is the first time, it seems, that such extensive federal payroll data is freely available online, in bulk. (The Asbury Park Press and FedsDataCenter.com both publish searchable databases. They’re great for browsing, but don’t let you download the data.)
We hope that policy wonks, sociologists, statisticians, fellow journalists — or anyone else, for that matter — find the data useful. Here are just a few of the questions it might help answer:
How have changes in federal employment compared with broader demographic changes in the nation?
What do career trajectories look like in federal government?
Do some agencies have more of a “revolving door” than others?
Which federal gigs have the best job security?
What type of employee is most likely to quit during a presidential transition?
Do healthcare outcomes decline at VA hospitals after spikes in retirements or resignations?
Do salmonella outbreaks decrease after the USDA hires more food inspectors?
Are younger air traffic controllers correlated with higher accident rates?
If you find something interesting in the data, we’d love to hear about it. In the meantime, here’s a bit more about what we’ve published:
We obtained the information through two Freedom of Information Act requests to OPM. The first chunk of data, provided in response to a request filed in September 2014, covers late 1973 through mid-2014. The second, provided in response to a request filed in December 2015, covers late 2014 through late 2016. We have submitted a third request, pending with the agency, to update the data further.
We are sharing the data in the format in which OPM and the Department of Defense (which sent the data pertaining to its employees) provided it to us. The only changes we’ve made are to unzip and reorganize the files so that they’re easier to browse and download. The data are formatted as “fixed-width” and “pipe-delimited” text files. You should be able to load them into most database and spreadsheet programs (though the larger files might push Excel to its limits).
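For readers who would rather script the loading than use a spreadsheet, here is a minimal Python sketch of how the two text formats could be parsed one line at a time. The field names, column widths, and sample records are hypothetical placeholders, not the real layout; the actual positions and column order should be taken from the documentation published alongside the data.

```python
import csv
import io

# Hypothetical layout: these field names and widths are illustrative only.
# The real ones come from the documentation published with the data.
FIXED_WIDTH_LAYOUT = [("name", 23), ("agency", 4), ("pay", 6)]


def parse_fixed_width(line, layout=FIXED_WIDTH_LAYOUT):
    """Slice one fixed-width line into a dict of whitespace-stripped values."""
    record = {}
    start = 0
    for field, width in layout:
        record[field] = line[start:start + width].strip()
        start += width
    return record


def read_pipe_delimited(handle, fieldnames):
    """Yield one dict per row from a pipe-delimited text stream."""
    # QUOTE_NONE treats quote characters as ordinary text.
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    for values in reader:
        yield dict(zip(fieldnames, (value.strip() for value in values)))


# Made-up sample records, used only to exercise the two parsers.
sample_fixed = "DOE,JANE".ljust(23) + "AG03" + "052000"
sample_piped = io.StringIO("DOE,JANE|AG03|052000\nROE,RICHARD|AG07|061500\n")

fixed_record = parse_fixed_width(sample_fixed)
piped_records = list(read_pipe_delimited(sample_piped, ["name", "agency", "pay"]))
```

Because both functions handle a single line or row at a time, the same approach should work on the multi-gigabyte files without reading them fully into memory: open the file and pass the handle, or each line, straight through.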
You’ll find two types of files. One tracks the “status” of every federal employee at the end of every fiscal quarter. The other tracks entries and exits, known in bureaucratic lingo as “accessions” and “separations.” (That data only goes back to 1982.) For each employee, the “status” data includes the following information:
Name (with certain exceptions; see below)
Location (only to the state/country level in recent data, but a more detailed “duty station” before that)
Age (as a range)
Education level, and years since degree (as a range)
Adjusted basic pay
Pay plan and pay grade
Type of appointment (e.g., “career,” “nonpermanent,” “Schedule A,” etc.)
Work schedule (e.g., full time, part-time seasonal, etc.)
Alongside the data, we’ve also published the documentation that OPM and the Department of Defense provided us. (For a quick overview, you can find more details about the fields above in this data dictionary.)
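As one illustration of how the fields listed above might be used, the sketch below averages adjusted basic pay within each pay plan while streaming over records, so the full file never has to sit in memory. It assumes each record has already been parsed into a dictionary; the keys "pay_plan" and "adjusted_basic_pay" are hypothetical names chosen for this example, and the real column names and codes are described in the data dictionary.

```python
from collections import defaultdict


def mean_pay_by_group(rows, group_field="pay_plan", pay_field="adjusted_basic_pay"):
    """Return {group: (record_count, mean_pay)} from an iterable of dict records.

    The default field names are hypothetical; substitute the names used
    in the documentation that accompanies the data.
    """
    totals = defaultdict(lambda: [0, 0.0])  # group -> [count, sum of pay]
    for row in rows:
        raw_pay = row.get(pay_field, "").strip()
        try:
            pay = float(raw_pay)
        except ValueError:
            # Skip records whose pay is blank or not a number.
            continue
        bucket = totals[row.get(group_field, "").strip()]
        bucket[0] += 1
        bucket[1] += pay
    return {group: (count, total / count) for group, (count, total) in totals.items()}


# Made-up records, used only to show the shape of the input and output.
example_rows = [
    {"pay_plan": "GS", "adjusted_basic_pay": "50000"},
    {"pay_plan": "GS", "adjusted_basic_pay": "70000"},
    {"pay_plan": "WG", "adjusted_basic_pay": ""},
]
summary = mean_pay_by_group(example_rows)
```

Any iterable of dictionaries can be passed in, including a generator that reads a file line by line. Keep in mind that the data does not include bonuses or other additional compensation, so these averages would describe basic pay only.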
There are some crucial caveats about, and limitations to, the data we’re sharing. For example: Unlike the data searchable through the Asbury Park Press or FedsDataCenter, the data we received does not include information about bonuses or other additional compensation.
Between our first and second requests, OPM announced it had suffered a massive computer hack. As a result, the agency told us, it would no longer release certain information, including the employee “pseudo identifier” that had previously disambiguated employees with common names.
Even before the hack, the government withheld data on hundreds of thousands of employees. The data contains no names or duty stations for employees of the Department of Defense, FBI, Secret Service, DEA, IRS, U.S. Mint, or the Bureau of Alcohol, Tobacco, Firearms and Explosives. It also withholds the names and duty stations of law enforcement officers, nuclear engineers, certain investigators, and a few other types of personnel, no matter which agency employs them. And it contains no data at all on employees of the White House, Congress, the judicial branch, CIA, NSA, the Department of State’s Foreign Service, the Postal Service, Congressional Budget Office, Library of Congress, Botanic Garden, Panama Canal Commission, and a large handful of other agencies.
Finally, there are signs that OPM is clamping down on the data it releases. Earlier this month, the Asbury Park Press reported that the agency had — for the first time since the newspaper began requesting the data in 2007 — refused to provide certain data on employees working outside the 50 states or the District of Columbia, as well as “most performance-based bonuses given to federal employees in fiscal year 2016.”
This post has been updated to note that the "status" data provided by the Department of Defense is missing the fiscal quarters ending Dec. 1992 through Sept. 2013.
We've received the missing data from the Department of Defense and uploaded it, bringing the total size of the records to more than 33 gigabytes. The post has been updated to reflect this.
Jeremy Singer-Vine is the data editor for the BuzzFeed News investigative unit and is based in Washington, D.C. His secure PGP fingerprint is E2B0 63DB 0601 D634 1E9E F9AE 9F24 768F 9B4A EFB0