A Day Of FOIA Data

    Report from our Freedom of Information data hackday.

    This spring, BuzzFeed News and BuzzFeed Open Lab got together with MuckRock, FOIA Mapper, FOIA Machine, and a few dozen very smart folks for a day of tinkering with FOIA data.

    BuzzFeed's Assistant General Counsel, Nabiha Syed, led us off with a great workshop on the finer points of public records laws, and then we broke out into groups to tackle some fun projects.

    MuckRock brought 112 gigabytes of public records requests and responses to share, and spent much of the day helping people search through them. The data, which is still available for download, covers FOIA requests submitted through their service, as well as information about the government FOIA offices and officers that they've crossed paths with in their work.

    Participants came up with some very cool projects, especially considering that they only spent one day working with the data. There was a ton of skill sharing going on, which was fantastic to watch.

    Hanna Wallach walked us through topic modeling tools, which identify "topics" (characterized by lists of top words) from unlabeled document collections. Using the MALLET toolkit, she applied topic modeling (with 500 topics) to MuckRock and FOIA Mapper's request descriptions and calculated the number of days it took for each request to be fulfilled. With that information we (or ... a machine learning expert like Hanna, anyway) can run survival analysis to identify patterns in response times.
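
    For anyone curious what the survival analysis step might look like, here is a minimal sketch of a Kaplan-Meier estimate over days-to-fulfillment. It is not Hanna's code: the function name and the sample numbers are made up for illustration, and it assumes that requests still open when the data was collected are treated as censored (we know they waited at least that long, not how long they will wait in total).

```python
from collections import Counter


def kaplan_meier(durations, completed):
    """Estimate the share of requests still unfulfilled after each day.

    durations: days each request was observed (hypothetical input format).
    completed: True if the request was fulfilled on that day,
               False if it was still open when observation stopped (censored).
    Returns a list of (day, survival_probability) for days with a fulfillment.
    """
    fulfilled = Counter(d for d, c in zip(durations, completed) if c)
    leaving = Counter(durations)  # every request leaves the risk set on its day
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for day in sorted(leaving):
        if fulfilled[day]:
            # Multiply by the fraction of at-risk requests NOT fulfilled today.
            survival *= 1 - fulfilled[day] / at_risk
            curve.append((day, survival))
        at_risk -= leaving[day]
    return curve


# Made-up example: six requests, two of them still open (censored).
days = [5, 10, 10, 20, 30, 30]
done = [True, True, False, True, True, False]
curve = kaplan_meier(days, done)
for day, share_open in curve:
    print(f"day {day}: {share_open:.2f} of requests still waiting")
```

    In practice you would run something like this once per topic (or per agency) and compare the curves to see which kinds of requests tend to wait longest.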

    Andrew Tran, Kai Theo and Hilary Fung designed a scorecard system to illustrate agency responsiveness. The three, who are all data reporters or newsroom developers, evaluated agencies by the percent of requests they deny and the time they take to respond at all. Their parser is really well documented, so it’s worth checking out if you want to tackle a similar project but aren't sure how to start.
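
    To make the two scorecard measures concrete, here is a rough sketch of how they could be computed. This is not the team's parser. The record layout (`agency`, `status`, `days_to_response`) and the sample rows are assumptions for illustration, and real MuckRock data will use different field names.

```python
from collections import defaultdict
from statistics import median


def build_scorecards(requests):
    """Group requests by agency and compute a simple responsiveness scorecard.

    Each request is a dict with hypothetical keys:
      agency, status ("completed", "denied", "pending"), days_to_response (or None).
    """
    by_agency = defaultdict(list)
    for request in requests:
        by_agency[request["agency"]].append(request)

    cards = {}
    for agency, rows in by_agency.items():
        denied = sum(1 for r in rows if r["status"] == "denied")
        # Requests with no response yet have no wait time to measure.
        waits = [r["days_to_response"] for r in rows
                 if r["days_to_response"] is not None]
        cards[agency] = {
            "requests": len(rows),
            "percent_denied": round(100 * denied / len(rows), 1),
            "median_days_to_response": median(waits) if waits else None,
        }
    return cards


# Made-up sample rows.
sample = [
    {"agency": "Agency A", "status": "completed", "days_to_response": 12},
    {"agency": "Agency A", "status": "denied", "days_to_response": 30},
    {"agency": "Agency A", "status": "pending", "days_to_response": None},
    {"agency": "Agency A", "status": "completed", "days_to_response": 20},
    {"agency": "Agency B", "status": "denied", "days_to_response": 45},
    {"agency": "Agency B", "status": "denied", "days_to_response": 60},
]
cards = build_scorecards(sample)
for agency, card in sorted(cards.items()):
    print(agency, card)
```

    The median is used here instead of the mean so that a handful of requests that drag on for years don't swamp the score. That is a design choice in this sketch, not something the original project is known to do.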

    Clint Adams and BuzzFeed's Jeremy Singer-Vine both scoured the MuckRock corpus for trends by file type. There are more videos and audio files in the corpus than you might think -- those are definitely worth looking at more closely. And take a look at Jeremy's Python script -- it will randomly grab a segment of audio from somewhere in MuckRock's corpus.
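
    Here is a small sketch of both ideas: counting files by extension, and picking a random stretch of audio. It is not Jeremy's script. The file names, the set of audio extensions, and the assumption that you already know each file's length in seconds are all invented for the example, and it only chooses the start and end times rather than cutting any audio.

```python
import random
from collections import Counter
from pathlib import PurePosixPath

# Assumption: which extensions count as audio for this sketch.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def count_by_extension(filenames):
    """Tally files by lowercase extension."""
    return Counter(PurePosixPath(name).suffix.lower() or "(none)"
                   for name in filenames)


def pick_random_segment(durations, segment_seconds, rng=random):
    """Pick a random audio file and a random window inside it.

    durations: dict of filename -> length in seconds (hypothetical input).
    Returns (filename, start, end), or None if no audio file is long enough.
    """
    candidates = [
        (name, length) for name, length in durations.items()
        if PurePosixPath(name).suffix.lower() in AUDIO_EXTENSIONS
        and length >= segment_seconds
    ]
    if not candidates:
        return None
    name, length = rng.choice(candidates)
    start = rng.uniform(0, length - segment_seconds)
    return name, start, start + segment_seconds


# Made-up file listing.
files = ["response.pdf", "hearing.MP3", "dispatch.wav", "bodycam.mp4", "notes.pdf"]
counts = count_by_extension(files)
print(counts.most_common())

lengths = {"hearing.MP3": 600.0, "dispatch.wav": 45.0, "bodycam.mp4": 900.0}
segment = pick_random_segment(lengths, 30, random.Random(0))
print(segment)
```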

    Max Galka, founder of FOIA Mapper, dug into FOIA Mapper data to look for patterns in who requests information -- he found that the top 100 requesters accounted for more than 100,000 FOIA requests.
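
    A count like that takes only a few lines to reproduce. The sketch below shows one way to total up the requests filed by the top N requesters. The requester names are placeholders, and this is not necessarily how Max ran his analysis.

```python
from collections import Counter


def top_requester_share(requesters, n):
    """Return the top-n requesters, their combined request count,
    and that count as a share of all requests.

    requesters: one entry per request, naming who filed it (hypothetical input).
    """
    if not requesters:
        return [], 0, 0.0
    counts = Counter(requesters)
    top = counts.most_common(n)
    top_total = sum(count for _, count in top)
    return top, top_total, top_total / len(requesters)


# Made-up log: one requester name per request filed.
log = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
top, top_total, share = top_requester_share(log, 2)
print(top, top_total, share)
```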

    Rich Jones, who is famous in FOIA circles for releasing ODB's FBI file, walked everyone through a Caravel visualization of state and federal agencies to show variations in response times.

    Molly Kraus and Michael Kenney assembled a series of observations about the cost and volume of federal FOIA requests. Ashesha Mehrotra, Poroma Pant, and Dylan Portelance looked for trends in topic areas and agencies. Kelsey Kennedy and Peter Hess looked for an uptick in requests for email records, especially at public universities. Dominic Mauro, Geoffrey Yip, and Kate Fink asked if agencies are less likely to brush off requests if there are financial consequences for wrongfully denying a request for information. Especially at the state level, are agencies more responsive to FOIA requests when they risk a large non-compliance fine for denying one?

    Keep it Going

    MuckRock's API is public and they're eager to help folks use it. And ... we're cooking up a West Coast edition of the hackday. If any of these projects has inspired you, I hope you'll be able to join us.

    Open Lab for Journalism, Technology, and the Arts is a workshop in BuzzFeed’s San Francisco bureau. We offer fellowships to artists and programmers and storytellers to spend a year making new work in a collaborative environment. Read more about the lab.