Here at the Open Lab I've been thinking a lot about automated journalism. In particular, I'd like to build open source tools that newsrooms big and small can use, and that are fundamentally designed to empower journalists rather than replace them. As a first, small step in this direction, I've built a tool for monitoring RSS feeds in bulk, which we're using internally to make the SEC's EDGAR system more accessible.
EDGAR, or the Electronic Data Gathering, Analysis, and Retrieval system, is the SEC's public filings database. For certain kinds of events, like when a company is going public and is required to disclose information to everyone at the same time, EDGAR is the source of truth.
The SEC did make some fairly forward-thinking decisions when they built the system, like providing per-company RSS feeds for each type of filing that they store. Still, they didn't build an API, and the interface (which dates to the mid-90s) has not aged well, so information is difficult to access. Reporters who cover businesses need to be notified of new filings; they don't want to have to remember to refresh their RSS readers periodically. They want those notifications to include context, so that they don't have to start from scratch every time a new version of a 200-page filing appears. And they want these things not for a handful of companies that they know to watch, but for hundreds or thousands that might otherwise fly under their radar.
We reviewed many of the existing tools that attempt to solve some of these problems, but ultimately found that they came up short. There are many RSS readers, but few provide immediate notifications. IFTTT has a wonderful interface, but setting up recipes is time-consuming, so monitoring many feeds quickly becomes untenable. And knowing that a page has changed is not the same as knowing how it changed.
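To make that last distinction concrete, here is a minimal Python sketch. It is not code from any of the tools mentioned; the function names `has_changed` and `describe_change` are made up for illustration. The first function can only say that two versions of a document differ, while the second uses the standard library's `difflib` to report which lines were removed and added.

```python
import difflib
import hashlib
from typing import List


def has_changed(old: str, new: str) -> bool:
    """Change detection: compare content hashes. Tells you *that* something changed."""
    old_digest = hashlib.sha256(old.encode("utf-8")).hexdigest()
    new_digest = hashlib.sha256(new.encode("utf-8")).hexdigest()
    return old_digest != new_digest


def describe_change(old: str, new: str) -> List[str]:
    """Change description: return removed ('-') and added ('+') lines only."""
    diff = list(
        difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="", n=0)
    )
    # The first two lines are the '---' / '+++' file headers, and lines that
    # start with '@@' mark hunk positions; neither is useful to a reader here.
    return [line for line in diff[2:] if not line.startswith("@@")]
```

Given two versions of a document that differ in a single line, `has_changed` returns `True` and nothing more, while `describe_change` returns that line twice: once prefixed with `-` (the old wording) and once with `+` (the new wording).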
Eventually, I decided to build a simple RSS watchdog tool called RSS Puppy. It's designed to watch bulk collections of RSS feeds and to emit events that other systems can listen for and act on. Once that was in place, I also wrote a small module that processes EDGAR filings, and hooked the two together. The result watches for filings from many different companies and sends out notifications to our journalists as soon as something happens. The processing module pulls up past versions and highlights changes to individual filings, giving journalists a head start on identifying developments. Control of the story remains in journalists' hands, but the system tackles the boring work of figuring out what changed in any one filing.
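The watch-and-emit pattern described above can be sketched in a few lines of Python. This is an illustration only, not RSS Puppy's actual code (which is written for Node.js and stores its state in Postgres). The names `FeedWatcher`, `on_new_entry`, and `poll_once` are hypothetical, seen entries are kept in memory, and only RSS 2.0 `<item>` elements are handled. One design choice worth noting: the first successful poll of a feed is treated as a baseline, so listeners are not flooded with old entries.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, List, Set

Listener = Callable[[str, dict], None]


def fetch_url(url: str) -> str:
    """Default fetcher: download a feed over HTTP(S) and return its text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_items(xml_text: str) -> List[dict]:
    """Extract id, title, and link from every RSS 2.0 <item> element."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        items.append({
            # Fall back to the link when the feed provides no <guid>.
            "id": item.findtext("guid") or link,
            "title": item.findtext("title", default=""),
            "link": link,
        })
    return items


class FeedWatcher:
    """Hypothetical watcher: polls many feeds, notifies listeners of new entries."""

    def __init__(self, feed_urls: Iterable[str],
                 fetch: Callable[[str], str] = fetch_url) -> None:
        self.feed_urls = list(feed_urls)
        self.fetch = fetch
        self.seen: Dict[str, Set[str]] = {}  # feed URL -> entry ids already seen
        self.listeners: List[Listener] = []

    def on_new_entry(self, listener: Listener) -> None:
        """Register a callback invoked as listener(feed_url, entry)."""
        self.listeners.append(listener)

    def poll_once(self) -> None:
        """Fetch every feed once and emit an event for each unseen entry."""
        for url in self.feed_urls:
            try:
                items = parse_items(self.fetch(url))
            except (OSError, ET.ParseError):
                continue  # one broken feed should not stop the others
            first_poll = url not in self.seen
            seen = self.seen.setdefault(url, set())
            for entry in items:
                if entry["id"] in seen:
                    continue
                seen.add(entry["id"])
                if not first_poll:  # the first poll only records a baseline
                    for listener in self.listeners:
                        listener(url, entry)
```

In use, a scheduler would call `poll_once()` every few minutes, and a listener registered with `on_new_entry` would hand each new entry to whatever comes next, such as a filing processor or a notification service.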
RSS Puppy requires only Node.js and a Postgres database, so anyone with some RSS feeds they'd like to watch should check it out. The EDGAR processing module is more specific to our needs (and depends on Amazon Web Services), but will certainly work for anyone else interested in watching EDGAR. Check out both projects on GitHub, and let us know if you build cool things with them!