Justin Bieber, not trending.
A small team at MIT, in conjunction with Twitter, claims to have written an algorithm that can pick out trending Twitter topics an hour and a half — and in some cases, up to five hours — before they show up on the official list. It’s right 95% of the time.
Professor Devavrat Shah, along with a student, Stanislav Nikolov, used machine learning to refine their software, which compares the real-time behavior of fledgling Twitter topics with a sample set of 200 former Twitter trends. According to MIT, “their algorithm compares changes over time in the number of tweets about each new topic to the changes over time of every sample in the training set.”
In very simple terms, it matches the early, real-time posting patterns of a new topic with those of topics that actually broke through into the trending topics list. These patterns (changes in the rate of posting, for example) are apparently distinctive enough to distinguish the popular from the very popular, or the briefly explosive from the fully trending.
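To make the idea concrete, here is a rough sketch in Python of what "comparing changes over time" against a set of past examples could look like. It is not Shah and Nikolov's actual code, which isn't described in detail here. The function names, the `gamma` weighting parameter, the exponential weighting, and the inclusion of non-trending examples in the training set are all assumptions made for illustration.

```python
import math

def rate_changes(counts):
    """Turn tweet counts per time step into step-to-step changes."""
    return [b - a for a, b in zip(counts, counts[1:])]

def distance(a, b):
    """Squared Euclidean distance between two series (assumed equal length)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def trend_score(new_counts, training_set, gamma=0.1):
    """Score a new topic between 0 and 1 by similarity to past examples.

    training_set is a list of (counts, became_trend) pairs. Each past
    example "votes" with a weight that shrinks as its pattern of changes
    gets further from the new topic's pattern. gamma (hypothetical)
    controls how quickly that weight falls off.
    """
    new_changes = rate_changes(new_counts)
    votes = {True: 0.0, False: 0.0}
    for counts, became_trend in training_set:
        d = distance(new_changes, rate_changes(counts))
        votes[became_trend] += math.exp(-gamma * d)
    total = votes[True] + votes[False]
    # With no usable evidence either way, fall back to a neutral score.
    return votes[True] / total if total > 0 else 0.5

# Made-up example data: one past topic that trended, one that did not.
training = [
    ([1, 2, 4, 8, 16], True),   # accelerating posting rate
    ([3, 3, 4, 3, 3], False),   # flat chatter
]
rising = trend_score([2, 3, 5, 9, 17], training)
flat = trend_score([3, 3, 3, 3, 3], training)
```

In this toy setup, a topic whose posting rate accelerates the way a past trend did scores close to 1, while flat chatter scores close to 0. A real system would need far more examples and some threshold for deciding when a score is high enough to call a trend early.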
Twitter’s trending topics are themselves determined by an algorithm, which takes into account not just a topic’s raw popularity, but past popularity as well. Nobody outside of Twitter knows exactly how it works, but given that it’s based primarily on public data — tweets — reverse engineers would have a lot to work with. Nikolov is a “grad student in residence” at Twitter, meaning he has fuller access to Twitter’s data than most. This research will likely be used to improve Twitter’s trending topics, allowing the company to work more quickly with advertisers to identify valuable topics.
Outside reverse-engineering, however, would present a problem for Twitter. Its largest ad campaigns include promoted trending topics; in other words, it charges for access to the trending list. The ability to guess what that list will look like a few hours in advance wouldn’t guarantee entry, exactly, but it would give people time to tailor posts to the imminent trend, helping them secure a spot on that trend’s “Top Tweets” list.
When techniques like this are applied to Google, it’s called search engine optimization, or SEO. Google regards SEO with quite a bit of hostility, because it undermines the company’s ability to sort information correctly; SEO is as much about exploiting weaknesses or errors in a search engine as it is about “optimizing” sites to follow the rules. Google’s monthly algorithm updates, if you read them closely, are effectively just SEO bug fixes: dozens of changes meant to prevent anyone from getting an unfair advantage in search results.
Twitter considers its algorithm a constant “work in progress” and makes changes at will: In 2010, Justin Bieber was semi-exiled from the trending topics list by an algorithm change. Twitter said at the time, “We think that trending topics which capture the hottest emerging trends and topics of discussion on Twitter are the most interesting.”
What Shah and Nikolov have accomplished, then, hints at a new type of SEO. Like the old SEO, it’s all about reverse engineering. Unlike the old SEO, for which the data set (the search database) was private, the raw data at hand here is mostly public. Google solved SEO with obscurity. Twitter has been able to do the same so far.
This post has been updated to reflect Nikolov’s relationship with Twitter.