Guess which of these tweets were labeled as "positive" or "negative" (according to a machine):
Trick question. All of these were classified as negative, but any human reading these tweets would know that no one is actually crying or dying and that distractions are often a good thing. Who knows, maybe even that guy meant "Buzzfeed is terrible" in the terribly awesome way.
Nuances in language are beautiful, albeit totally complex. Sentiment analysis attempts to classify the emotional context of written text, and tweets are a unique kind of text: limited in length, rife with grammatical errors and abbreviations, and sometimes making no sense at all. The most accurate way to understand the sentiment of a tweet would be for humans to read each and every one (some researchers use crowdsourcing sites like Amazon's Mechanical Turk to have people do this manually), but even then there are errors, and the amount of data that can be analyzed is significantly smaller.
Twitter has created a whole new area of natural language processing, says Alec Go, one of the Stanford computer science graduates behind Sentiment140. Go and his colleagues wanted to build a better sentiment analysis tool to try to classify the current attitude on Twitter toward a brand, product or topic. Their secret to (relative) success: emoticons. (Though everyone knows EMOJI are the language of the future.)
"That smiley or frownie face is sort of a label for us to use. What we did was we found as many tweets with smileys and frownies as possible — almost 1.6 million — and took that data to train our model," he told me. Instead of using only keywords, like a lot of not-that-great sentiment analysis research, they looked for words and ensuing emoticons to get a more complete understanding of the sentiment in a tweet. It's not perfect, Go admits, but better than a lot of what's out there.
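The idea Go describes, treating the emoticon as a free label, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the emoticon lists, the function names and the small naive Bayes classifier are hypothetical choices made here, not the Sentiment140 code, and the five sample tweets are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed emoticon lists; a real system would use a longer, curated set.
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(", "=(")


def emoticon_label(tweet):
    """Use the emoticon as a noisy label; None if absent or contradictory."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:  # neither kind, or both kinds
        return None
    return "positive" if has_pos else "negative"


def tokenize(tweet):
    """Remove the emoticons (they are the label, not a feature), then split into words."""
    for e in POSITIVE + NEGATIVE:
        tweet = tweet.replace(e, " ")
    return re.findall(r"[a-z']+", tweet.lower())


def train(tweets):
    """Count words per emoticon-derived label; unlabeled tweets are skipped."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for tweet in tweets:
        label = emoticon_label(tweet)
        if label is None:
            continue
        label_counts[label] += 1
        word_counts[label].update(tokenize(tweet))
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Multinomial naive Bayes with add-one smoothing over the training vocabulary."""
    vocab = set().union(*word_counts.values())
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for tok in tokenize(text):
            score += math.log((word_counts[label][tok] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented sample tweets, for illustration only.
SAMPLE_TWEETS = [
    "love this sunny day :)",
    "great coffee this morning :D",
    "my flight got cancelled :(",
    "stuck in traffic again :-(",
    "just landed in Boston",  # no emoticon, so it is not used for training
]

if __name__ == "__main__":
    counts, labels = train(SAMPLE_TWEETS)
    print(classify("love the coffee", counts, labels))
    print(classify("stuck in traffic", counts, labels))
```

Note that the classifier never sees the emoticon itself at training time. It only learns which words tend to appear alongside smileys or frownies, which is what lets it guess the sentiment of a tweet that has no emoticon at all.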
Why are giant companies and researchers so damned interested in how we're feeling on Twitter en masse? Well, they hope that by divining the pulse of the masses on a second-by-second basis in a way that was never possible before (over 340 million tweets a day as of March), they'll be able to predict what's next, almost like tremors before an earthquake. Already people have attempted to predict everything from the stock market to election outcomes to the spread of disease to box office revenues and American Idol results.
While it's not news that humans are complex creatures and feelingssss are tricky, what sentiment analysis and the living, breathing horde of information that is Twitter ultimately offer together is a new kind of reason to try to understand these complexities. Being able to perceive what 140 million people are feeling from one second to the next? That would be some kind of superpower.