Analyzing URLs Between Pro–Donald Trump And The Alt-Right Subreddits

    As part of a larger body of research on online hate speech and the alt-right, my collaborator Francis Tseng and I wanted to analyze the overlap between two subreddits: r/The_Donald and the (now-banned) r/altright.

    With the president’s slow response to condemning neo-Nazis, his general comments about violence on “both sides” after Charlottesville, and the alt-right’s general support of the current president (at least on 4chan and Reddit), I was interested in seeing how these two digital conversations overlap. But how could we measure that overlap in a meaningful way? One idea we had was to look at the links that were being discussed. Reddit is a place of sharing: jokes, images, and URLs. Conversations or boards revolve around hyper-specific topics, and we wondered how much overlap there might be between the links that are shared and discussed in the two communities.

    Our data was sourced from a database of public Reddit comments originally compiled by redditor u/Stuck_In_the_Matrix and subsequently migrated to BigQuery by another user, u/fhoffa.

    We exported all available posts and comments from r/altright, which had its first post on Dec. 4, 2015, and from r/The_Donald, which was created on June 27, 2015. The export runs from the start of each subreddit through April 2017 for r/The_Donald and through January 2017 for r/altright, which is when r/altright was banned for violating Reddit’s terms of use. (In retrospect, we should have limited our analysis to June 2015 through January 2017, the period when both forums were active simultaneously.) At the time of our analysis, in May 2017, r/The_Donald had 396,526 subscribers. Because r/altright was banned and taken down, a lot of its data was lost, and its final subscriber count was incredibly hard to find. What we did find was that in November 2016, r/altright had 7,625 subscribers. The BigQuery database is mostly complete, but there are occasional gaps, which we filled using a secondary source, a database maintained by the same redditor, u/Stuck_In_the_Matrix.

    Francis wrote a series of Python scripts to process all post and comment text and extract any links referenced within. The scripts go over each post and its comments, look for strings of text that look like URLs, and save those strings to a separate file.
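
    The scripts themselves are not reproduced here, but the extraction step can be sketched in a few lines of Python. This is a minimal sketch rather than Francis's actual code: the regular expression, the `extract_urls` name, and the sample comment are all assumptions made for illustration.

```python
import re

# Hypothetical pattern (the real one is not given in this post): match
# http(s) URLs up to the next whitespace or common closing delimiter.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text):
    """Return every URL-like string found in a post or comment body."""
    # Trailing punctuation usually belongs to the sentence, not the URL.
    return [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)]


# Made-up comment mixing a Markdown link and a bare URL.
comment = (
    "Background reading [in Vienna]"
    "(https://en.wikipedia.org/wiki/Battle_of_Vienna). "
    "More at https://example.com/page."
)
urls = extract_urls(comment)
```

    A pattern this loose will miss links written without a scheme (for example `reddit.com/r/...`), so a real pipeline would likely need a few more rules.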

    We distinguished image links by looking for image extensions in each URL -- if any common image file extension ("jpg", "jpeg", "gif", "png", or "gifv") is present in the URL, we tagged it as an image. There were so many images in the archive that we turned that image data set into another project focused specifically on studying visual culture and memes. Our Knight Prototype Fund Grant will fund research on those images. More on that later.
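
    The image check described above is simple enough to sketch. The function name and the sample URLs below are hypothetical, and the check follows the rule as stated (the extension appears anywhere in the URL), which means it can over-match on a URL that merely contains something like ".gift".

```python
# The extensions named in the text; the function name is an assumption.
IMAGE_EXTENSIONS = ("jpg", "jpeg", "gif", "png", "gifv")


def is_image_link(url):
    """Tag a URL as an image if any known image extension appears in it."""
    lowered = url.lower()
    # Require the leading dot so that a path segment like "/png-tips"
    # is not mistaken for a file extension.
    return any("." + ext in lowered for ext in IMAGE_EXTENSIONS)


image_flag = is_image_link("https://example.com/uploads/meme.PNG")
article_flag = is_image_link("https://en.wikipedia.org/wiki/Battle_of_Vienna")
```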

    We extracted the top level domains (TLDs) of each link we collected. We expanded more popular TLDs, such as youtube.com (and variants like youtu.be), wikipedia.org, and reddit.com, so that we could group them by article or subreddit. These particular sites are so large and vary so much in content that the TLD alone provides little context. The links we found most interesting were those to other subreddits and to Wikipedia.
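
    Here is one way the domain grouping could look, using only Python's standard library. Everything in it is an assumption for illustration: the alias table, the function names, and the shortcut of treating the last two labels of the hostname as the domain (which is really the registered domain, such as youtube.com, and which breaks on hosts like bbc.co.uk). It expands only Reddit and Wikipedia links; YouTube links are just folded into one domain.

```python
from urllib.parse import urlparse

# Hypothetical alias table folding short-link hosts into one domain.
DOMAIN_ALIASES = {"youtu.be": "youtube.com"}


def link_domain(url):
    """Return a naive domain: the last two labels of the hostname."""
    host = urlparse(url).hostname or ""
    domain = ".".join(host.split(".")[-2:])
    return DOMAIN_ALIASES.get(domain, domain)


def group_key(url):
    """Expand Reddit and Wikipedia links to subreddit or article level."""
    domain = link_domain(url)
    parts = [p for p in urlparse(url).path.split("/") if p]
    if domain == "reddit.com" and len(parts) >= 2 and parts[0].lower() == "r":
        return "reddit.com/r/" + parts[1].lower()
    if domain == "wikipedia.org" and len(parts) >= 2 and parts[0] == "wiki":
        return "wikipedia.org/wiki/" + parts[1]
    return domain


subreddit_key = group_key("https://np.reddit.com/r/conspiracy/comments/abc123/")
article_key = group_key("https://en.m.wikipedia.org/wiki/Battle_of_Vienna")
video_key = group_key("https://youtu.be/abc123")
```

    Grouping at this level is what lets links to `np.reddit.com` and `www.reddit.com`, or to the mobile and desktop versions of a Wikipedia article, count as the same destination.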

    Finally, we grouped and counted the outbound links and computed summary statistics on how they are used, namely the number of references made to specific URLs. The most important measure for this analysis was popularity, which we determined by the number of links referencing a specific site. For example, the more links to Wikipedia, the more ‘popular’ Wikipedia became in the graphs.
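
    Counting references like this maps naturally onto `collections.Counter`. The sketch below assumes the links have already been reduced to grouping keys; the `top_links` helper and the sample data are made up for illustration and are not our real counts.

```python
from collections import Counter


def top_links(keys, n):
    """Return the n most-referenced link keys with their reference counts."""
    return Counter(keys).most_common(n)


# Made-up sample of grouped link keys from one subreddit.
sample_keys = [
    "wikipedia.org",
    "reddit.com/r/conspiracy",
    "wikipedia.org",
    "youtube.com",
    "wikipedia.org",
    "youtube.com",
]
most_popular = top_links(sample_keys, 2)
```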

    What did we learn?

    In both communities, self-referential links to other posts in the same subreddit dominated the counts, followed closely by links to other subreddits. We were fascinated by how often each subreddit references itself, but on consideration, the self-referential links shouldn’t come as a big surprise: references to other posts from their own subculture build ‘in jokes’ and an insider community.

    It was also fascinating to see what other subreddits are referenced often, and to think about that as an indicator of what the community is talking about.

    Our purpose was to explore what kinds of outside URLs were being shared, and whether there was overlap between these two communities. What were the kinds of media and conversations that were overlapping, and what other sites were they linking to? Additionally, how often were they internally linking to Reddit?

    If you scan the subreddits linked in r/altright, what do you see? Is it surprising to see a lot of links to r/The_Donald, or to r/conspiracy and r/autotldr? In r/altright, links to r/The_Donald overshadowed links to any other subreddit. That wasn’t surprising to us: r/The_Donald is a much larger subreddit and discusses Trump, who is much beloved by the r/altright community.

    Links from r/the_Donald to other subreddits.

    But what of the others? As researchers, our next step is to dig into why these subreddits are popular. Popularity isn’t an indication of support, necessarily. A link to r/socialism does not mean that the readers of r/altright are pro-socialism. But they are watching that subreddit and discussing it. This was a good quantitative basis for deeper qualitative research we are doing next. Here are some examples of comments that link to r/socialism from r/altright:

    "people on r/socialism aren't just economically Left, they're socially as well. They have nothing in common with you, they believe in an egalitarian marxist utopia."

    /r/socialism is waaaayyy awful too.

    "We need to infiltrate places like r/socialism and start making posts about being cheated on, cucked, etc."

    “Hey /r/socialism mods, thoughts on this:
    https://i.sli.mg/jHB9vv.png You have failed. Socialism is dead. The workers are on our side. This election was only the beginning.”

    "Oh yeah how'd you know!? I live there with my 6 adopted black kids that i raise by myself because i hate my race so much ever since i got banned from /r/socialism, man they sure showed me."

    "This must be the r/socialism rifle club.....”

    "I don't think this sub is going anywhere. I mean, if it gets big enough maybe. But you have subs like /r/socialism and /r/communism that openly promotes violence and often white genocide.
    Also, remember that the alt right is still very small in general, and not many people identify as alt right, especially not publicly. I've seen lots of random Twitter accounts from women who support Trump and are pretty red pilled on race and immigration, but don't go as far as posting blatant 1488 memes or anything like that. These types of people (men as well as women) may very well start identifying as alt right, at least privately, as the movement continues to grow. At any rate, these more "casual" observers are naturally aligned with us and will likely vote for policies and politicians that we find favorable, even if they don't take the time to share their opinions on alt right forums."

    Conversely, sometimes the ‘popularity’ of links can point to general support for a specific idea or ideology. But it is also important to look at the ‘culture’ and at what makes up that culture from a qualitative perspective. Trolling and jokes show intentionality and norms for subcultures, especially subcultures on Reddit, 4chan and 8chan. You could say these are ‘just jokes’ and that anyone is entitled to make them, and you’d be right. But it’s interesting to see what gets joked about, whether there’s pushback against those jokes, and how similar in nature the jokes tend to be. So we read months’ worth of posts and looked at the kinds of jokes made in r/altright to see what kind of overlap, if any at all, it had with other subreddits. Again, what’s important here is not frequency but intention. Does it matter how often anti-Semitic or eugenics jokes are made, or does it matter that these jokes are treated as normal, funny humor? When defining, studying, and visualizing hate speech, it’s not just about frequency but also about norms and responses within the culture. The comments below are good examples of r/altright’s norms in practice.

    Here are some of the comments that link to r/internethitlers from r/altright:

    You should show them [this](https://np.reddit.com/r/InternetHitlers/comments/4zig1j/an_altright_user_attempts_to_avoid_sounding_like/d6wj9xf/) comment that I wrote as well. It's proof that Trump supporters support eugenics! *gasp*

    [/r/internethitlers] [\*Can you imagine what will happen when the white man awakens in America? There are more of us here, than any country on Earth. Hail Victory!""](https://np.reddit.com/r/InternetHitlers/comments/4snkti/can_you_imagine_what_will_happen_when_the_white/)”

    Here are more examples of trolling and jokes on r/altright:

    "I was posting in r/internethitlers and the mods over there tagged one of our boys with a swastika. I thought that was pretty cool! Most SRS subs would simply drop the banhammer on crimethink.

    It would be neat if our mods could tag problematic users like that with a dildo or something. Some people aren't intentional trolls, just emotionally retarded redditors who need a scarlet letter. u/controlblue is a good example of this, he's a good guy but he just doesn't get it for whatever reason. Maybe a jar of peanut butter would be appropriate.

    * Dildos for shitlibs
    * Stars for our greatest allies
    * Pharaohs for kangs
    * Tacos for beaners
    * Goats for goatfuckers

    And so on.

    Just a thought anyway."

    And from r/internethitlers:

    "Unfortunately, we can't do much about blacks (I know, I want a shoah too) beside keep them in a permanent state of poverty in their self made ghettoes."

    https://np.reddit.com/r/InternetHitlers/comments/4zarp2/unfortunately_we_cant_do_much_about_blacks_i_know/

    What matters here is what the words mean, and not just the frequency of the words. “Shoah”, for example, is the Hebrew word for catastrophe and is the Hebrew term for the Holocaust. However, it’s often used by the alt-right and other subgroups, like white supremacists and neo-Nazis, for all kinds of ‘displeasing’ situations. The alt-right is co-opting the term and changing it, similar to the way that Pepe was co-opted. This is classic trolling behavior.

    So where was the overlap? Posters on r/altright were very much watching and talking about content on r/the_Donald, but that wasn’t the only place where the two subreddits overlapped. It’s hard to say how much influence or chatter about r/altright existed in r/the_Donald because of how large and noisy r/the_Donald is. With several hundred thousand subscribers, r/the_Donald has a lot of content and a lot of people commenting.

    What’s more important to highlight here is not just frequency of links, but what the links mean. Is it worse or better that Holocaust jokes exist within these subreddits or does the amount of the jokes occurring matter more than what the jokes mean? Whitney Phillips, media folklorist and writer on 4chan and trolling culture, breaks apart the importance of ‘irony’ and ‘joking’ within

    “Many people have raised the issues of performativity, when you’re responding to something that may be joke or is a joke. You can’t wave away something away on the grounds that it is ‘just joking.’ You have to understand how irony is performed [within a community] and what people are trying to accomplish through their joking. You can’t say just joking, or taking it entirely seriously minus a performative element. Having a data set that show these links being shared, that can show you a lot. But you have to show what is being performed, does it qualify with irony or humor within that community?”

    Phillips raises a good point: does it matter how frequently those jokes occur, or does it matter more what they mean to the community? It can be a mixture of both. Does it matter if both spaces really want to dive into conspiracy theories? What’s important here is how we define the existence of content and what that content qualitatively means. When analyzing things like violence, conspiracies or hate speech, we need to look beyond frequency of occurrence to derive meaning and importance. It’s not how much something occurs but that it occurred at all. That said, quantitative analysis that outlines broad trends and themes is incredibly important: it gives an overview of the community and points a researcher in the direction of where to look, especially when dealing with very large data sets. In our case, seeing the volume of links and the overlap of shared links gave us a place to narrow our research and focus in on the kinds of conversations that were happening around those shared links.
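
    For readers who want to try a comparison like this on their own data, here is a rough sketch of how shared links can be pulled out of two sets of counts. It is not the code behind our charts: the `shared_links` function, the idea of reporting each link as a share of a community's total, and the numbers in the example are all assumptions for illustration.

```python
from collections import Counter


def shared_links(counts_a, counts_b):
    """Map each link found in both communities to its share of each total."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    # Keys present in both counters are the overlapping links.
    shared = counts_a.keys() & counts_b.keys()
    return {
        link: (counts_a[link] / total_a, counts_b[link] / total_b)
        for link in shared
    }


# Made-up counts, not real data from either subreddit.
community_a = Counter({"wiki/Streisand_effect": 3, "wiki/Page_A": 1})
community_b = Counter({"wiki/Streisand_effect": 1, "wiki/Page_B": 1})
overlap = shared_links(community_a, community_b)
```

    Comparing shares rather than raw counts matters when one community is far larger than the other, as r/The_Donald is compared with r/altright.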

    The overlap of links between r/the_Donald and r/altright

    Wikipedia Pages linked to from r/altright

    One of the most popular links out of r/The_Donald was to the Wikipedia entry for moveon.org. One reason that particular entry is so popular is that one or two users frequently spammed the forum with posts much like this one.

    Wikipedia pages linked to from the r/The_Donald

    A significant share of the outbound links on both subreddits pointed to Wikipedia, so we wanted to drill down and look at which articles each linked to often.

    There is some overlap between the subreddits, though not a lot. What matters is less the frequency or number of shared links than the fact that some links are shared at all. A question I’m still trying to answer with my research is how to determine ideology and shared political interests. Can it be measured by the volume of conversation on one topic, or by small chatter that is supported and allowed?

    As a researcher, I favor the latter argument. When a subreddit does discuss something like “white superiority,” e.g. r/altright supporting white nationalism, it’s interesting to see when conversations from one subreddit bleed to another.

    It is notable that the discussion is happening in that space, that it is allowed to happen. So the importance is not that it’s a popular discussion but that users feel comfortable in having that discussion on that board. That’s where the overlap between two subreddits gets really interesting, especially with respect to shared links.

    Overlap in urls between r/The_Donald and r/altright

    We found very little overlap in the popular Wikipedia articles, but there are a few curious exceptions: "Streisand effect" and an article about a child sexual exploitation ring in South Yorkshire got a similar volume of attention on both boards. The only other page with a similar level of popularity on both was the Wikipedia page mentioning Donald Trump. What is interesting in the above, though, is what all the shared URLs are. The more obviously racist ones stand out more on r/altright, as they account for a larger percentage of links overall, but they came up several times on r/The_Donald, too.

    It helps to know that the Bellamy salute was also known as the “flag salute” in the 1890s, when it was introduced along with the Pledge of Allegiance. Today you’d probably recognize it as a Nazi salute. Placing the right hand over the heart was formally adopted by Congress in 1942, in part because the long-armed salute was so widely associated with Nazis.

    The Battle of Vienna is about Poland fighting the “spread of Islam” from the Ottoman Empire. This is a comment from r/The_Donald:

    https://en.m.wikipedia.org/wiki/Battle_of_Vienna Essentially, the Muslims had been cutting bloody swaths across Europe, North Africa and Central Asia for centuries. The Ottoman Empire was attempting to penetrate deeper into Europe. Capturing the city of Vienna would have given them a strategic advantage, as this was access to the Danube, etc.

    The King of Poland marched his army down and kicked their asses, and the Ottoman incursion into Europe was halted.”

    And this is a comment from r/altright:

    ”Well, I definitely agree that religion can cause irrational decisions. However, I think something that's being seen right now, whether people realize it or not, is the intertwining of religion and race in an almost revival of historic conflicts.

    It's been a little over 300 years since the Holy Roman Empire and the Polish-Lithuanian Commonwealth fought and won against the forces of Islamic evil [in Vienna] (https://en.wikipedia.org/wiki/Battle_of_Vienna).

    I pray Europe finds its soul and listens to people like Cardinal Schonborn instead of continuing this garbage open border policy.”

    Or consider the conversations that linked to the page on White Americans. In the alt-right subreddit they were about the decline of ‘white’ birthrates, and in the r/The_Donald subreddit they were about statistics on the number of white people in poverty versus black people. Both conversations were about the perceived decline of White people, though on very different topics: birth rate versus poverty. The actual Wikipedia page on White Americans, though, breaks down White Americans by census information, social definitions and critical race theory definitions.

    It is worth taking a closer look at why r/altright was so interested in Animal welfare in Nazi Germany, but the Nazis were famously supportive of animal rights so presumably the article is either a defense of Nazis (they weren't all bad, they were nice to animals!) or vegetarians.

    The Minnesota Transracial Adoption Study is often trotted out to demonstrate that white Americans are genetically superior. (The Wikipedia entry on the study is the subject of a seven-year arbitration process that more or less revolves around efforts to keep any criticism of the study or its methodology off of the site.)

    While this is just a small slice of the kinds of analysis we’ve been doing on the alt-right, it’s interesting to see what conversations overlap, and when, between other political groups and other social media spaces and networks.

    Our next undertaking will be to analyze the images we found in these two communities, with a grant from the Knight Prototype Fund. We’re building a dashboard to track images and memes so that researchers and reporters can quickly find the origins of an image or meme. Follow us on Twitter for updates -- we’re Susie, Francis, and Caroline.