In any community there’s bound to be friction, but some… take it further than others. Reddit is a platform for thousands of online communities (known as “subreddits”), where community members can submit content, and upvote, downvote, or comment on content that others have submitted. Topics of discussion on Reddit run the gamut of human interest, but one of Reddit’s favorite topics to talk about is, unsurprisingly, Reddit itself.

A recent post on AskReddit posing the question – “What popular subreddit has a really toxic community?” – surged to the top of the front page with 4,000 upvotes and over 10,000 comments as Redditors voiced their opinions on which Reddit communities they found to be the most abhorrent (the “/r/” prefix denotes a subreddit):

As I sifted through the thread, my data geek sensibilities tingled as I wondered “Why must we rely upon opinion for such a question? Shouldn’t there be an objective way to measure toxicity?”

With this in mind, I set out to scientifically measure toxicity and supportiveness in Reddit comments and communities. I then compared Reddit’s own evaluation of its subreddits to see where they were right, where they were wrong, and what they may have missed. While this post is specific to Reddit, our methodology here could be applied to offer an objective score of community health for any data set featuring user comments.

Defining Toxicity and Supportiveness

So what is Toxicity? Before we could do any analysis around which subreddits were the most Toxic, we needed to define what we would be measuring. At a high level, Toxic comments are ones that would make someone who disagrees with the viewpoint of the commenter feel uncomfortable and less likely to want to participate in that Reddit community. To be more specific, we defined a comment as Toxic if it met either of the following criteria:

  1. Ad hominem attack: a comment that directly attacks another Redditor (e.g. “your mother was a hamster and your father smelt of elderberries”) or otherwise shows contempt/disagrees in a completely non-constructive manner (e.g. “GASP are they trying CENSOR your FREE SPEECH??? I weep for you /s”)
  2. Overt bigotry:  the use of bigoted (racist/sexist/homophobic etc.) language, whether targeting any particular individual or more generally, which would make members of the referenced group feel highly uncomfortable

However, the problem with only measuring Toxic comments is it biases against subreddits that simply tend to be more polarizing and evoke more emotional responses generally. In order to account for this, we also measured Supportiveness in comments – defined as language that is directly addressing another Redditor in a supportive (e.g. “We’re rooting for you!”) or appreciative (e.g. “Thanks for the awesome post!”) manner.

By measuring both Toxicity and Supportiveness we are able to get a holistic view of community health that can be used to more fairly compare and contrast subreddit communities.

Data Collection

Comments were pulled via the Reddit API from the top 250 subreddits by subscriber count, along with any subreddit mentioned in the AskReddit thread with over 150 upvotes. Comments came from posts on the front page of each subreddit; 1,000 comments were randomly chosen from each subreddit for analysis, and any subreddit with fewer than 1,000 comments was excluded.
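As a rough sketch, the sampling-and-exclusion step might look like the following. (Retrieving the comments themselves would go through a Reddit API client and is omitted here; the subreddit names and comment data below are synthetic.)

```python
import random

def sample_comments(comments_by_subreddit, n=1000, seed=42):
    """Randomly sample n comments per subreddit; drop any subreddit
    that has fewer than n comments available."""
    rng = random.Random(seed)  # seeded for reproducibility
    sampled = {}
    for sub, comments in comments_by_subreddit.items():
        if len(comments) >= n:
            sampled[sub] = rng.sample(comments, n)
    return sampled

# Toy example: one subreddit has enough comments, one does not.
pool = {
    "AskReddit": [f"comment {i}" for i in range(1500)],
    "tinysub": [f"comment {i}" for i in range(40)],
}
picked = sample_comments(pool, n=1000)
```

With this toy pool, `picked` keeps 1,000 random comments from "AskReddit" and drops "tinysub" entirely, mirroring the exclusion rule described above.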

The Fun Stuff: Machine Learning

Idibon specializes in combining machine learning with human annotation of text, and for this task I was able to take advantage of our technology to improve both the efficiency and accuracy of our experiment. A task as nuanced as labelling comments as Toxic/non-Toxic under our definition requires human annotation, but annotating all 250 subreddits at 1,000 comments each would have required nearly 2,300 person-hours, at about 11 seconds per annotation (the average amount of time it took our contributors) and 3 annotations per comment (in order to get multiple opinions for consensus).
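For the curious, the back-of-the-envelope arithmetic works out like this:

```python
subreddits = 250
comments_per_subreddit = 1000
annotations_per_comment = 3   # multiple opinions for consensus
seconds_per_annotation = 11   # average observed annotation time

total_annotations = subreddits * comments_per_subreddit * annotations_per_comment
person_hours = total_annotations * seconds_per_annotation / 3600
print(f"{total_annotations:,} annotations ≈ {person_hours:,.0f} person-hours")
# 750,000 annotations ≈ 2,292 person-hours
```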

Instead, we were able to use Idibon’s Sentiment Analysis model to narrow down the comments human annotators would need to see to only those most likely to carry negative or positive sentiment (a good high-level proxy for Toxicity/Supportiveness), and only within subreddits whose comments skewed strongly negative or positive overall. Using this tool, we narrowed our dataset to 100 subreddits and 100 comments per subreddit, cutting the number of comments to annotate from 250,000 to 10,000 – a decrease of 96%.
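The filtering logic can be sketched as follows. This is only an illustration, not Idibon’s actual model: it assumes each comment already carries a `sentiment` score in [-1, 1] (from some sentiment classifier) and keeps the most polarized comments, then the most polarized subreddits.

```python
def most_polarized(comments_with_sentiment, per_subreddit=100, n_subreddits=100):
    """Keep the comments whose sentiment is furthest from neutral (0.0),
    then keep the subreddits whose retained comments are most polarized."""
    shortlists = {}
    for sub, comments in comments_with_sentiment.items():
        ranked = sorted(comments, key=lambda c: abs(c["sentiment"]), reverse=True)
        shortlists[sub] = ranked[:per_subreddit]
    top_subs = sorted(
        shortlists,
        key=lambda s: sum(abs(c["sentiment"]) for c in shortlists[s]),
        reverse=True,
    )[:n_subreddits]
    return {s: shortlists[s] for s in top_subs}

# Toy example: "gaming" is far more polarized than "aww".
data = {
    "gaming": [{"sentiment": 0.9}, {"sentiment": -0.8}, {"sentiment": 0.1}],
    "aww": [{"sentiment": 0.1}, {"sentiment": 0.0}, {"sentiment": -0.2}],
}
shortlist = most_polarized(data, per_subreddit=2, n_subreddits=1)
```

Here only "gaming" survives, with its two most strongly-charged comments – the same shape of reduction (fewer subreddits, fewer comments each) described above.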

OH THE HUMANITY! – Human Text Annotation

At Idibon, we have three primary ways of engaging a third party to annotate text: the crowd, a global network of analysts, and experts who are analysts for our clients. In this case, we took our 10,000 comments to the crowd with CrowdFlower, an online human annotation service, where nearly 500 annotators from around the globe labeled our Reddit comments based on our criteria, until each comment had been labeled 3 times.

Analysis

In determining what makes a subreddit community Toxic or Supportive, simply counting the number of Toxic and Supportive comments wouldn’t be sufficient. One of the unique aspects of Reddit is that members of the community have the ability to upvote and downvote comments, which gives us a window into not only what individual commenters are saying, but whether or not and to what extent the community as a whole supports those comments. With this in mind, overall Toxicity/Supportiveness of a subreddit was determined as a function of the scores[1] of all the Toxic and Supportive comments in a subreddit[2].
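The exact formula appeared only as an image in the original post (see footnote 2), so the sketch below is a hypothetical reconstruction, not the published formula. It follows footnote 1 in log-scaling each comment’s net score, and expresses a subreddit’s Toxicity (or Supportiveness) as the log-scaled score mass of labeled comments relative to all comments:

```python
import math

def community_score(comments, label):
    """Share of log-scaled net score (upvotes - downvotes) carried by
    comments with the given label. Hypothetical reconstruction; the
    post's exact formula was shown only as an image."""
    def log_score(c):
        # log-scale because comment scores follow an exponential distribution
        return math.log(1 + max(c["score"], 0))
    labeled = sum(log_score(c) for c in comments if c["label"] == label)
    total = sum(log_score(c) for c in comments)
    return labeled / total if total else 0.0

comments = [
    {"label": "toxic", "score": 7},
    {"label": "neutral", "score": 1},
]
toxicity = community_score(comments, "toxic")  # log(8) / (log(8) + log(2)) = 0.75
```

The key property this captures is the one the post relies on: a heavily upvoted Toxic comment moves the needle far more than a Toxic comment the community ignored or downvoted.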

Results

Here are the results for subreddits plotted by Toxicity and Supportiveness:

In the interactive chart above, the red bubbles represent subreddits that were mentioned in the “What popular subreddit has a really toxic community?” thread with a score greater than 150 (upvotes – downvotes), while those in gray were picked from the top 250 subreddits by subscribers. As we move up and to the right in the chart, subreddits were found to be more Toxic and less Supportive, while those in the bottom left are the least Toxic and most Supportive. Bubbles are sized by the subreddit’s number of subscribers.

So how good was Reddit at picking out its most Toxic communities? Well, it seems they got most of the big ones, with a few exceptions. The winner by far, with 44% Toxicity and 1.7% Supportiveness, was /r/ShitRedditSays, whose mention received 4,234 upvotes in the thread. /r/ShitRedditSays is, somewhat ironically, a subreddit dedicated to finding and discussing bigoted posts around Reddit – where the term “Redditor” is often used as an insult, and the Toxicity was generally directed at the Reddit community at large. However, it’s also important to note that a significant portion of its Toxicity score came from conversations between SRS members and other Redditors who come specifically to disagree and pick fights with the community – a trap that many members tend to fall into, and one which led to some rather nasty and highly unproductive conversations.

While many of the most Toxic subreddits were mentioned in the thread, there were also a number of highly Toxic subreddits that Reddit seemed to miss, such as /r/SubredditDrama, /r/TumblrinAction (a subreddit dedicated to mocking Tumblr – where marginalized groups, particularly LGBTQ, post about their experiences), /r/4chan, and /r/news.

On the other end of the spectrum, it seems that some of the subreddits that were picked out as being Toxic were found to be some of the most Supportive communities by our study. In particular, /r/GetMotivated, with 50% Supportiveness and 6% Toxicity, seemed far from the Toxic community described by /u/LookHardBody as comprised of “two type[s] of people […] The people that post content to motivate others or because it motivated them and commenters who comment why it’s bullshit, stupid and unmotivational because it wasn’t specifically tailored to them.”

However, upon inspection of the data, these types of negative posts certainly were present in /r/GetMotivated as claimed, but they were not supported by the community at large. In fact, the average score for Supportive posts in /r/GetMotivated was 41, while Toxic posts had an average score of only 1.4. Overall, /r/GetMotivated fits in nicely next to /r/loseit and /r/DIY as a subreddit built specifically for members to seek and give advice and support from the community – an unsurprisingly supportive bunch.

Another example of why it’s important to look at comment scores comes when we look at bigotry across subreddits:

Looking specifically at bigoted comments, the importance of taking score into account rather than number of comments becomes even more apparent. For a small number of communities (/r/Libertarian, /r/Jokes, /r/community, and /r/aww) the total aggregated score of comments that our annotators labeled as bigoted was actually negative – so despite having bigoted comments present in their communities, those bigoted comments were rejected by the community as a whole. On the other end of the spectrum we see /r/TheRedPill, a subreddit dedicated to proud male chauvinism[3], where bigoted comments received overwhelming approval from the community at large.
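The distinction above is just a matter of summing net scores instead of counting comments. A minimal sketch (with made-up comment data, not the study’s):

```python
def net_bigotry_score(comments):
    """Aggregate net score (upvotes - downvotes) of comments flagged as
    bigoted. A negative total means the community, on balance, downvoted
    such comments even though they were present."""
    return sum(c["score"] for c in comments if c.get("bigoted"))

# Two hypothetical communities with the same COUNT of bigoted comments (2),
# but opposite community reactions:
rejected = [{"score": -12, "bigoted": True}, {"score": -3, "bigoted": True},
            {"score": 40, "bigoted": False}]
embraced = [{"score": 55, "bigoted": True}, {"score": 8, "bigoted": True}]
```

A pure comment count would rate both communities identically; the net score separates a community that rejects bigotry (total −15) from one that rewards it (total 63).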

Summary

In researching this post, I have delved deep into the darkest recesses of the interwebs, I have read comments that cannot be unread, seen things that cannot be unseen… but for good cause!

Sentiment analysis is only the tip of the iceberg in understanding how people relate to one another, how communities form and what characteristics make up a community abstracted from its individual members. In the case of subreddits, hopefully this post will give you some idea of what communities you’d want to be a part of and which you might want to avoid.

On a broader scale, these methods help answer larger questions like, “How can we build communities that we’re proud of and that encourage effective communication?”, and “How should we structure our discourse so that people really hear one another?” Answering these questions will allow us to strengthen our connections with those around us and improve our daily experiences in an increasingly digital world.

– Ben Bell (@BenSethBell)

P.S. Like this article? Check out our AMA on it!

 

Footnotes

  1. Logged because scores followed an exponential distribution
  2. Specifically, for each subreddit:

    [formula shown as an image in the original post]

  3. Not a place I’d recommend spending your time. If you’d like better reading, I’d recommend my colleague Jana’s recent post on Idibon’s efforts to get more women in tech.

31 thoughts on “Toxicity in Reddit Communities: a Journey to the Darkest Depths of the Interwebs”

    • There was a time when TiA fit your description but the sub has become a place to hate on feminism in general. The OP submissions still follow the original direction of the sub, but If you like to comment on TiA you are probably an asshole.

      And if anyone doesn’t believe me, just follow that link. They’re the kind of people who will call you SJW for saying that genderqueer isn’t made up.

      • Nobodyinparticular says:

        I would have to disagree… scroll down to the very bottom of a popular post and read the downvoted comments, those are the bigoted ones. At least from what I’ve seen, the subreddit is pretty close to the original concept. Occasionally you’ll see things as you have described, but the things that the community supports most (via upvotes) are outrageous examples that should be taken note of. I would agree, however, that amidst the mass of sarcasm and exasperation there are a few legitimate bigots who are most certainly toxic, but they aren’t representative of the community.

        The defining line is the difference between Genderqueer (most certainly a thing) and mayonnaise-gender (not a thing…).

        • personally I’ve found that a lot earlier on the subreddit was full of posts about truly ridiculous stuff found on tumblr. Slowly but surely people started posting more moderate topics (but still with extremist views) until suddenly you get very few posts about otherkin and ridiculous triggers/pronouns in comparison to the amount of posts about radfems. Ridiculous radfems for sure, but it’s pretty much the most popular place to bash feminism on reddit right now

          And also, the comments nowadays are pretty much just the same recycled jokes used over and over, sometimes not even when relevant, with some more knowledgeable people replying from time to time. Compared to TiA of old, the current TiA is pretty shit.

          • vsuperfreckles says:

            I miss the /r/TumblrInAction of old. :( It had a lot of legitimate discussion.

            What this article failed to mention was the fact that there’s a major issue on the tumblr community with bullying, which is part of the inspiration for my posting on TiA. Many good people have been sent horrific messages (being told to kill themselves for being straight, or being called a variety of names because they dared complain about their lives when they were a ‘majority’ race or sexuality). In a video game fan circle of the site, anyone who shipped a canonically straight character as, well, straight, was apparently homophobic and should kill themselves because they were committing “gay erasure.” People have their inboxes flooded with hate comments over the smallest things.

            Tumblr hasn’t really done much to stop this bullying and whenever it’s discussed outside of tumblr you get people who say “oh well it’s a support community for LGBTQ! anyone against it is obviously a bigot!” when clearly they’re not aware of the bigger picture of what’s going on.

            I am glad to see I am right in avoiding /r/gaming though. I couldn’t stand most of the comments that were upvoted on there, and many genuinely well-thought-out comments that weren’t the “popular opinion” would be downvoted to hell. See: the XB1 vs PS4 console war.

      • Fran Kramer says:

        >the sub has become a place to hate on feminism in general

        That doesn’t make you an asshole or a bigot.

        What DOES make you an asshole or a bigot, however, is defending the notion that disliking feminism or disagreeing that genders such as “mayonaise” or “genderqueer” exist is bigotry.

      • It doesn’t hate on feminism in general. It has a lot of respect for 1st wave feminism; most members just feel that 3rd wave/modern feminism has completely lost its path and has turned more into a toxic movement against both men and women.

        Social justice warrior is coined in the sub for the more fanatical irrational aspects, not for merely suggesting that “genderqueer exists”. Plenty of times posts have gone up and members have remarked that the content was reasonable and not to fanatical levels of SJWism

        We have our fair share of bigots and assholes, but typically they get downvoted by our own community

      • > the sub has become a place to hate on feminism in general

        This is the kind of the thing the sub hates on. people being really dramatic and making completely unfounded statements, usually grossly overreaching exaggerations of reality.

        > They’re the kind of people who will call you SJW for saying that genderqueer isn’t made up.

        It blows my mind that you rationalize things like this. You should probably get off the internet for a little while, because it’s completely warped your brain.

        Bickering about terminology with regards to completely made up words is a sure sign that you’re being OVER SENSITIVE.

      • “They’re the kind of people who will call you SJW for saying that genderqueer isn’t made up.”

        Within the framework of feminist gender theory, “genderqueer” can be nothing other than made up. If gender is a social construct (as all of these tumblrites would argue) then identifying with either, both, somewhere in between, or neither of the traditional genders is also socially constructed.

    • That may be the stated goal, but there are a lot of posts that are mainly anti-social justice. Recently there was a post with a worksheet from a racial sensitivity workshop, where the user was asked to examine their privilege. All the top comments decried the concept of privilege and ridiculed the importance of empathy towards minorities. Mind you, this was a very reasonable worksheet. Or look at this, a pamphlet about helping to make a college more inclusive.

      http://www.reddit.com/r/TumblrInAction/comments/2ysi9j/saw_this_at_my_college_today_cant_wait_to_transfer/

      • did you read the comments in the post, the top one pointing out the irony of using a large red fist to promote an “inclusive environment”

        Not everyone believes that the current obsession with identity politics and intangible social constructs – and forcing a captive audience (students) to listen and adhere to a single group’s ideological stance on social issues – is the best approach, or even healthy.

        Just because someone is against one group’s social justice approach does not mean they are against social justice. No group has central, unquestionable authority on issues of social progress, and just because you label your actions “social justice” or “progressive” doesn’t mean they are having that result, or are exempt from having the opposite effect.

    • That’s what I’m wondering. I’m curious if /r/globaloffensive is on most bigoted because an automated program picked up stuff about terrorists and the like, or if because someone saw a bunch of rants about Russians or something

  1. If I use derogatory language against someone who raped a child, and people support my message – would my message be considered toxic and bigoted?

  2. Aleesha Brown says:

    I have to say that I am not at all surprised at /r/ShitRedditSays’s inclusion at the top of the list of most toxic subs. SRS is the only sub on reddit that literally made me feel physically unsafe when I turned around and disagreed with the prevailing ideology. The community, which extends beyond just that single subreddit itself, is one of the most zealous, wild-eyed communities on the internet. Having regrettably been involved in them for some time prior to my… “excommunication,” I can definitely see a comparison, if not kinship, with those anarchists you see on TV that throw rocks and bombs and cause violent havoc during protests and demonstrations. They’re intolerant, they’re hateful, and worst of all, you’ll find they are violent and threatening if you dig underneath the surface into the overall community.

  3. I really enjoyed reading how you went about this data endeavor, and wanted to let you know that I appreciate it. The internet is what you make of it, and this study will help me broaden my browsing horizons without stumbling into a bullying community. Thank You!

  4. I think there may be a flaw in your algorithm.

    Suppose you have two comments:

    1. “OP is an asshole.”
    2. “Joffrey is an asshole.”

    and they are both upvoted equally. Does the algorithm know that #1 is mean towards someone, while #2 is not?

    • I’m going to guess that it’s not a clean algorithm. r/BlackPeopleTwitter and r/HipHopHeads are on the list most certainly due to wannabe and actual black people saying ‘n***a’ and having casual discussions about race. Those two subs should be average if not low scoring on toxicity. r/BPT has a very light, humorous mood.

  5. Hey, after reading your post I am left with a few questions.

    I was wondering how you were able to detect ad hominem attacks and “overt bigotry”. Were you using some kind of NLP, or just the feelings of the people in your AskReddit thread? Looking at your data, it seems that the latter is true. This is unfortunate because, for many of the comments you directly point out in this article, the posters have since added edits saying that the communities they pointed out were not as bad as they had let on.

    Using feelings is an extremely incorrect method of data collection. Any sociologist would tell you that you need to break up your problem and look at it in an imperative and logical manner. If you want to correctly measure “toxicity” you need to break it up into its base elements. Find out what provokes the feelings in redditors that you take issue with.

    Do you have any plans to further update this article, or to renew this research with information collected without bias?

    • It would appear you scrolled down to the data without actually reading how the data was collected, as this is addressed in the article. It’s specifically stated that they did not use the comments/feelings of those who responded to that particular AskReddit question; in fact, their results contradicted some of what people were saying in that particular thread.

      This is how they defined “ad hominem attacks” and “overt bigotry:”

      This is how they defined “supportive” comments:

      This how they collected the data:

      Once the comments were collected, they were annotated according to the criteria defined above.

      Perhaps you still disagree with the methodology. But might I suggest that, next time, you actually read about the methodology before questioning and making assumptions about it?

  6. Karina Dealba-Klein says:

    I really appreciated this article! I had recently become a member of Reddit, and encountered the hate on it super fast. It’s sad people cannot just interact without having to act like such fools. I went ahead and published your article on my site, Leagit.com, so more people can read it! Thanks again for your article.

  7. The most toxic subreddits aren’t those where people write things that others may dislike. The most toxic subreddits are the ones where the downvoting is unjust, and where other forms of censorship are common. Toxicity doesn’t come from what people write; toxicity comes from preventing people from writing in the first place, or punishing them when they have written something.

  8. I think the most fitting addition to the list of toxic subreddits that got missed is definitely /r/subredditdrama. The sub is extraordinarily hostile towards any opinion that diverges from its agenda’d gossip and hatred, especially if it offers better reason or evidence.

    You can have threads where people will circlejerk with no end in sight, but as soon as someone provides credible scientifically supported counterpoint, the hostility and defensiveness that bubbles to the surface is palpable.

    One recent example I remember and was shocked at, but there are oh so many more:

    https://www.reddit.com/r/SubredditDrama/comments/2yuot4/an_rmensrights_mod_notices_a_lot_of_red_pill/cpddx5b?context=1

    The worst part is, they are completely and utterly self-unaware. They have convinced themselves it’s all a harmless joke and that they don’t really care, but the immense amount of anger and outrage just pops out at you in any thread via text.

    They will lambast other people for caring about things, but never so much as notice the fact that they gossip, literally around the clock, routinely stalking individual users and investing enormous amounts of time being angry at the most asinine of jokes and other SJW causes.

    They have no idea how much they resemble their hated targets, like /r/TheRedPill in toxicity and bullshit.

  9. Surprised fatpeoplehate is not listed. Routinely gets to the front page with incredibly critical and somewhat demeaning posts.

  10. FAAAAAAAAAAAAAWK YEAH!!!!

    The red pill,what are they a pill company or somethin’? Tss.

    r/politics=easily the most toxic sub
    r/fatpeoplehate

    the author is so biased

  11. I’m a mod of ELI5 and I’m curious why it wasn’t included in the survey, it’s one of the largest subreddits. Could you run your analysis against it?

  12. How did you take quotes into account? How did you weed out responses to Trolls, aka “Redditors who come specifically to disagree and pick fights with the community?”

  13. Can I ask how /r/mylittlepony compares? I’ve heard they have an excellent reputation for tolerance and positiveness, and I’d like to know if that’s the case.
