Audioburn, 7/17/2015
Back by popular demand, here is The Coontown Breakdown 2: Electric Boogaloo. I'm back with a new sample of size ~1000 (995 to be exact), with debugged code, and with the addition of saved usernames as well as subreddit visit frequency (as opposed to karma scores).
This time around, I invite you all to make your own observations from the data, because, as I've learned from part 1 of this study, my own observations can skew perceptions and conclusions.
/r/Coontown is again omitted from the visualizations (it always comes in at #1 by a long shot in these studies). If you'd like to see it, the raw JSON datasets still include /r/coontown.
Here is the code that made it happen:
import praw
import json

users = []
submissions = []

r = praw.Reddit(user_agent='africanawiki')
subreddit = r.get_subreddit('coontown')

#get submission object ids
for i,submission in enumerate(subreddit.get_hot(limit=350)): #takes around 3-4 hours
    print('getting submission object %s' % (i))
    submissions.append(r.get_submission(
        submission_id=submission.id))

#collect the top-level comments of every submission
root_comments = []
for i,s in enumerate(submissions):
    print('getting comments %s of %s' % (
        i, len(submissions)))
    for c in s.comments:
        root_comments.append(c)

#walk the comment tree and record each distinct author name
def get_comments(comments,level):
    for i,c in enumerate(comments):
        try:
            print('getting comment count: %s in level %s' % (
                i,level))
            if c.author.name not in users:
                users.append(c.author.name)
        except AttributeError:
            #no author: deleted account or a non-comment placeholder
            print('nada')
        if hasattr(c,'replies'):
            #pass level + 1 so sibling comments keep the same level
            get_comments(c.replies,level + 1)

get_comments(root_comments,0)

kb_submissions = {}
kb_comments = {}

for idx,username in enumerate(users):
    try:
        print('getting info for %s, %s of %s' % (username,idx,len(users)))
        user = r.get_redditor(username)
        user_submissions = user.get_submitted(limit=None)
        user_comments = user.get_comments(limit=None)
        for s in user_submissions:
            subreddit = s.subreddit.display_name
            kb_submissions[subreddit] = (
                kb_submissions.get(subreddit, 0) + s.score) #changed s.score to 1 for freq
        for c in user_comments:
            subreddit = c.subreddit.display_name
            kb_comments[subreddit] = (
                kb_comments.get(subreddit, 0) + c.score) #changed c.score to 1 for freq
    except Exception:
        #any failure while fetching is treated as a deleted account
        print('user deleted his/her account, smart')

karma_by_subreddit = {
    'submissions':kb_submissions,
    'comments':kb_comments,
    'users':users,
}

#save object to disk as json
with open('coontown_breakdown.json','w') as fp:
    json.dump(karma_by_subreddit,fp)
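The two "changed ... to 1 for freq" comments in the script are the whole difference between a karma breakdown and a visit-frequency breakdown. Here is a minimal sketch of that switch, assuming plain (subreddit, score) pairs instead of real PRAW objects; the `tally` helper and the sample data are made up for illustration and are not part of the original script.

```python
def tally(items, by_frequency=True):
    """Aggregate (subreddit, score) pairs into a dict keyed by subreddit.

    by_frequency=True adds 1 per item (visit frequency);
    by_frequency=False adds the item's score (karma).
    """
    totals = {}
    for subreddit, score in items:
        increment = 1 if by_frequency else score
        totals[subreddit] = totals.get(subreddit, 0) + increment
    return totals


# Made-up sample: three posts in one subreddit, one in another.
sample = [('pics', 10), ('pics', -4), ('pics', 1), ('news', 250)]

freq = tally(sample, by_frequency=True)    # counts posts per subreddit
karma = tally(sample, by_frequency=False)  # sums scores per subreddit

print(freq)
print(karma)
```

Counting by frequency keeps one highly upvoted post from dominating a subreddit's total, which is presumably why this round uses it.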
Knock yourselves out analysing other subreddits. You can find more at my GitHub repository, which also contains the code for Agile, my open source data visualization app, which I used (together with Highcharts) to build the charts above.
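If you want to poke at the saved JSON yourself, here is a rough sketch of ranking subreddits while leaving out the one being studied, as the charts do. The `load_breakdown` and `top_subreddits` helpers and the sample counts are my own illustrative names and data, not something from the repository; the only thing taken from the script is the file layout (`submissions`, `comments` and `users` keys).

```python
import json


def load_breakdown(path):
    """Read a breakdown file written with json.dump by the script above."""
    with open(path) as fp:
        return json.load(fp)


def top_subreddits(counts, omit=(), n=10):
    """Return the n largest (subreddit, total) pairs, skipping omitted names.

    Names are compared case-insensitively; ties are broken alphabetically.
    """
    omitted = {name.lower() for name in omit}
    kept = [(name, total) for name, total in counts.items()
            if name.lower() not in omitted]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:n]


# Made-up counts standing in for data['comments'] from a real file.
sample_counts = {'studied_sub': 900, 'pics': 40, 'news': 75, 'videos': 40}
ranking = top_subreddits(sample_counts, omit=['Studied_Sub'], n=2)
print(ranking)
```

With a real file, something like `top_subreddits(load_breakdown('coontown_breakdown.json')['comments'], omit=['coontown'])` should give the comment ranking with the studied subreddit removed.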
Here are some links to the raw data (warning: auto-download):
Thanks for viewing. This is the final Coontown analysis I'll be doing. I hope you enjoyed it. More visualizations on other interesting topics coming soon.
code_black() | Authored by Audioburn