How to convert JSON (Twitter Data) to CSV using Python

Question

I am attempting to query the Twitter search engine (search.twitter.com), convert the results into JSON, and then prepare the results as a CSV for a research project. I am a Python novice, but I have managed to code 2/3 of the program myself. However, I am having a difficult time converting my JSON file into CSV format. I have tried various suggested techniques without success. What am I doing wrong here?

Here is what I have so far:

import twitter, os, json, csv

qname = raw_input("Please enter the term(s) you wish to search for: ")
date = int(raw_input("Please enter today's date (no dashes or spaces): "))
nname = raw_input("Please enter a nickname for this query (no spaces): ")
q1 = raw_input("Would you like to set a custom directory? Enter Yes or No: ")

if q1 == 'No' or 'no' or 'n' or 'N':
    dirname = 'C:\Users\isaac\Desktop\TPOP'

elif q1 == 'Yes' or 'yes' or 'y' or 'Y':
    dirname = raw_input("Please enter the directory path:")

ready = raw_input("Are you ready to begin? Enter Yes or No: ")
while ready == 'Yes' or 'yes' or 'y' or 'Y':
    twitter_search = twitter.Twitter(domain = "search.Twitter.com")
search_results = []
for page in range (1,10):
    search_results.append(twitter_search.search(q=qname, rpp=1, page=page))
    ready1 = raw_input("Done! Are you ready to continue? Enter Yes or No: ")
    if ready1 == 'Yes' or 'yes' or 'y' or 'Y':
        break

ready3 = raw_input("Do you want to save output as a file? Enter Yes or No: ")
while ready3 == 'Yes' or 'yes' or 'y' or 'Y':
    os.chdir(dirname)
    filename = 'results.%s.%06d.json' %(nname,date)
    t = open (filename, 'wb+')
    s = json.dumps(search_results, sort_keys=True, indent=2)
    print >> t,s
    t.close()
    ready4 = raw_input("Done! Are you ready to continue? Enter Yes or No: ")
    if ready4 == 'Yes' or 'yes' or 'y' or 'Y':
        break

ready5 = raw_input("Do you want to save output as a csv/excel file? Enter Yes or No: ")
while ready5 == 'Yes' or 'yes' or 'y' or 'Y':
    filename2 = 'results.%s.%06d.csv' %(nname,date)
    z = json.dumps(search_results, sort_keys=True, indent=2)
    x=json.loads(z)

    json_string = z
    json_array = x

    columns = set()
    for entity in json_array:
        if entity == "created_at" or "from_user" or "from_user_id" or "from_user_name" or "geo" or "id" or "id_str" or "iso_language_code" or "text":
            columns.update(set(entity))

    writer = csv.writer(open(filename2, 'wb+'))
    writer.writerow(list(columns))
    for entity in json_array:
        row = []
        for c in columns:
            if c in entity: row.append(str(entity[c]))
            else: row.append('')

"convert the results into json, and then prepare the results as a csv" How exactly is that supposed to work?
What do you want the output to look like? "key1: value1, key2: value2,.." or "key1, key2, key3...\n value1, value2, value3,..." (like column titles separated by newline from values)
My hope is to have a csv with the tweet information as columns (i.e. date, userid, isocode, text), with each new tweet representing a new row. I have seen many examples of converting json to csv: bitcointalk.org/index.php?topic=13550.0
I fixed the code indentation again for you (I think, although it's getting more complicated and harder to guess the indentation you want). Indentation is really important for python and if you want help from the SO community, your best bet is to give us code that actually runs, what you're expecting it to do, and what you're seeing it do instead. To keep the code formatted properly, you should type it straight into a .py file and make sure it runs, then copy and paste it straight here, highlight it, and click the code button (the curly braces).

Mu Mind · Answer 1 · 2012-02-22 03:52:58Z

You have several different problems going on.

First off, the syntax of

x == 'a' or 'b' or 'c'

probably doesn't do what you think it does. You should use

x in ('a', 'b', 'c')

instead.
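
To make the difference concrete, here is a small sketch (Python 3; `broken_is_yes`, `is_yes` and `YES_ANSWERS` are hypothetical names made up for this illustration, not part of the question's code). Python reads the first form as `(x == 'a') or 'b' or 'c'`, and a non-empty string counts as true, so the test passes whatever the user types:

```python
# Hypothetical helper names, for illustration only.
YES_ANSWERS = ('yes', 'y')

def broken_is_yes(answer):
    # Parsed as (answer == 'Yes') or 'yes' or 'y' or 'Y'.
    # The non-empty string 'yes' is truthy, so the result is truthy for any input.
    return answer == 'Yes' or 'yes' or 'y' or 'Y'

def is_yes(answer):
    # Normalize case and surrounding whitespace, then test membership.
    return answer.strip().lower() in YES_ANSWERS

print(bool(broken_is_yes('No')))  # True, even though the user said No
print(is_yes('No'))               # False
print(is_yes('Y'))                # True
```

Lower-casing the answer first also means you don't have to list every capitalization by hand.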

Second, your ready5 variable is only set once, before the loop, so it never changes inside the loop and the loop can't end on its own. Ask again on each pass instead:

while True:
    ready5 = raw_input("Do you want to save output as a csv/excel file? Enter Yes or No: ") 
    if ready5 not in (...):
        break
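
A fuller sketch of that loop shape, assuming Python 3. The function name `prompt_until_no` and its `ask`/`action` parameters are hypothetical and only there so the loop can be exercised without a live prompt; in a real script you would pass `input` (or `raw_input` on Python 2) as `ask`:

```python
def prompt_until_no(ask, action, yes_answers=('yes', 'y')):
    """Re-ask before every pass and stop on the first answer that is not a yes."""
    runs = 0
    while True:
        answer = ask("Do you want to save output as a csv/excel file? Enter Yes or No: ")
        if answer.strip().lower() not in yes_answers:
            break
        action()
        runs += 1
    return runs

# Usage with canned answers standing in for the user:
answers = iter(['Yes', 'y', 'No'])
saved = []
count = prompt_until_no(lambda prompt: next(answers), lambda: saved.append('saved'))
print(count)  # 2
```

If the step should run at most once, a plain `if` around it is simpler than a `while` with a `break`.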

And finally, there's something wrong with your dumping/loading code. What you're getting from twitter should be a JSON string. There's some code you've left out from your question, so I can't tell for sure, but I don't think you want to be using json.dumps at all. You're reading from JSON (using json.loads) and writing to CSV (using csv.writer.writerow).
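
Roughly, the read-JSON/write-CSV combination could look like the sketch below. It assumes Python 3 and a made-up sample string shaped like one page of search results (a `results` list of tweet dicts); the sample data, `FIELDS` and `tweets_to_csv` are all hypothetical, and a real response will carry more fields:

```python
import csv
import io
import json

# Hypothetical sample shaped like one page of search results.
raw = json.dumps({"results": [
    {"from_user": "alice", "created_at": "Wed, 22 Feb 2012 03:00:00 +0000",
     "text": "hello, world", "geo": None},
    {"from_user": "bob", "created_at": "Wed, 22 Feb 2012 03:01:00 +0000",
     "text": "second tweet"},
]})

FIELDS = ["from_user", "created_at", "text"]

def tweets_to_csv(json_string, out, fields=FIELDS):
    """Parse a JSON string and write one CSV row per tweet to the file-like `out`."""
    page = json.loads(json_string)
    # extrasaction="ignore" drops keys not listed in `fields`;
    # restval="" fills in columns a tweet does not have.
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore", restval="")
    writer.writeheader()
    for tweet in page["results"]:
        writer.writerow(tweet)

buffer = io.StringIO()
tweets_to_csv(raw, buffer)
print(buffer.getvalue())
```

To write to disk instead of a buffer, pass a file opened with `open(path, 'w', newline='')`.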

Thank you everyone for the comments! I will try these changes to the code. I actually put the rest of the code up for you to read through. Most of the examples I have seen around the web suggest some variation of the read json/write csv combo. My hope is to have a csv document with all of the basic information from the tweet search (i.e. userid, geoid, iso code, text, etc.). If I do just a generic dump, the formatting seems all messed up.

Mu Mind · Answer 2 · 2012-02-22 15:10:23Z

A different approach would be to have tablib do the actual conversion for you:

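import json
import tablib

data = tablib.Dataset()
# tablib's json setter parses a JSON string, so serialize search_results first
data.json = json.dumps(search_results)
filename = 'results.%s.%06d.csv' % (nname, date)
# 'wb' matches the Python 2 code in the question; the with block closes the file
with open(filename, 'wb') as csv_file:
    csv_file.write(data.csv)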

Does this handle nested data? – Paul Rigor Apr 28 at 16:46

Looks like no, it silently writes garbage (filed a bug: github.com/kennethreitz/tablib/issues/100). But you could probably tweak it to handle 3 dimensions by iterating over the first dimension and writing multiple "Databooks". – Mu Mind Apr 28 at 23:01

There's a better solution (I can't recall the reference) that utilizes some recursion. Here's the link to my updated post: theoryno3.blogspot.com/2013/04/… – Paul Rigor Apr 29 at 20:24
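
For readers wondering what a recursive approach might look like, here is a minimal sketch in Python 3. It is not the code from the linked post; the function name `flatten`, the dotted-key naming scheme and the sample tweet are all assumptions made for illustration:

```python
def flatten(value, prefix=''):
    """Recursively turn nested dicts/lists into one flat dict with dotted keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, prefix + str(key) + '.'))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, prefix + str(index) + '.'))
    else:
        # Leaf value: drop the trailing '.' from the accumulated prefix.
        flat[prefix[:-1]] = value
    return flat

# Hypothetical nested tweet, for illustration only.
tweet = {"id": 1, "geo": {"type": "Point", "coordinates": [40.0, -73.5]}}
print(flatten(tweet))
```

Each flattened dict can then be written as one CSV row, with the dotted keys as column names. Note that empty dicts and lists produce no column at all in this sketch.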

user1224809 · Accepted Answer · 2012-08-19 12:27:51Z

After some searching around, I found the answer here: http://michelleminkoff.com/2011/02/01/making-the-structured-usable-transform-json-into-a-csv/

The code should look something like this (if you are searching with the Python twitter API):

import csv, json

filename2 = '/path/to/my/file.csv'
# Round-trip through JSON so every page is a plain dict/list structure.
parsed_json = json.loads(json.dumps(search_results, sort_keys=True, indent=2))
# Python 2 code, like the rest of the thread; the with block closes the file.
with open(filename2, 'wb') as f:
    writer = csv.writer(f)
    # Loop over every page pulled, then over each tweet on that page.
    for page in parsed_json:
        for tweet in page['results']:
            row = []
            row.append(tweet['from_user'].encode('utf-8'))
            row.append(tweet['created_at'].encode('utf-8'))
            row.append(tweet['text'].encode('utf-8'))
            writer.writerow(row)

Thanks Everyone for the help!

You should accept your own answer to mark the question as answered.

asked	1 year ago
viewed	1251 times
active	10 months ago
