I am attempting to query the Twitter search engine (search.twitter.com), convert the results into JSON, and then prepare the results as a CSV for a research project. I am a Python novice, but I have managed to code 2/3 of the program myself. However, I am having a difficult time converting my JSON file into CSV format. I have tried various suggested techniques without success. What am I doing wrong here?

Here is what I have so far:

import twitter, os, json, csv

qname = raw_input("Please enter the term(s) you wish to search for: ")
date = int(raw_input("Please enter today's date (no dashes or spaces): "))
nname = raw_input("Please enter a nickname for this query (no spaces): ")
q1 = raw_input("Would you like to set a custom directory? Enter Yes or No: ")

if q1 == 'No' or 'no' or 'n' or 'N':
    dirname = 'C:\Users\isaac\Desktop\TPOP'

elif q1 == 'Yes' or 'yes' or 'y' or 'Y':
    dirname = raw_input("Please enter the directory path:")

ready = raw_input("Are you ready to begin? Enter Yes or No: ")
while ready == 'Yes' or 'yes' or 'y' or 'Y':
    twitter_search = twitter.Twitter(domain = "search.Twitter.com")
search_results = []
for page in range (1,10):
    search_results.append(twitter_search.search(q=qname, rpp=1, page=page))
    ready1 = raw_input("Done! Are you ready to continue? Enter Yes or No: ")
    if ready1 == 'Yes' or 'yes' or 'y' or 'Y':
        break

ready3 = raw_input("Do you want to save output as a file? Enter Yes or No: ")
while ready3 == 'Yes' or 'yes' or 'y' or 'Y':
    os.chdir(dirname)
    filename = 'results.%s.%06d.json' %(nname,date)
    t = open (filename, 'wb+')
    s = json.dumps(search_results, sort_keys=True, indent=2)
    print >> t,s
    t.close()
    ready4 = raw_input("Done! Are you ready to continue? Enter Yes or No: ")
    if ready4 == 'Yes' or 'yes' or 'y' or 'Y':
        break

ready5 = raw_input("Do you want to save output as a csv/excel file? Enter Yes or No: ")
while ready5 == 'Yes' or 'yes' or 'y' or 'Y':
    filename2 = 'results.%s.%06d.csv' %(nname,date)
    z = json.dumps(search_results, sort_keys=True, indent=2)
    x=json.loads(z)

    json_string = z
    json_array = x

    columns = set()
    for entity in json_array:
        if entity == "created_at" or "from_user" or "from_user_id" or "from_user_name" or "geo" or "id" or "id_str" or "iso_language_code" or "text":
            columns.update(set(entity))

    writer = csv.writer(open(filename2, 'wb+'))
    writer.writerow(list(columns))
    for entity in json_array:
        row = []
        for c in columns:
            if c in entity: row.append(str(entity[c]))
            else: row.append('')
And what's the problem you're seeing? – Mu Mind Feb 22 '12 at 3:38
"convert the results into json, and then prepare the results as a csv" How exactly is that supposed to work? – Ignacio Vazquez-Abrams Feb 22 '12 at 3:39
What do you want the output to look like? "key1: value1, key2: value2,.." or "key1, key2, key3...\n value1, value2, value3,..." (like column titles separated by newline from values) – platinummonkey Feb 22 '12 at 3:41
My hope is to have a csv with the tweet information as columns (i.e. date, userid, isocode, text), with each new tweet representing a new row. I have seen many examples of converting json to csv: bitcointalk.org/index.php?topic=13550.0 – user1224809 Feb 22 '12 at 12:30
I fixed the code indentation again for you (I think, although it's getting more complicated and harder to guess the indentation you want). Indentation is really important for python and if you want help from the SO community, your best bet is to give us code that actually runs, what you're expecting it to do, and what you're seeing it do instead. To keep the code formatted properly, you should type it straight into a .py file and make sure it runs, then copy and paste it straight here, highlight it, and click the code button (the curly braces). – Mu Mind Feb 22 '12 at 15:28

3 Answers

You have several different problems going on.

First off, the syntax of

x == 'a' or 'b' or 'c'

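probably doesn't do what you think it does. Python parses it as (x == 'a') or 'b' or 'c', and a non-empty string such as 'b' is always truthy, so the whole expression is true no matter what x is. You should use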

x in ('a', 'b', 'c')

instead.
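
To make the difference concrete, here is a minimal sketch. The helper names broken_check and fixed_check are made up for illustration and do not appear in the question's code:

```python
def broken_check(x):
    # Parsed as (x == 'Yes') or 'yes' or 'y' or 'Y'.
    # When the comparison is False, `or` falls through to the next operand,
    # the non-empty string 'yes', which is truthy.
    return x == 'Yes' or 'yes' or 'y' or 'Y'


def fixed_check(x):
    # Membership test: True only when x equals one of the listed answers.
    return x in ('Yes', 'yes', 'y', 'Y')


print(bool(broken_check('No')))  # True, even though the user said No
print(fixed_check('No'))         # False
print(fixed_check('yes'))        # True
```

This is why every if/while in the question behaves as if the user had answered yes.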

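Second, your ready5 variable is never reassigned inside the loop, so the loop condition never changes and the loop can only end through the break. Ask the question inside the loop instead. Try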

while True:
    ready5 = raw_input("Do you want to save output as a csv/excel file? Enter Yes or No: ") 
    if ready5 not in (...):
        break

And finally, there's something wrong with your dumping/loading code. What you're getting from twitter should be a JSON string. There's some code you've left out from your question, so I can't tell for sure, but I don't think you want to be using json.dumps at all. You're reading from JSON (using json.loads) and writing to CSV (using csv.writer.writerow).
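
As a rough sketch of that read-JSON, write-CSV flow: the function name pages_to_csv and the sample data below are invented for illustration, and the page layout (a list of pages, each with a 'results' list) is an assumption about what the search returns. The sketch is written for Python 3; on Python 2 the csv module would want a file opened in 'wb' mode rather than a text buffer.

```python
import csv
import io
import json

# Columns to keep; the names are taken from the question's field list.
FIELDS = ["from_user", "created_at", "text"]


def pages_to_csv(json_text, out, fields=FIELDS):
    """Write one CSV row per tweet found in a JSON list of result pages."""
    pages = json.loads(json_text)
    # restval fills in missing keys, extrasaction drops keys we did not ask for.
    writer = csv.DictWriter(out, fieldnames=fields, restval="",
                            extrasaction="ignore")
    writer.writeheader()
    for page in pages:
        for tweet in page.get("results", []):
            writer.writerow(tweet)


# Made-up sample standing in for the real search response.
sample = json.dumps([
    {"results": [{"from_user": "alice", "created_at": "Wed, 22 Feb 2012",
                  "text": "hello, world", "id": 1}]},
    {"results": [{"from_user": "bob", "text": "no date here"}]},
])

buf = io.StringIO()
pages_to_csv(sample, buf)
print(buf.getvalue())
```

csv.DictWriter is used here so that each tweet dict maps onto named columns and missing fields become empty cells instead of raising an error.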

Thank you everyone for the comments! I will try these changes to the code. I actually put the rest of the code up for you to read through. Most of the examples I have seen around the web suggest some variation of the read json/write csv combo. My hope is to have a csv document with all of the basic information from the tweet search (i.e. userid, geoid, iso code, text, etc.). If I do just a generic dump, the formatting seems all messed up. – user1224809 Feb 22 '12 at 12:25

A different approach would be to have tablib do the actual conversion for you:

import json
import tablib

data = tablib.Dataset()
# the json property expects a JSON string, not Python objects
data.json = json.dumps(search_results)
filename = 'results.%s.%06d.csv' % (nname, date)
with open(filename, 'wb') as csv_file:
    csv_file.write(data.csv)
Does this handle nested data? – Paul Rigor Apr 28 at 16:46
Looks like no, it silently writes garbage (filed a bug: github.com/kennethreitz/tablib/issues/100). But you could probably tweak it to handle 3 dimensions by iterating over the first dimension and writing multiple "Databooks". – Mu Mind Apr 28 at 23:01
There's a better solution (I can't recall the reference) that utilizes some recursion. Here's the link to my updated post: theoryno3.blogspot.com/2013/04/… – Paul Rigor Apr 29 at 20:24
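
For readers wondering what a recursive approach to nested data might look like, here is a minimal sketch. It is not the code from the linked post; the function name flatten and the dotted-key naming scheme are assumptions made for illustration, and the sample tweet is made up.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted keys."""
    flat = {}
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        # Use list positions as key parts: coordinates.0, coordinates.1, ...
        items = enumerate(value)
    else:
        # Scalar: store it under the key path built so far.
        flat[prefix] = value
        return flat
    for key, child in items:
        child_key = "%s.%s" % (prefix, key) if prefix else str(key)
        flat.update(flatten(child, child_key))
    return flat


# Made-up tweet with a nested 'geo' field.
tweet = {"from_user": "alice",
         "geo": {"type": "Point", "coordinates": [1.5, 2.5]}}
flat_tweet = flatten(tweet)
print(sorted(flat_tweet.items()))
```

Each flattened dict has only scalar values, so it can be written as a single CSV row with the dotted keys as column names.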
up vote 0 down vote accepted

After some searching around, I found the answer here: http://michelleminkoff.com/2011/02/01/making-the-structured-usable-transform-json-into-a-csv/

The code should look something like this (if you are searching with the Python twitter API):

import csv, json

filename2 = '/path/to/my/file.csv'
z = json.dumps(search_results, sort_keys=True, indent=2)
parsed_json = json.loads(z)
# Python 2's csv module wants the file opened in binary mode
with open(filename2, 'wb') as f:
    writer = csv.writer(f)
    # parsed_json is a list of result pages; loop over every page that was pulled
    for page in parsed_json:
        for tweet in page['results']:
            row = []
            # encode unicode fields as UTF-8 byte strings for the Python 2 csv writer
            row.append(tweet['from_user'].encode('utf-8'))
            row.append(tweet['created_at'].encode('utf-8'))
            row.append(tweet['text'].encode('utf-8'))
            writer.writerow(row)

Thanks Everyone for the help!

You should accept your own answer to mark the question as answered. – j0k Aug 20 '12 at 9:48
Sorry! I did not know how to do that. Made the change now. – user1224809 Aug 21 '12 at 13:50
