(cache) Examples

Examples

The -H flag in the examples below signifies that the file has a header row which is used for naming columns.

The -t flag is just a shortcut for saying that the file is a tab-separated file (any delimiter is supported - Use the -d flag).

Queries are given using upper case for clarity, but actual query keywords such as SELECT and WHERE are not really case sensitive.

Examples

Example 1 - COUNT DISTINCT values of specific field (uuid of clicks data)

q -H -t "SELECT COUNT(DISTINCT(uuid)) FROM ./clicks.csv"

Output 1:

Example 2 - Filter numeric data, controlling ORDERing and LIMITing output

Note that q understands that the column is numeric and filters according to its numeric value (real numeric value comparison, not string comparison).

q -H -t "SELECT request_id,score FROM ./clicks.csv WHERE score > 0.7 ORDER BY score DESC LIMIT 5"

Output 2:

2cfab5ceca922a1a2179dc4687a3b26e    1.0
        f6de737b5aa2c46a3db3208413a54d64    0.986665809568
        766025d25479b95a224bd614141feee5    0.977105183282
        2c09058a1b82c6dbcf9dc463e73eddd2    0.703255121794

Example 3 - Illustrate GROUP BY

q -t -H "SELECT hashed_source_machine,count(*) FROM ./clicks.csv GROUP BY hashed_source_machine"

Output 3:

47d9087db433b9ba.domain.com 400000

Example 4 - More complex GROUP BY (group by time expression)

q -t -H "SELECT strftime('%H:%M',date_time) hour_and_minute,count(*) FROM ./clicks.csv GROUP BY hour_and_minute"

Output 4:

07:00   138148
        07:01   140026
        07:02   121826

Example 5 - Read input from standard input

Calculates the total size per user/group in the /tmp subtree.

sudo find /tmp -ls | q "SELECT c5,c6,sum(c7)/1024.0/1024 AS total FROM - GROUP BY c5,c6 ORDER BY total desc"

Output 5:

mapred hadoop   304.00390625
        root   root     8.0431451797485
        smith  smith    4.34389972687

Example 6 - Use column names from header row

Calculate the top 3 user ids with the largest number of owned processes, sorted in descending order.

Note the usage of the autodetected column name UID in the query.

ps -ef | q -H "SELECT UID,COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"

Output 6:

root 152
        harel 119
        avahi 2

Example 7 - JOIN two files

The following command joins an ls output (exampledatafile) and a file containing rows of group-name,email (group-emails-example) and provides a row of filename,email for each of the emails of the group. For brevity of output, there is also a filter for a specific filename called ppp which is achieved using a WHERE clause.

q "SELECT myfiles.c8,emails.c2 FROM exampledatafile myfiles JOIN group-emails-example emails ON (myfiles.c4 = emails.c1) WHERE myfiles.c8 = 'ppp'"

Output 7:

ppp dip.1@otherdomain.com
        ppp dip.2@otherdomain.com

You can see that the ppp filename appears twice, each time matched to one of the emails of the group dip to which it belongs. Take a look at the files exampledatafile and group-emails-example for the data.

Column name detection is supported for JOIN scenarios as well. Just specify -H in the command line and make sure that the source files contain the header rows.

q

q - Text as Data

General

Documentation

Behind the Scenes

Author

Examples

General

Documentation

Behind the Scenes

Author

Social

Examples