Sunday, August 30, 2009

Linux comm command brief tutorial


From the COMM(1) man page, the available options are:

-1 suppress lines unique to FILE1
-2 suppress lines unique to FILE2
-3 suppress lines that appear in both files

Input files:

$ cat a.txt
sl-9023
sl-2112
sl-9029
sl-1210
sl-1215

$ cat b.txt
sl-9029
sl-9023
sl-1215
sl-2112
sl-9012
sl-9016


1) Find only the lines which are common to both files

$ comm -12 a.txt b.txt
sl-9029

But the above output is wrong: a.txt and b.txt actually have 4 lines in common. The reason is that comm expects both input files to be sorted.

As the Linux COMM(1) man page says:

comm - compare two sorted files line by line

Let's sort the files first:

$ sort -o /tmp/a.txt.srt a.txt
$ sort -o /tmp/b.txt.srt b.txt

Now the result is correct:

$ comm -12 /tmp/a.txt.srt /tmp/b.txt.srt
sl-1215
sl-2112
sl-9023
sl-9029

Or, using bash process substitution (without creating the temporary files):

$ comm -12 <(sort a.txt) <(sort b.txt)
sl-1215
sl-2112
sl-9023
sl-9029
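
Process substitution is not available in every shell. As a minimal sketch for a plain POSIX sh, one sorted input can be piped to comm as "-" (standard input), so only the other file needs a temporary copy. The scratch directory and the variable names below are made up for illustration, and LC_ALL=C is set on the assumption that sort and comm should share one collation order.

```shell
#!/bin/sh
# Recreate the sample files in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
printf '%s\n' sl-9023 sl-2112 sl-9029 sl-1210 sl-1215 > "$workdir/a.txt"
printf '%s\n' sl-9029 sl-9023 sl-1215 sl-2112 sl-9012 sl-9016 > "$workdir/b.txt"

# Use the same collation order for sort and comm.
LC_ALL=C
export LC_ALL

# Only b.txt needs a sorted copy on disk; sorted a.txt arrives on stdin as "-".
sort "$workdir/b.txt" > "$workdir/b.sorted"
common=$(sort "$workdir/a.txt" | comm -12 - "$workdir/b.sorted")
printf '%s\n' "$common"

# Clean up the scratch directory.
rm -rf "$workdir"
```

This should print the same four common lines as the bash version above.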

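An alternative for finding the common lines is grep with the -f option; the output follows the order of b.txt rather than sorted order. From the Linux GREP(1) man page:
-f FILE, --file=FILE (Obtain patterns from FILE, one per line)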

$ grep -f a.txt b.txt
sl-9029
sl-9023
sl-1215
sl-2112
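
One caveat with grep -f: each pattern is treated as a regular expression and may match anywhere in a line, so a short pattern can match longer lines. The sketch below uses made-up data (patterns.txt and data.txt are hypothetical, not the files from this post) to show how -F (fixed strings) and -x (whole-line match) restrict grep to exact line matches.

```shell
#!/bin/sh
# Hypothetical sample data, chosen so that one pattern is a prefix of other lines.
workdir=$(mktemp -d)
printf '%s\n' sl-90 sl-1215 > "$workdir/patterns.txt"
printf '%s\n' sl-9029 sl-1215 sl-9012 > "$workdir/data.txt"

# Plain -f: "sl-90" also matches sl-9029 and sl-9012 as a substring.
loose=$(grep -f "$workdir/patterns.txt" "$workdir/data.txt")

# -F -x: patterns are literal strings and must match the whole line.
exact=$(grep -F -x -f "$workdir/patterns.txt" "$workdir/data.txt")

printf 'loose:\n%s\n' "$loose"
printf 'exact:\n%s\n' "$exact"

rm -rf "$workdir"
```

With this data the loose match returns all three lines of data.txt, while the exact match returns only sl-1215.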

2) Find lines which are unique to the first file (a.txt), i.e. not present in the second file (b.txt)

$ comm -23 <(sort a.txt) <(sort b.txt)
sl-1210

3) Find lines which are unique to the second file (b.txt), i.e. not present in the first file (a.txt)

$ comm -13 <(sort a.txt) <(sort b.txt)
sl-9012
sl-9016

With no options, comm produces three-column output, with the columns indented by tabs.
Column one contains lines unique to FILE1,
column two contains lines unique to FILE2,
and column three contains lines common to both files.

$ comm <(sort a.txt) <(sort b.txt)
sl-1210
                sl-1215
                sl-2112
        sl-9012
        sl-9016
                sl-9023
                sl-9029
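
Because the column is encoded only by leading tabs, the three-column output can be hard to read. As a rough sketch, awk can split each line on tabs and use the field count to label it; the labels "only-a", "only-b" and "both" are invented here, and the approach assumes the input lines themselves contain no tab characters.

```shell
#!/bin/sh
# Recreate the sample files in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
printf '%s\n' sl-9023 sl-2112 sl-9029 sl-1210 sl-1215 > "$workdir/a.txt"
printf '%s\n' sl-9029 sl-9023 sl-1215 sl-2112 sl-9012 sl-9016 > "$workdir/b.txt"

# Use the same collation order for sort and comm.
LC_ALL=C
export LC_ALL

sort "$workdir/a.txt" > "$workdir/a.sorted"
sort "$workdir/b.txt" > "$workdir/b.sorted"

# Zero leading tabs -> 1 field, one tab -> 2 fields, two tabs -> 3 fields.
labelled=$(comm "$workdir/a.sorted" "$workdir/b.sorted" |
  awk -F '\t' '{
    if (NF == 1)      label = "only-a"
    else if (NF == 2) label = "only-b"
    else              label = "both"
    print label ": " $NF
  }')
printf '%s\n' "$labelled"

rm -rf "$workdir"
```

Each output line then carries its column as a readable prefix, for example "only-a: sl-1210" or "both: sl-1215".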


© Jadu Saikia www.UNIXCL.com