199

What is the easiest way (using a graphical tool or command line on Ubuntu Linux) to know if two binary files are the same or not (except for the time stamps)? I do not need to actually extract the difference. I just need to know whether they are the same or not.

  • 2
    The man page for cmp specifically says it does a byte by byte comparison so that is my default for 2 binary files. diff is line by line and will give you the same Yes/No answer but of course not the same dump to the standard out stream. If the lines are long because perhaps they are not text files then I would prefer cmp. diff has the advantage that you can specify a comparison of directories and the -r for recursion thereby comparing multiple files in one command. – H2ONaCl Dec 24 '16 at 8:07

14 Answers

193

The standard Unix diff will show whether the files are the same or not:

[me@host ~]$ diff 1.bin 2.bin
Binary files 1.bin and 2.bin differ

If there is no output from the command, it means that the files have no differences.

  • 5
    diff seems to have problems with really large files. I got a diff: memory exhausted when comparing two 13G files. – Yongwei Wu Sep 28 '16 at 8:45
  • 1
    Interesting output. diff is telling you they are "binary" files. Since all files can be considered to be binary that's a strange assertion. – H2ONaCl Dec 24 '16 at 8:13
  • 9
    You can report identical files with option: diff -s 1.bin 2.bin or diff --report-identical-files 1.bin 2.bin This shows Files 1.bin and 2.bin are identical – Tom Kuschel Jul 20 '17 at 10:44
  • 1
    No, it will say that they are "differ", so they are not the same – Josef Klimuk Mar 20 '18 at 13:31
  • 1
    I have two executables, I know they are different because I compiled and ran them, but all options of diff and cmp given here judge them identical. Why? !!! – mirkastath Feb 28 '19 at 2:14
112

Use the cmp command. It exits silently with status 0 if the files are byte-for-byte identical; otherwise it prints where the first difference occurs and exits with status 1.
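A minimal sketch of how that exit status could be used in a script. The function name same_content is made up for illustration; it relies only on cmp -s (silent mode) and on cmp's documented exit statuses: 0 for identical files, 1 for differing files, and a value greater than 1 when something went wrong, such as a missing file.

```shell
#!/bin/sh
# same_content FILE1 FILE2  (hypothetical helper name, not part of cmp)
# Prints a one-word verdict and returns cmp's exit status.
same_content() {
    status=0
    # -s suppresses the normal "differ: byte N" message.
    cmp -s -- "$1" "$2" || status=$?
    case $status in
        0) echo "identical" ;;
        1) echo "different" ;;
        *) echo "error" ;;      # e.g. a file is missing or unreadable
    esac
    return "$status"
}
```

Called as same_content 1.bin 2.bin, it prints one word and can also be used directly in an if statement.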

  • 9
    For the use case the OP describes IMHO cmp is more efficient than diff. So I'd prefer this. – halloleo Dec 18 '13 at 5:41
  • 5
    I have a shell script that runs: cmp $1 $2 && echo "identical" || echo "different" – steveha Dec 14 '14 at 2:01
  • 2
    Does cmp stop when it finds the first difference and display it, or does it go through to the end of the files? – sop Oct 25 '16 at 8:10
  • cmp has "silent" mode: -s, --quiet, --silent - suppress all normal output. I didn't test yet but I think that it will stop at the first difference if there is one. – Victor Yarema Nov 22 '16 at 5:21
102

I found Visual Binary Diff was what I was looking for, available on:

  • Ubuntu:

    sudo apt install vbindiff
    
  • Arch Linux:

    sudo pacman -S vbindiff
    
  • Mac OS X via MacPorts:

    port install vbindiff
    
  • Mac OS X via Homebrew:

    brew install vbindiff
    
  • 3
    Nice... I /thought/ I only wanted to know whether the files differed; but being able to see the exact differences easily was a lot more useful. It tended to segfault when I got to the end of the file, but never mind, it still worked. – Jeremy Oct 28 '16 at 2:42
  • 3
    It's been said a few times, but this is a great little program! (fyi also on homebrew) – johncip Feb 19 '17 at 22:59
  • 3
    This should be the accepted answer as it's a far superior method than the bland and unhelpful output of the canonical diff command. – Gearoid Murphy Nov 7 '18 at 0:20
  • 2
    This is the best tool for binary diff. – Carla Camargo Jun 3 '19 at 13:22
18

Use sha1 to generate a checksum for each file, then compare the two checksums (on Ubuntu the command is sha1sum):

sha1 [FILENAME1]
sha1 [FILENAME2]
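Comparing the two digests by eye is error-prone, so here is a rough sketch that does it in a script. It assumes the coreutils sha1sum command found on Ubuntu (sha256sum has the same interface and could be swapped in); the function name same_digest is hypothetical. Equal digests are very strong evidence that the files are identical, but not a proof, whereas cmp gives a definite answer.

```shell
#!/bin/sh
# same_digest FILE1 FILE2  (hypothetical helper name)
# Returns 0 when both files have the same SHA-1 digest, 1 when the
# digests differ, and 2 when a file cannot be read.
same_digest() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # sha1sum prints "<digest>  <name>"; cut keeps only the digest.
    d1=$(sha1sum < "$1" | cut -d ' ' -f 1)
    d2=$(sha1sum < "$2" | cut -d ' ' -f 1)
    [ "$d1" = "$d2" ]
}
```

For example: same_digest FILENAME1 FILENAME2 && echo "same digest".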
13

I ended up using hexdump to convert the binary files to their hex representation and then opened them in meld / kompare / any other diff tool. Unlike you, I was after the differences in the files.

hexdump tmp/Circle_24.png > tmp/hex1.txt
hexdump /tmp/Circle_24.png > tmp/hex2.txt

meld tmp/hex1.txt tmp/hex2.txt
  • 1
    Use hexdump -v -e '/1 "%02x\n"' if you want to diff and see exactly which bytes were inserted or removed. – William Entriken Mar 17 '17 at 21:13
  • Meld also works with binary files when they aren't converted to hex first. It shows hex values for things which aren't in the char set, otherwise normal chars, which is useful with binary files that also contain some ascii text. Many do, at least begin with a magic string. – Felix Dombek Jul 12 '19 at 11:20
7

You can use the MD5 hash function to check whether two files are the same. It won't show you the differences at a low level, but it is a quick way to compare two files. (On Ubuntu the command is md5sum.)

md5 <filename1>
md5 <filename2>

If both MD5 hashes (the command output) are the same, the two files are almost certainly identical.

  • 7
    Can you explain your down votes please? SHA1 has 4 upvotes, and if the OP thinks there's a chance the two files could be the same or similar, the chances of a collision are slight and not worthy of down voting MD5 but up voting SHA1 other than because you heard you should hash your passwords with SHA1 instead of MD5 (that's a different problem). – Rikki Jan 16 '16 at 1:10
  • 3
    not sure about the reason but a pure cmp will be more efficient than computing any hash function of files and comparing them (at least for only 2 files) – Paweł Szczur Apr 26 '16 at 13:58
  • 1
    if the two files are large and on the same disk (not ssd), the md5 or sha* variant might be faster because the disks can read the two files sequentially which saves lots of head movements – Daniel Alder Feb 22 '17 at 20:08
  • 8
    I downvoted because you posted a minor variant of an earlier (bad) solution, when it should have been a comment. – johncip Mar 6 '17 at 10:07
6

Use the cmp command. Refer to Binary Files and Forcing Text Comparisons for more information.

cmp -b file1 file2
  • 1
    -b doesn't compare files in "binary mode". It actually "With GNU cmp, you can also use the -b or --print-bytes option to show the ASCII representation of those bytes.". This is exactly what I found using URL to manual that you have provided. – Victor Yarema Nov 22 '16 at 5:28
  • Victor Yarema, I don't know what you mean by "binary mode". cmp is inherently a binary comparison in my opinion. The -b option merely prints the first byte that is different. – H2ONaCl Dec 24 '16 at 8:25
4

For finding flash memory defects, I had to write this script, which shows all 1K blocks that contain differences (not only the first one, as cmp -b does):

#!/bin/sh
# Report every 1K block in which the two files differ.

f1=testinput.dat
f2=testoutput.dat

size=$(stat -c%s "$f1")   # size of the first file in bytes (GNU stat)
i=0
while [ "$i" -lt "$size" ]; do
  # Compare 1024 bytes starting at offset $i; cmp exits non-zero on a difference.
  if ! r=$(cmp -n 1024 -i "$i" -b "$f1" "$f2"); then
    printf "%8x: %s\n" "$i" "$r"
  fi
  i=$((i + 1024))
done

Output:

   2d400: testinput.dat testoutput.dat differ: byte 3, line 1 is 200 M-^@ 240 M- 
   2dc00: testinput.dat testoutput.dat differ: byte 8, line 1 is 327 M-W 127 W
   4d000: testinput.dat testoutput.dat differ: byte 37, line 1 is 270 M-8 260 M-0
   4d400: testinput.dat testoutput.dat differ: byte 19, line 1 is  46 &  44 $

Disclaimer: I hacked the script in 5 min. It doesn't support command line arguments; the file names are hard-coded in f1 and f2.
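A similar per-block report can be sketched with a single cmp run. This is an alternative approach, not the script above: cmp -l lists every differing byte as a 1-based decimal offset followed by the two byte values in octal, and awk folds those offsets into the start of each 1K block. The function name diff_blocks is hypothetical, and unlike the script above it prints only the block offsets.

```shell
#!/bin/sh
# diff_blocks FILE1 FILE2  (hypothetical helper name)
# Prints the hex start offset of every 1 KiB block that contains
# at least one differing byte.
diff_blocks() {
    # cmp -l exits with 1 when the files differ, which is expected here.
    # stderr is discarded to hide the "EOF on ..." note for unequal sizes.
    cmp -l -- "$1" "$2" 2>/dev/null | awk '
        {
            block = int(($1 - 1) / 1024)       # 0-based block index
            if (!(block in seen)) {
                seen[block] = 1
                printf "%8x\n", block * 1024   # block start offset in hex
            }
        }'
}
```

It would be called as diff_blocks testinput.dat testoutput.dat. Reading each file once tends to be cheaper than starting cmp for every block, at the cost of the per-byte detail that cmp -b prints.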

4

diff with the following options only checks whether the files differ at all (-q), and also reports when they are identical (-s):

diff -qs {file1} {file2}

If you are comparing two files with the same name in different directories, you can use this form instead:

diff -qs {file1} --to-file={dir2}

Tested on OS X El Capitan.
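The same options extend to whole directories. A small sketch, assuming two directory trees that should hold the same files: -r makes diff descend into subdirectories, and the exit status is 0 only if every file matches. The function name same_tree is hypothetical.

```shell
#!/bin/sh
# same_tree DIR1 DIR2  (hypothetical helper name)
# Succeeds when both directory trees hold the same files with the same content.
same_tree() {
    # -q: report only whether files differ; -r: recurse into subdirectories.
    # Output is discarded because only the exit status is needed here.
    diff -qr -- "$1" "$2" > /dev/null
}
```

For example: same_tree dir1 dir2 && echo "trees match". Dropping the redirection to /dev/null lists which files differ or exist on only one side.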

3

Try diff -s

Short answer: run diff with the -s switch.

Long answer: read on below.


Here's an example. Let's start by creating two files with random binary contents:

$ dd if=/dev/random bs=1k count=1 of=test1.bin
1+0 records in
1+0 records out
1024 bytes (1,0 kB, 1,0 KiB) copied, 0,0100332 s, 102 kB/s

                                                                                  
$ dd if=/dev/random bs=1k count=1 of=test2.bin
1+0 records in
1+0 records out
1024 bytes (1,0 kB, 1,0 KiB) copied, 0,0102889 s, 99,5 kB/s

Now let's make a copy of the first file:

$ cp test1.bin copyoftest1.bin

Now test1.bin and test2.bin should be different:

$ diff test1.bin test2.bin
Binary files test1.bin and test2.bin differ

... and test1.bin and copyoftest1.bin should be identical:

$ diff test1.bin copyoftest1.bin

But wait! Why is there no output?!?

The answer is: this is by design. There is no output on identical files.

But the exit codes are different:

$ diff test1.bin test2.bin
Binary files test1.bin and test2.bin differ

$ echo $?
1


$ diff test1.bin copyoftest1.bin

$ echo $?
0

Fortunately, you don't have to check exit codes every time: the -s (or --report-identical-files) switch makes diff more verbose:

$ diff -s test1.bin copyoftest1.bin
Files test1.bin and copyoftest1.bin are identical
2

Radiff2 is a tool designed to compare binary files, similar to how regular diff compares text files.

Try radiff2, which is part of the radare2 disassembler. For instance, with this command:

radiff2 -x file1.bin file2.bin

You get nicely formatted two-column output with the differences highlighted.

1

My favourites use the xxd hex dumper from the vim package:

1) using vimdiff (part of vim)

#!/bin/bash
FILE1="$1"
FILE2="$2"
vimdiff <( xxd "$FILE1" ) <( xxd "$FILE2" )

2) using diff

#!/bin/bash
FILE1="$1"
FILE2="$2"
diff -W 140 -y <( xxd "$FILE1" ) <( xxd "$FILE2" ) | colordiff | less -R -p '  \|  '
0
md5sum binary1 binary2

If the MD5 sums are the same, the binaries are almost certainly the same.

E.g.:

md5sum new*
89c60189c3fa7ab5c96ae121ec43bd4a  new.txt
89c60189c3fa7ab5c96ae121ec43bd4a  new1.txt
root@TinyDistro:~# cat new*
aa55 aa55 0000 8010 7738
aa55 aa55 0000 8010 7738


root@TinyDistro:~# cat new*
aa55 aa55 000 8010 7738
aa55 aa55 0000 8010 7738
root@TinyDistro:~# md5sum new*
4a7f86919d4ac00c6206e11fca462c6f  new.txt
89c60189c3fa7ab5c96ae121ec43bd4a  new1.txt
  • 1
    Not quite. Only the possibility is high. – sawa Jan 25 '19 at 5:37
  • What is the probability of failing ? – ashish Jan 25 '19 at 6:08
  • Slim, but worse than using some variant of diff, over which there is no reason to prefer it. – sawa Jan 25 '19 at 6:24
  • You would have to change MD5 hash to SHA2 in order for this advice to be practical. Anyone's laptop can these days generate collision in MD5 and based on this single collision prefix (2 files of the same size, same prefix and same MD5) to generate infinite number of colliding files (having same prefix, different colliding block, same suffix) – Michal Ambroz Apr 23 '19 at 18:11
-1

There is a relatively simple way to check if two binary files are the same.

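If you use file input/output in a programming language, you can read the bytes of each binary file into its own array.

At this point the check is as simple as:

// file1 and file2 hold the bytes read from each file.
// In many languages != on arrays compares references, so use an element-wise comparison there.
if(file1 != file2){
    //do this
}else{
    //do that
}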
