What is the easiest way (using a graphical tool or command line on Ubuntu Linux) to know if two binary files are the same or not (except for the time stamps)? I do not need to actually extract the difference. I just need to know whether they are the same or not.
The standard Unix diff will show if the files are the same or not:
[me@host ~]$ diff 1.bin 2.bin
Binary files 1.bin and 2.bin differ
If there is no output from the command, it means that the files have no differences.
-
diff seems to have problems with really large files. I got a "diff: memory exhausted" error when comparing two 13G files. – Yongwei Wu Sep 28 '16 at 8:45
-
Interesting output. diff is telling you they are "binary" files. Since all files can be considered to be binary, that's a strange assertion. – H2ONaCl Dec 24 '16 at 8:13
-
You can report identical files with the option:
diff -s 1.bin 2.bin
or
diff --report-identical-files 1.bin 2.bin
This shows:
Files 1.bin and 2.bin are identical
– Tom Kuschel Jul 20 '17 at 10:44
-
I have two executables. I know they are different because I compiled and ran them, but all options of diff and cmp given here judge them identical. Why? – mirkastath Feb 28 '19 at 2:14
Use the cmp command. This will either exit cleanly if they are binary equal, or it will print out where the first difference occurs and exit.
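A minimal sketch of how that exit status could be used in a script. The function name files_identical is hypothetical; the sketch relies only on the -s option and on cmp exiting with 0 for identical files, 1 for differing files, and a larger value on trouble (for example, a missing file).

```shell
#!/bin/sh
# files_identical FILE1 FILE2 (hypothetical helper name)
# Prints "identical" or "different" based on the exit status of cmp -s.
files_identical() {
    if cmp -s -- "$1" "$2"; then
        echo "identical"
    else
        status=$?            # exit status of cmp from the condition above
        if [ "$status" -eq 1 ]; then
            echo "different"
        else
            # Exit status 2 or higher: cmp could not compare the files.
            echo "error: could not compare '$1' and '$2'" >&2
            return 2
        fi
    fi
}
```

Usage: files_identical 1.bin 2.bin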
-
For the use case the OP describes, IMHO cmp is more efficient than diff. So I'd prefer this. – halloleo Dec 18 '13 at 5:41
-
I have a shell script that runs:
cmp "$1" "$2" && echo "identical" || echo "different"
– steveha Dec 14 '14 at 2:01
-
Does cmp stop when it finds the first difference and display it, or does it go through to the end of the files? – sop Oct 25 '16 at 8:10
-
cmp has a "silent" mode: -s, --quiet, --silent (suppress all normal output). I haven't tested it yet, but I think it will stop at the first difference if there is one. – Victor Yarema Nov 22 '16 at 5:21
I found Visual Binary Diff was what I was looking for, available on:
Ubuntu:
sudo apt install vbindiff
Arch Linux:
sudo pacman -S vbindiff
Mac OS X via MacPorts:
port install vbindiff
Mac OS X via Homebrew:
brew install vbindiff
-
Nice... I /thought/ I only wanted to know whether the files differed; but being able to see the exact differences easily was a lot more useful. It tended to segfault when I got to the end of the file, but never mind, it still worked. – Jeremy Oct 28 '16 at 2:42
-
It's been said a few times, but this is a great little program! (FYI, also on Homebrew) – johncip Feb 19 '17 at 22:59
-
This should be the accepted answer as it's a far superior method to the bland and unhelpful output of the canonical diff command. – Gearoid Murphy Nov 7 '18 at 0:20
Use sha1 (the command is named sha1sum on Ubuntu) to generate a checksum for each file and compare the results:
sha1 [FILENAME1]
sha1 [FILENAME2]
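As a sketch of the same idea using SHA-256 (a SHA-2 variant, as suggested in the comments on this answer), the two digests can be compared directly in the shell. This assumes the sha256sum tool from GNU coreutils is installed; the function name same_sha256 is made up for illustration.

```shell
#!/bin/sh
# same_sha256 FILE1 FILE2 (hypothetical helper name)
# Returns 0 if both files have the same SHA-256 digest, 1 if the digests
# differ, and 2 if either file cannot be read.
same_sha256() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # Reading from stdin keeps the file name out of the output, so only
    # the digest (the first field) is compared.
    sum1=$(sha256sum < "$1" | cut -d ' ' -f 1)
    sum2=$(sha256sum < "$2" | cut -d ' ' -f 1)
    [ "$sum1" = "$sum2" ]
}
```

Usage: same_sha256 1.bin 2.bin && echo "same digest" || echo "different digest (or unreadable file)"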
-
If you only had a checksum for one of the files, this would be useful, but if you have both files on disk this is unnecessary. diff and cmp will both tell you if they differ without any extra effort. – johncip Feb 19 '17 at 23:04
-
There are two files that will return the same result despite being different: shattered.io – mik Feb 16 '18 at 10:58
-
SHA1 already has one public collision (shattered.io) and probably some non-public ones as well. One collision can be used to generate countless colliding files. Use SHA2 for hashing instead, please. – Michal Ambroz Apr 23 '19 at 18:18
I ended up using hexdump to convert the binary files to their hex representation and then opened them in meld / kompare / any other diff tool. Unlike you, I was after the differences in the files.
hexdump tmp/Circle_24.png > tmp/hex1.txt
hexdump /tmp/Circle_24.png > tmp/hex2.txt
meld tmp/hex1.txt tmp/hex2.txt
-
Use hexdump -v -e '/1 "%02x\n"' if you want to diff and see exactly which bytes were inserted or removed. – William Entriken Mar 17 '17 at 21:13
-
Meld also works with binary files when they aren't converted to hex first. It shows hex values for things which aren't in the char set, otherwise normal chars, which is useful with binary files that also contain some ascii text. Many do, at least begin with a magic string. – Felix Dombek Jul 12 '19 at 11:20
You can use the MD5 hash function to check whether two files are the same. This does not show the differences at a low level, but it is a quick way to compare two files.
md5 <filename1>
md5 <filename2>
If both MD5 hashes (the command output) are the same, the two files are the same (barring a hash collision).
-
Can you explain your down votes please? SHA1 has 4 upvotes, and if the OP thinks there's a chance the two files could be the same or similar, the chances of a collision are slight and not worthy of down voting MD5 but up voting SHA1 other than because you heard you should hash your passwords with SHA1 instead of MD5 (that's a different problem). – Rikki Jan 16 '16 at 1:10
-
Not sure about the reason, but a pure cmp will be more efficient than computing any hash function of the files and comparing them (at least for only 2 files). – Paweł Szczur Apr 26 '16 at 13:58
-
If the two files are large and on the same disk (not SSD), the md5 or sha* variant might be faster because the disk can read the two files sequentially, which saves lots of head movements. – Daniel Alder Feb 22 '17 at 20:08
-
I downvoted because you posted a minor variant of an earlier (bad) solution, when it should have been a comment. – johncip Mar 6 '17 at 10:07
Use the cmp command. Refer to Binary Files and Forcing Text Comparisons for more information.
cmp -b file1 file2
-
-b doesn't compare files in "binary mode". The manual actually says: "With GNU cmp, you can also use the -b or --print-bytes option to show the ASCII representation of those bytes." This is exactly what I found at the manual URL that you provided. – Victor Yarema Nov 22 '16 at 5:28
-
Victor Yarema, I don't know what you mean by "binary mode". cmp is inherently a binary comparison in my opinion. The -b option merely prints the first byte that is different. – H2ONaCl Dec 24 '16 at 8:25
For finding flash memory defects, I had to write this script, which shows all 1K blocks that contain differences (not only the first one, as cmp -b does):
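#!/bin/sh
# Report every 1 KiB block in which the two files differ.
f1=testinput.dat
f2=testoutput.dat
size=$(stat -c%s $f1)   # size of the first file in bytes (GNU stat syntax)
i=0
while [ $i -lt $size ]; do
    # Compare at most 1024 bytes, skipping the first $i bytes of both files.
    if ! r="$(cmp -n 1024 -i $i -b $f1 $f2)"; then
        printf "%8x: %s\n" $i "$r"
    fi
    i=$((i + 1024))
done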
Output:
2d400: testinput.dat testoutput.dat differ: byte 3, line 1 is 200 M-^@ 240 M-
2dc00: testinput.dat testoutput.dat differ: byte 8, line 1 is 327 M-W 127 W
4d000: testinput.dat testoutput.dat differ: byte 37, line 1 is 270 M-8 260 M-0
4d400: testinput.dat testoutput.dat differ: byte 19, line 1 is 46 & 44 $
Disclaimer: I hacked the script together in 5 minutes. It doesn't support command line arguments, nor does it support spaces in file names.
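An alternative sketch of the same idea, not the script above: cmp -l lists every differing byte as a 1-based decimal offset followed by the two byte values in octal, so the 1 KiB block offsets can be derived with awk instead of looping. The function name diff_blocks is hypothetical, and the sketch assumes an awk whose printf "%x" handles the offsets involved (very large files may need gawk).

```shell
#!/bin/sh
# diff_blocks FILE1 FILE2 (hypothetical helper name)
# Prints the hex start offset of every 1 KiB block that contains at
# least one differing byte.
diff_blocks() {
    cmp -l -- "$1" "$2" |
        # $1 is the 1-based byte offset; round down to the block start.
        awk '{ printf "%x\n", int(($1 - 1) / 1024) * 1024 }' |
        # Offsets arrive in ascending order, so uniq collapses each block.
        uniq
}
```

Usage: diff_blocks testinput.dat testoutput.dat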
-
@unseen_rider Which shell, which line? Please call the script using sh -x for debugging. – Daniel Alder Feb 4 '17 at 12:20
-
@unseen_rider I can't help you this way. The script is OK. Please post your debug output to pastebin.com. You can see here what I mean: pastebin.com/8trgyF4A. Also, please tell me the output of readlink -f $(which sh) – Daniel Alder Feb 5 '17 at 12:33
-
The last command gives /bin/dash. Currently creating a paste on pastebin. – unseen_rider Feb 6 '17 at 2:39
Running diff with the following options does a binary comparison that reports only whether the files differ at all, and it also says so when the files are the same:
diff -qs {file1} {file2}
If you are comparing two files with the same name in different directories, you can use this form instead:
diff -qs {file1} --to-file={dir2}
OS X El Capitan
Try diff -s
Short answer: run diff with the -s switch.
Long answer: read on below.
Here's an example. Let's start by creating two files with random binary contents:
$ dd if=/dev/random bs=1k count=1 of=test1.bin
1+0 records in
1+0 records out
1024 bytes (1,0 kB, 1,0 KiB) copied, 0,0100332 s, 102 kB/s
$ dd if=/dev/random bs=1k count=1 of=test2.bin
1+0 records in
1+0 records out
1024 bytes (1,0 kB, 1,0 KiB) copied, 0,0102889 s, 99,5 kB/s
Now let's make a copy of the first file:
$ cp test1.bin copyoftest1.bin
Now test1.bin and test2.bin should be different:
$ diff test1.bin test2.bin
Binary files test1.bin and test2.bin differ
... and test1.bin and copyoftest1.bin should be identical:
$ diff test1.bin copyoftest1.bin
But wait! Why is there no output?!?
The answer is: this is by design. There is no output on identical files.
But the exit codes are different:
$ diff test1.bin test2.bin
Binary files test1.bin and test2.bin differ
$ echo $?
1
$ diff test1.bin copyoftest1.bin
$ echo $?
0
Now fortunately you don't have to check exit codes each and every time because you can just use the -s (or --report-identical-files) switch to make diff more verbose:
$ diff -s test1.bin copyoftest1.bin
Files test1.bin and copyoftest1.bin are identical
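In a script, the exit code is usually the more convenient signal. A minimal sketch, assuming the exit statuses shown above (0 for identical, 1 for different) plus 2 for trouble such as a missing file; the function name diff_status is made up for illustration.

```shell
#!/bin/sh
# diff_status FILE1 FILE2 (hypothetical helper name)
# Maps the exit status of diff -q to a one-word report.
diff_status() {
    # -q suppresses the details; output is discarded, only the status is used.
    diff -q -- "$1" "$2" >/dev/null 2>&1 && rc=0 || rc=$?
    case $rc in
        0) echo "identical" ;;
        1) echo "different" ;;
        *) echo "trouble"; return 2 ;;
    esac
}
```

Usage: diff_status test1.bin copyoftest1.bin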
Radiff2 is a tool designed to compare binary files, similar to how regular diff compares text files.
Try radiff2, which is a part of the radare2 disassembler. For instance, with this command:
radiff2 -x file1.bin file2.bin
You get nicely formatted two-column output in which the differences are highlighted.
My favourite ones use the xxd hex dumper from the vim package:
1) using vimdiff (part of vim)
#!/bin/bash
FILE1="$1"
FILE2="$2"
vimdiff <( xxd "$FILE1" ) <( xxd "$FILE2" )
2) using diff
#!/bin/bash
FILE1="$1"
FILE2="$2"
diff -W 140 -y <( xxd "$FILE1" ) <( xxd "$FILE2" ) | colordiff | less -R -p ' \| '
md5sum binary1 binary2
If the MD5 sums are the same, the binaries are the same.
E.g.
md5sum new*
89c60189c3fa7ab5c96ae121ec43bd4a new.txt
89c60189c3fa7ab5c96ae121ec43bd4a new1.txt
root@TinyDistro:~# cat new*
aa55 aa55 0000 8010 7738
aa55 aa55 0000 8010 7738
root@TinyDistro:~# cat new*
aa55 aa55 000 8010 7738
aa55 aa55 0000 8010 7738
root@TinyDistro:~# md5sum new*
4a7f86919d4ac00c6206e11fca462c6f new.txt
89c60189c3fa7ab5c96ae121ec43bd4a new1.txt
-
Slim, but worse than using some variant of diff, over which there is no reason to prefer it. – sawa Jan 25 '19 at 6:24
-
You would have to change the MD5 hash to SHA2 in order for this advice to be practical. These days anyone's laptop can generate a collision in MD5 and, based on this single collision (2 files of the same size, same prefix and same MD5), generate an unlimited number of colliding files (having the same prefix, a different colliding block and the same suffix). – Michal Ambroz Apr 23 '19 at 18:11
There is a relatively simple way to check if two binary files are the same.
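If you use file input/output in a programming language, you can read the bytes of each binary file into its own array.
At this point the check is as simple as comparing the two arrays by content (pseudocode; in many languages this needs an element-wise comparison rather than a plain != on the array variables):
if(file1 != file2){
//do this
}else{
//do that
}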
cmp specifically says it does a byte-by-byte comparison, so that is my default for 2 binary files. diff is line by line and will give you the same yes/no answer, but of course not the same dump to the standard output stream. If the lines are long, perhaps because they are not text files, then I would prefer cmp. diff has the advantage that you can specify a comparison of directories and the -r option for recursion, thereby comparing multiple files in one command. – H2ONaCl Dec 24 '16 at 8:07