5

Sorry if this question seems simplistic or overly broad.

I would like to clarify what a binary file is. I know that a binary file is a binary encoded file.

Is a file format like JPEG classified as a binary file?

Wikipedia simply states that a binary file is any binary encoded file for computerized storage / processing and that anything wholly text based is regarded as a plain-text file, that is, not a binary file.

8

Well, you understand that every file that has content is a binary file, every single one without exception, including a file with a .txt extension.

The one and only difference between a binary file with a .txt extension and one with a .jpg extension is really a meta difference: convention and historical practice tell us that we can make assumptions about the first file:

  1. it is to be interpreted as a collection of contiguous 8-bit fields;
  2. each such field represents an ASCII character; and
  3. most important, there are no control fields -- no counts, no state-change indicators, none of that.

Otherwise, there's no difference between what we -- only by convention -- call a text file and any other file.
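As a rough illustration of that point, here is a minimal Python sketch in which the same bytes are read once as ASCII text and once as plain numbers. The sample bytes and the names `data`, `as_text`, and `as_numbers` are made up for this example; only the reader's assumption differs between the two interpretations.

```python
# The same five bytes, interpreted two ways.
data = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F])

# Interpretation 1: each 8-bit field is an ASCII character.
as_text = data.decode("ascii")

# Interpretation 2: each 8-bit field is just a number.
as_numbers = list(data)

print(as_text)     # Hello
print(as_numbers)  # [72, 101, 108, 108, 111]
```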

Furthermore, there is no way to know how a file should be interpreted just by looking at its contents. We have to depend upon something external to the file -- like its extension, say -- to give us a hint at what the thing is.

  • 1
    Many file formats include "in-band" identifying strings or "magic numbers" that can provide hints on the file type as well. So reading the file and looking for these hints can reveal the file type with a degree of probability. The only thing really enforcing file types based on the content within is the programs that read and write them, not the OS or filesystem. – LawrenceC Dec 16 '13 at 20:10
  • Many text files have extensions other than .txt (e.g., .html) or no extension at all (e.g., README). – mouviciel Sep 21 '16 at 11:07
  • 2
    Text files encompass far more than single-byte ASCII-encoded characters. – kreemoweet Sep 21 '16 at 11:11
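
The "magic numbers" mentioned in the comments can be checked with a few lines of Python. This is only a sketch: the `MAGIC` table and the `sniff` helper are hypothetical names, the table lists just a handful of well-known signatures, and a match is a hint about the file type, not proof.

```python
from typing import Optional

# A few well-known leading signatures (far from exhaustive).
MAGIC = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff(head: bytes) -> Optional[str]:
    """Return a guessed type for the leading bytes, or None if unknown."""
    for signature, name in MAGIC.items():
        if head.startswith(signature):
            return name
    return None
```

In practice you would pass in the first few bytes of a file opened in `"rb"` mode, for example `sniff(f.read(16))`.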
6

The way I would describe this to my mom (hope neither of you takes offense to this) is that any file that contains gibberish when opened in Notepad is a binary file.

When I refer to binaries at work, they're typically outputs of the compiler. These may have readable text embedded inside, but they are still considered binaries.

A JPEG is a binary file.
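
The Notepad test can be approximated in code. The sketch below is one possible heuristic, not a standard definition: the `looks_binary` name is made up, and it assumes that "text" means valid UTF-8 with no NUL bytes. A chunk cut off in the middle of a multi-byte character would be misreported, so treat the result as a guess.

```python
def looks_binary(chunk: bytes) -> bool:
    """Heuristic: NUL bytes or invalid UTF-8 suggest a non-text file."""
    if b"\x00" in chunk:
        return True
    try:
        chunk.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False
```

For example, the first bytes of a JPEG (`b"\xff\xd8\xff\xe0"`) are not valid UTF-8, so they are flagged as binary, while ordinary prose is not.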

UPDATE:

The distinction becomes more important with FTP, where you transfer in either ASCII or binary mode. ASCII mode translates line endings (NL versus CRLF) between systems. You wouldn't want that translation applied to a JPEG, whose data can contain bytes that happen to match the newline code, because rewriting them corrupts the file.
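
A simplified sketch of why this matters, in Python. The `ascii_mode_transfer` function is a hypothetical stand-in for the line-ending translation, not real FTP code, and the `binary` payload is made-up data that merely contains the byte 0x0A.

```python
def ascii_mode_transfer(payload: bytes) -> bytes:
    # Simplified model: every LF byte (0x0A) becomes CR LF (0x0D 0x0A).
    return payload.replace(b"\n", b"\r\n")


text = b"line one\nline two\n"
# Made-up binary payload that happens to contain the byte 0x0A twice.
binary = bytes([0xFF, 0xD8, 0x0A, 0x10, 0x0A, 0xFF, 0xD9])

# For text, the translation is harmless: only the line endings change.
print(ascii_mode_transfer(text))

# For binary data, extra bytes are inserted and the content no longer matches.
print(len(binary), len(ascii_mode_transfer(binary)))  # 7 9
```

Binary mode avoids the problem by copying the bytes unchanged.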

  • This question stems from storage of binary files in a mongodb collection. MongoDB supports storage of binary files - would you classify a JPEG as a binary file? – wulfgarpro Jul 9 '11 at 12:15
  • 1
    Yes. The JPEG spec details how to encode and decode the image. Even a simpler format though -- say a raw Bitmap -- is also a Binary file. In the Bitmap case, it has a small header and then a color code for every pixel in the image. The JPEG involves a few extras such as quantization tables. In the end they're both binary but the respective specs say how the binary should be interpreted to produce an image – jglouie Jul 9 '11 at 12:24
  • The JPEG format is a specification for encoding images as a series of bytes which don't tend to make sense when viewed with Notepad (or any text editor), as @jglouie said. – pavium Jul 9 '11 at 12:25
  • Thanks guys. This is what I understood - just wanted to clarify it. – wulfgarpro Jul 9 '11 at 12:27
  • 1
    Encrypted text is an exception to the "gibberish" rule. But I still like this description. :) – Kon Jul 9 '11 at 12:29
