Working with Octet Streams in JavaScript
From 0 to 255 in 1ms.
If you’ve ever handled files at all with JavaScript, you’ve probably seen the Uint8Array typed array or the ArrayBuffer object; either one is frequently caught handling a large array of numbers, each ranging from 0 to 255. As the name may have tipped you off, these are unsigned (meaning no negative numbers) 8-bit integers.
8-bit integers are extremely important in the history of computing. Quick history lesson: a bit (binary digit) is the most basic unit we compute with. On its own it can only have 2 values, 0 or 1. But in a sequence, it can represent many more. Originally, computers chose different lengths of bit sequences to encode data. These varied-length sequences became known as bytes: in any given system, one byte was used to encode one character. Since it was the smallest useful number of bits, the byte became the core unit for storing data in computers. Ultimately, 8-bit bytes became the standard. Even today, most computers are built to process data in multiples of 8 bits.
So how do 8 bits turn into a value from 0–255? Let every bit place n represent 2^n, starting with index 0 at the rightmost bit. For every 1 in a bit place n, add the value of 2^n to your total. With all eight bits set to 0 the total is 0; with all eight set to 1 it is 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 = 255. Every whole number in between is reachable too!
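To make the arithmetic concrete, here is a minimal sketch of that rule. The function name bitsToByte is made up for this post, and it assumes the bits arrive as a string of 0s and 1s with place 0 on the right.

```javascript
// Hypothetical helper: convert a string of up to 8 binary digits
// into its unsigned value by summing 2^n for every 1 in place n.
function bitsToByte(bits) {
  if (!/^[01]{1,8}$/.test(bits)) {
    throw new RangeError("expected 1 to 8 binary digits");
  }
  let total = 0;
  // Walk from the rightmost digit (place 0) to the leftmost.
  for (let n = 0; n < bits.length; n++) {
    const digit = bits[bits.length - 1 - n];
    if (digit === "1") total += 2 ** n;
  }
  return total;
}

console.log(bitsToByte("00000000")); // 0
console.log(bitsToByte("10110010")); // 128 + 32 + 16 + 2 = 178
console.log(bitsToByte("11111111")); // 255
```

In real code you would probably just reach for parseInt(bits, 2); the loop is spelled out here only to mirror the explanation.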
‘application/octet-stream’
Back to MIME (Multipurpose Internet Mail Extensions), which dates from the early 1990s, back when it seemed like emails were going to be the primary form of communication. RFC 2046, the part of the standard that defines media types, followed in 1996. This standard was huge: while initially intended for email, it essentially defined the media types by which we still describe our files today in email, HTTP, and many other internet communication protocols. The ‘Content-Type’ header we write at the top of a request or response? Its values are MIME standardized.
This enables us to specify what content we’re sending or expecting. It’s not a huge leap to connect this standardization with the interactive, multimedia-focused internet we have today. These are the foundations that we still build on.
Generally, we fill in the precise data type in our ‘Content-Type’ headers. This can include ‘image/png’, ‘audio/mpeg’ (the registered type for MP3 audio), ‘application/msword’, but also ‘application/octet-stream’. I’ve never heard of an octet-stream application…
Okay, I’m sure I’ve teased it enough already that you get the point! Octet streams are quite simply larger sequences made up of 8-bit integers (each one a sequence of 8 binary digits). As you may have guessed, everything is an octet stream, just a stream of these 8-bit computer building blocks. However, MIME RFC 2046 describes octet streams as ‘arbitrary binary data’. The difference here is that the computer sending as well as the computer receiving (or at least the processes handling the transfer) have no idea what this data is. Either way, if you’re sending or receiving one of these requests, you’re going to be handling an array of raw binary data.
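As a small illustration of “everything is an octet stream”, the sketch below turns a short piece of text into its octets and back again. The sample string is arbitrary, and TextEncoder always produces UTF-8, which is an assumption about the encoding you want.

```javascript
// Encode a string as UTF-8: the result is a Uint8Array of octets.
const octets = new TextEncoder().encode("Hi!");

// Viewed as plain numbers, nothing says this was ever text.
const asNumbers = Array.from(octets);
console.log(asNumbers); // [72, 105, 33]

// Only a receiver that knows (or guesses) the type can turn it back.
const asText = new TextDecoder().decode(octets);
console.log(asText); // "Hi!"
```

The array of numbers on its own is exactly what ‘application/octet-stream’ promises: bytes, with no hint about what they mean.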
Transferring Files with JavaScript
In JavaScript, these concepts can seem extremely abstract when you first start working with files; in fact, they are rarely explained directly, and documentation tends to talk around them.
“The FileReader object lets web applications asynchronously read the contents of files (or raw data buffers) stored on the user's computer, using File or Blob objects to specify the file or data to read.” [Mozilla FileReader Docs]
Mozilla is awesome, and so is their documentation — don’t get me wrong! I don’t think it’s their job to explain these lower-level details, but to a novice programmer, it leaves you curious (which is all the fuel needed apparently, since I’m writing this blog post…).
If you’re allowing a user to upload a file, you’re probably using some form of <input type="file" />. Its files property hands you a File object for each file the user picked. But a File is a JavaScript object, and you can’t transfer the object itself with your HTTP request; for one, you don’t know what language the receiving entity is using. Second, you’re using HTTP! To the internet (and everything that reads the request after), this is just a string that can potentially be converted to some sort of content (as indicated by your ‘Content-Type’ header). Treating the File object as text gives you a useless string (such as "[object File]") or a weird, incorrect mix of characters rather than the contents of the file. What to do? Convert to the most basic blocks that computers can universally understand (aside from 6-bit byte hipster computers).
We use the FileReader object to read this File object and return to us the contents of the file in its 8-bit integer sequence form. Specifically, the FileReader.readAsArrayBuffer() method gives us back an ArrayBuffer object, which “represents a chunk of data; it has no format.” The read is asynchronous, so the buffer shows up in the reader’s result property once its load event fires. To work with this data, we need a ‘view’. You may think of your front-end pages on hearing this term, but in our case, it means a typed array laid over the buffer, which is to say, one with a set data type, starting offset, and length. Mozilla explains it pithily: “A view provides context.”
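Here is a rough sketch of both steps: reading a File (or any Blob) into an ArrayBuffer, then laying a view over it. The helper names readAsArrayBuffer and viewBytes are invented for this post. FileReader only exists in browsers, so the sketch assumes that anywhere else it can fall back to the Blob’s own arrayBuffer() method.

```javascript
// Hypothetical helper: resolve with the raw contents of a File or Blob.
function readAsArrayBuffer(blob) {
  // Outside the browser there is no FileReader; Blob.prototype.arrayBuffer()
  // returns a promise for the same kind of buffer.
  if (typeof FileReader === "undefined") {
    return blob.arrayBuffer();
  }
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result); // an ArrayBuffer
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(blob); // asynchronous: fires load when done
  });
}

// Hypothetical helper: a view gives the buffer context, namely a data type
// (unsigned 8-bit), a starting offset, and a length.
function viewBytes(buffer, offset = 0, length = buffer.byteLength - offset) {
  return new Uint8Array(buffer, offset, length);
}
```

In a page you might call it from a change handler on the file input, along the lines of readAsArrayBuffer(input.files[0]).then((buffer) => console.log(viewBytes(buffer))), where input is whatever element you have on hand.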
Assuming your data uses octets, use new Uint8Array(yourArrayBuffer) to have a mutable structure to work with! From here, you can use this with the Node.js Buffer class, which affords you an expansive API for dealing with data. You can convert it into a ReadableStream, which can be helpful for large files or files of an indeterminate size. Alternatively, if you don’t need to do much with the file, you can post this array in your HTTP request, and the receiving end can download it and convert it (assuming you’ve properly filled out your ‘Content-Type’ header!).
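Putting the last paragraph into code, a minimal sketch might look like the following. The helper buildUploadOptions is hypothetical, and the three-byte buffer stands in for whatever your file reader handed back.

```javascript
// Stand-in for the ArrayBuffer you got from reading a file.
const buffer = new ArrayBuffer(3);

// The Uint8Array view is mutable, and writes go through to the buffer.
const octets = new Uint8Array(buffer);
octets.set([255, 0, 16]);

// In Node.js, Buffer.from(arrayBuffer) wraps the same memory without copying.
if (typeof Buffer !== "undefined") {
  const nodeBuffer = Buffer.from(buffer);
  console.log(nodeBuffer.toString("hex")); // "ff0010"
}

// Hypothetical helper: options for posting raw bytes with fetch().
function buildUploadOptions(bytes) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/octet-stream" },
    body: bytes,
  };
}

const options = buildUploadOptions(octets);
```

You would then hand options to fetch along with your own endpoint, for example fetch("https://example.com/upload", options), where that URL is only a placeholder.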
Happy buffering!