3D and Graphics / Shaders

4556 words | Dan Hollick

Shaders.

How to draw high fidelity graphics when all you have is an x and y coordinate.

╌╌╌╌

The shader pipeline showing the different stages and how they connect to each other.

Shaders are a great example of how constraints make people creative - they are simple programs that run in parallel on the GPU8, with the goal of working out the value of a single pixel. But each instance of a shader program only really knows one thing: its x and y position.

So how the hell do people create such insane, high fidelity graphics with just an x and y coordinate? Well the short answer is maths. And the long answer will, unfortunately, also involve some maths.

╌╌╌╌

What is a shader?

Let's get this out of the way: a shader is a type of program designed to run in parallel on the GPU.

A mesh gradient fragment shader.

When you think of a shader, you're probably picturing some trippy animated graphics, but those are usually fragment shaders6, which are just one type of shader. There are actually multiple types and they work together in a sort of pipeline that is designed primarily for rendering real time 3D graphics.

The goal of this pipeline is essentially to figure out what color every pixel in the scene should be, and each step in the pipeline calculates some small part of that before passing it on. But why do we need shaders at all?

The goal of the shader pipeline is to determine the color of each pixel.

Well, before we had shaders, developers didn't really have fine-grained control over the way lighting and other effects were applied by the GPU. This was called a Fixed Function Pipeline, and until the early 2000s this was how all consumer GPUs shipped: with a fairly fixed set of lighting and rendering effects.

So shaders were designed to make the graphics pipeline programmable, allowing developers to create almost any effect they wanted, directly on the GPU. Since then, shaders have broken out of the world of game engines and into places like the web.

But shaders can be a little daunting because programming on the GPU is a little different to 'normal' programming where things happen in sequence.

How a GPU works

Because shaders run on the GPU, before we can dig into how they work, we need to know a little more about the environment they are designed for. If you've read the chapter on how a GPU works↗ you can skip this bit, but we won't go into as much detail here.

In the really early days of computing, computers didn't have screens at all - they would just print their outputs - so when GUIs9 came along they added a lot of extra demand to the CPU which was in charge of updating the pixel values being displayed.

Updating the screen is an annoying job for a CPU - it's not particularly difficult to figure out what color a pixel should be, but there are a ton of pixels and they need updating a lot. CPUs are designed for almost the exact opposite sort of thing - executing one, potentially complicated, task as fast as possible before moving on to the next.

Each CPU core processes instructions in a sequential pipeline.

So, hardware companies realised the need for dedicated hardware to update the framebuffer7 and talk to the video controller. Over the course of decades, this developed into what we now call GPUs - discrete pieces of hardware that are specifically designed for the process of updating an array of pixels as fast as possible.

A GPU dedicates more resources to compute and memory than control.

Because they are designed specifically for this purpose, they can make some tradeoffs that the more general purpose CPU can't. A CPU is designed to minimize latency, or put differently, to process a single instruction stream as quickly as possible. They have a small number of really powerful cores that run at super fast clock speeds and can do clever predictions to make sure almost no cycles are wasted.

A GPU makes the opposite trade-off, maximising throughput over latency to process as many instructions as possible with thousands of small, simple and relatively slow cores. Any single instruction is processed slower than it would be on a CPU, but it can get through more instructions per second by the sheer number of instructions it can process simultaneously.

A modern multi-core CPU can process around a hundred billion instructions every second, but a modern GPU can process tens of trillions of instructions per second.

The way they do this is by having thousands of smaller, less powerful cores that are very efficient at doing some specific tasks, like matrix multiplication10 or figuring out the sine of an angle. These cores are arranged into groups called compute units or streaming multi-processors (SMs), which dispatch tasks for them to complete.

The key is that these cores can work in parallel because the type of operations they are good at are easy to divide up and complete simultaneously - in fact we usually call this type of work embarrassingly parallel.

GPUs typically arrange groups of cores into compute units or streaming multi-processors

Shaders allow us to run programs on these compute units but you're probably starting to understand why they have some strange constraints. The whole reason shaders are fast is they split up work and run independently of each other, but this design necessitates keeping the complexity to a minimum.

Because they are being run at the same time, the calculation of one shader instance can't depend on the result of another. This means we can't pass data between instances of our shader but we can pass data down to all of our instances. We call these uniforms14, not variables, because each instance receives the exact same value.

The GPU likes to keep all these cores busy, so as soon as one is free it's given a new piece of work. There's no guarantee that its new task is related at all to the previous one, so in this sense each core is memory-less and can't compute something based on a previous output.

It's not all parallel though: the different types of shaders run as part of a sequential pipeline where we can pass data from one stage to the next. Let's dig into the graphics pipeline to see how shaders actually fit into the whole process.

The graphics pipeline

Shaders run as part of the graphics rendering pipeline which is designed primarily for rendering 3D graphics. Although it actually has a bunch of steps, we can simplify it down to three main steps:

  1. Vertex Shading↓ — transforming vertices.
  2. Rasterisation↓ — preparing fragments.
  3. Fragment Shading↓ — calculating pixel values.

Let's imagine we are rendering a cube in 3D space and walk through what needs to happen in every step of the pipeline to render that cube.

The shader pipeline showing the different stages and how they connect to each other.

Before anything happens on the GPU, the CPU, which is running our application logic, issues a draw call to the GPU. Along with that draw call it gives the GPU the vertex18 data required to render the scene which the GPU stores in memory as Vertex Buffer2 Objects, or VBOs.

The vertex data includes stuff like positions of the vertices, any normals11, texture13 coordinates, or material properties that are needed to render any given geometry.

The CPU prepares the data for the graphics pipeline.

The first step to happen on the GPU is the Input Assembler (IA). It reads the vertex data from the VBO and starts using the data to assemble primitives. In our case it's going to take vertex data and build the cube out of triangles.

The input assembler takes the vertex data and assembles it into primitives.

This is important because now we know exactly how many vertices our shape will have and how many instances of our vertex shader19 we will need to run.

Vertex shading

The vertex shader runs once for each vertex in our geometry and so obviously each instance of the shader runs with a different value for the position of the vertex. Since the vertex shader determines where the vertex is in our final scene, we can use it to transform that position however we'd like.

To rotate the cube, we can apply a rotation matrix to each vertex where the angle is based on the elapsed time. Remember that the rotation matrix is a uniform, which means it is the same for each vertex, it's only the vertex position that is different for each instance of our shader.

A vertex shader applying a rotation matrix to each vertex of the cube.
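
As a rough sketch of what that vertex shader might look like (written in WebGL-style GLSL, with an illustrative u_time uniform carrying the elapsed time):

    uniform float u_time;        // elapsed time, identical for every vertex
    attribute vec3 position;     // this instance's vertex position

    void main() {
        float angle = u_time;

        // columns of a rotation matrix around the y axis
        mat3 rotateY = mat3(
            cos(angle), 0.0, -sin(angle),
            0.0,        1.0,  0.0,
            sin(angle), 0.0,  cos(angle)
        );

        vec3 rotated = rotateY * position;

        // a full pipeline would still convert this to screen space
        // using the model, view and projection matrices described below
        gl_Position = vec4(rotated, 1.0);
    }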

This is quite a simple example, but vertex manipulation allows you to apply effects that would be pretty difficult otherwise. Imagine we have a flat plane, which is made up of dozens of vertices in a mesh. Using just some basic trigonometry, we can manipulate each vertex of the plane with a sine wave.

A vertex shader applying a sine wave to a flat plane.

Right now we are applying the sine wave in one direction, but if we apply it from the origin we can create a ripple.

A vertex shader applying a ripple wave to a flat plane.

We can use some uniforms to pass parameters down to our vertex shader - in this case we are altering the frequency and amplitude of the sine wave. Although the uniforms are the same for each instance of our shader, we can update the uniform over time so each frame gets a new value.

Three different planes showing the effect of sine wave frequency and amplitude modulation.
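
A rough sketch of such a ripple vertex shader, with illustrative uniform names like u_time, u_frequency and u_amplitude:

    uniform float u_time;         // elapsed time
    uniform float u_frequency;    // how tightly packed the ripples are
    uniform float u_amplitude;    // how tall the ripples are

    attribute vec3 position;

    void main() {
        vec3 p = position;

        // distance from the plane's origin drives the ripple outwards
        float dist = length(p.xz);
        p.y += sin(dist * u_frequency - u_time) * u_amplitude;

        gl_Position = vec4(p, 1.0);   // screen-space conversion omitted for brevity
    }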

We can decide how many vertices a given mesh has - in 3D software we can usually set the subdivisions for basic shapes like this. More subdivisions means more intersections and therefore more vertices. We can make this choice based on how much detail we want to show, but it comes with a performance tradeoff.

A vertex shader needs to run for every vertex, so doubling the number of vertices obviously doubles the number of times our vertex shader is run and how many threads are needed.

Three different planes showing the effect of increasing the number of vertices in a flat plane mesh.

Before our vertex shader returns the vertex information, we typically need to convert it to screen space. If you've ever used a 3D program, you'll know that when you create a geometry it has an origin, or a point where each axis begins { x:0, y:0, z:0 }. For our cube the origin is at its center, so the coordinates for a given vertex are relative to this origin.

But obviously, when we actually want to render these geometries, we need their coordinates relative to the flat screen. To do this, the first step is to translate the coordinates from model space into world space. We do this by multiplying each vertex by the modelMatrix, a matrix uniform passed down to our shader from the 3D application, making the coordinates relative to the world origin.

The process of converting from model space to world space.

The model matrix is actually where we usually apply all the basic transforms, like rotation, scaling and translating that we've been describing so far. We never actually update the original vertex positions, but rather describe their transforms in this matrix.

Next we need to figure out where the vertex is relative to the camera, known as view space, and we do that by multiplying the vertex data by the viewMatrix. Lastly, we apply the camera perspective using the projectionMatrix and transform the coordinates relative to the viewport size and resolution.

The process of converting from world space to screen space.
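
In code, this chain of transforms is usually the last thing the vertex shader does (a minimal sketch, using the matrix names above):

    uniform mat4 modelMatrix;       // model space -> world space
    uniform mat4 viewMatrix;        // world space -> view (camera) space
    uniform mat4 projectionMatrix;  // view space  -> clip space, ready for the viewport

    attribute vec3 position;

    void main() {
        gl_Position = projectionMatrix * viewMatrix * modelMatrix * vec4(position, 1.0);
    }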

Before we move on to the next step, remember how we said that shaders can't really communicate with each other? Well, that's not entirely true. Because a vertex shader runs before a fragment shader, we can pass variables calculated in the vertex shader to the fragment shader. These have historically been called varyings16 because unlike uniforms, they will be different based on the output of the vertex shader.

Say for example we wanted each vertex to have a different color so we return a varying from the vertex shader called vColor. That color value is then interpolated when it's passed to the fragment shader, so if our fragment5 happens to be half way between the blue and red vertices, the value of vColor it receives will be purplish.

How a varying is interpolated across fragments
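
A minimal sketch of that vColor handoff, assuming the per-vertex color arrives as a vertex attribute:

    // vertex shader: write the varying
    attribute vec3 position;
    attribute vec3 color;        // per-vertex color supplied with the vertex data

    varying vec3 vColor;         // interpolated across each triangle

    void main() {
        vColor = color;
        gl_Position = vec4(position, 1.0);
    }

    // fragment shader: read the interpolated value
    precision mediump float;
    varying vec3 vColor;

    void main() {
        // a fragment halfway between a blue and a red vertex gets a purplish vColor
        gl_FragColor = vec4(vColor, 1.0);
    }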

This is also how we can pass stuff like normals, basically vectors17 describing the direction a surface is facing, to the fragment shader for lighting effects.

In some older graphics systems there are actually a few optional steps after the vertex shader and before the rasteriser: tessellation and geometry shaders. I don't want to confuse you too much so we won't go into too much detail on these.

A tessellation shader basically allows us to add more detail to geometry by subdividing it up into smaller primitives. You might use this in a video game to add more detail to stuff that is closer to the camera. Not super helpful for our toy example.

A tessellation shader can add more detail to a geometry on the fly.

A geometry shader allows us to entirely add or remove elements on the fly. So if we've determined that an element is so far away from the camera that it shouldn't even be rendered, we can use a geometry shader to remove it.

Rasterisation

So this is the point at which we have to go from points and coordinates into pixels. The rasteriser, which is a non-programmable step, takes the primitives and their transformed positions and figures out which pixels they cover, which is why we needed to convert the coordinates into screen space.

You may have noticed that our cube is actually made up of triangles, and the rasteriser works on one triangle at a time to figure out which pixels fall inside it. For each of these pixels, it generates a fragment.

The rasteriser generates fragments based on which pixels are covered by a triangle.

A fragment is basically all the information our fragment shader will need to figure out the final color of a pixel - this includes any uniforms, textures and interpolated varyings from the vertex stage, as well as the depth of the shape at this point.

We are jumping slightly ahead here, but the depth of each fragment is important for something called depth testing. The idea is that if this fragment isn't visible because it falls behind something that occludes it, we shouldn't bother writing it to the framebuffer.

Depth testing checks which shapes occlude each other.

In practice, this means that when we go to write our fragment to a specific pixel coordinate in the framebuffer, we check the depth value of that pixel in the z-buffer20 and only overwrite it if our fragment's depth is smaller than the stored one.

The framebuffer is only updated based on values in the z-buffer.

Another complicated thing that happens around here is anti-aliasing1. Because we are converting from vector coordinates to a pixel grid, there will be some pixels that are only half covered by the triangle. Ideally, we want these fragments to be a sort of blended color based on the two shapes that cover them.

To do this, modern systems use Multisample Anti-aliasing (MSAA) which samples multiple points within these edge pixels to determine how much of each color to blend together. We use the number of sample points covered by the fragment to figure out how to weight the blending of the two colors when we eventually write to the framebuffer.

Multisample Anti-aliasing is used to blend out aliasing artifacts

Complicated, I know. Anyway, after the rasteriser has generated the fragments, we spin up a fragment shader for each one and send along the fragment data.

Fragment shading

All this information is obviously passed to the fragment shader which is going to use it to determine the color value of the pixel covered by this fragment. In the simplest case, the fragment shader just applies textures, lighting models, and other material properties.

Let's walk through a simple example to see how a fragment shader can be used to create a gradient. For simplicity, imagine that we are just working with a flat plane that's 20 pixels high and 20 pixels wide. Each of the 400 pixels in this plane is a fragment, and they are all being determined by the same fragment shader.

A simple 20 x 20 plane with 400 individual fragments.

A fragment shader has a single main() function that returns a color value. If we just want the entire plane to be the same color, it's pretty simple: we just make sure our fragment shader returns that color.

A simple fragment shader that returns a single color, rgb(0.75,0.0,0.0), for each fragment.
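
In WebGL-style GLSL, that single-color shader is only a few lines (a minimal sketch):

    precision mediump float;

    void main() {
        // every fragment gets the same color: rgb(0.75, 0.0, 0.0)
        gl_FragColor = vec4(0.75, 0.0, 0.0, 1.0);
    }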

But how do we apply something like a gradient when all our shader really knows is the x and y coordinate of the pixel it is calculating? Well, remember we can pass down some uniforms to each instance of our fragment shader, like the width and height.

We can store our resolution in a vec2 uniform, which is a vector that has 2 components, the width and height. To make a simple gradient from left to right, we can divide the fragment position, which is also a vec2, by the resolution and use the resulting x component as the red component of the final color.

A simple fragment shader that returns a gradient on the x-axis
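
A sketch of that gradient shader, assuming the resolution is passed down as a u_resolution uniform:

    precision mediump float;

    uniform vec2 u_resolution;   // width and height of the plane in pixels

    void main() {
        // normalise the fragment position to the 0..1 range
        vec2 st = gl_FragCoord.xy / u_resolution;

        // use the x component as the red channel: black on the left, red on the right
        gl_FragColor = vec4(st.x, 0.0, 0.0, 1.0);
        // swapping st.x for st.y gives the vertical gradient described next
    }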

If we want a vertical gradient we can return the y component of the vector instead.

A simple fragment shader that returns a gradient on the y-axis

We can play around with a bunch of variations of this gradient by using different components of the st vector in the final color components.

Two variations of the gradient using different components of the st vector.

We can also use the current clock time and some trigonometry to animate the color components over time. Remember that each new frame will get a new value for the u_time uniform, which, combined with a sin() function, causes the green component to oscillate between -1 and 1.

Animating the green component with a sin() function based on the current time.
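
A sketch of that animation, again assuming u_resolution and u_time uniforms:

    precision mediump float;

    uniform vec2 u_resolution;
    uniform float u_time;        // updated by the application every frame

    void main() {
        vec2 st = gl_FragCoord.xy / u_resolution;

        // sin(u_time) swings between -1 and 1; negative values clamp to 0 on output
        float green = sin(u_time);

        gl_FragColor = vec4(st.x, green, 0.0, 1.0);
    }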

Aside from this basic example, fragment shaders are more commonly used to provide realistic lighting effects to our geometries. For our case, we want to add a basic lighting setup to our cube by approximating some real world lighting calculations with a method called Phong12 lighting.

Phong lighting is created by combining three simple lighting techniques:

  • Ambient lighting - creates some uniform, minimum amount of light.
  • Diffuse lighting - lights our object based on the position of a light source.
  • Specular lighting - adds reflective highlights based on the relationship between the light source and the viewer.

The different lighting techniques that make up the Phong lighting effect.

Because real light comes from many sources, bouncing and scattering off surfaces, it's basically impossible for an object to be completely dark. A really cheap and easy way to approximate this is to create an ambient lighting constant that we apply to the final color so that objects always have some minimum amount of lightness.

Of course, light also has some sort of color or temperature to it, so we multiply the light color which we store as a vec3 by the ambient strength and then multiply the object color by this ambient light before returning the final color.

We can alter the ambient lighting intensity by increasing the strength factor.
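
A minimal sketch of the ambient term, with illustrative uniform names for the light and object colors:

    precision mediump float;

    uniform vec3 u_lightColor;    // color/temperature of the light
    uniform vec3 u_objectColor;   // base color of the surface

    void main() {
        float ambientStrength = 0.1;                    // the minimum amount of light
        vec3 ambient = ambientStrength * u_lightColor;

        gl_FragColor = vec4(ambient * u_objectColor, 1.0);
    }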

The next stage is more interesting. Diffuse lighting makes an object brighter based on the angle of a particular fragment relative to a light source. If the light hits the surface of a fragment head-on, it will be brighter than if the light arrives at a glancing angle. To do this we need to know a few things first: the position, color, and intensity of a light source, and the direction that a particular fragment is facing.

Diffuse lighting alters a fragment's brightness based on the angle between the surface and the light source.

We can store the light position, color, and intensity as different uniforms and pass that information down to our fragments, so that's easy. We can also calculate the direction of the light towards our fragment by getting the angle between the two positions, but how do we get the direction of our fragment? Well, this is called a normal vector, which we touched on briefly in the vertex shading section.

Surface normals are vectors perpendicular to the face of a surface.

Each triangle in our mesh has a surface normal, which is a unit vector15 that points out perpendicular to the face of the triangle. For a flat shape like our cube, we could just use this surface normal in our lighting calculation as it will be the same for each fragment in that triangle.

Surface normals of a cube and a curved plane.

But what if we were trying to shade a curved surface, like a sphere? We wouldn't want to use the surface normal because it would create flat shading for each surface that makes up the sphere.

The effects of flat surface shading on a sphere.

In these cases, we want to create normals for each vertex in our triangle by averaging the surface normals of the surrounding triangles.

Vertex normals are averaged based on the surrounding surface normals.

We can then pass these vertex normals to our fragment shader as varyings where they will be interpolated for each fragment, creating the effect of a smooth curved surface without the need for us to actually store information about that curve. Pretty neat.

Back to our diffuse lighting setup: now that we have the direction of our fragment and of our light source, we can use the dot product of those two vectors to determine the intensity of our diffuse lighting.

The dot product4 is an operation that multiplies two vectors of equal length together and produces a scalar which serves as a measure of how much the two vectors point in the same direction. When the angle between two unit vectors is 90 degrees, their dot product will be 0, and the more acute the angle between them gets, the closer the dot product gets to 1.

Like with our ambient lighting, we multiply the light's color by the intensity we calculated with the dot product (we limit this so it can never go negative). We then add the diffuse and ambient results together and multiply the object color by the result.
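
Putting that together might look something like this (a sketch, assuming the normal and fragment position arrive as varyings from the vertex shader):

    precision mediump float;

    uniform vec3 u_lightColor;
    uniform vec3 u_lightPos;      // position of the light source
    uniform vec3 u_objectColor;

    varying vec3 vNormal;         // interpolated vertex normal
    varying vec3 vPosition;       // fragment position in world space

    void main() {
        // ambient term, as before
        vec3 ambient = 0.1 * u_lightColor;

        // direction from this fragment towards the light
        vec3 normal   = normalize(vNormal);
        vec3 lightDir = normalize(u_lightPos - vPosition);

        // the dot product measures how directly the surface faces the light,
        // clamped so it never goes negative
        float diff   = max(dot(normal, lightDir), 0.0);
        vec3 diffuse = diff * u_lightColor;

        gl_FragColor = vec4((ambient + diffuse) * u_objectColor, 1.0);
    }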

The last step is specular lighting which basically adds reflective highlights to the surface to mimic how some surfaces reflect a light source. The amount of specular lighting we want is related to the reflective properties of the material we are trying to mimic. A material like glass will show more reflective highlights than a rougher material like wood.

Just like diffuse lighting, the intensity of a reflection is based on the direction of the light to the surface, but it is also based on the direction of the viewer to the surface. The direction of the reflection away from the surface is sort of like the mirror of the light direction, and as the reflection direction and the view direction converge, the reflection intensity increases.

The missing piece we need to calculate this is the position of the camera, which we store as another uniform. We can then get the angle to the fragment using their two positions, which gives us a vector representing the direction of the viewer.

Similar to before, we calculate the dot product of the reflection and viewer vectors to give us the intensity value. We then raise it to the power of 8, which represents the shininess value of the surface - the higher this value, the smaller and less diffuse the reflection is - and multiply all of this by a strength factor and the light color again.

Lastly, we add this into our ambient and diffuse lighting and multiply it by the object color.
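
A sketch of the complete Phong calculation, with the camera position passed in as another illustrative uniform:

    precision mediump float;

    uniform vec3 u_lightColor;
    uniform vec3 u_lightPos;
    uniform vec3 u_cameraPos;     // position of the viewer
    uniform vec3 u_objectColor;

    varying vec3 vNormal;
    varying vec3 vPosition;

    void main() {
        vec3 normal   = normalize(vNormal);
        vec3 lightDir = normalize(u_lightPos - vPosition);
        vec3 viewDir  = normalize(u_cameraPos - vPosition);

        // ambient and diffuse, as before
        vec3 ambient = 0.1 * u_lightColor;
        vec3 diffuse = max(dot(normal, lightDir), 0.0) * u_lightColor;

        // mirror the light direction around the normal to get the reflection direction
        vec3 reflectDir = reflect(-lightDir, normal);

        // raising the dot product to a power controls how tight the highlight is
        float spec    = pow(max(dot(viewDir, reflectDir), 0.0), 8.0);
        vec3 specular = 0.5 * spec * u_lightColor;   // 0.5 is the specular strength factor

        gl_FragColor = vec4((ambient + diffuse + specular) * u_objectColor, 1.0);
    }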

That was a lot of detail but I think it demonstrates just how flexible and powerful fragment shading is. Luckily, when you use 3D software, you get these lighting effects for free with the materials you use, so you'll likely never have to worry about this sort of stuff.

After the fragment shader is finished, we write the fragment into the framebuffer so it can be read by the display controller and rendered on the screen. Before each fragment is written into the framebuffer it will undergo some tests of its visibility, like the depth testing we spoke about earlier or stencil testing.

The stencil test is used for things like masking or clipping, where we check whether the coordinate of the pixel falls outside of the bounds of a stencil, in which case we bail out again. We also handle things like blending and opacity at this stage, so we might blend our fragment output with the value in the framebuffer based on the anti-aliasing result from the rasteriser.

Writing shaders

If shaders weren't complicated enough, there's a bunch of other stuff you need to be aware of if you are going to start writing some yourself.

Let's start with the environments they run in - unless you are an absolute weirdo and can write machine code directly on a GPU, you'll need to interact with a GPU through some sort of API. Historically, we've used a cross-platform API called OpenGL that allows us to interact with the GPU and fiddle with the rendering pipeline in a bunch of different languages.

The same people who make OpenGL also developed a language specifically for writing shaders called GLSL, which is a C-like language that is compiled at run time by the graphics driver. Mostly, the code you'll find online for shaders is written in GLSL. OpenGL also has a web implementation called WebGL that allows us to write web applications that can interact with the GPU.

But, of course, nothing is very simple. Microsoft has its own graphics API called DirectX or Direct3D with its own object-orientated shader language called HLSL, which is compiled in advance into an executable. Apple has its own graphics API layer for Apple hardware called Metal with yet another specific shader language called MSL.

Oh, and by the way, OpenGL is considered obsolete and was replaced by something called Vulkan, and WebGL is being replaced by WebGPU, which no longer supports GLSL. Lovely.

The good news is it's highly unlikely you'll ever interact with any of this stuff and there are layers of interoperability that keep everything running smoothly. If you're like me, you'll write your shaders in something like Three.js which uses WebGL/WebGPU, which in turn will interact with one of the lower level APIs depending on the browser and operating system.

It's not just 3D graphics rendering that can benefit from being run on the GPU - there are a bunch of other types of work that can benefit from being split up and run in parallel. The problem is, the graphics rendering pipeline we've just described is quite bespoke to that type of problem.

Newer APIs have introduced compute shaders3, a type of shader that runs outside of this pipeline but still has the ability to interact with it. Imagine we wanted to render a 3D particle system that depended on a physics simulation. Updating the position of each particle's vertex would consume a lot of CPU resources, so instead we can offload that to a compute shader and feed those values into the pipeline at the vertex shader stage.

As well as being faster because the particle positions can be calculated in parallel, this has the added advantage of keeping all the vertex positions in GPU memory, so it's much faster for the GPU to access them.
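
A minimal sketch of what such a compute shader could look like in GLSL (OpenGL 4.3+ syntax; the buffer layout and the u_deltaTime uniform are illustrative):

    #version 430
    layout(local_size_x = 64) in;

    layout(std430, binding = 0) buffer Positions  { vec4 positions[];  };
    layout(std430, binding = 1) buffer Velocities { vec4 velocities[]; };

    uniform float u_deltaTime;

    void main() {
        uint i = gl_GlobalInvocationID.x;
        if (i >= uint(positions.length())) { return; }

        // a very simple physics step: gravity plus Euler integration
        velocities[i].y -= 9.81 * u_deltaTime;
        positions[i]    += velocities[i] * u_deltaTime;
    }

Each invocation updates a single particle, so thousands of particles can be stepped forward in parallel every frame.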

There's also CUDA (Compute Unified Device Architecture) which is a set of APIs for NVIDIA GPUs that allows you to use the GPU outside of the graphics rendering pipeline for compute. This is what's often used to run AI/ML models, which are extremely parallel problems that need to be run completely outside of the rendering pipeline.

In case you've lost track, here's a summary of all the different APIs:

Technology | Company       | Platform         | Graphics or Compute
OpenGL *   | Khronos Group | Cross-platform   | Graphics (+ compute via compute shaders in newer versions)
WebGL *    | Khronos Group | Browsers         | Graphics
Vulkan     | Khronos Group | Cross-platform   | Both
WebGPU     | W3C           | Browsers         | Both
Metal      | Apple         | Apple platforms  | Both
Direct3D   | Microsoft     | Windows and Xbox | Graphics (+ compute via compute shaders)
CUDA       | NVIDIA        | NVIDIA GPUs      | Compute

* considered legacy

╌╌╌╌

So that's the long, complicated answer to what was quite a simple question. In summary, shaders are cool as hell but they require a pretty big mindset shift from writing software that's run in sequence. The reason I think learning about the graphics pipeline is important is because it makes the shift a bit easier.

Luckily, between frameworks like Three.js and AI getting better at writing shader code, it's never been easier to mess around with shaders on your own.

Glossary

1Anti-Aliasing — A technique used in computer graphics to reduce the appearance of jagged edges on curved or diagonal lines by smoothing pixel colors.

2Buffer — A region of memory used to store data temporarily, often for transferring data between the CPU and GPU or between different stages of a graphics pipeline.

3Compute Shader — A shader that performs general-purpose computing tasks on the GPU, not limited to graphics rendering, enabling parallel processing of data.

4Dot Product — A mathematical operation that multiplies two vectors and returns a scalar value, often used to measure how aligned two vectors are.

5Fragment — A potential pixel generated during rasterization that contains data such as color and depth, which is processed by the fragment shader to produce the final pixel color.

6Fragment Shader — A shader that calculates the final color of each pixel (fragment) on screen, often applying textures, lighting, and other visual effects.

7Framebuffer — A region of memory that holds the complete frame of image data being rendered, which will be displayed on screen.

8GPU (Graphics Processing Unit) — Specialized hardware designed to process many parallel operations simultaneously, particularly for rendering graphics and performing compute-intensive tasks.

9GUI (Graphical User Interface) — A visual interface that allows users to interact with a computer or software using graphical elements like windows, icons, and buttons.

10Matrix Multiplication — A mathematical operation that produces a new matrix by multiplying two matrices, commonly used in graphics and linear algebra to transform coordinates.

11Normal — A vector perpendicular to a surface, used in 3D graphics to determine how light interacts with that surface.

12Phong Lighting — A shading technique used in 3D computer graphics to simulate the way light interacts with surfaces, producing realistic highlights and shading.

13Texture — A bitmap image applied to the surface of a 3D model to give it color and detail.

14Uniform — A global variable in shader programs that remains constant across all instances of a shader during a single draw call.

15Unit Vector — A vector with a magnitude of one, used to indicate direction without scaling.

16Varying — A variable that is interpolated between shader stages, passing data from vertex shaders to fragment shaders with values smoothly blended across surfaces.

17Vector — A quantity with both magnitude and direction, often represented as an arrow in 2D or 3D space.

18Vertex — A point in 3D space that defines the corners or intersections of geometric shapes, used as the basic building block in 3D modeling and graphics.

19Vertex Shader — A shader that processes each vertex in 3D geometry, typically used to transform vertex positions and pass data to later pipeline stages.

20Z-Buffer — A type of buffer that stores depth information for pixels in 3D graphics to handle occlusion and determine which objects are visible in a scene.

╌╌ END ╌╌