Game Programming Patterns / Behavioral Patterns

Bytecode

×

This book is a work in progress!

If you see a mistake, find something unclear, or have a suggestion, please file a ticket. To know when new chapters are up, join the mailing list:

Thank you!

— Bob (@munificentbob)

Intent

Give behavior the flexibility of data by encoding it as instructions for a virtual machine.

Motivation

Making games may be fun, but it certainly ain’t easy. Modern games require enormous, complex codebases. Console manufacturers and app marketplace gatekeepers have stringent quality requirements, and a single crash bug can prevent your game from shipping.

At the same time, we’re expected to squeeze every drop of performance out of the platform. Games push hardware like nothing else, and we have to optimize relentlessly just to keep pace with the competition.

To handle these high stability and performance requirements, we reach for heavyweight languages like C++ that have both low-level expressiveness to make the most of the hardware, and rich type systems to prevent or at least corral bugs.

We pride ourselves on our skill at this, but it has its cost. Being a proficient programmer takes years of dedicated training, after which you must contend with the sheer scale of your codebase. Build times for large games can vary somewhere between "go get a coffee" and "go roast your own beans, hand-grind them, pull an espresso, foam some milk and practice your latte art in the froth".

On top of these challenges, games have one more nasty constraint: fun. Players demand a play experience that’s both novel and yet carefully balanced. That requires constant iteration, but if every tweak requires bugging an engineer to muck around in piles of low-level code and then waiting for a glacial recompile, you’ve killed your creative flow.

Spell fight!

Let’s say we’re working on a magic-based fighting game. A pair of wizards square off and fling enchantments at each other until a victor is pronounced. We could define these spells in code, but that means an engineer has to be involved every time one is modified. When a designer wants to tweak a few numbers and get a feel for them, they have to recompile the entire game, reboot it, and get back into a fight.

Like most games these days, we also need to be able to update it after it ships, both to fix bugs and to add new content. If all of these spells are hard-coded, then updating them means patching the actual game executable.

Let’s take things a bit farther and say that we also want to support modding. We want users to be able to create their own spells. If those are in code, that means every modder needs a full compiler toolchain to build the game, and we have to release the sources. Worse, if they have a bug in their spell, it can crash the game on some other player’s machine.

Data > code

It’s pretty clear that our engine’s implementation language isn’t the right fit. We need spells to be safely sandboxed from the core game. We want them to be easy to modify and reload, and physically separate from the rest of the executable.

I don’t know about you, but to me that sounds a lot like data. If we could define our behavior in separate data files that the game engine loads and "executes" in some way, we can achieve all of our goals.

We just need to figure out what "execute" means for data. How do you make some bytes in a file express behavior? There are a few ways to do this. I think it will help you get a picture of this pattern’s strengths and weaknesses if we compare it to another one: the Interpreter pattern.

The Interpreter pattern

I could write a whole chapter on this pattern, but four other guys already covered that for me. Instead, I’ll cram the briefest of introductions in here. It starts with a language — think programming language — that you want to execute. Say, for example, it supports arithmetic expressions like this:

(1 + 2) * (3 - 4)

Then you take each piece of that expression, each rule in the language’s grammar, and turn it into an object. The number literals will be objects:

Basically just little wrappers around the raw value. The operators will be objects too, and they’ll have references to their operands. If you take into account the parentheses and precedence, that expression magically turns into a little tree of objects like so:

The interpreter pattern isn’t about creating that tree, it’s about executing it. The way it works is pretty clever. Each object in the tree is an expression, or a subexpression. In true object-oriented fashion, we’ll let expressions evaluate themselves.

First, we define a base interface that all expressions implement:

class Expression
{
public:
  virtual ~Expression() {}
  virtual double evaluate() = 0;
};

Then we define a class that implements that for each kind of expression in our language’s grammar. The simplest one is numbers:

class NumberExpression : public Expression
{
public:
  NumberExpression(double value)
  : value_(value)
  {}

  virtual double evaluate()
  {
    return value_;
  }

private:
  double value_;
};

A literal number expression just evaluates to its value. Addition and multiplication are a bit more complex because they contain subexpressions. Before they can evaluate themselves, they need to recursively evaluate those first. Like so:

class AdditionExpression : public Expression
{
public:
  AdditionExpression(Expression* left, Expression* right)
  : left_(left),
    right_(right)
  {}

  virtual double evaluate()
  {
    // Evaluate the operands.
    double left = left_->evaluate();
    double right = right_->evaluate();

    // Add them.
    return left + right;
  }

private:
  Expression* left_;
  Expression* right_;
};

Pretty neat right? Just a couple of simple classes and now we can represent and evaluate arbitrarily complex arithmetic expressions. We just need to create the right objects and wire them up correctly.

It’s a beautiful, simple pattern, but is has some problems. Look up at the illustration. What do you see? Lots of little boxes, and lots of arrows between them. Code is represented as a sprawling fractal tree of tiny objects. That has some unpleasant consequences:

Put those together, and what do they spell? S-L-O-W. There’s a reason most programming languages in wide use aren’t based on the Interpreter pattern. It’s just too slow, and uses up too much memory.

Machine code, virtually

Consider our game. When we run it, the player’s computer doesn’t traverse a bunch of C++ grammar tree structures at runtime. Instead, we compile it ahead of time to machine code and the CPU runs that. What’s machine code got going for it?

This sounds swell, but we don’t want actual machine code for our spells. Letting users provide machine code which our game executes is just begging for security problems. What we need is a compromise between the performance of machine code and the safety of the Interpreter pattern.

What if instead of loading actual machine code and executing it directly, we defined our own virtual machine code? We’d then write a little emulator for it in our game. It would be similar to machine code — dense, linear, relatively low-level — but would also be handled entirely by our game so we could safely sandbox it.

We’d call our little emulator a virtual machine, and the synthetic binary machine code it runs bytecode. It’s got the flexibility and ease of use of defining things in data, but better performance than higher-level representations like the Interpreter pattern.

This sounds daunting, though. My goal for the rest of this chapter is to show you that if you keep your feature list pared down, it’s actually pretty approachable. Even if you end up not using this pattern yourself, you’ll at least have a better understanding of Lua and many other languages which are implemented using it.

The Pattern

An instruction set defines the low-level operations that can be performed. These are encoded as a sequence of bytes. A virtual machine executes these instructions one at a time, using a stack for intermediate values. By combining instructions, complex high-level behavior can be defined.

When to Use It

This is the most complex pattern in this book, and not something to throw into your game lightly. Use it when you have a lot of behavior you need to define and your game’s implementation language isn’t a good fit because:

Of course, that list describes a bunch of your game. Who doesn’t want a faster iteration loop or more safety? However, that doesn’t come free. Bytecode is slower than native code, so it isn’t a good fit for performance-critical parts of your engine.

Keep in Mind

There’s something seductive about creating your own language or system-within-a-system. I’ll be doing a minimal example here, but in the real world, these things tend to grow like vines.

Every time I see someone define a little language or a scripting system, they say, "Don’t worry, it will be tiny." Then, inevitably, they add more and more little features until it’s a full-fledged language. Except, unlike some other languages, it grew in an ad-hoc organic fashion and has all of the architectural elegance of a shanty town.

Of course, there’s nothing wrong with making a full-fledged language. Just make sure you do so deliberately. Otherwise, be very careful to control the scope of what your bytecode can express. Put a short leash on it before it runs away from you.

You’ll need a front-end

Low-level bytecode instructions are great for performance, but a binary bytecode format is not what your users are going to author. One reason we’re moving behavior out of code is so that we can express it at a higher level. If C++ is too low level, making your users effectively write in assembly language — even one of your own design — isn’t an improvement!

Much like the Gang of Four’s Interpreter pattern, it’s assumed that you also have some way to generate the bytecode. Usually, users author their behavior in some higher level format and a tool translates that to the bytecode that our VM understands. In other words, a compiler.

I know, that sounds scary. That’s why I’m mentioning it here. If you don’t have the resources to build an authoring tool, then bytecode isn’t for you. But, as we’ll see later, it may not be as bad as you think.

You’ll miss your debugger

Programming is hard. We know what we want the machine to do, but we don’t always communicate that correctly — we write bugs. To help find and fix those, we’ve amassed a pile of tools to understand what our code is doing wrong, and how to right it.

We have debuggers, static analyzers, decompilers, etc. All of those tools are designed to work with some existing language: either machine code or something higher level.

When you define your own bytecode VM, you leave those tools behind. Sure, you can step through the VM in your debugger, but that tells you what the VM itself is doing, and not what the bytecode it’s interpreting is up to. It certainly doesn’t help you map that bytecode back to the high-level form is was compiled from.

If the behavior you’re defining is simple, you can scrape by without too much tooling to help you debug it. But as the scale of your content grows, plan to invest real time into features that help users see what their bytecode is doing. Those features might not ship in your game, but they’ll be critical to ensure that you actually can ship your game.

Sample Code

After the previous couple of sections, you might be surprised how straightforward the implementation is. First, we need to craft an instruction set for our VM. Before we start thinking about bytecode and stuff, let’s just think about it like an API.

A magical API

If we were defining spells in straight C++ code, what kind of API would we need for that code to call into? What are the basic operations in the game engine that spells are defined in terms of?

Most spells ultimately change one of the stats of a wizard, so we’ll start with a couple for that:

void setHealth(int wizard, int amount);
void setWisdom(int wizard, int amount);
void setAgility(int wizard, int amount);

The first parameter identifies which wizard is affected, say 0 for the player’s and 1 for their opponent. This way, healing spells can affect your own wizard, while damaging attacks harm your nemesis. These three little methods cover a surprisingly wide variety of magical effects.

If the spells just silently tweaked stats, the game logic would be fine, but playing it would bore players to tears. Let’s fix that:

void playSound(int soundId);
void spawnParticles(int particleType);

These don’t affect gameplay, but they crank up the intensity of the gameplay experience. We could add more for camera shake, animation, etc., but this is enough to get us started.

A magical instruction set

Now let’s see how we’d turn this programmatic API into something that can be controlled from data. Let’s start small and then we’ll work our way up to the whole shebang. For now, we’ll ditch all of the parameters to these methods. We’ll say the set___() methods always affect your own wizard and always max out the stat. Likewise, the FX operations always play a single hard-coded sound and particle effect.

Given that, a spell is just a series of instructions. Each one identifies which operation you want to perform. We can enumerate them:

enum Instruction
{
  INST_SET_HEALTH      = 0x00,
  INST_SET_WISDOM      = 0x01,
  INST_SET_AGILITY     = 0x02,
  INST_PLAY_SOUND      = 0x03,
  INST_SPAWN_PARTICLES = 0x04
};

To encode a spell in data, we just store an array of enum values. We’ve only got a few different primitives, so the range of enum values easily fits into a byte. This means a spell is just a list of bytes — ergo "bytecode".

To execute a single instruction, we see which primitive it is and dispatch to the right API method:

switch (instruction) {
  case INST_SET_HEALTH:
    setHealth(0, 100);
    break;

  case INST_SET_WISDOM:
    setWisdom(0, 100);
    break;

  case INST_SET_AGILITY:
    setAgility(0, 100);
    break;

  case INST_PLAY_SOUND:
    playSound(SOUND_BANG);
    break;

  case INST_SPAWN_PARTICLES:
    spawnParticles(PARTICLE_FLAME);
    break;
}

In this way, our interpreter forms the bridge between code world and data world. We can wrap this in a little VM that executes an entire spell like so:

class VM
{
public:
  void interpret(char bytecode[], int size)
  {
    for (int i = 0; i < size; i++)
    {
      char instruction = bytecode[i];
      switch (instruction)
      {
        // Cases for each instruction...
      }
    }
  }
};

Type that in and you’ll have written your first virtual machine. Unfortunately, it’s not very flexible. We can’t define a spell that touches your opponent, or lowers a stat. We can only play one sound!

To get something that starts to have the expressive feel of an actual language, we need to get parameters in here.

A stack machine

The Interpreter pattern handles parameters and subexpressions by explicitly wiring them together as a tree of nested objects, but we want the speed of a flat list of instructions. To do that, we need to make the data flow that brings parameters to the callsite that needs them implicit in the structure of the code. We’ll do it the same way your CPU does: with a stack.

class VM
{
public:
  VM()
  : stackSize_(0)
  {}

  // Other stuff...

private:
  static const int MAX_STACK = 128;
  int stackSize_;
  int stack_[MAX_STACK];
};

The VM maintains an internal stack of values. In our example, the only kinds of values our instructions work with are numbers, so we can just use a simple array of ints. Whenever a bit of data needs to work its way from one instruction to another, it gets there through the stack.

Like the name implies, values can be pushed onto or popped off of the stack, so let’s add a couple of methods for that:

class VM
{
private:
  void push(int value)
  {
    // Check for stack overflow.
    assert(stackSize_ < MAX_STACK);
    stack_[stackSize_++] = value;
  }

  int pop()
  {
    // Make sure the stack isn't empty.
    assert(stackSize_ > 0);
    return stack_[--stackSize_];
  }

  // Other stuff...
};

When an instruction needs to receive parameters, it pops them off the stack like so:

switch (instruction)
{
  case INST_SET_HEALTH:
  {
    int amount = pop();
    int wizard = pop();
    setHealth(wizard, amount);
    break;
  }

  case INST_SET_WISDOM:
  case INST_SET_AGILITY:
    // Same as above...

  case INST_PLAY_SOUND:
    playSound(pop());
    break;

  case INST_SPAWN_PARTICLES:
    spawnParticles(pop());
    break;
}

To get some values onto that stack, we need one more instruction: a literal. It represents a raw integer value. But where does it get its value from? How do we avoid some turtles-all-the-way-down infinite regress here?

The trick is to take advantage of the fact that our instruction stream is just a sequence of bytes: we can stuff the number directly in the instruction stream. We define another instruction type for a number literal like so:

case INST_LITERAL:
{
  // Read the next byte from the bytecode.
  int value = bytecode[++i];
  push(value);
  break;
}

It reads the next byte in the bytecode stream as a number, and pushes it onto the stack.

Let’s string a few of these instructions together and watch the interpreter execute them to get a feel for how the stack works. We start with an empty stack and the interpreter pointing to the first instruction:

First, it executes the first INST_LITERAL. That reads the next byte from the bytecode (0) and pushes it onto the stack.

Then it executes the second INST_LITERAL. That reads the 10 and pushes it.

Finally, it executes INST_SET_HEALTH. That pops 10 and stores it in amount, then pops 0 and stores it in wizard. Then it calls setHealth() with those parameters.

Ta-da! We’ve got a spell that sets your wizard’s health to ten points. Now we’ve got enough flexibility to define spells that set either wizard’s stats to whatever amounts we want. We can also play different sounds and spawn particles.

But… this still just feels like a data format. We can’t, for example, raise a wizard’s health by half of their wisdom. Our designers want to be able to express rules for spells, not just values.

Behavior = composition

If we think of our little VM like a programming language, all it supports now is a couple of built-in functions and constant parameters for them. To get bytecode to feel like behavior, what we’re missing is composition.

Our designers need to be able to create expressions that combine different values in interesting ways. For a simple example, they want spells that modify a stat by a certain amount instead of to a certain amount.

That requires taking into account a stat’s current value. We have instructions for writing a stat, we just need to add a couple to read stats:

case INST_GET_HEALTH:
{
  int wizard = pop();
  push(getHealth(wizard));
  break;
}

case INST_GET_WISDOM:
case INST_GET_AGILITY:
  // You get the idea...

As you can see, these work with the stack in both directions. They pop a parameter to determine which wizard to get the stat for, then look up the stat’s value and push that back onto the stack.

This lets us write spells that copy stats around. We could create a spell that set a wizard’s agility to their wisdom, or a strange incantation that set one wizard’s health to mirror his opponent’s.

Better, but still quite limited. Next we need arithmetic. It’s time our baby VM learned how to add 1 + 1. We’ll add a few more instructions. By now, you’ve probably got the hang of it and can guess how they look. I’ll just show addition:

case INST_ADD:
{
  int b = pop();
  int a = pop();
  push(a + b);
  break;
}

Like our other instructions, it pops a couple of values, does a bit of work, and then pushes the result back. Up until now, every new instruction gave us an incremental improvement in expressiveness, but we just made a big leap. It isn’t obvious, but we can now handle all sorts of complicated, deeply nested arithmetic expressions.

Let’s walk through a slightly more complex example. Say we want a spell that increases your wizard’s health by the average of their agility and wisdom. In code, that’s:

setHealth(0, getHealth(0) +
    (getAgility(0) + getWisdom(0)) / 2);

You might think we’d need instructions to handle the explicit grouping that parentheses give you in the expression here, but the stack supports that implicitly. Here’s how you could evaluate this by hand:

  1. Get our wizard’s current health and remember it.
  2. Get our wizard’s agility and remember it.
  3. Do the same for their wisdom.
  4. Get those last two and add them, and remember the result.
  5. Divide that by two and remember the result.
  6. Recall the wizard’s health and add it to that result.
  7. Take that result and set the wizard’s health to that value.

Do you see all of those "remembers" and "recalls"? Each "remember" corresponds to a push, and the "recalls" are pops. That means we can translate this to bytecode pretty easily. For example, the first line to get the wizard’s current health is:

LITERAL 0
GET_HEALTH

This bit of bytecode pushes our wizard’s health onto the stack. If we mechanically translate each line like that, we end up with a chunk of bytecode that evaluates our original expression. To give you a feel for how the instructions compose, I’ve done that below.

To show how the stack changes over time, we’ll walk through a sample execution where the wizard’s current stats are 45 health, 7 agility, and 11 wisdom. Next to each instruction is what the stack looks like after executing it and then a little comment explaining the instruction’s purpose.

LITERAL 0    [0]            # Wizard index
GET_HEALTH   [45]           # getHealth()
LITERAL 0    [45, 0]        # Wizard index
GET_AGILITY  [45, 7]        # getAgility()
LITERAL 0    [45, 7, 0]     # Wizard index
GET_WISDOM   [45, 7, 11]    # getWisdom()
ADD          [45, 18]       # Add agility and wisdom
LITERAL 2    [45, 18, 2]    # Divisor
DIVIDE       [45, 9]        # Average agility and wisdom
ADD          [54]           # Add average to current health
SET_HEALTH   []             # Set health to result

If you watch the stack at each step, you can see how data flows through it almost like magic. We push the wizard’s current health at the beginning and it just hangs around at the bottom of the stack until we finally need it for the last addition at the end.

A virtual machine

I could keep going, adding more and more instructions, but this is a good place to stop. As it is, we’ve got a nice little VM that lets us define fairly open-ended behavior using a simple, compact data format. While "bytecode" and "virtual machines" sound intimidating, you can see they’re often as simple a stack, a loop, and a switch statement.

Remember our original goal to have behavior be nicely sandboxed? Now that you’ve seen exactly how the VM is implemented, it’s obvious that we’ve accomplished that. The bytecode can’t do anything malicious or reach out into weird parts of the game engine because we’ve only defined a few instructions that touch the rest of the game.

We control how much memory it uses by how big of a stack we create, and we’re careful to make sure it can’t overflow that. We can even control how much time it uses. In our instruction loop, we can track how many we’ve executed and bail out if it goes over some limit.

There’s just one problem left: actually creating the bytecode. So far, we’ve taken bits of pseudocode and compiled them to bytecode by hand. Unless you’ve got a lot of free time, that’s not going to work in practice.

Spellcasting tools

One of our initial goals was to have a higher-level way to author behavior, but we’ve gone and created something lower-level than C++. It has the runtime performance and safety we want, but absolutely none of the designer-friendly usability.

To fill that gap, we need some tooling. We need a program that lets users define the high-level behavior of a spell, then takes that and generates the apppropriate low-level stack machine bytecode.

That probably sounds way harder than making the VM. Many programmers were dragged through a compilers class in college and took away from it nothing but PTSD triggered by the sight of a book with a dragon on the cover or the words "lex" and "yacc".

In truth, compiling a text-based language isn’t that bad, though it’s a bit too broad of a topic to cram in here. However, you don’t have to do that. What I said we need was a tool — it doesn’t have to be a compiler whose input format is a text file.

On the contrary, I encourage you to consider building a graphical interface to let users define their behavior, especially if the people using it won’t be highly technical. Writing text that’s free of syntax errors is difficult for people who haven’t spent years getting used to a compiler yelling at them.

Instead, you can build an app that lets users "script" by clicking and dragging little boxes or pulling down menu items, or whatever else makes sense for the kind of behavior you want them to create.

The nice thing about this is that your UI can make it impossible for users to create "invalid" programs. Instead of vomiting error messages on them, you can just proactively disable buttons or provide default values to ensure that the thing they’ve created is valid at all points in time.

This spares you from designing a grammar and writing a parser for a little language. But, I know, some of you find UI programming equally unpleasant. Well, in that case, I don’t have any good news for you.

Ultimately, this pattern is about expressing behavior in a user-friendly, high-level format. You have to design the user experience for that format, and to execute it efficiently, you need to translate it into a lower-level form. It is real work, but if you’re up to the challenge, it can pay off in spades.

Design Decisions

I tried to keep this chapter as simple as I could, but what we’re really doing is creating a language. That’s a pretty open-ended design space. Exploring it can be tons of fun, so make sure you don’t forget to finish your game.

How do instructions access the stack?

Bytecode VMs come in two main flavors: stack-based and register-based. In a stack-based VM, instructions always work from the top of the stack, like our sample code. For example, INST_ADD pops two values, adds them, and pushes the result.

Register-based VMs still have a stack. The only difference is that instructions can read their inputs from deeper in it. Instead of INST_ADD always popping its operands, it has two indexes stored in the bytecode that identify where in the stack the to read the operands from.

So which should you do? My recommendation is to stick with a stack-based VM. They’re simpler to implement, and much simpler to generate code for. Register-based VMs got a reputation for being a bit faster after Lua converted to that style, but it depends deeply on your actual instructions and lots of other details of your VM.

What intructions do you have?

Your instruction set defines the boundaries of what can and cannot be expressed in bytecode, and also has a big impact on the performance of your VM. Here’s a laundry list of the different kinds of instructions you may want:

How our values represented?

Our sample VM only works with one kind of value, integers. That makes it easy: the stack is just a stack of ints. A more full-featured VM will support different data types: strings, objects, lists, etc. You’ll have to decide how those are stored internally.

My recommendation: If you can stick with a single data type, do that. Otherwise, do a tagged union. That’s what almost every language interpreter in the world does.

How is the bytecode generated?

I saved the most important question for last. I’ve walked you through the code to consume and interpret bytecode, but it’s up to you to build something to produce it. The typical solution here is to write a compiler, but it’s not the only option.

See Also