Beau­ti­ful Racket / Chap­ter 2

Make your first programming language — in one hour

This is an excerpt from a book I’m currently writing about making programming languages with Racket. I welcome comments & suggestions. Of course, I’m publishing the book with Pollen.
— Matthew Butterick

Let’s make stacker, a pro­gram­ming lan­guage that imple­ments a stack-based cal­cu­la­tor.

1
2
3
4
5
6
#lang stacker
push 4
push 8
+
push 3
*
1
36

The lan­guage relies on a stack of argu­ments, which starts out empty. The push com­mand puts a new argu­ment on the top of the stack. An oper­a­tor — either + or * — takes the top two argu­ments from the stack, and replaces them with the result of apply­ing the oper­a­tor (mean­ing first + second or first * second). So this sam­ple pro­gram means 3 * (8 + 4), which is 36.

Even though stacker is sim­ple, it will let us tour the whole lan­guage-mak­ing work­flow in Racket. Every lan­guage project will fol­low the same basic steps.

Later, we’ll see how stacker can become the foun­da­tion of a whole vir­tual-machine lan­guage.

Pre­req­ui­sites

Install Racket and the beautiful-racket pack­age as described in Setup. Though you can work through this tuto­r­ial with any code edi­tor, I rec­om­mend — and will assume you’re using — Racket’s graph­i­cal edi­tor, DrRacket.

Some gen­eral pro­gram­ming expe­ri­ence is assumed. No spe­cific expe­ri­ence with Racket, Scheme, or Lisp is assumed. Don’t worry if some of the ter­mi­nol­ogy or con­cepts go over your head — all the impor­tant top­ics will be reit­er­ated later.

What is a pro­gram­ming lan­guage?

On a tech­ni­cal level, a pro­gram­ming lan­guage just another pro­gram. It takes cer­tain input, then eval­u­ates that input to pro­duce a result.

On a design level, a pro­gram­ming lan­guage is a spec­i­fi­ca­tion for what kind of input is accept­able, how that input should be inter­preted, and what kind of results should be gen­er­ated. In that regard, what makes a pro­gram­ming lan­guage dif­fer­ent from an ordi­nary pro­gram is that it offers the widest hori­zon of abstrac­tion.

On a cul­tural level, a pro­gram­ming lan­guage is a way to express our evolv­ing ideas and dis­cov­er­ies about pro­gram­ming gen­er­ally — logic, per­for­mance, ergonom­ics, and so on. Though a pro­gram­ming lan­guage is a way of writ­ing a pro­gram, it’s also a way of think­ing about a pro­gram.

Because if lan­guages were just about writ­ing pro­grams, we could’ve stopped with C. (And some have.) But what would be the fun in that? Pro­grams are inter­est­ing specif­i­cally because they’re mal­leable. And the more we expect out of pro­grams, the more vital it is to explore new ways of mak­ing pro­grams.

Includ­ing new lan­guages.

By the way, does a pro­gram­ming lan­guage need to be large & gen­eral-pur­pose, like C? Not at all. A domain-spe­cific lan­guage (or DSL) is a “lit­tle lan­guage” — opti­mized to han­dle a spe­cific task, and noth­ing else. Some­times DSLs are designed to be used inside larger lan­guages — e.g,. reg­u­lar expres­sions, SQL. Some­times they stand alone — e.g., CSS/HTML, Post­Script, R, TeX. Many Unix-descended tools also qual­ify as DSLs — e.g., awk, bash, lex/yacc, make.

Why would I make a pro­gram­ming lan­guage?

If you think pro­gram­ming should fill your brain and soul with feel­ings of power, cre­ativ­ity, curios­ity, and fun — then you’ll prob­a­bly like mak­ing pro­gram­ming lan­guages.

If not, you can move along. (No hard feel­ings.)

But in terms of prac­ti­cal ben­e­fits:

Of course, a pro­gram­ming lan­guage isn’t the right solu­tion for every prob­lem. But when cre­at­ing a lan­guage is easy and inex­pen­sive — as the rest of this tuto­r­ial will show you — it becomes a real­is­tic option for many more prob­lems.

How are lan­guages imple­mented in Racket?

Racket is a descen­dant of Lisp and Scheme that pro­vides a high-level inter­face for mak­ing new lan­guages.

The basic idea is to take a pro­gram writ­ten in the new lan­guage and con­vert its sur­face syn­tax and seman­tics to a stan­dard Racket pro­gram. Then Racket runs this pro­gram nor­mally.

In that sense, every lan­guage in Racket is just an indi­rect way of writ­ing other Racket pro­grams (aka a source-to-source com­piler). This approach makes it pos­si­ble for a new lan­guage to imme­di­ately ben­e­fit from every­thing in Racket’s tool­chain: the libraries, the cross-plat­form com­pat­i­bil­ity, the DrRacket IDE, and so on.

Because every lan­guage is itself just a pro­gram, a lan­guage can do any­thing a pro­gram can. So you can make lan­guages that behave like tra­di­tional pro­gram­ming lan­guages. Or lan­guages that behave in uncon­ven­tional ways.

And why not? Mak­ing a pro­gram­ming lan­guage used to be a years-long com­mit­ment. Now you can make one in min­utes. (To be fair, this first one will be a lit­tle slower. But you’ll get the hang of it.)

The struc­ture of a lan­guage

Every lan­guage in Racket must pro­vide two func­tions:

  1. A reader, which con­verts the source file from a string of char­ac­ters into Racket-style paren­the­sized forms. (We’ll see how that works in a minute.)
  2. An expander, which deter­mines how the reader forms cor­re­spond to real Racket expres­sions, which are then eval­u­ated to pro­duce a result. (We’ll see why it’s called an expander later on.)

When you make a lan­guage, you’ll always have to spec­ify a reader and an expander. Some­times you’ll write them from scratch. Some­times you’ll import them from else­where.

But for stacker, we’ll write both the reader and expander. In most projects, the expander is more involved, so it often makes sense to start there and work back­ward to the reader. But for this project, they’re equally sim­ple. Thus, we’ll start with the reader, so we get a bet­ter sense of the pro­gram flow, start to fin­ish.

Boot­strap­ping your lan­guage

Every Racket source file begins with a #lang line. If you’ve explored Racket you might have seen vari­ants like:

1
2
3
4
#lang racket
#lang racket/base
#lang scribble/doc
#lang datalog

We just learned that the first step in inter­pret­ing a lan­guage is to process the source with a reader. The #lang line’s job is to tell Racket where to find a reader for the source file. The reader, in turn, will tell Racket where to find the expander.

Often, the reader and expander for a lan­guage are stored in dif­fer­ent source files. But they don’t have to be. For this project, we’ll just use one file, called stacker-lang.rkt. (There’s no magic in the file­name. You can call it what­ever you want.)

To use the abbre­vi­ated #lang stacker nota­tion, we’ll have to do some extra house­keep­ing that we’ll cover at the end of the tuto­r­ial.

For now, we’ll use the spe­cial boot­strap #lang reader ... nota­tion, like so:

1
#lang reader "stacker-lang.rkt"

This is the same as the nota­tion above, but allows us to use the path to the source file that has our reader.

Racket basics

We need to write some Racket code. So let’s learn some Racket.

Beyond that, Racket has the usual num­bers, strings, and func­tions that you know from other lan­guages. We’ll han­dle those as we encounter them.

Set­ting up our source files

Open DrRacket. Start a new file called stacker-lang.rkt. We’ll write this file using a spe­cial tuto­r­ial lan­guage called br. (br comes from the beautiful-racket pack­age we installed. It’s Racket, but with some extra con­ve­niences.) For test­ing pur­poses, let’s add a sec­ond line, so your whole file looks like this:

stacker-lang.rkt
1
2
#lang br
42

Click Run in DrRacket (or learn the key short­cut, because you’ll be using it a lot). In the inter­ac­tions win­dow, you’ll see the result of run­ning the file:

1
42

In the same direc­tory, cre­ate a sec­ond file that we’ll use to test our new lan­guage as we go:

stacker-test.rkt
1
#lang reader "stacker-lang.rkt"

Run this file too. You should get an awful-look­ing error like read-syntax: cannot find reader for `#lang reader "stacker-lang.rkt". If so, that’s the right result: we’re telling #lang to get a reader out of "stacker-lang.rkt", but we haven’t made one yet.

So let’s do that.

Design­ing our reader

Recall the pur­pose of the reader:

Before plung­ing into the code, we should visu­al­ize what our reader needs to accom­plish. Our stacker lan­guage will ulti­mately look like this:

1
2
3
4
5
6
#lang stacker
push 4
push 8
+
push 3
*

These, how­ever, are not paren­the­sized forms. How can we get there? We notice that our lan­guage has one instruc­tion on each line. Thus, the sim­plest idea — never a bad place to start — would be to just wrap paren­the­ses around each line:

1
2
3
4
5
(push 4)
(push 8)
(+)
(push 3)
(*)

Not bad. These now look like Racket expres­sions. If we code up a func­tion called push, these expres­sions would eval­u­ate cor­rectly.

The prob­lem comes with the oper­a­tors, which become expres­sions that look like (+) and (*). That paren­the­siz­ing means these func­tions would be eval­u­ated with­out any argu­ments. That would be wrong — they’re sup­posed to take their input argu­ments from the top of the stack. (You’re wel­come to run (+) or (*) in a sep­a­rate DrRacket win­dow to see what you get.)

We need to make it pos­si­ble to man­age the inter­ac­tion of our stack and our oper­a­tors. So let’s imag­ine that we have an inter­me­di­ate func­tion — call it dispatch — that will inspect each instruc­tion and decide what to do. Then we can wrap each instruc­tion like so:

1
2
3
4
5
(dispatch push 4)
(dispatch push 8)
(dispatch +)
(dispatch push 3)
(dispatch *)

What have we accom­plished? Most impor­tantly, our oper­a­tors won’t be treated as func­tion calls, but rather as argu­ments to another func­tion. (dispatch +) means “call dispatch with + as the argu­ment.” In that case, dispatch can get the argu­ments off the stack and apply the oper­a­tion. Like­wise for (dispatch push 4) — dispatch can just treat it like it were (push 4). There­fore, as long as we can write a dispatch func­tion for our expander that works this way — trust me, we can — then this reader trans­for­ma­tion will suf­fice.

Curi­ous char­ac­ters might won­der if we could make a reader that treated the two types of instruc­tions dif­fer­ently:

1
2
3
4
5
(push 4)
(push 8)
(dispatch +)
(push 3)
(dispatch *)

Yes, we could. But for this tuto­r­ial, we’re going to use the sim­plest pos­si­ble reader, and wrap every­thing with dispatch. We’ll reserve all the other work for the expander. In your own lan­guages, you can choose how to allo­cate respon­si­bil­i­ties between the reader and expander.

To sum­ma­rize — our reader will con­vert code that looks like this:

1
2
3
4
5
push 4
push 8
+
push 3
*

Into this:

1
2
3
4
5
(dispatch push 4)
(dispatch push 8)
(dispatch +)
(dispatch push 3)
(dispatch *)

Now that we know what our reader needs to do, we can imple­ment it.

Pro­gram­ming our reader: out­put

By con­ven­tion, Racket always expects the main reader func­tion to be called read-syntax. Racket will call this func­tion with two argu­ments: the path to the source file, and a port for read­ing from the file. But for now, let’s set these aside and focus on the out­put of the func­tion.

read-syntax has one job: to return a syn­tax object describ­ing a mod­ule. The mod­ule, in turn, will invoke the expander for our lan­guage, trig­ger­ing the full expan­sion and eval­u­a­tion of the input pro­gram.

Let’s see how this works by look­ing at some code.

stacker-lang.rkt
1
2
3
4
5
#lang br
(define (read-syntax src-path src-port)
  #'(module lucy br
      42))
(provide read-syntax)

To cre­ate a func­tion, we use define. (define (read-syntax src-path src-port) ...) defines a func­tion called read-syntax that accepts two argu­ments. These are posi­tional argu­ments, so we can name them what­ever we like. To make read-syntax func­tion avail­able out­side the file, we use provide. (In Racket, all def­i­n­i­tions within a source file are pri­vate by default.)

In the body of our func­tion we see our mod­ule syn­tax. (Racket has no return state­ment — a func­tion sim­ply returns the last expres­sion in its body.) In Racket, a mod­ule is the basic orga­ni­za­tional unit for code. Like every­thing else in Racket, a mod­ule is an expres­sion. A mod­ule expres­sion has this form:

1
2
(module name-of-module language-that-will-provide-the-expander
   expressions-to-evaluate ...)

So our mod­ule expres­sion:

1
2
(module lucy br
    42)

Means “a mod­ule named lucy, using the expander from the br lan­guage, that eval­u­ates the expres­sion 42.”

Then we need to turn this into a syn­tax object. A syn­tax object is a way of stor­ing a ref­er­ence to the lit­eral code of an expres­sion, so it can be eval­u­ated later. We can turn any expres­sion into a syn­tax object by wrap­ping it in the syntax con­verter:

1
2
(syntax (module untitled br
          42))

Syn­tax objects are so com­mon in Racket that there’s a short­hand nota­tion for mak­ing them — just append the #' pre­fix (pro­nounced hash-quote) to the expres­sion. The idiomatic way of writ­ing the above syn­tax object would be:

1
2
#'(module untitled br
    42)

“This syn­tax object will be eval­u­ated later — but when?” When we invoke the lan­guage. Save "stacker-lang.rkt". Run "stacker-test.rkt". This time, instead of an error, you should get a dif­fer­ent result:

1
42

What’s hap­pen­ing here? When we run "stacker-test.rkt", we’re invok­ing our lan­guage, which means call­ing our reader func­tion, read-syntax. The con­tents of the "stacker-test.rkt" source file are replaced with the result of this func­tion, which is a syn­tax object describ­ing a mod­ule. Once that syn­tax has been moved into the new loca­tion, it’s eval­u­ated, pro­duc­ing 42.

Let’s per­suade our­selves that noth­ing spooky has hap­pened. We can accom­plish the same trans­for­ma­tion by hand. We replace the #lang line in "stacker-test.rkt" with the mod­ule expres­sion (this time with­out the #' syn­tax pre­fix, because we want the code to be eval­u­ated), and run it:

stacker-test.rkt
1
2
(module untitled br
    42)

Once again we get 42.

This proved that we can make a round-trip from a source file to a reader and back. Our reader works, except that it’s not, you know, read­ing any­thing. So let’s fix that.

Pro­gram­ming our reader: input

We’ll upgrade our reader to do three new tasks:

  1. Read in the lines of the source file.
  2. Con­vert these lines to (dispatch ...) form.
  3. Put these new forms into the mod­ule we’re return­ing as a syn­tax object.

To do this, let’s swap out the code in "stacker-lang.rkt":

stacker-lang.rkt
1
2
3
4
5
6
7
8
9
#lang br
(define (read-syntax src-path src-port)
  (define src-strs (port->lines src-port))
  (define (make-datum str) (format-datum '(dispatch ~a) str))
  (define src-exprs (map make-datum src-strs))
  (inject-syntax ([#'(src-expr ...) src-exprs])
    #'(module stacker-reader br
        src-expr ...)))
(provide read-syntax)

We’ll step through each line to see what it does.

1
(define src-strs (port->lines src-port))

Recall that Racket passes two argu­ments to our func­tion: the path to the source file, which we’ve named src-path, and an input port point­ing at this file, which we’ve named src-port. A port is a generic inter­face for retriev­ing data from a file or other source. Using port->lines, we’ll retrieve all the lines from the file as strings, and store them in src-strs. (In this project, we won’t need src-path for any­thing.)

Next, we’ll con­vert these strings into datums. A datum is a string that’s been parsed as a Racket form, but not yet eval­u­ated. For instance, these are strings:

1
"(list 1 2 3)" "#t" "hello" "(+ 1 2)"

And these are the cor­re­spond­ing datums (I know the plural of datum ought to be data, but allow me some gram­mat­i­cal license, for clar­ity):

1
'(list 1 2 3)  #t  'hello  '(+ 1 2)

To cre­ate a datum from an exist­ing paren­the­sized expres­sion, we sim­ply wrap it with the quote con­verter:

1
2
(quote (module untitled br
          42))

Though as with syn­tax objects, there is ane equiv­a­lent short­hand nota­tion — in this case, we pre­fix the expres­sion with ' (also pro­nounced quote):

1
2
'(module untitled br
    42)

If a datum sounds like a syn­tax object, that’s no coin­ci­dence — a syn­tax object is just a datum with some extra infor­ma­tion attached. The #' pre­fix nota­tion we used for a syn­tax object now con­nects log­i­cally: we can think of the # as rep­re­sent­ing the extra info that gets attached to the datum:

1
2
#'(module untitled br
    42)

Back to our code:

1
2
(define (make-datum str) (format-datum '(dispatch ~a) str))
(define src-datums (map make-datum src-strs))

We want the strings we got from our source file to behave as expres­sions, so we’ll con­vert them to datums. Because we’re con­vert­ing a list of strings, we’ll use map to apply a con­ver­sion func­tion, make-datum, to every ele­ment in src-strs. This will cre­ate a new list of src-exprs. In turn, make-datum relies on format-datum to cre­ate a datum with the right form — '(dispatch ~a) (the value of str gets inter­po­lated into the ~a slot).

Finally, the twisti­est bit of code — but don’t panic:

1
2
3
(inject-syntax ([#'(src-expr ...) src-exprs])
  #'(module stacker-reader br
      src-expr ...))

Recall that the out­put of our func­tion is a syn­tax object describ­ing a mod­ule. We need to update this syn­tax object to incor­po­rate the datums we just made.

To do this, we’ll use inject-syntax to make new syn­tax avail­able within the mod­ule syn­tax. In the first line, we match our datums to a syn­tax pat­tern, #'(src-expr ...), which is just a way of label­ing ele­ments so they can be rearranged as pieces of syn­tax. Here, the syn­tax pat­tern unpacks the ele­ments of src-exprs so we can refer to them indi­vid­u­ally.

This is eas­ier to under­stand if you look at how the syn­tax object has changed. src-expr refers to the first item in the list. After that, the ... nota­tion means “do the same to every other item in the list.” So if src-exprs has three items, say (list '(dispatch 42) '(dispatch "Hello world") '(dispatch #t)), then the result­ing syn­tax object will look like this:

1
2
3
4
#'(module stacker-reader br
    (dispatch 42)
    (dispatch "Hello world")
    (dispatch #t))

Notice that the quote pre­fixes on our datums dis­ap­pear. That’s because a) inject-syntax auto­mat­i­cally upgrades our datums to syn­tax objects and then b) the syn­tax-object pre­fix #' on the front of the mod­ule expres­sion applies to every­thing in it.

Test­ing our reader

At this point, it would be nice to see if our updated reader works the way we hope. Let’s make a source file with these sam­ple val­ues and see if we do, in fact, get this code back. Save "stacker-lang.rkt". Update "stacker-test.rkt" as fol­lows:

stacker-lang.rkt
1
2
3
4
#lang reader "stacker-lang.rkt"
42
"Hello world"
#t

Run it. What hap­pens? You should get an error that looks like dispatch: unbound identifier in module. In time, you will think of this as a good-news error — it means that our code is try­ing to call the dispatch func­tion (which is what we wanted) but can’t, because we haven’t writ­ten it yet (which shouldn’t sur­prise us).

In the mean­time, how can we check the reader func­tion? For debug­ging pur­poses, we can take a sneaky short­cut: add a ' pre­fix to the src-expr, like so:

1
2
3
(inject-syntax ([#'(src-expr ...) src-exprs])
  #'(module stacker-reader br
      'src-expr ...))

What will hap­pen now? Before we run our test file again, let’s rea­son through the code. There is a trap afoot for the unwary. Here are the two pos­si­bil­i­ties.

Option A: “Well, adding the ' pre­fix turns an expres­sion into a datum, which is the lit­eral code src-expr. And if there are three lines of input, the ... means that datum will get repeated three times in the syn­tax object, result­ing in this—”

1
2
3
4
#'(module stacker-reader br
    'src-expr
    'src-expr
    'src-expr)

Option B: “No, every­thing works the same way as before, except that every line now has a ' in front of it, like so—”

1
2
3
4
#'(module stacker-reader br
    '(dispatch 42)
    '(dispatch "Hello world")
    '(dispatch #t))

As we might hope, the answer is B. But why? Don’t for­get that the point of using a syn­tax object is to delay eval­u­a­tion of the code inside. While we’re using inject-syntax, the only parts of our syn­tax object that can change are the pieces ref­er­enced by our syn­tax pat­tern — namely, the vari­able src-expr and the ... short­hand. The rest, includ­ing the ' pre­fix­ing, won’t be eval­u­ated until after the func­tion exits.

It may help to remem­ber that the ' pre­fix is equiv­a­lent to wrap­ping in quote like so:

1
2
3
(inject-syntax ([#'(src-expr ...) src-exprs])
  #'(module stacker-reader br
      (quote src-expr) ...))

This makes it a lit­tle more plain that the quote will be repeated for every expres­sion in (src-expr ...).

But then what will hap­pen? Run stacker-test.rkt and see for your­self:

1
2
3
'(dispatch 42)
'(dispatch "Hello world")
'(dispatch #t)

By using the ' pre­fix within the syn­tax object, we con­vert our code back into datums, which print as val­ues in DrRacket. Now we can ask: did the reader do what we hoped? Looks like it did.

To fin­ish our test, let’s change our source file to use the input from the very first exam­ple:

stacker-test.rkt
1
2
3
4
5
6
#lang reader "stacker-lang.rkt"
push 4
push 8
+
push 3
*

Run it again and we get:

1
2
3
4
5
'(dispatch push 4)
'(dispatch push 8)
'(dispatch +)
'(dispatch push 3)
'(dispatch *)

Let’s com­pare that to the reader behav­ior we were look­ing for:

1
2
3
4
5
(dispatch push 4)
(dispatch push 8)
(dispatch +)
(dispatch push 3)
(dispatch *)

Except for the quotes (which are just there tem­porar­ily for debug­ging), that’s exactly right. We can move on — after we’ve made one last change to pre­pare the reader to coop­er­ate with the expander (which we’ll build­ing in the next sec­tion). To do this, we change the mod­ule expres­sion so that instead of invok­ing br as the expander, it invokes "stacker-lang.rkt". So the fin­ished reader looks like this:

stacker-lang.rkt
1
2
3
4
5
6
7
8
9
#lang br
(define (read-syntax src-path src-port)
  (define src-strs (port->lines src-port))
  (define (make-datum str) (format-datum '(dispatch ~a) str))
  (define src-exprs (map make-datum src-strs))
  (inject-syntax ([#'(src-expr ...) src-exprs])
    #'(module stacker-mod "stacker-lang.rkt"
        src-expr ...)))
(provide read-syntax)

The expander

To recap — every lan­guage in Racket must pro­vide two func­tions:

  1. The reader con­verts the source file from a string of char­ac­ters into Racket-style paren­the­sized forms.
  2. The expander deter­mines how the reader forms cor­re­spond to real Racket expres­sions, which are then eval­u­ated to pro­duce a result.

We made our reader. Now we’ll make our expander.

First, why is it called an expander? Recall that our reader con­sisted of a read-syntax func­tion that took source code and, instead of return­ing a value, pro­duced a syn­tax object that added (dispatch ...) to each line of the source code, in essence “expand­ing” it.

It turns out that a lot of things in Racket that look like run-time func­tions are instead copy­ing & rear­rang­ing code at com­pile time. These are a spe­cial class of func­tions called macros. Macros have a restricted inter­face: they take one syn­tax object as input, and return another syn­tax object as out­put. Thus, macros are also known as syn­tax trans­form­ers.

(Though macros are func­tions, it’s con­ven­tional in Racket to only refer to them as “macros,” and reserve the term “func­tion” for some­thing that’s called at run time. So if I say some­thing like “con­vert a func­tion into a macro,” that’s the intended con­trast.)

Under this def­i­n­i­tion, our read-syntax func­tion wasn’t quite a macro. Though it did return a syn­tax object, it accepted two input argu­ments: a path and an input port.

But as we’ll see, our expander is going to start with a macro, which will in turn bring in the other func­tions we need to run our source file.

Design­ing our expander

As we did with the reader, let’s first pen­cil out what our expander needs to do.

The reader was respon­si­ble for the form of the code; we could say the expander is respon­si­ble for its mean­ing. The expander’s job is to eval­u­ate the code pre­pared by the reader. The code can be eval­u­ated only if every name used in the code has a con­nec­tion to an actual value or func­tion. In Racket, a “name used in the code” is known as an iden­ti­fier, and “a con­nec­tion to an actual value or func­tion” is known as a bind­ing. In short, the expander has to ensure that every iden­ti­fier in the code has a bind­ing. (A vari­able in Racket = an iden­ti­fier + a bind­ing.)

We already came across this ter­mi­nol­ogy when we ran "stacker-test.rkt" with­out an expander (run it again if you like). We got the error dispatch: unbound identifier in module. Now we see what it means. The code from the reader refers to a dispatch iden­ti­fier, e.g.:

1
(dispatch push 4)

This expres­sion means “call the func­tion dispatch with the argu­ments push and 4.” But the expander hasn’t yet given dispatch a bind­ing. Hence the error when the pro­gram runs: dispatch is unde­fined.

This error isn’t endemic to our lan­guage. You’ll trig­ger the same error in Racket when­ever you use a name that hasn’t yet been defined, for instance:

1
2
#lang racket
foo
1
2
#lang scribble/doc
@bar[]

In our case, the code we’re get­ting from the reader will look like this:

1
2
3
4
5
(dispatch push 4)
(dispatch push 8)
(dispatch +)
(dispatch push 3)
(dispatch *)

From this sam­ple, we can list the iden­ti­fers that will need bind­ings:

  1. dispatch, which is a func­tion.
  2. push, which is a stack com­mand.
  3. + and *, which are stack oper­a­tors.

We also need num­bers. But we get those for free — in Racket, num­bers can’t be iden­ti­fiers. They auto­mat­i­cally eval­u­ate to their numeric value.

Our expander design is com­plete. If our expander pro­vides bind­ings for these iden­ti­fiers — and one spe­cial macro, that we’ll dis­cuss in the next sec­tion — then the stacker lan­guage will work.

Pro­gram­ming our expander: out­put

We saw that Racket starts the reader by invok­ing a func­tion called read-syntax. Sim­i­larly, Racket starts the expander by invok­ing a macro called, by con­ven­tion, #%module-begin. There­fore, every expander needs to pro­vide a #%module-begin macro.

Racket has a small num­ber of macros with spe­cial sta­tus. You can rec­og­nize them by the #% pre­fix. #%module-begin is spe­cial because when a mod­ule is eval­u­ated, all the code inside that mod­ule gets sent to #%module-begin, which can do any­thing it pleases with it, before return­ing a syn­tax object, after which the pro­gram expan­sion & eval­u­a­tion will con­tinue.

Because every Racket lan­guage pro­vides its own #%module-begin, the most com­mon way to imple­ment the #%module-begin in a new lan­guage is:

  1. Han­dle any lan­guage-spe­cific pro­cess­ing of the code.
  2. Pass the result to another #%module-begin for the heavy lift­ing.

Of course, because all the #%module-begins have the same name, some care needs to be taken.

But let’s see the care­less approach first. Put this code in "stacker-lang.rkt":

1
2
3
4
5
(define #'(#%module-begin reader-line ...)
  #'(#%module-begin
     (define studio (* 6 9))
     (displayln studio)))
(provide #%module-begin)

As before, we’ll use define and provide to set up our func­tion and make it avail­able to other files. We’ll talk about #'(#%module-begin reader-line ...) in the next sec­tion.

For now, let’s focus on the out­put of the macro. As with read-syntax, it’s another syn­tax object (mean­ing, it begins with #'). This pre­lim­i­nary #%module-begin isn’t going to do any­thing with the inbound code. It’s just going to cre­ate a vari­able called studio and print its value. Then it will pass this syn­tax to the #%module-begin from the br lan­guage (which is always avail­able within "stacker-lang.rkt" because the file is writ­ten in #lang br).

Make sure "stacker-test.rkt" still looks like this, and run it:

stacker-test.rkt
1
2
3
4
5
6
#lang reader "stacker-lang.rkt"
push 4
push 8
+
push 3
*

What hap­pened? Noth­ing yet? OK, one Mis­sis­sippi, two Mis­sis­sippi ... how about now? Noth­ing at all? Why is this tak­ing so long?

Unfor­tu­nately the run will never fin­ish, because our expander has an awful bug in it. Within DrRacket, select Force the pro­gram to quit or type ctrl+K. (Con­sider mem­o­riz­ing this key com­bi­na­tion, because it’s not the last time you’ll need to kill a pro­gram in your lan­guage-debug­ging career.)

Astute observers prob­a­bly already noticed the bug: when we defined our macro with the name #%module-begin, the #%module-begin inside the macro, instead of refer­ring to the #%module-begin in #lang br, now referred to itself. (A prob­lem of vari­able nam­ing more gen­er­ally known as shad­ow­ing.) So our #%module-begin called itself recur­sively ... for­ever.

We can cure the prob­lem by giv­ing our macro an arbi­trary name when we define it, and then using rename-out to change its name as it passes through provide, like so:

1
2
3
4
5
(define #'(stacker-module-begin reader-line ...)
  #'(#%module-begin
     (define studio (* 6 9))
     (displayln studio)))
(provide (rename-out [stacker-module-begin #%module-begin]))

After you replace this code, run "stacker-test.rkt" again, and you’ll get the expected result:

1
54

“Hold on. Why does renam­ing the macro work? Doesn’t the out­put syn­tax still refer to #%module-begin, which is the final name of the macro?” No, but only because Racket macros are hygienic. Macros copy code to other places, pack­aged in syn­tax objects. That code con­tains iden­ti­fiers. So a ques­tion arises: how should we deter­mine the bind­ing — aka, the mean­ing — of a copied iden­ti­fier? Should we look at the des­ti­na­tion for the copy (where the macro is being invoked)? Or at the source for the copy (where the macro is defined)?

Hygienic macros fol­low the sec­ond rule — a macro copies code, but the iden­ti­fiers used in the macro keep their bind­ings from the place where the macro was defined. (Hence the “hygiene” metaphor — the macro keeps its iden­ti­fiers “cleanly” sep­a­rated from those at the des­ti­na­tion.)

To see hygiene in action, go into DrRacket and hover your cur­sor over the #%module-begin inside stacker-module-begin. Now that we’ve changed the name of the macro, you’ll see a mes­sage pop up: binding #%module-begin imported from br. What hygiene guar­an­tees is that when­ever this macro is used, the #%module-begin iden­ti­fier will always refer to the #%module-begin from #lang br.

For now, we won’t delve into the details of how Racket keeps all this straight. And yes, it is pos­si­ble to over­ride hygiene when needed. What’s impor­tant to notice is that a macro is not merely a find-and-replace mech­a­nism — it floats within a bub­ble of bind­ings.

More­over, hygiene is good pol­icy: it guar­an­tees that wher­ever our macro is used, the copied code will always mean the same thing. Here’s an exam­ple of why that might be a wise idea. Curi­ous char­ac­ters are invited to paste this pro­gram into DrRacket and hover over the two ref­er­ences to * to under­stand why the *-macro does not, in fact, start global ther­monu­clear war (unfor­tu­nately, (* 6 9) does):

1
2
3
4
5
6
7
8
9
#lang br
(module the-macro br
  (provide *-macro)
  (define #'(*-macro x y) #'(* x y)))

(require 'the-macro) ; `require` imports bindings from a module
(define (* x y) 'global-thermonuclear-war-initiated)
(*-macro 6 9)
(* 6 9)

We’ve now proved that we can make a round-trip from our source file, through our reader, then through our expander, and back. We’re another step closer to hav­ing a work­ing lan­guage.

Pro­gram­ming our expander: input

Recall that a macro takes one syn­tax object as input and returns another. Con­sis­tent with that logic, we can use the #' syn­tax pre­fix within define to eas­ily con­vert a func­tion into a macro. Just add the #' pre­fix to both the input and out­put expres­sions, like so.

1
2
3
4
#lang br
(define (func x y z) (list x y z))
(define #'(mac x y z) #'(list x y z))
(list (func 4 5 6) (mac 4 5 6)) ; '((4 5 6) (4 5 6))

One wrin­kle is that the input expres­sion for a macro is actu­ally a syn­tax pat­tern, like we saw in our read-syntax func­tion:

1
2
3
(inject-syntax ([#'(src-expr ...) src-exprs])
  #'(module stacker-reader br
      src-expr ...))

That means we can use the ... syn­tax ellip­sis when needed:

1
2
3
4
#lang br
(define (func x y z) (list x y z))
(define #'(mac arg ...) #'(list arg ...))
(list (func 4 5 6) (mac 4 5 6)) ; '((4 5 6) (4 5 6))

There’s often more than one valid way to write a macro. For instance, these three macros pro­duce the same result (shouldn’t be hard to infer why):

1
2
3
4
5
6
#lang br
(define (func x y z) (list x y z))
(define #'(mac x y z) #'(list x y z))
(define #'(mac-2 x y z) #'(func x y z))
(define #'mac-3 #'func)
(list (func 4 5 6) (mac 4 5 6) (mac-2 4 5 6) (mac-3 4 5 6))

For our #%module-begin macro, we’ll invoke define with the syn­tax pat­tern #'(#%module-begin reader-line ...). In so doing, our macro will match the first line of the incom­ing code to reader-line, and the remain­ing lines to the ... short­hand. For now, we’ll just pass the reader-lines through to the next #%module-begin in the chain, leav­ing us with the sim­plest pos­si­ble macro:

1
2
3
4
(define #'(stacker-module-begin reader-line ...)
  #'(#%module-begin
     reader-line ...))
(provide (rename-out [stacker-module-begin #%module-begin]))

Test­ing our expander I/O

Just as we did in our reader, we can add the quote pre­fix ' inside our macro to print out the lit­eral code:

1
2
3
4
(define #'(stacker-module-begin reader-line ...)
  #'(#%module-begin
     'reader-line ...))
(provide (rename-out [stacker-module-begin #%module-begin]))

When we run our "stacker-test.rkt":

stacker-test.rkt
1
2
3
4
5
6
#lang reader "stacker-lang.rkt"
push 4
push 8
+
push 3
*

We should get the same result as we got when test­ing our reader:

1
2
3
4
5
'(dispatch push 4)
'(dispatch push 8)
'(dispatch +)
'(dispatch push 3)
'(dispatch *)

You might also get an error: Interactions disabled: "stacker-lang.rkt" does not support a REPL. Don’t panic — we’ll fix that shortly.

For now, let’s make sure our "stacker-lang.rkt" is restored to its nor­mal state. We’re up to a whop­ping 12 lines of code:

stacker-lang.rkt
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
#lang br
(define (read-syntax src-path src-port)
  (define src-strs (port->lines src-port))
  (define (make-datum str) (format-datum '(dispatch ~a) str))
  (define src-exprs (map make-datum src-strs))
  (inject-syntax ([#'(src-expr ...) src-exprs])
    #'(module stacker-mod "stacker-lang.rkt"
        src-expr ...)))
(provide read-syntax)

(define #'(stacker-module-begin reader-line ...)
  #'(#%module-begin
     reader-line ...))
(provide (rename-out [stacker-module-begin #%module-begin]))

Run "stacker-test.rkt" again and we should see a dif­fer­ent error: dispatch: unbound identifier in module. We’ll fix that next.

Pro­gram­ming our expander: bind­ings

To com­plete our expander, we need to pro­vide bind­ings for the iden­ti­fiers we enu­mer­ated in our expander design:

  1. dispatch, which is a func­tion.
  2. push, which is a stack com­mand.
  3. + and *, which are stack oper­a­tors.

We’ll start with push, which takes one numer­i­cal argu­ment and pushes onto the top of a stack. Add the fol­low­ing code to "stacker-lang.rkt":

1
2
(define stack empty)
(define (push num) (set! stack (cons num stack)))

Our stack starts out as the empty list. push uses the cons func­tion to put num onto the front of the list, and set!s the stack to the new ver­sion. cons is one of the old­est Lisp func­tions, get­ting its name from its role con­struct­ing a new item in mem­ory. Every time we cons an item onto the list, we cre­ate a new list that’s one big­ger. set! is self-explana­tory, except maybe the ! suf­fix — that’s a Racket con­ven­tion to denote func­tions that update the value of a vari­able.

Now let’s add our long-awaited dispatch func­tion:

1
2
3
4
5
6
7
(define (dispatch arg-1 [arg-2 #f])
  (cond
    [(number? arg-2) (push arg-2)]
    [else
     (define op arg-1)
     (define op-result (op (first stack) (second stack)))
     (set! stack (cons op-result (drop stack 2)))]))

There are a few new Racket func­tions here, so we’ll walk through each line.

1
(define (dispatch arg-1 [arg-2 #f])

We define our dispatch func­tion to take two argu­ments. How do we know we need two argu­ments? We just look at the code that’s com­ing into our expander.

1
2
3
4
5
(dispatch push 4)
(dispatch push 8)
(dispatch +)
(dispatch push 3)
(dispatch *)

We see there are two call­ing pat­terns for dispatch. Either we’ll get two argu­ments, where the first argu­ment is push and the sec­ond argu­ment is a num­ber. Or we’ll get one argu­ment, which will be a stack oper­a­tor (+ or *). The sec­ond argu­ment is optional, so we put it in square brack­ets and assign a default value of #f.

1
2
3
4
(define (dispatch arg-1 [arg-2 #f])
  (cond
    [(number? arg-2) (push arg-2)]
    [else ...]))

cond is Racket’s con­di­tional expres­sion. Each set of square brack­ets intro­duces a con­di­tional test, fol­lowed by the code to eval­u­ate if the test is true. An else branch is invoked when none of the tests are true.

Our first branch tests whether we got a number? for arg-2. If so, we know it’s a push instruc­tion, so we (push arg-2).

1
2
3
4
[else
 (define op arg-1)
 (define op-result (op (first stack) (second stack)))
 (set! stack (cons op-result (drop stack 2)))]

If we don’t have a push instruc­tion, we must have a stack oper­a­tor in the first argu­ment. So we (define op arg-1). We need to send our oper­a­tor the first and second ele­ments from the stack, so we get (op (first stack) (second stack)). Then we put the result­ing value back onto the stack using set!, as we did in push. Notice, how­ever, that we drop those two ele­ments from the stack before we put the result onto the stack — the idea is that they get con­sumed by the oper­a­tor.

That’s it for bind­ings.

“Wait — what about + and *?” We don’t need to do any­thing, because they already have bind­ings. Recall that macros, includ­ing stacker-module-begin, are hygienic. This means that iden­ti­fiers inside any macro-cre­ated code take their mean­ing from where the macro was defined. Because stacker-module-begin has access to all the bind­ings pro­vided by #lang br, these bind­ings travel with the code the macro cre­ates. + and * are already defined by #lang br in the usual way, which suf­fices. (Nei­ther push nor dispatch are defined by #lang br, so we had to make those.)

That said, we remain free to define new bind­ings for + and * within our lan­guage, even in ways that are coun­terini­tu­itive:

1
2

Were we to do so, these new bind­ings would super­sede the ones from #lang br within "stacker-lang.rkt", includ­ing within the code cre­ated by a macro like stacker-module-begin.

“Don’t we have to provide all these lan­guage bind­ings, as we did for stacker-module-begin?” Actu­ally, no. provide makes a bind­ing avail­able out­side of a source file. But remem­ber that a macro brings its own bind­ings along for the ride (= that’s hygiene). So when our stacker-module-begin macro cre­ates ref­er­ences to dispatch, push, +, and * else­where, they auto­mat­i­cally refer to the def­i­n­i­tions that are local to the macro.

If that still seems spooky or weird, think about it this way. The Amer­i­can embassy in Paris occu­pies a very nice build­ing on the Place de la Con­corde. Cer­tainly, the embassy is phys­i­cally within the bound­aries of France. But when you step inside the embassy, what coun­try are you in? You’re no longer in France. You’re in the United States, and US law applies. In that way, a macro behaves like a sov­er­eign state. Though the code it cre­ates trav­els to other loca­tions, the iden­ti­fiers in that code retain the mean­ing they have in the macro home­land.

Our "stacker-lang.rkt" should now look like this:

stacker-lang.rkt
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
#lang br
(define (read-syntax src-path src-port)
  (define src-strs (port->lines src-port))
  (define (make-datum str) (format-datum '(dispatch ~a) str))
  (define src-exprs (map make-datum src-strs))
  (inject-syntax ([#'(src-expr ...) src-exprs])
                 #'(module stacker-mod "stacker-lang.rkt"
                     src-expr ...)))
(provide read-syntax)

(define #'(stacker-module-begin reader-line ...)
  #'(#%module-begin
     reader-line ...))
(provide (rename-out [stacker-module-begin #%module-begin]))

(define stack empty)

(define (push num) (set! stack (cons num stack)))

(define (dispatch arg-1 [arg-2 #f])
  (cond
    [(number? arg-2) (push arg-2)]
    [else
     (define op arg-1)
     (define op-result (op (first stack) (second stack)))
     (set! stack (cons op-result (drop stack 2)))]))

Test­ing our expander with bind­ings

Are we there yet? Just about. Let’s run our "stacker-test.rkt" file, which still looks like this:

stacker-test.rkt
1
2
3
4
5
6
#lang reader "stacker-lang.rkt"
push 4
push 8
+
push 3
*

When we do, we get an error: Interactions disabled: "stacker-lang.rkt" does not support a REPL. The REPL means read–eval­u­ate–print loop, and refers to the inter­ac­tive com­mand line that appears adja­cent to our code in DrRacket. To clear this error, we need to add one house­keep­ing line to our "stacker-lang.rkt", the expla­na­tion of which can wait till later:

1

Run "stacker-test.rkt" again. The good news — no errors. The bad news — no pro­gram result, either.

For­tu­nately this is also a sim­ple fix. The result of our pro­gram is what­ever value is sit­ting on top of the stack at the end. So we add one line to our stacker-module-begin, to display the first ele­ment of the stack after all the reader-lines have been eval­u­ated:

1
2
3
4
(define #'(stacker-module-begin reader-line ...)
#'(#%module-begin
   reader-line ...
   (display (first stack))))

Run "stacker-test.rkt" one more time, and you’ll get:

1
36

Con­grat­u­la­tions — you just made a pro­gram­ming lan­guage.