Custom range iterators in Go

Charles Francoise Jun 12, 2017

The problem

After a couple of weeks into Go, there's one thing I can say for sure: it's a fun language. In the current slew of "modern", systems-oriented languages, it's probably the easiest to just dive into and start banging away at your keyboard to put a couple of fresh ideas into shape.

Like most modern languages, Go has a construct to iterate over collections, instead of iterating on an index variable. Here's what iterating over an array looks like:

arr := []int{2,3,5,7,11,13,17,19}
for _, i := range arr {
    fmt.Printf("%d ", i)
}
fmt.Println()

Output:

2 3 5 7 11 13 17 19

The range keyword allows to operate over a number of types, yielding values for each iteration. Arrays, maps & strings can all be iterated over. The range iterator also applies to channels (which will be useful very soon). Iterating over channels pulls the next value from a channel, blocking if there is no value available, until the channel is closed.

Go also supports C-style iterations with an initializer, stop condition and step. This is actually the recommended way to iterate over a sequence of numbers.

for i := 0; i < 10; i++ {
    fmt.Printf("%d ", i)
}
fmt.Println()

I was quite disappointed when I learned this. This is 2017. I expect to write for variable in thing, and start iterating away in any new language I learn. For example, I'm a big fan of Python's range built-in function (and its sister xrange). It takes 1, 2 or 3 arguments and returns an object that can be iterated over. Here's Python's range translated to Go's for.

range(10) : for i := 0; i < 10; i++
range(10, 20) : for i := 10; i < 20; i++
range(10, 20, 2) : for i := 10; i < 20; i += 2

I wondered: was there a way to build such a simple and idiomatic construct in Go? (SPOILER: yes)

The solution

I set out to build my custom iterator in Go with no idea if this was possible. All I knew was that my type should respond to the range keyword. According to the Go language specification,

A "for" statement with a "range" clause iterates through all entries of an array, slice, string or map, or values received on a channel.

So range is just a keyword to build a "range clause" for a for iteration, and the number of types that it can iterate over is finite.

My first thought was to use an array or slice but I scrapped that idea because I didn't want the overhead of creating the entire range in memory. (Incidentally, this is what Python did for range, until Python 3). Plus, I was already thinking that I might want infinite iterable sequences later.

So the next type that popped to mind was the channel. Channels are great to push values into one by one, and block until the channel can be written to again. This sounded close enough to what I wanted, but there was a catch: I didn't want my iterator to block, I wanted it to "wait on the side" until I asked for another value. In other words, I wanted a coroutine...

Oh wait.

Go has this wonderful concept called goroutines, which is akin to a very lightweight thread. It's a minimal unit of concurrent programming and is tightly built into the language. To put it simply, goroutines are concurrency without parallelism. A goroutine will run concurrently to other goroutines, letting other goroutines run if it is blocked (e.g. by attempting to write on an unavailable channel).

It's not an accident that this name was chosen, and it turned out goroutines and channels were the way to Go.

Writing the code – 1.0

So how does this translate into code form?

First, let's realize that iterating over an integer sequence can be defined by 4 values:

start: the initial value of the iteration
stop: the maximum value of the iteration, non-inclusive
step: the increment added to the value each loop
current: the current value of the iteration

Notice the first 3 values are static and do not change during an iteration, while the 4th value is dynamic and only matters during the iteration.

Next, we know we want to give our range a channel, specifically an output int channel. So we'll be using a function that returns a <-chan int to encapsulate our state. And we'll be using a goroutine to make the iteration run concurrently and block when it has a value to yield.

func IntRange(start int, stop int, step int) <-chan int {
    ch := make(chan int)

    // Run iteration concurrently in a closure.
    // The start, stop and step are captured from the arguments and
    // the current value is kept in a variable local to the closure.
    go func() {
        for i := start; i < stop; i += step {
            ch <- i
        }
        close(ch)    // Don't forget to close to finish iteration
    }()

    return ch
}

The usage is very similar to Python:

func main() {
    for i := range IntRange(0, 10, 1) {
        fmt.Printf("%d ", i)
    }
    fmt.Println()

    for i := range IntRange(0, 20, 2) {
        fmt.Printf("%d ", i)
    }
    fmt.Println()
}

Output:

0 1 2 3 4 5 6 7 8 9
0 2 4 6 8 10 12 14 16 18

Great! In a few lines of Go, I created a custom iterator that responds to range and still looks quite idiomatic (although I'd love to find another name that doesn't repeat the word "range").

Writing the code – 2.0

A few things were bugging me though. First, my version only covered the 3-argument version of Python's range, making it more cumbersome to use. Second, I couldn't use this with negative steps, such as in range(20, 10, -1), equivalent to for i := 20; i > 10; i -= 1.

The second problem can easily be solved by switching the condition if the step is <= 0.

For the first problem, I had to dig a bit deeper in the language. Go does not support default parameters for functions, and does not support function overloading (at least not from argument count). The only mechanism I could find was variadic functions, but it turned out pretty well.

func IntRange(args ...int) <-chan int {
    var start, stop, step int
    switch len(args) {
    case 1:            // 1 argument: stop
        start = 0
        stop = args[0]
        step = 1
    case 2:            // 2 arguments: start, stop
        start = args[0]
        stop = args[1]
        step = 1
    case 3:            // 3 arguments: start, stop, step
        start = args[0]
        stop = args[1]
        step = args[2]
    default:           // invalid argument count
        panic("IntRange takes 1 to 3 arguments.")
    }
    ch := make(chan int)
    if step >= 0 {
        // increment case
        go func() {
            for i := start; i < stop; i += step {
                ch <- i
            }
            close(ch)
        }()
    } else {
        // decrement case
        go func() {
            for i := start; i > stop; i += step {
                ch <- i
            }
            close(ch)
        }()
    }
    return ch
}

Example usage:

func main() {
    for i := range IntRange(10) {
        fmt.Printf("%d ", i)
    }
    fmt.Println()

    for i := range IntRange(10, 20) {
        fmt.Printf("%d ", i)
    }
    fmt.Println()

    for i := range IntRange(20, 0, -1) {
        fmt.Printf("%d ", i)
    }
    fmt.Println()
}

Output:

0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19
20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

Conclusion

So what have we made? In a few lines of Go, we've built a modern iterator, that fits into the language's idioms, with minimal memory or performance overhead.

This pattern – a function returning a channel + a goroutine closure – can be used to build other types of range iterators. For example, iterating over lines of input text, iterating over Fibonacci numbers, etc.

There are a few issues, however. The number of arguments to the variadic function can only be checked at runtime, even though we know it should be between 1 and 3 at compile-time. Also, even though the function is a first-order value in Go, and perfectly able to hold state as it does here, I'd love to have a better encapsulation, perhaps returning a more complex object from the function that could be re-initialized or dynamically changed. Some form of iterator-generator pattern. But maybe that's just my OOP-educated brain fighting against other paradigms.

I should also probably add some tests and check for edge cases but eh...

What do you think? Can you see an issue in this design? What kind of structures would you build using this pattern? Feel free to leave a comment!

Charles Francoise

Go is a new language. Although it borrows ideas from existing languages, it has unusual properties that make effective Go programs different in character from programs written in its relatives. A straightforward translation of a C++ or Java program into Go is unlikely to produce a satisfactory result—Java programs are written in Java, not Go. On the other hand, thinking about the problem from a Go perspective could produce a successful but quite different program. In other words, to write Go well, it's important to understand its properties and idioms. It's also important to know the established conventions for programming in Go, such as naming, formatting, program construction, and so on, so that programs you write will be easy for other Go programmers to understand.

Brian Douglas

Link e52c429ee7fe82bfec2d6542cc7b4a04b7d9f886827998833080ac23fcd2f169

Interesting article I liked it.

Do you think you should apply the concepts of other languages to Go?

I ask as the very first paragraph in "Effective Go" is the following.

golang.org/doc/effective_go.html#i...

UPDATE: I might not have made it clear that this is just a proof-of-concept. I wanted to see if there was a clean construct that enabled iterating using range with custom behavior. I never intended to iterate over basic number sequences using this. But there might be more complex structures that need to encapsulate an iterator, and I wanted to reuse the idiomatic range iteration (instead of a Next function, or a dedicated iterator type).

This comment on Reddit gives a bit of historical context to this idea. There are performance and other issues related to this pattern, that pretty much rule it out. But I had fun learning Go by trying to make new stuff.

I guess I'm just a bit frustrated that I can't "write Go in Go". Other modern languages I've been playing with – Rust & Swift – have this idea that the standard library can be written using the language itself. And that there is no such thing as "primitive" types that would behave differently than other types. I like that, but I understand and respect Go's choice to keep the language opinionated, all the way down to the syntax.

nihar

There are a few ways you can do it, but the common theme between them is that you want to somehow transform your data into a type that Go is capable of ranging over.
Learn GO here: hackr.io/tutorials/learn-go

Approach 1: Slices

Since you mentioned that you have a slice internally, this may be easiest for your use case. The idea is simple: your type should have an Iterate() method (or similar) whose return value is a slice of the appropriate type. When called, a new slice is created containing all of the elements of the data structure in whatever order you'd like them to be iterated over. So, for example:

func (m *MyType) Iterate() []MyElementType { ... }

mm := NewMyType()
for i, v := range mm.Iterate() {
...
}
There are a few concerns here. First, allocation - unless you want to expose references to internal data (which, in general, you probably don't), you have to make a new slice and copy all of the elements over. From a big-O standpoint, this isn't that bad (you're doing a linear amount of work iterating over everything anyway), but for practical purposes, it may matter.

Additionally, this doesn't handle iterating over mutating data. This is probably not an issue most of the time, but if you really want to support concurrent updates and certain types of iteration semantics, you might care.

Approach 2: Channels

Channels are also something that can be ranged over in Go. The idea is to have your Iterate() method spawn a goroutine that will iterate over the elements in your data structure, and write them to a channel. Then, when the iteration is done, the channel can be closed, which will cause the loop to finish. For example:

func (m *MyType) Iterate() <-chan MyElementType {
c := make(chan MyElementType)
go func() {
for _, v := range m.elements {
c <- v
}
close(c)
}()
return c
}

mm := NewMyType()
for v := range mm.Iterate() {
...
}
There are two advantages of this method over the slice method: first, you don't have to allocate a linear amount of memory (although you may want to make your channel have a bit of a buffer for performance reasons), and second, you can have your iterator play nicely with concurrent updates if you're into that sort of thing.

The big downside of this approach is that, if you're not careful, you can leak goroutines. The only way around this is to make your channel have a buffer deep enough to hold all of the elements in your data structure so that the goroutine can fill it and then return even if no elements are read from the channel (and the channel can then later be garbage collected). The problem here is that, a) you're now back to linear allocation and, b) you have to know up-front how many elements you're going to write, which sort of puts a stop to the whole concurrent-updates thing.

The moral of the story is that channels are cute for iterating, but you probably don't want to actually use them.

Approach 3: Internal Iterators

Credit to hobbs for getting to this before me, but I'll cover it here for completeness (and because I want to say a bit more about it).

The idea here is to create an iterator object of sorts (or to just have your object only support one iterator at a time, and iterate on it directly), just like you would in languages that support this more directly. What you do, then, is call a Next() method which, a) advances the iterator to the next element and, b) returns a boolean indicating whether or not there's anything left. Then you need a separate Get() method to actually get the value of the current element. The usage of this doesn't actually use the range keyword, but it looks pretty natural nonetheless:

mm := MyNewType()
for mm.Next() {
v := mm.Get()
...
}
There are a few advantages of this technique over the previous two. First, it doesn't involve allocating memory up-front. Second, it supports errors very naturally. While it's not really an iterator, this is exactly what bufio.Scanner does. Basically the idea is to have an Error() method which you call after iteration is complete to see whether iteration terminated because it was done, or because an error was encountered midway through. For purely in-memory data structures this may not matter, but for ones that involve IO (e.g., walking a filesystem tree, iterating over database query results, etc), it's really nice. So, to complete the code snippet above:

mm := MyNewType()
for mm.Next() {
v := mm.Get()
...
}
if err := mm.Error(); err != nil {
...
}
Conclusion

Go doesn't support ranging over arbitrary data structures - or custom iterators - but you can hack it. If you have to do this in production code, the third approach is 100% the way to go, as it is both the cleanest and the least of a hack (after all, the standard library includes this pattern).

the3noname

Your concept is interesting, but there are two problems with your solution.

The function takes a integer range. This function would panic at runtime if used wrong. There are no compilation checks. Also the function handles each parameter differently, despite the fact that its one integer range. I don't think that's idiomatic.
I think the classic
```
for i := 0; i < 10; i++
```
is actually easier to read. Nearly every dev will understand this at first glance (even if it's not as clean as other solutions). On the other hand
```
for i := range IntRange(10, 20, 2)
```
is ambiguous for me. And it doesn't help if I look at the function signature.

I think I'm sticking with the classic solution.

Herbert Fischer

Python's range() just returns a list, that can then be iterated. If you do a version of IntRange() that just returns a slice, you will have something comparable. I would even rename it to something more appropriate, like IntSequence or IntSlice, because that is what the function would be doing.

What you did with channels here is closer to what Python's xrange() is. Given that it is not safe to use channels for this use-case, I would suggest any another approach, without using the range keyword. I believe using the 3-clause for loop is the best replacement for xrange()