Mutable strings in Golang
Golang strings are immutable. In general, immutable data is simpler to reason about, but it also means your program must allocate more memory to “change” that data. Sometimes, your program can’t afford that luxury. For example, there might not be any more memory to allocate. Another reason: you don’t want to create more work for the garbage collector.
In C, a string is a null-terminated sequence of chars, a char*. Each char is a single byte, and the string keeps going until there’s a '\0' character. If you pointed at an arbitrary memory location and called it a C string, you’d see every byte in order until you hit a zero.
In Go, string is its own data type. At its core, it’s still a sequence of bytes, but:
- It’s a fixed length. It doesn’t just continue on until a zero appears.
- It comes with extra information: its length.
- “Characters” or runes may span multiple bytes.
- It’s immutable.
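To make the byte/rune distinction from the list above concrete, here is a minimal sketch. The helper name byteAndRuneCount is made up for illustration; len and utf8.RuneCountInString are standard.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount (hypothetical helper) returns the number of bytes
// and the number of runes in s.
func byteAndRuneCount(s string) (nBytes, nRunes int) {
	// len counts bytes; utf8.RuneCountInString counts UTF-8 encoded runes.
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"hello", "héllo", "日本語"} {
		b, r := byteAndRuneCount(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
}
```

For pure ASCII text the two counts agree; as soon as a string contains non-ASCII runes, the byte count is larger.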
So string in Go carries some additional structure compared to char* in C. How does it do this? It’s actually a struct:
type StringHeader struct {
	Data unsafe.Pointer
	Len  int
}
Data here is analogous to the C string, and Len is the length. Go lays out struct fields in declaration order, so if you were to look at a string under the microscope, you’d see a pointer to the string’s contents first and then Len. (You can find documentation of similar header structs, reflect.StringHeader and reflect.SliceHeader, in the reflect package; there the Data field is a uintptr rather than an unsafe.Pointer.)
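A quick way to check how the header is laid out is to ask the compiler for the field offsets. The sketch below assumes the StringHeader definition from this post; headerOffsets is a made-up helper name.

```go
package main

import (
	"fmt"
	"unsafe"
)

// StringHeader mirrors the struct defined in this post.
type StringHeader struct {
	Data unsafe.Pointer
	Len  int
}

// headerOffsets (hypothetical helper) reports the byte offset of each
// field and the total size of the header.
func headerOffsets() (dataOff, lenOff, size uintptr) {
	var h StringHeader
	return unsafe.Offsetof(h.Data), unsafe.Offsetof(h.Len), unsafe.Sizeof(h)
}

func main() {
	dataOff, lenOff, size := headerOffsets()
	fmt.Println("Data offset:", dataOff)
	fmt.Println("Len offset: ", lenOff)
	fmt.Println("header size:", size)

	// A string value should occupy exactly as much space as the header.
	var s string
	fmt.Println("same size as string:", unsafe.Sizeof(s) == size)
}
```

Data sits at offset 0 and Len follows one pointer-width later, so the exact numbers depend on whether the platform is 32-bit or 64-bit.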
Before we start inspecting strings by looking at their StringHeader fields, how do we cast a string to a StringHeader in the first place? When you really need to reinterpret the memory of one Go type as another, use the unsafe package:
import (
	"unsafe"
)
s := "hello"
header := (*StringHeader)(unsafe.Pointer(&s))
unsafe.Pointer is an untyped pointer. It can point to any kind of value. It’s a way to tell the compiler, “Step aside. I know what I’m doing.” In this case, what we’re doing is converting a *string into an unsafe.Pointer and then into a *StringHeader.
Now we have access to the underlying representation of the string. Ever wondered how len("hello") works? We can implement it ourselves:
func strLen(s string) int {
	// Reinterpret the string's memory as a StringHeader and read its Len field.
	header := (*StringHeader)(unsafe.Pointer(&s))
	return header.Len
}
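To try strLen as a complete program, something like the following should work. It is only a sketch that repeats the StringHeader definition from above and compares the result against the built-in len.

```go
package main

import (
	"fmt"
	"unsafe"
)

// StringHeader mirrors the struct defined in this post.
type StringHeader struct {
	Data unsafe.Pointer
	Len  int
}

// strLen reads the length straight out of the string's header.
func strLen(s string) int {
	header := (*StringHeader)(unsafe.Pointer(&s))
	return header.Len
}

func main() {
	// Each line should report that strLen agrees with the built-in len.
	for _, s := range []string{"", "hello", "héllo"} {
		fmt.Printf("%q: strLen=%d len=%d\n", s, strLen(s), len(s))
	}
}
```

Note that, like len, this counts bytes rather than runes.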
Getting the length of a string is nice, but what about setting it? Here’s what happens if we artificially extend the length of a string:
s := "hello"
header := (*StringHeader)(unsafe.Pointer(&s))
header.Len = 100
// cast the header back to 'string' and print it
fmt.Print(*(*string)(unsafe.Pointer(header)))
/* on stdout:
helloint16int32int64panicslicestartuint8write (MB)
Value addr= code= ctxt: curg= list= m->p= p->m=
*/
By changing the Len field of the string header, we can expand the string to include other parts of memory. It’s interesting to observe this behavior, but it’s not something you’d actually want to use.
Data :: unsafe.Pointer
You may have noticed that StringHeader has an unsafe.Pointer field which points to the string’s sequence of bytes. []byte also has a sequence of bytes. In fact, we can build a []byte from this pointer. Here’s what a slice actually looks like:
type SliceHeader struct {
	Data unsafe.Pointer
	Len  int
	Cap  int
}
It’s a lot like StringHeader, except it also has a Cap (capacity) field. What happens if we build a SliceHeader from the fields of a StringHeader?
func strToBytes(s string) []byte {
	header := (*StringHeader)(unsafe.Pointer(&s))
	bytesHeader := &SliceHeader{
		Data: header.Data,
		Len:  header.Len,
		Cap:  header.Len,
	}
	return *(*[]byte)(unsafe.Pointer(bytesHeader))
}
fmt.Print(strToBytes("hello")) // [104 101 108 108 111]
We’ve converted a string into a []byte. It’s just as easy to go the other direction:
func bytesToStr(b []byte) string {
	header := (*SliceHeader)(unsafe.Pointer(&b))
	strHeader := &StringHeader{
		Data: header.Data,
		Len:  header.Len,
	}
	return *(*string)(unsafe.Pointer(strHeader))
}
fmt.Print(bytesToStr([]byte{104, 101, 108, 108, 111})) // "hello"
Both string and []byte headers are using the same Data pointer, so they share memory. If you ever need to convert between string and []byte but there isn’t enough memory to perform a copy, this might be useful.
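For comparison, newer Go releases (1.20 and later) ship helpers in the unsafe package that do the same zero-copy conversions without hand-written header structs: unsafe.StringData, unsafe.SliceData, unsafe.String, and unsafe.Slice. The sketch below is an alternative to the header approach in this post, not a description of it, and the function names stringToBytes and bytesToString are made up for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringToBytes returns a []byte that shares memory with s (no copy).
// The result must be treated as read-only.
func stringToBytes(s string) []byte {
	if len(s) == 0 {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// bytesToString returns a string that shares memory with b (no copy).
// Later writes to b will be visible through the returned string.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	fmt.Println(stringToBytes("hello"))                         // [104 101 108 108 111]
	fmt.Println(bytesToString([]byte{104, 101, 108, 108, 111})) // hello
}
```

The same caveats about shared memory apply: these helpers skip the copy, they do not make the result safe to mutate.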
A word of caution, however: string is meant to be immutable, but []byte is not. If you cast a string to []byte and try to modify the bytes, the program crashes with a segmentation fault, because the contents of a string literal are typically placed in read-only memory.
s := "hello"
b := strToBytes(s)
b[0] = 100
// panic: runtime error: invalid memory address or nil pointer dereference
// [signal SIGSEGV: segmentation violation code=0xffffffff addr=0x0 pc=0xd56a2]
Casting in the other direction doesn’t cause a segmentation fault, but then your supposedly immutable string can change:
b := []byte{104, 101, 108, 108, 111}
s := bytesToStr(b)
fmt.Print(s) // "hello"
b[0] = 100
fmt.Print(s) // "dello"
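To see the contrast with the ordinary conversion, the sketch below runs the same mutation against both a shared string (built with bytesToStr from this post) and a copied one (built with the built-in string(b) conversion). The helper mutateAndCompare is a made-up name for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Header structs as defined in this post.
type StringHeader struct {
	Data unsafe.Pointer
	Len  int
}

type SliceHeader struct {
	Data unsafe.Pointer
	Len  int
	Cap  int
}

// bytesToStr builds a string that shares b's backing array.
func bytesToStr(b []byte) string {
	header := (*SliceHeader)(unsafe.Pointer(&b))
	strHeader := &StringHeader{Data: header.Data, Len: header.Len}
	return *(*string)(unsafe.Pointer(strHeader))
}

// mutateAndCompare (hypothetical helper) mutates a byte slice after
// converting it two ways and reports what each string looks like afterwards.
func mutateAndCompare() (sharedChanged, copyUnchanged bool) {
	b := []byte("hello")
	shared := bytesToStr(b) // aliases b's memory
	copied := string(b)     // built-in conversion copies the bytes
	b[0] = 'd'
	return shared == "dello", copied == "hello"
}

func main() {
	sharedChanged, copyUnchanged := mutateAndCompare()
	fmt.Println("shared string changed:", sharedChanged)
	fmt.Println("copied string unchanged:", copyUnchanged)
}
```

The built-in conversion pays for an allocation and a copy, and in exchange the resulting string really is immutable.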
Try it out
That’s a little introduction to what you can do with unsafe.Pointer and some knowledge of the underlying representation of Go types. If you’d like to play around with the code from this post (and a substr implementation), have a look at the online Go Playground here: play.golang.org/p/PAjwbct_ohF