
Conversation

mitchellh (Contributor) commented Jan 10, 2026

This finally resolves #9962 (and the myriad of dupes).

The core issue was that when a non-standard-size page was reused as part of our scrollback pruning, it was resized back to the standard size. That caused our future frees to believe it was pooled memory, so they never called munmap on it.

The solution I chose was to never reuse non-standard-sized pages. If we detect a non-standard page during scrollback pruning, we destroy it and re-alloc. This frees the old memory and uses pool memory for the new page.

As part of this, I also introduced a custom page allocator that uses macOS's Mach kernel virtual memory tagging feature to specifically tag our PageList memory. Using this in conjunction with Instruments and footprint, I verified that our PageList memory was previously not freed and is now successfully freed.
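For the curious, the tagging mechanism itself is small: macOS lets an anonymous mmap carry an allocation tag in the otherwise-unused fd argument. Here is a minimal Zig sketch of the idea; the helper names are mine, the std.posix signatures assume a recent Zig, and this is not Ghostty's actual allocator:

const std = @import("std");
const posix = std.posix;

// VM_MAKE_TAG(tag) == tag << 24 (see <mach/vm_statistics.h>). Tags 240-255
// (VM_MEMORY_APPLICATION_SPECIFIC_1..16) are reserved for applications, and
// tools like vmmap, footprint, and Instruments group regions by this tag.
fn vmMakeTag(tag: u8) posix.fd_t {
    return @bitCast(@as(u32, tag) << 24);
}

// Hypothetical helper: allocate an anonymous region that shows up under
// our own tag in the kernel's VM accounting.
fn mapTaggedPage(len: usize) ![]align(std.heap.page_size_min) u8 {
    return posix.mmap(
        null,
        len,
        posix.PROT.READ | posix.PROT.WRITE,
        .{ .TYPE = .PRIVATE, .ANONYMOUS = true },
        vmMakeTag(240), // VM_MEMORY_APPLICATION_SPECIFIC_1
        0,
    );
}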

No AI was used in my work here. AI was used by others in their analysis of this issue that I used the results of to help guide me and give me other things to consider, but the ultimate understanding and fix was all done via my own meat sticks.

Detailed Explainer

Ghostty uses a memory pool of fixed-size mmap-ed pages to serve as the backing memory for our terminal. If the terminal requires a non-standard (larger) amount of memory due to an abundance of emoji, styles, hyperlinks, etc., then we allocate non-pooled pages directly with mmap. When freeing a page, if it is <= the standard size we just put it back into the memory pool; if it is larger, we munmap it.

This is what defines a standard page, and therefore a non-standard one.
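A simplified sketch of that free path, with hypothetical names (the real logic lives in Ghostty's PageList):

// std_size is the fixed size of pooled pages; memory backs a single page.
fn freePage(self: *PageList, memory: []align(std.heap.page_size_min) u8) void {
    if (memory.len <= std_size) {
        // Standard size: the buffer came from the pool, so recycle it.
        self.pool.give(memory);
    } else {
        // Larger than standard: mmap-ed directly, so it must be unmapped.
        std.posix.munmap(memory);
    }
}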

Ghostty also has the concept of a "scrollback limit" (exposed to the user via the scrollback-limit config). This caps the size of our scrollback, or history. When we reach this limit, we use a trick to avoid allocation: we reuse the oldest page in the history.

Unfortunately, as part of this process, we were resizing the underlying memory to be standard size again. This caused our future frees to believe the page was pooled memory, so they never unmapped it. This was the main source of the leak.
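The fix is an early-out at the top of the prune path (this snippet is from the merged change, also quoted in a comment below): a page whose backing memory is larger than standard is destroyed outright, and a fresh pooled page is allocated rather than reusing the old one.

if (first.data.memory.len > std_size) {
    // Node is already removed so we can just destroy it.
    self.destroyNode(first);
    break :prune;
}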

For SEO

For searchability, I will note that this particular case was often caused by:

  • Claude Code (not their fault; something about the format of its output often triggered non-standard pages)
  • Emoji-heavy output
  • Hyperlink-heavy output (like exa or eza)

Thanks

Big shout out to @grishy for being the person who finally got me a reproduction so I could analyze the issue for myself. His own analysis reached the same conclusion as mine, and thanks to the reproduction I was able to verify both of our understandings independently.

@mitchellh mitchellh added this to the 1.3.0 milestone Jan 10, 2026
@mitchellh mitchellh requested a review from a team as a code owner January 10, 2026 15:05

@mitchellh mitchellh merged commit 17da138 into main Jan 10, 2026
112 checks passed
@mitchellh mitchellh deleted the page-leak branch January 10, 2026 15:54
@chrislloyd

Great minds, you beat us by 45 mins! We literally just got some debug logs and Claude came up with a similar fix. Thank you!

kevincorvallis added a commit to kevincorvallis/ghostty that referenced this pull request Jan 11, 2026
This fixes a memory leak where standard-sized pages (592K) accumulate
in the memory pool without being freed during adjustCapacity operations.

PR ghostty-org#10251 fixed non-standard page leaks during scrollback pruning, but
a similar leak existed in the adjustCapacity() path. When a standard
page is expanded to non-standard size (e.g., when doubling grapheme_bytes
from 8192 to 16384), the old standard page was returned to the pool via
pool.pages.destroy(). However, the pool uses an ArenaAllocator that never
frees individual items, causing these pages to accumulate indefinitely.

This is particularly common with emoji-heavy and hyperlink-rich output
(like Claude Code), where adjustCapacity() is called frequently to expand
page capacity. Over time, this results in hundreds of thousands of leaked
592K VM_ALLOCATE regions.

The fix: when replacing a standard page with a non-standard page during
adjustCapacity(), munmap the old page directly instead of returning it
to the pool. This follows the same pattern as PR ghostty-org#10251 but applies to
the adjustCapacity() code path.
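A sketch of what this commit message describes, with hypothetical names, paraphrasing the message above rather than any verified diff:

// During adjustCapacity(), the old standard page is being replaced. Per the
// commit message: if the replacement is non-standard, unmap the old page
// directly instead of handing it to the arena-backed pool, which never
// frees individual items.
if (new_node.data.memory.len > std_size) {
    std.posix.munmap(old_node.data.memory);
} else {
    self.pool.pages.destroy(old_node); // argument shape illustrative
}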

grishy commented Jan 11, 2026

@mitchellh, can you recheck my assumption here?
Today I had a crash after these changes (a release build from tip), related to terminal.Screen.cursorScrollAbove, which I got from a minidump file.

Flow: PageList.grow() → prune → non-standard page.
Here we destroy first and exit, so tracked_pins are never updated.

// ...
if (first.data.memory.len > std_size) {
    self.destroyNode(first);
    break :prune; // exits here, before the tracked_pins remap
}

/// Later in the code...
const pin_keys = self.tracked_pins.keys();
for (pin_keys) |p| {
    if (p.node != first) continue;
    p.node = self.pages.first.?;
    p.y = 0;
    p.x = 0;
    p.garbage = true;
}
// ...

The actual code, for reference:

if (first.data.memory.len > std_size) {
    // Node is already removed so we can just destroy it.
    self.destroyNode(first);
    break :prune;
}

// Reset our memory
const buf = first.data.memory;
@memset(buf, 0);
assert(buf.len <= std_size);

// Initialize our new page and reinsert it as the last
first.data = .initBuf(.init(buf), Page.layout(cap));
first.data.size.rows = 1;
self.pages.insertAfter(last, first);
self.total_rows += 1;

// We also need to reset the serial number. Since this is the only
// place we ever reuse a serial number, we also can safely set
// page_serial_min to be one more than the old serial because we
// only ever prune the oldest pages.
self.page_serial_min = first.serial + 1;
first.serial = self.page_serial;
self.page_serial += 1;

// Update any tracked pins that point to this page to point to the
// new first page to the top-left.
const pin_keys = self.tracked_pins.keys();
for (pin_keys) |p| {
    if (p.node != first) continue;
    p.node = self.pages.first.?;
    p.y = 0;
    p.x = 0;
    p.garbage = true;
}
self.viewport_pin.garbage = false;
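If that analysis holds, one possible shape of a fix is to remap tracked pins off the doomed node before bailing out of the prune, mirroring the loop the reuse path already runs (purely hypothetical; the actual fix landed in the follow-up commit referenced below):

if (first.data.memory.len > std_size) {
    const pin_keys = self.tracked_pins.keys();
    for (pin_keys) |p| {
        if (p.node != first) continue;
        p.node = self.pages.first.?;
        p.y = 0;
        p.x = 0;
        p.garbage = true;
    }
    self.destroyNode(first);
    break :prune;
}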

mitchellh (Contributor, Author) replied:

That does look like a bug at first glance!

mitchellh added a commit that referenced this pull request Jan 11, 2026

Fixes a regression from #10251

Thanks to @grishy again for finding this. Updated tests to catch it,
too.
mitchellh added a commit that referenced this pull request Jan 11, 2026

…10285)

Fixes a regression from #10251

Thanks to @grishy again for finding this. Updated tests to catch it,
too.

No AI used, wrote this with my meat sticks.

grishy commented Jan 11, 2026

@mitchellh
And since we started with pins: resizeWithoutReflowGrowCols. I think this was my case today, with the crash on resize 😄

if (prev) |prev_node| prev: {
    // ...
    for (dst_rows, src_rows) |*dst_row, *src_row| {
        prev_page.cloneRowFrom(...); // copies rows from page into prev_page
    }
    // ...but no pins pointing at page are updated here
}

// Remove and deallocate the old page.
self.pages.remove(chunk.node);
self.destroyNode(chunk.node); // pins into this node still exist, I suppose
// ...and now those pins dangle

if (prev) |prev_node| prev: {
    const prev_page = &prev_node.data;

    // We only want scenarios where we have excess capacity.
    if (prev_page.size.rows >= prev_page.capacity.rows) break :prev;

    // We can copy as much as we can to fill the capacity or our
    // current page size.
    const len = @min(
        prev_page.capacity.rows - prev_page.size.rows,
        page.size.rows,
    );

    const src_rows = page.rows.ptr(page.memory)[0..len];
    const dst_rows = prev_page.rows.ptr(prev_page.memory)[prev_page.size.rows..];
    for (dst_rows, src_rows) |*dst_row, *src_row| {
        prev_page.size.rows += 1;
        copied += 1;
        prev_page.cloneRowFrom(
            page,
            dst_row,
            src_row,
        ) catch {
            // If an error happens, we undo our row copy and break out
            // into creating a new page.
            prev_page.size.rows -= 1;
            copied -= 1;
            break :prev;
        };
    }

    assert(copied == len);
    assert(prev_page.size.rows <= prev_page.capacity.rows);
}

Update: added a link and formatting.

mitchellh (Contributor, Author) replied:

I think I got this in the linked PR (see GH), or did I miss something?
