supercompact

Harder better faster stronger compacting for your AI agent.

supercompact is a conversation compaction tool for AI coding agents. It takes Claude Code session transcripts (JSONL) and compacts them to fit within a token budget while preserving the entities and context the agent needs to keep working — file paths, error messages, function names, commands, URLs, and more.

Unlike Claude Code's built-in /compact (which summarizes via LLM), supercompact uses score-and-select: it scores every assistant turn by relevance, then greedily selects the highest-scoring turns that fit within the budget. The turns it keeps are preserved verbatim; nothing that survives compaction is paraphrased.

Install

Requires Python 3.11+ and uv.

Claude Code

git clone https://github.com/heiervang-technologies/supercompact.git
cd supercompact
./plugins/claude-code/install.sh

Then start Claude Code with the plugin:

claude --plugin-dir ~/.local/share/supercompact/claude-code/plugin

Both /compact and auto-compact now use supercompact's EITF method (described under Compaction Methods) automatically.

OpenAI Codex CLI

git clone https://github.com/heiervang-technologies/supercompact.git
cd supercompact
./plugins/codex-cli/install.sh

Then compact on demand or run the background daemon:

codex-compact                  # compact latest session
codex-compact-watch            # auto-intercept compaction

Uninstall

./plugins/claude-code/uninstall.sh   # Claude Code
./plugins/codex-cli/uninstall.sh     # Codex CLI

Why?

When a coding agent session gets long, context compaction becomes critical. The built-in /compact in both Claude Code and Codex CLI uses LLM summarization, which:

Is slow (~30s+ per compaction)
Loses exact technical details (file paths get paraphrased, error messages get summarized)
Costs API tokens for the summarization call itself

supercompact's EITF method runs in <1 second on any hardware (no GPU, no API calls) and preserves ~2x more entities than LLM summarization at the same token budget.

Standalone Usage

If you want to use supercompact as a CLI tool without installing the plugins:

git clone https://github.com/heiervang-technologies/supercompact.git
cd supercompact
uv sync

For the embed method (local PyTorch scorer):

uv sync --extra torch

The llama-embed and llama-rerank methods require a running llama.cpp server (see Embedding & Reranking Methods).

# Compact a conversation to 80k tokens using EITF (default)
uv run python compact.py session.jsonl --method eitf --budget 80000 --output compacted.jsonl

# Try different methods
uv run python compact.py session.jsonl --method setcover --budget 60000 --output compacted.jsonl
uv run python compact.py session.jsonl --method dedup --output compacted.jsonl

# Evaluate methods against each other
uv run python compact.py evaluate session.jsonl --method all --budget 100000 --eval-output results.json

# Generate Pareto plots from evaluation results
uv run python compact.py plot results.json -o pareto.png

Compaction Methods

All methods follow the same pipeline: score each long system (assistant) turn, then select turns greedily by score until the token budget is filled. User turns and short system turns (<=300 tokens) are always kept.
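
The score-and-select step described above can be sketched roughly as follows. This is a minimal illustration and not the code in lib/selector.py: the `Turn` dataclass and the `select_turns` function are hypothetical names, and the scores are assumed to have been computed already by one of the methods below.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    index: int          # position in the conversation
    role: str           # "user" or "system"
    tokens: int         # token count of the turn
    score: float = 0.0  # relevance score (only meaningful for long system turns)

def select_turns(turns, budget, short_threshold=300):
    """Keep user turns and short system turns, then fill the budget greedily by score."""
    kept = [t for t in turns if t.role == "user" or t.tokens <= short_threshold]
    used = sum(t.tokens for t in kept)
    kept_ids = {t.index for t in kept}

    # Remaining long system turns, highest score first.
    candidates = sorted(
        (t for t in turns if t.index not in kept_ids),
        key=lambda t: t.score,
        reverse=True,
    )
    for turn in candidates:
        if used + turn.tokens <= budget:
            kept.append(turn)
            used += turn.tokens

    # Restore conversation order so the output reads like the original.
    return sorted(kept, key=lambda t: t.index)
```

A turn that does not fit is skipped rather than ending the loop, so a smaller, lower-scoring turn can still use the remaining budget.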

Local Methods (no model, instant)

Method	How it scores	Best for
eitf	Entity-frequency Inverse Turn Frequency. Extracts structured entities (file paths, errors, functions, etc.) and scores turns by weighted entity importance × rarity. BM25-style length normalization.	General use. Fast, good entity preservation.
setcover	EITF + exclusivity bonus. Entities that appear in only 1-2 system turns get a 20% score boost, since dropping that turn loses them entirely.	Slightly better coverage than EITF at tight budgets.
dedup	Suffix automaton deduplication. Builds an O(n) suffix automaton over the full conversation, scores turns by unique content ratio. Turns with mostly-repeated content score low.	Removing redundant tool output, repeated errors.
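
To make the EITF description in the table more concrete, here is a rough sketch of scoring by entity weight × rarity with BM25-style length normalization. It is an approximation under stated assumptions, not the code in lib/eitf.py: the three regex patterns are a hypothetical subset, the `k1` and `b` constants are the conventional BM25 defaults rather than values taken from this project, and turn length is approximated by whitespace-separated words.

```python
import math
import re
from collections import Counter

# Hypothetical, heavily reduced pattern set. The weights follow the
# Entity Coverage table in this README.
PATTERNS = {
    "file_path": (re.compile(r"(?:/[\w.-]+)+\.\w+"), 1.0),
    "error": (re.compile(r"\b\w+Error\b"), 1.0),
    "function": (re.compile(r"\b\w+\(\)"), 0.5),
}

def extract_entities(text):
    """Count occurrences of each (entity_type, entity_text) pair in a turn."""
    found = Counter()
    for etype, (pattern, _weight) in PATTERNS.items():
        for match in pattern.findall(text):
            found[(etype, match)] += 1
    return found

def eitf_scores(turn_texts, k1=1.2, b=0.75):
    """Score each turn by weighted entity frequency times inverse turn frequency."""
    if not turn_texts:
        return []
    per_turn = [extract_entities(text) for text in turn_texts]
    lengths = [max(len(text.split()), 1) for text in turn_texts]
    avg_len = sum(lengths) / len(lengths)
    n_turns = len(turn_texts)

    # In how many turns does each entity appear?
    turn_freq = Counter()
    for entities in per_turn:
        turn_freq.update(entities.keys())

    scores = []
    for entities, length in zip(per_turn, lengths):
        norm = k1 * (1 - b + b * length / avg_len)  # BM25 length normalization
        score = 0.0
        for (etype, text), freq in entities.items():
            weight = PATTERNS[etype][1]
            rarity = math.log(1 + n_turns / turn_freq[(etype, text)])
            score += weight * rarity * freq * (k1 + 1) / (freq + norm)
        scores.append(score)

    top = max(scores)
    return [s / top if top > 0 else 0.0 for s in scores]  # scale into 0-1
```

In this sketch, a turn that holds the only mention of an error name outranks a turn that repeats a file path seen elsewhere, which is the behavior the rarity term is meant to produce.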

Embedding & Reranking Methods (need llama.cpp server)

Method	How it scores	Requirements
llama-embed	Qwen3-Embedding-0.6B cosine similarity. Embeds a query (recent user messages) and all system turns, ranks by similarity.	`llama-server` on port 8080 with Qwen3-Embedding-0.6B GGUF
llama-rerank	Qwen3-Reranker-0.6B cross-encoder. Sends query + document pairs to the reranker server for direct relevance scoring.	`llama-server` on port 8181 with `ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF`
embed	Same as llama-embed but runs Qwen3-Embedding-0.6B locally via PyTorch.	`torch` extra installed, GPU recommended
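
The similarity ranking used by the embedding methods can be illustrated with plain Python. This sketch only shows the cosine-similarity ranking step; it assumes the embedding vectors have already been obtained (from the llama.cpp server or the PyTorch model), and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, turn_vecs):
    """Return (turn indices ordered most-similar first, per-turn similarities)."""
    sims = [cosine(query_vec, vec) for vec in turn_vecs]
    order = sorted(range(len(turn_vecs)), key=lambda i: sims[i], reverse=True)
    return order, sims
```

Real embeddings have hundreds of dimensions, but the ranking logic is the same as for the two-dimensional toy vectors one would use to try this out.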

LLM Baseline

Method	How it works	Requirements
claude-code	Sends the conversation to Claude for LLM summarization. Simulates `/compact` behavior. Used as a baseline in evaluations.	Authenticated Claude Code installation

CLI Reference

supercompact has three subcommands: compact (default), evaluate, and plot.

compact

Score and select turns to fit a token budget.

uv run python compact.py [compact] FILE.jsonl [options]

Option	Default	Description
`--method`	`eitf`	Scoring method: `dedup`, `eitf`, `setcover`, `embed`, `llama-embed`, `llama-rerank`
`--budget`	`80000`	Target token budget
`--output`	—	Write compacted JSONL to this path
`--format`	`jsonl`	Output format: `jsonl` or `summary` (text for Claude context)
`--short-threshold`	`300`	System turns <= this token count are always kept
`--min-repeat-len`	`64`	Minimum repeated substring length for dedup scoring
`--scores-file`	—	Write per-turn scores to CSV
`--dry-run`	—	Use random scores (for testing the pipeline)
`--verbose`	—	Show detailed breakdown
`--device`	`cpu`	PyTorch device for `embed` method
`--batch-size`	`16`	Embedding batch size
`--embed-url`	`http://localhost:8080`	llama.cpp embedding server URL
`--rerank-url`	`http://localhost:8181`	llama.cpp reranker server URL

evaluate

Run entity preservation evaluation across methods and budgets.

uv run python compact.py evaluate FILE.jsonl [options]

Splits the conversation into prefix (70%) and suffix (30%). Compacts the prefix, then measures what fraction of suffix-referenced entities survive in the kept turns.

Options beyond those accepted by compact:

Option	Default	Description
`--method`	`eitf`	Use `all` to evaluate every method
`--split-ratio`	`0.70`	Prefix/suffix split ratio
`--probe-cache`	`eval_cache/`	Directory for cached LLM-as-Judge probe sets
`--eval-output`	—	Export results as JSON

plot

Generate Pareto plots from evaluation result JSON files.

uv run python compact.py plot results1.json [results2.json ...] [-o output.png]

Evaluation Framework

supercompact includes a multi-dimensional evaluation framework for comparing compaction methods.

Entity Coverage

Entity coverage is the primary metric. It is inspired by Łajewska et al. (EMNLP 2025), where entity preservation is the most discriminating metric for compression quality.

Structured entities are extracted from text using regex patterns:

Entity Type	Weight	Examples
file_path	1.0	`/home/user/src/foo.py`
error	1.0	`ModuleNotFoundError`
exception	0.9	`TypeError`, `ValueError`
url	0.8	`http://localhost:8080/api`
port	0.8	`:3000`, `port 5432`
command	0.7	`git commit`, `docker build`
package	0.7	`httpx`, `transformers`
http_status	0.6	`404 Not Found`
function	0.5	`parse_jsonl()`, `build_query()`
class_name	0.4	`ScoredTurn`, `EntitySet`
env_var	0.4	`OPENROUTER_API_KEY`

Coverage is the fraction of entities referenced in the suffix (the future conversation) that are preserved in the compacted prefix.
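
As a rough sketch of this metric, the snippet below splits a conversation, extracts entities with a regex, and computes the surviving fraction. It is an illustration under assumptions, not the code in lib/eval/entity_coverage.py: the regex covers only two entity types, the split is done by turn count, coverage is unweighted, and an empty suffix entity set is treated as full coverage. All function names are hypothetical.

```python
import re

# Hypothetical two-pattern subset (file paths and *Error names); the real
# extractor covers every entity type in the table above.
ENTITY_RE = re.compile(r"(?:/[\w.-]+)+\.\w+|\b\w+Error\b")

def extract_entities(text):
    """Return the set of entity strings found in one turn."""
    return set(ENTITY_RE.findall(text))

def split_conversation(turns, split_ratio=0.70):
    """Split into prefix and suffix by turn count (an assumption)."""
    cut = int(len(turns) * split_ratio)
    return turns[:cut], turns[cut:]

def entity_coverage(kept_prefix_turns, suffix_turns):
    """Fraction of suffix-referenced entities still present in the kept prefix turns."""
    suffix_entities = set()
    for text in suffix_turns:
        suffix_entities |= extract_entities(text)
    if not suffix_entities:
        return 1.0  # assumption: nothing to preserve counts as full coverage

    kept_entities = set()
    for text in kept_prefix_turns:
        kept_entities |= extract_entities(text)
    return len(suffix_entities & kept_entities) / len(suffix_entities)
```

Entities that appear only in the suffix can never be recovered from the prefix, so a score of 1.0 is not always reachable with this literal definition.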

Evidence Coverage (LLM-as-Judge)

An optional second metric using LLM-generated probes. An LLM reads the full conversation and generates ~25 probe questions across five dimensions:

Dimension	Weight	What it tests
error_solution	0.30	Can the agent recall failures, root causes, and fixes?
instruction	0.25	Are user requirements and preferences preserved?
progress	0.25	Does the agent know what's done, what failed, what's next?
environment	0.15	File paths, ports, configs, tool versions — exact factual recall
noise	0.05	Can the agent summarize verbose output without retaining raw noise?

Probes are cached per (conversation, split_ratio) pair. Evidence coverage measures whether the turns containing probe answers are kept by each method.
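
One way the dimension weights could combine into a single evidence coverage number is sketched below. The weights come from the table above; everything else is an assumption made for illustration: the probe representation, the rule that a probe counts as covered only when all of its evidence turns are kept, and the renormalization over dimensions that actually have probes. The real logic lives in lib/eval/evidence_coverage.py and lib/eval/aggregate.py and may differ.

```python
DIMENSION_WEIGHTS = {
    "error_solution": 0.30,
    "instruction": 0.25,
    "progress": 0.25,
    "environment": 0.15,
    "noise": 0.05,
}

def evidence_coverage(probes, kept_turn_indices):
    """Weighted share of probes whose evidence turns survived compaction.

    probes: iterable of (dimension, evidence_turn_indices) pairs (hypothetical shape).
    """
    kept = set(kept_turn_indices)
    hits = {dim: 0 for dim in DIMENSION_WEIGHTS}
    totals = {dim: 0 for dim in DIMENSION_WEIGHTS}

    for dimension, evidence in probes:
        totals[dimension] += 1
        if set(evidence) <= kept:  # assumption: every evidence turn must be kept
            hits[dimension] += 1

    # Only dimensions with at least one probe contribute; renormalize their weights.
    active = [dim for dim, count in totals.items() if count > 0]
    weight_sum = sum(DIMENSION_WEIGHTS[dim] for dim in active)
    if weight_sum == 0:
        return 0.0
    weighted = sum(DIMENSION_WEIGHTS[dim] * hits[dim] / totals[dim] for dim in active)
    return weighted / weight_sum
```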

Plugin Details

Both plugins replace the CLI's built-in LLM summarization with EITF scoring. Configure them via environment variables:

PLUGIN_SETTING_METHOD=eitf             # eitf, setcover, dedup
PLUGIN_SETTING_BUDGET=80000            # token budget
PLUGIN_SETTING_FALLBACK_TO_BUILTIN=true  # fall back to LLM on error
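
For readers scripting around the plugins, the following sketch shows one way these variables could be read from Python. It is not the plugins' own loader: `PluginSettings` and `load_settings` are hypothetical names, the values shown above are assumed to be the defaults, and the accepted spellings of "true" are a guess.

```python
import os
from dataclasses import dataclass

@dataclass
class PluginSettings:
    method: str = "eitf"
    budget: int = 80000
    fallback_to_builtin: bool = True

def load_settings(env=None):
    """Read the PLUGIN_SETTING_* variables, falling back to the assumed defaults."""
    env = os.environ if env is None else env
    defaults = PluginSettings()
    fallback = env.get("PLUGIN_SETTING_FALLBACK_TO_BUILTIN")
    return PluginSettings(
        method=env.get("PLUGIN_SETTING_METHOD", defaults.method),
        budget=int(env.get("PLUGIN_SETTING_BUDGET", defaults.budget)),
        fallback_to_builtin=(
            defaults.fallback_to_builtin
            if fallback is None
            else fallback.strip().lower() in {"1", "true", "yes"}
        ),
    )
```

Passing a plain dict as `env` makes the function easy to try without touching the real environment.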

The Claude Code plugin also provides a /supercompact slash command for manual compaction:

/supercompact           # Compact with default 80k budget
/supercompact 60000     # Compact with custom budget

See plugins/claude-code/README.md and plugins/codex-cli/README.md for full details.

Architecture

compact.py              # CLI entry point (compact / evaluate / plot subcommands)
lib/
├── parser.py           # JSONL parsing, Turn dataclass, text extraction
├── tokenizer.py        # Token counting via Qwen3 tokenizer
├── types.py            # ScoredTurn, build_query, random_scores
├── selector.py         # Budget-constrained greedy turn selection
├── formatter.py        # Output formatting (JSONL, summary text, CSV)
├── scorer_base.py      # Scorer protocol, registry, method resolution
├── eitf.py             # EITF scorer (entity-frequency inverse turn frequency)
├── setcover.py         # SetCover scorer (EITF + exclusivity bonus)
├── dedup.py            # Suffix automaton dedup scorer
├── scorer.py           # PyTorch embedding scorer (Qwen3-Embedding-0.6B)
├── llama_embed.py      # llama.cpp embedding scorer (HTTP)
├── llama_rerank.py     # llama.cpp reranker scorer (HTTP)
├── llm_compact.py      # Claude LLM summarization baseline
├── pareto.py           # Pareto plot generation
├── fitness.py          # Legacy fitness evaluation
└── eval/
    ├── probes.py       # LLM-as-Judge probe generation
    ├── entity_coverage.py  # Entity extraction and coverage metrics
    ├── evidence_coverage.py # Evidence turn coverage computation
    ├── cache.py        # Probe set caching
    ├── judge.py        # LLM judge for probe answering
    ├── aggregate.py    # Result aggregation
    └── report.py       # Evaluation reporting

Pipeline

1. Parse: Read JSONL, group into alternating user/system turns
2. Tokenize: Count tokens per turn using Qwen3 tokenizer
3. Score: Run the selected method to assign relevance scores (0-1) to each long system turn
4. Select: Greedily pick highest-scoring turns that fit in budget, with a 0.15 recency bonus. User turns, short system turns, and the most recent system turn are always kept.
5. Output: Write compacted JSONL (or summary text) preserving original turn structure
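
The recency bonus in the Select step is not spelled out here, so the sketch below shows just one plausible reading: the bonus grows linearly from 0 for the oldest long system turn to 0.15 for the newest, and the most recent system turn is tried first so that it is always kept. The linear shape and both function names are assumptions, not the behavior of lib/selector.py.

```python
RECENCY_BONUS = 0.15

def apply_recency_bonus(scores, bonus=RECENCY_BONUS):
    """Add a linearly increasing bonus; scores are in conversation order, oldest first."""
    n = len(scores)
    if n <= 1:
        return list(scores)
    return [score + bonus * i / (n - 1) for i, score in enumerate(scores)]

def selection_order(scores):
    """Order in which a greedy selector would try the long system turns."""
    if not scores:
        return []
    adjusted = apply_recency_bonus(scores)
    last = len(scores) - 1
    # The most recent system turn goes first (always kept), the rest by adjusted score.
    rest = sorted(range(last), key=lambda i: adjusted[i], reverse=True)
    return [last] + rest
```

With a bonus this small, recency only breaks near-ties: a late turn overtakes an earlier one only when their raw scores are within 0.15 of each other.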

License

MIT

