PrismML
82 posts
PrismML’s posts
Super excited to cache these talks today, we predict they will be excellent.
Presenting today's Inference Systems Track agenda.
Thank you , Head of AI at
We’re looking for people who love building from scratch! DM if interested.
Quote
Omead Pooladzandi
@HessianFree
Hey! @PrismML is hiring!
We're looking for LLM people who have trained models at scale - SFT/RL, data mixtures, evals, distillation, long context, distributed training, kernels, you name it!
Especially interested in people who like owning the full stack from training dynamics
My job is literally to do interesting and crazy experiments. If you are into that sort of thing, DM Omead or me and come build with us!
We’re expanding our highly technical team: people who love pushing model quality end-to-end, from training dynamics to shipped models.
If you’ve scaled LLM training, RL/SFT, evals, distillation, long context, kernels, or infra, we’d love to talk.
One of my favorite things about running for eleven years? The founders.
There's a secret "track" that's not on the schedule: an invisible hallway of builders. And the next wave is showing up at SF 2026.
Training gets the headlines. Inference gets the bill. As agents move from novelty to default workload, the hard problem isn't the model anymore. It's every millisecond and every watt between a prompt and the next token. A coding agent running for six hours straight is a very
Huge shoutout to
This shouldn’t be possible: a tiny model punching way above its weight.
The largest version is just 1.14 GB, which means it’s small enough for a phone.
Fast on a phone (spoiler: Pico for iOS is coming soon!). Insanely fast on a MacBook Pro M1 Max.
Quote
Pico AI Server and Pico AI Studio
@PicoGPT
How fast, you ask? About 109 tokens per second on an M1 Max MacBook Pro.
Tiny local models like Bonsai are going to change things.
For the last three years, the default way most people used AI was simple: frontier models lived in data centres, you reached them through an API, and anything local felt like a toy.
That will probably stop being true in
Pico Local AI Server 1.4.21 is now available on the Mac App Store. This release adds support for Ternary Bonsai, a lightning-fast model that outperforms many much larger models
Ran Ternary-Bonsai 8B on my iPhone through OnDevice LLM. Surprisingly fast.
Ternary is actually surprisingly powerful. Validated by BitNet and now again here. In the new model training research/experimentation I've been working on, ternary weights (in some places) actually beat bf16 (by a not-insignificant amount), at least up to the 7B scale (and
Interesting work here 
Ternary Bonsai 8B is within 5% of Qwen 3 8B at 9x lower memory!
Congratulations on yet another exciting release!
One of the things I tried researching but found really hard.
1.58 bpw is insane: 10x smaller than the original. I hope they push it to much larger models.
Check out how you can use Ternary Bonsai 8B for tool calling in your everyday life: an impressive demo on an amazing platform!
The models are getting smaller. Great for OpenClaws and Hermes. Gotta heat them up!
Yesterday someone told me "phones are three to five years away."
Oh, really?
Ternary Bonsai: state-of-the-art intelligence at 1.58 bits. The models are so small they can even run locally in your browser on WebGPU!
Here's the 8B version (just ~2GB in size) running at 60 tokens per second on my M4 Max.
Try the demo out yourself! 
People are the most valuable resource in action.
Quote
Omead Pooladzandi
@HessianFree
> anon asked for one more state
> we added zero
> +600 MB
> +5 benchmark points
> 75.5 avg at 1.75 GB
> still ~1/9 the size of Qwen3 8B
> shout out brahmagupta
> zero mattered x.com/PrismML/status…
Quote
merve
@mervenoyann
new open-source Bonsai models are out
> ternary weights in 8B (1.75 GB), 4B (0.86 GB), and 1.7B (0.37 GB)
> comes in MLX, ONNX weights and WebGPU browser demo
> Apache 2.0 licensed
x.com/PrismML/status…
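
For anyone wondering where "1.58 bits" and "~1/9 the size" come from, here is a rough back-of-the-envelope sketch. It only uses the sizes quoted in the posts above. The parameter counts are the nominal figures from the model names, "GB" is taken to mean 10^9 bytes, and the bf16 baseline is a plain 16 bits per weight; all three are assumptions, not numbers from PrismML.

```python
import math

# Information content of one ternary weight with states {-1, 0, +1}.
BITS_PER_TERNARY_WEIGHT = math.log2(3)  # about 1.585, the "1.58 bits" figure

# (nominal parameter count, quoted file size in GB).
# Sizes come from the posts above; parameter counts are assumed from the
# model names, and GB is assumed to mean 10**9 bytes.
QUOTED_MODELS = {
    "8B": (8.0e9, 1.75),
    "4B": (4.0e9, 0.86),
    "1.7B": (1.7e9, 0.37),
}


def effective_bits_per_weight(params, size_gb):
    """Average bits stored per parameter, given a file size in GB."""
    return size_gb * 1e9 * 8 / params


def bf16_size_gb(params):
    """Size in GB of the same parameter count stored at 16 bits each."""
    return params * 16 / 8 / 1e9


for name, (params, size_gb) in QUOTED_MODELS.items():
    bpw = effective_bits_per_weight(params, size_gb)
    ratio = bf16_size_gb(params) / size_gb
    print(f"{name}: {bpw:.2f} bits/weight, bf16 would be {ratio:.1f}x larger")
```

Under these assumptions the quoted files work out to roughly 1.7 to 1.75 bits per weight, a little above the log2(3) floor, and about 9x smaller than a 16-bit copy of the same parameter count. The gap above 1.58 is presumably overhead such as scales or unquantized layers, but the posts do not say.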