How TabPFN “Sees” Letters: Classifying 26 Characters with the Many Class Extension
I recently discovered something fascinating: TabPFN can “see” patterns in tabular data using its Many Class Classifier extension.
Of course, it’s not “seeing” in the visual sense like a CNN would. But when you look at how certain datasets encode shapes and textures as numbers, TabPFN starts to feel surprisingly perceptive.
Other tabular models can do this too, but with lower accuracy, which makes them a bit myopic by comparison. In this experiment, TabPFN outperforms every baseline, including strong models like Random Forest.
Let’s go through this idea step by step and code it together.
Before explaining how it does that, I first need to introduce the dataset: the UCI Letter Recognition dataset.
UCI Letter Recognition dataset
The UCI Letter Recognition dataset is a classic benchmark in machine learning.
Its goal: classify printed capital letters (A–Z) based on their geometric properties.
Here’s what makes it interesting:
- It has 20,000 samples of distorted black-and-white letters (A–Z) generated using 20 different fonts.
- Instead of images, each letter is…