Build With Qwen 3, MCP, and a Free GPU
You can have them all in a single Notebook
I was curious: can we run local LLMs with Ollama, run MCP servers, connect them, and build intelligent apps, all in a single Notebook?
If this is possible, we can get the most out of free Kaggle Notebooks (or Colabs). For quantized models, the free GPU they provide is sufficient.
To run Ollama, you need access to the notebook terminal. Kaggle and Colab do give you a terminal, but they don’t allow background processes, so you can’t keep the Ollama server running.
It’s the same story with MCP servers: most of them need to run in the background.
But it’s not entirely out of reach.
In this post, I’ll show you how I start an Ollama server inside a Kaggle Notebook.
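The core trick is to spawn the server as a child process of the notebook kernel itself rather than from the terminal. Here’s a minimal sketch of that idea, assuming the Ollama binary is already installed in the environment (the log path and the short wait are arbitrary choices, not part of any official recipe):

```python
# Minimal sketch: launch "ollama serve" from a notebook cell without blocking it.
# Assumes the `ollama` binary is already installed and on PATH.
import subprocess
import time

# Send server output to a log file so the cell can return immediately.
log = open("/tmp/ollama.log", "w")
server = subprocess.Popen(
    ["ollama", "serve"],        # Ollama's HTTP API, on port 11434 by default
    stdout=log,
    stderr=subprocess.STDOUT,
)

time.sleep(5)                   # give the server a moment to come up
print("Ollama server running with PID", server.pid)
```

Because the process belongs to the notebook kernel, it keeps running for as long as the session does, which is enough for our purposes.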
Let’s run the Qwen 3 model. I like Qwen 3 for two reasons: first, it’s currently one of the most capable open-source thinking models, and second, it comes in a size for every device. From 0.6B to 235B parameters, we can pick the one that best suits our hardware.
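On the free Kaggle GPU (a T4 or P100 with around 16 GB of VRAM), a small or mid-sized quantized tag is a sensible starting point. The tag names below follow the Ollama model library and may change over time, so treat this as an illustration rather than a fixed recommendation:

```python
# Pull a quantized Qwen 3 model once the server is running.
# Tag names (qwen3:0.6b, qwen3:4b, qwen3:8b, ...) follow the Ollama model
# library; check the library page for the sizes currently published.
import subprocess

subprocess.run(["ollama", "pull", "qwen3:8b"], check=True)

# Quick smoke test: a one-shot prompt through the CLI.
subprocess.run(["ollama", "run", "qwen3:8b", "Say hello in one sentence."], check=True)
```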
Running Ollama in a Kaggle Notebook
I’m a fan of llama.cpp, not Ollama.