How to (Ab)Use your KS-LE-B for LLM Models
So, you got one of these KS-LE-B servers and want to run some LLM models? Here's a smol guide.
Grab the dependencies we need.
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev git ccache python3-pip python3.13-venv -y
Add a new user that will run the LLM models.
adduser llm
Log in as the new user.
su llm
Grab llama.cpp
cd
git clone https://github.com/ggml-org/llama.cpp.git
Grab huggingface CLI
curl -LsSf https://hf.co/cli/install.sh | bash
export PATH="/home/llm/.local/bin:$PATH"
I have made a smol script to do the initial build / update of llama.cpp: https://pastebin.com/raw/gKYBcXqc
wget -O update.sh https://pastebin.com/raw/gKYBcXqc
chmod +x update.sh
bash update.sh
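The contents of that pastebin script aren't reproduced here, but a typical llama.cpp CPU build/update sequence looks roughly like this (a sketch under the assumption that the repo lives at ~/llama.cpp, as in the steps above; not the actual script):

```shell
#!/usr/bin/env bash
# Sketch of a llama.cpp update/build script (assumption: CPU-only build,
# repo cloned at ~/llama.cpp as done earlier in this guide).
set -e
cd "$HOME/llama.cpp"
git pull
# ccache (installed with the dependencies above) speeds up rebuilds
cmake -B build -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_COMPILER_LAUNCHER=ccache \
      -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
cmake --build build --config Release -j "$(nproc)"
```

Note that a CMake build puts the binaries in build/bin/; the script presumably copies or symlinks them so the llama.cpp/llama-cli paths used below work.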
Let's download our first model.
hf download unsloth/GLM-4.7-Flash-GGUF --include "*Q4_K_M*" --local-dir models/
You can either run llama.cpp on the CLI:
llama.cpp/llama-cli --jinja --model models/GLM-4.7-Flash-Q4_K_M.gguf
or use the web interface:
llama.cpp/llama-server --jinja --host 127.0.0.1 --port 8888 --models-dir models/
This includes a model autoloader: you can pick which model to load from the web interface.
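llama-server also exposes an OpenAI-compatible API, so you can do a quick smoke test from the same box with curl (the model name here is an assumption; it must match a file in models/):

```shell
# Assumes llama-server is running on 127.0.0.1:8888 as started above.
curl -s http://127.0.0.1:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GLM-4.7-Flash-Q4_K_M",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```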
Add an nginx reverse proxy and you're set.
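A minimal sketch of that reverse proxy (the hostname is a placeholder; add TLS and auth yourself before exposing this to the internet):

```nginx
server {
    listen 80;
    server_name llm.example.com;  # hypothetical hostname

    location / {
        proxy_pass http://127.0.0.1:8888;
        proxy_set_header Host $host;
        # llama-server streams tokens; disable buffering so they arrive live
        proxy_buffering off;
    }
}
```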
Comments
It's fast enough to chat, but not blazing fast on CPU.

That's pretty cool!
Before I forget to mention this:
Try to get Q4 or higher; Q4 is a good balance.
The model mentioned above needs 64GB of RAM; if you have a KS-LE-B with less, try a smoler model.
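As a rough rule of thumb (an approximation, not an exact figure), a quantized GGUF's weight file is about parameters × bits-per-weight / 8 bytes, plus KV-cache and context overhead on top. For a hypothetical 30B-parameter model at roughly 4.5 bits/weight (about what Q4_K_M averages):

```shell
# Back-of-the-envelope RAM estimate for a quantized model.
# All numbers here are illustrative assumptions, not measured values.
params_b=30   # model size in billions of parameters (hypothetical)
bpw=4.5       # Q4_K_M averages roughly 4.5 bits per weight
size_gb=$(awk -v p="$params_b" -v b="$bpw" 'BEGIN{printf "%.1f", p*b/8}')
echo "~${size_gb} GB for weights, plus a few GB for KV cache/context"
```

This only ballparks the weights; leave generous headroom for context length and the OS.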
To optimize performance and results, always check the guide for the model,
e.g. https://unsloth.ai/docs/models/glm-4.7-flash