How to (Ab)Use your KS-LE-B for LLM Models
So, you got one of these KS-LE-B servers and want to run some LLM models? Here's a smol guide.
Grab the dependencies we need.
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev git ccache python3-pip python3.13-venv -y
Add a new user that will run the LLM models.
adduser llm
Log in as the new user.
su llm
Grab llama.cpp
cd
git clone https://github.com/ggml-org/llama.cpp.git
Grab huggingface CLI
curl -LsSf https://hf.co/cli/install.sh | bash
export PATH="/home/llm/.local/bin:$PATH"
I have made a smol script to do the initial build / update of llama.cpp: https://pastebin.com/raw/gKYBcXqc
wget -O update.sh https://pastebin.com/raw/gKYBcXqc
chmod +x update.sh
bash update.sh
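The contents of that pastebin script aren't reproduced here, but a typical llama.cpp CPU build/update sequence looks roughly like this (a sketch under the assumption that the repo lives at ~/llama.cpp, as in the steps above; not the actual script):

```shell
#!/usr/bin/env bash
# Sketch of a llama.cpp update/build script (assumption: CPU-only build,
# repo cloned at ~/llama.cpp as done earlier in this guide).
set -e
cd "$HOME/llama.cpp"
git pull
# ccache (installed with the dependencies above) speeds up rebuilds
cmake -B build -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_COMPILER_LAUNCHER=ccache \
      -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
cmake --build build --config Release -j "$(nproc)"
```

Note that a CMake build puts the binaries in build/bin/; the script presumably copies or symlinks them so the llama.cpp/llama-cli paths used below work.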
Let's download our first model.
hf download unsloth/GLM-4.7-Flash-GGUF --include "*Q4_K_M*" --local-dir models/
You can either run llama.cpp on the CLI:
llama.cpp/llama-cli --jinja --model models/GLM-4.7-Flash-Q4_K_M.gguf
or use the web interface:
llama.cpp/llama-server --jinja --host 127.0.0.1 --port 8888 --models-dir models/
This includes a model autoloader: you can pick which model to load from the web interface.
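llama-server also exposes an OpenAI-compatible API, so you can do a quick smoke test from the same box with curl (the model name here is an assumption; it must match a file in models/):

```shell
# Assumes llama-server is running on 127.0.0.1:8888 as started above.
curl -s http://127.0.0.1:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GLM-4.7-Flash-Q4_K_M",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```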
Add an nginx reverse proxy and you're set.
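A minimal sketch of that reverse proxy (the hostname is a placeholder; add TLS and auth yourself before exposing this to the internet):

```nginx
server {
    listen 80;
    server_name llm.example.com;  # hypothetical hostname

    location / {
        proxy_pass http://127.0.0.1:8888;
        proxy_set_header Host $host;
        # llama-server streams tokens; disable buffering so they arrive live
        proxy_buffering off;
    }
}
```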
Comments
It's fast enough to chat, but not blazing fast on CPU.

That's pretty cool!
Before I forget to mention this:
Try to get Q4 or higher; Q4 is a good balance.
The model mentioned above needs 64GB of RAM; if you have a KS-LE-B with less, try a smoler model.
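As a rough rule of thumb (an approximation, not an exact figure), a quantized GGUF's weight file is about parameters × bits-per-weight / 8 bytes, plus KV-cache and context overhead on top. For a hypothetical 30B-parameter model at roughly 4.5 bits/weight (about what Q4_K_M averages):

```shell
# Back-of-the-envelope RAM estimate for a quantized model.
# All numbers here are illustrative assumptions, not measured values.
params_b=30   # model size in billions of parameters (hypothetical)
bpw=4.5       # Q4_K_M averages roughly 4.5 bits per weight
size_gb=$(awk -v p="$params_b" -v b="$bpw" 'BEGIN{printf "%.1f", p*b/8}')
echo "~${size_gb} GB for weights, plus a few GB for KV cache/context"
```

This only ballparks the weights; leave generous headroom for context length and the OS.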
To optimize performance and results, always check the guide for the model,
e.g. https://unsloth.ai/docs/models/glm-4.7-flash