ggml-org/llama.cpp pull requests
common : add standard Hugging Face cache support
#20775 opened 2026-03-19 21:10 by angt
opencl: add flattened Q4_K mv and general Q4_K mm
#20773 opened 2026-03-19 20:14 by shaofeiqi (labels: ggml, OpenCL)
ggml-cpu: fix cross-compilation to Windows from Linux
#20759 opened 2026-03-19 14:29 by yeison-pati (labels: ggml)
llama-fit-params: fix patterns for gate_up tensors
#20758 opened 2026-03-19 14:14 by JohannesGaessler
Add MCP Connection diagnostics and CORS hint to web-ui
#20753 opened 2026-03-19 11:49 by evalstate (labels: examples, server)
Add Qwen3 TTS architecture support
#20752 opened 2026-03-19 11:29 by Acceldium (labels: model, examples, python)
CANN: add RoPE cache preload before ACL graph capture
#20747 opened 2026-03-19 06:36 by noemotiovon (labels: ggml, Ascend NPU)
ci: Use cmake --build instead of direct make invocation
#20742 opened 2026-03-19 02:26 by rillomas (labels: devops)
SYCL: upgraded default oneAPI version
#20731 opened 2026-03-18 20:33 by WizardlyBump17 (labels: devops)
Convert: Make NVFP4 and MXFP4 HF conversions say NVFP4/MXFP4 instead of BF16
#20730 opened 2026-03-18 19:58 by michaelw9999 (labels: python)
server: workaround new chat parser regression
#20729 opened 2026-03-18 19:55 by jpohhhh (labels: examples, server)
ggml-webgpu: port cpy pipeline to shader lib with JIT compilation
#20728 opened 2026-03-18 19:36 by abhijitramesh (labels: ggml, WebGPU)
server : improve mtmd ctx checkpoints
#20726 opened 2026-03-18 18:24 by ggerganov (labels: examples, server)
ggml-cpu: extend RVV repack GEMM and GEMV to other VLENs
#20723 opened 2026-03-18 16:44 by taimur-10x (labels: ggml)
gguf : fix division by zero
#20716 opened 2026-03-18 10:40 by ggerganov (labels: ggml)
fix(rpc): prevent division by zero in deserialize_tensor
#20712 opened 2026-03-18 08:30 by y198nt (labels: ggml)
ggml-webgpu: add vectorized flash attention
#20709 opened 2026-03-18 05:38 by ArberSephirotheca (labels: ggml, WebGPU)
feat: MTP support for dense Qwen 3.5 (0.8B-27B)
#20700 opened 2026-03-17 20:50 by itigges22 (labels: model, examples, python, server)
common/chat, server: refactor, move all conversion functions to common, add tests
#20690 opened 2026-03-17 16:36 by pwilkin (labels: testing, examples, server)
resolve missing <chrono> and pointer type issues for Windows/Clang build
#20674 opened 2026-03-17 10:53 by d-5t (labels: examples)
Mention ANV_SYS_MEM_LIMIT in Vulkan/Linux section of build.md
#20670 opened 2026-03-17 07:45 by adapt-L (labels: documentation)
vulkan: change gated_delta_net to shard a column across a subgroup
#20662 opened 2026-03-17 01:28 by jeffbolznv (labels: Vulkan, ggml)
ggml-cuda: Add NVFP4 dp4a kernel
#20644 opened 2026-03-16 14:19 by michaelw9999 (labels: Nvidia GPU, python, ggml)
mtmd, server : add Voxtral Realtime 4B support with /v1/audio/transcriptions endpoint
#20638 opened 2026-03-16 12:27 by didlawowo (labels: model, examples, python, server)
[CUDA] Increase number of output elements per-thread block if the K-dimension is small
#20635 opened 2026-03-16 11:26 by gaugarg-nv (labels: Nvidia GPU, ggml)
ggml-cpu: add 128-bit RVV implementation for Quantization Vector Dot
#20633 opened 2026-03-16 10:56 by rehan-10xengineer (labels: ggml)
model : add QKV weight fusion for LLaMA, Qwen2, and Qwen3
#20628 opened 2026-03-16 07:59 by JoursBleu (labels: model, script, python)
ggml-cpu: simd_gemm implementation for riscv vector extension
#20627 opened 2026-03-16 07:45 by rehan-10xengineer (labels: ggml)
[OpenVINO backend] add func is_splited_model()
#20626 opened 2026-03-16 06:38 by zhaixuejun1993 (labels: ggml, OpenVINO)
MiroThinker tool call parser
#20624 opened 2026-03-16 06:21 by hksdpc255