Pull Requests ggml-org/llama.cpp

common : add standard Hugging Face cache support

#20775 opened 2026-03-19 21:10 by angt

opencl: add flattened Q4_K mv and general Q4_K mm ggml OpenCL

#20773 opened 2026-03-19 20:14 by shaofeiqi

ggml-cpu: fix cross-compilation to Windows from Linux ggml

#20759 opened 2026-03-19 14:29 by yeison-pati

llama-fit-params: fix patterns for gate_up tensors

#20758 opened 2026-03-19 14:14 by JohannesGaessler

Add MCP Connection diagnostics and CORS hint to web-ui examples server

#20753 opened 2026-03-19 11:49 by evalstate

Add Qwen3 TTS architecture support model examples python

#20752 opened 2026-03-19 11:29 by Acceldium

CANN: add RoPE cache preload before ACL graph capture ggml Ascend NPU

#20747 opened 2026-03-19 06:36 by noemotiovon

ci: Use cmake --build instead of direct make invocation devops

#20742 opened 2026-03-19 02:26 by rillomas

SYCL: upgraded default oneAPI version devops

#20731 opened 2026-03-18 20:33 by WizardlyBump17

Convert: Make NVFP4 and MXFP4 HF conversions say NVFP4/MXFP4 instead of BF16 python

#20730 opened 2026-03-18 19:58 by michaelw9999

server: workaround new chat parser regression examples server

#20729 opened 2026-03-18 19:55 by jpohhhh

ggml-webgpu: port cpy pipeline to shader lib with JIT compilation ggml WebGPU

#20728 opened 2026-03-18 19:36 by abhijitramesh

server : improve mtmd ctx checkpoints examples server

#20726 opened 2026-03-18 18:24 by ggerganov

ggml-cpu: extend RVV repack GEMM and GEMV to other VLENs ggml

#20723 opened 2026-03-18 16:44 by taimur-10x

gguf : fix division by zero ggml

#20716 opened 2026-03-18 10:40 by ggerganov

fix(rpc): prevent division by zero in deserialize_tensor ggml

#20712 opened 2026-03-18 08:30 by y198nt

ggml-webgpu: add vectorized flash attention ggml WebGPU

#20709 opened 2026-03-18 05:38 by ArberSephirotheca

feat: MTP support for dense Qwen 3.5 (0.8B-27B) model examples python server

#20700 opened 2026-03-17 20:50 by itigges22

common/chat, server: refactor, move all conversion functions to common, add tests testing examples server

#20690 opened 2026-03-17 16:36 by pwilkin

resolve missing <chrono> and pointer type issues for Windows/Clang build examples

#20674 opened 2026-03-17 10:53 by d-5t

Mention ANV_SYS_MEM_LIMIT in Vulkan/Linux section of build.md documentation

#20670 opened 2026-03-17 07:45 by adapt-L

vulkan: change gated_delta_net to shard a column across a subgroup Vulkan ggml

#20662 opened 2026-03-17 01:28 by jeffbolznv

ggml-cuda: Add NVFP4 dp4a kernel Nvidia GPU python ggml

#20644 opened 2026-03-16 14:19 by michaelw9999

mtmd, server : add Voxtral Realtime 4B support with /v1/audio/transcriptions endpoint model examples python server

#20638 opened 2026-03-16 12:27 by didlawowo

[CUDA] Increase number of output elements per-thread block if the K-dimension is small Nvidia GPU ggml

#20635 opened 2026-03-16 11:26 by gaugarg-nv

ggml-cpu: add 128-bit RVV implementation for Quantization Vector Dot ggml

#20633 opened 2026-03-16 10:56 by rehan-10xengineer

model : add QKV weight fusion for LLaMA, Qwen2, and Qwen3 model script python

#20628 opened 2026-03-16 07:59 by JoursBleu

ggml-cpu: simd_gemm implementation for riscv vector extension ggml

#20627 opened 2026-03-16 07:45 by rehan-10xengineer

[OpenVINO backend] add func is_splited_model() ggml OpenVINO

#20626 opened 2026-03-16 06:38 by zhaixuejun1993

MiroThinker tool call parser

#20624 opened 2026-03-16 06:21 by hksdpc255