MCP MVP #18655

Open
allozaur wants to merge 327 commits into ggml-org:master from allozaur:allozaur/mcp-mvp
Lines changed: 17839 additions & 6169 deletions

Conversation

@allozaur
Collaborator

@allozaur allozaur commented Jan 7, 2026

Note

Demos and description are WIP

Video demos

Adding a new MCP Server and using it within an Agentic Loop

demo1.mp4

Using MCP Prompts

demo2.mp4

Using MCP Resources

demo3.mp4

Image Generation and Web Search using different MCP servers

demo4.mp4

New features

  • Adding a System Message to a conversation, or injecting it into an existing one
  • CORS Proxy on llama-server backend side
  • MCP
    • Servers Selector
    • Settings with Server cards showing capabilities, instructions and other information
    • Tool Calls
      • Agentic Loop
        • Logic
        • UI with processing stats
    • Prompts
      • Detection logic in "Add" dropdown
      • Prompt Picker
      • Prompt Args Form
      • Prompt Attachments in Chat Form and Chat Messages
    • Resources
      • Browser with search & filetree view
      • Resource Attachments & Preview dialog
  • Show raw output switch under the assistant message
  • Favicon utility
  • Key-Value form component (used for MCP Server headers in add new/edit mode)

UI Improvements

  • Created TruncatedText component
  • Created Image CORS error fallback UI
  • Created DropdownMenuSearchable component
  • Created HorizontalScrollCarousel
  • Created CollapsibleContentBlock component + refactored Reasoning Content & Tool Calls
  • Code block improved UI + unfinished code block handling + syntax highlighting improvements
  • Max-height for components + autoscroll of overflowing content during streaming
  • New statistics UI + added new data for Tool Calls and Agentic summary
  • Better time formatting for Chat Message Statistics

Architecture refactors/improvements

  • Autoscroll hook
  • Renamed all service files to have .service.ts format
  • Common types definitions
  • Abort Signal utility + refactor
  • API Fetch utility
  • Cache TTL
  • Components folder naming & restructuring
  • Context API for editing messages and message actions
  • Enums & constants cleanup
  • Markdown Rendering improvements
  • Chat Form components
  • Resolving images in Markdown Content
  • Chat Attachments API improvements
  • Removed Model Change validation logic
  • Removed File Upload validation logic
  • New formatters
  • Storybook upgrade
  • New prop to define initial section when opening Chat Settings

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Jan 7, 2026

Thank you for the architectural unification! The SearchableDropdownMenu refactor is superb, we're making good progress!

Only remaining items/features for an MVP (& testing on my side):

  • Server naming based solely on the domain name is problematic for those wanting to host multiple MCP services on web subdirectories or subdomains (like on our home/personal servers, or a future integrated MCP backend!): we need to find a solution
  • Display at narrow widths needs improvement -> the MCP selector drops below the model name (yes, model names are long, plus I prefix MoE/Dense on all my models because this info really needs to be visible to everyone. Auto-detection from GGUF would be fantastic someday!)
  • Stress-test the context with heavy RAG; we've already run many long, complex agentic loops on the sandbox successfully (your automatic web-interface publication from the agent was cool btw)
  • The global on/off button on the first server seems strange to me

@strawberrymelonpanda
Contributor

strawberrymelonpanda commented Jan 9, 2026

I'm interested in this, so gave the PR a look and just wanted to ask, are local MCP servers planned to be supported?

Right now it looks like URL is required, without a local "command" "args" "env" type option. (for Node, NPX / UVX, Docker, etc) Might be able to get around this with a MCP proxy server, but built-in support of local servers like many MCP clients would be welcomed.

e.g. Cursor, VS Code, OpenCode, Roo Code, Antigravity, LM Studio, and others support the following with small variations:

```json
{
  "mcpServers": {
    "git": {
      "command": "uvx",
      "args": ["mcp-server-git"]
    },
    "name": {
      "command": "npx",
      "args": ["/path/index.js"],
      "env": { "VAR": "VAL" }
    }
  }
}
```

Lots of examples here.

I know it's still WIP, but just wanted to ask. Or maybe I've missed it?

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Jan 9, 2026

> I'm interested in this, so gave the PR a look and just wanted to ask, are local MCP servers planned to be supported?
>
> Right now it looks like URL is required, without a local "command" "args" "env" type option. (for Node, NPX / UVX, Docker, etc) Might be able to get around this with a MCP proxy server, but built-in support of local servers like many MCP clients would be welcomed.
>
> e.g. Cursor, VS Code, OpenCode, Roo Code, Antigravity, LM Studio, and others support the following with small variations:
>
> ```json
> {
>   "mcpServers": {
>     "git": {
>       "command": "uvx",
>       "args": ["mcp-server-git"]
>     },
>     "name": {
>       "command": "npx",
>       "args": ["/path/index.js"],
>       "env": { "VAR": "VAL" }
>     }
>   }
> }
> ```
>
> Lots of examples here.
>
> I know it's still WIP, but just wanted to ask. Or maybe I've missed it?

This PR is browser-side (Svelte) -> TCP (streamable-http / sse / websocket), so no stdio support on browser.
A backend relay in llama-server for stdio->HTTP MCP bridging would be possible but not yet implemented.
I have a personal Node.js proxy doing this (OAI->MCP (with stdio)->OAI hack), can share if useful.

EDIT: And once we have the MCP client in the browser, nothing prevents a small example script in Python or Node.js from relaying MCP to stdio :)
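Such a relay script could build on the fact that MCP's stdio transport exchanges newline-delimited JSON-RPC messages. The sketch below shows only the framing step, under that assumption; the HTTP side, process management, and all function names here are illustrative, not part of this PR or any existing relay.

```python
import json

# Hedged sketch: MCP's stdio transport frames each JSON-RPC message as one
# line of JSON terminated by a newline, with no embedded newlines. A relay
# would read HTTP request bodies on one side and write framed lines to the
# child server process's stdin on the other. Names are hypothetical.

def frame_stdio_message(message: dict) -> bytes:
    """Serialize a JSON-RPC message for the stdio transport (one line)."""
    line = json.dumps(message, separators=(",", ":"))
    if "\n" in line:
        raise ValueError("stdio-framed messages must not contain raw newlines")
    return (line + "\n").encode("utf-8")

def parse_stdio_line(line: bytes) -> dict:
    """Parse one newline-delimited JSON-RPC message from the server's stdout."""
    return json.loads(line.decode("utf-8"))

# Round trip with a JSON-RPC initialize request
req = {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}}
framed = frame_stdio_message(req)
assert parse_stdio_line(framed) == req
```

A real relay would additionally spawn the server with `subprocess.Popen` and pump these framed lines between the process pipes and an HTTP endpoint.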

@allozaur
Collaborator Author

allozaur commented Jan 9, 2026

> I'm interested in this, so gave the PR a look and just wanted to ask, are local MCP servers planned to be supported?
>
> Right now it looks like URL is required, without a local "command" "args" "env" type option. (for Node, NPX / UVX, Docker, etc) Might be able to get around this with a MCP proxy server, but built-in support of local servers like many MCP clients would be welcomed.
>
> e.g. Cursor, VS Code, OpenCode, Roo Code, Antigravity, LM Studio, and others support the following with small variations:
>
> ```json
> {
>   "mcpServers": {
>     "git": {
>       "command": "uvx",
>       "args": ["mcp-server-git"]
>     },
>     "name": {
>       "command": "npx",
>       "args": ["/path/index.js"],
>       "env": { "VAR": "VAL" }
>     }
>   }
> }
> ```
>
> Lots of examples here.
>
> I know it's still WIP, but just wanted to ask. Or maybe I've missed it?

Hey! We are introducing a solid basis for MCP support in llama.cpp, starting with a pure WebUI implementation. We will add further enhancements in the near future ;)

ServeurpersoCom added a commit to ServeurpersoCom/llama.cpp that referenced this pull request Jan 9, 2026
@strawberrymelonpanda
Contributor

> This PR is browser-side (Svelte) -> TCP (streamable-http / sse / websocket), so no stdio support on browser.

> Hey! We are introducing a solid basis for MCP support in llama.cpp, starting with pure WebUI implementation.

Thanks folks, makes sense. I'll make a small proxy script then, just wanted to make sure I wasn't overlooking a component.

> I have a personal Node.js proxy doing this (OAI->MCP (with stdio)->OAI hack), can share if useful.

I think I can whip something up, but I wouldn't say no to a reference.

I would say that if the MCP button appears in the WebUI as-is, this sort of question is bound to come up. A small example proxy script in the docs might not be a bad idea.

I appreciate the work being done on this and other WebUI PRs.

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Jan 16, 2026

Now we can work with image attachments properly, without overloading the context.

  • An MCP server can return an attachment; the SvelteUI doesn't saturate its context, but informs the LLM that an attachment is available, and it intuitively creates a Markdown link!

If the Markdown link points to an attachment, it's displayed.

Now we can do things like ChatGPT's DALL-E/Sora image generation, all locally: llama.cpp can query a stable-diffusion.cpp MCP server running Qwen3-Image, and from what I've tested (I'll make videos), the rendering quality is between DALL-E and Sora Image (an LLM prompts the image generator much better than a human, capturing the intent without leaving any ambiguity for the image model).

MCPAttach0 MCPAttach1
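The attachment flow described above can be sketched roughly as follows: when a tool result contains binary image content, store the blob and hand the model a short notice with a Markdown link, instead of inlining base64 into the context. All names here (`store_attachment`, the `/attachments/` URL scheme, the notice wording) are hypothetical, not the PR's actual API.

```python
import base64
import hashlib

_store: dict[str, bytes] = {}  # stand-in for the real attachment store

def store_attachment(data: bytes) -> str:
    """Store a blob and return a short URL for it (hypothetical scheme)."""
    key = hashlib.sha256(data).hexdigest()[:16]
    _store[key] = data
    return f"/attachments/{key}"

def tool_result_to_context(item: dict) -> str:
    """Convert one MCP tool-result content item into context-friendly text.

    Image items become a short notice plus a Markdown link the model can
    repeat in its answer; text items pass through unchanged.
    """
    if item.get("type") == "image":
        url = store_attachment(base64.b64decode(item["data"]))
        return f"[An image attachment is available.]({url})"
    return item.get("text", "")

img = {"type": "image",
       "data": base64.b64encode(b"\x89PNG...").decode(),
       "mimeType": "image/png"}
print(tool_result_to_context(img))
```

If the model then echoes that Markdown link in its reply, the UI can resolve it back to the stored attachment and render the image, which matches the behavior described in the comment above.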

@ServeurpersoCom
Collaborator

Since this works, all that remains is to refactor the CoT with the new pre-rendering format (client-specific tags <<< ... >>>) to have complete control over the context and sending the "interleaving" back to the server during an agentic loop. This will answer several questions from llama.cpp users and developers about powerful models like Minimax. And for Qwen, it will finally provide visibility into the CoT during the agentic loop!

@ServeurpersoCom
Collaborator

Testing interleaved reasoning blocks and tool calls. I need to remove the "Filter reasoning after first turn" option, which is useless now.

InterleavedThinkingBlockAndImage.mp4

@ServeurpersoCom
Collaborator

Obviously, with the refactoring of the CoT display, for now all the reasoning is sent back to the model with our proprietary UI tags included! A simple strip that reuses the regular expressions before sending it to the API will restore the previous functionality.
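Such a strip could look like the sketch below. The exact grammar of the client-specific tags is an assumption here; only the `<<< ... >>>` delimiters are taken from the comments above, and the regex simply removes any such span before the text goes back to the API.

```python
import re

# Hedged sketch: remove client-specific <<< ... >>> pre-rendering tags from
# reasoning text before it is sent back to the API. The tag grammar is an
# assumption; any non-greedy <<<...>>> span is stripped, including multiline.
TAG_RE = re.compile(r"<<<.*?>>>", re.DOTALL)

def strip_ui_tags(text: str) -> str:
    return TAG_RE.sub("", text)

assert strip_ui_tags("before <<<ui:block>>> after") == "before  after"
```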

Later, we'll need an option to choose whether to send this reasoning back to the backend, preserving the actual location of each reasoning block!

For the MCP tool responses, the model has access to everything during the agentic loop, but once a new loop is started, the previous loops are "compressed" to the last N lines set by the option, just like the display. This invalidates the token cache, so the backend has to re-evaluate the last modified agentic loop. This optimizes the context and doesn't degrade agentic performance, because the LLM is supposed to have already performed its synthesis.
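The compression step described above could be sketched as a simple tail-truncation of each old tool response. The default N and the marker text are assumptions for illustration, not the PR's actual values.

```python
# Hedged sketch: once a new agentic loop starts, earlier tool responses are
# truncated to their last N lines, mirroring what the UI displays. The
# truncation marker and default N are illustrative assumptions.

def compress_tool_response(text: str, max_lines: int = 10) -> str:
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text  # short responses stay intact
    omitted = len(lines) - max_lines
    marker = f"[... {omitted} lines truncated ...]"
    return "\n".join([marker] + lines[-max_lines:])

long_output = "\n".join(f"line {i}" for i in range(100))
compressed = compress_tool_response(long_output, max_lines=5)
print(compressed.splitlines()[0])  # truncation marker
```

Because the compressed text differs from what was originally in context, the KV cache for that span no longer matches, which is why the backend must re-evaluate from the last modified loop, as noted above.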

@ServeurpersoCom
Collaborator

A little fun with image generation through MCP (a separate server, distinct from the LLM server, dedicated to image generation; nothing prevents running both on one machine if there's enough VRAM for both inference instances)

MCP-Image-Gen2.mp4

@ServeurpersoCom
Collaborator

The CI fails because we simplified the code path so that the modality type is detected post-upload (better) instead of pre-upload (the filepicker approach was too limiting and incompatible). The Storybook tests still check the old UX, where the Images/Audio buttons were disabled based on model modalities, but that logic got nuked from ChatForm in the refactor. Now the filepicker accepts everything and validation happens client-side after upload, with a text fallback, so the tests are looking for DOM elements that don't exist anymore. The modality props became orphaned: ChatFormActions still receives them, but ChatForm doesn't compute or pass them anymore. We need to nuke these obsolete UI tests and keep the actual modality validation logic in unit tests, where it belongs.

@ServeurpersoCom
Collaborator

  • I really want to rebase this branch (but I'll hold off)!
  • The existing feature of graying out non-Vision models when there's an image upload in the conversation isn't ideal: when you send an image to a Vision model, it "transforms" the projection into text that can be processed by any other non-Vision LLM. Furthermore, the property is only loaded for models already in use. Personally, I don't like this feature; it unnecessarily restricts LLM use (Vision models are less powerful for some follow-up interactions, and there's also a wider selection of non-Vision models)
  • But I can fix this: an MCP tool-call response attachment should NOT be considered a user upload.

@strawberrymelonpanda
Contributor

strawberrymelonpanda commented Jan 17, 2026

> • The existing feature of graying out non-Vision models if there's an image upload in the conversation isn't ideal: when you send an image to a Vision model, it "transforms" the projection into text that can be processed by any other non-Vision LLM.

I've certainly wondered about this scenario. There are cases where I can certainly see chaining vision and non-vision models to be useful, such as having a vision model do OCR and then a non-vision model follow-up. I'd love an omni-model that's on par, but in my tests, even the recent large Qwen vision models still lack in coding and logic compared to their non-vision counterparts.

That said, the "transformation" is incomplete and prompt-based, isn't it? Like describing or OCR'ing the image, rather than actually converting the image into non-vision text tokens? I assume it was grayed out because a non-vision model wouldn't be able to 'see' the image / make sense of the tokens.

Personally I'd totally support a PR to change it regardless.

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Jan 17, 2026

> • The existing feature of graying out non-Vision models if there's an image upload in the conversation isn't ideal: when you send an image to a Vision model, it "transforms" the projection into text that can be processed by any other non-Vision LLM.
>
> I've certainly wondered about this scenario. There are cases where I can certainly see chaining vision and non-vision models to be useful, such as having a vision model do OCR and then a non-vision model follow-up. I'd love an omni-model that's on par, but in my tests, even the recent large Qwen vision models still lack in coding and logic compared to their non-vision counterparts.
>
> That said, the "transformation" is incomplete and based on prompt, isn't it? Like, describing the image or OCR'ing the image, rather than actually converting the image into non-vision text tokens? I assume it was grayed out because the non-vision model wouldn't be able to 'see' the image / make sense of the tokens,
>
> Personally I'd totally support a PR to change it regardless.

As a precaution, I simply fixed the problem. But since the question was raised, I performed a simple test:

I sent an image and asked the model NOT to respond; it accepted.

After several exchanges, I requested a very detailed description of the image, and it managed to provide it without error! The projection is indeed present in the backend and is destroyed if the model is changed. This needs to be tested even when switching from one VL model to another. -> This works.

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Jan 17, 2026

Actually, it's quite simple. The image remains attached in the client-side prompt, and of course it's sent again on the next request. A non-VL model would throw an error. So the feature is simply a safety measure.

Alternatively, you could just filter out the image and continue in text mode using the user-requested description instead, though this is less precise than having a VL model process the actual image. In fact, it would reduce code complexity and give users more freedom. You'd just need a small notification that the image and its full description are no longer considered in the context.
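The filtering alternative could be sketched as below: drop image parts from the outgoing messages and leave a short textual note in their place, instead of graying out non-vision models. The message shape follows the common OpenAI-style chat format; the note wording is an assumption, not anything this PR implements.

```python
# Hedged sketch: when the user switches to a non-vision model, remove
# image_url parts from OpenAI-style chat messages and append a short text
# note so the conversation stays coherent. The note text is illustrative.

def strip_images(messages: list[dict]) -> list[dict]:
    out = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            kept = [p for p in content if p.get("type") != "image_url"]
            if len(kept) < len(content):
                kept.append({"type": "text",
                             "text": "[An image was attached here but is no longer in context.]"})
            msg = {**msg, "content": kept}
        out.append(msg)
    return out

msgs = [{"role": "user", "content": [
    {"type": "text", "text": "What is in this picture?"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
]}]
print(len(strip_images(msgs)[0]["content"]))  # image part replaced by a note
```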


Labels

build (Compilation issues) · devops (improvements to build systems and github actions) · enhancement (New feature or request) · examples · server/webui · server

10 participants