MCP MVP #18655
Conversation
Thank you for the architectural unification! The SearchableDropdownMenu refactor is superb; we're making good progress! The only remaining items/features for an MVP (plus testing on my side):
tools/server/webui/docs/architecture/high-level-architecture-simplified.md
I'm interested in this, so I gave the PR a look and just wanted to ask: are local MCP servers planned to be supported? Right now it looks like a URL is required, without a local "command"/"args"/"env" type option (for Node, NPX/UVX, Docker, etc.). It might be possible to get around this with an MCP proxy server, but built-in support for local servers, as in many MCP clients, would be welcome. E.g. Cursor, VS Code, OpenCode, Roo Code, Antigravity, LM Studio, and others support this with small variations (lots of examples here). I know it's still WIP, but I just wanted to ask. Or maybe I've missed it?
This PR is browser-side (Svelte) -> TCP (streamable-http / sse / websocket), so there is no stdio support in the browser. EDIT: And once we have the MCP client in the browser, nothing prevents a small example script in Python or Node.js from relaying MCP to stdio :)
Hey! We are introducing a solid basis for MCP support in llama.cpp, starting with a pure WebUI implementation. We will add further enhancements in the near future ;)
Thanks folks, makes sense. I'll make a small proxy script then, just wanted to make sure I wasn't overlooking a component.
I think I can whip something up, but I wouldn't say no to a reference. I would say that if the MCP button appears in the WebUI as-is, this sort of question is bound to come up, so a small example proxy script in the docs might not be a bad idea. I appreciate the work being done on this and other WebUI PRs.
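The relay idea discussed above could be sketched roughly as follows. This is a minimal, hypothetical Node.js/TypeScript sketch, not part of the PR: it spawns a local stdio MCP server (the "command"/"args" style config) and forwards one JSON-RPC request per HTTP POST, matching responses first-in-first-out. A real proxy would need request-id matching, notification handling, and CORS headers; all names here are illustrative.

```typescript
import { spawn } from "node:child_process";
import { createServer } from "node:http";

// Split a stdio byte stream into newline-delimited JSON-RPC messages,
// returning the parsed complete messages plus any trailing partial line.
export function splitJsonRpcLines(buffer: string): { messages: unknown[]; rest: string } {
  const parts = buffer.split("\n");
  const rest = parts.pop() ?? "";
  const messages = parts
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as unknown);
  return { messages, rest };
}

// Expose a local stdio MCP server over HTTP so a browser-only client can reach it.
// Responses are paired with requests FIFO, which is only safe for serial use.
export function startRelay(command: string, args: string[], port: number) {
  const child = spawn(command, args, { stdio: ["pipe", "pipe", "inherit"] });
  const pending: Array<(msg: unknown) => void> = [];
  let buf = "";

  child.stdout?.on("data", (chunk: Buffer) => {
    buf += chunk.toString();
    const { messages, rest } = splitJsonRpcLines(buf);
    buf = rest; // keep the incomplete tail for the next chunk
    for (const msg of messages) pending.shift()?.(msg);
  });

  return createServer((req, res) => {
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", () => {
      pending.push((msg) => {
        res.setHeader("Content-Type", "application/json");
        res.end(JSON.stringify(msg));
      });
      child.stdin?.write(body.trim() + "\n"); // forward the JSON-RPC request
    });
  }).listen(port);
}
```

The buffering helper is the only subtle part: stdio chunks do not align with message boundaries, so partial lines must be carried over between `data` events.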
force-pushed from 1c7048d to b11b32e
Since this works, all that remains is to refactor the CoT with the new pre-rendering format (client-specific tags `<<< ... >>>`) to have complete control over the context and to send the "interleaving" back to the server during an agentic loop. This will answer several questions from llama.cpp users and developers about powerful models like Minimax. And for Qwen, it will finally provide visibility into the CoT during the agentic loop!
Testing interleaved reasoning blocks and tool calls. I need to remove the "Filter reasoning after first turn" option, which is now useless. InterleavedThinkingBlockAndImage.mp4
Obviously, with the refactoring of the CoT display, all the reasoning is currently sent back to the model with our proprietary UI tags included! A simple strip that reuses the regular expressions before sending to the API will restore the previous behavior. Later, we'll need an option to choose whether to send this reasoning back to the backend, preserving the actual location of each reasoning block!

For the MCP tool responses, the model has access to everything during the agentic loop, but once a new loop starts, the previous loops are "compressed" to the last N lines set by the option, just like the display. This invalidates the token cache, so the backend has to rebuild the last modified agentic loop. It optimizes the context and doesn't degrade agentic performance, because the LLM is supposed to have already performed its synthesis.
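The stripping and compression described above could look like this minimal TypeScript sketch. The `<<<label>>> ... <<<end>>>` tag grammar and all function names are assumptions for illustration, not the PR's actual implementation:

```typescript
// Strip only the UI markers, keeping the reasoning text in place
// (reasoning is still sent back, at its original location).
export function stripMarkers(content: string): string {
  return content.replace(/<<<[^>]*>>>/g, "");
}

// Drop entire tagged blocks, restoring the old "no reasoning in context"
// behavior; assumes blocks are delimited as <<<label>>> ... <<<end>>>.
export function stripTaggedBlocks(content: string, label: string): string {
  const blockRe = new RegExp(`<<<${label}>>>[\\s\\S]*?<<<end>>>`, "g");
  return content.replace(blockRe, "");
}

// Compress a previous loop's tool response to its last n lines,
// mirroring what the display already does.
export function compressToolResponse(text: string, n: number): string {
  const lines = text.split("\n");
  return lines.length <= n ? text : lines.slice(-n).join("\n");
}
```

Whether to call `stripMarkers` or `stripTaggedBlocks` before hitting the API is exactly the user-facing option mentioned above.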
A little fun with image generation through MCP (a server separate from the LLM server, dedicated to image generation; nothing prevents running both 2-in-1 if there is enough VRAM for both inference instances). MCP-Image-Gen2.mp4
The CI fails because we simplified the code path so that the modality type is detected post-upload (better) instead of pre-upload (filepicker, too limiting and incompatible). The Storybook tests still check the old UX where the Images/Audio buttons were disabled based on model modalities, but that logic was removed from ChatForm in the refactor. Now the filepicker accepts everything and validation happens client-side after upload, with a text fallback, so the tests are looking for DOM elements that no longer exist. The modality props became orphaned: ChatFormActions still receives them, but ChatForm no longer computes or passes them. We need to drop these obsolete UI tests and keep the actual modality validation logic in unit tests, where it belongs.
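The post-upload validation with text fallback could be sketched as below. This is a hypothetical TypeScript outline under assumed type and helper names, not the PR's actual code:

```typescript
type Modality = "text" | "vision" | "audio";

interface UploadedFile {
  name: string;
  mime: string;
  textContent?: string; // present when the file's text could be extracted
}

// Decide after upload whether a file is usable with the current model's
// modalities; fall back to text when the file carries extractable text.
export function validateUpload(
  file: UploadedFile,
  supported: Modality[],
): { accepted: boolean; asText: boolean } {
  const needed: Modality = file.mime.startsWith("image/")
    ? "vision"
    : file.mime.startsWith("audio/")
      ? "audio"
      : "text";
  if (supported.includes(needed)) return { accepted: true, asText: false };
  // Text fallback: keep the file as plain text if we extracted any.
  if (file.textContent !== undefined) return { accepted: true, asText: true };
  return { accepted: false, asText: false };
}
```

Because this is a pure function over the uploaded file and the model's capabilities, it belongs in unit tests rather than Storybook DOM assertions, which is the point made above.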
I've certainly wondered about this scenario. There are cases where I can see chaining vision and non-vision models being useful, such as having a vision model do OCR and then a non-vision model follow up. I'd love an omni-model that's on par, but in my tests even the recent large Qwen vision models still lag behind their non-vision counterparts in coding and logic. That said, the "transformation" is incomplete and prompt-based, isn't it? I.e., describing or OCR'ing the image rather than actually converting the image into non-vision text tokens? I assume it was grayed out because the non-vision model wouldn't be able to 'see' the image or make sense of the tokens. Personally, I'd totally support a PR to change it regardless.
As a precaution, I simply fixed the problem. But since the question was raised, I ran a simple test: I sent an image and asked the model NOT to respond; it accepted. After several exchanges, I requested a very detailed description of the image, and it provided one without error!
Actually, it's quite simple: the image remains attached in the client-side prompt, and of course it's sent again with the next request. A non-VL model would throw an error, so the feature is simply a safety measure. Alternatively, you could filter out the image and continue in text mode using the user-requested description instead, though that is less precise than having a VL model process the actual image. In fact, it would reduce code complexity and give users more freedom. You'd just need a small notification that the image and its full description are no longer part of the context.
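The "filter out the image, keep its description" fallback could look like this. A minimal sketch with assumed message-part types; the real webui message schema will differ:

```typescript
interface Part {
  type: "text" | "image";
  text?: string;        // for text parts
  description?: string; // a previously generated description of an image part
}

// Before resending history to a non-VL model, replace image attachments with
// their stored descriptions and report how many were dropped, so the UI can
// show a small notification to the user.
export function toTextOnly(parts: Part[]): { parts: Part[]; dropped: number } {
  let dropped = 0;
  const out = parts.map((p) => {
    if (p.type !== "image") return p;
    dropped++;
    return { type: "text" as const, text: p.description ?? "[image omitted]" };
  });
  return { parts: out, dropped };
}
```

The `dropped` count is what would drive the notification mentioned above; the `"[image omitted]"` placeholder is an assumption for images that never received a description.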
force-pushed from 3d89eaa to f32a4f2
Note
Demos and description are WIP
Video demos
Adding a new MCP Server and using it within an Agentic Loop
demo1.mp4
Using MCP Prompts
demo2.mp4
Using MCP Resources
demo3.mp4
Image Generation and Web Search using different MCP servers
demo4.mp4
New features
UI Improvements
Architecture refactors/improvements:
`.service.ts` format