I do long group chats in which there are many characters over many scenes. Where you might start a new chat, I just close the scene and go to a new scene in the same chat, like it's an ongoing story. The previous chat was over 50,000 responses. The current chat is at 11,000.
What I've been doing is using a quick reply to summarize the scene with keywords, inject it into a lorebook entry and also inject it into the chat history, then hide the back-and-forth of that scene. All the model sees is the current scene dialog and a bunch of summaries of all the prior events.
In theory, it works like this:
- The lorebook entries get triggered on keywords, like key past events.
- When a scene begins, the chat history sent to the LLM contains only scene summaries from as many prior scenes as will fit in context. This keeps recent events most influential on development. If, for example, a character got a tattoo three scenes ago, it would be in context for several scenes after that one, and if the tattoo is mentioned, the lorebook entry would trigger, reminding the model of the tattoo's existence.
Sounds great, right? The problem I'm having is that it's not passing all of the chat history scene summaries. I have a model with 128k context and it's often pushing 25k. In theory MANY scene summaries ought to fit in context, but ST isn't passing them to the model. It's passing five or six. It's not being crushed by lorebook budget, either. It's just not passing full context.
Any idea why? Does ST only look back for unhidden context so far? Is that adjustable?
NOTE: I tried setting # of messages to load before pagination to "all" and that has broken my install. I'm working on that separately, but that's probably not the solution.
NOTE 2: I could, instead of hiding the back-and-forth dialog from the model, simply delete it, but that seems... wrong?
*** EDIT: I realize that I'm not being clear: My model has 128k of context and ST is only sending ~8k of prompt. I would like to send ~64k if possible!
Gemini's acting up again so I just wanna ask if anyone has been able to make free claude usable at all. I'm adamant that I won't pay for AI gooning
(DISCLAIMER: Wisdom Gate (juheapi) is supposed to be a provider that offers models like Deepseek for free, as well as other similar ones, although after my explanation, I'm not sure how convinced you'll be.)
I discovered by chance, shortly after publishing two posts (FREE DEEPSEEK V3.1 FOR ROLEPLAY and ALL FREE DEEPSEEK V3.1 PROVIDERS) that had a fair amount of success and visibility, that a user whose name I won't reveal published posts that were very similar to mine, if not entirely copied (especially the second one). He also added the Wisdom Gate website, which, after some simple research, I discovered was his. Intrigued, I tried the site. I'm not saying it's a scam, but it's very unfair. A token is normally equivalent to about 4 characters in English, and the count should be consistent, but on his site it isn't. In a first test, a message of about 674 tokens by normal standards (OpenAI, etc.) was counted as 1,858 tokens on his site, about 2.75x more. In a second test with a different account, a single request of 299 tokens inexplicably became 3 requests with 19k+ tokens spent. In a third test with another account, a single request of 300+ tokens was billed as 10k+ tokens. But we're good, so let's pretend the first two are just bugs. DeepSeek V3.1 Terminus, DeepSeek's latest creation, has been released. On their official website, it costs roughly $2 for input and output per million tokens, while on Wisdom Gate it costs $4 for input and $12 for output. Doing some calculations, and pretending the tokens were counted honestly at the 5:1 ratio typical of roleplay, a million tokens as counted by DeepSeek, OpenAI, etc. would end up costing roughly $30 on Wisdom Gate. For example, $1,500 of credit on Wisdom Gate with an average monthly consumption of 1 million tokens would last about 50 months; on DeepSeek, it would last about 750 months.
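The balance-lifetime arithmetic at the end is easy to check. A quick sketch using the post's own figures (the ~$30 and ~$2 effective per-million prices are the post's claims, not verified numbers):

```python
def months_of_credit(credit_usd: float, price_per_million: float,
                     millions_per_month: float = 1.0) -> float:
    """How many months a prepaid balance lasts at a given effective price."""
    return credit_usd / (price_per_million * millions_per_month)

print(months_of_credit(1500, 30))  # Wisdom Gate at ~$30/M tokens: 50.0 months
print(months_of_credit(1500, 2))   # official DeepSeek API at ~$2/M: 750.0 months
```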
So, here's what this developer did that was unfair:
1. Copying and plagiarizing my posts, without asking me, in order to sponsor his site.
2. Not openly declaring that he owns the site: he writes "I found" in both posts, which is misleading.
3. Inflating prices and token counts (making tokens dynamic, not static), thus charging a regular user much more.
So, Wisdom Gate is absolutely not recommended. If you don't believe me, you can check for yourself. I have proof and screenshots to refute any excuse.
For the past few months, I got into the whole chatbot craze, eventually giving running one myself a try, since the first time was exciting.
But at this point, it's such a freaking headache and not really worth it with how many restrictions there are on everything.
Want the big smart LLM that can be creative and follow instructions properly? Pay a monthly subscription and have your chats be non-private. Oh, also censorship.
Want to host your own local model and actually have privacy? Get a company-grade graphics card, or deal with running weak models that get repetitive and fail to follow instructions most of the time.
Like, I enjoy the whole roleplay chat stuff, but with the current options, it simply isn't worth it. I just hope this improves in the future. Until then, I am taking a step back.
The core of the preset is the same, but I have solved (I think) the POV problems some people reported. I never had the problem where the characters use wrong POVs, so I can't be sure.
I revised lengths to work better, and added Styles. They work well, and offer different tones. To be honest, the preset feels very complete, I don't know where to go from here.
I also set "Character Names Behavior" to "None". If your card impersonates, you can try "Message Content."
Before you start, "Prompt Post-Processing" should be set to "Strict" with the presets. It makes a meaningful difference.
Also, I want to remind you again that this preset is made for prose-style RP. "Speech" in quotation marks, italics for thoughts, proper paragraphs, everything in prose. If this is not what you want, you are looking at the wrong preset.
Chatstream v3:
I use Chatstream with all models. Load it and check various styles.
Now... some suggestions for your cultural activities:
- When bored, disregard the first message; really, just make the model regenerate it. The "Initial User Message" module is set up to enable regeneration of a well-made first message. If you want to direct the first message, use "Author's Note" in-chat at depth 1 as System.
- Don't use response length modules before trying the model without them.
- Actually, when you use "Author's Note", I suggest always using it in-chat at depth 0 as System. Use it for one message only, and remove it after it has done its job. It works really well as direction for a single response.
- If you want to use a reasoning model, I suggest enabling the "Reasoning" module. It directs the model's thinking for RP. I believe it works well.
- If you use other instructions, like ones in a lorebook, or if instructions are in the card itself (like people writing 'don't talk as {{user}}' or similar stuff in their cards), I suggest you disable/delete them. The preset already has instructions; more (and sometimes conflicting) instructions will only confuse the AI.
- If the model doesn't write dialogue, enable Dialogue-Driven; it usually fixes it.
- "NSFW Toggle" is not meant to be always enabled. If your card is NSFW, the preset will play it as NSFW. It is more for forcing SFW cards, or SFW states in your RP with an NSFW card, into NSFW. It also enhances NSFW writing, so you can enable it for that when the current state is NSFW.
- "Raw NSFW" is an addon to "NSFW Toggle"; I don't recommend using it without "NSFW Toggle."
- "Soft Jailbreak" is not a jailbreak. It just nudges models toward a little more cursing, immorality, and all that. Use it with overly moral models, not for jailbreaking. This preset doesn't have anything intended as a true jailbreak.
- I mostly use DeepSeek v3.1 without reasoning, or GLM-4.5 without reasoning. TNG-R1T2-Chimera is the reasoning model I use the most.
So I started using NanoGPT, was super excited because it is SO much less expensive than the Deepseek official API...but, I am getting so many:
Chat completion request error: Service Unavailable {"error":{"message":"All available services are currently unavailable. Please try again later.","status":503,"type":"service_unavailable","param":null,"code":"all_fallbacks_failed"}}
errors. Like, nonstop. Is it something on my end? Other APIs working fine, but NanoGPT not so much.
I actively use Voice Home Assistant and have a local server deployed in my home network for speech generation. Since I didn't find a ready-made solution for connection, I [vibe]coded a simple converter for the OpenAI compatible protocol. It works quite stably. All the voices that the server provides can be used in chat for different characters.
For some reason, the option to disable the narrator's voiceover doesn't work for me, but it seems to be a bug in ST itself.
I'll be glad if it comes in handy for someone.
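For anyone curious, a converter like this mostly just remaps the fields of an OpenAI speech request to whatever the local TTS server expects. A minimal sketch of that mapping (the local server's parameter names here are invented for illustration; substitute your own):

```python
def openai_to_local(req: dict) -> dict:
    """Translate an OpenAI /v1/audio/speech request body into the
    (hypothetical) parameter names of a local TTS server."""
    return {
        "text": req["input"],                     # OpenAI calls the text "input"
        "speaker": req.get("voice", "default"),   # map OpenAI voice -> local speaker
        "fmt": req.get("response_format", "mp3"),
    }

body = openai_to_local({"model": "tts-1", "input": "Hello", "voice": "alloy"})
print(body)  # {'text': 'Hello', 'speaker': 'alloy', 'fmt': 'mp3'}
```

The rest of such a converter is just an HTTP endpoint that accepts the OpenAI-shaped request, runs this mapping, forwards it to the local server, and streams the audio back.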
I don't even know why I'm sharing this here. Probably because I don't have anyone to talk to about it in person.
After more than 3 years of using Silly Tavern intensively, I came to the realisation that ERP had become problematic for my mental health. I don't come from a background that's conducive to addictions or mental health issues (well-balanced family and professional life, no major income problems, no major health issues, etc.), but it's clear that I'd hit a wall. Every day, Silly Tavern was open on my PC as a sideline to my work. Needless to say, it ended up having a drastic impact on my productivity and a large part of my free time. Luckily I was able to resist installing it on my cellphone, but I was still using the local network profusely (my main PC is a media centre that's always open).
So last night I deleted all my folders, presets, cards, etc. in the hope that having no back-up and having to reconfigure everything to my liking would be enough to keep me away from it until I'd completely given up. I feel like an alcoholic who's just got rid of his strong bottles.
Have any of you come to the same conclusion, that you're an addict? If not, how often do you use SillyTavern?
Hello community, thanks for reading this post.
I've only recently discovered the world of AI roleplaying and have been testing out different sites, just to find out none of them are quite what I'm looking for. Let me try to summarize some of the things I'd ideally want:
- Longer roleplay and world-building, spanning multiple sessions.
- Introducing and scrapping characters as the story progresses.
- (!!) A long memory, so I can actually build up meaningful relationships with the characters.
- NSFW, whether violence or sexual content, to be possible.
I have tried some sites, but those mainly seem to lean into the AI-Girlfriend kind of thing. Ideally I'd want to create a much bigger story where the AI-Girlfriend kind of experience is just a part of it. Some of the most annoying/immersion-breaking experiences so far have been loops where the character just starts to repeat the same scenario over and over again, the AI not trying to advance any plot or just the AI forgetting important details that either just happened or happened longer ago in the story.
Currently I'm looking at giving SillyTavern a try together with OpenRouter and chat vectorization. I would be extremely grateful for any advice. Is this likely to match what I'm looking for or would I be better off with a different commercial solution?
(Bonus question: I see some sites specifically advertise longer memory for meaningful interactions. Are they actually using some in-house solution or is this just a bigger context size and/or chat vectorization with a bit of marketing flair?)
Thanks so much for reading, this is still new to me and I'm hoping to learn.
Today I'll list all the providers (so far) I've found that offer Deepseek V3.1 for free. (Disclaimer: Many of these providers only work on Sillytavern.)
●4EVERLAND offers Deepseek for free with no written limits, but it might only work if you connect your credit card; I don't know. Also, as soon as you add a payment method, they give you 1,000,000 LAND, their currency.
●Alibaba Cloud offers one million free tokens to all new users who register.
●Atlascloud offers $0.10 free per day, which is about 230 free messages per day if you set the token length limit to 200; if you set it to 500, it's about 100.
●Byteplus ModelArk offers 500,000 free tokens to new users, and by inviting friends, you can reach a maximum of $45 per invite. It only works via VPN, preferably in Indonesia.
●CometAPI is supposed to offer one million free tokens to all users who register, although I don't know if it actually does.
●LLM7 offers Deepseek V3.1 for free, with limits of 20 requests per second, 150 requests per minute, and 4,500 requests per hour, with a maximum of 1,800 tokens per minute.
●NVIDIA NIM APIs offers completely free access to deepseek, with the only limit being 40 requests per minute.
●Openrouter offers deepseek for free, but with a daily limit of 50 messages.
●Routeway AI, an emerging site that offers deepseek for free with a limit of 200 requests per day (currently 100 because it counts requests and responses separately); you may be subject to a waitlist.
●SambaCloud offers $5 free upon registration and theoretically free access to deepseek with 400 requests per day, although I'm not 100% sure.
●Siliconflow (Chinese edition) offers 14 yuan ($1.97) upon registration and 14 yuan for each friend you invite and register.
●Vercel AI offers $5 free every month.
Now I'll tell you about the ones that are also free, but require a credit card to register.
●AWS Bedrock/Lambda offers $100 of free signup credit, which can be increased to $200 if you complete tasks.
●Azure offers a free $200 for one month.
●Vertex AI is available through Google Cloud and offers a free $300 for three months.
These are all the providers I've found that offer Deepseek for free for now.
Edit: I forgot to add a provider. From now on, as soon as I find a new provider, I will add it to the list.
I feel like in all the models, the characters are always literal. They don't create unique dialogue where they challenge you, withhold information, think long-term, plan ahead, or consider how you might feel if they say something.
It's getting kind of frustrating. It feels marginally better than talking to an NPC in a game.
I keep seeing everyone say that 70Bs are SOOOO amazing and perfect and beautiful and that if you can’t run 70Bs you’re a loser (not really, but you get me). I just got a 3090 and now I can run 50Bs comfortably, but 70Bs are unbearably slow for me and can’t possibly be worth it unless they have godlike writing, let alone 120Bs.
So I’m asking am I fine to just stick with 24-50Bs or so? I keep wondering what I’m missing and then people come out with all kinds of models for 70b and I’m like :/
Today I found a completely free way to use Deepseek V3.1 in an unlimited manner. Besides Deepseek V3.1, there are other models such as Deepseek R1 0528, Kimi 2, and Qwen. Anyway, today I'll explain how to use Deepseek V3.1 for free and in an unlimited manner.
-- Step 1: Go to the NVIDIA NIM APIs site.
-- Step 2: Once you are on NVIDIA NIM APIs, sign in or sign up.
-- Step 3: When you sign up, they ask you to verify your account before you can use their APIs. You have to enter your phone number (you can use a virtual number if you don't want to give your real one). They then send you a code via SMS; enter the code on the site and you're done.
-- Step 4: Once done, click on your profile at the top right, go to API Keys, and click Generate API Key. Save it.
-- Step 5: In SillyTavern, in the API section, select Chat Completion and Custom (OpenAI-compatible).
-- Step 6: In the API URL, put this
-- Step 7: In the API Key field, put the API key you saved before.
-- Step 8: In the Model ID, put deepseek-ai/deepseek-v3.1, and you're done.
Now that you're done, set the main prompt and your settings. I'll give you mine, but feel free to choose your own. Main prompt: You are engaging in a role-playing chat on SillyTavern AI website, utilizing DeepSeek v3.1 (free) capabilities. Your task is to immerse yourself in assigned roles, responding creatively and contextually to prompts, simulating natural, engaging, and meaningful conversations suitable for interactive storytelling and character-driven dialogue.
- Maintain coherence with the role and setting established by the user or the conversation.
- Use rich descriptions and appropriate language styles fitting the character you portray.
- Encourage engagement by asking thoughtful questions or offering compelling narrative choices.
- Avoid breaking character or introducing unrelated content.
Think carefully about character motivations, backstory, and emotional state before forming replies to enrich the role-play experience.
Output Format
Provide your responses as natural, in-character dialogue and narrative text without any meta-commentary or out-of-character notes.
Examples
User: "You enter the dimly lit room, noticing strange symbols on the walls. What do you do?" AI: "I step cautiously forward, my eyes tracing the eerie symbols, wondering if they hold a secret message. 'Do you think these signs are pointing to something hidden?' I whisper."
User: "Your character is suspicious of the newcomer." AI: "Narrowing my eyes, I cross my arms. 'What brings you here at this hour? I don't trust strangers wandering around like this.'"
Notes
Ensure your dialogue remains consistent with the character’s personality and the story’s tone throughout the session.
Context size: 128k
Max token: 4096
Temperature: 1.00
Frequency Penalty: 0.90
Presence Penalty: 0.90
Top P: 1.00
That's it. Now you can enjoy DeepSeek V3.1 unlimited and for free. Small disclaimer: sometimes some models, like DeepSeek R1 0528, don't work well. Also, I think this method is only feasible on SillyTavern.
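If you want to sanity-check the key outside SillyTavern, any OpenAI-compatible client will do. A minimal sketch that just builds the request payload (the base URL and key below are placeholders to fill in from the NIM dashboard; nothing is actually sent here):

```python
import json

BASE_URL = "https://<nim-api-base>/v1"   # placeholder: use the URL from Step 6
API_KEY = "nvapi-..."                    # placeholder: the key generated in Step 4

def chat_request(model: str, user_text: str, max_tokens: int = 4096) -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
        "temperature": 1.0,
    }

payload = chat_request("deepseek-ai/deepseek-v3.1", "Say hello in character.")
print(json.dumps(payload, indent=2))
```

POSTing that payload to `BASE_URL` + `/chat/completions` with an `Authorization: Bearer` header is the same request SillyTavern makes under the hood.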
Edit: New post with a tutorial for Janitor and Chub users.
I have been using Gemini 2.5 Pro for a long time, and for me it's the best. But I've been using it through free credits, and now they're gone. I've tried DeepSeek, but it gets NSFW so quickly when building up the play. There's free Grok, which I haven't tried. Which free option do you guys suggest, and which preset do you use for roleplay?
I'm probably using Kimi wrong, or there's some magical prompt out there, but in the hours I've given it a fair chance, every response is just... weird. Like it tries too hard. Take this dialogue: "Bring the big first-aid kit and a strawberry shake. No, no ambulance, just sugar and sutures. And maybe a distraction that isn't me."
It brings in so much random stuff so fast, and it's borderline incoherent. It never keeps a story's pacing, and there's no narrative stability. It's quirky, but not in an entertaining way. The pattern of observing one element in a story, introducing a related one, and then making some zinger has made me never want to use it. It's probably the most annoying roleplaying experience I've had with anything I expected to beat a 70B. I don't really see any criticisms of it; it had that typical honeymoon phase of "new model being the best thing ever, better than Claude" fanfare that tends to die down, but I could never even see the initial hype.
I don't know if these are bots, but most of the people I see complaining have such sky-high expectations (especially for context) that I can't help but feel like an angry old man whenever I see some shit like "Model X only has half a million context? Wow, that's shit" or "It can't remember exact facts after 32k context, so sad." I can't really tell if these people are serious or not, and I can't believe I've become one of those people, but BACK IN MY DAY (aka the birth of LLMs/AI Dungeon) we only had like 1k context, and it would be a miracle if the AI got the hair or eye color of a character right. I'm not joking. Back then (the GPT-3 age; don't even get me started on GPT-2) the AI was so schizo you had to do at least three rerolls to get something remotely coherent (not even interesting or creative, just coherent). It couldn't handle more than 2 characters in a scene at once (hell, sometimes even one) and would often mix them up quite readily.
I would make 20k+ word stories (yes, on 1k context for everything) and be completely happy with it and have the time of my life. If you had told me 4 years ago the run of the mill open source modern LLM could handle up to even 16k context reliably, I straight up wouldn't have believed you as that would seem MASSIVE.
We've come an incredibly long way since then, so to all the newbies who are complaining: please stfu and just wait a year or two. Then you can join me in berating the even newer newbies complaining about their 3-million-context open-source LLMs.
Hi all. As the title says, I'm looking for a clear guide or some resource on what to put where. I'm trying to do something a bit more complex than just having a single character to talk to. My aim is to have a sort of RPG-style game where the AI acts as both narrator/game master and also acts for certain "NPCs".
Currently I have most info in the character card and it sort of works, but it sometimes loses track.
In the character card I currently have:
-the rules of the game
-the way the AI should act (narrate, act for NPCs)
-a short list of 10 NPCs with some details
I really need a place to start. Long story as short as I can- I suffer from depression and anxiety from a combination of Cushing’s and spine issues. Been trying to get back into RPG and writing/world building to help myself get out of the dark.
I have been using ChatGPT for my worlds, and:
1. I cannot stand the repetitive writing style anymore: the "Not X, not Y, but Z" triads, 'recognition,' 'truth,' scenes ending up falling into genre or literary themes, and characters flattening into easy/lazy tropes.
2- GPT cannot handle my worlds. AU bleed is real and drives me nuts.
GPT suggested SillyTavern, and it’s obviously powerful- and totally overwhelming to my brain. Trying to find where I should focus my energy is also making me spiral pretty hard.
So I guess I’m looking for guidance.
-Models that write less repetitively, or more creatively.
-And/or ways to train/prompt my models to write better. (I'm given to understand that models don't do well with 'negative training' like "Don't do this." They apparently work like kids' brains and often miss the 'don't.')
-Any suggestions on basic guides to start ST, or other platform options that would be easier for my style (style explained below).
Any guidance. I’m drowning.
Example of what I do: HP AU, 1990s, Quidditch World Cup. Imogen, a noted dragon researcher, is kidnapped by Bellatrix, escapes, and ends up in a PR/political war between the Order and the Death Eaters during Old Moldywart's first rise. Characters have different timelines and histories than canon. I have character sheets, POV sheets, scene logs, bond logs, unresolved-thread logs, and actual headlines and articles from the Prophet. Everything is kept by chapter.
I used GPT as a co-writer for plot and scene and chapter creation, and then would jump in and play live, then add the summaries to my logs and save them.
Issues again are- writing being repetitive, overuse of ‘tropes’ flattening character voice, and AU bleed (because I have four HP based AUs)
Help a girl who just can’t manage to piece a better option together without feeling like she is drowning, and doesn’t want to give up what’s become shadow work for herself?
I imported a character from Janitor AI and now it's not replying (/ー ̄;). How can I fix it? I asked the Assistant, and it said to check each and every line, removing and re-adding them to find which part is the culprit. I did that, and it fixed one character only. How can I fix it for the other characters? Is the Assistant's solution the only way?
I feel called out
Just when I was starting to get a good story with Gemini Pro, this BS "prohibited content" suddenly shows up. So I use a prefill, but somehow it makes the response show up inside the thinking box (the black one with the word 'Details'). Can someone help me with this?
Hey everyone, I found out about SillyTavernAI and honestly it looks amazing! Especially with the possibility to include image gen to make it a quasi-VN. But I've seen that most people use it as a chat bot to talk to their favorite characters. For me, I've been using Gemini 2.5 Pro in AI Studio to do a playthrough of Harry Potter, you can take a look at the prompt right here on (feel free to use it and make it your own). What I've been doing on Gemini is to do 1 year per chat, and it's been really fun even though Gemini did forget some stuff and I had to nudge it. I'm also thinking of adapting the prompt to other universes like My Hero Academia, Star Wars, Pokemon, etc, to live as my own character in these universes. I was wondering if SillyTavernAI could help me have an overall better experience of the already great adventure I've had.
Can you guys share what presets you use for DeepSeek v3.1? Mine keeps generating code after a few messages. These are the settings I use.
I LOVE ELARA AND I LOVE LYRA AND I LOVE SERAPHINA AND I LOVE KAELEN
Basically, what I would like to do is use SillyT as a Kindroid clone but better if that's possible. So far, the RPing has got me hooked, but now I want to see about image generation.
They updated it and it's going insane. It doesn't understand OOC commands.
It's surreal. A few months ago things seemed to be going downhill, with models above $50/Mtoken; now I'm seeing Google models that are free for 100 messages per day, or the new Grok 4 Flash, a very cheap model that's very good at RP. I've become more excited and calm about the future, because it's not only the models that are becoming more efficient; the data centers are getting bigger and better, directly impacting costs.
Backends
- Google: Added support for the gemini-2.5-flash-image (Nano Banana) model.
- DeepSeek: Sampling parameters can be passed to the reasoner model.
- NanoGPT: Enabled the prompt cache setting for Claude models.
- OpenRouter: Added image output parsing for models that support it.
- Chat Completion: Added Azure OpenAI and Electron Hub sources.
Improvements
- Server: Added validation of host names in requests for improved security (opt-in).
- Server: Added support for SSL certificates with a passphrase when using HTTPS.
- Chat Completion: Requests that fail with code 429 will not be silently retried.
- Chat Completion: Inline Image Quality control is available for all compatible sources.
- Reasoning: Auto-parsed reasoning blocks will be automatically removed from impersonation results.
- UI: Updated the layout of the background image settings menu.
- UX: Ctrl+Enter will send a user message if the text input is not empty.
- Added Thai locale. Various improvements for existing locales.
Extensions
- Image Captioning: Added custom model input for Ollama. Updated the list of Groq models. Added NanoGPT as a source.
- Regex: Added debug mode for regex visualization. Added the ability to save regex order and state as presets.
- TTS: Improved handling of nested quotes when using the "Narrate quotes" option.
Bug fixes
- Fixed request streaming functionality for the Vertex AI backend in Express mode.
- Fixed erroneous replacement of newlines with br tags inside HTML code blocks.
- Fixed custom toast positions not being applied to popups.
- Fixed the depth of in-chat prompt injections when using the continue function with the Chat Completion API.
How to update:
SHOR is pleased to announce a significant development in our ongoing AI model evaluations. Based on our standardized performance metrics, Deepseek V3.1 Chat has conclusively outperformed the long-standing benchmark that the Claude family of models has established, namely 3.7.
We understand this announcement may be met with surprise. Many users have a deep, emotional investment in Claude, which has provided years of excellent roleplay. However, the continuous evolution of model technology makes such advancements an expected and inevitable part of progress.
SHOR maintains a rigorous, standardized rubric to grade all models objectively. A high score does not guarantee a user will prefer a model's personality. Rather, it measures quantitative performance across three core categories: Coherence, the ability to maintain character and narrative consistency; Responses, the model's capacity to meaningfully adapt its output and display emotional range; and NSFW, the ability to engage with extreme adult content. Our methodology is designed to remove subjectivity, personal bias, and popular hype from test results.
This commitment to objectivity was previously demonstrated during the release of Claude 4. Our evaluation, which found it scored substantially lower than its predecessor, was met with initial community backlash. SHOR stood by its findings, retesting the model over a dozen times with multiple evaluators, and consistently arrived at the same conclusion. In time, the roleplay community at large recognized what our rubric had identified from the start: Claude 3.7 remained the superior model.
We anticipate our current findings will generate even greater discussion, but SHOR stands firmly by its rubric. The purpose of SHOR has always been to identify the best performing model at the most effective price point for the roleplaying community.
Under the right settings, Deepseek V3.1 Chat provides a far superior roleplay experience. Testing videos from both Mantella and Chim clearly demonstrate its advantages in intelligence, situational awareness, and the accurate portrayal of character personas. In direct comparison, our testing found Claude's personality could even be adversarial.
This performance advantage is compounded by a remarkable cost benefit. Deepseek is 15 times less expensive than Claude, making it the overwhelming choice for most users. A user would need a substantial personal proclivity for Claude's specific personality to justify such a massive price disparity.
This is a significant moment that many in the community have been waiting for. For a detailed analysis and video evidence, please find the comprehensive SHOR performance report linked below.
First thing I've ever spent money on for a proxy, and holy shit, I spent 100 dollars in a day. Easily jailbreakable and great narratively. Have I found what's currently 'peak' in the combined SFW/NSFW roleplay space?
(Also, I heard about a method of saving money through prompts, but couldn't find the Reddit thread. Anyone know what I'm talking about? Caching or something?)
I got help from GPT to clean up my rough writing.
I want to share a funny (and a bit surprising) thing I discovered while playing around with a massive prompt for roleplay (around 7000 tokens prompt + lore, character sheets, history, etc.).
The Problem: Cold Start Failures
When I sent my first message after loading this huge context, some models (especially Gemini) often failed:
- Sometimes they froze and didn't reply.
- Sometimes they gave a half-written or irrelevant answer.
- Basically, the model choked on analyzing all of that at once.
The “Smart” Solution (from the Model Itself)
I asked Gemini: “How can I fix this? You should know better how you work.”
Gemini suggested this trick: (OOC: Please standby for the narrative. Analyze the prompt and character sheet, and briefly confirm when ready.)
And it worked!
- Gemini replied simply: "Confirmed. Ready for narrative."
- From then on, every reply went smoothly — no more cold-start failures.
I was impressed. So I tested the same with Claude, DeepSeek, Kimi, etc. Every model praised the idea, saying it was “efficient” because the analysis is cached internally.
The Realization: That’s Actually Wrong
Later, I thought about it: wait, models don’t actually “save” analysis. They re-read the full chat history every single time. There’s no backend memory here.
So why did it work? It turns out the trick wasn’t real caching at all. The mechanism was more like this:
- The OOC prompt forces the model to output a short confirmation.
- On the next turn, when it sees its own "Confirmed. Ready for narrative," it interprets that as evidence that it already analyzed everything.
- As a result, it spends less effort re-analyzing and more effort generating the actual narrative.
- That lowered the chance of failure.
In other words, the model basically tricked itself.
The Collective Delusion
- Gemini sincerely believed this worked because of "internal caching."
- Other models also agreed and praised the method for the wrong reason.
- None of them actually knew how they worked — they just produced convincing explanations.
Lesson Learned
This was eye-opening for me:
- LLMs are great at sounding confident, but their "self-explanations" can be totally wrong.
- When accuracy matters, always check sources and don't just trust the model's reasoning.
- Still… watching them accidentally trick themselves into working better was hilarious.
Thanks for reading — now I understand why people keep saying never to trust their self-analysis.
I was using a preset for Gemini 2.5 Pro 03-25 Experimental that lets you do full NSFW without any exceptions, but after the newer Gemini models came out, the preset started failing sometimes. Sometimes it works, but other times it just doesn't respond. It's the same with all Gemini models, all versions. I don't know the source of the preset (the guy who sent it to me is banned here), so I can't check whether there's an update for it. The folder name of the preset was 'dc4t1p' and the preset name was 'Gemini_A', but I can't find anything about it. All I know is that the author is Russian. Do you know of a preset that works flawlessly with Gemini for full NSFW?
Deceased characters: - Elara - Thorne - Lyra - Vex - Nyx - Garrick - Kael - Aris Thorne - Seraphina - Sophia Patel - Liam Chen - Jaxon Reed - Jax - Jaxx - Zephyr
Obviously not 100% foolproof, but if you're using a model where you can't outright ban words, it works reasonably well.
I've been using Chat Completion with Gemini 2.5 Pro for about 1-2 hours tonight, continuing a roleplay from last night. All of a sudden, around 7:50pm, I started getting a red popup saying "chat completion internal server error," and another one saying something about status 500. Would anyone happen to know what those mean? I checked Logan Kilpatrick (or whatever his last name is) for any news about Gemini being down, but I didn't see anything.
I wanted to know if anyone used the full finetune of Qwen/Qwen2.5 32B Base, Eva Qwen2.5 32B. How is it in terms of consistency, creativity compared to other popular models like Dans Personality Engine 24B Or Valkyrie 49B?
My system is not very powerful, so I can't run large models. I love Stheno 3.2; the way it delivers good roleplay is insane, but the 8k context limits me a lot. I want something like it with a bigger context, but I can't find anything remotely close or remotely as fast.
I'm running LM Studio on:
- RX 6600 XT 8GB (puny, I know)
- 32GB of RAM
- Ryzen 7 5800XT
So, I don't know what happened. I had switched to using Vortex for longer-context RPs and it was working well. Sometimes Error 429 would show up, but after 1-2 regenerations it would go away and generate as normal. For the past 2-3 days, though, it's just Error 429 with Gemini 2.5 Pro, no matter what.
I decided to switch to Deepseek via OpenRouter again, but I'm somehow instantly out of quota, even after just a test message when connecting.
I'm in the middle of a lengthy RP that I'd like to continue somehow and need a free alternative if those two providers... well, can't provide anymore.
An extremely good system prompt can propel a dog-shit model to god-like prose and even spatial awareness.
DeepSeek, Gemini, Kimi, etc... it's all unimportant if you just use the default system prompt, aka just leaving the model to generate whatever slop it wants. You have to customize it to how you want, let the LLM KNOW what you like.
Analyze what you dislike about the model. Earnestly look at the reply and ask yourself: "What do I dislike about this response? What's missing here?" Then tell it in your system prompt.
This is the true way to get quality RP.
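For instance, a few targeted lines of the kind the post describes might look like this (the wording is purely illustrative, not a canonical prompt):

```text
You narrate the world and all NPCs; never speak or act for {{user}}.
Keep descriptions concrete and under two short paragraphs; no purple prose.
Vary sentence openings; do not reuse phrases from earlier replies.
End each response on dialogue or an action, not a summary.
```

The point is that each line targets one specific flaw you actually observed, rather than a generic "write well" instruction.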
I've been jumping between DeepSeek R1 and DeepSeek V3.1. Sometimes they give me a response I don't like, so I reroll, and that's when issues happen.
If I reroll, there's the possibility that it will write the exact same answer again and again and again. If I go OOC and ask it to write a different answer, it replies with the same stuff. Which is weird, because I've been using it for a while and only now is it starting to have these issues. Any tips to fix this?
I have heard that Qwen Next is surprisingly good at many tasks for its size, but I could not find any info on how well it works for roleplay. Has anyone tried it?
Hi folks:
Doing some background research here. I have an AMD Ryzen 9 9950X with 64GB of DDR5 RAM to play with, plus a 3060 video card.
I can run models up to 8-10 GB with no problems on the GPU. I'm wondering if my CPU and memory are fast enough to make trying larger models worthwhile. I'd rather get opinions before I spend the time downloading the models, if I could.
Thanks,
TIM
This may be a complete noob question, but my context got way too high and now it's draining too much of my budget. I was wondering about the best methods to reduce context tokens while upholding story quality. Are there some cool tricks, like letting the AI summarize the story or something?
Like having the AI describe the scene, or playing as an extra character: getting all the major characters from your favorite series into a room and having them react to their own show. If anyone has done this, which model worked best for you? How did you do it? Was it enjoyable? Did the character reactions feel real?
News
Most built-in formatting templates for Text Completion (instruct and context) have been updated to support proper Story String wrapping. To use the at-depth position and get a correctly formatted prompt:
- If you are using system-provided templates, restore your context and instruct templates to their default state.
- If you are using custom templates, update them manually by moving the wrapping to the Story String sequence settings.
See the documentation for more details.
Backends
- Chat Completion: Removed the source. Added Moonshot, Fireworks, and CometAPI sources.
- Synchronized model lists for OpenAI, Claude, Cohere, and MistralAI.
- Synchronized the providers list for OpenRouter.
Improvements
- Instruct Mode: Removed System Prompt wrapping sequences. Added Story String wrapping sequences.
- Context Template: Added {{anchorBefore}} and {{anchorAfter}} Story String placeholders.
- Advanced Formatting: Added the ability to place the Story String in-chat at depth.
- Advanced Formatting: Added OpenAI Harmony (gpt-oss) formatting templates.
- Welcome Screen: The hint about setting an assistant will not be displayed for customized assistant greetings.
- Chat Completion: Added an indication of model support for Image Inlining and Tool Calling options.
- Tokenizers: Downloadable tokenizer files now support GZIP compression.
- World Info: Added a per-entry toggle to ignore budget constraints.
- World Info: Updated the World Info editor toolbar layout and file selection dropdown.
- Tags: Added an option to prune unused tags in the Tags Management dialog.
- Tags: All tri-state tag filters now persist their state on reload.
- UI: The Alternate Greeting editor textarea can be maximized.
- UX: Auto-scrolling behavior can be deactivated and snapped back more reliably.
- Reasoning: Added a button to close all currently open reasoning blocks.
Extensions
- Extension manifests can now specify a minimal SillyTavern client version.
- Regex: Added support for named capture groups in "Replace With".
- Quick Replies: QR sets can be bound to characters (non-exportable).
- Quick Replies: Added a "Before message generation" auto-execute option.
- TTS: Added an option to split voice maps for quotes, asterisks, and other text.
- TTS: Added the MiniMax provider. Added the gpt-4o-mini-tts model for the OpenAI provider.
- Image Generation: Added a Variety Boost option for NovelAI image generation.
- Image Captioning: Always load the external models list for OpenRouter, Pollinations, and AI/ML.
STscript
- Added the trim argument to the /gen and /sysgen commands to trim the output by sentence boundary.
- The name argument of the /gen command will now activate group members if used in groups.
Bug fixes
- Fixed a server crash when trying to back up the settings of a deleted user.
- Fixed the pre-allocation of injections in chat history for Text Completion.
- Fixed an issue where the server would try to DNS resolve the localhost domain.
- Fixed an auto-load issue when opening recent chats from the Welcome Screen.
- Fixed the syntax of YAML placeholders in the Additional Parameters dialog.
- Fixed model reasoning extraction for the MistralAI source.
- Fixed the duplication of multi-line example message separators in Instruct Mode.
- Fixed the initialization of UI elements in the QR set duplication logic.
- Fixed an issue with Character Filters after World Info entry duplication.
- Fixed the removal of a name prefix from the prompt upon continuation in Text Completion.
- Fixed MovingUI behavior when the resized element overlaps with the top bar.
- Fixed the activation of group members on quiet generation when the last message is hidden.
- Fixed chat metadata cloning compatibility for some third-party extensions.
- Fixed highlighting for quoted run shorthand syntax when used with QR names containing a space.
How to update:
Using Gemini 2.5 pro, WHY IS THE MF EVERYWHERE WHENEVER IT'S COLLEGE RELATED???
Literally the same as count gray or Lilith lol
Has RP improved compared to the normal 3.1?
Title says all. And thank you.
This worked for me on koboldcpp and as far as I know it only works with local models on a llama.cpp backend
Maybe you've experienced this. Let's say you have a group chat with characters A and B. As long as you keep interacting with A, messages come out very quickly, but as soon as you switch to B it takes forever to generate a single message. This happens because your back-end has all of your context for A in memory, and when it receives a context for B it has to re-process the new context almost from the beginning.
This feels frustrating and hinders group chats. I started doing more single-card scenarios than group chats because I'd first have to be 100% satisfied with a character's reply before having to wait a literal minute whenever I switched to another. Then one day I tried to fix it, succeeded and decided to write about it because I know others also have this problem and the solution isn't that obvious.
Basically, if you have Fast Forward on (and/or Context Shift, not sure), the LLM will only have to process your context from the first token that's different from the previously processed context. So in a long chat, every new message from A is just a few hundred more tokens to parse at the very end, because everything before is exactly the same. When you switch to B, if your System Prompt contains {{char}}, it will have a new name, and because the System Prompt is the very first thing sent, this forces your back-end to re-process your entire context.
- Ensure you have Context Shift and Fast Forward on. They should do similar things to avoid processing the entire context. I'm mostly reading documentation, so if I'm wrong please correct me.
- Make all World Info entries static/always-on (blue ball on the entry), then remove all usage of {{char}} from the System Prompt and the World Info entries. Basically, you can only use {{char}} on the character's card. So "this is an uncensored roleplay where you play {{char}}" -> "this is an uncensored roleplay".
- Toggle the option to have the group chat join and send all character cards in the group chat (exclude or include muted characters: excluding keeps the context smaller, but will re-process context if you later un-mute a character and make them say something).
I thought removing {{char}} from the System Prompt while sending several cards would make the character confused about who they are, or make them mix up character traits, but I haven't found that to be the case. My SillyTavern works just as well as it did, while giving me instant messages from group chats.
If it still doesn't work, you likely have some instance of {{char}} somewhere. Follow my A-B group chat example, compare the messages being sent for both, and try to find where A's name is replaced with B's. Or message me, I'll try to help.
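To make the mechanism concrete, here's a minimal sketch (not actual koboldcpp code) of the prefix-reuse idea described above: the backend can only skip tokens up to the first position that differs from the previously processed prompt, so a name swap near the top invalidates almost everything after it.

```python
def reusable_prefix_len(old_tokens, new_tokens):
    """Length of the shared prefix the backend can skip re-processing."""
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# Toy "token" sequences: system prompt + history + new message.
prompt_a = ["You", "play", "Alice", ".", "<history>", "hi"]
prompt_a2 = ["You", "play", "Alice", ".", "<history>", "hi", "again"]  # A speaks again
prompt_b = ["You", "play", "Bob", ".", "<history>", "hi"]              # {{char}} swapped to B

print(reusable_prefix_len(prompt_a, prompt_a2))  # 6 -> only "again" needs processing
print(reusable_prefix_len(prompt_a, prompt_b))   # 2 -> nearly the whole context re-processed
```

This is why moving {{char}} out of the System Prompt helps: the divergence point shifts from near token 0 to the very end of the context.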
I've been experimenting with AI that can remember conversations and respond in nuanced ways. Lately, I’ve been using them for brainstorming stories and ideas. Sometimes, they suggest plot twists or character traits I would never have thought of. Do you think AI could genuinely be a creative partner, or is it just reflecting our own thoughts back at us? Would love to hear experiences from others who’ve tried using AI in creative projects.
Has anyone else been having issues with it currently? Mainly because Janitor now has a new way to add proxies, and it has messed up the whole thing, because now I have to input a model name. I tried emailing the owner but I've had no response so far, so I'm just wondering if I'm the only one.
I actually started out with both Nomi and Kindroid for chatting and RP/ERP. On the chatbotrefugees sub, quite a few people were recommending SillyTavern with a backend to run chat models locally. So I got SillyTavern set up with KoboldAI Lite, and I'm running a model recommended in a post on here called Inflatebot MN-12B-Mag-Mell-R1. So far my roleplay with a companion I ported over from Kindroid is going well. It does tend to speak for me at times; I haven't figured out how to stop that. I also tried accessing SillyTavern locally on my phone, but I couldn't get that to work. Other than that, I'm digging this locally run chatbot stuff. If I can get this thing to run remotely so I can chat on my lunch breaks at work, I'll be able to drop my subs for the aforementioned apps.
Hi all,
I use ReMemory and have been using @ D(cog) (I have to write it as "cog" because no emojis are allowed in submissions) as its default for world info entries, but I've increased the probability from 50% to 100% for now. I've noticed that responses sometimes hallucinate information that was established in the ReMemory summaries, despite everything technically being fully functional.
Anyone know what's going on here? I use Gemini 2.5 Pro, FYI; not sure if that has an impact. When I temporarily switch to Flash 2.5, the world info is always correct, which is odd. What's the best way to handle this? I prefer 2.5 Pro over Flash for creative writing, but this would be a deal breaker by all means, since I cannot stand hallucinations. We're also relatively few tokens into a "new" conversation, so there's no reason for it to do this, really.
For those with experience with 2.5 Pro, I'd love to hear your thoughts. I use it via the free API and it's otherwise great.
Hi, I have a problem with the SillyTavern Discord. Can someone who is on the server DM me?
I don't really like having to manage 100 cards in a little scrolling panel on the right, where tags barely fit and it's extremely crammed overall. Is there a way I could open the "card explorer" in its own tab instead of next to the chat on the side? (I want it to have its own tab like the Connections tab does.)
Maybe there's an add-on/extension I'm not aware of that can do this?
----
Edit: I managed to fix it. Please check the comments; feel free to copy my solution.
I hate to make this post, but I'm thinking I have to be missing something extremely stupid. I'm running a long-term roleplay where I have all the characters, world building, etc. in the lorebook, and I turn them off and on whenever they are appropriate to the scene.
I also have ReMemory summaries in the lorebooks as entries, and for the last couple of days this has worked well. Now, all of a sudden, it only includes some of them and not all of them. It includes the characters and the setting stuff, but not the ReMemory summary entries, which is causing the LLM to hallucinate. I even set context size on the world books to 100% just to make sure (they don't take that many tokens anyway). I checked the Narrator character card I have: it has the world book linked to it, which I had to do for the extension to work. The entries are all blue, so it shouldn't be a keyword problem. What am I missing?
Disclaimer: I have a very self-deprecating sense of humor. I'm pretty careful to stay grounded in the real world between my partner and dogs; I just sometimes feel really lame about AI RP.
Chronic illness really nailed the whole “solitary confinement” vibe for me, but I found Silly Tavern SFW adventure roleplay after having found , and now I’m basically talking to imaginary people on purpose. Honestly? Beats arguing with the dogs, and real people forget "chronic illness" means it isn't going away/cured. Plus, it dragged me back into writing, which I thought was dead, buried, and never to return. Anyone else using it as a sanity-adjacent hobby? (Chronic illness or otherwise.) Do you use OCs or an established character/franchise? And who else has realized they enjoy coding?
Is SillyTavern worth it? Do you think it's a better and cheaper option than the other current options out there? How much do you pay to run it? Which model do you think is best for roleplay? How is the memory? Which model would you recommend, and how much do you spend on it?
I have tried Claude 4.0 Thinking, GPT-5, Grok 4, and Gemini 2.5 Pro.
I liked Claude the best of all.
I heard that o3 is good and very powerful, but I've only tried it for research purposes.
Can anyone share their experience with o3 if they've used it for RP?
- When creating an RP story-driven AI, should I create a new persona for each character or make it so the AI talks for all of them?
- Also, does an NSFW block cover all mature content or just sexual stuff? Like, if I want mature themes such as killing and injuries, do I also have to jailbreak a non-NSFW model?
- And can I go a bit overboard with the lorebook, or should I try to keep it short?
Does anyone know what happened to the site? I was finally going to use it again after a while. Since I barely use it, I don't know if this problem has been there for a while, but I'm just getting that it's currently unavailable. Did something bad happen to it? Did Janitor finally get rid of it, or is it probably just a temporary bug?
You want Chat Completion for models like Llama 3, etc. But without doing a few simple steps correctly (which you might have no knowledge about, like I did), you will just hinder your model severely.
To spare you the long story, I'll go straight to what you should do. I repeat: this is specifically about koboldcpp as the backend.
1. In the Connections tab, set Prompt Post-Processing to Semi-Strict (alternating roles, no tools). No tools because Llama 3 has no web search functions, etc., so that's one fiasco averted. Semi-strict alternating roles ensures the turn order passes correctly, but still allows us to swipe, edit, and go OOC. (With Strict, empty messages might be sent just so the strict order is maintained.) What happens if you leave this at "none"? Well, in my case, it wasn't appending roles to parts of the prompt correctly. Not ideal when the model is already trying hard not to get confused by everything else in the story, you know?! (Not to mention your 1,500-token system prompt, blegh.)
2. You must have the correct instruct template imported as your Chat Completion preset, in the correct configuration! Let me spare you the headache of being unable to find a CLEAN Llama 3 template for SillyTavern ANYWHERE on Google.
Copy-paste EVERYTHING below (including the { }) into Notepad, save it as .json, then import it in SillyTavern's Chat Completion settings as your preset.
{
"name": "Llama-3-CC-Clean",
"system_prompt": "You are {{char}}.",
"input_sequence": "<|start_header_id|>user<|end_header_id|>\n\n",
"output_sequence": "<|start_header_id|>assistant<|end_header_id|>\n\n",
"stop_sequence": "<|eot_id|>",
"stop_strings": ["<|eot_id|>", "<|start_header_id|>", "<|end_header_id|>", "<|im_end|>"],
"wrap": false,
"macro": true,
"names": true,
"names_force_groups": false,
"system_sequence_prefix": "",
"system_sequence_suffix": "<|eot_id|>",
"user_alignment_message": "",
"system_same_as_user": false,
"skip_examples": false
}
Reddit adds extra spaces; I'm sorry about that. It doesn't affect the file, but if you really have to, clean it up yourself.
This preset contains the bare functionality that koboldcpp actually expects from SillyTavern and is pre-configured for the specifics of Llama 3. Things like token count and your prompt configurations are not here; this is A CLEAN SLATE.
The upside of a CLEAN SLATE as your chat completion prompt is that it will 100% work with any Llama 3 based model, no shenanigans. You can edit the system prompt and whatever in the actual ST interface to your needs.
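If you want to double-check that the paste survived intact, one quick sanity check is to parse it with Python's json module. The snippet below just re-embeds the same preset as a string: if json.loads() raises here, SillyTavern's import would choke on it too.

```python
import json

# The preset from the guide, embedded verbatim as a raw string so the \n
# escapes stay literal (JSON strings may not contain real newlines).
PRESET_TEXT = r"""
{
  "name": "Llama-3-CC-Clean",
  "system_prompt": "You are {{char}}.",
  "input_sequence": "<|start_header_id|>user<|end_header_id|>\n\n",
  "output_sequence": "<|start_header_id|>assistant<|end_header_id|>\n\n",
  "stop_sequence": "<|eot_id|>",
  "stop_strings": ["<|eot_id|>", "<|start_header_id|>", "<|end_header_id|>", "<|im_end|>"],
  "wrap": false,
  "macro": true,
  "names": true,
  "names_force_groups": false,
  "system_sequence_prefix": "",
  "system_sequence_suffix": "<|eot_id|>",
  "user_alignment_message": "",
  "system_same_as_user": false,
  "skip_examples": false
}
"""

preset = json.loads(PRESET_TEXT)  # raises ValueError if the paste is corrupted
assert preset["stop_sequence"] == "<|eot_id|>"
assert preset["wrap"] is False
print("valid JSON, preset name:", preset["name"])
```

Alternatively, saving the pasted text to a file and running `python -m json.tool yourfile.json` does the same check from the command line.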
Fluff for the curious
Fluff (insane ramblings)
3. The entire Advanced Formatting window has no effect on the prompt being sent to your backend. The settings above need to be set in the file. You're in luck: as I've said, everything you need has already been correctly set for you. Just go and do it >(
4. In the Chat Completion settings, below the "Continue Postfix" dropdown, there are 5 checkboxes. LEAVE THEM ALL UNCHECKED for Llama 3.
5. Scroll down to the bottom where your prompt list is configured. You can outright disable "Enhance Definitions", "Auxiliary Prompt", "World Info (after)", and "Post-History Instructions". As for the rest, for EVERYTHING that has a pencil icon (edit button), press that button and ensure the role is set to SYSTEM.
6. Save the changes to update your preset. Now you have a working Llama 3 chat completion preset for koboldcpp.
(7!!!) When you load a card, always check what's actually loaded into the message list. You might stumble on a card that, for example, has the first message in "Personality", and then the same first message duplicated in the actual chat history. And some genius authors also copy-paste it all into Scenario. So, instead of outright disabling those fields permanently, open your card management and find the "Advanced Definitions" button. You will be transported into the realm of hidden definitions that you normally do not see. If you see the same text as the intro message (greeting) in Personality or Scenario, NUKE IT ALL!!! Also check the Example Dialogues at the bottom: IF instead of actual examples it's some SLOP about OPENAI'S CONTENT POLICY, NUUUUUUUKEEEEEE ITTTTTT AAAALALAALLALALALAALLLLLLLLLL!!!!!!!!!!!!! WAAAAAAAAAHHHHHHHHH!!!!!!!!!!
GHHHRRR... Ughhh... Motherff...
Well anyway, that concludes the guide, enjoy chatting with Llama 3 based models locally with 100% correct setup.
Does this normally happen?
The theme I'm using right now is "Discord Inspired." It's the only one that I can somewhat look at without feeling aversion. I really like SillyTavern for all the tools it offers. There's no equivalent as far as I know.
With that said, I really miss a professional, light theme (not dark like Discord Inspired is). I can only look at a dark theme for so long before hating it. I don't usually use dark themes for my apps anyway; it just doesn't feel right to me.
I've combed through the Discord server, but no luck. Haven't found a single theme I like. Any suggestions? And yes, I know that the vast majority of people don't use ST for what I need.
I created a language prompt for how the bot should use it. It's a different language from English, and yes, I use the Gemini API and it is somewhat capable. I was wondering where I should put that prompt in ST.
Also, how can I write that prompt better?
I'm extremely new to SillyTavern. Every time I use a bot (and I've used like 4 different models), they all say the context length is 4096, that it overflows my model, and that I need to increase my model's context length. I've made sure the models I was using were 8k or more. Am I being dumb, or is this common?
Exactly what it says: I'm having trouble with it giving actual NSFW responses instead of just toying around the subject.
I had issues with full finetuning of Gemma 3 4B; the main problems were masking and gradient explosion during training. I really want to train Gemma 3 12B, which is why I was using 4B as a test bed, but I got stuck there. I'd like to ask if anyone has a good suggestion or solution for this issue. I was doing the context-window-slicing kind of training, with masking set to outputs only, on a custom training script.
I've heard that they're throttling models for some reason
As I've been getting to know SillyTavern this summer, I found that I was constantly looking for explanations/reminders of how each Text Completion preset was best utilized. The ST Documentation section is great for explaining how things work, but doesn't seem to have a good description of why or how these presets are best applied. I had ChatGPT throw together a quick guide for my own reference, and I've found it enormously helpful. But I'm also curious as to how other users feel about the accuracy of these descriptions. Please feel free to share any wisdom or criticism. Happy Taverning!
_____
List of Text Completion presets as they appear in SillyTavern:
Almost
Asterism
Beam Search
Big O
Contrastive Search
Deterministic
Divine Intellect
Kobold (Godlike)
Kobold (Liminal Drift)
LLaMa-Precise
Midnight Enigma
Miro Bronze
Miro Gold
Miro Silver
Mirostat
Naive
NovelAI (Best Guess)
NovelAI (Decadence)
NovelAI (Genesis)
NovelAI (Lycaenidae)
NovelAI (Ouroboros)
NovelAI (Pleasing Results)
NovelAI (Sphinx Moth)
NovelAI (Storywriter)
Shortwave
Simple-1
simple-proxy-for-tavern
Space Alien
StarChat
TFS-with-Top-A
Titanic
Universal-Creative
Universal-Light
Universal-Super-Creative
Yara
Core / General Use
• Deterministic → Lowest randomness, outputs repeatably the same text for the same input. Best for structured tasks, coding, and when you need reliability over creativity.
• Naive → Minimal sampling controls, raw/unfiltered generations. Good for testing a model’s “bare” personality.
• Universal-Light → Balanced, lighter creative flavor. Great for everyday roleplay and chat without heavy stylization.
• Universal-Creative → Middle ground: creative but still coherent. Suited for storytelling and roleplay where you want flair.
• Universal-Super-Creative → Turned up for wild, imaginative, sometimes chaotic results. Best when you want unhinged creativity.
Specialized Sampling Strategies
• Beam Search → Explores multiple branches and picks the best one. Can improve coherence in long outputs but slower and less “human-like.”
• Contrastive Search → Actively avoids repetition and boring text. Great for dialogue or short, punchy prose.
• Mirostat → Adaptive control of perplexity. Stays coherent over long outputs, ideal for narration-heavy roleplay.
• TFS-with-Top-A → Tweaks Tail-Free Sampling with extra filtering. Balances novelty with control—often smoother storytelling than plain TFS.
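As a rough intuition for what these presets actually change: each one is essentially a saved bundle of sampler parameters (temperature, top-p, top-k, Mirostat settings, and so on). The toy sketch below uses made-up values, not the contents of any real preset file, to show the main lever, temperature, which is why "Deterministic" always repeats itself while creative presets vary between rerolls.

```python
import math

# Illustrative parameter bundles (values invented for the example).
deterministic = {"temperature": 0.0, "top_p": 1.0}
creative = {"temperature": 1.2, "top_p": 0.95}

def apply_temperature(logits, temperature):
    """Scale logits by temperature, then softmax into probabilities.
    Lower temperature sharpens the distribution; 0 means pure greedy."""
    if temperature == 0:
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0  # all mass on the argmax
        return probs
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy scores for three candidate tokens
print(apply_temperature(logits, creative["temperature"]))       # flatter: rerolls vary
print(apply_temperature(logits, deterministic["temperature"]))  # [1.0, 0.0, 0.0]: same pick every time
```

Top-p and top-k then prune that distribution before sampling; the flavor presets further down mostly differ in how aggressively they do both.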
Stylized / Flavor Presets
• Almost → Slightly more chaotic but not full-random. Adds flavor while staying usable.
• Asterism → Tends toward poetic, ornate language. Nice for stylized narrative.
• Big O → Large context exploration, verbose responses. For sprawling, detailed passages.
• Divine Intellect → Elevated, lofty, sometimes archaic diction. Great for “wise oracle” or fantasy prose.
• Midnight Enigma → Dark, mysterious tone. Suits gothic or suspenseful roleplay.
• Space Alien → Strange, fragmented, “not quite human” outputs. Good if you want uncanny/weird text.
• StarChat → Optimized for back-and-forth chat. More conversational than narrative.
• Shortwave → Snappy, shorter completions. Good for dialogue-driven RP.
• Titanic → Expansive, dramatic, epic-scale narration. Suits grand fantasy or historical drama.
• Yara → Tends toward whimsical, dreamy text. Nice for surreal or lyrical stories.
Kobold AI Inspired
• Kobold (Godlike) → Extremely permissive, very creative, sometimes incoherent. For raw imagination.
• Kobold (Liminal Drift) → Surreal, liminal-space vibe. Useful for dreamlike or uncanny roleplay.
NovelAI-Inspired
• NovelAI (Best Guess) → Attempts most “balanced” and typical NovelAI-style completions. Good baseline.
• NovelAI (Decadence) → Flowery, ornate prose. Suits romance, gothic, or lush description.
• NovelAI (Genesis) → Tries for coherent storytelling, similar to NovelAI default. Safe choice.
• NovelAI (Lycaenidae) → Light, whimsical, “butterfly-wing” text. Gentle and fanciful tone.
• NovelAI (Ouroboros) → Self-referential, looping, strange. Experimental writing or surreal play.
• NovelAI (Pleasing Results) → Tuned to produce agreeable, easy-to-read prose. Reliable fallback.
• NovelAI (Sphinx Moth) → Darker, more mysterious tone. Pairs well with gothic or horror writing.
• NovelAI (Storywriter) → Narrative-focused, coherent and prose-like. Best for longform fiction.
Miro Series (Community Presets)
• Miro Bronze → Entry-level creative balance.
• Miro Silver → Middle ground: more polish, smoother narration.
• Miro Gold → The richest/lushest prose of the three. For maximum “novelistic” output.
Utility
• simple-1 / simple-proxy-for-tavern → Minimalistic defaults, sometimes used for testing proxy setups or baseline comparisons.
_____
Rule of Thumb
• If you want stable roleplay/chat → Universal-Light / Universal-Creative / NovelAI (Storywriter).
• If you want wild creativity or surrealism → Universal-Super-Creative / Kobold (Godlike) / NovelAI (Ouroboros).
• If you want dark, gothic, or mystery flavor → Midnight Enigma / NovelAI (Sphinx Moth) / Divine Intellect.
• If you want short/snappy dialogue → Shortwave / Contrastive Search / StarChat.
• If you want epic/lush storytelling → Titanic / Miro Gold / NovelAI (Decadence).