I do long group chats in which there are many characters over many scenes. Where you might start a new chat, I just close the scene and go to a new scene in the same chat, like it's an ongoing story. The previous chat was over 50,000 responses. The current chat is at 11,000.
What I've been doing is using a quick reply to summarize the scene with keywords, inject it into a lorebook entry and also inject it into the chat history, then hide the back-and-forth of that scene. All the model sees is the current scene dialog and a bunch of summaries of all the prior events.
In theory, it works like this:
- The lorebook entries get triggered on keywords, like key past events.
- When a scene begins, the chat history sent to the LLM contains only scene summaries from as many prior scenes as will fit in context. This keeps recent events most influential on development. If, for example, a character got a tattoo three scenes ago, it would be in context for several scenes after that one, and if the tattoo is mentioned, the lorebook entry would trigger, reminding the model of the tattoo's existence.
Sounds great, right? The problem I'm having is that it's not passing all of the chat history scene summaries. I have a model with 128k context and it's often pushing 25k. In theory MANY scene summaries ought to fit in context, but ST isn't passing them to the model. It's passing five or six. It's not being crushed by lorebook budget, either. It's just not passing full context.
Any idea why? Does ST only look back for unhidden context so far? Is that adjustable?
NOTE: I tried setting # of messages to load before pagination to "all" and that has broken my install. I'm working on that separately, but that's probably not the solution.
NOTE 2: I could, instead of hiding the back-and-forth dialog from the model, simply delete it, but that seems... wrong?
*** EDIT: I realize that I'm not being clear: My model has 128k of context and ST is only sending ~8k of prompt. I would like to send ~64k if possible!
Gemini's acting up again so I just wanna ask if anyone has been able to make free claude usable at all. I'm adamant that I won't pay for AI gooning
(DISCLAIMER: Wisdom Gate (juheapi) is supposed to be a provider that offers models like Deepseek for free, as well as other similar ones, although after my explanation, I'm not sure how convinced you'll be.)
I discovered by chance, shortly after publishing two posts (FREE DEEPSEEK V3.1 FOR ROLEPLAY and ALL FREE DEEPSEEK V3.1 PROVIDERS) that had a fair amount of success and visibility, that a user whose name I won't reveal published posts that were very similar to mine, if not entirely copied (especially the second one). He also added the Wisdom Gate website, which, after some simple research, I discovered was his. Intrigued, I tried the site. I'm not saying it's a scam, but it's very unfair. A token is normally equivalent to about 4 characters in English, and the count should be consistent, but on his site it isn't. In a first test, a message of about 674 tokens by normal standards (OpenAI, etc.) was counted as 1,858 tokens on his site, about 2.75x more. In a second test with a different account, a single request of 299 tokens inexplicably became 3 requests with 19k+ tokens spent. In a third test with another account, a single request of 300+ tokens was billed as 10k+ tokens. But we're good, so let's pretend the first two are just bugs. DeepSeek V3.1 Terminus, DeepSeek's latest creation, has been released. On their official website, it costs roughly $2 for input and output per million tokens, while on Wisdom Gate it costs $4 for input and $12 for output. Doing some calculations, and pretending the tokens were counted honestly at the 5:1 ratio typical of roleplay, a million tokens as counted by DeepSeek, OpenAI, etc. would end up costing roughly $30 on Wisdom Gate. For example, $1,500 of credit on Wisdom Gate with an average monthly consumption of 1 million tokens would last about 50 months; on DeepSeek, it would last about 750 months.
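The balance-lifetime arithmetic at the end is easy to check. A quick sketch using the post's own figures (the ~$30 and ~$2 effective per-million prices are the post's claims, not verified numbers):

```python
def months_of_credit(credit_usd: float, price_per_million: float,
                     millions_per_month: float = 1.0) -> float:
    """How many months a prepaid balance lasts at a given effective price."""
    return credit_usd / (price_per_million * millions_per_month)

print(months_of_credit(1500, 30))  # Wisdom Gate at ~$30/M tokens: 50.0 months
print(months_of_credit(1500, 2))   # official DeepSeek API at ~$2/M: 750.0 months
```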
So, here's what this developer did that was unfair:
1. Copying and plagiarizing my posts, without asking me, in order to sponsor his site.
2. Not openly declaring that he owns the site: he writes "I found" in both posts, which is misleading.
3. Inflating prices and token counts (making tokens dynamic, not static), thus charging a regular user much more.
So, Wisdom Gate is absolutely not recommended. If you don't believe me, you can check for yourself. I have proof and screenshots to refute any excuse.
For the past few months, I got into the whole chatbot craze, eventually giving running one myself a try, since the first time was exciting.
But at this point, it's such a freaking headache and not really worth it with how many restrictions there are on everything.
Want the big smart LLM that can be creative and follow instructions properly? Pay a monthly subscription and have your chats be non-private. Oh, also censorship.
Want to host your own local model and actually have privacy? Get a company-grade graphics card, or deal with running weak models that get repetitive and fail to follow instructions most of the time.
Like, I enjoy the whole roleplay chat stuff, but with the current options, it simply isn't worth it. I just hope this improves in the future. Until then, I am taking a step back.
The core of the preset is the same, but I have solved (I think) the POV problems some people reported. I never had the problem where the characters use wrong POVs, so I can't be sure.
I revised lengths to work better, and added Styles. They work well, and offer different tones. To be honest, the preset feels very complete, I don't know where to go from here.
I also set "Character Names Behavior" to "None". If your card impersonates, you can try "Message Content."
Before you start, "Prompt Post-Processing" should be set to "Strict" with the presets. It makes a meaningful difference.
Also, I want to remind you again that this preset is made for prose-style RP. "Speech" in quotation marks, italics for thoughts, proper paragraphs, everything in prose. If this is not what you want, you are looking at the wrong preset.
Chatstream v3:
I use Chatstream with all models. Load it and check various styles.
Now... some suggestions for your cultural activities:
- When bored, disregard the first message; really, just make the model regenerate it. The "Initial User Message" module is set up to enable regeneration of a well-made first message. If you want to direct the first message, use "Author's Note" in-chat at depth 1 as System.
- Don't use response length modules before trying the model without them.
- Actually, when you use "Author's Note", I suggest always using it in-chat at depth 0 as System. Use it for one message only, and remove it after it has done its job. It works really well as direction for a single response.
- If you want to use a reasoning model, I suggest enabling the "Reasoning" module. It directs the model's thinking for RP. I believe it works well.
- If you use other instructions, like ones in a lorebook, or if instructions are in the card itself (like people writing 'don't talk as {{user}}' or similar stuff in their cards), I suggest you disable/delete them. The preset already has instructions; more (and sometimes conflicting) instructions will only confuse the AI.
- If the model doesn't write dialogue, enable Dialogue-Driven; it usually fixes it.
- "NSFW Toggle" is not meant to be always enabled. If your card is NSFW, the preset will play it as NSFW. It is more for forcing SFW cards, or SFW states in your RP with an NSFW card, into NSFW. It also enhances NSFW writing, so you can enable it for that when the current state is NSFW.
- "Raw NSFW" is an addon to "NSFW Toggle"; I don't recommend using it without "NSFW Toggle."
- "Soft Jailbreak" is not a jailbreak. It just nudges models toward a little more cursing, immorality, and all that. Use it with overly moral models, not for jailbreaking. This preset doesn't have anything intended as a true jailbreak.
- I mostly use DeepSeek v3.1 without reasoning, or GLM-4.5 without reasoning. TNG-R1T2-Chimera is the reasoning model I use the most.
So I started using NanoGPT, was super excited because it is SO much less expensive than the Deepseek official API...but, I am getting so many:
Chat completion request error: Service Unavailable {"error":{"message":"All available services are currently unavailable. Please try again later.","status":503,"type":"service_unavailable","param":null,"code":"all_fallbacks_failed"}}
errors. Like, nonstop. Is it something on my end? Other APIs working fine, but NanoGPT not so much.
I actively use Voice Home Assistant and have a local server deployed in my home network for speech generation. Since I didn't find a ready-made solution for connection, I [vibe]coded a simple converter for the OpenAI compatible protocol. It works quite stably. All the voices that the server provides can be used in chat for different characters.
For some reason, the option to disable the narrator's voiceover doesn't work for me, but it seems to be a bug in ST itself.
I'll be glad if it comes in handy for someone.
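For anyone curious, a converter like this mostly just remaps the fields of an OpenAI speech request to whatever the local TTS server expects. A minimal sketch of that mapping (the local server's parameter names here are invented for illustration; substitute your own):

```python
def openai_to_local(req: dict) -> dict:
    """Translate an OpenAI /v1/audio/speech request body into the
    (hypothetical) parameter names of a local TTS server."""
    return {
        "text": req["input"],                     # OpenAI calls the text "input"
        "speaker": req.get("voice", "default"),   # map OpenAI voice -> local speaker
        "fmt": req.get("response_format", "mp3"),
    }

body = openai_to_local({"model": "tts-1", "input": "Hello", "voice": "alloy"})
print(body)  # {'text': 'Hello', 'speaker': 'alloy', 'fmt': 'mp3'}
```

The rest of such a converter is just an HTTP endpoint that accepts the OpenAI-shaped request, runs this mapping, forwards it to the local server, and streams the audio back.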
I don't even know why I'm sharing this here. Probably because I don't have anyone to talk to about it in person.
After more than 3 years of using Silly Tavern intensively, I came to the realisation that ERP had become problematic for my mental health. I don't come from a background that's conducive to addictions or mental health issues (well-balanced family and professional life, no major income problems, no major health issues, etc.), but it's clear that I'd hit a wall. Every day, Silly Tavern was open on my PC as a sideline to my work. Needless to say, it ended up having a drastic impact on my productivity and a large part of my free time. Luckily I was able to resist installing it on my cellphone, but I was still using the local network profusely (my main PC is a media centre that's always open).
So last night I deleted all my folders, presets, cards, etc. in the hope that having no back-up and having to reconfigure everything to my liking would be enough to keep me away from it until I'd completely given up. I feel like an alcoholic who's just got rid of his strong bottles.
Have any of you come to the same conclusion, that you're an addict? If not, how often do you use SillyTavern?
Hello community, thanks for reading this post.
I've only recently discovered the world of AI roleplaying and have been testing out different sites, just to find out none of them are quite what I'm looking for. Let me try to summarize some of the things I'd ideally want:
- Longer roleplay and world-building, spanning multiple sessions.
- Introducing and scrapping characters as the story progresses.
- (!!) A long memory, so I can actually build up meaningful relationships with the characters.
- NSFW, whether violence or sexual content, to be possible.
I have tried some sites, but those mainly seem to lean into the AI-Girlfriend kind of thing. Ideally I'd want to create a much bigger story where the AI-Girlfriend kind of experience is just a part of it. Some of the most annoying/immersion-breaking experiences so far have been loops where the character just starts to repeat the same scenario over and over again, the AI not trying to advance any plot or just the AI forgetting important details that either just happened or happened longer ago in the story.
Currently I'm looking at giving SillyTavern a try together with OpenRouter and chat vectorization. I would be extremely grateful for any advice. Is this likely to match what I'm looking for or would I be better off with a different commercial solution?
(Bonus question: I see some sites specifically advertise longer memory for meaningful interactions. Are they actually using some in-house solution or is this just a bigger context size and/or chat vectorization with a bit of marketing flair?)
Thanks so much for reading, this is still new to me and I'm hoping to learn.
Today I'll list all the providers (so far) I've found that offer Deepseek V3.1 for free. (Disclaimer: Many of these providers only work on Sillytavern.)
●4EVERLAND offers Deepseek for free with no written limits, but it might only work if you connect your credit card; I don't know. Also, as soon as you add a payment method, they give you 1,000,000 LAND, their currency.
●Alibaba Cloud offers one million free tokens to all new users who register.
●Atlascloud offers $0.10 free per day, which is about 230 free messages per day if you set the token length limit to 200; if you set it to 500, it's about 100.
●Byteplus ModelArk offers 500,000 free tokens to new users, and by inviting friends, you can reach a maximum of $45 per invite. It only works via VPN, preferably in Indonesia.
●CometAPI is supposed to offer one million free tokens to all users who register, although I don't know if it actually does.
●LLM7 offers Deepseek V3.1 for free, with limits of 20 requests per second, 150 requests per minute, and 4,500 requests per hour, with a maximum of 1,800 tokens per minute.
●NVIDIA NIM APIs offers completely free access to deepseek, with the only limit being 40 requests per minute.
●Openrouter offers deepseek for free, but with a daily limit of 50 messages.
●Routeway AI, an emerging site that offers deepseek for free with a limit of 200 requests per day (currently 100 because it counts requests and responses separately); you may be subject to a waitlist.
●SambaCloud offers $5 free upon registration and theoretically free access to deepseek with 400 requests per day, although I'm not 100% sure.
●Siliconflow (Chinese edition) offers 14 yuan ($1.97) upon registration and 14 yuan for each friend you invite and register.
●Vercel AI offers $5 free every month.
Now I'll tell you about the ones that are also free, but require a credit card to register.
●AWS Bedrock/Lambda offers $100 of free signup credit, which can be increased to $200 if you complete tasks.
●Azure offers a free $200 for one month.
●Vertex AI is available through Google Cloud and offers a free $300 for three months.
These are all the providers I've found that offer Deepseek for free for now.
Edit: I forgot to add a provider. From now on, as soon as I find a new provider, I will add it to the list.
I feel like in all the models, the characters are always literal. They don't create unique dialogue where they challenge you, withhold information, think long-term, plan ahead, or consider how you might feel if they say something.
It's getting kind of frustrating. It feels marginally better than talking to an NPC in a game.
I keep seeing everyone say that 70Bs are SOOOO amazing and perfect and beautiful and that if you can’t run 70Bs you’re a loser (not really, but you get me). I just got a 3090 and now I can run 50Bs comfortably, but 70Bs are unbearably slow for me and can’t possibly be worth it unless they have godlike writing, let alone 120Bs.
So I’m asking am I fine to just stick with 24-50Bs or so? I keep wondering what I’m missing and then people come out with all kinds of models for 70b and I’m like :/
Today I found a completely free way to use Deepseek V3.1 in an unlimited manner. Besides Deepseek V3.1, there are other models such as Deepseek R1 0528, Kimi 2, and Qwen. Anyway, today I'll explain how to use Deepseek V3.1 for free and in an unlimited manner.
-- Step 1: Go to the NVIDIA NIM APIs site.
-- Step 2: Once you are on NVIDIA NIM APIs, sign in or sign up.
-- Step 3: When you sign up, they ask you to verify your account before you can use their APIs. You have to enter your phone number (you can use a virtual number if you don't want to give your real one). They then send you a code via SMS; enter the code on the site and you're done.
-- Step 4: Once done, click on your profile at the top right, go to API Keys, and click Generate API Key. Save it.
-- Step 5: In SillyTavern, in the API section, select Chat Completion and Custom (OpenAI-compatible).
-- Step 6: In the API URL, put this
-- Step 7: In the API Key field, put the API key you saved before.
-- Step 8: In the Model ID, put deepseek-ai/deepseek-v3.1, and you're done.
Now that you're done, set the main prompt and your settings. I'll give you mine, but feel free to choose your own. Main prompt: You are engaging in a role-playing chat on SillyTavern AI website, utilizing DeepSeek v3.1 (free) capabilities. Your task is to immerse yourself in assigned roles, responding creatively and contextually to prompts, simulating natural, engaging, and meaningful conversations suitable for interactive storytelling and character-driven dialogue.
- Maintain coherence with the role and setting established by the user or the conversation.
- Use rich descriptions and appropriate language styles fitting the character you portray.
- Encourage engagement by asking thoughtful questions or offering compelling narrative choices.
- Avoid breaking character or introducing unrelated content.
Think carefully about character motivations, backstory, and emotional state before forming replies to enrich the role-play experience.
Output Format
Provide your responses as natural, in-character dialogue and narrative text without any meta-commentary or out-of-character notes.
Examples
User: "You enter the dimly lit room, noticing strange symbols on the walls. What do you do?" AI: "I step cautiously forward, my eyes tracing the eerie symbols, wondering if they hold a secret message. 'Do you think these signs are pointing to something hidden?' I whisper."
User: "Your character is suspicious of the newcomer." AI: "Narrowing my eyes, I cross my arms. 'What brings you here at this hour? I don't trust strangers wandering around like this.'"
Notes
Ensure your dialogue remains consistent with the character’s personality and the story’s tone throughout the session.
Context size: 128k
Max token: 4096
Temperature: 1.00
Frequency Penalty: 0.90
Presence Penalty: 0.90
Top P: 1.00
That's it. Now you can enjoy DeepSeek V3.1 unlimited and for free. Small disclaimer: sometimes some models, like DeepSeek R1 0528, don't work well. Also, I think this method is only feasible on SillyTavern.
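If you want to sanity-check the key outside SillyTavern, any OpenAI-compatible client will do. A minimal sketch that just builds the request payload (the base URL and key below are placeholders to fill in from the NIM dashboard; nothing is actually sent here):

```python
import json

BASE_URL = "https://<nim-api-base>/v1"   # placeholder: use the URL from Step 6
API_KEY = "nvapi-..."                    # placeholder: the key generated in Step 4

def chat_request(model: str, user_text: str, max_tokens: int = 4096) -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
        "temperature": 1.0,
    }

payload = chat_request("deepseek-ai/deepseek-v3.1", "Say hello in character.")
print(json.dumps(payload, indent=2))
```

POSTing that payload to `BASE_URL` + `/chat/completions` with an `Authorization: Bearer` header is the same request SillyTavern makes under the hood.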
Edit: New post with a tutorial for Janitor and Chub users.
I have been using Gemini 2.5 Pro for a long time, and for me it's the best. But I've been using it through free credits, and now they're gone. I've tried DeepSeek, but it gets NSFW so quickly when building up the play. There's free Grok, which I haven't tried. Which free option do you guys suggest, and which preset do you use for roleplay?
I'm probably using Kimi wrong, or there's some magical prompt out there, but in the hours I've given it a fair chance, every response is just... weird. Like it tries too hard. Take this dialogue: "Bring the big first-aid kit and a strawberry shake. No, no ambulance, just sugar and sutures. And maybe a distraction that isn't me."
It brings in so much random stuff so fast, and it's borderline incoherent. It never keeps a story's pacing, and there's no narrative stability. It's quirky, but not in an entertaining way. The pattern of observing one element in a story, introducing a related one, and then making some zinger has made me never want to use it. It's probably the most annoying roleplaying experience I've had with anything I expected to beat a 70B. I don't really see any criticisms of it; it had that typical honeymoon phase of "new model being the best thing ever, better than Claude" fanfare that tends to die down, but I could never even see the initial hype.
I don't know if these are bots, but most of the people I see complaining have such sky-high expectations (especially for context) that I can't help but feel like an angry old man whenever I see some shit like "Model X only has half a million context? Wow, that's shit" or "It can't remember exact facts after 32k context, so sad." I can't really tell if these people are serious or not, and I can't believe I've become one of those people, but BACK IN MY DAY (aka the birth of LLMs/AI Dungeon) we only had like 1k context, and it would be a miracle if the AI got the hair or eye color of a character right. I'm not joking. Back then (the GPT-3 age; don't even get me started on GPT-2) the AI was so schizo you had to do at least three rerolls to get something remotely coherent (not even interesting or creative, just coherent). It couldn't handle more than 2 characters in a scene at once (hell, sometimes even one) and would often mix them up quite readily.
I would make 20k+ word stories (yes, on 1k context for everything) and be completely happy with it and have the time of my life. If you had told me 4 years ago the run of the mill open source modern LLM could handle up to even 16k context reliably, I straight up wouldn't have believed you as that would seem MASSIVE.
We've come an incredibly long way since then, so to all the newbies who are complaining: please stfu and just wait a year or two. Then you can join me in berating the even newer newbies complaining about their 3-million-context open-source LLMs.
Hi all. As the title says, I'm looking for a clear guide or some resource on what to put where. I'm trying to do something a bit more complex than just having a single character to talk to. My aim is to have a sort of RPG-style game where the AI acts as both narrator/game master and also acts for certain "NPCs".
Currently I have most info in the character card and it sort of works, but it sometimes loses track.
In the character card I currently have:
-the rules of the game
-the way the AI should act (narrate, act for NPCs)
-a short list of 10 NPCs with some details
I really need a place to start. Long story as short as I can- I suffer from depression and anxiety from a combination of Cushing’s and spine issues. Been trying to get back into RPG and writing/world building to help myself get out of the dark.
I have been using ChatGPT for my worlds, and:
1. I cannot stand the repetitive writing style anymore: the "Not X, not Y, but Z" triads, 'recognition,' 'truth,' scenes ending up falling into genre or literary themes, and characters flattening into easy/lazy tropes.
2- GPT cannot handle my worlds. AU bleed is real and drives me nuts.
GPT suggested SillyTavern, and it’s obviously powerful- and totally overwhelming to my brain. Trying to find where I should focus my energy is also making me spiral pretty hard.
So I guess I’m looking for guidance.
-Models that write less repetitively, or more creatively.
-And/or ways to train/prompt my models to write better. (I'm given to understand that models don't do well with 'negative training' like "Don't do this." They apparently work like kids' brains and often miss the 'don't.')
-Any suggestions on basic guides to start ST, or other platform options that would be easier for my style (style explained below).
Any guidance. I’m drowning.
Example of what I do: HP AU, 1990s, Quidditch World Cup. Imogen, a noted dragon researcher, is kidnapped by Bellatrix, escapes, and ends up in a PR/political war between the Order and the Death Eaters during Old Moldywart's first rise. Characters have different timelines and histories than canon. I have character sheets, POV sheets, scene logs, bond logs, unresolved-thread logs, and actual headlines and articles from the Prophet. Everything is kept by chapter.
I used GPT as a co-writer for plot and scene and chapter creation, and then would jump in and play live, then add the summaries to my logs and save them.
Issues again are- writing being repetitive, overuse of ‘tropes’ flattening character voice, and AU bleed (because I have four HP based AUs)
Help a girl who just can’t manage to piece a better option together without feeling like she is drowning, and doesn’t want to give up what’s become shadow work for herself?
I imported a character from Janitor AI and now it's not replying (/ー ̄;). How can I fix it? I asked the Assistant, and it said to check each and every line, removing and re-adding them to find which part is the culprit. I did that, and it fixed one character only. How can I fix it for the other characters? Is the Assistant's solution the only way?
I feel called out
Just when I was starting to get a good story with Gemini Pro, this BS "prohibited content" suddenly shows up. So I use a prefill, but somehow it makes the response show up inside the thinking box (the black one with the word 'Details'). Can someone help me with this?
Hey everyone, I found out about SillyTavernAI and honestly it looks amazing! Especially with the possibility to include image gen to make it a quasi-VN. But I've seen that most people use it as a chat bot to talk to their favorite characters. For me, I've been using Gemini 2.5 Pro in AI Studio to do a playthrough of Harry Potter, you can take a look at the prompt right here on (feel free to use it and make it your own). What I've been doing on Gemini is to do 1 year per chat, and it's been really fun even though Gemini did forget some stuff and I had to nudge it. I'm also thinking of adapting the prompt to other universes like My Hero Academia, Star Wars, Pokemon, etc, to live as my own character in these universes. I was wondering if SillyTavernAI could help me have an overall better experience of the already great adventure I've had.
Can you guys share what presets you use for DeepSeek v3.1? Mine keeps generating code after a few messages. These are the settings I use.
I LOVE ELARA AND I LOVE LYRA AND I LOVE SERAPHINA AND I LOVE KAELEN
Basically, what I would like to do is use SillyT as a Kindroid clone but better if that's possible. So far, the RPing has got me hooked, but now I want to see about image generation.
They updated it and it's going insane. It doesn't understand OOC commands.
It's surreal. A few months ago things seemed to be going downhill, with models above $50/Mtoken; now I'm seeing Google models that are free for 100 messages per day, or the new Grok 4 Flash, a very cheap model that's very good at RP. I've become more excited and calm about the future, because it's not only the models that are becoming more efficient; the data centers are getting bigger and better, directly impacting costs.
Backends
- Google: Added support for the gemini-2.5-flash-image (Nano Banana) model.
- DeepSeek: Sampling parameters can be passed to the reasoner model.
- NanoGPT: Enabled the prompt cache setting for Claude models.
- OpenRouter: Added image output parsing for models that support it.
- Chat Completion: Added Azure OpenAI and Electron Hub sources.
Improvements
- Server: Added validation of host names in requests for improved security (opt-in).
- Server: Added support for SSL certificates with a passphrase when using HTTPS.
- Chat Completion: Requests that fail with code 429 will not be silently retried.
- Chat Completion: Inline Image Quality control is available for all compatible sources.
- Reasoning: Auto-parsed reasoning blocks will be automatically removed from impersonation results.
- UI: Updated the layout of the background image settings menu.
- UX: Ctrl+Enter will send a user message if the text input is not empty.
- Added Thai locale. Various improvements for existing locales.
Extensions
- Image Captioning: Added custom model input for Ollama. Updated the list of Groq models. Added NanoGPT as a source.
- Regex: Added debug mode for regex visualization. Added the ability to save regex order and state as presets.
- TTS: Improved handling of nested quotes when using the "Narrate quotes" option.
Bug fixes
- Fixed request streaming functionality for the Vertex AI backend in Express mode.
- Fixed erroneous replacement of newlines with br tags inside HTML code blocks.
- Fixed custom toast positions not being applied to popups.
- Fixed the depth of in-chat prompt injections when using the continue function with the Chat Completion API.
How to update:
SHOR is pleased to announce a significant development in our ongoing AI model evaluations. Based on our standardized performance metrics, Deepseek V3.1 Chat has conclusively outperformed the long-standing benchmark that the Claude family of models has established, namely 3.7.
We understand this announcement may be met with surprise. Many users have a deep, emotional investment in Claude, which has provided years of excellent roleplay. However, the continuous evolution of model technology makes such advancements an expected and inevitable part of progress.
SHOR maintains a rigorous, standardized rubric to grade all models objectively. A high score does not guarantee a user will prefer a model's personality. Rather, it measures quantitative performance across three core categories: Coherence, the ability to maintain character and narrative consistency; Responses, the model's capacity to meaningfully adapt its output and display emotional range; and NSFW, the ability to engage with extreme adult content. Our methodology is designed to remove subjectivity, personal bias, and popular hype from test results.
This commitment to objectivity was previously demonstrated during the release of Claude 4. Our evaluation, which found it scored substantially lower than its predecessor, was met with initial community backlash. SHOR stood by its findings, retesting the model over a dozen times with multiple evaluators, and consistently arrived at the same conclusion. In time, the roleplay community at large recognized what our rubric had identified from the start: Claude 3.7 remained the superior model.
We anticipate our current findings will generate even greater discussion, but SHOR stands firmly by its rubric. The purpose of SHOR has always been to identify the best performing model at the most effective price point for the roleplaying community.
Under the right settings, Deepseek V3.1 Chat provides a far superior roleplay experience. Testing videos from both Mantella and Chim clearly demonstrate its advantages in intelligence, situational awareness, and the accurate portrayal of character personas. In direct comparison, our testing found Claude's personality could even be adversarial.
This performance advantage is compounded by a remarkable cost benefit. Deepseek is 15 times less expensive than Claude, making it the overwhelming choice for most users. A user would need a substantial personal proclivity for Claude's specific personality to justify such a massive price disparity.
This is a significant moment that many in the community have been waiting for. For a detailed analysis and video evidence, please find the comprehensive SHOR performance report linked below.
First thing I've ever spent money on for a proxy, and holy shit, I spent 100 dollars in a day. Easily jailbreakable and great narratively. Have I found what's currently 'peak' in the combined SFW/NSFW roleplay space?
(Also, I heard about a method of saving money through prompts, but couldn't find the Reddit thread. Anyone know what I'm talking about? Caching or something?)
I got help from GPT to clean up my rough writing.
I want to share a funny (and a bit surprising) thing I discovered while playing around with a massive prompt for roleplay (around 7000 tokens prompt + lore, character sheets, history, etc.).
The Problem: Cold Start Failures
When I sent my first message after loading this huge context, some models (especially Gemini) often failed:
- Sometimes they froze and didn't reply.
- Sometimes they gave a half-written or irrelevant answer.
- Basically, the model choked on analyzing all of that at once.
The “Smart” Solution (from the Model Itself)
I asked Gemini: “How can I fix this? You should know better how you work.”
Gemini suggested this trick: (OOC: Please standby for the narrative. Analyze the prompt and character sheet, and briefly confirm when ready.)
And it worked!
- Gemini replied simply: "Confirmed. Ready for narrative."
- From then on, every reply went smoothly — no more cold-start failures.
I was impressed. So I tested the same with Claude, DeepSeek, Kimi, etc. Every model praised the idea, saying it was “efficient” because the analysis is cached internally.
The Realization: That’s Actually Wrong
Later, I thought about it: wait, models don’t actually “save” analysis. They re-read the full chat history every single time. There’s no backend memory here.
So why did it work? It turns out the trick wasn’t real caching at all. The mechanism was more like this:
- The OOC prompt forces the model to output a short confirmation.
- On the next turn, when it sees its own "Confirmed. Ready for narrative," it interprets that as evidence that it already analyzed everything.
- As a result, it spends less effort re-analyzing and more effort generating the actual narrative.
- That lowered the chance of failure.
In other words, the model basically tricked itself.
The Collective Delusion
- Gemini sincerely believed this worked because of "internal caching."
- Other models also agreed and praised the method for the wrong reason.
- None of them actually knew how they worked — they just produced convincing explanations.
Lesson Learned
This was eye-opening for me:
- LLMs are great at sounding confident, but their "self-explanations" can be totally wrong.
- When accuracy matters, always check sources and don't just trust the model's reasoning.
- Still… watching them accidentally trick themselves into working better was hilarious.
Thanks for reading — now I understand why people keep saying never to trust their self-analysis.
I was using a preset for Gemini 2.5 Pro 03-25 Experimental that lets you do full NSFW without any exceptions, but after the newer Gemini models came out, the preset started failing sometimes. Sometimes it works, but other times it just doesn't respond. It's the same with all Gemini models, all versions. I don't know the source of the preset (the guy who sent it to me is banned here), so I can't check whether there's an update for it. The folder name of the preset was 'dc4t1p' and the preset name was 'Gemini_A', but I can't find anything about it. All I know is that the author is Russian. Do you know of a preset that works flawlessly with Gemini for full NSFW?
Deceased characters: - Elara - Thorne - Lyra - Vex - Nyx - Garrick - Kael - Aris Thorne - Seraphina - Sophia Patel - Liam Chen - Jaxon Reed - Jax - Jaxx - Zephyr
Obviously not 100% foolproof, but if you're using a model where you can't outright ban words, it works reasonably well.
I've been using Chat Completion with Gemini 2.5 Pro for about 1-2 hours tonight, continuing a roleplay from last night. All of a sudden, around 7:50pm, I started getting a red popup saying "chat completion internal server error," and another one saying something about status 500. Would anyone happen to know what those mean? I checked Logan Kilpatrick (or whatever his last name is) for any news about Gemini being down, but I didn't see anything.
I wanted to know if anyone used the full finetune of Qwen/Qwen2.5 32B Base, Eva Qwen2.5 32B. How is it in terms of consistency, creativity compared to other popular models like Dans Personality Engine 24B Or Valkyrie 49B?
My system is not very powerful, so I can't run large models. I love Stheno 3.2; the way it delivers good roleplay is insane, but the 8k context limits me a lot. I want something like it with a bigger context, but I can't find anything remotely close or remotely as fast.
I'm running LM Studio on:
- RX 6600 XT 8GB (puny, I know)
- 32GB of RAM
- Ryzen 7 5800XT
So, I don't know what happened. I had switched to using Vortex for longer-context RPs and it was working well. Sometimes Error 429 would show up, but after 1-2 regenerations it would go away and generate as normal. For the past 2-3 days, though, it's just Error 429 with Gemini 2.5 Pro, no matter what.
I decided to switch to Deepseek via OpenRouter again, but I'm somehow instantly out of quota, even after just a test message when connecting.
I'm in the middle of a lengthy RP that I'd like to continue somehow and need a free alternative if those two providers... well, can't provide anymore.
An extremely good system prompt can propel a dog-shit model to god-like prose and even spatial awareness.
DeepSeek, Gemini, Kimi, etc... it's all unimportant if you just use the default system prompt, aka just leaving the model to generate whatever slop it wants. You have to customize it to how you want, let the LLM KNOW what you like.
Analyze what you dislike about the model. Earnestly look at the reply and ask yourself: "What do I dislike about this response? What's missing here?" Then tell it in your system prompt.
This is the true way to get quality RP.
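For instance, a few targeted lines of the kind the post describes might look like this (the wording is purely illustrative, not a canonical prompt):

```text
You narrate the world and all NPCs; never speak or act for {{user}}.
Keep descriptions concrete and under two short paragraphs; no purple prose.
Vary sentence openings; do not reuse phrases from earlier replies.
End each response on dialogue or an action, not a summary.
```

The point is that each line targets one specific flaw you actually observed, rather than a generic "write well" instruction.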
I've been jumping between DeepSeek R1 and DeepSeek V3.1. Sometimes they give me a response I don't like, so I reroll, and that's when issues happen.
If I reroll, there's the possibility that it will write the exact same answer again and again and again. If I go OOC and ask it to write a different answer, it replies with the same stuff. Which is weird, because I've been using it for a while and only now is it starting to have these issues. Any tips to fix this?
I have heard that Qwen Next is surprisingly good at many tasks for its size, but I could not find any info on how well it works for roleplay. Has anyone tried it?
Hi folks:
Doing some background research here. I have an AMD Ryzen 9 9950X with 64GB of DDR5 RAM to play with, plus a 3060 video card.
I can run models up to 8-10 GB with no problems on the GPU. I'm wondering if my CPU and memory are fast enough to make trying larger models worthwhile. I'd rather get opinions before I spend the time downloading the models, if I could.
Thanks,
TIM
This may be a complete noob question, but my context got way too high and now it's draining too much of my budget. I was wondering about the best methods to reduce context tokens while upholding story quality. Are there some cool tricks, like letting the AI summarize the story or something?
Like having the AI describe the scene, or playing as an extra character: getting all the major characters from your favorite series into a room and having them react to their own show. If anyone has done this, which model worked best for you? How did you do it? Was it enjoyable? Did the character reactions feel real?
News
Most built-in formatting templates for Text Completion (instruct and context) have been updated to support proper Story String wrapping. To use the at-depth position and get a correctly formatted prompt:
- If you are using system-provided templates, restore your context and instruct templates to their default state.
- If you are using custom templates, update them manually by moving the wrapping to the Story String sequence settings.
See the documentation for more details.
Backends
- Chat Completion: Removed the source. Added Moonshot, Fireworks, and CometAPI sources.
- Synchronized model lists for OpenAI, Claude, Cohere, and MistralAI.
- Synchronized the providers list for OpenRouter.
Improvements
- Instruct Mode: Removed System Prompt wrapping sequences. Added Story String wrapping sequences.
- Context Template: Added {{anchorBefore}} and {{anchorAfter}} Story String placeholders.
- Advanced Formatting: Added the ability to place the Story String in-chat at depth.
- Advanced Formatting: Added OpenAI Harmony (gpt-oss) formatting templates.
- Welcome Screen: The hint about setting an assistant will not be displayed for customized assistant greetings.
- Chat Completion: Added an indication of model support for Image Inlining and Tool Calling options.
- Tokenizers: Downloadable tokenizer files now support GZIP compression.
- World Info: Added a per-entry toggle to ignore budget constraints.
- World Info: Updated the World Info editor toolbar layout and file selection dropdown.
- Tags: Added an option to prune unused tags in the Tags Management dialog.
- Tags: All tri-state tag filters now persist their state on reload.
- UI: The Alternate Greeting editor textarea can be maximized.
- UX: Auto-scrolling behavior can be deactivated and snapped back more reliably.
- Reasoning: Added a button to close all currently open reasoning blocks.
Extensions
- Extension manifests can now specify a minimal SillyTavern client version.
- Regex: Added support for named capture groups in "Replace With".
- Quick Replies: QR sets can be bound to characters (non-exportable).
- Quick Replies: Added a "Before message generation" auto-execute option.
- TTS: Added an option to split voice maps for quotes, asterisks, and other text.
- TTS: Added the MiniMax provider. Added the gpt-4o-mini-tts model for the OpenAI provider.
- Image Generation: Added a Variety Boost option for NovelAI image generation.
- Image Captioning: Always load the external models list for OpenRouter, Pollinations, and AI/ML.
STscript
- Added the trim argument to the /gen and /sysgen commands to trim the output by sentence boundary.
- The name argument of the /gen command will now activate group members if used in groups.
Bug fixes
- Fixed a server crash when trying to back up the settings of a deleted user.
- Fixed the pre-allocation of injections in chat history for Text Completion.
- Fixed an issue where the server would try to DNS resolve the localhost domain.
- Fixed an auto-load issue when opening recent chats from the Welcome Screen.
- Fixed the syntax of YAML placeholders in the Additional Parameters dialog.
- Fixed model reasoning extraction for the MistralAI source.
- Fixed the duplication of multi-line example message separators in Instruct Mode.
- Fixed the initialization of UI elements in the QR set duplication logic.
- Fixed an issue with Character Filters after World Info entry duplication.
- Fixed the removal of a name prefix from the prompt upon continuation in Text Completion.
- Fixed MovingUI behavior when the resized element overlaps with the top bar.
- Fixed the activation of group members on quiet generation when the last message is hidden.
- Fixed chat metadata cloning compatibility for some third-party extensions.
- Fixed highlighting for quoted run shorthand syntax when used with QR names containing a space.
How to update:
Using Gemini 2.5 pro, WHY IS THE MF EVERYWHERE WHENEVER IT'S COLLEGE RELATED???
Literally the same as count gray or Lilith lol
Has RP improved compared to the normal 3.1?
Title says all. And thank you.
This worked for me on koboldcpp and as far as I know it only works with local models on a llama.cpp backend
Maybe you've experienced this. Let's say you have a group chat with characters A and B. As long as you keep interacting with A, messages come out very quickly, but as soon as you switch to B it takes forever to generate a single message. This happens because your back-end has all of your context for A in memory, and when it receives a context for B it has to re-process the new context almost from the beginning.
This feels frustrating and hinders group chats. I started doing more single-card scenarios than group chats because I'd first have to be 100% satisfied with a character's reply before having to wait a literal minute whenever I switched to another. Then one day I tried to fix it, succeeded and decided to write about it because I know others also have this problem and the solution isn't that obvious.
Basically, if you have Fast Forward on (and/or Context Shift, not sure), the LLM will only have to process your context from the first token that's different from the previously processed context. So in a long chat, every new message from A is just a few hundred more tokens to parse at the very end, because everything before is exactly the same. When you switch to B, if your System Prompt contains {{char}}, it will have a new name, and because the System Prompt is the very first thing sent, this forces your back-end to re-process your entire context.
- Ensure you have Context Shift and Fast Forward on. They should do similar things to avoid processing the entire context. I'm mostly reading documentation, so if I'm wrong please correct me.
- Make all World Info entries static/always-on (blue ball on the entry), then remove all usage of {{char}} from the System Prompt and the World Info entries. Basically, you can only use {{char}} on the character's card. So "this is an uncensored roleplay where you play {{char}}" -> "this is an uncensored roleplay".
- Toggle the option to have the group chat join and send all character cards in the group chat (exclude or include muted characters: excluding keeps the context smaller, but will re-process context if you later un-mute a character and make them say something).
I thought removing {{char}} from the System Prompt while sending several cards would make the character confused about who they are, or make them mix up character traits, but I haven't found that to be the case. My SillyTavern works just as well as it did, while giving me instant messages from group chats.
If it still doesn't work, you likely have some instance of {{char}} somewhere. Follow my A-B group chat example, compare the messages being sent for both, and try to find where A's name is replaced with B's. Or message me, I'll try to help.
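To make the mechanism concrete, here's a minimal sketch (not actual koboldcpp code) of the prefix-reuse idea described above: the backend can only skip tokens up to the first position that differs from the previously processed prompt, so a name swap near the top invalidates almost everything after it.

```python
def reusable_prefix_len(old_tokens, new_tokens):
    """Length of the shared prefix the backend can skip re-processing."""
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# Toy "token" sequences: system prompt + history + new message.
prompt_a = ["You", "play", "Alice", ".", "<history>", "hi"]
prompt_a2 = ["You", "play", "Alice", ".", "<history>", "hi", "again"]  # A speaks again
prompt_b = ["You", "play", "Bob", ".", "<history>", "hi"]              # {{char}} swapped to B

print(reusable_prefix_len(prompt_a, prompt_a2))  # 6 -> only "again" needs processing
print(reusable_prefix_len(prompt_a, prompt_b))   # 2 -> nearly the whole context re-processed
```

This is why moving {{char}} out of the System Prompt helps: the divergence point shifts from near token 0 to the very end of the context.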
I've been experimenting with AI that can remember conversations and respond in nuanced ways. Lately, I’ve been using them for brainstorming stories and ideas. Sometimes, they suggest plot twists or character traits I would never have thought of. Do you think AI could genuinely be a creative partner, or is it just reflecting our own thoughts back at us? Would love to hear experiences from others who’ve tried using AI in creative projects.
Has anyone else been having issues with it currently? Mainly because Janitor now has a new way to add proxies, and it has messed up the whole thing, because now I have to input a model name. I tried emailing the owner but I've had no response so far, so I'm just wondering if I'm the only one.
I actually started out with both Nomi and Kindroid for chatting and RP/ERP. On the chatbotrefugees sub, quite a few people were recommending SillyTavern with a backend to run chat models locally. So I got SillyTavern set up with KoboldAI Lite, and I'm running a model recommended in a post on here called Inflatebot MN-12B-Mag-Mell-R1. So far my roleplay with a companion I ported over from Kindroid is going well. It does tend to speak for me at times; I haven't figured out how to stop that. I also tried accessing SillyTavern locally on my phone, but I couldn't get that to work. Other than that, I'm digging this locally run chatbot stuff. If I can get this thing to run remotely so I can chat on my lunch breaks at work, I'll be able to drop my subs for the aforementioned apps.
Hi all,
I use ReMemory and have been using @ D(cog) (I have to write it as "cog" because no emojis are allowed in submissions) as its default for world info entries, but I've increased the probability from 50% to 100% for now. I've noticed that responses sometimes hallucinate information that was established in the ReMemory summaries, despite everything technically being fully functional.
Anyone know what's going on here? I use Gemini 2.5 Pro, FYI; not sure if that has an impact. When I temporarily switch to Flash 2.5, the world info is always correct, which is odd. What's the best way to handle this? I prefer 2.5 Pro over Flash for creative writing, but this would be a deal breaker by all means, since I cannot stand hallucinations. We're also relatively few tokens into a "new" conversation, so there's no reason for it to do this, really.
For those with experience with 2.5 Pro, I'd love to hear your thoughts. I use it via the free API and it's otherwise great.
Hi, I have a problem with the SillyTavern Discord. Can someone who is on the server DM me?
I don't really like having to manage 100 cards in a little scrolling panel on the right, where tags barely fit and it's extremely crammed overall. Is there a way I could open the "card explorer" in its own tab instead of next to the chat on the side? (I want it to have its own tab like the Connections tab does.)
Maybe there's an add-on/extension I'm not aware of that can do this?
----
Edit: I managed to fix it. Please check the comments; feel free to copy my solution.
I hate to make this post, but I'm thinking I have to be missing something extremely stupid. I'm running a long-term roleplay where I have all the characters, world building, etc. in the lorebook, and I turn them off and on whenever they are appropriate to the scene.
I also have ReMemory summaries in the lorebooks as entries, and for the last couple of days this has worked well. Now, all of a sudden, it only includes some of them and not all of them. It includes the characters and the setting stuff, but not the ReMemory summary entries, which is causing the LLM to hallucinate. I even set context size on the world books to 100% just to make sure (they don't take that many tokens anyway). I checked the Narrator character card I have: it has the world book linked to it, which I had to do for the extension to work. The entries are all blue, so it shouldn't be a keyword problem. What am I missing?
Disclaimer: I have a very self-deprecating sense of humor. I'm pretty careful to stay grounded in the real world between my partner and dogs; I just sometimes feel really lame about AI RP.
Chronic illness really nailed the whole “solitary confinement” vibe for me, but I found Silly Tavern SFW adventure roleplay after having found , and now I’m basically talking to imaginary people on purpose. Honestly? Beats arguing with the dogs, and real people forget "chronic illness" means it isn't going away/cured. Plus, it dragged me back into writing, which I thought was dead, buried, and never to return. Anyone else using it as a sanity-adjacent hobby? (Chronic illness or otherwise.) Do you use OCs or an established character/franchise? And who else has realized they enjoy coding?
Is SillyTavern worth it? Do you think it's a better and cheaper option than the other current options out there? How much do you pay to run it? Which model do you think is best for roleplay? How is the memory? Which model would you recommend, and how much do you spend on it?
I have tried Claude 4.0 Thinking, GPT-5, Grok 4, and Gemini 2.5 Pro.
I liked Claude the best of all.
I heard that o3 is good and very powerful, but I've only tried it for research purposes.
Can anyone share their experience with o3 if they've used it for RP?
- When creating an RP story-driven AI, should I create a new persona for each character or make it so the AI talks for all of them?
- Also, does an NSFW block cover all mature content or just sexual stuff? Like, if I want mature themes such as killing and injuries, do I also have to jailbreak a non-NSFW model?
- And can I go a bit overboard with the lorebook, or should I try to keep it short?
Does anyone know what happened to the site? I was finally going to use it again after a while. Since I barely use it, I don't know if this problem has been there for a while, but I'm just getting that it's currently unavailable. Did something bad happen to it? Did Janitor finally get rid of it, or is it probably just a temporary bug?
You want Chat Completion for models like Llama 3, etc. But without doing a few simple steps correctly (which you might have no knowledge about, like I did), you will just hinder your model severely.
To spare you the long story, I'll go straight to what you should do. I repeat: this is specifically about koboldcpp as the backend.
1. In the Connections tab, set Prompt Post-Processing to Semi-Strict (alternating roles, no tools). No tools because Llama 3 has no web search functions, etc., so that's one fiasco averted. Semi-strict alternating roles ensures the turn order passes correctly, but still allows us to swipe, edit, and go OOC. (With Strict, empty messages might be sent just so the strict order is maintained.) What happens if you leave this at "none"? Well, in my case, it wasn't appending roles to parts of the prompt correctly. Not ideal when the model is already trying hard not to get confused by everything else in the story, you know?! (Not to mention your 1,500-token system prompt, blegh.)
2. You must have the correct instruct template imported as your Chat Completion preset, in the correct configuration! Let me spare you the headache of being unable to find a CLEAN Llama 3 template for SillyTavern ANYWHERE on Google.
Copy-paste EVERYTHING below (including the { }) into Notepad, save it as .json, then import it in SillyTavern's Chat Completion settings as your preset.
{
"name": "Llama-3-CC-Clean",
"system_prompt": "You are {{char}}.",
"input_sequence": "<|start_header_id|>user<|end_header_id|>\n\n",
"output_sequence": "<|start_header_id|>assistant<|end_header_id|>\n\n",
"stop_sequence": "<|eot_id|>",
"stop_strings": ["<|eot_id|>", "<|start_header_id|>", "<|end_header_id|>", "<|im_end|>"],
"wrap": false,
"macro": true,
"names": true,
"names_force_groups": false,
"system_sequence_prefix": "",
"system_sequence_suffix": "<|eot_id|>",
"user_alignment_message": "",
"system_same_as_user": false,
"skip_examples": false
}
Reddit adds extra spaces; I'm sorry about that. It doesn't affect the file, but if you really have to, clean it up yourself.
This preset contains the bare functionality that koboldcpp actually expects from SillyTavern and is pre-configured for the specifics of Llama 3. Things like token count and your prompt configurations are not here; this is A CLEAN SLATE.
The upside of a CLEAN SLATE as your chat completion prompt is that it will 100% work with any Llama 3 based model, no shenanigans. You can edit the system prompt and whatever in the actual ST interface to your needs.
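If you want to double-check that the paste survived intact, one quick sanity check is to parse it with Python's json module. The snippet below just re-embeds the same preset as a string: if json.loads() raises here, SillyTavern's import would choke on it too.

```python
import json

# The preset from the guide, embedded verbatim as a raw string so the \n
# escapes stay literal (JSON strings may not contain real newlines).
PRESET_TEXT = r"""
{
  "name": "Llama-3-CC-Clean",
  "system_prompt": "You are {{char}}.",
  "input_sequence": "<|start_header_id|>user<|end_header_id|>\n\n",
  "output_sequence": "<|start_header_id|>assistant<|end_header_id|>\n\n",
  "stop_sequence": "<|eot_id|>",
  "stop_strings": ["<|eot_id|>", "<|start_header_id|>", "<|end_header_id|>", "<|im_end|>"],
  "wrap": false,
  "macro": true,
  "names": true,
  "names_force_groups": false,
  "system_sequence_prefix": "",
  "system_sequence_suffix": "<|eot_id|>",
  "user_alignment_message": "",
  "system_same_as_user": false,
  "skip_examples": false
}
"""

preset = json.loads(PRESET_TEXT)  # raises ValueError if the paste is corrupted
assert preset["stop_sequence"] == "<|eot_id|>"
assert preset["wrap"] is False
print("valid JSON, preset name:", preset["name"])
```

Alternatively, saving the pasted text to a file and running `python -m json.tool yourfile.json` does the same check from the command line.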
Fluff for the curious
Fluff (insane ramblings)
3. The entire Advanced Formatting window has no effect on the prompt being sent to your backend. The settings above need to be set in the file. You're in luck: as I've said, everything you need has already been correctly set for you. Just go and do it >(
4. In the Chat Completion settings, below the "Continue Postfix" dropdown, there are 5 checkboxes. LEAVE THEM ALL UNCHECKED for Llama 3.
5. Scroll down to the bottom where your prompt list is configured. You can outright disable "Enhance Definitions", "Auxiliary Prompt", "World Info (after)", and "Post-History Instructions". As for the rest, for EVERYTHING that has a pencil icon (edit button), press that button and ensure the role is set to SYSTEM.
6. Save the changes to update your preset. Now you have a working Llama 3 chat completion preset for koboldcpp.
(7!!!) When you load a card, always check what's actually loaded into the message list. You might stumble on a card that, for example, has the first message in "Personality", and then the same first message duplicated in the actual chat history. And some genius authors also copy-paste it all into Scenario. So, instead of outright disabling those fields permanently, open your card management and find the "Advanced Definitions" button. You will be transported into the realm of hidden definitions that you normally do not see. If you see the same text as the intro message (greeting) in Personality or Scenario, NUKE IT ALL!!! Also check the Example Dialogues at the bottom: IF instead of actual examples it's some SLOP about OPENAI'S CONTENT POLICY, NUUUUUUUKEEEEEE ITTTTTT AAAALALAALLALALALAALLLLLLLLLL!!!!!!!!!!!!! WAAAAAAAAAHHHHHHHHH!!!!!!!!!!
GHHHRRR... Ughhh... Motherff...
Well anyway, that concludes the guide, enjoy chatting with Llama 3 based models locally with 100% correct setup.
Does this normally happen?
The theme I'm using right now is "Discord Inspired." It's the only one that I can somewhat look at without feeling aversion. I really like SillyTavern for all the tools it offers. There's no equivalent as far as I know.
With that said, I really miss a professional, light theme (not dark like Discord Inspired is). I can only look at a dark theme for so long before hating it. I don't usually use dark themes for my apps anyway; it just doesn't feel right to me.
I've combed through the Discord server, but no luck. Haven't found a single theme I like. Any suggestions? And yes, I know that the vast majority of people don't use ST for what I need.
I created a language prompt for how the bot should use it. It's a different language from English, and yes, I use the Gemini API and it is somewhat capable. I was wondering where I should put that prompt in ST.
Also, how can I write that prompt better?
I'm extremely new to SillyTavern. Every time I use a bot (and I've used like 4 different models), they all say the context length is 4096, that it overflows my model, and that I need to increase my model's context length. I've made sure the models I was using were 8k or more. Am I being dumb, or is this common?
Exactly what it says: I'm having trouble with it giving actual NSFW responses instead of just toying around the subject.
I had issues with full finetuning of Gemma 3 4B; the main problems were masking and gradient explosion during training. I really want to train Gemma 3 12B, which is why I was using 4B as a test bed, but I got stuck there. I'd like to ask if anyone has a good suggestion or solution for this issue. I was doing the context-window-slicing kind of training, with masking set to outputs only, on a custom training script.
I've heard that they're throttling models for some reason
As I've been getting to know SillyTavern this summer, I found that I was constantly looking for explanations/reminders of how each Text Completion preset was best utilized. The ST Documentation section is great for explaining how things work, but doesn't seem to have a good description of why or how these presets are best applied. I had ChatGPT throw together a quick guide for my own reference, and I've found it enormously helpful. But I'm also curious as to how other users feel about the accuracy of these descriptions. Please feel free to share any wisdom or criticism. Happy Taverning!
_____
List of Text Completion presets as they appear in SillyTavern:
Almost
Asterism
Beam Search
Big O
Contrastive Search
Deterministic
Divine Intellect
Kobold (Godlike)
Kobold (Liminal Drift)
LLaMa-Precise
Midnight Enigma
Miro Bronze
Miro Gold
Miro Silver
Mirostat
Naive
NovelAI (Best Guess)
NovelAI (Decadence)
NovelAI (Genesis)
NovelAI (Lycaenidae)
NovelAI (Ouroboros)
NovelAI (Pleasing Results)
NovelAI (Sphinx Moth)
NovelAI (Storywriter)
Shortwave
Simple-1
simple-proxy-for-tavern
Space Alien
StarChat
TFS-with-Top-A
Titanic
Universal-Creative
Universal-Light
Universal-Super-Creative
Yara
Core / General Use
• Deterministic → Lowest randomness, outputs repeatably the same text for the same input. Best for structured tasks, coding, and when you need reliability over creativity.
• Naive → Minimal sampling controls, raw/unfiltered generations. Good for testing a model’s “bare” personality.
• Universal-Light → Balanced, lighter creative flavor. Great for everyday roleplay and chat without heavy stylization.
• Universal-Creative → Middle ground: creative but still coherent. Suited for storytelling and roleplay where you want flair.
• Universal-Super-Creative → Turned up for wild, imaginative, sometimes chaotic results. Best when you want unhinged creativity.
Specialized Sampling Strategies
• Beam Search → Explores multiple branches and picks the best one. Can improve coherence in long outputs but slower and less “human-like.”
• Contrastive Search → Actively avoids repetition and boring text. Great for dialogue or short, punchy prose.
• Mirostat → Adaptive control of perplexity. Stays coherent over long outputs, ideal for narration-heavy roleplay.
• TFS-with-Top-A → Tweaks Tail-Free Sampling with extra filtering. Balances novelty with control—often smoother storytelling than plain TFS.
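As a rough intuition for what these presets actually change: each one is essentially a saved bundle of sampler parameters (temperature, top-p, top-k, Mirostat settings, and so on). The toy sketch below uses made-up values, not the contents of any real preset file, to show the main lever, temperature, which is why "Deterministic" always repeats itself while creative presets vary between rerolls.

```python
import math

# Illustrative parameter bundles (values invented for the example).
deterministic = {"temperature": 0.0, "top_p": 1.0}
creative = {"temperature": 1.2, "top_p": 0.95}

def apply_temperature(logits, temperature):
    """Scale logits by temperature, then softmax into probabilities.
    Lower temperature sharpens the distribution; 0 means pure greedy."""
    if temperature == 0:
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0  # all mass on the argmax
        return probs
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy scores for three candidate tokens
print(apply_temperature(logits, creative["temperature"]))       # flatter: rerolls vary
print(apply_temperature(logits, deterministic["temperature"]))  # [1.0, 0.0, 0.0]: same pick every time
```

Top-p and top-k then prune that distribution before sampling; the flavor presets further down mostly differ in how aggressively they do both.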
Stylized / Flavor Presets
• Almost → Slightly more chaotic but not full-random. Adds flavor while staying usable.
• Asterism → Tends toward poetic, ornate language. Nice for stylized narrative.
• Big O → Large context exploration, verbose responses. For sprawling, detailed passages.
• Divine Intellect → Elevated, lofty, sometimes archaic diction. Great for “wise oracle” or fantasy prose.
• Midnight Enigma → Dark, mysterious tone. Suits gothic or suspenseful roleplay.
• Space Alien → Strange, fragmented, “not quite human” outputs. Good if you want uncanny/weird text.
• StarChat → Optimized for back-and-forth chat. More conversational than narrative.
• Shortwave → Snappy, shorter completions. Good for dialogue-driven RP.
• Titanic → Expansive, dramatic, epic-scale narration. Suits grand fantasy or historical drama.
• Yara → Tends toward whimsical, dreamy text. Nice for surreal or lyrical stories.
Kobold AI Inspired
• Kobold (Godlike) → Extremely permissive, very creative, sometimes incoherent. For raw imagination.
• Kobold (Liminal Drift) → Surreal, liminal-space vibe. Useful for dreamlike or uncanny roleplay.
NovelAI-Inspired
• NovelAI (Best Guess) → Attempts most “balanced” and typical NovelAI-style completions. Good baseline.
• NovelAI (Decadence) → Flowery, ornate prose. Suits romance, gothic, or lush description.
• NovelAI (Genesis) → Tries for coherent storytelling, similar to NovelAI default. Safe choice.
• NovelAI (Lycaenidae) → Light, whimsical, “butterfly-wing” text. Gentle and fanciful tone.
• NovelAI (Ouroboros) → Self-referential, looping, strange. Experimental writing or surreal play.
• NovelAI (Pleasing Results) → Tuned to produce agreeable, easy-to-read prose. Reliable fallback.
• NovelAI (Sphinx Moth) → Darker, more mysterious tone. Pairs well with gothic or horror writing.
• NovelAI (Storywriter) → Narrative-focused, coherent and prose-like. Best for longform fiction.
Miro Series (Community Presets)
• Miro Bronze → Entry-level creative balance.
• Miro Silver → Middle ground: more polish, smoother narration.
• Miro Gold → The richest/lushest prose of the three. For maximum “novelistic” output.
Utility
• simple-1 / simple-proxy-for-tavern → Minimalistic defaults, sometimes used for testing proxy setups or baseline comparisons.
_____
Rule of Thumb
• If you want stable roleplay/chat → Universal-Light / Universal-Creative / NovelAI (Storywriter).
• If you want wild creativity or surrealism → Universal-Super-Creative / Kobold (Godlike) / NovelAI (Ouroboros).
• If you want dark, gothic, or mystery flavor → Midnight Enigma / NovelAI (Sphinx Moth) / Divine Intellect.
• If you want short/snappy dialogue → Shortwave / Contrastive Search / StarChat.
• If you want epic/lush storytelling → Titanic / Miro Gold / NovelAI (Decadence).