
SDXL Pony Fast Training Guide

Published: Mar 15, 2024
Type: LoRA (SafeTensor)
Base Model: Pony
Training Epochs: 50

This guide explains my method for training character models.

Using 20 images, you can create an SDXL Pony LoRA in just 15 minutes of training time.

This guide assumes you have experience training with kohya_ss or sd-scripts. It skips over tool operation details.


In putting this training method together, I referred to the excellent guide at the following URL: https://civitai.com/models/281404/lora-training-guide-anime-sdxl

【Training Environment】

Recommended VRAM: 12GB or higher (Confirmed working on RTX 4060Ti 16GB)

*Can be trained with 10GB VRAM if using the FP8 option.

【Tools Used】

kohya_ss GUI: https://github.com/bmaltais/kohya_ss

I installed kohya_ss using Stability Matrix: https://github.com/LykosAI/StabilityMatrix

Pony Diffusion V6 XL: https://civitai.com/models/257749?modelVersionId=290640


zunko_dataset (20 images & tags): https://files.catbox.moe/lnelg0.zip

zunko_Exclude_tag_list.txt: https://files.catbox.moe/2jbc93.txt
kohya_ss preset (zunko_pony_prodigy_v1.json): https://files.catbox.moe/t5clrs.json

【Training Data】

Number of images: 20-40

Using more than this may decrease reproducibility. Consistent quality is more important than quantity.

It's best if the images come from the same illustrator, TV series, etc., so the art style is consistent.

For fan art, try to gather illustrations with as consistent an art style as possible.

For this, I borrowed publicly released AI training data from the Japanese ZUNKO project: https://zunko.jp/con_illust.html

I selected 20 illustrations of zunko in the same outfit and converted the 768x1024 PNGs to WEBP format.

*sd-scripts supports WEBP files, which have a much smaller file size, so I prefer using them.
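If you want to batch-convert the images yourself, here is a minimal sketch using Pillow; the folder names and quality value are placeholders I chose for illustration, not part of the original workflow:

```python
# Minimal sketch: convert a folder of PNGs to WEBP with Pillow.
# "dataset_png"/"dataset_webp" are placeholder folder names; adjust to your layout.
from pathlib import Path
from PIL import Image

src = Path("dataset_png")    # input folder with the 768x1024 PNGs
dst = Path("dataset_webp")   # output folder for the WEBP copies
dst.mkdir(exist_ok=True)

for png in src.glob("*.png"):
    img = Image.open(png).convert("RGB")
    # quality=90 is an assumption; tune it to balance file size vs. fidelity
    img.save(dst / (png.stem + ".webp"), "WEBP", quality=90)
```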

【Tagging】

Use the WebUI wd14tagger extension to re-tag the images:

Model: moat-tagger-v2

Weight Threshold: Default 0.35

  1. Select「Batch from directory」

    Set the input & output directory paths.

  2. Additional tags: "zunko,score_9,source_anime,znkAA"

    Character Name: zunko

    Trigger Word: znkAA

    Quality Tags: score_9, source_anime

  3. Excluded Tags
    - Remove all character traits (green hair, yellow eyes, long hair...)

    - Remove clothing traits except one (kept "japanese clothes")

    I've attached a list of the words I excluded, so pasting that into the excluded tags field should give the same result. (A script sketch for applying the same tag edits outside the WebUI follows this list.)

    Ideally, we would consolidate everything into the trigger word, but with so few training steps it's hard for the model to learn that "znkAA" refers to the outfit.

    So instead, I let the model absorb the outfit traits into a clothing concept it already recognizes, "japanese clothes", and add "znkAA" as a supplement.


    - Leave tags for character poses, compositions, and undesirable objects (bows, books, food, etc.)
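If you prefer to apply the additional and excluded tags to the caption .txt files with a script rather than the WebUI fields, a rough sketch like the following should give an equivalent result. The folder name is a placeholder, and the excluded set shown is only an excerpt; the full list is in the attached zunko_Exclude_tag_list.txt.

```python
# Rough sketch: prepend the additional tags and drop excluded tags in
# kohya-style caption .txt files. Paths and the excluded set are placeholders;
# use the attached zunko_Exclude_tag_list.txt for the complete exclude list.
from pathlib import Path

caption_dir = Path("dataset_webp")   # captions live next to the images
additional = ["zunko", "score_9", "source_anime", "znkAA"]
excluded = {"green hair", "yellow eyes", "long hair"}  # excerpt only

for txt in caption_dir.glob("*.txt"):
    tags = [t.strip() for t in txt.read_text(encoding="utf-8").split(",")]
    tags = [t for t in tags if t and t not in excluded]
    # put the additional tags first, then the remaining tags (no duplicates)
    merged = additional + [t for t in tags if t not in additional]
    txt.write_text(", ".join(merged), encoding="utf-8")
```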

【Start Training】

Launch kohya_ss and select the "LoRA" tab. Be careful not to open a LoRA preset while the DreamBooth tab is selected.
I've attached a preset, so download that and "Open" it from the settings.

Adjust the file and Source model paths for your environment. Also adjust the Mixed precision and Save precision settings based on your accelerator (e.g. fp16).

Base Settings:

Optimizer: Prodigy, LR: 1 (Prodigy adapts the learning rate itself, so it is left at 1)

Network Dim: 16, Network Alpha: 2

Batch size: 3, Repeats: 1, Epochs: 50
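"Repeats: 1" refers to kohya's folder-naming convention, where the image folder name starts with the repeat count (e.g. 1_zunko). As a minimal sketch of that layout, assuming placeholder paths rather than the exact ones in the preset:

```python
# Minimal sketch: copy images + captions into kohya's "<repeats>_<name>" layout,
# e.g. train/img/1_zunko/. All paths here are placeholders.
import shutil
from pathlib import Path

src = Path("dataset_webp")              # *.webp images and *.txt captions
train_img = Path("train/img/1_zunko")   # "1_" = one repeat per image per epoch
train_img.mkdir(parents=True, exist_ok=True)

for f in src.iterdir():
    if f.suffix in {".webp", ".txt"}:
        shutil.copy2(f, train_img / f.name)
```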

If you get an OOM error due to low VRAM, try checking the fp8 training option.

On my setup, 50 epochs took 14 minutes. Time will vary based on your PC specs.
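As a rough sanity check on timing, the total number of optimizer steps at these settings is small, which is why a run can finish this quickly. A back-of-the-envelope calculation (the exact count kohya reports may differ slightly):

```python
# Rough step count at the settings above: 20 images, 1 repeat, 50 epochs, batch 3.
import math

images, repeats, epochs, batch = 20, 1, 50, 3
steps_per_epoch = math.ceil(images * repeats / batch)  # 7
total_steps = steps_per_epoch * epochs                  # 350
print(total_steps)  # ~350 steps; 14 minutes works out to roughly 2.4 s per step
```

If a run takes hours instead of minutes, comparing the reported step count and seconds-per-step against these rough numbers is a quick way to see which one is off.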

【Selection】

Finally, review the results and pick your preferred epoch. 50 epochs is just a guideline - the final epoch isn't necessarily best.

The preset saves a checkpoint every 10 epochs, but saving every 5 may be better.

If the training data and model are well suited, it may converge quickly.

Discussion

vokar28

Does this work with realistic subjects, or only anime?

gynoidneko

Don't know what I'm doing wrong but when I try this my training literally takes days. I'm using a 4090, with plenty of RAM and storage, and a new i9. I have no idea what I'm doing wrong. I can't find any preset file to use.

Robeloto

Great guide bro! ^^

kcatpa42

Works well, thank you~

ApexThunder

Hi, could you make a video? It would be much easier that way, since the new versions of the kohya_ss GUI are very different from the images you used in the tutorial.

Wouldn't using images of the same style simply cook the style in the character lora? Personally I don't want that, I want as much control over the style and colors as possible. Are you sure this is the best way to go about it?

YikaPanic

Can't load tokenizer for 'openai/clip-vit-large-patch14'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'openai/clip-vit-large-patch14' is the correct path to a directory containing all relevant files for a CLIPTokenizer tokenizer.

what can I do

lemmywinks1965

50 epochs took 4 hours for me. Nvidia 4080 with 16GB VRAM. Don't see how it can be 15 minutes