The DEFINITIVE Comparison of Upscalers
I've been seeing a lot of piecemeal upscaler model comparisons on the subreddit. Some old, some with models that aren't in the SD WebUI, some only focused on a single image type. I really needed to figure out which is the right one in the right circumstance. So, here it is...
Meet the doggos:
a realistic ((photo)) of a dog, sitting on a grass field, photo by Alasdair McLellan Steps: 60, Sampler: Euler a, CFG scale: 7, Seed: 2589807749, Size: 512x512, Model hash: 7460a6fa (SD-v1.4)
a painting of a dog, sitting on a grass field, art by Vittorio Matteo Corcos Steps: 60, Sampler: Euler a, CFG scale: 7, Seed: 2524032541, Size: 512x512, Model hash: 7460a6fa (SD-v1.4)
an anime animation of a dog, sitting on a grass field, photo by Studio Ghibli Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 1580678771, Size: 512x512, Model hash: 0b8c694b (WD-v1.2)
These are all 512x512 pics, and I'm running every upscaler at 4x to blow them up to 2048x2048. This is a no-frills run in the Extras tab; for an actual upscale I do recommend SD Upscale with a low blur. Still, this test is useful for nailing down which model to use within those steps.
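If you want to repeat this kind of run over many images, the Extras upscale can also be scripted. This is a minimal sketch. It assumes a local AUTOMATIC1111 WebUI launched with `--api` on the default port 7860. The route and the field names (`image`, `upscaling_resize`, `upscaler_1`) reflect my understanding of its `/sdapi/v1/extra-single-image` endpoint, so check the `/docs` page of your own install before relying on them. Upscaler names have to match the WebUI dropdown exactly.

```python
import base64
import json
import urllib.request

# Assumption: local AUTOMATIC1111 WebUI started with --api. Verify the route and
# field names against http://127.0.0.1:7860/docs on your install.
API_URL = "http://127.0.0.1:7860/sdapi/v1/extra-single-image"


def build_extras_payload(png_bytes: bytes, upscaler: str, scale: float = 4.0) -> dict:
    """Build a no-frills Extras request: one upscaler, one fixed scale factor."""
    return {
        "image": base64.b64encode(png_bytes).decode("ascii"),
        "upscaling_resize": scale,  # 4x turns 512x512 into 2048x2048
        "upscaler_1": upscaler,     # must match the name shown in the WebUI dropdown
    }


def upscale_via_api(png_bytes: bytes, upscaler: str, scale: float = 4.0) -> bytes:
    """POST the payload to the WebUI and return the decoded upscaled image bytes."""
    body = json.dumps(build_extras_payload(png_bytes, upscaler, scale)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return base64.b64decode(result["image"])
```

To build a comparison like this one, read each dog PNG as bytes and call `upscale_via_api(data, name)` once per upscaler name. Then save each result next to the others.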
Full-sized Comparisons (very large image grids):
A more useful look, though, is up close:
What conclusions can we draw from these comparisons?
Original - This is just the original 512x512 image, scaled 4x in ImageMagick with no interpolation or dithering. Each pixel is simply duplicated into a 4x4 square (nearest-neighbor), which makes it easier to compare against the others.
None - The "None" selector. You would expect it to behave like ImageMagick's plain scale, but it actually looks like it's using Lanczos.
Lanczos - Just a basic resampling algorithm, a step below what you would find in something like Photoshop, but not by much. All of these non-AI methods are crap; the AI upscalers are way better.
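Here is a concrete look at the two non-AI baselines. "Original" is plain pixel duplication (nearest-neighbor). Lanczos instead computes each new pixel as a weighted sum of nearby source pixels, using the windowed-sinc kernel L(x) = sinc(x) * sinc(x/a) for |x| < a (0 otherwise). The following is a minimal pure-Python sketch on a toy 2D grid and a single 1D row. It illustrates the techniques only and is not the code ImageMagick or the WebUI actually runs.

```python
import math


def nearest_upscale(pixels: list[list[int]], factor: int) -> list[list[int]]:
    """Duplicate every pixel into a factor x factor square (no interpolation)."""
    out = []
    for row in pixels:
        wide = [p for p in row for _ in range(factor)]   # repeat horizontally
        out.extend([list(wide) for _ in range(factor)])  # repeat vertically
    return out


def lanczos_kernel(x: float, a: int = 3) -> float:
    """Lanczos window: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def lanczos_upscale_1d(samples: list[float], factor: int, a: int = 3) -> list[float]:
    """Resample one row with a Lanczos filter (a 2D image does rows, then columns)."""
    n = len(samples)
    out = []
    for j in range(n * factor):
        # Centre of output sample j, expressed in input-pixel coordinates.
        x = (j + 0.5) / factor - 0.5
        first = math.floor(x) - a + 1
        acc = weight_sum = 0.0
        for i in range(first, first + 2 * a):
            w = lanczos_kernel(x - i, a)
            acc += w * samples[min(max(i, 0), n - 1)]  # clamp indices at the edges
            weight_sum += w
        out.append(acc / weight_sum)  # normalise so flat areas stay flat
    return out
```

The kernel's negative lobes make Lanczos look sharper than simple blending. They also make it overshoot (ring) around hard edges. Neither method can invent detail that isn't in the 512x512 source, and that is where the AI upscalers pull ahead.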
And I'll summarize the rest:
Upscaler | Photos | Paintings | Anime/Animation |
---|---|---|---|
LDSR | Much slower than anything else, but very good for photos | Too much random noise | Better, but still noisy |
BSRGAN | Good, subtly sharp without going too far | Okay, but maybe a bit too smooth | Good, but R-ESRGAN is better |
ESRGAN_4x | SUPER sharp, good here, but might be a tad too unrealistic | Too grainy, but might be good for a textured paint look | Terrible, worse than non-AI methods |
R-ESRGAN-General-4xV3 | Okay, like BSRGAN, but a bit too much blur | Ditto | Much better here, but not as good as R-ESRGAN-Anime |
R-ESRGAN-General-WDN-4xV3 | Closer to BSRGAN | Very good, texture and definition without being overbearing | Also good, also not as good as R-ESRGAN-Anime |
R-ESRGAN-AnimeVideo | Tends to "unphoto" a subject | Ditto | 2nd best for anime, but Anime6B is better |
R-ESRGAN-4x+ | Basically BSRGAN | Slightly more texture than BSRGAN | Basically BSRGAN |
R-ESRGAN-4x+-Anime6B | Straight-up anime dog | Also anime | The best for anime |
ScuNET-GAN | Too blurry | Too blurry | Average |
ScuNET-PSNR | Too blurry | Too blurry | Hot garbage |
SwinIR_4x | Yuck, I see tile lines! | Good, but not as textured as General-WDN | Hot garbage |
TL;DR
Picture type | Recommendations |
---|---|
Photos | LDSR (but it's slow), or ESRGAN_4x if you want super-sharp detail and/or speed, or BSRGAN for subtlety |
Paintings | ESRGAN_4x for high paint texture and detail, General-WDN for a better overall look |
Anime | Anime6B, also good for turning something into anime |
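The TL;DR can also be written down as a small lookup, for example when scripting batch upscales. This is only a sketch. The style keys (`best`, `sharp`, `subtle`, `textured`) are hypothetical, and the model names are this post's labels, which may not match the WebUI dropdown spelling exactly.

```python
# Encodes the TL;DR table. Model names are the labels used in this post;
# the WebUI dropdown may spell them differently (e.g. with spaces).
RECOMMENDED = {
    "photo": {"best": "LDSR", "sharp": "ESRGAN_4x", "subtle": "BSRGAN"},
    "painting": {"best": "R-ESRGAN-General-WDN-4xV3", "textured": "ESRGAN_4x"},
    "anime": {"best": "R-ESRGAN-4x+-Anime6B"},
}


def pick_upscaler(picture_type: str, style: str = "best") -> str:
    """Return the recommended upscaler, falling back to 'best' for unknown styles."""
    options = RECOMMENDED[picture_type]
    return options.get(style, options["best"])
```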
There is now also SwinIR2, or v2, which is an improvement on SwinIR.
In all my cases it's better than ESRGAN_4x.
Woah, where’s this one at?
Our 3x diffusion use cases: Photo, Painting, and Waifu
Awesome comparison, thank you for making this. I’m a big fan of SwinIR, surprised to see you didn’t think favorably of its results.
You might try Remacri, it's one of my favorites.
What about Gobig/txt2imghd?
tl;dr it slices the photo and uses img2img to add details to the upscaled slices, which can bring out minute details that didn't exist there previously.
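The slicing step can be sketched as follows. This minimal sketch assumes a 512 px tile and a 64 px overlap. Both are hypothetical values, not necessarily what SD upscale or Gobig use. It computes overlapping tile boxes over the already-upscaled image. Each box would then be cropped, run through img2img, and pasted back.

```python
def tile_boxes(width: int, height: int, tile: int = 512,
               overlap: int = 64) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) boxes covering the image with overlapping tiles."""

    def starts(size: int) -> list[int]:
        if size <= tile:
            return [0]  # one tile covers this axis
        stride = tile - overlap
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]
```

For a 2048x2048 image this gives a 5x5 grid of 512 px tiles. The overlap gives each img2img pass some shared context with its neighbours, which helps when stitching the tiles back together.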
I already recommended using the "SD upscale" feature above. There's a link to a guide for how to do that.
This was very useful, thanks a lot for posting it!
I was mainly interested in the painting upscaler, so I ran a few tests of my own, including two upscalers that weren't tested here (and one of them seems better than ESRGAN_4x and General-WDN).
4x_foolhardy_Remacri with 0 denoise, so as to perfectly replicate a photo.
And 4x_foolhardy_Remacri with 0 denoise again, but this time upscaled with Tiled Diffusion. I hope this helps everyone who needs to work with upscalers.
One thing I didn't compare it on is realistic pictures, so I don't know if it performs better than those too. If you have the time to add it to the list and test it on the same images, that would be helpful.
https://huggingface.co/FacehugmanIII/4x_foolhardy_Remacri - this is where I got it from, maybe it's in other places too.
While not included automatically, all 4x ESRGAN models are supported. Some of these are considered the best upscalers there are.
How do you enable them? What contexts are they useful for?
This is very nice. I will definitely try WDN and 6B, and see how they fare in comparison to what I am currently using.
What about stacking upscalers? That is, using one after the other?
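Stacking means feeding one upscaler's output into the next, so the scale factors multiply (two 4x passes give 16x). This is a minimal sketch that plans such a chain. It swaps models as the image grows, following the ESRGAN-for-smaller, SwinIR-for-larger rule of thumb in the next reply. The 1024 px switch point and the model names are hypothetical placeholders, and no actual upscaling happens here.

```python
def plan_stages(start_px: int, target_px: int, switch_px: int = 1024,
                small_model: str = "ESRGAN_4x", large_model: str = "SwinIR_4x",
                factor: int = 4) -> list[tuple[str, int, int]]:
    """Chain fixed-factor upscalers until target_px is reached.

    Returns (model, size_before, size_after) per stage. switch_px and the model
    names are hypothetical; the last stage may overshoot, so downscale after.
    """
    stages = []
    size = start_px
    while size < target_px:
        model = small_model if size < switch_px else large_model
        stages.append((model, size, size * factor))
        size *= factor
    return stages
```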
I have found that ESRGAN works well for small to medium images, while SwinIR works well for medium to large ones.
Other ones I use sometimes:
003_realSR_BSRGAN_DFOWMFC_s64w8_SwinIR-L_x4_GAN (this is my favourite if it doesn't create tile lines. I don't know why they happen with some pictures; what are your SwinIR settings?)
4x-UltraSharp
4x_foolhardy_Remacri
4x_Valar_v1
Maybe you could try them too.
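On the tile lines: one likely cause is that tiled upscalers process the image in separate tiles. With too little overlap, the tile borders don't line up. The WebUI settings include a tile overlap option for SwinIR that may be worth raising. A common fix in tiled pipelines is to cross-fade overlapping tiles. Below is a minimal 1D sketch of a linear feather blend. The function name and the linear ramp are my own choices for illustration, not the WebUI's implementation.

```python
def feather_blend_1d(left: list[float], right: list[float], overlap: int) -> list[float]:
    """Join two tiles that share `overlap` samples, cross-fading linearly across them."""
    if overlap <= 0:
        return left + right  # hard seam: nothing to blend
    head = left[:-overlap]
    tail = right[overlap:]
    blended = []
    for k in range(overlap):
        w = (k + 1) / (overlap + 1)  # ramps from near 0 (mostly left) to near 1 (mostly right)
        blended.append((1 - w) * left[len(left) - overlap + k] + w * right[k])
    return head + blended + tail
```

With no overlap, the jump between two tiles shows up as one hard step. With a few samples of overlap, the same difference is spread out into small steps, which is much harder to spot.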
Great Comparison!
Which is best for hallucinating contextual details from thumbnail-sized images? LDSR?
It won't be long before we can upscale at 32x... from 32px to 1024px with contextually suitable textures and hallucinated details!
I only have stock Automatic right now. If I were to download one for each of the three prompts, which would you suggest?