tl;dr

Prep:
Select good images, quality over quantity
Train in 512x512, anything else can add distortion
Use BLIP and/or deepbooru to create labels
Examine every label, remove whatever is wrong, add whatever is missing

Training:
Learning Rate: 5e-5:100, 5e-6:1500, 5e-7:10000, 5e-8:20000
Prompt Template: a .txt with only [filewords] in it.
Steps: 20000 or less should be enough.

Longer explanation:
Select good images, quality over quantity
My best trained model was done using 21 images. Keep in mind that hypernetwork style transfer is highly dependent on content. If you pick an artist that only does cityscapes and then ask the AI to generate a character with his style, it might not give the results you expect. The hypernetwork intercepts the words used during training, so if there are no words describing characters, it doesn't know what to do. It might work, might not.
Train in 512x512, anything else can add distortion
I've tested other resolutions several times and haven't gotten good results out of them yet. So up to you.
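If you'd rather do the 512x512 crop yourself instead of in the UI, here's a minimal sketch with Pillow. The folder names are placeholders I made up, not anything from this guide:

```python
# Minimal sketch (my own, not from the webui): center-crop and resize a
# folder of training images to 512x512 with Pillow.
from pathlib import Path
from PIL import Image

SRC = Path("training_images_raw")    # hypothetical input folder
DST = Path("training_images_512")    # hypothetical output folder
DST.mkdir(exist_ok=True)

for path in sorted(SRC.iterdir()):
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    img = Image.open(path).convert("RGB")
    side = min(img.size)                         # largest centered square
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img = img.resize((512, 512), Image.LANCZOS)
    img.save(DST / f"{path.stem}.png")
```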
Use BLIP and/or deepbooru to create labels, and examine every label: remove whatever is wrong, add whatever is missing
It's tedious and might not be necessary; if you see that BLIP and deepbooru are working well, you can leave the labels as they are. Either way, describing the images is important so the hypernetwork knows what it is trying to change to be more like the training image.
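If you want a quick way to review the labels outside the UI, here's a minimal sketch. It assumes the usual layout where each image sits next to a caption .txt with the same filename (which is what [filewords] reads later); the folder name is a placeholder:

```python
# Minimal sketch: print every caption for review and flag images that are
# missing a label file. Assumes one caption .txt per image, same filename stem.
from pathlib import Path

DATASET = Path("training_images_512")   # hypothetical dataset folder

for img in sorted(DATASET.glob("*.png")):
    caption_file = img.with_suffix(".txt")
    if not caption_file.exists():
        print(f"MISSING LABEL: {img.name}")
        continue
    caption = caption_file.read_text(encoding="utf-8").strip()
    print(f"{img.name}: {caption}")
```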
Learning Rate: 5e-5:100, 5e-6:1500, 5e-7:10000, 5e-8:20000
They added a learning rate scheduler to the webui a couple of days ago. I've seen people recommending training fast with higher rates and so on; well, this schedule kind of does that while staying safe to use. I haven't had a single model go bad yet at these rates, and if you let it go to 20000 it captures the finer details of the art/style.
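The way I read the webui's schedule syntax, each rate:step pair means "use this rate until that step". A minimal sketch of that interpretation (my own illustration, not the webui's actual parser):

```python
# Illustration of how the stepped schedule reads: each "rate:step" pair
# means "use this rate until that step". Not the webui's actual parser.
SCHEDULE = "5e-5:100, 5e-6:1500, 5e-7:10000, 5e-8:20000"

def lr_at(step: int, schedule: str = SCHEDULE) -> float:
    pairs = []
    for chunk in schedule.split(","):
        rate, end = chunk.strip().split(":")
        pairs.append((float(rate), int(end)))
    for rate, end in pairs:
        if step <= end:
            return rate
    return pairs[-1][0]  # past the last pair, keep the final rate

for s in (50, 1000, 5000, 15000):
    print(s, lr_at(s))   # 5e-05, 5e-06, 5e-07, 5e-08
```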
Prompt Template: a .txt with only [filewords] in it.
If your blip/booru labels are correct, this is all you need. You might want to use the regular hypernetwork txt file if you want to remove photo/art/etc bias from the model you're using. Up to you.
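To make concrete what a [filewords]-only template does: during training the token is replaced with the caption from each image's .txt file, so the prompt is exactly the label you reviewed. A rough illustration (not the webui's actual template code; the file path is made up):

```python
# Rough illustration of a [filewords]-only template: the training prompt for
# each image is just that image's caption.
from pathlib import Path

TEMPLATE = "[filewords]"                          # entire contents of the template .txt
img = Path("training_images_512/rocha_01.png")    # hypothetical image

caption = img.with_suffix(".txt").read_text(encoding="utf-8").strip()
prompt = TEMPLATE.replace("[filewords]", caption)
print(prompt)   # the prompt is exactly the reviewed caption
```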
Steps: 20000 or less should be enough.
I'd say it's usable in the 5000-10000 range with my learning rate schedule up there. Buuut you will notice that in the 10000-20000 range a lot of the finer details show up. So, as The Rock would say, put in the work, put in the hours.
Final notes after The Rock intermission.
If your model uses a VAE, keep it on. I don't know if it makes a difference in training, but just making sure.
Unload any other hypernetworks. I'm not sure if they interfere with training, but better safe than sorry.
If your model breaks and the preview tests show colorful noise, don't just go back a little; pick an even earlier checkpoint to train onward from and reduce the learning rate even more (see the example below).
Don't change your training data midway; it's better to start over.
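For example (my own illustration, not a tested recipe), dropping every rate in the schedule above by a factor of ten gives a much gentler restart: 5e-6:100, 5e-7:1500, 5e-8:10000, 5e-9:20000.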
Examples:
Trained NAI for 6500 steps on Andreas Rocha style. I plan on letting it train to 20000 later. And done.
[Image: Vanilla NAI]
[Image: RTX On, I mean, Style on]