Sample outputs from FGO StyleGAN.

FGO StyleGAN: This Heroic Spirit Doesn’t Exist

When I first saw Nvidia's StyleGAN results, they looked like black magic to me. I am not as experienced with GANs as with other parts of deep learning, and that lack of experience, along with the thought that I lacked the GPU firepower to train my own StyleGAN, stopped me from jumping in sooner. For scale, on the StyleGAN GitHub Nvidia lists the GPU requirements: training from scratch takes around one week on 8 GPUs, and with a single GPU the training time is around 40 days. Running one of my GPU rigs for 40 days sounded terrifying, both in terms of time and my electric bill. With those constraints I set aside my ambitions of training a StyleGAN for a while.

FGO StyleGAN outputs

Leap of Faith: Custom FGO StyleGAN

A few weeks ago a teammate sent me some videos on LinkedIn of fashion models morphing into one another, in a video style I recognized as an application of StyleGAN. Digging into it more, I saw that a lot of work had gone on in the community around StyleGAN since I had last looked. I do most of my work in PyTorch these days, but when adapting research to your own projects it is often easiest to use whatever tools the research was done with. In this case, while there is a PyTorch port that seems fairly functional, the best course of action was to use the TensorFlow-based code the research was done with, which Nvidia has open-sourced.

What made me take the leap of faith to train my own StyleGAN, though, was Gwern Branwen's work on the website "This Waifu Does Not Exist". Frankly, I would not have devoted the time and resources to train a StyleGAN if I had not seen Gwern's post walking through their process; they were also kind enough to provide pretrained weights for an anime-based StyleGAN trained at 512x512 resolution.

Gwern showcases anime-based StyleGANs, both ones they trained and ones trained by others using the weights they provided. My project is similar to some of those; for example, someone in the community trained a "Saber face" StyleGAN, while the StyleGAN in this post covers Fate Grand Order characters in general.

Brief GAN Background

StyleGAN is an improvement over a previous Nvidia model called ProGAN. ProGAN generates high-quality 1024x1024 images through a progressive training cycle: it starts training at low resolution (4x4) and increases the resolution over time by adding layers. Starting at low resolution makes training faster and improves the quality of the final images, since the networks first learn important lower-level characteristics. However, ProGAN offers limited control over the generated images, which is where StyleGAN comes in. StyleGAN is based on ProGAN, with additions to the generator network that allow control over three levels of features:

  1. Coarse: affects pose, general hair style, face shape, etc.
  2. Middle: affects finer facial features, hair style, eyes open/closed, etc.
  3. Fine: affects color scheme (eye, hair, and skin) and micro features.

This is just a brief description of StyleGAN; for more information, check out the paper or other write-ups on Medium.
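For the curious, the open-sourced code exposes these controls through the intermediate "dlatents" produced by the mapping network. Below is a minimal sketch, based on the repo's Gs.components.mapping / Gs.components.synthesis API, of swapping one image's coarse styles for another's; the snapshot filename is a placeholder, and the layer ranges are the approximate coarse/middle/fine splits from the paper.

import pickle
import numpy as np
import dnnlib
import dnnlib.tflib as tflib

tflib.init_tf()
with open('network-snapshot.pkl', 'rb') as f:  # placeholder: any trained StyleGAN pickle
    _G, _D, Gs = pickle.load(f)

# Map two random latents through the mapping network to W space: shape [N, num_layers, 512].
latents = np.random.randn(2, Gs.input_shape[1])
dlatents = Gs.components.mapping.run(latents, None)

# Copy the coarse layers (roughly layers 0-3) of image 1 into image 0:
# pose, face shape, and general hair style follow image 1; finer details stay from image 0.
dlatents[0, 0:4] = dlatents[1, 0:4]

images = Gs.components.synthesis.run(
    dlatents[:1], randomize_noise=False,
    output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True))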

This one shows a few male faces. However, a lot of them turn into super evil-looking images. Maybe guys are just evil? Who knows. This one also shows a number of lower-quality generated images, probably because I did not properly remove low-resolution images when building the dataset.

Dataset Building and Preparation

I used a previously built TensorFlow object detector to crop the heads out of around 6K FGO wallpaper/fan art images.

The FGO dataset I used comprises ~6K images of various sizes: wallpapers, in-game images, fan art, and so on. I ran a TensorFlow-based, FGO-tuned head detector over this dataset and extracted around ~8K heads. I then cleaned the dataset by hand in a quick pass, cutting the final set down to ~6K heads. One lesson learned is that I should have spent more time on this step: a number of lower-quality images, along with images of backgrounds, armor, and other non-character content, were left in the dataset, which causes weird artifacts or just lower-quality generated images.

Like other TensorFlow networks, StyleGAN relies on tfrecord files, which can be generated for training by the dataset_tool.py file in the StyleGAN repo. One important note when training a StyleGAN is that the images in the dataset need to be the same size and color format as those used for the StyleGAN providing the pretrained weights. Using the original 1024x1024 network would have been difficult for me, since finding anime headshots at that size is hard and my GPU likely could not handle it. That makes the 512x512 network trained by Gwern extremely attractive: it is easier to find heads close to that resolution, and my GPU can handle it more easily. So I converted all the images to sRGB and resized them to 512x512.

To do the resizing and reformatting I used Python's PIL library; a minimal version of that pass is sketched below. Once that was done, I was ready to start the long and arduous process of fine-tuning a StyleGAN for my FGO use case.
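The sketch, with placeholder folder names, and the repo's dataset_tool.py turning the cleaned folder into tfrecords afterwards:

import os
from PIL import Image

src_dir, dst_dir = 'heads_raw', 'heads_512'  # placeholder folder names
os.makedirs(dst_dir, exist_ok=True)
for name in os.listdir(src_dir):
    img = Image.open(os.path.join(src_dir, name)).convert('RGB')  # normalize color mode
    img = img.resize((512, 512), Image.LANCZOS)                   # StyleGAN needs one uniform size
    img.save(os.path.join(dst_dir, os.path.splitext(name)[0] + '.png'))

# Then build the tfrecords: python dataset_tool.py create_from_images datasets/fgo heads_512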

Training

Images generated during training over 85 ticks. A few boxes were based on low-quality or non-headshot images, so they never developed well. However, it is interesting to see the model trying to generate headgear (with varying success), as well as a few male characters who show up with cool, overly dramatic facial features and hair.

The training process for StyleGAN is controlled by two scripts in the Nvidia repo: train.py and training/training_loop.py. For this I mostly followed the recommendations Gwern laid out in their blog post.

training/training_loop.py

One of the main parameters to set is resume_kimg. If it is set to 0, the StyleGAN begins training from what appears to be a random initialization. Instead, I set resume_kimg=7000; at that level, the StyleGAN makes use of the pretrained weights and starts from a much better point than it would otherwise.
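Concretely, that means editing the resume defaults in training/training_loop.py (variable names follow the Nvidia repo; the weights path below is a placeholder for Gwern's anime snapshot):

# training/training_loop.py: resume settings (repo defaults are None / 0.0)
resume_run_id = 'anime-faces-512.pkl'  # placeholder: point this at the pretrained anime weights
resume_kimg = 7000.0                   # pretend 7000 kimg are already done so training starts at full 512px resolution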

I was having OOM issues and eventually traced them to the metrics call on line 266, so I ended up commenting it out. So far I have had no further issues.
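For anyone hitting the same error, the call I disabled looks roughly like this (your line number may differ between repo revisions):

# training/training_loop.py, ~line 266: commented out to avoid OOM during snapshot evaluation
# metrics.run(pkl, run_dir=submit_config.run_dir, num_gpus=submit_config.num_gpus, tf_config=tf_config)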

train.py

desc += '-faces'; dataset = EasyDict(tfrecord_dir='faces', resolution=512); train.mirror_augment = True

The second area specifies the training process itself. Since I have one GPU, I set the number of GPUs to 1 and the minibatch size to 4 images. After that comes the (unadjusted) learning rate schedule and the total training length of 99,000 kimg, i.e. roughly 99 million images shown, which essentially sets how long the model will train.
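Put together, the relevant train.py lines for my run looked roughly like this (the minibatch schedule is the repo's stock single-GPU setting; exact values may differ between revisions):

# train.py: single-GPU schedule plus total training length
desc += '-1gpu'; submit_config.num_gpus = 1; sched.minibatch_base = 4
sched.minibatch_dict = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}
train.total_kimg = 99000  # 99,000 kimg = ~99M images shown; effectively "train until I stop it"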

Future Improvements

Hardware

While AWS is probably the easier approach, with the amount of training I do I would quickly rack up enough AWS fees to make me wish I had upgraded my GPU instead. Folks in the community have mentioned that building your own GPU rig is around 10X cheaper than using resources like AWS.

Paying hundreds of dollars to test out a StyleGAN seems terrifying to me.

So my main thought at the moment is to upgrade to a 1080 Ti (~$800-1000) or, if I really want to go all in, a 2080 Ti for a one-time cost of ~$1300. If anyone has thoughts on this, feel free to let me know!

Either should help to cut down the training time required since they are significantly faster than my current main 1080 GPU.

More selected outputs

Dataset Quality

In my defense, I was flying out to a Wing Chun seminar the next day and wanted to start training the StyleGAN before I left rather than lose the 48 hours of training time I could get while I was gone.

Some initial ideas: remove low-quality images, remove non-head images, and remove duplicates.

A simple way to remove low-quality images would be to delete images below a certain resolution, or ones without a side above a certain cutoff, say 300 pixels.
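A minimal sketch of that filter, assuming a flat folder of images (the folder name and 300-pixel cutoff are illustrative):

import os
from PIL import Image

def prune_low_res(folder, cutoff=300):
    # Delete any image with no side above the cutoff; such crops blur badly when upscaled to 512x512.
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        with Image.open(path) as img:
            w, h = img.size
        if max(w, h) < cutoff:
            os.remove(path)

prune_low_res('heads_raw')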

Removing non-head images is a bit more involved, but could be done with a CNN and a fairly straightforward process of labeling head and non-head images. It would take additional time to build, though.

Duplicate images probably do not help the process much, though I am unsure how badly they hurt it. I built the dataset for this project using Google image pulls for a number of FGO-related terms and characters, and that process yields a lot of duplicates. For instance, searching for something like "FGO Saber" gives me FGO characters who are "Saber class" or are generally referred to as "Saber"; if I then search for "FGO Artoria Pendragon", who is commonly called "Saber", I get many of the same images again.

I could deduplicate the dataset with something like perceptual hashing via the ImageHash library, which I have used for work; a CNN-based approach also works well.
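As a sketch of the hashing approach (exact-hash matching only; catching near-duplicates would need a Hamming-distance threshold on the hashes, and the folder name is a placeholder):

import os
import imagehash
from PIL import Image

folder = 'heads_512'  # placeholder folder name
seen = {}
for name in sorted(os.listdir(folder)):
    path = os.path.join(folder, name)
    h = imagehash.phash(Image.open(path))  # perceptual hash, robust to resizing/compression
    if h in seen:
        os.remove(path)  # duplicate of an earlier image, seen[h]
    else:
        seen[h] = name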

For now, I think the most important thing is to make sure the images are of high enough quality that StyleGAN can learn to make good 512x512 images. Hopefully a careful manual pruning of the higher-resolution dataset will get rid of most of the non-head images. The duplication might skew the GAN toward generating more of those characters, but that does not seem like the worst of problems at the moment.

One thing I was happy about is that the GAN learned to generate Jeanne's headpiece, which you can see appear for a bit in both of these examples.

Closing Thoughts

An interesting thought I had while looking at these results is whether StyleGANs could be used for data augmentation in computer vision problems. Gwern notes that others have been fairly successful training models with fewer than 5K samples, and this FGO StyleGAN shows similar success. This could be useful for domains with relatively few samples, where the generation of examples could be controlled via the feature controls Nvidia added to StyleGAN over ProGAN. The viability of this remains to be seen; at the very least it could be cool, and might be pitchable as an R&D project for work. Who knows?

In the meantime I will enjoy generating more output samples and images for this FGO StyleGAN. I currently have a second StyleGAN training where I am using the anime weights as a starting point for a dataset of people photos. I am unsure how that will turn out but I will try to share those results in due time.

Feel free to check out the Git repo here.

Both of these showcase some of the issues I had with low-resolution or non-head images in the dataset, so the animation rotates through some weird/creepy stages.
