Sample outputs from FGO StyleGAN.

FGO StyleGAN: This Heroic Spirit Doesn’t Exist

When I first saw Nvidia's StyleGAN results, they looked like black magic to me. I am not as experienced with GANs as with other parts of deep learning, and that lack of experience, along with the thought that I lacked the GPU firepower to train my own StyleGAN, stopped me from jumping in sooner. For scale, on the StyleGAN GitHub Nvidia lists the GPU requirements: training from scratch takes around one week on 8 GPUs, and with a single GPU the training time is around 40 days. Running one of my GPU rigs for 40 days sounded terrifying, both in terms of time and my electric bill. With those constraints I set aside my ambitions of training a StyleGAN for a while.

FGO StyleGAN outputs

Leap of Faith: Custom FGO StyleGAN

A few weeks ago a teammate sent me some videos on LinkedIn of fashion models morphing into one another, in a video style I recognized as an application of StyleGAN. Digging into it more, I saw that a lot of work had gone on in the community around StyleGAN since I had last looked. I do most of my work in PyTorch these days, but when adapting research to your own projects it is often easiest to use whatever tools the research was done with. In this case, while there is a PyTorch port that seems fairly functional, the best course of action was to use the TensorFlow-based code the research was done with, which Nvidia has open-sourced.

What made me take the leap of faith to train my own StyleGAN, though, was Gwern Branwen's work on the website "This Waifu Does Not Exist". Frankly, I would not have devoted the time and resources to train a StyleGAN if I had not seen Gwern's post walking through their process; they were also kind enough to provide pretrained weights for an anime-based StyleGAN trained at 512x512 resolution.

Gwern showcases anime-based StyleGANs, both ones they trained and ones trained by others using the weights they provided. My project is similar to some of those; for example, someone in the community trained a "Saber face" StyleGAN, while the StyleGAN in this post covers Fate Grand Order characters in general.

Brief GAN Background

StyleGAN is an improvement over a previous Nvidia model called ProGAN. ProGAN generates high-quality 1024x1024 images through a progressive training cycle: it starts training at low resolution (4x4) and increases the resolution over time by adding layers. Starting at low resolution makes training faster and improves the quality of the final images, since the networks first learn important lower-level characteristics. However, ProGAN offers limited control over the generated images, which is where StyleGAN comes in. StyleGAN is based on ProGAN, with additions to the generator network that allow control over three levels of features:

  1. Coarse: affects pose, general hair style, face shape, etc.
  2. Middle: affects finer facial features, hair style, eyes open/closed, etc.
  3. Fine: affects color scheme (eye, hair, and skin) and micro features.

This is just a brief description of StyleGAN; for more information, check out the paper or other write-ups on Medium.
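For the curious, the open-sourced code exposes these controls through the intermediate "dlatents" produced by the mapping network. Below is a minimal sketch, based on the repo's Gs.components.mapping / Gs.components.synthesis API, of swapping one image's coarse styles for another's; the snapshot filename is a placeholder, and the layer ranges are the approximate coarse/middle/fine splits from the paper.

import pickle
import numpy as np
import dnnlib
import dnnlib.tflib as tflib

tflib.init_tf()
with open('network-snapshot.pkl', 'rb') as f:  # placeholder: any trained StyleGAN pickle
    _G, _D, Gs = pickle.load(f)

# Map two random latents through the mapping network to W space: shape [N, num_layers, 512].
latents = np.random.randn(2, Gs.input_shape[1])
dlatents = Gs.components.mapping.run(latents, None)

# Copy the coarse layers (roughly layers 0-3) of image 1 into image 0:
# pose, face shape, and general hair style follow image 1; finer details stay from image 0.
dlatents[0, 0:4] = dlatents[1, 0:4]

images = Gs.components.synthesis.run(
    dlatents[:1], randomize_noise=False,
    output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True))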

This one shows a few male faces. However, a lot of them turn into super evil-looking images. Maybe guys are just evil? Who knows. This one also shows a number of lower-quality generated images, probably because I did not properly remove low-resolution images when building the dataset.

Dataset Building and Preparation

I used a previously built TensorFlow object detector to crop the heads out of around 6K FGO wallpaper/fan art images.

The FGO dataset I used comprises ~6K images of various sizes: wallpapers, in-game images, fan art, and so on. I ran a TensorFlow-based, FGO-tuned head detector over this dataset and extracted around ~8K heads. I then cleaned the dataset by hand in a quick pass, cutting the final set down to ~6K heads. One lesson learned is that I should have spent more time on this step: a number of lower-quality images, along with images of backgrounds, armor, and other non-character content, were left in the dataset, which causes weird artifacts or just lower-quality generated images.

Like other TensorFlow networks, StyleGAN relies on tfrecord files, which can be generated for training by the dataset_tool.py file in the StyleGAN repo. One important note when training a StyleGAN is that the images in the dataset need to be the same size and color format as those used for the StyleGAN providing the pretrained weights. Using the original 1024x1024 network would have been difficult for me, since finding anime headshots at that size is hard and my GPU likely could not handle it. That makes the 512x512 network trained by Gwern extremely attractive: it is easier to find heads close to that resolution, and my GPU can handle it more easily. So I converted all the images to sRGB and resized them to 512x512.

To do the resizing and reformatting I used Python's PIL library; a minimal version of that pass is sketched below. Once that was done, I was ready to start the long and arduous process of fine-tuning a StyleGAN for my FGO use case.
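The sketch, with placeholder folder names, and the repo's dataset_tool.py turning the cleaned folder into tfrecords afterwards:

import os
from PIL import Image

src_dir, dst_dir = 'heads_raw', 'heads_512'  # placeholder folder names
os.makedirs(dst_dir, exist_ok=True)
for name in os.listdir(src_dir):
    img = Image.open(os.path.join(src_dir, name)).convert('RGB')  # normalize color mode
    img = img.resize((512, 512), Image.LANCZOS)                   # StyleGAN needs one uniform size
    img.save(os.path.join(dst_dir, os.path.splitext(name)[0] + '.png'))

# Then build the tfrecords: python dataset_tool.py create_from_images datasets/fgo heads_512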

Training

Images generated during training over 85 ticks. A few boxes were based on low-quality or non-headshot images, so they never developed well. However, it is interesting to see the model trying to generate headgear (with varying success), as well as a few male characters who show up with cool, overly dramatic facial features and hair.

The training process for StyleGAN is controlled by two scripts in the Nvidia repo: train.py and training/training_loop.py. For this I mostly followed the recommendations Gwern laid out in their blog post.

training/training_loop.py

One of the main parameters to set is resume_kimg. If it is set to 0, the StyleGAN begins training from what appears to be a random initialization. Instead, I set resume_kimg=7000; at that level, the StyleGAN makes use of the pretrained weights and starts from a much better point than it would otherwise.
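Concretely, that means editing the resume defaults in training/training_loop.py (variable names follow the Nvidia repo; the weights path below is a placeholder for Gwern's anime snapshot):

# training/training_loop.py: resume settings (repo defaults are None / 0.0)
resume_run_id = 'anime-faces-512.pkl'  # placeholder: point this at the pretrained anime weights
resume_kimg = 7000.0                   # pretend 7000 kimg are already done so training starts at full 512px resolution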

I was having OOM issues and eventually traced them to the metrics call on line 266, so I ended up commenting it out. So far I have had no further issues.
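For anyone hitting the same error, the call I disabled looks roughly like this (your line number may differ between repo revisions):

# training/training_loop.py, ~line 266: commented out to avoid OOM during snapshot evaluation
# metrics.run(pkl, run_dir=submit_config.run_dir, num_gpus=submit_config.num_gpus, tf_config=tf_config)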

train.py

desc += '-faces'; dataset = EasyDict(tfrecord_dir='faces', resolution=512); train.mirror_augment = True

The second area specifies the training process itself. Since I have one GPU, I set the number of GPUs to 1 and the minibatch size to 4 images. After that comes the (unadjusted) learning rate schedule and the total training length of 99,000 kimg, i.e. roughly 99 million images shown, which essentially sets how long the model will train.
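Put together, the relevant train.py lines for my run looked roughly like this (the minibatch schedule is the repo's stock single-GPU setting; exact values may differ between revisions):

# train.py: single-GPU schedule plus total training length
desc += '-1gpu'; submit_config.num_gpus = 1; sched.minibatch_base = 4
sched.minibatch_dict = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}
train.total_kimg = 99000  # 99,000 kimg = ~99M images shown; effectively "train until I stop it"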

Future Improvements

Hardware

While AWS is probably the easier approach, with the amount of training I do I would quickly rack up enough AWS fees to make me wish I had upgraded my GPU instead. Folks in the community have mentioned that building your own GPU rig is around 10X cheaper than using resources like AWS.

Paying hundreds of dollars to test out a StyleGAN seems terrifying to me.

So my main thought at the moment is to upgrade to a 1080 Ti (~$800-1000) or, if I really want to go all in, a 2080 Ti for a one-time cost of ~$1300. If anyone has thoughts on this, feel free to let me know!

Either should help to cut down the training time required since they are significantly faster than my current main 1080 GPU.

More selected outputs

Dataset Quality

In my defense, I was flying out to a Wing Chun seminar the next day and wanted to start training the StyleGAN before I left rather than lose the 48 hours of training time I could get while I was gone.

Some initial ideas: remove low-quality images, remove non-head images, and remove duplicates.

A simple way to remove low-quality images would be to delete images below a certain resolution, or ones without a side above a certain cutoff, say 300 pixels.
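A minimal sketch of that filter, assuming a flat folder of images (the folder name and 300-pixel cutoff are illustrative):

import os
from PIL import Image

def prune_low_res(folder, cutoff=300):
    # Delete any image with no side above the cutoff; such crops blur badly when upscaled to 512x512.
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        with Image.open(path) as img:
            w, h = img.size
        if max(w, h) < cutoff:
            os.remove(path)

prune_low_res('heads_raw')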

Removing non-head images is a bit more involved, but could be done with a CNN and a fairly straightforward process of labeling head and non-head images. It would take additional time to build, though.

Duplicate images probably do not help the process much, though I am unsure how badly they hurt it. I built the dataset for this project using Google image pulls for a number of FGO-related terms and characters, and that process yields a lot of duplicates. For instance, searching for something like "FGO Saber" gives me FGO characters who are "Saber class" or are generally referred to as "Saber"; if I then search for "FGO Artoria Pendragon", who is commonly called "Saber", I get many of the same images again.

I could deduplicate the dataset with something like perceptual hashing via the ImageHash library, which I have used for work; a CNN-based approach also works well.
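As a sketch of the hashing approach (exact-hash matching only; catching near-duplicates would need a Hamming-distance threshold on the hashes, and the folder name is a placeholder):

import os
import imagehash
from PIL import Image

folder = 'heads_512'  # placeholder folder name
seen = {}
for name in sorted(os.listdir(folder)):
    path = os.path.join(folder, name)
    h = imagehash.phash(Image.open(path))  # perceptual hash, robust to resizing/compression
    if h in seen:
        os.remove(path)  # duplicate of an earlier image, seen[h]
    else:
        seen[h] = name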

For now, I think the most important thing is to make sure the images are of high enough quality that StyleGAN can learn to make good 512x512 images. Hopefully a careful manual pruning of the higher-resolution dataset will get rid of most of the non-head images. The duplication might skew the GAN toward generating more of those characters, but that does not seem like the worst of problems at the moment.

One thing I was happy about is that the GAN learned to generate Jeanne's headpiece, which you can see appear for a bit in both of these examples.

Closing Thoughts

An interesting thought I had while looking at these results is whether StyleGANs could be used for data augmentation in computer vision problems. Gwern notes that others have been fairly successful training models with fewer than 5K samples, and this FGO StyleGAN shows similar success. This could be useful for domains with relatively few samples, where the generation of examples could be controlled via the feature controls Nvidia added to StyleGAN over ProGAN. The viability of this remains to be seen; at the very least it could be cool, and might be pitchable as an R&D project for work. Who knows?

In the meantime I will enjoy generating more output samples and images for this FGO StyleGAN. I currently have a second StyleGAN training where I am using the anime weights as a starting point for a dataset of people photos. I am unsure how that will turn out but I will try to share those results in due time.

Feel free to check out the Git repo here.

Both of these showcase some of the issues I had with low-resolution or non-head images in the dataset, so the animation rotates through some weird/creepy stages.
