This is a quick run-through of what is required to set up a GPU-based installation of Theano, either on an AWS GPU (GRID) instance or on a computer with a suitable NVIDIA GPU that supports CUDA. In this tutorial we will briefly run through the installation of CUDA 7.5, cuDNN v3, Theano 0.7 and Keras. We will then test the installation by running a deep learning training example on the well-known MNIST dataset, using one of the Keras examples, and make sure that training is reasonably fast. We are going to use Ubuntu 14.04 in this post because I found 15.04 to be unstable on my laptop at this time.
Start off by installing Ubuntu 14.04. Then update it and install essential build tools like so
sudo apt-get update && sudo apt-get upgrade
sudo apt-get install build-essential git vim python-pip python3-pip liblapack-dev cython cython3 gfortran
Then we need to install numpy and scipy (we will install NVIDIA’s drivers a little later):
sudo pip3 install numpy
sudo pip3 install scipy
Ensure that they build and get cythonized (through the appropriate version – cython or cython3 – depending on which version of Python you will use). Remember to use pip instead of pip3 if you need Python 2 instead of Python 3. If you want to support both versions of Python, repeat the process with the other pip version.
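A quick sanity check is to import both packages and print their versions; both commands should print a version number without any import errors:
python3 -c "import numpy; print(numpy.__version__)"
python3 -c "import scipy; print(scipy.__version__)"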
Now, let us try to install NVIDIA’s CUDA via the run-file. It can usually be found here and is slightly over 1 GB in size. This will download CUDA 7.5 provided the link still works:
wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda_7.5.18_linux.run
Now extract the contents of the CUDA run-file like so:
chmod a+x cuda_7.5.18_linux.run
mkdir cuda_installer
./cuda_7.5.18_linux.run -extract=`pwd`/cuda_installer
rm cuda_7.5.18_linux.run
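The run-file unpacks three separate installers – the display driver, the CUDA toolkit and the samples – into cuda_installer. A quick listing should show the .run files used in the steps below:
ls cuda_installer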
If you are on an Amazon EC2 instance you can move directly on to the next step, but if you are on a machine where the NVIDIA driver or LightDM is running, I prefer to uninstall LightDM and reboot like so:
sudo apt-get remove lightdm
sudo reboot
After the reboot you can continue with the same steps that would be required on, say, a command-line EC2 instance. On a laptop, press Ctrl-Alt-F1 to get a shell from which you can install the NVIDIA drivers.
sudo apt-get install linux-image-extra-virtual
Now use an editor to create a new file to disable Nouveau (the open-source driver that conflicts with NVIDIA’s drivers):
sudo vi /etc/modprobe.d/blacklist-nouveau.conf
Then add the following lines to it:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
After saving this file type the following in the shell:
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u
sudo reboot
After the reboot:
sudo apt-get install linux-source
sudo apt-get install linux-headers-`uname -r`
Next, run the NVIDIA driver installer (the driver version may differ slightly if you used a newer CUDA installation file) like so:
cd cuda_installer
sudo ./NVIDIA-Linux-x86_64-352.39.run
Select “Accept” using the left arrow key and press Enter. Ignore any pre-install script failure by selecting “Continue Installation”. Ignore warnings about 32-bit compatibility libraries; install the 32-bit libraries and run nvidia-xconfig at the end only if you are going to be running X. On Amazon AWS EC2 instances select “no”. Once installed, you should be able to use:
nvidia-smi
Now let us finally install CUDA and the samples:
sudo modprobe nvidia
sudo ./cuda-linux64-rel-7.5.18-19867135.run
sudo ./cuda-samples-linux-7.5.18-19867135.run
In the steps above, press “q” to jump straight to the end of the EULA.
Now we want to add CUDA to our PATH and library path so that the CUDA compiler and libraries can be found easily. This can be done by adding the following lines to the ~/.bashrc file in your home directory.
export PATH=$PATH:/usr/local/cuda-7.5/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-7.5/lib64
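Reload the file and check that the CUDA compiler is now on your path (this assumes the toolkit was installed to the default /usr/local/cuda-7.5 location):
source ~/.bashrc
nvcc --version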
For a regular Linux laptop where you may want to use X again, reinstall lightdm now and reboot. For Amazon EC2 ignore this instruction.
sudo apt-get install lightdm
sudo reboot
At this point we want to install cuDNN. While optional, this component improves performance, especially for convolutional neural networks, so I recommend installing it. To get this software you will need to register with NVIDIA – go here to get it. The version used here is cuDNN v3 (packaged as cudnn-7.0-linux-x64-v3.0). To install cuDNN use the following steps:
tar -xzvf cudnn-7.0-linux-x64-v3.0-prod.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
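As a quick sanity check, the copied headers and libraries should now show up under the CUDA install directory (the /usr/local/cuda symlink is created by the toolkit installer); you may also want to run sudo ldconfig so the new libraries are picked up:
ls /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
sudo ldconfig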
Installation of Theano
sudo pip3 install --upgrade --no-deps git+git://github.com/Theano/Theano.git
Make sure to add the most important flags – enabling GPU usage and so on – to Theano’s config file:
[global]
floatX = float32
device = gpu
optimizer = fast_run

[lib]
cnmem = 0.9

[nvcc]
fastmath = True

[blas]
ldflags = -llapack -lblas
Store this file in $HOME/.theanorc.
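To verify that Theano picks up both the GPU and the settings in .theanorc, a small check along these lines (a minimal sketch) should print the configured device and confirm that GPU ops appear in a compiled graph:
python3 - <<'EOF'
import theano                          # prints the "Using gpu device ..." banner when device = gpu
import theano.tensor as T

print(theano.config.device)            # should print "gpu"

x = T.matrix('x')
f = theano.function([x], T.exp(x))     # compile a trivial elementwise function

# True when the compiled graph contains GPU ops (GpuFromHost, GpuElemwise, ...)
print(any(node.op.__class__.__name__.startswith('Gpu')
          for node in f.maker.fgraph.toposort()))
EOF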
Install Keras, a neat library by Francois Chollet written in the spirit of Torch but implemented in Python/Theano. Documentation can be found at keras.io.
git clone https://github.com/fchollet/keras.git keras
cd keras
sudo python3 setup.py install
To test out GPU-based deep learning training, try this:
cd keras/examples/
python3 mnist_cnn.py
If all goes well you should see something like the following line which I see on my laptop:
Using gpu device X: GeForce GTX 980M (CNMeM is enabled)
To be sure that you’re running it correctly (in case you don’t see Theano using your GPU), try running it like so:
1 |
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python3 mnist_cnn.py |
This will first download the MNIST dataset as a pickled, zipped file. Then Theano will compile the model and training will begin. If your GPU instance is set up correctly, you should be able to come very close to 12 or 13 seconds on the basic Amazon EC2 GPU instance, which has an NVIDIA GRID K520 GPU. There are likely to be additional optimizations that can be performed, but this should give you a fairly well optimized installation.
Happy Deep Learning!!!