Tutorial on Using Google Colab for Kaggle Competitions
First of all, what is Colab?
Colab is Google’s collaborative version of the Jupyter/IPython notebook. And it is FREE! You can now use an NVIDIA Tesla K80 GPU for free. I think more powerful graphics cards will be added in the future.
Yes, you heard right! It’s free. You do not have to pay for Amazon Web Services (AWS) anymore! Google released it to the public with the aim of improving machine learning education and research.
In this article, I will focus on using Colab for a Kaggle competition. Kaggle has become an important platform for machine learning competitions, and it is also a very useful website for those who want to step into this field. For more info and tutorials about Kaggle, go to this website.
Well then, what are we waiting for? Let’s go!
If you visit the Colab website, the following screen will welcome you.
Click the “Go to Colaboratory” button and sign in with your Google account. Then, open a new Python 3 notebook as you can see below.
After this step, a Jupyter notebook should open, showing one empty cell with a “Play” button. Once you have written code in the cell, you can run it with the Play button.
We will start by changing the name of our *.ipynb file. I changed it to tut_kaggle.ipynb.
Before we start using the notebook, we have to set up a few annoying access permissions, but don’t worry. It’s easy.
I also shared the whole notebook at this link, so you can handle these tedious steps with a single run.
First, we have to run the following code blocks to prepare our environment.
Install Drive FUSE wrapper
More info can be found in this Colab example. I also copied some of the code from here and here; these two links are the references for this article.
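The original cell is not reproduced here. As a rough stand-in, this Python sketch runs the same kind of apt-get commands through subprocess. The exact commands, including the ppa:alessandro-strada/ppa repository, are an assumption based on the Drive FUSE snippets that circulated at the time, and they may no longer work on current Colab images. In a notebook you can also type each command in a cell prefixed with “!”.

```python
import subprocess

# ASSUMPTION: commands modeled on the 2018-era Colab Drive FUSE snippet;
# the PPA and package names may have changed since then.
INSTALL_COMMANDS = [
    "apt-get install -y -qq software-properties-common module-init-tools",
    "add-apt-repository -y ppa:alessandro-strada/ppa > /dev/null 2>&1",
    "apt-get update -qq > /dev/null 2>&1",
    "apt-get -y install -qq google-drive-ocamlfuse fuse",
]


def install_drive_fuse(run: bool = False) -> list:
    """Return the install commands and, if run=True, execute them in order.

    check=True stops at the first command that exits with an error.
    """
    if run:
        for cmd in INSTALL_COMMANDS:
            subprocess.run(cmd, shell=True, check=True)
    return INSTALL_COMMANDS
```

Inside Colab you would call install_drive_fuse(run=True); with the default run=False it only returns the commands, which lets you inspect them first.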
You should see the following output.
At the top right of the cell there are buttons with options such as view output full screen, clear output, delete cell/selection, add a comment, and link to cell.
Generate authentication tokens for Colab
Run the following code in a new cell, which you can open via the Code button at the top left of the notebook.
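In the Colab example, this step comes down to calling google.colab.auth.authenticate_user(). The sketch below wraps that call so it only runs inside a Colab runtime. The wrapper name authenticate_in_colab is hypothetical, and the link/access-code prompt may look different on newer Colab versions.

```python
def authenticate_in_colab() -> bool:
    """Run Colab's user authentication if we are inside a Colab runtime.

    Returns True when authentication was triggered, False when the
    google.colab package is unavailable (e.g. on a local machine).
    """
    try:
        from google.colab import auth  # only importable inside Colab
    except ImportError:
        return False
    # Shows the link / access-code prompt used in the next steps.
    auth.authenticate_user()
    return True
```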
You should see the following output. Click the link to allow access.
After that, you will see the access code. Copy and paste the code into the box and press “Enter”.
Generate creds for the Drive FUSE library
Run the following code.
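This step hands Colab’s credentials to google-drive-ocamlfuse in headless mode. The sketch below assumes google-drive-ocamlfuse’s -headless, -id and -secret options, and that the client ID and secret come from your Colab credentials (the snippets of that era used oauth2client’s GoogleCredentials.get_application_default(), which is now deprecated). Both helper names are hypothetical.

```python
import shlex
import subprocess


def ocamlfuse_auth_command(client_id: str, client_secret: str) -> str:
    """Build the headless google-drive-ocamlfuse authorization command.

    ASSUMPTION: client_id/client_secret come from Colab's default credentials.
    """
    return " ".join([
        "google-drive-ocamlfuse",
        "-headless",
        "-id=" + shlex.quote(client_id),
        "-secret=" + shlex.quote(client_secret),
    ])


def submit_verification_code(cmd: str, code: str) -> None:
    """Feed the pasted access code on stdin, like `echo CODE | cmd`."""
    subprocess.run(cmd, shell=True, input=code + "\n", text=True, check=True)


# Hypothetical usage inside Colab (IPython "!" syntax shown in comments):
# from oauth2client.client import GoogleCredentials
# creds = GoogleCredentials.get_application_default()
# cmd = ocamlfuse_auth_command(creds.client_id, creds.client_secret)
# !{cmd} < /dev/null 2>&1 | grep URL    # prints the link to click
# import getpass
# submit_verification_code(cmd, getpass.getpass())
```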
You have to repeat the steps from the token generation section:
Run the cell, click the link, copy and paste the code, and press “Enter”.
If you see “Access token retrieved correctly.”, congratulations, you were successful.
Create a directory and mount Google Drive using that directory
I chose to call the mount directory “my_drive”, but you can change it to whatever you want.
After mounting it, I created another folder called “tut_kaggle” to hold all of the competition data.
Now the “tut_kaggle” folder exists in our Google Drive.
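A Python sketch of these steps: create the mount point, mount Drive on it with google-drive-ocamlfuse, then create the competition folder inside the mounted Drive. The default names follow the article; the function name is hypothetical, and the mount call assumes the FUSE setup above succeeded.

```python
import os
import subprocess


def prepare_and_mount(mount_dir: str = "my_drive",
                      competition_folder: str = "tut_kaggle",
                      mount: bool = False) -> str:
    """Create the mount point, optionally mount Drive, then the data folder."""
    os.makedirs(mount_dir, exist_ok=True)
    if mount:
        # Mount Google Drive onto mount_dir (requires google-drive-ocamlfuse).
        subprocess.run(["google-drive-ocamlfuse", mount_dir], check=True)
    # Created after mounting, so inside Colab the folder lands in Drive itself.
    comp_dir = os.path.join(mount_dir, competition_folder)
    os.makedirs(comp_dir, exist_ok=True)
    return comp_dir
```

The order matters: a folder created before mounting would be hidden underneath the mounted Drive.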
Please check whether the folder exists in your Google Drive.
I also created a “colab_test.py” file and put it in the “tut_kaggle” folder.
colab_test.py file
Next, I uploaded all of the competition data from my local computer to my “tut_kaggle” Drive folder. I prefer to drag and drop the files, but you can also use the following code to upload them; for more info, please visit the Colab example.
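A sketch assuming google.colab.files.upload(), which opens a file picker and returns a dict mapping each file name to its bytes. The helper save_uploaded_files is hypothetical: it writes those bytes into the competition folder, so it can also be tried outside Colab with a plain dict.

```python
import os


def save_uploaded_files(uploaded: dict, dest_dir: str) -> list:
    """Write {file name: bytes} pairs into dest_dir; return the written paths."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for name, data in uploaded.items():
        # basename() keeps a file name from escaping dest_dir.
        path = os.path.join(dest_dir, os.path.basename(name))
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    return paths


# Hypothetical usage inside Colab:
# from google.colab import files
# uploaded = files.upload()
# save_uploaded_files(uploaded, "/content/my_drive/tut_kaggle")
```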
Let’s check whether the files exist in the folder with the following code.
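The original check is not shown. A plain-Python equivalent of listing the folder could look like this (list_competition_files is a hypothetical helper; the path in the comment is the one used in this article).

```python
import os


def list_competition_files(folder: str) -> list:
    """Return the sorted names of regular files in folder (like `ls`)."""
    return sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )


# print(list_competition_files("/content/my_drive/tut_kaggle/"))
```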
You should see the following output files.
Now, just run colab_test.py with the code below.
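One way to do this from Python is to launch the script with the notebook’s own interpreter. The contents of colab_test.py are not given in the text, so this sketch only assumes the script prints something; run_script is a hypothetical helper.

```python
import subprocess
import sys


def run_script(path: str) -> str:
    """Run a Python script with the current interpreter and return its stdout.

    check=True raises CalledProcessError if the script exits with an error.
    """
    result = subprocess.run([sys.executable, path],
                            capture_output=True, text=True, check=True)
    return result.stdout


# print(run_script("/content/my_drive/tut_kaggle/colab_test.py"))
```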
Yes! It works.
Is this code really running on the GPU?
Let’s check whether it runs on the CPU or the GPU with the following code.
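A common check, assuming TensorFlow is installed (it comes preinstalled on Colab), is tf.test.gpu_device_name(). It returns a device string such as '/device:GPU:0' when a GPU is visible and an empty string otherwise. This wrapper also returns '' if TensorFlow cannot be imported.

```python
def gpu_device_name() -> str:
    """Return TensorFlow's GPU device name, or '' if no GPU (or no TF)."""
    try:
        import tensorflow as tf
    except ImportError:
        return ""
    # e.g. '/device:GPU:0' on a GPU runtime, '' on a CPU-only runtime.
    return tf.test.gpu_device_name()


name = gpu_device_name()
print("GPU found: " + name if name else "No GPU found, running on CPU")
```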
If you are not using a GPU, you will see the output below.
So, let’s change the hardware accelerator from None to GPU:
“Edit > Notebook settings > Hardware accelerator > GPU”
Run the same code and check the output.
Now we are sure that we are working on the GPU.
Competition
Finally, everything is ready. Let’s start fighting for the competition.
The competition is about acoustic scene classification.
The data consists of recordings from 15 different classes, such as beach, home, and restaurant. The recordings have been cut into 10-second segments, and they are provided as Mel-Frequency Cepstrum (MFC) features.
I created the Python script shown below.
I am using the scikit-learn library for training and prediction.
First of all, all of the data must be loaded into variables.
I used “numpy.load” for this.
Please pay attention to your file path!!!
My files’ path is: /content/my_drive/tut_kaggle/
So yours should be /content/<the directory where you mounted your drive>/<your competition folder>/
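A minimal sketch of the loading step, assuming the folder layout above. X_test.npy is the only data file named in this article; any other file names (for example a training file such as X_train.npy) are hypothetical, so check the competition’s data page.

```python
import os

import numpy as np

# Folder used in this article; replace with /content/<mount dir>/<folder>/.
DATA_DIR = "/content/my_drive/tut_kaggle/"


def load_split(data_dir: str, name: str) -> np.ndarray:
    """Load data_dir/name with numpy.load and return the array."""
    return np.load(os.path.join(data_dir, name))


# X_test = load_split(DATA_DIR, "X_test.npy")
```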
I moved the SVM_with_rbf_kernel.py file to my “tut_kaggle” folder.
As you can see in the code, I tried different classifiers:
Support Vector Machine with 72% accuracy.
k-NN with 55.6% accuracy.
Linear Discriminant Analysis with 65.9% accuracy.
Since the Support Vector Machine (SVM) gives the best accuracy,
I use it to predict the labels of X_test.npy and for the submission.
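The script itself is not reproduced here, so this is only a sketch of how the three classifiers could be compared with scikit-learn on a hold-out split. The feature scaling, default hyperparameters and 80/20 split are assumptions, so it will not necessarily reproduce the accuracies above. Each MFC feature matrix is flattened because scikit-learn estimators expect 2-D input.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(X: np.ndarray, y: np.ndarray, seed: int = 0) -> dict:
    """Fit SVM (RBF), k-NN and LDA; return hold-out accuracy for each.

    ASSUMPTION: scaling, default hyperparameters and the split are
    illustrative, not the settings used for the reported scores.
    """
    X = X.reshape(len(X), -1)  # one flat feature vector per recording
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)
    models = {
        "SVM (rbf)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "LDA": LinearDiscriminantAnalysis(),
    }
    # fit() returns the estimator, so score() can be chained directly.
    return {name: model.fit(X_tr, y_tr).score(X_val, y_val)
            for name, model in models.items()}
```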
Just run SVM_with_rbf_kernel.py. It will create the sub.csv file in your competition folder.
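A sketch of that final step, writing the predicted labels to sub.csv with Python’s csv module. The column names (“Id”, “Scene_label”) and the 0-based Id numbering are assumptions; copy the real ones from the competition’s sample submission file.

```python
import csv
from typing import Sequence


def write_submission(predictions: Sequence[str], path: str,
                     header=("Id", "Scene_label")) -> None:
    """Write a header row, then one row per test sample: Id and label.

    ASSUMPTION: header names and 0-based Ids; match the sample submission.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for i, label in enumerate(predictions):
            writer.writerow([i, label])


# Hypothetical end of the SVM script:
# y_pred = svm.predict(X_test.reshape(len(X_test), -1))
# write_submission(y_pred, "/content/my_drive/tut_kaggle/sub.csv")
```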
Submit the Results to Kaggle
If you do not have an account, create an account first.
Then, go to the competition website and join the competition.
Click the “Submit Predictions” button, as shown below.
Now, are you ready to get the score of our prediction?
So what are you waiting for?! Go, go, go!
Reminder!
According to the rules of this competition, there is a submission limit:
you can submit your predictions only twice a day.
Click the “Make Submission” button.
Here it is!
It’s a pretty good score on the first try.
Using neural networks, you can get much better scores. That is another topic I am planning to write about.
I will use Keras for the neural network implementation.
Briefly, you just have to import “keras” before using it.
In conclusion, Artificial Intelligence (AI) research is gaining momentum day by day, so you can improve your skills by trying different methods in competitions like this one with a free GPU.
Please visit the Colaboratory FAQ (Frequently Asked Questions) page for more info, and take a look at what has already been done and what is ongoing.
If you like the post, please share it so it reaches more people, and applaud the article by clicking the hand symbol below.
You can contact me via these channels.
Thank you for reading.
And,
Keep trying…
Good luck!