Multi-Class Image Classification using the AlexNet Deep Learning Network implemented in the Keras API

Keshav Tangri
Jul 31, 2020 · 8 min read

Introduction

Computers are amazing machines (no doubt about that), and I am really mesmerized by how they are able to learn and classify images. Image classification has its own advantages and applications. For example, we can build a pet food dispenser that reacts to which species (cat or dog) is approaching it. I know it’s a weird idea, since they would end up eating all of the food, but the system can be time controlled so that food is dispensed only once. Anyway, let’s move on before getting distracted. After upskilling myself with the knowledge of Deep Learning Neural Networks, I thought of building one myself. So here I am going to share how I built an AlexNet Convolutional Neural Network for 6 different classes from scratch, using Keras and coded in Python.

Overview of AlexNet

Before getting to AlexNet, it is recommended to go through the Wikipedia article on Convolutional Neural Network Architecture to understand the terminology used in this article. Let’s dive in and get a basic overview of the AlexNet network.

AlexNet[1] is a classic Convolutional Neural Network that rose to prominence by winning the 2012 ImageNet challenge. The network architecture is given below:

AlexNet Architecture (courtesy of Andrew Ng on Coursera[2])

Model Explanation: The input to this model has the dimensions 227x227x3, followed by a Convolutional Layer with 96 filters of 11x11 dimensions, ‘valid’ padding (that is, no padding) and a stride of 4. The resulting output dimensions are given as:

floor(((n + 2*padding - filter)/stride) + 1) * floor(((n + 2*padding - filter)/stride) + 1)

Note : This formula is for square input with height = width = n

Applying this to the first layer, with a 227x227x3 input and a Convolutional Layer with 96 filters of 11x11, ‘valid’ padding (padding = 0) and stride = 4, the output dimensions will be

= floor(((227 + 0 - 11)/4) + 1) * floor(((227 + 0 - 11)/4) + 1)

= floor((216/4) + 1) * floor((216/4) + 1)

= floor(54 + 1) * floor(54 + 1)

= 55 * 55

Since the number of filters is 96, the output of the first layer is 55x55x96.

Next comes a MaxPooling layer with a (3,3) window and a stride of 2, which reduces the output to 27x27x96. It is followed by another Convolutional Layer with 256 filters of size (5,5) and ‘same’ padding, meaning the output height and width are kept the same as in the previous layer, so the output of this layer is 27x27x256. A second MaxPooling layer reduces the size to 13x13x256. A Convolutional Layer with 384 filters of size (3,3) and ‘same’ padding is then applied twice, giving 13x13x384, followed by another Convolutional Layer with 256 filters of size (3,3) and ‘same’ padding, resulting in a 13x13x256 output. This is MaxPooled once more, reducing the dimensions to 6x6x256. The result is then flattened (6 x 6 x 256 = 9216 values) and passed through 2 Fully Connected Layers with 4096 units each, which feed a softmax layer with 1000 units.

The original network classifies images into 1000 classes. In our case, we will build the output softmax layer with 6 units, as we have to classify into 6 classes. The softmax layer gives us the probabilities of each class to which an input image might belong.
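To double check the sizes quoted above, the output-size formula can be applied layer by layer. This is only a small sketch: the helper names (conv_output_size, trace_shapes) and the layer table are made up for illustration, and ‘same’ padding is modelled as padding = (kernel - 1) / 2, which holds for the stride-1, odd-kernel layers used here.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output height/width for a square input of size n (conv or pooling)."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding, output channels)
# padding = (kernel - 1) // 2 stands in for 'same' padding on stride-1 layers.
ALEXNET_LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, 96),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, 256),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool3", 3, 2, 0, 256),
]


def trace_shapes(n=227, layers=ALEXNET_LAYERS):
    """Return (name, height, width, channels) after every layer."""
    shapes = []
    for name, kernel, stride, padding, channels in layers:
        n = conv_output_size(n, kernel, stride, padding)
        shapes.append((name, n, n, channels))
    return shapes


for name, h, w, c in trace_shapes():
    print(f"{name}: {h}x{w}x{c}")
```

The last line printed is the 6x6x256 volume that gets flattened before the fully connected layers.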

Implementing AlexNet using Keras

Keras is a deep learning API for Python that runs on top of TensorFlow 2.0 and can take advantage of TensorFlow’s scalability and deployment capabilities [3]. We will build the layers from scratch in Python using the Keras API.

First, let’s import the essential libraries:

In this article we will use the ImageDataGenerator to feed the classifier, so next we will import the data with it. Before that, let’s understand the data. The dataset can be found here.

This dataset contains around 25k images of size 150x150 distributed across 6 categories, namely: ‘buildings’, ‘forest’, ‘glacier’, ‘mountain’, ‘sea’ and ‘street’. There are 14K images in the training set, 3K in the test set and 7K in the prediction set.

The images for each category are stored in their respective directories, which makes it easy to infer the labels, as described in the Keras documentation[4]:

Arguments :

directory: Directory where the data is located. If labels is "inferred", it should contain subdirectories, each containing images for a class. Otherwise, the directory structure is ignored.

The linked dataset follows this directory structure, so the ImageDataGenerator will infer the labels. A view of the dataset directory structure is shown below:

Directory structure in dataset

Next we will import the dataset as shown below :

Output

As explained above, the input size for AlexNet is 227x227x3, so we will change the target size to (227,227). The default batch size is 32. Let’s see the types of train and train_datagen.

The type keras.preprocessing.image.DirectoryIterator is an Iterator capable of reading images from a directory on disk[5]. The keras.preprocessing.image.ImageDataGenerator generates batches of tensor image data with real-time data augmentation. The default batch_size is 32.
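Putting these pieces together, the loading step might look roughly like the sketch below. The function name make_directory_iterator is hypothetical, and rescale=1/255 is an assumption (it maps pixel values into the 0 to 1 range); the target size and batch size are the values discussed above. Recent TensorFlow releases mark ImageDataGenerator as deprecated in favour of image_dataset_from_directory, so treat this as a sketch for the API used in this article.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def make_directory_iterator(directory, target_size=(227, 227), batch_size=32):
    """Return a DirectoryIterator yielding (images, one-hot labels) batches."""
    # Assumption: pixel values are rescaled from [0, 255] to [0, 1].
    datagen = ImageDataGenerator(rescale=1.0 / 255)
    return datagen.flow_from_directory(
        directory,
        target_size=target_size,   # every image is resized to AlexNet's input size
        batch_size=batch_size,
        class_mode="categorical",  # one-hot labels, one column per sub-directory
    )


# Usage (the path is a placeholder, not the real dataset location):
# train = make_directory_iterator("path/to/training/directory")
# images, labels = train[0]            # first batch
# print(images.shape, labels.shape)
# print(train.class_indices)           # mapping from folder name to label index
```

Class indices are assigned from the sub-directory names in sorted order, which is why the folder layout alone is enough to label the images.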

Next let us check the dimensions of the first image and its associated output in the first batch.

Output :

Let’s check out some Examples from the Dataset :

import matplotlib.pyplot as plt

# 2x3 grid of images taken from the first batch; train[0] is (images, labels)
fig, axs = plt.subplots(2, 3)
axs[0][0].imshow(train[0][0][12])
axs[0][0].set_title(train[0][1][12])
axs[0][1].imshow(train[0][0][10])
axs[0][1].set_title(train[0][1][10])
axs[0][2].imshow(train[0][0][5])
axs[0][2].set_title(train[0][1][5])
axs[1][0].imshow(train[0][0][20])
axs[1][0].set_title(train[0][1][20])
axs[1][1].imshow(train[0][0][25])
axs[1][1].set_title(train[0][1][25])
axs[1][2].imshow(train[0][0][3])
axs[1][2].set_title(train[0][1][3])
plt.show()

These are hardcoded examples chosen to show one picture for each category in the 1st batch. The results can differ based on the shuffling done on your machine.

Output :

Images from different categories, with the title as the output(category) of each image in the dataset

Next let’s start the construction of Model. The following code block will construct your AlexNet Deep Learning Network :

return model
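For reference, the layer stack described in the overview can be sketched with the Keras functional API as follows. This is a minimal sketch rather than the exact notebook code: the function name build_alexnet and the choice of ReLU activations are assumptions, and extras such as dropout or normalisation layers are left out.

```python
from tensorflow.keras import layers, models


def build_alexnet(input_shape=(227, 227, 3), num_classes=6):
    """AlexNet-style network following the layer sizes from the overview."""
    inputs = layers.Input(shape=input_shape)

    # 96 filters of 11x11, stride 4, 'valid' padding -> 55x55x96
    x = layers.Conv2D(96, (11, 11), strides=4, padding="valid", activation="relu")(inputs)
    x = layers.MaxPooling2D(pool_size=(3, 3), strides=2)(x)   # -> 27x27x96

    x = layers.Conv2D(256, (5, 5), padding="same", activation="relu")(x)  # -> 27x27x256
    x = layers.MaxPooling2D(pool_size=(3, 3), strides=2)(x)   # -> 13x13x256

    x = layers.Conv2D(384, (3, 3), padding="same", activation="relu")(x)  # -> 13x13x384
    x = layers.Conv2D(384, (3, 3), padding="same", activation="relu")(x)  # -> 13x13x384
    x = layers.Conv2D(256, (3, 3), padding="same", activation="relu")(x)  # -> 13x13x256
    x = layers.MaxPooling2D(pool_size=(3, 3), strides=2)(x)   # -> 6x6x256

    x = layers.Flatten()(x)                                   # -> 9216 values
    x = layers.Dense(4096, activation="relu")(x)
    x = layers.Dense(4096, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)  # one probability per class

    return models.Model(inputs=inputs, outputs=outputs, name="AlexNet")


# Usage:
# model = build_alexnet(input_shape=(227, 227, 3))
# model.summary()
```

Only the final Dense layer depends on the number of classes, so the same sketch works for other datasets by changing num_classes.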

Next we will call the function that returns the model. We pass in the shape of our images, which have already been resized to 227x227.

The model can be summarised by calling its summary() method.

Output :

Summary for our model

Next we will compile the model using the Adam optimizer, with categorical_crossentropy as the loss and accuracy as the metric. You can read about losses in Keras here[6], and a quick overview of the optimizers in Keras is available here[7].

Next we will train the model using fit_generator with the command :
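As a rough sketch, compiling and training can be wrapped in one small helper. The name compile_and_train is hypothetical and the number of epochs is left as a parameter, since it is not stated here. The sketch calls fit instead of fit_generator, because recent TensorFlow 2 releases accept generators and iterators in fit directly.

```python
def compile_and_train(model, train_batches, epochs):
    """Compile with Adam + categorical cross-entropy, then train on the iterator."""
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",  # matches the one-hot labels of the iterator
        metrics=["accuracy"],
    )
    # fit() consumes the DirectoryIterator batch by batch and returns a History object.
    return model.fit(train_batches, epochs=epochs)


# Usage (the epoch count is an arbitrary value for illustration):
# history = compile_and_train(model, train, epochs=10)
# print(history.history["accuracy"][-1])
```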

Output :

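To know more about fit_generator and how it differs from fit, you can check out this website. Note that recent TensorFlow releases deprecate fit_generator, since fit accepts generators directly. After running our model, we got a training accuracy of 98.33%. Next we will load the test data to get the test accuracy: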

Output

Next we will evaluate our model on test data

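# Assumes the test iterator is named `test`; evaluate returns [loss, accuracy]
preds = model.evaluate(test)
print("Loss = " + str(preds[0]))
print("Test Accuracy = " + str(preds[1]))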

Output :

We got a test accuracy of 87.2%. Next we will run the model over the prediction images.

Output:

Run the predict_generator on it

Let’s look at some of the predicted images:

Input Image to our model

To get its prediction we will do :

Output :

This is the output of our model. Since we used softmax in the last layer, the model returns the probability of each category for this particular input image. As seen above, the model predicts the image as ‘buildings’ with a probability of 0.99999893.

We don’t want this to be our output format, so we will write a function that returns the category the model predicts for the input image.
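Such a function could look like the sketch below. It assumes the label indices follow the alphabetical order of the class folders, which is how flow_from_directory assigns them; the CATEGORIES list is written out by hand here, and the class_indices attribute of the iterator can be used to confirm the actual mapping.

```python
import numpy as np

# Assumed order: alphabetical folder names, matching flow_from_directory.
CATEGORIES = ["buildings", "forest", "glacier", "mountain", "sea", "street"]


def get_category(probabilities, categories=CATEGORIES):
    """Return the name of the class with the highest predicted probability."""
    return categories[int(np.argmax(probabilities))]


# Usage with a made-up softmax output for one image:
# get_category([0.01, 0.02, 0.9, 0.03, 0.02, 0.02])
```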

We will call the function as :

Output :

The outputs for some other images are shown below:

import matplotlib.pyplot as plt

# 2x3 grid of prediction images, each titled with its predicted category
fig, axs = plt.subplots(2, 3)
axs[0][0].imshow(predict[1002][0][0])
axs[0][0].set_title(get_category(predictions[1002]))
axs[0][1].imshow(predict[22][0][0])
axs[0][1].set_title(get_category(predictions[22]))
axs[0][2].imshow(predict[1300][0][0])
axs[0][2].set_title(get_category(predictions[1300]))
axs[1][0].imshow(predict[3300][0][0])
axs[1][0].set_title(get_category(predictions[3300]))
axs[1][1].imshow(predict[7002][0][0])
axs[1][1].set_title(get_category(predictions[7002]))
axs[1][2].imshow(predict[512][0][0])
axs[1][2].set_title(get_category(predictions[512]))
plt.show()
These are some output samples from our model

The Python Notebook for this model can be cloned/downloaded from my GitHub here. I hope you like this article and that you will be able to build your own model with a different dataset and/or with custom layers instead of following a classic CNN network.

I hope this article gives you some insight into AlexNet. Better networks such as VGG16, VGG19, ResNets etc. are also worth a try. I hope you find this article interesting and will try some other classic CNN models on the classification problem. Feel free to share your results in the comment box. Do clap for the article if you like it, as it motivates me to write more such posts. Find me on LinkedIn and Instagram and share your feedback.

References

[1] Krizhevsky, Alex & Sutskever, Ilya & Hinton, Geoffrey. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Neural Information Processing Systems. 25. 10.1145/3065386.

[2] https://coursera.org/share/1fe2c4b8b8d1e3039ca6ae359b8edb30

[3] https://keras.io/

[4] https://keras.io/api/preprocessing/image/

[5] https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/DirectoryIterator

[6] https://keras.io/api/losses/

[7] https://keras.io/api/optimizers/

