
Implementing AlexNet CNN Architecture Using TensorFlow 2.0+ and Keras

Learn how to implement the neural network architecture that kicked off the deep convolutional neural network revolution back in 2012.

Introduction

This article presents how the AlexNet Convolutional Neural Network (CNN) architecture is implemented using TensorFlow and Keras.

But first, allow me to provide a brief background behind the AlexNet CNN architecture.

AlexNet was first utilized in the public setting when it won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 contest. It was at this contest that AlexNet showed that deep convolutional neural networks can be used to solve image classification tasks.

AlexNet won the ILSVRC 2012 contest by a considerable margin.

The paper that detailed the internal components of the CNN architecture also introduced some novel techniques and methods, such as efficient computing resource utilization, data augmentation, GPU training, and multiple strategies to prevent overfitting within neural networks.

I have written an article that presents key ideas and techniques that AlexNet brought to the world of computer vision and deep learning.

Here are some of the key learning objectives from this article:

  • Introduction to neural network implementation with Keras and TensorFlow
  • Data preprocessing with TensorFlow
  • Training visualization with TensorBoard
  • Description of standard machine learning terms and terminology

AlexNet Implementation

The AlexNet CNN is probably one of the simplest architectures through which to approach an understanding of deep learning concepts and techniques.

AlexNet is not a complicated architecture when compared with some of the state-of-the-art CNN architectures that have emerged in more recent years.

AlexNet is simple enough for beginners and intermediate deep learning practitioners to pick up some good practices on model implementation techniques.

All code presented in this article is written using Jupyter Lab. At the end of this article is a GitHub link to the notebook that includes all code in the implementation section.

So let’s begin.

1. Tools And Libraries

We begin implementation by importing the following libraries:

  • TensorFlow: An open-source platform for the implementation, training, and deployment of machine learning models.
  • Keras: An open-source library used for the implementation of neural network architectures that run on both CPUs and GPUs.
  • Matplotlib: A Python visualization tool used for illustrating interactive charts and images.

import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import os
import time
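
Since this implementation targets TensorFlow 2.0+, a quick version check (an optional addition, not part of the original notebook) can confirm that a 2.x installation is active:

print(tf.__version__)  # should print a version number starting with 2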

2. Dataset

The CIFAR-10 dataset contains 60,000 colour images, each with dimensions of 32x32px. The content of the images within the dataset is sampled from 10 classes.

The CIFAR-10 images were aggregated by some of the creators of the AlexNet network, Alex Krizhevsky and Geoffrey Hinton.

The Keras deep learning library provides direct access to the CIFAR-10 dataset with relative ease through its datasets module (keras.datasets). Accessing common datasets such as CIFAR-10 or MNIST becomes a trivial task with Keras.

(train_images, train_labels), (test_images, test_labels) = keras.datasets.cifar10.load_data()

In order to reference the class names of the images during the visualization stage, a python list containing the classes is initialized with the variable name CLASS_NAMES.

CLASS_NAMES= ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

The CIFAR-10 dataset is partitioned into 50,000 training images and 10,000 test images by default. The last partition of the dataset we require is the validation data.

The validation data is obtained by taking the first 5,000 images within the training data.

validation_images, validation_labels = train_images[:5000], train_labels[:5000]
train_images, train_labels = train_images[5000:], train_labels[5000:]

Training Dataset: This is the partition of our dataset exposed directly to the neural network during training.

Validation Dataset: This group of the dataset is utilized during training to assess the performance of the network at various iterations.

Test Dataset: This partition of the dataset evaluates the performance of our network after the completion of the training phase.
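
As a quick optional sanity check (not in the original notebook), the shapes of the three partitions can be printed to confirm the split described above:

print(train_images.shape, train_labels.shape)            # (45000, 32, 32, 3) (45000, 1)
print(validation_images.shape, validation_labels.shape)  # (5000, 32, 32, 3) (5000, 1)
print(test_images.shape, test_labels.shape)              # (10000, 32, 32, 3) (10000, 1)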

TensorFlow provides a suite of functions and operations that enables easy data manipulation and modification through a defined input pipeline.

To be able to access these methods and procedures, it is required that we transform our dataset into an efficient data representation TensorFlow is familiar with. This is achieved using the tf.data API.

More specifically, the tf.data.Dataset.from_tensor_slices method takes the train, test, and validation dataset partitions and returns a corresponding TensorFlow Dataset representation.

train_ds = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
test_ds = tf.data.Dataset.from_tensor_slices((test_images, test_labels))
validation_ds = tf.data.Dataset.from_tensor_slices((validation_images, validation_labels))
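
If you are curious, inspecting the element specification of one of these Dataset objects (an optional check, not in the original notebook) shows that each element is an (image, label) pair of tensors:

print(train_ds.element_spec)
# Roughly: (TensorSpec(shape=(32, 32, 3), dtype=tf.uint8), TensorSpec(shape=(1,), dtype=tf.uint8))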

3. Preprocessing

Preprocessing within any machine learning project is associated with the transformation of data from one form to another.

Usually, preprocessing is conducted to ensure the data utilized is within an appropriate format.

First, let’s visualize the images within the CIFAR-10 dataset.

The code snippet below uses the Matplotlib library to render the pixel data of five training images as actual images, together with an indicator of the class each image belongs to.

Excuse the blurriness of the images; the CIFAR-10 images have small dimensions, which makes visualization of the actual pictures a bit difficult.

plt.figure(figsize=(20,20))
for i, (image, label) in enumerate(train_ds.take(5)):
    ax = plt.subplot(5,5,i+1)
    plt.imshow(image)
    plt.title(CLASS_NAMES[label.numpy()[0]])
    plt.axis('off')

The primary preprocessing transformations that will be imposed on the data presented to the network are:

  • Normalizing and standardizing the images.
  • Resizing of the images from 32x32 to 227x227. The AlexNet network input expects a 227x227 image.

We’ll create a function called process_images.

This function will perform all preprocessing work that we require for the data. This function is called further down the machine learning workflow.

def process_images(image, label):
    # Normalize images to have a mean of 0 and standard deviation of 1
    image = tf.image.per_image_standardization(image)
    # Resize images from 32x32 to 227x227
    image = tf.image.resize(image, (227,227))
    return image, label
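
As a small optional check (not part of the original article), the function can be applied to a single image to confirm the output dimensions before wiring it into the pipeline:

sample_image, sample_label = process_images(train_images[0], train_labels[0])
print(sample_image.shape)  # (227, 227, 3)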

4. Data/Input Pipeline

So far, we have obtained and partitioned the dataset and created a function to process the dataset. The next step is to build an input pipeline.

An input/data pipeline is described as a series of functions or methods that are called consecutively, one after another. Input pipelines are a chain of functions that either act upon the data or enforce an operation on the data flowing through the pipeline.

Let’s get the size of each of the dataset partitions we created; the sizes are required to ensure that each dataset is thoroughly shuffled before being passed through the network.

train_ds_size = tf.data.experimental.cardinality(train_ds).numpy()
test_ds_size = tf.data.experimental.cardinality(test_ds).numpy()
validation_ds_size = tf.data.experimental.cardinality(validation_ds).numpy()
print("Training data size:", train_ds_size)
print("Test data size:", test_ds_size)
print("Validation data size:", validation_ds_size)

For our basic input/data pipeline, we will conduct three primary operations:

  1. Preprocessing the data within the dataset
  2. Shuffle the dataset
  3. Batch data within the dataset
train_ds = (train_ds
            .map(process_images)
            .shuffle(buffer_size=train_ds_size)
            .batch(batch_size=32, drop_remainder=True))
test_ds = (test_ds
           .map(process_images)
           .shuffle(buffer_size=test_ds_size)
           .batch(batch_size=32, drop_remainder=True))
validation_ds = (validation_ds
                 .map(process_images)
                 .shuffle(buffer_size=validation_ds_size)
                 .batch(batch_size=32, drop_remainder=True))

5. Model Implementation

Within this section, we will implement the AlexNet CNN architecture from scratch.

Through the utilization of the Keras Sequential API (keras.models.Sequential), we can implement consecutive neural network layers within our models that are stacked against each other.

Here are the types of layers the AlexNet CNN architecture is composed of, along with a brief description:

Convolutional layer: A convolution is a mathematical term that describes a dot product multiplication between two sets of elements. Within deep learning, the convolution operation acts on the filters/kernels and the image data array within the convolutional layer. Therefore, a convolutional layer is simply a layer that houses the convolution operation that occurs between the filters and the images passed through a convolutional neural network.
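
To make the convolution arithmetic concrete, here is a small worked example (added for clarity, not from the original article): with 'valid' padding the spatial output size is floor((input_size - kernel_size) / stride) + 1, which for AlexNet's first layer gives the 55x55 feature maps seen later in the model summary.

# 227x227 input, 11x11 kernels, stride 4, no padding
print((227 - 11) // 4 + 1)  # 55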

Batch Normalization layer: Batch normalization is a technique that mitigates the effect of unstable gradients within a neural network through the introduction of an additional layer that performs operations on the inputs from the previous layer. The operations standardize and normalize the input values, after which the input values are transformed through scaling and shifting operations.

MaxPooling layer: Max pooling is a variant of sub-sampling where the maximum pixel value of pixels that fall within the receptive field of a unit within a sub-sampling layer is taken as the output. For example, a max-pooling operation with a 2x2 window slides across the input data, outputting the maximum of the pixels within the receptive field of the kernel.
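
Below is a tiny illustrative sketch (not part of the original article) of a 2x2 max-pooling operation applied to a 4x4 input using Keras:

x = tf.constant([[ 1.,  2.,  3.,  4.],
                 [ 5.,  6.,  7.,  8.],
                 [ 9., 10., 11., 12.],
                 [13., 14., 15., 16.]])
x = tf.reshape(x, (1, 4, 4, 1))                       # (batch, height, width, channels)
pooled = keras.layers.MaxPool2D(pool_size=(2, 2))(x)  # strides default to the pool size
print(tf.reshape(pooled, (2, 2)).numpy())             # [[ 6.  8.] [14. 16.]]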

Flatten layer: Takes an input shape and flattens the input image data into a one-dimensional array.

Dense layer: A dense (fully connected) layer contains an arbitrary number of units/neurons, each of which is essentially a perceptron.

Some other operations and techniques utilized within the AlexNet CNN that are worth mentioning are:

Activation Function: A mathematical operation that transforms the result or signals of neurons into a normalized output. The purpose of an activation function as a component of a neural network is to introduce non-linearity within the network. The inclusion of an activation function enables the neural network to have greater representational power and solve complex functions.

Rectified Linear Unit Activation Function(ReLU): A type of activation function that transforms the value results of a neuron. The transformation imposed by ReLU on values from a neuron is represented by the formula y=max(0,x). The ReLU activation function clamps down any negative values from the neuron to 0, and positive values remain unchanged. The result of this mathematical transformation is utilized as the output of the current layer and used as input to a consecutive layer within a neural network.

Softmax Activation Function: A type of activation function that is utilized to derive the probability distribution of a set of numbers within an input vector. The output of a softmax activation function is a vector in which its set of values represents the probability of an occurrence of a class or event. The values within the vector all add up to 1.
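
The behaviour of both activation functions can be verified with a few lines of code (an illustrative addition, not in the original notebook):

x = tf.constant([-2.0, -0.5, 0.0, 1.5, 3.0])
print(tf.nn.relu(x).numpy())                               # [0.  0.  0.  1.5 3. ] -- negatives clamped to 0
probabilities = tf.nn.softmax(x)
print(probabilities.numpy(), probabilities.numpy().sum())  # non-negative values summing to ~1.0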

Dropout: The dropout technique works by randomly reducing the number of interconnecting neurons within a neural network. At every training step, each neuron has a chance of being left out, or rather, dropped out of the collated contributions from connected neurons.
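
A minimal sketch (added for illustration) showing that Keras only applies dropout in training mode, rescaling the surviving activations so the expected sum is preserved:

x = tf.ones((1, 10))
dropout = keras.layers.Dropout(0.5)
print(dropout(x, training=True).numpy())   # roughly half the values zeroed, the rest scaled to 2.0
print(dropout(x, training=False).numpy())  # inputs pass through unchanged at inference time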

The code snippet below represents the Keras implementation of the AlexNet CNN architecture.

model = keras.models.Sequential([
    keras.layers.Conv2D(filters=96, kernel_size=(11,11), strides=(4,4), activation='relu', input_shape=(227,227,3)),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2)),
    keras.layers.Conv2D(filters=256, kernel_size=(5,5), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2)),
    keras.layers.Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2)),
    keras.layers.Flatten(),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation='softmax')
])

6. TensorBoard

At this point, we have the custom AlexNet network implemented.

Before we proceed onto training, validation, and evaluation of the network with data, we first have to set up some monitoring facilities.

TensorBoard is a tool that provides a suite of visualization and monitoring mechanisms. For the work in this tutorial, we’ll be utilizing TensorBoard to monitor the progress of the training of the network.

More specifically, we’ll be monitoring the following metrics: training loss, training accuracy, validation loss, validation accuracy.

In the short code snippet below, we create a reference to the directory where we would like all TensorBoard files to be stored. The function get_run_logdir returns the location of the exact directory, which is named according to the current time at which the training phase starts.

To complete this current process, we pass the directory that stores TensorBoard-related files for a particular training session to the TensorBoard callback (keras.callbacks.TensorBoard).

root_logdir = os.path.join(os.curdir, "logs", "fit")

def get_run_logdir():
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()
tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)

7. Training and Results

To train the network, we have to compile it.

The compilation process involves specifying the following items:

Loss function: A method that quantifies ‘how well’ a machine learning model performs. The quantification is an output (cost) based on a set of inputs, which are referred to as parameter values. The parameter values are used to estimate a prediction, and the ‘loss’ is the difference between the predictions and the actual values.

Optimization Algorithm: An optimizer within a neural network is an algorithmic implementation that facilitates the process of gradient descent within a neural network by minimizing the loss values provided via the loss function. To reduce the loss, it is paramount the values of the weights within the network are selected appropriately.

Learning Rate: An integral component of a neural network implementation, as it is a factor value that determines the size of the updates made to the values of the network’s weights. The learning rate is a type of hyperparameter.

model.compile(loss='sparse_categorical_crossentropy', optimizer=tf.optimizers.SGD(learning_rate=0.001), metrics=['accuracy'])
model.summary()

Running the model.summary() function provides a summary of the network, giving more insight into its layer composition.

Model: "sequential" _________________________________________________________________ Layer (type)                 Output Shape              Param #    ================================================================= conv2d (Conv2D)              (None, 55, 55, 96)        34944      _________________________________________________________________ batch_normalization (BatchNo (None, 55, 55, 96)        384        _________________________________________________________________ max_pooling2d (MaxPooling2D) (None, 27, 27, 96)        0          _________________________________________________________________ conv2d_1 (Conv2D)            (None, 27, 27, 256)       614656     _________________________________________________________________ batch_normalization_1 (Batch (None, 27, 27, 256)       1024       _________________________________________________________________ max_pooling2d_1 (MaxPooling2 (None, 13, 13, 256)       0          _________________________________________________________________ conv2d_2 (Conv2D)            (None, 13, 13, 384)       885120     _________________________________________________________________ batch_normalization_2 (Batch (None, 13, 13, 384)       1536       _________________________________________________________________ conv2d_3 (Conv2D)            (None, 13, 13, 384)       147840     _________________________________________________________________ batch_normalization_3 (Batch (None, 13, 13, 384)       1536       _________________________________________________________________ conv2d_4 (Conv2D)            (None, 13, 13, 256)       98560      _________________________________________________________________ batch_normalization_4 (Batch (None, 13, 13, 256)       1024       _________________________________________________________________ max_pooling2d_2 (MaxPooling2 (None, 6, 6, 256)         0          _________________________________________________________________ flatten (Flatten)            (None, 9216)              0          _________________________________________________________________ dense (Dense)                (None, 4096)              37752832   _________________________________________________________________ dropout (Dropout)            (None, 4096)              0          _________________________________________________________________ dense_1 (Dense)              (None, 4096)              16781312   _________________________________________________________________ dropout_1 (Dropout)          (None, 4096)              0          _________________________________________________________________ dense_2 (Dense)              (None, 10)                40970      ================================================================= Total params: 56,361,738 Trainable params: 56,358,986 Non-trainable params: 2,752 _________________________________________________________________

At this point, we are ready to train the network.

Training the custom AlexNet network is very simple with the Keras module enabled through TensorFlow. We simply have to call the fit() method and pass the relevant arguments.

Epoch: This is a numeric value that indicates the number of times a network has been exposed to all the data points within a training dataset.

model.fit(train_ds,
          epochs=50,
          validation_data=validation_ds,
          validation_freq=1,
          callbacks=[tensorboard_cb])

After executing this cell of code within the notebook, the network will begin to train and validate against the data provided. You’ll start to see training and validation logs such as the one shown below:

Train for 1562 steps, validate for 156 steps
Epoch 1/50
1/1562 [..............................] - ETA: 3:05:44 - loss: 5.6104 - accuracy: 0.0625WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.168881). Check your callbacks.
1562/1562 [==============================] - 42s 27ms/step - loss: 2.0966 - accuracy: 0.3251 - val_loss: 1.4436 - val_accuracy: 0.4920
Epoch 2/50
1562/1562 [==============================] - 39s 25ms/step - loss: 1.5864 - accuracy: 0.4382 - val_loss: 1.2939 - val_accuracy: 0.5447
Epoch 3/50
1562/1562 [==============================] - 39s 25ms/step - loss: 1.4391 - accuracy: 0.4889 - val_loss: 1.1749 - val_accuracy: 0.5859
Epoch 4/50
1562/1562 [==============================] - 39s 25ms/step - loss: 1.3278 - accuracy: 0.5307 - val_loss: 1.0841 - val_accuracy: 0.6228
Epoch 5/50
1562/1562 [==============================] - 39s 25ms/step - loss: 1.2349 - accuracy: 0.5630 - val_loss: 1.0094 - val_accuracy: 0.6569
Epoch 6/50
1562/1562 [==============================] - 40s 25ms/step - loss: 1.1657 - accuracy: 0.5876 - val_loss: 0.9599 - val_accuracy: 0.6851
Epoch 7/50
1562/1562 [==============================] - 39s 25ms/step - loss: 1.1054 - accuracy: 0.6128 - val_loss: 0.9102 - val_accuracy: 0.6937
Epoch 8/50
1562/1562 [==============================] - 40s 26ms/step - loss: 1.0477 - accuracy: 0.6285 - val_loss: 0.8584 - val_accuracy: 0.7109
Epoch 9/50
1562/1562 [==============================] - 39s 25ms/step - loss: 1.0026 - accuracy: 0.6461 - val_loss: 0.8392 - val_accuracy: 0.7137
Epoch 10/50
1562/1562 [==============================] - 39s 25ms/step - loss: 0.9601 - accuracy: 0.6627 - val_loss: 0.7684 - val_accuracy: 0.7398
Epoch 11/50
1562/1562 [==============================] - 40s 25ms/step - loss: 0.9175 - accuracy: 0.6771 - val_loss: 0.7683 - val_accuracy: 0.7476
Epoch 12/50
1562/1562 [==============================] - 40s 25ms/step - loss: 0.8827 - accuracy: 0.6914 - val_loss: 0.7012 - val_accuracy: 0.7702
Epoch 13/50
1562/1562 [==============================] - 40s 25ms/step - loss: 0.8465 - accuracy: 0.7035 - val_loss: 0.6496 - val_accuracy: 0.7903
Epoch 14/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.8129 - accuracy: 0.7160 - val_loss: 0.6137 - val_accuracy: 0.7991
Epoch 15/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.7832 - accuracy: 0.7250 - val_loss: 0.6181 - val_accuracy: 0.7957
Epoch 16/50
1562/1562 [==============================] - 40s 25ms/step - loss: 0.7527 - accuracy: 0.7371 - val_loss: 0.6102 - val_accuracy: 0.7953
Epoch 17/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.7193 - accuracy: 0.7470 - val_loss: 0.5236 - val_accuracy: 0.8327
Epoch 18/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.6898 - accuracy: 0.7559 - val_loss: 0.5091 - val_accuracy: 0.8425
Epoch 19/50
1562/1562 [==============================] - 40s 25ms/step - loss: 0.6620 - accuracy: 0.7677 - val_loss: 0.4824 - val_accuracy: 0.8468
Epoch 20/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.6370 - accuracy: 0.7766 - val_loss: 0.4491 - val_accuracy: 0.8620
Epoch 21/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.6120 - accuracy: 0.7850 - val_loss: 0.4212 - val_accuracy: 0.8694
Epoch 22/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.5846 - accuracy: 0.7943 - val_loss: 0.4091 - val_accuracy: 0.8746
Epoch 23/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.5561 - accuracy: 0.8070 - val_loss: 0.3737 - val_accuracy: 0.8872
Epoch 24/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.5314 - accuracy: 0.8150 - val_loss: 0.3808 - val_accuracy: 0.8810
Epoch 25/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.5107 - accuracy: 0.8197 - val_loss: 0.3246 - val_accuracy: 0.9048
Epoch 26/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.4833 - accuracy: 0.8304 - val_loss: 0.3085 - val_accuracy: 0.9115
Epoch 27/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.4595 - accuracy: 0.8425 - val_loss: 0.2992 - val_accuracy: 0.9111
Epoch 28/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.4395 - accuracy: 0.8467 - val_loss: 0.2566 - val_accuracy: 0.9305
Epoch 29/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.4157 - accuracy: 0.8563 - val_loss: 0.2482 - val_accuracy: 0.9339
Epoch 30/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.3930 - accuracy: 0.8629 - val_loss: 0.2129 - val_accuracy: 0.9449
Epoch 31/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.3727 - accuracy: 0.8705 - val_loss: 0.1999 - val_accuracy: 0.9525
Epoch 32/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.3584 - accuracy: 0.8751 - val_loss: 0.1791 - val_accuracy: 0.9593
Epoch 33/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.3387 - accuracy: 0.8830 - val_loss: 0.1770 - val_accuracy: 0.9557
Epoch 34/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.3189 - accuracy: 0.8905 - val_loss: 0.1613 - val_accuracy: 0.9643
Epoch 35/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.3036 - accuracy: 0.8969 - val_loss: 0.1421 - val_accuracy: 0.9681
Epoch 36/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.2784 - accuracy: 0.9039 - val_loss: 0.1290 - val_accuracy: 0.9736
Epoch 37/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.2626 - accuracy: 0.9080 - val_loss: 0.1148 - val_accuracy: 0.9762
Epoch 38/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.2521 - accuracy: 0.9145 - val_loss: 0.0937 - val_accuracy: 0.9828
Epoch 39/50
1562/1562 [==============================] - 42s 27ms/step - loss: 0.2387 - accuracy: 0.9190 - val_loss: 0.1045 - val_accuracy: 0.9768
Epoch 40/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.2215 - accuracy: 0.9247 - val_loss: 0.0850 - val_accuracy: 0.9860
Epoch 41/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.2124 - accuracy: 0.9274 - val_loss: 0.0750 - val_accuracy: 0.9862
Epoch 42/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.1980 - accuracy: 0.9335 - val_loss: 0.0680 - val_accuracy: 0.9896
Epoch 43/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.1906 - accuracy: 0.9350 - val_loss: 0.0616 - val_accuracy: 0.9912
Epoch 44/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.1769 - accuracy: 0.9410 - val_loss: 0.0508 - val_accuracy: 0.9922
Epoch 45/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.1648 - accuracy: 0.9455 - val_loss: 0.0485 - val_accuracy: 0.9936
Epoch 46/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.1571 - accuracy: 0.9487 - val_loss: 0.0435 - val_accuracy: 0.9952
Epoch 47/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.1514 - accuracy: 0.9501 - val_loss: 0.0395 - val_accuracy: 0.9950
Epoch 48/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.1402 - accuracy: 0.9535 - val_loss: 0.0274 - val_accuracy: 0.9984
Epoch 49/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.1357 - accuracy: 0.9549 - val_loss: 0.0308 - val_accuracy: 0.9966
Epoch 50/50
1562/1562 [==============================] - 42s 27ms/step - loss: 0.1269 - accuracy: 0.9596 - val_loss: 0.0251 - val_accuracy: 0.9976
<tensorflow.python.keras.callbacks.History at 0x2de3aaa0ec8>

For better visualization and monitoring of training performance, we’ll use the TensorBoard functionality.

Open up a terminal at the directory level where the TensorBoard log folder exists and run the following command:

tensorboard --logdir logs
Directory level where TensorBoard log file resides

Follow the instructions on the terminal and navigate to ‘localhost:6006’ (this could be a different port number for you).

You will then be presented with a page similar to the image depicted below:

TensorBoard Tool

Below is the snippet of the visualization of the complete training and validation phase provided by TensorBoard.

TensorBoard Training and Validation monitoring

8. Evaluation

The last official step is to assess the trained network through network evaluation.

The evaluation phase will provide a performance score of the trained model on unseen data. For the evaluation phase of the model, we’ll be utilizing the batch of test data created at earlier steps.

Evaluating a model is very simple: you simply call the evaluate() method and pass the batched test data.

model.evaluate(test_ds)

After executing the cell block above, we are presented with a score that indicates the performance of the model on unseen data.

312/312 [==============================] - 8s 27ms/step - loss: 0.9814 - accuracy: 0.7439
[0.9813630809673132, 0.7438902]

The first element of the returned result contains the evaluation loss (0.9813); the second element is the evaluation accuracy (0.7439).

The custom AlexNet network was trained, validated, and evaluated on the CIFAR-10 dataset, producing a model with an evaluation accuracy of roughly 74% on a test dataset containing 10,000 data points.

Bonus (Optional)

This section includes some information that supplements the implementation of an AlexNet convolutional neural network.

Although this additional information is not crucial to gain an understanding of the implementation processes, these sections will provide readers with some additional background knowledge that can be leveraged in future work.

The sections covered are as follows:

  • Local Response Normalisation
  • Information on why we batch and shuffle the dataset before training

Local Response Normalisation

Many are familiar with batch normalization, but the AlexNet architecture used a different method of normalization within the network: Local Response Normalization (LRN).

LRN is a technique that normalizes the activations of neighbouring neurons. Neighbouring neurons describe neurons across several feature maps that share the same spatial position. By normalizing the activations, neurons with relatively high activations are highlighted while their neighbours are suppressed; this essentially mimics the lateral inhibition that happens within neurobiology.

LRN is not widely utilized in modern CNN architectures, as there are other, more effective methods of normalization. That said, LRN implementations can still be found in some standard deep learning libraries, so feel free to experiment.
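
For readers who want to experiment, below is a minimal sketch (not part of the original notebook) of how an LRN layer could be dropped into the Keras model in place of BatchNormalization, using tf.nn.local_response_normalization with hyperparameters in the spirit of the original AlexNet paper (k=2, n=5, alpha=1e-4, beta=0.75):

# Hypothetical LRN layer built from TensorFlow's LRN op; parameter choices are assumptions
lrn_layer = keras.layers.Lambda(
    lambda x: tf.nn.local_response_normalization(
        x, depth_radius=2, bias=2.0, alpha=1e-4, beta=0.75))
# lrn_layer could then be placed inside keras.models.Sequential wherever normalization is required.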

Why do we shuffle the dataset?

Shuffling the dataset before training is a traditional process within a typical machine learning project. But why do we do it?

When conducting data aggregation, it is common to consecutively accumulate images or data points that correspond to the same classes and labels. A typical final result after loading the data used to train and validate a network is a set of images/data points arranged in order of their corresponding classes.

Within deep learning, neural networks learn by detecting patterns in the spatial information of images.

Suppose we have a dataset of 10,000 images with five classes. The first 2,000 images belong to Class 1; the second 2,000 images belong to Class 2, and so on.

During the training phase, if we present the network with unshuffled training data, we would find that the neural network will learn patterns that closely correlate to Class 1, as these are the images and data points the neural network is exposed to first. This will increase the difficulty of an optimization algorithm discovering an optimal solution for the entire dataset.

By shuffling the dataset, we ensure two key things:

1. There is large enough variance within the dataset that enables each data point within the training data to have an independent effect on the network. Therefore we can have a network that generalizes well to the entire dataset, rather than a subsection of the dataset.

2. Our validation partition of the dataset is obtained from the training data; if we fail to shuffle the dataset appropriately, we find that our validation dataset will not be representative of the classes within the training data. For example, our validation dataset might only contain data points from the last class of the training data, as opposed to an equal representation of every class within the dataset.

Why do we batch the dataset before training?

Dataset partitions are usually batched for memory optimization reasons. There are two ways you can train a network.

  1. Present all the training data to the network at once
  2. Batch the training data in smaller segments (e.g., 8, 16, 32, 64), and at each iteration, a single batch is presented to the network.

Approach #1 will work for a small dataset, but when you start approaching a larger-sized dataset, you will find that approach #1 consumes a lot of memory resources.

By using approach #1 for a large dataset, the images or data points are held in memory, and this typically causes an ‘Out of Memory’ error during training.

Approach #2 is a more conservative method of training a network with a large dataset while maintaining efficient memory management. By batching the training data, we only hold 16, 32, or 128 data points in memory at any given time, as opposed to an entire dataset.

Conclusion

This detailed article covers some topics surrounding typical processes within deep learning projects. We’ve gone through the following subject areas:

  • Machine and Deep learning tools and libraries
  • Data partitioning
  • Creating Input and data pipelines using TensorFlow
  • Data Preprocessing
  • Convolutional Neural Network Implementation (AlexNet)
  • Model performance monitoring using TensorBoard
  • Model Evaluation

In the future, we’ll cover the implementation of another well-known convolutional neural network architecture: GoogLeNet.
