
Implementing AlexNet CNN Architecture Using TensorFlow 2.0+ and Keras

Learn how to implement the neural network architecture that kicked off the deep convolutional neural network revolution back in 2012.

Introduction

This article presents how the AlexNet Convolutional Neural Network (CNN) architecture is implemented using TensorFlow and Keras.

But first, allow me to provide a brief background behind the AlexNet CNN architecture.

AlexNet was first utilized in the public setting when it won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 contest. It was at this contest that AlexNet showed that deep convolutional neural networks can be used to solve image classification tasks.

AlexNet won the ILSVRC 2012 contest by a considerable margin.

The paper that detailed the internal components of the CNN architecture also introduced some novel techniques and methods, such as efficient computing resource utilization, data augmentation, GPU training, and multiple strategies to prevent overfitting within neural networks.

I have written an article that presents key ideas and techniques that AlexNet brought to the world of computer vision and deep learning.

Here are some of the key learning objectives from this article:

  • Introduction to neural network implementation with Keras and TensorFlow
  • Data preprocessing with TensorFlow
  • Training visualization with TensorBoard
  • Description of standard machine learning terms and terminology

AlexNet Implementation

The AlexNet CNN is probably one of the simplest architectures through which to approach an understanding of deep learning concepts and techniques.

AlexNet is not a complicated architecture when compared with some of the state-of-the-art CNN architectures that have emerged in more recent years.

AlexNet is simple enough for beginners and intermediate deep learning practitioners to pick up some good practices on model implementation techniques.

All code presented in this article is written using Jupyter Lab. At the end of this article is a GitHub link to the notebook that includes all code in the implementation section.

So let’s begin.

1. Tools And Libraries

We begin implementation by importing the following libraries:

  • TensorFlow: An open-source platform for the implementation, training, and deployment of machine learning models.
  • Keras: An open-source library used for the implementation of neural network architectures that run on both CPUs and GPUs.
  • Matplotlib: A Python visualization tool used for illustrating interactive charts and images.

import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import os
import time
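
Since this implementation targets TensorFlow 2.0+, a quick version check (an optional addition, not part of the original notebook) can confirm that a 2.x installation is active:

print(tf.__version__)  # should print a version number starting with 2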

2. Dataset

The CIFAR-10 dataset contains 60,000 colour images, each with dimensions of 32x32px. The content of the images within the dataset is sampled from 10 classes.

The CIFAR-10 images were aggregated by some of the creators of the AlexNet network, Alex Krizhevsky and Geoffrey Hinton.

The Keras deep learning library provides direct access to the CIFAR-10 dataset with relative ease through its datasets module (keras.datasets). Accessing common datasets such as CIFAR-10 or MNIST becomes a trivial task with Keras.

(train_images, train_labels), (test_images, test_labels) = keras.datasets.cifar10.load_data()

In order to reference the class names of the images during the visualization stage, a python list containing the classes is initialized with the variable name CLASS_NAMES.

CLASS_NAMES= ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

The CIFAR-10 dataset is partitioned into 50,000 training images and 10,000 test images by default. The last partition of the dataset we require is the validation data.

The validation data is obtained by taking the first 5,000 images within the training data.

validation_images, validation_labels = train_images[:5000], train_labels[:5000]
train_images, train_labels = train_images[5000:], train_labels[5000:]

Training Dataset: This is the partition of our dataset exposed directly to the neural network during training.

Validation Dataset: This group of the dataset is utilized during training to assess the performance of the network at various iterations.

Test Dataset: This partition of the dataset evaluates the performance of our network after the completion of the training phase.
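
As a quick optional sanity check (not in the original notebook), the shapes of the three partitions can be printed to confirm the split described above:

print(train_images.shape, train_labels.shape)            # (45000, 32, 32, 3) (45000, 1)
print(validation_images.shape, validation_labels.shape)  # (5000, 32, 32, 3) (5000, 1)
print(test_images.shape, test_labels.shape)              # (10000, 32, 32, 3) (10000, 1)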

TensorFlow provides a suite of functions and operations that enables easy data manipulation and modification through a defined input pipeline.

To be able to access these methods and procedures, it is required that we transform our dataset into an efficient data representation TensorFlow is familiar with. This is achieved using the tf.data API.

More specifically, the tf.data.Dataset.from_tensor_slices method takes the train, test, and validation dataset partitions and returns a corresponding TensorFlow Dataset representation.

train_ds = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
test_ds = tf.data.Dataset.from_tensor_slices((test_images, test_labels))
validation_ds = tf.data.Dataset.from_tensor_slices((validation_images, validation_labels))
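
If you are curious, inspecting the element specification of one of these Dataset objects (an optional check, not in the original notebook) shows that each element is an (image, label) pair of tensors:

print(train_ds.element_spec)
# Roughly: (TensorSpec(shape=(32, 32, 3), dtype=tf.uint8), TensorSpec(shape=(1,), dtype=tf.uint8))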

3. Preprocessing

Preprocessing within any machine learning project is associated with the transformation of data from one form to another.

Usually, preprocessing is conducted to ensure the data utilized is within an appropriate format.

First, let’s visualize the images within the CIFAR-10 dataset.

The code snippet below uses the Matplotlib library to render the pixel data of five training images as actual images, together with an indicator of the class each image belongs to.

Excuse the blurriness of the images; the CIFAR-10 images have small dimensions, which makes visualization of the actual pictures a bit difficult.

plt.figure(figsize=(20,20))
for i, (image, label) in enumerate(train_ds.take(5)):
    ax = plt.subplot(5,5,i+1)
    plt.imshow(image)
    plt.title(CLASS_NAMES[label.numpy()[0]])
    plt.axis('off')

The primary preprocessing transformations that will be imposed on the data presented to the network are:

  • Normalizing and standardizing the images.
  • Resizing of the images from 32x32 to 227x227. The AlexNet network input expects a 227x227 image.

We’ll create a function called process_images.

This function will perform all preprocessing work that we require for the data. This function is called further down the machine learning workflow.

def process_images(image, label):
    # Normalize images to have a mean of 0 and standard deviation of 1
    image = tf.image.per_image_standardization(image)
    # Resize images from 32x32 to 227x227
    image = tf.image.resize(image, (227,227))
    return image, label
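
As a small optional check (not part of the original article), the function can be applied to a single image to confirm the output dimensions before wiring it into the pipeline:

sample_image, sample_label = process_images(train_images[0], train_labels[0])
print(sample_image.shape)  # (227, 227, 3)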

4. Data/Input Pipeline

So far, we have obtained and partitioned the dataset and created a function to process the dataset. The next step is to build an input pipeline.

An input/data pipeline is described as a series of functions or methods that are called consecutively, one after another. Input pipelines are a chain of functions that either act upon the data or enforce an operation on the data flowing through the pipeline.

Let’s get the size of each of the dataset partitions we created; the sizes are required to ensure that each dataset is thoroughly shuffled before being passed through the network.

train_ds_size = tf.data.experimental.cardinality(train_ds).numpy()
test_ds_size = tf.data.experimental.cardinality(test_ds).numpy()
validation_ds_size = tf.data.experimental.cardinality(validation_ds).numpy()
print("Training data size:", train_ds_size)
print("Test data size:", test_ds_size)
print("Validation data size:", validation_ds_size)

For our basic input/data pipeline, we will conduct three primary operations:

  1. Preprocessing the data within the dataset
  2. Shuffle the dataset
  3. Batch data within the dataset
train_ds = (train_ds
            .map(process_images)
            .shuffle(buffer_size=train_ds_size)
            .batch(batch_size=32, drop_remainder=True))
test_ds = (test_ds
           .map(process_images)
           .shuffle(buffer_size=test_ds_size)
           .batch(batch_size=32, drop_remainder=True))
validation_ds = (validation_ds
                 .map(process_images)
                 .shuffle(buffer_size=validation_ds_size)
                 .batch(batch_size=32, drop_remainder=True))

5. Model Implementation

Within this section, we will implement the AlexNet CNN architecture from scratch.

Through the utilization of the Keras Sequential API (keras.models.Sequential), we can implement consecutive neural network layers within our models that are stacked against each other.

Here are the types of layers the AlexNet CNN architecture is composed of, along with a brief description:

Convolutional layer: A convolution is a mathematical term that describes a dot product multiplication between two sets of elements. Within deep learning, the convolution operation acts on the filters/kernels and the image data array within the convolutional layer. Therefore, a convolutional layer is simply a layer that houses the convolution operation that occurs between the filters and the images passed through a convolutional neural network.
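
To make the convolution arithmetic concrete, here is a small worked example (added for clarity, not from the original article): with 'valid' padding the spatial output size is floor((input_size - kernel_size) / stride) + 1, which for AlexNet's first layer gives the 55x55 feature maps seen later in the model summary.

# 227x227 input, 11x11 kernels, stride 4, no padding
print((227 - 11) // 4 + 1)  # 55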

Batch Normalization layer: Batch normalization is a technique that mitigates the effect of unstable gradients within a neural network through the introduction of an additional layer that performs operations on the inputs from the previous layer. The operations standardize and normalize the input values, after which the input values are transformed through scaling and shifting operations.

MaxPooling layer: Max pooling is a variant of sub-sampling where the maximum pixel value of pixels that fall within the receptive field of a unit within a sub-sampling layer is taken as the output. For example, a max-pooling operation with a 2x2 window slides across the input data, outputting the maximum of the pixels within the receptive field of the kernel.
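
Below is a tiny illustrative sketch (not part of the original article) of a 2x2 max-pooling operation applied to a 4x4 input using Keras:

x = tf.constant([[ 1.,  2.,  3.,  4.],
                 [ 5.,  6.,  7.,  8.],
                 [ 9., 10., 11., 12.],
                 [13., 14., 15., 16.]])
x = tf.reshape(x, (1, 4, 4, 1))                       # (batch, height, width, channels)
pooled = keras.layers.MaxPool2D(pool_size=(2, 2))(x)  # strides default to the pool size
print(tf.reshape(pooled, (2, 2)).numpy())             # [[ 6.  8.] [14. 16.]]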

Flatten layer: Takes an input shape and flattens the input image data into a one-dimensional array.

Dense layer: A dense (fully connected) layer contains an arbitrary number of units/neurons, each of which is essentially a perceptron.

Some other operations and techniques utilized within the AlexNet CNN that are worth mentioning are:

Activation Function: A mathematical operation that transforms the result or signals of neurons into a normalized output. The purpose of an activation function as a component of a neural network is to introduce non-linearity within the network. The inclusion of an activation function enables the neural network to have greater representational power and solve complex functions.

Rectified Linear Unit Activation Function(ReLU): A type of activation function that transforms the value results of a neuron. The transformation imposed by ReLU on values from a neuron is represented by the formula y=max(0,x). The ReLU activation function clamps down any negative values from the neuron to 0, and positive values remain unchanged. The result of this mathematical transformation is utilized as the output of the current layer and used as input to a consecutive layer within a neural network.

Softmax Activation Function: A type of activation function that is utilized to derive the probability distribution of a set of numbers within an input vector. The output of a softmax activation function is a vector in which its set of values represents the probability of an occurrence of a class or event. The values within the vector all add up to 1.
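
The behaviour of both activation functions can be verified with a few lines of code (an illustrative addition, not in the original notebook):

x = tf.constant([-2.0, -0.5, 0.0, 1.5, 3.0])
print(tf.nn.relu(x).numpy())                               # [0.  0.  0.  1.5 3. ] -- negatives clamped to 0
probabilities = tf.nn.softmax(x)
print(probabilities.numpy(), probabilities.numpy().sum())  # non-negative values summing to ~1.0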

Dropout: The dropout technique works by randomly reducing the number of interconnecting neurons within a neural network. At every training step, each neuron has a chance of being left out, or rather, dropped out of the collated contributions from connected neurons.
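
A minimal sketch (added for illustration) showing that Keras only applies dropout in training mode, rescaling the surviving activations so the expected sum is preserved:

x = tf.ones((1, 10))
dropout = keras.layers.Dropout(0.5)
print(dropout(x, training=True).numpy())   # roughly half the values zeroed, the rest scaled to 2.0
print(dropout(x, training=False).numpy())  # inputs pass through unchanged at inference time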

The code snippet below represents the Keras implementation of the AlexNet CNN architecture.

model = keras.models.Sequential([
    keras.layers.Conv2D(filters=96, kernel_size=(11,11), strides=(4,4), activation='relu', input_shape=(227,227,3)),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2)),
    keras.layers.Conv2D(filters=256, kernel_size=(5,5), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2)),
    keras.layers.Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2)),
    keras.layers.Flatten(),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation='softmax')
])

6. TensorBoard

At this point, we have the custom AlexNet network implemented.

Before we proceed onto training, validation, and evaluation of the network with data, we first have to set up some monitoring facilities.

TensorBoard is a tool that provides a suite of visualization and monitoring mechanisms. For the work in this tutorial, we’ll be utilizing TensorBoard to monitor the progress of the training of the network.

More specifically, we’ll be monitoring the following metrics: training loss, training accuracy, validation loss, validation accuracy.

In the short code snippet below, we create a reference to the directory where we would like all TensorBoard files to be stored. The function get_run_logdir returns the location of the exact directory, which is named according to the current time at which the training phase starts.

To complete this current process, we pass the directory that stores TensorBoard-related files for a particular training session to the TensorBoard callback (keras.callbacks.TensorBoard).

root_logdir = os.path.join(os.curdir, "logs", "fit")

def get_run_logdir():
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()
tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)

7. Training and Results

To train the network, we have to compile it.

The compilation process involves specifying the following items:

Loss function: A method that quantifies ‘how well’ a machine learning model performs. The quantification is an output (cost) based on a set of inputs, which are referred to as parameter values. The parameter values are used to estimate a prediction, and the ‘loss’ is the difference between the predictions and the actual values.

Optimization Algorithm: An optimizer within a neural network is an algorithmic implementation that facilitates the process of gradient descent within a neural network by minimizing the loss values provided via the loss function. To reduce the loss, it is paramount the values of the weights within the network are selected appropriately.

Learning Rate: An integral component of a neural network implementation, as it is a factor value that determines the size of the updates made to the values of the network’s weights. The learning rate is a type of hyperparameter.

model.compile(loss='sparse_categorical_crossentropy', optimizer=tf.optimizers.SGD(learning_rate=0.001), metrics=['accuracy'])
model.summary()

Running the model.summary() function provides a summary of the network, giving more insight into its layer composition.

Model: "sequential" _________________________________________________________________ Layer (type)                 Output Shape              Param #    ================================================================= conv2d (Conv2D)              (None, 55, 55, 96)        34944      _________________________________________________________________ batch_normalization (BatchNo (None, 55, 55, 96)        384        _________________________________________________________________ max_pooling2d (MaxPooling2D) (None, 27, 27, 96)        0          _________________________________________________________________ conv2d_1 (Conv2D)            (None, 27, 27, 256)       614656     _________________________________________________________________ batch_normalization_1 (Batch (None, 27, 27, 256)       1024       _________________________________________________________________ max_pooling2d_1 (MaxPooling2 (None, 13, 13, 256)       0          _________________________________________________________________ conv2d_2 (Conv2D)            (None, 13, 13, 384)       885120     _________________________________________________________________ batch_normalization_2 (Batch (None, 13, 13, 384)       1536       _________________________________________________________________ conv2d_3 (Conv2D)            (None, 13, 13, 384)       147840     _________________________________________________________________ batch_normalization_3 (Batch (None, 13, 13, 384)       1536       _________________________________________________________________ conv2d_4 (Conv2D)            (None, 13, 13, 256)       98560      _________________________________________________________________ batch_normalization_4 (Batch (None, 13, 13, 256)       1024       _________________________________________________________________ max_pooling2d_2 (MaxPooling2 (None, 6, 6, 256)         0          _________________________________________________________________ flatten (Flatten)            (None, 9216)              0          _________________________________________________________________ dense (Dense)                (None, 4096)              37752832   _________________________________________________________________ dropout (Dropout)            (None, 4096)              0          _________________________________________________________________ dense_1 (Dense)              (None, 4096)              16781312   _________________________________________________________________ dropout_1 (Dropout)          (None, 4096)              0          _________________________________________________________________ dense_2 (Dense)              (None, 10)                40970      ================================================================= Total params: 56,361,738 Trainable params: 56,358,986 Non-trainable params: 2,752 _________________________________________________________________

At this point, we are ready to train the network.

Training the custom AlexNet network is very simple with the Keras module enabled through TensorFlow. We simply have to call the fit() method and pass the relevant arguments.

Epoch: This is a numeric value that indicates the number of times a network has been exposed to all the data points within a training dataset.

model.fit(train_ds,
          epochs=50,
          validation_data=validation_ds,
          validation_freq=1,
          callbacks=[tensorboard_cb])

After executing this cell of code within the notebook, the network will begin to train and validate against the data provided. You’ll start to see training and validation logs such as the one shown below:

Train for 1562 steps, validate for 156 steps
Epoch 1/50
1/1562 [..............................] - ETA: 3:05:44 - loss: 5.6104 - accuracy: 0.0625WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.168881). Check your callbacks.
1562/1562 [==============================] - 42s 27ms/step - loss: 2.0966 - accuracy: 0.3251 - val_loss: 1.4436 - val_accuracy: 0.4920
Epoch 2/50
1562/1562 [==============================] - 39s 25ms/step - loss: 1.5864 - accuracy: 0.4382 - val_loss: 1.2939 - val_accuracy: 0.5447
Epoch 3/50
1562/1562 [==============================] - 39s 25ms/step - loss: 1.4391 - accuracy: 0.4889 - val_loss: 1.1749 - val_accuracy: 0.5859
Epoch 4/50
1562/1562 [==============================] - 39s 25ms/step - loss: 1.3278 - accuracy: 0.5307 - val_loss: 1.0841 - val_accuracy: 0.6228
Epoch 5/50
1562/1562 [==============================] - 39s 25ms/step - loss: 1.2349 - accuracy: 0.5630 - val_loss: 1.0094 - val_accuracy: 0.6569
Epoch 6/50
1562/1562 [==============================] - 40s 25ms/step - loss: 1.1657 - accuracy: 0.5876 - val_loss: 0.9599 - val_accuracy: 0.6851
Epoch 7/50
1562/1562 [==============================] - 39s 25ms/step - loss: 1.1054 - accuracy: 0.6128 - val_loss: 0.9102 - val_accuracy: 0.6937
Epoch 8/50
1562/1562 [==============================] - 40s 26ms/step - loss: 1.0477 - accuracy: 0.6285 - val_loss: 0.8584 - val_accuracy: 0.7109
Epoch 9/50
1562/1562 [==============================] - 39s 25ms/step - loss: 1.0026 - accuracy: 0.6461 - val_loss: 0.8392 - val_accuracy: 0.7137
Epoch 10/50
1562/1562 [==============================] - 39s 25ms/step - loss: 0.9601 - accuracy: 0.6627 - val_loss: 0.7684 - val_accuracy: 0.7398
Epoch 11/50
1562/1562 [==============================] - 40s 25ms/step - loss: 0.9175 - accuracy: 0.6771 - val_loss: 0.7683 - val_accuracy: 0.7476
Epoch 12/50
1562/1562 [==============================] - 40s 25ms/step - loss: 0.8827 - accuracy: 0.6914 - val_loss: 0.7012 - val_accuracy: 0.7702
Epoch 13/50
1562/1562 [==============================] - 40s 25ms/step - loss: 0.8465 - accuracy: 0.7035 - val_loss: 0.6496 - val_accuracy: 0.7903
Epoch 14/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.8129 - accuracy: 0.7160 - val_loss: 0.6137 - val_accuracy: 0.7991
Epoch 15/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.7832 - accuracy: 0.7250 - val_loss: 0.6181 - val_accuracy: 0.7957
Epoch 16/50
1562/1562 [==============================] - 40s 25ms/step - loss: 0.7527 - accuracy: 0.7371 - val_loss: 0.6102 - val_accuracy: 0.7953
Epoch 17/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.7193 - accuracy: 0.7470 - val_loss: 0.5236 - val_accuracy: 0.8327
Epoch 18/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.6898 - accuracy: 0.7559 - val_loss: 0.5091 - val_accuracy: 0.8425
Epoch 19/50
1562/1562 [==============================] - 40s 25ms/step - loss: 0.6620 - accuracy: 0.7677 - val_loss: 0.4824 - val_accuracy: 0.8468
Epoch 20/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.6370 - accuracy: 0.7766 - val_loss: 0.4491 - val_accuracy: 0.8620
Epoch 21/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.6120 - accuracy: 0.7850 - val_loss: 0.4212 - val_accuracy: 0.8694
Epoch 22/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.5846 - accuracy: 0.7943 - val_loss: 0.4091 - val_accuracy: 0.8746
Epoch 23/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.5561 - accuracy: 0.8070 - val_loss: 0.3737 - val_accuracy: 0.8872
Epoch 24/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.5314 - accuracy: 0.8150 - val_loss: 0.3808 - val_accuracy: 0.8810
Epoch 25/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.5107 - accuracy: 0.8197 - val_loss: 0.3246 - val_accuracy: 0.9048
Epoch 26/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.4833 - accuracy: 0.8304 - val_loss: 0.3085 - val_accuracy: 0.9115
Epoch 27/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.4595 - accuracy: 0.8425 - val_loss: 0.2992 - val_accuracy: 0.9111
Epoch 28/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.4395 - accuracy: 0.8467 - val_loss: 0.2566 - val_accuracy: 0.9305
Epoch 29/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.4157 - accuracy: 0.8563 - val_loss: 0.2482 - val_accuracy: 0.9339
Epoch 30/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.3930 - accuracy: 0.8629 - val_loss: 0.2129 - val_accuracy: 0.9449
Epoch 31/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.3727 - accuracy: 0.8705 - val_loss: 0.1999 - val_accuracy: 0.9525
Epoch 32/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.3584 - accuracy: 0.8751 - val_loss: 0.1791 - val_accuracy: 0.9593
Epoch 33/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.3387 - accuracy: 0.8830 - val_loss: 0.1770 - val_accuracy: 0.9557
Epoch 34/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.3189 - accuracy: 0.8905 - val_loss: 0.1613 - val_accuracy: 0.9643
Epoch 35/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.3036 - accuracy: 0.8969 - val_loss: 0.1421 - val_accuracy: 0.9681
Epoch 36/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.2784 - accuracy: 0.9039 - val_loss: 0.1290 - val_accuracy: 0.9736
Epoch 37/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.2626 - accuracy: 0.9080 - val_loss: 0.1148 - val_accuracy: 0.9762
Epoch 38/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.2521 - accuracy: 0.9145 - val_loss: 0.0937 - val_accuracy: 0.9828
Epoch 39/50
1562/1562 [==============================] - 42s 27ms/step - loss: 0.2387 - accuracy: 0.9190 - val_loss: 0.1045 - val_accuracy: 0.9768
Epoch 40/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.2215 - accuracy: 0.9247 - val_loss: 0.0850 - val_accuracy: 0.9860
Epoch 41/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.2124 - accuracy: 0.9274 - val_loss: 0.0750 - val_accuracy: 0.9862
Epoch 42/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.1980 - accuracy: 0.9335 - val_loss: 0.0680 - val_accuracy: 0.9896
Epoch 43/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.1906 - accuracy: 0.9350 - val_loss: 0.0616 - val_accuracy: 0.9912
Epoch 44/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.1769 - accuracy: 0.9410 - val_loss: 0.0508 - val_accuracy: 0.9922
Epoch 45/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.1648 - accuracy: 0.9455 - val_loss: 0.0485 - val_accuracy: 0.9936
Epoch 46/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.1571 - accuracy: 0.9487 - val_loss: 0.0435 - val_accuracy: 0.9952
Epoch 47/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.1514 - accuracy: 0.9501 - val_loss: 0.0395 - val_accuracy: 0.9950
Epoch 48/50
1562/1562 [==============================] - 41s 26ms/step - loss: 0.1402 - accuracy: 0.9535 - val_loss: 0.0274 - val_accuracy: 0.9984
Epoch 49/50
1562/1562 [==============================] - 40s 26ms/step - loss: 0.1357 - accuracy: 0.9549 - val_loss: 0.0308 - val_accuracy: 0.9966
Epoch 50/50
1562/1562 [==============================] - 42s 27ms/step - loss: 0.1269 - accuracy: 0.9596 - val_loss: 0.0251 - val_accuracy: 0.9976
<tensorflow.python.keras.callbacks.History at 0x2de3aaa0ec8>

For better visualization and monitoring of training performance, we’ll use the TensorBoard functionality.

Open up a terminal at the directory level where the TensorBoard log folder exists and run the following command:

tensorboard --logdir logs
Directory level where TensorBoard log file resides

Follow the instructions on the terminal and navigate to ‘localhost:6006’ (this could be a different port number for you).

You will then be presented with a page similar to the image depicted below:

TensorBoard Tool

Below is the snippet of the visualization of the complete training and validation phase provided by TensorBoard.

TensorBoard Training and Validation monitoring

8. Evaluation

The last official step is to assess the trained network through network evaluation.

The evaluation phase will provide a performance score of the trained model on unseen data. For the evaluation phase of the model, we’ll be utilizing the batch of test data created at earlier steps.

Evaluating a model is very simple: you simply call the evaluate() method and pass the batched test data.

model.evaluate(test_ds)

After executing the cell block above, we are presented with a score that indicates the performance of the model on unseen data.

312/312 [==============================] - 8s 27ms/step - loss: 0.9814 - accuracy: 0.7439
[0.9813630809673132, 0.7438902]

The first element of the returned result contains the evaluation loss (0.9813); the second element is the evaluation accuracy (0.7439).

The custom AlexNet network was trained, validated, and evaluated on the CIFAR-10 dataset, producing a model with an evaluation accuracy of roughly 74% on a test dataset containing 10,000 data points.

Bonus (Optional)

This section includes some information that supplements the implementation of an AlexNet convolutional neural network.

Although this additional information is not crucial to gain an understanding of the implementation processes, these sections will provide readers with some additional background knowledge that can be leveraged in future work.

The sections covered are as follows:

  • Local Response Normalisation
  • Information on why we batch and shuffle the dataset before training

Local Response Normalisation

Many are familiar with batch normalization, but the AlexNet architecture used a different method of normalization within the network: Local Response Normalization (LRN).

LRN is a technique that normalizes the activations of neighbouring neurons. Neighbouring neurons describe neurons across several feature maps that share the same spatial position. By normalizing the activations, neurons with relatively high activations are highlighted while their neighbours are suppressed; this essentially mimics the lateral inhibition that happens within neurobiology.

LRN is not widely utilized in modern CNN architectures, as there are other, more effective methods of normalization. That said, LRN implementations can still be found in some standard deep learning libraries, so feel free to experiment.
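
For readers who want to experiment, below is a minimal sketch (not part of the original notebook) of how an LRN layer could be dropped into the Keras model in place of BatchNormalization, using tf.nn.local_response_normalization with hyperparameters in the spirit of the original AlexNet paper (k=2, n=5, alpha=1e-4, beta=0.75):

# Hypothetical LRN layer built from TensorFlow's LRN op; parameter choices are assumptions
lrn_layer = keras.layers.Lambda(
    lambda x: tf.nn.local_response_normalization(
        x, depth_radius=2, bias=2.0, alpha=1e-4, beta=0.75))
# lrn_layer could then be placed inside keras.models.Sequential wherever normalization is required.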

Why do we shuffle the dataset?

Shuffling the dataset before training is a traditional process within a typical machine learning project. But why do we do it?

When conducting data aggregation, it is common to consecutively accumulate images or data points that correspond to the same classes and labels. A typical final result after loading the data used to train and validate a network is a set of images/data points arranged in order of their corresponding classes.

Within deep learning, neural networks learn by detecting patterns in the spatial information of images.

Suppose we have a dataset of 10,000 images with five classes. The first 2,000 images belong to Class 1; the second 2,000 images belong to Class 2, and so on.

During the training phase, if we present the network with unshuffled training data, we would find that the neural network will learn patterns that closely correlate to Class 1, as these are the images and data points the neural network is exposed to first. This will increase the difficulty of an optimization algorithm discovering an optimal solution for the entire dataset.

By shuffling the dataset, we ensure two key things:

1. There is large enough variance within the dataset that enables each data point within the training data to have an independent effect on the network. Therefore we can have a network that generalizes well to the entire dataset, rather than a subsection of the dataset.

2. Our validation partition of the dataset is obtained from the training data; if we fail to shuffle the dataset appropriately, we find that our validation dataset will not be representative of the classes within the training data. For example, our validation dataset might only contain data points from the last class of the training data, as opposed to an equal representation of every class within the dataset.

Why do we batch the dataset before training?

Dataset partitions are usually batched for memory optimization reasons. There are two ways you can train a network.

  1. Present all the training data to the network at once
  2. Batch the training data in smaller segments (e.g., 8, 16, 32, 64), and at each iteration, a single batch is presented to the network.

Approach #1 will work for a small dataset, but when you start approaching a larger-sized dataset, you will find that approach #1 consumes a lot of memory resources.

By using approach #1 for a large dataset, the images or data points are held in memory, and this typically causes an ‘Out of Memory’ error during training.

Approach #2 is a more conservative method of training a network with a large dataset while maintaining efficient memory management. By batching the training data, we only hold 16, 32, or 128 data points in memory at any given time, as opposed to an entire dataset.

Conclusion

This detailed article covers some topics surrounding typical processes within deep learning projects. We’ve gone through the following subject areas:

  • Machine and Deep learning tools and libraries
  • Data partitioning
  • Creating Input and data pipelines using TensorFlow
  • Data Preprocessing
  • Convolutional Neural Network Implementation (AlexNet)
  • Model performance monitoring using TensorBoard
  • Model Evaluation

In the future, we’ll cover the implementation of another well-known convolutional neural network architecture: GoogLeNet.
