LSTM Autoencoder for Anomaly Detection

Create an AI deep learning anomaly detection model using Python, Keras and TensorFlow

The goal of this post is to walk you through the steps to create and train an AI deep learning neural network for anomaly detection using Python, Keras and TensorFlow. I will not delve too much into the underlying theory and assume the reader has some basic knowledge of the underlying technologies. However, I will provide links to more detailed information as we go, and you can find the source code for this study in my GitHub repo.

Analysis Dataset

We will use vibration sensor readings from the NASA Acoustics and Vibration Database as our dataset for this study. In the NASA study, sensor readings were taken on four bearings that were run to failure under constant load over multiple days. Our dataset consists of individual files that are 1-second vibration signal snapshots recorded at 10-minute intervals. Each file contains 20,480 sensor data points per bearing, obtained by reading the bearing sensors at a sampling rate of 20 kHz.

You can download the sensor data here. Due to GitHub size limitations, the bearing sensor data is split between two zip files (Bearing_Sensor_Data_pt1 and 2). You will need to unzip them and combine them into a single data directory.

Anomaly Detection

Anomaly detection is the task of determining when something has gone astray from the “norm”. Anomaly detection using neural networks is modeled in an unsupervised or self-supervised manner, as opposed to supervised learning, where there is a one-to-one correspondence between input feature samples and their corresponding output labels. The presumption is that normal behavior dominates: “normal” data is plentiful and anomalies are rare exceptions, which is what makes it possible to model “normalcy” directly.

We will use an autoencoder deep learning neural network model to identify vibrational anomalies from the sensor readings. The goal is to predict future bearing failures before they happen.

LSTM Networks

The concept for this study was taken in part from an excellent article by Dr. Vegard Flovik, “Machine learning for anomaly detection and condition monitoring”. In that article, the author used dense neural network cells in the autoencoder model. Here, we will use Long Short-Term Memory (LSTM) neural network cells in our autoencoder model. LSTM networks are a sub-type of the more general recurrent neural network (RNN). A key attribute of recurrent neural networks is their ability to persist information, or cell state, for use later in the network. This makes them particularly well suited for the analysis of temporal data. LSTM networks are used in tasks such as speech recognition and text translation, and here, in the analysis of sequential sensor readings for anomaly detection.

There are numerous excellent articles by individuals far better qualified than I to discuss the fine details of LSTM networks. So if you’re curious, here is a link to an excellent article on LSTM networks. There is also the de facto place for all things LSTM: Andrej Karpathy’s blog. Enough with the theory, let’s get on with the code…

Load, Pre-Process & Review Data

I will be using a Python 3 Jupyter notebook from the Anaconda distribution for creating and training our neural network model. We will use TensorFlow as our backend and Keras as our core model development library. The first task is to load our Python libraries. We then set our random seed in order to create reproducible results.
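Here is a minimal sketch of that setup, assuming TensorFlow 2.x with its bundled Keras API; the exact import list in the original notebook may differ:

```python
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf

# Fix the random seeds so training results are reproducible
SEED = 10
np.random.seed(SEED)
tf.random.set_seed(SEED)
```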

The assumption is that the mechanical degradation in the bearings occurs gradually over time; therefore, we will use one data point every 10 minutes in our analysis. Each 10-minute data file is aggregated into a single reading per bearing by taking the mean of the absolute values of the 20,480 vibration recordings. We then merge everything into a single Pandas dataframe.
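A sketch of the aggregation step, assuming the unzipped files sit in a local data directory and follow the NASA file format (tab-separated columns, one per bearing, with the recording timestamp encoded in the filename):

```python
data_dir = 'data'
merged_data = pd.DataFrame()

for filename in sorted(os.listdir(data_dir)):
    # Each file is a 1-second snapshot: 20,480 rows, one column per bearing
    dataset = pd.read_csv(os.path.join(data_dir, filename), sep='\t', header=None)
    # Aggregate the snapshot to one value per bearing:
    # the mean of the absolute vibration amplitudes
    mean_abs = dataset.abs().mean().to_frame().T
    mean_abs.index = [filename]
    merged_data = pd.concat([merged_data, mean_abs])

merged_data.columns = ['Bearing 1', 'Bearing 2', 'Bearing 3', 'Bearing 4']
# Filenames encode the recording time, e.g. "2004.02.12.10.32.39"
merged_data.index = pd.to_datetime(merged_data.index, format='%Y.%m.%d.%H.%M.%S')
merged_data = merged_data.sort_index()
```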

Next, we define the datasets for training and testing our neural network. To do this, we perform a simple split where we train on the first part of the dataset, which represents normal operating conditions. We then test on the remaining part of the dataset that contains the sensor readings leading up to the bearing failure.
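A simple chronological split might look like this; the cutoff timestamp is illustrative and should mark the end of known-normal operation in your merged dataframe:

```python
# Train on the early, healthy portion; test on the run-up to failure
train = merged_data.loc[:'2004-02-15 12:52:39']
test = merged_data.loc['2004-02-15 12:52:39':]
print('Training dataset shape:', train.shape)
print('Test dataset shape:', test.shape)
```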

Now that we’ve loaded, aggregated and defined our training and test data, let’s review the trending pattern of the sensor data over time. First, we plot the training set sensor readings, which represent normal operating conditions for the bearings.

Next, we take a look at the test dataset sensor readings over time.
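Both plots can be produced with a small matplotlib helper along these lines (a sketch; the figure styling is my own choice):

```python
def plot_sensor_data(df, title):
    # Each column (bearing) is drawn as its own line over time
    ax = df.plot(figsize=(14, 6), linewidth=1, fontsize=12)
    ax.set_title(title, fontsize=16)
    ax.set_xlabel('Time')
    ax.set_ylabel('Mean absolute vibration')
    plt.show()

plot_sensor_data(train, 'Bearing Sensor Training Data')
plot_sensor_data(test, 'Bearing Sensor Test Data')
```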

Midway through the test set timeframe, the sensor patterns begin to change. Near the failure point, the bearing vibration readings become much stronger and oscillate wildly. To gain a slightly different perspective of the data, we will transform the signal from the time domain to the frequency domain using a Fourier transform.

Let’s first look at the training data in the frequency domain.
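One way to do this is with NumPy’s FFT applied column-wise to each bearing; the test set is transformed the same way. A sketch:

```python
# Transform each bearing's signal from the time domain to the frequency domain
train_fft = np.fft.fft(train.values, axis=0)

fig, ax = plt.subplots(figsize=(14, 6))
for i, col in enumerate(train.columns):
    # Plot the magnitude spectrum, skipping the DC component at index 0
    ax.plot(np.abs(train_fft[1:, i]), label=col, linewidth=1)
ax.set_title('Training Data: Frequency Domain', fontsize=16)
ax.set_xlabel('Frequency bin')
ax.set_ylabel('Amplitude')
ax.legend()
plt.show()
```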

There is nothing notable about the normal operational sensor readings. Now, let’s look at the sensor frequency readings leading up to the bearing failure.

We can clearly see an increase in the frequency amplitude and energy in the system leading up to the bearing failures.

To complete the pre-processing of our data, we will first normalize it to a range between 0 and 1. Then we reshape our data into a format suitable for input into an LSTM network. LSTM cells expect a three-dimensional tensor of the form [data samples, time steps, features]. Here, each sample input into the LSTM network represents one step in time and contains 4 features: the sensor readings for the four bearings at that time step.
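A sketch of both steps, using scikit-learn’s MinMaxScaler for the normalization; note that the scaler is fit on the training data only, so no test-set statistics leak into training:

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()  # scales each feature to the [0, 1] range
X_train = scaler.fit_transform(train)
X_test = scaler.transform(test)

# Reshape to the 3D tensor LSTMs expect: [samples, time steps, features].
# Each sample here is a single time step with 4 features (one per bearing).
X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
print('Training data shape:', X_train.shape)
print('Test data shape:', X_test.shape)
```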

One of the advantages of using LSTM cells is the ability to include multivariate features in your analysis. Here, that means the four sensor readings per time step. In an online fraud detection analysis, by contrast, the features per time step might be the time of day, the dollar amount, the item purchased, and the originating IP address.

Neural Network Model

We will use an autoencoder neural network architecture for our anomaly detection model. The autoencoder architecture essentially learns an “identity” function. It will take the input data, create a compressed representation of the core / primary driving features of that data and then learn to reconstruct it again. For instance, given an image of a dog, it will compress that data down to the core constituents that make up the dog picture and then learn to recreate the original picture from the compressed version of the data.

The rationale for using this architecture for anomaly detection is that we train the model on the “normal” data and determine the resulting reconstruction error. Then, when the model encounters data that is outside the norm and attempts to reconstruct it, we will see an increase in the reconstruction error as the model was never trained to accurately recreate items from outside the norm.

We create our autoencoder neural network model as a Python function using the Keras library.

In the LSTM autoencoder network architecture, the first couple of neural network layers create the compressed representation of the input data, forming the encoder. We then use a repeat vector layer to distribute the compressed representational vector across the time steps of the decoder. The final output layer of the decoder provides us with the reconstructed input data.
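A sketch of such a function; the layer widths (16 and 4 units) are illustrative choices rather than values prescribed above:

```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense

def autoencoder_model(X):
    inputs = Input(shape=(X.shape[1], X.shape[2]))
    # Encoder: compress the input sequence into a small representation
    encoded = LSTM(16, activation='relu', return_sequences=True)(inputs)
    encoded = LSTM(4, activation='relu', return_sequences=False)(encoded)
    # Distribute the compressed vector across the decoder's time steps
    repeated = RepeatVector(X.shape[1])(encoded)
    # Decoder: reconstruct the original sequence
    decoded = LSTM(4, activation='relu', return_sequences=True)(repeated)
    decoded = LSTM(16, activation='relu', return_sequences=True)(decoded)
    output = TimeDistributed(Dense(X.shape[2]))(decoded)
    return Model(inputs=inputs, outputs=output)
```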

We then instantiate the model and compile it using Adam as our neural network optimizer and mean absolute error (MAE) as our loss function.
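In code, this is a couple of lines; the model.summary() call simply prints the architecture for inspection:

```python
model = autoencoder_model(X_train)
model.compile(optimizer='adam', loss='mae')
model.summary()
```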

Finally, we fit the model to our training data and train it for 100 epochs. We then plot the training losses to evaluate our model’s performance.
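A training sketch; the batch size and validation split below are my assumptions, not values from the original study:

```python
history = model.fit(X_train, X_train,   # autoencoder: the input is the target
                    epochs=100,
                    batch_size=10,
                    validation_split=0.05).history

# Plot training and validation loss to check convergence
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(history['loss'], 'b', label='Train', linewidth=2)
ax.plot(history['val_loss'], 'r', label='Validation', linewidth=2)
ax.set_title('Model Loss', fontsize=16)
ax.set_xlabel('Epoch')
ax.set_ylabel('Loss (MAE)')
ax.legend(loc='upper right')
plt.show()
```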

Loss Distribution

By plotting the distribution of the calculated loss on the training set, we can determine a suitable threshold value for identifying an anomaly. Setting this threshold above the “noise level” of the normal data helps ensure that false positives are not triggered.
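A sketch of that calculation and plot, assuming seaborn 0.11+ for histplot:

```python
# Reconstruct the training set and compute the per-sample
# mean absolute reconstruction error across the 4 bearings
X_pred = model.predict(X_train).reshape(X_train.shape[0], X_train.shape[2])
X_flat = X_train.reshape(X_train.shape[0], X_train.shape[2])
train_loss = np.mean(np.abs(X_pred - X_flat), axis=1)

plt.figure(figsize=(10, 5))
sns.histplot(train_loss, bins=50, kde=True)
plt.title('Training Loss Distribution', fontsize=16)
plt.xlabel('Reconstruction loss (MAE)')
plt.show()
```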

Based on the above loss distribution, let’s try a threshold value of 0.275 for flagging an anomaly. We then calculate the reconstruction loss in the training and test sets to determine when the sensor readings cross the anomaly threshold.

Note that we’ve merged everything into one dataframe to visualize the results over time. The red line indicates our threshold value of 0.275.
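The scoring and visualization might look like the following sketch; score is a hypothetical helper, not a function from the original notebook:

```python
THRESHOLD = 0.275

def score(model, X, index, threshold=THRESHOLD):
    # Reconstruction error per sample, averaged across the 4 bearings
    X_pred = model.predict(X).reshape(X.shape[0], X.shape[2])
    loss = np.mean(np.abs(X_pred - X.reshape(X.shape[0], X.shape[2])), axis=1)
    scored = pd.DataFrame(index=index)
    scored['Loss_mae'] = loss
    scored['Threshold'] = threshold
    scored['Anomaly'] = scored['Loss_mae'] > scored['Threshold']
    return scored

# Merge train and test scores into one dataframe to visualize over time
scored = pd.concat([score(model, X_train, train.index),
                    score(model, X_test, test.index)])

ax = scored['Loss_mae'].plot(figsize=(14, 6), logy=True, color='blue')
scored['Threshold'].plot(ax=ax, color='red')
ax.legend(['Reconstruction loss', 'Threshold (0.275)'])
plt.show()
```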

Our neural network anomaly analysis is able to flag the upcoming bearing malfunction well in advance of the actual physical bearing failure by detecting when the sensor readings begin to diverge from normal operational values.

Finally, we save both the neural network model architecture and its learned weights in the HDF5 (.h5) format. The trained model can then be deployed for anomaly detection.
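Saving and reloading take one call each in Keras; the filename below is illustrative:

```python
# Persist the architecture and learned weights together (requires h5py)
model.save('lstm_autoencoder.h5')

# Later, reload the trained model for inference / deployment
from tensorflow.keras.models import load_model
model = load_model('lstm_autoencoder.h5')
```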

Update:

In the next article, we’ll deploy our trained AI model as a REST API using Docker and Kubernetes for exposing it as a service.
