PatchTST: A Breakthrough in Time Series Forecasting
From theory to practice, understand the PatchTST algorithm and apply it in Python alongside N-BEATS and N-HiTS
Transformer-based models have been successfully applied in many fields, such as natural language processing (think BERT or GPT models) and computer vision, to name a few.
However, when it comes to time series, state-of-the-art results have mostly been achieved by multilayer perceptron (MLP) models such as N-BEATS and N-HiTS. A recent paper even shows that simple linear models outperform complex transformer-based forecasting models on many benchmark datasets (see Zeng et al., 2022).
Still, a new transformer-based model has been proposed that achieves state-of-the-art results for long-term forecasting tasks: PatchTST.
PatchTST stands for patch time series transformer, and it was first proposed in March 2023 by Nie, Nguyen, et al. in their paper: A Time Series is Worth 64 Words: Long-Term Forecasting with Transformers. Their proposed method achieved state-of-the-art results when compared to other transformer-based models.
In this article, we first explore the inner workings of PatchTST, using intuition and no equations. Then, we apply the model in a forecasting project and compare its performance to that of MLP models like N-BEATS and N-HiTS.
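To give a sense of what that application looks like in practice, here is a minimal sketch, assuming the neuralforecast library, which provides implementations of N-BEATS, N-HiTS, and PatchTST. The synthetic hourly data, the 96-step horizon, and the hyperparameters are placeholders for illustration, not the actual experiment in this article.

```python
import numpy as np
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.models import NBEATS, NHITS, PatchTST

# Placeholder hourly series in the long format expected by neuralforecast:
# one row per observation, with columns unique_id, ds and y.
ds = pd.date_range("2023-01-01", periods=1000, freq="H")
y = np.sin(np.arange(1000) * 2 * np.pi / 24) + np.random.normal(0, 0.1, 1000)
Y_df = pd.DataFrame({"unique_id": "series_1", "ds": ds, "y": y})

horizon = 96  # forecast the next 96 hours

# The three models share the same horizon and input window so their
# forecasts can be compared on an equal footing.
models = [
    NBEATS(h=horizon, input_size=2 * horizon, max_steps=100),
    NHITS(h=horizon, input_size=2 * horizon, max_steps=100),
    PatchTST(h=horizon, input_size=2 * horizon, max_steps=100),
]

nf = NeuralForecast(models=models, freq="H")
nf.fit(df=Y_df)

# One column of predictions per model, over the next `horizon` steps.
Y_hat_df = nf.predict()
```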
Of course, for more details about PatchTST, make sure to refer to the original paper.
Learn the latest time series analysis techniques with my free time series cheat sheet in Python! Get the implementation of statistical and deep learning techniques, all in Python and TensorFlow!
Let’s get started!
Exploring PatchTST
As mentioned, PatchTST stands for patch time series transformer.
As the name suggests, it makes use of patching and of the transformer architecture. It also uses channel-independence to handle multivariate time series.
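To make these two ideas concrete before looking at the architecture, here is a minimal sketch in PyTorch of what channel-independence and patching amount to on an input tensor. The batch size, number of channels, window length, patch length, and stride are illustrative values, not settings prescribed by the paper.

```python
import torch

# Illustrative dimensions: a batch of 32 multivariate series,
# each with 7 channels (variables) and a lookback window of 512 steps.
x = torch.randn(32, 7, 512)

# Channel-independence: every variable is treated as its own univariate
# series, so the channel dimension is folded into the batch dimension.
x = x.reshape(32 * 7, 512)

# Patching: each series is split into overlapping patches (here of
# length 16 with stride 8), which play the role of the "words" fed to
# the transformer encoder.
patches = x.unfold(dimension=-1, size=16, step=8)

print(patches.shape)  # torch.Size([224, 63, 16]) -> (series, patches, patch length)
```

The general architecture of PatchTST is shown below.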