The Forecaster

Data science with a time series flavor


PatchTST: A Breakthrough in Time Series Forecasting

Marco Peixeiro
Published in The Forecaster
10 min read · Jun 20, 2023

Photo by Ray Hennessy on Unsplash

Transformer-based models have been successfully applied in many fields, such as natural language processing (think BERT or GPT models) and computer vision, to name a few.

However, when it comes to time series, state-of-the-art results have mostly been achieved by MLP (multilayer perceptron) models such as N-BEATS and N-HiTS. A recent paper even shows that simple linear models outperform complex transformer-based forecasting models on many benchmark datasets (see Zeng et al., 2022).

Still, a new transformer-based model has been proposed that achieves state-of-the-art results for long-term forecasting tasks: PatchTST.

PatchTST stands for patch time series transformer. It was first proposed in November 2022 by Nie, Nguyen et al. in their paper A Time Series is Worth 64 Words: Long-Term Forecasting with Transformers, which was published at ICLR 2023. Their proposed method achieved state-of-the-art results when compared to other transformer-based models.

In this article, we first explore the inner workings of PatchTST, using intuition and no equations. Then, we apply the model in a forecasting project and compare its performance to that of MLP models like N-BEATS and N-HiTS.

Of course, for more details about PatchTST, make sure to refer to the original paper.


Let’s get started!

Exploring PatchTST

As the name suggests, PatchTST makes use of patching and of the transformer architecture. It also relies on channel independence to handle multivariate time series: each channel (that is, each variable) is passed through the model as its own univariate series. The general architecture is shown below.
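To make patching and channel independence more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `make_patches` is hypothetical, the patch length of 16 and stride of 8 are assumed values chosen for illustration, and the end-of-series padding used in the paper is left out to keep things simple.

```python
import numpy as np


def make_patches(series: np.ndarray, patch_len: int = 16, stride: int = 8) -> np.ndarray:
    """Split each channel of a multivariate series into overlapping patches.

    series has shape (n_channels, seq_len).
    The result has shape (n_channels, n_patches, patch_len).
    """
    n_channels, seq_len = series.shape
    if seq_len < patch_len:
        raise ValueError("seq_len must be at least patch_len")

    # Number of full patches that fit without padding (an illustrative simplification).
    n_patches = (seq_len - patch_len) // stride + 1

    # Row i of idx holds the time indices covered by patch i.
    starts = np.arange(n_patches) * stride
    idx = starts[:, None] + np.arange(patch_len)[None, :]

    # Fancy indexing on the time axis keeps every channel separate.
    return series[:, idx]


# Toy input: 3 channels, 512 time steps each.
series = np.arange(3 * 512, dtype=float).reshape(3, 512)
patches = make_patches(series)
print(patches.shape)
```

Each channel ends up as its own sequence of patches, so the leading axis can be treated like a batch of univariate series. The patches, rather than the individual time steps, play the role of the input tokens, which shortens the sequence the transformer has to attend over.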


Written by Marco Peixeiro

Senior data scientist | Author | Instructor. I write hands-on articles with a focus on practical skills.

Responses (13)

Unimpressed, I think a linear model would perform better. MAE of 0.2 is probably higher than STD on that data

I wonder how it does against workhorses like AutoArima.

what is the n-time variable which you haven't defined