
Mar 1


Forecasting with Auto ARIMA in Python

ARIMA is a popular and powerful statistical time series forecasting method.

ARIMA Overview

ARIMA stands for (A)uto(R)egressive (I)ntegrated (M)oving (A)verage. Here is a breakdown of what each of these terms means:

Auto Regressive: The model predicts the next data point as a linear combination of the series' own previous values (lags). In ARIMA, the “AR” order is also referred to as ‘p’: the number of lagged values the model looks back at. It can be set manually by inspecting a PACF (partial autocorrelation function) plot.
Integrated: This term refers to differencing the values of a time series (subtracting the previous value from each value), which helps transform it into a stationary series. A stationary series has statistical properties, such as its mean and variance, that stay approximately constant over time, so there is no upward or downward trend. In ARIMA, the “I” order is also referred to as ‘d’: the number of times the series is differenced. An ADF (Augmented Dickey-Fuller) test can help determine whether a series is stationary.
Moving Average: Despite the name, this component is not a rolling-window average of the data. It models the next data point using the errors (residuals) of previous predictions. In ARIMA, this order is referred to as ‘q’: the number of lagged error terms. It can be set manually by inspecting an ACF (autocorrelation function) plot.
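To make the three orders concrete, here is a minimal NumPy sketch (not from the original post) that simulates an ARIMA(1,1,1) series. The once-differenced series combines an AR(1) term with coefficient `phi` and an MA(1) term with coefficient `theta` on the previous error; summing it once is the "integration". The function name `simulate_arima_111` and the coefficient values are illustrative assumptions.

```python
import numpy as np


def simulate_arima_111(n, phi, theta, seed=0):
    """Simulate a hypothetical ARIMA(1,1,1) series (p=1, d=1, q=1).

    The differenced series w_t = y_t - y_{t-1} follows an ARMA(1,1):
        w_t = phi * w_{t-1} + e_t + theta * e_{t-1}
    phi is the AR coefficient (one lag of w, so p=1) and theta is the
    MA coefficient (one lag of the error e, so q=1).
    """
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n)      # white-noise errors
    w = np.zeros(n)             # stationary ARMA(1,1) part
    for t in range(1, n):
        w[t] = phi * w[t - 1] + e[t] + theta * e[t - 1]
    # Integrate once (d=1): the observed series is the running sum of w
    y = np.cumsum(w)
    return y, w


y, w = simulate_arima_111(n=500, phi=0.5, theta=0.3)
# Differencing once undoes the integration and recovers the ARMA part
print(np.allclose(np.diff(y), w[1:]))  # True
```

Differencing with `np.diff` is the "I" step: an ARIMA model with d=1 fits its AR and MA terms to the differenced series rather than to the raw values.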

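For seasonal data, you can use a version of ARIMA called SARIMA (seasonal ARIMA), which adds seasonal autoregressive, differencing, and moving average terms that repeat every m periods (for example, m = 12 for monthly data with a yearly cycle).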
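Seasonal differencing is the seasonal counterpart of the "I" step: instead of subtracting the previous value, you subtract the value one season earlier. A minimal NumPy sketch on a hypothetical monthly series (the helper `seasonal_difference` is illustrative, not a pmdarima function):

```python
import numpy as np


def seasonal_difference(y, m):
    """Return y_t - y_{t-m}, the seasonal difference for season length m."""
    y = np.asarray(y, dtype=float)
    return y[m:] - y[:-m]


# Hypothetical monthly series: a repeating 12-month pattern plus a linear trend
months = np.arange(48)
pattern = np.sin(2 * np.pi * months / 12)
y = pattern + 0.5 * months

d12 = seasonal_difference(y, m=12)
# The seasonal pattern cancels, leaving only the trend step 0.5 * 12 = 6
print(np.allclose(d12, 6.0))  # True
```

On this toy series a single lag-12 difference removes both the repeating pattern and the trend; on real data it typically leaves some noise behind.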

Wouldn’t it be convenient if a function could estimate these parameters for you automatically, so you could get a forecast quickly and easily? Well, there is one! Those of you who have used R for forecasting may know the auto.arima function in the forecast package. Python now has a similar function thanks to the pmdarima library. To keep things simple, this post uses that library and its auto_arima function.
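Roughly speaking, pmdarima's `auto_arima` fits candidate orders and keeps the one with the best information criterion (AIC by default), using a stepwise search rather than trying every combination. The sketch below is a deliberately simplified illustration of that idea, not pmdarima's implementation: it only searches the AR order `p`, fits by ordinary least squares instead of maximum likelihood, and scores each candidate with the Gaussian AIC approximation n·log(σ²) + 2k. The helper names and the simulated data are hypothetical.

```python
import numpy as np


def fit_ar_ols(y, p, max_p):
    """Fit an AR(p) model with an intercept by ordinary least squares.

    Every candidate order uses the same targets y[max_p:], so the AIC
    values of different orders are computed on the same sample.
    Returns the residual variance and the number of fitted coefficients.
    """
    n = len(y)
    target = y[max_p:]
    # Design matrix: an intercept column plus lags 1..p of the series
    cols = [np.ones(n - max_p)]
    for lag in range(1, p + 1):
        cols.append(y[max_p - lag:n - lag])
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return np.mean(resid ** 2), X.shape[1]


def select_ar_order(y, max_p=5):
    """Return the AR order with the lowest (Gaussian) AIC, plus all scores."""
    y = np.asarray(y, dtype=float)
    n_eff = len(y) - max_p
    aic = {}
    for p in range(max_p + 1):
        sigma2, k = fit_ar_ols(y, p, max_p)
        aic[p] = n_eff * np.log(sigma2) + 2 * k
    return min(aic, key=aic.get), aic


# Hypothetical AR(2) data: y_t = 0.6*y_{t-1} - 0.3*y_{t-2} + noise
rng = np.random.default_rng(42)
n = 2000
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + e[t]

best_p, scores = select_ar_order(y, max_p=5)
print(best_p)  # order with the lowest AIC
```

The 2k penalty is what stops the search from always preferring the largest order: an extra lag is kept only if it lowers the residual variance enough to pay for the added coefficient.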

Code Example

Library Imports

# Data library
import numpy as np
# ARIMA library
import pmdarima as pm
# Visualization library
import matplotlib.pyplot as plt

Load Dataset

# Load some sample data bundled with the pmdarima library
training_data = pm.datasets.load_wineind()
# Visualize the sample data
time = np.arange(training_data.shape[0])
plt.plot(time, training_data, c='black')
# Set axis labels
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()

Here we can see our sample data over time.
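The wine data is monthly, which is why the model below uses a season length of m=12. One way to check a season length is the sample autocorrelation function, the quantity an ACF plot displays: a strong peak at lag 12 suggests a yearly cycle. A minimal NumPy sketch on a hypothetical noisy 12-period series (the `sample_acf` helper is illustrative; in practice, statsmodels' `plot_acf` draws this plot for you):

```python
import numpy as np


def sample_acf(y, max_lag):
    """Sample autocorrelation at lags 0..max_lag (what an ACF plot shows)."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    denom = np.sum(y * y)
    return np.array([np.sum(y[lag:] * y[:len(y) - lag]) / denom
                     for lag in range(max_lag + 1)])


# Hypothetical monthly series: a 12-month cycle plus a little noise
rng = np.random.default_rng(0)
t = np.arange(180)
y = 10 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.5, size=t.size)

acf = sample_acf(y, max_lag=18)
# Lag (excluding lag 0) with the strongest positive autocorrelation
peak_lag = int(np.argmax(acf[1:]) + 1)
print(peak_lag)
```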

Create an ARIMA model

Now let’s finally create an ARIMA model using auto ARIMA in Python with the pmdarima library.

# Fit the ARIMA model with automatically selected parameters, including
# a seasonal component with a 12-period (monthly) season length
model = pm.auto_arima(training_data, seasonal=True, m=12)
# Set the number of points to forecast as h
h = 50
# Generate a forecast of h steps past the end of the training data
forecast = model.predict(n_periods=h)

Visualize Forecast

Now let’s look at our forecast along with the training data.

# Visualize the forecast together with the training data
time = np.arange(training_data.shape[0] + h)
# Slice the x-axis to plot the training data in black
plt.plot(time[:training_data.shape[0]], training_data, c='black')
# Slice the x-axis to plot the forecast, after the training data, in red
plt.plot(time[training_data.shape[0]:], forecast, c='red')
# Set axis labels
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()

Fig. 2 (Plot of training data with forecast)

There we have it! With just a few lines of code, we now have a forecast that takes previous data points, seasonality, and trend into account.

Summary

Time series forecasting is a common task for data scientists, who use statistical or machine learning models to generate forecasts. Auto ARIMA is a quick way to produce a decent baseline forecast with a few lines of code. Previously, auto ARIMA was primarily done with R’s auto.arima function, but thanks to the pmdarima library we can now generate quick auto ARIMA forecasts in Python as well. ARIMA is just one of many ways to forecast data. To improve accuracy, readers should look into other forecasting methods. Some common ones include regression models such as Generalized Linear Models and Generalized Additive Models, LSTM models for those who want to explore deep learning approaches, and tree-based methods such as XGBoost.

Bio

Frankie Cancino is a Data Scientist for Mercedes-Benz Research & Development, living in the San Francisco Bay Area.
