Poisson Regression and Generalised Linear Models

A theoretical introduction to Poisson Regression and Generalised Linear Models

Published in

Towards Data Science

6 min read · Nov 15, 2021

Note: Throughout this article I loosely refer to E[Y] as the target output. Whenever I write E[Y], I implicitly mean E[Y|X], as that is the correct notation! Linked here is a stats exchange thread that explains this difference.

Linear Regression is the first algorithm most Data Scientists begin their journey with. It is an intuitive model for continuous data that is easy to implement and visualise. The second algorithm most beginner Data Scientists learn is Logistic Regression, where the model has a binary output. Most people see these two algorithms as completely separate, when in fact they belong to the same family of models, named Generalised Linear Models (GLMs). In this article we will gain an intuition for GLMs through an example scenario using the Poisson distribution.

Linear Regression Basics and Limitations

Linear Regression is a model used to fit a line or hyperplane to a dataset where the output is continuous and the residuals are normally distributed. This is written mathematically as:

$$E(Y) = X\beta$$

where E(Y) is the mean response of the target variable, X is a matrix of the predictor variables and β is the vector of unknown linear coefficients, which are adjusted during training to produce the best model.
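To make the linear model concrete, here is a minimal sketch of fitting it by ordinary least squares. It assumes the simplest case of a single predictor plus an intercept, and the function name `fit_simple_linear_regression` and the toy data are made up for illustration rather than taken from any library.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data lying exactly on the line y = 1 + 2x (an assumption for illustration).
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope)
```

With several predictors the same idea applies, but the coefficients are usually found with a linear algebra routine rather than these closed-form sums.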

Linear Regression is used in many fields, such as Epidemiology, Finance and Economics. However, despite its wide use, it has some flaws that make its predictions unsuitable for certain applications. Imagine you are a phone operator and want to predict how many calls you will receive in a day. Do you think Linear Regression would be a suitable model? The answer is NO, for the following reasons:

The number of calls has to be greater than or equal to 0, whereas in Linear Regression the output can be negative as well as positive.
The number of calls only takes integer values, but Linear Regression can output fractional values.
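Both problems can be seen in a small sketch. The call counts below are invented for illustration, as is the helper `fit_line`; the point is only that a least-squares line fitted to count data can forecast a value that is both negative and fractional.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: day index and number of calls received that day.
days = [0, 1, 2, 3, 4, 5]
calls = [12, 9, 7, 4, 3, 1]

intercept, slope = fit_line(days, calls)

# Forecast for the next day: roughly -1.6 calls, which is not a valid count.
forecast = intercept + slope * 6
print(forecast)
```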

These flaws, and many others, require us to use another regression algorithm to model the expected number of calls.
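As a rough illustration of such an alternative, the sketch below fits a Poisson regression with a log link, so that the expected number of calls is exp(b0 + b1 * x) and is therefore always positive. This is a minimal sketch under stated assumptions, not a definitive implementation: the data are the same invented call counts as before, the function name `fit_poisson_regression` is hypothetical, and the coefficients are found with plain Newton-Raphson steps on the Poisson log-likelihood, with no safeguards such as step halving.

```python
import math


def fit_poisson_regression(xs, ys, iterations=25):
    """Fit log(E[Y|x]) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    # Start from an intercept-only model: the log of the mean count.
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood (the score).
        g0 = sum(y - mu for y, mu in zip(ys, mus))
        g1 = sum((y - mu) * x for x, y, mu in zip(xs, ys, mus))
        # Fisher information matrix entries.
        h00 = sum(mus)
        h01 = sum(mu * x for x, mu in zip(xs, mus))
        h11 = sum(mu * x * x for x, mu in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: day index and number of calls received that day.
days = [0, 1, 2, 3, 4, 5]
calls = [12, 9, 7, 4, 3, 1]

b0, b1 = fit_poisson_regression(days, calls)

# The expected number of calls on the next day is positive by construction.
expected_calls = math.exp(b0 + b1 * 6)
print(expected_calls)
```

Note that the model predicts the mean of a Poisson distribution, which may still be fractional; the integer-valued counts come from the distribution placed around that mean.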


Written by Egor Howell
