Calculating the Median in PostgreSQL: A Comprehensive Guide

4 min readJun 18, 2024

Calculating the median, a measure of central tendency, is a common requirement in data analysis. PostgreSQL, a powerful open-source relational database, provides a straightforward way to compute the median using its built-in window functions. In this article, we’ll explore how to calculate the median in PostgreSQL, delve into the concept of percentiles, and provide a custom version for computing the median.

Understanding Percentiles

Before diving into the implementation, it’s essential to understand the concept of percentiles. A percentile is a statistical measure that indicates the value below which a given percentage of observations in a group falls. For example, the 50th percentile (median) is the value below which 50% of the observations can be found.

In general terms:
- The 25th percentile (Q1) is the value below which 25% of the data lies.
- The 50th percentile (Q2 or median) is the value below which 50% of the data lies.
- The 75th percentile (Q3) is the value below which 75% of the data lies.

Percentiles help in understanding the distribution of data by dividing it into parts.

Calculating the Median in PostgreSQL

PostgreSQL offers the `percentile_cont` function to calculate percentiles, including the median. This function computes a continuous percentile, which is essential for finding the median accurately.

Let’s go through the steps to calculate the median in PostgreSQL.

1. Creating a Sample Table

First, we’ll create a sample table and insert some data.

CREATE TABLE
sales (region VARCHAR(50), amount NUMERIC);

INSERT INTO
sales (region, amount)
VALUES
('North', 100),
('North', 200),
('North', 300),
('South', 400),
('South', 500),
('South', 600);

2. Calculating the Median

To calculate the median of the `amount` column in the `sales` table, we use the `percentile_cont` function.

SELECT
percentile_cont(0.5) WITHIN GROUP (
ORDER BY
amount
) AS median_amount
FROM
sales;

In this query:
- `percentile_cont(0.5)` computes the 50th percentile, which is the median.
- `WITHIN GROUP (ORDER BY amount)` specifies the order of the values before computing the percentile.

The result of this query will be the median value of the `amount` column across all rows in the `sales` table.

Calculating Median by Group

Often, you’ll need to calculate the median for different groups within your data. For example, let’s find the median sales amount for each region.

SELECT
region,
percentile_cont(0.5) WITHIN GROUP (
ORDER BY
amount
) AS median_amount
FROM
sales
GROUP BY
region;

In this query:
- We group the data by the `region` column.
- For each group, we compute the median of the `amount` column.

The result will give us the median sales amount for each region.

Custom Version for Calculating the Median

While `percentile_cont` is convenient, there may be scenarios where you want a custom method for calculating the median. Here’s how you can create a custom version using common table expressions (CTEs) and window functions:

WITH
ordered_sales AS (
SELECT
amount,
ROW_NUMBER() OVER (
ORDER BY
amount
) AS row_num,
COUNT(*) OVER () AS total_count
FROM
sales
)
SELECT
AVG(amount) AS median_amount
FROM
ordered_sales
WHERE
row_num IN ((total_count + 1) / 2, (total_count + 2) / 2);

In this custom version:
- We create a CTE called `ordered_sales` that assigns a row number to each `amount` and counts the total number of rows.
- We then select the average of the amounts where the row number is in the middle of the dataset. This approach ensures that we handle both even and odd numbers of rows correctly.

For grouped data, the custom version can be adapted as follows:

WITH
ordered_sales AS (
SELECT
region,
amount,
ROW_NUMBER() OVER (
PARTITION BY
region
ORDER BY
amount
) AS row_num,
COUNT(*) OVER (
PARTITION BY
region
) AS total_count
FROM
sales
)
SELECT
region,
AVG(amount) AS median_amount
FROM
ordered_sales
WHERE
row_num IN ((total_count + 1) / 2, (total_count + 2) / 2)
GROUP BY
region;

In this adaptation:
- We partition the data by `region` in the window functions to calculate row numbers and total counts within each region.
- The final query groups by `region` and calculates the median for each group.

Why Use Percentile Functions?

Using percentile functions like `percentile_cont` offers several advantages:
- Accuracy: `percentile_cont` calculates the percentile value accurately, even when dealing with continuous data.
- Simplicity: The function simplifies the process of percentile calculation, avoiding complex and manual calculations.
- Flexibility: You can compute any percentile (not just the median) by changing the argument. For example, `percentile_cont(0.25)` calculates the 25th percentile.

Conclusion

Calculating the median in PostgreSQL is straightforward with the `percentile_cont` function, and creating a custom version provides additional flexibility. Understanding and utilizing percentiles can provide valuable insights into your data’s distribution. Whether you’re analyzing sales data or any other dataset, PostgreSQL’s powerful window functions make it easy to perform these calculations efficiently.

By leveraging these functions and custom methods, you can enhance your data analysis capabilities and make informed decisions based on accurate statistical measures. Happy querying!

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

No responses yet

Write a response