Pandas vs Polars vs Rapids: What’s the Most Convenient for a Laptop?

A benchmark of three data packages

Published in Level Up Coding

Jan 10, 2024

In the ever-evolving realm of data analysis, choosing the right library for data processing has become increasingly important.

In this dynamic landscape, Pandas, Polars, and Rapids emerge as formidable contenders, each armed with its own arsenal of features and capabilities. Picture me navigating the data world aboard a laptop, on a quest to determine which of Pandas, Polars, and Rapids proves to be the most reliable companion. My mission? To put these libraries to the test through a series of common operations on datasets of various sizes, shedding light on their performance and unique capabilities.
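Timing such operations fairly is easy to get wrong, so it helps to fix a protocol up front. The following is a minimal sketch of a generic harness, not the code used for the measurements in this article: the name `benchmark`, the discarded warm-up run, and the median-of-N policy are my own illustrative assumptions. The warm-up matters most for cudf, where the first call can also pay one-off start-up costs such as initializing the GPU context.

```python
import statistics
import time
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def benchmark(
    operation: Callable[[T], object],
    make_data: Callable[[int], T],
    sizes: List[int],
    repeats: int = 5,
    warmup: int = 1,
) -> Dict[int, float]:
    """Return the median wall-clock time (in seconds) of `operation` for each dataset size."""
    results: Dict[int, float] = {}
    for n in sizes:
        data = make_data(n)  # build the input once, outside the timed region
        for _ in range(warmup):  # untimed runs absorb one-off start-up costs
            operation(data)
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            operation(data)
            timings.append(time.perf_counter() - start)
        results[n] = statistics.median(timings)  # median is robust to outliers
    return results


if __name__ == "__main__":
    # Hypothetical workload: summing a list of integers of growing length.
    times = benchmark(sum, lambda n: list(range(n)), sizes=[10_000, 100_000, 1_000_000])
    for n, t in times.items():
        print(f"{n:>9,} rows: {t * 1000:.3f} ms")
```

Because the input is built outside the timed region, only the operation itself is measured; in a real comparison, `operation` would wrap a Pandas, Polars, or cudf call applied to the same data.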

In particular, I compare the following versions:

Pandas v2.1.0
Pandas v2.1.4
Polars v0.18.00
Polars v0.20.02
cudf-cu11 v23.12.1 (a.k.a. Rapids)
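Since results can change noticeably between releases, it is worth recording the exact versions alongside every measurement. Here is a small sketch using only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the distribution names are assumed to match the ones listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they appear in the list above.
PACKAGES = ["pandas", "polars", "cudf-cu11"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```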

Join me on a journey through the heart of data analysis, where Pandas, Polars, and Rapids engage in a duel for efficiency.

Who will claim the podium? Let’s explore together, unravelling the challenges and triumphs of these powerful libraries in the fascinating world of data science.

Package Introduction

Before describing the methodology, I briefly introduce each package.

Pandas is a widely used Python library for data manipulation and analysis. It offers a rich set of functions for working with structured (tabular) data. However, it can become slow on large datasets because most of its operations run on a single CPU core.
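To make the comparison concrete, here is a minimal example of the kind of operation being discussed: filtering rows and computing per-group aggregates. The column names and values are hypothetical, chosen only for illustration. In Pandas, each step is evaluated eagerly, one after the other.

```python
import pandas as pd

# Hypothetical sample data; a benchmark would instead load files of several sizes
# (for example with pd.read_csv).
df = pd.DataFrame(
    {
        "city": ["Rome", "Milan", "Rome", "Turin", "Milan", "Rome"],
        "sales": [120, 80, 200, 50, 130, 90],
    }
)

# Filter rows, then aggregate per group: two very common operations.
large_sales = df[df["sales"] > 60]
summary = (
    large_sales.groupby("city")
    .agg(total=("sales", "sum"), average=("sales", "mean"))  # named aggregation
    .reset_index()  # turn the group keys back into a regular column
)
print(summary)
```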

Polars is a DataFrame library written in Rust and exposed to Python. It achieves higher performance than Pandas by leveraging Rust’s speed and memory safety. With optimized memory allocation, columnar data storage, and parallel processing across CPU cores, Polars handles large datasets efficiently.

Rapids is an open-source suite of libraries developed by NVIDIA for data processing on GPUs. It leverages GPU acceleration to run operations on data arrays in parallel, which can provide significant speedups over Pandas. Its DataFrame library is called cudf, and in this article I refer to Rapids by that name.
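Because cudf deliberately mirrors much of the Pandas API, the same code can often run on either library. The sketch below assumes that pattern: it imports cudf when available and otherwise falls back to Pandas, so it also runs on a laptop without an NVIDIA GPU. The alias `xdf` and the fallback logic are illustrative; note that only `ImportError` is caught, so a cudf installation without a usable GPU may still fail at import time.

```python
try:
    import cudf as xdf  # GPU DataFrames (requires an NVIDIA GPU and CUDA)
    BACKEND = "cudf"
except ImportError:
    import pandas as xdf  # CPU fallback with a largely compatible API
    BACKEND = "pandas"

# Same hypothetical data as in the previous examples.
df = xdf.DataFrame(
    {
        "city": ["Rome", "Milan", "Rome", "Turin", "Milan", "Rome"],
        "sales": [120, 80, 200, 50, 130, 90],
    }
)

# Identical pandas-style code; it runs on the GPU when cudf was imported.
large_sales = df[df["sales"] > 60]
summary = large_sales.groupby("city").agg({"sales": "sum"}).sort_index()
print(BACKEND)
print(summary)
```

The explicit `sort_index()` avoids relying on each library's default ordering of the groups.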

Written by Davide Gazzè - Ph.D.
