How to iterate over rows in a DataFrame in Pandas: Answer: DON’T*!

Narender Kumar
3 min read · Aug 21, 2024

As a Pandas enthusiast, I often see questions from new users asking how best to iterate over a DataFrame to perform a specific task. Many assume that iteration is the go-to solution. In practice, an explicit loop over rows is often much slower than the built-in alternatives, and it tends to be longer and harder to read. In this article, we will explore the alternatives Pandas provides for writing more idiomatic, performant, and readable code.

The Perils of Iteration

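Iteration in Pandas is often considered an anti-pattern. The Pandas documentation itself warns that iterating through pandas objects is generally slow and often unnecessary, which applies to the methods with “iter” in their names, such as iterrows() and itertuples(). The reason is that a Python-level loop handles one row at a time. iterrows() in particular also builds a new Series object for every row. This per-row overhead grows with the number of rows, so iterating over a DataFrame can be significantly slower than vectorized alternatives on large datasets.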
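To make the cost concrete, here is a minimal sketch that computes the same per-row product in two ways: with an iterrows() loop and with a single column-wise multiplication. The `price` and `quantity` columns, the row count, and the helper names `total_with_iterrows` and `total_vectorized` are hypothetical, chosen only for illustration. The timings it prints will vary with your machine and pandas version, so no specific numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical example data: 10,000 rows with two numeric columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.uniform(1.0, 100.0, size=10_000),
    "quantity": rng.integers(1, 10, size=10_000),
})

def total_with_iterrows(frame: pd.DataFrame) -> pd.Series:
    """Row by row: iterrows() yields a freshly built Series for every row."""
    totals = []
    for _, row in frame.iterrows():
        totals.append(row["price"] * row["quantity"])
    return pd.Series(totals, index=frame.index)

def total_vectorized(frame: pd.DataFrame) -> pd.Series:
    """Column at a time: one multiplication over whole NumPy-backed columns."""
    return frame["price"] * frame["quantity"]

loop_result = total_with_iterrows(df)
vec_result = total_vectorized(df)

# Both approaches compute the same values...
assert np.allclose(loop_result.to_numpy(), vec_result.to_numpy())

# ...but their run times differ; exact numbers depend on the environment.
loop_time = timeit.timeit(lambda: total_with_iterrows(df), number=1)
vec_time = timeit.timeit(lambda: total_vectorized(df), number=1)
print(f"iterrows:   {loop_time:.4f} s")
print(f"vectorized: {vec_time:.4f} s")
```

The results are identical, so the only thing the loop adds is per-row Python overhead.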

Embracing Vectorization

Pandas, built on top of NumPy, is designed to leverage vectorized operations, which perform computations on entire arrays or Series at once rather than element by element. Because the per-element work then runs in compiled NumPy code instead of the Python interpreter, this approach is generally much faster than iterating over the data. Here’s an example of a vectorized operation:

import pandas as pd
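A minimal sketch of vectorized operations, assuming a small illustrative DataFrame. The `price` and `quantity` columns, the "bulk order" threshold of 5, and the 10% discount are hypothetical values, not taken from the original article. Each line below applies one expression to whole columns, with no explicit loop over rows.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; column names and values are illustrative only.
df = pd.DataFrame({
    "price": [10.0, 25.5, 4.0, 12.0],
    "quantity": [3, 1, 10, 2],
})

# Arithmetic on whole columns: pandas multiplies the aligned columns at once.
df["total"] = df["price"] * df["quantity"]

# Comparisons produce a boolean mask for every row in one step.
df["bulk_order"] = df["quantity"] >= 5

# np.where picks between two column expressions row-wise, still without a loop.
df["discounted_total"] = np.where(df["bulk_order"], df["total"] * 0.9, df["total"])

print(df)
```

A conditional like the one above is where loops are most tempting. Expressing it as a boolean mask plus `np.where` keeps the whole computation vectorized.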
