In this short guide, I'll show you how to display progress bars for the most common operations in Pandas.

To start, here is the basic usage for showing a progress bar in Pandas:

  • simple Pandas operations
  • groupby and other operations

import pandas as pd
import numpy as np
from tqdm import tqdm

df = pd.DataFrame(np.random.randint(0, 100, (1000000, 100)))

tqdm.pandas(desc="power DataFrame 1M to 100 random int!")

df.progress_apply(lambda x: x**2)
df.groupby(0).progress_apply(lambda x: x**2)

In the next section, I’ll review the steps and details to apply the above syntax in real examples.

Step 1: Install and Update TQDM

First, let's see how to install and update the library. To install tqdm you can use the command below:

pip install tqdm

Many new features and bug fixes come with the latest version. It's always a good idea to stay up to date with most Python libraries:

pip install tqdm -U
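
To confirm which version ended up installed, a quick check from Python can help. This is a minimal sketch that relies on the __version__ attribute exposed by the tqdm package; the variable name is chosen here for illustration:

```python
import tqdm

# Print the installed tqdm version, e.g. to compare it before and after an upgrade.
installed_version = tqdm.__version__
print(installed_version)
```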

tqdm shows a progress bar for loops and other operations. All you need to do is wrap any iterable with tqdm(iterable).

A fun fact about tqdm that you might not know:

  • the name derives from the Arabic taqaddum (تقدّم), which means "progress"
  • it is also an abbreviation of the Spanish "te quiero demasiado", meaning "I love you so much"

Step 2: Show Progress bar on loops

As we saw in Step 1, all you need to do is wrap an iterable with tqdm.

Let's check a small program which calculates the square of each number from 0 up to (but not including) 2000000.

In order to follow the progress of this operation we can use the following syntax:

import numpy as np
from tqdm import tqdm

myrange = np.arange(2000000)
i_2 = []

for i in tqdm(myrange):
    i_2.append(i**2)

The result is:

100%|██████████| 2000000/2000000 [00:01<00:00, 1624316.35it/s]
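
The same wrapping works anywhere an iterable is consumed, not only in a for statement. As a small sketch (the variable names and the smaller range are chosen here for illustration), a list comprehension and tqdm's trange shortcut should give the same result:

```python
from tqdm import tqdm, trange

n = 1000  # smaller than the example above so it finishes instantly

# Wrap the iterable inside a list comprehension.
squares = [i ** 2 for i in tqdm(range(n))]

# trange(n) is shorthand for tqdm(range(n)).
squares_again = [i ** 2 for i in trange(n)]
```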

Step 3: Change progress bar size and style

If you'd like to change the way tqdm shows the progress bar, you can use the following options:

  • use the import from tqdm.auto import tqdm to show a different style
  • add the parameter bar_format='{desc:<5.5}{percentage:3.0f}%|{bar:50}{r_bar}' or bar_format='{l_bar}{bar:10}{r_bar}{bar:-10b}'

Let's see both in action. First, changing the style to a nicer one (at least in JupyterLab):

import numpy as np
from tqdm.auto import tqdm

myrange = np.arange(2000000)
i_2 = []

for i in tqdm(myrange):
    i_2.append(i**2)

Change the size of the progress bar:

import numpy as np
from tqdm import tqdm

myrange = np.arange(2000000)
i_2 = []

for i in tqdm(myrange, bar_format='{desc:<5.5}{percentage:3.0f}%|{bar:50}{r_bar}'):
    i_2.append(i**2)
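
The second bar_format option from the list above can be tried the same way. The sketch below uses a smaller range and, purely for illustration, sends the bar to an in-memory buffer through tqdm's file parameter so the final rendered line can be inspected as text:

```python
import io

import numpy as np
from tqdm import tqdm

buffer = io.StringIO()  # illustration only; omit file= to print to the console
myrange = np.arange(1000)
i_2 = []

# {bar:10} draws a bar 10 characters wide; {bar:-10b} appends a blank ("b") bar
# whose width is derived from whatever space remains.
for i in tqdm(myrange, bar_format='{l_bar}{bar:10}{r_bar}{bar:-10b}', file=buffer):
    i_2.append(i ** 2)

# The buffer holds every refresh separated by carriage returns; keep the last one.
last_line = buffer.getvalue().strip().split('\r')[-1]
print(last_line)
```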

Step 4: Progress bar during Pandas operations

In this step we will see how to show the progress for the most common Pandas operations.

Adding a progress bar to Pandas operations usually has little impact on performance, but if in doubt it's worth measuring on your own data.
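
One way to check the overhead is to time apply against progress_apply on the same data. This is only a rough sketch: the small frame, the variable names, and the in-memory buffer used to keep the console quiet are assumptions made for illustration, and a single timing run is noisy:

```python
import io
import time

import numpy as np
import pandas as pd
from tqdm import tqdm

# Small hypothetical frame so the comparison runs quickly.
df_small = pd.DataFrame(np.random.randint(0, 100, (10_000, 20)))

# Register progress_apply; keyword arguments are forwarded to tqdm,
# so file= redirects the bar to a buffer while timing.
tqdm.pandas(desc="square", file=io.StringIO())

start = time.perf_counter()
plain = df_small.apply(lambda col: col ** 2)
plain_seconds = time.perf_counter() - start

start = time.perf_counter()
with_bar = df_small.progress_apply(lambda col: col ** 2)
bar_seconds = time.perf_counter() - start

print(f"apply: {plain_seconds:.4f}s, progress_apply: {bar_seconds:.4f}s")
```

Both calls should return the same values, so any difference in the timings reflects the bar itself.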

You can find simple examples of a Pandas progress bar below:

Pandas iterrows and progress bar

The simplest usage of tqdm in Pandas is to combine a loop with iterrows(). Because iterrows() returns a generator with no known length, you need to provide the total number of items, which you can get from df.shape[0]:

from time import sleep

from tqdm import tqdm

for index, row in tqdm(df.iterrows(), total=df.shape[0]):
    sleep(0)  # placeholder for the real per-row work
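
For a version that can be run on its own, here is a minimal sketch with a small hypothetical DataFrame (the column names and values are made up for illustration) that does some actual work per row:

```python
import numpy as np
import pandas as pd
from tqdm import tqdm

# Hypothetical 4 x 3 frame holding the values 0..11.
small_df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["a", "b", "c"])

row_sums = []
# len(small_df) is equivalent to small_df.shape[0] for the total.
for index, row in tqdm(small_df.iterrows(), total=len(small_df)):
    row_sums.append(row["a"] + row["b"] + row["c"])
```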

Pandas progress bar for lambda

progress_apply is a method that tqdm adds to Pandas objects once tqdm.pandas() has been called. It works like apply and can accept a lambda, for example to raise each value to the power of 2:

tqdm.pandas(desc="power DataFrame 1M x 100 of random int!")

df.progress_apply(lambda x: x**2)
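
The lambda above receives one column at a time. As a sketch of the row-wise case, assuming progress_apply forwards axis to apply in the usual way, a small hypothetical frame can be summed per row like this:

```python
import numpy as np
import pandas as pd
from tqdm import tqdm

# Hypothetical 4 x 3 frame holding the values 0..11.
small_df = pd.DataFrame(np.arange(12).reshape(4, 3))

tqdm.pandas(desc="row sums")

# axis=1 passes each row to the lambda instead of each column.
row_sums = small_df.progress_apply(lambda row: row.sum(), axis=1)
```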

Pandas progress bar for function with progress_map

tqdm offers the method progress_map, which can be used to apply functions to a Pandas DataFrame or Series while showing a progress bar:

from tqdm import tqdm

def cube(x):
    return x ** 3

tqdm.pandas()

df['cube'] = df[0].progress_map(cube)

Pandas progress bar for dictionary map

If you'd like to map the values of a dictionary to a Pandas column and follow the progress with a bar, you can use the following syntax:

mapping = {15:'a', 9:'b'}

df[0].progress_map(lambda x: mapping.get(x))
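
Values that are missing from the dictionary deserve a quick look. The sketch below uses a small hypothetical Series (the sample values and the 'other' label are assumptions for illustration) to show what mapping.get does for them, with and without a default:

```python
import pandas as pd
from tqdm import tqdm

tqdm.pandas()

mapping = {15: 'a', 9: 'b'}
values = pd.Series([15, 9, 3])  # hypothetical sample; 3 has no entry in mapping

# mapping.get(x) returns None for keys that are missing from the dictionary.
mapped = values.progress_map(lambda x: mapping.get(x))

# Passing a default to get() fills those gaps with a chosen label instead.
mapped_with_default = values.progress_map(lambda x: mapping.get(x, 'other'))
```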

Pandas and progress_aggregate

Finally, let's check how to show a progress bar on aggregation functions like:

  • sum
  • count
  • mean, etc.

from tqdm import tqdm
tqdm.pandas()

df.groupby(0).progress_aggregate(sum)
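
The same call should also accept a custom function. As a minimal sketch (the frame, the column names, and the value_range helper are hypothetical), the result can be compared against the plain aggregate to confirm that the bar does not change the output:

```python
import pandas as pd
from tqdm import tqdm

tqdm.pandas(desc="value range per group")

# Hypothetical data: two groups identified by the "key" column.
small_df = pd.DataFrame({
    "key": [1, 1, 2, 2, 2],
    "value": [10, 20, 5, 7, 9],
})

def value_range(series):
    """Difference between the largest and smallest value in a group."""
    return series.max() - series.min()

with_bar = small_df.groupby("key").progress_aggregate(value_range)
plain = small_df.groupby("key").aggregate(value_range)
```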

Resources