In this short guide, I'll show you how to display progress bars for the most common operations in Pandas.
To start, here is the basic usage for showing a progress bar in Pandas with:
- simple Pandas operations
- groupby
- and other operations
import pandas as pd
import numpy as np
from tqdm import tqdm
df = pd.DataFrame(np.random.randint(0, 100, (1000000, 100)))
tqdm.pandas(desc="power DataFrame 1M to 100 random int!")
df.progress_apply(lambda x: x**2)
df.groupby(0).progress_apply(lambda x: x**2)
In the next section, I’ll review the steps and details to apply the above syntax in real examples.
Step 1: Install and Update TQDM
First, let's see how to install and update the library. To install tqdm in Python, you can use the command below:
pip install tqdm
Many new features and bug fixes come with the latest version. It's always a good idea to stay up to date with most Python libraries:
pip install tqdm -U
tqdm shows a progress bar for loops and other operations. All you need to do is wrap any iterable with tqdm(iterable).
A fun fact about tqdm which you probably don't know:
- the name derives from the Arabic taqaddum (تقدّم), which means "progress"
- it is also an abbreviation of the Spanish "te quiero demasiado", meaning "I love you so much"
Step 2: Show Progress bar on loops
As we saw in Step 1, all you need to do is wrap an iterable with tqdm.
Let's check a small program which calculates the square of each of the 2000000 integers starting from 0.
In order to follow the progress of this operation we can use the following syntax:
import numpy as np
from tqdm import tqdm
myrange = np.arange(2000000)
i_2 = []
for i in tqdm(myrange):
    i_2.append(i**2)
The result is:
100%|██████████| 2000000/2000000 [00:01<00:00, 1624316.35it/s]
Step 3: Change progress bar size and style
If you'd like to change the way tqdm shows the progress bar, you can use the following options:
- import from tqdm.auto import tqdm to show a different style
- add the parameter bar_format='{desc:<5.5}{percentage:3.0f}%|{bar:50}{r_bar}' or bar_format='{l_bar}{bar:10}{r_bar}{bar:-10b}'
Let's see both in action. First, changing the style to a nicer one (at least in JupyterLab):
import numpy as np
from tqdm.auto import tqdm
myrange = np.arange(2000000)
i_2 = []
for i in tqdm(myrange):
    i_2.append(i**2)
Change the size of the progress bar:
import numpy as np
from tqdm import tqdm
myrange = np.arange(2000000)
i_2 = []
for i in tqdm(myrange, bar_format='{desc:<5.5}{percentage:3.0f}%|{bar:50}{r_bar}'):
    i_2.append(i**2)
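The second bar_format string listed above can be tried the same way. This is a minimal sketch on a smaller range; the 1000-element range and the name squares are illustrative choices, not part of the original example.

```python
from tqdm import tqdm

squares = []
# {bar:10} requests a 10-character bar; the format string is the second option listed above
for i in tqdm(range(1000), bar_format='{l_bar}{bar:10}{r_bar}{bar:-10b}'):
    squares.append(i ** 2)
```

Swapping the number in {bar:10} for a larger one, such as the 50 used earlier, should give a wider bar.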
Step 4: Progress bar during Pandas operations
In this step we will see how to show the progress for the most common Pandas operations.
Adding a progress bar to Pandas operations should add little overhead, but if in doubt, it's better to time the operation with and without the bar.
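One way to check the overhead is to time the same operation with apply and with progress_apply. The sketch below assumes a small illustrative DataFrame (df_small) and a hypothetical helper named timed; the timings will vary by machine, so treat them as a rough indication only.

```python
import time

import numpy as np
import pandas as pd
from tqdm import tqdm

# Small frame so the check runs quickly; the size is illustrative
df_small = pd.DataFrame(np.random.randint(0, 100, (10_000, 10)))

tqdm.pandas(desc="timing check")


def timed(fn):
    """Hypothetical helper: run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


plain, t_plain = timed(lambda: df_small.apply(lambda col: col ** 2))
tracked, t_tracked = timed(lambda: df_small.progress_apply(lambda col: col ** 2))

print(f"apply: {t_plain:.4f}s, progress_apply: {t_tracked:.4f}s")
```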
You can find a simple example for Pandas progress bar below:
Pandas iterrows and progress bar
The simplest usage of tqdm in Pandas is in combination with a loop and iterrows(). Since iterrows() returns a generator with no length, you need to provide the total number of items, which you can get with df.shape[0]:
from time import sleep
for index, row in tqdm(df.iterrows(), total=df.shape[0]):
    sleep(0)
Pandas progress bar for lambda
progress_apply is a method that tqdm adds to Pandas objects once tqdm.pandas() has been called. It accepts a lambda and can perform simple operations such as squaring the values:
tqdm.pandas(desc="power DataFrame 1M x 100 of random int!")
df.progress_apply(lambda x: x**2)
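progress_apply passes keyword arguments such as axis on to apply, so the same pattern can be used row by row. Below is a minimal sketch on a small illustrative frame (the name small and its size are assumptions, chosen so it runs quickly):

```python
import numpy as np
import pandas as pd
from tqdm import tqdm

# Small illustrative frame; the article's df is much larger
small = pd.DataFrame(np.random.randint(0, 100, (1000, 5)))

tqdm.pandas(desc="row sums")
# With axis=1 the lambda receives one row at a time
row_sums = small.progress_apply(lambda row: row.sum(), axis=1)
```

Row-wise apply is usually much slower than a vectorized call such as small.sum(axis=1), so this form is mostly useful when the per-row function cannot be vectorized.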
Pandas progress bar for function with progress_map
tqdm offers the method progress_map, which can be used to apply a function to each element of a Pandas Series, such as a DataFrame column. It'll show a progress bar:
from tqdm import tqdm
def cube(x):
    return x ** 3
tqdm.pandas()
df['cube'] = df[0].progress_map(cube)
Pandas progress bar for dictionary map
If you'd like to map the values of a Pandas column through a dictionary and follow the progress with a bar, you can use the following syntax:
mapping = {15:'a', 9:'b'}
df[0].progress_map(lambda x: mapping.get(x))
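Values that are missing from the dictionary come back as None with this syntax. The sketch below uses a small illustrative Series (values) and a fallback label ('other'), both of which are assumptions added for the example:

```python
import pandas as pd
from tqdm import tqdm

tqdm.pandas(desc="mapping")

mapping = {15: 'a', 9: 'b'}
values = pd.Series([15, 9, 42, 15])

# dict.get returns None for keys that are missing from the mapping
mapped = values.progress_map(lambda x: mapping.get(x))
# A second argument to dict.get supplies a fallback label instead
labelled = values.progress_map(lambda x: mapping.get(x, 'other'))
```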
Pandas and progress_aggregate
Finally, let's check how to show a progress bar for aggregation functions like:
- sum
- count
- mean, etc.
from tqdm import tqdm
tqdm.pandas()
df.groupby(0).progress_aggregate(sum)
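To see what the aggregation returns, here is a minimal sketch on a tiny hand-made frame. The column names key and value and the data are illustrative assumptions, and a lambda is used as the aggregation callable.

```python
import pandas as pd
from tqdm import tqdm

tqdm.pandas(desc="group sums")

small = pd.DataFrame({
    'key': [1, 1, 2, 2, 3],
    'value': [10, 20, 30, 40, 50],
})

# The bar's total is the number of groups; the callable receives each group's values
sums = small.groupby('key')['value'].progress_aggregate(lambda s: s.sum())
print(sums)
```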