Pandas pivot_table Silently Drops Indices with NaNs

In this post, we look at when pivot_table silently drops rows whose index or column keys contain NaN. We walk through an example, the expected behavior, and further resources.

Example

Let's have a DataFrame like:

import pandas as pd
import numpy as np
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', np.nan, 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

with data:

	foo	bar	baz	zoo
0	one	A	1	x
1	one	B	2	y
2	one	NaN	3	z
3	two	A	4	q
4	two	B	5	w
5	two	C	6	t

Silent drop of NaN indexes

Now let's run two different examples:

pivot_table

df.pivot_table(index='foo', columns='bar', values='zoo', aggfunc='sum')

result is:

bar	A	B	C
foo
one	x	y	NaN
two	q	w	t
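
Note that the value z from row 2 is missing from the result. To see which rows are at risk before pivoting, one option is to look for NaN in the columns that will be used as index and columns. This is only a minimal sketch; the names keys and at_risk are illustrative and not part of pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', np.nan, 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

# Columns that will be passed to pivot_table as index and columns
keys = ['foo', 'bar']

# Rows with NaN in any key column are the ones that can disappear from the result
at_risk = df[df[keys].isna().any(axis=1)]
print(at_risk)
```

For the example DataFrame only row 2 is flagged, which is the row whose zoo value does not appear in the pivot_table output.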

Passing dropna=False still gives the same result in pandas 2.0.1:

df.pivot_table(index='foo', columns='bar', values='zoo', aggfunc='sum', dropna=False)

pivot_table and dropna

This is how the documentation describes the dropna parameter:

dropna: bool, default True

Do not include columns whose entries are all NaN. If True, rows with a NaN value in any column will be omitted before computing margins.

pivot

pivot, on the other hand, gives a different result:

df.pivot(index='foo', columns='bar', values='zoo')

It keeps the NaN from the bar column as a column of its own:

bar	nan	A	B	C
foo
one	z	x	y	NaN
two	NaN	q	w	t

Stop silent drop

Once you have analyzed the data, a potential solution is to fill the NaN values with a default value that does not occur anywhere else in the column:

df['bar'] = df['bar'].fillna(0)
df.pivot_table(index='foo', columns='bar', values='zoo', aggfunc='sum')

After this change the bar column no longer contains NaN, so pivot_table will not drop that row. The DataFrame now looks like this:

	foo	bar	baz	zoo
0	one	A	1	x
1	one	B	2	y
2	one	0	3	z
3	two	A	4	q
4	two	B	5	w
5	two	C	6	t
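
Filling with 0 puts an integer label into a column of strings. A variation, sketched here under the assumption that the label 'missing' never occurs in the real data, is to use a string placeholder and to leave the original DataFrame untouched. The sketch also uses aggfunc='first' instead of sum, which is an assumption that fits this example because every (foo, bar) pair occurs only once:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', np.nan, 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

# 'missing' is a hypothetical placeholder; pick a label that is not in your data
filled = df.assign(bar=df['bar'].fillna('missing'))

# Each (foo, bar) pair is unique here, so 'first' simply returns the single value
result = filled.pivot_table(index='foo', columns='bar', values='zoo', aggfunc='first')
print(result)
```

The former NaN row now shows up under its own missing column, and df itself still holds the original NaN.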

The image below shows the behavior before and after fixing the silent drop of NaNs:

fix-pivot_table-silently-drops-indices-with-nans

Conclusion

You can always refer to the official Pandas documentation for examples and the expected behavior: Reshaping and pivot tables

Pandas offers a variety of methods and functions to wrangle data. Sometimes the results are unexpected. When that happens, test the results against another method or sequence of steps.
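
As a rough sketch of such a cross-check, you could compare the number of filled cells in the pivot table with the number of groups that groupby reports when NaN keys are kept (dropna=False is a groupby parameter available since pandas 1.1). The variable names and aggfunc='first' are assumptions made for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', np.nan, 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

pivoted = df.pivot_table(index='foo', columns='bar', values='zoo', aggfunc='first')

# Number of (foo, bar) groups when NaN keys form their own group
expected_groups = len(df.groupby(['foo', 'bar'], dropna=False).size())

# Number of non-empty cells that actually made it into the pivot table
filled_cells = int(pivoted.notna().sum().sum())

if filled_cells < expected_groups:
    print(f"pivot_table is missing {expected_groups - filled_cells} group(s)")
```

A mismatch between the two numbers is a hint that some rows were dropped on the way.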

If you notice a Pandas bug or unexpected behavior, you can open a ticket or check existing Pandas issues such as: ENH: pivot/groupby index with nan #3729
