ValueError: Index contains duplicate entries, cannot reshape in Pandas

In this tutorial, we'll see how to solve a common Pandas error – "ValueError: Index contains duplicate entries, cannot reshape". We get this error from the Pandas when we try to reindex columns or index objects with duplicate entries.

The error is very common for Pandas methods like:

  • df.pivot()
  • df.unstack()

There are several quick solutions to it:

(1) Drop duplicates on the index or columns:

df = df.drop_duplicates(['col_1','col_2'])

(2) Use pivot_table:

pd.pivot_table(df, index='col_1', columns='col_2', values=['col_3'] , aggfunc='count')

(3) Use reset the index

.reset_index()

Let's cover several examples explaining the error in more context.

Setup

For this post we will work with the following DataFrame:

import pandas as pd

data = {
    'team': ['team_blue', 'team_red', 'team_blue', 'team_red', 'team_blue', 'team_red', 'team_blue', 'team_red', 'team_blue', 'team_red'],
    'day': [7, 7, 8, 8, 9, 9, 10, 10, 10, 10],
    'points': [10, 12, 9, 7, 14, 5, 9, 8, 10, 11],
}
df = pd.DataFrame(data)

data:

team day points
0 team_blue 7 10
1 team_red 7 12
2 team_blue 8 9
3 team_red 8 7
4 team_blue 9 14

Step 1: Detect duplicate records

The error will be raised in case of duplicates on any of the axes.

So in order to ensure that there's no duplicates on the selected columns or rows we can use:

df[df.duplicated(['team', 'day'], keep=False)]

output:

team day points
6 team_blue 10 9
7 team_red 10 8
8 team_blue 10 10
9 team_red 10 11

if we get non empty result it means that error will be occur if we try to use those columns with pivot:

pd.pivot(df,'team', 'day' )['points']

The result is Pandas well known error:

ValueError: Index contains duplicate entries, cannot reshape

Note: We can use also:

df.duplicated(['row', 'col']).any()

The output is True or False. In case above it's True

Step 2: Remove duplicate records

To solve the error when we have duplicates we can remove duplicates. To do so we will use method: df.drop_duplicates():

df = df.drop_duplicates(['team', 'day'])

This time running:

pd.pivot(df,'team', 'day' )['points']

will not raise errors. The result is:

day 7 8 9 10
team
team_blue 10 9 14 9
team_red 12 7 5 8

Step 3: Use pivot_table

For duplicated records we can use the method pivot_table with aggregation functions. For example we can check number of records per each group by:

pd.pivot_table(df, index='day', columns='team', values=['points'] , aggfunc='count')

Instead of error we get the count for each column:

points
team team_blue team_red
day
7 1 1
8 1 1
9 1 1
10 2 2
Pro Tip 1
Two main differences between:
pivot
and
pivot_table

(1) "pivot_table" is a generalization of "pivot" that can handle duplicate values for index/column pair.
(2) "pivot_table" will only allow numeric types as "values=", whereas "pivot" will take string types.

Step 4: Use reset_index

Depending on the data and the expected result we can use multiple transformations to solve the error.

Let's demonstrate that with using method reset_index():

df_agg = df.groupby(by=['team', 'day'])['points'].sum().reset_index()
df_agg.pivot(index='team', columns='day', values=['points'])

The output is:

points
day 7 8 9 10
team
team_blue 10 9 14 19
team_red 12 7 5 19

How does it work?

First we group by the two columns - 'team', 'day' and then calculate the sum() of points for each group.

df.groupby(by=['team', 'day'])['points'].sum()

The result is a pandas Series:

team       day
team_blue  7      10
           8       9
           9      14
           10     19
team_red   7      12
           8       7
           9       5
           10     19
Name: points, dtype: int64

reset_index() convert it to a DataFrame without duplicates:

points
day 7 8 9 10
team
team_blue 10 9 14 19
team_red 12 7 5 19

Now we can do pivot() without getting an error: "ValueError: Index contains duplicate entries, cannot reshape".

Pro Tip 2
The same result might be achieved in different ways. Depending on the context we can use different methods.

Conclusion

To sum up, this article shows how removing duplicates can solve the "ValueError: Index contains duplicate entries, cannot reshape" error.

We also covered how to find the reasons for the error. Finally we show alternative ways to get the same result.