Pandas pivot - ValueError: Index contains duplicate entries, cannot reshape

In this article we will see how to solve Pandas pivot error: "ValueError: Index contains duplicate entries, cannot reshape".

Let's see how to solve this error in different ways depending on the case.

Setup

Suppose we have a DataFrame like:

import pandas as pd
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
               	'bar': ['A', 'B', 'B', 'A', 'B', 'C'],
               	'baz': [1, 2, 3, 4, 5, 6],
               	'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

You can see data below:

foo bar baz zoo
0 one A 1 x
1 one B 2 y
2 one B 3 z
3 two A 4 q
4 two B 5 w
5 two C 6 t

Note that there is a duplication in the row with index - 2 we have B instead of C.

If we try to use method 'pivot' with duplicate entries like:

df.pivot(index='foo', columns='bar', values='baz')

we will get the error:

ValueError: Index contains duplicate entries, cannot reshape"

1. Use pivot_table

For tables with duplicate entries we need to use pivot_table:

df.pivot_table(index='foo', columns='bar', values='baz')

this will solve the error and produce correct result:

bar A B C
foo
one 1.0 2.5 NaN
two 4.0 5.0 6.0

2. Remove duplicates

If you prefer to use the pivot method you need to drop duplicates from the DataFrame by:

df = df.drop_duplicates(['foo','bar'])
df.pivot(index='foo', columns='bar', values='baz')

In this case the result is the same as using pivot_table:

bar A B C
foo
one 1.0 2.5 NaN
two 4.0 5.0 6.0

3. Aggregate

You can also use a custom aggregation to mimic pivot behavior. Let's combine methods like:

  • groupby
  • sum

to produce aggregate data as the method pivot without getting error:

df_agg = df.groupby(by=['foo', 'bar']).sum().reset_index()
df_agg.pivot(index='foo', columns='bar', values='baz')

And again we get the same result:

bar A B C
foo
one 1.0 2.5 NaN
two 4.0 5.0 6.0

Conclusion

In this post, we covered the most common solution for Pandas error on method pivot:

"ValueError: Index contains duplicate entries, cannot reshape".

To solve Pandas errors you need to:

  • understand your data very well
  • know what the expected result should be.

Resources