How to Group by multiple columns, count and map in Pandas

To group by two or multiple columns, count unique combinations and map the result we can chain two Pandas methods:

  • groupby()
  • size()
df.groupby(['col1', 'col2']).size()

The picture below shows all the steps and the final result:

Let's create a sample DataFrame and explain all the steps in details:

import pandas as pd

data = {'col1': ['a', 'c', 'a', 'b', 'a', 'c'],
    	'col2': ['d', 'a', 'c', 'e', 'd', 'a']}
df = pd.DataFrame(data)

data looks like:

col1 col2
0 a d
1 c a
2 a c
3 b e
4 a d
5 c a

Group by two columns and count

To group by multiple columns in Pandas and count the combinations we can chain methods:

df_g = df.groupby(['col1', 'col2']).size().reset_index(name='counts')

This gives us a new DataFrame with counts of unique combinations from the columns. We have the original columns plus new columns - counts which contains the occurrences:

col1 col2 counts
0 a c 1
1 a d 2
2 b e 1
3 c a 2

Map count to new column in first DataFrame

Finally we can map the count to the original DataFrame. We will use method merge and map on two columns ['col1', 'col2']:

pd.merge(df, df_g, on=['col1', 'col2'])

This gives us:

col1 col2 counts
0 a d 2
1 a d 2
2 c a 2
3 c a 2
4 a c 1
5 b e 1

Or if we like to preserve the order of the original DataFrame we can use left join - how="left":

pd.merge(df, df_g, on=['col1', 'col2'], how='left')

How does it work?

  • we group by two columns col1 and col2
  • the size method counts the number of occurrences of each combination
  • reset_index method reset the index
  • rename of the new column to counts and assign counts
  • finally map the two DataFrames on multiple columns