How to Group by multiple columns, count and map in Pandas
To group by two or multiple columns, count unique combinations and map the result we can chain two Pandas methods:
groupby()
size()
df.groupby(['col1', 'col2']).size()
The picture below shows all the steps and the final result:
Let's create a sample DataFrame and explain all the steps in details:
import pandas as pd
data = {'col1': ['a', 'c', 'a', 'b', 'a', 'c'],
'col2': ['d', 'a', 'c', 'e', 'd', 'a']}
df = pd.DataFrame(data)
data looks like:
col1 | col2 | |
---|---|---|
0 | a | d |
1 | c | a |
2 | a | c |
3 | b | e |
4 | a | d |
5 | c | a |
Group by two columns and count
To group by multiple columns in Pandas and count the combinations we can chain methods:
df_g = df.groupby(['col1', 'col2']).size().reset_index(name='counts')
This gives us a new DataFrame with counts of unique combinations from the columns. We have the original columns plus new columns - counts
which contains the occurrences:
col1 | col2 | counts | |
---|---|---|---|
0 | a | c | 1 |
1 | a | d | 2 |
2 | b | e | 1 |
3 | c | a | 2 |
Map count to new column in first DataFrame
Finally we can map the count to the original DataFrame. We will use method merge
and map on two columns ['col1', 'col2']
:
pd.merge(df, df_g, on=['col1', 'col2'])
This gives us:
col1 | col2 | counts | |
---|---|---|---|
0 | a | d | 2 |
1 | a | d | 2 |
2 | c | a | 2 |
3 | c | a | 2 |
4 | a | c | 1 |
5 | b | e | 1 |
Or if we like to preserve the order of the original DataFrame we can use left join - how="left"
:
pd.merge(df, df_g, on=['col1', 'col2'], how='left')
How does it work?
- we group by two columns
col1
andcol2
- the size
method
counts the number of occurrences of each combination reset_index
method reset the index- rename of the new column to counts and assign counts
- finally map the two DataFrames on multiple columns