Here are two Pandas equivalents of count(distinct), i.e. two ways to count the distinct values per group in a DataFrame:

(1) nunique() method

df.groupby('Magnitude Type')['Date'].nunique()

(2) nunique() in combination with other methods

df.groupby(['Magnitude Type'])['Date'].agg(['count', 'nunique'])

Both solutions group by the column 'Magnitude Type' and count the unique values of the column 'Date' in each group.

Let's cover the above in more detail. Suppose we have a DataFrame like:

Date        Latitude  Longitude  Depth  Magnitude Type
01/02/1965    19.246    145.616  131.6  MW
01/04/1965     1.863    127.352   80.0  MW
01/05/1965   -20.579   -173.972   20.0  MW
01/08/1965   -59.076    -23.557   15.0  MW
01/09/1965    11.938    126.427   15.0  MW

Step 1: Pandas equivalent to count(distinct) - .nunique()

If you want to group by one column and then count the unique values of another column in Pandas, you can use the method .nunique().

The syntax is simple:

df.groupby('Magnitude Type')['Date'].nunique()

The result is similar to the SQL query:

SELECT count(distinct Date) FROM earthquakes GROUP BY `Magnitude Type`;

result:
Magnitude Type
MB     2474
MD        5
MH        4
ML       60
MS     1316
MW     4665
MWB    2036
MWC    3458
MWR      18
MWW    1256
Name: Date, dtype: int64
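If you prefer the result as a regular DataFrame, sorted by the number of distinct dates, you can chain sort_values() and reset_index(); a minimal sketch (the column name distinct_dates is just a label chosen here):

# same count(distinct) per group, sorted descending and returned as a DataFrame
(df.groupby('Magnitude Type')['Date'].nunique()
   .sort_values(ascending=False)
   .reset_index(name='distinct_dates'))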
Pro Tip 1
The method value_counts returns the number of all values per group (including duplicates), not the number of unique ones:
df['Magnitude Type'].value_counts(dropna=False)

result:

MW     7722
MWC    5669
MB     3761
MWB    2458
MWW    1983
MS     1702
ML       77
MWR      26
MD        6
MH        5
NaN       3
Name: Magnitude Type, dtype: int64
Pro Tip 2
Be careful with NaN values; they can make a difference in the counts. By default, both groupby and nunique drop NaN.
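If you need the NaN values to be taken into account, both groupby and nunique accept a dropna parameter (dropna in groupby requires pandas 1.1 or newer); a minimal sketch:

# dropna=False in groupby keeps rows with NaN in 'Magnitude Type' as their own group;
# dropna=False in nunique also counts NaN as a distinct 'Date' value
df.groupby('Magnitude Type', dropna=False)['Date'].nunique(dropna=False)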

Step 2: Pandas count(distinct) - .nunique() in combination with count

If you want to get the number of unique values in each group plus other information like:

  • min
  • max
  • count
  • mean

You can modify the query above a bit:

df.groupby(['Magnitude Type'])['Date'].agg(['count', 'nunique', 'min'])

This will result in:

                count  nunique  min
Magnitude Type
MB               3761     2474  01/01/1975
MD                  6        5  02/28/2001
MH                  5        4  02/09/1971
ML                 77       60  01/03/1976
MS               1702     1316  01/01/1973
MW               7722     4665  01/01/1967
MWB              2458     2036  01/01/1995
MWC              5669     3458  01/01/1997
MWR                26       18  02/13/2012
MWW              1983     1256  01/01/2011
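If you also want max (or other statistics from the list above), you can extend the list passed to .agg. Note that 'Date' is stored as a string here, so for a chronologically correct min/max you may want to convert it first; a sketch, assuming the dates follow the MM/DD/YYYY format:

# convert the string dates (unparseable values become NaT), then aggregate
(df.assign(Date=pd.to_datetime(df['Date'], format='%m/%d/%Y', errors='coerce'))
   .groupby('Magnitude Type')['Date']
   .agg(['count', 'nunique', 'min', 'max']))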

As you can see, for Magnitude Type = MH there are 4 unique Dates but 5 records:

Date Latitude Longitude Depth Magnitude Type
1848 02/09/1971 34.416000 -118.370000 6.000 MH
1849 02/09/1971 34.416000 -118.370000 6.000 MH
9664 10/18/1989 37.036167 -121.879833 17.214 MH
10584 08/17/1991 41.679000 -125.856000 1.303 MH
10869 04/25/1992 40.335333 -124.228667 9.856 MH

This is because there are 2 records for the date 02/09/1971.
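To inspect the records behind such a difference, one option is to filter the group and flag the duplicated dates; a minimal sketch:

mh = df[df['Magnitude Type'] == 'MH']
# keep=False marks every row whose Date appears more than once within the group
mh[mh['Date'].duplicated(keep=False)]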

Step 3: Pandas count(distinct) - .nunique() for multiple columns

If you want to get the number of unique values in several columns, you have two options.

.agg + nunique

The first one uses the same approach as above:

df.groupby('Magnitude Type').agg({'Date': ['nunique', 'count'],
                                  'Depth': ['nunique', 'count']})

result:

                  Date          Depth
               nunique  count  nunique  count
Magnitude Type
MB                2474   3761      911   3761
MD                   5      6        6      6
MH                   4      5        4      5
ML                  60     77       67     77
MS                1316   1702      206   1702
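If the two-level column index is inconvenient, named aggregation (available since pandas 0.25) produces flat column names; a minimal sketch (the output column names are just labels chosen here):

df.groupby('Magnitude Type').agg(date_nunique=('Date', 'nunique'),
                                 date_count=('Date', 'count'),
                                 depth_nunique=('Depth', 'nunique'),
                                 depth_count=('Depth', 'count'))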

crosstab

Usually Pandas provides multiple ways of achieving something. In this example we are going to use the method crosstab.

It might be slower than the previous approach, but it gives you additional information that can be useful for more insights.

So first let's do a simple demo of crosstab:

pd.crosstab(df['Magnitude Type'], df['Date'])

This will give a table showing how many records exist for each combination of Date and Magnitude Type:

Date            01/01/1967  01/01/1969  01/01/1970  01/01/1971  01/01/1972
Magnitude Type
MW                       2           1           1           1           1
MWB                      0           0           0           0           0
MWC                      0           0           0           0           0
MWR                      0           0           0           0           0
MWW                      0           0           0           0           0

Then we can summarize this table into the number of unique dates per group by using .ne(0).sum(1):

pd.crosstab(df['Magnitude Type'], df['Date']).ne(0).sum(1)

result:

Magnitude Type
MB     2474
MD        5
MH        4
ML       60
MS     1316
MW     4665
MWB    2036
MWC    3458
MWR      18
MWW    1256
dtype: int64
How does it work?
The method ne(0) converts every value different from 0 to True and every 0 to False. Then .sum(1) sums along each row (axis=1), which counts the True values, i.e. the number of distinct dates per Magnitude Type.
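The same summary can be written a bit more explicitly with a boolean comparison and a named axis; an equivalent sketch:

# True where a (Magnitude Type, Date) combination appears at least once,
# then count the True values per row
(pd.crosstab(df['Magnitude Type'], df['Date']) > 0).sum(axis='columns')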

Step 4: Pandas count(distinct) - by lambda and conditional

Finally, you have the option of using a lambda in order to count the unique values.

This is useful if you want to apply conditions on the count, for example excluding some groups.

df.groupby('Magnitude Type')['Date'].apply(lambda x: x.unique().shape[0])

For example, keeping only groups with more than 1000 unique dates:

df.groupby('Magnitude Type')['Date'].apply(lambda x: x.unique().shape[0] if x.unique().shape[0] > 1000 else None).dropna()
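The same filtering can also be done without a lambda, by computing nunique() first and then filtering the resulting Series; a minimal sketch:

# count distinct dates per group, then keep only groups with more than 1000
counts = df.groupby('Magnitude Type')['Date'].nunique()
counts[counts > 1000]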
