How to Get most frequent values in Pandas DataFrame

In this short post, I'll show you how to get most frequent values in Pandas DataFrame.

You can find also: How to Get Top 10 Highest or Lowest Values in Pandas

To start, here are the ways to get most frequent N values in your DataFrame:

df['Magnitude'].value_counts()
df['Magnitude'].mode()

In the next steps we will cover more details in simple examples

Step 1: Create Sample DataFrame

To start, let's create DataFrame with data from Kaggle:Significant Earthquakes, 1965-2016.

To learn more about Pandas and Kaggle: How to Search and Download Kaggle Dataset to Pandas DataFrame

import pandas as pd

cols = ['Date', 'Time', 'Latitude', 'Longitude', 'Depth', 'Magnitude Type', 'Type', 'ID', 'Magnitude']
df = pd.read_csv(f'../data/earthquakes_1965_2016_database.csv.zip')[cols]

data:

Date Latitude Longitude Depth Magnitude Type Magnitude
01/02/1965 19.246 145.616 131.6 MW 6.0
01/04/1965 1.863 127.352 80.0 MW 5.8
01/05/1965 -20.579 -173.972 20.0 MW 6.2
01/08/1965 -59.076 -23.557 15.0 MW 5.8
01/09/1965 11.938 126.427 15.0 MW 5.8

Step 2: Get Most Frequent value of Column in Pandas

To get the most frequent value of a column we can use the method mode. It will return the value that appears most often. It can be multiple values.

So to get the most frequent value in a single column - 'Magnitude' we can use:

df['Magnitude'].mode()

result:

0    5.5
dtype: float64

It has few parameters like:

  • numeric_only
  • dropna
  • axis

If the most frequent values are multiple they will be returned:

df['Time'].mode()

result:

0    02:56:58
1    14:09:03
dtype: object

Step 3: Get Most Frequent value for all columns in Pandas

In order to get the most frequent value for all columns we can iterate through all columns of the DataFrame:

dfs = []

for col in df.columns:
    top_values = []
    top_values = df[col].mode()
    dfs.append(pd.DataFrame({col: top_values}).reset_index(drop=True))
pd.concat(dfs, axis=1)

the most frequent values for all columns:

Date Time Depth Magnitude Type Type Magnitude
03/11/2011 02:56:58 10.0 MW Earthquake 5.5
NaN 14:09:03 NaN NaN NaN NaN

Or get the most frequent values per dtype-s - for example only numeric columns:

from pandas.api.types import is_numeric_dtype

dfs = []

for col in df.columns:
    top_values = []
    if is_numeric_dtype(df[col]):
        top_values = df[col].mode()
        dfs.append(pd.DataFrame({col: top_values}).reset_index(drop=True))
pd.concat(dfs, axis=1)

The most frequent values for numeric columns

Depth Magnitude
10.0 5.5

Step 4: Get N most frequent values in a column

To get 5, 10 or N most frequent values in a single column we can use method value_counts:

df['Magnitude'].value_counts()

it will return the frequency for all values in the column:

5.50    4685
5.60    3967
5.70    3079
5.80    2346
5.90    1947
6.00    1580
....
Name: Magnitude, dtype: int64

To get the N most frequent values only without the count we can use:

n = 5
df['Magnitude'].value_counts().index.tolist()[:n]

result:

[5.5, 5.6, 5.7, 5.8, 5.9]

To get only the count:

n = 5
df['Magnitude'].value_counts().values.tolist()[:n]

result:
[4685, 3967, 3079, 2346, 1947]

To get the single most frequent value from value_counts we can combine it with idmax:

df['Magnitude'].value_counts().idxmax()

output:

5.5

To get the count of the most frequent value:

df['Magnitude'].value_counts().max()

output:

4685

Step 5: Get Top 10 most frequent values for all columns

Finally let's cover the case - how to get the most frequent values for all columns in DataFrame.

For this purpose we are going to use loop:

from pandas.api.types import is_categorical_dtype

for col in df.columns:
    print(col, end=' - \n')
    print('_' * 50)
    if col in ['Magnitude'] or is_categorical_dtype(col):
        display(pd.DataFrame(df[col].astype('str').value_counts().sort_values(ascending=False).head(3)))
    else:
        display(pd.DataFrame(df[col].value_counts().sort_values(ascending=False).head(5)))

In the example above we are applying different behaviour for categorical columns. Because value_counts can't be applied directly on them.

result:

Date -


Date
03/11/2011 128
12/26/2004 51
02/27/2010 39
Time -


Time
02:56:58 5
14:09:03 5
16:25:34 4

Resources