In this short post, I'll show you how to get most frequent values in Pandas DataFrame.
You can find also: How to Get Top 10 Highest or Lowest Values in Pandas
To start, here are the ways to get most frequent N values in your DataFrame:
In the next steps we will cover more details in simple examples
Step 1: Create Sample DataFrame
To start, let's create DataFrame with data from Kaggle:Significant Earthquakes, 1965-2016.
To learn more about Pandas and Kaggle: How to Search and Download Kaggle Dataset to Pandas DataFrame
data:
Date | Latitude | Longitude | Depth | Magnitude Type | Magnitude |
---|---|---|---|---|---|
01/02/1965 | 19.246 | 145.616 | 131.6 | MW | 6.0 |
01/04/1965 | 1.863 | 127.352 | 80.0 | MW | 5.8 |
01/05/1965 | -20.579 | -173.972 | 20.0 | MW | 6.2 |
01/08/1965 | -59.076 | -23.557 | 15.0 | MW | 5.8 |
01/09/1965 | 11.938 | 126.427 | 15.0 | MW | 5.8 |
Step 2: Get Most Frequent value of Column in Pandas
To get the most frequent value of a column we can use the method mode
. It will return the value that appears most often. It can be multiple values.
So to get the most frequent value in a single column - 'Magnitude' we can use:
result:
0 5.5
dtype: float64
It has few parameters like:
numeric_only
dropna
axis
If the most frequent values are multiple they will be returned:
result:
0 02:56:58
1 14:09:03
dtype: object
Step 3: Get Most Frequent value for all columns in Pandas
In order to get the most frequent value for all columns we can iterate through all columns of the DataFrame:
the most frequent values for all columns:
Date | Time | Depth | Magnitude Type | Type | Magnitude |
---|---|---|---|---|---|
03/11/2011 | 02:56:58 | 10.0 | MW | Earthquake | 5.5 |
NaN | 14:09:03 | NaN | NaN | NaN | NaN |
Or get the most frequent values per dtype-s - for example only numeric columns:
The most frequent values for numeric columns
Depth | Magnitude |
---|---|
10.0 | 5.5 |
Step 4: Get N most frequent values in a column
To get 5, 10 or N most frequent values in a single column we can use method value_counts
:
it will return the frequency for all values in the column:
5.50 4685
5.60 3967
5.70 3079
5.80 2346
5.90 1947
6.00 1580
....
Name: Magnitude, dtype: int64
To get the N most frequent values only without the count we can use:
result:
[5.5, 5.6, 5.7, 5.8, 5.9]
To get only the count:
result:
[4685, 3967, 3079, 2346, 1947]
To get the single most frequent value from value_counts
we can combine it with idmax
:
output:
5.5
To get the count of the most frequent value:
output:
4685
Step 5: Get Top 10 most frequent values for all columns
Finally let's cover the case - how to get the most frequent values for all columns in DataFrame.
For this purpose we are going to use loop:
In the example above we are applying different behaviour for categorical columns. Because value_counts can't be applied directly on them.
result:
Date -
Date
03/11/2011 128
12/26/2004 51
02/27/2010 39
Time -
Time
02:56:58 5
14:09:03 5
16:25:34 4