In this short guide, I'll show you how to get top 5, 10 or N values in Pandas DataFrame. You can find also how to print top/bottom values for all columns in a DataFrame.
If you need the most frequent values in a column or DataFrame you can check: How to Get Most Frequent Values in Pandas Dataframe
To start, here is the syntax that you may apply in order to get top 10 biggest numeric values in your DataFrame:
df['Magnitude'].nlargest(n=10)
To list the top 10 lowest values in DataFrame you can use:
df.nlargest(n=5, columns=['Magnitude', 'Depth'])
In the next section, I’ll show you more examples and other techniques in order to get top/bottom values in DataFrame.
Step 1: Create Sample DataFrame
In this example we are going to use real data available from Kaggle:Significant Earthquakes, 1965-2016.
To learn more about reading Kaggle data with Python and Pandas: How to Search and Download Kaggle Dataset to Pandas DataFrame
import pandas as pd
cols = ['Date', 'Time', 'Latitude', 'Longitude', 'Depth', 'Magnitude Type', 'Type', 'ID', 'Magnitude']
df = pd.read_csv(f'../data/earthquakes_1965_2016_database.csv.zip')[cols]
data:
Date | Latitude | Longitude | Depth | Magnitude Type | Magnitude |
---|---|---|---|---|---|
01/02/1965 | 19.246 | 145.616 | 131.6 | MW | 6.0 |
01/04/1965 | 1.863 | 127.352 | 80.0 | MW | 5.8 |
01/05/1965 | -20.579 | -173.972 | 20.0 | MW | 6.2 |
01/08/1965 | -59.076 | -23.557 | 15.0 | MW | 5.8 |
01/09/1965 | 11.938 | 126.427 | 15.0 | MW | 5.8 |
Step 2: Get Top 10 biggest/lowest values for single column
You can use functions:
nsmallest
- return the first n rows ordered by columns in ascending ordernlargest
- return the first n rows ordered by columns in descending order
to get the top N highest or lowest values in Pandas DataFrame.
So let's say that we would like to find the strongest earthquakes by magnitude. From the data above we can use the following syntax:
df['Magnitude'].nlargest(n=10)
the result is:
17083 9.1
20501 9.1
19928 8.8
16 8.7
17329 8.6
21219 8.6
...
Name: Magnitude, dtype: float64
How does it return the top 10 values? It orders all values in ascending or descending order. Then will return 5, 10 or N values from this order.
Step 3: Get Top 10 biggest/lowest values - duplicates
As you can see in Step 2 there are duplicates in the results.
nsmallest
/ nlargest
have a parameter:
keep{‘first’, ‘last’, ‘all’}, default ‘first’
which helps to deal with duplicates.
Let's say that you would like to get 5 smallest values by Depth
of the earthquake:
df.nsmallest(n=5, columns=['Depth'])
result:
Date | Time | Depth | Magnitude Type | Type | ID | Magnitude |
---|---|---|---|---|---|---|
06/28/1992 | 12:00:45 | -1.100 | ML | Earthquake | CI3043549 | 5.77 |
06/28/1992 | 11:57:34 | -0.097 | MW | Earthquake | CI3031111 | 7.30 |
07/21/1986 | 22:07:16 | -0.076 | ML | Earthquake | NC95459 | 5.60 |
02/16/1973 | 05:02:58 | 0.000 | MB | Explosion | USP00000JC | 5.60 |
07/23/1973 | 01:22:58 | 0.000 | MB | Nuclear Explosion | USP00002TP | 6.30 |
It seems that we have many records with Depth 0. In order to get all records which have Depth 0 we can use:
df.nsmallest(n=5, columns=['Depth'], keep='all')
the result will return the previous ones plus the duplicated:
173 rows × 7 columns
While if we use keep='last'
we will get again 5 rows as a result.
Step 4: Get Top N values in multiple columns
You can get top 5, 10 or N records from multiple columns with:
nsmallest
nlargest
To do so you can use the following syntax:
df.nlargest(n=5, columns=['Magnitude', 'Depth'])
result:
Date | Time | Depth | Magnitude Type | Type | Magnitude |
---|---|---|---|---|---|
12/26/2004 | 00:58:53 | 30.0 | MW | Earthquake | 9.1 |
03/11/2011 | 05:46:24 | 29.0 | MWW | Earthquake | 9.1 |
02/27/2010 | 06:34:12 | 22.9 | MWW | Earthquake | 8.8 |
02/04/1965 | 05:01:22 | 30.3 | MW | Earthquake | 8.7 |
03/28/2005 | 16:09:37 | 30.0 | MWW | Earthquake | 8.6 |
Step 5: How do nsmallest
and nlargest
work
So to summarise the algorithm:
- order all values - ascending or descending
- find top N values
- filter out duplicates ( depends on parameter
keep
) - return N top values
Both methods will work only with numeric types. If you try to use them on columns with dtype object you will get an error:
TypeError: Column 'Magnitude Type' has dtype object, cannot use method 'nlargest' with this dtype
Step 6: Display Top values for all numeric columns in DataFrame
If you like to get top values for all columns you can use the next syntax:
from pandas.api.types import is_numeric_dtype
dfs = []
for col in df.columns:
top_values = []
if is_numeric_dtype(df[col]):
top_values = df[col].nlargest(n=7)
dfs.append(pd.DataFrame({col: top_values}).reset_index(drop=True))
pd.concat(dfs, axis=1)
This will return the top values per column as a new DataFrame:
Depth | Magnitude |
---|---|
700.0 | 9.1 |
691.6 | 9.1 |
690.0 | 8.8 |
688.0 | 8.7 |
687.6 | 8.6 |
Step 7: nsmallest
and nlargest
and datetime
It's possible to use methods nsmallest
and nlargest
with datetime. You need to be sure that the target column is from type: datetime64[ns, UTC]
If you need to convert string to datetime column you can use:
df['Date'] = pd.to_datetime(df['Date'], utc=True)
and then use the nlargest
:
df['Date'].nlargest()