In this post we will see how to find the closest values in a pandas Series to a given number. Here you can find two short solutions:
(1) Find the single closest value in a Pandas Series
closest_value = df['column_name'].loc[(df['column_name'] - input_value).abs().idxmin()]
(2) Find the N closest values
n_closest = df.loc[(df['column_name'] - input_value).abs().nsmallest(N).index, 'column_name']
Finding the closest value to a given input in a Pandas Series is useful for rounding numbers, performing nearest-neighbor lookups, or handling continuous data.
1. Sample Data
Let's create a sample dataset:
import pandas as pd
import numpy as np
data = {'values': [5, 12, 20, 28, 35, 42]}
df = pd.DataFrame(data)
df
This creates a DataFrame:
 | values |
---|---|
0 | 5 |
1 | 12 |
2 | 20 |
3 | 28 |
4 | 35 |
5 | 42 |
2. Find the Single Closest Value
To find the closest value to 25, we use idxmin() on the absolute difference. Because idxmin() returns an index label rather than a position, we look the value up with .loc:
value = 25
closest_value = df['values'].loc[(df['values'] - value).abs().idxmin()]
closest_value
Output:
28
The closest value to 25 is 28.
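The label-versus-position distinction matters once the Series no longer uses the default 0..n-1 index. Below is a minimal sketch with a string index; the helper name closest_value and the letter labels are illustrative assumptions, not part of the original example.

```python
import pandas as pd

def closest_value(series: pd.Series, target: float):
    """Hypothetical helper: return the element of `series` nearest to `target`."""
    distances = (series - target).abs()
    # idxmin() returns an index *label*, so the lookup must be label-based (.loc)
    return series.loc[distances.idxmin()]

# Same numbers as the sample data, but with a non-numeric index (assumed for illustration)
s = pd.Series([5, 12, 20, 28, 35, 42], index=list("abcdef"))
result = closest_value(s, 25)
print(result)  # 28
```

With a string index like this one, passing the label returned by idxmin() to .iloc would fail, since .iloc only accepts integer positions.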
3. Sort by proximity to value
We can sort the values in a Series by their proximity to a given value. Be careful with NaN values, which can produce unexpected results:
import pandas as pd
import numpy as np
data = {'values': [20, np.nan, 28, 35, 42, 1, -1, 5, -5, 12]}
df = pd.DataFrame(data)
value = 0
df.loc[(df['values'] - value).abs().nsmallest(5).index, 'values']
Here are the 5 closest numbers to the given value:
5 1.0
6 -1.0
7 5.0
8 -5.0
9 12.0
Name: values, dtype: float64
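To order the whole column by proximity rather than only the top 5, one option is to sort the distances and reuse their index. This is a sketch under the assumption that the index has no duplicate labels; the variable names distances and ordered are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'values': [20, np.nan, 28, 35, 42, 1, -1, 5, -5, 12]})
value = 0

# Absolute distance of every row to the target; NaN stays NaN
distances = (df['values'] - value).abs()

# A stable sort keeps ties (1 and -1, 5 and -5) in their original order;
# sort_values() places NaN last by default
ordered = df.loc[distances.sort_values(kind="mergesort").index, 'values']
print(ordered)
```

The row holding NaN ends up at the bottom of the result, so it can be spotted or dropped explicitly instead of disappearing silently.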
4. Find the N Closest Values
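To find the 3 closest values to 25, using the sample data from section 1:
df = pd.DataFrame({'values': [5, 12, 20, 28, 35, 42]})
value = 25
n = 3
df.loc[(df['values'] - value).abs().nsmallest(n).index, 'values']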
Output:
3 28
2 20
4 35
Name: values, dtype: int64
The three closest values to 25 are 20, 28 and 35.
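When two values are equally far from the target, nsmallest() keeps only the first by default. The sketch below uses its keep='all' option to return every tied value; the small Series here is made up for illustration and is not the sample data.

```python
import pandas as pd

# Hypothetical data: 20 and 30 are both exactly 5 away from the target
s = pd.Series([5, 12, 20, 30, 35])
target = 25
distances = (s - target).abs()

# Default keep='first': only one of the tied values is returned
first_only = s.loc[distances.nsmallest(1).index]

# keep='all': every value tied for the smallest distance is returned,
# so the result may contain more than n rows
with_ties = s.loc[distances.nsmallest(1, keep='all').index]

print(first_only.tolist())
print(with_ties.tolist())
```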
5. Find the Closest Larger or Smaller Value
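To find the closest larger or smaller value in a Pandas Series relative to a given number, we can filter the Series first and then take the minimum or maximum. Note that >= and <= also match a value equal to the given number: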
- Find the closest larger value:
df.loc[df['values'] >= value, 'values'].min()
Output:
28
- Find the closest smaller value:
df.loc[df['values'] <= value, 'values'].max()
Output:
20
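If the Series is already sorted in ascending order, a binary search with searchsorted() is an alternative to filtering. This is a sketch under that assumption (it gives wrong answers on unsorted data), and the names left, right, closest_larger and closest_smaller are illustrative.

```python
import pandas as pd

s = pd.Series([5, 12, 20, 28, 35, 42])  # assumed to be sorted ascending
value = 25

left = s.searchsorted(value, side="left")    # first position where s >= value
right = s.searchsorted(value, side="right")  # first position where s > value

# Guard both ends: there may be no larger-or-equal / smaller-or-equal value
closest_larger = s.iloc[left] if left < len(s) else None
closest_smaller = s.iloc[right - 1] if right > 0 else None

print(closest_larger, closest_smaller)  # 28 20
```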
6. Handling Missing or Empty Data
If the Series contains NaN values, you should drop them before finding the closest value:
df = df.dropna()
To handle the case where no valid value exists (e.g., every value is larger than the input, so there is no smaller one), check before selecting:
if df['values'].lt(value).any():
    smallest = df.loc[df['values'] < value, 'values'].max()
else:
    smallest = None
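Both precautions can be combined into one reusable function. The following is a minimal sketch; the helper name closest_smaller and the small Series are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def closest_smaller(series: pd.Series, target: float):
    """Hypothetical helper: largest value strictly below `target`, or None."""
    candidates = series.dropna()                  # ignore missing values
    candidates = candidates[candidates < target]  # keep only smaller values
    # .max() on an empty selection would return NaN, so return None explicitly
    return None if candidates.empty else candidates.max()

s = pd.Series([5, np.nan, 12, 20])
print(closest_smaller(s, 15))  # 12.0
print(closest_smaller(s, 1))   # None
```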
7. Conclusion
Finding the closest value in a Pandas Series is simple using .idxmin() for a single value or .nsmallest() for multiple values. You can also filter values to find the closest larger or smaller number as needed.