How to Compare Each Value in a Pandas Column to All Subsequent Values

Learn how to compare each value in a pandas DataFrame column with all the values that follow it, from a readable row-wise apply to a vectorized NumPy approach.

Sample Data

import pandas as pd

val = [16, 19, 15, 19, 15]
df = pd.DataFrame({'val': val})

	val
0	16
1	19
2	15
3	19
4	15

1. Compare with Subsequent Values Using apply

Create a new column that holds, for each row, a list of comparison results against every later row (here 1 if the values are equal, 0 otherwise):

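# row.name is the row's index label, so range(row.name + 1, len(df))
# assumes the default RangeIndex (0, 1, ..., n-1)
df['match'] = df.apply(
    lambda row: [
        1 if row['val'] == df.loc[idx, 'val'] else 0
        for idx in range(row.name + 1, len(df))
    ],
    axis=1
)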

Result:

	val	match
0	16	[0, 0, 0, 0]
1	19	[0, 1, 0]
2	15	[0, 1]
3	19	[0]
4	15	[]

This approach works row-wise and runs a Python-level comparison for every pair of rows, so it suits small to moderate-sized DataFrames.
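A minimal sketch of the same comparison without a df.loc lookup per pair: it reads the column into a Python list once and slices by position, which is typically faster than apply and does not depend on the index labels. It still performs O(n²) comparisons, so it only shifts the limit for "moderate-sized".

```python
import pandas as pd

df = pd.DataFrame({'val': [16, 19, 15, 19, 15]})

# Read the column once into a plain Python list
vals = df['val'].tolist()

# For position i, compare vals[i] with every value after it (vals[i + 1:]).
# Slicing by position means a custom index does not break the logic.
df['match'] = pd.Series(
    [[int(v == later) for later in vals[i + 1:]] for i, v in enumerate(vals)],
    index=df.index,
)

print(df)
```

On the sample data this produces the same match column as the table above.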

2. Compare Text Values with Subsequent Values for Similarity

This requires the python-Levenshtein library:

!pip install python-Levenshtein

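The idea is to treat similar strings, such as apple and appl, as matches. Levenshtein's ratio() returns a similarity score between 0 and 1, and is_similar returns 1 when that score reaches the threshold (0.8 by default):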

from Levenshtein import ratio

df_str = pd.DataFrame({'text': ['apple', 'appl', 'banana', 'apple', 'bananna']})

def is_similar(a, b, threshold=0.8):
    return 1 if ratio(a, b) >= threshold else 0

df_str['similar_later'] = df_str.apply(
    lambda row: [
        is_similar(row['text'], df_str.loc[idx, 'text'])
        for idx in range(row.name + 1, len(df_str))
    ],
    axis=1
)

df_str

Result:

	text	similar_later
0	apple	[1, 0, 1, 0]
1	appl	[0, 1, 0]
2	banana	[0, 1]
3	apple	[0]
4	bananna	[]
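If installing python-Levenshtein is not an option, the standard library's difflib.SequenceMatcher can stand in as the similarity measure. This is a swapped-in technique, not the method above: SequenceMatcher.ratio() is computed differently from Levenshtein's ratio(), so scores can differ and the 0.8 threshold may need retuning. The helper name is_similar_difflib is hypothetical.

```python
from difflib import SequenceMatcher

import pandas as pd

df_str = pd.DataFrame({'text': ['apple', 'appl', 'banana', 'apple', 'bananna']})

def is_similar_difflib(a, b, threshold=0.8):
    # Hypothetical helper: SequenceMatcher is a stand-in for Levenshtein.ratio.
    # ratio() returns 2*M/T, where M is the number of matched characters
    # and T is the combined length of both strings.
    return 1 if SequenceMatcher(None, a, b).ratio() >= threshold else 0

# Compare each string with every string after it, by position
texts = df_str['text'].tolist()
df_str['similar_later'] = pd.Series(
    [[is_similar_difflib(t, later) for later in texts[i + 1:]]
     for i, t in enumerate(texts)],
    index=df_str.index,
)

print(df_str)
```

On this particular sample the 0/1 flags come out the same as in the Levenshtein table above; on other data they may not.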

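3. Compare Values for Large DataFrames

NumPy broadcasting compares every pair of values at once, and np.triu(..., k=1) keeps only the entries above the diagonal, i.e. each value compared with the values that come after it. This avoids Python loops, but the full n x n boolean matrix needs memory that grows quadratically with the number of rows: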

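import numpy as np

arr = df['val'].to_numpy()  # array([16, 19, 15, 19, 15])
# Broadcasting a column vector against a row vector gives the full n x n matrix:
# comparisons[i, j] is True when arr[i] == arr[j]
comparisons = arr[:, np.newaxis] == arr[np.newaxis, :]
# Keep only entries strictly above the diagonal (j > i), i.e. later values
upper_tri = np.triu(comparisons, k=1)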

Result (upper_tri) for array([16, 19, 15, 19, 15]); True marks the position pairs (1, 3) and (2, 4):

array([[False, False, False, False, False],
       [False, False, False,  True, False],
       [False, False, False, False,  True],
       [False, False, False, False, False],
       [False, False, False, False, False]])
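The upper-triangular matrix can be turned back into more familiar outputs. A sketch, rebuilding the same upper_tri from the sample data: np.argwhere lists the (i, j) position pairs with i < j that match, summing each row counts the later matches, and slicing each row past the diagonal recreates the list column from section 1. The names pairs and later_counts are illustrative; the positions are 0-based row positions, not index labels, so with a custom index map them through df.index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'val': [16, 19, 15, 19, 15]})
arr = df['val'].to_numpy()

# Same upper-triangular equality matrix as above
upper_tri = np.triu(arr[:, np.newaxis] == arr[np.newaxis, :], k=1)

# Row-position pairs (i, j) with i < j where the values are equal
pairs = np.argwhere(upper_tri)

# Number of later rows with an equal value, per row
later_counts = upper_tri.sum(axis=1)

# Rebuild the list column from section 1: row i keeps only columns j > i
df['match'] = pd.Series(
    [row[i + 1:].astype(int).tolist() for i, row in enumerate(upper_tri)],
    index=df.index,
)

print(pairs)
print(df)
```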

Notes

For large DataFrames, the apply methods in sections 1 and 2 can be slow because they loop over every pair of rows in Python.
Customize the comparison (e.g., replace == with >, or use a string similarity function such as Levenshtein's ratio).
For fully vectorized alternatives, consider NumPy broadcasting when the output format allows it (e.g., an upper triangular matrix instead of a column of lists).
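Two sketches of the customizations mentioned in these notes. First, swapping == for > in the broadcast flags where an earlier value is greater than a later one, and summing each row counts the smaller later values. Second, when only a per-row summary of equality is needed, pandas can skip the n x n matrix entirely: groupby(...).cumcount(ascending=False) counts how many later rows share the same value, and duplicated(keep='last') flags rows whose value appears again later. The new column names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'val': [16, 19, 15, 19, 15]})
arr = df['val'].to_numpy()

# Swap == for >: True where the earlier value arr[i] is greater than a later arr[j]
greater_later = np.triu(arr[:, np.newaxis] > arr[np.newaxis, :], k=1)
df['n_smaller_later'] = greater_later.sum(axis=1)

# Equality summaries without the n x n matrix:
# cumcount(ascending=False) numbers each group's rows from the end (last row -> 0),
# which equals how many later rows share the same value
df['n_equal_later'] = df.groupby('val').cumcount(ascending=False)

# duplicated(keep='last') is True when the same value appears again later
df['has_equal_later'] = df['val'].duplicated(keep='last')

print(df)
```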

Resources

Notebook

