How to Compare Two Excel or CSV Files Using Pandas

When working with data, you may need to compare two Excel or CSV files to:

find differences
detect updates
history changes
mistakes
validate records etc

In this guide, we'll explore how to compare two Excel or CSV files using Pandas, with practical examples.

Why Compare Files with Pandas

Comparing datasets is useful when:

Validating data consistency
Detecting changes between old and new versions
Finding missing or extra rows
Comparing specific columns for modifications
Data comes from different sources

Let's dive into two approaches to compare files using Pandas.

Example 1: Compare Two CSV Files and Find Differences

Step 1: Load the CSV Files into Pandas

Assume we have two CSV files:

old_data.csv (original dataset)
new_data.csv (updated dataset)

import pandas as pd

# Load the CSV files
df_old = pd.read_csv("old_data.csv")
df_new = pd.read_csv("new_data.csv")

print("Old Data:\n", df_old.head())
print("\nNew Data:\n", df_new.head())

We can already check if there are some differences between the rows and columnts for the files.

You can find the data for both CSV files below:

new_data.csv

name,age,city
Alice,25,New York
Bobby,30,Los Angeles
Charlie,35,Chicago
David,37,San Francisco
Ema,28,Houston

and

old_data.csv

name,age,city
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
David,40,San Francisco
Emma,28,Houston

Step 2: Find Differences Between the Files

To detect changes, we can use merge with an outer join and keep only the differences:

df_new.compare(df_old, keep_shape=False, keep_equal=True)

or keep the original data shape by:

df_new.compare(df_old, keep_shape=False, keep_equal=True)

result:

	name		age
	self	other	self	other
1	Bobby	Bob	30	30
3	David	David	37	40
4	Ema	Emma	28	28

From the result table we can see 3 differencies:

Bobby <-> Bob
37 <-> 40
Ema <-> Emma

If you like to find out how to highlight the changes you can check this article: How to Compare Two Pandas DataFrames and Get Differences

Step 3: Check number of differences

To check what is the number of the differences you can use:

((df_old.fillna(1) == df_new.fillna(1)).melt()['value'] == 0).sum()

which give us 3.

Example 2: Compare Specific Columns for Changes

If both files have the same structure but contain modified values, we can compare specific columns.

Step 1: Load the Excel Files

df_old = pd.read_excel("old_data.xlsx")
df_new = pd.read_excel("new_data.xlsx")

Note: At this step you can compare individual sheets. For example you can download the sheet from Google Drive by:

Open the file from Google Sheets
File
Download
Comma Separated Values (.csv)

Step 2: Identify Modified Rows

We use df.compare() to highlight differences in matching rows.

# Compare only specific columns
differences = df_old.compare(df_new)

differences

output:

	name		age
	self	other	self	other
1	Bobby	Bob	30	30
3	David	David	37	40
4	Ema	Emma	28	28

Conclusion

With Pandas, comparing two CSV or Excel files is simple and effective. We explored:

Detecting modified values using compare()
Extracting only differences
Counting the differences

These methods help ensure data accuracy when working with different versions of datasets.

> Basic concepts

> Installations

> Series

> DataFrame

> Create

> Data Types

> Exercise

> Cheat Sheet

> Basic concepts

> Row

> Column

> Index

> MultiIndex

> Exercise

> Basic concepts

> read_csv()

> read_excel()

> Kaggle

> Exercise

> read_xml()

> read_json()

> to_csv()

> to_dict()

> to_json()

> Basic concepts

> groupby()

> Reshape

> melt()

> Exercise

> Pivot

> merge()

> Filter

> Basic concepts

> replace()

> split()

> Regex

> Search

> Exercise

> Find

> Basic concepts

> apply()

> aggfunc

> Convert

> count()

> Other

> Exercise

> map()

> Basic concepts

> Data Validation

> Data Cleaning

> Duplicate

> Time Series

> Pandas Error

> Get

> Basic concepts

> Styling

> Table

> Display

> DataIsBeautiful

> Beginners

> Data Science Projects

> Newsletter

Why Compare Files with Pandas

Example 1: Compare Two CSV Files and Find Differences

Step 1: Load the CSV Files into Pandas

Step 2: Find Differences Between the Files

Step 3: Check number of differences

Example 2: Compare Specific Columns for Changes

Step 1: Load the Excel Files

Step 2: Identify Modified Rows

Conclusion