How to Merge Two DataFrames on Index in Pandas

In this short tutorial, we'll show how to merge two DataFrames on index in Pandas.

Here are two approaches to merge DataFrames on index:

(1) Use method merge with left_index and right_index

pd.merge(df1, df2, left_index=True, right_index=True)

(2) Method concat with axis=1

pd.concat([df1, df2], axis=1)

To start, let's say that we have two DataFrames:

df1

A	B	C	D
A0	B0	C0	D0
A1	B1	C1	D1
A2	B2	C2	D2
A3	B3	C3	D3

df2

A	B	C	D
A4	B4	C4	D4
A5	B5	C5	D5
A6	B6	C6	D6
A7	B7	C7	D7

Option 1: Pandas: merge on index by method `merge`

The first example will show how to use method merge in combination with left_index and right_index.

This will merge on index by inner join - only the rows from the both DataFrames with similar index will be added to the result:

pd.merge(df1, df2, left_index=True, right_index=True)

The result is:

A_x	B_x	C_x	D_x	A_y	B_y	C_y	D_y
A0	B0	C0	D0	A4	B4	C4	D4
A1	B1	C1	D1	A5	B5	C5	D5
A2	B2	C2	D2	A6	B6	C6	D6
A3	B3	C3	D3	A7	B7	C7	D7

merge-two-dataframes-on-index-pandas

As you can see the result DataFrame is a combination of both merged on the index. The column names have suffixes: _x and _y.

In order to change the suffixes use the following syntax:

df_m = pd.merge(df1, df2, left_index=True, right_index=True, suffixes=('_left', '_right'))
df_m

which uses parameter suffixes: 'Suffixes' = ('_x', '_y')

The functions is described as:

Merge DataFrame or named Series objects with a database-style join.

Duplicated index while merging on index

In case of duplicated index while merging on index the rows will be duplicated:

merge-two-dataframes-on-index-pandas-duplicated-indexes

Option 2: Pandas: merge on index by `concat` and axis=1

Alternative solution is to merge both DataFrames by using method concat. For this approach you need to set the parameter axis=1 which is equivalent to axis='columns'.

pd.concat([df1, df2], axis='columns')

The result is similar to the previous one with differences in the column names. Using concat will not change the column names.

A	B	C	D	A	B	C	D
A0	B0	C0	D0	A4	B4	C4	D4
A1	B1	C1	D1	A5	B5	C5	D5
A2	B2	C2	D2	A6	B6	C6	D6
A3	B3	C3	D3	A7	B7	C7	D7

As you can see we have duplicate names for the columns. In order to avoid concat with duplicated columns use verify_integrity:

df_m = pd.concat([df1, df2], axis='columns', verify_integrity=True)
df_m

This will raise error:

ValueError: Indexes have overlapping values: Index(['A', 'B', 'C', 'D'], dtype='object')

If you still prefer to have duplicate columns you can access them in the normal way:

df_m['A']

result:

A	A
A0	A4
A1	A5
A2	A6
A3	A7

Option 3: Pandas: merge on index - `join`

Method join can be used for the merge operation. By default is left join.

In case of duplicated column names will error:

ValueError: columns overlap but no suffix specified: Index(['A', 'B', 'C', 'D'], dtype='object')

In order to avoid errors use the following syntax:

df1.join(df2, lsuffix='_x')

Conclusion: Pandas: merge on index - `merge` vs `concat`

Finally let's compare the both ways:

merge will assign suffixes for both DataFrames / concat will not change the column names
merge has control on the merge operation. By default the operation is inner join - you can change it to left, right, outer etc
merge

So if we add extra row for df2 as:

	A	B	C	D
0	A4	B4	C4	D4
1	A5	B5	C5	D5
2	A6	B6	C6	D6
3	A7	B7	C7	D7
4	A8	B8	C8	D8

Then concat will do full outer join by default:

	A	B	C	D	A	B	C	D
0	A0	B0	C0	D0	A4	B4	C4	D4
1	A1	B1	C1	D1	A5	B5	C5	D5
2	A2	B2	C2	D2	A6	B6	C6	D6
3	A3	B3	C3	D3	A7	B7	C7	D7
4	NaN	NaN	NaN	NaN	A8	B8	C8	D8

While merge will do only inner join and the result will be the same as in Step 1. In order to change merge behaviour you need to change the how parameter:

df_m = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
df_m

So the The best way to merge on an index is to use the method merge.

A_x	B_x	C_x	D_x	A_y	B_y	C_y	D_y
A0	B0	C0	D0	A4	B4	C4	D4
A1	B1	C1	D1	A5	B5	C5	D5
A2	B2	C2	D2	A6	B6	C6	D6
A3	B3	C3	D3	A7	B7	C7	D7

A	B	C	D	A	B	C	D
A0	B0	C0	D0	A4	B4	C4	D4
A1	B1	C1	D1	A5	B5	C5	D5
A2	B2	C2	D2	A6	B6	C6	D6
A3	B3	C3	D3	A7	B7	C7	D7

	A	B	C	D	A	B	C	D
0	A0	B0	C0	D0	A4	B4	C4	D4
1	A1	B1	C1	D1	A5	B5	C5	D5
2	A2	B2	C2	D2	A6	B6	C6	D6
3	A3	B3	C3	D3	A7	B7	C7	D7
4	NaN	NaN	NaN	NaN	A8	B8	C8	D8

A_x	B_x	C_x	D_x	A_y	B_y	C_y	D_y
A0	B0	C0	D0	A4	B4	C4	D4
A1	B1	C1	D1	A5	B5	C5	D5
A2	B2	C2	D2	A6	B6	C6	D6
A3	B3	C3	D3	A7	B7	C7	D7

A	B	C	D	A	B	C	D
A0	B0	C0	D0	A4	B4	C4	D4
A1	B1	C1	D1	A5	B5	C5	D5
A2	B2	C2	D2	A6	B6	C6	D6
A3	B3	C3	D3	A7	B7	C7	D7

	A	B	C	D	A	B	C	D
0	A0	B0	C0	D0	A4	B4	C4	D4
1	A1	B1	C1	D1	A5	B5	C5	D5
2	A2	B2	C2	D2	A6	B6	C6	D6
3	A3	B3	C3	D3	A7	B7	C7	D7
4	NaN	NaN	NaN	NaN	A8	B8	C8	D8

> Basic concepts

> Installations

> Series

> DataFrame

> Create

> Data Types

> Exercise

> Cheat Sheet

> Basic concepts

> Row

> Column

> Index

> MultiIndex

> Exercise

> Basic concepts

> read_csv()

> read_excel()

> Kaggle

> Exercise

> read_xml()

> read_json()

> to_csv()

> to_dict()

> to_json()

> Basic concepts

> groupby()

> Reshape

> melt()

> Exercise

> Pivot

> merge()

> Filter

> Basic concepts

> replace()

> split()

> Regex

> Search

> Exercise

> Find

> Basic concepts

> apply()

> aggfunc

> Convert

> count()

> Other

> Exercise

> map()

> Basic concepts

> Data Validation

> Data Cleaning

> Duplicate

> Time Series

> Pandas Error

> Get

> Basic concepts

> Styling

> Table

> Display

> DataIsBeautiful

> Beginners

> Data Science Projects

> Newsletter

Option 1: Pandas: merge on index by method merge

Duplicated index while merging on index

Option 2: Pandas: merge on index by concat and axis=1

Option 3: Pandas: merge on index - join

Conclusion: Pandas: merge on index - merge vs concat

Resources

Option 1: Pandas: merge on index by method `merge`

Option 2: Pandas: merge on index by `concat` and axis=1

Option 3: Pandas: merge on index - `join`

Conclusion: Pandas: merge on index - `merge` vs `concat`

A_x	B_x	C_x	D_x	A_y	B_y	C_y	D_y
A0	B0	C0	D0	A4	B4	C4	D4
A1	B1	C1	D1	A5	B5	C5	D5
A2	B2	C2	D2	A6	B6	C6	D6
A3	B3	C3	D3	A7	B7	C7	D7

A	B	C	D	A	B	C	D
A0	B0	C0	D0	A4	B4	C4	D4
A1	B1	C1	D1	A5	B5	C5	D5
A2	B2	C2	D2	A6	B6	C6	D6
A3	B3	C3	D3	A7	B7	C7	D7

	A	B	C	D	A	B	C	D
0	A0	B0	C0	D0	A4	B4	C4	D4
1	A1	B1	C1	D1	A5	B5	C5	D5
2	A2	B2	C2	D2	A6	B6	C6	D6
3	A3	B3	C3	D3	A7	B7	C7	D7
4	NaN	NaN	NaN	NaN	A8	B8	C8	D8