In this quick Pandas tutorial, we'll cover how we can read Excel sheet or CSV file with multiple header rowswith Python/Pandas. Reading multi-line headers with Pandas creates a MultiIndex.

Reading multiple headers from a CSV or Excel files can be done by using parameter - header of method read_csv:

import pandas as pd

df = pd.read_csv('../data/csv/multine_header.csv', header=[0,1])

In the rest of the article we will cover different examples and details about using header=[0,1].

Step 1: Sample CSV or Excel sheet

To start lets create a simple CSV file named: multine_header.csv and show how we can read the multi rows header with Pandas read_csv method.

The content of the file is:

Date,Company A,Company A,Company B,Company A
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1

Data from the above file shown in a tabular form is(the same is if we read the CSV without the multi row header):

Date Company A Company A.1 Company B Company B.1
NaN Rank Points Rank Points
2021-09-06 1 7.9 2 6
2021-09-07 1 8.5 2 7
2021-09-08 2 8 1 8.1

Step 2: Read CSV file with multiple headers

To read the above CSV file which has two headers we can use read_csv with a combination of parameter header.

The parameter is described as:

Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to header=None.

So if a CSV file has two rows as a headers we can read them by:

import pandas as pd

df = pd.read_csv('../data/csv/multine_header.csv', header=[0,1])

Result:

Date Company A Company B
Unnamed: 0_level_1 Rank Points Rank Points
2021-09-06 1 7.9 2 6.0
2021-09-07 1 8.5 2 7.0
2021-09-08 2 8.0 1 8.1

Now we can notice that DataFrame has two levels of columns. In next step we can find how to access the data.

Read CSV file with 3 and more headers

To read CSV file with more than two rows as headers we can use:

df = pd.read_csv('../data/csv/multine_header.csv', header=[0,1,2])

Step 3: Access data from multi-line header DataFrame

In order to access columns of the above DataFrame we need to use MultiIndex syntax.

First let's find what are the column names by:

df.columns

The column names are presented as tuple pairs because they are a MultiIndex:

MultiIndex([(     'Date', 'Unnamed: 0_level_1'),
            ('Company A',               'Rank'),
            ('Company A',             'Points'),
            ('Company B',               'Rank'),
            ('Company B',             'Points')],

So in order to access column Company A - Rank we will need to use the following syntax:

df[('Company A', 'Rank')]

result:

0    1
1    1
2    2
Name: (Company A, Rank), dtype: int64

and to read multiple columns from multi-line header we can use:

df[[('Company A', 'Rank'), ('Company B', 'Rank')]]

result:

Company A Company B
Rank Rank
1 2
1 2
2 1

Step 4: Read Excel file with multiple headers

Pandas can read excel sheets with multiple headers the same way as the CSV files.

Below you can find the code for reading multiple headers from excel file:

pd.read_excel('../data/excel/multine_header.xlsx', sheet_name="multine_header", header=[0,1])

Where the file name is: multine_header.xlsx, the sheet name is multine_header.

To learn more about reading Excel files with Python and Pandas please check: Read Excel XLS with Python Pandas

Resources