Read Excel XLS with Python Pandas
In this post you can learn how to read Excel files (ext xls, xlsx etc) with Python and Pandas. We will import one or several sheets from an Excel file to a Pandas DataFrame.
The list of the supported file extensions:
xls
xlsx
xlsm
xlsb
odf
ods
odt
Note for ods
, ods
and odt
please check: Read Excel(OpenDocument ODS) with Python Pandas
Step 1: Install Pandas and odfpy
Python offers many different modules for reading and manipulating Excel files. In this guide we are going to use pandas
and odfpy
:
pip install pandas
pip install odfpy
Step 2: Read the one sheet of Excel(XLS) file
Pandas offers a powerful method for reading any type of Excel files read_excel()
. It's pretty easy to be used and requires only the file path:
import pandas as pd
pd.read_excel('animals.xls')
It will read and return all non empty cells from the Excel file:
Rank | Animal | Maximum speed | Class | Notes | |
---|---|---|---|---|---|
0 | 1 | Peregrine falcon | 389 km/h (242 mph)108 m/s (354 ft/s)[2][6] | Flight-diving | The peregrine falcon is the fastest aerial ani... |
1 | 2 | Golden eagle | 240–320 km/h (150–200 mph)67–89 m/s (220–293 f... | Flight-diving | Assuming the maximum size at 1.02 m, its relat... |
2 | 3 | White-throated needletail swift | 169 km/h (105 mph)[8][9][10] | Flight | NaN |
3 | 4 | Eurasian hobby | 160 km/h (100 mph)[11] | Flight | Can sometimes outfly the swift |
4 | 5 | Mexican free-tailed bat | 160 km/h (100 mph)[12] | Flight | It has been claimed to have the fastest horizo... |
5 | 6 | Frigatebird | 153 km/h (95 mph) | Flight | The frigatebird's high speed is helped by its ... |
6 | 7 | Rock dove (pigeon) | 148.9 km/h (92.5 mph)[13] | Flight | Pigeons have been clocked flying 92.5 mph (148... |
7 | 8 | Spur-winged goose | 142 km/h (88 mph)[14] | Flight | NaN |
8 | 9 | Gyrfalcon | 128 km/h (80 mph)[citation needed] | Flight | NaN |
Step 3: Read the second sheet of Excel file by name
If you like to read data from a specific sheet - for example Sheet 2
then you can specify the name as a parameter - sheet_name
:
pd.read_excel('animals.xlsx', sheet_name="Sheet2")
Which will result in:
Blackbuck | Unnamed: 1 | |
---|---|---|
0 | NaN | NaN |
1 | Male blackbuck | Male blackbuck |
2 | NaN | NaN |
3 | Female with young at the National Zoological Park Delhi | Female with young at the National Zoological P... |
4 | Conservation status | Conservation status |
5 | Least Concern (IUCN 3.1)[1] | Least Concern (IUCN 3.1)[1] |
6 | Scientific classification | Scientific classification |
Step 4: Python read excel file - specify columns and rows
If you like to read a range of data and not the whole sheet - read_excel
offers several very useful parameters.
Python read excel file select rows
Next code example will show you how to read 3 rows skipping the first two rows. In this way Pandas will read only some rows from the whole sheet:
pd.read_excel('animals.xlsx', skiprows=2, nrows=3)
which will result in:
2 | Golden eagle | 240–320 km/h (150–200 mph)67–89 m/s (220–293 f... | Flight-diving | Assuming the maximum size at 1.02 m, its relat... | |
---|---|---|---|---|---|
0 | 3 | White-throated needletail swift | 169 km/h (105 mph)[8][9][10] | Flight | NaN |
1 | 4 | Eurasian hobby | 160 km/h (100 mph)[11] | Flight | Can sometimes outfly the swift |
2 | 5 | Mexican free-tailed bat | 160 km/h (100 mph)[12] | Flight | It has been claimed to have the fastest horizo... |
Python read excel file select columns
If you like to** work with few columns** and not the whole sheet - then parameter use_cols
can be used as shown:
pd.read_excel('animals.xlsx', usecols='C:D')
Python read excel file specify columns and rows
Finally if you like to select a range from specific columns and rows than you can use:
Which will result into:
240–320 km/h (150–200 mph)67–89 m/s (220–293 f... | Flight-diving | |
---|---|---|
0 | 169 km/h (105 mph)[8][9][10] | Flight |
1 | 160 km/h (100 mph)[11] | Flight |
2 | 160 km/h (100 mph)[12] | Flight |
Step 5. Read multiple sheets from Excel file
What if you like to read with Pandas multiple sheets from Excel. It's possible with pd.read_excel
by providing a list of all sheets to be read as follows:
pd.read_excel('animals.xlsx', sheet_name=["Sheet1", "Sheet2"])
Note that a dictionary of
- keys - sheet names
- values - resulted DataFrames
will be returned.
In order to access data you can access it by a sheet name as:
pd.read_excel('animals.xlsx', sheet_name=["Sheet1", "Sheet2"]).get('Sheet1')
which will return the data for Sheet1
as a DataFrame.
Read All Sheets
For loading all sheets from Excel file use sheet_name=None
:
pd.read_excel('animals.xlsx', sheet_name=None)
Step 6. Pandas read excel data with conversion, NA values and parsing
Finally let's check what we can do if we need to convert data, drop or fill missing values, parse dates and numbers.
Pandas offers several parameters for this purpose:
- converters - dict of functions for converting values in certain columns
- keep_default_na - whether or not to include the default NaN values
- parse_dates
- ate_parser - converting a sequence of string columns to an array of datetime instances.
- thousands
- convert_float
You can check the Notebook in the resources for more examples of the above.