How to Drop Column in Pandas
In this quick tutorial, we will see how to drop single or multiple columns by name or index in Pandas.
We'll first look at the drop() method and then at a few alternatives:
- drop a single column with drop()
- use alternatives like del and df.pop
- drop columns with NaN values
- finally, drop multiple columns
Setup
In this post, we'll use the following DataFrame, which consists of several rows and columns:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/softhints/Pandas-Tutorials/master/data/csv/extremes.csv')
The DataFrame looks like:
Continent | Highest point | Elevation high | Lowest point | Elevation low |
---|---|---|---|---|
Asia | Mount Everest | 8848 | Dead Sea | −427 |
South America | Aconcagua | 6960 | Laguna del Carbón | −105 |
North America | Denali | 6198 | Death Valley | −86 |
Africa | Mount Kilimanjaro | 5895 | Lake Assal | −155 |
Europe | Mount Elbrus | 5642 | Caspian Sea | −28 |
Step 1: Drop column by name in Pandas
Let's start by using the DataFrame method drop() to remove a single column.
To drop the column named 'Lowest point' we can use the following syntax:
df = df.drop('Lowest point', axis=1)
or the equivalent:
df = df.drop(columns='Lowest point')
By default, drop() returns a new DataFrame and leaves the original unchanged. To keep the result, either reassign it to df as above or modify df in place with the inplace parameter:
df.drop('Lowest point', axis=1, inplace=True)
Note that drop() works on both axes: `axis=0` (the default) means rows and `axis=1` means columns.
After the operation the DataFrame will look like:
Continent | Highest point | Elevation high | Elevation low |
---|---|---|---|
Asia | Mount Everest | 8848 | −427 |
South America | Aconcagua | 6960 | −105 |
North America | Denali | 6198 | −86 |
Africa | Mount Kilimanjaro | 5895 | −155 |
Europe | Mount Elbrus | 5642 | −28 |
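The difference between the returned copy and the in-place variant is easy to miss, so here is a minimal sketch of both. It assumes a small stand-in DataFrame typed in from the table above (three rows, three columns) instead of the CSV download, so it runs offline; the variable names are only illustrative.

```python
import pandas as pd

# Stand-in for the tutorial's DataFrame, typed in from the table above
# (an assumption, so the sketch runs without downloading the CSV).
df = pd.DataFrame({
    "Continent": ["Asia", "South America", "North America"],
    "Highest point": ["Mount Everest", "Aconcagua", "Denali"],
    "Lowest point": ["Dead Sea", "Laguna del Carbón", "Death Valley"],
})

# drop() returns a new DataFrame; df itself still has all three columns.
trimmed = df.drop(columns="Lowest point")
cols_before = list(df.columns)

# With inplace=True, df is modified and the call returns None.
returned = df.drop("Lowest point", axis=1, inplace=True)
cols_after = list(df.columns)

print(cols_before)
print(cols_after)
```

Because the in-place call returns None, writing df = df.drop(..., inplace=True) would replace the DataFrame with None, so it is safer to pick one style and stick to it.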
Step 2: Drop column by index in Pandas
To drop a column by index we will combine:
- df.columns
- drop()
This step builds on the previous one: we first look up the column's name by its position and then drop it by name. To get the name of the first column we use:
df.columns[0]
The result is:
Continent
So to drop the column at index 0 we can use the following syntax:
df.drop(df.columns[0], axis=1)
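As a quick check of the lookup-then-drop idea, the sketch below resolves a position to a label and drops it. The two-row DataFrame is an assumed stand-in built from rows of the table in the Setup section, and the variable names are illustrative.

```python
import pandas as pd

# Assumed stand-in for the tutorial's DataFrame (two rows from the Setup table).
df = pd.DataFrame({
    "Continent": ["Asia", "Europe"],
    "Highest point": ["Mount Everest", "Mount Elbrus"],
    "Elevation high": [8848, 5642],
})

# Position 0 resolves to the label 'Continent'.
first_label = df.columns[0]
without_first = df.drop(first_label, axis=1)

# Negative positions work too: -1 is the last column.
without_last = df.drop(df.columns[-1], axis=1)

print(list(without_first.columns))
print(list(without_last.columns))
```

Since the drop itself still happens by label, a DataFrame with duplicated column names would lose every column that shares the looked-up name, not just the one at that position.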
Step 3: Drop multiple columns by name in Pandas
Next, let's see how to drop multiple columns in Pandas, for example "Elevation high" and "Elevation low".
Again we are going to use the drop() method, this time passing a list of columns:
df.drop(["Elevation high", "Elevation low"], axis=1)
The result is:
Continent | Highest point |
---|---|
Asia | Mount Everest |
South America | Aconcagua |
North America | Denali |
Africa | Mount Kilimanjaro |
Europe | Mount Elbrus |
This is possible because the labels parameter accepts either a single label or a list-like of labels.
Instead of passing `labels, axis=1` you can use the `columns` parameter:
df.drop(columns=["Highest point"])
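To make the equivalence of the two spellings concrete, here is a small sketch that drops the same list both ways and compares the results. It also shows the errors="ignore" option of drop(), which the text above does not cover; the DataFrame is an assumed two-row stand-in, and "No such column" is a made-up label used only to trigger the missing-label case.

```python
import pandas as pd

# Assumed stand-in for the tutorial's DataFrame (two rows from the Setup table).
df = pd.DataFrame({
    "Continent": ["Asia", "Africa"],
    "Highest point": ["Mount Everest", "Mount Kilimanjaro"],
    "Elevation high": [8848, 5895],
    "Elevation low": [-427, -155],
})

to_drop = ["Elevation high", "Elevation low"]

# Both calls remove the same columns and give equal DataFrames.
by_axis = df.drop(to_drop, axis=1)
by_columns = df.drop(columns=to_drop)
same = by_axis.equals(by_columns)

# A missing label raises KeyError by default; errors="ignore" skips it.
# "No such column" is a hypothetical name that does not exist in df.
lenient = df.drop(columns=["Elevation low", "No such column"], errors="ignore")

print(same)
print(list(lenient.columns))
```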
Step 4: Drop multiple columns by index
To drop multiple columns by index we can use syntax like:
cols = [0, 2]
df.drop(df.columns[cols], axis=1, inplace=True)
This will drop the first and the third column from the DataFrame.
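One thing worth seeing in action is that positions are resolved against the current columns, so they shift after an in-place drop. The following is a minimal sketch on an assumed two-row stand-in DataFrame; the variable names are illustrative.

```python
import pandas as pd

# Assumed stand-in for the tutorial's DataFrame (two rows from the Setup table).
df = pd.DataFrame({
    "Continent": ["Asia", "Europe"],
    "Highest point": ["Mount Everest", "Mount Elbrus"],
    "Elevation high": [8848, 5642],
    "Lowest point": ["Dead Sea", "Caspian Sea"],
})

cols = [0, 2]

# Indexing df.columns with a list of positions gives the matching labels.
labels = list(df.columns[cols])

df.drop(df.columns[cols], axis=1, inplace=True)

# After the in-place drop, position 0 points at a different column.
new_first = df.columns[0]

print(labels)
print(list(df.columns))
```

Running the same drop a second time would therefore remove different columns (or fail if a position no longer exists), which is one reason to prefer names when they are known.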
Step 5: Drop column with NaN in Pandas
To drop a column or columns which contain NaN values we can use the dropna() method:
df.dropna(axis=1, how='all')
The parameter how='all' drops only the columns in which every value is NaN. To drop columns that contain at least one NaN value, use how='any', which is the default.
Note that dropna() doesn't change the DataFrame in place. We need to use the parameter inplace=True to do so.
To drop columns with NaN values with the dropna() method we need the following parameters:
- axis=1 - for columns
- how='any' - if any NA values are present, drop that row or column
- how='all' - if all values are NA, drop that row or column
- subset - labels along the other axis to consider, e.g. if you are dropping rows these would be a list of columns to include
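The tutorial's DataFrame has no missing values in the rows shown, so the sketch below uses a toy DataFrame with NaN in known places to compare the parameters listed above. The column names and values are hypothetical and chosen only to make each case visible.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values, not the tutorial's CSV) with NaN in known places.
df = pd.DataFrame({
    "full": [1.0, 2.0, 3.0],
    "partial": [1.0, np.nan, 3.0],
    "empty": [np.nan, np.nan, np.nan],
})

# how='all' removes only the column where every value is NaN.
only_all_nan_removed = df.dropna(axis=1, how="all")

# how='any' removes every column that has at least one NaN.
any_nan_removed = df.dropna(axis=1, how="any")

# With axis=1, subset lists row labels: only rows 0 and 2 are checked,
# so 'partial' survives (its NaN is in row 1) and 'empty' is dropped.
checked_rows_0_and_2 = df.dropna(axis=1, subset=[0, 2])

print(list(only_all_nan_removed.columns))
print(list(any_nan_removed.columns))
print(list(checked_rows_0_and_2.columns))
```

None of the three calls changes df itself, since inplace=True is not passed.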
Step 6: Drop column with del and df.pop
An alternative way to remove a column from a DataFrame is the Python keyword del:
del df["Lowest point"]
Note that this is going to delete the column in place.
One more way to achieve the same result is the df.pop method:
df.pop('Highest point')
This method returns the column as a Series:
0 Mount Everest
1 Aconcagua
2 Denali
3 Mount Kilimanjaro
4 Mount Elbrus
5 Vinson Massif
6 Puncak Jaya
Name: Highest point, dtype: object
At the same time, it removes the column from the DataFrame.
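Both alternatives can be tried side by side with a short sketch. It assumes a two-row stand-in DataFrame built from the table in the Setup section; the variable name highest is illustrative.

```python
import pandas as pd

# Assumed stand-in for the tutorial's DataFrame (two rows from the Setup table).
df = pd.DataFrame({
    "Continent": ["Asia", "Europe"],
    "Highest point": ["Mount Everest", "Mount Elbrus"],
    "Lowest point": ["Dead Sea", "Caspian Sea"],
})

# del removes the column in place and returns nothing.
del df["Lowest point"]

# pop removes the column in place and hands it back as a Series.
highest = df.pop("Highest point")

print(list(df.columns))
print(highest.tolist())
```

pop is handy when the removed column is still needed, for example as a target variable, while del is the shorter choice when it is not.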
Conclusion & Resources
In this article, we looked at different ways to drop columns in Pandas.
We saw how to drop single or multiple columns, how to drop columns by index or name, and how to drop columns with NaN values.
We also covered del and df.pop as alternative ways of dropping columns.
The code for the examples is available over on GitHub in a Notebook.