In this short guide, we'll see how to remove consecutive, leading and trailing whitespaces in Pandas.

We will cover how to strip all spaces in:

  • entire DataFrame
  • multiple columns
  • columns names
  • read_csv and whitespaces

Below we can find the steps to follow for cleaning whitespaces in Pandas.

Setup

In the post, we'll use the following DataFrame, which consists of several rows and columns:

import pandas as pd
data = {'title': ["  The   jungle book", "Beautiful mind ", "  Rose  "],
        'number': [
            " 123 456 789",
            " 234 567 890 ",
            "  1   2    3 "
        ]}
df = pd.DataFrame.from_dict(data)

DataFrame looks like:

title number
0 The jungle book 123 456 789
1 Beautiful mind 234 567 890
2 The Rose 1 2 3

Sometimes white spaces are not visible in the DataFrame because of the way browsers work. Consecutive spaces are not shown by most browsers.

The real printed data will look like:

                 title         number
0    The   jungle book    123 456 789
1       Beautiful mind    234 567 890
2      The      Rose      1   2    3

Step 1: Strip leading and trailing whitespaces in Pandas

To strip whitespaces from a single column in Pandas We will use method - str.strip().

This method is applied on a single column a follows:

df['title'].str.strip()

After the operation the leading and trailing whitespace are trimmed:

0    The   jungle book
1        Beautiful mind
2        The      Rose
Name: title, dtype: object
Note:

strip() only works on leading and trailing white space.

Step 2: Replace consecutive spaces in Pandas

To remove consecutive spaces from a column in Pandas we will use method replace with regex=True:

df['title'].replace(r'\s+',' ', regex=True)

The result is:

0     The jungle book
1      Beautiful mind
2           The Rose
Name: title, dtype: object

We can notice that leading and trailing whitespace are still present. To remove them we need to chain both methods:

df['title'].str.strip().replace(r'\s+',' ', regex=True)

The final result is:

0    The jungle book
1      Beautiful mind
2           The Rose
Name: title, dtype: object

Consecutive, leading and trailing whitespace are removed from the column.

To remove whitespaces from the column names we can use the following syntax:

df.columns.str.strip()

Step 3: Remove whitespaces from multiple columns

To remove leading and trailing whitespace from multiple columns in Pandas we can use the following code:

for column in ['title', 'number']:
    df[column] = df[column].apply(lambda x:x.strip())

or to remove also the consecutive spaces we can use:

for column in ['title', 'number']:
    df[column] = df[column].apply(lambda x:x.strip().replace(r'\s+',' ') )

Step 4: Remove whitespaces from entire DataFrame

To remove whitespaces from the entire DataFrame in Pandas we will loop over all columns in the DataFrame by:

for col in df.select_dtypes('object'):
    df[col] = df[col].str.strip()

There are several ways to handle the possible errors. One way is to loop over the string columns - dtype 'object' or by catching the errors in try except block:

for col in df.columns:
    try:
        df[col] = df[col].str.strip()
    except AttributeError:
        pass

Depending on the need we can remove consecutive, leading or trailing whitespaces.

Step 5: Skip whitespace with read_csv()

Finally we can remove the whitespaces read by method read_csv(). This can be done by parameters:

  • delim_whitespace - Specifies whether or not whitespace (e.g. ' ' or ' ') will be used as the sep. Equivalent to setting sep='\s+'.
  • sep - regex can be used like - ',\s+
  • skipinitialspace - Skip spaces after delimiter.
df = pd.read_csv(file, sep=',\s+', skipinitialspace=True)

This will help us skipping the spaces after delimiters. Combination of the above parameters might be needed in order to get optimal results. More can be found on the in the read_csv() documentation.

This will help removing the white space from the column names.

Conclusion

To summarize, in this article, we've seen examples of text cleaning.

How to remove leading and trailing whitespace. How to use regex to remove consecutive spaces from data in one or multiple columns.

And finally, we've seen how to manage errors and clean the whole DataFrame.