Replace text is one of the most popular operation in Pandas DataFrames and columns. In this post we will see how to replace text in a Pandas.

The short answer of this questions is:

(1) Replace character in Pandas column

df['Depth'].str.replace('.',',')

(2) Replace text in the whole Pandas DataFrame

df.replace('\.',',', regex=True)

We will see several practical examples on how to replace text in Pandas columns and DataFrames.

Suppose we have DataFrame like:

Date Time Latitude Longitude Depth Magnitude Type
12/27/2016 23:20:56 45.7192 26.5230 97.0 MWW
12/28/2016 08:18:01 38.3754 -118.8977 10.8 ML
12/28/2016 08:22:12 38.3917 -118.8941 12.3 ML
12/28/2016 09:13:47 38.3777 -118.8957 8.8 ML
12/28/2016 12:38:51 36.9179 140.4262 10.0 MWW

Replace single character in Pandas Column with .str.replace

First let's start with the most simple example - replacing a single character in a single column. We are going to use the string method - replace:

df['Depth'].str.replace('.',',')

A warning message might be shown - for this one you can check the section below:

FutureWarning: The default value of regex will change from True to False in a future version.

If you are using regex than you can specify it by:

df['Depth'].str.replace('.',',', regex= True)

result:

23405       97
23406     10,8
23407     12,3

Replace regex pattern in Pandas Column

Let say that you would like to use a regex in order to replace specific text patterns in Pandas.

For example let's change the date format of you Pandas DataFrame from:

  • mm/dd/yyyy
    to
  • yyyy-mm-dd

This can be done by using regex flag and using regex groups like:

df['Date'].str.replace(r'(\d{2})/(\d{2})/(\d{4})', r"\3-\2-\1", regex=True)

result:

23405    2016-27-12
23406    2016-28-12
23407    2016-28-12

How does the regex replace in Pandas work? . So this one is translated as (\d{2}):

  • 1st capturing group (\d{2})
    • \d - matches a digit (equivalent to [0-9])
    • {2} matches the previous token exactly 2 times

Then in the replacement part we are replaced by the group numbers - 1st group is \1.

So you can try a simple exercise - to change the format to: dd mm 'yy . The answer is below:

df['Date'].str.replace(r'(\d{2})/(\d{2})/\d{2}(\d{2})', r"\2 \1 '\3", regex=True)

result:

23405    27 12 '16
23406    28 12 '16
23407    28 12 '16

Replace text in whole DataFrame

If you like to replace values in all columns in your Pandas DataFrame then you can use syntax like:

df.replace('\.',',', regex=True)

If you don't specify the columns then the replace operation will be done over all columns and rows.

Replace text with conditions in Pandas with lambda and .apply/.applymap

.applymap is another option to replace text and string in Pandas. This one is useful if you have additional conditions - sometimes the regex might be too complex or to not work.

In this case the replacement can be done by lambda and .apply for columns:

df['Date'].apply(lambda x: x.replace('/', '-'))

result:

Date Time Latitude Longitude Depth Magnitude Type
12/27/2016 23:20:56 45.7192 26.523 97 MWW
12/28/2016 08:18:01 38.3754 -118.8977 10.8 ML

and .applymap for the whole DataFrame:

df.applymap(lambda x: x.replace('/', '-'))

result:

Date Time Latitude Longitude Depth Magnitude Type
12/27/2016 23:20:56 45,7192 26,523 97 MWW
12/28/2016 08:18:01 38,3754 -118,8977 10,8 ML

Note that this solution might be slower than the others.

Conditional replace in Pandas

If you like to apply condition to your replacement in Pandas you can use syntax like:

df['Date'].apply(lambda x: x.replace('/', '-') if '20' in x else x)

result:

23405    12-27-2016
23406    12-28-2016
23407    12-28-2016

Warning: FutureWarning: The default value of regex will change from True to False in a future version.

If you get an warning like:

FutureWarning: The default value of regex will change from True to False in a future version.

It means that you will need to explicitly set the regex parameter for replace method:

df.replace('\.',',', regex=True)

The official documentation of method replace is:

`regex` -** bool, default True**
Determines if the passed-in pattern is a regular expression:
    * If True, assumes the passed-in pattern is a regular expression.
    * If False, treats the pattern as a literal string

Cannot be set to False if pat is a compiled regex or repl is a callable.

So the difference is how the passed pattern will be parsed - as a regex expression or not.

So in a practical example:

df.replace('.',',')

this will not change the values of the DataFrame:

45.7192

While the regex expression - '\.' and regex=True:

df.replace('\.',',', regex=True)

will change them:

45,7192

Resources