How to Replace Text in a Pandas DataFrame Or Column
Replace text is one of the most popular operation in Pandas DataFrames and columns. In this post we will see how to replace text in a Pandas.
The short answer of this questions is:
(1) Replace character in Pandas column
df['Depth'].str.replace('.',',')
(2) Replace text in the whole Pandas DataFrame
df.replace('\.',',', regex=True)
We will see several practical examples on how to replace text in Pandas columns and DataFrames.
Suppose we have DataFrame like:
Date | Time | Latitude | Longitude | Depth | Magnitude Type |
---|---|---|---|---|---|
12/27/2016 | 23:20:56 | 45.7192 | 26.5230 | 97.0 | MWW |
12/28/2016 | 08:18:01 | 38.3754 | -118.8977 | 10.8 | ML |
12/28/2016 | 08:22:12 | 38.3917 | -118.8941 | 12.3 | ML |
12/28/2016 | 09:13:47 | 38.3777 | -118.8957 | 8.8 | ML |
12/28/2016 | 12:38:51 | 36.9179 | 140.4262 | 10.0 | MWW |
Replace single character in Pandas Column with .str.replace
First let's start with the most simple example - replacing a single character in a single column. We are going to use the string method - replace
:
df['Depth'].str.replace('.',',')
A warning message might be shown - for this one you can check the section below:
FutureWarning: The default value of regex will change from True to False in a future version.
If you are using regex than you can specify it by:
df['Depth'].str.replace('.',',', regex= True)
result:
23405 97
23406 10,8
23407 12,3
Replace regex pattern in Pandas Column
Let say that you would like to use a regex in order to replace specific text patterns in Pandas.
For example let's change the date format of you Pandas DataFrame from:
- mm/dd/yyyy
to - yyyy-mm-dd
This can be done by using regex flag and using regex groups like:
df['Date'].str.replace(r'(\d{2})/(\d{2})/(\d{4})', r"\3-\2-\1", regex=True)
result:
23405 2016-27-12
23406 2016-28-12
23407 2016-28-12
How does the regex replace in Pandas work? . So this one is translated as (\d{2})
:
- 1st capturing group
(\d{2})
\d
- matches a digit (equivalent to [0-9])- {2} matches the previous token exactly 2 times
Then in the replacement part we are replaced by the group numbers - 1st group is \1
.
So you can try a simple exercise - to change the format to: dd mm 'yy . The answer is below:
df['Date'].str.replace(r'(\d{2})/(\d{2})/\d{2}(\d{2})', r"\2 \1 '\3", regex=True)
result:
23405 27 12 '16
23406 28 12 '16
23407 28 12 '16
Replace text in whole DataFrame
If you like to replace values in all columns in your Pandas DataFrame then you can use syntax like:
df.replace('\.',',', regex=True)
If you don't specify the columns then the replace operation will be done over all columns and rows.
Replace text with conditions in Pandas with lambda and .apply/.applymap
.applymap
is another option to replace text and string in Pandas. This one is useful if you have additional conditions - sometimes the regex might be too complex or to not work.
In this case the replacement can be done by lambda and .apply
for columns:
df['Date'].apply(lambda x: x.replace('/', '-'))
result:
Date | Time | Latitude | Longitude | Depth | Magnitude Type |
---|---|---|---|---|---|
12/27/2016 | 23:20:56 | 45.7192 | 26.523 | 97 | MWW |
12/28/2016 | 08:18:01 | 38.3754 | -118.8977 | 10.8 | ML |
and .applymap
for the whole DataFrame:
df.applymap(lambda x: x.replace('/', '-'))
result:
Date | Time | Latitude | Longitude | Depth | Magnitude Type |
---|---|---|---|---|---|
12/27/2016 | 23:20:56 | 45,7192 | 26,523 | 97 | MWW |
12/28/2016 | 08:18:01 | 38,3754 | -118,8977 | 10,8 | ML |
Note that this solution might be slower than the others.
Conditional replace in Pandas
If you like to apply condition to your replacement in Pandas you can use syntax like:
df['Date'].apply(lambda x: x.replace('/', '-') if '20' in x else x)
result:
23405 12-27-2016
23406 12-28-2016
23407 12-28-2016
Warning: FutureWarning: The default value of regex will change from True to False in a future version.
If you get an warning like:
FutureWarning: The default value of regex will change from True to False in a future version.
It means that you will need to explicitly set the regex
parameter for replace
method:
df.replace('\.',',', regex=True)
The official documentation of method replace
is:
`regex` -** bool, default True**
Determines if the passed-in pattern is a regular expression:
* If True, assumes the passed-in pattern is a regular expression.
* If False, treats the pattern as a literal string
Cannot be set to False if pat is a compiled regex or repl is a callable.
So the difference is how the passed pattern will be parsed - as a regex expression or not.
So in a practical example:
df.replace('.',',')
this will not change the values of the DataFrame:
45.7192
While the regex expression - '\.'
and regex=True
:
df.replace('\.',',', regex=True)
will change them:
45,7192