In this short guide, I'll show you how to solve Pandas or Python errors:
valueerror: could not convert string to float: '< "0.01'"
ValueError: could not convert string to float: '2,000'
ValueError: could not convert string to float: '$100.00'
ValueError: Unable to parse string "$10.00" at position 0
We will see how to solve the errors above and how to identify the problematic rows in Pandas.
Setup
Let's create an example DataFrame in order to reproduce the error:
ValueError: could not convert string to float: '$10.00'
import pandas as pd
df = pd.DataFrame({'day': [1, 2, 3, 4, 5],
'amount': ['$10.00', '20.5', '17.34', '4,2', '111.00']})
DataFrame looks like:
day | amount | |
---|---|---|
0 | 1 | $10.00 |
1 | 2 | 20.5 |
2 | 3 | 17.34 |
3 | 4 | 4,2 |
4 | 5 | 111.00 |
Step 1: ValueError: could not convert string to float
To convert string to float we can use the function: .astype(float)
. If we try to do so for the column - amount:
df['amount'].astype(float)
we will face error:
ValueError: could not convert string to float: '$10.00'
Step 2: ValueError: Unable to parse string "$10.00" at position 0
We will see similar result if we try to convert the column to numerical by method pd.to_numeric(
:
pd.to_numeric(df['amount'])
error is raised:
ValueError: Unable to parse string "$10.00" at position 0
In both cases we can see the problematic value. But we are not sure if more different cases exist.
Step 3: Identify problematic non numeric values
To find which values are causing the problem we will use a simple trick. We will convert all values except the problematic ones by:
pd.to_numeric(df['amount'], errors='coerce')
This give us successfully converted float values:
0 NaN
1 20.50
2 17.34
3 NaN
4 111.00
Name: amount, dtype: float64
The ones in error will be replaced by NaN
. No we can use mask to get only value which cause the error during the conversion to numerical values:
df[pd.to_numeric(df['amount'], errors='coerce').isna()]['amount']
result is:
0 $10.00
3 4,2
Name: amount, dtype: object
Step 4: Solve ValueError: could not convert string to float
To solve the errors:
ValueError: could not convert string to float: '$10.00'
ValueError: Unable to parse string "$10.00" at position 0
We have several options:
- ignore errors from invalid parsing and keep the output as it is:
pd.to_numeric(df['amount'], errors='ignore')
- convert only numeric values and
NaN
for the rest in the column:pd.to_numeric(df['amount'], errors='coerce')
- fix problematic values
In this step we are going to fix the problematic values. This can be done by replacing the non numeric symbols like:
$
,
- wrong decimal symbol etc
So to replace the problematic characters we can use str.replace
:
df['amount'].str.replace('$', '', regex=True)
Replacing multiple characters can be done by chaining multiple functions like.(for multiple replacing values):
df['amount'].str.replace('$', '', regex=True).replace('\,', '.', regex=True)
of by using regex (when all symbols are replaced by a single value):
df['amount'].str.replace('[$|,]', '', regex=True)
Finally we get only numeric values which can be converted to numeric column:
0 10.00
1 20.5
2 17.34
3 42
4 111.00
Name: amount, dtype: object
Step 5: Convert numbers and keep the rest
Finally if we like to convert only valid numbers we can use errors='coerce'
. Then for all missing values we can populate them from the original column or Series:
data = pd.Series(['$10.00', '20.5', '17.34', '4,2', '111.00', np.NaN, ''])
conv_data = pd.to_numeric(data, errors='coerce').fillna(data)
conv_data
the result will be:
0 $10.00
1 20.5
2 17.34
3 4,2
4 111.0
5 NaN
6
dtype: object
While using
pd.to_numeric(['$10.00', '20.5', '17.34', '4,2', '111.00', np.NaN, ''], errors='ignore')
will keep all the values the same if there is invalid parsing:
array(['$10.00', '20.5', '17.34', '4,2', '111.00', nan, ''], dtype=object)
Conclusion
In this post, we saw how to properly convert strings to float columns in Pandas. We covered the most popular errors and how to solve them.
Finally we discussed finding the problematic cases and fixing them.