Pandas Most Typical Errors and Solutions for Beginners
In this post I'll list the most common errors in Pandas and Python, together with their solutions.
The list will grow with time and will be updated frequently.
DateTime
Invalid comparison between dtypes / subtraction must have the same timezones
TypeError: Timestamp subtraction must have the same timezones or no timezones
datetimearray subtraction must have the same timezones or no timezones
TypeError: Invalid comparison between dtype=datetime64[ns] and DatetimeArray
TypeError: Invalid comparison between dtype=datetime64[ns] and Date
A quick solution is to remove the timezone information:
df['time_tz'].dt.tz_localize(None)
Example and more details: How to Remove Timezone from a DateTime Column in Pandas
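As a minimal sketch of the fix above, the snippet below builds a small DataFrame with one timezone-aware and one naive column (both the data and the column name time are made up for illustration), shows that mixing them fails, and then removes the timezone before subtracting.

```python
import pandas as pd

# Hypothetical data: one tz-aware column and one tz-naive column.
df = pd.DataFrame({
    "time_tz": pd.to_datetime(["2022-01-01 10:00", "2022-01-02 12:00"]).tz_localize("UTC"),
    "time": pd.to_datetime(["2022-01-01 09:00", "2022-01-02 09:00"]),
})

# Mixing tz-aware and tz-naive values raises a TypeError.
try:
    df["time_tz"] - df["time"]
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Drop the timezone information; afterwards the subtraction works.
df["time_naive"] = df["time_tz"].dt.tz_localize(None)
delta = df["time_naive"] - df["time"]
print(delta)
```

Note that tz_localize(None) returns a new Series, so the result has to be assigned back to a column.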
TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported
TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting `n`, use `n * obj.freq`
A quick solution is to use n * obj.freq or, for a fixed number of days, a pd.Timedelta:
pd.to_datetime("today") - pd.Timedelta(10, unit='D')
Example and more details: How to Get Today's Date in Pandas
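A short sketch of the difference, using a fixed date instead of "today" so that the result is reproducible (the date itself is an arbitrary choice):

```python
import pandas as pd

ts = pd.Timestamp("2022-07-01")

# Subtracting a bare integer from a Timestamp raises a TypeError.
try:
    ts - 10
    int_ok = True
except TypeError:
    int_ok = False

# Say explicitly what the number means: 10 days.
ten_days_ago = ts - pd.Timedelta(10, unit="D")
print(ten_days_ago)
```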
'index' object has no attribute 'tz_localize'
'index' object has no attribute 'tz_localize'
attributeerror: 'index' object has no attribute 'tz_localize'
A quick solution is to check that the index is a DatetimeIndex, or to convert a column to one before using it as the index:
df.set_index(pd.DatetimeIndex(df['date']), drop=False, inplace=True)
Example and more details: How to Remove Timezone from a DateTime Column in Pandas
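The sketch below, with made-up data, shows why the error appears: a plain Index built from strings has no tz_localize method, while a DatetimeIndex does.

```python
import pandas as pd

# Hypothetical data: the dates are stored as strings.
df = pd.DataFrame({
    "date": ["2022-01-01 10:00", "2022-01-02 12:00"],
    "value": [1, 2],
})

# Using the string column as the index gives a plain Index without tz_localize.
plain = df.set_index("date")
has_method_before = hasattr(plain.index, "tz_localize")

# Convert the column to a DatetimeIndex first; then tz_localize is available.
df = df.set_index(pd.DatetimeIndex(df["date"]), drop=False)
localized = df.index.tz_localize("UTC")
print(type(df.index).__name__, localized.tz)
```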
OutOfBoundsDatetime
OutOfBoundsDatetime: Out of bounds nanosecond timestamp
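The short answer for this error is:
pd.to_datetime(df['date'], errors='coerce')  # values that cannot be represented become NaT
pd.to_datetime(df['date'], errors='ignore')  # returns the input unchanged; deprecated since pandas 2.2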
Example and more details: OutOfBoundsDatetime: Out of bounds nanosecond timestamp - Pandas and pd.to_datetime
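A minimal sketch with made-up dates, assuming a pandas version that stores timestamps with nanosecond resolution (where the largest representable date is in the year 2262). Whether the second value ends up as NaT depends on the pandas version, so no output is shown here.

```python
import pandas as pd

# Hypothetical input: the second date lies beyond pd.Timestamp.max.
dates = pd.Series(["2022-01-01", "3000-01-01"])

# Upper bound of a nanosecond-resolution Timestamp.
upper_bound = pd.Timestamp.max

# errors="coerce" does not raise: values that cannot be represented become NaT.
parsed = pd.to_datetime(dates, errors="coerce")
print(upper_bound)
print(parsed)
```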
Wrong dates - ParserError: Unknown string format
ParserError: Unknown string format: 1975-02-23T02:58:41.000Z 1975-02-23T02:58:41.000Z
df['date'] = pd.to_datetime(df['date_str'], format='%d/%m/%Y', errors='coerce')
Example and more details:
- How to Fix Pandas to_datetime: Wrong Date and Errors
- Combine Multiple columns into a single one in Pandas
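As an illustration of the line above, the sketch below parses a made-up date_str column with an explicit day/month/year format. With errors='coerce', rows that do not match the format become NaT instead of raising, so they can be inspected afterwards.

```python
import pandas as pd

# Hypothetical data: two valid dates in day/month/year order and one bad value.
df = pd.DataFrame({"date_str": ["23/02/1975", "31/12/2021", "not a date"]})

# Parse with an explicit format; unparseable rows become NaT.
df["date"] = pd.to_datetime(df["date_str"], format="%d/%m/%Y", errors="coerce")

# Collect the rows that failed to parse.
bad_rows = df[df["date"].isna()]
print(bad_rows)
```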
Wrong dates - ValueError: time data does not match format '%Y%m%d HH:MM:SS' (match)
ValueError: time data '28-01-2022 5:25:00 PM' does not match format '%Y%m%d HH:MM:SS' (match)
pd.to_datetime('20220701', format='%Y%m%d', errors='coerce')
Example and more details:
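For the value from the error message above, a format string that matches it could look like the sketch below. The directives are standard strptime codes; which format the original data really needs is an assumption based on that single sample value.

```python
import pandas as pd

raw = "28-01-2022 5:25:00 PM"

# 'HH:MM:SS' is not a valid set of directives, so this format does not match.
try:
    pd.to_datetime(raw, format="%Y%m%d HH:MM:SS")
    wrong_format_ok = True
except ValueError:
    wrong_format_ok = False

# day-month-year, 12-hour clock (%I) with an AM/PM marker (%p).
parsed = pd.to_datetime(raw, format="%d-%m-%Y %I:%M:%S %p")
print(parsed)
```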
read_csv
UnicodeDecodeError - 'utf-8' codec can't decode byte 0x97 in position 6785: invalid start byte
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 6785: invalid start byte
The short answer for this error is:
df = pd.read_csv('../data/csv/file_utf-16.csv', encoding='utf-16')
Example and more details: How to Fix - UnicodeDecodeError: invalid start byte - during read_csv in Pandas
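To reproduce the problem without an existing file, the sketch below writes a small UTF-16 CSV to a temporary directory (file name and contents are made up), reads it with the default UTF-8 decoding, which fails, and then reads it again with the matching encoding.

```python
import os
import tempfile

import pandas as pd

# Write a small CSV file encoded as UTF-16 (hypothetical content).
path = os.path.join(tempfile.mkdtemp(), "file_utf-16.csv")
with open(path, "w", encoding="utf-16") as f:
    f.write("name,city\nZoë,Köln\n")

# The default encoding is UTF-8, which cannot decode the UTF-16 bytes.
try:
    pd.read_csv(path)
    default_ok = True
except UnicodeDecodeError:
    default_ok = False

# Passing the real encoding of the file fixes the error.
df = pd.read_csv(path, encoding="utf-16")
print(df)
```

The encoding has to match the file. If it is unknown, it needs to be detected or looked up first; utf-16 is only the right answer for files that really are UTF-16.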
ParserError: Expected 5 fields in line 5, saw 6
ParserError: Expected 5 fields in line 5, saw 6. Error could possibly be due to quotes being ignored when a multi-char delimiter is used
The short answer for this error is:
# pandas < 1.3 (error_bad_lines was removed in pandas 2.0)
df = pd.read_csv(csv_file, delimiter=';;', engine='python', error_bad_lines=False)
# pandas >= 1.3
df = pd.read_csv(csv_file, delimiter=';;', engine='python', on_bad_lines='skip')
Example and more details: How to Use Multiple Char Separator in read_csv in Pandas
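A small sketch of the second variant, assuming pandas 1.3 or newer and using an in-memory string instead of a file. The data is made up; its third line has one field too many.

```python
import io

import pandas as pd

# Hypothetical CSV with ';;' as separator; the line '4;;5;;6;;7' has 4 fields.
data = "a;;b;;c\n1;;2;;3\n4;;5;;6;;7\n8;;9;;10\n"

# A multi-character separator needs the python engine; bad lines are skipped.
df = pd.read_csv(io.StringIO(data), delimiter=";;", engine="python", on_bad_lines="skip")
print(df)
```

Skipped lines are dropped silently, so it is worth comparing the row count with the source file.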
ParserError: Error tokenizing data. C error: Expected 2 fields in line 4, saw 4
ParserError: Error tokenizing data. C error: Expected 2 fields in line 4, saw 4
The short answer for this error is:
pd.read_csv('test.csv', on_bad_lines='skip')
Example and more details: How to Solve Error Tokenizing Data on read_csv in Pandas
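The same idea with the default C engine, as a sketch on made-up in-memory data (pandas 1.3 or newer is assumed for on_bad_lines). Line 4 of the data has four fields while the header has two.

```python
import io

import pandas as pd

# Hypothetical CSV: line 4 has 4 fields, the header has 2.
data = "id,name\n1,Ann\n2,Bob\n3,Cy,extra,fields\n4,Dee\n"

# Strict parsing raises ParserError on the malformed line.
try:
    pd.read_csv(io.StringIO(data))
    strict_ok = True
except pd.errors.ParserError:
    strict_ok = False

# Skip the malformed line and keep the rest.
df = pd.read_csv(io.StringIO(data), on_bad_lines="skip")
print(df)
```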
to_csv
AttributeError: object has no attribute 'to_csv'
AttributeError: 'numpy.ndarray' object has no attribute 'to_csv'
attributeerror: 'tuple' object has no attribute 'to_csv'
The short answer for this error is:
pd.Series(df['Magnitude Type'].unique()).to_csv('data.csv')
Example and more details: Dump (unique) values to CSV / to_csv in Pandas
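The error appears because unique() returns an array, not a Series or DataFrame. A minimal sketch with made-up values, writing to an in-memory buffer instead of data.csv:

```python
import io

import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"Magnitude Type": ["MW", "ML", "MW", "MB"]})

# unique() returns an array, which has no to_csv method.
unique_values = df["Magnitude Type"].unique()
has_to_csv = hasattr(unique_values, "to_csv")

# Wrap the array in a Series to get to_csv back.
buffer = io.StringIO()
pd.Series(unique_values, name="Magnitude Type").to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```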
Index / MultiIndex
MultiIndex - indexerror: too many levels
ValueError: Cannot remove 1 levels from an index with 1 levels: at least one level must be left.
IndexError: Too many levels: Index has only 1 level, not 4
indexerror: too many levels: index has only 1 level, not 2
The short answer for this error is:
df.index
df.droplevel(level=1)
df.reset_index(level=1)
df.columns.droplevel(level=0)
Example and more details: How to Drop a Level from a MultiIndex in Pandas DataFrame
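A sketch of the calls above on a made-up two-level index. Checking the number of levels first shows which level numbers are valid; dropping the last remaining level is what triggers the ValueError.

```python
import pandas as pd

# Hypothetical two-level index.
index = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 2), ("B", 1)], names=["group", "num"]
)
df = pd.DataFrame({"value": [10, 20, 30]}, index=index)

# Inspect how many levels exist before dropping any.
levels_before = df.index.nlevels

# Drop the second level, or move it into a regular column instead.
dropped = df.droplevel(level=1)
as_column = df.reset_index(level=1)

# Dropping the only remaining level raises ValueError.
try:
    dropped.droplevel(level=0)
    single_level_ok = True
except ValueError:
    single_level_ok = False

print(levels_before, dropped.index.nlevels)
```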
Sort MultiIndex - label must be a tuple with elements corresponding to each level
ValueError: The column label 'Depth' is not unique. For a multi-index, the label must be a tuple with elements corresponding to each level.
The short answer for this error is:
df_multi.columns
df_multi.columns.get_level_values(1)
df_multi.sort_values(by=[('Depth', 'mean')], ascending=False)
Example and more details: How to Sort MultiIndex in Pandas
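A sketch of how such a frame typically arises: aggregating with several functions produces MultiIndex columns. The data and the Type column are made up for illustration.

```python
import pandas as pd

# Hypothetical data, aggregated with two functions per group.
df = pd.DataFrame({
    "Type": ["MW", "MW", "ML", "ML"],
    "Depth": [10.0, 30.0, 5.0, 7.0],
})
df_multi = df.groupby("Type").agg({"Depth": ["mean", "max"]})

# 'Depth' alone matches two columns, so sorting by it raises ValueError.
try:
    df_multi.sort_values(by="Depth")
    plain_label_ok = True
except ValueError:
    plain_label_ok = False

# Pass the full column label as a tuple, one element per level.
sorted_df = df_multi.sort_values(by=[("Depth", "mean")], ascending=False)
print(sorted_df)
```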
Merge
ValueError: Indexes have overlapping values
ValueError: Indexes have overlapping values: Index(['A', 'B', 'C', 'D'], dtype='object')
The short answer for this error is:
pd.concat([df1, df2], axis='columns', verify_integrity=False)
df1.join(df2, lsuffix='_x')
Example and more details: How to Merge Two DataFrames on Index in Pandas
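A minimal sketch with two made-up frames that share the column names A and B. With verify_integrity=True, concat refuses the duplicated labels; the two variants above either allow the duplicates or rename the left-hand columns.

```python
import pandas as pd

# Hypothetical frames with the same column names.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# verify_integrity=True raises ValueError because 'A' and 'B' would repeat.
try:
    pd.concat([df1, df2], axis="columns", verify_integrity=True)
    strict_ok = True
except ValueError:
    strict_ok = False

# Option 1: allow duplicated column labels.
combined = pd.concat([df1, df2], axis="columns", verify_integrity=False)

# Option 2: add a suffix to the overlapping columns of the left frame.
joined = df1.join(df2, lsuffix="_x")
print(joined)
```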
String
TypeError: can only concatenate str (not "float") to str
TypeError: can only concatenate str (not "float") to str
The short answer for this error is:
df['Magnitude'].astype(str)
Example and more details: Combine Multiple columns into a single one in Pandas
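A sketch of the conversion in context, with made-up data. The error itself is shown on two plain Python values, since that is where the message comes from; the Type column and the label column are hypothetical names.

```python
import pandas as pd

# Hypothetical data: one text column and one float column.
df = pd.DataFrame({"Type": ["MW", "ML"], "Magnitude": [6.0, 5.8]})

# Concatenating a str and a float raises TypeError.
try:
    df["Type"].iloc[0] + float(df["Magnitude"].iloc[0])
    concat_ok = True
except TypeError:
    concat_ok = False

# Convert the numeric column to str before combining the columns.
df["label"] = df["Type"] + " " + df["Magnitude"].astype(str)
print(df["label"].tolist())
```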
Column
ValueError: cannot reindex from a duplicate axis
ValueError: cannot reindex from a duplicate axis
The short answer for this error is:
df = df.sort_index(axis=1)
Example and more details: How to Change the Order of Columns in Pandas DataFrame
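One common way to run into this error is reindexing a frame whose column labels are duplicated. The sketch below uses made-up data and adds a step that is not part of the short answer above: it removes the duplicated labels first, which is an assumption about what is wanted. The exact wording of the error message differs between pandas versions.

```python
import pandas as pd

# Hypothetical frame with a duplicated column label 'a'.
df = pd.DataFrame([[1, 2, 3]], columns=["b", "a", "a"])

# Reindexing an axis with duplicated labels raises ValueError.
try:
    df.reindex(columns=["a", "b"])
    reindex_ok = True
except ValueError:
    reindex_ok = False

# Keep only the first occurrence of each label (assumed to be acceptable here).
deduped = df.loc[:, ~df.columns.duplicated()]

# Reordering the columns by sorting their labels.
reordered = deduped.sort_index(axis=1)
print(list(reordered.columns))
```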
DataFrame
ValueError: caption must be either a string or 2-tuple of strings.
ValueError: `caption` must be either a string or 2-tuple of strings.
The short answer for this error is to pass a string or a 2-tuple of strings as the DataFrame caption, not an integer:
df.style.set_caption('DataFrame Name')
Example and more details: How to Set Caption and Customize Font Size and Color in Pandas DataFrame