Most Typical Pandas Errors and Solutions for Beginners

In this post I'll try to list the most common errors in Pandas and Python, together with their solutions.

The list will grow with time and will be updated frequently.

DateTime

Invalid comparison between / subtraction must have the same timezones

  • TypeError: Timestamp subtraction must have the same timezones or no timezones
  • datetimearray subtraction must have the same timezones or no timezones
  • TypeError: Invalid comparison between dtype=datetime64[ns] and DatetimeArray
  • TypeError: Invalid comparison between dtype=datetime64[ns] and Date

A quick solution is to remove the timezone information with:

df['time_tz'].dt.tz_localize(None)

Example and more details: How to Remove Timezone from a DateTime Column in Pandas
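A minimal sketch of the error and the fix, assuming a hypothetical UTC column `time_tz` and a naive `Timestamp`. The new column name `time` is also just for illustration. Newer pandas versions may word the message differently, but it is still a `TypeError`.

```python
import pandas as pd

# Hypothetical data: a timezone-aware column (UTC)
df = pd.DataFrame({'time_tz': pd.to_datetime(['2022-01-01 10:00', '2022-01-02 12:00']).tz_localize('UTC')})
naive = pd.Timestamp('2022-01-01 00:00')  # timezone-naive

# Mixing aware and naive values raises a TypeError
try:
    df['time_tz'] - naive
except TypeError as e:
    print('Error:', e)

# Remove the timezone (keeps the UTC wall-clock time), then subtract
df['time'] = df['time_tz'].dt.tz_localize(None)
elapsed = df['time'] - naive
print(elapsed.tolist())
```

The alternative is to make both sides timezone-aware instead, for example with `naive.tz_localize('UTC')`.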

TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported

  • TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting `n`, use `n * obj.freq`

A quick solution is to add or subtract a pd.Timedelta (or n * obj.freq) instead of a bare integer:

pd.to_datetime("today") - pd.Timedelta(10, unit='D')

Example and more details: How to Get Today's Date in Pandas
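The sketch below reproduces the error with a hypothetical date and shows three ways to express the intended offset. The first is a `Timedelta`, the second is a date offset, and the third is `n * obj.freq` on an index that has a frequency.

```python
import pandas as pd

ts = pd.Timestamp('2022-07-01')

# Subtracting a bare integer is no longer supported
try:
    ts - 10
except TypeError as e:
    print('Error:', e)

# State what the integer means: a Timedelta or a date offset
print(ts - pd.Timedelta(10, unit='D'))  # 2022-06-21 00:00:00
print(ts - pd.offsets.Day(10))          # 2022-06-21 00:00:00

# For an index with a frequency, n * obj.freq shifts by n periods
idx = pd.date_range('2022-07-01', periods=3, freq='D')
shifted = idx + 2 * idx.freq
print(shifted)
```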

'index' object has no attribute 'tz_localize'

  • 'index' object has no attribute 'tz_localize'
  • attributeerror: 'index' object has no attribute 'tz_localize'

A quick solution is to check whether the index is a DatetimeIndex, or to convert the column to datetime before using it as the index:

df.set_index(pd.DatetimeIndex(df['date']), drop=False, inplace=True)

Example and more details: How to Remove Timezone from a DateTime Column in Pandas
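A short sketch, assuming a hypothetical `date` column stored as strings. Setting that column as the index gives a plain `Index`, which has no `tz_localize`. Wrapping it in `pd.DatetimeIndex` first avoids the error.

```python
import pandas as pd

# Hypothetical data: dates stored as strings
df = pd.DataFrame({'date': ['2022-01-01', '2022-01-02'], 'value': [1, 2]})

# Using the raw string column as index gives a plain Index, not a DatetimeIndex
bad = df.set_index('date')
try:
    bad.index.tz_localize('UTC')
except AttributeError as e:
    print('Error:', e)

# Build a DatetimeIndex from the column first, then localize it
df = df.set_index(pd.DatetimeIndex(df['date']), drop=False)
df.index = df.index.tz_localize('UTC')
print(type(df.index).__name__, df.index.tz)
```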

OutOfBoundsDatetime

  • OutOfBoundsDatetime: Out of bounds nanosecond timestamp

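The short answer to this error is to let pd.to_datetime skip the values it cannot represent. With errors='coerce', out-of-range dates become NaT. With errors='ignore', the input is returned unchanged, but this option is deprecated since pandas 2.2:

pd.to_datetime(df['date'], errors='coerce')
pd.to_datetime(df['date'], errors='ignore')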

Example and more details: OutOfBoundsDatetime: Out of bounds nanosecond timestamp - Pandas and pd.to_datetime
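A minimal sketch with a hypothetical column. With nanosecond precision, pandas can only represent dates between `pd.Timestamp.min` and `pd.Timestamp.max`, roughly the years 1677 to 2262. In pandas 1.x/2.x, `errors='coerce'` turns the year-1000 value into NaT. Newer versions may parse string dates at a coarser resolution, so they may keep that value instead.

```python
import pandas as pd

# Hypothetical column: one normal date and one far outside the nanosecond range
df = pd.DataFrame({'date': ['2022-07-01', '1000-01-01']})

# The range supported with nanosecond precision
print(pd.Timestamp.min, pd.Timestamp.max)

# errors='coerce' replaces values that cannot be converted with NaT
df['parsed'] = pd.to_datetime(df['date'], errors='coerce')
print(df)
```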

Wrong dates - ParserError: Unknown string format

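  • ParserError: Unknown string format: 1975-02-23T02:58:41.000Z 1975-02-23T02:58:41.000Z

The short answer to this error is to pass an explicit format that matches your data, and to use errors='coerce' so that unparseable values become NaT:

df['date'] = pd.to_datetime(df['date_str'], format='%d/%m/%Y', errors='coerce')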

Example and more details:
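The sketch below uses hypothetical day/month/year strings plus one bad value. Without `errors='coerce'`, the single bad value stops the whole conversion. Catching `ValueError` also covers `ParserError`, which subclasses it.

```python
import pandas as pd

# Hypothetical data: day/month/year strings plus one unparseable value
df = pd.DataFrame({'date_str': ['23/02/1975', '01/07/2022', 'not a date']})

# Without errors='coerce' the bad value stops the whole conversion
try:
    pd.to_datetime(df['date_str'], format='%d/%m/%Y')
except ValueError as e:
    print('Error:', e)

# With errors='coerce' the bad value becomes NaT and the rest is parsed
df['date'] = pd.to_datetime(df['date_str'], format='%d/%m/%Y', errors='coerce')
print(df)
```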

Wrong dates - ValueError: time data does not match format '%Y%m%d HH:MM:SS' (match)

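  • ValueError: time data '28-01-2022 5:25:00 PM' does not match format '%Y%m%d HH:MM:SS' (match)

The short answer to this error is to pass a format made of strftime directives that matches the data. HH:MM:SS is not a valid directive: use %H:%M:%S for 24-hour times, or %I:%M:%S %p for 12-hour times with AM/PM:

pd.to_datetime('20220701', format='%Y%m%d', errors='coerce')
pd.to_datetime('28-01-2022 5:25:00 PM', format='%d-%m-%Y %I:%M:%S %p')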

Example and more details:
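A small sketch that reproduces the mismatch with the value from the error message, then describes its real layout. The real layout is day-month-year with a 12-hour clock and an AM/PM marker.

```python
import pandas as pd

value = '28-01-2022 5:25:00 PM'

# 'HH:MM:SS' is not a strftime directive, so this pattern never matches
try:
    pd.to_datetime(value, format='%Y%m%d HH:MM:SS')
except ValueError as e:
    print('Error:', e)

# Describe the actual layout: %d-%m-%Y date, %I 12-hour clock, %p for AM/PM
ts = pd.to_datetime(value, format='%d-%m-%Y %I:%M:%S %p')
print(ts)  # 2022-01-28 17:25:00
```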

read_csv

UnicodeDecodeError - 'utf-8' codec can't decode byte 0x97 in position 6785: invalid start byte

  • UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 6785: invalid start byte

The short answer to this error is to pass the file's actual encoding to read_csv:

df = pd.read_csv('../data/csv/file_utf-16.csv', encoding='utf-16')

Example and more details: How to Fix - UnicodeDecodeError: invalid start byte - during read_csv in Pandas
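Byte 0x97 is an em dash in Windows-1252 (`cp1252`), which is one common source of this exact message. The sketch assumes a hypothetical file written in that encoding. The right `encoding` value always depends on how your file was actually saved: `utf-16` in the example above, `cp1252` here.

```python
import os
import tempfile
import pandas as pd

# Hypothetical file written in Windows-1252, where an em dash is byte 0x97
path = os.path.join(tempfile.mkdtemp(), 'file_cp1252.csv')
pd.DataFrame({'title': ['Pandas \u2014 tips']}).to_csv(path, index=False, encoding='cp1252')

# The default utf-8 decoder rejects byte 0x97
try:
    pd.read_csv(path)
except UnicodeDecodeError as e:
    print('Error:', e)

# Passing the file's real encoding fixes it
df = pd.read_csv(path, encoding='cp1252')
print(df['title'].iloc[0])
```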

ParserError: Expected 5 fields in line 5, saw 6

  • ParserError: Expected 5 fields in line 5, saw 6. Error could possibly be due to quotes being ignored when a multi-char delimiter is used

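The short answer to this error is to skip the malformed lines. Use error_bad_lines=False in pandas older than 1.3 (it was removed in 2.0), and on_bad_lines='skip' in pandas 1.3 and later:

df = pd.read_csv(csv_file, delimiter=';;', engine='python', error_bad_lines=False)  # pandas < 1.3
df = pd.read_csv(csv_file, delimiter=';;', engine='python', on_bad_lines='skip')  # pandas >= 1.3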

Example and more details: How to Use Multiple Char Separator in read_csv in Pandas
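A self-contained sketch using hypothetical in-memory data with a `;;` separator and one row that has an extra field. Multi-character separators need `engine='python'`. With `on_bad_lines='skip'`, the malformed row is dropped instead of raising.

```python
import io
import pandas as pd

# Hypothetical data: ';;' separator and one row with an extra field
data = """a;;b;;c
1;;2;;3
4;;5;;6;;7
8;;9;;10
"""

df = pd.read_csv(io.StringIO(data), sep=';;', engine='python', on_bad_lines='skip')
print(df)
```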

ParserError: Error tokenizing data. C error: Expected 2 fields in line 4, saw 4

  • ParserError: Error tokenizing data. C error: Expected 2 fields in line 4, saw 4

The short answer to this error is:

pd.read_csv('test.csv', on_bad_lines='skip')

Example and more details: How to Solve Error Tokenizing Data on read_csv in Pandas
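The sketch below reproduces the exact message from hypothetical in-memory data. The header defines two fields, and line 4 contains four. It then reads the data again with the bad line skipped.

```python
import io
import pandas as pd

# Hypothetical data: line 4 (counting the header) has 4 fields instead of 2
data = """name,value
a,1
b,2
c,3,extra,fields
d,4
"""

try:
    pd.read_csv(io.StringIO(data))
except pd.errors.ParserError as e:
    print('Error:', e)  # Expected 2 fields in line 4, saw 4

df = pd.read_csv(io.StringIO(data), on_bad_lines='skip')
print(df)
```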

to_csv

AttributeError: object has no attribute 'to_csv'

  • AttributeError: 'numpy.ndarray' object has no attribute 'to_csv'
  • attributeerror: 'tuple' object has no attribute 'to_csv'

The short answer to this error is to wrap the NumPy array (or tuple) in a pandas Series or DataFrame before calling to_csv:

pd.Series(df['Magnitude Type'].unique()).to_csv('data.csv')

Example and more details: Dump (unique) values to CSV / to_csv in Pandas
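A minimal sketch with a hypothetical `Magnitude Type` column. `unique()` returns an array rather than a Series, so it has no `to_csv`. A tuple usually appears by accident, for example through a stray trailing comma as in `result = df.head(),`.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'Magnitude Type': ['MW', 'ML', 'MW', 'MB']})

# unique() returns an array, which has no to_csv method
uniques = df['Magnitude Type'].unique()
print(hasattr(uniques, 'to_csv'))  # False

out_dir = tempfile.mkdtemp()

# Wrap the array in a Series before writing it
pd.Series(uniques, name='Magnitude Type').to_csv(os.path.join(out_dir, 'data.csv'), index=False)

# Alternative: drop_duplicates() keeps the result a Series
df['Magnitude Type'].drop_duplicates().to_csv(os.path.join(out_dir, 'data2.csv'), index=False)
```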

Index / MultiIndex

MultiIndex - indexerror: too many levels

  • ValueError: Cannot remove 1 levels from an index with 1 levels: at least one level must be left.
  • IndexError: Too many levels: Index has only 1 level, not 4
  • indexerror: too many levels: index has only 1 level, not 2

The short answer to this error is to check how many levels the index has before dropping or resetting one:

df.index
df.droplevel(level=1)
df.reset_index(level=1)
df.columns.droplevel(level=0)

Example and more details: How to Drop a Level from a MultiIndex in Pandas DataFrame
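The sketch below builds a hypothetical two-level index and drops one level. It then shows both errors from the list above when dropping from a single-level index: `ValueError` for level 0 and `IndexError` for level 1. Checking `nlevels` first avoids both.

```python
import pandas as pd

# Hypothetical two-level index
df = pd.DataFrame(
    {'value': [1, 2, 3, 4]},
    index=pd.MultiIndex.from_tuples(
        [('a', 'x'), ('a', 'y'), ('b', 'x'), ('b', 'y')],
        names=['outer', 'inner'],
    ),
)
print(df.index.nlevels)  # 2

# Dropping the inner level leaves a single-level index
flat = df.droplevel(level=1)

# Dropping any further level now fails
for level in (0, 1):
    try:
        flat.droplevel(level=level)
    except (ValueError, IndexError) as e:
        print(type(e).__name__, e)

# Check the number of levels before dropping
if flat.index.nlevels > 1:
    flat = flat.droplevel(level=0)
print(flat.index.tolist())  # ['a', 'a', 'b', 'b']
```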

Sort MultiIndex - label must be a tuple with elements corresponding to each level

  • ValueError: The column label 'Depth' is not unique. For a multi-index, the label must be a tuple with elements corresponding to each level.

The short answer to this error is to inspect the column levels and pass the full tuple, with one element per level, to sort_values:

df_multi.columns
df_multi.columns.get_level_values(1)
df_multi.sort_values(by=[('Depth', 'mean')], ascending=False)

Example and more details: How to Sort MultiIndex in Pandas
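A minimal sketch, assuming hypothetical data aggregated into two-level columns. Sorting by `'Depth'` alone is ambiguous because both `('Depth', 'mean')` and `('Depth', 'max')` match it. The full tuple resolves the ambiguity.

```python
import pandas as pd

# Hypothetical data aggregated into MultiIndex columns
df = pd.DataFrame({'Place': ['A', 'A', 'B', 'B'], 'Depth': [10.0, 20.0, 5.0, 7.0]})
df_multi = df.groupby('Place').agg({'Depth': ['mean', 'max']})
print(df_multi.columns.tolist())  # [('Depth', 'mean'), ('Depth', 'max')]

# A single label matches several columns
try:
    df_multi.sort_values(by='Depth')
except ValueError as e:
    print('Error:', e)

# One element per level selects exactly one column
result = df_multi.sort_values(by=[('Depth', 'mean')], ascending=False)
print(result)
```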

Merge

ValueError: Indexes have overlapping values

  • ValueError: Indexes have overlapping values: Index(['A', 'B', 'C', 'D'], dtype='object')

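The short answer to this error is to allow overlapping labels in concat (verify_integrity=False, which is the default) or to use join with a suffix for the overlapping columns: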

pd.concat([df1, df2], axis='columns', verify_integrity=False)
df1.join(df2, lsuffix='_x')

Example and more details: How to Merge Two DataFrames on Index in Pandas
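The sketch below uses two hypothetical DataFrames that share the column `A`. The error is raised by `pd.concat(..., verify_integrity=True)`. `join` with `lsuffix` renames the left copy of `A` so that both columns can coexist.

```python
import pandas as pd

# Hypothetical frames sharing the index and the column 'A'
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['r1', 'r2'])
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]}, index=['r1', 'r2'])

# verify_integrity=True rejects the duplicated column label
try:
    pd.concat([df1, df2], axis='columns', verify_integrity=True)
except ValueError as e:
    print('Error:', e)

# join renames the overlapping left column with the suffix
merged = df1.join(df2, lsuffix='_x')
print(merged.columns.tolist())  # ['A_x', 'B', 'A', 'C']
```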

String

TypeError: can only concatenate str (not "float") to str

  • TypeError: can only concatenate str (not "float") to str

The short answer to this error is to convert the numeric column to string before concatenating it:

df['Magnitude'].astype(str)

Example and more details: Combine Multiple columns into a single one in Pandas
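A small sketch with hypothetical `Type` and `Magnitude` columns. Adding a float column to a string column fails. Converting with `astype(str)` first makes the concatenation work element by element.

```python
import pandas as pd

# Hypothetical data: a text column and a float column
df = pd.DataFrame({'Type': ['MW', 'ML'], 'Magnitude': [6.0, 5.5]})

# str + float is not defined
try:
    df['Type'] + ' ' + df['Magnitude']
except TypeError as e:
    print('Error:', e)

# Convert the numbers to strings first
df['Label'] = df['Type'] + ' ' + df['Magnitude'].astype(str)
print(df['Label'].tolist())  # ['MW 6.0', 'ML 5.5']
```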

Column

ValueError: cannot reindex from a duplicate axis

  • ValueError: cannot reindex from a duplicate axis

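The short answer to this error, when reordering columns that contain duplicate names, is to sort them with sort_index instead of reindex: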

df = df.sort_index(axis=1)

Example and more details: How to Change the Order of Columns in Pandas DataFrame
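A minimal sketch with a hypothetical DataFrame that has two columns named `b`. Newer pandas versions word the same error as "cannot reindex on an axis with duplicate labels". `sort_index(axis=1)` works despite the duplicates, and `columns.duplicated()` shows which labels are repeated.

```python
import pandas as pd

# Hypothetical frame with a duplicated column name 'b'
df = pd.DataFrame([[1, 2, 3]], columns=['b', 'a', 'b'])

# reindex cannot work on an axis with duplicate labels
try:
    df.reindex(columns=['a', 'b'])
except ValueError as e:
    print('Error:', e)

# sort_index reorders the columns even when names repeat
df = df.sort_index(axis=1)
print(df.columns.tolist())  # ['a', 'b', 'b']

# Find the duplicated labels, e.g. to rename or drop them
print(df.columns[df.columns.duplicated()].tolist())  # ['b']
```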

DataFrame

ValueError: caption must be either a string or 2-tuple of strings.

  • ValueError: caption must be either a string or 2-tuple of strings.

The short answer to this error is to pass a string or a 2-tuple of strings as the DataFrame caption, not an integer:

df.style.set_caption('DataFrame Name')

Example and more details: How to Set Caption and Customize Font Size and Color in Pandas DataFrame