OutOfBoundsDatetime: Out of bounds nanosecond timestamp - Pandas and pd.to_datetime

Have you tried to convert string to datetime with Pandas/Python but got a OutOfBoundsDatetime error?

OutOfBoundsDatetime: Out of bounds nanosecond timestamp

If so, I'll show you the reason for the problem, how to investigate it and how to solve it.

Normally, If you want to convert a string to date with pd.to_datetime, you have many options and important details.

Some of them can be found in the articles below:

Step 1: What is "OutOfBoundsDatetime: Out of bounds nanosecond timestamp"

To start let's explain what is error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp

Pandas uses NumPy 'datetime64' and 'timedelta64' dtypes in order to add features around Time Series.

This is related to limitations as follows - the range of dates in limited in next interval:

pd.Timestamp.min

result:

Timestamp('1677-09-21 00:12:43.145225')
pd.Timestamp.max

result:

Timestamp('2262-04-11 23:47:16.854775807')

Anything which is outside this date range will raise the error: OutOfBoundsDatetime: Out of bounds nanosecond timestamp

So all the code examples below will raise the error:

pd.to_datetime('Jun 1, 1111')
pd.to_datetime('7 1')
pd.to_datetime('Jun 1')

Step 2: Analyse error "OutOfBoundsDatetime: Out of bounds nanosecond timestamp"

In this step you can learn how to analyse the error and find where the problem is.

Suppose we have a DataFrame like the one below:

import pandas as pd

data = {
    "company":{"0":"Bad Pandas","2":"Clever Fox","4":"Max Wolf","6":"Rage Raycon","8":"Massive Shark"},
    "date":{"0":"Jul 16, 2020","2":"Jul 2, 2021","4":"Jun 27, 2019","6":"Jun 13","8":"May 17"},
    "sales":{"0":"17","2":"27","4":"202","6":"33","8":"29"}}

df = pd.DataFrame(data)
company date sales
Bad Pandas Jul 16, 2020 17
Clever Fox Jul 2, 2021 27
Max Wolf Jun 27, 2019 202
Rage Raycon Jun 13 33
Massive Shark May 17 29

If we try to convert column date to a datetime we will end with error:

pd.to_datetime(df['date'])

output:

Out of bounds nanosecond timestamp: 1-06-13 00:00:00

In this case it might be obvious where the problem is: Jun 13 but in some cases you will have thousands out of millions which will need a fix.

To find the records which are causing issues follow:

1) Create a new column for date 'date2' with:

df['date2'] = pd.to_datetime(df['date'], errors = 'coerce')

2) Find all problematic dates

df[df['date2'].isna()]

result:

Jun 13
May 17

3) Correct the problematic records

In my case the problem is that dates for the current year are missing the year. So in order to fix that problem we can use:

df_temp = df[df['date2'].isna()]
df.loc[df_temp.index, 'date']= df_temp['date'] + ', 2021'

4) Convert all dates to datetime with: pd.to_datetime

The final step is to convert the dates as intended initially:

pd.to_datetime(df['date'])

Step 3: Fix and workarounds for "OutOfBoundsDatetime: Out of bounds nanosecond timestamp"

There are several options in order to workaround the error. You can find two of them explained below:

  1. Parameter - errors = 'ignore' will convert all dates which are OK and the rest will remain unchanged
pd.to_datetime(df['date'], errors = 'ignore')

result:

0    Jul 16, 2020
2     Jul 2, 2021
4    Jun 27, 2019
6          Jun 13
8          May 17
  1. Parameter - errors = 'coerce' will convert all dates which are OK and the rest will be NaT
pd.to_datetime(df['date'], errors = 'coerce')

result:

0   2020-07-16
2   2021-07-02
4   2019-06-27
6          NaT
8          NaT

Resources