Have you tried to convert string to datetime with Pandas/Python but got a OutOfBoundsDatetime error?
OutOfBoundsDatetime: Out of bounds nanosecond timestamp
If so, I'll show you the reason for the problem, how to investigate it and how to solve it.
Normally, If you want to convert a string to date with pd.to_datetime
, you have many options and important details.
Some of them can be found in the articles below:
- How to Fix Pandas to_datetime: Wrong Date and Errors By John D K in How To Guides • 2 months ago
- How to convert month number to month name in Pandas DataFrame
Step 1: What is "OutOfBoundsDatetime: Out of bounds nanosecond timestamp"
To start let's explain what is error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp
Pandas uses NumPy 'datetime64' and 'timedelta64' dtypes in order to add features around Time Series.
This is related to limitations as follows - the range of dates in limited in next interval:
pd.Timestamp.min
result:
Timestamp('1677-09-21 00:12:43.145225')
pd.Timestamp.max
result:
Timestamp('2262-04-11 23:47:16.854775807')
Anything which is outside this date range will raise the error: OutOfBoundsDatetime: Out of bounds nanosecond timestamp
So all the code examples below will raise the error:
pd.to_datetime('Jun 1, 1111')
pd.to_datetime('7 1')
pd.to_datetime('Jun 1')
Step 2: Analyse error "OutOfBoundsDatetime: Out of bounds nanosecond timestamp"
In this step you can learn how to analyse the error and find where the problem is.
Suppose we have a DataFrame like the one below:
import pandas as pd
data = {
"company":{"0":"Bad Pandas","2":"Clever Fox","4":"Max Wolf","6":"Rage Raycon","8":"Massive Shark"},
"date":{"0":"Jul 16, 2020","2":"Jul 2, 2021","4":"Jun 27, 2019","6":"Jun 13","8":"May 17"},
"sales":{"0":"17","2":"27","4":"202","6":"33","8":"29"}}
df = pd.DataFrame(data)
company | date | sales |
---|---|---|
Bad Pandas | Jul 16, 2020 | 17 |
Clever Fox | Jul 2, 2021 | 27 |
Max Wolf | Jun 27, 2019 | 202 |
Rage Raycon | Jun 13 | 33 |
Massive Shark | May 17 | 29 |
If we try to convert column date
to a datetime we will end with error:
pd.to_datetime(df['date'])
output:
Out of bounds nanosecond timestamp: 1-06-13 00:00:00
In this case it might be obvious where the problem is: Jun 13
but in some cases you will have thousands out of millions which will need a fix.
To find the records which are causing issues follow:
1) Create a new column for date 'date2' with:
df['date2'] = pd.to_datetime(df['date'], errors = 'coerce')
2) Find all problematic dates
df[df['date2'].isna()]
result:
Jun 13
May 17
3) Correct the problematic records
In my case the problem is that dates for the current year are missing the year. So in order to fix that problem we can use:
df_temp = df[df['date2'].isna()]
df.loc[df_temp.index, 'date']= df_temp['date'] + ', 2021'
4) Convert all dates to datetime with: pd.to_datetime
The final step is to convert the dates as intended initially:
pd.to_datetime(df['date'])
Step 3: Fix and workarounds for "OutOfBoundsDatetime: Out of bounds nanosecond timestamp"
There are several options in order to workaround the error. You can find two of them explained below:
- Parameter - errors = 'ignore' will convert all dates which are OK and the rest will remain unchanged
pd.to_datetime(df['date'], errors = 'ignore')
result:
0 Jul 16, 2020
2 Jul 2, 2021
4 Jun 27, 2019
6 Jun 13
8 May 17
- Parameter - errors = 'coerce' will convert all dates which are OK and the rest will be NaT
pd.to_datetime(df['date'], errors = 'coerce')
result:
0 2020-07-16
2 2021-07-02
4 2019-06-27
6 NaT
8 NaT