How to Convert String to DateTime in Pandas
To convert string column to DateTime in Pandas and Python we can use:
(1) method: pd.to_datetime()
pd.to_datetime(df['date'])
(2) method .astype('datetime64[ns]')
df['timestamp'].astype('datetime64[ns]')
Let's check the most popular cases of conversion of string to dates in Pandas like:
- custom date and time patterns
- infer date pattern
- dates different language locales
- different date formats
Setup
Suppose we have DataFrame with Unix timestamp column as follows:
dict = {'date': {0: '28-01-2022 5:25:00 PM',
1: '27-02-2022 6:25:00 PM',
2: '30-03-2022 7:25:00 PM',
3: '29-04-2022 8:25:00 PM',
4: '31-05-2022 9:25:00 PM'},
'date_short': {0: 'Jan-2022', 1: 'Feb-2022', 2: 'Mar-2022', 3: 'Apr-2022', 4: 'May-2022'}}
df = pd.DataFrame(dict)
So data will look like:
date | date_short | |
---|---|---|
0 | 28-01-2022 5:25:00 PM | Jan-2022 |
1 | 27-02-2022 6:25:00 PM | Feb-2022 |
2 | 30-03-2022 7:25:00 PM | Mar-2022 |
3 | 29-04-2022 8:25:00 PM | Apr-2022 |
4 | 31-05-2022 9:25:00 PM | May-2022 |
Step 1: Convert string to date with pd.to_datetime()
The first and the most common example is to convert a time pattern to a datetime in Pandas.
To do so we can use method pd.to_datetime()
which will recognize the correct date in most cases:
pd.to_datetime(df['date'])
The result is the correct datetime values:
0 2022-01-28 17:25:00
1 2022-02-27 18:25:00
2 2022-03-30 19:25:00
3 2022-04-29 20:25:00
4 2022-05-31 21:25:00
Name: date, dtype: datetime64[ns]
Step 2: Convert time or date pattern "%d/%m/%Y" to date
The method to_datetime
has different parameters which can be found on: pandas.to_datetime.
To give a date format we can use parameter format
:
pd.to_datetime('20220701', format='%Y%m%d', errors='ignore')
Once more example:
pd.to_datetime(df['date'] , format='%Y%m%d HH:MM:SS', errors='ignore')
Note: If we use wrong format we will get an error:
ValueError: time data '28-01-2022 5:25:00 PM' does not match format '%Y%m%d HH:MM:SS' (match)
In order to solve it we can use errors='ignore'
.
To understand how to analyze Pandas date errors you can check this article: OutOfBoundsDatetime: Out of bounds nanosecond timestamp - Pandas and pd.to_datetime
To find more Pandas errors related to dates please check: Pandas Most Typical Errors and Solutions for Beginners
Step 3: Check if string is a date in Pandas
If we need to check if a given column contain dates (even if there are extra characters or words) we can build method like:
from dateutil.parser import parse
def is_date(string, fuzzy=False):
try:
parse(string, fuzzy=fuzzy)
return True
except ValueError:
return False
Then we can use it as:
df['date'].apply(is_date)
For fuzzy matching - meaning that:
today is 2019-03-27
will return True - we need to call it as:
df['date_short'].apply(is_date, fuzzy=True)
Step 4: Infer date format from string
Python and Pandas has several option if we need to infer the date or time pattern from a string.
_guess_datetime_format_for_array
The first option is by using _guess_datetime_format_for_array
:
import numpy as np
from pandas.core.tools.datetimes import _guess_datetime_format_for_array
array = np.array(['2022-06-01T00:10:45.300000'])
_guess_datetime_format_for_array(array)
which will result into:
'%Y-%m-%dT%H:%M:%S.%f'
This option has some limitations and might return None
for valid dates.
For Pandas column we can use:
import numpy as np
from pandas.core.tools.datetimes import _guess_datetime_format_for_array
array = np.array(df["Date"].to_list())
_guess_datetime_format_for_array(array)
hi-dateinfer - Before python 3.8
We can use library: hi-dateinfer which can be installed by:
pip install hi-dateinfer
Now we can infer date or time format for Pandas column as follows:
import hidateinfer as dateinfer
df['Date'].apply(dateinfer.infer)
Which would give us something like:
'%a %b %d %H:%M:%S %Z %Y'
py-dateinfer - Before python 3.8
Another option is to use Python library: py-dateinfer
which can be installed by:
pip install py-dateinfer
To use it we can do:
import dateinfer
dateinfer.infer(['Mon Jan 13 09:52:52 MST 2014', 'Tue Jan 21 15:30:00 EST 2014'])
the result would be:
'%a %b %d %H:%M:%S %Z %Y'
Step 5: Generic parsing of dates 200 locales
What if we need to parse dates in different languages like:
- French
- Thai
- Russian
- Spanish
In this case we can use the Python library called dateparser
. It can be installed by:
pip install dateparser
To parse different locales with dateparser and Pandas:
df['date'].apply(dateparser.parse)
We can also use settings:
df['dates'].apply(dateparser.parse, settings={'DATE_ORDER': 'DMY'})
Step 6: Working with mixed datetime formats
Finally lets cover the case of multiple date formats in a single column in Pandas.
In that case we can build a custom function to detect a parse the correct format like:
def date_parser(date):
if '/' in date:
return pd.to_datetime(date, format = '%Y/%m/%d')
elif '-' in date:
return pd.to_datetime(date, format = '%d-%m-%Y')
else:
return pd.to_datetime(date, format = '%d.%m.%Y')
Or we can parse different format separately and them merge the results:
date_1 = pd.to_datetime(df['date'], errors='coerce', format='%Y/%m/%d')
date_2 = pd.to_datetime(df['date'], errors='coerce', format='%d-%m-%Y')
df['date'] = date_1.fillna(date_2)
Conclusion
In this article we covered conversion of string to date in Pandas. We covered multiple edge cases like locales, formats and errors.
Now we know how to infer date format from a string and how to parse multiple formats in a single Pandas column.