How to Convert String to DateTime in Pandas

To convert string column to DateTime in Pandas and Python we can use:

(1) method: pd.to_datetime()

pd.to_datetime(df['date'])

(2) method .astype('datetime64[ns]')

df['timestamp'].astype('datetime64[ns]')

Let's check the most popular cases of conversion of string to dates in Pandas like:

  • custom date and time patterns
  • infer date pattern
  • dates different language locales
  • different date formats

Setup

Suppose we have DataFrame with Unix timestamp column as follows:

dict = {'date': {0: '28-01-2022  5:25:00 PM',
  1: '27-02-2022  6:25:00 PM',
  2: '30-03-2022  7:25:00 PM',
  3: '29-04-2022  8:25:00 PM',
  4: '31-05-2022  9:25:00 PM'},
 'date_short': {0: 'Jan-2022', 1: 'Feb-2022', 2: 'Mar-2022', 3: 'Apr-2022', 4: 'May-2022'}}

df = pd.DataFrame(dict)

So data will look like:

date date_short
0 28-01-2022 5:25:00 PM Jan-2022
1 27-02-2022 6:25:00 PM Feb-2022
2 30-03-2022 7:25:00 PM Mar-2022
3 29-04-2022 8:25:00 PM Apr-2022
4 31-05-2022 9:25:00 PM May-2022

Step 1: Convert string to date with pd.to_datetime()

The first and the most common example is to convert a time pattern to a datetime in Pandas.

To do so we can use method pd.to_datetime() which will recognize the correct date in most cases:

pd.to_datetime(df['date'])

The result is the correct datetime values:

0   2022-01-28 17:25:00
1   2022-02-27 18:25:00
2   2022-03-30 19:25:00
3   2022-04-29 20:25:00
4   2022-05-31 21:25:00
Name: date, dtype: datetime64[ns]

Step 2: Convert time or date pattern "%d/%m/%Y" to date

The method to_datetime has different parameters which can be found on: pandas.to_datetime.

To give a date format we can use parameter format:

pd.to_datetime('20220701', format='%Y%m%d', errors='ignore')

Once more example:

pd.to_datetime(df['date'] , format='%Y%m%d HH:MM:SS', errors='ignore')

Note: If we use wrong format we will get an error:

ValueError: time data '28-01-2022 5:25:00 PM' does not match format '%Y%m%d HH:MM:SS' (match)

In order to solve it we can use errors='ignore'.

To understand how to analyze Pandas date errors you can check this article: OutOfBoundsDatetime: Out of bounds nanosecond timestamp - Pandas and pd.to_datetime

To find more Pandas errors related to dates please check: Pandas Most Typical Errors and Solutions for Beginners

Step 3: Check if string is a date in Pandas

If we need to check if a given column contain dates (even if there are extra characters or words) we can build method like:

from dateutil.parser import parse

def is_date(string, fuzzy=False):
    try:
        parse(string, fuzzy=fuzzy)
        return True

    except ValueError:
        return False

Then we can use it as:

df['date'].apply(is_date)

For fuzzy matching - meaning that:

today is 2019-03-27

will return True - we need to call it as:

df['date_short'].apply(is_date, fuzzy=True)

Step 4: Infer date format from string

Python and Pandas has several option if we need to infer the date or time pattern from a string.

_guess_datetime_format_for_array

The first option is by using _guess_datetime_format_for_array:

import numpy as np
from pandas.core.tools.datetimes import _guess_datetime_format_for_array
array = np.array(['2022-06-01T00:10:45.300000'])
_guess_datetime_format_for_array(array)

which will result into:

'%Y-%m-%dT%H:%M:%S.%f'

This option has some limitations and might return None for valid dates.

For Pandas column we can use:

import numpy as np
from pandas.core.tools.datetimes import _guess_datetime_format_for_array
array = np.array(df["Date"].to_list())
_guess_datetime_format_for_array(array)

hi-dateinfer - Before python 3.8

We can use library: hi-dateinfer which can be installed by:

pip install hi-dateinfer

Now we can infer date or time format for Pandas column as follows:

import hidateinfer as dateinfer
df['Date'].apply(dateinfer.infer)

Which would give us something like:

'%a %b %d %H:%M:%S %Z %Y'

py-dateinfer - Before python 3.8

Another option is to use Python library: py-dateinfer which can be installed by:

pip install py-dateinfer

To use it we can do:

import dateinfer
dateinfer.infer(['Mon Jan 13 09:52:52 MST 2014', 'Tue Jan 21 15:30:00 EST 2014'])

the result would be:

'%a %b %d %H:%M:%S %Z %Y'

Step 5: Generic parsing of dates 200 locales

What if we need to parse dates in different languages like:

  • French
  • Thai
  • Russian
  • Spanish

In this case we can use the Python library called dateparser. It can be installed by:

pip install dateparser

To parse different locales with dateparser and Pandas:

df['date'].apply(dateparser.parse)

We can also use settings:

df['dates'].apply(dateparser.parse, settings={'DATE_ORDER': 'DMY'})

Step 6: Working with mixed datetime formats

Finally lets cover the case of multiple date formats in a single column in Pandas.

In that case we can build a custom function to detect a parse the correct format like:

def date_parser(date):

    if '/' in date:
        return pd.to_datetime(date, format = '%Y/%m/%d')
    elif '-' in date:
        return pd.to_datetime(date, format = '%d-%m-%Y')
    else:
        return pd.to_datetime(date, format = '%d.%m.%Y')

Or we can parse different format separately and them merge the results:

date_1 = pd.to_datetime(df['date'], errors='coerce', format='%Y/%m/%d')
date_2 = pd.to_datetime(df['date'], errors='coerce', format='%d-%m-%Y')
df['date'] = date_1.fillna(date_2)

Conclusion

In this article we covered conversion of string to date in Pandas. We covered multiple edge cases like locales, formats and errors.

Now we know how to infer date format from a string and how to parse multiple formats in a single Pandas column.