Pandas read_csv: Automatic Date Reading from CSV Files

In this article, we will see how Pandas handles dates during the CSV reading process and automatic date recognition with method read_csv().

Automatic Date Reading in Pandas

Pandas is designed to automatically recognize and parse dates while reading data from a CSV file, provided that dates are formatted consistently and we provide details about them. The library uses several parameters:

  • parse_dates
  • date_format
  • date_parser

in its read_csv() function to enable automatic date parsing.

You can read more about this parameter here: pandas.read_csv

The behavior of parameter parse_dates is as follows:

  • bool. If True -> try parsing the index.
  • list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
  • list of list. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
  • dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’

Setup

Suppose we work with the following CSV file:

Date,Value
2023-01-01,5
2023-01-02,15
2023-01-03,25

or:

Date,Value,Time
2023-01-01,5,5:45
2023-01-02,15,6:17
2023-01-03,25,8:20

Read Date Columns

By default Pandas will not parse date columns. We need to set which columns to be parsed as dates:

import pandas as pd

df = pd.read_csv('data.csv', parse_dates=['Date'])

This dataframe will have Date columns which are of type datetime64[ns].

The read dataframe:

        Date  Value
0 2023-01-01      5
1 2023-01-02     15
2 2023-01-03     25

read_csv + date_format + parse_dates

After Pandas 2.0 we can apply custom formatting by using parameter date_format:

import pandas as pd

df = pd.read_csv('data.csv', parse_dates=['Date'], date_format={'Date': '%Y-%m-%d'})

Automatic Index Date Parsing

To parse datetime index in Pandas while reading CSV file we can use:

  • parse_dates=True
  • index_col='Date'

Example:

import pandas as pd

df = pd.read_csv('data.csv', parse_dates=True, index_col='Date')

The index will look like:

DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03'], dtype='datetime64[ns]', name='Date', freq=None)

and data is:

            Value
Date             
2023-01-01      5
2023-01-02     15
2023-01-03     25

Custom Function with read_csv()

We can force custom date parsing with custom function in Pandas by date_parser:

import pandas as pd

custom_date_parser = lambda x: pd.to_datetime(x, format='%Y-%d-%m')

df = pd.read_csv('data.csv', parse_dates=['Date'], date_parser=custom_date_parser)

We are defining a custom date parsing function and then use it by parameter date_parser.

DataFrame will be read as:

        Date  Value
0 2023-01-01      5
1 2023-02-01     15
2 2023-03-01     25

Combine Two columns into single datetime

With Pandas we can read separate columns - date and time into a single datetime column by using parameter - parse_dates:

df = pd.read_csv('data/data.csv', parse_dates={'datetime': ['Date', 'Time']})

Parameter parse_dates takes a mapping of the expected type and the columns: {'datetime': ['Date', 'Time']}

Conclusion

Pandas offers several ways to parse dates from CSV files. Pandas provides the flexibility to handle various date formats by automatic date recognition or custom parsing.

Converting dates during read_csv operation might prevent errors and offer additional functionality on those columns.