Pandas read_csv: Automatic Date Reading from CSV Files
In this article, we will see how Pandas handles dates during the CSV reading process and automatic date recognition with method read_csv()
.
Automatic Date Reading in Pandas
Pandas is designed to automatically recognize and parse dates while reading data from a CSV file, provided that dates are formatted consistently and we provide details about them. The library uses several parameters:
parse_dates
date_format
date_parser
in its read_csv()
function to enable automatic date parsing.
You can read more about this parameter here: pandas.read_csv
The behavior of parameter parse_dates
is as follows:
bool
. IfTrue
-> try parsing the index.list
ofint
or names. e.g. If[1, 2, 3]
-> try parsing columns 1, 2, 3 each as a separate date column.list
oflist
. e.g. If[[1, 3]]
-> combine columns 1 and 3 and parse as a single date column.dict
, e.g.{'foo' : [1, 3]}
-> parse columns 1, 3 as date and call result ‘foo’
Setup
Suppose we work with the following CSV file:
Date,Value
2023-01-01,5
2023-01-02,15
2023-01-03,25
or:
Date,Value,Time
2023-01-01,5,5:45
2023-01-02,15,6:17
2023-01-03,25,8:20
Read Date Columns
By default Pandas will not parse date columns. We need to set which columns to be parsed as dates:
import pandas as pd
df = pd.read_csv('data.csv', parse_dates=['Date'])
This dataframe will have Date columns which are of type datetime64[ns]
.
The read dataframe:
Date Value
0 2023-01-01 5
1 2023-01-02 15
2 2023-01-03 25
read_csv + date_format + parse_dates
After Pandas 2.0 we can apply custom formatting by using parameter date_format
:
import pandas as pd
df = pd.read_csv('data.csv', parse_dates=['Date'], date_format={'Date': '%Y-%m-%d'})
Automatic Index Date Parsing
To parse datetime index in Pandas while reading CSV file we can use:
parse_dates=True
index_col='Date'
Example:
import pandas as pd
df = pd.read_csv('data.csv', parse_dates=True, index_col='Date')
The index will look like:
DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03'], dtype='datetime64[ns]', name='Date', freq=None)
and data is:
Value
Date
2023-01-01 5
2023-01-02 15
2023-01-03 25
Custom Function with read_csv()
We can force custom date parsing with custom function in Pandas by date_parser
:
import pandas as pd
custom_date_parser = lambda x: pd.to_datetime(x, format='%Y-%d-%m')
df = pd.read_csv('data.csv', parse_dates=['Date'], date_parser=custom_date_parser)
We are defining a custom date parsing function and then use it by parameter date_parser
.
DataFrame will be read as:
Date Value
0 2023-01-01 5
1 2023-02-01 15
2 2023-03-01 25
Combine Two columns into single datetime
With Pandas we can read separate columns - date and time into a single datetime column by using parameter - parse_dates
:
df = pd.read_csv('data/data.csv', parse_dates={'datetime': ['Date', 'Time']})
Parameter parse_dates
takes a mapping of the expected type and the columns: {'datetime': ['Date', 'Time']}
Conclusion
Pandas offers several ways to parse dates from CSV files. Pandas provides the flexibility to handle various date formats by automatic date recognition or custom parsing.
Converting dates during read_csv
operation might prevent errors and offer additional functionality on those columns.