How to Remove Timezone from a DateTime Column in Pandas
In this tutorial, we'll look at how to remove the timezone info from a datetime column in a Pandas DataFrame.
The short answer to this question is:
(1) Use the method .dt.tz_localize(None), which drops the timezone and keeps the local wall-clock time:
df['time_tz'].dt.tz_localize(None)
(2) Use .dt.tz_convert(None), which converts the values to UTC and then drops the timezone:
df['time_tz'].dt.tz_convert(None)
This helps when you need to compare the column with another column that has no timezone info. Otherwise you will get errors like:
TypeError: Timestamp subtraction must have the same timezones or no timezones
TypeError: Invalid comparison between dtype=datetime64[ns] and DatetimeArray
Let's walk through a simple example of removing the timezone information in Pandas.
Start by creating a DataFrame like this:
import datetime

import pandas as pd
import pytz

# Note: passing a pytz zone via tzinfo= uses its LMT offset (-07:53),
# which is why the tz-aware values display as 13:23 rather than 12:30.
pacific = pytz.timezone('US/Pacific')
dates = ['2021-08-01', '2021-08-02', '2021-08-03']
timestamps = [
    datetime.datetime(2021, 8, 1, 12, 30, 41, 775854),
    datetime.datetime(2021, 8, 2, 12, 31, 12, 432523),
    datetime.datetime(2021, 8, 3, 12, 29, 59, 123512),
]
df = pd.DataFrame({
    'start_date': dates,
    'time': timestamps,  # timezone-naive
    'time_tz': [t.replace(tzinfo=pacific) for t in timestamps],  # timezone-aware
})
df
Data:
start_date | time | time_tz |
---|---|---|
2021-08-01 | 2021-08-01 12:30:41.775854 | 2021-08-01 13:23:41.775854-07:00 |
2021-08-02 | 2021-08-02 12:31:12.432523 | 2021-08-02 13:24:12.432523-07:00 |
2021-08-03 | 2021-08-03 12:29:59.123512 | 2021-08-03 13:22:59.123512-07:00 |
Now let's check the data stored in the column time_tz:
0 2021-08-01 13:23:41.775854-07:00
1 2021-08-02 13:24:12.432523-07:00
2 2021-08-03 13:22:59.123512-07:00
Name: time_tz, dtype: datetime64[ns, US/Pacific]
The dtype is datetime64[ns, US/Pacific], so the column is timezone-aware.
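If you prefer to check for timezone info programmatically rather than reading the dtype, the following is a minimal sketch. It uses small hypothetical Series (the names naive and aware are illustrative and not part of the DataFrame in this tutorial) and relies on the .dt.tz attribute, which is None for timezone-naive data.

```python
import pandas as pd

# Hypothetical sample data, not the tutorial's DataFrame.
naive = pd.Series(pd.to_datetime(['2021-08-01 13:23:41', '2021-08-02 13:24:12']))
aware = naive.dt.tz_localize('US/Pacific')

# .dt.tz is None for tz-naive data and a tzinfo object for tz-aware data.
print(naive.dt.tz)
print(aware.dt.tz)

# The dtype itself can also be inspected.
is_aware = isinstance(aware.dtype, pd.DatetimeTZDtype)
print(is_aware)
```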
If you try to compare this column with another datetime column that has no timezone info, you will get errors like:
TypeError: Timestamp subtraction must have the same timezones or no timezones
TypeError: Invalid comparison between dtype=datetime64[ns] and DatetimeArray
If you compare the column without the timezone info, you will not face an error:
df['time'] > pd.to_datetime(df['start_date'])
Mixing timezone-aware and timezone-naive columns is what causes the errors mentioned above.
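To see the mismatch in isolation, here is a minimal sketch that compares and subtracts a timezone-aware Series and a timezone-naive one. The data and variable names are hypothetical, and the exact error text may differ between pandas versions; only the TypeError itself is assumed.

```python
import pandas as pd

# Hypothetical sample data: one tz-naive and one tz-aware Series.
naive = pd.Series(pd.to_datetime(['2021-08-01 13:23:41', '2021-08-02 13:24:12']))
aware = naive.dt.tz_localize('US/Pacific')

failed = []
operations = [
    ('compare', lambda: aware > naive),
    ('subtract', lambda: aware - naive),
]
for label, operation in operations:
    try:
        operation()
    except TypeError as exc:
        # Both operations mix tz-aware and tz-naive values.
        failed.append(label)
        print(f'{label}: TypeError: {exc}')
```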
Remove TimeZone from DateTime column in Pandas
.dt.tz_localize(None)
In order to drop the timezone info from this column you can use:
df['time_tz'].dt.tz_localize(None)
which results in:
0 2021-08-01 13:23:41.775854
1 2021-08-02 13:24:12.432523
2 2021-08-03 13:22:59.123512
Name: time_tz, dtype: datetime64[ns]
Now you can use this column without getting the errors mentioned above.
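As a quick check, the sketch below drops the timezone with .dt.tz_localize(None) and then runs the kind of comparison that failed before. The Series here are hypothetical stand-ins for the tutorial's columns.

```python
import pandas as pd

# Hypothetical stand-ins for a tz-aware column and a tz-naive column.
aware = pd.Series(
    pd.to_datetime(['2021-08-01 13:23:41', '2021-08-02 13:24:12'])
).dt.tz_localize('US/Pacific')
start = pd.Series(pd.to_datetime(['2021-08-01', '2021-08-03']))

# Drop the timezone; the local wall-clock time is kept as-is.
naive = aware.dt.tz_localize(None)

# Both sides are now tz-naive, so the comparison works.
later = naive > start
print(later)
```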
.dt.tz_convert(None)
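Another option is the method .dt.tz_convert(). Unlike .dt.tz_localize(None), it first converts the values to another timezone. Passing None converts the values to UTC and removes the timezone info, while passing 'UTC' converts them to UTC and keeps the column timezone-aware: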
df['time_tz'].dt.tz_convert(None)
or
df['time_tz'].dt.tz_convert('UTC')
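The options do not produce the same clock values, so it is worth comparing them side by side. The sketch below uses a single hypothetical timestamp in US/Pacific (UTC-7 in August); the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical single-value Series in US/Pacific.
aware = pd.Series(pd.to_datetime(['2021-08-01 13:23:41'])).dt.tz_localize('US/Pacific')

local_naive = aware.dt.tz_localize(None)  # keeps local wall time, drops tz
utc_naive = aware.dt.tz_convert(None)     # converts to UTC, then drops tz
utc_aware = aware.dt.tz_convert('UTC')    # converts to UTC, stays tz-aware

print(local_naive.iloc[0])  # 2021-08-01 13:23:41
print(utc_naive.iloc[0])    # 2021-08-01 20:23:41
print(utc_aware.iloc[0])    # 2021-08-01 20:23:41+00:00
```

Which one to pick depends on what the other column represents: use .dt.tz_localize(None) if it holds local times, and .dt.tz_convert(None) if it holds UTC times.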