Here are several approaches to filter rows in Pandas DataFrame by date:
1) Filter rows between two dates
df[(df['date'] > '2019-12-01') & (df['date'] < '2019-12-31')]
2) Filter rows by date in index
df2.loc['2019-12-01':'2019-12-31']
3) Filter rows by date with Pandas query
df.query('20191201 < date < 20191231')
In the next section, you'll see several examples of how to apply the above approaches using simple examples. As a bonus you can find information how to filter rows per month, week, year, quarter etc.
The DataFrame for the examples below is available from Kaggle.
date | title |
---|---|
2019-05-30 | A Beginner’s Guide to Word Embedding with Gensim Word2Vec Model |
2019-05-30 | Hands-on Graph Neural Networks with PyTorch & PyTorch Geometric |
2019-05-30 | How to Use ggplot2 in Python |
2019-05-30 | Databricks: How to Save Files in CSV on Your Local Computer |
2019-05-30 | A Step-by-Step Implementation of Gradient Descent and Backpropagation |
If you like to learn more about how to read Kaggle as a Pandas DataFrame check this article: How to Search and Download Kaggle Dataset to Pandas DataFrame
Note: For all examples we need to convert column date
to datetime with method - to_datetime
:
df['date'] = pd.to_datetime(df['date'])
Option 1: Filter DataFrame by date in Pandas
To start with a simple example, let's filter the DataFrame by two dates:
- '2019-12-01'
- '2019-12-31'
We would like to get all rows which have date between those two dates.
So filtering the rows which meet the above requirement can be done:
df[(df['date'] > '2019-12-01') & (df['date'] < '2019-12-31')]
result is:
614 rows
Note: The format is YYYY-MM-DD
Note 2: the filtering above is exclusive. If you like to get inclusive filtering just add =
:
df[(df['date'] >= '2019-12-01') & (df['date'] <= '2019-12-31')]
Option 2: Filter DataFrame by date using the index
For this example we will change the original index of the DataFrame in order to have a index which is a date:
df = df.set_index('date')
Now having a DataFrame with index which is a datetime we can filter the rows by:
df.loc['2019-12-01':'2019-12-31']
the result is the same:
614 rows
2.1 MultiIndex filtering by date
The example below shows how to do filtering when you have a date in hierarchical index:
import datetime
df3 = df.set_index(['publication', 'date'])
df3[(df3.index.get_level_values('date') > '2019-12-01')]
result:
614 rows
Find all article written on Sunday-s:
df3[df3.index.get_level_values('date').dayofweek == 6]
Option 3: Filter rows by date and pd.Timestamp
You can use pd.Timestamp
in order to construct your dates and compare the value with each row. The syntax for creating a date with Pandas is:
pd.Timestamp(2019,12,1)
So the comparison will be:
df[(df['date'] > pd.Timestamp(2019,12,1)) & (df['date'] < pd.Timestamp(2019,12,31))]
You can pass a string like in option 1 and Pandas will do the conversion for you.
This can be useful when conversion is not explicit or you like to have control on the format.
Option 4: Pandas filter rows by date with Query
Pandas offers a simple way to query rows by method query
.
The syntax is minimalistic and self-explanatory:
df.query('20191201 < date < 20191231')
result:
614 rows
Option 5: Pandas filter rows by day, month, week, quarter etc
Finally let's check several useful and frequently used filters.
Filter rows by day
df[df['date'].dt.strftime('%Y-%m-%d') == '2019-12-30']
result:
165 rows
Filter rows by day of the week
df[df['date'].dt.dayofweek == 6]
result:
609 rows
Filter rows by month and year
df[df['date'].dt.to_period('M') == '2019-12']
result:
614 rows
Filter rows by month
df[df['date'].dt.month == 12]
result:
614 rows
Filter rows by quarter
df[df['date'].dt.to_period('Q') == '2019Q4']
or
df[df['date'].dt.to_period('Q') == '2019-12']
result:
1915 rows
Filter by year
df[df['date'].dt.to_period('Y') == '2019']
result:
6508 rows