Here are several approaches to filter rows in Pandas DataFrame by date:

1) Filter rows between two dates

df[(df['date'] > '2019-12-01') & (df['date'] < '2019-12-31')]

2) Filter rows by date in index

df2.loc['2019-12-01':'2019-12-31']

3) Filter rows by date with Pandas query

df.query('20191201 < date < 20191231')

In the next section, you'll see several examples of how to apply the above approaches using simple examples. As a bonus you can find information how to filter rows per month, week, year, quarter etc.

The DataFrame for the examples below is available from Kaggle.

date title
2019-05-30 A Beginner’s Guide to Word Embedding with Gensim Word2Vec Model
2019-05-30 Hands-on Graph Neural Networks with PyTorch & PyTorch Geometric
2019-05-30 How to Use ggplot2 in Python
2019-05-30 Databricks: How to Save Files in CSV on Your Local Computer
2019-05-30 A Step-by-Step Implementation of Gradient Descent and Backpropagation

If you like to learn more about how to read Kaggle as a Pandas DataFrame check this article: How to Search and Download Kaggle Dataset to Pandas DataFrame

Note: For all examples we need to convert column date to datetime with method - to_datetime:

df['date'] = pd.to_datetime(df['date'])

Option 1: Filter DataFrame by date in Pandas

To start with a simple example, let's filter the DataFrame by two dates:

  • '2019-12-01'
  • '2019-12-31'

We would like to get all rows which have date between those two dates.

So filtering the rows which meet the above requirement can be done:

df[(df['date'] > '2019-12-01') & (df['date'] < '2019-12-31')]

result is:

614 rows

Note: The format is YYYY-MM-DD

Note 2: the filtering above is exclusive. If you like to get inclusive filtering just add =:

df[(df['date'] >= '2019-12-01') & (df['date'] <= '2019-12-31')]

Option 2: Filter DataFrame by date using the index

For this example we will change the original index of the DataFrame in order to have a index which is a date:

df = df.set_index('date')

Now having a DataFrame with index which is a datetime we can filter the rows by:

df.loc['2019-12-01':'2019-12-31']

the result is the same:

614 rows

2.1 MultiIndex filtering by date

The example below shows how to do filtering when you have a date in hierarchical index:

import datetime
df3 = df.set_index(['publication', 'date'])
df3[(df3.index.get_level_values('date') > '2019-12-01')]

result:

614 rows

Find all article written on Sunday-s:

df3[df3.index.get_level_values('date').dayofweek == 6]

Option 3: Filter rows by date and pd.Timestamp

You can use pd.Timestamp in order to construct your dates and compare the value with each row. The syntax for creating a date with Pandas is:

pd.Timestamp(2019,12,1)

So the comparison will be:

df[(df['date'] > pd.Timestamp(2019,12,1)) & (df['date'] < pd.Timestamp(2019,12,31))]

You can pass a string like in option 1 and Pandas will do the conversion for you.

This can be useful when conversion is not explicit or you like to have control on the format.

Option 4: Pandas filter rows by date with Query

Pandas offers a simple way to query rows by method query.

The syntax is minimalistic and self-explanatory:

df.query('20191201 < date < 20191231')

result:

614 rows

Option 5: Pandas filter rows by day, month, week, quarter etc

Finally let's check several useful and frequently used filters.

Filter rows by day

df[df['date'].dt.strftime('%Y-%m-%d') == '2019-12-30']

result:

165 rows

Filter rows by day of the week

df[df['date'].dt.dayofweek == 6]

result:

609 rows

Filter rows by month and year

df[df['date'].dt.to_period('M') == '2019-12']

result:

614 rows

Filter rows by month

df[df['date'].dt.month == 12]

result:

614 rows

Filter rows by quarter

df[df['date'].dt.to_period('Q') == '2019Q4']

or

df[df['date'].dt.to_period('Q') == '2019-12']

result:

1915 rows

Filter by year

df[df['date'].dt.to_period('Y') == '2019']

result:

6508 rows

Resources