How to Select Rows by List of Values in Pandas DataFrame

1. Overview

In this post, we'll explore how to select rows by list of values in Pandas. We'll discuss the basic indexing, which is needed in order to perform selection by values.

Row selection is also known as indexing. There are several ways to select rows by multiple values:

isin() - Pandas way - exact match from list of values
df.query() - SQL like way
df.loc + df.apply(lambda - when custom function is needed to be applied; more flexible way

2. Setup

We'll use the following DataFrame, which consists of several rows and columns:

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/softhints/Pandas-Tutorials/master/data/csv/extremes.csv')

DataFrame looks like:

Continent	Highest point	Elevation high	Lowest point	Elevation low
Asia	Mount Everest	8848	Dead Sea	−427
South America	Aconcagua	6960	Laguna del Carbón	−105
North America	Denali	6198	Death Valley	−86
Africa	Mount Kilimanjaro	5895	Lake Assal	−155
Europe	Mount Elbrus	5642	Caspian Sea	−28

Then let's suppose that we would like to search by different lists from single or multiple columns:

['America', 'Europe', 'Asia']
['America', 'Europe', 'Asia'] and range of numbers [5000, 8000]

3. Select rows by values with method isin()

The method pandas.DataFrame.isin is probably the most popular way for selection by exact match of list of values.

3.1. Positive selection

Let's find all rows which match exactly one of the next values in column - Continent:

['America', 'Europe', 'Asia']

sel_continents = ['America', 'Europe', 'Asia']

df[df['Continent'].isin(sel_continents)]

result will be:

	Continent	Highest point	Elevation high	Lowest point	Elevation low
0	Asia	Mount Everest	8848	Dead Sea	−427
4	Europe	Mount Elbrus	5642	Caspian Sea	−28

Note:

Since method `isin` works by exact match rows for America are not selected.

3.2. Negative selection

To get all rows which doesn't include list of values you can use operator: ~:

df[~df['Continent'].isin(sel_continents)]

result:

	Continent	Highest point	Elevation high	Lowest point	Elevation low
1	South America	Aconcagua	6960	Laguna del Carbón	−105
2	North America	Denali	6198	Death Valley	−86
3	Africa	Mount Kilimanjaro	5895	Lake Assal	−155
5	Antarctica	Vinson Massif	4892	Deep Lake, Vestfold Hills	−50
6	Australia	Puncak Jaya	4884	Lake Eyre	−15

4. Query rows by values - method df.query()

Let's now look at how we can do a query based on a list of values. The method: pandas.DataFrame.query gives us the opportunity to select rows in SQL fashion.

4.1. Query by list of values

To select rows by query and multiple values use the next syntax:

sel_continents = ['America', 'Europe', 'Asia']
df.query('Continent in @sel_continents')

The resulted rows are:

	Continent	Highest point	Elevation high	Lowest point	Elevation low
0	Asia	Mount Everest	8848	Dead Sea	−427
4	Europe	Mount Elbrus	5642	Caspian Sea	−28

× You can refer to variables in the environment by prefixing them with an ‘@’ character like @a + b.

Note:
For columns with spaces in their name, you can use backtick quoting:

df.query('B == `C C`')

which is equivalent to:

df[df.B == df['C C']]

4.2. Select rows by range of numbers

In case of range of values query method can be used as follows:

df.query('5000 < `Elevation high` < 6000')

result:

Continent Highest point Elevation high Lowest point Elevation low

0 Asia Mount Everest 8848 Dead Sea −427

4 Europe Mount Elbrus 5642 Caspian Sea −28

4.3. Select rows by multiple queries

Selecting by df.query() is considered to be the fastest way to get rows by values.

So it's suitable for searching based on multiple criterias:

sel_continents = ['America', 'Europe', 'Asia'] df.query('(Continent in @sel_continents) and (`Elevation high` > 8000)')

This returns:

Continent Highest point Elevation high Lowest point Elevation low

0 Asia Mount Everest 8848 Dead Sea −427

5. Select rows by values - df.loc + df.apply(lambda

Finally let's check a slower but more flexible way of indexing by list of values in Pandas.

5.1. Select rows by function

The basic selection by df.loc + df.apply(lambda does boolean indexing based on lambda function applied over the rows:

sel_continents = ['America', 'Europe', 'Asia'] df.loc[df.apply(lambda x: x['Continent'] in sel_continents, axis=1)]

result:

Continent Highest point Elevation high Lowest point Elevation low

0 Asia Mount Everest 8848 Dead Sea −427

4 Europe Mount Elbrus 5642 Caspian Sea −28

× To select rows by a custom function use the next syntax:

def some_function(x): if x in sel_continents: return True else: return False sel_continents = ['America', 'Europe', 'Asia'] df[df['Continent'].apply(some_function)]

5.2. Case-insensitive search

Now let's say that you would like to perform case-insensitive search by list of values:

sel_continents = ['America', 'Europe', 'Asia'] sel_continents = [item.lower() for item in sel_continents] df.loc[df.apply(lambda x: x['Continent'].lower() in sel_continents, axis=1)]

output:

Continent Highest point Elevation high Lowest point Elevation low

0 Asia Mount Everest 8848 Dead Sea −427

4 Europe Mount Elbrus 5642 Caspian Sea −28

5.3. Select rows by partial match based on list of values

Custom functions will help us to select rows by multiple values with partial match. For example getting list of rows which contains part of the next values:

['America', 'Europe', 'Asia']

def some_function(x): flag = False for continent in sel_continents: if continent in x or continent == x: flag = True return flag sel_continents = ['America', 'Europe', 'Asia'] df[df['Continent'].apply(some_function)]

in the result we have the rows for America:

Continent Highest point Elevation high Lowest point Elevation low

0 Asia Mount Everest 8848 Dead Sea −427

1 South America Aconcagua 6960 Laguna del Carbón −105

2 North America Denali 6198 Death Valley −86

4 Europe Mount Elbrus 5642 Caspian Sea −28

6. Conclusion

In this article, we looked at different solutions for selection and indexing of rows based on values. We focused on the basics and followed some advanced techniques like SQL like query and selection by functions.

The code for the examples is available: on GitHub

> Basic concepts

> Installations

> Series

> DataFrame

> Create

> Data Types

> Exercise

> Cheat Sheet

> Basic concepts

> Row

> Column

> Index

> MultiIndex

> Exercise

> Basic concepts

> read_csv()

> read_excel()

> Kaggle

> Exercise

> read_xml()

> read_json()

> to_csv()

> to_dict()

> to_json()

> Basic concepts

> groupby()

> Reshape

> melt()

> Exercise

> Pivot

> merge()

> Filter

> Basic concepts

> replace()

> split()

> Regex

> Search

> Exercise

> Find

> Basic concepts

> apply()

> aggfunc

> Convert

> count()

> Other

> Exercise

> map()

> Basic concepts

> Data Validation

> Data Cleaning

> Duplicate

> Time Series

> Pandas Error

> Get

> Basic concepts

> Styling

> Table

> Display

> DataIsBeautiful

> Beginners

> Data Science Projects

> Newsletter

1. Overview

2. Setup

3. Select rows by values with method isin()

3.1. Positive selection

3.2. Negative selection

4. Query rows by values - method df.query()

4.1. Query by list of values

4.2. Select rows by range of numbers

4.3. Select rows by multiple queries

5. Select rows by values - df.loc + df.apply(lambda

5.1. Select rows by function

5.2. Case-insensitive search

5.3. Select rows by partial match based on list of values

6. Conclusion

5. Select rows by values - `df.loc` + `df.apply(lambda`