1. Overview

In this post, we'll explore how to select rows by list of values in Pandas. We'll discuss the basic indexing, which is needed in order to perform selection by values.

Row selection is also known as indexing. There are several ways to select rows by multiple values:

  • isin() - Pandas way - exact match from list of values
  • df.query() - SQL like way
  • df.loc + df.apply(lambda - when custom function is needed to be applied; more flexible way

2. Setup

We'll use the following DataFrame, which consists of several rows and columns:

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/softhints/Pandas-Tutorials/master/data/csv/extremes.csv')

DataFrame looks like:

Continent Highest point Elevation high Lowest point Elevation low
Asia Mount Everest 8848 Dead Sea −427
South America Aconcagua 6960 Laguna del Carbón −105
North America Denali 6198 Death Valley −86
Africa Mount Kilimanjaro 5895 Lake Assal −155
Europe Mount Elbrus 5642 Caspian Sea −28

Then let's suppose that we would like to search by different lists from single or multiple columns:

  • ['America', 'Europe', 'Asia']
  • ['America', 'Europe', 'Asia'] and range of numbers [5000, 8000]

3. Select rows by values with method isin()

The method pandas.DataFrame.isin is probably the most popular way for selection by exact match of list of values.

3.1. Positive selection

Let's find all rows which match exactly one of the next values in column - Continent:

  • ['America', 'Europe', 'Asia']
sel_continents = ['America', 'Europe', 'Asia']

df[df['Continent'].isin(sel_continents)]

result will be:

Continent Highest point Elevation high Lowest point Elevation low
0 Asia Mount Everest 8848 Dead Sea −427
4 Europe Mount Elbrus 5642 Caspian Sea −28
Note:

Since method `isin` works by exact match rows for America are not selected.

3.2. Negative selection

To get all rows which doesn't include list of values you can use operator: ~:

df[~df['Continent'].isin(sel_continents)]

result:

Continent Highest point Elevation high Lowest point Elevation low
1 South America Aconcagua 6960 Laguna del Carbón −105
2 North America Denali 6198 Death Valley −86
3 Africa Mount Kilimanjaro 5895 Lake Assal −155
5 Antarctica Vinson Massif 4892 Deep Lake, Vestfold Hills −50
6 Australia Puncak Jaya 4884 Lake Eyre −15

4. Query rows by values - method df.query()

Let's now look at how we can do a query based on a list of values. The method: pandas.DataFrame.query gives us the opportunity to select rows in SQL fashion.

4.1. Query by list of values

To select rows by query and multiple values use the next syntax:

sel_continents = ['America', 'Europe', 'Asia']
df.query('Continent in @sel_continents')

The resulted rows are:

Continent Highest point Elevation high Lowest point Elevation low
0 Asia Mount Everest 8848 Dead Sea −427
4 Europe Mount Elbrus 5642 Caspian Sea −28
× You can refer to variables in the environment by prefixing them with an ‘@’ character like @a + b.
Note:

For columns with spaces in their name, you can use backtick quoting:

df.query('B == `C C`')

which is equivalent to:

df[df.B == df['C C']]

4.2. Select rows by range of numbers

In case of range of values query method can be used as follows:

df.query('5000 < `Elevation high` < 6000')

result:

Continent Highest point Elevation high Lowest point Elevation low
0 Asia Mount Everest 8848 Dead Sea −427
4 Europe Mount Elbrus 5642 Caspian Sea −28

4.3. Select rows by multiple queries

Selecting by df.query() is considered to be the fastest way to get rows by values.

So it's suitable for searching based on multiple criterias:

sel_continents = ['America', 'Europe', 'Asia']
df.query('(Continent in @sel_continents) and (`Elevation high` > 8000)')

This returns:

Continent Highest point Elevation high Lowest point Elevation low
0 Asia Mount Everest 8848 Dead Sea −427

5. Select rows by values - df.loc + df.apply(lambda

Finally let's check a slower but more flexible way of indexing by list of values in Pandas.

5.1. Select rows by function

The basic selection by df.loc + df.apply(lambda does boolean indexing based on lambda function applied over the rows:

sel_continents = ['America', 'Europe', 'Asia']

df.loc[df.apply(lambda x: x['Continent'] in sel_continents, axis=1)]

result:

Continent Highest point Elevation high Lowest point Elevation low
0 Asia Mount Everest 8848 Dead Sea −427
4 Europe Mount Elbrus 5642 Caspian Sea −28
× To select rows by a custom function use the next syntax:
def some_function(x):
    if x in sel_continents:
        return True
    else:
        return False

sel_continents = ['America', 'Europe', 'Asia']    
df[df['Continent'].apply(some_function)]

Now let's say that you would like to perform case-insensitive search by list of values:

sel_continents = ['America', 'Europe', 'Asia']
sel_continents = [item.lower() for item in sel_continents]
df.loc[df.apply(lambda x: x['Continent'].lower() in sel_continents, axis=1)]

output:

Continent Highest point Elevation high Lowest point Elevation low
0 Asia Mount Everest 8848 Dead Sea −427
4 Europe Mount Elbrus 5642 Caspian Sea −28

5.3. Select rows by partial match based on list of values

Custom functions will help us to select rows by multiple values with partial match. For example getting list of rows which contains part of the next values:

  • ['America', 'Europe', 'Asia']
def some_function(x):
    flag = False
    for continent in sel_continents:
        if continent in x or continent == x:
            flag = True
    return flag

sel_continents = ['America', 'Europe', 'Asia']    
df[df['Continent'].apply(some_function)]

in the result we have the rows for America:

Continent Highest point Elevation high Lowest point Elevation low
0 Asia Mount Everest 8848 Dead Sea −427
1 South America Aconcagua 6960 Laguna del Carbón −105
2 North America Denali 6198 Death Valley −86
4 Europe Mount Elbrus 5642 Caspian Sea −28

6. Conclusion

In this article, we looked at different solutions for selection and indexing of rows based on values. We focused on the basics and followed some advanced techniques like SQL like query and selection by functions.

The code for the examples is available: on GitHub