To search for a string in all columns of a Pandas DataFrame we can use two different ways:

(1) Lambda and str.contains

df.apply(lambda row: row.astype(str).str.contains('data').any(), axis=1)

(2) np.column_stack + str.contains

import numpy as np
mask = np.column_stack([df[col].astype(str).str.contains("data", na=False) for col in df])
df.loc[mask.any(axis=1)]

Let's check two examples on how to use the above techniques in practice.

To start with DataFrame like:

from IPython.display import HTML
import pandas as pd

df = pd.DataFrame({
    'id':[1,2,3,4],
    'name':['Softhints\nLinux', 'dataplotplus', 'DataScientyst\nPandas', 'test'],
    'url':['https://www.softhints.com', 'https://dataplotplus.com/', 'https://datascientyst.com', 'test\data']
})

which has this data:

id name url
1 Softhints\nLinux https://www.softhints.com
2 dataplotplus https://dataplotplus.com/
3 DataScientyst\nPandas https://datascientyst.com
4 test test\data

Search whole DataFrame with lambda and str.contains

Searching with lambda and str.contains is straightforward:

df.apply(lambda row: row.astype(str).str.contains('data').any(), axis=1)

The lambda will iterate over all rows. Then we will convert the values to string - in order to avoid errors.

The converted data will be searched for a string pattern - in this case data.

Without method any we will get the search result for each value:

id name url
False False False
False True True
False False True
False False True

So method any will return True if there is at least one True value per row.

So the final output is:

0 False
1 True
2 True
3 True
dtype: bool

Search whole DataFrame with numpy and str.contains

As an alternative solution you can use the Numpy method - column_stack to find all values in all columns. This solution is faster than the previous one.

import numpy as np
mask = np.column_stack([df[col].astype(str).str.contains("data", na=False) for col in df])
df.loc[mask.any(axis=1)]

How does it work?

So the code:

[df[col].astype(str).str.contains("data", na=False) for col in df]

will iterate over all columns and then will convert values to string. Then we perform a search for a given value.

The result would be:

[0    False
 1    False
 2    False
 3    False
 Name: id, dtype: bool,
 0    False
 1     True
 2    False
 3    False
 Name: name, dtype: bool,
 0    False
 1     True
 2     True
 3     True
 Name: url, dtype: bool]

Method np.column_stack will convert the above result into array:

array([[False, False, False],
       [False,  True,  True],
       [False, False,  True],
       [False, False,  True]])

Again we are going to use method any in order to return a single True or False value per row. The method loc will return only the rows which contain the searched value.

df.loc[mask.any(axis=1)]

Resources