How to Extract Capital Words from a Pandas DataFrame

If you're working with textual data in a Pandas DataFrame and want to find all words written in uppercase, there are several simple ways to do it using Python.

A "capital word" here means a word where every letter is uppercase (like JAVA or PYTHON).

This is useful when cleaning data, detecting acronyms, or filtering for entries that stand out in text data.

Example DataFrame

Here’s a sample DataFrame we’ll work with:

import pandas as pd

data = {
    'country': ['JAPAN', 'INDIA', 'CHINA', 'FRANCE'],
    'users': [30, 15, 25, 3],
    'city': ['TOKYO', 'Delhi', 'Beijing', 'PARIS']
}

df = pd.DataFrame(data)
df

This results in:

	country	users	city
0	JAPAN	30	TOKYO
1	INDIA	15	Delhi
2	CHINA	25	Beijing
3	FRANCE	3	PARIS

1. Use `str.isupper()` for Simple Matching

The easiest way is to convert each element to a string and check if it is uppercase:

import pandas as pd

caps = []

for col in df.columns:
    for val in df[col]:
        if str(val).isupper():
            caps.append(val)

print(caps)

Output:

['JAPAN', 'INDIA', 'CHINA', 'FRANCE', 'TOKYO', 'PARIS']

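This method checks whether the string version of each cell is all uppercase. Numeric cells such as 30 are skipped because a string with no letters is never considered uppercase.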
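One subtlety worth knowing: `str.isupper()` only looks at characters that have a case, so digits, spaces, and punctuation do not disqualify a value. The short sketch below illustrates this with a few made-up sample values (they are hypothetical and not taken from the DataFrame above):

```python
# Hypothetical sample values, chosen only to show how str.isupper() behaves
samples = ['JAPAN', 'Delhi', 'US2024', 'NEW YORK', '30', '']

# Map each sample to the result of str.isupper()
results = {s: s.isupper() for s in samples}

for value, is_upper in results.items():
    print(repr(value), is_upper)
```

Here 'US2024' and 'NEW YORK' both count as uppercase, while '30' and the empty string do not. If you need letters only, the regex approach is the stricter choice.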

2. Use a Regex Pattern

If you prefer regular expressions, you can match uppercase words using a pattern like r'^[A-Z]+$':

import re

caps = []

for col in df.columns:
    for val in df[col]:
        if re.match(r'^[A-Z]+$', str(val)):
            caps.append(val)

print(caps)

Output:

['JAPAN', 'INDIA', 'CHINA', 'FRANCE', 'TOKYO', 'PARIS']

This matches values made only of the ASCII letters A-Z, so numbers, mixed-case strings, and values containing digits or spaces are excluded.
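The pattern above tests a whole cell. If a cell holds a longer piece of text, a similar pattern with word boundaries can pull out the uppercase words inside it. The following is a minimal sketch, assuming a free-text column; the `notes` Series and its contents are invented for illustration:

```python
import pandas as pd

# Hypothetical free-text column, not part of the article's DataFrame
notes = pd.Series([
    'Shipped via DHL to TOKYO',
    'No acronyms here',
    'NASA and ESA',
])

# \b marks a word boundary; [A-Z]+ matches a run of ASCII uppercase letters
words_per_row = notes.str.findall(r'\b[A-Z]+\b')

# Flatten the per-row lists into a single list
upper_words = [word for row in words_per_row for word in row]
print(upper_words)
```

This prints ['DHL', 'TOKYO', 'NASA', 'ESA']. Note that single-letter words such as "A" or "I" would also match; using `[A-Z]{2,}` instead is one way to skip them.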

3. Apply Across Entire DataFrame

You can also use DataFrame.map() (available in pandas 2.1 and later; older versions call it applymap()) to test every cell at once and collect uppercase words:

caps = df.map(lambda x: str(x).isupper())

uppercase_values = df[caps].stack().tolist()
print(uppercase_values)

This returns the same uppercase words, but ordered row by row instead of column by column:

['JAPAN', 'TOKYO', 'INDIA', 'CHINA', 'FRANCE', 'PARIS']
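The snippet above depends on the installed pandas version in two places: the name of the element-wise method, and whether `stack()` discards the NaN placeholders left by the boolean mask. The sketch below is one possible variant that avoids both assumptions by picking whichever method exists and reading the mask directly. The helper name `is_upper` is ours, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['JAPAN', 'INDIA', 'CHINA', 'FRANCE'],
    'users': [30, 15, 25, 3],
    'city': ['TOKYO', 'Delhi', 'Beijing', 'PARIS'],
})

def is_upper(value):
    # Hypothetical helper: True when the cell's string form is all uppercase
    return str(value).isupper()

# Use DataFrame.map when it exists, otherwise fall back to applymap
elementwise = df.map if hasattr(df, 'map') else df.applymap
mask = elementwise(is_upper)

# Walk values and mask together, row by row, keeping the flagged cells
uppercase_values = [
    value
    for row_values, row_mask in zip(df.to_numpy(), mask.to_numpy())
    for value, flag in zip(row_values, row_mask)
    if flag
]
print(uppercase_values)
```

Because the cells are read straight from the mask, no NaN values are created and the numeric column keeps its original dtype.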

4. Extract Words Starting with Capital Letters

We can also extract words that start with a capital letter and continue in lowercase, again using a regex:

import re

caps = []

for col in df.columns:
    for val in df[col]:
        if re.match(r'^[A-Z][a-z]+$', str(val)):
            caps.append(val)

print(caps)

Output:

['Delhi', 'Beijing']
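The nested loops work well for small tables. For larger ones, the same check can be done one column at a time with the pandas string accessor. This is a minimal sketch, assuming only non-numeric columns are of interest; it uses `Series.str.fullmatch`, which requires the whole value to match so the `^` and `$` anchors are not needed:

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['JAPAN', 'INDIA', 'CHINA', 'FRANCE'],
    'users': [30, 15, 25, 3],
    'city': ['TOKYO', 'Delhi', 'Beijing', 'PARIS'],
})

caps = []

# Numeric columns cannot hold words, so skip them
for col in df.select_dtypes(exclude='number').columns:
    # True where the value is one capital letter followed by lowercase letters
    matches = df[col].astype(str).str.fullmatch(r'[A-Z][a-z]+')
    caps.extend(df.loc[matches, col].tolist())

print(caps)
```

This prints ['Delhi', 'Beijing'], the same result as the loop version.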
