If you're working with textual data in a Pandas DataFrame and want to find all words written in uppercase, there are several simple ways to do it using Python.
A "capital word" here means a word where every letter is uppercase (like JAVA or PYTHON).
This is useful when cleaning data, detecting acronyms, or filtering for entries that stand out in text data.
Example DataFrame
Here’s a sample DataFrame we’ll work with:
```python
import pandas as pd

data = {
    'country': ['JAPAN', 'INDIA', 'CHINA', 'FRANCE'],
    'users': [30, 15, 25, 3],
    'city': ['TOKYO', 'Delhi', 'Beijing', 'PARIS']
}
df = pd.DataFrame(data)
df  # displays the table in a notebook; use print(df) in a plain script
```
This results in:
| | country | users | city |
|---|---|---|---|
| 0 | JAPAN | 30 | TOKYO |
| 1 | INDIA | 15 | Delhi |
| 2 | CHINA | 25 | Beijing |
| 3 | FRANCE | 3 | PARIS |
1. Use str.isupper() for Simple Matching
The easiest way is to convert each element to a string and check if it is uppercase:
```python
import pandas as pd

caps = []
for col in df.columns:
    for val in df[col]:
        # str() lets numeric cells be checked too; 30 becomes '30', which is not uppercase
        if str(val).isupper():
            caps.append(val)
print(caps)
```
Output:
['JAPAN', 'INDIA', 'CHINA', 'FRANCE', 'TOKYO', 'PARIS']
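This method converts each cell to a string and keeps it when str.isupper() returns True, which happens only if the string contains at least one cased letter and all of its cased letters are uppercase. Numbers such as 30 are therefore skipped.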
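Because str.isupper() ignores digits, spaces, and punctuation, it accepts some strings that are not a single all-letter word. The short check below runs a few hand-picked strings (chosen for illustration, not taken from the DataFrame) through it so the rule is visible:

```python
# Hand-picked strings, chosen only to show how str.isupper() behaves
samples = ["JAVA", "Java", "HELLO WORLD", "A1", "30", "", "ÉCOLE"]

# isupper() needs at least one cased character, and every cased character must be uppercase
results = {s: s.isupper() for s in samples}

for s, ok in results.items():
    print(f"{s!r:15} {ok}")
```

"HELLO WORLD" and "A1" pass because the space and the digit are ignored, while "30" and the empty string fail because they contain no cased characters at all. Accented capitals such as "É" also count as uppercase, which is one way this check differs from the [A-Z] regex in the next method.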
2. Use a Regex Pattern
If you prefer regular expressions, you can match uppercase words using a pattern like r'^[A-Z]+$':
```python
import re

caps = []
for col in df.columns:
    for val in df[col]:
        # ^ and $ anchor the pattern, so the whole string must be uppercase A-Z letters
        if re.match(r'^[A-Z]+$', str(val)):
            caps.append(val)
print(caps)
```
Output:
['JAPAN', 'INDIA', 'CHINA', 'FRANCE', 'TOKYO', 'PARIS']
This pattern matches values made only of the letters A to Z, so numbers, mixed-case strings, and strings containing spaces or digits are excluded.
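The same regex can be applied one column at a time instead of one cell at a time. The sketch below is an alternative, not the method above: it assumes pandas 1.1 or later, where Series.str.fullmatch is available, and converts each column to strings first so the numeric users column can be tested as well.

```python
import pandas as pd

data = {
    'country': ['JAPAN', 'INDIA', 'CHINA', 'FRANCE'],
    'users': [30, 15, 25, 3],
    'city': ['TOKYO', 'Delhi', 'Beijing', 'PARIS'],
}
df = pd.DataFrame(data)

caps = []
for col in df.columns:
    # Convert to strings so the .str accessor also works on numeric columns
    as_text = df[col].astype(str)
    # fullmatch anchors the pattern at both ends, so ^ and $ are not needed
    mask = as_text.str.fullmatch(r'[A-Z]+')
    # Keep the original values of the rows where the mask is True
    caps.extend(df.loc[mask, col].tolist())

print(caps)
```

This visits the columns in the same order as the loop above, so it should produce the same list; the difference is that the matching inside each column is handled by pandas rather than a Python-level loop over cells.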
3. Apply Across the Entire DataFrame
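You can also use DataFrame.map() to build a Boolean mask for every cell at once, then use that mask to collect the uppercase words. DataFrame.map() requires pandas 2.1 or later; older versions provide the same element-wise behavior as applymap():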
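```python
# Boolean DataFrame: True where the cell's string form is uppercase
caps = df.map(lambda x: str(x).isupper())
# df[caps] keeps matching cells and sets the rest to NaN; dropna() removes any NaN left after stacking
uppercase_values = df[caps].stack().dropna().tolist()
print(uppercase_values)
```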
This returns the same uppercase words, but in row-by-row order, because stack() reads the masked DataFrame one row at a time:
['JAPAN', 'TOKYO', 'INDIA', 'CHINA', 'FRANCE', 'PARIS']
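If you also need to know where each uppercase word sits, one option is to keep the stacked Series instead of turning it into a list, since its index holds (row, column) pairs. The sketch below also builds the mask column by column with the vectorized Series.str.isupper() rather than a per-cell lambda; this is a variation on the approach above, chosen here because it does not depend on DataFrame.map() being available.

```python
import pandas as pd

data = {
    'country': ['JAPAN', 'INDIA', 'CHINA', 'FRANCE'],
    'users': [30, 15, 25, 3],
    'city': ['TOKYO', 'Delhi', 'Beijing', 'PARIS'],
}
df = pd.DataFrame(data)

# Boolean DataFrame built one column at a time with the vectorized string method
caps = df.astype(str).apply(lambda s: s.str.isupper())

# where() blanks out non-matching cells; stack() yields a Series indexed by (row, column)
located = df.where(caps).stack().dropna()

for (row, col), value in located.items():
    print(row, col, value)
```

Each printed line pairs a row label and column name with the uppercase value found there, for example row 0 in both country and city.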
4. Extract Words Starting with a Capital Letter
A similar regex extracts words that start with one capital letter followed only by lowercase letters (title case), such as Delhi:
```python
import re

caps = []
for col in df.columns:
    for val in df[col]:
        # One capital letter followed by one or more lowercase letters, and nothing else
        if re.match(r'^[A-Z][a-z]+$', str(val)):
            caps.append(val)
print(caps)
```
Output:
['Delhi', 'Beijing']
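Every cell in this example holds a single word. If a column holds free text instead, a possible variation is to pull individual words out of each cell with Series.str.findall and a word-boundary pattern. The notes Series below is hypothetical sample data, not part of the DataFrame above.

```python
import pandas as pd

# Hypothetical free-text column, made up for illustration
notes = pd.Series([
    'Flights from TOKYO to Delhi',
    'NASA and ESA joined the call',
    'no capitals here',
])

# \b marks word boundaries, so each match is one whole word
all_caps = notes.str.findall(r'\b[A-Z]+\b')
capitalized = notes.str.findall(r'\b[A-Z][a-z]+\b')

# explode() puts one word per row; dropna() discards rows that had no match
print(all_caps.explode().dropna().tolist())
print(capitalized.explode().dropna().tolist())
```

With these sample sentences the first list contains the all-caps words TOKYO, NASA, and ESA, and the second contains the title-case words Flights and Delhi; the third sentence contributes nothing to either.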