In this short post we will see how to split a column into multiple parts and then extract only the last component or the last non-null value. This is common with file paths, URLs, codes, delimiter-separated strings, and dirty data.
For example, if your column contains:
- "user/home/file.txt", you might want "file.txt"
- "New York,US", you might want "US"
Let's learn how to split a Pandas column and get the last part using simple, efficient methods.
We can use the following syntax to split a column and get the last part in Pandas:
(1) Use .str.split() and .str[-1]
df['filename'] = df['path'].str.split('/').str[-1]
(2) Use rsplit() with n=1
df['filename'] = df['path'].str.rsplit('/', n=1).str[1]
(3) Get the next-to-last part
df['filename'] = df['path'].str.rsplit('/', n=2).str[1]
Example DataFrame
Let’s suppose you have the following DataFrame:
import pandas as pd
df = pd.DataFrame({
    'path': [
        'home/user/data.csv',
        'var/log/errors.log',
        'tmp/cache/file.txt'
    ]
})
print(df)
Output:
| | path |
|---|---|
| 0 | home/user/data.csv |
| 1 | var/log/errors.log |
| 2 | tmp/cache/file.txt |
1: Use .str.split() and .str[-1]
The easiest way to extract the last element after splitting is to use str.split() followed by .str[-1]:
df['filename'] = df['path'].str.split('/').str[-1]
Result:
| | path | filename |
|---|---|---|
| 0 | home/user/data.csv | data.csv |
| 1 | var/log/errors.log | errors.log |
| 2 | tmp/cache/file.txt | file.txt |
Here, '/' is the separator, and .str[-1] selects the last item.
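The intro also mentions the last non-null value, so here is a minimal sketch of both variants on rows with a different number of parts. The sample paths are made up for illustration and are not part of the example DataFrame above. It assumes that `expand=True` pads shorter rows with missing values, which are then forward-filled across the columns.

```python
import pandas as pd

# Hypothetical sample: paths with a different number of components.
paths = pd.Series(['home/user/data.csv', 'errors.log', 'a/b/c/d/file.txt'])

# .str[-1] picks the last element of each list, whatever its length.
last = paths.str.split('/').str[-1]

# expand=True returns one column per part; shorter rows are padded with
# missing values. Forward-filling along the columns and taking the last
# column gives the last non-null part of every row.
wide = paths.str.split('/', expand=True)
last_non_null = wide.ffill(axis=1).iloc[:, -1]

print(last.tolist())
print(last_non_null.tolist())
```

Both approaches return the same values here. The `.str[-1]` form is shorter, while the `expand=True` form is handy when you also need the other parts as separate columns.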
2: Use rsplit() with n=1
If you want better performance (especially on long strings), you can use .str.rsplit() with a limit:
df['filename'] = df['path'].str.rsplit('/', n=1).str[1]
rsplit() splits from the right, and n=1 ensures only one split is performed.
Result:
| | path | filename |
|---|---|---|
| 0 | home/user/data.csv | data.csv |
| 1 | var/log/errors.log | errors.log |
| 2 | tmp/cache/file.txt | file.txt |
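One thing to keep in mind with `.str[1]`: it assumes every value contains the separator. The sketch below uses a made-up value without any `/` to show the difference between indexing by position and indexing from the end.

```python
import pandas as pd

# Hypothetical sample: the second value contains no '/' at all.
paths = pd.Series(['var/log/errors.log', 'README'])

# At most one split, starting from the right.
parts = paths.str.rsplit('/', n=1)

# Position 1 does not exist for 'README', so the result is missing there.
by_position = parts.str[1]

# Index -1 always exists, so 'README' is returned unchanged.
by_last = parts.str[-1]

print(parts.tolist())
print(by_position.isna().tolist())
print(by_last.tolist())
```

If some rows may lack the separator, `.str[-1]` is probably the safer choice. If a missing value is the behavior you want for such rows, `.str[1]` does that.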
3: Get the next-to-last part
Finally, if you need the next-to-last part (or any other position counted from the end), you can adjust the n parameter:
df['filename'] = df['path'].str.rsplit('/', n=2).str[1]
rsplit() splits from the right, and n=2 performs at most two splits, so for strings with at least two separators index 1 holds the next-to-last part.
Result:
| | path | filename |
|---|---|---|
| 0 | home/user/data.csv | user |
| 1 | var/log/errors.log | log |
| 2 | tmp/cache/file.txt | cache |
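The same idea can be generalized to "the k-th part from the end". The helper below is a hypothetical sketch (the name `nth_from_end` and the sample paths are not from the original example). It uses a negative index instead of `.str[1]`, so it does not depend on how many separators a string has.

```python
import pandas as pd

def nth_from_end(series, sep, k):
    """Return the k-th component counted from the right (k=1 is the last)."""
    # At most k splits from the right are enough to isolate the last k parts;
    # the negative index then selects the k-th part from the end.
    return series.str.rsplit(sep, n=k).str[-k]

# Hypothetical sample with paths of different depth.
paths = pd.Series(['home/user/data.csv', 'srv/www/site/static/app.js'])

parent = nth_from_end(paths, '/', 2)    # next-to-last part
filename = nth_from_end(paths, '/', 1)  # last part

print(parent.tolist())
print(filename.tolist())
```

For strings with at least two separators, `.str[-2]` after `rsplit(sep, n=2)` selects the same element as `.str[1]` in the example above. Rows with fewer than k parts come back as missing values.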
Summary
To extract the last portion of a string in a Pandas column:
- Use str.split() with .str[-1] for simple cases.
- Use str.rsplit() with n=1 for slightly better performance on long text.
This technique is especially useful for handling file paths, URLs, and delimiter-separated codes.
Resources
- Pandas str.split() Documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.split.html
- Pandas str.rsplit() Documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.rsplit.html
- Get last "column" after .str.split() operation on column in pandas DataFrame