When working with text data in Pandas, you may need to split strings at the n-th occurrence of a delimiter and extract specific parts.

This is useful for parsing URLs, file paths, or other structured strings. Pandas handles such operations efficiently with str.split(), its n parameter (the maximum number of splits), and expand=True.

(1) Keep everything before the n-th occurrence of the delimiter (here n=2)

df['column_name'].str.split('-').str[:2].str.join('-')
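A minimal sketch of case (1), assuming a hypothetical helper named before_nth (not part of pandas) and made-up sample values. It splits on every delimiter, keeps the first n pieces with .str[:n], and joins them back with .str.join().

```python
import pandas as pd

def before_nth(s: pd.Series, sep: str, n: int) -> pd.Series:
    """Hypothetical helper: return the text before the n-th occurrence of sep.

    Strings with fewer than n occurrences of sep are returned unchanged.
    """
    # Split on every occurrence, keep the first n pieces, re-join them.
    return s.str.split(sep).str[:n].str.join(sep)

s = pd.Series(['a-b-c-d', 'x-y', 'solo'])
print(before_nth(s, '-', 2).tolist())  # ['a-b', 'x-y', 'solo']
```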

(2) Extract the n-th part from a split string (columns are numbered from 0, so [2] is the third part)

df['column_name'].str.split('-', expand=True)[2]
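A quick check of case (2) on made-up values shows that the columns created by expand=True are labelled from 0 and that rows with fewer pieces are padded with missing values:

```python
import pandas as pd

s = pd.Series(['a-b-c-d', 'x-y'])

# Each piece becomes a column labelled 0, 1, 2, ...; shorter rows are padded.
parts = s.str.split('-', expand=True)
print(parts.shape)  # (2, 4)

# Column 2 holds the third piece; 'x-y' has no third piece, so it is missing.
third = parts[2]
print(third.tolist())
```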

1. Sample data

import pandas as pd

data = ['https://example.com/search?q=avatar', 
        'https://example.com/profile/avatar', 
        'https://example.com/map']
df = pd.DataFrame({'text': data})

df

The data looks like this:

                                  text
0  https://example.com/search?q=avatar
1   https://example.com/profile/avatar
2              https://example.com/map

2. Splitting a String at the n-th Occurrence

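The n parameter of str.split() limits the number of splits: the string is split at the first n occurrences of the delimiter, and everything after the n-th occurrence stays together in the last piece. We can extract everything after the domain (the URL path) with: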

df['text'].str.split('/', n=3, expand=True)[3]

Output:

0    search?q=avatar
1     profile/avatar
2                map
Name: 3, dtype: object
  • n=3 limits the split to the first 3 occurrences of '/', so each row has at most 4 pieces
  • [3] selects column 3 (labels start at 0), i.e. everything after the third '/'
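The same pattern generalizes: with n splits, piece n (counting from 0) is everything after the n-th occurrence. Below is a minimal sketch with a hypothetical helper named after_nth; the extra https://example.com row is an assumption added to show that rows with too few delimiters come back as missing values.

```python
import pandas as pd

def after_nth(s: pd.Series, sep: str, n: int) -> pd.Series:
    """Hypothetical helper: return everything after the n-th occurrence of sep.

    With n splits there are at most n + 1 pieces; piece n (0-based) is the
    remainder. Rows with fewer than n occurrences yield a missing value.
    """
    return s.str.split(sep, n=n).str[n]

urls = pd.Series(['https://example.com/search?q=avatar',
                  'https://example.com/profile/avatar',
                  'https://example.com/map',
                  'https://example.com'])  # only two '/' characters
result = after_nth(urls, '/', 3)
print(result)
```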

Below is the full DataFrame produced by the split (column 1 holds an empty string because of the two consecutive slashes in https://):

        0  1            2                3
0  https:     example.com  search?q=avatar
1  https:     example.com   profile/avatar
2  https:     example.com              map
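If you do not need the empty piece between the two slashes of https://, one option (a sketch, not the only approach) is to keep only the columns that contain at least one non-empty value:

```python
import pandas as pd

df = pd.DataFrame({'text': ['https://example.com/search?q=avatar',
                            'https://example.com/profile/avatar',
                            'https://example.com/map']})

parts = df['text'].str.split('/', n=3, expand=True)

# Column 1 holds the empty string between the two slashes of "https://".
# Keep only the columns that contain at least one non-empty value.
non_empty = parts.loc[:, (parts != '').any()]
print(non_empty.columns.tolist())  # [0, 2, 3]
```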

3. Extracting the nth Element from a Split String

With expand=True, each piece becomes its own column, so you can pick the n-th piece by its column label or assign all pieces to named columns at once:

df[['protocol', 'empty', 'domain', 'method', 'param']] = df['text'].str.split('/', expand=True)
df

Output:

                                  text  protocol  empty       domain           method   param
0  https://example.com/search?q=avatar    https:         example.com  search?q=avatar    None
1   https://example.com/profile/avatar    https:         example.com          profile  avatar
2              https://example.com/map    https:         example.com              map    None
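The list of names on the left of the assignment must match the number of columns the split produces, and that number depends on the longest string (here the profile/avatar row yields five pieces). One way to keep the column count predictable, sketched below with illustrative column names (protocol, domain, path are not required by pandas), is to cap the split with n and pick the columns you need by position:

```python
import pandas as pd

df = pd.DataFrame({'text': ['https://example.com/search?q=avatar',
                            'https://example.com/profile/avatar',
                            'https://example.com/map']})

# n=3 caps the result at 4 columns at most, however many '/' a URL contains.
parts = df['text'].str.split('/', n=3, expand=True)

# Pick columns by position; the names are illustrative only.
df['protocol'] = parts[0]
df['domain'] = parts[2]
df['path'] = parts[3]
print(df[['protocol', 'domain', 'path']])
```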

4. Keeping Only the Last Two Parts of a Split String

For cases like domain extraction (reducing www.example.com to example.com by splitting on '.'), keep only the last two parts. Applied to the URLs with '/' as the delimiter, the same idea keeps the last two segments:

df['text'].apply(lambda x: '/'.join(x.split('/')[-2:]))

Output:

0    example.com/search?q=avatar
1                 profile/avatar
2                example.com/map
Name: text, dtype: object
  • x.split('/')[-2:] keeps only the last two elements.
  • '/'.join(...) reconstructs the truncated string.
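The same result can be obtained without apply() by chaining .str methods. The sketch below also shows the domain-style case on made-up host names; keeping the last two labels is only a heuristic and will mishandle suffixes such as co.uk.

```python
import pandas as pd

urls = pd.Series(['https://example.com/search?q=avatar',
                  'https://example.com/profile/avatar',
                  'https://example.com/map'])

# Split on every '/', slice the last two pieces, and re-join them.
last_two = urls.str.split('/').str[-2:].str.join('/')
print(last_two.tolist())

# Domain-style extraction: keep the last two dot-separated labels.
hosts = pd.Series(['www.example.com', 'example.com', 'api.dev.example.com'])
domains = hosts.str.split('.').str[-2:].str.join('.')
print(domains.tolist())  # ['example.com', 'example.com', 'example.com']
```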

5. Conclusion

Pandas provides multiple ways to split strings based on the nth occurrence of a delimiter. Whether you need to keep a portion of the string, extract a specific element, or retain only the last few parts, str.split() and apply() are effective tools for data transformation.

