How To Split Column by Multiple Characters with Regex in Pandas

To split a Pandas column by multiple characters, we can use a regex pattern that matches any of the separators, for example:

  • df['address'].str.split('; |, |\n', expand=True)
  • df['address'].str.extract(r'(.*)\n(.*)')
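A minimal, self-contained sketch of both one-liners on two of the sample addresses used later in this article. The variable names split_df and extract_df are only illustrative.

```python
import pandas as pd

# Two sample addresses copied from the data used in this article
df = pd.DataFrame({'address': [
    '48764 Howard Forge Apt. 421\nVanessaside, VT 79393',
    '778 Brown Plaza\nNorth Jenniferfurt, VT 88077',
]})

# Split on "; ", ", " or a newline; expand=True returns a DataFrame
split_df = df['address'].str.split('; |, |\n', expand=True)

# Capture the text before and after the newline as two columns
extract_df = df['address'].str.extract(r'(.*)\n(.*)')

print(split_df)
print(extract_df)
```

str.split produces three columns (street, city, state plus ZIP), while this simpler str.extract pattern produces two, because it only breaks the string at the newline.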

Steps to split a column in Pandas

  • Import the Pandas library
  • Create a DataFrame with the text column to split
  • Build a regex pattern that matches every separator, e.g. '; |, |\n'
  • Split the column with str.split(pattern, expand=True), or extract the parts with str.extract and capturing groups
  • Assign the resulting columns back to the DataFrame
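A minimal end-to-end sketch of splitting the address column and storing the parts under descriptive names. The column names street, city and state_zip are hypothetical labels chosen for this example, not part of Pandas.

```python
import pandas as pd

# Two of the sample addresses used in this article
df = pd.DataFrame({'address': [
    '3513 John Divide Suite 115\nRodriguezside, LA 93111',
    '398 Wallace Ranch Suite 593\nIvanburgh, AZ 80818',
]})

# Regex alternation: split on "; ", ", " or a newline
pattern = '; |, |\n'

# expand=True returns a DataFrame with integer column labels 0, 1, 2
parts = df['address'].str.split(pattern, expand=True)

# Rename the parts (hypothetical names) and attach them to the original rows
parts.columns = ['street', 'city', 'state_zip']
df = df.join(parts)
print(df[['street', 'city', 'state_zip']])
```

These labels only fit addresses with the usual "street / city, state ZIP" shape; a military address such as "PSC 4115, Box 7815\nAPO AA 41945" would put "Box 7815" into the city column.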

More information can be found in the Pandas documentation: Series.str.split and Series.str.extract

Data

Suppose we have a DataFrame with fake address data:

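from faker import Faker
import pandas as pd

# Seed Faker so the generated addresses are reproducible
Faker.seed(0)
fake = Faker()

# Generate five fake addresses and store them in a single column
addr = [fake.address() for _ in range(5)]
df = pd.DataFrame({'address': addr})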
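If Faker is not installed, a dependency-free alternative is to type in the sample addresses directly. This sketch simply copies the five rows from the sample data shown in this section; it does not claim to reproduce Faker's output on every version.

```python
import pandas as pd

# The five sample addresses from this article, typed in by hand
# so the examples can run without installing Faker
addr = [
    '48764 Howard Forge Apt. 421\nVanessaside, VT 79393',
    'PSC 4115, Box 7815\nAPO AA 41945',
    '778 Brown Plaza\nNorth Jenniferfurt, VT 88077',
    '3513 John Divide Suite 115\nRodriguezside, LA 93111',
    '398 Wallace Ranch Suite 593\nIvanburgh, AZ 80818',
]
df = pd.DataFrame({'address': addr})
print(df)
```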

The data should look something like:

address
0 48764 Howard Forge Apt. 421\nVanessaside, VT 79393
1 PSC 4115, Box 7815\nAPO AA 41945
2 778 Brown Plaza\nNorth Jenniferfurt, VT 88077
3 3513 John Divide Suite 115\nRodriguezside, LA 93111
4 398 Wallace Ranch Suite 593\nIvanburgh, AZ 80818

Check the resources below to find out how to create more fake data with Pandas.

Example 1 - str.split

We can use the method str.split with the parameter expand=True to split a Pandas column by multiple separators and expand the parts into new columns:

df['address'].str.split('; |, |\n', expand=True)

output:

   0                            1                   2
0  48764 Howard Forge Apt. 421  Vanessaside         VT 79393
1  PSC 4115                     Box 7815            APO AA 41945
2  778 Brown Plaza              North Jenniferfurt  VT 88077
3  3513 John Divide Suite 115   Rodriguezside       LA 93111
4  398 Wallace Ranch Suite 593  Ivanburgh           AZ 80818
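With expand=True, the number of new columns equals the largest number of parts in any row, and shorter rows are padded with missing values. A small sketch of that behaviour; the second address is a hypothetical row with no comma, made up for illustration.

```python
import pandas as pd

s = pd.Series([
    'PSC 4115, Box 7815\nAPO AA 41945',      # two separators -> three parts
    '778 Brown Plaza\nNorth Jenniferfurt',    # hypothetical row: one separator -> two parts
])

# Same alternation as above: "; ", ", " or a newline
parts = s.str.split('; |, |\n', expand=True)
print(parts)
```

The missing cell shows up as None or NaN depending on the string dtype in use, so pd.isna is the safe way to check for it.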

Example 2 - str.extract

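As an alternative, we can use str.extract with capturing groups to split the string into columns. Each group becomes one column, and rows that don't match the whole pattern return NaN (row 1 has no comma on its second line):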

df['address'].str.extract(r'(.*)\n(.*), (.*)')

output:

   0                            1                   2
0  48764 Howard Forge Apt. 421  Vanessaside         VT 79393
1  NaN                          NaN                 NaN
2  778 Brown Plaza              North Jenniferfurt  VT 88077
3  3513 John Divide Suite 115   Rodriguezside       LA 93111
4  398 Wallace Ranch Suite 593  Ivanburgh           AZ 80818
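Capturing groups can also be named with (?P<name>...), and str.extract then uses those names as column labels. In this sketch the group names and the "two capital letters, space, five digits" rule for state and ZIP are assumptions about this fake data, not a general address format.

```python
import pandas as pd

df = pd.DataFrame({'address': [
    '48764 Howard Forge Apt. 421\nVanessaside, VT 79393',
    'PSC 4115, Box 7815\nAPO AA 41945',
]})

# Named groups become the column names of the result (hypothetical names)
pattern = r'(?P<street>.*)\n(?P<city>.*), (?P<state>[A-Z]{2}) (?P<zip>\d{5})'
parts = df['address'].str.extract(pattern)
print(parts)
```

As in the output above, the military address has no comma on its second line, so every column for that row is NaN.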


Resources