How To Split Column by Multiple Characters with Regex in Pandas
To split a Pandas column by multiple characters, we can pass a regex pattern to either str.split or str.extract:
df['address'].str.split('; |, |\n', expand=True)
df['address'].str.extract(r'(.*)\n(.*)')
Steps to split column in Pandas
- Import the pandas library
- Create a DataFrame with a text column to split
- Split the column on several separators with str.split and expand=True
- Or capture the parts with str.extract and regex capturing groups
More information can be found: Series.str.split and Series.str.extract
Data
Suppose we have a DataFrame with fake address data:
from faker import Faker
import pandas as pd
Faker.seed(0)
fake = Faker()
addr = []
for _ in range(5):
addr.append(fake.address())
df = pd.DataFrame({'address':addr})
The data should look something like this:
 | address |
---|---|
0 | 48764 Howard Forge Apt. 421\nVanessaside, VT 79393 |
1 | PSC 4115, Box 7815\nAPO AA 41945 |
2 | 778 Brown Plaza\nNorth Jenniferfurt, VT 88077 |
3 | 3513 John Divide Suite 115\nRodriguezside, LA 93111 |
4 | 398 Wallace Ranch Suite 593\nIvanburgh, AZ 80818 |
Check resources to find out how to create more fake data with Pandas.
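If Faker is not installed, or a different Faker version generates other addresses, the same frame can be rebuilt from literal strings. The sketch below simply copies the five addresses from the table above; the final check on newline counts is an added illustration, not part of the original steps.

```python
import pandas as pd

# Literal copies of the five addresses shown in the table above,
# so the later examples run without installing Faker.
addr = [
    "48764 Howard Forge Apt. 421\nVanessaside, VT 79393",
    "PSC 4115, Box 7815\nAPO AA 41945",
    "778 Brown Plaza\nNorth Jenniferfurt, VT 88077",
    "3513 John Divide Suite 115\nRodriguezside, LA 93111",
    "398 Wallace Ranch Suite 593\nIvanburgh, AZ 80818",
]
df = pd.DataFrame({"address": addr})

# Count the newline characters per row: each address has exactly one,
# separating the street line from the city line.
newline_counts = df["address"].str.count("\n").tolist()
print(newline_counts)
```

Every row contains a single newline, which is why the newline is one of the separators used in the examples that follow.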
Example 1 - str.split
We can use the str.split method with the parameter expand=True to split a Pandas column by multiple separators and expand the content into new columns:
df['address'].str.split('; |, |\n', expand=True)
output:
 | 0 | 1 | 2 |
---|---|---|---|
0 | 48764 Howard Forge Apt. 421 | Vanessaside | VT 79393 |
1 | PSC 4115 | Box 7815 | APO AA 41945 |
2 | 778 Brown Plaza | North Jenniferfurt | VT 88077 |
3 | 3513 John Divide Suite 115 | Rodriguezside | LA 93111 |
4 | 398 Wallace Ranch Suite 593 | Ivanburgh | AZ 80818 |
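The split result has integer column labels (0, 1, 2). A minimal sketch of attaching it back to the original frame under readable names is shown below; the names part_1, part_2 and part_3 are hypothetical and chosen only for illustration, and the two rows are copied from the sample data.

```python
import pandas as pd

df = pd.DataFrame({"address": [
    "48764 Howard Forge Apt. 421\nVanessaside, VT 79393",
    "PSC 4115, Box 7815\nAPO AA 41945",
]})

# A pattern longer than one character is treated as a regular expression
# by default, so the alternation matches "; ", ", " or a newline.
parts = df["address"].str.split(r"; |, |\n", expand=True)

# Hypothetical column names, used here only to make the result readable.
parts.columns = ["part_1", "part_2", "part_3"]

# Keep the original column and add the split parts next to it.
result = df.join(parts)
print(result[["part_1", "part_2", "part_3"]])
```

Note that the separators include the trailing space ("; " and ", "), so the resulting pieces do not start with a blank. Assigning three names only works when every row splits into at most three parts, which holds for this sample.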
Example 2 - str.extract
As an alternative, we can use str.extract with capturing groups to split the string into columns as follows:
df['address'].str.extract(r'(.*)\n(.*)\,(.*)')
output:
 | 0 | 1 | 2 |
---|---|---|---|
0 | 48764 Howard Forge Apt. 421 | Vanessaside | VT 79393 |
1 | NaN | NaN | NaN |
2 | 778 Brown Plaza | North Jenniferfurt | VT 88077 |
3 | 3513 John Divide Suite 115 | Rodriguezside | LA 93111 |
4 | 398 Wallace Ranch Suite 593 | Ivanburgh | AZ 80818 |
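Row 1 is NaN because str.extract returns missing values when the whole pattern fails to match: the pattern requires a comma after the newline, and the second line of that address ("APO AA 41945") has none. One possible way to cover both shapes is to make the comma optional and use named groups, as sketched below. The group names (street, city, region) and the assumption that every address ends with two capital letters followed by a five-digit code are illustrative and fit only this sample data.

```python
import pandas as pd

df = pd.DataFrame({"address": [
    "48764 Howard Forge Apt. 421\nVanessaside, VT 79393",
    "PSC 4115, Box 7815\nAPO AA 41945",
]})

# Original pattern: the comma after the newline is mandatory,
# so the second row does not match and comes back as NaN.
strict = df["address"].str.extract(r"(.*)\n(.*),(.*)")

# Assumed address shape: "<street>\n<city>[,] <2 capitals> <5 digits>".
# Named groups become the column names of the result.
pattern = r"(?P<street>.*)\n(?P<city>.*?),?\s*(?P<region>[A-Z]{2} \d{5})$"
tolerant = df["address"].str.extract(pattern)
print(tolerant)
```

With this pattern the second row yields "PSC 4115, Box 7815" as the street, because the first group stops only at the newline. Whether that is preferable to the three-way split from str.split depends on how the columns will be used.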