How to Extract Everything Before or After with Regex in Pandas

To extract everything before a newline from a Pandas column with regex, we can use one of two patterns:

  • '(.*?)\n'
  • '(.+?)(?=\n)'
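
To see how the two approaches differ, here is a minimal sketch using Python's built-in re module on a single string. The sample string is one of the addresses used later in this article, and the variable names are illustrative only. The first pattern consumes the newline as part of the overall match, while the lookahead version only checks that a newline follows:

```python
import re

# One sample address taken from the data used later in this article.
text = "778 Brown Plaza\nNorth Jenniferfurt, VT 88077"

# Approach 1: lazy capture group followed by a newline (the newline is consumed).
m1 = re.search(r'(.*?)\n', text)
captured_1 = m1.group(1)     # text inside the capture group
whole_match_1 = m1.group(0)  # whole match, including the trailing newline

# Approach 2: lazy capture group followed by a lookahead (the newline is not consumed).
m2 = re.search(r'(.+?)(?=\n)', text)
captured_2 = m2.group(1)
whole_match_2 = m2.group(0)  # whole match stops before the newline

print(repr(captured_1))     # '778 Brown Plaza'
print(repr(whole_match_1))  # '778 Brown Plaza\n'
print(repr(captured_2))     # '778 Brown Plaza'
print(repr(whole_match_2))  # '778 Brown Plaza'
```

Both approaches capture the same text; they differ only in whether the newline belongs to the overall match.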

Steps to extract everything until/after

Below are the steps I usually follow for regex extraction in Pandas:

  • analyse the data from which I will extract
  • clean the data
  • choose a Pandas method (split, extract, etc.)
  • define regex pattern
  • create new column(s)
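
The steps above can be sketched end to end as follows. This is only an illustration, assuming a two-row sample copied from the addresses used in this article. The column names street and city are hypothetical, and str.split is just one possible choice of method:

```python
import pandas as pd

# Two-row sample; the values are copied from the article's data.
df = pd.DataFrame({'address': [
    '778 Brown Plaza\nNorth Jenniferfurt, VT 88077',
    'PSC 4115, Box 7815\nAPO AA 41945',
]})

# 1. Analyse: check that every row contains the separator.
has_newline = df['address'].str.contains('\n', regex=False)

# 2. Clean: remove leading and trailing whitespace.
clean = df['address'].str.strip()

# 3-5. Choose a method (split), define the separator, create new columns.
# n=1 splits on the first newline only; expand=True returns a DataFrame.
df[['street', 'city']] = clean.str.split('\n', n=1, expand=True)

print(df[['street', 'city']])
```

When the separator is a fixed character, str.split is often enough. str.extract becomes more useful when the boundary has to be described by a pattern.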

Data

Let's create a simple sample DataFrame to be used for regex extraction:

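from faker import Faker
import pandas as pd

# Seed Faker so the generated addresses are reproducible
Faker.seed(0)
fake = Faker()

# Generate five fake addresses; each one contains a newline
addr = [fake.address() for _ in range(5)]
df = pd.DataFrame({'address': addr})

result:

address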
0 48764 Howard Forge Apt. 421\nVanessaside, VT 79393
1 PSC 4115, Box 7815\nAPO AA 41945
2 778 Brown Plaza\nNorth Jenniferfurt, VT 88077
3 3513 John Divide Suite 115\nRodriguezside, LA 93111
4 398 Wallace Ranch Suite 593\nIvanburgh, AZ 80818
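
If Faker is not installed, a rough equivalent is to build the DataFrame directly from the values printed above. In this sketch the strings are copied by hand from that output. Note that the \n shown in the printed DataFrame is a real newline character, not a backslash followed by the letter n, which the count below confirms:

```python
import pandas as pd

# Addresses copied from the printed output above; '\n' is a real newline.
addr = [
    '48764 Howard Forge Apt. 421\nVanessaside, VT 79393',
    'PSC 4115, Box 7815\nAPO AA 41945',
    '778 Brown Plaza\nNorth Jenniferfurt, VT 88077',
    '3513 John Divide Suite 115\nRodriguezside, LA 93111',
    '398 Wallace Ranch Suite 593\nIvanburgh, AZ 80818',
]
df = pd.DataFrame({'address': addr})

# Count newline characters per row to confirm each address has exactly one.
newline_counts = df['address'].str.count('\n')
print(newline_counts.tolist())  # [1, 1, 1, 1, 1]
```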

Example 1 - Capturing group and characters

Extract everything in a Pandas column up to the newline:

df['address'].str.extract('(.*?)\n')

result:

0
0 48764 Howard Forge Apt. 421
1 PSC 4115, Box 7815
2 778 Brown Plaza
3 3513 John Divide Suite 115
4 398 Wallace Ranch Suite 593
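
The same idea can be turned around to extract everything after the newline. The sketch below is an assumption-labeled illustration rather than part of the original example: it uses a two-row sample copied from the data above, and the group names street and city are hypothetical. Named groups become column names in the result:

```python
import pandas as pd

# Two-row sample; the values are copied from the article's data.
df = pd.DataFrame({'address': [
    '48764 Howard Forge Apt. 421\nVanessaside, VT 79393',
    'PSC 4115, Box 7815\nAPO AA 41945',
]})

# Everything after the newline: match the newline, then capture the rest.
after = df['address'].str.extract(r'\n(.*)')

# Both parts at once, using named groups for the column names.
both = df['address'].str.extract(r'(?P<street>.*?)\n(?P<city>.*)')

print(after)
print(both)
```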

Example 2 - Lookahead

Extract everything in a Pandas column up to the newline, using a lookahead so that the newline itself is not part of the match:

df['address'].str.extract('(.+?)(?=\n)')

result:

0
0 48764 Howard Forge Apt. 421
1 PSC 4115, Box 7815
2 778 Brown Plaza
3 3513 John Divide Suite 115
4 398 Wallace Ranch Suite 593
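
A lookbehind can be used in the same way to extract everything after the newline, and expand=False returns a Series that can be assigned straight to a new column. This is a minimal sketch on a two-row sample copied from the data above, with hypothetical column names street and city. It also shows that str.extract needs at least one capture group, so a bare lookahead pattern raises a ValueError:

```python
import pandas as pd

# Two-row sample; the values are copied from the article's data.
df = pd.DataFrame({'address': [
    '778 Brown Plaza\nNorth Jenniferfurt, VT 88077',
    '398 Wallace Ranch Suite 593\nIvanburgh, AZ 80818',
]})

# Lookahead: capture text that is followed by a newline.
df['street'] = df['address'].str.extract(r'(.+?)(?=\n)', expand=False)

# Lookbehind: capture text that is preceded by a newline.
df['city'] = df['address'].str.extract(r'(?<=\n)(.+)', expand=False)

# str.extract rejects a pattern that has no capture group.
try:
    df['address'].str.extract(r'.+?(?=\n)')
    raised = False
except ValueError:
    raised = True

print(df[['street', 'city']])
print(raised)  # True
```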
