How to Read Data from Text File Into Pandas?

The following step-by-step examples show how to load data from a text file into a Pandas DataFrame. We can use:

pd.read_csv()
- handles various delimiters, including commas, tabs, and spaces
pd.read_fwf()
- reads fixed-width formatted lines into a DataFrame

Let's cover both cases with examples:

read_csv - delimited file

To read a text file into a Pandas DataFrame we can use the read_csv() function and provide the separator:

import pandas as pd
df = pd.read_csv('data.txt', sep=',')

The sep argument specifies the separator. It can also be a regular expression: sep='\s+' treats any run of whitespace (spaces or tabs) as a single separator.
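As a minimal sketch of the whitespace separator, the snippet below reads a small in-memory sample instead of 'data.txt'. The sample text and its column names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample: columns separated by a varying number of spaces
raw = "name age score\nJohn   35  123\nJane 28    45\n"

# r"\s+" treats any run of whitespace as one separator
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```

The first line of the sample is used as the header because header is left at its default.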

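Other useful parameters are:

header=None - the file has no header row, so the first line is read as data
names=["a", "b", "c"] - the column names to use
skiprows=[0,1] - the line numbers to skip (0-based), here the first two lines
index_col=0 - use the first column of the file as the index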
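The parameters above can be combined in one call. The following is a sketch on a made-up in-memory sample with two comment lines at the top; the column names id, name and age are assumptions chosen for the example.

```python
import io

import pandas as pd

# Hypothetical sample: two lines of preamble, then data without a header row
raw = "# exported by tool\n# version 1\n1,John,35\n2,Jane,28\n"

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=[0, 1],              # skip the two preamble lines
    header=None,                  # the remaining lines contain no header
    names=["id", "name", "age"],  # supply the column names ourselves
    index_col="id",               # use the 'id' column as the index
)
print(df)
```

Here index_col refers to a column by name; a position such as index_col=0 selects the same column.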

read_fwf - fixed-width file

To read data from a fixed-width file in Pandas we can use read_fwf. Suppose we have a file 'data.txt' like:

John   35	123 A
Jane D 28	45  E
Bob	42	678 D

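The columns are aligned by character position rather than separated by a delimiter. Each column occupies a fixed range of positions, given as a half-open interval (start, end) where the end position is excluded:

first column - 7 chars
- (0, 7)
second - 2 chars
- (7, 9)

A whitespace separator would also split a value such as "Jane D" into two fields, so we can't rely on read_csv() with a separator. Instead we will: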

specify the column widths
read the fixed-width file into a DataFrame

import pandas as pd

colspecs = [(0, 7), (7, 9), (13, 17), (17, 18)]

df = pd.read_fwf('data.txt', colspecs=colspecs, header=None, names=['name', 'age', 'score', 'class'])
df

The result is:

	name	age	score	class
0	John	35	123	A
1	Jane D	28	45	E
2	Bob	42	678	D
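When the fields are contiguous, read_fwf also accepts a widths list in place of colspecs. The sketch below assumes a space-padded version of the sample (no tabs), held in memory rather than in 'data.txt'; the widths were chosen to fit that assumed layout.

```python
import io

import pandas as pd

# Hypothetical space-padded sample: name (7 chars), age (3), score (4), class (1)
raw = (
    "John   35 123 A\n"
    "Jane D 28 45  E\n"
    "Bob    42 678 D\n"
)

df = pd.read_fwf(
    io.StringIO(raw),
    widths=[7, 3, 4, 1],  # consecutive field widths instead of (start, end) pairs
    header=None,
    names=["name", "age", "score", "class"],
)
print(df)
```

Surrounding whitespace is stripped from each field, so "Jane D" keeps its inner space while the padding is dropped.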

Pandas read text file line by line

To read a text file line by line into a pandas DataFrame we can:

create an empty list for the chunks
create an iterator to read the file line by line
iterate over the iterator and collect each one-line chunk
concatenate the chunks into a DataFrame with a fresh index

import pandas as pd

# DataFrame.append() was removed in pandas 2.0, so collect the chunks and concatenate once
chunks = []

iterator = pd.read_csv('data.txt', header=None, iterator=True, chunksize=1)

for chunk in iterator:
	chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)
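Reading in chunks is mostly useful when each line needs to be inspected before it is kept. The following sketch filters one-line chunks on a made-up in-memory sample; the column names and the age threshold are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical comma-separated sample standing in for 'data.txt'
raw = "John,35\nJane,28\nBob,42\n"

reader = pd.read_csv(
    io.StringIO(raw), header=None, names=["name", "age"], chunksize=1
)

kept = []
for chunk in reader:
    # each chunk is a one-row DataFrame; keep it only if age is above 30
    if chunk["age"].iloc[0] > 30:
        kept.append(chunk)

# build the final DataFrame once, with a fresh 0-based index
df = pd.concat(kept, ignore_index=True)
print(df)
```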

Pandas read text file with pattern

As an alternative, we can read the file into a list of lines, clean them with a list comprehension, and filter them.

Let's work with the following file:

John   35	123 A
Pattern
Jane D 28	45  E
Bob	42	678 D
End of pattern

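We can find the line numbers where the block starts and ends by matching the marker lines:

with open('data.txt', "r") as r:
	a = r.readlines()
# strip the trailing newline from each line
a = [x.replace("\n", "") for x in a]
start = a.index("Pattern") + 1   # first line after the start marker
end = a.index("End of pattern")  # line holding the end marker
start, end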

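After that we can read the whole file with read_fwf or read_csv and slice the rows by position. With header=None each line of the file becomes one row, so the line numbers match the row positions:

import pandas as pd

# colspecs as defined in the read_fwf example above
df = pd.read_fwf('data.txt', colspecs=colspecs, header=None, names=['name', 'age', 'score', 'class'])

# keep only the rows between the two marker lines
df = df[start:end]

Which gives us: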

	name	age	score	class
2	Jane D	28	45	E
3	Bob	42	678	D
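A variation is to cut the lines first and parse only the block between the markers, so the marker lines never reach the parser. This is a sketch on an assumed comma-separated version of the sample, held in a Python list instead of 'data.txt'.

```python
import io

import pandas as pd

# Hypothetical comma-separated version of the sample, already split into lines
lines = [
    "John,35,123,A",
    "Pattern",
    "Jane D,28,45,E",
    "Bob,42,678,D",
    "End of pattern",
]

start = lines.index("Pattern") + 1   # first line after the start marker
end = lines.index("End of pattern")  # the end marker itself is excluded

# join the selected lines back into text and parse only that block
block = "\n".join(lines[start:end])
df = pd.read_csv(
    io.StringIO(block), header=None, names=["name", "age", "score", "class"]
)
print(df)
```

Unlike slicing the full DataFrame, the index here starts at 0 because only the selected lines are parsed.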

Summary

We've seen several ways of reading a text file into a Pandas DataFrame. We covered how to read delimited files with read_csv() and fixed-width files with read_fwf().

We also saw how to read text files line by line and how to filter a CSV or text file by pattern.
