How to Fix: ValueError: Trailing Data - Pandas and JSON

In this tutorial, we'll see how to solve a common Pandas error – ValueError: Trailing data. We get this error from the Pandas read_json() method when we try to load a JSON or JSON lines file.

To fix ValueError: Trailing data we can try:

(1) Add parameter - lines=True

pd.read_json('data.json', lines=True)

(2) Evaluate the file line by line

with open("data.json") as f:
	text = f.readlines()

data = [eval(line) for line in text]
df = pd.DataFrame(data)

(3) Convert JSONl to JSON with jq

jq -s '.' data.json > out.json

Image below summarize the errors and some of the fixes:

1. Reasons - ValueError: Trailing data

In Pandas and Python the error ValueError: Trailing data suggests that the data we are trying to load into a DataFrame is not properly formatted JSON data.

There are a few common reasons why this error may occur.

JSON lines

If we try to read JSON lines file as normal JSON file without using lines=True:

Example JSON file:

{"message": "Too Many Requests", "error": 429}
{"message": "Too Many Requests", "error": 429}

characters outside the JSON data

If there are any characters outside of the JSON data, they will cause:

ValueError: Trailing data error.

Example JSON file:

{"message": "Too Many Requests", "error": 429}2
{"message": "Too Many Requests", "error": 429}

Inconsistent or incorrectly JSON data

If JSON data is not properly formatted with correct syntax, including:

  • quotes - single or double quotes
  • values
  • commas separating elements

Data should be consistent using only double or single quotes.

Examples:

 {
"message": "Too Many Requests",
"error": 429
 }
 {
"message": "Too Many Requests",
"error": 429
 }

In this example data is not in the JSON array ([]) and quotes are missing.

2. Solve ValueError: Trailing data - JSON lines

Depending on the case we can apply different solutions for the error. For example loading JSON lines file can be solved by adding lines=True:

import pandas as pd
pd.read_json('data.json', lines=True)

This will solve the error and load the file:

{"message": "Too Many Requests", "error": 429}
{"message": "Too Many Requests", "error": 429}

as DataFrame:

message error
0 Too Many Requests 429
1 Too Many Requests 429

3. ValueError: Trailing data - detect errors

In order to detect problematic JSON records or lines we can use the following code:

import pandas as pd

with open('data/data_1.json') as f:
	content = f.readlines()

data = [eval(c) for c in content]
df = pd.DataFrame(data)
df

if we try to load the JSON content of:

{"message": "Too Many Requests", "error": 429}2
{"message": "Too Many Requests", "error": 429}

We will get the following error:

{"message": "Too Many Requests", "error": 429}2
                                    			  ^
SyntaxError: invalid syntax

So we can extract all problematic records and fix them. To skip problematic values check the next section.

4. Handle JSON errors

To skip errors in a JSON file we can read the file line by line. We can parse each line and append only good ones.

For a JSON lines file with 3 rows and one of them is broken:

{"message": "Unknown Error", "error": 501}
{"message"3: "Unknown Error", "error": 502}
{"message": "Unknown Error", "error": 503}

We can use the following code in order to read the JSON file and skip problematic rows by:

import pandas as pd

with open('data/data_1.json') as f:
	json_data = f.readlines()
    
for row in json_data:
	try:
  	  data = json.loads(row)
	except Exception as e:
  	  pass
data

This reads the corrupted JSON file into a DataFrame:

message error
0 Unknown Error 501
1 Unknown Error 503

As we can see line:

{"message"3: "Unknown Error", "error": 502}

Is not present in the final DataFrame.

5. ValueError: Trailing data - more fixes

You can also try to solve the errors also by using the following parameters:

pd.read_json('data.json', orient='records')
pd.read_json('data.json', orient='split')
pd.read_json('Data.json', encoding = 'utf-8-sig')

This might be helpful if you face more errors after fixing the original one:

  • ValueError: Expected object or value
  • error: json.decoder.JSONDecodeError: Extra data: line 1 column 112 (char 10)

Conclusion

To sum up, this article shows how using proper parameters for read_json() method can solve the "ValueError: Trailing data" error.

We covered multiple examples and solutions for the error.

If you have an interesting case or problem which is not solved by this article - please share it in the comments section below. Thanks!