In this tutorial, we'll see how to solve a common Pandas error: ValueError: Trailing data. We get this error from the Pandas read_json() method when we try to load a JSON or JSON Lines file.
To fix ValueError: Trailing data we can try:
(1) Add parameter - lines=True
pd.read_json('data.json', lines=True)
(2) Parse the file line by line
import json
with open("data.json") as f:
    data = [json.loads(line) for line in f]
df = pd.DataFrame(data)
(3) Convert JSON Lines to JSON with jq
jq -s '.' data.json > out.json
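If jq is not available, the same conversion can be sketched in plain Python. This is a minimal example that assumes every non-empty line is a valid JSON document; the helper name jsonl_to_json and the temporary file paths are only illustrative.

```python
import json
import os
import tempfile


def jsonl_to_json(src_path, dst_path):
    """Read a JSON Lines file and write its records as one JSON array."""
    with open(src_path, encoding="utf-8") as src:
        # Skip blank lines; every other line is assumed to be valid JSON.
        records = [json.loads(line) for line in src if line.strip()]
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(records, dst, indent=2)
    return len(records)


# Demo on temporary files (hypothetical paths, created only for this example)
tmp_dir = tempfile.mkdtemp()
src_file = os.path.join(tmp_dir, "data.json")
dst_file = os.path.join(tmp_dir, "out.json")
with open(src_file, "w", encoding="utf-8") as f:
    f.write('{"message": "Too Many Requests", "error": 429}\n' * 2)

converted = jsonl_to_json(src_file, dst_file)
```

The resulting out.json holds a regular JSON array, which pd.read_json() can load without lines=True.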
The image below summarizes the error and some of the fixes:
1. Reasons - ValueError: Trailing data
In Pandas, the error ValueError: Trailing data means that the parser found extra content after the end of the first complete JSON value, so the data we are trying to load into a DataFrame is not a single, properly formatted JSON document.
There are a few common reasons why this error may occur.
JSON Lines
The error is raised if we try to read a JSON Lines file as a regular JSON file, without using lines=True.
Example JSON file:
{"message": "Too Many Requests", "error": 429}
{"message": "Too Many Requests", "error": 429}
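The behaviour can be reproduced with a short script. This is a minimal sketch that writes the two example records to a temporary file (the path is created only for this demo) and reads it with and without lines=True.

```python
import os
import tempfile

import pandas as pd

# Write a small JSON Lines file (temporary path used only for this demo)
path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"message": "Too Many Requests", "error": 429}\n')
    f.write('{"message": "Too Many Requests", "error": 429}\n')

# Without lines=True the whole file is parsed as one JSON document,
# so everything after the first object counts as trailing data.
error = None
try:
    pd.read_json(path)
except ValueError as exc:
    error = exc

# With lines=True each line is parsed as a separate record.
df = pd.read_json(path, lines=True)
```

Here error holds the raised ValueError, while df contains one row per line of the file.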
Characters outside the JSON data
Any characters outside of the JSON data will cause ValueError: Trailing data.
Example JSON file:
{"message": "Too Many Requests", "error": 429}2
{"message": "Too Many Requests", "error": 429}
Inconsistent or incorrectly formatted JSON data
The error can also occur if the JSON data does not follow correct syntax, including:
- quotes - JSON requires double quotes around keys and strings
- values
- commas separating elements
Single quotes are not valid in JSON, so the data should use double quotes consistently.
Examples:
{
"message": "Too Many Requests",
"error": 429
}
{
"message": "Too Many Requests",
"error": 429
}
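In this example the objects are not wrapped in a JSON array ([]) and there is no comma between them. Because each object spans several lines, lines=True does not help here either.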
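One way to read such back-to-back objects is the standard library's json.JSONDecoder.raw_decode, which parses one value and reports where it ended. The sketch below is only an illustration under the assumption that the file contains nothing but valid JSON values separated by whitespace; the helper name parse_concatenated is hypothetical.

```python
import json


def parse_concatenated(text):
    """Return every JSON value found back to back in `text`."""
    decoder = json.JSONDecoder()
    values, pos = [], 0
    while pos < len(text):
        # raw_decode does not accept leading whitespace, so skip it first
        if text[pos].isspace():
            pos += 1
            continue
        # raw_decode returns the parsed value and the index where it ended
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values


raw = """
{
  "message": "Too Many Requests",
  "error": 429
}
{
  "message": "Too Many Requests",
  "error": 429
}
"""
records = parse_concatenated(raw)
```

The resulting list of dictionaries can then be passed to pd.DataFrame(records).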
2. Solve ValueError: Trailing data - JSON lines
Depending on the case, we can apply different solutions for the error. For example, loading a JSON Lines file can be fixed by adding lines=True:
import pandas as pd
pd.read_json('data.json', lines=True)
This will solve the error and load the file:
{"message": "Too Many Requests", "error": 429}
{"message": "Too Many Requests", "error": 429}
as a DataFrame:
| | message | error |
|---|---|---|
| 0 | Too Many Requests | 429 |
| 1 | Too Many Requests | 429 |
3. ValueError: Trailing data - detect errors
To detect problematic JSON records or lines, we can parse the file line by line:
import json
import pandas as pd
with open('data/data_1.json') as f:
    # json.loads raises an error on the first line that is not valid JSON
    data = [json.loads(line) for line in f]
df = pd.DataFrame(data)
df
If we try to load the following JSON content:
{"message": "Too Many Requests", "error": 429}2
{"message": "Too Many Requests", "error": 429}
we will get the following error:
json.decoder.JSONDecodeError: Extra data: line 1 column 47 (char 46)
The message reports the position of the unexpected character within the line being parsed, so we can locate the problematic records and fix them. To skip problematic values, check the next section.
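Parsing stops at the first bad line, so a file with several broken records has to be checked repeatedly. As a minimal sketch, the hypothetical helper find_bad_lines below uses json.loads to collect the line number and error message of every line that fails to parse, assuming one JSON document per line.

```python
import json


def find_bad_lines(lines):
    """Return (line_number, error_message) pairs for lines that fail to parse."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, str(exc)))
    return problems


sample = [
    '{"message": "Too Many Requests", "error": 429}2\n',
    '{"message": "Too Many Requests", "error": 429}\n',
]
bad = find_bad_lines(sample)
```

For a real file, the open file object can be passed directly, for example find_bad_lines(open('data/data_1.json')).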
4. Handle JSON errors
To skip errors in a JSON file we can read the file line by line. We can parse each line and append only good ones.
Consider a JSON Lines file with 3 rows, one of which is broken:
{"message": "Unknown Error", "error": 501}
{"message"3: "Unknown Error", "error": 502}
{"message": "Unknown Error", "error": 503}
We can use the following code to read the JSON file and skip the problematic rows:
import json
import pandas as pd
rows = []
with open('data/data_1.json') as f:
    for row in f:
        try:
            rows.append(json.loads(row))
        except json.JSONDecodeError:
            # skip rows that are not valid JSON
            continue
df = pd.DataFrame(rows)
df
This reads the corrupted JSON file into a DataFrame:
| | message | error |
|---|---|---|
| 0 | Unknown Error | 501 |
| 1 | Unknown Error | 503 |
As we can see, the line:
{"message"3: "Unknown Error", "error": 502}
is not present in the final DataFrame.
5. ValueError: Trailing data - more fixes
We can also try to solve the error by using the following parameters:
pd.read_json('data.json', orient='records')
pd.read_json('data.json', orient='split')
pd.read_json('data.json', encoding='utf-8-sig')
These might be helpful if you face other errors after fixing the original one, such as:
ValueError: Expected object or value
error: json.decoder.JSONDecodeError: Extra data: line 1 column 112 (char 10)
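The utf-8-sig encoding is relevant for files that start with a UTF-8 byte order mark (BOM). The sketch below illustrates the effect with the standard library only; the temporary file is created just for this demo, and the same encoding name is what the read_json() call above passes.

```python
import json
import os
import tempfile

# Create a file that starts with a UTF-8 byte order mark (temporary demo path)
path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "wb") as f:
    f.write(b'\xef\xbb\xbf{"message": "Too Many Requests", "error": 429}')

# Decoding as plain UTF-8 keeps the BOM as the first character,
# which the json module rejects.
bom_error = None
with open(path, encoding="utf-8") as f:
    try:
        json.loads(f.read())
    except json.JSONDecodeError as exc:
        bom_error = exc

# utf-8-sig strips the BOM while decoding, so the parse succeeds.
with open(path, encoding="utf-8-sig") as f:
    record = json.loads(f.read())
```

If a file parses only with utf-8-sig, the BOM was the cause of the failure rather than the JSON content itself.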
Conclusion
To sum up, this article shows how using the proper parameters for the read_json() method can solve the "ValueError: Trailing data" error.
We covered multiple examples and solutions for the error.
If you have an interesting case or problem which is not solved by this article - please share it in the comments section below. Thanks!