[python] Python json.loads shows ValueError: Extra data

I am getting some data from a JSON file "new.json", and I want to filter some data and store it into a new JSON file. Here is my code:

import json
with open('new.json') as infile:
    data = json.load(infile)
for item in data:
    iden = item.get["id"]
    a = item.get["a"]
    b = item.get["b"]
    c = item.get["c"]
    if c == 'XYZ' or  "XYZ" in data["text"]:
        filename = 'abc.json'
    try:
        outfile = open(filename,'ab')
    except:
        outfile = open(filename,'wb')
    obj_json={}
    obj_json["ID"] = iden
    obj_json["VAL_A"] = a
    obj_json["VAL_B"] = b

and I am getting an error, the traceback is:

  File "rtfav.py", line 3, in <module>
    data = json.load(infile)
  File "/usr/lib64/python2.7/json/__init__.py", line 278, in load
    **kw)
  File "/usr/lib64/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 88 column 2 - line 50607 column 2 (char 3077 - 1868399)

Can someone help me?

Here is a sample of the data in new.json, there are about 1500 more such dictionaries in the file

{
    "contributors": null, 
    "truncated": false, 
    "text": "@HomeShop18 #DreamJob to professional rafter", 
    "in_reply_to_status_id": null, 
    "id": 421584490452893696, 
    "favorite_count": 0, 
    "source": "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Mobile Web (M2)</a>", 
    "retweeted": false, 
    "coordinates": null, 
    "entities": {
        "symbols": [], 
        "user_mentions": [
            {
                "id": 183093247, 
                "indices": [
                    0, 
                    11
                ], 
                "id_str": "183093247", 
                "screen_name": "HomeShop18", 
                "name": "HomeShop18"
            }
        ], 
        "hashtags": [
            {
                "indices": [
                    12, 
                    21
                ], 
                "text": "DreamJob"
            }
        ], 
        "urls": []
    }, 
    "in_reply_to_screen_name": "HomeShop18", 
    "id_str": "421584490452893696", 
    "retweet_count": 0, 
    "in_reply_to_user_id": 183093247, 
    "favorited": false, 
    "user": {
        "follow_request_sent": null, 
        "profile_use_background_image": true, 
        "default_profile_image": false, 
        "id": 2254546045, 
        "verified": false, 
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/413952088880594944/rcdr59OY_normal.jpeg", 
        "profile_sidebar_fill_color": "171106", 
        "profile_text_color": "8A7302", 
        "followers_count": 87, 
        "profile_sidebar_border_color": "BCB302", 
        "id_str": "2254546045", 
        "profile_background_color": "0F0A02", 
        "listed_count": 1, 
        "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", 
        "utc_offset": null, 
        "statuses_count": 9793, 
        "description": "Rafter. Rafting is what I do. Me aur mera Tablet.  Technocrat of Future", 
        "friends_count": 231, 
        "location": "", 
        "profile_link_color": "473623", 
        "profile_image_url": "http://pbs.twimg.com/profile_images/413952088880594944/rcdr59OY_normal.jpeg", 
        "following": null, 
        "geo_enabled": false, 
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/2254546045/1388065343", 
        "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", 
        "name": "Jayy", 
        "lang": "en", 
        "profile_background_tile": false, 
        "favourites_count": 41, 
        "screen_name": "JzayyPsingh", 
        "notifications": null, 
        "url": null, 
        "created_at": "Fri Dec 20 05:46:00 +0000 2013", 
        "contributors_enabled": false, 
        "time_zone": null, 
        "protected": false, 
        "default_profile": false, 
        "is_translator": false
    }, 
    "geo": null, 
    "in_reply_to_user_id_str": "183093247", 
    "lang": "en", 
    "created_at": "Fri Jan 10 10:09:09 +0000 2014", 
    "filter_level": "medium", 
    "in_reply_to_status_id_str": null, 
    "place": null
} 

This question is related to python json

The answer is


I came across this because I was trying to load a JSON file dumped from MongoDB. It was giving me an error

JSONDecodeError: Extra data: line 2 column 1

The MongoDB JSON dump has one object per line, so what worked for me is:

import json
data = [json.loads(line) for line in open('data.json', 'r')]

My json file was formatted exactly as the one in the question but none of the solutions here worked out. Finally I found a workaround on another Stackoverflow thread. Since this post is the first link in Google search, I put the that answer here so that other people come to this post in the future will find it more easily.

As it's been said there the valid json file needs "[" in the beginning and "]" in the end of file. Moreover, after each json item instead of "}" there must be a "},". All brackets without quotations! This piece of code just modifies the malformed json file into its correct format.

https://stackoverflow.com/a/51919788/2772087


I think saving dicts in a list is not an ideal solution here proposed by @falsetru.

Better way is, iterating through dicts and saving them to .json by adding a new line.

our 2 dictionaries are

d1 = {'a':1}

d2 = {'b':2}

you can write them to .json

import json
with open('sample.json','a') as sample:
    for dict in [d1,d2]:
        sample.write('{}\n'.format(json.dumps(dict)))

and you can read json file without any issues

with open('sample.json','r') as sample:
    for line in sample:
        line = json.loads(line.strip())

simple and efficient


One-liner for your problem:

data = [json.loads(line) for line in open('tweets.json', 'r')]

Well , it might help someone. i just got the same error while my json file is like this

{"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"}
{"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}

and i found it malformed, so i changed it into somekind of

{
  "datas":[
    {"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"},
    {"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}
  ]
}

You can just read from a file, jsonifying each line as you go:

tweets = []
for line in open('tweets.json', 'r'):
    tweets.append(json.loads(line))

This avoids storing intermediate python objects. As long as your write one full tweet per append() call, this should work.


If you want to solve it in a two-liner you can do it like this:

with open('data.json') as f:
    data = [json.loads(line) for line in f]

This may also happen if your JSON file is not just 1 JSON record. A JSON record looks like this:

[{"some data": value, "next key": "another value"}]

It opens and closes with a bracket [ ], within the brackets are the braces { }. There can be many pairs of braces, but it all ends with a close bracket ]. If your json file contains more than one of those:

[{"some data": value, "next key": "another value"}]
[{"2nd record data": value, "2nd record key": "another value"}]

then loads() will fail.

I verified this with my own file that was failing.

import json

guestFile = open("1_guests.json",'r')
guestData = guestFile.read()
guestFile.close()
gdfJson = json.loads(guestData)

This works because 1_guests.json has one record []. The original file I was using all_guests.json had 6 records separated by newline. I deleted 5 records, (which I already checked to be bookended by brackets) and saved the file under a new name. Then the loads statement worked.

Error was

   raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 10 column 1 (char 261900 - 6964758)

PS. I use the word record, but that's not the official name. Also, if your file has newline characters like mine, you can loop through it to loads() one record at a time into a json variable.