JSON to pandas DataFrame

Question

What I am trying to do is extract elevation data from a Google Maps API along a path specified by latitude and longitude coordinates, as follows:

from urllib2 import Request, urlopen
import json

path1 = '42.974049,-81.205203|42.974298,-81.195755'
request = Request('http://maps.googleapis.com/maps/api/elevation/json?locations=' + path1 + '&sensor=false')
response = urlopen(request)
elevations = response.read()

This gives me data that looks like this:

elevations.splitlines()

['{',
 '   "results" : [',
 '      {',
 '         "elevation" : 243.3462677001953,',
 '         "location" : {',
 '            "lat" : 42.974049,',
 '            "lng" : -81.205203',
 '         },',
 '         "resolution" : 19.08790397644043',
 '      },',
 '      {',
 '         "elevation" : 244.1318664550781,',
 '         "location" : {',
 '            "lat" : 42.974298,',
 '            "lng" : -81.19575500000001',
 '         },',
 '         "resolution" : 19.08790397644043',
 '      }',
 '   ],',
 '   "status" : "OK"',
 '}']

When putting it into a DataFrame, here is what I get:

pd.read_json(elevations)

[screenshot of the resulting DataFrame not preserved]

and here is what I want:

[screenshot of the desired DataFrame not preserved]

I'm not sure if this is possible, but mainly what I am looking for is a way to put the elevation, latitude and longitude data together in a pandas DataFrame (it doesn't have to have fancy multiline headers).

If anyone can help or give some advice on working with this data, that would be great! If you can't tell, I haven't worked much with JSON data before...

EDIT:

This method isn't all that attractive but seems to work:

data = json.loads(elevations)
lat, lng, el = [], [], []
for result in data['results']:
    lat.append(result[u'location'][u'lat'])
    lng.append(result[u'location'][u'lng'])
    el.append(result[u'elevation'])
df = pd.DataFrame([lat, lng, el]).T

This ends up as a DataFrame with the columns latitude, longitude, elevation.

User · Answer

You could first import your JSON data into a Python dictionary:

data = json.loads(elevations)

Then modify data on the fly :

for result in data['results']:
    result[u'lat']=result[u'location'][u'lat']
    result[u'lng']=result[u'location'][u'lng']
    del result[u'location']

Rebuild json string :

elevations = json.dumps(data)

Finally :

pd.read_json(elevations)

You can probably also avoid dumping the data back to a string; I assume pandas can create a DataFrame directly from a dictionary (I haven't used it in a long time :p)
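The closing remark can be sketched as follows: a list of flat dicts can be passed straight to the DataFrame constructor, with no round trip through a JSON string. This is a minimal sketch, assuming the payload has already been parsed; the inline `data` dict stands in for `json.loads(elevations)` and reuses the values from the question.

```python
import pandas as pd

# Stand-in for json.loads(elevations); values copied from the question.
data = {
    "results": [
        {"elevation": 243.3462677001953,
         "location": {"lat": 42.974049, "lng": -81.205203},
         "resolution": 19.08790397644043},
        {"elevation": 244.1318664550781,
         "location": {"lat": 42.974298, "lng": -81.19575500000001},
         "resolution": 19.08790397644043},
    ],
    "status": "OK",
}

# Build flat records without mutating the parsed payload.
records = [
    {
        "lat": r["location"]["lat"],
        "lng": r["location"]["lng"],
        "elevation": r["elevation"],
        "resolution": r["resolution"],
    }
    for r in data["results"]
]

# pandas accepts a list of dicts directly; the keys become column names.
df = pd.DataFrame(records)
print(df)
```

Building new records instead of deleting keys in place leaves `data` intact in case it is needed again.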

User · Answer

I prefer a more generic method for cases where the user may not want to supply the key 'results'. You can still flatten the data by recursively searching for a key that holds nested data, which also helps when you have the key but the JSON is deeply nested. It is something like:

import json
from pandas import json_normalize

def findnestedlist(js):
    # Return the first list found at this level, else search nested dicts
    for i in js.keys():
        if isinstance(js[i], list):
            return js[i]
    for v in js.values():
        if isinstance(v, dict):
            found = findnestedlist(v)
            if found is not None:
                return found
    return None

def recursive_lookup(k, d):
    # Return the value stored under key k anywhere in the nested dict d
    if k in d:
        return d[k]
    for v in d.values():
        if isinstance(v, dict):
            found = recursive_lookup(k, v)
            if found is not None:
                return found
    return None

def flat_json(content, key):
    js = json.loads(content)
    if key is None or key == '':
        nested_list = findnestedlist(js)
    else:
        nested_list = recursive_lookup(key, js)
    return json_normalize(nested_list, sep="_")

key = "results"  # If you don't have it, give it None

csv_data = flat_json(your_json_string, key)
print(csv_data)
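The `sep` argument passed to `json_normalize` in this answer controls how nested keys are joined into column names. A small sketch of the difference, assuming pandas 1.0 or later (where `pd.json_normalize` is available at the top level) and using one record from the question:

```python
import pandas as pd

# One record from the question's payload.
record = {
    "elevation": 243.3462677001953,
    "location": {"lat": 42.974049, "lng": -81.205203},
}

# Default separator is "."; sep="_" gives names that are easier to use
# as attributes or in CSV headers.
default_cols = list(pd.json_normalize([record]).columns)
underscore_cols = list(pd.json_normalize([record], sep="_").columns)

print(default_cols)
print(underscore_cols)
```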

User · Answer

Check this snippet out.

import json
import pandas as pd

# reading the JSON data using json.load()
file = 'data.json'
with open(file) as train_file:
    dict_train = json.load(train_file)

# converting json dataset from dictionary to dataframe
train = pd.DataFrame.from_dict(dict_train, orient='index')
train.reset_index(level=0, inplace=True)

Hope it helps :)
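To see what `orient='index'` does without a file on disk, here is a minimal sketch. The `dict_train` value is a hypothetical stand-in for whatever `json.load` would return; it assumes the JSON is a dict of dicts keyed by row label.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by json.load(train_file).
dict_train = {
    "a": {"lat": 42.974049, "lng": -81.205203},
    "b": {"lat": 42.974298, "lng": -81.195755},
}

# Outer keys become the row index, inner keys become the columns.
train = pd.DataFrame.from_dict(dict_train, orient="index")

# Move the row labels into a regular column (named "index" by default).
train = train.reset_index()
print(train)
```

This layout only fits when the outer keys identify rows; for a payload like the one in the question, where the records sit in a list under `results`, it would not flatten the nested fields.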

User · Answer

billmanH's solution helped me but didn't work until I switched from:

n = data.loc[row,'json_column']

to:

n = data.iloc[[row]]['json_column']

Here's the rest of it; converting to a dictionary is helpful for working with JSON data.

import json

for row in range(len(data)):
    n = data.iloc[[row]]['json_column'].item()
    jsonDict = json.loads(n)
    if ('mykey' in jsonDict):
        display(jsonDict['mykey'])

User · Answer

Just a new version of the accepted answer, as Python 3.x does not support urllib2:

from requests import request
from pandas import json_normalize  # top-level import, pandas 1.0 or later

path1 = '42.974049,-81.205203|42.974298,-81.195755'
response = request(url='http://maps.googleapis.com/maps/api/elevation/json?locations=' + path1 + '&sensor=false', method='get')
# response.json() already returns the parsed dict, so json.loads is not needed
data = response.json()
json_normalize(data['results'])

User · Answer

Here is a small utility class that converts JSON to a DataFrame and back! Hope you find this helpful.

# -*- coding: utf-8 -*-
from pandas import json_normalize  # top-level import, pandas 1.0 or later

class DFConverter:

    # Converts the input JSON (parsed dict or list of dicts) to a DataFrame
    def convertToDF(self, dfJSON):
        return json_normalize(dfJSON)

    # Converts the input DataFrame to a JSON string
    def convertToJSON(self, df):
        resultJSON = df.to_json(orient='records')
        return resultJSON
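The round trip this class wraps can be sketched with the two underlying pandas calls. The records below reuse values from the question; the variable names are only illustrative.

```python
import json
import pandas as pd

records = [
    {"lat": 42.974049, "lng": -81.205203, "elevation": 243.3462677001953},
    {"lat": 42.974298, "lng": -81.195755, "elevation": 244.1318664550781},
]

# JSON-like records -> DataFrame
df = pd.json_normalize(records)

# DataFrame -> JSON string, one object per row
as_json = df.to_json(orient="records")

# JSON string -> DataFrame again
restored = pd.json_normalize(json.loads(as_json))
print(restored)
```

Note that `to_json` rounds floats to 10 decimal places by default (`double_precision=10`), so long values such as the elevations come back slightly shortened.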

User · Answer

Rumble supports JSON natively with JSONiq and runs on Spark, managing DataFrames internally so you don't need to, even if the data isn't fully structured:

let $coords := "42.974049,-81.205203%7C42.974298,-81.195755"
let $request := json-doc("http://maps.googleapis.com/maps/api/elevation/json?locations=" || $coords || "&sensor=false")
for $obj in $request.results[]
return {
  "latitude" : $obj.location.lat,
  "longitude" : $obj.location.lng,
  "elevation" : $obj.elevation
}

The results can be exported to CSV and then reopened in any other host language as a DataFrame.
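Reopening such a CSV export in Python might look like the sketch below. The CSV text is a hypothetical export: its column names mirror the keys returned by the query, and the values are taken from the question. An in-memory string is used in place of a file path.

```python
import io
import pandas as pd

# Hypothetical CSV export; a real export would be read from a file path.
csv_text = """latitude,longitude,elevation
42.974049,-81.205203,243.3462677001953
42.974298,-81.19575500000001,244.1318664550781
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)
```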

User · Answer

Once you have the flattened DataFrame obtained by the accepted answer, you can make the columns a MultiIndex ("fancy multiline header") like this:

df.columns = pd.MultiIndex.from_tuples([tuple(c.split('.')) for c in df.columns])
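Flattened column names do not all have the same number of parts (`elevation` has one, `location.lat` has two), so the tuples come out with unequal lengths. A variation that pads the short ones with an empty string keeps every tuple the same length. This is a sketch of one possible approach, not part of the original answer; the small frame below stands in for the flattened result.

```python
import pandas as pd

# Stand-in for the flattened frame produced by json_normalize.
df = pd.DataFrame(
    [[243.3462677001953, 42.974049, -81.205203]],
    columns=["elevation", "location.lat", "location.lng"],
)

parts = [c.split(".") for c in df.columns]
depth = max(len(p) for p in parts)

# Pad short names with "" so every tuple has the same number of levels.
df.columns = pd.MultiIndex.from_tuples(
    [tuple(p + [""] * (depth - len(p))) for p in parts]
)
print(df)
```

With this layout, `df["location"]` selects the `lat` and `lng` columns together.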

User · Answer

Use this small trick to make the data JSON-interpretable, since your data is not directly interpreted by json.loads():

>>> import json
>>> f=open("sampledata.txt","r+")
>>> data = f.read()
>>> for x in data.split("\n"):
...     strlist = "["+x+"]"
...     datalist=json.loads(strlist)
...     for y in datalist:
...             print(type(y))
...             print(y)
...
<type 'dict'>
{u'0': [[10.8, 36.0], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'1': [[10.8, 36.1], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'2': [[10.8, 36.2], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'3': [[10.8, 36.300000000000004], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'4': [[10.8, 36.4], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'5': [[10.8, 36.5], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'6': [[10.8, 36.6], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'7': [[10.8, 36.7], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'8': [[10.8, 36.800000000000004], {u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'9': [[10.8, 36.9], {u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
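The session above is Python 2. When a file holds one JSON value per line, a Python 3 sketch of the same idea is to parse each non-empty line on its own and collect the results. The `text` below is a made-up stand-in for the contents of `sampledata.txt`, with simpler records than the ones printed above.

```python
import json
import pandas as pd

# Hypothetical file contents: one JSON object per line.
text = """{"lat": 42.974049, "lng": -81.205203}
{"lat": 42.974298, "lng": -81.195755}
"""

# Parse every non-empty line separately, then build the frame in one go.
rows = [json.loads(line) for line in text.splitlines() if line.strip()]
df = pd.DataFrame(rows)
print(df)
```

Skipping blank lines avoids the error `json.loads` raises on an empty string, which a trailing newline would otherwise trigger.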

User · Answer

The problem is that you have several columns in the data frame that contain dicts with smaller dicts inside them. Useful JSON is often heavily nested. I have been writing small functions that pull the info I want out into a new column. That way I have it in the format that I want to use.

for row in range(len(data)):
    # First I load the dict (one at a time)
    n = data.loc[row, 'dict_column']
    # Now I make a new column that pulls out the data that I want.
    data.loc[row, 'new_column'] = n.get('key')
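The same extraction can be written without an explicit row loop by applying a function to the column. A minimal sketch, where the frame and the `dict_column` name are illustrative stand-ins rather than the original poster's data:

```python
import pandas as pd

# Illustrative frame with one column of nested dicts.
data = pd.DataFrame({
    "dict_column": [
        {"location": {"lat": 42.974049, "lng": -81.205203}},
        {"location": {"lat": 42.974298, "lng": -81.195755}},
    ]
})

# Pull one nested value into its own column; .get() with a default
# returns None instead of raising when a key is missing.
data["lat"] = data["dict_column"].apply(
    lambda d: d.get("location", {}).get("lat")
)
print(data["lat"].tolist())
```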

User · Answer

Optimization of the accepted answer:

The accepted answer has some functioning problems, so I want to share my code that does not rely on urllib2:

import requests
from pandas import json_normalize
url = 'https://www.energidataservice.dk/proxy/api/datastore_search?resource_id=nordpoolmarket&limit=5'

response = requests.get(url)
dictr = response.json()
recs = dictr['result']['records']
df = json_normalize(recs)
print(df)

Output:

        _id                    HourUTC               HourDK  ...  ElbasAveragePriceEUR  ElbasMaxPriceEUR  ElbasMinPriceEUR
0    264028  2019-01-01T00:00:00+00:00  2019-01-01T01:00:00  ...                   NaN               NaN               NaN
1    138428  2017-09-03T15:00:00+00:00  2017-09-03T17:00:00  ...                 33.28              33.4              32.0
2    138429  2017-09-03T16:00:00+00:00  2017-09-03T18:00:00  ...                 35.20              35.7              34.9
3    138430  2017-09-03T17:00:00+00:00  2017-09-03T19:00:00  ...                 37.50              37.8              37.3
4    138431  2017-09-03T18:00:00+00:00  2017-09-03T20:00:00  ...                 39.65              42.9              35.3
..      ...                        ...                  ...  ...                   ...               ...               ...
995  139290  2017-10-09T13:00:00+00:00  2017-10-09T15:00:00  ...                 38.40              38.4              38.4
996  139291  2017-10-09T14:00:00+00:00  2017-10-09T16:00:00  ...                 41.90              44.3              33.9
997  139292  2017-10-09T15:00:00+00:00  2017-10-09T17:00:00  ...                 46.26              49.5              41.4
998  139293  2017-10-09T16:00:00+00:00  2017-10-09T18:00:00  ...                 56.22              58.5              49.1
999  139294  2017-10-09T17:00:00+00:00  2017-10-09T19:00:00  ...                 56.71              65.4              42.2

PS: The API is for Danish electricity prices.

User · Answer

I found a quick and easy solution to what I wanted using json_normalize(), included in pandas 1.01.

from urllib2 import Request, urlopen
import json

import pandas as pd

path1 = '42.974049,-81.205203|42.974298,-81.195755'
request = Request('http://maps.googleapis.com/maps/api/elevation/json?locations=' + path1 + '&sensor=false')
response = urlopen(request)
elevations = response.read()
data = json.loads(elevations)
df = pd.json_normalize(data['results'])

This gives a nice flattened dataframe with the JSON data that I got from the Google Maps API.
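The flattening step can be tried without a network call by feeding `json_normalize` the response text shown in the question. This is a sketch under that assumption; the status check and the column renaming are additions for illustration, not part of the answer above.

```python
import json
import pandas as pd

# Response body copied from the question, used here in place of a live request.
elevations = """
{
   "results" : [
      {
         "elevation" : 243.3462677001953,
         "location" : { "lat" : 42.974049, "lng" : -81.205203 },
         "resolution" : 19.08790397644043
      },
      {
         "elevation" : 244.1318664550781,
         "location" : { "lat" : 42.974298, "lng" : -81.19575500000001 },
         "resolution" : 19.08790397644043
      }
   ],
   "status" : "OK"
}
"""

data = json.loads(elevations)

# Fail early if the payload does not report success.
if data.get("status") != "OK":
    raise ValueError("unexpected status: %r" % data.get("status"))

# Nested keys become dotted column names such as "location.lat".
df = pd.json_normalize(data["results"])

# Optional: rename to the column names asked for in the question.
df = df.rename(columns={"location.lat": "latitude", "location.lng": "longitude"})
print(df[["latitude", "longitude", "elevation"]])
```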
