[json] Parsing JSON with Unix tools

I'm trying to parse JSON returned from a curl request, like so:

curl 'http://twitter.com/users/username.json' |
    sed -e 's/[{}]/''/g' | 
    awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}'

The above splits the JSON into fields, for example:

% ...
"geo_enabled":false
"friends_count":245
"profile_text_color":"000000"
"status":"in_reply_to_screen_name":null
"source":"web"
"truncated":false
"text":"My status"
"favorited":false
% ...

How do I print a specific field (denoted by the -v k=text)?

Tags: json, bash, parsing

The answers:


Here is the answer for shell nerds using POSIX shell (with local) and egrep: JSON.sh, 4.7 KB.

This thing has plenty of test cases, so it should be correct. It is also pipeable. It is used in the package manager for bash, bpkg.
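
A minimal usage sketch, assuming you have downloaded JSON.sh into the current directory and made it executable (the script emits one [path] value line per element, which is what makes it greppable and pipeable; the -b flag, in the common dominictarr version, keeps leaf values only):

curl -s 'http://twitter.com/users/username.json' | ./JSON.sh -b | grep '^\["text"\]'

This should print a tab-separated line like ["text"] "My status", which you can finish off with cut -f2 or awk.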


If you have PHP:

php -r 'var_export(json_decode(`curl http://twitter.com/users/username.json`, 1));'

For example, there is a resource that provides JSON with country ISO codes, http://country.io/iso3.json, and we can easily fetch it in a shell with curl:

curl http://country.io/iso3.json

But that output is not very convenient and not readable. It is better to parse the JSON and print a readable structure:

php -r 'var_export(json_decode(`curl http://country.io/iso3.json`, 1));'

This code will print something like:

array (
  'BD' => 'BGD',
  'BE' => 'BEL',
  'BF' => 'BFA',
  'BG' => 'BGR',
  'BA' => 'BIH',
  'BB' => 'BRB',
  'WF' => 'WLF',
  'BL' => 'BLM',
  ...

If you have nested arrays, this output will look much better.

Hope this is helpful.


You've asked how to shoot yourself in the foot and I'm here to provide the ammo:

curl -s 'http://twitter.com/users/username.json' | sed -e 's/[{}]/''/g' | awk -v RS=',"' -F: '/^text/ {print $2}'

You could use tr -d '{}' instead of sed. But leaving them out completely seems to have the desired effect as well.

If you want to strip off the outer quotes, pipe the result of the above through sed 's/\(^"\|"$\)//g'
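
Putting those pieces together, a sketch of the full pipeline with the quotes stripped (using tr instead of sed, as suggested above):

curl -s 'http://twitter.com/users/username.json' | tr -d '{}' | awk -v RS=',"' -F: '/^text/ {print $2}' | sed 's/\(^"\|"$\)//g'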

I think others have sounded sufficient alarm. I'll be standing by with a cell phone to call an ambulance. Fire when ready.


This is a good use case for pythonpy:

curl 'http://twitter.com/users/username.json' | py 'json.load(sys.stdin)["name"]'

To quickly extract the values for a particular key, I personally like to use "grep -o", which only returns the regex's match. For example, to get the "text" field from tweets, something like:

grep -Po '"text":.*?[^\\]",' tweets.json

This regex is more robust than you might think; for example, it deals fine with strings having embedded commas and escaped quotes inside them. I think with a little more work you could make one that is actually guaranteed to extract the value, if it's atomic. (If it has nesting, then a regex can't do it of course.)

And to further clean (albeit keeping the string's original escaping) you can use something like: | perl -pe 's/"text"://; s/^"//; s/",$//'. (I did this for this analysis.)
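
As a quick check of that robustness claim, here is the same kind of pattern run against a value containing both an escaped quote and an embedded comma (a sketch; the trailing comma in the regex assumes the field is not the last one in the object):

$ echo '{"text":"a \"quoted\" bit, with a comma","favorited":false}' | grep -Po '"text":.*?[^\\]",'
"text":"a \"quoted\" bit, with a comma",

Piping that through the perl cleanup above leaves just: a \"quoted\" bit, with a comma (with the original escaping kept, as noted).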

To all the haters who insist you should use a real JSON parser -- yes, that is essential for correctness, but

  1. To do a really quick analysis, like counting values to check on data cleaning bugs or get a general feel for the data, banging out something on the command line is faster. Opening an editor to write a script is distracting.
  2. grep -o is orders of magnitude faster than the Python standard json library, at least when doing this for tweets (which are ~2 KB each). I'm not sure if this is just because json is slow (I should compare to yajl sometime); but in principle, a regex should be faster since it's finite state and much more optimizable, instead of a parser that has to support recursion, and in this case, spends lots of CPU building trees for structures you don't care about. (If someone wrote a finite state transducer that did proper (depth-limited) JSON parsing, that would be fantastic! In the meantime we have "grep -o".)

To write maintainable code, I always use a real parsing library. I haven't tried jsawk, but if it works well, that would address point #1.

One last, wackier, solution: I wrote a script that uses Python json and extracts the keys you want, into tab-separated columns; then I pipe through a wrapper around awk that allows named access to columns. In here: the json2tsv and tsvawk scripts. So for this example it would be:

json2tsv id text < tweets.json | tsvawk '{print "tweet " $id " is: " $text}'

This approach doesn't address #2, is more inefficient than a single Python script, and it's a little brittle: it forces normalization of newlines and tabs in string values, to play nice with awk's field/record-delimited view of the world. But it does let you stay on the command line, with more correctness than grep -o.


Using standard Unix tools available on most distros. Also works well with backslashes (\) and quotes (").

WARNING: this doesn't come close to the power of jq and will only work with very simple JSON objects. It's an attempt to answer the original question, for situations where you can't install additional tools.

function parse_json()
{
    # Usage: parse_json '<json string>' <key>
    echo $1 | \
    sed -e 's/[{}]/''/g' | \
    sed -e 's/", "/'\",\"'/g' | \
    sed -e 's/" ,"/'\",\"'/g' | \
    sed -e 's/" , "/'\",\"'/g' | \
    sed -e 's/","/'\"---SEPARATOR---\"'/g' | \
    awk -F':' -v RS='---SEPARATOR---' "\$1~/\"$2\"/ {print}" | \
    sed -e "s/\"$2\"://" | \
    tr -d "\n\t" | \
    sed -e 's/\\"/"/g' | \
    sed -e 's/\\\\/\\/g' | \
    sed -e 's/^[ \t]*//g' | \
    sed -e 's/^"//'  -e 's/"$//'
}


parse_json '{"username":"john, doe","email":"[email protected]"}' username
parse_json '{"username":"john doe","email":"[email protected]"}' email

--- outputs ---

john, doe
[email protected]

Here's one way you can do it with awk:

curl -sL 'http://twitter.com/users/username.json' | awk -F"," -v k="text" '{
    gsub(/{|}/,"")
    for(i=1;i<=NF;i++){
        if ( $i ~ k ){
            print $i
        }
    }
}'

Using Bash with Python

Create a bash function in your .bashrc file

function getJsonVal () { 
    python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)$1))"; 
}

Then

$ curl 'http://twitter.com/users/username.json' | getJsonVal "['text']"
My status
$ 

Here is the same function, but with error checking.

function getJsonVal() {
   if [ \( $# -ne 1 \) -o \( -t 0 \) ]; then
       cat <<EOF
Usage: getJsonVal 'key' < /tmp/
 -- or -- 
 cat /tmp/input | getJsonVal 'key'
EOF
       return;
   fi;
   python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)$1))";
}

Here $# -ne 1 makes sure exactly one argument is supplied, and -t 0 makes sure the input comes from a pipe or redirect (i.e. stdin is not a terminal).

The nice thing about this implementation is that you can access nested json values and get json in return! =)

Example:

$ echo '{"foo": {"bar": "baz", "a": [1,2,3]}}' |  getJsonVal "['foo']['a'][1]"
2

If you want to be really fancy, you could pretty print the data:

function getJsonVal () { 
    python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)$1, sort_keys=True, indent=4))"; 
}

$ echo '{"foo": {"bar": "baz", "a": [1,2,3]}}' |  getJsonVal "['foo']"
{
    "a": [
        1, 
        2, 
        3
    ], 
    "bar": "baz"
}

For more complex JSON parsing, I suggest using the Python jsonpath module (by Stefan Goessner):

  1. Install it -

sudo easy_install -U jsonpath

  2. Use it -

Example file.json (from http://goessner.net/articles/JsonPath) -

{ "store": {
    "book": [ 
      { "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      { "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      { "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby Dick",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      { "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  }
}

Parse it (extract all book titles with price < 10) -

$ cat file.json | python -c "import sys, json, jsonpath; print('\n'.join(jsonpath.jsonpath(json.load(sys.stdin), 'store.book[?(@.price < 10)].title')))"

Will output -

Sayings of the Century
Moby Dick

NOTE: The above command line does not include error checking. For a full solution with error checking, you should create a small Python script and wrap the code in try/except.


There is an easier way to get a property from a json string. Using a package.json file as an example, try this:

#!/usr/bin/env bash
my_val="$(json=$(<package.json) node -pe "JSON.parse(process.env.json)['version']")"

We're using process.env because this gets the file's contents into node.js as a string without any risk of malicious contents escaping their quoting and being parsed as code.
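
The same trick reaches nested values with plain JavaScript property access. A sketch (scripts.test is just a hypothetical key that a package.json might contain; adjust to your file):

test_cmd="$(json=$(<package.json) node -pe "JSON.parse(process.env.json).scripts.test")"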


Use Python's JSON support instead of using awk!

Something like this:

curl -s http://twitter.com/users/username.json | \
    python -c "import json,sys;obj=json.load(sys.stdin);print(obj['name']);"

If someone just wants to extract values from simple JSON objects without the need for nested structures, it is possible to use regular expressions without even leaving Bash.

Here is a function I defined using bash regular expressions based on the JSON standard:

function json_extract() {
  local key=$1
  local json=$2

  local string_regex='"([^"\]|\\.)*"'
  local number_regex='-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?'
  local value_regex="${string_regex}|${number_regex}|true|false|null"
  local pair_regex="\"${key}\"[[:space:]]*:[[:space:]]*(${value_regex})"

  if [[ ${json} =~ ${pair_regex} ]]; then
    echo $(sed 's/^"\|"$//g' <<< "${BASH_REMATCH[1]}")
  else
    return 1
  fi
}

Caveats: objects and arrays are not supported as value, but all other value types defined in the standard are supported. Also, a pair will be matched no matter how deep in the JSON document it is as long as it has exactly the same key name.

Using OP's example:

$ json_extract text "$(curl 'http://twitter.com/users/username.json')"
My status

$ json_extract friends_count "$(curl 'http://twitter.com/users/username.json')"
245
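
And a quick check of the any-depth behavior described in the caveats:

$ json_extract bar '{"foo": {"bar": "baz"}}'
baz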

There is also fx, a very simple but powerful JSON CLI processing tool: https://github.com/antonmedv/fx


Examples

Use an anonymous function:

$ echo '{"key": "value"}' | fx "x => x.key"
value

If you don't pass an anonymous function (param => ...), the code will automatically be transformed into one, and you can access the JSON via the this keyword:

$ echo '[1,2,3]' | fx "this.map(x => x * 2)"
[2, 4, 6]

Or just use dot syntax too:

$ echo '{"items": {"one": 1}}' | fx .items.one
1

You can pass any number of anonymous functions for reducing JSON:

$ echo '{"items": ["one", "two"]}' | fx "this.items" "this[1]"
two

You can update existing JSON using the spread operator:

$ echo '{"count": 0}' | fx "{...this, count: 1}"
{"count": 1}

It's just plain JavaScript; there's no new syntax to learn.


UPDATE 2018-11-06

fx now has interactive mode (!)

https://github.com/antonmedv/fx


A version which uses Ruby and http://flori.github.com/json/:

$ < file.json ruby -e "require 'rubygems'; require 'json'; puts JSON.pretty_generate(JSON[STDIN.read]);"

or more concisely:

$ < file.json ruby -r rubygems -r json -e "puts JSON.pretty_generate(JSON[STDIN.read]);"
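
And to answer the original question with the same standard library, a sketch (JSON[...] parses the string, and text is the field from the question):

$ < file.json ruby -r json -e "puts JSON[STDIN.read]['text']"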

Anyone who also has XML files might want to look at my Xidel. It is a command-line, dependency-free JSONiq processor. (That is, it also supports XQuery for XML or JSON processing.)

The example in the question would be:

 xidel -e 'json("http://twitter.com/users/username.json")("name")'

Or with my own, non-standard extension syntax:

 xidel -e 'json("http://twitter.com/users/username.json").name'

Using Node.js

If the system has Node.js installed, it's possible to use the -p (print) and -e (evaluate) script flags with JSON.parse to pull out any value that is needed.

A simple example using the JSON string { "foo": "bar" } and pulling out the value of "foo":

$ node -pe 'JSON.parse(process.argv[1]).foo' '{ "foo": "bar" }'
bar

Because we have access to cat and other utilities, we can use this for files:

$ node -pe 'JSON.parse(process.argv[1]).foo' "$(cat foobar.json)"
bar

Or any other source, such as a URL that returns JSON:

$ node -pe 'JSON.parse(process.argv[1]).name' "$(curl -s https://api.github.com/users/trevorsenior)"
Trevor Senior

I've done this, "parsing" a json response for a particular value, as follows:

curl $url | grep $var | awk '{print $2}' | sed s/\"//g 

Clearly, $url here would be the twitter url, and $var would be "text" to get the response for that var.

Really, the only thing I'm doing that the OP left out is grepping for the line with the specific variable he seeks. awk grabs the second item on the line, and sed strips the quotes.

Someone smarter than I am could probably do the whole thing with awk or grep.

Now, you could do it all with just sed:

curl $url | sed '/text/!d' | sed s/\"text\"://g | sed s/\"//g | sed s/\ //g

Thus, no awk, no grep... I don't know why I didn't think of that before. Hmmm...


One interesting tool that hasn't been covered in the existing answers is gron, written in Go, with a tagline that says Make JSON greppable!, which is exactly what it does.

So essentially gron breaks your JSON down into discrete assignments, so you can see the absolute 'path' to each value. Its primary advantage over other tools like jq is that it lets you search for a value without knowing how deeply nested the record is, without breaking the original JSON structure.

For example, to search for the 'twitter_username' field from the following link, I just do:

% gron 'https://api.github.com/users/lambda' | fgrep 'twitter_username'
json.twitter_username = "unlambda";
% gron 'https://api.github.com/users/lambda' | fgrep 'twitter_username' | gron -u
{
  "twitter_username": "unlambda"
}

As simple as that. Note how gron -u (short for ungron) reconstructs the JSON back from the search path. fgrep is needed just to filter the search to the paths wanted, and to keep the search expression from being evaluated as a regex: it is matched as a fixed string (fgrep is essentially grep -F).

Another example: search for a string to see where in the nested structure the record sits:

% echo '{"foo":{"bar":{"zoo":{"moo":"fine"}}}}' | gron | fgrep "fine"
json.foo.bar.zoo.moo = "fine";

It also supports streaming JSON with its -s command line flag, where you can continuously gron the input stream for a matching record. Also gron has zero runtime dependencies. You can download a binary for Linux, Mac, Windows or FreeBSD and run it.
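
A sketch of the streaming mode, assuming -s behaves as the flag's description suggests (each input line parsed as its own JSON object):

printf '%s\n' '{"status":"ok"}' '{"status":"fail"}' | gron -s | fgrep 'status'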

More usage examples and tips can be found on the official GitHub page under Advanced Usage.

As for why one might use gron over other JSON parsing tools, see the author's note on the project page:

Why shouldn't I just use jq?

jq is awesome, and a lot more powerful than gron, but with that power comes complexity. gron aims to make it easier to use the tools you already know, like grep and sed.


Unfortunately, the top-voted answer that uses grep returns the full match, which didn't work in my scenario; but if you know the JSON format will remain constant, you can use lookbehind and lookahead to extract just the desired values.

# echo '{"TotalPages":33,"FooBar":"he\"llo","anotherValue":100}' | grep -Po '(?<="FooBar":")(.*?)(?=",)'
he\"llo
# echo '{"TotalPages":33,"FooBar":"he\"llo","anotherValue":100}' | grep -Po '(?<="TotalPages":)(.*?)(?=,)'
33
#  echo '{"TotalPages":33,"FooBar":"he\"llo","anotherValue":100}' | grep -Po '(?<="anotherValue":)(.*?)(?=})'
100

I needed something in Bash that was short and would run without dependencies beyond vanilla Linux LSB and Mac OS, for both Python 2.7 and 3, and that would handle errors, e.g. report JSON parse errors and missing property errors without spewing Python exceptions:

json-extract () {
  if [[ "$1" == "" || "$1" == "-h" || "$1" == "-?" || "$1" == "--help" ]] ; then
    echo 'Extract top level property value from json document'
    echo '  Usage: json-extract <property> [ <file-path> ]'
    echo '  Example 1: json-extract status /tmp/response.json'
    echo '  Example 2: echo $JSON_STRING | json-extract status'
    echo '  Status codes: 0 - success, 1 - json parse error, 2 - property missing'
  else
    python -c $'import sys, json;\ntry: obj = json.load(open(sys.argv[2])); \nexcept: sys.exit(1)\ntry: print(obj[sys.argv[1]])\nexcept: sys.exit(2)' "$1" "${2:-/dev/stdin}"
  fi
}

I can not use any of the answers here. No jq available, no shell arrays, no declare, no grep -P, no lookbehind and lookahead, no Python, no Perl, no Ruby, no... not even Bash. The remaining answers simply do not work well. JavaScript sounded familiar, but the tin says Nescaffe - so it is a no go, too :) Even if available, for my simple need they would be overkill and slow.

Yet, it is extremely important for me to get many variables from the JSON-formatted reply of my modem. I am doing it in sh with a very trimmed-down BusyBox on my routers! No problems using awk alone: just set the delimiters and read the data. For a single variable, that is all!

awk 'BEGIN { FS="\""; RS="," }; { if ($2 == "login") {print $4} }' test.json

Remember I have no arrays? I had to assign the awk-parsed data to the 11 variables I need in a shell script. Wherever I looked, that was said to be an impossible mission. No problem with that, either.

My solution is simple. This code will: 1) parse the .json file from the question (actually, I have borrowed a working data sample from the most upvoted answer) and pick out the quoted data, plus 2) create shell variables from within awk, assigning freely named shell variable names.

eval $( curl -s 'https://api.github.com/users/lambda' | 
awk ' BEGIN { FS="\""; RS="," };
{
    if ($2 == "login") { print "Login=\""$4"\"" }
    if ($2 == "name") { print "Name=\""$4"\"" }
    if ($2 == "updated_at") { print "Updated=\""$4"\"" }
}' )
echo "$Login, $Name, $Updated"

No problems with blanks within values. In my use, the same command parses a long single-line output. As eval is used, this solution is suited for trusted data only. It is simple to adapt it to pick up unquoted data, as sketched below. For a huge number of variables, a marginal speed gain can be achieved using else if. Lack of arrays obviously means no multiple records without extra fiddling, but where arrays are available, adapting this solution is a simple task.
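
For instance, a sketch of picking up an unquoted numeric field with the same FS/RS settings (with FS set to the double quote, the number lands in $3 behind a leading colon that needs trimming; the GitHub reply used above does contain a numeric id field):

eval $( curl -s 'https://api.github.com/users/lambda' |
awk ' BEGIN { FS="\""; RS="," };
{
    if ($2 == "id") { v = $3; gsub(/[^0-9]/, "", v); print "Id=\""v"\"" }
}' )
echo "$Id"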

@maikel's sed answer almost works (but I can not comment on it). For my nicely formatted data, it works. Not so much with the example used here (missing quotes throw it off). It is complicated and difficult to modify. Plus, I do not like having to make 11 calls to extract 11 variables. Why? I timed 100 loops extracting 9 variables: the sed function took 48.99 seconds and my solution took 0.91 seconds! Not fair? Doing just a single extraction of 9 variables: 0.51 vs. 0.02 seconds.


Following MartinR and Boecko's lead:

$ curl -s 'http://twitter.com/users/username.json' | python -mjson.tool

That will give you an extremely grep friendly output. Very convenient:

$ curl -s 'http://twitter.com/users/username.json' | python -mjson.tool | grep my_key

Here is a good reference. In this case:

curl 'http://twitter.com/users/username.json' | sed -e 's/[{}]/''/g' | awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) { where = match(a[i], /\"text\"/); if(where) {print a[i]} }  }'

Niet is a tool that helps you extract data from a JSON or YAML file directly in your shell/Bash CLI.

$ pip install niet

Consider a json file named project.json with the following contents:

{
  project: {
    meta: {
      name: project-sample
    }
  }
}

You can use niet like this:

$ PROJECT_NAME=$(niet project.json project.meta.name)
$ echo ${PROJECT_NAME}
project-sample

You can try something like this -

curl -s 'http://twitter.com/users/jaypalsingh.json' | 
awk -F=":" -v RS="," '$1~/"text"/ {print}'

Here is a simple approach for a Node.js-ready environment:

curl -L https://github.com/trentm/json/raw/master/lib/json.js > json
chmod +x json
echo '{"hello":{"hi":"there"}}' | ./json "hello.hi"

Now that PowerShell is cross-platform, I thought I'd throw it out there, since I find it to be fairly intuitive and extremely simple.

curl -s 'https://api.github.com/users/lambda' | ConvertFrom-Json 

ConvertFrom-Json converts the JSON into a PowerShell custom object, so you can easily work with the properties from that point forward. If you only wanted the 'id' property, for example, you'd just do this:

curl -s 'https://api.github.com/users/lambda' | ConvertFrom-Json | select -ExpandProperty id

If you wanted to invoke the whole thing from within Bash, then you'd have to call it like this:

powershell 'curl -s "https://api.github.com/users/lambda" | ConvertFrom-Json'

Of course, there's a pure PowerShell way to do it without curl, which would be:

Invoke-WebRequest 'https://api.github.com/users/lambda' | select -ExpandProperty Content | ConvertFrom-Json

Finally, there's also 'ConvertTo-Json' which converts a custom object to JSON just as easily. Here's an example:

(New-Object PsObject -Property @{ Name = "Tester"; SomeList = @('one','two','three')}) | ConvertTo-Json

Which would produce nice JSON like this:

{
    "Name":  "Tester",
    "SomeList":  [
                     "one",
                     "two",
                     "three"
                 ]
}

Admittedly, using a Windows shell on Unix is somewhat sacrilegious, but PowerShell is really good at some things, and parsing JSON and XML are a couple of them. This is the GitHub page for the cross-platform version: https://github.com/PowerShell/PowerShell


If pip is available on the system then:

$ pip install json-query

Examples of usage:

$ curl -s http://0/file.json | json-query
{
    "key":"value"    
}

$ curl -s http://0/file.json | json-query my.key
value

$ curl -s http://0/file.json | json-query my.keys.
key_1
key_2
key_3

$ curl -s http://0/file.json | json-query my.keys.2
value_2

Update (2020)

My biggest issue with external tools (e.g. Python) was that you have to deal with package managers and dependencies to install them.

However, now that we have jq as a standalone, static tool that's easy to install cross-platform via Github Releases and Webi (webinstall.dev/jq), I'd recommend that:

Mac, Linux:

curl -sS https://webinstall.dev/jq | bash

Windows 10:

curl.exe -A MS https://webinstall.dev/jq | powershell
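
Once jq is installed, the original question becomes a one-liner (-r prints the raw string without the JSON quotes):

curl -s 'http://twitter.com/users/username.json' | jq -r '.text'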

Original (2011)

TickTick is a JSON parser written in bash (<250 lines of code)

Here's the author's snippet from his article, Imagine a world where Bash supports JSON:

#!/bin/bash
. ticktick.sh

``  
  people = { 
    "Writers": [
      "Rod Serling",
      "Charles Beaumont",
      "Richard Matheson"
    ],  
    "Cast": {
      "Rod Serling": { "Episodes": 156 },
      "Martin Landau": { "Episodes": 2 },
      "William Shatner": { "Episodes": 2 } 
    }   
  }   
``  

function printDirectors() {
  echo "  The ``people.Directors.length()`` Directors are:"

  for director in ``people.Directors.items()``; do
    printf "    - %s\n" ${!director}
  done
}   

`` people.Directors = [ "John Brahm", "Douglas Heyes" ] ``
printDirectors

newDirector="Lamont Johnson"
`` people.Directors.push($newDirector) ``
printDirectors

echo "Shifted: "``people.Directors.shift()``
printDirectors

echo "Popped: "``people.Directors.pop()``
printDirectors

A two-liner that uses Python. It works particularly well if you're writing a single .sh file and don't want to depend on another .py file. It also leverages pipes: echo "{\"field\": \"value\"}" can be replaced by anything that prints JSON to stdout.

echo "{\"field\": \"value\"}" | python -c 'import sys, json
print(json.load(sys.stdin)["field"])'

On the basis that some of the recommendations here (especially in the comments) suggested the use of Python, I was disappointed not to find an example.

So, here's a one liner to get a single value from some JSON data. It assumes that you are piping the data in (from somewhere) and so should be useful in a scripting context.

echo '{"hostname":"test","domainname":"example.com"}' | python -c 'import json,sys;obj=json.load(sys.stdin);print obj["hostname"]'

Parsing JSON with PHP CLI

Arguably off topic, but since precedence reigns, this question remains incomplete without a mention of our trusty and faithful PHP, am I right?

Using the same example JSON, but let's assign it to a variable to reduce obscurity.

$ export JSON='{"hostname":"test","domainname":"example.com"}'

Now for PHP goodness, using file_get_contents and the php://stdin stream wrapper.

$ echo $JSON|php -r 'echo json_decode(file_get_contents("php://stdin"))->hostname;'

or, as pointed out, using fgets and the stream already opened at the CLI constant STDIN.

$ echo $JSON|php -r 'echo json_decode(fgets(STDIN))->hostname;'

nJoy!


You could just download the jq binary for your platform and run it (after chmod +x jq):

$ curl 'https://twitter.com/users/username.json' | ./jq -r '.name'

It extracts "name" attribute from the json object.

jq homepage says it is like sed for JSON data.


You can use bashJson

It's a wrapper for Python's JSON module and can handle complex JSON data.

Let's consider this example JSON data from the file test.json:

{
    "name":"Test tool",
    "author":"hack4mer",
    "supported_os":{
        "osx":{
            "foo":"bar",
            "min_version" : 10.12,
            "tested_on" : [10.1,10.13]
        },
        "ubuntu":{
            "min_version":14.04,
            "tested_on" : 16.04
        }
    }
}

The following commands read data from this example JSON file:

./bashjson.sh test.json name

Prints: Test tool

./bashjson.sh test.json supported_os osx foo

Prints: bar

./bashjson.sh test.json supported_os osx tested_on

Prints: [10.1,10.13]


Parsing JSON is painful in a shell script. With a more appropriate language, create a tool that extracts JSON attributes in a way consistent with shell scripting conventions. You can use your new tool to solve the immediate shell scripting problem and then add it to your kit for future situations.

For example, consider a tool jsonlookup such that if I say jsonlookup access token id it will return the attribute id defined within the attribute token defined within the attribute access from stdin, which is presumably JSON data. If the attribute doesn't exist, the tool returns nothing (exit status 1). If the parsing fails, exit status 2 and a message to stderr. If the lookup succeeds, the tool prints the attribute's value.

Having created a unix tool for the precise purpose of extracting JSON values you can easily use it in shell scripts:

access_token=$(curl <some horrible crap> | jsonlookup access token id)

Any language will do for the implementation of jsonlookup. Here is a fairly concise python version:

#!/usr/bin/python                                                               

import sys
import json

try: rep = json.loads(sys.stdin.read())
except:
    sys.stderr.write(sys.argv[0] + ": unable to parse JSON from stdin\n")
    sys.exit(2)
for key in sys.argv[1:]:
    if key not in rep:
        sys.exit(1)
    rep = rep[key]
print(rep)

You can use jshon:

curl 'http://twitter.com/users/username.json' | jshon -e text
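
That prints the value still JSON-quoted; if your build of jshon supports the -u (unstring) flag, appending it returns the bare string:

curl 'http://twitter.com/users/username.json' | jshon -e text -u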

This is yet another bash & python hybrid answer. I posted this answer because I wanted to process more complex JSON output while reducing the complexity of my bash application. I want to crack open the following JSON object from http://www.arcgis.com/sharing/rest/info?f=json in bash:

{
  "owningSystemUrl": "http://www.arcgis.com",
  "authInfo": {
    "tokenServicesUrl": "https://www.arcgis.com/sharing/rest/generateToken",
    "isTokenBasedSecurity": true
  }
}

In the following example, I created my own implementation of jq and unquote leveraging Python. You'll note that once we parse the JSON into a Python dictionary, we can use Python syntax to navigate it. To navigate the above, the syntax is:

  • data
  • data[ "authInfo" ]
  • data[ "authInfo" ][ "tokenServicesUrl" ]

By using magic in bash, we omit data and only supply the python text to the right of data, i.e.

  • jq
  • jq '[ "authInfo" ]'
  • jq '[ "authInfo" ][ "tokenServicesUrl" ]'

Note, with no parameters, jq acts as a JSON prettifier. With parameters, we can use python syntax to extract anything we want from the dictionary including navigating subdictionaries and array elements.

Here are the bash python hybrid functions:

#!/bin/bash -xe

jq_py() {
  cat <<EOF
import json, sys
data = json.load( sys.stdin )
print( json.dumps( data$1, indent = 4 ) )
EOF
}

jq() {
  python -c "$( jq_py "$1" )"
}

unquote_py() {
  cat <<EOF
import json,sys
print( json.load( sys.stdin ) )
EOF
}

unquote() {
  python -c "$( unquote_py )"
}

Here's a sample usage of the bash python functions:

curl http://www.arcgis.com/sharing/rest/info?f=json | tee arcgis.json
# {"owningSystemUrl":"https://www.arcgis.com","authInfo":{"tokenServicesUrl":"https://www.arcgis.com/sharing/rest/generateToken","isTokenBasedSecurity":true}}

cat arcgis.json | jq
# {
#     "owningSystemUrl": "https://www.arcgis.com",
#     "authInfo": {
#         "tokenServicesUrl": "https://www.arcgis.com/sharing/rest/generateToken",
#         "isTokenBasedSecurity": true
#     }
# }

cat arcgis.json | jq '[ "authInfo" ]'
# {
#     "tokenServicesUrl": "https://www.arcgis.com/sharing/rest/generateToken",
#     "isTokenBasedSecurity": true
# }

cat arcgis.json | jq '[ "authInfo" ][ "tokenServicesUrl" ]'
# "https://www.arcgis.com/sharing/rest/generateToken"

cat arcgis.json | jq '[ "authInfo" ][ "tokenServicesUrl" ]' | unquote
# https://www.arcgis.com/sharing/rest/generateToken

I used this to extract the video duration from ffprobe's JSON output:

MOVIE_INFO=`ffprobe "path/to/movie.mp4"  -show_streams -show_format -print_format json -v quiet` 
MOVIE_SECONDS=`echo "$MOVIE_INFO"|grep -w \"duration\" |tail -1 | cut -d\" -f4 |cut -d \. -f 1`

It can be used to extract a value from any JSON:

value=`echo "$jsondata"|grep -w \"key_name\" |tail -1 | cut -d\" -f4`
