[javascript] Fastest way to flatten / un-flatten nested JSON objects

I threw some code together to flatten and un-flatten complex/nested JSON objects. It works, but it's a bit slow (triggers the 'long script' warning).

For the flattened names I want "." as the delimiter and [INDEX] for arrays.

Examples:

un-flattened | flattened
---------------------------
{foo:{bar:false}} => {"foo.bar":false}
{a:[{b:["c","d"]}]} => {"a[0].b[0]":"c","a[0].b[1]":"d"}
[1,[2,[3,4],5],6] => {"[0]":1,"[1].[0]":2,"[1].[1].[0]":3,"[1].[1].[1]":4,"[1].[2]":5,"[2]":6}

I created a benchmark that ~simulates my use case http://jsfiddle.net/WSzec/

  • Get a nested JSON object
  • Flatten it
  • Look through it and possibly modify it while flattened
  • Unflatten it back to it's original nested format to be shipped away

I would like faster code: For clarification, code that completes the JSFiddle benchmark (http://jsfiddle.net/WSzec/) significantly faster (~20%+ would be nice) in IE 9+, FF 24+, and Chrome 29+.

Here's the relevant JavaScript code: Current Fastest: http://jsfiddle.net/WSzec/6/

JSON.unflatten = function(data) {
    "use strict";
    if (Object(data) !== data || Array.isArray(data))
        return data;
    var result = {}, cur, prop, idx, last, temp;
    for(var p in data) {
        cur = result, prop = "", last = 0;
        do {
            idx = p.indexOf(".", last);
            temp = p.substring(last, idx !== -1 ? idx : undefined);
            cur = cur[prop] || (cur[prop] = (!isNaN(parseInt(temp)) ? [] : {}));
            prop = temp;
            last = idx + 1;
        } while(idx >= 0);
        cur[prop] = data[p];
    }
    return result[""];
}
JSON.flatten = function(data) {
    var result = {};
    function recurse (cur, prop) {
        if (Object(cur) !== cur) {
            result[prop] = cur;
        } else if (Array.isArray(cur)) {
             for(var i=0, l=cur.length; i<l; i++)
                 recurse(cur[i], prop ? prop+"."+i : ""+i);
            if (l == 0)
                result[prop] = [];
        } else {
            var isEmpty = true;
            for (var p in cur) {
                isEmpty = false;
                recurse(cur[p], prop ? prop+"."+p : p);
            }
            if (isEmpty)
                result[prop] = {};
        }
    }
    recurse(data, "");
    return result;
}

EDIT 1 Modified the above to @Bergi 's implementation which is currently the fastest. As an aside, using ".indexOf" instead of "regex.exec" is around 20% faster in FF but 20% slower in Chrome; so I'll stick with the regex since it's simpler (here's my attempt at using indexOf to replace the regex http://jsfiddle.net/WSzec/2/).

EDIT 2 Building on @Bergi 's idea I managed to created a faster non-regex version (3x faster in FF and ~10% faster in Chrome). http://jsfiddle.net/WSzec/6/ In the this (the current) implementation the rules for key names are simply, keys cannot start with an integer or contain a period.

Example:

  • {"foo":{"bar":[0]}} => {"foo.bar.0":0}

EDIT 3 Adding @AaditMShah 's inline path parsing approach (rather than String.split) helped to improve the unflatten performance. I'm very happy with the overall performance improvement reached.

The latest jsfiddle and jsperf:

http://jsfiddle.net/WSzec/14/

http://jsperf.com/flatten-un-flatten/4

This question is related to javascript algorithm

The answer is


You can try out the package jpflat.

It flattens, inflates, resolves promises, flattens arrays, has customizable path creation and customizable value serialization.

The reducers and serializers receive the whole path as an array of it's parts, so more complex operations can be done to the path instead of modifying a single key or changing the delimiter.

Json path is the default, hence "jp"flat.

https://www.npmjs.com/package/jpflat

let flatFoo = await require('jpflat').flatten(foo)

You can use https://github.com/hughsk/flat

Take a nested Javascript object and flatten it, or unflatten an object with delimited keys.

Example from the doc

var flatten = require('flat')

flatten({
    key1: {
        keyA: 'valueI'
    },
    key2: {
        keyB: 'valueII'
    },
    key3: { a: { b: { c: 2 } } }
})

// {
//   'key1.keyA': 'valueI',
//   'key2.keyB': 'valueII',
//   'key3.a.b.c': 2
// }


var unflatten = require('flat').unflatten

unflatten({
    'three.levels.deep': 42,
    'three.levels': {
        nested: true
    }
})

// {
//     three: {
//         levels: {
//             deep: 42,
//             nested: true
//         }
//     }
// }

This code recursively flattens out JSON objects.

I included my timing mechanism in the code and it gives me 1ms but I'm not sure if that's the most accurate one.

            var new_json = [{
              "name": "fatima",
              "age": 25,
              "neighbour": {
                "name": "taqi",
                "location": "end of the street",
                "property": {
                  "built in": 1990,
                  "owned": false,
                  "years on market": [1990, 1998, 2002, 2013],
                  "year short listed": [], //means never
                }
              },
              "town": "Mountain View",
              "state": "CA"
            },
            {
              "name": "qianru",
              "age": 20,
              "neighbour": {
                "name": "joe",
                "location": "opposite to the park",
                "property": {
                  "built in": 2011,
                  "owned": true,
                  "years on market": [1996, 2011],
                  "year short listed": [], //means never
                }
              },
              "town": "Pittsburgh",
              "state": "PA"
            }]

            function flatten(json, flattened, str_key) {
                for (var key in json) {
                  if (json.hasOwnProperty(key)) {
                    if (json[key] instanceof Object && json[key] != "") {
                      flatten(json[key], flattened, str_key + "." + key);
                    } else {
                      flattened[str_key + "." + key] = json[key];
                    }
                  }
                }
            }

        var flattened = {};
        console.time('flatten'); 
        flatten(new_json, flattened, "");
        console.timeEnd('flatten');

        for (var key in flattened){
          console.log(key + ": " + flattened[key]);
        }

Output:

flatten: 1ms
.0.name: fatima
.0.age: 25
.0.neighbour.name: taqi
.0.neighbour.location: end of the street
.0.neighbour.property.built in: 1990
.0.neighbour.property.owned: false
.0.neighbour.property.years on market.0: 1990
.0.neighbour.property.years on market.1: 1998
.0.neighbour.property.years on market.2: 2002
.0.neighbour.property.years on market.3: 2013
.0.neighbour.property.year short listed: 
.0.town: Mountain View
.0.state: CA
.1.name: qianru
.1.age: 20
.1.neighbour.name: joe
.1.neighbour.location: opposite to the park
.1.neighbour.property.built in: 2011
.1.neighbour.property.owned: true
.1.neighbour.property.years on market.0: 1996
.1.neighbour.property.years on market.1: 2011
.1.neighbour.property.year short listed: 
.1.town: Pittsburgh
.1.state: PA

I added +/- 10-15% efficiency to the selected answer by minor code refactoring and moving the recursive function outside of the function namespace.

See my question: Are namespaced functions reevaluated on every call? for why this slows nested functions down.

function _flatten (target, obj, path) {
  var i, empty;
  if (obj.constructor === Object) {
    empty = true;
    for (i in obj) {
      empty = false;
      _flatten(target, obj[i], path ? path + '.' + i : i);
    }
    if (empty && path) {
      target[path] = {};
    }
  } 
  else if (obj.constructor === Array) {
    i = obj.length;
    if (i > 0) {
      while (i--) {
        _flatten(target, obj[i], path + '[' + i + ']');
      }
    } else {
      target[path] = [];
    }
  }
  else {
    target[path] = obj;
  }
}

function flatten (data) {
  var result = {};
  _flatten(result, data, null);
  return result;
}

See benchmark.


Here's mine. It runs in <2ms in Google Apps Script on a sizable object. It uses dashes instead of dots for separators, and it doesn't handle arrays specially like in the asker's question, but this is what I wanted for my use.

function flatten (obj) {
  var newObj = {};
  for (var key in obj) {
    if (typeof obj[key] === 'object' && obj[key] !== null) {
      var temp = flatten(obj[key])
      for (var key2 in temp) {
        newObj[key+"-"+key2] = temp[key2];
      }
    } else {
      newObj[key] = obj[key];
    }
  }
  return newObj;
}

Example:

var test = {
  a: 1,
  b: 2,
  c: {
    c1: 3.1,
    c2: 3.2
  },
  d: 4,
  e: {
    e1: 5.1,
    e2: 5.2,
    e3: {
      e3a: 5.31,
      e3b: 5.32
    },
    e4: 5.4
  },
  f: 6
}

Logger.log("start");
Logger.log(JSON.stringify(flatten(test),null,2));
Logger.log("done");

Example output:

[17-02-08 13:21:05:245 CST] start
[17-02-08 13:21:05:246 CST] {
  "a": 1,
  "b": 2,
  "c-c1": 3.1,
  "c-c2": 3.2,
  "d": 4,
  "e-e1": 5.1,
  "e-e2": 5.2,
  "e-e3-e3a": 5.31,
  "e-e3-e3b": 5.32,
  "e-e4": 5.4,
  "f": 6
}
[17-02-08 13:21:05:247 CST] done

ES6 version:

const flatten = (obj, path = '') => {        
    if (!(obj instanceof Object)) return {[path.replace(/\.$/g, '')]:obj};

    return Object.keys(obj).reduce((output, key) => {
        return obj instanceof Array ? 
             {...output, ...flatten(obj[key], path +  '[' + key + '].')}:
             {...output, ...flatten(obj[key], path + key + '.')};
    }, {});
}

Example:

console.log(flatten({a:[{b:["c","d"]}]}));
console.log(flatten([1,[2,[3,4],5],6]));

Here's a recursive solution for flatten I put together in PowerShell:

#---helper function for ConvertTo-JhcUtilJsonTable
#
function getNodes {
    param (
        [Parameter(Mandatory)]
        [System.Object]
        $job,
        [Parameter(Mandatory)]
        [System.String]
        $path
    )

    $t = $job.GetType()
    $ct = 0
    $h = @{}

    if ($t.Name -eq 'PSCustomObject') {
        foreach ($m in Get-Member -InputObject $job -MemberType NoteProperty) {
            getNodes -job $job.($m.Name) -path ($path + '.' + $m.Name)
        }
        
    }
    elseif ($t.Name -eq 'Object[]') {
        foreach ($o in $job) {
            getNodes -job $o -path ($path + "[$ct]")
            $ct++
        }
    }
    else {
        $h[$path] = $job
        $h
    }
}


#---flattens a JSON document object into a key value table where keys are proper JSON paths corresponding to their value
#
function ConvertTo-JhcUtilJsonTable {
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [System.Object[]]
        $jsonObj
    )

    begin {
        $rootNode = 'root'    
    }
    
    process {
        foreach ($o in $jsonObj) {
            $table = getNodes -job $o -path $rootNode

            # $h = @{}
            $a = @()
            $pat = '^' + $rootNode
            
            foreach ($i in $table) {
                foreach ($k in $i.keys) {
                    # $h[$k -replace $pat, ''] = $i[$k]
                    $a += New-Object -TypeName psobject -Property @{'Key' = $($k -replace $pat, ''); 'Value' = $i[$k]}
                    # $h[$k -replace $pat, ''] = $i[$k]
                }
            }
            # $h
            $a
        }
    }

    end{}
}

Example:

'{"name": "John","Address": {"house": "1234", "Street": "Boogie Ave"}, "pets": [{"Type": "Dog", "Age": 4, "Toys": ["rubberBall", "rope"]},{"Type": "Cat", "Age": 7, "Toys": ["catNip"]}]}' | ConvertFrom-Json | ConvertTo-JhcUtilJsonTable
Key              Value
---              -----
.Address.house   1234
.Address.Street  Boogie Ave
.name            John
.pets[0].Age     4
.pets[0].Toys[0] rubberBall
.pets[0].Toys[1] rope
.pets[0].Type    Dog
.pets[1].Age     7
.pets[1].Toys[0] catNip
.pets[1].Type    Cat

3 ½ Years later...

For my own project I wanted to flatten JSON objects in mongoDB dot notation and came up with a simple solution:

/**
 * Recursively flattens a JSON object using dot notation.
 *
 * NOTE: input must be an object as described by JSON spec. Arbitrary
 * JS objects (e.g. {a: () => 42}) may result in unexpected output.
 * MOREOVER, it removes keys with empty objects/arrays as value (see
 * examples bellow).
 *
 * @example
 * // returns {a:1, 'b.0.c': 2, 'b.0.d.e': 3, 'b.1': 4}
 * flatten({a: 1, b: [{c: 2, d: {e: 3}}, 4]})
 * // returns {a:1, 'b.0.c': 2, 'b.0.d.e.0': true, 'b.0.d.e.1': false, 'b.0.d.e.2.f': 1}
 * flatten({a: 1, b: [{c: 2, d: {e: [true, false, {f: 1}]}}]})
 * // return {a: 1}
 * flatten({a: 1, b: [], c: {}})
 *
 * @param obj item to be flattened
 * @param {Array.string} [prefix=[]] chain of prefix joined with a dot and prepended to key
 * @param {Object} [current={}] result of flatten during the recursion
 *
 * @see https://docs.mongodb.com/manual/core/document/#dot-notation
 */
function flatten (obj, prefix, current) {
  prefix = prefix || []
  current = current || {}

  // Remember kids, null is also an object!
  if (typeof (obj) === 'object' && obj !== null) {
    Object.keys(obj).forEach(key => {
      this.flatten(obj[key], prefix.concat(key), current)
    })
  } else {
    current[prefix.join('.')] = obj
  }

  return current
}

Features and/or caveats

  • It only accepts JSON objects. So if you pass something like {a: () => {}} you might not get what you wanted!
  • It removes empty arrays and objects. So this {a: {}, b: []} is flattened to {}.

Use this library:

npm install flat

Usage (from https://www.npmjs.com/package/flat):

Flatten:

    var flatten = require('flat')


    flatten({
        key1: {
            keyA: 'valueI'
        },
        key2: {
            keyB: 'valueII'
        },
        key3: { a: { b: { c: 2 } } }
    })

    // {
    //   'key1.keyA': 'valueI',
    //   'key2.keyB': 'valueII',
    //   'key3.a.b.c': 2
    // }

Un-flatten:

var unflatten = require('flat').unflatten

unflatten({
    'three.levels.deep': 42,
    'three.levels': {
        nested: true
    }
})

// {
//     three: {
//         levels: {
//             deep: 42,
//             nested: true
//         }
//     }
// }

I'd like to add a new version of flatten case (this is what i needed :)) which, according to my probes with the above jsFiddler, is slightly faster then the currently selected one. Moreover, me personally see this snippet a bit more readable, which is of course important for multi-developer projects.

function flattenObject(graph) {
    let result = {},
        item,
        key;

    function recurr(graph, path) {
        if (Array.isArray(graph)) {
            graph.forEach(function (itm, idx) {
                key = path + '[' + idx + ']';
                if (itm && typeof itm === 'object') {
                    recurr(itm, key);
                } else {
                    result[key] = itm;
                }
            });
        } else {
            Reflect.ownKeys(graph).forEach(function (p) {
                key = path + '.' + p;
                item = graph[p];
                if (item && typeof item === 'object') {
                    recurr(item, key);
                } else {
                    result[key] = item;
                }
            });
        }
    }
    recurr(graph, '');

    return result;
}

Here is some code I wrote to flatten an object I was working with. It creates a new class that takes every nested field and brings it into the first layer. You could modify it to unflatten by remembering the original placement of the keys. It also assumes the keys are unique even across nested objects. Hope it helps.

class JSONFlattener {
    ojson = {}
    flattenedjson = {}

    constructor(original_json) {
        this.ojson = original_json
        this.flattenedjson = {}
        this.flatten()
    }

    flatten() {
        Object.keys(this.ojson).forEach(function(key){
            if (this.ojson[key] == null) {

            } else if (this.ojson[key].constructor == ({}).constructor) {
                this.combine(new JSONFlattener(this.ojson[key]).returnJSON())
            } else {
                this.flattenedjson[key] = this.ojson[key]
            }
        }, this)        
    }

    combine(new_json) {
        //assumes new_json is a flat array
        Object.keys(new_json).forEach(function(key){
            if (!this.flattenedjson.hasOwnProperty(key)) {
                this.flattenedjson[key] = new_json[key]
            } else {
                console.log(key+" is a duplicate key")
            }
        }, this)
    }

    returnJSON() {
        return this.flattenedjson
    }
}

console.log(new JSONFlattener(dad_dictionary).returnJSON())

As an example, it converts

nested_json = {
    "a": {
        "b": {
            "c": {
                "d": {
                    "a": 0
                }
            }
        }
    },
    "z": {
        "b":1
    },
    "d": {
        "c": {
            "c": 2
        }
    }
}

into

{ a: 0, b: 1, c: 2 }


Here's another approach that runs slower (about 1000ms) than the above answer, but has an interesting idea :-)

Instead of iterating through each property chain, it just picks the last property and uses a look-up-table for the rest to store the intermediate results. This look-up-table will be iterated until there are no property chains left and all values reside on uncocatenated properties.

JSON.unflatten = function(data) {
    "use strict";
    if (Object(data) !== data || Array.isArray(data))
        return data;
    var regex = /\.?([^.\[\]]+)$|\[(\d+)\]$/,
        props = Object.keys(data),
        result, p;
    while(p = props.shift()) {
        var m = regex.exec(p),
            target;
        if (m.index) {
            var rest = p.slice(0, m.index);
            if (!(rest in data)) {
                data[rest] = m[2] ? [] : {};
                props.push(rest);
            }
            target = data[rest];
        } else {
            target = result || (result = (m[2] ? [] : {}));
        }
        target[m[2] || m[1]] = data[p];
    }
    return result;
};

It currently uses the data input parameter for the table, and puts lots of properties on it - a non-destructive version should be possible as well. Maybe a clever lastIndexOf usage performs better than the regex (depends on the regex engine).

See it in action here.


I wrote two functions to flatten and unflatten a JSON object.


Flatten a JSON object:

var flatten = (function (isArray, wrapped) {
    return function (table) {
        return reduce("", {}, table);
    };

    function reduce(path, accumulator, table) {
        if (isArray(table)) {
            var length = table.length;

            if (length) {
                var index = 0;

                while (index < length) {
                    var property = path + "[" + index + "]", item = table[index++];
                    if (wrapped(item) !== item) accumulator[property] = item;
                    else reduce(property, accumulator, item);
                }
            } else accumulator[path] = table;
        } else {
            var empty = true;

            if (path) {
                for (var property in table) {
                    var item = table[property], property = path + "." + property, empty = false;
                    if (wrapped(item) !== item) accumulator[property] = item;
                    else reduce(property, accumulator, item);
                }
            } else {
                for (var property in table) {
                    var item = table[property], empty = false;
                    if (wrapped(item) !== item) accumulator[property] = item;
                    else reduce(property, accumulator, item);
                }
            }

            if (empty) accumulator[path] = table;
        }

        return accumulator;
    }
}(Array.isArray, Object));

Performance:

  1. It's faster than the current solution in Opera. The current solution is 26% slower in Opera.
  2. It's faster than the current solution in Firefox. The current solution is 9% slower in Firefox.
  3. It's faster than the current solution in Chrome. The current solution is 29% slower in Chrome.

Unflatten a JSON object:

function unflatten(table) {
    var result = {};

    for (var path in table) {
        var cursor = result, length = path.length, property = "", index = 0;

        while (index < length) {
            var char = path.charAt(index);

            if (char === "[") {
                var start = index + 1,
                    end = path.indexOf("]", start),
                    cursor = cursor[property] = cursor[property] || [],
                    property = path.slice(start, end),
                    index = end + 1;
            } else {
                var cursor = cursor[property] = cursor[property] || {},
                    start = char === "." ? index + 1 : index,
                    bracket = path.indexOf("[", start),
                    dot = path.indexOf(".", start);

                if (bracket < 0 && dot < 0) var end = index = length;
                else if (bracket < 0) var end = index = dot;
                else if (dot < 0) var end = index = bracket;
                else var end = index = bracket < dot ? bracket : dot;

                var property = path.slice(start, end);
            }
        }

        cursor[property] = table[path];
    }

    return result[""];
}

Performance:

  1. It's faster than the current solution in Opera. The current solution is 5% slower in Opera.
  2. It's slower than the current solution in Firefox. My solution is 26% slower in Firefox.
  3. It's slower than the current solution in Chrome. My solution is 6% slower in Chrome.

Flatten and unflatten a JSON object:

Overall my solution performs either equally well or even better than the current solution.

Performance:

  1. It's faster than the current solution in Opera. The current solution is 21% slower in Opera.
  2. It's as fast as the current solution in Firefox.
  3. It's faster than the current solution in Firefox. The current solution is 20% slower in Chrome.

Output format:

A flattened object uses the dot notation for object properties and the bracket notation for array indices:

  1. {foo:{bar:false}} => {"foo.bar":false}
  2. {a:[{b:["c","d"]}]} => {"a[0].b[0]":"c","a[0].b[1]":"d"}
  3. [1,[2,[3,4],5],6] => {"[0]":1,"[1][0]":2,"[1][1][0]":3,"[1][1][1]":4,"[1][2]":5,"[2]":6}

In my opinion this format is better than only using the dot notation:

  1. {foo:{bar:false}} => {"foo.bar":false}
  2. {a:[{b:["c","d"]}]} => {"a.0.b.0":"c","a.0.b.1":"d"}
  3. [1,[2,[3,4],5],6] => {"0":1,"1.0":2,"1.1.0":3,"1.1.1":4,"1.2":5,"2":6}

Advantages:

  1. Flattening an object is faster than the current solution.
  2. Flattening and unflattening an object is as fast as or faster than the current solution.
  3. Flattened objects use both the dot notation and the bracket notation for readability.

Disadvantages:

  1. Unflattening an object is slower than the current solution in most (but not all) cases.

The current JSFiddle demo gave the following values as output:

Nested : 132175 : 63
Flattened : 132175 : 564
Nested : 132175 : 54
Flattened : 132175 : 508

My updated JSFiddle demo gave the following values as output:

Nested : 132175 : 59
Flattened : 132175 : 514
Nested : 132175 : 60
Flattened : 132175 : 451

I'm not really sure what that means, so I'll stick with the jsPerf results. After all jsPerf is a performance benchmarking utility. JSFiddle is not.