[json] What is the difference between YAML and JSON?

What are the differences between YAML and JSON, specifically considering the following things?

  • Performance (encode/decode time)
  • Memory consumption
  • Expression clarity
  • Library availability, ease of use (I prefer C)

I was planning to use one of these two in our embedded system to store configure files.

Related:

Should I use YAML or JSON to store my Perl data?

This question is related to json yaml

The answer is


From: Arnaud Lauret Book “The Design of Web APIs.” :

The JSON data format

JSON is a text data format based on how the JavaScript programming language describes data but is, despite its name, completely language-independent (see https://www.json.org/). Using JSON, you can describe objects containing unordered name/value pairs and also arrays or lists containing ordered values, as shown in this figure.

enter image description here

An object is delimited by curly braces ({}). A name is a quoted string ("name") and is sep- arated from its value by a colon (:). A value can be a string like "value", a number like 1.23, a Boolean (true or false), the null value null, an object, or an array. An array is delimited by brackets ([]), and its values are separated by commas (,). The JSON format is easily parsed using any programming language. It is also relatively easy to read and write. It is widely adopted for many uses such as databases, configura- tion files, and, of course, APIs.

YAML

YAML (YAML Ain’t Markup Language) is a human-friendly, data serialization format. Like JSON, YAML (http://yaml.org) is a key/value data format. The figure shows a comparison of the two.

enter image description here

Note the following points:

  • There are no double quotes (" ") around property names and values in YAML.

  • JSON’s structural curly braces ({}) and commas (,) are replaced by newlines and indentation in YAML.

  • Array brackets ([]) and commas (,) are replaced by dashes (-) and newlines in YAML.

  • Unlike JSON, YAML allows comments beginning with a hash mark (#). It is relatively easy to convert one of those formats into the other. Be forewarned though, you will lose comments when converting a YAML document to JSON.


If you don't need any features which YAML has and JSON doesn't, I would prefer JSON because it is very simple and is widely supported (has a lot of libraries in many languages). YAML is more complex and has less support. I don't think the parsing speed or memory use will be very much different, and maybe not a big part of your program's performance.


Sometimes you don't have to decide for one over the other.

In Go, for example, you can have both at the same time:

type Person struct {
    Name string `json:"name" yaml:"name"`
    Age int `json:"age" yaml:"age"`
}

Technically YAML offers a lot more than JSON (YAML v1.2 is a superset of JSON):

  • comments
  • anchors and inheritance - example of 3 identical items:

    item1: &anchor_name
      name: Test
      title: Test title
    item2: *anchor_name
    item3:
      <<: *anchor_name
      # You may add extra stuff.
    
  • ...

Most of the time people will not use those extra features and the main difference is that YAML uses indentation whilst JSON uses brackets. This makes YAML more concise and readable (for the trained eye).

Which one to choose?

  • YAML extra features and concise notation makes it a good choice for configuration files (non-user provided files).
  • JSON limited features, wide support, and faster parsing makes it a great choice for interoperability and user provided data.

GIT and YAML

The other answers are good. Read those first. But I'll add one other reason to use YAML sometimes: git.

Increasingly, many programming projects use git repositories for distribution and archival. And, while a git repo's history can equally store JSON and YAML files, the "diff" method used for tracking and displaying changes to a file is line-oriented. Since YAML is forced to be line-oriented, any small changes in a YAML file are easier to see by a human.

It is true, of course, that JSON files can be "made pretty" by sorting the strings/keys and adding indentation. But this is not the default and I'm lazy.

Personally, I generally use JSON for system-to-system interaction. I often use YAML for config files, static files, and tracked files. (I also generally avoid adding YAML relational anchors. Life is too short to hunt down loops.)

Also, if speed and space are really a concern, I don't use either. You might want to look at BSON.


I find YAML to be easier on the eyes: less parenthesis, "" etc. Although there is the annoyance of tabs in YAML... but one gets the hang of it.

In terms of performance/resources, I wouldn't expect big differences between the two.

Futhermore, we are talking about configuration files and so I wouldn't expect a high frequency of encode/decode activity, no?


This question is 6 years old, but strangely, none of the answers really addresses all four points (speed, memory, expressiveness, portability).

Speed

Obviously this is implementation-dependent, but because JSON is so widely used, and so easy to implement, it has tended to receive greater native support, and hence speed. Considering that YAML does everything that JSON does, plus a truckload more, it's likely that of any comparable implementations of both, the JSON one will be quicker.

However, given that a YAML file can be slightly smaller than its JSON counterpart (due to fewer " and , characters), it's possible that a highly optimised YAML parser might be quicker in exceptional circumstances.

Memory

Basically the same argument applies. It's hard to see why a YAML parser would ever be more memory efficient than a JSON parser, if they're representing the same data structure.

Expressiveness

As noted by others, Python programmers tend towards preferring YAML, JavaScript programmers towards JSON. I'll make these observations:

  • It's easy to memorise the entire syntax of JSON, and hence be very confident about understanding the meaning of any JSON file. YAML is not truly understandable by any human. The number of subtleties and edge cases is extreme.
  • Because few parsers implement the entire spec, it's even harder to be certain about the meaning of a given expression in a given context.
  • The lack of comments in JSON is, in practice, a real pain.

Portability

It's hard to imagine a modern language without a JSON library. It's also hard to imagine a JSON parser implementing anything less than the full spec. YAML has widespread support, but is less ubiquitous than JSON, and each parser implements a different subset. Hence YAML files are less interoperable than you might think.

Summary

JSON is the winner for performance (if relevant) and interoperability. YAML is better for human-maintained files. HJSON is a decent compromise although with much reduced portability. JSON5 is a more reasonable compromise, with well-defined syntax.


Differences:

  1. YAML, depending on how you use it, can be more readable than JSON
  2. JSON is often faster and is probably still interoperable with more systems
  3. It's possible to write a "good enough" JSON parser very quickly
  4. Duplicate keys, which are potentially valid JSON, are definitely invalid YAML.
  5. YAML has a ton of features, including comments and relational anchors. YAML syntax is accordingly quite complex, and can be hard to understand.
  6. It is possible to write recursive structures in yaml: {a: &b [*b]}, which will loop infinitely in some converters. Even with circular detection, a "yaml bomb" is still possible (see xml bomb).
  7. Because there are no references, it is impossible to serialize complex structures with object references in JSON. YAML serialization can therefore be more efficient.
  8. In some coding environments, the use of YAML can allow an attacker to execute arbitrary code.

Observations:

  1. Python programmers are generally big fans of YAML, because of the use of indentation, rather than bracketed syntax, to indicate levels.
  2. Many programmers consider the attachment of "meaning" to indentation a poor choice.
  3. If the data format will be leaving an application's environment, parsed within a UI, or sent in a messaging layer, JSON might be a better choice.
  4. YAML can be used, directly, for complex tasks like grammar definitions, and is often a better choice than inventing a new language.

Benchmark results

Below are the results of a benchmark to compare YAML vs JSON loading times, on Python and Perl

JSON is much faster, at the expense of some readability, and features such as comments

Test method

Results

Python 3.8.3 timeit
    JSON:            0.108
    YAML CLoader:    3.684
    YAML:           29.763

Perl 5.26.2-043 Benchmark::cmpthese
    JSON XS:         0.107
    YAML XS:         0.574
    YAML Syck:       1.050

Perl 5.26.2-043 Dumbbench (Brian D Foy, excludes outliers)
    JSON XS:         0.102
    YAML XS:         0.514
    YAML Syck:       1.027

Since this question now features prominently when searching for YAML and JSON, it's worth noting one rarely-cited difference between the two: license. JSON purports to have a license which JSON users must adhere to (including the legally-ambiguous "shall be used for Good, not Evil"). YAML carries no such license claim, and that might be an important difference (to your lawyer, if not to you).


Bypassing esoteric theory

This answers the title, not the details as most just read the title from a search result on google like me so I felt it was necessary to explain from a web developer perspective.

  1. YAML uses space indentation, which is familiar territory for Python developers.
  2. JavaScript developers love JSON because it is a subset of JavaScript and can be directly interpreted and written inside JavaScript, along with using a shorthand way to declare JSON, requiring no double quotes in keys when using typical variable names without spaces.
  3. There are a plethora of parsers that work very well in all languages for both YAML and JSON.
  4. YAML's space format can be much easier to look at in many cases because the formatting requires a more human-readable approach.
  5. YAML's form while being more compact and easier to look at can be deceptively difficult to hand edit if you don't have space formatting visible in your editor. Tabs are not spaces so that further confuses if you don't have an editor to interpret your keystrokes into spaces.
  6. JSON is much faster to serialize and deserialize because of significantly less features than YAML to check for, which enables smaller and lighter code to process JSON.
  7. A common misconception is that YAML needs less punctuation and is more compact than JSON but this is completely false. Whitespace is invisible so it seems like there are less characters, but if you count the actual whitespace which is necessary to be there for YAML to be interpreted properly along with proper indentation, you will find YAML actually requires more characters than JSON. JSON doesn't use whitespace to represent hierarchy or grouping and can be easily flattened with unnecessary whitespace removed for more compact transport.

The Elephant in the room: The Internet itself

JavaScript so clearly dominates the web by a huge margin and JavaScript developers prefer using JSON as the data format overwhelmingly along with popular web APIs so it becomes difficult to argue using YAML over JSON when doing web programming in the general sense as you will likely be outvoted in a team environment. In fact, the majority of web programmers aren't even aware YAML exists, let alone consider using it.

If you are doing any web programming, JSON is the default way to go because no translation step is needed when working with JavaScript so then you must come up with a better argument to use YAML over JSON in that case.


I find both YAML and JSON to be very effective. The only two things that really dictate when one is used over the other for me is one, what the language is used most popularly with. For example, if I'm using Java, Javascript, I'll use JSON. For Java, I'll use their own objects, which are pretty much JSON but lacking in some features, and convert it to JSON if I need to or make it in JSON in the first place. I do that because that's a common thing in Java and makes it easier for other Java developers to modify my code. The second thing is whether I'm using it for the program to remember attributes, or if the program is receiving instructions in the form of a config file, in this case I'll use YAML, because it's very easily human read, has nice looking syntax, and is very easy to modify, even if you have no idea how YAML works. Then, the program will read it and convert it to JSON, or whatever is preferred for that language.

In the end, it honestly doesn't matter. Both JSON and YAML are easily read by any experienced programmer.