[yaml] How can I include a YAML file inside another?

So I have two YAML files, "A" and "B" and I want the contents of A to be inserted inside B, either spliced into the existing data structure, like an array, or as a child of an element, like the value for a certain hash key.

Is this possible at all? How? If not, any pointers to a normative reference?

This question is related to yaml transclusion

The answer is


I think the solution used by @maxy-B looks great. However, it didn't succeed for me with nested inclusions. For example if config_1.yaml includes config_2.yaml, which includes config_3.yaml there was a problem with the loader. However, if you simply point the new loader class to itself on load, it works! Specifically, if we replace the old _include function with the very slightly modified version:

def _include(self, loader, node):                                    
     oldRoot = self.root                                              
     filename = os.path.join(self.root, loader.construct_scalar(node))
     self.root = os.path.dirname(filename)                           
     data = yaml.load(open(filename, 'r'), loader = IncludeLoader)                            
     self.root = oldRoot                                              
     return data

Upon reflection I agree with the other comments, that nested loading is not appropriate for yaml in general as the input stream may not be a file, but it is very useful!


For Python users, you can try pyyaml-include.

Install

pip install pyyaml-include

Usage

import yaml
from yamlinclude import YamlIncludeConstructor

YamlIncludeConstructor.add_to_loader_class(loader_class=yaml.FullLoader, base_dir='/your/conf/dir')

with open('0.yaml') as f:
    data = yaml.load(f, Loader=yaml.FullLoader)

print(data)

Consider we have such YAML files:

+-- 0.yaml
+-- include.d
    +-- 1.yaml
    +-- 2.yaml
  • 1.yaml 's content:
name: "1"
  • 2.yaml 's content:
name: "2"

Include files by name

  • On top level:

    If 0.yaml was:

!include include.d/1.yaml

We'll get:

{"name": "1"}
  • In mapping:

    If 0.yaml was:

file1: !include include.d/1.yaml
file2: !include include.d/2.yaml

We'll get:

  file1:
    name: "1"
  file2:
    name: "2"
  • In sequence:

    If 0.yaml was:

files:
  - !include include.d/1.yaml
  - !include include.d/2.yaml

We'll get:

files:
  - name: "1"
  - name: "2"

? Note:

File name can be either absolute (like /usr/conf/1.5/Make.yml) or relative (like ../../cfg/img.yml).

Include files by wildcards

File name can contain shell-style wildcards. Data loaded from the file(s) found by wildcards will be set in a sequence.

If 0.yaml was:

files: !include include.d/*.yaml

We'll get:

files:
  - name: "1"
  - name: "2"

? Note:

  • For Python>=3.5, if recursive argument of !include YAML tag is true, the pattern “**” will match any files and zero or more directories and subdirectories.
  • Using the “**” pattern in large directory trees may consume an inordinate amount of time because of recursive search.

In order to enable recursive argument, we shall write the !include tag in Mapping or Sequence mode:

  • Arguments in Sequence mode:
!include [tests/data/include.d/**/*.yaml, true]
  • Arguments in Mapping mode:
!include {pathname: tests/data/include.d/**/*.yaml, recursive: true}

Probably it was not supported when question was asked but you can import other YAML file into one:

imports: [/your_location_to_yaml_file/Util.area.yaml]

Though I don't have any online reference but this works for me.


Includes are not directly supported in YAML as far as I know, you will have to provide a mechanism yourself however, this is generally easy to do.

I have used YAML as a configuration language in my python apps, and in this case often define a convention like this:

>>> main.yml <<<
includes: [ wibble.yml, wobble.yml]

Then in my (python) code I do:

import yaml
cfg = yaml.load(open("main.yml"))
for inc in cfg.get("includes", []):
   cfg.update(yaml.load(open(inc)))

The only down side is that variables in the includes will always override the variables in main, and there is no way to change that precedence by changing where the "includes: statement appears in the main.yml file.

On a slightly different point, YAML doesn't support includes as its not really designed as as exclusively as a file based mark up. What would an include mean if you got it in a response to an AJAX request?


Unfortunately YAML doesn't provide this in its standard.

But if you are using Ruby, there is a gem providing the functionality you are asking for by extending the ruby YAML library: https://github.com/entwanderer/yaml_extend


With Symfony, its handling of yaml will indirectly allow you to nest yaml files. The trick is to make use of the parameters option. eg:

common.yml

parameters:
    yaml_to_repeat:
        option: "value"
        foo:
            - "bar"
            - "baz"

config.yml

imports:
    - { resource: common.yml }
whatever:
    thing: "%yaml_to_repeat%"
    other_thing: "%yaml_to_repeat%"

The result will be the same as:

whatever:
    thing:
        option: "value"
        foo:
            - "bar"
            - "baz"
    other_thing:
        option: "value"
        foo:
            - "bar"
            - "baz"

I make some examples for your reference.

import yaml

main_yaml = """
Package:
 - !include _shape_yaml    
 - !include _path_yaml
"""

_shape_yaml = """
# Define
Rectangle: &id_Rectangle
    name: Rectangle
    width: &Rectangle_width 20
    height: &Rectangle_height 10
    area: !product [*Rectangle_width, *Rectangle_height]

Circle: &id_Circle
    name: Circle
    radius: &Circle_radius 5
    area: !product [*Circle_radius, *Circle_radius, pi]

# Setting
Shape:
    property: *id_Rectangle
    color: red
"""

_path_yaml = """
# Define
Root: &BASE /path/src/

Paths: 
    a: &id_path_a !join [*BASE, a]
    b: &id_path_b !join [*BASE, b]

# Setting
Path:
    input_file: *id_path_a
"""


# define custom tag handler
def yaml_import(loader, node):
    other_yaml_file = loader.construct_scalar(node)
    return yaml.load(eval(other_yaml_file), Loader=yaml.SafeLoader)


def yaml_product(loader, node):
    import math
    list_data = loader.construct_sequence(node)
    result = 1
    pi = math.pi
    for val in list_data:
        result *= eval(val) if isinstance(val, str) else val
    return result


def yaml_join(loader, node):
    seq = loader.construct_sequence(node)
    return ''.join([str(i) for i in seq])


def yaml_ref(loader, node):
    ref = loader.construct_sequence(node)
    return ref[0]


def yaml_dict_ref(loader: yaml.loader.SafeLoader, node):
    dict_data, key, const_value = loader.construct_sequence(node)
    return dict_data[key] + str(const_value)


def main():
    # register the tag handler
    yaml.SafeLoader.add_constructor(tag='!include', constructor=yaml_import)
    yaml.SafeLoader.add_constructor(tag='!product', constructor=yaml_product)
    yaml.SafeLoader.add_constructor(tag='!join', constructor=yaml_join)
    yaml.SafeLoader.add_constructor(tag='!ref', constructor=yaml_ref)
    yaml.SafeLoader.add_constructor(tag='!dict_ref', constructor=yaml_dict_ref)

    config = yaml.load(main_yaml, Loader=yaml.SafeLoader)

    pk_shape, pk_path = config['Package']
    pk_shape, pk_path = pk_shape['Shape'], pk_path['Path']
    print(f"shape name: {pk_shape['property']['name']}")
    print(f"shape area: {pk_shape['property']['area']}")
    print(f"shape color: {pk_shape['color']}")

    print(f"input file: {pk_path['input_file']}")


if __name__ == '__main__':
    main()

output

shape name: Rectangle
shape area: 200
shape color: red
input file: /path/src/a

Update 2

and you can combine it, like this

# xxx.yaml
CREATE_FONT_PICTURE:
  PROJECTS:
    SUNG: &id_SUNG
      name: SUNG
      work_dir: SUNG
      output_dir: temp
      font_pixel: 24


  DEFINE: &id_define !ref [*id_SUNG]  # you can use config['CREATE_FONT_PICTURE']['DEFINE'][name, work_dir, ... font_pixel]
  AUTO_INIT:
    basename_suffix: !dict_ref [*id_define, name, !product [5, 3, 2]]  # SUNG30

# ? This is not correct.
# basename_suffix: !dict_ref [*id_define, name, !product [5, 3, 2]]  # It will build by Deep-level. id_define is Deep-level: 2. So you must put it after 2. otherwise, it can't refer to the correct value.

The YML standard does not specify a way to do this. And this problem does not limit itself to YML. JSON has the same limitations.

Many applications which use YML or JSON based configurations run into this problem eventually. And when that happens, they make up their own convention.

e.g. for swagger API definitions:

$ref: 'file.yml'

e.g. for docker compose configurations:

services:
  app:
    extends:
      file: docker-compose.base.yml

Alternatively, if you want to split up the content of a yml file in multiple files, like a tree of content, you can define your own folder-structure convention and use an (existing) merge script.


Maybe this could inspire you, try to align to jbb conventions:

https://docs.openstack.org/infra/jenkins-job-builder/definition.html#inclusion-tags

- job: name: test-job-include-raw-1 builders: - shell: !include-raw: include-raw001-hello-world.sh


Your question does not ask for a Python solution, but here is one using PyYAML.

PyYAML allows you to attach custom constructors (such as !include) to the YAML loader. I've included a root directory that can be set so that this solution supports relative and absolute file references.

Class-Based Solution

Here is a class-based solution, that avoids the global root variable of my original response.

See this gist for a similar, more robust Python 3 solution that uses a metaclass to register the custom constructor.

import yaml
import os

class Loader(yaml.SafeLoader):

    def __init__(self, stream):

        self._root = os.path.split(stream.name)[0]

        super(Loader, self).__init__(stream)

    def include(self, node):

        filename = os.path.join(self._root, self.construct_scalar(node))

        with open(filename, 'r') as f:
            return yaml.load(f, Loader)

Loader.add_constructor('!include', Loader.include)

An example:

foo.yaml

a: 1
b:
    - 1.43
    - 543.55
c: !include bar.yaml

bar.yaml

- 3.6
- [1, 2, 3]

Now the files can be loaded using:

>>> with open('foo.yaml', 'r') as f:
>>>    data = yaml.load(f, Loader)
>>> data
{'a': 1, 'b': [1.43, 543.55], 'c': [3.6, [1, 2, 3]]}

Standard YAML 1.2 doesn't include natively this feature. Nevertheless many implementations provides some extension to do so.

I present a way of achieving it with Java and snakeyaml:1.24 (Java library to parse/emit YAML files) that allows creating a custom YAML tag to achieve the following goal (you will see I'm using it to load test suites defined in several YAML files and that I made it work as a list of includes for a target test: node):

# ... yaml prev stuff

tests: !include
  - '1.hello-test-suite.yaml'
  - '3.foo-test-suite.yaml'
  - '2.bar-test-suite.yaml'

# ... more yaml document

Here is the one-class Java that allows processing the !include tag. Files are loaded from classpath (Maven resources directory):

/**
 * Custom YAML loader. It adds support to the custom !include tag which allows splitting a YAML file across several
 * files for a better organization of YAML tests.
 */
@Slf4j   // <-- This is a Lombok annotation to auto-generate logger
public class MyYamlLoader {

    private static final Constructor CUSTOM_CONSTRUCTOR = new MyYamlConstructor();

    private MyYamlLoader() {
    }

    /**
     * Parse the only YAML document in a stream and produce the Java Map. It provides support for the custom !include
     * YAML tag to split YAML contents across several files.
     */
    public static Map<String, Object> load(InputStream inputStream) {
        return new Yaml(CUSTOM_CONSTRUCTOR)
                .load(inputStream);
    }


    /**
     * Custom SnakeYAML constructor that registers custom tags.
     */
    private static class MyYamlConstructor extends Constructor {

        private static final String TAG_INCLUDE = "!include";

        MyYamlConstructor() {
            // Register custom tags
            yamlConstructors.put(new Tag(TAG_INCLUDE), new IncludeConstruct());
        }

        /**
         * The actual include tag construct.
         */
        private static class IncludeConstruct implements Construct {

            @Override
            public Object construct(Node node) {
                List<Node> inclusions = castToSequenceNode(node);
                return parseInclusions(inclusions);
            }

            @Override
            public void construct2ndStep(Node node, Object object) {
                // do nothing
            }

            private List<Node> castToSequenceNode(Node node) {
                try {
                    return ((SequenceNode) node).getValue();

                } catch (ClassCastException e) {
                    throw new IllegalArgumentException(String.format("The !import value must be a sequence node, but " +
                            "'%s' found.", node));
                }
            }

            private Object parseInclusions(List<Node> inclusions) {

                List<InputStream> inputStreams = inputStreams(inclusions);

                try (final SequenceInputStream sequencedInputStream =
                             new SequenceInputStream(Collections.enumeration(inputStreams))) {

                    return new Yaml(CUSTOM_CONSTRUCTOR)
                            .load(sequencedInputStream);

                } catch (IOException e) {
                    log.error("Error closing the stream.", e);
                    return null;
                }
            }

            private List<InputStream> inputStreams(List<Node> scalarNodes) {
                return scalarNodes.stream()
                        .map(this::inputStream)
                        .collect(toList());
            }

            private InputStream inputStream(Node scalarNode) {
                String filePath = castToScalarNode(scalarNode).getValue();
                final InputStream is = getClass().getClassLoader().getResourceAsStream(filePath);
                Assert.notNull(is, String.format("Resource file %s not found.", filePath));
                return is;
            }

            private ScalarNode castToScalarNode(Node scalarNode) {
                try {
                    return ((ScalarNode) scalarNode);

                } catch (ClassCastException e) {
                    throw new IllegalArgumentException(String.format("The value must be a scalar node, but '%s' found" +
                            ".", scalarNode));
                }
            }
        }

    }

}

Expanding on @Josh_Bode's answer, here's my own PyYAML solution, which has the advantage of being a self-contained subclass of yaml.Loader. It doesn't depend on any module-level globals, or on modifying the global state of the yaml module.

import yaml, os

class IncludeLoader(yaml.Loader):                                                 
    """                                                                           
    yaml.Loader subclass handles "!include path/to/foo.yml" directives in config  
    files.  When constructed with a file object, the root path for includes       
    defaults to the directory containing the file, otherwise to the current       
    working directory. In either case, the root path can be overridden by the     
    `root` keyword argument.                                                      

    When an included file F contain its own !include directive, the path is       
    relative to F's location.                                                     

    Example:                                                                      
        YAML file /home/frodo/one-ring.yml:                                       
            ---                                                                   
            Name: The One Ring                                                    
            Specials:                                                             
                - resize-to-wearer                                                
            Effects: 
                - !include path/to/invisibility.yml                            

        YAML file /home/frodo/path/to/invisibility.yml:                           
            ---                                                                   
            Name: invisibility                                                    
            Message: Suddenly you disappear!                                      

        Loading:                                                                  
            data = IncludeLoader(open('/home/frodo/one-ring.yml', 'r')).get_data()

        Result:                                                                   
            {'Effects': [{'Message': 'Suddenly you disappear!', 'Name':            
                'invisibility'}], 'Name': 'The One Ring', 'Specials':              
                ['resize-to-wearer']}                                             
    """                                                                           
    def __init__(self, *args, **kwargs):                                          
        super(IncludeLoader, self).__init__(*args, **kwargs)                      
        self.add_constructor('!include', self._include)                           
        if 'root' in kwargs:                                                      
            self.root = kwargs['root']                                            
        elif isinstance(self.stream, file):                                       
            self.root = os.path.dirname(self.stream.name)                         
        else:                                                                     
            self.root = os.path.curdir                                            

    def _include(self, loader, node):                                    
        oldRoot = self.root                                              
        filename = os.path.join(self.root, loader.construct_scalar(node))
        self.root = os.path.dirname(filename)                           
        data = yaml.load(open(filename, 'r'))                            
        self.root = oldRoot                                              
        return data                                                      

If you're using Symfony's version of YAML, this is possible, like this:

imports:
    - { resource: sub-directory/file.yml }
    - { resource: sub-directory/another-file.yml }

With Yglu, you can import other files like this:

A.yaml

foo: !? $import('B.yaml')

B.yaml

bar: Hello
$ yglu A.yaml
foo:
  bar: Hello

As $import is a function, you can also pass an expression as argument:

  dep: !- b
  foo: !? $import($_.dep.toUpper() + '.yaml')

This would give the same output as above.

Disclaimer: I am the author of Yglu.