Finding the layers and layer sizes for each Docker image

For research purposes I'm trying to crawl the public Docker registry ( https://registry.hub.docker.com/ ) and find out 1) how many layers an average image has and 2) the sizes of these layers to get an idea of the distribution.

However, I have studied the API, the public libraries, and the details on GitHub, but I can't find any method to:

  • retrieve all the public repositories/images (even if those are thousands I still need a starting list to iterate through)
  • find all the layers of an image
  • find the size for a layer (so not an image but for the individual layer).

Can anyone help me find a way to retrieve this information?

Thank you!

EDIT: Can anyone verify that searching for '*' in the Docker registry returns all repositories, and not just those that mention '*' somewhere? https://registry.hub.docker.com/search?q=*


The answer is


You can find the layers of an image in the /var/lib/docker/aufs/layers directory, provided the storage driver is configured as aufs (the default option).

Example:

 docker ps -a
 CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                      PORTS               NAMES
 0ca502fa6aae        ubuntu              "/bin/bash"         44 minutes ago      Exited (0) 44 seconds ago                       DockerTest

Now, to view the layers of the container that was created from the "ubuntu" image, go to the /var/lib/docker/aufs/layers directory and cat the file whose name starts with the container ID (here, 0ca502fa6aae*):

 root@viswesn-vm2:/var/lib/docker/aufs/layers# cat 0ca502fa6aaefc89f690736609b54b2f0fdebfe8452902ca383020e3b0d266f9-init
 d2a0ecffe6fa4ef3de9646a75cc629bbd9da7eead7f767cb810f9808d6b3ecb6
 29460ac934423a55802fcad24856827050697b4a9f33550bd93c82762fb6db8f
 b670fb0c7ecd3d2c401fbfd1fa4d7a872fbada0a4b8c2516d0be18911c6b25d6
 83e4dde6b9cfddf46b75a07ec8d65ad87a748b98cf27de7d5b3298c1f3455ae4

The same layers are shown by running:

root@viswesn-vm2:/var/lib/docker/aufs/layers# docker history ubuntu
IMAGE               CREATED             CREATED BY                                         SIZE                COMMENT
d2a0ecffe6fa        13 days ago         /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B                 
29460ac93442        13 days ago         /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB            
b670fb0c7ecd        13 days ago         /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB            
83e4dde6b9cf        13 days ago         /bin/sh -c #(nop) ADD file:c8f078961a543cdefa   188.2 MB 

To view the full layer IDs, add the --no-trunc option to the history command:

docker history --no-trunc ubuntu

You can first find the image ID using:

$ docker images -a

Then find the image's layers and their sizes:

$ docker history --no-trunc <Image ID>

Note: I'm using Docker version 1.13.1

$ docker -v
Docker version 1.13.1, build 092cba3

One more tool: https://github.com/CenturyLinkLabs/dockerfile-from-image

For a GUI, use ImageLayers.io.


  1. https://hub.docker.com/search?q=* shows all the images in the entire Docker Hub. It's not possible to get this list via the search command, as it doesn't accept wildcards.

  2. As of v1.10 you can find all the layers in an image by pulling it and using these commands:

    docker pull ubuntu
    ID=$(sudo docker inspect -f '{{.Id}}' ubuntu)
    jq .rootfs.diff_ids "/var/lib/docker/image/aufs/imagedb/content/$(echo "$ID" | tr ':' '/')"
    

  3. The size can be found in /var/lib/docker/image/aufs/layerdb/sha256/{LAYERID}/size, although LAYERID is not the same as the diff_ids returned by the previous command. To match each diff_id to its size, look at /var/lib/docker/image/aufs/layerdb/sha256/{LAYERID}/diff and compare its contents with the previous command's output.
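
The matching described in step 3 can be automated. The following is a minimal Python sketch, assuming each directory under layerdb/sha256 holds a `diff` file (containing the diff_id) and a `size` file (containing the size in bytes), as described above. The function names `layer_sizes_by_diff_id` and `sizes_for_image` are hypothetical helpers, not part of Docker.

```python
import os


def layer_sizes_by_diff_id(layerdb_dir):
    """Build a {diff_id: size_in_bytes} map from a layerdb/sha256 directory.

    Assumes every layer directory contains a 'diff' file holding the diff_id
    and a 'size' file holding the layer size in bytes.
    """
    sizes = {}
    for entry in sorted(os.listdir(layerdb_dir)):
        diff_path = os.path.join(layerdb_dir, entry, "diff")
        size_path = os.path.join(layerdb_dir, entry, "size")
        # Skip entries that do not look like complete layer directories.
        if not (os.path.isfile(diff_path) and os.path.isfile(size_path)):
            continue
        with open(diff_path) as f:
            diff_id = f.read().strip()
        with open(size_path) as f:
            sizes[diff_id] = int(f.read().strip())
    return sizes


def sizes_for_image(diff_ids, layerdb_dir):
    """Pair each diff_id of an image with its size (None if not found)."""
    sizes = layer_sizes_by_diff_id(layerdb_dir)
    return [(diff_id, sizes.get(diff_id)) for diff_id in diff_ids]
```

Here `diff_ids` would be the list printed by the jq command in step 2, and `layerdb_dir` would be /var/lib/docker/image/aufs/layerdb/sha256 (reading it usually requires root).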


It is indeed possible to query the manifest or blob info from a Docker registry server without pulling the image to local disk.

You can use the Registry v2 API to fetch the manifest of an image.

GET /v2/<name>/manifests/<reference>

Note that you have to handle different manifest versions. With a v2 manifest you get the size and blob digest of each layer directly. With a v1 manifest, you can send a HEAD request to the blob download URL to get the actual layer size.

There is a simple script that handles the cases above and will be continuously maintained.
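
To illustrate the two manifest cases, here is a minimal Python sketch (not the script mentioned above). It assumes the manifest has already been fetched and decoded from JSON into a dict, that a schema 2 manifest lists its layers under `layers` with `digest` and `size` fields, and that a schema 1 manifest lists them under `fsLayers` with a `blobSum` field. The function name `layers_from_manifest` is hypothetical.

```python
def layers_from_manifest(manifest):
    """Return a list of (digest, size) pairs for an image manifest dict.

    For a schema 1 manifest the size is None, because that format does not
    carry layer sizes; the caller would then HEAD the blob URL
    (/v2/<name>/blobs/<digest>) and read Content-Length.
    """
    version = manifest.get("schemaVersion")
    if version == 2 and "layers" in manifest:
        return [(layer["digest"], layer.get("size")) for layer in manifest["layers"]]
    if version == 1:
        return [(layer["blobSum"], None) for layer in manifest.get("fsLayers", [])]
    # Manifest lists (multi-arch indexes) and unknown formats end up here.
    raise ValueError("unsupported manifest format")
```

A crawler could call this once per repository and tag, collecting the sizes without downloading any layer data.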


This will inspect the docker image and print the layers:

$ docker image inspect nginx -f '{{.RootFS.Layers}}'
[sha256:d626a8ad97a1f9c1f2c4db3814751ada64f60aed927764a3f994fcd88363b659 sha256:82b81d779f8352b20e52295afc6d0eab7e61c0ec7af96d85b8cda7800285d97d sha256:7ab428981537aa7d0c79bc1acbf208c71e57d9678f7deca4267cc03fba26b9c8]
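
To answer the "how many layers does an average image have" part for locally pulled images, the same inspect data can be processed in bulk. This is a rough Python sketch, assuming `docker image inspect` prints a JSON array whose objects have an `Id` and a `RootFS.Layers` list (as the template above suggests). The helper names are hypothetical.

```python
import json
import subprocess
from statistics import mean


def inspect_images(names):
    """Run `docker image inspect` for the given image names and parse the JSON."""
    result = subprocess.run(
        ["docker", "image", "inspect", *names],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def layer_counts(inspected):
    """Map each image ID to its number of layers."""
    return {image["Id"]: len(image["RootFS"]["Layers"]) for image in inspected}


def average_layer_count(inspected):
    """Average number of layers per image (0.0 for an empty list)."""
    counts = list(layer_counts(inspected).values())
    return mean(counts) if counts else 0.0
```

For example, `average_layer_count(inspect_images(["nginx", "ubuntu"]))` would average over those two images, provided both have been pulled.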

I've solved this problem by using the search function on Docker's website, where '*' is a valid search that returns 200k repositories, and then crawling each individual page. HTML parsing allows me to extract all the image names on each page.


There is a very good answer here: https://stackoverflow.com/a/32455275/165865

Just run the image below:

docker run --rm -v /var/run/docker.sock:/var/run/docker.sock nate/dockviz images -t

Not exactly the original question, but to find the total size of all the images without double-counting shared layers, the following is useful (Ubuntu 18):

sudo du -h -d1  /var/lib/docker/overlay2 | sort -h

Check out dive, written in Go.

Awesome tool!


In my opinion, docker history <image> is sufficient. This returns the size of each layer:

$ docker history jenkinsci-jnlp-slave:2019-1-9c
IMAGE        CREATED    CREATED BY                                    SIZE  COMMENT
93f48953d298 42 min ago /bin/sh -c #(nop)  USER jenkins               0B
6305b07d4650 42 min ago /bin/sh -c chown jenkins:jenkins -R /home/je… 1.45GB
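
To build a size distribution from this output, the human-readable SIZE column has to be converted back to bytes. The following is a minimal Python sketch, assuming the sizes use decimal units (1 kB = 1000 bytes) and appear in the forms shown in this thread, such as `0 B`, `1.895 kB`, `188.2 MB` or `1.45GB`. The function name `parse_size` is hypothetical.

```python
import re

# Assumption: docker history prints decimal (base-1000) units.
_UNITS = {"B": 1, "kB": 10**3, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
_SIZE_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([kKMGT]?B)\s*$")


def parse_size(text):
    """Convert a size string such as '188.2 MB' or '1.45GB' to bytes."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError("unrecognized size: %r" % (text,))
    value, unit = match.groups()
    return int(round(float(value) * _UNITS[unit]))


# Sizes taken from the `docker history ubuntu` output earlier in this thread.
ubuntu_layer_sizes = [parse_size(s) for s in ["0 B", "1.895 kB", "194.5 kB", "188.2 MB"]]
```

Because the printed values are rounded, the results are approximations of the real layer sizes.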
