[debugging] How can I inspect the file system of a failed `docker build`?

I'm trying to build a new Docker image for our development process, using cpanm to install a bunch of Perl modules as a base image for various projects.

While developing the Dockerfile, cpanm returns a failure code because some of the modules did not install cleanly.

I'm fairly sure I need to get apt to install some more things.

My question is, where can I find the /.cpanm/work directory quoted in the output, in order to inspect the logs? In the general case, how can I inspect the file system of a failed docker build command?

Morning edit After biting the bullet and running a find I discovered

/var/lib/docker/aufs/diff/3afa404e[...]/.cpanm

Is this reliable, or am I better off building a "bare" container and running stuff manually until I have all the things I need?

This question is related to debugging docker cpanm

The answer is


The top answer works in the case that you want to examine the state immediately prior to the failed command.

However, the question asks how to examine the state of the failed container itself. In my situation, the failed command is a build that takes several hours, so rewinding prior to the failed command and running it again takes a long time and is not very helpful.

The solution here is to find the container that failed:

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                          PORTS               NAMES
6934ada98de6        42e0228751b3        "/bin/sh -c './utils/"   24 minutes ago      Exited (1) About a minute ago                       sleepy_bell

Commit it to an image:

$ docker commit 6934ada98de6
sha256:7015687976a478e0e94b60fa496d319cdf4ec847bcd612aecf869a72336e6b83

And then run the image [if necessary, running bash]:

$ docker run -it 7015687976a4 [bash -il]

Now you are actually looking at the state of the build at the time that it failed, instead of at the time before running the command that caused the failure.


Debugging build step failures is indeed very annoying.

The best solution I have found is to make sure that each step that does real work succeeds, and adding a check after those that fails. That way you get a committed layer that contains the outputs of the failed step that you can inspect.

A Dockerfile, with an example after the # Run DB2 silent installer line:

#
# DB2 10.5 Client Dockerfile (Part 1)
#
# Requires
#   - DB2 10.5 Client for 64bit Linux ibm_data_server_runtime_client_linuxx64_v10.5.tar.gz
#   - Response file for DB2 10.5 Client for 64bit Linux db2rtcl_nr.rsp 
#
#
# Using Ubuntu 14.04 base image as the starting point.
FROM ubuntu:14.04

MAINTAINER David Carew <[email protected]>

# DB2 prereqs (also installing sharutils package as we use the utility uuencode to generate password - all others are required for the DB2 Client) 
RUN dpkg --add-architecture i386 && apt-get update && apt-get install -y sharutils binutils libstdc++6:i386 libpam0g:i386 && ln -s /lib/i386-linux-gnu/libpam.so.0 /lib/libpam.so.0
RUN apt-get install -y libxml2


# Create user db2clnt
# Generate strong random password and allow sudo to root w/o password
#
RUN  \
   adduser --quiet --disabled-password -shell /bin/bash -home /home/db2clnt --gecos "DB2 Client" db2clnt && \
   echo db2clnt:`dd if=/dev/urandom bs=16 count=1 2>/dev/null | uuencode -| head -n 2 | grep -v begin | cut -b 2-10` | chgpasswd && \
   adduser db2clnt sudo && \
   echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers

# Install DB2
RUN mkdir /install
# Copy DB2 tarball - ADD command will expand it automatically
ADD v10.5fp9_linuxx64_rtcl.tar.gz /install/
# Copy response file
COPY  db2rtcl_nr.rsp /install/
# Run  DB2 silent installer
RUN mkdir /logs
RUN (/install/rtcl/db2setup -t /logs/trace -l /logs/log -u /install/db2rtcl_nr.rsp && touch /install/done) || /bin/true
RUN test -f /install/done || (echo ERROR-------; echo install failed, see files in container /logs directory of the last container layer; echo run docker run '<last image id>' /bin/cat /logs/trace; echo ----------)
RUN test -f /install/done

# Clean up unwanted files
RUN rm -fr /install/rtcl

# Login as db2clnt user
CMD su - db2clnt

What I would do is comment out the Dockerfile below and including the offending line. Then you can run the container and run the docker commands by hand, and look at the logs in the usual way. E.g. if the Dockerfile is

RUN foo
RUN bar
RUN baz

and it's dying at bar I would do

RUN foo
# RUN bar
# RUN baz

Then

$ docker build -t foo .
$ docker run -it foo bash
container# bar
...grep logs...

Docker caches the entire filesystem state after each successful RUN line.

Knowing that:

  • to examine the latest state before your failing RUN command, comment it out in the Dockerfile (as well as any and all subsequent RUN commands), then run docker build and docker run again.
  • to examine the state after the failing RUN command, simply add || true to it to force it to succeed; then proceed like above (keep any and all subsequent RUN commands commented out, run docker build and docker run)

Tada, no need to mess with Docker internals or layer IDs, and as a bonus Docker automatically minimizes the amount of work that needs to be re-done.


Examples related to debugging

How do I enable logging for Spring Security? How to run or debug php on Visual Studio Code (VSCode) How do you debug React Native? How do I debug "Error: spawn ENOENT" on node.js? How can I inspect the file system of a failed `docker build`? Swift: print() vs println() vs NSLog() JavaScript console.log causes error: "Synchronous XMLHttpRequest on the main thread is deprecated..." How to debug Spring Boot application with Eclipse? Unfortunately MyApp has stopped. How can I solve this? 500 internal server error, how to debug

Examples related to docker

standard_init_linux.go:190: exec user process caused "no such file or directory" - Docker What is the point of WORKDIR on Dockerfile? E: gnupg, gnupg2 and gnupg1 do not seem to be installed, but one of them is required for this operation How do I add a user when I'm using Alpine as a base image? docker: Error response from daemon: Get https://registry-1.docker.io/v2/: Service Unavailable. IN DOCKER , MAC How to fix docker: Got permission denied issue pull access denied repository does not exist or may require docker login Docker error: invalid reference format: repository name must be lowercase Docker: "no matching manifest for windows/amd64 in the manifest list entries" OCI runtime exec failed: exec failed: (...) executable file not found in $PATH": unknown

Examples related to cpanm

How can I inspect the file system of a failed `docker build`?