Understanding VOLUME instruction in DockerFile

Question

Below is the content of my Dockerfile:

    FROM node:boron

    # Create app directory
    RUN mkdir -p /usr/src/app

    # change working dir to /usr/src/app
    WORKDIR /usr/src/app

    VOLUME . /usr/src/app

    RUN npm install

    EXPOSE 8080

    CMD ["node", "server"]

In this file I am expecting the "VOLUME . /usr/src/app" instruction to mount the contents of the present working directory on the host onto the /usr/src/app folder of the container.

Please let me know if this is the correct way.

User · Accepted Answer

The official Docker tutorial says:

    A data volume is a specially-designated directory within one or more containers that bypasses the Union File System. Data volumes provide several useful features for persistent or shared data:

    - Volumes are initialized when a container is created. If the container's base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization. (Note that this does not apply when mounting a host directory.)
    - Data volumes can be shared and reused among containers.
    - Changes to a data volume are made directly.
    - Changes to a data volume will not be included when you update an image.
    - Data volumes persist even if the container itself is deleted.

In a Dockerfile you can specify only the destination of a volume inside a container, e.g. /usr/src/app.

When you run a container, e.g. docker run --volume=/opt:/usr/src/app my_image, you may, but do not have to, specify its mounting point (/opt) on the host machine. If you do not specify the --volume argument, the mount point will be chosen automatically, usually under /var/lib/docker/volumes/.
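Applied to the question, one possible sketch drops the VOLUME line entirely, copies the sources into the image at build time, and leaves any host mount to docker run. It assumes the project directory contains a package.json and a server.js entry point (neither is shown in the question). Note that the original Dockerfile never copies package.json into the image, so its RUN npm install has no manifest to work from.

```dockerfile
# Sketch: the question's Dockerfile without "VOLUME . /usr/src/app".
# Assumes package.json and server.js exist in the build context.
FROM node:boron

# WORKDIR creates /usr/src/app if it does not exist
WORKDIR /usr/src/app

# Copy the manifest first so the npm install layer can be cached
COPY package.json .
RUN npm install

# Copy the rest of the application source into the image
COPY . .

EXPOSE 8080
CMD ["node", "server"]
```

To work against live host files during development, the mount is then given at run time, for example docker run -p 8080:8080 -v "$(pwd)":/usr/src/app my_image (my_image being a placeholder tag). Be aware that such a bind mount hides everything the image placed under /usr/src/app, including the node_modules created by npm install.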

User · Answer

To better understand the VOLUME instruction in a Dockerfile, let us look at a typical volume usage in the official MySQL Dockerfile implementation:

    VOLUME /var/lib/mysql

Reference: https://github.com/docker-library/mysql/blob/3362baccb4352bcf0022014f67c1ec7e6808b8c5/8.0/Dockerfile

/var/lib/mysql is the default location where MySQL stores its data files.

When you run a container for testing purposes only, you may not specify its mounting point, e.g.

    docker run mysql:8

Then the MySQL container instance will use the default mount path, which is specified by the VOLUME instruction in the Dockerfile. The volume is created with a very long ID-like name inside the Docker root; this is called an "unnamed" or "anonymous" volume. It lives in the folder /var/lib/docker/volumes of the underlying host system:

    /var/lib/docker/volumes/320752e0e70d1590e905b02d484c22689e69adcbd764a69e39b17bc330b984e4

This is very convenient for quick tests without the need to specify the mounting point, while still getting the best performance by using a volume for the data store rather than the container layer.

For formal use, you will need to specify the mount path by using a named volume or a bind mount, e.g.

    docker run -v /my/own/datadir:/var/lib/mysql mysql:8

The command mounts the /my/own/datadir directory from the underlying host system as /var/lib/mysql inside the container. The data directory /my/own/datadir won't be automatically deleted, even if the container is deleted.

Usage of the official MySQL image (please check the "Where to Store Data" section):

Reference: https://hub.docker.com/_/mysql/
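Because /var/lib/mysql is declared as a VOLUME, seeding it from a derived image's build steps is unreliable: the Dockerfile reference warns that build-step changes to a declared volume are discarded with the classic builder. The official image's documentation instead describes /docker-entrypoint-initdb.d, whose .sh, .sql and .sql.gz files the entrypoint executes when it initializes an empty data directory. A minimal sketch, assuming a hypothetical seed.sql in the build context:

```dockerfile
# Sketch: seed a MySQL database without writing into the /var/lib/mysql volume.
# seed.sql is a hypothetical file in the build context.
FROM mysql:8

# The official entrypoint runs *.sh, *.sql and *.sql.gz files from this
# directory only when it initializes an empty data directory.
COPY seed.sql /docker-entrypoint-initdb.d/
```

The seeded data then ends up in whatever volume backs /var/lib/mysql at run time, either the anonymous one or the one given with -v /my/own/datadir:/var/lib/mysql. The entrypoint also expects one of its password variables, for example -e MYSQL_ROOT_PASSWORD=<password>, before it will initialize the database.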

User · Answer

In short: no, your VOLUME instruction is not correct.

A Dockerfile's VOLUME specifies one or more volumes given container-side paths. But it does not allow the image author to specify a host path. On the host side, the volumes are created with a very long ID-like name inside the Docker root. On my machine this is /var/lib/docker/volumes.

Note: Because the autogenerated name is extremely long and makes no sense from a human's perspective, these volumes are often referred to as "unnamed" or "anonymous".

Your example that uses a '.' character will not even run on my machine, no matter if I make the dot the first or second argument. I get this error message:

    docker: Error response from daemon: oci runtime error: container_linux.go:265: starting container process caused "process_linux.go:368: container init caused \"open /dev/ptmx: no such file or directory\"".

I know that what has been said to this point is probably not very valuable to someone trying to understand VOLUME and -v, and it certainly does not provide a solution for what you are trying to accomplish. So, hopefully, the following examples will shed some more light on these issues.

Minitutorial: Specifying volumes

Given this Dockerfile:

    FROM openjdk:8u131-jdk-alpine
    VOLUME vol1 vol2

(For the outcome of this minitutorial, it makes no difference if we specify vol1 vol2 or /vol1 /vol2; this is because the default working directory within a Dockerfile is /.)

Build it:

    docker build -t my-openjdk .

Run:

    docker run --rm -it my-openjdk

Inside the container, run ls on the command line and you'll notice two directories exist: /vol1 and /vol2.

Running the container also creates two directories, or "volumes", on the host side.

While the container is running, execute docker volume ls on the host machine and you'll see something like this (I have replaced the middle part of the name with three dots for brevity):

    DRIVER    VOLUME NAME
    local     c984...e4fc
    local     f670...49f0
Back in the container, execute touch /vol1/weird-ass-file (creates a blank file at said location). This file is now available on the host machine, in one of the unnamed volumes, lol. It took me two tries because I first tried the first listed volume, but eventually I did find my file in the second listed volume using this command on the host machine:

    sudo ls /var/lib/docker/volumes/f670...49f0/_data

Similarly, you can try to delete this file on the host and it will be deleted in the container as well.

Note: The _data folder is also referred to as a "mount point".

Exit out from the container and list the volumes on the host. They are gone. We used the --rm flag when running the container, and this option effectively wipes out not just the container on exit, but also the volumes.

Run a new container, but specify a volume using -v:

    docker run --rm -it -v /vol3 my-openjdk

This adds a third volume and the whole system ends up having three unnamed volumes. The command would have crashed had we specified only -v vol3. The argument must be an absolute path inside the container. On the host side, the new third volume is anonymous and resides together with the other two volumes in /var/lib/docker/volumes/.

It was stated earlier that the Dockerfile cannot map to a host path, which sort of poses a problem for us when trying to bring files in from the host to the container during runtime. A different -v syntax solves this problem.

Imagine I have a subfolder in my project directory, ./src, that I wish to sync to /src inside the container. This command does the trick:

    docker run -it -v $(pwd)/src:/src my-openjdk

Both sides of the : character expect an absolute path: the left side is an absolute path on the host machine, the right side an absolute path inside the container. pwd is a command that prints the current working directory. Putting the command in $() takes the command within the parentheses, runs it in a subshell, and yields back the absolute path to our project directory.

Putting it all together, assume we have ./src/Hello.java in our project folder on the host machine with the following contents:

    public class Hello {
        public static void main(String... ignored) {
            System.out.println("Hello, World!");
        }
    }

We build this Dockerfile:

    FROM openjdk:8u131-jdk-alpine
    WORKDIR /src
    ENTRYPOINT javac Hello.java && java Hello

We run this command:

    docker run -v $(pwd)/src:/src my-openjdk

This prints "Hello, World!".

The best part is that we're completely free to modify the .java file with a new message for another output on a second run, without having to rebuild the image.

Final remarks

I am quite new to Docker, and the aforementioned "tutorial" reflects information I gathered from a 3-day command line hackathon. I am almost ashamed I haven't been able to provide links to clear English-like documentation backing up my statements, but I honestly think this is due to a lack of documentation and not personal effort. I do know the examples work as advertised using my current setup, which is "Windows 10 -> Vagrant 2.0.0 -> Docker 17.09.0-ce".

The tutorial does not solve the problem "how do we specify the container's path in the Dockerfile and let the run command only specify the host path". There might be a way; I just haven't found it.

Finally, I have a gut feeling that specifying VOLUME in the Dockerfile is not just uncommon, but that it's probably a best practice to never use VOLUME, for two reasons. The first reason we have already identified: we cannot specify the host path, which is a good thing because Dockerfiles should be very agnostic to the specifics of a host machine. The second reason is that people might forget to use the --rm option when running the container. One might remember to remove the container but forget to remove the volume. Plus, even with the best of human memory, it might be a daunting task to figure out which of all the anonymous volumes are safe to remove.
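For reference, the minitutorial's first Dockerfile can also be written in the JSON-array form that the Dockerfile reference accepts for VOLUME. This sketch uses absolute paths so it does not depend on how relative volume paths are resolved; it should produce the same /vol1 and /vol2 anonymous volumes when run with docker run --rm -it my-openjdk.

```dockerfile
# Same two volumes as "VOLUME vol1 vol2", in JSON-array form.
# Absolute paths avoid any ambiguity about relative-path resolution.
FROM openjdk:8u131-jdk-alpine
VOLUME ["/vol1", "/vol2"]
```

In JSON-array form each path must be a double-quoted string, which also makes paths containing spaces unambiguous.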

User · Answer

Specifying a VOLUME line in a Dockerfile configures a bit of metadata on your image, but how that metadata is used is important.

First, what did these two lines do:

    WORKDIR /usr/src/app
    VOLUME . /usr/src/app

The WORKDIR line there creates the directory if it doesn't exist, and updates some image metadata to specify that all relative paths, along with the current directory for commands like RUN, will be in that location. The VOLUME line there specifies two volumes: one is the relative path . and the other is /usr/src/app; both just happen to be the same directory. Most often the VOLUME line only contains a single directory, but it can contain multiple, as you've done, or it can be a JSON-formatted array.

You cannot specify a volume source in the Dockerfile

A common source of confusion when specifying volumes in a Dockerfile is trying to match the runtime syntax of a source and destination at image build time; this will not work. The Dockerfile can only specify the destination of the volume. It would be a trivial security exploit if someone could define the source of a volume, since they could update a common image on Docker Hub to mount the root directory into the container and then launch a background process inside the container, as part of an entrypoint, that adds logins to /etc/passwd, configures systemd to launch a bitcoin miner on next reboot, or searches the filesystem for credit cards, SSNs, and private keys to send off to a remote site.

What does the VOLUME line do?

As mentioned, it sets some image metadata to say a directory inside the image is a volume. How is this metadata used? Every time you create a container from this image, Docker will force that directory to be a volume. If you do not provide a volume in your run command or compose file, the only option for Docker is to create an anonymous volume. This is a local named volume with a long unique ID for the name and no other indication of why it was created or what data it contains (anonymous volumes are where data goes to get lost). If you override the volume, pointing to a named or host volume, your data will go there instead.

VOLUME breaks things

You cannot disable a volume once it is defined in a Dockerfile. And more importantly, the RUN command in Docker is implemented with temporary containers. Those temporary containers will get a temporary anonymous volume. That anonymous volume will be initialized with the contents of your image. Any writes inside the container from your RUN command will be made to that volume. When the RUN command finishes, changes to the image are saved, and changes to the anonymous volume are discarded. Because of this, I strongly recommend against defining a VOLUME inside the Dockerfile. It results in unexpected behavior for downstream users of your image who wish to extend the image with initial data in the volume location.

How should you specify a volume?

To specify where you want to include volumes with your image, provide a docker-compose.yml. Users can modify that to adjust the volume location to their local environment, and it captures other runtime settings like publishing ports and networking.

Someone should document this!

They have. Docker includes warnings on VOLUME usage in their documentation on the Dockerfile, along with advice to specify the source at runtime:

    Changing the volume from within the Dockerfile: If any build steps change the data within the volume after it has been declared, those changes will be discarded.

    The host directory is declared at container run-time: The host directory (the mountpoint) is, by its nature, host-dependent. This is to preserve image portability, since a given host directory can't be guaranteed to be available on all hosts. For this reason, you can't mount a host directory from within the Dockerfile. The VOLUME instruction does not support specifying a host-dir parameter. You must specify the mountpoint when you create or run the container.
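The RUN behaviour described in this answer can be checked with a small throwaway image. This is a sketch; alpine:3.18 and the /data path are arbitrary choices for illustration. With the classic builder, running the built image is expected to list only before.txt, because after.txt was written into the temporary anonymous volume. Newer BuildKit-based builds may keep such changes instead, so the result depends on the builder in use.

```dockerfile
# Sketch: shows what happens to build-time writes under a declared VOLUME path.
FROM alpine:3.18

# Written before VOLUME: part of the image, copied into each new volume
RUN mkdir -p /data && echo kept > /data/before.txt

VOLUME /data

# Written after VOLUME: discarded by the classic builder (BuildKit may keep it)
RUN echo lost > /data/after.txt

# List what the container actually sees in the volume
CMD ["ls", "/data"]
```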

User · Answer

The VOLUME command in a Dockerfile is quite legit, totally conventional, absolutely fine to use, and it is not deprecated in any way. You just need to understand it.

We use it to point to any directories which the app in the container will write to a lot. We don't use VOLUME just because we want to share something, like a config file, between host and container.

The command simply needs one param: a path to a folder, relative to WORKDIR if set, from within the container. Docker will then create a volume in its graph (/var/lib/docker) and mount it to the folder in the container. Now the container will have somewhere to write to with high performance. Without the VOLUME command, the write speed to the specified folder will be very slow, because the container is then using its copy-on-write strategy in the container itself. The copy-on-write strategy is a main reason why volumes exist.

If you mount over the folder specified by the VOLUME command, the command is never run, because VOLUME is only executed when the container starts, kind of like ENV.

Basically, with the VOLUME command you get performance without externally mounting any volumes. Data will also persist across restarts of the same container without any external mounts. Then, when ready, simply mount something over it.

Some good example use cases:
- logs
- temp folders

Some bad use cases:
- static files
- configs
- code
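Following this answer's suggestion, a VOLUME would be reserved for a write-heavy directory such as logs, and declared after every build step that touches it. A sketch, assuming a hypothetical app (server.js) that writes its logs to a hypothetical /var/log/myapp directory:

```dockerfile
# Sketch: VOLUME only for a write-heavy log directory.
# server.js and /var/log/myapp are hypothetical.
FROM node:boron
WORKDIR /usr/src/app
COPY . .
RUN npm install

# Create the directory, then declare the VOLUME after the last RUN step
# so no later build step writes into it.
RUN mkdir -p /var/log/myapp
VOLUME /var/log/myapp

CMD ["node", "server"]
```

Code and configuration stay in the image layers, matching the "bad use cases" above, while a run-time -v option can still replace the log volume with a named or host volume.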

User · Answer

I don't consider the use of VOLUME good in any case, except if you are creating an image for yourself and no one else is going to use it.

I was impacted negatively by VOLUME declarations in base images that I extended, and only came to know about the problem after the image was already running. WordPress, for example, declares the /var/www/html folder as a VOLUME, which meant that any files added or changed there during the build stage weren't considered, and live changes persisted even without my knowing. There is an ugly workaround of defining the web directory in another place, but this is just a bad solution to a much simpler one: just remove the VOLUME directive.

You can achieve the intent of a volume easily using the -v option. This not only makes it clear what the volumes of the container will be (without having to look at the Dockerfile and parent Dockerfiles), but it also gives the consumer the option to use the volume or not.

It's also bad to use VOLUMEs for the following reasons, as said by this answer:

    However, the VOLUME instruction does come at a cost.

    - Users might not be aware of the unnamed volumes being created, and continuing to take up storage space on their Docker host after containers are removed.
    - There is no way to remove a volume declared in a Dockerfile.
    - Downstream images cannot add data to paths where volumes exist.

    The latter issue results in problems like these:

    - How to "undeclare" volumes in docker image?
    - GitLab on Docker: how to persist user data between deployments?

Having the option to undeclare a volume would help, but only if you know the volumes defined in the Dockerfile that generated the image (and in the parent Dockerfiles!). Furthermore, a VOLUME could be added in newer versions of a Dockerfile and break things unexpectedly for the consumers of the image.

Another good explanation (about the Oracle image having VOLUME, which was removed): https://github.com/oracle/docker-images/issues/640#issuecomment-412647328

More cases in which VOLUME broke stuff for people:
- https://github.com/datastax/docker-images/issues/31
- https://github.com/docker-library/wordpress/issues/232
- https://github.com/docker-library/ghost/issues/195
- https://github.com/samos123/docker-drupal/issues/10

A pull request to add options to reset properties of the parent image (including VOLUME) was closed and is being discussed here (where you can see several cases of people adversely affected by volumes defined in Dockerfiles), which has a comment with a good explanation against VOLUME:

    Using VOLUME in the Dockerfile is worthless. If a user needs persistence, they will be sure to provide a volume mapping when running the specified container. It was very hard to track down that my issue of not being able to set a directory's ownership (/var/lib/influxdb) was due to the VOLUME declaration in InfluxDB's Dockerfile. Without an UNVOLUME type of option, or getting rid of it altogether, I am unable to change anything related to the specified folder. This is less than ideal, especially when you are security-aware and desire to specify a certain UID the image should be run as, in order to avoid a random user, with more permissions than necessary, running software on your host.

The only good thing I can see about VOLUME is documentation, and I would consider it good if it only did that, without any side effects.

TL;DR: I consider that the best use of VOLUME is to be deprecated.
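If the only goal is the documentation value mentioned here, one option is to record the intended data path in image metadata without creating any volume, for instance with a LABEL. This is a sketch: the label key com.example.data-dir and the /var/lib/myapp path are made-up conventions that Docker does not interpret, so consumers would still mount the path themselves with -v at run time.

```dockerfile
# Sketch: document a data directory without VOLUME side effects.
# "com.example.data-dir" is a hypothetical label key that Docker ignores.
FROM alpine:3.18
RUN mkdir -p /var/lib/myapp
LABEL com.example.data-dir="/var/lib/myapp"
```

The label shows up in the output of docker image inspect, and, unlike with VOLUME, downstream images can still write to /var/lib/myapp during their own builds.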
