Understanding Docker layers by example: the impact of RUN instructions

I recently talked with several people about how to build a container image and, more specifically, why it's important to minimize the number of RUN instructions in a Dockerfile.

In this article, we will explore the impact of unnecessarily multiplying RUN instructions, as well as how to inspect the different layers of your images.

Dockerfile

We will take the simplest possible Dockerfile for this article and compare two configurations:

A simple Dockerfile with only one RUN instruction: Dockerfile.1run

# Use a base image
FROM ubuntu:latest
# Create and delete secret file 
RUN echo "secret_content" > /AAA && rm /AAA

and another with 2 RUN instructions: Dockerfile.2run.

FROM ubuntu:latest
# Create and delete secret file 
RUN echo "secret_content" > /AAA 
RUN rm /AAA

The result may look exactly the same, but it is not.

Each RUN instruction makes Docker create a new layer in the image, and each layer stores the file system changes made by that instruction. A file deleted by a later instruction is therefore still stored in the earlier layer, which can pose a risk, especially when combined with other bad practices.
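
To make the mechanism concrete, here is a minimal sketch that mimics two layers with plain tar archives, without Docker. It assumes the image-format convention that a deleted file is recorded in the next layer as an empty `.wh.<name>` whiteout marker; the directory and archive names (layer1, layer2) are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Simulate two image layers as tar archives (no Docker required).
set -e
work=$(mktemp -d)

# Layer 1: what `RUN echo "secret_content" > /AAA` adds to the file system.
mkdir "$work/layer1"
echo "secret_content" > "$work/layer1/AAA"
tar -cf "$work/layer1.tar" -C "$work/layer1" AAA

# Layer 2: what `RUN rm /AAA` records. Assumption: the deletion is stored
# as an empty whiteout marker named .wh.AAA; layer 1 is not rewritten.
mkdir "$work/layer2"
: > "$work/layer2/.wh.AAA"
tar -cf "$work/layer2.tar" -C "$work/layer2" .wh.AAA

# Layer 1 is untouched, so the secret can still be read straight from it.
layer1_listing=$(tar -tf "$work/layer1.tar")
layer2_listing=$(tar -tf "$work/layer2.tar")
recovered=$(tar -xOf "$work/layer1.tar" AAA)

echo "layer 1 contains: $layer1_listing"
echo "layer 2 contains: $layer2_listing"
echo "recovered from layer 1: $recovered"
```

The point of the sketch is that the second archive only hides the file when the layers are stacked; the first archive keeps its content.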

Let's look at this together and build the 2 images:

docker build -t 1run -f Dockerfile.1run .
docker build -t 2run -f Dockerfile.2run .

Now let's analyze the image containing 2 instructions with the dive tool.

dive 2run

Surprisingly, we have 3 layers in this image:

  • 1 for the base image

  • 1 for our RUN instruction that creates the file

  • 1 for our RUN instruction that deletes the file

On the right, dive shows that the file is still present in the layer created by the first RUN instruction.

Let's see how we can manage to retrieve it.

Retrieve files from docker layers

We noted the ID of that layer, which dive shows in its layer details section.

The docker save command lets us easily export the image, including the file system of each layer.

$ docker save 2run -o test.tar
$ tar -xvf test.tar
33570c3886dd27db6e7b8bd88205951f82a6a8048fd9b9292c5f556180dc894e/
33570c3886dd27db6e7b8bd88205951f82a6a8048fd9b9292c5f556180dc894e/VERSION
33570c3886dd27db6e7b8bd88205951f82a6a8048fd9b9292c5f556180dc894e/json
33570c3886dd27db6e7b8bd88205951f82a6a8048fd9b9292c5f556180dc894e/layer.tar
5a00f72ab3686152f0cf5067e06ff9937877cf31138ef4b8749a184c46b09898/
5a00f72ab3686152f0cf5067e06ff9937877cf31138ef4b8749a184c46b09898/VERSION
5a00f72ab3686152f0cf5067e06ff9937877cf31138ef4b8749a184c46b09898/json
5a00f72ab3686152f0cf5067e06ff9937877cf31138ef4b8749a184c46b09898/layer.tar
a23855d9376693f12ed1a1e68dd307dfd384615e4b1860a87978c5f9a7969120.json
d10815363623202ac07b49763f180d45a6941ef6a0ea905a92a7ddfa6bb45422/
d10815363623202ac07b49763f180d45a6941ef6a0ea905a92a7ddfa6bb45422/VERSION
d10815363623202ac07b49763f180d45a6941ef6a0ea905a92a7ddfa6bb45422/json
d10815363623202ac07b49763f180d45a6941ef6a0ea905a92a7ddfa6bb45422/layer.tar
manifest.json
repositories

The docker save command exports all the layers of the image and, in the output shown here, produces one layer.tar file per layer.

All that's left is to extract the right layer.

In our case, the ID is:

d10815363623202ac07b49763f180d45a6941ef6a0ea905a92a7ddfa6bb45422

$ cd d10815363623202ac07b49763f180d45a6941ef6a0ea905a92a7ddfa6bb45422/
$ ls 
json  layer.tar  VERSION
$ tar -xvf layer.tar 
AAA
$ cat AAA 
secret_content

We were able to recover the content of the AAA file.
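
When the layer ID is not known in advance, the same search can be automated. The following is a minimal sketch, not part of the original walkthrough: `find_in_layers` is a hypothetical helper that tries every file under a directory as a tar archive and reports those containing a given member name. It assumes member names are stored without a leading `/` or `./`, as in the listing above.

```shell
#!/bin/sh
# find_in_layers DIR NAME (hypothetical helper)
# Prints the path of every tar archive under DIR that has a member named NAME.
find_in_layers() {
    dir=$1
    name=$2
    find "$dir" -type f | while IFS= read -r archive; do
        # tar -tf prints nothing useful for files that are not archives,
        # so grep simply finds no match and the file is skipped.
        if tar -tf "$archive" 2>/dev/null | grep -qxF -e "$name"; then
            echo "$archive"
        fi
    done
}
```

Run from the directory where test.tar was extracted, `find_in_layers . AAA` should print the path of each layer archive that still contains the file.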

Reduce the number of RUN instructions

Now let's inspect the image built with a single RUN instruction that combines the two commands.

dive 1run

On the left, we can see 2 layers:

  • 1 for the base image

  • 1 for our RUN instruction (create + delete)

Let's follow the same process, this time with the ID of the layer created by our RUN instruction, cf388386fad4ed6625a7eb438b9bdd915ab52e7be0d0a76a97f1cb4aa0ec40a5:

$ docker save 1run -o test.tar
$ tar -xvf test.tar
$ cd cf388386fad4ed6625a7eb438b9bdd915ab52e7be0d0a76a97f1cb4aa0ec40a5
$ ls 
json  layer.tar  VERSION
$ tar -xvf layer.tar
$ ls 
json  layer.tar  VERSION

The file cannot be found in this image: it was created and deleted within the same RUN instruction, so it was never written to any layer.

Advice

Minimizing the number of RUN instructions in a Dockerfile helps reduce the size of the final image, which in turn speeds up pushes, pulls, and deployments. It also improves security by avoiding intermediate layers that may still contain sensitive or redundant data. A good practice is to combine related commands into a single RUN instruction, so that temporary files are created and removed within the same layer.
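
As a quick way to spot Dockerfiles worth reviewing, here is a small sketch that counts RUN instructions. The function name `count_run_instructions` is hypothetical, and the check is deliberately naive: it counts lines whose first word is RUN and would also count a continuation line of a multi-line command that happens to start with that word.

```shell
#!/bin/sh
# count_run_instructions FILE (hypothetical helper)
# Prints the number of lines in FILE whose first word is RUN.
count_run_instructions() {
    # -c counts matching lines, -i ignores case, -E enables extended regex syntax.
    grep -ciE '^[[:space:]]*RUN[[:space:]]' "$1"
}
```

For the two files used in this article, `count_run_instructions Dockerfile.1run` prints 1 and `count_run_instructions Dockerfile.2run` prints 2. A high count is only a hint: instructions are worth merging when a later one cleans up what an earlier one created.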