Use Docker multi-stage builds to reduce image size

  • 2020-12-22 17:51:11
  • OfStack

This article shows how Docker's multi-stage builds can significantly reduce image size for projects that compile a program inside the Dockerfile and therefore need an extra compiler toolchain installed (for Java, for example, javac).

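First, the terminology. (The original article uses Chinese terms throughout; the list below maps them to the English terms used in the official documentation. In principle, I do not favor translating technical terms.)

  • multi-stage: made up of several stages
  • build: the process of producing an image from a Dockerfile
  • image: the packaged result of a build (the Chinese term literally means "mirror")
  • stage: one section of a Dockerfile, started by FROM (sometimes rendered as "phase")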

Let's look at the result first: the image was originally over 110 MB and is now 92 MB.

Comparing the Dockerfiles

Dockerfile before optimization:


FROM openjdk:8u171-jdk-alpine3.8

ADD . /app
WORKDIR /app

RUN apk add maven \
  && mvn clean package \
  && apk del maven \
  && mv target/final.jar / \
  && cd / \
  && rm -rf /app \
  && rm -rf /root/.m2

ENTRYPOINT java -jar /final.jar

Optimized Dockerfile:


FROM openjdk:8u171-jdk-alpine3.8 as builder

ADD . /app
WORKDIR /app

RUN apk add maven \
  && mvn clean package \
  && apk del maven \
  && mv target/final.jar /

FROM openjdk:8u181-jre-alpine3.8 as environment
WORKDIR /
COPY --from=builder /final.jar .
ENTRYPOINT java -jar /final.jar

Notice that the optimized Dockerfile uses the FROM ... AS form and contains two FROM instructions. This is a multi-stage build.

Understanding multi-stage builds

Multi-stage builds are a feature introduced in Docker 17.05 that lets you use multiple FROM instructions in one Dockerfile to create multiple stages. Each stage is independent (citation needed), and files from other stages can be obtained with COPY --from. As an analogy, think of the final image as a dish of stir-fried green peppers: first you stir-fry the peppers, then you serve them.


# Analogy
image   -> a dish
stage 1 -> stir-frying
stage 2 -> serving

Both stages serve one goal: producing the final dish (the image). We take what the first stage cooked and bring it to the table. The aim is to keep the dish as light as possible by leaving the cookware and the intermediate products behind.

Visualized, the process looks like this:


# Cooking process
... (raw ingredients omitted)
raw ingredients -> [Stage 1: stir-fry] # The plate now holds the cooking tools, the stir-fried result, and the intermediates
# Now start stage 2, keeping only the stir-fried result and nothing else.
-> stir-fried result -> [Serve, keeping only the result] # We take only the green peppers (COPY --from), nothing else
-> The end result is one dish.

You should now have a general idea of how a multi-stage build works. Let's hand the microphone to Java and see how to build a JAR inside a Dockerfile with the compiler toolchain, keep only the finished JAR and the runtime in the image, and throw away the rest:


# Stage 1: compile (stir-fry)
# The JDK base image ships with the compiler
FROM openjdk:8u171-jdk-alpine3.8 as builder

ADD . /app
WORKDIR /app

RUN ... compilation and cleanup omitted ...

# The JAR is now built and the JDK is no longer needed, so it should not stay in the image.
# Start stage 2, run (serve), and discard every file from stage 1 (including the build tools).
# This base image contains only the runtime
FROM openjdk:8u181-jre-alpine3.8 as environment

# The build tools and everything else from stage 1 have been left behind. The image now holds only the runtime, so copy in the result of stage 1 (the stir-fry) and nothing else.
COPY --from=0 /final.jar .

# The image now contains only the required runtime and the JAR.
ENTRYPOINT java -jar /final.jar

That concludes the introduction to multi-stage builds.

Using multi-stage builds

The core instruction of a multi-stage build is FROM, which needs no introduction for seasoned users. In a multi-stage build, each FROM starts a new stage. You can think of a stage as a new image (an imprecise description; citation needed) that is isolated from the other stages, environment variables included. Only the stage started by the final FROM ends up in the image.
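The claim that environment variables do not cross stage boundaries can be checked with a small experiment. The following is a minimal sketch, not part of the original article: the tag stage-env-demo and the variable GREETING are made up for illustration, and the Docker commands run only if a daemon is reachable.

```shell
#!/bin/sh
# Sketch: check that an ENV set in stage 1 is not visible in stage 2.
# "stage-env-demo" and GREETING are hypothetical names chosen for this example.
workdir=$(mktemp -d)

# The quoted 'EOF' keeps ${GREETING:-unset} literal, so it is expanded
# by the shell inside the build container, not by this script.
cat > "$workdir/Dockerfile" <<'EOF'
# Stage 1 sets an environment variable
FROM alpine:3.8
ENV GREETING=hello

# Stage 2 starts again from the base image and records what it sees
FROM alpine:3.8
RUN echo "GREETING=${GREETING:-unset}" > /env.txt
EOF

# Build and inspect only when a Docker daemon is reachable.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker build -t stage-env-demo "$workdir"
  docker run --rm stage-env-demo cat /env.txt
fi
```

If the stages are isolated as described, the file written in stage 2 should read GREETING=unset rather than GREETING=hello. Remove the temporary directory in $workdir when you are done.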

Let's build a simple multi-stage example:


# Stage 1
FROM alpine:3.8
WORKDIR /demo
RUN echo "Hello, stage 1" > /demo/hi-1.txt

# Stage 2
FROM alpine:3.8
WORKDIR /demo
RUN echo "Hello, stage 2" > /demo/hi-2.txt

You can build this Dockerfile, export the image with docker save <tag> (redirecting the output to a tar file), and look at the contents. It should contain only /demo/hi-2.txt and Alpine.

In this Dockerfile we created two stages. Stage 1 creates hi-1.txt and stage 2 creates hi-2.txt. Only the output of stage 2 is added to the final image, and nothing else.
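The check described above could be scripted roughly as follows. This is a sketch under assumptions: the tag multistage-demo and the file name image.tar are invented for the example, and listing /demo with docker run is a shortcut added here alongside the docker save approach from the article.

```shell
#!/bin/sh
# Sketch: build the two-stage example and look at what the final image contains.
# "multistage-demo" and "image.tar" are hypothetical names chosen for this example.
workdir=$(mktemp -d)

cat > "$workdir/Dockerfile" <<'EOF'
# Stage 1
FROM alpine:3.8
WORKDIR /demo
RUN echo "Hello, stage 1" > /demo/hi-1.txt

# Stage 2
FROM alpine:3.8
WORKDIR /demo
RUN echo "Hello, stage 2" > /demo/hi-2.txt
EOF

# Build and inspect only when a Docker daemon is reachable.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker build -t multistage-demo "$workdir"

  # Export the image to a tar archive and list its entries.
  docker save multistage-demo > "$workdir/image.tar"
  tar -tf "$workdir/image.tar"

  # Shortcut: list the directory inside a throwaway container.
  docker run --rm multistage-demo ls /demo
fi
```

The directory listing should show hi-2.txt but not hi-1.txt, since stage 1 is discarded.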

Copying files: the bridge between stages

If the stages were completely isolated from each other, multiple stages would be pointless: the results of the previous stage would be discarded entirely and the next stage would start from scratch.

The COPY instruction lets you fetch files from other stages. Using COPY in a multi-stage build is exactly the same as ordinary use, except that you add --from. So let's revise the previous example so that the final image contains the output of both stages:


# Stage 1
FROM alpine:3.8
WORKDIR /demo
RUN echo "Hello, stage 1" > /demo/hi-1.txt

# Stage 2
FROM alpine:3.8
WORKDIR /demo
COPY --from=0 /demo/hi-1.txt /demo
RUN echo "Hello, stage 2" > /demo/hi-2.txt

Rebuild and save the image again, and you will find an additional layer containing hi-1.txt.
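One way to look for the extra layer without unpacking the saved archive is docker history. The sketch below assumes a made-up tag, copy-demo, and only runs the Docker commands if a daemon is reachable.

```shell
#!/bin/sh
# Sketch: build the COPY --from example and inspect the resulting layers.
# "copy-demo" is a hypothetical tag chosen for this example.
workdir=$(mktemp -d)

cat > "$workdir/Dockerfile" <<'EOF'
# Stage 1
FROM alpine:3.8
WORKDIR /demo
RUN echo "Hello, stage 1" > /demo/hi-1.txt

# Stage 2
FROM alpine:3.8
WORKDIR /demo
COPY --from=0 /demo/hi-1.txt /demo
RUN echo "Hello, stage 2" > /demo/hi-2.txt
EOF

# Build and inspect only when a Docker daemon is reachable.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker build -t copy-demo "$workdir"

  # Show the layers of the final image, including the one created by COPY.
  docker history copy-demo

  # Both files should now be present in the final image.
  docker run --rm copy-demo ls /demo
fi
```

The directory listing should now show both hi-1.txt and hi-2.txt.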

Naming stages: quick identification

Stage indexes are not friendly to those of us with a seven-second memory. Stages can instead be given names, which makes them easy to recognize.

Naming a stage is easy: just append as <name> to the FROM instruction.

Now let's update the Dockerfile to name the stages and refer to a stage by name in COPY.


# Stage 1, its name is "build1"
FROM alpine:3.8 as build1
WORKDIR /demo
RUN echo "Hello, stage 1" > /demo/hi-1.txt

# Stage 2, its name is "build2"
FROM alpine:3.8 as build2
WORKDIR /demo
# Refer to the stage by name instead of by index
COPY --from=build1 /demo/hi-1.txt /demo
RUN echo "Hello, stage 2" > /demo/hi-2.txt

Rebuild and save. The result should be the same as last time.

Building only some stages: easier debugging

Docker also offers a convenient debugging aid: building only some of the stages. You can make the build stop at a given stage and skip the later ones. This makes debugging easier and lets a single Dockerfile distinguish between production, development, and test builds.

We use the same Dockerfile as before, but build it with the --target <stage> option:


$ docker build --target build1 .

Save the image again and you will find that it contains only the contents of build1.
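To make the partial build easier to inspect, it can be given a tag of its own. The sketch below assumes a made-up tag, target-demo:build1, and uses a directory listing in a throwaway container in place of saving the image.

```shell
#!/bin/sh
# Sketch: build only the "build1" stage and check what the result contains.
# "target-demo:build1" is a hypothetical tag chosen for this example.
workdir=$(mktemp -d)

cat > "$workdir/Dockerfile" <<'EOF'
# Stage 1, named "build1"
FROM alpine:3.8 as build1
WORKDIR /demo
RUN echo "Hello, stage 1" > /demo/hi-1.txt

# Stage 2, named "build2"
FROM alpine:3.8 as build2
WORKDIR /demo
COPY --from=build1 /demo/hi-1.txt /demo
RUN echo "Hello, stage 2" > /demo/hi-2.txt
EOF

# Build and inspect only when a Docker daemon is reachable.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  # Stop after the build1 stage; build2 is not part of the resulting image.
  docker build --target build1 -t target-demo:build1 "$workdir"

  # The listing should contain hi-1.txt only.
  docker run --rm target-demo:build1 ls /demo
fi
```

Because the build stops at build1, hi-2.txt should be absent from the listing.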

Conclusion

That is all there is to multi-stage builds. Let's go back to the two Dockerfiles compared at the beginning. Can you see where the pre-optimization image carries its extra weight?

Clearly, it contains the JDK, which is dead weight at run time: the JDK is needed only for compilation, and once the program is compiled only the JRE is required. A multi-stage build separates the build stage from the run stage and so slims down the image.

References

https://docs.docker.com/develop/develop-images/multistage-build/#name-your-build-stages

https://yeasy.gitbooks.io/docker_practice/image/multistage-builds.html

