Docker Image Optimization: Reduce Size by 90% with Multi-Stage Builds
A typical Node.js application built with FROM node:20 produces an image over 1 GB in size. The same application, built with multi-stage builds and an Alpine base, comes in at roughly 130 MB. A Go application can drop from 800 MB to under 10 MB. That is not a theoretical improvement — it is a practical, reproducible optimization that affects build times, deployment speed, storage costs, registry bandwidth, and your attack surface.
This guide covers every technique for shrinking Docker images, from quick wins like .dockerignore to advanced strategies like distroless images and build cache optimization. Each section includes real Dockerfile examples you can adapt for your projects.
Why Image Size Matters
Large Docker images are not just a storage concern. They create cascading problems across your entire workflow:
- Slower CI/CD pipelines — Every build pushes and pulls large images. A 1 GB image over a 100 Mbps connection takes 80 seconds just to transfer.
- Slower deployments — Rolling updates require pulling the new image on every node before the new container starts.
- Higher cloud costs — Registry storage, bandwidth between regions, and disk space on every node add up.
- Larger attack surface — Every package, library, and binary in your image is a potential vulnerability. The node:20 base image contains over 400 packages, most of which your application never uses.
- Slower scaling — Auto-scaling new nodes requires pulling images. Smaller images mean faster scale-up response.
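The 80-second figure above is simple arithmetic — image size converted to megabits, divided by link speed:

```shell
# Transfer time ≈ size in megabits / link speed in Mbps
# 1 GB ≈ 1000 MB ≈ 8000 megabits; over a 100 Mbps link:
echo "$(( 1000 * 8 / 100 )) seconds"   # prints "80 seconds"
```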
Quick Win: The .dockerignore File
Before optimizing anything else, create a .dockerignore file. Without it, docker build sends your entire project directory to the Docker daemon as the build context — including node_modules, .git, test fixtures, local databases, and editor configs:
# .dockerignore
# Version control
.git
.gitignore
# Dependencies (will be installed during build)
node_modules
vendor
__pycache__
*.pyc
.venv
venv
# Build output
dist
build
*.egg-info
# IDE and editor files
.idea
.vscode
*.swp
*.swo
.DS_Store
# Docker files
Dockerfile*
docker-compose*
.dockerignore
# Documentation
README.md
docs/
*.md
# Tests
tests/
test/
__tests__
*.test.js
*.spec.js
coverage/
.nyc_output
# CI/CD
.github
.gitlab-ci.yml
.circleci
Jenkinsfile
# Environment and secrets
.env
.env.*
*.pem
*.key
The impact can be dramatic. A project with a 500 MB node_modules directory and a 200 MB .git directory sends 700 MB of unnecessary data to the Docker daemon before the build even starts.
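To estimate what a .dockerignore will save in your own project, measure the usual offenders directly (the paths here are examples; adjust them to your repo layout):

```shell
# Sizes of directories that .dockerignore will exclude (paths are examples)
du -sh .git node_modules 2>/dev/null || true
# Rough size of the full directory — approximately the build context
# Docker would send without any .dockerignore
du -sh .
```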
Multi-Stage Builds: The Core Technique
Multi-stage builds let you use one image for building your application and a different (smaller) image for running it. Only the final stage becomes your production image. Build tools, compilers, dev dependencies, and source code are left behind in the build stages.
Node.js Example
# ---- BAD: Single-stage build ----
FROM node:20
WORKDIR /app
COPY . .
RUN npm ci && npm run build
EXPOSE 3000
CMD ["node", "dist/index.js"]
# Result: ~1.1 GB
# ---- GOOD: Multi-stage build ----
# Stage 1: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev && \
    cp -R node_modules prod_modules && \
    npm ci
COPY . .
RUN npm run build
# Stage 2: Production
FROM node:20-alpine AS production
WORKDIR /app
# Copy only production dependencies and built output
COPY --from=builder /app/prod_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./
# Run as non-root user
RUN addgroup -g 1001 -S appgroup && \
adduser -S appuser -u 1001 -G appgroup
USER appuser
EXPOSE 3000
CMD ["node", "dist/index.js"]
# Result: ~150 MB
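A useful extension of the pattern above: stages can be built individually with `docker build --target <stage>`. As a sketch, a hypothetical test stage lets CI run the suite in the same environment as the build without anything test-related ever reaching the production stage:

```dockerfile
# Optional stage for CI: build it with `docker build --target test .`
# It inherits everything from the builder stage and is never shipped.
FROM builder AS test
RUN npm test
```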
Go Example
Go produces statically-linked binaries, making it the best case for multi-stage optimization:
# Stage 1: Build
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /app/server .
# Stage 2: Minimal runtime
FROM scratch
COPY --from=builder /app/server /server
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
EXPOSE 8080
ENTRYPOINT ["/server"]
# Result: ~8 MB (down from ~800 MB)
The scratch base image is literally empty — no shell, no package manager, nothing. The Go binary and TLS certificates are the only things in the image. The -ldflags="-s -w" flags strip debug information and symbol tables, further reducing binary size.
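If scratch proves too bare — no CA certificates, no timezone data, no non-root user — distroless static is a middle ground. A sketch of the same final stage on that base (the builder stage is the one from the Go example above):

```dockerfile
# gcr.io/distroless/static ships ca-certificates, tzdata, and a
# nonroot user, so the manual cert COPY from the scratch example
# is no longer needed.
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /app/server /server
EXPOSE 8080
ENTRYPOINT ["/server"]
```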
Python Example
# Stage 1: Build dependencies
FROM python:3.12-slim AS builder
WORKDIR /app
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Stage 2: Production
FROM python:3.12-slim AS production
WORKDIR /app
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY . .
RUN useradd --create-home appuser
USER appuser
EXPOSE 8000
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:create_app()"]
# Result: ~180 MB (down from ~1 GB with full python:3.12)
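One assumption in the builder stage above is that every package in requirements.txt ships a prebuilt wheel. When one does not (C extensions such as psycopg2 without binary wheels), install the compiler toolchain in the builder stage only — the production stage never sees it. A sketch:

```dockerfile
FROM python:3.12-slim AS builder
# Build toolchain lives only in this stage and is discarded with it
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
```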
Java Example
# Stage 1: Build with Maven
FROM maven:3.9-eclipse-temurin-21 AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests
# Stage 2: JRE-only runtime
FROM eclipse-temurin:21-jre-alpine AS production
WORKDIR /app
COPY --from=builder /app/target/*.jar app.jar
RUN addgroup -g 1001 -S appgroup && \
adduser -S appuser -u 1001 -G appgroup
USER appuser
EXPOSE 8080
CMD ["java", "-jar", "app.jar"]
# Result: ~200 MB (down from ~800 MB with full JDK)
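To go smaller than a stock JRE, jlink can assemble a custom runtime containing only the modules your jar actually uses. A sketch, assuming the builder stage from the Maven example above; the module list here is illustrative — derive the real one with `jdeps --print-module-deps app.jar`:

```dockerfile
FROM eclipse-temurin:21 AS jre-builder
# Module list is an example; compute yours with jdeps
RUN jlink \
    --add-modules java.base,java.logging,java.net.http,java.sql \
    --strip-debug --no-man-pages --no-header-files \
    --output /custom-jre

FROM debian:bookworm-slim AS production
COPY --from=jre-builder /custom-jre /opt/jre
COPY --from=builder /app/target/*.jar /app/app.jar
ENV PATH="/opt/jre/bin:$PATH"
CMD ["java", "-jar", "/app/app.jar"]
```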
Choosing the Right Base Image
Your base image choice has the biggest single impact on final image size. Here is how the common options compare:
| Base Image | Size | Packages | Use Case |
|---|---|---|---|
| ubuntu:24.04 | ~78 MB | Full apt ecosystem | When you need specific Ubuntu packages |
| debian:bookworm-slim | ~74 MB | Minimal Debian | General-purpose slim base |
| alpine:3.19 | ~7 MB | musl libc, BusyBox | Minimal Linux with package manager |
| gcr.io/distroless/static | ~2 MB | None (static binaries only) | Go, Rust, statically-linked apps |
| gcr.io/distroless/base | ~20 MB | glibc, libssl, ca-certs | C/C++, dynamically-linked apps |
| gcr.io/distroless/java21 | ~220 MB | JRE only | Java applications |
| scratch | 0 MB | Nothing at all | Static binaries with no OS dependencies |
Alpine Linux Considerations
Alpine uses musl libc instead of glibc. This makes it much smaller but can cause compatibility issues with some applications:
# Alpine: Small but uses musl libc
FROM node:20-alpine
# Some native npm packages may fail to build
# Python C extensions may behave differently
# Slim: Larger but uses standard glibc
FROM node:20-slim
# Maximum compatibility
# Still much smaller than the full image
If you hit musl-related issues on Alpine, switch to the -slim variant of your language's base image. The size difference (70 MB vs 180 MB) is usually less important than compatibility.
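If you want to stay on Alpine despite a native-module build failure, the usual fix is to add the musl build toolchain in the builder stage. A sketch using apk's virtual package feature, which groups the toolchain under one name so a single command removes it all:

```dockerfile
FROM node:20-alpine AS builder
# .build-deps is a named group; one apk del removes everything in it
RUN apk add --no-cache --virtual .build-deps python3 make g++
WORKDIR /app
COPY package*.json ./
RUN npm ci && apk del .build-deps
```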
Distroless Images
Google's distroless images contain only your application and its runtime dependencies. No shell, no package manager, no utilities. This dramatically reduces the attack surface:
# Using distroless for a Node.js app
FROM node:20-slim AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
# Assumes dist/ is already built; add a build stage (as shown earlier) if not
COPY . .
FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=builder /app .
CMD ["dist/index.js"]
# No shell access - even if compromised, attacker cannot
# run arbitrary commands
The trade-off is debugging: without a shell, you cannot docker exec into the container. For debugging, use a debug variant or temporarily swap to a normal base image.
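For that debugging case, distroless publishes a :debug variant of each image that adds a BusyBox shell but is otherwise identical. Swapping the tag in the final stage is usually enough (builder here is the stage from the example above):

```dockerfile
# Same layout as the production stage; the :debug tag adds a BusyBox
# shell so `docker exec -it <container> sh` works again.
# Never ship this variant to production.
FROM gcr.io/distroless/nodejs20-debian12:debug
WORKDIR /app
COPY --from=builder /app .
CMD ["dist/index.js"]
```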
Layer Caching Optimization
Docker caches each layer of your image. When a layer changes, all subsequent layers are rebuilt. Ordering your Dockerfile instructions from least-frequently-changed to most-frequently-changed maximizes cache hits:
# BAD: COPY everything first, cache busts on any file change
FROM node:20-alpine
WORKDIR /app
# Any file change invalidates this layer
COPY . .
# Always reinstalls even if only code changed
RUN npm ci
RUN npm run build

# GOOD: Dependencies first, source code last
FROM node:20-alpine
WORKDIR /app
# Only changes when dependencies change
COPY package.json package-lock.json ./
# Cached unless dependencies changed
RUN npm ci
# Code changes only rebuild from here
COPY . .
RUN npm run build
This pattern means that changing a source file only rebuilds the COPY . . and RUN npm run build layers. The npm ci layer (which can take minutes) is cached.
Advanced Cache Mounting
Docker BuildKit provides cache mounts that persist across builds, even when the layer itself changes:
# Cache Go module downloads across builds
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod \
go mod download
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod \
--mount=type=cache,target=/root/.cache/go-build \
CGO_ENABLED=0 go build -o /app/server .
# Cache pip downloads
FROM python:3.12-slim AS builder
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements.txt
# Cache apt packages
FROM debian:bookworm-slim
RUN --mount=type=cache,target=/var/cache/apt \
--mount=type=cache,target=/var/lib/apt \
apt-get update && apt-get install -y curl
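One caveat with the apt cache mount: Debian-based official images ship a docker-clean hook that deletes downloaded packages after every install, which defeats the cache. The pattern from Docker's BuildKit documentation disables that hook first:

```dockerfile
FROM debian:bookworm-slim
# Disable the hook that auto-deletes the apt cache, then tell apt
# to keep downloaded packages so the cache mount stays warm
RUN rm -f /etc/apt/apt.conf.d/docker-clean \
    && echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
       > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends curl
```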
Reducing Layer Count and Size
Combine RUN Commands
# BAD: Each RUN creates a new layer
RUN apt-get update
RUN apt-get install -y curl wget git
RUN rm -rf /var/lib/apt/lists/*
# GOOD: Single layer with cleanup
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
wget \
git && \
rm -rf /var/lib/apt/lists/*
The --no-install-recommends flag prevents apt from installing suggested packages, which can save hundreds of megabytes.
Remove Build Dependencies in the Same Layer
# BAD: Build deps persist in a previous layer even if you remove them later
RUN apt-get install -y gcc python3-dev
RUN pip install -r requirements.txt
RUN apt-get remove -y gcc python3-dev # Still in the image!
# GOOD: Install, use, and remove in one layer
RUN apt-get update && \
apt-get install -y --no-install-recommends gcc python3-dev && \
pip install --no-cache-dir -r requirements.txt && \
apt-get purge -y gcc python3-dev && \
apt-get autoremove -y && \
rm -rf /var/lib/apt/lists/*
Image Size Comparison: Real-World Results
Here are actual size comparisons for a typical web application across different optimization levels:
| Optimization Level | Node.js | Python | Go | Java |
|---|---|---|---|---|
| Unoptimized (full base) | 1.1 GB | 1.0 GB | 800 MB | 800 MB |
| Slim base image | 400 MB | 350 MB | 400 MB | 400 MB |
| Alpine base | 200 MB | 180 MB | 250 MB | 200 MB |
| Multi-stage + Alpine | 130 MB | 120 MB | 15 MB | 200 MB |
| Multi-stage + distroless | 120 MB | 100 MB | 8 MB | 220 MB |
| Multi-stage + scratch | N/A | N/A | 6 MB | N/A |
Analyzing Image Size
Use these tools to understand what is consuming space in your images:
# Show image size
docker images my-app
# Show layer sizes
docker history my-app:latest
# Detailed layer analysis with dive
# https://github.com/wagoodman/dive
docker run --rm -it \
-v /var/run/docker.sock:/var/run/docker.sock \
wagoodman/dive:latest my-app:latest
# List all local images sorted by size
docker images --format "{{.Repository}}:{{.Tag}} {{.Size}}" | sort -k2 -h
The dive tool is particularly useful. It shows you exactly which files are in each layer and highlights wasted space from files that were added and then removed in a later layer.
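dive also runs non-interactively in CI, failing the build when an image regresses past thresholds you define in a `.dive-ci` file. The values below are examples, not recommendations:

```yaml
# .dive-ci — thresholds are illustrative; tune them per project
rules:
  # Fail if less than 95% of image bytes are "efficient"
  lowestEfficiency: 0.95
  # Fail if more than 20 MB is wasted across layers
  highestWastedBytes: 20MB
  # Fail if wasted bytes exceed 10% of the image
  highestUserWastedPercent: 0.10
```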
Security Benefits of Smaller Images
Smaller images are not just about performance — they are inherently more secure:
# Scan a full image vs a distroless image
$ trivy image node:20
# Total: 847 vulnerabilities (142 HIGH, 23 CRITICAL)
$ trivy image gcr.io/distroless/nodejs20-debian12
# Total: 12 vulnerabilities (2 HIGH, 0 CRITICAL)
# Fewer packages = fewer CVEs = less patching
Every package in your image is a potential entry point for attackers and a maintenance burden for patching. By removing unnecessary packages, you dramatically reduce both your vulnerability count and the frequency of required updates.
Optimization Checklist
- Create a comprehensive .dockerignore file
- Use multi-stage builds to separate build and runtime
- Choose the smallest viable base image (alpine, slim, distroless)
- Order Dockerfile instructions for maximum cache hits (dependencies before source code)
- Combine RUN commands and clean up in the same layer
- Use --no-install-recommends for apt and --no-cache-dir for pip
- Use BuildKit cache mounts for dependency downloads
- Strip debug symbols from compiled binaries
- Run as a non-root user (adds no size, improves security)
- Analyze with dive to find hidden waste
Conclusion
Docker image optimization is one of the highest-ROI activities in container infrastructure. A few hours of Dockerfile improvements can save minutes on every build, seconds on every deployment, and gigabytes across your registry and hosts. Start with the easy wins (.dockerignore and multi-stage builds), then progress to smaller base images and layer optimization as needed.
The goal is not to achieve the absolute smallest image possible. It is to remove everything that does not serve your application. A 150 MB image that builds reliably and runs correctly is better than a 10 MB image that breaks because of missing system libraries. Optimize pragmatically, test thoroughly, and let the size reduction be a side effect of good container hygiene rather than an end in itself.