Docker Image Optimization: Reduce Size by 90% with Multi-Stage Builds
A typical Node.js application built with FROM node:20 produces an image over 1 GB in size. The same application, built with multi-stage builds and an Alpine base, comes in at roughly 130 MB. A Go application can drop from 800 MB to under 10 MB. That is not a theoretical improvement — it is a practical, reproducible optimization that affects build times, deployment speed, storage costs, registry bandwidth, and your attack surface.
This guide covers every technique for shrinking Docker images, from quick wins like .dockerignore to advanced strategies like distroless images and build cache optimization. Each section includes real Dockerfile examples you can adapt for your projects.
Why Image Size Matters
Large Docker images are not just a storage concern. They create cascading problems across your entire workflow:
- Slower CI/CD pipelines — Every build pushes and pulls large images. A 1 GB image over a 100 Mbps connection takes 80 seconds just to transfer.
- Slower deployments — Rolling updates require pulling the new image on every node before the new container starts.
- Higher cloud costs — Registry storage, bandwidth between regions, and disk space on every node add up.
- Larger attack surface — Every package, library, and binary in your image is a potential vulnerability. The node:20 base image contains over 400 packages, most of which your application never uses.
- Slower scaling — Auto-scaling new nodes requires pulling images. Smaller images mean faster scale-up response.
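The 80-second figure above is simple arithmetic — image size converted to megabits, divided by link speed:

```shell
# Transfer time ≈ size in megabits / link speed in Mbps
# 1 GB ≈ 1000 MB ≈ 8000 megabits; over a 100 Mbps link:
echo "$(( 1000 * 8 / 100 )) seconds"   # prints "80 seconds"
```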
Quick Win: The .dockerignore File
Before optimizing anything else, create a .dockerignore file. Without it, docker build sends your entire project directory to the Docker daemon as the build context — including node_modules, .git, test fixtures, local databases, and editor configs:
# .dockerignore
# Version control
.git
.gitignore
# Dependencies (will be installed during build)
node_modules
vendor
__pycache__
*.pyc
.venv
venv
# Build output
dist
build
*.egg-info
# IDE and editor files
.idea
.vscode
*.swp
*.swo
.DS_Store
# Docker files
Dockerfile*
docker-compose*
.dockerignore
# Documentation
README.md
docs/
*.md
# Tests
tests/
test/
__tests__
*.test.js
*.spec.js
coverage/
.nyc_output
# CI/CD
.github
.gitlab-ci.yml
.circleci
Jenkinsfile
# Environment and secrets
.env
.env.*
*.pem
*.key
The impact can be dramatic. A project with a 500 MB node_modules directory and a 200 MB .git directory sends 700 MB of unnecessary data to the Docker daemon before the build even starts.
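To estimate what a .dockerignore will save in your own project, measure the usual offenders directly (the paths here are examples; adjust them to your repo layout):

```shell
# Sizes of directories that .dockerignore will exclude (paths are examples)
du -sh .git node_modules 2>/dev/null || true
# Rough size of the full directory — approximately the build context
# Docker would send without any .dockerignore
du -sh .
```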
Multi-Stage Builds: The Core Technique
Multi-stage builds let you use one image for building your application and a different (smaller) image for running it. Only the final stage becomes your production image. Build tools, compilers, dev dependencies, and source code are left behind in the build stages.
Node.js Example
# ---- BAD: Single-stage build ----
FROM node:20
WORKDIR /app
COPY . .
RUN npm ci && npm run build
EXPOSE 3000
CMD ["node", "dist/index.js"]
# Result: ~1.1 GB
# ---- GOOD: Multi-stage build ----
# Stage 1: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev && \
    cp -R node_modules prod_modules && \
    npm ci
COPY . .
RUN npm run build
# Stage 2: Production
FROM node:20-alpine AS production
WORKDIR /app
# Copy only production dependencies and built output
COPY --from=builder /app/prod_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./
# Run as non-root user
RUN addgroup -g 1001 -S appgroup && \
adduser -S appuser -u 1001 -G appgroup
USER appuser
EXPOSE 3000
CMD ["node", "dist/index.js"]
# Result: ~150 MB
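A useful extension of the pattern above: stages can be built individually with `docker build --target <stage>`. As a sketch, a hypothetical test stage lets CI run the suite in the same environment as the build without anything test-related ever reaching the production stage:

```dockerfile
# Optional stage for CI: build it with `docker build --target test .`
# It inherits everything from the builder stage and is never shipped.
FROM builder AS test
RUN npm test
```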
Go Example
Go produces statically-linked binaries, making it the best case for multi-stage optimization:
# Stage 1: Build
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /app/server .
# Stage 2: Minimal runtime
FROM scratch
COPY --from=builder /app/server /server
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
EXPOSE 8080
ENTRYPOINT ["/server"]
# Result: ~8 MB (down from ~800 MB)
The scratch base image is literally empty — no shell, no package manager, nothing. The Go binary and TLS certificates are the only things in the image. The -ldflags="-s -w" flags strip debug information and symbol tables, further reducing binary size.
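If scratch proves too bare — no CA certificates, no timezone data, no non-root user — distroless static is a middle ground. A sketch of the same final stage on that base (the builder stage is the one from the Go example above):

```dockerfile
# gcr.io/distroless/static ships ca-certificates, tzdata, and a
# nonroot user, so the manual cert COPY from the scratch example
# is no longer needed.
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /app/server /server
EXPOSE 8080
ENTRYPOINT ["/server"]
```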
Python Example
# Stage 1: Build dependencies
FROM python:3.12-slim AS builder
WORKDIR /app
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Stage 2: Production
FROM python:3.12-slim AS production
WORKDIR /app
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY . .
RUN useradd --create-home appuser
USER appuser
EXPOSE 8000
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:create_app()"]
# Result: ~180 MB (down from ~1 GB with full python:3.12)
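One assumption in the builder stage above is that every package in requirements.txt ships a prebuilt wheel. When one does not (C extensions such as psycopg2 without binary wheels), install the compiler toolchain in the builder stage only — the production stage never sees it. A sketch:

```dockerfile
FROM python:3.12-slim AS builder
# Build toolchain lives only in this stage and is discarded with it
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
```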
Java Example
# Stage 1: Build with Maven
FROM maven:3.9-eclipse-temurin-21 AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests
# Stage 2: JRE-only runtime
FROM eclipse-temurin:21-jre-alpine AS production
WORKDIR /app
COPY --from=builder /app/target/*.jar app.jar
RUN addgroup -g 1001 -S appgroup && \
adduser -S appuser -u 1001 -G appgroup
USER appuser
EXPOSE 8080
CMD ["java", "-jar", "app.jar"]
# Result: ~200 MB (down from ~800 MB with full JDK)
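To go smaller than a stock JRE, jlink can assemble a custom runtime containing only the modules your jar actually uses. A sketch, assuming the builder stage from the Maven example above; the module list here is illustrative — derive the real one with `jdeps --print-module-deps app.jar`:

```dockerfile
FROM eclipse-temurin:21 AS jre-builder
# Module list is an example; compute yours with jdeps
RUN jlink \
    --add-modules java.base,java.logging,java.net.http,java.sql \
    --strip-debug --no-man-pages --no-header-files \
    --output /custom-jre

FROM debian:bookworm-slim AS production
COPY --from=jre-builder /custom-jre /opt/jre
COPY --from=builder /app/target/*.jar /app/app.jar
ENV PATH="/opt/jre/bin:$PATH"
CMD ["java", "-jar", "/app/app.jar"]
```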
Choosing the Right Base Image
Your base image choice has the biggest single impact on final image size. Here is how the common options compare:
| Base Image | Size | Packages | Use Case |
|---|---|---|---|
| ubuntu:24.04 | ~78 MB | Full apt ecosystem | When you need specific Ubuntu packages |
| debian:bookworm-slim | ~74 MB | Minimal Debian | General-purpose slim base |
| alpine:3.19 | ~7 MB | musl libc, BusyBox | Minimal Linux with package manager |
| gcr.io/distroless/static | ~2 MB | None (static binaries only) | Go, Rust, statically-linked apps |
| gcr.io/distroless/base | ~20 MB | glibc, libssl, ca-certs | C/C++, dynamically-linked apps |
| gcr.io/distroless/java21 | ~220 MB | JRE only | Java applications |
| scratch | 0 MB | Nothing at all | Static binaries with no OS dependencies |
Alpine Linux Considerations
Alpine uses musl libc instead of glibc. This makes it much smaller but can cause compatibility issues with some applications:
# Alpine: Small but uses musl libc
FROM node:20-alpine
# Some native npm packages may fail to build
# Python C extensions may behave differently
# Slim: Larger but uses standard glibc
FROM node:20-slim
# Maximum compatibility
# Still much smaller than the full image
If you hit musl-related issues on Alpine, switch to the -slim variant of your language's base image. The size difference (70 MB vs 180 MB) is usually less important than compatibility.
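If you want to stay on Alpine despite a native-module build failure, the usual fix is to add the musl build toolchain in the builder stage. A sketch using apk's virtual package feature, which groups the toolchain under one name so a single command removes it all:

```dockerfile
FROM node:20-alpine AS builder
# .build-deps is a named group; one apk del removes everything in it
RUN apk add --no-cache --virtual .build-deps python3 make g++
WORKDIR /app
COPY package*.json ./
RUN npm ci && apk del .build-deps
```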
Distroless Images
Google's distroless images contain only your application and its runtime dependencies. No shell, no package manager, no utilities. This dramatically reduces the attack surface:
# Using distroless for a Node.js app
FROM node:20-slim AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
# Assumes dist/ is already built; add a build stage (as shown earlier) if not
COPY . .
FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=builder /app .
CMD ["dist/index.js"]
# No shell access - even if compromised, attacker cannot
# run arbitrary commands
The trade-off is debugging: without a shell, you cannot docker exec into the container. For debugging, use a debug variant or temporarily swap to a normal base image.
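For that debugging case, distroless publishes a :debug variant of each image that adds a BusyBox shell but is otherwise identical. Swapping the tag in the final stage is usually enough (builder here is the stage from the example above):

```dockerfile
# Same layout as the production stage; the :debug tag adds a BusyBox
# shell so `docker exec -it <container> sh` works again.
# Never ship this variant to production.
FROM gcr.io/distroless/nodejs20-debian12:debug
WORKDIR /app
COPY --from=builder /app .
CMD ["dist/index.js"]
```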
Layer Caching Optimization
Docker caches each layer of your image. When a layer changes, all subsequent layers are rebuilt. Ordering your Dockerfile instructions from least-frequently-changed to most-frequently-changed maximizes cache hits:
# BAD: COPY everything first, cache busts on any file change
FROM node:20-alpine
WORKDIR /app
# Any file change invalidates this layer
COPY . .
# Always reinstalls even if only code changed
RUN npm ci
RUN npm run build

# GOOD: Dependencies first, source code last
FROM node:20-alpine
WORKDIR /app
# Only changes when dependencies change
COPY package.json package-lock.json ./
# Cached unless dependencies changed
RUN npm ci
# Code changes only rebuild from here
COPY . .
RUN npm run build
This pattern means that changing a source file only rebuilds the COPY . . and RUN npm run build layers. The npm ci layer (which can take minutes) is cached.
Advanced Cache Mounting
Docker BuildKit provides cache mounts that persist across builds, even when the layer itself changes:
# Cache Go module downloads across builds
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod \
go mod download
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod \
--mount=type=cache,target=/root/.cache/go-build \
CGO_ENABLED=0 go build -o /app/server .
# Cache pip downloads
FROM python:3.12-slim AS builder
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements.txt
# Cache apt packages
FROM debian:bookworm-slim
RUN --mount=type=cache,target=/var/cache/apt \
--mount=type=cache,target=/var/lib/apt \
apt-get update && apt-get install -y curl
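One caveat with the apt cache mount: Debian-based official images ship a docker-clean hook that deletes downloaded packages after every install, which defeats the cache. The pattern from Docker's BuildKit documentation disables that hook first:

```dockerfile
FROM debian:bookworm-slim
# Disable the hook that auto-deletes the apt cache, then tell apt
# to keep downloaded packages so the cache mount stays warm
RUN rm -f /etc/apt/apt.conf.d/docker-clean \
    && echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
       > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends curl
```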
Reducing Layer Count and Size
Combine RUN Commands
# BAD: Each RUN creates a new layer
RUN apt-get update
RUN apt-get install -y curl wget git
RUN rm -rf /var/lib/apt/lists/*
# GOOD: Single layer with cleanup
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
wget \
git && \
rm -rf /var/lib/apt/lists/*
The --no-install-recommends flag prevents apt from installing suggested packages, which can save hundreds of megabytes.
Remove Build Dependencies in the Same Layer
# BAD: Build deps persist in a previous layer even if you remove them later
RUN apt-get install -y gcc python3-dev
RUN pip install -r requirements.txt
RUN apt-get remove -y gcc python3-dev # Still in the image!
# GOOD: Install, use, and remove in one layer
RUN apt-get update && \
apt-get install -y --no-install-recommends gcc python3-dev && \
pip install --no-cache-dir -r requirements.txt && \
apt-get purge -y gcc python3-dev && \
apt-get autoremove -y && \
rm -rf /var/lib/apt/lists/*
Image Size Comparison: Real-World Results
Here are actual size comparisons for a typical web application across different optimization levels:
| Optimization Level | Node.js | Python | Go | Java |
|---|---|---|---|---|
| Unoptimized (full base) | 1.1 GB | 1.0 GB | 800 MB | 800 MB |
| Slim base image | 400 MB | 350 MB | 400 MB | 400 MB |
| Alpine base | 200 MB | 180 MB | 250 MB | 200 MB |
| Multi-stage + Alpine | 130 MB | 120 MB | 15 MB | 200 MB |
| Multi-stage + distroless | 120 MB | 100 MB | 8 MB | 220 MB |
| Multi-stage + scratch | N/A | N/A | 6 MB | N/A |
Analyzing Image Size
Use these tools to understand what is consuming space in your images:
# Show image size
docker images my-app
# Show layer sizes
docker history my-app:latest
# Detailed layer analysis with dive
# https://github.com/wagoodman/dive
docker run --rm -it \
-v /var/run/docker.sock:/var/run/docker.sock \
wagoodman/dive:latest my-app:latest
# List all local images sorted by size
docker images --format "{{.Repository}}:{{.Tag}} {{.Size}}" | sort -k2 -h
The dive tool is particularly useful. It shows you exactly which files are in each layer and highlights wasted space from files that were added and then removed in a later layer.
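dive also runs non-interactively in CI, failing the build when an image regresses past thresholds you define in a `.dive-ci` file. The values below are examples, not recommendations:

```yaml
# .dive-ci — thresholds are illustrative; tune them per project
rules:
  # Fail if less than 95% of image bytes are "efficient"
  lowestEfficiency: 0.95
  # Fail if more than 20 MB is wasted across layers
  highestWastedBytes: 20MB
  # Fail if wasted bytes exceed 10% of the image
  highestUserWastedPercent: 0.10
```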
Security Benefits of Smaller Images
Smaller images are not just about performance — they are inherently more secure:
# Scan a full image vs a distroless image
$ trivy image node:20
# Total: 847 vulnerabilities (142 HIGH, 23 CRITICAL)
$ trivy image gcr.io/distroless/nodejs20-debian12
# Total: 12 vulnerabilities (2 HIGH, 0 CRITICAL)
# Fewer packages = fewer CVEs = less patching
Every package in your image is a potential entry point for attackers and a maintenance burden for patching. By removing unnecessary packages, you dramatically reduce both your vulnerability count and the frequency of required updates.
Optimization Checklist
- Create a comprehensive .dockerignore file
- Use multi-stage builds to separate build and runtime
- Choose the smallest viable base image (alpine, slim, distroless)
- Order Dockerfile instructions for maximum cache hits (dependencies before source code)
- Combine RUN commands and clean up in the same layer
- Use --no-install-recommends for apt and --no-cache-dir for pip
- Use BuildKit cache mounts for dependency downloads
- Strip debug symbols from compiled binaries
- Run as a non-root user (adds no size, improves security)
- Analyze with dive to find hidden waste
Conclusion
Docker image optimization is one of the highest-ROI activities in container infrastructure. A few hours of Dockerfile improvements can save minutes on every build, seconds on every deployment, and gigabytes across your registry and hosts. Start with the easy wins (.dockerignore and multi-stage builds), then progress to smaller base images and layer optimization as needed.
The goal is not to achieve the absolute smallest image possible. It is to remove everything that does not serve your application. A 150 MB image that builds reliably and runs correctly is better than a 10 MB image that breaks because of missing system libraries. Optimize pragmatically, test thoroughly, and let the size reduction be a side effect of good container hygiene rather than an end in itself.