Dockerfile Best Practices: Writing Production-Ready Dockerfiles

March 5, 2025 · 20 min read

A Dockerfile is often the first thing written when containerizing an application and the last thing optimized. The result is production images running as root, bloated with build tools, missing health checks, and leaking secrets in layer history. A well-written Dockerfile is the foundation of a secure, efficient, and maintainable container deployment.

This guide covers every essential best practice, from base image selection to signal handling, with real examples you can adapt for your own projects.

1. Choose the Right Base Image

Your base image determines your image size, attack surface, and available system libraries. Choose deliberately:

Base Image	Size	Best For	Trade-offs
`scratch`	0 MB	Static Go binaries	No shell, no libc, no debugging tools
`alpine:3.19`	~7 MB	Most applications	Uses musl libc (rare compatibility issues)
`debian:bookworm-slim`	~75 MB	Apps needing glibc	Larger but maximum compatibility
`ubuntu:22.04`	~77 MB	Development, ML workloads	Familiar but heavier
`distroless`	~20 MB	Security-focused deployments	No shell, minimal attack surface

# Pin to specific versions - never use :latest in production
FROM node:20.11.1-alpine3.19

# Use digest for maximum reproducibility
FROM node@sha256:abcdef123456...

Rule: Always pin your base image to a specific version. FROM node:latest means your build can produce different results tomorrow than it does today. In production, reproducibility is not optional.

2. Multi-Stage Builds

Multi-stage builds are the single most impactful optimization for production images. They separate the build environment from the runtime environment:

# syntax=docker/dockerfile:1

# === Build Stage ===
FROM golang:1.22-alpine AS builder

RUN apk add --no-cache git ca-certificates

WORKDIR /src
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod go mod download

COPY . .
RUN --mount=type=cache,target=/root/.cache/go-build \
    CGO_ENABLED=0 GOOS=linux go build \
    -ldflags="-s -w" \
    -o /app ./cmd/server

# === Runtime Stage ===
FROM alpine:3.19

# Install runtime dependencies only
RUN apk add --no-cache ca-certificates tzdata

# Create non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

# Copy only the binary from the build stage
COPY --from=builder /app /usr/local/bin/app

USER appuser
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD ["/usr/local/bin/app", "healthcheck"]
ENTRYPOINT ["app"]

The build stage contains the Go compiler, source code, and build tools, which typically add up to several hundred megabytes. The runtime stage contains only the compiled binary and a few runtime packages, usually a few tens of megabytes depending on the binary. Only files explicitly copied with COPY --from reach the final image.
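For a fully static binary, the runtime stage can go one step further and start from scratch, the empty base image listed in the table in section 1. The following is a minimal sketch rather than a drop-in replacement. It assumes the same ./cmd/server layout as above and that the application needs nothing from the filesystem except CA certificates. The numeric UID 65534 is an illustrative choice, needed because scratch has no /etc/passwd to resolve a user name.

```dockerfile
# syntax=docker/dockerfile:1

FROM golang:1.22-alpine AS builder
RUN apk add --no-cache ca-certificates
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO_ENABLED=0 produces a static binary with no libc dependency
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /app ./cmd/server

FROM scratch
# scratch has no CA bundle, so copy one in for outbound TLS
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /app /app
# No /etc/passwd exists here, so use a numeric UID:GID (assumed value)
USER 65534:65534
EXPOSE 8080
# Absolute path avoids relying on PATH lookup
ENTRYPOINT ["/app"]
```

The trade-off is the one the table names: with no shell in the image, debugging has to happen from outside the container.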

3. Layer Ordering for Cache Efficiency

Docker caches each layer. When a layer changes, all subsequent layers are rebuilt. Order your instructions from least to most frequently changing:

# Good: Dependencies change less often than source code
FROM node:20-alpine

WORKDIR /app

# 1. Copy dependency manifests (changes rarely)
COPY package.json package-lock.json ./

# 2. Install dependencies (cached unless manifests change)
#    devDependencies are included because the build step below usually needs them
RUN npm ci

# 3. Copy source code (changes often)
COPY . .

# 4. Build (re-runs when source changes)
RUN npm run build

CMD ["node", "dist/server.js"]

# Bad: Source code changes bust the dependency cache
FROM node:20-alpine
WORKDIR /app
# Any source change invalidates this layer and everything below it
COPY . .
# Reinstalls ALL dependencies on every source change
RUN npm ci
RUN npm run build
CMD ["node", "dist/server.js"]

4. The .dockerignore File

A missing or inadequate .dockerignore sends unnecessary files to the build daemon, slowing builds and potentially leaking sensitive data:

# .dockerignore
.git
.gitignore
.dockerignore
Dockerfile*
docker-compose*.yml
README.md
LICENSE
docs/

# Dependencies (installed during build)
node_modules
vendor
__pycache__
*.pyc

# Build artifacts
dist
build
*.tar.gz

# IDE and OS files
.vscode
.idea
*.swp
.DS_Store

# Environment and secrets
.env
.env.*
*.pem
*.key
credentials.*

# Test and CI
coverage
.pytest_cache
.nyc_output
tests/

5. Run as Non-Root User

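Containers run as root by default. This is a security risk: unless user namespace remapping is enabled, root inside the container is the same UID 0 as root on the host, so an attacker who escapes the container has root access to the host: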

# Create a dedicated user and group
FROM node:20-alpine

# Create user early in the Dockerfile
RUN addgroup -S nodejs && adduser -S nodejs -G nodejs

WORKDIR /app
COPY --chown=nodejs:nodejs package*.json ./
RUN npm ci --omit=dev
COPY --chown=nodejs:nodejs . .

# Switch to the non-root user once the root-only steps are done
USER nodejs

EXPOSE 3000
CMD ["node", "server.js"]

# For distroless images (already have a nonroot user)
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /app /usr/local/bin/app
USER nonroot:nonroot
ENTRYPOINT ["app"]

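Warning: USER applies to every later RUN instruction and to the container's main process. Place it after RUN commands that need root (like apt-get install); if you set USER too early, package installation will fail. EXPOSE, CMD, and ENTRYPOINT can appear before or after it, because the last USER in the file sets the runtime user.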
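As a sketch of that ordering on a Debian-based image, where the tooling differs from the Alpine example above (the app user and group names are illustrative):

```dockerfile
FROM debian:bookworm-slim

# Root-only steps first: package installation and user creation
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/* \
    && groupadd --system app \
    && useradd --system --gid app --no-create-home app

# Every RUN after this line, and the container's main process, runs as app
USER app

# Prints the effective user so the switch can be verified
CMD ["id"]
```

Running the resulting image should report the app user rather than root.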

6. HEALTHCHECK Instruction

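Health checks let Docker tell whether a container is actually working, not just running. Docker on its own only marks the container as unhealthy; orchestrators like Swarm act on that status and replace unhealthy containers automatically: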

# HTTP health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

# TCP port check (no curl needed)
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD nc -z localhost 8080 || exit 1

# Custom health check binary (recommended for production)
HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \
  CMD ["/usr/local/bin/app", "healthcheck"]

# PostgreSQL
HEALTHCHECK --interval=10s --timeout=5s --retries=5 \
  CMD pg_isready -U postgres || exit 1

The --start-period gives the application time to start before health checks begin counting against it. This is critical for applications with slow startup (Java, .NET).
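The curl and nc variants above only work if those tools exist in the image. For a Node.js image, one option is to reuse the runtime itself. A minimal sketch, assuming Node 18 or later (which ships a global fetch) and the same /health endpoint on port 8080 used above:

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY . .
EXPOSE 8080

# Exit code 0 marks the container healthy, 1 marks it unhealthy.
# fetch rejects on connection errors, which the catch maps to exit code 1.
# 127.0.0.1 avoids depending on how localhost resolves inside the container.
HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
  CMD ["node", "-e", "fetch('http://127.0.0.1:8080/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"]

CMD ["node", "server.js"]
```

This keeps curl out of the runtime image, at the cost of starting a second Node process for every check.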

7. COPY vs ADD

Use COPY by default. ADD has two implicit behaviors that cause confusion:

ADD auto-extracts local tar archives (unexpected side effect)
ADD can fetch URLs (use curl or wget instead for clarity)

# Use COPY for local files
COPY ./config/nginx.conf /etc/nginx/nginx.conf
COPY . /app

# Only use ADD for intentional tar extraction
ADD rootfs.tar.gz /

# For downloading files, use RUN with curl (transparent and controllable)
RUN curl -fsSL https://example.com/tool.tar.gz | tar xz -C /usr/local/bin/
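Piping a download straight into tar gives no integrity check. The following is a hedged sketch of one way to add that, assuming the publisher provides a SHA-256 checksum. TOOL_SHA256 is a hypothetical build argument with no default, and the URL is the placeholder from the example above.

```dockerfile
FROM alpine:3.19

RUN apk add --no-cache curl

# Hypothetical build argument: pass the published checksum with
#   docker build --build-arg TOOL_SHA256=<checksum> .
ARG TOOL_SHA256

# Download to a file, verify it, extract, then delete the archive in the same layer
RUN curl -fsSL -o /tmp/tool.tar.gz https://example.com/tool.tar.gz \
    && echo "${TOOL_SHA256}  /tmp/tool.tar.gz" | sha256sum -c - \
    && tar -xzf /tmp/tool.tar.gz -C /usr/local/bin/ \
    && rm /tmp/tool.tar.gz
```

If the argument is missing or the checksum does not match, sha256sum exits non-zero and the build stops before anything is extracted.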

8. ARG vs ENV

Both set variables, but they have fundamentally different scopes and persistence:

Feature	ARG	ENV
Available during build	Yes	Yes
Available at runtime	No	Yes
Persists in image	No	Yes
Visible in docker history	Yes	Yes
Can be overridden at build	Yes (`--build-arg`)	No (only at runtime)

# ARG for build-time configuration
ARG GO_VERSION=1.22
FROM golang:${GO_VERSION}-alpine

ARG APP_VERSION=dev
RUN go build -ldflags "-X main.version=${APP_VERSION}" -o /app

# ENV for runtime configuration
ENV PORT=8080
ENV LOG_LEVEL=info
EXPOSE ${PORT}
CMD ["/app"]

# Build with custom ARG
docker build --build-arg APP_VERSION=2.1.0 -t myapp:2.1.0 .

Warning: Never use ARG or ENV for secrets. Both are visible in docker history. Use BuildKit's --mount=type=secret instead.
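A minimal sketch of that approach, assuming a private npm registry whose token lives in an .npmrc file on the build host. The secret id npmrc is an arbitrary name chosen for this example.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./

# The secret file exists only while this RUN step executes.
# It is not stored in a layer and does not appear in docker history.
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm ci

# Build with:
#   docker build --secret id=npmrc,src=$HOME/.npmrc -t myapp .
```

The same pattern applies to other credentials that are only needed during the build, such as package index tokens.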

9. ENTRYPOINT vs CMD

Understanding the interaction between ENTRYPOINT and CMD is essential for predictable container behavior:

# CMD alone: Easy to override, used for the default command
CMD ["node", "server.js"]
# docker run myapp               -> node server.js
# docker run myapp node test.js  -> node test.js (CMD replaced)

# ENTRYPOINT alone: Hard to override, defines the container's purpose
ENTRYPOINT ["node", "server.js"]
# docker run myapp               -> node server.js
# docker run myapp --port 3000   -> node server.js --port 3000 (appended!)

# ENTRYPOINT + CMD: Best pattern for production
ENTRYPOINT ["node"]
CMD ["server.js"]
# docker run myapp               -> node server.js
# docker run myapp test.js       -> node test.js (CMD replaced)

# Shell form vs exec form
# Always use exec form (JSON array) in production.
# Dockerfiles do not support trailing comments, so notes go on their own lines.
# Exec form: node is PID 1
CMD ["node", "server.js"]
# Shell form: runs /bin/sh -c "node server.js"
# sh is PID 1, so node may never receive signals
CMD node server.js

10. Signal Handling and Graceful Shutdown

When Docker stops a container, it sends SIGTERM to PID 1. If PID 1 does not handle SIGTERM, Docker waits the stop timeout (default 10s) then sends SIGKILL:

# Problem: Shell form CMD means sh is PID 1, not your app
CMD node server.js
# sh doesn't forward SIGTERM to node
# Container always takes 10s to stop (waits for SIGKILL)

# Solution 1: Use exec form
CMD ["node", "server.js"]
# node is PID 1 and receives SIGTERM directly, but PID 1 gets no default
# signal handling, so the app must register its own handler (see below)

# Solution 2: Use tini for proper init
RUN apk add --no-cache tini
ENTRYPOINT ["tini", "--"]
CMD ["node", "server.js"]

# Solution 3: Use Docker's built-in init
# docker run --init myapp

# Your application should handle SIGTERM:
# Node.js example:
# process.on('SIGTERM', () => {
#   console.log('Received SIGTERM, shutting down gracefully');
#   server.close(() => process.exit(0));
# });

11. Minimize RUN Layers

Each RUN instruction creates a new layer. Combine related operations and clean up in the same layer:

# Bad: Multiple layers, cache files persist in earlier layers
RUN apt-get update
RUN apt-get install -y curl git
RUN apt-get clean

# Good: Single layer, cleanup in the same layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
      curl \
      git \
    && rm -rf /var/lib/apt/lists/*

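# With BuildKit heredoc (cleaner; needs the "# syntax=docker/dockerfile:1" line):
# set -e makes the build fail as soon as any command in the script fails
RUN <<EOF
set -e
apt-get update
apt-get install -y --no-install-recommends curl git
rm -rf /var/lib/apt/lists/*
EOF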

12. Security Scanning

Scan your images for known vulnerabilities before deploying to production:

# Scan with Trivy
trivy image myapp:latest

# Scan with Docker Scout
docker scout cves myapp:latest

# Scan with Grype
grype myapp:latest

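# Integrate into Dockerfile (fail build on critical vulnerabilities)
# Pin the scanner image to a specific version in real builds.
# BuildKit skips stages the final image does not depend on, so build this
# stage explicitly: docker build --target scanner .
FROM aquasec/trivy:latest AS scanner
COPY --from=builder /app /scan/app
RUN trivy filesystem --exit-code 1 --severity CRITICAL /scan/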

# In CI/CD pipeline
docker build -t myapp:latest .
trivy image --exit-code 1 --severity CRITICAL,HIGH myapp:latest

Tip: Use platforms like usulnet that integrate security scanning into your Docker workflow. Automated scanning on every image push ensures vulnerabilities are caught before they reach production.

13. Complete Production Dockerfile Template

Here is a production-oriented Dockerfile for a Node.js application that combines the practices above:

# syntax=docker/dockerfile:1

# === Build Arguments ===
ARG NODE_VERSION=20.11.1
ARG ALPINE_VERSION=3.19

# === Build Stage ===
FROM node:${NODE_VERSION}-alpine${ALPINE_VERSION} AS builder

WORKDIR /app

# Install build dependencies
RUN apk add --no-cache python3 make g++

# Install app dependencies (cached unless package files change)
COPY package.json package-lock.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci

# Copy source and build
COPY . .
RUN npm run build && npm prune --omit=dev

# === Runtime Stage ===
FROM node:${NODE_VERSION}-alpine${ALPINE_VERSION} AS runtime

# Security: Install only what's needed at runtime
# (--no-cache already avoids leaving an apk index behind)
RUN apk add --no-cache tini curl

# Security: Create non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

WORKDIR /app

# Copy only production artifacts
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appgroup /app/package.json ./

# Security: Switch to non-root user
USER appuser

# Runtime configuration
ENV NODE_ENV=production
ENV PORT=8080
EXPOSE ${PORT}

# Health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
  CMD curl -f http://localhost:${PORT}/health || exit 1

# Use tini for proper PID 1 signal handling
ENTRYPOINT ["tini", "--"]
CMD ["node", "dist/server.js"]

# Metadata
LABEL org.opencontainers.image.source="https://github.com/myorg/myapp" \
      org.opencontainers.image.description="My Application" \
      org.opencontainers.image.version="2.1.0"

Summary Checklist

Pin base image versions (never use :latest)
Use multi-stage builds to separate build and runtime
Order layers from least to most frequently changing
Create and maintain a .dockerignore file
Run as non-root with the USER instruction
Add HEALTHCHECK for automated health monitoring
Use COPY instead of ADD
Use exec form ["cmd", "arg"] for ENTRYPOINT and CMD
Handle SIGTERM for graceful shutdown (or use tini)
Use BuildKit secrets for sensitive build data
Combine and clean up in single RUN layers
Scan images for vulnerabilities before production
Add OCI labels for image metadata
