Docker Compose in Production: 12 Best Practices You Should Follow

Docker Compose was originally designed for development environments. But in 2025, it's widely used in production, from single-server deployments to multi-service architectures managed by small teams. And for good reason: it's simple, declarative, and doesn't require the operational complexity of Kubernetes.

The problem is that most Docker Compose files are written for development and deployed to production without modification. A docker-compose.yml that works fine on your laptop will cause problems in production: containers eating all available memory, no health checks to detect failures, secrets stored in plain text, logs filling up disk space.

These 12 best practices will help you close the gap between development convenience and production reliability.

1. Always Pin Image Versions

This is the single most important practice. Never use :latest in production.

# BAD: unpredictable, different result each time
services:
  api:
    image: node:latest

# BETTER: pinned to a major version
services:
  api:
    image: node:20-alpine

# BEST: pinned to specific digest
services:
  api:
    image: node:20-alpine@sha256:abc123...

Using :latest means that docker compose pull can give you a different image on Tuesday than it did on Monday. A new release, even a new major version, might include breaking changes that take down your service at the worst possible time.

Pin to at least the major version (node:20-alpine), or a minor version (node:20.11-alpine) for tighter control. For maximum reproducibility, pin to the SHA256 digest, which is immutable and identifies exactly one image.
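To find the digest for an image, pull it and read the repo digest from docker inspect (a Docker CLI sketch; the output shown is illustrative):

```shell
docker pull node:20-alpine
docker inspect --format '{{index .RepoDigests 0}}' node:20-alpine
# e.g. node@sha256:...
```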

2. Set Resource Limits

Without resource limits, a single container can consume all available CPU and memory, starving other containers and potentially crashing the host.

services:
  api:
    image: myapp:1.2.3
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 512M
        reservations:
          cpus: '0.5'
          memory: 256M

The limits section sets hard caps: if the container tries to use more than 512 MB of RAM, the kernel's OOM killer terminates it. The reservations section is a soft guarantee rather than a cap; under contention, Docker uses it to decide how much memory and CPU to keep available for the container.

How to determine limits: Run your application under load and observe actual resource usage with docker stats. Set the limit to 1.5-2x the observed peak usage to allow for spikes.
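For a one-shot snapshot of current usage, docker stats can be run non-interactively (the format placeholders are standard docker stats fields):

```shell
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
```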

3. Configure Health Checks

Docker's restart policy will restart a crashed container, but it won't restart a container that's still running but no longer serving requests (hung process, deadlock, database connection pool exhaustion). Health checks solve this.

services:
  api:
    image: myapp:1.2.3
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: unless-stopped

Key parameters:

  • test — the command to run. Use a lightweight endpoint. curl -f returns non-zero on HTTP errors.
  • interval — how often to check. 30s is a good default.
  • timeout — how long to wait for the check to complete.
  • retries — how many consecutive failures before marking unhealthy.
  • start_period — grace period for slow-starting containers. Health check failures during this period don't count.

For containers that don't have curl installed (like Alpine-based images), use alternatives:

# Using wget (available on Alpine)
healthcheck:
  test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/health"]

# Using a custom script
healthcheck:
  test: ["CMD", "/app/healthcheck.sh"]

# For PostgreSQL
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]

# For Redis
healthcheck:
  test: ["CMD", "redis-cli", "ping"]

4. Use Restart Policies

Every production container needs a restart policy. Without one, a crashed container stays down until someone manually restarts it.

services:
  api:
    image: myapp:1.2.3
    restart: unless-stopped  # Recommended for most services

  worker:
    image: myapp-worker:1.2.3
    restart: on-failure       # For batch jobs that should complete

The options:

  • no — never restart (default). Only use for one-off tasks.
  • always — restart no matter what, including after Docker daemon restart.
  • unless-stopped — like always, but respects manual stops. Best for most production services.
  • on-failure — only restart on non-zero exit codes. Good for workers and batch jobs.
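For batch jobs, the Compose specification also allows an optional retry cap on on-failure, so a crash-looping job eventually gives up:

```yaml
services:
  worker:
    image: myapp-worker:1.2.3
    restart: on-failure:5  # give up after 5 failed restart attempts
```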

5. Never Store Secrets in Environment Variables in the Compose File

This is one of the most common mistakes in production Docker Compose setups:

# BAD: secrets in plain text, committed to git
services:
  api:
    image: myapp:1.2.3
    environment:
      - DATABASE_URL=postgres://admin:supersecret@db:5432/myapp
      - API_KEY=sk-1234567890abcdef

Instead, use one of these approaches:

Option 1: Environment file (not committed to git)

services:
  api:
    image: myapp:1.2.3
    env_file:
      - .env  # Add .env to .gitignore!

Option 2: Docker secrets (more secure)

services:
  api:
    image: myapp:1.2.3
    secrets:
      - db_password
      - api_key

secrets:
  db_password:
    file: ./secrets/db_password.txt
  api_key:
    file: ./secrets/api_key.txt

Docker secrets mount the secret as a file at /run/secrets/<secret_name> inside the container. Your application reads the file instead of an environment variable. This is more secure because secrets don't appear in docker inspect output or process environment listings.
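As a sketch of the application side, a small entrypoint helper can copy file-based secrets into environment variables before handing off to the real process (the load_secret name and the SECRETS_DIR override are illustrative assumptions):

```shell
# load_secret VAR_NAME: export VAR_NAME from the file named after it
# (lowercased) in the secrets directory. SECRETS_DIR defaults to Docker's
# mount point and can be overridden for local testing.
load_secret() {
  name="$1"
  file="${SECRETS_DIR:-/run/secrets}/$(printf '%s' "$name" | tr 'A-Z' 'a-z')"
  if [ -f "$file" ]; then
    export "$name=$(cat "$file")"
  fi
}

# In an entrypoint script you might then run:
#   load_secret DB_PASSWORD
#   exec node server.js
```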

Option 3: External secret manager

For production at scale, use HashiCorp Vault, AWS Secrets Manager, or similar tools. Your container pulls secrets from the vault at startup.

6. Configure Logging Properly

Docker's default logging driver stores container logs as JSON files on disk with no size limit. On a busy server, logs can fill up the disk in hours.

services:
  api:
    image: myapp:1.2.3
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

This limits each container's logs to 3 files of 10 MB each (30 MB total). When the limit is reached, the oldest file is rotated out.

For centralized logging, use a different driver:

# Send logs to a syslog server
logging:
  driver: syslog
  options:
    syslog-address: "tcp://logserver:514"
    tag: "myapp-api"

# Or use fluentd
logging:
  driver: fluentd
  options:
    fluentd-address: "localhost:24224"
    tag: "docker.{{.Name}}"

You can also set the default logging driver globally in /etc/docker/daemon.json:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

7. Use Named Volumes for Persistent Data

Bind mounts (./data:/data) are fine for development. In production, use named volumes for better portability and management:

services:
  db:
    image: postgres:16-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:
    driver: local

Named volumes:

  • Are managed by Docker (visible in docker volume ls)
  • Can be backed up with docker volume commands
  • Don't depend on the host's directory structure
  • Can use different storage drivers (local, NFS, cloud storage)
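For example, a named volume can be backed up by mounting it read-only into a throwaway container (a sketch; postgres_data matches the example above, the archive name is arbitrary):

```shell
docker run --rm \
  -v postgres_data:/data:ro \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/postgres_data.tar.gz -C /data .
```

For a running database, prefer an application-level backup such as pg_dump; copying live data files can produce an inconsistent snapshot.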

If you must use bind mounts (e.g., for configuration files), use the long syntax for clarity:

volumes:
  - type: bind
    source: ./nginx.conf
    target: /etc/nginx/nginx.conf
    read_only: true

8. Define Explicit Networks

By default, Docker Compose creates a single network for all services. In production, isolate services that don't need to communicate:

services:
  api:
    image: myapp:1.2.3
    networks:
      - frontend
      - backend

  db:
    image: postgres:16-alpine
    networks:
      - backend  # Not on frontend network, can't be reached by proxy

  proxy:
    image: caddy:2-alpine
    networks:
      - frontend  # Can reach api but not db directly
    ports:
      - "443:443"

networks:
  frontend:
  backend:

This way, the reverse proxy can reach the API, but it cannot directly connect to the database. The API can reach both the proxy and the database. Defense in depth.
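To tighten this further, the backend network can be declared internal, which blocks all traffic between that network and the outside world:

```yaml
networks:
  frontend:
  backend:
    internal: true  # no connectivity beyond this network
```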

9. Drop Unnecessary Linux Capabilities

By default, Docker containers run with a set of Linux capabilities that they probably don't need. Drop them:

services:
  api:
    image: myapp:1.2.3
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE  # Only if binding to ports < 1024
    security_opt:
      - no-new-privileges:true

The cap_drop: ALL removes all capabilities, then cap_add adds back only what's needed. no-new-privileges prevents the container from gaining additional privileges through setuid binaries.

For even stricter isolation:

services:
  api:
    image: myapp:1.2.3
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    read_only: true          # Read-only root filesystem
    tmpfs:
      - /tmp                 # Writable temp directory
      - /app/cache           # Writable cache directory

10. Use depends_on with Conditions

Basic depends_on only waits for the container to start, not for the service inside to be ready. Use conditions with health checks for proper startup ordering:

services:
  api:
    image: myapp:1.2.3
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy

  db:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5

With condition: service_healthy, the API container won't start until both the database and Redis have passed their health checks. This eliminates race conditions during startup.
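depends_on supports a third condition, service_completed_successfully, which is handy for one-off setup tasks such as migrations (the migrate service name here is an assumption):

```yaml
services:
  api:
    image: myapp:1.2.3
    depends_on:
      migrate:
        condition: service_completed_successfully  # waits for exit code 0

  migrate:
    image: myapp-migrate:1.2.3
    restart: "no"  # one-off task; must not auto-restart
```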

11. Use Profiles for Environment-Specific Services

Some services should only run in certain environments. Docker Compose profiles handle this cleanly:

services:
  api:
    image: myapp:1.2.3
    # No profile = always runs

  db:
    image: postgres:16-alpine
    # No profile = always runs

  debug-tools:
    image: nicolaka/netshoot
    profiles:
      - debug
    # Only runs when: docker compose --profile debug up

  seed:
    image: myapp-seed:1.2.3
    profiles:
      - setup
    # Only runs when: docker compose --profile setup run seed

These profiles are then selected on the command line:

# Normal production startup
docker compose up -d

# Start with debug tools
docker compose --profile debug up -d

# Run database seeding
docker compose --profile setup run seed

12. Implement a Zero-Downtime Deployment Strategy

The default docker compose up -d stops the old container and starts a new one, causing a brief outage. For zero-downtime deployments, use a blue-green strategy:

#!/bin/bash
# deploy.sh - Zero-downtime deployment (requires a healthcheck on the service)
set -euo pipefail

SERVICE="api"

# Pull the new image (the tag comes from the compose file)
docker compose pull "$SERVICE"

# Remember the currently running (old) container before scaling up
OLD_CONTAINER=$(docker compose ps -q "$SERVICE")

# Start a new instance alongside the old one
docker compose up -d --no-deps --scale "$SERVICE=2" --no-recreate "$SERVICE"

# Wait for the new instance to pass its health check instead of sleeping blindly
NEW_CONTAINER=$(docker compose ps -q "$SERVICE" | grep -v "$OLD_CONTAINER")
echo "Waiting for new instance to be healthy..."
until [ "$(docker inspect -f '{{.State.Health.Status}}' "$NEW_CONTAINER")" = "healthy" ]; do
  sleep 2
done

# Stop and remove the old instance explicitly. Scaling back down with --scale
# would remove the newest container, i.e. the one we just started.
docker stop "$OLD_CONTAINER"
docker rm "$OLD_CONTAINER"

echo "Deployment complete."

This works when you have a reverse proxy (Traefik, Nginx, Caddy) in front of your service that can route to multiple backends. The proxy detects the new healthy container and removes the old one from the pool.

Alternatively, use the newer docker compose up --wait flag, which blocks until health checks pass before returning. Note that this still recreates containers in place, so it catches broken deployments quickly but does not by itself avoid the brief outage:

# Pull new images and recreate with health check waiting
docker compose pull
docker compose up -d --wait

Putting It All Together

Here's a complete production-grade Docker Compose file incorporating all 12 practices:

services:
  proxy:
    image: caddy:2.7-alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - type: bind
        source: ./Caddyfile
        target: /etc/caddy/Caddyfile
        read_only: true
      - caddy_data:/data
      - caddy_config:/config
    networks:
      - frontend
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 128M
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:80"]
      interval: 30s
      timeout: 10s
      retries: 3

  api:
    image: myapp:1.2.3
    restart: unless-stopped
    env_file:
      - .env
    secrets:
      - db_password
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - frontend
      - backend
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    read_only: true
    tmpfs:
      - /tmp
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 512M
        reservations:
          cpus: '0.5'
          memory: 256M
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "5"
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s

  db:
    image: postgres:16.2-alpine
    restart: unless-stopped
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: myapp
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - backend
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - DAC_OVERRIDE
      - FOWNER
      - SETGID
      - SETUID
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U myapp"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7.2-alpine
    restart: unless-stopped
    command: redis-server --maxmemory 128mb --maxmemory-policy allkeys-lru
    volumes:
      - redis_data:/data
    networks:
      - backend
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 256M
    logging:
      driver: json-file
      options:
        max-size: "5m"
        max-file: "3"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  postgres_data:
  redis_data:
  caddy_data:
  caddy_config:

networks:
  frontend:
  backend:

secrets:
  db_password:
    file: ./secrets/db_password.txt

Managing Compose in Production

Writing a good compose file is half the battle. You also need a way to manage it in production. A Docker management platform like usulnet lets you deploy, monitor, and update Docker Compose stacks through a web UI, making it easier for your team to manage production services without SSH-ing into servers.

Whatever approach you use, the 12 practices above will give your Docker Compose deployments a production-grade foundation. Start by auditing your existing compose files against this list and addressing the gaps one at a time.

Quick audit: Run docker compose config in your project directory to see the fully resolved compose file. Check for missing health checks, resource limits, and restart policies.
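As a small first step for that audit, a shell helper along these lines (the check_pinned name is illustrative) flags images still using the :latest tag:

```shell
# check_pinned FILE: succeed only if no image: line in FILE uses the
# :latest tag; print the offending lines otherwise.
check_pinned() {
  ! grep -En 'image:[[:space:]]*[^[:space:]]+:latest[[:space:]]*$' "$1"
}

# Usage:
#   check_pinned docker-compose.yml || echo "unpinned images found"
```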