Docker Compose in Production: 12 Best Practices You Should Follow
Docker Compose was originally designed for development environments. But in 2025, it's widely used in production, from single-server deployments to multi-service architectures managed by small teams. And for good reason: it's simple, declarative, and doesn't require the operational complexity of Kubernetes.
The problem is that most Docker Compose files are written for development and deployed to production without modification. A docker-compose.yml that works fine on your laptop will cause problems in production: containers eating all available memory, no health checks to detect failures, secrets stored in plain text, logs filling up disk space.
These 12 best practices will help you close the gap between development convenience and production reliability.
1. Always Pin Image Versions
This is the single most important practice. Never use :latest in production.
```yaml
# BAD: unpredictable, different result each time
services:
  api:
    image: node:latest

# BETTER: pinned to minor version
services:
  api:
    image: node:20-alpine

# BEST: pinned to specific digest
services:
  api:
    image: node:20-alpine@sha256:abc123...
```
Using :latest means that docker compose pull can give you a different image on Tuesday than it did on Monday. A new major release might include breaking changes that take down your service at the worst possible time.
Pin to at least a minor version (node:20-alpine). For maximum reproducibility, pin to the SHA256 digest, which is immutable.
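To find the digest for an image, you can ask Docker directly once the image has been pulled. A quick sketch (node:20-alpine here is just the example tag from above):

```shell
# Print the repo digest recorded for a locally pulled image
docker pull node:20-alpine
docker inspect --format '{{index .RepoDigests 0}}' node:20-alpine
```

The output is the `name@sha256:...` form you can paste into your compose file.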
2. Set Resource Limits
Without resource limits, a single container can consume all available CPU and memory, starving other containers and potentially crashing the host.
```yaml
services:
  api:
    image: myapp:1.2.3
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 512M
        reservations:
          cpus: '0.5'
          memory: 256M
```
The limits section sets hard caps. If a container tries to use more than 512 MB of RAM, Docker will kill it (OOM). The reservations section tells Docker's scheduler how many resources to guarantee for the container.
To choose sensible values, observe your containers' actual usage under load with docker stats. Set the limit to 1.5-2x the observed peak usage to allow for spikes.
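A one-off snapshot of current usage (rather than the default live-updating view) might look like this; the format string is just one possible layout:

```shell
# Take a single snapshot of per-container CPU and memory usage
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
```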
3. Configure Health Checks
Docker's restart policy will restart a crashed container, but it won't restart a container that's still running but no longer serving requests (hung process, deadlock, database connection pool exhaustion). Health checks solve this.
```yaml
services:
  api:
    image: myapp:1.2.3
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: unless-stopped
```
Key parameters:
- test — the command to run. Use a lightweight endpoint; curl -f returns non-zero on HTTP errors.
- interval — how often to check. 30s is a good default.
- timeout — how long to wait for the check to complete.
- retries — how many consecutive failures before marking the container unhealthy.
- start_period — grace period for slow-starting containers. Health check failures during this period don't count.
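You can check what Docker currently thinks of a running container's health (the container name here is illustrative):

```shell
# Show the current health status: starting, healthy, or unhealthy
docker inspect --format '{{.State.Health.Status}}' myapp-api-1
```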
For containers that don't have curl installed (like Alpine-based images), use alternatives:
```yaml
# Using wget (available on Alpine)
healthcheck:
  test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/health"]

# Using a custom script
healthcheck:
  test: ["CMD", "/app/healthcheck.sh"]

# For PostgreSQL
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]

# For Redis
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
```
4. Use Restart Policies
Every production container needs a restart policy. Without one, a crashed container stays down until someone manually restarts it.
```yaml
services:
  api:
    image: myapp:1.2.3
    restart: unless-stopped  # Recommended for most services

  worker:
    image: myapp-worker:1.2.3
    restart: on-failure      # For batch jobs that should complete
```
The options:
- no — never restart (default). Only use for one-off tasks.
- always — restart no matter what, including after a Docker daemon restart.
- unless-stopped — like always, but respects manual stops. Best for most production services.
- on-failure — only restart on non-zero exit codes. Good for workers and batch jobs.
5. Never Store Secrets in Environment Variables in the Compose File
This is one of the most common mistakes in production Docker Compose setups:
```yaml
# BAD: secrets in plain text, committed to git
services:
  api:
    image: myapp:1.2.3
    environment:
      - DATABASE_URL=postgres://admin:supersecret@db:5432/myapp
      - API_KEY=sk-1234567890abcdef
```
Instead, use one of these approaches:
Option 1: Environment file (not committed to git)
```yaml
services:
  api:
    image: myapp:1.2.3
    env_file:
      - .env  # Add .env to .gitignore!
```
Option 2: Docker secrets (more secure)
```yaml
services:
  api:
    image: myapp:1.2.3
    secrets:
      - db_password
      - api_key

secrets:
  db_password:
    file: ./secrets/db_password.txt
  api_key:
    file: ./secrets/api_key.txt
```
Docker secrets mount the secret as a file at /run/secrets/<secret_name> inside the container. Your application reads the file instead of an environment variable. This is more secure because secrets don't appear in docker inspect output or process environment listings.
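On the application side, a small entrypoint script can load the mounted secret into the environment before starting the app. This is a hypothetical sketch: the db_password name matches the example above, and the SECRETS_DIR override exists only to make the snippet easy to exercise outside a container.

```shell
# Load a file-based Docker secret into an environment variable (sketch)
SECRETS_DIR="${SECRETS_DIR:-/run/secrets}"  # Docker mounts secrets here
if [ -f "$SECRETS_DIR/db_password" ]; then
  DB_PASSWORD="$(cat "$SECRETS_DIR/db_password")"
  export DB_PASSWORD
fi
```

Having the app read the file directly (as postgres does with POSTGRES_PASSWORD_FILE) is even better, since the secret never enters the environment at all.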
Option 3: External secret manager
For production at scale, use HashiCorp Vault, AWS Secrets Manager, or similar tools. Your container pulls secrets from the vault at startup.
6. Configure Logging Properly
Docker's default logging driver stores container logs as JSON files on disk with no size limit. On a busy server, logs can fill up the disk in hours.
```yaml
services:
  api:
    image: myapp:1.2.3
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
```
This limits each container's logs to 3 files of 10 MB each (30 MB total). When the limit is reached, the oldest file is rotated out.
For centralized logging, use a different driver:
```yaml
# Send logs to a syslog server
logging:
  driver: syslog
  options:
    syslog-address: "tcp://logserver:514"
    tag: "myapp-api"

# Or use fluentd
logging:
  driver: fluentd
  options:
    fluentd-address: "localhost:24224"
    tag: "docker.{{.Name}}"
```
You can also set the default logging driver globally in /etc/docker/daemon.json:
```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
```
7. Use Named Volumes for Persistent Data
Bind mounts (./data:/data) are fine for development. In production, use named volumes for better portability and management:
```yaml
services:
  db:
    image: postgres:16-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:
    driver: local
```
Named volumes:
- Are managed by Docker (visible in docker volume ls)
- Can be backed up and restored with standard Docker commands
- Don't depend on the host's directory structure
- Can use different storage drivers (local, NFS, cloud storage)
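One common backup pattern is to mount the named volume into a throwaway container and tar its contents to the host. A sketch (paths and the archive name are illustrative; note that Compose prefixes volume names with the project name, e.g. myproject_postgres_data):

```shell
# Archive the contents of the postgres_data volume to ./backup/pg-data.tar.gz
docker run --rm \
  -v postgres_data:/data:ro \
  -v "$(pwd)/backup:/backup" \
  alpine tar czf /backup/pg-data.tar.gz -C /data .
```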
If you must use bind mounts (e.g., for configuration files), use the long syntax for clarity:
```yaml
volumes:
  - type: bind
    source: ./nginx.conf
    target: /etc/nginx/nginx.conf
    read_only: true
```
8. Define Explicit Networks
By default, Docker Compose creates a single network for all services. In production, isolate services that don't need to communicate:
```yaml
services:
  api:
    image: myapp:1.2.3
    networks:
      - frontend
      - backend

  db:
    image: postgres:16-alpine
    networks:
      - backend   # Not on the frontend network, can't be reached by the proxy

  proxy:
    image: caddy:2-alpine
    networks:
      - frontend  # Can reach api but not db directly
    ports:
      - "443:443"

networks:
  frontend:
  backend:
```
This way, the reverse proxy can reach the API, but it cannot directly connect to the database. The API can reach both the proxy and the database. Defense in depth.
9. Drop Unnecessary Linux Capabilities
By default, Docker containers run with a set of Linux capabilities that they probably don't need. Drop them:
```yaml
services:
  api:
    image: myapp:1.2.3
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE  # Only if binding to ports < 1024
    security_opt:
      - no-new-privileges:true
```
The cap_drop: ALL removes all capabilities, then cap_add adds back only what's needed. no-new-privileges prevents the container from gaining additional privileges through setuid binaries.
For even stricter isolation:
```yaml
services:
  api:
    image: myapp:1.2.3
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    read_only: true  # Read-only root filesystem
    tmpfs:
      - /tmp         # Writable temp directory
      - /app/cache   # Writable cache directory
```
10. Use depends_on with Conditions
Basic depends_on only waits for the container to start, not for the service inside to be ready. Use conditions with health checks for proper startup ordering:
```yaml
services:
  api:
    image: myapp:1.2.3
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy

  db:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5
```
With condition: service_healthy, the API container won't start until both the database and Redis have passed their health checks. This eliminates race conditions during startup.
11. Use Profiles for Environment-Specific Services
Some services should only run in certain environments. Docker Compose profiles handle this cleanly:
```yaml
services:
  api:
    image: myapp:1.2.3
    # No profile = always runs

  db:
    image: postgres:16-alpine
    # No profile = always runs

  debug-tools:
    image: nicolaka/netshoot
    profiles:
      - debug
    # Only runs when: docker compose --profile debug up

  seed:
    image: myapp-seed:1.2.3
    profiles:
      - setup
    # Only runs when: docker compose --profile setup run seed
```
```bash
# Normal production startup
docker compose up -d

# Start with debug tools
docker compose --profile debug up -d

# Run database seeding
docker compose --profile setup run seed
```
12. Implement a Zero-Downtime Deployment Strategy
The default docker compose up -d stops the old container and starts a new one, causing a brief outage. For zero-downtime deployments, use a blue-green strategy:
```bash
#!/bin/bash
# deploy.sh - Zero-downtime deployment

SERVICE="api"
NEW_IMAGE="myapp:$(git rev-parse --short HEAD)"

# Pull the new image
docker compose pull "$SERVICE"

# Scale up a new instance alongside the old one
docker compose up -d --no-deps --scale "$SERVICE=2" --no-recreate "$SERVICE"

# Wait for the new instance to be healthy
echo "Waiting for new instance to be healthy..."
sleep 30

# Remove the old instance
docker compose up -d --no-deps --scale "$SERVICE=1" "$SERVICE"

echo "Deployment complete."
```
This works when you have a reverse proxy (Traefik, Nginx, Caddy) in front of your service that can route to multiple backends. The proxy detects the new healthy container and removes the old one from the pool.
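Instead of a fixed sleep, the script could poll the service's health status. A sketch, assuming the service defines a health check (the api name and the 12 x 5s budget are illustrative):

```shell
# Poll until all containers of the service report healthy (up to ~60s)
for i in $(seq 1 12); do
  ids=$(docker compose ps -q api)
  if [ -n "$ids" ] && ! docker inspect --format '{{.State.Health.Status}}' $ids | grep -qv healthy; then
    echo "api is healthy"
    break
  fi
  sleep 5
done
```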
Alternatively, use the newer docker compose up --wait flag which waits for health checks to pass:
```bash
# Pull new images and recreate with health check waiting
docker compose pull
docker compose up -d --wait
```
Putting It All Together
Here's a complete production-grade Docker Compose file incorporating all 12 practices:
```yaml
# Note: the top-level "version" key is obsolete in Compose v2 and omitted here.
services:
  proxy:
    image: caddy:2.7-alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - type: bind
        source: ./Caddyfile
        target: /etc/caddy/Caddyfile
        read_only: true
      - caddy_data:/data
      - caddy_config:/config
    networks:
      - frontend
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 128M
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:80"]
      interval: 30s
      timeout: 10s
      retries: 3

  api:
    image: myapp:1.2.3
    restart: unless-stopped
    env_file:
      - .env
    secrets:
      - db_password
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - frontend
      - backend
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    read_only: true
    tmpfs:
      - /tmp
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 512M
        reservations:
          cpus: '0.5'
          memory: 256M
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "5"
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s

  db:
    image: postgres:16.2-alpine
    restart: unless-stopped
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: myapp
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - backend
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - DAC_OVERRIDE
      - FOWNER
      - SETGID
      - SETUID
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U myapp"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7.2-alpine
    restart: unless-stopped
    command: redis-server --maxmemory 128mb --maxmemory-policy allkeys-lru
    volumes:
      - redis_data:/data
    networks:
      - backend
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 256M
    logging:
      driver: json-file
      options:
        max-size: "5m"
        max-file: "3"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  postgres_data:
  redis_data:
  caddy_data:
  caddy_config:

networks:
  frontend:
  backend:

secrets:
  db_password:
    file: ./secrets/db_password.txt
```
Managing Compose in Production
Writing a good compose file is half the battle. You also need a way to manage it in production. A Docker management platform like usulnet lets you deploy, monitor, and update Docker Compose stacks through a web UI, making it easier for your team to manage production services without SSH-ing into servers.
Whatever approach you use, the 12 practices above will give your Docker Compose deployments a production-grade foundation. Start by auditing your existing compose files against this list and addressing the gaps one at a time.
A quick way to audit: run docker compose config in your project directory to see the fully resolved compose file, then check it for missing health checks, resource limits, and restart policies.
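As a rough sketch of that audit (assuming Compose v2 and jq are available), you could list the services in the resolved config that define no health check:

```shell
# List services in the resolved config that lack a healthcheck
docker compose config --format json \
  | jq -r '.services | to_entries[] | select(.value.healthcheck == null) | .key'
```

The same jq pattern works for spotting services without deploy.resources.limits or a restart policy.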