Docker Swarm in Production: Complete Deployment and Operations Guide
Docker Swarm remains one of the most operationally simple container orchestration platforms available. It ships inside the Docker Engine, requires no external dependencies, and can take a cluster from zero to production in under an hour. Yet simplicity does not mean it is trivial to run well. A poorly planned Swarm deployment will eventually suffer from manager quorum loss, resource exhaustion on manager nodes, or cascading failures during rolling updates.
This guide is a production-readiness checklist for Docker Swarm. It covers every decision you need to make before going live: node topology, manager quorum, worker placement strategies, resource reservations, deployment configurations, and the operational runbooks you need to keep things running once they are live.
Production Readiness Checklist
Before deploying your first service, verify every item on this list. Skipping any of these has caused real outages in real clusters:
| Category | Requirement | Why It Matters |
|---|---|---|
| Manager nodes | 3 or 5 managers (odd number) | Raft consensus requires a majority quorum |
| Network | Low-latency links between managers (<10ms RTT) | Raft leader election is latency-sensitive |
| Firewall | Ports 2377/tcp, 7946/tcp+udp, 4789/udp open between all nodes | Cluster management, gossip, overlay VXLAN |
| Time sync | NTP or chrony running on every node | Raft and TLS certificate validation require clock accuracy |
| Storage | Fast SSD for manager nodes (/var/lib/docker/swarm) | Raft WAL writes are latency-sensitive |
| Docker version | Same Docker Engine version across all nodes | Mixed versions cause subtle API incompatibilities |
| DNS | Stable hostnames resolvable between nodes | Node communication relies on name resolution |
| Monitoring | Prometheus + node-exporter on every node | You cannot operate what you cannot observe |
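As a concrete sketch of the firewall requirement above, assuming your hosts use ufw (substitute the equivalent firewalld or iptables rules if not):

```shell
# Open the three Swarm ports between cluster nodes (ufw example)
sudo ufw allow 2377/tcp   # cluster management traffic (manager Raft/API)
sudo ufw allow 7946/tcp   # container network discovery (gossip)
sudo ufw allow 7946/udp   # gossip also uses UDP
sudo ufw allow 4789/udp   # overlay network data plane (VXLAN)
sudo ufw reload
```

Ideally restrict these rules to the node subnet rather than opening the ports to all sources.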
Node Sizing: Manager vs. Worker
Managers and workers have fundamentally different resource profiles. Managers run the Raft consensus algorithm, store the cluster state, and schedule tasks. Workers only execute containers. Sizing them the same way is a common and costly mistake.
Manager Node Sizing
Manager nodes handle Raft log replication, service scheduling, and cluster state. Their resource consumption scales with the number of services and tasks, not with the actual workloads running on them.
# Recommended manager node specifications
# Small cluster (up to 100 services, 500 tasks)
# CPU: 2 cores
# RAM: 4 GB
# Disk: 20 GB SSD (for Raft WAL)
# Medium cluster (100-500 services, 2000 tasks)
# CPU: 4 cores
# RAM: 8 GB
# Disk: 50 GB SSD
# Large cluster (500+ services, 5000+ tasks)
# CPU: 8 cores
# RAM: 16 GB
# Disk: 100 GB SSD (NVMe preferred)
Worker Node Sizing
Worker nodes should be sized for the workloads they carry. The key decision is whether to use many small nodes or fewer large ones:
- Many small nodes (e.g., 10 x 4-core/16GB): Better fault isolation, smaller blast radius per node failure, easier to scale incrementally
- Fewer large nodes (e.g., 3 x 16-core/64GB): Lower overhead, simpler management, better for memory-heavy workloads that cannot be split
In practice, most production clusters benefit from a mix: standardized worker nodes sized for the majority of workloads, with a few specialized nodes labeled for specific requirements (GPU, high-memory, SSD storage).
Manager Quorum: 3 or 5 Managers
The Raft consensus algorithm requires a strict majority (quorum) of managers to be available for the cluster to accept writes and schedule tasks. The math is straightforward:
| Total Managers | Quorum Required | Tolerated Failures |
|---|---|---|
| 1 | 1 | 0 (no fault tolerance) |
| 3 | 2 | 1 |
| 5 | 3 | 2 |
| 7 | 4 | 3 |
Rule of thumb: Use 3 managers for most production clusters. Use 5 managers only if you need to tolerate 2 simultaneous failures or if you are running a multi-datacenter deployment where a full site loss is a realistic scenario. Never use more than 7 managers; the Raft overhead outweighs the additional fault tolerance.
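The quorum column in the table follows directly from integer arithmetic: a majority of N is floor(N/2) + 1, and the cluster tolerates N minus that majority in failures. A quick shell sanity check:

```shell
#!/bin/bash
# Raft quorum arithmetic: majority = floor(N/2) + 1, tolerated = N - majority
for n in 1 3 5 7; do
  majority=$(( n / 2 + 1 ))
  tolerated=$(( n - majority ))
  echo "$n managers: quorum=$majority, tolerated failures=$tolerated"
done
```

This also makes clear why even counts are wasted: 4 managers need a quorum of 3 and tolerate only 1 failure, the same as 3 managers.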
Initializing the Swarm
# Initialize on the first manager
docker swarm init --advertise-addr 10.0.1.10
# Get the manager join token
docker swarm join-token manager
# Join additional managers
docker swarm join \
--token SWMTKN-1-xxxxx \
--advertise-addr 10.0.1.11 \
10.0.1.10:2377
# Get the worker join token
docker swarm join-token worker
# Join workers
docker swarm join \
--token SWMTKN-1-yyyyy \
--advertise-addr 10.0.2.10 \
10.0.1.10:2377
Rotating Join Tokens
Join tokens should be rotated periodically and after any security incident:
# Rotate the worker join token
docker swarm join-token --rotate worker
# Rotate the manager join token
docker swarm join-token --rotate manager
Worker Placement: Drain, Active, and Pause
Every node in a Swarm cluster has an availability status that controls whether the scheduler can place tasks on it:
- active (default): The node accepts new tasks and continues running existing ones
- pause: The node does not accept new tasks but continues running existing ones. Useful for debugging a node without disrupting running services
- drain: The node does not accept new tasks and all existing tasks are shut down and rescheduled to other nodes. Use this for maintenance
# Drain a node for maintenance
docker node update --availability drain worker-03
# Verify tasks have been rescheduled
docker node ps worker-03
# Return the node to active duty
docker node update --availability active worker-03
# Pause a node (keep running tasks, reject new ones)
docker node update --availability pause worker-02
# Drain all manager nodes (run from a manager)
for node in $(docker node ls --filter role=manager -q); do
docker node update --availability drain "$node"
done
Constraint Labels and Placement
Labels are the primary mechanism for controlling where services run. They enable you to create logical groups of nodes and direct specific workloads to specific hardware.
# Add labels to nodes
docker node update --label-add zone=us-east-1a manager-01
docker node update --label-add zone=us-east-1b manager-02
docker node update --label-add zone=us-east-1c manager-03
docker node update --label-add tier=frontend worker-01
docker node update --label-add tier=frontend worker-02
docker node update --label-add tier=backend worker-03
docker node update --label-add tier=backend worker-04
docker node update --label-add tier=database worker-05
docker node update --label-add disk=ssd worker-05
docker node update --label-add gpu=true worker-06
Using Constraints in Service Definitions
# Deploy a service only on frontend nodes
docker service create \
--name nginx \
--replicas 4 \
--constraint 'node.labels.tier == frontend' \
nginx:latest
# Deploy a database only on SSD nodes
docker service create \
--name postgres \
--constraint 'node.labels.disk == ssd' \
--constraint 'node.role == worker' \
postgres:16
# Spread across availability zones
docker service create \
--name api \
--replicas 6 \
--placement-pref 'spread=node.labels.zone' \
myapp/api:latest
In a Compose file for docker stack deploy:
version: "3.8"
services:
  api:
    image: myapp/api:v2.1.0
    deploy:
      replicas: 6
      placement:
        constraints:
          - node.labels.tier == backend
          - node.role == worker
        preferences:
          - spread: node.labels.zone
      resources:
        limits:
          cpus: "2.0"
          memory: 1G
        reservations:
          cpus: "0.5"
          memory: 256M
Resource Reservations and Limits
Resource management in Swarm operates on two axes: reservations (guaranteed minimums that affect scheduling) and limits (hard caps enforced by cgroups).
| Setting | Scheduling Effect | Runtime Effect |
|---|---|---|
| resources.reservations.memory | Node must have this much free memory to accept the task | None (soft guarantee) |
| resources.limits.memory | None | Container is OOM-killed if it exceeds this value |
| resources.reservations.cpus | Node must have this many CPU shares available | None (soft guarantee) |
| resources.limits.cpus | None | Container is throttled beyond this value |
# Service with both reservations and limits
docker service create \
--name worker-svc \
--replicas 10 \
--reserve-cpu 0.25 \
--reserve-memory 128M \
--limit-cpu 1.0 \
--limit-memory 512M \
myapp/worker:latest
Deploy Strategies
Swarm supports two deployment modes: replicated (run N copies distributed across the cluster) and global (run exactly one copy on every eligible node).
Replicated Mode
# Standard replicated deployment
docker service create \
--name api \
--replicas 6 \
--update-parallelism 2 \
--update-delay 10s \
--update-failure-action rollback \
--rollback-parallelism 1 \
--rollback-delay 5s \
myapp/api:v2.1.0
Global Mode
Global services are ideal for infrastructure components that need to run on every node: log collectors, monitoring agents, and node-level security tools.
# Deploy a monitoring agent on every node
docker service create \
--name node-exporter \
--mode global \
--mount type=bind,source=/proc,target=/host/proc,readonly \
--mount type=bind,source=/sys,target=/host/sys,readonly \
--mount type=bind,source=/,target=/rootfs,readonly \
--network monitoring \
prom/node-exporter:latest \
--path.procfs=/host/proc \
--path.sysfs=/host/sys \
--path.rootfs=/rootfs
Stack Deployments
For production, always use docker stack deploy with Compose files rather than individual docker service create commands. Stacks are declarative, version-controllable, and reproducible.
# Deploy a stack
docker stack deploy -c docker-compose.prod.yml myapp
# List stacks
docker stack ls
# List services in a stack
docker stack services myapp
# Remove a stack
docker stack rm myapp
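Two `docker stack deploy` flags are worth knowing for repeat deployments; a sketch reusing the stack name and file from above:

```shell
# --with-registry-auth forwards your registry credentials to agent nodes
#   (required when images live in a private registry)
# --prune removes services that are no longer defined in the Compose file
docker stack deploy \
  -c docker-compose.prod.yml \
  --with-registry-auth \
  --prune \
  myapp
```

Without `--prune`, services you delete from the file keep running in the cluster indefinitely.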
A production-grade stack file:
version: "3.8"
services:
  web:
    image: myapp/web:v2.1.0
    deploy:
      replicas: 4
      update_config:
        parallelism: 2
        delay: 15s
        failure_action: rollback
        monitor: 30s
        order: start-first
      rollback_config:
        parallelism: 1
        delay: 5s
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.labels.tier == frontend
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 128M
    ports:
      - "80:8080"
    networks:
      - frontend
      - backend
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/health"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 30s

  api:
    image: myapp/api:v2.1.0
    deploy:
      replicas: 6
      update_config:
        parallelism: 2
        delay: 10s
        failure_action: rollback
        monitor: 30s
        order: stop-first
      placement:
        constraints:
          - node.labels.tier == backend
        preferences:
          - spread: node.labels.zone
      resources:
        limits:
          cpus: "2.0"
          memory: 1G
        reservations:
          cpus: "0.5"
          memory: 256M
    networks:
      - backend
      - db
    secrets:
      - db_password
      - api_key
    configs:
      - source: api_config
        target: /app/config.yaml

  postgres:
    image: postgres:16
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.disk == ssd
          - node.labels.tier == database
      resources:
        limits:
          cpus: "4.0"
          memory: 8G
        reservations:
          cpus: "2.0"
          memory: 4G
    volumes:
      - pgdata:/var/lib/postgresql/data
    networks:
      - db
    secrets:
      - db_password
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password

networks:
  frontend:
    driver: overlay
  backend:
    driver: overlay
    driver_opts:
      encrypted: "true"
  db:
    driver: overlay
    internal: true
    driver_opts:
      encrypted: "true"

volumes:
  pgdata:
    driver: local

secrets:
  db_password:
    external: true
  api_key:
    external: true

configs:
  api_config:
    file: ./config/api.yaml
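The secrets in this stack are declared `external: true`, so they must exist before the stack is deployed. One way to create them, assuming `openssl` is available (the `api_key` value here is a placeholder):

```shell
# Create the external secrets the stack expects (run on a manager)
openssl rand -base64 32 | docker secret create db_password -
printf '%s' "replace-with-real-api-key" | docker secret create api_key -

# Verify both exist before deploying
docker secret ls
```

Secret values are stored encrypted in the Raft log and are only mounted into containers that explicitly request them.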
Operational Runbooks
Adding a New Worker Node
# On the new node: install Docker
curl -fsSL https://get.docker.com | sh
# Get the join token from a manager
docker swarm join-token worker
# Join the cluster
docker swarm join --token SWMTKN-1-yyyyy 10.0.1.10:2377
# From a manager: add labels
docker node update --label-add tier=backend new-worker-01
docker node update --label-add zone=us-east-1b new-worker-01
# Verify
docker node ls
Removing a Worker Node Gracefully
# Drain the node first
docker node update --availability drain worker-03
# Wait for tasks to be rescheduled
watch docker node ps worker-03
# On the worker: leave the swarm
docker swarm leave
# From a manager: remove the node from the cluster
docker node rm worker-03
Promoting and Demoting Nodes
# Promote a worker to manager
docker node promote worker-05
# Demote a manager to worker
docker node demote manager-03
Inspecting Cluster Health
# List all nodes with status
docker node ls
# Check Raft status on a manager
docker info | grep -A5 "Raft"
# Inspect a specific node
docker node inspect --pretty worker-01
# List all services
docker service ls
# Check task distribution
docker service ps --no-trunc myapp_api
Common Production Pitfalls
- Running an even number of managers. With 2 or 4 managers, you gain no additional fault tolerance over 1 or 3, but you still pay the Raft overhead.
- Not draining managers. Application workloads on manager nodes compete with the Raft consensus for CPU and memory.
- Ignoring resource reservations. Without reservations, the scheduler cannot make intelligent placement decisions. Overcommitting memory leads to OOM kills across multiple services simultaneously.
- Using docker service create in production. Individual commands are not reproducible. Always use stack files checked into version control.
- Not setting health checks. Without health checks, Swarm considers a container healthy as soon as it starts. A service that starts but fails to bind its port will be considered "running" indefinitely.
- Placing all managers in one availability zone. A single network partition or rack failure can take out all managers, losing the entire cluster.
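Two of these pitfalls can be spot-checked from any manager; a sketch assuming the `zone` labels from earlier in this guide have been applied:

```shell
# Pitfall check 1: are manager nodes drained of application workloads?
docker node ls --filter role=manager \
  --format '{{.Hostname}} {{.Availability}}'

# Pitfall check 2: are managers spread across availability zones?
docker node inspect --format \
  '{{.Description.Hostname}} {{index .Spec.Labels "zone"}}' \
  $(docker node ls --filter role=manager -q)
```

If every manager reports the same zone, a single site failure can cost you the whole cluster.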
Monitoring Your Production Swarm
A production Swarm cluster needs comprehensive monitoring. At minimum, track these metrics:
- Node level: CPU, memory, disk I/O, network I/O, Docker daemon health
- Cluster level: Manager quorum status, Raft leader, node availability, total tasks vs. running tasks
- Service level: Replica count (desired vs. actual), update state, restart count, response latency
# Quick health check script for cron
#!/bin/bash
set -euo pipefail
MANAGERS=$(docker node ls --filter role=manager -q | wc -l)
# || true: grep -c prints 0 but exits non-zero on no matches,
# which set -e would otherwise treat as a fatal error
READY=$(docker node ls --filter role=manager \
--format '{{.Status}}' | grep -c "Ready" || true)
if [ "$READY" -lt $(( (MANAGERS / 2) + 1 )) ]; then
echo "CRITICAL: Manager quorum at risk! Only $READY/$MANAGERS managers Ready"
# Send alert via your preferred mechanism
fi
# Check for services with fewer running tasks than desired
docker service ls --format '{{.Name}} {{.Replicas}}' | \
while read -r name replicas; do
running=$(echo "$replicas" | cut -d/ -f1)
desired=$(echo "$replicas" | cut -d/ -f2)
if [ "$running" != "$desired" ]; then
echo "WARNING: $name has $running/$desired replicas"
fi
done
Conclusion
Docker Swarm in production is less about learning complex abstractions and more about disciplined engineering: right-size your nodes, protect your managers, set resource boundaries, use declarative deployments, and monitor everything. The checklist in this guide covers the decisions that separate a proof-of-concept Swarm from one that handles real traffic reliably.
Start with 3 managers and a handful of workers, deploy your first stack, and iterate. Swarm's operational simplicity means you can focus on your application rather than fighting the orchestrator. For teams looking to centralize their Swarm operations, usulnet provides a management layer that brings visibility and control to multi-node Docker environments without the complexity of heavier orchestration platforms.