Docker Swarm in Production: Complete Deployment and Operations Guide
Docker Swarm remains one of the most operationally simple container orchestration platforms available. It ships inside the Docker Engine, requires no external dependencies, and can take a cluster from zero to production in under an hour. Yet simplicity does not mean it is trivial to run well. A poorly planned Swarm deployment will eventually suffer from manager quorum loss, resource exhaustion on manager nodes, or cascading failures during rolling updates.
This guide is a production-readiness checklist for Docker Swarm. It covers every decision you need to make before going live: node topology, manager quorum, worker placement strategies, resource reservations, deployment configurations, and the operational runbooks you need to keep things running once they are live.
Production Readiness Checklist
Before deploying your first service, verify every item on this list. Skipping any of these has caused real outages in real clusters:
| Category | Requirement | Why It Matters |
|---|---|---|
| Manager nodes | 3 or 5 managers (odd number) | Raft consensus requires a majority quorum |
| Network | Low-latency links between managers (<10ms RTT) | Raft leader election is latency-sensitive |
| Firewall | Ports 2377/tcp, 7946/tcp+udp, 4789/udp open between all nodes | Cluster management, gossip, overlay VXLAN |
| Time sync | NTP or chrony running on every node | Raft and TLS certificate validation require clock accuracy |
| Storage | Fast SSD for manager nodes (/var/lib/docker/swarm) | Raft WAL writes are latency-sensitive |
| Docker version | Same Docker Engine version across all nodes | Mixed versions cause subtle API incompatibilities |
| DNS | Stable hostnames resolvable between nodes | Node communication relies on name resolution |
| Monitoring | Prometheus + node-exporter on every node | You cannot operate what you cannot observe |
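As a concrete sketch of the firewall requirement above, assuming your hosts use ufw (substitute the equivalent firewalld or iptables rules if not):

```shell
# Open the three Swarm ports between cluster nodes (ufw example)
sudo ufw allow 2377/tcp   # cluster management traffic (manager Raft/API)
sudo ufw allow 7946/tcp   # container network discovery (gossip)
sudo ufw allow 7946/udp   # gossip also uses UDP
sudo ufw allow 4789/udp   # overlay network data plane (VXLAN)
sudo ufw reload
```

Ideally restrict these rules to the node subnet rather than opening the ports to all sources.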
Node Sizing: Manager vs. Worker
Managers and workers have fundamentally different resource profiles. Managers run the Raft consensus algorithm, store the cluster state, and schedule tasks. Workers only execute containers. Sizing them the same way is a common and costly mistake.
Manager Node Sizing
Manager nodes handle Raft log replication, service scheduling, and cluster state. Their resource consumption scales with the number of services and tasks, not with the actual workloads running on them.
# Recommended manager node specifications
# Small cluster (up to 100 services, 500 tasks)
# CPU: 2 cores
# RAM: 4 GB
# Disk: 20 GB SSD (for Raft WAL)
# Medium cluster (100-500 services, 2000 tasks)
# CPU: 4 cores
# RAM: 8 GB
# Disk: 50 GB SSD
# Large cluster (500+ services, 5000+ tasks)
# CPU: 8 cores
# RAM: 16 GB
# Disk: 100 GB SSD (NVMe preferred)
Worker Node Sizing
Worker nodes should be sized for the workloads they carry. The key decision is whether to use many small nodes or fewer large ones:
- Many small nodes (e.g., 10 x 4-core/16GB): Better fault isolation, smaller blast radius per node failure, easier to scale incrementally
- Fewer large nodes (e.g., 3 x 16-core/64GB): Lower overhead, simpler management, better for memory-heavy workloads that cannot be split
In practice, most production clusters benefit from a mix: standardized worker nodes sized for the majority of workloads, with a few specialized nodes labeled for specific requirements (GPU, high-memory, SSD storage).
Manager Quorum: 3 or 5 Managers
The Raft consensus algorithm requires a strict majority (quorum) of managers to be available for the cluster to accept writes and schedule tasks. The math is straightforward:
| Total Managers | Quorum Required | Tolerated Failures |
|---|---|---|
| 1 | 1 | 0 (no fault tolerance) |
| 3 | 2 | 1 |
| 5 | 3 | 2 |
| 7 | 4 | 3 |
Rule of thumb: Use 3 managers for most production clusters. Use 5 managers only if you need to tolerate 2 simultaneous failures or if you are running a multi-datacenter deployment where a full site loss is a realistic scenario. Never use more than 7 managers; the Raft overhead outweighs the additional fault tolerance.
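The quorum column in the table follows directly from integer arithmetic: a majority of N is floor(N/2) + 1, and the cluster tolerates N minus that majority in failures. A quick shell sanity check:

```shell
#!/bin/bash
# Raft quorum arithmetic: majority = floor(N/2) + 1, tolerated = N - majority
for n in 1 3 5 7; do
  majority=$(( n / 2 + 1 ))
  tolerated=$(( n - majority ))
  echo "$n managers: quorum=$majority, tolerated failures=$tolerated"
done
```

This also makes clear why even counts are wasted: 4 managers need a quorum of 3 and tolerate only 1 failure, the same as 3 managers.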
Initializing the Swarm
# Initialize on the first manager
docker swarm init --advertise-addr 10.0.1.10
# Get the manager join token
docker swarm join-token manager
# Join additional managers
docker swarm join \
--token SWMTKN-1-xxxxx \
--advertise-addr 10.0.1.11 \
10.0.1.10:2377
# Get the worker join token
docker swarm join-token worker
# Join workers
docker swarm join \
--token SWMTKN-1-yyyyy \
--advertise-addr 10.0.2.10 \
10.0.1.10:2377
Rotating Join Tokens
Join tokens should be rotated periodically and after any security incident:
# Rotate the worker join token
docker swarm join-token --rotate worker
# Rotate the manager join token
docker swarm join-token --rotate manager
Worker Placement: Drain, Active, and Pause
Every node in a Swarm cluster has an availability status that controls whether the scheduler can place tasks on it:
- active (default): The node accepts new tasks and continues running existing ones
- pause: The node does not accept new tasks but continues running existing ones. Useful for debugging a node without disrupting running services
- drain: The node does not accept new tasks and all existing tasks are shut down and rescheduled to other nodes. Use this for maintenance
# Drain a node for maintenance
docker node update --availability drain worker-03
# Verify tasks have been rescheduled
docker node ps worker-03
# Return the node to active duty
docker node update --availability active worker-03
# Pause a node (keep running tasks, reject new ones)
docker node update --availability pause worker-02
# Drain all manager nodes (run from a manager)
for node in $(docker node ls --filter role=manager -q); do
docker node update --availability drain "$node"
done
Constraint Labels and Placement
Labels are the primary mechanism for controlling where services run. They enable you to create logical groups of nodes and direct specific workloads to specific hardware.
# Add labels to nodes
docker node update --label-add zone=us-east-1a manager-01
docker node update --label-add zone=us-east-1b manager-02
docker node update --label-add zone=us-east-1c manager-03
docker node update --label-add tier=frontend worker-01
docker node update --label-add tier=frontend worker-02
docker node update --label-add tier=backend worker-03
docker node update --label-add tier=backend worker-04
docker node update --label-add tier=database worker-05
docker node update --label-add disk=ssd worker-05
docker node update --label-add gpu=true worker-06
Using Constraints in Service Definitions
# Deploy a service only on frontend nodes
docker service create \
--name nginx \
--replicas 4 \
--constraint 'node.labels.tier == frontend' \
nginx:latest
# Deploy a database only on SSD nodes
docker service create \
--name postgres \
--constraint 'node.labels.disk == ssd' \
--constraint 'node.role == worker' \
postgres:16
# Spread across availability zones
docker service create \
--name api \
--replicas 6 \
--placement-pref 'spread=node.labels.zone' \
myapp/api:latest
In a Compose file for docker stack deploy:
version: "3.8"
services:
  api:
    image: myapp/api:v2.1.0
    deploy:
      replicas: 6
      placement:
        constraints:
          - node.labels.tier == backend
          - node.role == worker
        preferences:
          - spread: node.labels.zone
      resources:
        limits:
          cpus: "2.0"
          memory: 1G
        reservations:
          cpus: "0.5"
          memory: 256M
Resource Reservations and Limits
Resource management in Swarm operates on two axes: reservations (guaranteed minimums that affect scheduling) and limits (hard caps enforced by cgroups).
| Setting | Scheduling Effect | Runtime Effect |
|---|---|---|
| resources.reservations.memory | Node must have this much free memory to accept the task | None (soft guarantee) |
| resources.limits.memory | None | Container is OOM-killed if it exceeds this value |
| resources.reservations.cpus | Node must have this many CPU shares available | None (soft guarantee) |
| resources.limits.cpus | None | Container is throttled beyond this value |
# Service with both reservations and limits
docker service create \
--name worker-svc \
--replicas 10 \
--reserve-cpu 0.25 \
--reserve-memory 128M \
--limit-cpu 1.0 \
--limit-memory 512M \
myapp/worker:latest
Deploy Strategies
Swarm supports two deployment modes: replicated (run N copies distributed across the cluster) and global (run exactly one copy on every eligible node).
Replicated Mode
# Standard replicated deployment
docker service create \
--name api \
--replicas 6 \
--update-parallelism 2 \
--update-delay 10s \
--update-failure-action rollback \
--rollback-parallelism 1 \
--rollback-delay 5s \
myapp/api:v2.1.0
Global Mode
Global services are ideal for infrastructure components that need to run on every node: log collectors, monitoring agents, and node-level security tools.
# Deploy a monitoring agent on every node
docker service create \
--name node-exporter \
--mode global \
--mount type=bind,source=/proc,target=/host/proc,readonly \
--mount type=bind,source=/sys,target=/host/sys,readonly \
--mount type=bind,source=/,target=/rootfs,readonly \
--network monitoring \
prom/node-exporter:latest \
--path.procfs=/host/proc \
--path.sysfs=/host/sys \
--path.rootfs=/rootfs
Stack Deployments
For production, always use docker stack deploy with Compose files rather than individual docker service create commands. Stacks are declarative, version-controllable, and reproducible.
# Deploy a stack
docker stack deploy -c docker-compose.prod.yml myapp
# List stacks
docker stack ls
# List services in a stack
docker stack services myapp
# Remove a stack
docker stack rm myapp
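Two `docker stack deploy` flags are worth knowing for repeat deployments; a sketch reusing the stack name and file from above:

```shell
# --with-registry-auth forwards your registry credentials to agent nodes
#   (required when images live in a private registry)
# --prune removes services that are no longer defined in the Compose file
docker stack deploy \
  -c docker-compose.prod.yml \
  --with-registry-auth \
  --prune \
  myapp
```

Without `--prune`, services you delete from the file keep running in the cluster indefinitely.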
A production-grade stack file:
version: "3.8"
services:
  web:
    image: myapp/web:v2.1.0
    deploy:
      replicas: 4
      update_config:
        parallelism: 2
        delay: 15s
        failure_action: rollback
        monitor: 30s
        order: start-first
      rollback_config:
        parallelism: 1
        delay: 5s
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.labels.tier == frontend
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 128M
    ports:
      - "80:8080"
    networks:
      - frontend
      - backend
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/health"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 30s

  api:
    image: myapp/api:v2.1.0
    deploy:
      replicas: 6
      update_config:
        parallelism: 2
        delay: 10s
        failure_action: rollback
        monitor: 30s
        order: stop-first
      placement:
        constraints:
          - node.labels.tier == backend
        preferences:
          - spread: node.labels.zone
      resources:
        limits:
          cpus: "2.0"
          memory: 1G
        reservations:
          cpus: "0.5"
          memory: 256M
    networks:
      - backend
      - db
    secrets:
      - db_password
      - api_key
    configs:
      - source: api_config
        target: /app/config.yaml

  postgres:
    image: postgres:16
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.disk == ssd
          - node.labels.tier == database
      resources:
        limits:
          cpus: "4.0"
          memory: 8G
        reservations:
          cpus: "2.0"
          memory: 4G
    volumes:
      - pgdata:/var/lib/postgresql/data
    networks:
      - db
    secrets:
      - db_password
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password

networks:
  frontend:
    driver: overlay
  backend:
    driver: overlay
    driver_opts:
      encrypted: "true"
  db:
    driver: overlay
    internal: true
    driver_opts:
      encrypted: "true"

volumes:
  pgdata:
    driver: local

secrets:
  db_password:
    external: true
  api_key:
    external: true

configs:
  api_config:
    file: ./config/api.yaml
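The secrets in this stack are declared `external: true`, so they must exist before the stack is deployed. One way to create them, assuming `openssl` is available (the `api_key` value here is a placeholder):

```shell
# Create the external secrets the stack expects (run on a manager)
openssl rand -base64 32 | docker secret create db_password -
printf '%s' "replace-with-real-api-key" | docker secret create api_key -

# Verify both exist before deploying
docker secret ls
```

Secret values are stored encrypted in the Raft log and are only mounted into containers that explicitly request them.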
Operational Runbooks
Adding a New Worker Node
# On the new node: install Docker
curl -fsSL https://get.docker.com | sh
# Get the join token from a manager
docker swarm join-token worker
# Join the cluster
docker swarm join --token SWMTKN-1-yyyyy 10.0.1.10:2377
# From a manager: add labels
docker node update --label-add tier=backend new-worker-01
docker node update --label-add zone=us-east-1b new-worker-01
# Verify
docker node ls
Removing a Worker Node Gracefully
# Drain the node first
docker node update --availability drain worker-03
# Wait for tasks to be rescheduled
watch docker node ps worker-03
# On the worker: leave the swarm
docker swarm leave
# From a manager: remove the node from the cluster
docker node rm worker-03
Promoting and Demoting Nodes
# Promote a worker to manager
docker node promote worker-05
# Demote a manager to worker
docker node demote manager-03
Inspecting Cluster Health
# List all nodes with status
docker node ls
# Check Raft status on a manager
docker info | grep -A5 "Raft"
# Inspect a specific node
docker node inspect --pretty worker-01
# List all services
docker service ls
# Check task distribution
docker service ps --no-trunc myapp_api
Common Production Pitfalls
- Running an even number of managers. With 2 or 4 managers, you gain no additional fault tolerance over 1 or 3, but you still pay the Raft overhead.
- Not draining managers. Application workloads on manager nodes compete with the Raft consensus for CPU and memory.
- Ignoring resource reservations. Without reservations, the scheduler cannot make intelligent placement decisions. Overcommitting memory leads to OOM kills across multiple services simultaneously.
- Using docker service create in production. Individual commands are not reproducible. Always use stack files checked into version control.
- Not setting health checks. Without health checks, Swarm considers a container healthy as soon as it starts. A service that starts but fails to bind its port will be considered "running" indefinitely.
- Placing all managers in one availability zone. A single network partition or rack failure can take out all managers, losing the entire cluster.
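Two of these pitfalls can be spot-checked from any manager; a sketch assuming the `zone` labels from earlier in this guide have been applied:

```shell
# Pitfall check 1: are manager nodes drained of application workloads?
docker node ls --filter role=manager \
  --format '{{.Hostname}} {{.Availability}}'

# Pitfall check 2: are managers spread across availability zones?
docker node inspect --format \
  '{{.Description.Hostname}} {{index .Spec.Labels "zone"}}' \
  $(docker node ls --filter role=manager -q)
```

If every manager reports the same zone, a single site failure can cost you the whole cluster.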
Monitoring Your Production Swarm
A production Swarm cluster needs comprehensive monitoring. At minimum, track these metrics:
- Node level: CPU, memory, disk I/O, network I/O, Docker daemon health
- Cluster level: Manager quorum status, Raft leader, node availability, total tasks vs. running tasks
- Service level: Replica count (desired vs. actual), update state, restart count, response latency
# Quick health check script for cron
#!/bin/bash
set -euo pipefail
MANAGERS=$(docker node ls --filter role=manager -q | wc -l)
# || true: grep -c prints 0 but exits non-zero on no matches,
# which set -e would otherwise treat as a fatal error
READY=$(docker node ls --filter role=manager \
--format '{{.Status}}' | grep -c "Ready" || true)
if [ "$READY" -lt $(( (MANAGERS / 2) + 1 )) ]; then
echo "CRITICAL: Manager quorum at risk! Only $READY/$MANAGERS managers Ready"
# Send alert via your preferred mechanism
fi
# Check for services with fewer running tasks than desired
docker service ls --format '{{.Name}} {{.Replicas}}' | \
while read -r name replicas; do
running=$(echo "$replicas" | cut -d/ -f1)
desired=$(echo "$replicas" | cut -d/ -f2)
if [ "$running" != "$desired" ]; then
echo "WARNING: $name has $running/$desired replicas"
fi
done
Conclusion
Docker Swarm in production is less about learning complex abstractions and more about disciplined engineering: right-size your nodes, protect your managers, set resource boundaries, use declarative deployments, and monitor everything. The checklist in this guide covers the decisions that separate a proof-of-concept Swarm from one that handles real traffic reliably.
Start with 3 managers and a handful of workers, deploy your first stack, and iterate. Swarm's operational simplicity means you can focus on your application rather than fighting the orchestrator. For teams looking to centralize their Swarm operations, usulnet provides a management layer that brings visibility and control to multi-node Docker environments without the complexity of heavier orchestration platforms.