CI/CD with Docker Swarm: Automated Deployments from Git to Production
A CI/CD pipeline for Docker Swarm follows a straightforward pattern: build the image, push it to a registry, SSH into a manager node, and run docker stack deploy. The simplicity of this pattern is one of Swarm's greatest advantages over more complex orchestrators. There is no kubectl, no Helm charts, no custom resource definitions. Just Docker commands you already know.
But simplicity does not mean there are no decisions to make. How do you pass registry credentials to Swarm nodes? How do you handle secret updates during deployment? How do you implement rollback when a deployment fails? How do you promote changes across staging and production environments? This guide covers complete CI/CD pipelines for both GitHub Actions and GitLab CI, along with the deployment scripts and testing strategies that make them production-ready.
The Build-Push-Deploy Pipeline
Every Swarm CI/CD pipeline has three phases:
| Phase | Where | What Happens |
|---|---|---|
| Build | CI runner | Run tests, build Docker image, tag with version |
| Push | CI runner | Push tagged image to container registry |
| Deploy | Swarm manager (via SSH) | Update stack file, run docker stack deploy |
GitHub Actions Pipeline
Here is a complete GitHub Actions workflow for building, testing, and deploying to Docker Swarm:
# .github/workflows/deploy.yml
name: Build and Deploy to Swarm
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run unit tests
        run: |
          docker compose -f docker-compose.test.yml up -d
          docker compose -f docker-compose.test.yml run --rm app npm test
          docker compose -f docker-compose.test.yml down
  build:
    runs-on: ubuntu-latest
    needs: test
    permissions:
      contents: read
      packages: write
    outputs:
      # metadata-action emits one tag per line; expose only the first one
      # (the commit SHA tag, which is given the highest priority below)
      image_tag: ${{ fromJSON(steps.meta.outputs.json).tags[0] }}
    steps:
      - uses: actions/checkout@v4
      # The gha cache backend used below requires a Buildx builder
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=,priority=1000
            type=ref,event=branch
            type=semver,pattern={{version}}
      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name == 'push' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
  deploy-staging:
    runs-on: ubuntu-latest
    needs: build
    if: github.ref == 'refs/heads/main'
    environment: staging
    permissions:
      contents: read
      packages: read
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to staging Swarm
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.STAGING_MANAGER_HOST }}
          username: deploy
          key: ${{ secrets.STAGING_SSH_KEY }}
          script: |
            # Login to registry
            # Note: GITHUB_TOKEN expires when the job ends; for private images,
            # a longer-lived read-only token lets nodes pull again later
            echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
            # Update image tag in stack file
            export IMAGE_TAG="${{ needs.build.outputs.image_tag }}"
            cd /opt/myapp
            sed -i "s|image:.*myapp.*|image: ${IMAGE_TAG}|" docker-compose.yml
            # Deploy
            docker stack deploy \
              -c docker-compose.yml \
              --with-registry-auth \
              myapp
            # Wait for update to complete
            sleep 10
            docker service ls --filter label=com.docker.stack.namespace=myapp
  deploy-production:
    runs-on: ubuntu-latest
    needs: [build, deploy-staging]
    if: github.ref == 'refs/heads/main'
    environment: production
    permissions:
      contents: read
      packages: read
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to production Swarm
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.PROD_MANAGER_HOST }}
          username: deploy
          key: ${{ secrets.PROD_SSH_KEY }}
          script: |
            echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
            export IMAGE_TAG="${{ needs.build.outputs.image_tag }}"
            cd /opt/myapp
            sed -i "s|image:.*myapp.*|image: ${IMAGE_TAG}|" docker-compose.yml
            docker stack deploy \
              -c docker-compose.yml \
              --with-registry-auth \
              myapp
            # Verify deployment
            ./scripts/verify-deployment.sh myapp
GitLab CI/CD Pipeline
# .gitlab-ci.yml
stages:
  - test
  - build
  - deploy-staging
  - deploy-production
variables:
  DOCKER_TLS_CERTDIR: "/certs"
  IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
test:
  stage: test
  image: docker:24  # client image; the daemon runs in the dind service
  services:
    - docker:24-dind
  script:
    - docker compose -f docker-compose.test.yml run --rm app npm test
build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  before_script:
    # --password-stdin keeps the token out of the process list
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    # Pull the previous image so --cache-from has layers to reuse (fails harmlessly on the first build)
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    - docker build
        --cache-from $CI_REGISTRY_IMAGE:latest
        --build-arg BUILDKIT_INLINE_CACHE=1
        --tag $IMAGE_TAG
        --tag $CI_REGISTRY_IMAGE:latest
        .
    - docker push $IMAGE_TAG
    - docker push $CI_REGISTRY_IMAGE:latest
  only:
    - main
    - tags
deploy-staging:
  stage: deploy-staging
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client
    - eval $(ssh-agent -s)
    - echo "$STAGING_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh && chmod 700 ~/.ssh
    - echo "$STAGING_SSH_KNOWN_HOSTS" >> ~/.ssh/known_hosts
  script:
    - |
      # The heredoc is unquoted on purpose: CI variables are expanded on the runner
      ssh deploy@$STAGING_MANAGER_HOST << DEPLOY
      set -e
      echo "$CI_REGISTRY_PASSWORD" | docker login -u $CI_REGISTRY_USER --password-stdin $CI_REGISTRY
      cd /opt/myapp
      export IMAGE_TAG=$IMAGE_TAG
      envsubst < docker-compose.template.yml > docker-compose.yml
      docker stack deploy \
        -c docker-compose.yml \
        --with-registry-auth \
        myapp
      # Wait and verify
      sleep 15
      docker service ls --filter label=com.docker.stack.namespace=myapp
      DEPLOY
  environment:
    name: staging
    url: https://staging.example.com
  only:
    - main
deploy-production:
  stage: deploy-production
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client
    - eval $(ssh-agent -s)
    - echo "$PROD_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh && chmod 700 ~/.ssh
    - echo "$PROD_SSH_KNOWN_HOSTS" >> ~/.ssh/known_hosts
  script:
    - |
      ssh deploy@$PROD_MANAGER_HOST << DEPLOY
      set -e
      echo "$CI_REGISTRY_PASSWORD" | docker login -u $CI_REGISTRY_USER --password-stdin $CI_REGISTRY
      cd /opt/myapp
      export IMAGE_TAG=$IMAGE_TAG
      envsubst < docker-compose.template.yml > docker-compose.yml
      docker stack deploy \
        -c docker-compose.yml \
        --with-registry-auth \
        myapp
      /opt/myapp/scripts/verify-deployment.sh myapp
      DEPLOY
  environment:
    name: production
    url: https://www.example.com
  when: manual
  only:
    - main
SSH Deploy Configuration
The deploy user on your Swarm manager needs SSH access and Docker permissions, but nothing else. Keep in mind that membership in the docker group is effectively root-equivalent on that host, so protect the deploy key accordingly:
# On the Swarm manager node (run as root)
# Create a deploy user
useradd -m -s /bin/bash deploy
usermod -aG docker deploy
# Set up SSH key authentication
mkdir -p /home/deploy/.ssh
echo "ssh-ed25519 AAAA... ci-deploy-key" >> /home/deploy/.ssh/authorized_keys
chmod 700 /home/deploy/.ssh
chmod 600 /home/deploy/.ssh/authorized_keys
chown -R deploy:deploy /home/deploy/.ssh
# Restrict the deploy user (optional: limit to docker commands only)
# /etc/ssh/sshd_config.d/deploy.conf
Match User deploy
    AllowTcpForwarding no
    X11Forwarding no
    PermitTunnel no
To restrict access further, use AllowUsers deploy@CI_IP in sshd_config or firewall rules. Note that AllowUsers also denies every user not listed, so include any other accounts that must keep SSH access.
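The optional "limit to docker commands only" restriction can be approximated with an SSH forced command. The following is a minimal sketch, not a hardened implementation: the wrapper name deploy-gate.sh, its path, and the allow-list are all assumptions made for illustration. It only suits pipelines that send one command per ssh invocation; scripts piped over stdin, as in the heredoc examples above, do not populate SSH_ORIGINAL_COMMAND.

```shell
#!/bin/bash
# deploy-gate.sh (hypothetical name): forced-command wrapper for the deploy key.
# Assumed authorized_keys entry:
#   command="/usr/local/bin/deploy-gate.sh",no-pty ssh-ed25519 AAAA... ci-deploy-key

# Return 0 only for the docker subcommands this pipeline is assumed to need.
is_allowed_command() {
    case "$1" in
        # Reject shell metacharacters and newlines outright
        *';'*|*'&'*|*'|'*|*'`'*|*'$'*|*'<'*|*'>'*|*$'\n'*) return 1 ;;
        "docker stack deploy "*)   return 0 ;;
        "docker stack services "*) return 0 ;;
        "docker service ls"*)      return 0 ;;
        *)                         return 1 ;;
    esac
}

# sshd sets SSH_ORIGINAL_COMMAND when a forced command is configured.
if [ -n "${SSH_ORIGINAL_COMMAND:-}" ]; then
    if is_allowed_command "$SSH_ORIGINAL_COMMAND"; then
        set -f                       # no glob expansion on the split words
        exec $SSH_ORIGINAL_COMMAND   # word-split on purpose; metacharacters were rejected
    fi
    echo "rejected: $SSH_ORIGINAL_COMMAND" >&2
    exit 1
fi
```

Even with this gate, anyone who can run docker stack deploy can still start privileged containers, so the wrapper narrows accidental misuse rather than removing the root-equivalence of Docker access.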
Docker Registry Integration
Swarm nodes need access to your container registry to pull images during deployment. There are two approaches:
Approach 1: --with-registry-auth
The --with-registry-auth flag on docker stack deploy sends the deployer's registry credentials to the Swarm nodes that run the stack's services, so those nodes can pull the image:
# Login on the manager first
docker login registry.example.com
# Deploy with registry auth distribution
docker stack deploy \
    -c docker-compose.yml \
    --with-registry-auth \
    myapp
Approach 2: Pre-authenticated Nodes
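Configure Docker credentials on every node using a config.json file. This file is read by the Docker CLI on that node, so confirm that service tasks can actually pull from the registry before relying on it in place of --with-registry-auth:
# On every Swarm node
mkdir -p /root/.docker
cat > /root/.docker/config.json << 'EOF'
{
  "auths": {
    "ghcr.io": {
      "auth": "BASE64_ENCODED_USER:TOKEN"
    },
    "registry.example.com": {
      "auth": "BASE64_ENCODED_USER:TOKEN"
    }
  }
}
EOF
chmod 600 /root/.docker/config.json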
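The BASE64_ENCODED_USER:TOKEN placeholder stands for the base64 encoding of the literal string user:token. A small sketch of how that value could be produced; the helper name registry_auth_b64 and the credentials are placeholders, not real values:

```shell
#!/bin/bash
# Build the value for the "auth" field in config.json.
registry_auth_b64() {
    local user="$1" token="$2"
    # printf avoids the trailing newline that echo would add to the encoded input;
    # tr removes the line wrapping some base64 implementations apply to long output
    printf '%s:%s' "$user" "$token" | base64 | tr -d '\n'
}

# Example with placeholder credentials
registry_auth_b64 "user" "token"; echo
# prints dXNlcjp0b2tlbg==
```

Encoding is not encryption: anyone who can read config.json can recover the token, which is why the file is restricted to mode 600.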
Stack Deploy Automation
Template-Based Deployment
Use a template file with environment variable substitution for consistent deployments across environments:
# docker-compose.template.yml
version: "3.8"
services:
  api:
    image: ${IMAGE_TAG}
    deploy:
      replicas: ${API_REPLICAS:-4}
      update_config:
        parallelism: 2
        delay: 15s
        failure_action: rollback
        monitor: 30s
        order: start-first
      resources:
        limits:
          cpus: "${API_CPU_LIMIT:-2.0}"
          memory: ${API_MEM_LIMIT:-1G}
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s
    networks:
      - app
    secrets:
      - db_password
networks:
  app:
    driver: overlay
secrets:
  db_password:
    external: true
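Because envsubst replaces an unset variable with an empty string, a missing IMAGE_TAG would silently render the template with an empty image line. A minimal guard, sketched here with a hypothetical helper named missing_vars, can run before the template is rendered:

```shell
#!/bin/bash
# Print the name of every listed variable that is unset or empty.
# Returns non-zero if at least one is missing.
missing_vars() {
    local name rc=0
    for name in "$@"; do
        # ${!name} is bash indirect expansion: the value of the variable named by $name
        if [ -z "${!name:-}" ]; then
            echo "$name"
            rc=1
        fi
    done
    return "$rc"
}

# Usage before rendering the template:
#   missing_vars IMAGE_TAG || { echo "set the variables listed above" >&2; exit 1; }
```

Only IMAGE_TAG needs this check in the template above, since the other variables carry ${VAR:-default} fallbacks.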
#!/bin/bash
# deploy.sh - Deployment script with verification
set -euo pipefail
STACK_NAME="${1:?Usage: deploy.sh STACK_NAME}"
# envsubst would render an unset IMAGE_TAG as an empty image line, so fail early
: "${IMAGE_TAG:?IMAGE_TAG must be set and exported}"
COMPOSE_TEMPLATE="docker-compose.template.yml"
COMPOSE_FILE="docker-compose.yml"
ROLLBACK_FILE="/tmp/rollback-images.txt"
log() { echo "[$(date '+%H:%M:%S')] $1"; }
# Generate compose file from template
log "Generating compose file..."
envsubst < "$COMPOSE_TEMPLATE" > "$COMPOSE_FILE"
# Record pre-deployment state for rollback
log "Recording pre-deployment state..."
PREVIOUS_IMAGES=$(docker stack services "$STACK_NAME" \
    --format '{{.Name}}={{.Image}}' 2>/dev/null || echo "")
echo "$PREVIOUS_IMAGES" > "$ROLLBACK_FILE"
# Deploy
log "Deploying stack: $STACK_NAME"
docker stack deploy \
    -c "$COMPOSE_FILE" \
    --with-registry-auth \
    "$STACK_NAME"
# Wait for convergence
# Note: replica counts can already match while a rolling update is still replacing tasks
log "Waiting for services to converge..."
MAX_WAIT=300
ELAPSED=0
while [ "$ELAPSED" -lt "$MAX_WAIT" ]; do
    ALL_CONVERGED=true
    # Replicas looks like "3/4"; anything after it (e.g. "(max 1 per node)") lands in _
    while read -r service replicas _; do
        current=${replicas%%/*}
        desired=${replicas##*/}
        if [ "$current" != "$desired" ]; then
            ALL_CONVERGED=false
            log "  Waiting: $service ($current/$desired replicas)"
        fi
    done < <(docker stack services "$STACK_NAME" --format '{{.Name}} {{.Replicas}}')
    if [ "$ALL_CONVERGED" = true ]; then
        log "All services converged."
        break
    fi
    sleep 5
    ELAPSED=$((ELAPSED + 5))
done
if [ "$ELAPSED" -ge "$MAX_WAIT" ]; then
    log "ERROR: Deployment did not converge within ${MAX_WAIT}s"
    log "Initiating rollback..."
    while IFS='=' read -r service image; do
        [ -z "$service" ] && continue
        log "Rolling back $service to $image"
        docker service update --image "$image" "$service" --detach
    done < "$ROLLBACK_FILE"
    exit 1
fi
log "Deployment successful."
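Replica counts alone can report success while a rolling update is still in flight, or after failure_action: rollback has already reverted it. One way to tighten the check, sketched here under the assumption that the service's UpdateStatus.State field is a reliable signal for your Docker version, is to classify that state per service. The function names are hypothetical:

```shell
#!/bin/bash
# Map Swarm's UpdateStatus.State to one of: done, wait, failed.
classify_update_state() {
    case "$1" in
        ""|completed)
            echo "done" ;;      # empty means the service has never been updated
        updating|rollback_started)
            echo "wait" ;;
        paused|rollback_paused|rollback_completed)
            echo "failed" ;;    # the new version did not roll out
        *)
            echo "wait" ;;      # unknown state: keep polling until the caller's timeout
    esac
}

# Read the current update state of one service (empty output if no update has run).
update_state() {
    docker service inspect "$1" \
        --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}'
}

# Usage inside a polling loop:
#   result=$(classify_update_state "$(update_state myapp_api)")
```

Treating rollback_completed as a failure matters for CI: the service is healthy again, but it is running the old image, so the pipeline should not report a successful deployment.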
Testing Strategies
Pre-Deployment Tests
# Run in CI before building the production image
# 1. Unit tests
npm test  # or go test, pytest, etc.
# 2. Integration tests with docker-compose
docker compose -f docker-compose.test.yml up -d
docker compose -f docker-compose.test.yml run --rm app npm run test:integration
docker compose -f docker-compose.test.yml down
# 3. Image vulnerability scanning (non-zero exit code fails the pipeline on findings)
trivy image --exit-code 1 --severity CRITICAL,HIGH myapp/api:latest
# Or use Docker Scout
docker scout cves myapp/api:latest --only-severity critical,high --exit-code
Post-Deployment Verification
#!/bin/bash
# verify-deployment.sh - Post-deploy health check
set -euo pipefail
STACK_NAME="${1:?Usage: verify-deployment.sh STACK_NAME [HEALTH_URL]}"
HEALTH_URL="${2:-http://localhost:8080/health}"
MAX_RETRIES=30
RETRY_INTERVAL=10
log() { echo "[$(date '+%H:%M:%S')] $1"; }
# Check 1: All services have correct replica count
log "Checking replica counts..."
UNHEALTHY_SERVICES=0
# Process substitution keeps the loop in this shell, so the counter survives it
while read -r name replicas _; do
    current=${replicas%%/*}
    desired=${replicas##*/}
    if [ "$current" != "$desired" ]; then
        log "WARN: $name has $current/$desired replicas"
        UNHEALTHY_SERVICES=$((UNHEALTHY_SERVICES + 1))
    else
        log "OK: $name ($replicas)"
    fi
done < <(docker stack services "$STACK_NAME" --format '{{.Name}} {{.Replicas}}')
if [ "$UNHEALTHY_SERVICES" -gt 0 ]; then
    log "WARN: $UNHEALTHY_SERVICES service(s) not at desired replica count"
fi
# Check 2: No tasks in failed state
# Tasks replaced by a normal rolling update are also "shutdown" but carry no error,
# so keep only lines that have an error message after the task name
log "Checking for failed tasks..."
FAILED=$(docker stack services "$STACK_NAME" -q | \
    xargs -I {} docker service ps {} --filter "desired-state=shutdown" \
        --format '{{.Name}} {{.Error}}' 2>/dev/null | \
    awk 'NF > 1' | head -5 || true)
if [ -n "$FAILED" ]; then
    log "WARN: Found failed tasks:"
    echo "$FAILED"
fi
# Check 3: HTTP health check
log "Running HTTP health check..."
for i in $(seq 1 "$MAX_RETRIES"); do
    # curl already prints 000 when it cannot connect; only its exit status needs masking
    STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$HEALTH_URL" || true)
    if [ "$STATUS" = "200" ]; then
        log "Health check passed (HTTP $STATUS)"
        exit 0
    fi
    log "Attempt $i/$MAX_RETRIES: HTTP $STATUS, retrying in ${RETRY_INTERVAL}s..."
    sleep "$RETRY_INTERVAL"
done
log "ERROR: Health check failed after $MAX_RETRIES attempts"
exit 1
Rollback on Failure
There are three rollback strategies:
Strategy 1: Swarm Native Rollback
# Swarm's built-in rollback (reverts to previous task spec)
docker service rollback myapp_api
# This works per-service, not per-stack
# For stack-wide rollback, you need to rollback each service
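Since docker service rollback works per service, a stack-wide rollback is a loop over the stack's services. A minimal sketch, with rollback_stack as a hypothetical helper name:

```shell
#!/bin/bash
# Roll back every service in a stack to its previous spec.
# Returns non-zero if any individual rollback failed.
rollback_stack() {
    local stack="$1" service rc=0
    while read -r service; do
        [ -z "$service" ] && continue
        # A service without a previous spec cannot be rolled back;
        # record the failure and continue with the remaining services
        docker service rollback "$service" </dev/null || rc=1
    done < <(docker stack services "$stack" --format '{{.Name}}')
    return "$rc"
}

# Usage:
#   rollback_stack myapp
```

Each service reverts only one step, to the spec it had before its most recent update, so services that were not changed by the last deployment may fail to roll back or may revert further than intended.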
Strategy 2: Redeploy Previous Image
# Redeploy with the previous image tag
docker service update --image myapp/api:v2.0.0 myapp_api
# Or redeploy the entire stack with the previous compose file
git checkout HEAD~1 -- docker-compose.yml
docker stack deploy -c docker-compose.yml --with-registry-auth myapp
Strategy 3: Automated Rollback in CI/CD
# In GitHub Actions
- name: Deploy with rollback
  uses: appleboy/ssh-action@v1
  with:
    host: ${{ secrets.PROD_MANAGER_HOST }}
    username: deploy
    key: ${{ secrets.PROD_SSH_KEY }}
    script: |
      cd /opt/myapp
      # Save current state
      docker stack services myapp --format '{{.Name}}={{.Image}}' > /tmp/pre-deploy.txt
      # Deploy new version
      docker stack deploy -c docker-compose.yml --with-registry-auth myapp
      # Verify
      if ! /opt/myapp/scripts/verify-deployment.sh myapp http://localhost:8080/health; then
        echo "Deployment failed, rolling back..."
        while IFS='=' read -r service image; do
          # Skip blank lines (e.g. first deployment, when no services existed yet)
          [ -z "$service" ] && continue
          docker service update --image "$image" "$service" --detach
        done < /tmp/pre-deploy.txt
        echo "Rollback complete"
        exit 1
      fi
Environment Promotion
Use the same image across environments. Never rebuild for production. Promote the exact image that was tested:
# Environment-specific configuration via .env files
# staging.env
API_REPLICAS=2
API_CPU_LIMIT=1.0
API_MEM_LIMIT=512M
IMAGE_TAG=ghcr.io/myorg/myapp:abc123
# production.env
API_REPLICAS=6
API_CPU_LIMIT=2.0
API_MEM_LIMIT=1G
IMAGE_TAG=ghcr.io/myorg/myapp:abc123 # Same image!
# Deploy to staging (run against the staging Swarm manager)
set -a && source staging.env && set +a
envsubst < docker-compose.template.yml > docker-compose.yml
docker stack deploy -c docker-compose.yml --with-registry-auth myapp
# Promote to production (same image, different config; run against the production Swarm manager)
set -a && source production.env && set +a
envsubst < docker-compose.template.yml > docker-compose.yml
docker stack deploy -c docker-compose.yml --with-registry-auth myapp
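The "same image" rule is easy to break by editing one env file and forgetting the other. A small pre-promotion check could compare the two pins; this is a sketch with hypothetical helper names, assuming the env file layout shown above:

```shell
#!/bin/bash
# Extract IMAGE_TAG from an env file without sourcing it.
# Stops at whitespace or '#', so a trailing inline comment is dropped.
image_tag_of() {
    sed -n 's/^IMAGE_TAG=\([^[:space:]#]*\).*/\1/p' "$1" | head -n 1
}

# Succeed only if both files pin the same non-empty image reference.
same_image() {
    local a b
    a=$(image_tag_of "$1")
    b=$(image_tag_of "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage before promoting:
#   same_image staging.env production.env || { echo "image mismatch" >&2; exit 1; }
```

Comparing tags catches accidental drift; pinning by digest rather than by tag would additionally guard against a tag being re-pushed between the staging and production deployments.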
Security Considerations
- SSH key rotation: Rotate deploy SSH keys quarterly. Store them in CI/CD secrets, never in code.
- Registry tokens: Use short-lived tokens or service accounts with minimal permissions (push to specific repositories for CI, pull-only for Swarm nodes).
- Secret updates: Secrets declared as external: true, as in the template above, must exist before the stack is deployed, so create them first. Swarm secrets are immutable; to rotate one, create a new secret under a new name and point the service at it.
- Image signing: Enable Docker Content Trust (DOCKER_CONTENT_TRUST=1) to ensure only signed images are deployed.
- Audit trail: Log every deployment with the commit SHA, timestamp, deployer, and environment. Store these logs externally.
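The audit trail item can start as a single structured line per deployment. The sketch below uses a hypothetical audit_line helper and an illustrative key=value layout; where the line is shipped (syslog, an object store, a logging service) is left open:

```shell
#!/bin/bash
# Emit one key=value audit record for a deployment, timestamped in UTC.
audit_line() {
    local environment="$1" sha="$2" deployer="$3"
    printf 'ts=%s env=%s sha=%s deployer=%s\n' \
        "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$environment" "$sha" "$deployer"
}

# Usage (the log path is an assumption; forward the file to external storage):
#   audit_line production abc123 "$USER" >> /var/log/myapp-deploys.log
```

In CI, the commit SHA and deployer would typically come from the platform's predefined variables rather than being typed by hand.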
Complete Pipeline Architecture
| Step | Trigger | Actions | Gate |
|---|---|---|---|
| 1. Test | Every push | Unit tests, lint, type check | Must pass |
| 2. Build | Push to main | Docker build, push to registry | Tests passed |
| 3. Scan | After build | Vulnerability scan (Trivy/Scout) | No critical CVEs |
| 4. Deploy staging | After scan | Stack deploy to staging Swarm | Automatic |
| 5. Verify staging | After deploy | Health checks, smoke tests | Must pass |
| 6. Deploy production | Manual approval | Stack deploy to production Swarm | Manual gate |
| 7. Verify production | After deploy | Health checks, monitoring | Auto-rollback on failure |
Conclusion
CI/CD with Docker Swarm is refreshingly straightforward compared to more complex orchestration platforms. The pipeline is: build, push, SSH, docker stack deploy. The complexity lies not in the deployment mechanism but in the supporting infrastructure: proper testing before deployment, health checks for verification after deployment, automated rollback when things go wrong, and environment promotion to ensure staging and production run the exact same image.
Start with the GitHub Actions or GitLab CI pipeline from this guide, add the verification script, and iterate. The first deployment is manual and scary. The hundredth is automatic and boring. That is the goal.