Docker Swarm Networking Deep Dive: Overlay Networks, Ingress and Service Mesh
Docker Swarm networking is where most operational complexity hides. On the surface, it looks simple: create an overlay network, attach services, and they can talk to each other across nodes. Under the hood, Swarm is managing VXLAN tunnels, a distributed DNS system, an internal load balancer (IPVS), and the ingress routing mesh. Understanding these layers is essential for debugging connectivity issues, optimizing performance, and securing inter-service communication.
This article dissects every networking layer in Docker Swarm, from the data plane to the control plane, with practical examples and troubleshooting techniques you will actually need in production.
Swarm Network Architecture Overview
When you initialize a Swarm, Docker automatically creates two networks:
| Network | Driver | Purpose | Scope |
|---|---|---|---|
| ingress | overlay | Routing mesh for published ports | All Swarm nodes |
| docker_gwbridge | bridge | Connects overlay networks to the host's network stack | Per node |
Every service you create is automatically attached to the ingress network if it publishes ports. For service-to-service communication, you create custom overlay networks.
Overlay Networks
Overlay networks use VXLAN (Virtual Extensible LAN) to encapsulate Layer 2 Ethernet frames inside UDP packets, creating a virtual network that spans all Swarm nodes. Each overlay network gets its own isolated subnet and VXLAN Network Identifier (VNI).
Creating Overlay Networks
# Basic overlay network
docker network create \
--driver overlay \
--subnet 10.10.0.0/24 \
my-app-network
# Encrypted overlay network (IPSec between nodes)
docker network create \
--driver overlay \
--opt encrypted \
--subnet 10.11.0.0/24 \
secure-network
# Internal network (no external access)
docker network create \
--driver overlay \
--internal \
--subnet 10.12.0.0/24 \
db-network
# Attachable network (allows standalone containers)
docker network create \
--driver overlay \
--attachable \
--subnet 10.13.0.0/24 \
debug-network
Understanding VXLAN Encapsulation
When a container on Node A sends a packet to a container on Node B via an overlay network, the following happens:
- The packet exits the container's veth pair into the overlay network's Linux bridge (br0)
- The VXLAN driver encapsulates the entire Ethernet frame inside a UDP packet (destination port 4789)
- The outer UDP packet is routed through the host's physical network to Node B
- Node B's VXLAN driver decapsulates the UDP packet and delivers the original frame to the local overlay bridge
- The frame reaches the destination container through its veth pair
This encapsulation adds approximately 50 bytes of overhead per packet (VXLAN header + outer UDP/IP/Ethernet headers). For most workloads this is negligible, but for high-throughput, small-packet workloads (e.g., memcached, Redis), the overhead can be measurable.
# Create overlay with custom MTU
docker network create \
--driver overlay \
--opt com.docker.network.driver.mtu=1450 \
--subnet 10.10.0.0/24 \
my-app-network
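The 50-byte figure and the 1450-byte MTU above follow from simple arithmetic over the standard VXLAN-over-IPv4 header sizes. A quick sketch:

```python
# Header sizes for VXLAN-over-IPv4 encapsulation (bytes).
ETHERNET = 14  # Ethernet header (appears once inside, once outside)
IPV4 = 20      # IPv4 header
UDP = 8        # UDP header (destination port 4789)
VXLAN = 8      # VXLAN header carrying the 24-bit VNI

# Total bytes added around each inner Ethernet frame on the wire:
overhead = ETHERNET + IPV4 + UDP + VXLAN
assert overhead == 50

# MTU is measured at the IP layer (the link header is excluded), so the
# inner IP MTU shrinks by: inner Ethernet + VXLAN + UDP + outer IP = 50.
physical_mtu = 1500
overlay_mtu = physical_mtu - (ETHERNET + VXLAN + UDP + IPV4)
print(overlay_mtu)  # 1450
```

This is why 1450 is the usual overlay MTU on a 1500-byte physical network; on networks supporting jumbo frames (9000-byte MTU), the overlay MTU can be raised accordingly.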
The Ingress Routing Mesh
The ingress routing mesh is Swarm's built-in load balancer for external traffic. When you publish a port on a service, every node in the Swarm listens on that port, even if the node is not running a replica of that service. Incoming traffic is routed via IPVS (IP Virtual Server) to a node that is running the service.
# Publish port 80 on all nodes, routing to container port 8080
docker service create \
--name web \
--replicas 3 \
--publish published=80,target=8080 \
nginx:latest
# Verify: any node's IP on port 80 reaches the service
curl http://node-1:80
curl http://node-2:80 # Works even if node-2 has no replica
curl http://node-3:80
How the Routing Mesh Works
The routing mesh uses Linux IPVS in NAT mode. When a request arrives at any node on the published port:
- The packet hits an iptables DNAT rule that redirects it to the ingress network's IPVS virtual IP
- IPVS selects a backend (a container running the service) using round-robin by default
- The packet is forwarded via the overlay ingress network to the selected container
- The response follows the reverse path back through the originating node
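The backend selection in step 2 can be illustrated with a minimal round-robin sketch. IPVS itself runs in the kernel; the backend addresses below are made-up task IPs for illustration:

```python
from itertools import cycle

# Hypothetical backend task addresses for a published service.
backends = ["10.255.0.10:8080", "10.255.0.11:8080", "10.255.0.12:8080"]

# The IPVS 'rr' scheduler hands each new connection to the next backend.
scheduler = cycle(backends)

def select_backend():
    """Pick the backend task for a new incoming connection."""
    return next(scheduler)

picks = [select_backend() for _ in range(6)]
print(picks)  # each backend chosen twice, in order
```

Note that this balancing is per-connection, not per-request: a long-lived TCP connection stays pinned to one backend, which is why low-connection-count clients can see uneven load.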
Bypassing the Routing Mesh
For workloads that need client IP preservation or want to avoid the extra hop, you can publish ports in host mode:
# Host mode: each task binds directly to the host's port
docker service create \
--name web-direct \
--mode global \
--publish mode=host,published=80,target=8080 \
nginx:latest
Use --mode global to ensure one replica per node, or use placement constraints to control which nodes run a task. Also note that clients will only reach the service on nodes that are running a replica.
| Feature | Routing Mesh (ingress) | Host Mode |
|---|---|---|
| Client IP visible | No (NAT obscures it) | Yes |
| Any node answers | Yes | Only nodes with replicas |
| Multiple replicas per node | Yes | No (port conflict) |
| Extra network hop | Yes (IPVS forwarding) | No |
| External LB needed | Optional | Required for HA |
Encrypted Overlay Networks
By default, overlay network traffic is unencrypted. Data traverses the physical network as plain-text VXLAN-encapsulated packets. For sensitive traffic, enable encryption:
# Create encrypted overlay
docker network create \
--driver overlay \
--opt encrypted \
secure-backend
# In a stack file
networks:
secure-backend:
driver: overlay
driver_opts:
encrypted: "true"
Encryption uses IPSec ESP (Encapsulating Security Payload) with AES-GCM. The keys are automatically negotiated between nodes using the Swarm's built-in PKI. There is no manual key management required.
Performance impact: Encrypted overlay networks add roughly 10-20% latency overhead and reduce throughput by 15-30% compared to unencrypted overlays. Benchmark your specific workload before enabling encryption cluster-wide. Reserve it for networks carrying sensitive data (database connections, API tokens, PII).
Service Discovery and DNS
Swarm provides built-in DNS-based service discovery. Every service gets a DNS entry that resolves to the service's virtual IP (VIP) or directly to the individual task IPs.
VIP-Based Discovery (Default)
When a service is created, Swarm assigns it a virtual IP on each overlay network it is attached to. The service name resolves to this VIP, and IPVS load balances across all healthy tasks.
# Create two services on the same network
docker network create --driver overlay app-net
docker service create --name api --network app-net --replicas 3 myapp/api
docker service create --name web --network app-net --replicas 2 myapp/web
# From inside a web container:
# 'api' resolves to the VIP, e.g., 10.0.1.5
# IPVS load balances across all 3 api replicas
nslookup api
# Server: 127.0.0.11
# Name: api
# Address: 10.0.1.5
# To see individual task IPs, query tasks.<service-name>
nslookup tasks.api
# Name: tasks.api
# Address: 10.0.1.10
# Address: 10.0.1.11
# Address: 10.0.1.12
DNS Round-Robin Mode
For services that need direct connections to individual tasks (e.g., databases with read replicas, stateful services), you can disable the VIP and use DNS round-robin:
# Create service with DNS round-robin
docker service create \
--name cache \
--network app-net \
--replicas 3 \
--endpoint-mode dnsrr \
redis:7
# 'cache' now resolves to all 3 container IPs directly
# Client-side load balancing is required
nslookup cache
# Address: 10.0.1.20
# Address: 10.0.1.21
# Address: 10.0.1.22
Services using the dnsrr endpoint mode cannot use the ingress routing mesh: you cannot publish ports on a DNS round-robin service. External traffic must reach these services through another service that acts as a proxy.
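With dnsrr, load balancing moves into the client. A minimal sketch of client-side balancing, with a stand-in for the DNS lookup (in a real container on the overlay network, socket.getaddrinfo("cache", 6379) would return one address per running task):

```python
import random

def resolve_tasks(service_name):
    # Stand-in for the real DNS lookup. With endpoint-mode dnsrr,
    # resolving the service name returns every task's IP directly.
    return ["10.0.1.20", "10.0.1.21", "10.0.1.22"]

def pick_task(service_name, rng=random):
    """Re-resolve on each call and pick a random task IP.

    Re-resolving picks up tasks that were rescheduled or scaled,
    at the cost of a DNS lookup per connection."""
    ips = resolve_tasks(service_name)
    return rng.choice(ips)

print(pick_task("cache"))  # one of the three task IPs
```

Real clients (database drivers, gRPC, smart HTTP clients) typically implement this pattern themselves, which is exactly why dnsrr suits them better than a VIP.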
Custom Overlay Networks in Stack Files
A well-structured stack uses multiple overlay networks to isolate traffic between tiers:
version: "3.8"
services:
nginx:
image: nginx:latest
networks:
- frontend
ports:
- "443:443"
deploy:
replicas: 2
api:
image: myapp/api:latest
networks:
- frontend
- backend
deploy:
replicas: 4
worker:
image: myapp/worker:latest
networks:
- backend
- db-net
deploy:
replicas: 6
postgres:
image: postgres:16
networks:
- db-net
deploy:
replicas: 1
networks:
frontend:
driver: overlay
backend:
driver: overlay
driver_opts:
encrypted: "true"
db-net:
driver: overlay
driver_opts:
encrypted: "true"
internal: true # No outbound internet access
In this topology:
- nginx can reach api via the frontend network but cannot reach postgres
- api spans both frontend and backend networks, acting as a bridge
- worker can reach postgres via db-net but has no access to frontend
- postgres is on an internal network with no external connectivity
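The isolation rules above all follow from one invariant: two services can communicate only if they share at least one network. That check can be modeled directly (the attachments mirror the stack file above):

```python
# Network attachments from the stack file above.
attachments = {
    "nginx": {"frontend"},
    "api": {"frontend", "backend"},
    "worker": {"backend", "db-net"},
    "postgres": {"db-net"},
}

def can_reach(a, b):
    """Two services can talk iff their network sets intersect."""
    return bool(attachments[a] & attachments[b])

assert can_reach("nginx", "api")           # share frontend
assert not can_reach("nginx", "postgres")  # no common network
assert can_reach("worker", "postgres")     # share db-net
assert not can_reach("worker", "nginx")    # worker is not on frontend
```

Checking a planned topology this way before deploying is a cheap sanity test that each tier can only reach what it must.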
Load Balancing Internals
Swarm uses two load balancing mechanisms:
Internal Load Balancing (Service VIP)
When a container connects to a service name, the service's VIP receives the connection. The Linux kernel's IPVS module (running in the container's network namespace) distributes connections across backend tasks using round-robin.
# Inspect the IPVS configuration inside a container
docker exec -it $(docker ps -q -f name=web) sh
# Inside the container:
apt-get update && apt-get install -y ipvsadm
ipvsadm -Ln
# Output shows the VIP and its backends:
# TCP 10.0.1.5:8080 rr
# -> 10.0.1.10:8080 Masq 1 0 0
# -> 10.0.1.11:8080 Masq 1 0 0
# -> 10.0.1.12:8080 Masq 1 0 0
External Load Balancing (Ingress)
For published ports, IPVS runs on each node in the ingress network namespace. External traffic hitting any node is distributed to service tasks across the cluster. This is a Layer 4 (TCP/UDP) load balancer, not Layer 7 (HTTP).
For Layer 7 load balancing (path-based routing, TLS termination, host-based routing), deploy a reverse proxy like Traefik, Nginx, or HAProxy as a Swarm service:
version: "3.8"
services:
traefik:
image: traefik:v3.0
command:
- "--providers.swarm.endpoint=unix:///var/run/docker.sock"
- "--entrypoints.web.address=:80"
- "--entrypoints.websecure.address=:443"
ports:
- "80:80"
- "443:443"
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
deploy:
placement:
constraints:
- node.role == manager
networks:
- proxy
api:
image: myapp/api:latest
deploy:
labels:
- "traefik.enable=true"
- "traefik.http.routers.api.rule=Host(`api.example.com`)"
- "traefik.http.services.api.loadbalancer.server.port=8080"
replicas: 4
networks:
- proxy
networks:
proxy:
driver: overlay
Network Troubleshooting
When services cannot communicate, follow this systematic debugging process:
Step 1: Verify Network Attachment
# Check which networks a service is on
docker service inspect --pretty myapp_api | grep -A5 "Networks"
# List all tasks and their nodes
docker service ps myapp_api
# Inspect a specific task's network settings
docker inspect $(docker ps -q -f name=myapp_api.1) \
--format '{{json .NetworkSettings.Networks}}' | jq .
Step 2: Test DNS Resolution
# Exec into a container and test DNS
docker exec -it $(docker ps -q -f name=myapp_web.1) sh
# Test service discovery
nslookup api
nslookup tasks.api
# If DNS fails, check the embedded DNS server
cat /etc/resolv.conf
# Should show: nameserver 127.0.0.11
Step 3: Test Connectivity
# From inside a container on the same network
wget -qO- http://api:8080/health
# or
curl http://api:8080/health
# Test raw TCP connectivity
nc -zv api 8080
Step 4: Check for Port Conflicts and Firewall Issues
# Verify required Swarm ports are open between nodes
# TCP 2377 - Cluster management
# TCP/UDP 7946 - Node communication (gossip)
# UDP 4789 - VXLAN overlay network traffic
# Test from node to node
nc -zv manager-01 2377
nc -zv worker-01 7946
nc -zuv worker-01 4789
Step 5: Inspect Network Internals
# List all overlay networks
docker network ls --filter driver=overlay
# Inspect a network for connected containers
docker network inspect my-app-network
# Check for stale network endpoints (common after node failures)
docker network inspect my-app-network \
--format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{println}}{{end}}'
# Prune unused networks
docker network prune
For deeper debugging, run a nicolaka/netshoot container attached to the overlay network in question. It comes with tcpdump, iperf, nslookup, and other network diagnostic tools pre-installed. Use usulnet to quickly attach diagnostic containers to any network in your Swarm without remembering CLI flags.
# Deploy netshoot as a Swarm service for debugging
docker service create \
--name netshoot \
--network my-app-network \
--constraint 'node.hostname == worker-01' \
--replicas 1 \
nicolaka/netshoot sleep 3600
# Exec into it
docker exec -it $(docker ps -q -f name=netshoot) bash
# Run diagnostics
tcpdump -i eth0 -n port 8080
iperf3 -c api -p 5201
traceroute api
Network Performance Optimization
Several tuning options can improve overlay network performance:
# Increase connection tracking table size on all nodes
sysctl -w net.netfilter.nf_conntrack_max=1048576
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=86400
# Optimize VXLAN performance
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
# Make persistent in /etc/sysctl.d/99-swarm.conf
cat > /etc/sysctl.d/99-swarm.conf << 'EOF'
net.netfilter.nf_conntrack_max=1048576
net.netfilter.nf_conntrack_tcp_timeout_established=86400
net.core.rmem_max=16777216
net.core.wmem_max=16777216
EOF
sysctl --system
Conclusion
Docker Swarm's networking model is deceptively powerful. Overlay networks, the ingress routing mesh, and built-in service discovery handle the vast majority of container networking requirements without external tools. The key to operating it well is understanding what happens at each layer: when a packet is encapsulated and why, how IPVS routes traffic, and where DNS resolution fits in.
For teams managing Swarm clusters, platforms like usulnet provide a visual overlay of your network topology, making it easier to see which services are on which networks and diagnose connectivity issues without digging through CLI output. Combined with the troubleshooting techniques in this guide, you should be able to diagnose and resolve any Swarm networking issue you encounter.