Docker Swarm Networking Deep Dive: Overlay Networks, Ingress and Service Mesh
Docker Swarm networking is where most operational complexity hides. On the surface, it looks simple: create an overlay network, attach services, and they can talk to each other across nodes. Under the hood, Swarm is managing VXLAN tunnels, a distributed DNS system, an internal load balancer (IPVS), and the ingress routing mesh. Understanding these layers is essential for debugging connectivity issues, optimizing performance, and securing inter-service communication.
This article dissects every networking layer in Docker Swarm, from the data plane to the control plane, with practical examples and troubleshooting techniques you will actually need in production.
Swarm Network Architecture Overview
When you initialize a Swarm, Docker automatically creates two networks:
| Network | Driver | Purpose | Scope |
|---|---|---|---|
| ingress | overlay | Routing mesh for published ports | All Swarm nodes |
| docker_gwbridge | bridge | Connects overlay networks to the host's network stack | Per node |
Every service you create is automatically attached to the ingress network if it publishes ports. For service-to-service communication, you create custom overlay networks.
Overlay Networks
Overlay networks use VXLAN (Virtual Extensible LAN) to encapsulate Layer 2 Ethernet frames inside UDP packets, creating a virtual network that spans all Swarm nodes. Each overlay network gets its own isolated subnet and VXLAN Network Identifier (VNI).
Creating Overlay Networks
# Basic overlay network
docker network create \
--driver overlay \
--subnet 10.10.0.0/24 \
my-app-network
# Encrypted overlay network (IPSec between nodes)
docker network create \
--driver overlay \
--opt encrypted \
--subnet 10.11.0.0/24 \
secure-network
# Internal network (no external access)
docker network create \
--driver overlay \
--internal \
--subnet 10.12.0.0/24 \
db-network
# Attachable network (allows standalone containers)
docker network create \
--driver overlay \
--attachable \
--subnet 10.13.0.0/24 \
debug-network
Understanding VXLAN Encapsulation
When a container on Node A sends a packet to a container on Node B via an overlay network, the following happens:
- The packet exits the container's veth pair into the overlay network's Linux bridge (br0)
- The VXLAN driver encapsulates the entire Ethernet frame inside a UDP packet (destination port 4789)
- The outer UDP packet is routed through the host's physical network to Node B
- Node B's VXLAN driver decapsulates the UDP packet and delivers the original frame to the local overlay bridge
- The frame reaches the destination container through its veth pair
This encapsulation adds approximately 50 bytes of overhead per packet (VXLAN header + outer UDP/IP/Ethernet headers). For most workloads this is negligible, but for high-throughput, small-packet workloads (e.g., memcached, Redis), the overhead can be measurable.
# Create overlay with custom MTU
docker network create \
--driver overlay \
--opt com.docker.network.driver.mtu=1450 \
--subnet 10.10.0.0/24 \
my-app-network
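The 50-byte figure and the 1450-byte MTU above follow from simple arithmetic over the standard VXLAN-over-IPv4 header sizes. A quick sketch:

```python
# Header sizes for VXLAN-over-IPv4 encapsulation (bytes).
ETHERNET = 14  # Ethernet header (appears once inside, once outside)
IPV4 = 20      # IPv4 header
UDP = 8        # UDP header (destination port 4789)
VXLAN = 8      # VXLAN header carrying the 24-bit VNI

# Total bytes added around each inner Ethernet frame on the wire:
overhead = ETHERNET + IPV4 + UDP + VXLAN
assert overhead == 50

# MTU is measured at the IP layer (the link header is excluded), so the
# inner IP MTU shrinks by: inner Ethernet + VXLAN + UDP + outer IP = 50.
physical_mtu = 1500
overlay_mtu = physical_mtu - (ETHERNET + VXLAN + UDP + IPV4)
print(overlay_mtu)  # 1450
```

This is why 1450 is the usual overlay MTU on a 1500-byte physical network; on networks supporting jumbo frames (9000-byte MTU), the overlay MTU can be raised accordingly.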
The Ingress Routing Mesh
The ingress routing mesh is Swarm's built-in load balancer for external traffic. When you publish a port on a service, every node in the Swarm listens on that port, even if the node is not running a replica of that service. Incoming traffic is routed via IPVS (IP Virtual Server) to a node that is running the service.
# Publish port 80 on all nodes, routing to container port 8080
docker service create \
--name web \
--replicas 3 \
--publish published=80,target=8080 \
nginx:latest
# Verify: any node's IP on port 80 reaches the service
curl http://node-1:80
curl http://node-2:80 # Works even if node-2 has no replica
curl http://node-3:80
How the Routing Mesh Works
The routing mesh uses Linux IPVS in NAT mode. When a request arrives at any node on the published port:
- The packet hits an iptables DNAT rule that redirects it to the ingress network's IPVS virtual IP
- IPVS selects a backend (a container running the service) using round-robin by default
- The packet is forwarded via the overlay ingress network to the selected container
- The response follows the reverse path back through the originating node
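The backend selection in step 2 can be illustrated with a minimal round-robin sketch. IPVS itself runs in the kernel; the backend addresses below are made-up task IPs for illustration:

```python
from itertools import cycle

# Hypothetical backend task addresses for a published service.
backends = ["10.255.0.10:8080", "10.255.0.11:8080", "10.255.0.12:8080"]

# The IPVS 'rr' scheduler hands each new connection to the next backend.
scheduler = cycle(backends)

def select_backend():
    """Pick the backend task for a new incoming connection."""
    return next(scheduler)

picks = [select_backend() for _ in range(6)]
print(picks)  # each backend chosen twice, in order
```

Note that this balancing is per-connection, not per-request: a long-lived TCP connection stays pinned to one backend, which is why low-connection-count clients can see uneven load.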
Bypassing the Routing Mesh
For workloads that need client IP preservation or want to avoid the extra hop, you can publish ports in host mode:
# Host mode: each task binds directly to the host's port
docker service create \
--name web-direct \
--mode global \
--publish mode=host,published=80,target=8080 \
nginx:latest
Use --mode global to ensure one replica per node, or use placement constraints to control which nodes run a task. Also note that clients will only reach the service on nodes that are running a replica.
| Feature | Routing Mesh (ingress) | Host Mode |
|---|---|---|
| Client IP visible | No (NAT obscures it) | Yes |
| Any node answers | Yes | Only nodes with replicas |
| Multiple replicas per node | Yes | No (port conflict) |
| Extra network hop | Yes (IPVS forwarding) | No |
| External LB needed | Optional | Required for HA |
Encrypted Overlay Networks
By default, overlay network traffic is unencrypted. Data traverses the physical network as plain-text VXLAN-encapsulated packets. For sensitive traffic, enable encryption:
# Create encrypted overlay
docker network create \
--driver overlay \
--opt encrypted \
secure-backend
# In a stack file
networks:
secure-backend:
driver: overlay
driver_opts:
encrypted: "true"
Encryption uses IPSec ESP (Encapsulating Security Payload) with AES-GCM. The keys are automatically negotiated between nodes using the Swarm's built-in PKI. There is no manual key management required.
Performance impact: Encrypted overlay networks add roughly 10-20% latency overhead and reduce throughput by 15-30% compared to unencrypted overlays. Benchmark your specific workload before enabling encryption cluster-wide. Reserve it for networks carrying sensitive data (database connections, API tokens, PII).
Service Discovery and DNS
Swarm provides built-in DNS-based service discovery. Every service gets a DNS entry that resolves to the service's virtual IP (VIP) or directly to the individual task IPs.
VIP-Based Discovery (Default)
When a service is created, Swarm assigns it a virtual IP on each overlay network it is attached to. The service name resolves to this VIP, and IPVS load balances across all healthy tasks.
# Create two services on the same network
docker network create --driver overlay app-net
docker service create --name api --network app-net --replicas 3 myapp/api
docker service create --name web --network app-net --replicas 2 myapp/web
# From inside a web container:
# 'api' resolves to the VIP, e.g., 10.0.1.5
# IPVS load balances across all 3 api replicas
nslookup api
# Server: 127.0.0.11
# Name: api
# Address: 10.0.1.5
# To see individual task IPs, query tasks.<service-name>
nslookup tasks.api
# Name: tasks.api
# Address: 10.0.1.10
# Address: 10.0.1.11
# Address: 10.0.1.12
DNS Round-Robin Mode
For services that need direct connections to individual tasks (e.g., databases with read replicas, stateful services), you can disable the VIP and use DNS round-robin:
# Create service with DNS round-robin
docker service create \
--name cache \
--network app-net \
--replicas 3 \
--endpoint-mode dnsrr \
redis:7
# 'cache' now resolves to all 3 container IPs directly
# Client-side load balancing is required
nslookup cache
# Address: 10.0.1.20
# Address: 10.0.1.21
# Address: 10.0.1.22
Services using the dnsrr endpoint mode cannot use the ingress routing mesh: you cannot publish ports on a DNS round-robin service. External traffic must reach these services through another service that acts as a proxy.
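With dnsrr, load balancing moves into the client. A minimal sketch of client-side balancing, with a stand-in for the DNS lookup (in a real container on the overlay network, socket.getaddrinfo("cache", 6379) would return one address per running task):

```python
import random

def resolve_tasks(service_name):
    # Stand-in for the real DNS lookup. With endpoint-mode dnsrr,
    # resolving the service name returns every task's IP directly.
    return ["10.0.1.20", "10.0.1.21", "10.0.1.22"]

def pick_task(service_name, rng=random):
    """Re-resolve on each call and pick a random task IP.

    Re-resolving picks up tasks that were rescheduled or scaled,
    at the cost of a DNS lookup per connection."""
    ips = resolve_tasks(service_name)
    return rng.choice(ips)

print(pick_task("cache"))  # one of the three task IPs
```

Real clients (database drivers, gRPC, smart HTTP clients) typically implement this pattern themselves, which is exactly why dnsrr suits them better than a VIP.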
Custom Overlay Networks in Stack Files
A well-structured stack uses multiple overlay networks to isolate traffic between tiers:
version: "3.8"
services:
nginx:
image: nginx:latest
networks:
- frontend
ports:
- "443:443"
deploy:
replicas: 2
api:
image: myapp/api:latest
networks:
- frontend
- backend
deploy:
replicas: 4
worker:
image: myapp/worker:latest
networks:
- backend
- db-net
deploy:
replicas: 6
postgres:
image: postgres:16
networks:
- db-net
deploy:
replicas: 1
networks:
frontend:
driver: overlay
backend:
driver: overlay
driver_opts:
encrypted: "true"
db-net:
driver: overlay
driver_opts:
encrypted: "true"
internal: true # No outbound internet access
In this topology:
- nginx can reach api via the frontend network but cannot reach postgres
- api spans both frontend and backend networks, acting as a bridge
- worker can reach postgres via db-net but has no access to frontend
- postgres is on an internal network with no external connectivity
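The isolation rules above all follow from one invariant: two services can communicate only if they share at least one network. That check can be modeled directly (the attachments mirror the stack file above):

```python
# Network attachments from the stack file above.
attachments = {
    "nginx": {"frontend"},
    "api": {"frontend", "backend"},
    "worker": {"backend", "db-net"},
    "postgres": {"db-net"},
}

def can_reach(a, b):
    """Two services can talk iff their network sets intersect."""
    return bool(attachments[a] & attachments[b])

assert can_reach("nginx", "api")           # share frontend
assert not can_reach("nginx", "postgres")  # no common network
assert can_reach("worker", "postgres")     # share db-net
assert not can_reach("worker", "nginx")    # worker is not on frontend
```

Checking a planned topology this way before deploying is a cheap sanity test that each tier can only reach what it must.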
Load Balancing Internals
Swarm uses two load balancing mechanisms:
Internal Load Balancing (Service VIP)
When a container connects to a service name, the service's VIP receives the connection. The Linux kernel's IPVS module (running in the container's network namespace) distributes connections across backend tasks using round-robin.
# Inspect the IPVS configuration inside a container
docker exec -it $(docker ps -q -f name=web) sh
# Inside the container:
apt-get update && apt-get install -y ipvsadm
ipvsadm -Ln
# Output shows the VIP and its backends:
# TCP 10.0.1.5:8080 rr
# -> 10.0.1.10:8080 Masq 1 0 0
# -> 10.0.1.11:8080 Masq 1 0 0
# -> 10.0.1.12:8080 Masq 1 0 0
External Load Balancing (Ingress)
For published ports, IPVS runs on each node in the ingress network namespace. External traffic hitting any node is distributed to service tasks across the cluster. This is a Layer 4 (TCP/UDP) load balancer, not Layer 7 (HTTP).
For Layer 7 load balancing (path-based routing, TLS termination, host-based routing), deploy a reverse proxy like Traefik, Nginx, or HAProxy as a Swarm service:
version: "3.8"
services:
traefik:
image: traefik:v3.0
command:
- "--providers.swarm.endpoint=unix:///var/run/docker.sock"
- "--entrypoints.web.address=:80"
- "--entrypoints.websecure.address=:443"
ports:
- "80:80"
- "443:443"
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
deploy:
placement:
constraints:
- node.role == manager
networks:
- proxy
api:
image: myapp/api:latest
deploy:
labels:
- "traefik.enable=true"
- "traefik.http.routers.api.rule=Host(`api.example.com`)"
- "traefik.http.services.api.loadbalancer.server.port=8080"
replicas: 4
networks:
- proxy
networks:
proxy:
driver: overlay
Network Troubleshooting
When services cannot communicate, follow this systematic debugging process:
Step 1: Verify Network Attachment
# Check which networks a service is on
docker service inspect --pretty myapp_api | grep -A5 "Networks"
# List all tasks and their nodes
docker service ps myapp_api
# Inspect a specific task's network settings
docker inspect $(docker ps -q -f name=myapp_api.1) \
--format '{{json .NetworkSettings.Networks}}' | jq .
Step 2: Test DNS Resolution
# Exec into a container and test DNS
docker exec -it $(docker ps -q -f name=myapp_web.1) sh
# Test service discovery
nslookup api
nslookup tasks.api
# If DNS fails, check the embedded DNS server
cat /etc/resolv.conf
# Should show: nameserver 127.0.0.11
Step 3: Test Connectivity
# From inside a container on the same network
wget -qO- http://api:8080/health
# or
curl http://api:8080/health
# Test raw TCP connectivity
nc -zv api 8080
Step 4: Check for Port Conflicts and Firewall Issues
# Verify required Swarm ports are open between nodes
# TCP 2377 - Cluster management
# TCP/UDP 7946 - Node communication (gossip)
# UDP 4789 - VXLAN overlay network traffic
# Test from node to node
nc -zv manager-01 2377
nc -zv worker-01 7946
nc -zuv worker-01 4789
Step 5: Inspect Network Internals
# List all overlay networks
docker network ls --filter driver=overlay
# Inspect a network for connected containers
docker network inspect my-app-network
# Check for stale network endpoints (common after node failures)
docker network inspect my-app-network \
--format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{println}}{{end}}'
# Prune unused networks
docker network prune
For deeper debugging, run a nicolaka/netshoot container attached to the overlay network in question. It comes with tcpdump, iperf, nslookup, and other network diagnostic tools pre-installed. Use usulnet to quickly attach diagnostic containers to any network in your Swarm without remembering CLI flags.
# Deploy netshoot as a Swarm service for debugging
docker service create \
--name netshoot \
--network my-app-network \
--constraint 'node.hostname == worker-01' \
--replicas 1 \
nicolaka/netshoot sleep 3600
# Exec into it
docker exec -it $(docker ps -q -f name=netshoot) bash
# Run diagnostics
tcpdump -i eth0 -n port 8080
iperf3 -c api -p 5201
traceroute api
Network Performance Optimization
Several tuning options can improve overlay network performance:
# Increase connection tracking table size on all nodes
sysctl -w net.netfilter.nf_conntrack_max=1048576
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=86400
# Optimize VXLAN performance
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
# Make persistent in /etc/sysctl.d/99-swarm.conf
cat > /etc/sysctl.d/99-swarm.conf << 'EOF'
net.netfilter.nf_conntrack_max=1048576
net.netfilter.nf_conntrack_tcp_timeout_established=86400
net.core.rmem_max=16777216
net.core.wmem_max=16777216
EOF
sysctl --system
Conclusion
Docker Swarm's networking model is deceptively powerful. Overlay networks, the ingress routing mesh, and built-in service discovery handle the vast majority of container networking requirements without external tools. The key to operating it well is understanding what happens at each layer: when a packet is encapsulated and why, how IPVS routes traffic, and where DNS resolution fits in.
For teams managing Swarm clusters, platforms like usulnet provide a visual overlay of your network topology, making it easier to see which services are on which networks and diagnose connectivity issues without digging through CLI output. Combined with the troubleshooting techniques in this guide, you should be able to diagnose and resolve any Swarm networking issue you encounter.