Linux Performance Tuning: Optimizing Your Server for Maximum Throughput
A default Linux installation is tuned for a general-purpose workload that suits nobody particularly well. A database server needs different tuning than a web server. A Docker host running dozens of containers has different requirements than a bare-metal application. The kernel's default parameters are conservative, designed for compatibility rather than peak performance.
This guide covers the key areas of Linux performance tuning, from hardware-level CPU settings through kernel parameters to application-level profiling. Every recommendation includes the reasoning behind it, so you can make informed decisions for your specific workload.
CPU Performance: Governors and Scheduling
CPU Frequency Governors
CPU governors control how the processor scales its frequency. The wrong governor can leave performance on the table or waste power:
# Check current governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# List available governors
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
# performance powersave ondemand conservative schedutil
# Set governor for all CPUs
for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
echo "performance" > "$cpu"
done
# Or use cpupower
cpupower frequency-set -g performance
# Check current frequencies
cpupower frequency-info
watch -n 1 'grep "MHz" /proc/cpuinfo'
| Governor | Behavior | Use Case |
|---|---|---|
| performance | Always maximum frequency | Latency-sensitive servers, databases |
| schedutil | Scheduler-driven scaling | Modern default, good balance |
| ondemand | Scale up on load, down on idle | Variable workloads |
| powersave | Always minimum frequency | Battery-powered or cost-optimized |
| conservative | Gradual scaling | Steady workloads |
For latency-sensitive servers, use the performance governor. The power savings from frequency scaling are negligible compared to the latency cost, especially for database and web workloads.
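That recommendation can be wired into a quick audit. A minimal sketch, with the governor value hard-coded where a real script would read it from sysfs:

```shell
# Hypothetical check: warn when the host is not on the performance governor.
# "governor" stands in for: cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
governor="schedutil"
if [ "$governor" != "performance" ]; then
  echo "governor is '$governor'; consider: cpupower frequency-set -g performance"
else
  echo "governor already set to performance"
fi
```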
CPU Affinity and Isolation
# Pin a process to specific CPUs
taskset -c 0,1 /opt/myapp/bin/server
# Isolate CPUs from the scheduler (in kernel command line)
# GRUB: isolcpus=2,3,4,5
# These CPUs will only run tasks explicitly assigned to them
# With systemd, use CPUAffinity
[Service]
CPUAffinity=0 1 2 3
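Whichever mechanism you use, the kernel reports the resulting mask in /proc, so the pinning can be verified without extra tools. A small sketch:

```shell
# Cpus_allowed_list in /proc/<pid>/status is the CPU list the scheduler
# may use for that process; after "taskset -c 0,1" it would read "0-1".
affinity=$(awk '/^Cpus_allowed_list/ {print $2}' /proc/self/status)
echo "current shell may run on CPUs: $affinity"
```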
I/O Schedulers
The I/O scheduler determines how disk read/write requests are ordered and merged:
# Check current scheduler
cat /sys/block/sda/queue/scheduler
# [mq-deadline] kyber bfq none
# Change scheduler
echo "kyber" > /sys/block/sda/queue/scheduler
# Make persistent via udev rule
# /etc/udev/rules.d/60-scheduler.rules
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="0", \
ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="1", \
ATTR{queue/scheduler}="mq-deadline"
| Scheduler | Best For | Description |
|---|---|---|
| none | NVMe SSDs | No scheduling overhead, best for fast storage |
| mq-deadline | SATA SSDs, databases | Deadline-based, prevents starvation |
| kyber | Fast SSDs, mixed workloads | Low-overhead, latency-targeted |
| bfq | HDDs, interactive use | Budget Fair Queueing, per-process fairness |
Memory Management
Swappiness
# Check current swappiness (0-100 on older kernels, 0-200 since Linux 5.8; default 60)
cat /proc/sys/vm/swappiness
# For servers with enough RAM, reduce swappiness
# This reduces swap usage but does not disable it
sysctl vm.swappiness=10
# For database servers (PostgreSQL, MySQL), very low swappiness
sysctl vm.swappiness=1
# Make persistent
echo "vm.swappiness = 10" >> /etc/sysctl.d/99-performance.conf
Dirty Page Ratios
# How much dirty (unwritten) data to allow before flushing to disk
# Default: 20% of RAM (dirty_ratio), start flushing at 10% (dirty_background_ratio)
# For write-heavy workloads (databases, logging)
sysctl vm.dirty_ratio=15
sysctl vm.dirty_background_ratio=5
sysctl vm.dirty_expire_centisecs=3000
sysctl vm.dirty_writeback_centisecs=500
# For SSDs, flush more aggressively
sysctl vm.dirty_ratio=10
sysctl vm.dirty_background_ratio=3
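To build intuition for what these percentages mean in absolute terms, the ratio can be converted to bytes for a given RAM size. An illustration with an example 16 GiB figure (on a live host, MemTotal comes from /proc/meminfo):

```shell
# Example numbers only: 16 GiB of RAM and dirty_ratio=15.
mem_kb=16777216        # MemTotal in KiB (16 GiB)
dirty_ratio=15
dirty_bytes=$(( mem_kb * 1024 * dirty_ratio / 100 ))
echo "dirty_ratio=${dirty_ratio}% allows ~$(( dirty_bytes / 1024 / 1024 )) MiB of dirty pages"
```

On the same 16 GiB host the default 20% would hold roughly 3.2 GiB of unwritten data, which is why write-heavy hosts lower the ratio.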
Transparent Huge Pages (THP)
# Check THP status
cat /sys/kernel/mm/transparent_hugepage/enabled
# [always] madvise never
# For most servers and databases, disable THP
# (THP can cause latency spikes due to compaction)
echo "never" > /sys/kernel/mm/transparent_hugepage/enabled
echo "never" > /sys/kernel/mm/transparent_hugepage/defrag
# For Redis specifically, always disable THP
# For PostgreSQL, disable or use madvise
echo "madvise" > /sys/kernel/mm/transparent_hugepage/enabled
# Make persistent via systemd service or kernel parameter
# Kernel command line: transparent_hugepage=never
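One common persistence approach is a oneshot systemd unit that rewrites the sysfs knobs at boot. A sketch (the unit name disable-thp.service is illustrative):

```ini
# /etc/systemd/system/disable-thp.service (hypothetical unit name)
[Unit]
Description=Disable Transparent Huge Pages
After=sysinit.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/defrag'

[Install]
WantedBy=multi-user.target
```

Enable it with systemctl daemon-reload && systemctl enable --now disable-thp.service.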
Filesystem Mount Options
# Check current mount options
mount | grep "^/dev"
findmnt -l
# Optimized mount options for ext4
# /etc/fstab entry:
/dev/sda2 / ext4 defaults,noatime,nodiratime,commit=60 0 1
# Optimized for XFS
/dev/sda2 / xfs defaults,noatime,nodiratime,logbufs=8,logbsize=256k 0 1
# For /tmp (tmpfs in RAM)
tmpfs /tmp tmpfs defaults,noatime,nosuid,nodev,size=4G 0 0
| Option | Effect | Recommendation |
|---|---|---|
| noatime | Do not update access time on reads | Always enable on servers |
| nodiratime | Do not update directory access time | Implied by noatime; harmless to list |
| commit=60 | Flush journal every 60 seconds (ext4) | Higher = better performance, more risk |
| data=writeback | Metadata journaled, data not (ext4) | Faster writes, slight corruption risk |
| discard | Enable TRIM for SSDs | Use fstrim.timer instead (less overhead) |
| barrier=0 | Disable write barriers | Only with battery-backed RAID controller |
Network Tuning
TCP Buffer Sizes
# /etc/sysctl.d/99-network-performance.conf
# Increase socket buffer sizes
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 1048576
net.core.wmem_default = 1048576
# TCP buffer auto-tuning (min, default, max)
net.ipv4.tcp_rmem = 4096 1048576 16777216
net.ipv4.tcp_wmem = 4096 1048576 16777216
# Increase connection backlog
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65536
net.ipv4.tcp_max_syn_backlog = 65535
# Enable TCP Fast Open
net.ipv4.tcp_fastopen = 3
# Reuse sockets in TIME_WAIT for new outgoing connections
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
# Increase local port range
net.ipv4.ip_local_port_range = 1024 65535
# Enable BBR congestion control (if available)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
# Verify BBR is active
sysctl net.ipv4.tcp_congestion_control
# net.ipv4.tcp_congestion_control = bbr
# Check available congestion control algorithms
sysctl net.ipv4.tcp_available_congestion_control
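Before switching, it is worth confirming bbr actually appears in that list; on some kernels the tcp_bbr module must be loaded first. A sketch with the sysctl output mocked as a string (on a live host, substitute the real sysctl -n output):

```shell
# "available" stands in for: sysctl -n net.ipv4.tcp_available_congestion_control
available="reno cubic bbr"
case " $available " in
  *" bbr "*) echo "bbr available" ;;
  *)         echo "bbr missing; try: modprobe tcp_bbr" ;;
esac
```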
Connection Tracking (for Docker/Firewall Hosts)
# Increase conntrack table size
net.netfilter.nf_conntrack_max = 1048576
net.nf_conntrack_max = 1048576   # legacy alias of the same setting on older kernels
# Reduce conntrack timeouts
net.netfilter.nf_conntrack_tcp_timeout_established = 600
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
# Increase hash table size (set at boot via modprobe)
echo "options nf_conntrack hashsize=262144" > /etc/modprobe.d/nf_conntrack.conf
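Sizing the table is easier with a utilization figure. An illustration with example numbers (on a live host, read the real values from /proc/sys/net/netfilter/nf_conntrack_count and /proc/sys/net/netfilter/nf_conntrack_max):

```shell
# Example numbers only: how full the conntrack table is.
count=524288      # stands in for nf_conntrack_count
max=1048576       # stands in for nf_conntrack_max
pct=$(( count * 100 / max ))
echo "conntrack table ${pct}% full"
if [ "$pct" -ge 80 ]; then
  echo "consider raising nf_conntrack_max"
fi
```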
Profiling and Monitoring Tools
htop -- Interactive Process Viewer
# Install
pacman -S htop # Arch
apt install htop # Debian/Ubuntu
# Useful htop keybindings
# F5 - Tree view (show parent/child relationships)
# F6 - Sort by column
# F9 - Kill process
# H - Toggle user threads
# K - Toggle kernel threads
# M - Sort by memory
# P - Sort by CPU
vmstat -- Virtual Memory Statistics
# Report every 2 seconds, 10 times
vmstat 2 10
# Key columns:
# r - Processes waiting for CPU
# b - Processes in uninterruptible sleep (I/O wait)
# si/so - Swap in/out (should be near zero)
# bi/bo - Block I/O in/out
# us - User CPU%
# sy - System CPU%
# wa - I/O wait% (high = disk bottleneck)
# id - Idle%
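The si/so columns lend themselves to a quick automated check. A sketch using a canned vmstat line in place of live output:

```shell
# "line" stands in for: vmstat 1 2 | tail -1
line="1 0 0 803536 21544 498732 0 0 5 12 110 220 3 1 92 4 0"
si=$(echo "$line" | awk '{print $7}')   # pages swapped in per second
so=$(echo "$line" | awk '{print $8}')   # pages swapped out per second
if [ "$si" -gt 0 ] || [ "$so" -gt 0 ]; then
  echo "swap activity detected (si=$si so=$so): likely memory pressure"
else
  echo "no swap activity"
fi
```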
iotop -- I/O Monitoring
# Show per-process I/O usage
iotop -o # Only show processes doing I/O
iotop -a # Accumulated I/O (total since start)
iotop -P # Show only processes (not threads)
perf -- Linux Performance Profiler
# Install
apt install linux-tools-common linux-tools-$(uname -r)   # Ubuntu; plain Debian packages it as linux-perf
# Profile system-wide for 10 seconds
perf record -ag -- sleep 10
perf report
# Top-like live CPU profiling
perf top
# Count cache misses
perf stat -e cache-misses,cache-references ./myapp
# Profile a specific command
perf record -g ./myapp --benchmark
perf report --stdio
# Generate flame graph data
perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg
Quick Diagnostic Commands
# System overview
uptime # Load averages
free -h # Memory usage
df -h # Disk space
iostat -x 2 # Disk I/O statistics
mpstat -P ALL 2 # Per-CPU statistics
sar -n DEV 2 # Network throughput
# Find resource hogs
ps aux --sort=-%mem | head -20 # Top memory consumers
ps aux --sort=-%cpu | head -20 # Top CPU consumers
# Check for I/O bottleneck
iowait=$(vmstat 1 2 | tail -1 | awk '{print $16}')
echo "I/O wait: ${iowait}%"
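The load averages from uptime are only meaningful relative to the CPU count. A sketch of that comparison (reads the live /proc/loadavg, so output varies by host):

```shell
# Load per CPU above ~1.0 usually means the run queue is saturated.
load=$(awk '{print $1}' /proc/loadavg)
cpus=$(nproc)
echo "1-min load ${load} across ${cpus} CPUs"
if awk -v l="$load" -v c="$cpus" 'BEGIN { exit !(l > c) }'; then
  echo "run queue saturated"
else
  echo "headroom available"
fi
```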
Putting It All Together: Performance Profile
Here is a complete sysctl configuration for a high-performance web/Docker server:
# /etc/sysctl.d/99-performance.conf
# CPU and Memory
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
vm.vfs_cache_pressure = 50
# Network Performance
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65536
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 1048576 16777216
net.ipv4.tcp_wmem = 4096 1048576 16777216
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.ip_local_port_range = 1024 65535
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
# File Descriptors
fs.file-max = 2097152
fs.nr_open = 2097152
fs.inotify.max_user_watches = 524288
# Docker / Container Networking
net.ipv4.ip_forward = 1
# Requires the br_netfilter kernel module (modprobe br_netfilter)
net.bridge.bridge-nf-call-iptables = 1
net.netfilter.nf_conntrack_max = 1048576
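After installing the file, apply it with sysctl --system (needs root) and spot-check a few keys. The read-back below goes straight through /proc/sys, so it works even where the sysctl binary is missing:

```shell
# Read back a couple of the values the config above sets.
for key in vm/swappiness net/core/somaxconn; do
  printf '%s = %s\n' "$(echo "$key" | tr / .)" "$(cat "/proc/sys/$key")"
done
```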
For Docker hosts specifically, monitoring container resource usage is critical. usulnet provides real-time CPU, memory, and I/O metrics for every container, alerting you when resource limits are approached so you can tune before performance degrades.
Measure first, tune second: Never apply tuning parameters blindly. Profile your workload, identify the bottleneck (CPU, memory, I/O, or network), and then apply targeted optimizations. Re-measure after each change to verify the improvement. Tuning without measurement is superstition, not engineering.