Linux Performance Tuning: Optimizing Your Server for Maximum Throughput
A default Linux installation is tuned for a general-purpose workload that suits nobody particularly well. A database server needs different tuning than a web server. A Docker host running dozens of containers has different requirements than a bare-metal application. The kernel's default parameters are conservative, designed for compatibility rather than peak performance.
This guide covers the key areas of Linux performance tuning, from hardware-level CPU settings through kernel parameters to application-level profiling. Every recommendation includes the reasoning behind it, so you can make informed decisions for your specific workload.
CPU Performance: Governors and Scheduling
CPU Frequency Governors
CPU governors control how the processor scales its frequency. The wrong governor can leave performance on the table or waste power:
# Check current governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# List available governors
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
# performance powersave ondemand conservative schedutil
# Set governor for all CPUs
for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
echo "performance" > "$cpu"
done
# Or use cpupower
cpupower frequency-set -g performance
# Check current frequencies
cpupower frequency-info
watch -n 1 'grep "MHz" /proc/cpuinfo'
| Governor | Behavior | Use Case |
|---|---|---|
| performance | Always maximum frequency | Latency-sensitive servers, databases |
| schedutil | Scheduler-driven scaling | Modern default, good balance |
| ondemand | Scale up on load, down on idle | Variable workloads |
| powersave | Always minimum frequency | Battery-powered or cost-optimized |
| conservative | Gradual scaling | Steady workloads |
For latency-sensitive servers, use the performance governor. The power savings from frequency scaling are negligible compared to the latency cost, especially for database and web workloads.
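That recommendation can be wired into a quick audit. A minimal sketch, with the governor value hard-coded where a real script would read it from sysfs:

```shell
# Hypothetical check: warn when the host is not on the performance governor.
# "governor" stands in for: cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
governor="schedutil"
if [ "$governor" != "performance" ]; then
  echo "governor is '$governor'; consider: cpupower frequency-set -g performance"
else
  echo "governor already set to performance"
fi
```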
CPU Affinity and Isolation
# Pin a process to specific CPUs
taskset -c 0,1 /opt/myapp/bin/server
# Isolate CPUs from the scheduler (in kernel command line)
# GRUB: isolcpus=2,3,4,5
# These CPUs will only run tasks explicitly assigned to them
# With systemd, use CPUAffinity
[Service]
CPUAffinity=0 1 2 3
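Whichever mechanism you use, the kernel reports the resulting mask in /proc, so the pinning can be verified without extra tools. A small sketch:

```shell
# Cpus_allowed_list in /proc/<pid>/status is the CPU list the scheduler
# may use for that process; after "taskset -c 0,1" it would read "0-1".
affinity=$(awk '/^Cpus_allowed_list/ {print $2}' /proc/self/status)
echo "current shell may run on CPUs: $affinity"
```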
I/O Schedulers
The I/O scheduler determines how disk read/write requests are ordered and merged:
# Check current scheduler
cat /sys/block/sda/queue/scheduler
# [mq-deadline] kyber bfq none
# Change scheduler
echo "kyber" > /sys/block/sda/queue/scheduler
# Make persistent via udev rule
# /etc/udev/rules.d/60-scheduler.rules
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="0", \
ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="1", \
ATTR{queue/scheduler}="mq-deadline"
| Scheduler | Best For | Description |
|---|---|---|
| none | NVMe SSDs | No scheduling overhead, best for fast storage |
| mq-deadline | SATA SSDs, databases | Deadline-based, prevents starvation |
| kyber | Fast SSDs, mixed workloads | Low-overhead, latency-targeted |
| bfq | HDDs, interactive use | Budget Fair Queueing, per-process fairness |
Memory Management
Swappiness
# Check current swappiness (0-100 on older kernels, 0-200 since Linux 5.8; default 60)
cat /proc/sys/vm/swappiness
# For servers with enough RAM, reduce swappiness
# This reduces swap usage but does not disable it
sysctl vm.swappiness=10
# For database servers (PostgreSQL, MySQL), very low swappiness
sysctl vm.swappiness=1
# Make persistent
echo "vm.swappiness = 10" >> /etc/sysctl.d/99-performance.conf
Dirty Page Ratios
# How much dirty (unwritten) data to allow before flushing to disk
# Default: 20% of RAM (dirty_ratio), start flushing at 10% (dirty_background_ratio)
# For write-heavy workloads (databases, logging)
sysctl vm.dirty_ratio=15
sysctl vm.dirty_background_ratio=5
sysctl vm.dirty_expire_centisecs=3000
sysctl vm.dirty_writeback_centisecs=500
# For SSDs, flush more aggressively
sysctl vm.dirty_ratio=10
sysctl vm.dirty_background_ratio=3
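To build intuition for what these percentages mean in absolute terms, the ratio can be converted to bytes for a given RAM size. An illustration with an example 16 GiB figure (on a live host, MemTotal comes from /proc/meminfo):

```shell
# Example numbers only: 16 GiB of RAM and dirty_ratio=15.
mem_kb=16777216        # MemTotal in KiB (16 GiB)
dirty_ratio=15
dirty_bytes=$(( mem_kb * 1024 * dirty_ratio / 100 ))
echo "dirty_ratio=${dirty_ratio}% allows ~$(( dirty_bytes / 1024 / 1024 )) MiB of dirty pages"
```

On the same 16 GiB host the default 20% would hold roughly 3.2 GiB of unwritten data, which is why write-heavy hosts lower the ratio.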
Transparent Huge Pages (THP)
# Check THP status
cat /sys/kernel/mm/transparent_hugepage/enabled
# [always] madvise never
# For most servers and databases, disable THP
# (THP can cause latency spikes due to compaction)
echo "never" > /sys/kernel/mm/transparent_hugepage/enabled
echo "never" > /sys/kernel/mm/transparent_hugepage/defrag
# For Redis specifically, always disable THP
# For PostgreSQL, disable or use madvise
echo "madvise" > /sys/kernel/mm/transparent_hugepage/enabled
# Make persistent via systemd service or kernel parameter
# Kernel command line: transparent_hugepage=never
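One common persistence approach is a oneshot systemd unit that rewrites the sysfs knobs at boot. A sketch (the unit name disable-thp.service is illustrative):

```ini
# /etc/systemd/system/disable-thp.service (hypothetical unit name)
[Unit]
Description=Disable Transparent Huge Pages
After=sysinit.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/defrag'

[Install]
WantedBy=multi-user.target
```

Enable it with systemctl daemon-reload && systemctl enable --now disable-thp.service.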
Filesystem Mount Options
# Check current mount options
mount | grep "^/dev"
findmnt -l
# Optimized mount options for ext4
# /etc/fstab entry:
/dev/sda2 / ext4 defaults,noatime,nodiratime,commit=60 0 1
# Optimized for XFS
/dev/sda2 / xfs defaults,noatime,nodiratime,logbufs=8,logbsize=256k 0 1
# For /tmp (tmpfs in RAM)
tmpfs /tmp tmpfs defaults,noatime,nosuid,nodev,size=4G 0 0
| Option | Effect | Recommendation |
|---|---|---|
| noatime | Do not update access time on reads | Always enable on servers |
| nodiratime | Do not update directory access time | Implied by noatime; harmless to list |
| commit=60 | Flush journal every 60 seconds (ext4) | Higher = better performance, more risk |
| data=writeback | Metadata journaled, data not (ext4) | Faster writes, slight corruption risk |
| discard | Enable TRIM for SSDs | Use fstrim.timer instead (less overhead) |
| barrier=0 | Disable write barriers | Only with battery-backed RAID controller |
Network Tuning
TCP Buffer Sizes
# /etc/sysctl.d/99-network-performance.conf
# Increase socket buffer sizes
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 1048576
net.core.wmem_default = 1048576
# TCP buffer auto-tuning (min, default, max)
net.ipv4.tcp_rmem = 4096 1048576 16777216
net.ipv4.tcp_wmem = 4096 1048576 16777216
# Increase connection backlog
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65536
net.ipv4.tcp_max_syn_backlog = 65535
# Enable TCP Fast Open
net.ipv4.tcp_fastopen = 3
# Reuse sockets in TIME_WAIT for new outgoing connections
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
# Increase local port range
net.ipv4.ip_local_port_range = 1024 65535
# Enable BBR congestion control (if available)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
# Verify BBR is active
sysctl net.ipv4.tcp_congestion_control
# net.ipv4.tcp_congestion_control = bbr
# Check available congestion control algorithms
sysctl net.ipv4.tcp_available_congestion_control
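Before switching, it is worth confirming bbr actually appears in that list; on some kernels the tcp_bbr module must be loaded first. A sketch with the sysctl output mocked as a string (on a live host, substitute the real sysctl -n output):

```shell
# "available" stands in for: sysctl -n net.ipv4.tcp_available_congestion_control
available="reno cubic bbr"
case " $available " in
  *" bbr "*) echo "bbr available" ;;
  *)         echo "bbr missing; try: modprobe tcp_bbr" ;;
esac
```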
Connection Tracking (for Docker/Firewall Hosts)
# Increase conntrack table size
net.netfilter.nf_conntrack_max = 1048576
net.nf_conntrack_max = 1048576   # legacy alias of the same setting on older kernels
# Reduce conntrack timeouts
net.netfilter.nf_conntrack_tcp_timeout_established = 600
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
# Increase hash table size (set at boot via modprobe)
echo "options nf_conntrack hashsize=262144" > /etc/modprobe.d/nf_conntrack.conf
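Sizing the table is easier with a utilization figure. An illustration with example numbers (on a live host, read the real values from /proc/sys/net/netfilter/nf_conntrack_count and /proc/sys/net/netfilter/nf_conntrack_max):

```shell
# Example numbers only: how full the conntrack table is.
count=524288      # stands in for nf_conntrack_count
max=1048576       # stands in for nf_conntrack_max
pct=$(( count * 100 / max ))
echo "conntrack table ${pct}% full"
if [ "$pct" -ge 80 ]; then
  echo "consider raising nf_conntrack_max"
fi
```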
Profiling and Monitoring Tools
htop -- Interactive Process Viewer
# Install
pacman -S htop # Arch
apt install htop # Debian/Ubuntu
# Useful htop keybindings
# F5 - Tree view (show parent/child relationships)
# F6 - Sort by column
# F9 - Kill process
# H - Toggle user threads
# K - Toggle kernel threads
# M - Sort by memory
# P - Sort by CPU
vmstat -- Virtual Memory Statistics
# Report every 2 seconds, 10 times
vmstat 2 10
# Key columns:
# r - Processes waiting for CPU
# b - Processes in uninterruptible sleep (I/O wait)
# si/so - Swap in/out (should be near zero)
# bi/bo - Block I/O in/out
# us - User CPU%
# sy - System CPU%
# wa - I/O wait% (high = disk bottleneck)
# id - Idle%
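The si/so columns lend themselves to a quick automated check. A sketch using a canned vmstat line in place of live output:

```shell
# "line" stands in for: vmstat 1 2 | tail -1
line="1 0 0 803536 21544 498732 0 0 5 12 110 220 3 1 92 4 0"
si=$(echo "$line" | awk '{print $7}')   # pages swapped in per second
so=$(echo "$line" | awk '{print $8}')   # pages swapped out per second
if [ "$si" -gt 0 ] || [ "$so" -gt 0 ]; then
  echo "swap activity detected (si=$si so=$so): likely memory pressure"
else
  echo "no swap activity"
fi
```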
iotop -- I/O Monitoring
# Show per-process I/O usage
iotop -o # Only show processes doing I/O
iotop -a # Accumulated I/O (total since start)
iotop -P # Show only processes (not threads)
perf -- Linux Performance Profiler
# Install
apt install linux-tools-common linux-tools-$(uname -r)   # Ubuntu; plain Debian packages it as linux-perf
# Profile system-wide for 10 seconds
perf record -ag -- sleep 10
perf report
# Top-like live CPU profiling
perf top
# Count cache misses
perf stat -e cache-misses,cache-references ./myapp
# Profile a specific command
perf record -g ./myapp --benchmark
perf report --stdio
# Generate flame graph data
perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg
Quick Diagnostic Commands
# System overview
uptime # Load averages
free -h # Memory usage
df -h # Disk space
iostat -x 2 # Disk I/O statistics
mpstat -P ALL 2 # Per-CPU statistics
sar -n DEV 2 # Network throughput
# Find resource hogs
ps aux --sort=-%mem | head -20 # Top memory consumers
ps aux --sort=-%cpu | head -20 # Top CPU consumers
# Check for I/O bottleneck
iowait=$(vmstat 1 2 | tail -1 | awk '{print $16}')
echo "I/O wait: ${iowait}%"
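The load averages from uptime are only meaningful relative to the CPU count. A sketch of that comparison (reads the live /proc/loadavg, so output varies by host):

```shell
# Load per CPU above ~1.0 usually means the run queue is saturated.
load=$(awk '{print $1}' /proc/loadavg)
cpus=$(nproc)
echo "1-min load ${load} across ${cpus} CPUs"
if awk -v l="$load" -v c="$cpus" 'BEGIN { exit !(l > c) }'; then
  echo "run queue saturated"
else
  echo "headroom available"
fi
```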
Putting It All Together: Performance Profile
Here is a complete sysctl configuration for a high-performance web/Docker server:
# /etc/sysctl.d/99-performance.conf
# CPU and Memory
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
vm.vfs_cache_pressure = 50
# Network Performance
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65536
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 1048576 16777216
net.ipv4.tcp_wmem = 4096 1048576 16777216
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.ip_local_port_range = 1024 65535
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
# File Descriptors
fs.file-max = 2097152
fs.nr_open = 2097152
fs.inotify.max_user_watches = 524288
# Docker / Container Networking
net.ipv4.ip_forward = 1
# Requires the br_netfilter kernel module (modprobe br_netfilter)
net.bridge.bridge-nf-call-iptables = 1
net.netfilter.nf_conntrack_max = 1048576
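After installing the file, apply it with sysctl --system (needs root) and spot-check a few keys. The read-back below goes straight through /proc/sys, so it works even where the sysctl binary is missing:

```shell
# Read back a couple of the values the config above sets.
for key in vm/swappiness net/core/somaxconn; do
  printf '%s = %s\n' "$(echo "$key" | tr / .)" "$(cat "/proc/sys/$key")"
done
```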
For Docker hosts specifically, monitoring container resource usage is critical. usulnet provides real-time CPU, memory, and I/O metrics for every container, alerting you when resource limits are approached so you can tune before performance degrades.
Measure first, tune second: Never apply tuning parameters blindly. Profile your workload, identify the bottleneck (CPU, memory, I/O, or network), and then apply targeted optimizations. Re-measure after each change to verify the improvement. Tuning without measurement is superstition, not engineering.