Every system administrator eventually encounters the dreaded "Out of Memory" situation: a critical process is killed, a container disappears, or an entire server becomes unresponsive. Understanding how Linux manages memory is not optional knowledge; it is the foundation for troubleshooting performance issues, sizing servers, and configuring containers correctly.

Linux memory management is sophisticated and often counterintuitive. A server showing 95% memory usage might be perfectly healthy, while one showing 60% usage might be about to OOM. This guide explains what is actually happening under the hood.

Virtual Memory: The Big Picture

Every process in Linux sees its own virtual address space, completely separate from physical RAM. The hardware memory management unit (MMU) translates virtual addresses to physical addresses using page tables maintained by the kernel. This abstraction provides several benefits:

  • Isolation: Processes cannot access each other's memory
  • Overcommit: The total virtual memory allocated can exceed physical RAM
  • Demand paging: Physical memory is allocated only when actually accessed
  • Memory-mapped files: Files can be accessed as if they were in memory
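Overcommit behavior is itself tunable. A minimal, read-only sketch for inspecting the current policy (the values shown are whatever your kernel is configured with, not recommendations):

```shell
# Inspect the kernel's overcommit policy (read-only, no root needed)
# 0 = heuristic overcommit (default), 1 = always allow, 2 = strict accounting
policy=$(cat /proc/sys/vm/overcommit_memory)
ratio=$(cat /proc/sys/vm/overcommit_ratio)
echo "overcommit_memory=$policy (overcommit_ratio=$ratio%)"
```

Mode 2 is the only setting under which the kernel refuses allocations up front; the other two rely on the OOM killer described later.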

Memory is managed in pages, typically 4KB each. When a process accesses a virtual address that has no physical page mapped to it, a page fault occurs. The kernel then allocates a physical page, maps it, and the process continues. This is a minor page fault. A major page fault occurs when the data must be read from disk (swap or a memory-mapped file).
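Both fault types are counted per process; procps exposes the counters as `min_flt` and `maj_flt`. A quick sketch, using the current shell's PID purely as an example:

```shell
# Cumulative minor and major page faults for the current shell.
# min_flt: resolved from RAM; maj_flt: required disk I/O (swap or a mapped file)
ps -o pid,min_flt,maj_flt,comm -p $$
```

A steadily climbing maj_flt count on a long-running process is an early hint that its working set no longer fits in RAM.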

Page Cache: Why "Used" Memory Is Not a Problem

The most common memory misconception is that high memory usage means the system is running out of memory. Linux aggressively uses free RAM as page cache, storing recently read file data in memory so that subsequent reads are served from RAM instead of disk.

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi        8.2Gi       1.1Gi       312Mi        22Gi        22Gi
Swap:           8.0Gi          0B        8.0Gi

In this example, the server has 31GB of RAM. Only 1.1GB is "free," but 22GB is "available." The difference is page cache: the kernel is using 22GB to cache file data, but it will immediately give that memory back to applications that need it.

Column       Meaning                                         Should You Worry?
total        Total physical RAM                              No (it is what it is)
used         RAM used by processes                           Only if approaching total
free         Completely unused RAM                           Low free is normal and healthy
buff/cache   Kernel buffers + page cache                     No (reclaimable on demand)
available    Estimated memory available for new processes    This is the number that matters

Key insight: Monitor available, not free. A server with 500MB free but 20GB available is healthy. A server with 500MB available is in trouble regardless of what free reports.
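This rule of thumb is easy to script against /proc/meminfo; a minimal sketch (the 15% threshold is an arbitrary example, tune it to your alerting policy):

```shell
# Flag low available memory as a percentage of total RAM
awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2}
     END {
       pct = a * 100 / t
       printf "MemAvailable: %.1f%% of MemTotal\n", pct
       if (pct < 15) print "WARNING: low available memory"
     }' /proc/meminfo
```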

Buffers vs Cached

The buff/cache column in free combines two types of cached data:

  • Buffers: Metadata about the filesystem (directory entries, inode data, block device buffers). Typically small (tens to hundreds of MB).
  • Cached: File contents cached in memory (page cache). Can consume most of available RAM on file-server workloads.

# See the breakdown in /proc/meminfo
grep -E "^(MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree|Slab|SReclaimable)" /proc/meminfo
MemTotal:       32780468 kB
MemFree:         1143280 kB
MemAvailable:   23145688 kB
Buffers:          284532 kB
Cached:         21847640 kB
SwapTotal:       8388604 kB
SwapFree:        8388604 kB
Slab:            1023456 kB
SReclaimable:     812340 kB

The SReclaimable component of Slab memory is also reclaimable under memory pressure. The kernel uses slab allocation for its own data structures (dentries, inodes, etc.), and the reclaimable portion can be freed when applications need the memory.
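A quick sketch of how much of the slab is reclaimable on a given machine:

```shell
# Fraction of slab memory the kernel could give back under pressure
awk '/^Slab:/ {s=$2} /^SReclaimable:/ {r=$2}
     END { printf "Slab: %d kB total, %d kB (%.0f%%) reclaimable\n", s, r, r * 100 / s }' /proc/meminfo

# Per-cache detail (dentries, inodes, ...) requires root:
# sudo slabtop -o | head -15
```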

Swap: Extension or Safety Net?

Swap is disk space used as an extension of RAM. When the kernel needs physical memory and has exhausted reclaimable cache, it can move (swap out) infrequently used memory pages to disk, freeing physical RAM for active use.

The Swappiness Parameter

The vm.swappiness parameter (0-100 on older kernels, 0-200 since Linux 5.8; default 60) controls the kernel's tendency to swap out application memory versus dropping page cache:

# Check current swappiness
cat /proc/sys/vm/swappiness

# Set temporarily (lost on reboot)
sudo sysctl vm.swappiness=10

# Set permanently
echo "vm.swappiness=10" | sudo tee /etc/sysctl.d/99-swappiness.conf
sudo sysctl --system

Swappiness Value   Behavior                               Good For
0                  Swap only to avoid OOM                 Databases (want data in cache)
10                 Minimal swap, prefer dropping cache    Servers with SSDs
60 (default)       Balanced swap and cache                General-purpose workloads
100                Aggressively swap application memory   Systems with lots of idle processes

Tip: For database servers (PostgreSQL, MySQL), set swappiness to 1 or 10. These applications manage their own caching (shared_buffers, InnoDB buffer pool) and benefit from keeping their memory pages in RAM. Swapping database pages to disk defeats the purpose of database memory caches.

Swap Configuration

# Check current swap
swapon --show

# Create a swap file (4GB)
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Make persistent (add to /etc/fstab)
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# Create swap with zram (compressed in-memory swap)
sudo modprobe zram
echo lz4 | sudo tee /sys/block/zram0/comp_algorithm
echo 4G | sudo tee /sys/block/zram0/disksize
sudo mkswap /dev/zram0
sudo swapon -p 100 /dev/zram0

The OOM Killer

When the kernel cannot reclaim enough memory to satisfy an allocation (typically when both physical RAM and swap are exhausted), its Out-of-Memory (OOM) killer selects and kills processes to free memory. It uses a scoring system to decide which process to sacrifice:

# View OOM score for a process (higher = more likely to be killed)
cat /proc/$(pidof postgres)/oom_score

# View the adjustable OOM score (-1000 to 1000)
cat /proc/$(pidof postgres)/oom_score_adj

# Protect a process from OOM killer (-1000 = never kill)
echo -1000 | sudo tee /proc/$(pidof postgres)/oom_score_adj

# Make a process the first target (1000 = kill first)
echo 1000 | sudo tee /proc/$(pidof some-dispensable-process)/oom_score_adj

The OOM killer considers:

  • Total memory used by the process and its children
  • The process's oom_score_adj (administrative adjustment)
  • Root processes get a slight discount (less likely to be killed)

# Check kernel logs for OOM events
dmesg | grep -i "out of memory"
journalctl -k | grep -i "oom"

# Typical OOM log entry:
# Out of memory: Killed process 12345 (java) total-vm:8234567kB,
# anon-rss:4123456kB, file-rss:0kB, shmem-rss:0kB,
# oom_score_adj:0

Warning: Setting oom_score_adj=-1000 on too many processes can cause the OOM killer to fail to find a candidate, potentially leading to a complete system hang. Only protect genuinely critical processes (like your primary database).
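Writes to /proc/.../oom_score_adj are also lost when the process restarts. For services managed by systemd, a drop-in is the durable way to express the same intent (the unit name and the -900 value below are illustrative, not recommendations):

```ini
# /etc/systemd/system/postgresql.service.d/oom.conf  (hypothetical path and unit)
[Service]
OOMScoreAdjust=-900
```

Apply with `sudo systemctl daemon-reload` followed by a restart of the unit.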

Cgroups and Memory Limits

Control groups (cgroups) allow you to limit memory usage per process group. This is the mechanism Docker uses for container memory limits. Cgroups v2 is the modern standard:

# Check if cgroups v2 is enabled
mount | grep cgroup2

# Create a cgroup with a 512MB memory limit
sudo mkdir /sys/fs/cgroup/myapp
echo 536870912 | sudo tee /sys/fs/cgroup/myapp/memory.max
echo 268435456 | sudo tee /sys/fs/cgroup/myapp/memory.high  # Throttle at 256MB

# Add a process to the cgroup
echo $PID | sudo tee /sys/fs/cgroup/myapp/cgroup.procs

# Monitor memory usage of the cgroup
cat /sys/fs/cgroup/myapp/memory.current
cat /sys/fs/cgroup/myapp/memory.stat

The memory.high and memory.max limits have different behaviors:

Limit             Behavior When Exceeded                                Docker Flag
memory.high       Kernel throttles allocations, reclaims aggressively   --memory-reservation
memory.max        OOM kills processes in the cgroup                     --memory
memory.swap.max   Limits swap usage for the cgroup                      --memory-swap
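A small helper makes the memory.current/memory.max relationship concrete. Since reading a real cgroup needs the setup above (and often root), the demo below runs against a fixture directory; the function name is my own, not a standard tool:

```shell
# Sketch: report a cgroup's memory use against its v2 limit
cgroup_mem_pct() {
  cur=$(cat "$1/memory.current")
  max=$(cat "$1/memory.max")
  if [ "$max" = "max" ]; then
    echo "unlimited"          # memory.max contains the literal string "max" when no limit is set
  else
    awk -v c="$cur" -v m="$max" 'BEGIN { printf "%.1f%%\n", c * 100 / m }'
  fi
}

# Demo with a fixture directory instead of a live cgroup:
demo=$(mktemp -d)
echo 268435456 > "$demo/memory.current"   # 256MB in use
echo 536870912 > "$demo/memory.max"       # 512MB limit
cgroup_mem_pct "$demo"                    # prints 50.0%
```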

Docker Memory Constraints

Docker containers use cgroups for memory isolation. Understanding the flags is critical for production deployments:

# Hard memory limit (container OOM-killed if exceeded)
docker run --memory=512m myapp

# Memory + swap limit (total memory available)
docker run --memory=512m --memory-swap=1g myapp
# This gives 512MB RAM + 512MB swap

# Disable swap for a container
docker run --memory=512m --memory-swap=512m myapp

# Soft limit (reservation, kernel will try to honor under pressure)
docker run --memory=512m --memory-reservation=256m myapp

# Docker Compose equivalent
services:
  app:
    deploy:
      resources:
        limits:
          memory: 512M
        reservations:
          memory: 256M

# Check container memory usage
docker stats --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"

# Detailed memory stats
docker exec mycontainer cat /sys/fs/cgroup/memory.current
docker exec mycontainer cat /sys/fs/cgroup/memory.stat

# Check if a container was OOM-killed
docker inspect mycontainer | jq '.[0].State.OOMKilled'

Tip: Always set memory limits on Docker containers in production. Without limits, a memory leak in one container can consume all host memory and trigger the host-level OOM killer, potentially killing unrelated containers. Tools like usulnet provide real-time memory monitoring across all containers, alerting you before limits are reached.

Monitoring Tools

free

# Human-readable output
free -h

# Wide format (separates buffers and cache)
free -hw

# Continuous monitoring (every 2 seconds)
free -h -s 2

vmstat

# System-wide memory and CPU stats (every 2 seconds)
vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 1143280 284532 21847640    0    0     2    15   45   89  3  1 96  0  0

# Key columns:
# si (swap in) and so (swap out) - Should be near 0
# free - Unused RAM (low is normal)
# buff + cache - Reclaimable memory
# wa (wait) - CPU time waiting for I/O (high = disk pressure)
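If vmstat is not installed, the same si/so signal is available from the kernel's cumulative counters in /proc/vmstat; a growing delta between two samples means the system is actively swapping (the 2-second interval is arbitrary):

```shell
# pswpin/pswpout are cumulative pages swapped in/out since boot
read_swaps() {
  awk '$1 == "pswpin" || $1 == "pswpout" { printf "%s ", $2 }' /proc/vmstat
}

before=$(read_swaps)
sleep 2
after=$(read_swaps)
echo "pswpin/pswpout before: $before | after: $after"
```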

/proc/meminfo Deep Dive

# Key fields to monitor
cat /proc/meminfo | grep -E "^(MemTotal|MemAvailable|Dirty|Writeback|AnonPages|Mapped|Shmem|KernelStack|PageTables|Committed_AS|VmallocUsed|HugePages)"

# MemAvailable: How much memory is actually available
# AnonPages: Memory used by processes (not file-backed)
# Dirty: Pages waiting to be written to disk
# Committed_AS: Total memory committed (can exceed physical RAM)
# HugePages_Total: Huge pages allocated
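Committed_AS is easiest to interpret next to CommitLimit. Exceeding 100% is normal under the default heuristic overcommit, but worth alerting on if you run strict accounting (overcommit_memory=2):

```shell
# How much virtual memory is committed relative to the kernel's limit
awk '/^CommitLimit:/ {l=$2} /^Committed_AS:/ {c=$2}
     END { printf "Committed_AS: %d kB of %d kB CommitLimit (%.0f%%)\n", c, l, c * 100 / l }' /proc/meminfo
```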

Troubleshooting Memory Issues

Finding Memory Hogs

# Sort processes by memory usage (RSS)
ps aux --sort=-%mem | head -20

# More detailed: proportional set size (accounts for shared memory)
sudo smem -t -k -s pss | tail -20

# Per-process breakdown
cat /proc/$PID/status | grep -E "^(VmRSS|VmSwap|VmSize|RssAnon|RssFile|RssShmem)"

# Find processes using swap
for pid in $(ls /proc/ | grep -E '^[0-9]+$'); do
  swap=$(grep VmSwap /proc/$pid/status 2>/dev/null | awk '{print $2}')
  if [ -n "$swap" ] && [ "$swap" -gt 0 ]; then
    name=$(cat /proc/$pid/comm 2>/dev/null)
    echo "$swap kB - $name (PID: $pid)"
  fi
done | sort -rn | head -10

Diagnosing Memory Leaks

# Track memory growth over time
while true; do
  ps -o pid,rss,comm -p $PID
  sleep 60
done >> memory_track.log

# Use valgrind for detailed leak detection (development)
valgrind --leak-check=full --show-leak-kinds=all ./myapp

# For production: check /proc/$PID/smaps for mapping growth
cat /proc/$PID/smaps_rollup

Huge Pages

Huge pages (2MB or 1GB instead of 4KB) reduce TLB (Translation Lookaside Buffer) misses for applications with large memory footprints, particularly databases:

# Check current huge pages configuration
cat /proc/meminfo | grep Huge
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

# Allocate 1024 huge pages (2GB total)
echo 1024 | sudo tee /proc/sys/vm/nr_hugepages

# Make persistent
echo "vm.nr_hugepages=1024" | sudo tee /etc/sysctl.d/99-hugepages.conf

# Transparent Huge Pages (THP) - automatic, no application changes
cat /sys/kernel/mm/transparent_hugepage/enabled
# [always] madvise never

# Disable THP (recommended for databases like MongoDB, Redis)
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled

Warning: Transparent Huge Pages (THP) can cause latency spikes in database workloads due to compaction overhead. MongoDB and Redis documentation specifically recommend disabling THP. PostgreSQL works well with explicit huge pages but should have THP disabled.
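The echo above does not survive a reboot. One common way to persist the setting is a small oneshot unit (the unit name and path here are illustrative; some distributions ship tuned profiles or kernel command-line options that achieve the same thing):

```ini
# /etc/systemd/system/disable-thp.service  (hypothetical unit)
[Unit]
Description=Disable Transparent Huge Pages
After=sysinit.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl enable --now disable-thp.service`.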

Memory Management Best Practices

  1. Monitor available, not free: Low free memory is normal. Low available memory is a problem.
  2. Set swap to 1-2x RAM on servers with SSDs. It provides a safety buffer without significant performance impact.
  3. Tune swappiness: Set to 10 for database servers, 60 for general workloads.
  4. Always set Docker memory limits: Prevent container memory leaks from affecting the host.
  5. Watch for swap I/O: If vmstat shows constant si/so activity, you either need more RAM or need to find the memory hog.
  6. Protect critical processes: Use oom_score_adj to protect databases and other critical services.
  7. Alert on available memory: Set monitoring alerts when available memory drops below 10-15% of total.

Linux memory management is designed to use every byte of RAM productively. Understanding the difference between "used by applications" and "used by cache" is fundamental to correctly interpreting system health. When combined with proper Docker memory limits and proactive monitoring through tools like usulnet, you can prevent memory-related incidents before they impact your services.