Linux Memory Management: Understanding RAM, Swap and OOM Killer
Every system administrator eventually encounters the dreaded "Out of Memory" situation: a critical process is killed, a container disappears, or an entire server becomes unresponsive. Understanding how Linux manages memory is not optional knowledge; it is the foundation for troubleshooting performance issues, sizing servers, and configuring containers correctly.
Linux memory management is sophisticated and often counterintuitive. A server showing 95% memory usage might be perfectly healthy, while one showing 60% usage might be about to OOM. This guide explains what is actually happening under the hood.
Virtual Memory: The Big Picture
Every process in Linux sees its own virtual address space, completely separate from physical RAM. The kernel's memory management unit (MMU) translates virtual addresses to physical addresses through page tables. This abstraction provides several benefits:
- Isolation: Processes cannot access each other's memory
- Overcommit: The total virtual memory allocated can exceed physical RAM
- Demand paging: Physical memory is allocated only when actually accessed
- Memory-mapped files: Files can be accessed as if they were in memory
Memory is managed in pages, typically 4KB each. When a process accesses a virtual address that has no physical page mapped to it, a page fault occurs. The kernel then allocates a physical page, maps it, and the process continues. This is a minor page fault. A major page fault occurs when the data must be read from disk (swap or a memory-mapped file).
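Page-fault counters are exposed per process, which makes demand paging easy to observe. A quick check using only the base system (procps `ps` and `awk`):

```shell
# Minor (min_flt) and major (maj_flt) page-fault counts for the current shell
ps -o pid,min_flt,maj_flt,comm -p $$

# The same counters straight from /proc: fields 10 (minflt) and 12 (majflt)
awk '{print "minflt=" $10, "majflt=" $12}' /proc/self/stat
```

A steadily climbing major-fault count usually means the process is reading from swap or faulting in file-backed pages from disk.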
Page Cache: Why "Used" Memory Is Not a Problem
The most common memory misconception is that high memory usage means the system is running out of memory. Linux aggressively uses free RAM as page cache, storing recently read file data in memory so that subsequent reads are served from RAM instead of disk.
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           31Gi       8.2Gi       1.1Gi       312Mi        22Gi        22Gi
Swap:         8.0Gi          0B       8.0Gi
In this example, the server has 31GB of RAM. Only 1.1GB is "free," but 22GB is "available." The difference is page cache: the kernel is using 22GB to cache file data, but it will immediately give that memory back to applications that need it.
| Column | Meaning | Should You Worry? |
|---|---|---|
| `total` | Total physical RAM | No (it is what it is) |
| `used` | RAM used by processes | Only if approaching total |
| `free` | Completely unused RAM | Low free is normal and healthy |
| `buff/cache` | Kernel buffers + page cache | No (reclaimable on demand) |
| `available` | Estimated memory available for new processes | This is the number that matters |
Key insight: Monitor `available`, not `free`. A server with 500MB free but 20GB available is healthy. A server with 500MB available is in trouble regardless of what `free` reports.
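That monitoring rule can be encoded as a one-liner against `/proc/meminfo`; the 10% threshold here is an illustrative choice, not a standard:

```shell
# Print MemAvailable as a percentage of MemTotal; exit nonzero if below 10%
awk '/^MemTotal:/     {total = $2}
     /^MemAvailable:/ {avail = $2}
     END {pct = 100 * avail / total
          printf "available: %.1f%% of RAM\n", pct
          exit (pct < 10)}' /proc/meminfo
```

The nonzero exit status makes this directly usable as a cron or monitoring-agent check.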
Buffers vs Cached
The `buff/cache` column in `free` output combines two types of cached data:
- Buffers: Metadata about the filesystem (directory entries, inode data, block device buffers). Typically small (tens to hundreds of MB).
- Cached: File contents cached in memory (page cache). Can consume most of available RAM on file-server workloads.
# See the breakdown in /proc/meminfo
grep -E "^(MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree|Slab|SReclaimable)" /proc/meminfo
MemTotal: 32780468 kB
MemFree: 1143280 kB
MemAvailable: 23145688 kB
Buffers: 284532 kB
Cached: 21847640 kB
SwapTotal: 8388604 kB
SwapFree: 8388604 kB
Slab: 1023456 kB
SReclaimable: 812340 kB
The SReclaimable component of Slab memory is also reclaimable under memory pressure. The kernel uses slab allocation for its own data structures (dentries, inodes, etc.), and the reclaimable portion can be freed when applications need the memory.
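You can watch the reclaimable split directly from `/proc/meminfo`; for a per-cache breakdown, `slabtop` (root required) shows which slab caches dominate:

```shell
# Slab totals: SReclaimable can be freed under pressure, SUnreclaim cannot
awk '/^(Slab|SReclaimable|SUnreclaim):/ {print}' /proc/meminfo

# Per-cache breakdown, sorted by cache size (requires root):
# sudo slabtop -o -s c | head -15
```

On most systems, dentry and inode caches account for the bulk of the reclaimable portion.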
Swap: Extension or Safety Net?
Swap is disk space used as an extension of RAM. When the kernel needs physical memory and has exhausted reclaimable cache, it can move (swap out) infrequently used memory pages to disk, freeing physical RAM for active use.
The Swappiness Parameter
The `vm.swappiness` parameter (0-200 since Linux 5.8, previously 0-100; default 60) controls the kernel's tendency to swap out application memory versus dropping page cache:
# Check current swappiness
cat /proc/sys/vm/swappiness
# Set temporarily (lost on reboot)
sudo sysctl vm.swappiness=10
# Set permanently
echo "vm.swappiness=10" | sudo tee /etc/sysctl.d/99-swappiness.conf
sudo sysctl --system
| Swappiness Value | Behavior | Good For |
|---|---|---|
| 0 | Swap only to avoid OOM | Databases (want data in cache) |
| 10 | Minimal swap, prefer dropping cache | Servers with SSDs |
| 60 (default) | Balanced swap and cache | General-purpose workloads |
| 100 | Aggressively swap application memory | Systems with lots of idle processes |
Swap Configuration
# Check current swap
swapon --show
# Create a swap file (4GB)
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Make persistent (add to /etc/fstab)
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
# Create swap with zram (compressed in-memory swap)
sudo modprobe zram
echo lz4 | sudo tee /sys/block/zram0/comp_algorithm
echo 4G | sudo tee /sys/block/zram0/disksize
sudo mkswap /dev/zram0
sudo swapon -p 100 /dev/zram0
The OOM Killer
When the system runs out of both physical memory and swap, the kernel's Out-of-Memory (OOM) killer selects and kills processes to free memory. It uses a scoring system to decide which process to sacrifice:
# View OOM score for a process (higher = more likely to be killed)
cat /proc/$(pidof postgres)/oom_score
# View the adjustable OOM score (-1000 to 1000)
cat /proc/$(pidof postgres)/oom_score_adj
# Protect a process from OOM killer (-1000 = never kill)
echo -1000 | sudo tee /proc/$(pidof postgres)/oom_score_adj
# Make a process the first target (1000 = kill first)
echo 1000 | sudo tee /proc/$(pidof some-dispensable-process)/oom_score_adj
The OOM killer considers:
- Total memory used by the process and its children
- The process's `oom_score_adj` (administrative adjustment)
- Root processes get a slight discount (less likely to be killed)
# Check kernel logs for OOM events
dmesg | grep -i "out of memory"
journalctl -k | grep -i "oom"
# Typical OOM log entry:
# Out of memory: Killed process 12345 (java) total-vm:8234567kB,
# anon-rss:4123456kB, file-rss:0kB, shmem-rss:0kB,
# oom_score_adj:0
Warning: Setting `oom_score_adj=-1000` on too many processes can leave the OOM killer with no viable candidate, potentially leading to a complete system hang. Only protect genuinely critical processes (like your primary database).
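For services managed by systemd, the `oom_score_adj` protection can be made persistent with a drop-in instead of writing to `/proc` after every restart. A sketch, assuming the database runs as `postgresql.service` (adjust the unit name and path for your system):

```ini
# /etc/systemd/system/postgresql.service.d/oom.conf  (hypothetical drop-in path)
[Service]
OOMScoreAdjust=-900
```

After adding the drop-in, run `sudo systemctl daemon-reload` and restart the service. A value like -900 keeps strong protection while, unlike -1000, still leaving the process killable in a true emergency.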
Cgroups and Memory Limits
Control groups (cgroups) allow you to limit memory usage per process group. This is the mechanism Docker uses for container memory limits. Cgroups v2 is the modern standard:
# Check if cgroups v2 is enabled
mount | grep cgroup2
# Create a cgroup with a 512MB memory limit
sudo mkdir /sys/fs/cgroup/myapp
echo 536870912 | sudo tee /sys/fs/cgroup/myapp/memory.max
echo 268435456 | sudo tee /sys/fs/cgroup/myapp/memory.high # Throttle at 256MB
# Add a process to the cgroup
echo $PID | sudo tee /sys/fs/cgroup/myapp/cgroup.procs
# Monitor memory usage of the cgroup
cat /sys/fs/cgroup/myapp/memory.current
cat /sys/fs/cgroup/myapp/memory.stat
The `memory.high` and `memory.max` limits have different behaviors:

| Limit | Behavior When Exceeded | Docker Flag |
|---|---|---|
| `memory.high` | Kernel throttles allocations, reclaims aggressively | `--memory-reservation` |
| `memory.max` | OOM kills processes in the cgroup | `--memory` |
| `memory.swap.max` | Limits swap usage for the cgroup | `--memory-swap` |
Docker Memory Constraints
Docker containers use cgroups for memory isolation. Understanding the flags is critical for production deployments:
# Hard memory limit (container OOM-killed if exceeded)
docker run --memory=512m myapp
# Memory + swap limit (total memory available)
docker run --memory=512m --memory-swap=1g myapp
# This gives 512MB RAM + 512MB swap
# Disable swap for a container
docker run --memory=512m --memory-swap=512m myapp
# Soft limit (reservation, kernel will try to honor under pressure)
docker run --memory=512m --memory-reservation=256m myapp
# Docker Compose equivalent
services:
app:
deploy:
resources:
limits:
memory: 512M
reservations:
memory: 256M
# Check container memory usage
docker stats --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"
# Detailed memory stats
docker exec mycontainer cat /sys/fs/cgroup/memory.current
docker exec mycontainer cat /sys/fs/cgroup/memory.stat
# Check if a container was OOM-killed
docker inspect mycontainer | jq '.[0].State.OOMKilled'
Monitoring Tools
free
# Human-readable output
free -h
# Wide format (separates buffers and cache)
free -hw
# Continuous monitoring (every 2 seconds)
free -h -s 2
vmstat
# System-wide memory and CPU stats (every 2 seconds)
vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 1143280 284532 21847640 0 0 2 15 45 89 3 1 96 0 0
# Key columns:
# si (swap in) and so (swap out) - Should be near 0
# free - Unused RAM (low is normal)
# buff + cache - Reclaimable memory
# wa (wait) - CPU time waiting for I/O (high = disk pressure)
/proc/meminfo Deep Dive
# Key fields to monitor
grep -E "^(MemTotal|MemAvailable|Dirty|Writeback|AnonPages|Mapped|Shmem|KernelStack|PageTables|Committed_AS|VmallocUsed|HugePages)" /proc/meminfo
# MemAvailable: How much memory is actually available
# AnonPages: Memory used by processes (not file-backed)
# Dirty: Pages waiting to be written to disk
# Committed_AS: Total memory committed (can exceed physical RAM)
# HugePages_Total: Huge pages allocated
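Committed_AS exceeding physical RAM is a direct consequence of overcommit. The kernel's policy is tunable via `vm.overcommit_memory`; a quick look at the current policy and accounting:

```shell
# Overcommit policy: 0 = heuristic (default), 1 = always allow, 2 = strict accounting
cat /proc/sys/vm/overcommit_memory

# Under policy 2, allocations fail once Committed_AS reaches CommitLimit
grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo
```

Strict accounting (policy 2) trades OOM-killer surprises for `malloc` failures the application can actually handle, which some database vendors recommend.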
Troubleshooting Memory Issues
Finding Memory Hogs
# Sort processes by memory usage (RSS)
ps aux --sort=-%mem | head -20
# More detailed: proportional set size (accounts for shared memory)
sudo smem -t -k -s pss | tail -20
# Per-process breakdown
cat /proc/$PID/status | grep -E "^(VmRSS|VmSwap|VmSize|RssAnon|RssFile|RssShmem)"
# Find processes using swap
for pid in $(ls /proc/ | grep -E '^[0-9]+$'); do
swap=$(grep VmSwap /proc/$pid/status 2>/dev/null | awk '{print $2}')
if [ -n "$swap" ] && [ "$swap" -gt 0 ]; then
name=$(cat /proc/$pid/comm 2>/dev/null)
echo "$swap kB - $name (PID: $pid)"
fi
done | sort -rn | head -10
Diagnosing Memory Leaks
# Track memory growth over time
while true; do
ps -o pid,rss,comm -p $PID
sleep 60
done >> memory_track.log
# Use valgrind for detailed leak detection (development)
valgrind --leak-check=full --show-leak-kinds=all ./myapp
# For production: check /proc/$PID/smaps for mapping growth
cat /proc/$PID/smaps_rollup
Huge Pages
Huge pages (2MB or 1GB instead of 4KB) reduce TLB (Translation Lookaside Buffer) misses for applications with large memory footprints, particularly databases:
# Check current huge pages configuration
grep Huge /proc/meminfo
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
# Allocate 1024 huge pages (2GB total)
echo 1024 | sudo tee /proc/sys/vm/nr_hugepages
# Make persistent
echo "vm.nr_hugepages=1024" | sudo tee /etc/sysctl.d/99-hugepages.conf
# Transparent Huge Pages (THP) - automatic, no application changes
cat /sys/kernel/mm/transparent_hugepage/enabled
# [always] madvise never
# Disable THP (recommended for databases like MongoDB, Redis)
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
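The `echo never` above is lost on reboot. One common way to persist it is a small oneshot systemd unit (a sketch; the unit name and path are illustrative, and adding `transparent_hugepage=never` to the kernel command line works as well):

```ini
# /etc/systemd/system/disable-thp.service  (hypothetical unit)
[Unit]
Description=Disable Transparent Huge Pages

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl enable --now disable-thp.service`.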
Memory Management Best Practices
- Monitor `available`, not `free`: Low free memory is normal. Low available memory is a problem.
- Set swap to 1-2x RAM on servers with SSDs: It provides a safety buffer without significant performance impact.
- Tune swappiness: Set to 10 for database servers, 60 for general workloads.
- Always set Docker memory limits: Prevent container memory leaks from affecting the host.
- Watch for swap I/O: If `vmstat` shows constant `si`/`so` activity, you either need more RAM or need to find the memory hog.
- Protect critical processes: Use `oom_score_adj` to protect databases and other critical services.
- Alert on available memory: Set monitoring alerts when available memory drops below 10-15% of total.
Linux memory management is designed to use every byte of RAM productively. Understanding the difference between "used by applications" and "used by cache" is fundamental to correctly interpreting system health. When combined with proper Docker memory limits and proactive monitoring through tools like usulnet, you can prevent memory-related incidents before they impact your services.