Server access management at scale is one of the most consequential infrastructure challenges. Every server you deploy is a potential entry point for attackers. Every user with SSH access is a potential vector for compromise. Every unaudited privileged action is a compliance gap. The difference between a secure infrastructure and a breached one often comes down to how rigorously you manage who can access what, how they authenticate, and whether you can reconstruct exactly what happened after the fact.

This guide covers the full stack of server access management: from SSH key best practices for individual administrators to SSH certificate authorities for organizations, RBAC patterns with sudo, PAM configuration, LDAP integration, and comprehensive audit logging including full session recording.

SSH Key Management at Scale

Key Generation Best Practices

# Generate an Ed25519 key (recommended - fast, secure, short)
ssh-keygen -t ed25519 -C "[email protected]" -f ~/.ssh/id_ed25519

# Generate an RSA key (for compatibility with older systems)
ssh-keygen -t rsa -b 4096 -C "[email protected]" -f ~/.ssh/id_rsa

# Always use a passphrase for private keys
# Use ssh-agent to avoid typing it repeatedly
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519

| Key Type | Recommended | Security | Compatibility |
|---|---|---|---|
| Ed25519 | Yes (default choice) | Excellent (128-bit equivalent) | OpenSSH 6.5+ |
| RSA 4096 | Yes (for compatibility) | Good | Universal |
| ECDSA | Acceptable | Good (if curve is trusted) | OpenSSH 5.7+ |
| DSA | No | Deprecated | Legacy only |
| RSA 1024/2048 | No | Insufficient | Universal |
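
The key-type policy in the table can be checked mechanically against the output of ssh-keygen -l. The following is a minimal sketch, not a definitive policy engine: the function name key_policy_verdict is hypothetical, the 4096-bit RSA threshold is an assumption read from the table, and any key type the table does not list is rejected.

```shell
# Classify one or more lines of `ssh-keygen -l` output read from stdin, e.g.
#   256 SHA256:abc... alice (ED25519)
# Prints "<verdict> <type> <bits>" per line. Thresholds are assumptions
# taken from the table above, not an official standard.
key_policy_verdict() {
  awk '{
    bits = $1
    type = $NF
    gsub(/[()]/, "", type)            # "(RSA)" -> "RSA"
    if (type == "ED25519")                    verdict = "ok"
    else if (type == "RSA" && bits >= 4096)   verdict = "ok"
    else if (type == "ECDSA")                 verdict = "acceptable"
    else                                      verdict = "reject"
    print verdict, type, bits
  }'
}
```

A typical call would be ssh-keygen -l -f ~/.ssh/id_ed25519.pub | key_policy_verdict.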

The Problem with authorized_keys at Scale

The traditional approach of managing ~/.ssh/authorized_keys files on every server breaks down quickly:

  • Adding a new team member requires touching every server
  • Revoking access for a departing employee requires touching every server
  • Key rotation requires coordinated updates across all servers
  • There is no central audit trail of who has access to what
  • Stale keys accumulate and are never cleaned up

Warning: The number one server access vulnerability in small teams is forgotten authorized_keys entries for former team members. If you do not have a process for revoking SSH access when someone leaves, you have a permanent backdoor on every server they ever accessed.
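
Stale entries are easier to spot when every authorized_keys file is reduced to a list of key types and comments. The sketch below assumes simple entries (any leading options contain no spaces); the function names list_authorized_keys and list_weak_keys are hypothetical helpers, not part of OpenSSH.

```shell
# Summarize the keys in an authorized_keys file as "<type> <comment>" lines.
# Assumption: leading options such as from="10.0.0.1" contain no spaces.
list_authorized_keys() {
  awk '
    /^[ \t]*#/ { next }                 # skip comment lines
    NF == 0    { next }                 # skip blank lines
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^(ssh-|ecdsa-|sk-)/) {      # first field that looks like a key type
          comment = ""
          for (j = i + 2; j <= NF; j++)
            comment = comment (j > i + 2 ? " " : "") $j
          print $i, (comment == "" ? "(no comment)" : comment)
          break
        }
      }
    }
  ' "$1"
}

# Print only the entries that still use DSA keys.
list_weak_keys() {
  list_authorized_keys "$1" | awk '$1 == "ssh-dss"'
}
```

Running list_authorized_keys over each user's file during an access review gives a quick inventory to compare against the current team roster.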

SSH Certificates: The Scalable Solution

SSH certificates solve the key distribution problem. Instead of distributing public keys to every server, you set up a Certificate Authority (CA). Servers trust the CA. Users present certificates signed by the CA. Revoking access means letting a short-lived certificate expire or publishing a revocation list, rather than editing authorized_keys files on every server.

# Step 1: Create an SSH Certificate Authority
# (keep the private CA keys on a dedicated signing host; servers only need the .pub files)
ssh-keygen -t ed25519 -f /etc/ssh/ca_user_key -C "SSH User CA"
ssh-keygen -t ed25519 -f /etc/ssh/ca_host_key -C "SSH Host CA"

# Step 2: Configure servers to trust the user CA
# /etc/ssh/sshd_config
TrustedUserCAKeys /etc/ssh/ca_user_key.pub

# Step 3: Sign a user's public key (creates a certificate)
ssh-keygen -s /etc/ssh/ca_user_key \
  -I "[email protected]" \
  -n deploy,admin \
  -V +12h \
  -z 1001 \
  ~/.ssh/id_ed25519.pub

# This creates ~/.ssh/id_ed25519-cert.pub
# -I: Key identity (for audit logs)
# -n: Principals (authorized usernames on the server)
# -V: Validity period (12 hours)
# -z: Serial number (for revocation)

# Step 4: User connects normally (certificate is used automatically)
ssh [email protected]

# Step 5: Inspect a certificate
ssh-keygen -L -f ~/.ssh/id_ed25519-cert.pub
# Type: [email protected] user certificate
# Public key: ED25519-CERT SHA256:...
# Signing CA: ED25519 SHA256:... (using ssh-ed25519)
# Key ID: "[email protected]"
# Serial: 1001
# Valid: from 2025-05-03T08:00:00 to 2025-05-03T20:00:00
# Principals: deploy, admin
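
Because access now depends on the validity window, it helps to read the expiry out of the certificate programmatically. This is a minimal sketch that parses the "Valid:" line printed by ssh-keygen -L; the function names cert_valid_until and cert_is_expired are hypothetical, and the comparison assumes both timestamps use the same local-time ISO format.

```shell
# Read `ssh-keygen -L` output on stdin and print the expiry timestamp,
# or "forever" for certificates without an end date.
cert_valid_until() {
  awk '
    $1 == "Valid:" && $2 == "forever" { print "forever"; exit }
    $1 == "Valid:" && $2 == "from" && $4 == "to" { print $5; exit }
  '
}

# Succeed if the certificate has expired.
# $1: expiry as printed by cert_valid_until
# $2: current time in the same format (defaults to the system clock)
cert_is_expired() {
  local expiry="$1"
  local now="${2:-$(date +%Y-%m-%dT%H:%M:%S)}"
  [ "$expiry" != "forever" ] &&
    awk -v e="$expiry" -v n="$now" 'BEGIN { exit !(n >= e) }'   # string comparison of ISO timestamps
}
```

For example, cert_is_expired "$(ssh-keygen -L -f ~/.ssh/id_ed25519-cert.pub | cert_valid_until)" could gate a wrapper that requests a fresh certificate before connecting.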

Host Certificates (Eliminating TOFU)

# Sign the server's host key
ssh-keygen -s /etc/ssh/ca_host_key \
  -I "server.example.com" \
  -h \
  -n server.example.com,10.0.0.5 \
  -V +365d \
  /etc/ssh/ssh_host_ed25519_key.pub

# Configure the server to present its certificate
# /etc/ssh/sshd_config
HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub

# Configure clients to trust the host CA
# ~/.ssh/known_hosts (or /etc/ssh/ssh_known_hosts)
@cert-authority *.example.com ssh-ed25519 AAAA... (CA public key)

# Now clients never see "The authenticity of host ... can't be established"
# They verify the host certificate against the trusted CA
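
The @cert-authority entry can be generated from the CA public key file rather than pasted by hand. A minimal sketch, assuming the usual "type base64 comment" layout of an OpenSSH public key file; cert_authority_line is a hypothetical helper name.

```shell
# Print a known_hosts entry that trusts a host CA for a host pattern.
# $1: host pattern (for example *.example.com)
# $2: path to the CA public key file
cert_authority_line() {
  local pattern="$1" ca_pub="$2"
  # Keep only the key type and the base64 key; drop the trailing comment.
  awk -v p="$pattern" 'NF >= 2 { print "@cert-authority", p, $1, $2; exit }' "$ca_pub"
}
```

The output can be appended to /etc/ssh/ssh_known_hosts, for example with cert_authority_line '*.example.com' /etc/ssh/ca_host_key.pub | sudo tee -a /etc/ssh/ssh_known_hosts.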

Certificate Revocation

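# Create a Key Revocation List (KRL) that revokes a compromised public key
ssh-keygen -k -f /etc/ssh/revoked_keys /path/to/compromised_key.pub

# Or revoke a certificate by serial number: list the serials in a KRL
# specification file (one "serial: 1001" line each; krl_spec.txt is an example
# name) and pass the CA public key with -s. -u updates the existing KRL
# instead of replacing it.
ssh-keygen -k -u -f /etc/ssh/revoked_keys -s /etc/ssh/ca_user_key.pub krl_spec.txt

# Configure sshd to check the KRL
# /etc/ssh/sshd_config
RevokedKeys /etc/ssh/revoked_keys

# The KRL is a compact binary format
# Much more efficient than listing individual keys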
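
Revoking certificates by serial number goes through a KRL specification file, which ssh-keygen reads when it is given the CA public key with -s. The sketch below only generates that specification text; krl_spec_for_serials is a hypothetical helper, and the file it feeds is whatever path you choose.

```shell
# Print a KRL specification that revokes the given certificate serials.
# Each argument is a single serial (1001) or an inclusive range (2000-2099).
krl_spec_for_serials() {
  local serial
  for serial in "$@"; do
    case "$serial" in
      "" | *[!0-9-]* | -* | *- | *-*-*)
        echo "invalid serial: $serial" >&2
        return 1
        ;;
      *)
        printf 'serial: %s\n' "$serial"
        ;;
    esac
  done
}
```

The result would then be passed to ssh-keygen, for example krl_spec_for_serials 1001 > krl_spec.txt followed by ssh-keygen -k -f /etc/ssh/revoked_keys -s /etc/ssh/ca_user_key.pub krl_spec.txt.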

Automated Certificate Signing with step-ca

# Use Smallstep's step-ca for automated SSH certificate management
step ca init --ssh

# Users request short-lived certificates
step ssh certificate [email protected] ~/.ssh/id_ed25519 \
  --provisioner admin

# The certificate is valid for a short period (e.g., 16 hours)
# No keys stored on servers, no authorized_keys to manage
# Revocation is handled by short certificate lifetimes

Sudo Configuration

Sudo is the primary mechanism for granting and controlling privileged access on Linux. Proper configuration is critical for both security and auditability.

# /etc/sudoers.d/10-admin-group
# Allow admin group full sudo access (with password)
%admin ALL=(ALL:ALL) ALL

# /etc/sudoers.d/20-deploy-group
# Allow deploy group to manage Docker without password
# (unrestricted docker access is effectively root on the host)
%deploy ALL=(ALL) NOPASSWD: /usr/bin/docker, /usr/bin/docker-compose
%deploy ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart myapp
%deploy ALL=(ALL) NOPASSWD: /usr/bin/systemctl status *

# /etc/sudoers.d/30-monitoring-group
# Allow monitoring group read-only system access
%monitoring ALL=(ALL) NOPASSWD: /usr/bin/journalctl, /usr/bin/systemctl status *
%monitoring ALL=(ALL) NOPASSWD: /usr/bin/docker ps, /usr/bin/docker logs *
%monitoring ALL=(ALL) NOPASSWD: /usr/bin/docker stats --no-stream *

# /etc/sudoers.d/99-security
# Security hardening
Defaults    requiretty
Defaults    use_pty
Defaults    logfile="/var/log/sudo.log"
Defaults    log_input, log_output
Defaults    iolog_dir="/var/log/sudo-io/%{seq}"
Defaults    passwd_timeout=1
Defaults    timestamp_timeout=5
Defaults    secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

Tip: The log_input and log_output options record the complete input and output of every sudo session. Combined with iolog_dir, this creates a full audit trail of every privileged command and its output, which can be replayed with sudoreplay.

# Replay a sudo session
sudo sudoreplay -l          # List recorded sessions
sudo sudoreplay 000001      # Replay session #1

# Search sessions
# (search terms are "keyword value" pairs, without "=")
sudo sudoreplay -l user deploy command docker
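
Rules that combine NOPASSWD with a wildcard or with ALL deserve a second look, because sudoers wildcards in command arguments can match more than intended. The following sketch flags such lines for manual review; flag_broad_nopasswd_rules is a hypothetical helper and its matching is deliberately simple (one rule per line, no line continuations).

```shell
# Print sudoers rules that grant NOPASSWD together with a wildcard or ALL
# in the command list, prefixed with their line number.
# $1: path to a sudoers file or drop-in
flag_broad_nopasswd_rules() {
  awk '
    /^[ \t]*#/ { next }                       # ignore comments
    /NOPASSWD:/ {
      cmds = $0
      sub(/.*NOPASSWD:[ \t]*/, "", cmds)      # keep only the command list
      padded = " " cmds " "
      if (cmds ~ /\*/ || padded ~ /[ ,]ALL[ ,]/)
        print FNR ": " $0
    }
  ' "$1"
}
```

Running it over /etc/sudoers.d/20-deploy-group from the example above would report the systemctl status * rule.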

PAM Modules

Pluggable Authentication Modules (PAM) provide a flexible framework for authentication policy. You can chain multiple authentication methods, enforce MFA, and integrate with external identity providers.

# /etc/pam.d/sshd - SSH authentication stack

# Standard Unix authentication
auth    required    pam_env.so
auth    required    pam_unix.so

# Google Authenticator (TOTP 2FA)
auth    required    pam_google_authenticator.so

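# Deny access after too many failures
auth    required    pam_faildelay.so delay=3000000
# pam_tally2 was removed in Linux-PAM 1.5; on current distributions use
# pam_faillock (preauth/authfail entries) with the same deny and unlock_time values
auth    required    pam_tally2.so deny=5 unlock_time=900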

# Account management
account required    pam_unix.so
account required    pam_nologin.so

# Session configuration
session required    pam_unix.so
session required    pam_limits.so
session optional    pam_lastlog.so
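
Since PAM evaluates modules in the order they appear, it is useful to print just one facility's chain when reviewing a service file. A minimal sketch, assuming simple one-word control values (bracketed controls and @include lines are not handled); pam_stack is a hypothetical helper name.

```shell
# List the modules of one PAM facility in evaluation order.
# $1: facility (auth, account, session, password)
# $2: PAM service file, for example /etc/pam.d/sshd
pam_stack() {
  awk -v t="$1" '$1 == t { print NR ": " $2, $3 }' "$2"
}
```

For example, pam_stack auth /etc/pam.d/sshd shows at a glance whether the TOTP module runs before or after pam_unix.so.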

Setting Up TOTP 2FA for SSH

# Install Google Authenticator PAM module
sudo apt install libpam-google-authenticator

# Each user sets up their TOTP
google-authenticator
# Generates a QR code and emergency scratch codes

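# Configure SSH to require both key AND TOTP
# /etc/ssh/sshd_config
UsePAM yes
AuthenticationMethods publickey,keyboard-interactive
# Since OpenSSH 8.7, ChallengeResponseAuthentication is a deprecated alias
# for KbdInteractiveAuthentication
ChallengeResponseAuthentication yes

# /etc/pam.d/sshd
auth required pam_google_authenticator.so nullok
# nullok allows users who haven't set up 2FA to still log in
# Remove nullok once all users have enrolled
# If pam_unix.so stays in the auth stack, users are also prompted for their
# password; remove it from this stack for key + TOTP only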
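
A quick way to confirm that no authentication path skips the second factor is to inspect the AuthenticationMethods line. The sketch below reads either sshd_config or sshd -T output from stdin; requires_key_and_totp is a hypothetical helper, and it ignores Match blocks, so treat a pass as a first check only.

```shell
# Succeed only if an AuthenticationMethods directive exists and every
# space-separated alternative requires both publickey and keyboard-interactive.
requires_key_and_totp() {
  awk '
    tolower($1) == "authenticationmethods" {
      found = 1
      for (i = 2; i <= NF; i++)
        if ($i !~ /publickey/ || $i !~ /keyboard-interactive/) bad = 1
    }
    END { exit !(found && !bad) }
  '
}
```

Usage might look like sudo sshd -T | requires_key_and_totp || echo "MFA is not enforced".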

LDAP Integration

For organizations with an existing directory service, LDAP integration centralizes user management:

# Install LDAP client and NSS modules
sudo apt install libnss-ldapd libpam-ldapd nscd

# /etc/nslcd.conf
uri ldaps://ldap.example.com
base dc=example,dc=com
ssl on
tls_reqcert demand
tls_cacertfile /etc/ssl/certs/ca-certificates.crt

# Map LDAP groups to system groups
map group cn memberOf

# /etc/nsswitch.conf
passwd: files ldap
group:  files ldap
shadow: files ldap

# PAM LDAP configuration
# /etc/pam.d/common-auth
auth    sufficient  pam_ldap.so
auth    required    pam_unix.so try_first_pass

# Test LDAP authentication
id ldapuser
getent passwd ldapuser

SSSD (Recommended Over nslcd)

# Install SSSD
sudo apt install sssd sssd-ldap

# /etc/sssd/sssd.conf
[sssd]
services = nss, pam, ssh
domains = example.com

[domain/example.com]
id_provider = ldap
auth_provider = ldap
ldap_uri = ldaps://ldap.example.com
ldap_search_base = dc=example,dc=com
ldap_tls_reqcert = demand
ldap_tls_cacert = /etc/ssl/certs/ca-certificates.crt
cache_credentials = true
enumerate = false

# SSH public key lookup from LDAP
ldap_user_ssh_public_key = sshPublicKey

# /etc/ssh/sshd_config
AuthorizedKeysCommand /usr/bin/sss_ssh_authorizedkeys
AuthorizedKeysCommandUser nobody

SSSD is preferred over nslcd/pam_ldap because it provides caching (users can log in even if LDAP is temporarily unreachable), supports multiple identity providers, and integrates directly with SSH for public key lookup from the directory.
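
One common stumbling block when deploying the configuration above is file permissions: per sssd.conf(5), the file must be readable and writable only by its owner, and SSSD refuses to start otherwise. A minimal sketch of a pre-flight check, assuming GNU stat (stat -c); check_sssd_conf_mode is a hypothetical helper and it does not verify ownership.

```shell
# Succeed if the given sssd.conf has mode 600.
# $1: path to the file (defaults to /etc/sssd/sssd.conf)
check_sssd_conf_mode() {
  local file="${1:-/etc/sssd/sssd.conf}" mode
  mode=$(stat -c '%a' "$file") || return 2     # GNU coreutils stat
  if [ "$mode" = "600" ]; then
    echo "$file: mode $mode ok"
  else
    echo "$file: mode $mode, expected 600" >&2
    return 1
  fi
}
```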

RBAC Patterns for Server Access

Define clear access tiers that map to organizational roles:

| Role | SSH Access | Sudo Access | Use Case |
|---|---|---|---|
| Viewer | Read-only commands | journalctl, docker ps, systemctl status | Support, monitoring |
| Operator | Service management | docker, systemctl restart, deployment scripts | DevOps, on-call |
| Admin | Full access | ALL (with password) | Infrastructure team |
| Emergency | Break-glass only | ALL NOPASSWD | Sealed credentials for emergencies |

# Implement with system groups
sudo groupadd server-viewer
sudo groupadd server-operator
sudo groupadd server-admin

# Assign users to groups
sudo usermod -aG server-operator deploy-user
sudo usermod -aG server-admin infra-admin

# Restrict SSH access by group
# /etc/ssh/sshd_config
AllowGroups server-viewer server-operator server-admin

# Or use Match blocks for group-specific SSH settings
Match Group server-viewer
    ForceCommand /usr/local/bin/restricted-shell.sh
    AllowTcpForwarding no
    X11Forwarding no

Match Group server-operator
    AllowTcpForwarding yes
    X11Forwarding no
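
The ForceCommand script referenced for the viewer group is not shown in this guide. The following is one possible sketch of its core check, under stated assumptions: the allowlist mirrors the Viewer row of the table, the function name is_allowed_for_viewer is hypothetical, and the requested command is taken from SSH_ORIGINAL_COMMAND, which sshd sets for forced commands.

```shell
# Decide whether a requested command is on the viewer allowlist.
# $1: the command string the client asked to run
is_allowed_for_viewer() {
  local nl='
'
  # Reject embedded newlines, then anything outside a conservative character set
  # (this excludes shell metacharacters such as ; | & $ ` and glob characters).
  case "$1" in
    *"$nl"*) return 1 ;;
  esac
  if printf '%s' "$1" | LC_ALL=C grep -q '[^A-Za-z0-9 ._/=:@-]'; then
    return 1
  fi
  # Illustrative allowlist; adjust to the commands your viewers need.
  case "$1" in
    "docker ps" | "docker ps "*)    return 0 ;;
    "systemctl status "*)           return 0 ;;
    "journalctl" | "journalctl "*)  return 0 ;;
    *)                              return 1 ;;
  esac
}

# In /usr/local/bin/restricted-shell.sh the main flow could then be roughly:
#   is_allowed_for_viewer "$SSH_ORIGINAL_COMMAND" || { echo "command not permitted" >&2; exit 1; }
#   exec sudo -n $SSH_ORIGINAL_COMMAND
```

An allowlist of characters is used here instead of a blocklist so that unexpected shell syntax fails closed. Interactive logins arrive with an empty SSH_ORIGINAL_COMMAND and are rejected by this check.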

Session Recording

For compliance and incident investigation, record complete SSH sessions:

Using script Command

# /etc/profile.d/session-recording.sh
if [ -n "$SSH_CONNECTION" ] && [ -z "$SESSION_RECORDING" ]; then
  export SESSION_RECORDING=1
  SESSION_LOG="/var/log/sessions/$(whoami)_$(date +%Y%m%d_%H%M%S)_$$"
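  # Non-root users cannot create this directory; pre-create it as root with
  # write access for the recorded users
  mkdir -p /var/log/sessions 2>/dev/null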
  script -q -f -t 2>"${SESSION_LOG}.timing" "${SESSION_LOG}.log"
  exit
fi

Using tlog (Recommended)

# Install tlog
sudo apt install tlog

# Configure tlog as the login shell for audited users
# /etc/tlog/tlog-rec-session.conf
{
    "shell": "/bin/bash",
    "notice": "\\nATTENTION: Your session is being recorded.\\n",
    "writer": "journal",
    "journal": {
        "priority": "info",
        "augment": true
    }
}

# Set tlog as the shell for specific users or groups
sudo usermod -s /usr/bin/tlog-rec-session audited-user

# Or configure via SSSD for LDAP users
# /etc/sssd/sssd.conf
[session_recording]
scope = some
users = admin1, admin2
groups = server-admin

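# Replay a session (append the recording ID after TLOG_REC=)
tlog-play -r journal -M TLOG_REC=

# List the recording IDs present in the journal
# (journalctl field matches are literal, so TLOG_REC=* does not work as a wildcard)
journalctl -F TLOG_REC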

Automation with Ansible

Automate the deployment and maintenance of access controls across your fleet:

# roles/ssh-hardening/tasks/main.yml
---
- name: Configure SSH daemon
  template:
    src: sshd_config.j2
    dest: /etc/ssh/sshd_config
    owner: root
    group: root
    mode: '0600'
    validate: '/usr/sbin/sshd -t -f %s'
  notify: restart sshd

- name: Deploy SSH CA public key
  copy:
    content: "{{ ssh_ca_user_public_key }}"
    dest: /etc/ssh/ca_user_key.pub
    owner: root
    group: root
    mode: '0644'

- name: Create access groups
  group:
    name: "{{ item }}"
    state: present
  loop:
    - server-viewer
    - server-operator
    - server-admin

- name: Configure sudo rules
  template:
    src: "{{ item }}.j2"
    dest: "/etc/sudoers.d/{{ item }}"
    owner: root
    group: root
    mode: '0440'
    validate: '/usr/sbin/visudo -cf %s'
  loop:
    - 10-admin-group
    - 20-deploy-group
    - 30-monitoring-group
    - 99-security

- name: Deploy audit rules for SSH
  template:
    src: audit-ssh.rules.j2
    dest: /etc/audit/rules.d/50-ssh.rules
  notify: restart auditd

- name: Install and configure tlog
  package:
    name: tlog
    state: present

- name: Configure tlog for session recording
  template:
    src: tlog-rec-session.conf.j2
    dest: /etc/tlog/tlog-rec-session.conf

# templates/sshd_config.j2
# Hardened SSH configuration
Port {{ ssh_port | default(22) }}
ListenAddress 0.0.0.0

# Authentication
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AuthenticationMethods publickey
TrustedUserCAKeys /etc/ssh/ca_user_key.pub

# Access control
AllowGroups server-viewer server-operator server-admin

# Security
MaxAuthTries 3
MaxSessions 5
LoginGraceTime 30
ClientAliveInterval 300
ClientAliveCountMax 2
X11Forwarding no
AllowAgentForwarding no
AllowTcpForwarding no

# Logging
LogLevel VERBOSE
SyslogFacility AUTH

# Hardened crypto
KexAlgorithms curve25519-sha256,[email protected]
Ciphers [email protected],[email protected]
MACs [email protected],[email protected]
HostKeyAlgorithms ssh-ed25519,[email protected]

# Run the playbook across your fleet
ansible-playbook -i inventory.yml site.yml --tags ssh-hardening

# Verify SSH configuration (sshd -T needs root, hence --become)
ansible all --become -m shell -a "sshd -T | grep -E '(permitrootlogin|passwordauthentication|trustedusercakeys)'"
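
The grep above shows the effective values but leaves the comparison to the reader. A minimal sketch of an automated check follows; check_sshd_settings is a hypothetical helper that reads sshd -T output (lowercase "key value" lines) from stdin and compares it with the expected pairs passed as arguments.

```shell
# Compare `sshd -T` output on stdin with expected "key value" pairs.
# Prints one line per mismatch and returns non-zero if any are found.
check_sshd_settings() {
  local effective status=0 want key value actual
  effective=$(cat)
  for want in "$@"; do
    key=${want%% *}          # text before the first space
    value=${want#* }         # text after the first space
    actual=$(printf '%s\n' "$effective" |
      awk -v k="$key" '$1 == k { $1 = ""; sub(/^ /, ""); print; exit }')
    if [ "$actual" != "$value" ]; then
      echo "MISMATCH $key: expected '$value', got '${actual:-unset}'"
      status=1
    fi
  done
  return "$status"
}
```

On a host this could be run as sudo sshd -T | check_sshd_settings 'permitrootlogin no' 'passwordauthentication no' 'trustedusercakeys /etc/ssh/ca_user_key.pub'.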

Privileged Access Management (PAM) Workflow

  1. Request: User requests access to a specific server for a defined purpose and duration
  2. Approve: Manager or security team approves the request
  3. Grant: System issues a short-lived SSH certificate with the appropriate principals
  4. Use: User connects, and the session is recorded
  5. Expire: Certificate expires automatically after the approved duration
  6. Audit: All sessions are available for review

Tip: Short-lived SSH certificates (8-16 hours) are among the most impactful improvements you can make to your server access security. They largely remove the need for explicit key revocation, because a leaked certificate stops working within hours; they prevent stale access; and they create a natural audit point at each renewal. Combined with session recording, they provide complete accountability for server access.
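
The Grant and Expire steps of the workflow can be tied together by deriving the certificate validity from the approved duration and capping it. This is a minimal sketch, assuming durations are requested in whole hours; the MAX_CERT_HOURS variable and the cert_validity_arg function are hypothetical names, and the 16-hour cap is taken from the range in the tip above.

```shell
# Upper bound for any issued certificate, in hours (assumed policy value).
MAX_CERT_HOURS=16

# Turn an approved duration in hours into a value for ssh-keygen -V,
# capped at MAX_CERT_HOURS.
cert_validity_arg() {
  local hours="$1"
  case "$hours" in
    "" | *[!0-9]*)
      echo "duration must be a whole number of hours" >&2
      return 1
      ;;
  esac
  if [ "$hours" -lt 1 ]; then
    echo "duration must be at least 1 hour" >&2
    return 1
  fi
  if [ "$hours" -gt "$MAX_CERT_HOURS" ]; then
    hours=$MAX_CERT_HOURS
  fi
  printf '+%sh\n' "$hours"
}
```

A signing wrapper could then call ssh-keygen -s with -V "$(cert_validity_arg "$approved_hours")", where approved_hours comes from the approval step.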

Server access management is a discipline, not a one-time configuration. Regular access reviews, automated key rotation, centralized logging, and session recording form the foundation of a secure access model. The tools exist to implement every aspect described in this guide. The challenge is organizational discipline in deploying and maintaining them consistently across your infrastructure.