Bash Shell Scripting: From Basics to Production-Ready Scripts
Every Linux administrator writes shell scripts. The question is whether those scripts are reliable, maintainable tools or fragile hacks that break under unexpected conditions. The difference between a hobby script and a production-ready one comes down to error handling, input validation, proper quoting, and defensive coding practices.
This guide takes you from the fundamentals through to the practices that separate scripts that run in cron jobs on production servers from scripts that should never leave a developer's laptop.
The Foundation: Script Header
Every production script should start with these lines:
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
These settings are important enough to deserve a detailed explanation:
| Setting | Effect | Without It |
|---|---|---|
| set -e | Exit immediately on any command failure | Script silently continues after errors |
| set -u | Treat unset variables as errors | Unset variables expand to the empty string (dangerous) |
| set -o pipefail | Pipeline fails if any command fails, not just the last | bad_cmd \| good_cmd reports success |
| IFS=$'\n\t' | Safer word splitting (newlines and tabs only, no space splitting) | Filenames with spaces break loops |
Variables and Quoting
Improper quoting is the single most common source of bugs in shell scripts:
# Always quote variable expansions
name="my file.txt"
cp "$name" /backup/ # Correct
cp $name /backup/ # WRONG: splits into "my" and "file.txt"
# Use curly braces for clarity
echo "Processing ${name} now"
echo "File: ${filename%.txt}.csv" # Parameter expansion
# Default values
DB_HOST="${DB_HOST:-localhost}" # Use default if unset
DB_PORT="${DB_PORT:=5432}" # Set and use default if unset
# Required variables
: "${API_KEY:?ERROR: API_KEY must be set}" # Exit with error if unset
# Read-only variables
readonly CONFIG_FILE="/etc/myapp/config.yaml"
declare -r LOG_DIR="/var/log/myapp"
# Arrays
servers=("web1" "web2" "web3")
echo "${servers[0]}" # First element
echo "${servers[@]}" # All elements
echo "${#servers[@]}" # Array length
for server in "${servers[@]}"; do
echo "Deploying to $server"
done
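One subtlety worth knowing: inside double quotes, [@] expands to one word per element, while [*] joins all elements into a single word. A quick sketch:

```shell
#!/usr/bin/env bash
servers=("web1" "web2" "web3")

# "${servers[@]}" -> one word per element (what you almost always want)
printf '<%s>\n' "${servers[@]}"   # <web1> <web2> <web3>, each on its own line

# "${servers[*]}" -> a single word, elements joined by the first IFS character
printf '<%s>\n' "${servers[*]}"   # <web1 web2 web3>
```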
Conditionals
# String comparison (always use [[ ]] in Bash, not [ ])
if [[ "$status" == "running" ]]; then
echo "Service is running"
elif [[ "$status" == "stopped" ]]; then
echo "Service is stopped"
else
echo "Unknown status: $status"
fi
# Numeric comparison
if (( count > 10 )); then
echo "Count exceeds threshold"
fi
# File tests
if [[ -f "$config_file" ]]; then # File exists
source "$config_file"
fi
if [[ -d "$dir" ]]; then # Directory exists
echo "Directory found"
fi
if [[ -w "$file" ]]; then # File is writable
echo "Can write to $file"
fi
if [[ -z "$variable" ]]; then # Variable is empty
echo "Variable is empty"
fi
if [[ -n "$variable" ]]; then # Variable is not empty
echo "Variable has value: $variable"
fi
# Command success
if command -v docker &> /dev/null; then
echo "Docker is installed"
fi
# Pattern matching
if [[ "$filename" == *.tar.gz ]]; then
echo "Compressed archive"
fi
# Regex matching
if [[ "$email" =~ ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ ]]; then
echo "Valid email format"
fi
Loops
# For loop over files (safe with spaces)
for file in /var/log/*.log; do
[[ -f "$file" ]] || continue # Skip if glob doesn't match
echo "Processing: $file"
done
# C-style for loop
for (( i=0; i<10; i++ )); do
echo "Iteration $i"
done
# While loop reading lines from a file
while IFS= read -r line; do
echo "Line: $line"
done < /etc/hosts
# While loop with command output
while IFS= read -r container; do
echo "Container: $container"
done < <(docker ps -q)
# Until loop
until docker ps &> /dev/null; do
echo "Waiting for Docker daemon..."
sleep 2
done
# Process substitution (avoid subshell variable scope issues)
count=0
while IFS= read -r line; do
(( ++count )) # pre-increment: (( count++ )) would evaluate to 0 here and abort under set -e
done < <(docker ps -q)
echo "Total containers: $count" # Works! (wouldn't with pipe)
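To see why the process substitution matters, here is the same count written with a pipe. The while body runs in a subshell, so the parent shell never sees the increments (seq stands in for docker ps so the sketch runs anywhere):

```shell
#!/usr/bin/env bash
# Piped version: the loop body executes in a subshell.
count=0
seq 1 3 | while IFS= read -r line; do
    (( ++count ))              # increments a copy that dies with the subshell
done
echo "after pipe: $count"      # prints 0

# Process substitution: the loop runs in the current shell.
count=0
while IFS= read -r line; do
    (( ++count ))
done < <(seq 1 3)
echo "after process substitution: $count"   # prints 3
```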
Functions
# Function definition
log() {
local level="$1"
shift
echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*" >&2
}
log INFO "Starting deployment"
log ERROR "Connection failed"
# Function with return value
is_container_running() {
local name="$1"
docker ps --filter "name=$name" --filter "status=running" -q | grep -q .
}
if is_container_running "nginx"; then
log INFO "nginx is running"
fi
# Function with output capture
get_container_ip() {
local name="$1"
docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$name"
}
ip=$(get_container_ip "nginx")
echo "nginx IP: $ip"
# Local variables (always use local in functions)
process_file() {
local file="$1"
local -r max_size=1048576 # Read-only local
local size
size=$(stat -c %s "$file" 2>/dev/null) || {
log ERROR "Cannot stat: $file"
return 1
}
if (( size > max_size )); then
log WARN "File too large: $file ($size bytes)"
return 1
fi
echo "$file"
}
Error Handling
# Trap for cleanup on exit
cleanup() {
local exit_code=$?
log INFO "Cleaning up..."
rm -f "$TEMP_FILE"
[[ -d "$TEMP_DIR" ]] && rm -rf "$TEMP_DIR"
exit "$exit_code"
}
trap cleanup EXIT
# Trap for specific signals
on_sigint() {
log WARN "Interrupted by user"
exit 130
}
trap on_sigint INT
# Create safe temp files
TEMP_FILE=$(mktemp /tmp/myapp.XXXXXX)
TEMP_DIR=$(mktemp -d /tmp/myapp.XXXXXX)
# Error handling patterns
do_critical_thing() {
some_command || {
log ERROR "Critical operation failed"
exit 1
}
}
# Retry logic
retry() {
local max_attempts="$1"
local delay="$2"
shift 2
local attempt=1
until "$@"; do
if (( attempt >= max_attempts )); then
log ERROR "Failed after $max_attempts attempts: $*"
return 1
fi
log WARN "Attempt $attempt failed. Retrying in ${delay}s..."
sleep "$delay"
(( attempt++ ))
done
}
# Usage: retry 5 10 docker pull nginx:latest
retry 5 10 docker pull nginx:latest
Argument Parsing
#!/usr/bin/env bash
set -euo pipefail
# Script metadata
SCRIPT_NAME="$(basename "$0")"
readonly SCRIPT_NAME # assign first, then mark read-only (avoids SC2155)
readonly SCRIPT_VERSION="1.0.0"
# Default values
VERBOSE=false
DRY_RUN=false
CONFIG_FILE="/etc/myapp/config.yaml"
OUTPUT_DIR="."
usage() {
cat << EOF
Usage: $SCRIPT_NAME [OPTIONS] COMMAND
Commands:
deploy Deploy the application
backup Create a backup
status Show current status
Options:
-c, --config FILE Configuration file (default: $CONFIG_FILE)
-o, --output DIR Output directory (default: $OUTPUT_DIR)
-v, --verbose Enable verbose output
-n, --dry-run Show what would be done without doing it
-h, --help Show this help message
--version Show version
Examples:
$SCRIPT_NAME deploy
$SCRIPT_NAME -c /opt/myapp/config.yaml backup
$SCRIPT_NAME --verbose --dry-run deploy
EOF
}
# Parse arguments
while [[ $# -gt 0 ]]; do
case "$1" in
-c|--config)
CONFIG_FILE="$2"
shift 2
;;
-o|--output)
OUTPUT_DIR="$2"
shift 2
;;
-v|--verbose)
VERBOSE=true
shift
;;
-n|--dry-run)
DRY_RUN=true
shift
;;
-h|--help)
usage
exit 0
;;
--version)
echo "$SCRIPT_NAME $SCRIPT_VERSION"
exit 0
;;
-*)
echo "Error: Unknown option: $1" >&2
usage >&2
exit 1
;;
*)
COMMAND="$1"
shift
break
;;
esac
done
# Validate required arguments
if [[ -z "${COMMAND:-}" ]]; then
echo "Error: No command specified" >&2
usage >&2
exit 1
fi
# Validate config file exists
if [[ ! -f "$CONFIG_FILE" ]]; then
echo "Error: Config file not found: $CONFIG_FILE" >&2
exit 1
fi
Logging
# Structured logging function
LOG_FILE="/var/log/myapp/script.log"
log() {
local level="$1"
shift
local timestamp
timestamp=$(date '+%Y-%m-%d %H:%M:%S')
local message="[$timestamp] [$level] $*"
# Always write to log file
echo "$message" >> "$LOG_FILE"
# Write to stderr based on level
case "$level" in
ERROR|WARN)
echo "$message" >&2
;;
INFO)
echo "$message" >&2
;;
DEBUG)
if [[ "$VERBOSE" == true ]]; then
echo "$message" >&2
fi
;;
esac
}
# Usage
log INFO "Deployment started"
log DEBUG "Using config: $CONFIG_FILE"
log WARN "Disk space below 20%"
log ERROR "Failed to connect to database"
Parallel Execution
# Simple background parallelism
deploy_to_server() {
local server="$1"
log INFO "Deploying to $server..."
ssh "$server" "cd /opt/app && docker compose pull && docker compose up -d"
log INFO "Deployed to $server"
}
servers=("web1" "web2" "web3" "web4")
# Run deployments in parallel
pids=()
for server in "${servers[@]}"; do
deploy_to_server "$server" &
pids+=($!)
done
# Wait for all to complete and check results
failed=0
for pid in "${pids[@]}"; do
if ! wait "$pid"; then
(( ++failed )) # pre-increment so the first failure doesn't abort under set -e
fi
done
if (( failed > 0 )); then
log ERROR "$failed deployments failed"
exit 1
fi
# GNU Parallel (if available, more powerful)
export -f deploy_to_server log # parallel runs jobs in new shells, so functions must be exported
parallel --jobs 4 --halt soon,fail=1 \
deploy_to_server ::: "${servers[@]}"
# xargs parallelism
printf '%s\n' "${servers[@]}" | xargs -P 4 -I {} \
ssh {} "docker compose -f /opt/app/docker-compose.yml pull"
Common Patterns
Lock File (Prevent Concurrent Execution)
LOCK_DIR="/var/run/myapp-deploy.lock"
acquire_lock() {
# mkdir is atomic: it fails if the directory already exists
if ! mkdir "$LOCK_DIR" 2>/dev/null; then
local pid
pid=$(cat "$LOCK_DIR/pid" 2>/dev/null || echo "unknown")
log ERROR "Another instance is running (PID: $pid)"
exit 1
fi
echo $$ > "$LOCK_DIR/pid"
trap 'rm -rf "$LOCK_DIR"' EXIT
}
acquire_lock
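An alternative to the mkdir pattern is flock(1) from util-linux: the kernel releases the lock automatically when the holding file descriptor closes, so a crashed script cannot leave a stale lock behind. A sketch, assuming the lock path is writable:

```shell
#!/usr/bin/env bash
LOCK_FILE="/var/run/myapp-deploy.lock"

# Open the lock file on file descriptor 200, then try to take an
# exclusive lock; -n fails immediately instead of blocking.
exec 200>"$LOCK_FILE"
if ! flock -n 200; then
    echo "Another instance is running" >&2
    exit 1
fi
# The lock is held until fd 200 closes, that is, when the script
# exits for any reason, including crashes. No cleanup trap needed.
```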
Progress Indicator
spinner() {
local pid=$1
local spin='|/-\'
local i=0
while kill -0 "$pid" 2>/dev/null; do
printf "\r[%c] Working..." "${spin:i++%${#spin}:1}"
sleep 0.2
done
printf "\r[done] \n"
}
long_running_command &
spinner $!
Configuration File Parsing
# Parse simple KEY=VALUE config files
load_config() {
local config_file="$1"
if [[ ! -f "$config_file" ]]; then
log ERROR "Config file not found: $config_file"
return 1
fi
while IFS='=' read -r key value; do
# Skip comments and empty lines
[[ "$key" =~ ^#.*$ || -z "$key" ]] && continue
# Trim whitespace
key=$(echo "$key" | xargs)
value=$(echo "$value" | xargs)
# Remove quotes
value="${value%\"}"
value="${value#\"}"
# Export as environment variable
export "$key=$value"
done < "$config_file"
}
load_config "/etc/myapp/app.conf"
ShellCheck: Your Best Friend
# Install ShellCheck
pacman -S shellcheck # Arch
apt install shellcheck # Debian/Ubuntu
# Check a script
shellcheck myscript.sh
# Check with specific shell
shellcheck -s bash myscript.sh
# Ignore specific warnings
shellcheck -e SC2034,SC2086 myscript.sh
# In-script directives
# shellcheck disable=SC2034
unused_variable="this is intentional"
# Common ShellCheck warnings to understand:
# SC2086 - Double quote to prevent globbing/splitting
# SC2034 - Variable appears unused
# SC2155 - Declare and assign separately to avoid masking return values
# SC2164 - Use cd ... || exit in case cd fails
# SC2206 - Quote to prevent word splitting on array assignment
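To make SC2086 concrete, here is the failure mode it guards against, using a hypothetical filename containing a space (runnable as-is with the default IFS):

```shell
#!/usr/bin/env bash
file="my report.txt"   # hypothetical filename with a space

# Unquoted expansion: word splitting turns one value into two arguments.
set -- $file
echo "unquoted: $# arguments"   # prints 2

# Quoted expansion: one value stays one argument, which is the SC2086 fix.
set -- "$file"
echo "quoted: $# arguments"     # prints 1
```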
A Complete Production Script Example
#!/usr/bin/env bash
#
# docker-deploy.sh - Deploy Docker Compose applications with rollback support
#
set -euo pipefail
IFS=$'\n\t'
SCRIPT_NAME="$(basename "$0")"
readonly SCRIPT_NAME
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
readonly SCRIPT_DIR # declare and assign separately to avoid SC2155
readonly LOG_FILE="/var/log/docker-deploy.log"
# Defaults
COMPOSE_FILE="docker-compose.yml"
ROLLBACK_ENABLED=true
HEALTH_CHECK_TIMEOUT=60
VERBOSE=false
log() {
local level="$1"; shift
echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*" | tee -a "$LOG_FILE" >&2
}
cleanup() {
local exit_code=$?
if (( exit_code != 0 )) && [[ "$ROLLBACK_ENABLED" == true ]]; then
log WARN "Deployment failed. Initiating rollback..."
rollback
fi
rm -f "${TEMP_FILE:-}" # delete the backup only after rollback has had a chance to use it
exit "$exit_code"
}
trap cleanup EXIT
trap 'log WARN "Interrupted"; exit 130' INT TERM
usage() {
cat << EOF
Usage: $SCRIPT_NAME [OPTIONS] PROJECT_DIR
Deploy a Docker Compose application with health checking and rollback.
Options:
-f FILE Compose file name (default: $COMPOSE_FILE)
-t SECONDS Health check timeout (default: $HEALTH_CHECK_TIMEOUT)
--no-rollback Disable automatic rollback on failure
-v, --verbose Verbose output
-h, --help Show this help
EOF
}
rollback() {
if [[ -f "${BACKUP_COMPOSE:-}" ]]; then
log INFO "Rolling back to previous version..."
docker compose -f "$BACKUP_COMPOSE" up -d
fi
}
health_check() {
local timeout="$1"
local elapsed=0
while (( elapsed < timeout )); do
if docker compose -f "$COMPOSE_FILE" ps --status running -q | grep -q .; then
local unhealthy
unhealthy=$(docker compose -f "$COMPOSE_FILE" ps | grep -c "unhealthy" || true)
if (( unhealthy == 0 )); then
log INFO "Health check passed"
return 0
fi
fi
sleep 5
(( elapsed += 5 ))
log DEBUG "Health check: ${elapsed}s / ${timeout}s"
done
log ERROR "Health check timed out after ${timeout}s"
return 1
}
# Parse arguments
while [[ $# -gt 0 ]]; do
case "$1" in
-f) COMPOSE_FILE="$2"; shift 2 ;;
-t) HEALTH_CHECK_TIMEOUT="$2"; shift 2 ;;
--no-rollback) ROLLBACK_ENABLED=false; shift ;;
-v|--verbose) VERBOSE=true; shift ;;
-h|--help) usage; exit 0 ;;
-*) log ERROR "Unknown option: $1"; usage >&2; exit 1 ;;
*) PROJECT_DIR="$1"; shift; break ;;
esac
done
: "${PROJECT_DIR:?ERROR: Project directory required}"
cd "$PROJECT_DIR"
log INFO "Deploying $PROJECT_DIR"
# Pull new images
log INFO "Pulling images..."
docker compose -f "$COMPOSE_FILE" pull
# Backup current state
TEMP_FILE=$(mktemp)
docker compose -f "$COMPOSE_FILE" config > "$TEMP_FILE"
BACKUP_COMPOSE="$TEMP_FILE"
# Deploy
log INFO "Starting deployment..."
docker compose -f "$COMPOSE_FILE" up -d --remove-orphans
# Health check
health_check "$HEALTH_CHECK_TIMEOUT"
log INFO "Deployment completed successfully"
Shell scripts are the glue of Linux administration, and they are often the first automation tool used to manage Docker infrastructure. When your scripts grow beyond a few hundred lines or need complex logic, consider whether a tool like usulnet might handle the task more reliably through its built-in automation features.
The rule of production scripts: if a script runs in cron or is triggered by an automated system, it must have set -euo pipefail, proper logging, error handling with cleanup traps, and it must pass ShellCheck with zero warnings. Scripts without these safeguards will fail silently, and you will not know until it is too late.