
Linux performance tuning for low-power ARM servers: a practical guide

Squeeze every bit of performance from your Pi: CPU governor, memory management, swap configuration, I/O scheduler tuning, and monitoring with lightweight tools — all tested on a Raspberry Pi 5 running a 24/7 production workload.

Context: why tuning matters on ARM

A Raspberry Pi 5 is not a server in the traditional sense. It has 4 ARM Cortex-A76 cores, up to 8GB of LPDDR4X RAM, and no ECC memory. The kernel's default configuration is optimized for general desktop use — not for a headless server running continuous background processes.

The goal of these tweaks is not to squeeze maximum raw performance, but to ensure consistent, predictable behavior under sustained load: no unexpected thermal throttling, no I/O stalls, no OOM kills at 3am when no one is watching.

CPU governor

The CPU governor controls how the kernel scales clock frequency in response to load. Check the current governor:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# default: ondemand or powersave

For a server with mostly idle but occasionally bursty workloads (like a trading bot that wakes up every 4 hours), ondemand is generally the right choice — it ramps up quickly when needed and backs off to save power when idle.

Governor      Behavior                                 Best for
powersave     Always minimum frequency                 Battery / maximum power saving
ondemand      Scales up fast on load, down when idle   Most server workloads ✓
performance   Always maximum frequency                 Latency-critical / benchmarking
schedutil     Kernel scheduler-integrated scaling      Modern kernels, general purpose

# Set governor for all cores
echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Make persistent via cpufrequtils
sudo apt install cpufrequtils
echo 'GOVERNOR="ondemand"' | sudo tee /etc/default/cpufrequtils
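A quick sanity check that every core picked up the new governor:

```shell
# Each line prints one core's sysfs path and its active governor;
# all four cores should report ondemand
grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```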

Temperature and thermal throttling

The Pi 5 will start throttling the CPU at 80°C. Monitor the thermal zone:

# Current CPU temperature
cat /sys/class/thermal/thermal_zone0/temp
# Returns value in millidegrees: 46300 = 46.3°C

# Watch temperature in real time
watch -n2 'vcgencmd measure_temp'
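The millidegree format trips people up in scripts. A tiny helper (a sketch, not tied to any particular tool) converts it for logging or alerting:

```shell
# Convert a millidegree reading to degrees Celsius with one decimal.
# Takes the raw value as an argument so it can be checked against
# a fixed sample instead of live hardware.
to_celsius() {
  awk -v m="$1" 'BEGIN { printf "%.1f", m / 1000 }'
}

# On a live Pi:
# to_celsius "$(cat /sys/class/thermal/thermal_zone0/temp)"
```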

If you consistently see temperatures above 70°C under load, throttling is likely degrading performance. Solutions in order of invasiveness:

  1. Heatsink on the SoC (drops idle 10–15°C)
  2. Thermal pad between the SoC and the case
  3. Active cooling (small 5V fan triggered by temperature)
  4. Reduce arm_freq in /boot/config.txt if passive cooling is not enough
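Before and after any cooling change, it's worth checking whether the firmware ever throttled. vcgencmd get_throttled returns a bitmask; a minimal decoder for the temperature-related bits (bit meanings taken from the Raspberry Pi firmware documentation) might look like:

```shell
# Decode the temperature-related bits of `vcgencmd get_throttled`.
# Takes the hex value as an argument so the logic is testable offline.
decode_throttled() {
  v=$(( $1 ))
  if [ $(( v & 0x4 ))     -ne 0 ]; then echo "currently throttled"; fi
  if [ $(( v & 0x8 ))     -ne 0 ]; then echo "soft temp limit active"; fi
  if [ $(( v & 0x40000 )) -ne 0 ]; then echo "throttling has occurred"; fi
  if [ $(( v & 0x80000 )) -ne 0 ]; then echo "soft temp limit has occurred"; fi
}

# On a live Pi:
# decode_throttled "$(vcgencmd get_throttled | cut -d= -f2)"
```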
Pi 5 note

The Pi 5 runs noticeably hotter than the Pi 4 at the same workload due to the faster SoC. A passive heatsink that was sufficient on Pi 4 may not be enough on Pi 5. After adding a 20mm aluminum heatsink, idle temperature dropped from 58°C to 44°C on this setup.

Memory and swap

vm.swappiness

The default swappiness=60 biases the kernel toward swapping relatively early. Note that swappiness is not a percentage threshold: it is a relative weight between reclaiming page cache and swapping out anonymous memory. For a server with 8GB RAM running a workload that peaks at ~600MB, the default is more aggressive than needed and wastes I/O bandwidth swapping out pages that could stay in RAM.

# Check current value
cat /proc/sys/vm/swappiness

# Reduce to avoid unnecessary swapping
sudo sysctl vm.swappiness=10

# Persist
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.d/99-tuning.conf

Swap on NVMe vs SD card

If your Pi boots from an SD card, avoid swap entirely or move it to an NVMe SSD. SD cards have limited write endurance and extremely poor random I/O performance. A swap storm on an SD card will wear the card out prematurely and make the system unresponsive while it lasts.

# Check if swap is on SD (mmcblk) or NVMe (nvme0n1)
swapon --show

# If on SD, consider disabling swap or using zram instead
sudo systemctl disable dphys-swapfile
sudo swapoff -a

zram: RAM-compressed swap

zram creates a compressed block device in RAM that acts as swap. It's significantly faster than disk-based swap and is ideal for Pi setups where you want swap as a safety net without the I/O penalty:

sudo apt install zram-tools
sudo systemctl enable zramswap
sudo systemctl start zramswap
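On Debian-based systems, zram-tools sizes the device from /etc/default/zramswap (the PERCENT variable, as a fraction of RAM; treat the exact variable name as an assumption and check the file's comments, since it has varied across versions). A sketch of the sizing arithmetic plus verification commands:

```shell
# Sketch: expected zram size in MB for a given PERCENT of RAM.
# Takes MemTotal in kB so the arithmetic can be checked with a fixed input.
zram_size_mb() {
  echo $(( $1 * $2 / 100 / 1024 ))
}

# On a live system:
# awk '/MemTotal/ {print $2}' /proc/meminfo       # total RAM in kB
# echo 'PERCENT=25' | sudo tee -a /etc/default/zramswap
# sudo systemctl restart zramswap
# zramctl                                          # device, algorithm, size
# swapon --show                                    # zram should be listed
```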

I/O scheduler

Linux supports multiple I/O schedulers. The optimal choice depends on your storage hardware:

Scheduler     Best for
mq-deadline   SATA/USB SSDs and fast flash without deep hardware queues
bfq           SD card / slow rotational storage (prioritizes fairness)
none          NVMe with hardware queueing (let the device handle ordering)

# Check current scheduler per device
cat /sys/block/nvme0n1/queue/scheduler
# [none] mq-deadline kyber bfq

# Set scheduler at runtime (resets on reboot)
echo mq-deadline | sudo tee /sys/block/nvme0n1/queue/scheduler

# Persist via udev rule
echo 'ACTION=="add|change", KERNEL=="nvme*", ATTR{queue/scheduler}="none"' | \
  sudo tee /etc/udev/rules.d/60-scheduler.rules
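udev rules only fire on device events, so either reboot or replay a change event to apply the rule immediately:

```shell
# Reload rules and replay the "change" event for block devices
sudo udevadm control --reload-rules
sudo udevadm trigger --subsystem-match=block --action=change

# The bracketed entry should now be the scheduler from the rule
cat /sys/block/nvme0n1/queue/scheduler
```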

Systemd journal size

By default, journald can consume several GB of disk space over time. On a Pi with limited storage, constrain it:

# Append limits to /etc/systemd/journald.conf (the keys belong in the
# [Journal] section, which the stock file already declares)
sudo tee -a /etc/systemd/journald.conf << EOF
SystemMaxUse=200M
SystemKeepFree=500M
MaxRetentionSec=2week
EOF

sudo systemctl restart systemd-journald
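The new caps govern future growth; to see current usage and reclaim space right away, journalctl has vacuum flags built in:

```shell
# How much space the journal currently occupies
journalctl --disk-usage

# Trim archived journal files down to the new cap immediately
sudo journalctl --vacuum-size=200M
```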

Lightweight monitoring stack

Avoid heavy monitoring solutions (Prometheus + Grafana stack requires 500MB+ RAM). For a Pi homelab, these tools cover 95% of needs with negligible overhead:

Tool      Use                                  Install
htop      Interactive CPU/memory overview      sudo apt install htop
vmstat    Memory, swap, I/O stats over time    Built-in (procps)
iostat    Disk throughput per device           sudo apt install sysstat
nethogs   Per-process network bandwidth        sudo apt install nethogs
ncdu      Disk usage breakdown                 sudo apt install ncdu

For continuous monitoring without a full stack, a simple cron job writing key metrics to a JSON endpoint (CPU, RAM, temp, uptime, disk) is often enough — and can feed a live dashboard on your portfolio site.

# Quick system snapshot
vmstat -s | grep -E 'memory|swap|cpu'
iostat -x 1 3 /dev/nvme0n1
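The cron-to-JSON idea above can be sketched in a few lines of shell. The field names and the output path are arbitrary choices for illustration, not a fixed schema:

```shell
#!/bin/sh
# Minimal metrics snapshot as JSON. Formatting is split out into
# emit_json so it can be tested without touching /proc or /sys.
emit_json() {
  printf '{"cpu_load":"%s","mem_used_mb":%s,"temp_c":%s,"disk_used_pct":"%s"}\n' \
    "$1" "$2" "$3" "$4"
}

# On a live system (run from cron, e.g. every minute):
# load=$(cut -d' ' -f1 /proc/loadavg)
# mem=$(free -m | awk '/^Mem:/ {print $3}')
# temp=$(awk '{printf "%.1f", $1/1000}' /sys/class/thermal/thermal_zone0/temp)
# disk=$(df --output=pcent / | tail -1 | tr -d ' ')
# emit_json "$load" "$mem" "$temp" "$disk" > /var/www/html/metrics.json
```

Anything that can fetch a static file can then render the numbers; no agent or database required.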

Network stack tuning

For a server handling multiple concurrent connections (VPN peers, web requests, Telegram long-polling, API calls), a few sysctl tweaks improve throughput:

sudo tee -a /etc/sysctl.d/99-tuning.conf << EOF
# Increase TCP buffer sizes
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216

# Enable BBR congestion control (Pi 5 kernel supports it)
net.core.default_qdisc=fq
net.ipv4.tcp_congestion_control=bbr
EOF

sudo sysctl --system
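It's worth confirming the kernel actually accepted BBR. On stock Raspberry Pi OS kernels, tcp_bbr ships as a module, so if it's absent from the available list, load it first:

```shell
# bbr must appear in this list for the sysctl above to stick
sysctl net.ipv4.tcp_available_congestion_control

# Load the module if needed, then confirm the active algorithm
sudo modprobe tcp_bbr
sysctl net.ipv4.tcp_congestion_control
```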

Putting it all together

After applying these settings, the Pi 5 running a multi-threaded Python trading bot (9 concurrent threads), WireGuard VPN (2 active peers), Apache2 with SSL, and fail2ban holds up under sustained load without the thermal throttling, I/O stalls, or surprise OOM kills these tweaks set out to prevent.

Key takeaway

The biggest performance gains on a Pi homelab come from thermal management (heatsink) and storage choices (NVMe over SD). CPU and network tuning are secondary. Fix the thermal bottleneck first — everything else is incremental.
