Bassam Ismail

Trusting a home server: a local model, monitoring, and backups

There is a moment that decides whether a home server is a toy or infrastructure: the first time it breaks while you are not looking, and you find out from the machine instead of from a person asking why the photos are gone. Getting to that moment took two things I added last, after the fun parts were already running. One is a language model that lives entirely on the box, so the jobs that read my messages and money never send them anywhere. The other is the unglamorous layer of monitoring and backups that means I am the first to know when something is wrong. This is the part about trusting the machine.

This is the final part of a series on running a home server on a spare MacBook. The machine is built, reachable, full of background jobs, media, and a dashboard. What was missing was the confidence to actually depend on it.

TL;DR

Running a local LLM on a home server keeps sensitive data like bank texts and messages from routing through third-party APIs. The author uses LM Studio with a Gemma MLX 4-bit model on Apple Silicon, fronted by Caddy, so every summarization and categorization job stays fully on-box. Monitoring layers Gatus, Uptime Kuma, and Beszel into overlapping watchers that push alerts via a self-hosted ntfy server, so the machine reports failures before any person does. Nightly rsync-over-SSH backups with hard-link snapshots go to a second local Mac, and the author tests restores deliberately, because an untested backup is a rumor.

A model on the box, because the data should not leave it

Several of the launchd jobs from earlier in the series want a little intelligence. Summarize the week. Categorize an odd transaction the bank texted me about. Draft a reply I will probably rewrite anyway. The obvious way to add that is an API call to a hosted model. I deliberately did not, for one reason that overrides convenience: those jobs read my bank texts, my messages, and my calendar. Sending that to someone else's server to save myself some effort is exactly the trade the whole private-by-default setup exists to avoid. A server with no public address, holding my own data, is undercut the moment that data routes through a third party on every summarization.

So the box runs its own model. The runtime is LM Studio, which is a native macOS app rather than a container. That matters: model inference wants the GPU and the Mac's unified memory directly, and a Linux container on Apple Silicon would be fighting the hardware through a virtualization layer. So this is the one piece of the stack that lives outside OrbStack on purpose. LM Studio binds its server to 0.0.0.0:1234, and Caddy fronts it the same way it fronts everything else, reaching back out of the container to the host:

chat.home.example.dev {
    reverse_proxy host.docker.internal:1234
}

host.docker.internal is the one bit of glue worth calling out: it is how a container reaches a service running on the host Mac, which is exactly the bridge a native model app and a containerized proxy need. From inside the tailnet the model now answers at chat.home.example.dev with a real certificate, like any other service.

The model itself is a Gemma model in MLX 4-bit. MLX is Apple's array framework for Apple-Silicon unified memory, which in practice means the model runs on the GPU without copying weights back and forth across a CPU/GPU boundary that does not really exist on this hardware. The 4-bit quantization brings it down to about 5.25 GB, which fits comfortably alongside everything else the laptop is doing. I run it with a modest context window and the GPU fully engaged:

lms load gemma --context-length 8192 --gpu max

What makes this usable from the rest of the system is that LM Studio exposes an OpenAI-compatible endpoint. The jobs do not know or care that the model is local; they POST /v1/chat/completions and read back the same JSON shape they would get from a hosted API. A bare call looks like this:

curl -s http://localhost:1234/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"gemma","messages":[{"role":"user","content":"Summarize this week."}]}'
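
For a job that needs the reply text rather than the raw JSON, a minimal sketch might look like the following. It assumes jq is installed, and the helper names build_payload, extract_reply, and ask_local are hypothetical; only the URL, the model name, and the /v1/chat/completions shape come from the setup above.

```shell
#!/usr/bin/env bash
# Sketch only: assumes jq is available. Helper names are illustrative.
LLM_URL="${LLM_URL:-http://localhost:1234/v1/chat/completions}"

# Build the request body with jq so quotes and newlines in the prompt are escaped.
build_payload() {
  jq -cn --arg model "$1" --arg content "$2" \
    '{model: $model, messages: [{role: "user", content: $content}]}'
}

# Pull the assistant text out of an OpenAI-style response read from stdin.
extract_reply() {
  jq -r '.choices[0].message.content'
}

# Hypothetical wrapper a launchd job might call:
#   ask_local "Summarize this week."
ask_local() {
  build_payload "gemma" "$1" |
    curl -s "$LLM_URL" -H 'Content-Type: application/json' --data-binary @- |
    extract_reply
}
```

Building the body with jq rather than string concatenation matters here because the prompts are bank texts and messages, which routinely contain quotes.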

For the times I actually want to sit and talk to it rather than have a cron job poke it, Open WebUI runs in a container at open-webui:8080 and points at the same endpoint. That gives me a chat window without a single token leaving the house.

The honest tradeoff: the local model is smaller and slower than a frontier hosted one. A call takes around three seconds, and on genuinely hard reasoning it is clearly weaker. But the jobs that use it are not doing hard reasoning. They are categorizing, summarizing, and tidying, and for that the gap is small and the privacy gain is total. I kept the principle, not the dogma: when I genuinely need a stronger model for a one-off, I use one, with data I have chosen for that specific task. The default path, the always-on one that runs unattended on my messages, stays on the box.

[Figure: How I find out before the household does. Probe (every minute) -> status (up / down) -> notify (push server) -> my phone (first to know). The goal: never learn of an outage from a person.]

Monitoring is just deciding to be the first to know

For a long time my "monitoring" was using the services and noticing when one felt off. That is not monitoring, it is hoping. Monitoring is just deciding to be the first to know. The fix was a small stack of single-job watchers that overlap on purpose, because the failure I fear most is not the server going down. It is a monitor failing silently and leaving me falsely reassured.

The primary watcher is Gatus, which I like because the entire definition of "healthy" lives in one declarative file I can read top to bottom. It runs at gatus:8080, and its config is mounted read-only so the container cannot rewrite its own rules:

volumes:
  - ./gatus/config:/config:ro

The config at ~/selfhost/gatus/config/config.yaml is a list of endpoints, each in a group (ai, dashboard, data, deploy, home, media), each with its own check interval and its own conditions for what counts as up. Intervals run from 5s for the things I want to know about immediately to 30m for the things that only need an occasional sanity poke. Conditions are small expressions: [STATUS] == 200 for endpoints that should be strict, the more relaxed [STATUS] < 500 for services that legitimately return a 4xx when idle, [RESPONSE_TIME] < N (in milliseconds) to catch a service that is technically up but crawling, and [BODY].status for endpoints that report their own health in the response body. A representative slice, plus the alerting block that ties it to my phone:

endpoints:
  - name: jellyfin
    group: media
    url: https://jellyfin.home.example.dev/health
    interval: 60s
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 800"   # milliseconds
    alerts:
      - type: ntfy                # opt this endpoint in to the ntfy provider below
alerting:
  ntfy:
    url: https://ntfy.home.example.dev
    topic: home-alerts
    priority: 4
    default-alert:
      send-on-resolved: true
      failure-threshold: 3
      success-threshold: 2

The failure-threshold: 3 is deliberate. A single failed probe is usually the network blinking, not the service dying, and an alert on every blink is an alert I will learn to ignore. Three consecutive failures before it fires, two consecutive successes before it tells me it recovered, and send-on-resolved so the story has an ending. The whole point is that when my phone buzzes, it means something.
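
The effect of the two thresholds is easier to see on a sequence of probes. The following is a small simulation of consecutive-count alerting, written for illustration; it is not Gatus's actual code, and the function name simulate_alerts is made up.

```shell
#!/usr/bin/env bash
# Sketch of consecutive-threshold alerting; not Gatus's implementation.
# Arguments: a sequence of probe results, each "ok" or "fail".
# Output: one line for each alert event (triggered or resolved).
simulate_alerts() {
  local failure_threshold=3 success_threshold=2
  local fails=0 oks=0 triggered=0 n=0 result
  for result in "$@"; do
    n=$((n + 1))
    if [ "$result" = "fail" ]; then
      fails=$((fails + 1)); oks=0
      # Fire once, only after enough consecutive failures.
      if [ "$triggered" -eq 0 ] && [ "$fails" -ge "$failure_threshold" ]; then
        triggered=1
        echo "probe $n: TRIGGERED"
      fi
    else
      oks=$((oks + 1)); fails=0
      # Resolve once, only after enough consecutive successes.
      if [ "$triggered" -eq 1 ] && [ "$oks" -ge "$success_threshold" ]; then
        triggered=0
        echo "probe $n: RESOLVED"
      fi
    fi
  done
}

# A single blip (probe 2) is ignored; the real outage fires once and resolves once.
simulate_alerts ok fail ok fail fail fail fail ok ok
# prints:
#   probe 6: TRIGGERED
#   probe 9: RESOLVED
```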

Uptime Kuma runs alongside it at uptime-kuma:3001, watching the same services from a slightly different angle. Running two checkers on the same endpoints sounds redundant, and it is, on purpose. Two cheap watchers occasionally disagreeing is itself useful information: it usually means the thing is flapping, or the disagreement is between "the service is down" and "the path to the service is down," which are different problems with different fixes.

Uptime Kuma also taught me the most painful lesson in this part. Twice it corrupted its SQLite database and went down, which is a special kind of insult: the watcher I installed to tell me when things break became the thing that broke, silently. The recovery and the permanent fix:

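# with Uptime Kuma stopped, recover what is salvageable from the corrupted file
sqlite3 kuma.db ".recover" | sqlite3 kuma-recovered.db

# stop fast-but-fragile journaling; trade a little speed for durability
# note: leaving WAL mode is recorded in the database file, but synchronous
# applies only to the connection that sets it, so check that the app uses
# the same setting on its own connection and does not reset either at startup
sqlite3 kuma.db "PRAGMA journal_mode=DELETE; PRAGMA synchronous=FULL;"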

Uptime Kuma's default write-ahead-log journaling is faster but more prone to leaving a half-written database if the process dies mid-write, which on a laptop that occasionally gets put to sleep is not hypothetical. journal_mode=DELETE with synchronous=FULL is the boring, durable choice. On top of that I added an hourly online backup, a launchd job firing every 3600 seconds that snapshots the live database without locking it:

sqlite3 kuma.db ".backup '$SNAPSHOTS/kuma-$(date +%Y%m%d-%H%M).db'"

.backup is the right tool here precisely because it is safe to run against a database that is being written to, so the snapshot is consistent even while Uptime Kuma is mid-check.
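
A snapshot is only useful if it opens. One way to check, sketched here as an assumption rather than something the hourly job is known to do, is to run SQLite's built-in integrity check against each new file. The helper name verify_snapshot is hypothetical, and the sqlite3 CLI is assumed to be on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: verify_snapshot is a hypothetical helper; requires the sqlite3 CLI.
# Succeeds only if SQLite reports the file as structurally sound.
verify_snapshot() {
  local result
  result="$(sqlite3 "$1" 'PRAGMA integrity_check;' 2>/dev/null)" || return 1
  [ "$result" = "ok" ]
}

# Possible use right after the hourly .backup ($snap is the file just written):
#   verify_snapshot "$snap" || echo "snapshot $snap failed its integrity check"
```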

The third watcher is Beszel, which is not about whether services respond but about the slow death of the host itself: disk filling, memory pressure, CPU pinned. It is split into a hub at beszel:8090 and an agent (henrygd/beszel-agent:latest) listening on 45876. The agent connects out to the hub over a WebSocket authenticated with an SSH key, and reports host vitals on an interval. These are the metrics that do not trip a status check until it is too late: the disk does not announce that it is at 94 percent, it just one day refuses to write, and by then Postgres is crash-looping.

Tying all three together is ntfy, a self-hosted push server running at ntfy:80 behind ntfy.home.example.dev, listening on the topic home-alerts. Every watcher, and every backup script, ends its chain at the same place: a push notification on my phone, so an alert lands with a human instead of in a log nobody reads. Any script can join the chain with a single curl:

curl -s -X POST "https://ntfy.home.example.dev/home-alerts" \
  -H "Title: $TITLE" -H "Priority: $([ "$RC" -eq 0 ] && echo 3 || echo 5)" \
  -H "Tags: $TAGS" -d "$MSG"

The priority trick reads well: a success sends at normal priority 3, a failure ($RC non-zero) escalates to 5, which on the phone is the difference between a quiet line in the notification shade and a buzz I will actually feel.
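
The same idea reads a little more clearly as a pair of functions that any script can source. This is a sketch: the names ntfy_priority and notify are made up for illustration, and the URL default is the one used above.

```shell
#!/usr/bin/env bash
# Sketch: helper names are illustrative; the URL default matches the article.
NTFY_URL="${NTFY_URL:-https://ntfy.home.example.dev/home-alerts}"

# Map an exit code to an ntfy priority: 3 (default) on success, 5 (max) on failure.
ntfy_priority() {
  if [ "$1" -eq 0 ]; then echo 3; else echo 5; fi
}

# notify RC TITLE TAGS MESSAGE
notify() {
  local rc="$1" title="$2" tags="$3" msg="$4"
  curl -s -X POST "$NTFY_URL" \
    -H "Title: $title" \
    -H "Priority: $(ntfy_priority "$rc")" \
    -H "Tags: $tags" \
    -d "$msg"
}

# Usage in a script:
#   some_job; rc=$?
#   notify "$rc" "Nightly job" "gear" "finished with exit code $rc"
```

Capturing the exit code into a variable first avoids the usual trap of $? being overwritten by whatever command runs between the job and the notification.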

Monitoring is not about graphs. It is about arranging things so that the machine, not a frustrated human, is the one that tells you something broke, and so that no single watcher can fail without another noticing.

Backups, and the difference between a copy and a restore

The disk-usage alarm from part three is monitoring. Backups are the other half: the assumption that the laptop will eventually die, get stolen, or fill its disk past saving, and the plan for when it does. Mine lives in ~/selfhost/scripts/backup.sh, fired nightly by launchd, and it is deliberately unsophisticated. It pushes from homeserver to my main Mac, the one I use daily, over the same tailnet the rest of the server lives on. No cloud, no extra account, just one machine copying to another inside the private network.

The transport is rsync over SSH, using a dedicated ed25519 key (~/.ssh/id_ed25519_backup) that exists only for this job, so the backup path has its own credential rather than reusing my interactive login key. The SSH options matter as much as the key: BatchMode=yes so the job fails loudly instead of hanging on a password prompt at 3 a.m., and StrictHostKeyChecking=accept-new so the first connection to a host it has not seen before does not block on host-key confirmation, while a host whose key has changed is still refused:

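RSYNC_SSH="ssh -i $HOME/.ssh/id_ed25519_backup -o BatchMode=yes -o StrictHostKeyChecking=accept-new"
# hard-link unchanged files against yesterday's snapshot on the far side
rsync -aR --delete --link-dest="$DEST_BASE/$YESTERDAY" -e "$RSYNC_SSH" \
  "$SRC/./$item" "$DEST_USER@my-main-mac:$DEST_BASE/$TODAY/"
# prune on the destination Mac, where the snapshots actually live
$RSYNC_SSH "$DEST_USER@my-main-mac" \
  "find '$DEST_BASE' -maxdepth 1 -mindepth 1 -type d -name '20*' -mtime +14 -exec rm -rf {} +"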

Each night writes a fresh dated directory, …/selfhost-backups/<YYYY-MM-DD>/, which sounds wasteful until the --link-dest flag does its job. It tells rsync to compare against yesterday's snapshot and, for any file that has not changed, create a hard link instead of a new copy. Unchanged files cost no extra disk: I get the mental model of a full snapshot every night with the storage cost of an incremental. The -R flag preserves the relative source paths so the layout on the far side mirrors the source, and --delete keeps each snapshot honest by removing files that are gone from the source rather than letting deletions linger forever. The find at the end prunes anything older than 14 days, which is the window I have decided is enough.
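
The hard-link behaviour can be seen without rsync at all. The sketch below imitates, with plain ln, what --link-dest does for a single unchanged file; the directory and file names are made up for the demo, and it runs entirely inside a temporary directory.

```shell
#!/usr/bin/env bash
# Sketch: imitates the effect of --link-dest for one unchanged file using ln.
# Nothing here touches real backups; everything lives under a temp directory.
demo="$(mktemp -d)"
mkdir -p "$demo/2026-05-10" "$demo/2026-05-11"

# "Yesterday's" snapshot holds the file.
printf 'port: 8080\n' > "$demo/2026-05-10/config.yaml"

# Unchanged tonight: the new snapshot gets a hard link, not a second copy.
ln "$demo/2026-05-10/config.yaml" "$demo/2026-05-11/config.yaml"

# -ef is true when both paths refer to the same file on disk (same inode).
if [ "$demo/2026-05-10/config.yaml" -ef "$demo/2026-05-11/config.yaml" ]; then
  same_inode=yes
else
  same_inode=no
fi
echo "same inode: $same_inode"

# Pruning the older snapshot leaves the newer one readable; the data is freed
# only when the last remaining link is removed.
rm -rf "$demo/2026-05-10"
survivor="$(cat "$demo/2026-05-11/config.yaml")"
echo "after prune: $survivor"
rm -rf "$demo"
```

This is also why the 14-day prune is safe: deleting an old dated directory removes names, not data that a newer snapshot still points at.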

What it backs up is config and state: the compose file, the .env, each service's */config directory, and the data dirs. It explicitly does not back up Media. The films and shows are large, re-acquirable, and not worth the space or the nightly transfer; what is irreplaceable is the state, the databases, the dashboards, the dozen small config files that took an evening each to get right. When the job finishes it closes the loop through the same ntfy chain as everything else, sending a "Backup OK" or "Backup FAILED" line with the snapshot size and the target's free space, so I can see at a glance whether the far Mac is itself filling up.
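
A sketch of how that closing line might be assembled follows. The helper name backup_report and the exact wording are assumptions, and the figures in the example calls are made up; in a real run they would come from the far Mac, for example du -sh on tonight's snapshot directory and df -h on the backup volume.

```shell
#!/usr/bin/env bash
# Sketch: the wording and the helper name backup_report are assumptions.
# backup_report RC SNAPSHOT_SIZE TARGET_FREE -> one line suitable for an ntfy body
backup_report() {
  local rc="$1" size="$2" free="$3" status
  if [ "$rc" -eq 0 ]; then
    status="Backup OK"
  else
    status="Backup FAILED (rc=$rc)"
  fi
  printf '%s: snapshot %s, target free %s\n' "$status" "$size" "$free"
}

# Example calls with made-up figures:
backup_report 0 "1.2G" "212G"
backup_report 23 "0B" "212G"
```

Keeping the free-space figure in the success message as well as the failure one is the point: the far Mac filling up shows as a trend in ordinary nightly notifications before it ever becomes a failed backup.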

The part I had to learn the hard way is that a backup you have never restored is a rumor. A copy that exists is not the same as a copy you can bring back, and the only way to know which one you have is to actually do the restore. So once in a while I do, against a real service. Pull a known snapshot down into the live data directory and bring the stack back up:

# run from the compose directory so the relative destination resolves correctly
cd ~/selfhost
rsync -av -e 'ssh -i ~/.ssh/id_ed25519_backup' \
  my-main-mac:~/selfhost-backups/2026-05-11/uptime-kuma/data/ uptime-kuma/data/
docker compose up -d

If the service comes up with its history intact, the backup was real. If it does not, I would much rather find out on a Tuesday afternoon with the original still running than during an actual incident. An untested backup is a story you tell yourself about being safe.

One small thing that makes all of the above possible at all: a laptop wants to sleep, and a server must not. The lid is closed and the machine is on a shelf, so left alone it would suspend and every watcher and backup would quietly stop. The fix is a single command kept alive in the background:

caffeinate -d -i -m -s

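That holds off display sleep (-d), idle sleep (-i), disk sleep (-m), and system sleep (-s), which is the whole list. The -s assertion only holds while the machine is on AC power, so the laptop has to stay plugged in. It is the least glamorous line in the entire stack and one of the most important, because everything else assumes the machine is awake.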

The limitation I chose on purpose

I will name the gap honestly, because this series has tried to throughout: my backup is a single copy to a single nearby machine. It survives the laptop dying, getting stolen, or eating its own disk. It does not survive my home burning down, because both machines are in it. A genuinely cautious person would push an encrypted copy offsite, and they would be right to. I have decided that for this data, on this threat model, a local mirror I actually test beats a remote one I would set up once and never verify. The offsite copy that exists but has never been restored is the same rumor as any other untested backup, just further away. That is a real limitation, chosen on purpose, with its eyes open, not an oversight I am hoping nobody notices.

What six parts of this added up to

The machine started as a spare laptop I felt guilty about wasting. It became the most useful piece of infrastructure I own, and writing this series clarified why, which was not the technology. Tailscale, Caddy, launchd, OrbStack, a local model, a handful of watchers: none of it is exotic, and most of it is a weekend's reading. The thing that made it work was a small set of rules I kept arriving at from different directions:

| Rule | Where it showed up |
| --- | --- |
| Keep the risky logic boring and readable | The finance and ingestion jobs, written to be re-read, not admired |
| Make failures loud rather than making fragile things reliable | Disk alarms, ntfy pushes, a backup that reports its own size |
| Let the cheap redundant check beat the clever single one | Gatus and Uptime Kuma watching the same services on purpose |
| Treat the interface as the product | The reverse proxy, real hostnames and certs for every service |
| Decide your limitations on purpose | One local backup, tested, instead of an offsite one never verified |

The laptop is still on the shelf, lid closed, awake on caffeinate, quietly reconciling my books and serving my photos and running a model that never phones home and telling me before I notice when something is wrong. It owes its second life not to being a clever project but to being treated as a machine I decided to actually trust, and then doing the unglamorous work to deserve it.

FAQ

How do I run a local LLM on a Mac home server without sending data to the cloud?

Install LM Studio as a native macOS app, load a quantized model such as Gemma MLX 4-bit, and bind its OpenAI-compatible server to 0.0.0.0:1234. Front it with Caddy using host.docker.internal as the upstream so containerized services can reach the model without any traffic leaving the machine.

Why use host.docker.internal in a Caddy reverse proxy config?

host.docker.internal is the DNS name a Docker or OrbStack container uses to reach a service running directly on the host Mac. It is the bridge you need when a native app like LM Studio runs outside the container network and a containerized proxy like Caddy needs to forward requests to it.

How do I prevent Uptime Kuma SQLite database corruption on a home server?

Switch the database away from write-ahead-log journaling by running PRAGMA journal_mode=DELETE; PRAGMA synchronous=FULL; against the kuma.db file. Also add an hourly launchd job that uses sqlite3 .backup to snapshot the live database safely without locking it.

What is the difference between Gatus and Uptime Kuma for home server monitoring?

Gatus defines health checks in a single declarative YAML file with fine-grained conditions like response time and body field checks, while Uptime Kuma provides a UI-driven dashboard. Running both on the same endpoints is intentional redundancy: when they disagree it usually means a service is flapping or the path to it is down, which is useful information on its own.

How does rsync --link-dest keep nightly snapshots small?

Passing --link-dest to rsync tells it to compare the new snapshot against the previous day's directory and create hard links for unchanged files instead of copying them. You get the mental model of a full nightly snapshot at the storage cost of an incremental, with each dated directory browsable on its own.