Bassam Ismail
Engineering

One Binary, One Box


The whole org-wide platform runs as one Go binary on a t4g.nano with 512 MB of RAM. It is a deliberately plain setup, not a miniature platform team. No containers, no gateway fleet, no orchestrator: just Caddy in front, systemd underneath, and enough routing logic to send every API call, auth flow, MCP stream, and project subdomain to the right handler. The trick was not making the system small; it was moving only the stateful, elastic parts to managed services and keeping compute boring until traffic proves otherwise.

TL;DR

One statically linked Go binary on a t4g.nano behind Caddy. A single process serves four surfaces (the API, the auth flow, the MCP server, and every project subdomain) by switching on the Host header. Auth is in-process Google OAuth minting a signed session cookie, which is why viewers never see a login prompt or a token. Caddy terminates TLS and issues certificates on demand for subdomains that did not exist a second ago, gated by a tiny allow endpoint. Deploying is build, copy to object storage, pull over SSM, restart under systemd. No pipeline, no containers, no orchestrator.

One process, every surface

The platform answers on several hostnames: an API host, an auth host, an MCP host, and a wildcard of project subdomains. In the serverless design, each of those was its own gateway. Here they are one binary that reads the Host header and routes to the right sub-mux.

func (d *deps) route(w http.ResponseWriter, r *http.Request) {
	host := strings.ToLower(r.Host)
	if i := strings.IndexByte(host, ':'); i >= 0 {
		host = host[:i]
	}
	zone := strings.ToLower(d.cfg.Zone) // normalize the same way as host (and tlsAllow)
	switch {
	case host == "mcp."+zone:
		d.mcpMux.ServeHTTP(w, r)
	case host == "auth."+zone:
		d.authMux.ServeHTTP(w, r)
	case host == "api."+zone:
		d.platformAPI.ServeHTTP(w, r)
	case strings.HasSuffix(host, ".public."+zone):
		d.serveH.ServePublic(w, r) // open, shareable artifacts
	case strings.HasSuffix(host, "."+zone):
		d.serveH.ServeAuthed(w, r) // internal artifacts, behind sign-in
	default:
		d.localDevRoute(w, r)
	}
}

Each sub-mux is an ordinary http.ServeMux with its own routes and middleware. The whole server is an http.Server with a ReadHeaderTimeout and a logging wrapper; there is no framework. The real win is what the switch statement replaced: several gateways, each with its own stage config and per-route integrations, collapsed into one router you can read top to bottom.

It also means the same binary runs on a laptop. A process listening on a port does not care whether that port is on a server or on localhost, so there is no emulator and no "works locally, breaks deployed" gap. The dev path is the same code pointed at a dev table and bucket.

[Figure: The entire platform, one box top to bottom. Edge: Caddy (TLS :443). One Go binary (:8080) serving the api., auth., mcp., and project hosts. AWS data, unchanged from v1: DynamoDB (projects, groups) and S3 (artifact bundles). Caption: Compute is one process; storage stayed managed.]

Auth that viewers never see

Part 1 promised that internal artifacts are already authenticated, with no token to generate. This is where that promise is paid for. The binary does the OAuth dance itself and mints its own session cookie, which is what let an earlier managed user pool be deleted entirely. The ingestion side of that promise is covered in Building Press, Part 1: Read the work, leave the secrets.

The login handler redirects to Google with PKCE:

q.Set("client_id", h.GoogleClientID)
q.Set("response_type", "code")
q.Set("scope", "openid email profile")
q.Set("redirect_uri", h.SelfOrigin+"/callback")
q.Set("code_challenge", challenge)         // S256 of a random verifier
q.Set("code_challenge_method", "S256")
q.Set("state", state)
q.Set("prompt", "select_account")

The callback exchanges the code, verifies the ID token, enforces the company domain, and sets a signed cookie:

if !strings.HasSuffix(claims.Email, "@"+h.AllowedEmailDomain) {
	httpx.Text(w, http.StatusForbidden, "only @"+h.AllowedEmailDomain+" accounts allowed")
	return
}
sessionJWT, err := session.SignSession(h.SessionSecretARN, session.SessionClaims{
	Email: claims.Email, Sub: claims.Sub, Name: claims.Name,
})
if err != nil { // never set a cookie from a failed signature
	httpx.Text(w, http.StatusInternalServerError, "could not create session")
	return
}
cookie := session.BuildSetCookie("session", sessionJWT, session.CookieOpts{
	Domain: h.CookieDomain, MaxAgeSeconds: session.SessionTTLSeconds, // 12h
})
session.SetCookie(w, cookie)

The cookie is an HS256 JWT, signed with a secret pulled once from Secrets Manager, scoped to the whole zone so it covers every project subdomain, and marked HttpOnly, Secure, and SameSite=Lax. The API side composes the same idea as middleware: a request is authorized if it carries a valid API key, the session cookie, or a Google ID token. What used to be "an API Gateway authorizer" is now three small http.Handler wrappers that fall through to each other. A viewer with a valid session reaches the content without seeing a login prompt or handling a token.

TLS for subdomains that do not exist yet

Projects get a subdomain created the moment the project is, so there is no chance to provision certificates ahead of time. Caddy solves this with on-demand issuance: it fetches a certificate the first time a hostname is hit. Left unguarded, that is also a way to burn through certificate-authority rate limits, so Caddy asks the binary first, on a tiny internal endpoint, whether a hostname is allowed to mint a cert. That shape follows Caddy's on-demand TLS model, but keeps the approval decision inside the application.

func (d *deps) tlsAllow(w http.ResponseWriter, r *http.Request) {
	host := strings.ToLower(r.URL.Query().Get("domain"))
	zone := strings.ToLower(d.cfg.Zone)
	switch {
	case host == zone,
		strings.HasSuffix(host, ".public."+zone),
		strings.HasSuffix(host, "."+zone):
		w.WriteHeader(http.StatusOK)
	default:
		http.Error(w, "not allowed", http.StatusForbidden)
	}
}

The Caddyfile wires that gate and proxies everything to the binary:

{
	email <email>
	on_demand_tls {
		ask http://127.0.0.1:8080/_/tls-allow
	}
}

# *.<zone> matches one label only, so the public hosts need their own address
*.public.<zone>, *.<zone>, <zone> {
	tls {
		on_demand
	}
	reverse_proxy 127.0.0.1:8080 {
		flush_interval -1            # do not buffer the MCP SSE streams
		header_up Host {host}
		header_up X-Forwarded-Proto https
	}
}

flush_interval -1 is what makes the MCP streaming story from Part 1 actually work. The "which group, which project" question the MCP server asks mid-deploy rides an SSE stream. A proxy that buffers holds the question until the connection closes, and an open SSE stream does not close, so the question never arrives. One line of config fixes that. (There is a real wrinkle: a *.public.<zone> block has to be declared separately, because a Caddy wildcard matches exactly one label and the public artifacts live two labels deep. Until you discover this, certificates for those hosts silently fail to issue.)
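flush_interval -1 keeps Caddy from holding the stream, but the Go handler still has to push each event out of its own buffer. A minimal sketch, assuming named events with text payloads; formatSSE, writeSSE, and startSSE are hypothetical helpers, not the MCP server's code:

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// formatSSE renders one Server-Sent Event. Multi-line data becomes several
// "data:" lines, which the client joins back together with newlines.
func formatSSE(event, data string) string {
	var b strings.Builder
	if event != "" {
		fmt.Fprintf(&b, "event: %s\n", event)
	}
	for _, line := range strings.Split(data, "\n") {
		fmt.Fprintf(&b, "data: %s\n", line)
	}
	b.WriteString("\n") // a blank line terminates the event
	return b.String()
}

// startSSE sets the stream headers; call it before the first writeSSE.
func startSSE(w http.ResponseWriter) {
	h := w.Header()
	h.Set("Content-Type", "text/event-stream")
	h.Set("Cache-Control", "no-cache")
}

// writeSSE sends one event and flushes it, so it leaves the process now
// rather than when a buffer fills or the handler returns.
func writeSSE(w http.ResponseWriter, event, data string) error {
	f, ok := w.(http.Flusher)
	if !ok {
		return errors.New("response writer does not support flushing")
	}
	if _, err := w.Write([]byte(formatSSE(event, data))); err != nil {
		return err
	}
	f.Flush()
	return nil
}
```

The type assertion in writeSSE fails if any middleware wraps the ResponseWriter without forwarding Flush, which is the in-process version of the same buffering bug.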

Serving a project

When a request lands on a project subdomain, the binary pulls the subdomain off the host, looks the project up by a bySubdomain secondary index, finds its active version, and assembles an object key of project_id/version/path:

key := project.ProjectID + "/" + versionID + filePath
if h.tryServeS3(w, r.Context(), key, shareSetCookie) {
	return
}
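A sketch of that lookup-to-key step, assuming a hypothetical lookup function standing in for the bySubdomain index query and an ActiveVersion field on the project record. path.Clean normalizes the request path so equivalent URLs map to one object key:

```go
package main

import "path"

// project is a hypothetical shape for the record behind the bySubdomain index.
type project struct {
	ProjectID     string
	ActiveVersion string
}

// lookupFunc stands in for the bySubdomain query (hypothetical).
type lookupFunc func(subdomain string) (project, bool)

// resolveKey maps a subdomain and request path to project_id/version/path.
// path.Clean collapses "//", "." and ".." so equivalent URLs hit one key.
func resolveKey(lookup lookupFunc, subdomain, urlPath string) (string, bool) {
	p, ok := lookup(subdomain)
	if !ok || p.ActiveVersion == "" {
		return "", false // unknown project, or nothing published yet
	}
	return p.ProjectID + "/" + p.ActiveVersion + path.Clean("/"+urlPath), true
}
```

Because the key embeds the active version, publishing is a pointer flip on the project record: old objects stay in place, and the next request simply resolves to the new prefix.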

Content type is read off the stored object (set correctly at upload time, which is Part 3's story), and HTML is explicitly made uncacheable so that publishing a new version is visible immediately rather than minutes later:

ct := "application/octet-stream"
if obj.ContentType != nil && *obj.ContentType != "" {
	ct = *obj.ContentType
}
w.Header().Set("content-type", ct)
if strings.HasPrefix(ct, "text/html") {
	w.Header().Set("cache-control", "no-cache, no-store, must-revalidate")
} else {
	w.Header().Set("cache-control", "public, max-age=60")
}

Uploads are capped at a couple thousand files and tens of megabytes, with anything large going through a presigned upload straight to object storage rather than through the binary. That split is a scar from the serverless version, and Building Press, Part 3: The prompts are data, not code explains why it exists at all.

Deploying is a build, a copy, and a restart

There is no pipeline, no build server, no registry. The binary is cross-compiled, pushed to a bucket, and pulled down on the box over SSM, so there is no SSH key to manage:

build:
	GOOS=linux GOARCH=arm64 CGO_ENABLED=0 \
	  go build -ldflags="-s -w" -o /tmp/server ./cmd/server
 
push: build
	aws s3 cp /tmp/server s3://app-artifacts/server-linux-arm64
 
# make runs each recipe line in its own shell, so the command list stays on one line
restart:
	aws ssm send-command --instance-ids i-0abc1234 \
	  --document-name AWS-RunShellScript \
	  --parameters 'commands=["aws s3 cp s3://app-artifacts/server-linux-arm64 /opt/app/server","chmod +x /opt/app/server","systemctl restart app.service"]'

CGO_ENABLED=0 makes the binary portable enough to be a plain copy. systemd owns the process and restarts it if it dies:

# /etc/systemd/system/app.service
[Unit]
After=network-online.target

[Service]
EnvironmentFile=/etc/app.env
ExecStart=/opt/app/server
Restart=always
RestartSec=2
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

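The box itself comes up from a single user-data script that installs Caddy, writes the units and the Caddyfile, pulls the binary, and starts everything. A rollback is the same SSM command pointed at the previous artifact, because the artifact is just a file. That does assume the previous binary still exists: the push target overwrites one fixed key, so keeping it takes bucket versioning or a per-build key.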

Why one box is enough

A single instance is a single point of failure with no autoscaling and an OS I patch myself. I took that on purpose, and the reasoning is about the workload.

The traffic is internal and bursty. People publish an artifact and share a link; the link gets opened a handful of times by colleagues, not thousands of times by the public. A few minutes of downtime on an internal tool is an annoyance, not an incident. The instance is a burstable t4g.nano (the in-repo note literally says to bump it to a micro only if concurrent uploads start running it out of memory). It carries an Elastic IP so DNS survives replacing the box, and its permissions are one IAM role that reads the artifact bucket and the data tables.

This design does not paint itself into a corner. The process-on-a-port shape means the high-availability version is not a rewrite: it is the same binary on a container service behind a load balancer, deployed when the workload justifies it. A weekly cost report watches spend and alerts if it crosses a threshold.

It is worth naming what this setup accepts as operational tax, though:

| What it buys | What it accepts |
|---|---|
| Near-zero infra overhead | Manual OS patching and instance replacement |
| Fast deploys (copy + restart) | Active SSE connections drop on restart |
| No orchestrator complexity | Cert state lives on one disk; back it up |
| Single IAM role, auditable | Memory headroom is thin under concurrent uploads |
| Clean HA exit path, documented | No autoscaling; a spike is absorbed by burst credits |

The exit is written down. When the workload outgrows this, the move is clear.

FAQ

Isn't a single 512 MB box reckless for an org-wide service?

For a public, high-traffic service: yes. For an internal artifact host with bursty, low-concurrency reads and a tolerance for brief downtime, it is right-sized. The guardrails are an Elastic IP for stable DNS, systemd auto-restart, a cost alert, and a documented path to a load-balanced container service that reuses the same binary.

How does one process serve a large number of project subdomains?

The wildcard *.<zone> resolves to the one box. Caddy issues certificates on demand per hostname, gated by the allow endpoint. The binary maps each subdomain to a project via a secondary index at request time. Adding a project is a database write, not a deployment.

Why drop the managed user pool for hand-rolled OAuth?

Once the platform is a long-running process, doing the OAuth exchange and minting a signed cookie in-process is a few hundred lines and removes an entire managed dependency, its configuration, and its per-environment drift. It also makes the "viewers are already authenticated, no token" promise something the binary owns directly rather than coordinates with another service.

What stops a deploy from taking the box down permanently?

systemd restarts the process. If a bad binary crash-loops, the previous one is a single SSM command away because the artifact in object storage is just a file. There is no half-applied infrastructure state to untangle, which is the upside of a deploy being a copy and a restart rather than an orchestration.

When should a single Go binary move behind a load balancer?

When downtime becomes an incident, concurrent uploads outgrow the memory budget, or traffic stops being internal and bursty. The clean exit is the same process on more than one machine behind a managed load balancer, not a new application architecture.