Zero-downtime deploys without a platform: the symlink pattern

Deploying a static site looks like the easy part. You build the files, you copy them to the server, nginx serves them. Then you watch a deploy land while someone is loading a page, and you understand why deploy platforms exist.

The naive version is a single rsync straight into the directory nginx serves from. rsync writes files one at a time, so for a few hundred milliseconds that directory holds a mix: a fresh index.html pointing at asset hashes that have not arrived yet, or old HTML sitting next to new assets. A visitor who loads the page in that window gets a broken render. On a personal blog that is a flicker nobody reports. On anything that takes money it is a bug.

Adding a CDN and a second origin makes the window wider, not narrower. This blog runs two origin nodes behind Akamai in separate regions. rsync into both in sequence and there is a stretch where one node already has the new build and the other still has the old one. The rolling deploy has now put two versions of the site live at the same time.

The fix is not a faster rsync. It is making the cut from old to new a single operation that the web server cannot catch halfway through. That operation already exists in the filesystem, and it is the humble symlink swap.

The pattern

Picture the symlink as the sign on an office door that reads "we're in here." Renting the next office, furnishing it, and wiring it up takes time. Moving the sign takes a second. Nobody ever walks into a half-furnished room, because you do not point the sign at a room until the room is finished.

That is the whole idea. The deploy is three operations, in this order:

1. rsync the build into a fresh versioned directory, releases/<timestamp>/. nginx has no idea this directory exists. The transfer can take as long as it needs, because no traffic is pointed at it yet.
2. Stage a new symlink at a fresh name, then atomically swap it into place: ln -sfn releases/<timestamp> current.new followed by mv -Tf current.new current. current is the path nginx serves from. This is the sign moving from one door to the next, and the filesystem sees a single rename event with no in-between state.
3. Prune. Keep the last five releases, delete the rest.

The release layout on each node looks like this:

/var/www/blog/
├── releases/
│   ├── <release-id-1>/   ← older
│   ├── <release-id-2>/   ← older
│   └── <release-id-3>/   ← current
└── current -> releases/<release-id-3>

nginx's root points at the symlink, never at a specific release:

root /var/www/blog/current;

The runner does all the work off to the side. nginx only ever sees one event: the swap.

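The post-deploy half of this runs on each origin node after the rsync finishes. It takes the release ID as its first argument, because the ID has to be the one the runner generated (date +%Y%m%d-%H%M%S) and used as the rsync destination. Recomputing the timestamp on the node would name a directory that does not exist:

# Release ID comes from the runner and matches the rsync destination
RELEASE_ID=${1:?release ID required}
cd /var/www/blog || exit 1

# Refuse to publish a release that never arrived
[ -d "releases/$RELEASE_ID" ] || exit 1

# Stage the new symlink at a fresh name, then atomically swap
ln -sfn "releases/$RELEASE_ID" current.new
mv -Tf current.new current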

# Prune: keep the last 5 releases by modification time
ls -1dt releases/*/ \
  | tail -n +6 \
  | xargs -r rm -rf
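Pruning by modification time is safe in this script because it runs right after the swap, when the live release is also the newest. If pruning ever runs at another moment, say after a rollback to an older release, the live directory could land among those deleted. Here is a minimal sketch of a prune that skips whatever current points at. prune_releases and its two arguments are names made up for this example, and release IDs are assumed to contain no newlines.

```shell
# prune_releases ROOT KEEP: delete all but the KEEP newest releases under
# ROOT/releases, never deleting the release that ROOT/current points at.
# Hypothetical helper, not part of the original script.
prune_releases() (
  root=$1 keep=$2
  # Name of the live release directory, e.g. "20240131-235959".
  live=$(basename "$(readlink "$root/current")")
  cd "$root/releases" || exit 1
  # Newest first; everything from position KEEP+1 onward is a candidate.
  ls -1t | tail -n +"$((keep + 1))" | while IFS= read -r id; do
    [ "$id" = "$live" ] || rm -rf -- "$id"
  done
)
```

Called as prune_releases /var/www/blog 5, it keeps the five newest releases, plus the live one if that happens to be older.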

The two-step swap

Look closely at the swap. It is two commands, not one:

ln -sfn releases/<id> current.new
mv -Tf current.new current

The first command creates a fresh symlink at a brand-new name, current.new. That name does not exist yet, so ln is doing the simple thing it does best: writing a new symlink that points where you tell it. The second command atomically replaces the live current symlink with the prepared one.

The split matters because pointing ln -sf straight at current becomes treacherous the instant current already exists and resolves to a directory. Leave out the -n and ln follows the link and creates the new symlink inside the target, leaving you with current/<id> -> releases/<id> and a webroot that still serves the old release. With -n, ln replaces the link itself, but how it does that depends on the implementation: it may unlink the old name and then create the new one, which opens a brief window where current does not exist at all. The exact behaviour shifts between coreutils versions and platforms, which is reason enough never to invoke it that way.
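What the -n flag changes is easy to check in a scratch directory before trusting any description of it. The sketch below assumes GNU coreutils and uses throwaway release names r1 and r2. Other ln implementations may behave differently, which is rather the point.

```shell
# Scratch-directory experiment; nothing here touches a real webroot.
scratch=$(mktemp -d)
cd "$scratch"
mkdir -p releases/r1 releases/r2
ln -s releases/r1 current        # live link, resolves to a directory

# Without -n, ln follows current and drops the new link inside releases/r1.
ln -sf releases/r2 current
readlink current                 # prints releases/r1: the live link is unchanged
ls releases/r1                   # prints r2: the stray nested link

# With -n, ln treats current as a plain name and replaces the link itself.
ln -sfn releases/r2 current
readlink current                 # prints releases/r2
```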

mv -Tf does not have that ambiguity. It calls rename(2), which the kernel guarantees is atomic when source and destination live on the same filesystem: current resolves to either the old release or the new one, never to nothing in between and never to both. The -T flag tells mv to treat the destination as a name to be replaced rather than a directory to be moved into. The -f forces the replacement.
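One way to convince yourself of the rename(2) guarantee is to hammer it. The sketch below assumes GNU coreutils and a scratch directory with two made-up releases, a and b. It flips current back and forth in the background while the foreground keeps reading a file through the link, counting any read that fails or returns something unexpected.

```shell
# Scratch-directory experiment; nothing here touches a real webroot.
scratch=$(mktemp -d)
cd "$scratch"
mkdir -p releases/a releases/b
echo a > releases/a/version
echo b > releases/b/version
ln -s releases/a current

# Writer: 200 round trips (400 swaps) in the background, then a marker file.
(
  i=0
  while [ "$i" -lt 200 ]; do
    for id in a b; do
      ln -sfn "releases/$id" current.new
      mv -Tf current.new current
    done
    i=$((i + 1))
  done
  : > writer.done
) &

# Reader: open a file through the link until the writer is finished.
bad=0
until [ -e writer.done ]; do
  v=$(cat current/version 2>/dev/null) || v=missing
  case $v in
    a|b) ;;
    *) bad=$((bad + 1)) ;;
  esac
done
wait
echo "unexpected reads: $bad"
```

On a local filesystem the count should come out as zero. Replace the two-step swap in the writer with a single ln -sfn against current and the count may or may not stay there, depending on how your ln implements the replacement.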

The pattern that falls out: ln is the symlink creation primitive, safe at a new name. mv is the replacement primitive, safe against whatever was at the old name. Use each for the job it does without surprises.

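Two scope notes. rename(2) is atomic only within a single filesystem, and what has to share that filesystem is the pair being renamed: current.new and current. Staging the new link in the same directory as the live one guarantees that. Stage it somewhere else (a temp directory on another partition, say) and rename(2) fails with EXDEV, at which point mv falls back to a non-atomic copy and delete. releases/ itself can sit on another mount, since a symlink target is free to cross filesystems, though a network filesystem such as NFS adds client-side caching that can delay when the swap becomes visible. And the atomicity is about the swap itself, not about layers above it: in-flight HTTP requests, an open_file_cache entry inside nginx, or a CDN edge serving the previous HTML are separate problems the filesystem cannot solve. The next section handles the nginx side; the CDN comes back at the end.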
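If the staged link could ever end up somewhere other than right next to current, a pre-flight check can catch the cross-filesystem case before the swap runs. This is a small sketch: same_fs is a hypothetical helper name, and it relies on GNU stat, where -c %d prints the device number.

```shell
# same_fs PATH_A PATH_B: succeed only if both paths sit on the same device.
# Exit status 2 means one of the paths could not be inspected.
same_fs() (
  a=$(stat -c %d -- "$1") || exit 2
  b=$(stat -c %d -- "$2") || exit 2
  [ "$a" = "$b" ]
)
```

A deploy script could run same_fs on the directory holding current and the directory where the new link is staged, and refuse to continue if the check fails.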

Does nginx actually notice?

A fair worry: if nginx reads current once at startup, the swap would change nothing until a reload. nginx does resolve the root path string when the worker starts, but it re-walks that path on every open() in the file-serving path, so the new target takes effect on the next request. No reload, no restart — with one footnote. If open_file_cache is on (off in stock config, common in production tunings), the swap is invisible to entries still inside the cache window. Tune open_file_cache_valid to match how quickly deploys need to be observable, or live with the window.

Do not take my word for it. Tail the access log on one terminal, run the swap on another, and request a file you know changed between releases. With the default file-serving path and no caches stacked above, the new bytes show up immediately. Verifying this with a five-second test beats trusting a blog post, including this one.

The dual-node question

Two nodes raise an ordering question: swap them at the same moment, or one after the other?

Simultaneous swaps risk a brief window where node01 serves the new release and node02 still serves the old one. Sequential swaps close that window only if each node is drained before you touch it, and they stretch the deploy out.

This setup swaps both at once, and the reason is the load-balancing policy rather than laziness. The Akamai GTM is set to Ranked Failover, not round-robin. Real traffic only reaches node02 when node01 is down. At any ordinary moment one node answers every visitor and the other sits warm in reserve, so version skew between the two during a two-second swap never reaches a human. The honest caveat: GTM liveness probes run on their own schedule. There is a narrow theoretical window where a probe coincides with the primary's swap and shifts traffic to the secondary mid-deploy. For a static blog the chance of a visitor hitting that window is statistical noise. For an active-active pair fronting a checkout flow, the answer would flip: swap sequentially, drain each node before you touch it, and accept the slower deploy.
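Swapping both nodes at once can be as simple as backgrounding one SSH call per node and waiting for all of them. This is a sketch under assumptions: swap_all, the REMOTE override, and a swap-release command on each node are hypothetical names, not part of the setup described here.

```shell
# swap_all RELEASE_ID NODE...: run the swap on every node in parallel and
# succeed only if every node succeeded.
# REMOTE is a hypothetical hook (default: ssh) so the function can be dry-run.
# swap-release stands in for whatever script performs the swap on the node.
swap_all() (
  id=$1
  shift
  pids=
  for node in "$@"; do
    ${REMOTE:-ssh} "$node" "swap-release $id" &
    pids="$pids $!"
  done
  failed=0
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  [ "$failed" -eq 0 ]
)
```

A non-zero exit means at least one node did not swap, so the nodes may now disagree about which release is live. That is the moment to rerun the swap or roll both back, not to carry on.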

The deploy user can do exactly one thing

The account that receives the rsync is not a general-purpose login. An SSH Match block strips it down to the minimum it needs and nothing more:

Match User <deploy-user>
AuthenticationMethods publickey
AllowAgentForwarding no
AllowTcpForwarding no
PermitTunnel no
X11Forwarding no

Key-only authentication, no forwarding, no tunnel: the account can receive files and run the swap script, and that is the entire job description. The directives shown do not by themselves take the shell away. That part is the job of a command= restriction in authorized_keys, which narrows the key down to the single allowed command. This is least privilege applied to a service account, with no euphemism about what "least" still allows: a leaked key lets an attacker publish an arbitrary release, exfiltrate everything in releases/, or swap current to an empty directory and take the blog down. That is real damage. What the restrictions buy you is a boundary: the harm stops at the static site's files and does not become a shell, a pivot to other accounts, or a tunnel into the rest of the box.
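For a sense of what a command= wrapper might check, here is a minimal sketch of the parsing half. sshd places the command the client asked for in SSH_ORIGINAL_COMMAND. The "swap <release-id>" request format and both function names are assumptions made for this example, and the ID shape matches date +%Y%m%d-%H%M%S.

```shell
# valid_release_id ID: succeed only for IDs shaped like 20240131-235959.
# A case pattern matches the whole string, so trailing junk is rejected.
valid_release_id() {
  case $1 in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# parse_request COMMAND: print the release ID for a well-formed
# "swap <release-id>" request, fail for anything else.
parse_request() {
  case $1 in
    "swap "*) ;;
    *) return 1 ;;
  esac
  valid_release_id "${1#swap }" || return 1
  printf '%s\n' "${1#swap }"
}
```

A wrapper would call parse_request "${SSH_ORIGINAL_COMMAND:-}" and run the swap only on success. A key locked down this way can no longer run rsync, so the upload would need its own key with its own restriction, for example the rrsync helper distributed with rsync.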

Rollback is the same move in reverse

Here is the side benefit that makes the whole pattern worth it. Because old releases stay on disk, rollback needs no artifact store, no rebuild, and no transfer. The bytes are already sitting in releases/. Rolling back is the swap again, aimed at an earlier directory:

ln -sfn releases/<release-id-2> current.new
mv -Tf current.new current

A rollback.yml GitHub Actions workflow takes a release ID as input, validates it against what is actually on the node, and runs that one swap on each origin. It finishes in about fifteen seconds, most of which is the SSH handshake. The primitive that ships a release is the primitive that un-ships it, so rollback stops being a documented procedure you hope works under pressure and becomes one command you have already run a hundred times.
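What the validation step might look like on the node, as a sketch: rollback is a hypothetical function name, the checks are one reasonable reading of "validates it against what is actually on the node", and mv -T assumes GNU coreutils.

```shell
# rollback ROOT RELEASE_ID: point ROOT/current at an existing release.
rollback() (
  root=$1 id=$2
  # Accept only a bare directory name: no empty string, no slashes, no . or ..
  case $id in
    ''|*/*|.|..) echo "invalid release id: $id" >&2; exit 1 ;;
  esac
  # The release has to be on this node already; rollback transfers nothing.
  [ -d "$root/releases/$id" ] || { echo "not on this node: $id" >&2; exit 1; }
  cd "$root" &&
    ln -sfn "releases/$id" current.new &&
    mv -Tf current.new current &&
    echo "current -> $(readlink current)"
)
```

Rejecting slashes keeps a malformed input from steering current outside releases/, and the existence check turns a typo into an error message instead of an outage.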

What you actually need

None of this is specific to Astro, to nginx, or to static files. The symlink swap is the foundation any deploy can build on. Dynamic stacks add steps on top: signalling the running process, invalidating an opcache, running migrations, draining long-lived connections. A CDN in front adds one more: purging the edge cache for the HTML envelope, since content-hashed asset URLs handle the rest. Capistrano formalised the symlink half for Rails back in the mid-2000s, and most platforms still do some version of it under a friendlier dashboard. You are not avoiding their cleverness; you are doing the bottom layer yourself, the part that was never that clever in the first place.

If you self-host, you can add this in an afternoon. Make a releases/ directory, point a current symlink at a release inside it, write the short post-deploy script above, and set nginx's root to the symlink. Atomic deploys and free rollback fall out of those four pieces. The cost is a handful of shell commands and a naming convention, and what you get back is never again watching a half-written directory go live while someone is reading.