Internal AI

Building Press, Part 3: The prompts are data, not code

June 19, 202610 min read

The first time I changed a drafting prompt and nothing changed, Press, the editorial engine I built to draft from my own work, taught me that where a prompt lives is not an academic detail.

The draft_blog prompt is the one that turns a mined story into a full blog post, and I had just rewritten it: tighter thesis, a demand for real mechanism, a ban on the filler that makes AI writing read like AI writing. I edited the default in the code, committed, deployed, watched the next draft come back, and it was the old prompt. Word for word. The deploy was clean. The code was correct. The engine was reading something else entirely.

That something else was a row in a database, and finding it is what taught me the rule this part is about: in a production system that leans on a model, the prompt is the surface you touch most, and it should live where you can change it like config, not like source. Prompts are data.

TL;DR

In a production AI pipeline, the prompt is the surface you touch most, so it should live in a database row you can edit at runtime, not in a source-code constant that requires a build and deploy to change. Press, my editorial engine, resolves prompts in a fixed order: an exact stage-and-model row wins, then the model-null stage default, then the compiled code default as a last fallback. The seeder is insert-only, which means a changed code default does nothing to a row that already exists, so updating behavior requires editing both the code default and the live row. Treating prompts as config rather than code cuts the cost of iteration down to a database write, which is the whole point.

The prompt is the product, so it lives where I can edit it

Mining clusters redacted signals into story candidates. Drafting writes the blog post, and separate prompts write the LinkedIn and X versions because the shape of a good post on each is different. A panel of review judges scores the draft for quality, for hook strength, and for whether it reads as machine-written. A compose stage writes a piece from a freeform brief. An SEO stage checks how answerable the result is. Each of those is a named stage with its own prompt, sitting on top of the kind of pipeline shape I wrote about in Building Press, Part 2: The data model is the spine.

The prompts are not constants in a source file. The code ships a default for each stage, but the default is a seed, not the live instruction. The instruction the engine actually runs is a row in a prompts table, and that row is editable from the Studio dashboard. When a draft comes back flat, I do not open an editor, change a string literal, and wait on a build and a deploy. I open the prompt for that stage, edit it, save, and the next run uses it. Raising the quality bar is a database write. That is what prompts as config means in practice.

That distinction sounds small until you count how often each surface changes. I have edited the code around drafting a handful of times. I have edited the drafting prompt dozens of times, sometimes twice in an afternoon, chasing a specific tell I just noticed in the output. The thing I touch most should not be the thing that costs the most to touch.

Tip

Treat prompts as tunable config, not source. The engine reads the active row at runtime; the code default is only a seed and a fallback. If iterating on the instruction means iterating on a deploy, you will iterate less, and the quality bar is exactly the thing you want to iterate on relentlessly.

Resolution: override, then default, then fallback

A stage can run on more than one model, and a prompt that is right for one model can be wrong for another. So resolution is layered. When the engine needs the prompt for a stage, it asks by stage and model and resolves in a fixed order.

An exact match for the stage and the specific model wins. If there is no model-specific override, the engine uses the stage default, which is the row whose model is null. If even that is missing, it falls back to the default compiled into the code. Most stages run on the model-null row; the model-specific override exists for the rare case where one model needs a different nudge, and most of the time it is empty.

That layering is also exactly where my change went to die.

Deep-dive: the prompt-resolution rule (and the gotcha that bit me)

The seeding function that populates the prompts table only ever inserts rows that are missing. It never updates a row that already exists. That is deliberate: the model-null row is also the row the Studio editor writes to, so if seeding overwrote it on every deploy, every human edit would be silently clobbered the next time I shipped unrelated code. Insert-only seeding protects the edits.

The cost of that protection is the gotcha. The first time the engine ran, it seeded the model-null row from the code default. From then on, that row existed. So when I later changed the code default and deployed, the seeder saw the row already present and did nothing, and resolution found the model-null row first and used it. The code default was now dead weight: a fallback for a row that would always be there. To actually change behavior I had to change the code default and update the live row, because the live row is what wins.

I learned this the obvious way. I changed the code, I deployed, and I watched the old prompt keep running.

The fix in practice is a two-step I now do on purpose: edit the code default so a fresh environment seeds correctly, and update the live row so the running engine changes today. One without the other is a quiet no-op in one direction or the other.

What the bar actually encodes

Because the prompt is where the quality lives, the prompt is where the editorial standard gets written down. The bar I am aiming for is the bar of a thoughtful technical publication: a real idea rather than a recap, the mechanism under the hood, the tradeoffs, what got rejected. None of that is enforced by code. It is enforced by the words in the drafting prompt and scored by the words in the review prompts.

Some of it is defensive. The drafting prompt carries a set of calibrate-and-concede rules: do not use false absolutes a reader could falsify, do not invent precision, and name at least one real limitation of whatever you are describing, because a piece that only celebrates itself reads as marketing. There is a hard rule banning the em dash, because it is one of the loudest tells that text came from a model, and the cheapest place to stop it is at the drafting stage rather than in editing. These are not code changes. They are sentences I added to a row, and the review judges carry matching sentences that penalize the same tells from the other side.

The point is that the standard is executable. It is not a wiki page someone is supposed to remember. It is the literal instruction the model runs, which means improving the standard and changing the system are the same action.

Source constants and vague quality asks

The obvious alternative was to keep prompts in source, as constants, versioned with the code. It is tidy, and it gives you a clean git history of every wording change. I gave it up because it taxes the exact loop I most want to be frictionless. A prompt tweak should not wait on CI and a deploy. The git history is a genuine loss, and I will say plainly that I traded auditability for iteration speed.

The other thing I rejected was vaguer and more tempting: telling the model to write better. Asking for higher quality in the abstract does almost nothing, because the model already believes it is writing well. What moves output is naming the specific failure and the specific structure that prevents it. Not "write a great post," but commit to one thesis, show the mechanism, concede a limitation, and never open with a recap. Specificity is the whole game, and specificity is only easy to keep editing when editing is cheap.

A bad prompt edit ships instantly

Here is the cost I have not designed away. A bad prompt edit ships instantly. There is no deploy gate in front of it, no review, no second pair of eyes. The same property that lets me raise the bar with a save lets me lower it with a save, and a careless edit quietly degrades every draft from that moment until I notice. Code changes go through a pull request and CI. Prompt changes go straight to production by design, because that immediacy is the point. My entire safety net right now is keeping the code default as a known-good baseline I can paste back.

So the point I keep coming back to is not that prompts belong in a database. It is narrower and more useful: the piece of an AI system you tune most should be the cheapest piece to change, even when that means giving up some of the safety the rest of the system gets for free. You move the friction to where iteration is rare and you accept the risk where iteration is constant. For everything in Press that sits downstream of a model, the prompt is where iteration is constant, so the prompt is where the friction had to go.

FAQ

Why did my prompt change in code not affect the running AI pipeline?

If your seeder is insert-only, it skips any row that already exists in the prompts table. The live row wins at runtime, so a changed code default is dead weight until you also update the live row directly.

How do I store LLM prompts so I can edit them without redeploying?

Keep prompts as rows in a database table, keyed by stage and optionally by model. The engine reads the active row at runtime; the code default is only a seed for fresh environments and a last-resort fallback.

What is the right prompt resolution order for a multi-model AI pipeline?

Resolve most-specific first: an exact match on stage and model wins, then a model-null stage default, then the code default compiled into the binary. Most stages run on the model-null row; model-specific overrides cover rare edge cases.

What do you lose when you store prompts in a database instead of source control?

You lose git history for every wording change, which is a real auditability tradeoff. The author of Press accepted that cost explicitly in exchange for frictionless iteration, keeping the code default as a known-good baseline to paste back if needed.

How do you prevent a bad prompt edit from degrading production output?

There is no deploy gate in front of a database prompt edit, so a careless change ships instantly. A practical minimum safety net is maintaining the code default as a known-good version you can paste back into the live row when you notice quality drop.