Engineering

Building Press, Part 4: Review is where the human gates the irreversible

June 19, 2026 · 9 min read

One draft nearly shipped with a client's name in it. The deterministic redactor had already run. The draft read cleanly. The name appeared in a form the glossary did not list, so nothing flagged it, and nothing about the draft looked wrong. That is the exact failure mode human in the loop publishing is designed to catch: the gap between a draft that reads well and a draft that is actually safe to send into the world.

Part 1 covered ingestion, where redaction runs in code before anything leaves my machine. This part is about the other end of the pipe: the moment that decides whether the whole system is safe to run unattended.

A good draft and a safe draft are different properties, and the model that produces the first has no reliable view of the second. So Press, the editorial engine I built to draft from my own work, treats AI-assisted content as safe only when the irreversible, outward-facing steps are gated by a human or by deterministic code, never by the model's confidence. Publishing, posting to a platform, and deleting are the steps you cannot take back. Everything before them can be as wrong as it likes, as often as it likes, because it sits behind a gate the model cannot open on its own.

TL;DR

Human in the loop publishing means no post can move from draft to live without passing deterministic safety gates and an explicit human approval. In the Press editorial engine, confidentiality checks run as hard blocks in code, SEO checks surface as judgment flags, and the state machine has no direct edge from draft to published. A model's confidence score is not a safety signal, so the irreversible act of publishing stays behind a doorway only a person can open. The design trades full autonomy for a review surface that is fast to vet, because the actual bottleneck is human trust, not drafting speed.

Human in the loop publishing means nothing publishes itself

Between a draft and a live post sit three gates, and they do not all work the same way.

The first is a confidentiality deep-scan. The deterministic redactor from ingestion already ran at egress, replacing known secrets and glossary terms by exact match. The deep-scan is the second layer: an LLM pass that reads the draft looking for the things exact-match cannot catch, like a client described rather than named, or a detail that is individually harmless but identifying in combination. It exists precisely because the first layer is fast and certain but literal.
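A minimal sketch of why the exact-match layer is certain but literal. The glossary entries, placeholder format, and the `redact` function are hypothetical, chosen for illustration rather than taken from Press itself.

```python
# Hypothetical glossary: exact terms mapped to placeholders.
GLOSSARY = {
    "Acme Corporation": "[CLIENT_1]",
    "sk-live-123": "[SECRET_1]",
}


def redact(text: str, glossary: dict[str, str] = GLOSSARY) -> str:
    """Replace every glossary term by exact, case-sensitive match."""
    # Longest terms first so a short term never clobbers part of a longer one.
    for term in sorted(glossary, key=len, reverse=True):
        text = text.replace(term, glossary[term])
    return text


# The listed form is caught.
listed = redact("We migrated Acme Corporation to the new queue.")

# A variant the glossary does not list passes through untouched.
unlisted = redact("We migrated Acme Corp. to the new queue.")
```

The second call returns its input unchanged, which is the kind of miss the deep-scan exists to pick up.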

The second gate is a set of SEO and answerability checks. Does the piece actually answer the question it claims to? Is there a focus keyphrase, a title, a meta description? These are quality and reach concerns, not safety, but they live at the same boundary because the same principle applies: catch it before it ships, not after.
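The presence checks can be sketched as a function that returns notes instead of raising. The field names and the keyphrase-in-title rule are assumptions made for this example. Answerability is left out because it needs a reader's judgment, not a string test.

```python
from dataclasses import dataclass


@dataclass
class PostMeta:
    # Hypothetical field names for illustration.
    title: str = ""
    meta_description: str = ""
    focus_keyphrase: str = ""


def seo_flags(meta: PostMeta) -> list[str]:
    """Return advisory notes; an empty list means nothing to flag."""
    flags = []
    if not meta.title.strip():
        flags.append("missing title")
    if not meta.meta_description.strip():
        flags.append("missing meta description")
    if not meta.focus_keyphrase.strip():
        flags.append("missing focus keyphrase")
    elif meta.focus_keyphrase.lower() not in meta.title.lower():
        # Assumed rule, not one the post states.
        flags.append("focus keyphrase not in title")
    return flags


flags = seo_flags(
    PostMeta(title="Review gates", focus_keyphrase="human in the loop publishing")
)
```

Returning a list keeps these findings as advice: the caller renders them for the reviewer and nothing here can stop a publish.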

The third gate is a human approval. Me, reading the thing, clicking publish.

Two kinds of gate

The distinction that makes this work is between gates that catch a deterministic leak and gates that are a judgment call.

A deterministic gate blocks hard. If the exact-match confidentiality check finds a literal secret or a denylisted term still present in the body, you cannot wave it through. There is no "publish anyway" button for that class of finding, because the cost of a false negative is unbounded and the check itself is certain: the term is either present or it is not. A hard block sends the post back to draft. You fix it, or it does not ship.

A judgment gate flags for a person. The deep-scan's "this paragraph might identify a client" is a probability, not a fact. The SEO checks are advice. These surface to me with context, and I decide. The system's job there is not to be right; it is to make the decision easy and to make sure the decision gets made by something that can be held responsible.

Important

Gate the irreversible step, not the model's output. A confidence score on a draft tells you how sure the model is, which is not the same as how safe the post is. Put the hard checks and the human on the act of publishing, and the quality of any individual draft stops being a safety question.

The state machine

Every post moves through an explicit set of states, and the transitions are where the gates live. A post is born a draft. Sending it to review runs the gates. If they pass and I approve, it becomes approved, and from there it either publishes now or waits for a scheduled time. A blocked gate routes it straight back to draft with the reason attached. That same bias toward explicit structure shows up in Building Press, Part 2: The data model is the spine, because the data model has to make the allowed paths hard to misread.

The shape matters more than it looks. There is no edge from draft to published. You cannot skip review, because the only transition into published comes from approved, and the only way to reach approved is through in_review with the gates satisfied and a human's yes. The irreversible state has exactly one narrow doorway, and that doorway is the gate. That is the point of human in the loop publishing: the machine can prepare the work, but it cannot cross the boundary by itself.
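One way to sketch that shape is a transition table with no draft-to-published entry. The enum, table, and `transition` function below are illustrative names, not Press's actual code, and scheduling is treated as an attribute of an approved post and left out of the state list.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    PUBLISHED = "published"


# Allowed edges. There is deliberately no DRAFT -> PUBLISHED entry.
ALLOWED = {
    State.DRAFT: {State.IN_REVIEW},
    State.IN_REVIEW: {State.APPROVED, State.DRAFT},  # DRAFT = blocked, sent back
    State.APPROVED: {State.PUBLISHED},
    State.PUBLISHED: set(),
}


def transition(
    current: State,
    target: State,
    *,
    gates_passed: bool = False,
    human_approved: bool = False,
) -> State:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"no edge from {current.value} to {target.value}")
    # The only doorway to APPROVED needs both the gates and a person.
    if target is State.APPROVED and not (gates_passed and human_approved):
        raise PermissionError("approval needs passing gates and a human yes")
    return target


state = transition(State.DRAFT, State.IN_REVIEW)
state = transition(state, State.APPROVED, gates_passed=True, human_approved=True)
state = transition(state, State.PUBLISHED)
```

Because the table is data, the missing edge is something you can read and test instead of a convention you have to remember.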

Deep-dive: deterministic blocks vs judgment flags

The implementation keeps these as two separate result types so they cannot be confused at the call site. A deterministic finding carries enough to identify the exact offending term and its location; the publish action checks for any such finding and refuses the transition if one exists, full stop, before a human is even consulted. A judgment finding carries a description and a severity, and it is rendered into the review surface as something to read, not something that blocks the button.
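A sketch of what two separate result types might look like. The class names, fields, denylist entry, and `can_publish` signature are assumptions for illustration. The point is that the publish check only accepts the blocking type, so a judgment note cannot be passed in by mistake without a type checker noticing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeterministicFinding:
    term: str   # the exact offending term
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


@dataclass(frozen=True)
class JudgmentFinding:
    description: str
    severity: str  # e.g. "low", "medium", "high"


DENYLIST = ("Acme Corporation",)  # hypothetical entry


def scan_denylist(body: str, denylist=DENYLIST) -> list[DeterministicFinding]:
    """Find every exact occurrence of every denylisted term."""
    findings = []
    for term in denylist:
        start = body.find(term)
        while start != -1:
            findings.append(DeterministicFinding(term, start, start + len(term)))
            start = body.find(term, start + 1)
    return findings


def can_publish(blocks: list[DeterministicFinding], human_approved: bool) -> bool:
    # Any deterministic finding refuses the transition before the
    # human's answer is even looked at.
    if blocks:
        return False
    return human_approved


body = "Thanks to Acme Corporation for the test data."
blocks = scan_denylist(body)
notes = [JudgmentFinding("Second paragraph may identify a client", "medium")]
```

Here `notes` would be rendered for the reviewer to read, while `blocks` alone decides whether the button works at all.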

The reason for the split is that the two failure modes have opposite costs. For a literal secret, a false negative is catastrophic and a false positive is cheap (you reword one sentence), so you bias hard toward blocking. For a "this might be too promotional" note, a false positive is expensive (it nags you on every post until you stop reading the notes), so you bias toward advising. Mixing them, treating a judgment call as a hard block or a hard block as advice, breaks both: you either cannot publish anything or you train yourself to click through warnings that sometimes matter.

A model's self-assessment cannot open the door

The tempting shortcut was to let the model grade itself. Ask the drafting model for a confidence score, or run a second model as a judge and auto-publish above some threshold. I built a version of the judge and kept it, but only as an advisory signal inside review, never as a gate. The moment a model's self-assessment can open the door to publishing, you have moved the irreversible decision back inside the model, which is exactly where this design says it must not live. A fluent model is most confident precisely when it is fluently wrong.
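In code terms, keeping the judge advisory can be as simple as never reading its score in the publish decision. This is a sketch with hypothetical names (`ReviewSurface`, `judge_score`, `may_publish`), not the actual Press implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewSurface:
    hard_blocks: tuple[str, ...]  # deterministic findings, if any
    judge_score: float            # advisory only, shown to the reviewer
    human_approved: bool = False


def may_publish(review: ReviewSurface) -> bool:
    # judge_score is deliberately never read here: no threshold,
    # however high, can stand in for the human's yes.
    return not review.hard_blocks and review.human_approved


confident_but_unapproved = ReviewSurface(hard_blocks=(), judge_score=0.99)
doubtful_but_approved = ReviewSurface(
    hard_blocks=(), judge_score=0.40, human_approved=True
)
```

The first surface cannot publish despite the high score, and the second can despite the low one, because the decision belongs to the person who read the draft.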

The other rejected option was making review purely advisory: surface everything, block nothing, trust the operator. That fails the opposite way. The deterministic leaks are the cases where I am least able to be the backstop, because they are easy to miss in a draft that otherwise reads clean. Those are the cases that must block in code, not flag for tired human eyes.

Gates catch known failure classes

These gates catch known failure classes. They do not catch everything. A draft that is confident, fluent, and simply wrong about a technical claim can pass every automated check, because none of them verify truth. That is not a gap I can close with another gate; it is the reason the human approval is non-optional and cannot be automated away. The honest design consequence is that Press optimizes for "easy to review" over "fully autonomous." Short, structured drafts with the answer up front are not only better to read; they are faster for a person to vet, which is the actual bottleneck. A system that drafts ten posts an hour but produces drafts I cannot quickly trust has not saved me anything.

Automation earns its keep by changing where the human spends attention, not by removing the human from the decisions that cannot be undone. Press does not publish for me. It does the reading, the clustering, the drafting, and the catching, so that the one judgment only I can make is the one thing left to do.

FAQ

Why can't I let the AI model gate its own publishing step?

A model's self-assessment reflects how confident it is, not how safe the output is. Confidence rises with fluency, not correctness, so auto-publishing above a threshold moves the irreversible decision back inside the model, which is where it must not live.

What is the difference between a deterministic gate and a judgment gate in an AI publishing workflow?

A deterministic gate blocks hard when an exact-match rule fires, such as a denylisted term still present in the body. A judgment gate surfaces a probability or advisory note to a human for a decision, because the cost of a false positive there is high enough that blocking automatically would train you to ignore the warnings.

How do I prevent client names or confidential details from leaking in AI-drafted content?

Run a deterministic redactor at ingestion and again at egress to catch known glossary terms by exact match, then follow it with an LLM deep-scan that looks for indirect identification like a client described rather than named, or details that are individually harmless but identifying in combination. The two-layer approach covers what each layer alone cannot.

What states should a post move through in an AI-assisted editorial state machine?

A safe minimal path is draft, in review, approved, and published, with archived as a terminal state. The critical constraint is that there is no direct edge from draft to published; the only way to reach published is through approved, which requires the gates to pass and a human to say yes.

Can automated SEO checks replace human review before publishing?

No. SEO and answerability checks are quality and reach concerns that live at the same boundary as safety checks, but they are advisory, not blocking. A draft can pass every automated check and still be factually wrong, which is why human approval is non-optional and cannot be automated away.