Bassam Ismail
Building Press
Building Press·Part 2 of 6
Engineering

Building Press, Part 2: The data model is the spine

6 min read

I write far less than I do, so the engine that became Press exists to close that gap: it reads my work, drafts it, and walks each draft to a published post. The first time it handed me a draft I wanted to change, I hesitated before touching it. Not because the writing was precious, but because I did not trust what "edit this" would cost me. If the post was a single blob of text in a single column, one careless save, or one over-eager AI rewrite, would overwrite the only copy. That hesitation is the tell I want to talk about. It means the data model is wrong.

My thesis, after building the thing, is narrow: the data model is the spine of an editorial engine, and the spine has to be made of distinct tables with one clear owner each. The moment you collapse "content" into a single fuzzy record, every stage of the pipeline starts reaching across lines it should not, and editing stops feeling safe.

Three tables, not one blob

Press models the lifecycle as three nouns that turn into each other: signals become stories become posts.

A signal is a single redacted snippet of work: a Slack message, a commit, a line from a meeting summary. A story is a cluster of signals the miner judged to be one tellable thing. A post is the published artifact a story becomes after drafting and review.

The temptation is to skip the ceremony and keep one content table with a status column that marches from "raw" to "published." I tried that shape in my head and rejected it, because it fails the test that matters: can each stage fail on its own, and can a human step in at any point without untangling the others? With three tables, mining can produce a bad cluster and I can fix or dismiss the story without touching any signal or post. Drafting can write a weak post and I can rewrite it without re-mining. The boundaries are not bureaucracy. They are the seams where a human or a retry can get a grip.

story_id: the whole "have we processed this?" check

Mining needs to know which signals it has already consumed. The entire mechanism for that is one nullable column. A signal carries a story_id once it has been clustered, and mining selects the unprocessed ones with a single predicate:

select * from signals where story_id is null;

That is the whole bookkeeping system. No "mined" boolean, no separate queue table, no timestamp to reason about. A signal is either spoken for (it points at a story) or it is fair game (the column is null). The simplicity is the feature: there is exactly one place the truth lives, and it is impossible for "consumed" and "assigned to a story" to disagree, because they are the same fact.

I want to be honest about what this does not do. story_id is null tracks consumption, not novelty. It guarantees a signal is mined at most once. It does not guarantee that two stories are about genuinely different things. On a busy week, the miner clustered overlapping signals into two stories that were really the same topic, and both got drafted before I noticed. The column did its job perfectly and still let near-duplicates through, because deduplication of meaning is a different problem living one layer up, in the mining prompt. A clean column is not a substitute for a clean judgment.

The body lives on a revision, not the post

Here is the decision that earned its keep more than any other: a post's body is not stored on the post row. The posts row holds identity and metadata (slug, title, pillar, status, SEO fields) and a pointer, current_revision_id. The actual Markdown lives in post_revisions, and every edit, human or AI, inserts a new revision rather than mutating the old one.

This is why "let the AI revise this" is a safe button instead of a scary one. An AI rewrite does not overwrite anything. It writes a new revision and, if I like it, I move the pointer. The worst case is a revision I simply never promote. The history is a diffable, rollback-able stack, and the question "what did this look like before the model touched it" always has an answer.

Tip

Make the destructive operation impossible to express, not merely discouraged. When "edit" can only mean "append a revision," there is no code path that loses the previous version, so you never have to remember to be careful.

Deep-dive: body-on-revision, and the autosave sharp edge

The pointer is deliberately a soft reference. posts.current_revision_id is just a uuid column, not a hard foreign key, because a strict FK would create a circular constraint: a post points at a revision, and a revision points back at its post, so neither can be inserted first without a deferred constraint dance. A soft pointer sidesteps that entirely, at the cost that the application is responsible for keeping it valid.

The honest sharp edge is autosave. "Every edit is a new revision" is true for committed edits and AI revisions, the ones that go through the explicit save path. The live editor, though, autosaves your in-progress typing back onto the working revision in place, so it does not spawn a row on every keystroke. That is the right call for an editor (you do not want a thousand revisions from one writing session), but it means "reversible" applies to promoted and AI revisions, not to an open editing session you have been typing into for ten minutes. If you want a checkpoint mid-session, you take a snapshot explicitly. Knowing exactly where the guarantee starts and stops is more useful than pretending it is absolute.

The tables at the edges

Two more tables hang off posts, and keeping them separate follows the same logic. social_drafts holds the LinkedIn and X copy for a post: secondary artifacts with their own lifecycle (queued, posted, a captured URL) that should not bloat the post row or block the blog from rendering. engagement records the metrics those social posts pull back later. A post can exist and publish with no social drafts, a social draft can be posted and gather engagement independently, and none of it touches the post's body. Each table owns one concern, and the failure of any one of them leaves the others standing.

The durable lesson

The shape of your data decides which mistakes are even possible. I did not make editing safe by being careful; I made it safe by storing the body on a revision so that carelessness has nowhere to land. I did not make mining idempotent with discipline; I made it idempotent with a nullable column that cannot lie. When the model is right, distinct tables let it work; when it is wrong, they give a person somewhere to stand. Design the tables so that the operations you fear cannot be written down, and you stop needing to fear them.