Bassam Ismail

When Claude Became a Shortcut Before the Problem Was Understood

The delivery plan crossed the line at "use Claude in Chrome to finesse AWS support." Before that, it looked like ordinary AI-assisted intake: Slack notes, Git history, browser downloads, transcripts, and assistant sessions turned into daily stories and posts. The moment AWS support appeared in the plan as something to finesse through a browser agent, assisted delivery stopped being acceleration and started becoming camouflage. The fix was not a better prompt. I paused the assistant-first workflow and forced the plan through a constraint ledger: who owns access, what artifacts exist, what can run unattended, and what evidence shows a real problem has been solved.

TL;DR

Claude in Chrome is useful for exploration, but it is not a delivery plan by itself. The practical answer is to separate discovery from implementation, prove ownership and access, and collect evidence before asking a model to generate more artifacts. Once the boundary is clear, the assistant can classify, summarize, draft, and critique without pretending the hard parts are already solved.

Why Claude in Chrome needs a gate before delivery

The tempting version of this project was simple: open Chrome, ask Claude to download the files, scrape the transcripts, read the notes, summarize the repos, and generate a content series. It had the smell of progress. Windows would pile up. Text would accumulate. Someone could reasonably say the machine was working.

The uncomfortable version was less cinematic: most of the risk lived outside the prompt. We did not yet know which accounts were required, which exports were allowed, which files could be downloaded directly, which repos actually built, whether video transcripts existed, or whether browser automation would survive long enough to finish. The model could narrate around those gaps beautifully. That was precisely the problem.

I have started treating this as a three-stage model, not a vibe: Explore, Prove, Build. The four classification buckets and the intake manifest below are tools inside the Prove gate. Nothing reaches Build without passing through them.

[Figure: Delivery gate. Explore (find facts) -> Prove (run checks) -> Build (ship path). Caption: The model moves faster after the boundary is known.]

Exploration is allowed to be messy. Implementation is not. The mistake is letting exploratory prompts inherit implementation authority.

The failure mode: Browser access masquerading as engineering access

Browser access feels powerful because it is close to the user. It can see logged-in pages, click export buttons, and read whatever the session can read. That is also why it is a lousy foundation for a delivery plan.

A browser session is not an integration contract. It is a pile of privileges, cookies, UI timing, and hope. If the work depends on it, the plan should say so plainly. If the work can be moved to an API, clone, export, or scheduled job, it usually should be.

The first pass at the plan looked like this:

Work item | Assistant-shaped request | Engineering question
Documents | Download through Chrome | Is there an export API or shared drive source?
Repos | Summarize from GitHub | Can we clone, build, and inspect dependencies?
Videos | Download and transcribe | Are transcripts available, licensed, and complete?
Notes | Read meetings and sessions | What is the access boundary and retention rule?
Stories | Generate posts | What evidence says the source signal is useful?

That table changed the conversation. It turned "can Claude do this" into "which part of this is actually solved." The second question is less fun, which is one reason it works.

A rubric I now use before handing work to an LLM

The rubric is deliberately plain. If I need a complicated governance framework to decide whether a model should click around a browser, I have already lost the room.

1. Classify the work (inside the Prove gate)

I split every item into one of four buckets: access, implementation, generation, or blocked.

items:
  - name: docs-export
    bucket: access
    owner: ops
    evidence: shared-drive-export-link
  - name: repo-summary
    bucket: implementation
    owner: engineering
    evidence: local-build-log
  - name: story-drafts
    bucket: generation
    owner: editorial
    evidence: reviewed-sample-set
  - name: aws-case
    bucket: blocked
    owner: platform
    evidence: support-case-id-required

The bucket matters more than the task name. A story draft is generation. A repo summary might be exploration until the repo builds. An AWS support issue is not solved because a browser agent can type into a form. It stays blocked until the authorized account, case ID, and acceptance path are named.

To make the downgrade concrete: "Claude can gather AWS support evidence" started life in the generation bucket. After the ledger pass, it moved to blocked. The claim became: gather evidence from an AWS support case. The questions that moved it: which account holds the case, who is the authorized contact, what is the case ID, and how does resolution get confirmed? None of those were answerable from a browser session. The task did not disappear; it waited for a human to supply the missing constraints before the model touched it again.
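The gate described here can be sketched in a few lines. This is a minimal illustration rather than the tooling used on the project: the `LedgerItem` type, the `can_enter_build` function, and the rule that only the implementation and generation buckets are build-eligible are all assumptions made for the example. The item names, owners, and evidence labels are copied from the ledger above.

```python
from dataclasses import dataclass

# Assumption: only these buckets may pass from Prove to Build.
BUILD_ELIGIBLE = {"implementation", "generation"}


@dataclass(frozen=True)
class LedgerItem:
    name: str
    bucket: str
    owner: str
    evidence: str


def can_enter_build(item: LedgerItem) -> tuple[bool, str]:
    """Return (allowed, reason) for moving an item from Prove to Build."""
    if not item.owner:
        return False, "no owner named"
    if not item.evidence:
        return False, "no evidence named"
    if item.bucket == "blocked":
        return False, f"blocked until resolved: {item.evidence}"
    if item.bucket not in BUILD_ELIGIBLE:
        return False, f"bucket '{item.bucket}' is not build-eligible"
    return True, "ok"


# Items copied from the ledger above.
items = [
    LedgerItem("docs-export", "access", "ops", "shared-drive-export-link"),
    LedgerItem("repo-summary", "implementation", "engineering", "local-build-log"),
    LedgerItem("story-drafts", "generation", "editorial", "reviewed-sample-set"),
    LedgerItem("aws-case", "blocked", "platform", "support-case-id-required"),
]

for item in items:
    allowed, reason = can_enter_build(item)
    print(f"{item.name}: {'build' if allowed else 'hold'} ({reason})")
```

The point of returning a reason alongside the verdict is that a held item should say what it is waiting for, so the AWS case reports its missing case ID instead of quietly dropping out of the plan.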

This is the same instinct behind treating prompts as durable inputs instead of hidden magic, which I wrote about in "Building Press, Part 3: The prompts are data, not code."

2. Require evidence, not a transcript

LLM sessions are useful records, but they are weak evidence. They tell me what the assistant said and sometimes what the human asked. They do not prove a command ran, a dependency installed, a server responded, or a policy allowed the action.

For repo intake, I want artifacts like this:

cd ~/story-worker
npm ci
npm test
npm run build
npm run inspect -- --format json > reports/repo-inspect.json

For an API, I want a boring curl that exits cleanly:

export API_HOST='https://api.example.test'
export API_TOKEN='[REDACTED:secret]'
 
curl --fail --silent --show-error \
  -H "Authorization: Bearer $API_TOKEN" \
  "[REDACTED:password]" \
  | jq '.items | length'

For a server process, I want the unit file or compose service, not a paragraph claiming it runs on a spare laptop:

services:
  story-worker:
    image: node:22-alpine
    working_dir: /app
    command: ["node", "src/worker.mjs"]
    volumes:
      - ./src:/app/src:ro
      - ./data:/app/data
      - ./reports:/app/reports
    environment:
      API_HOST: https://api.example.test
      OUTPUT_DIR: /app/reports
    restart: unless-stopped

These artifacts are not glamorous. That is their charm. They are hard to bluff and easy to rerun.
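One way to make "easy to rerun" literal is to wrap each check in a small recorder that stores what actually happened instead of what a session said happened. The sketch below is an assumption about how that could look, not the project's tooling: `run_with_evidence` and its record fields are hypothetical names, and the demo runs a stand-in Python command so the example works on any machine. In practice the command would be something like `["npm", "test"]`.

```python
import hashlib
import json
import subprocess
import sys
from datetime import datetime, timezone


def run_with_evidence(command: list[str]) -> dict:
    """Run a command and return a record of its real outcome."""
    started = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(command, capture_output=True, text=True)
    output = result.stdout + result.stderr
    return {
        "command": command,
        "started_utc": started,
        "exit_code": result.returncode,
        "passed": result.returncode == 0,
        # The hash lets a reviewer confirm a stored log matches this run.
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
    }


# Stand-in command so the sketch runs anywhere.
record = run_with_evidence([sys.executable, "-c", "print('ok')"])
print(json.dumps(record, indent=2))
```

The exit code carries the claim. A transcript can say the tests passed; this record either shows a zero or it does not.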

3. Separate capture from authoring

The proposed system mixed two jobs: collecting private work history and generating public narrative from it. Those need different controls.

Capture needs deterministic connectors, retention rules, and logs. Authoring needs judgment, revision, and taste. When the same Claude session handles both, it becomes too easy to confuse "the assistant found a document" with "the assistant understood which parts are publishable."

[Figure: Story system. Sources (chat, git, notes) -> Capture (exports, logs) -> Review (facts, risk) -> Author (drafts, edits).]

The boring implementation shape is a pipeline: capture first, normalize second, summarize third, draft last. The model can help at multiple points, but it should not be the only thing holding the pipeline together. That review boundary is also why "Building Press, Part 4: Review is where the human gates the irreversible" became a separate part of the system instead of a footnote.
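The ordering can be enforced in code rather than by convention. The following is a rough sketch under stated assumptions: the `Artifact` type and `advance` function are hypothetical, and placing the human review immediately before the draft stage is my reading of the pipeline, not a documented rule.

```python
from dataclasses import dataclass, field

# Stage order taken from the paragraph above.
STAGES = ["capture", "normalize", "summarize", "draft"]


@dataclass
class Artifact:
    source_id: str
    completed: list[str] = field(default_factory=list)
    reviewed: bool = False  # assumption: set by a human, never by the model


def advance(artifact: Artifact, stage: str) -> None:
    """Mark a stage complete, enforcing stage order and the review gate."""
    done = len(artifact.completed)
    expected = STAGES[done] if done < len(STAGES) else None
    if stage != expected:
        raise ValueError(f"{artifact.source_id}: expected {expected!r}, got {stage!r}")
    if stage == "draft" and not artifact.reviewed:
        raise PermissionError(f"{artifact.source_id}: review required before drafting")
    artifact.completed.append(stage)


notes = Artifact("meeting-notes")
for stage in ("capture", "normalize", "summarize"):
    advance(notes, stage)

try:
    advance(notes, "draft")  # refused: nobody has reviewed it yet
except PermissionError as err:
    print(err)

notes.reviewed = True
advance(notes, "draft")
print(notes.completed)
```

Raising an error on an out-of-order stage is deliberate. A skipped step should stop the run, not produce a draft with a gap nobody noticed.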

What I rejected

I rejected "just make it a Claude skill" as the default answer. A skill can package instructions, workflows, and tool expectations. It does not solve browser-only documents, private video access, AWS account authority, or repeatable data ingestion on its own.

I also rejected a pure manual workflow. Asking a person to download every file, paste every transcript, and babysit every summary is not a system. It is a ritual with a keyboard.

The workable middle was an application with explicit connectors where possible, browser-assisted capture only where necessary, and a review step before anything becomes external content. Claude can still be involved, but its job becomes narrower: classify, summarize, draft, and critique within a known boundary.

Deep-dive: The minimum viable intake manifest

I like making the intake manifest boring enough to diff. Each source gets an owner, method, retention note, and evidence path. This makes it obvious when a source is not integrated and is merely being wished into existence.

sources:
  - id: chat-history
    method: export
    owner: ops
    retention_days: 30
    evidence: reports/chat-export-2026-06-23.json
  - id: git-history
    method: clone
    owner: engineering
    retention_days: 90
    evidence: reports/git-summary.json
  - id: meeting-notes
    method: connector
    owner: ops
    retention_days: 30
    evidence: reports/notes-index.json
  - id: videos
    method: transcript-export
    owner: editorial
    retention_days: 14
    evidence: reports/video-transcripts.json

A manifest like this does not prevent bad decisions. It makes them visible sooner, which is most of the job.
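A short checker makes "wished into existence" detectable. This sketch assumes the manifest has already been loaded into plain dictionaries, since parsing YAML needs a third-party library outside the standard library. The `check_source` function and its rules are hypothetical, and the demo builds a temporary directory holding only one of the evidence files so that both outcomes are visible.

```python
import tempfile
from pathlib import Path

REQUIRED_FIELDS = ("id", "method", "owner", "retention_days", "evidence")


def check_source(source: dict, root: Path) -> list[str]:
    """Return the problems found in one manifest entry (empty means clean)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in source]
    if problems:
        return problems
    days = source["retention_days"]
    if not isinstance(days, int) or days <= 0:
        problems.append("retention_days must be a positive integer")
    if not (root / source["evidence"]).is_file():
        problems.append(f"evidence not found: {source['evidence']}")
    return problems


# Two entries copied from the manifest above.
sources = [
    {"id": "git-history", "method": "clone", "owner": "engineering",
     "retention_days": 90, "evidence": "reports/git-summary.json"},
    {"id": "videos", "method": "transcript-export", "owner": "editorial",
     "retention_days": 14, "evidence": "reports/video-transcripts.json"},
]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "reports").mkdir()
    # Only the git summary exists; the video transcripts were never exported.
    (root / "reports" / "git-summary.json").write_text("{}")
    results = {source["id"]: check_source(source, root) for source in sources}

for source_id, problems in results.items():
    print(source_id, "ok" if not problems else problems)
```

Run against a real checkout, `root` would be the repository directory, and any entry whose evidence file is absent shows up before anyone asks a model to summarize it.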

The sharp edge that remains

This approach has a cost. It slows the first hour down. Someone has to name owners, write manifests, collect evidence, and tell the model no when it offers a plausible shortcut. On a small team, that can feel like bureaucracy wearing a sensible jacket.

There is also a judgment call around browser automation. Sometimes it is the only practical way to capture a source that has no export API. I do not ban it. I treat it as a manual bridge, not infrastructure. If the bridge becomes permanent, I either formalize it or admit that the workflow is operationally fragile.

Where Claude helped once the boundary was clear

After the gate, Claude became useful again. It could compare session notes, find repeated themes, identify which prototypes were only prompt-shaped demos, and separate them from cases where actual engineering constraints had been removed.

That distinction matters. A Claude-prompted app is not fake, but it is not the same thing as a solved system. If no one can run it tomorrow, if the data path is unclear, or if the access depends on one person's browser session, the work is still exploratory. Useful, maybe. Shippable, no.

The better prompt was not clever. It was constrained:

Classify each artifact as exploration, evidence, implementation, or publication.
For each item, cite the artifact path, the owner, and the next missing constraint.
Do not infer solved status from assistant output alone.
Mark browser-only access as manual unless an export or API path is shown.

That prompt works because the system around it has something to check. Without the manifest and evidence, it would be another confident sorting hat for a pile of uncertainty.
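Checking the model's answer against those four rules can itself be mechanical. The sketch below assumes the model returns one record per artifact. Every field name (`label`, `artifact_path`, `owner`, `next_missing_constraint`, `access`, `mode`, `export_or_api_path`, `status`, `evidence_source`) is hypothetical and chosen only to mirror the wording of the prompt.

```python
# Labels taken from the first line of the prompt above.
ALLOWED_LABELS = {"exploration", "evidence", "implementation", "publication"}


def review_classification(row: dict) -> list[str]:
    """Return rule violations for one classified artifact (empty means clean)."""
    problems = []
    if row.get("label") not in ALLOWED_LABELS:
        problems.append(f"unknown label: {row.get('label')!r}")
    for name in ("artifact_path", "owner", "next_missing_constraint"):
        if not row.get(name):
            problems.append(f"missing {name}")
    # Browser-only access counts as manual unless an export or API path is shown.
    browser_only = row.get("access") == "browser-only"
    if browser_only and not row.get("export_or_api_path") and row.get("mode") != "manual":
        problems.append("browser-only access must be marked manual")
    # Solved status may not rest on assistant output alone.
    if row.get("status") == "solved" and row.get("evidence_source") == "assistant-output":
        problems.append("solved status inferred from assistant output alone")
    return problems


rows = [
    {"label": "implementation", "artifact_path": "reports/git-summary.json",
     "owner": "engineering", "next_missing_constraint": "none",
     "status": "solved", "evidence_source": "build-log"},
    {"label": "publication", "artifact_path": "reports/notes-index.json",
     "owner": "ops", "next_missing_constraint": "export path",
     "access": "browser-only", "mode": "automated",
     "status": "solved", "evidence_source": "assistant-output"},
]

for row in rows:
    print(row["artifact_path"], review_classification(row) or "ok")
```

The second row fails twice, once for treating browser access as automation and once for calling itself solved on the assistant's word.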

FAQ

Why is using Claude in Chrome unreliable for delivery?

Because a browser session is not a stable integration boundary. It depends on logged-in state, UI timing, human-owned access, and page behavior that can change without notice.

Where should I capture evidence in AI-assisted delivery?

Capture evidence at the boundary where the claim becomes testable: build logs for repos, curl output for APIs, export files for documents, and support case IDs for vendor issues.

When is a Claude-prompted app actually solved work?

It is solved when someone can rerun it from source, verify the data path, name the owner, and show the constraint it removed. A transcript alone is not enough.

Should Claude skills replace a small internal application?

Use a skill for repeatable instructions and model behavior. Use an application when you need durable connectors, scheduling, storage, audit logs, or repeatable ingestion.

The model is most useful after the work has edges. Before that, it mostly gives uncertainty a nicer speaking voice.