~/tariqul.islam

20 Apr 2026 · 2 min read

I shipped a production SaaS with zero hand-written code. Here's the workflow.

Not a demo, not a prototype — a deployed SaaS where AI typed every line and I made every decision. The exact spec-generate-review-gate loop, and where it breaks.


I set myself an honest test: take real products — auth, billing, tenancy, the boring hard parts — from empty repository to production without hand-writing a single line of code. GitHub Copilot and Claude did all the typing. I did everything else.

It worked, twice. StampBD is a multi-database SaaS for Bangladesh's stamp vendors — stock tracking, sales, government reporting. Amaz is a complete Amazon seller management platform — inventory, purchase orders, forecasting, returns, multi-warehouse, SP-API integration. But "AI wrote it" is the least interesting sentence about either, because the workflow that made it work is 80% engineering judgment and 20% prompting.

Direction, not absence

The phrase I use with clients: this is AI-accelerated delivery directed by engineering judgment. I architected, reviewed, and shipped; AI typed. Nine years of building production systems the manual way, from micro-finance platforms to multi-tenant ERPs, is what makes the directing possible: I know what a correct diff looks like, so I can reject an incorrect one in seconds.

The loop has five gates:

01 spec      (human)  architecture, acceptance criteria, what NOT to build
02 generate  (ai)     Copilot + Claude write the implementation
03 review    (human)  every diff, read like a tech lead reads a team PR
04 gate      (ci)     types, tests, lint — no green, no merge
05 ship      (human)  deploy, monitor, own the incident
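Stage 04 is the only fully mechanical gate, so it is the easiest to make concrete. Below is a minimal Python sketch, assuming each check is a shell command whose exit code decides pass or fail. The tools (mypy, pytest, ruff) and the names `run_gate`, `merge_allowed` and `DEFAULT_CHECKS` are hypothetical placeholders, not the actual CI setup behind StampBD or Amaz:

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool


# Hypothetical checks: the loop says "types, tests, lint" but names no tools,
# so mypy, pytest and ruff are placeholder assumptions.
DEFAULT_CHECKS: dict[str, list[str]] = {
    "types": ["mypy", "."],
    "tests": ["pytest", "-q"],
    "lint": ["ruff", "check", "."],
}


def run_gate(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run every check, even after a failure, so the report is complete."""
    results = []
    for name, cmd in checks.items():
        try:
            proc = subprocess.run(cmd, capture_output=True)
            passed = proc.returncode == 0
        except FileNotFoundError:
            # A missing tool counts as a failure, never as a silent pass.
            passed = False
        results.append(CheckResult(name, passed))
    return results


def merge_allowed(results: list[CheckResult]) -> bool:
    # "No green, no merge": every check must pass, and an empty gate is not green.
    return bool(results) and all(r.passed for r in results)
```

In CI, a step such as `sys.exit(0 if merge_allowed(run_gate(DEFAULT_CHECKS)) else 1)` turns the result into a pass/fail status; marking that status as required on the main branch is what actually enforces the rule.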

The spec is where the leverage lives. A feature starts as a written decision — invitations are workspace-scoped tokens, single-use, 7-day expiry, no silent account creation — plus acceptance criteria. The instruction that changed everything: "flag anything underspecified instead of guessing." AI's failure mode isn't bad code; it's confident code built on a guess you didn't know it made.
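To show how directly a decision like that translates into code, here is a minimal sketch of the four invitation rules, assuming an in-memory token store. The names (`Invitation`, `create_invitation`, `accept_invitation`, `InviteError`) are hypothetical; a real system would persist invitations in the database and take the user from an authenticated session:

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

INVITE_TTL = timedelta(days=7)  # spec: 7-day expiry


class InviteError(Exception):
    """Raised when an invitation cannot be accepted."""


@dataclass
class Invitation:
    workspace_id: str
    email: str
    created_at: datetime
    used_at: Optional[datetime] = None


def create_invitation(store: dict[str, Invitation], workspace_id: str,
                      email: str, now: datetime) -> str:
    # Unguessable token; the invitation records exactly one workspace.
    token = secrets.token_urlsafe(32)
    store[token] = Invitation(workspace_id, email, now)
    return token


def accept_invitation(store: dict[str, Invitation], token: str,
                      workspace_id: str, user_id: Optional[str],
                      now: datetime) -> str:
    """Return the user id joining the workspace, or raise InviteError."""
    invite = store.get(token)
    if invite is None:
        raise InviteError("unknown token")
    if invite.workspace_id != workspace_id:
        raise InviteError("token belongs to another workspace")  # workspace-scoped
    if invite.used_at is not None:
        raise InviteError("token already used")  # single-use
    if now - invite.created_at >= INVITE_TTL:
        raise InviteError("token expired")  # 7-day expiry
    if user_id is None:
        # No silent account creation: the person must sign up explicitly,
        # then accept again; the invitation stays unused until then.
        raise InviteError("sign-up required")
    invite.used_at = now  # consume the token only on success
    return user_id
```

Each rule maps to one branch, which is also what makes the acceptance criteria easy to turn into tests: unknown token, wrong workspace, reused token, expired token, and missing account should each fail loudly.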

What breaks, honestly

  • Long-range consistency. AI happily creates a second way to do something that already has a first way. The fix is architectural: strong conventions documented in the repo, and review that treats "this duplicates existing structure" as a hard reject.
  • Schema design. I don't delegate data models. Every table is a human decision, because schema mistakes are the ones you pay for over years.
  • The last 10% of a hairy bug. When generation loops on a fix, I stop generating and start reading. The judgment to know when to stop prompting is — again — the actual skill.
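Documented conventions still depend on review to catch violations, but part of the duplication check can be made mechanical. A hedged sketch, assuming a Python codebase: flag any top-level function name defined in more than one module, so the reviewer gets a prompt instead of having to spot the duplicate unaided. `find_duplicate_functions` and the example module paths are hypothetical:

```python
import ast
from collections import defaultdict


def find_duplicate_functions(sources: dict[str, str]) -> dict[str, list[str]]:
    """Return {function name: [modules]} for top-level functions defined
    in more than one module. `sources` maps module path -> source text."""
    defined_in: dict[str, set[str]] = defaultdict(set)
    for path, code in sources.items():
        tree = ast.parse(code, filename=path)
        for node in tree.body:  # top-level definitions only, not methods
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                defined_in[node.name].add(path)
    # A set per name means redefining a function inside one module is ignored.
    return {name: sorted(paths)
            for name, paths in defined_in.items() if len(paths) > 1}
```

It only catches same-name duplicates. A second way of doing something under a different name still needs a human reading the diff, which is exactly why review treats it as a hard reject.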

What this means if you're hiring

Speed without a lowered bar, but only when someone senior owns the gates. An AI workflow directed by someone who couldn't build the system manually produces the same thing fast typing always produced: impressive piles of wrong.

I now run this loop on client work daily. The honest pitch isn't "AI makes it cheap" — it's that the senior engineer you're paying for stops spending their hours typing and spends all of them on the decisions that were always the expensive part.

