20 Apr 2026 · 2 min read
I shipped a production SaaS with zero hand-written code. Here's the workflow.
Not a demo, not a prototype — a deployed SaaS where AI typed every line and I made every decision. The exact spec-generate-review-gate loop, and where it breaks.
I set myself an honest test: take real products (auth, billing, tenancy, the boring hard parts) from an empty repository to production without hand-writing a single line of code. GitHub Copilot and Claude did all the typing. I did everything else.
It worked, twice. StampBD is a multi-database SaaS for Bangladesh's stamp vendors — stock tracking, sales, government reporting. Amaz is a complete Amazon seller management platform — inventory, purchase orders, forecasting, returns, multi-warehouse, SP-API integration. But "AI wrote it" is the least interesting sentence about either, because the workflow that made it work is 80% engineering judgment and 20% prompting.
Direction, not absence
The phrase I use with clients: this is AI-accelerated delivery directed by engineering judgment. I architected, reviewed, and shipped; AI typed. Nine years of building production systems the manual way, from microfinance platforms to multi-tenant ERPs, is what makes the directing possible: I know what a correct diff looks like, so I can reject an incorrect one in seconds.
The loop has five gates:
01 · spec (human): architecture, acceptance criteria, what NOT to build
02 · generate (AI): Copilot + Claude write the implementation
03 · review (human): every diff, read like a tech lead reads a team PR
04 · gate (CI): types, tests, lint; no green, no merge
05 · ship (human): deploy, monitor, own the incident

The spec is where the leverage lives. A feature starts as a written decision (for example: invitations are workspace-scoped tokens, single-use, with a 7-day expiry and no silent account creation) plus acceptance criteria. The instruction that changed everything: "flag anything underspecified instead of guessing." AI's failure mode isn't bad code; it's confident code built on a guess you didn't know it made.
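To make "a written decision plus acceptance criteria" concrete, here is a minimal Python sketch of the invitation rule, with each clause of the decision turned into one check. Everything in it is hypothetical: the names `Invitation`, `issue_invitation`, `accept_invitation` and `InvitationError` are illustrative, storage and email delivery are left out, and the sketch assumes "no silent account creation" means accepting an invite never creates a user, so an invitee without an account has to sign up explicitly.

```python
# Hypothetical sketch of the invitation spec; names are illustrative,
# not taken from the author's codebase. Storage and email delivery are omitted.
from __future__ import annotations

import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

INVITE_TTL = timedelta(days=7)  # spec: 7-day expiry


class InvitationError(Exception):
    """Raised when an invitation cannot be accepted."""


@dataclass
class Invitation:
    workspace_id: str  # spec: tokens are workspace-scoped
    email: str
    token: str
    expires_at: datetime
    used: bool = False  # spec: single-use


def issue_invitation(workspace_id: str, email: str, now: datetime) -> Invitation:
    """Create an invitation with a random URL-safe token that expires after INVITE_TTL."""
    return Invitation(
        workspace_id=workspace_id,
        email=email,
        token=secrets.token_urlsafe(32),
        expires_at=now + INVITE_TTL,
    )


def accept_invitation(
    invite: Invitation,
    token: str,
    workspace_id: str,
    existing_users: set[str],
    now: datetime,
) -> str:
    """Return the invitee's email if every spec clause holds, else raise InvitationError."""
    if not secrets.compare_digest(invite.token, token):
        raise InvitationError("invalid token")
    if invite.workspace_id != workspace_id:
        raise InvitationError("token belongs to a different workspace")
    if invite.used:
        raise InvitationError("invitation already used")
    if now >= invite.expires_at:
        raise InvitationError("invitation expired")
    # Assumed reading of "no silent account creation": never create a user here.
    if invite.email not in existing_users:
        raise InvitationError("invitee has no account; sign-up must be explicit")
    invite.used = True  # consume the token so it cannot be replayed
    return invite.email


now = datetime(2026, 4, 20, tzinfo=timezone.utc)
invite = issue_invitation("ws-1", "ana@example.com", now)
print(accept_invitation(invite, invite.token, "ws-1", {"ana@example.com"}, now))  # ana@example.com
```

Because every `raise` corresponds to one clause of the written decision, each acceptance criterion becomes one small test, and any clause that can't be mapped to a check is a candidate for the "flag anything underspecified" instruction.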
What breaks, honestly
- Long-range consistency. AI happily creates a second way to do something that already has a first way. The fix is architectural: strong conventions documented in the repo, and review that treats "this duplicates existing structure" as a hard reject.
- Schema design. I don't delegate data models. Every table is a human decision, because schema mistakes are the ones you pay for over years.
- The last 10% of a hairy bug. When generation loops on a fix, I stop generating and start reading. The judgment to know when to stop prompting is — again — the actual skill.
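Human review stays the real guard against long-range drift, but a cheap mechanical pre-check can surface the most blatant case before a reviewer even opens the diff. A minimal sketch, assuming a Python codebase and using only the standard-library `ast` module: it lists top-level functions and classes in a new file that reuse a name already defined elsewhere in the repo. The helper names are hypothetical, and a name collision is only the obvious form of duplication; a second way to do something usually arrives under a different name, which is why the hard reject remains a reviewer's call.

```python
# Hypothetical review aid, not the author's tooling: flag top-level definitions
# in a new file whose names already exist in other modules of the repo.
from __future__ import annotations

import ast


def top_level_names(source: str) -> set[str]:
    """Names of functions and classes defined at module level in `source`."""
    tree = ast.parse(source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def find_duplicates(existing_modules: dict[str, str], new_source: str) -> dict[str, list[str]]:
    """Map each name defined in `new_source` to the existing module paths that already define it."""
    new_names = top_level_names(new_source)
    hits: dict[str, list[str]] = {}
    for path, source in sorted(existing_modules.items()):
        for name in sorted(new_names & top_level_names(source)):
            hits.setdefault(name, []).append(path)
    return hits
```

In practice `existing_modules` would be built by reading the repo's files from disk; an empty result means "no obvious collision", not "no duplication".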
What this means if you're hiring
Speed without a lowered bar, but only when someone senior owns the gates. An AI workflow directed by someone who couldn't build the system manually produces the same thing fast typing always produced: impressive piles of wrong.
I now run this loop on client work daily. The honest pitch isn't "AI makes it cheap" — it's that the senior engineer you're paying for stops spending their hours typing and spends all of them on the decisions that were always the expensive part.