The 5 levels of agentic engineering: From vibecoding to production-maxxing
Vibecoding is great, but it's best applied as a communication medium... not building production apps. There's a good path to production that doesn't mean switching to traditional software engineering.
It’s ok to vibecode, it’s not ok to ship slop to users. I have a mental model i’m working on to try to balance moving quickly and not breaking things. (Building less, shipping more.)
Internal only
Goal: Figure out if you should build anything.
When to use: You are the only user and are trying to communicate ideas rather than ship usable software.
Models to use: Whatever is fast and good enough (in practice, i find this to be gpt-5.3-codex at medium reasoning effort.)
What you’re allowed to ship: Literally anything. Terrible is fine. Worse is better.
Attention to agent effort: Virtually none. Let it run as long as it wants, ship terrible stuff, expect to throw it away.
Alpha
Goal: Figure out if you’re building something anyone wants.
When to use: When you have < 10 users, and you know most of them directly through 1 degree of separation. You can talk to all of them, and you kind of expect them to churn, because they’re being nice to you more than being a real user.
Models to use: Basically the same fast / good enough.
What you’re allowed to ship: Things that don’t have serious security bugs and unusable performance characteristics.
Attention to agent effort: Slightly more. Don’t let it do anything absolutely terrible, but in practice most modern agents are good enough to not make the sloppiest mistakes.
Private Beta
Goal: Figure out if you’re building something anyone wants enough to use frequently.
When to use: When you have ~10 users but none of them are 1 degree of separation. More importantly: Some of them haven’t churned and are actually getting a quantum of utility.
Models to use: Start thinking about something that’s better at reasoning, and slower.
What you’re allowed to ship: Roughly the same as Alpha, but it should actually be useful for someone. You should still be embarrassed by how bad it is.
Attention to agent effort: I recommend having the agent perform an after action report style summary where it carefully explains all of the changes it made (in a text file) and you should be able to ask questions of your agent to ensure you’re on the same page.
Public Beta
Goal: Figure out if you’re building something people want to use frequently
When to use: When you have enough users that you don’t know all of them / can’t talk to them individually. (Dunbar’s number is about ~150 and is probably a decent guide for consumer products. For B2B, it’s some meaningful amount in your target market.)
Models to use: Slower and more thoughtful for anything that touches all of your users.
What you’re allowed to ship: Something that mostly works, but has a few rough edges. Enough people should be using the product that every minute you spend of effort results in at least 10x saved effort by your users.
Attention to agent effort: More thoroughly code reviewed… not necessarily by a human but there should be some process for maintaining code standards beyond YOLO. (Linters, type checkers, actual tests, playwright tests, etc.)
Production
Goal: Make something people want.
When to use: You have something good enough that it spreads naturally by word of mouth.
Models to use: Ones that are consistent and never break. In practice that means thoroughly vetted and able to be trusted.
What you’re allowed to ship: Something that works in a way that you anticipate will be quality. All of your users should use this part of your software, and every minute you spend should result in 100x saved effort by your users.
Attention to agent effort: Systematic and process driven. You should have an audit trail that proves your software does what you expect, and you shouldn’t have any surprises.
Nobody is shipping production code with agents today.
By my definition, I think the best teams might be shipping public beta quality code. I’m unconvinced that anyone has a robust production level pipeline without thorough human intervention.
It won’t be that way for long, but as of today I think it’s that way.
