We're building AI copilots for jobs that don't need copilots

The copilot metaphor made sense for a minute. Now it is a branding workaround for products that cannot commit to what they actually do.

The word 'copilot' entered the product vocabulary at a specific moment when AI was capable enough to be useful but not reliable enough to be autonomous. It was a useful frame. It acknowledged that the AI needed a human partner. It set appropriate expectations. It positioned the product as an assistant rather than a replacement, which made it easier to sell to people worried about their jobs.

That was a few years ago. Since then, the copilot frame has been stretched so far past its original meaning that it no longer communicates anything.

Here is the original test for whether something is actually a copilot: would you feel comfortable if the copilot made a mistake without you noticing? In aviation, the answer is no, which is why there is a copilot, an autopilot, and a set of warning systems. The human is in the loop not for legal reasons but because the task is complex, high-stakes, and genuinely benefits from redundancy. The copilot metaphor only makes sense when the oversight is real, not performative.

Most products currently called AI copilots fail this test in one of two directions. Either the task is so simple that oversight adds friction without adding value, or the task is so complex that the AI output needs enough review that calling it a copilot flatters the product. There is a narrow band of tasks where the copilot label is accurate. Most product teams are not building in that band.

Take the common case: an AI writing assistant positioned as a copilot. The task is generating a first draft or suggesting edits. If the AI does this well, the user reviews quickly and accepts most of it. If the AI does this poorly, the user rewrites everything. In neither case is the human functioning as a copilot in any meaningful sense. The human is either approving or correcting. The copilot frame is doing nothing but making the product sound more sophisticated than it is.

The honest question is: should this be a copilot at all? There are really only two valid answers for most tasks. Either the AI should just do the thing automatically, with exception handling for the cases it cannot handle, or the AI is a raw capability that the user calls on selectively. The in-between state, where the AI does something and a human must review everything before it counts, is usually a sign that the team has not committed to a product direction.
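To make the first answer concrete, here is a minimal sketch of "do it automatically, with exception handling." Everything in it is an assumption for illustration: the `Result` type, the `route` and `process` functions, the idea that the model reports a calibrated confidence score, and the 0.9 threshold. It is not a description of any particular product.

```python
from dataclasses import dataclass

# Hypothetical cutoff; in practice this would come from measured error rates.
AUTO_THRESHOLD = 0.9


@dataclass
class Result:
    output: str
    confidence: float  # assumed to be a calibrated score in [0, 1]


def route(result: Result, threshold: float = AUTO_THRESHOLD) -> str:
    """Return 'auto' to apply the output without review, else 'exception'."""
    return "auto" if result.confidence >= threshold else "exception"


def process(results):
    """Split results into those applied automatically and those sent to a human."""
    applied, exceptions = [], []
    for r in results:
        if route(r) == "auto":
            applied.append(r)      # no human review on this path
        else:
            exceptions.append(r)   # only these reach the exception queue
    return applied, exceptions
```

The point of the sketch is the shape, not the numbers: the human sees only the exception queue. If every result lands in that queue, the product is the in-between state described above, whatever it is called.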

This matters for product design more than it matters for marketing. If you build a copilot, you are designing for ongoing human attention. The interface, the workflow, and the user's expectations are all shaped around a human who is actively engaged. If you build an automation, you are designing for exception handling. The interface, the workflow, and the expectations are completely different. Teams that call something a copilot but build it like an automation, or vice versa, end up with products that feel wrong without anyone being able to articulate why.

There is also a product strategy implication. Copilots are, by design, not autonomous. They require ongoing human attention to function. That is a ceiling on how much value they can deliver. Every hour a user spends supervising the copilot is an hour they are not spending on something else. The most successful AI products over the next few years will be the ones that move from copilot to automation for the tasks where automation is appropriate, and whose teams are honest with themselves about which tasks those are.

The founders who are building the best AI products right now are the ones who asked the uncomfortable question early: are we building a copilot because the task requires it, or because we are not confident enough in our AI to commit to full automation? If it is the latter, the copilot frame is a delay tactic, not a product strategy.

So here is the question: for the specific task your product handles, what would have to be true about reliability before you would remove the human from the loop entirely? And if you cannot answer that question, what does that tell you about what you are actually building?