How Product Development Processes Should Change in the Age of AI Agents#
2026-05-10

Work processes inside companies are changing quickly as AI agents enter daily operations. The company I work at is a technology-driven advertising operations company, broadly divided into an advertising operations organization and a product development organization.
The advertising operations organization manages advertising campaigns on behalf of clients. Recently, this group moved toward a GitHub-based workflow where advertising guidelines, skills, and brand data are organized as projects, and operators work conversationally with AI agents such as Claude Code or Hermes. This transition was accepted relatively naturally.
The reason is clear. Each advertising operator is responsible for a set of brands and handles those brands from A to Z. In this structure, AI agents amplify the execution power of the existing owner. Common work can be captured as skills, and database or system integrations can be handled through agents. From each operator’s perspective, their existing responsibility boundary did not change dramatically.
The product development organization is different. It is made up of planners, designers, and developers, and several people participate in a single feature flow. In this structure, AI agents do not merely increase an individual’s execution power. They also shake the boundaries and responsibilities between planning, design, and development.
The Problem Is Intent Transfer, Not Code#
The old product development process was the familiar pipeline: a planner wrote a planning document to communicate requirements, a designer designed the screen, and a developer implemented and deployed it.
Then planners became able to use Claude Code to modify screens and add features directly in the frontend project. One planner, for example, implemented a proposed renewal on a separate branch. Parts that could use existing APIs were implemented with real behavior, while parts that required API changes or additions were implemented with mock or fake data.
This is powerful. A planner is no longer limited to writing a document. They can create a screen that actually moves. But this also creates a problem. Development concerns such as internal module reuse, alignment with the existing structure, API boundaries, state management, and maintainability may not be sufficiently reflected. As a result, the code diff can grow beyond what a human can reasonably review.
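One mitigation is to make the mock boundary explicit in the draft itself rather than scattering fake data through components. A minimal sketch, with hypothetical names (`CampaignSummary`, `fetchCampaigns`, the `/api/campaigns` endpoint are illustrative, not from any real codebase):

```typescript
interface CampaignSummary {
  id: string;
  name: string;
  spend: number;
}

// Fake data the planner's draft renders while the real API is undecided.
const MOCK_CAMPAIGNS: CampaignSummary[] = [
  { id: "c-1", name: "Spring brand push", spend: 1200 },
];

// A single switch marks every place the draft still depends on fake data,
// so "where do mocks remain?" is answerable with one search for USE_MOCK.
export const USE_MOCK = true;

export async function fetchCampaigns(): Promise<CampaignSummary[]> {
  if (USE_MOCK) {
    // TODO(planner-draft): replace with the real endpoint once the contract is agreed
    return MOCK_CAMPAIGNS;
  }
  const res = await fetch("/api/campaigns");
  return res.json();
}
```

The point is not the specific pattern but that the prototype's fake parts stay enumerable instead of dissolving into the diff.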
At first glance, a screen directly implemented by a planner may look more concrete than a traditional planning document. There is a screen, buttons produce visible behavior, and some features are connected to real APIs. But that does not automatically mean intent transfer has improved.
Traditional planning documents included not only screen structure but also why the screen was structured that way, what should happen when a button is clicked, and what user flow was intended. By contrast, a code change plus a text document generated by an AI agent from that code may not be enough to communicate intent to humans.
A screen shows “what changed.” It does not clearly explain “why it changed,” “what alternatives were considered,” “what trade-offs were made,” or “what counts as correct behavior.” A document generated by an AI agent after reading changed code can easily become a plausible post-hoc interpretation of the result, rather than a record of the planner’s original intent.
In that case, the developer is not receiving product intent. They are reverse-engineering intent from the diff and the screen behavior. That is closer to archaeology than collaboration.
Code Cannot Replace Planning Documents#
Code changes made by planners are valuable. But they cannot fully replace planning documents.
Code is an executable draft. It can show screens and interactions more vividly than a static planning document. But code does not preserve intent by itself. It does not automatically record why something matters, what criteria drove a decision, which parts are final, and which parts are temporary.
For a planner’s code branch to be useful, it needs to come with the following information.
- Why this screen structure is needed
- What user problem it is trying to solve
- What was inconvenient in the existing flow
- Expected behavior for buttons, states, and edge cases
- Where mock and fake data remain
- What contract the real API should have
- Which decisions are final and which are still open
Without this information, developers spend too much time interpreting the implementation result instead of understanding the product intent.
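One way to keep this information attached to the branch is to treat intent as structured data that travels with the code. A hypothetical sketch, assuming a team-defined shape (all field names and example values are illustrative):

```typescript
// Hypothetical intent record that accompanies a planner's draft branch.
interface DraftIntent {
  userProblem: string;          // what user problem the change solves
  replacedFlow: string;         // what was inconvenient in the existing flow
  expectedBehavior: string[];   // buttons, states, and edge cases
  remainingMocks: string[];     // modules still running on fake data
  proposedApiContract: string;  // what the real API should provide
  finalDecisions: string[];     // settled and not up for debate
  openDecisions: string[];      // still open for discussion
}

export const exampleIntent: DraftIntent = {
  userProblem: "Operators cannot compare spend across brands on one screen",
  replacedFlow: "Comparing brands requires exporting two reports and merging by hand",
  expectedBehavior: [
    "Empty state when a brand has no campaigns",
    "Sort order persists per user",
  ],
  remainingMocks: ["src/api/campaigns.mock.ts"],
  proposedApiContract: "A campaign-list endpoint returning id, name, and spend per brand",
  finalDecisions: ["Single-page comparison view"],
  openDecisions: ["Whether sort order is stored server-side"],
};
```

Whether this lives in a typed file, a PR template, or a document matters less than the fact that intent becomes a reviewable artifact rather than something reconstructed from the diff.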
We Need a Boundary Between Prototype and Production#
The most important distinction is how we treat the planner’s code change.
If we treat it as “code to put into production,” the risk becomes large. Code quality, API boundaries, reuse, security, performance, failure handling, and maintainability cannot be judged from screen behavior alone. Especially in a large renewal, long-term costs can grow quickly if prototype code becomes product code as-is.
If we treat the planner’s code change as an “executable draft,” it becomes highly valuable. The planner can quickly validate intent through the screen, and the developer gets a more concrete starting point than a static document. The question is how to promote that draft to production quality.
So the issue is not whether to merge the planner’s branch as-is. The issue is whether there is a path for promoting a prototype into production.
That path needs at least the following steps.
- Product intent review
- Design consistency review
- API and data contract definition
- Mock and fake data removal plan
- Development integration
- Harness validation
- Staging verification
- Deployment and rollback plan
In this flow, the developer’s role is not to inspect every line of code. The developer should be responsible for system boundaries, data flow, deployability, and long-term maintainability.
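The promotion path can itself be made explicit rather than left as an informal handoff. A minimal sketch that models the steps above as ordered gates (gate names mirror the list; the `check` implementations are stand-ins for real reviews and harness runs):

```typescript
// Each gate is a named check a draft must pass on its way to production.
type Gate = { name: string; check: () => boolean };

const gates: Gate[] = [
  { name: "product intent review", check: () => true },
  { name: "design consistency review", check: () => true },
  { name: "API and data contract defined", check: () => true },
  { name: "mock and fake data removed", check: () => false }, // this draft still has mocks
  { name: "harness validation", check: () => true },
  { name: "staging verification", check: () => true },
];

// A draft is promotable only when every gate passes; the first failure
// names exactly what work remains before production.
export function firstBlockedGate(gs: Gate[]): string | null {
  for (const g of gs) {
    if (!g.check()) return g.name;
  }
  return null;
}

export const blocked = firstBlockedGate(gates); // "mock and fake data removed"
```

The value of encoding the path this way is that "is this ready?" stops being a matter of opinion and becomes a question the pipeline can answer.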
Review Does Not Disappear. It Changes.#
In an AI-agent-based development flow, the important question is not “should we review or not?” The more important question is “what should we review?”
The old model, where humans read every diff and judge every implementation detail, will not scale for long. The amount of change produced by AI agents can easily exceed the capacity of manual human review. Review should move from code-centered review to intent, contract, risk, and harness-result-centered review.
Humans should review questions such as:
- What user flow does this feature change?
- What API is needed, and how is it different from the existing API?
- What data states and DB tables are affected?
- Are authentication, authorization, billing, or reporting affected?
- Where do mock and fake data remain?
- Is rollback possible if this fails?
- What do tests, type checks, e2e tests, and visual regression guarantee?
This review is not a process for blocking AI development. It is the process of making explicit the judgment criteria that harnesses should eventually take over.
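Some of these questions can already be handed to a harness today. For example, "where do mock and fake data remain?" becomes mechanical if the team agrees on a marker convention. A minimal sketch (the marker string and file contents are hypothetical):

```typescript
// An agreed marker that every remaining mock must carry.
const MOCK_MARKER = "TODO(planner-draft)";

// Scan a map of file paths to source text and return the files that
// still contain the marker, i.e. the fake data not yet converted.
export function findMockRemnants(files: Record<string, string>): string[] {
  return Object.entries(files)
    .filter(([, source]) => source.includes(MOCK_MARKER))
    .map(([path]) => path);
}

export const remnants = findMockRemnants({
  "src/pages/Campaigns.tsx": "// TODO(planner-draft): swap mock rows for the real API",
  "src/pages/Report.tsx": "export const Report = () => null;",
});
```

A check like this can run in CI and fail the promotion pipeline until the list is empty, which is exactly the pattern of a human judgment criterion becoming a harness.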
Harnesses Do Not Create the Answer by Themselves#
The opposing view was to trust AI agents and harnesses more: merge the planner’s changes first and improve the broken parts afterward. In that view, tests, skills, and harnesses should be strengthened so that humans do not need to understand the current project structure in detail.
There is real truth in this view. If humans remain attached to old-style review and design control, the performance gains from AI-agent-based development may arrive too slowly. In an age of larger diffs, the assumption that humans must manually understand and approve all code becomes a bottleneck.
In the long run, many development judgments will likely be automated by harnesses and agents. Humans should move toward a world where they do not need to keep every implementation detail of a project in their heads.
But there is an important condition. Harnesses only verify what has been made explicit. Tests do not create the correct answer by themselves. Someone must first define what correct behavior means, what UX is intended, and which data states must not be broken.
If this definition is weak and the team simply chooses to “merge first and fix later,” the harness may not become stronger. Instead, tests may lock in incorrect behavior as normal behavior. The result is not automation, but a gap in responsibility.
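The difference is visible in miniature in how a test is written. A hypothetical sketch: `totalSpend` is a stand-in function with a deliberate bug, and the two checks below show a test derived from the current output versus one derived from a stated definition of correct behavior:

```typescript
// Stand-in business logic with a real bug: it ignores the first entry.
function totalSpend(spends: number[]): number {
  return spends.slice(1).reduce((sum, s) => sum + s, 0);
}

// Snapshot-style check: asserts whatever the code returns today,
// so it passes and quietly locks in the wrong behavior as "normal".
export const snapshotPasses = totalSpend([100, 200]) === 200;

// Intent-style check: derived from the stated definition
// ("total spend is the sum of all campaign spends"), so it catches the bug.
export const intentPasses = totalSpend([100, 200]) === 300;
```

Both are "tests", but only the second encodes a definition of correctness that existed before the code ran. That definition is the part harnesses cannot supply on their own.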
Principles for the Transition Period#
I do not think the answer is to simply choose one side.
The direction of using AI agents more actively is right. The fact that planners can now create executable drafts themselves is an irreversible change. This capability can greatly increase product development speed and the density of experiments.
But treating that draft as production code is a separate matter. A planner’s code change should be recognized as an executable product draft. The development organization should not necessarily reimplement it manually from scratch. Instead, it should be responsible for promoting it to production using AI agents and harnesses.
During this transition period, the following principles matter.
- Break large changes down by user flow, page, and API.
- Attach intent, priority, edge cases, and success criteria to planner code changes.
- Explicitly list mock and fake data, then convert them into real API work.
- Redefine developer review as system contract and risk review, not full code inspection.
- Treat harnesses as including types, tests, API contracts, visual regression, staging verification, feature flags, and rollback.
- Gradually move human judgment criteria into tests, skills, documents, and contracts.
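The feature-flag and rollback principle, in particular, is what makes "promote gradually" safe in practice. A minimal sketch, with a hypothetical flag name and rendering functions, showing that rollback becomes a configuration change rather than a code revert:

```typescript
// Hypothetical flag store; in practice this would come from a config service.
const flags: Record<string, boolean> = {
  "campaign-comparison-renewal": false,
};

// The promoted draft ships dark behind the flag; flipping it off
// restores the stable path without touching the codebase.
export function renderCampaignPage(): string {
  if (flags["campaign-comparison-renewal"]) {
    return "new comparison view"; // promoted planner draft
  }
  return "legacy campaign list"; // stable fallback
}
```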
A Better Frame for Persuasion#
This debate should not be framed as “whether we accept AI development or not.” In that frame, one side looks like it is blocking innovation, and the other side looks like it is ignoring quality.
A better frame is this:
I agree that we should use AI agents more actively. But what we need now is not to slow down. What we need is to clarify the path for promoting a planner’s executable draft into production quality.
Developer review is not about making humans inspect AI-generated code in the old way. It is about having humans first clarify the criteria that harnesses should eventually judge on their behalf.
With this framing, the debate can move from “should we review or not?” to “which judgments should we turn into automatable criteria?”
Conclusion#
AI agents are fundamentally changing role structures in product development organizations. Planners are no longer people who only write documents, and developers are no longer people who simply implement planning documents. But that does not mean intent transfer and responsibility judgment disappear.
On the contrary, this is the moment to make intent, contracts, risks, and correct-behavior criteria more explicit. Until harnesses become strong enough, there are still areas where humans need to judge. Turning those judgments into harnesses is the core work of this transition period.
A planner’s code change is not production code. It is an executable draft. The fact that such drafts can now be created quickly is a major step forward. What we need next is a system for safely promoting those drafts into products.
Product development in the age of AI agents should not evolve only toward producing more code faster. It should evolve toward turning human intent and system quality criteria into forms that can be automated.