For about two years, AI in software engineering mostly meant smarter autocomplete. The tools were genuinely useful, but they sat inside the IDE and waited for a human to hit tab. The past twelve months have moved the conversation somewhere else. Agentic coding tools - systems that can read a specification, plan changes across multiple files, run tests, and open a pull request without continuous supervision - are now in production at companies that previously used them as side experiments. The question has shifted from "do these things work" to "what part of the software factory should they be allowed to touch". This piece is a survey of what is actually shipping, what is producing measurable leverage, and where the hype still breaks against reality.
What Agentic Coding Actually Is
Agentic coding describes software development workflows in which an AI system acts as an autonomous engineer for a bounded task. The system receives a goal, decides on a plan, executes that plan using developer tools (file edits, shell commands, test runs, git operations), and reports the result. The unit of work is no longer a line of suggested code but a self-contained change with a verifiable outcome. The human sits at the review boundary rather than the keystroke boundary.
The category overlaps with - but is distinct from - chat assistants and inline autocomplete. The defining property is autonomy over a task: the model is trusted to take a sequence of tool actions, observe the results, and decide what to do next, within a defined sandbox.
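The loop is simple enough to sketch. What follows is a minimal, illustrative harness in Python, not any vendor's actual API: propose_action is a stand-in for a real model call, the tool set is reduced to a single shell, and the step budget is the crude but essential guardrail that bounds the autonomy.

    import subprocess
    from dataclasses import dataclass

    @dataclass
    class Action:
        tool: str       # "shell" to run a command, "done" to stop
        argument: str   # the command to run, or the final report

    def propose_action(goal: str, history: list[str]) -> Action:
        # Placeholder for the model call. A real harness sends the goal
        # and the observation history to the model and parses its reply.
        return Action(tool="done", argument=f"no-op for goal: {goal}")

    def run_task(goal: str, max_steps: int = 20) -> str:
        history: list[str] = []
        for _ in range(max_steps):            # bounded autonomy
            action = propose_action(goal, history)
            if action.tool == "done":
                return action.argument        # report the result
            result = subprocess.run(          # execute inside the sandbox
                action.argument, shell=True, capture_output=True, text=True
            )
            history.append(result.stdout + result.stderr)   # observe
        return "step budget exhausted; escalating to a human"

Even at this scale the essential properties are visible: the model chooses actions, the harness executes them and feeds back observations, and the budget caps how far a bad plan can run.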
How Spec-to-PR Workflows Work in Practice
The workflow that has produced the most reliable leverage in production looks roughly the same across teams, regardless of which tooling they have chosen.
- Spec capture: A human writes or generates a tightly scoped specification. Good specs include the expected behavior, the file or module under change, the relevant tests, and the success criteria. Vague specs produce vague pull requests; this has not changed.
- Sandboxed execution: The agent runs in an isolated environment with read and write access to the codebase, a shell, and the test runner. Crucially, it cannot push to protected branches or hit production systems. Worktree-based or container-based isolation is the default.
- Plan, change, verify: The agent produces a plan, executes the plan as a sequence of edits and test runs, and iterates until the success criteria are met. The intermediate state is visible to a human reviewer if anything goes wrong. A minimal sketch of how a harness can encode the spec and drive this loop appears after this list.
- Pull request handoff: The result is a pull request with a written description, a diff, and passing tests. From here, the workflow rejoins the team's normal review process.
- AI-assisted review: Increasingly, the same models that wrote the change also produce a structured review of it, flagging risks the human reviewer should focus on. The human still decides; the model provides the first pass.
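To make the spec-capture and verify steps concrete, here is one way a harness might encode them. The Spec fields, the run_agent_iteration hook, and the use of pytest are all assumptions for illustration, not any specific tool's interface.

    import subprocess
    from dataclasses import dataclass

    @dataclass
    class Spec:
        goal: str             # expected behavior, in prose
        files: list[str]      # files or modules under change
        tests: list[str]      # the tests that define success
        max_iterations: int = 5

    def run_agent_iteration(spec: Spec, feedback: bytes) -> None:
        ...  # placeholder for one model-driven plan/edit cycle

    def plan_change_verify(spec: Spec) -> bool:
        feedback = b""
        for _ in range(spec.max_iterations):
            run_agent_iteration(spec, feedback)
            # Success criterion is mechanical: the named tests pass.
            result = subprocess.run(["pytest", *spec.tests], capture_output=True)
            if result.returncode == 0:
                return True           # ready for pull request handoff
            feedback = result.stdout  # failures feed the next attempt
        return False                  # budget exhausted; escalate to a human

The design point worth noticing is that "done" is machine-checkable: because the spec names concrete tests, the loop can decide for itself when to stop and when to escalate.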
Why Engineering Teams Are Adopting It
The benefits are not the ones the early hype emphasized. Two patterns dominate the production case studies.
The first is leverage on undifferentiated work. Migrations, refactors, dependency upgrades, test backfills, and boilerplate-heavy CRUD changes all consume senior engineering time without producing senior engineering value. Agentic coding systems are now reliable enough to take a large fraction of this work off the human queue, with a review burden meaningfully smaller than the original implementation effort would have been.
The second is faster feedback loops. When the cost of trying a refactor drops by an order of magnitude, teams try more of them. Spike branches are cheap. Throwaway prototypes are cheaper. The set of architectural options a team can practically explore in a week has expanded.
What the data does not yet support is a step-change in greenfield product development. The systems are most useful where the constraints are clearest and the success criteria are testable.
Where the Hype Still Breaks Against Reality
The places where agentic coding underperforms are predictable, and they are mostly where every previous generation of tooling also underperformed.
Ambiguous specs produce wandering agents. A model that can take twenty actions can also take twenty wrong actions. Teams that rely on the agent to "figure out what we meant" pay for it in review time.
Cross-cutting changes are still hard. Refactors that span service boundaries, touch deployment configuration, or require domain-specific business judgment are exactly the kinds of changes where the model's lack of long-term context bites.
Verification gaps matter. An agent that opens a green pull request has demonstrated only that the tests it ran passed. If the test suite has gaps - and most do - the agent has not actually demonstrated correctness, and the review burden moves elsewhere.
Cost is real. The compute bill for an autonomous agent that runs to completion is dramatically higher than for inline autocomplete. Teams that adopt agentic tooling without measuring per-task cost get unpleasant invoices.
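Measuring it requires nothing sophisticated. A back-of-envelope sketch, with placeholder prices rather than any vendor's actual rates:

    # Per-task cost from token counts. These prices are invented
    # placeholders; substitute the rates on your own bill.
    PRICE_PER_MTOK_IN = 3.00    # USD per million input tokens (assumed)
    PRICE_PER_MTOK_OUT = 15.00  # USD per million output tokens (assumed)

    def task_cost(input_tokens: int, output_tokens: int) -> float:
        return (input_tokens / 1e6) * PRICE_PER_MTOK_IN \
             + (output_tokens / 1e6) * PRICE_PER_MTOK_OUT

    # An agent that iterates can re-read large parts of the codebase on
    # every step: 40 steps at 50k input tokens each adds up quickly.
    print(f"${task_cost(40 * 50_000, 40 * 2_000):.2f} per task")  # $7.20

The multiplier to watch is the number of steps: an agent that re-reads the same files on every iteration pays the input-token price each time.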
Tools and Approaches Working in Production
A handful of tools and patterns now appear in most serious deployments:
- Claude Code, Cursor, and similar agentic IDEs: Bring spec-to-PR workflows directly into the developer's existing editor and shell. The advantage is that the agent inherits the developer's tooling, environment, and tacit context.
- Devin and similar fully autonomous platforms: Aim at the further end of the spectrum, taking issues and shipping pull requests with minimal human-in-the-loop steps. They work best where specifications are tight and tests are strong.
- AI test harnesses: Frameworks that generate property-based and fuzz tests using language models, then feed those tests back to the agent's verification loop. The combination closes a meaningful fraction of the verification gap. An example of the kind of test involved appears after this list.
- AI code review: First-pass review tooling that runs against every pull request and flags risks for the human reviewer. Several teams now treat AI code review as a required CI step rather than an optional one.
- Worktree-based isolation: A small but important pattern. Each agentic task runs in its own git worktree, which keeps the main checkout clean and makes parallel agent runs safe. A sketch of this pattern also appears below.
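On the test-harness point, the generated tests are typically ordinary property-based tests. A hand-written example of the form, using the hypothesis library against a toy slugify function defined inline so the file is self-contained; a generated suite would target real project code instead:

    import re
    from hypothesis import given, strategies as st

    def slugify(text: str) -> str:
        # Toy implementation, defined here only so the example runs.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    @given(st.text())
    def test_slug_is_url_safe(text):
        # Property: only safe characters survive, for any input at all.
        assert re.fullmatch(r"[a-z0-9-]*", slugify(text))

    @given(st.text())
    def test_slugify_is_idempotent(text):
        # Property: applying slugify twice changes nothing.
        assert slugify(slugify(text)) == slugify(text)

Properties like these catch whole classes of inputs an example-based suite never mentions, which is exactly the gap the verification loop needs closed.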
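The worktree pattern itself is a few lines of plumbing. A sketch using Python's subprocess; the branch naming and paths are illustrative:

    import os
    import subprocess

    def make_task_worktree(repo: str, task_id: str) -> str:
        # Each task gets its own checkout and branch, so parallel agents
        # never fight over the same working directory.
        path = f"/tmp/agent-tasks/{task_id}"
        os.makedirs("/tmp/agent-tasks", exist_ok=True)
        subprocess.run(
            ["git", "worktree", "add", "-b", f"agent/{task_id}", path],
            cwd=repo, check=True,
        )
        return path  # the agent runs with this as its working directory

    def remove_task_worktree(repo: str, path: str) -> None:
        # Clean up after review; --force discards uncommitted leftovers.
        subprocess.run(["git", "worktree", "remove", "--force", path],
                       cwd=repo, check=True)

Because each worktree has its own working directory and branch, a dozen concurrent agent runs cannot trample each other's uncommitted state.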
Conclusion
Agentic coding has moved from autocomplete to autonomous pull requests, but the more interesting shift is that the bottleneck of software engineering has moved with it. Specifications, tests, and review capacity are now the scarce resources. Teams that invest in those parts of the pipeline get compounding leverage from agentic tooling; teams that do not get a faster way to produce mediocre changes. The technology is real, the wins are concrete, and in some corners the hype still runs ahead of the evidence. The honest framing is that the software factory has gained a powerful new class of worker. As with any such addition, what changes most is the management of the work, not just the work itself.