The Pendulum

How the rush to AI-assisted development is quietly reviving the failure modes we spent twenty years escaping

A working white paper — Part of an ongoing series of field observations

About this document

This is a living document. It is being assembled over time from direct, first-hand observations inside a software organisation undergoing an AI-driven transformation, supplemented by the broader industry conversation. Names, identifying details, and organisation-specific particulars have been deliberately removed. The subject here is never a person and never a single company — it is a pattern of reasoning that is recurring across our industry right now. Where a specific decision or memo is described, it stands as an archetype, not an accusation.

The argument is not that AI-assisted development is bad, nor that the leaders driving it are foolish. It is that a familiar reflex — the pursuit of predictability through up-front control — is reasserting itself under new branding, and that we are at risk of un-learning hard-won lessons because the tooling feels new enough to make the old lessons seem irrelevant.

A note on this revision. This document is long by design — it is a working reference, not a finished publication, and the depth is deliberately retained so that nothing is lost while the argument and evidence are still developing. For readers who want the gist before deciding whether to read further, the Executive summary and Key claims at a glance below are the on-ramp. A future published version is likely to be shorter, or to be broken into several focused articles; this master document keeps everything.

Executive summary

The software industry spent roughly two decades learning, expensively, that you cannot fully specify a non-trivial software product before you build it: the right thing to build is discovered through feedback, not predicted through specification. Iterative methods were the correction that lesson produced. Capable AI coding agents are now prompting some organisations to swing back toward specify-everything-up-front delivery, not because anyone decided the old way was right, but because the agent supplies a fresh, persuasive justification for it: if implementation is now fast and cheap, the reasoning goes, then the work is to specify completely up front and let the agent execute.

This paper argues that the reasoning contains one inherited error — it assumes the difficulty of software lived in the implementation that the agent made cheap, when the difficulty always lived in knowing what to build, which the agent does not touch. The agent changed the cost of writing code. It did not change the cost of being wrong about what to build. A great deal follows from that single distinction.

The paper is deliberately even-handed about where the swing is reasonable. Agents genuinely excel at low-risk, low-novelty, well-trodden work, and a large share of commercial software lives there. The danger is not using agents; it is imposing one corner's playbook across an entire portfolio — including the high-risk, novel, architecturally deep work where the missing comprehension and the deferred discovery become hazards rather than annoyances. The paper also takes seriously the strongest objections to its own thesis — that rich up-front context is not the same as Waterfall, that verification practices are maturing, and that sentiment toward AI is broadly positive — and holds them open as live questions rather than dismissing them.

The central concern is not the tool. It is amnesia: the risk of discarding hard-won lessons about feedback, comprehension, and honest predictability because a genuinely powerful new tool makes those lessons feel obsolete. The paper's constructive claim is that the same agent, used inside a short feedback loop with comprehension deliberately protected, delivers everything its proponents want — quality, speed, ownership, trust — without the swing's failure modes. The disagreement with the swing is about means, not ends.

This is a living document. Its arguments (Part I) are developed; its field evidence (Part II) is just beginning to accumulate; its synthesis (Part III) is a hypothesis to be tested; and its register of competing perspectives (Part IV) is explicitly open, intended to capture every serious reading of what is happening — including those that contradict the paper's own thesis — and to let accumulating evidence, not the author's preference, decide between them over time.

Key claims at a glance

The following are the paper's load-bearing claims, stated plainly. Each is argued in full in the body; each is held as a position open to revision by evidence.

1. The bottleneck was never the typing. It was knowing what to build and understanding what was built. The agent accelerates production; it does not touch discovery or comprehension.
2. A "complete" up-front specification is a contradiction for novel work. The most detailed specification possible is the program itself; any description short of that leaves gaps the agent fills with defaults.
3. Rich context is good; freezing it behind a sign-off gate is the problem. The test is whether the artefact is treated as a revisable hypothesis or as a contract. This distinction is central and easily missed.
4. The swing is right in one corner of the map and wrong as a universal model. Risk, novelty, and architectural complexity determine where specify-and-delegate is safe and where it is hazardous.
5. Code is a liability, not an asset. Volume is a false productivity signal; the scarce resource is comprehension, and the swing optimises the wrong one.
6. Delegating implementation can delegate comprehension with it ("Knowledge Debt"), breaking the feedback loop at both ends.
7. Reorganising people reorganises systems (Conway's Law): a restructure framed as a reporting-line change is silently an architecture decision.
8. The honest form of predictability is cadence and steering, not scope-on-a-date. The swing reaches for the kind that cannot be kept.
9. The constructive path keeps the tool and the lesson: agents pointed at exploration and inside short feedback loops, with comprehension protected and rigour scaled to the stakes.

Open questions the paper does not consider settled (see Part IV): whether front-loaded verification practices are maturing fast enough to close the loop; whether business–IT trust is in fact being eroded or strengthened; and whether "more context up front" is meaningfully distinct from a return to up-front specification. These are matters for the evidence to decide.

Thesis

For roughly two decades, the software profession painfully internalised a single lesson: you cannot fully specify a non-trivial software product up front, and pretending you can produces the exact failures — late delivery, wrong features, brittle systems, broken trust — that the up-front specification was supposed to prevent. Iterative and incremental methods (Agile, Scrum, XP, Lean, Continuous Delivery) were not a fashion. They were a correction — an empirical response to the documented failure rate of big-design-up-front, sequential ("Waterfall") delivery.

The rise of capable AI coding agents is now triggering a swing back toward that older model — not because anyone has decided Waterfall was right after all, but because agents create a seductive new justification for behaviours we had learned to distrust. If an agent can implement a well-specified requirement quickly, then (the reasoning goes) the bottleneck is no longer implementation but specification — so we should specify more, earlier, and more completely. "Document everything up front so the agent can do its work" is Waterfall's central premise wearing a new coat.

This paper documents that swing as it happens: the reasoning that drives it, the organisational changes it produces, and the downstream impact on the team, the software, and — most corrosively — the trust between the business and IT that iterative methods were specifically designed to rebuild.

The pendulum is the recurring motion of our industry: every genuine advance arrives with a story about why the old constraints no longer apply, and we have to re-learn why they did.

Reader's map

The paper is structured in four movements:

Part I, the swing: what is happening, why the reasoning is so persuasive, where it's actually reasonable, and how the pressure is being sold from outside.
Part II, the evidence: field observations, organised by impact area, accumulated over time.
Part III, the synthesis: what the pattern teaches, and what a genuinely AI-native practice would look like if it kept the lessons instead of discarding them.
Part IV, the open register: competing perspectives and objections, including those that contradict this paper's thesis, held open and tested against evidence rather than adjudicated in advance.

For the impatient: the Executive summary and Key claims above stand on their own. Part I is the argument; Part IV is where the strongest disagreements live; the rest can be read by following the cross-references from whichever claim you most want to test.

Part I — The Swing

1. Introduction: the shape of a pendulum

Section brief. Establish the central metaphor and stake the claim. Open with the specific, concrete trigger — the podcast/observation that "we've walked back twenty years of learning." Define the pendulum: each technological advance comes packaged with a narrative that the old constraints are now obsolete, and we periodically have to re-discover that they weren't. State plainly what this paper is and isn't (not anti-AI; anti-amnesia). Set the tone: fair, observational, structural — critiquing reasoning, never people.

Drafted prose follows.

I heard the sentence that started this paper on a podcast, late one evening. The conversation had drifted, as so many do now, to what AI coding agents are doing to our profession — and one of the speakers said, almost in passing, that it felt like we had quietly walked back twenty years of learning. Back to documenting everything up front. Back to specify-then-build. Back, in all but name, to Waterfall.

I put the episode down and could not stop turning the thought over, because I was watching it happen. Not in the abstract, industry-trend sense — in the memos landing in my inbox, the restructures being announced, the new processes I was being asked to adopt. The shape was unmistakable once I saw it. We were being told that because an agent can now build a well-specified thing quickly, the job was to specify completely and up front, get it approved, and then go and build it. That the bottleneck had moved, and so the discipline should move with it. It was delivered with the confidence of novelty. But I had read this story before. We all had. It was the story our profession spent two decades learning to distrust.

This paper is about that motion — the swing of a pendulum.

Our industry advances in swings. Every genuine breakthrough arrives carrying a quiet, persuasive narrative: the old constraints no longer apply. Sometimes that narrative is right, and the constraint really has dissolved. More often it is half-right in a way that is more dangerous than being wrong, because it lets us discard a hard-won lesson on the grounds that the lesson belonged to a world that no longer exists. Then we re-learn the lesson, usually at cost, and call the re-learning innovation. The pendulum swings out on a story and swings back on the evidence.

The breakthrough this time is real. I want to be unambiguous about that from the first page, because everything that follows depends on it. AI coding agents are a genuine advance. They change what a single engineer can do in an afternoon. The leaders reaching for them are not foolish, and the problems they are trying to solve — software delivered late, software that misses what the business actually needed, quality that erodes under pressure, timelines no one can trust — are real problems that deserve serious answers. None of this is a complaint about AI, and none of it is a complaint about the people steering toward it. If you came looking for either, you will be disappointed.

What this paper is about is amnesia. Specifically, it is about the way a real advance is being used to justify the return of a way of working that we had good, expensive, well-documented reasons to abandon — the belief that you can know a non-trivial software product completely before you build it, write that knowledge down, freeze it, schedule it, and then execute the schedule. That belief has a name. It dominated our profession for a long time, it failed in a remarkably consistent way, and the entire iterative movement — Agile, Scrum, Lean, Continuous Delivery, whatever local dialect you grew up speaking — was the correction. Not a fashion. A correction, grounded in the documented failure rate of the thing it replaced.

The agent does not repeal that lesson. It changes the cost of writing code. It does not change the cost of being wrong about what to build — and being wrong about what to build, then discovering it slowly, behind a sign-off gate, after the requirement was frozen, was precisely the failure the old way produced. The keystrokes were never the expensive part. The discovery was. The agent has made the cheap part cheaper and left the expensive part exactly where it was, while encouraging us to behave as though the reverse were true.

So we are swinging back. Not because anyone sat down and concluded that big-design-up-front was right all along — almost no one would defend it by name — but because the agent supplies a fresh, modern-sounding reason to do the old thing. "Document everything up front so the agent can do its work" is Waterfall's founding premise wearing this season's coat. And it is meeting very little resistance, partly because much of what passed for iterative practice had already hollowed out into ceremony years ago, leaving the actual insight — feedback over prediction, steering over specification — undefended when the swing came.

I am writing this from inside the swing, in real time, as a working software engineer watching it reshape an organisation I am part of. What follows is therefore two things at once. It is an argument — that this pattern is real, that it recurs, and that it carries predictable costs to the team, to the software, and most corrosively to the trust between the business and the people who build for it. And it is a field log — a record of specific, anonymised observations gathered as they happen, each one tied not to a grievance but to the structural mechanism it reveals and the lesson it shows us un-learning. Anecdotes are easy to wave away. Anecdotes pinned to a mechanism, accumulating in one direction over time, are harder to ignore. That is the document I am trying to build.

A note on fairness, because it governs everything here. The antagonist of this paper is never a person and never a single company. It is a pattern of reasoning — the old, recurring reflex to reach for predictability through up-front control whenever delivery feels chaotic, now handed a powerful new justification. I have anonymised every specific deliberately. Where I describe a decision or a memo, treat it as an archetype, because that is what makes it useful: the pattern is not interesting because it happened to me, it is interesting because it is happening everywhere, probably including wherever you are reading this. Throughout, I will try to state the strongest version of the case I am arguing against before I argue against it. Where I fail to, that is a flaw in the writing, not a sign that the other side has no case. It usually has a good one. That is exactly what makes a pendulum so hard to catch.

And one clarification that the rest of this paper depends on, stated here at the outset because it is the single point most easily misread. This paper is not against up-front thinking, rich context, or clear requirements. It is against freezing them behind a sign-off gate and treating them as complete. Those are entirely different things, and conflating them would turn a careful argument into a foolish one. Understanding the business need before you build, gathering the best context you can, writing down what you currently believe the problem to be — these are good practices under any methodology, and the agentic era arguably makes them more valuable, not less, because better context produces better starting points. The disease was never the document. The disease is the gate: the moment the document is declared finished, signed off, and handed across a boundary to execution on the belief that what remains is mere implementation. The test that separates the healthy version from the harmful one is a single question asked of the up-front artefact — is it held as a revisable hypothesis, or as a contract? A rich specification treated as a hypothesis, fed into a fast loop of build-and-learn, is good modern practice. The identical document treated as a frozen contract is the swing this paper is about. The same artefact can be either. The posture toward it is what matters, and it is the posture, never the context, that this paper critiques. Wherever the argument below appears to attack "specification," read it as attacking the frozen, gated, treated-as-complete kind — never the act of thinking clearly before building.

Let me start, then, where the argument has to start: with an honest account of the lesson we are in the middle of forgetting, and why we learned it in the first place.

[Next: Section 2 — A short, honest history of the lesson we are un-learning]

2. A short, honest history of the lesson we are un-learning

Section brief. The credibility of the whole paper rests on getting this right and getting it fair. Sketch the genuine history: why sequential/up-front methods dominated, why they were a reasonable response to the conditions of their time (expensive compute, expensive change, manufacturing analogies), and then why they failed at scale — the documented pattern of late, over-budget, requirement-mismatched delivery. Then the correction: the Agile Manifesto and its antecedents (iterative development, Boyd, Lean, Toyota) as an empirical response, not an ideology. Key point to land: Agile's core insight was about feedback and uncertainty, not about ceremony or velocity charts. Most "Agile" failures are failures to absorb that insight while keeping the rituals. This sets up the irony: the AI swing repeats the original mistake (believing uncertainty can be specified away), and most "Agile" practice had already lost the plot, which is why the swing meets so little resistance. (Drafted below. Facts checked: Agile Manifesto — 17 practitioners, Snowbird, Utah, 11–13 Feb 2001, explicitly against documentation-driven heavyweight process. Royce 1970 — verify the direct quotes against the original paper "Managing the Development of Large Software Systems" before publishing; "risky and invites failure" is widely reproduced but confirm wording at source.)

Drafted prose follows.

There is a story the software industry tells about itself, and like most origin stories it is half true, which is the most dangerous kind of true.

The story goes like this. Once, we built software the wrong way — in a rigid, sequential march called Waterfall, where you gathered every requirement, then designed the whole system, then built it, then tested it, then shipped it, each phase complete before the next began. It failed, repeatedly and expensively. So a better way was found — iterative, incremental, feedback-driven — and in 2001 it was given a name and a manifesto, and the industry was saved. We do Agile now. We learned.

Almost every load-bearing claim in that story is more complicated than it sounds, and the complications are exactly what we are in danger of forgetting. So it is worth telling properly, because the lesson we are un-learning is not the cartoon version. It is the real one underneath.

Start with Waterfall itself, because even its origin is an irony that should make us humble. The sequential model is almost universally traced to a 1970 paper by Winston Royce, "Managing the Development of Large Software Systems." What is far less often mentioned is that Royce presented the pure sequential approach in order to criticise it. He described the simple, straight-through model and then said, in plain terms, that it was risky and invited failure — that in his experience it simply did not work on large developments. The bulk of his paper was an argument for adding iteration and feedback. Royce did not even use the word "waterfall." The diagram he offered as an example of what not to rely on became, through a game of citation telephone conducted by people who evidently had not read past the first figure, the canonical picture of how serious software ought to be built. A warning was adopted as a manual.

Pause on that, because it is the pendulum's first appearance in this paper, decades before anyone trained a language model. An idea was stripped of its context and its caveats, reduced to a reassuring diagram, and adopted at scale precisely because the reassuring version was easier to act on than the careful one. That is not a quirk of the 1970s. That is the recurring mechanism this entire paper is about. Hold the shape of it in mind.

And here is the part the cartoon leaves out: the sequential model, for all Royce's warnings, was not stupid, and it did not come from nowhere. It was a reasonable response to the conditions of its time. Computer time was breathtakingly expensive. Change was expensive — you were not refactoring at the speed of a keystroke, you were rewriting punched decks and rebuilding over hours or days. The dominant engineering analogies came from disciplines where you genuinely cannot iterate cheaply: you do not pour a bridge, see how it feels, and pour it again. In a world where the cost of getting it wrong late was catastrophic and the cost of thinking hard up front was comparatively low, front-loading the thinking was not madness. It was an attempt — a rational one — to manage real risk with the tools and economics available. This matters for fairness, and it matters for the argument, because the people reaching for up-front control today are responding to a real version of the same pressure. We will return to that.

But software is not a bridge, and the analogy that justified the sequential model is also what doomed it. The defining property of any non-trivial software product is that you do not, and cannot, fully know what it should be until you are some way into building it. Requirements are not discovered by interrogation and then frozen; they are discovered by building something and watching what happens — by users touching a real thing and revealing what they actually meant, which is reliably not what they first said. The sequential model's foundational assumption — that requirements can be made complete and stable before construction begins — is not merely hard to satisfy. It is a category error about what kind of activity software development is. It treats discovery as if it were transcription.

So the model failed, and it failed in a consistent, recognisable signature that anyone who lived through it can recite: projects ran late, ran over budget, and — most damningly — delivered, at the end of a long march, software that did not match what the business now needed, because by the time the long march finished, the need had moved, or had never been correctly understood in the first place. The defect found in testing at the end was the cheapest part; the requirement that was wrong from the start, faithfully implemented over eighteen months, was the catastrophe. And every late discovery was expensive in direct proportion to how late it was, because the whole structure was built to prevent going back.

The correction did not arrive in 2001. This is the second thing the cartoon gets wrong, and it matters. Iterative and incremental ideas had been circulating for decades: Royce himself in 1970, evolutionary and spiral approaches through the 1980s, the lessons of Lean manufacturing and the Toyota Production System about feedback and flow, and Boyd's OODA loop, from military strategy, making the same point about the survival value of fast observation-and-adjustment cycles. The insight that you steer by feedback rather than by foresight is old, recurrent, and was arrived at independently many times. What happened in 2001 was not invention. It was naming and consolidation.

In February of that year, seventeen software practitioners met at a ski lodge at Snowbird, in Utah, over three days. They came from a scattering of existing "lightweight" methods — Extreme Programming, Scrum, Crystal, Feature-Driven Development, DSDM, Adaptive Software Development, and others — and what united them was not a shared method but a shared antagonist: the documentation-driven, heavyweight, sequential process that they had all, separately, watched fail. They did not agree on much. What they could agree on was a short statement of values. They valued working software over comprehensive documentation, customer collaboration over contract negotiation, responding to change over following a plan, and individuals and interactions over processes and tools — while explicitly granting that the things on the right had value, just less than the things on the left.

Read those values again with this paper's concern in mind, because the relevance is almost uncomfortable. Working software over comprehensive documentation. Responding to change over following a plan. The entire document is, at its heart, a single argument: that you cannot specify your way to the right software, and that the attempt to do so is the disease, not the cure. The manifesto was not a productivity hack or a project-management fad. It was the industry's hard-won, empirically-paid-for answer to the precise failure the sequential model produced. It was a correction.

Which brings us to the third and most important thing the cartoon gets wrong — the thing that explains why the swing this paper describes is meeting so little resistance.

Most of what is practised under the name "Agile" today has already lost the plot.

The insight at the centre — feedback over prediction, steering over specifying, uncertainty acknowledged rather than denied — turned out to be far harder to keep than the rituals built around it. Rituals are easy to adopt and easy to measure: the stand-up, the sprint, the backlog, the velocity chart, the story-point estimate. The insight is hard, because it requires the organisation to tolerate not knowing — to commit to a direction and a cadence without committing to an exact scope on an exact date, which is the very thing anxious stakeholders most want to be promised. And so, predictably, a great many organisations kept the ceremonies and quietly discarded the insight. They run sprints that are just short waterfalls. They write the whole specification up front and then "deliver it incrementally." They demand the fixed scope, the fixed date, and the daily stand-up, and call the result Agile. The word survived; the meaning largely did not.

This is not a side-note. It is the setup for everything that follows. Because if the profession had genuinely internalised why we work iteratively — if the insight were alive and defended rather than embalmed in ceremony — then the arrival of a tool that tempts us back toward specify-everything-up-front would meet immediate, instinctive resistance. Engineers and leaders alike would recognise the smell. Instead, the swing arrives into an environment where "Agile" was already a hollow ritual for many, where the actual lesson was already half-forgotten, and where a confident new narrative about specifying completely so the agent can execute does not feel like a betrayal of a hard-won principle — because the principle was no longer being felt. You cannot defend a lesson you have already reduced to a stand-up meeting.

There is a sharper way to put this, suggested by one practitioner's reading of how the field actually divides, and it adds a causal prediction the bare observation lacks. Roughly speaking, there are two camps operating under the single word "Agile." One camp does Agile because it is what everyone does: large organisations running the full ceremony, operating in constrained environments, squads with little real autonomy, still building from specifications. In effect, this is Waterfall performed in sprints. The other camp runs true Agile: teams that understand the principles, self-organise, know what the method is actually for, and are willing to experiment. The distinction matters here because it predicts who swings. It is precisely the first camp (the one that adopted the rituals without the insight, that was already doing spec-driven delivery in two-week costumes) that will find specify-everything-up-front-and-hand-it-to-an-agent most natural, because it is barely a change from what they were already doing. The swing is not a random affliction. It lands hardest exactly where the insight was already absent, which is both why it meets so little resistance and why it tends to be invisible to those it affects: you cannot notice you are abandoning a principle you were not practising. The organisations most vulnerable to the swing are the ones that never truly left Waterfall in the first place; the agent simply gives their existing instinct a modern and respectable name.

So this is the lesson we are in the middle of un-learning, stated plainly so we can watch it happen: software is an activity of discovery under irreducible uncertainty, the right thing to build is found by feedback and not by specification, and every methodology that has denied this has produced the same failures. We learned it the expensive way once. We named it in Utah in 2001. We then spent two decades slowly forgetting what the name meant. And now a genuinely powerful new tool is arriving with a story that gives us permission to forget the rest.

The next section is about why that story is so persuasive — and why the tool, real and valuable as it is, does not actually support the conclusion it is being used to justify.

[Next: Section 3 — Why the agent makes the old mistake feel new]

3. Why the agent makes the old mistake feel new

Section brief. The intellectual heart of Part I. Walk through the seductive syllogism step by step: (1) agents make implementation fast and cheap; (2) therefore the constraint moves "left" to specification; (3) therefore invest heavily in complete up-front specification; (4) therefore a fully-specified requirement becomes a schedulable, predictable, committable unit of work. Show why each step contains a hidden, false assumption, chiefly that requirements are knowable in advance and that specification is the hard part rather than the discovery. Introduce the key reframing: the agent has changed the cost of typing code, not the cost of being wrong about what to build. The expensive thing was never the keystrokes. Connect to the "Knowledge Debt" concept here as a forward reference: when comprehension is outsourced along with implementation, the feedback loop that iterative methods depend on is severed at both ends.

Drafted prose follows.

If the lesson of the previous section is so well established — if the failure of specify-everything-up-front is one of the best-documented patterns in our field — then we should ask an honest question. Why are intelligent, experienced people walking back toward it? Not stupid people. Not lazy people. The leaders driving this are, in my experience, thoughtful and genuinely trying to solve real problems. So the explanation cannot be that they have forgotten the lesson out of carelessness. Something is making the old mistake feel like a new and reasonable idea. This section is about what that something is.

The engine of the swing is a syllogism. It is rarely stated this baldly, but it is the chain of reasoning underneath nearly every "AI-ready" restructure, and it is worth setting out in full because it is genuinely persuasive — each step follows so naturally from the last that the conclusion feels not just defensible but obvious. The argument runs:

1. AI agents make the implementation of software dramatically faster and cheaper.
2. Therefore the bottleneck in delivery is no longer implementation. It has moved "to the left": to specification, to deciding precisely what to build.
3. Therefore the rational investment is in specifying more completely and earlier, producing a thorough, unambiguous description of the desired software up front so the now-cheap implementation step can execute against it.
4. Therefore a fully-specified requirement becomes a discrete, estimable, schedulable, committable unit of work, and a pipeline of such units gives the business the reliable, predictable delivery it has always wanted.

That is the whole case. And notice what makes it so seductive: every individual step is partly true, and the conclusion lands on exactly the thing the business most wants — predictability. It does not feel like a regression. It feels like the long-promised maturation of software into a properly engineered, plannable discipline, finally made possible by a tool powerful enough to deliver it. The swing does not arrive announcing itself as Waterfall. It arrives wearing the language of progress and rigour and professionalism.

So we have to take it apart carefully, step by step, because the flaw is not in any one step being wholly wrong. The flaw is in a single false assumption that is quietly inherited from step to step and never examined.

Step one is true, and we should say so plainly. Agents do make implementation faster and cheaper. This is not in dispute and this paper does not dispute it. The generation of code — the turning of a clear, bounded intention into working syntax — is genuinely, sometimes startlingly, accelerated. Anyone arguing otherwise is not paying attention. The honesty of conceding this fully is what gives us the standing to challenge what comes next.

Step two is where the false assumption enters, disguised as arithmetic. "Implementation is no longer the bottleneck, therefore the bottleneck has moved to specification." This sounds like simple displacement — squeeze one part of the pipe and the pressure moves elsewhere. But it smuggles in a premise: that the difficulty of software was located in implementation in the first place, such that relieving it relocates the constraint to the adjacent phase. And that premise is precisely, demonstrably false — it is the same category error that doomed the sequential model fifty years ago. The hard part of software was never the typing. It was never the act of writing the code. The hard part — the genuinely, irreducibly hard part — was knowing what to build, and that knowledge does not exist in advance to be specified. It is discovered, incrementally, by building something and confronting reality with it. Step two assumes the bottleneck moved from implementation to specification. The truth is that the real bottleneck was never in the pipe the syllogism is looking at. It was, and remains, in the loop between building and learning — a loop the syllogism does not even represent.

This is the spine of the entire paper, so let me state it as flatly as I can: the agent has changed the cost of writing code. It has not changed the cost of being wrong about what to build. Those are different quantities, and they were always different quantities. We conflated them only because, historically, they happened to travel together — writing code and discovering whether the code was the right code were tangled into the same slow activity, so making that activity faster felt like it would speed up everything. The agent severs them. It makes the writing nearly free while leaving the discovery exactly as expensive as it ever was. And a syllogism that treats the now-cheap thing as though it were the formerly-expensive thing will reach a confident, well-reasoned, wrong conclusion.
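The distinction between the two quantities can be made visible with a deliberately crude piece of arithmetic. What follows is a toy model, not a measurement: the class name, the field names, and every number in it are assumptions invented for illustration. It only shows what happens to the total when the cost of writing falls and the cost of finding out whether you built the right thing stays where it was.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delivery:
    """Toy cost model; every number fed into it is an illustrative assumption."""
    build_cost: float      # cost of writing the code for one attempt
    discovery_cost: float  # cost of learning whether that attempt was right
    attempts: int          # attempts needed before the right thing is built

    def total(self) -> float:
        # Each attempt pays for both the writing and the finding-out.
        return self.attempts * (self.build_cost + self.discovery_cost)


# Hypothetical units of effort, chosen only to make the arithmetic visible.
before_agent = Delivery(build_cost=10.0, discovery_cost=5.0, attempts=3)
with_agent = Delivery(build_cost=1.0, discovery_cost=5.0, attempts=3)

# The agent shrinks the build term only; the discovery term is untouched.
saved = before_agent.total() - with_agent.total()
discovery_share = (with_agent.attempts * with_agent.discovery_cost) / with_agent.total()

print(f"saved: {saved}, discovery share of what remains: {discovery_share:.0%}")
```

Under these assumed numbers the saving is real, and what remains is almost entirely the cost of discovery. Nothing in the model reduces the number of attempts; that is governed by how quickly the team learns, which is the quantity the syllogism never represents.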

Step three inherits the error and acts on it. "Therefore invest in complete up-front specification." If you have accepted step two — if you believe the constraint is now specification — then pouring effort into more complete, earlier, more rigorous specs is the logical response. But you are now optimising the wrong activity. You are investing in the completeness of a description of something you do not yet adequately understand, on the assumption that understanding is a matter of description. It is not. A more detailed specification of a misunderstood requirement is not closer to the truth; it is a more elaborate, more confident, more expensively-produced version of the same misunderstanding — and harder to abandon precisely because so much was invested in it. (Section 4 examines what this specification actually is once you write it down: an "AI-ready prompt," which is to say a spec, which is to say the thing we are pretending not to have reintroduced.)

Step four is where the error becomes organisationally load-bearing. "Therefore a fully-specified requirement is a committable unit of work, and a pipeline of them yields predictable delivery." This is the step the business cares about — it is the point of the whole exercise — and it depends entirely on the completeness of the specification being real. A "fully-specified requirement" can only be a reliable, estimable, committable unit if the specification is actually complete and actually stable. But we have just seen that it is neither, and cannot be, for any non-trivial product. So the schedule built on top of it is a schedule built on a fiction — a confident, detailed, signed-off fiction, which is the most dangerous kind, because everyone has agreed to treat it as fact. The predictability promised in step four is not predictability. It is the appearance of predictability, purchased by pretending the irreducible uncertainty was specified away. And when reality arrives — when the discovery that was deferred shows up anyway, as it always does, now much later and much more expensively — the schedule breaks, and the trust built on the schedule breaks with it. (Section 13 follows that break to its destination.)

There is a further turn, and it is the one that makes this swing genuinely worse than the original, not merely a repeat of it. The sequential model of the 1970s at least left the human developer in possession of understanding. A developer working through a flawed up-front spec was still building the thing themselves, reading every line, accumulating comprehension of the system even as they implemented the wrong requirements. The understanding was real even when the requirements were wrong. The agentic version severs that too. When implementation is delegated to an agent, comprehension is delegated along with it, quietly, as a side effect nobody chose. The code arrives without the understanding that writing it would have produced. This is the concept I have elsewhere called Knowledge Debt: the accumulating gap between the code a system contains and the comprehension the team actually holds of it. I will treat it at length in Part II, because it is one of the central observable harms of the swing. But it belongs here too, in the analysis of the syllogism, because it explains why the swing breaks the iterative loop at both ends rather than one. Iterative development depends on a feedback loop: build a little, learn from it, adjust. The swing damages the front of that loop by deferring discovery behind a specification gate, and it damages the back of the loop by delivering code the team does not understand well enough to learn from. You cannot adjust intelligently on the basis of a result you do not comprehend. A feedback loop with comprehension removed from the return path is not a feedback loop. It is an open loop wearing the costume of one.

This is why the old mistake feels new. The syllogism is clean, each step is locally plausible, and the conclusion delivers precisely what leadership has always been asked to provide. The tool at the centre is real and its acceleration is real. Nothing about the experience feels like regression. And yet the whole structure rests on a single inherited falsehood — that the difficulty of software lived in the part the agent made cheap — and a single unexamined consequence — that delegating the writing also delegates the understanding. Strip those two things out and the case for swinging back collapses. Leave them in, unexamined, and the swing feels not only reasonable but inevitable.

The next section takes the syllogism out of the abstract and into the specific form it actually takes inside an organisation: the instruction to turn every requirement into an approved, signed-off "AI-ready prompt" before any work begins — and why that artefact is a specification document in everything but name.

[Next: Section 4 — The "AI-ready prompt" as a specification document in disguise]

4. The "AI-ready prompt" as a specification document in disguise

Section brief. Concrete, specific, and where the field evidence starts to bite. Examine the now-common organisational instruction to translate stakeholder discussions into a complete, approved "AI-ready prompt" before work begins, with the requirement and estimate signed off, after which the engineer "goes off and does the work." Name it for what it is structurally: a big-design-up-front handoff. The prompt is the spec; the sign-off gate is the phase gate; "go off and do the work" is the construction phase. Note the tell-tale reintroduced assumptions: that the requirement is complete and stable at sign-off, that estimation is reliable once the prompt exists, that discovery happens before rather than during. Be fair: acknowledge what's genuinely good about forcing clarity of intent. The critique is the gate, not the clarity.

Corroboration (Zechner) — the strongest external support in the paper. An agent author argues directly that spec-driven development is a return to waterfall that the industry spent decades learning to avoid, with the only change being a faster iteration cycle — the underlying flaw (incomplete spec → the agent fills the gaps with mediocre internet-average patterns) is unchanged. His decisive formulation: the most detailed spec it is possible to write is the program itself; any natural-language spec below that level leaves blanks filled by mediocre defaults. That single idea is the technical core of this section — a "complete" prompt is a contradiction, because completeness is the code. Use sparingly and in the author's own words; do not let one agreeing expert turn the section into an appeal to authority. Note too that Zechner collapses "vibe coding" (casual dictation) and "enterprise vibe coding" (detailed spec first) into the same act — both hand the gaps to the agent — which punctures the apparent sophistication of a formal "AI-ready prompt" process: dressing the spec up does not remove the blanks.

Drafted prose follows.

The previous section traced the swing's logic in the abstract. This section watches it touch ground, because an organisation does not adopt a syllogism. It adopts a process — and the process is where the abstract error becomes a concrete instruction that lands in an engineer's inbox.

The instruction, in the cases I am observing, takes a recognisable form. An engineer is expected to participate in stakeholder discussions about a desired feature, then to write that feature up into what is called an "AI-ready prompt" — a thorough, structured description complete enough for an agent to implement against. The requirement and an accompanying time estimate are then approved by the appropriate stakeholders and technical people. Once approved, the engineer "goes off and gets the work done." The language is new. The shape is not.

Let me name the shape directly, because naming it is most of the work of this section. The prompt is the specification. The approval is the phase gate. "Go off and get the work done" is the construction phase. What has been described, in the vocabulary of agents and prompts, is a sequential, document-driven, big-design-up-front process with a sign-off boundary between the deciding and the building. It is the precise structure the previous fifty years taught us to distrust, reconstituted out of new nouns. If you took this process to the seventeen people at Snowbird and stripped the word "AI" from it, they would recognise it instantly as the thing they met to argue against.

It is worth being scrupulously fair about what is good here, because the critique is precise and I do not want it heard as broader than it is. Forcing clarity of intent is genuinely valuable. Making an engineer articulate what a feature is for, who needs it, and what "done" would mean is good practice under any methodology — it is, in fact, exactly the kind of thinking iterative methods also demand, just continuously rather than once. The problem is not that engineers are asked to think clearly before building. The problem is the gate: the moment at which the thinking is declared complete, frozen, signed off, and handed across a boundary to execution, on the assumption that what remains is mere implementation. The clarity is good. The gate is the reintroduced disease. A reader who comes away thinking this paper opposes writing things down has misread it; this paper opposes treating the written-down thing as finished knowledge.

And here the strongest external observation available to this paper does real work — not as authority, but because a practitioner who builds coding agents for a living has stated the technical core of the matter more cleanly than I can. The argument is that spec-driven development is, structurally, a return to waterfall, and that the only thing the agent has changed is the speed of the iteration cycle; the foundational flaw — that an incomplete specification leaves gaps which the agent fills with whatever patterns it absorbed from the open internet, which are on average mediocre — is entirely unchanged. The decisive formulation is this: the most detailed specification it is possible to write is the program itself. Any natural-language description short of the actual code is, by definition, incomplete — and every blank it leaves is a decision the agent will make for you, silently, drawing on the average of everything it has seen. A "complete prompt" is therefore very nearly a contradiction in terms. Completeness is the code. If your prompt were genuinely complete, it would be the program; since it is not the program, it is not complete, and the difference is filled in by something other than your judgment.

This is the quiet detonation under the whole "AI-ready prompt" edifice. The process presents the prompt as a sufficient, approved, complete artefact against which implementation is a deterministic act. But the prompt is necessarily incomplete, and the gap between the prompt and the program is not empty — it is filled by the agent's defaults. So the sign-off ceremony approves something that does not contain what it claims to contain. The stakeholders believe they have approved "the feature." What they have actually approved is a partial description of the feature plus an unbounded, unexamined set of decisions that will be made later, by a tool, on the basis of internet-average patterns, with no one watching the moment they are made. The estimate attached to that approval inherits the same fiction: you cannot reliably estimate the time to build a thing whose specification is, by nature, a sketch whose blanks will be filled by surprises.
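A small sketch makes the gap concrete. Everything in it is hypothetical: the list of decisions, the contents of the prompt, and the `AGENT_DEFAULT` marker are invented for illustration, and the sketch cheats in one important way by listing up front the decisions a feature needs, which in practice are only known in hindsight. With that caveat, it shows what a sign-off actually covers.

```python
# Hypothetical decisions a small feature turns out to need (known only in hindsight).
DECISIONS_NEEDED = [
    "input_validation", "error_handling", "pagination",
    "timezone_handling", "retry_policy", "audit_logging",
]

# What the signed-off prompt actually states (an assumed example).
prompt_spec = {
    "input_validation": "reject empty names",
    "pagination": "20 items per page",
}

# Stand-in for whatever pattern the agent falls back on when the prompt is silent.
AGENT_DEFAULT = "agent default"


def resolve(spec: dict[str, str], needed: list[str]) -> dict[str, str]:
    """Return every decision present in the built program, marking who made it."""
    return {name: spec.get(name, AGENT_DEFAULT) for name in needed}


built = resolve(prompt_spec, DECISIONS_NEEDED)

# The blanks: decisions that exist in the program but were never in the prompt.
blanks = [name for name, choice in built.items() if choice == AGENT_DEFAULT]

print(f"{len(blanks)} of {len(built)} decisions were not made by the people who signed off")
```

In this invented case the approval covers two decisions and the program contains six. Adding entries to `prompt_spec` shrinks the list of blanks only for the decisions someone thought to write down, and the list reaches zero only when the prompt states every decision the program embodies, at which point it has become the program.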

There is a tempting defence of the "AI-ready prompt" that says it is more sophisticated than mere vibe coding — that writing a detailed, structured spec first is a disciplined, enterprise-grade practice quite different from casually dictating an app and hoping. It is worth dismantling this defence directly, because it is the form the process most often takes when it wants to sound rigorous. The uncomfortable observation from practice is that casual dictation and detailed-spec-first are the same act differing only in length. Both hand the agent a description that falls short of the program, and both therefore hand the agent the same job: fill the blanks with defaults. A longer, more structured, more formally-approved prompt has more words, but it does not have fewer blanks in the places that matter — because the places that matter are exactly the ones you did not know to specify, which is what made them matter. Dressing the spec up in process and sign-off does not remove the gap. It conceals it, and adds a ceremony that makes everyone more confident in a document that has not earned the confidence. The sophistication is real as theatre and absent as engineering.

None of this means the prompt is worthless or that engineers should not write clear intentions. It means the prompt must be understood for what it actually is: a starting hypothesis, useful precisely because building against it will reveal where it was wrong — not a contract, not a frozen truth, and emphatically not a basis for a committed delivery schedule. The difference between those two readings of the same document is the entire difference between iterative practice and the swing. A prompt held as a revisable hypothesis, fed into a short loop of build-and-learn, is a fine and modern way to work. The identical prompt, signed off at a gate and treated as complete, is Waterfall with better tooling. The artefact is the same. The posture toward it is everything, and the swing adopts the wrong posture while believing it has adopted a new one.

The following section asks why an organisation would want the wrong posture so badly — why the gate, the sign-off, and the schedule are so attractive that intelligent people will reconstruct a discredited process to obtain them. The answer is older than AI, and it is not really about AI at all.

[Next: Section 5 — Predictability as the real driver]

5. Predictability as the real driver

Section brief. Expose the actual motivation underneath the AI narrative, because it's older than AI. The recurring justification for these restructures is that the business wants a reliable, plannable, committable delivery schedule, and the current model can't provide one. This is true and sympathetic — and it is precisely the demand that produced heavyweight up-front process in the first place. The deep pattern: leadership reaches for predictability-through-control whenever delivery feels chaotic, and AI now supplies a fresh rationale for the same reach. The irony to land: iterative methods deliver predictability too, but a different kind — predictable cadence and steering, not predictable scope-on-a-date. The swing trades the achievable predictability for the seductive-but-false kind.

Corroboration (Zechner): a practitioner who built his own coding agent states the spine of this paper almost verbatim — writing lines of code was never the bottleneck; thinking about and designing the solution was. The agent makes exploration faster, not commitment safer. (Booch's framing of the agent as "a junior, overly enthusiastic programmer who doesn't know what they don't know" is the same observation from the architecture side.)

Drafted prose follows.

We have established that the "AI-ready prompt" process reconstructs a discredited shape, and that intelligent people are doing the reconstructing. The natural question is why — what do they want so badly that they will rebuild something they would, if asked directly, agree was a mistake? The answer is the quiet protagonist of this whole paper, and it predates AI by decades. They want predictability. More precisely: they want to be able to promise the business a reliable, plannable, committed delivery schedule, and to keep that promise.

This deserves genuine sympathy, not scorn, and the paper is weaker if it withholds it. Consider the position of the people one rung up from engineering — the product and delivery leadership who sit at the boundary between the business and the people who build. The business asks them a reasonable question: when will it be done, and can we count on that? They are held accountable for the answer. They are blamed when the date slips or the quality disappoints. And underneath them sits an activity that, conducted honestly, refuses to give them the clean answer they are being asked for — because, as we have seen, the honest answer to "when will this non-trivial thing be done" is "we will know more as we build it." That is a true answer and an almost unsayable one in a budget meeting. The pressure to convert it into a confident date is enormous, constant, and entirely understandable. Anyone who has not felt that pressure has not stood in that position.

So the desire for predictability is not a character flaw. It is a structural feature of the relationship between a business and its software function, and it has been generating the same response for half a century. Whenever delivery feels chaotic, leadership reaches for predictability through control — through more up-front planning, more complete specification, more rigorous sign-off, tighter sequencing. The reach is instinctive and it recurs because the underlying anxiety recurs. The sequential model of the 1970s was one expression of it. The heavyweight, documentation-driven methodologies of the 1980s and 1990s were another. The "AI-ready" restructure is simply the latest, and what is genuinely new about it is not the impulse but the permission: the agent supplies a fresh, modern, credible-sounding rationale for an old reach that had become embarrassing to make in plain Agile-era language. You could not, in 2018, stand up and propose returning to big design up front without being laughed out of the room. In 2026, you can propose the identical thing as "AI-native delivery" and be applauded for vision. The impulse found a respectable new costume.

Here is the irony at the centre of the section, and it is the one worth landing hard. Iterative methods are not the enemies of predictability. They deliver predictability too — they were, in part, designed to. But they deliver a different kind of predictability, and the difference is the whole game. The swing offers predictable scope on a date: this exact set of features, finished by this exact day. Iterative practice offers predictable cadence and direction: a reliable rhythm of working increments, a dependable ability to steer, and high confidence that at any given moment you are building something close to what is actually needed and can correct quickly when you are not. One promises to tell you precisely what you will get and when. The other promises to keep you continuously close to the right thing and to never let you travel far in the wrong direction before noticing.

The first kind of predictability is the one the business instinctively asks for, and it is the one that cannot honestly be delivered for novel work, because it requires knowing the unknowable in advance. The second kind is less emotionally satisfying — it does not produce the clean Gantt chart that makes a steering committee feel safe — but it is real, it is deliverable, and it is, on any honest accounting, more valuable, because predictable cadence plus reliable steering is what actually gets the right software built. The tragedy of the swing is that it trades the achievable, real predictability for the seductive, false one. It abandons the kind of certainty you can actually keep in exchange for the kind that photographs well in a slide deck and shatters on contact with the first deferred discovery.

A practitioner who built his own coding agent put the engineering half of this more plainly than any methodology debate manages to: writing lines of code was never the bottleneck — thinking about and designing the solution was. That sentence is this entire paper compressed. If the bottleneck was always the thinking and the deciding, then a tool that accelerates the typing does not make commitment safer; it only makes exploration faster. And exploration speed is genuinely worth having — it is, as a later section argues, the real and substantial gift of the agent. But exploration speed is the opposite of what the predictability reach wants. The reach wants to stop exploring and start committing. The agent is excellent at the former and changes nothing about the wisdom of the latter. From the architecture side, the same truth shows up in the now-common observation that the agent behaves like a junior, eager programmer who wants to do well but does not know what it does not know — exactly the kind of contributor you would never hand a frozen spec and a deadline and send off unsupervised, because the things it does not know it does not know are precisely the things the gate assumes have been settled.

The deepest cost of the predictability reach is therefore self-inflicted and invisible until it lands. Leadership reaches for control to protect the trust between the business and IT — to finally be able to make and keep a promise. But the promise it reaches for is the one kind it cannot keep, and reaching for it actively forecloses the kind it could. The chase for false predictability does not just fail to build trust; it spends the trust that honest iterative delivery would have accumulated. Part II returns to this as the most corrosive of the swing's observable harms. For now the point is only this: the engine of the swing is not really AI at all. It is an old, sympathetic, recurring hunger for a certainty that software does not offer — and the agent's true role is to have made that hunger respectable to indulge again.

The next three sections complete the analysis of Part I by adding the dimensions the syllogism ignores: where on the landscape of software work the swing is actually reasonable versus dangerous; how the pressure is being manufactured and sold from outside the organisation; and the economic inversion — code as liability rather than asset — that the whole "more code, faster" narrative gets backwards.

[Next: Section 6 — The risk / novelty / complexity space]

6. The risk / novelty / complexity space: where the swing is reasonable and where it is dangerous

Section brief — NEW, derived from Booch. This is the fairness engine of the whole paper, and it should come before the evidence so the reader trusts the author's judgment. Borrow Booch's three-axis model and make it the paper's analytical backbone: software work varies along risk (from a throwaway script to "people die if this fails"), novelty (well-trodden ground vs. a genuinely new problem), and complexity (deployment-scale vs. deep architectural complexity). The crucial, disarming point: agents genuinely excel in the low-risk, low-novelty, deployment-complexity corner — and a large fraction of commercial work lives there. The swing toward specify-and-delegate is not wrong everywhere; it is wrong as a universal operating model imposed across the whole space. Land the insight that most disagreements about "AI's impact on software engineering" are really people standing at different coordinates in this space and generalising from their own location. The restructure's error is not adopting agents — it is applying a single corner's playbook to the entire product portfolio, including its high-risk, high-novelty, architecturally-deep parts (legacy platform risk, cross-system coordination, end-of-life migrations). This section earns the author the right to critique, because it concedes exactly where leadership is correct.

Drafted prose follows.

A criticism of the swing that stopped here would be vulnerable to a fair rebuttal: "but I have seen agents work brilliantly." And the rebuttal would be correct. Agents do work brilliantly, sometimes. The mistake is not in observing that; it is in generalising from one's own vantage point to the whole of software. To argue well against the swing, this paper needs a map of the territory it is arguing about, so that the claim becomes precise: not "the swing is wrong," but "the swing is wrong where it is wrong," and an organisation imposing it everywhere is failing to distinguish where that is. This section builds the map, and in doing so it concedes, deliberately and up front, exactly where leadership is right. That concession is not a weakness in the argument. It is what gives the argument its standing.

The most useful framing I have encountered comes from software architecture rather than from the AI debate, and it places any piece of software work along three independent axes.

The first is risk: the cost of being wrong. At one end sits a throwaway script, a marketing landing page, an internal tool three people use — work where a defect is an annoyance and a rewrite is cheap. At the other end sits software where failure is catastrophic and possibly irreversible: systems that move money at scale, that control physical machinery, that hold safety or life in their correct functioning, that cannot be quietly rolled back once they have acted on the world. Risk is not about how hard the software is to build. It is about how much it costs when it is wrong.

The second is novelty: how well-trodden the ground is. Some work has been done ten million times before — the standard CRUD application, the familiar integration, the conventional web form — and the patterns are everywhere, settled, and abundant in exactly the training data an agent has absorbed. Other work is genuinely new: a problem your organisation, or the field, has not solved before, where there is no average-of-the-internet answer to draw on because the answer does not yet exist.

The third is complexity, and it splits into two quite different things that are often confused. There is deployment complexity — many moving parts, much surface area, large scale, but each part individually well-understood — and there is architectural complexity, the deep kind, where the difficulty is in the relationships, the coordination, the emergent behaviour, the significant decisions whose cost of getting wrong compounds through the entire system. Wiring together fifty familiar services is complicated. Designing the coordination logic for a fleet of autonomous agents acting in the physical world is complex in the deeper sense. They are not the same, and they do not respond to the same tools.

Now place the agent on this map honestly, because honesty here is the whole point. The agent is genuinely excellent in one corner of the space: low risk, low novelty, deployment-style complexity. The familiar feature, built from abundant well-known patterns, where a mistake is cheap to catch and cheap to fix — this is where the agent shines, and it shines brightly. And here is the concession that matters: a very large fraction of commercial software work lives in exactly that corner. The unglamorous truth of most organisations is that most of what they build is not novel, not high-risk, not architecturally deep. It is the ten-millionth instance of a known pattern. In that corner, specify-and-delegate is not foolish. It is often the right call. Leadership's enthusiasm is not hallucinated; it is generalised from real, repeated, genuine success in the corner where most of the day-to-day work happens to sit.

The error — and it is a precise, nameable error rather than a general foolishness — is in treating a corner's playbook as a universal operating model. The further any piece of work moves from that benign corner, the worse the swing's prescription becomes, and it degrades along each axis independently. As risk rises, the cost of the agent's silent, internet-average gap-filling stops being an annoyance and becomes a hazard, and the missing comprehension (the Knowledge Debt of Section 3) becomes the difference between a caught defect and a catastrophe nobody understood well enough to see coming. As novelty rises, the agent's central strength inverts into its central weakness: there is no abundant pattern to draw on, so the gaps it fills are filled with confident guesses about a problem that has no established answer — the eager junior improvising in a domain where improvisation is exactly what you cannot afford. As architectural complexity rises, the per-prompt, per-feature delivery model becomes structurally blind to the significant decisions, because no single prompt owns the relationships between the parts, and significance — measured precisely by the cost of changing a decision later — is exactly the thing a feature-at-a-time pipeline cannot see.

This map also explains, as a bonus, why the public argument about AI and software is so rancorous and so unresolvable. Most disagreements are not really disagreements about the agent at all. They are two people standing at different coordinates in this space, each generalising honestly from their own location. The engineer building standard web features and the engineer maintaining a safety-critical legacy platform are both telling the truth about their experience, and their experiences point in opposite directions, because they are describing different regions of the map as though they were describing the same thing. The agent did not change; the coordinates did. Recognising this dissolves a great deal of pointless argument and replaces it with the only question that matters: where does this particular piece of work actually sit?

Which brings the map to bear on the specific situation this paper observes. The error of a product portfolio reorganised wholesale around specify-and-delegate is not that it adopts agents. It is that it applies the benign corner's playbook to the entire portfolio — including the parts that sit nowhere near that corner. A portfolio is not homogeneous. It contains the standard features where the swing is fine, and it also contains the high-risk, high-novelty, architecturally-deep work: the ageing platform whose end-of-life carries existential risk, the cross-system coordination whose significant decisions compound, the migration whose cost of being wrong is measured in years. A single operating model imposed across all of it will be roughly right for the benign majority and dangerously wrong for the critical minority — and the critical minority is, almost by definition, where the organisation can least afford to be dangerously wrong. The damage does not announce itself, because the model keeps working in the corner where most work lives, right up until the day it fails expensively in the corner where the stakes were highest.

The constructive implication — developed fully in Section 16 — is not "abandon agents" but "locate the work before choosing the method." A mature practice would assess where a piece of work sits on these three axes and apply specify-and-delegate where it fits and disciplined, comprehension-preserving, iterative engineering where it does not. The failure is not the tool. It is the refusal to distinguish, the imposition of one corner's truth as the whole truth. That refusal is what turns a genuine advance into an organisational hazard.
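
As a rough illustration of what "locate the work before choosing the method" could look like in practice, the Python sketch below scores a piece of work on the three axes and picks a method. It is a minimal sketch, not a proposed tool: the `WorkItem` type, the 1-to-5 scales, the `BENIGN_MAX` threshold, and the example portfolio are all assumptions introduced here for illustration, and a real assessment remains a judgment call that no scoring function replaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    """A piece of work placed on the three axes (hypothetical 1-5 scales)."""
    name: str
    risk: int        # 1 = cheap to get wrong, 5 = catastrophic to get wrong
    novelty: int     # 1 = well-known pattern, 5 = no established answer
    complexity: int  # 1 = local change, 5 = cross-system architectural decision


# Hypothetical cut-off: every axis must be at or below this to count as the benign corner.
BENIGN_MAX = 2


def choose_method(item: WorkItem) -> str:
    """Suggest a delivery method from where the item sits on the map."""
    axes = (item.risk, item.novelty, item.complexity)
    if any(not 1 <= score <= 5 for score in axes):
        raise ValueError("each axis score must be between 1 and 5")
    # The prescription degrades along each axis independently,
    # so a single high axis is enough to leave the benign corner.
    if max(axes) <= BENIGN_MAX:
        return "specify-and-delegate"
    return "iterative, comprehension-preserving"


# Illustrative portfolio: a benign majority item and two critical-minority items.
portfolio = [
    WorkItem("standard web form", risk=1, novelty=1, complexity=1),
    WorkItem("end-of-life platform migration", risk=5, novelty=3, complexity=5),
    WorkItem("familiar feature on a safety-critical path", risk=5, novelty=1, complexity=2),
]
plan = {item.name: choose_method(item) for item in portfolio}
```

The design choice worth noticing is the use of `max` rather than an average: averaging would let a familiar, low-novelty shape mask a high-risk location, which is exactly the generalisation from one corner that this section describes.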

The next section turns from the internal logic of the swing to an external force that helps explain its momentum: the fact that the predictability-and-speed narrative is being actively manufactured and sold, and sold to precisely the person with the authority to impose it.

[Next: Section 7 — The pressure is being sold upward]

7. The pressure is being sold upward: the vendor and economic layer

Section brief — NEW, derived from Zechner. The swing is not purely an internal misjudgment; it is being actively manufactured by market forces aimed at exactly the CIO archetype. Document the industry-pressure mechanics as structural context: (a) token prices for frontier models have not fallen as widely promised, since some releases raised prices or changed tokenisation so the same text bills as more tokens; (b) subscription pricing is subsidised while pay-as-you-go API pricing carries real margin, which distorts the perceived economics of "just use more agents"; (c) most importantly, vendor go-to-market focus has shifted (per the source, from early 2026) away from developers and toward CIOs and enterprise contracts. That last point is the load-bearing one for this paper: the predictability-and-speed narrative that justifies the restructure is being marketed directly to the decision-maker who orders the restructure, over the heads of the engineers who can see its flaws. This reframes the CIO archetype sympathetically (they are the target of a sophisticated, well-funded narrative) while explaining why the swing meets so little resistance at the top. Keep this section tight and sourced carefully; it's the most "claim-like" material and the easiest to overreach on. Flag the cost-by-seniority budgeting trend and the pull toward smaller/local models as the likely economic correction.

Author's note — verify before publishing. The specific market claims in this section (token-price movements, tokeniser changes, the timing of the vendor go-to-market shift) come from a single practitioner source and are the most time-sensitive, most falsifiable material in the paper. Confirm each against current primary evidence before this section goes out under your name; pricing and vendor strategy move quickly and a wrong specific here would be seized on to discredit the whole. The structural argument stands even if individual figures need updating — but update them.

Drafted prose follows.

Everything so far has treated the swing as an internal event: a logic adopted, a process imposed, a hunger indulged. But the swing has a tailwind that originates outside the organisation entirely, and leaving it out would make the leadership at the centre of this paper look more culpable than it is. The predictability-and-speed narrative that justifies the restructure is not only believed internally. It is being actively manufactured and sold — and sold, with considerable precision, to exactly the person who has the authority to impose it.

Three mechanics are worth setting out, with the caution that this is the most time-sensitive material in the paper and the specific figures should be re-verified whenever it is read, because this part of the landscape moves monthly.

First, the promised collapse in the cost of using these models has not straightforwardly arrived. The widely-repeated expectation was that token prices would fall steadily toward triviality, making "just use more agents" an economic free lunch. In practice the picture is muddier: some frontier releases have raised prices rather than lowered them, and changes to how text is tokenised can mean the same input quietly bills as more units than it used to. The headline story of inexorably cheapening intelligence is, at the level of an actual invoice, not as clean as the narrative implies.
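
The tokenisation point is easy to check with arithmetic. The Python sketch below uses entirely hypothetical figures (the monthly volume, the price per million tokens, and the 25% token inflation are assumptions chosen for round numbers, not vendor data) to show how an unchanged headline price can still produce a larger invoice.

```python
def invoice(tokens: int, price_per_million: float) -> float:
    """Cost of a workload billed per million tokens."""
    return tokens / 1_000_000 * price_per_million


# All figures below are hypothetical, for illustration only.
old_tokens = 1_000_000_000   # the monthly workload under the old tokeniser
token_inflation = 1.25       # the same text now splits into 25% more tokens
price_per_million = 10.0     # headline price, unchanged between releases

new_tokens = round(old_tokens * token_inflation)

old_cost = invoice(old_tokens, price_per_million)   # 10000.0
new_cost = invoice(new_tokens, price_per_million)   # 12500.0

# The price list did not move, but the bill for identical text did.
effective_increase = new_cost / old_cost - 1        # 0.25
```

The price list and the invoice are different measurements: a comparison that looks only at the per-token rate would report no change here, while the bill for identical input has risen by a quarter.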

Second, the prices that developers most often encounter are not the prices that reveal the true economics. Flat-rate subscriptions, which are how most individual developers experience these tools, are subsidised: priced to drive adoption and gather usage, not to reflect cost. The pay-as-you-go pricing that an enterprise actually pays at scale carries real margin, and looks materially less attractive when set beside the cost of running comparable open-weight models directly. The cheapness that the individual developer feels, and reasonably reports upward as real, is partly an artefact of the subsidy. Decisions made for a whole organisation get priced at the unsubsidised rate, and the gap between the two is not always visible to the person making the case.

Third, and most important for this paper, the marketing has changed audience. Through the period when these tools were establishing themselves, the vendors courted developers directly — developer-focused messaging, developer tooling, developer enthusiasm. That served a purpose: it generated training data, drove bottom-up adoption, and built the credibility the tools now trade on. But the centre of gravity has shifted, and recently. The primary audience is increasingly the CIO and the enterprise contract — the person who signs for the organisation, not the engineer who types in the terminal. The narrative of speed, predictability, and transformation is now aimed, deliberately and with real budget behind it, at the decision-maker most able to mandate it and least able to see the failure modes from the keyboard.

That third point is the one that does real work here, because it reframes the leadership at the centre of this paper in a way that is both fairer and more illuminating. The CIO ordering the restructure is not the originator of the swing's central claims. They are the target of a sophisticated, well-funded, professionally-constructed narrative designed to land on exactly their desk. The speed-and-predictability story arrives at the top of the organisation polished, credentialed, and confirmed by the genuine enthusiasm of vendors and peers — while the engineers who can see the gaps in it sit several layers down, with neither the budget aimed at them nor the standing to puncture a story the whole industry is telling. This is why the swing meets so little resistance where it is decided. The case for it is being made, expensively and skilfully, to the precise altitude at which it will be adopted; the case against it lives at an altitude with no microphone. The asymmetry is not accidental. It is the shape of how the narrative is sold.

None of this requires bad faith from anyone. The vendors are doing what vendors do; the CIO is responding rationally to a credible, well-supported story arriving through every channel they trust; the enthusiastic developers reporting real wins are reporting real wins. The point is structural, not moral: the swing is propelled partly by a market force that systematically over-represents the upside to the people with authority and under-represents the failure modes to the same people, because the failure modes are visible mainly from a vantage point the marketing does not address. An organisation that does not see this will mistake a manufactured narrative for an independent discovery, and will mistake the absence of resistance at the top for the absence of objections anywhere.

There is a likely correction already visible in the same landscape, worth noting because it tempers the picture rather than sharpening it. Cost pressure is pushing toward smaller and local models — capable distilled models that can run on modest hardware and handle a large fraction of real work without the per-token meter running at all. And organisations are beginning to budget AI usage deliberately, by role and seniority, rather than treating it as free. Both trends point toward a more sober, more bounded economics than the current narrative assumes — which is to say the market itself may, in time, deflate the very story it is currently selling. That does not help an organisation that restructures around the inflated version today. But it suggests the inflation is a phase, not a permanent condition, and that the sane equilibrium is closer than the current rhetoric implies.

This is the most contestable section in the paper, and intellectual honesty requires marking it as such rather than asserting it with a confidence the evidence does not yet support. The word "mis-selling" is deliberately not used here, because a fair critic will rightly object that the capabilities and risks of these tools are, on the whole, openly documented — including in the very ways this paper relies upon — and that buyers are not being deceived about what the tools are. That objection has force. The claim this section actually makes is narrower and, I think, defensible even so: not that anyone is lying, but that the structure of how the narrative reaches decision-makers systematically over-weights the upside at the altitude where decisions are made and under-weights the failure modes that are visible mainly from the keyboard. Whether that structural asymmetry is real, and whether it actually distorts decisions, is an empirical question this paper cannot settle from a single vantage point — and a thoughtful reading holds that the prevailing sentiment toward effective AI use is broadly positive, with real wins outnumbering the failures. That competing reading is recorded directly, and given room to stand on its own terms, in Part IV. This section states one hypothesis about the market's influence; it does not claim to have proven it, and the reader who finds it the weakest link in the paper is encouraged to weigh it against the dissent registered later.

The final section of Part I turns to the economic error at the very heart of the narrative — the assumption, almost never stated because it is almost never noticed, that producing more code faster is the goal at all.

[Next: Section 8 — Code is a liability, not an asset]

8. Code is a liability, not an asset: the economics the swing inverts

Section brief — NEW (Booch + Zechner; well-supported). This may be the most counter-cultural and most shareable idea in the paper, so give it room. The entire AI-coding narrative rests on an unstated premise: that more code, faster is the goal. This section attacks that premise directly. Every line of code is a future liability — it must be read, understood, maintained, secured, and eventually changed or retired. Code is cost that happens to be useful, not value in itself. Build the argument in three moves: (1) the bottleneck was never typing — it was deciding what to build and understanding what you built (ties to §5); (2) volume is therefore a false productivity signal — generating large volumes of code does not create value, it defers debt (Zechner: "you just delayed the punishment"); (3) Booch's ethics layer makes this sharper still — every line is a decision someone is accountable for, and "the agent wrote it" is not absolution any more than "just following orders" is. The swing's core economic error: it optimises the metric that was cheap to move (lines produced) while silently taxing the metric that was always scarce (comprehension and sound judgment about what should exist). Land a memorable inversion for the public audience: the best engineers have always been measured by the code they didn't write. Caution / steel-man: be fair to the genuine counter-case — exploratory and throwaway code has real value precisely because it's disposable (see §16 workflow), and speed-to-learn is a legitimate gain. The liability framing applies to retained, production code, not to fast disposable exploration. Make that distinction explicitly or the argument overreaches.

Drafted prose follows.

Underneath the entire swing sits an assumption so basic that it is almost never spoken, which is precisely why it is so dangerous. The assumption is that producing more code, faster, is the goal — that code is output, output is value, and a tool that multiplies output therefore multiplies value. Strip the AI narrative down to its foundation and this is what remains: a productivity story, in which the product being produced is code, and more of it sooner is self-evidently better. This section attacks that foundation directly, because it is wrong, and because almost everything else the swing gets wrong follows from getting this wrong first.

Here is the inversion, stated plainly: code is a liability, not an asset. Every line of code an organisation owns is a line it must read, understand, test, secure, debug, maintain, and eventually change or retire. Code is not treasure that accumulates in a vault; it is more like an obligation that accumulates on a balance sheet. It is cost that happens, when done well, to also be useful, and the usefulness is the point, not the volume. A system that delivers what is needed in ten thousand lines is unambiguously superior to one that delivers the same thing in a hundred thousand, because the second system has ten times the surface to maintain, secure, and comprehend, for identical benefit. Any framing that treats lines produced as a measure of value has the sign backwards on most of what it is counting.

The argument runs in three moves, and the first we have already established: the bottleneck was never the typing. It was deciding what to build and understanding what you built. If that is true — and the whole of this paper argues that it is — then a tool that accelerates the production of lines is accelerating the activity that was never scarce, while doing nothing for the activities that always were. This is not a small inefficiency. It is an optimisation aimed at the wrong quantity entirely, and optimising the wrong quantity hard enough actively damages the right one, because the flood of cheaply-produced code raises the comprehension and maintenance burden — the genuinely scarce resource — rather than relieving it.

The second move follows: volume is a false productivity signal. The feeling of productivity when an agent generates a great deal of working-looking code very fast is real as a feeling and misleading as a measurement. A practitioner who works this way daily put the underlying truth bluntly: code is never free, and if you believe any amount of code is good to have now, you have merely delayed the punishment. Generating an enormous volume of code in a short time does not create value in proportion; it creates future obligation in proportion, and the obligation comes due later, when the system must be understood and changed by people — quite possibly people who were not there when the agent wrote it and who hold none of the comprehension that writing it would have produced. The productivity is borrowed against the future at an interest rate nobody quoted. This is the economic face of the Knowledge Debt that Section 3 introduced and Part II documents: the debt is not only in the missing understanding but in the sheer retained mass of code that the missing understanding now has to cover.

The third move sharpens the point from economics into accountability, and it is the one that should give any engineer pause. Software engineering reaches, at its top layer, into ethics — every line of code is, in the end, a decision that someone is answerable for. Code that handles money, that processes personal data, that makes a determination affecting a person's life, that holds a safety property — these are not morally neutral artefacts, and "the agent generated it" is not absolution any more than "I was only following the specification" ever was. When the volume of code an organisation owns vastly exceeds the volume any human has actually examined and understood, accountability does not disappear; it is merely held by people who can no longer discharge it, because you cannot stand behind a decision you did not know was being made. The swing's enthusiasm for volume quietly manufactures exactly this condition: more and more code, owned and shipped and depended upon, that fewer and fewer humans have ever genuinely read. The liability is not only technical. It is moral, and it accrues silently.

Put the three moves together and the swing's core economic error stands clearly: it optimises the metric that was cheap to move — lines produced — while silently taxing the metric that was always scarce: comprehension, and sound judgment about what should exist at all. It pours effort into making more of the thing that was never the constraint, and in doing so it enlarges the thing that always was. There is an old piece of wisdom worth restating for this moment, because it has rarely been more relevant: the best engineers have always been measured not by the code they wrote but by the code they didn't — the feature argued out of existence before it was built, the abstraction that removed a thousand lines, the elegant solution that fit in a tenth of the space. The swing inverts that wisdom precisely, and rewards the opposite of what the discipline learned to value.

Now the necessary fairness, because this argument is easy to overstate and overstating it would be its own kind of dishonesty. The liability framing applies to retained, production code — the code an organisation keeps, ships, depends on, and must live with. It does not apply to exploratory or throwaway code, and there the calculus genuinely inverts. Code written to learn something, to test a hypothesis, to prototype an approach you intend to discard, is valuable precisely because it is disposable — its whole purpose is to be cheap to produce and cheap to throw away, and the agent's ability to generate it fast is a real and substantial gift. Exploration is where the agent's speed pays off honestly, because exploration is the activity where producing-to-discard is the point and comprehension of the artefact does not need to outlive the experiment. The distinction is therefore not "code good" or "code bad" but retained versus disposable: the liability framing governs the code you keep; the speed gift governs the code you throw away. Section 16 builds a whole practice on exactly this distinction — agents pointed at exploration, where volume-to-discard is a virtue, and disciplined comprehension protected for the code that survives. The error of the swing is not that it produces code fast. It is that it treats all that code as kept, and as value, and as finished — and so converts a genuine gift for cheap exploration into a machine for manufacturing expensive, un-comprehended, permanent liability.

This completes the analysis of Part I's central question — why the swing happens and why it is mistaken. One structural matter remains before the evidence: the swing is not only a change to how software is built but, by a route almost no one notices, a change to what gets built, because reorganising the people reorganises the systems. That is the subject of the next section.

[Next: Section 9 — Conway's Law]

9. Conway's Law: the restructure is an architecture decision in disguise

Section brief — NEW (training-data grounded — VERIFY framing before publishing). A sharp, under-noticed structural point. Conway's Law (1967) observes that organisations design systems that mirror their own communication structures. Caution: this is my characterisation of a well-established principle — confirm the attribution and the standard phrasing against a primary source before you publish it under your name; it is widely cited but also widely loosely-paraphrased. The argument for this section: a restructure justified purely as a delivery/management change (engineers moved under product, one engineer per product) is, by Conway's Law, also silently an architecture change. Single-owner, product-siloed teams will tend to produce single-owner, product-siloed systems with weak seams between them — exactly the conditions in which cross-cutting concerns (security, shared platform health, end-of-life migration, integration coherence) have no natural home. This connects your §6/§16 cross-cutting-concern thread to a named, respected principle and elevates it from "a worry" to "a predictable consequence." The deeper point worth making carefully: leadership is choosing a system architecture without realising it, because the decision was framed as an HR/reporting-line change. That reframing — org design is system design — is genuinely valuable and is the kind of insight that marks the author as senior. Steel-man to include: product-aligned teams also have real architectural benefits (clear ownership, high cohesion within a product, the value stream thinking behind "Team Topologies"–style organisation); the critique is the absence of a deliberate counterbalancing mechanism for the seams, not product alignment itself. Note for the author: if you cite Conway's Law or any "Team Topologies"–style framing, verify the specifics — I'm working from training data here, not a checked source.

Drafted prose follows. (Author: verify the Conway's Law attribution and wording, and any Team-Topologies framing, against primary sources before publishing — this section leans on well-known principles I am recalling rather than ones I have checked for you.)

There is a consequence of the restructure that almost no one in the room when it is decided will name, because the decision is framed as a question of reporting lines and the consequence is a question of system architecture, and those are filed in different parts of the organisational mind. But they are the same decision. Reorganising the people is reorganising the systems, on a delay, and the route by which that happens has been understood for nearly sixty years.

The principle, generally attributed to Melvin Conway, who formulated it in 1967 and published it in 1968, holds that organisations which design systems are constrained to produce designs which are copies of their own communication structures. The systems an organisation builds come to mirror the way the organisation itself is arranged to talk. Teams that communicate easily produce components that integrate easily; boundaries between teams become boundaries in the software; the shape of the org becomes, over time, the shape of the architecture. This is not a tendency that careful engineers can simply choose to resist by good intentions. It is closer to a gravitational field. You can build against it, but only deliberately and at cost, and only if you have first noticed it is there.

Apply the principle to the restructure this paper observes. Engineers are moved under product. Each is given deep ownership of an assigned product, ideally one engineer per product. The justification offered is entirely about delivery and accountability — clearer ownership, a single throat to choke, alignment of engineering effort with product need. Every word of that justification is about management. Not one word is about architecture. And yet, by Conway's Law, an organisation arranged as a set of single-owner, product-aligned silos will tend, reliably and without anyone deciding it, to produce a set of single-owner, product-aligned systems — strongly bounded around each product, weakly connected between them, with thin and under-tended seams at the joins. The communication structure has been redrawn, so the architecture will be redrawn to match it. Leadership has made a far-reaching architectural decision while believing it made an HR decision.

This is the section's central and genuinely valuable reframing, and it is worth stating as starkly as it deserves: org design is system design. The two are not analogous or related or connected. They are, over a long enough horizon, the same act performed twice — once explicitly on the org chart and once implicitly in the codebase. A restructure is an architecture diagram that has not finished rendering. Treating it as a purely managerial matter does not make its architectural consequences go away; it only ensures that those consequences arrive unchosen, unexamined, and unowned.

And the specific architectural consequence of this restructure is the one Part I has been circling from several directions: the homeless cross-cutting concern. The things that do not belong to any single product — the shared platform's health, security as a property of the whole estate rather than of one feature, the coherence of how systems integrate with one another, the looming end-of-life of a platform that several products quietly depend on — these have no natural owner in a structure where every owner is defined by a single product. Under the old structure, however imperfectly, such concerns had at least a possible home: a function, a layer, a person whose remit spanned products. The product-siloed structure dissolves that home by construction. Every engineer is pointed inward at their product; the spaces between products, where the cross-cutting concerns live, belong to no one. Conway's Law predicts this not as a risk that might materialise if people are careless, but as the default outcome of the structure — the thing that will happen unless a deliberate countervailing mechanism is built to prevent it. The worry that earlier sections raised as a worry, this section can therefore upgrade to a prediction: the seams will be neglected, because the org has been arranged so that neglecting them is the path of least resistance.
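
The "homeless concern" prediction can be made concrete with a small Python sketch. Everything in it is hypothetical: the engineer names, the products, the list of concerns, and the rule that a concern has a home only when some single remit covers every product it touches are assumptions made for illustration, not a description of any real organisation.

```python
# Hypothetical product-siloed org: each engineer's remit is exactly one product.
remits = {
    "alice": {"billing"},
    "bob": {"catalogue"},
    "carol": {"reporting"},
}

# Hypothetical concerns, each mapped to the set of products it touches.
concerns = {
    "billing tax rules": {"billing"},
    "billing-to-reporting integration": {"billing", "reporting"},
    "estate-wide security": {"billing", "catalogue", "reporting"},
    "shared platform end-of-life": {"billing", "catalogue", "reporting"},
}


def homeless(concerns: dict[str, set[str]], remits: dict[str, set[str]]) -> list[str]:
    """Concerns whose full scope is not covered by any single person's remit."""
    return sorted(
        name
        for name, scope in concerns.items()
        if not any(scope <= remit for remit in remits.values())
    )


# Under the siloed structure, everything that spans products has no owner.
unowned = homeless(concerns, remits)

# Adding one deliberately cross-product remit (the countervailing mechanism) gives them a home.
with_platform_owner = {**remits, "dana": {"billing", "catalogue", "reporting"}}
unowned_after = homeless(concerns, with_platform_owner)
```

In this toy model the product-scoped concern has an owner from the start, while the three concerns that span products have none until a remit is drawn that spans products too. The structure, not anyone's carelessness, determines the result.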

Now the fairness this section requires, because product-aligned organisation is not a foolish idea and presenting it as one would be a cheap shot that weakens the argument. Aligning teams to products has real and well-understood benefits. It produces clear ownership and accountability. It produces high cohesion within each product, because the people who understand it best are continuously responsible for it. It reflects a serious and respected school of thought about organising engineering work around value streams and well-defined team boundaries — the broad family of thinking sometimes gathered under the "Team Topologies" banner, which takes Conway's Law seriously and tries to use it rather than fight it, deliberately shaping teams so that the architecture they induce is the architecture you actually want. Product alignment, done with eyes open, can be exactly right. The critique here is therefore narrow and precise, and it must stay narrow to stay honest: the problem is not product alignment. The problem is product alignment adopted without the deliberate counterbalancing mechanism that the same body of thought insists upon — without anyone owning the seams, without an explicit home for the cross-cutting concerns, without the recognition that an architectural decision is being made at all. The mature version of this restructure would pair product ownership with a deliberate, named mechanism for the spaces between products. The version this paper observes adopts the alignment and omits the counterbalance, and omits it precisely because it does not know it is making an architectural choice. The failure is not the structure. It is the structure chosen blind.

This closes Part I. The argument has established what the swing is, why it feels new, why it is mistaken, where it is and is not reasonable, how it is propelled from outside, what it gets backwards about the economics of code, and how it reshapes systems while believing it only reshapes reporting lines. Part II turns from analysis to observation — the accumulating field record of what the swing actually does, gathered as it happens, organised by the area of impact, and disciplined throughout by the requirement that every observation be tied to a mechanism rather than left as a grievance.

[Next: Part II — the field observations begin]

Part II — The Evidence (field observations)

Part brief. This is the accumulating core of the white paper and the reason it's a living document. Each observation below is logged using the evidence template (Appendix A) so that, over time, individual anecdotes harden into a documented pattern. The discipline: every entry pairs a concrete, anonymised observation with the structural mechanism it illustrates and the lesson-being-un-learned it maps to. Anecdote alone is dismissible; anecdote tied to mechanism is evidence. Organise observations under the five impact areas below (team, verification, software, business/IT trust, profession). Start sparse; let it grow.

10. Impact on the team

Section brief. Observations about people and structure. Candidate threads to watch and log: the dissolution of the engineering-management/advocacy layer and what disappears with it (the "who speaks for engineering quality now?" question); the bus-factor / continuity risk of one-engineer-per-product ownership; the implicit pressure-to-conform framing ("you're either onboard or you find happiness elsewhere") and its effect on honest dissent; the loss of cross-pollination when engineers are siloed by product; the morale and trust cost of having deep prior work go unacknowledged while being told to "adapt." Map each to the structural mechanism, not the grievance.

New thread — skill atrophy and the juniors problem (both speakers, strongly). This is a distinct mechanism from Knowledge Debt: that one is about code nobody understands; this is about people who never develop understanding in the first place. Log it as its own line of evidence. Mechanism: friction and pain are how engineers build intuition; agents remove friction, which is fine for things you needn't learn but corrosive for fundamentals. The cohort entering now faces compounding pressure — reduced fundamentals exposure plus heavy pressure to use agents to avoid looking slow — and may never acquire the "smell" (Booch) that lets an experienced engineer sense when something is subtly wrong. Both speakers independently give the same prescription: in the things that matter, do it by hand; let agents assist learning in areas already understood rather than replace the learning. Watch for the organisational version of this: a one-engineer-per-product model with agent-accelerated delivery quietly removes the apprenticeship structure through which judgment was historically transmitted.

Drafted framing prose follows. This is a living section: dated observations are appended over time using the Appendix A template. The prose below establishes the impact area and its mechanisms; the log accumulates beneath it.

Part I argued. Part II watches. The shift in register is deliberate: an argument can be answered with a better argument, but an accumulating record of what actually happened, each entry tied to the mechanism that produced it, is a different kind of thing and harder to wave away. What follows across the next five sections is a field log, organised by where the swing's effects land. The discipline established in the part introduction governs every entry: a concrete, anonymised observation, paired with the structural mechanism it illustrates, mapped to the lesson being un-learned, and accompanied by the strongest fair statement of the contrary case. An entry that is only a grievance does not belong here. An entry that is a grievance fastened to a mechanism is evidence.

This first section concerns the impact on the team — on people and on the structure they work within. Several mechanisms are worth watching from the outset, before the specific observations accumulate.

The first is the disappearance of the engineering advocacy layer. When a structure dissolves the role whose job was to speak for engineering concerns as such — quality, sustainability, the health of the craft — those concerns do not find another voice automatically. They become things everyone is presumed to care about and no one is positioned to defend, which in practice means they are defended only when defending them is costless, which is to say rarely. The mechanism to watch is not "morale dropped." It is "the organisational position from which a certain class of argument could be made has been removed, and so that class of argument stops being made."

The second is continuity, or its absence — the bus factor. One engineer per product is a clean line on an org chart and a single point of failure in reality. The mechanism is straightforward and worth stating without drama: when the comprehension of a product lives in exactly one head, the departure, illness, or simple unavailability of that head is the loss of the product's comprehensibility, and the swing's tendency to leave code un-understood (Section 3's Knowledge Debt) means there may be no fallback comprehension anywhere — not in a second engineer, and not in the code itself, which was never fully understood even by the one who shipped it.

The third is the chilling of honest dissent. When a change is communicated with an explicit fork — adapt enthusiastically, or find happiness elsewhere — the message lands as more than information. It establishes the cost of objecting. The mechanism to watch is subtle and rarely shows up as open conflict: it shows up as the absence of the objections that would have improved the change, because the people best placed to raise them have correctly read the room and concluded that raising them is unsafe. Silence here is not consent. It is a measurable suppression of exactly the feedback the organisation most needs, produced by the framing of the change itself.

The fourth is the loss of cross-pollination. Engineers siloed one-per-product stop encountering each other's problems, and the informal transfer of technique, pattern, and hard-won caution that happens when engineers work across and alongside one another quietly stops. The mechanism is the removal of the channels through which judgment used to diffuse through a team.

And the fifth, which both expert voices in the source material raise independently and forcefully, deserves to be logged as its own distinct line of evidence because it is a different harm from all the others: skill atrophy, and especially its effect on those entering the profession now. This is not the same as Knowledge Debt. Knowledge Debt is about code that no one understands. This is about people who never come to understand — a degradation not of the artefact but of the engineer. The mechanism is precise and, once stated, hard to unsee: friction and difficulty are not obstacles to learning, they are the learning. The struggle of doing a hard thing by hand is how an engineer builds the intuition that later lets them sense, without quite being able to say why, that something is subtly wrong — the quality experienced engineers call "smell." Agents remove friction. For things you do not need to learn, removing friction is pure gain. For the fundamentals, removing the friction removes the mechanism by which the fundamentals were ever acquired. The engineer who never struggled to build the thing never develops the sense that warns them when the generated thing is off.

The cohort most exposed to this is the one entering the field right now, and they are caught in a vice. On one side, reduced exposure to fundamentals in their training and a culture that increasingly treats agent fluency as the core skill. On the other, intense pressure to use agents heavily and immediately, because not doing so looks slow and incompetent next to peers who do. The result is a generation at risk of arriving in the profession without the hard-won intuitions that the senior engineers around them take for granted and rely upon — intuitions those seniors acquired precisely through the friction the juniors are now encouraged to skip. The prescription offered, independently, by both expert voices is the same and worth recording: in the things that matter, do it by hand; let the agent assist your learning in domains you already understand rather than substitute for the learning in domains you do not.

The organisational version of this mechanism is the one this section will watch most closely, because the restructure embodies it. The traditional structure through which engineering judgment was transmitted across a generation was a kind of apprenticeship — juniors working alongside seniors, absorbing the smell, the caution, the judgment, by proximity and shared struggle over time. A one-engineer-per-product model, especially one accelerated by agents so that the lone engineer is shipping faster than ever, quietly dismantles that apprenticeship. There is no alongside. There is one engineer and an agent, and the agent cannot transmit judgment because it has none to transmit — it is, in the apt description, the eager junior itself, not the mentor. The structure optimised for individual ownership has, as an unchosen side effect, removed the mechanism by which the next generation of owners was supposed to be formed.

Observation log (append dated entries below using the Appendix A template):

— [awaiting first logged observation]

[Next: Section 11 — Impact on verification]

11. Impact on verification: the collapse of "done"¶

Section brief — NEW (your data; strongly supported by both the memo and Zechner). Place this before "impact on the software" because the verification loop is upstream of artefact quality — when it breaks, everything downstream degrades silently. The argument: in iterative practice the arbiter of "done" was external and concrete — a passing test, a working increment a stakeholder could exercise, a reviewer who understood the change. The swing quietly replaces that arbiter with "the prompt was approved." Approval of intent is being mistaken for verification of outcome. Trace the mechanism through the memo's own process — stakeholder discussion → AI-ready prompt → estimate sign-off → "go off and get the work done" → (nothing named after that). The gate is at the front; there is no named gate at the back. Contrast Zechner's discipline, where the entire weight of quality rests on the back-end gate: line-by-line diff review of core code to the same standard as a human contributor, with throwaway code consciously exempted. The question the section forces: who verifies, against what, and do they understand what they're approving? If the answer is "the same engineer who wrote the prompt, against the prompt they wrote, reviewing code they didn't write and may not fully grasp," the verification loop has closed on itself. This is also the natural and honest home for the author's security-aware-but-defers-to-specialists positioning: when verification collapses, security defects are among the first things to slip through, and they are exactly the class of problem an enthusiastic agent introduces silently (Booch's trust concern; the "junior who doesn't know what they don't know"). Steel-man: a disciplined team can absolutely put a rigorous back-end gate in place (Zechner proves it) — so the critique is precise: the swing as commonly implemented moves the gate to the front and names nothing at the back, not that AI-assisted work is inherently unverifiable.

Drafted framing prose follows. Living section; append dated observations using the Appendix A template.

This section is placed deliberately before the section on software quality, because verification is upstream of quality. When the loop that decides whether something is correct breaks, the correctness of everything that passes through it degrades — and it degrades silently, because the broken loop keeps emitting "done" with the same confidence it always had. A team can be producing steadily worse software while every signal it watches stays green, if the signals are the wrong ones. So the question of how the swing changes verification comes first.

Begin with what verification was, in functioning iterative practice, because the contrast is the whole argument. The arbiter of "done" was external to the person claiming doneness, and it was concrete. A test passed or it did not. A working increment was put in front of a stakeholder who could exercise it and say "yes, that" or "no, not that." A reviewer who understood the change read it and signed off on the basis of understanding. In each case the judgement of "done" rested on something outside the builder's own assertion — a behaving artefact, an independent comprehending human, a check that could fail. That externality was the point. It is what made "done" mean something.

Now trace what the swing's process actually specifies, following the observed instruction step by step. There is a stakeholder discussion. It is written up into an AI-ready prompt. The prompt and its estimate are approved. The engineer goes off and gets the work done. And then — read the sequence again, looking for the verification step — there is nothing named after that. The process specifies, in detail, a gate at the front: the prompt is reviewed, the estimate is reviewed, sign-off is obtained before work begins. It specifies no gate at the back. The elaborate ceremony of approval all happens before a line is built, and once the building starts the process simply ends with "get the work done," as though done were self-evident and self-certifying.
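The shape of that sequence, with its gate at the front and nothing named at the back, can be made concrete with a small sketch. The stage names paraphrase the steps traced above; the `Stage` type, the `gates_after_build` helper, and the contrasting variant with a review stage appended are illustrative assumptions, not a description of any real tool or of the observed organisation's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    is_gate: bool           # True if someone must approve before work proceeds
    is_build: bool = False  # True for the stage where code is produced


def gates_after_build(process: list[Stage]) -> list[str]:
    """Return the names of approval gates that come after the build stage."""
    build_index = next(i for i, s in enumerate(process) if s.is_build)
    return [s.name for s in process[build_index + 1:] if s.is_gate]


# The process as the text traces it (stage names paraphrased).
swing_process = [
    Stage("stakeholder discussion", is_gate=False),
    Stage("write AI-ready prompt", is_gate=False),
    Stage("approve prompt and estimate", is_gate=True),
    Stage("get the work done", is_gate=False, is_build=True),
]

# Hypothetical variant with a back gate appended, for contrast only.
with_back_gate = swing_process + [
    Stage("line-by-line review of kept code", is_gate=True),
]

print(gates_after_build(swing_process))   # []
print(gates_after_build(with_back_gate))  # ['line-by-line review of kept code']
```

The empty list is the point: every approval in the first process happens before anything exists to be verified.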

This is the quiet substitution at the heart of the section: approval of intent has been mistaken for verification of outcome. The sign-off that the process treats as its quality gate is a gate on the plan, obtained before there is anything to verify. It certifies that everyone agreed on what should be built. It certifies nothing whatever about what was built — and given everything Part I established about the gap between a prompt and a program, what was built necessarily contains a great deal that the prompt never specified and the sign-off never saw. The front gate inspects the sketch. Nothing inspects the painting.

It is worth being precise that this is a failure of the swing as commonly implemented, not an inherent property of AI-assisted work — and the proof is that disciplined practitioners do the opposite, deliberately. The most rigorous agent-using workflow documented in this paper's source material rests its entire weight on the back-end gate. The discipline is to guard-rail the agent heavily before it implements, and then — this is the crucial part — to review the resulting code line by line, to exactly the standard one would apply to a contribution from a human colleague, with deliberate and conscious exemptions only for code that is genuinely throwaway. The quality does not come from the prompt being good. It comes from a human who understands the system reading every line of the kept code and standing behind it. That is a real verification loop, with a real external arbiter, relocated to where the agent's output actually needs checking: the back. The swing's error is not that it uses agents. It is that it keeps the front gate, which agents made less sufficient, and quietly drops the back gate, which agents made more necessary.

The section's central question can therefore be put as a single test, and it is the test applied to every observation logged here: who verifies, against what, and do they understand what they are approving? Hold the swing's process up to that test and watch it close on itself. Who verifies? The engineer who wrote the prompt. Against what? Against the prompt they themselves wrote. Reviewing what? Code they did not write, generated by an agent, that they may not fully understand, because, per Section 3, the comprehension that writing it would have produced was delegated along with the writing. The loop has folded into a circle. The verifier, the author of the standard, and the fallible builder the check exists to guard against have collapsed into a single point: one person checking generated code against their own incomplete description of what they wanted, with neither the independent artefact nor the independent comprehension that made verification mean something. It is not that this always fails. It is that when it succeeds, it succeeds by luck or by an individual engineer's private discipline, not because the process contains anything that would catch the failure.
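The three-question test can be written down as a checklist, which is roughly how it would be applied to a logged observation. What follows is a minimal sketch under stated assumptions: the `Verification` record, its field names, and the `failed_questions` helper are hypothetical, and the two rules encode only the questions posed above, not any established review standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verification:
    verifier: str               # who checks the result
    standard_author: str        # who wrote what the result is checked against
    verifier_understands: bool  # does the verifier comprehend the code?


def failed_questions(v: Verification) -> list[str]:
    """Apply the three-question test and list what it finds missing."""
    problems = []
    if v.verifier == v.standard_author:
        problems.append("verifier is checking against their own standard")
    if not v.verifier_understands:
        problems.append("verifier does not understand what they approve")
    return problems


# The swing's process as the text describes it.
swing = Verification(
    verifier="prompt author",
    standard_author="prompt author",
    verifier_understands=False,
)

# Hypothetical contrast: an independent reviewer who understands the change.
reviewed = Verification(
    verifier="independent reviewer",
    standard_author="prompt author",
    verifier_understands=True,
)

print(failed_questions(swing))     # both problems are reported
print(failed_questions(reviewed))  # []
```

The sketch says nothing about whether a given piece of work is actually wrong. It records only whether the process contains an independent vantage point from which a failure could be noticed.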

This is the natural home for a particular caution about security, and the right register for it is modesty rather than alarm. Security is not this author's specialty, and the claim here is not a security expert's claim — it is the observation any careful engineer can make, which is that security defects are precisely the class of problem this broken loop is worst at catching. A security flaw rarely announces itself in the behaviour the prompt described; the feature works, the demo passes, the stakeholder is satisfied, and the vulnerability sits quietly in code no human read closely. This is exactly the kind of subtle, invisible-until-exploited problem that the eager-but-unaware agent introduces without flagging — it does not know what it does not know, and a verification loop that has folded onto itself has no independent vantage point from which to notice. The appropriate response is not for the lone engineer to become a security expert; it is for the organisation to retain the specialist review and the cross-cutting security attention that the broken loop and the product-siloed structure (Section 9) both tend to dissolve. The danger is not that engineers are careless. It is that the structure removes the independent check at precisely the point where an independent check mattered most, and then accelerates production through the gap.

Observation log (append dated entries below using the Appendix A template):

— [awaiting first logged observation]

[Next: Section 12 — Impact on the software]

12. Impact on the software¶

Section brief. Observations about the artefacts and systems. Threads to watch and log: accumulation of AI-generated code that no one fully comprehends (Knowledge Debt in the wild); erosion of architectural coherence when work is gated per-prompt rather than designed across a system; the fate of cross-cutting concerns (security, platform health, tech-debt, .NET-version / platform end-of-life risk) that have no "product" to belong to and therefore no champion under a product-only work intake; quality-assurance gaps when the up-front prompt is treated as sufficient and emergent discovery is squeezed out; supportability of systems built fast but understood shallowly.

Corroboration (Zechner) — code is never free. Two framings worth keeping in the author's own words: that generating enormous volumes of code does not create value but defers debt ("you just delayed the punishment"), and that volume is a false productivity signal. This is the empirical backbone for Knowledge Debt: the swing optimises the one metric (code produced) that was never the constraint, while degrading the one that always was (comprehension). Tie directly to the cross-cutting-concern thread — architecturally significant decisions (Booch: significance measured by cost of change) are exactly what a per-prompt delivery gate is structurally blind to, because no single prompt owns them.

Drafted framing prose follows. Living section; append dated observations using the Appendix A template.

This section concerns the artefacts themselves — the systems and the code, what happens to them under the swing. Much of the conceptual groundwork was laid in Part I, so this framing's job is to convert those arguments into the specific, observable degradations to watch for and log, each tied to its mechanism.

The central phenomenon is Knowledge Debt, and it is worth restating crisply now that the term has earned its place. Knowledge Debt is the accumulating gap between the code a system contains and the comprehension the team actually holds of it. It is the direct artefact-level consequence of delegating implementation: the code arrives without the understanding that producing it by hand would have created. Unlike technical debt, which is roughly visible — you can often see the messy code — Knowledge Debt is invisible by nature, because what is missing is not in the code at all; it is in the heads that should understand the code and do not. A system can look clean and be deeply in Knowledge Debt, because the debt is the absence of comprehension, and absence does not show up in a code review of the code itself. The mechanism to watch is the widening of this gap: more code shipped, by fewer people who understand less of it, faster than comprehension can be built.

The empirical backbone here comes from a practitioner who states the economic half plainly: code is never free, and generating large volumes of it does not create value; it defers a reckoning. Volume is a false productivity signal (the full argument is in Section 8). At the artefact level the consequence is concrete: the system grows faster than the team's understanding of it, and every increment of that gap is an increment of risk that lands later, when something must be changed or fixed and no one holds the comprehension required to do so safely. The punishment, in the practitioner's phrase, was only delayed, and the swing's velocity is, in large part, the sound of it being delayed faster.

The second degradation to watch is the erosion of architectural coherence, and it follows directly from the per-prompt delivery model. When work is gated and delivered one feature, one prompt at a time, the unit of attention is the feature. But the most consequential decisions in a system are not features; they are the relationships between things — the patterns, the boundaries, the shared abstractions, the significant decisions whose defining property is that they are expensive to change later. There is a precise definition of architectural significance worth holding onto: a decision is architecturally significant in proportion to the cost of changing it. By that measure, the significant decisions are exactly the ones a feature-at-a-time pipeline is structurally blind to, because no single prompt owns them and no single feature is where they live. The mechanism is not that engineers make bad architectural decisions under the swing. It is that the swing's unit of work has no slot in which architectural decisions are made at all — they get made implicitly, accreted feature by feature, by an agent filling gaps with internet-average defaults, until the system has an architecture that no one chose and no one owns. This connects directly to the homeless cross-cutting concern of Section 9: the significant, spanning decisions have no home in the work intake, so they are made by default rather than by design.

The third degradation is supportability, which is where Knowledge Debt and architectural erosion compound into something an organisation feels directly. A system that was built fast, understood shallowly, and grown without architectural ownership is a system that is expensive and dangerous to support — because support requires exactly the comprehension that was never built, applied to exactly the spanning structure that was never owned. The mechanism to watch is the lengthening and worsening of the path from "something is wrong" to "we understand why and have safely fixed it," as the comprehension required for that path thins out across the estate.

The fairness this section owes is the same distinction Section 8 drew, and it must be repeated here because it is the line between honest critique and overstatement: none of this is an indictment of generated code as such. Code produced fast and then genuinely understood, reviewed, and owned is fine; better than fine, it is the legitimate gift of the tool. Code produced for disposal, as exploration, is fine precisely because no one needs to carry comprehension of an artefact they intend to throw away. The degradations catalogued here attach specifically to retained production code produced under a process that does not build or preserve comprehension of it. The observable harm is not "the team used an agent." It is "the team's systems are accumulating mass faster than the team's understanding of them, with no part of the process responsible for closing the gap." That is the thing to watch, and the thing to log.

Observation log (append dated entries below using the Appendix A template):

— [awaiting first logged observation]

[Next: Section 13 — Impact on trust between business and IT]

13. Impact on trust between business and IT¶

Section brief. The most important and most corrosive impact area. Threads to watch and log: the over-promise cycle — agent-accelerated estimates create commitments that reality can't honour, which damages exactly the trust the restructure was meant to build; the predictability paradox (promising scope-on-a-date, missing it, eroding credibility further); what happens to the business relationship when quality issues surface from code nobody deeply understands; the long-run reputational cost to "IT" as a function when the AI narrative over-delivers on promise and under-delivers on outcome. Land the throughline: iterative methods rebuilt business/IT trust precisely by not promising what couldn't be known; the swing re-breaks it.

Connect to §7 (vendor layer). The over-promise often originates outside the organisation: the speed-and-predictability narrative sold to leadership sets the expectation that engineers are then held to. The trust damage lands on IT, but the inflated promise was partly imported. Worth naming carefully — it's both a fairer read of leadership and a sharper diagnosis.

Drafted framing prose follows. Living section; append dated observations using the Appendix A template.

This is the most important impact area in the paper, and the most corrosive, because it strikes at the thing the swing was supposed to improve. The entire justification for the restructure, traced back in Section 5, was the desire to give the business a reliable, predictable, trustworthy delivery relationship. Trust between the business and the people who build for it is the prize. This section is about how the swing spends that trust while believing it is earning it.

The core mechanism is the over-promise cycle, and it runs as follows. The swing generates optimistic estimates — agent-accelerated, justified by the speed at which a demo can be produced, and committed to at the front-loaded sign-off gate before the deferred discovery has surfaced. Those estimates become promises made to the business. Then reality arrives: the discovery that was deferred shows up anyway (Section 3), the verification that was skipped surfaces defects later (Section 11), the code that no one fully understood proves expensive to finish or fix (Section 12). The promise is missed. And here is the cruel part of the cycle — the response to a missed promise, under the predictability reach, is almost never "we should stop making promises of this kind." It is "we need to specify harder, gate tighter, commit more precisely next time." The failure of false predictability is met with a redoubled pursuit of false predictability. The cycle does not self-correct; it self-reinforces, and each turn of it spends a little more of the trust it was meant to build.

The deepest version of this is what might be called the predictability paradox, and it is worth stating sharply because it is the section's central insight. The kind of predictability the swing promises is the kind that, when it fails, damages trust most. A promise of precise scope on a precise date is a promise that can be precisely, visibly, embarrassingly broken — and because the swing encourages exactly this kind of promise, it manufactures exactly this kind of breakage. Iterative delivery, by contrast, promises something less seductive but more keepable: a reliable cadence, a dependable ability to steer, honest visibility into progress. That kind of promise is harder to break because it claimed less and claimed truer. By reaching for the impressive promise over the keepable one, the swing does not merely risk failing to build trust — it actively constructs the most trust-destroying failure mode available, and then runs it on a loop.

There is a second mechanism, quieter and slower: the trust damage from quality problems that surface from un-comprehended code. When a defect emerges from code no human deeply understood, the organisation cannot respond to the business with the thing trust is built on: a clear account of what went wrong and a confident assurance that it is handled. Instead the response is hesitant, the fix is slow, the explanation is thin, because the comprehension required to be crisp about it was never built (Section 12). The business experiences this as an IT function that does not seem to understand its own systems, which, in the specific and literal sense of Knowledge Debt, is true. Each such episode erodes the business's confidence not in a project but in the competence of the function, which is a deeper and slower-healing kind of damage.

Now the connection that makes the section fairer and the diagnosis sharper, and it reaches back to Section 7. The over-promise frequently does not originate inside the organisation at all. The speed-and-predictability narrative was sold, expensively and skilfully, to leadership from outside — and leadership, having absorbed it, sets the expectation that engineers are then measured against. The inflated promise is, in part, imported: manufactured by a market force aimed at the decision-maker, adopted in good faith, and then passed downward as a standard. This matters for two reasons. It is a fairer read of leadership, who are not cynically over-promising but earnestly relaying a story they were given strong reason to believe. And it is a sharper diagnosis, because it locates part of the trust problem outside the organisation's own conduct: the gap between what was promised and what could be delivered was partly engineered elsewhere, and an organisation that does not see this will keep trying to close the gap by squeezing the engineers, when part of the gap was never theirs to close. The trust damage lands on IT. The inflated promise was partly handed to IT from above, and to those above from outside.

The throughline that ties this section to the whole paper is worth stating as the section's closing frame, because it is the deepest irony in the entire argument. Iterative methods rebuilt the business–IT relationship, historically, precisely by not promising what could not be known — by trading the impressive promise for the keepable one, and earning trust through reliability rather than through ambition. That was the hard-won settlement. The swing breaks that settlement in the name of strengthening it. It reaches for the impressive promise again, believing the agent has finally made it keepable, and in doing so re-opens the exact wound that iterative practice was the cure for. The business does not end up trusting IT more. It ends up, after enough turns of the cycle, trusting it less — and trusting it less specifically about the thing the whole exercise was meant to fix.

A necessary caveat governs this entire section, and it must be stated plainly rather than buried, because this is the area where the paper most risks asserting as observed fact what is so far only a predicted failure mode. This section describes a mechanism, not a measurement. The over-promise cycle and the predictability paradox are arguments about how trust could erode if the swing runs unchecked — they are hypotheses about a trajectory, not a report of trust already broken. And there is a serious, well-founded competing view that deserves equal billing: that the prevailing sentiment around effective AI use is currently positive, that organisations are reporting real and valued outcomes, and that the trust relationship is in many places being strengthened rather than damaged by tools that genuinely help. That view is held by thoughtful practitioners with direct exposure, and nothing in this section's logic refutes it — because the question is empirical, and the evidence is not yet in. It is entirely possible that mature verification practices (Section 11) and disciplined use keep the over-promise cycle from ever starting, in which case the trust damage described here simply does not materialise. The honest position, and the one this paper takes, is that the mechanism is real and worth watching while the outcome is undecided. The observation log below exists precisely to test which way it actually goes — and entries recording positive trust outcomes are as valuable to that test as entries recording damage. The competing positive reading is recorded in full in Part IV, where it stands on its own terms rather than as a concession wrung from a critique.

Observation log (append dated entries below using the Appendix A template):

— [awaiting first logged observation]

[Next: Section 14 — Impact on the practice and the profession]

14. Impact on the practice and the profession¶

Section brief. Zoom out from the single organisation to the industry pattern the anonymised case illustrates: deskilling vs. up-skilling debates, the redefinition of "software engineer" toward "agent operator," what is lost and what is genuinely gained, and the generational knowledge-transfer risk when comprehension is outsourced early in careers.

New illustration — the "clankers" PR flood (Zechner), a parable of volume over comprehension. A vivid, concrete case for this section and a strong evidence-template entry. A healthy open-source project that once received a small number of human-authored pull requests per week now receives dozens per day from agents — novel-length descriptions, sweeping file changes, almost none mergeable. The maintainer's fix is the paper's thesis in miniature: auto-close by default, and require a contributor to first explain the problem in their own human voice before earning the right to submit code. The signal that mattered — does this person understand the problem? — had been drowned by volume and had to be deliberately restored. Use this as the memorable image for the whole profession-level argument: when production cost collapses, comprehension becomes the scarce thing worth gating on, which is the exact inversion of the "specify-and-delegate" reflex. Booch's optimism belongs here as the counterweight so the section isn't bleak: with the tedious parts handled, judgment, curiosity, and architectural thinking become the human centre of the work — a genuinely good time to be an engineer if the profession keeps cultivating those things rather than delegating them away.

Drafted framing prose follows. Living section; append dated observations using the Appendix A template.

The previous sections kept their gaze on a single organisation. This one lifts it to the profession, because the anonymised case this paper documents is an instance of something happening everywhere, and the wider pattern both confirms the diagnosis and points toward the way out. The question here is what the swing does to software engineering as a craft and as a career — and, importantly, the answer is not uniformly grim, which matters for the honesty of the whole paper.

Start with the parable, because it captures the entire profession-level argument in a single concrete image. Consider a healthy open-source project in the world before agents: it might receive a small handful of pull requests in a week, each written by a human who had understood the problem well enough to propose a solution, each carrying a comprehensible description in a recognisable human voice. The same project after agents: dozens of pull requests per day, generated by agents, each accompanied by a description of exhausting length, each touching anywhere from a handful to a thousand files — and almost none of them mergeable. The signal that a maintainer relied upon, the signal that a contribution came from someone who understood the problem, was simply drowned. Not degraded — drowned, beneath a volume of plausible-looking, confidently-described, fundamentally un-understood output.

The maintainer's solution is the thesis of this entire paper rendered as an operational policy, and it is worth dwelling on for that reason. The response was to auto-close contributions by default, and to require that a would-be contributor first explain, in their own human voice, what the problem is and why their approach addresses it — and only once that understanding is demonstrated does the contributor earn the right to submit code at all. Read what that policy does. It re-establishes, deliberately and by force, the thing the flood destroyed: it gates on comprehension rather than on output. It treats the cheap, abundant thing — generated code — as worthless without the scarce, expensive thing — a human who actually understands the problem. When the cost of producing the artefact collapsed, the maintainer correctly identified that the artefact was no longer the valuable part, and moved the gate to the part that still was. This is the exact inversion of the swing's specify-and-delegate reflex. The swing says: specify, and let the agent produce volume. The maintainer learned: volume is now free and therefore worthless as a signal, so gate on understanding, which is now the only thing that is scarce. The profession's hard cases are already discovering, under pressure, the principle the swing is busy forgetting.
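
The policy can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the maintainer's actual tooling: `ContributionGate`, `accept_explanation`, and `triage` are hypothetical names, and the step where a maintainer judges whether an explanation shows real understanding is deliberately left to a human.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    AUTO_CLOSE = "auto-close"
    OPEN_FOR_REVIEW = "open-for-review"


@dataclass
class ContributionGate:
    """Hypothetical gate: closed by default, opened only by demonstrated understanding."""

    # Contributors whose own-words problem explanation a maintainer has accepted.
    approved_contributors: set[str] = field(default_factory=set)

    def accept_explanation(self, contributor: str) -> None:
        # Called by a human maintainer after reading the explanation.
        self.approved_contributors.add(contributor)

    def triage(self, contributor: str) -> Decision:
        # The default is to close; code is only considered once understanding is on record.
        if contributor in self.approved_contributors:
            return Decision.OPEN_FOR_REVIEW
        return Decision.AUTO_CLOSE


gate = ContributionGate()
print(gate.triage("alice").value)   # closed: no explanation on record yet
gate.accept_explanation("alice")
print(gate.triage("alice").value)   # now eligible for ordinary review
```

Nothing about the pull request itself (its size, its description, the number of files it touches) enters the decision. The only input is whether a human has already demonstrated understanding, which is the point of the policy.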

Generalise the parable and it states the profession-level shift directly. When production cost collapses, comprehension becomes the scarce resource — the thing worth selecting for, gating on, and protecting. This reframes nearly every anxiety in the paper as an instance of one pattern: the organisation, like the flooded maintainer, is being buried in cheap output and must decide whether to keep treating output as the valuable thing or to recognise that the valuable thing has moved. The professions and teams that thrive will be the ones that make that recognition deliberately; the ones that do not will drown in their own velocity, exactly as the project drowned in its pull requests.

This is also where the legitimate fears about the career deserve a clear-eyed hearing: the deskilling debate, the redefinition of "software engineer" toward "operator of agents," the generational knowledge-transfer risk that Section 10 logged. These are real, and the paper does not dismiss them. But they are not destiny, and the parable shows why: the response to a flood of cheap output is not to lament it but to re-establish the gates that select for understanding. A profession that does that consciously does not deskill; it re-centres on the skills that the cheap output cannot supply.

And this is the place to let the counterweight in, deliberately, because a paper that ended Part II in unrelieved gloom would be both dishonest and less persuasive. There is a genuinely optimistic reading available, and it comes from the most architecturally serious voice in the source material, which is striking precisely because that voice has no naïveté about the tools' limits. The reading is this: with the tedious, mechanical parts of the work handled by the agent, the human centre of software engineering moves toward exactly the things that were always the most valuable and the most human — judgment, the cultivated "smell" for when something is subtly wrong, architectural thinking, and above all a limitless curiosity about how things work. On this reading it is a genuinely glorious time to be a software engineer, because the parts of the job that were drudgery are receding and the parts that were always the real work are coming to the fore. The expert's own prescription embodies the optimism: read source code from domains foreign to you, build judgment deliberately, stay curious without limit. The agent, used well, frees time for precisely that.

The catch — and it is the catch on which this entire paper turns — is the conditional buried in the optimism. It is a glorious time to be an engineer if the profession keeps cultivating the things that the agent cannot supply rather than delegating them away. The same tool that could free humans to do the higher-order work could equally be used to skip the apprenticeship that produces the capacity for higher-order work, to drown comprehension in volume, to gate on output instead of understanding. The agent does not decide which. People and organisations decide which, and they decide it by whether they treat comprehension as the scarce thing to protect or as a cost to eliminate. That choice — not the tool — is what determines whether this is the profession's golden age or the beginning of its hollowing. Which is the precise question Part III takes up.

Observation log (append dated entries below using the Appendix A template):

— [awaiting first logged observation]

[Next: Part III — Synthesis]

Part III — Synthesis

15. What the pendulum teaches

Section brief. Pull the observations back up to principle. The recurring lesson: technology changes the cost of production; it does not change the epistemics of building the right thing. Uncertainty about what to build is irreducible by specification — it is only reducible by feedback. Every swing toward up-front control is a bet that this time the uncertainty is gone; it never is. Be explicit that the agent is a real and valuable advance — the failure is not the tool but the amnesia about why we work iteratively.

Anchor with Booch's spectrum. Software engineering spans laws of physics → algorithms → architecture → organisation → economics → ethics; implementation is one band, and the agent assists mainly there. Stating this explicitly makes the paper's claim precise: the swing mistakes acceleration of one layer for transformation of all of them. Judgment — the durable skill — lives in the layers the agent doesn't touch.

Drafted prose follows.

It is time to lift the accumulated argument and evidence back up to principle, and to state, as plainly as the whole paper has been building toward, what the pendulum actually teaches. The lesson is not "AI is bad" — the paper has conceded the tool's real and substantial value too many times for that. The lesson is more durable and more useful, and it is this: technology changes the cost of production. It does not change the epistemics of building the right thing.

Unpack that, because it is the seed from which everything else grew. Every wave of advance in software — the compiler, the high-level language, the framework, the cloud, and now the agent — has lowered the cost of producing software. Each was real; each was valuable; each genuinely made some part of the work cheaper or faster. And each arrived with a narrative suggesting that because production had gotten cheaper, the fundamental difficulty of software had been substantially addressed. But the fundamental difficulty of software was never production. It was, and remains, knowing what to build — and knowing what to build is not a production problem that gets cheaper with better tools. It is an epistemic problem: a problem of knowledge, of discovering through feedback what is actually needed, under uncertainty that cannot be specified away because it is the uncertainty of not yet knowing what you do not yet know. No tool that accelerates production touches that uncertainty, because the uncertainty does not live in production. It lives in the loop between building and learning, and it is reducible only by traversing that loop, not by speeding up one half of it.

This is why every swing toward up-front control is, at bottom, the same bet — and the same losing bet. Each swing wagers that this time the uncertainty is gone: that the new tool, or the new rigour, or the new process, has finally made it possible to know the right thing in advance and specify it completely before building. The sequential model made that bet in 1970. The heavyweight methodologies made it through the 80s and 90s. The swing makes it now, with the agent as the reason it might finally be true. And it is never true, because the bet is not really about the tool — it is about the nature of software, which the tool does not change. The pendulum swings out on the belief that the uncertainty has been conquered and swings back on the discovery that it has not. The discovery is always the same discovery. Only the costume on the tool is new.

There is a precise way to see why the swing's bet is mis-specified, and it comes from a view of what software engineering actually is — a spectrum of distinct concerns stacked from the physical to the moral. At the bottom sit the laws of physics: the hard constraints no budget or cleverness can override. Above them, algorithms: the translation of what is theoretically possible into procedures that execute. Above those, architecture: the significant decisions, significant exactly to the degree that they are costly to change. Above architecture, organisation: building systems and teams that outlive any individual. Above that, economics: ensuring the thing built is sustainable. And at the top, ethics: the recognition that every line is ultimately a decision someone is accountable for. Software engineering is not any one of these bands. It is the whole stack, and the practice of it is the exercise of judgment across the stack.

Now place the agent on that spectrum, and the paper's claim becomes exact rather than rhetorical. The agent assists, mainly and powerfully, in one band: implementation — the translation of clear, bounded intention into working code, sitting roughly at the algorithmic layer. That is a real and valuable contribution to one band of a multi-band discipline. The swing's foundational error is to mistake acceleration of one layer for transformation of all of them — to reason that because the implementation band got dramatically faster, the whole stack has been transformed, and the discipline can now be operated as though specification-then-generation were the whole of it. But the layers the agent does not touch are exactly the layers where the hard part always lived: the architectural judgment about what is significant, the organisational design that outlives individuals, the economic discernment about what should exist, the ethical accountability for what is built. Those are judgment, and judgment is the durable skill precisely because it inhabits the bands the agent cannot reach. The swing accelerates the one layer that was never the constraint and then behaves as though it had transformed the layers that always were. That single misattribution — one band mistaken for the whole — is the pendulum's mechanism, stated in its most general form. Every swing in the history of the field is a version of it.

What the pendulum teaches, then, is not to distrust new tools. It is to hold, through each swing, a clear account of which layer the new tool actually changed — and to refuse the narrative, however confident and well-funded, that says the change at one layer has dissolved the difficulties at all the others. The agent is a genuine gift to the production layer. It is not a transformation of the epistemics of building the right thing, because nothing is, because that difficulty is intrinsic to what software is. An engineer, a team, or an organisation that holds that distinction clearly can take everything the tool offers and lose nothing that matters. One that loses the distinction will swing out on the narrative and swing back on the evidence, like every cohort before it, and call the round trip progress.

The final two sections turn from what the pendulum teaches to what to do with the teaching: how to keep the lesson while keeping the tool, and how, in the end, to catch the pendulum rather than merely ride it.

[Next: Section 16 — Keeping the lesson while keeping the tool]

16. Keeping the lesson while keeping the tool

Section brief. The constructive payoff, which is essential because a purely critical paper is dismissible and (for the author) reads as grievance rather than leadership. Sketch what a genuinely AI-native practice looks like if it keeps the hard-won lessons: agents inside short feedback loops rather than at the end of long specification gates; prompts as living, revisable artefacts rather than signed-off contracts; comprehension preserved deliberately (guarding against Knowledge Debt) as a first-class engineering activity; product ownership embraced together with a mechanism for engineering advocacy and cross-cutting concerns, not as a replacement for one; predictability offered honestly, as reliable cadence and steering, rather than dishonestly, as scope-on-a-date. This is where the author's constructive-partner stance shows: same goals as leadership (quality, timeliness, ownership), better means.

Borrow the practitioner workflow (Zechner) as a concrete model, not just principle. The disciplined pattern worth describing: use agents to explore the solution space (spin up several, prototype competing approaches, evaluate, then commit) rather than to mass-produce committed code; guard-rail heavily before implementation (interfaces defined, modules identified, tests scoped) so the agent cannot wander; review core code line by line to the same standard as a human contributor; let throwaway code go unreviewed deliberately, as a choice keyed to the risk/novelty position (Section 6), not by default. This gives readers something actionable and demonstrates the author isn't anti-agent: it is the same tool, used inside the feedback loop instead of at the end of a gate. Note how this maps onto the bottleneck argument: exploring the solution space is where the genuine productivity gain lives, because it attacks the real bottleneck (design/thinking) rather than the false one (typing).

Drafted prose follows.

A paper that only diagnosed would deserve the dismissal it would surely receive: that it is the complaint of someone resistant to change, dressed up in history. So this section does the harder and more important thing. It describes what a genuinely AI-native practice looks like when it keeps the hard-won lessons instead of discarding them — a practice that takes everything the agent actually offers and gives up nothing that matters. The thesis of the section is that this is not only possible but already being done by disciplined practitioners, and that the difference between it and the swing is not the tool but the posture toward the tool. The same agent sits at the centre of both. One practice points it inside the feedback loop; the other points it at the end of a specification gate. Everything follows from that.

Begin with the single most important reframing, because it redirects the tool toward its real strength. Use the agent to explore the solution space, not to mass-produce committed code. The genuine, substantial productivity gain from agents is not that they type the final code faster. It is that they let you investigate several possible approaches cheaply and quickly — spin up competing prototypes of an idea, see how each actually behaves, learn from the comparison, and only then commit to one with real knowledge of the alternatives. This is the gift that matters, and notice precisely why: it attacks the real bottleneck rather than the false one. The real bottleneck was always the thinking and the deciding (Section 5); exploration is thinking and deciding, accelerated. The false bottleneck was the typing; mass-production of committed code is typing, accelerated, which speeds up the thing that was never slow while manufacturing the liability of Section 8. The same tool serves both. Pointed at exploration, it accelerates the activity that was genuinely scarce. Pointed at production, it floods you with code. The first is the practice; the second is the swing.

This reframing also resolves the apparent tension in Section 8's liability argument, and it is worth making the resolution explicit. Exploratory code is meant to be thrown away, so it carries no liability — its value is the learning it produces, not the artefact it leaves behind, and the agent's speed at producing-to-discard is therefore pure gain. The liability framing and the exploration gift are not in conflict; they are two halves of one coherent stance. Generate freely for learning and discard freely; retain deliberately and comprehend what you retain. The discipline is in keeping the two modes distinct and never letting exploratory output silently become retained production code without passing through the comprehension gate.

Which is the second element of the practice: guard-rail heavily before implementation, then review the kept code line by line. When the agent is set to implement something that will be retained, it should not be set loose on an open-ended prompt to wander and fill gaps with defaults. The context should be tightly constrained first — interfaces defined, the relevant modules identified, the tests scoped, the boundaries drawn — so that the agent operates inside a space whose shape was chosen by a human exercising architectural judgment, rather than improvising the shape itself from internet averages. And then the resulting code, the code that will be kept, is reviewed line by line, to exactly the standard one would apply to a contribution from a human colleague. This is the back-end gate that Section 11 found missing from the swing, restored to where it belongs. The comprehension that the agent's implementation would otherwise have stripped away (Section 3) is deliberately rebuilt by a human reading and understanding and standing behind every retained line. It is slower than letting the agent run. It is supposed to be. The slowness is the comprehension being preserved, and the comprehension was always the point.

The third element is to make the exemptions conscious and keyed to the work's position on the map of Section 6, rather than letting them happen by default. Throwaway code can go unreviewed — deliberately, as a choice, because it is throwaway. Code in the benign corner — low risk, low novelty — can be handled with a lighter touch, again as a choice that reflects where the work sits. The rigour is spent where the axes say it must be spent: on the high-risk, the novel, the architecturally significant, the retained. This is the opposite of the swing's one-size-fits-all imposition. It is a practice that locates the work before choosing the method, applying the agent's speed liberally where the stakes are low and the comprehension-preserving discipline strictly where the stakes are high. The judgment about which is which is itself the senior engineering skill the whole paper has been defending — and it is, not coincidentally, exactly the judgment that lives in the layers the agent cannot reach (Section 15).
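
One way to make the exemptions explicit rather than accidental is to write the rule down. The sketch below is illustrative only: the `WorkItem` fields reduce the axes of Section 6 to booleans, which is an assumption made for brevity, and every name in it is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Review(Enum):
    NONE = "unreviewed, by deliberate choice"
    LIGHT = "lighter-touch review"
    LINE_BY_LINE = "line-by-line review"


@dataclass(frozen=True)
class WorkItem:
    """Hypothetical description of a piece of work; booleans are a simplification."""

    throwaway: bool                    # exploratory code that will be discarded
    high_risk: bool
    novel: bool
    architecturally_significant: bool


def review_level(item: WorkItem) -> Review:
    # Throwaway code is exempt on purpose: its value is the learning, not the artefact.
    if item.throwaway:
        return Review.NONE
    # Retained code that is risky, novel, or architecturally significant gets full rigour.
    if item.high_risk or item.novel or item.architecturally_significant:
        return Review.LINE_BY_LINE
    # Retained code in the benign corner (low risk, low novelty) gets a lighter touch.
    return Review.LIGHT


prototype = WorkItem(throwaway=True, high_risk=False, novel=True,
                     architecturally_significant=False)
core_change = WorkItem(throwaway=False, high_risk=True, novel=False,
                       architecturally_significant=True)
print(review_level(prototype).value)
print(review_level(core_change).value)
```

The function is trivial; the judgment lives in filling in the `WorkItem` honestly. Writing the rule down only ensures that a missing review is the result of a decision rather than a default.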

From these elements, the organisational corollaries follow directly, and they map one-to-one onto the harms of Part II. Prompts should be held as living, revisable hypotheses rather than signed-off contracts (answering Section 4), which means the gate moves from the front to the back and the schedule is built on cadence rather than on frozen scope. Comprehension should be treated as a first-class, protected engineering activity — something the process explicitly makes room and time for, rather than an efficiency to be optimised away (answering Sections 3 and 12). Product ownership, which has real merit, should be adopted with a deliberate mechanism for the cross-cutting concerns and the seams between products — someone owning the spaces between, the architecture made an explicit choice rather than an unchosen consequence (answering Section 9 and the homeless-concern thread). The apprenticeship through which judgment passes between generations should be deliberately preserved, not dissolved by isolating each engineer with an agent that has no judgment to transmit (answering Section 10). And predictability should be offered to the business honestly, as reliable cadence and dependable steering, rather than dishonestly, as scope-on-a-date that cannot be kept (answering Section 5 and Section 13).

Notice what every one of those corollaries has in common with the leadership it is implicitly arguing against: the same goals. Quality. Timeliness. Ownership. A trustworthy relationship with the business. This is the heart of the constructive stance, and it is worth saying without hedging. The disagreement between this paper and the swing it critiques is not a disagreement about ends. Leadership wants quality software delivered reliably by people who own their work, and so does this paper, and so does every engineer worth the title. The disagreement is entirely about means — about whether those shared ends are served by reaching for false predictability through a front-loaded specification gate, or by the harder, less photogenic, genuinely more effective discipline of keeping the agent inside a real feedback loop with comprehension protected and judgment applied where the stakes demand it. The paper does not oppose the goals. It argues that the swing's means defeat the swing's own goals, and that a better means is available, already practised, and reachable from where any organisation currently stands.

It is worth dwelling, before closing, on how genuinely positive the agent-enabled version of this could be — because the constructive case is stronger than mere harm-avoidance, and a paper this concerned with failure modes owes the reader the upside in equal measure. Consider feedback loops. The traditional iterative cadence settled around the two-week sprint largely because that was the rhythm at which a team could produce something real enough to learn from. But the agent changes the economics of producing-something-real. If a working slice can be stood up in days, or even hours, then the feedback loop need not be two weeks long — it could be two days, or two hours. An agent used this way is not a threat to iterative practice; it is potentially the most powerful accelerant iterative practice has ever been given, because it attacks the one cost that set the floor on loop length. The same tool that, pointed at a specification gate, recreates Waterfall, can — pointed at the feedback loop — make the loop faster than it has ever been. This is the genuinely exciting possibility the swing squanders: not faster typing, but faster learning. An organisation that grasped this would use agents to get something in front of real users in hours and adjust before the day was out, compressing the discovery that was always the expensive part into a fraction of its former duration.
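
The effect of loop length on the number of learning opportunities is simple arithmetic. The sketch below assumes a 12-week quarter, a 5-day working week, and an 8-hour working day. These figures are illustrative assumptions, not data from any observation, and the counts are an upper bound that presumes every loop actually reaches real users.

```python
# Illustrative assumptions, not measurements.
HOURS_PER_DAY = 8
DAYS_PER_WEEK = 5
WEEKS_PER_QUARTER = 12

QUARTER_HOURS = WEEKS_PER_QUARTER * DAYS_PER_WEEK * HOURS_PER_DAY  # 480 working hours


def loops_per_quarter(loop_hours: int) -> int:
    """Upper bound on complete build-and-learn loops that fit in one quarter."""
    return QUARTER_HOURS // loop_hours


two_weeks = 2 * DAYS_PER_WEEK * HOURS_PER_DAY  # 80 working hours
two_days = 2 * HOURS_PER_DAY                   # 16 working hours
two_hours = 2

for label, hours in [("two-week loop", two_weeks),
                     ("two-day loop", two_days),
                     ("two-hour loop", two_hours)]:
    print(f"{label}: up to {loops_per_quarter(hours)} loops per quarter")
# two-week loop: up to 6 loops per quarter
# two-day loop: up to 30 loops per quarter
# two-hour loop: up to 240 loops per quarter
```

Under these assumptions a team moves from at most six chances per quarter to correct course to dozens or hundreds. The gain is only real if each loop ends in contact with users; a faster loop that ends at a specification gate produces more output, not more learning.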

The same forces point toward a rethinking of team structure that need not be the grim deskilling story Part II warned of, provided it is done deliberately rather than by attrition. If implementation is cheap and the loop is fast, the heavy multi-role team built to manage slow, expensive production may give way to something much smaller and more potent: a tight unit in which a person guiding product direction, an engineer ensuring what is built aligns with sound design and organisational principles, and perhaps a quality or experience specialist, work directly and continuously against fast feedback. Some of the most effective teams have always pushed in this direction anyway — putting engineers directly in front of users to gather feedback first-hand, collapsing the relay of intermediaries through which understanding used to degrade. A small team of deeply engaged people, each retaining comprehension and judgment, armed with an agent that makes the loop fast, is a genuinely attractive shape — and it is the opposite of the isolated-engineer-with-an-agent that Section 10 feared. The difference between the attractive version and the corrosive one is, again, not the tool. It is whether the team is collapsed around preserved comprehension and direct contact with reality, or merely hollowed out to cut cost. The agent permits both. Which one an organisation gets is a choice, and naming the attractive possibility is part of making the better choice reachable.

The final section steps back from method to disposition — to what it means, in the end, not merely to survive the pendulum but to catch it.

[Next: Section 17 — Conclusion: catching the pendulum]

17. Conclusion: catching the pendulum

Section brief. Short. Return to the metaphor. The point of watching a pendulum is to know it swings — and that the swing is survivable and even useful if you remember which lessons are worth carrying through it. Restate the invitation: this is a field log, it will grow, and the reader is invited to recognise the pattern in their own organisation.

Drafted prose follows.

A pendulum is not a thing to be stopped. It swings because of forces that are real — and in our field the forces are real too: the genuine power of each new tool, the genuine and sympathetic hunger for predictability, the genuine pressure of a market selling the next transformation. You will not still the pendulum by wishing it still, and this paper has not tried to. What you can do is understand the motion well enough that you are no longer thrown by it.

That is the whole purpose of watching a pendulum: to know that it swings, and to know which way. The swing this paper has documented — back toward specify-everything-up-front, dressed in the language of agents and prompts and AI-native delivery — is survivable, and parts of it are even useful, if you carry the right lessons through it. The agent is a real gift to one layer of the work. Held inside a feedback loop, pointed at exploration, paired with protected comprehension and judgment applied where the stakes demand it, it takes nothing that matters and gives something substantial. The danger was never the tool. The danger was the amnesia — the forgetting of why we work iteratively, the un-learning of a lesson the field paid for in expensive decades, made easy by the fact that much of the profession had already let that lesson decay into ceremony before the swing arrived to finish the job.

So the disposition this paper argues for is not resistance and not enthusiasm but memory under motion — the ability to hold, while the pendulum carries everyone toward the confident new narrative, a clear account of what is genuinely new (the cost of production at one layer) and what is exactly as old as it ever was (the irreducible difficulty of knowing what to build). The engineer who holds that distinction is not anti-AI and not swept up in it. They are simply not fooled by the round trip. They can take the tool's real gift and decline its false promise, because they remember which is which. That memory is what it means to catch the pendulum rather than ride it: not to stop the swing, but to refuse to lose, on the way out, the things you will need when it comes back.

This document is, and will remain, a field log. It was written from inside the swing, in real time, by someone watching it reshape an organisation he is part of — and it is deliberately unfinished. The arguments of Part I are as complete as today's understanding allows. The evidence of Part II is just beginning; its observation logs are mostly empty, waiting for the specific, anonymised, mechanism-tied entries that the coming months will supply. The synthesis of Part III is a hypothesis about the way through, to be tested against what actually happens. The paper will grow, and some of what it currently argues may be revised by what the evidence shows — which is, after all, the very method it advocates: hold the position as a hypothesis, build, learn, and adjust on the basis of what you come to understand.

And so the closing word is an invitation rather than a conclusion, because a pattern is only useful if others can see it in their own situation. If any of this is recognisable — the syllogism that felt so reasonable, the prompt that was really a spec, the gate that moved to the front, the promise that could not be kept, the comprehension quietly draining out of systems that grow faster than anyone can understand them — then the pattern is doing its work. You are watching the pendulum too. The task, the same for all of us caught in the motion, is to remember what is worth carrying through to the other side.

[End of Part III. Part IV records the competing perspectives; the Appendices follow.]

Part IV — The Open Register: competing perspectives

Purpose of this part. This paper makes an argument, and an argument is only as honest as its willingness to host the strongest versions of what contradicts it. This part exists to do exactly that — to record, in their own terms and at full strength, the serious perspectives that disagree with or qualify the thesis. It is deliberately not a rebuttal section. The convention elsewhere in the paper is that each evidence entry must carry a steel-man; this part is the steel-man given room to stand on its own, un-rebutted, because a living document examining a phenomenon still in motion should not pretend the question is closed when it is not. Some of these perspectives may be vindicated by the evidence that accumulates in Part II. Some may not. The paper's commitment is to let the evidence decide rather than to defend its opening position — and to update this register, and the body, as understanding changes. Where a perspective is paraphrased from a named individual's feedback, it is rendered faithfully and anonymously; the people who offered these readings did the paper the service of disagreeing well, and that service is honoured by representing them accurately rather than conveniently.

18. Perspectives that challenge the thesis

The perspectives below are lettered for reference, not ranked. Each is stated as its holder would state it, followed only by a note on what evidence would tend to confirm or disconfirm it, not by a counter-argument. The point is to know what to watch for, not to win in advance.

Perspective A — "More context up front is not a return to Waterfall." A reading from practitioners with direct experience of genuinely heavyweight historical methodologies holds that the paper risks conflating two different things, and that the lived experience of working this way does not feel like Waterfall at all. Providing the best context you can before design is simply understanding the business requirement properly — sound practice in any era. There are always gaps, issues, and assumptions, but a subsequent pass fixes them, and that is precisely what is observed happening: not one-shot builds discarded and restarted from a revised spec, but a base that is incrementally improved. Because the gap between iterations is now so short, even an incomplete specification can be remediated quickly through iterative steps — the same correction loop that has always existed, only faster. On this view the paper's alarm about "specification" misfires, because what is actually happening is fast iterative refinement of a well-contextualised starting point, which is good practice wearing unfamiliar clothes. (This perspective deserves particular weight because it comes from direct experience of the real heavyweight methodologies the paper invokes — the comparison class is first-hand, not imagined.) — What would confirm it: observation that teams genuinely treat the up-front artefact as revisable, iterate on a short cycle, and remediate gaps without ceremony or phase-gate friction. What would disconfirm it: observation that the up-front artefact is in practice frozen at sign-off, that remediation requires re-opening a closed gate, or that the "another pass" is rare rather than routine. — Where the paper already half-agrees: the context-vs-gate distinction in Section 1 concedes that rich context is good and that only the frozen gate is the target. 
The open question this perspective sharpens is empirical: in the field, is the artefact actually held as a hypothesis (the paper's healthy case) or as a contract (the paper's harmful case)? This perspective asserts the former is what is really happening. Part II is where that gets tested.

Perspective B — "Verification is a maturity problem, and the maturity is arriving." The paper's claim that the back-end verification gate has been dropped (Section 11) may describe an early, immature stage rather than an inherent property of agentic delivery. Many organisations are actively building flows that embed detailed verification into the AI development cycle — automated checks, structured review, test generation and validation as first-class steps. On this view the missing-gate problem is real today in some places but transient, a wrinkle that maturing practice is already ironing out, and the paper risks mistaking a teething stage for a structural flaw. — What would confirm it: observation of verification practices genuinely closing the loop — comprehension preserved, defects caught at the back gate, quality holding as adoption matures. What would disconfirm it: observation that verification remains front-loaded (approval of intent) while back-end checking stays shallow or absent even in otherwise mature setups, or that automated verification is itself un-comprehended and merely defers the problem. — Note: the paper's own Section 11 explicitly frames the broken loop as a failure of the swing "as commonly implemented," not as inherent to AI-assisted work, and points to disciplined practitioners who do close the gate. This perspective and the paper may largely agree; the disagreement is only about how widespread the mature version is and how fast it is spreading. That is a question for the evidence.

Perspective C — "Trust is not being destroyed; sentiment is broadly positive." The paper's Section 13 describes a trust-erosion mechanism, but the prevailing experience reported by many is the opposite: positive outcomes from AI used effectively, value delivered, and a business–IT relationship strengthened rather than strained. On this view the over-promise cycle is a hypothetical failure mode that, in practice, is outweighed by real wins — and the paper risks presenting a possible trajectory as though it were an observed one. Effective use is not always the case, of course, but the base rate of positive sentiment is real and should not be written out of the picture. — What would confirm it: accumulating observations of delivered value, met expectations, and trust maintained or improved where AI is used with discipline. What would disconfirm it: accumulating observations of the over-promise cycle actually running — optimistic agent-accelerated commitments missed, trust spent, the cycle reinforcing itself. — Note: Section 13 was revised to mark its claims explicitly as a mechanism, not a measurement, and the observation log is designed to record positive trust outcomes as readily as negative ones. This perspective is, in effect, a prediction that the log will fill up green. The paper does not claim to know that it will not.

Perspective D — "There is no mis-selling; the tools and their risks are openly known." The paper's vendor-pressure section (Section 7) may overstate the role of market manipulation. The capabilities and limitations of these tools are, on the whole, openly documented — including in the very sources this paper draws on — and sophisticated buyers understand what they are getting and what the risks are. To frame the narrative as something "sold over the heads" of engineers risks implying a deception that is not really present; decision-makers are not dupes, and the information needed to evaluate the tools is available to anyone who looks. — What would confirm it: observation that decision-makers do in fact weigh documented risks, that adoption decisions reflect informed trade-offs rather than narrative capture. What would disconfirm it: observation that the failure modes really are systematically invisible at decision-making altitude despite being documented, i.e. that availability of information is not the same as its reaching the decision. — Note: Section 7 was revised to drop the word "mis-selling" and to narrow its claim to a structural asymmetry in how information reaches decision-makers, explicitly conceding that openness of documentation is real. This perspective presses on whether even that narrower claim holds. It is the paper's most contestable section and is marked as such in its own text.

19. Perspectives that extend or sharpen the thesis

Not all serious responses pull against the paper; some push it further, and recording them here keeps the register balanced and prevents Part IV from reading as a list of retreats.

Perspective E — "The real disease is misunderstanding Agile, and it predates AI." A practitioner reading locates the root cause not in AI at all but in the long-standing, widespread misunderstanding of what Agile is and does — and sees the AI swing as merely the latest symptom. On this view there are two camps: those who "do Agile" because everyone does (large organisations, full ceremony, constrained squads, still building from specs — Waterfall in sprints) and those who run true Agile (principles understood, self-organising teams, willing to experiment). It is the first camp that will now want to specify everything up front and hand it to agents, because doing so barely differs from what they were already doing — but that move addresses none of the root causes true Agile exists to address: building higher-value products for the customer and getting something useful out sooner. This perspective also presses two points the paper should hold onto: that customers nine times out of ten describe the solution they think they want rather than the problem they are trying to solve, which is exactly why feedback loops matter; and that a proper MVP (not an eighty-percent-built solution mislabelled as one) remains invaluable. — Relation to the thesis: this strengthens and partly reframes the paper. It suggests the pendulum metaphor, while apt, may understate how many organisations never actually left the far swing — they performed Agile without absorbing it, and the agent simply re-legitimises their original instinct. The two-camps framing has been folded into Section 2; it is recorded here in fuller form because it is a substantial perspective in its own right.

Perspective F — "Agents could be the greatest accelerant iterative practice has ever had." A constructive extension holds that the paper, for all its even-handedness, still frames the agent largely as a hazard to be managed, and that the genuine upside deserves equal prominence: if implementation is cheap and fast, feedback loops can collapse from weeks to hours, learning can accelerate dramatically, and team structures can become smaller, tighter, and more directly connected to users. On this view AI, used to accelerate learning and feedback rather than to justify upfront specification and phase gates, could be a huge enabler of Agile rather than a threat to it. The fork is stark and worth stating in the perspective's own terms: use AI to accelerate learning and feedback, and it is a profound enabler; use it to justify more upfront specification and stronger phase gates, and we are simply rediscovering old mistakes with better tooling. — Relation to the thesis: this is the optimistic mirror of the whole paper and the paper embraces it. Section 16 was expanded to give the shorter-feedback-loops and collapsed-team-structure possibilities their full due. It is recorded here because it is not merely a qualification but a genuine alternative future — the same tool, the same moment, a different choice — and the register should hold the bright possibility as prominently as the dark one.

20. How this register is maintained

This part is permanent infrastructure, not a one-time concession. As the paper grows, the rules for it are: any serious perspective that challenges or materially qualifies the thesis is added here at full strength, in its own terms, before any response is formed. Perspectives are revised as their holders refine them and as evidence bears on them. A perspective that the accumulating evidence (Part II) confirms is not quietly deleted to protect the thesis; it is marked confirmed, and the body of the paper is corrected to match — because the method this paper exists to defend is precisely that one updates the position when the feedback arrives, rather than freezing it and defending the gate. The register's health is a measure of the paper's honesty. If it ever shrinks while the body's confidence grows, something has gone wrong.

Appendices

Appendix A — Evidence capture template

Brief. A short reusable template the author fills in each time a fresh observation occurs, so entries stay consistent and rise above anecdote. Suggested fields: Date · Impact area (team/verification/software/business–IT trust/profession) · The observation (anonymised, factual) · The structural mechanism it illustrates · The lesson being un-learned · Counter-argument / steel-man (what's genuinely defensible about it) · Status (one-off / emerging pattern / confirmed pattern) · Cross-references. The steel-man field is non-negotiable: it's what keeps the paper credible and the author honest.

Template follows. Copy one block per observation into the relevant Part II section's observation log. Keep entries factual and anonymised; the steel-man field is mandatory, not optional.

Observation entry template

Date logged: (DD/MM/YYYY)
Impact area: (Team / Verification / Software / Business–IT Trust / Profession — pick the primary one; cross-reference others if relevant)
The observation: (What actually happened. Anonymised — no names, no identifying product or org detail. Factual and specific: what was said, decided, shipped, or broke. Resist editorialising here; the analysis goes in the next fields.)
The structural mechanism it illustrates: (Which mechanism from the paper does this instantiate? e.g. front-loaded gate / verification loop closed on itself / Knowledge Debt widening / homeless cross-cutting concern / over-promise cycle / apprenticeship dissolved. This is the field that turns anecdote into evidence — if you can't name a mechanism, the entry may not belong.)
The lesson being un-learned: (Which hard-won iterative-era lesson does this episode quietly discard? Tie it back to the thesis.)
Counter-argument / steel-man: (MANDATORY. The strongest fair case that this was reasonable, or that it will turn out fine, or that leadership's choice was defensible given what they knew. If you cannot state a credible steel-man, either you don't yet understand the situation well enough to log it, or it isn't evidence — it's a grievance. Do not skip this field.)
Status: (One-off / Emerging pattern / Confirmed pattern. Upgrade status only when repetition across independent instances justifies it. A single vivid episode is a one-off until it recurs.)
Cross-references: (Other entries this connects to; relevant section numbers, e.g. Section 3, Section 11.)

Worked example (illustrative, fictional — shows the level of discipline expected)

Date logged: (example)
Impact area: Verification (cross-ref: Software)
The observation: A feature was approved at the prompt-and-estimate gate, built by an agent, demonstrated successfully to stakeholders, and marked done. Three weeks later a defect surfaced in a code path that the original prompt had not described and that no human had read before release.
The structural mechanism it illustrates: The verification loop closed on itself (Section 11). The only check was front-loaded approval of the prompt; there was no back-end gate, so generated code in the prompt's blanks reached production unread.
The lesson being un-learned: That "done" must rest on an external arbiter (a passing test, a comprehending reviewer), not on approval of intent.
Counter-argument / steel-man: The feature did work as specified; the defect was in an edge case that might have slipped through human-written code too. Front-loading review is efficient for the large benign majority of features, and a heavier back-end gate on everything would slow delivery the business is asking to speed up. A fair reading is that the process optimises for the common case and this was an uncommon one.
Status: One-off (watch for recurrence before upgrading).
Cross-references: Section 3, Section 11, Section 12.
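
The template and its two hard rules (a mandatory steel-man, and status upgrades only on repetition) can also be sketched as a small data structure. What follows is an illustrative Python sketch, not part of the paper's method. The names `Observation`, `Status`, and `status_for`, and the thresholds `EMERGING_AT` and `CONFIRMED_AT`, are hypothetical; the paper gives no numeric thresholds. The code also assumes that every logged entry is an independent instance, which in practice is a judgment the author has to make.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ONE_OFF = "One-off"
    EMERGING = "Emerging pattern"
    CONFIRMED = "Confirmed pattern"


# Hypothetical thresholds. The template only says that status is upgraded
# when repetition across independent instances justifies it.
EMERGING_AT = 2
CONFIRMED_AT = 4


@dataclass
class Observation:
    date_logged: str          # DD/MM/YYYY, as in the template
    impact_area: str          # primary area; others go in cross_references
    observation: str          # anonymised and factual
    mechanism: str            # the structural mechanism it illustrates
    lesson_unlearned: str
    steel_man: str            # mandatory, per the template
    cross_references: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The template treats an entry without a steel-man as a grievance,
        # not evidence, so an empty field is rejected outright.
        if not self.steel_man.strip():
            raise ValueError("steel_man is mandatory; do not skip this field")


def status_for(mechanism: str, log: list[Observation]) -> Status:
    """Derive status from how often a mechanism recurs in the log.

    Assumes each entry is an independent instance (not checked here).
    """
    count = sum(1 for entry in log if entry.mechanism == mechanism)
    if count >= CONFIRMED_AT:
        return Status.CONFIRMED
    if count >= EMERGING_AT:
        return Status.EMERGING
    return Status.ONE_OFF


# The worked example above, abbreviated.
log = [
    Observation(
        date_logged="(example)",
        impact_area="Verification",
        observation="A defect surfaced in a code path no human had read before release.",
        mechanism="verification loop closed on itself",
        lesson_unlearned='"Done" must rest on an external arbiter, not approval of intent.',
        steel_man="The feature worked as specified; the edge case might have "
                  "slipped through human-written code too.",
        cross_references=["Section 3", "Section 11", "Section 12"],
    )
]

print(status_for("verification loop closed on itself", log).value)  # One-off
```

Deriving the status from the log, rather than storing it on each entry, is a deliberate choice in this sketch: a single vivid episode cannot be upgraded by hand, only by recurrence.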

Appendix B — Glossary

Brief. Define the load-bearing terms so the public/industry audience shares your vocabulary: Pendulum, Knowledge Debt, Big Design Up Front, Waterfall, iterative/incremental delivery, feedback loop, predictability-through-control vs. predictability-through-cadence, AI-ready prompt, agent, steel-man. Add from source material: the risk/novelty/complexity space (Booch's three axes); architectural significance ("measured by cost of change"); engineering "smell" (intuition that something is subtly wrong); vibe coding vs. enterprise vibe coding (and why they're the same act); explore-the-solution-space (the legitimate agent productivity gain); clankers (agent-generated PR/contribution flood); unreliable narrator (Booch's framing of LLM confabulation).

Draft definitions follow. Tighten or expand to taste; these establish shared vocabulary for a reader who may not share the author's background.

The Pendulum — The recurring motion by which the software field swings between trusting up-front specification and trusting iterative feedback. Each swing is propelled by a narrative that a new advance has made the old constraints obsolete; each swing back is driven by rediscovering that it has not. The series-level frame of which this paper documents one instance.
Knowledge Debt — The accumulating gap between the code a system contains and the comprehension the team actually holds of it. Distinct from technical debt (messy code you can see); Knowledge Debt is the invisible absence of understanding, incurred when implementation — and with it comprehension — is delegated.
Big Design Up Front (BDUF) — The practice of fully specifying and designing a system before building it, on the assumption that requirements can be known and frozen in advance. The core practice this paper argues the swing reintroduces.
Waterfall — The sequential, phase-gated development model (requirements → design → build → test → ship) popularly traced to Royce's 1970 paper — which, notably, presented it as a model to avoid. Shorthand for the discredited up-front-and-sequential approach.
Iterative / incremental delivery — Building software in small cycles, each producing something real enough to learn from, steering by feedback rather than by an up-front plan. The correction the Agile movement named in 2001.
Feedback loop — The build-a-little, learn, adjust cycle on which iterative practice depends. The paper argues the swing breaks it at both ends: deferring discovery at the front, delivering un-comprehended code at the back.
Predictability-through-control vs. predictability-through-cadence — Two kinds of predictability. The first promises exact scope on an exact date (seductive, unkeepable for novel work). The second promises reliable rhythm and dependable steering (less impressive, genuinely deliverable). The swing trades the second for the first.
AI-ready prompt — A structured natural-language description of a desired feature, complete enough (in theory) for an agent to implement against. The paper's contention: structurally a specification, and necessarily incomplete, because completeness is the code.
Agent — An AI system that can carry out multi-step implementation work from a natural-language instruction. Genuinely powerful at production; characterised in the source material as an eager junior that does not know what it does not know.
Risk / novelty / complexity space — Three independent axes (cost of being wrong; how well-trodden the ground is; deployment vs. architectural complexity) along which any software work can be located. Agents excel in the low-risk, low-novelty, deployment corner; the swing errs by imposing that corner's playbook across the whole space.
Architectural significance — The property of a design decision measured by the cost of changing it later. Significant decisions are exactly what a per-feature delivery model is structurally blind to.
Engineering "smell" — The cultivated, hard-to-articulate intuition that something is subtly wrong, built through friction and experience; one of the durable human skills the agent cannot supply and skill-atrophy threatens.
Vibe coding / enterprise vibe coding — Casually dictating an app at a high level vs. writing a detailed spec first. The source material's contention: structurally the same act, since both hand the agent gaps to fill with defaults.
Explore-the-solution-space — Using agents to prototype and compare multiple approaches cheaply before committing. The legitimate, substantial productivity gain, because it accelerates the real bottleneck (thinking) rather than the false one (typing).
Clankers — Informal term for the flood of agent-generated contributions (e.g. open-source pull requests) characterised by high volume, lavish descriptions, and low mergeability; the parable of volume drowning the comprehension signal.
Unreliable narrator — A framing of large language models as non-deterministic, prone to confabulation, and without grounding in truth — hence always requiring guard rails. Useful shorthand for why generated output cannot be trusted unverified.
Steel-man — The strongest, fairest version of an opposing argument, stated before it is engaged. The discipline that keeps this paper credible rather than a grievance; mandatory in every evidence entry.

Appendix C — Corroborating voices

Brief. A short, honest sourcing note. This paper's primary evidence is first-hand field observation; its secondary support is the wider practitioner conversation. Two voices are drawn on repeatedly and should be credited plainly: a veteran software architect (judgment, the engineering spectrum, the risk/novelty/complexity model, "smell," and a measured optimism about the profession) and a coding-agent author (the waterfall-return argument, "code is never free," the disciplined explore-then-implement workflow, the economic/vendor-pressure observations, and the contribution-flood parable). Discipline for the author: these are corroboration, not proof. The argument must stand on the field evidence and the reasoning; where an expert agrees, cite briefly and move on. A paper that leans on agreeing authorities to carry its weight is as fragile as the up-front spec it critiques. Quote sparingly, paraphrase in the author's own words where possible, attribute by role rather than by elaborate credential-stacking, and resist the pull to treat "an expert said so" as the argument.

Sourcing note follows.

The evidence in this paper is of two kinds, and honesty requires keeping them distinct. The primary evidence is first-hand: direct observation of an organisation undergoing the swing, gathered in real time and logged, anonymised, under the discipline of Appendix A. That is the spine. The secondary support is the wider practitioner conversation — the views of experienced people, encountered through long-form discussion, who have arrived independently at parts of the same diagnosis.

Two such voices recur in this document and are owed plain credit. The first is a veteran software architect whose long career spans the building of large systems, from whom this paper draws the framing of software engineering as a spectrum from physics to ethics, the risk/novelty/complexity map, the notion of engineering "smell," the characterisation of the agent as an eager junior and of language models as unreliable narrators, and — importantly — the measured optimism that it is a good time to be an engineer if the profession protects what the tool cannot supply. The second is the author of a minimal coding agent, a working practitioner who builds and uses these tools daily, from whom this paper draws the direct argument that spec-driven development is a return to waterfall, the formulations that the most detailed spec is the program itself and that code is never free, the disciplined explore-then-implement-then-review workflow that Section 16 builds upon, the observations about token pricing and the vendor shift toward enterprise buyers, and the parable of the contribution flood.

The discipline this paper holds itself to regarding these voices is the same one it asks of any evidence entry, and it is worth stating openly so the reader can hold the author to it. These are corroboration, not proof. That two thoughtful practitioners independently see parts of the same pattern is meaningful, since independent convergence is genuine evidence, but it is not the argument, and it must never be allowed to become the argument. The case stands or falls on the reasoning and the first-hand evidence; the expert agreement is a check on that reasoning, not a substitute for it. To lean on agreeing authorities to carry the weight would be to commit, in the paper's own structure, the very error it diagnoses elsewhere: mistaking a confident external narrative for an independently verified truth. The voices are cited by role rather than by credential-stacking, quoted briefly or paraphrased in the author's own words where possible, and then set down. Where they agree, that is reassuring. If the first-hand evidence ever contradicts them, the first-hand evidence wins, because that, in the end, is the method this entire paper exists to defend.