
Shell Theory: A Model That Explains When AI Replaces Programmers

· Mart van der Jagt

My teammate and I keep having the same argument. One moment we're witnessing the cascade: software engineers becoming empty shells, mere relays for whatever the AI produces. The next moment we're amazed at how iterative collaboration with AI keeps pushing our boundaries beyond what we thought achievable.

Both observations feel true simultaneously.

The Contradictions

Three narratives keep surfacing in our discussions:

  1. "AI alone delivers 80% of expert quality." Even without deep expertise, you can get pretty far.
  2. "AI is an amplifier of operator skill." The better you are, the higher the yield.
  3. "Relying on AI output reduces the impulse to think critically." The more you trust it, the less you question it.

These statements seem logically incompatible. The skill-amplification argument is moot if everyone gets expert quality by default. Relying more on AI can't simultaneously amplify skill and erode critical thinking. And if AI gives us expert quality regardless, there's no reason to worry about a reduction in critical thinking.

I've been trying to reconcile these observations into something concrete. What follows is a model. It's not scientifically validated, but it's useful for reasoning about what we're observing.

The Model

Without AI

Let's define output without AI as a function of agency (V) and experience level (E):

R_noAI(V, E) = V × (1 + E)

Where:

  - V = agency: the blend of curiosity, discipline, and cognitive engagement
  - E = experience level, from 1 (junior) to 4 (principal)

Agency is the foundation; experience multiplies your output. In this model a principal engineer with low agency still produces less than a senior with high agency.

With AI

Here it gets interesting. AI introduces exponential amplification, but this amplification is masked by a floor until your yield exceeds it:

Y(V, E) = V × (1 + E) × exp(A × ((1 + E)/2 - 1))

R_AI(V, E) = max(F, Y(V, E))

Where:

  - Y(V, E) = the exponential yield
  - F = the flatline: a fixed output floor AI provides regardless of skill
  - A = the exponential amplification rate

In plain terms:

  1. The Yield Y(V, E): AI applies exponential amplification to your baseline output from the start. This yield exists whether you're a junior or a principal. It's always there, compounding your agency and experience.
  2. The Flatline F: If your amplified yield Y falls below the flatline F, you only see F. AI rescues you to a fixed output level. You don't have the skills yet to know which questions to ask, or you don't have the curiosity to ask those questions.
  3. The Amplification Zone: Once Y exceeds F, you see your full exponential yield. The more you bring, the more AI compounds it. Amplification is capped. Not by AI's ceiling, but by ours. Our cognitive limits bind what we can meaningfully process.
Conceptual diagram showing the Shell Theory model: a horizontal flatline F, an exponential yield curve Y, and the actual output R_AI which follows the flatline until Y exceeds F, then follows the exponential curve into the amplification zone.
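The three regimes can be captured in a few lines. A minimal sketch in Python, using the article's calibrated values (F = 79, A ≈ 0.462, derived later) as defaults:

```python
import math

def yield_Y(V, E, A=0.462):
    """Exponential yield: baseline output V * (1 + E), amplified by experience."""
    return V * (1 + E) * math.exp(A * ((1 + E) / 2 - 1))

def output_with_ai(V, E, F=79, A=0.462):
    """Observed output: the flatline F masks any yield that falls below it."""
    return max(F, yield_Y(V, E, A))

# A moderate-agency junior is rescued to the flatline...
print(output_with_ai(25, 1))         # 79 (the yield of 50 is masked)
# ...while a moderate-agency principal sees the full exponential yield.
print(round(output_with_ai(25, 4)))  # 250
```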

What This Model Explains

Resolving contradictions 1 and 2: The floor-ceiling mechanism.
The model resolves the apparent conflict between "AI gives everyone 80%" and "AI amplifies skill." They are actually mathematically distinct effects operating on different populations. When Y < F, you're being rescued to the flatline. You ship working code and feel productive, but your output is artificially lifted to a floor you cannot pass. When Y > F, AI triggers exponential amplification; your yield scales with both agency and experience. The max(F, Y) function ensures the floor, while the exponential in Y raises the ceiling.

Resolving contradiction 3: Agency matters.
The concern that "relying on AI erodes critical thinking" isn't addressed by the model's equations. It's addressed by how agency is defined. The model treats V as a parameter, but the definition describes agency as responsive: growing through deliberate practice, atrophying through disuse.

This creates a feedback loop that the math alone doesn't show:

  1. The flatline rescues your output, so you ship without needing to engage critically.
  2. Without that engagement, your agency V atrophies.
  3. Lower V produces a lower yield Y, pushing you further below the flatline.
  4. The more AI rescues you, the more you need to be rescued.

The model doesn't predict this erosion, but it explains why the erosion matters. Your position isn't determined by a one-time calculation; it's determined by what V becomes over time. Two engineers with identical starting positions can diverge: one practices deliberate engagement and grows V, eventually breaking into the amplification zone. The other accepts the rescue, lets V decay, and becomes permanently dependent on the flatline.
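The divergence can be sketched as a toy simulation. The 5%-per-year growth and atrophy rates below are illustrative assumptions, not measured values:

```python
import math

A, F = 0.462, 79  # amplification rate and current flatline from the model

def yield_Y(V, E):
    return V * (1 + E) * math.exp(A * ((1 + E) / 2 - 1))

def agency_after(V0, engaged, years=10, rate=0.05):
    """Toy feedback loop: agency compounds with deliberate engagement and
    atrophies without it. The 5%/year rate is an assumed figure."""
    V = V0
    for _ in range(years):
        V *= (1 + rate) if engaged else (1 - rate)
    return V

# Two juniors (E = 1) start with identical moderate agency (V = 25):
print(yield_Y(agency_after(25, True), 1) > F)   # True: deliberate practice breaks into amplification
print(yield_Y(agency_after(25, False), 1) > F)  # False: the rescue becomes permanent
```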

This is why contradiction 3 coexists with contradictions 1 and 2. AI can amplify skill (for those above the flatline). AI does provide a floor (for everyone). And AI may erode critical thinking (for those who let the floor replace engagement). The determining variable is whether you treat the flatline as a safety net or a hammock.

Calibrating the Model

Feel free to skip this section. What follows is an attempt to anchor the model to real-world reference points. If you're not interested in the math, jump ahead to The Model in Practice. The calibration points that follow are illustrative: the model's value isn't in the exact numbers but in the structural insight that floor and amplification are distinct mechanisms operating on different populations.

Defining Experience Levels (E)

Real expertise development has been hypothesized to follow different types of curves: logarithmic (rapid early gains, diminishing returns), S-shaped (slow start, steep middle, plateau), or non-uniform (qualitative leaps between levels). There's no consensus in the literature on which pattern best describes software engineering capability growth. Linear is simple, and for a thought experiment, simple wins:

  - E = 1: Junior
  - E = 2: Mid-level
  - E = 3: Senior
  - E = 4: Principal

Note that with this definition juniors get zero amplification by design: at E = 1 the exponent A × ((1 + E)/2 − 1) equals zero, so the amplification factor is exactly 1. The ability to leverage AI beyond the flatline is itself what distinguishes a higher experience level.

Defining the Flatline (F)

To determine F, we need a reference point. AGI provides that anchor. But what is AGI in measurable terms?

Google DeepMind's Levels of AGI framework proposes evaluating AI systems across two dimensions: performance depth (how well) and generality (how broad). Rather than treating AGI as a single threshold, it defines graduated levels from novice through expert to superhuman. This framing suggests we can anchor AGI to domain-specific benchmarks that measure expert-level performance.

The SWE-bench Verified benchmark offers exactly that. It evaluates AI models on 500 real GitHub bug-fixing tasks where they must produce patches that pass actual test suites. Crucially, these tasks come with human time estimates: 91% take less than one hour for an "experienced engineer" who has "a few hours to familiarize themselves with the codebase."

The phrase "experienced engineer" is key. SWE-bench was calibrated against senior-level performance on well-defined tasks. And it only measures a slice of engineering work: fixing bugs in existing codebases with clear acceptance criteria. It doesn't measure architectural decisions, system design, handling ambiguous requirements, or greenfield development.

This gives us our first anchor: AGI on SWE-bench (F = 100) corresponds to the output of a moderate-agency senior on well-defined work.

The leading models today (Claude Opus 4.5 and Gemini 3 Pro) score around 79% on SWE-bench Verified. This gives us the current flatline: F = 79.

Deriving Agency (V)

If a moderate senior (E=3) produces 100 on SWE-bench-type tasks without AI:

R_noAI = V × (1 + E) = 100
V × 4 = 100
V = 25

This gives us: V = 25 represents moderate agency.

With V = 25 as the anchor, we model agency as normally distributed across the engineering population. Assuming a variation of 20% with μ = 25 gives σ = 5:

  - Low agency: V ≈ 15 (μ − 2σ)
  - Moderate agency: V = 25 (μ)
  - High agency: V ≈ 35 (μ + 2σ)

The exact values are illustrative. What matters is the concept: Agency varies meaningfully across engineers, and this variance affects output before AI even enters the equation.

If you have lower agency, you can still reach the flatline. It just takes longer to get there.

Deriving the Amplification Rate (A)

To define the slope of the exponential curve, we need a calibration point. A known relationship between experience and amplification. Without empirical data stratified by experience level, we make an informed assumption: at principal level (E=4), AI amplifies output to 2× R_noAI.

This anchor reflects the hypothesis that the most experienced engineers can roughly double their effective output through AI collaboration: leveraging it for code generation, exploration, and iteration while applying judgement that compounds the result.

At the calibration point, the exponential factor must equal the target multiple C, i.e. exp(A × S) = C. Taking the natural logarithm of both sides gives:

A = ln(C) / S = Exponential Amplification Rate
Where: S = (E - 1)/2 = Experience Offset, and C = the amplification multiple at the calibration level

With E = 4 and C = 2:

S = (4 - 1)/2 = 1.5
A = ln(2) / 1.5 ≈ 0.462

This anchors the exponential curve. AI is powerful enough to meaningfully amplify engineers, with the curve calibrated to reach 2× at principal level. If future evidence suggests principals achieve a different amplification ratio—say, 3×—then A would become ln(3)/1.5 ≈ 0.732.
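As a sanity check, the derivation above can be computed directly:

```python
import math

def amplification_rate(C, E):
    """Solve exp(A * S) = C for A, where S = (E - 1) / 2."""
    S = (E - 1) / 2
    return math.log(C) / S

print(round(amplification_rate(2, 4), 3))  # 0.462 (calibrated to 2x at principal level)
print(round(amplification_rate(3, 4), 3))  # 0.732 (if principals hit 3x instead)
```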

The Model in Practice

With each parameter anchored to an assumption, here's how the model plays out:

Parameters:

  - V = 25 (moderate agency)
  - E ∈ {1, 2, 3, 4}
  - F = 79 (the current flatline)
  - A ≈ 0.462

Chart showing output values for moderate agency (V=25) across experience levels: Junior gets lifted to flatline at 79, while Mid-level (94), Senior (159), and Principal (250) show increasing amplification above the flatline.

The asymmetry is clear: below the flatline, the exponential yield exists but is masked—everyone sees 79. Above it, the exponential becomes visible and compounds experience, reaching 2× output at principal level.
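The chart's numbers follow directly from the calibrated parameters; a quick reproduction, not new data:

```python
import math

V, F, A = 25, 79, 0.462  # moderate agency, current flatline, amplification rate

for E, level in enumerate(["Junior", "Mid-level", "Senior", "Principal"], start=1):
    Y = V * (1 + E) * math.exp(A * ((1 + E) / 2 - 1))
    print(f"{level:>9}: yield {Y:6.1f} -> observed {round(max(F, Y))}")
# Observed column: 79, 94, 159, 250
```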

Reality

The model assumes you're working within a single domain where your experience level applies uniformly. Reality is messier. A single task often demands multiple skill domains and your experience level differs across them.

Consider a senior backend engineer who doesn't know frontend, building a full-stack feature. AI amplifies their backend work exponentially while rescuing their frontend work to the flatline. The final output can look equally impressive.

What differs is the force behind the output. A specialist operating entirely above the flatline earns their amplified output through skill. A generalist in the blend achieves high output through a combination: part genuine skill amplification (in their strong domains), part being rescued (in their weak domains).

This matters for growth. When output comes from skill, you're building compound advantage. When output comes from the flatline, you're borrowing capability you haven't earned yet. That's not bad, but let's remain honest about which parts of your work are truly yours. Most importantly, your compounding advantage depends on the agency you show and the time you have already spent building experience. AI can accelerate knowledge acquisition, but seniority is knowledge plus experience. You still need to live through production incidents, watch systems evolve, and feel the consequences of decisions that seemed fine until they weren't.

What Happens When We Reach AGI?

Before we continue, a quick recap of what AGI means. It's not machines taking over. The formal definition is AI matching the cognitive capability of a well-educated adult. In this model we have anchored that to SWE-bench: AGI translates to senior-level execution on well-scoped tasks.

So: the flatline F currently sits at 79, and with AGI it rises to F = 100. Every developer will be lifted to senior-level execution as their new floor on scoped work, and it will be harder to escape the shell zone. Only high-agency mid-levels and above will reliably enter the amplification zone, and low-agency engineers may remain trapped in the shell zone even at senior level.

But AGI on SWE-bench doesn't demonstrate the ability to make architectural trade-offs, scope ambiguous problems, or engage in higher-order thinking to determine which tasks should exist in the first place. And benchmarks that do test these capabilities (ARC-AGI-2 for compositional reasoning, RE-Bench for long-horizon R&D, SPIN-Bench for strategic planning under uncertainty) show significant gaps. The exponential amplification zone doesn't disappear, but shifts to a different domain.

When Can We No Longer Best It?

The amplification advantage depends on your yield exceeding the flatline: Y(V, E) > F. As F rises, fewer engineers qualify. At some point, the flatline exceeds what any human can produce even with AI assistance, and the amplification zone disappears entirely.

The maximum AI-assisted output in our model is a high-agency principal: Y(35, 4) = 35 × 5 × 2 = 350. If AI capability reached F = 350, even the best engineers would fall below the flatline. At that point, everyone produces 350—the flatline value—regardless of agency or experience. No amplification, no differentiation. Just the floor. It's the penthouse floor though.

The intermediate thresholds tell the story of who loses amplification as F rises:

| F value | Benchmark equivalent | Who loses amplification |
| --- | --- | --- |
| 79 (today) | ~79% SWE-bench | All juniors, low-agency mid-levels |
| 100 (AGI) | 100% SWE-bench | All juniors, low/moderate mid-levels, low-agency seniors |
| 175 | 1.75× senior-level | Everyone except high-agency seniors and moderate+ principals |
| 350 | 3.5× senior-level | Everyone; the amplification zone collapses |
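These thresholds can be checked by sweeping F over the twelve agency/experience cohorts, using the illustrative agency tiers of 15, 25, and 35:

```python
import math

A = 0.462
agencies = {"low": 15, "moderate": 25, "high": 35}   # illustrative tiers around mu = 25
levels = {"junior": 1, "mid": 2, "senior": 3, "principal": 4}

def yield_Y(V, E):
    return V * (1 + E) * math.exp(A * ((1 + E) / 2 - 1))

for F in (79, 100, 175, 350):
    amplified = [f"{a}-agency {l}" for l, E in levels.items()
                 for a, V in agencies.items() if yield_Y(V, E) > F]
    print(f"F = {F:3}: {len(amplified):2} of 12 cohorts still amplified")
```

The counts shrink monotonically as F rises, and at F = 350 no cohort clears the flatline, matching the collapse described above.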

Beyond F = 100, we're extrapolating past the SWE-bench anchor into speculation. What the model does predict with more confidence: as F rises, the amplification zone shrinks. The threshold for "bringing enough to exceed the flatline" keeps rising. Whether the threshold will rise enough to exceed human capability is an open question, but the direction of pressure is clear.

So, Does AI Make Us Better or Dumber?

Both. That's the paradox.

If you stay below the flatline, accepting AI's first answer, never pushing beyond what it hands you, then you're being rescued, not amplified. Your skills may atrophy. You become a shell.

If you operate above the flatline, using AI as a sparring partner, iterating on its suggestions, bringing domain expertise and architectural judgment, then you're being exponentially amplified. You become more capable than you could be alone.

The same tool, but with two completely different outcomes. The variable is you.


Curious where you land? Take the Shell Theory self-test to find out whether AI is amplifying your skills or turning you into a shell. And if you lead a team, read how to retain the high-agency engineers who are most at risk of leaving in the AI era.

Frequently Asked Questions

Will AI replace programmers and software engineers?

Not as a category, but it will replace the contributions of those who stop engaging critically. AI can already match roughly 80% of senior-level performance on well-defined tasks. For programmers and software engineers operating below this flatline, AI effectively produces their output for them. For those operating above it, AI exponentially amplifies what they bring. The determining factor is not whether AI gets smarter, but whether you bring enough agency and experience to operate above what AI delivers alone.

Is AI taking entry-level jobs? Should junior developers worry?

AI is taking entry-level tasks, not entry-level careers. The flatline means that AI already produces the output a junior would deliver on well-defined work. But output is not the same as capacity. Juniors with high agency—curiosity, discipline, initiative—use AI as scaffolding to learn faster and move into the amplification zone. Juniors without agency become permanently dependent on output they cannot evaluate. The question is not whether junior developer jobs are disappearing, but whether you are building the agency to move beyond them. See The Junior Developer of the Future for more on this.

How does AI erode skills over time?

Through a feedback loop. If the flatline rescues your output, you ship without needing to engage critically. Without that engagement, your agency (curiosity, discipline, cognitive engagement) atrophies. Lower agency produces lower yield, pushing you further below the flatline. The more AI rescues you, the more you need to be rescued. This is the mechanism behind concerns that AI makes developers lazy—it’s not laziness per se, but a gradual erosion of the habits that keep you above the floor.

What does “high agency” mean in this context?

Agency is the combination of curiosity (the drive to question), discipline (the persistence to follow through), and cognitive engagement (the capacity for reasoning and imagination). It is not a fixed trait. Agency grows through deliberate practice and atrophies through disuse. High-agency developers don’t just use AI—they challenge its output, iterate on suggestions, and apply their own judgment. This is what separates those in the amplification zone from those stuck at the flatline.

How do I know if I’m becoming too dependent on AI?

Ask yourself: are you treating the flatline as a safety net or a hammock? If you accept AI output without questioning it, skip understanding concepts because “it works,” or feel less capable when AI isn’t available, you may be in the shell zone. Take the self-test to find out where you stand.