Shell Theory: Solving the AI Amplification Paradox
My teammate and I keep having the same argument. One moment we're witnessing the cascade: software engineers becoming empty shells, mere relays for whatever the AI produces. The next moment we're amazed at how iterative collaboration with AI keeps pushing our boundaries beyond what we thought achievable.
Both observations feel true simultaneously.
The Contradictions
Three narratives keep surfacing in our discussions:
- "AI alone delivers 80% of expert quality." Even without deep expertise, you can get pretty far.
- "AI is an amplifier of operator skill." The better you are, the higher the yield.
- "Relying on AI output reduces the impulse to think critically." The more you trust it, the less you question it.
These statements seem logically incompatible. The skill-amplification argument is moot if everyone gets expert quality by default. Relying more on AI can't simultaneously amplify skill and erode critical thinking. And if AI delivers expert quality regardless, there's no reason to worry about a reduction in critical thinking.
I've been trying to turn this tension into something concrete. What follows is a model. It's not scientifically validated, but it's useful for reasoning about what we're observing.
The Model
Without AI
Let's define output without AI as a function of agency (V) and experience level (E):
R_noAI(V, E) = V × (1 + E)
Where:
- V = Agency: For this model, agency is defined as the combination of curiosity (the drive to question), discipline (the persistence to follow through), and cognitive engagement (the capacity for reasoning and imagination). Agency is active, not passive. It can grow through deliberate practice or atrophy through disuse. Unlike fixed traits, agency responds to how you work.
- E = Experience level (junior through principal)
Agency is the foundation. Experience multiplies your output. In this model a principal engineer with low agency still produces less than a senior with high agency.
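A minimal sketch of that baseline in Python (the function name and example values are mine; the agency numbers anticipate the calibration later in the post):

```python
def r_no_ai(agency: float, experience: int) -> float:
    """Baseline output without AI: agency scaled linearly by experience."""
    return agency * (1 + experience)

# The claim above: a low-agency principal vs. a high-agency senior
print(r_no_ai(agency=15, experience=4))  # 75
print(r_no_ai(agency=35, experience=3))  # 140
```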
With AI
Here's where it gets interesting. AI introduces exponential amplification, but that amplification is masked by a floor until your yield exceeds it:
Y(V, E) = V × (1 + E) × exp(A × ((1 + E)/2 - 1))
R_AI(V, E) = max(F, Y(V, E))
Where:
- F = AI Flatline: The fixed output floor that AI alone can deliver
- A = Amplification Rate: Controls how steeply experience compounds into output
- Y = Yield: Your AI-amplified output before the flatline is applied
In plain terms:
- The Yield Y(V, E): AI applies exponential amplification to your baseline output from the start. This yield exists whether you're a junior or a principal. It's always there, compounding your agency and experience.
- The Flatline F: If your amplified yield Y falls below the flatline F, you only see F. AI rescues you to a fixed output level. Either you don't yet have the skills to know which questions to ask, or you don't have the curiosity to ask them.
- The Amplification Zone: Once Y exceeds F, you see your full exponential yield. The more you bring, the more AI compounds it. Amplification is still capped, though: not by AI's ceiling, but by ours. Our cognitive limits bound what we can meaningfully process.
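Here's a small Python sketch of the two equations. The function names are mine, and the parameters F and A are left as arguments because they're only calibrated later in the post:

```python
import math

def yield_ai(agency: float, experience: int, amp_rate: float) -> float:
    """Y(V, E): the AI-amplified yield, before the flatline is applied."""
    return agency * (1 + experience) * math.exp(amp_rate * ((1 + experience) / 2 - 1))

def r_ai(agency: float, experience: int, amp_rate: float, flatline: float) -> float:
    """R_AI(V, E): what you actually see; yields below the flatline are rescued to it."""
    return max(flatline, yield_ai(agency, experience, amp_rate))
```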
What This Model Explains
Resolving contradictions 1 and 2: The floor-ceiling mechanism.
The model resolves the apparent conflict between "AI gives everyone 80%" and "AI amplifies skill": they are mathematically distinct effects operating on different populations. When Y < F, you're being rescued to the flatline. You ship working code and feel productive, but your output is artificially lifted to a floor you can't yet exceed on your own. When Y > F, AI triggers exponential amplification; your yield scales with both agency and experience. The max(F, Y) function guarantees the floor, while the exponential in Y raises the ceiling.
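As a usage example of the sketch above, with the values calibrated later in the post (F = 74, A ≈ 0.462), the two regimes look like this:

```python
A, F = 0.462, 74

# Below the flatline: a moderate-agency junior (V=25, E=1)
print(yield_ai(25, 1, A))   # 50.0 -- the raw yield
print(r_ai(25, 1, A, F))    # 74   -- rescued to the flatline

# Above the flatline: a moderate-agency principal (V=25, E=4)
print(yield_ai(25, 4, A))   # ~250 -- the raw yield
print(r_ai(25, 4, A, F))    # ~250 -- the amplification is fully visible
```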
Resolving contradiction 3: Agency matters.
The concern that "relying on AI erodes critical thinking" isn't addressed by the model's equations. It's addressed by how agency is defined. The model treats V as a parameter, but the definition describes agency as responsive: growing through deliberate practice, atrophying through disuse.
This creates a feedback loop that the math alone doesn't show:
- If you're rescued to the flatline, you ship without needing to engage critically
- Without critical engagement, agency (V) atrophies over time
- Lower agency produces lower yield, pushing you further below the flatline
- The more AI rescues you, the more you need to be rescued
The model doesn't predict this erosion, but it explains why the erosion matters. Your position isn't determined by a one-time calculation; it's determined by what V becomes over time. Two engineers with identical starting positions can diverge: one practices deliberate engagement and grows V, eventually breaking into the amplification zone. The other accepts the rescue, lets V decay, and becomes permanently dependent on the flatline.
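The loop isn't in the equations, but a toy simulation can show the divergence. This is purely illustrative: the 5% growth and decay rates are arbitrary assumptions of mine, and it reuses yield_ai, A, and F from the sketches above:

```python
def simulate_agency(v0: float, experience: int, engages: bool, steps: int = 10) -> float:
    """Toy dynamic: agency grows with deliberate engagement, atrophies when you accept the rescue."""
    v = v0
    for _ in range(steps):
        rescued = yield_ai(v, experience, A) < F
        if engages or not rescued:
            v *= 1.05  # deliberate practice (or genuine amplification) grows agency
        else:
            v *= 0.95  # accepting the rescue lets agency decay
    return round(v, 1)

# Two mid-level engineers (E=2) starting with identical agency just below the flatline
print(simulate_agency(v0=18, experience=2, engages=True))   # ~29.3 -- breaks into the amplification zone
print(simulate_agency(v0=18, experience=2, engages=False))  # ~10.8 -- permanently dependent on the flatline
```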
This is why contradiction 3 coexists with contradictions 1 and 2. AI can amplify skill (for those above the flatline). AI does provide a floor (for everyone). And AI may erode critical thinking (for those who let the floor replace engagement). The determining variable is whether you treat the flatline as a safety net or a hammock.
Calibrating the Model
Feel free to skip this section. What follows is an attempt to ground the model in real-world anchors. If you're not interested in the math, jump ahead to The Model in Practice. The calibration points that follow are illustrative; the model's value isn't in the exact numbers but in the structural insight: the floor and the amplification are distinct mechanisms operating on different populations.
Defining Experience Levels (E)
Real expertise development has been hypothesized to follow different types of curves: logarithmic (rapid early gains, diminishing returns), S-shaped (slow start, steep middle, plateau), or non-uniform (qualitative leaps between levels). There's no consensus in the literature on which pattern best describes software engineering capability growth. Linear is simple, and for a thought experiment, simple wins:
- E = 1 → Junior
- E = 2 → Mid-level
- E = 3 → Senior
- E = 4 → Principal
Note that with this definition, juniors get no exponential amplification by design: at E = 1 the exponent in Y is zero, so exp(0) = 1. The ability to leverage AI beyond the flatline is in itself what distinguishes a higher experience level.
Defining the Flatline (F)
To determine F, we need a reference point. AGI provides that anchor. But what is AGI in measurable terms?
The SWE-bench Verified benchmark offers a concrete proxy. This benchmark evaluates AI models on 500 real GitHub bug-fixing tasks where they must produce patches that pass actual test suites. Crucially, these tasks come with human time estimates: 91% take less than one hour for an "experienced engineer" who has "a few hours to familiarize themselves with the codebase."
The phrase "experienced engineer" is key. SWE-bench was calibrated against senior-level performance on well-defined tasks. And it only measures a slice of engineering work: fixing bugs in existing codebases with clear acceptance criteria. It doesn't measure architectural decisions, system design, handling ambiguous requirements, or greenfield development.
This gives us our first anchor: AGI on SWE-bench (F = 100) corresponds to the output of a moderate-agency senior on well-defined work.
The leading models today (Claude Opus 4.5 and Gemini 3 Pro) score around 74% on SWE-bench Verified. This gives us the current flatline: F = 74.
Deriving Agency (V)
If a moderate senior (E=3) produces 100 on SWE-bench-type tasks without AI:
R_noAI = V × (1 + E) = 100
V × 4 = 100
V = 25
This gives us: V = 25 represents moderate agency.
With V = 25 as the anchor, we model agency as normally distributed across the engineering population. Assuming a variation of 20% with μ = 25 gives σ = 5:
- V = 15 = Low agency (μ - 2σ, ~2nd percentile)
- V = 25 = Moderate agency (μ, 50th percentile)
- V = 35 = High agency (μ + 2σ, ~98th percentile)
The exact values are illustrative. What matters is the concept: Agency varies meaningfully across engineers, and this variance affects output before AI even enters the equation.
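A quick check of those percentile claims, using only the assumed μ = 25 and σ = 5 from above (standard library only):

```python
from statistics import NormalDist

agency = NormalDist(mu=25, sigma=5)  # assumed distribution of agency across engineers
print(round(agency.cdf(15) * 100, 1))  # 2.3  -> low agency sits near the 2nd percentile
print(round(agency.cdf(25) * 100, 1))  # 50.0 -> moderate agency is the median
print(round(agency.cdf(35) * 100, 1))  # 97.7 -> high agency sits near the 98th percentile
```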
If you have lower agency, you can still reach the flatline. It just takes longer to get there.
Deriving the Amplification Rate (A)
To define the slope of the exponential curve, we need a calibration point: a known relationship between experience and amplification. Without empirical data stratified by experience level, we make an informed assumption that at principal level (E = 4), AI amplifies output to 2× R_noAI.
This reflects a hypothesis that the most experienced engineers can roughly double their effective output through AI collaboration: leveraging it for code generation, exploration, and iteration while applying judgment that compounds the result.
Setting the exponential factor equal to the target amplification, so that exp(A × S) = C, and taking the natural logarithm of both sides gives:
A = ln(C) / S = Exponential Amplification Rate
Where: C = target amplification factor at the calibration level, and S = (E - 1)/2 = Experience Offset
With E = 4 and C = 2:
S = (4 - 1)/2 = 1.5
A = ln(2) / 1.5 ≈ 0.462
This anchors the exponential curve. AI is powerful enough to meaningfully amplify engineers, with the curve calibrated to reach 2× at principal level. If future evidence suggests principals achieve a different amplification ratio—say, 3×—then A would become ln(3)/1.5 ≈ 0.732.
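The same derivation as a runnable check (amplification_rate is my helper name, not part of the model):

```python
import math

def amplification_rate(target_factor: float, experience: int) -> float:
    """Solve exp(A * S) = C for A, with S = (E - 1) / 2."""
    s = (experience - 1) / 2
    return math.log(target_factor) / s

print(round(amplification_rate(2, 4), 3))  # 0.462 -- the 2x-at-principal calibration
print(round(amplification_rate(3, 4), 3))  # 0.732 -- if principals turned out to reach 3x instead
```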
The Model in Practice
With each parameter anchored to an assumption, here's how the model plays out:
Parameters:
- F = 74 (current AI flatline, based on ~74% SWE-bench performance)
- E = Experience level: 1 (Junior), 2 (Mid-level), 3 (Senior), 4 (Principal)
- V = Agency: 15 (low), 25 (moderate), 35 (high)
- A ≈ 0.462 (amplification rate, derived from 2× calibration at E=4)
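For readers who want the raw numbers, this self-contained sketch evaluates R_AI for every combination of the parameters above:

```python
import math

F, A = 74, 0.462

def visible_output(v: float, e: int) -> float:
    y = v * (1 + e) * math.exp(A * ((1 + e) / 2 - 1))  # the raw yield Y(V, E)
    return max(F, y)                                    # flatline rescue, if needed

for e, level in [(1, "Junior"), (2, "Mid-level"), (3, "Senior"), (4, "Principal")]:
    row = ", ".join(f"V={v}: {visible_output(v, e):5.1f}" for v in (15, 25, 35))
    print(f"{level:10} {row}")
```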
The asymmetry is clear: below the flatline, the exponential yield exists but is masked; everyone sees 74. Above it, the exponential becomes visible and compounds experience, reaching 2× output at principal level.
Reality
The model assumes you're working within a single domain where your experience level applies uniformly. Reality is messier. A single task often demands multiple skill domains and your experience level differs across them.
Consider a senior backend engineer who doesn't know frontend, building a full-stack feature. AI amplifies their backend work exponentially while rescuing their frontend work to the flatline. The final output can look equally impressive, though.
What differs is the force behind the output. A specialist operating entirely above the flatline earns their amplified output through skill. A generalist in this blended situation achieves high output through a combination: part genuine skill amplification (in their strong domains), part rescue (in their weak domains).
This matters for growth. When output comes from skill, you're building compound advantage. When output comes from the flatline, you're borrowing capability you haven't earned yet. That's not bad, but let's remain honest about which parts of your work are truly yours. Most importantly, your compounding depends on the agency you show and the time you've already spent building experience. AI can accelerate knowledge acquisition, but seniority is knowledge plus experience. You still need to live through production incidents, watch systems evolve, and feel the consequences of decisions that seemed fine until they weren't.
What Happens When We Reach AGI?
The flatline F currently sits at 74, and with AGI the flatline rises to F = 100. Every developer gets lifted to senior-level execution as their new floor on scoped work.
But AGI on SWE-bench doesn't demonstrate the ability to make architectural trade-offs, scope ambiguous problems, or engage in higher-order thinking to determine which tasks should exist in the first place. And benchmarks that do test these capabilities (ARC-AGI-2 for compositional reasoning, RE-Bench for long-horizon R&D, SPIN-Bench for strategic planning under uncertainty) show significant gaps. The exponential amplification zone doesn't disappear, but shifts to a different domain.
When Can We No Longer Best It?
The amplification advantage depends on your yield exceeding the flatline: Y(V, E) > F. As F rises, fewer engineers qualify. At some point, the flatline exceeds what any human can produce even with AI assistance, and the amplification zone disappears entirely.
The maximum AI-assisted output in our model is a high-agency principal: Y(35, 4) = 35 × 5 × 2 = 350. If AI capability reached F = 350, even the best engineers would no longer exceed the flatline. At that point, everyone produces 350, the flatline value, regardless of agency or experience. No amplification, no differentiation. Just the floor. It's the penthouse floor though.
The intermediate thresholds tell the story of who loses amplification as F rises:
| F value | Benchmark equivalent | Who loses amplification |
|---|---|---|
| 74 (today) | 74% SWE-bench | All juniors, low-agency mid-levels |
| 100 (AGI) | 100% SWE-bench | All juniors, low/moderate mid-levels, low-agency seniors |
| 175 | 1.75× senior-level | Everyone except high-agency seniors and moderate+ principals |
| 350 | 3.5× senior-level | Everyone—amplification zone collapses |
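The threshold rows above can be reproduced by checking, for each value of F, which (V, E) combinations still satisfy Y(V, E) > F. A self-contained sketch, reusing the calibrated A ≈ 0.462:

```python
import math

A = 0.462
levels = {1: "junior", 2: "mid-level", 3: "senior", 4: "principal"}
agencies = {15: "low", 25: "moderate", 35: "high"}

def yield_ai(v: float, e: int) -> float:
    return v * (1 + e) * math.exp(A * ((1 + e) / 2 - 1))

for flatline in (74, 100, 175, 350):
    still_amplified = [f"{agencies[v]}-agency {levels[e]}"
                       for e in levels for v in agencies
                       if yield_ai(v, e) > flatline]
    print(flatline, "->", still_amplified or "amplification zone collapses")
```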
Beyond F = 100, we're extrapolating past the SWE-bench anchor into speculation. What the model does predict with more confidence: as F rises, the amplification zone shrinks. The threshold for "bringing enough to exceed the flatline" keeps rising. Whether the threshold will rise enough to exceed human capability is an open question, but the direction of pressure is clear.
So, Does AI Make Us Better or Dumber?
Both. That's the paradox.
If you stay in the flatline zone, accepting AI's first answer, never pushing beyond what it hands you, then you're being rescued, not amplified. Your skills may atrophy. You become a shell.
If you operate above the flatline, using AI as a sparring partner, iterating on its suggestions, bringing domain expertise and architectural judgment, then you're being exponentially amplified. You become more capable than you could be alone.
The same tool, but with two completely different outcomes. The variable is you.