Claude Code, GitHub Copilot, Cursor, and Devin spike consumption x83-x200. We analyze when the energy cost makes sense.
A median Claude Code session consumes 41 Wh: the equivalent of 136 normal text queries — or leaving an LED bulb on for 4 hours.
Code agents multiply consumption x10-x200 compared to a text query. A median Claude Code session consumes 41 Wh — 136 normal queries. An agent makes dozens of calls in a loop with iterative reasoning. The cost is justified for well-defined repetitive tasks; not for open-ended exploration or domains the model doesn't master. No company publishes Wh per task metrics. They should.
[Chart: consumption per session or task, low-high range per tool. Reference: 0.3 Wh = 1 text query.]
41 Wh. That’s what a median session of Claude Code consumes — data measured directly by researcher Simon P. Couch in January 2026.
That’s the same as 136 normal text queries. Or an LED bulb left on for four hours. Or 12% of the energy your laptop consumes in a full workday.
And that’s the median session. A complex session — a full day with the agent active — can reach 50-200 Wh.
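These equivalences are simple arithmetic. A quick sketch makes them reproducible (the 10 W LED figure is our assumption, not part of the measurement):

```python
SESSION_WH = 41       # median Claude Code session (Couch, Jan 2026)
TEXT_QUERY_WH = 0.3   # reference: one normal text query
LED_BULB_W = 10       # assumed: a typical 10 W LED bulb

queries = int(SESSION_WH / TEXT_QUERY_WH)   # text queries that fit in one session
led_hours = SESSION_WH / LED_BULB_W         # hours the bulb could run on that energy

print(queries, led_hours)  # 136 queries, ~4.1 hours
```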
The question isn’t whether the consumption is high. It’s whether it’s worth it.
There’s a fundamental difference between a code assistant and a code agent.
Classic autocomplete — the original GitHub Copilot, smart snippets — makes a single model call per suggestion. The cost is small and one-off: 0.5-2 Wh per work session.
An agent is something else. It doesn’t wait for you to write: it acts. It receives a task in natural language (“implement OAuth authentication,” “migrate these tests to the new API,” “find and fix the performance bug in checkout”) and proceeds autonomously.
To complete that task, the agent:
- reads the relevant files to build context,
- plans the change and writes code,
- runs tests or executes the code,
- interprets errors and iterates until the task is done.
A “simple task” can trigger 20-50 model calls. A complex task, hundreds. And each call includes the accumulated context — all files read, all history — which means the tokens per call also grow over time.
The result: consumption doesn’t scale linearly with task complexity. It scales superlinearly.
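A toy model makes the superlinear growth concrete. Assume (our numbers, purely for illustration) that every call re-sends all the context accumulated so far, and each call adds a fixed amount of new context; total tokens then grow quadratically with the number of calls:

```python
def total_tokens(n_calls: int, base_context: int = 2_000, added_per_call: int = 1_500) -> int:
    """Toy model: each call re-sends the base context plus everything added so far."""
    total = 0
    context = base_context
    for _ in range(n_calls):
        total += context           # tokens sent in this call
        context += added_per_call  # context grows: files read, tool output, history
    return total

# Doubling the number of calls more than doubles total tokens:
print(total_tokens(20))  # 325,000 tokens for 20 calls
print(total_tokens(40))  # 1,250,000 tokens for 40 calls: ~3.8x, not 2x
```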
These are the documented consumption ranges for the main code agents, with their multipliers relative to a simple text query (0.3 Wh as reference):
| Tool | Consumption per session/task | Multiplier |
|---|---|---|
| Aider (open source) | 2-9 Wh | x7-x30 |
| GitHub Copilot Agent | 3-15 Wh | x10-x50 |
| Amazon Q Developer Pro | 4-18 Wh | x13-x60 |
| Windsurf SWE-1 | 5-20 Wh | x17-x67 |
| Cursor AI | 5-25 Wh | x17-x83 |
| OpenAI Codex / GPT-5.1-Codex | 6-20 Wh | x20-x67 |
| OpenAI Codex / GPT-5.3-Codex | 12-40 Wh | x40-x133 |
| Devin 2.0 | 10-60 Wh | x33-x200 |
| Claude Code + Sonnet 4.6 | 25-45 Wh | x83-x150 |
| Claude Code + Opus 4.6 | 45-70 Wh | x150-x233 |
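The multiplier column is just each range divided by the 0.3 Wh reference. A sketch that reproduces it for two rows:

```python
REFERENCE_WH = 0.3  # one simple text query

tools = {
    "Aider (open source)": (2, 9),
    "Claude Code + Opus 4.6": (45, 70),
}

for name, (low, high) in tools.items():
    # Divide each bound by the reference query cost and round
    print(f"{name}: x{low / REFERENCE_WH:.0f}-x{high / REFERENCE_WH:.0f}")
```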
Some observations about this table:
Aider is the positive outlier. The open source agent uses roughly a quarter of the tokens Claude Code needs for equivalent tasks. Efficiency isn't exclusive to commercial tools.
Devin 2.0 is the most unpredictable. The 10-60 Wh range reflects enormous variance: its full autonomous mode can consume as much as an extended Claude Code session with Opus.
GPT-5.3-Codex doubles its predecessor. The jump from x20-x67 to x40-x133 between versions illustrates the trend: models with integrated reasoning cost more, though they’re also more capable.
Of the entire list above, only one public analysis exists with detailed methodology: that of Simon P. Couch, published in January 2026.
Couch analyzed his own work sessions with Claude Code over weeks and documented the following:
“A developer using code agents 8 hours a day consumes an energy equivalent to keeping a refrigerator running for 24 hours.” — Simon P. Couch, Claude Code energy consumption analysis, January 2026
What makes this analysis valuable isn’t just the number: it’s that no one else has published similar data. Not Anthropic, not OpenAI, not GitHub, not Cursor. The companies selling these tools don’t publish Wh per task. They only publish price per token — which is a proxy variable for consumption, but doesn’t equate to actual consumption in context.
Here comes the uncomfortable part of the analysis: the high energy cost may be justified if the productivity gain is real.
Internal data from GitHub points to a +55% speed increase in scoped tasks with Copilot Agent. Studies of teams adopting full code agents report equivalences of 3-4 days of work compressed into one for certain types of tasks.
If that’s true — and the methodology has limitations we’ll discuss — the ROI can be positive even considering energy consumption.
But there’s a problem with this data:
The productivity benchmarks are produced by the companies themselves. GitHub measures the impact of Copilot. Anthropic measures the impact of Claude Code. No independent study has simultaneously measured productivity gains, energy consumption, and the quality of the resulting code under the same conditions.
The rebound effect is real and documented in other technologies: when something becomes faster, it gets used more. A team that adopts code agents doesn’t just do the same things faster — it also generates more code, more iterations, more reviews, more PRs. More total spending? Probably yes.
The question nobody is answering is: does that additional code generate value, or does it just accumulate technical debt?
Not all use cases are equal. These are the situations where the energy cost of a code agent has a clear return:
Migrations and refactoring with well-defined patterns. Migrating from one API version to another, updating dependencies, converting tests from one framework to another. The agent knows the pattern, applies it to hundreds of files with consistency. A human would take days; the agent, hours. The time differential has real business value.
Rapid prototyping where time to market matters. In exploration phases with real deadlines — a demo for investors, an MVP to validate a hypothesis — the speed cost can far exceed the energy cost.
Understanding large codebases. Asking an agent to explain the architecture of a 200,000-line project, trace a function’s flow, or identify all usage points of an API. Here the agent reads more than it writes, and the value lies in synthesis.
Regression tests and coverage. Generating tests for well-documented existing code is predictable and the agent does it well. The freed human time can be dedicated to higher cognitive value tasks.
And these are the situations where the energy cost has no clear return:
Open-ended exploration. “Do something interesting with this data.” “Improve the application’s performance.” “Refactor to make it cleaner.” Without clear success criteria, the agent iterates without converging. Many model calls, an uncertain result, and a manual review that is inevitable anyway.
Domains the model doesn’t master well. If the agent doesn’t know the domain well — a very specific library, an uncommon language, undocumented business logic — it will make mistakes and need many iterations to correct them. High consumption, mediocre result.
Tasks where speed doesn’t matter. If there’s no deadline, if the generated code is going to need exhaustive review anyway, if the team is going to spend more time reviewing what the agent did than it would have taken to write it: the ROI is negative.
When the generated code creates more technical debt than it resolves. Agents are optimizers for completing the assigned task. They have no business context of their own, they don’t know the team’s implicit conventions, they don’t know which parts of the code are most critical. The code they generate can work and still be a problem six months down the road.
There’s a structural problem in how the impact of code agents is being evaluated:
Productivity studies are funded by those who sell productivity. The most cited study on Copilot’s impact is from GitHub, which belongs to Microsoft, which sells Copilot. The most favorable analysis of Claude Code comes from Anthropic. This doesn’t invalidate the data, but it does require reading it with critical thinking.
Success metrics are biased toward what’s easy to measure. Speed of completing a scoped task: measurable. Code quality at six months: not measurable in a three-week study. Accumulated technical debt: also not. Impact on the developer’s ability to maintain and understand their own code: nearly impossible to isolate.
No provider publishes energy consumption metrics per task. Token prices are public. Wh per task are not. The energy transparency demanded of household appliances is not demanded of software tools whose aggregate consumption, across millions of daily sessions, far exceeds that of any washing machine.
At AISHA we make a specific request: that code agent providers publish Wh per task metrics, just as they publish price per token and generation speed. It’s not difficult information to calculate for those who have access to their own systems. It’s information that users and engineering teams need to make informed decisions.
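Until providers publish these numbers, teams can approximate them from token counts, which the APIs do report. A rough sketch — the per-token energy figure is our assumption, back-derived from the article's 0.3 Wh reference query; real values depend on model, hardware, and datacenter:

```python
# ASSUMPTION (ours): a 0.3 Wh text query processes ~500 tokens end to end,
# giving ~0.0006 Wh per token. Treat this as an order-of-magnitude estimate.
WH_PER_TOKEN = 0.3 / 500

def estimate_task_wh(calls: list[tuple[int, int]]) -> float:
    """calls: (input_tokens, output_tokens) for each model call in the agent loop."""
    return sum((inp + out) * WH_PER_TOKEN for inp, out in calls)

# Example: 30 calls averaging 2,000 input + 300 output tokens each
session = [(2_000, 300)] * 30
print(f"{estimate_task_wh(session):.1f} Wh")  # ≈41.4 Wh — close to the measured median
```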
A code agent isn’t better than a human developer. It’s different: faster at certain types of tasks, more energy-costly, with no business context of its own. The decision to use it well requires knowing exactly what type of task you have on your hands.
If you’re a developer: Distinguish what type of task you have before invoking the agent. Repetitive task with clear criteria → agent. Open-ended exploration → write it yourself first. Consider Aider for tasks where maximum autonomy isn’t necessary: roughly a quarter of the consumption for comparable results.
If you lead an engineering team: Establish a usage policy, not just an access policy. Measure total cycle time — including review and correction of generated code — not just generation time. Define which types of tasks justify a full agent versus simple assistance.
If you’re a CTO or technical lead: By Couch’s equivalence, a team of 20 engineers using code agents 6 hours daily consumes the energy of roughly fifteen refrigerators running 24/7. That’s a relevant data point for ESG and for operational costs when compute is pay-per-use.
If you work in technology sustainability: Demand that development tool providers include Wh per task metrics in their dashboards. Cost per token is already published. Cost in Wh should be too — it’s not technically difficult, it’s a transparency decision.
Related
How much energy it really costs for AI to 'think', and why reasoning mode enabled by default is a problem
The definitive guide to energy consumption by model and modality in 2026
Our calculator helps you put queries, images, reasoning and agents into context.
Open calculator