[Image: Visualization of energy flows in programming sessions with AI agents, heat scale]

Your Code Agent Consumes More Energy Than You Writing by Hand — But Is It Worth It?

Claude Code, GitHub Copilot, Cursor, and Devin multiply consumption anywhere from x10 to more than x200 over a simple text query. We analyze when the energy cost makes sense.

By AISHA · April 10, 2026 · 8 min read

A median Claude Code session consumes 41 Wh: the equivalent of 136 normal text queries — or leaving an LED bulb on for 4 hours.

Code agents multiply consumption x10-x200 compared to a text query. A median Claude Code session consumes 41 Wh, the equivalent of 136 normal queries. An agent makes dozens of model calls in a loop with iterative reasoning. The cost is justified for well-defined, repetitive tasks; not for open-ended exploration or for domains the model doesn't master. No company publishes Wh-per-task metrics. They should.

[Chart: Energy consumption by code agent tool. Consumption per session or task, low-high range; reference: 0.3 Wh = 1 text query. The tools span from Aider (2-9 Wh) to Claude Code + Opus 4.6 (45-70 Wh); the full data appears in the table below.]

41 Wh. That’s what a median session of Claude Code consumes — data measured directly by researcher Simon P. Couch in January 2026.

That’s the same as 136 normal text queries. Or an LED bulb left on for four hours. Or 12% of the energy your laptop consumes in a full workday.

And that’s the median session. A complex session — a full day with the agent active — can reach 50-200 Wh.
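The equivalences above are simple arithmetic. A quick sketch, assuming a 0.3 Wh text query, a 10 W LED bulb, and a laptop drawing roughly 42 W over an 8-hour workday (the bulb and laptop figures are illustrative assumptions, not measured values):

```python
# Rough equivalences for a median 41 Wh Claude Code session.
# LED and laptop figures are illustrative assumptions.
SESSION_WH = 41
QUERY_WH = 0.3              # one simple text query
LED_BULB_W = 10             # a typical LED bulb
LAPTOP_WH_PER_DAY = 42 * 8  # ~42 W over an 8-hour workday

queries = SESSION_WH / QUERY_WH                # ~136-137 text queries
led_hours = SESSION_WH / LED_BULB_W            # ~4.1 hours of light
laptop_share = SESSION_WH / LAPTOP_WH_PER_DAY  # ~12% of a laptop workday

print(f"{queries:.1f} queries, {led_hours:.1f} h of LED light, "
      f"{laptop_share:.0%} of a laptop workday")
```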

The question isn’t whether the consumption is high. It’s whether it’s worth it.


What a code agent is and why it consumes so much

There’s a fundamental difference between a code assistant and a code agent.

Classic autocomplete — the original GitHub Copilot, smart snippets — makes a single model call per suggestion. The cost is small and one-off: 0.5-2 Wh per work session.

An agent is something else. It doesn’t wait for you to write: it acts. It receives a task in natural language (“implement OAuth authentication,” “migrate these tests to the new API,” “find and fix the performance bug in checkout”) and proceeds autonomously.

To complete that task, the agent:

  • Reads project files — sometimes dozens
  • Plans and breaks down the task before touching code
  • Writes code, executes it in a sandbox environment, and reads the output
  • Interprets errors and decides how to fix them
  • Starts over if the result doesn’t pass tests

A “simple task” can trigger 20-50 model calls. A complex task, hundreds. And each call includes the accumulated context — all files read, all history — which means the tokens per call also grow over time.

The result: consumption doesn’t scale linearly with task complexity. It scales superlinearly.
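A toy model makes the superlinear scaling concrete. Suppose each call resends the accumulated context (files read, history) on top of a fixed per-call budget; the token figures below are invented for illustration, not measurements:

```python
# Toy model: each call resends the accumulated context, so total tokens
# grow roughly quadratically with the number of calls.
# All numbers are illustrative assumptions.
BASE_TOKENS = 2_000      # fresh tokens per call (prompt + new output)
CONTEXT_GROWTH = 1_500   # context carried forward grows by this much per call

def total_tokens(n_calls: int) -> int:
    context = 0
    total = 0
    for _ in range(n_calls):
        total += BASE_TOKENS + context  # every call pays for all prior context
        context += CONTEXT_GROWTH
    return total

# Doubling the calls far more than doubles the tokens:
print(total_tokens(10))  # 87500
print(total_tokens(20))  # 325000 -- ~3.7x the tokens, not 2x
```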


The data: tool by tool

These are the documented consumption ranges for the main code agents, with their multipliers relative to a simple text query (0.3 Wh as reference):

Tool                          | Consumption per session/task | Multiplier
------------------------------|------------------------------|-----------
Aider (open source)           | 2-9 Wh                       | x7-x30
GitHub Copilot Agent          | 3-15 Wh                      | x10-x50
Amazon Q Developer Pro        | 4-18 Wh                      | x13-x60
Windsurf SWE-1                | 5-20 Wh                      | x17-x67
Cursor AI                     | 5-25 Wh                      | x17-x83
OpenAI Codex / GPT-5.1-Codex  | 6-20 Wh                      | x20-x67
OpenAI Codex / GPT-5.3-Codex  | 12-40 Wh                     | x40-x133
Devin 2.0                     | 10-60 Wh                     | x33-x200
Claude Code + Sonnet 4.6      | 25-45 Wh                     | x83-x150
Claude Code + Opus 4.6        | 45-70 Wh                     | x150-x233
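The multipliers are just each Wh range divided by the 0.3 Wh reference query; a quick check against a few rows of the table:

```python
# Derive the table's multipliers from the Wh ranges and the 0.3 Wh reference.
QUERY_WH = 0.3

tools = {
    "Aider (open source)": (2, 9),
    "GitHub Copilot Agent": (3, 15),
    "Devin 2.0": (10, 60),
    "Claude Code + Opus 4.6": (45, 70),
}

for name, (low, high) in tools.items():
    print(f"{name}: x{round(low / QUERY_WH)}-x{round(high / QUERY_WH)}")
# Aider: x7-x30, Copilot Agent: x10-x50, Devin: x33-x200, Opus: x150-x233
```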

Some observations about this table:

Aider is the positive outlier. The open source agent consumes roughly a quarter of the tokens Claude Code needs for equivalent tasks. Efficiency isn’t a monopoly of commercial solutions.

Devin 2.0 is the most unpredictable. The 10-60 Wh range reflects enormous variance: its full autonomous mode can consume as much as an extended Claude Code session with Opus.

GPT-5.3-Codex doubles its predecessor’s consumption. The jump from x20-x67 to x40-x133 between versions illustrates the trend: models with integrated reasoning cost more, though they’re also more capable.


The Claude Code case: the only data with public methodology

Of the entire list above, only one public analysis exists with detailed methodology: that of Simon P. Couch, published in January 2026.

Couch analyzed his own work sessions with Claude Code over weeks and documented the following:

  • Median session: 592,000 tokens across 24 interactive exchanges
  • Consumption per median session: 41 Wh
  • Equivalence: 136 normal text queries
  • Intensive use session (multiple instances, full day): 50-200 Wh
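Couch's figures also yield a rough energy intensity per token, handy for back-of-the-envelope estimates. This derivation is ours, not Couch's, and assumes intensity stays constant across session sizes:

```python
# Back-of-the-envelope energy intensity from Couch's median-session figures.
SESSION_WH = 41
SESSION_TOKENS = 592_000

wh_per_1k_tokens = SESSION_WH / (SESSION_TOKENS / 1_000)  # ~0.069 Wh
print(f"~{wh_per_1k_tokens:.3f} Wh per 1,000 tokens")

# Projecting an intensive day at the same intensity (our assumption):
# a 150 Wh day corresponds to roughly 2.2 million tokens.
intensive_day_wh = 150
tokens_implied = intensive_day_wh / wh_per_1k_tokens * 1_000
print(f"~{tokens_implied / 1e6:.1f} M tokens")
```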

“A developer using code agents 8 hours a day consumes the energy equivalent of keeping a refrigerator running for 24 hours.” — Simon P. Couch, Claude Code energy consumption analysis, January 2026

What makes this analysis valuable isn’t just the number: it’s that no one else has published similar data. Not Anthropic, not OpenAI, not GitHub, not Cursor. The companies selling these tools don’t publish Wh per task. They only publish price per token — which is a proxy variable for consumption, but doesn’t equate to actual consumption in context.


The productivity paradox

Here comes the uncomfortable part of the analysis: the high energy cost may be justified if the productivity gain is real.

Internal data from GitHub points to a +55% speed increase in scoped tasks with Copilot Agent. Studies of teams adopting full code agents report equivalences of 3-4 days of work compressed into one for certain types of tasks.

If that’s true — and the methodology has limitations we’ll discuss — the ROI can be positive even considering energy consumption.
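One way to frame the trade-off: the agent's 41 Wh replaces hours you would otherwise spend with your laptop powered on, which consumes energy too. A sketch under assumed numbers (42 W average laptop draw; all figures illustrative):

```python
# Energy break-even: an agent session vs. the laptop-hours it saves.
# All figures are illustrative assumptions.
AGENT_SESSION_WH = 41
LAPTOP_W = 42  # assumed average laptop draw while coding by hand

break_even_hours = AGENT_SESSION_WH / LAPTOP_W  # ~1 hour
print(f"Break-even: {break_even_hours:.1f} laptop-hours saved")
# If a session saves more than ~1 hour of hand-coding, the device-side
# energy roughly balances. Datacenter overheads (cooling, networking)
# and the rebound effect push the real bar higher.
```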

But there’s a problem with this data:

The productivity benchmarks are produced by the companies themselves. GitHub measures the impact of Copilot. Anthropic measures the impact of Claude Code. No independent study has simultaneously measured:

  1. Development speed
  2. Total energy consumption
  3. Quality of the code produced
  4. Long-term maintainability
  5. Technical debt generated

The rebound effect is real and documented in other technologies: when something becomes faster, it gets used more. A team that adopts code agents doesn’t just do the same things faster — it also generates more code, more iterations, more reviews, more PRs. More total spending? Probably yes.

The question nobody is answering is: does that additional code generate value, or does it just accumulate technical debt?


In which cases IS the cost worth it?

Not all use cases are equal. These are the situations where the energy cost of a code agent has a clear return:

Migrations and refactoring with well-defined patterns. Migrating from one API version to another, updating dependencies, converting tests from one framework to another. The agent knows the pattern, applies it to hundreds of files with consistency. A human would take days; the agent, hours. The time differential has real business value.

Rapid prototyping where time to market matters. In exploration phases with real deadlines — a demo for investors, an MVP to validate a hypothesis — the speed cost can far exceed the energy cost.

Understanding large codebases. Asking an agent to explain the architecture of a 200,000-line project, trace a function’s flow, or identify all usage points of an API. Here the agent reads more than it writes, and the value lies in synthesis.

Regression tests and coverage. Generating tests for well-documented existing code is predictable and the agent does it well. The freed human time can be dedicated to higher cognitive value tasks.


In which cases is it NOT worth it?

Open-ended exploration. “Do something interesting with this data.” “Improve the application’s performance.” “Refactor to make it cleaner.” Without clear success criteria, the agent iterates without converging. Many model calls, uncertain result, manual review inevitable anyway.

Domains the model doesn’t master well. If the agent doesn’t know the domain well — a very specific library, an uncommon language, undocumented business logic — it will make mistakes and need many iterations to correct them. High consumption, mediocre result.

Tasks where speed doesn’t matter. If there’s no deadline, if the generated code is going to need exhaustive review anyway, if the team is going to spend more time reviewing what the agent did than it would have taken to write it: the ROI is negative.

When the generated code creates more technical debt than it resolves. Agents are optimizers for completing the assigned task. They have no business context of their own, they don’t know the team’s implicit conventions, they don’t know which parts of the code are most critical. The code they generate can work and still be a problem six months down the road.


The measurement bias

There’s a structural problem in how the impact of code agents is being evaluated:

Productivity studies are funded by those who sell productivity. The most cited study on Copilot’s impact is from GitHub, which belongs to Microsoft, which sells Copilot. The most favorable analysis of Claude Code comes from Anthropic. This doesn’t invalidate the data, but it does require reading it with critical thinking.

Success metrics are biased toward what’s easy to measure. Speed of completing a scoped task: measurable. Code quality at six months: not measurable in a three-week study. Accumulated technical debt: also not. Impact on the developer’s ability to maintain and understand their own code: nearly impossible to isolate.

No provider publishes energy consumption metrics per task. Token prices are public. Wh per task are not. The energy transparency demanded of household appliances is not demanded of software tools that consume orders of magnitude more energy than any washing machine.

At AISHA we make a specific request: that code agent providers publish Wh per task metrics, just as they publish price per token and generation speed. It’s not difficult information to calculate for those who have access to their own systems. It’s information that users and engineering teams need to make informed decisions.


A code agent isn’t better than a human developer. It’s different: faster at certain types of tasks, more energy-costly, with no business context of its own. The decision to use it well requires knowing exactly what type of task you have on your hands.


What can I do?

  • If you’re a developer: Distinguish what type of task you have before invoking the agent. Repetitive task with clear criteria → agent. Open-ended exploration → write it yourself first. Consider Aider for tasks where maximum autonomy isn’t necessary: roughly a quarter of the consumption for comparable results.

  • If you lead an engineering team: Establish a usage policy, not just an access policy. Measure total cycle time — including review and correction of generated code — not just generation time. Define which types of tasks justify a full agent versus simple assistance.

  • If you’re a CTO or technical lead: By Couch’s own equivalence, a team of 20 engineers using code agents 6 hours daily consumes the energy of roughly fifteen refrigerators running 24/7. That’s a relevant data point for ESG and for operational costs when compute is pay-per-use.

  • If you work in technology sustainability: Demand that development tool providers include Wh per task metrics in their dashboards. Cost per token is already published. Cost in Wh should be too — it’s not technically difficult, it’s a transparency decision.
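The team-scale figure follows from scaling Couch's equivalence (one developer at 8 hours a day of agent use roughly equals one refrigerator running 24 hours). The team size and hours are the article's example; the linear scaling is our assumption:

```python
# Scaling Couch's equivalence to a team.
# "One developer, 8 h/day of agent use ~= one refrigerator running 24 h."
# Team size and hours are the article's example; linear scaling is assumed.
TEAM_SIZE = 20
HOURS_PER_DAY = 6
COUCH_HOURS = 8  # hours of agent use that equal one fridge-day

fridge_days = TEAM_SIZE * HOURS_PER_DAY / COUCH_HOURS
print(f"~{fridge_days:.0f} refrigerator-days of energy per workday")  # ~15
```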

