
Reasoning vs. fast response: the query that consumes x130 more

How much energy it costs for AI to truly 'think' — and why reasoning mode enabled by default is a problem

By AISHA · April 4, 2026 · 6 min read

A long reasoning query to OpenAI's o3 model consumes 39.2 Wh — direct measurement. That is x131 more than a simple text query (0.3 Wh). And there are models where the multiplier reaches x514.

AI reasoning — the ability to 'think before responding' — multiplies energy consumption between x10 and x500 depending on the model and task. Reasoning models consume on average x30 more than standard models. Code agents reach 41 Wh per median session (x137). Deep Research systems exceed 100 Wh per report. The problem: many models enable reasoning by default, even when it's not needed.

Energy multiplier: reasoning vs. fast response (base = 0.3 Wh)

| Model (mode) | Multiplier |
|---|---|
| Gemini 2.5 Flash-Lite (fast query) | x0.17 |
| Claude Sonnet 4.6 (no thinking) | x1.5 |
| Claude Sonnet 4.6 (adaptive/high) | x15 |
| GPT-5 (average, URI estimate) | x63 |
| DeepSeek-R1 (long, measured) | x112 |
| o3 (long, measured) | x131 |
| Claude Code (median session) | x137 |
| Phi-4-reasoning-plus (maximum measured) | x514 |

  • x131 — o3 long reasoning multiplier (measured)
  • x30 — average reasoning models vs. standard (HF)
  • 41 Wh — median Claude Code session (measured)
  • 70% — tokens wasted in code agents

39.2 Wh. That is what a long reasoning query to OpenAI’s o3 model consumes — direct measurement, high confidence. It is x131 more than a simple text query (0.3 Wh). The same energy as charging your smartphone nearly three times.

And o3 is not the extreme case. Hugging Face’s AI Energy Score v2 found that reasoning models consume on average x30 more than standard ones. Some reach x700.

The difference between asking an AI something and asking it to think is not incremental. It is orders of magnitude.
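The arithmetic behind these headline figures is simple enough to check; a minimal sketch, assuming a typical smartphone battery of roughly 15 Wh for the charging comparison (that capacity is my assumption, not a figure from the measurements):

```python
# Energy figures cited above (Wh per query)
BASELINE_WH = 0.3          # simple text query (reference)
O3_LONG_WH = 39.2          # o3 long reasoning query (measured)
PHONE_BATTERY_WH = 15.0    # assumed typical smartphone battery capacity

multiplier = O3_LONG_WH / BASELINE_WH
phone_charges = O3_LONG_WH / PHONE_BATTERY_WH

print(f"x{multiplier:.0f}")                    # ~x131
print(f"~{phone_charges:.1f} phone charges")   # ~2.6
```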


What reasoning is and why it costs so much

Since 2024, the main AI models incorporate a “reasoning” or “thinking” mode: instead of responding immediately, the model generates an internal chain of thought — sometimes thousands of invisible tokens — before producing the final response.

This process is computationally very expensive because:

  • It generates hidden tokens: The model can produce 10–100 times more internal tokens than it shows the user. Each token consumes energy even if you never see it.
  • It activates additional layers: Reasoning models typically activate more parameters, more attention layers, and more internal verification cycles.
  • It scales with complexity: Unlike a fast response (relatively fixed cost), reasoning scales with the difficulty of the problem. A complex question can generate 10 minutes of internal “thinking”.

Dauner and Socher documented that reasoning models emit up to x50 more CO₂ than concise models, with one case of 37,575 tokens for a single response.
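If energy per generated token is roughly constant, token counts alone explain much of the multiplier. A quick sketch using the 37,575-token response documented by Dauner and Socher against an assumed ~300-token concise answer (the 300 is an illustrative assumption, not a figure from the paper):

```python
# Token-count ratio between a documented long reasoning response and
# an assumed concise answer of ~300 tokens (illustrative baseline).
concise_tokens = 300
reasoning_tokens = 37_575

token_ratio = reasoning_tokens / concise_tokens
print(f"x{token_ratio:.0f} more tokens generated")  # ~x125
```

Even before accounting for heavier per-token compute, the token volume alone lands in the same order of magnitude as the measured energy multipliers.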


The data: model by model

Hugging Face’s AI Energy Score v2 (December 2025), based on direct measurements on H100 hardware, provides the most solid data:

| Model | Without reasoning | With reasoning | Multiplier |
|---|---|---|---|
| DeepSeek-R1-Distill-Llama-70B | 0.050 Wh | 7.63 Wh | x154 |
| Phi-4-reasoning-plus | 0.018 Wh | 9.46 Wh | x514 |

These are real measurements, not estimates. And they confirm that reasoning is not a marginal cost — it is a change of scale.

Calibrated estimates for closed commercial models show the same pattern:

| Model | Without reasoning | With reasoning |
|---|---|---|
| GPT-5 | 0.4–1.0 Wh | 8–45 Wh |
| GPT-5.4 | 0.5–1.2 Wh | 4–18 Wh |
| Claude Sonnet 4.6 | 0.25–0.6 Wh | 1.5–8 Wh |
| Claude Opus 4.6 | 0.6–1.5 Wh | 5–20 Wh |
| Gemini 2.5 Pro | 0.25–0.6 Wh | 2–12 Wh |
| Gemini 2.5 Flash | 0.12–0.25 Wh | 0.6–2.5 Wh |
| DeepSeek-V3.2 | 0.08–0.18 Wh | 1.5–8 Wh |

Gemini 2.5 Flash-Lite is the notable exception: even with thinking active, it stays at 0.2–0.8 Wh — demonstrating that efficient reasoning is possible.

Reasoning is not free. It is a cost multiplier ranging from x10 to x500 depending on the model. Every time you enable “thinking” you are choosing — consciously or unconsciously — to consume an order of magnitude more energy.


Code agents: reasoning in a loop

If a single reasoning query is already expensive, code agents take that cost to the extreme: they apply reasoning iteratively, in loops that can last tens of minutes — reading files, executing commands, verifying results, and starting again.

Simon P. Couch measured in January 2026 the real consumption of Claude Code in programming sessions:

  • Median session: 592,439 tokens across 24 interactive exchanges
  • Consumption per session: 41 Wh — x137 the base reference
  • Intensive daily use (2–3 simultaneous instances): ~1,300 Wh — the equivalent of a dishwasher cycle
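From Couch's two measured figures one can back out an implied per-token energy cost — a rough derivation of my own, not something the measurement reports directly:

```python
session_tokens = 592_439   # median Claude Code session (measured)
session_wh = 41.0          # energy for that session (measured)

wh_per_token = session_wh / session_tokens
print(f"~{wh_per_token * 1000:.3f} mWh per token")  # ~0.069 mWh
```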

Other agents operate in similar ranges:

  • Claude Code + Opus 4.6: 45–70 Wh per session (x150–x233)
  • GPT-5.3-Codex: 12–40 Wh per task (x40–x133)
  • Devin 2.0: 10–60 Wh per autonomous task (x33–x200)
  • Cursor AI: 5–25 Wh per heavy session (x17–x83)
  • GitHub Copilot Agent: 3–15 Wh per PR flow (x10–x50)
  • Aider: 2–9 Wh per task (x7–x30)

The wasted token problem

In April 2026, Morph published a revealing analysis: 70% of the tokens consumed by code agents are waste:

  • 35–45% on reading files
  • 15–25% on tool output
  • 15–20% on context forwarding
  • 10–15% on internal reasoning
  • Only 5–15% generates actual code

A single-character fix consumed more than 21,000 input tokens. Claude Code uses x4.2 more tokens than Aider for identical tasks (479,000 vs ~105,000).
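Combining Morph's 70% waste figure with Couch's 41 Wh median session gives a rough estimate of the energy spent on overhead per session — my combination of the two sources, not a published figure:

```python
session_wh = 41.0       # median Claude Code session (Couch, measured)
waste_fraction = 0.70   # share of tokens Morph classifies as waste

wasted_wh = session_wh * waste_fraction
print(f"~{wasted_wh:.0f} Wh of a median session goes to overhead")  # ~29 Wh
```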


Deep Research: the query that replaces an analyst

Deep Research systems represent the extreme of reasoning: a single question triggers dozens of web searches, page readings, code executions, and iterative synthesis. The result is a research report — and the energy cost reflects it.

| System | Energy per report | Multiplier |
|---|---|---|
| OpenAI DR (o3) | 35–120 Wh | x117–x400 |
| Gemini Deep Research | 20–80 Wh | x67–x267 |
| Claude Research | 20–70 Wh | x67–x233 |
| Perplexity Deep Research | 15–60 Wh | x50–x200 |
| OpenAI DR (o4-mini) | 8–25 Wh | x27–x83 |
| Grok DeepSearch | 8–30 Wh | x27–x100 |

Simon Willison documented a Deep Research session with o4-mini: 60,506 input tokens, 22,883 output tokens (89% were internal reasoning tokens), 77 tool calls (45 searches + 24 page visits + 12 code executions). Cost: ~$1.10.

A Perplexity example: 7 user input tokens, 3,847 output tokens, but 308,156 invisible reasoning tokens. Reasoning represented between 54% and 78% of the total cost.


The “default thinking” problem

Here is the real risk: several models enable reasoning by default, even for questions that do not need it.

  • Claude Sonnet 4.6 has “adaptive thinking” mode enabled by default. A simple query that could be resolved in 0.3 Wh is processed with unnecessary reasoning, consuming 1.5–8 Wh.
  • GPT-5 uses a router that mixes fast response and reasoning according to its own criteria — not the user’s.
  • Claude Opus 4.6 operates in thinking mode by default, even at its reduced price: a lower price does not mean lower energy consumption.

It is like a car with the turbo permanently on, even when going to buy bread.

Reasoning mode should be opt-in, not opt-out. Enabling it by default for all queries is systematic energy waste at the scale of hundreds of millions of users.


The definitive multiplier table

To put everything in perspective, this is the complete scale from the lightest query to the heaviest:

| Action | Energy | Multiplier |
|---|---|---|
| Gemini 2.5 Flash-Lite (fast query) | 0.05 Wh | x0.17 |
| Simple text query (reference) | 0.3 Wh | x1 |
| Claude Sonnet 4.6 (adaptive/high) | 1.5–8 Wh | x5–x27 |
| Gemini 2.5 Pro (thinking) | 2–12 Wh | x7–x40 |
| GPT-5 (average, URI estimate) | 18.9 Wh | x63 |
| DeepSeek-R1 (long, direct measurement) | 33.6 Wh | x112 |
| o3 (long, direct measurement) | 39.2 Wh | x131 |
| Claude Code (median session, measured) | 41 Wh | x137 |
| Deep Research o3 (full report) | 35–120 Wh | x117–x400 |
| Sora 2 (10s clip, before shutdown) | 90–936 Wh | x300–x3,120 |

From the lightest to the heaviest query there is a factor of x18,000. These are not variations — they are different worlds of consumption disguised under the same chat interface.


What can I do?

  • If you are a user: Disable reasoning mode when you don’t need it. Most everyday queries — writing, searching, summarising, translating — are resolved better and faster without thinking. Reserve reasoning for problems that truly require it: complex analysis, difficult code, deep research.

  • If you lead a technical team: Establish a model cascade policy: Flash-Lite/mini for routine tasks, standard model for general tasks, reasoning only when there is a clear ROI. This can reduce your team’s consumption by 80–90% without affecting output quality.

  • If you are a developer: Disable thinking by default in your integrations. Use thinking: "off" or equivalent as default and enable it only when the task justifies it. Implement reasoning token budgets. And consider lighter agents like Aider (x4 fewer tokens than Claude Code for equivalent tasks).

  • If you work in regulation: Default-enabled reasoning is a clear case of systematic unnecessary energy consumption at massive scale. A regulation requiring providers to offer the efficient mode as the default option — like the ECO mode on appliances — would have a measurable impact on global AI consumption.
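The cascade policy recommended above can be sketched as a simple router that defaults to the cheapest tier; the tier labels and model identifiers here are illustrative assumptions, not a prescribed setup:

```python
# Hypothetical three-tier model cascade: pick the lowest-energy tier
# that covers the task. Labels and model names are illustrative.
TIERS = {
    "routine":   "flash-lite",   # summaries, translation, lookup (~0.05 Wh)
    "general":   "standard",     # drafting, everyday Q&A (~0.3 Wh)
    "reasoning": "thinking",     # hard code, deep analysis (opt-in only)
}

def pick_model(task_kind: str) -> str:
    """Return the tier for a task; unknown kinds fall back to the cheap
    tier rather than the expensive one — efficiency as the default."""
    return TIERS.get(task_kind, TIERS["routine"])

print(pick_model("routine"))    # flash-lite
print(pick_model("reasoning"))  # thinking
```

The key design choice mirrors the article's argument: the fallback is the efficient tier, so reasoning is only ever reached deliberately.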
