
Reasoning vs. fast response: the query that consumes x130 more

How much energy it costs for AI to truly 'think' — and why reasoning mode enabled by default is a problem

By AISHA · April 4, 2026 · 6 min read

A long reasoning query to OpenAI's o3 model consumes 39.2 Wh — direct measurement. That is x131 more than a simple text query (0.3 Wh). And there are models where the multiplier reaches x514.

AI reasoning — the ability to 'think before responding' — multiplies energy consumption between x10 and x500 depending on the model and task. Reasoning models consume on average x30 more than standard models. Code agents reach 41 Wh per median session (x137). Deep Research systems exceed 100 Wh per report. The problem: many models enable reasoning by default, even when it's not needed.

Energy multiplier: reasoning vs. fast response (base = 0.3 Wh)

| Model (mode) | Multiplier |
|---|---|
| Gemini 2.5 Flash-Lite (fast query) | x0.17 |
| Claude Sonnet 4.6 (no thinking) | x1.5 |
| Claude Sonnet 4.6 (adaptive/high) | x15 |
| GPT-5 (average, URI estimate) | x63 |
| DeepSeek-R1 (long, measured) | x112 |
| o3 (long, measured) | x131 |
| Claude Code (median session) | x137 |
| Phi-4-reasoning-plus (maximum measured) | x514 |

  • x131 — o3 long reasoning multiplier (measured)
  • x30 — average reasoning models vs. standard (HF)
  • 41 Wh — median Claude Code session (measured)
  • 70% — tokens wasted in code agents

39.2 Wh. That is what a long reasoning query to OpenAI’s o3 model consumes — direct measurement, high confidence. It is x131 more than a simple text query (0.3 Wh). The same energy as charging your smartphone nearly three times.

And o3 is not the extreme case. Hugging Face’s AI Energy Score v2 found that reasoning models consume on average x30 more than standard ones. Some reach x700.

The difference between asking an AI something and asking it to think is not incremental. It is orders of magnitude.
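The arithmetic behind these headline figures is simple enough to check; a minimal sketch, assuming a typical smartphone battery of roughly 15 Wh for the charging comparison (that capacity is my assumption, not a figure from the measurements):

```python
# Energy figures cited above (Wh per query)
BASELINE_WH = 0.3          # simple text query (reference)
O3_LONG_WH = 39.2          # o3 long reasoning query (measured)
PHONE_BATTERY_WH = 15.0    # assumed typical smartphone battery capacity

multiplier = O3_LONG_WH / BASELINE_WH
phone_charges = O3_LONG_WH / PHONE_BATTERY_WH

print(f"x{multiplier:.0f}")                    # ~x131
print(f"~{phone_charges:.1f} phone charges")   # ~2.6
```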


What reasoning is and why it costs so much

Since 2024, the main AI models incorporate a “reasoning” or “thinking” mode: instead of responding immediately, the model generates an internal chain of thought — sometimes thousands of invisible tokens — before producing the final response.

This process is computationally very expensive because:

  • It generates hidden tokens: The model can produce 10–100 times more internal tokens than it shows the user. Each token consumes energy even if you never see it.
  • It activates additional layers: Reasoning models typically activate more parameters, more attention layers, and more internal verification cycles.
  • It scales with complexity: Unlike a fast response (relatively fixed cost), reasoning scales with the difficulty of the problem. A complex question can generate 10 minutes of internal “thinking”.

Dauner and Socher documented that reasoning models emit up to x50 more CO₂ than concise models, with one case of 37,575 tokens for a single response.
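If energy per generated token is roughly constant, token counts alone explain much of the multiplier. A quick sketch using the 37,575-token response documented by Dauner and Socher against an assumed ~300-token concise answer (the 300 is an illustrative assumption, not a figure from the paper):

```python
# Token-count ratio between a documented long reasoning response and
# an assumed concise answer of ~300 tokens (illustrative baseline).
concise_tokens = 300
reasoning_tokens = 37_575

token_ratio = reasoning_tokens / concise_tokens
print(f"x{token_ratio:.0f} more tokens generated")  # ~x125
```

Even before accounting for heavier per-token compute, the token volume alone lands in the same order of magnitude as the measured energy multipliers.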


The data: model by model

Hugging Face’s AI Energy Score v2 (December 2025), based on direct measurements on H100 hardware, provides the most solid data:

| Model | Without reasoning | With reasoning | Multiplier |
|---|---|---|---|
| DeepSeek-R1-Distill-Llama-70B | 0.050 Wh | 7.63 Wh | x154 |
| Phi-4-reasoning-plus | 0.018 Wh | 9.46 Wh | x514 |

These are real measurements, not estimates. And they confirm that reasoning is not a marginal cost — it is a change of scale.

Calibrated estimates for closed commercial models show the same pattern:

| Model | Without reasoning | With reasoning |
|---|---|---|
| GPT-5 | 0.4–1.0 Wh | 8–45 Wh |
| GPT-5.4 | 0.5–1.2 Wh | 4–18 Wh |
| Claude Sonnet 4.6 | 0.25–0.6 Wh | 1.5–8 Wh |
| Claude Opus 4.6 | 0.6–1.5 Wh | 5–20 Wh |
| Gemini 2.5 Pro | 0.25–0.6 Wh | 2–12 Wh |
| Gemini 2.5 Flash | 0.12–0.25 Wh | 0.6–2.5 Wh |
| DeepSeek-V3.2 | 0.08–0.18 Wh | 1.5–8 Wh |

Gemini 2.5 Flash-Lite is the notable exception: even with thinking active, it stays at 0.2–0.8 Wh — demonstrating that efficient reasoning is possible.

Reasoning is not free. It is a cost multiplier ranging from x10 to x500 depending on the model. Every time you enable “thinking” you are choosing — consciously or unconsciously — to consume an order of magnitude more energy.


Code agents: reasoning in a loop

If a single reasoning query is already expensive, code agents take that cost to the extreme: they apply reasoning iteratively, in loops that can last tens of minutes — reading files, executing commands, verifying results, and starting again.

Simon P. Couch measured in January 2026 the real consumption of Claude Code in programming sessions:

  • Median session: 592,439 tokens across 24 interactive exchanges
  • Consumption per session: 41 Wh — x137 the base reference
  • Intensive daily use (2–3 simultaneous instances): ~1,300 Wh — the equivalent of a dishwasher cycle
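From Couch's two measured figures one can back out an implied per-token energy cost — a rough derivation of my own, not something the measurement reports directly:

```python
session_tokens = 592_439   # median Claude Code session (measured)
session_wh = 41.0          # energy for that session (measured)

wh_per_token = session_wh / session_tokens
print(f"~{wh_per_token * 1000:.3f} mWh per token")  # ~0.069 mWh
```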

Other agents operate in similar ranges:

  • Claude Code + Opus 4.6: 45–70 Wh per session (x150–x233)
  • GPT-5.3-Codex: 12–40 Wh per task (x40–x133)
  • Devin 2.0: 10–60 Wh per autonomous task (x33–x200)
  • Cursor AI: 5–25 Wh per heavy session (x17–x83)
  • GitHub Copilot Agent: 3–15 Wh per PR flow (x10–x50)
  • Aider: 2–9 Wh per task (x7–x30)

The wasted token problem

In April 2026, Morph published a revealing analysis: 70% of the tokens consumed by code agents are waste:

  • 35–45% on reading files
  • 15–25% on tool output
  • 15–20% on context forwarding
  • 10–15% on internal reasoning
  • Only 5–15% generates actual code

A single-character fix consumed more than 21,000 input tokens. Claude Code uses x4.2 more tokens than Aider for identical tasks (479,000 vs ~105,000).
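Combining Morph's 70% waste figure with Couch's 41 Wh median session gives a rough estimate of the energy spent on overhead per session — my combination of the two sources, not a published figure:

```python
session_wh = 41.0       # median Claude Code session (Couch, measured)
waste_fraction = 0.70   # share of tokens Morph classifies as waste

wasted_wh = session_wh * waste_fraction
print(f"~{wasted_wh:.0f} Wh of a median session goes to overhead")  # ~29 Wh
```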


Deep Research: the query that replaces an analyst

Deep Research systems represent the extreme of reasoning: a single question triggers dozens of web searches, page readings, code executions, and iterative synthesis. The result is a research report — and the energy cost reflects it.

| System | Energy per report | Multiplier |
|---|---|---|
| OpenAI DR (o3) | 35–120 Wh | x117–x400 |
| Gemini Deep Research | 20–80 Wh | x67–x267 |
| Claude Research | 20–70 Wh | x67–x233 |
| Perplexity Deep Research | 15–60 Wh | x50–x200 |
| OpenAI DR (o4-mini) | 8–25 Wh | x27–x83 |
| Grok DeepSearch | 8–30 Wh | x27–x100 |

Simon Willison documented a Deep Research session with o4-mini: 60,506 input tokens, 22,883 output tokens (89% were internal reasoning tokens), 77 tool calls (45 searches + 24 page visits + 12 code executions). Cost: ~$1.10.

A Perplexity example: 7 user input tokens, 3,847 output tokens, but 308,156 invisible reasoning tokens. Reasoning represented between 54% and 78% of the total cost.


The “default thinking” problem

Here is the real risk: several models enable reasoning by default, even for questions that do not need it.

  • Claude Sonnet 4.6 has “adaptive thinking” mode enabled by default. A simple query that could be resolved in 0.3 Wh is processed with unnecessary reasoning, consuming 1.5–8 Wh.
  • GPT-5 uses a router that mixes fast response and reasoning according to its own criteria — not the user’s.
  • Claude Opus 4.6 operates in thinking mode by default, even at its reduced price: a lower price does not mean lower energy consumption.

It is like a car with the turbo permanently on, even when going to buy bread.

Reasoning mode should be opt-in, not opt-out. Enabling it by default for all queries is systematic energy waste at the scale of hundreds of millions of users.


The definitive multiplier table

To put everything in perspective, this is the complete scale from the lightest query to the heaviest:

| Action | Energy | Multiplier |
|---|---|---|
| Gemini 2.5 Flash-Lite (fast query) | 0.05 Wh | x0.17 |
| Simple text query (reference) | 0.3 Wh | x1 |
| Claude Sonnet 4.6 (adaptive/high) | 1.5–8 Wh | x5–x27 |
| Gemini 2.5 Pro (thinking) | 2–12 Wh | x7–x40 |
| GPT-5 (average, URI estimate) | 18.9 Wh | x63 |
| DeepSeek-R1 (long, direct measurement) | 33.6 Wh | x112 |
| o3 (long, direct measurement) | 39.2 Wh | x131 |
| Claude Code (median session, measured) | 41 Wh | x137 |
| Deep Research o3 (full report) | 35–120 Wh | x117–x400 |
| Sora 2 (10s clip, before shutdown) | 90–936 Wh | x300–x3,120 |

From the lightest to the heaviest query there is a factor of x18,000. These are not variations — they are different worlds of consumption disguised under the same chat interface.


What can I do?

  • If you are a user: Disable reasoning mode when you don’t need it. Most everyday queries — writing, searching, summarising, translating — are resolved better and faster without thinking. Reserve reasoning for problems that truly require it: complex analysis, difficult code, deep research.

  • If you lead a technical team: Establish a model cascade policy: Flash-Lite/mini for routine tasks, standard model for general tasks, reasoning only when there is a clear ROI. This can reduce your team’s consumption by 80–90% without affecting output quality.

  • If you are a developer: Disable thinking by default in your integrations. Use thinking: "off" or equivalent as default and enable it only when the task justifies it. Implement reasoning token budgets. And consider lighter agents like Aider (x4 fewer tokens than Claude Code for equivalent tasks).

  • If you work in regulation: Default-enabled reasoning is a clear case of systematic unnecessary energy consumption at massive scale. A regulation requiring providers to offer the efficient mode as the default option — like the ECO mode on appliances — would have a measurable impact on global AI consumption.
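The cascade policy recommended above can be sketched as a simple router that defaults to the cheapest tier; the tier labels and model identifiers here are illustrative assumptions, not a prescribed setup:

```python
# Hypothetical three-tier model cascade: pick the lowest-energy tier
# that covers the task. Labels and model names are illustrative.
TIERS = {
    "routine":   "flash-lite",   # summaries, translation, lookup (~0.05 Wh)
    "general":   "standard",     # drafting, everyday Q&A (~0.3 Wh)
    "reasoning": "thinking",     # hard code, deep analysis (opt-in only)
}

def pick_model(task_kind: str) -> str:
    """Return the tier for a task; unknown kinds fall back to the cheap
    tier rather than the expensive one — efficiency as the default."""
    return TIERS.get(task_kind, TIERS["routine"])

print(pick_model("routine"))    # flash-lite
print(pick_model("reasoning"))  # thinking
```

The key design choice mirrors the article's argument: the fallback is the efficient tier, so reasoning is only ever reached deliberately.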
