How much energy it costs for AI to truly 'think' — and why reasoning mode enabled by default is a problem
A long reasoning query to OpenAI's o3 model consumes 39.2 Wh — direct measurement. That is x131 more than a simple text query (0.3 Wh). And there are models where the multiplier reaches x514.
AI reasoning — the ability to 'think before responding' — multiplies energy consumption between x10 and x500 depending on the model and task. Reasoning models consume on average x30 more than standard models. Code agents reach 41 Wh per median session (x137). Deep Research systems exceed 100 Wh per report. The problem: many models enable reasoning by default, even when it's not needed.
- x131: o3 long reasoning multiplier (measured)
- x30: average reasoning models vs. standard (HF)
- 41 Wh: median Claude Code session (measured)
- 70%: tokens wasted in code agents
39.2 Wh. That is what a long reasoning query to OpenAI’s o3 model consumes — direct measurement, high confidence. It is x131 more than a simple text query (0.3 Wh). The same energy as charging your smartphone nearly three times.
And o3 is not the extreme case. Hugging Face’s AI Energy Score v2 found that reasoning models consume on average x30 more than standard ones. Some reach x700.
The difference between asking an AI something and asking it to think is not incremental. It is orders of magnitude.
Since 2024, the major AI models have incorporated a "reasoning" or "thinking" mode: instead of responding immediately, the model generates an internal chain of thought — sometimes thousands of invisible tokens — before producing the final response.
This process is computationally expensive because every reasoning token must be generated autoregressively, with a full forward pass through the model, exactly like the tokens of the visible response.
Dauner and Socher documented that reasoning models emit up to x50 more CO₂ than concise models, with one case of 37,575 tokens for a single response.
Hugging Face’s AI Energy Score v2 (December 2025), based on direct measurements on H100 hardware, provides the most solid data:
| Model | Without reasoning | With reasoning | Multiplier |
|---|---|---|---|
| DeepSeek-R1-Distill-Llama-70B | 0.050 Wh | 7.63 Wh | x154 |
| Phi-4-reasoning-plus | 0.018 Wh | 9.46 Wh | x514 |
These are real measurements, not estimates. And they confirm that reasoning is not a marginal cost — it is a change of scale.
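The multipliers follow directly from the measured values. A quick check (the quotients of the rounded per-query Wh figures differ slightly from the published multipliers, which were presumably computed before rounding):

```python
# With/without-reasoning energy per query (Wh), from the
# AI Energy Score v2 rows above.
measurements = {
    "DeepSeek-R1-Distill-Llama-70B": (0.050, 7.63),
    "Phi-4-reasoning-plus": (0.018, 9.46),
}

# Multiplier = energy with reasoning / energy without reasoning.
multipliers = {
    model: with_r / without_r
    for model, (without_r, with_r) in measurements.items()
}

for model, m in multipliers.items():
    print(f"{model}: x{m:.0f}")
```

Computed this way the ratios come out near x153 and x526 — the same order of magnitude as the published x154 and x514.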
Calibrated estimates for closed commercial models show the same pattern:
| Model | Without reasoning | With reasoning |
|---|---|---|
| GPT-5 | 0.4–1.0 Wh | 8–45 Wh |
| GPT-5.4 | 0.5–1.2 Wh | 4–18 Wh |
| Claude Sonnet 4.6 | 0.25–0.6 Wh | 1.5–8 Wh |
| Claude Opus 4.6 | 0.6–1.5 Wh | 5–20 Wh |
| Gemini 2.5 Pro | 0.25–0.6 Wh | 2–12 Wh |
| Gemini 2.5 Flash | 0.12–0.25 Wh | 0.6–2.5 Wh |
| DeepSeek-V3.2 | 0.08–0.18 Wh | 1.5–8 Wh |
Gemini 2.5 Flash-Lite is the notable exception: even with thinking active, it stays at 0.2–0.8 Wh — demonstrating that efficient reasoning is possible.
Reasoning is not free. It is a cost multiplier ranging from x10 to x500 depending on the model. Every time you enable “thinking” you are choosing — consciously or unconsciously — to consume an order of magnitude more energy.
If a single reasoning query is already expensive, code agents take that cost to the extreme: they apply reasoning iteratively, in loops that can last tens of minutes, reading files, executing commands, verifying results, and starting again.
In January 2026, Simon P. Couch measured the real consumption of Claude Code in programming sessions: the median session came to 41 Wh, roughly x137 a simple text query. Other agents operate in similar ranges.
In April 2026, Morph published a revealing analysis: 70% of the tokens consumed by code agents are waste:
A single-character fix consumed more than 21,000 input tokens. Claude Code uses x4.2 more tokens than Aider for identical tasks (479,000 vs ~105,000).
Deep Research systems represent the extreme of reasoning: a single question triggers dozens of web searches, page readings, code executions, and iterative synthesis. The result is a research report — and the energy cost reflects it.
| System | Energy per report | Multiplier |
|---|---|---|
| OpenAI DR (o3) | 35–120 Wh | x117–x400 |
| Gemini Deep Research | 20–80 Wh | x67–x267 |
| Claude Research | 20–70 Wh | x67–x233 |
| Perplexity Deep Research | 15–60 Wh | x50–x200 |
| OpenAI DR (o4-mini) | 8–25 Wh | x27–x83 |
| Grok DeepSearch | 8–30 Wh | x27–x100 |
Simon Willison documented a Deep Research session with o4-mini: 60,506 input tokens, 22,883 output tokens (89% were internal reasoning tokens), 77 tool calls (45 searches + 24 page visits + 12 code executions). Cost: ~$1.10.
A Perplexity example: 7 user input tokens, 3,847 output tokens, but 308,156 invisible reasoning tokens. Reasoning represented between 54% and 78% of the total cost.
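The token counts alone make the imbalance visible. A minimal check using the figures above (the 54–78% cost share is lower than the token share because providers price input, reasoning, and output tokens at different rates):

```python
# Token counts from the Perplexity Deep Research example.
user_input = 7
visible_output = 3_847
invisible_reasoning = 308_156

# Share of all generated tokens that the user never sees.
reasoning_share = invisible_reasoning / (visible_output + invisible_reasoning)
print(f"{reasoning_share:.1%} of generated tokens were invisible reasoning")
```

Nearly 99% of what the model generated never reached the user.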
Here is the real risk: several models enable reasoning by default, even for questions that do not need it.
It is like a car with the turbo permanently on, even when going to buy bread.
Reasoning mode should be opt-in, not opt-out. Enabling it by default for all queries is systematic energy waste at the scale of hundreds of millions of users.
To put everything in perspective, this is the complete scale from the lightest query to the heaviest:
| Action | Energy | Multiplier |
|---|---|---|
| Gemini 2.5 Flash-Lite (fast query) | 0.05 Wh | x0.17 |
| Simple text query (reference) | 0.3 Wh | x1 |
| Claude Sonnet 4.6 (adaptive/high) | 1.5–8 Wh | x5–x27 |
| Gemini 2.5 Pro (thinking) | 2–12 Wh | x7–x40 |
| GPT-5 (average, URI estimate) | 18.9 Wh | x63 |
| DeepSeek-R1 (long, direct measurement) | 33.6 Wh | x112 |
| o3 (long, direct measurement) | 39.2 Wh | x131 |
| Claude Code (median session, measured) | 41 Wh | x137 |
| Deep Research o3 (full report) | 35–120 Wh | x117–x400 |
| Sora 2 (10s clip, before shutdown) | 90–936 Wh | x300–x3,120 |
From the lightest to the heaviest query there is a factor of x18,000. These are not variations — they are different worlds of consumption disguised under the same chat interface.
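The factor quoted is simply the ratio between the extremes of the table:

```python
lightest = 0.05   # Wh, Gemini 2.5 Flash-Lite fast query
heaviest = 936.0  # Wh, Sora 2 10-second clip, upper bound
span = heaviest / lightest
print(f"x{span:,.0f}")
```

936 / 0.05 ≈ x18,700, which the text rounds to x18,000.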
If you are a user: Disable reasoning mode when you don't need it. Most everyday queries — writing, searching, summarising, translating — are handled better and faster without thinking. Reserve reasoning for problems that truly require it: complex analysis, difficult code, deep research.
If you lead a technical team: Establish a model cascade policy: Flash-Lite/mini for routine tasks, standard model for general tasks, reasoning only when there is a clear ROI. This can reduce your team’s consumption by 80–90% without affecting output quality.
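A cascade policy can be as simple as a routing table keyed by task type. A minimal sketch — the tier names, model labels, and Wh figures are illustrative (taken from this article's estimates), not any provider's API identifiers:

```python
# Illustrative model cascade: route each task to the cheapest
# tier that is good enough; reasoning is opt-in, never the default.
CASCADE = {
    "routine":   "gemini-2.5-flash-lite",   # ~0.05 Wh/query
    "general":   "standard-chat-model",     # ~0.3 Wh/query
    "reasoning": "reasoning-model",         # tens of Wh/query
}

def pick_model(task_type: str) -> str:
    """Default to the cheap tier; escalate only on explicit need."""
    return CASCADE.get(task_type, CASCADE["routine"])

print(pick_model("summarise-email"))  # unknown type -> routine tier
print(pick_model("reasoning"))        # explicit opt-in escalation
```

The key design choice is the fallback: anything not explicitly classified lands on the cheapest tier, so waste requires a deliberate decision rather than a default.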
If you are a developer: Disable thinking by default in your integrations. Use thinking: "off" or equivalent as default and enable it only when the task justifies it. Implement reasoning token budgets. And consider lighter agents like Aider (x4 fewer tokens than Claude Code for equivalent tasks).
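A thinking-off-by-default wrapper with a reasoning token budget might look like this. The `client.chat` call and the `thinking` / `max_reasoning_tokens` parameter names are hypothetical; adapt them to your provider's actual API:

```python
# Sketch of a 'thinking off by default' wrapper with a token budget.
# `client`, `chat`, and the parameter names are placeholders, not a
# real SDK: map them onto your provider's equivalents.
def ask(client, prompt: str, *, thinking: bool = False,
        reasoning_budget: int = 2_000):
    """Send a query with reasoning disabled unless explicitly requested."""
    params = {"prompt": prompt, "thinking": "off"}
    if thinking:
        # Opt in, but cap the invisible tokens the model may spend.
        params["thinking"] = "on"
        params["max_reasoning_tokens"] = reasoning_budget
    return client.chat(**params)
```

Callers must pass `thinking=True` to pay the reasoning cost, and even then the budget bounds the worst case.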
If you work in regulation: Default-enabled reasoning is a clear case of systematic unnecessary energy consumption at massive scale. A regulation requiring providers to offer the efficient mode as the default option — like the ECO mode on appliances — would have a measurable impact on global AI consumption.
Related
Why generating images with AI costs between 3 and 33 times more energy than a text query — and what you can do about it
A forensic inventory of everything we know — and what we don't — about the energy artificial intelligence consumes
The definitive guide to energy consumption by model and modality in 2026
Our calculator helps you put queries, images, reasoning and agents into context.