Consumption
The definitive guide to energy consumption by model and modality in 2026
Text queries average ~0.3 Wh. Relative to that baseline, reasoning typically ranges from x5 to x130, image from x0.3 to x14, current commercial video from x133 to x1,400, and code agents from x20 to x150. Efficiency improves every year, but total consumption rises because each improvement triggers a surge in usage volume (Jevons Paradox).
Projected trend, 2024-2028 (estimates):

| Series | 2024 | 2025 | 2026 | 2027 | 2028 |
|---|---|---|---|---|---|
| Energy per query (Wh) | 0.45 | 0.3 | 0.26 | 0.22 | 0.18 |
| Daily queries (billions) | 0.7 | 1.5 | 3.5 | 6 | 9 |
Key figures:

- 0.24 Wh: the only direct production measurement (Google Gemini)
- x133 – x1,400: current commercial video vs. text
- x46: variation across image models
- x514: extreme peak in a reasoning benchmark (Phi-4)
Generating 10 seconds of video with Veo 3.1 can consume as much energy as a microwave running for 1-2 hours.
That sentence is not rhetorical exaggeration. It’s a measured data point. And it’s just the tip of the iceberg of a reality that AI companies prefer not to quantify publicly.
At AISHA we have compiled, cross-referenced, and verified all available measurements as of April 2026 — academic papers, production data, independent benchmarks — to build the most comprehensive guide in English to the real energy consumption of artificial intelligence.
This is what we know.
To talk in comparable numbers, we need a starting point. The reference unit is the standard text query: approximately 0.3 Wh (watt-hours).
How much is that? The energy a 10-watt LED bulb consumes in less than two minutes. It seems insignificant. But when multiplied by the billions of daily queries worldwide, the aggregate impact is anything but trivial.
Google is the only provider that has published a direct production measurement: 0.24 Wh as the median for text queries to Gemini (August 2025, real infrastructure measurement, not an estimate). Sam Altman stated that ChatGPT consumes 0.34 Wh on average, but without publishing any methodology. Anthropic has published absolutely nothing.
With that reference of 0.3 Wh as the baseline (x1), we can compare everything else.
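The multipliers used throughout this guide are simply each measurement divided by the 0.3 Wh baseline. A minimal sketch, using figures quoted elsewhere in this guide:

```python
# Express any per-task energy measurement as a multiple of the
# 0.3 Wh standard text query used as the baseline (x1) in this guide.
BASELINE_WH = 0.3

def multiplier(energy_wh: float) -> float:
    """Return how many baseline text queries this measurement equals."""
    return energy_wh / BASELINE_WH

# Examples with figures cited in this guide:
print(round(multiplier(0.24), 2))  # Google's Gemini median -> 0.8
print(round(multiplier(39)))       # o3 long-prompt reasoning -> 130
print(round(multiplier(1000)))     # ~10 s of commercial video -> 3333
```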
Not all text models consume the same amount. The difference between the lightest and the heaviest reaches roughly 40-fold. This table shows it:
| Model | Consumption per query | Multiplier |
|---|---|---|
| Gemini 2.5 Flash-Lite | 0.10 – 0.15 Wh | x0.3 – x0.5 |
| Llama 4 Scout | 0.15 – 0.30 Wh | x0.5 – x1 |
| DeepSeek V4 | 0.15 – 0.35 Wh | x0.5 – x1.2 |
| GPT-5-mini | 0.20 – 0.40 Wh | x0.7 – x1.3 |
| Mistral Large | 0.25 – 0.50 Wh | x0.8 – x1.7 |
| Gemini 2.5 Ultra | 0.35 – 0.70 Wh | x1.2 – x2.3 |
| Claude Sonnet 4.6 | 0.40 – 0.90 Wh | x1.3 – x3 |
| GPT-5.4 | 0.50 – 1.20 Wh | x1.7 – x4 |
| Claude Opus 4.6 | ~4 Wh (estimated) | ~x13 |
“Flash” or “mini” models are between 3 and 10 times more efficient than full frontier models. For the vast majority of everyday tasks — summarizing a text, drafting an email, answering a factual question — the small model is sufficient.
Model choice is not neutral. Choosing poorly can multiply your consumption 26-fold for the same task.
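The 26-fold figure follows directly from the table above: routing an everyday task to Claude Opus 4.6 (~4 Wh) instead of Gemini 2.5 Flash-Lite (~0.15 Wh at the top of its range) costs roughly 26 times more energy. A quick check:

```python
# Energy cost of the same task on the lightest vs. heaviest model
# in the table above (upper bound of Flash-Lite's range).
flash_lite_wh = 0.15
opus_wh = 4.0

ratio = opus_wh / flash_lite_wh
print(round(ratio, 1))  # -> 26.7
```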
The revolution of “thinking models” — models that reason internally before responding — has radically changed the energy equation. They generate chains of thought tens of thousands of tokens long before giving an answer, and that internal process consumes energy.
The following table collects the available measurements for the main reasoning modes:
| Mode | Consumption | Multiplier vs. base text |
|---|---|---|
| GPT-5.4 with reasoning | 4 – 18 Wh | x13 – x60 |
| Claude with Extended Thinking | 2 – 8 Wh | x7 – x27 |
| o3 (long prompts) | ~39 Wh | ~x130 |
| Deep Research (any provider) | 10 – 40 Wh | x33 – x133 |
In the worst case, a single reasoning query consumes the same as 130 normal text queries.
The Hugging Face AI Energy Score v2 (December 2025), which measured 205 open-source models on H100 GPUs, found even more extreme results: a peak of x514 over the text baseline in a reasoning benchmark (Phi-4).
Activating reasoning mode when it’s not necessary is like using a 40-ton truck to go buy bread.
The research by Bertazzini et al. (June 2025) measured 17 diffusion models on an RTX 4090 and found a 46-fold variation between the most efficient and the least efficient.
These are the extremes of the spectrum:
| Model | Consumption per image | Equivalence |
|---|---|---|
| LCM_SSD_1B (most efficient) | 0.086 Wh | ~0.3 text queries |
| Ideogram 3 | 0.8 – 2.5 Wh | 3 – 8 queries |
| Midjourney v7 | 1 – 4 Wh | 3 – 13 queries |
| DALL-E 4 | 2 – 6 Wh | 7 – 20 queries |
| Native GPT-4o image | ~3 Wh | ~10 queries |
| Lumina (least efficient) | 4.08 Wh | ~14 queries |
The difference between the cheapest and the most expensive model is the difference between turning on a flashlight and turning on an oven.
A counterintuitive finding: int8 quantization, which is supposed to reduce consumption, actually increases it by up to 64.5% in some image models. Efficiency is not always what it seems.
700 million images in one week. That’s what users generated when OpenAI launched native image generation in GPT-4o. That’s equivalent to approximately 2,100 MWh in image generation alone, in seven days.
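The 2,100 MWh figure is consistent with the ~3 Wh per image quoted above for native GPT-4o generation. A back-of-the-envelope check:

```python
# Aggregate energy for the GPT-4o image launch week.
images = 700_000_000   # images generated in the first week
wh_per_image = 3       # ~3 Wh per native GPT-4o image (table above)

total_wh = images * wh_per_image
total_mwh = total_wh / 1_000_000  # 1 MWh = 1,000,000 Wh
print(total_mwh)  # -> 2100.0
```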
If text is the bicycle, video is the airplane. The research by Delavande and Luccioni (September 2025) measured 7 open-source video models on H100 and documented an 800-fold range between the cheapest and the most expensive.
These numbers speak for themselves:
| Model | Duration | Consumption | Multiplier vs. text |
|---|---|---|---|
| AnimateDiff (most efficient) | 2 sec | 0.14 Wh | x0.5 |
| Runway Gen-3 | 5 sec | 3 – 8 Wh | x10 – x27 |
| WAN2.1-14B | 5 sec | ~109 Wh | ~x363 |
| Kling 3.0 | 15 sec | ~400 Wh | ~x1,333 |
| Sora 2 | 10 sec | ~1,000 Wh | ~x3,333 |
944 Wh per 5-second clip. That’s what Sora consumed — as much energy as charging a smartphone for a month. OpenAI shut it down on March 24, 2026 after accumulating total revenue of $2.1 million against estimated operating costs of $15 million per day.
A technical detail that aggravates the problem: doubling the video duration quadruples the energy consumption. The relationship is not linear but quadratic; energy grows with the square of the duration.
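Doubling duration while quadrupling energy means energy grows with the square of the duration. A sketch under that assumption; the constant k is illustrative, not a measured value:

```python
# Quadratic scaling: energy ~ k * duration^2.
# k chosen so a 5 s clip costs ~100 Wh (illustrative only).
def video_energy_wh(seconds: float, k: float = 4.0) -> float:
    return k * seconds ** 2

print(video_energy_wh(5))   # -> 100.0
print(video_energy_wh(10))  # -> 400.0  (x4 for doubling the duration)
print(video_energy_wh(20))  # -> 1600.0 (x16 for quadrupling it)
```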
Passoni et al. (May 2025) published the only paper measuring audio generation (text-to-audio), covering 7 models on NVIDIA A40 GPUs.
The concerning finding: newer models consistently consume more energy than older ones. The industry prioritizes quality over efficiency, without exception.
One single paper. Seven models. Zero data from commercial services. That is all the transparency that exists today in generative audio.
Code agents represent a new consumption paradigm. Simon P. Couch analyzed Claude Code sessions (January 2026) and found that a median session processes 592,000 tokens and consumes approximately 41 Wh — the equivalent of 136 conventional text queries.
Complex sessions can reach 50 to 200 Wh. Over a full workday, a developer using code agents can consume one to two kilowatt-hours of energy.
A developer with a code agent running for eight hours consumes the same as their refrigerator in 24 hours.
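From Couch's median figures one can derive a rough per-token energy rate for code-agent sessions, useful for scaling estimates. This is a derived approximation, not a published number, and the 10-million-token day is hypothetical:

```python
# Median Claude Code session (Couch, January 2026):
# 592,000 tokens processed, ~41 Wh consumed.
median_tokens = 592_000
median_wh = 41

wh_per_1k_tokens = median_wh / median_tokens * 1000
print(round(wh_per_1k_tokens, 3))  # -> 0.069 Wh per 1,000 tokens

# At that rate, a hypothetical heavy day of 10 million tokens:
print(round(wh_per_1k_tokens * 10_000))  # ~693 Wh
```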
This is perhaps the most important data point in the entire guide: efficiency per query improves constantly, but total consumption never stops growing.
Google demonstrated a 33-fold efficiency improvement in 12 months (May 2024 to May 2025). And yet, its total carbon emissions increased by 48-50% in the same period. Its actual electricity consumption grew by 27%, even though its accounting based on renewable energy certificates (market-based) declared a “12% reduction.”
This is the Jevons Paradox applied to AI: when a resource is used more efficiently, its cost drops, it becomes more accessible, the volume of use skyrockets, and total consumption increases.
The data confirms it: efficiency is necessary but insufficient. Without demand governance (choosing the right model, avoiding unnecessary use, measuring the impact), technological improvement only accelerates the problem.
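The projection table at the top of this guide makes the paradox concrete: multiply per-query energy by query volume and total daily consumption rises every year, even as each query gets cheaper. A quick sketch using those figures:

```python
# Per-query energy (Wh) and daily queries (billions) from the
# projection table at the top of this guide.
wh_per_query = {2024: 0.45, 2025: 0.30, 2026: 0.26, 2027: 0.22, 2028: 0.18}
daily_queries_bn = {2024: 0.7, 2025: 1.5, 2026: 3.5, 2027: 6, 2028: 9}

for year in wh_per_query:
    # billions of queries * Wh each, converted to MWh per day
    total_mwh = wh_per_query[year] * daily_queries_bn[year] * 1e9 / 1e6
    print(year, round(total_mwh))
# Totals rise from ~315 MWh/day (2024) to ~1,620 MWh/day (2028):
# a ~5x increase despite a 2.5x efficiency gain per query.
```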
Everything above is based on the measurements that exist. But there are entire categories for which we have no public data at all.
The barrier is not technical. NVIDIA DCGM, the GPU monitoring system, is already deployed in every data center in the world. APIs already report costs in dollars per call. Adding an energy_wh field would be trivial.
Companies choose not to do it. The barrier is political, not technical.
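To illustrate how little would change, here is what a typical chat-completion usage object might look like with such a field. This is a hypothetical response shape, not any provider's actual API:

```python
# Hypothetical API response: usage blocks already report token counts
# (and therefore dollars); an energy_wh field would fit alongside them.
response = {
    "model": "example-model",
    "usage": {
        "prompt_tokens": 1200,
        "completion_tokens": 450,
        # hypothetical field, sourced from GPU telemetry (e.g. NVIDIA DCGM)
        "energy_wh": 0.31,
    },
}

print(response["usage"]["energy_wh"])  # -> 0.31
```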
If you’re a user: Use our AI footprint calculator to estimate your consumption. As a rule of thumb: text < image < audio < code < reasoning < video. The smallest model that solves your task is always the best choice.
If you’re a company: AI consumption is already part of your carbon footprint under CSRD. Demand consumption data per service from your providers. If Google can publish 0.24 Wh, so can everyone else.
If you’re a developer: Flash/mini by default. Reasoning only when the problem requires it. Cache results. Every architecture decision has an energy cost that gets multiplied by millions of users.
If you’re a regulator: Measurement is possible today, with technology that already exists in every datacenter. Appliance energy labels reduced consumption by 60% over 30 years. AI needs its own label.
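The developer advice above ("cache results") can be as simple as memoizing deterministic calls so a repeated prompt never hits the model twice. A minimal sketch with Python's standard library; call_model is a hypothetical stand-in for a real API call:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def call_model(prompt: str) -> str:
    # Hypothetical stand-in for an expensive LLM API call.
    # With the cache, an identical prompt costs zero extra energy.
    return f"answer to: {prompt}"

call_model("summarize the CSRD directive")
call_model("summarize the CSRD directive")  # served from cache
print(call_model.cache_info().hits)  # -> 1
```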
Related
The gap between AI investment and the real value it generates, and what companies can do to be in the 5% that does work
The AISHA Manifesto: why we defend artificial intelligence and why we demand it be used responsibly
Our calculator helps you put queries, images, reasoning and agents into context.