Truly useful primary sources
10
Among papers, open benchmarks, corporate statements, and auditable estimates.
Transparency / Opacity
Map of which providers publish data, which do not, and with what methodological quality.
As of April 2026, nearly all debate about AI energy consumption rests on a handful of laboratory measurements, a single granular production figure, and several corporate or academic estimates with high margins of error. The main problem is not a lack of interest: it is the lack of open, comparable telemetry by service.
Public range for a text query
0.24–0.34 Wh
Google and OpenAI mark the narrow known reference range for general chat.
Maximum observed deviation
x 27
Opaque estimation chains can inflate the difference between inferred and actual figures.
This inventory separates direct measurement, production data, and indirect estimation to answer a simple question: what do we actually know and what are we still assuming.
The conclusion is uncomfortable: most figures circulating in the press, regulation, and marketing are not verifiable telemetry. They are approximations built on assumed hardware, estimated utilization, and proprietary models that remain closed.
Logarithmic scale based on the most cited public range for text, image generation, and open-source video.
Conclusion: the central problem is no longer calculating a nice number, but distinguishing between real telemetry and speculative narrative. Without that distinction, any comparison between models remains fragile.
This section gathers the sources that genuinely contribute to the energy debate: direct laboratory measurement, a granular production case, and a small set of academic or corporate estimates that, even with limitations, help bracket orders of magnitude.
| Source | Type | Reported value | Key finding |
|---|---|---|---|
| Google — Gemini median August 2025 · arXiv:2508.15734v1 | Production | 0.24 Wh / query | The only granular production figure published, including TPU, host overhead, and PUE. |
| Sam Altman — ChatGPT June 2025 · corporate blog | Estimate | 0.34 Wh / query | Serves as a media reference, but comes without methodology, peer review, or breakdown by modality. |
| Hugging Face AI Energy Score December 2025 · Sasha Luccioni et al. | Direct | 1 to 5 stars | Compares over 200 open models and shows that reasoning can increase consumption by up to hundreds of times. |
| ML.Energy (University of Michigan) 2025-2026 · Jae-Won Chung et al. | Direct | Open leaderboard | Provides useful context for open-source models, but does not solve the black box of closed providers. |
| The Hidden Cost of an Image June 2025 · arXiv:2506.17016 | Direct | Up to x46 between models | Confirms the enormous energy dispersion in image generation and the limited usefulness of comparing by brand without technical context. |
| Video Killed the Energy Budget September 2025 · arXiv:2509.19222 | Direct | Up to x2,000 vs text | Open-source video already marks a clear physical rupture: modality matters more than the model's marketing. |
| Generative audio May 2025 · arXiv:2505.07615 | Direct | Varies by model | Nearly the only useful empirical reference for text-to-audio, and it leaves out the dominant commercial platforms. |
| How Hungry is AI? 2025 · arXiv:2505.09598 | Estimate | o3: 39.2 Wh · Claude 3.7: 17 Wh | Good snapshot of possible scenarios, but remains theoretical inference based on pricing and hardware assumptions. |
| Monte Carlo bottom-up simulation September 2025 · arXiv:2509.20241 | Estimate | Median 0.34 Wh | One of the best academic approximations, but depends on too many unobservable input assumptions. |
| Claude Code energy estimate January 2026 · Simon P. Couch | Estimate | 41 Wh / median session | Useful for sizing agents, although the author himself acknowledges a margin of error close to x3. |
The table summarizes comparable findings. Full details and methodological limitations remain in the original sources.
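The "Estimate" rows above all rest on arithmetic of roughly this shape: assumed hardware power, scaled by assumed utilization, overhead, and PUE, divided over assumed throughput. A minimal sketch follows; every input value is an illustrative assumption chosen here for demonstration, not a measured or published figure.

```python
# Hypothetical bottom-up estimate of energy per text query.
# Every number below is an illustrative assumption, not telemetry.
gpu_power_w = 700          # assumed accelerator board power (W)
utilization = 0.4          # assumed average utilization while serving
host_overhead = 1.3        # assumed host CPU/DRAM/network multiplier
pue = 1.2                  # assumed data-center PUE
tokens_per_query = 500     # assumed median tokens generated per query
throughput_tok_s = 300     # assumed per-accelerator tokens/s (batched)

seconds_per_query = tokens_per_query / throughput_tok_s
energy_wh = (gpu_power_w * utilization * host_overhead * pue
             * seconds_per_query) / 3600
print(f"{energy_wh:.3f} Wh per query")  # ~0.2 Wh with these inputs
```

Note that six plausible-looking inputs are enough to land anywhere in or far outside the published 0.24-0.34 Wh range; none of them is observable from outside a provider.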
Opacity is not uniform. There is an especially severe gap in agents, commercial video, aggregated inference, and distributed workloads within closed platforms. This table documents what key information remains unpublished and where there has been explicit refusal or sustained silence.
| Provider / service | Missing data | Status |
|---|---|---|
| OpenAI Text (GPT-5) | Actual consumption per query | No data |
| OpenAI Image (DALL-E / GPT-4o) | Actual consumption per image | No data |
| OpenAI Video (Sora 2) | Consumption per clip in production | No data |
| OpenAI Agent (Deep Research) | Actual consumption per session | No data |
| Anthropic Text (Claude) | Actual consumption per query in production | No data |
| Anthropic Agents (Claude Code / Research) | Actual consumption per automated session | No data |
| Google Agent (Gemini Deep Research) | Actual consumption per session | Request denied |
| Google Video (Veo 2/3) | Consumption per clip in production | No data |
| Meta Integrated inference | Aggregated AI consumption across Facebook, Instagram, and WhatsApp | No data |
| xAI Text (Grok 4) | Actual consumption and emissions from Colossus | No data |
| Music platforms Suno / Udio | Any public empirical data | No data |
| Commercial video Runway / Pika / Kling | Any public empirical data | No data |
The absence of data does not mean the absence of internal telemetry. It means the absence of useful publication for customers, regulators, or researchers.
The most serious opacity is no longer in training, but in recurring commercial inference: agents, video, tools integrated into productivity suites, and aggregated consumption of platforms with billions of users.
The fact that Google was able to publish a median per query and, at the same time, deny more specific data for intensive services shows that the barrier is selective. Enough is shared to shape a narrative, not enough to enable comparison.
If the industry knows the exact consumption to manage capacity, pricing, and usage limits, then the absence of publication is not ignorance: it is strategy.
Bottom-up estimates do not fail due to individual bad faith, but because of the accumulation of unobservable assumptions. Each step adds uncertainty: architecture, hardware, utilization, overhead, PUE, and cost allocation among multiple tasks or users.
When a provider does not publish per-query telemetry, the analyst reconstructs the energy cost from the outside. That work can be intellectually rigorous and still remain an informed speculation.
The problem is cumulative: if each step introduces a reasonable margin, the total error can grow until it renders commercial or regulatory comparison useless.
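The compounding can be made concrete with a small Monte Carlo sketch. Assume, purely for illustration, that each of five reconstruction steps (architecture, hardware, utilization, overhead, PUE) is individually "reasonable": the analyst's guess is within a factor of 1.5 of the true value, log-uniformly. The spread of the combined estimate is then far wider than any single step suggests.

```python
import math
import random

random.seed(0)

N_STEPS = 5      # assumed: architecture, hardware, utilization, overhead, PUE
SPREAD = 1.5     # assumed: each guess within x1.5 of truth, log-uniform


def sample_ratio():
    """Ratio of one reconstructed estimate to the true value."""
    log_r = sum(
        random.uniform(-math.log(SPREAD), math.log(SPREAD))
        for _ in range(N_STEPS)
    )
    return math.exp(log_r)


ratios = sorted(sample_ratio() for _ in range(100_000))
p5, p95 = ratios[5_000], ratios[95_000]
print(f"90% of estimates fall between x{p5:.2f} and x{p95:.2f} of truth")
```

Under these toy assumptions the 90% interval spans roughly a factor of five end to end, even though no single step was off by more than 1.5x. This is the mechanism behind the large deviations observed between inferred and published figures, not bad faith at any individual step.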
AISHA: when an energy figure depends on too many invisible assumptions, it ceases to be an operational data point and becomes a sophisticated conjecture. The regulatory goal should not be to guess better, but to measure better.
Same category

- April 1, 2026 · Analysis of the economic and strategic incentives behind the lack of transparency.
- April 1, 2026 · What can already be measured, what standards are missing, and how regulatory demands fit in.
- April 1, 2026 · What would change if the market had comparable consumption metrics per service and modality.