The New Cost Ledger of AI

Overview

AI Has a New Cost Ledger

The conventional business cost vocabulary — inventory, credit, wastage, interest coverage — now has new entries. AI has generated an entirely fresh set of line items that creep up insidiously if left untracked.

Token Economics

What Is a Token, and What Does It Cost?

Every word you send to an AI engine, and every word it returns, costs money. A token is roughly 4 characters or 0.75 words. Input tokens and output tokens are billed separately; output costs more because generating it demands more compute and energy.

Costs are quoted per million tokens. A 1,000-word article submitted for analysis consumes approximately 1,300 tokens.

Price Trajectory

Costs Are Falling — but Total Spend Is Rising

OpenAI's output token cost has fallen from $30 per million (GPT-4, 2023) to $10 (GPT-4 Turbo), $5 (GPT-4o), and $2.50 (GPT-5.4). Yet overall enterprise AI spend continues to climb as employees use AI for an ever-wider range of tasks, a classic Jevons Paradox dynamic.

Cost Multipliers

Beyond Token Price: What Else Drives Cost

Model size, reasoning depth (extended thinking), context window size, multimodal processing, latency tiers, and enterprise uptime guarantees each add to the bill. Any one of these can silently inflate usage costs by an order of magnitude.

Governance

Tokenmaxxing — the New Corporate Risk

Tokenmaxxing, meaning excessive AI-token consumption without commensurate business results, is the term now in circulation. Axios reported a USD 500 million single-month bill at a US company with no usage controls. Amazon has stopped publishing its AI usage dashboard, and Microsoft has scaled back some licences, amid runaway expenditure.

India Context

India Is Still in the Honeymoon Stage

Indian companies do not disclose AI spend with the candour of their US counterparts. Proprietary tracking suggests India remains at a stage where the promise of gains outpaces beady-eyed cost focus. That equilibrium will shift.

Personal Usage

The Consumer Treadmill

Free-tier context windows shrink; paid tiers are calibrated to entice without fully satisfying. Even as per-token costs fall, total personal AI spending rises with habit formation. A pricing equilibrium is approaching, but it is not yet here.

Introduction

A New Lexicon of Cost Has Arrived With AI

The conventional cost vocabulary of running a business (selling expenses, wastage, inventory, credit and debit days, interest coverage ratios) now has new entries. With the rapid invasion of AI into daily commercial and personal life, an entirely fresh set of line items has developed. You had better know them, track them, and control them, because they have the nasty habit of creeping up on you insidiously.

These are not theoretical future concerns. They are live, recurring, and growing charges appearing on enterprise procurement ledgers today — and increasingly catching the attention of finance departments who find them difficult to benchmark, forecast, or justify.

Fundamentals

Token Cost — The Unit Price Everyone Now Understands

Put plainly: every word that you put to an AI engine costs you, and every answer that it gives is also charged to you. Charges accumulate over a conversation, since longer exchanges consume more tokens, and every output (Excel, Word, PowerPoint) has its price.

Tokenisation and its cost is an interesting study. There are two categories of token: input and output. An input token is consumed when you submit words, images, or audio into the engine. Output tokens are generated by the AI engine's response — be it text, code, music, or images.

A token is roughly 4 characters, or 0.75 words. Submitting a 1,000-word article for analysis would consume approximately 1,300 input tokens. Costs are quoted per million tokens (per-MTok).
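The rule of thumb above can be turned into a rough estimator. The sketch below is only an illustration, assuming the 4-characters and 0.75-words ratios quoted in this article; the function names are hypothetical, and real tokenisers vary by model and language, so the result is an approximation rather than a billing figure.

```python
import math

# Rules of thumb quoted in the article, not constants of any real tokeniser.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def tokens_from_words(word_count: int) -> int:
    """Estimate the token count for a given number of words, rounding up."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def tokens_from_text(text: str) -> int:
    """Estimate the token count for raw text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


if __name__ == "__main__":
    # A 1,000-word article lands close to the ~1,300 tokens cited in the text.
    print(tokens_from_words(1000))
    print(tokens_from_text("Every word you send is metered."))
```

For invoicing-grade numbers, the provider's own tokeniser or the usage figures returned with each API response are the safer source.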

Input tokens typically cost less than output tokens, because output is generated one token at a time, with each new token requiring a further pass through the model, whereas the input can be processed in a single pass. Output generation therefore demands substantially more compute and energy.

Tokenisation — a worked example

A 1,000-word article ≈ 1,300 input tokens. At GPT-4o pricing of $5 per million input tokens, that single analysis costs roughly $0.0065, negligible in isolation. A query averaging 2,000 tokens in and 3,000 out costs about $0.025 if output is also priced at $5 per million, so 500 such queries a day come to only around $375 a month. Scale that to 500 employees each making about 27 queries a day, however, and a mid-sized enterprise breaches $10,000 per month without any governance in place.
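The arithmetic behind a worked example of this kind can be sketched in a few lines. This is a minimal illustration rather than a billing tool: the $5 input price comes from the article, while the $5 output price, the 30-day month, and all function names are assumptions made for the example.

```python
def cost_usd(tokens: float, price_per_mtok: float) -> float:
    """Cost in USD of a token count at a price quoted per million tokens."""
    return tokens * price_per_mtok / 1_000_000


def query_cost(input_tokens: float, output_tokens: float,
               input_price: float, output_price: float) -> float:
    """Cost of one query: input and output are billed at separate rates."""
    return cost_usd(input_tokens, input_price) + cost_usd(output_tokens, output_price)


def monthly_cost(queries_per_day: float, per_query_cost: float,
                 days_per_month: int = 30) -> float:
    """Monthly spend for a steady daily query volume (30-day month assumed)."""
    return queries_per_day * per_query_cost * days_per_month


def queries_per_day_for_budget(monthly_budget: float, per_query_cost: float,
                               days_per_month: int = 30) -> float:
    """Daily query volume at which a monthly budget is exactly consumed."""
    return monthly_budget / (per_query_cost * days_per_month)


if __name__ == "__main__":
    INPUT_PRICE = 5.0   # USD per MTok, as quoted in the article
    OUTPUT_PRICE = 5.0  # USD per MTok, assumed for this illustration

    print("1,300-token article:", cost_usd(1_300, INPUT_PRICE))
    per_query = query_cost(2_000, 3_000, INPUT_PRICE, OUTPUT_PRICE)
    print("2,000-in / 3,000-out query:", per_query)
    print("Daily queries that consume $10,000 a month:",
          round(queries_per_day_for_budget(10_000, per_query)))
```

Changing the two price constants or the token counts shows how sensitive the monthly figure is to output length and query volume, which is the point of the example.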

Price trajectory — OpenAI GPT series

GPT-4 (2023): $30/MTok output → GPT-4 Turbo: $10 → GPT-4o: $5 → GPT-5.4: $2.50. A ~92% reduction in unit cost over three years, even as model capability has risen sharply. Total enterprise spend has nonetheless continued upward as usage has expanded far faster than prices have fallen.
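The size of each step in this trajectory can be checked with a short calculation. The sketch below uses only the four prices quoted above; the helper names are hypothetical.

```python
# Output prices in USD per million tokens, as quoted in the article.
PRICES = [
    ("GPT-4", 30.0),
    ("GPT-4 Turbo", 10.0),
    ("GPT-4o", 5.0),
    ("GPT-5.4", 2.50),
]


def pct_reduction(old_price: float, new_price: float) -> float:
    """Percentage fall from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


def step_reductions(prices):
    """Percentage fall between each consecutive pair of models."""
    return [
        (prev_name, name, pct_reduction(prev_price, price))
        for (prev_name, prev_price), (name, price) in zip(prices, prices[1:])
    ]


if __name__ == "__main__":
    for prev_name, name, pct in step_reductions(PRICES):
        print(f"{prev_name} -> {name}: {pct:.1f}% lower")
    overall = pct_reduction(PRICES[0][1], PRICES[-1][1])
    print(f"Overall: {overall:.1f}% lower")
```

The largest single cut is the first one; the later steps each halve the price.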

Reference Data

Leading AI Engine Token Pricing — Indicative Rates

The following are indicative rates per million tokens (per-MTok) for major commercially available large language models as of mid-2026. Prices are subject to tier, volume discount, and model version; ranges indicate tiered or variable pricing.

#	Model / Engine	Company	Input Cost ($/MTok)	Output Cost ($/MTok)
1	GPT-5.5	OpenAI	~$5.00	~$30.00
2	Claude Opus 4.7	Anthropic	~$5.00	~$25.00
3	Gemini 3.1 Pro	Google AI	$2.00 – $4.00	$12.00 – $18.00
4	Grok-4.3	xAI	~$1.25	~$2.50
5	DeepSeek V4 Pro	DeepSeek	~$0.435	~$0.87
6	Mistral Medium 3.5	Mistral AI	~$1.50	~$7.50
7	Llama 4 Maverick	Meta AI	~$0.27	~$0.85
8	Command R+	Cohere	~$2.50	~$10.00
9	Qwen 3.5 Max	Alibaba / Qwen	$0.60 – $1.20	$2.00 – $6.00
10	ERNIE 4.5	Baidu AI Cloud	$1.00 – $3.00	$5.00 – $12.00

Source: Published API pricing pages of respective AI providers; analyst estimates where official tiered rates not publicly disclosed. Rates as of mid-2026; subject to revision. MTok = million tokens.
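To see what these rates mean for a concrete workload, the table can be applied to a fixed token volume. The sketch below is illustrative only: it copies the indicative rates above, takes midpoints where a range is given, and prices a hypothetical batch of 1,000 queries at 2,000 tokens in and 3,000 tokens out each. The function names are assumptions.

```python
# (input, output) in USD per million tokens, copied from the table above.
# Ranges are replaced by their midpoints.
RATES = {
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "Gemini 3.1 Pro": (3.00, 15.00),   # midpoint of $2-4 / $12-18
    "Grok-4.3": (1.25, 2.50),
    "DeepSeek V4 Pro": (0.435, 0.87),
    "Mistral Medium 3.5": (1.50, 7.50),
    "Llama 4 Maverick": (0.27, 0.85),
    "Command R+": (2.50, 10.00),
    "Qwen 3.5 Max": (0.90, 4.00),      # midpoint of $0.60-1.20 / $2-6
    "ERNIE 4.5": (2.00, 8.50),         # midpoint of $1-3 / $5-12
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a workload on one model at the indicative rates."""
    input_price, output_price = RATES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_models(input_tokens: int, output_tokens: int):
    """Models sorted from cheapest to most expensive for the workload."""
    costs = [(m, workload_cost(m, input_tokens, output_tokens)) for m in RATES]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical batch: 1,000 queries x (2,000 in + 3,000 out) tokens.
    for model, cost in rank_models(2_000_000, 3_000_000):
        print(f"{model:<20} ${cost:,.2f}")
```

A ranking on price alone ignores capability differences between models, so it is a starting point for tier selection rather than a recommendation.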

Price Trend Analysis

The Falling Unit Cost — and Rising Total Spend

Per-token pricing has fallen dramatically across the industry over three years, driven by model efficiency, competition, and scale. The OpenAI trajectory is the most documented and illustrates a structural trend applicable across providers: unit price falls, but total expenditure rises as usage expands to fill the reduced-cost capacity — a dynamic economists recognise as the Jevons Paradox.

Figure: OpenAI Output Token Price Trajectory (2023–2026). Cost per million output tokens, USD.

Source: OpenAI published pricing. GPT-5.4 rate is indicative based on announced pricing tiers.

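Prices have fallen from $30 to $2.50 per million output tokens, a reduction of approximately 92%, while the total number of tokens consumed by enterprise users has grown by an estimated order of magnitude or more. A twelvefold price cut is offset only by a twelvefold rise in usage, so rising budgets imply that consumption has grown faster than that. The net effect on enterprise AI budgets has been upward, not downward.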
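The relationship between a price cut and total spend can be made explicit. The sketch below, with hypothetical function names, computes the usage growth at which spend is unchanged and the resulting spend ratio for any given growth multiple, using the $30 and $2.50 prices from the text.

```python
def breakeven_usage_multiple(old_price: float, new_price: float) -> float:
    """Usage growth factor at which total spend is unchanged after a price cut."""
    return old_price / new_price


def spend_ratio(old_price: float, new_price: float, usage_multiple: float) -> float:
    """New total spend divided by old total spend.

    Values above 1.0 mean the budget has grown despite the lower unit price.
    """
    return usage_multiple * new_price / old_price


if __name__ == "__main__":
    old, new = 30.0, 2.50  # USD per million output tokens, from the article
    print("Break-even usage growth:", breakeven_usage_multiple(old, new))
    for growth in (5, 10, 12, 20, 50):  # illustrative growth multiples
        print(f"{growth}x usage -> spend ratio {spend_ratio(old, new, growth):.2f}")
```

Total spend rises only when usage grows by more than the break-even multiple, which is the condition the Jevons Paradox argument depends on.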

Figure: Indicative Token Pricing Comparison, Mid-2026. Input vs. output cost per million tokens (USD) across leading models.

Source: Published API pricing; analyst estimates. Ranges use midpoint values. See table above for full detail.

Cost Architecture

Beyond Token Price — The Hidden Cost Multipliers

Token price is only the most visible element of AI cost. Several architectural and usage-driven factors multiply the base per-token rate significantly. Understanding these is essential for any organisation seeking to govern AI expenditure.

Model Size

Large, complex models such as GPT-5.5 and Claude Opus carry a significant premium. The additional engineering cost, infrastructure, and inference compute required to run frontier models are reflected directly in per-token pricing.

Reasoning Depth

Extended thinking (visible on Claude as Low, Medium, or High effort, and on other engines as chain-of-thought or o-series modes) instructs the engine to work through intermediate reasoning steps before answering. Those steps consume additional tokens, which multiplies consumption and cost per query, sometimes dramatically.
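The effect of reasoning depth on a single query can be sketched as follows. This is an illustration under stated assumptions, not any provider's billing formula: it assumes reasoning tokens are billed at the output rate (check your provider's documentation), and the per-level token overheads are invented for the example. The $5 and $25 prices are the indicative Claude Opus 4.7 rates from the table above.

```python
# Hypothetical reasoning-token overhead per effort level (illustrative only).
REASONING_TOKENS = {"none": 0, "low": 500, "medium": 2_000, "high": 8_000}


def query_cost_with_reasoning(input_tokens: int, visible_output_tokens: int,
                              effort: str, input_price: float,
                              output_price: float) -> float:
    """USD cost of one query when reasoning tokens are billed as output.

    Prices are per million tokens. The billing assumption is labelled in
    the lead-in text and may not match every provider.
    """
    billed_output = visible_output_tokens + REASONING_TOKENS[effort]
    return (input_tokens * input_price + billed_output * output_price) / 1_000_000


if __name__ == "__main__":
    baseline = query_cost_with_reasoning(2_000, 3_000, "none", 5.0, 25.0)
    for effort in REASONING_TOKENS:
        cost = query_cost_with_reasoning(2_000, 3_000, effort, 5.0, 25.0)
        print(f"{effort:<7} ${cost:.4f}  ({cost / baseline:.2f}x baseline)")
```

Because the visible answer is the same length at every level, the whole difference in cost comes from tokens the user never sees.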

Context Window Size

The context window is the maximum number of tokens an AI engine holds in active memory. Free tiers have historically offered small windows; paid tiers now offer over one million tokens, roughly 750,000 words at 0.75 words per token, or several full-length books. Larger contexts drive up cost per session significantly.
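One reason context drives up session cost is that chat-style APIs are commonly stateless, so each turn resends the conversation so far as input. The sketch below models that under simplifying assumptions: fixed token counts per user message and per reply (both hypothetical), no truncation, and no caching discounts.

```python
def cumulative_input_tokens(turns: int, user_tokens_per_turn: int,
                            reply_tokens_per_turn: int) -> int:
    """Total input tokens billed over a conversation that resends its history.

    Each turn's prompt is the full prior history plus the new user message.
    """
    total_input = 0
    history = 0
    for _ in range(turns):
        prompt = history + user_tokens_per_turn
        total_input += prompt
        history = prompt + reply_tokens_per_turn
    return total_input


if __name__ == "__main__":
    # Hypothetical conversation: 200-token questions, 300-token answers.
    for turns in (5, 10, 20, 40):
        print(turns, "turns ->", cumulative_input_tokens(turns, 200, 300), "input tokens")
```

Under these assumptions, doubling the number of turns roughly quadruples the billed input, which is why session management and context pruning appear as control levers.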

Multimodal Processing

Processing images, video, and audio requires substantially more compute than text alone. OpenAI shut down Sora as video generation costs soared; Google's Veo 3 has reduced its free usage tiers. Multimodal queries carry a steep cost premium over text-only interactions.

Latency Tiers

Faster response times require priority routing and dedicated compute allocation. Organisations requiring near-real-time responses — for customer-facing applications, for instance — pay a latency premium above standard API rates.

Enterprise Uptime & SLA

The 24×7 availability and reliability that enterprise deployments require carry their own premium. SLA commitments, dedicated infrastructure, data residency requirements, and compliance overlays all add to the total cost of AI ownership.

Enterprise Risk

Tokenmaxxing — The New Corporate Profligacy

"A recent report by Axios noted the USD 500 million bill in just one month that hit a company where there were no controls on usage. Everything that could be used — Agents, code, queries — was used."

The unchecked explosion in AI usage costs has spawned a term: tokenmaxxing — excessive AI-token consumption without commensurate business results. In other words, carelessly using AI simply because it is available and accessible.

It is not only individual employees who go awry. Amazon US recently stopped publishing its AI dashboard after employees began gaming usage metrics in ways that cost the company handsomely. Microsoft recently scaled back some licences, too, because of cost. These are not isolated incidents; they are early signals of a structural governance gap.

From being treated as operational costs (variable, discretionary, department-level), AI expenditures are now starting to behave like capital commitments: large, recurring, difficult to reverse, and carrying unclear direct business benefit. Finance departments are beginning to ask the harder questions. A better, faster report is produced, but did it secure more business? AI identified vendor mismanagement, but did it lead to structural overhaul?

Finance Department Checklist — questions now being asked

What is the per-query cost by department and use case? · What measurable business outcome was produced by each AI workload? · Which engine tiers are being used, and are they proportionate to task complexity? · Are Agents and automated pipelines running on metered or unmetered API calls? · Is there an AI usage governance policy, and who owns it?
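The first question on the checklist, per-query cost by department and use case, can be answered from a usage log. The sketch below is a minimal illustration: the `UsageRecord` structure, its field names, and the sample data are all hypothetical, and a real deployment would read these values from the provider's usage export or an internal gateway.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One logged AI query (hypothetical schema for illustration)."""
    department: str
    use_case: str
    input_tokens: int
    output_tokens: int


def cost_by_department_and_use_case(records, input_price: float, output_price: float):
    """Aggregate query count, total cost, and cost per query for each
    (department, use_case) pair. Prices are USD per million tokens."""
    totals = defaultdict(lambda: {"queries": 0, "cost": 0.0})
    for record in records:
        cost = (record.input_tokens * input_price
                + record.output_tokens * output_price) / 1_000_000
        bucket = totals[(record.department, record.use_case)]
        bucket["queries"] += 1
        bucket["cost"] += cost
    return {
        key: {**bucket, "cost_per_query": bucket["cost"] / bucket["queries"]}
        for key, bucket in totals.items()
    }


if __name__ == "__main__":
    sample = [  # invented sample data
        UsageRecord("Finance", "reporting", 2_000, 3_000),
        UsageRecord("Finance", "reporting", 1_000, 1_000),
        UsageRecord("Sales", "email drafting", 500, 500),
    ]
    report = cost_by_department_and_use_case(sample, 5.0, 5.0)
    for (department, use_case), row in sorted(report.items()):
        print(department, use_case, row)
```

Cost alone does not answer the second question on the checklist; linking each bucket to a measured business outcome requires data that sits outside the usage log.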

India Perspective

India — Still in the AI Honeymoon Phase

Indian companies do not reveal their AI spends, nor are they as candid as their US counterparts on AI, tokens, cost, and results. Proprietary AI tracker data (see AI and Our World, SBSI blog) reveals that India is still at the honeymoon stage — where the promise of gains is considerably more prominent than any beady-eyed cost focus.

This is not unusual for an early adoption cycle. What is worth noting is that the reckoning, when it comes, tends to be sharper for organisations that have not built cost governance frameworks in the adoption phase. The US experience should serve as a valuable early warning.

India's large IT services and BPO sectors, which are integrating AI deeply into delivery pipelines, will be among the first to confront the cost discipline question at scale. The economics are unforgiving: if AI is delivering genuine productivity gains, those gains must be measured and attributed. If they are not being measured, they are likely not being captured — and costs will compound regardless.

Personal & Consumer Usage

The Consumer Treadmill — Free Until It Is Not

In the personal space, users will have noticed that free-tier limits are reached quickly. AI providers either reduce the context window on free tiers, or construct paid tiers calibrated to just about satisfy — while leaving the user enticingly short. The result is a familiar SaaS treadmill: start free, hit the wall, upgrade.

So even if token costs fall, as they will with continued scale and competition, total AI spending rises because usage expands to fill the newly affordable space. A pricing equilibrium will eventually be established, as it has been in cloud computing and mobile data. We are not there yet.

The structural dynamic — falling per-unit cost, rising total consumption, lagging governance — is the defining economic characteristic of the current AI adoption phase. Organisations and individuals who understand this dynamic early will be better placed to extract value without incurring the runaway costs that have already surprised some of the world's largest technology companies.

Summary

The AI Cost Ledger — Key Line Items

Cost Category	Definition	Control Lever
Input Token Cost	Charge per million tokens submitted to the engine (text, images, documents)	Prompt engineering; batch processing
Output Token Cost	Higher charge per million tokens generated by the model in its response	Response length limits; summarisation
Reasoning / Extended Thinking	Premium for multi-step inference and chain-of-thought processing modes	Match reasoning level to task complexity
Context Window Cost	Cost scales with how much prior conversation / document is loaded into active memory	Session management; context pruning
Multimodal Processing	Additional compute premium for image, audio, and video inputs or outputs	Limit multimodal to necessary use cases
Latency Premium	Surcharge for faster API response times and priority routing	Async processing where real-time is not required
Enterprise SLA / Uptime	Premium for guaranteed availability, data residency, and compliance coverage	Tier selection; workload classification

Source: SBSI Research synthesis; AI provider documentation; industry analyst frameworks.