The conventional business cost vocabulary — inventory, credit, wastage, interest coverage — now has new entries. AI has generated an entirely fresh set of line items that creep up insidiously if left untracked.
Every word you input to an AI engine, and every word it returns, costs money. A token is roughly 4 characters or 0.75 words. Input tokens and output tokens are billed separately; output costs more largely because it is generated one token at a time, which demands more compute and energy than reading the input.
Costs are quoted per million tokens. A 1,000-word article submitted for analysis consumes approximately 1,300 tokens.
OpenAI's output token cost has fallen from $30 per million (GPT-4, 2023) to $10 (Turbo), $5 (GPT-4o), and $2.50 (GPT-5.4). Yet overall enterprise AI spend continues to climb as employees use AI for an ever-wider range of tasks — a classic Jevons Paradox dynamic.
Model size, reasoning depth (extended thinking), context window size, multimodal processing, latency tiers, and enterprise uptime guarantees each add to the bill. Any one of these can silently inflate usage costs by an order of magnitude.
Tokenmaxxing, meaning excessive AI-token consumption without commensurate business results, is the term now in circulation. Axios reported a USD 500 million single-month bill at a US company with no usage controls. Amazon has stopped publishing its AI dashboard, and Microsoft has scaled back some licences, as costs mounted.
Indian companies do not disclose AI spend with the candour of their US counterparts. Proprietary tracking suggests India remains at a stage where the promise of gains outpaces beady-eyed cost focus. That equilibrium will shift.
Free-tier context windows shrink; paid tiers are calibrated to entice without fully satisfying. Even as per-token costs fall, total personal AI spending rises with habit formation. A pricing equilibrium is approaching — but not yet here.
A New Lexicon of Cost Has Arrived With AI
The conventional cost vocabulary of running a business — selling expenses, wastage, inventory, credit and debit days, interest coverage ratios — now has new entries. With the rapid invasion of AI in daily commercial and personal life, an entirely fresh set of line items has developed. You had better know them, track them, and control them, because they have the nasty habit of creeping up on you insidiously.
These are not theoretical future concerns. They are live, recurring, and growing charges appearing on enterprise procurement ledgers today — and increasingly catching the attention of finance departments who find them difficult to benchmark, forecast, or justify.
Token Cost — The Unit Price Everyone Now Understands
Put plainly: every word you put to an AI costs you, and every answer it gives is charged to you as well. The cost of a conversation climbs faster than its length, because the accumulated exchange is typically re-sent as input with each new turn. Every output (Excel, Word, PowerPoint) has its price.
Tokenisation and its cost make an interesting study. There are two categories of token: input and output. Input tokens are consumed when you submit words, images, or audio to the engine. Output tokens are generated in the AI engine's response, be it text, code, music, or images.
A token is roughly 4 characters, or 0.75 words. Submitting a 1,000-word article for analysis would consume approximately 1,300 input tokens. Costs are quoted per million tokens (per-MTok).
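The rule of thumb above can be turned into a quick estimator. This is a minimal sketch that uses only the two ratios quoted in the text; the function names are illustrative, and real tokenisers vary by model and language, so treat the result as an approximation rather than a billing figure.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb from the text
WORDS_PER_TOKEN = 0.75   # rule of thumb from the text


def tokens_from_words(word_count: int) -> int:
    """Estimate tokens from a word count (1 token is roughly 0.75 words)."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def tokens_from_text(text: str) -> int:
    """Estimate tokens from raw text (1 token is roughly 4 characters)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


if __name__ == "__main__":
    # A 1,000-word article: about 1,334 tokens, i.e. "approximately 1,300".
    print(tokens_from_words(1000))
```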
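Input tokens typically cost less than output tokens, largely because output is generated one token at a time, with the model run afresh for each, whereas the input can be read in a single parallel pass. Generation therefore demands substantially more compute and energy.
A 1,000-word article ≈ 1,300 input tokens. At GPT-4o pricing of $5 per million input tokens, that single analysis costs roughly $0.0065, negligible in isolation. A fuller query of 2,000 tokens in and 3,000 out costs about $0.025 if output is also taken at $5 per million. Five hundred such queries a day come to only about $375 a month; breaching $10,000 a month takes roughly 13,000 queries a day, the equivalent of 500 employees at about 27 queries each, a volume a mid-sized enterprise can reach once agents and automated pipelines run without any governance in place.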
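The arithmetic behind these figures can be sketched as below. The $5 per million input rate comes from the text; applying the same $5 to output tokens is an assumption made here for simplicity, and the function names are illustrative.

```python
def query_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost of one query in dollars; prices are in dollars per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def monthly_cost(queries_per_day, cost_per_query, days=30):
    """Monthly spend for a steady daily query volume."""
    return queries_per_day * cost_per_query * days


def queries_per_day_for_budget(budget, cost_per_query, days=30):
    """Daily query volume at which monthly spend reaches the given budget."""
    return budget / (cost_per_query * days)


if __name__ == "__main__":
    # The 1,000-word article: 1,300 input tokens, no output counted.
    print(query_cost(1300, 0, input_price=5.0, output_price=5.0))

    # A fuller query: 2,000 tokens in, 3,000 out (output rate assumed to be $5).
    per_query = query_cost(2000, 3000, input_price=5.0, output_price=5.0)
    print(monthly_cost(500, per_query))
    print(queries_per_day_for_budget(10_000, per_query))
```

At fixed rates, cost scales linearly with volume, so the monthly total depends far more on how many queries and automated calls run each day than on the price of any single one.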
GPT-4 (2023): $30/MTok output → GPT-4 Turbo: $10 → GPT-4o: $5 → GPT-5.4: $2.50. A ~92% reduction in unit cost over three years, even as model capability has risen sharply. Total enterprise spend has nonetheless continued upward as usage has expanded far faster than prices have fallen.
Leading AI Engine Token Pricing — Indicative Rates
The following are indicative rates per million tokens (per-MTok) for major commercially available large language models as of mid-2026. Prices are subject to tier, volume discount, and model version; ranges indicate tiered or variable pricing.
| # | Model / Engine | Company | Input Cost ($/MTok) | Output Cost ($/MTok) |
|---|---|---|---|---|
| 1 | GPT-5.5 | OpenAI | ~$5.00 | ~$30.00 |
| 2 | Claude Opus 4.7 | Anthropic | ~$5.00 | ~$25.00 |
| 3 | Gemini 3.1 Pro | Google AI | $2.00 – $4.00 | $12.00 – $18.00 |
| 4 | Grok-4.3 | xAI | ~$1.25 | ~$2.50 |
| 5 | DeepSeek V4 Pro | DeepSeek | ~$0.435 | ~$0.87 |
| 6 | Mistral Medium 3.5 | Mistral AI | ~$1.50 | ~$7.50 |
| 7 | Llama 4 Maverick | Meta AI | ~$0.27 | ~$0.85 |
| 8 | Command R+ | Cohere | ~$2.50 | ~$10.00 |
| 9 | Qwen 3.5 Max | Alibaba / Qwen | $0.60 – $1.20 | $2.00 – $6.00 |
| 10 | ERNIE 4.5 | Baidu AI Cloud | $1.00 – $3.00 | $5.00 – $12.00 |
Source: Published API pricing pages of respective AI providers; analyst estimates where official tiered rates not publicly disclosed. Rates as of mid-2026; subject to revision. MTok = million tokens.
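To see what the spread in the table means for a given workload, the sketch below prices the same job on each engine. It copies only the rows with a single indicative rate (ranged rows are left out), and the rates are the article's indicative figures, not live prices; the function names are illustrative.

```python
# (input $/MTok, output $/MTok), point rates copied from the table above.
RATES = {
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "Grok-4.3": (1.25, 2.50),
    "DeepSeek V4 Pro": (0.435, 0.87),
    "Mistral Medium 3.5": (1.50, 7.50),
    "Llama 4 Maverick": (0.27, 0.85),
    "Command R+": (2.50, 10.00),
}


def workload_cost(model, input_tokens, output_tokens):
    """Dollar cost of a workload on one model at the indicative rates."""
    input_price, output_price = RATES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_by_cost(input_tokens, output_tokens):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {m: workload_cost(m, input_tokens, output_tokens) for m in RATES}
    return sorted(costs.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    # One million tokens in and one million out, as a round-number workload.
    for model, cost in rank_by_cost(1_000_000, 1_000_000):
        print(f"{model:<20} ${cost:,.2f}")
```

Price alone does not settle the choice of engine; the point of the comparison is that the same workload can differ in cost by more than an order of magnitude, which is why matching engine tier to task complexity matters.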
The Falling Unit Cost — and Rising Total Spend
Per-token pricing has fallen dramatically across the industry over three years, driven by model efficiency, competition, and scale. The OpenAI trajectory is the most documented and illustrates a structural trend applicable across providers: unit price falls, but total expenditure rises as usage expands to fill the reduced-cost capacity — a dynamic economists recognise as the Jevons Paradox.
As prices have fallen from $30 to $2.50 per million output tokens, a reduction of approximately 92%, the total number of tokens consumed by enterprise users has grown by an estimated order of magnitude or more. A fall from $30 to $2.50 is a 12-fold price cut, so a budget rises wherever usage has grown more than 12-fold. The net effect on enterprise AI budgets has been upward, not downward.
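The interplay of falling price and rising usage is simple to work through. The sketch below uses the $30 and $2.50 figures from the text and treats usage growth as a parameter, since the growth estimate is approximate; the function names are illustrative.

```python
def price_reduction(old_price, new_price):
    """Fractional fall in unit price, e.g. 0.9167 for a 91.67% cut."""
    return 1 - new_price / old_price


def spend_multiplier(old_price, new_price, usage_growth):
    """Factor by which total spend changes when usage grows by usage_growth."""
    return usage_growth * new_price / old_price


def breakeven_usage_growth(old_price, new_price):
    """Usage growth at which total spend is unchanged despite the price cut."""
    return old_price / new_price


if __name__ == "__main__":
    print(round(price_reduction(30.0, 2.50), 4))        # 0.9167
    print(breakeven_usage_growth(30.0, 2.50))           # 12.0
    print(round(spend_multiplier(30.0, 2.50, 10), 3))   # 0.833
    print(round(spend_multiplier(30.0, 2.50, 20), 3))   # 1.667
```

On these figures, usage has to grow more than 12-fold before total spend rises; at 20-fold growth, spend is up by about two-thirds even though each token costs a twelfth of what it did.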
Beyond Token Price — The Hidden Cost Multipliers
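Token price is only the most visible element of AI cost. Several architectural and usage-driven factors multiply the base per-token rate significantly: model size, reasoning depth (extended thinking), context window size, multimodal processing, latency tiers, and enterprise uptime guarantees. Understanding these is essential for any organisation seeking to govern AI expenditure.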
Tokenmaxxing — The New Corporate Profligacy
"A recent report by Axios noted the USD 500 million bill in just one month that hit a company where there were no controls on usage. Everything that could be used — Agents, code, queries — was used."
The unchecked explosion in AI usage costs has spawned a term: tokenmaxxing — excessive AI-token consumption without commensurate business results. In other words, carelessly using AI simply because it is available and accessible.
It is not only individual employees who go awry. Amazon US recently stopped publishing its AI dashboard after employees began gaming usage metrics in ways that cost the company handsomely. Microsoft recently scaled back some licences, too, because of cost. These are not isolated incidents; they are early signals of a structural governance gap.
From being treated as operational costs (variable, discretionary, department-level), AI expenditures are now starting to behave like capital commitments: large, recurring, difficult to reverse, and carrying unclear direct business benefit. Finance departments are beginning to ask the harder questions. A better, faster report is produced, but did it secure more business? AI identified vendor mismanagement, but did it lead to structural overhaul?
- What is the per-query cost by department and use case?
- What measurable business outcome was produced by each AI workload?
- Which engine tiers are being used, and are they proportionate to task complexity?
- Are Agents and automated pipelines running on metered or unmetered API calls?
- Is there an AI usage governance policy, and who owns it?
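The first of these questions, per-query cost by department and use case, can be answered from a usage log. The sketch below is one possible shape for that calculation; the `UsageRecord` fields and the sample entries are hypothetical, and a real log would come from the provider's usage export or an internal gateway.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """One logged AI call (hypothetical schema, for illustration only)."""
    department: str
    use_case: str
    input_tokens: int
    output_tokens: int
    input_price: float    # dollars per million input tokens
    output_price: float   # dollars per million output tokens


def record_cost(r: UsageRecord) -> float:
    """Dollar cost of a single logged call."""
    return (r.input_tokens * r.input_price
            + r.output_tokens * r.output_price) / 1_000_000


def cost_by_department_and_use_case(records):
    """Aggregate query count, total cost, and cost per query for each pair."""
    totals = defaultdict(lambda: {"queries": 0, "cost": 0.0})
    for r in records:
        bucket = totals[(r.department, r.use_case)]
        bucket["queries"] += 1
        bucket["cost"] += record_cost(r)
    return {
        key: {**v, "cost_per_query": v["cost"] / v["queries"]}
        for key, v in totals.items()
    }


if __name__ == "__main__":
    # Made-up sample entries, not real usage data.
    log = [
        UsageRecord("Finance", "report drafting", 2000, 3000, 5.0, 5.0),
        UsageRecord("Finance", "report drafting", 2000, 3000, 5.0, 5.0),
        UsageRecord("Engineering", "code review", 10000, 4000, 5.0, 30.0),
    ]
    for key, stats in cost_by_department_and_use_case(log).items():
        print(key, stats)
```

Cost per query is only half of the governance picture; the same grouping key can later be joined to whatever outcome measure the business agrees on for each workload.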
India — Still in the AI Honeymoon Phase
Indian companies do not reveal their AI spends, nor are they as candid as their US counterparts on AI, tokens, cost, and results. Proprietary AI tracker data (see AI and Our World, SBSI blog) reveals that India is still at the honeymoon stage — where the promise of gains is considerably more prominent than any beady-eyed cost focus.
This is not unusual for an early adoption cycle. What is worth noting is that the reckoning, when it comes, tends to be sharper for organisations that have not built cost governance frameworks in the adoption phase. The US experience should serve as a valuable early warning.
India's large IT services and BPO sectors, which are integrating AI deeply into delivery pipelines, will be among the first to confront the cost discipline question at scale. The economics are unforgiving: if AI is delivering genuine productivity gains, those gains must be measured and attributed. If they are not being measured, they are likely not being captured — and costs will compound regardless.
The Consumer Treadmill — Free Until It Is Not
In the personal space, users will have noticed that free-tier limits are reached quickly. AI providers either reduce the context window on free tiers, or construct paid tiers calibrated to just about satisfy — while leaving the user enticingly short. The result is a familiar SaaS treadmill: start free, hit the wall, upgrade.
So even if token costs fall, as they will with continued scale and competition, total AI spending rises because usage expands to fill the newly affordable space. A pricing equilibrium will eventually be established, as it has been in cloud computing and mobile data. We are not there yet.
The structural dynamic — falling per-unit cost, rising total consumption, lagging governance — is the defining economic characteristic of the current AI adoption phase. Organisations and individuals who understand this dynamic early will be better placed to extract value without incurring the runaway costs that have already surprised some of the world's largest technology companies.
The AI Cost Ledger — Key Line Items
| Cost Category | Definition | Control Lever |
|---|---|---|
| Input Token Cost | Charge per million tokens submitted to the engine (text, images, documents) | Prompt engineering; batch processing |
| Output Token Cost | Higher charge per million tokens generated by the model in its response | Response length limits; summarisation |
| Reasoning / Extended Thinking | Premium for multi-step inference and chain-of-thought processing modes | Match reasoning level to task complexity |
| Context Window Cost | Cost scales with how much prior conversation / document is loaded into active memory | Session management; context pruning |
| Multimodal Processing | Additional compute premium for image, audio, and video inputs or outputs | Limit multimodal to necessary use cases |
| Latency Premium | Surcharge for faster API response times and priority routing | Async processing where real-time is not required |
| Enterprise SLA / Uptime | Premium for guaranteed availability, data residency, and compliance coverage | Tier selection; workload classification |
Source: SBSI Research synthesis; AI provider documentation; industry analyst frameworks.
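The context-window line item is the least intuitive on the ledger, so a small illustration may help. The sketch below assumes, as a simplification, that every request re-sends the earlier turns of the conversation as input, and that pruning simply drops the oldest turns once a token cap is exceeded; the function name and the cap are illustrative, not any provider's actual behaviour.

```python
def conversation_input_tokens(turn_tokens, max_context=None):
    """Total input tokens billed across a conversation.

    turn_tokens: tokens added by each turn (message plus reply, simplified).
    max_context: optional cap; the oldest turns are dropped until the
                 re-sent history fits within it (the latest turn is always kept).
    """
    total = 0
    history = []
    for tokens in turn_tokens:
        history.append(tokens)
        if max_context is not None:
            while len(history) > 1 and sum(history) > max_context:
                history.pop(0)   # context pruning: forget the oldest turn
        total += sum(history)    # the whole retained history is sent as input
    return total


if __name__ == "__main__":
    turns = [1000] * 10          # ten turns of about 1,000 tokens each
    print(conversation_input_tokens(turns))                    # 55000
    print(conversation_input_tokens(turns, max_context=3000))  # 27000
```

Without pruning, ten turns of 1,000 tokens are billed as 55,000 input tokens rather than 10,000; capping the retained history at 3,000 tokens roughly halves that, at the price of the model forgetting older turns.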