The AI Podcast with Fexingo: Artificial Intelligence, Machine Learning, and Modern AI Models

Why AI Model Prices Are Dropping Faster Than Ever

June 10, 2026 · 8 min · 1,340 words


Show notes

Lucas and Luna dive into the accelerating collapse of AI inference costs. On June 10, 2026, they examine how Google's new subscription pricing of $9.99 per month for Gemini Ultra signals a broader trend: model prices are dropping far faster than Moore's Law would predict. By the hosts' figures, inference cost per thousand tokens has fallen roughly sixty-fold in under six years, against the roughly fourfold improvement they attribute to Moore's Law over the same period. They break down the economics behind the plunge, from open-source competition (Meta's Llama 4) to hardware efficiency gains (NVIDIA's Blackwell Ultra). Drawing on the week's news (Meta's India data center deal with Reliance and Google's price war warning shot), they explain why software margins are getting squeezed and what it means for enterprise AI adoption. A must-listen for anyone building on AI or investing in the space.

#AI #ArtificialIntelligence #MachineLearning #ModelEconomics #InferenceCosts #Google #Gemini #Meta #Llama4 #NVIDIA #BlackwellUltra #OpenSource #PricingWar #EnterpriseAI #Technology #TechTrends #FexingoBusiness #BusinessPodcast

Keep every episode free: buymeacoffee.com/fexingo

Highlighted moments

“When GPT-3 launched in 2020, the cost per thousand tokens for inference was roughly sixty cents. Today, with models like Llama 4 or Gemini 2.5, you're looking at less than one cent per thousand tokens”
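
To put the quoted figures on an annualized footing, here is a minimal Python sketch. It takes the episode's numbers as given (sixty cents falling to one cent per thousand tokens) and assumes a six-year span; the function names and the doubling periods used for the Moore's Law comparison are illustrative assumptions, not figures from the hosts.

```python
import math

# Figures quoted in the episode, treated as given rather than verified.
START_COST_CENTS = 60.0  # per 1,000 tokens, GPT-3 era (2020)
END_COST_CENTS = 1.0     # upper bound quoted for current models
YEARS = 6.0              # assumption: 2020 to 2026 rounded to six years


def total_multiple(start: float, end: float) -> float:
    """How many times cheaper the end cost is than the start cost."""
    return start / end


def annual_multiple(multiple: float, years: float) -> float:
    """Constant yearly improvement factor that compounds to `multiple`."""
    return multiple ** (1.0 / years)


def halving_time_years(multiple: float, years: float) -> float:
    """Years for the cost to halve at the implied constant rate."""
    return years * math.log(2) / math.log(multiple)


def doubling_law_multiple(years: float, doubling_period: float) -> float:
    """Improvement delivered by a fixed doubling period over `years`."""
    return 2.0 ** (years / doubling_period)


if __name__ == "__main__":
    drop = total_multiple(START_COST_CENTS, END_COST_CENTS)
    print(f"total drop: {drop:.0f}x")
    print(f"per year: {annual_multiple(drop, YEARS):.2f}x cheaper")
    print(f"halving time: {halving_time_years(drop, YEARS):.2f} years")
    # Two common readings of Moore's Law (assumed here for comparison).
    for period in (2.0, 3.0):
        law = doubling_law_multiple(YEARS, period)
        print(f"doubling every {period:.0f} years: {law:.0f}x, "
              f"so the quoted drop is {drop / law:.1f}x larger")
```

Under these assumptions the cost roughly halves every year. The fourfold Moore's Law figure mentioned in the episode corresponds to a three-year doubling period, while the more commonly cited two-year period would give eightfold over six years.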


Transcript

0:00 Lucas: So Google just fired a warning shot in the AI subscription price wars. That's what TechCrunch called it this morning, and I think the headline undersells it.

Luna: Because it's not just Google vs. OpenAI anymore, is it?

Lucas: Exactly. Google announced that Gemini Ultra will now cost $9.99 per month for individual users. That's half the price of OpenAI's ChatGPT Plus at $20. And for enterprise customers, they're dropping the per-seat cost by about forty percent.

Luna: Forty percent, that's aggressive. But is this just a pricing battle, or does it reflect something deeper about model economics?

Lucas: That's the real story. Model inference costs are collapsing faster than almost anyone predicted. And Google's move is a signal that they're operating with dramatically lower cost structures than they had even six months ago.

Luna: We've talked about inference costs squeezing software margins before, but this feels like an inflection point.

Lucas: It really does. Let me give you a concrete number. When GPT-3 launched in 2020, the cost per thousand tokens for inference was roughly sixty cents. Today, with models like Llama 4 or Gemini 2.5, you're looking at less than one cent per thousand tokens, and in some cases, fractions of a cent.

Luna: That's a sixty-times drop in less than six years. Moore's Law would have given us maybe a four-times improvement in that period.

Lucas: Right, so we're seeing what some analysts call 'hyper-scaling economics'. It's not just chip improvements, though NVIDIA's Blackwell Ultra, which started shipping in volume this quarter, plays a big role. It's also algorithmic efficiency: things like model distillation, quantization, and speculative decoding.

Luna: And open-source models. Meta's Llama 4 is now competitive with proprietary models for many tasks, and it's free to self-host.

Lucas: That's the competitive pressure that's really driving prices down. If you're Google or OpenAI, you can't charge a premium when a company can run Llama 4 on their own hardware for pennies. So you have to make your proprietary model so good, or so cheap, that the convenience wins.

Luna: And cheaper inference also unlocks new use cases. Things that didn't make economic sense at fifty cents per query now become viable at a fraction of a cent.

Lucas: Exactly. We're seeing this in customer service automation, code generation, even real-time translation. The marginal cost of an AI interaction is approaching zero. That changes the calculus for every software company.

Luna: But it also puts pressure on margins for AI-native companies. If the underlying model gets commoditized, the value moves up the stack: to the application layer, the data moat, the user experience.

Lucas: And that's where I think the Google price cut is most revealing. Google isn't just competing on model quality; they're bundling Gemini into their entire ecosystem. Search, Workspace, Android. They can afford to sell the model at cost, or even at a loss, because it drives engagement across their platform.

Luna: So OpenAI has to respond. They have to drop prices too, or differentiate in some other way.

Lucas: They already have. ChatGPT Plus went from $20 to $15 in April for new subscribers. And I expect they'll match Google at $9.99 within the next quarter. The question is whether that's sustainable given their cost structure.

Luna: Which brings us to the hardware side. NVIDIA's Blackwell Ultra is supposed to cut inference costs by another 30 to 40 percent over Hopper. But AMD and Intel are also pushing hard.

Lucas: Yeah, AMD's MI400 series is gaining traction, especially for inference workloads. And Intel's Gaudi 3 is competitive on price-performance. The commoditization of AI hardware is accelerating the cost decline.

Luna: And the data center buildout continues. Meta just signed its first AI data center deal in India with Reliance. That's a massive new region for compute.

Lucas: India is going to be a huge market. Reliance is building out gigawatt-scale AI infrastructure. And they're likely using a mix of NVIDIA and AMD chips, which will further drive competition and lower costs.

Luna: So what does this mean for the average business? If you're a startup building on top of AI, your cost of goods sold is plummeting. That's great for margins and for experimenting with new features.

Lucas: But it also means your competitors have the same access. The moat isn't the model anymore; it's your data, your distribution, your brand. And that's a tough transition for companies that built their entire pitch around having a proprietary model.

Luna: We saw that with some of the AI stocks getting crushed earlier this month. The market is starting to price in commoditization.

Lucas: Yeah, and I think there's more pain ahead for pure-play model companies. The winners will be the platforms (the Googles, the Microsofts, the Metas) that can embed AI into existing products and monetize through subscriptions or advertising.

Luna: Let's take a quick step back. You mentioned that inference costs are approaching zero. How low can they go?

Lucas: I think we'll see another ten-times reduction in the next two years. We're still early in the efficiency curve. Techniques like speculative decoding, where a smaller model drafts tokens and a larger model verifies them, can cut latency and cost by two to three times on their own.

Luna: And open-source models are going to keep improving. The gap between open and closed is shrinking fast.

Lucas: Which brings me to something I've been thinking about. If model costs essentially go to zero, what happens to the business model of companies like OpenAI? They've raised tens of billions of dollars. They need to show a return.

Luna: They're betting on becoming a platform too, with ChatGPT as the interface and the API as the backend. But if anyone can run a similar model for free, that's a tough sell.

Lucas: Exactly. And that's why I think the next phase of AI competition isn't about model performance on benchmarks; it's about ecosystem lock-in. Google has search, email, docs, maps. Microsoft has Office, Azure, GitHub. Meta has social graph and messaging. OpenAI has... a chat app and an API.

Luna: Unless they find a way to integrate deeper into workflows. Which they're trying with things like ChatGPT plugins and enterprise features.

Lucas: And it's working to some degree. But the price pressure is real. And it's only going to intensify.

Luna: You know, speaking of value shifting up the stack, if today's conversation gave you something usable, I want to mention something briefly.

Lucas: Go ahead.

Luna: This show is ad-free, and it stays that way because of listeners who chip in a couple of dollars a month. It genuinely makes a difference; it helps us cover research, hosting, and the occasional coffee.

Lucas: Yeah, if you've gotten value from these episodes, consider supporting at buymeacoffee.com/fexingo. No pressure, but it keeps the lights on.

Luna: Alright, back to the economics. So where do you see the biggest opportunity for startups in this new landscape?

Lucas: I think the biggest opportunity is in vertical-specific AI agents. The horizontal models are becoming cheap and commoditized, but a model that's fine-tuned on medical records or legal documents, that's still valuable. It's the data layer, not the model layer.

Luna: So the winners will be companies that own unique data sets and can build specialized workflows around them.

Lucas: Exactly. And with inference costs dropping, you can afford to run these models at scale. You can process millions of documents or customer interactions for a fraction of what it would have cost two years ago.

Luna: That's the real story of June 2026, I think. Not just cheaper models, but the unlocking of entire new categories of applications.

Lucas: And it's happening fast. I expect by this time next year, we'll look back at current prices as expensive. The trend line is clear.

Luna: Alright, something to watch. Thanks Lucas.

Lucas: Thanks Luna.
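
For readers who want to see the draft-and-verify idea behind speculative decoding from the episode in concrete form, here is a toy Python sketch of the greedy variant. Both "models" are hypothetical deterministic rules invented for illustration, not real language models. The one-pass-per-round accounting assumes a real system would verify all drafted tokens in a single batched forward pass of the large model. Production systems that sample rather than decode greedily use a probabilistic acceptance rule, which is not shown here.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def target_next(context: List[Token]) -> Token:
    """Hypothetical 'large' model: a deterministic toy rule."""
    return (sum(context) * 31 + len(context)) % 50


def draft_next(context: List[Token]) -> Token:
    """Hypothetical 'small' model: matches the target except when
    the context length is 3 modulo 4, where it guesses wrong."""
    guess = target_next(context)
    return guess if len(context) % 4 != 3 else (guess + 1) % 50


def greedy_decode(model: NextTokenFn, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    draft: NextTokenFn,
    target: NextTokenFn,
    prompt: List[Token],
    n_new: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns (tokens, verification rounds)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(out) < goal:
        # 1) The draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2) The target model checks the proposal. Counted as one pass,
        #    although this toy calls the rule once per position.
        rounds += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # correction at first mismatch
                break
        else:
            # Every draft token matched, so the target adds one more.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = greedy_decode(target_next, prompt, 12)
    tokens, rounds = speculative_decode(draft_next, target_next, prompt, 12)
    print("same output as baseline:", tokens == baseline)
    print("target passes:", rounds, "versus 12 for plain greedy decoding")
```

Because every accepted token is either confirmed or supplied by the target rule, the output is identical to plain greedy decoding with the target alone. Any savings come from needing fewer large-model passes, and how many fewer depends on how often the draft agrees. The toy does not reproduce the two-to-three-times figure quoted in the episode.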
