The AI Podcast with Fexingo: Artificial Intelligence, Machine Learning, and Modern AI Models

Why AI Model Prices Are Dropping Faster Than Ever

June 9, 2026 · 6 min · 941 words

Show notes

Lucas and Luna explore the rapid decline in AI model pricing, driven by competition, efficiency gains, and open-weight releases. With Anthropic's Claude Fable 5 now publicly accessible and a new wave of small, cheap models flooding the market, inference costs are plummeting. The hosts examine how this price compression is reshaping enterprise adoption, software margins, and the hardware stocks that soared on AI hype. They cite Broadcom's 19 percent weekly drop and Arm's 22 percent plunge as signs that the market is revaluing AI bets. A concrete look at the economic forces making AI cheaper and what it means for builders and investors.

#AIModelPricing #InferenceCosts #ClaudeFable5 #Anthropic #Broadcom #ArmHoldings #NVIDIA #AMD #SmallLanguageModels #EnterpriseAI #SoftwareMargins #AIStocks #TechInvesting #FexingoBusiness #BusinessPodcast #Technology #AIPodcast #MachineLearning

Keep every episode free: buymeacoffee.com/fexingo

Highlighted moments

“I look at the price per million tokens for a mid-tier model. Twelve months ago it was around three dollars. Today you can get comparable quality for under fifty cents. That's an eighty percent drop.”

Transcript

0:00 Lucas: So Broadcom lost nineteen percent in a week. Arm dropped twenty-two percent. And the common thread isn't just that AI hardware stocks are getting crushed; it's that the underlying economics of AI are shifting faster than most people expected.

Luna: You're saying the sell-off is tied to model prices coming down?

Lucas: Exactly. Because if you're betting on endless demand for expensive chips, you need the cost of running AI models to stay high, or at least grow. But the opposite is happening. We're seeing a price compression in AI inference that's probably the most underappreciated story of 2026.

Luna: Speaking of stories that don't get enough attention, you know what keeps this show ad-free? It's listeners chipping in a couple of bucks a month.

Lucas: It really does. If today's conversation gives you something useful, buymeacoffee.com/fexingo. That small amount genuinely makes a difference.

Luna: Yeah, it's how we keep this going without running ads. So, back to model prices: what's the catalyst you're seeing?

Lucas: Well, take Anthropic's announcement today. Claude Fable 5 is now publicly accessible. That's a version of their Mythos model that's been optimized for efficiency. And it's not just Anthropic. We've got a wave of small, open-weight models that are running on less hardware. The marginal cost of a query is falling fast.

Luna: And that hits companies like NVIDIA and Broadcom because their revenue projections assume high-margin chip sales for inference.

Lucas: Right. If a model can run on a fraction of the GPUs, the total addressable market for those chips shrinks. The market is starting to price that in. But it's not all bad news. It's great for anyone building on top of AI.

Luna: So software margins could actually expand if your input costs drop.

Lucas: Exactly. The companies that were getting squeezed by high inference costs are about to get some relief. But the hardware narrative gets recalibrated. And that's why you see Arm down twenty-two percent. It's not that Arm is failing. It's that the multiple was priced for perfection in AI, and now perfection looks less certain.

Luna: Is there a data point that crystallizes this for you?

Lucas: I look at the price per million tokens for a mid-tier model. Twelve months ago it was around three dollars. Today you can get comparable quality for under fifty cents. That's an eighty percent drop. And the trend line is still sloping down.

Luna: That's wild. And it's not just one provider. It's across the board.

Lucas: Competition works. OpenAI, Anthropic, Google, Meta releasing open weights: they're all undercutting each other. The winner isn't necessarily the best model; it's the cheapest model that's good enough.

Luna: So enterprises that held back on AI because of cost are now jumping in.

Lucas: We're seeing that. But there's a second-order effect: if models are cheap, the volume of queries explodes. So total compute demand might still grow even as per-unit costs fall. The question is whether that volume growth offsets the price compression.

Luna: And that's the debate the market is having right now.

Lucas: Exactly. Some analysts think it's a wash: more queries, same revenue. Others think the price drop is structural and permanent, meaning hardware demand peaks earlier. The recent stock moves suggest the market is leaning toward the second view.

Luna: You mentioned Broadcom down nineteen percent. Is that purely AI or is there company-specific news?

Lucas: Broadcom is heavily tied to custom AI chips for hyperscalers. If those hyperscalers start designing their own silicon that's cheaper and more efficient, Broadcom's role shrinks. Plus, they have exposure to networking hardware that might not scale if data center buildouts slow.

Luna: What about companies that benefit from cheaper AI? Like software firms or cloud platforms.

Lucas: Microsoft, for instance, is down six percent in the last five days, but part of that is a broader tech sell-off. Their actual AI revenue could get a boost if more customers can afford to use Copilot and Azure AI services. Same for Meta. They're investing heavily in open models, which drives down costs for everyone, including themselves.

Luna: So the winners and losers are flipping.

Lucas: The landscape is definitely rotating. The companies that sold picks and shovels during the gold rush had great margins when demand was insatiable. Now the gold is cheaper to extract, and the shovel makers have to adjust.

Luna: Let's talk about the timeline. How quickly do you see this price compression impacting actual earnings?

Lucas: It's already showing up. Some AI startups are reporting higher gross margins because their model costs dropped mid-quarter. But for the big hardware names, you'll see the impact in Q3 and Q4 guidance. If they guide lower, expect more pain.

Luna: And for listeners who are investing in AI, what's the takeaway?

Lucas: Look at the end users. Who is actually deploying AI and seeing returns? Those are the companies that benefit from cheaper inputs. The infrastructure plays had their moment. Now it's about application layer value.

Luna: That's a useful lens. Any specific sector you're watching?

Lucas: Healthcare diagnostics, legal document analysis, customer support automation. Industries with high labor costs and repetitive tasks. Those are the ones where cheaper AI unlocks massive ROI. And they don't need the most expensive chips.

Luna: So the narrative shifts from 'AI is expensive' to 'AI is a commodity'.

Lucas: Exactly. And commodities are great for consumers, brutal for producers that can't differentiate. That's the story of June 2026.

Luna: Thanks, Lucas. That's a lot to digest.
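
The per-token figures Lucas quotes can be checked with a little arithmetic. The sketch below is an illustration only, not something from the episode: it takes the two prices mentioned in the transcript (about $3.00 and under $0.50 per million tokens), treats $0.50 as an upper bound, and computes the percentage drop plus the growth in token volume needed for total inference spend to stay flat. The function names are hypothetical.

```python
def percent_drop(old_price: float, new_price: float) -> float:
    """Percentage decline going from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


def breakeven_volume_multiple(old_price: float, new_price: float) -> float:
    """Factor by which token volume must grow for total spend to stay flat."""
    return old_price / new_price


# Figures quoted in the transcript, in dollars per million tokens.
OLD_PRICE = 3.00
NEW_PRICE = 0.50  # "under fifty cents", so treat this as an upper bound

drop = percent_drop(OLD_PRICE, NEW_PRICE)
multiple = breakeven_volume_multiple(OLD_PRICE, NEW_PRICE)

print(f"Price drop: at least {drop:.1f}%")
print(f"Break-even volume growth: at least {multiple:.1f}x")
```

With these inputs the drop works out to roughly 83 percent, slightly more than the "eighty percent" quoted. Token volume would need to grow about sixfold for total spend on inference to stay level, which is the offset question the hosts raise.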
