Overview
Wes Roth breaks down the release of OpenAI's GPT 5.4, which arrives with native computer use capabilities, strong benchmark results on GDPval (beating human experts ~70% of the time), and state-of-the-art OSWorld-Verified performance that surpasses the human baseline for desktop navigation. The video also covers the Department of Defense's designation of Anthropic as a supply chain risk, a key researcher leaving OpenAI for Anthropic, and OpenAI's new financial services tools.
Key Topics
GPT 5.4 Release Highlights
- Native computer use: GPT 5.4 is the first general-purpose model with built-in computer use capabilities — no external tooling required. It can issue mouse and keyboard commands in response to screenshots and write code to operate computers via libraries like Playwright.
- GDPval benchmark: GPT 5.4 Pro achieves an 82–83% win-or-tie rate against deliverables from human experts (people with ~14 years of experience at companies like Deloitte, Wells Fargo, and Google). The pure win rate sits around 70%, which implies ties in roughly 12–13% of comparisons.
- OSWorld-Verified: GPT 5.4 achieves a 75% success rate on desktop navigation tasks (screenshots + keyboard/mouse), up from 47% for GPT 5.2. This surpasses the reported human performance of 72.4%.
- Visual debugging loop: The model can now build a game, use its computer vision to play-test it, and identify graphical issues — ending the cycle of "it's just a black screen" that plagued earlier models.
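
The screenshot-in, action-out cycle described above can be sketched as a small loop. This is only a toy illustration and not OpenAI's API: `FakeDesktop`, `Action`, `scripted_policy`, and `run_agent` are hypothetical names, the "screenshot" is a plain dictionary instead of pixels, and a hand-written policy stands in for the model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical screen coordinates for the toy UI.
FIELD_POS = (40, 20)
SUBMIT_POS = (40, 60)


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeDesktop:
    """Stand-in for a real desktop: one text field and one Submit button."""

    def __init__(self) -> None:
        self.field = ""
        self.focused = False
        self.submitted = False

    def screenshot(self) -> Dict[str, object]:
        # A real harness would return pixels; this toy returns a state summary.
        return {"field": self.field, "focused": self.focused,
                "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            if (action.x, action.y) == FIELD_POS:
                self.focused = True
            elif (action.x, action.y) == SUBMIT_POS:
                self.submitted = True
        elif action.kind == "type" and self.focused:
            self.field += action.text


def scripted_policy(obs: Dict[str, object], goal_text: str) -> Action:
    """Hand-written stand-in for the model: picks the next action from the observation."""
    if obs["submitted"]:
        return Action("done")
    if not obs["focused"]:
        return Action("click", *FIELD_POS)
    if obs["field"] != goal_text:
        return Action("type", text=goal_text[len(str(obs["field"])):])
    return Action("click", *SUBMIT_POS)


def run_agent(desktop: FakeDesktop,
              policy: Callable[[Dict[str, object], str], Action],
              goal_text: str, max_steps: int = 10) -> List[Action]:
    """Observe, decide, act; stop when the policy says 'done' or steps run out."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(desktop.screenshot(), goal_text)
        history.append(action)
        if action.kind == "done":
            break
        desktop.apply(action)
    return history


if __name__ == "__main__":
    desk = FakeDesktop()
    for step in run_agent(desk, scripted_policy, "hello"):
        print(step)
```

The `max_steps` cap is a design choice for the sketch: an agent that acts on its own observations needs some bound so a confused policy cannot loop forever.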
Anthropic Supply Chain Designation
- Anthropic has been officially labeled a supply chain risk by the Department of Defense.
- The designation has a narrow scope — it applies only to use of Claude as a direct part of contracts with the Department, not all usage by customers who also have government contracts.
- Anthropic plans to challenge the designation in court and is in active negotiations with the government.
Labor Market Impacts
- Anthropic published research on AI's labor market effects, finding that while major displacement hasn't happened yet, hiring is slowing for early-career roles — people just out of college in their first few years of building skills.
- These findings align with the Stanford paper that used Anthropic's data.
- Current workplace automation represents only a tiny percentage of what's actually possible.
OpenAI's Competitive Moves
- Skills support: OpenAI is adopting a skills-like system similar to Anthropic's, including migration paths from Anthropic to OpenAI.
- ChatGPT for Excel: A direct tool for spreadsheet work within ChatGPT.
- Financial services suite: New tools targeting finance, with an internal investment banking benchmark where GPT 5.4 Thinking scores 87% vs. GPT 5.2 Pro at 71% and Opus 4.6 at 64%.
- Priority mode: A "fast lane" for faster responses (possibly running on Cerebras chips).
- Mid-stream interruption: Users can redirect the model while it's generating a response.
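
As a rough mental model for mid-stream interruption, the toy generator below checks for a user redirect between output chunks and switches course when one arrives. Everything here is an assumption made for illustration: `stream_with_redirects`, the `drafts` dictionary, and the redirect queue are hypothetical, and nothing in the video describes how OpenAI implements the feature.

```python
import queue
from typing import Dict, Iterator, List


def stream_with_redirects(start: str,
                          drafts: Dict[str, List[str]],
                          redirects: "queue.Queue[str]") -> Iterator[str]:
    """Yield chunks for the current topic, switching topics when a redirect arrives."""
    topic, i = start, 0
    while i < len(drafts[topic]):
        # Poll for a user redirect without blocking the stream.
        try:
            requested = redirects.get_nowait()
        except queue.Empty:
            requested = None
        if requested is not None and requested in drafts:
            topic, i = requested, 0          # abandon the old response
            yield f"[switching to {topic}]"
            continue
        yield drafts[topic][i]
        i += 1


if __name__ == "__main__":
    drafts = {"summary": ["s1", "s2", "s3"], "table": ["t1", "t2"]}
    inbox: "queue.Queue[str]" = queue.Queue()
    stream = stream_with_redirects("summary", drafts, inbox)
    print(next(stream))      # first chunk of the original response
    inbox.put("table")       # the user redirects mid-stream
    print(list(stream))      # remaining chunks follow the new request
```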
Talent Movement
- Max Schwarzer, who worked on GPT 5 reasoning and scaling test-time compute and helped ship the original o1-preview, is leaving OpenAI for Anthropic.
- He cited trusted colleagues who had already moved to Anthropic as his motivation. The departure involved no drama, and he praised OpenAI and Sam Altman on the way out.
Key Takeaways
- Native computer use in a foundation model is a major milestone — it enables AI agents that can interact with real software without custom integrations.
- The GDPval results show AI producing professional-grade deliverables that beat those of experienced human experts roughly 70% of the time, raising serious questions about workforce impact.
- The Anthropic supply chain designation, while narrow, signals real tension between the government and AI companies over alignment and procurement.
- Competition among AI labs is intensifying, with OpenAI adopting Anthropic's successful patterns (skills, domain tools) while pushing ahead on model capabilities.
About the Creator
Wes Roth covers AI news, model releases, and the broader implications of artificial intelligence progress. He tests new models hands-on and tracks the competitive dynamics between major AI labs.