OpenAI Launches GPT-5.4, Its Most Expensive and Most Capable Model Yet

With Thinking and Pro variants, a 1M-token context window, and native computer-use capabilities, OpenAI's latest model is its most deliberate bid for professional workflows yet.

The Lead: A Model Built for the Workday, Not the Demo Stage

Just two days after quietly releasing GPT-5.3 Instant as a lightweight conversational model, OpenAI has made a far louder statement. GPT-5.4, announced Thursday, arrives in two purpose-built variants, Thinking and Pro, each targeting a different slice of professional work. The message is unmistakable: OpenAI is no longer marketing general intelligence; it is selling productivity.

The timing is deliberate. Google’s Gemini continues to close the gap on multimodal benchmarks, Anthropic’s Claude has built a loyal enterprise following around its safety-first positioning, and Meta’s open-source Llama models keep undercutting the market on price. GPT-5.4 is OpenAI’s answer: a model that doesn’t just score well on tests but claims to outperform human professionals in 83% of comparisons across 44 knowledge-work occupations, according to OpenAI’s GDPval benchmark.

Technical Breakdown: How GPT-5.4 Actually Works

Two Models, Two Jobs

GPT-5.4 Thinking is the reasoning-heavy variant, available to all paid ChatGPT subscribers. It shows a plan of its reasoning upfront, so users can steer the model mid-response before it burns through thousands of tokens on a wrong path. It excels at deep web research, multi-step analysis, and tasks requiring sustained context over long sessions.

GPT-5.4 Pro is the high-performance variant, reserved for ChatGPT Pro ($200/month) and Enterprise users. It pushes further on advanced problems, scoring 38% on FrontierMath’s hardest problems compared to 27.1% for Thinking, and sets a new state of the art of 89.3% on BrowseComp, a benchmark for persistent web-browsing agents.

Context, Efficiency, and Computer Use

The API version supports up to 1 million tokens of context, by far the largest window OpenAI has offered, allowing agents to hold entire codebases, long contract sets, or multi-quarter financial models in a single session. OpenAI reports that GPT-5.4 uses up to 47% fewer tokens than its predecessor on some tasks, which partially offsets a per-token price increase to $2.50/$15 per million input/output tokens for Thinking and $30/$180 for Pro.
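To make the pricing concrete, here is a minimal Python sketch of the arithmetic. The per-million-token prices and the 47% figure come from the numbers quoted above; the function names and the 200k-input/10k-output workload are hypothetical, chosen only for illustration. The break-even calculation assumes the token reduction applies evenly to input and output, which OpenAI's "up to 47% on some tasks" claim does not guarantee.

```python
# USD per million tokens, as quoted in the article.
PRICES = {
    "thinking": {"input": 2.50, "output": 15.00},
    "pro": {"input": 30.00, "output": 180.00},
}


def request_cost(variant: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted per-million-token rates."""
    price = PRICES[variant]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def break_even_price_ratio(token_reduction: float) -> float:
    """Largest new/old per-token price ratio at which total spend does not rise.

    Assumes (hypothetically) that the new model uses `token_reduction`
    (a fraction between 0 and 1) fewer tokens on both input and output.
    """
    return 1.0 / (1.0 - token_reduction)


# Hypothetical workload: 200k input tokens, 10k output tokens.
print(round(request_cost("thinking", 200_000, 10_000), 2))  # 0.65
print(round(request_cost("pro", 200_000, 10_000), 2))       # 7.8
print(round(break_even_price_ratio(0.47), 2))                # 1.89
```

Under these assumptions, the best-case 47% token saving covers a per-token price increase of up to roughly 1.89x. Any increase beyond that, or any task with a smaller saving, raises the bill.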

GPT-5.4 is also OpenAI’s first mainstream model with native computer-use capabilities, enabling agents to interact directly with software in a build-run-verify-fix loop. A new system called Tool Search replaces the old approach of dumping every tool definition into the system prompt. Instead, the model looks up tool definitions on demand, keeping prompts lean and reducing latency in environments with dozens or hundreds of integrations.
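The difference between prompt-stuffing and on-demand lookup can be sketched in a few lines of Python. This is not OpenAI's Tool Search API, whose interface the announcement does not detail. The registry, the tool names, and the keyword-overlap ranking are all hypothetical stand-ins meant only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDef:
    name: str
    description: str
    schema: str  # stands in for a full JSON tool definition


# Hypothetical registry; a real deployment might hold hundreds of entries.
REGISTRY = [
    ToolDef("get_stock_quote", "Fetch the latest price for a stock ticker",
            '{"ticker": "string"}'),
    ToolDef("send_email", "Send an email message to a recipient",
            '{"to": "string", "body": "string"}'),
    ToolDef("query_ledger", "Run a query against the finance ledger",
            '{"sql": "string"}'),
]


def render(tools: list[ToolDef]) -> str:
    """Format tool definitions as they would appear in a prompt."""
    return "\n".join(f"{t.name}: {t.description} {t.schema}" for t in tools)


def stuffed_prompt(task: str) -> str:
    """Old approach: every tool definition is included up front."""
    return f"{render(REGISTRY)}\n\nTask: {task}"


def search_tools(query: str, limit: int = 1) -> list[ToolDef]:
    """Return up to `limit` tools ranked by word overlap with the query.

    Keyword overlap is an assumption made for this sketch; a production
    system would more plausibly use embeddings or a search index.
    """
    words = set(query.lower().split())

    def score(tool: ToolDef) -> int:
        text = f"{tool.name} {tool.description}".lower().replace("_", " ")
        return len(words & set(text.split()))

    ranked = sorted(REGISTRY, key=score, reverse=True)
    return [t for t in ranked if score(t) > 0][:limit]


def lean_prompt(task: str) -> str:
    """On-demand approach: include only the definitions the task matches."""
    return f"{render(search_tools(task))}\n\nTask: {task}"


task = "latest stock price"
print([t.name for t in search_tools(task)])
print(len(lean_prompt(task)) < len(stuffed_prompt(task)))
```

With three tools the saving is trivial. The stuffed prompt grows with every integration added, though, while the lean prompt grows only with the number of tools a task actually matches.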

Factual Accuracy

OpenAI claims GPT-5.4 is its most factual model to date: individual claims are 33% less likely to be false, and full responses are 18% less likely to contain any errors compared to GPT-5.2. On BigLaw Bench, a legal-specific evaluation, the model scored 91%.

Why This Matters for the Industry

The bifurcation into Thinking and Pro signals a strategic shift away from one-size-fits-all foundation models. OpenAI is now explicitly segmenting by use case (reasoning depth versus throughput), a move that mirrors how cloud providers tier their compute offerings.

For competitors, the pressure is specific. Anthropic’s Claude has cultivated enterprise trust through its safety-first approach, but GPT-5.4’s legal and financial benchmarks, including a jump from 43.7% to 88% on an internal investment banking evaluation, aim directly at that same buyer. Google’s Gemini still leads on certain multimodal tasks, but OpenAI’s 1M-token window and native computer use narrow the gap on long-horizon agentic work. Meta’s Llama remains the value option, but it lacks the enterprise integration stack that GPT-5.4 now bundles with Excel add-ins, FactSet connectors, and reusable financial “Skills.”

For end users, the practical upside is a model that requires less hand-holding. The upfront thinking plan in the Thinking variant means fewer wasted tokens and faster iteration cycles. The enterprise finance suite suggests that AI is moving from a general-purpose assistant to a domain-specific co-worker.

Ethical and Practical Considerations

GPT-5.4 is not without concerns. The pricing structure risks creating a two-tier AI ecosystem where only well-funded enterprises can access peak performance: at $30/$180 per million input/output tokens, Pro is OpenAI’s most expensive model ever. Smaller startups and independent developers may find themselves priced out of the frontier.

On safety, OpenAI introduced a new evaluation testing whether reasoning models misrepresent their chain-of-thought. Early results suggest GPT-5.4 Thinking is less prone to deceptive reasoning than its predecessors, but the company acknowledges that deception can still occur under certain conditions. This remains an open research problem, not a solved one.

The 83% figure on GDPval also warrants scrutiny. Matching or exceeding a professional on a benchmark is not the same as replacing one. Context, judgment, and accountability in high-stakes fields like law and finance still require human oversight, a point OpenAI itself implicitly acknowledges by marketing the model as a co-pilot, not an autonomous agent.

Future Outlook: The Next 12 Months

GPT-5.4 likely sets the template for what comes next across the industry. Expect Anthropic and Google to respond with their own variant-based strategies within the quarter, segmenting models by reasoning depth, speed, and cost. The era of a single “best model” may be ending, replaced by portfolios of specialized systems.

Tool Search and native computer use point toward a future where AI agents don’t just answer questions but operate software autonomously across applications. Within 12 months, the competitive benchmark will shift from “which model scores highest” to “which model completes the most real-world tasks end-to-end without human intervention.”

The real test for GPT-5.4 will not be its launch-day benchmarks. It will be whether enterprise customers report measurable productivity gains that justify the premium. If OpenAI can deliver on that promise, it cements its position at the center of the professional AI stack. If not, the competitors circling the same market will move fast.

Key Takeaways

GPT-5.4 ships in two variants: Thinking (reasoning-first, all paid users) and Pro (max performance, $200/month and Enterprise tiers).
1M-token context window and native computer-use capabilities mark a shift toward long-horizon, agentic workflows.
Tool Search replaces prompt-stuffing for tool definitions, reducing latency and cost at scale.
Factual accuracy improves over GPT-5.2: individual claims are 33% less likely to be false and full responses 18% less likely to contain errors, with a 91% score on BigLaw Bench.
Pricing is the highest in OpenAI’s lineup ($30/$180 per million tokens for Pro), raising accessibility concerns for smaller teams.

About Muhammad Zeeshan
