OpenAI Doubled the Price — Developers Don’t Care

KEY POINTS
  • GPT-5.5 scores 82.7% on Terminal-Bench 2.0, setting a new state-of-the-art for agentic coding benchmarks
  • Input tokens cost $5/M and output tokens $30/M, exactly double GPT-5.4’s pricing
  • BrowseComp score of 90.1% surpasses Gemini 3.1 Pro (85.9%) in autonomous web research
  • The model uses fewer tokens to complete equivalent Codex tasks, offsetting much of the price increase in practice

82.7%. That is the number OpenAI posted on Terminal-Bench 2.0 when GPT-5.5 went live on April 23, 2026. In a market where every fraction of a percentage point triggers funding rounds and strategy pivots, that figure alone would be headline-worthy. But the real story is what happened next: OpenAI doubled its per-token pricing, and the developer community collectively shrugged and kept building. Understanding why requires looking beyond the benchmark leaderboard and into the economics of agentic AI.

What GPT-5.5 Actually Does Differently

From Chatbot to Coworker

OpenAI describes GPT-5.5 as “our smartest and most intuitive to use model yet, and the next step toward a new way of getting work done on a computer.” The emphasis is deliberate: this is not a model that simply generates better text. GPT-5.5 is designed to autonomously switch between writing code, debugging it, researching the web, analyzing data, creating documents and spreadsheets, operating software, and moving across tools until a task is finished. It is, in OpenAI’s framing, the difference between asking an intern a question and handing a project to a senior engineer.

The gains are most visible in agentic coding and computer use scenarios. On SWE-Bench Pro, which evaluates real-world GitHub issue resolution, GPT-5.5 reaches 58.6%. On the broader agentic tool-use leaderboard, it ranks second out of 112 models with an average score of 99.2. These are not synthetic toy problems; they measure a model’s ability to navigate actual developer workflows end-to-end.

Trend Insight — The shift from “chat” to “agent” is now the primary axis of competition among frontier labs. GPT-5.5’s architecture reflects a bet that users will pay more for a model that finishes tasks autonomously rather than one that merely answers questions faster.


The Benchmark Numbers That Matter

Terminal-Bench, BrowseComp, and SWE-Bench Pro

Three benchmarks tell the story of GPT-5.5’s positioning. Terminal-Bench 2.0, which tests sustained multi-step coding in realistic terminal environments, saw GPT-5.5 score 82.7%, a new state-of-the-art. BrowseComp, designed to measure a model’s ability to track down hard-to-find information across the open web, returned 90.1% for GPT-5.5 Pro, surpassing Gemini 3.1 Pro’s 85.9%. And SWE-Bench Pro, the gold standard for real-world software engineering tasks, landed at 58.6%.

What makes these numbers significant is their practical correlation. Terminal-Bench measures whether a model can actually ship working code in production-like environments. BrowseComp tests autonomous research ability, the kind of work that knowledge workers spend hours on every day. SWE-Bench Pro evaluates whether a model can read a GitHub issue, navigate a real codebase, and submit a working fix. Together, they paint a picture of a model built for work, not conversation.

Trend Insight — Benchmark diversity is becoming as important as benchmark scores. A model that leads on one metric but lags on others signals narrow optimization. GPT-5.5’s simultaneous leadership across coding, browsing, and engineering benchmarks reflects a genuinely broader capability base.


The Price Doubled. Here Is Why Nobody Is Complaining.

Token Economics in the Agentic Era

GPT-5.5 is priced at $5 per million input tokens and $30 per million output tokens, exactly double what GPT-5.4 charged. The Pro tier jumps to $30/$180. On paper, this looks like a significant cost increase. In practice, OpenAI has built in a counterweight: GPT-5.5 uses significantly fewer tokens to complete the same Codex tasks as its predecessor. The model matches GPT-5.4’s per-token latency while delivering materially higher intelligence per token consumed.

This creates a counterintuitive dynamic. A model that costs twice as much per token but uses half as many tokens to finish a job ends up at roughly the same effective cost while delivering a better result. For enterprise customers running thousands of agentic workflows per day, the math tilts further in GPT-5.5’s favor because fewer tokens also mean lower latency and faster task completion. OpenAI appears to have learned from the cloud computing playbook: charge more per unit, but make each unit do more work.
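The offsetting effect is easy to check with back-of-the-envelope arithmetic. In the sketch below, only the GPT-5.5 prices ($5/M input, $30/M output) come from the article; GPT-5.4's prices are inferred from the "exactly double" claim, and the per-task token counts are purely illustrative assumptions.

```python
# Illustrative per-task cost comparison. Only the $/M prices are from the
# article (GPT-5.4's are inferred from "exactly double"); the token counts
# per task are hypothetical.
def task_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one agentic task at given per-million-token prices."""
    return (input_tokens / 1e6) * in_price_per_m + (output_tokens / 1e6) * out_price_per_m

# GPT-5.4 at $2.50/M in, $15/M out; assume 400k input / 100k output per task.
old = task_cost(400_000, 100_000, 2.50, 15)

# GPT-5.5 at $5/M in, $30/M out; assume it finishes in half the tokens.
new = task_cost(200_000, 50_000, 5, 30)

print(f"GPT-5.4 task: ${old:.2f}")  # $2.50
print(f"GPT-5.5 task: ${new:.2f}")  # $2.50
```

Under the half-the-tokens assumption the effective cost per task is unchanged, which is exactly the dynamic described above: the unit price doubled, but each unit does twice the work.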

The broader context matters too. Enterprise now accounts for more than 40% of OpenAI’s revenue and is on track to reach parity with consumer revenue by end of 2026. Enterprise buyers evaluate total cost of task completion, not per-token pricing. For a company paying a senior developer $200,000 per year, a model that can autonomously resolve GitHub issues at 58.6% accuracy is not expensive at $30 per million output tokens. It is a bargain.
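The "bargain" framing can be made concrete in per-task terms. A minimal sketch, assuming failed attempts are retried or discarded: divide the cost of one attempt by the 58.6% success rate to get an expected cost per resolved issue. The prices and success rate come from the article; the per-attempt token counts are assumptions.

```python
# Expected cost per successfully resolved GitHub issue (a sketch).
# Prices ($5/M in, $30/M out) and the 58.6% SWE-Bench Pro rate are from
# the article; the per-attempt token counts are illustrative assumptions.
IN_PRICE, OUT_PRICE = 5 / 1e6, 30 / 1e6  # dollars per token
SUCCESS_RATE = 0.586

def cost_per_resolved_issue(in_tokens, out_tokens):
    attempt = in_tokens * IN_PRICE + out_tokens * OUT_PRICE
    # If failures cost the same as successes, the expected spend per
    # success is the attempt cost divided by the success probability.
    return attempt / SUCCESS_RATE

# Assume a generous 1M input / 200k output tokens per attempt.
print(f"${cost_per_resolved_issue(1_000_000, 200_000):.2f} per resolved issue")
```

Even with these deliberately generous token counts, the expected spend per resolved issue is on the order of tens of dollars, orders of magnitude below the fully loaded cost of a senior engineer's time on the same ticket.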

Trend Insight — The AI pricing model is shifting from per-token to per-task economics. As models become more efficient at completing entire workflows, the relevant metric is no longer cost per token but cost per completed task. Expect every major lab to follow this pricing logic within the next two quarters.


What This Means for the AI Race

Anthropic, Google, and the Pressure to Respond

GPT-5.5 arrives just weeks after GPT-5.4 and days after Google’s Cloud Next 2026 blitz, where the company unveiled TPU 8t/8i chips, Auto Browse for Chrome, and Deep Research Max. Anthropic, meanwhile, released Claude Mythos at the frontier end of its lineup. The cadence of releases has accelerated to the point where a model is old news within weeks of launch.

The competitive dynamics are shifting. Google is betting on infrastructure dominance with custom silicon and tight Workspace integration. Anthropic is pushing safety and reliability as differentiators. OpenAI is leaning into the agentic paradigm, positioning GPT-5.5 as the model that does not just answer your question but finishes your project. Each strategy has merit, but OpenAI’s approach has the most direct path to revenue because it maps onto work that companies are already paying humans to do.

For developers, the practical takeaway is straightforward. GPT-5.5 is now the strongest option for agentic coding workflows and autonomous research tasks. The price increase is real but offset by efficiency gains. And the window before competitors respond with equivalent capabilities is measured in weeks, not months. The AI race in 2026 is no longer about who has the smartest model. It is about who can turn that intelligence into completed work.

Trend Insight — The frontier lab competition has entered a phase where release cadence matters as much as model quality. OpenAI shipping GPT-5.5 just weeks after 5.4 signals that the traditional “big launch” model is being replaced by continuous deployment of incrementally better systems. This favors labs with deep infrastructure and rapid iteration capability.


Sources

  1. OpenAI — Introducing GPT-5.5
  2. MarkTechPost — GPT-5.5 Benchmark Analysis
  3. TechCrunch — OpenAI Releases GPT-5.5

AI Biz Insider · AI Trends EN · aibizinsider.com
