This $0.14 Model Almost Matches GPT-5.4

KEY POINTS
  • DeepSeek V4-Pro (1.6T parameters, 49B active) is now the largest open-weights AI model ever released under MIT license
  • V4-Pro costs $1.74/M input tokens vs. GPT-5.4’s $2.50 and Claude Sonnet 4.6’s $3.00 while scoring within 0.2 points on SWE-bench Verified
  • V4-Flash at $0.14/M input tokens undercuts every competing small model, including GPT-5.4 Nano ($0.20)
  • Efficiency breakthrough: V4-Pro uses only 27% of the FLOPs and 10% of the KV cache compared to its predecessor at 1M-token context

How much would you pay for an AI model that nearly matches GPT-5.4 on coding benchmarks? DeepSeek says $1.74 per million tokens should do it. The Chinese AI lab dropped two preview models on April 24 that could reshape the economics of running frontier-class AI, and the open-source community is already scrambling to quantize them for local hardware.

The Largest Open-Weights Model Ever

V4-Pro: 1.6 Trillion Parameters, MIT Licensed

DeepSeek-V4-Pro is a Mixture of Experts (MoE) model with 1.6 trillion total parameters and 49 billion active per token. That makes it the largest open-weights model available today, surpassing Kimi K2.6 (1.1T) and GLM-5.1 (754B), and more than doubling the size of its predecessor, DeepSeek V3.2 (685B). It ships with a 1 million token context window and an MIT license, meaning anyone can use it commercially.

Its smaller sibling, DeepSeek-V4-Flash, packs 284 billion total parameters with 13 billion active. At 160GB on Hugging Face (versus Pro’s 865GB), Flash is already attracting attention from developers hoping to run it on high-end consumer hardware like Apple’s M5 MacBook Pro with 128GB of memory.

Benchmark Numbers That Matter

On SWE-bench Verified, the standard coding benchmark, V4-Pro scores 80.6%, landing within 0.2 percentage points of Claude Opus 4.6. On IMOAnswerBench, a mathematical reasoning test, it reaches 89.8, beating Claude (75.3) and Gemini (81.0), though GPT-5.4 still leads at 91.4. On MMLU-Pro, DeepSeek claims V4-Pro matches GPT-5.4-level performance.

DeepSeek is candid about its limitations. The company’s technical paper acknowledges that even with extended reasoning tokens, V4-Pro-Max “falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months.”

Trend Insight — DeepSeek’s transparent benchmarking stands out. Rather than cherry-picking wins, they acknowledge being 3-6 months behind frontier. This honesty, combined with MIT licensing, builds developer trust and accelerates adoption. The real competitive moat for closed-source labs is shrinking from years to months.


The Price War Heats Up

Flash: The Cheapest Small Model on the Market

DeepSeek-V4-Flash charges $0.14 per million input tokens and $0.28 per million output tokens. That undercuts GPT-5.4 Nano ($0.20/$1.25), Gemini 3.1 Flash-Lite ($0.25/$1.50), and Claude Haiku 4.5 ($1.00/$5.00). For developers running high-volume inference workloads, the savings compound fast.
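To see how the savings compound, here is a quick back-of-envelope sketch using the prices above. The monthly token volumes are a hypothetical example, not figures from DeepSeek or any provider:

```python
# Hypothetical high-volume workload: 2B input tokens and 500M output
# tokens per month. Prices in $/M tokens (input, output), as quoted above.
PRICES = {
    "DeepSeek V4-Flash":     (0.14, 0.28),
    "GPT-5.4 Nano":          (0.20, 1.25),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Claude Haiku 4.5":      (1.00, 5.00),
}

IN_M, OUT_M = 2_000, 500  # millions of tokens per month

def monthly_cost(in_price, out_price):
    """Monthly API bill in dollars for the workload above."""
    return IN_M * in_price + OUT_M * out_price

for model, (inp, outp) in PRICES.items():
    print(f"{model:24s} ${monthly_cost(inp, outp):>8,.0f}/mo")
```

Under these assumed volumes, Flash comes to about $420 a month against roughly $4,500 for Claude Haiku 4.5: a 10x gap at identical usage.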

Pro: Frontier Performance at Haiku Prices

V4-Pro at $1.74/$3.48 per million tokens sits well below GPT-5.4 ($2.50/$15), Claude Sonnet 4.6 ($3/$15), and Gemini 3.1 Pro ($2/$12). In output pricing, the gap is dramatic: V4-Pro’s $3.48 per million output tokens is less than a quarter of what GPT-5.4 charges ($15) for broadly comparable performance.

Trend Insight — The output token pricing gap is the real story. Most production AI applications are output-heavy (generating code, writing content, powering agents). V4-Pro delivering frontier-adjacent quality at $3.48 per million output tokens versus $15-30 at OpenAI and Anthropic forces incumbents to either cut prices or justify the premium with measurably better results.
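The effect of output-heavy usage can be sketched with a single agent run. The 50K-input / 200K-output split is an assumed workload shape, not a measured one; prices are the ones quoted above:

```python
# Cost of one output-heavy agent run (hypothetical shape: 50K input
# tokens, 200K output tokens). Prices in $/M tokens, as quoted above.
def run_cost(in_tok, out_tok, in_price, out_price):
    """Dollar cost of a single run at the given per-million-token prices."""
    return (in_tok * in_price + out_tok * out_price) / 1_000_000

v4_pro = run_cost(50_000, 200_000, 1.74, 3.48)
gpt_54 = run_cost(50_000, 200_000, 2.50, 15.00)
print(f"V4-Pro: ${v4_pro:.2f}  GPT-5.4: ${gpt_54:.2f}  "
      f"ratio: {gpt_54 / v4_pro:.1f}x")
```

Because output tokens dominate, the run-level gap (about 4x here) tracks the output-price gap far more closely than the input-price gap.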


The Efficiency Engineering Behind the Price

Radical Compute Reduction

DeepSeek’s technical paper reveals the engineering that enables these prices. At 1 million token context, V4-Pro requires only 27% of the single-token FLOPs (measured in equivalent FP8 FLOPs) and 10% of the KV cache size compared to DeepSeek V3.2. V4-Flash pushes efficiency even further: 10% of the FLOPs and just 7% of the KV cache relative to V3.2.

This is not just an incremental improvement. A 73-90% reduction in compute requirements for long-context inference means DeepSeek can profitably serve the model at prices that would be unsustainable for less efficient architectures. It also hints at why the model runs well on the DeepSeek API despite China’s restricted access to cutting-edge NVIDIA hardware.
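The 73-90% figure follows directly from the paper’s ratios. A minimal sketch of the arithmetic, taking V3.2 as the 100% baseline:

```python
# Relative resource use at 1M-token context, with DeepSeek V3.2 = 1.00
# (ratios as quoted from the technical paper).
v4_pro   = {"flops": 0.27, "kv_cache": 0.10}
v4_flash = {"flops": 0.10, "kv_cache": 0.07}

for name, m in (("V4-Pro", v4_pro), ("V4-Flash", v4_flash)):
    flops_cut = (1 - m["flops"]) * 100     # % fewer FLOPs than V3.2
    kv_cut    = (1 - m["kv_cache"]) * 100  # % smaller KV cache than V3.2
    print(f"{name}: {flops_cut:.0f}% fewer FLOPs, "
          f"{kv_cut:.0f}% smaller KV cache")
```

V4-Pro cuts FLOPs by 73% and KV cache by 90%; V4-Flash cuts them by 90% and 93%, which is where the 73-90% range for compute comes from.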

Open Source Implications

With MIT licensing, V4’s weights are already on Hugging Face. The quantization community, particularly teams like Unsloth, is expected to release compressed versions soon. A quantized V4-Flash could realistically run on high-end consumer GPUs or Apple Silicon machines, putting near-frontier AI in the hands of independent developers without any API costs at all.
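A rough sizing sketch suggests why local V4-Flash is plausible. The bit-widths and the ~10% overhead factor (for higher-precision embeddings, quantization scales, and so on) are assumptions for illustration, not published figures:

```python
# Rough weight-memory footprint of V4-Flash (284B total parameters)
# at common quantization bit-widths. The 10% overhead factor is an
# assumption covering layers kept at higher precision plus metadata.
TOTAL_PARAMS = 284e9
OVERHEAD = 1.10

def weights_gb(bits_per_param):
    """Approximate on-disk/in-memory weight size in GB."""
    return TOTAL_PARAMS * bits_per_param / 8 / 1e9 * OVERHEAD

for bits in (8, 4, 3, 2):
    print(f"{bits}-bit: ~{weights_gb(bits):.0f} GB")
```

Under these assumptions, a 4-bit quant lands near the 160GB Hugging Face release, while a ~3-bit quant (~117GB) would squeeze into a 128GB Apple Silicon machine, leaving some headroom for the KV cache.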

Trend Insight — DeepSeek’s efficiency focus solves two problems simultaneously: it keeps API prices low and makes local deployment feasible. This dual strategy threatens the business model of both API providers (on price) and hardware vendors (on required compute). Expect the quantized V4-Flash to become a popular choice for privacy-sensitive enterprise deployments within weeks.


What This Means for the Industry

DeepSeek V4 arrives at a pivotal moment. OpenAI just launched GPT-5.5, Google invested up to $40 billion in Anthropic, and the frontier labs are spending unprecedented sums on compute. Meanwhile, DeepSeek demonstrates that clever architecture can close most of the performance gap at a fraction of the cost.

The 3-6 month lag DeepSeek acknowledges is real, but for the vast majority of production use cases, “95% of frontier at 25% of the price” is a compelling value proposition. The question facing OpenAI, Google, and Anthropic is whether their remaining quality edge justifies a 4-8x price premium for enterprise customers making budget decisions today.


Sources

  1. Simon Willison – DeepSeek V4: almost on the frontier, a fraction of the price
  2. TechCrunch – DeepSeek previews new AI model that closes the gap with frontier models
  3. CNBC – China’s DeepSeek releases preview of long-awaited V4 model
  4. Analytics India Magazine – DeepSeek V4-Pro Challenges OpenAI, Anthropic on Key Benchmarks

AI Biz Insider · AI Trends EN · aibizinsider.com
